OLAC Record: The SSNCE Database of Tamil Dysarthric Speech

OLAC Record
oai:www.ldc.upenn.edu:LDC2021S04

Metadata

Title: The SSNCE Database of Tamil Dysarthric Speech

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Vijayalakshmi, P., T. A. Mariya Celin, and T. Nagarajan. The SSNCE Database of Tamil Dysarthric Speech LDC2021S04. Web Download. Philadelphia: Linguistic Data Consortium, 2021

Contributor: Vijayalakshmi, P.

Mariya Celin, T. A.

Nagarajan, T.

Date (W3CDTF): 2021

Date Issued (W3CDTF): 2021-05-17

Description: *Introduction*

The SSNCE Database of Tamil Dysarthric Speech was developed by the Speech Lab, SSN College of Engineering, India, in collaboration with the Indian National Institute of Empowerment of Persons with Multiple Disabilities (NIEPMD). It contains approximately eight hours of Tamil speech data, time-aligned transcripts, and metadata collected from 30 speakers (20 dysarthric speakers and 10 non-dysarthric speakers).

Dysarthria is a speech disorder caused by muscle weakness, which can result in slowed and slurred speech that is difficult to understand. Common causes of dysarthria include nervous system disorders and conditions that cause facial paralysis or weakness of the tongue or throat muscles.

*Data*

The non-dysarthric speakers consisted of five female and five male subjects. The dysarthric speakers (7 female, 13 male) reported a diagnosis of cerebral palsy and ranged in age from 12 to 37 years. The speech data was collected between 2015 and 2017 in two sessions at NIEPMD. In total, each speaker recorded 365 utterances consisting of single words and of sentences that included a combination of common and uncommon Tamil phrases.

The corpus includes time-aligned phonetic transcripts for all collected speech data. Additional documentation includes phoneme mappings and speaker metadata.

Audio data is presented as 16-bit, 16 kHz linear PCM WAV, compressed with FLAC. Transcripts are presented as UTF-8 encoded plain text.

*Samples*

Please view the following samples:

* Audio sample (FLAC)
* Phonetic Transcript (TXT)
* Word Transcript (TXT)
* Plain Transcript (TXT)

*Updates*

None at this time.

Extent: Corpus size: 614629 KB

Format: Sampling Rate: 16000

Sampling Format: pcm
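The figures in the Description, Extent, and Format fields can be related with a little arithmetic. The sketch below is illustrative only: it assumes single-channel audio (the record does not state the channel count) and treats "approximately eight hours" as exactly 8 hours, so the results are rough estimates rather than properties of the corpus.

```python
# Rough size estimates from the figures stated in this record.
# Assumptions (not stated in the record): mono audio, exactly 8 hours.

SAMPLE_RATE_HZ = 16_000        # Format: Sampling Rate
BYTES_PER_SAMPLE = 2           # 16-bit linear PCM
CHANNELS = 1                   # assumption: mono
HOURS = 8                      # "approximately eight hours"
SPEAKERS = 30                  # 20 dysarthric + 10 non-dysarthric
UTTERANCES_PER_SPEAKER = 365   # per the Description


def uncompressed_pcm_bytes(hours, sample_rate_hz, bytes_per_sample, channels):
    """Size in bytes of raw PCM audio of the given duration and format."""
    return hours * 3600 * sample_rate_hz * bytes_per_sample * channels


total_utterances = SPEAKERS * UTTERANCES_PER_SPEAKER
pcm_bytes = uncompressed_pcm_bytes(
    HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS
)
mean_utterance_seconds = HOURS * 3600 / total_utterances

print(total_utterances)                   # 10950
print(pcm_bytes)                          # 921600000
print(round(mean_utterance_seconds, 2))   # 2.63
```

Under these assumptions the raw audio would occupy roughly 920 million bytes, which is larger than the 614629 KB listed under Extent; that is consistent with the audio being distributed in FLAC-compressed form.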

Identifier: LDC2021S04

https://catalog.ldc.upenn.edu/LDC2021S04

ISBN: 1-58563-965-6

ISLRN: 064-987-156-004-1

DOI: 10.35111/hkh2-vh40

Language: Tamil

Language (ISO639): tam

License: The SSNCE Database of Tamil Dysarthric Speech Agreement: https://catalog.ldc.upenn.edu/license/the-ssnce-database-of-tamil-dysarthric-speech-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2021S04

Rights Holder: Portions © 2021 Speech Lab, SSN College of Engineering, © 2021 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2021S04

DateStamp: 2022-01-01

GetRecord: OAI-PMH request for simple DC format
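The GetRecord entries above refer to OAI-PMH requests. As a minimal sketch, such a request is an HTTP GET with the `verb`, `identifier`, and `metadataPrefix` arguments defined by the OAI-PMH protocol. The base URL below is a hypothetical placeholder, since this record does not give the repository endpoint; `oai_dc` is the protocol's standard prefix for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical endpoint: replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2021S04"  # OaiIdentifier of this record

print(get_record_url(BASE_URL, IDENTIFIER, "oai_dc"))  # simple DC format
print(get_record_url(BASE_URL, IDENTIFIER, "olac"))    # OLAC format (assumed prefix)
```

The identifier is percent-encoded by `urlencode`, so its colons appear as `%3A` in the resulting URL.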

Search Info
Citation: Vijayalakshmi, P.; Mariya Celin, T. A.; Nagarajan, T. 2021. Linguistic Data Consortium.
Terms: area_Asia country_IN dcmi_Sound dcmi_Text iso639_tam olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2021S04
Up-to-date as of: Wed Oct 29 7:02:05 EDT 2025
