OLAC Record
oai:www.ldc.upenn.edu:LDC2021S04

Metadata
Title:The SSNCE Database of Tamil Dysarthric Speech
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Vijayalakshmi, P., T. A. Mariya Celin, and T. Nagarajan. The SSNCE Database of Tamil Dysarthric Speech LDC2021S04. Web Download. Philadelphia: Linguistic Data Consortium, 2021
Contributor:Vijayalakshmi, P.
Mariya Celin, T. A.
Nagarajan, T.
Date (W3CDTF):2021
Date Issued (W3CDTF):2021-05-17
Description:*Introduction*
The SSNCE Database of Tamil Dysarthric Speech was developed by the Speech Lab, SSN College of Engineering, India, in collaboration with the Indian National Institute of Empowerment of Persons with Multiple Disabilities (NIEPMD). It contains approximately eight hours of Tamil speech data, time-aligned transcripts and metadata collected from 30 speakers (20 dysarthric speakers and 10 non-dysarthric speakers).
Dysarthria is a speech disorder caused by muscle weakness which can result in slowed and slurred speech that is difficult to understand. Common causes of dysarthria include nervous system disorders and conditions that cause facial paralysis or tongue or throat muscle weakness.
*Data*
The non-dysarthric speakers consisted of five female and five male subjects. The dysarthric speakers (7 female, 13 male) reported a diagnosis of cerebral palsy and ranged in age from 12 years old to 37 years old. The speech data was collected between 2015 and 2017 in two sessions at NIEPMD. In total, each speaker recorded 365 utterances consisting of single words and of sentences that included a combination of common and uncommon Tamil phrases.
The corpus includes time-aligned phonetic transcripts for all collected speech data. Additional documentation includes phoneme mappings and speaker metadata.
Audio data is presented as 16-bit, 16 kHz FLAC-compressed linear PCM WAV. Transcripts are presented as UTF-8 encoded plain text.
*Samples*
Please view the following samples:
* Audio sample (FLAC)
* Phonetic Transcript (TXT)
* Word Transcript (TXT)
* Plain Transcript (TXT)
*Updates*
None at this time.
Extent:Corpus size: 614629 KB
Format:Sampling Rate: 16000
Sampling Format: pcm
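As a rough sanity check on the format fields, the sketch below estimates the uncompressed size of the audio from the figures stated in this record (16-bit samples, 16000 Hz, approximately eight hours). It assumes single-channel audio, exactly eight hours, and 1 KB = 1024 bytes; none of these is stated in the record, and the function name is illustrative only.

```python
# Figures taken from the record.
SAMPLE_RATE_HZ = 16_000      # "Sampling Rate: 16000"
BYTES_PER_SAMPLE = 2         # 16-bit linear PCM
REPORTED_CORPUS_KB = 614_629 # "Corpus size: 614629 KB" (audio plus transcripts and docs)

# Assumptions, not stated in the record.
CHANNELS = 1                 # assumed mono
DURATION_HOURS = 8           # "approximately eight hours", taken as exactly 8
BYTES_PER_KB = 1024          # assumed binary kilobytes


def uncompressed_pcm_kb(hours, rate_hz, bytes_per_sample, channels):
    """Return the size in KB of raw PCM audio with the given parameters."""
    seconds = hours * 3600
    total_bytes = seconds * rate_hz * bytes_per_sample * channels
    return total_bytes / BYTES_PER_KB


raw_kb = uncompressed_pcm_kb(DURATION_HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
print(raw_kb)  # 900000.0
```

Under these assumptions the raw audio would occupy about 900000 KB, so the reported corpus size of 614629 KB is plausible for FLAC-compressed audio. The comparison is only indicative, since the reported size also covers transcripts and documentation.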
Identifier:LDC2021S04
https://catalog.ldc.upenn.edu/LDC2021S04
ISBN: 1-58563-965-6
ISLRN: 064-987-156-004-1
DOI: 10.35111/hkh2-vh40
Language:Tamil
Language (ISO639):tam
License:The SSNCE Database of Tamil Dysarthric Speech Agreement: https://catalog.ldc.upenn.edu/license/the-ssnce-database-of-tamil-dysarthric-speech-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2021S04
Rights Holder:Portions © 2021 Speech Lab, SSN College of Engineering, © 2021 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2021S04
DateStamp:  2022-01-01
GetRecord:  OAI-PMH request for simple DC format
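The GetRecord entries above refer to OAI-PMH requests for this identifier. A minimal sketch of how such a request URL is assembled follows. The `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol, with `oai_dc` as the simple DC prefix and `olac` as the prefix conventionally used for OLAC metadata. The base URL is a placeholder, because the repository's actual OAI-PMH endpoint is not given in this record, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not stated in this record.
HYPOTHETICAL_BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2021S04"  # from the OaiIdentifier field


def build_get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# One request per metadata format listed in this record.
olac_url = build_get_record_url(HYPOTHETICAL_BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = build_get_record_url(HYPOTHETICAL_BASE_URL, OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by `urlencode`, which OAI-PMH servers are expected to decode.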

Search Info

Citation: Vijayalakshmi, P.; Mariya Celin, T. A.; Nagarajan, T. 2021. Linguistic Data Consortium.
Terms: area_Asia country_IN dcmi_Sound dcmi_Text iso639_tam olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2021S04
Up-to-date as of: Sun Jun 16 7:35:05 EDT 2024