OLAC Record: TORGO Database of Dysarthric Articulation

OLAC Record
oai:www.ldc.upenn.edu:LDC2012S02

Metadata

Title: TORGO Database of Dysarthric Articulation

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Rudzicz, Frank, et al. TORGO Database of Dysarthric Articulation LDC2012S02. Web Download. Philadelphia: Linguistic Data Consortium, 2012

Contributor: Rudzicz, Frank

Hirst, Graeme

van Lieshout, Pascal

Penn, Gerald

Shein, Fraser

Namasivayam, Aravind

Wolff, Talya

Date (W3CDTF): 2012

Date Issued (W3CDTF): 2012-01-19

Description: *Introduction* TORGO Database of Dysarthric Articulation was developed by the University of Toronto's departments of Computer Science and Speech Language Pathology in collaboration with the Holland-Bloorview Kids Rehabilitation Hospital in Toronto, Canada. It contains approximately 23 hours of English speech data, accompanying transcripts and documentation from 8 speakers (5 males, 3 females) with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7 speakers (4 males, 3 females) from a non-dysarthric control group. CP and ALS are examples of dysarthria which is caused by disruptions in the neuro-motor interface that distort motor commands to the vocal articulators, resulting in atypical and relatively unintelligible speech in most cases. The TORGO database is primarily a resource for developing advanced automatic speaker recognition (ASR) models suited to the needs of people with dysarthria, but it is also applicable to non-dysarthric speech. The inability of modern ASR to effectively understand dysarthric speech is a problem since the more general physical disabilities often associated with the condition can make other forms of computer input, such as computer keyboards or touch screens, difficult to use. *Data* The data consists of aligned acoustics and measured 3D articulatory features from the speakers carried out using the 3D AG500 electro-magnetic articulograph (EMA) system (Carstens Medizinelektronik GmbH, Lenglern, Germany) with fully-automated calibration. This system allows for 3D recordings of articulatory movements inside and outside the vocal tract, thus providing a detailed window on the nature and direction of speech-related activity. The data was collected between 2008 and 2010 in Toronto, Canada. All subjects read text consisting of non-words, short words and restricted sentences from a 19-inch LCD screen. The restricted sentences included 162 sentences from the sentence intelligibility section of Assessment of intelligibility of dysarthric speech (Yorkston & Beukelman, 1981) and 460 sentences derived from the TIMIT database. The unrestricted sentences were elicited by asking participants to spontaneously describe 30 images in interesting situations taken randomly from Webber Photo Cards - Story Starters (Webber, 2005), designed to prompt students to tell or write a story. Data is organized by speaker and by the session in which each speaker recorded data. Each speaker was assigned a code and given their own file directory. The code for female speakers begins with F, and the code for male speakers begins with M. If the speaker was a member of the control group, the letter C follows the gender code. The last two digits of the code indicate the order in which that subject was recruited. For example, speaker FC02 was the second female speaker without dysarthria recruited. Note that some speakers were intentionally left out of the data, and thus, there are gaps in the numbering. Each speakers directory contains Session directories which encapsulate data recorded in the respective visit and occasionally, a Notes directory which can include Frenchay assessments (test for the measurement, description and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and other relevant notes. Each Session directory can, but does not necessarily, contain the following content: * alignment.txt: This is a text file containing the sample offsets between audio files recorded simultaneously by the array microphone and the head-worn microphone. * amps: These directories contain raw *.amp and *.ini files produced by the AG500 articulograph. * phn_*: These directories contain phonemic transcriptions of audio data. Each file is plain text with a *.PHN file extensions and a filename referring to the utterance number. These files were generated using the free Wavesurfer tool. * pos: These directories contain the head-corrected positions, velocities, and orientations of sensor coils for each utterance, as generated by the AG500 articulograph. * prompts: These directories contain orthographic transcriptions. * rawpos: These directories are equivalent to the pos directories except that their articulographic content is not head-normalized to a constant upright position. * wav_*: These directories contain the acoustics. Each file is a RIFF (little-endian) WAVE audio file (Microsoft PCM, 16 bit, mono 16000 Hz). * wavall: These directories contains a stereo recording in which one channel contains the recorded acoustics and the other channel contains the analog peaks associated with the sweep signal, which is used by the AG500 hardware for synchronization. Additionally, sessions recorded with the AG500 articulograph are marked with the file EMA, and those recorded with the video-based system are marked with the file VIDEO. Files with a date form as the filename and a txt extension (e.g. april232008cal2.txt, jan28cal3.txt) are the measured responses from calibration. The *.log and *.calset files contain descriptions of the calibration process, but not the final result of calibration. See the readme file and the AG500 Wiki for more complete descriptions of the possible subfolders and of the AG500 specific files. Also, see session_contents.tsv for a tab separated table of each sessions subfolders and metadata files. *Samples* For an example of the data contained in this corpus, review these two audio samples: Dysarthric & Control. *Updates* None at this time.

Extent: Corpus size: 15209372 KB

Format: Sampling Rate: 16000

Sampling Format: pcm

Identifier: LDC2012S02

https://catalog.ldc.upenn.edu/LDC2012S02

ISBN: 1-58563-603-7

ISLRN: 239-915-065-794-5

DOI: 10.35111/cp5y-9z47

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2012S02

Rights Holder: Portions © 2008-2011 Frank Rudzicz, © 2012 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2012S02

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Rudzicz, Frank; Hirst, Graeme; van Lieshout, Pascal; Penn, Gerald; Shein, Fraser; Namasivayam, Aravind; Wolff, Talya. 2012. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2012S02
Up-to-date as of: Tue May 20 0:14:28 EDT 2025

Metadata
Title:		TORGO Database of Dysarthric Articulation
Access Rights:		Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:		Rudzicz, Frank, et al. TORGO Database of Dysarthric Articulation LDC2012S02. Web Download. Philadelphia: Linguistic Data Consortium, 2012
Contributor:		Rudzicz, Frank
		Hirst, Graeme
		van Lieshout, Pascal
		Penn, Gerald
		Shein, Fraser
		Namasivayam, Aravind
		Wolff, Talya
Date (W3CDTF):		2012
Date Issued (W3CDTF):		2012-01-19
Description:		Introduction TORGO Database of Dysarthric Articulation was developed by the University of Toronto's departments of Computer Science and Speech Language Pathology in collaboration with the Holland-Bloorview Kids Rehabilitation Hospital in Toronto, Canada. It contains approximately 23 hours of English speech data, accompanying transcripts and documentation from 8 speakers (5 males, 3 females) with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7 speakers (4 males, 3 females) from a non-dysarthric control group. CP and ALS are examples of dysarthria which is caused by disruptions in the neuro-motor interface that distort motor commands to the vocal articulators, resulting in atypical and relatively unintelligible speech in most cases. The TORGO database is primarily a resource for developing advanced automatic speaker recognition (ASR) models suited to the needs of people with dysarthria, but it is also applicable to non-dysarthric speech. The inability of modern ASR to effectively understand dysarthric speech is a problem since the more general physical disabilities often associated with the condition can make other forms of computer input, such as computer keyboards or touch screens, difficult to use. Data The data consists of aligned acoustics and measured 3D articulatory features from the speakers carried out using the 3D AG500 electro-magnetic articulograph (EMA) system (Carstens Medizinelektronik GmbH, Lenglern, Germany) with fully-automated calibration. This system allows for 3D recordings of articulatory movements inside and outside the vocal tract, thus providing a detailed window on the nature and direction of speech-related activity. The data was collected between 2008 and 2010 in Toronto, Canada. All subjects read text consisting of non-words, short words and restricted sentences from a 19-inch LCD screen. The restricted sentences included 162 sentences from the sentence intelligibility section of Assessment of intelligibility of dysarthric speech (Yorkston & Beukelman, 1981) and 460 sentences derived from the TIMIT database. The unrestricted sentences were elicited by asking participants to spontaneously describe 30 images in interesting situations taken randomly from Webber Photo Cards - Story Starters (Webber, 2005), designed to prompt students to tell or write a story. Data is organized by speaker and by the session in which each speaker recorded data. Each speaker was assigned a code and given their own file directory. The code for female speakers begins with F, and the code for male speakers begins with M. If the speaker was a member of the control group, the letter C follows the gender code. The last two digits of the code indicate the order in which that subject was recruited. For example, speaker FC02 was the second female speaker without dysarthria recruited. Note that some speakers were intentionally left out of the data, and thus, there are gaps in the numbering. Each speakers directory contains Session directories which encapsulate data recorded in the respective visit and occasionally, a Notes directory which can include Frenchay assessments (test for the measurement, description and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and other relevant notes. Each Session directory can, but does not necessarily, contain the following content: * alignment.txt: This is a text file containing the sample offsets between audio files recorded simultaneously by the array microphone and the head-worn microphone. * amps: These directories contain raw .amp and .ini files produced by the AG500 articulograph. * phn_: These directories contain phonemic transcriptions of audio data. Each file is plain text with a .PHN file extensions and a filename referring to the utterance number. These files were generated using the free Wavesurfer tool. * pos: These directories contain the head-corrected positions, velocities, and orientations of sensor coils for each utterance, as generated by the AG500 articulograph. * prompts: These directories contain orthographic transcriptions. * rawpos: These directories are equivalent to the pos directories except that their articulographic content is not head-normalized to a constant upright position. * wav_: These directories contain the acoustics. Each file is a RIFF (little-endian) WAVE audio file (Microsoft PCM, 16 bit, mono 16000 Hz). wavall: These directories contains a stereo recording in which one channel contains the recorded acoustics and the other channel contains the analog peaks associated with the sweep signal, which is used by the AG500 hardware for synchronization. Additionally, sessions recorded with the AG500 articulograph are marked with the file EMA, and those recorded with the video-based system are marked with the file VIDEO. Files with a date form as the filename and a txt extension (e.g. april232008cal2.txt, jan28cal3.txt) are the measured responses from calibration. The .log and .calset files contain descriptions of the calibration process, but not the final result of calibration. See the readme file and the AG500 Wiki for more complete descriptions of the possible subfolders and of the AG500 specific files. Also, see session_contents.tsv for a tab separated table of each sessions subfolders and metadata files. Samples For an example of the data contained in this corpus, review these two audio samples: Dysarthric & Control. Updates None at this time.
Extent:		Corpus size: 15209372 KB
Format:		Sampling Rate: 16000
Format:		Sampling Format: pcm
Identifier:		LDC2012S02
		https://catalog.ldc.upenn.edu/LDC2012S02
		ISBN: 1-58563-603-7
		ISLRN: 239-915-065-794-5
		DOI: 10.35111/cp5y-9z47
Language:		English
Language (ISO639):		eng
License:		LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:		Distribution: Web Download
Publisher:		Linguistic Data Consortium
Publisher (URI):		https://www.ldc.upenn.edu
Relation (URI):		https://catalog.ldc.upenn.edu/docs/LDC2012S02
Rights Holder:		Portions © 2008-2011 Frank Rudzicz, © 2012 Trustees of the University of Pennsylvania
Type (DCMI):		Sound
Type (OLAC):		primary_text
OLAC Info
Archive:		The LDC Corpus Catalog
Description:		http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:		OAI-PMH request for OLAC format
GetRecord:		Pre-generated XML file
OAI Info
OaiIdentifier:		oai:www.ldc.upenn.edu:LDC2012S02
DateStamp:		2020-11-30
GetRecord:		OAI-PMH request for simple DC format
Search Info
Citation:		Rudzicz, Frank; Hirst, Graeme; van Lieshout, Pascal; Penn, Gerald; Shein, Fraser; Namasivayam, Aravind; Wolff, Talya. 2012. Linguistic Data Consortium.
Terms:		area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text