OLAC Record: Audiovisual Database of Spoken American English

OLAC Record
oai:www.ldc.upenn.edu:LDC2009V01

Metadata

Title: Audiovisual Database of Spoken American English

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Richie, Carolyn, Sarah Warburton, and Megan Carter. Audiovisual Database of Spoken American English LDC2009V01. Web Download. Philadelphia: Linguistic Data Consortium, 2009

Contributor: Richie, Carolyn

Warburton, Sarah

Carter, Megan

Date (W3CDTF): 2009

Date Issued (W3CDTF): 2009-02-16

Description: *Introduction*

The Audiovisual Database of Spoken American English, Linguistic Data Consortium (LDC) catalog number LDC2009V01 and ISBN 1-58563-496-4, was developed at Butler University, Indianapolis, IN in 2007 for use by a variety of researchers to evaluate speech production and speech recognition. It contains approximately seven hours of audiovisual recordings of fourteen American English speakers producing syllables, word lists and sentences used in both academic and clinical settings. All talkers were from the North Midland dialect region -- roughly defined as Indianapolis and north within the state of Indiana -- and had lived in that region for the majority of the time from birth to 18 years of age. Each participant read 238 different words and 166 different sentences. The words and sentences spoken were drawn from the following sources:

* Central Institute for the Deaf (CID) Everyday Sentences (Lists A-J)
* Northwestern University Auditory Test No. 6 (Lists I-IV)
* Vowels in /hVd/ context (separate words)
* Texas Instruments/Massachusetts Institute of Technology (TIMIT) sentences

The CID Everyday Sentences were created in the 1950s from a sample developed by the Armed Forces National Research Committee on Hearing and Bio-Acoustics. They are considered to represent everyday American speech and have the following characteristics: the vocabulary is appropriate to adults; the words appear with high frequency in one or more of the well-known word counts of the English language; proper names and proper nouns are not used; common non-slang idioms and contractions are used freely; phonetic loading and "tongue-twisting" are avoided; redundancy is high; the level of abstraction is low; and grammatical structure varies freely.

Northwestern University Auditory Test No. 6 is a phonemically balanced set of monosyllabic English words used clinically to test speech perception in adults with hearing loss.

The /hVd/ vowel list was created to elicit all of the vowel sounds of American English.

The TIMIT sentences are a subset (34 sentences) of the 2342 phonetically rich sentences read by speakers in the TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. TIMIT was designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT speakers were from eight dialect regions of the United States.

The Audiovisual Database of Spoken American English will be of interest in various disciplines: to linguists for studies of phonetics, phonology, and prosody of American English; to speech scientists for investigations of motor speech production and auditory-visual speech perception; to engineers and computer scientists for investigations of machine audio-visual speech recognition (AVSR); and to speech and hearing scientists for clinical purposes, such as the examination and improvement of speech perception by listeners with hearing loss.

*Data*

Participants were recorded individually during a single session. A participant first completed a statement of informed consent and a questionnaire to gather biographical data and then was asked by the experimenter to mark his or her Indiana hometown on a state map. The experimenter and participant then moved to a small, sound-treated studio where the participant was seated in front of three navy blue baffles. A laptop computer was elevated to eye level on a speaker stand and placed approximately 50-60 cm in front of the participant. Prompts were presented to the participant in a Microsoft PowerPoint presentation. The experimenter was seated directly next to the participant, but outside the camera angle, and advanced the PowerPoint slides at a comfortable pace. Participants were recorded with a Panasonic DVC-80 digital video camera to miniDV digital video cassette tapes. All participants wore a Sennheiser MKE-2060 directional/cardioid lapel microphone throughout the recordings.

Each speaker produced a total of 94 segmented files, which were converted from Final Cut Express to QuickTime (.mov) files and then saved in the appropriately marked folder. If a speaker mispronounced a sentence or word during the recording process, the mispronunciations were edited out of the segments to be archived. The remaining parts of the recording, including the correct repetition of each prompt, were then sequenced together to create a continuous and complete segment.

The fourteen participants were between 19 and 61 years of age (with a mean age of 30 years) and were native speakers of American English.

*Samples*

For an example of the data in this corpus, please view this video sample (QuickTime, mov).

Extent: Corpus size: 7759462 KB

Format: Sampling Rate: 44100

Sampling Format: 16 bit linear PCM

Identifier: LDC2009V01

https://catalog.ldc.upenn.edu/LDC2009V01

ISBN: 1-58563-496-4

ISLRN: 121-605-639-540-9

DOI: 10.35111/xj23-6g13

Language: English

Language (ISO639): eng

License: Audiovisual Database of Spoken American English Agreement: https://catalog.ldc.upenn.edu/license/audiovisual-database-of-spoken-american-english.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2009V01

Rights Holder: Portions © 2007 Butler University, © 1993, 2009 Trustees of the University of Pennsylvania

Type (DCMI): MovingImage

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2009V01

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info

Citation: Richie, Carolyn; Warburton, Sarah; Carter, Megan. 2009. Linguistic Data Consortium.

Terms: area_Europe country_GB dcmi_MovingImage iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2009V01
Up-to-date as of: Wed Oct 29 7:01:06 EDT 2025
