OLAC Record oai:www.ldc.upenn.edu:LDC2004S09 |
Metadata | ||
Title: | NIST Meeting Pilot Corpus Speech | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Garofolo, John S., et al. NIST Meeting Pilot Corpus Speech LDC2004S09. Web Download. Philadelphia: Linguistic Data Consortium, 2004 | |
Contributor: | Garofolo, John S. | |
Michel, Martial | ||
Stanford, Vincent M. | ||
Tabassi, Elham | ||
Fiscus, Jonathan G. | ||
Laprun, Christophe D. | ||
Pratz, Nicolas | ||
Lard, Jerome | ||
Date (W3CDTF): | 2004 | |
Date Issued (W3CDTF): | 2004-07-12 | |
Description: | *Introduction* NIST Meeting Pilot Corpus Speech was developed by the National Institute of Standards and Technology (NIST) and contains approximately 15 hours of English meeting speech. The corresponding transcripts for these speech files are available as NIST Meeting Pilot Corpus Transcripts and Metadata (LDC2004T13). Considerable effort is being expended on mining information from newswire, news broadcasts, and conversational speech; however, little has been done to address such applications in the more challenging and equally important meeting domain. Meetings have several important properties not found in other domains, such as being diverse in formality and vocabulary, being highly interactive across multiple participants, using distant microphones, using overlapping camera views, and necessitating multi-media information integration. The development of smart meeting room core technologies that can automatically recognize and extract important information from multi-media sensor inputs will provide an invaluable resource for a variety of business, academic, and governmental applications. *Data* The data for the NIST Automatic Meeting Recognition Project were collected at the NIST Gaithersburg, MD, Meeting Data Collection Laboratory. This release contains 369 SPHERE audio files generated from 19 meetings (comprising about 15 hours of meeting room data and amounting to about 32 GB) recorded between November 2001 and December 2003. Each meeting was recorded using two wireless "personal" mics attached to each meeting participant: a close-talking noise-cancelling boom mic and an omni-directional lapel mic. Each meeting was also recorded using three omni-directional table mics and a four-channel directional table mic covering 360 degrees (each channel is recorded in a separate file). Each individual channel was converted from its 48 kHz, 24-bit, linear PCM source format to 16 kHz, 16-bit, linear PCM audio in SPHERE-formatted files.
A total of 61 subjects were involved in these meetings. The following is a breakdown by participant origin and gender:

| Origin | # Male Instances | # Unique Males | # Female Instances | # Unique Females | Total Participant Instances | Total Unique Participants |
| Native | 54 | 30 | 33 | 15 | 87 | 45 |
| Non-Native | 18 | 11 | 10 | 5 | 28 | 16 |
| Total | 72 | 41 | 43 | 20 | 115 | 61 |

*Samples* * Audio Sample (SPH) *Updates* There are no updates available at this time. | |
Extent: | Corpus size: 33554432 KB | |
Format: | Sampling Rate: 16000 | |
Sampling Format: pcm | ||
Identifier: | LDC2004S09 | |
https://catalog.ldc.upenn.edu/LDC2004S09 | ||
ISBN: 1-58563-302-x | ||
ISLRN: 706-538-229-826-0 | ||
DOI: 10.35111/800p-fv08 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2004S09 | |
Rights Holder: | Portions © 2004 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Type (OLAC): | primary_text | |
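The size, file-count, and format figures quoted in this record can be cross-checked against one another. The following is a minimal sketch, not part of the original record. It assumes that 1 KB means 1024 bytes and that the audio is stored uncompressed with negligible header overhead; neither assumption is stated in the record, so the results are rough estimates only.

```python
# Figures quoted in the record.
NUM_FILES = 369             # SPHERE audio files
NUM_MEETINGS = 19
MEETING_HOURS = 15          # approximate total meeting time
CORPUS_SIZE_KB = 33554432   # Extent field
SAMPLE_RATE_HZ = 16000      # Format field
BYTES_PER_SAMPLE = 2        # 16-bit linear PCM

# Assumption: 1 KB = 1024 bytes.
corpus_bytes = CORPUS_SIZE_KB * 1024
corpus_gib = corpus_bytes / 1024**3

# Assumption: uncompressed audio, file headers ignored.
bytes_per_channel_hour = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 3600
channel_hours = corpus_bytes / bytes_per_channel_hour

# Two independent estimates of the number of recorded channels per meeting.
files_per_meeting = NUM_FILES / NUM_MEETINGS
channels_from_size = channel_hours / MEETING_HOURS

print(f"Corpus size: {corpus_gib:.1f} GiB")
print(f"Single-channel audio: {channel_hours:.0f} hours")
print(f"Files per meeting: {files_per_meeting:.1f}")
print(f"Channels implied by size: {channels_from_size:.1f}")
```

Under these assumptions both estimates come out at roughly 19 to 20 channels per meeting, which suggests that the Extent value, the "about 32 GB" figure, and the file count are mutually consistent.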
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2004S09 | |
DateStamp: | 2024-03-27 | |
GetRecord: | OAI-PMH request for simple DC format | |
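The GetRecord entries above refer to standard OAI-PMH requests. As a hedged illustration, such a request URL can be assembled from the `verb`, `identifier`, and `metadataPrefix` arguments defined by the protocol. The base URL below is a hypothetical placeholder, not the actual endpoint behind these links, and the helper name `get_record_url` is introduced only for this sketch. The prefixes `olac` and `oai_dc` are assumed to correspond to the "OLAC format" and "simple DC format" entries.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2004S09"

olac_url = get_record_url(OAI_IDENTIFIER, "olac")    # OLAC format
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")    # simple Dublin Core

print(olac_url)
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier, which keeps the query string valid regardless of the characters an identifier contains.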
Search Info | ||
Citation: | Garofolo, John S.; Michel, Martial; Stanford, Vincent M.; Tabassi, Elham; Fiscus, Jonathan G.; Laprun, Christophe D.; Pratz, Nicolas; Lard, Jerome. 2004. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text |