OLAC Record oai:www.ldc.upenn.edu:LDC2011S06 |
Metadata | ||
Title: | 2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | NIST Multimodal Information Group. 2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set LDC2011S06. Web Download. Philadelphia: Linguistic Data Consortium, 2011 | |
Contributor: | NIST Multimodal Information Group | |
Date (W3CDTF): | 2011 | |
Date Issued (W3CDTF): | 2011-08-15 | |
Description: | *Introduction* 2005 Spring NIST Rich Transcription (RT-05S) Conference Meeting Evaluation Set was developed by LDC and NIST (National Institute of Standards and Technology). It contains approximately 78 hours of English meeting speech, reference transcripts, and other material used in the RT Spring 2005 evaluation. Rich Transcription (RT) is broadly defined as a fusion of speech-to-text (STT) technology and metadata extraction technologies providing the bases for the generation of more usable transcriptions of human-human speech in meetings. LDC has also released 2004 Spring NIST Rich Transcription (RT-04S) Development Data LDC2007S11 and 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data LDC2007S12. RT-05S included the following tasks in the meeting domain:
* Speech-To-Text (STT) - convert spoken words into streams of text
* Speaker Diarization (SPKR) - find the segments of time within a meeting in which each meeting participant is talking
* Speech Activity Detection (SAD) - detect when someone in a meeting space is talking
Further information about the evaluation is available on the RT-05Spring Evaluation Website. Please note that the lecture meeting data is not included in this release.
*Data Description* The data in this release consists of portions of meeting speech collected between 2001 and 2005 by the IDIAP Research Institute's Augmented Multi-Party Interaction project (AMI), Martigny, Switzerland; International Computer Science Institute (ICSI) at University of California, Berkeley; Interactive Systems Laboratories (ISL) at Carnegie Mellon University (CMU), Pittsburgh, PA; NIST; and Virginia Polytechnic Institute and State University (VT), Blacksburg, VA. Each meeting excerpt contains a head-mic recording for each subject and one or more distant microphone recordings. Reference transcripts for the evaluation excerpts were prepared by LDC according to its Meeting Recording Careful Transcription Guidelines. Those specifications are designed to provide an accurate, verbatim (word-for-word) transcription, time-aligned with the audio file and including the identification of additional audio and speech signals with special mark-up.
*Samples* For an example of the data contained in this corpus, review this audio sample and transcript sample. | |
Extent: | Corpus size: 10148710 KB | |
Format: | Sampling Rate: 16000 | |
Sampling Format: pcm | ||
Identifier: | LDC2011S06 | |
https://catalog.ldc.upenn.edu/LDC2011S06 | ||
ISBN: 1-58563-588-X | ||
ISLRN: 771-157-694-578-3 | ||
DOI: 10.35111/hv72-jf32 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2011S06 | |
Rights Holder: | Portions © 2011 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2011S06 | |
DateStamp: | 2021-09-09 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | NIST Multimodal Information Group. 2011. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text |
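
The GetRecord entries listed under OLAC Info and OAI Info refer to OAI-PMH requests for this record in the OLAC and simple DC formats. As a rough sketch of how such request URLs are formed, the Python below assembles the standard OAI-PMH query parameters (`verb`, `identifier`, `metadataPrefix`). The base URL is a placeholder and an assumption, since the record above does not state the endpoint address; the `get_record_url` helper is likewise hypothetical. The prefixes `olac` and `oai_dc` are the conventional names for the OLAC and simple Dublin Core formats.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint. The record does not give the real
# OAI-PMH base URL, so substitute the repository's actual address.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2011S06"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Request in OLAC format and in simple Dublin Core format.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Fetching either URL from a real endpoint would return an XML response wrapping the record; parsing that response is left out here because its exact contents depend on the repository.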