OLAC Record oai:www.ldc.upenn.edu:LDC2007S11 |
Metadata | ||
Title: | 2004 Spring NIST Rich Transcription (RT-04S) Development Data | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Fiscus, Jonathan G., et al. 2004 Spring NIST Rich Transcription (RT-04S) Development Data LDC2007S11. Web Download. Philadelphia: Linguistic Data Consortium, 2007 | |
Contributor: | Fiscus, Jonathan G. | |
Garofolo, John S. | ||
Le, Audrey | ||
Martin, Alvin | ||
Sanders, Greg | ||
Przybocki, Mark | ||
Pallett, David | ||
Date (W3CDTF): | 2007 | |
Date Issued (W3CDTF): | 2007-12-20 | |
Description: | *Introduction* 2004 Spring NIST Rich Transcription (RT-04S) Development Data contains the test material (meeting speech and reference transcripts) used in the RT-04S evaluation administered by the NIST (National Institute of Standards and Technology) Speech Group. Rich Transcription (RT) is broadly defined as a fusion of speech-to-text and metadata extraction technologies designed to provide the basis for a generation of more usable transcriptions of human-human meeting speech. The data in this release contains portions of meeting speech collected and/or transcribed by the International Computer Science Institute (ICSI) in Berkeley, the Interactive Systems Laboratories (ISL) at Carnegie Mellon University (CMU), NIST and LDC. The complete meeting speech and corresponding transcript data sets are available from LDC's catalog as follows: ICSI Meeting Speech (LDC2004S02), ICSI Meeting Transcripts (LDC2004T04), ISL Meeting Speech Part 1 (LDC2004S05), ISL Meeting Transcripts Part 1 (LDC2004T10), NIST Meeting Pilot Corpus Speech (LDC2004S09) and NIST Meeting Pilot Corpus Transcripts and Metadata (LDC2004T13). The RT-04S development data consists of the 80-minute test set used in the RT-02 Meeting Recognition Evaluation: approximately 10 minutes of recordings from each of eight meetings held at ICSI, CMU, LDC and NIST. For RT-04S, NIST re-released that data with additional distant mics where the data collection sites provided them. Although the development data comprises 10-minute excerpts from the same data collection sites represented in the RT-04S evaluation data set (2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data, LDC2007S12), it does not fully reflect the evaluation test data: it contains lapel mics in lieu of head mics for the LDC and CMU data, and some different distant mics for the LDC data. For more information about the development test data, see NIST's RT-04S Development Data Documentation.
RT-04S included the following tasks in the meeting domain:
Speech-to-Text Transcription (STT) task
  Microphone conditions:
  * Multiple distant microphones
  * Single distant microphone
  * Individual head microphone
  Processing time conditions:
  * Unlimited time
  * Less than or equal to twenty times realtime
  * Less than or equal to ten times realtime
  * Less than or equal to one times realtime
Diarization (SPKR) task (who spoke when)
  Microphone conditions:
  * Multiple distant microphones
  * Single distant microphone
  Input conditions:
  * Speech input only
  * Speech plus reference transcript input
  Processing time conditions:
  * Unlimited time
  * Less than or equal to twenty times realtime
  * Less than or equal to ten times realtime
  * Less than or equal to one times realtime
*Samples* For an example of the data in this release, please examine this audio sample and its transcript. | |
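The processing time conditions above are stated as multiples of realtime. The check below is a minimal sketch, assuming the usual definition of a realtime factor (total processing time divided by the duration of the audio processed). It shows which conditions a given run would fall under. The condition labels and function names are hypothetical, and NIST's evaluation plan, not this record, defines exactly how processing time was measured.

```python
# Hypothetical labels for the RT-04S processing time conditions, expressed as
# multiples of realtime; None stands for the unlimited-time condition.
CONDITIONS = {
    "unlimited": None,
    "20xRT": 20.0,
    "10xRT": 10.0,
    "1xRT": 1.0,
}


def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def qualifying_conditions(processing_seconds: float, audio_seconds: float) -> list[str]:
    """List the conditions whose speed limit a run of this speed stays within."""
    rtf = realtime_factor(processing_seconds, audio_seconds)
    # A condition is satisfied if it has no limit or the run is at or under it.
    return [name for name, limit in CONDITIONS.items() if limit is None or rtf <= limit]


if __name__ == "__main__":
    # 8 hours of processing for the 80-minute (4,800 s) development set.
    print(qualifying_conditions(8 * 3600, 80 * 60))
```

For example, a system that needs 8 hours (28,800 s) to process the 80-minute (4,800 s) development set runs at 6 times realtime. It therefore falls under the unlimited, twenty-times and ten-times conditions but not the one-times condition.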
Extent: | Corpus size: 4089446 KB | |
Identifier: | LDC2007S11 | |
https://catalog.ldc.upenn.edu/LDC2007S11 | ||
ISBN: 1-58563-447-6 | ||
ISLRN: 293-371-412-539-2 | ||
DOI: 10.35111/fehg-1397 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2007S11 | |
Rights Holder: | Portions © 2002 Interactive Systems Laboratories, Carnegie Mellon University, © 2001 International Computer Science Institute, © 2001, 2004, 2007 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Type (OLAC): | primary_text | |
OLAC Info |
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
OAI Info |
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2007S11 | |
DateStamp: | 2021-09-09 | |
Search Info | ||
Citation: | Fiscus, Jonathan G.; Garofolo, John S.; Le, Audrey; Martin, Alvin; Sanders, Greg; Przybocki, Mark; Pallett, David. 2007. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text |