OLAC Record: 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data

OLAC Record
oai:www.ldc.upenn.edu:LDC2007S12

Metadata

Title: 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Fiscus, Jonathan G., et al. 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data LDC2007S12. Web Download. Philadelphia: Linguistic Data Consortium, 2007

Contributor: Fiscus, Jonathan G.

Garofolo, John S.

Le, Audrey

Martin, Alvin

Sanders, Greg

Przybocki, Mark

Pallett, David

Date (W3CDTF): 2007

Date Issued (W3CDTF): 2007-10-17

Description: *Introduction*

2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data contains the test material (meeting speech and reference transcripts) used in the RT-04S evaluation administered by the NIST (National Institute of Standards and Technology) Speech Group. Rich Transcription (RT) is broadly defined as a fusion of speech-to-text technology and metadata extraction technologies designed to provide the basis for a generation of more usable transcriptions of human-human meeting speech.

The data in this release consists of portions of meeting speech collected and/or transcribed by the International Computer Science Institute (ICSI) at Berkeley, the Interactive Systems Laboratories (ISL) at Carnegie Mellon University, NIST and LDC. The complete meeting speech and corresponding transcript data sets are available from LDC's catalog as follows: ICSI Meeting Speech (LDC2004S02), ICSI Meeting Transcripts (LDC2004T04), ISL Meeting Speech Part 1 (LDC2004S05), ISL Meeting Transcripts Part 1 (LDC2004T10), NIST Meeting Pilot Corpus Speech (LDC2004S09) and NIST Meeting Pilot Corpus Transcripts and Metadata (LDC2004T13).

RT-04S included the following tasks in the meeting domain:

Speech-to-Text Transcription (STT) tasks

Microphone conditions:
* Multiple distant microphones
* Single distant microphone
* Individual head microphone

Processing time conditions:
* Unlimited time STT
* Less than or equal to twenty times realtime
* Less than or equal to ten times realtime
* Less than or equal to one times realtime

Diarization (SPKR) task ("who spoke when")

Microphone conditions:
* Multiple distant microphones
* Single distant microphone

Input conditions:
* Speech input only
* Speech plus reference transcript input

Processing time conditions:
* Unlimited time
* Less than or equal to twenty times realtime
* Less than or equal to ten times realtime
* Less than or equal to one times realtime

Further information about the evaluation is available on the RT-04 Spring Evaluation Website.

*Samples*

For an example of the data in this corpus, please review this audio sample.

Extent: Corpus size: 3670016 KB

Identifier: LDC2007S12

https://catalog.ldc.upenn.edu/LDC2007S12

ISBN: 1-58563-448-4

ISLRN: 581-401-882-415-9

DOI: 10.35111/xc4g-he80

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2007S12

Rights Holder: Portions © 2003 Interactive Systems Laboratories, Carnegie Mellon University, © 2000-2001 International Computer Science Institute, © 2001, 2004, 2007 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2007S12

DateStamp: 2021-09-09

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Fiscus, Jonathan G.; Garofolo, John S.; Le, Audrey; Martin, Alvin; Sanders, Greg; Przybocki, Mark; Pallett, David. 2007. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2007S12
Up-to-date as of: Wed Oct 29 7:01:00 EDT 2025
