OLAC Record: Road Rally

OLAC Record
oai:www.ldc.upenn.edu:LDC93S11

Metadata

Title: Road Rally

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: NIST Multimodal Information Group. Road Rally LDC93S11. Web Download. Philadelphia: Linguistic Data Consortium, 1993

Contributor: NIST Multimodal Information Group

Date (W3CDTF): 1993

Description: *Introduction*

Road Rally was developed by NIST and consists of over eight hours of recorded conversational speech in English. The corpus was designed for the development and testing of word-spotting systems. It was collected in a conversational domain, using a road rally planning task as the topic. It consists of two sub-corpora, "Stonehenge" and "Waterloo." The Stonehenge corpus contains road rally planning conversations as well as some read speech, collected using high-quality microphones and a telephone-simulating filter. The Waterloo corpus contains read speech from the road rally planning domain, collected over actual telephone lines.

*Data*

*Stonehenge*

The Stonehenge corpus was collected from subjects using telephone handsets modified to contain a high-quality microphone. To gather conversational data, two talkers were placed in separate rooms, given a road map, and asked to take part in a road rally planning task. Their objective was to form a path between two locations on the map that maximized their road rally point score. They were also given a time limit for completing the task, to increase their responsiveness. Their speech was recorded on a stereo tape recorder, with each subject's speech on a separate track. The tracks were digitized, and the speech was edited to remove silences longer than about one second. This left approximately three minutes of continuous speech per subject. The speech was then filtered with a 300 Hz to 3300 Hz PCM FIR bandpass filter to simulate telephone bandwidth.

The Stonehenge corpus consists of 80 speakers: 28 females and 52 males. Audio files are presented as single-channel, 16-bit, 10 kHz NIST SPHERE files. Keyword marking files are included for almost all of the speech files. They identify each occurrence of a keyword in a speech file and give its location by sample number.
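The description gives the filter's band edges but not its design (tap count, window, or coefficients). The following is a minimal sketch that assumes a Hamming-windowed-sinc design with a hypothetical 201 taps. It builds a 300 Hz to 3300 Hz bandpass at the corpus's 10 kHz sampling rate as the difference of two ideal lowpass filters. This is an illustration of the technique, not NIST's actual filter. The helper names `bandpass_taps`, `gain_at`, and `apply_fir` are hypothetical.

```python
import cmath
import math

FS = 10_000      # corpus sampling rate, in Hz
F_LOW = 300.0    # lower band edge from the description, in Hz
F_HIGH = 3300.0  # upper band edge from the description, in Hz


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def bandpass_taps(num_taps: int = 201, fs: float = FS,
                  f_low: float = F_LOW, f_high: float = F_HIGH) -> list[float]:
    """Hamming-windowed sinc bandpass (ASSUMED design, hypothetical tap count).

    The ideal bandpass response is an ideal lowpass at f_high minus an
    ideal lowpass at f_low; the Hamming window truncates it to num_taps.
    """
    if num_taps % 2 == 0:
        raise ValueError("use an odd tap count so the filter has a centre tap")
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = ((2 * f_high / fs) * sinc(2 * f_high * k / fs)
                 - (2 * f_low / fs) * sinc(2 * f_low * k / fs))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain_at(taps: list[float], freq: float, fs: float = FS) -> float:
    """Magnitude of the filter's frequency response at `freq` Hz."""
    w = 2 * math.pi * freq / fs
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


def apply_fir(taps: list[float], samples: list[float]) -> list[float]:
    """Direct-form FIR filtering; the output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j >= 0:
                acc += h * samples[i - j]
        out.append(acc)
    return out


taps = bandpass_taps()
for f in (0, 300, 1000, 3300, 4500):
    print(f, round(gain_at(taps, f), 3))
```

With this design the gain is close to 1 in the passband, about 0.5 (-6 dB) at each band edge, and near 0 well outside the band. A longer filter narrows the transition bands at the cost of more computation per sample.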
*Waterloo*

The Waterloo corpus was collected as an extension to Stonehenge, to provide speech from the same domain under different conditions. It was collected from subjects using conventional telephones over dialed-up telephone lines in the Massachusetts area. Unlike the Stonehenge speech, the Waterloo speech is naturally band-limited by the telephones and lines. For consistency, however, it was also filtered with the same 300 Hz to 3300 Hz PCM FIR bandpass filter used for Stonehenge. The corpus consists of 56 speakers (28 males and 28 females), each reading aloud a paragraph of road rally domain text. Audio files are presented as single-channel, 16-bit, 10 kHz NIST SPHERE files. Keyword marking files are included for all of the speech files. They identify each occurrence of a keyword in a speech file and give its location by sample number.

*Samples*

Please view the following samples:

* Stonehenge audio
* Stonehenge keywords
* Waterloo audio
* Waterloo keywords

*Updates*

None at this time.
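Keyword locations are given as sample numbers, so converting them to times means dividing by the 10 kHz sampling rate. For example, sample 12000 falls at 1.2 s. This record does not describe the layout of the key files. The sketch below therefore assumes a hypothetical whitespace-separated `<keyword> <start_sample> <end_sample>` line format, with placeholder keywords. Check the corpus documentation for the real format before adapting it.

```python
from dataclasses import dataclass

SAMPLE_RATE = 10_000  # Hz, per the corpus description


@dataclass
class KeywordHit:
    keyword: str
    start_sample: int
    end_sample: int

    @property
    def start_s(self) -> float:
        """Start time in seconds."""
        return self.start_sample / SAMPLE_RATE

    @property
    def end_s(self) -> float:
        """End time in seconds."""
        return self.end_sample / SAMPLE_RATE

    @property
    def duration_s(self) -> float:
        """Length of the keyword occurrence in seconds."""
        return (self.end_sample - self.start_sample) / SAMPLE_RATE


def parse_key_lines(lines: list[str]) -> list[KeywordHit]:
    """Parse a HYPOTHETICAL '<keyword> <start_sample> <end_sample>' layout.

    Blank lines and lines starting with '#' are skipped.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, start, end = line.split()
        hits.append(KeywordHit(keyword, int(start), int(end)))
    return hits


for hit in parse_key_lines(["keyword_a 12000 17000", "keyword_b 30500 34500"]):
    print(hit.keyword, hit.start_s, hit.end_s, hit.duration_s)
```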

Extent: Corpus size: 454997 KB

Format: Sampling Rate: 10000 Hz

Format: Sampling Format: 1-channel PCM
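The same properties are also recorded in each audio file's NIST SPHERE header. This is an ASCII block that:

* starts with `NIST_1A`,
* gives the header length in bytes on its second line,
* lists `name -type value` fields,
* and ends with `end_head`.

The following is a minimal sketch of a header reader. It handles only the integer (`-i`), real (`-r`), and string (`-sN`) field types and does not decode the audio samples. The function names are hypothetical. For this corpus, the description suggests `sample_rate` 10000, `channel_count` 1, and `sample_n_bytes` 2.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict of fields."""
    if not data.startswith(b"NIST_1A"):
        raise ValueError("not a NIST SPHERE file")
    # The second header line holds the total header size in bytes (e.g. 1024).
    header_size = int(data.split(b"\n", 2)[1].strip())
    text = data[:header_size].decode("ascii", errors="replace")
    fields: dict = {"header_size": header_size}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip padding or malformed lines
        name, typ, value = parts
        if typ == "-i":
            fields[name] = int(value)
        elif typ == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields


def duration_seconds(fields: dict) -> float:
    """Audio duration implied by the header's sample count and rate."""
    return fields["sample_count"] / fields["sample_rate"]
```

Usage: read the first few kilobytes of a file in binary mode, pass them to `parse_sphere_header`, and compare the returned `sample_rate` and `channel_count` against the values listed above.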

Identifier: LDC93S11

https://catalog.ldc.upenn.edu/LDC93S11

ISBN: 1-58563-014-4

ISLRN: 520-913-092-152-0

DOI: 10.35111/cwdb-n353

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC93S11

Rights Holder: Portions © 1993 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC93S11

DateStamp: 2024-06-13

GetRecord: OAI-PMH request for simple DC format
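The GetRecord links above are OAI-PMH requests. A GetRecord request takes three arguments: `verb`, `identifier`, and `metadataPrefix`. The sketch below builds such requests for this record's OAI identifier. Several names in it are hypothetical:

* `BASE_URL` is a placeholder, not the real endpoint.
* The prefix `olac` for the OLAC format is an assumption; `oai_dc` is the standard simple Dublin Core prefix.
* The helper names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL placeholder: substitute the provider's real OAI-PMH base URL.
BASE_URL = "https://oai.example.org/provider"
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC93S11"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Perform the HTTP request and return the raw XML response."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")  # OLAC format (prefix assumed)
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")  # simple Dublin Core
print(olac_url)
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier, which keeps the query string valid. `fetch` is only defined here. Calling it performs a network request against whatever base URL you supply.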

Search Info
Citation: NIST Multimodal Information Group. 1993. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC93S11
Up-to-date as of: Wed Oct 29 7:00:28 EDT 2025
