OLAC Record: Resource Management Complete Set 2.0

OLAC Record
oai:www.ldc.upenn.edu:LDC93S3A

Metadata

Title: Resource Management Complete Set 2.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Price, P, et al. Resource Management Complete Set 2.0 LDC93S3A. Web Download. Philadelphia: Linguistic Data Consortium, 1993

Contributor: Price, P

Fisher, W M.

Bernstein, Jared

Pallett, D S.

Date (W3CDTF): 1993

Description: *Introduction*

Resource Management Complete Set 2.0 was developed by NIST and consists of approximately 30 hours of English speech along with transcriptions. All RM material consists of read sentences modeled after a naval resource management task. There are two main parts, often referred to as RM1 and RM2. RM1 contains three sections: Speaker-Dependent (SD) training data, Speaker-Independent (SI) training data, and test and evaluation data. RM2 has an additional and larger SD data set, including test material. Resource Management Complete Set 2.0 contains both RM1 and RM2. They have also been published separately as LDC93S3B and LDC93S3C.

*Data*

The complete corpus contains over 25,000 utterances from more than 160 speakers representing a variety of American dialects. The material was recorded at 16 kHz, with 16-bit resolution, using a Sennheiser HMD-414 headset microphone.

*Resource Management SD and SI Training and Test Data (RM1)*

The Speaker-Dependent (SD) Training Data contains 12 subjects (seven male and five female), each reading a set of 600 "training sentences," two "dialect" sentences and ten "rapid adaptation" sentences, for a total of 7,344 recorded sentence utterances. The 600 sentences designated as training cover 97 of the lexical items in the corpus.

The Speaker-Independent (SI) Training Data contains 80 speakers (55 male and 25 female), each reading two "dialect" sentences plus 40 sentences from the Resource Management text corpus, for a total of 3,360 recorded sentence utterances. Each sentence in a set of 1,600 Resource Management sentence texts was recorded by two subjects, and no sentence was read twice by the same subject.

RM1 contains all SD and SI system test material used in five DARPA benchmark tests conducted in March and October 1987, June 1988, and February and October 1989, along with scoring and diagnostic software and documentation for those tests. Documentation is also provided outlining use of the Resource Management training and test material at CMU in development of the SPHINX system. Example output and scored results for state-of-the-art speaker-dependent and speaker-independent systems (i.e. the BBN BYBLOS and CMU SPHINX systems) for the October 1989 benchmark tests are included.

*Extended Resource Management Speaker-Dependent Corpus (RM2)*

This set forms a speaker-dependent extension to the Resource Management (RM1) corpus. The corpus consists of a total of 10,508 sentence utterances (two male and two female speakers each speaking 2,652 sentence texts). These include the 600 "standard" Resource Management speaker-dependent training sentences, two dialect calibration sentences, ten rapid adaptation sentences, 1,800 newly generated extended training sentences, 120 newly generated development-test sentences and 120 newly generated evaluation-test sentences. The evaluation-test material on this disc was used as the test set for the June 1990 DARPA SLS Resource Management Benchmark Tests (see the Proceedings). The RM2 corpus was recorded at Texas Instruments. The NIST speech recognition scoring software originally distributed on the RM1 "Test" Disc was adapted for RM2 sentences and is included in this publication.

*Samples*

* RM1 SD
* RM1 SI
* RM2

*Updates*

None at this time.
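The utterance totals quoted in the Description can be cross-checked from the per-speaker figures. The following is a minimal sketch, not part of the corpus tooling; the variable names are illustrative and every number is copied from the text above. The RM1 totals reproduce exactly. For RM2, four speakers times 2,652 sentence texts gives 10,608, which is 100 more than the stated 10,508; the record does not explain the difference, so the sketch only reports it.

```python
# Per-speaker sentence counts, copied from the Description field.
SD_SPEAKERS = 12
SD_PER_SPEAKER = 600 + 2 + 10            # training + dialect + rapid adaptation

SI_SPEAKERS = 80
SI_DIALECT = 2
SI_CORPUS_SENTENCES = 40                 # sentences from the RM text corpus
SI_PER_SPEAKER = SI_DIALECT + SI_CORPUS_SENTENCES

RM2_SPEAKERS = 4                         # two male, two female
RM2_PER_SPEAKER = 600 + 2 + 10 + 1800 + 120 + 120

rm1_sd_total = SD_SPEAKERS * SD_PER_SPEAKER
rm1_si_total = SI_SPEAKERS * SI_PER_SPEAKER
rm2_product = RM2_SPEAKERS * RM2_PER_SPEAKER

# Each of the 1,600 SI sentence texts is read by two subjects.
si_readings_per_text = (SI_SPEAKERS * SI_CORPUS_SENTENCES) / 1600

STATED_RM2_TOTAL = 10508                 # figure given in the record
rm2_difference = rm2_product - STATED_RM2_TOTAL

print("RM1 SD total:", rm1_sd_total)
print("RM1 SI total:", rm1_si_total)
print("RM2 per speaker:", RM2_PER_SPEAKER)
print("RM2 speakers x texts:", rm2_product, "(stated:", STATED_RM2_TOTAL, ")")
print("RM2 difference:", rm2_difference)
```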

Extent: Corpus size: 2411724 KB

Format: Sampling Rate: 16000

Sampling Format: 1-channel pcm

Identifier: LDC93S3A

https://catalog.ldc.upenn.edu/LDC93S3A

ISBN: 1-58563-220-1

ISLRN: 257-512-523-174-2

DOI: 10.35111/ga0p-g928

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC93S3A

Rights Holder: Portions © 1993 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC93S3A

DateStamp: 2024-05-03

GetRecord: OAI-PMH request for simple DC format
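The GetRecord entries refer to OAI-PMH requests for this record. As a rough sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix. The `verb`, `identifier` and `metadataPrefix` query parameters and the `oai_dc` prefix come from the OAI-PMH protocol; the `olac` prefix is assumed here for the OLAC format. The base URL is a placeholder and is not taken from this record, and `build_get_record_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the archive's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC93S3A"  # from the OaiIdentifier field


def build_get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Return an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Simple Dublin Core and (assumed prefix) OLAC variants of the same request.
dc_url = build_get_record_url(OAI_IDENTIFIER, "oai_dc")
olac_url = build_get_record_url(OAI_IDENTIFIER, "olac")

print(dc_url)
print(olac_url)
```

The colons in the identifier are percent-encoded by `urlencode`, so the identifier can be passed exactly as it appears in the record.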

Search Info
Citation: Price, P; Fisher, W M.; Bernstein, Jared; Pallett, D S. 1993. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC93S3A
Up-to-date as of: Tue May 20 0:13:40 EDT 2025
