OLAC Record: HUB5 Mandarin Telephone Speech Corpus

OLAC Record
oai:www.ldc.upenn.edu:LDC98S69

Metadata

Title: HUB5 Mandarin Telephone Speech Corpus

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Linguistic Data Consortium. HUB5 Mandarin Telephone Speech Corpus LDC98S69. Web Download. Philadelphia: Linguistic Data Consortium, 1998

Contributor: Linguistic Data Consortium

Date (W3CDTF): 1998

Description: LDC98S69 - Speech data; LDC98T26 - Transcripts

*Introduction*

This release of HUB5 Mandarin training data consists of 42 calls derived from the CALLFRIEND Mandarin Chinese Mainland Dialect (Language ID) collection. The transcribed data is intended as additional training data in support of the project on Large Vocabulary Conversational Speech Recognition (LVCSR), also sponsored by the U.S. Department of Defense. Each transcript covers a contiguous segment of 5 to 30 minutes taken from a recorded conversation lasting up to 30 minutes.

LDC has since released HUB5 Mandarin Telephone Speech and Transcripts Second Edition (LDC2018S18), which combines the speech and transcripts and makes some updates to the release. See that catalog entry for more details.

*Data*

Speakers were recruited by the LDC for this telephone speech collection through the internet, publications (advertisements) and personal contacts. A total of 200 call originators were recruited, each of whom placed a telephone call through a toll-free robot operator maintained by the LDC. Access to the robot operator required a unique Personal Identification Number (PIN), issued by the LDC recruiting staff when the caller enrolled in the project.

Both the callers and the call recipients were told that the call would be recorded, and a call proceeded only if both parties agreed to be recorded. Each caller could talk for up to 30 minutes and could place only one call. Upon successful completion of the call, the caller was paid $20 (in addition to making a free long-distance telephone call). Callers were given no guidelines on what to talk about and were free to choose whom to call; most called family members or close friends. All calls originated in North America and were placed to locations within North America.

*Updates*

There are no updates at this time.

Extent: Corpus size: 1056360 KB
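The corpus size can be turned into a rough audio duration using the format fields listed in this record (8000 Hz, 2-channel ulaw, one byte per sample per channel). A minimal sketch of that arithmetic, assuming that "KB" means 1024 bytes (1000 is shown for comparison) and that the whole size is raw audio, i.e. ignoring file headers, transcripts and documentation:

```python
# Rough audio-duration estimate for LDC98S69, derived from the catalog fields.
# Assumptions (not stated in the record): "KB" means 1024 bytes by default, and
# the entire corpus size is raw mu-law audio (headers and docs are ignored).

CORPUS_KB = 1_056_360   # "Extent: Corpus size: 1056360 KB"
SAMPLE_RATE_HZ = 8_000  # "Sampling Rate: 8000"
CHANNELS = 2            # "2-channel ulaw"
BYTES_PER_SAMPLE = 1    # mu-law stores one byte per sample per channel


def estimated_hours(size_kb: int, bytes_per_kb: int = 1024) -> float:
    """Return the approximate audio duration, in hours, for a corpus size in KB."""
    total_bytes = size_kb * bytes_per_kb
    bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE  # 16,000 B/s
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    print(f"{estimated_hours(CORPUS_KB):.1f} h")        # 18.8 h (1 KB = 1024 bytes)
    print(f"{estimated_hours(CORPUS_KB, 1000):.1f} h")  # 18.3 h (1 KB = 1000 bytes)
```

Under either reading the estimate stays below the 21-hour ceiling implied by 42 calls of at most 30 minutes each, which works out to an average of roughly 26 to 27 minutes of audio per call.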

Format: Sampling Rate: 8000 Hz

Sampling Format: 2-channel ulaw
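"ulaw" refers to ITU-T G.711 μ-law companding, which stores each sample in one byte. The sketch below decodes μ-law bytes to 16-bit linear PCM with the standard bit-manipulation decoder and splits them into two channels. It assumes the two channels are interleaved byte by byte (A, B, A, B, ...) and that any file header or compression has already been removed; the record does not describe the file layout, so check the corpus documentation before relying on this. The function names are hypothetical.

```python
def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a 16-bit linear PCM sample."""
    u = ~code & 0xFF            # mu-law bytes are stored bit-inverted
    sign = u & 0x80             # top bit: sign
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove the 132 bias
    return -magnitude if sign else magnitude


def decode_stereo_ulaw(data: bytes) -> tuple[list[int], list[int]]:
    """Split interleaved 2-channel mu-law bytes into two lists of PCM samples.

    Assumes byte-interleaved channels (A, B, A, B, ...) with no header.
    """
    if len(data) % 2:
        raise ValueError("interleaved 2-channel data must have an even length")
    pcm = [ulaw_to_linear(b) for b in data]
    return pcm[0::2], pcm[1::2]


if __name__ == "__main__":
    left, right = decode_stereo_ulaw(bytes([0x00, 0x80, 0xFF, 0x7F]))
    print(left, right)  # [-32124, 0] [32124, 0]
```

Byte 0xFF (and 0x7F) decode to silence, while 0x00 and 0x80 are the largest negative and positive values, which is a quick sanity check for any μ-law decoder.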

Identifier: LDC98S69

https://catalog.ldc.upenn.edu/LDC98S69

ISBN: 1-58563-131-0

ISLRN: 333-068-970-015-5

DOI: 10.35111/69dn-5z94

Language: Mandarin Chinese

Language (ISO639): cmn

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC98S69

Rights Holder: Portions © 1998 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC98S69

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
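The GetRecord entries in this section are OAI-PMH requests for this record. A minimal sketch of how such a request URL is formed from the record's OaiIdentifier, assuming the OAI-PMH 2.0 GetRecord parameters `verb`, `identifier` and `metadataPrefix`. `oai_dc` is the prefix OAI-PMH defines for simple Dublin Core; the `olac` prefix for the OLAC format and the `BASE_URL` endpoint are assumptions for illustration, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
OAI_ID = "oai:www.ldc.upenn.edu:LDC98S69"  # "OaiIdentifier" field


def split_oai_identifier(oai_id: str) -> tuple[str, str]:
    """Split 'oai:<namespace>:<local-id>' into (namespace, local id)."""
    scheme, namespace, local_id = oai_id.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an oai identifier: {oai_id!r}")
    return namespace, local_id


def get_record_url(base_url: str, oai_id: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL; urlencode percent-encodes ':'."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": oai_id,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(split_oai_identifier(OAI_ID))  # ('www.ldc.upenn.edu', 'LDC98S69')
    print(get_record_url(BASE_URL, OAI_ID, "oai_dc"))  # simple Dublin Core
    print(get_record_url(BASE_URL, OAI_ID, "olac"))    # OLAC format (prefix assumed)
```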

Search Info
Citation: Linguistic Data Consortium. 1998. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Sound iso639_cmn olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC98S69
Up-to-date as of: Fri Aug 8 0:27:16 EDT 2025
