OLAC Record: Taiwanese Putonghua Speech and Transcripts

OLAC Record
oai:www.ldc.upenn.edu:LDC98S72

Metadata

Title: Taiwanese Putonghua Speech and Transcripts

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Duanmu, San, et al. Taiwanese Putonghua Speech and Transcripts LDC98S72. Web Download. Philadelphia: Linguistic Data Consortium, 1998

Contributor: Duanmu, San

Wakefield, Gregory H.

Hsu, Yi-ping

Qui, Shan-ping

Guevara, Rowena Cristina

Date (W3CDTF): 1998

Description: *Introduction* This set of data on Taiwanese-accented Putonghua (PTH) was gathered by San Duanmu at the University of Michigan. The data were recorded in Taiwan from December 1994 to January 1995. Taiwanese-accented PTH refers to PTH spoken by people who were born in Taiwan and whose first language is Taiwanese (Southern Min).
*Data* A total of 40 speakers, varying in age, education, birthplace and family dialect, were recorded in five two-speaker dialogues and 30 single-speaker monologues. The dialogues were about 20 minutes each and the monologues about 10 minutes each. Dialogues were recorded on two tracks, one per speaker; monologues were recorded on one track. The recordings were made in ordinary but quiet rooms.
The speakers were asked in advance to speak in a conversational style, without notes, on any topic they chose or on no topic at all. Most speakers spoke spontaneously and let the topic drift freely. Some talked about their professional work in a rather formal way, and one speaker (#20, a public health official) used notes. Overall, the corpus provides an informative sample of variation in speech style.
The recording equipment was a portable Teac DAT recorder, which recorded at a 44.1 kHz sampling rate with 16-bit linear quantization. The microphones were Audio-Technica lapel microphones with a preamp and an XLR connection to the DAT. The XLR connection helped keep the recordings low-noise, and the Audio-Technica microphones provided wide bandwidth and a flat response over the speech range of interest; they were also unidirectional, to minimize cross-talk, and very light compared with standard microphones. Both the monologues and the dialogues were recorded with this system on standard DAT tape. For publication on CD-ROM, the original DAT recordings were downsampled to a 16 kHz sample rate.
Before recording, all speakers read and signed an "Informed Consent Form," written in Chinese, which largely followed the standard format approved by the Human Subject Committee of the University of Michigan. The form stated that participation in the recording was entirely voluntary and that the speech could be used for linguistic teaching and research.
The speech data are accompanied by transcripts. The monologues have start and end time stamps; the five dialogues are time-stamped by speaker turn.
*Updates* After the publication of this corpus, some demographic data were made available to the LDC. To access these data, please go to the demographic table.
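The session counts and rates in the description can be cross-checked with simple arithmetic. This minimal sketch uses only the approximate figures stated in the description (it does not read any corpus files): 5 dialogues × 2 speakers plus 30 monologues gives the stated 40 speakers, the nominal session lengths sum to about 400 minutes (roughly 6.7 hours, counting each dialogue once rather than per track), and the 44.1 kHz to 16 kHz downsampling reduces to a rational rate change of 160/441.

```python
from fractions import Fraction

# Approximate session figures as stated in the description (not measured).
DIALOGUES, DIALOGUE_MIN = 5, 20      # two speakers each, ~20 min per dialogue
MONOLOGUES, MONOLOGUE_MIN = 30, 10   # one speaker each, ~10 min per monologue

speakers = DIALOGUES * 2 + MONOLOGUES * 1
# Session minutes, counting each two-track dialogue once.
total_minutes = DIALOGUES * DIALOGUE_MIN + MONOLOGUES * MONOLOGUE_MIN

# DAT capture rate and distribution rate, in Hz.
SOURCE_RATE, TARGET_RATE = 44_100, 16_000
# Reduced up/down factor for a rational 44.1 kHz -> 16 kHz rate change.
ratio = Fraction(TARGET_RATE, SOURCE_RATE)

print(speakers, total_minutes, ratio)
```

A polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, up=160, down=441)` would use this up/down pair; the record does not say which resampling method was actually used.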

Format: Sampling Rate: 16000 Hz

Sampling Format: 1-channel PCM
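A rough storage estimate follows from the Format fields. The record gives the sampling rate (16000 Hz) and channel count (1) but not the sample width, so this minimal sketch assumes 16-bit (2-byte) samples, matching the original DAT quantization, and ignores file headers and any compression.

```python
# Assumption: 16-bit samples; the record only states "1-channel PCM" at 16000 Hz.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1

def pcm_bytes(seconds: float) -> int:
    """Raw PCM payload size in bytes for one track, excluding any file header."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)

bytes_per_second = pcm_bytes(1)
monologue_bytes = pcm_bytes(10 * 60)        # one ~10-minute monologue
dialogue_track_bytes = pcm_bytes(20 * 60)   # one track of a ~20-minute dialogue

print(bytes_per_second, monologue_bytes, dialogue_track_bytes)
```

Under these assumptions a 10-minute monologue is about 19.2 MB of raw samples, and each track of a 20-minute dialogue about 38.4 MB.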

Identifier: LDC98S72

https://catalog.ldc.upenn.edu/LDC98S72

ISBN: 1-58563-139-6

ISLRN: 388-547-288-616-7

DOI: 10.35111/st4q-yw96

Language: Mandarin Chinese

Language (ISO639): cmn

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC98S72

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC98S72

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
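The GetRecord links above are OAI-PMH requests, which pass `verb`, `identifier` and `metadataPrefix` query parameters to a repository's base URL. This sketch builds both requests for this record's OAI identifier. The base URL is a hypothetical placeholder, since the record does not give the endpoint; `oai_dc` is the Dublin Core prefix every OAI-PMH repository must support, while `olac` is assumed to be the prefix for the OLAC format (a `ListMetadataFormats` request to the repository would confirm it).

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real OAI-PMH base URL of the provider.
BASE_URL = "https://example.org/oai"

def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (parameters are percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"

OAI_ID = "oai:www.ldc.upenn.edu:LDC98S72"
olac_url = get_record_url(BASE_URL, OAI_ID, "olac")   # assumed OLAC prefix
dc_url = get_record_url(BASE_URL, OAI_ID, "oai_dc")   # simple Dublin Core

print(olac_url)
print(dc_url)
```

`urlencode` escapes the colons in the identifier as `%3A`, which OAI-PMH servers decode back to `oai:www.ldc.upenn.edu:LDC98S72`.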

Search Info
Citation: Duanmu, San; Wakefield, Gregory H.; Hsu, Yi-ping; Qui, Shan-ping; Guevara, Rowena Cristina. 1998. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Sound iso639_cmn olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC98S72
Up-to-date as of: Wed Oct 29 7:00:46 EDT 2025
