OLAC Record: Czech Broadcast Conversation Speech

OLAC Record
oai:www.ldc.upenn.edu:LDC2009S02

Metadata

Title: Czech Broadcast Conversation Speech

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Kolar, Jachym, Jan Svec, and Josef Psutka. Czech Broadcast Conversation Speech LDC2009S02. Web Download. Philadelphia: Linguistic Data Consortium, 2009

Contributor: Kolar, Jachym

Svec, Jan

Psutka, Josef

Date (W3CDTF): 2009

Date Issued (W3CDTF): 2009-07-17

Description: *Introduction*

Czech Broadcast Conversation Speech was prepared by researchers at the University of West Bohemia, Pilsen, Czech Republic, and consists of 40 hours of speech recorded from Czech Radio 1 in 2003. Transcripts corresponding to the audio files in this corpus are provided in Czech Broadcast Conversation MDE Transcripts (LDC2009T20). These corpora join LDC's other Czech broadcast data sets: Czech Broadcast News Speech (LDC2004S01), Czech Broadcast News Transcripts (LDC2004T01), Voice of America (VOA) Czech Broadcast News Audio (LDC2000S89), and Voice of America (VOA) Czech Broadcast News Transcripts (LDC2000T53).

Czech Broadcast Conversation Speech consists of 72 single-channel recordings of Radioforum, a live talk program broadcast by Czech Radio 1 (CRo1) every weekday evening. In each program, invited guests (most often politicians) spontaneously answer topical questions posed by one or two interviewers. A program has between one and three interviewees; the typical line-up is one interviewer and two interviewees. The material includes passages of interactive dialogue, but longer stretches of monologue-like speech make up the majority of the collected data. Radioforum also has an interactive segment in which listeners call the studio and ask their own questions. That telephone speech is not transcribed in the current release.

*Data*

Individual recordings range from 27 to 36 minutes each. The recordings were collected from February 12, 2003 through June 26, 2003. The signal is mono, sampled at 22.05 kHz with 16-bit resolution, and stored in Windows PCM waveform format. The names of the audio files encode the broadcast date (rfYYMMDD.wav).

Details of the audio files and the transcripts:

Number of shows: 72
Number of word tokens: 292.6k
Number of unique words: 30.5k
Duration of transcribed speech: 33.0 h
Total number of speakers: 128
Male speakers: 108
Female speakers: 20

*Sponsorship*

The completion of this corpus was facilitated by funding provided by the Ministry of Education of the Czech Republic under projects No. ME909 and 2C06020.
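
The file naming convention stated in the description (rfYYMMDD.wav) can be turned back into a broadcast date with a few lines of Python. This is a minimal sketch, not part of the corpus documentation: the function name `broadcast_date` is hypothetical, and the example file name `rf030212.wav` is only an illustration built from the stated start of the collection period (February 12, 2003).

```python
from datetime import date, datetime
from pathlib import Path


def broadcast_date(filename: str) -> date:
    """Return the broadcast date encoded in an rfYYMMDD.wav file name."""
    stem = Path(filename).stem  # e.g. "rf030212"
    # Expect the literal prefix "rf" followed by exactly six digits.
    if not (stem.startswith("rf") and len(stem) == 8 and stem[2:].isdigit()):
        raise ValueError(f"unexpected file name: {filename!r}")
    # %y maps two-digit years 00-68 to 2000-2068, which covers 2003.
    return datetime.strptime(stem[2:], "%y%m%d").date()


if __name__ == "__main__":
    # Illustrative file name only; it is not taken from a corpus listing.
    print(broadcast_date("rf030212.wav"))  # 2003-02-12
```

Rejecting names that do not match the pattern, rather than guessing, keeps stray files in a download directory from being silently treated as recordings.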

Extent: Corpus size: 6186598 KB

Format: Sampling Rate: 22050

Sampling Format: 16 bit PCM
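
As a rough consistency check, the Extent and Format fields above can be combined to estimate the total audio duration. The sketch below assumes that "KB" means 1024 bytes, that the corpus size is essentially all uncompressed mono audio, and that WAV headers and any documentation files are negligible; none of these assumptions is stated in the record, and the function name `estimated_hours` is hypothetical.

```python
SAMPLE_RATE_HZ = 22_050      # Format: Sampling Rate
BYTES_PER_SAMPLE = 2         # 16-bit PCM
CHANNELS = 1                 # the description says the signal is mono
CORPUS_SIZE_KB = 6_186_598   # Extent: Corpus size


def estimated_hours(size_kb: int, bytes_per_kb: int = 1024) -> float:
    """Estimate audio duration in hours from the corpus size.

    Assumes the whole size is raw PCM payload (headers and any
    non-audio files are ignored).
    """
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return size_kb * bytes_per_kb / bytes_per_second / 3600


if __name__ == "__main__":
    print(f"{estimated_hours(CORPUS_SIZE_KB):.1f} h")  # 39.9 h
```

Under these assumptions the estimate comes to roughly 39.9 hours, in line with the 40 hours of speech given in the description.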

Identifier: LDC2009S02

https://catalog.ldc.upenn.edu/LDC2009S02

ISBN: 1-58563-519-7

ISLRN: 014-122-305-405-9

DOI: 10.35111/0xmx-k439

Language: Czech

Language (ISO639): ces

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2009S02

Rights Holder: Portions © 2003 Cesky rozhlas 1 Radiozurnal, © 2009 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2009S02

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Kolar, Jachym; Svec, Jan; Psutka, Josef. 2009. Linguistic Data Consortium.
Terms: area_Europe country_CZ dcmi_Sound iso639_ces olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2009S02
Up-to-date as of: Fri Aug 8 0:27:55 EDT 2025
