OLAC Record
oai:lindat.mff.cuni.cz:11858/00-097C-0000-0005-CF9C-4

Metadata
Title:Czech Parliament Meetings
Bibliographic Citation:http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4
Creator:Pražák, Aleš
Šmídl, Luboš
Date (W3CDTF):2012-03-28T14:45:25Z
Date Available:2012-03-28T14:45:25Z
Description:The corpus consists of recordings from the Chamber of Deputies of the Parliament of the Czech Republic. It currently contains 88 hours of speech data, corresponding to roughly 0.5 million tokens. The annotation process is semi-automatic: speech recognition can be run on the data with high accuracy (over 90%), and the resulting automatic transcripts are then aligned with the speech. The annotator’s task is to check the transcripts, correct errors, add proper punctuation, and label speech sections with information about the speaker. The resulting corpus is therefore suitable both for training acoustic models for ASR and for training speaker identification and/or verification systems. The archive contains 18 sound files (WAV PCM, 16-bit, 44.1 kHz, mono) and corresponding transcriptions in the XML-based Transcriber format (http://trans.sourceforge.net). The airing date of each recording is encoded in the filename in the form SOUND_YYMMDD_*. Note that the recordings are usually aired in the early morning of the day following the actual Parliament session. If a recording is too long to fit the broadcasting schedule, it is divided into several parts that are aired on consecutive days.
Identifier (URI):ZCU_CZ_Parliament
http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4
Language:Czech
Language (ISO639):ces
Publisher:University of West Bohemia, Department of Cybernetics
Rights:Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
http://creativecommons.org/licenses/by-nc-nd/3.0/
Subject:speech corpus
acoustic model
speaker identification
speaker verification
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
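
The Description above states that the airing date is encoded in each filename as SOUND_YYMMDD_*. The following is a minimal Python sketch of decoding that date, not part of the corpus distribution. The function names, the example filename, and the choice to read two-digit years as 20YY are assumptions made for illustration; the record itself only gives the SOUND_YYMMDD_* pattern.

```python
import re
from datetime import date, timedelta

# Assumed layout: "SOUND_" + six digits (YYMMDD) + "_" + an arbitrary suffix.
_NAME_PATTERN = re.compile(r"^SOUND_(\d{2})(\d{2})(\d{2})_")


def airing_date(filename):
    """Return the airing date encoded in a SOUND_YYMMDD_* filename."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a SOUND_YYMMDD_* filename: {filename!r}")
    yy, mm, dd = (int(group) for group in match.groups())
    # Assumption: two-digit years refer to the 2000s.
    return date(2000 + yy, mm, dd)


def likely_session_date(filename):
    """Heuristic only: recordings are usually aired the morning after
    the session, so subtract one day from the airing date."""
    return airing_date(filename) - timedelta(days=1)


if __name__ == "__main__":
    # Hypothetical filename, used only to demonstrate the pattern.
    name = "SOUND_110315_example.wav"
    print(airing_date(name), likely_session_date(name))
```

The one-day offset is only a rule of thumb: for recordings split into several parts and aired on consecutive days, the later parts air more than one day after the session, so the session date cannot be recovered from the filename alone.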

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11858/00-097C-0000-0005-CF9C-4
DateStamp:  2018-10-29
GetRecord:  OAI-PMH request for simple DC format
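
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix ("olac" for the OLAC format, "oai_dc" for simple Dublin Core). The base URL below is a hypothetical placeholder, since this record does not state the repository's OAI-PMH endpoint, and the function name is likewise illustrative.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    oai_id = "oai:lindat.mff.cuni.cz:11858/00-097C-0000-0005-CF9C-4"
    print(get_record_url(oai_id, "olac"))    # OLAC format
    print(get_record_url(oai_id, "oai_dc"))  # simple DC format
```

The identifier is percent-encoded by urlencode, which keeps the colons and slash in the OAI identifier from being misread as URL syntax.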

Search Info

Citation: Pražák, Aleš; Šmídl, Luboš. 2012. Czech Parliament Meetings. University of West Bohemia, Department of Cybernetics.
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11858/00-097C-0000-0005-CF9C-4
Up-to-date as of: Sun Jul 28 14:38:25 EDT 2019