OLAC Record
oai:www.ldc.upenn.edu:LDC2005S26

Metadata
Title:CSLU: 22 Languages Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Lander, T. CSLU: 22 Languages Corpus LDC2005S26. Web Download. Philadelphia: Linguistic Data Consortium, 2005
Contributor:Lander, T.
Date (W3CDTF):2005
Date Issued (W3CDTF):2005-11-29
Description:*Introduction* CSLU: 22 Languages v 1.2 was developed by the Center for Spoken Language Understanding (CSLU) and contains approximately 84 hours of fixed vocabulary and fluent continuous telephone speech in 21 languages and orthographic transcriptions for a subset of the utterances. The corpus is distributed by the Linguistic Data Consortium and includes the following languages: Eastern Arabic, Cantonese, Czech, Farsi, German, Hindi, Hungarian, Japanese, Korean, Malay, Mandarin, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Tamil, Vietnamese, and English. *Data* All of the data in this corpus were collected over digital telephone lines. The files were recorded with the CSLU T1 digital data collection system, with an 8-bit signal and an 8 kHz sample rate, stored as ulaw files. All files are stored in standard 16-bit linear RIFF format. Each of the 50,191 utterances is verified by a native speaker to determine if the caller followed instructions when answering the prompts. For this release, approximately 19,758 utterances have corresponding orthographic transcriptions in all the above languages except Eastern Arabic, Farsi, Korean, Russian, and Italian. *Samples* For an example of the data in this corpus, please listen to these Arabic (WAV) and English (WAV) samples. *Updates* None at this time.
Format:Sampling Rate: 8000
Sampling Format: ulaw
Identifier:LDC2005S26
https://catalog.ldc.upenn.edu/LDC2005S26
ISBN: 1-58563-356-9
DOI: 10.35111/zkn2-5x88
Language:Yue Chinese
Vietnamese
Tamil
Swedish
Russian
Portuguese
Polish
Korean
Japanese
Indonesian
Hindi
English
German
Arabic
Swahili (macrolanguage); Swahili
Spanish
Mandarin Chinese
Italian
Hungarian
Persian
Czech
Language (ISO639):yue
vie
tam
swe
rus
por
pol
kor
jpn
ind
hin
eng
deu
ara
swa
spa
cmn
ita
hun
fas
ces
License:CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2005S26
Rights Holder:Portions © 1998-2002 Center for Spoken Language Understanding Oregon Health & Science University, © 2005 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2005S26
DateStamp:  2022-01-20
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Lander, T. 2005. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_CN country_CZ country_DE country_ES country_GB country_HU country_ID country_IN country_IT country_JP country_KR country_PL country_PT country_RU country_SE country_VN dcmi_Sound dcmi_Text iso639_ara iso639_ces iso639_cmn iso639_deu iso639_eng iso639_fas iso639_hin iso639_hun iso639_ind iso639_ita iso639_jpn iso639_kor iso639_pol iso639_por iso639_rus iso639_spa iso639_swa iso639_swe iso639_tam iso639_vie iso639_yue olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005S26
Up-to-date as of: Thu Jul 21 8:41:17 EDT 2022