OLAC Record oai:www.ldc.upenn.edu:LDC2005S26 |
Metadata | ||
Title: | CSLU: 22 Languages Corpus | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Lander, T. CSLU: 22 Languages Corpus LDC2005S26. Web Download. Philadelphia: Linguistic Data Consortium, 2005 | |
Contributor: | Lander, T. | |
Date (W3CDTF): | 2005 | |
Date Issued (W3CDTF): | 2005-11-29 | |
Description: | *Introduction* CSLU: 22 Languages v 1.2 was developed by the Center for Spoken Language Understanding (CSLU) and contains approximately 84 hours of fixed-vocabulary and fluent continuous telephone speech in 21 languages, along with orthographic transcriptions for a subset of the utterances. The corpus is distributed by the Linguistic Data Consortium and includes the following languages: Eastern Arabic, Cantonese, Czech, Farsi, German, Hindi, Hungarian, Japanese, Korean, Malay, Mandarin, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Tamil, Vietnamese, and English. *Data* All of the data in this corpus were collected over digital telephone lines. The files were recorded with the CSLU T1 digital data collection system as 8-bit ulaw samples at an 8 kHz sampling rate; in this release, all files are stored in standard 16-bit linear RIFF format. Each of the 50,191 utterances was verified by a native speaker to determine whether the caller followed the instructions when answering the prompts. For this release, approximately 19,758 utterances have corresponding orthographic transcriptions, covering all of the languages above except Eastern Arabic, Farsi, Korean, Russian, and Italian. *Samples* For an example of the data in this corpus, please listen to these Arabic (WAV) and English (WAV) samples. *Updates* None at this time. | |
Format: | Sampling Rate: 8000 | |
Sampling Format: ulaw | ||
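The description says the audio was captured as 8-bit ulaw at 8 kHz but is distributed as 16-bit linear RIFF (WAV). The Python sketch below illustrates that relationship under stated assumptions: it expands ulaw bytes to 16-bit linear samples using the standard ITU-T G.711 mu-law decoding rule, and reads the channel count, sample width, and sample rate from a WAV header with the standard-library `wave` module. The function names (`ulaw_to_linear`, `ulaw_bytes_to_pcm16`, `wav_params`) are hypothetical; this is not the conversion tooling used by CSLU or LDC, and it assumes Python 3.9 or later.

```python
import struct
import wave


def ulaw_to_linear(u: int) -> int:
    """Decode one 8-bit G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80                    # top bit: sign (set means negative)
    exponent = (u & 0x70) >> 4         # next 3 bits: segment number
    mantissa = u & 0x0F                # low 4 bits: step within the segment
    # Rebuild the biased magnitude, then remove the 0x84 (132) bias.
    magnitude = ((mantissa << 3) + 0x84) << exponent
    return (0x84 - magnitude) if sign else (magnitude - 0x84)


def ulaw_bytes_to_pcm16(data: bytes) -> bytes:
    """Expand 8-bit mu-law samples into little-endian 16-bit linear PCM."""
    samples = [ulaw_to_linear(b) for b in data]
    return struct.pack("<%dh" % len(samples), *samples)


def wav_params(f) -> tuple[int, int, int]:
    """Return (channels, sample width in bytes, sample rate) of a RIFF WAV file.

    `f` may be a path or a binary file-like object.
    """
    with wave.open(f, "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()
```

For a file distributed in the format described above, `wav_params` would be expected to report a sample width of 2 bytes and a rate of 8000 Hz; the channel count is not stated in this record. Decoding byte by byte keeps the G.711 bit layout visible; a lookup table of 256 precomputed values would be the faster choice for bulk conversion.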
Identifier: | LDC2005S26 | |
https://catalog.ldc.upenn.edu/LDC2005S26 | ||
ISBN: 1-58563-356-9 | ||
DOI: 10.35111/zkn2-5x88 | ||
Language: | Yue Chinese | |
Vietnamese | ||
Tamil | ||
Swedish | ||
Russian | ||
Portuguese | ||
Polish | ||
Korean | ||
Japanese | ||
Indonesian | ||
Hindi | ||
English | ||
German | ||
Arabic | ||
Swahili (macrolanguage); Swahili | ||
Spanish | ||
Mandarin Chinese | ||
Italian | ||
Hungarian | ||
Persian | ||
Czech | ||
Language (ISO639): | yue | |
vie | ||
tam | ||
swe | ||
rus | ||
por | ||
pol | ||
kor | ||
jpn | ||
ind | ||
hin | ||
eng | ||
deu | ||
ara | ||
swa | ||
spa | ||
cmn | ||
ita | ||
hun | ||
fas | ||
ces | ||
License: | CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2005S26 | |
Rights Holder: | Portions © 1998-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2005 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Text | ||
Type (OLAC): | primary_text | |
OLAC Info |
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2005S26 | |
DateStamp: | 2022-01-20 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Lander, T. 2005. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_CN country_CZ country_DE country_ES country_GB country_HU country_ID country_IN country_IT country_JP country_KR country_PL country_PT country_RU country_SE country_VN dcmi_Sound dcmi_Text iso639_ara iso639_ces iso639_cmn iso639_deu iso639_eng iso639_fas iso639_hin iso639_hun iso639_ind iso639_ita iso639_jpn iso639_kor iso639_pol iso639_por iso639_rus iso639_spa iso639_swa iso639_swe iso639_tam iso639_vie iso639_yue olac_primary_text |