OLAC Record
oai:www.ldc.upenn.edu:LDC2008S06

Metadata
Title:CSLU: Alphadigit Version 1.3
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Cole, Ronald Allan, et al. CSLU: Alphadigit Version 1.3 LDC2008S06. Web Download. Philadelphia: Linguistic Data Consortium, 2008
Contributor:Cole, Ronald Allan
Noel, M
Lander, T.
Durham, T
Date (W3CDTF):2008
Date Issued (W3CDTF):2008-07-16
Description:*Introduction* This file contains documentation for CSLU: Alphadigit Version 1.3 , Linguistic Data Consortium (LDC) catalog number LDC2008S06 and isbn 1-58563-478-6. Alphadigit Version 1.3 is a collection of 78,044 utterances from 3,025 speakers saying six-digit strings of letters and digits over the telephone for a total of approximately 82 hours of speech. Each speech file has corresponding orthographic and phonemic transcriptions. This corpus was created by the Center for Spoken Language Understanding (CSLU), Oregon Health & Science University, Beaverton, Oregon. *Data* Speakers were recruited using USEnet postings. Respondents registered for the collection by completing an online form. Once registered, they received a list of 18-29 six-digit strings (e.g., "a 2 b 4 5 g") and participation instructions. Speakers called the CSLU data collection system by dialing a toll-free number and were prompted for each string; 1102 different strings were used throughout the course of the data collection. The lists were set up to balance for phonetic context between all letter and digit pairs. The data were recorded directly from a digital phone line without digital-to-analog or analog-to-digital conversion at the recording end using the CSLU T1 digital data collection system. The sampling rate was 8khz and the files were stored in 8-bit mu-law format on a UNIX file system. The files have been converted to RIFF standard file format, 16-bit linearly encoded. *Transcription* All of the files included in this corpus have corresponding non-time-aligned word-level transcriptions and time aligned phoneme-level transcriptions (automatic forced alignment) that comply with the conventions in the CSLU Labeling Guide. Non time-aligned orthographic transcriptions provide quick access to the content of an utterance; they may contain markers for word boundaries to support access and retrieval at the lexical level. Phonetic/phonemic transcriptions represent the phonetic content of an utterance at a given level of detail that is made explicit by the use of diacritics. Phonetic phenomena transcribed include excessive nasalization, glottalization, frication on a stop, centralization, lateralization, rounding and palatalization. *Samples* For an example of the speech contained in this corpus, please listen to this audio sample (MS wave).
Extent:Corpus size: 5242880 KB
Format:Sampling Rate: 8000
Sampling Format: ulaw
Identifier:LDC2008S06
https://catalog.ldc.upenn.edu/LDC2008S06
ISBN: 1-58563-478-6
ISLRN: 569-415-930-320-1
DOI: 10.35111/eayh-nv69
Language:English
Language (ISO639):eng
License:CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2008S06
Rights Holder:Portions © 2000-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2008 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2008S06
DateStamp:  2022-01-20
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Cole, Ronald Allan; Noel, M; Lander, T.; Durham, T. 2008. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2008S06
Up-to-date as of: Mon Mar 25 7:20:18 EDT 2024