OLAC Record
oai:www.ldc.upenn.edu:LDC2009S03

Metadata
Title:CSLU: S4X Release 1.2
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Cole, Ronald Allan, et al. CSLU: S4X Release 1.2 LDC2009S03. Web Download. Philadelphia: Linguistic Data Consortium, 2009
Contributor:Cole, Ronald Allan
Noel, M
Lander, T.
Durham, T
Date (W3CDTF):2009
Date Issued (W3CDTF):2009-09-15
Description:*Introduction* CSLU: S4X Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2009S03 and ISBN 1-58563-523-5, was created by the Center for Spoken Language Understanding, Oregon Health and Science University (CSLU). The corpus consists of 36 speakers (22 male, 14 female) uttering 11 specified words. The speakers repeated the following words six times on each of four channels: startrek, supernova, tektronix, generation, nebula, processing, singularity, 71523, abracadabra, sungeeta and computer. The four channels used were office phone, home phone, carbon microphone telephone and speaker phone. Each speech file has a corresponding time-aligned phoneme-level transcription (produced by automatic forced alignment) and an automatically generated word-level transcription. Human reviewers checked each utterance in two passes and classified it as good, bad, noisy or different. The results of this verification process are included in the /docs directory.
*Data* The data was recorded with the CSLU T1 digital data collection system. Each utterance is recorded as a separate file. These files were sampled at 8 kHz, 8-bit, and stored as ulaw files. All of the data use the RIFF standard file format. This file format is 16-bit linearly encoded.
*Samples* For an example of the data in this corpus, please listen to this recording of a subject speaking the word 'computer': SD-1030-computer-t3-67.
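The description mentions both 8-bit ulaw samples and 16-bit linear encoding, and gives the recording design as speakers, words, repetitions and channels. The following is a minimal sketch, not taken from the corpus documentation: it assumes the ulaw samples follow the standard G.711 mu-law companding and shows how one 8-bit code would expand to a 16-bit linear value. It also computes the nominal utterance count implied by the design; the number of files actually distributed may differ. The function names are hypothetical.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a signed 16-bit linear sample.

    Assumption: the corpus ulaw data uses standard G.711 companding.
    """
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80               # top bit carries the sign
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def nominal_utterance_count(speakers: int = 36, words: int = 11,
                            repetitions: int = 6, channels: int = 4) -> int:
    """Utterances implied by the design figures quoted in the description."""
    return speakers * words * repetitions * channels


if __name__ == "__main__":
    print(ulaw_to_linear(0xFF))        # the mu-law code for digital silence
    print(nominal_utterance_count())
```

Decoding by hand keeps the sketch free of external dependencies; a real workflow would more likely rely on an audio library to read the files.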
Extent:Corpus size: 411648 KB
Format:Sampling Rate: 8000
Sampling Format: 8 bit ulaw
Identifier:LDC2009S03
https://catalog.ldc.upenn.edu/LDC2009S03
ISBN: 1-58563-523-5
ISLRN: 644-574-573-711-4
DOI: 10.35111/6a5x-dv17
Language:English
Language (ISO639):eng
License:CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2009S03
Rights Holder:Portions © 1996, 1998, 2000, 2002 Center for Spoken Language Understanding, Oregon Health and Science University, © 2009 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2009S03
DateStamp:  2022-01-20
GetRecord:  OAI-PMH request for simple DC format
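
The GetRecord entries above refer to OAI-PMH requests. As a rough sketch of what such a request looks like, the snippet below builds a GetRecord URL from the record's OAI identifier using the standard OAI-PMH query parameters (verb, identifier, metadataPrefix). The base URL is a placeholder assumption, since the record does not state the repository endpoint; "olac" and "oai_dc" are the usual prefixes for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real repository base URL is not given in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    oai_id = "oai:www.ldc.upenn.edu:LDC2009S03"
    print(get_record_url(oai_id, "olac"))     # OLAC format
    print(get_record_url(oai_id, "oai_dc"))   # simple DC format
```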

Search Info

Citation: Cole, Ronald Allan; Noel, M; Lander, T.; Durham, T. 2009. CSLU: S4X Release 1.2. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2009S03
Up-to-date as of: Mon Mar 25 7:20:23 EDT 2024