OLAC Record oai:www.ldc.upenn.edu:LDC2002S06 |
Metadata | ||
Title: | Switchboard-2 Phase III Audio | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Graff, David, David Miller, and Kevin Walker. Switchboard-2 Phase III Audio LDC2002S06. Web Download. Philadelphia: Linguistic Data Consortium, 2002 | |
Contributor: | Graff, David | |
Miller, David | ||
Walker, Kevin | ||
Date (W3CDTF): | 2002 | |
Date Issued (W3CDTF): | 2002-03-20 | |
Description: | *Introduction* The Switchboard-2 Phase III Audio corpus was produced by the Linguistic Data Consortium (catalog number LDC2002S06, ISBN 1-58563-222-8). This release contains speech data files only, along with documentation describing speaker information (sex, age, education, city and state where raised), call information (date, time, call duration, Personal Identification Numbers, topic), and audit information (channel quality, background noise). The data files are not compressed. The Switchboard-2 Phase III collection focused primarily on the American South. The collection commenced on October 21, 1997 and was completed on January 1, 1998. The project's goal was to recruit native speakers of English in the American South, balanced by gender, to participate in ten or more five- to six-minute conversations on a variety of landline telephone handsets.
*Data* The speech data was collected for research, development, and evaluation of automatic systems for speech-to-text conversion, talker identification, language identification, and speech signal detection. During the collection period, the LDC collected a total of 2,728 calls, or 5,456 sides, from 640 participants (292 male, 348 female), under varied environmental conditions. Each speech file consists of a 1,024-byte ASCII-formatted Sphere header, followed by two-channel interleaved mu-law sample data. The mu-law samples represent the actual digital data transmission from the telephone service provider (MCI), as captured separately for each side of the telephone conversation by the LDC's telephone collection platform. The header also indicates the caller_pin, callee_pin, and topic_id.
The speech files are named according to the following pattern: sw_NNNNN.sph, where the five-digit string "NNNNN" represents the conversation-id; this string is used to identify all speech files and to identify the calls in the associated database tables that provide information about the calls and participants (i.e. callstat.tbl, master.tbl).
Other documentation files available on the publication are:
0readme.1st: Field information for all database tables
swb_callaudit.tbl: Audit results for each channel
swb_callaudit.txt: Document describing audit table
swb_callstats.tbl: Information about recorded calls
swb_callstats.txt: Document describing callstats table
swb_callsubjects.tbl: Demographic information
swb_callsubjects.txt: Document describing callsubjects table
topics.txt: List of proposed call topics
There are a total of 2,657 data files (approximately 222 hours of audio).
*Updates* No updates are available at this time. | |
Extent: | Corpus size: 6710886 KB | |
Format: | Sampling Rate: 8000 Hz | |
Sampling Format: 2-channel ulaw | ||
Identifier: | LDC2002S06 | |
https://catalog.ldc.upenn.edu/LDC2002S06 | ||
ISBN: 1-58563-222-8 | ||
ISLRN: 603-855-311-336-8 | ||
DOI: 10.35111/ydsv-hw57 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2002S06 | |
Rights Holder: | Portions © 1997-2002 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2002S06 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Graff, David; Miller, David; Walker, Kevin. 2002. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text |
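The file layout given in the Description (a 1,024-byte ASCII Sphere header followed by two-channel interleaved mu-law samples, in files named sw_NNNNN.sph) could be read roughly as in the sketch below. This is an illustration under stated assumptions, not LDC tooling: the function names are hypothetical, the header is assumed to follow the usual NIST SPHERE convention of `name -type value` lines terminated by `end_head`, and the record does not say which interleaved channel belongs to the caller, so the channels are simply called 0 and 1.

```python
import re

# Header size as stated in the Description; assumed fixed for this corpus.
HEADER_SIZE = 1024


def conversation_id(filename):
    """Return the five-digit conversation-id from 'sw_NNNNN.sph', or None."""
    match = re.fullmatch(r"sw_(\d{5})\.sph", filename)
    return match.group(1) if match else None


def parse_sphere_header(raw):
    """Parse 'name -type value' header lines into a dict.

    Assumes the common SPHERE type flags: -i (integer), -r (real),
    -sN (string of length N). Anything else is kept as a string.
    """
    lines = raw.decode("ascii", errors="replace").split("\n")
    if not lines or lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    fields = {}
    for line in lines[2:]:  # skip the magic line and the header-size line
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # padding or malformed line
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value
    return fields


def split_sphere(blob):
    """Split a whole file's bytes into (header fields, channel 0, channel 1).

    The samples stay as raw 8-bit mu-law bytes; de-interleaving takes
    every second byte starting at offset 0 and at offset 1.
    """
    fields = parse_sphere_header(blob[:HEADER_SIZE])
    samples = blob[HEADER_SIZE:]
    return fields, samples[0::2], samples[1::2]
```

With a real file, `split_sphere(open(path, "rb").read())` would return the header fields (including caller_pin, callee_pin, and topic_id if present) and one byte string per conversation side. Decoding mu-law to linear PCM is left out here on purpose.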