OLAC Record oai:www.ldc.upenn.edu:LDC2016S05 |
Metadata | ||
Title: | Digital Archive of Southern Speech - NLP Version | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Kretzschmar Jr., William A., et al. Digital Archive of Southern Speech - NLP Version LDC2016S05. Web Download. Philadelphia: Linguistic Data Consortium, 2016 | |
Contributor: | Kretzschmar Jr., William A. | |
Bounds, Paulina | ||
Hettel, Jacqueline | ||
Coats, Steven | ||
Pederson, Lee | ||
Opas-Hänninen, Lisa Lena | ||
Juuso, Ilkka | ||
Seppänen, Tapio | ||
Date (W3CDTF): | 2016 | |
Date Issued (W3CDTF): | 2016-07-15 | |
Description: | *Introduction* Digital Archive of Southern Speech - NLP Version (DASS-NLP) was developed by LDC as an alternate version of Digital Archive of Southern Speech (DASS) (LDC2012S03) suitable for natural language processing and human language technology applications. Specifically, the original audio files have been converted to 16 kHz, 16-bit FLAC-compressed WAV, and file names have been normalized to facilitate automatic processing. DASS was developed by the University of Georgia. It is a subset of the Linguistic Atlas of the Gulf States (LAGS), which is in turn part of the Linguistic Atlas Project (LAP). DASS-NLP contains approximately 366 hours of English speech data from 30 female speakers and 34 male speakers in FLAC-compressed WAV format, along with associated metadata about the speakers and the recordings, and maps in .jpeg format relating to the recording locations. LAP consists of a set of survey research projects about the words and pronunciation of everyday American English, the largest project of its kind in the United States. Interviews with thousands of native speakers across the country have been carried out since 1929. LAGS surveyed the everyday speech of Georgia, Tennessee, Florida, Alabama, Mississippi, Arkansas, Louisiana, and Texas in a series of 914 audio-taped interviews conducted from 1968 to 1983. Interviews average approximately six hours in length; the systematic LAGS tape archive amounts to 5500 hours of sound recordings. DASS is a collection of 64 interviews from LAGS selected to cover a range of speech across the region and to represent multiple education levels and ethnic backgrounds. *Data* The DASS-NLP speakers' average age is 61 years; there are 30 women and 34 men from the Gulf States region represented in this release. The interviews cover common topics such as family, the weather, household articles and activities, agriculture, and social connections. 
The interviews were originally recorded in the field on reel-to-reel audio tape. A digital version of every reel of tape was then made, one .wav file per reel, usually about one hour of sound. Each interview thus consists of a set of 3 to 13 reels, or roughly 3 to 13 interview hours. Personally identifying or sensitive information in the files was replaced with a tone to protect the privacy of the speakers and to ensure their ethical treatment. *Updates* None at this time. *Authorship* The following people were involved with the DASS project: William A. Kretzschmar, Jr., Paulina Bounds, Jacqueline Hettel, and Steven Coats (University of Georgia); Lee Pederson (Emory University); Lisa Lena Opas-Hänninen, Ilkka Juuso, and Tapio Seppänen (University of Oulu, Finland). *Sponsorship* The Atlas Data contained herein comprises information collected in the period spanning from the 1930s to 2010 and has been compiled from diverse sources by, and under the direction of, Dr. William A. Kretzschmar, Harry and Jane Wilson Professor in Humanities at the Department of English of The University of Georgia. Compilation and digitization of this work were funded, in part, by the US National Science Foundation and by the US National Endowment for the Humanities. Additional information about the Atlas Project can be obtained at http://www.lap.uga.edu/. | |
Extent: | Corpus size: 22944064 KB | |
Format: | Sampling Rate: 16000 | |
Sampling Format: pcm | ||
Identifier: | LDC2016S05 | |
https://catalog.ldc.upenn.edu/LDC2016S05 | ||
ISBN: 1-58563-761-0 | ||
ISLRN: 920-059-271-034-1 | ||
DOI: 10.35111/v4g6-nx14 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | Digital Archive of Southern Speech - NLP Version For-Profit Member Agreement: https://catalog.ldc.upenn.edu/license/dass-nlp-fp-agreement.pdf | |
LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | ||
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2016S05 | |
Rights Holder: | Portions © 1982-2010 American Dialect Society, © 1986-2010 University of Georgia Research Foundation, © 2012, 2016 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2016S05 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Kretzschmar Jr., William A.; Bounds, Paulina; Hettel, Jacqueline; Coats, Steven; Pederson, Lee; Opas-Hänninen, Lisa Lena; Juuso, Ilkka; Seppänen, Tapio. 2016. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text |
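
The Description, Format, and Extent fields above give enough figures (about 366 hours, 16000 Hz, 16-bit PCM, 22944064 KB) for a rough storage estimate when planning to decode the audio for processing. The following is only a sketch under stated assumptions: the record does not give the channel count, so mono is assumed; "KB" is taken to mean 1024 bytes; and the corpus size also includes metadata and .jpeg maps, so the ratio is approximate. The function and constant names are illustrative and do not come from the record.

```python
# Figures taken from the record above.
SAMPLE_RATE_HZ = 16_000      # Format: Sampling Rate
BYTES_PER_SAMPLE = 2         # 16-bit PCM
HOURS_OF_SPEECH = 366        # "approximately 366 hours"
CORPUS_SIZE_KB = 22_944_064  # Extent: Corpus size

# Assumptions not stated in the record.
CHANNELS = 1                 # assumed mono
BYTES_PER_KB = 1024          # assumed binary kilobytes


def uncompressed_pcm_bytes(hours, sample_rate=SAMPLE_RATE_HZ,
                           bytes_per_sample=BYTES_PER_SAMPLE,
                           channels=CHANNELS):
    """Return the size in bytes of raw PCM audio of the given duration."""
    seconds = hours * 3600
    return seconds * sample_rate * bytes_per_sample * channels


raw_bytes = uncompressed_pcm_bytes(HOURS_OF_SPEECH)
raw_kb = raw_bytes / BYTES_PER_KB

# Rough ratio of distributed size to decoded size (includes non-audio files).
ratio = CORPUS_SIZE_KB / raw_kb

print(f"Estimated decoded audio: {raw_kb:,.0f} KB")
print(f"Distributed size / decoded size: {ratio:.2f}")
```

Under these assumptions, decoding the whole corpus to raw PCM would need a bit less than twice the space of the distributed download.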