OLAC Record: SLX Corpus of Classic Sociolinguistic Interviews

OLAC Record
oai:www.ldc.upenn.edu:LDC2003T15

Metadata

Title: SLX Corpus of Classic Sociolinguistic Interviews

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Strassel, Stephanie, et al. SLX Corpus of Classic Sociolinguistic Interviews LDC2003T15. Web Download. Philadelphia: Linguistic Data Consortium, 2003

Contributor: Strassel, Stephanie

Conn, Jeffrey

Evans, Suzanne Wagner

Cieri, Christopher

Labov, William

Maeda, Kazuaki

Date (W3CDTF): 2003

Date Issued (W3CDTF): 2003-11-25

Description: *Introduction* The SLX Corpus of Classic Sociolinguistic Interviews was developed by William Labov and contains approximately 10 hours of English interviews along with annotations and transcripts. All of the interviews were conducted in the 1960s and 1970s by William Labov or by one of his students. Labov notes that these interviews are not classic in the sense that they form part of a systematic sociolinguistic study of the speech community. What makes these interviews classic is that they represent classic solutions to the problems of achieving cross-cultural contact, reducing the effect of the Observer's Paradox and approximating the vernacular of everyday life. Most importantly, they are interviews with extraordinarily gifted, memorable and fluent speakers. These particular interviews were also selected for inclusion in this corpus because of their sound quality and because publication of the audio data and corresponding transcripts and annotations does not violate any agreement the interviewer made with the speakers regarding data distribution.

The SLX Corpus was developed as part of the Data and Annotations for Sociolinguistics (DASL) Project, an investigation of best practices in the use of digital speech corpora for the study of language variation. Containing classic interview material in the Labovian tradition, it is a valuable teaching tool for linguists. The recordings demonstrate successful interviewing techniques, the sound quality is high, and the digitization, segmentation, and transcription of the data represent best practice in these areas. The variable survey highlights over 150 sociolinguistic variables attested in the corpus and suggests avenues for further research. Most importantly, the SLX Corpus provides both an example of a digital speech corpus developed specifically to support sociolinguistic research, and a stable benchmark for training in sociolinguistic data collection, digitization, segmentation, transcription, analysis, and publication.

*Data* The 17 speech files are 22050 Hz, 16-bit, single-channel files in the MS WAV (RIFF) format, for a total of 575 minutes (~1.5 GB). The files represent eight sociolinguistic interviews with a total of nine speakers. All interviews were recorded on a Nagra III or IVS with Sennheiser dynamic microphones. The interviews were digitized from the original open reel tapes onto DAT/disk at 16-bit, 44 kHz sampling. The monaural signal was passed through 2 channels at levels differing by 20% to capture the best digital copy in a single pass.

The audio data reflects a broad spectrum of speaking styles, including spontaneous speech, narratives, responses and formal linguistic tasks. The interviews touch on a multitude of topics, and corpus users should note that the language of the interviews represents the uncensored opinions of the speakers, reflecting their daily concerns and personal histories. Taken as a whole, the speakers exemplify a wide variety of regional and social dialects.

The corpus includes the complete interview recordings plus time-aligned verbatim transcripts for each speaker. Also included in the publication is a sociolinguistic variable survey that gives an overview of the intra- and inter-speaker variation attested in the corpus, highlighting a broad range of phonological, phonetic, grammatical, lexical, and stylistic variables. Finally, the publication includes a number of annotation tools that allow users to listen to each interview while browsing the corresponding transcripts, and to display and hear each token identified in the variable survey. These tools can be extended to create new time-aligned transcripts or tag additional variables within the existing corpus.

*Samples* Please view these samples:
* Speech (wav)
* Transcript (lcf)
* Annotation (tsv)

*Updates* None at this time.

*Sponsorship* The SLX corpus was funded in part through a five-year grant (BCS-998009, KDI, SBE) from the National Science Foundation via TalkBank, an interdisciplinary project to foster research and development in communicative behavior by providing tools and standards for analysis and distribution of language data. Additional funding was provided by the Linguistic Data Consortium.

*Note* The cost of the first 100 copies of this publication (not counting the copies distributed to LDC members) is covered by NSF Grant Number BCS-998009, and these copies are therefore free of charge. After these first 100 copies are distributed, additional copies will be available for the production cost of $100.

Extent: Corpus size: 1572864 KB

Format: Sampling Rate: 22050

Sampling Format: pcm
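
The Extent and Format fields can be cross-checked against the figures quoted in the Description (575 minutes of 22050 Hz, 16-bit, single-channel PCM). The following is a minimal sketch of that arithmetic, not part of the record itself. It assumes that "KB" in the Extent field means 1024 bytes and it ignores WAV header overhead; the function name `pcm_bytes` is illustrative.

```python
# Figures taken from the Description and Format fields of this record.
SAMPLE_RATE_HZ = 22050
BITS_PER_SAMPLE = 16
CHANNELS = 1
TOTAL_MINUTES = 575


def pcm_bytes(minutes, rate_hz, bits, channels):
    """Size in bytes of uncompressed PCM audio, ignoring WAV headers."""
    return minutes * 60 * rate_hz * (bits // 8) * channels


audio_bytes = pcm_bytes(TOTAL_MINUTES, SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)

# Extent field: 1572864 KB, assuming 1 KB = 1024 bytes (an assumption).
extent_bytes = 1572864 * 1024

print(f"audio only : {audio_bytes:,} bytes")
print(f"extent     : {extent_bytes:,} bytes")
print(f"difference : {extent_bytes - audio_bytes:,} bytes")
```

Under these assumptions the raw audio comes to roughly 1.52 billion bytes, slightly below the stated extent. The remainder may correspond to transcripts, annotations and tools, or the extent may simply be a rounded figure; the record does not say which.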

Identifier: LDC2003T15

https://catalog.ldc.upenn.edu/LDC2003T15

ISBN: 1-58563-273-2

ISLRN: 034-299-958-433-1

DOI: 10.35111/109x-k373

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2003T15

Rights Holder: Portions © 2003 Trustees of the University of Pennsylvania, IPA93 Fonts © 2003 SIL International

Type (DCMI): Sound

Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2003T15

DateStamp: 2024-09-09

GetRecord: OAI-PMH request for simple DC format
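
The GetRecord entries above refer to OAI-PMH requests whose link targets are not preserved in this listing. As a hedged sketch, such a request is an HTTP GET with the `verb`, `identifier` and `metadataPrefix` arguments defined by the OAI-PMH protocol. The base URL below is a placeholder, since the repository endpoint is not stated in this record, and the helper name `get_record_url` is illustrative.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real repository base URL is not given in this record.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2003T15"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# "oai_dc" is the simple Dublin Core prefix required by OAI-PMH;
# "olac" is assumed here to be the prefix for the OLAC format.
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")

print(dc_url)
print(olac_url)
```

Fetching either URL from the actual endpoint would be expected to return an XML document wrapping this record's metadata.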

Search Info
Citation: Strassel, Stephanie; Conn, Jeffrey; Evans, Suzanne Wagner; Cieri, Christopher; Labov, William; Maeda, Kazuaki. 2003. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2003T15
Up-to-date as of: Wed Oct 29 7:00:16 EDT 2025
