OLAC Record: Santa Barbara Corpus of Spoken American English Part I

OLAC Record
oai:www.ldc.upenn.edu:LDC2000S85

Metadata

Title: Santa Barbara Corpus of Spoken American English Part I

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Du Bois, John W., et al. Santa Barbara Corpus of Spoken American English Part I LDC2000S85. Web Download. Philadelphia: Linguistic Data Consortium, 2000

Contributor: Du Bois, John W.

Chafe, Wallace L.

Meyer, Charles

Thompson, Sandra A.

Date (W3CDTF): 2000

Date Issued (W3CDTF): 2000-01-01

Description: *Introduction* The Santa Barbara Corpus of Spoken American English is based on hundreds of recordings of natural speech from all over the United States, representing a wide variety of people of different regional origins, ages, occupations, and ethnic and social backgrounds. It reflects many ways that people use language in their lives: conversation, gossip, arguments, on-the-job talk, card games, city council meetings, sales pitches, classroom lectures, political speeches, bedtime stories, sermons, weddings, and more. *Data* Part I contains 14 speech files of between 15 and 30 minutes each, from the Santa Barbara Corpus of Spoken American English. Collected by: University of California, Santa Barbara Center for the Study of Discourse, Director John W. Du Bois (UCSB), Associate Editors: Wallace L. Chafe (UCSB), Charles Meyer (UMass, Boston), and Sandra A. Thompson (UCSB). The Santa Barbara Corpus of Spoken American English is part of the International Corpus of English (Charles W. Meyer, Director), representing the American Component. Each speech file is accompanied by a transcript in which phrases are time-stamped with respect to the audio recording. Personal names, place names, phone numbers, etc., in the transcripts have been altered to preserve the anonymity of the speakers and their acquaintances, and the audio files have been filtered to make these portions of the recordings unrecognizable. *Samples* For an example of the data in this corpus, please examine these samples of the recordings and transcripts: * Speech * Transcripts *Updates* There are no updates at this time.

Extent: Corpus size: 1677721 KB

Identifier: LDC2000S85

https://catalog.ldc.upenn.edu/LDC2000S85

ISBN: 1-58563-164-7

ISLRN: 407-731-819-668-4

DOI: 10.35111/s2q7-gq73

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2000S85

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2000S85

DateStamp: 2021-07-01

GetRecord: OAI-PMH request for simple DC format
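
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a minimal sketch, such a request URL can be assembled from the OaiIdentifier using the standard OAI-PMH query parameters (verb, identifier, metadataPrefix). The base URL below is a hypothetical placeholder, not the archive's real endpoint, and the function name get_record_url is introduced here for illustration only. The prefix oai_dc is the standard OAI-PMH name for simple Dublin Core; olac is assumed to be the prefix used for the OLAC format.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Hypothetical endpoint: substitute the real OAI-PMH base URL of the repository.
BASE_URL = "https://example.org/oai"
OAI_ID = "oai:www.ldc.upenn.edu:LDC2000S85"

olac_url = get_record_url(BASE_URL, OAI_ID, "olac")    # assumed OLAC format prefix
dc_url = get_record_url(BASE_URL, OAI_ID, "oai_dc")    # simple DC format

print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by urlencode, which is the safe choice for a query string; the response to either request would be an XML document describing this record.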

Search Info
Citation: Du Bois, John W.; Chafe, Wallace L.; Meyer, Charles; Thompson, Sandra A. 2000. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2000S85
Up-to-date as of: Wed Oct 29 7:00:02 EDT 2025
