OLAC Record: Nationwide Speech Project

OLAC Record
oai:www.ldc.upenn.edu:LDC2007S15

Metadata

Title: Nationwide Speech Project

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Clopper, Cynthia G., and David Pisoni. Nationwide Speech Project LDC2007S15. Web Download. Philadelphia: Linguistic Data Consortium, 2007

Contributor: Clopper, Cynthia G.

Contributor: Pisoni, David B.

Date (W3CDTF): 2007

Date Issued (W3CDTF): 2007-09-17

Description: *Introduction* This corpus represents part of the work of the Nationwide Speech Project (NSP) conducted by the authors at Indiana University. The purpose of the NSP was to collect a large amount of speech produced by male and female talkers representing the primary regional varieties of American English: New England, Mid-Atlantic, North, Midland, South and West. This release contains approximately 60 hours of speech, or nearly one hour from each of 60 white American English speakers (five male and five female talkers from each of the six dialect regions), reading words and sentences. The corpus can be used for perceptual and acoustic experiments designed to explore the role of variation in spoken language processing. Such applications include speech science experiments and sociolinguistic or sociophonetic research.

*Data* The speakers were recruited from the Indiana University community. All were 18-25 years old at the time of recording and had lived exclusively in one region prior to age 18, and both parents of each speaker were raised in that same region. Further demographic information about the speakers is provided in the file talkers.txt. The materials include 102 high-predictability sentences and five repetitions of each of 10 hVd words. The high-predictability sentences are 5-8 words in length, and the final word in each sentence is highly predictable from the preceding semantic context. The 10 hVd words are: heed, hid, hayed, head, had, hod, hud, hoed, hood and who'd.

Participants were recorded one at a time by an experimenter in a sound-attenuated booth (IAC Audiometric Testing Room, Model 402). Both the experimenter and the participant sat in the sound booth during testing. During the recording session, the participant was seated in front of a ViewSonic LCD flatscreen monitor (ViewPanel VG151), which mirrored the screen of a Macintosh Powerbook G3 laptop.

The participant wore a Shure head-mounted microphone (SM10A) positioned approximately one inch from the left corner of the talker's mouth. The microphone output was fed to an Applied Research Technology microphone tube pre-amplifier. The experimenter adjusted the output gain on the pre-amplifier while the participant read the Grandfather Passage as a warm-up before recording began. The output of the pre-amplifier was connected to a Roland UA-30 USB audio interface, which digitized the signal and transmitted it via USB to the laptop, where each utterance was recorded in an individual 16-bit AIFF sound file at a sampling rate of 44.1 kHz (converted to .wav format files for this release). The experimenter held the laptop on her lap and wore headphones connected to the Roland device so that she could hear the same audio signal that was input to the laptop for recording.

*Samples*
* hpspin
* vowel

Extent: Corpus size: 3355443 KB

Format: Sampling Rate: 44100

Format: Sampling Format: pcm

Identifier: LDC2007S15

https://catalog.ldc.upenn.edu/LDC2007S15

ISBN: 1-58563-449-2

ISLRN: 686-386-828-766-0

DOI: 10.35111/tk2c-p329

Language: English

Language (ISO639): eng

License: Nationwide Speech Project Agreement: https://catalog.ldc.upenn.edu/license/nationwide-speech-project.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2007S15

Rights Holder: Portions © 2003 Indiana University Research and Technology Corporation, © 2007 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text
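
The Format fields and the Description above state that the released files are 16-bit PCM .wav files sampled at 44.1 kHz. A minimal sketch of how a downloaded file could be checked against those values, using Python's standard `wave` module, is given here. The function name `check_wav` and the constants are illustrative assumptions and are not part of the corpus distribution.

```python
import wave

# Values taken from the record: 44100 Hz sampling rate, 16-bit PCM samples.
EXPECTED_RATE = 44100   # samples per second
EXPECTED_WIDTH = 2      # bytes per sample (16 bits)


def check_wav(path):
    """Return (ok, sample_rate, sample_width_bytes) for one WAV file.

    `ok` is True only when both the sampling rate and the sample width
    match the values stated in the record. Hypothetical helper.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    return (rate == EXPECTED_RATE and width == EXPECTED_WIDTH, rate, width)
```

Calling `check_wav` on each file in a local copy of the release would flag any file whose header disagrees with the record.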

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2007S15

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
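
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request is an HTTP GET with the `verb`, `identifier` and `metadataPrefix` arguments defined by the OAI-PMH protocol. The base URL below is a placeholder assumption, not the actual endpoint, and `get_record_url` is a hypothetical helper. The `oai_dc` prefix is the one OAI-PMH defines for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2007S15"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# One URL per format listed in the record.
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
```

Fetching either URL from a real endpoint should return an XML response containing the record in the requested metadata format.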

Search Info
Citation: Clopper, Cynthia G.; Pisoni, David B. 2007. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2007S15
Up-to-date as of: Wed Oct 29 7:00:51 EDT 2025
