OLAC Record: CSLU: Speaker Recognition Version 1.1

OLAC Record
oai:www.ldc.upenn.edu:LDC2006S26

Metadata

Title: CSLU: Speaker Recognition Version 1.1

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: CSLU. CSLU: Speaker Recognition Version 1.1 LDC2006S26. Web Download. Philadelphia: Linguistic Data Consortium, 2006

Contributor: CSLU

Date (W3CDTF): 2006

Date Issued (W3CDTF): 2006-05-18

Description: *Introduction* The Speaker Recognition corpus (formerly known as Speaker Verification) was developed by the Center for Spoken Language Understanding (CSLU) and consists of approximately 73 hours of English telephone speech from 91 participants. Participants recorded speech in 12 sessions over a two-year period. In each session, they were given prompts intended to elicit different types of speech: limited-vocabulary utterances (e.g., "What is your eye color?"), number and word strings, fixed phrases, and spontaneous speech (e.g., "Describe a typical day in your life."). Some recording sessions were only a few days apart; others were several weeks apart. Participants followed this calling schedule: during the first month, they called twice within one week; no calls were made in the second and third months; in the fourth month, they made one call; no calls were made in the fifth and sixth months. This six-month pattern was repeated three more times, for a total of 12 calls per participant. To balance the workload of reminding participants to call and to avoid large bursts of data collection on the system, the participants were divided into 12 groups. Each group began the two-year schedule one month after the previous group: the first group started in September 1996, the second in October 1996, and so on. Every attempt was made to create a gender-balanced subject pool. When each group began data collection, it had an equal number of men and women; however, as participants were dropped, this balance could not be perfectly maintained.

*Data* All of the data in this corpus were collected over digital telephone lines and recorded with the CSLU T1 digital data collection system. The recordings were sampled at 8 kHz with 8-bit samples and stored as µ-law (ulaw) files. Nearly all of the files in this corpus have corresponding non-time-aligned, word-level transcriptions that follow the conventions of the CSLU Labeling Guide.
In the current release, only some of the long spontaneous utterances have been transcribed. The .wav files contain the speech data in the RIFF standard file format, with 16-bit linear encoding. The "trans" file contains a list of all of the transcriptions.

*Samples* For an example of the data in this corpus, please listen to the following audio sample (WAV).

*Updates* None at this time.
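The calling schedule in the Introduction can be sketched as a small calendar calculation. This is a hypothetical illustration, not part of the corpus documentation: it assumes months are counted from each group's start month, that group 1 started in September 1996 and each later group one month after the previous one, and that the two first-month calls are separate sessions. Exact call dates are not given in the record, so the helper names `add_months` and `call_months` are illustrative only.

```python
# Hypothetical sketch of the CSLU Speaker Recognition calling schedule.
# Assumption: within each 6-month cycle, 2 calls in month 1 and 1 call in month 4.
CYCLE = {0: 2, 3: 1}      # 0-based month offset within a cycle -> number of calls
CYCLE_LENGTH = 6          # months per cycle
NUM_CYCLES = 4            # the pattern runs once, then repeats three more times
FIRST_START = (1996, 9)   # group 1 started in September 1996


def add_months(year: int, month: int, offset: int) -> tuple[int, int]:
    """Return (year, month) shifted forward by `offset` months."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def call_months(group: int) -> list[tuple[int, int]]:
    """Return one (year, month) entry per scheduled call for a 1-based group number."""
    # Each group starts one month after the previous group.
    start = add_months(*FIRST_START, group - 1)
    calls = []
    for cycle in range(NUM_CYCLES):
        for month_in_cycle, n_calls in sorted(CYCLE.items()):
            ym = add_months(*start, cycle * CYCLE_LENGTH + month_in_cycle)
            calls.extend([ym] * n_calls)
    return calls


if __name__ == "__main__":
    for g in (1, 12):
        months = call_months(g)
        print(f"group {g}: {len(months)} calls, first {months[0]}, last {months[-1]}")
```

Under these assumptions each group makes 12 calls, and group 1's last call falls in June 1998, within the two-year window.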

Extent: Corpus size: 4299161 KB
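As a rough consistency check (not part of the original record), the stated size can be compared with the raw audio volume implied by the description: about 73 hours at 8,000 samples per second. The sketch assumes 2-byte (16-bit linear) samples, as described for the .wav files, and 1 KB = 1,024 bytes; both are assumptions, and the 73-hour figure is itself approximate.

```python
# Rough size estimate for ~73 hours of 8 kHz audio (illustrative only).
HOURS = 73                  # "approximately 73 hours" from the description
SAMPLE_RATE_HZ = 8000       # Sampling Rate: 8000
BYTES_PER_SAMPLE = 2        # assumption: 16-bit linear samples in the .wav files
STATED_SIZE_KB = 4_299_161  # Extent: Corpus size


def audio_bytes(hours: float, rate_hz: int, bytes_per_sample: int) -> float:
    """Bytes of raw sample data for `hours` of single-channel audio."""
    return hours * 3600 * rate_hz * bytes_per_sample


estimate = audio_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
estimate_kb = estimate / 1024  # assumption: KB means 1,024 bytes

print(f"estimated audio: {estimate_kb:,.0f} KB vs stated {STATED_SIZE_KB:,} KB")
print(f"ratio stated/estimated: {STATED_SIZE_KB / estimate_kb:.3f}")
```

The estimate comes to 4,106,250 KB, within about 5% of the stated size; at 8 bits per sample it would be roughly half that. This suggests, but does not establish, that the distributed size reflects 16-bit audio plus headers and transcription files.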

Format: Sampling Rate: 8000

Sampling Format: ulaw
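The record lists the sampling format as ulaw, while the description says the distributed .wav files are 16-bit linear. The two are related by standard G.711 µ-law expansion. The sketch below is a generic pure-Python implementation of that expansion, shown for illustration; it is not CSLU's or LDC's conversion tool, and the function names are hypothetical.

```python
# Generic G.711 mu-law to 16-bit linear PCM expansion (illustrative sketch).
BIAS = 0x84  # 132, the standard mu-law bias


def ulaw_to_linear(byte: int) -> int:
    """Expand one 8-bit mu-law code to a signed 16-bit linear sample."""
    u = ~byte & 0xFF               # mu-law codes are stored bit-inverted
    sign = u & 0x80                # top bit: sign
    exponent = (u >> 4) & 0x07     # 3-bit segment number
    mantissa = u & 0x0F            # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def ulaw_decode(data: bytes) -> list[int]:
    """Expand a buffer of mu-law bytes to a list of linear samples."""
    return [ulaw_to_linear(b) for b in data]


if __name__ == "__main__":
    print(ulaw_decode(bytes([0xFF, 0x80, 0x00])))
```

The output range is ±32124, which fits in a signed 16-bit sample. This is why 8-bit µ-law audio is commonly stored as 16-bit linear PCM in RIFF .wav files.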

Identifier: LDC2006S26

https://catalog.ldc.upenn.edu/LDC2006S26

ISBN: 1-58563-545-6

ISLRN: 672-454-108-628-2

DOI: 10.35111/2n7h-n869

Language: English

Language (ISO639): eng

License: CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2006S26

Rights Holder: Portions © 1996-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2006 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2006S26

DateStamp: 2021-06-04

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: CSLU. 2006. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2006S26
Up-to-date as of: Wed Oct 29 7:00:55 EDT 2025
