OLAC Record: 2010 NIST Speaker Recognition Evaluation Test Set

OLAC Record
oai:www.ldc.upenn.edu:LDC2017S06

Metadata

Title: 2010 NIST Speaker Recognition Evaluation Test Set

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Greenberg, Craig, et al. 2010 NIST Speaker Recognition Evaluation Test Set LDC2017S06. Web Download. Philadelphia: Linguistic Data Consortium, 2017

Contributor: Greenberg, Craig

Contributor: Martin, Alvin

Contributor: Graff, David

Contributor: Brandschain, Linda

Contributor: Walker, Kevin

Date (W3CDTF): 2017

Date Issued (W3CDTF): 2017-04-17

Description: *Introduction* 2010 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains 2,255 hours of American English telephone speech and microphone-channel speech recorded in an interview scenario, all used as test data in the NIST-sponsored 2010 Speaker Recognition Evaluation (SRE). The ongoing series of yearly SRE evaluations conducted by NIST is intended to be of interest to researchers working on the general problem of text-independent speaker recognition. To this end, the evaluations are designed to be simple, to focus on core technology issues, to be fully supported, and to be accessible to those wishing to participate. Like the 2008 evaluation, the 2010 evaluation included in the training and test conditions for the core test not only conversational telephone speech (CTS) recorded over ordinary telephone channels, but also CTS and conversational interview speech recorded over a room microphone channel. Unlike prior evaluations, some of the conversational telephone-style speech was collected in a manner intended to produce particularly high, or particularly low, vocal effort on the part of the speaker of interest.
*Data* The speech recordings in this release were collected in 2009 and 2010 by LDC at its Human Subjects Collection facility in Philadelphia. This collection was part of the Mixer 6 project, which was designed to support the development of robust speaker recognition technology by providing carefully collected and audited speech from a large pool of speakers recorded simultaneously across numerous microphones. The telephone speech segments include two-channel excerpts of approximately 5 minutes and 10 seconds. There are also summed-channel excerpts in the range of 5 minutes. The microphone excerpts are 3-15 minutes in duration. As in prior evaluations, intervals of silence were not removed. The data included in this release is 8-bit ulaw with a sample rate of 8 kHz. In addition to evaluation data, this package also includes answer keys, trial and train files, development data, and evaluation documentation.
*Samples* For an example of the data in this corpus, please listen to this sample (SPH).
*Updates* None at this time.
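The audio encoding stated above (8-bit ulaw, 8 kHz) can be illustrated with a minimal Python sketch. It assumes the audio payload has already been extracted from its container as raw mu-law bytes; header parsing for the SPH files is deliberately left out, and the function names below are hypothetical rather than part of any LDC or NIST tooling. The decoding follows the standard G.711 mu-law expansion.

```python
from __future__ import annotations

SAMPLE_RATE_HZ = 8000   # from the record: sample rate of 8 kHz
BYTES_PER_SAMPLE = 1    # from the record: 8-bit ulaw


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed linear PCM value."""
    u = ~byte & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80                   # after inversion, a set top bit means negative
    exponent = (u >> 4) & 0x07        # 3-bit segment number
    mantissa = u & 0x0F               # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_payload(payload: bytes) -> list[int]:
    """Decode a raw mu-law payload (no header) into linear samples."""
    return [ulaw_to_linear(b) for b in payload]


def duration_seconds(n_payload_bytes: int, channels: int = 1) -> float:
    """Duration implied by a payload size at 8 kHz, one byte per sample."""
    return n_payload_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels)
```

Under these assumptions, a two-channel excerpt of exactly 5 minutes would carry 300 x 8000 x 2 = 4,800,000 payload bytes, and `duration_seconds(4_800_000, channels=2)` recovers the 300 seconds.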

Extent: Corpus size: 111473192 KB

Format: Sampling Rate: 8000

Format: Sampling Format: ulaw

Identifier: LDC2017S06

Identifier: https://catalog.ldc.upenn.edu/LDC2017S06

Identifier: ISBN: 1-58563-795-5

Identifier: ISLRN: 429-091-121-265-4

Identifier: DOI: 10.35111/fjsq-a117

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2017S06

Rights Holder: Portions © 2009, 2010, 2017 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2017S06

DateStamp: 2021-11-16

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Greenberg, Craig; Martin, Alvin; Graff, David; Brandschain, Linda; Walker, Kevin. 2017. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2017S06
Up-to-date as of: Wed Oct 29 7:01:42 EDT 2025
