OLAC Record: KING Speaker Verification

OLAC Record
oai:www.ldc.upenn.edu:LDC95S22

Metadata

Title: KING Speaker Verification

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Dr. Alan Higgins, and Dave Vermilyea. KING Speaker Verification LDC95S22. Web Download. Philadelphia: Linguistic Data Consortium, 1995

Contributor: Dr. Alan Higgins

Contributor: Vermilyea, Dave

Date (W3CDTF): 1995

Description: *Introduction* The KING corpus was collected at ITT in 1987 under a US government research contract. Although other contractors have received it, it has not been officially available for public use before now. The version now available from LDC, referred to as KING-92, is based on a 1992 reprocessing of the original recordings (see below). It contains recorded speech from 51 male speakers in two versions, which differ in channel characteristics: one from a telephone handset and one from a high-quality microphone. The speakers are further subdivided into two groups, 25 in one and 26 in the other, who were recorded at different locations. For each speaker and channel there are ten files, corresponding to sessions of about 30 to 60 seconds' duration each. The interval between sessions varies from a week to a month. The transcripts contain about 54k word tokens (4.8k types).

KING is designed principally for closed-set experiments in text-independent speaker identification or verification over toll-quality telephone lines, although the single-sided collection format does not permit simulation of real telephone traffic. The ten sessions allow for a variety of divisions into training and test data, with the possibility of multiple test sets. For example, one could examine the effects of the amount of training on performance, or examine the variability of performance over several test samples (sessions) given a fixed amount of training (but see below about the "Great Divide").

*Data* The collection method used in KING was to establish a call from a laboratory location at ITT (either San Diego, CA or Nutley, NJ) over long-distance lines and back to another phone at the same location. The phones used by the test subjects were equipped with an additional microphone, so two parallel recordings were made of that side of the conversation, while the interlocutor's side was not recorded. The two parties either spoke spontaneously or carried out a variety of tasks designed to elicit natural-sounding speech: interpreting a drawing, solving a problem, describing a picture, etc.

There were 25 speakers in Nutley and 26 in San Diego. Speech-to-noise ratios average about 10 dB worse for the Nutley telephone data than for San Diego; in fact, the ratio is less than 20 dB for over half the Nutley files. Users of this corpus therefore usually run separate experiments, or at least report results separately, according to site.

A more subtle difference in the recordings, sometimes referred to as the "Great Divide," cuts across the telephone data for the San Diego speakers. This was apparently due to a minor equipment change made during the collection; it results in a slight but consistent change in the average long-term spectrum of the telephone data recorded after the fifth session. Training and testing on data from the same side of this divide gives significantly better results than training and testing across it. Since the discovery of this difference, investigators generally report results on the first and last five sessions of the San Diego telephone KING data separately, or they report within vs. across this boundary. A detailed description of the spectral differences can be found in a report by Thomas Crystal and Ned Neuburg which accompanies the CD-ROM version.

Because a number of published papers report results based on the original KING corpus, and two versions of the data exist, note that the new CD-ROM version, called KING-92, is based on a 1992 re-issue of the data from ITT. It differs from the original corpus in a few details:

* The original data was sampled at 10 kHz, but has now been resampled at 8 kHz.
* Missing segments, most on the order of seconds, have been restored to the data, and the alignment between the high-quality microphone and the telephone handset data files has been corrected.
* Originally both an orthographic and a phonetic transcription of the data, with time alignments, were part of the corpus, but there were numerous errors; only an unaligned orthographic transcription has been retained.
* Documentation has been changed to reflect these differences, and a description of the artifactual division between sessions 1-5 and 6-10 in the San Diego telephone data is included.

*Samples* Please view this audio sample and transcript sample.

*Updates* None at this time.
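
The within vs. across reporting convention for the "Great Divide" described above can be illustrated with a minimal sketch. It assumes only what the description states: ten sessions per speaker, with the spectral change falling after the fifth session. The names `DIVIDE_AFTER`, `side`, `label_pair`, and `partition_pairs` are hypothetical and are not part of the corpus or its documentation.

```python
from itertools import permutations

# Assumption taken from the description: sessions are numbered 1-10 and the
# spectral change in the San Diego telephone data falls after session 5.
DIVIDE_AFTER = 5
SESSIONS = range(1, 11)


def side(session):
    """Return 'early' for sessions 1-5 and 'late' for sessions 6-10."""
    return "early" if session <= DIVIDE_AFTER else "late"


def label_pair(train_session, test_session):
    """Label a (train, test) session pair as 'within' or 'across' the divide."""
    return "within" if side(train_session) == side(test_session) else "across"


def partition_pairs():
    """Group every ordered pair of distinct sessions by its divide label."""
    groups = {"within": [], "across": []}
    for train, test in permutations(SESSIONS, 2):
        groups[label_pair(train, test)].append((train, test))
    return groups


if __name__ == "__main__":
    groups = partition_pairs()
    for label, pairs in groups.items():
        print(label, len(pairs))
```

Results scored on the two groups separately would correspond to the "within vs. across" style of reporting; restricting training and test sessions to one side only would correspond to reporting the first and last five sessions separately.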

Format: Sampling Rate: 8000

Format: Sampling Format: 1-channel pcm compressed

Identifier: LDC95S22

Identifier: https://catalog.ldc.upenn.edu/LDC95S22

ISBN: 1-58563-050-0

ISLRN: 155-446-887-889-4

DOI: 10.35111/j0af-qf40

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC95S22

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC95S22

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Dr. Alan Higgins; Vermilyea, Dave. 1995. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC95S22
Up-to-date as of: Wed Oct 29 7:00:33 EDT 2025
