OLAC Record: CHAracterizing INdividual Speakers (CHAINS)

OLAC Record
oai:www.ldc.upenn.edu:LDC2008S09

Metadata

Title: CHAracterizing INdividual Speakers (CHAINS)

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Cummins, Fred, et al. CHAracterizing INdividual Speakers (CHAINS) LDC2008S09. Web Download. Philadelphia: Linguistic Data Consortium, 2008

Contributor: Cummins, Fred

Grimaldi, Marco

Leonard, Thomas

Simko, Juraj

Date (W3CDTF): 2008

Date Issued (W3CDTF): 2008-11-18

Description: *Introduction*

CHAINS was created by researchers at University College Dublin and contains recordings of thirty-six English speakers reading fables and selected sentences in different speaking styles. The data was obtained in two sessions separated by about two months. The goal of the corpus is to provide a range of speaking styles and voice modifications for speakers sharing the same accent. Other existing corpora, in particular CSLU Speaker Recognition Version 1.1, TIMIT and the IViE corpus (English Intonation in the British Isles), served as reference points in selecting the material. This design decision was made so that methods developed and evaluated on the CHAINS corpus could be tested directly on these other corpora, which differ considerably in dialect and channel characteristics. Additional documentation about the corpus and its methodology is available at the CHAINS website.

*Data*

The data was collected in two recording sessions covering a total of six speaking styles. The first recording session was carried out in a professional recording studio in December 2005: speakers were recorded in a sound-attenuated booth, reading text in the solo, synchronous and retell styles, using a Neumann U87 condenser microphone. Additional tracks from other microphones (near-field and far-field) were also recorded and may be made available on request to the authors. The second recording session took place from March 2006 to May 2006 in a quiet office environment, using an AKG C420 headset condenser microphone; speakers read text in the rsi, whisper and fast modes.

The six speaking styles were:

* solo reading
* synchronous reading
* spontaneous speech (retell)
* repetitive synchronous imitation (rsi)
* whispered fast reading
* fast speech reading

In two of these conditions, speakers modified their speech in a constrained fashion towards a known target: in the synchronous condition, the speech of the co-speaker served as the target, while in rsi there was an explicit, known static target. The presence of a known target that speakers aim to copy makes the design of automatic speaker identification procedures more demanding, as the target speech provides a potentially highly confusing foil. The whisper and fast speech conditions are also well-defined speaking styles that require substantial voice modification by the speaker.

Participants were recruited through University College Dublin and were paid for their participation. No participant had any known speech or hearing deficit. The speakers were from the United Kingdom, the eastern part of Ireland (Dublin and adjacent counties) and the United States. Further information about the speakers, including their gender and dialect, is available in the documentation released with this corpus.

*Samples*

For an example of the data in this corpus, please examine the sample sound file of the fast reading type.

Extent: Corpus size: 3670016 KB
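For a sense of scale, the corpus size can be converted to larger units. This is a quick worked conversion that assumes binary units (1 KB = 1024 bytes); the record does not say which convention it uses.

```latex
3\,670\,016\ \text{KB} \div 1024 = 3584\ \text{MB}, \qquad 3584\ \text{MB} \div 1024 = 3.5\ \text{GB}
```

Under decimal units (1 KB = 1000 bytes), the same figure would be about 3.67 GB.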

Format: Sampling Format: 16-bit linear PCM
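To illustrate what "16-bit linear PCM" means in practice, the sketch below decodes raw sample bytes into floating-point amplitudes. It assumes headerless, signed, little-endian samples. The record does not state the byte order, container format, channel count or sampling rate, so check the corpus documentation before relying on it. The function name `decode_pcm16_le` is hypothetical.

```python
import struct


def decode_pcm16_le(raw: bytes) -> list[float]:
    """Decode headerless 16-bit signed little-endian PCM into floats.

    Assumption: little-endian byte order and no file header; the CHAINS
    record only says "16-bit linear PCM".
    """
    if len(raw) % 2 != 0:
        raise ValueError("16-bit PCM data must contain an even number of bytes")
    count = len(raw) // 2
    # "<" = little-endian, "h" = signed 16-bit integer
    samples = struct.unpack(f"<{count}h", raw)
    # Scale by 2**15 so that the most negative sample maps to -1.0
    return [s / 32768.0 for s in samples]


# Usage: three samples (silence, half scale, negative full scale)
example = struct.pack("<3h", 0, 16384, -32768)
print(decode_pcm16_le(example))  # [0.0, 0.5, -1.0]
```

If the files are stored in a container such as WAV, Python's standard `wave` module (`wave.open(...)` followed by `readframes(n)`) can strip the header and return raw frame bytes of the same kind.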

Identifier: LDC2008S09

https://catalog.ldc.upenn.edu/LDC2008S09

ISBN: 1-58563-497-2

ISLRN: 726-472-023-584-8

DOI: 10.35111/cgbv-ke56

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2008S09

Rights Holder: Portions © 2005, 2006 University College Dublin, © 2008 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2008S09

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
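The GetRecord links above are OAI-PMH requests. As a rough sketch of how such a request is formed, the function below builds a GetRecord URL for this record's OAI identifier. It assumes the standard OAI-PMH parameters `verb`, `identifier` and `metadataPrefix`, with `olac` for the OLAC format and `oai_dc` for simple Dublin Core. The base URL and the function name `get_record_url` are hypothetical placeholders, since the repository endpoint is not given in this record.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (values are percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


BASE = "https://example.org/oai"  # hypothetical endpoint, not from the record
OAI_ID = "oai:www.ldc.upenn.edu:LDC2008S09"

olac_url = get_record_url(BASE, OAI_ID, "olac")   # OLAC format
dc_url = get_record_url(BASE, OAI_ID, "oai_dc")   # simple Dublin Core
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier (`%3A`), which OAI-PMH repositories decode back to the original identifier.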

Search Info
Citation: Cummins, Fred; Grimaldi, Marco; Leonard, Thomas; Simko, Juraj. 2008. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2008S09
Up-to-date as of: Wed Oct 29 7:01:05 EDT 2025
