OLAC Record: Russian through Switched Telephone Network (RuSTeN)

OLAC Record
oai:www.ldc.upenn.edu:LDC2006S34

Metadata

Title: Russian through Switched Telephone Network (RuSTeN)

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Raev, Anrey, et al. Russian through Switched Telephone Network (RuSTeN) LDC2006S34. Web Download. Philadelphia: Linguistic Data Consortium, 2006

Contributor: Raev, Anrey

Koval, Serguei

Smirnova, Natalia

Khitrova, Daria

Stepanov, Vitaly

Date (W3CDTF): 2006

Date Issued (W3CDTF): 2006-07-21

Description: *Introduction* Russian through Switched Telephone Network (RuSTeN) was developed by the Speech Technology Center (STC) and consists of approximately 56 hours of Russian telephone speech. The corpus was developed as part of the Automatic Voice Identification System in Telephone Channel project, whose purpose was to develop software for automatic speaker identification from voice samples acquired through telephone channels. The system was trained on the RuSTeN corpus.
*Data* The RuSTeN database was recorded between March 2001 and February 2003 by STC using "forget-me-not", a professional telephone recording and archiving software package developed by STC. Each speaker made at least five calls from different locations and/or telephone sets. Most calls were made from a home or office environment with an uncontrolled noise level. Additionally, one call per speaker was made from a public telephone (with either street or metro station noise in the background). The recordings are spontaneous conversations (sometimes guided by the near-end speaker) between the caller and the speech database collector on various subjects (the weather, the caller's biography, hobbies, etc.). Each recording includes approximately 150 seconds of far-end speech and at least five seconds of near-end speech. In addition, in each call the caller was asked to utter the digits 0-9 and the words "yes" and "no." Successive sessions are at least two days apart. The database contains 125 far-end speakers, 58 male and 67 female. Further demographic information can be found in the associated documentation. Each far-end speaker is represented by at least five speech files. The sound files are in WAV format, sampled at 11,025 Hz, single-channel, 16-bit linear PCM. The speech filenames encode FFF (far-end speaker number) and SS (session number).
*Samples* For an example of the data in this corpus, please listen to this sample (WAV).
*Updates* None at this time.
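The audio format stated in the description (11,025 Hz, one channel, 16-bit linear) can be checked against a downloaded file. The following is a minimal sketch using Python's standard `wave` module; the function names `describe_wav` and `format_mismatches` are illustrative and not part of the corpus distribution, and the sketch assumes the files are ordinary PCM WAV files that `wave` can open.

```python
import wave

# Format stated in the record: 11,025 Hz, one channel, 16-bit (2-byte) linear PCM.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 11025}


def describe_wav(path):
    """Read the header of a PCM WAV file and return its basic parameters."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }


def format_mismatches(info, expected=EXPECTED):
    """Return {field: (actual, expected)} for every field that differs."""
    return {
        key: (info[key], value)
        for key, value in expected.items()
        if info[key] != value
    }
```

An empty dictionary from `format_mismatches(describe_wav(path))` means the file header agrees with the stated format; `duration_s` can additionally be compared with the roughly 150 seconds of far-end speech mentioned above.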

Extent: Corpus size: 4404019 KB

Format: Sampling Rate: 11025

Sampling Format: 1-channel pcm
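As a rough consistency check, the stated corpus size, sampling rate, and sample format can be related to the "approximately 56 hours" given in the description. The sketch below assumes 16-bit (2-byte) samples, as stated in the description, and ignores WAV headers and any documentation files included in the corpus size. Whether "KB" means 1000 or 1024 bytes is not stated in the record, so both are tried.

```python
SAMPLE_RATE_HZ = 11025      # from "Sampling Rate: 11025"
BYTES_PER_SAMPLE = 2        # 16-bit linear, per the description
CHANNELS = 1                # from "1-channel pcm"
CORPUS_SIZE_KB = 4404019    # from "Corpus size: 4404019 KB"


def estimated_hours(size_kb, bytes_per_kb):
    """Estimate audio duration in hours from raw PCM size.

    Assumption: the whole corpus size is uncompressed audio payload.
    """
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return size_kb * bytes_per_kb / bytes_per_second / 3600


for bytes_per_kb in (1000, 1024):
    hours = estimated_hours(CORPUS_SIZE_KB, bytes_per_kb)
    print(f"KB = {bytes_per_kb} bytes: about {hours:.1f} hours")
```

Both interpretations give a figure between 55 and 57 hours, which agrees with the duration stated in the description.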

Identifier: LDC2006S34

https://catalog.ldc.upenn.edu/LDC2006S34

ISBN: 1-58563-388-7

ISLRN: 301-264-944-856-8

DOI: 10.35111/bw5g-8741

Language: Russian

Language (ISO639): rus

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2006S34

Rights Holder: Portions © 2001 Speech Technology Center Limited, © 2006 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2006S34

DateStamp: 2021-05-10

GetRecord: OAI-PMH request for simple DC format
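The GetRecord entries refer to OAI-PMH requests, which take a `verb`, an `identifier`, and a `metadataPrefix` as query parameters. The sketch below builds such a request URL for this record. The base URL is a placeholder (`https://example.org/oai`), since the actual endpoint is not given in this record; `oai_dc` is the standard OAI-PMH prefix for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL.

    base_url is the repository endpoint; the value used below is a
    placeholder, not the real address of the archive.
    """
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Hypothetical endpoint; substitute the archive's actual OAI-PMH base URL.
url = get_record_url("https://example.org/oai", "oai:www.ldc.upenn.edu:LDC2006S34")
print(url)
```

Passing `metadata_prefix="olac"` would request the OLAC-format record instead, under the assumption stated above.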

Search Info
Citation: Raev, Anrey; Koval, Serguei; Smirnova, Natalia; Khitrova, Daria; Stepanov, Vitaly. 2006. Linguistic Data Consortium.
Terms: area_Europe country_RU dcmi_Sound iso639_rus olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2006S34
Up-to-date as of: Thu Sep 18 0:59:04 EDT 2025
