OLAC Record: Noisy TIMIT Speech

OLAC Record
oai:www.ldc.upenn.edu:LDC2017S04

Metadata

Title: Noisy TIMIT Speech

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Abdulaziz, Azhar, and Veton Kepuska. Noisy TIMIT Speech LDC2017S04. Web Download. Philadelphia: Linguistic Data Consortium, 2017

Contributor: Abdulaziz, Azhar

Contributor: Kepuska, Veton

Date (W3CDTF): 2017

Date Issued (W3CDTF): 2017-03-17

Description: *Introduction* Noisy TIMIT Speech was developed by the Florida Institute of Technology and contains approximately 322 hours of speech from the TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) modified with different additive noise levels. Only the audio has been modified; the original arrangement of the TIMIT corpus is still as described by the TIMIT documentation.

*Data* The additive noise types are white, pink, blue, red, violet and babble, with noise levels ranging from 5 to 50 dB (decibels) in 5 dB steps. The color of a noise refers to the power spectrum of the noise signal. Sound waves have two characteristics: frequency, which describes how many times the waveform vibrates per second; and amplitude, the size of the waveform. Colored noises are named by analogy to the colors of light. For instance, white noise contains all audible frequencies just as white light contains all frequencies in the visible range. Non-white colored noises have more energy concentrated at the high or low end of the sound spectrum. White, pink and blue noise are officially defined in the federal telecommunications standard. The white, pink, blue, red and violet noise types added to the TIMIT data in this release were generated artificially using MATLAB. For the babble noise, a random segment of recorded babble speech was selected and scaled relative to the power of the original TIMIT audio signal. All audio files are presented as single-channel 16 kHz 16-bit FLAC.

*Samples* Please listen to the following samples:
* 5 dB Babble
* 15 dB Blue
* 25 dB Pink
* 35 dB Red
* 45 dB Violet
* 50 dB White

*Updates* None at this time.

*Related Works incorporating TIMIT* TIMIT was designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. Since its release in 1993, several corpora have been developed using the TIMIT database:
* NTIMIT (LDC93S2): transmitting TIMIT recordings through a telephone handset and over various channels in the NYNEX telephone network
* CTIMIT (LDC96S30): passing TIMIT files through cellular telephone circuits
* FFMTIMIT (LDC96S32): re-recording TIMIT files with a free-field microphone
* HTIMIT (LDC98S67): re-recording a subset of TIMIT files through different telephone handsets
* STC-TIMIT (LDC2008S03): passing TIMIT files through an actual telephone channel in a single call
* WTIMIT 1.0 (LDC2010S02): wideband mobile telephony TIMIT version
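
The babble procedure described above (pick a random noise segment, then scale it relative to the power of the clean signal) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. It assumes the dB levels in this record denote a speech-to-noise power ratio (SNR), and the function names `pick_segment`, `mix_at_snr` and `measured_snr_db` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def mean_power(samples):
    """Average power of a signal: the mean of its squared samples."""
    return sum(x * x for x in samples) / len(samples)


def pick_segment(noise, length, rng):
    """Return a random contiguous segment of `noise` with `length` samples."""
    if len(noise) < length:
        raise ValueError("noise recording is shorter than the speech signal")
    start = rng.randrange(len(noise) - length + 1)
    return noise[start:start + length]


def mix_at_snr(speech, noise, snr_db, rng):
    """Add a random noise segment to `speech`, scaled to the target SNR.

    Assumption: snr_db is 10*log10(speech power / noise power).
    """
    segment = pick_segment(noise, len(speech), rng)
    noise_power = mean_power(segment)
    if noise_power == 0:
        raise ValueError("selected noise segment is silent")
    # Noise power needed so that speech power / noise power = 10^(snr_db/10).
    target_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_power / noise_power)
    return [s + gain * n for s, n in zip(speech, segment)]


def measured_snr_db(speech, noisy):
    """SNR in dB recovered from the clean signal and its noisy version."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real audio: a 440 Hz tone at 16 kHz and Gaussian white noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(32000)]
    for level in range(5, 55, 5):  # 5 to 50 dB in 5 dB steps
        noisy = mix_at_snr(speech, noise, level, rng)
        print(level, round(measured_snr_db(speech, noisy), 3))
```

Scaling by the square root of the power ratio is used because power is proportional to the square of amplitude. For real files, the samples would first be decoded from FLAC with an audio library of the reader's choice.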

Extent: Corpus size: 21108664 KB

Format: Sampling Rate: 16000

Format: Sampling Format: flac

Identifier: LDC2017S04

Identifier: https://catalog.ldc.upenn.edu/LDC2017S04

Identifier: ISBN: 1-58563-793-9

Identifier: ISLRN: 107-834-092-668-3

Identifier: DOI: 10.35111/m440-jj35

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2017S04

Rights Holder: Portions © 2017 Florida Institute of Technology, © 1993, 2017 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2017S04

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
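
The GetRecord entries above refer to OAI-PMH requests, which are plain HTTP queries carrying a `verb`, an `identifier` and a `metadataPrefix`. The sketch below only builds such request URLs. The base URL is a hypothetical placeholder, since the actual endpoint is not stated in this record. The prefixes `olac` and `oai_dc` are the conventional names for the OLAC and simple Dublin Core formats.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2017S04"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record and one format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    for prefix in ("olac", "oai_dc"):
        print(get_record_url(BASE_URL, OAI_IDENTIFIER, prefix))
```

The response to such a request is an XML document containing the record in the requested metadata format.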

Search Info
Citation: Abdulaziz, Azhar; Kepuska, Veton. 2017. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2017S04
Up-to-date as of: Wed Oct 29 7:01:41 EDT 2025
