OLAC Record
oai:www.ldc.upenn.edu:LDC2017S07

Metadata
Title:CHiME2 Grid
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Vincent, Emmanuel, et al. CHiME2 Grid LDC2017S07. Web Download. Philadelphia: Linguistic Data Consortium, 2017
Contributor:Vincent, Emmanuel
Barker, Jon
Watanabe, Shinji
Le Roux, Jonathan
Nesta, Francesco
Matassoni, Marco
Date (W3CDTF):2017
Date Issued (W3CDTF):2017-04-17
Description:*Introduction*
CHiME2 Grid was developed as part of the 2nd CHiME Speech Separation and Recognition Challenge and contains approximately 120 hours of English speech in a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (ASR) in real-world environments. CHiME2 Grid corresponds to the small-vocabulary track of the CHiME2 Challenge. The target utterances were taken from the Grid corpus and consist of simple 6-word sequences read by 34 speakers. LDC has also released CHiME2 WSJ0 (LDC2017S10) and CHiME3 (LDC2017S24).
*Data*
The data is divided into training, development and test sets. All data is provided as 16-bit WAV files sampled at 16 kHz. The noisy utterances are provided in both isolated and embedded form. Embedded utterances either have five seconds of background noise before and after the utterance (training set) or are mixed into continuous five-minute background noise recordings (development and test sets). Seven hours of background noise that are not part of the training set are also included. The data is accompanied by one annotation file per speaker containing additional technical information. Also included are a baseline Hidden Markov Model (HMM)-based speech recogniser and a scoring tool designed for the 2nd CHiME Challenge; together they allow users to obtain keyword recognition scores from formatted result files, perform recognition on and score the challenge data, and estimate the parameters of speaker-dependent HMMs.
*Samples*
Sample audio is provided for the following conditions: Clean, Embedded, Isolated and Reverberated.
*Updates*
None at this time.
Extent:Corpus size: 26680632 KB
Format:Sampling Rate: 16000 Hz
Sampling Format: pcm
Identifier:LDC2017S07
https://catalog.ldc.upenn.edu/LDC2017S07
ISBN: 1-58563-796-3
ISLRN: 134-467-387-379-1
DOI: 10.35111/g9fy-kd36
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2017S07
Rights Holder:Portions © 2017 Inria Nancy - Grand Est, University of Sheffield, Mitsubishi Electric Research Labs, Fondazione Bruno Kessler, © 2017 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2017S07
DateStamp:  2020-11-30

Search Info

Citation: Vincent, Emmanuel; Barker, Jon; Watanabe, Shinji; Le Roux, Jonathan; Nesta, Francesco; Matassoni, Marco. 2017. CHiME2 Grid. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2017S07
Up-to-date as of: Fri Dec 6 7:48:37 EST 2024