OLAC Record
oai:www.ldc.upenn.edu:LDC2018S14

Metadata
Title:AISHELL-1
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Bu, Hui. AISHELL-1 LDC2018S14. Hard Drive. Philadelphia: Linguistic Data Consortium, 2018
Contributor:Bu, Hui
Date (W3CDTF):2018
Date Issued (W3CDTF):2018-11-15
Description:*Introduction* AISHELL-1 was developed by Beijing Shell Shell Technology Co., Ltd. It contains approximately 520 hours of Chinese Mandarin speech from 400 speakers recorded simultaneously on three different devices with associated transcripts. The goal of the collection was to support speech recognition system development in 11 domains, five of which are included in this corpus: Finance, Science & Technology, Sports, Entertainment, and News. Participants read 500 sentences covering the domains; sentences were chosen for their speech and phonetic characteristics. Speakers were recruited from different accent areas across China, including North, South and Yue-Gui-Min regions. There were 214 female speakers and 186 male speakers, constituting 53% and 47% of the database, respectively. Additional demographic information about the participants is included in this release. *Data* Speech was recorded in a quiet indoor environment on a high-fidelity microphone and two mobile phones (Android and iOS). All speech is presented as 16-bit FLAC-compressed WAV files; the microphone speech sample rate is 44.1 kHz and the phone speech sample rate is 16 kHz. Each speech file ranges from approximately 1 second to 14 seconds in length. Transcripts are stored as UTF-8 encoded plain text files and are not time-aligned. *Samples* Please view the following samples: * Microphone * Android * iOS * Transcript *Updates* None at this time.
Extent:Corpus size: 47897192 KB
Format:Sampling Rate: 44100
Sampling Format: pcm
Identifier:LDC2018S14
https://catalog.ldc.upenn.edu/LDC2018S14
ISBN: 1-58563-866-8
ISLRN: 733-251-884-636-1
Language:Mandarin Chinese
Language (ISO639):cmn
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/LDC%20User%20Agreement%20for%20Non-Members.pdf
Medium:Distribution: Hard Drive
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2018S14
Rights Holder:Portions © 2018 Beijing Shell Shell Technology Co., Ltd., © 2018 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Text
Type (OLAC):primary_text
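
The speaker counts in the Description field and the size in the Extent field can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of the record itself; the constant names are illustrative, and because the record does not say whether "KB" means 1000 or 1024 bytes, both readings are computed as an assumption.

```python
# Figures copied from the Description and Extent fields of this record.
FEMALE_SPEAKERS = 214
MALE_SPEAKERS = 186
CORPUS_SIZE_KB = 47897192

total_speakers = FEMALE_SPEAKERS + MALE_SPEAKERS
female_share = 100 * FEMALE_SPEAKERS / total_speakers
male_share = 100 * MALE_SPEAKERS / total_speakers

# Assumption: the unit "KB" is ambiguous in the record, so the size is
# converted under both the decimal (1000-byte) and binary (1024-byte) reading.
size_gb_decimal = CORPUS_SIZE_KB * 1000 / 1000**3
size_gib_binary = CORPUS_SIZE_KB * 1024 / 1024**3

print(f"speakers: {total_speakers}")
print(f"female: {female_share:.1f}%  male: {male_share:.1f}%")
print(f"size: {size_gb_decimal:.1f} GB (decimal) or {size_gib_binary:.1f} GiB (binary)")
```

The exact shares come out at 53.5% and 46.5%, which the Description reports as 53% and 47%.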

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2018S14
DateStamp:  2019-12-12
GetRecord:  OAI-PMH request for simple DC format
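
The GetRecord entries above refer to OAI-PMH requests for this identifier in two metadata formats. As a rough sketch of how such a request URL is assembled: the `verb`, `identifier`, and `metadataPrefix` parameters come from the OAI-PMH protocol, and `olac` and `oai_dc` are the usual prefixes for the OLAC and simple DC formats. The base URL below is a hypothetical placeholder, since this record does not state the repository endpoint, and `get_record_url` is an illustrative helper name.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the repository's base URL.
BASE_URL = "https://example.org/oai"

OAI_ID = "oai:www.ldc.upenn.edu:LDC2018S14"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_ID, "olac")    # OLAC format
dc_url = get_record_url(OAI_ID, "oai_dc")    # simple Dublin Core format

print(olac_url)
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier, which is the safe form for a query string.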

Search Info

Citation: Bu, Hui. 2018. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Sound dcmi_Text iso639_cmn olac_primary_text
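
Each search term above appears to pair a facet prefix with a value, separated by the first underscore. The sketch below groups the terms by that prefix; this reading of the term format is an assumption drawn from the line itself, and `group_terms` is an illustrative helper name.

```python
from collections import defaultdict

TERMS = "area_Asia country_CN dcmi_Sound dcmi_Text iso639_cmn olac_primary_text"


def group_terms(terms):
    """Group space-separated 'prefix_value' terms by their prefix."""
    facets = defaultdict(list)
    for term in terms.split():
        # Split on the first underscore only, so 'primary_text' stays whole.
        prefix, _, value = term.partition("_")
        facets[prefix].append(value)
    return dict(facets)


facets = group_terms(TERMS)
for prefix, values in facets.items():
    print(prefix, values)
```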


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2018S14
Up-to-date as of: Sat Jan 18 13:58:31 EST 2020