OLAC Record
oai:catalogue.elra.info:ELRA-S0391

Metadata
Title:The FAME! Speech Corpus
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF):2017-04-07
Date Issued (W3CDTF):2017-04-07
Date Modified (W3CDTF):2017-04-07
Description:The components of the Frisian data collection are speech and language resources gathered for building a large-vocabulary ASR system for the Frisian language. First, a new broadcast database was created by collecting recordings from the archives of the regional broadcaster Omrop Fryslân and annotating them with information such as language switches and speaker details. The second component of this collection is a language model created on a text corpus with diverse vocabulary. Third, a Frisian phonetic dictionary with mappings between Frisian words and phones was built to make ASR viable for this under-resourced language. Finally, an ASR recipe is provided which uses all the previous resources to perform recognition and report the recognition accuracies. The corpus consists of 203 audio segments, each approximately 5 minutes long, extracted from various radio programs covering a time span of almost 50 years (1966-2015), which adds a longitudinal dimension to the database. The content of the recordings is very diverse, including radio programs about culture, history, literature, sports, nature, agriculture, politics, society and languages. The total duration of the manually annotated radio broadcasts is 18 hours, 33 minutes and 57 seconds. The stereo audio data has a sampling frequency of 48 kHz and 16-bit resolution per sample. The available meta-information helped the annotators to identify the speakers and mark them either using their names or the same label (if the name is not known). There are 309 identified speakers in the FAME! Speech Corpus, 21 of whom appear at least 3 times in the database. These speakers are mostly program presenters and celebrities appearing multiple times in different recordings over the years. There are 233 unidentified speakers due to lack of meta-information. The total number of word- and sentence-level code-switching cases in the FAME! Speech Corpus is 3837. Music portions have been replaced by noise, except where these overlap with speech.
Identifier:ELRA-S0391
ISLRN: 340-994-352-616-4
Identifier (URI):https://catalog.elra.info/en-us/repository/browse/ELRA-S0391/
Language:Western Frisian
Dutch; Flemish
Language (ISO639):fry
nld
Medium:Not specified
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0391
DateStamp:  2017-04-07
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2017. ELRA (European Language Resources Association).
Terms: area_Europe country_NL dcmi_Sound iso639_fry iso639_nld olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0391
Up-to-date as of: Fri Apr 19 6:30:26 EDT 2024