OLAC Record
oai:www.ldc.upenn.edu:LDC2021S01

Metadata
Title:Althingi Parliamentary Speech
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Helgadóttir, Inga Rún, et al. Althingi Parliamentary Speech LDC2021S01. Web Download. Philadelphia: Linguistic Data Consortium, 2021
Contributor:Helgadóttir, Inga Rún
Kjaran, Róbert
Nikulásdóttir, Anna Björk
Gudnason, Jon
Date (W3CDTF):2021
Date Issued (W3CDTF):2021-02-15
Description:*Introduction* Althingi Parliamentary Speech consists of approximately 542 hours of recorded speech from Althingi, the Icelandic Parliament, along with corresponding transcripts, a pronunciation dictionary and two language models. The speeches date from 2005 to 2016. The dataset was collected in 2016 by the ASR for Althingi project at Reykjavik University in collaboration with the Althingi speech department. The purpose of that project was to develop an automatic speech recognition (ASR) system for parliamentary speech to replace the manual transcription of delivered speeches.
*Data* The mean speech length is six minutes, with speeches ranging from under one minute to around thirty minutes. The corpus features 197 speakers (105 male, 92 female) and is split into training, development and evaluation sets. The language models are of two types: a pruned trigram model, used in decoding, and an unpruned constant ARPA 5-gram model, used for re-scoring decoding results. Audio data is presented as single-channel 16-bit mp3 files; the majority of these files have a sample rate of 44.1 kHz. Transcripts and other text data are plain text encoded in UTF-8.
*Updates* None at this time.
*Additional Citation* When publishing results based on the texts in the corpus, please refer to: Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir and Jón Guðnason, 2017. Building an ASR corpus using Althingi’s Parliamentary Speeches. Proceedings of Interspeech 2017.
Extent:Corpus size: 19477181 KB
Format:Sampling Rate: 44100 Hz
Sampling Format: mp3
Identifier:LDC2021S01
https://catalog.ldc.upenn.edu/LDC2021S01
ISBN: 1-58563-956-7
ISLRN: 142-519-062-218-1
DOI: 10.35111/695b-6697
Language:Icelandic
Language (ISO639):isl
License:Althingi Parliamentary Speech Agreement (For-Profit): https://catalog.ldc.upenn.edu/license/althingi-parliamentary-speech-agreement-for-profit.pdf
Althingi Parliamentary Speech Agreement (Non-Member): https://catalog.ldc.upenn.edu/license/althingi-parliamentary-speech-agreement-non-member.pdf
Althingi Parliamentary Speech Agreement (Not-For-Profit): https://catalog.ldc.upenn.edu/license/althingi-parliamentary-speech-agreement-not-for-profit.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2021S01
Rights Holder:Portions © 2021 Reykjavik University, © 2021 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2021S01
DateStamp:  2022-01-01
GetRecord:  OAI-PMH request for simple DC format
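The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix. The base URL below is a hypothetical placeholder, since this record does not state the repository's OAI-PMH endpoint. The `oai_dc` prefix is the simple Dublin Core format defined by OAI-PMH; the `olac` prefix for the OLAC format is an assumption here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not give the OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_ID = "oai:www.ldc.upenn.edu:LDC2021S01"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for a single record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Simple Dublin Core request (oai_dc is required of every OAI-PMH repository).
dc_url = get_record_url(OAI_ID, "oai_dc")
# OLAC-format request; the "olac" prefix is assumed, not taken from this record.
olac_url = get_record_url(OAI_ID, "olac")

print(dc_url)
print(olac_url)
```

The colons in the identifier are percent-encoded by `urlencode`, which keeps the query string valid regardless of the characters an identifier contains.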

Search Info

Citation: Helgadóttir, Inga Rún; Kjaran, Róbert; Nikulásdóttir, Anna Björk; Gudnason, Jon. 2021. Althingi Parliamentary Speech. Linguistic Data Consortium.
Terms: area_Europe country_IS dcmi_Sound dcmi_Text iso639_isl olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2021S01
Up-to-date as of: Mon Mar 25 7:21:13 EDT 2024