Title:Slovenian parliamentary corpus SlovParl 2.0
Bibliographic Citation:http://hdl.handle.net/11356/1167
Creator:Pančur, Andrej
Šorn, Mojca
Erjavec, Tomaž
Date (W3CDTF):2017-11-24T20:28:49Z
Date Available:2017-11-24T20:28:49Z
Description:The SlovParl corpus contains the minutes of the Assembly of the Republic of Slovenia for the legislative period 1990-1992, i.e., it covers the period before, during, and after Slovenia became an independent country in 1991. The corpus comprises 232 sessions, 58,813 speeches and 10.8 million words. It contains extensive metadata about the speakers, a typology of sessions, etc., as well as structural and editorial annotations. This item comprises three datasets: (1) the corpus in TEI (module Transcriptions of speech); (2) the corpus in TEI with added automatic linguistic annotation: tokenisation, MSD tagging and lemmatisation; (3) the corpus in the vertical format used by various concordancers, e.g. CWB and Sketch Engine; this format is simpler and smaller but does not contain all the information from the source TEI. The SlovParl data originally come from https://github.com/SIstory/SlovParl but have been converted to use the TEI elements for speech. The first version of this resource is presented in the paper: Pančur, Andrej. "Označevanje zbirke zapisnikov sej slovenskega parlamenta s smernicami TEI" [Encoding the collection of minutes of the Slovenian parliament's sessions according to the TEI Guidelines]. In Proceedings of the Conference on Language Technologies & Digital Humanities, edited by Tomaž Erjavec and Darja Fišer, 142-148. Ljubljana: Znanstvena založba Filozofske fakultete v Ljubljani, 2016.
Identifier (URI):http://hdl.handle.net/11356/1167
Language (ISO639):slv
Publisher:Institute of Contemporary History
Replaces (URI):http://hdl.handle.net/11356/1075
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
Subject:Slovenian Parliament
parliamentary debates
Type (DCMI):Text
Type (OLAC):primary_text


Citation: Pančur, Andrej; Šorn, Mojca; Erjavec, Tomaž. 2017. Slovenian parliamentary corpus SlovParl 2.0. Institute of Contemporary History. http://hdl.handle.net/11356/1167.
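The vertical format mentioned in the description is, in the CWB convention, one token per line with tab-separated positional attributes, while structural units (texts, sentences, etc.) appear as XML-like tags on lines of their own. The following is a minimal sketch of reading such a file, assuming a CWB-style layout. The column order (word, lemma, MSD), the function names `parse_vertical` and `lemma_frequencies`, and the sample lines are hypothetical and for illustration only; check the actual SlovParl vertical file for its real attributes.

```python
from collections import Counter

# ASSUMPTION: hypothetical positional-attribute order; the real SlovParl
# vertical file may use different or additional columns.
COLUMNS = ("word", "lemma", "msd")


def parse_vertical(lines):
    """Parse CWB-style vertical text into a list of sentences.

    Lines starting with '<' are structural markup (e.g. <text>, <s>, </s>)
    and are skipped, except that '</s>' closes the current sentence.
    Every other non-empty line is a token whose tab-separated fields are
    mapped onto COLUMNS. Each sentence is a list of token dicts.
    """
    sentences = []
    current = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("<"):
            if line.strip() == "</s>" and current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        current.append(dict(zip(COLUMNS, fields)))
    if current:  # keep tokens left over without a closing </s>
        sentences.append(current)
    return sentences


def lemma_frequencies(sentences):
    """Count lemmas, falling back to the word form if no lemma column exists."""
    return Counter(
        tok.get("lemma", tok["word"]) for sent in sentences for tok in sent
    )


# Hypothetical sample in the assumed layout (tags are illustrative only).
sample = [
    '<text id="example">',
    "<s>",
    "Gospod\tgospod\tNcmsn",
    "predsednik\tpredsednik\tNcmsn",
    ".\t.\tZ",
    "</s>",
    "</text>",
]

sents = parse_vertical(sample)
print(len(sents), lemma_frequencies(sents).most_common(1))
```

Because the vertical format drops much of the TEI information (as the description notes), speaker metadata and editorial annotations should be taken from the TEI datasets instead.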
