OLAC Record
oai:lindat.mff.cuni.cz:11234/1-4840

Metadata
Title:SiR 1.0
Bibliographic Citation:http://hdl.handle.net/11234/1-4840
Creator:Hladká, Barbora
Mírovský, Jiří
Kopp, Matyáš
Moravec, Václav
Date (W3CDTF):2022-09-21T09:26:48Z
Date Available:2022-09-21T09:26:48Z
Description:SiR 1.0 is a corpus of Czech articles published on iRozhlas, a news server of a Czech public radio (https://www.irozhlas.cz/). It is a collection of 1 718 articles (42 890 sentences, 614 995 words) with manually annotated attribution of citation phrases and sources. The sources are classified into several classes of named and unnamed sources. The corpus consists of three parts, depending on the quality of the annotations: (i) triple-annotated articles: 46 articles (933 sentences, 13 242 words) annotated independently by three annotators and subsequently curated by an arbiter, (ii) double-annotated articles: 543 articles (12 347 sentences, 180 622 words) annotated independently by two annotators and automatically unified, and (iii) single-annotated articles: 1 129 articles (29 610 sentences, 421 131 words) annotated each only by a single annotator. The data were annotated in the Brat tool (https://brat.nlplab.org/) and are distributed in the Brat native format, i.e. each article is represented by the original plain text and a stand-off annotation file. Please cite the following paper when using the corpus for your research: Hladká Barbora, Jiří Mírovský, Matyáš Kopp, Václav Moravec. Annotating Attribution in Czech News Server Articles. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 1817–1823, Marseille, France 20-25 June 2022.
Identifier (URI):http://hdl.handle.net/11234/1-4840
Language:Czech
Language (ISO639):ces
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:news server articles
attribution
attribution signals
attribution sources
annotation
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-4840
DateStamp:  2023-01-10
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Hladká, Barbora; Mírovský, Jiří; Kopp, Matyáš; Moravec, Václav. 2022. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-4840
Up-to-date as of: Thu Oct 5 0:43:21 EDT 2023