Title:Slovene coreference resolution corpus coref149
Bibliographic Citation:http://hdl.handle.net/11356/1182
Creator:Žitnik, Slavko
Date (W3CDTF):2018-03-23T18:56:26Z
Date Available:2018-03-23T18:56:26Z
Description:This corpus contains a subset of the ssj500k v1.4 corpus (http://hdl.handle.net/11356/1052). Each of the 149 documents is a paragraph from ssj500k with at least 100 words and at least 6 named entities. The data is in TCF format, exported from the WebAnno tool (https://webanno.github.io/webanno/). The annotated entities are of type person, organization or location. Mentions are annotated as coreference chains, without further classification into coreference types. The annotations also include implicit mentions, which are specific to the Slovene language; in such cases the verb is tagged. The corpus consists of 1277 entities, 2329 mentions, 831 singleton entities, 40 appositions and 215 overlapping mentions. Overlapping mentions of the same entity are annotated as well: for example, in the text [strokovnega direktorja KC [Zorana Arneža]] two overlapping mentions refer to the same entity. There are 97 such mentions in the corpus. In the public source code repository https://bitbucket.org/szitnik/nutie-core, the class TEIP5Importer contains an additional function that reads the dataset and merges it with the ssj500k dataset.
Identifier (URI):http://hdl.handle.net/11356/1182
Language (ISO639):slv
Publisher:Faculty of Computer and Information Science, University of Ljubljana
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Subject:coreference resolution
Type (DCMI):Text
Type (OLAC):primary_text
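
Example: the Description field says the data is TCF exported from WebAnno, with mentions grouped into coreference chains. The following is a minimal sketch of how such chains might be read and counted. It assumes, without having been checked against the corpus files, that tokens appear as `token` elements with an `ID` attribute and that chains appear as `entity` elements under `references`, each holding `reference` elements with a `tokenIDs` attribute. The function names `read_chains` and `summarize` and the inline sample are hypothetical and made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in an assumed TCF-like layout (not taken from the corpus files).
SAMPLE = """<D-Spin xmlns="http://www.dspin.de/data">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="sl">
    <tokens>
      <token ID="t_0">strokovnega</token>
      <token ID="t_1">direktorja</token>
      <token ID="t_2">KC</token>
      <token ID="t_3">Zorana</token>
      <token ID="t_4">Arneža</token>
    </tokens>
    <references>
      <entity>
        <reference ID="rc_0" tokenIDs="t_0 t_1 t_2 t_3 t_4"/>
        <reference ID="rc_1" tokenIDs="t_3 t_4"/>
      </entity>
      <entity>
        <reference ID="rc_2" tokenIDs="t_2"/>
      </entity>
    </references>
  </TextCorpus>
</D-Spin>"""


def local(tag):
    """Strip an XML namespace, turning '{uri}token' into 'token'."""
    return tag.rsplit("}", 1)[-1]


def read_chains(xml_text):
    """Return one list of mention strings per coreference chain (assumed layout)."""
    root = ET.fromstring(xml_text)
    tokens = {}   # token ID -> surface form
    chains = []   # one list of token-ID lists per entity
    for elem in root.iter():
        name = local(elem.tag)
        if name == "token":
            tokens[elem.get("ID")] = elem.text or ""
        elif name == "references":
            for entity in elem:
                if local(entity.tag) != "entity":
                    continue
                chains.append([
                    (ref.get("tokenIDs") or "").split()
                    for ref in entity
                    if local(ref.tag) == "reference"
                ])
    # Resolve token IDs to text only after the whole document has been scanned.
    return [
        [" ".join(tokens.get(i, "?") for i in ids) for ids in chain]
        for chain in chains
    ]


def summarize(chains):
    """Count entities, mentions and singleton entities (chains with one mention)."""
    return {
        "entities": len(chains),
        "mentions": sum(len(chain) for chain in chains),
        "singletons": sum(1 for chain in chains if len(chain) == 1),
    }


if __name__ == "__main__":
    print(summarize(read_chains(SAMPLE)))
```

On the made-up sample this yields two entities, three mentions and one singleton. The first chain contains two overlapping mentions of the same entity, mirroring the bracketed example in the Description. Matching on local tag names rather than full namespaces is a convenience choice for the sketch, so that it does not depend on the exact namespace URI used in the export.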


Citation: Žitnik, Slavko. 2018. Slovene coreference resolution corpus coref149. Faculty of Computer and Information Science, University of Ljubljana. http://hdl.handle.net/11356/1182
