OLAC Record
oai:www.clarin.si:11356/1081

Metadata
Title:CMC training corpus Janes-Tag 1.1
Bibliographic Citation:http://hdl.handle.net/11356/1081
Creator:Erjavec, Tomaž
Fišer, Darja
Čibej, Jaka
Arhar Holdt, Špela
Date (W3CDTF):2016-12-28T11:40:50Z
Date Available:2016-12-28T11:40:50Z
Description:Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Slovene. As the corpus has been carefully manually annotated, it is also suitable for detailed linguistic explorations which require highly accurate and reliable annotations. The corpus is further described in: ERJAVEC, Tomaž, ČIBEJ, Jaka, ARHAR HOLDT, Špela, LJUBEŠIĆ, Nikola, FIŠER, Darja. Gold-standard datasets for annotation of Slovene computer-mediated communication. In Proceedings of RASLAN 2016: Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2016, pp. 29-40 (https://nlp.fi.muni.cz/raslan/raslan16.pdf). Note that a related corpus, Janes-Norm, is also available; see http://hdl.handle.net/11356/1083.
Identifier (URI):http://hdl.handle.net/11356/1081
Is Replaced By (URI):http://hdl.handle.net/11356/1085
Language:Slovenian
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Replaces (URI):http://hdl.handle.net/11356/1079
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
Subject:computer-mediated communication
tokenisation
word normalisation
tagging
lemmatisation
manual annotation
TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1081
DateStamp:  2018-10-18
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka; Arhar Holdt, Špela. 2016. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1081
Up-to-date as of: Tue Aug 20 10:27:04 EDT 2019