OLAC Record
oai:www.clarin.si:11356/1086

Metadata
Title:CMC training corpus Janes-Syn 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1086
Creator:Arhar Holdt, Špela
Erjavec, Tomaž
Fišer, Darja
Date (W3CDTF):2017-01-03T11:38:46Z
Date Available:2017-01-03T11:38:46Z
Description:Janes-Syn is a syntactically annotated corpus of Slovene tweets, intended as a gold-standard training and testing dataset for syntactic annotation of Slovene computer-mediated communication and for detailed linguistic explorations that require highly accurate and reliable annotations. Words in the dataset are normalised, lemmatised, PoS-tagged and syntactically annotated with the JOS dependency model (http://eng.slovenscina.eu/tehnologije/razclenjevalnik). The annotations on all levels were manually corrected. The corpus creation and structure are described in: ARHAR HOLDT, Špela, FIŠER, Darja, ERJAVEC, Tomaž, KREK, Simon. Syntactic annotation of Slovene CMC: first steps. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities, 27-28 September 2016, Ljubljana, Slovenia, 2016, pp. 3-6 (http://nl.ijs.si/janes/cmc-corpora2016/proceedings/). Janes-Syn was created from two larger corpora that are also available in the repository: Janes-Norm (http://hdl.handle.net/11356/1084) and Janes-Tag (http://hdl.handle.net/11356/1123).
Identifier (URI):http://hdl.handle.net/11356/1086
Language:Slovenian
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
Subject:computer-mediated communication
tokenisation
word normalisation
tagging
lemmatisation
dependency treebank
syntactic annotation
manual annotation
TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1086
DateStamp:  2018-10-24
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Arhar Holdt, Špela; Erjavec, Tomaž; Fišer, Darja. 2017. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1086
Up-to-date as of: Tue Aug 20 10:27:04 EDT 2019