OLAC Record
oai:www.clarin.si:11356/1200

Metadata
Title: Training corpus SETimes.SR 1.0
Bibliographic Citation: http://hdl.handle.net/11356/1200
Creator: Batanović, Vuk
Ljubešić, Nikola
Samardžić, Tanja
Erjavec, Tomaž
Date (W3CDTF): 2018-08-25T15:02:15Z
Date Available: 2018-08-25T15:02:15Z
Description: The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities. The annotations (and other aspects) of the corpus are documented in the teiHeader and back element of the TEI-encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications, http://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf.
Identifier (URI): http://hdl.handle.net/11356/1200
Language: Serbian
Language (ISO639): srp
Publisher: Regional Linguistic Data Initiative Centre ReLDI
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
Subject: part-of-speech tagging
dependency treebank
parsing
named entities
tokenisation
manual annotation
TEI
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1200
DateStamp:  2019-10-10
GetRecord:  OAI-PMH request for simple DC format
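
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a rough sketch, such a request URL could be assembled as follows. The base URL is a placeholder, since this record does not state the repository's OAI-PMH endpoint, and the function name is illustrative. The `verb`, `identifier`, and `metadataPrefix` parameters come from the OAI-PMH protocol; `oai_dc` is its simple Dublin Core prefix, and `olac` is assumed here to be the prefix the repository uses for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL. No network access is performed."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,           # percent-encoded by urlencode
        "metadataPrefix": metadata_prefix,  # e.g. "oai_dc" or "olac"
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    oai_id = "oai:www.clarin.si:11356/1200"
    print(get_record_url(oai_id, "oai_dc"))  # simple DC format
    print(get_record_url(oai_id, "olac"))    # OLAC format (prefix assumed)
```

The colons and slash in the identifier are percent-encoded because they are reserved characters in a URL query string.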

Search Info

Citation: Batanović, Vuk; Ljubešić, Nikola; Samardžić, Tanja; Erjavec, Tomaž. 2018. Regional Linguistic Data Initiative Centre ReLDI.
Terms: area_Europe country_RS dcmi_Text iso639_srp olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1200
Up-to-date as of: Fri Jan 10 9:22:53 EST 2020