OLAC Record
oai:www.clarin.si:11356/1029

Metadata
Title:Training corpus ssj500k 1.3
Bibliographic Citation:http://hdl.handle.net/11356/1029
Creator:Krek, Simon
Erjavec, Tomaž
Dobrovoljc, Kaja
Može, Sara
Ledinek, Nina
Holz, Nanika
Date (W3CDTF):2015-05-17T19:14:37Z
Date Available:2015-05-17T19:14:37Z
Description:The ssj500k training corpus is based on two training corpora built within the JOS project (http://nl.ijs.si/jos/). It combines the jos100k corpus with additional material from the jos1M corpus to form a training corpus of 500,000 words, manually checked and annotated at the levels of tokenization, segmentation, morphosyntactic tagging, syntactic dependency parsing and named entities. The ssj500k corpus uses the JOS morphosyntactic tagset with 1,902 tags and dependencies with 10 labels. The part of the corpus annotated with dependency relations contains 11,411 sentences; named entities are annotated in the original jos100k part of the corpus.
Identifier (URI):http://hdl.handle.net/11356/1029
Is Replaced By (URI):http://hdl.handle.net/11356/1052
Language:Slovenian
Language (ISO639):slv
Publisher:Centre for Language Resources and Technologies, University of Ljubljana
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
https://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:tagging
dependency treebank
parsing
named entities
tokenisation
manual annotation
TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1029
DateStamp:  2017-10-13
GetRecord:  OAI-PMH request for simple DC format
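
The two GetRecord entries above refer to standard OAI-PMH requests. As a rough sketch, such a request is an HTTP GET with the arguments verb, identifier and metadataPrefix. The base URL below is a placeholder (this record does not state the repository's OAI-PMH endpoint), the helper name get_record_url is hypothetical, and the prefix "olac" is assumed to be the one the repository uses for the OLAC format; "oai_dc" is the prefix OAI-PMH defines for simple Dublin Core.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (assumption): the record does not give the
# repository's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.clarin.si:11356/1029"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" is an assumed prefix for the OLAC format; "oai_dc" is simple DC.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

# Decode the query string again to confirm the identifier survives URL encoding.
olac_params = parse_qs(urlsplit(olac_url).query)
dc_params = parse_qs(urlsplit(dc_url).query)

print(olac_url)
print(dc_url)
```

Replacing BASE_URL with the repository's real endpoint and fetching the resulting URL should return an XML response that wraps the record in the requested metadata format.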

Search Info

Citation: Krek, Simon; Erjavec, Tomaž; Dobrovoljc, Kaja; Može, Sara; Ledinek, Nina; Holz, Nanika. 2015. Centre for Language Resources and Technologies, University of Ljubljana.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1029
Up-to-date as of: Tue Aug 20 10:26:52 EDT 2019