OLAC Record
oai:www.clarin.si:11356/1037

Metadata
Title:Training corpus jos1M 1.1
Bibliographic Citation:http://hdl.handle.net/11356/1037
Creator:Erjavec, Tomaž; Krek, Simon
Date (W3CDTF):2015-06-06T22:24:21Z
Date Available:2015-06-06T22:24:21Z
Description:The jos1M corpus contains 1 million words of sampled paragraphs from the FidaPLUS corpus. It is meant to serve as a training corpus for word-level tagging of Slovene. This silver-standard corpus is annotated for morphosyntactic descriptions (fine-grained PoS tags) and lemmas, with about one quarter of the most problematic annotations hand-validated. The corpus is available in source TEI P5 XML and in the simpler and smaller vertical format used by various concordancers. Note that the vertical format does not contain all of the information from the source TEI.
Identifier (URI):http://hdl.handle.net/11356/1037
Is Replaced By (URI):http://hdl.handle.net/11356/1213
Language:Slovenian
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0); https://creativecommons.org/licenses/by-nc/4.0/
Subject:tagging; lemmatisation; manual annotation; TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
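
The vertical format mentioned in the description is typically one token per line with tab-separated annotation columns, interleaved with XML-like structural tags. The following is a minimal reading sketch only. The column order (word, lemma, morphosyntactic tag) and the sample lines are assumptions made for illustration and are not taken from the corpus itself.

```python
def read_vertical(lines):
    """Collect token rows from vertical-format lines, skipping structural tags."""
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and XML-like structural markup such as <s> or </p>.
        if not line or line.startswith("<"):
            continue
        # Each remaining line is assumed to hold tab-separated columns.
        tokens.append(tuple(line.split("\t")))
    return tokens


# Invented sample; the real files may use a different column order or tag set.
sample = ["<s>", "Danes\tdanes\tRsn", "je\tbiti\tGp-ste-n", "</s>"]
tokens = read_vertical(sample)
```

Keeping each row as a tuple leaves the interpretation of the columns to the caller, which is safer while the actual layout of the files is unconfirmed.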

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1037
DateStamp:  2019-02-13
GetRecord:  OAI-PMH request for simple DC format
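
The GetRecord entries above correspond to OAI-PMH requests. As a rough sketch, such a request can be assembled from the record identifier and a metadata prefix. The base URL below is a placeholder, since this record does not state the repository's OAI-PMH endpoint; "oai_dc" is the standard prefix for simple DC, and "olac" is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not given in this record.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple DC request for this record; pass metadata_prefix="olac" for OLAC (assumed prefix).
url = get_record_url("oai:www.clarin.si:11356/1037")
```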

Search Info

Citation: Erjavec, Tomaž; Krek, Simon. 2015. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1037
Up-to-date as of: Thu Sep 26 21:22:17 EDT 2019