OLAC Record
oai:www.clarin.si:11356/1025

Metadata
Title:Reference corpus of historical Slovene goo300k 1.2
Bibliographic Citation:http://hdl.handle.net/11356/1025
Creator:Erjavec, Tomaž
Date (W3CDTF):2015-05-07T18:23:33Z
Date Available:2015-05-07T18:23:33Z
Description:goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts written between 1584 and 1899. Each text is accompanied by extensive metadata and per-page links to facsimiles, and each word token is annotated with its modernised word-form, lemma and part-of-speech; archaic words are additionally annotated with their nearest modern synonyms or a short explanation. The corpus is available in the source TEI P5 XML format and in the simpler and smaller vertical format used by various concordancers. Note that the vertical format does not contain all the information from the source TEI.
Identifier (URI):http://hdl.handle.net/11356/1025
Language:Slovenian
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
Subject:historical language
word modernisation
lemmatisation
tagging
manual annotation
TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1025
DateStamp:  2018-10-24
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Erjavec, Tomaž. 2015. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1025
Up-to-date as of: Wed Jul 17 9:50:16 EDT 2019