OLAC Record

Title: Written corpus ccGigafida 1.0
Bibliographic Citation: http://hdl.handle.net/11356/1035
Creator: Logar, Nataša
Creator: Erjavec, Tomaž
Creator: Krek, Simon
Creator: Grčar, Miha
Creator: Holozan, Peter
Date (W3CDTF): 2015-06-01T09:01:03Z
Date Available: 2015-06-01T09:01:03Z
Description: Corpus ccGigafida consists of paragraph samples from 31,722 documents, each with information about its source (e.g. newspapers, magazines), year of publication, text type (fiction, newspaper), and, where known, title and author. The corpus is lemmatised and annotated with morphosyntactic descriptions (PoS-tagged). It is encoded in XML following the TEI P5 guidelines (Text Encoding Initiative). ccGigafida contains approximately 9% of the Gigafida corpus, a reference corpus of Slovene: http://eng.slovenscina.eu/korpusi/gigafida. The corpus is available in the source TEI-like XML and in the simpler, smaller vertical format used by various concordancers. The XML file has PoS (MSD) tags in Slovenian only, while the vertical file has tags in both Slovenian and English. The corpus is also available as plain text, one file per text.
Identifier (URI): http://hdl.handle.net/11356/1035
Language (ISO639): slv
Publisher: Centre for Language Resources and Technologies, University of Ljubljana
Rights: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Type (DCMI): Text
Type (OLAC): primary_text


Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1035
DateStamp:  2017-09-29

Search Info

Citation: Logar, Nataša; Erjavec, Tomaž; Krek, Simon; Grčar, Miha; Holozan, Peter. 2015. Written corpus ccGigafida 1.0. Centre for Language Resources and Technologies, University of Ljubljana.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text

Up-to-date as of: Fri Jan 10 9:22:23 EST 2020