OLAC Record
oai:www.clarin.si:11356/1137

Metadata
Title:Wikipedia talk corpus Janes-Wiki 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1137
Creator:Ljubešić, Nikola; Erjavec, Tomaž; Fišer, Darja
Date (W3CDTF):2017-08-31T07:08:15Z
Date Available:2017-08-31T07:08:15Z
Description:Janes-Wiki is an annotated corpus of discussion pages from the Slovene Wikipedia, covering the period from 2003-08 to 2017-06. The corpus contains page talk and user talk discussions and is structured into individual pages and their comments, together with their metadata. The texts in the corpus are tokenised, sentence-segmented, word-normalised, morphosyntactically tagged, lemmatised and annotated with named entities.
Identifier (URI):http://hdl.handle.net/11356/1137
Language:Slovenian
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/
Subject:computer-mediated communication; Wikipedia; word normalisation; tagging; lemmatisation; named entities; TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1137
DateStamp:  2018-10-29
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Ljubešić, Nikola; Erjavec, Tomaž; Fišer, Darja. 2017. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1137
Up-to-date as of: Tue Aug 20 10:27:11 EDT 2019