OLAC Record
oai:www.clarin.si:11356/1139

Metadata
Title: Forum corpus Janes-Forum 1.0
Bibliographic Citation: http://hdl.handle.net/11356/1139
Creator: Erjavec, Tomaž; Ljubešić, Nikola; Fišer, Darja
Date (W3CDTF): 2017-08-31T07:16:34Z
Date Available: 2017-08-31T07:16:34Z
Description: Janes-Forum is an annotated corpus of Slovene forums from the websites med.over.net, avtomobilizem.com and kvarkadabra.net, covering the period from 2001-02 to 2015-01. The corpus is structured into forums, threads and posts, together with their metadata. The texts in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. To protect privacy and comply with the wishes of the platform owners, usernames are not included in the metadata, and named entities of the types 'person', 'person derivative' and 'company name' have been removed from the texts.
Identifier (URI): http://hdl.handle.net/11356/1139
Language: Slovenian
Language (ISO639): slv
Publisher: Jožef Stefan Institute
Rights: Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/
Subject: computer-mediated communication; forums; word normalisation; tagging; lemmatisation; named entities; TEI
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1139
DateStamp:  2018-10-29
GetRecord:  OAI-PMH request for simple DC format
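
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a minimal sketch of how such a request URL could be assembled: the base URL below is a hypothetical placeholder (this record does not state the repository's OAI-PMH endpoint), and the prefixes "oai_dc" for simple DC and "olac" for the OLAC format are the conventional ones, assumed here rather than confirmed by the record. The helper name get_record_url is likewise illustrative.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.clarin.si:11356/1139"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # ':' and '/' are percent-encoded
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "oai_dc" (simple DC) and "olac" (OLAC format).
simple_dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
```

The two resulting URLs differ only in the metadataPrefix argument; replacing the placeholder with the repository's actual endpoint would be needed before sending a request.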

Search Info

Citation: Erjavec, Tomaž; Ljubešić, Nikola; Fišer, Darja. 2017. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1139
Up-to-date as of: Tue Aug 20 10:27:11 EDT 2019