OLAC Record
oai:www.clarin.si:11356/1201

Metadata
Title:Dataset and baseline model of moderated content FRENK-MMC-RTV 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1201
Creator:Ljubešić, Nikola
Erjavec, Tomaž
Fišer, Darja
Date (W3CDTF):2018-10-27T13:50:26Z
Date Available:2018-10-27T13:50:26Z
Description:FRENK-MMC-RTV is a dataset of moderated news comments from the website rtvslo.si, with metadata on the time of publishing, the user identifier, the thread identifier, and whether or not the comment was deleted by the moderators. The full text of each comment is encrypted via a character-replacement method so that the comments are not readable by humans. Basic punctuation is left unencrypted to enable tokenization. The dataset is intended mainly for experiments on automating comment moderation. For real-world usage, a fastText classification model trained on the non-encrypted data is made available as well.
Identifier (URI):http://hdl.handle.net/11356/1201
Language:Slovenian
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
Subject:computer-mediated communication
news comments
content moderation
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
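
Note on the Description: the record does not specify which character-replacement method was used. The Python sketch below is only an illustration of the general idea, assuming a simple one-to-one letter substitution over a hypothetical lowercase Slovenian alphabet. The names ALPHABET, make_substitution_table and obfuscate are invented for this example and are not part of the dataset or its tooling. Punctuation, digits and whitespace pass through unchanged, so whitespace and punctuation based tokenization still works on the masked text.

```python
import random

# Hypothetical alphabet for illustration: basic Latin letters plus č, š, ž.
ALPHABET = "abcdefghijklmnopqrstuvwxyzčšž"


def make_substitution_table(seed: int) -> dict:
    """Build a one-to-one letter mapping from a seeded shuffle of ALPHABET."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def obfuscate(text: str, table: dict) -> str:
    """Replace each letter via the table; leave every other character as-is."""
    # Lowercasing is an assumption of this sketch, not a documented property
    # of the dataset.
    return "".join(table.get(ch, ch) for ch in text.lower())


table = make_substitution_table(seed=0)
sample = "To je komentar, ki ga je moderator izbrisal!"
masked = obfuscate(sample, table)
print(masked)
```

Because the mapping is a permutation of the alphabet, token boundaries, token lengths and punctuation positions are preserved, which is what allows a classifier to be trained on the masked text.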

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1201
DateStamp:  2018-10-27
GetRecord:  OAI-PMH request for simple DC format
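
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch of how such a request URL is formed, the Python snippet below builds the query string from the standard OAI-PMH arguments (verb, identifier, metadataPrefix). The base URL is a placeholder and an assumption, since this record does not state the repository's OAI endpoint. The prefixes "olac" and "oai_dc" are the conventional names for the OLAC and simple DC formats, but the prefixes a given repository actually supports should be checked with its ListMetadataFormats response.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not given in this record.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
IDENTIFIER = "oai:www.clarin.si:11356/1201"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Return an OAI-PMH GetRecord request URL for one record and format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

The identifier is percent-encoded by urlencode, so the colons and the slash in the OAI identifier are safe to pass as a query argument.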

Search Info

Citation: Ljubešić, Nikola; Erjavec, Tomaž; Fišer, Darja. 2018. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1201
Up-to-date as of: Wed Jul 17 9:50:55 EDT 2019