OLAC Record

Title:Automatically sentiment annotated Slovenian news corpus AutoSentiNews 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1109
Creator:Bučar, Jože
Date (W3CDTF):2017-05-10T07:36:04Z
Date Available:2017-05-10T07:36:04Z
Description:The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content. The submission contains 7 files: 5 of them, each named after its news portal, contain raw news in txt format retrieved with R crawlers for five Slovenian web media 1.0 (http://hdl.handle.net/11356/1105). The file AutoSentiNews consists of 5 text files that contain 256,567 news articles annotated as positive, negative or neutral at the document level. 10,427 of them were manually annotated (cf. Manually sentiment annotated Slovenian news corpus SentiNews 1.0, http://hdl.handle.net/11356/1110) and the remaining 246,140 news articles were annotated automatically. The file SloStopWords contains 1,784 stop words for Slovene.
Identifier (URI):http://hdl.handle.net/11356/1109
Language (ISO639):slv
Publisher:Faculty of Information Studies Novo mesto
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject:news corpus; sentiment classification; opinion mining
Type (DCMI):Text
Type (OLAC):primary_text


Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1109
DateStamp:  2018-03-12
GetRecord:  OAI-PMH request for simple DC format
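The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a minimal sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix. The base URL below is a hypothetical placeholder, since the record does not state the repository's OAI-PMH endpoint, and the helper name `get_record_url` is introduced here for illustration only. The prefixes `olac` and `oai_dc` are the conventional names for the OLAC and simple Dublin Core formats.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the OAI-PMH base URL,
# so replace this with the repository's real one before use.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# OaiIdentifier taken from the record above.
OAI_ID = "oai:www.clarin.si:11356/1109"

olac_url = get_record_url(OAI_ID, "olac")     # OLAC format
dc_url = get_record_url(OAI_ID, "oai_dc")     # simple DC format

print(olac_url)
print(dc_url)
```

The colons and slash in the identifier are percent-encoded by `urlencode`, which keeps the query string unambiguous.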

Search Info

Citation: Bučar, Jože. 2017. Faculty of Information Studies Novo mesto.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text

Up-to-date as of: Tue Aug 20 10:27:07 EDT 2019