Title:Automatically sentiment annotated Slovenian news corpus AutoSentiNews 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1109
Creator:Bučar, Jože
Date (W3CDTF):2017-05-10T07:36:04Z
Date Available:2017-05-10T07:36:04Z
Description:The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals cover political, business, economic and financial content. The submission contains 7 files. Five of them, each named after its news portal, contain raw news in txt format retrieved with R crawlers for five Slovenian web media 1.0 (http://hdl.handle.net/11356/1105). The file AutoSentiNews consists of 5 text files that contain 256,567 news articles annotated as positive, negative or neutral at the document level. Of these, 10,427 were manually annotated (cf. Manually sentiment annotated Slovenian news corpus SentiNews 1.0, http://hdl.handle.net/11356/1110) and the remaining 246,140 were annotated automatically. The file SloStopWords consists of 1,784 stop words for Slovene.
Identifier (URI):http://hdl.handle.net/11356/1109
Language (ISO639):slv
Publisher:Faculty of Information Studies Novo mesto
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject:news corpus
sentiment classification
opinion mining
Type (DCMI):Text
Type (OLAC):primary_text


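Citation: Bučar, Jože. 2017. Automatically sentiment annotated Slovenian news corpus AutoSentiNews 1.0. Faculty of Information Studies Novo mesto. http://hdl.handle.net/11356/1109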
