OLAC Record
oai:lindat.mff.cuni.cz:11234/1-5411

Metadata
Title:AlbNews Albanian Topic Modeling
Bibliographic Citation:http://hdl.handle.net/11234/1-5411
Creator:Çano, Erion
Date (W3CDTF):2024-02-19T08:40:46Z
Date Available:2024-02-19T08:40:46Z
Description:AlbNews is a topic modeling corpus of news headlines in Albanian, consisting of 600 labeled samples and 2600 unlabeled samples. Each labeled sample includes a headline text retrieved from Albanian online news portals and one of four labels: 'pol' for politics, 'cul' for culture, 'eco' for economy, and 'spo' for sport. Each unlabeled sample contains a headline text only. The AlbTopic corpus is released under the CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/). If using the data, please cite the following paper: Çano Erion, Lamaj Dario. AlbNews: A Corpus of Headlines for Topic Modeling in Albanian. CoRR, abs/2402.04028, 2024. URL: https://arxiv.org/abs/2402.04028.
Identifier (URI):http://hdl.handle.net/11234/1-5411
Language:Albanian
Language (ISO639):sqi
Publisher:University of Vienna
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
http://creativecommons.org/licenses/by/4.0/
Subject:under-resourced language
albanian language
topic modeling
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
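
The label scheme given in the Description field can be sketched as a small lookup table. This is a minimal illustration only: the record does not describe the corpus file layout, so the function name `label_distribution`, the (headline, label) pair shape, and the demo rows are all assumptions, and the rows are placeholders rather than corpus data.

```python
from collections import Counter

# Label codes and their meanings, as listed in the Description field.
LABELS = {"pol": "politics", "cul": "culture", "eco": "economy", "spo": "sport"}


def label_distribution(samples):
    """Count topic names over (headline, label_code) pairs.

    The pair shape is an assumption; the record does not specify
    how the corpus files are laid out.
    """
    counts = Counter()
    for _headline, code in samples:
        if code not in LABELS:
            raise ValueError(f"unknown label code: {code!r}")
        counts[LABELS[code]] += 1
    return counts


# Placeholder rows for illustration only; not taken from the corpus.
demo = [("headline A", "pol"), ("headline B", "spo"), ("headline C", "pol")]
print(label_distribution(demo))
```

Rejecting unknown codes early makes a mislabeled or unlabeled row visible instead of silently counting it.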

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-5411
DateStamp:  2024-02-19
GetRecord:  OAI-PMH request for simple DC format
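
The GetRecord entries above refer to OAI-PMH requests. A rough sketch of how such a request URL is composed follows. The `verb`, `identifier`, and `metadataPrefix` parameters come from the OAI-PMH protocol, with `oai_dc` as the simple DC prefix and `olac` as the OLAC prefix. The base URL is a hypothetical placeholder, since this record does not state the archive's endpoint, and `get_record_url` is an illustrative helper name.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not state the OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-5411"


def get_record_url(metadata_prefix, base_url=BASE_URL, identifier=OAI_IDENTIFIER):
    """Build an OAI-PMH GetRecord URL for the given metadata format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


olac_url = get_record_url("olac")    # OLAC format
dc_url = get_record_url("oai_dc")    # simple DC format
print(olac_url)
print(dc_url)
```

Substituting the archive's real base URL for the placeholder would yield requests that can be fetched with any HTTP client.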

Search Info

Citation: Çano, Erion. 2024. University of Vienna.
Terms: dcmi_Text iso639_sqi olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-5411
Up-to-date as of: Wed Mar 5 0:42:35 EST 2025