OLAC Record
oai:www.clarin.si:11356/1193

Metadata
Title:Kres corpus n-grams 2.0
Bibliographic Citation:http://hdl.handle.net/11356/1193
Creator:Dobrovoljc, Kaja
Date (W3CDTF):2018-08-03T18:43:23Z
Date Available:2018-08-03T18:43:23Z
Description:A collection of n-grams extracted from the Kres corpus of written Slovene (cf. http://eng.slovenscina.eu/korpusi/kres). Three sets of n-gram lists are provided for lowercased word n-grams of length 1 to 5:
- extensive frequency lists of all extracted n-grams;
- filtered frequency lists of n-grams with a minimum frequency of 10 per million;
- an adjusted frequency list of all n-grams with a minimum frequency of 10 per million.
Only n-grams within sentences have been counted, ignoring punctuation. For the filtered and adjusted lists, only n-grams occurring in at least 2 different texts have been extracted.
Key references:
- K. Dobrovoljc, 2018. N-gram frequency lists for reference corpora of Slovenian language. Proceedings of the Language Technologies & Digital Humanities Conference 2018.
- N. Logar Berginc, M. Grčar, M. Brakus, T. Erjavec, Š. Arhar Holdt and S. Krek, 2012. Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES: gradnja, vsebina, uporaba. Ljubljana: Trojina, zavod za uporabno slovenistiko; Fakulteta za družbene vede.
- M. B. O’Donnell, 2010. The adjusted frequency list: A method to produce cluster-sensitive frequency lists. ICAME Journal 35, 135–169.
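The counting and filtering rules in the description can be illustrated with a small sketch. It is not the procedure used to build this resource: the tokenizer (a regex over word characters), the input layout (each text as a list of sentence strings), the per-million base (total word tokens) and the function names are all assumptions made for illustration. The adjusted frequency list (O’Donnell 2010) is not reproduced here.

```python
import re
from collections import Counter

# Assumption: a token is a run of Unicode word characters; punctuation and
# whitespace are skipped. The real Kres tokenization is more elaborate.
TOKEN_RE = re.compile(r"\w+")


def sentence_ngrams(sentence, n):
    """Lowercased word n-grams of length n inside one sentence only."""
    tokens = [t.lower() for t in TOKEN_RE.findall(sentence)]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, max_n=5):
    """Count n-grams of length 1..max_n.

    texts: list of texts, each given as a list of sentence strings (assumed
    input format). Returns corpus frequencies, the number of distinct texts
    each n-gram occurs in, and the total number of word tokens.
    """
    freq = Counter()
    text_freq = Counter()
    total_tokens = 0
    for sentences in texts:
        seen_in_text = set()
        for sentence in sentences:
            total_tokens += len(TOKEN_RE.findall(sentence))
            for n in range(1, max_n + 1):
                grams = sentence_ngrams(sentence, n)  # never crosses sentences
                freq.update(grams)
                seen_in_text.update(grams)
        text_freq.update(seen_in_text)
    return freq, text_freq, total_tokens


def filtered_list(freq, text_freq, total_tokens,
                  min_per_million=10.0, min_texts=2):
    """Keep n-grams at or above min_per_million that occur in >= min_texts texts."""
    result = {}
    for gram, count in freq.items():
        per_million = count / total_tokens * 1_000_000
        if per_million >= min_per_million and text_freq[gram] >= min_texts:
            result[gram] = count
    return result


if __name__ == "__main__":
    texts = [
        ["Dober dan, svet!", "Dober dan."],
        ["Dober dan vsem."],
    ]
    freq, text_freq, total = count_ngrams(texts)
    print(filtered_list(freq, text_freq, total))
```

On this toy input, "svet dober" is never counted because it would span a sentence boundary, and n-grams such as "dober dan svet" are dropped from the filtered list because they occur in only one text.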
Identifier (URI):http://hdl.handle.net/11356/1193
Language:Slovenian
Language (ISO639):slv
Publisher:Centre for Language Resources and Technologies, University of Ljubljana
Replaces (URI):http://hdl.handle.net/11356/1045
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
Subject:n-grams
wordlist
multiword expressions
Slovenian language
Subject (ISO639):slv
Type:lexicalConceptualResource
Type (DCMI):Text
Type (OLAC):lexicon

OLAC Info

Archive: Slovenian language resource repository CLARIN.SI
Description: http://www.language-archives.org/archive/clarin.si

OAI Info

OaiIdentifier: oai:www.clarin.si:11356/1193
DateStamp: 2018-08-03

Search Info

Citation: Dobrovoljc, Kaja. 2018. Kres corpus n-grams 2.0. Centre for Language Resources and Technologies, University of Ljubljana.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_lexicon

Inferred Metadata

Country: Slovenia
Area: Europe


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1193
Up-to-date as of: Thu Sep 26 21:22:43 EDT 2019