OLAC Record

Title:Morphological lexicon Sloleks 2.0
Bibliographic Citation:http://hdl.handle.net/11356/1230
Creator:Dobrovoljc, Kaja
Krek, Simon
Holozan, Peter
Erjavec, Tomaž
Romih, Miro
Arhar Holdt, Špela
Čibej, Jaka
Krsnik, Luka
Robnik-Šikonja, Marko
Date (W3CDTF):2019-03-26T22:58:58Z
Date Available:2019-03-26T22:58:58Z
Description:Sloleks is the reference morphological lexicon for the Slovenian language, developed for use in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains approximately 100,000 of the most frequent Slovenian lemmas, their inflected or derivative word forms, and the corresponding grammatical descriptions. Lemmatization rules, part-of-speech categorization and the set of feature-value pairs follow the JOS morphosyntactic specifications. In addition to grammatical information, each word form is annotated with its absolute corpus frequency and its compliance with the reference language standard. Sloleks 2.0 includes accents assigned automatically using neural networks (Krsnik 2017) and partially corrected by hand, as well as automatically generated IPA and SAMPA transcriptions of lemmas and word forms. The canonical version is encoded in XML, validated against the Sloleks LMF DTD. The resource is also available as a TSV file in the so-called MULTEXT-East format, with wordform, lemma, MSD and frequency columns, and is also mapped to Universal Dependencies features. References: Kaja Dobrovoljc, Simon Krek and Tomaž Erjavec, 2017: The Sloleks Morphological Lexicon and its Future Development. In (Vojko Gorjanc, Polona Gantar, Iztok Kosem and Simon Krek, eds.): Dictionary of Modern Slovene: Problems and Solutions. Ljubljana University Press, Faculty of Arts. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/download/2/1/47-1. Krsnik, Luka. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja: magistrsko delo: magistrski program druge stopnje Računalništvo in informatika. Ljubljana: [L. Krsnik], 2017. http://eprints.fri.uni-lj.si/3978/
Identifier (URI):http://hdl.handle.net/11356/1230
Language (ISO639):slv
Publisher:Centre for Language Resources and Technologies, University of Ljubljana
Replaces (URI):http://hdl.handle.net/11356/1039
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Subject:word forms
word accents
Slovenian language
Subject (ISO639):slv
Type (DCMI):Text
Type (OLAC):lexicon


Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1230
DateStamp:  2019-06-18
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Dobrovoljc, Kaja; Krek, Simon; Holozan, Peter; Erjavec, Tomaž; Romih, Miro; Arhar Holdt, Špela; Čibej, Jaka; Krsnik, Luka; Robnik-Šikonja, Marko. 2019. Centre for Language Resources and Technologies, University of Ljubljana.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_lexicon

Inferred Metadata

Country: Slovenia
Area: Europe

Up-to-date as of: Thu Dec 5 9:50:31 EST 2019