OLAC Record
oai:www.clarin.si:11356/1204

Metadata
Title:Word embeddings CLARIN.SI-embed.sl 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1204
Creator:Ljubešić, Nikola
Erjavec, Tomaž
Date (W3CDTF):2018-11-26T18:25:21Z
Date Available:2018-11-26T18:25:21Z
Description:CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts drawn from existing corpora of Slovene, including GigaFida, Janes, KAS, and slWaC, among others. The embeddings are based on the skip-gram model of fastText, trained on 3,557,125,771 tokens of running text, and cover (1) 2,466,596 lowercased surface forms (e.g., "slovenije") and (2) 2,093,848 lowercased lemmas with added part-of-speech information (e.g., "slovenija#Np").
Identifier (URI):http://hdl.handle.net/11356/1204
Language:Slovenian
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
Subject:word embeddings
lemmatisation
part-of-speech tagging
Slovenian language
Subject (ISO639):slv
Type:lexicalConceptualResource
Type (DCMI):Text
Type (OLAC):lexicon
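
The Description above names two kinds of keys: lowercased surface forms ("slovenije") and lowercased lemmas with a part-of-speech suffix ("slovenija#Np"). The following is a minimal sketch of how such keys might be read and split, assuming the vectors are distributed in the fastText plain-text layout (a "count dim" header line followed by one "key v1 ... vdim" line per entry). This record does not state the file format, so that layout is an assumption. The function names and the three-dimensional toy values are hypothetical and serve only as illustration.

```python
import io
import math


def load_vec(lines):
    """Parse fastText-style text vectors (ASSUMED layout, not confirmed by this record).

    The first line is 'count dim'; every following line is 'key v1 ... vdim'.
    """
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def split_lemma_key(key):
    """Split 'lemma#POS' into (lemma, POS); surface forms yield (key, None)."""
    lemma, sep, pos = key.rpartition("#")
    return (lemma, pos) if sep else (key, None)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy data with made-up values; the real files hold millions of entries.
TOY_VEC = "2 3\nslovenije 0.1 0.2 0.3\nslovenija#Np 0.2 0.4 0.6\n"

dim, vectors = load_vec(io.StringIO(TOY_VEC))
for key in vectors:
    print(key, split_lemma_key(key))
```

In practice the surface-form and lemma vectors would presumably be loaded from separate files, and a dedicated library would be more practical than a hand-written parser at this scale.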

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1204
DateStamp:  2019-10-10
GetRecord:  OAI-PMH request for simple DC format
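
The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple DC formats. As a rough sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix. The verb and parameter names come from the OAI-PMH protocol, and "oai_dc" is its standard simple DC prefix; "olac" is assumed here as the prefix for the OLAC format. The base URL below is a hypothetical placeholder, because this record does not state the repository's endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# HYPOTHETICAL endpoint: replace with the repository's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"

OAI_IDENTIFIER = "oai:www.clarin.si:11356/1204"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# One URL per format mentioned in this record (prefix "olac" is an assumption).
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```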

Search Info

Citation: Ljubešić, Nikola; Erjavec, Tomaž. 2018. Word embeddings CLARIN.SI-embed.sl 1.0. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_lexicon

Inferred Metadata

Country: Slovenia
Area: Europe


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1204
Up-to-date as of: Thu Dec 5 9:50:27 EST 2019