OLAC Record

Title:Nova Beseda Frequency Lexicon
Bibliographic Citation:http://hdl.handle.net/11356/1155
Creator:Jakopin, Primož
Date (W3CDTF):2017-09-25T08:53:32Z
Date Available:2017-09-25T08:53:32Z
Description:The Nova beseda Frequency Lexicon was compiled from the Nova beseda text corpus at the Fran Ramovš Institute of Slovenian Language, with hyphen characters unified and leading and trailing non-breaking spaces deleted. Unlike most other Slovenian corpora, Nova beseda texts were pre-processed before inclusion: typos and words with superfluous hyphens originating from false line joins were corrected, and parts of texts in foreign (non-Slovenian) languages were marked up and excluded from the lexicon. The corpus contains 318 million tokens, mostly wordforms. It can be searched through the web page http://bos.zrc-sazu.si/a_beseda.html, where wordform search is reached by selecting "word search" in the right-hand "What to do?" column. The same web page also explains the corpus structure. The lexicon is UTF-8 encoded and has 2,251,151 lines, each containing the following 2 tab-separated data fields: 1. token (Slovenian: pojavnica). The vast majority of tokens are wordforms; also included are numbers and selected multiword units such as URLs, e-mail addresses, place names like New York, car plates, and ID numbers. 2. frequency (Slovenian: pogostnost). The sum of all frequencies is 318,170,212.
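
The two-field layout described above (token, tab, frequency) can be read with a few lines of Python. This is only a minimal sketch: the function names `read_lexicon` and `summarize` are illustrative, and the sample lines are made up for the example rather than taken from the lexicon.

```python
import io


def read_lexicon(stream):
    """Yield (token, frequency) pairs from a tab-separated text stream."""
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, if any
        # Split on the last tab: tokens such as "New York" may contain
        # spaces, while the frequency is always the final field.
        token, _, freq = line.rpartition("\t")
        yield token, int(freq)


def summarize(entries):
    """Return (number of entries, sum of all frequencies)."""
    count = 0
    total = 0
    for _, freq in entries:
        count += 1
        total += freq
    return count, total


# Made-up sample data in the described format (not real lexicon counts).
sample = "in\t10\nNew York\t3\nje\t7\n"
entries = list(read_lexicon(io.StringIO(sample)))
line_count, total = summarize(entries)
print(line_count, total)
```

To apply the same check to the downloaded lexicon, one would open the file with `open(path, encoding="utf-8")`, where `path` is wherever the file was saved, and compare the results with the 2,251,151 lines and the frequency sum of 318,170,212 stated in the description.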
Identifier (URI):http://hdl.handle.net/11356/1155
Language (ISO639):slv
Publisher:ZRC SAZU
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
Subject:word forms; Slovenian language
Subject (ISO639):slv
Type (DCMI):Text
Type (OLAC):lexicon


Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1155
DateStamp:  2019-09-26
GetRecord:  OAI-PMH request for simple DC format
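
The GetRecord entries in this record correspond to OAI-PMH requests for the identifier shown above. A minimal sketch of how such a request URL is assembled follows; the base URL is a hypothetical placeholder (the real endpoint is not given in this record), `get_record_url` is an illustrative name, `oai_dc` is the standard prefix for simple Dublin Core, and `olac` is assumed to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the repository's actual OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple Dublin Core and (assumed prefix) OLAC requests for this record.
dc_url = get_record_url("oai:www.clarin.si:11356/1155", "oai_dc")
olac_url = get_record_url("oai:www.clarin.si:11356/1155", "olac")
print(dc_url)
print(olac_url)
```

The sketch only builds the URLs and performs no network access; fetching them would return the record as XML.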

Search Info

Citation: Jakopin, Primož. 2017. Nova Beseda Frequency Lexicon. ZRC SAZU.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_lexicon

Inferred Metadata

Country: Slovenia
Area: Europe

Up-to-date as of: Thu Dec 5 9:50:19 EST 2019