OLAC Record

Title:Terminology identification dataset KAS-term 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1198
Creator:Erjavec, Tomaž
Fišer, Darja
Ljubešić, Nikola
Arhar Holdt, Špela
Bren, Urban
Robnik Šikonja, Marko
Udovič, Boštjan
Date (W3CDTF):2018-08-18T12:09:21Z
Date Available:2018-08-18T12:09:21Z
Description:The dataset contains 22,950 term candidates extracted from 15 Slovenian PhD theses in chemistry, computer science and political science. The candidates are of length 1 to 4 and were extracted using morphosyntactic patterns and a frequency threshold of 3. Four annotators labelled each candidate as (1) an in-domain term, (2) an out-of-domain term, (3) a general academic term or (4) not a term. Each candidate is also annotated with its frequency in the PhD thesis and 7 statistical measures. The resource can serve as a training set for supervised learning of term extraction and as a benchmark for terminology extraction tools.
Identifier (URI):http://hdl.handle.net/11356/1198
Language (ISO639):slv
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject:manual annotation
Slovenian language
Subject (ISO639):slv
Type (DCMI):Text
Type (OLAC):lexicon


Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1198
DateStamp:  2018-08-23
GetRecord:  OAI-PMH request for simple DC format
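
The GetRecord entries above correspond to OAI-PMH requests, which take the verb, the record identifier and a metadata prefix as query parameters. The following is a minimal sketch of how such request URLs could be built for this record. The base URL is an assumption (it is not given in this record), as is the use of "olac" as the metadata prefix for the OLAC format; "oai_dc" is the prefix the OAI-PMH specification defines for simple Dublin Core. The helper name get_record_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the repository's OAI-PMH endpoint. This record does not state it,
# so replace it with the base URL published by the archive.
BASE_URL = "https://www.clarin.si/repository/oai/request"

# Taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.clarin.si:11356/1198"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" is assumed to be the prefix the archive uses for the OLAC format.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
# "oai_dc" is the simple Dublin Core prefix defined by OAI-PMH.
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Fetching either URL with any HTTP client should return an XML response whose record element carries the metadata shown on this page, provided the assumed base URL and prefixes match what the archive actually exposes.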

Search Info

Citation: Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Arhar Holdt, Špela; Bren, Urban; Robnik Šikonja, Marko; Udovič, Boštjan. 2018. Jožef Stefan Institute.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_lexicon

Inferred Metadata

Country: Slovenia
Area: Europe

Up-to-date as of: Fri Jan 10 9:22:52 EST 2020