OLAC Record
oai:www.clarin.si:11356/1061

Metadata
Title:Slovene-English parallel corpus slenWaC 1.0
Bibliographic Citation:http://hdl.handle.net/11356/1061
Creator:Ljubešić, Nikola
Esplà-Gomis, Miquel
Ortiz Rojas, Sergio
Klubička, Filip
Toral, Antonio
Date (W3CDTF):2016-03-10T15:21:18Z
Date Available:2016-03-10T15:21:18Z
Description:The slenWaC 1.0 corpus consists of parallel Slovene-English texts crawled from .si, the top-level domain of Slovenia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that combines SpiderLing, used for crawling, with Bitextor, used for bitext extraction. The accuracy of the extracted bitext is around 67% at the segment level and around 68% at the word level.
Identifier (URI):http://hdl.handle.net/11356/1061
Language:Slovenian
English
Language (ISO639):slv
eng
Publisher:Jožef Stefan Institute
Rights:CLARIN.SI User Licence for Internet Corpora
http://www.clarin.si/info/wp-content/uploads/2016/01/CLARIN.SI-WAC-2016-01.pdf
Subject:parallel corpus
web corpus
multilingual
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1061
DateStamp:  2019-02-23
GetRecord:  OAI-PMH request for simple DC format
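
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch of how such a request URL is put together, the snippet below builds GetRecord URLs from the OaiIdentifier listed in this record. The base URL is a placeholder, not the repository's actual endpoint, and the function name get_record_url is made up for illustration. The metadata prefixes "oai_dc" (simple DC) and "olac" are assumed to be the ones the repository exposes.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Placeholder endpoint: an assumption, replace with the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:www.clarin.si:11356/1061"

dc_url = get_record_url(BASE_URL, OAI_ID, "oai_dc")   # simple DC format
olac_url = get_record_url(BASE_URL, OAI_ID, "olac")   # OLAC format (assumed prefix)
print(dc_url)
print(olac_url)
```

Keeping the base URL as a parameter means the same helper works for any OAI-PMH endpoint once the real address is known.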

Search Info

Citation: Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio. 2016. Slovene-English parallel corpus slenWaC 1.0. Jožef Stefan Institute.
Terms: area_Europe country_GB country_SI dcmi_Text iso639_eng iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1061
Up-to-date as of: Tue Aug 20 10:27:00 EDT 2019