OLAC Record

Title:CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
Bibliographic Citation:http://hdl.handle.net/11234/1-1989
Creator:Ginter, Filip
Hajič, Jan
Luotolahti, Juhani
Straka, Milan
Zeman, Daniel
Date (W3CDTF):2017-03-16T11:57:32Z
Date Available:2017-03-16T11:57:32Z
Description:Automatic segmentation, tokenization, and morphological and syntactic annotation of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with 100-dimensional word embeddings computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, the automatic annotations are provided in CoNLL-U format in a separate archive; the word embeddings for all languages are distributed in a single archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may have a different license and impose additional restrictions.
Update 2018-09-03
=================
Added data in the 4 “surprise languages” from the 2017 shared task: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised before: during CoNLL-ST 2018 we gave the participants a link to this record saying the data was here. It wasn't, sorry. But now it is.
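The description names two formats: per-language CoNLL-U annotation files and word2vec embeddings trained on lowercased text. The following is a minimal Python sketch of reading both, not code shipped with the dataset. It assumes the embedding files use word2vec's text format (a `count dim` header line, then one `word v1 ... vdim` line per word); that layout is an assumption to check against the actual archive. The helper names (`read_conllu`, `read_word2vec_text`, `lookup`, `cosine`) are hypothetical. Python's `str.lower()` may not match the lowercasing applied to the training text for every script, so treat `lookup` as an approximation.

```python
import io
import math

# The ten CoNLL-U columns, in the order defined by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats",
          "head", "deprel", "deps", "misc"]


def read_conllu(stream):
    """Yield sentences as lists of token dicts (hypothetical helper).

    Comment lines ('# ...') are skipped, as are multiword-token ranges
    ('1-2') and empty nodes ('3.1'), so only syntactic words remain.
    """
    sentence = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line:                      # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:                          # the file may lack a final blank line
        yield sentence


def read_word2vec_text(stream):
    """Load word2vec text format (assumed): 'count dim' header, then 'word v1 ... vdim'."""
    _count, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue                      # skip lines without exactly dim values
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def lookup(vectors, word):
    """The embeddings were trained on lowercased text, so lowercase the query first."""
    return vectors.get(word.lower())


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy usage on in-memory data (not taken from the actual archives).
sample = io.StringIO(
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
for sent in read_conllu(sample):
    print([(t["form"], t["upos"], t["head"], t["deprel"]) for t in sent])

dim, vecs = read_word2vec_text(io.StringIO("2 3\ndogs 1 0 0 \nbark 0 1 0 \n"))
print(dim, lookup(vecs, "Dogs"), cosine(vecs["dogs"], vecs["bark"]))
```

Reading sentence by sentence with a generator keeps memory flat, which matters because the raw-text archives are large.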
Identifier (URI):http://hdl.handle.net/11234/1-1989
Language:Multiple languages
Language (ISO639):mul
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Subject:CoNLL 2017
word embeddings
automatic annotation
Multiple languages
Subject (ISO639):mul
Type (DCMI):Text
Type (OLAC):language_description


Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-1989
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format
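The OaiIdentifier above can be passed to an OAI-PMH GetRecord request, using metadataPrefix `oai_dc` for simple Dublin Core (the format every OAI-PMH repository must support) or `olac` for the OLAC format. The following is a minimal sketch under stated assumptions. `BASE_URL` is a hypothetical placeholder, not the archive's real endpoint, and the helpers `get_record_url` and `dc_values` are illustrative names. The sketch only builds the request URL and extracts `dc:` elements from a response string. It does not perform any network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"
OAI_ID = "oai:lindat.mff.cuni.cz:11234/1-1989"   # OaiIdentifier from this record
DC_NS = "http://purl.org/dc/elements/1.1/"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH 2.0 GetRecord URL; urlencode percent-encodes ':' and '/'."""
    query = urlencode({"verb": "GetRecord",
                       "identifier": identifier,
                       "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def dc_values(xml_text, element):
    """Collect the text of every dc:<element> in a response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC_NS}}}{element}")]


print(get_record_url(BASE_URL, OAI_ID, "oai_dc"))  # simple Dublin Core
print(get_record_url(BASE_URL, OAI_ID, "olac"))    # OLAC metadata
```

Searching by namespaced tag with `iter` avoids depending on where the `dc:` elements sit inside the OAI-PMH envelope.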

Search Info

Citation: Ginter, Filip; Hajič, Jan; Luotolahti, Juhani; Straka, Milan; Zeman, Daniel. 2017. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: dcmi_Text iso639_mul olac_language_description


Up-to-date as of: Thu Oct 5 0:40:42 EDT 2023