OLAC Record
oai:lindat.mff.cuni.cz:11234/1-3416

Metadata
Title:Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
Bibliographic Citation:http://hdl.handle.net/11234/1-3416
Creator:Guillaume, Bruno
Ramisch, Carlos
Waszczuk, Jakub
Monti, Johanna
Di Buono, Maria Pia
Sangati, Federico
Speranza, Giulia
Carlino, Carola
Güngör, Tunga
Yirmibeşoğlu, Zeynep
Sak, Haşim
Saraçlar, Murat
Giouli, Voula
Foufi, Vassiliki
Ramisch, Renata
Rademaker, Alexandre
Vale, Oto
Wilkens, Rodrigo
Candito, Marie
Crabbé, Benoît
Segonne, Vincent
Liebeskind, Chaya
Stymne, Sara
Hajič, Jan
Ginter, Filip
Luotolahti, Juhani
Straka, Milan
Zeman, Daniel
Barbu Mititelu, Verginica
Cristescu, Mihaela
Vaidya, Ashwini
Bhatia, Archna
Lichte, Timm
Ehren, Rafael
Jiang, Menghan
Xu, Hongzhi
Walsh, Abigail
Irimia, Elena
Dowling, Meghan
Date (W3CDTF):2020-11-04T13:19:21Z
Date Available:2020-11-04T13:19:21Z
Description:This multilingual resource contains corpora for 14 languages, gathered on the occasion of edition 1.2 of the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (2020). These corpora were meant to serve as additional "raw" corpora to help discover unseen verbal multiword expressions (VMWEs). The corpora are provided in CoNLL-U format (https://universaldependencies.org/format.html). They contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
Identifier (URI):http://hdl.handle.net/11234/1-3416
Language:German
Modern Greek (1453-)
Basque
French
Irish
Hebrew
Hindi
Italian
Polish
Portuguese
Romanian
Swedish
Turkish
Chinese
Language (ISO639):deu
ell
eus
fra
gle
heb
hin
ita
pol
por
ron
swe
tur
zho
Publisher:PARSEME
Rights:PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw
Subject:morphosyntactic annotation
dependency trees
morphological analysis
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
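
The Description above names CoNLL-U as the format of the raw corpora. As a rough illustration of what reading such a file involves, here is a minimal sketch that assumes the standard ten tab-separated CoNLL-U columns, "#" comment lines, and blank lines between sentences. The function name `parse_conllu` and the sample sentence are made up for this example and are not taken from the resource; the cupt files, which add a column, would need a different field list.

```python
# Minimal CoNLL-U reader (illustrative sketch, not the shared task's tooling).
# Column names follow the Universal Dependencies format documentation.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample sentence, invented for illustration only.
SAMPLE = (
    "# text = She gave up\n"
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tgave\tgive\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tup\tup\tADP\t_\t_\t2\tcompound:prt\t_\t_\n"
)

sentences = parse_conllu(SAMPLE)
lemmas = [token["lemma"] for token in sentences[0]]
```

The sketch keeps every column as a string, so multiword-token ranges such as "1-2" in the ID column pass through unchanged; a real pipeline would more likely rely on an established CoNLL-U library.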

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-3416
DateStamp:  2021-03-22
GetRecord:  OAI-PMH request for simple DC format
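
The GetRecord entries above refer to OAI-PMH requests for this record. As a hedged sketch of how such a request URL is put together, the snippet below builds the query string from the OaiIdentifier listed in this record. The base URL is a placeholder assumption (the actual endpoint is not given in this record), and the helper name `get_record_url` is hypothetical. The `oai_dc` prefix is the one OAI-PMH defines for simple Dublin Core; `olac` is assumed here as the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption, not the archive's real base URL.
BASE_URL = "https://example.org/oai"

# Identifier as listed under "OaiIdentifier" in this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-3416"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
```

Using `urlencode` percent-encodes the colons and slash in the identifier, which keeps the query string valid regardless of the characters an identifier contains.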

Search Info

Citation: Guillaume, Bruno; Ramisch, Carlos; Waszczuk, Jakub; Monti, Johanna; Di Buono, Maria Pia; Sangati, Federico; Speranza, Giulia; Carlino, Carola; Güngör, Tunga; Yirmibeşoğlu, Zeynep; Sak, Haşim; Saraçlar, Murat; Giouli, Voula; Foufi, Vassiliki; Ramisch, Renata; Rademaker, Alexandre; Vale, Oto; Wilkens, Rodrigo; Candito, Marie; Crabbé, Benoît; Segonne, Vincent; Liebeskind, Chaya; Stymne, Sara; Hajič, Jan; Ginter, Filip; Luotolahti, Juhani; Straka, Milan; Zeman, Daniel; Barbu Mititelu, Verginica; Cristescu, Mihaela; Vaidya, Ashwini; Bhatia, Archna; Lichte, Timm; Ehren, Rafael; Jiang, Menghan; Xu, Hongzhi; Walsh, Abigail; Irimia, Elena; Dowling, Meghan. 2020. PARSEME.
Terms: area_Asia area_Europe country_DE country_ES country_FR country_GR country_IE country_IL country_IN country_IT country_PL country_PT country_RO country_SE country_TR dcmi_Text iso639_deu iso639_ell iso639_eus iso639_fra iso639_gle iso639_heb iso639_hin iso639_ita iso639_pol iso639_por iso639_ron iso639_swe iso639_tur iso639_zho olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-3416
Up-to-date as of: Tue Mar 23 7:07:40 EDT 2021