OLAC Record
oai:lindat.mff.cuni.cz:11234/1-5528

Metadata
Title:Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora
Bibliographic Citation:http://hdl.handle.net/11234/1-5528
Creator:Estève, Louis Clément
Savary, Agata
Lavergne, Thomas
Date (W3CDTF):2024-07-12T11:53:50Z
Date Available:2024-07-12T11:53:50Z
Description:This resource is a set of 14 vector spaces for single words and Verbal Multiword Expressions (VMWEs) in different languages (German, Greek, Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Brazilian Portuguese, Romanian, Swedish, Turkish, Chinese). The vectors were trained with the skip-gram version of the Word2Vec algorithm on PARSEME raw corpora automatically annotated for morpho-syntax (http://hdl.handle.net/11234/1-3367). Before training, these corpora were annotated by Seen2Seen, a rule-based VMWE identifier and one of the leading tools of the PARSEME shared task version 1.2, and the component tokens of each identified VMWE were merged into a single token. The vector space files use the binary format of the original Word2Vec implementation by Mikolov et al. (2013) and are compressed with bzip2.
Identifier (URI):http://hdl.handle.net/11234/1-5528
Language:German
Modern Greek (1453-)
Basque
French
Irish
Hebrew
Hindi
Italian
Polish
Portuguese
Romanian
Swedish
Turkish
Chinese
Language (ISO639):deu
ell
eus
fra
gle
heb
hin
ita
pol
por
ron
swe
tur
zho
Publisher:Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique
Rights:PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw
Subject:verbal multiword expressions
word embeddings
word2vec
German language
Modern Greek (1453-) language
Basque language
French language
Irish language
Hebrew language
Hindi language
Italian language
Polish language
Portuguese language
Romanian language
Swedish language
Turkish language
Chinese language
Subject (ISO639):deu
ell
eus
fra
gle
heb
hin
ita
pol
por
ron
swe
tur
zho
Type:lexicalConceptualResource
Type (DCMI):Text
Type (OLAC):lexicon
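
The Description states that the files use the binary format of the original Word2Vec implementation and are bzip2-compressed. The following is a minimal sketch of a reader for such a file, using only the Python standard library. It assumes the usual layout written by the original tool (an ASCII header line with vocabulary size and dimensionality, then for each entry the token, one space, the vector as 32-bit floats, and a newline) and assumes little-endian floats. The function names are illustrative and not part of the resource.

```python
import bz2
import struct
from typing import BinaryIO, Dict, List


def read_word2vec_binary(stream: BinaryIO) -> Dict[str, List[float]]:
    """Parse word2vec binary data from an open byte stream.

    Assumed layout: "<vocab_size> <dim>\n", then per entry
    "<token> " followed by <dim> little-endian float32 values and "\n".
    """
    vocab_size, dim = (int(field) for field in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for _ in range(vocab_size):
        token = bytearray()
        while True:
            byte = stream.read(1)
            if not byte:
                raise EOFError("file ended inside a token")
            if byte == b" ":
                break  # a single space terminates the token
            if byte != b"\n":  # skip the newline left over from the previous entry
                token += byte
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("file ended inside a vector")
        values = struct.unpack(f"<{dim}f", raw)
        vectors[token.decode("utf-8", errors="replace")] = list(values)
    return vectors


def load_compressed_vectors(path) -> Dict[str, List[float]]:
    """Open a bzip2-compressed vector file and parse it in one pass."""
    with bz2.open(path, "rb") as stream:
        return read_word2vec_binary(stream)
```

Calling load_compressed_vectors with the path of one downloaded file returns a dictionary that maps each token, including merged VMWE tokens, to its vector. Plain Python lists are memory-hungry for large vocabularies, so for real use a dedicated embedding library that reads this format is likely the better choice.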

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-5528
DateStamp:  2024-07-12
GetRecord:  OAI-PMH request for simple DC format
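
The GetRecord entries above correspond to OAI-PMH requests for this record in two metadata formats. As a rough sketch, such a request URL can be assembled from the three arguments that the OAI-PMH GetRecord verb requires (verb, identifier, metadataPrefix). The base URL below is a hypothetical placeholder, since this record does not state the repository endpoint, and the prefixes "olac" and "oai_dc" are assumed to be the ones the repository exposes for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai/request"

OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-5528"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record and one format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_IDENTIFIER, "olac")      # assumed prefix for OLAC format
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")      # prefix for simple Dublin Core
```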

Search Info

Citation: Estève, Louis Clément; Savary, Agata; Lavergne, Thomas. 2024. Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora. Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique.
Terms: area_Asia area_Europe country_DE country_ES country_FR country_GR country_IE country_IL country_IN country_IT country_PL country_PT country_RO country_SE country_TR dcmi_Text iso639_deu iso639_ell iso639_eus iso639_fra iso639_gle iso639_heb iso639_hin iso639_ita iso639_pol iso639_por iso639_ron iso639_swe iso639_tur iso639_zho olac_lexicon

Inferred Metadata

Country: Germany, Spain, France, Greece, Ireland, Israel, India, Italy, Poland, Portugal, Romania, Sweden, Turkey
Area: Asia, Europe


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-5528
Up-to-date as of: Wed Mar 5 0:42:38 EST 2025