OLAC Record oai:lindat.mff.cuni.cz:11234/1-5528 |
Metadata | ||
Title: | Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-5528 | |
Creator: | Estève, Louis Clément | |
Savary, Agata | ||
Lavergne, Thomas | ||
Date (W3CDTF): | 2024-07-12T11:53:50Z | |
Date Available: | 2024-07-12T11:53:50Z | |
Description: | This resource is a set of 14 vector spaces for single words and Verbal Multiword Expressions (VMWEs) in different languages (German, Greek, Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Brazilian Portuguese, Romanian, Swedish, Turkish, Chinese). They were trained with the Word2Vec algorithm, in its skip-gram version, on PARSEME raw corpora automatically annotated for morpho-syntax (http://hdl.handle.net/11234/1-3367). These corpora were annotated by Seen2Seen, a rule-based VMWE identifier, one of the leading tools of the PARSEME shared task version 1.2. VMWE tokens were merged into single tokens. The format of the vector space files is that of the original Word2Vec implementation by Mikolov et al. (2013), i.e. a binary format. For compression, bzip2 was used. | |
Identifier (URI): | http://hdl.handle.net/11234/1-5528 | |
Language: | German | |
Modern Greek (1453-) | ||
Basque | ||
French | ||
Irish | ||
Hebrew | ||
Hindi | ||
Italian | ||
Polish | ||
Portuguese | ||
Romanian | ||
Swedish | ||
Turkish | ||
Chinese | ||
Language (ISO639): | deu | |
ell | ||
eus | ||
fra | ||
gle | ||
heb | ||
hin | ||
ita | ||
pol | ||
por | ||
ron | ||
swe | ||
tur | ||
zho | ||
Publisher: | Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique | |
Rights: | PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement | |
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw | ||
Subject: | verbal multiword expressions | |
word embeddings | ||
word2vec | ||
German language | ||
Modern Greek (1453-) language | ||
Basque language | ||
French language | ||
Irish language | ||
Hebrew language | ||
Hindi language | ||
Italian language | ||
Polish language | ||
Portuguese language | ||
Romanian language | ||
Swedish language | ||
Turkish language | ||
Chinese language | ||
Subject (ISO639): | deu | |
ell | ||
eus | ||
fra | ||
gle | ||
heb | ||
hin | ||
ita | ||
pol | ||
por | ||
ron | ||
swe | ||
tur | ||
zho | ||
Type: | lexicalConceptualResource | |
Type (DCMI): | Text | |
Type (OLAC): | lexicon | |
OLAC Info |
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-5528 | |
DateStamp: | 2024-07-12 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Estève, Louis Clément; Savary, Agata; Lavergne, Thomas. 2024. Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique. | |
Terms: | area_Asia area_Europe country_DE country_ES country_FR country_GR country_IE country_IL country_IN country_IT country_PL country_PT country_RO country_SE country_TR dcmi_Text iso639_deu iso639_ell iso639_eus iso639_fra iso639_gle iso639_heb iso639_hin iso639_ita iso639_pol iso639_por iso639_ron iso639_swe iso639_tur iso639_zho olac_lexicon | |
Inferred Metadata | ||
Country: | Germany, Spain, France, Greece, Ireland, Israel, India, Italy, Poland, Portugal, Romania, Sweden, Turkey | |
Area: | Asia, Europe |
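
The Description field of this record says the vector space files use the binary format of the original Word2Vec implementation and are compressed with bzip2. The sketch below shows one way such a file might be read with the Python standard library alone. It assumes the commonly documented layout: a text header line `<vocab_size> <dim>`, then for each entry the token bytes, a single space, and `dim` little-endian 32-bit floats, usually followed by a newline. The function names, the toy tokens `tok_a` and `tok_b`, and the in-memory demo payload are hypothetical and are not taken from the resource.

```python
import bz2
import io
import struct


def read_word2vec_binary(stream):
    """Parse a binary stream in the assumed word2vec layout into {token: [floats]}."""
    header = stream.readline().decode("utf-8").split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for _ in range(vocab_size):
        # The token is stored as raw bytes terminated by a single space.
        token = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise ValueError("unexpected end of data while reading a token")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline that separates entries
                token.extend(ch)
        # Assumption: components are little-endian float32 values.
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise ValueError("unexpected end of data while reading a vector")
        vectors[token.decode("utf-8", errors="replace")] = list(
            struct.unpack(f"<{dim}f", raw)
        )
    return vectors


def load_bz2_word2vec(source):
    """Open a bzip2-compressed file (path or file object) and parse it."""
    with bz2.open(source, "rb") as stream:
        return read_word2vec_binary(stream)


# Toy payload built in memory so the sketch runs without downloading anything.
payload = (
    b"2 2\n"
    + b"tok_a " + struct.pack("<2f", 0.5, 1.0) + b"\n"
    + b"tok_b " + struct.pack("<2f", -2.0, 0.25) + b"\n"
)
demo = load_bz2_word2vec(io.BytesIO(bz2.compress(payload)))
print(sorted(demo))
```

For the real files, `load_bz2_word2vec` would be given the path of a downloaded `.bz2` file. This version holds every vector in a Python list, so for large vocabularies a library loader for the same layout, such as gensim's `KeyedVectors.load_word2vec_format(..., binary=True)`, is likely a better fit.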