OLAC Record oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/OPEN-557 |
Metadata | ||
Title: | Italian Sense Inventory | |
Bibliographic Citation: | http://hdl.handle.net/20.500.11752/OPEN-557 | |
Creator: | Poli, Francesca | |
Date (W3CDTF): | 2021-07-01T12:23:35Z | |
Date Available: | 2021-07-01T12:23:35Z | |
Description: | The present Sense Inventory is an Italian language resource automatically derived from two Italian computational lexicons: ItalWordNet (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-62) and PAROLE-SIMPLE-CLIPS (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-88). It was built in collaboration with the CNR Institute of Computational Linguistics as an experiment related to the ELEXIS project (https://elex.is/), with the aim of producing a concise, structured inventory of senses for the sense annotation of the ELEXIS WSD test corpus. The Sense Inventory is thus based on the lemmas occurring in the ELEXIS test corpus and on the merged sense information derived from the two existing lexicons.
The Python program developed for the automatic construction of the Sense Inventory takes the ELEXIS dataset as input, extracts the lemmas from its sentences, and searches for all related senses in the above-mentioned resources. It also makes use of a sense mapping database for the cited lexicons, 'iwnmapdb', available upon request from CNR-ILC.
The extracted and checked data are then arranged in a formal structure in which the following details are given for each lemma-PoS pair:
- Unmapped senses extracted from PAROLE-SIMPLE-CLIPS (PSC),
- Mapped senses extracted from the mapping database 'iwnmapdb',
- Unmapped senses extracted from ItalWordNet (IWN).
All fields with no value are filled with None. The tab-separated format has the following columns, in order: LEMMA; POS; CONCATENATED DEFINITION PSC-IWN; USEMID PSC; DEFINITION PSC; EXAMPLE PSC; SEMANTIC TYPE PSC; SYNSETID IWN; SENSEID IWN; DEFINITION IWN.
The total number of lemmas (with an ADV/ADJ/NOUN/VERB part of speech) included in the Sense Inventory is 3,860. There are 12,944 senses and mappings reported in the Sense Inventory, out of a total of 15,672 senses extracted from PAROLE-SIMPLE-CLIPS and ItalWordNet; 3,461 mappings were extracted from the mapping database 'iwnmapdb' and then included in the Sense Inventory as relevant senses. | |
Identifier (URI): | http://hdl.handle.net/20.500.11752/OPEN-557 | |
Language: | Italian | |
Language (ISO639): | ita | |
Publisher: | Università di Pisa | |
Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR) | ||
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | lexical resources | |
word sense disambiguation | ||
semantic dataset | ||
Italian language | ||
Subject (ISO639): | ita | |
Type: | lexicalConceptualResource | |
Type (DCMI): | Text | |
Type (OLAC): | lexicon | |
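The tab-separated layout given in the Description field can be read with a few lines of Python. The following is a minimal sketch, not the program used to build the resource. It assumes the ten columns appear in the order listed in the description and that empty fields carry the literal string None. The snake_case column names, the `has_header` switch, the `sense_kind` helper, and its rule for telling mapped from unmapped senses are all illustrative assumptions, and the sample rows are invented placeholders rather than data from the inventory.

```python
import csv
import io
from collections import defaultdict

# Column order as listed in the record's description. The snake_case
# identifiers are illustrative, not names used by the resource itself.
COLUMNS = [
    "lemma", "pos", "concatenated_definition_psc_iwn",
    "usemid_psc", "definition_psc", "example_psc", "semantic_type_psc",
    "synsetid_iwn", "senseid_iwn", "definition_iwn",
]


def read_inventory(handle, has_header=False):
    """Group rows by (lemma, PoS); the literal string 'None' becomes None.

    Whether the distributed file starts with a header row is not stated
    in the record, so `has_header` is an assumption left to the caller.
    """
    senses = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if not row or (has_header and index == 0):
            continue
        if len(row) != len(COLUMNS):
            raise ValueError(f"row {index}: expected {len(COLUMNS)} fields, got {len(row)}")
        record = {key: (None if value == "None" else value)
                  for key, value in zip(COLUMNS, row)}
        senses[(record["lemma"], record["pos"])].append(record)
    return dict(senses)


def sense_kind(record):
    """Hypothetical rule: a row is 'mapped' when it has both PSC and IWN ids."""
    has_psc = record["usemid_psc"] is not None
    has_iwn = record["senseid_iwn"] is not None or record["synsetid_iwn"] is not None
    if has_psc and has_iwn:
        return "mapped"
    if has_psc:
        return "psc_only"
    if has_iwn:
        return "iwn_only"
    return "empty"


# Invented placeholder rows, used only to exercise the reader.
SAMPLE = "\n".join([
    "\t".join(["casa", "NOUN", "placeholder merged definition", "USEM_1",
               "placeholder PSC definition", "None", "Building",
               "None", "None", "None"]),
    "\t".join(["casa", "NOUN", "placeholder merged definition", "USEM_2",
               "placeholder PSC definition", "None", "Building",
               "SYN_1", "SENSE_1", "placeholder IWN definition"]),
]) + "\n"

inventory = read_inventory(io.StringIO(SAMPLE))
kinds = [sense_kind(r) for r in inventory[("casa", "NOUN")]]
```

For a file on disk, the same function could be called as `read_inventory(open(path, encoding="utf-8", newline=""))`, where `path` points to wherever the downloaded inventory is stored.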
OLAC Info |
||
Archive: | ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa | |
Description: | http://www.language-archives.org/archive/dspace-clarin-it.ilc.cnr.it | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/OPEN-557 | |
DateStamp: | 2021-07-01 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Poli, Francesca. 2021. Università di Pisa. | |
Terms: | area_Europe country_IT dcmi_Text iso639_ita olac_lexicon | |
Inferred Metadata | ||
Country: | Italy | |
Area: | Europe |