OLAC Record oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-985 |
Metadata | ||
Title: | TrAVaSI_VoDIM Corpus | |
Bibliographic Citation: | http://hdl.handle.net/20.500.11752/ILC-985 | |
Creator: | Favaro, Manuel | |
Biffi, Marco | ||
Montemagni, Simonetta | ||
Date (W3CDTF): | 2023-01-09T08:44:35Z | |
Date Available: | 2023-01-09T08:44:35Z | |
Description: | The TrAVaSI_VoDIM Corpus is a sample of the corpus built for the Vocabolario Dinamico Dell’Italiano Moderno (VoDIM, Marazzini and Maconi, 2018), which gathers Italian texts from the Unification of Italy (1861) to the present day. TrAVaSI_VoDIM is balanced and representative of different prose domains (art, gastronomy, law, newspapers, literature, popular fiction, science), for a total of about 21,000 tokens. TrAVaSI_VoDIM is morpho-syntactically annotated and lemmatized. The annotation, which conforms to the Universal Dependencies standard (UD, De Marneffe et al. 2021), was carried out semi-automatically: TrAVaSI_VoDIM was first annotated automatically with the Stanza “combined” model for Italian, and the automatic annotation was then manually revised. The resulting corpus has also been used to retrain Stanza to handle historical varieties of Italian; the results are encouraging. | |
Identifier (URI): | http://hdl.handle.net/20.500.11752/ILC-985 | |
Language: | Italian | |
Language (ISO639): | ita | |
Publisher: | Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR) | |
Accademia della Crusca | ||
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | historical annotated corpora | |
linguistic annotation | ||
Universal Dependencies | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa | |
Description: | http://www.language-archives.org/archive/dspace-clarin-it.ilc.cnr.it | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-985 | |
DateStamp: | 2023-01-09 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Favaro, Manuel; Biffi, Marco; Montemagni, Simonetta. 2023. Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR). | |
Terms: | area_Europe country_IT dcmi_Text iso639_ita olac_primary_text |