OLAC Record
oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-985

Metadata
Title:TrAVaSI_VoDIM Corpus
Bibliographic Citation:http://hdl.handle.net/20.500.11752/ILC-985
Creator:Favaro, Manuel
Biffi, Marco
Montemagni, Simonetta
Date (W3CDTF):2023-01-09T08:44:35Z
Date Available:2023-01-09T08:44:35Z
Description:The TrAVaSI_VoDIM Corpus is a sample of the corpus built for the Vocabolario Dinamico Dell’Italiano Moderno (VoDIM, Marazzini and Maconi, 2018), gathering Italian texts from the Unification of Italy (1861) to the present day. TrAVaSI_VoDIM is balanced and representative of different prose domains (art, gastronomy, law, newspapers, literature, popular fiction, science), for a total of about 21,000 tokens. TrAVaSI_VoDIM is morpho-syntactically annotated and lemmatized. The annotation, which conforms to the Universal Dependencies standard (UD, De Marneffe et al. 2021), was carried out semi-automatically: the corpus was first annotated automatically with the Stanza “combined” model for Italian, and the automatic annotation was then manually revised. The resulting corpus has also been used to retrain Stanza to handle historical varieties of Italian, with encouraging results.
Identifier (URI):http://hdl.handle.net/20.500.11752/ILC-985
Language:Italian
Language (ISO639):ita
Publisher:Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
Accademia della Crusca
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:historical annotated corpora
linguistic annotation
Universal Dependencies
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa
Description:  http://www.language-archives.org/archive/dspace-clarin-it.ilc.cnr.it
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-985
DateStamp:  2023-01-09
GetRecord:  OAI-PMH request for simple DC format
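
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a minimal sketch of how such a request URL could be assembled: the base URL below is a hypothetical placeholder (the record does not state the repository's OAI-PMH endpoint), and the function name build_get_record_url is illustrative. The parameters verb, identifier and metadataPrefix are the standard OAI-PMH GetRecord arguments; oai_dc is the protocol's prefix for simple Dublin Core, and olac is assumed here as the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the repository's OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-985"


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Return an OAI-PMH GetRecord URL for one record in one metadata format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" is assumed to be the prefix for the OLAC format; "oai_dc" is simple DC.
olac_url = build_get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = build_get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

Replacing BASE_URL with the repository's actual endpoint would give URLs that can be fetched with any HTTP client. The response is an XML document wrapping the record's metadata.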

Search Info

Citation: Favaro, Manuel; Biffi, Marco; Montemagni, Simonetta. 2023. Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR).
Terms: area_Europe country_IT dcmi_Text iso639_ita olac_primary_text


http://www.language-archives.org/item.php/oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-985
Up-to-date as of: Tue Sep 19 0:43:06 EDT 2023