OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1105

Metadata
Title:Wikicorpus
Bibliographic Citation:http://hdl.handle.net/11372/LRT-1105
Contributor:Boleda, Gemma
Date (W3CDTF):2014-07-30T21:26:58Z
Date Available:2014-07-30T21:26:58Z
Description:Trilingual corpus (Catalan, Spanish, English) that contains large portions of Wikipedia (based on a 2006 dump) and has been automatically enriched with linguistic information. The present version contains over 750 million words.
Identifier (URI):http://hdl.handle.net/11372/LRT-1105
Language:Catalan, English, Spanish
Language (ISO639):cat, eng, spa
Publisher:Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
Subject:trilingual corpus
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-1105
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Boleda, Gemma. 2014. Wikicorpus. Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP).
Terms: area_Europe country_ES country_GB dcmi_Text iso639_cat iso639_eng iso639_spa olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1105
Up-to-date as of: Thu Oct 5 0:40:02 EDT 2023