OLAC Record
oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1

Metadata
Title:W2C – Web to Corpus – tool
Bibliographic Citation:http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
Creator:Majliš, Martin
Date (W3CDTF):2013-06-25T13:21:15Z
Date Available:2013-06-25T13:21:15Z
Description:A tool for building multilingual corpora from Wikipedia. It downloads the web pages, converts them to plain text, identifies their language, etc. A set of 120 corpora collected using this tool is available at https://ufal-point.mff.cuni.cz/xmlui/handle/11858/00-097C-0000-0022-6133-9
Identifier (URI):http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
Language:No linguistic content
Language (ISO639):zxx
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
http://creativecommons.org/licenses/by-sa/3.0/
Subject:web data
Wikipedia
corpus creation
Type:toolService
Type (DCMI):Software

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1
DateStamp:  2021-06-29

Search Info

Citation: Majliš, Martin. 2013. W2C – Web to Corpus – tool. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: dcmi_Software iso639_zxx


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1
Up-to-date as of: Thu Oct 5 0:38:52 EDT 2023