OLAC Record oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1 |
| Metadata | ||
| Title: | W2C – Web to Corpus – tool | |
| Bibliographic Citation: | http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1 | |
| Creator: | Majliš, Martin | |
| Date (W3CDTF): | 2013-06-25T13:21:15Z | |
| Date Available: | 2013-06-25T13:21:15Z | |
| Description: | A tool for building multilingual corpora from Wikipedia. It downloads web pages, converts them to plain text, identifies their language, etc. A set of 120 corpora collected using this tool is available at https://ufal-point.mff.cuni.cz/xmlui/handle/11858/00-097C-0000-0022-6133-9 | |
| Identifier (URI): | http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1 | |
| Language: | No linguistic content | |
| Language (ISO639): | zxx | |
| Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
| Rights: | Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) | |
| http://creativecommons.org/licenses/by-sa/3.0/ | ||
| Subject: | web data | |
| wikipedia | ||
| corpus creation | ||
| Type: | toolService | |
| Type (DCMI): | Software | |
| OLAC Info | ||
| Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
| Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
| OAI Info | ||
| OaiIdentifier: | oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1 | |
| DateStamp: | 2021-06-29 | |
| GetRecord: | OAI-PMH request for simple DC format | |
| Search Info | ||
| Citation: | Majliš, Martin. 2013. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
| Terms: | dcmi_Software iso639_zxx | |