OLAC Record
oai:lindat.mff.cuni.cz:11234/1-2586

Metadata
Title:  Indonesian web corpus (idWac)
Bibliographic Citation:  http://hdl.handle.net/11234/1-2586
Creator:  Medveď, Marek; Suchomel, Vít
Date (W3CDTF):  2018-01-09T15:57:37Z
Date Available:  2018-01-09T15:57:37Z
Description:  Indonesian text corpus from web. Crawling done by SpiderLing in 2017. Filtering by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd (http://septinalarasati.com/morphind/).
Identifier (URI):  http://hdl.handle.net/11234/1-2586
Language:  Indonesian
Language (ISO639):  ind
Publisher:  Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Rights:  NLP Centre Web Corpus License (https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC)
Subject:  corpus; lemmatization; PoS tagging
Type:  corpus
Type (DCMI):  Text
Type (OLAC):  primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-2586
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format
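
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, such a request URL could be built as follows. The base URL `https://example.org/oai` is a placeholder, not the archive's real endpoint, and the helper name `get_record_url` is hypothetical. The `verb`, `identifier`, and `metadataPrefix` parameters come from the OAI-PMH protocol; `oai_dc` is the standard prefix for simple DC, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the archive's real base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    params = {
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # requested metadata format
    }
    # urlencode percent-encodes ':' and '/' inside the identifier.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    oai_id = "oai:lindat.mff.cuni.cz:11234/1-2586"
    print(get_record_url(oai_id, "oai_dc"))  # simple DC format
    print(get_record_url(oai_id, "olac"))    # OLAC format (prefix assumed)
```

Substituting the archive's actual OAI-PMH base URL for the placeholder would yield the requests that the GetRecord entries describe.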

Search Info

Citation: Medveď, Marek; Suchomel, Vít. 2018. Natural Language Processing Centre, Faculty of Informatics, Masaryk University.
Terms: area_Asia country_ID dcmi_Text iso639_ind olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-2586
Up-to-date as of: Thu Oct 5 0:40:49 EDT 2023