OLAC Record
oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-CC1E-B

Metadata
Title: Hindi Web Texts
Bibliographic Citation: http://hdl.handle.net/11858/00-097C-0000-0001-CC1E-B
Creator: Bojar, Ondřej
Straňák, Pavel
Zeman, Daniel
Date (W3CDTF): 2011-11-23T15:47:18Z
Date Available: 2011-11-23T15:47:18Z
Description: A Hindi corpus of texts downloaded mostly from news sites. It contains both the original raw texts and an extensively cleaned-up, tokenized version suitable for language modeling. The corpus comprises 18M sentences and 308M tokens.
FP7-ICT-2007-3-231720 (EuroMatrix Plus), 7E09003 (Czech part of EM+)
Identifier (URI): UMC004
http://hdl.handle.net/11858/00-097C-0000-0001-CC1E-B
Is Replaced By (URI): http://hdl.handle.net/11858/00-097C-0000-0023-6260-A
Language: Hindi
Language (ISO639): hin
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
http://creativecommons.org/licenses/by-nc/3.0/
Subject: news
web texts
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-CC1E-B
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format
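
The GetRecord entries in this record refer to OAI-PMH requests for a single item. As a rough illustration, the sketch below builds such a request URL from the OaiIdentifier shown here. The base URL is a placeholder, since this record does not state the repository's OAI-PMH endpoint. The `oai_dc` prefix is the one OAI-PMH defines for simple Dublin Core. The `olac` prefix for the OLAC format is an assumption and should be checked against the repository's ListMetadataFormats response.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the record does not give the real base URL.
BASE_URL = "https://example.org/oai/request"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-CC1E-B"


def get_record_url(identifier, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Simple DC request; oai_dc is the prefix every OAI-PMH repository must support.
dc_url = get_record_url(OAI_IDENTIFIER)

# OLAC-format request; the "olac" prefix is assumed here, not taken from the record.
olac_url = get_record_url(OAI_IDENTIFIER, metadata_prefix="olac")
```

Fetching either URL with any HTTP client would return an XML response if the endpoint and prefix are valid for the repository. The sketch only constructs the URLs and makes no network calls.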

Search Info

Citation: Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel. 2011. Hindi Web Texts. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Asia country_IN dcmi_Text iso639_hin olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-CC1E-B
Up-to-date as of: Thu Oct 5 0:38:50 EDT 2023