OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-2209 |
| Metadata | ||
| Title: | C4Corpus (publicdomain part) | |
| Bibliographic Citation: | http://hdl.handle.net/11372/LRT-2209 | |
| Creator: | Gurevych, Iryna | |
| Habernal, Ivan | ||
| Zayed, Omnia | ||
| Date (W3CDTF): | 2017-06-07T13:10:23Z | |
| Date Available: | 2017-06-07T13:10:23Z | |
| Description: | A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs. | |
| Identifier (URI): | http://hdl.handle.net/11372/LRT-2209 | |
| Language: | Afrikaans | |
| Arabic | ||
| Bulgarian | ||
| Czech | ||
| Danish | ||
| German | ||
| Modern Greek (1453-) | ||
| English | ||
| Estonian | ||
| Persian | ||
| Finnish | ||
| French | ||
| Croatian | ||
| Hungarian | ||
| Indonesian | ||
| Italian | ||
| Japanese | ||
| Korean | ||
| Latvian | ||
| Lithuanian | ||
| Dutch | ||
| Norwegian | ||
| Polish | ||
| Portuguese | ||
| Russian | ||
| Slovenian | ||
| Somali | ||
| Spanish | ||
| Swahili (macrolanguage) | ||
| Swedish | ||
| Tagalog | ||
| Thai | ||
| Turkish | ||
| Ukrainian | ||
| Undetermined | ||
| Vietnamese | ||
| Language (ISO639): | afr | |
| ara | ||
| bul | ||
| ces | ||
| dan | ||
| deu | ||
| ell | ||
| eng | ||
| est | ||
| fas | ||
| fin | ||
| fra | ||
| hrv | ||
| hun | ||
| ind | ||
| ita | ||
| jpn | ||
| kor | ||
| lav | ||
| lit | ||
| nld | ||
| nor | ||
| pol | ||
| por | ||
| rus | ||
| slv | ||
| som | ||
| spa | ||
| swa | ||
| swe | ||
| tgl | ||
| tha | ||
| tur | ||
| ukr | ||
| und | ||
| vie | ||
| Publisher: | Technische Universität Darmstadt | |
| Rights: | Public Domain Mark (PD) | |
| http://creativecommons.org/publicdomain/mark/1.0/ | ||
| Subject: | CommonCrawl | |
| Creative Commons | ||
| Web corpus | ||
| Amazon Web Services | ||
| Type: | corpus | |
| Type (DCMI): | Text | |
| Type (OLAC): | primary_text | |
OLAC Info |
||
| Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
| Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-2209 | |
| DateStamp: | 2021-06-29 | |
| GetRecord: | OAI-PMH request for simple DC format | |
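
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch of how such a request URL could be assembled: the `verb`, `identifier`, and `metadataPrefix` parameters come from the OAI-PMH protocol, and `oai_dc` is its standard prefix for simple Dublin Core. The base URL below is a placeholder (this record does not state the repository's endpoint), the `olac` prefix is an assumption about how the repository names its OLAC format, and `get_record_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the record does not state the repository's
# OAI-PMH base URL, so substitute the real one before use.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:lindat.mff.cuni.cz:11372/LRT-2209"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple Dublin Core request (oai_dc is the prefix defined by OAI-PMH).
dc_url = get_record_url(OAI_ID, "oai_dc")

# OLAC-format request (the prefix name "olac" is assumed here).
olac_url = get_record_url(OAI_ID, "olac")

print(dc_url)
print(olac_url)
```

The identifier is percent-encoded by `urlencode`, which matters because it contains `:` and `/` characters.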
Search Info | ||
| Citation: | Gurevych, Iryna; Habernal, Ivan; Zayed, Omnia. 2017. Technische Universität Darmstadt. | |
| Terms: | area_Africa area_Asia area_Europe country_BG country_CZ country_DE country_DK country_ES country_FI country_FR country_GB country_GR country_HR country_HU country_ID country_IT country_JP country_KR country_LT country_NL country_NO country_PH country_PL country_PT country_RU country_SE country_SI country_SO country_TH country_TR country_UA country_VN country_ZA dcmi_Text iso639_afr iso639_ara iso639_bul iso639_ces iso639_dan iso639_deu iso639_ell iso639_eng iso639_est iso639_fas iso639_fin iso639_fra iso639_hrv iso639_hun iso639_ind iso639_ita iso639_jpn iso639_kor iso639_lav iso639_lit iso639_nld iso639_nor iso639_pol iso639_por iso639_rus iso639_slv iso639_som iso639_spa iso639_swa iso639_swe iso639_tgl iso639_tha iso639_tur iso639_ukr iso639_und iso639_vie olac_primary_text | |