OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-1498 |
Metadata | ||
Title: | SoNaR Corpus | |
Bibliographic Citation: | http://hdl.handle.net/11372/LRT-1498 | |
Creator: | Radboud University, CLST | |
Tilburg University, ILK | ||
University of Twente, HMI | ||
University College Ghent, Faculty of Translation Studies | ||
KU Leuven, CCL | ||
Utrecht University, UiL OTS | ||
Date (W3CDTF): | 2015-06-29T13:23:32Z | |
Date Available: | 2015-06-29T13:23:32Z | |
Description: | The SoNaR corpus is a 500-million-word reference corpus of contemporary written Dutch. It consists of two parts, SoNaR500 and SoNaR1. SoNaR500 contains over 500 million words (i.e., word tokens) of full texts from a wide variety of text types. All texts were tokenized, POS-tagged, and lemmatized, and the named entities were labelled. All annotations in SoNaR500 were generated automatically. SoNaR1 is largely a subset of SoNaR500 and contains 1 million words. SoNaR1 was enriched with several types of semantic annotation: named entity labelling, coreference resolution, annotation of spatial and temporal expressions, and annotation of semantic roles. All annotations in SoNaR1 were manually verified. The new media texts (tweets, chats, and SMS) that were also collected during the STEVIN project SoNaR are not part of the SoNaR corpus; they are distributed separately as the SoNaR New Media Corpus. | |
Identifier (URI): | http://hdl.handle.net/11372/LRT-1498 | |
Language: | Dutch | |
Language (ISO639): | nld | |
Publisher: | Dutch-Flemish HLT Agency | |
Subject: | monolingual corpus | |
annotated corpus | ||
written language | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
Archive: | LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-1498 | |
DateStamp: | 2018-08-16 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Radboud University, CLST; Tilburg University, ILK; University of Twente, HMI; University College Ghent, Faculty of Translation Studies; KU Leuven, CCL; Utrecht University, UiL OTS. 2015. SoNaR Corpus. Dutch-Flemish HLT Agency. | |
Terms: | area_Europe country_NL dcmi_Text iso639_nld olac_primary_text |
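
The GetRecord entries in this record refer to OAI-PMH requests for the OLAC and simple DC formats. As a minimal sketch of how such a request URL could be assembled, the Python below builds a GetRecord query from the OaiIdentifier shown above. The base URL is a hypothetical placeholder, not the archive's confirmed endpoint, and the helper name `get_record_url` is introduced only for illustration. The metadata prefixes `olac` and `oai_dc` are the conventional names for these two formats, but a given repository may expose others.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint. Replace with the archive's actual
# OAI-PMH base URL, which this record does not state.
BASE_URL = "https://example.org/oai/request"

# Taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11372/LRT-1498"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",                 # OAI-PMH verb for a single record
        "identifier": identifier,            # percent-encoded by urlencode
        "metadataPrefix": metadata_prefix,   # e.g. "olac" or "oai_dc"
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

The identifier contains `:` and `/`, so it is percent-encoded rather than pasted into the query string verbatim.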