OLAC Record
oai:catalogue.elra.info:ELRA-W0085

Metadata
Title:ROCO Romanian journalistic corpus
Abstract:ROCO is a Romanian journalistic corpus of approximately 7.1 million tokens and 231,626 distinct types. It is rich in proper names, numerals and named entities. The corpus has been lemmatized and PoS-annotated following the Multext-East morphosyntactic specifications, and it is XML-encoded.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2015-11-30
Date Issued (W3CDTF):2015-11-30
Date Modified (W3CDTF):2015-11-30
Description:Written Corpora
ROCO is a Romanian journalistic corpus of approximately 7.1 million tokens and 231,626 distinct types. It is rich in proper names, numerals and named entities. The corpus contains morphosyntactic descriptor (MSD) annotations, assigned automatically by the TTL tagger, which implements the tiered tagging methodology and has an estimated accuracy of 98%. About 20% of the MSD annotations have been manually checked, validated and, where necessary, corrected. The MSDs follow the Multext-East specifications, which define 614 distinct MSDs for Romanian; these have been slightly modified by adding new tags for named entities. The corpus was first segmented, then PoS-tagged and lemmatized with the TTL processing chain. It is XML-encoded, and each file includes a metadata header (cesHeader).
Identifier:ELRA-W0085
http://catalog.elra.info/product_info.php?products_id=1249
Language:Romanian
Language (ISO639):ron
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0085
DateStamp:  2015-11-30
GetRecord:  OAI-PMH request for simple DC format
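As a minimal sketch, an OAI-PMH 2.0 GetRecord request for this record can be built from its OaiIdentifier. The base URL below is a hypothetical placeholder, since this record does not state the archive's OAI-PMH endpoint; "olac" and "oai_dc" are assumed to be the metadataPrefix values for the OLAC and simple Dublin Core formats.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the actual OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH 2.0 GetRecord request URL (query values are percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


OAI_ID = "oai:catalogue.elra.info:ELRA-W0085"
olac_url = get_record_url(OAI_ID, "olac")   # assumed prefix for the OLAC format
dc_url = get_record_url(OAI_ID, "oai_dc")   # simple Dublin Core format

if __name__ == "__main__":
    print(olac_url)
    print(dc_url)
```

Fetching either URL from a real endpoint would return the record wrapped in an OAI-PMH XML response.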

Search Info

Citation: n.a. 2015. ELRA (European Language Resources Association).
Terms: area_Europe country_RO dcmi_Text iso639_ron olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0085
Up-to-date as of: Wed Jul 24 11:11:29 EDT 2019