OLAC Record
oai:catalogue.elra.info:ELRA-W0088

Metadata
Title:ROMBAC - Romanian balanced corpus
Abstract:ROMBAC is a Romanian corpus containing equal shares of texts from five genres: journalism, legalese, fiction, medicine and biographical data for Romanian literary personalities. The entire corpus counts around 41,000,000 words, including punctuation. The corpus is annotated at paragraph, sentence, constituent group and word levels, and it provides morpho-syntactic descriptions (MSDs). It is encoded in XML.
Access Rights:Rights available for: Commercial Use, Research Use
Date Available (W3CDTF):2016-01-19
Date Issued (W3CDTF):2016-01-19
Date Modified (W3CDTF):2016-01-19
Description:Written Corpora
ROMBAC is a Romanian corpus containing equal shares of texts from five genres: journalism, legalese, fiction, medicine and biographical data for Romanian literary personalities. For each genre, texts containing around 7,000,000 words have been selected, so that the entire corpus counts around 41,000,000 words, including punctuation. The corpus is annotated at paragraph, sentence, constituent group and word levels. It provides morpho-syntactic descriptions (MSDs), which have been assigned automatically with the high-accuracy TTL tagger (accuracy of at least 98%), which implements the tiered tagging methodology. About 20% of the MSDs have been manually checked, validated and, where necessary, corrected. The MSDs follow the Multext-East specifications, which define 614 different MSDs for Romanian. The tagset has been slightly modified: new tags for named entities have been added. The corpus is encoded in XML.
Identifier:ELRA-W0088
http://catalog.elra.info/product_info.php?products_id=1253
Language:Romanian
Language (ISO639):ron
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0088
DateStamp:  2016-01-19
GetRecord:  OAI-PMH request for simple DC format
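
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch of how such a request URL is formed, the Python snippet below assembles the standard OAI-PMH query arguments (verb, identifier, metadataPrefix). The base URL is a hypothetical placeholder, since the actual endpoint is not given in this record, and the helper name build_getrecord_url is introduced only for illustration. The prefix oai_dc is the standard OAI-PMH name for simple Dublin Core; olac is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real OAI-PMH endpoint is not stated in this record.
BASE_URL = "http://example.org/oai"


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL from its three required arguments."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Identifier taken from the OaiIdentifier field of this record.
url = build_getrecord_url(BASE_URL, "oai:catalogue.elra.info:ELRA-W0088", "oai_dc")
print(url)
```

Passing "olac" instead of "oai_dc" as the last argument would, under the assumption stated above, request the OLAC-format version of the same record.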

Search Info

Citation: n.a. 2016. ELRA (European Language Resources Association).
Terms: area_Europe country_RO dcmi_Text iso639_ron olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0088
Up-to-date as of: Wed Jul 24 11:11:29 EDT 2019