OLAC Record
oai:catalogue.elra.info:ELRA-W0089

Metadata
Title:NPChunks
Abstract:NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randomly from the written part of the CINTIL corpus. The corpus is PoS-annotated at token level, including punctuation, and noun phrases are annotated with specific tags. It was automatically PoS-tagged with the MBT tagger and lemmatized with MBLEM, following the annotation scheme of the Corpus of Reference of Contemporary Portuguese.
Access Rights:Rights available for: Commercial Use, Research Use
Date Available (W3CDTF):2016-01-20
Date Issued (W3CDTF):2016-01-20
Date Modified (W3CDTF):2016-01-20
Description:Written Corpora
NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randomly from the written part of the CINTIL corpus. For more information on the CINTIL corpus, see ELRA-W0050, ISLRN: 176-775-844-396-0. The corpus is PoS-annotated at token level, including punctuation, and noun phrases were recognized and annotated with specific tags. It was automatically PoS-tagged with the MBT tagger (http://ilk.uvt.nl/mbt/) and lemmatized with MBLEM (http://ilk.uvt.nl/mbma/), following the annotation scheme of the Corpus of Reference of Contemporary Portuguese. YamCha software (http://chasen.org/~taku/software/yamcha/) was used to recognize noun-phrase chunks and to identify the elements appearing at the beginning, in the middle and at the end of each noun phrase.
Identifier:ELRA-W0089
http://catalog.elra.info/product_info.php?products_id=1256
Language:Portuguese
Language (ISO639):por
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text
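
The description above says that the elements at the beginning, in the middle and at the end of each noun phrase are identified. As a minimal sketch of how such position tags could be turned back into noun-phrase chunks, the Python below assumes hypothetical tag names (B-NP, I-NP, E-NP, O) and a hypothetical (token, tag) input format; the actual tag set and file layout of NPChunks are not given in this record, and the sample sentence is invented for illustration.

```python
from typing import List, Tuple

# Hypothetical tag names; the real NPChunks tag set is not stated in this record.
BEGIN, INSIDE, END = "B-NP", "I-NP", "E-NP"


def extract_noun_phrases(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Group (token, tag) pairs into noun-phrase chunks."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for token, tag in tagged:
        if tag == BEGIN:
            # A new phrase starts; flush any phrase left open.
            if current:
                phrases.append(current)
            current = [token]
        elif tag in (INSIDE, END) and current:
            current.append(token)
            if tag == END:
                # The phrase is explicitly closed by its end tag.
                phrases.append(current)
                current = []
        else:
            # Token outside any noun phrase (or a stray continuation tag).
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases


# Invented example sentence, not taken from the corpus.
sentence = [
    ("O", BEGIN), ("gato", INSIDE), ("preto", END),
    ("viu", "O"),
    ("a", BEGIN), ("Maria", END),
    (".", "O"),
]
noun_phrases = extract_noun_phrases(sentence)
print(noun_phrases)
```

Stray continuation tags with no open phrase are ignored here; a stricter reader might prefer to raise an error instead.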

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0089
DateStamp:  2016-01-20
GetRecord:  OAI-PMH request for simple DC format
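
The GetRecord entries above refer to OAI-PMH requests. As a rough sketch, such a request is an HTTP GET whose query string carries the verb, the record identifier and a metadata prefix. The base URL below is a placeholder, since the repository endpoint is not given in this record; oai_dc is the standard OAI-PMH prefix for simple Dublin Core, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the actual OAI-PMH base URL is not stated in this record.
BASE_URL = "https://example.org/oai"


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (no network access is performed)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


url = build_get_record_url(BASE_URL, "oai:catalogue.elra.info:ELRA-W0089", "oai_dc")
print(url)
```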

Search Info

Citation: n.a. 2016. ELRA (European Language Resources Association).
Terms: area_Europe country_PT dcmi_Text iso639_por olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0089
Up-to-date as of: Tue Jun 18 10:52:49 EDT 2019