OLAC Record
oai:catalogue.elra.info:ELRA-W0090

Metadata
Title:EUROPARL Corpus Parallel Corpora: Portuguese-English
Abstract:The Portuguese-English subpart of the EUROPARL Corpus was extracted from the proceedings of the European Parliament. It contains approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896 tokens of English (translation). It is composed of one text file for the English corpus and two files for the Portuguese version: a text file and an annotated file, containing a PoS tag and a lemma for each token.
Access Rights:Rights available for: Commercial Use, Research Use
Date Available (W3CDTF):2016-01-20
Date Issued (W3CDTF):2016-01-20
Date Modified (W3CDTF):2016-01-20
Description:Written Corpora
The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora) was extracted from the proceedings of the European Parliament. It contains transcriptions of sessions from 1996 to 2011, totalling approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896 tokens of English (translation). The corpus is composed of one text file for the English corpus and two files for the Portuguese version: a text file and an annotated file. The text files contain plain text with no further annotation. The Portuguese annotated file is a four-column file with one token per line, each token followed by its PoS tag and its lemma. The corpus was automatically PoS-tagged with the MBT tagger (http://ilk.uvt.nl/mbt/) and lemmatized with MBLEM (http://ilk.uvt.nl/mbma/), following the annotation scheme of the Corpus of Reference of Contemporary Portuguese.
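A minimal sketch of reading the annotated Portuguese file described above. It assumes the columns are tab-separated and ordered token, PoS tag, lemma; the record does not document the separator or the fourth column, so both are assumptions here, and the fourth column is simply ignored. The `Token` and `parse_annotated` names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One line of the annotated Portuguese file (assumed column order)."""
    form: str
    pos: str
    lemma: str


def parse_annotated(lines: Iterable[str], sep: str = "\t") -> Iterator[Token]:
    # ASSUMPTION: tab-separated columns ordered token, PoS tag, lemma.
    # The fourth column is not described in the record, so it is ignored.
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines (possible sentence/document breaks)
        fields = line.split(sep)
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got {len(fields)}: {line!r}")
        yield Token(fields[0], fields[1], fields[2])


def lemma_frequencies(lines: Iterable[str]) -> Counter:
    """Count how often each lemma occurs in the annotated file."""
    return Counter(tok.lemma for tok in parse_annotated(lines))
```

Because `parse_annotated` is a generator, it can be fed an open file handle directly and processes the roughly 58 million Portuguese tokens line by line without loading the whole file into memory.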
Identifier:ELRA-W0090
http://catalog.elra.info/product_info.php?products_id=1257
Language:Portuguese
English
Language (ISO639):por
eng
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0090
DateStamp:  2016-01-20
GetRecord:  OAI-PMH request for simple DC format
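The GetRecord links above correspond to standard OAI-PMH requests for this record's identifier: `metadataPrefix=olac` for the OLAC format and `metadataPrefix=oai_dc` for simple Dublin Core. A minimal sketch of building such a request follows; the endpoint is not given in this record, so `BASE_URL` is a hypothetical placeholder.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: the record does not state the repository's OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
RECORD_ID = "oai:catalogue.elra.info:ELRA-W0090"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (query string is percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    for prefix in ("olac", "oai_dc"):  # OLAC format, then simple DC
        print(get_record_url(BASE_URL, RECORD_ID, prefix))
```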

Search Info

Citation: n.a. 2016. ELRA (European Language Resources Association).
Terms: area_Europe country_GB country_PT dcmi_Text iso639_eng iso639_por olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0090
Up-to-date as of: Tue Jun 18 10:52:50 EDT 2019