OLAC Record

Title:MLCC Multilingual and Parallel Corpora
Access Rights: Rights available for: nonCommercialUse
Date Available (W3CDTF):1996-09-01
Date Issued (W3CDTF):1996-09-01
Date Modified (W3CDTF):2012-05-23
Description:The MLCC text corpus has two main components: one set that allows comparable studies to be carried out in different languages, and one set that serves as the basis for translation studies.
The first set is referred to as the Polylingual Document Collection, a collection of articles from financial newspapers in 6 languages (Dutch, English, French, German, Italian and Spanish). It consists of the following sub-corpora:
Dutch - Het Financieele Dagblad - 1992-1993 (Samples)
The corpus contains articles from the Dutch financial newspaper Het Financieele Dagblad, editions from 2nd January 1992 through 24th December 1993. It contains around 8.5 million words of text.
English - The Financial Times - 1993 (Samples)
The corpus contains articles from the British financial newspaper The Financial Times, editions from the year 1993. The corpus contains around 30 million words.
French - Le Monde - 1992-1993 (Samples)
A corpus of articles from the French newspaper Le Monde, consisting of two years' worth (1992-1993) of articles on financial subjects, approximately 10 million words.
German - Handelsblatt - 1986-1988 (Samples)
This subcorpus consists of articles from the period 02.01.1986 to 15.06.1988. It contains some 33 million words. It may be possible to obtain more recent articles from Handelsblatt.
Italian - Il Sole 24 Ore - 1992-1993 (Samples)
The corpus described here contains articles from the Italian financial newspaper Il Sole 24 Ore from the year 1992. This corpus contains some 1.88 million words. The SGML markup was done by the University of Edinburgh.
Spanish - Expansion - 1994 (Samples)
This subcorpus contains articles from the Spanish financial newspaper Expansion, editions from 21.10.1991 to 24.10.1991 and 14.05.1994 to 27.12.1994. It contains some 10 million words.
The second set is a Multilingual Parallel Corpus consisting of translated data in nine European languages: Danish, Dutch, English, French, German, Greek, Italian, Portuguese and Spanish.
The parallel data, provided by the European Commission, comprises two sub-corpora from the Official Journal of the European Communities:
Official Journal of the European Commission, C Series: Written Questions 1993
Records of questions and answers regarding European Community matters. The data is regularly published as one section of the C Series of the Official Journal of the European Community in all official languages (previously nine). This corpus contains written questions asked by members of the European Parliament and the corresponding answers from the European Commission in 9 parallel versions. The total size of the corpus is approximately 10.2 million words (ca. 1.1 million words per language).
Official Journal of the European Commission, Annex: Debates of the European Parliament 1992-1994
This parallel corpus consists of the records of Parliamentary sittings published as an annex to the Official Journal of the European Community, Debates of the European Parliament. The Parliamentary Debates are a record of what was said by members at the meeting as well as written input provided to the meeting. The original data from which the translations are produced consist of a transcript of the sittings, each member speaking in the language of his choice. The final version consists of nine parallel versions of the material. The texts delivered comprise the Debates of Parliament from January 1992 to July 1994. This sub-corpus contains some 5 to 8 million words per language.
ISLRN: 963-635-729-341-8
Identifier (URI):https://catalog.elra.info/en-us/repository/browse/ELRA-W0023/
Language:Spanish; Castilian
Language:Dutch; Flemish
Language:Modern Greek (1453-)
Language (ISO639):fra
Medium:Not specified
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0023
DateStamp:  1996-09-01
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 1996. ELRA (European Language Resources Association).
Terms: area_Europe country_DE country_DK country_ES country_FR country_GB country_GR country_IT country_NL country_PT dcmi_Text iso639_dan iso639_deu iso639_ell iso639_eng iso639_fra iso639_ita iso639_nld iso639_por iso639_spa olac_primary_text

Up-to-date as of: Fri Apr 19 6:29:30 EDT 2024