OLAC Record
oai:www.ldc.upenn.edu:LDC94T4B-1

Metadata
Title:UN Parallel Text (English)
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Graff, David. UN Parallel Text (English) LDC94T4B-1. Web Download. Philadelphia: Linguistic Data Consortium, 1994
Contributor:Graff, David
Date (W3CDTF):1994
Description:LDC94T4A - Complete UN Parallel Text corpus; LDC94T4B-1 - English text only; LDC94T4B-2 - French text only; LDC94T4B-3 - Spanish text only. This set of three compact discs contains documents provided to the LDC by the United Nations, for use in research on machine translation technology. The documents come from the Office of Conference Services at the UN in New York and are drawn from archives that span the period between 1988 and 1993. This publication contains the English, French and Spanish archives, with data from each language stored on a separate disc in the set. Care has been taken to arrange the document files in a parallel directory structure for each language, so that corresponding translations of a document can be found directly by means of the directory paths and file names. All parallel files in this corpus are English-based: for every file on the English disc, there will be a corresponding file on either the French or Spanish disc, or both. Tables are included on all discs to assist in determining which parallels are present. Due to the nature and organization of UN translation services and the original electronic text archives, the process of finding and sorting out parallel documents yielded numerous gaps, with many files in each language having no parallel in other languages. In preparing the text for publication, we have applied a fully compliant SGML format (Standard Generalized Markup Language). For those researchers who use SGML, a working DTD (Document Type Definition) is provided on each disc. For those who do not need SGML markup, a simple script is included that can be used to filter out the SGML-specific material and leave only the plain text. The character set used is the 8-bit ISO 8859-1 Latin1, in which accented letters and some other non-ASCII characters occupy the upper 128 entries of the character table.
Identifier:LDC94T4B-1
https://catalog.ldc.upenn.edu/LDC94T4B-1
ISBN: 1-58563-039-X
ISLRN: 494-248-767-772-0
DOI: 10.35111/8m79-yv13
Language:English
Language (ISO639):eng
License:UN Parallel Text Agreement: https://catalog.ldc.upenn.edu/license/un-parallel-text-license.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC94T4B-1
Rights Holder:Portions © 1988-1993 United Nations, © 1994 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
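
The Description above mentions three practical points: the text is encoded in ISO 8859-1, SGML markup can be filtered out to leave plain text, and translations are located through a parallel directory structure. The following Python sketch illustrates those ideas only. It is not the filter script shipped on the discs, and the function names, the directory roots, and the assumption that a parallel file has the same relative path and file name on each disc are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches anything that looks like an SGML tag, e.g. <DOC> or </p>.
# Assumption: a simple tag pattern is enough; this does not use the DTD.
TAG_RE = re.compile(r"<[^>]*>")


def strip_sgml(raw: bytes) -> str:
    """Decode ISO 8859-1 bytes and drop SGML tags, keeping non-empty lines."""
    text = raw.decode("iso-8859-1")  # every byte value is valid in Latin-1
    text = TAG_RE.sub("", text)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def parallel_path(english_file: str, english_root: str, other_root: str) -> PurePosixPath:
    """Map a file under the English root to the same relative path under another root.

    Assumption: the relative path and file name are identical across languages.
    The tables on the discs remain the authority on which parallels exist.
    """
    relative = PurePosixPath(english_file).relative_to(english_root)
    return PurePosixPath(other_root) / relative


sample = b"<DOC>\n<p>Caf\xe9 r\xe9sum\xe9</p>\n</DOC>"
plain = strip_sgml(sample)
french = parallel_path("/mnt/en/1990/doc001.sgm", "/mnt/en", "/mnt/fr")
```

Because many files have no parallel in the other languages, a path computed this way should be checked for existence before use.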

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC94T4B-1
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
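
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, an OAI-PMH GetRecord request URL can be assembled from the standard `verb`, `identifier`, and `metadataPrefix` parameters. The base URL below is a placeholder, since the repository endpoint is not given in this record, and the function name is hypothetical. The prefix `oai_dc` is the standard name for simple DC; `olac` is assumed here as the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder only: the real repository base URL is not stated in this record.
BASE_URL = "https://example.org/oai"


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL with percent-encoded parameters."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


record_id = "oai:www.ldc.upenn.edu:LDC94T4B-1"
dc_url = build_getrecord_url(BASE_URL, record_id, "oai_dc")
olac_url = build_getrecord_url(BASE_URL, record_id, "olac")  # assumed prefix
```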

Search Info

Citation: Graff, David. 1994. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC94T4B-1
Up-to-date as of: Mon Mar 25 7:19:52 EDT 2024