OLAC Record oai:www.ldc.upenn.edu:LDC2002L27 |
Metadata | ||
Title: | Chinese-English Translation Lexicon Version 3.0 | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Huang, Shudong, and David Graff. Chinese-English Translation Lexicon Version 3.0 LDC2002L27. Web Download. Philadelphia: Linguistic Data Consortium, 2002 | |
Contributor: | Huang, Shudong | |
Graff, David | ||
Date (W3CDTF): | 2002 | |
Date Issued (W3CDTF): | 2002-06-17 | |
Description: | *Introduction* 2002 Chinese-English Translation Lexicon Version 3.0 was developed by the Linguistic Data Consortium (LDC). In 1999, responding to urgent demand for a Chinese-English bilingual wordlist to support various projects, LDC quickly solicited entries from both in-house and Internet resources and compiled two versions of Chinese-English wordlists, "ldc_ce_dict.1.0.gb" (henceforth Version 1) and "ldc_ce_dict.2.0.txt" (henceforth Version 2). Version 1, with 24,298 entries, was relatively small and its coverage was unbalanced. Version 2, created as an experiment, is impractical for translingual information processing. Many of its entries were created by reversing source and target language fields in various English-to-Chinese wordlists; as a result many entries are not really words. The increasing demand for richer lexical resources led to the present release, "ldc_cedict.gb.Version 3" (henceforth Version 3). *Data* What's New in Version 3: The total number of Chinese headwords in this release is 54,170. In terms of coverage, Version 3 is a superset of Version 1 and the LDC's Mandarin pronunciation lexicon (Version 3/Version 4). The pronunciation lexicon has a total of 44,404 entries, or 43,968 unique Chinese character strings (i.e. with pronunciation removed). There are still 553 entries from the pronunciation lexicon not found in Version 3. We were unable to provide accurate translations for these headwords for various reasons: they may be very technical; they don't make sense unless their source is re-examined; they may have segmentation errors; or they may be rare words for which appropriate translations could not be found due to limited time and resources. Version 3 also left out fewer than 40 entries from Version 1. Most of these are rare single-character words whose translations cannot be verified for accuracy. *Format* There is one data file, the lexicon itself. 
Within the lexicon, each entry is in this format: head_word_in_Chinese_characters /gloss 1/gloss 2/.../gloss n/ For example: 汉语 /Chinese language/Chinese/ 英文 /English language/English/ (A Chinese-capable browser is needed to see this properly. You may need to change your browser's character set to see Simplified Chinese characters.) *Updates* There are no updates at this time. | |
Identifier: | LDC2002L27 | |
https://catalog.ldc.upenn.edu/LDC2002L27 | ||
ISBN: 1-58563-238-4 | ||
ISLRN: 108-782-445-016-1 | ||
DOI: 10.35111/7t9x-0z23 | ||
Language: | English | |
Mandarin Chinese | ||
Language (ISO639): | eng | |
cmn | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2002L27 | |
Rights Holder: | Portions © 2002 Trustees of the University of Pennsylvania | |
Subject: | Mandarin Chinese language | |
Subject (ISO639): | cmn | |
Type (DCMI): | Text | |
Type (OLAC): | lexicon | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2002L27 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Huang, Shudong; Graff, David. 2002. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_CN country_GB dcmi_Text iso639_cmn iso639_eng olac_lexicon | |
Inferred Metadata | ||
Country: | China | |
Area: | Asia |
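
The entry format given in the Description field (a Chinese headword followed by slash-delimited glosses) can be read with a few lines of Python. The following is a minimal sketch, not an LDC-provided tool. It assumes that the headword contains no whitespace, that whitespace separates it from the first slash, and that the data file is GB2312-encoded (suggested by the ".gb" in the release name but not stated in this record). The names `parse_entry`, `iter_entries`, and `read_lexicon` are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed layout: headword, whitespace, then glosses delimited by "/".
ENTRY_RE = re.compile(r"^(\S+)\s+/(.*)/\s*$")


def parse_entry(line: str) -> Tuple[str, List[str]]:
    """Split one lexicon line into (headword, list of glosses)."""
    match = ENTRY_RE.match(line)
    if match is None:
        raise ValueError(f"not a lexicon entry: {line!r}")
    headword, body = match.groups()
    # Drop empty fields that would come from doubled slashes.
    glosses = [g.strip() for g in body.split("/") if g.strip()]
    return headword, glosses


def iter_entries(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Parse every non-blank line of an iterable of text lines."""
    for line in lines:
        if line.strip():
            yield parse_entry(line)


def read_lexicon(path: str, encoding: str = "gb2312") -> Iterator[Tuple[str, List[str]]]:
    """Read entries from a file; the default encoding is an assumption."""
    with open(path, encoding=encoding) as handle:
        yield from iter_entries(handle)


if __name__ == "__main__":
    sample = [
        "汉语 /Chinese language/Chinese/",
        "英文 /English language/English/",
    ]
    for headword, glosses in iter_entries(sample):
        print(headword, glosses)
```

If the headwords display as strings such as "ººÓï", the GB2312 bytes have most likely been decoded as Latin-1; passing the correct encoding when opening the file avoids this.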