OLAC Record
oai:www.ldc.upenn.edu:LDC2012T05

Metadata
Title:Chinese Dependency Treebank 1.0
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Che, Wanxiang, Zhenghua Li, and Ting Liu. Chinese Dependency Treebank 1.0 LDC2012T05. Web Download. Philadelphia: Linguistic Data Consortium, 2012
Contributor:Che, Wanxiang
Li, Zhenghua
Liu, Ting
Date (W3CDTF):2012
Date Issued (W3CDTF):2012-05-16
Description:*Introduction* Chinese Dependency Treebank 1.0 was developed by the Harbin Institute of Technology's Research Center for Social Computing and Information Retrieval (HIT-SCIR). It contains 49,996 Chinese sentences (902,191 words) randomly selected from People's Daily newswire stories published between 1992 and 1996 and annotated with syntactic dependency structures. *Data* Ill-formed or short sentences were eliminated from the randomly selected sentences prior to annotation. The data was segmented and annotated for part of speech (POS), syntactic structures, verb subclasses and noun compounds. Word segmentation and POS tagging were performed automatically using statistical models trained on a larger, annotated corpus of People's Daily newswire stories. Human annotators manually annotated the syntactic structures and corrected word segmentation errors. POS tags were not corrected. The data is provided in CoNLL-X format, encoded in UTF-8. Each line presents information for one word, and an empty line indicates the end of a sentence. Each line contains 10 tab-separated columns. *Updates* None at this time.
Extent:Corpus size: 25162 KB
Identifier:LDC2012T05
https://catalog.ldc.upenn.edu/LDC2012T05
ISBN: 1-58563-612-6
ISLRN: 475-765-099-443-8
Language:Mandarin Chinese
Chinese
Language (ISO639):cmn
zho
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2012T05
Rights Holder: Portions © 1992-1996 People's Daily, © 2012 Harbin Institute of Technology, Research Center for Social Computing and Information Retrieval, © 2012 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
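
Reading the data format: the Description above says the corpus is distributed in CoNLL-X format, UTF-8 encoded, with one word per line, 10 tab-separated columns per line, and an empty line ending each sentence. The following is a minimal sketch of a reader for that layout, not the corpus authors' own tooling. The column names are taken from the general CoNLL-X shared task convention and are assumed, not confirmed, to apply to this corpus. The function name `read_conllx` and the sample rows are hypothetical and are not drawn from the corpus.

```python
from typing import Dict, Iterable, Iterator, List

# Column names from the CoNLL-X shared task convention (assumed to apply here;
# the record itself only states that there are 10 tab-separated columns).
CONLLX_COLUMNS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of per-word dicts."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # An empty line marks the end of a sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(
                f"expected {len(CONLLX_COLUMNS)} columns, got {len(fields)}"
            )
        sentence.append(dict(zip(CONLLX_COLUMNS, fields)))
    # Emit the last sentence if the input lacks a trailing empty line.
    if sentence:
        yield sentence


# Hypothetical sample: invented rows with placeholder tags, NOT corpus data.
SAMPLE = (
    "1\t我\t_\tr\tr\t_\t2\tdep\t_\t_\n"
    "2\t来\t_\tv\tv\t_\t0\troot\t_\t_\n"
    "\n"
    "1\t好\t_\ta\ta\t_\t0\troot\t_\t_\n"
)

sentences = list(read_conllx(SAMPLE.splitlines()))
for sent in sentences:
    print([(w["form"], w["head"], w["deprel"]) for w in sent])
```

For a real file, the same function can be fed an open file handle, for example `open(path, encoding="utf-8")`, where `path` is wherever the downloaded data lives. Values are kept as strings so that any corpus-specific tag set passes through unchanged.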

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2012T05
DateStamp:  2019-01-03
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Che, Wanxiang; Li, Zhenghua; Liu, Ting. 2012. Chinese Dependency Treebank 1.0. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Text iso639_cmn iso639_zho olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2012T05
Up-to-date as of: Sun Sep 1 18:18:38 EDT 2019