OLAC Record: Chinese Treebank 2.0

OLAC Record
oai:www.ldc.upenn.edu:LDC2001T11

Metadata

Title: Chinese Treebank 2.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Palmer, Martha, et al. Chinese Treebank 2.0 LDC2001T11. Web Download. Philadelphia: Linguistic Data Consortium, 2001

Contributor: Palmer, Martha

Marcus, Mitch

Kroch, Anthony

Xia, Fei

Xue, Nianwen

Chiou, Fu-Dong

Date (W3CDTF): 2001

Description: The Chinese Treebank 2.0 was produced by:

Principal Investigators: Martha Palmer, Mitch Marcus, Tony Kroch
Consultants: Martha Palmer, Mitch Marcus, Tony Kroch, Shizhe Huang, Mary Ellen Okurowski, John Kovarik, Boyan A. Onyshkevyc
Project Managers and Guideline Designers: Fei Xia, Nianwen Xue
Annotators: Fu-Dong Chiou, Nianwen Xue
Programming support: Zhibiao Wu

*Introduction*

Published by the Linguistic Data Consortium (LDC), catalog number LDC2001T11 and ISBN 1-58563-204-X. The Chinese Penn Treebank Project started in Summer 1998. The goal is the creation of a 100,000-word corpus of Chinese with syntactic bracketing. More information is available at The Chinese Treebank Project. Chinese Treebank 2.0 supersedes and replaces the Chinese Penn Treebank Final Release (LDC2000T48, ISBN 1-58563-187-6).

*Data*

Size: About 100K words, 325 data files
Source: 325 articles from Xinhua newswire between 1994 and 1998
Coding: GB code
Format: Same as the UPenn English Treebank, except that some original file information, such as "SRCID" and "DATE", was retained in the data file.
Annotation: All files are annotated at least twice: the first pass is done by one annotator, and the resulting files are checked by a second annotator (second pass).
SGML: All data files validate against chtb.dtd using nsgmls.

The files are located in the data subdirectory and are named sequentially as chtb_nnn.fid, where nnn is the sequential file number. The cross-reference file file.tbl provides some annotator and historical information.
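The file naming and coding described above can be illustrated with a minimal Python sketch. It rests on assumptions the record does not state: that nnn is a zero-padded three-digit number starting at 1, and that "GB code" corresponds to Python's gb2312 codec. The helper names chtb_filename and decode_gb are hypothetical and are not part of the corpus distribution.

```python
def chtb_filename(n: int) -> str:
    """Build a data file name of the form chtb_nnn.fid.

    Assumption: nnn is zero-padded to three digits and numbering starts at 1.
    """
    if not 1 <= n <= 999:
        raise ValueError("file number must be between 1 and 999")
    return f"chtb_{n:03d}.fid"


def decode_gb(raw: bytes) -> str:
    """Decode raw file bytes to text.

    Assumption: the record's "GB code" means GB2312; if decoding fails on
    real data, a superset codec such as "gbk" may be worth trying.
    """
    return raw.decode("gb2312")
```

With these helpers, the text of one file could be read with something like decode_gb(Path("data", chtb_filename(1)).read_bytes()), where Path comes from pathlib and the data directory name follows the description above.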

Extent: Corpus size: 1843 KB

Identifier: LDC2001T11

https://catalog.ldc.upenn.edu/LDC2001T11

ISBN: 1-58563-204-X

ISLRN: 324-683-461-517-1

DOI: 10.35111/jfkh-w176

Language: Mandarin Chinese

Language (ISO639): cmn

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2001T11

Rights Holder: Portions © 1994-1998 Xinhua News Agency, © 2001 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2001T11

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
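The GetRecord entries in this record refer to OAI-PMH requests. As a rough sketch, such a request is an HTTP GET with verb, identifier and metadataPrefix parameters; the prefixes "olac" and "oai_dc" are the ones conventionally used for the OLAC and simple DC formats. The repository base URL is not given in this record, so the one below is a placeholder, and get_record_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder only: the real repository base URL is not stated in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# The identifier is taken from the OaiIdentifier field of this record.
olac_url = get_record_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2001T11", "olac")
dc_url = get_record_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2001T11", "oai_dc")
```

Note that urlencode percent-encodes the colons in the identifier, which is the form an OAI-PMH repository expects in a query string.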

Search Info
Citation: Palmer, Martha; Marcus, Mitch; Kroch, Anthony; Xia, Fei; Xue, Nianwen; Chiou, Fu-Dong. 2001. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Text iso639_cmn olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2001T11
Up-to-date as of: Wed Oct 29 7:00:09 EDT 2025
