OLAC Record: Chinese Proposition Bank 1.0

OLAC Record
oai:www.ldc.upenn.edu:LDC2005T23

Metadata

Title: Chinese Proposition Bank 1.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Palmer, Martha, et al. Chinese Proposition Bank 1.0 LDC2005T23. Web Download. Philadelphia: Linguistic Data Consortium, 2005

Contributor: Palmer, Martha

Xue, Nianwen

Jiang, Zixin

Chang, Meiyu

Date (W3CDTF): 2005

Date Issued (W3CDTF): 2005-09-20

Description:

*Introduction*

Chinese Proposition Bank 1.0 was developed by the Linguistic Data Consortium (LDC) and contains predicate-argument relations for approximately 37,000 propositions annotated in 250,000 words of Chinese text. Chinese Proposition Bank 1.0 is the first public release of the Penn Chinese Proposition Bank project, which aims to create a corpus of text annotated with information about basic semantic propositions. Specifically, predicate-argument relations have been added to the syntactic trees of the first update to Chinese Treebank 5.0 (LDC2005T01) as an additional layer of annotation.

There are two later versions of this corpus:

* Chinese Proposition Bank 2.0 (LDC2008T07)
* Chinese Proposition Bank 3.0 (LDC2013T13)

*Data*

Chinese Proposition Bank 1.0 includes annotations for files chtb_001.fid to chtb_931.fid, or the first 250K words of the first update of Chinese Treebank 5.0. There is a total of 37,183 propositions. Auxiliary verbs are not annotated. Some verbs have light verb and non-light verb uses; in these cases only the non-light verbs are annotated. All the annotations in this release are the result of double blind annotation followed by adjudication of differences.

The following table summarizes the framesets in CPB 1.0:

Total verbs framed               4,865
Total framesets                  5,298
Verbs with multiple framesets      351
Average framesets per verb        1.09

Each predicate-argument structure is represented in a line of space separated columns. The columns are as follows:

* ctb-filename: the name of the file in the Penn Chinese TreeBank 5.0 update 1.
* sentence: the number of the sentence in the file (starting with 0).
* terminal: the number of the terminal in the sentence that is the location of the verb.
* tagger: the name of the annotator, or "gold" if it's been double annotated and adjudicated.
* frameset: identifier from the frames file of the verb.
* inflection: a carry-over from the Penn English Proposition Bank, no annotation in the Chinese Proposition Bank.
* arglabel: a string representing the annotation associated with a particular argument or adjunct of the proposition in three columns: address of constituent, label, and functional tag.

*Samples*

For an example of the data in this corpus, please view this sample (XML).

*Update*

None at this time.
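The column layout listed under Data could be read with a short parser along the following lines. This is only a sketch under stated assumptions, not the LDC's own tooling: the record does not show the exact arglabel syntax, so the code assumes each arglabel is written as `address-label` with an optional `-functionaltag` suffix, as in the Penn English Proposition Bank. The names `ArgLabel`, `Proposition`, `parse_arglabel`, and `parse_line`, and the line in `SAMPLE`, are hypothetical and are not taken from the corpus.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArgLabel:
    address: str                    # address of the constituent in the tree
    label: str                      # argument or adjunct label
    functional_tag: Optional[str]   # None when no functional tag is present


@dataclass
class Proposition:
    ctb_filename: str
    sentence: int       # sentence number in the file, starting with 0
    terminal: int       # terminal number locating the verb
    tagger: str         # annotator name, or "gold"
    frameset: str
    inflection: str     # carried over from English PropBank, unannotated here
    args: List[ArgLabel]


def parse_arglabel(field: str) -> ArgLabel:
    # ASSUMPTION: hyphen-separated "address-label[-functionaltag]".
    parts = field.split("-", 2)
    if len(parts) < 2:
        raise ValueError(f"unexpected arglabel: {field!r}")
    tag = parts[2] if len(parts) == 3 else None
    return ArgLabel(address=parts[0], label=parts[1], functional_tag=tag)


def parse_line(line: str) -> Proposition:
    cols = line.split()
    if len(cols) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(cols)}")
    return Proposition(
        ctb_filename=cols[0],
        sentence=int(cols[1]),
        terminal=int(cols[2]),
        tagger=cols[3],
        frameset=cols[4],
        inflection=cols[5],
        # Every remaining column is one arglabel.
        args=[parse_arglabel(c) for c in cols[6:]],
    )


# Hypothetical line for illustration only; not copied from the corpus.
SAMPLE = "chtb_001.fid 0 3 gold 举行.01 ----- 3:0-rel 0:1-ARG0 2:1-ARGM-TMP"
```

Calling `parse_line(SAMPLE)` yields a `Proposition` whose `args` list holds one `ArgLabel` per trailing column. If the real files separate the three arglabel parts differently, only `parse_arglabel` would need to change.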

Identifier: LDC2005T23

https://catalog.ldc.upenn.edu/LDC2005T23

ISBN: 1-58563-354-2

ISLRN: 731-738-468-307-2

DOI: 10.35111/3myq-gk34

Language: Mandarin Chinese

Language (ISO639): cmn

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2005T23

Rights Holder: Portions © 1994-1998 Xinhua News Agency, © 1996-2001 Sinorama Magazine, © 1997 The Government of the Hong Kong Special Administrative Region, © 2005 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2005T23

DateStamp: 2021-07-26

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Palmer, Martha; Xue, Nianwen; Jiang, Zixin; Chang, Meiyu. 2005. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Text iso639_cmn olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005T23
Up-to-date as of: Wed Oct 29 7:00:52 EDT 2025
