OLAC Record
oai:www.ldc.upenn.edu:LDC2025T06

Metadata
Title:Chinese Sentence Pattern Structure Treebank
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Peng, Weiming, et al. Chinese Sentence Pattern Structure Treebank LDC2025T06. Web Download. Philadelphia: Linguistic Data Consortium, 2025
Contributor:Peng, Weiming
Zhao, Min
He, Jing
Song, Yuchen
Song, Tianbao
Guo, Dongdong
Sun, Jingbo
Zhu, Shuqin
Zhang, Yinbin
Wei, Zuntian
Hu, Jiajia
Song, Jihua
Sui, Zhifang
Wang, Ning
Date (W3CDTF):2025
Date Issued (W3CDTF):2025-06-16
Description:*Introduction* Chinese Sentence Pattern Structure Treebank (the SPS Treebank) was developed at Beijing Normal University and Peking University. It contains 5,016 sentences and 119,627 tokens, syntactically annotated following the concept of sentence constituent analysis, which emphasizes sentence pattern structure. This concept is based on linguist Jinxi Li's The New Chinese Grammar. The source data consists of 27 chapters extracted from modern Mandarin and ancient Chinese works.
*Data* The SPS Treebank has three annotation layers: lexical sense and structural mode for dynamic words; syntactic structure for clauses; and inter-clause relations within complex sentences and sentence clusters. These structures can be visualized using the Jbw-viewer tool. Below are the text data sources and volumes contained in this release:
Book Name                                                         Chapters  Characters  Sentences
Selected Work of Luxun (《鲁迅全集》)                                    8      25,545        948
Selected Work of Mao Zedong (《毛泽东选集》)                             2      32,454        771
From the Soil: The Foundations of Chinese Society (《乡土中国》)         4      16,018        532
A Dream in Red Mansions (《红楼梦》)                                     5      33,087      1,781
The Analects of Confucius (《论语》)                                     6       5,392        517
Mencius (《孟子》)                                                       2       6,771        467
Total:                                                                  27     119,267      5,016
The data is presented in UTF-8 encoding. Each file contains the three-layer annotation stored in XML format. All files were automatically verified and manually checked.
*Samples* Please view the following samples: XML
*Updates* None at this time.
Extent:Corpus size: 4744 KB
Identifier:LDC2025T06
https://catalog.ldc.upenn.edu/LDC2025T06
ISLRN: 916-484-709-412-8
DOI: 10.35111/hx6v-6p30
Language:Mandarin Chinese
Chinese
Language (ISO639):cmn
zho
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2025T06
Rights Holder:Portions © 2025 Weiming Peng, © 2025 Trustees of the University of Pennsylvania
Subject:Mandarin Chinese language
Subject (ISO639):cmn
Subject (OLAC):historical_linguistics
linguistics_and_literature
Type (DCMI):Text
Type (Discourse):narrative
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2025T06
DateStamp:  2025-06-16
GetRecord:  OAI-PMH request for simple DC format
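
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a rough sketch of how such a request URL is formed, the following builds the query string from the record's OAI identifier. The base URL is a hypothetical placeholder (the record does not state the endpoint), and the `olac` prefix is assumed to be the one used for the OLAC format; `oai_dc` is the standard prefix for simple Dublin Core.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical placeholder: the real OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:www.ldc.upenn.edu:LDC2025T06"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one identifier and format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" is an assumed prefix for the OLAC format; "oai_dc" is simple Dublin Core.
olac_url = get_record_url(OAI_ID, "olac")
dc_url = get_record_url(OAI_ID, "oai_dc")

print(olac_url)
print(dc_url)
```

Replacing `BASE_URL` with the archive's actual endpoint would be required before sending either request; the sketch only constructs the URLs and performs no network access.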

Search Info

Citation: Peng, Weiming; Zhao, Min; He, Jing; Song, Yuchen; Song, Tianbao; Guo, Dongdong; Sun, Jingbo; Zhu, Shuqin; Zhang, Yinbin; Wei, Zuntian; Hu, Jiajia; Song, Jihua; Sui, Zhifang; Wang, Ning. 2025. Chinese Sentence Pattern Structure Treebank. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Text iso639_cmn iso639_zho olac_historical_linguistics olac_linguistics_and_literature olac_narrative olac_primary_text

Inferred Metadata

Country: China
Area: Asia


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2025T06
Up-to-date as of: Wed Jun 18 0:16:08 EDT 2025