OLAC Record oai:www.ldc.upenn.edu:LDC2025T06 |
Metadata | ||
Title: | Chinese Sentence Pattern Structure Treebank | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Peng, Weiming, et al. Chinese Sentence Pattern Structure Treebank LDC2025T06. Web Download. Philadelphia: Linguistic Data Consortium, 2025 | |
Contributor: | Peng, Weiming | |
Zhao, Min | ||
He, Jing | ||
Song, Yuchen | ||
Song, Tianbao | ||
Guo, Dongdong | ||
Sun, Jingbo | ||
Zhu, Shuqin | ||
Zhang, Yinbin | ||
Wei, Zuntian | ||
Hu, Jiajia | ||
Song, Jihua | ||
Sui, Zhifang | ||
Wang, Ning | ||
Date (W3CDTF): | 2025 | |
Date Issued (W3CDTF): | 2025-06-16 | |
Description: | *Introduction* Chinese Sentence Pattern Structure Treebank (the SPS Treebank) was developed at Beijing Normal University and Peking University. It contains 5,016 sentences and 119,627 tokens, syntactically annotated following the concept of sentence constituent analysis, which emphasizes sentence pattern structure. This concept is based on linguist Jinxi Li's The New Chinese Grammar. The source data consists of 27 chapters extracted from modern Mandarin and ancient Chinese works.
*Data* The SPS Treebank has three annotation layers: lexical sense and structural mode for dynamic words; syntactic structure for clauses; and inter-clause relations within complex sentences and sentence clusters. These structures can be visualized using the Jbw-viewer tool. Below are the text data sources and volumes contained in this release:
Book Name | Chapters | Characters | Sentences |
Selected Work of Luxun (《鲁迅全集》) | 8 | 25,545 | 948 |
Selected Work of Mao Zedong (《毛泽东选集》) | 2 | 32,454 | 771 |
From the Soil: The Foundations of Chinese Society (《乡土中国》) | 4 | 16,018 | 532 |
A Dream in Red Mansions (《红楼梦》) | 5 | 33,087 | 1,781 |
The Analects of Confucius (《论语》) | 6 | 5,392 | 517 |
Mencius (《孟子》) | 2 | 6,771 | 467 |
Total: | 27 | 119,267 | 5,016 |
The data is presented in UTF-8 encoding. Each file contains the three-layer annotation stored in XML format. All files were automatically verified and manually checked.
*Samples* Please view the following samples: * XML
*Updates* None at this time. | |
Extent: | Corpus size: 4744 KB | |
Identifier: | LDC2025T06 | |
https://catalog.ldc.upenn.edu/LDC2025T06 | ||
ISLRN: 916-484-709-412-8 | ||
DOI: 10.35111/hx6v-6p30 | ||
Language: | Mandarin Chinese | |
Chinese | ||
Language (ISO639): | cmn | |
zho | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2025T06 | |
Rights Holder: | Portions © 2025 Weiming Peng, © 2025 Trustees of the University of Pennsylvania | |
Subject: | Mandarin Chinese language | |
Subject (ISO639): | cmn | |
Subject (OLAC): | historical_linguistics | |
linguistics_and_literature | ||
Type (DCMI): | Text | |
Type (Discourse): | narrative | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2025T06 | |
DateStamp: | 2025-06-16 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Peng, Weiming; Zhao, Min; He, Jing; Song, Yuchen; Song, Tianbao; Guo, Dongdong; Sun, Jingbo; Zhu, Shuqin; Zhang, Yinbin; Wei, Zuntian; Hu, Jiajia; Song, Jihua; Sui, Zhifang; Wang, Ning. 2025. Linguistic Data Consortium. | |
Terms: | area_Asia country_CN dcmi_Text iso639_cmn iso639_zho olac_historical_linguistics olac_linguistics_and_literature olac_narrative olac_primary_text | |
Inferred Metadata | ||
Country: | China | |
Area: | Asia |