OLAC Record oai:www.ldc.upenn.edu:LDC2020L01 |
Metadata | ||
Title: | Database of Word Level Statistics - Mandarin | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Neergaard, Karl David, Hongzhi Xu, and Chu-Ren Huang. Database of Word Level Statistics - Mandarin LDC2020L01. Web Download. Philadelphia: Linguistic Data Consortium, 2020 | |
Contributor: | Neergaard, Karl David | |
Xu, Hongzhi | ||
Huang, Chu-Ren | ||
Date (W3CDTF): | 2020 | |
Date Issued (W3CDTF): | 2020-01-15 | |
Description: | *Introduction* Database of Word Level Statistics - Mandarin was developed by The Hong Kong Polytechnic University. It provides descriptive and statistical lexical characteristics for words and nonwords of Mandarin Chinese. It is designed particularly for researchers concerned with the processing of isolated words. Invariant characteristics include each item's lexicality, SAMPA, pinyin, and IPA transcriptions, lexical tone, syllable structure, syllable length, pinyin length, segment length, dominant part of speech (PoS), lexical frequency of the dominant PoS, percentage of occurrences accounted for by that dominant PoS, and other PoSes associated with the given item. *Data* The corpus is presented as a series of UTF-8 encoded, tab-separated plain text files. The original frequency counts were adapted from the word list in Subtlex-CH. Monosyllables from the Subtlex-CH character list that were not present as monosyllabic words were added to the list in order to provide statistical information for all Mandarin syllables. *Samples* Please view this sample. *Updates* None at this time. | |
Extent: | Corpus size: 173827 KB | |
Identifier: | LDC2020L01 | |
https://catalog.ldc.upenn.edu/LDC2020L01 | ||
ISBN: 1-58563-914-1 | ||
ISLRN: 337-551-709-997-4 | ||
DOI: 10.35111/hcwt-mh66 | ||
Language: | Mandarin Chinese | |
Language (ISO639): | cmn | |
License: | Database of Word Level Statistics – Mandarin Agreement: https://catalog.ldc.upenn.edu/license/database-of-word-level-statistics-mandarin-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2020L01 | |
Rights Holder: | Portions © 2020 The Hong Kong Polytechnic University, © 2020 Trustees of the University of Pennsylvania | |
Subject: | Mandarin Chinese language | |
Subject (ISO639): | cmn | |
Type (DCMI): | Text | |
Type (OLAC): | language_description | |
lexicon | ||
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2020L01 | |
DateStamp: | 2021-01-01 | |
GetRecord: | OAI-PMH request for simple DC format | |
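The GetRecord entries above correspond to standard OAI-PMH requests. A minimal sketch of how such request URLs could be built follows. The base URL is a placeholder, since this record does not state the repository endpoint, and `get_record_url` is a hypothetical helper name. The `olac` and `oai_dc` metadata prefixes are the conventional ones for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the record does not state the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# OaiIdentifier taken from this record.
OAI_ID = "oai:www.ldc.upenn.edu:LDC2020L01"

olac_url = get_record_url(OAI_ID, "olac")   # OLAC format
dc_url = get_record_url(OAI_ID, "oai_dc")   # simple Dublin Core format
```

Only the `metadataPrefix` argument differs between the two requests; the identifier is the same in both.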
Search Info | ||
Citation: | Neergaard, Karl David; Xu, Hongzhi; Huang, Chu-Ren. 2020. Linguistic Data Consortium. | |
Terms: | area_Asia country_CN dcmi_Text iso639_cmn olac_language_description olac_lexicon | |
Inferred Metadata | ||
Country: | China | |
Area: | Asia |
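The Description states that the corpus is distributed as UTF-8 encoded, tab-separated plain text files. A minimal sketch of reading one such file with the Python standard library follows. It assumes each file begins with a header row, which this record does not confirm, and the column names and values in the sample are invented for illustration rather than taken from the corpus.

```python
import csv
import io


def read_tsv(stream):
    """Read tab-separated text into a list of dicts keyed by the header row.

    Assumes the first line is a header row (not confirmed by the record).
    Quote handling is disabled so quote characters in the data stay literal.
    """
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Invented sample with hypothetical column names, for illustration only.
sample = "item\tlexicality\tpinyin\nma1\tword\tma\n"
rows = read_tsv(io.StringIO(sample))

# For a real file, open it as UTF-8 (the path here is a placeholder):
# with open("path/to/file.txt", encoding="utf-8", newline="") as f:
#     rows = read_tsv(f)
```

Each row comes back as a dictionary of strings, so numeric fields such as frequencies would still need explicit conversion.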