OLAC Record
oai:www.ldc.upenn.edu:LDC2023T05

Metadata
Title:Penn Korean Universal Dependency Treebank
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Choi, Jinho D., et al. Penn Korean Universal Dependency Treebank LDC2023T05. Web Download. Philadelphia: Linguistic Data Consortium, 2023
Contributor:Choi, Jinho D.
Han, Na-Rae
Hwang, Jena D.
Kim, Hansaem
Date (W3CDTF):2023
Date Issued (W3CDTF):2023-04-17
Description:Introduction: Penn Korean Universal Dependency Treebank contains 5,010 sentences and 132,041 tokens annotated in dependency format under the Universal Dependencies framework. It is a conversion of Korean Treebank Annotations Version 2.0 (LDC2006T09), which was produced in constituency format. In general, dependency grammar is based on the idea that the verb is the center of the clause structure and that the other units in the sentence are connected to the verb by directed links, or dependencies. The correspondence is one-to-one: for every element in the sentence there is exactly one node in the sentence structure. In constituency (phrase structure) grammars, by contrast, clauses are divided into noun phrases and verb phrases, and one or more nodes may correspond to a single element.
Data: The source text is newswire stories from the Linguistic Data Consortium's Korean Press Agency collection contained in Korean Newswire (LDC2000T45). Sentences were automatically converted for dependency annotation, and the output was manually checked. The corpus contains 112 files in CoNLL-U format, the Universal Dependencies standard, each with a mapping to its counterpart in LDC2006T09.
Samples: A sample in CoNLL-U format is available.
Updates: None at this time.
Extent:Corpus size: 4237 KB
Identifier:LDC2023T05
https://catalog.ldc.upenn.edu/LDC2023T05
ISLRN: 522-574-570-040-8
DOI: 10.35111/d63z-aw81
Language:Korean
Language (ISO639):kor
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2023T05
Rights Holder:Portions © 2023 Jinho D. Choi, © 2023 Na-Rae Han, © 2023 Jena D. Hwang, © 2023 Hansaem Kim, © 1994-2000 Korean Press Agency, © 2000-2002, 2006, 2023 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
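
The Description above notes that the corpus is distributed as CoNLL-U files. As a minimal sketch of how such a file might be read, the following assumes the standard ten tab-separated CoNLL-U columns, comment lines starting with "#", and blank lines between sentences. The names read_conllu and dependency_edges are hypothetical helpers, and the sample sentence is an invented toy example, not data from this corpus.

```python
from typing import Dict, Iterator, List, Tuple

# The ten standard CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


def dependency_edges(sentence: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (head form, relation, dependent form) for each basic token."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    edges = []
    for tok in sentence:
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in tok["id"] or "." in tok["id"]:
            continue
        head = "ROOT" if tok["head"] == "0" else forms.get(tok["head"], "?")
        edges.append((head, tok["deprel"], tok["form"]))
    return edges


# Invented toy sentence for illustration only; not taken from the corpus.
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE))
edges = dependency_edges(sentences[0])
```

Each token has exactly one head, which reflects the one-to-one property of dependency structures described in the Introduction. For real work, an established CoNLL-U reader is likely a better choice than this sketch.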

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2023T05
DateStamp:  2024-01-01
GetRecord:  OAI-PMH request for simple DC format
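
The GetRecord entries above refer to OAI-PMH requests. A rough sketch of how such a request URL could be assembled is shown below. The verb, identifier, and metadataPrefix parameters are defined by the OAI-PMH protocol, and oai_dc is its standard prefix for simple Dublin Core. The base URL is a placeholder, since the repository endpoint is not given in this record, and build_get_record_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the actual repository base URL is not stated in this record.
BASE_URL = "https://example.org/oai"


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


url = build_get_record_url(
    BASE_URL,
    "oai:www.ldc.upenn.edu:LDC2023T05",  # OaiIdentifier from this record
    "oai_dc",                            # simple Dublin Core
)
```

Requesting the OLAC format would presumably use a different metadataPrefix value; the exact prefix supported by a given repository can be checked with the protocol's ListMetadataFormats verb.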

Search Info

Citation: Choi, Jinho D.; Han, Na-Rae; Hwang, Jena D.; Kim, Hansaem. 2023. Penn Korean Universal Dependency Treebank. Linguistic Data Consortium.
Terms: area_Asia country_KR dcmi_Text iso639_kor olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2023T05
Up-to-date as of: Tue Feb 13 6:33:51 EST 2024