OLAC Record: Korean Propbank

OLAC Record
oai:www.ldc.upenn.edu:LDC2006T03

Metadata

Title: Korean Propbank

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Palmer, Martha, et al. Korean Propbank LDC2006T03. Web Download. Philadelphia: Linguistic Data Consortium, 2006

Contributor: Palmer, Martha

Ryu, Shijong

Choi, Jinyoung

Yoon, Sinwon

Jeon, Yeongmi

Date (W3CDTF): 2006

Date Issued (W3CDTF): 2006-03-24

Description: *Introduction* Korean Propbank was developed by the Computer and Information Sciences Department at the University of Pennsylvania and comprises approximately 33,300 annotated predicates in 186,300 words of Korean text. The text comes from Korean English Treebank Annotations (LDC2002T26) and Korean Treebank Version 2.0 (LDC2006T09). Each verb and adjective occurring in the Treebank is treated as a semantic predicate, and the surrounding text is annotated for the predicate's arguments and adjuncts. The verbs and adjectives are also tagged with coarse-grained senses.

*Data* The table below gives, for each source, the size in thousands of words (K-words) and the number of predicates annotated. The predicate counts in the table are approximate; the per-file token counts listed under The Annotation are 9,588 and 23,707.

Source            K-words   Predicates Annotated
Virginia Corpus      54.5                  9,590
Newswire Corpus     131.8                 23,700
Total               186.3                 33,300

There are two basic components to Korean Propbank:

* The Verb Lexicon: A frames file, consisting of one or more frame sets, has been created for each predicate occurring in the Treebank. These files serve as a reference for the annotators and for users of the data. 2,749 such files have been created, totaling about 10 MB of uncompressed data. The frames files use the XML format and the KSC 5601 character set encoding.
* The Annotation: There are two annotation files. The virginia-verbs.pb file has 9,588 annotated predicate tokens, covering all predicates in the 54.5 K-words of the Korean English Treebank Annotations, and totals about 791 KB of uncompressed data. The newswire-verbs.pb file has 23,707 annotated predicate tokens, covering all predicates in the 131.8 K-words of the Korean Treebank Version 2.0, and totals about 2,054 KB of uncompressed data.

*Samples* For an example of this corpus, please view this sample (TXT).

*Updates* None at this time.
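
Because the frames files are described as XML in the KS C 5601 character set, a reader working in Python may need to decode them explicitly before parsing. The following is a minimal sketch under stated assumptions, not the corpus's documented procedure: it assumes the files are byte-encoded as EUC-KR (Python's `euc_kr` codec, which lists `ksc5601` as an alias), and the element and attribute names in the sample (`frameset`, `predicate`, `lemma`) are hypothetical placeholders, since the record does not give the frames file schema.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the "KSC 5601" encoding in the record maps to Python's euc_kr codec.
FRAMES_ENCODING = "euc_kr"

# Matches a leading XML declaration such as <?xml version="1.0" encoding="EUC-KR"?>
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_frames_bytes(raw: bytes) -> ET.Element:
    """Decode raw frames-file bytes and parse them into an element tree.

    The bytes are decoded in Python first, and the XML declaration is
    dropped so the parser does not try to apply the declared encoding to
    text that is already decoded.
    """
    text = raw.decode(FRAMES_ENCODING)
    text = _XML_DECL.sub("", text, count=1)
    return ET.fromstring(text)


# Hypothetical miniature document; real frames files will differ.
sample = (
    '<?xml version="1.0" encoding="EUC-KR"?>'
    '<frameset><predicate lemma="가다"/></frameset>'
).encode(FRAMES_ENCODING)

root = parse_frames_bytes(sample)
lemmas = [p.get("lemma") for p in root.iter("predicate")]
```

For a file on disk, the same function could be called on the result of `pathlib.Path(...).read_bytes()`.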

Extent: Corpus size: 4198 KB

Identifier: LDC2006T03

https://catalog.ldc.upenn.edu/LDC2006T03

ISBN: 1-58563-374-7

ISLRN: 815-941-649-807-9

DOI: 10.35111/j0yk-ph77

Language: Korean

Language (ISO639): kor

License: Korean Propbank: https://catalog.ldc.upenn.edu/license/korean-propbank.pdf

LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2006T03

Rights Holder: Portions © 2001-2002 CoGenTex, Inc., © 1994-2000 Korean Press Agency, © 1998-2006 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2006T03

DateStamp: 2021-08-13

GetRecord: OAI-PMH request for simple DC format
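
The GetRecord entries in this record correspond to OAI-PMH requests for the same identifier in two metadata formats. As a rough sketch of how such request URLs are formed, the code below builds them from the standard OAI-PMH query parameters (`verb`, `identifier`, `metadataPrefix`). The base URL is a hypothetical placeholder, because the record does not state the repository's OAI-PMH endpoint; `olac` and `oai_dc` are the conventional prefixes for the OLAC and simple Dublin Core formats.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not give the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2006T03"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record and one format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_IDENTIFIER, "olac")    # OLAC format
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")    # simple DC format
```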

Search Info
Citation: Palmer, Martha; Ryu, Shijong; Choi, Jinyoung; Yoon, Sinwon; Jeon, Yeongmi. 2006. Linguistic Data Consortium.
Terms: area_Asia country_KR dcmi_Text iso639_kor olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2006T03
Up-to-date as of: Wed Oct 29 7:00:53 EDT 2025
