OLAC Record: Korean English Treebank Annotations

OLAC Record
oai:www.ldc.upenn.edu:LDC2002T26

Metadata

Title: Korean English Treebank Annotations

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Palmer, Martha, et al. Korean English Treebank Annotations LDC2002T26. Web Download. Philadelphia: Linguistic Data Consortium, 2002.

Contributor: Palmer, Martha

Han, Chung-Hye

Han, Na-Rae

Ko, Eon-Suk

Yi, Hee-Jong

Lee, Alan

Walker, Chris

Duda, John

Xue, Nianwen

Date (W3CDTF): 2002

Date Issued (W3CDTF): 2002-05-13

Description: *Introduction* This file documents the Korean English Treebank Annotations, Linguistic Data Consortium (LDC) catalog number LDC2002T26, ISBN 1-58563-236-8. The corpus consists of 33 texts originally written in Korean and translated into English for language training in a military setting. The conversations are not authentic dialogues; they were constructed for pedagogical purposes. The texts were made available for linguistic research by the Defense Language Institute (DLI). They were delivered on paper to the Institute for Research in Cognitive Science (IRCS) at the University of Pennsylvania, where they were converted to digital form in the KSC 5601 character set encoding (also known as KS X 1001 Wansung). Both the Korean and the English texts carry complete Treebank annotation, done manually at IRCS, comprising syntactic constituent bracketing and part-of-speech (POS) tagging. Further documentation of the parsing and POS specifications used in these annotations can be found on the Korean NLP web site.

*Data* There are 66 data files: 33 for Korean and 33 for English. The text files mostly contain sets of question-and-answer sentences. Each sentence appears first in full, unannotated form on a single line that begins with a semicolon (";"). The first token on such a line (the string preceding the first space character) is a sentence-identifier tag that links the English and Korean versions of the sentence. The parsed, POS-tagged annotation of the sentence follows on the subsequent lines.

*Updates* There are no updates at this time.
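The file layout described under *Data* can be read with a short script. The following Python sketch is illustrative only and is not part of the LDC distribution. It assumes that the semicolon is immediately followed by the sentence identifier (so the first token looks like ";ID"), that every line up to the next semicolon line belongs to the annotation, and that Python's "euc_kr" codec is a suitable decoder for the KSC 5601 encoding. The names Entry, parse_entries, load_file and align are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    sent_id: str     # identifier shared by the Korean and English versions
    raw: str         # the full, unannotated sentence
    annotation: str  # parse / POS lines that follow, joined with newlines


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Group lines into entries, starting a new entry at each ';' line."""
    sent_id: Optional[str] = None
    raw = ""
    body: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(";"):
            # Emit the entry collected so far before starting the next one.
            if sent_id is not None:
                yield Entry(sent_id, raw, "\n".join(body).strip())
            # First token = text before the first space (assumed ";ID").
            head, _, rest = line.partition(" ")
            sent_id = head.lstrip(";")
            raw = rest.strip()
            body = []
        elif sent_id is not None:
            body.append(line)
    if sent_id is not None:
        yield Entry(sent_id, raw, "\n".join(body).strip())


def load_file(path: str, encoding: str = "euc_kr") -> List[Entry]:
    """Read one data file; 'euc_kr' is an assumed match for KSC 5601."""
    with open(path, encoding=encoding) as handle:
        return list(parse_entries(handle))


def align(korean: List[Entry], english: List[Entry]) -> List[Tuple[Entry, Entry]]:
    """Pair Korean and English entries that carry the same identifier."""
    english_by_id: Dict[str, Entry] = {e.sent_id: e for e in english}
    return [(k, english_by_id[k.sent_id]) for k in korean
            if k.sent_id in english_by_id]
```

Given two lists returned by load_file for a Korean file and its English counterpart, align(korean_entries, english_entries) yields the sentence pairs whose identifiers match. If the actual files place a space between the semicolon and the identifier, the token-splitting step would need to be adjusted.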

Extent: Corpus size: 2457 KB

Identifier: LDC2002T26

https://catalog.ldc.upenn.edu/LDC2002T26

ISBN: 1-58563-236-8

ISLRN: 977-393-913-599-2

DOI: 10.35111/bpbe-jj95

Language: Korean

English

Language (ISO639): kor

eng

License: Korean English Treebank Annotations: https://catalog.ldc.upenn.edu/license/korean-english-treebank-annotations.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2002T26

Rights Holder: Portions (c) 2001-2002 CoGenTex, Inc., Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2002T26

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Palmer, Martha; Han, Chung-Hye; Han, Na-Rae; Ko, Eon-Suk; Yi, Hee-Jong; Lee, Alan; Walker, Chris; Duda, John; Xue, Nianwen. 2002. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_KR dcmi_Text iso639_eng iso639_kor olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2002T26
Up-to-date as of: Wed Oct 29 7:00:13 EDT 2025
