OLAC Record oai:www.ldc.upenn.edu:LDC2015T13 |
Metadata | ||
Title: | English News Text Treebank: Penn Treebank Revised | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Bies, Ann, Justin Mott, and Colin Warner. English News Text Treebank: Penn Treebank Revised LDC2015T13. Web Download. Philadelphia: Linguistic Data Consortium, 2015 | |
Contributor: | Bies, Ann | |
Mott, Justin | ||
Warner, Colin | ||
Date (W3CDTF): | 2015 | |
Date Issued (W3CDTF): | 2015-07-15 | |
Description: | *Introduction* English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the Penn Treebank annotation of Wall Street Journal (WSJ) stories. The data comprises 1,203,648 word-level tokens in 49,191 sentence-level tokens, covering all 2,312 of the original Penn Treebank WSJ files. *Data* This release includes revised tokenization, part-of-speech, and syntactic treebank annotation intended to bring the full WSJ treebank section into compliance with the agreed-upon policies and updates implemented for current English treebank annotation specifications at LDC. Examples of corpora annotated under those specifications include English Web Treebank (LDC2012T13), OntoNotes (LDC2013T19), and English translation treebanks such as English Translation Treebank: An-Nahar Newswire (LDC2012T02). English Treebank Supplemental Guidelines are included in this release. *Samples* Please view this treebank and tokenized samples. *Updates* None at this time. | |
Extent: | Corpus size: 55112 KB | |
Identifier: | LDC2015T13 | |
https://catalog.ldc.upenn.edu/LDC2015T13 | ||
ISBN: 1-58563-724-6 | ||
DOI: 10.35111/xpjy-at91 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Provenance: | Collected by the Linguistic Data Consortium (LDC) in Philadelphia, PA, USA. | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2015T13 | |
Rights Holder: | Portions © 1987-1989 Dow Jones & Company, Inc., © 1999, 2015 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2015T13 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
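The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a rough sketch of how such a request URL is assembled, the snippet below builds the query string from the OaiIdentifier. The base URL is a placeholder, since this record does not state the repository endpoint, and the function name `build_get_record_url` is hypothetical. The prefixes `olac` and `oai_dc` are assumed to correspond to the "OLAC format" and "simple DC format" named above.

```python
from urllib.parse import urlencode


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Assemble an OAI-PMH GetRecord request URL.

    base_url is a placeholder for the repository endpoint, which this
    record does not state. metadata_prefix is assumed to be "olac" or
    "oai_dc" for the two formats listed in the record.
    """
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Hypothetical endpoint; substitute the repository's real base URL.
url = build_get_record_url(
    "https://example.org/oai",
    "oai:www.ldc.upenn.edu:LDC2015T13",
    "olac",
)
print(url)
```

The colons in the identifier are percent-encoded by `urlencode`, which is the expected form for a query-string value.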
Search Info | ||
Citation: | Bies, Ann; Mott, Justin; Warner, Colin. 2015. English News Text Treebank: Penn Treebank Revised. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text |