OLAC Record oai:www.ldc.upenn.edu:LDC2000T43 |
Metadata | ||
Title: | BLLIP 1987-89 WSJ Corpus Release 1 | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Charniak, Eugene, et al. BLLIP 1987-89 WSJ Corpus Release 1 LDC2000T43. Web Download. Philadelphia: Linguistic Data Consortium, 2000 | |
Contributor: | Charniak, Eugene | |
Blaheta, Don | ||
Ge, Niyu | ||
Hall, Keith | ||
Hale, John | ||
Johnson, Mark | ||
Date (W3CDTF): | 2000 | |
Description: | *Introduction* Brown Laboratory for Linguistic Information Processing (BLLIP) 1987-89 WSJ Corpus Release 1 contains a complete, Treebank-style part-of-speech (POS) tagged and parsed version of the three-year Wall Street Journal (WSJ) collection from ACL/DCI (LDC93T1), approximately 30 million words. The annotation was performed using statistically-based methods developed by BLLIP researchers Eugene Charniak, Don Blaheta, Niyu Ge, Keith Hall, John Hale and Mark Johnson. This corpus both overlaps and supplements the million-word Penn Treebank (PTB) collection of parsed and POS-tagged WSJ texts. *Data* The PTB project selected 2,499 stories from a three-year WSJ collection of 98,732 stories for syntactic annotation. These 2,499 stories are distributed in Treebank-2 (LDC95T7) and Treebank-3 (LDC99T42), both of which include the raw text for each story. *Updates* There are no updates at this time. | |
Extent: | Corpus size: 1048576 KB | |
Identifier: | LDC2000T43 | |
https://catalog.ldc.upenn.edu/LDC2000T43 | ||
ISBN: 1-58563-165-5 | ||
ISLRN: 233-420-716-637-7 | ||
DOI: 10.35111/fwew-da58 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | BLLIP 1987-89 WSJ Corpus Release 1 License Agreement: https://catalog.ldc.upenn.edu/license/bllip-1987-89-wsj-corpus-release-1-license-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2000T43 | |
Rights Holder: | Portions © 1987-1989 Dow Jones & Company, Inc., © 2000 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2000T43 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Charniak, Eugene; Blaheta, Don; Ge, Niyu; Hall, Keith; Hale, John; Johnson, Mark. 2000. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text |