OLAC Record
oai:www.ldc.upenn.edu:LDC2000T43

Metadata
Title:BLLIP 1987-89 WSJ Corpus Release 1
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Charniak, Eugene, et al. BLLIP 1987-89 WSJ Corpus Release 1 LDC2000T43. Web Download. Philadelphia: Linguistic Data Consortium, 2000
Contributor:Charniak, Eugene
Blaheta, Don
Ge, Niyu
Hall, Keith
Hale, John
Johnson, Mark
Date (W3CDTF):2000
Description:*Introduction* Brown Laboratory for Linguistic Information Processing (BLLIP) 1987-89 WSJ Corpus Release 1 contains a complete, Treebank-style part-of-speech (POS) tagged and parsed version of the three-year Wall Street Journal (WSJ) collection from ACL/DCI (LDC93T1), approximately 30 million words. The annotation was performed using statistically-based methods developed by BLLIP researchers Eugene Charniak, Don Blaheta, Niyu Ge, Keith Hall, John Hale and Mark Johnson. This corpus both overlaps and supplements the million-word Penn Treebank (PTB) collection of parsed and POS-tagged WSJ texts. *Data* The PTB project selected 2,499 stories from a three-year WSJ collection of 98,732 stories for syntactic annotation. These 2,499 stories are distributed in Treebank-2 (LDC95T7) and Treebank-3 (LDC99T42), both of which include the raw text for each story. *Updates* There are no updates at this time.
Extent:Corpus size: 1048576 KB
Identifier:LDC2000T43
https://catalog.ldc.upenn.edu/LDC2000T43
ISBN: 1-58563-165-5
ISLRN: 233-420-716-637-7
DOI: 10.35111/fwew-da58
Language:English
Language (ISO639):eng
License:BLLIP 1987-89 WSJ Corpus Release 1 License Agreement: https://catalog.ldc.upenn.edu/license/bllip-1987-89-wsj-corpus-release-1-license-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2000T43
Rights Holder:Portions © 1987-1989 Dow Jones & Company, Inc., © 2000 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2000T43
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
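
The GetRecord entries above refer to OAI-PMH requests, which are plain HTTP queries carrying a verb, a record identifier, and a metadata prefix. The sketch below shows how such a request URL could be assembled for this record. The base URL is a hypothetical placeholder (this page does not state the repository endpoint), and the prefixes "olac" and "oai_dc" are assumed to be the ones the repository uses for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the repository's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2000T43"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (no network access is made)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

The function only builds the URL string; fetching and parsing the XML response is left out because the endpoint is not given here.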

Search Info

Citation: Charniak, Eugene; Blaheta, Don; Ge, Niyu; Hall, Keith; Hale, John; Johnson, Mark. 2000. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2000T43
Up-to-date as of: Fri Dec 6 7:46:37 EST 2024