OLAC Record
oai:www.ldc.upenn.edu:LDC93T3D

Metadata
Title:TIPSTER Volume 3
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Harman, Donna, and Mark Liberman. TIPSTER Volume 3 LDC93T3D. Web Download. Philadelphia: Linguistic Data Consortium, 1993
Contributor:Harman, Donna
Liberman, Mark
Date (W3CDTF):1993
Description:*Introduction*
TIPSTER is sometimes also called the Text Research Collection Volume or TREC. TIPSTER Volume 3 contains disk 3, and TIPSTER Complete (LDC93T3A) contains all disks.
The TIPSTER project was sponsored by the Software and Intelligent Systems Technology Office of the Advanced Research Projects Agency (ARPA/SISTO) in an effort to significantly advance the state of the art in effective document detection (information retrieval) and data extraction from large, real-world data collections.
The detection data consists of a test collection built at NIST for the TIPSTER project and the related TREC project. The TREC project has many other participating information retrieval research groups, working on the same task as the TIPSTER groups but meeting once a year in a workshop to compare results (similar to MUC). The test collection consists of three CD-ROMs of SGML-encoded documents distributed by LDC, plus queries and answers (relevant documents) distributed by NIST.
*Data*
The documents in the test collection vary in style, size and subject domain. The third disk contains information from the Computer Select disks (Ziff-Davis Publishing), plus material from the San Jose Mercury News (1991), more AP newswire (1990) and about 250 megabytes of formatted U.S. Patents.
The format of all the documents is relatively clean and easy to use, with SGML-like tags separating documents and document fields. There is no part-of-speech tagging or breakdown into individual sentences or paragraphs, as the purpose of this collection is to test retrieval against real-world data.
Source (vol)                 Year      Approx. # Words (Millions)
Associated Press (3)         1990      37
Ziff/Davis (3)               1991-92   50
San Jose Mercury News (3)    1991      45
*Samples*
Please view these samples:
* AP Newswire
* Computer Select
* San Jose Mercury News
* U.S. Patents
*Updates*
The three TIPSTER disks released so far have been reissued with updates and corrections, and all recipients of the earlier versions should have received these replacements free of charge. If you think you have the unrevised original, contact LDC for confirmation.
Identifier:LDC93T3D
https://catalog.ldc.upenn.edu/LDC93T3D
ISBN: 1-58563-023-3
ISLRN: 890-582-278-450-2
DOI: 10.35111/eqgq-c895
Language:English
Language (ISO639):eng
License:Tipster Volume 3 Agreement Individual: https://catalog.ldc.upenn.edu/license/tipster-volume-3-individual.pdf
Tipster Volume 3 Agreement Organization: https://catalog.ldc.upenn.edu/license/tipster-volume-3-organization.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC93T3D
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC93T3D
DateStamp:  2024-05-22
GetRecord:  OAI-PMH request for simple DC format
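
The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple DC formats. As a rough illustration only, the sketch below shows how such request URLs are typically assembled from the standard OAI-PMH query parameters (verb, identifier, metadataPrefix). The base URL is a hypothetical placeholder, since this record does not state the endpoint, and the helper name get_record_url is invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not state the OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC93T3D"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" requests the OLAC format; "oai_dc" requests simple Dublin Core.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Replacing BASE_URL with the archive's actual OAI-PMH endpoint would be needed before either URL could return the record.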

Search Info

Citation: Harman, Donna; Liberman, Mark. 1993. TIPSTER Volume 3. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC93T3D
Up-to-date as of: Fri Dec 6 7:47:05 EST 2024