OLAC Record
oai:www.ldc.upenn.edu:LDC98T25

Metadata
Title:TDT Pilot Study Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Allan, James, et al. TDT Pilot Study Corpus LDC98T25. Web Download. Philadelphia: Linguistic Data Consortium, 1998
Contributor:Allan, James
Yang, Yiming
Carbonell, Jaime
Yamron, Jon
Doddington, George R.
Wayne, Charles
Date (W3CDTF):1998
Description:*Introduction* The TDT Pilot Study corpus was created to support an initiative in "topic detection and tracking" (TDT). This initiative is directed toward computer processing of language data, both text and speech. Its objective is to explore techniques for detecting the appearance of new and unexpected topics and for tracking their reappearance and evolution. *Data* The TDT corpus comprises a set of stories that includes both newswire (text) and broadcast news (speech). Each story is represented as a stream of text, taken either directly from the newswire (Reuters) or from a manual transcription of the broadcast news speech (CNN). The corpus spans the period from July 1, 1994 to June 30, 1995. It contains approximately 16,000 stories, about half from Reuters newswire and half from CNN broadcast news transcripts. A key part of the corpus is its annotation in terms of the events discussed in the stories. Twenty-five events were defined; they span a variety of event types and cover a subset of the events discussed in the corpus stories. Annotation data for these events are included in the corpus and provide a basis for training TDT systems. *Updates* There are no updates at this time.
Identifier:LDC98T25
https://catalog.ldc.upenn.edu/LDC98T25
ISBN: 1-58563-140-X
ISLRN: 770-765-444-577-6
DOI: 10.35111/sxw0-5q10
Language:English
Language (ISO639):eng
License:TDT Pilot Study Agreement: https://catalog.ldc.upenn.edu/license/tdt-pilot-study-license-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC98T25
Rights Holder:Portions © 1994-1995 Cable News Network, LP, LLLP, © 1994-1995 Reuters America, Inc., © 1998 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC98T25
DateStamp:  2020-11-30

Search Info

Citation: Allan, James; Yang, Yiming; Carbonell, Jaime; Yamron, Jon; Doddington, George R.; Wayne, Charles. 1998. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text
http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC98T25
Up-to-date as of: Mon Mar 25 7:20:04 EDT 2024