OLAC Record: TRECVID 2003 Keyframes & Transcripts

OLAC Record
oai:www.ldc.upenn.edu:LDC2007V02

Metadata

Title: TRECVID 2003 Keyframes & Transcripts

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Quenot, Georges, Paul Over, and Kevin Walker. TRECVID 2003 Keyframes & Transcripts LDC2007V02. Web Download. Philadelphia: Linguistic Data Consortium, 2007

Contributor: Quenot, Georges

Over, Paul

Walker, Kevin

Date (W3CDTF): 2007

Date Issued (W3CDTF): 2007-04-18

Description: *Introduction*

The TREC Video Retrieval Evaluation (TRECVID) was sponsored by the National Institute of Standards and Technology (NIST) to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. The keyframes in this release were extracted for use in the NIST TRECVID 2003 Evaluation. TRECVID was a laboratory-style evaluation that attempted to model real-world situations or significant component tasks involved in such situations. In 2003 there were four main tasks with associated tests:

* shot boundary determination
* story segmentation
* high-level feature extraction
* search (interactive and manual)

For a detailed description of the TRECVID Evaluation Tasks, please refer to the NIST TRECVID 2003 Evaluation Description.

*Data*

The source data is English-language broadcast programming collected by the Linguistic Data Consortium in 1998 from ABC ("World News Tonight") and CNN ("CNN Headline News").

Shots are fundamental units of video, useful for higher-level processing. To create the master list of shots, the video was segmented. The results of this pass are called subshots. Because the master shot reference is designed for use in manual assessment, a second pass over the segmentation was made to create master shots of at least 2 seconds in length. These master shots are the ones used in submitting results for the feature and search tasks in the evaluation. In the second pass, starting at the beginning of each file, the subshots were aggregated, if necessary, until the current shot was at least 2 seconds in duration, at which point the aggregation began anew with the next subshot.

The keyframes were selected by going to the middle frame between the shot boundaries, then scanning left and right of that frame to locate the nearest I-Frame. This then became the keyframe and was extracted. Keyframes have been provided at both the subshot (NRKF) and master shot (RKF) levels. In a small number of cases (all of them subshots) there was no I-Frame within the subshot boundaries. When this occurred, the middle frame was selected. There is one anomaly: at the end of the first video in the test collection, a subshot occurs outside a master shot.

The emphasis in the common shot boundary reference is on the shots, not the transitions. The shots are contiguous. There are no gaps between them. They do not overlap. The media time format is based on the Gregorian day time (ISO 8601) norm. Fractions are defined by counting pre-specified fractions of a second.

*Samples*

For an example of the data in this corpus, please see the keyframe and annotation files below:

* shotinfo.xml
* transcripts.sgml
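The two-pass shot construction and the keyframe rule described above can be illustrated with a minimal Python sketch. It is not the code used to build the corpus. The `Shot` class, the function names, the use of seconds and frame indices as units, the handling of a trailing run shorter than 2 seconds, and the tie-break toward the earlier frame are all assumptions made for illustration; the description does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

MIN_MASTER_SHOT_SECONDS = 2.0  # minimum master shot length given in the description


@dataclass
class Shot:
    start: float  # seconds from the start of the file (assumed unit)
    end: float    # end of the shot in seconds; the next shot starts here


def aggregate_subshots(
    subshots: Sequence[Shot], min_len: float = MIN_MASTER_SHOT_SECONDS
) -> Tuple[List[Shot], List[Shot]]:
    """Merge contiguous subshots, in file order, into master shots.

    Subshots are accumulated until the current run lasts at least
    `min_len` seconds; the run is then emitted as one master shot and
    aggregation starts again with the next subshot.
    Returns (master_shots, leftover_subshots). The leftover list holds a
    trailing run that never reached `min_len`; how the corpus treats such
    a run is not stated, so it is returned separately (assumption).
    """
    masters: List[Shot] = []
    pending: List[Shot] = []
    for sub in subshots:
        pending.append(sub)
        if pending[-1].end - pending[0].start >= min_len:
            masters.append(Shot(pending[0].start, pending[-1].end))
            pending = []
    return masters, pending


def select_keyframe(first_frame: int, last_frame: int, i_frames: Sequence[int]) -> int:
    """Pick the I-frame nearest to the middle frame of a shot.

    Frames are integer indices and the shot covers first_frame..last_frame
    inclusive (assumed convention). If no I-frame lies inside the shot,
    the middle frame itself is returned, as in the description. Ties are
    broken toward the earlier frame (assumption).
    """
    middle = (first_frame + last_frame) // 2
    inside = [f for f in i_frames if first_frame <= f <= last_frame]
    if not inside:
        return middle
    return min(inside, key=lambda f: (abs(f - middle), f))


if __name__ == "__main__":
    subs = [Shot(0.0, 0.5), Shot(0.5, 1.2), Shot(1.2, 2.5), Shot(2.5, 5.0), Shot(5.0, 5.8)]
    masters, leftover = aggregate_subshots(subs)
    print(masters)
    print(leftover)
    print(select_keyframe(100, 200, [90, 120, 160, 210]))
```

In this sketch the first three subshots merge into one master shot because no shorter prefix reaches 2 seconds, while the fourth subshot is already long enough to stand alone.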

Extent: Corpus size: 3407872 KB

Identifier: LDC2007V02

https://catalog.ldc.upenn.edu/LDC2007V02

ISBN: 1-58563-436-0

ISLRN: 558-793-302-438-0

DOI: 10.35111/0kxe-zq83

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2007V02

Rights Holder: Portions © 1998 American Broadcasting Company, © 1998 Cable News Network, LP, LLLP, © 1998, 2003, 2007 Trustees of the University of Pennsylvania

Type (DCMI): MovingImage

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2007V02

DateStamp: 2022-12-05

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Quenot, Georges; Over, Paul; Walker, Kevin. 2007. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_MovingImage iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2007V02
Up-to-date as of: Wed Oct 29 7:01:00 EDT 2025
