OLAC Record: Benchmarks for Open Relation Extraction

OLAC Record
oai:www.ldc.upenn.edu:LDC2014T27

Metadata

Title: Benchmarks for Open Relation Extraction

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Mesquita, Filipe, Jordan Schmidek, and Denilson Barbosa. Benchmarks for Open Relation Extraction LDC2014T27. Web Download. Philadelphia: Linguistic Data Consortium, 2014

Contributor: Mesquita, Filipe

Schmidek, Jordan

Barbosa, Denilson

Date (W3CDTF): 2014

Date Issued (W3CDTF): 2014-12-15

Description: *Introduction* Benchmarks for Open Relation Extraction was developed by the University of Alberta and contains annotations for approximately 14,000 sentences from The New York Times Annotated Corpus (LDC2008T19) and Treebank-3 (LDC99T42). This corpus was designed to contain benchmarks for the task of open relation extraction (ORE), along with sample extractions from ORE methods and evaluation scripts for computing a method's precision and recall.

ORE attempts to extract as many of the relations described in a corpus as possible without relying on relation-specific training data. The traditional approach to relation extraction requires substantial training effort for each relation of interest, which can be impractical for massive collections such as those found on the web. Open relation extraction offers an alternative by extracting previously unseen relations as they are encountered. It does not require training data for any particular relation, making it suitable for applications that require a large (or even unknown) number of relations.

Results published in ORE literature are often not comparable due to the lack of reusable annotations and differences in evaluation methodology. The goal of this benchmark data set is to provide annotations that are flexible and can be used to evaluate a wide range of methods.

*Data* Binary and n-ary relations were extracted from the text sources. Sentences were annotated for binary relations manually and automatically. In the manual sentence annotation, two entities were identified together with a trigger (a single token indicating a relation) for the relation between them, if one existed. A window of tokens allowed to be in a relation was also specified; it included modifiers of the trigger and prepositions connecting triggers to their arguments. For each sentence annotated with two entities, a system must extract a string representing the relation between them. The evaluation method deemed an extraction correct if it contained the trigger and, apart from the trigger, only allowed tokens.

The automatic annotator identified pairs of entities and a trigger of the relation between them; the evaluation script for that experiment deemed an extraction correct if it contained the annotated trigger. For n-ary relations, sentences were annotated with one relation trigger and all of its arguments. An extracted argument was deemed correct if it was annotated in the sentence.

This release also includes extractions from the following ORE methods: ReVerb, SONEX, OLLIE, PATTY, TreeKernel, SwiRL, Lund and EXEMPLAR. Evaluation scripts are also provided for computing a method's precision and recall.

*Samples* Please view this sample.

*Updates* None at this time.
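The correctness criterion for the manually annotated binary relations can be illustrated with a minimal sketch. This is not the evaluation script shipped with the release: the `GoldRelation` class, the function names, the whitespace tokenization, the lowercasing, and the sample sentences are all assumptions made for illustration. It also assumes that precision is computed over all extractions a system returns and recall over all sentences annotated with a relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldRelation:
    """Hypothetical in-memory form of one manually annotated sentence."""
    trigger: str          # single token indicating the relation (lowercased)
    allowed: frozenset    # other tokens permitted in the relation string


def is_correct(extracted: str, gold: GoldRelation) -> bool:
    """True if the extraction contains the trigger and only allowed tokens."""
    tokens = extracted.lower().split()  # assumption: whitespace tokenization
    if gold.trigger not in tokens:
        return False
    return all(t == gold.trigger or t in gold.allowed for t in tokens)


def precision_recall(extractions: dict, gold: dict) -> tuple:
    """Score extractions (sentence id -> relation string) against gold.

    gold maps sentence id -> GoldRelation, or None when the annotators
    found no relation between the two entities.
    """
    correct = sum(
        1
        for sid, text in extractions.items()
        if gold.get(sid) is not None and is_correct(text, gold[sid])
    )
    n_gold = sum(1 for g in gold.values() if g is not None)
    precision = correct / len(extractions) if extractions else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


# Invented toy data, not taken from the corpus.
gold = {
    "s1": GoldRelation("acquired", frozenset({"has", "recently"})),
    "s2": GoldRelation("born", frozenset({"was", "in"})),
    "s3": None,  # no relation annotated for this entity pair
}
extractions = {
    "s1": "has acquired",           # trigger plus an allowed modifier
    "s2": "was born yesterday in",  # "yesterday" is not an allowed token
    "s3": "visited",                # extraction where no relation exists
}
precision, recall = precision_recall(extractions, gold)
```

In this toy run only the extraction for `s1` satisfies the criterion, so one of three extractions is correct and one of two annotated relations is recovered. The looser criterion described for the automatically annotated sentences would correspond to keeping only the trigger check in `is_correct`.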

Extent: Corpus size: 35272 KB

Identifier: LDC2014T27

https://catalog.ldc.upenn.edu/LDC2014T27

ISBN: 1-58563-698-3

ISLRN: 911-510-844-212-7

DOI: 10.35111/wxrn-qr14

Language: English

Language (ISO639): eng

License: Benchmarks for Open Relation Extraction: https://catalog.ldc.upenn.edu/license/benchmarks-for-open-relation-extraction.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2014T27

Rights Holder: Portions © 1987-2008 New York Times, © 2014 The Governors of the University of Alberta, © 1999, 2008, 2014 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2014T27

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
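The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request is an HTTP GET with the `verb`, `identifier`, and `metadataPrefix` arguments defined by the protocol. The base URL below is a placeholder, since this record does not state the repository endpoint, and `olac` as the metadata prefix for the OLAC format is an assumption; `oai_dc` is the prefix the protocol defines for simple Dublin Core.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the record does not give the repository's base URL.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2014T27"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_IDENTIFIER, "olac")    # assumed OLAC prefix
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")    # simple Dublin Core
```

Fetching either URL from the real endpoint would return an XML response wrapping the record in the requested metadata format.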

Search Info
Citation: Mesquita, Filipe; Schmidek, Jordan; Barbosa, Denilson. 2014. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2014T27
Up-to-date as of: Tue Jan 2 7:32:11 EST 2024
