OLAC Record oai:www.ldc.upenn.edu:LDC2014T27 |
Metadata | ||
Title: | Benchmarks for Open Relation Extraction | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Mesquita, Filipe, Jordan Schmidek, and Denilson Barbosa. Benchmarks for Open Relation Extraction LDC2014T27. Web Download. Philadelphia: Linguistic Data Consortium, 2014 | |
Contributor: | Mesquita, Filipe | |
Schmidek, Jordan | ||
Barbosa, Denilson | ||
Date (W3CDTF): | 2014 | |
Date Issued (W3CDTF): | 2014-12-15 | |
Description: | *Introduction* Benchmarks for Open Relation Extraction was developed by the University of Alberta and contains annotations for approximately 14,000 sentences from The New York Times Annotated Corpus (LDC2008T19) and Treebank-3 (LDC99T42). This corpus was designed to contain benchmarks for the task of open relation extraction (ORE), along with sample extractions from ORE methods and evaluation scripts for computing a method's precision and recall. ORE attempts to extract as many of the relations described in a corpus as possible without relying on relation-specific training data. The traditional approach to relation extraction requires substantial training effort for each relation of interest, which can be impractical for massive collections such as those found on the web. Open relation extraction offers an alternative by extracting unseen relations as they come. It does not require training data for any particular relation, making it suitable for applications that require a large (or even unknown) number of relations. Results published in the ORE literature are often not comparable due to the lack of reusable annotations and differences in evaluation methodology. The goal of this benchmark data set is to provide annotations that are flexible and can be used to evaluate a wide range of methods. *Data* Binary and n-ary relations were extracted from the text sources. Sentences were annotated for binary relations manually and automatically. In the manual sentence annotation, two entities and a trigger (a single token indicating a relation) were identified for the relation between them, if one existed. A window of tokens allowed to be in a relation was also specified; it included modifiers of the trigger and prepositions connecting triggers to their arguments. For each sentence annotated with two entities, a system must extract a string representing the relation between them. The evaluation method deemed an extraction correct if it contained the trigger and no tokens other than the allowed ones.
The automatic annotator identified pairs of entities and a trigger of the relation between them; the evaluation script for that experiment deemed an extraction correct if it contained the annotated trigger. For n-ary relations, sentences were annotated with one relation trigger and all of its arguments. An extracted argument was deemed correct if it was annotated in the sentence. This release also includes extractions from the following ORE methods: ReVerb, SONEX, OLLIE, PATTY, TreeKernel, SwiRL, Lund and EXEMPLAR. Evaluation scripts are also provided for computing a method's precision and recall. *Updates* None at this time. | |
Extent: | Corpus size: 35272 KB | |
Identifier: | LDC2014T27 | |
https://catalog.ldc.upenn.edu/LDC2014T27 | ||
ISBN: 1-58563-698-3 | ||
ISLRN: 911-510-844-212-7 | ||
DOI: 10.35111/wxrn-qr14 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | Benchmarks for Open Relation Extraction: https://catalog.ldc.upenn.edu/license/benchmarks-for-open-relation-extraction.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2014T27 | |
Rights Holder: | Portions © 1987-2008 New York Times, © 2014 The Governors of the University of Alberta, © 1999, 2008, 2014 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
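The correctness rules for binary relations in the Description above can be illustrated with a minimal Python sketch. It is not the evaluation script shipped with the corpus. The function names, the representation of an extraction as a list of token strings, and the precision and recall denominators are all assumptions made for illustration; the released scripts may, for example, compare token positions rather than strings.

```python
def is_correct_manual(extraction_tokens, trigger, allowed_tokens):
    """Rule for manually annotated sentences (as described above):
    the extraction must contain the trigger and no token outside
    the allowed window. Tokens are compared as plain strings here,
    which is an assumption of this sketch."""
    tokens = set(extraction_tokens)
    permitted = set(allowed_tokens) | {trigger}
    return trigger in tokens and tokens <= permitted


def is_correct_automatic(extraction_tokens, trigger):
    """Rule for automatically annotated sentences: the extraction
    only has to contain the annotated trigger."""
    return trigger in extraction_tokens


def precision_recall(num_correct, num_extracted, num_annotated):
    """Standard precision and recall, guarding against empty
    denominators. The choice of denominators is an assumption."""
    precision = num_correct / num_extracted if num_extracted else 0.0
    recall = num_correct / num_annotated if num_annotated else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical annotation: trigger "born", allowed tokens "was" and "in".
    print(is_correct_manual(["was", "born", "in"], "born", ["was", "in"]))
    print(is_correct_manual(["born", "near"], "born", ["was", "in"]))
    print(is_correct_automatic(["born", "near"], "born"))
    print(precision_recall(3, 4, 6))
```

The example tokens are invented and do not come from the corpus. The second call shows the difference between the two rules: "born near" fails the manual rule because "near" is outside the allowed window, but passes the automatic rule because it contains the trigger.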
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2014T27 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Mesquita, Filipe; Schmidek, Jordan; Barbosa, Denilson. 2014. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text |