OLAC Record oai:www.ldc.upenn.edu:LDC2020T10 |
Metadata | ||
Title: | LORELEI Entity Detection and Linking Knowledge Base | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Strassel, Stephanie, et al. LORELEI Entity Detection and Linking Knowledge Base LDC2020T10. Web Download. Philadelphia: Linguistic Data Consortium, 2020 | |
Contributor: | Strassel, Stephanie | |
Tracey, Jennifer | ||
Bies, Ann | ||
Kuster, Neil | ||
Ciul, Michael | ||
Date (W3CDTF): | 2020 | |
Date Issued (W3CDTF): | 2020-05-15 | |
Description: | *Introduction* LORELEI Entity Detection and Linking Knowledge Base was developed by the Linguistic Data Consortium (LDC) and contains the full LORELEI Entity Detection and Linking (EDL) Knowledge Base (KB) used for all LORELEI Representative Language and Incident Language Pack entity linking annotation. The KB content was drawn from GeoNames, the CIA World Leaders List and the CIA World Factbook and was supplemented with manually created KB entries developed specifically for LORELEI data. The LORELEI (Low Resource Languages for Emergent Incidents) Program was concerned with building human language technology for low resource languages in the context of emergent situations like natural disasters or disease outbreaks. Linguistic resources for LORELEI include Representative Language Packs and Incident Language Packs for over two dozen low resource languages, comprising data, annotations, basic natural language processing tools, lexicons and grammatical resources. Representative languages were selected to provide broad typological coverage, while incident languages were selected to evaluate system performance on a language whose identity was disclosed at the start of the evaluation. *Data* This corpus comprises an English knowledge base to support the EDL task in LORELEI for four entity types: geo-political entities (GPE), locations, including facilities (LOC), persons (PER) and organizations (ORG). There are four inputs to the KB, each designated by a unique "origin" code in the KB, as follows: GPE and LOC entities from a 2015 snapshot of GeoNames, PER entities from the CIA World Leaders List dated May 2015, ORG entities from Appendix B of the CIA World Factbook downloaded in 2015, and additional entities manually created by LDC for each of the representative and incident languages. The KB contains a total of 10,216,832 entities and consists of three tab-delimited files, which are linked via the entityid in each entry. 
More information is contained in the included documentation. *Acknowledgement* This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0123. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA. *Samples* Please view the following samples: * Alternate Names Sample * Entities Sample * Member States Sample *Updates* None at this time. | |
Extent: | Corpus size: 805106 KB | |
Identifier: | LDC2020T10 | |
https://catalog.ldc.upenn.edu/LDC2020T10 | ||
ISBN: 1-58563-926-5 | ||
ISLRN: 571-976-494-378-2 | ||
DOI: 10.35111/8rdp-tq10 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2020T10 | |
Rights Holder: | Portions © 2020 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2020T10 | |
DateStamp: | 2021-01-01 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Strassel, Stephanie; Tracey, Jennifer; Bies, Ann; Kuster, Neil; Ciul, Michael. 2020. LORELEI Entity Detection and Linking Knowledge Base. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text |
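
The Description field above states that the KB consists of three tab-delimited files linked via the entityid in each entry. The following is a minimal sketch of how such files might be joined, not the layout shipped with the corpus: apart from `entityid` and the idea of an origin code, every column name (`name`, `alternate_name`, `member`), every origin value, and every row below is an invented placeholder, and the real files may or may not include a header row. Consult the included documentation for the actual schema.

```python
import csv
import io
from collections import defaultdict


def read_tsv(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    # QUOTE_NONE treats quote characters as ordinary text.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def group_by_entityid(rows, key="entityid"):
    """Group rows by the shared entityid so the files can be linked."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def link_entity(entity_id, entities, alt_names, members):
    """Collect everything the three files say about one entityid."""
    return {
        "entity": entities.get(entity_id, []),
        "alternate_names": alt_names.get(entity_id, []),
        "member_states": members.get(entity_id, []),
    }


# Invented placeholder data standing in for the three real files.
# Column names and origin codes here are assumptions, not the KB schema.
ENTITIES_TSV = (
    "origin\tentity_type\tentityid\tname\n"
    "ORIGIN_A\tGPE\t1001\tExampleland\n"
    "ORIGIN_B\tORG\t2001\tExample Union\n"
)
ALT_NAMES_TSV = (
    "entityid\talternate_name\n"
    "2001\tEU-Example\n"
    "2001\tEx. Union\n"
)
MEMBERS_TSV = (
    "entityid\tmember\n"
    "2001\tExampleland\n"
)

entities = group_by_entityid(read_tsv(io.StringIO(ENTITIES_TSV)))
alt_names = group_by_entityid(read_tsv(io.StringIO(ALT_NAMES_TSV)))
members = group_by_entityid(read_tsv(io.StringIO(MEMBERS_TSV)))

record = link_entity("2001", entities, alt_names, members)
print(record["entity"][0]["name"])
print([row["alternate_name"] for row in record["alternate_names"]])
```

With the real corpus, the `io.StringIO` objects would be replaced by opened files (for example `open(path, encoding="utf-8", newline="")`). With roughly ten million entities, an in-memory dictionary per file may be too large on some machines, so a streaming pass or a database import could be a better fit.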