OLAC Record
oai:www.ldc.upenn.edu:LDC2020T10

Metadata
Title:LORELEI Entity Detection and Linking Knowledge Base
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Strassel, Stephanie, et al. LORELEI Entity Detection and Linking Knowledge Base LDC2020T10. Web Download. Philadelphia: Linguistic Data Consortium, 2020
Contributor:Strassel, Stephanie
Tracey, Jennifer
Bies, Ann
Kuster, Neil
Ciul, Michael
Date (W3CDTF):2020
Date Issued (W3CDTF):2020-05-15
Description:*Introduction* LORELEI Entity Detection and Linking Knowledge Base was developed by the Linguistic Data Consortium (LDC). It contains the full LORELEI Entity Detection and Linking (EDL) Knowledge Base (KB) used for all LORELEI Representative Language and Incident Language Pack entity linking annotation. The KB content was drawn from GeoNames, the CIA World Leaders List and the CIA World Factbook, and was supplemented with manually created KB entries developed specifically for LORELEI data.
The LORELEI (Low Resource Languages for Emergent Incidents) Program was concerned with building human language technology for low resource languages in the context of emergent situations like natural disasters or disease outbreaks. Linguistic resources for LORELEI include Representative Language Packs and Incident Language Packs for over two dozen low resource languages, comprising data, annotations, basic natural language processing tools, lexicons and grammatical resources. Representative languages were selected to provide broad typological coverage, while incident languages were selected to evaluate system performance on a language whose identity was disclosed at the start of the evaluation.
*Data* This corpus consists of an English knowledge base to support the EDL task in LORELEI for four entity types: geo-political entities (GPE); locations, including facilities (LOC); persons (PER); and organizations (ORG). There are four inputs to the KB, each designated by a unique "origin" code in the KB, as follows:
* GPE and LOC entities from a 2015 snapshot of GeoNames
* PER entities from the CIA World Leaders List dated May 2015
* ORG entities from Appendix B of the CIA World Factbook downloaded in 2015
* Additional entities manually created by LDC for each of the representative and incident languages
The KB contains a total of 10,216,832 entities and consists of three tab-delimited files, which are linked via the entityid in each entry. More information is contained in the included documentation.
*Acknowledgement* This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0123. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA.
*Samples* Please view the following samples:
* Alternate Names Sample
* Entities Sample
* Member States Sample
*Updates* None at this time.
Extent:Corpus size: 805106 KB
Identifier:LDC2020T10
https://catalog.ldc.upenn.edu/LDC2020T10
ISBN: 1-58563-926-5
ISLRN: 571-976-494-378-2
DOI: 10.35111/8rdp-tq10
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2020T10
Rights Holder:Portions © 2020 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
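
Example (illustrative only): The Description field states that the KB consists of three tab-delimited files linked via the entityid in each entry. The sketch below shows one way such files could be joined in Python. It covers only two of the three files, and everything except the entityid field is an assumption made for illustration: the column names (name, type, alternatename), the presence of a header row, and the sample rows are hypothetical and are not taken from the corpus documentation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample content. Only the "entityid" column is named in the
# record; the other columns, the header rows and the values are invented.
ENTITIES_TSV = (
    "entityid\tname\ttype\n"
    "101\tExample City\tGPE\n"
    "102\tExample River\tLOC\n"
)
ALT_NAMES_TSV = (
    "entityid\talternatename\n"
    "101\tExampleville\n"
    "101\tEx City\n"
)


def read_tsv(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def link_by_entityid(entities, alt_names):
    """Attach alternate names to each entity through the shared entityid."""
    names_by_id = defaultdict(list)
    for row in alt_names:
        names_by_id[row["entityid"]].append(row["alternatename"])
    return {
        row["entityid"]: {
            **row,
            "alternatenames": names_by_id.get(row["entityid"], []),
        }
        for row in entities
    }


# In-memory strings stand in for the real files so the sketch is runnable.
linked = link_by_entityid(
    read_tsv(io.StringIO(ENTITIES_TSV)),
    read_tsv(io.StringIO(ALT_NAMES_TSV)),
)
```

With real data, the io.StringIO objects would be replaced by files opened with newline="" and the appropriate encoding. The actual file and column names should be taken from the documentation included with the corpus.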

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2020T10
DateStamp:  2021-01-01
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Strassel, Stephanie; Tracey, Jennifer; Bies, Ann; Kuster, Neil; Ciul, Michael. 2020. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2020T10
Up-to-date as of: Fri Dec 6 7:48:58 EST 2024