OLAC Record oai:www.ldc.upenn.edu:LDC2014T16 |
Metadata | ||
Title: | TAC KBP Reference Knowledge Base | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Simpson, Heather, et al. TAC KBP Reference Knowledge Base LDC2014T16. Web Download. Philadelphia: Linguistic Data Consortium, 2014 | |
Contributor: | Simpson, Heather | |
Ellis, Joe | ||
Parker, Robert | ||
Strassel, Stephanie | ||
Date (W3CDTF): | 2014 | |
Date Issued (W3CDTF): | 2014-08-15 | |
Description: | *Introduction* TAC KBP Reference Knowledge Base was developed by the Linguistic Data Consortium (LDC) in support of the NIST-sponsored TAC-KBP evaluation series. It is built from English Wikipedia articles and their associated infoboxes and covers over 800,000 entities. LDC also released TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 (LDC2016T26). TAC (Text Analysis Conference) is a series of workshops organized by NIST (the National Institute of Standards and Technology) to encourage research in natural language processing and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. TAC's Knowledge Base Population (KBP) track encourages the development of systems that can match entities mentioned in natural-language text with entries in a knowledge base, and that can extract novel information about entities from a document collection and add it to a new or existing knowledge base. Consult the LDC TAC-KBP project page for further information about LDC's resource development for the TAC-KBP program. *Data* The source data (Wikipedia infoboxes and articles) was taken from an October 2008 snapshot of Wikipedia. TAC KBP Reference Knowledge Base contains a set of entities, each with a canonical name and Wikipedia page title, an entity type, an automatically parsed version of the infobox data from the entity's Wikipedia article, and a stripped version of the article text. Each entity is assigned one of four types: PER (person), ORG (organization), GPE (geo-political entity) or UKN (unknown). All data files are UTF-8-encoded XML. *Updates* None at this time. | |
Extent: | Corpus size: 2597264 KB | |
Identifier: | LDC2014T16 | |
https://catalog.ldc.upenn.edu/LDC2014T16 | ||
ISBN: 1-58563-685-1 | ||
ISLRN: 043-495-621-872-3 | ||
DOI: 10.35111/4yac-wb16 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2014T16 | |
Rights Holder: | Portions © 2008-2009, 2014 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
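The Description states that each entity carries a canonical name, a Wikipedia page title, one of four types, parsed infobox data and the article text, all stored as UTF-8 XML. The following is a minimal Python sketch of streaming such a file with the standard library. It assumes, without confirmation from this record, an `entity` element with `id`, `name`, `wiki_title` and `type` attributes that contains `fact` elements (each with a `name` attribute) and a `wiki_text` element. These tag and attribute names and the embedded sample record are hypothetical and should be checked against the documentation distributed with the corpus.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO, Dict, Iterator, NamedTuple, Union

# The four entity types listed in the corpus description.
ENTITY_TYPES = {"PER", "ORG", "GPE", "UKN"}


class KBEntity(NamedTuple):
    entity_id: str
    name: str
    wiki_title: str
    entity_type: str
    facts: Dict[str, str]  # infobox field name -> flattened text (later duplicates win)
    wiki_text: str


def iter_entities(source: Union[str, IO[bytes]]) -> Iterator[KBEntity]:
    """Yield one KBEntity per <entity> element without building the whole tree.

    ASSUMPTION: the tag and attribute names used here (entity/@id, @name,
    @wiki_title, @type, fact/@name, wiki_text) are hypothetical; verify them
    against the documentation shipped with the corpus.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "entity":
            continue
        etype = elem.get("type", "")
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unexpected entity type: {etype!r}")
        # itertext() also collects text nested inside child elements of a fact
        facts = {f.get("name", ""): "".join(f.itertext()).strip()
                 for f in elem.iter("fact")}
        yield KBEntity(
            entity_id=elem.get("id", ""),
            name=elem.get("name", ""),
            wiki_title=elem.get("wiki_title", ""),
            entity_type=etype,
            facts=facts,
            wiki_text=(elem.findtext("wiki_text") or "").strip(),
        )
        elem.clear()  # drop the processed subtree to limit memory use


# HYPOTHETICAL sample record in the assumed layout (not taken from the corpus).
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<knowledge_base>
  <entity wiki_title="Example_Person" type="PER" id="E0000001" name="Example Person">
    <facts class="Infobox Person">
      <fact name="birth_place">Springfield</fact>
    </facts>
    <wiki_text><![CDATA[Example Person is a fictional placeholder.]]></wiki_text>
  </entity>
</knowledge_base>"""

if __name__ == "__main__":
    entities = list(iter_entities(io.BytesIO(SAMPLE_XML)))
    print(Counter(e.entity_type for e in entities))
```

Streaming with `iterparse` and clearing each processed `entity` keeps memory use far below that of loading the full tree, which matters for a corpus of about 2.6 GB (2,597,264 KB). Rejecting any type outside PER/ORG/GPE/UKN is a cheap sanity check that the assumed attribute name is the right one.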
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2014T16 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
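The OAI Info rows indicate that this record can be retrieved over OAI-PMH in both OLAC and simple Dublin Core formats. In OAI-PMH, a GetRecord request is an HTTP GET carrying the `verb`, `identifier` and `metadataPrefix` arguments. The Python sketch below builds such request URLs for the OaiIdentifier given above. The base URL is a hypothetical placeholder, since the record does not state the endpoint; `oai_dc` is the prefix OAI-PMH requires for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# HYPOTHETICAL placeholder: replace with the actual OAI-PMH base URL of the provider.
BASE_URL = "https://oai.example.org/provider"
OAI_ID = "oai:www.ldc.upenn.edu:LDC2014T16"  # OaiIdentifier from this record


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL; urlencode percent-escapes the ':' characters."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, OAI_ID, "olac")   # OLAC format (assumed prefix)
dc_url = get_record_url(BASE_URL, OAI_ID, "oai_dc")   # simple Dublin Core

if __name__ == "__main__":
    print(olac_url)
    print(dc_url)
```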
Search Info | ||
Citation: | Simpson, Heather; Ellis, Joe; Parker, Robert; Strassel, Stephanie. 2014. TAC KBP Reference Knowledge Base. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text |