OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1226

Metadata
Title:LX-NER
Bibliographic Citation:http://hdl.handle.net/11372/LRT-1226
Contributor:Balsa, João
Branco, António
Ferreira, Eduardo
Silveira, Sara
Date (W3CDTF):2014-07-30T21:28:14Z
Date Available:2014-07-30T21:28:14Z
Description:LX-NER is a Named Entity Recognizer for Portuguese. LX-NER takes a segment of Portuguese text and identifies, circumscribes and classifies the named-entity expressions it contains. In addition, each named entity is given a standard representation. It handles the following types of expressions:
  * Number-based expressions
      o Numbers: Expressions denoting numbers are marked as NUMEX. They are further classified into the following subtypes:
          + Arabic: Entities expressed by a sequence of digits, optionally using a period to separate each group of 3 digits, counting from the right.
          + Decimal: Entities expressed by an Arabic number followed by a decimal part, with a comma separating the two parts.
          + Non-compliant: Entities expressed by digits combined with the period and comma symbols in any arrangement. All such entities not covered by the previous 2 subtypes are included here.
          + Roman: Entities expressed by the Roman letters [IVXLCDM], in either uppercase or lowercase, where the string of letters obeys the well-formedness rules for Roman numerals.
          + Cardinal: Entities expressed by a full or partial word description of an Arabic or decimal number. A full cardinal numeral consists only of words, while a partial cardinal numeral is a hybrid of words and Arabic or decimal numbers.
          + Fraction: Entities expressed by Arabic, decimal or cardinal numbers together with specific symbols or expressions representing division.
          + Magnitude class: Entities expressed by Arabic, decimal or cardinal numbers together with expressions representing numerical magnitude.
      o Measures: Expressions denoting measure values are marked as MEASEX. They are further classified into the following subtypes:
          + Currency: Expressions composed of an Arabic, decimal or cardinal number followed by a word or expression representing a currency (e.g. libras).
          + Time: Expressions composed of an Arabic, decimal or cardinal number followed by a word or expression representing a time measure (e.g. segundos).
          + Scientific units: Expressions composed of an Arabic, decimal or cardinal number followed by a word or expression representing a scientific unit (e.g. toneladas).
      o Time: Expressions denoting time are marked as TIMEX. They are further classified into the following subtypes:
          + Date: Expressions representing a date, whose components can be a day of the week (e.g. Segunda-Feira), a day of the month (e.g. 27), a month (e.g. Novembro) or a year (e.g. 2006).
          + Time periods: Expressions made up of Arabic, Roman or cardinal numbers and an explicit indication of a period of time relative to a specific year, decade or century.
          + Time of the day: Expressions, in various formats, indicating a specific time of the day.
      o Addresses: Expressions conveying addresses are marked as ADDREX. They are further divided into the following subparts:
          + Global section: Expressions referring to the global position of a location (e.g. Rua Almeida Garrett). This part is mandatory for an address to be recognized.
          + Local section: Expressions referring to a specific position within the global position (e.g. Nº 17 - 7º Dto).
          + Zip code: Expressions referring to the zip code component of an address (e.g. 3654-548 Lisboa).
  * Name-based expressions
      o Names: Expressions conveying names are marked as NAMEX. They are further classified into the following subtypes:
          + Persons: Expressions conveying names of people, optionally including the person's job or social status when present (e.g. Presidente Cavaco Silva).
          + Organizations: Expressions conveying names of companies (e.g. LG Electronics) and political organizations (e.g. ONU).
          + Locations: Expressions referring to specific geographical locations (e.g. Portugal).
          + Events: Expressions referring to competitions, conferences, workshops and similar events (e.g. 2ª Conferência Sobre o Acesso Livre ao Conhecimento).
          + Works: Expressions referring to movies, books, paintings and similar works (e.g. O Retrato de Dorian Gray).
          + Miscellaneous: Expressions referring to entities that cannot be classified under any of the previous subtypes (e.g. Boeing 747).
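The NUMEX number subtypes that are defined purely by surface form (Arabic, Decimal, Non-compliant and Roman) can be approximated with regular expressions. The following Python sketch illustrates those rules under stated assumptions and is not LX-NER's implementation: the patterns and the function name classify_number are hypothetical, Roman numerals are limited to the standard subtractive form for 1 to 3999, and "either uppercase or lowercase" is read as rejecting mixed-case tokens.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the LX-NER subtype descriptions;
# they are assumptions, not the tool's actual rules.

# Arabic: a plain digit run, or 1-3 leading digits followed by
# period-separated groups of exactly 3 digits (counting from the right).
_ARABIC = r"(?:\d+|\d{1,3}(?:\.\d{3})+)"
ARABIC = re.compile(_ARABIC)
# Decimal: an Arabic number, a comma, then the decimal digits.
DECIMAL = re.compile(_ARABIC + r",\d+")
# Any arrangement of digits, periods and commas containing at least one digit.
NUMERIC = re.compile(r"[\d.,]*\d[\d.,]*")
# Well-formed Roman numerals from 1 to 3999 (standard subtractive notation).
ROMAN = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")


def classify_number(token: str) -> Optional[str]:
    """Return the assumed NUMEX subtype for a single token, or None."""
    if ARABIC.fullmatch(token):
        return "Arabic"
    if DECIMAL.fullmatch(token):
        return "Decimal"
    if NUMERIC.fullmatch(token):
        # Digits mixed with periods/commas that fit neither rule above.
        return "Non-compliant"
    # Accept all-uppercase or all-lowercase letters only; the empty string
    # is excluded because the Roman pattern can match it.
    if token and (token.isupper() or token.islower()):
        if ROMAN.fullmatch(token.upper()):
            return "Roman"
    return None


for t in ["2006", "1.234.567", "3,14", "12.34", "XIV", "xiv", "IIII"]:
    print(t, classify_number(t))
```

Because this sketch looks at isolated tokens, a lowercase Portuguese word such as "vi" ("I saw") is also labelled Roman; a real recognizer needs the surrounding context to rule out such cases.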
Identifier (URI):http://hdl.handle.net/11372/LRT-1226
Language:Portuguese
Language (ISO639):por
Publisher:NLX-Natural Language and Speech Group, University of Lisbon
Type:toolService
Type (DCMI):Software

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-1226
DateStamp:  2016-04-06
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Balsa, João; Branco, António; Ferreira, Eduardo; Silveira, Sara. 2014. NLX-Natural Language and Speech Group, University of Lisbon.
Terms: area_Europe country_PT dcmi_Software iso639_por


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1226
Up-to-date as of: Mon Feb 10 15:12:29 EST 2020