OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-1226 |
Metadata | ||
Title: | LX-NER | |
Bibliographic Citation: | http://hdl.handle.net/11372/LRT-1226 | |
Contributor: | Balsa, João | |
Branco, António | ||
Ferreira, Eduardo | ||
Silveira, Sara | ||
Date (W3CDTF): | 2014-07-30T21:28:14Z | |
Date Available: | 2014-07-30T21:28:14Z | |
Description: | LX-NER is a Named Entity Recognizer for Portuguese. LX-NER takes a segment of Portuguese text and identifies, circumscribes and classifies the expressions for named entities it contains. Furthermore, each named entity receives a standard representation. It handles the following types of expressions:
* Number-based expressions
  o Numbers: Expressions denoting numbers are marked as NUMEX. A list of subtypes is considered, allowing for a more refined classification of these expressions:
    + Arabic: Entities expressed by a sequence of digits, with the option of using a period to separate each group of 3 digits, counting from the right.
    + Decimal: Entities expressed by an Arabic number followed by a decimal part, with a comma separating both parts.
    + Non-compliant: Entities expressed by digits, the period and comma symbols, organized in any possible way. All entities not covered by the previous 2 subtypes are included here.
    + Roman: Entities expressed by the Roman letters [IVXLCDM], in either uppercase or lowercase, with the string of letters obeying the well-formedness rules for Roman numerals.
    + Cardinal: Entities that are expressed by a full or partial word description of an Arabic or decimal number. A full cardinal numeral is composed of words, while a partial cardinal numeral is a hybrid composed of words and Arabic or decimal numbers.
    + Fraction: Entities expressed by Arabic, decimal or cardinal numbers, and specific symbols or expressions representing division.
    + Magnitude class: Entities expressed by Arabic, decimal or cardinal numbers together with expressions representing numerical magnitude.
  o Measures: Terms expressing measure values are marked as MEASEX. A list of subtypes is considered, allowing for a more refined classification of these expressions:
    + Currency: Expressions composed of an Arabic, decimal or cardinal number followed by a word or expression representing a currency (e.g. libras).
    + Time: Expressions composed of an Arabic, decimal or cardinal number followed by a word or expression representing a time measure (e.g. segundos).
    + Scientific units: Expressions composed of an Arabic, decimal or cardinal number followed by a word or expression representing a scientific unit (e.g. toneladas).
  o Time: Terms expressing time are marked as TIMEX. A list of subtypes is considered, allowing for a more refined classification of these expressions:
    + Date: Expressions representing a date, whose components can be a day of the week (e.g. Segunda-Feira), a day of the month (e.g. 27), a month (e.g. Novembro) or a year (e.g. 2006).
    + Time periods: Expressions made up of Arabic, Roman or cardinal numbers and an explicit indication of a period of time concerning a specific year, decade or century.
    + Time of the day: Expressions with different formats, indicating a specific time of the day.
  o Addresses: Expressions conveying addresses are marked as ADDREX. A list of subparts is considered, allowing for a more refined classification of these expressions:
    + Global section: Expressions referring to the global position of a certain location (e.g. Rua Almeida Garrett). This address part is mandatory for an address to be recognized.
    + Local section: Expressions referring to a specific position within the global position (e.g. Nº 17 - 7º Dto).
    + Zip code: Expressions referring to the zip code component of an address (e.g. 3654-548 Lisboa).
* Name-based expressions
  o Names: Expressions conveying names are marked as NAMEX. A list of subtypes is considered, allowing for a more refined classification of these expressions:
    + Persons: Expressions conveying names of people, with the option of considering the job or social status of a person if present (e.g. Presidente Cavaco Silva).
    + Organizations: Expressions conveying names of companies (e.g. LG Electronics) and political organizations (e.g. ONU).
    + Locations: Expressions referring to specific geographical locations (e.g. Portugal).
    + Events: Expressions referring to competitions, conferences, workshops and similar events (e.g. 2ª Conferência Sobre o Acesso Livre ao Conhecimento).
    + Works: Expressions referring to movies, books, paintings and similar works (e.g. O Retrato de Dorian Gray).
    + Miscellaneous: Expressions referring to entities that cannot be classified according to any of the previous subtypes (e.g. Boeing 747). | |
Identifier (URI): | http://hdl.handle.net/11372/LRT-1226 | |
Language: | Portuguese | |
Language (ISO639): | por | |
Publisher: | NLX-Natural Language and Speech Group, University of Lisbon | |
Type: | toolService | |
Type (DCMI): | Software | |
OLAC Info |
||
Archive: | LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-1226 | |
DateStamp: | 2016-04-06 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Balsa, João; Branco, António; Ferreira, Eduardo; Silveira, Sara. 2014. NLX-Natural Language and Speech Group, University of Lisbon. | |
Terms: | area_Europe country_PT dcmi_Software iso639_por |
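
The digit-based NUMEX subtypes listed in the Description (Arabic, Decimal, Non-compliant, Roman) are defined precisely enough to illustrate with regular expressions. The following is a minimal sketch only, not LX-NER's actual implementation: the function name classify_number and all four patterns are hypothetical, and the requirement that a non-compliant entity contain at least one digit is an assumption made here so that bare punctuation is not matched.

```python
import re

# Arabic: a plain digit sequence, or groups of 3 digits separated by
# periods counting from the right (e.g. "1.234.567").
ARABIC = re.compile(r"\d+|\d{1,3}(?:\.\d{3})+")

# Decimal: an Arabic number, a comma, then a decimal part (e.g. "3,14").
DECIMAL = re.compile(r"(?:\d+|\d{1,3}(?:\.\d{3})+),\d+")

# Non-compliant: any other arrangement of digits, periods and commas.
# Assumption: at least one digit must be present.
NON_COMPLIANT = re.compile(r"[\d.,]*\d[\d.,]*")

# Roman: well-formed numerals over [IVXLCDM]; the lookahead rejects the
# empty string, which the rest of the pattern would otherwise accept.
ROMAN = re.compile(
    r"(?=[IVXLCDM])M*(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})",
    re.IGNORECASE,
)


def classify_number(token):
    """Return a NUMEX subtype name for a single token, or None."""
    if ARABIC.fullmatch(token):
        return "Arabic"
    if DECIMAL.fullmatch(token):
        return "Decimal"
    if NON_COMPLIANT.fullmatch(token):
        return "Non-compliant"
    # Accept all-uppercase or all-lowercase numerals, not mixed case.
    if (token.isupper() or token.islower()) and ROMAN.fullmatch(token):
        return "Roman"
    return None


for example in ["2006", "1.234.567", "3,14", "12.34", "MMXIV", "IIII"]:
    print(example, "->", classify_number(example))
```

Token shape alone cannot separate Roman numerals from ordinary words spelled with the same letters (the word "mix" is also a well-formed numeral), so a real recognizer would need surrounding context in addition to patterns like these.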