OLAC Record

Title: EnToSSLNE - a Lexicon of Parallel Named Entities from English to South Slavic Languages
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2019-04-24
Date Issued (W3CDTF): 2019-04-24
Date Modified (W3CDTF): 2019-04-24
Description: This lexicon contains multiword entries which are not strictly named entities but contain a word which is. For example, German shepherd is an entry in this lexicon: it is not a named entity in the strict sense, since many dogs of this breed exist, but the adjective German makes it a named entity in a broader sense. Accordingly, there are many multiword units in the lexicon which contain ethnonyms. Similarly, the unit Planck's law belongs to this lexicon as well.

Certain natural terms, such as biological species and substances, which are sometimes considered named entities, are not included in the lexicon.

Languages

The lexicon consists of 26,155 parallel named entities in seven languages: English and six South Slavic languages (Bosnian, Bulgarian, Croatian, Macedonian, Serbian, and Slovenian).

Slovenian, Croatian and Bosnian are written in Latin script, Macedonian and Bulgarian in Cyrillic. Serbian is a special case, since it may come in two scripts (Cyrillic and Latin) and two dialects (ekavica and ijekavica). This lexicon uses the ekavica variant of Serbian in Cyrillic script.

Classification

The tags used for named entities are: ORGANIZATION, LOCATION, PERSON, PRODUCT and MISC. Each named entity belongs to one of these classes. The classes comprise:

ORGANIZATION: political organizations, companies, schools, rock bands, sports teams
LOCATION: geographical terms, fictional places, cosmic terms
PERSON: humans, gods, saints, fictional characters
PRODUCT: industrial products, software products, weapons, art works, documents, concepts, standards, formats, anthems, algorithms, journals, coats of arms, platforms, websites
MISC: events, languages, peoples, tribes, alliances, orders, scientific discoveries, theories, titles, currencies, holidays, dynasties, positions, projects, historical periods, competitions, diseases, breeds, programs, sets of locations, awards, musical genres, missions, artistic directions, sets of organizations, networks

Each of the 26,155 entries is assigned a tag. The distribution of classes is as follows:

ORGANIZATION: 1,575 entries
LOCATION: 6,327 entries
PERSON: 8,584 entries
PRODUCT: 1,716 entries
MISC: 7,953 entries

Formats

The lexicon comes in two formats: csv and xml.

The first row in the csv file is a title row, and tab is used as a field separator, e.g.:

German Shepherd	Nemški ovčar	Njemački ovčar	Njemački ovčar	Немачки овчар	Германски овчар	Немска овчарка	MISC

In the xml file, the tag denoting the class is an attribute and languages are elements.
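As an illustration of the csv format described above, the following is a minimal Python sketch of reading a tab-separated file with a title row and counting entries per class tag. The column labels in the sample title row and the order of the Croatian and Bosnian columns are assumptions made for this example; the record only states that the first row is a title row, that tab is the separator, and that the class tag closes the example entry. The sample data is inlined so the sketch runs without the actual lexicon file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: a title row plus the one entry quoted in the record.
# The column labels below are assumptions; the real title row may differ.
SAMPLE = (
    "English\tSlovenian\tCroatian\tBosnian\tSerbian\tMacedonian\tBulgarian\tClass\n"
    "German Shepherd\tNemški ovčar\tNjemački ovčar\tNjemački ovčar\t"
    "Немачки овчар\tГермански овчар\tНемска овчарка\tMISC\n"
)


def read_lexicon(stream):
    """Return (title_row, entry_rows) from a tab-separated text stream."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)                   # the first row is the title row
    rows = [row for row in reader if row]   # skip any empty lines
    return header, rows


header, rows = read_lexicon(io.StringIO(SAMPLE))

# Assumption: the class tag is the last field of each row, as in the example entry.
tag_counts = Counter(row[-1] for row in rows)

print(header)
print(rows[0])
print(tag_counts)
```

To read the distributed file instead of the inline sample, the same function could be given a file opened with open(path, encoding="utf-8", newline=""), where path is wherever the csv file was saved; the UTF-8 encoding is also an assumption, suggested by the mixed Latin and Cyrillic content.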
ISLRN: 690-348-503-270-1
Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-M0051/
Language (ISO639): bul
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Text
Type (OLAC): lexicon


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-M0051
DateStamp:  2019-04-24
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2019. ELRA (European Language Resources Association).
Terms: area_Europe country_BA country_BG country_GB country_HR country_MK country_RS country_SI dcmi_Text iso639_bos iso639_bul iso639_eng iso639_hrv iso639_mkd iso639_slv iso639_srp olac_lexicon

Up-to-date as of: Fri Apr 19 6:29:20 EDT 2024