OLAC Record: Arabic dictionary of inflected words with recognition of agglutinated clitics and inflection system

OLAC Record
oai:catalogue.elra.info:ELRA-L0099

Metadata

Title: Arabic dictionary of inflected words with recognition of agglutinated clitics and inflection system

Access Rights: Rights available for: commercialUse

Date Available (W3CDTF): 2017-08-31

Date Issued (W3CDTF): 2017-08-31

Date Modified (W3CDTF): 2017-08-31

Description: This dictionary consists of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information, which includes POS and grammatical features such as number, gender, case, definiteness, tense, mood and compatibility with clitic agglutination. It is accompanied by a grammatical resource that recognizes hundreds of millions of valid agglutinated words, i.e. words consisting of one of the forms in the dictionary preceded and/or followed by clitics (conjunctions, prepositions, articles, pronouns) in compliance with the grammatical rules of Arabic. So that the full-form dictionary can be updated, a dictionary of 65,000 lemmas and the data required to inflect them and regenerate the full-form dictionary are also provided. This allows the dictionary to be adapted to specific applications by deleting and/or adding entries. As it stands, the resource covers more than 98% of the forms found in any kind of text (literature, newspaper articles, etc.); the remaining 2% include proper names, which can be relevant. The data is formatted in conformity with the data formats of Unitex/GramLab, an open source corpus processing system. These data formats are publicly documented. The data can either be converted into user-specific formats or be used directly with Unitex/GramLab. This dictionary is also available without recognition of agglutinated clitics and without inflection system in the ELRA Catalogue under reference ELRA-L0098. Authors: Alexis NEME and Eric LAPORTE

Identifier: ELRA-L0099

ISLRN: 963-860-792-289-9

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-L0099/

Language: Arabic

Language (ISO639): ara

Medium: Not specified

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Text

Type (OLAC): lexicon

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-L0099

DateStamp: 2017-08-31

GetRecord: OAI-PMH request for simple DC format
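
The GetRecord entries in this record refer to OAI-PMH requests. As a minimal sketch of how such a request URL could be assembled, the Python below builds the query string from the identifier listed above. The base URL `https://example.org/oai` is a hypothetical placeholder (the archive's actual endpoint is not given in this record), and the function name `get_record_url` is introduced for illustration only. The prefix `oai_dc` is the standard OAI-PMH prefix for simple Dublin Core; `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real endpoint is not stated in this record.
BASE_URL = "https://example.org/oai"

def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # e.g. "oai_dc" (simple DC) or "olac" (assumed)
    })
    return f"{base_url}?{query}"

if __name__ == "__main__":
    oai_id = "oai:catalogue.elra.info:ELRA-L0099"
    print(get_record_url(BASE_URL, oai_id, "oai_dc"))
    print(get_record_url(BASE_URL, oai_id, "olac"))
```

The identifier is percent-encoded by `urlencode`, so the colons in the OAI identifier are sent safely as part of the query string.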

Search Info
Citation: n.a. 2017. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_ara olac_lexicon

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-L0099
Up-to-date as of: Wed Jul 9 1:05:56 EDT 2025
