OLAC Record oai:www.ldc.upenn.edu:LDC2004L02 |
Metadata | ||
Title: | Buckwalter Arabic Morphological Analyzer Version 2.0 | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Buckwalter, Tim. Buckwalter Arabic Morphological Analyzer Version 2.0 LDC2004L02. Web Download. Philadelphia: Linguistic Data Consortium, 2004 | |
Contributor: | Buckwalter, Tim | |
Date (W3CDTF): | 2004 | |
Date Issued (W3CDTF): | 2004-12-15 | |
Description: | *Introduction* Buckwalter Arabic Morphological Analyzer Version 2.0 was developed by Tim Buckwalter at the Linguistic Data Consortium (LDC) and contains a Perl script for morphological analysis and part-of-speech (POS) tagging of Arabic text. The release includes lexicons with approximately 83,000 entries of Arabic prefixes, suffixes, and stems, as well as compatibility tables that the script consults when analyzing text. The analyzer considers each Arabic word token in all possible prefix-stem-suffix segmentations and lists all known/possible annotation solutions, POS labels, and glosses. Users may then review the generated output and select the most appropriate annotation from among the candidates. This tool has been used frequently for LDC releases of annotated Arabic text. *Data* The data consists primarily of the Perl script, lexicons, and compatibility tables. There are three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82,158 entries representing 38,600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control which word parts may combine: prefix-stem (1,648 entries), stem-suffix (1,285 entries), and prefix-suffix (598 entries). The documentation consists of a readme file with a description of the lexicon files, the morphological compatibility tables, the morphological analysis algorithm, a summary of stem morphological categories, and a table with the author's Arabic transliteration system. *Samples* To see an example of the analyzer's output, please examine this sample. *Updates* There are no updates available at this time. *Additional Licensing Instructions* This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member. | |
Extent: | Corpus size: 9216 KB | |
Identifier: | LDC2004L02 | |
https://catalog.ldc.upenn.edu/LDC2004L02 | ||
ISBN: 1-58563-324-0 | ||
ISLRN: 694-194-540-336-4 | ||
DOI: 10.35111/050q-5r95 | ||
Language: | Standard Arabic | |
English | ||
Language (ISO639): | arb | |
eng | ||
License: | BAMA Agreement: https://catalog.ldc.upenn.edu/license/buckwalter-arabic-morphological-analyzer.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2004L02 | |
Rights Holder: | Portions (c) 2002-2004 QAMUS LLC (www.qamus.org), (c) 2002-2004 Trustees of the University of Pennsylvania | |
Subject: | Standard Arabic language | |
Subject (ISO639): | arb | |
Type (DCMI): | Text | |
Type (OLAC): | lexicon | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2004L02 | |
DateStamp: | 2024-04-02 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Buckwalter, Tim. 2004. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_GB country_SA dcmi_Text iso639_arb iso639_eng olac_lexicon | |
Inferred Metadata | ||
Country: | Saudi Arabia | |
Area: | Asia |
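
The Description field says the analyzer tries every prefix-stem-suffix segmentation of a word token and keeps those permitted by the three compatibility tables. As a rough illustration of that idea only, here is a minimal Python sketch. It is not the Perl script shipped in the release: the lexicon entries, category names (PREF_CONJ, NOUN, and so on), and function names are all hypothetical toy values chosen for the example, and the real lexicon file formats are not reproduced here.

```python
from itertools import product

# Hypothetical toy lexicons: surface form -> list of (category, gloss).
# These entries are invented for illustration; they are not LDC data.
PREFIXES = {"": [("PREF_NONE", "")], "w": [("PREF_CONJ", "and")]}
STEMS = {"ktAb": [("NOUN", "book")], "ktb": [("VERB_PERF", "write")]}
SUFFIXES = {
    "": [("SUFF_NONE", "")],
    "At": [("SUFF_FEM_PL", "[fem.pl.]")],
    "t": [("SUFF_PERF_1S", "I")],
}

# Hypothetical compatibility tables: sets of category pairs allowed to combine.
PREFIX_STEM = {
    ("PREF_NONE", "NOUN"), ("PREF_CONJ", "NOUN"),
    ("PREF_NONE", "VERB_PERF"), ("PREF_CONJ", "VERB_PERF"),
}
STEM_SUFFIX = {
    ("NOUN", "SUFF_NONE"), ("NOUN", "SUFF_FEM_PL"),
    ("VERB_PERF", "SUFF_PERF_1S"),
}
PREFIX_SUFFIX = {
    (p, s)
    for p in ("PREF_NONE", "PREF_CONJ")
    for s in ("SUFF_NONE", "SUFF_FEM_PL", "SUFF_PERF_1S")
}


def segmentations(word):
    """Yield every (prefix, stem, suffix) split with a non-empty stem."""
    n = len(word)
    for i in range(n):                  # prefix is word[:i], may be empty
        for j in range(i + 1, n + 1):   # stem is word[i:j], never empty
            yield word[:i], word[i:j], word[j:]


def analyze(word):
    """Return all segmentations that pass lexicon lookup and all three tables."""
    solutions = []
    for prefix, stem, suffix in segmentations(word):
        if prefix not in PREFIXES or stem not in STEMS or suffix not in SUFFIXES:
            continue
        for (pc, pg), (sc, sg), (xc, xg) in product(
            PREFIXES[prefix], STEMS[stem], SUFFIXES[suffix]
        ):
            if (
                (pc, sc) in PREFIX_STEM
                and (sc, xc) in STEM_SUFFIX
                and (pc, xc) in PREFIX_SUFFIX
            ):
                solutions.append({
                    "prefix": prefix,
                    "stem": stem,
                    "suffix": suffix,
                    "pos": (pc, sc, xc),
                    "gloss": " + ".join(g for g in (pg, sg, xg) if g),
                })
    return solutions


if __name__ == "__main__":
    for token in ("wktAb", "ktbt", "ktAbt"):
        print(token, analyze(token))
```

In this toy setup, "ktAbt" splits into a known stem and a known suffix, yet the pair (NOUN, SUFF_PERF_1S) is absent from the stem-suffix table, so the candidate is discarded. That is the role the compatibility tables play in the description above: lexicon lookup proposes candidates, and the tables filter out combinations that are not allowed.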