OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-4768

Metadata
Title:Arabic ACL corpus
Bibliographic Citation:http://hdl.handle.net/11372/LRT-4768
Creator:Salah Elfahal Elebaed, Hoyam
Kasbi, Mohammed
Nasri, Mohammed
Bouzoubaa, Karim
Date (W3CDTF):2022-05-31T20:08:34Z
Date Available:2022-05-31T20:08:34Z
Description:This corpus comprises all sentences representing the Arabic Controlled Language (ACL). It contains 551 sentences taken from four textbooks and websites dedicated to teaching the Arabic language to children: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). These sentences conform to 52 ACL rules, an average of 10.6 sentences per rule. All sentences in the corpus were parsed with the Farasa syntactic parser, and the validity of the parses was checked manually by expert linguists. The corpus consists of a header and a body. The header holds metadata describing the corpus, such as the corpus name, the authors, the sources, and further metadata. The body contains the rules. Each rule has a code, a structure, and all sentences that conform to that rule. For each sentence, we store an id, the vowelled and unvowelled text, and the result of parsing with Farasa.
Identifier (URI):http://hdl.handle.net/11372/LRT-4768
Language:Arabic
Language (ISO639):ara
Publisher:International Journal of Computer Science Trends and Technology (IJCST)
Subject:Controlled Natural Language
Arabic CNL
ACL
Arabic Corpus
TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
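
The header/body layout given in the Description can be pictured as a small data model. The following is a minimal sketch only: the class and field names (Header, Rule, Sentence, farasa_parse, and so on) are hypothetical illustrations and are not taken from the corpus files, whose actual encoding is not shown in this record. The last line checks the average quoted in the Description (551 sentences over 52 rules).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    # Hypothetical field names; the record only says that an id, the
    # vowelled and unvowelled text, and the Farasa parse are stored.
    sentence_id: str
    vowelled: str
    unvowelled: str
    farasa_parse: str


@dataclass
class Rule:
    # Each rule has a code, a structure, and the sentences that follow it.
    code: str
    structure: str
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Header:
    # Corpus-level metadata named in the Description.
    corpus_name: str
    authors: List[str]
    sources: List[str]


@dataclass
class Corpus:
    header: Header
    rules: List[Rule] = field(default_factory=list)

    def sentence_count(self) -> int:
        # Total number of sentences across all rules.
        return sum(len(rule.sentences) for rule in self.rules)

    def average_sentences_per_rule(self) -> float:
        # Mean number of sentences per rule; 0.0 for an empty corpus.
        if not self.rules:
            return 0.0
        return self.sentence_count() / len(self.rules)


# Check of the figure quoted in the Description: 551 sentences, 52 rules.
print(round(551 / 52, 1))
```

Keeping sentences nested under their rule mirrors the Description, where a rule is the unit that groups sentences; a flat sentence list with a rule code on each sentence would be an equally reasonable alternative.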

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-4768
DateStamp:  2022-06-06
GetRecord:  OAI-PMH request for simple DC format
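
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request is a URL carrying the verb, identifier, and metadataPrefix parameters defined by OAI-PMH. The base URL below is a placeholder, not the archive's real endpoint, and the prefix names "olac" and "oai_dc" are assumed to be the ones the repository exposes for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11372/LRT-4768"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# One request per format listed in this record (prefix names are assumed).
for prefix in ("olac", "oai_dc"):
    print(get_record_url(BASE_URL, OAI_IDENTIFIER, prefix))
```

urlencode percent-encodes the colons and the slash in the identifier, which keeps the query string unambiguous.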

Search Info

Citation: Salah Elfahal Elebaed, Hoyam; Kasbi, Mohammed; Nasri, Mohammed; Bouzoubaa, Karim. 2022. Arabic ACL corpus. International Journal of Computer Science Trends and Technology (IJCST).
Terms: dcmi_Text iso639_ara olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-4768
Up-to-date as of: Thu Oct 5 0:43:18 EDT 2023