OLAC Record
oai:www.ldc.upenn.edu:LDC2015S10

Metadata
Title:Arabic Learner Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Alfaifi, Abdullah, and Eric Atwell. Arabic Learner Corpus LDC2015S10. Web Download. Philadelphia: Linguistic Data Consortium, 2015
Contributor:Alfaifi, Abdullah
Atwell, Eric
Date (W3CDTF):2015
Date Issued (W3CDTF):2015-08-15
Description:*Introduction* Arabic Learner Corpus was developed at the University of Leeds and consists of written essays and spoken recordings by learners of Arabic, collected in Saudi Arabia in 2012 and 2013. The corpus includes 282,732 words in 1,585 materials, produced by 942 students of 67 nationalities studying at pre-university and university levels. The average length of an essay is 178 words. *Data* Two tasks were used to collect the written data, and participants could choose to do one or both of them. In each task, learners were asked to write a narrative about a vacation trip and a discussion of their study interest. Those choosing the first task wrote a 40-minute timed essay without using any language reference materials. In the second task, participants completed the writing as a take-home assignment over two days and were permitted to use language reference materials. The audio recordings were collected by giving students a limited amount of time to talk about the same topics without using language reference materials. The original handwritten essays were transcribed into an electronic text format. The corpus data consists of three types: (1) handwritten sheets scanned in PDF format; (2) audio recordings in MP3 format; and (3) textual Unicode data in plain text and XML formats (including transcripts of the audio recordings and of the handwritten essays). The audio files are either 44100 Hz 2-channel or 16000 Hz 1-channel MP3 files. *Samples* Please view the following samples: * Audio sample * Arabic Header Text * English Header XML *Updates* None at this time.
Extent:Corpus size: 911904 KB
Format:Sampling Rate: 44100 Hz
Sampling Format: mp3
Identifier:LDC2015S10
https://catalog.ldc.upenn.edu/LDC2015S10
ISBN: 1-58563-727-0
ISLRN: 568-308-670-444-7
DOI: 10.35111/5312-x803
Language:Standard Arabic
Language (ISO639):arb
License:Arabic Learner Corpus User License Agreement: https://catalog.ldc.upenn.edu/license/arabic-learner-corpus-user-license-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2015S10
Rights Holder:Portions © 2015 Abdullah Alfaifi, © 2015 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2015S10
DateStamp:  2024-09-27
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Alfaifi, Abdullah; Atwell, Eric. 2015. Linguistic Data Consortium.
Terms: area_Asia country_SA dcmi_Sound dcmi_Text iso639_arb olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015S10
Up-to-date as of: Thu Oct 24 7:30:50 EDT 2024