OLAC Record: King Saud University Arabic Speech Database

OLAC Record
oai:www.ldc.upenn.edu:LDC2014S02

Metadata

Title: King Saud University Arabic Speech Database

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Alsulaiman, Mansour, et al. King Saud University Arabic Speech Database LDC2014S02. Web Download. Philadelphia: Linguistic Data Consortium, 2014

Contributor: Alsulaiman, Mansour

Muhammad, Ghulam

Abdelkader, Bencherif Mohamed

Mahmood, Awais

Ali, Zulfiqar

Date (W3CDTF): 2014

Date Issued (W3CDTF): 2014-02-17

Description: Introduction
King Saud University Arabic Speech Database was developed by the Speech Group (SG) at King Saud University and contains 590 hours of recorded Arabic speech from 269 male and female speakers. The utterances include both read and spontaneous speech, recorded in a variety of quiet and noisy environments.

Data
The corpus was designed principally for speaker recognition research. Other possible applications include first-language recognition and the study of mobile-phone effects, multichannel effects, and differences between microphone types. The speech sources are word lists, sentence lists, paragraphs, and question-and-answer sessions. The read speech text includes the following:
* Sets of sentences devised to cover the allophones of each phoneme, to achieve phonetic balance, and to differentiate accents.
* Word lists developed to minimize missing phonemes and to represent nasals, fricatives, commonly used words, and numbers.
* Two paragraphs, selected because they included all letters of the alphabet and were easy to read.

Spontaneous speech was captured in question-and-answer sessions in which speakers answered questions displayed on a screen. The questions covered general topics such as the weather and food and included the speaker's name or number.

The speakers were Saudis and non-Saudis, and the non-Saudi participants included both Arabs and non-Arabs. All female speakers were either Saudis or non-Saudi Arabs. Male speakers included non-Arabs from the Indian subcontinent, Africa, Southeast Asia, and Eastern Europe. Non-Arab participants were required to be able to read Arabic at an acceptable level; most of them were from the fourth level of the Arabic Linguistics Institute at King Saud University. The non-Saudi participants represented 28 nationalities and were chosen from clusters of areas or countries.

Each speaker was recorded in three different environments: a soundproof room, an office, and a cafeteria. The recordings were made with several different microphones and a mobile phone, and averaged 16 to 19 minutes. Recording took place in three sessions separated by gaps of approximately six weeks. The data was checked for missing recordings, problems with the recording system, and errors in the recording process. All files are provided as two-channel, 48 kHz, 16-bit PCM WAV files compressed with FLAC. Note that the file sizes and file names given in the documentation refer to the uncompressed WAV files.

Samples
A male sample and a female sample are provided.

Updates
None at this time.
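Because the documentation quotes sizes for the uncompressed WAV files, it can help to estimate those sizes from the stated format (two channels, 48 kHz, 16-bit). The following is a minimal sketch of that arithmetic. It assumes the 16 to 19 minute figure refers to the length of an individual recording, and it counts only the PCM sample data, not the WAV header. The helper name `pcm_bytes` is hypothetical. The FLAC-compressed files actually distributed will be smaller.

```python
def pcm_bytes(seconds: int, sample_rate: int = 48_000,
              channels: int = 2, bits_per_sample: int = 16) -> int:
    """Size in bytes of uncompressed PCM sample data (WAV header excluded)."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate * channels * bytes_per_sample


# Byte rate implied by the stated format: 48000 * 2 channels * 2 bytes = 192,000 B/s
BYTE_RATE = pcm_bytes(1)

# Uncompressed size for the stated 16 to 19 minute average recording length
for minutes in (16, 19):
    size = pcm_bytes(minutes * 60)
    print(f"{minutes} min -> {size:,} bytes (~{size / 1e6:.0f} MB)")
```

At 192,000 bytes per second, one minute of audio in this format occupies 11,520,000 bytes before compression.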

Extent: Corpus size: 148897792 KB

Format: Sampling Rate: 48000 Hz

Sampling Format: pcm

Identifier: LDC2014S02

https://catalog.ldc.upenn.edu/LDC2014S02

ISBN: 1-58563-669-X

ISLRN: 789-673-729-277-5

DOI: 10.35111/vpqe-bz17

Language: Arabic

Language (ISO639): ara

License: King Saud University Arabic Speech Database: https://catalog.ldc.upenn.edu/license/ksu-arabic-speech-database.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2014S02

Rights Holder: Portions © 2014 King Saud University, © 2014 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2014S02

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
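The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple Dublin Core format. The following is a minimal sketch of how such a request URL is assembled from the record's OAI identifier, assuming the standard OAI-PMH `GetRecord` verb and the metadata prefixes `olac` and `oai_dc`. The base URL is a hypothetical placeholder, because this record does not state the repository endpoint. The helper name `get_record_url` is also hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the actual OAI-PMH base URL of the
# repository being queried (not given in this record).
BASE_URL = "https://example.org/oai"

OAI_ID = "oai:www.ldc.upenn.edu:LDC2014S02"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL with a URL-encoded query string."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


print(get_record_url(BASE_URL, OAI_ID, "olac"))    # OLAC format
print(get_record_url(BASE_URL, OAI_ID, "oai_dc"))  # simple Dublin Core format
```

`urlencode` percent-encodes the colons in the identifier (`:` becomes `%3A`), so the identifier passes through the query string unchanged once decoded by the server.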

Search Info
Citation: Alsulaiman, Mansour; Muhammad, Ghulam; Abdelkader, Bencherif Mohamed; Mahmood, Awais; Ali, Zulfiqar. 2014. Linguistic Data Consortium.
Terms: dcmi_Sound iso639_ara olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2014S02
Up-to-date as of: Sat Jun 28 1:02:05 EDT 2025
