Sample Metadata Record

oai:catalogue.elra.info:ELRA-S0149


XML format

<olac:olac>
<dc:title>Spanish Speech Corpus 1 (Appen)</dc:title>
<dcterms:available xsi:type="dcterms:W3CDTF">2003-07-15</dcterms:available>
<dcterms:issued xsi:type="dcterms:W3CDTF">2003-07-15</dcterms:issued>
<dcterms:modified xsi:type="dcterms:W3CDTF">2007-02-22</dcterms:modified>
<dc:description>The Spanish Speech Corpus 1 contains the recordings of 200 native Spanish speakers (100 males, 100 females) recorded in an office and a closed public place, over 4 channels, in a range of low to medium background noise environments (Plantronics Audio 10 (computer/desk mic), Shure SM58 (desk-mounted dynamic mic), Shure Beta 53 (headset mic) and Andrea DA-400 (array mic)). The data collection and transcription were performed by Appen (Australia). Speech samples are stored as sequences of 16-bit 22.05 kHz PCM in uncompressed WAV files. Each speaker read the following prompted items: 100 command words and 100 phonetically rich sentences. The age distribution is as follows: 75 speakers are between 18 and 19, 114 are between 20 and 30, and 11 are between 31 and 45. Information about the speakers' place of birth is included. The database is provided with orthographic transcriptions in SAMPA, including canonical and alternative pronunciation, and syllable, stress and acoustic event markings. All transcriptions were segmented at the utterance (sentence/command word) level, annotated at the word level and checked manually. A pronunciation lexicon including 3,748 headwords (plus variants) is also available. This database is intended for use in speech recognition and voice control applications.</dc:description>
<dcterms:medium>Not specified</dcterms:medium>
<dc:identifier>ELRA-S0149</dc:identifier>
<dc:identifier>ISLRN: 184-220-498-777-5</dc:identifier>
<dc:identifier xsi:type="dcterms:URI">https://catalog.elra.info/en-us/repository/browse/ELRA-S0149/</dc:identifier>
<dc:language xsi:type="olac:language" olac:code="spa">Spanish; Castilian</dc:language>
<dc:publisher>ELRA (European Language Resources Association)</dc:publisher>
<dcterms:accessRights> Rights available for: nonCommercialUse, commercialUse </dcterms:accessRights>
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
<dc:type xsi:type="dcterms:DCMIType">Sound</dc:type>
</olac:olac>
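The record can be read with any namespace-aware XML parser. As a rough sketch, the Python below uses the standard-library xml.etree.ElementTree module to list which encoding schemes (xsi:type values) the record uses and on which elements, which is what the Precision component of the quality analysis is concerned with. Assumptions: the page omits the namespace declarations, so the embedded copy adds the standard OLAC 1.1, Dublin Core, DCMI Terms and XSI namespace URIs; the description and a few fields are trimmed; and encoding_schemes is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record above. The page omits namespace declarations,
# so the standard OLAC 1.1, Dublin Core, DCMI Terms and XSI URIs are assumed.
RECORD = """\
<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Spanish Speech Corpus 1 (Appen)</dc:title>
  <dcterms:available xsi:type="dcterms:W3CDTF">2003-07-15</dcterms:available>
  <dcterms:issued xsi:type="dcterms:W3CDTF">2003-07-15</dcterms:issued>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2007-02-22</dcterms:modified>
  <dc:identifier>ELRA-S0149</dc:identifier>
  <dc:identifier xsi:type="dcterms:URI">https://catalog.elra.info/en-us/repository/browse/ELRA-S0149/</dc:identifier>
  <dc:language xsi:type="olac:language" olac:code="spa">Spanish; Castilian</dc:language>
  <dc:publisher>ELRA (European Language Resources Association)</dc:publisher>
  <dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
  <dc:type xsi:type="dcterms:DCMIType">Sound</dc:type>
</olac:olac>
"""

# ElementTree exposes namespaced attributes in {uri}local form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def encoding_schemes(xml_text):
    """Map each xsi:type value to the local names of the elements using it.

    encoding_schemes is a hypothetical helper written for this example.
    """
    root = ET.fromstring(xml_text)
    schemes = {}
    for elem in root:
        scheme = elem.get(XSI_TYPE)
        if scheme is None:
            continue  # unqualified element such as dc:title
        local_name = elem.tag.split("}", 1)[1]
        schemes.setdefault(scheme, []).append(local_name)
    return schemes


for scheme, elements in sorted(encoding_schemes(RECORD).items()):
    print(f"{scheme}: {', '.join(elements)}")
```

For this record the sketch reports three dcterms schemes (W3CDTF on the three dates, URI on one identifier, DCMIType on one type) and two OLAC schemes (olac:language and olac:linguistic-type). Note that ElementTree returns xsi:type values as literal strings such as "dcterms:W3CDTF"; it does not resolve the prefix inside the value.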

Display format

 Title  Spanish Speech Corpus 1 (Appen)
 Available (W3CDTF)  2003-07-15
 Issued (W3CDTF)  2003-07-15
 Modified (W3CDTF)  2007-02-22
 Description  The Spanish Speech Corpus 1 contains the recordings of 200 native Spanish speakers (100 males, 100 females) recorded in an office and a closed public place, over 4 channels, in a range of low to medium background noise environments (Plantronics Audio 10 (computer/desk mic), Shure SM58 (desk-mounted dynamic mic), Shure Beta 53 (headset mic) and Andrea DA-400 (array mic)). The data collection and transcription were performed by Appen (Australia). Speech samples are stored as sequences of 16-bit 22.05 kHz PCM in uncompressed WAV files. Each speaker read the following prompted items: 100 command words and 100 phonetically rich sentences. The age distribution is as follows: 75 speakers are between 18 and 19, 114 are between 20 and 30, and 11 are between 31 and 45. Information about the speakers' place of birth is included. The database is provided with orthographic transcriptions in SAMPA, including canonical and alternative pronunciation, and syllable, stress and acoustic event markings. All transcriptions were segmented at the utterance (sentence/command word) level, annotated at the word level and checked manually. A pronunciation lexicon including 3,748 headwords (plus variants) is also available. This database is intended for use in speech recognition and voice control applications.
 Medium  Not specified
 Identifier  ELRA-S0149
 Identifier  ISLRN: 184-220-498-777-5
 Identifier (URI)  https://catalog.elra.info/en-us/repository/browse/ELRA-S0149/
 Language (ISO639-3)  Spanish [spa], Spanish; Castilian
 Publisher  ELRA (European Language Resources Association)
 Access Rights  Rights available for: nonCommercialUse, commercialUse
 Type (OLAC)  Linguistic type: Primary text
 Type (DCMI)  Sound

Metadata quality analysis

OLAC metadata records are scored for metadata quality on a 10-point scale, as explained in OLAC Metadata Metrics. The score for the above record, along with comments on changes that could improve it, is as follows:

Component   Points earned (+)   Points missed (-)   Comments
Title   1   0 
Date   1   0 
Agent   1   0 
About   1   0 
Depth   1   0 
Content Language   1   0 
Subject Language   1   0 
OLAC Type   1   0 
DCMI Type   1   0 
Precision   0.67   0.33  For the full score, use at least one more encoding scheme in addition to the ones counted explicitly in other components of the score. For instance:
  • olac:role on dc:creator or dc:contributor
  • dcterms:IMT on dc:format
Quality score  9.67
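The quality score is the sum of the points earned per component: nine components earn 1 point each and Precision earns 0.67, giving 9.67 out of 10. The Python sketch below checks that arithmetic and builds one of the suggested Precision fixes, a dc:format element qualified with the dcterms:IMT scheme. The media type audio/vnd.wave (the WAVE type registered in RFC 2361) is an assumption based on the description's mention of uncompressed WAV files; the record itself states no format, and whether this single addition would earn the full Precision point is not confirmed here.

```python
import xml.etree.ElementTree as ET

# Points earned per component, copied from the table above.
components = {
    "Title": 1, "Date": 1, "Agent": 1, "About": 1, "Depth": 1,
    "Content Language": 1, "Subject Language": 1, "OLAC Type": 1,
    "DCMI Type": 1, "Precision": 0.67,
}
total = round(sum(components.values()), 2)
print(f"Quality score: {total}")

# Hypothetical addition for the Precision component: dc:format qualified
# with dcterms:IMT. The media type value is an assumption, not from the record.
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("dc", DC)
ET.register_namespace("xsi", XSI)

fmt = ET.Element(f"{{{DC}}}format", {f"{{{XSI}}}type": "dcterms:IMT"})
fmt.text = "audio/vnd.wave"

# The "dcterms" prefix inside the attribute value is only meaningful once the
# element sits inside a record that declares the dcterms namespace.
print(ET.tostring(fmt, encoding="unicode"))
```

Inside the record, the serialized element would sit alongside the other dc: elements, for example just after dcterms:medium.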