OLAC Record oai:www.ldc.upenn.edu:LDC2024S07 |
Metadata | ||
Title: | MATERIAL Bulgarian-English Language Pack | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Bills, Aric, et al. MATERIAL Bulgarian-English Language Pack LDC2024S07. Web Download. Philadelphia: Linguistic Data Consortium, 2024 | |
Contributor: | Bills, Aric | |
Bishop, Judith | ||
Boyle, Anne | ||
Chouder, Sarra | ||
Clair, Nathaniel | ||
Conners, Tom | ||
Corey, Cassian | ||
Cronin, Kristina | ||
Dubinski, Eyal | ||
Ellis, Corinna | ||
Gibby, Paul | ||
Hammond, Simon | ||
Hidalgo, Guia | ||
Kaiser-Schatzlein, Alice | ||
Kalnins, Dagmara | ||
Kazi, Michael | ||
Lam, Julie | ||
Lazar, Rosie | ||
Le, Hanh | ||
Malyska, Nicolas | ||
Medel, Olivia | ||
Melot, Jennifer | ||
Mensch, Alyssa | ||
Moore, Alex | ||
Morrison, Michelle | ||
Paget, Shelley | ||
Raymer, Alston | ||
Richardson, Fred | ||
Ridgway, Hristina | ||
Roberts, Annette | ||
Rubino, Carl | ||
Saw, Kenneth | ||
Shen, Sinney | ||
Soh, Stephanie | ||
Taylor, Jonathan | ||
Thompson, Brian | ||
Tong, Audrey | ||
Tong, Richard | ||
Williams, Mariana | ||
Yelle, Julie | ||
Yu, Jennifer | ||
Zavora, Yoanna | ||
Zavorin, Ilya | ||
Date (W3CDTF): | 2024 | |
Date Issued (W3CDTF): | 2024-07-15 | |
Description: | *Introduction* MATERIAL Bulgarian-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 80 hours of Bulgarian conversational telephone speech, transcripts, English translations, annotations, and queries. The MATERIAL program focused on underserved languages, with the ultimate goal of building cross-language information retrieval systems that find speech and text content using English search queries. *Data* The Bulgarian speech in this release represents the Western and Eastern dialects. The gender distribution among speakers is approximately equal; speakers' ages range from 16 to 67 years. Calls were made using different telephones (e.g., mobile, landline) from a variety of environments, including the street, a home or office, a public place, and inside a vehicle. Transcripts cover approximately 40% of the speech files, and approximately 10% of the speech files were translated into English. Further information about transcription and translation methodologies is contained in the documentation accompanying this release. The language pack also includes domain annotations, English queries, and their relevance annotations. Annotators marked transcripts by domain (e.g., lifestyle, business-and-commerce, sports, education), by query type (simple, conceptual, hybrid), and by their relevance to query search terms. Speech data is presented as either two-channel WAV or single-channel SPHERE files, both in 8 kHz A-law format. All text data is UTF-8 encoded. *Samples* Please view the following samples: * Audio Sample (WAV) * Transcript Sample (TXT) * Translation Sample (TXT) *Updates* None at this time. | |
Extent: | Corpus size: 3548391 KB | |
Format: | Sampling Rate: 8000 | |
Sampling Format: alaw | ||
Identifier: | LDC2024S07 | |
https://catalog.ldc.upenn.edu/LDC2024S07 | ||
ISLRN: 450-346-825-481-3 | ||
DOI: 10.35111/fs0v-4606 | ||
Language: | Bulgarian | |
English | ||
Language (ISO639): | bul | |
eng | ||
License: | MATERIAL Bulgarian-English Agreement (For-Profit): https://catalog.ldc.upenn.edu/license/material-bulgarian-english-agreement-for-profit.pdf | |
MATERIAL Bulgarian-English Agreement (Non-Member): https://catalog.ldc.upenn.edu/license/material-bulgarian-english-agreement-non-member.pdf | ||
MATERIAL Bulgarian-English Agreement (Not-For-Profit): https://catalog.ldc.upenn.edu/license/material-bulgarian-english-agreement-not-for-profit.pdf | ||
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2024S07 | |
Rights Holder: | Portions © 2024 U.S. Government, © 2024 Trustees of the University of Pennsylvania. The U.S. Government acquired this data from Appen, which assigned the copyright to the data in the U.S. Government. | |
Type (DCMI): | Sound | |
Text | ||
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2024S07 | |
DateStamp: | 2024-07-17 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Bills, Aric; Bishop, Judith; Boyle, Anne; Chouder, Sarra; Clair, Nathaniel; Conners, Tom; Corey, Cassian; Cronin, Kristina; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Hammond, Simon; Hidalgo, Guia; Kaiser-Schatzlein, Alice; Kalnins, Dagmara; Kazi, Michael; Lam, Julie; Lazar, Rosie; Le, Hanh; Malyska, Nicolas; Medel, Olivia; Melot, Jennifer; Mensch, Alyssa; Moore, Alex; Morrison, Michelle; Paget, Shelley; Raymer, Alston; Richardson, Fred; Ridgway, Hristina; Roberts, Annette; Rubino, Carl; Saw, Kenneth; Shen, Sinney; Soh, Stephanie; Taylor, Jonathan; Thompson, Brian; Tong, Audrey; Tong, Richard; Williams, Mariana; Yelle, Julie; Yu, Jennifer; Zavora, Yoanna; Zavorin, Ilya. 2024. Linguistic Data Consortium. | |
Terms: | area_Europe country_BG country_GB dcmi_Sound dcmi_Text iso639_bul iso639_eng olac_primary_text |
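
The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple DC formats. As a minimal sketch of how such request URLs are formed, the Python below builds both from the OaiIdentifier listed in this record. The base URL is a hypothetical placeholder, not the archive's actual endpoint, and the helper name `get_record_url` is illustrative only; `olac` and `oai_dc` are the metadata prefixes conventionally used for the two formats.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL from its three required arguments."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical endpoint: substitute the real OAI-PMH base URL of the repository.
BASE_URL = "https://example.org/oai"

# Taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2024S07"

olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")   # OLAC format
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")   # simple DC format

print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by `urlencode`, which OAI-PMH repositories are expected to decode on receipt.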