OLAC Record: Language Preservation 2.0: Crowdsourcing oral language documentation using mobile devices

OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/25309

Metadata

Title: Language Preservation 2.0: Crowdsourcing oral language documentation using mobile devices

Bibliographic Citation: Bird, Steven, Bird, Steven; 2015-02-28; In crude quantitative terms, Zipf’s law tells us that documentation of something as simple as word usage requires several million words of text or several hundred hours of speech, in a wide variety of genres and styles. The only way to achieve this goal for the majority of the world’s languages is to collect speech. Speech has the added advantage of providing information about phonetics, phonology, and prosody. Speech is also the primary register for dialogue, the most common form of language use. We argue that a combination of community outreach, crowdsourcing techniques, and mobile/web technologies make it relatively easy to collect hundreds or thousands of hours of speech (Callison-Burch and Dredze, 2010; Hughes et al., 2010; Anon 2010). On its own, this would leave us with a large archive of uninterpreted audio recordings and – once the languages are no longer spoken – an onerous and unverifiable decipherment problem. To avoid this problem and to ensure interpretability, there must be a documentary record that includes translation into a major language. We take as our guide the current typical practice in documentary linguistics, which is to record and report data as interlinear glossed text. To this end, we add two layers of audio annotation to the primary recordings. The first layer is careful respeaking, or “audio transcription,” in which native speakers listen to the recordings phrase by phrase, and respeak each phrase slowly and carefully. The second layer is oral translation, in which bilingual speakers produce phrase-by-phrase interpretation of the original recordings into a widely-spoken contact language such as English. This combination of respeaking and interpreting constitutes an “acoustic Rosetta stone” which, over time, will grow to a sufficient size to allow open-ended analysis of the language even when it is no longer spoken, including new methods for developing automatic phonetic recognizers and automatic translation systems (Liberman et al., 2013, Lee et al., 2013, Anon 2013). We will demonstrate a novel way to work with the speakers of endangered languages to collect these spoken language annotations and interlinear glossed texts on a large scale. Our approach addresses key issues in such areas as informed consent, quality control, workflow management, and the diverse technological situations of linguistic fieldwork. Our work promises to speed up the process of preserving the world’s languages and enable future study of these languages and access to knowledge that is captured in archived speech recordings. References Chris Callison-Burch and Mark Dredze. Creating speech and language data with Amazon’s Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pages 1–12. Association for Computational Linguistics, 2010. URL http://www.aclweb.org/anthology/ W10-0701. Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, and Mike LeBeau. Building transcribed speech corpora quickly and cheaply for many languages. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pages 1914–1917. ISCA, 2010. Mark Liberman, Jiahong Yuan, Andreas Stolcke, Wen Wang, and Vikramjit Mitra (2013). Using multiple versions of speech input in phone recognition, ICASSP. Chia-ying Lee, Yu Zhang, and James Glass (2013). Joint learning of phonetic units and word pronunciations for ASR. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 182–192. Association for Computational Linguistics.; Kaipuleohone University of Hawai'i Digital Language Archive;http://hdl.handle.net/10125/25309.

Contributor (speaker): Bird, Steven

Creator: Bird, Steven

Date (W3CDTF): 2015-03-12

Description: In crude quantitative terms, Zipf’s law tells us that documentation of something as simple as word usage requires several million words of text or several hundred hours of speech, in a wide variety of genres and styles. The only way to achieve this goal for the majority of the world’s languages is to collect speech. Speech has the added advantage of providing information about phonetics, phonology, and prosody. Speech is also the primary register for dialogue, the most common form of language use. We argue that a combination of community outreach, crowdsourcing techniques, and mobile/web technologies make it relatively easy to collect hundreds or thousands of hours of speech (Callison-Burch and Dredze, 2010; Hughes et al., 2010; Anon 2010). On its own, this would leave us with a large archive of uninterpreted audio recordings and – once the languages are no longer spoken – an onerous and unverifiable decipherment problem. To avoid this problem and to ensure interpretability, there must be a documentary record that includes translation into a major language. We take as our guide the current typical practice in documentary linguistics, which is to record and report data as interlinear glossed text. To this end, we add two layers of audio annotation to the primary recordings. The first layer is careful respeaking, or “audio transcription,” in which native speakers listen to the recordings phrase by phrase, and respeak each phrase slowly and carefully. The second layer is oral translation, in which bilingual speakers produce phrase-by-phrase interpretation of the original recordings into a widely-spoken contact language such as English. This combination of respeaking and interpreting constitutes an “acoustic Rosetta stone” which, over time, will grow to a sufficient size to allow open-ended analysis of the language even when it is no longer spoken, including new methods for developing automatic phonetic recognizers and automatic translation systems (Liberman et al., 2013, Lee et al., 2013, Anon 2013). We will demonstrate a novel way to work with the speakers of endangered languages to collect these spoken language annotations and interlinear glossed texts on a large scale. Our approach addresses key issues in such areas as informed consent, quality control, workflow management, and the diverse technological situations of linguistic fieldwork. Our work promises to speed up the process of preserving the world’s languages and enable future study of these languages and access to knowledge that is captured in archived speech recordings. References Chris Callison-Burch and Mark Dredze. Creating speech and language data with Amazon’s Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pages 1–12. Association for Computational Linguistics, 2010. URL http://www.aclweb.org/anthology/ W10-0701. Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, and Mike LeBeau. Building transcribed speech corpora quickly and cheaply for many languages. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pages 1914–1917. ISCA, 2010. Mark Liberman, Jiahong Yuan, Andreas Stolcke, Wen Wang, and Vikramjit Mitra (2013). Using multiple versions of speech input in phone recognition, ICASSP. Chia-ying Lee, Yu Zhang, and James Glass (2013). Joint learning of phonetic units and word pronunciations for ASR. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 182–192. Association for Computational Linguistics.

Identifier (URI): http://hdl.handle.net/10125/25309

Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported

Table Of Contents: 25309.mp3

25309.pdf

OLAC Info

Archive: Language Documentation and Conservation

Description: http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:scholarspace.manoa.hawaii.edu:10125/25309

DateStamp: 2024-08-22

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Bird, Steven. 2015. Language Documentation and Conservation.

http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/25309
Up-to-date as of: Wed Jun 18 0:13:09 EDT 2025

Metadata
Title:		Language Preservation 2.0: Crowdsourcing oral language documentation using mobile devices
Bibliographic Citation:		Bird, Steven, Bird, Steven; 2015-02-28; In crude quantitative terms, Zipf’s law tells us that documentation of something as simple as word usage requires several million words of text or several hundred hours of speech, in a wide variety of genres and styles. The only way to achieve this goal for the majority of the world’s languages is to collect speech. Speech has the added advantage of providing information about phonetics, phonology, and prosody. Speech is also the primary register for dialogue, the most common form of language use. We argue that a combination of community outreach, crowdsourcing techniques, and mobile/web technologies make it relatively easy to collect hundreds or thousands of hours of speech (Callison-Burch and Dredze, 2010; Hughes et al., 2010; Anon 2010). On its own, this would leave us with a large archive of uninterpreted audio recordings and – once the languages are no longer spoken – an onerous and unverifiable decipherment problem. To avoid this problem and to ensure interpretability, there must be a documentary record that includes translation into a major language. We take as our guide the current typical practice in documentary linguistics, which is to record and report data as interlinear glossed text. To this end, we add two layers of audio annotation to the primary recordings. The first layer is careful respeaking, or “audio transcription,” in which native speakers listen to the recordings phrase by phrase, and respeak each phrase slowly and carefully. The second layer is oral translation, in which bilingual speakers produce phrase-by-phrase interpretation of the original recordings into a widely-spoken contact language such as English. This combination of respeaking and interpreting constitutes an “acoustic Rosetta stone” which, over time, will grow to a sufficient size to allow open-ended analysis of the language even when it is no longer spoken, including new methods for developing automatic phonetic recognizers and automatic translation systems (Liberman et al., 2013, Lee et al., 2013, Anon 2013). We will demonstrate a novel way to work with the speakers of endangered languages to collect these spoken language annotations and interlinear glossed texts on a large scale. Our approach addresses key issues in such areas as informed consent, quality control, workflow management, and the diverse technological situations of linguistic fieldwork. Our work promises to speed up the process of preserving the world’s languages and enable future study of these languages and access to knowledge that is captured in archived speech recordings. References Chris Callison-Burch and Mark Dredze. Creating speech and language data with Amazon’s Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pages 1–12. Association for Computational Linguistics, 2010. URL http://www.aclweb.org/anthology/ W10-0701. Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, and Mike LeBeau. Building transcribed speech corpora quickly and cheaply for many languages. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pages 1914–1917. ISCA, 2010. Mark Liberman, Jiahong Yuan, Andreas Stolcke, Wen Wang, and Vikramjit Mitra (2013). Using multiple versions of speech input in phone recognition, ICASSP. Chia-ying Lee, Yu Zhang, and James Glass (2013). Joint learning of phonetic units and word pronunciations for ASR. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 182–192. Association for Computational Linguistics.; Kaipuleohone University of Hawai'i Digital Language Archive;http://hdl.handle.net/10125/25309.
Contributor (speaker):		Bird, Steven
Creator:		Bird, Steven
Date (W3CDTF):		2015-03-12
Description:		In crude quantitative terms, Zipf’s law tells us that documentation of something as simple as word usage requires several million words of text or several hundred hours of speech, in a wide variety of genres and styles. The only way to achieve this goal for the majority of the world’s languages is to collect speech. Speech has the added advantage of providing information about phonetics, phonology, and prosody. Speech is also the primary register for dialogue, the most common form of language use. We argue that a combination of community outreach, crowdsourcing techniques, and mobile/web technologies make it relatively easy to collect hundreds or thousands of hours of speech (Callison-Burch and Dredze, 2010; Hughes et al., 2010; Anon 2010). On its own, this would leave us with a large archive of uninterpreted audio recordings and – once the languages are no longer spoken – an onerous and unverifiable decipherment problem. To avoid this problem and to ensure interpretability, there must be a documentary record that includes translation into a major language. We take as our guide the current typical practice in documentary linguistics, which is to record and report data as interlinear glossed text. To this end, we add two layers of audio annotation to the primary recordings. The first layer is careful respeaking, or “audio transcription,” in which native speakers listen to the recordings phrase by phrase, and respeak each phrase slowly and carefully. The second layer is oral translation, in which bilingual speakers produce phrase-by-phrase interpretation of the original recordings into a widely-spoken contact language such as English. This combination of respeaking and interpreting constitutes an “acoustic Rosetta stone” which, over time, will grow to a sufficient size to allow open-ended analysis of the language even when it is no longer spoken, including new methods for developing automatic phonetic recognizers and automatic translation systems (Liberman et al., 2013, Lee et al., 2013, Anon 2013). We will demonstrate a novel way to work with the speakers of endangered languages to collect these spoken language annotations and interlinear glossed texts on a large scale. Our approach addresses key issues in such areas as informed consent, quality control, workflow management, and the diverse technological situations of linguistic fieldwork. Our work promises to speed up the process of preserving the world’s languages and enable future study of these languages and access to knowledge that is captured in archived speech recordings. References Chris Callison-Burch and Mark Dredze. Creating speech and language data with Amazon’s Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pages 1–12. Association for Computational Linguistics, 2010. URL http://www.aclweb.org/anthology/ W10-0701. Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, and Mike LeBeau. Building transcribed speech corpora quickly and cheaply for many languages. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pages 1914–1917. ISCA, 2010. Mark Liberman, Jiahong Yuan, Andreas Stolcke, Wen Wang, and Vikramjit Mitra (2013). Using multiple versions of speech input in phone recognition, ICASSP. Chia-ying Lee, Yu Zhang, and James Glass (2013). Joint learning of phonetic units and word pronunciations for ASR. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 182–192. Association for Computational Linguistics.
Identifier (URI):		http://hdl.handle.net/10125/25309
Rights:		Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Table Of Contents:		25309.mp3
Table Of Contents:		25309.pdf
OLAC Info
Archive:		Language Documentation and Conservation
Description:		http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu
GetRecord:		OAI-PMH request for OLAC format
GetRecord:		Pre-generated XML file
OAI Info
OaiIdentifier:		oai:scholarspace.manoa.hawaii.edu:10125/25309
DateStamp:		2024-08-22
GetRecord:		OAI-PMH request for simple DC format
Search Info
Citation:		Bird, Steven. 2015. Language Documentation and Conservation.