OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/25317

Metadata
Title:Turning language documentation into reader’s and writer’s software tools
Bibliographic Citation:Arppe, Antti, Antonsen, Lene, Trosterud, Trond, Moshagen, Sjur, Thunder, Dorothy, Snoek, Conor, Mills, Timothy, Järvikivi, Juhani, Lachler, Jordan; 2015-02-28; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/25317.
Contributor (speaker):Arppe, Antti
Antonsen, Lene
Trosterud, Trond
Moshagen, Sjur
Thunder, Dorothy
Snoek, Conor
Mills, Timothy
Järvikivi, Juhani
Lachler, Jordan
Creator:Arppe, Antti
Antonsen, Lene
Trosterud, Trond
Moshagen, Sjur
Thunder, Dorothy
Snoek, Conor
Mills, Timothy
Järvikivi, Juhani
Lachler, Jordan
Date (W3CDTF):2015-03-12
Description:One avenue for supporting the continued use and revitalization of endangered languages in the current, pervasively computerized world is the creation of computational models of the often rich and complex morphology of these languages. Such computational models can be used as a basis for creating a suite of reader’s and writer’s tools, including e.g. (1) an intelligent electronic dictionary that combines the computational model and a lexical database, allowing for linking any inflected form with the appropriate dictionary entry as well as the generation of word paradigms, (2) an intelligent computer-aided language learning application (ICALL) that allows for the dynamic generation of large numbers of exercises combining the entire core vocabulary (up to several thousand of the most common words) and a substantially smaller set of exercise templates, and (3) a spell-checker that supports adherence to one or more existing orthographical conventions, and thus the production of good-quality texts. Importantly, these tools can be made publicly available over the Internet and integrated as part of general software applications such as web browsers and word processors, to be used at little or no cost by any speakers or language-learners in the respective communities as well as any researchers, anywhere, instead of remaining on an individual researcher’s computer drive or on a library bookshelf.
Based on our recent experience in trying out various practical approaches to developing computational morphological models for Plains Cree and Northern Haida, using Finite-State Transducer (FST) technology (Beesley & Karttunen, 2003), once one gains access both to (a) a comprehensive set of full word paradigms, for every possible paradigm type, and (b) an accompanying extensive electronic lexical resource with coding indicating the relevant paradigm type, we have been able to create surprisingly rapidly, potentially within only several months, initial but already full-fledged FST models that can be readily adapted into the aforementioned software tools (1-3). Nevertheless, these first versions will certainly benefit from further work, where one cannot do without the active participation of the language community. However, we will demonstrate how, when a researcher or community linguist pays careful attention in their lexical documentation work to the systematic and detailed coding of the morphological characteristics of the vocabulary in some structured electronic format (e.g. when using software such as ToolBox), they will at the same time facilitate the rapid initial development of computational tools which will make the benefits of their work available to the entire community.
References: Beesley, Kenneth R. and Lauri Karttunen. 2003. Finite State Morphology. Stanford (CA): CSLI Publications.
Identifier (URI):http://hdl.handle.net/10125/25317
Rights:Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Table Of Contents:25317.mp3

OLAC Info

Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/25317
DateStamp:  2017-05-11
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Arppe, Antti; Antonsen, Lene; Trosterud, Trond; Moshagen, Sjur; Thunder, Dorothy; Snoek, Conor; Mills, Timothy; Järvikivi, Juhani; Lachler, Jordan. 2015. Language Documentation and Conservation.


http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/25317
Up-to-date as of: Mon Mar 11 1:36:18 EDT 2024