OLAC Record

Title:Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit
Bibliographic Citation:Michaud, Alexis, Adams, Oliver, Cohn, Trevor Anthony, Neubig, Graham, Guillaume, Séverine; 2018-09; Kaipuleohone University of Hawai'i Digital Language Archive;http://hdl.handle.net/10125/24793.
Creator:Michaud, Alexis
Adams, Oliver
Cohn, Trevor Anthony
Neubig, Graham
Guillaume, Séverine
Date (W3CDTF):2018-09
Description:Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for several reasons: the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained over five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.
National Foreign Language Resource Center
Format:37 pages
Identifier:Michaud, Alexis, Oliver Adams, Trevor Anthony Cohn, Graham Neubig & Séverine Guillaume. 2018. Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit. Language Documentation & Conservation 12. 393-429.
Identifier (URI):http://hdl.handle.net/10125/24793
Publisher:University of Hawaii Press
Rights:Creative Commons Attribution-NonCommercial 4.0 International
Attribution-NonCommercial 3.0 United States
Subject:language documentation
automatic speech transcription
automatic speech recognition
natural language processing
endangered languages
sound archive
multimedia corpora
open-source software
open access
Table Of Contents:michaud.pdf
Type (DCMI):Text


Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/24793
DateStamp:  2019-04-23

Search Info

Citation: Michaud, Alexis; Adams, Oliver; Cohn, Trevor Anthony; Neubig, Graham; Guillaume, Séverine. 2018. University of Hawaii Press.
Terms: dcmi_Text

Up-to-date as of: Sun Mar 1 15:49:33 EST 2020