OLAC Record: Forced Alignment for Understudied Language Varieties: Testing Prosodylab-Aligner with Tongan Data

OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/42032

Metadata

Title: Forced Alignment for Understudied Language Varieties: Testing Prosodylab-Aligner with Tongan Data

Bibliographic Citation: Johnson, Lisa; Di Paolo, Marianna; Bell, Adrian; Holt, Carter; 2017-03-02; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/42032.

Contributor (speaker): Johnson, Lisa

Di Paolo, Marianna

Bell, Adrian

Holt, Carter

Creator: Johnson, Lisa

Di Paolo, Marianna

Bell, Adrian

Holt, Carter

Date (W3CDTF): 2017-03-02

Description: Linguists engaged in language documentation and sociolinguistics face similar problems when it comes to efficiently processing large corpora of recorded speech. Though field recordings can be collected efficiently, it may take months or years to process the audio for certain types of analysis. Besides transcription, phonetic analysis often requires the time-consuming alignment of transcription to audio. The expense related to this process may limit both the questions researchers can explore and the amount of data they can analyze.

Recent advances in speech recognition technology have led to the development of tools to automate time alignment of transcriptions to audio (Evanini, Isard, and Liberman 2009; Goldman 2011; Kisler, Schiel, and Sloetjes 2012; Reddy and Stanford 2015; Rosenfelder 2013). Such automation promises to expedite the process of preparing data for acoustic analysis. Unfortunately, the benefits of auto-alignment have generally been available only to researchers studying majority languages like English, for which large corpora exist and for which acoustic models have been created by large-scale research projects or corporate entities.

Prosodylab-Aligner (Gorman, Howell, and Wagner 2011), developed at McGill University and available free of charge, was designed specifically to facilitate automated alignment and segmentation for less-studied languages. It allows researchers to train their own acoustic models using the same audio files for which alignments will be created. Those models can then be used to create Praat TextGrids aligned to those recordings, with boundaries marked at both the word and segment level.

Our study tests the use of Prosodylab-Aligner on Tongan field recordings. The results show that automated alignment of recordings of an understudied language is feasible for linguists without programming experience and less time-consuming than traditional manual alignment. For the benefit of others who may wish to use Prosodylab-Aligner for their own research data, the paper also reviews the software and outlines the steps required to install software components, prepare data files, train acoustic models, and create time-aligned TextGrids. It also provides tips and solutions to problems we encountered along the way. In addition, since field recordings often contain more background noise than the kinds of laboratory recordings Prosodylab-Aligner was designed to use, the paper also presents an analysis (using PraatR; Albin 2014) of the relative costs and benefits of removing background noise for both training and alignment purposes.

References

Albin, Aaron L. 2014. "PraatR: An architecture for controlling the phonetics software “Praat” with the R programming language." The Journal of the Acoustical Society of America 135 (4):2198-2199.

Evanini, Keelan, Stephen Isard, and Mark Liberman. 2009. "Automatic formant extraction for sociolinguistic analysis of large corpora." INTERSPEECH.

Goldman, Jean-Philippe. 2011. "EasyAlign: an automatic phonetic alignment tool under Praat." Interspeech-2011:3233-3236.

Gorman, Kyle, Jonathan Howell, and Michael Wagner. 2011. "Prosodylab-Aligner: A Tool for Forced Alignment of Laboratory Speech." Canadian Acoustics 39 (3):192-193.

Kisler, Thomas, Florian Schiel, and Han Sloetjes. 2012. "Signal processing via web services: the use case WebMAUS." Digital Humanities Conference 2012.

Reddy, Sravana, and James Stanford. 2015. "Toward completely automated vowel extraction: Introducing DARLA." Linguistics Vanguard.

Rosenfelder, Ingrid. 2013. "Forced Alignment & Vowel Extraction (FAVE): An online suite for automatic vowel analysis." University of Pennsylvania Linguistics Lab, last modified December 8, 2013, accessed November 26, 2015. http://fave.ling.upenn.edu/index.html.

Identifier (URI): http://hdl.handle.net/10125/42032

Table Of Contents: 42032.mp3

42032.pdf

Type (DCMI): Text

Sound

OLAC Info

Archive: Language Documentation and Conservation

Description: http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:scholarspace.manoa.hawaii.edu:10125/42032

DateStamp: 2024-07-31

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Johnson, Lisa; Di Paolo, Marianna; Bell, Adrian; Holt, Carter. 2017. Language Documentation and Conservation.
Terms: dcmi_Sound dcmi_Text

http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/42032
Up-to-date as of: Thu Sep 25 0:32:08 EDT 2025
