OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/42015

Metadata
Title:A tool for sharing interlinearized and lexical data in diverse formats
Bibliographic Citation:Kaufman, Daniel, Finkel, Raphael; 2017-03-02; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/42015.
Contributor (speaker):Kaufman, Daniel
Finkel, Raphael
Creator:Kaufman, Daniel
Finkel, Raphael
Date (W3CDTF):2017-03-02
Description:The last decade has seen great advances in the development of electronic tools for automated interlinearization, corpus creation and lexicon building (e.g. FieldWorks Language Explorer [FLEx]), as well as tools for creating time-aligned annotations (e.g. ELAN). However, methods for sharing these new data formats online lag far behind. While good options exist for lexical data (e.g. Webonary, Lexique Pro), there is no tool for turning a project created in the FLEx software into an online interlinearized corpus. We present here a tool in development that does precisely that. FLEx databases can be searched using regular expressions, and individual lines from a text can be linked to audio and video media. The tool can furthermore bring together linguistic data in diverse formats (from ELAN, Praat, FieldWorks, Toolbox, Shoebox) for a single query and allow for queries over multiple language projects. We discuss the benefits of this program in relation to several ongoing fieldwork projects that are being used to evaluate it. These projects present several interesting challenges. In the first, we attempt to create a unified database from several centuries of documentation during which the language showed considerable change. Similarly, in the second project we create a unified database for two lexically, syntactically and phonologically distinct dialects of the same language and show how an interlinearized database facilitates searching across dialects. Finally, in the third project, we show how video data can be integrated into an online FLEx database, a feature which is still lacking in the FLEx software itself. By way of conclusion, we show the audience how to upload their own data (either privately or publicly) and experiment with the tool’s features. Ultimately, the open-source program will be available for anyone interested in hosting their own installations.
Identifier (URI):http://hdl.handle.net/10125/42015
Table Of Contents:42015.pdf
42015.mp3
Type (DCMI):Text
Sound

OLAC Info

Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/42015
DateStamp:  2017-05-11
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Kaufman, Daniel; Finkel, Raphael. 2017. Language Documentation and Conservation.
Terms: dcmi_Sound dcmi_Text


http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/42015
Up-to-date as of: Thu Aug 1 10:06:18 EDT 2019