OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/25326

Metadata
Title:LSI and DBSCAN: Natural language processing for sociolinguistic analysis
Bibliographic Citation:Collard, Jacob; 2015-02-28; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/25326.
Contributor (speaker):Collard, Jacob
Creator:Collard, Jacob
Date (W3CDTF):2015-03-12
Description:The issue of analyzing sociolinguistic and anthropological information remains an open question in contemporary social sciences. Although statistical analyses can identify and quantify correlations and other relationships, it is much more difficult to examine qualitative information, including descriptions of sociolinguistic contexts such as those found in the Endangered Languages Catalog (The Linguist List at Eastern Michigan University and The University of Hawaii at Mānoa, 2012). Natural language processing techniques such as LSI (latent semantic indexing, also known as latent semantic analysis) make it possible to quantify sociolinguistic descriptions to a certain degree. By quantifying natural language semantics, analysis of sociolinguistics becomes less subjective, though the analysis is still performed on descriptions generated by humans. Furthermore, when combined with document clustering techniques such as DBSCAN (Kriegel et al., 2011), natural language processing also makes it possible to recognize previously overlooked relationships between disparate languages. Because of the speed and breadth of this algorithm, it can recognize the relationships between any languages, regardless of geographic or genetic distance. This can provide insights into the effectiveness of different conservation techniques and language policies, as descriptions of these parameters are commonly found in natural language publications. References: Hans-Peter Kriegel, Peer Kröger, Jörg Sander, and Arthur Zimek. Density-based clustering. WIREs Data Mining and Knowledge Discovery, 1(3):231–240, 2011. The Linguist List at Eastern Michigan University and The University of Hawaii at Mānoa. Endangered languages, 2012. URL http://www.endangeredlanguages.com.
Identifier (URI):http://hdl.handle.net/10125/25326
Rights:Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Table Of Contents:25326.mp3
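
Illustrative note: the Description field names DBSCAN as the clustering step applied to LSI document vectors, but this record gives no implementation details. The following is only a minimal sketch of density-based clustering under stated assumptions, not the author's code. The function names, the use of cosine distance, the `eps` and `min_pts` values, and the two-dimensional vectors (hypothetical stand-ins for LSI output) are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster    # border point: joins, is not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: expand
    return labels


# Hypothetical vectors standing in for LSI-reduced language descriptions.
vectors = [(1.0, 0.1), (0.9, 0.2), (1.0, 0.0),
           (0.1, 1.0), (0.2, 0.9), (-1.0, -1.0)]
labels = dbscan(vectors, eps=0.05, min_pts=2)
print(labels)
```

In this toy input the first three vectors point in nearly the same direction, the next two in another, and the last one has no close neighbour, so it is labelled as noise rather than forced into a cluster. That behaviour is what distinguishes density-based clustering from methods that assign every document to some group.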

OLAC Info

Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/25326
DateStamp:  2017-05-11
GetRecord:  OAI-PMH request for simple DC format
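
Illustrative note: the GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. A request of that kind is an HTTP query carrying a `verb`, an `identifier`, and a `metadataPrefix`. The sketch below shows how such a URL could be assembled. The base URL is a placeholder, since the repository's actual endpoint does not appear in this record, and the helper name `get_record_url` is hypothetical. The prefix `oai_dc` is the standard OAI-PMH name for simple Dublin Core; `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real base URL is not given in this record.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:scholarspace.manoa.hawaii.edu:10125/25326"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")   # simple DC
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")   # assumed OLAC prefix
print(dc_url)
print(olac_url)
```

`urlencode` percent-encodes the colons and slash in the identifier, which keeps the query string valid when the identifier is passed as a parameter value.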

Search Info

Citation: Collard, Jacob. 2015. Language Documentation and Conservation.


http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/25326
Up-to-date as of: Thu Aug 1 10:05:47 EDT 2019