Using hidden Markov models to find discrete targets in continuous sociophonetic data

Daniel Duncan

doi:10.1515/lingvan-2020-0057

Published by De Gruyter Mouton July 27, 2021

From the journal Linguistics Vanguard

https://doi.org/10.1515/lingvan-2020-0057

Abstract

Advances in sociophonetic research resulted in features once sorted into discrete bins now being measured continuously. This has implied a shift in what sociolinguists view as the abstract representation of the sociolinguistic variable. When measured discretely, variation is variation in selection: one variant is selected for production, and factors influencing language variation and change are influencing the frequency at which variants are selected. Measured continuously, variation is variation in execution: speakers have a single target for production, which they approximate with varying success. This paper suggests that both approaches can and should be considered in sociophonetic analysis. To that end, I offer the use of hidden Markov models (HMMs) as a novel approach to find speakers’ multiple targets within continuous data. Using the lot vowel among whites in Greater St. Louis as a case study, I compare 2-state and 1-state HMMs constructed at the individual speaker level. Ten of fifty-two speakers’ production is shown to involve the regular use of distinct fronted and backed variants of the vowel. This finding illustrates HMMs’ capacity to allow us to consider variation as both variant selection and execution, making them a useful tool in the analysis of sociophonetic data.

Keywords: hidden Markov models; quantitative methods; sociophonetics; variation

Corresponding author: Daniel Duncan, School of English Literature, Language and Linguistics, Newcastle University, Percy Building, Newcastle upon Tyne NE1 7RU, UK, E-mail: daniel.duncan@ncl.ac.uk

Funding source: NSF

Award Identifier / Grant number: BCS-1651102 DDRI

Acknowledgments

This work was previously presented at the 2019 Symposium on Representations, Usage and Social Embedding in Language Change, held at the University of Manchester. Thanks to the audience there, as well as two anonymous reviewers, for helpful comments.

Research funding: The data discussed here were collected as part of NSF grant BCS-1651102 DDRI.

Appendix: Example R code

In this study, I use the depmixS4 package (Visser and Speekenbrink 2010) to run hidden Markov models in R (R Core Team 2017). Here, I illustrate the code used to obtain models similar to those run in the study. The 1-state model generated by this code assumes the data to be normally distributed around the mean, while the 2-state model assumes both states are normally distributed around the state mean.

After installing the package, it must be loaded prior to use.
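
The listing for this step is not reproduced here, so the following is only a minimal sketch of it. Installing the package only when it is missing, and the choice of CRAN mirror, are my own assumptions rather than part of the original code.

```r
# Install depmixS4 once if it is not yet available
# (the mirror URL is an assumption; any CRAN mirror will do)
if (!requireNamespace("depmixS4", quietly = TRUE)) {
  install.packages("depmixS4", repos = "https://cloud.r-project.org")
}

# Load the package at the start of each R session
library(depmixS4)
```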

Data should be loaded in one’s preferred format. If the original data file has multiple phones in it, create a new data frame composed of a single-phone subset of the original.
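
As a sketch of this step, the example below builds a small toy data frame in place of a real data file and then takes the single-phone subset. The file name, the column names `speaker`, `phone`, and `F2`, and the label `"AA"` for the lot vowel are all hypothetical and should be replaced with whatever the actual data set uses.

```r
# In practice the data would be read from a file, e.g. (hypothetical file name):
# vowels <- read.csv("vowel_measurements.csv")

# Toy stand-in so the example runs on its own; all column names are assumptions
vowels <- data.frame(
  speaker = rep(c("S01", "S02"), each = 4),
  phone   = rep(c("AA", "AE"), times = 4),
  F2      = c(1250, 1710, 1310, 1690, 1420, 1800, 1190, 1760)
)

# Keep only the tokens of the phone of interest ("AA" stands in for lot)
lot <- subset(vowels, phone == "AA")
nrow(lot)
```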

Because there is some randomness involved in an HMM, set the random seed to ensure consistency between runs.
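
A short illustration of why the seed matters, using base R only: resetting the same seed reproduces the same random draws, and the same holds for any randomly chosen starting values used when a model is fitted. The seed value 1 is arbitrary.

```r
# Fixing the seed makes subsequent random draws repeatable
set.seed(1)
first_draw <- runif(3)

# Resetting the same seed reproduces the same draws
set.seed(1)
second_draw <- runif(3)

identical(first_draw, second_draw)
```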

HMMs will be created for individual speakers. For each speaker, make an individual-level subset of the data.
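
One way to sketch this step, again with a toy data frame and hypothetical column names (`speaker`, `F2`): `subset()` extracts a single speaker, while `split()` produces one data frame per speaker in a single call, which is convenient when many speakers are modeled in turn.

```r
# Toy single-phone data; speaker labels and F2 values are invented for illustration
lot <- data.frame(
  speaker = c("S01", "S01", "S01", "S02", "S02", "S02"),
  F2      = c(1250, 1310, 1280, 1420, 1190, 1500)
)

# Subset for one speaker
s01 <- subset(lot, speaker == "S01")

# Alternatively, a named list holding one data frame per speaker
by_speaker <- split(lot, lot$speaker)
names(by_speaker)
```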

Now make a 2-state HMM for each individual. ‘nstates’ determines the number of states the model assumes. While the model here simply assumes a normal distribution around the state mean, note that the formula can be adapted for more complex modeling if necessary.
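
The following is a minimal sketch of a 2-state specification with `depmix()`, not the exact code used in the study. The speaker data are simulated, and the column name `F2` and the Hz values are assumptions. The intercept-only formula `F2 ~ 1` with a Gaussian family gives each state its own mean and standard deviation.

```r
library(depmixS4)

set.seed(1)

# Simulated stand-in for one speaker: a mix of backed and fronted tokens
# (means and spreads are invented for illustration)
spk <- data.frame(
  F2 = sample(c(rnorm(60, mean = 1200, sd = 60),
                rnorm(60, mean = 1500, sd = 60)))
)

# Specify (but do not yet fit) a 2-state HMM with a Gaussian response
hmm2 <- depmix(F2 ~ 1, data = spk, nstates = 2, family = gaussian())
hmm2
```

Predictors can be added to the right-hand side of the formula if the state means are expected to depend on other variables.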

In order to view the summary data, we fit the HMM to our data. Viewing the fitted model gives the model AIC, BIC, and log likelihood.
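
A sketch of the fitting step under the same assumptions as before (simulated data, hypothetical column name `F2`). Printing the fitted object reports convergence information together with the log likelihood, AIC, and BIC; the individual values can also be extracted with `logLik()`, `AIC()`, and `BIC()`.

```r
library(depmixS4)

set.seed(1)

# Simulated stand-in for one speaker's tokens (values are invented)
spk <- data.frame(
  F2 = sample(c(rnorm(60, mean = 1200, sd = 60),
                rnorm(60, mean = 1500, sd = 60)))
)

# Specify the 2-state model, then estimate its parameters
hmm2 <- depmix(F2 ~ 1, data = spk, nstates = 2, family = gaussian())
fit2 <- fit(hmm2, verbose = FALSE)

# Printing the fitted model shows log likelihood, AIC, and BIC
fit2

# The same quantities, extracted individually
logLik(fit2)
AIC(fit2)
BIC(fit2)
```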

We now make a 1-state HMM and follow the same process.
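
The 1-state counterpart differs only in `nstates`. As above, this is a sketch on simulated data with an assumed column name, not the original listing.

```r
library(depmixS4)

set.seed(1)

# Simulated stand-in for one speaker's tokens (values are invented)
spk <- data.frame(
  F2 = sample(c(rnorm(60, mean = 1200, sd = 60),
                rnorm(60, mean = 1500, sd = 60)))
)

# A single state: all tokens are modeled as one normal distribution
hmm1 <- depmix(F2 ~ 1, data = spk, nstates = 1, family = gaussian())
fit1 <- fit(hmm1, verbose = FALSE)

# Log likelihood, AIC, and BIC for comparison with the 2-state model
fit1
```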

In this example, the 2-state model is selected because it has the lower BIC. In this case, we run the following to view the initial state probabilities, transition matrix, and response parameters.
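
To sketch the comparison and the final inspection step, the example below fits both models to the same simulated speaker, collects their BIC values, and calls `summary()` on the 2-state fit. The data, the column name `F2`, and the object names are assumptions; with real data the preferred model may of course differ.

```r
library(depmixS4)

set.seed(1)

# Simulated stand-in for one speaker's tokens (values are invented)
spk <- data.frame(
  F2 = sample(c(rnorm(60, mean = 1200, sd = 60),
                rnorm(60, mean = 1500, sd = 60)))
)

# Fit the 1-state and 2-state models to the same data
fit1 <- fit(depmix(F2 ~ 1, data = spk, nstates = 1, family = gaussian()),
            verbose = FALSE)
fit2 <- fit(depmix(F2 ~ 1, data = spk, nstates = 2, family = gaussian()),
            verbose = FALSE)

# Lower BIC indicates the preferred model
bic <- c(one_state = BIC(fit1), two_state = BIC(fit2))
best <- names(which.min(bic))
bic
best

# Initial state probabilities, transition matrix, and response parameters
summary(fit2)
```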

References

Arthur, Rob & Greg Matthews. 2017. Baseball’s ‘hot hand’ is real. FiveThirtyEight. https://fivethirtyeight.com/features/baseballs-hot-hand-is-real/ (accessed 18 June 2020).

Baayen, R. Harald, Douglas J. Davidson & Douglas M. Bates. 2008. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59. 390–412. https://doi.org/10.1016/j.jml.2007.12.005.

Baranowski, Maciej. 2015. Sociophonetics. In Robert Bayley, Richard Cameron & Ceil Lucas (eds.), The Oxford handbook of sociolinguistics, 403–424. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199744084.013.0020.

Becker, Kara. 2010. Regional dialect features on the Lower East Side of New York City: Sociophonetics, ethnicity, and identity. New York: New York University dissertation.

Bleaman, Isaac. 2020. Implicit standardization in a minority language community: Real-time syntactic change among Hasidic Yiddish writers. Frontiers in Artificial Intelligence 3. Article 35. https://doi.org/10.3389/frai.2020.00035.

Boersma, Paul & David Weenink. 2017. Praat: Doing phonetics by computer, Version 6.0.28. http://www.praat.org/.

Driscoll, Anna & Emma Lape. 2015. Reversal of the Northern Cities Shift in Syracuse, New York. University of Pennsylvania Working Papers in Linguistics 21(2). 41–47.

Duncan, Daniel. 2018. Language variation and change in the geographies of suburbs. New York: New York University dissertation.

Duncan, Daniel. 2019. The influence of suburban development and metropolitan fragmentation on language variation and change: Evidence from Greater St. Louis. Journal of Linguistic Geography 7(2). 82–97. https://doi.org/10.1017/jlg.2019.8.

Duncan, Daniel. under review. Merger reversal in St. Louis: Implementation and implications. Ms., Newcastle University.

Durian, David. 2007. Getting [S]tronger every day?: More on urbanization and the socio-geographic diffusion of (str) in Columbus, OH. University of Pennsylvania Working Papers in Linguistics 13(2). 65–79.

Friedman, Lauren. 2014. The St. Louis Corridor: Mixing, competing, and retreating dialects. University of Pennsylvania PhD Dissertation.

Fruehwald, Josef. 2016. The early influence of phonology on a phonetic change. Language 92(2). 376–410. https://doi.org/10.1353/lan.2016.0041.

Goldsmith, John & Aris Xanthos. 2008. Three models for learning phonological categories. Chicago: Department of Computer Science, University of Chicago. https://newtraell.cs.uchicago.edu/research/publications/techreports/TR-2008-08 (accessed 18 June 2020).

Goodheart, Jill C. 2004. I’m no hoosier: Evidence of the Northern Cities Shift in St. Louis, Missouri. Michigan State University MA Thesis.

Gordon, Matthew J. 2001. Small-town values and big-city vowels: A study of the Northern Cities Shift in Michigan (Publication of the American Dialect Society 84). Durham, NC: Duke University Press.

Gylfadottír, Duna. 2015. Streets of Philadelphia: An acoustic study of /str/-retraction in a naturalistic speech corpus. University of Pennsylvania Working Papers in Linguistics 21(2). 89–97.

Hay, Jennifer, Paul Warren & Katie Drager. 2006. Factors influencing speech perception in the context of a merger-in-progress. Journal of Phonetics 34. 458–484. https://doi.org/10.1016/j.wocn.2005.10.001.

Jaggers, Zachary S. 2018. Evidence and characterization of a glide-vowel distinction in American English. Laboratory Phonology 9(1). 1–27. Article 3. https://doi.org/10.5334/labphon.36.

Johnson, Daniel E. 2009. Getting off the GoldVarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass 3(1). 359–383. https://doi.org/10.1111/j.1749-818x.2008.00108.x.

Kass, Robert E. & Adrian E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90. 773–795. https://doi.org/10.1080/01621459.1995.10476572.

Labov, William. 1994. Principles of linguistic change volume 1: Internal factors. Oxford: Blackwell Publishers.

Labov, William, Mark Karen & Corey Miller. 1991. Near-mergers and the suspension of phonemic contrast. Language Variation and Change 3. 33–74. https://doi.org/10.1017/s0954394500000442.

Labov, William, Sharon Ash & Charles Boberg. 2006. The atlas of North American English. New York: Mouton de Gruyter. https://doi.org/10.1515/9783110167467.

Leach, Hannah. 2018. Sociophonetic variation in Stoke-on-Trent’s pottery industry. University of Sheffield PhD Dissertation.

Lobanov, Boris M. 1971. Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America 49. 606–608. https://doi.org/10.1121/1.1912396.

Love, Jessica & Abby Walker. 2012. Football versus football: Effect of topic on /r/ realization in American and English sports fans. Language and Speech 56(4). 443–460. https://doi.org/10.1177/0023830912453132.

MacKenzie, Laurel. 2020. Comparing constraints on contraction using Bayesian regression modeling. Frontiers in Artificial Intelligence 3. Article 58. https://doi.org/10.3389/frai.2020.00058.

Mayer, Connor. 2020. An algorithm for learning phonological classes from distributional similarity. Phonology 37. 91–131. https://doi.org/10.1017/s0952675720000056.

Nycz, Jennifer. 2013. New contrast acquisition: Methodological issues and theoretical implications. English Language and Linguistics 17(2). 325–357. https://doi.org/10.1017/s1360674313000051.

R Core Team. 2017. R: A language and environment for statistical computing. https://www.R-project.org.

Rosenfelder, Ingrid, Josef Fruehwald, Keelan Evanini, Scott Seyfarth, Kyle Gorman, Hilary Prichard & Jiahong Yuan. 2014. FAVE (Forced Alignment and Vowel Extraction) program suite. Version 1.2.2. https://doi.org/10.5281/zenodo.22281.

Rutter, Ben. 2011. Acoustic analysis of a sound change in progress: The consonant cluster /stɹ/ in English. Journal of the International Phonetic Association 41. 27–40. https://doi.org/10.1017/s0025100310000307.

Sankoff, David, Sali A. Tagliamonte & Eric Smith. 2005. Goldvarb X: A variable rule application for Macintosh and Windows. Toronto: Department of Linguistics, University of Toronto.

Sneller, Betsy. 2018. Mechanisms of phonological change. University of Pennsylvania PhD Dissertation.

Starner, Thad & Alex Pentland. 1995. Real-time American Sign Language recognition from video using hidden Markov models. MIT Media Laboratory Perceptual Computing Section Technical Report No. 375. https://www.cc.gatech.edu/~thad/p/031_10_SL/real-time-asl-recognition-from%20video-using-hmm-ISCV95.pdf (accessed 18 June 2020). https://doi.org/10.1109/ISCV.1995.477012.

Tamminga, Meredith. 2016. Persistence in phonological and morphological variation. Language Variation and Change 28. 335–356. https://doi.org/10.1017/s0954394516000119.

Tamminga, Meredith, Christopher Ahern & Aaron Ecay. 2016. Generalized additive mixed models for intraspeaker variation. Linguistics Vanguard 2(s1). 1–9. https://doi.org/10.1515/lingvan-2016-0030.

Turton, Danielle. 2017. Categorical or gradient? An ultrasound investigation of /l/-darkening and vocalization in varieties of English. Laboratory Phonology: Journal of the Association for Laboratory Phonology 8(1). 1–31. Article 13. https://doi.org/10.5334/labphon.35.

Villarreal, Dan, Lynn Clark, Jennifer Hay & Kevin Watson. 2020. From categories to gradience: Auto-coding sociophonetic variation with random forests. Laboratory Phonology: Journal of the Association for Laboratory Phonology 11(1). 1–31. Article 6. https://doi.org/10.5334/labphon.216.

Visser, I. & M. Speekenbrink. 2010. depmixS4: An R Package for Hidden Markov Models. Journal of Statistical Software 36(7). 1–21. https://doi.org/10.18637/jss.v036.i07.

Wagner, Suzanne E., Alexander Mason, Monica Nesbitt, Erin Pevan & Matt Savage. 2016. Reversal and re-organization of the Northern Cities Shift in Michigan. University of Pennsylvania Working Papers in Linguistics 22(2). 171–179.

Wilbanks, Eric. 2017. Social and structural constraints on a phonetically-motivated change in progress: (str) retraction in Raleigh, NC. University of Pennsylvania Working Papers in Linguistics 23(1). 301–310. https://doi.org/10.5070/P7121040720.

Received: 2020-06-22

Accepted: 2020-11-09

Published Online: 2021-07-27
