Abstract
We discuss the notorious problem of order selection in hidden Markov models, that is, of selecting an adequate number of states, highlighting typical pitfalls and practical challenges arising when analyzing real data. Extensive simulations are used to demonstrate the reasons that render order selection particularly challenging in practice despite the conceptual simplicity of the task. In particular, we demonstrate why well-established formal procedures for model selection, such as those based on standard information criteria, tend to favor models with numbers of states that are undesirably large in situations where the states are meant to be meaningful entities. We also offer a pragmatic step-by-step approach together with comprehensive advice on how practitioners can implement order selection. Our proposed strategy is illustrated with a real-data case study on muskox movement.
Supplementary materials accompanying this paper appear online.
Cite this article
Pohle, J., Langrock, R., van Beest, F.M. et al. Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated Using Animal Movement. JABES 22, 270–293 (2017). https://doi.org/10.1007/s13253-017-0283-8
DOI: https://doi.org/10.1007/s13253-017-0283-8