Abstract
In the social sciences, studies are often based on questionnaires that ask participants to give ordered responses several times over a study period. We present a model-based clustering algorithm for such longitudinal ordinal data. Assuming that an ordinal variable is the discretization of an underlying latent continuous variable, the model relies on a mixture of matrix-variate normal distributions, accounting simultaneously for within- and between-time dependence structures. The model is thus able to capture, at the same time, the heterogeneity, the association among the responses and the temporal dependence structure. An EM algorithm is developed for parameter estimation, and approaches to deal with the computational challenges that arise are outlined. An evaluation of the model on synthetic data shows its estimation abilities and its advantages when compared to competitors. A real-world application concerning changes in eating behaviors during the COVID-19 pandemic period in France is also presented.
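The data-generating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensions, covariance structures, mixing weights, cluster means and the shared cut points below are all hypothetical values chosen for illustration, and the assignment of the row covariance to variables and the column covariance to time points is an assumption. Each participant's latent J x T matrix is drawn from one component of a matrix-variate normal mixture and then thresholded into ordinal levels.

```python
import numpy as np

def sample_matrix_normal(rng, mean, row_cov, col_cov):
    """Draw one matrix X = M + A Z B^T, with A A^T = row_cov and B B^T = col_cov."""
    a = np.linalg.cholesky(row_cov)        # J x J factor (assumed: between variables)
    b = np.linalg.cholesky(col_cov)        # T x T factor (assumed: between time points)
    z = rng.standard_normal(mean.shape)    # independent standard normal entries
    return mean + a @ z @ b.T

def discretize(latent, thresholds):
    """Map latent values to ordinal levels 1..C using increasing cut points."""
    return np.digitize(latent, thresholds) + 1

rng = np.random.default_rng(0)
J, T = 3, 4                                # hypothetical: 3 questions, 4 time points

# Hypothetical covariances: compound symmetry across variables, AR(1)-like across time.
row_cov = 0.5 * np.eye(J) + 0.5
times = np.arange(T)
col_cov = 0.7 ** np.abs(times[:, None] - times[None, :])

# Hypothetical two-component mixture with well-separated latent means.
means = [np.full((J, T), -1.0), np.full((J, T), 1.0)]
weights = [0.4, 0.6]

# Hypothetical cut points shared by all variables, giving 5 ordered levels.
thresholds = np.array([-1.5, -0.5, 0.5, 1.5])

n = 200
labels = rng.choice(len(weights), size=n, p=weights)
data = np.stack([
    discretize(sample_matrix_normal(rng, means[k], row_cov, col_cov), thresholds)
    for k in labels
])
print(data.shape)                          # one J x T ordinal matrix per participant
```

Only the ordinal array `data` would be observed in practice; the latent matrices and the labels are what a clustering procedure of this kind would have to recover.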
References
Agresti, A.: Analysis of Ordinal Categorical Data, 2nd edn. Wiley, London (2010)
Alaimo, L.S., Amato, F., Maggino, F., Piscitelli, A., Seri, E.: A comparison of migrant integration policies via mixture of matrix-normals. Soc. Indic. Res. 165(2), 473–494 (2023). https://doi.org/10.1007/s11205-022-03024-2
Anderlucci, L., Viroli, C.: Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data. Ann. Appl. Stat. 9(2), 777–800 (2015). https://doi.org/10.1214/15-AOAS816
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA ’07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics, USA (2007). https://doi.org/10.5555/1283383.1283494
Basford, K.E., McLachlan, G.J.: The mixture method of clustering applied to three-way data. J. Classif. 2(1), 109–125 (1985). https://doi.org/10.1007/BF01908066
Becker, W.E., Kennedy, P.E.: A graphical exposition of the ordered probit. Economet. Theor. 8(1), 127–131 (1992). https://doi.org/10.1017/S0266466600010781
Biernacki, C., Jacques, J.: Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm. Stat. Comput. 26(5), 929–943 (2016). https://doi.org/10.1007/s11222-015-9585-2
Bouveyron, C., Celeux, G., Murphy, T.B., Raftery, A.E.: Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press, Cambridge (2019). https://doi.org/10.1017/9781108644181
Cagnone, S., Viroli, C.: Multivariate latent variable transition models of longitudinal mixed data: an analysis on alcohol use disorder. J. R. Stat. Soc.: Ser. C: Appl. Stat. 67(5), 1399–1418 (2018). https://doi.org/10.1111/rssc.12285
Corneli, M., Bouveyron, C., Latouche, P.: Co-clustering of ordinal data via latent continuous random variables and not missing at random entries. J. Comput. Graph. Stat. 29(4), 771–785 (2020). https://doi.org/10.1080/10618600.2020.1739533
D’Elia, A., Piccolo, D.: A mixture model for preferences data analysis. Comput. Stat. Data Anal. 49(3), 917–934 (2005). https://doi.org/10.1016/j.csda.2004.06.012
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977). https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Dillon, W.R., Madden, T.J., Firtle, N.: Marketing Research in a Marketing Environment. Irwin, Homewood (1994)
Doğru, F.Z., Bulut, Y.M., Arslan, O.: Finite mixtures of matrix variate t distributions. Gazi Univ. J. Sci. 29(2), 335–341 (2016)
Fernandez, D., Arnold, R., Pledger, S.: Mixture-based clustering for the ordered stereotype model. Comput. Stat. Data Anal. 93, 46–75 (2016). https://doi.org/10.1016/j.csda.2014.11.004
François-Lecompte, A., Innocent, M., Kréziak, D., Prim-Allaz, I.: Confinement et comportements alimentaires - Quelles évolutions en matière d’alimentation durable ? Rev. Fr. Gest. 46(293), 55–80 (2020). https://doi.org/10.3166/rfg.2020.00493
Gallaugher, M.P.B., McNicholas, P.D.: Finite mixtures of skewed matrix variate distributions. Pattern Recognit. 80, 83–93 (2018). https://doi.org/10.1016/j.patcog.2018.02.025
Gilula, Z., McCulloch, R.E., Ritov, Y., Urminsky, O.: A study into mechanisms of attitudinal scale conversion: a randomized stochastic ordering approach. Quant. Mark. Econ. 17(3), 325–357 (2019). https://doi.org/10.1007/s11129-019-09209-3
Giordan, M., Diana, G.: A clustering method for categorical ordinal data. Commun. Stat. - Theory Methods 40(7), 1315–1334 (2011). https://doi.org/10.1080/03610920903581010
Gupta, A.K., Nagar, D.K.: Matrix Variate Distributions. Chapman and Hall/CRC (2000)
Iannario, M., Piccolo, D.: A generalized framework for modelling ordinal data. Stat. Methods Appl. 25(2), 163–189 (2016). https://doi.org/10.1007/s10260-015-0316-9
Jacques, J., Biernacki, C.: Model-based co-clustering for ordinal data. Comput. Stat. Data Anal. 123, 101–115 (2018). https://doi.org/10.1016/j.csda.2018.01.014
Komárek, A., Komárková, L.: Capabilities of R package mixAK for clustering based on multivariate continuous and discrete longitudinal data. J. Stat. Softw. 59(12), 1–38 (2014). https://doi.org/10.18637/jss.v059.i12
Kruschke, J.K.: Doing Bayesian Data Analysis. Elsevier, Academic Press (2015)
Lewis, S.J.G., Foltynie, T., Blackwell, A.D., Robbins, T.W., Owen, A.M., Barker, R.A.: Heterogeneity of Parkinson’s disease in the early clinical stages using a data driven approach. J. Neurol. Neurosurg. Psychiatry 76(3), 343–348 (2005). https://doi.org/10.1136/jnnp.2003.033530
Liddell, T.M., Kruschke, J.K.: Analyzing ordinal data with metric models: what could possibly go wrong? J. Exp. Soc. Psychol. 79, 328–348 (2018). https://doi.org/10.1016/j.jesp.2018.08.009
Likert, R.: A technique for the measurement of attitudes. Arch. Psychol. 140, 5–55 (1932)
Lynch, S.M.: Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Springer, New York (2007)
McKelvey, R.D., Zavoina, W.: A statistical model for the analysis of ordinal level dependent variables. J. Math. Sociol. 4(1), 103–120 (1975). https://doi.org/10.1080/0022250X.1975.9989847
McNicholas, P.D., Murphy, T.B.: Model-based clustering of longitudinal data. Can. J. Stat./La Revue Canadienne de Stat. 38(1), 153–168 (2010). https://doi.org/10.1002/cjs.10047
McParland, D., Gormley, I.C.: Clustering ordinal data via latent variable models. In: Algorithms from and for Nature and Life. Springer, pp. 127–135 (2013). https://doi.org/10.1007/978-3-319-00035-0_12
McParland, D., Gormley, I.C.: Model based clustering for mixed data: clustMD. Adv. Data Anal. Classif. 10(2), 155–169 (2016). https://doi.org/10.1007/s11634-016-0238-x
Melnykov, V., Zhu, X.: On model-based clustering of skewed matrix data. J. Multivar. Anal. 167, 181–194 (2018). https://doi.org/10.1016/j.jmva.2018.04.007
Melnykov, V., Zhu, X.: Studying crime trends in the USA over the years 2000–2012. Adv. Data Anal. Classif. 13(1), 325–341 (2019). https://doi.org/10.1007/s11634-018-0326-1
Millsap, R.E., Yun-Tein, J.: Assessing factorial invariance in ordered-categorical measures. Multivar. Behav. Res. 39(3), 479–515 (2004). https://doi.org/10.1207/S15327906MBR3903_4
Ranalli, M., Rocci, R.: Mixture models for ordinal data: a pairwise likelihood approach. Stat. Comput. 26(1–2), 529–547 (2016). https://doi.org/10.1007/s11222-014-9543-4
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971). https://doi.org/10.1080/01621459.1971.10482356
Sarkar, S., Zhu, X., Melnykov, V., Ingrassia, S.: On parsimonious models for modeling matrix data. Comput. Stat. Data Anal. 142, 106822 (2020). https://doi.org/10.1016/j.csda.2019.106822
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978). https://doi.org/10.1214/aos/1176344136
Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E.: mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R J. 8(1), 289–317 (2016). https://doi.org/10.32614/RJ-2016-021
Selosse, M., Jacques, J., Biernacki, C., Cousson-Gélie, F.: Analysing a quality-of-life survey by using a co-clustering model for ordinal data and some dynamic implications. J. R. Stat. Soc.: Ser. C (Appl. Stat.) 68(5), 1327–1349 (2019). https://doi.org/10.1111/rssc.12365
Selosse, M., Jacques, J., Biernacki, C.: ordinalClust: an R package to analyze ordinal data. R J. 12(2), 173–188 (2021). https://doi.org/10.32614/RJ-2021-011
Stevens, S.S.: On the theory of scales of measurement. Science 103(2684), 677–680 (1946). https://doi.org/10.1126/science.103.2684.677
Tomarchio, S.D., Punzo, A., Bagnato, L.: Two new matrix-variate distributions with application in model-based clustering. Comput. Stat. Data Anal. 152, 107050 (2020). https://doi.org/10.1016/j.csda.2020.107050
Vávra, J., Komárek, A.: Classification based on multivariate mixed type longitudinal data with an application to the EU-SILC database. Adv. Data Anal. Classif. 17(2), 369–406 (2023). https://doi.org/10.1007/s11634-022-00504-8
Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S. Springer, New York (2002)
Vermunt, J.K., Magidson, J.: Latent GOLD 4.0 User’s Guide. Statistical Innovations Inc., Belmont, USA (2005)
Viroli, C.: Finite mixtures of matrix normal distributions for classifying three-way data. Stat. Comput. 21(4), 511–522 (2011). https://doi.org/10.1007/s11222-010-9188-x
Viroli, C.: Model based clustering for three-way data structures. Bayesian Anal. 6(4), 573–602 (2011). https://doi.org/10.1214/11-BA622
Viroli, C.: On matrix-variate regression analysis. J. Multivar. Anal. 111, 296–309 (2012). https://doi.org/10.1016/j.jmva.2012.04.005
Wang, Y., Melnykov, V.: On variable selection in matrix mixture modelling. Stat 9(1), 278 (2020). https://doi.org/10.1002/sta4.278
Winship, C., Mare, R.D.: Regression models with ordinal variables. Am. Sociol. Rev. 512–525 (1984). https://doi.org/10.2307/2095465
Zhu, X., Sarkar, S., Melnykov, V.: MatTransMix: an R package for matrix model-based clustering and parsimonious mixture modeling. J. Classif. 39(1), 147–170 (2022). https://doi.org/10.1007/s00357-021-09401-9
Acknowledgements
This work was made possible by the financial support of Project IADoc@UdL of the University of Lyon and Université Lumière - Lyon 2, as part of the call for "doctoral contracts in artificial intelligence 2020" (ANR-20-THIA-0007-01). We thank Agnès François-Lecompte, Morgane Innocent and Dominique Kréziak, co-authors of François-Lecompte et al. (2020), for sharing their data. We also thank Brendan Murphy for his invaluable input and support throughout the research process. His insights and expertise were instrumental in shaping the direction of this project.
Author information
Contributions
FA and JJ wrote the main manuscript text and designed and implemented the model. IP-A provided the data, and contributed to and supervised the real-world application. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Amato, F., Jacques, J. & Prim-Allaz, I. Clustering longitudinal ordinal data via finite mixture of matrix-variate distributions. Stat Comput 34, 81 (2024). https://doi.org/10.1007/s11222-024-10390-z
DOI: https://doi.org/10.1007/s11222-024-10390-z