Clustering longitudinal ordinal data via finite mixture of matrix-variate distributions

Amato, Francesco; Jacques, Julien; Prim-Allaz, Isabelle

doi:10.1007/s11222-024-10390-z

Original Paper
Published: 17 February 2024

Volume 34, article number 81 (2024)

Statistics and Computing

Abstract

In the social sciences, studies are often based on questionnaires in which participants give ordered responses several times over a study period. We present a model-based clustering algorithm for such longitudinal ordinal data. Assuming that each ordinal variable is the discretization of an underlying latent continuous variable, the model relies on a mixture of matrix-variate normal distributions, accounting simultaneously for within- and between-time dependence structures. The model can thus jointly capture the heterogeneity, the association among the responses and the temporal dependence structure. An EM algorithm is developed for parameter estimation, and approaches to some of the resulting computational challenges are outlined. An evaluation on synthetic data demonstrates the model's estimation abilities and its advantages over competing methods. Finally, a real-world application concerning changes in eating behaviors in France during the Covid-19 pandemic is presented.
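The latent-variable construction described in the abstract can be illustrated with a short NumPy sketch. It is written under our own assumptions and is not the authors' implementation: a p × T latent matrix (p variables observed at T times) is taken to follow a matrix-variate normal MN(M, U, V), with U (rows) assumed to carry the association among variables and V (columns) the dependence among times, and each ordinal response is obtained by cutting the latent value at increasing thresholds. The function names, the threshold convention γ_{h-1} < x ≤ γ_h and all example parameters are hypothetical. The `responsibilities` helper only shows posterior cluster probabilities for fully observed latent matrices; the paper's EM algorithm must also handle the fact that the latent values are unobserved, which this sketch does not attempt.

```python
import numpy as np


def matrix_normal_logpdf(X, M, U, V):
    """Log-density of a p x T matrix-variate normal MN(M, U, V).

    U (p x p): covariance among variables (rows); V (T x T): covariance
    among times (columns). Equivalently vec(X) ~ N(vec(M), kron(V, U)).
    """
    p, T = X.shape
    R = X - M
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    # tr(V^{-1} R^T U^{-1} R), using linear solves instead of explicit inverses
    quad = np.trace(np.linalg.solve(V, R.T @ np.linalg.solve(U, R)))
    return -0.5 * (p * T * np.log(2 * np.pi) + T * logdet_U + p * logdet_V + quad)


def sample_matrix_normal(M, U, V, rng):
    """Draw X = M + A Z B^T with U = A A^T, V = B B^T and Z i.i.d. N(0, 1)."""
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ B.T


def discretize(X, thresholds):
    """Map latent values to ordinal categories 1..m (hypothetical convention).

    thresholds: increasing interior cut-points gamma_1 < ... < gamma_{m-1};
    category h is assigned when gamma_{h-1} < x <= gamma_h, with
    gamma_0 = -inf and gamma_m = +inf.
    """
    return np.digitize(X, thresholds, right=True) + 1


def responsibilities(Xs, pis, params):
    """Posterior cluster probabilities for fully observed latent matrices.

    Xs: list of n latent p x T matrices; pis: mixing proportions;
    params: list of (M_k, U_k, V_k) tuples, one per cluster.
    """
    log_w = np.array([
        [np.log(pi_k) + matrix_normal_logpdf(X, M, U, V)
         for pi_k, (M, U, V) in zip(pis, params)]
        for X in Xs
    ])
    # log-sum-exp normalization for numerical stability
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)


# Illustrative (made-up) setting: 3 items answered at 4 time points.
rng = np.random.default_rng(0)
p, T = 3, 4
U = 0.5 * np.eye(p) + 0.5  # exchangeable correlation among items
V = 0.8 ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))  # AR(1)-type over time
params = [(np.zeros((p, T)), U, V), (np.full((p, T), 2.0), U, V)]
pis = [0.4, 0.6]

X = sample_matrix_normal(params[1][0], U, V, rng)
Y = discretize(X, thresholds=[-0.5, 0.5, 1.5, 2.5])  # 5-point scale
W = responsibilities([X], pis, params)
```

The Kronecker identity vec(X) ~ N(vec(M), V ⊗ U) is a convenient way to check the density, and working on the log scale with a log-sum-exp normalization keeps the posterior probabilities numerically stable.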

References

Agresti, A.: Analysis of Ordinal Categorical Data, 2nd edn. Wiley, London (2010)
Alaimo, L.S., Amato, F., Maggino, F., Piscitelli, A., Seri, E.: A comparison of migrant integration policies via mixture of matrix-normals. Soc. Indic. Res. 165(2), 473–494 (2023). https://doi.org/10.1007/s11205-022-03024-2
Anderlucci, L., Viroli, C.: Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data. Ann. Appl. Stat. 9(2), 777–800 (2015). https://doi.org/10.1214/15-AOAS816
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA ’07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics, USA (2007). https://doi.org/10.5555/1283383.1283494
Basford, K.E., McLachlan, G.J.: The mixture method of clustering applied to three-way data. J. Classif. 2(1), 109–125 (1985). https://doi.org/10.1007/BF01908066
Becker, W.E., Kennedy, P.E.: A Graphical exposition of the ordered probit. Economet. Theor. 8(1), 127–131 (1992). https://doi.org/10.1017/S0266466600010781
Biernacki, C., Jacques, J.: Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm. Stat. Comput. 26(5), 929–943 (2016). https://doi.org/10.1007/s11222-015-9585-2
Bouveyron, C., Celeux, G., Murphy, T.B., Raftery, A.E.: Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press, Cambridge (2019). https://doi.org/10.1017/9781108644181
Cagnone, S., Viroli, C.: Multivariate latent variable transition models of longitudinal mixed data: an analysis on alcohol use disorder. J. R. Stat. Soc.: Ser. C: Appl. Stat. 67(5), 1399–1418 (2018). https://doi.org/10.1111/rssc.12285
Corneli, M., Bouveyron, C., Latouche, P.: Co-clustering of ordinal data via latent continuous random variables and not missing at random entries. J. Comput. Graph. Stat. 29(4), 771–785 (2020). https://doi.org/10.1080/10618600.2020.1739533
D’Elia, A., Piccolo, D.: A mixture model for preferences data analysis. Comput. Stat. Data Anal. 49(3), 917–934 (2005). https://doi.org/10.1016/j.csda.2004.06.012
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977). https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Dillon, W.R., Madden, T.J., Firtle, N.: Marketing Research in a Marketing Environment. Irwin, Homewood (1994)
Doğru, F.Z., Bulut, Y.M., Arslan, O.: Finite mixtures of matrix variate t distributions. Gazi Univ. J. Sci. 29(2), 335–341 (2016)
Fernandez, D., Arnold, R., Pledger, S.: Mixture-based clustering for the ordered stereotype model. Comput. Stat. Data Anal. 93, 46–75 (2016). https://doi.org/10.1016/j.csda.2014.11.004
François-Lecompte, A., Innocent, M., Kréziak, D., Prim-Allaz, I.: Confinement et comportements alimentaires - Quelles évolutions en matière d’alimentation durable ? Rev. Fr. Gest. 46(293), 55–80 (2020). https://doi.org/10.3166/rfg.2020.00493
Gallaugher, M.P.B., McNicholas, P.D.: Finite mixtures of skewed matrix variate distributions. Pattern Recognit. 80, 83–93 (2018). https://doi.org/10.1016/j.patcog.2018.02.025
Gilula, Z., McCulloch, R.E., Ritov, Y., Urminsky, O.: A study into mechanisms of attitudinal scale conversion: a randomized stochastic ordering approach. Quant. Mark. Econ. 17(3), 325–357 (2019). https://doi.org/10.1007/s11129-019-09209-3
Giordan, M., Diana, G.: A clustering method for categorical ordinal data. Commun. Stat. - Theory Methods 40(7), 1315–1334 (2011). https://doi.org/10.1080/03610920903581010
Gupta, A.K., Nagar, D.K.: Matrix Variate Distributions. Chapman and Hall/CRC (2000)
Iannario, M., Piccolo, D.: A generalized framework for modelling ordinal data. Stat. Methods Appl. 25(2), 163–189 (2016). https://doi.org/10.1007/s10260-015-0316-9
Jacques, J., Biernacki, C.: Model-based co-clustering for ordinal data. Comput. Stat. Data Anal. 123, 101–115 (2018). https://doi.org/10.1016/j.csda.2018.01.014
Komárek, A., Komárková, L.: Capabilities of R package mixAK for clustering based on multivariate continuous and discrete longitudinal data. J. Stat. Softw. 59(12), 1–38 (2014). https://doi.org/10.18637/jss.v059.i12
Kruschke, J.K.: Doing Bayesian Data Analysis. Elsevier, Academic Press (2015)
Lewis, S.J.G., Foltynie, T., Blackwell, A.D., Robbins, T.W., Owen, A.M., Barker, R.A.: Heterogeneity of Parkinson’s disease in the early clinical stages using a data driven approach. J. Neurol. Neurosurg. Psychiatry 76(3), 343–348 (2005). https://doi.org/10.1136/jnnp.2003.033530
Liddell, T.M., Kruschke, J.K.: Analyzing ordinal data with metric models: what could possibly go wrong? J. Exp. Soc. Psychol. 79, 328–348 (2018). https://doi.org/10.1016/j.jesp.2018.08.009
Likert, R.: A technique for the measurement of attitudes. Arch. Psychol. 140, 5–55 (1932)
Lynch, S.M.: Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Springer, New York (2007)
McKelvey, R.D., Zavoina, W.: A statistical model for the analysis of ordinal level dependent variables. J. Math. Sociol. 4(1), 103–120 (1975). https://doi.org/10.1080/0022250X.1975.9989847
McNicholas, P.D., Murphy, T.B.: Model-based clustering of longitudinal data. Can. J. Stat./La Revue Canadienne de Stat. 38(1), 153–168 (2010). https://doi.org/10.1002/cjs.10047
McParland, D., Gormley, I.C.: Clustering ordinal data via latent variable models. In: Algorithms from and for Nature and Life. Springer, pp. 127–135 (2013). https://doi.org/10.1007/978-3-319-00035-0_12
McParland, D., Gormley, I.C.: Model based clustering for mixed data: clustMD. Adv. Data Anal. Classif. 10(2), 155–169 (2016). https://doi.org/10.1007/s11634-016-0238-x
Melnykov, V., Zhu, X.: On model-based clustering of skewed matrix data. J. Multivar. Anal. 167, 181–194 (2018). https://doi.org/10.1016/j.jmva.2018.04.007
Melnykov, V., Zhu, X.: Studying crime trends in the USA over the years 2000–2012. Adv. Data Anal. Classif. 13(1), 325–341 (2019). https://doi.org/10.1007/s11634-018-0326-1
Millsap, R.E., Yun-Tein, J.: Assessing factorial invariance in ordered-categorical measures. Multivar. Behav. Res. 39(3), 479–515 (2004). https://doi.org/10.1207/S15327906MBR3903_4
Ranalli, M., Rocci, R.: Mixture models for ordinal data: a pairwise likelihood approach. Stat. Comput. 26(1–2), 529–547 (2016). https://doi.org/10.1007/s11222-014-9543-4
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971). https://doi.org/10.1080/01621459.1971.10482356
Sarkar, S., Zhu, X., Melnykov, V., Ingrassia, S.: On parsimonious models for modeling matrix data. Comput. Stat. Data Anal. 142, 106822 (2020). https://doi.org/10.1016/j.csda.2019.106822
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978). https://doi.org/10.1214/aos/1176344136
Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E.: mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R J. 8(1), 289–317 (2016). https://doi.org/10.32614/RJ-2016-021
Selosse, M., Jacques, J., Biernacki, C., Cousson-Gélie, F.: Analysing a quality-of-life survey by using a co-clustering model for ordinal data and some dynamic implications. J. R. Stat. Soc.: Ser. C (Appl. Stat.) 68(5), 1327–1349 (2019). https://doi.org/10.1111/rssc.12365
Selosse, M., Jacques, J., Biernacki, C.: ordinalClust: an R package to analyze ordinal data. R J. 12(2), 173–188 (2021). https://doi.org/10.32614/RJ-2021-011
Stevens, S.S.: On the theory of scales of measurement. Science 103(2684), 677–680 (1946). https://doi.org/10.1126/science.103.2684.677
Tomarchio, S.D., Punzo, A., Bagnato, L.: Two new matrix-variate distributions with application in model-based clustering. Comput. Stat. Data Anal. 152, 107050 (2020). https://doi.org/10.1016/j.csda.2020.107050
Vávra, J., Komárek, A.: Classification based on multivariate mixed type longitudinal data with an application to the EU-SILC database. Adv. Data Anal. Classif. 17(2), 369–406 (2023). https://doi.org/10.1007/s11634-022-00504-8
Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S. Springer, New York (2002)
Vermunt, J.K., Magidson, J.: Latent GOLD 4.0 User’s Guide. Statistical Innovations Inc., Belmont, USA (2005)
Viroli, C.: Finite mixtures of matrix normal distributions for classifying three-way data. Stat. Comput. 21(4), 511–522 (2011). https://doi.org/10.1007/s11222-010-9188-x
Viroli, C.: Model based clustering for three-way data structures. Bayesian Anal. 6(4), 573–602 (2011). https://doi.org/10.1214/11-BA622
Viroli, C.: On matrix-variate regression analysis. J. Multivar. Anal. 111, 296–309 (2012). https://doi.org/10.1016/j.jmva.2012.04.005
Wang, Y., Melnykov, V.: On variable selection in matrix mixture modelling. Stat 9(1), 278 (2020). https://doi.org/10.1002/sta4.278
Winship, C., Mare, R.D.: Regression models with ordinal variables. Am. Sociol. Rev., 512–525 (1984). https://doi.org/10.2307/2095465
Zhu, X., Sarkar, S., Melnykov, V.: MatTransMix: an R package for matrix model-based clustering and parsimonious mixture modeling. J. Classif. 39(1), 147–170 (2022). https://doi.org/10.1007/s00357-021-09401-9

Acknowledgements

This work was carried out with financial support from Project IADoc@UdL of the University of Lyon and Université Lumière - Lyon 2, as part of the call for "doctoral contracts in artificial intelligence 2020" (ANR-20-THIA-0007-01). We thank Agnès François-Lecompte, Morgane Innocent and Dominique Kréziak, co-authors of François-Lecompte et al. (2020), for sharing their data. We would also like to thank Brendan Murphy for his invaluable input and support throughout the research process. His insights and expertise were instrumental in shaping the direction of this project.

Author information

Julien Jacques and Isabelle Prim-Allaz have contributed equally to this work.

Authors and Affiliations

ERIC, Univ Lyon, Univ Lyon 2, 5 Avenue Mendès France, 69676, Bron Cedex, France
Francesco Amato & Julien Jacques
COACTIS, Univ Lyon, Univ Lyon 2, 16 Avenue Berthelot, 69007, Lyon, France
Isabelle Prim-Allaz

Contributions

FA and JJ wrote the main manuscript text and designed and implemented the model. IP-A provided the data and contributed to and supervised the real-world application. All authors reviewed the manuscript.

Corresponding author

Correspondence to Francesco Amato.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendices

Appendix A

See Tables 2, 3, 4, 5 and 6.

Table 2 Clusters’ means over time. The estimated mixing proportions are \(\hat{\pi} = (0.37, 0.44, 0.19)\)

Table 3 Clusters’ time correlation

Table 4 Clusters’ time covariances

Table 5 Clusters’ variables correlation

Table 6 Clusters’ variables covariances

Appendix B

See Fig. 10.

About this article

Cite this article

Amato, F., Jacques, J. & Prim-Allaz, I. Clustering longitudinal ordinal data via finite mixture of matrix-variate distributions. Stat Comput 34, 81 (2024). https://doi.org/10.1007/s11222-024-10390-z

Received: 13 June 2023
Accepted: 18 January 2024
Published: 17 February 2024
DOI: https://doi.org/10.1007/s11222-024-10390-z