This is the fifth Special Issue of ADAC dedicated to recent developments in Models and Learning in Clustering and Classification, an area of increasingly active research in both theoretical and applied domains that has attracted the interest of a growing number of researchers.

This special issue is divided into two parts because of the large number of papers submitted for publication. The first part was published in volume 16, issue 1. This second part contains 9 papers that were accepted for publication after a blind peer-review process and that deal essentially with three broad areas. The first four contributions present topics in mixture models. The first three of these concern different issues in Gaussian mixture models: model-based clustering for high-dimensional data, the implementation of a hierarchical structure on variables for each component, and the impact of different prior specifications on the results obtained in Bayesian cluster analysis. The fourth contribution concerns mixture models with von Mises–Fisher density components.

The second main group deals with latent class models. In this framework, the contributions concern a new approach to bias-adjusted three-step latent class analysis, a popular technique to relate covariates to class membership; a case study about the changing attitudes toward immigration in EU host countries; and a multilevel latent Markov model based on separate random effects. The last two papers propose new estimators, one in the framework of longitudinal data analysis and one for a two-class problem with high-dimensional compositional covariates.

The paper “Factor and Hybrid Components for Model-Based Clustering” by Jason Hou-Liu and Ryan Patrick Browne deals with the problem of model-based clustering for high-dimensional data sets, which is challenging because of the large number of free parameters in the model. The authors impose an intra-cluster structure that captures hybridization of mean and covariance parameters between components for the multivariate normal distribution, and they achieve parameter reduction by expressing a subset of the Gaussian mixture components as a weighted combination of other components. An estimation procedure based on the Expectation-Maximization algorithm is provided.

The paper “Gaussian mixture model with an extended ultrametric covariance structure” by Carlo Cavicchia, Maurizio Vichi, and Giorgia Zaccaria introduces a new class of Gaussian mixture models (GMMs) by assuming an extended ultrametric covariance matrix for each cluster. The new model achieves parsimony by imposing a hierarchical structure on variables for each component of the GMM, and it allows a different characterization of multidimensional phenomena to be identified for each component of the mixture. The multidimensional phenomena are characterized by nested latent concepts with different levels of abstraction, from the most specific to the most general. The proposed model is applied to both synthetic and real data.

The paper entitled “How many data clusters are in the Galaxy data set? Bayesian cluster analysis in action” by Bettina Grün, Gertraud Malsiner-Walli, and Sylvia Frühwirth-Schnatter investigates the impact of different prior specifications on the results obtained in Bayesian cluster analysis based on mixture models for the benchmark Galaxy data set. More specifically, the authors perform a sensitivity analysis of different prior specifications for finite mixture models in a full factorial design and assess their impact on the estimated number of clusters for the Galaxy data set. The results indicate interaction effects among the prior specifications and provide some insights into which prior specifications are recommended in practical applications.

A different mixture model approach is presented in the paper entitled “A von Mises–Fisher mixture model for clustering numerical and categorical variables” by Xavier Bry and Lionel Cucala, which concerns the classification of units coming from variables that may be of different types, numerical or categorical. Here the approach is based on a mixture model with von Mises–Fisher density components, the von Mises–Fisher distribution being the natural directional distribution on a unit hypersphere. A preliminary analysis focuses on the geometric representation of the data, and some theoretical results about the variable space are presented. The procedure is illustrated through many numerical studies with different numbers of clusters and different structures, mainly based on simulated data.

New methodological results in the framework of Latent Class Analysis modeling are presented in the paper entitled “A new three-step method for using inverse propensity weighting with latent class analysis” by Felix Clouth, Steffen Pauws, Floortje Mols and Jeroen Vermunt. In this framework, the paper extends the bias-adjusted three-step latent class analysis, a popular technique to relate covariates to class membership, to incorporate inverse propensity weighting in LCA to estimate average treatment effects (ATE) by adjusting for confounding in observational data. This approach separates the estimation of the measurement model from the estimation of the ATE and allows for using multiple imputation in the propensity score model. The performance of the new approach is illustrated through a large numerical study concerning both simulated and real data.

An interesting case study concerning the changing attitudes toward immigration in EU host countries in the period 2010–2018 is presented in the paper entitled “Are attitudes toward immigration changing in Europe? An analysis based on latent class IRT models” by Ewa Genge and Francesco Bartolucci. The data come from the cross-national European Social Survey, which measures changes in social structure, conditions, and opinions in Europe. The statistical model is based on a latent class approach including covariates and adopts a suitable Item Response Theory parametrization. The results point out which countries tend to be more or less positive toward immigration and show the temporal dynamics of the phenomenon under study. The data are extensively described in the first part of the paper, and a large literature review is also presented.

An ordinal multilevel latent Markov model based on separate random effects is proposed in the paper entitled “Model-based two-way clustering of second-level units in ordinal multilevel latent Markov models” by Giorgio Eduardo Montanari, Marco Doretti and Francesca Marino. In contrast to the existing literature, a discrete bivariate random effect is introduced here in order to avoid unverifiable parametric assumptions on the second-level latent variable and to obtain a direct two-way clustering of second-level units. A two-step maximum likelihood estimation procedure is presented, and some computational issues concerning the EM algorithm are also discussed. The method is illustrated through some numerical simulations and a large case study based on real data.

The paper “A two-step estimator for generalized linear models for longitudinal data with time-varying measurement error” by Roberto Di Mari and Antonello Maruotti deals with generalized linear models for longitudinal data when some of the covariates are measured with error. The main result is the proposal of a two-step estimator which, although sub-optimal compared to a one-step approach, is computationally lighter, especially when different subsets of variables are considered successively in the model. Moreover, a simulation study shows that the loss of efficiency of this two-step estimator in comparison with the optimal one is negligible.

The paper entitled “Robust Logistic Zero-Sum Regression for Microbiome Compositional Data” by Gianna Serafina Monti and Peter Filzmoser introduces a Robust Logistic Zero-Sum Regression (RobLZS) estimator for a two-class problem with high-dimensional compositional covariates. By employing the log-contrast model, the estimator can perform feature selection among the compositional parts. In addition, the proposed method achieves robustness by minimizing a trimmed sum of deviances. The performance of the RobLZS estimator is compared with a non-robust counterpart and with other sparse logistic regression estimators via Monte Carlo simulation studies.

The Editors gratefully acknowledge the assistance of the following experts and colleagues in the process of reviewing the manuscripts that were submitted for this special issue:

Margareta Ackerman (USA), Daniel Ahfock (Australia), Julien Ah-Pine (France), Andreas Alfons (Netherlands), Silvia Bacci (Italy), Zsuzsa Bakk (Netherlands), Michel Broniatowski (France), Ryan Browne (Canada), Silvia Cagnone (Italy), Gabriela Ciuperca (France), Pietro Coretto (Italy), Marco Corneli (France), Roberto Di Mari (Italy), Jean-Baptiste Durand (France), John Dziak (USA), Brian Franczak (Canada), Luis Angel García-Escudero (Spain), Michael Fop (Italy), Claire Gormley (Ireland), David Hitchcock (USA), Tsung-I Lin (Taiwan), Dimitris Karlis (Greece), Agustin Mayo-Iscar (Spain), Geoffrey McLachlan (Australia), Igor Melnykov (USA), Colin Pawlowski (USA), Monia Ranalli (Italy), Shuchismita Sarkar (USA), Cristina Tortora (USA), Cinzia Viroli (Italy), Adalbert Wilhelm (Germany), Darren Wraith (Australia), Ping-Feng Xu (China), Jie Yang (USA), Ying Daisy Zhuo (USA).