Multivariate ordinal regression for multiple repeated measurements

In this paper we propose a multivariate ordinal regression model which allows the joint modeling of three-dimensional panel data containing both repeated and multiple measurements for a collection of subjects. This is achieved by a multivariate autoregressive structure on the errors of the latent variables underlying the ordinal responses, where we distinguish between the correlations at a single point in time and the persistence over time. The errors are assumed to follow a normal or a Student t distribution. Estimation is performed using composite likelihood methods. We perform several simulation exercises to investigate the quality of the estimates in different settings as well as in comparison with a Bayesian approach. The simulation study confirms that the estimation procedure recovers the model parameters well and is competitive in terms of computation time. We also introduce the R package mvordflex and illustrate how this implementation can be used to estimate the proposed model in a convenient, user-friendly way. Finally, we illustrate the framework on a data set containing firm failure and credit rating information from the rating agencies S&P and Moody's for US listed companies.


Introduction
The analysis of correlated ordinal outcomes is an important task in a wide range of research fields. It is often the case that multiple ordinal outcomes are observed repeatedly over a period of time for a collection of subjects. The modeling of such three-dimensional data (possibly in a regression setting) should therefore take into account possible dependencies in the cross-section (i.e., given that the multiple outcomes are observed on the same subjects) as well as over time (i.e., longitudinally).
In this paper we propose a multivariate ordinal regression model which can capture dependence among both repeated and multiple measurements in an ordinal model. We achieve this by imposing a multivariate AR(1) correlation structure on the errors of the continuous process underlying the discrete ordinal observations. The multivariate AR(1) process accounts for the correlations among the multiple ordinal responses at the same point in time as well as for the persistence in each of the multiple responses over time, while keeping the number of parameters to be estimated for the error structure low. The model proposed in this paper therefore extends the two-dimensional class of multivariate regression models to accommodate a more complex dependence structure.
The estimation of the model parameters is performed by composite likelihood methods. In a simulation study we examine the quality of the estimates of the proposed model for different scenarios related to the distribution of the errors and to the degree of correlation present in the data. The results of the simulation study confirm that the composite likelihood methods are able to recover the parameters of the model well. We also perform a simulation exercise where we compare the proposed framework with an implementation using Bayesian methods and show that for the investigated setting the pairwise likelihood approach is a competitive alternative to Bayesian inference, while having a lower computational cost.
We implement the proposed model for three-dimensional panel data in mvordflex (Hirk and Vana, 2024), which is built as an extension to the existing R package mvord. As in package mvord, the model can be estimated using a multivariate probit or a multivariate logit link. Moreover, the regression coefficients and the threshold parameters of the ordinal regression are allowed to vary across time points and responses, but if more parsimonious specifications are desired, they can be constrained to be equal along some or all time-outcome dimensions. Having a ready-to-use implementation will hopefully make the model class more accessible to users in a variety of application fields.
In the empirical application we employ a data set of US listed firms which have been rated by either Standard and Poor's (S&P) or Moody's. For these firms we record the available S&P and Moody's ratings together with an indicator containing information on whether the company went into bankruptcy in the year following the rating observations. We show how the proposed model can be applied to these data and how model comparison with simpler specifications can be performed.
The composite likelihood approach for estimation in multivariate ordinal regression-type models is an attractive choice, given that it requires the computation of low-dimensional integrals instead of the high-dimensional integrals necessary for the evaluation of the likelihood function. Composite likelihood methods have been employed for ordinal models with two-dimensional responses either in the cross-section (e.g., Scott and Kanaroglou, 2002; Bhat et al., 2010; Kenne Pagui and Canale, 2016; Hirk et al., 2021) or longitudinally (see e.g., Varin and Czado, 2010; Reusens and Croux, 2017; Tuzcuoglu, 2022; Hirk et al., 2022). A software implementation for the two-dimensional model class is provided in the package mvord for R (Hirk et al., 2020).
The estimation of regression models with autoregressive errors has been an active field of research. Among the pioneering papers which use linear regression with autoregressive errors to model a time series in the presence of covariates we mention Cochrane and Orcutt (1949); Anderson (1954); Durbin (1960); Zellner and Tiao (1964); Chib (1993). More recent papers include Alpuim and El-Shaarawi (2008); Tuaç et al. (2018, 2020), where the autoregressive errors of order p follow normal, Student t and skew-symmetric distributions, respectively. For the case of subjects observed over a collection of time points, several approaches have been proposed, such as mixed effects models (Wang and Fan, 2010) or joint mean-covariance models (Guney et al., 2022). For the three-dimensional setting, where several continuous outcomes (or responses) are observed for a collection of subjects over time, the literature is scarcer, especially for modeling ordinal data, but several approaches have been proposed for different applications. Chaubert et al. (2008) propose a dynamic multivariate ordinal probit model, which assumes a general correlation structure for the cross-sectional responses and a general multivariate autoregressive model on the time-varying regression coefficients, rather than on the error terms. This approach is appropriate if one assumes that the longitudinal correlations arise due to the autocorrelation in the regression coefficients rather than due to unobserved covariates. Similarly, Bartolucci and Farcomeni (2009) propose a multivariate model for categorical data where a set of subject-specific intercepts is assumed to follow a first-order Markov chain. Another model class which has been employed is that of mixed-effects models, which account for dependence in the responses by introducing latent effects at different levels of the hierarchy. Conditional on these effects, the responses are typically assumed to be independent (see e.g., Li et al., 2019). Liu and Hedeker (2006) propose a three-level mixed effects item response model for ordinal data which contains subject and subject-time random intercepts. Lin et al. (2021) extend the model in Liu and Hedeker (2006) to also include random slopes that measure the effect of a subject on the response as well as its change over time. Cagnone et al. (2009) propose latent variable models containing item-specific random effects and a common factor, where the relationships between the time-dependent latent variables are modeled using autoregressive processes. A similar approach has been proposed in Vana and Hornik (2021) for a credit risk application similar to the one presented in this paper. An application of a model for three-dimensional panel data is presented in Schliep et al. (2021), who employ a model with random coefficients to identify the cumulative effects of training and recovery in athletes. The estimation of the models presented above is typically performed by maximum likelihood methods (using EM-type algorithms) or, more commonly, by Markov chain Monte Carlo methods in a Bayesian setting, where priors must be specified. In either case, computations can prove to be rather intensive.
The paper is structured as follows: Section 2 introduces the model and Section 3 presents the results of the simulation study. Section 4 introduces the credit risk application by describing the data employed and presenting the results of the estimated model. The software implementation as an R package is described in Section 5. Section 6 concludes the paper.

The model
We extend the approach of multivariate ordinal regression models in Hirk et al. (2020), which model ordinal responses for a collection of subjects observed either longitudinally (assuming an AR(1) correlation structure, see e.g., Hirk et al. (2022)) or cross-sectionally (e.g., assuming a general correlation structure, see e.g., Hirk et al. (2021)). In this paper we combine the two modeling settings by imposing a specific multivariate autoregressive structure of order one on the errors of the latent variables underlying the observed ordinal outcomes. The proposed error structure is able to account for both cross-sectional and longitudinal dependence among the ordinal responses.

General set-up
Let y^j_{i,t} denote an ordinal observation, where i ∈ {1, ..., n} is the subject index, t ∈ {1, 2, ..., T} is the time index among T equidistant time points, and j ∈ {1, ..., q} is the outcome index among the q available outcomes. Here we assume that the panel data do not contain any missing values (all outcomes at all time points are observed for all subjects). Note, however, that the framework can accommodate missing values, as will be discussed in Section 2.4. We assume the ordinal observation y^j_{i,t} to be a coarser version of a continuous latent variable ỹ^j_{i,t}, connected by a vector of suitable threshold parameters. These threshold parameters θ^j_t can in the most general case be assumed to be time- as well as outcome-varying:

y^j_{i,t} = r ⟺ θ^j_{t,r−1} < ỹ^j_{i,t} ≤ θ^j_{t,r},

where r is one of the K_j ordered categories of outcome j. For each outcome j and time point t, we have the monotonicity restriction on the threshold parameters:

−∞ ≡ θ^j_{t,0} < θ^j_{t,1} < ... < θ^j_{t,K_j−1} < θ^j_{t,K_j} ≡ ∞.

Furthermore, we assume the following linear regression model for ỹ^j_{i,t}:

ỹ^j_{i,t} = (x^j_{i,t})^⊤ β^j_t + ϵ^j_{i,t},

i.e., ỹ^j_{i,t} depends linearly on a p-dimensional vector of time- and outcome-specific covariates x^j_{i,t}, where β^j_t is a time- and outcome-specific p-dimensional vector of regression coefficients and ϵ^j_{i,t} is the error term of subject i for outcome j at time t. In the complete case, y_{i,t} is a q-dimensional vector with y_{i,t} = (y^1_{i,t}, y^2_{i,t}, ..., y^q_{i,t})^⊤. We define X_{i,t} as the (p · q) × q block-diagonal matrix of predictors whose j-th diagonal block is the column vector x^j_{i,t}.
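To make the thresholding step concrete, the following Python sketch (with made-up threshold values; this is only an illustration, not part of the estimation procedure) maps latent values to ordinal categories using the rule y = r ⟺ θ_{r−1} < ỹ ≤ θ_r:

```python
import numpy as np

# Hypothetical thresholds for one outcome j with K_j = 4 categories:
# theta = (theta_1, theta_2, theta_3); implicitly theta_0 = -inf, theta_4 = +inf.
theta = np.array([-1.5, 0.0, 1.5])

def latent_to_ordinal(y_tilde, theta):
    """Map latent values y_tilde to categories 1..K via
    y = r  <=>  theta_{r-1} < y_tilde <= theta_r."""
    # searchsorted with side="left" returns the index of the first
    # threshold >= y_tilde, which is exactly r - 1.
    return np.searchsorted(theta, y_tilde, side="left") + 1

y_tilde = np.array([-2.0, -0.3, 0.0, 2.1])
print(latent_to_ordinal(y_tilde, theta))  # → [1 2 2 4]
```

Note that a value exactly equal to a threshold θ_r is assigned to category r, matching the weak inequality on the right.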

Structure of the errors
We consider an autoregressive structure on the q-dimensional error terms ϵ_{i,t}:

ϵ_{i,t} = Ψ ϵ_{i,t−1} + u_{i,t},    (1)

where Ψ = diag(ψ_1, ψ_2, ..., ψ_q) is a diagonal matrix of persistence parameters, one for each outcome j, with |ψ_j| < 1. These persistence parameters capture the longitudinal dependence on past values of the same outcome. The q-dimensional mean-zero error term u_{i,t} is independent and identically distributed among the subjects, outcomes and time points with cumulative distribution function F (u^j_{i,t} iid∼ F), and it is independent of ϵ^j_{i,t−k}, k > 0, for all t and j. The matrix Σ_t captures the cross-sectional correlation among the different outcomes at time t conditional on ϵ_{i,t−1}.
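The recursion in Equation (1) can be sketched as follows (a Python illustration with hypothetical parameter values for Ψ and the innovation correlation matrix; in a full simulation one would start from the stationary distribution or discard a burn-in):

```python
import numpy as np

rng = np.random.default_rng(42)
q, T = 3, 10
Psi = np.diag([0.6, 0.4, 0.5])          # persistence parameters, |psi_j| < 1 (illustrative)
Sigma = np.array([[1.0, 0.3, 0.2],      # conditional correlation of the innovations u_{i,t}
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
L = np.linalg.cholesky(Sigma)

# eps_{i,t} = Psi @ eps_{i,t-1} + u_{i,t}, with u_{i,t} ~ N(0, Sigma) i.i.d.
eps = np.zeros((T, q))
eps[0] = L @ rng.standard_normal(q)     # initialization (no burn-in, for brevity)
for t in range(1, T):
    eps[t] = Psi @ eps[t - 1] + L @ rng.standard_normal(q)
```

Each row of `eps` is one q-dimensional error vector; the diagonal Ψ ties each outcome only to its own past, while Σ generates the contemporaneous cross-outcome correlation.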
For the corresponding vector of latent variables ỹ*_i = (ỹ^⊤_{i,1}, ..., ỹ^⊤_{i,T})^⊤ we have:

ỹ*_i = X*_i β* + ϵ*_i,    (2)

where X*_i is a block-diagonal matrix with blocks X^⊤_{i,1}, ..., X^⊤_{i,T}, β* = ((β^1_1)^⊤, ..., (β^q_1)^⊤, ..., (β^1_T)^⊤, ..., (β^q_T)^⊤)^⊤ is the stacked vector of regression coefficients, and ϵ*_i = (ϵ^⊤_{i,1}, ..., ϵ^⊤_{i,T})^⊤ denotes the subject-level qT-dimensional mean-zero error term.
We are interested in representing the above model at the subject level, where the dependence in the subject-level errors is given by the stationary distribution of the process in Equation (1).

Multivariate probit link
A common approach is to assume that u^j_{i,t} has a standard normal distribution and that the conditional covariance matrix Σ_t is constant over time, Σ_t ≡ Σ. Here we assume that Σ has a general correlation structure with q(q − 1)/2 parameters to be estimated, with the diagonal elements set to one to ensure identifiability in the marginal ordinal models. Then ϵ*_i follows a multivariate normal distribution with mean zero and covariance matrix Σ*, whose blocks are given by Cov(ϵ_{i,t+h}, ϵ_{i,t}) = Ψ^h Σ̃ for h ≥ 0, where Σ̃ is the unconditional variance of the multivariate AR(1) process in Equation (1) given by vec(Σ̃) = (I − Ψ ⊗ Ψ)^{−1} vec(Σ). This choice of F gives rise to the multivariate probit link.
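The stationary covariance Σ̃ can be computed directly from the vec formula; the following NumPy sketch (with illustrative values for Ψ and Σ) also verifies that the result solves the stationarity equation Σ̃ = Ψ Σ̃ Ψ^⊤ + Σ:

```python
import numpy as np

q = 3
Psi = np.diag([0.6, 0.4, 0.5])            # illustrative persistence parameters
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])       # illustrative conditional correlation matrix

# vec(Sigma_tilde) = (I - Psi kron Psi)^{-1} vec(Sigma)
# (column-major "F" order matches the usual vec operator)
vec_Sigma = Sigma.reshape(-1, order="F")
Sigma_tilde = np.linalg.solve(np.eye(q * q) - np.kron(Psi, Psi),
                              vec_Sigma).reshape(q, q, order="F")

# Sigma_tilde is the unconditional variance: Sigma_tilde = Psi Sigma_tilde Psi' + Sigma
assert np.allclose(Sigma_tilde, Psi @ Sigma_tilde @ Psi.T + Sigma)

# Cross-lag blocks of Sigma*: Cov(eps_{i,t+h}, eps_{i,t}) = Psi^h Sigma_tilde
h = 2
block = np.linalg.matrix_power(Psi, h) @ Sigma_tilde
```

Because Ψ is diagonal with |ψ_j| < 1, the matrix I − Ψ ⊗ Ψ is always invertible here, so the linear solve is well defined.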

Multivariate logit link
In this section we show how to choose F in a way that gives rise to the multivariate logit link of Hirk et al. (2020), where the subject-level errors are assumed to have the multivariate logistic distribution of O'Brien and Dunson (2004), L_qT(0, Σ*). We use the fact that the multivariate Student t distribution closely approximates this multivariate logistic distribution when the covariance matrix Σ* is scaled by the constant γ = π²(ν − 2)/(3ν) and ν ≈ 8. Therefore, we choose F such that ϵ*_i follows a qT-variate Student t distribution with ν degrees of freedom, mean zero and covariance matrix γΣ*, i.e., ϵ*_i ∼ MVT_qT(0, γΣ*, ν). In order to achieve this stationary distribution on ϵ*_i, we can apply the results in Virolainen (2021) and choose F as the univariate Student t distribution with scale one and ν + q degrees of freedom. Unlike in the normal distribution case, the conditional covariance Σ_t is heteroscedastic in the multivariate AR(1) process with Student t errors, as it depends on the past values ϵ_{i,t−1}. According to Theorem 1 of Virolainen (2021), this implies a multivariate Student t stationary distribution on the errors, ϵ_{i,t} ∼ MVT_q(0, γΣ̃, ν), which in turn translates into ϵ*_i ∼ MVT_qT(0, γΣ*, ν). Note that in all exercises involving the approximate logit link we fix ν = 8.
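The role of the scaling constant γ can be checked numerically: with scale matrix γΣ*, a t_ν variate has variance γν/(ν − 2) = π²/3, exactly the variance of a standard logistic variable, and the two univariate CDFs are then close. The following check is our own illustration (using SciPy's univariate t and logistic distributions), not code from the package:

```python
import numpy as np
from scipy import stats

nu = 8
gamma = np.pi**2 * (nu - 2) / (3 * nu)   # scaling constant; equals pi^2/4 for nu = 8

# With scale gamma, the t_nu variance gamma * nu / (nu - 2) matches the
# standard logistic variance pi^2 / 3.
assert np.isclose(gamma * nu / (nu - 2), np.pi**2 / 3)

# Univariate check: the scaled t_8 CDF closely approximates the logistic CDF.
x = np.linspace(-6, 6, 201)
diff = np.abs(stats.t.cdf(x / np.sqrt(gamma), df=nu) - stats.logistic.cdf(x))
print(diff.max())  # small maximal discrepancy over the grid
```

This is the univariate analogue of the approximation used for the multivariate logit link; the multivariate version applies the same scaling to the whole covariance matrix Σ*.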

Pairwise likelihood estimation and inference
For a given vector of parameters δ containing the threshold parameters, the regression coefficients and the parameters of the error structure, the likelihood is given by the product over all subjects of the multivariate probabilities of observing the responses:

L(δ) = ∏_{i=1}^n P(⋂_{j ∈ {1,...,q}} ⋂_{t ∈ {1,...,T}} y^j_{i,t} = r^j_{i,t}).    (4)

In the complete case, each multivariate probability corresponds to a (q · T)-dimensional integral

P(⋂_{j ∈ {1,...,q}} ⋂_{t ∈ {1,...,T}} y^j_{i,t} = r^j_{i,t}) = ∫_{D_i} f_qT(ϵ*_i; δ) dϵ*_i,

where D_i = ⨉_{t ∈ {1,...,T}} ⨉_{j ∈ {1,...,q}} (θ^j_{t,r^j_{i,t}−1}, θ^j_{t,r^j_{i,t}}) is a Cartesian product (here r^j_{i,t} denotes the observed ordinal class for subject i, time t and outcome j) and f_qT is the multivariate density of the error terms.
For parameter estimation of model (2) we use a composite likelihood approach, where we approximate the full likelihood above by a pairwise likelihood constructed from bivariate marginal distributions. The pairwise likelihood function is given by the product of the bivariate probabilities corresponding to all pairs of elements in y*_i:

pℓ_i(δ) = ∏_{k<l} P((y*_i)_k = (r_i)_k, (y*_i)_l = (r_i)_l),    (5)

where (y*_i)_k denotes the k-th element of the vector y*_i and (r_i)_k denotes the k-th element of the subject-specific vector r_i = (r^1_{i,1}, ..., r^q_{i,1}, r^1_{i,2}, ..., r^q_{i,2}, ..., r^1_{i,T}, ..., r^q_{i,T})^⊤. Given that the pairwise likelihood for subject i consists of the product of (qT choose 2) bivariate probabilities, this can prove to be computationally burdensome when q · T is large. In order to speed up the pairwise likelihood computation, one option is to only consider pairs which lie close to each other in time, as these are the most informative on the persistence parameters. With a slight abuse of notation, we denote by k_t the time index and by k_j the outcome index corresponding to the k-th element of the vector y*_i (e.g., for k = 2 we have k_t = 1 and k_j = 2). The expression in (5) is then adjusted to

pℓ_i(δ; c) = ∏_{k<l} P((y*_i)_k = (r_i)_k, (y*_i)_l = (r_i)_l)^{1(|k_t − l_t| ≤ c)},

where c is a pre-defined lag and 1(·) is the indicator function. This strategy of considering only pairs of observations at most c time points apart has also been employed by Varin and Czado (2010), who propose tuning this parameter as the value minimizing a global fitting criterion such as the generalized variance (the determinant of the estimated covariance matrix of the estimates).
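The computational savings from the lag truncation are easy to quantify by brute-force enumeration of the pairs; this small sketch (with illustrative values q = 3, T = 10) also confirms the closed-form pair counts:

```python
from itertools import combinations

def n_pairs(q, T, c):
    """Count the bivariate probabilities entering the pairwise likelihood
    when only pairs at most c time points apart are kept
    (c = T - 1 keeps all pairs)."""
    idx = [(t, j) for t in range(T) for j in range(q)]   # ordering (t, j) as in r_i
    return sum(1 for (t1, _), (t2, _) in combinations(idx, 2)
               if abs(t1 - t2) <= c)

q, T = 3, 10
full = n_pairs(q, T, T - 1)
assert full == (q * T) * (q * T - 1) // 2    # binomial(qT, 2) pairs in total
print(full, n_pairs(q, T, 1))                # full count vs. lag-one truncation
```

Relative to the full pairwise likelihood, truncating at lag c removes q²(T − c − 1)(T − c)/2 pairs, which the enumeration reproduces exactly.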
The maximum pairwise likelihood estimates δ̂_PL(c) are obtained by direct maximization of the log pairwise likelihood using general purpose optimization tools. Under regularity conditions, √n(δ̂_PL(c) − δ) has an asymptotic normal distribution with mean 0 and covariance matrix given by the inverse of the Godambe information matrix, G(δ, c)^{−1} = H(δ, c)^{−1} V(δ, c) H(δ, c)^{−1}, for a fixed value of c. Consistent estimates of the Hessian matrix H(δ, c) and of the variability matrix V(δ, c) can be constructed which only necessitate the first derivatives of the log pairwise likelihood with respect to the parameters.
In general, adding more bivariate likelihood components will increase the Hessian matrix and therefore reduce the variance, but adding too many correlated bivariate likelihoods will inflate the variability matrix (Ferrari et al., 2016). Moreover, in finite samples, the estimator Ĥ(δ, c) and, even more so, V̂(δ, c) are unstable when n is small relative to the number of parameters (Varin et al., 2011). Alternatives to these plug-in estimators include re-sampling methods such as the bootstrap, the jackknife, or the less computationally demanding one-step jackknife, which offers a first-order approximation to the jackknife (see Varin et al., 2011; Ferrari et al., 2016). Finally, model comparison can be performed using information criteria such as the composite likelihood Akaike or Bayesian information criterion (for more details see e.g., Varin and Vidoni, 2005).

Missing values
In the presence of missing values in the (q · T)-dimensional vector of responses of subject i, we employ the same strategy as Hirk et al. (2019) and construct the pairwise likelihood only from the bivariate probabilities corresponding to all pairs of observed responses. If fewer than two responses are observed for subject i, the univariate marginal distributions enter the likelihood instead of the bivariate ones. This approach assumes that the missing value mechanism is missing completely at random. Approaches which model the missing data mechanism jointly with the observations in longitudinal models can be found in e.g., Li and Grace (2013); Li et al. (2019).

Constraints on the threshold and regression coefficients
Constraints can be set on the vector of regression coefficients and on the threshold parameters, which in the most general case are assumed to be time- and outcome-varying. Making use again of the matrix notation, we define two (q · T)-dimensional linear predictors from the upper and lower thresholds corresponding to the observed categories:

B^upper_i θ* − X*_i β*,    B^lower_i θ* − X*_i β*,

where θ* = ((θ^1_1)^⊤, ..., (θ^q_1)^⊤, ..., (θ^1_T)^⊤, ..., (θ^q_T)^⊤)^⊤ and the matrices B^lower_i and B^upper_i are (q · T) × (T Σ_{j=1}^q (K_j − 1)) block-diagonal binary matrices built from the vectors b^{j,upper}_{i,t} and b^{j,lower}_{i,t}: the vector b^{j,upper}_{i,t} has length K_j − 1 and contains a one in the r^j_{i,t}-th position if r^j_{i,t} ∈ {1, ..., K_j − 1}, and zeros elsewhere; the vector b^{j,lower}_{i,t} has length K_j − 1 and contains a one in the (r^j_{i,t} − 1)-th position if r^j_{i,t} ∈ {2, ..., K_j}, and zeros elsewhere.
The probabilities in the likelihood function in Equation (4) can then be expressed as rectangle probabilities for the error terms, with bounds given by the two linear predictors. Assuming that κ = (θ̃^⊤, β̃^⊤)^⊤ is the reduced (h × 1) vector of thresholds and coefficients to be estimated, the linear predictors can be rewritten in terms of κ via the mapping

(θ*^⊤, β*^⊤)^⊤ = Cκ,

where C is a contrast matrix of dimension (T Σ_{j=1}^q (K_j − 1) + qTp) × h. For example, the C matrix for a model where all thresholds are constant over time and one set of regression coefficients is employed for all t and j would be of dimension (T Σ_{j=1}^q (K_j − 1) + qTp) × (Σ_{j=1}^q (K_j − 1) + p):

C = blockdiag( 1_T ⊗ I_{Σ_j (K_j − 1)},  1_{qT} ⊗ I_p ),

where I denotes the identity matrix, 0 the zero matrix, 1_m the m-dimensional vector of ones and ⊗ the Kronecker product.
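A sketch of this constraint matrix for the dimensions used later in the paper (q = 3 outcomes with K = (4, 4, 2) categories, T = 10 time points, p = 2 covariates; these numbers are taken from the simulation setting, while the code itself is only an illustration of the Kronecker construction):

```python
import numpy as np

q, T, p = 3, 10, 2
K = (4, 4, 2)
m = sum(k - 1 for k in K)                    # total number of free thresholds per t

ones = lambda n: np.ones((n, 1))
C_theta = np.kron(ones(T), np.eye(m))        # thresholds constant over time: (T*m) x m
C_beta = np.kron(ones(q * T), np.eye(p))     # one coefficient vector for all t, j
C = np.block([[C_theta, np.zeros((T * m, p))],
              [np.zeros((q * T * p, m)), C_beta]])

# C maps the reduced parameter kappa (m + p entries) to the full vector.
kappa = np.arange(float(m + p))
full = C @ kappa
```

Each block of m entries of `full` repeats the reduced threshold vector, and each block of p entries repeats the reduced coefficient vector, which is exactly the equality constraint described above.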

Simulation study
In order to investigate the quality of the pairwise likelihood estimates of the proposed model we perform a simulation study. Within this study we simulate data from the proposed model under various parameter settings.

Different correlation settings
In a first exercise, we investigate the performance of the pairwise likelihood estimates when we employ different parameter values for the correlation structure. We consider a moderate data setting with n = 1000 subjects. To align with the setting of the application presented in Section 4, we generate panel data with q = 3 different outcomes and T = 10 time points. Two of the ordinal outcomes have four categories and one outcome is binary (K_1 = 4, K_2 = 4, K_3 = 2). The threshold parameters vary among the three outcomes but are assumed to be constant over all time points. For the first outcome, we choose the thresholds such that the distribution of the four categories is more concentrated in the middle categories 2 and 3 and less in the peripheral categories 1 and 4. For the second outcome, the chosen thresholds lead to a more balanced distribution. The binary outcome is imbalanced. In all settings, we simulate p = 2 covariates from a standard normal distribution, which vary for each of the T time points, but do not vary with the outcomes j = 1, 2, 3.
Again, this is in line with the empirical application, where the covariates are built from the financial information of corporations and do not vary with the creditworthiness indicators used as response variables. The vector of regression coefficients is constant among all time points and outcomes, β = (2, −1)^⊤. For the error structure we simulate four different combinations of the inter-rater correlation matrix Σ and the time-persistence matrix Ψ, using a low- and a high-correlation version of Σ:

Σ_low =
  1.000  0.100  0.200
  0.100  1.000  0.300
  0.200  0.300  1.000

Σ_high =
  1.000  0.950  0.875
  0.950  1.000  0.800
  0.875  0.800  1.000

In previous simulation studies on the performance of pairwise likelihood estimates in ordinal regression models it has been observed that, when the true correlation among responses is low, the correlation parameters are not recovered as well as when the correlation is high (see e.g., Hirk et al., 2019). To investigate whether this is also the case in the proposed model, we consider four different scenarios for the error structure: Σ low - Ψ low, Σ low - Ψ high, Σ high - Ψ low, Σ high - Ψ high. Finally, we perform the simulation study for both the multivariate probit link and the multivariate logit link introduced in Section 2.2. In total we therefore consider 8 scenarios. In all scenarios we set c = T − 1, i.e., we do not exclude any pairs of observations from the pairwise likelihood. For all scenarios we replicated the simulation of the data sets 100 times. In Figure 1 we present for each outcome the distribution of the categories over the 100 data sets for each of the 8 scenarios.
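The data-generating process of this exercise can be sketched end-to-end as follows (a Python illustration; the threshold values and the Ψ entries are hypothetical stand-ins, while n, q, T, p, β and Σ_low follow the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, T, p = 1000, 3, 10, 2
beta = np.array([2.0, -1.0])                    # regression coefficients from the text
Psi = np.diag([0.3, 0.3, 0.3])                  # hypothetical "low" persistence values
Sigma = np.array([[1.000, 0.100, 0.200],
                  [0.100, 1.000, 0.300],
                  [0.200, 0.300, 1.000]])       # the low inter-rater correlation matrix
thetas = [np.array([-2.0, 0.0, 2.0]),           # outcome 1: K1 = 4 (illustrative thresholds)
          np.array([-1.0, 0.0, 1.0]),           # outcome 2: K2 = 4
          np.array([2.5])]                      # outcome 3: binary and imbalanced

L_chol = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, T, p))              # covariates vary over t but not over j
y = np.empty((n, T, q), dtype=int)
for i in range(n):
    eps = L_chol @ rng.standard_normal(q)       # initial error (burn-in omitted for brevity)
    for t in range(T):
        if t > 0:
            eps = Psi @ eps + L_chol @ rng.standard_normal(q)
        latent = X[i, t] @ beta + eps           # same linear predictor for all outcomes
        for j in range(q):
            # y = r  <=>  theta_{r-1} < latent_j <= theta_r
            y[i, t, j] = np.searchsorted(thetas[j], latent[j]) + 1
```

The resulting array `y` has one ordinal panel per subject; varying Σ, Ψ and the link as in the eight scenarios only changes the error-generation step.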
Estimates of the parameters for the four correlation scenarios and two link functions are illustrated in Figure 2. Furthermore, we calculated the mean parameter estimate over the repetitions, the absolute percentage bias (APB), the mean asymptotic standard error and the standard deviation of the parameters over the 100 repetitions. These results are presented in Tables 4, 3, 2 and 1. For all scenarios we recover the true parameters of the proposed model well when using the probit or the logit link, with absolute percentage biases below 1% for the threshold and coefficient parameters and negligible differences between the different correlation scenarios. Higher APBs are observed for the correlation parameters, especially for the cases where the true correlations are low. In Figure 2 one can observe that, for the probit link, there is a sign switch in the estimates of the error structure in several repetitions for the scenarios Σ high - Ψ low and Σ low - Ψ low. A closer inspection reveals that this happens for the persistence parameter ψ_3 of the binary response and for ρ_13 and ρ_23, which are the cross-correlations of the ordinal outcomes with the binary outcome. This issue seems to arise due to the fact that the distribution of the binary response is highly imbalanced in those scenarios (see Figure 2). For the data application, this implies that a less imbalanced distribution of the binary response is desirable, especially if the estimated sign is not intuitive in the application context. This can be achieved, e.g., by up-sampling. Mean asymptotic standard errors and sample standard errors are similar in magnitude for the parameters of the error structure and for the regression parameters, but for the threshold parameters the asymptotic standard errors are overoptimistic. This need not be a crucial issue, as inference on the threshold parameters is typically not of interest in most applications. It is to be noted that the number of repetitions in this exercise is not particularly large. However, this finding is in line with previous observations in the literature on composite likelihood methods for longitudinal data (Varin and Czado, 2010) and is likely due to the instability in estimating the variability matrix V.
Computations for this exercise have been performed on 25 IBM dx360M3 nodes within a cluster of workstations.

Different values of lag parameter c
As mentioned in Section 2.3, the lag value c determines the maximal lag of the longitudinal observations to be included in the pairwise likelihood. The main rationale behind using c < T − 1 is to reduce the computational load when estimating the parameters. Another aspect is the efficiency of the estimates, which can suffer when too many pairs are added to the pairwise likelihood (see the discussion in Section 2.3).
Based on recommendations in the literature, c can be chosen by minimizing the generalized variance (i.e., the determinant of the covariance matrix of the estimates). In a second simulation exercise we investigate whether this approach performs well for the proposed model. For this purpose we generate 100 data sets from the model with n = 200 subjects, T = 10 time points and q = 3 ordinal responses. As in the previous exercise, we simulate two covariates from a standard normal distribution (p = 2), which vary for each of the T time points, but do not vary with the ordinal outcomes. The vector of regression coefficients and the threshold parameters are chosen as in Section 3.1. For the matrices Σ and Ψ we consider a mix of the low and high correlation settings. We only perform the exercise for the probit link. Figure 3 summarizes the results of this exercise. In the upper left panel we show the average estimated log generalized asymptotic variance of the estimates over the M = 100 simulated data sets,

ĝ(c) = (1/M) Σ_{m=1}^M log det( Ĉov(δ̂^(m)_PL(c)) ),

where δ̂^(m)_PL(c) denotes the maximum pairwise likelihood estimates in replication m for a fixed value of c. We observe that for the investigated setting, ĝ(c) is minimal at lag one. We superimpose, as a dashed line, the log determinant of the empirical covariance of the estimates over the M = 100 repetitions. We observe that the empirical and asymptotic generalized variances start converging to each other around lag 8.
In the lower left panel of Figure 3 we show the mean squared error (MSE) of the parameters over the M = 100 replications together with its decomposition into squared bias and variance. We observe that the MSE decreases only after lag 3 and is minimal at lag 9. Hence, we find that the generalized variance is not a satisfactory measure for selecting the lag parameter c, especially for lower lag values, and that this measure should be employed with care in an empirical application. A bootstrapping exercise would therefore be beneficial in a real data example if lag selection is desired. This can, however, amount to a significant computational burden. Such an exercise would be advisable when many model specifications are to be compared to each other or when performing variable selection based on information criteria. The lag would be chosen on the most complex model and would then be used for all simpler models (the reason being that, for model comparison, the pairwise likelihood of the different models should include the same pairs of observations). We also provide the computation time needed for the different lags in Figure 3. We observe a quadratic relation between the computation time and the lag parameter c. This is explained by the fact that using lag c instead of the full lag T − 1 reduces the number of pairs in the likelihood by q²(T − c − 1)(T − c)/2.
In spite of the reduction in computation time, the MSEs and the empirical covariance suggest that, for the studied case, reducing c does not yield benefits in terms of inference.
For this exercise we used the solver NEWUOA in package optimx (John C. Nash, 2014) for optimizing the pairwise likelihood.The simulations were performed on a MacBook Pro with M1 chip and 16 GB RAM.

Comparison with a Bayesian approach
In a final simulation exercise we aim to compare the proposed model estimated by package mvordflex with an alternative approach. To date, there are no ready-to-use implementations of such a model, but similar models which make use of Bayesian methods for estimation and inference have been proposed. We therefore estimate the proposed model with probit link using a Bayesian approach, by implementing a program in rstan (Stan Development Team, 2024), the R interface to Stan. The estimation is performed using Hamiltonian Monte Carlo methods (for more details see e.g., Betancourt and Girolami, 2015).
We implement the proposed model in rstan and perform a small simulation study to compare the time of computation and quality of the estimates using mvordflex and rstan.
We again generate M = 100 data sets from the model with n = 200 subjects, q = 3 responses and T = 5 time points. We keep the same parameter values as in Section 3.2.
In the estimation of the Bayesian model we use one chain with 2000 draws, out of which 1000 are discarded as warm-up. The average computation time across the 100 repetitions is 90.04 seconds for the model estimated with mvordflex and 1335.24 seconds for the Bayesian model. The bottom panel of Figure 4 illustrates the distribution of the computation time in seconds over the 100 repetitions.
Figure 4 shows the distribution over the 100 repetitions of the coefficients estimated by mvordflex and the distribution of the posterior means of the parameters in the Bayesian model. Table 5 provides some summary statistics. We observe that the true threshold and coefficient parameters are recovered well by both methods, while the Bayesian method performs more poorly in recovering some of the error structure parameters. It should be noted that generic priors have been used in the implementation of the Bayesian model, so further tuning of the priors and of the starting values could make the Bayesian model more competitive. Moreover, the traceplots of the Bayesian model reveal some degree of autocorrelation, so increasing the number of draws and thinning the Markov chains would lead to more efficient estimates. All in all, we show that for the investigated setting, the pairwise likelihood approach for the proposed model performs well both in terms of parameter estimation and in terms of computation time.
For this exercise we used the solver nlminb in package optimx (John C. Nash, 2014) for optimizing the pairwise likelihood.The simulations were performed on a MacBook Pro with M1 chip and 16 GB RAM.

Empirical analysis
We apply the proposed model to corporate credit ratings from Standard and Poor's (S&P) and Moody's as well as to a failure information indicator over the period 2003-2013. The flexible framework allows us to account for the time persistence of the ratings and failures, as well as for the correlation among the rater and failure dimensions.

Data
In this paper we use S&P long-term issuer credit ratings from the Compustat-Capital IQ Credit Ratings database as well as issuer credit ratings from Moody's. S&P provides its ratings on a scale with 21 non-default categories ranging from AAA to C. Moody's uses a different scale, assigning 21 rating classes ranging from Aaa to the default class C. The failure indicator is constructed based on the default data from the UCLA-LoPucki Bankruptcy Research Database and the Mergent issuer default file. The binary failure indicator is constructed in the following way: a default is recorded in a year if a firm filed for bankruptcy under Chapter 7 or Chapter 11 or the firm received a default rating from one of the CRAs in the year following the rating observation. This definition is similar to the definition of Campbell et al. (2008) and to the default definition promoted in Bank of International Settlements (2004).
For the construction of the firm-level variables we make use of the Compustat and CRSP databases together with the corresponding linking files available on Wharton Research Data Services (WRDS). We use the precalculated financial ratios available in the Financial Ratios Suite. We include in the analysis the universe of Compustat/CRSP US corporates which have at least one S&P or Moody's rating observation in the period from 2003 to 2013 (T = 11). Following common practice, we exclude financial, utility and real estate firms from the data set, as their credit risk differs from that of other sectors (mainly due to the higher likelihood of government intervention in case of default). The end-of-year ratings are merged to the financial ratios on a calendar year basis. We perform this by assigning the latest financial statement available before year end to the end-of-year ratings. For the computation of the market variables we use daily stock price data available on CRSP. All explanatory variables are winsorized at the 99th percentile and, for ratios with negative values, additionally at the 1st percentile.
As explanatory variables we make use of the p = 7 variables proposed by Tian et al. (2015), who select these variables based on a statistical variable selection exercise and show improved performance when predicting defaults in a static setting compared to other established models in the credit risk literature.
Table 6: Summary statistics of all variables for the entire data set and the failure group.
Table 6 summarizes the explanatory variables idiosyncratic risk (SIGMA), short-term liabilities-to-assets ratio (lct/at), debt-to-assets ratio (debt/at), net income-to-market assets ratio (ni/mta), total liabilities-to-market assets ratio (lt/mta), stock price capped at $15 (PRICECAP) and excess return over the ARCA/AMEX index (EXRET) for the entire data set and the failure group. We observe noticeably higher means and medians for the idiosyncratic risk SIGMA and for the liability-related ratios lct/at, debt/at and lt/mta in the failure group compared to the entire sample, while firms in the failure group have on average lower ni/mta, PRICECAP and EXRET values.
In total, the obtained data set comprises 1519 firms with 11277 firm-year observations. Figure 5 shows the composition of the sample over the years of the sample period.
Fig. 6: Distribution of the number of time points observed per firm in the sample.
We further motivate the autoregressive lag of order one by the short length of the time series. Figure 6 shows the distribution of the number of time points observed per firm in the sample. Given that companies can enter (exit) the sample after (before) the start (end) of the sample period, we observe 587 firms with all time points observed. Moreover, for the firms with more than two observed time points we perform the following exercise. For each of the three responses, we estimate a separate ordinal model with iid errors using the ordinal R package (Christensen, 2023). For each of the three models we calculate the surrogate residuals proposed in Liu and Zhang (2018) using the sure R package (Greenwell et al., 2017). We then fit an ARIMA model to the residuals for each firm and each response and select the optimal autoregressive lag based on AIC. This analysis shows that for the large majority of the firms either zero or one autoregressive lag is selected, with lag two being selected for only a few firms.
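The lag-selection step of this exercise can be sketched in base R. The sketch below is illustrative only: it replaces the surrogate residuals (which require the sure package) with a simulated residual series `res`, and selects the AR order by AIC as described above.

```r
## Illustrative lag selection for one firm/response, assuming `res` holds the
## (surrogate) residual series; here we simulate an AR(1) series instead.
set.seed(1)
res <- as.numeric(arima.sim(model = list(ar = 0.7), n = 50))

## Fit pure AR models of order 0, 1 and 2 and keep the order minimizing AIC.
aics <- sapply(0:2, function(p) AIC(arima(res, order = c(p, 0, 0))))
best_lag <- which.min(aics) - 1L
```

In the actual analysis this step is repeated for each firm and each of the three responses, and the selected lags are then tabulated.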
Finally, given the low failure rate in the sample, we decided to up-sample the defaulted companies to have a ratio of 50% failed vs. 50% non-failed companies in the sample. Note that this does not lead to a 50% default rate in the sample, as the whole history of the defaulted firms is repeatedly included in the sample. After up-sampling, we achieve an overall failure rate of 11.98%.
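The up-sampling step can be sketched as follows. The data frame and column names (`firm_id`, `failed`) are hypothetical; the sketch replicates complete firm histories until the number of failed firms equals the number of non-failed firms.

```r
## Toy data: 10 firms with 3 yearly observations each; firm 1 failed.
## `failed` is a firm-level flag that is constant within a firm.
set.seed(2)
df <- data.frame(firm_id = rep(1:10, each = 3),
                 failed  = rep(c(TRUE, rep(FALSE, 9)), each = 3))

failed_ids    <- unique(df$firm_id[df$failed])
nonfailed_ids <- unique(df$firm_id[!df$failed])
n_extra <- length(nonfailed_ids) - length(failed_ids)

## Draw failed firms with replacement and append their complete histories.
extra_ids <- failed_ids[sample.int(length(failed_ids), n_extra, replace = TRUE)]
extra <- do.call(rbind, lapply(extra_ids,
                               function(id) df[df$firm_id == id, , drop = FALSE]))
df_up <- rbind(df, extra)
```

In the real data the yearly default indicator occurs at most once per failed firm, so balancing the firms in this way still leaves the yearly failure rate well below 50%.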
We estimate this model specification with both the logit and probit link and find the model with logit link to have a better fit based on the composite likelihood Akaike information criterion (979045.65 for the logit link vs. 990175.88 for the probit link; note that a lower value implies a better fit) and the composite likelihood Bayesian information criterion (993631.26 for the logit link vs. 1004452.16 for the probit link). The estimated parameters of the model with logit link are shown in the first column of Table 8.
The model provides information on the difference in the covariate effects among the three outcomes: S&P ratings, Moody's ratings and the failure indicator. Given that both the threshold and the regression parameters are allowed to vary with the outcome, direct comparisons among the magnitudes of the coefficients and thresholds of the different outcomes have to be performed with care. This is due to the fact that in ordinal models absolute location and absolute scale are not identifiable. Finally, the proposed error structure gives insights into the time persistence of the separate outcomes as well as the contemporaneous correlations among the three outcomes.
When analyzing the regression coefficients displayed in Table 8, we find that the signs of most of the significant coefficients are as expected. The variables lct/at, debt/at and ni/mta are not significant for the failure dimension at the 5% significance level. The excess return variable EXRET has a positive sign for the failure dimension, meaning that higher excess returns lead to higher values in the latent default process. On the other hand, this variable has a negative sign for the rating dimensions, which can imply that in the rating process firms with high excess returns are considered riskier.
Qualitatively, the coefficients of the two rating dimensions are rather similar, so we can assume that the two outcomes have a similar scale. This allows us to have a look at the threshold parameters for these two outcomes and cautiously make some interpretations. As the results in the first column of Table 8 show, we observe lower estimated thresholds for S&P compared to Moody's in the speculative grades. This suggests that Moody's is more conservative in the speculative grade region. Note that the differences in the investment grade categories are negligible.
The estimated error structure provides information on the inter-rater dependence as well as on the time persistence of the ratings. The estimated parameters of the error structure can be found in the bottom part of the first column of Table 8. As expected, we find a high correlation of 0.92 between the S&P and Moody's ratings and lower correlations between the raters and the failure indicator: 0.30 between S&P and the failure dimension and 0.33 between Moody's and the failure dimension. The second component of the error structure provides information on the time persistence of each dimension separately. The time persistence is rather high for all outcomes: for the time persistence parameter ξ we observe an estimated value of 0.93 for S&P, 0.94 for Moody's and 0.77 for the failure component.

Model comparison
When comparing the panel data model with simpler models based on the composite likelihood AIC, we observe that the proposed model (AIC: 979045.65, BIC: 993631.26) is indeed preferred over simpler specifications. We consider a model which takes no correlation into account (AIC: 1038281.48, BIC: 1053091.83), a model which takes only the cross-sectional dependence into account (AIC: 1031555.63, BIC: 1046438.48) and a model which takes only the longitudinal dependence into account (AIC: 1007943.90, BIC: 1022511.19). It can be observed that the longitudinal specification improves the fit more than the cross-sectional specification. Table 8 presents the parameter estimates for the different models. Note again that the absolute values of the coefficients are not directly comparable, given the non-identifiability of the scale.
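For reference, the criteria used here are the composite likelihood analogues of AIC and BIC. A sketch in standard composite likelihood notation, with cℓ the pairwise log-likelihood, Ĥ the sensitivity (Hessian) matrix and Ĵ the variability matrix of the composite score:

```latex
\mathrm{CLAIC} = -2\, c\ell(\hat{\theta}) + 2\, \operatorname{tr}\!\left(\hat{J}\hat{H}^{-1}\right),
\qquad
\mathrm{CLBIC} = -2\, c\ell(\hat{\theta}) + \log(n)\, \operatorname{tr}\!\left(\hat{J}\hat{H}^{-1}\right)
```

The trace term replaces the parameter count of the classical criteria, penalizing by the effective number of parameters under the pairwise likelihood; lower values indicate a better fit.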

Software implementation
In this section we illustrate how the model proposed above can be estimated using the R implementation in package mvordflex. This package is designed as an extension to the mvord package available on CRAN. More specifically, we implement a new multiple measurement object which takes into account the fact that the data contain three dimensions: the subject i, the time point t and the outcome j. Furthermore, new objects for the error structure among the ordinal responses are implemented to reflect the covariance in Equation (3). After introducing these new objects we exemplify the use of the estimation function mvordflex() on a simulated data set for estimating the three-dimensional model. For model comparison, we also showcase how three further models can be estimated: one where all correlations are set to zero, one with cross-sectional correlations but zero longitudinal correlations, and one with longitudinal correlations but zero cross-sectional correlations.

Multiple measurement object
In addition to the existing multiple measurement objects MMO and MMO2 in mvord, we implement the multiple measurement object MMO3, to which the three-dimensional panel data can be passed. The MMO3 object requires the user to specify the name of the column containing the ordinal observations, the subject index (i), the time index (t) and the multiple measurement index (j). Note that the ordinal observations for all outcomes should be contained in a single column of the data frame to be passed to the model. Moreover, the covariates are allowed to vary for each i, t, j. The MMO3 object constitutes the left-hand side of the formula object.
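A hypothetical formula sketch; the column names rating, firm_id, year and rater are placeholders for the user's data, not fixed names in the package.

```r
## All ordinal observations are stacked in the single column `rating`;
## `firm_id`, `year` and `rater` identify the subject i, the time point t and
## the outcome j, respectively. Building the formula does not evaluate MMO3().
f <- MMO3(rating, firm_id, year, rater) ~ 0 + X1 + X2

## A hypothetical call, assuming mvordflex() mirrors the mvord() interface:
## res <- mvordflex(formula = f, data = df_long, error.structure = cor_MMO3(~1))
```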

Error structure objects
The proposed correlation structure is implemented as a new error structure named cor_MMO3() of class 'error_struct'.
> cor_MMO3(formula = ~1, value = numeric(0), fixed = FALSE, Psi.diag = TRUE)
As discussed in Section 2, this correlation structure consists of q(q − 1)/2 correlation parameters for the cross-sectional structure and q persistence parameters. The argument value can be used to specify starting values for the parameters ρ_1, ρ_2, ..., ρ_{q(q−1)/2}, ψ_1, ψ_2, ..., ψ_q. If argument fixed is set to TRUE, the parameters are fixed at value and not estimated. In the current implementation Psi.diag must be set to TRUE, as at the time of writing we do not allow for a general structure on the matrix Ψ. Setting Psi.diag to FALSE would estimate a general Ψ matrix which satisfies the stationarity constraints of the multivariate autoregressive process. To ensure that the stationarity constraints are satisfied, a decomposition of Ψ into a positive definite matrix and an orthogonal matrix, as described in Roy et al. (2019), can be employed. For the positive definite matrix the unconstrained log-matrix parameterization can be used, while for the orthogonal matrix the Givens parameterization or the Cayley representation can be used.
We also provide two additional correlation structures, namely cor_MMO3_cross() and cor_MMO3_ar1(), which allow the users to estimate models with only cross-sectional and only longitudinal correlations, respectively.
> cor_MMO3_ar1(formula = ~1, value = numeric(0), fixed = FALSE, Psi.diag = TRUE)
> cor_MMO3_cross(formula = ~1, value = numeric(0), fixed = FALSE)
The user may note that these two models can also be estimated using mvord. However, if a comparison among different models based on AIC or BIC is desired, the same pairs should be used in the composition of the pairwise likelihood in all estimated models to ensure comparability.

Further objects
Finally, we provide a new link function mvlogitapprox() which uses the multivariate Student-t distribution as an approximation to the multivariate logistic distribution implemented in mvord::mvlogit() (for more details see Section 2.2.2).
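The idea behind such an approximation can be illustrated in the univariate case: a Student t distribution with ν = 8 degrees of freedom, rescaled so that the variances match, is close to the logistic distribution. The choice ν = 8 and scale π/2 below is a common one from the literature and is used here only for illustration; the values used by the package are described in Section 2.2.2.

```r
## Compare the logistic CDF with a variance-matched t CDF (nu = 8):
## Var(logistic) = pi^2 / 3 and Var(t_8) = 8 / 6, so the scale is pi / 2.
x <- seq(-10, 10, by = 0.01)
max_diff <- max(abs(plogis(x) - pt(x / (pi / 2), df = 8)))
## max_diff is small (well below 0.01), i.e., the two CDFs nearly coincide.
```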
Other functionalities of the mvord package can also be used for the three-dimensional model. Constraints on the threshold parameters and on the regression coefficients can be set as described in Sections 3.5 and 3.6 of Hirk et al. (2020), by taking into account that the dimensionality of the problem is q · T (see the structure of the correlation matrix in Equation (3)).

Simulated data
We use a simulated data set for software illustration purposes, given that the data used in Section 4 cannot be provided due to licensing constraints. The reader should be aware that the data below are not simulated from the proposed model, so there is no relation between the covariates and the responses and no dependence among the ordinal responses.

Model comparison
We can compare all models using the AIC() function:
> AIC(res_ident_logit, res_cross_logit, res_ar1_logit, res_logit)
As expected from the way the data were simulated above, the model with an identity correlation matrix performs best in terms of CLAIC and CLBIC.

Conclusion
We propose a multivariate ordinal regression model which accounts for dependence between repeated and multiple ordinal measurements. This is achieved by imposing a multivariate autoregressive structure on the errors underlying the ordinal responses, where the contemporaneous errors have a general correlation structure and the coefficients of the AR(1) process capture the persistence of the ordinal outcomes over time. The estimation is performed using composite likelihood methods, and a simulation study confirms that the model parameters can be recovered well and that the pairwise likelihood approach is competitive when compared to a Bayesian approach, both in terms of computation time and accuracy of the estimates. Furthermore, we present the implementation of the model in the R package mvordflex, which is an extension to the R package mvord, and exemplify how users can employ the functionality provided by this extension. Finally, we illustrate the framework on a data set containing default and credit rating information from S&P and Moody's for US listed companies over the period 2003-2013. We find that the proposed model improves the fit when compared to simpler specifications which take only the cross-sectional correlations or only the time dependence into account.
One limitation of the model, which relates mainly to the pairwise likelihood estimation, is the possibility of overoptimistic standard errors for the threshold parameters. This can be seen in the provided simulation studies and has also been documented before in the literature (see e.g., Varin and Czado, 2010). Employing re-sampling techniques such as the jackknife or the bootstrap can alleviate this problem and lead to less biased variance estimates; however, this comes at a significant computational cost.
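The subject-level delete-one jackknife mentioned above can be sketched in standard notation, where θ̂₍₋ᵢ₎ denotes the estimate obtained after removing subject i:

```latex
\widehat{\operatorname{Var}}_{\mathrm{jack}}(\hat{\theta})
  = \frac{n-1}{n} \sum_{i=1}^{n}
    \left(\hat{\theta}_{(-i)} - \bar{\theta}\right)
    \left(\hat{\theta}_{(-i)} - \bar{\theta}\right)^{\top},
\qquad
\bar{\theta} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(-i)}
```

Each of the n refits optimizes the full pairwise likelihood, which explains the computational cost of this approach.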
Finally, in our modeling approach we assumed that the observations are missing completely at random. In longitudinal studies this is often not the case, so approaches which model the missing data mechanism jointly with the observations should be investigated in future research for the panel data case (see e.g., Li and Grace, 2013; Li et al., 2019, for approaches for longitudinal models). We also observe that in the presence of highly imbalanced binary responses, the sign of some of the error structure parameters is computationally unidentifiable and the estimation procedure is inaccurate. Up-sampling can alleviate this problem. Alternatively, the model can be further extended to accommodate different multivariate (asymmetric) link functions, which could prove beneficial for the modeling of imbalanced responses. A further extension would be modeling Ψ as a full matrix, with certain constraints to ensure stationarity and invertibility of the multivariate AR(1) process. Such a specification can prove relevant in, e.g., economic applications.

Computational details
The package mvordflex is provided at https://gitlab.com/lauravana/mvordflex. The code for reproducing the three simulation exercises as well as the tables and figures from Section 3 can also be found in this repository.

Fig. 1: This figure displays the distribution of the categories for each outcome over the 100 simulated data sets for the 8 scenarios considered in the simulation exercise with n = 1000 subjects, q = 3 responses (two ordinal responses with four classes and a binary response) observed over T = 10 time points and p = 2 covariates.

Fig. 3: The upper left panel displays the average generalized asymptotic variance ĝ (solid line) as well as the empirical covariance of the estimates in the proposed model (dashed line) over 100 simulated data sets. The lower left panel shows the MSE for different values of the lag parameter c. The right panel shows the distribution of the computation time in seconds over the 100 replications for different values of c. The simulation setting employed contained n = 200 subjects, T = 10 time points and q = 3 responses (two ordinal responses with four classes and a binary response).

Fig. 4: This figure displays the distribution over 100 simulated data sets of the coefficients for the model estimated with pairwise likelihood methods (PL) and of the posterior means in the Bayesian model estimated with rstan. The lower panel shows the distribution of the computation time in seconds. The simulation setting employed contained n = 200 subjects, T = 5 time points and q = 3 responses (two ordinal responses with four classes and a binary response).

Table 1: This table presents simulation results based on 100 repetitions, n = 1000, T = 10, q = 3 for the correlation setting Σ high and Ψ high.

Table 2: This table presents simulation results based on 100 repetitions, n = 1000, T = 10, q = 3 for the correlation setting Σ high and Ψ low.

Table 3: This table presents simulation results based on 100 repetitions, n = 1000, T = 10, q = 3 for the correlation setting Σ low and Ψ high.

Table 4: This table presents simulation results based on 100 repetitions, n = 1000, T = 10, q = 3 for the correlation setting Σ low and Ψ low.

Table 5: This table presents simulation results based on 100 repetitions, n = 200, T = 5, q = 3 for the proposed model, where the estimation using Bayesian methods is compared to the pairwise likelihood estimation.