Robust inference for ordinal response models

The present paper deals with the robustness of estimators and tests for ordinal response models. In this context, gross-errors in the response variable, specific deviations due to some respondents’ behavior, and outlying covariates can strongly affect the reliability of the maximum likelihood estimators and that of the related test procedures. The paper highlights that the choice of the link function can affect the robustness of inferential methods, and presents a comparison among the most frequently used links. Subsequently robust M -estimators are proposed as an alternative to maximum likelihood estimators. Their asymptotic properties are derived analytically, while their performance in finite samples is investigated through extensive numerical experiments either at the model or when data contaminations occur. Wald and t-tests for comparing nested models, derived from M -estimators, are also proposed. M based inference is shown to outperform maximum likelihood inference, producing more reliable results when robustness is a concern. ∗The authors would like to thank the editor and the referees for their extremely careful reading of the paper and their helpful comments and suggestions which enhanced both the style and the content of the paper. The research of the first and third authors has been partially supported by the SHAPE project in the framework of the STAR Programme (CUP E68C13000020003) at University of Naples Federico II, financially supported by UniNA and Compagnia di San Paolo.


Introduction
In recent years, the interest in ordinal data has been constantly growing since these data convey relevant information on several scientific and applied research areas, such as medicine, psychology, sociology, political sciences, economics, marketing, and so on. The motivation for this interest stems from the need to collect data on several aspects of current life of people and communities. Generally categorical data arise when items concerning opinions, preferences, judgements, evaluations, worries, etc., are expressed as ordered categories.
A large body of statistical literature has been devoted to models for ordinal data, as summarized in standard books such as Agresti (2010) and Tutz (2012) among others. The present paper focusses on models where the response variable is ordinal and depends on explanatory variables given by the subjects' covariates.
The most popular approach to ordered response models initially advocated by McCullagh (1980) is based on the assumption that a latent variable drives the response and the model is embedded within the Generalized Linear Model framework as formalized by Nelder and Wedderburn (1972) and McCullagh and Nelder (1989). A different perspective, more related to the psychological process of selection, leads to the cub models (Piccolo, 2003;, so called because they parameterize the probability of a given response as a mixture of a shifted Binomial and a discrete Uniform random variable. This approach does not require the specification of a model for the latent variable and describes directly the effect of the covariates on the feeling and the uncertainty underlying the respondents' choices. In spite of the huge body of literature on robustness both for continuous and discrete data in the past decades (see e.g. the books by Huber (1981, 2nd edition by Huber and Ronchetti 2009), Hampel et al. (1986, Maronna et al. (2006)), the area of ordinal data has been somewhat neglected. Nevertheless, it has been recognized that respondents may deliberately or unconsciously choose a wrong category. This phenomenon, in addition to the occurrence of gross-errors (whose probability is never negligible) or to erratic behavior by a few respondents, produces a contamination of the assumed model distribution, which can alter the properties of estimators and tests.
Few papers in the robustness literature are related to this problem. An early reference is Hampel (1968), where in addition to the now classical infinitesimal approach to robustness, robust issues in the pure binomial case are discussed. This was followed up more recently by Ruckstuhl and Welsh (2001). In addition, Victoria-Feser and Ronchetti (1997) develop robust estimators for grouped data, which typically appear e.g. in income studies, by investigating the robustness properties of estimators with respect to the underlying continuous model. More recently,  deal with robustness for the class of cub models, and Croux et al. (2013) propose a robust estimator for ordinal response models with a logistic link function. In the literature about latent variable models, Moustaki and Victoria-Feser (2006) develop robust alternatives to LISREL and robust estimators for Generalized Linear Latent Variable Models. Finally, from a Bayesian perspective, some interest for detecting outliers with categorical and ordinal data has been advanced by Albert and Chib (1993;, who compute Bayesian residuals for both binary and polychotomous response data. The approach has been extended also to sequential ordinal modelling (Albert and Chib, 2001) with the comparison of nonnested models. For a Bayesian modelling of categorical response data we refer to Chib (2005) who includes an overview of MCMC methods in this context.
In this paper we consider a rich class of ordinal response models based on a latent variable with covariates and different link functions. First, we study the impact of deviations from the underlying distribution on the reliability of the Maximum Likelihood (ML) estimator, and illustrate the results for the most common links: the probit, the logistic and the complementary log-log link. In particular, we highlight the role of the generalized residuals as diagnostic tools. Moreover, we compare the behavior of the most common links from a robustness point of view under different types of deviations. Then, as a robust alternative to the ML estimators (MLE), M -estimators are proposed, which yield reliable estimates of the value of the parameters, and can be used to derive robust testing procedures. They can be viewed as weighted MLEs, where the weights provide valuable diagnostic information on possible outliers and substructures in the data. Finally, in an extensive Monte Carlo study we investigate the finite sample behavior of our new robust estimators at the model and under various types of deviations that can appear in practice. In particular, we give recommendations on the choice of the tuning constant for robust estimators.
The paper is organized as follows: the next section provides a brief overview of the approach currently used in the analysis of ordinal response variables, and reviews classical ML inference in this context. Section 3 gives some insights on the robustness of current methods for ordinal response models with special emphasis on the behaviour of the generalized residuals for some popular links. Section 4 proposes a class of robust estimators for ordered response models, whose properties are numerically investigated in Section 5. Robust testing procedures are considered in Section 6. Some concluding remarks end the paper. In the Appendix we provide additional numerical evidence and some computational details. In particular, we investigate the choice of the tuning constant with respect to the efficiency and the robustness of the corresponding estimator.

Maximum likelihood inference for ordered response models
Let Y be an ordinal variable of interest which is linked to an underlying latent variable Y * through the relationship where −∞ = α 0 < α 1 < . . . < α m = +∞ are the thresholds (cutpoints) of the continuous support of the latent variable, and m represents the given number of categories of Y . The variable Y * , in turn, depends on p ≥ 1 covariates, so that for the i-th statistical unit we have the latent regression model where X i = (X i1 , X i2 , . . . , X ip ) , β = (β 1 , β 2 , . . . , β p ) and i is a random variable whose distribution and density function are denoted by G( ) and g( ), respectively. Since Y * is unobservable, a random sample is given by (Y i , X i ), for i = 1, 2, . . . , n.
Given an observed random sample (y i , x i ), for i = 1, 2, . . . , n, let y = (y 1 , y 2 , . . . , y n ) and X be the matrix whose rows are given by x 1 , x 2 , . . . , x n . The log-likelihood function is where I[ω] is an indicator function which takes the value 1 when ω holds and 0 otherwise. Notice that (θ; y i , x i ) includes only one term, that is the j-th term of the sum if the i-th respondent chooses the j-th category.
The score function is Finally, the generic term of the information matrix I(θ, X) for a single observation, conditionally on X = x, is given by for (r, s) = 1, 2, . . . , m+p−1, and the elements of the unconditional information matrix I(θ) are given by I rs (θ) = E X I rs (θ, X) .

For the thresholds we have
Thus, the elements of s(y i , x i ; θ) related to the thresholds are given by for j = 1, . . . , m − 1. Notice that the score functions for the thresholds can be unbounded, when x i β → ±∞, if the density at the numerator converges to zero slower than the denominator.
x ik , the elements of s(y i , x i ; θ) related to the regression coefficients are given by for k = 1, 2, . . . , p.

3412
M. Iannario et al. Then, (5) becomes Notice again that, although the generalized residuals e ij (θ) are defined for any j ∈ {1, 2, . . . , m}, the individual score function for β k , given by (7), contains only the j-th term such that y i = j. The score function for the whole sample is given by which has the same structure as the first-order condition of the Gaussian (ML) equation in the linear regression model, i.e. n i=1 r i x i = 0, where the r i 's are the residuals. Equation (8) makes clear why the quantities e ij (θ) are called generalized residuals: in ordinal response models they play a similar role to the r i 's in the linear model. Indeed their inspection is crucial to understanding the impact of anomalous observations on estimators.
Outlying values in the covariates induce both the numerator and the denominator of (6) to approach 0 and the final value of e ij (θ), which can either diverge or vanish, depends on the speed of convergence of the two terms. However, if a large x i is associated with a large residual, the corresponding statistical unit will have a dominating role in (8) when determining the estimate of the parameter. If instead the generalized residual tends to zero when x i becomes large, then the impact of outlying covariates in the estimation process is limited. Consequently, it is recommended to choose the link function such that the generalized residuals, viewed as a function of x i β, are bounded. This point will be dealt with in greater detail in Section 3.
Notice however that the generalized residuals are affected also by anomalous data in the response Y . This variable has a limited support, {1, 2, . . . , m}, hence unbounded outlying observations cannot occur. Nevertheless, a category inconsistent with the covariates may appear as a consequence of an incoherent selection by the respondent or of a collecting, reading or recording error. Such an event can produce an outlying residual which may have a non-negligible impact on (8), as illustrated in the following example.
Example 1. Consider a categorical variable obtained through (1) where Y * = 1.5X + , X is a standard normal variable, ∼ N (0, 1) and α = (−1.6, 0, 1.6) . An observed sample of n = 30 data has been collected, whose values are reported in Table 1. Suppose that, by mistake, the first value of the response variable Y , which is 1, is erroneously reported as 4. Figure 1 shows the generalized residuals of the original and modified data. In the original data the residuals are displayed around 0 and no outlying values appear. The first generalized residual, which originally has the value 0.521, takes, in the modified dataset, the value −3.919 (this change is displayed by an arrow in the right panel). This residual turns out to be much lower than the other residuals and clearly outlying. Furthermore it is likely to magnify the role of x 1 in the estimation process.

Influence function, link functions and generalized residuals
In order to formalize the robustness properties of the MLE, we study its influence function (Hampel, 1974). In particular we focus on the boundness of this function, which is necessary for local robustness. The influence function is given by Since, for each parameter, the influence function is a linear combination of all the score functions, it depends on both the covariates x i and the generalized residuals introduced in (6). While there is no control on the magnitude of the covariates, the generalized residuals can be bounded or unbounded according to the choice of the distribution function G( ).
To investigate the behaviour of the generalized residuals, let t = x i β and define the function A j (t) which, for given thresholds (α 1 , . . . , α m−1 ) and any real t, is given by Hereafter the behaviour of A j (t) is examined when some of the most popular links used for ordinal response models, that is those derived from the Gaussian, the extreme value and the logistic distribution, are adopted.

• Probit link
If has a normal distribution, (9) becomes where φ(·) and Φ(·) are the standard normal density and distribution function, respectively. Figure 2 shows A j (t) for j = 1, 2, . . . , m when the thresholds are −2.5, −1, 0, 1 and 2.5. The functions A j (t) are unbounded as t → ±∞ (except for j = 1 when t → −∞ and j = m when t → +∞), hence the impact of large x i β on the estimates can be amplified by unbounded generalized residuals. Furthermore, the score functions for the thresholds, defined by (4), are also unbounded. Consequently, when the probit link is adopted, a large x i affects all the elements of s(y i , x i ; θ) and it is likely to have a non-negligible impact in the estimation process.

• Complementary log-log link
Assume G( ) = 1 − exp{− exp( )}, then (9) becomes Figure 3 shows that the A j (t) functions for j > 1 are unbounded for negative values of t and converge to 1 as t → +∞ (notice the asymmetric  behaviour in the tails produced by this distribution). Moreover, for this link, analogously to the probit link, the score functions for the thresholds are also unbounded, so that the influence functions are unbounded as well.

• Logistic link
If has a logistic distribution, (9) yields The functions A j (t) are illustrated in Figure 4. When a logistic link is adopted, the functions A j (t) are bounded everywhere, for any real t, and in particular lim t→±∞ A j (t) = ±1. Hence the generalized residuals are always bounded between −1 and 1, and they neither diverge nor amplify the impact on the estimators of outliers in the covariates. Remarkably, the functions which appear on the right side of (4) in score functions for the thresholds are also bounded as illustrated in Figure 5. Hence the only source of unboundness in the score functions is given by the x ik 's which are multiplied by the generalized residuals in (8).

Robust estimation
A robust estimator for θ, with a bounded influence function, can be obtained by weighting the score functions so as to limit the influence of anomalous data. For this purpose an M -estimator (Hampel et al. 1986 p. 100, 230) where and The weights w(y i , x i ; θ) in (11) are designed to downweight outlying observations in order to control their impact in the estimation. Of course, if , and the solution of (10) is the MLE.
The term a(θ) in (11) is required to achieve Fisher consistency forθ M , and can be obtained as follows ; the influence function of the M -estimator is given by In order to have a bounded influence function, the function ψ(·) needs to be bounded. The following two cases should be distinguished.
• The covariates are outlier free. This is either the case of qualitative explanatory variables, whose categories are represented by means of dichotomous regressors, or the case of finite explanatory variables without extreme values. Here the generalized residuals are the only source of unboundness of the score functions, and hence of the influence function. Consequently estimators with a bounded influence function can be obtained either by adopting the logistic link, whose generalized residuals have the limited range (−1, 1), or by applying a downweighting of the residuals through an appropriate weight function when other links are adopted. • The covariates may contain outliers. This case occurs when there are unlimited variables among the regressors. Here unboundness of the ψ(·) functions can derive either from outlying covariates or from unbounded generalized residuals, and the sequence of weights w(·) in (11) should limit the effect of both large x i and large e ij (θ).
When the regressors cannot contain outliers and the link is different from the logistic one, the only concern from the robustness viewpoint is given by large generalized residuals. Hence, the following Huber weights (Hampel et al., 1986, p. 104), which are functions of the magnitude of the residuals, can be used where c is a positive constant and the following relationship is exploited to evaluate the magnitude of the residuals The weights in (13) are decreasing functions of the magnitude of the residuals, when the latter exceed c in absolute value.
In the second case, when the regressors may contain outliers so that large |e ij (θ)| and large x i are a concern, the following weights, which are non-increasing functions of the magnitude of both the residuals and the covariates, can be applied A Mahalanobis distance for the norm of the covariates in (14), which needs however to be based on a robust multivariate estimator of locationμ X and scatterΣ X . The threshold c xi defining the weights w(y i , (14) is no longer a constant, but depends on x i . More trimming is applied to a residual m j=1 I[y i = j] | e ij (θ) | associated with an outlying covariate, i.e. when x i is large. This is analogous to the linear regression case; see Hampel et al. (1986), Fig.1, p.

322.
Notice also that if one adopts a logistic link, the generalized residuals are always bounded (see Section 3) and in principle only weights on the x i may be needed. This is the robust estimator proposed by Croux et al. (2013). However the numerical experiments of Section 5 provide evidence that considering the interaction between regressors and generalized residuals can enhance the efficiency of the M -estimators, even with the logistic link.
Finally the value of the tuning constant c is to be chosen so that the loss of efficiency incurred by M -estimators with respect to the MLEs, does not exceed a given threshold (say 5% or 10%), when there is no contamination in the data; see Section 5.1.
If we denote by V ML and V ψ the variance-covariance matrices of the ML and M -estimators, respectively, various relative efficiency measures (listed below) can be considered to evaluate the loss of efficiency at the model corresponding to a given c.
• Trace criterion. This is a global efficiency measure which compares the traces of the asymptotic variance-covariance matrices of the MLEs and the M -estimators, • Minimum variance ratio criterion considers the minimum among the ratios between the asymptotic variances of the competing estimators, It evaluates the maximum loss of efficiency in the estimation of a single parameter, which can be encountered when the data are generated by the assumed model and an M -estimator is used. • Determinant criterion. This is also a global measure of efficiency which compares the determinants of the asymptotic variance-covariance matrices, Robust inference for ordinal response models 3421 All these criteria are functions of c. If the asymptotic variance-covariance matrices were known, given an efficiency target (say 0.95) for the M -estimator, the value of c could be chosen such that eff (θ M ,θ ML ; c) = 1 − ν, where ν is the acceptable loss of efficiency. Unfortunately the asymptotic variance-covariance matrices are typically unknown, since they depend on various factors including the distribution of the covariates which is usually unavailable. Simulated versions of these criteria are therefore used in Section 5, where numerical investigations are carried out to determine suitable values of the tuning constant.

Numerical experiments
This Section investigates the finite sample behavior of our new robust estimators and discusses the choice of the tuning constant. Subsection 5.1 is devoted to identify which values of c guarantee that the loss of efficiency, when no contamination of the data occurs, is below the 5% threshold, whereas Subsection 5.2 explores the performance of M -estimators when the data are contaminated by gross errors or outliers. Both the probit and the logistic link are considered, since they are the most frequently used in practice. Due to space constraints, only the most relevant results are reported here, while an extended version of the numerical experiments is provided in the Appendix.
The analysis is carried out through numerical experiments on six models. The first two models have only qualitative covariates, whose categories are represented through dichotomous variables, so that the explanatory variables are outlier free, although anomalous data can still be encountered in the response variable. Hence for these two models robust estimation is performed only for the probit link. When the logistic link is adopted and x i is made up of dichotomous variables, robust estimation is unnecessary since the generalized residuals belong to (−1, 1) (see Sections 3 and 4). The following three models have 1, 2 and 3 continuous covariates respectively, which implies that outlying x i are a concern. Finally, in the last model there are both dichotomous and continuous regressors. For models from 3 to 6 robust estimation is necessary with both the probit and the logistic link.
• Model 1. The response variable Y , generated by (1), assumes m = 5 categories. It depends on one qualitative explanatory variable with four categories, coded through 3 dichotomous 0 − 1 variables X 1 , X 2 , and X 3 , such that at most one of them can take the value 1. The error component in (2) is N (0, 1), the regression coefficients are given by β = (2.5, 1.2, 0.5) , and the cutpoints are given by α = (−0.7, 0, 1.5, 2.9) . • Model 2. The response variable Y assumes 4 categories and depends on two qualitative variables W i , for i = 1, 2. Each of them assumes three categories, coded by two dichotomous 0−1 variables X a i and where ∼ N (0, 1) and the cutpoints are α = (1.2, 2.8, 5) .

Tuning constant and loss of efficiency at the model
These weights are analogous to those used by Carroll and Pederson (1993) to obtain robust estimators for the logistic model with a dichotomous response variable.
In Model 3, since there is only one regressor, the norm x i for the i-th observation is computed as |x i − Med(X)|/M AD(X), where Med(X) and MAD(X) are the median and the normalized median absolute deviation of X. The same procedure is used in Model 6, where the norm is a function of the continuous variable only. In the other two models (4 and 5), the norm of the regressors is given by the Mahalanobis distance, computed after estimating location and scale of X through the Stahel-Donoho procedure (Stahel, 1981;Donoho, 1982; see also Maronna and Yohai, 1995).
The three efficiency criteria introduced in Section 4 are applied to evaluate the loss of efficiency due to M -estimation at the model, with the asymptotic variance-covariance matrices of the estimators replaced by the Mean Square Error (MSE) matrices obtained from simulation.
The minimum values of c such that the loss of efficiency of M -estimators is not larger than 5%, are shown in Table 2. • For a given loss of efficiency, the values of c are smaller for the logistic link than for the probit link. This evidence is explained by the circumstance that the logistic generalized residuals have a limited range and hence they can be expected to be smaller than the probit residuals. Therefore, the same value of c yields smaller weights with the probit link than with the logistic one, with a consequent larger loss of efficiency. • For a fixed loss of efficiency, the Trace criterion requires smaller values of c, whereas the Determinant criterion requires larger values (i.e. a milder downweighting of the observations). On the whole, the value of c does not seem to be appreciably affected by the nature (dichotomous or continuous) and the number of regressors. • When the probit link is adopted, in order to keep the loss of efficiency below 5%, c needs roughly to be between 1.1 and 1.7 according to the Trace criterion, between 1.2 and 2.3 according to the Minimum MSE ratio criterion, and between 1.6 and 3.1 according to the Determinant criterion. In short, when the target is a 5% loss at the model, c needs to be greater than 1, but a value of c ≥ 3 might be too conservative. • When the logistic link is adopted, c < 1.5 seems generally suitable to ensure that the loss of efficiency is below 5%, whatever criterion is used.
• When estimation with the logistic link is carried out with weights (15), the value of the tuning constant necessary to keep the loss of efficiency below 5% is usually slightly higher than that required by weights (14). In other words, for a given c, the loss of efficiency is larger when the residuals are neglected than when they are considered jointly with x i in the weights. This provides evidence that it is important, when evaluating the magnitude of x i , to take the corresponding |e ij (θ)| into account. In order to gain efficiency, two statistical units, such that x i takes the same value, should be treated differently according to the size of the associated generalized residuals. For this reason, the investigation of this estimation method is not pursued further. Figure 6 shows how the efficiency of the M -estimators varies with c, according to the three criteria, in case of Model 3 when the probit and the logistic link are adopted, respectively. When c decreases, the control on outlying observations increases and the ψ functions in (11) move away from the score functions, yielding a larger loss of efficiency. Consequently, the larger is the admissible loss of efficiency, the smaller is the value of the tuning constant. It is also to be remarked that the convergence of the three criteria to 1 is faster for the logistic link than for the probit one. Consequently, for a given c, the loss of efficiency at the model is smaller when the logistic link is adopted rather then when the probit one is used.

Tuning constant and robustness
After evaluating the loss of efficiency at the model due to M -estimation, it is interesting to investigate what is the gain in efficiency which can be achieved by robust estimation when data are contaminated. In what follows, some contaminations of the previous models are considered which highlight the performance of M -estimation when some observations deviate from the assumed model. In particular, three types of deviations are taken into account: gross errors in the response variable, outliers in continuous covariates and model misspecification.
In the comparison of alternative estimators, along with the Minimum MSE ratio, the maximum ratio between the MSE of the ML and M -estimators for each parameter, is also considered. This additional criterion is useful when data contamination occurs, since under deviations from the model robust estimators are expected to be more efficient than MLEs. In particular, it points out what is the maximum gain in efficiency which can be achieved by M -estimation for a single parameter. For each numerical experiment 1 . 000 samples of size n = 200 are generated.

Contaminated Model 2 (Shelter Effect).
Here we consider the case when five Y i , which originally take value 1, 2 or 3, are changed into 4. This kind of contamination occurs when the selected category (in this case "four") can be regarded as a shelter category: a choice that the respondents feel comfortable with, although it appears incoherent with their profiles in terms of covariates (see Iannario, 2012, for a more extensive illustration of shelter choices).
The minimum and the maximum MSE-ratios useful for comparing the Mestimators with the MLEs are shown in Table 3. Values of c between 1 and 1.5 seem thoroughly appropriate to achieve robust estimation according to the two criteria. If say c = 1.25 is taken, the gain in efficiency in the estimation of a single parameter varies between 37.6% and roughly 140%, which is a quite remarkable achievement obtained by M -estimators. On the contrary, high values of c yield a poor performance of robust estimation. Contaminated Model 3. Suppose that gross errors occur so that, for κ statistical units, the value of the regressor, which is N (0, 1), is erroneously recorded as 5. The MSE-ratio in Table 4 compares the MLE and the M -estimator (with c = 1.5) of the regression coefficient, when κ varies between 1 and 20, which -in a sample of size n = 200 -corresponds to an amount of contamination between 0.5% and 10%. Robust estimation produces a very large gain in efficiency, especially when the probit link is adopted. In fact, the efficiency as a function of the amount of contamination increases until a peak is reached, then decreases as a consequence of likely masking effects. Nevertheless, two remarks are to be made. The decrease in efficiency starts at a larger contamination for the logistic link with respect to the probit link; secondly, despite the decrease in efficiency, robust estimation remains still more efficient than ML estimation. Figure 7 shows the MSE-ratio for the regression coefficient when c varies and κ = 3 for both the probit and the logistic link. Although the magnitude of the gain in efficiency is widely different for the two links, Figure 7 shows that c needs to be small, say c ≤ 2, to achieve a more effective robust estimation.
Contaminated Model 3 (continued ). Table 5 shows the MSE-ratio between the MLE and the M -estimator of the regression coefficient in Model 3 whenbecause of a gross error -the value of x i for a single statistical unit is replaced by an outlier, whose value varies between 3 and 10. For both the probit and logistic link the value of the tuning constant is c = 1.5. From the table it can be appreciated how the gain in efficiency of the M -estimators over the MLEs increases as the outlier moves away from the bulk of data. As in the previous experiment, the gain in efficiency is much larger for the probit link than for the logistic one, although robust estimation is very useful in both cases, as soon as the magnitude of the outlier becomes noticeable.  Contaminated Model 4. The contamination considered for Model 4 is given by two statistical units whose regressors are jointly mis-reported as 5. Hence the amount of contamination is indeed very mild, that is 1%. Table 6 shows the Trace and the Determinant criterion, and the MSE-ratios between the MLEs and the M -estimators of the regression coefficients when both the probit and the logistic link are adopted. Also this experiment highlights the considerable greater efficiency of M -estimators with respect to MLEs. Contaminated Model 5. Model 5 has been estimated by MLEs and Mestimators after inserting two extra unnecessary and independent regressors: X 4 ∼ N (0, 1) and X 5 ∼ N (0, 1). Hence the overparameterized model assumed for the latent variable in the estimation process is Y * = β 1 X 1 + β 2 X 2 + β 3 X 3 + β 4 X 4 + β 5 X 5 + , though β 4 = β 5 = 0. In addition the data have been contaminated by changing the value of X 1 into 5 for κ statistical units. Table 7 shows how the Trace criterion varies with κ when c = 1.5 for both the probit and the logistic link. The gain in efficiency provided by M -estimation is clearly evident.

Robust testing procedures
Robust test procedures can be based on robust M -estimators by means of t-type statistics. As mentioned in Section 4, M -estimators are asymptotically normal, and the estimatorV ψ of the asymptotic variance-covariance matrix is consistent for V (θ, ψ). Hence, under the null hypothesis H k 0 : β k = 0 for k = 1, . . . p, we have whereV kk β is the k-th element on the diagonal ofV β , in turnV β is the submatrix ofV ψ related to the regression coefficients andβ M k is the k-th element of the M -estimatorβ M of β.

Wald type tests can also be carried out. Consider the hypothesis H
where H(θ M ) is the matrix of the derivatives of h(θ) with respect to θ evaluated atθ M . Following Heritier and Ronchetti (1994), these tests inherit the robustness of the corresponding M -estimators and exhibit robustness of validity and robustness of efficiency.
Model 7. This model is similar to Model 3 but the response variable Y , which still depends on one regressor X ∼ N (0, 1), assumes 4 categories. The latent variable is Y * = 1.2X + , and the cutpoints are α = (−1.5, 0, 1.5) for the probit link, and α = (−2, 0, 2) for the logistic one. In the estimation process the model is misspecified by including a dichotomous regressor D ∼ Bernoulli(0.5) and its interaction with X. Hence the latent model assumed in the estimation is Y * = β 1 D + β 2 X + β 3 XD + . In addition there is a mild contamination (0.5%) of the data: the value of X is changed into 5 for a single statistical unit. t-tests are carried out on the hypotheses H k 0 : β k = 0 for k = 1, 3. The tests are based on both MLEs and M -estimators, and for the latter the tuning constant is set to c = 1.5. The tests are performed on 1 . 000 simulated samples of size n = 200, at the significance level 5% and 10%. Table 8 shows the percentage of rejections of H k 0 for k = 1, 3. For the parameter β 1 the null hypothesis is true and gets generally rejected in a percentage of cases close to the nominal level. The simulated level of the test based on M -estimators, rather than on MLEs, is generally slightly closer to the nominal one. A remarkable difference in the performance of the tests occurs when testing the hypothesis H 3 0 . Despite the parameter being zero, the percentage of rejections of the test based on the MLEs is high, especially when the probit link is adopted. In contrast, when the test is derived from M -estimators, the actual rate of rejection is satisfactorily close to the nominal level. Table 8 Percentage of rejections in Model 7 of H k 0 : β k = 0, k = 1, 3, under 0.5% of contamination in the data (c = 1.5).

Probit link Logistic link
Significance level 5% 10% 5% 10%  Figure 8 shows the Receiver Operating Characteristics (ROC) curves for the t-tests based on the MLEs and on the M -estimators for the hypotheses H 1 0 and H 3 0 , when the probit link is adopted. The true positive rates have been evaluated when the model for the latent variable is Y * = 0.7D + 1.2X + 0.5XD + . The curves make evident the better performance of the test based on M -estimators in terms of power as well as size. Finally, the hypothesis H D 0 : β 1 = β 3 = 0, to check whether D is a useful explanatory variable, is considered. When the model is estimated by MLEs, the tests are based on the Likelihood Ratio (LR) statistic, whereas when Mestimation is performed the tests are carried out through the Wald-type statistic introduced in (16). Table 9 shows the percentage of rejections of the hypothesis H D 0 . The results are consistent with those of the t-tests. The percentage of rejections by the LR test are much higher than the nominal level, and this test becomes markedly too liberal when the probit link is adopted. Table 9 Percentage of rejections in Model 7 of H D 0 : β 1 = β 3 = 0 under 0.5% of contamination in the data (c = 1.5).

Probit link Logistic link
Significance level 5% 10% 5% 10% MLE (LR) 32.9 44.5 7.8 14.5 M (Wald) 5.2 10.5 5.5 11.0 The ROC curves for the LR test and the Wald test based on M -estimators for H D 0 -when the probit link is adopted -are shown in Figure 9. In the comparison between the Wald robust test and the LR test, these curves confirm that the former test has an actual significance level closer to the nominal one and it more powerful. To summarize: • Both t and Wald type tests based on M -estimators are considerably more accurate (in terms of actual significance level) and more powerful than tests based on MLEs. • If the MLEs are to be applied, the logistic link is to be recommended. It is more robust than the probit link, as it is reflected in the more reliable testing procedures.
Finally, the goodness of link test (Atkinson and Riani, 2000, p. 200-201) could be used as a complementary tool for guidance in the choice between alternative links.

Concluding remarks
The paper highlights that robustness is a relevant issue in modelling ordinal responses, and proposes robust estimation and testing procedures based on Mestimators. Such procedures provide reliable inference when data are contaminated and outperform procedures based on the MLEs.
Another relevant result developed in the paper concerns the impact of the link function. Actually, although there is a general agreement about the use and interpretation of estimated parameters with different link functions (logistic and probit link give practically equivalent results), our analyses point out that the behaviour of alternative links is completely different from the robustness viewpoint. In particular, analytical results as well as numerical experiments support the preference for the logistic link as a more robust link.
Finally the role of generalized residuals obtained by the estimated models with covariates is also investigated. The operational suggestion is to consider their plots as a fundamental diagnostic tool in the framework of ordinal response models.

Appendix
This appendix includes an extended version of the results of Section 5 related to numerical experiments intended to identify appropriate values of the tuning constant of M -estimators for ordered response models and to assess their robustness when data contamination occurs.

Computational details
The first step in the estimation process is the computation of the norm x i of the covariates through a robust estimation of location and scale. In Model 3, since there is only one regressor, the norm x i is computed as |x i − Med(X)|/M AD(X), where Med(X) and MAD(X) are the median and the normalized median absolute deviation of X. The same procedure is used in Model 6, where the norm is a function of the continuous variable only. In the other two models (4 and 5), the norm of the regressors is given by the 1/2 , after estimating location and scale of X through the Stahel-Dohono procedure (Stahel, 1981;Donoho, 1982; see also Maronna and Yohai, 1995). Given a direction a ∈ R p with a = 1, denote by a x i the projection of x i along a. The outlyingness of x i with respect to the sample X = (x 1 , x 2 , . . . x n ) along a is defined by

3433
The outlyingness of x i is given by t(x i ) = max a t(x i , a). The robust estimators of location and scatter are given bŷ where W X (t) is a non-increasing function of t such as the Huber weights used by Maronna and Yohai (1995), that is W X (t) = min 1, (c X /t) 2 with tuning constant c X = χ 2 p,0.95 , where χ 2 p,0.95 is the 95-th percentile of a χ 2 p distribution. Then, for a given c, the solution of (10) is obtained by the Newton-Raphson algorithm. Since at each iteration the value of a(θ) can be approximated as follows. For every x i and the current value of the estimateθ k at the k-th iteration, we approximate (17) by and then compute a(θ) by taking the sample average of (18) with respect to x 1 , . . . , x n . One important issue, which at the moment is still unsolved, is the choice of a good starting point for the Newton-Raphson algorithm. In the numerical experiments, the MLEs are used with satisfactory results. Indeed, the difficulties in finding alternative starting points provide a sound motivation for using the Huber weights, which correspond to a monotone ψ function. Alternative weights may be derived by considering redescending ψ functions, but it is well known that they would lead to multiple solutions of (10).
In order to evaluate the loss of efficiency produced by M -estimation at the model or the gain in efficiency achieved when data contamination occurs, simulated versions of the efficiency criteria are computed. Initially, the Mean Square Error (MSE) matrix of the estimators is obtained from simulation; its generic element is where B is the number of samples drawn,θ ib is the estimate obtained from the b-th sample and θ i is the i-th parameter. Then the Trace criterion is computed as the ratio between the trace of the MSE matrix of the MLEs and that of the M -estimators, The Minimum MSE ratio is obtained as the minimum among the ratios of the MSEs of the MLEs and those of the M -estimators for each parameter, The Determinant criterion is obtained as the ratio between the determinant of the MSE matrices of the ML and M -estimators, Finally the Maximum MSE ratio is computed as Briefly, in any criterion, the asymptotic variance-covariance matrices are replaced by the simulated MSE matrices.

Tuning constant and loss of efficiency at the model
In Subection 5.1, the minimum values of the tuning constant c such that the loss of efficiency at the model due to M -estimation does not exceed 5% are illustrated when the sample size is n = 200. Here some more extensive results are provided by considering also a smaller sample size n = 100 and by identifyng values of c suitable to keep the loss of efficiency below a larger threshold i.e. 10%. All the results are based on 1 . 000 simulated samples.
The minimum values of c such that the loss of efficiency of M -estimators is not larger than 5% or 10%, for n = 100 and n = 200, are shown in Tables 10, 11 and 12 for the probit link, the logistic one, and the logistic link with weights (15), respectively. Some remarks follow.
• By comparing the values of c when the sustainable loss of efficiency is 5% or 10%, we notice that the larger is the loss of efficiency, the smaller can be the value of c. Actually when c decreases the distance between the estimating function of the MLEs and that of the M -estimators increases, so that a smaller values of c implies a lower efficiency of the latter estimators at the model. • For a given c, the loss of efficiency at the model measured by the Trace criterion is smaller than that measured by the Minimum MSE ratio, which in turn is smaller than that measured by the Determinant criterion. On the whole, the value of c does not seem to be largely affected by the nature (dichotomous or continuous) and the number of regressors. • The values of c needed to keep the loss of efficiency under a given thresh- old are always smaller for the logistic link than for the probit link, since the generalized residuals produced by the former link tend to be smaller (having a limited range) and therefore yield a smaller argument for the Huber weights. As a consequence the convergence of the efficiency of the M -estimators to 1 (when c increases) is faster when the logistic link is adopted, as is also made evident by Figure 6. • When the probit link is adopted (Table 10) and the Trace criterion is considered, c needs to be between 1.1 and 1.8 to ensure a loss of efficiency below 5% and between 0.7 and 1.3 to ensure a loss below 10%. When the Minimum MSE ratio is considered, suitable values of c are between 1.2 and 2.3 when the maximum admissible loss of efficiency is 5% and between 0.9 and 1.7 when it is 10%. The Determinant criterion can require c ≥ 3.1 for a loss below 5% and c ≥ 2.5 for a loss below 10%. In short, when the target is a 5% loss at the model, c needs to be greater than 1, but a value of c ≥ 3 might be too conservative. • When the logistic link is adopted (Table 11) and the evaluation of the M -estimators is carried out through the Trace criterion, c needs to be between 0.6 and 1.2 if the maximum admissible loss of efficiency is 5% and smaller than 0.9 when it is 10%. The Minimum MSE criterion requires 0.8 ≤ c ≤ 1.4 to keep the loss below 5% and 0.6 ≤ c ≤ 1 to keep the loss below 10%. The Determinant criterion for the small sample size n = 100 can require also c = 2 to guarantee that the loss does not exceed 5% and c = 1.7 in order not to exceed 10%. c = 2 is the largest value of the tuning constant which ensures an efficiency loss not greater than 5% whatever criterion is used, but according to the first two criteria c can be even smaller, say roughly c < 1.5. • When estimation with the logistic link is carried out with weights (15) ( Table 12), the value of the tuning is generally slightly higher than what required by weights (14). As already pointed out in the paper, these results provide evidence in favour of weights (14) also for the logistic link despite the limited range of the generalized residuals. In order to reduce the loss of efficiency, two statistical units, such that x i takes the same value, should be treated differently according to the size of the associated generalized residuals.

Tuning constant and robustness
In order to investigate the gain in efficiency achieved by M -estimation in a neighborhood of the model, three types of deviations are taken into account: gross errors in the response variable, outliers in continuous covariates and model misspecification.
The following examples provide the results of numerical experiment based on 1 . 000 samples of size n = 200.
Contaminated Model 2. Here we consider the case, in Model 2, when 2 observations of the response Y , which should correspond to the fourth category, are erroneously recorded as belonging to the first one. The amount of contamination in the data due to gross errors is 1%. Table 13 shows the values of the various criteria when M -estimation is performed with the probit link and c varies between 1 and 3.5. The gain in efficiency measured by the Trace criterion is remarkable and varies between 17.8% for a very large value of the tuning constant (c = 3.5) and over 100% when c ≤ 1.25. The gain measured by the Determinant criterion varies between 2.9% when c = 3.5 and 42.9% when c = 1.5. According to the Minimum MSE criterion the minimum gain in efficiency varies between 2.4% and 7.5%. However the maximum gain in efficiency which can be obtained in the estimation of a single parameter varies between 21.8% and over 160%. The behaviour of the Minimum MSE ratio and that of the Determinant criterion are not monotone, indicating that c should not be too small (downweighting should not be excessive with respect to the amount of contamination). On the whole, suitable values of c appear those between 1 and 1.5, which are also consistent with a loss of efficiency smaller than 5% at the model according to the first two criteria. Contaminated Model 2 (Shelter Effect). This kind of contamination has already been considered in the paper, and deals with the case when -due to a shelter effect (Iannario, 2012) -five Y i , which originally take values 1, 2 or 3, are changed into 4. For sake of completeness, in addition to the Minimum and Maximum MSE ratio criterion, here we report (in Table 14) also the value of the Trace and Determinant Criterion. Whatever criterion is used, the gain in efficiency produced by M -estimation is clearly evident, especially for 1 ≤ c ≤ 1.5. Contaminated Model 3. The contamination of Model 3 lies in the value of the regressor which is erroneously recorded as 5, for κ statistical units, where κ = 1, . . . , 5. In the paper the efficiency of the M -estimator of the regression coefficient is investigated when c = 1.5. The efficiency of the M -estimators for a larger set of values of c is evaluated in Tables 15 and 16 for the probit and the logistic link respectively. Recall that, to keep the loss of efficiency at the model below 5%, c needs to be between 1.1 (for Trace criterion) and 2.2 (for the Determinant criterion) when the probit link is adopted, and between 0.6 and 0.9 when the logistic link is adopted.
Some remarks follow.
• M -estimation produces a gain in efficiency even for c ≥ 2.5 when the probit link is adopted and for c > 1 when the logistic link is adopted, though such values are much higher than those necessary to keep the loss of efficiency below 5% at the model. • According to the Trace and the Determinant criterion, M -estimation largely outperforms ML estimation whatever the amount of contamination, and the gain in efficiency increases remarkably with the number of outliers. • Occasionaly M -estimation can produce a small loss of efficiency in the estimation of the cutpoints, which is however largely offset by the gain achieved in the estimation of the other parameters and especially of the regression coefficient. This is why the global efficiency measures (Trace and the Determinant criterion) provide outstanding evidence in favour of M -estimators. • Although M -estimation is definitely more efficient when both the probit and the logistic link are adopted, in the former case the gain in efficiency is remarkably higher. This result is due to the intrinsic robustness of the logistic link, which -unlike the probit one -produces generalized residuals with a limited range. The unboundness of the generalized residuals of the probit link can generate huge loss of efficiency in ML estimation when data contamination occurs. Contaminated Model 3 (continued ). The paper considers a further contamination of Model 3, produced by a gross error which replaces the value of x i for one statistical unit by an outlier, whose value varies between 3 and 10. Figure 11 shows the various efficiency criteria when the value of the outlier varies (notice that the maximum gain in efficincy is achieved in the estimation of the regression coefficient). All criteria consistently point out an increasing gain in efficiency as the outlier moves away from the bulk of data.
Contaminated Model 4. The contamination of Model 4 (already considered in the paper) is given by two statistical units whose regressors are jointly mis-reported as 5. Figure 12 shows the efficiency of M -estimators measured by the four criteria. A loss of efficiency can be incurred in the estimation of the cutpoints (this is why the Minimum MSE criterion is below 1), which is nevertheless largely offset by the gain in efficiency obtainable in the estimation of the regression coefficients. Values of c between 1.5 and 2 seem to be optimal to achieve a robust estimation according to the Trace and the Maximum MSE criterion for both links, and also for the Determinant criterion in the case of the logistic link. Contaminated Model 5. To investigate the robustness properties of Mestimation in Model 5, the paper considers both a misspecification of the model, obtained by inserting two unnecessary regressors in the estimation process, and a mild contamination produced by changing the value of X 1 into 5 for κ statistical units. Table 17 shows how the Trace and the Determinant criterion vary with κ when c = 1.5 for both the probit and the logistic link. The gain in efficiency provided by M -estimation is clearly evident.   In addition Figure 13 shows the MSEs of the M -estimators versus those of the MLEs when κ varies. As the amount of contamination increases, the MLEs lose progressively more efficiency, whereas the MSEs of the M -estimators keep pretty stable.