Analysis of Robust Quasi-deviances for Generalized Linear Models

Generalized linear models (McCullagh and Nelder'89) are a popular technique for modeling a large variety of continuous and discrete data. They assume that the response variables Yi , for i = 1, . . . , n, come from a distribution belonging to the exponential family, such that E[Yi ] = µi and V[Yi ] = V (µi ), and that ηi = g(µi ) = xiTβ, where β ∈ IR p is the vector of parameters, xi ∈ IR p, and g(.) is the link function.The non-robustness of the maximum likelihood and the maximum quasi-likelihood estimators has been studied extensively in the literature. For model selection, the classical analysis-of-deviance approach shares the same bad robustness properties. To cope with this, Cantoni and Ronchetti (2001) propose a robust approach based on robust quasi-deviance functions for estimation and variable selection. We refer to that paper for a deeper discussion and the review of the literature.


Motivation
Generalized linear models (McCullagh and Nelder 1989) are a popular technique for modeling a large variety of continuous and discrete data. They assume that the response variables Y i , for i = 1, . . . , n, come from a distribution belonging to the exponential family, such that E[Y i ] = µ i and V[Y i ] = V (µ i ), and that where β ∈ IR p is the vector of parameters, x i ∈ IR p , and g(.) is the link function.
The non-robustness of the maximum likelihood and the maximum quasi-likelihood estimators has been studied extensively in the literature. For model selection, the classical analysis-of-deviance approach shares the same bad robustness properties. To cope with this, Cantoni and Ronchetti (2001) propose a robust approach based on robust quasi-deviance functions for estimation and variable selection. We refer to that paper for a deeper discussion and the review of the literature. Cantoni and Ronchetti (2001) suggest the following set of estimating equations for the robust estimation of β:

Some theory
where ψ(y, w(x i )µ i with the expectation taken with respect to the conditional distribution of y|x, ν(·, ·), w(x) are robustness weight functions, and µ i = µ i (β) = g −1 (x T i β). The estimator defined by equations (2) is an M-estimator characterized by the score function ψ(y Its influence function (Hampel 1974) is proportional to ψ (see Hampel, Ronchetti, Rousseeuw, and Stahel 1986), which is bounded with respect to y if ν(y, µ) is bounded, and with respect to x if w(x) is suitably chosen to down-weight leverage points.
Moreover, the estimating equations (2) can be obtained by differentiating the robust quasi-likelihood function with respect to β, where the functions Q M (y i , µ i ) can be written as withs such that ν(y i ,s) = 0, andt such that E[ν(y i ,t)] = 0. These functions can be used to construct a robust measure of discrepancy between two nested models M p (with p parameters) and M p−q (with p − q parameters) by means of the test statistic whereμ i = µ i (β) andμ i = µ i (β) are the estimates under model M p and M p−q respectively. Note that Λ QM , which is independent ofs andt, can be used to perform variable selection in an analysis-of-deviance type of approach. Cantoni and Ronchetti (2001) proved that, under some regularity conditions, the test statistic Λ QM in (5) is asymptotically equivalent to a quadratic form in normal variables and is therefore distributed according to a linear combination of χ 2 1 variables, more precisely: where N 1 , . . . , N q are independent standard normal variables, d 1 , . . . , d q are the q positive eigenvalues of the matrix The particular case when with r i = y i −µ i V 1/2 (µ i ) being the Pearson residuals and ψ c the Huber function defined by is studied in details in Cantoni and Ronchetti (2001), and the S-PLUS code implementing it is provided here. Note also that ν(y i , µ i ) = (y i − µ i )/V (µ i ) and w(x) ≡ 1 reproduce the classical approach. Cantoni and Ronchetti (2001) also computed the asymptotic level and power under contamination, and showed that they are both bounded if an M-estimator with bounded influence function is used for β (see Proposition 2 and Corollary 1 therein). A nice consequence of these results is the possibility to define a procedure to choose the tuning constant appearing in ν (see for example (7) and (8)) subject to a maximal bias on the level of the test statistic in a neighborhood of the model. This will depend on the maximal bias allowed on the test statistics, the proportion of contamination and the distribution of the test statistic.

The code
The S-PLUS code implementing the estimating equations (2) and the test statistic (5) with ν(y i , µ i ) as defined by (7) and (8)  The files Ins3.4 and Ins6.0 are scripts to install the code under S-PLUS version 3.4 and 6.0 for Unix respectively. The file ROBGLMsfun.s contains the whole set of S-PLUS functions. carrots.dump and possum.dump are dump files with the data used in Cantoni and Ronchetti (2001). The subdirectory help3.4 (and help6.0) contains the help files for the respective versions of the software.
The main routines are glm.rob for estimation, and quasi.rob for model selection. See their help files and the example below for details.

Installation Unix
To restore the files from the archive use the command tar xvf robGLM.tar: a directory called robGLM is created. In order to use these functions, go to the directory robGLM and run the proper installation file depending on the software version you are using. You are then ready to open an S-PLUS session in this directory.

Windows
Decompress the file Mspline.tar with, for example, WinZip. A folder robGLM is created. Move the HTML help files (the content of the folder helpHTML) to the folder .Data\__Hhelp. Source the code (ROBGLMsfun.s) within S-PLUS.

An example
As a tutorial example, we will reproduce the results of Section 5.1 in Cantoni and Ronchetti (2001), Tables 1 and 2. This application is on a well known dataset, called the damage carrots dataset. It is taken from Phelps (1982) and used in McCullagh and Nelder (1989) to illustrate diagnostic techniques because of the presence of a large outlier in the y-space.
We first restore the data from the dump file: The data give the proportion of carrots showing insect damage in a soil experiment where a trial with three blocks and eight dose levels of insecticide were considered.
We model the data with a binomial model B(m, π) with logit link, that is, where block1 and block2 are dummy variables taking the value 1 if measures are taken in block 1 or 2 respectively. The classical fit with the S-PLUS glm function can be obtained by > carrots.class <-glm(cbind(success,total-success)~logdose+block2+block1, data=carrots,family=binomial) with part of the summary being:  The validation of the model with different techniques -plot of deviance residuals, plot of Pearson residuals, and Cook's distance (see Figure 1, produced with glm.diag from the bootlib S-PLUS library that accompany Davison and Hinkley 1997)highlights the presence of a single large outlier (observation 14). No leverage point appears from this analysis. We therefore decide to use the robust technique by letting c = 1.2 in Equation (8). This value of c is obtained by allowing a maximal error of 0.02 on the asymptotic level and by fixing a percentage of contamination of 4% (see details in Section 5.1 in Cantoni and Ronchetti 2001). All the weights on the design were put to 1, because the h i values (hat matrix) were small for all the observations. This gives the following command lines: > carrots.rob <-glm.rob (as.matrix(carrots[,c(3,6,5) which arises some doubts about the significance of the variable block1. Note that this fact is hidden in the classical analysis.
We therefore perform a robust stepwise procedure to build the final model. We start by checking the pertinence to add the variable logdose to the null model and continue with the variable block2 and then block1. We keep the same parameter setting as for the estimation procedure, that is, c = 1.2 and all the weights on the design equal to 1. The S-PLUS command lines are: Note that the out.col argument is used to identify the column(s) of the explanatory variables that we are considering to add.
From each created object, one can extract the value of the residual robust quasideviance at the larger and at the submodel, the difference of quasi-deviances (statistic (5)) and its associated p-value, for example: These results mean that it is worthwhile to add the variable logdose to the model at the 5% level. The p-values of the other two tests are step2$pvalue = 0.017 and step3$pvalue = 0.085, which means that the significance of variable block1 is not clearcut.
The classical results can be reproduced by putting chuber=Inf in the above command lines. This is equivalent to using the anova built-in S-PLUS function to compare nested models, as it is shown in the following.
If we want to test in a classical framework whether the variable block1 plays an important role, then we would use the following commands with our software: step3.class <-quasi.rob (as.matrix(carrots) [,c(3,6,5) where the last line matches our test in step3.class.
In the classical analysis no doubts arise on the significance of block1 (at the 5% level). This is probably a masking effect due to the outlying observation.