CMregr – A Matlab software package for finding CM-Estimates for Regression

This paper describes how to use the Matlab software package CMregr, and also gives some limited information on the CM-estimation problem itself. For detailed information on the algorithms used in CMregr as well as extensive testings, please refer to Arslan, Edlund & Ekblom (2002) and Edlund & Ekblom (2004).


Introduction
This paper describes how to use the Matlab software package CMregr, and also gives some limited information on the CM-estimation problem itself.For detailed information on the algorithms used in CMregr as well as extensive testings, please refer to (Arslan, Edlund & Ekblom 2002) and (Edlund & Ekblom 2002).
New methods for robust estimation regression have been developed during the last decades.Two important examples are M-estimates (Huber 1981) and S-estimates (Rousseeuw & Yohai 1984).A more recent suggestion is Constrained M-estimates -or CM-estimates for short -where the good local properties of the M-estimates and good global robustness properties of the S-estimates are combined (Mendes & Tyler 1995).CM-estimates maintain an asymptotic breakdown point of 50% without sacrificing high asymptotic efficiency.
Until recently, there was no satisfying method for fast computation of CMestimates, with high precision.The Matlab code CMregr presented here is an implementation of a new algorithm for CM-estimates for regression presented in (Arslan et al. 2002) and (Edlund & Ekblom 2002), that addresses these issues.
We can formulate CM-estimation for regression in the following way.Consider the linear model y = X β + e, where y = (y 1 , y 2 , ..., y n ) T is the response vector, X is an (n × p) design matrix, β a p-dimensional vector of unknowns and e = (e 1 , e 2 , ..., e n ) T the error vector.Alternatively we may write where x i is the i:th row in X. Define the residuals as r i = y i − x T i β, i = 1, 2, ..., n.The CM-estimation problem is to find the global minimum of Here ρ(t) is a bounded function, symmetric around zero and nondecreasing for t ≥ 0, and c and ε are tuning parameters that need to be chosen carefully, as described in Section 2.
If strict inequality holds in the constraint above we get the redescending M-estimating equations for β and σ.Equality in the constraint instead gives the S-estimate, where σ is minimized with respect to (2).The reason is that for equality constraint we have L(β, σ) = constant + log(σ), which is minimized by minimizing σ.Thus the S-and CM-estimates are closely related.However, the CM-estimates have some nice properties not shared by the S-estimates as exemplified in Section 2.
Two ρ-functions are considered in this paper: Tukey biweight In CMregr these two functions are referred to as tukey and welsh respectively.As shown in Section 5, it is easy to add customized ρ-functions to CMregr, to calculate other CM-estimates.

Estimation parameters and how to choose them
The parameters c and ε in (1) and (2) are used for tuning the CM-estimates.The parameter ε controls the asymptotic breakdown point: The asymptotic breakdown point is given by min(ε, 1 − ε).By choosing ε = 0.5, the highest possible asymptotic breakdown point is achieved.No other choice is sensible when CM-estimates are considered.
This means that the only parameter that really matters for CM-estimates is c.If c is small, the S-estimate for ε = 0.5 is found rather than an CMestimate.For normally distributed residuals, it is in general required that c > 15 (Tukey) and c > 5.3 (Welsh) to get a CM-estimate rather than an S-estimate.
The properties of the estimate varies with c. Mendes & Tyler (1995) made a theoretical investigation of the asymptotic relative efficiency, and the residual gross error sensitivity: γ * r .When those results are applied to Tukey biweight and Welsh, it is revealed that the sensitivity γ * r has a minimum for some c, and that the efficiency is an increasing function of c.This implies that it is pointless to choose c smaller than the minimizer of γ * r , but it may be worthwhile to choose a greater value.For Tukey, γ * r is minimized by c = 19.62,and for Welsh the minimizer is c = 7.348.Other possible choices are given in Table 1.It is possible to use CMregr for finding S-estimates by setting c = 0. Then the properties of the estimate are controlled by changing the asymptotic breakdown point ε.As a comparison, Table 2 shows the values of ε corresponding to Table 1, with the addition of ε = 0.5.A table for Tukey biweight with some more reasonable choices of ε can be found in (Rousseeuw & Yohai 1984).  1 illustrates the difference in efficiency between some estimates with breakdown point 0.5, and least squares.High efficiency means that the regression line is close to the least squares solution if there are no outliers.
In Figure 2 we see the price one has to pay in S-estimates to get the same asymptotic relative efficiency as in CM-estimates.In the right graph of Figure 2, the S-estimate "breaks down" when one more outlier is added compared to the left graph.This is because of the very low breakdown point required for S-estimates to get high efficiency.The optimization problem of finding the CM-estimates shares the property, that there may be a few local minima, with the S-estimates.Figure 3 shows the two local minima for CM with two choices for the parameter c.The data-sets are the same in the two graphs with 43.5% outliers in the data.The right graph has the greater c corresponding to a higher efficiency, but then the objective function (1) reports the "wrong" solution as the best one.
The CM-estimate has an asymptotic 50% breakdown point, but that is not achieved in this case.This is what can be expected when the "good data" is too scattered.Figure 4 shows that for better behaved "good data", the 50% breakdown point is achieved, even for greater values of c.It is however obvious that the observed breakdown point may drop a bit.And the drop is generally greater for CM-estimates with higher efficiency.

The algorithm
The CM-estimates cannot be expressed explicitly.Computing these estimates numerically is a challenging problem, since we like to minimize an objective function where many local minima may exist.
If σ is held fixed in (1), we have a linear model M-estimation problem.The algorithms in (Arslan et al. 2002) and (Edlund & Ekblom 2002) make use of the fact that very good algorithms exist for this problem (Edlund 1997).So the solution is found by solving a series of linear model M-estimation problems.
The updating of σ is done as a separate step, that requires much less calculations, since we have a one-dimensional problem at hand.Details on this issue is found in (Edlund & Ekblom 2002).
To find the global minimum, good starting points for the local iteration above is generated by considering many subsamples with p observations (Arslan et al. 2002).

Invoking the software
There are two commands in CMregr for calculating CM-estimates: globalCMregr and localCMregr.The first of these tries to find the "global minimum" using the technique described above, while localCMregr needs a user-supplied starting point.-the tuning parameter, from Table 1.ε -asymptotic breakdown point.Always use 0.5 for CM-estimates.

X
-the design matrix.y -the response vector.iters -the number of subsamples to investigate for a good starting point.
If this is omitted, the routine makes a guess of the number of subsamples to use.
stat -some additional data on the solution.stat.border is 1 if the solution fulfills (2) with equality, and 0 if (2) is fulfilled with strict inequality.stat.obj is the value of the objective function (1) at the solution.stat.constr is the difference between the left and right hand sides of (2).

Example
Let us define a design matrix and a response vector: The Tukey biweight CM-estimate with c = 19.62 (see Table 1) is then found by >>[b,s,stat] = globalCMregr('tukey',19.62,0.5,X,y)b = 1.07973997179869 0.71013001410066 s = 0.54750922628337 stat = border: 0 obj: 0.63014080945719 constr: -0.02051392582918Since stat.border= 0, the solution is in the interior of the feasible region, i.e. ( 2) is not fulfilled with equality.If the number of outliers would increase, the solution would eventually reach the border.In a sense one could say that, if outliers are plentiful, the S-estimate kicks in and saves the day.
Plotting the regression line yields: -the parameter, from Table 1.ε -asymptotic breakdown point.Always use 0.5 for CM-estimates.

X
-the design matrix.y -the response vector.β start -the initial guess of the solution β.The algorithm searches from this point and finds a local minimum.
stat -some additional data on the solution.stat.border is 1 if the solution fulfills (2) with equality, and 0 if (2) is fulfilled with strict inequality.stat.obj is the value of the objective function (1) at the solution.stat.constr is the difference between the left and right hand sides of (2).

Example
Using the same design matrix and response vector as in the previous example, we want to calculate the Welsh estimate with c = 12.07 (see Table 1) and starting from the least squares solution: >>[b,s,stat] = localCMregr('welsh',12.07,0.5,X,y,X\y)b = 2.28257700557494 -0.07363502459208 s = 4.06413162024954 4. If the vector is followed by the parameter 2, the Matlab-function should return a vector with the second derivative ρ applied to each element in the input vector.Example: >>welsh([-1 0 1 2],2) ans = -0.36791.0000 -0.3679 -0.1282 The routines in CMregr make use of all these four cases.This puts some restrictions on what ρ-functions can be used.The ρ-function has to be twice differentiable!

Table 1 :
(Mendes & Tyler 1995)fficiency and residual gross error sensitivity (γ * r ) for varying choices of c in Tukey biweight and Welsh CM-estimates.The table for Tukey biweight in(Mendes & Tyler 1995)looks different since they use a slightly modified ρ-function in their analysis.The values in Table1hold for tukey and welsh in CMregr.

Table 2 :
Asymptotic relative efficiency for varying choices of asymptotic breakdown point ε in Tukey biweight and Welsh S-estimates.