Rank-Based Analyses of Linear Models Using R

It is well known that Wilcoxon procedures outperform least squares procedures when the data deviate from normality and/or contain outliers. These procedures can be generalized by introducing weights, yielding so-called weighted Wilcoxon (WW) techniques. In this paper we demonstrate how WW-estimates can be calculated using an L1 regression routine. More importantly, we present a collection of functions that can be used to implement a robust analysis of a linear model based on WW-estimates. In particular, estimation, tests of linear hypotheses, residual analyses, and diagnostics for detecting differences in fits across weighting schemes are discussed. For illustration, we analyze a regression model, a designed experiment, and an autoregressive time series model. We have chosen to implement the suite of functions in the R statistical software package. Because R is freely available and runs on multiple platforms, WW-estimation and the associated inference are now universally accessible.

With regard to (1), it is well known that Wilcoxon procedures outperform least squares (LS) procedures when F deviates from the Gaussian distribution. Furthermore, in designed experiments, Wilcoxon procedures provide protection against outlying responses (i.e., the Y_i). See, for example, the books by Hettmansperger (1984) and Hettmansperger and McKean (1998). Hereafter we refer to the latter reference as HM.
Briefly, a Wilcoxon estimate of β is defined to be a minimizer of the dispersion function

D_R(β) = Σ_{i=1}^n [R(ε_i(β)) − (n + 1)/2] ε_i(β),   (2)

where ε_i(β) = Y_i − X_i'β and R(ε_i(β)) denotes the rank of ε_i(β) among {ε_j(β)}. This corresponds to the dispersion function of Jaeckel (1972) with Wilcoxon scores. Because (2) is invariant to location, β_0 cannot be estimated simultaneously with β. Instead, β̂_0 = med{ε_i(β̂)}, where β̂ is a minimizer of (2), is typically used as an estimate of β_0. See, for example, Section 3.5.2 of HM (1998).
Now, since Wilcoxon estimates are only robust with respect to the response, they may not be appropriate in observational studies where the independent variables (i.e., the X_i) are contaminated.
As an alternative, one can consider an analysis based on weighted Wilcoxon (WW) estimates.
In short, a WW-estimate corresponds to a minimizer of the objective function

D_WR(β) = Σ_{i<j} b_ij |ε_j(β) − ε_i(β)|,   (3)

where b_ij denotes the weight used in the (i, j)th comparison. Note that D_WR(β) is essentially a weighted version of Gini's mean difference measure of scale. When b_ij = 1 for i ≠ j and 0 otherwise, it can be shown (e.g., Hettmansperger 1984, p. 277) that D_WR(β) = 2D_R(β); hence the name WW-estimate.
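To make these objective functions concrete, the following is a minimal R sketch; wil.disp and ww.disp are our own illustrative helpers, not functions from the package, and (2) is written with the scores R(ε_i(β)) − (n + 1)/2 so that the unit-weight identity D_WR(β) = 2D_R(β) can be verified numerically.

```r
# Illustrative sketch only: wil.disp and ww.disp are hypothetical helpers.
wil.disp <- function(beta, X, y) {
  e <- drop(y - X %*% beta)              # residuals e_i(beta)
  n <- length(e)
  sum((rank(e) - (n + 1) / 2) * e)       # Jaeckel dispersion, Wilcoxon scores
}

ww.disp <- function(beta, X, y, bmat) {
  e <- drop(y - X %*% beta)
  ij <- which(upper.tri(bmat), arr.ind = TRUE)   # all pairs with i < j
  sum(bmat[ij] * abs(e[ij[, 2]] - e[ij[, 1]]))   # sum_{i<j} b_ij |e_j - e_i|
}

set.seed(1)
X <- cbind(rnorm(10)); y <- rnorm(10)
ones <- matrix(1, 10, 10)
all.equal(ww.disp(0, X, y, ones), 2 * wil.disp(0, X, y))   # TRUE
```

The final line checks the unit-weight identity on a small simulated data set.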
WW-estimates and the corresponding inference have been studied extensively in the context of linear regression models. See, for example, Sievers (1983); Chang, McKean, Naranjo, and Sheather (1999); and Chapter 5 of HM (1998). Depending on the weighting scheme used (see Section 2), WW-estimates can attain a continuous, totally bounded influence function and a positive breakdown point, which for one class attains the maximum of 50%. Thus, this class of estimates can be simultaneously robust and highly efficient. This makes WW-estimates particularly appealing for autoregressive time series analysis, where an observation plays a dual role as both a response and an explanatory variable. Terpstra, McKean, and Naranjo (2001) provide a good overview of WW-estimates and their application to autoregressive time series modeling.
In spite of the abundance of literature, the desirable robustness and efficiency properties, and the wide applicability to various linear models, this class of estimators has yet to be implemented in any mainstream statistical software packages. It is here where we hope to make a contribution. That is, we present an R (see e.g. Ihaka and Gentleman 1996) implementation of WW-estimates and corresponding inference.
More specifically, in Section 2 we review some popular weighting schemes from the literature and illustrate how these weights and the corresponding estimates can be calculated with an L1 regression routine. The asymptotic theory for WW-estimation and inference is well established; the asymptotic distribution of the estimates, tests of linear hypotheses, studentized residuals, and diagnostics for comparing different WW-fits are summarized in Section 3. In Section 4 we present some of the critical functions used to compute these estimates, test statistics, and diagnostics, and we discuss the R packages required for our implementation. Several data set examples, comprising a regression model, a designed experiment, and an autoregressive time series, are analyzed in Section 5; their purpose is to illustrate the wide applicability of our functions to various kinds of data (e.g., observational and experimental). Lastly, Section 6 provides some examples relating to simulation studies and bootstrapping. We conclude with a brief discussion of possible implementations in other statistical software packages.

Computational details
To compute the WW-estimate one can use an L1 regression routine, with the weighted pairwise differences b_ij(Y_j − Y_i) and b_ij(X_j − X_i) playing the role of the response variables and design points, respectively. Note that this is essentially how weighted least squares regression estimates are calculated. However, to our knowledge, R does not provide an explicit L1 regression function. Nevertheless, since L1 regression estimates are equivalent to (median) quantile regression estimates, the quantreg package written by Roger Koenker can be used to calculate the WW-estimate. We refer the interested reader to Koenker and Bassett (1978) for more information on regression quantiles. To obtain the estimates, then, we call the rq.fit.br or rq.fit.fn function on the aforementioned weighted pairwise differences. We note that rq.fit.br and rq.fit.fn are based on exterior and interior point methods, respectively. According to the R documentation for rq, rq.fit.br is recommended for smaller-scale problems, for instance, when the sample size is less than 5,000 and the number of regressors is less than 20.
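As a sketch of this computation (assuming the quantreg package is installed), the following hypothetical helper, wwL1, forms the weighted pairwise differences and hands them to rq.fit.br; the paper's wwfit works along these lines but is not reproduced here.

```r
library(quantreg)   # provides rq.fit.br (exterior point) and rq.fit.fn

# Sketch only; wwL1 is our own name, not the paper's wwfit.
wwL1 <- function(X, y, bmat) {
  n  <- nrow(X)
  ij <- t(combn(n, 2))                      # all pairs i < j
  b  <- bmat[ij]                            # weights b_ij
  dy <- b * (y[ij[, 2]] - y[ij[, 1]])       # b_ij (Y_j - Y_i)
  dX <- b * (X[ij[, 2], , drop = FALSE] -   # b_ij (X_j - X_i)
             X[ij[, 1], , drop = FALSE])
  beta <- rq.fit.br(dX, dy, tau = 0.5)$coefficients  # L1 fit, no intercept
  list(intercept = median(y - X %*% beta), beta = beta)
}
```

Since (3) is invariant to location, no intercept column is included in the pairwise-difference design; the intercept is recovered afterwards as the median of the residuals, as described in Section 1.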
As shown above, WW-estimates are readily obtained once the weights (i.e., the b_ij) have been determined. In this paper we consider essentially three classes of weights: constant weights, Mallows (1975) weights, and Schweppe-type (e.g., Handschin, Kohlas, Fiechter, and Schweppe 1975; Chang et al. 1999) weights. Briefly, weight functions that depend only on the design points are typically referred to as Mallows weights; examples are given in Sections 2.2 and 2.3. On the other hand, a weighting scheme that depends on both the design points and the responses is a member of the Schweppe class; see, for example, Section 2.4. We now give a brief discussion of some of the more popular members of these classes that appear in the literature.

Wilcoxon weights
The simplest weighting scheme corresponds to b_ij = 1 for i ≠ j and 0 otherwise. Note that ε_j(β) − ε_i(β) = 0 when i = j, so the value of b_ii is essentially arbitrary. Practically speaking, then, these are constant weights. Furthermore, as mentioned in Section 1, these weights yield the well-known rank-based Wilcoxon (WIL) estimate. Wilcoxon procedures are typically more efficient than least squares procedures when the data are non-normal, and they retain 95.5% efficiency when the data are normally distributed (e.g., HM 1998, p. 163). However, the influence function of the Wilcoxon estimate is bounded only in the response, not in the design point (e.g., HM 1998, p. 164). Thus, Wilcoxon estimates are not robust against outlying points in the design. This, of course, is irrelevant when the data are obtained from a designed experiment.

Theil weights
When discussing outlier-resistant estimates for the simple linear regression model, many nonparametric textbooks present the median of the pairwise slopes (e.g., Theil 1950) as an estimate of β. For example, the books by Conover (1999), Daniel (1990), and Hollander and Wolfe (1999) discuss this estimator. Now, suppose the weights in (3) are defined by b_ij = |X_j − X_i|^{−1}, ignoring the possibility of ties for the sake of simplicity. Then, as shown in Terpstra et al. (2001), the minimizer of (3) corresponds to Theil's estimator. Thus, Theil's estimator is a member of the class of WW-estimates. From this perspective, a generalization of Theil's estimator to the case p > 1 can be obtained by letting b_ij = ||X_j − X_i||^{−1}, where ||·|| denotes the Euclidean norm. Note that these weights depend on the design points only; that is, this weighting scheme is a member of the Mallows class. The influence function and the breakdown point of the Mallows-based WW-estimate have both been derived; these results are stated as Theorems 5.7.1 and 5.7.3, respectively, in HM (1998). The theorems imply that Theil's estimator is a bounded influence estimator that attains a breakdown point of 1/3. Hence, the Theil estimator is robust.
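For p = 1 the resulting estimate is easy to compute directly, as in this small sketch (theil.slope is a hypothetical helper):

```r
# Theil's simple-regression slope as the median of the pairwise slopes;
# theil.slope is an illustrative helper, not a function from the paper.
theil.slope <- function(x, y) {
  ij <- t(combn(length(x), 2))      # all pairs i < j
  median((y[ij[, 2]] - y[ij[, 1]]) / (x[ij[, 2]] - x[ij[, 1]]))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 100)             # one gross outlier in the response
theil.slope(x, y)                   # 2: the outlier does not move the slope
```

Six of the ten pairwise slopes equal 2, so the median is unaffected by the four slopes involving the outlying point, illustrating the robustness noted above.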

GR weights
Another Mallows-based weighting scheme is given by the GR weights defined in (4). Here, c and k are tuning constants and d_i² = d²(X_i) denotes the squared Mahalanobis distance of X_i based on some (robust) measure of location and dispersion for the design set {X_i}. For example, our default implementation calculates d²(X_i) using the fast minimum covariance determinant (MCD) estimates proposed by Rousseeuw and Van Driessen (1999). These estimates are available in R through the lqs (R 1.8.1 and earlier) and MASS (R 1.9.0 and later) packages written by Brian Ripley. For the tuning constants we use c = χ²_{0.95}(p), the 95th percentile of a χ²(p) distribution, and k = 2.
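A sketch of weights of this type is given below. Note that the exact functional form of (4) is not reproduced here, so the factored form b_ij = h_i h_j with h_i = min{1, (c/d_i²)^{k/2}} is our assumption for illustration; the robust location and scatter come from cov.mcd in MASS.

```r
library(MASS)   # cov.mcd: fast MCD location and scatter

# Illustrative GR-type Mallows weights; the factored form below is an
# assumption for illustration, not necessarily the exact formula in (4).
gr.weights <- function(X, c = qchisq(0.95, ncol(X)), k = 2) {
  mcd <- cov.mcd(X)                              # robust center and scatter
  d2  <- mahalanobis(X, mcd$center, mcd$cov)     # robust squared distances
  h   <- pmin(1, (c / d2)^(k / 2))               # downweight distant points
  outer(h, h)                                    # b_ij = h_i h_j (Mallows)
}
```

Because the weights factor into a term for X_i times a term for X_j, they are Mallows weights in the sense described above.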
These weights have been studied extensively in the context of linear regression. The interested reader is referred to Naranjo, McKean, Sheather, and Hettmansperger (1994); McKean, Naranjo, and Sheather (1996b); and Chapter 5 of HM (1998). Once more, it follows from results in the literature (see Theorems 5.7.1 and 5.7.3 of HM 1998) that these weights admit a bounded influence function and a positive breakdown point. See also McKean, Naranjo, and Sheather (1996a). We follow the convention in the literature and refer to this particular WW-estimate as a generalized rank (GR) estimate. Lastly, note that there is a fundamental difference between the GR-estimate and the Theil estimate: the weights for the GR-estimate factor into a function of X_i times a function of X_j, while the weights for the Theil estimate do not.

HBR weights
These weights also yield robust estimates but typically have higher efficiency than the Theil and GR estimates. More specifically, the HBR weights in (5) are built from the quantities defined in (4) and the function ψ(t) = (t − sgn(t))I(−1 < t < 1) + sgn(t), which equals t on (−1, 1) and sgn(t) otherwise. The tuning constant b is set at [med{a_i} + 3MAD{a_i}]², where the a_i are the residual-based quantities appearing in (5). Lastly, ε̂_i(β̂_0) denotes the ith residual based on an initial estimate β̂_0. Note that these weights incorporate information from both the design points and the responses via the initial residuals. Our default implementation uses the fast least trimmed squares (LTS) estimator proposed by Rousseeuw and Van Driessen (2002) for β̂_0. Again, this estimate is available in R via the lqs or MASS package. The weights in (5) were suggested by Chang et al. (1999) and are used in Section 5.8.1 of HM (1998). This particular WW-estimate is referred to as the HBR-estimate since it acquires a 50% breakdown point provided the initial estimates (i.e., µ̂, Σ̂, and β̂_0) are high breakdown (50%) estimates.
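Two of the ingredients above are simple to express in R; note that ψ is just truncation to [−1, 1]. Here hbr.psi and b.tune are our own names, and R's mad includes a consistency constant (1.4826) that the paper's MAD may or may not intend.

```r
# psi(t) = (t - sgn(t)) I(-1 < t < 1) + sgn(t) is truncation to [-1, 1];
# hbr.psi and b.tune are illustrative helpers, not the paper's functions.
hbr.psi <- function(t) pmax(-1, pmin(1, t))

hbr.psi(c(-3, -0.5, 0, 0.5, 3))   # -1.0 -0.5  0.0  0.5  1.0

# Tuning constant b = [med{a_i} + 3 MAD{a_i}]^2 from a set of a_i values.
# Use mad(a, constant = 1) if the unscaled MAD is intended.
b.tune <- function(a) (median(a) + 3 * mad(a))^2
```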

Theoretical results
This section summarizes some of the main theoretical results pertaining to the estimation, inference, and diagnostic procedures that we have chosen to implement. In general, WW-estimates do not exist in closed form. Thus, it is not generally possible to determine exact distributions of the estimates and/or test statistics. Accordingly, all of the results presented in this section are asymptotic in nature. For the sake of brevity, we refer the reader to the appropriate references for the technical details.

Asymptotic distributions
Recall that D_WR(β) is invariant to location, and therefore β_0 cannot be estimated directly via the minimization. As an estimate, then, we use β̂_0 = med{Y_i − X_i'β̂_WR}, where β̂_WR denotes a minimizer of (3). Note that this is essentially a robust analog of the least squares approach, in which the mean of the residuals is used as an estimate of the intercept. See, for example, Section 3.5 of HM (1998). In what follows we let θ̂ = (β̂_0, β̂_WR') denote the joint vector of parameter estimates.
The main result is that θ̂ is asymptotically normal; that is, √n(θ̂ − θ) converges in distribution to a multivariate normal random vector with mean zero and a variance-covariance matrix Ω of the general form given in (6). Here, x̄ denotes the p × 1 vector of column means of the n × p design matrix X and τ_s = (2f(0))^{−1}. The remaining components (i.e., τ, C, and V) depend on the type of weights that are used.
For Mallows weighting schemes, define W to be the n × n matrix whose (i, j)th element is given in (7), with b_ii defined to be zero. Note that both W and X depend on n. The components of Ω are then defined as in (8). The details regarding this result can be found in Sievers (1983) and/or Section 5.2 of HM (1998). We note that when Wilcoxon weights are used, the lower-right block of Ω reduces to τ² lim_{n→∞} (n^{−1} X_c'X_c)^{−1}, where X_c denotes the centered design matrix. Now, for practical applications an estimate of Ω is needed. Estimates of C and V correspond to (8) without the limits. For the scale parameters, we have implemented the confidence interval estimate discussed on pages 7-8 and 25-26 of HM (1998) for τ_s and the density estimate presented in Section 3.7.1 of HM (1998) for τ.
Next, consider Schweppe weights, and recall that these weights are random since they depend on the response variable. Essentially, this is what changes the definitions of τ, C, and V from those given in (8). Here we have τ = 1/2. For C and V, however, we need the quantities C_n and V_n defined in (9) and (10); C and V are then the corresponding limits of C_n and V_n. The details regarding this result can be found in Chang et al. (1999) and/or Section 5.8 of HM (1998). Estimates of these quantities are also discussed in these references.

Tests of linear hypotheses
Consider testing a general linear hypothesis of the form H_0: Aβ = 0, given in (11), where A is a q × p hypothesis matrix of rank q. For example, when A is the p × p identity matrix, (11) corresponds to a test of overall regression significance. We consider only tests of the regression parameters, as these are typically the focus in regression analyses.
The first test is based on a standardization of the full model estimate and is typically referred to as a Wald test. More specifically, let β̂_WR denote a WW-estimate obtained by minimizing (3) and let Ω̂_{2,2} denote a consistent estimate of τ²C^{−1}VC^{−1}, the lower-right block of (6). Then an approximate level-α test of (11) rejects when the statistic W² given in (12) is larger than χ²_{1−α}(q). However, finite-sample simulation studies suggest that a better test is obtained by comparing W²/q to the 1 − α percentile of an F distribution with q and n − p − 1 degrees of freedom. See, for example, McKean and Sheather (1991) and/or Section 3.6 of HM (1998). Note that this test can be modified to include the intercept term should one choose to do so.
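A sketch of the F-version of the Wald test is given below, under the assumption that Vhat estimates the covariance matrix of the regression estimate bhat (e.g., extracted from the output of varcov.gr or varcov.hbr); ww.wald is a hypothetical helper, not the paper's wald function.

```r
# Illustrative F-version Wald test; Vhat is assumed to estimate the
# covariance matrix of bhat, so no additional scaling by n is applied.
ww.wald <- function(bhat, Vhat, A, n) {
  q <- nrow(A)
  p <- length(bhat)
  d <- A %*% bhat                                     # A beta-hat
  W2 <- drop(t(d) %*% solve(A %*% Vhat %*% t(A)) %*% d)
  list(F = W2 / q,                                    # compare to F(q, n-p-1)
       p.value = pf(W2 / q, q, n - p - 1, lower.tail = FALSE))
}
```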
The second test, typically referred to as a drop in dispersion test, is based on both the full model estimate (say β̂_f) and a reduced model estimate (say β̂_r). That is, the reduced model estimate minimizes (3) subject to the linear constraints in (11), which can be accomplished using a QR decomposition of the matrix A'. Our discussion here is limited to Mallows schemes since this is the extent of the current (at least to our knowledge) theoretical development; see, for example, Theorem 5.2.12 of HM (1998). Nevertheless, in principle, the same idea is also applicable to Schweppe weights. Once the reduced and full model estimates are obtained, the drop in dispersion test statistic given in (13) is computed from D_WR(·), the dispersion function in (3), and τ̂, a consistent estimate of τ. Next, let C and V be the matrices defined in (8), let C_r denote the reduced model analog of C, and let C⁺ be defined as in the display following (13). The asymptotic null distribution of the test statistic is then that of Σ_{i=1}^q λ_i χ²_i(1), where λ_1, ..., λ_q are the q positive eigenvalues of V(C^{−1} − C⁺) and χ²_1(1), ..., χ²_q(1) are iid χ² random variables, each with one degree of freedom. To obtain a p-value, Hettmansperger and McKean (1998, p. 288) suggest either bootstrapping the test statistic (see, e.g., Section 6.1) or simulating the sum of weighted χ² distributions.
Our current implementation considers only Wilcoxon weights (i.e., b_ij = 1). For these weights it is readily shown that the q positive eigenvalues all equal one, so the limiting distribution is χ² with q degrees of freedom. However, as with the Wald test, finite-sample simulation studies suggest that a better test, F_R, is obtained by comparing the test statistic divided by q to an F distribution with q and n − p − 1 degrees of freedom; see McKean and Sheather (1991) and/or Section 3.6 of HM (1998). Lastly, it should be pointed out that Hettmansperger and McKean (1983) compared the performance of F_R and W²/q via small-sample simulation studies. In short, their findings indicate that F_R exhibits a more stable Type I error rate and slightly dominates W²/q in terms of power. This is consistent with the simulation study presented in Section 6.2 of this paper.
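For Wilcoxon weights, the resulting F-test can be sketched as follows, where d.red and d.full are the minimized dispersion values under the reduced and full models and tau.hat is a consistent estimate of τ; the scaling of the reduction in dispersion by τ̂/2 follows the standard form in Section 3.6 of HM (1998), and drop.disp.test is our own illustrative name.

```r
# Illustrative drop in dispersion F-test (Wilcoxon weights); not the
# paper's droptest function itself.
drop.disp.test <- function(d.red, d.full, tau.hat, q, n, p) {
  FR <- ((d.red - d.full) / q) / (tau.hat / 2)   # scaled reduction in dispersion
  list(FR = FR,
       p.value = pf(FR, q, n - p - 1, lower.tail = FALSE))
}
```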

Studentized residuals
For general residual analyses it is desirable to know the variance of ε̂_i, say σ_i². The studentized residual, which is often used for outlier identification, is then defined as ε̂_i/σ̂_i. A general rule of thumb is to declare the ith observation a potential outlier if the absolute value of its studentized residual is larger than two.
In what follows we let ε̂_M and ε̂_S denote the n × 1 vectors containing the residuals for Mallows and Schweppe weights, respectively. A first order approximation of VAR[ε̂_M] is then given by (14), with the component quantities defined in (15). See, for example, Section 5.4 of HM (1998). We note that (14) is valid for any Mallows weighting scheme. In particular, (14) holds for the three weighting schemes discussed in Sections 2.1, 2.2, and 2.3; the only quantity that changes is the W matrix, which appears in K_w. Now, in practical applications we need estimates of the quantities defined in (15). For σ we use MAD{ε̂_{M,i}}, where MAD denotes the median absolute deviation estimator of scale. Estimates of τ_s and τ were discussed in conjunction with (8). For the remaining quantities we use residual-based moment estimators. See, for example, Section 5.4 of HM (1998).
The Schweppe-weights version of (14) is similar and is given in (16). See, for example, Chang et al. (1999) and Section 5.9 of HM (1998). Here, C and V correspond to the matrices defined in conjunction with (9) and (10), respectively, κ_1 = E[|ε_1|], and κ_2 is defined in the accompanying display. Residual-based moment estimates of κ_1 and κ_2 can be obtained by replacing F with the empirical cumulative distribution function of the residuals. The other quantities, namely σ², τ_s, τ, I, and J, are identical to those defined for Mallows weights. Finally, A = n(√12 τ)^{−1}W, where W corresponds to the Schweppe analog of (7). Once again, we note that (16) holds for any Schweppe weighting scheme since the only quantities that depend on the weights are A, C, and V.

Diagnostics for comparing fits
It is clear from Section 2 that different weighting schemes yield different estimates. In fact, it follows from Lemma 5.2.11 of HM (1998) that the most efficient WW-estimate is the Wilcoxon estimate. Recall that all points are equally weighted for this particular estimate. However, equal weights do not always make sense for data sets that contain outliers. Thus, it is desirable to have a diagnostic that compares the Wilcoxon estimate (b_ij = 1 for all (i, j)) to a WW-estimate (b_ij ≠ 1 for some (i, j)). In principle, the diagnostics in this section can be used to compare any two WW-estimates (e.g., GR and HBR). However, to our knowledge, no studies directly address this problem. Nevertheless, in view of the theory presented in Section 5.5 of HM (1998), such a practice can be justified.
In what follows we let θ̂_R = (β̂_{R0}, β̂_R') denote the parameter estimates based on Wilcoxon weights. An analogous estimate, say θ̂_WR, is defined for any WW-estimate whose weights are not all equal to one. The two estimates can then be compared using the diagnostic TDBETAS_R defined in (17), where Ω̂_R is an estimate of the Wilcoxon version of (6). A more thorough discussion of this diagnostic can be found in McKean et al. (1996a), McKean et al. (1996b), and Section 5.5 of HM (1998). These references suggest using 4(p + 1)²/n as a benchmark for declaring the fits to be different. Now, when two fits are declared different, it is often desirable to investigate the nature of the difference. This can be accomplished by comparing the individual fitted values of the two estimates. For example, let Ŷ_{R,i} = β̂_{R0} + X_i'β̂_R denote the Wilcoxon fit for the ith case and let Ŷ_{WR,i} denote the ith fit for the other WW-estimate. The diagnostic CFITS_{R,i}, defined in (18), compares these fits; the denominator of (18) is the estimated standard error of Ŷ_{R,i}. See, for example, Section 5.5 of HM (1998). Hettmansperger and McKean (1998, p. 303) suggest using 2√((p + 1)/n) as a benchmark for declaring two individual fits different. We note that these diagnostics are designed to detect differences between fits and therefore do not necessarily provide any information regarding which fit is best. For that, we recommend a standard residual analysis based on the studentized residuals of Section 3.3.
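The quadratic-form structure of (17) can be sketched as follows; here Omega.R is assumed to estimate the covariance matrix of θ̂_R, and the exact standardization in (17) may differ from this illustration (tdbetas is our own name, not the paper's fitdiag).

```r
# Illustrative total-change-in-fits diagnostic; Omega.R is assumed to
# estimate the covariance matrix of the Wilcoxon estimate theta.R.
tdbetas <- function(theta.R, theta.WR, Omega.R) {
  d <- theta.R - theta.WR                 # difference between the two fits
  drop(t(d) %*% solve(Omega.R) %*% d)    # standardized squared distance
}

# Benchmark: flag the fits as different when the value exceeds 4*(p+1)^2/n.
```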

R code

Web resources
In this section we give a brief description of some of the R functions that can be used to perform a weighted Wilcoxon analysis. However, we begin by listing some important web sites related to our software.

Descriptions of R functions
wwest. This function performs a weighted Wilcoxon analysis using the weighting schemes discussed in Section 2. As input, it requires a design matrix, a response vector, and the name of a weighting scheme; arbitrary Mallows weights are also allowed. Output is produced which summarizes a test of regression significance along with tests on the individual parameters. A graphical residual analysis can also be obtained.
wwfit. As input, this function requires a design matrix, a response vector, and an n(n − 1)/2 × 1 vector of weights. Note that this function is not limited to the weighting schemes discussed in Section 2. It minimizes the dispersion function in (3) via the L1 procedure discussed in Section 2. The return value is a list containing the parameter estimates, residuals, and a weight matrix.
wilwts. This function returns an n(n − 1)/2 × 1 vector of ones. These weights correspond to the Wilcoxon estimate.
theilwts. This function returns the n(n − 1)/2 × 1 vector of Theil weights discussed in Section 2.2. When p = 1 the slope estimate corresponds to the median of the pairwise slopes.
grwts. This function returns the n(n − 1)/2 × 1 vector of GR weights defined in (4). The function is flexible enough to accommodate arbitrary distances and tuning constants.
hbrwts. This function returns the n(n − 1)/2 × 1 vector of HBR weights defined in (5). The function is flexible enough to accommodate arbitrary distances, initial estimates, and tuning constants.
wts. This is a wrapper function that essentially calls one of the above weight functions. Typically, it is only used in conjunction with wwfit. For example, in simulation studies where only the estimates are being studied, it is not necessary to evaluate the extra output produced by wwest.
taustar. This function calculates the confidence interval estimate of τ_s = (2f(0))^{−1} discussed on pages 7-8 and 25-26 of HM (1998). It corresponds to the asymptotic standard deviation of the median-based estimate of the intercept parameter.
varcov.gr. This function calculates an estimate of the variance-covariance matrix for the regression coefficients (including the intercept) when Mallows weights are used. In particular, it can be used to determine the variance-covariance matrix of the Wilcoxon estimate. It returns several components associated with the matrix. See, for example, (6), (7), and (8).
varcov.hbr. This function calculates an estimate of the variance-covariance matrix for the regression coefficients (including the intercept) when Schweppe weights are used. It returns several components associated with the matrix. See, for example, (6), (9), and (10).
wald. This function calculates the Wald statistic defined in (12) and the corresponding p-value for the hypotheses given in (11). It requires both the intercept parameter estimate and the regression parameter estimates. The use of the F distribution for calculating p-values is documented in McKean and Sheather (1991) and Section 3.6 of HM (1998).

droptest. This function performs a drop in dispersion test for the general linear hypotheses defined in (11). The test statistic is defined after (13). Our current implementation requires that Wilcoxon weights (i.e., b_ij = 1) be used for the analysis. Again, the use of the F distribution for calculating p-values is documented in McKean and Sheather (1991) and Section 3.6 of HM (1998).

redmod. This function obtains the reduced model design matrix used by droptest. The calculation of this matrix is based on a QR decomposition of A', where A is defined in (11). See, for example, Theorem 3.7.2 of HM (1998).

cellmntest. This function performs a Wilcoxon-based drop in dispersion test and returns a robust ANOVA table for the one-way analysis of variance model. The interested reader is referred to Chapter 4 of HM (1998) for the details.
pwcomp. This function is typically used in conjunction with cellmntest. It examines all C(p, 2) pairwise comparisons using Wilcoxon-based drop in dispersion tests. Note that this is similar in nature to the protected LSD method. See, for example, Kuehl (2000, p. 111).
studres.gr. This function calculates the studentized residuals corresponding to Mallows-based weighted Wilcoxon estimates (see (14)). These residuals can be used to construct residual plots and flag potential outliers. For instance, if the absolute value of a studentized residual exceeds two, one might declare the observation an outlier.

studres.hbr. This function is analogous to studres.gr except that it corresponds to WW-estimates based on Schweppe weights (see (16)).
fitdiag. This function returns the diagnostics TDBETAS_R and CFITS_{R,i}, which are defined in (17) and (18), respectively. The current implementation allows the following comparisons: WIL vs. GR, WIL vs. HBR, GR vs. HBR, and WIL vs. LS. All comparisons are standardized using the WIL fit.
plotfitdiag. This function produces a casewise plot of CFITS_{R,i} based on the results of fitdiag. Information on the total change in fits (i.e., TDBETAS_R) is also reported.

Data set examples
We now present some data set examples which illustrate the use of our R functions and corresponding output. Other examples can be found in our help file (wwhlp.r).

Hawkins data set
For our first example we analyze the well-known data set constructed by Hawkins, Bradu, and Kass (1984). Briefly, this data set contains 75 observations on one response variable and three predictor variables. The first 10 observations are bad leverage points, observations 11-14 are good leverage points, and the remaining observations are consistent with the underlying model. This data set is typically used to illustrate the degree of robustness (or lack of) an estimator exhibits toward outlying data points.
The analysis parallels that given in Section 5.3 of HM (1998), except that we compare the WIL estimator to the HBR estimator (as opposed to the GR estimator). Figures 1 and 2 pertain to the WIL estimate, while Figures 3 and 4 display the HBR results. Figure 5 compares the WIL and HBR fits directly.
To begin, note that wwest essentially serves as an all-purpose estimation and inference function. It makes calls to several of the functions described in Section 4.2. In particular, calls to wwfit, wald, and regrtest are made. Note that the type of weighted Wilcoxon analysis is controlled by the bij argument. Figures 1 and 3 display the results for the WIL and HBR fits, respectively. The output is similar to that produced by many statistical software packages. That is, a test of general regression significance along with significance tests for individual model parameters are output. With the exception of the drop in dispersion test for the Wilcoxon estimate (i.e. F R ), all tests are Wald-based (i.e. (12)) procedures. Finally, the user is given an option to view a graphical residual analysis.
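A hypothetical session along these lines is shown below; the source file name, data file, and column layout are placeholders (assumptions on our part), though the wwest call and its bij argument follow the description above.

```r
# Hypothetical session; file names and the data layout are placeholders.
source("ww.r")                          # the paper's suite of functions
hawkins <- read.table("hawkins.dat")    # 75 cases: 3 predictors, 1 response
x <- as.matrix(hawkins[, 1:3])
y <- hawkins[, 4]

wwest(x, y, bij = "WIL")                # Wilcoxon fit (cf. Figures 1 and 2)
wwest(x, y, bij = "HBR")                # HBR fit (cf. Figures 3 and 4)
```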
Figures 2 and 4 display the residual analyses for the WIL and HBR fits, respectively. The standard residual vs. fitted value and normal probability plots are given in positions (1,1) and (2,2), respectively. As a complement to the normal probability plot, a histogram of the residuals appears in position (1,2). Lastly, a case plot of the studentized residuals (recall (14) and (16)) is given in position (2,1). Here, the ±2 benchmark can be used to help identify potential outliers. Finally, a few words regarding the WIL and HBR fits are in order. It is clear from Figure 5 that the WIL and HBR fits are quite different, most notably for the first 14 observations. It is evident from Figure 2 that the WIL fit favors the bad leverage points. For example, both residual plots identify the four good leverage points as outlying observations. On the other hand, the HBR-based residual plots in Figure 4 identify the 10 bad leverage points as outlying observations. Lastly, we note that the inference results in Figures 1 and 3 contradict one another: the WIL test for regression significance is significant while the HBR test is not. All of this comes as no surprise given the well-known fact that WIL procedures are not robust against bad leverage points.

LDL cholesterol in quail
This data set contains the LDL Cholesterol levels of 39 different quail. The data were obtained using a completely randomized design with four treatments. Each treatment essentially corresponds to a different diet containing a different drug compound. To begin, note that this is a designed experiment. Therefore, the design matrix will not contain any leverage points. It does not make sense then, from an efficiency point of view, to use a weighted Wilcoxon estimate that downweights leverage points (e.g. GR or HBR). In general, we recommend that Wilcoxon-based procedures be used for experimental design situations. See Chapter 4 of HM (1998) for details.
We are interested in testing for a significant treatment (i.e., diet) effect. This can be accomplished using the cellmntest function. Figure 6 illustrates the steps involved as well as the corresponding output. Briefly, cellmntest performs a WIL-based drop in dispersion test and returns a robust ANOVA table for the one-way analysis of variance model. From Figure 6 we see that F_R = 3.79 with a corresponding p-value of 0.0187. Hence, the diets are significantly different at the 0.05 level. Now, given a significant result, it is customary to examine the pairwise differences in order to assess the nature of the rejection. To this end, we note that cellmntest has an argument which allows one to test hypotheses of the form Aµ = 0, where µ is the p × 1 vector of cell locations and A is an arbitrary q × p contrast matrix. For example, if p = 4, calling cellmntest with A = (1, 0, −1, 0) tests for a location difference between populations one and three. Thus, we can use cellmntest to examine each of the C(p, 2) pairwise comparisons by defining an appropriate A matrix. When used in this manner we essentially have a Wilcoxon-based drop in dispersion version of the well-known protected LSD method. See, for example, Kuehl (2000, p. 111). In fact, this is exactly what the pwcomp function does. From Figure 6, then, we see that diet number 2 yields the lowest cholesterol levels, and this diet is significantly different from the others.
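A hypothetical session is sketched below; the argument order of cellmntest and pwcomp, the contrast-matrix argument name, and the data object are all assumptions on our part, not the package's documented interface.

```r
# Hypothetical session; argument names and order are assumed for illustration.
levs <- quail$diet                   # treatment indicator (diets 1-4); assumed
y    <- quail$ldl                    # LDL cholesterol responses; assumed

cellmntest(y, levs)                  # robust ANOVA table (F_R = 3.79 in Figure 6)
pwcomp(y, levs)                      # all C(4, 2) = 6 pairwise comparisons

A <- matrix(c(1, 0, -1, 0), nrow = 1)    # contrast: populations one vs. three
cellmntest(y, levs, amat = A)            # 'amat' is an assumed argument name
```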

Residential extensions data
A widely used model in time series analysis is the (stationary) autoregressive model of order p, which we denote as AR(p). In short, an AR(p) model is a linear regression model where the response variable corresponds to the current time series value and the independent variables represent the previous p values of the time series. See, for example, Fuller (1996) for a more thorough description. Thus, from this perspective, our suite of R functions is equally applicable to autoregressive time series analysis. In fact, WW-estimates are particularly appealing because outlying observations play a dual role as both response and explanatory variables. Terpstra et al. (2001) provides a good overview of WW-estimates and their application to autoregressive time series modeling.
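Concretely, once the series has been cast in regression form, the same functions apply; the following sketch (ar2.design is our own hypothetical helper) builds the lagged design matrix for a zero mean AR(2) fit.

```r
# Build y and X for a zero mean AR(2) regression; ar2.design is an
# illustrative helper, not part of the paper's suite of functions.
ar2.design <- function(z) {
  n <- length(z)
  list(y = z[3:n],                   # current values z_t, t = 3, ..., n
       X = cbind(z[2:(n - 1)],      # lag-1 values z_{t-1}
                 z[1:(n - 2)]))     # lag-2 values z_{t-2}
}
```

The resulting X and y can then be passed to wwest or wwfit exactly as in the regression examples.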
A widely cited example in the robust time series literature is a monthly time series (RESX), which originated at Bell Canada. The data set can be found in Rousseeuw and Leroy (1987, pp. 278-280). The series consists of the number of telephone installations in a given region and has two obvious outliers. The outliers are essentially attributed to bargain months in which telephone installations were free. Historically, the stationary zero-mean AR(2) model has been used to model the seasonally differenced series.
The WIL, GR, and HBR parameter estimates and standard errors for the AR(2) model are given in Table 1. Again, we used the function wwest to obtain these results. However, we have chosen to suppress the output in the interest of space. Note that there appears to be some discrepancy among the three estimates. In particular, the sign of β_2 is negative for the WIL estimate and positive for the GR and HBR estimates. This difference between the WIL estimate and the other WW-estimates can provide valuable insight into the type of outliers present.
For example, let {Y_t} denote the observed time series where Y_t = X_t + a_t, {X_t} is an underlying AR(p) time series, and {a_t} is an iid sequence of random variables that possess a mixture distribution, say (1 − q)δ_0 + qH. Here, q represents the proportion of outliers, δ_0 is a degenerate distribution at zero, and H is some contaminating distribution function. Now, when q = 0 this model corresponds to the Type II or Innovation Outlier (IO) model of Fox (1972). Under the IO model, outliers are introduced through the error distribution corresponding to {X_t} and consequently produce good leverage points in the sense that the fit yields a small residual at these points. In the Type I or Additive Outlier (AO) model (i.e. q > 0) of Fox (1972), outliers do not become part of the underlying model and, as a consequence, result in bad leverage points. Of course, any given time series may contain both IOs and AOs, either in isolation and/or in patches.

Now, as demonstrated by Terpstra et al. (2001), the WIL estimate is highly efficient under an IO model. However, when AOs are present the WIL estimate is not robust and can differ from other WW-estimates; in particular, the GR and HBR estimates. Thus, for time series analysis, it is important to both identify and distinguish between the different types of outliers. Indeed, the TDBETAS_R and CFITS_R,i diagnostics discussed in Section 3.4 are suitable for these types of comparisons. For this series, the TDBETAS_R values comparing the WIL fit to the GR and HBR fits are much larger than the (GR, HBR) TDBETAS_R. That is, the GR and HBR fits are much more similar. This indicates the presence of additive outliers. For example, recall that the WIL estimate does not downweight any observations, the GR estimate downweights all leverage points, and the HBR estimate attempts to downweight only bad leverage points. Since additive outliers typically produce bad leverage points, the GR and HBR estimates will tend to be similar, but different from the WIL estimate.
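A small simulation clarifies the AO mechanism. The following Python sketch (ours, for illustration; the parameter values are arbitrary) generates Y_t = X_t + a_t with a_t drawn from the mixture (1 − q)δ_0 + qH, taking the contaminating distribution H to be N(0, scale²):

```python
import numpy as np

def simulate_ao_ar1(n, phi=0.6, q=0.1, scale=10.0, rng=None):
    """Simulate an additive-outlier model Y_t = X_t + a_t, where {X_t}
    is AR(1) with parameter phi and a_t is zero with probability 1 - q
    and N(0, scale^2) with probability q (the contaminant H)."""
    rng = np.random.default_rng(rng)
    innov = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + innov[t]
    # mixture (1 - q) * delta_0 + q * H for the additive component
    a = np.where(rng.random(n) < q, rng.normal(0.0, scale, n), 0.0)
    return x + a, x

# With q = 0 no additive outliers occur, so the observed series equals
# the underlying AR(1) series (the innovation-outlier-only case).
y, x = simulate_ao_ar1(200, q=0.0, rng=1)
```

In the AO case (q > 0), a contaminated Y_t produces a large residual when it is the response and a bad leverage point when it reappears as a lagged covariate, which is exactly the configuration that separates the WIL fit from the GR and HBR fits.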
On the other hand, if all fits appear to be similar then any potential outliers will tend to be of the innovation variety. Lastly, we note that the CFITS_R,i diagnostics clearly indicate the two previously mentioned outliers.

Simulation applications
In this section, we present two statistical applications of our software. We would like to point out that, using our R functions, the effort to code these applications was quite minimal. The functions pertaining to these applications (bootsim.s, polysim1.s) are also available on our web site. The address is given in Section 4.1.

The bootstrap
For our first application, we consider bootstrapping the p-value of the test statistic (i.e. F_R) defined after (13). We have chosen to use the bootstrap where the model is rebuilt based on a sample of full model residuals; see Efron and Tibshirani (1993) for a discussion. A brief description of the general algorithm is as follows. (1) Fit model (1). Then obtain the full model residuals, say {ε̂_i}, and the value of the test statistic, F_R.
(2) Repeat steps (3) and (4) below N_B times.
(3) Obtain a bootstrap sample of size n, say {ε̂*_i}, by sampling with replacement from {ε̂_i}, and let Y*_i = ε̂*_i.
(4) Refit model (1) using the bootstrap responses {Y*_i} and compute the resulting value of the test statistic, say F*_R.
The bootstrap p-value is then given by p̂ = N_B^{-1} Σ_{j=1}^{N_B} I(F*_{R,j} ≥ F_R), where I(·) denotes the indicator function and F*_{R,j} is the value of the test statistic from the jth bootstrap sample.
Note that the regression equivariance property of the estimate and step (3) imply that the fitting, and hence testing, is performed under the assumption that H_0 is true. The R code for this bootstrap consists of a wrapper function (bootsim.s) around droptest. We used the above algorithm on an example involving data generated from the model Y_i = β_0 + β_1 X_1i + β_2 X_2i + β_3 X_3i + ε_i, where the {ε_i} and {X_ji}, j = 1, 2, 3, were iid N(0, 1) variates. We set all of the β_j equal to zero and used n = 30 for the sample size. The hypotheses considered were H_0: β_2 = β_3 = 0 versus H_1: β_2 ≠ 0 and/or β_3 ≠ 0.
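The algorithm above can be sketched as follows in Python (for illustration only; a classical LS F-statistic stands in for the drop in dispersion statistic F_R, which our R code computes via droptest, and the function names are ours):

```python
import numpy as np

def ls_fstat(Y, X_full, X_red):
    """Classical LS F-statistic for the reduction from the full design
    to the reduced design (a stand-in for the WW test statistic)."""
    def rss(X):
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ b
        return r @ r
    n = len(Y)
    q = X_full.shape[1] - X_red.shape[1]          # number of constraints
    df = n - X_full.shape[1]
    return ((rss(X_red) - rss(X_full)) / q) / (rss(X_full) / df)

def boot_pvalue(Y, X_full, X_red, n_boot=500, rng=None):
    """Residual bootstrap p-value: resample full-model residuals and
    set Y* = eps*, so the refits are performed with H0 in force."""
    rng = np.random.default_rng(rng)
    f_obs = ls_fstat(Y, X_full, X_red)
    b, *_ = np.linalg.lstsq(X_full, Y, rcond=None)
    resid = Y - X_full @ b                         # full-model residuals
    count = 0
    for _ in range(n_boot):
        y_star = rng.choice(resid, size=len(Y))    # Y*_i = eps*_i
        count += ls_fstat(y_star, X_full, X_red) >= f_obs
    return count / n_boot
```

Swapping the LS fit and F-statistic for a WW fit and the drop in dispersion statistic yields the wrapper logic of bootsim.s.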
The actual data set (eg.dat) can be obtained from our web site given in Section 4.1. For these data, the value of F_R was 1.217 with a p-value of 0.312 (based on the approximate F-distribution). Based on N_B = 500 bootstraps, we calculated the bootstrap p-value to be 0.294.
Next, we simulated the process 1000 times, drawing new data each time. Figure 8 displays the asymptotic p-values versus the bootstrap p-values for the test statistic F_R. In addition, Table 2 displays the empirical α levels for the nominal α values listed. We note that the α levels for the asymptotic version of the test are quite close to the nominal values, whereas those for the bootstrap version of the test appear to be somewhat liberal. Generally speaking, the plot on the left side of Figure 8 indicates that the p-values for the two tests are quite close. However, upon closer inspection of the region containing the smaller p-values (see the right side of Figure 8), we see that the asymptotic p-values tend to be slightly larger than the bootstrap p-values. In summary, this simulation suggests that both tests performed reasonably well.

The order of a polynomial model
In this application, we investigate an algorithm for the determination of the order of a polynomial model. Consider a polynomial model of the form Y_i = β_0 + β_1 X_i + β_2 X_i^2 + · · · + β_p X_i^p + ε_i, i = 1, 2, . . . , n, (19) where ε_1, ε_2, . . . , ε_n are iid with pdf f. Graybill (1976) presents the following algorithm for determining the order (i.e. p) of model (19).
Algorithm 6.2 (Graybill) Select a super order P so that the true order p is less than or equal to P. Select a significance level α. Set p = P.
(1) Fit model (19) of order p.
(2) Test H_0: β_p = 0 versus H_1: β_p ≠ 0 at level α.
(3) If H_0 is rejected, declare p to be the order and stop; else, set p = p − 1 and return to step (1).
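The step-down loop can be sketched as follows in Python (illustration only; an LS t-test on the leading coefficient stands in for the WW Wald and drop in dispersion tests, and the function name and the 1.96 cutoff are our choices):

```python
import numpy as np

def graybill_order(x, y, P, z_crit=1.96):
    """Step-down version of Graybill's algorithm: starting from the
    super order P, test the leading coefficient; stop at the first
    rejection and declare that order (0 if nothing is rejected)."""
    for p in range(P, 0, -1):
        X = np.column_stack([x ** k for k in range(p + 1)])  # 1, x, ..., x^p
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        sigma2 = (r @ r) / (len(y) - p - 1)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t = beta[-1] / np.sqrt(cov[-1, -1])       # t-statistic for beta_p
        if abs(t) > z_crit:                        # H0: beta_p = 0 rejected
            return p
    return 0
```

In the simulation below, each LS test is replaced by the corresponding call to wwest and wald, or to droptest, exactly as polysim1.s does.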
We conducted a pilot study of the powers of five tests for this algorithm: WIL (both the Wald test and the drop in dispersion test), GR, HBR, and LS. Using the functions described in Section 4.2, the coding for the simulation was straightforward. For example, the three weighted Wilcoxon Wald tests involved calls to wwest and wald, while the drop in dispersion test was performed with the droptest function. For LS, we used the lsfit and wald functions. The R wrapper (polysim1.s) to do the simulations can be downloaded from our web site given in Section 4.1.
For the pilot study, we considered only eight situations. Each situation used a sample size of n = 30. Four of the situations used N(0, 1) simulated errors, while the remaining four used contaminated normal simulated variates. For the contaminated variates, we set the contamination at 20% and the ratio of standard deviations, contaminated to good, at 10. The regression predictors were iid N(0, 1) variables. The first through fourth situations within each distribution consisted of a polynomial of degree 1 through 4, respectively. We set β_i = 0.4 for situation i of the normal errors and β_j = 0 for j ≠ i. The settings for the contaminated normals were the same, except 0.7 was substituted for 0.4. Finally, we used P = 4 for all situations and ran 1000 simulations for each situation.

Our interest was in how well the algorithm worked for each procedure. Table 3 reports the percentage of times that a particular procedure chose the correct order of the polynomial. The LS test did the best for normal errors, followed closely by the Wilcoxon drop in dispersion test, F_R. For contaminated normal variates, the LS test performed poorly relative to the Wilcoxon drop in dispersion test, the Wilcoxon Wald test, and the HBR test. Of the Wilcoxon procedures, the drop in dispersion test did the best. The poor behavior of the GR test confirms previous discussion of the poor efficiency of high breakdown estimators in detecting curvature in polynomial models. However, the HBR estimator recovers much of the efficiency that the GR estimator loses in the presence of curvature, although, in general, it did poorer than the Wilcoxon procedures.
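The contaminated normal variates used above are straightforward to generate. The following Python sketch (ours, for illustration) draws from the mixture (1 − ε)N(0, 1) + εN(0, 10²) with ε = 0.20, whose variance is 0.8(1) + 0.2(100) = 20.8:

```python
import numpy as np

def contaminated_normal(n, eps=0.20, ratio=10.0, rng=None):
    """Draw n variates from (1 - eps) * N(0, 1) + eps * N(0, ratio^2):
    each observation is 'bad' (SD = ratio) with probability eps."""
    rng = np.random.default_rng(rng)
    good = rng.standard_normal(n)
    bad = rng.normal(0.0, ratio, n)
    return np.where(rng.random(n) < eps, bad, good)

# Population SD is sqrt(20.8), roughly 4.56, versus 1 for the good part.
e = contaminated_normal(20000, rng=3)
```

The occasional draws from the wide component supply the gross errors against which the robust procedures in Table 3 are being stress-tested.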

Discussion and conclusion
As discussed in this paper, weighted Wilcoxon analyses can range from highly efficient to highly robust, depending on the weights employed. Nevertheless, this class of estimators has yet to be implemented in any mainstream statistical software packages. This paper addresses this issue with a suite of R functions. However, in principle, the algorithm used to obtain these analyses can be adapted to any statistical software package that has L 1 regression capabilities. For example, in S-PLUS, L 1 regression estimates are computed via the l1fit function. Therefore, upon making the appropriate substitutions to wwfit, one can readily use the functions in S-PLUS as well. In SAS, L 1 regression estimates can be computed using the IML procedure and the LAV subroutine. Furthermore, the weights defined in (4) and (5) can be computed using calls to MAD, MCD, and LTS.
However, some vigilance is necessary when considering this approach. For example, as discussed in Section 2.1, the Wilcoxon estimate minimizes the L 1 objective function applied to the differences of the residuals. In contrast, the L 1 regression estimate minimizes the sum of the absolute values of the residuals. Thus, the Wilcoxon and L 1 estimators are quite different. As outlined in Section 3.8 of HM (1998), the L 1 estimator is equivalent to an R-estimator based on sign scores. Hence, the efficiency properties of the L 1 estimator are the same as those associated with the median and sign test in location problems. In particular, the L 1 estimator for linear models has an asymptotic relative efficiency (ARE) of 0.637 relative to the least squares estimator when the errors have a normal distribution, while the Wilcoxon regression estimator has an ARE of 0.955. Of course, for very heavy tailed data, the L 1 estimator is generally more efficient.

Furthermore, suppose a user naively uses the inference results produced by an L 1 computing package for the corresponding WW-inferences. While such inference is appropriate for the L 1 estimator, it is not in general appropriate for the WW-estimator. For example, recall that D_WR(β) is invariant to location and therefore β_0 can not be directly estimated via the minimization. Hence, any displayed inference results for an intercept parameter would not be valid. This can be further illustrated by considering the standard errors for a simple linear regression model, say Y_i = β_0 + β_1 X_i + ε_i, i = 1, 2, . . . , n. For instance, the asymptotic standard error for the Wilcoxon estimate of β_1 is given by SE(β̂_1) = τ / [Σ_{i=1}^n (X_i − X̄)^2]^{1/2}, (20) where τ = (√12 E[f(ε_1)])^{-1}. However, in using an L 1 package for the computation of the denominator of (20), the X_i's would be replaced by the pairwise differences of the X_i's. Thus, an adjustment would need to be made. In general, the estimate of τ would also be different.
For example, the L 1 estimate is essentially an estimator of the reciprocal of the mode of the error distribution, whereas weighted differences of the residuals are used for the WW-estimate. Hence, the user would have to show that the estimate produced by the package is consistent for τ, which is unlikely for non-constant weighting schemes. Instead, we simply recommend that the user obtain the inference appropriate for the WW-estimator; that is, the inference produced by our R code.
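The distinction between the two objective functions, and the identity linking them, can be checked numerically. The following Python sketch (ours, for illustration) evaluates Jaeckel's dispersion with Wilcoxon scores a(i) = √12(i/(n + 1) − 1/2) alongside the L 1 norm of the pairwise residual differences; the two agree up to the constant √12/(2(n + 1)), whereas the plain L 1 objective Σ|e_i| does not:

```python
import numpy as np

def jaeckel_wilcoxon(e):
    """Jaeckel's dispersion with Wilcoxon scores
    a(i) = sqrt(12) * (i/(n+1) - 1/2); distinct residuals assumed."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    ranks = e.argsort().argsort() + 1            # rank of each e_i
    scores = np.sqrt(12.0) * (ranks / (n + 1.0) - 0.5)
    return np.sum(scores * e)

def pairwise_l1(e):
    """L1 norm of all pairwise differences |e_i - e_j|, i < j."""
    e = np.asarray(e, dtype=float)
    return np.abs(e[:, None] - e[None, :]).sum() / 2.0  # each pair once

e = np.array([2.0, -1.0, 0.5, 3.0, -2.5])
n = len(e)
lhs = jaeckel_wilcoxon(e)
rhs = np.sqrt(12.0) / (2.0 * (n + 1.0)) * pairwise_l1(e)
# lhs and rhs agree up to rounding error.
```

This is the sense in which an L 1 routine can be used to compute WW-estimates: the L 1 objective is applied to the (weighted) pairwise differences of the residuals, not to the residuals themselves.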