An Object Oriented Framework for Robust Multivariate Analysis

This introduction to the R package rrcov is a (slightly) modified version of Todorov and Filzmoser (2009), published in the Journal of Statistical Software. Taking advantage of the S4 class system of the programming environment R, which facilitates the creation and maintenance of reusable and modular components, an object oriented framework for robust multivariate analysis was developed. The framework resides in the packages robustbase and rrcov and includes an almost complete set of algorithms for computing robust multivariate location and scatter, various robust methods for principal component analysis, as well as robust linear and quadratic discriminant analysis. The design of these methods follows common patterns which we call statistical design patterns, in analogy to the design patterns widely used in software engineering. The application of the framework to data analysis as well as possible extensions by the development of new methods is demonstrated on examples which themselves are part of the package rrcov.


Introduction
Outliers are present in virtually every data set in any application domain, and the identification of outliers has a history of more than a hundred years. Many researchers in science, industry and economics work with huge amounts of data, which increases the possibility of anomalous data and makes their (visual) detection more difficult. Taking into account the multivariate aspect of the data, the outlyingness of the observations can be measured by the Mahalanobis distance, which is based on location and scatter estimates of the data set. In order to avoid the masking effect, robust estimates of these parameters are called for; moreover, they must possess a positive breakdown point. The estimates of the multivariate location vector µ and the scatter matrix Σ are also a cornerstone in the analysis of multidimensional data, since they form the input to many classical multivariate methods. The most common estimators of multivariate location and scatter are the sample mean x̄ and the sample covariance matrix S, i.e., the corresponding MLE estimates. These estimates are optimal if the data come from a multivariate normal distribution but are extremely sensitive to the presence of even a few outliers (atypical values, anomalous observations, gross errors) in the data. If outliers are present in the input data they will influence the estimates x̄ and S and subsequently worsen the performance of the classical multivariate procedures based on these estimates. Therefore it is important to consider robust alternatives to these estimators, and indeed in the last two decades much effort was devoted to the development of affine equivariant estimators possessing a high breakdown point.
The most widely used estimators of this type are the minimum covariance determinant (MCD) estimator of Rousseeuw (1985), for which also a fast computing algorithm was constructed (Rousseeuw and Van Driessen 1999), the S estimators (Davies 1987) and the Stahel-Donoho estimator introduced by Stahel (1981a,b) and Donoho (1982) and studied by Maronna and Yohai (1995). If we give up the requirement for affine equivariance, estimators like the one of Maronna and Zamar (2002) are available and the reward is an extreme gain in speed. Substituting the classical location and scatter estimates by their robust analogues is the most straightforward method for robustifying many multivariate procedures like principal components, discriminant and cluster analysis, canonical correlation, etc. The reliable identification of multivariate outliers, which is an important task in itself, is another approach to robustifying many classical multivariate methods.
Some of these estimates and procedures became available in popular statistical packages like S-PLUS, SAS, MATLAB as well as in R, but nevertheless it is recognized that the robust methods have not yet replaced the ordinary least squares techniques as one could expect (Morgenthaler 2007; Stromberg 2004). One reason is the lack of easily accessible and easy to use software, that is, software which presents the robust procedures as extensions to the classical ones: similar input and output, reasonable defaults for most of the estimation options, and visualization tools. As far as ease of access is concerned, the robust statistical methods should be implemented in the freely available statistical software package R (R Development Core Team 2009), which provides a powerful platform for the development of statistical software. These requirements have been defined in the project "Robust Statistics and R", see http://www.statistik.tuwien.ac.at/rsr/, and a first step in this direction was the initial development of the collaborative package robustbase (Rousseeuw et al. 2009), with the intention that it becomes the essential robust statistics R package covering the methods described in the recent book by Maronna et al. (2006).
During the last decades the object oriented programming paradigm has revolutionized the style of software system design and development. A further step in software reuse are the object oriented frameworks (see Gamma et al. 1995), which provide technology for reusing both the architecture and the functionality of software components. Taking advantage of the S4 class system (Chambers 1998) of R, which facilitates the creation of reusable and modular components, an object oriented framework for robust multivariate analysis was implemented. Apart from robust estimation of multivariate location and scatter, the framework provides methods for robust principal component analysis (PCA), including ROBPCA of Hubert et al. (2005), spherical principal components (SPC) of Locantore et al. (1999), and the projection pursuit algorithms of Croux and Ruiz-Gazen (2005) and Croux et al. (2007). Further applications implemented in the framework are linear and quadratic discriminant analysis (see Todorov and Pires 2007, for a review), multivariate tests (Todorov and Filzmoser 2010) and outlier detection tools.
The application of the framework to data analysis as well as the development of new methods is illustrated on examples which themselves are part of the package. Some issues of the object oriented paradigm as applied to the R object model (naming conventions, access methods, coexistence of S3 and S4 classes, usage of UML, etc.) are discussed. The framework is implemented in the R packages robustbase and rrcov (Todorov 2009), which are available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org under the GNU General Public License.
The rest of the paper is organized as follows. In Section 2 the design principles and the structure of the framework are presented and some related object oriented concepts are discussed. As the main tool for modeling the robust estimation methods a statistical design pattern is proposed. Section 3 facilitates a quick start with an example session giving a brief overview of the framework. Section 4 describes the robust multivariate methods, their computation and implementation. Sections 4.1, 4.2 and 4.3 are dedicated to the estimation of multivariate location and scatter, principal component analysis and discriminant analysis, respectively. For each domain the object model, the available visualization tools, an example, and other relevant information are presented. We conclude in Section 5 with a discussion and an outline of future work.

Design approach and structure of the framework
In classical multivariate statistics we rely on parametric models based on assumptions about the structural and the stochastic parts of the model for which optimal procedures are derived, like the least squares estimators and the maximum likelihood estimators. The corresponding robust methods can be seen as extensions to the classical ones which can cope with deviations from the stochastic assumptions, thus mitigating the dangers for classical estimators. The developed statistical procedures will remain reliable and reasonably efficient even when such deviations are present. For example, in the case of location and covariance estimation the classical theory yields the sample mean x̄ and the sample covariance matrix S, i.e., the corresponding MLE estimates, as an optimal solution. One (out of many) robust alternatives is the minimum covariance determinant estimator. When we consider this situation from an object oriented design point of view we can think of an abstract base class representing the estimation problem, a concrete realization of this object (the classical estimates), and a second concrete derivative of the base class representing the MCD estimates. Since there exist many other robust estimators of multivariate location and covariance which share common characteristics we would prefer to add one more level of abstraction by defining an abstract "robust" object from which all other robust estimators are derived. We encounter a similar pattern in most of the other multivariate statistical methods like principal component analysis, linear and quadratic discriminant analysis, etc., and we will call it a statistical design pattern. A schematic representation as a UML diagram is shown in Figure 2. The following simple example demonstrates the functionality.
We start with a generic object model of a robust and the corresponding classical multivariate method with all the necessary interfaces and functionalities and then concretize it to represent the desired class hierarchy. The basic idea is to define an abstract S4 class which has as slots the common data elements of the result of the classical method and its robust counterparts (e.g., Pca). For this abstract class we can implement the standard R generic functions like print(), summary(), plot() and maybe also predict(). Now we can derive and implement a concrete class which will represent the classical method, say PcaClassic. Further we derive another abstract class which represents a potential robust method we are going to implement, e.g., PcaRobust; it is abstract because we want to have a "placeholder" for the robust methods we are going to develop next. The generic functions that we implemented for the class Pca are still valid for PcaRobust but whenever necessary we can override them with new functionality. Now we have the necessary platform, and of course we have diligently documented everything we have implemented so far; this is our investment in the future development of robust methods from this family. The framework at its current expansion stage provides such a platform for several important families of multivariate methods. It is time to dedicate our effort to the development and implementation of our new robust method/class, say PcaHubert, and only to this. Here comes the first obvious benefit from the framework: we do not need to care for the implementation of print(), summary(), plot() and predict(), nor for their documentation or testing.
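The class hierarchy just described can be sketched in S4 as follows. This is a minimal illustration with invented slot names; the actual class definitions in rrcov contain more slots, validity checks and documentation.

```r
## Abstract base class holding the common results of a PCA;
## both the classical and the robust variants extend it.
setClass("Pca", representation("VIRTUAL",
    loadings    = "matrix",    # matrix of variable loadings
    eigenvalues = "numeric",   # the eigenvalues
    scores      = "matrix"))   # scores of the observations

## Concrete class representing the classical method
setClass("PcaClassic", contains = "Pca")

## Abstract "placeholder" from which all robust PCA methods derive
setClass("PcaRobust", representation("VIRTUAL"), contains = "Pca")

## A generic implemented once for the whole hierarchy:
## every derived class inherits it and may override it.
setMethod("summary", "Pca", function(object, ...) {
    cat("Eigenvalues:\n")
    print(object@eigenvalues)
    invisible(object)
})
```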
In contrast to the S3 class system, the S4 system requires the creation of objects to be done by the new() function, which will perform the necessary validity checks. We go one step further and require that the new() function is not used directly but only through special functions known in R as generating functions, or as constructors in the conventional object oriented programming languages. A constructor function has the same name as the corresponding class, takes the estimation options as parameters, organizes the necessary computations and returns an object of the class containing the results of the computation. It can also take as a parameter a control object, which itself is an S4 object and contains the estimation options. More details on the generating functions and their application for structuring the user interface can be found in Ruckdeschel et al. (2009).
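The constructor/control pattern can be sketched as follows. The slot names and the simplified argument list are illustrative; the real CovMcd() in rrcov accepts more options.

```r
## Control object bundling the estimation options for the MCD
setClass("CovControlMcd", representation(
    alpha = "numeric",     # proportion of the sample entering the subset
    nsamp = "numeric"))    # number of initial subsets to draw

## Generating function: same name as the class it creates. It takes
## the options (or a control object), organizes the computation and
## returns the resulting S4 object; new() is never called by the user.
CovMcd <- function(x, alpha = 1/2, nsamp = 500,
                   control = new("CovControlMcd",
                                 alpha = alpha, nsamp = nsamp)) {
    ## ... perform the FAST-MCD computations here ...
    ## new("CovMcd", center = ..., cov = ..., control = control)
}
```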
The main part of the framework is implemented in the package rrcov but it relies on code in the packages robustbase and pcaPP. The structure of the framework and its relation to other R packages is shown in Figure 2. The framework can be used by other packages, like for example by robust (see Wang et al. 2008), or can be further extended. In Figure 2 a hypothetical package rrcovNA is shown, which could extend the available robust multivariate methods with options for dealing with missing values.
In the rest of this section some object-oriented programming (OOP) concepts will be discussed which are essential for understanding the framework.

UML diagrams
Throughout this paper we exploit UML class diagrams to give a clear picture of the framework and its components. UML stands for Unified Modeling Language, an object-oriented system of notation which has evolved from the previous works of Grady Booch, James Rumbaugh and Ivar Jacobson to become a tool accepted by the Object Management Group (OMG) as the standard for modeling object oriented programs (see OMG 2009a,b). A class diagram models the structure and contents of a system by depicting classes, packages, objects and the relations among them, with the aim to increase the ease of understanding the considered application. A class is denoted by a box with three compartments which contain the name, the attributes (slots) and the operations (methods) of the class, respectively. A class name in italics indicates that the class is abstract. The bottom two compartments can be omitted, or they can contain only the key attributes and operations which are useful for understanding the particular diagram. Each attribute is followed by its type and each operation by the type of its return value. We use R types like numeric, logical, vector, matrix, etc., but the type can also be the name of an S4 class.
Relationships between classes are denoted by lines or arrows of different forms. The inheritance relationship is depicted by a large empty triangular arrowhead pointing to the base class. Composition means that one class contains another one as a slot (not to be mistaken with the keyword "contains" signalling inheritance in R). This relation is represented by an arrow with a solid diamond on the side of the composed class. If a class "uses" another one or depends on it, the classes are connected by a dashed arrow (dependence relation). Packages can also be present in a class diagram; in our case they correspond more or less to R packages, and are shown as tabbed boxes with the name of the package written in the tab (see Figure 2).
All UML diagrams of the framework were created with the open source UML tool ArgoUML (Robbins 1999; Robbins and Redmiles 2000) which is available for download from http://argouml.tigris.org/.

Design patterns
Design patterns are usually defined as general solutions to recurring design problems and refer to both the description of a solution and an instance of that solution solving a particular problem. The current use of the term design patterns originates in the writings of the architect Christopher Alexander devoted to urban planning and building architecture (Alexander et al. 1977) but it was brought to the software development community by the seminal book of Gamma et al. (1995).
A design pattern can be seen as a template for how to solve a problem which can be used in many different situations. Object-Oriented design patterns are about classes and the relationships between classes or objects at abstract level, without defining the final classes or objects of the particular application. In order to be usable, design patterns must be defined formally and the documentation, including a preferably evocative name, describes the context in which the pattern is used, the pattern structure, the participants and collaboration, thus presenting the suggested solution.
Design patterns are not limited to architecture or software development but can be applied in any domain where solutions are searched for. During the development of the here presented framework several design patterns were identified, which we prefer to call statistical design patterns. The first one was already described earlier in this section and captures the relations among a classical and one or more alternative robust multivariate estimators. Another candidate is the control object encapsulating the estimation parameters and a third one is the factory-like construct which suggests selection of a robust estimation method and creation of the corresponding objects based on the data set characteristics (see Section 4.1). The formal description of these design patterns is beyond the scope of this work and we will limit the discussion to several examples.

Accessor methods
One of the major characteristics and advantages of object oriented programming is encapsulation. Unfortunately real encapsulation (information hiding) is missing in R, but as far as access to the member variables is concerned this can be mitigated by defining accessor methods (i.e., methods used to examine or modify the slots (member variables) of a class) and "advising" the users to use them instead of accessing the slots directly. The usual way of defining accessor functions in R is to use the same name as the name of the corresponding slot. For example for the slot a these are:

R> cc <- a(obj)
R> a(obj) <- cc
In many cases this is not possible because of conflicts with other existing functions. For example it is not possible to define an accessor function cov() for the slot cov of class Cov, since the function cov() already exists in base R. Also it is not immediately apparent whether a slot is "read only" or can be modified by the user (unfortunately, as already mentioned, every slot in R can be modified by simply using obj@a <- cc). In rrcov a notation was adopted which is usual in Java: the accessors are defined as getXxx() and setXxx() (if a setXxx() method is missing, we are "not allowed" to change the slot). The use of accessor methods allows one to perform computations on demand (getMah(mcd) computes the Mahalanobis distances, stores them in the object and returns them) or even to have "virtual" slots which are not stored in the object at all (e.g., getCorr(mcd) computes and returns the correlation matrix on each call without storing it).
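Such accessors are ordinary S4 generics. A minimal sketch, using a hypothetical class with a single slot center, could look like this:

```r
## Illustrative class with one slot
setClass("CovDemo", representation(center = "numeric"))

## Read accessor: getCenter() instead of a conflicting name like mean()
setGeneric("getCenter", function(obj) standardGeneric("getCenter"))
setMethod("getCenter", "CovDemo", function(obj) obj@center)

## No setCenter() is defined, signalling that the slot is "read only".
```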

Naming conventions
There is no agreed naming convention (coding rules) in R, but to facilitate the usage of the framework several simple rules are followed, in the spirit of the recommended Sun Java coding style (see http://java.sun.com/docs/codeconv/):
• Class, function, method and variable names are alphanumeric, do not contain "-" or "." but rather use interchanging lower and upper case.
• Class names start with an uppercase letter.
• Methods, functions, and variables start with a lowercase letter.
• Exceptions are functions returning an object of a given class (i.e., generating functions or constructors)-they have the same name as the class.
• Variables and methods which are not intended to be seen by the user-i.e., private members-start with ".".
• Violate these rules whenever necessary to maintain compatibility.

Example session
In this section we will introduce the base functionalities of the framework by an example session. First of all we have to load the package rrcov, which will cause all necessary packages to be loaded too. The framework includes many example data sets but here we will load only those which will be used throughout the following examples. For the rest of the paper it will be assumed that the package has already been loaded. Most of the multivariate statistical methods are based on estimates of multivariate location and covariance, therefore these estimates play a central role in the framework. We will start with computing the robust minimum covariance determinant estimate for the data set delivery from the package robustbase. The data set (see Rousseeuw and Leroy 1987, Table 23, p. 155) contains delivery time data: 25 observations with 3 variables. The aim is to explain the time required to service a vending machine (Y) by means of the number of products stocked (X1) and the distance walked by the route driver (X2). For this example we will consider only the X part of the data set. After computing its robust location and covariance matrix using the MCD method implemented in the function CovMcd() we can print the results by calling the default show() method on the returned object mcd, as well as summary information by the summary() method. The standard output contains the robust estimates of location and covariance. The summary output contains additionally the eigenvalues of the covariance matrix and the robust distances of the data items (Mahalanobis type distances computed with the robust location and covariance instead of the sample ones). Now we will show one of the available plots by calling the plot() method; in Figure 3 the Distance-Distance plot introduced by Rousseeuw and van Zomeren (1991) is presented, which plots the robust distances versus the classical Mahalanobis distances and allows one to classify the observations and identify potential outliers.
The description of this plot as well as examples of more graphical displays based on the covariance structure will be shown in Section 4.1.
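The session described above can be reproduced along the following lines (output omitted; the exact printed results and the plot option names may vary slightly between package versions):

```r
R> library("rrcov")
R> data("delivery", package = "robustbase")
R> delivery.x <- delivery[, 1:2]       # use only the X part of the data
R> mcd <- CovMcd(delivery.x)           # robust MCD location and covariance
R> mcd                                 # calls the default show() method
R> summary(mcd)                        # adds eigenvalues, robust distances
R> plot(mcd, which = "dd")             # Distance-Distance plot (Figure 3)
```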

Apart from the demonstrated MCD method the framework provides many other robust estimators of multivariate location and covariance, actually almost all of the well established estimates in the contemporary robustness literature. The most fascinating feature of the framework is that one will get the output and the graphs in the same format, whatever estimation method was used. For example the following code lines will compute the S estimates for the same data set and provide the standard and extended output (not shown here). Nevertheless, this variety of methods could pose a serious hurdle for the novice and could be quite tedious even for the experienced user. Therefore a shortcut is provided too: the function CovRobust() can be called with a parameter specifying any of the available estimation methods, but if this parameter is omitted the function will decide, on the basis of the data size, which method to use. As we see in the example below, in this case it selects the Stahel-Donoho estimates. For details and further examples see Section 4.1.
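The code lines referred to above could look as follows (a sketch; output not shown, and delivery.x denotes the X part of the delivery data as before):

```r
R> ## S estimates for the same data set, same output format as CovMcd
R> est <- CovSest(delivery.x)
R> est
R> summary(est)

R> ## Let CovRobust() choose the method based on the data size;
R> ## for a small data set like this it selects the Stahel-Donoho estimates
R> CovRobust(delivery.x)
```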

Multivariate location and scatter
The framework provides an almost complete set of estimators for multivariate location and scatter with high breakdown point. The first such estimator was proposed by Stahel (1981a,b) and Donoho (1982) and it is recommended for small data sets, but the most widely used high breakdown estimator is the minimum covariance determinant estimate (Rousseeuw 1985). Several algorithms for computing the S estimators (Davies 1987) are provided (Ruppert 1992;Woodruff and Rocke 1994;Rocke 1996;Salibian-Barrera and Yohai 2006). The minimum volume ellipsoid (MVE) estimator (Rousseeuw 1985) is also included since it has some desirable properties when used as initial estimator for computing the S estimates (see Maronna et al. 2006, p. 198). In the rest of this section the definitions of the different estimators of location and scatter will be briefly reviewed and the algorithms for their computation will be discussed. Further details can be found in the relevant references. The object model is presented and examples of its usage, as well as further examples of the graphical displays are given.

The Minimum covariance determinant estimator and its computation
The MCD estimator for a data set {x_1, …, x_n} in ℜ^p is defined by that subset {x_i1, …, x_ih} of h observations whose covariance matrix has the smallest determinant among all possible subsets of size h. The MCD location and scatter estimates T_MCD and C_MCD are then given as the arithmetic mean and a multiple of the sample covariance matrix of that subset,

T_MCD = (1/h) Σ_{j=1}^{h} x_ij,
C_MCD = c_ccf c_sscf (1/(h−1)) Σ_{j=1}^{h} (x_ij − T_MCD)(x_ij − T_MCD)^⊤.   (1)

The multiplication factors c_ccf (consistency correction factor) and c_sscf (small sample correction factor) are selected so that C_MCD is consistent at the multivariate normal model and unbiased at small samples (see Butler et al. 1993; Croux and Haesbroeck 1999; Pison et al. 2002; Todorov 2008). A recommendable choice for h is ⌊(n + p + 1)/2⌋ because then the breakdown point of the MCD is maximized, but any integer h within the interval [(n + p + 1)/2, n] can be chosen, see Rousseeuw and Leroy (1987). Here ⌊z⌋ denotes the largest integer not greater than z. If h = n then the MCD location and scatter estimates T_MCD and C_MCD reduce to the sample mean and covariance matrix of the full data set.
The computation of the MCD estimator is far from trivial. The naive algorithm would proceed by exhaustively investigating all subsets of size h out of n to find the subset with the smallest determinant of its covariance matrix, but this is feasible only for very small data sets. Initially the MCD was neglected in favor of the MVE because the simple resampling algorithm was more efficient for the MVE. Meanwhile several heuristic search algorithms (see Todorov 1992; Woodruff and Rocke 1994; Hawkins 1994) and exact algorithms (Agulló 1996) were proposed, but now a very fast algorithm due to Rousseeuw and Van Driessen (1999) exists and this algorithm is usually used in practice. The algorithm is based on the C-step which moves from one approximation (T_1, C_1) of the MCD estimate of a data set X = {x_1, …, x_n} to the next one (T_2, C_2) with possibly lower determinant det(C_2) ≤ det(C_1) by computing the distances d_1, …, d_n relative to (T_1, C_1), i.e.,

d_i = sqrt((x_i − T_1)^⊤ C_1^{−1} (x_i − T_1)),  i = 1, …, n,   (2)

and then computing (T_2, C_2) for those h observations which have the smallest distances. "C" in C-step stands for "concentration" since we are looking for a more "concentrated" covariance matrix with lower determinant. Rousseeuw and Van Driessen (1999) have proven a theorem stating that the iteration process given by the C-step converges in a finite number of steps to a (local) minimum. Since there is no guarantee that the global minimum of the MCD objective function will be reached, the iteration must be started many times from different initial subsets to obtain an approximate solution. The procedure is very fast for small data sets, but to make it really "fast" also for large data sets several computational improvements are used.
• initial subsets: It is possible to restart the iterations from randomly generated subsets of size h, but in order to increase the probability of drawing subsets without outliers, p + 1 points are selected randomly. These p + 1 points are used to compute (T 0 , C 0 ).
Then the distances d_1, …, d_n are computed and sorted in increasing order. Finally the first h observations are selected to form the initial h-subset H_0.
• reduced number of C-steps: The C-step involving the computation of the covariance matrix, its determinant and the relative distances, is the most computationally intensive part of the algorithm. Therefore instead of iterating to convergence for each initial subset only two C-steps are performed and the 10 subsets with lowest determinant are kept. Only these subsets are iterated to convergence.
• partitioning: For large n the computation time of the algorithm increases mainly because all n distances given by Equation (2) have to be computed at each iteration. An improvement is to partition the data set into a maximum of say five subsets of approximately equal size (but not larger than say 300) and iterate in each subset separately. The ten best solutions for each data set are kept and finally only those are iterated on the complete data set.
• nesting: Further decrease of the computational time can be achieved for data sets with n larger than say 1500 by drawing 1500 observations without replacement and performing the computations (including the partitioning) on this subset. Only the final iterations are carried out on the complete data set. The number of these iterations depends on the actual size of the data set at hand.
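A single C-step of the algorithm can be sketched in a few lines of R (an illustrative implementation only; the package uses compiled code with the refinements listed above):

```r
## One concentration step: from the current h-subset, compute mean and
## covariance, then keep the h observations with the smallest distances
## as in Equation (2); det(cov) cannot increase from step to step.
cstep <- function(X, subset, h) {
    T1 <- colMeans(X[subset, , drop = FALSE])
    C1 <- cov(X[subset, , drop = FALSE])
    d  <- mahalanobis(X, center = T1, cov = C1)  # squared distances
    order(d)[seq_len(h)]                         # indices of new h-subset
}

## The full algorithm would repeat cstep() from many random initial
## subsets until the subset (and hence the determinant) stops changing.
```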
The MCD estimator is not very efficient at normal models, especially if h is selected so that the maximal breakdown point is achieved. To overcome the low efficiency of the MCD estimator, a reweighted version can be used. For this purpose a weight w_i is assigned to each observation x_i: w_i = 1 if the robust distance of x_i relative to the raw MCD estimates (T_MCD, C_MCD) does not exceed sqrt(χ²_{p,0.975}), and w_i = 0 otherwise. Then the reweighted estimates are computed as

T_R = (1/ν) Σ_{i=1}^{n} w_i x_i,
C_R = c_r.ccf c_r.sscf (1/(ν−1)) Σ_{i=1}^{n} w_i (x_i − T_R)(x_i − T_R)^⊤,   (3)

where ν is the sum of the weights, ν = Σ_{i=1}^{n} w_i. Again, the multiplication factors c_r.ccf and c_r.sscf are selected so that C_R is consistent at the multivariate normal model and unbiased at small samples (see Pison et al. 2002; Todorov 2008, and the references therein). These reweighted estimates (T_R, C_R), which have the same breakdown point as the initial (raw) estimates but better statistical efficiency, are computed and used by default.

The Minimum volume ellipsoid estimates
The minimum volume ellipsoid estimator searches for the ellipsoid of minimal volume containing at least half of the points in the data set X. The location estimate is then defined as the center of this ellipsoid and the covariance estimate is provided by its shape. Formally the estimates are defined as those T_MVE, C_MVE that minimize det(C) subject to

#{i : (x_i − T)^⊤ C^{−1} (x_i − T) ≤ c} ≥ ⌊(n + p + 1)/2⌋,

where # denotes cardinality. The constant c is chosen as χ²_{p,0.5}. The search for an approximate solution is made over ellipsoids determined by the covariance matrix of p + 1 of the data points and by applying a simple but effective improvement of the sub-sampling procedure as described in Maronna et al. (2006), p. 198. Although there exists no formal proof of this improvement (as for the MCD and LTS), simulations show that it can be recommended as an approximation of the MVE. The MVE was the first popular high breakdown point estimator of location and scatter, but later it was replaced by the MCD, mainly because of the availability of an efficient algorithm for the latter's computation (Rousseeuw and Van Driessen 1999). Recently the MVE gained importance as initial estimator for S estimation because of its small maximum bias (see Maronna et al. 2006, Table 6.2, p. 196).

The Stahel-Donoho estimator
The first multivariate equivariant estimator of location and scatter with high breakdown point was proposed by Stahel (1981a,b) and Donoho (1982) but became better known after the analysis of Maronna and Yohai (1995). For a data set X = {x_1, …, x_n} in ℜ^p it is defined as a weighted mean and covariance matrix of the form given by Equation (3), where the weight w_i of each observation is inversely proportional to the "outlyingness" of the observation. Let the univariate outlyingness of a point x_i with respect to the data set X along a vector a ∈ ℜ^p, a ≠ 0, be given by

r(x_i, a) = |a^⊤ x_i − m(a^⊤ X)| / s(a^⊤ X),

where a^⊤ X is the projection of the data set X on a and the functions m() and s() are robust univariate location and scale statistics, for example the median and MAD, respectively. Then the multivariate outlyingness of x_i is defined by

r_i = r(x_i) = max_a r(x_i, a).

The weights are computed by w_i = w(r_i) where w(r) is a nonincreasing function of r and w(r) and w(r)r² are bounded. Maronna and Yohai (1995) use the weights

w(r) = min(1, (c/r)²),

with c = sqrt(χ²_{p,β}) and β = 0.95, which are known in the literature as "Huber weights". Exact computation of the estimator is not possible and an approximate solution is found by subsampling a large number of directions a and computing the outlyingness measures r_i, i = 1, …, n for them. For each subsample of p points the vector a is taken as the norm 1 vector orthogonal to the hyperplane spanned by these points. It has been shown by simulations (Maronna et al. 2006) that one step reweighting does not improve the estimator.
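The projection based outlyingness can be approximated in a few lines of R. The sketch below samples uniformly random directions instead of the subsample based directions used by the actual algorithm, so it is illustrative only:

```r
## Approximate Stahel-Donoho outlyingness by sampling random directions:
## for each direction project the data and take |z - median| / MAD,
## keeping the maximum over all sampled directions for each point.
outlyingness <- function(X, ndir = 250) {
    X <- as.matrix(X)
    r <- rep(0, nrow(X))
    for (k in seq_len(ndir)) {
        a <- rnorm(ncol(X))
        a <- a / sqrt(sum(a^2))     # random unit direction
        z <- drop(X %*% a)          # projected data a'X
        r <- pmax(r, abs(z - median(z)) / mad(z))
    }
    r
}
```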

Orthogonalized Gnanadesikan/Kettenring
The MCD estimator and all other known affine equivariant high breakdown point estimates are solutions to a highly non-convex optimization problem and as such pose a serious computational challenge. Much faster estimates with high breakdown point can be computed if one gives up the requirement of affine equivariance of the covariance matrix. Such an algorithm was proposed by Maronna and Zamar (2002); it is based on the very simple robust bivariate covariance estimator s_jk proposed by Gnanadesikan and Kettenring (1972) and studied by Devlin et al. (1981). For a pair of random variables Y_j and Y_k and a standard deviation function σ(), s_jk is defined as

s_jk = (1/4) (σ(Y_j + Y_k)² − σ(Y_j − Y_k)²).   (8)

If a robust function is chosen for σ() then s_jk is also robust, and an estimate of the covariance matrix can be obtained by computing each of its elements s_jk for each j = 1, …, p and k = 1, …, p using Equation (8). This estimator does not necessarily produce a positive definite matrix (although it is symmetric) and it is not affine equivariant. Maronna and Zamar (2002) overcome the lack of positive definiteness by the following steps:
• Transform the data matrix X to Y by dividing each column by a robust scale, Y = XD^{−1} with D = diag(σ(X_1), …, σ(X_p)), where X_l, l = 1, …, p are the columns of X.
• Compute the matrix U = (u_jl) with u_jl = s(Y_j, Y_l) for j ≠ l and u_jj = 1. Here Y_l, l = 1, …, p are the columns of the transformed data matrix Y and s(., .) is a robust estimate of the covariance of two random variables like the one in Equation (8).
• Obtain the "principal component decomposition" of Y by decomposing U = EΛE ⊤ where Λ is a diagonal matrix Λ = diag(λ 1 , . . . , λ p ) with the eigenvalues λ j of U and E is a matrix with columns the eigenvalues e j of U .
This can be iterated by computing C OGK and T OGK for Z = {z 1 , . . . , z n } obtained in the last step of the procedure and then transforming back to the original coordinate system. Simulations (Maronna and Zamar 2002) show that iterations beyond the second did not lead to improvement.
Similarly as for the MCD estimator, a one-step reweighting can be performed using Equations (3), but the weights w_i are based on the 0.9 quantile of the χ²_p distribution (instead of 0.975) and the correction factors ccf and sscf are not used.
In order to complete the algorithm we need a robust and efficient location function m() and scale function σ(); one proposal is given in Maronna and Zamar (2002). Further, the robust estimate s() of the covariance between two random variables given by Equation (8) can be replaced by another one. Two such functions are predefined in the framework, but the user can also supply their own function as a parameter.
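The orthogonalization steps above can be sketched as follows. This Python fragment is a simplified illustration (function names are ours, the MAD stands in for the tuned scale of Maronna and Zamar, and no reweighting step is included), not the rrcov implementation.

```python
import numpy as np

def mad(x):
    # median absolute deviation, rescaled for consistency at the normal
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def gk_cov(yj, yk):
    # Gnanadesikan-Kettenring: s_jk = 1/4 * (sigma(Yj+Yk)^2 - sigma(Yj-Yk)^2)
    return 0.25 * (mad(yj + yk) ** 2 - mad(yj - yk) ** 2)

def ogk(X, iters=2):
    # two iterations, as recommended by Maronna and Zamar (2002)
    Z = X.copy()
    A = np.eye(X.shape[1])
    for _ in range(iters):
        d = np.array([mad(col) for col in Z.T])  # columnwise robust scales
        Y = Z / d                                # standardized data
        p = Y.shape[1]
        U = np.eye(p)
        for j in range(p):
            for k in range(j + 1, p):
                U[j, k] = U[k, j] = gk_cov(Y[:, j], Y[:, k])
        lam, E = np.linalg.eigh(U)               # U = E Lambda E'
        Z = Y @ E                                # project on the eigenvectors
        A = A @ np.diag(d) @ E                   # accumulate the back-transform
    nu = np.median(Z, axis=0)
    Gamma = np.diag([mad(col) ** 2 for col in Z.T])
    T = A @ nu                                   # location in original coordinates
    C = A @ Gamma @ A.T                          # positive semidefinite scatter
    return T, C
```

Because Γ is diagonal with positive entries and A is nonsingular, the returned scatter matrix is positive definite by construction.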

S estimates
S estimators of µ and Σ were introduced by Davies (1987) and further studied by Lopuhaä (1989) (see also Rousseeuw and Leroy 1987, p. 263). For a data set of p-variate observations {x_1, . . . , x_n} an S estimate (T, C) is defined as the solution that minimizes σ(d_1, . . . , d_n), where d_i² = (x_i − T)⊤C^{−1}(x_i − T) and det(C) = 1. Here σ = σ(z) is the M-scale estimate of a data set z = {z_1, . . . , z_n} defined as the solution of (1/n)Σρ(z_i/σ) = δ, where ρ is nondecreasing, ρ(0) = 0 and ρ(∞) = 1, and δ ∈ (0, 1). An equivalent definition is to find the vector T and a positive definite symmetric matrix C that minimize det(C) subject to (1/n)Σρ(d_i/σ) = δ, with the above d_i and ρ.
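Since the left-hand side of the M-scale equation (1/n)Σρ(z_i/σ) = δ is nonincreasing in σ, the scale can be found by simple bisection. The following Python sketch (ours, for illustration only) uses the Tukey biweight ρ with the standard 50% breakdown tuning c ≈ 1.547, δ = 0.5.

```python
import numpy as np

def rho_biweight(t, c=1.547):
    # Tukey biweight, scaled so rho(0) = 0 and rho(t) = 1 for |t| >= c;
    # c = 1.547 together with delta = 0.5 gives a 50% breakdown point
    u = np.minimum((t / c) ** 2, 1.0)
    return 3 * u - 3 * u ** 2 + u ** 3   # equals 1 - (1 - u)^3

def m_scale(z, delta=0.5, c=1.547):
    # solve (1/n) * sum rho(z_i / sigma) = delta for sigma by bisection;
    # the left-hand side is nonincreasing in sigma
    z = np.abs(np.asarray(z, dtype=float))
    lo, hi = 1e-12, 10.0 * z.max() / c + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.mean(rho_biweight(mid and z / mid, c)) > delta:
            lo = mid   # scale too small: too much mass at the bound
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a large standard normal sample this returns a value close to 1, and it stays bounded under heavy contamination, reflecting the 50% breakdown point.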
As shown by Lopuhaä (1989), S estimators have a close connection to M estimators: the solution (T, C) also solves an equation defining an M estimator and can be written as a weighted sample mean and covariance matrix,

T = Σ w(d_i)x_i / Σ w(d_i),   C ∝ Σ w(d_i)(x_i − T)(x_i − T)⊤,   with w(d) = ρ′(d)/d.    (10)

The framework implements the S estimates in the class CovSest and provides four different algorithms for their computation.
1. SURREAL: This algorithm was proposed by Ruppert (1992) as an analog to the algorithm proposed by the same author for computing S estimators of regression.
2. Bisquare S estimation with HBDP start: S estimates with the biweight ρ function can be obtained using the Equations (10) by a reweighted sample covariance and reweighted sample mean algorithm as described in Maronna et al. (2006). The preferred approach is to start the iteration from a bias-robust but possibly inefficient estimate which is computed by some form of sub-sampling. Since Maronna et al. (2006) have shown that the MVE has smallest maximum bias (Table 6.2, p. 196) it is recommended to use it as initial estimate.
3. Rocke type S estimates: Rocke (1996) showed that S estimators in high dimensions can be sensitive to outliers even if the breakdown point is set to 50%. He therefore proposed a modified ρ function called translated biweight (or t-biweight) and replaced the standardization step given in Equation (9) with one that equates the median of ρ(d_i) with the median under normality. The estimator is shown to be more outlier resistant in high dimensions than the typical S estimators. The specifics of the iteration are given in Rocke and Woodruff (1996), see also Maronna et al. (2006). As starting values for the iteration any of the methods available in the framework can be used; the recommended (and consequently the default) one is the MVE estimator computed by CovMve().
4. FAST-S: Salibian-Barrera and Yohai (2006) proposed a fast algorithm for regression S estimates similar to the FAST-LTS algorithm of Rousseeuw and Van Driessen (2006), borrowing ideas from the SURREAL algorithm of Ruppert (1992). Similarly, the FAST-S algorithm for multivariate location and scatter is based on modifying each candidate to improve the S-optimality criterion, thus reducing the number of sub-samples required to achieve the desired high breakdown point with high probability.

Object model for robust location and scatter estimation
The object model for the S4 classes and methods implementing the different multivariate location and scatter estimators follows the proposed class hierarchy given in Section 2 and is presented in Figure 4.  The abstract class Cov serves as a base class for deriving all classes representing classical and robust location and scatter estimation methods. It defines the common slots and the corresponding accessor methods, provides implementation for the general methods like show(), plot() and summary(). The slots of Cov hold some input or default parameters as well as the results of the computations: the location, the covariance matrix and the distances. The show() method presents brief results of the computations and the summary() method returns an object of class SummaryCov which has its own show() method. As in the other sections of the framework these slots and methods are defined and documented only once in this base class and can be used by all derived classes. Whenever new data (slots) or functionality (methods) are necessary, they can be defined or redefined in the particular class.
The classical location and scatter estimates are represented by the class CovClassic which inherits directly from Cov (and uses all slots and methods defined there). The function CovClassic() serves as a constructor (generating function) of the class. It can be called by providing a data frame or matrix. As already demonstrated in Section 3 the methods show() and summary() present the results of the computations. The plot() method draws different diagnostic plots which are shown in one of the next sections. The accessor functions like getCenter(), getCov(), etc. are used to access the corresponding slots.
Another abstract class, CovRobust, is derived from Cov; it serves as a base class for all robust location and scatter estimators.
The classes representing robust estimators like CovMcd, CovMve, etc. are derived from CovRobust and provide implementation for the corresponding methods. Each of the constructor functions CovMcd(), CovMve(),CovOgk(), CovMest() and CovSest() performs the necessary computations and returns an object of the class containing the results. Similarly as the CovClassic() function, these functions can be called either with a data frame or a numeric matrix.

Controlling the estimation options
Although the different robust estimators of multivariate location and scatter have some controlling options in common, like the tracing flag trace or the numeric tolerance tolSolve to be used for inversion (solve) of the covariance matrix in mahalanobis(), each of them has more specific options. For example, the MCD and MVE estimators (CovMcd() and CovMve()) can specify alpha which controls the size of the subsets over which the determinant (the volume of the ellipsoid) is minimized. The allowed values are between 0.5 and 1 and the default is 0.5. Similarly, these estimators have parameters nsamp for the number of subsets used for initial estimates and seed for the initial seed for R's random number generator while the M and S estimators (CovMest and CovSest) have to specify the required breakdown point (allowed values between (n − p)/(2 * n) and 1 with default 0.5) as well as the asymptotic rejection point, i.e., the fraction of points receiving zero weight (Rocke and Woodruff 1996).
These parameters can be passed directly to the corresponding constructor function but additionally there exists the possibility to use a control object serving as a container for the parameters. The object model for the control objects shown in Figure 5 follows the proposed class hierarchy-there is a base class CovControl which holds the common parameters and from this class all control classes holding the specific parameters of their estimators are derived. These classes have only a constructor function for creating new objects and a method restimate() which takes a data frame or a matrix, calls the corresponding estimator to perform the calculations and returns the created class with the results.
Apart from providing a structured container for the estimation parameters this class hierarchy has the following additional benefits:  • the parameters can be passed easily to another multivariate method, for example the principal components analysis based on a covariance matrix PcaCov() (see Section 4.2) can take a control object which will be used to estimate the desired covariance (or correlation) matrix. In the following example a control object holding the parameters for S estimation will be created and then PcaCov() will be called with this object.

• the class hierarchy of the control objects allows one to handle different estimator objects through a uniform interface, thus leveraging one of the most important features of object oriented programming, polymorphism. In the following example we create a list containing different control objects and then via sapply() we call the generic function restimate() on each of the objects in the list. The outcome will be a list containing the objects resulting from these calls (all derived from CovRobust). This looping over the different estimation methods is very useful for implementing simulation studies.

[1] "CovMcd"  "CovMest" "CovOgk"  "CovSest" "CovSest"

A generalized function for robust location and covariance estimation: CovRobust()
The provided variety of estimation methods, each of them with different parameters, as well as the object models described in the previous sections can be overwhelming for the user, especially for the novice who does not care much about the technical implementation of the framework. Therefore a function is provided which gives quick access to the robust estimates of location and covariance matrix. This function is loosely modeled around the abstract factory design pattern (see Gamma et al. 1995, page 87) in the sense that it creates concrete objects of derived classes and returns the result over a base class interface. The class CovRobust is abstract (defined as VIRTUAL) and no objects of it can be created, but any of the classes derived from CovRobust, such as CovMcd or CovOgk, can act as an object of class CovRobust. The function CovRobust(), which is technically not a constructor function, can return an object of any of the classes derived from CovRobust according to the user request. This request can be specified in one of three forms:

• If only a data frame or matrix is provided and the control parameter is omitted, the function decides which estimate to apply according to the size of the problem at hand. If there are fewer than 1000 observations and fewer than 10 variables, or fewer than 5000 observations and fewer than 5 variables, the Stahel-Donoho estimator will be used. Otherwise, if there are fewer than 50000 observations, either bisquare S estimates (in case of fewer than 10 variables) or Rocke type S estimates (for 10 to 20 variables) will be used. In both cases the S iteration starts at the initial MVE estimate. And finally, if there are more than 50000 observations and/or more than 20 variables the Orthogonalized Quadrant Correlation estimator (CovOgk with the corresponding parameters) is used. This is illustrated by the following example.

R> ##
R> ## Rocke-type S-estimates
R> ##
R> getMeth(CovRobust(matrix(rnorm(40), ncol=2), control="rocke"))
[1] "S-estimates: Rocke type"

• If it is necessary to specify also some estimation parameters, the user can create a control object (derived from CovControl) and pass it to the function together with the data. For example, to compute the OGK estimator using the median absolute deviation (MAD) as a scale estimate and the quadrant correlation (QC) as a pairwise correlation estimate, we create a control object ctrl, passing the parameters s_mad and s_qc to the constructor function, and then call CovRobust() with this object. The last command line illustrates the accessor method for getting the correlation matrix of the estimate as well as a nice formatting method for covariance matrices.
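The size-based dispatch described in the first bullet above can be written down compactly. The following Python sketch (ours; the returned method labels are illustrative, not rrcov identifiers) mirrors the stated thresholds.

```python
def choose_estimator(n, p):
    # mirrors the size-based dispatch described in the text:
    # small problems -> Stahel-Donoho; medium -> S estimates (bisquare or
    # Rocke type, starting from MVE); large -> OGK with quadrant correlation
    if (n < 1000 and p < 10) or (n < 5000 and p < 5):
        return "sde"        # Stahel-Donoho
    if n < 50000 and p <= 20:
        return "bisquare" if p < 10 else "rocke"
    return "ogk"            # Orthogonalized Quadrant Correlation
```

For example, 500 observations in 5 variables select Stahel-Donoho, while 60000 observations in 5 variables select OGK.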

Visualization of the results
The default plot, accessed through the method plot() of class CovRobust, is the Distance-Distance plot introduced by Rousseeuw and van Zomeren (1991), which plots the robust distances versus the classical Mahalanobis distances. The other available plots are accessible either interactively or through the which parameter of the plot() method. The left panel of Figure 6 shows an example of the distance plot in which robust and classical Mahalanobis distances are shown in parallel panels. The outliers have large robust distances and are identified by their labels. The right panel of Figure 6 shows a Quantile-Quantile comparison plot of the robust and the classical Mahalanobis distances versus the square root of the quantiles of the chi-squared distribution.
The next plot, shown in Figure 7, presents a scatter plot of the data on which the 97.5% robust and classical confidence ellipses are superimposed. Currently this plot is available only for bivariate data. The observations with distances larger than √(χ²_{p,0.975}) are identified by their subscript. In the right panel of Figure 7 a screeplot of the milk data set is shown, presenting the robust and classical eigenvalues.

Principal component analysis
Principal component analysis is a widely used technique for dimension reduction achieved by finding a smaller number q of linear combinations of the originally observed p variables and retaining most of the variability of the data. Thus PCA is usually aiming at a graphical representation of the data in a lower dimensional space. The classical approach to PCA measures the variability through the empirical variance and is essentially based on computation of eigenvalues and eigenvectors of the sample covariance or correlation matrix. Therefore the results may be extremely sensitive to the presence of even a few atypical observations in the data. These discrepancies will carry over to any subsequent analysis and to any graphical display related to the principal components such as the biplot.
The following example in Figure 8 illustrates the effect of outliers on the classical PCA. The data set hbk from the package robustbase consists of 75 observations in 4 dimensions (one response and three explanatory variables) and was constructed by Hawkins, Bradu and Kass in 1984 for illustrating some of the merits of a robust technique (see Rousseeuw and Leroy 1987). The first 10 observations are bad leverage points, and the next four points are good leverage points (i.e., their x are outlying, but the corresponding y fit the model quite well). We will consider only the X-part of the data. The left panel shows the plot of the scores on the first two classical principal components (the first two components account for more than 98% of the total variation). The outliers are identified as separate groups, but the regular points are far from the origin (where the mean of the scores should be located). Furthermore, the ten bad leverage points 1-10 lie within the 97.5% tolerance ellipse and influence the classical estimates of location and scatter. The right panel shows the same plot based on robust estimates. We see that the estimate of the center is not shifted by the outliers and these outliers are clearly separated by the 97.5% tolerance ellipse.

PCA was probably the first multivariate technique subjected to robustification, either by simply computing the eigenvalues and eigenvectors of a robust estimate of the covariance matrix or directly by estimating each principal component in a robust manner. Different approaches to robust PCA are briefly presented in the next subsections with the emphasis on those methods which are available in the framework. Details about the methods and algorithms can be found in the corresponding references. The object model is described and examples are given.

PCA based on robust covariance matrix (MCD, OGK, MVE, etc.)
The most straightforward and intuitive method to obtain robust PCA is to replace the classical estimates of location and covariance by their robust analogues. In the earlier works M estimators of location and scatter were used for this purpose (see Devlin et al. 1981;Campbell 1980) but these estimators have the disadvantage of low breakdown point in high dimensions. To cope with this problem Naga and Antille (1990) used the MVE estimator and Todorov et al. (1994b) used the MCD estimator. Croux and Haesbroeck (2000) investigated the properties of the MCD estimator and computed its influence function and efficiency.
The package stats in base R contains the function princomp() which performs a principal components analysis on a given numeric data matrix and returns the results as an object of S3 class princomp. This function has a parameter covmat which can take a covariance matrix, or a covariance list as returned by cov.wt, and if supplied, it is used rather than the covariance matrix of the input data. This allows to obtain robust principal components by supplying the covariance matrix computed by cov.mve or cov.mcd from the package MASS.
One could ask why it is then necessary to include this type of function in the framework, since it already exists in the base package. The essential added value of the framework, apart from implementing many new robust multivariate methods, is the unification of the interfaces by leveraging the object orientation provided by the S4 classes and methods. The function PcaCov() computes robust PCA by replacing the classical covariance matrix with one of the robust covariance estimators available in the framework (MCD, OGK, MVE, M, S or Stahel-Donoho), i.e., the parameter cov.control can be any object of a class derived from the base class CovControl. This control class will be used to compute a robust estimate of the covariance matrix. If this parameter is omitted, MCD will be used by default. Of course any newly developed estimator following the concepts of the framework can be used as input to the function PcaCov().

Projection pursuit methods
The second approach to robust PCA uses projection pursuit (PP) and calculates the robust estimates of the eigenvalues and eigenvectors directly. Directions are sought which maximize the variance (classical PCA) of the data projected onto them. Replacing the variance with a robust measure of spread yields robust PCA. Such a method was first introduced by Li and Chen (1985) using an M estimator of scale S_n as a projection index (PI). They showed that the PCA estimates inherit the robustness properties of the scale estimator S_n. Unfortunately, in spite of the good statistical properties of the method, the algorithm they proposed was too complicated to be used in practice. A more tractable algorithm along these lines was first proposed by Croux and Ruiz-Gazen (1996) and later improved by Croux and Ruiz-Gazen (2005). To improve the performance of the algorithm for high dimensional data a new improved version was proposed by Croux et al. (2007). The latter two algorithms are available in the package pcaPP (see Filzmoser et al. 2009) as functions PCAproj() and PCAgrid().
In the framework these methods are represented by the classes PcaProj and PcaGrid. Their generating functions provide simple wrappers around the original functions from pcaPP and return objects of the corresponding class, derived from PcaRobust.
A major advantage of the PP-approach is that it searches for the eigenvectors consecutively, so in the case of high dimensional data, when we are interested in only the first one or two principal components, this results in reduced computational time. Moreover, the PP-estimates cope with the main drawback of the covariance-based estimates: they can be computed for data matrices with more variables than observations.
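The projection pursuit idea can be sketched with a crude search over random candidate directions. This Python fragment is an illustration of the principle only (names are ours; the MAD is used as the robust projection index, and the real PCAproj()/PCAgrid() algorithms search far more cleverly).

```python
import numpy as np

def mad(x):
    # median absolute deviation, rescaled for consistency at the normal
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def pp_pca(X, k=2, n_dir=800, seed=0):
    # crude projection-pursuit PCA: for each component, pick among random
    # candidate directions the one maximizing a robust spread (the MAD),
    # keeping each new direction orthogonal to those already found
    rng = np.random.default_rng(seed)
    Xc = X - np.median(X, axis=0)
    p = X.shape[1]
    loadings = []
    for _ in range(k):
        best, best_s = None, -1.0
        for _ in range(n_dir):
            a = rng.standard_normal(p)
            for b in loadings:                 # Gram-Schmidt against found PCs
                a -= (a @ b) * b
            norm = np.linalg.norm(a)
            if norm < 1e-10:
                continue
            a /= norm
            s = mad(Xc @ a)                    # robust projection index
            if s > best_s:
                best, best_s = a, s
        loadings.append(best)
    return np.array(loadings)                  # k x p matrix of loadings
```

Note that the components are extracted one at a time, which is exactly why only the first few can be computed cheaply, and nothing prevents p from exceeding n.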

Hubert method (ROBPCA)
The PCA method proposed by Hubert et al. (2005) tries to combine the advantages of both approaches-the PCA based on a robust covariance matrix and PCA based on projection pursuit. A brief description of the algorithm follows, for details see the relevant references (Hubert et al. 2008).
Let n denote the number of observations and p the number of original variables in the input data matrix X. The ROBPCA algorithm finds a robust center m of the data and a loading matrix P of dimension p × k. Its columns are orthogonal and define a new coordinate system. The scores T, an n × k matrix, are the coordinates of the centered observations with respect to the loadings,

T = (X − 1m⊤)P,

where 1 is a column vector with all n components equal to 1. The ROBPCA algorithm also yields a robust covariance matrix (often singular) which can be computed as

S = PLP⊤,

where L is the diagonal matrix with the eigenvalues l_1, . . . , l_k. This is done in the following three main steps:

Step 1: The data are preprocessed by reducing their data space to the subspace spanned by the n observations. This is done by a singular value decomposition of the input data matrix. As a result the data are represented in a space whose dimension is rank(X), being at most n − 1, without loss of information.
Step 2: In this step a measure of outlyingness is computed for each data point. For this purpose the data points are projected on the n(n − 1)/2 univariate directions defined by pairs of data points. If n is too large, maxdir directions are chosen at random (maxdir defaults to 250 but can be changed by the user). On every direction the univariate MCD estimator of location and scale is computed and the standardized distance to the center is measured. The largest of these distances (over all considered directions) is the outlyingness measure of the data point. The h data points with smallest outlyingness measure are used to compute the covariance matrix Σ_h and to select the number k of principal components to retain. This is done by finding k such that l_k/l_1 ≥ 10⁻³ and Σ_{j=1}^k l_j / Σ_{j=1}^r l_j ≥ 0.8. Alternatively the number of principal components k can be specified by the user after inspecting the scree plot.
Step 3: The data points are projected on the k-dimensional subspace spanned by the k eigenvectors corresponding to the largest k eigenvalues of the matrix Σ h . The location and scatter of the projected data are computed using the reweighted MCD estimator, and the eigenvectors of this scatter matrix yield the robust principal components.
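The outlyingness computation of Step 2 can be sketched as follows. This Python fragment is an illustrative sketch (ours, not the ROBPCA code): it uses the fact that the minimal-variance h-subset of a univariate sample is contiguous in the sorted order, and samples maxdir point pairs at random.

```python
import numpy as np

def univ_mcd(x, h):
    # univariate MCD: mean/std of the h-subset with smallest variance,
    # found by scanning contiguous windows of the sorted sample
    xs = np.sort(x)
    best_j, best_var = 0, np.inf
    for j in range(len(xs) - h + 1):
        v = xs[j:j + h].var()
        if v < best_var:
            best_j, best_var = j, v
    w = xs[best_j:best_j + h]
    return w.mean(), w.std() + 1e-12

def pair_outlyingness(X, h, maxdir=250, seed=0):
    # project on directions through pairs of data points and take the
    # largest standardized distance over all considered directions
    rng = np.random.default_rng(seed)
    n, p = X.shape
    r = np.zeros(n)
    for _ in range(maxdir):
        i, j = rng.choice(n, size=2, replace=False)
        a = X[i] - X[j]
        norm = np.linalg.norm(a)
        if norm < 1e-12:
            continue
        a /= norm
        proj = X @ a
        m, s = univ_mcd(proj, h)
        r = np.maximum(r, np.abs(proj - m) / s)
    return r
```

The h points with the smallest values of this measure would then form the clean subset used to compute Σ_h.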

Spherical principal components (SPC)
The spherical principal components procedure was first proposed by Locantore et al. (1999) as a method for functional data analysis. The idea is to perform classical PCA on the data, projected onto a unit sphere. The estimates of the eigenvectors are consistent if the data are elliptically distributed (see Boente and Fraiman 1999) and the procedure is extremely fast. Although not much is known about the efficiency of this method, the simulations of Maronna (2005) show that it has very good performance. If each coordinate of the data is normalized using some kind of robust scale, like for example the MAD, and then SPC is applied, we obtain "elliptical PCA", but unfortunately this procedure is not consistent.
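The procedure is simple enough to sketch in a few lines. The following Python fragment is an illustration only (names are ours, and the coordinatewise median stands in for the spatial median used in the original proposal).

```python
import numpy as np

def spherical_pca(X, center=None):
    # project the centered data onto the unit sphere, then do classical PCA
    # on the sphered points; the eigenvectors are what we keep, since the
    # eigenvalues of the sphered data are not on the original scale
    if center is None:
        center = np.median(X, axis=0)   # stand-in for a robust center
    Y = X - center
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    S = Y / np.maximum(norms, 1e-12)            # points on the unit sphere
    C = S.T @ S / len(S)                        # covariance of sphered data
    eigval, eigvec = np.linalg.eigh(C)
    order = np.argsort(eigval)[::-1]            # descending order
    return eigval[order], eigvec[:, order]
```

Because every observation, however extreme, is mapped to the unit sphere, a gross outlier can shift an eigenvector only as much as any other single point, which is the source of the method's robustness and its speed (one pass plus one eigendecomposition).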

Object model for robust PCA and examples
The object model for the S4 classes and methods implementing the principal component analysis methods follows the proposed class hierarchy given in Section 2 and is presented in Figure 9. The abstract class Pca serves as a base class for deriving all classes representing  classical and robust principal components analysis methods. It defines the common slots and the corresponding accessor methods, provides implementation for the general methods like show(), plot(), summary() and predict(). The slots of Pca hold some input or default parameters like the requested number of components as well as the results of the computations: the eigenvalues, the loadings and the scores. The show() method presents brief results of the computations, and the predict() method projects the original or new data to the space spanned by the principal components. It can be used either with new observations or with the scores (if no new data are provided). The summary() method returns an object of class SummaryPca which has its own show() method. As in the other sections of the framework these slots and methods are defined and documented only once in this base class and can be used by all derived classes. Whenever new information (slots) or functionality (methods) are necessary, they can be defined or redefined in the particular class.
Classical principal component analysis is represented by the class PcaClassic which inherits directly from Pca (and uses all slots and methods defined there). The function PcaClassic() serves as a constructor (generating function) of the class. It can be called either by providing a data frame or matrix or a formula with no response variable, referring only to numeric variables. Let us consider the following simple example with the data set hbk from the package robustbase. The code line

R> PcaClassic(hbk.x)
can be rewritten as (and is equivalent to) the following code line using the formula interface:

R> PcaClassic(~., data = hbk.x)

The function PcaClassic() performs the standard principal components analysis and returns an object of the class PcaClassic. The show() method displays the standard deviations of the resulting principal components, the loadings and the original call. The summary() method presents the importance of the calculated components. The plot() method draws a PCA diagnostic plot which is shown and described later. The accessor functions like getLoadings(), getEigenvalues(), etc. are used to access the corresponding slots, and predict() is used to rotate the original or new data to the space of the principal components.

Another abstract class, PcaRobust, is derived from Pca; it serves as a base class for all robust principal components methods.
The classes representing robust PCA methods like PcaHubert, PcaLocantore, etc. are derived from PcaRobust and provide implementation for the corresponding methods. Each of the constructor functions PcaCov(), PcaHubert(), PcaLocantore(), PcaGrid() and PcaProj() performs the necessary computations and returns an object of the class containing the results. In the following example the same data are analyzed using the projection pursuit method PcaGrid(). Similar to the function PcaClassic(), these functions can be called either with a data frame or matrix or by a formula interface.

Visualization of PCA results
One of the most important applications of PCA, besides dimensionality reduction, is data visualization. In the framework several plots for visualizing the results of the analysis are available. The plot() methods are implemented in the base class Pca and thus they are available for all objects derived from it, whether classical or robust. The most straightforward plot is the screeplot, which plots the variances against the number of principal components (similar to the screeplot for the standard prcomp() and princomp() functions). It is a useful tool for determining the number of relevant principal components. An example of the classical and robust screeplot for the milk data from robustbase is shown in Figure 10.

R> ##
R> ## Screeplot for classical and robust PCA of the milk data set.
R> ##
R> usr <- par(mfrow=c(1,2))
R> screeplot(PcaClassic(milk), type="lines",
+     main="Screeplot: classical PCA", sub="milk data")
R> screeplot(PcaHubert(milk), type="lines", main="Screeplot: robust PCA",
+     sub="milk data")
R> par(usr)

Another plot borrowed from standard R is the biplot. The biplot (Gabriel 1971) represents both the observations and variables in the plane of (the first) two principal components, allowing the visualization of the magnitude and sign of each variable's contribution to these principal components. Each observation (row of scores) is represented as a point in the biplot and each variable is represented as an arrow. The arrows graphically indicate the proportion of the original variance explained by the (first) two principal components and their direction indicates the relative loadings on these components. Figure 11 shows an example of the robust biplot of the un86 data set which contains seven socioeconomic variables observed for 73 countries. The data set is from World Statistics in Brief, Number 10, a 1986 UN publication. It was used in Daigle and Rivest (1992) to illustrate a robust biplot method.

Linear and quadratic discriminant analysis
The problem of discriminant analysis arises when one wants to assign an individual to one of g populations on the basis of a p-dimensional feature vector x. Let the p-dimensional random variable x_k come from a population π_k with underlying density f_k. Further let the prior probability of group k, i.e., the probability that an individual comes from population π_k, be α_k, with Σ_{k=1}^g α_k = 1. The prior probabilities α_k are usually estimated by the empirical frequencies n_k in the k-th group of the training set, i.e., α̂_k = n_k/Σ_{j=1}^g n_j. The Bayesian discriminant rule then assigns an observation x to that population π_k for which the expression ln(α_k f_k(x)) is maximal over all groups k = 1, . . . , g. Usually it is assumed that the populations π_k are p-dimensional normally distributed, π_k ∼ N_p(µ_k, Σ_k), k = 1, . . . , g. With this assumption the discriminant rule is equivalent to maximizing the discriminant scores D_k(x) given by

D_k(x) = −(1/2) ln|Σ_k| − (1/2)(x − µ_k)⊤Σ_k^{−1}(x − µ_k) + ln(α_k),    (16)

and an individual x is assigned to π_k if

D_k(x) = max_{j=1,...,g} D_j(x).    (17)

The application of the discrimination rule given by Equations (16) and (17) is referred to as quadratic discriminant analysis (QDA), since the groups are separated by quadratic boundaries.
If it is further assumed that all group covariance matrices are equal (Σ_1 = . . . = Σ_g = Σ), then the overall probability of misclassification is minimized by assigning a new observation x to the population π_k which maximizes

d_k(x) = µ_k⊤Σ^{−1}x − (1/2)µ_k⊤Σ^{−1}µ_k + ln(α_k).    (18)

The application of the discriminant rule given by Equation (18) is referred to as linear discriminant analysis (LDA), since the scores d_k(x) are linear in x.
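The two discriminant rules can be sketched directly from their formulas. The following Python fragment (ours, for illustration; it evaluates the scores for given group parameters rather than estimating them) makes the quadratic versus linear distinction concrete.

```python
import numpy as np

def qda_scores(x, means, covs, priors):
    # quadratic discriminant scores:
    # D_k(x) = -0.5*ln|S_k| - 0.5*(x - m_k)' S_k^{-1} (x - m_k) + ln(alpha_k)
    scores = []
    for m, S, a in zip(means, covs, priors):
        d = x - m
        _, logdet = np.linalg.slogdet(S)
        scores.append(-0.5 * logdet - 0.5 * d @ np.linalg.solve(S, d) + np.log(a))
    return np.array(scores)

def lda_scores(x, means, S, priors):
    # linear scores with a common covariance matrix:
    # d_k(x) = m_k' S^{-1} x - 0.5 * m_k' S^{-1} m_k + ln(alpha_k)
    Si = np.linalg.inv(S)
    return np.array([m @ Si @ x - 0.5 * m @ Si @ m + np.log(a)
                     for m, a in zip(means, priors)])

def classify(scores):
    # assign to the group with the maximal score
    return int(np.argmax(scores))
```

A robust version is obtained by simply feeding robust estimates of the means and covariance matrices into the same score functions; the rules themselves do not change.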
If the means µ_k, k = 1, . . . , g, and the common covariance matrix Σ are unknown, which is usually the case, a training set consisting of samples drawn from each of the populations is required. In classical QDA and LDA the sample group means and sample covariance matrices are used to estimate µ_k, Σ_k and Σ. The prior probabilities can be estimated by the relative frequencies of observations in each particular group. Both QDA and LDA using the classical estimates in (16) and (18) are vulnerable to the presence of outliers. The problem of the non-robustness of the classical estimates in the setting of quadratic and linear discriminant analysis has been addressed by many authors: Todorov, Neykov, and Neytchev (1990, 1994a) replaced the classical estimates by MCD estimates; Chork and Rousseeuw (1992) used the MVE instead; Hawkins and McLachlan (1997) defined the minimum within-group covariance determinant estimator (MWCD) especially for the case of linear discriminant analysis; He and Fung (2000) and Croux and Dehon (2001) used S estimates; Hubert and Van Driessen (2004) applied the MCD estimates computed by the FAST-MCD algorithm. For a recent review and comparison of these methods see Todorov and Pires (2007).
A robust version of quadratic discriminant analysis can be obtained by substituting the parameters µ k , Σ k by their robust estimates. For this purpose the reweighted MCD estimates, S estimates or OGK can be used. In the case of linear discriminant analysis a robust version of the common covariance matrix Σ is necessary. There are several methods for estimating the common covariance matrix based on a high breakdown point estimator which are considered in one of the next subsections.

Object model for robust LDA and QDA
The object model for the S4 classes and methods implementing the linear and quadratic discriminant analysis methods follows the proposed class hierarchy given in Section 2 and is presented in Figure 14. The abstract classes Lda and Qda serve as base classes for deriving all classes representing classical and robust methods for linear and quadratic discriminant analysis. They define the common slots and the corresponding accessor methods, and provide implementation for the general methods like show(), plot(), summary() and predict(). These base classes also host several utility functions which are not directly exported but are documented and can be used by quoting the namespace. The slots of Lda hold some input or default parameters like the prior probabilities, the original data matrix and the grouping variable, as well as the results of the computations: the group means and the common covariance matrix, the linear discriminant functions and the corresponding constants. The show() method presents brief results of the computations, and predict() can be used either for classifying new observations or for the evaluation of the discriminant rules on the basis of the training sample. The method predict() returns an object of class PredictLda which has its own show() method to print the results of the classification or evaluation. The summary() method returns an object of class SummaryLda which has its own show() method. As in the other sections of the framework these slots and methods are defined and documented only once in this base class and can be used by all derived classes. Whenever new data (slots) or functionality (methods) are necessary, they can be defined or redefined in the particular class.
Classical linear discriminant analysis is represented by the class LdaClassic which inherits directly from Lda (and uses all slots and methods defined there). The function LdaClassic() serves as a constructor (generating function) of the class. It can be called either with a data frame or matrix and a grouping variable (factor), or with a formula specifying the model to be used. Let us consider the following simple example with the data set diabetes: the grouping variable is diabetes$group and the three explanatory variables are glucose, insulin and sspg. The code

R> x <- diabetes[, c("glucose", "insulin", "sspg")]
R> grpvar <- diabetes$group
R> LdaClassic(x, grpvar)

can be rewritten as (and is equivalent to) the following code using the formula interface:

R> LdaClassic(group ~ ., data = diabetes)

The function LdaClassic() performs standard linear discriminant analysis and returns an object of class LdaClassic. Another abstract class, LdaRobust, is derived from Lda and serves as a base class for all robust linear discriminant analysis methods. The only slot added in this class is a character variable specifying the robust method to be used.
The class Linda is derived from LdaRobust and provides implementations of all methods for robust LDA currently available in the framework. If we wanted to be strictly object oriented, we would define a separate class for each robust method (for example LdaRobustMcd, LdaRobustFsa, etc.), but this would lead to an explosion of the necessary code and documentation. The constructor function Linda() takes a character parameter method specifying which robust location and scatter estimator to use and how to compute the common covariance matrix, and returns an object of class Linda. Like the function LdaClassic(), Linda() can be called either with a data matrix and grouping variable or through a formula interface.
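For illustration, a call might look as follows. This is a sketch assuming the packages rrcov and mclust are installed and that the grouping variable of the diabetes data is named group as in this paper (in more recent versions of mclust it may be named class); the method values shown follow the rrcov documentation for Linda():

```r
library("mclust")   # provides the diabetes data used in this paper
library("rrcov")
data("diabetes")

## Robust LDA; method selects the estimator and the way the common
## covariance matrix is computed, e.g. "mcdA", "mcdB", "mcdC" or "fsa"
rlda <- Linda(group ~ glucose + insulin + sspg, data = diabetes,
              method = "mcdB")
rlda
```

The returned Linda object can then be passed to predict() and summary() exactly as an LdaClassic object.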

Computing the common covariance matrix
The easiest way to estimate the common covariance matrix Σ is to obtain the estimates of the group means µ_k and group covariance matrices Σ_k from the individual groups as (m_k, C_k), k = 1, . . . , g, and then pool the estimates C_k, k = 1, . . . , g, to yield the common covariance matrix

C = (n_1 C_1 + . . . + n_g C_g)/n,

where n_k is the size of group k and n = n_1 + . . . + n_g. This method, using MVE and MCD estimates, was proposed by Todorov et al. (1990, 1994a) and was also used, based on the MVE estimator, by Chork and Rousseeuw (1992). Croux and Dehon (2001) applied this procedure for robustifying linear discriminant analysis based on S estimates. A drawback of this method is that the same trimming proportions are applied to all groups, which could lead to a loss of efficiency if some groups are outlier free. We will denote this method as "A" and the corresponding estimator as XXX-A. For example, in the case of the MCD estimator this will be MCD-A.
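In code, method "A" is simply a size-weighted pooling of the per-group scatter estimates. A minimal base-R sketch, in which the classical cov() stands in for the robust estimator of each C_k, and the helper pool_A() and the use of iris are illustrative assumptions:

```r
## Method "A": pool per-group scatter matrices C_k into a common
## covariance matrix, C = (n_1 C_1 + ... + n_g C_g) / n.
pool_A <- function(x, grouping) {
  x <- as.data.frame(x)
  groups <- split(x, grouping)
  covs   <- lapply(groups, cov)                  # robust C_k in practice
  sizes  <- vapply(groups, nrow, integer(1))     # group sizes n_k
  Reduce(`+`, Map(function(C, nk) nk * C, covs, sizes)) / nrow(x)
}

C <- pool_A(iris[, 1:4], iris$Species)
```

Substituting a robust pair (for example the reweighted MCD location and scatter of each group) for cov() gives exactly the XXX-A estimators described in the text.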
Another method was proposed by He and Fung (2000) for the S estimates and was later adapted by Hubert and Van Driessen (2004) to the MCD estimates. Instead of pooling the group covariance matrices, the observations are centered and pooled to obtain a single sample for which the covariance matrix is estimated. It starts by obtaining the individual group location estimates t_k, k = 1, . . . , g, as the reweighted MCD location estimates of each group. These group means are swept from the original observations x_ik (i = 1, . . . , n_k; k = 1, . . . , g) to obtain the centered observations

z_ik = x_ik - t_k.

The common covariance matrix C is estimated as the reweighted MCD covariance matrix of the centered observations Z. The location estimate δ of Z is used to adjust the group means m_k, and thus the final group means are

m_k = t_k + δ.

This process could be iterated until convergence, but since the improvements from such iterations are negligible (see He and Fung 2000; Hubert and Van Driessen 2004) we are not going to use it. This method will be denoted by "B" and, as already mentioned, the corresponding estimator as XXX-B, for example MCD-B.
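Method "B" can be sketched in a few lines of base R. Here colMeans() and cov() stand in for the reweighted MCD location and covariance estimates, and pool_B() is an illustrative helper, not an rrcov function:

```r
## Method "B": center each group by its location estimate t_k, pool the
## centered observations, estimate one covariance matrix from the pool,
## and adjust the group means by the location delta of the pooled sample.
pool_B <- function(x, grouping) {
  x <- as.matrix(x)
  t.k <- lapply(split(as.data.frame(x), grouping), colMeans)  # locations t_k
  z <- x - t(vapply(as.character(grouping), function(k) t.k[[k]],
                    numeric(ncol(x))))                        # z_ik = x_ik - t_k
  delta <- colMeans(z)             # location estimate of the pooled sample Z
  list(C = cov(z),                 # common covariance matrix C
       delta = delta,
       means = lapply(t.k, function(t) t + delta))  # final means t_k + delta
}

fit <- pool_B(iris[, 1:4], iris$Species)
```

With the classical stand-ins the adjustment delta is essentially zero; with a robust estimator of the pooled sample it generally is not, which is why the group means are corrected.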
The third approach is to modify the algorithm for high breakdown point estimation itself in order to accommodate the pooled sample. He and Fung (2000) modified Ruppert's SURREAL algorithm for S estimation in the case of two groups. Hawkins and McLachlan (1997) defined the minimum within-group covariance determinant estimator which does not apply the same trimming proportion to each group but directly minimizes the determinant of the common within-groups covariance matrix by pairwise swaps of observations. Unfortunately their estimator is based on the Feasible Solution Algorithm (see Hawkins and McLachlan 1997, and the references therein), which is extremely time consuming compared to the FAST-MCD algorithm. Hubert and Van Driessen (2004) proposed a modification of this algorithm taking advantage of the FAST-MCD, but it is still necessary to compute the MCD for each individual group. This method will be denoted by MCD-C.
Using the estimates m_k^0 and C^0 obtained by one of the methods, we can calculate the initial robust distances (Rousseeuw and van Zomeren 1991)

RD_ik = ((x_ik - m_k^0)' (C^0)^{-1} (x_ik - m_k^0))^{1/2}.

With these initial robust distances we can define a weight for each observation x_ik, i = 1, . . . , n_k and k = 1, . . . , g, by setting the weight to 1 if the corresponding robust distance is less than or equal to a suitable cut-off, usually (χ²_{p,0.975})^{1/2}, and to 0 otherwise, i.e.,

w_ik = 1 if RD_ik ≤ (χ²_{p,0.975})^{1/2}, and w_ik = 0 otherwise.

With these weights we can calculate the final reweighted estimates of the group means,

m_k = (1/ν_k) Σ_i w_ik x_ik,

and the common within-groups covariance matrix,

C = (1/(ν - g)) Σ_k Σ_i w_ik (x_ik - m_k)(x_ik - m_k)',

which are necessary for constructing the robust classification rules. Here ν_k = Σ_i w_ik is the sum of the weights within group k, for k = 1, . . . , g, and ν = Σ_k ν_k is the total sum of weights.
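The reweighting step translates almost line by line into base R. In the sketch below classical estimates stand in for the initial (m_k^0, C^0), iris is used as example data, and all variable names are illustrative:

```r
## Reweighting: 0/1 weights from initial robust distances, then reweighted
## group means and the pooled within-groups covariance matrix.
x <- as.matrix(iris[, 1:4]); grp <- iris$Species; p <- ncol(x)

m0 <- lapply(split(as.data.frame(x), grp), colMeans)   # initial group means
C0 <- cov(x - t(sapply(as.character(grp), function(k) m0[[k]])))  # initial C

rd2 <- unsplit(lapply(levels(grp), function(k)         # squared distances
  mahalanobis(x[grp == k, ], m0[[k]], C0)), grp)
w <- as.numeric(rd2 <= qchisq(0.975, p))               # cut-off chi^2_{p,0.975}

m <- lapply(levels(grp), function(k)                   # reweighted group means
  colSums(w[grp == k] * x[grp == k, ]) / sum(w[grp == k]))
names(m) <- levels(grp)

resid <- x - t(sapply(as.character(grp), function(k) m[[k]]))
C <- crossprod(sqrt(w) * resid) / (sum(w) - nlevels(grp))  # pooled covariance
```

Note that mahalanobis() returns squared distances, so the weights compare them directly against the chi-squared quantile rather than its square root.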

Evaluation of the discriminant rules
In order to evaluate and compare different discriminant rules, their discriminating power has to be determined in the classification of future observations, i.e., we need an estimate of the overall probability of misclassification. A number of methods to estimate this probability exist in the literature; see for example Lachenbruch (1975). The apparent error rate (known also as resubstitution error rate or reclassification error rate) is the most straightforward estimator of the actual error rate in discriminant analysis and is calculated by applying the classification criterion to the same data set from which it was derived. The number of misclassified observations for each group is divided by the group sample size. An estimate of the apparent error rate is calculated by the method predict() of the class Lda. Examples are given in the next section.
It is well known that this method is too optimistic (the true error rate is likely to be higher). If there are plenty of observations in each class, the error rate can be estimated by splitting the data into a training and a validation set. The first is used to estimate the discriminant rules and the second to estimate the misclassification error. This method is fast and easy to apply but wasteful of data. Another method is leave-one-out cross-validation (Lachenbruch and Mickey 1968) which proceeds by removing one observation from the data set, estimating the discriminant rule using the remaining n − 1 observations, and then classifying this observation with the estimated rule. For classical discriminant analysis there exist updating formulas which avoid re-computing the discriminant rule at each step, but no such formulas are available for the robust methods. Thus the estimation of the error rate by this method can be very time consuming, depending on the size of the data set. Nevertheless, rrcov provides an (internal, not exported) function rrcov:::.CV() which calculates the leave-one-out cross-validation error rate by "brute force", but the user should be aware that its usage is appropriate only for moderate data sets. An improvement would be the implementation of a cross-validation technique similar to the one proposed by Hubert and Engelen (2007).
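The brute-force scheme behind a function like rrcov:::.CV() can be sketched generically: refit the rule n times, leaving out one observation each time. A simple nearest-centroid classifier stands in for the (robust) discriminant rule below; loo_cv(), fit() and classify() are illustrative helpers, not part of rrcov:

```r
## Leave-one-out CV: refit the rule without observation i, classify x_i,
## and average the misclassifications over all n observations.
loo_cv <- function(x, grouping, fit, classify) {
  n <- nrow(x)
  wrong <- vapply(seq_len(n), function(i) {
    rule <- fit(x[-i, , drop = FALSE], grouping[-i])   # refit n times
    classify(rule, x[i, , drop = FALSE]) != grouping[i]
  }, logical(1))
  mean(wrong)                      # estimated misclassification rate
}

## Stand-in rule: classify to the nearest group centroid.
fit <- function(x, g) lapply(split(as.data.frame(x), g), colMeans)
classify <- function(rule, xnew)
  names(rule)[which.min(vapply(rule, function(m)
    sum((as.numeric(xnew) - m)^2), numeric(1)))]

loo_cv(as.matrix(iris[, 1:4]), iris$Species, fit, classify)
```

The cost is n full refits of the rule, which is why this approach is practical only for moderate data sets when each refit involves a high breakdown point estimator.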

Example: Diabetes data
As an example for demonstrating the usage of the robust discriminant analysis classes and methods we use the diabetes data set, which was analyzed by Reaven and Miller (1979) in an attempt to examine the relationship between chemical diabetes and overt diabetes in 145 nonobese adult subjects. Their analysis was focused on three primary variables, and the 145 individuals were classified initially on the basis of their plasma glucose levels into three groups: normal subjects, chemical diabetes and overt diabetes. This data set was also analyzed by Hawkins and McLachlan (1997) in the context of robust linear discriminant analysis. The data set is available in several R packages: diabetes in package mclust (Fraley and Raftery 2009), chemdiab in package locfit (Loader 2007) and diabetes.dat in Rfwdmv (Atkinson et al. 2005). We are going to use the one from mclust, in which the value of the second variable, insulin, on the 104th observation is 45, while in the other data sets this value is 455 (note that 45 is more likely to be an outlier in this variable than 455).
We start by bringing the data set diabetes from package mclust into the workspace by typing

R> data("diabetes")
Using the package lattice (Sarkar 2008) we produce a three dimensional cloud plot of the data (Figure 15).

R> library("lattice")   # load the graphical library
R> ## set different plot symbols - important for black-and-white print
R> sup.sym <- trellis.par.get("superpose.symbol")
R> sup.sym$pch <- c(1, 2, 3, 4, 5)
R> trellis.par.set("superpose.symbol", sup.sym)
R> cloud.plt <- cloud(insulin ~ glucose + sspg, groups = group,
+    data = diabetes, auto.key =

We will first apply classical linear discriminant analysis as implemented in LdaClassic(), using the formula interface of the function: the grouping variable is group and all the remaining variables in diabetes are used as predictor variables. The show() method will present the results of the computations: the group means, the (common) within-groups covariance matrix, and the linear discriminant functions together with the corresponding constants. The prior probabilities (either provided by the user or estimated as the proportions of the groups) are also presented.

R> lda <- LdaClassic(group ~ glucose + insulin + sspg, data = diabetes)
R> lda

Call:
LdaClassic(group ~ glucose + insulin + sspg, data = diabetes)

Now the predict() method can be used on the Lda object (Lda is the base class of both LdaClassic and LdaRobust) in order to classify new observations. The method returns an object of class PredictLda which has its own show() method. If no new data are supplied, the training sample will be reclassified and a classification table will be produced to estimate the apparent error rate of the classification rules.

R> predict(lda)
Apparent error rate 0.1724

Robust linear discriminant analysis can be performed in a similar way, using the function Linda() (which will create an object of class Linda). As before, the predict() method called without new data returns a classification table for the training sample. Using the internal convenience function rrcov:::.AER() we can calculate and print the apparent error rate (which now equals 0.1103 and is lower than the 0.1310 obtained with the classical LDA).
The variance-covariance structures of the three classes in the diabetes data set differ substantially and we can expect improved results if quadratic discriminant analysis is used. Robust quadratic discriminant analysis is performed by the function QdaCov() which will return an object of class QdaCov.

Conclusions
In this paper we presented an object oriented framework for robust multivariate analysis developed in the S4 class system of the programming environment R. The main goal of the framework is to support the usage, experimentation, development and testing of robust multivariate methods as well as simplifying comparisons with related methods. It minimizes the effort for performing any of the following tasks:

• application of the already existing robust multivariate methods for practical data analysis;
• implementation of new robust multivariate methods or variants of the existing ones;
• evaluation of existing and new methods by carrying out comparison studies.
A major design goal was openness to extensions by the development of new robust methods in the package rrcov or in other packages depending on rrcov. Further classes implementing robust multivariate methods like principal component regression and partial least squares will follow, but the user is also encouraged to develop their own methods using the proposed reusable statistical design patterns.