Abstract
This article compares several methods for performing robust principal component analysis, two of which have not been considered in previous articles. The criterion here, unlike that of extant articles aimed at comparing methods, is how well a method maximizes a robust version of the generalized variance of the projected data. This is in contrast to maximizing some measure of scatter associated with the marginal distributions of the projected scores, which does not take into account the overall structure of the projected data. Included are comparisons in which distributions are not elliptically symmetric. One of the new methods simply removes outliers using a projection-type multivariate outlier detection method that has been found to perform well relative to other outlier detection methods that have been proposed. The other new method belongs to the class of projection pursuit techniques and differs from other projection pursuit methods in terms of the function it tries to maximize. The comparisons include the method derived by Maronna (2005), the spherical method derived by Locantore et al. (1999), as well as a method proposed by Hubert, Rousseeuw, and Vanden Branden (2005). From the perspective used, the method by Hubert et al. (2005), the spherical method, and one of the new methods dominate the method derived by Maronna.
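The criterion described above can be illustrated with a minimal sketch, assuming NumPy. It finds principal directions with a simplified form of the spherical approach (center at the spatial median, project each observation onto the unit sphere, then take ordinary principal directions), and it scores a set of directions by the determinant of the covariance matrix of the projected data. The function names, the plain Weiszfeld iteration for the spatial median, and the trimmed covariance used as a stand-in for a robust generalized variance are all illustrative assumptions, not the estimators used in the article.

```python
import numpy as np


def spatial_median(X, n_iter=200, tol=1e-8):
    """Spatial (L1) median via plain Weiszfeld iterations (assumed helper)."""
    m = np.median(X, axis=0)
    for _ in range(n_iter):
        # Floor the distances so a point equal to m does not divide by zero.
        d = np.maximum(np.linalg.norm(X - m, axis=1), tol)
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def classical_pca_directions(X, k):
    """First k principal directions of the mean-centered data (columns)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T


def spherical_pca_directions(X, k):
    """Simplified spherical PCA: directions of the data projected onto the
    unit sphere around the spatial median (no recentering after projection)."""
    Z = X - spatial_median(X)
    norms = np.linalg.norm(Z, axis=1)
    keep = norms > 0
    U = Z[keep] / norms[keep, None]
    _, _, vt = np.linalg.svd(U, full_matrices=False)
    return vt[:k].T


def generalized_variance(scores):
    """Determinant of the sample covariance matrix of the projected scores."""
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    return float(np.linalg.det(cov))


def trimmed_generalized_variance(scores, keep_frac=0.8):
    """Stand-in robust version (an assumption, not the article's estimator):
    drop the points farthest from the coordinatewise median, measured in
    MAD-scaled distance, then take the generalized variance of the rest.
    Assumes no coordinate has a MAD of zero."""
    med = np.median(scores, axis=0)
    mad = np.median(np.abs(scores - med), axis=0)
    d = np.linalg.norm((scores - med) / mad, axis=1)
    return generalized_variance(scores[d <= np.quantile(d, keep_frac)])
```

Given an n-by-p data matrix `X`, the projected data for k components can be formed as `(X - center) @ V`, where `V` comes from either direction function, and then passed to `trimmed_generalized_variance`. Because the determinant uses the full covariance matrix of the scores, it reflects their joint structure rather than only the marginal scatter of each component.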
References
Bedall, F. K., & Zimmermann, H. (1979). Algorithm AS 143: The mediancentre. Applied Statistics, 28, 325–328.
Bernholt, T. (2006). Robust estimators are hard to compute (Technical report). ls2-www.cs.uni-dortmund.de/~bernholt/ps/tr52-05.pdf.
Bernholt, T., & Fischer, P. (2004). The complexity of computing the MCD estimator. Theoretical Computer Science, 326, 383–393.
Brys, G., Hubert, M., & Rousseeuw, P. J. (2005). A robustification of independent component analysis. Journal of Chemometrics, 19, 364–375.
Campbell, N. A. (1980). Robust procedures in multivariate analysis I: Robust covariance estimation. Applied Statistics, 29, 231–237.
Carling, K. (2000). Resistant outlier rules and the non-Gaussian case. Computational Statistics & Data Analysis, 33, 249–258.
Croux, C., Filzmoser, P., & Oliveira, M. R. (2007). Algorithms for projection-pursuit robust principal component analysis. Chemometrics & Intelligent Laboratory Systems, 87, 218–225.
Croux, C., & Haesbroeck, G. (2000). Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika, 87, 603–618.
Croux, C., & Ruiz-Gazen, A. (1996). A fast algorithm for robust principal components based on projection pursuit. In A. Prat (Ed.), Compstat: Proceedings in computational statistics (pp. 211–216). Heidelberg: Physica.
Croux, C., & Ruiz-Gazen, A. (2005). High breakdown estimators for principal components: The projection-pursuit approach revisited. Journal of Multivariate Analysis, 95, 206–226.
Devlin, S. J., Gnanadesikan, R., & Kettenring, J. R. (1981). Robust estimation of dispersion matrices and principal components. Journal of the American Statistical Association, 76, 354–362.
Engelen, S., Hubert, M., & Vanden Branden, K. (2005). A comparison of three procedures for robust PCA in high dimensions. Austrian Journal of Statistics, 34, 117–126.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust statistics: The approach based on influence functions. New York: Wiley.
Hawkins, D. M., & Olive, D. J. (2002). Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm. Journal of the American Statistical Association, 97, 136–159.
Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distributions. In D. C. Hoaglin, F. Mosteller, & J. W. Tukey (Eds.), Exploring data tables, trends, and shapes (pp. 461–514). New York: Wiley.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Hubert, M., Rousseeuw, P. J., & Vanden Branden, K. (2005). ROBPCA: A new approach to robust principal component analysis. Technometrics, 47, 64–79.
Hubert, M., Rousseeuw, P. J., & Verboven, S. (2002). A fast method for robust principal components with applications to chemometrics. Chemometrics & Intelligent Laboratory Systems, 60, 101–111.
Li, G., & Chen, Z. (1985). Projection-pursuit approach to robust dispersion matrices and principal components: Primary theory and Monte Carlo. Journal of the American Statistical Association, 80, 759–766.
Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T., & Cohen, K. L. (1999). Robust principal components for functional data. Test, 8, 1–28.
Maronna, R. A. (2005). Principal components and orthogonal regression based on robust scales. Technometrics, 47, 264–273.
Massé, J.-C., & Plante, J.-F. (2003). A Monte Carlo study of the accuracy and robustness of ten bivariate location estimators. Computational Statistics & Data Analysis, 42, 1–26.
Olive, D. J. (2004). A resistant estimator of multivariate location and dispersion. Computational Statistics & Data Analysis, 46, 93–102.
Olive, D. J. (2007). Applied robust statistics. Unpublished manuscript. www.math.siu.edu/olive/ol-bookp.htm.
Olive, D. J., & Hawkins, D. M. (2007). Robustifying robust estimators. Preprint (www.math.siu.edu/olive/preprints.htm).
Peña, D., & Prieto, F. J. (2001). Multivariate outlier detection and robust covariance matrix estimation. Technometrics, 43, 286–299.
Rousseeuw, P. J. (1985). Multivariate estimation with high breakdown point. In W. Grossmann, G. Pflug, & W. Wertz (Eds.), Mathematical statistics and applications, B (pp. 283–297). Dordrecht: Reidel.
Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York: Wiley.
Rousseeuw, P. J., & Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, 212–223.
Rousseeuw, P. J., & van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85, 633–639.
Salibián-Barrera, M., Van Aelst, S., & Willems, G. (2006). Principal components analysis based on multivariate MM estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198–1211.
Vardi, Y., & Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proceedings of the National Academy of Sciences, 97, 1423–1426.
Wilcox, R. R. (2003). Applying contemporary statistical techniques. New York: Academic Press.
Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). New York: Academic Press.
Wilcox, R. R. (in press). Some small-sample properties of some recently proposed multivariate outlier detection techniques. Journal of Statistical Computation & Simulation.
Williams, N., Stanchina, J., Bezdjian, S., Skrok, E., Raine, A., & Baker, L. (2005). Porteus’ mazes and executive function in children: Standardized administration and scoring, and relationships to childhood aggression and delinquency. Unpublished manuscript. Los Angeles: University of Southern California, Department of Psychology.
Cite this article
Wilcox, R.R. Robust principal components: A generalized variance perspective. Behav Res 40, 102–108 (2008). https://doi.org/10.3758/BRM.40.1.102
DOI: https://doi.org/10.3758/BRM.40.1.102