Abstract
In this paper we propose a new approach to the exploratory analysis of multivariate clustered data. Our technique is based on a fast forward search algorithm that orders multivariate observations from those most in agreement with a specified clustering structure to those least in agreement with it. Simple graphical displays of a variety of statistics computed during the forward search make it possible to identify multiple outliers and influential observations in nonhierarchical cluster analysis without being affected by masking and swamping problems. The suggested approach is applied to the convergent K-means method in two examples, using both real and simulated data.
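The abstract describes the forward search only in outline. The Python sketch below is a generic illustration of the idea and is not the algorithm of the paper. It assumes that the cluster labels come from a prior K-means run and are held fixed, and it grows a subset of units one at a time, from those closest to their cluster's centre to those farthest away. The following choices are all assumptions made for illustration: the starting rule (units nearest each cluster's coordinate-wise median), the distance (Euclidean distance to the centroid of the unit's own cluster, estimated from the current subset), keeping at least one unit per cluster, and the monitoring statistic (the smallest distance among units still outside the subset). The names `forward_search` and `per_cluster` are hypothetical.

```python
import math
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def forward_search(points, labels, per_cluster=2):
    """Hypothetical forward search for a fixed partition (e.g. from K-means).

    Returns one (subset_size, next_unit, distance) tuple per step, where
    next_unit is the unit outside the current subset that is closest to the
    subset-based centroid of its own cluster.
    """
    n = len(points)
    clusters = sorted(set(labels))
    # Assumed robust start: per cluster, the units nearest the coordinate-wise median.
    subset = set()
    for c in clusters:
        members = [i for i in range(n) if labels[i] == c]
        med = tuple(median(col) for col in zip(*(points[i] for i in members)))
        members.sort(key=lambda i: sq_dist(points[i], med))
        subset.update(members[:per_cluster])
    trace = []
    while len(subset) < n:
        # Centroids are estimated from the units currently in the subset only.
        cents = {c: centroid([points[i] for i in subset if labels[i] == c])
                 for c in clusters}
        d = [sq_dist(points[i], cents[labels[i]]) for i in range(n)]
        outside = [i for i in range(n) if i not in subset]
        nxt = min(outside, key=lambda i: d[i])
        trace.append((len(subset), nxt, math.sqrt(d[nxt])))
        m = len(subset) + 1
        # Keep the closest unit of every cluster, then fill up to m units by rank.
        new = {min((i for i in range(n) if labels[i] == c), key=lambda i: d[i])
               for c in clusters}
        for i in sorted(range(n), key=lambda i: d[i]):
            if len(new) >= m:
                break
            new.add(i)
        subset = new
    return trace


# Toy data: two tight groups, plus one outlying unit (index 8) labelled as group 0.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, -6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]
trace = forward_search(points, labels, per_cluster=2)
for size, unit, dist in trace:
    print(size, unit, round(dist, 3))
```

On this toy data the outlying unit joins the subset last, and its distance jumps well above the values recorded at earlier steps. Plotting the third component of `trace` against the subset size gives a simple display in which such a jump stands out.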
Copyright information
© 1998 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Cerioli, A. (1998). A New Method for Detecting Influential Observations in Nonhierarchical Cluster Analysis. In: Rizzi, A., Vichi, M., Bock, HH. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_2
DOI: https://doi.org/10.1007/978-3-642-72253-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64641-9
Online ISBN: 978-3-642-72253-0
eBook Packages: Springer Book Archive