A New Method for Detecting Influential Observations in Nonhierarchical Cluster Analysis

Cerioli, Andrea

doi:10.1007/978-3-642-72253-0_2

Andrea Cerioli

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In this paper we propose a new approach to the exploratory analysis of multivariate clustered data. Our technique is based on a fast forward search algorithm that orders multivariate observations from those most in agreement with a specified clustering structure to those least in agreement with it. Simple graphical displays of a variety of statistics involved in the forward search lead to the identification of multiple outliers and influential observations in nonhierarchical cluster analysis, without being affected by masking and swamping problems. The suggested approach is applied to the convergent K-means method in two examples, using both real and simulated data.
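
The ordering idea described in the abstract can be illustrated with a minimal sketch. This is not the chapter's algorithm, whose details are not given on this page. It assumes a fixed cluster assignment (for example, the output of a K-means run), measures agreement by Euclidean distance to the cluster mean, and only ever adds observations to the fitted subset. The names `cluster_centres`, `forward_search` and `start_per_cluster` are hypothetical and chosen only for this illustration.

```python
from math import dist
from statistics import fmean, median


def cluster_centres(points, labels, subset, centre=fmean):
    """Coordinate-wise centre of each cluster, using only the indices in `subset`."""
    groups = {}
    for i in subset:
        groups.setdefault(labels[i], []).append(points[i])
    return {k: tuple(centre(coord) for coord in zip(*members))
            for k, members in groups.items()}


def forward_search(points, labels, start_per_cluster=2):
    """Order observations from most to least in agreement with `labels`.

    Assumption: agreement is the Euclidean distance to the mean of the
    observation's own cluster, re-estimated from the current subset only.
    """
    everyone = range(len(points))
    # Robust start: per cluster, the observations nearest the coordinate-wise median.
    medians = cluster_centres(points, labels, everyone, centre=median)
    subset = []
    for k in medians:
        members = [i for i in everyone if labels[i] == k]
        members.sort(key=lambda i: dist(points[i], medians[k]))
        subset.extend(members[:start_per_cluster])
    trace = []  # (index, distance at entry) for each step of the search
    while len(subset) < len(points):
        centres = cluster_centres(points, labels, subset)
        inside = set(subset)
        candidates = [i for i in everyone if i not in inside]
        # Simplification: add the single closest excluded observation per step.
        nxt = min(candidates, key=lambda i: dist(points[i], centres[labels[i]]))
        trace.append((nxt, dist(points[nxt], centres[labels[nxt]])))
        subset.append(nxt)
    return subset, trace


# Made-up data: two tight groups plus one stray point labelled as cluster 1.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),
          (2.5, 9.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
order, trace = forward_search(points, labels)
```

Because the centres are estimated only from observations already in the subset, a stray point cannot pull its cluster centre toward itself before it enters, which is the intuition behind avoiding masking. Plotting the entry distances in `trace` against the step number would give a simple display of the kind the abstract mentions: observations that disagree with the clustering enter last and show up as a jump at the end of the search.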

Author information

Authors and Affiliations

Istituto di Statistica, Università di Parma, Via Kennedy 6, 43100, Parma, Italy
Andrea Cerioli

Editor information

Editors and Affiliations

Dipartimento di Statistica, Probabilità e Statistiche Applicate, Università di Roma “La Sapienza”, Piazzale Aldo Moro 5, I-00185, Roma, Italia
Alfredo Rizzi
Dipartimento di Metodi Quantitativi e Teoria Economica, Università “G. D’Annunzio” di Chieti, Viale Pindaro 42, I-65127, Pescara, Italia
Maurizio Vichi
Institut für Statistik und Wirtschaftsmathematik, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Wüllnerstraße 3, D-52056, Aachen, Germany
Hans-Hermann Bock

About this paper

Cite this paper

Cerioli, A. (1998). A New Method for Detecting Influential Observations in Nonhierarchical Cluster Analysis. In: Rizzi, A., Vichi, M., Bock, H.-H. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_2

DOI: https://doi.org/10.1007/978-3-642-72253-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64641-9
Online ISBN: 978-3-642-72253-0
eBook Packages: Springer Book Archive
