A New Method for Detecting Influential Observations in Nonhierarchical Cluster Analysis

  • Conference paper
Advances in Data Science and Classification

Abstract

In this paper we propose a new approach to the exploratory analysis of multivariate clustered data. Our technique is based on a fast forward search algorithm which orders multivariate observations from those most in agreement with a specified clustering structure to those least in agreement with it. Simple graphical displays of a variety of statistics involved in the forward search lead to the identification of multiple outliers and influential observations in nonhierarchical cluster analysis, without being affected by masking and swamping problems. The suggested approach is applied to the convergent K-means method in two examples, both with real and simulated data.
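
The ordering idea described in the abstract can be illustrated with a minimal sketch. This is not the author's algorithm, whose details are not given on this page. It assumes a fixed cluster assignment (for example, from a converged K-means fit), a robust start from coordinate-wise cluster medians, Euclidean distance to the centre of each observation's own cluster as the measure of agreement, and the minimum distance among excluded observations as the monitored statistic. The function name `forward_search` and the parameter `m0` are hypothetical.

```python
import numpy as np


def forward_search(X, labels, m0):
    """Order observations by agreement with a given clustering (illustrative).

    X      : (n, p) data matrix
    labels : length-n integer cluster labels 0..k-1, every cluster non-empty
    m0     : size of the initial subset
    Returns (entry, monitor): entry[i] is the subset size at which
    observation i first joined; monitor[t] is the smallest distance among
    observations outside the subset at step t.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, k = len(X), labels.max() + 1

    # Assumption: robust starting centres via coordinate-wise medians.
    centers = np.array([np.median(X[labels == j], axis=0) for j in range(k)])
    dist = np.linalg.norm(X - centers[labels], axis=1)
    subset = np.argsort(dist, kind="stable")[:m0]

    entry = np.full(n, -1)
    entry[subset] = m0
    monitor = []

    for m in range(m0, n):
        # Refit each centre on the current subset only; a cluster with no
        # member in the subset keeps its previous centre.
        for j in range(k):
            members = subset[labels[subset] == j]
            if members.size:
                centers[j] = X[members].mean(axis=0)
        dist = np.linalg.norm(X - centers[labels], axis=1)

        # Monitored statistic: closest observation still outside the subset.
        outside = np.ones(n, dtype=bool)
        outside[subset] = False
        monitor.append(dist[outside].min())

        # Grow the subset to the m + 1 observations closest to their centres.
        subset = np.argsort(dist, kind="stable")[:m + 1]
        new = subset[entry[subset] == -1]
        entry[new] = m + 1

    return entry, np.array(monitor)


# Two tight clusters plus one planted outlier (last row, labelled cluster 0).
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1],
              [5, 5], [5.1, 5], [5, 5.1], [4.9, 5], [5, 4.9],
              [0, 3]])
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
entry, monitor = forward_search(X, labels, m0=4)
print(entry)
print(monitor)
```

In this toy data set the planted point is the last to enter the subset, and the monitored distance jumps at the final step. Because the centres are always fitted without the observations still excluded, an outlier cannot pull a centre toward itself before it enters, which is the intuition behind avoiding masking.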

Copyright information

© 1998 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Cerioli, A. (1998). A New Method for Detecting Influential Observations in Nonhierarchical Cluster Analysis. In: Rizzi, A., Vichi, M., Bock, HH. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_2

  • DOI: https://doi.org/10.1007/978-3-642-72253-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64641-9

  • Online ISBN: 978-3-642-72253-0

  • eBook Packages: Springer Book Archive
