Abstract
We pre-processed a female urinary incontinence data set by removing uninformative variables, outliers, and noise, so that hierarchical clustering methods could find partitions that resemble the diagnostic classes. Outliers were identified with box plots and Mahalanobis distances, while noisy cases were detected with the repeated edited nearest neighbor rule. The cleaned data were analyzed with six clustering methods. The best results, as measured with the Fowlkes and Mallows similarity measure, were achieved with complete linkage (0.90) and Ward’s method (0.84). These methods managed to separate the two largest diagnostic classes, stress and mixed incontinence, from each other. The single linkage, average linkage, centroid, and median methods, by contrast, were not able to differentiate between these classes. The results agree with our earlier results, which indicated that supervised methods are better suited than cluster analysis to classifying these data. However, the identified outliers, noisy cases, and clusters may be of interest to expert physicians.
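As a rough illustration of how a clustering can be scored against diagnostic classes, the sketch below computes a Fowlkes and Mallows style similarity between two flat partitions of the same cases. It is a minimal example and not the authors' implementation: the function name `fowlkes_mallows`, the toy labels, and the assumption that the dendrogram has already been cut into flat clusters are all illustrative.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows similarity between two flat partitions of the same cases.

    Returns a value in [0, 1]; 1 means the two partitions group the cases
    identically (cluster names themselves do not matter).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cases")

    # Pairs of cases that are grouped together in BOTH partitions.
    together_both = sum(
        comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs of cases that are grouped together in each partition on its own.
    together_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    together_b = sum(comb(n, 2) for n in Counter(labels_b).values())

    if together_a == 0 or together_b == 0:
        # At least one partition has only singleton clusters; no pairs to compare.
        return 0.0
    return together_both / sqrt(together_a * together_b)


# Hypothetical toy data (not from the paper): six cases, two diagnostic
# classes, and a two-cluster solution that misplaces one stress case.
diagnoses = ["stress", "stress", "stress", "mixed", "mixed", "mixed"]
clusters = [1, 1, 2, 2, 2, 2]

score = fowlkes_mallows(diagnoses, clusters)
print(f"Fowlkes-Mallows similarity: {score:.2f}")
```

The measure counts pairs of cases rather than individual cases, so it needs no matching between cluster numbers and class names; relabeling the clusters leaves the score unchanged.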
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Laurikkala, J., Juhola, M. (2001). Hierarchical Clustering of Female Urinary Incontinence Data Having Noise and Outliers. In: Crespo, J., Maojo, V., Martin, F. (eds) Medical Data Analysis. ISMDA 2001. Lecture Notes in Computer Science, vol 2199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45497-7_24
DOI: https://doi.org/10.1007/3-540-45497-7_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42734-6
Online ISBN: 978-3-540-45497-7
eBook Packages: Springer Book Archive