Clustering Mixed Type Attributes in Large Dataset

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3758)

Abstract

Clustering is a widely used technique in data mining. Many clustering algorithms exist, but most either handle only a single attribute type, or handle mixed attribute types but are inefficient on large data sets. Few algorithms do both well. In this paper, we propose a clustering algorithm, CFIKP, that can handle large datasets with mixed-type attributes. We first use a CF*-tree to pre-cluster the dataset. Once the dense regions are stored in the leaf nodes, we treat each dense region as a single point and use an improved k-prototype algorithm to cluster these dense regions. Experiments show that the CFIKP algorithm is very efficient in clustering large datasets with mixed-type attributes.
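The two-stage idea in the abstract (pre-cluster into dense regions, then cluster the regions as single points) can be sketched roughly as follows. This is not the authors' CFIKP implementation: the abstract does not specify the CF*-tree or the improvements to k-prototype, so the sketch substitutes a flat single-pass threshold scan for the tree and a size-weighted version of the standard k-prototypes dissimilarity (squared Euclidean distance plus `gamma` times the number of categorical mismatches). The names `Region`, `pre_cluster`, `cluster_regions`, `threshold`, and `gamma` are all hypothetical.

```python
from collections import Counter

def mixed_distance(num_a, cat_a, num_b, cat_b, gamma):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the number of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches

class Region:
    """Summary of one dense region: record count, numeric sums,
    and per-attribute category counts (assumed stand-in for a leaf entry)."""
    def __init__(self, num, cat):
        self.n = 1
        self.sums = list(num)
        self.counts = [Counter([c]) for c in cat]

    def add(self, num, cat):
        self.n += 1
        self.sums = [s + x for s, x in zip(self.sums, num)]
        for counter, c in zip(self.counts, cat):
            counter[c] += 1

    def centroid(self):
        return [s / self.n for s in self.sums]

    def modes(self):
        return [counter.most_common(1)[0][0] for counter in self.counts]

def pre_cluster(records, threshold, gamma):
    """Stage 1 (assumption: flat scan instead of a CF*-tree). Absorb each
    record into the nearest region if within `threshold`, else open a new one."""
    regions = []
    for num, cat in records:
        best, best_d = None, None
        for r in regions:
            d = mixed_distance(num, cat, r.centroid(), r.modes(), gamma)
            if best_d is None or d < best_d:
                best, best_d = r, d
        if best is not None and best_d <= threshold:
            best.add(num, cat)
        else:
            regions.append(Region(num, cat))
    return regions

def cluster_regions(regions, k, gamma, max_iter=20):
    """Stage 2: k-prototypes over region summaries, each region treated as
    one point weighted by the number of records it absorbed."""
    points = [(r.centroid(), r.modes(), r.n) for r in regions]
    protos = [(p[0], p[1]) for p in points[:k]]  # naive seeding: first k regions
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: mixed_distance(
                num, cat, protos[j][0], protos[j][1], gamma))
            for num, cat, _ in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if not members:
                continue  # keep the old prototype for an empty cluster
            total = sum(w for _, _, w in members)
            mean = [sum(num[d] * w for num, _, w in members) / total
                    for d in range(len(members[0][0]))]
            modes = []
            for a in range(len(members[0][1])):
                votes = Counter()
                for _, cat, w in members:
                    votes[cat[a]] += w
                modes.append(votes.most_common(1)[0][0])
            protos[j] = (mean, modes)
    return labels, protos

# Toy data: (numeric attributes, categorical attributes) per record.
records = [
    ([1.0, 1.0], ["a"]), ([1.2, 0.9], ["a"]), ([8.0, 8.0], ["b"]),
    ([8.1, 7.9], ["b"]), ([0.9, 1.1], ["a"]), ([8.3, 9.5], ["b"]),
]
regions = pre_cluster(records, threshold=1.0, gamma=1.0)
labels, protos = cluster_regions(regions, k=2, gamma=1.0)
```

On this toy input the first stage produces three regions, and the second stage merges the two regions near (8, 8) into one cluster. The second stage iterates over region summaries rather than raw records, which is where a saving on large datasets would come from under these assumptions.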

This work is supported by the National Natural Science Foundation of China (60205007), the Natural Science Foundation of Guangdong Province (031558, 04300462), the Research Foundation of the National Science and Technology Plan Project (2004BA721A02), the Research Foundation of the Science and Technology Plan Project in Guangdong Province (2003C50118), and the Research Foundation of the Science and Technology Plan Project in Guangzhou City (2002Z3-E0017).



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yin, J., Tan, Z. (2005). Clustering Mixed Type Attributes in Large Dataset. In: Pan, Y., Chen, D., Guo, M., Cao, J., Dongarra, J. (eds) Parallel and Distributed Processing and Applications. ISPA 2005. Lecture Notes in Computer Science, vol 3758. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576235_66

  • DOI: https://doi.org/10.1007/11576235_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29769-7

  • Online ISBN: 978-3-540-32100-2

  • eBook Packages: Computer Science, Computer Science (R0)
