Skip to main content
Log in

Chameleon based on clustering feature tree and its application in customer segmentation

  • Published:
Annals of Operations Research Aims and scope Submit manuscript

Abstract

Clustering analysis plays an important role in the filed of data mining. Nowadays, hierarchical clustering technique is becoming one of the most widely used clustering techniques. However, for most algorithms of hierarchical clustering technique, the requirements of high execution efficiency and high accuracy of clustering result cannot be met at the same time. After analyzing the advantages and disadvantages of the hierarchical algorithms, the paper puts forward a two-stage clustering algorithm, named Chameleon Based on Clustering Feature Tree (CBCFT), which hybridizes the Clustering Tree of algorithm BIRCH with algorithm CHAMELEON. By calculating the time complexity of CBCFT, the paper argues that the time complexity of CBCFT increases linearly with the number of data. By experimenting on sample data set, this paper demonstrates that CBCFT is able to identify clusters with large variance in size and shape and is robust to outliers. Moreover, the result of CBCFT is as similar as that of CHAMELEON, but CBCFT overcomes the shortcoming of the low execution efficiency of CHAMELEON. Although the execution time of CBCFT is longer than BIRCH, the clustering result of CBCFT is much satisfactory than that of BIRCH. Finally, through a case of customer segmentation of Chinese Petroleum Corp. HUBEI branch; the paper demonstrates that the clustering result of the case is meaningful and useful.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Ankerst, M., Breunig, M., & Kriegel, H. P. Sander, J. (1999). OPICS: Ordering points to identify the clustering structure. In Proceedings of 1999 ACM-SIGMOD international conference on management of data (SIGMOD’99), Philadelphia, June 1999 (pp. 49–60).

  • Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866–883.

    Article  Google Scholar 

  • Chicc, G., Napoli, R. C., & Piglione, F. (2006). Comparisons among clustering techniques for electricity customer classification. IEEE Transactions on Power Systems, 21(2).

  • Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the second international conference of knowledge discovery and data mining (pp. 226–231).

  • Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3, 209–226.

    Article  Google Scholar 

  • Guha, S., Rastogi, R., & Shim, K. (1998) CURE: An efficient clustering algorithm for large databases. In Proceedings of ACM–SIGMOD international conference on management of data (pp. 73–84).

  • Guha, S., Rastogi, R., & Shim, K. (1999). ROCK: a robust clustering algorithm for categorical attributes. In Proceedings of the 15th international conference on data engineering (p. 512).

  • Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). Clustering algorithms and validity measures. In Proceedings of the thirteenth international conference on scientific and statistical database management (Vol. 3, pp. 1099–3371).

  • Han, J., & Kamber, M. (2001). Data mining concepts and techniques. BeiJing: Higher Education Press.

    Google Scholar 

  • Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proc. of 1998 int. conf. on knowledge discovery and data mining (KDD’98), New York, August 1998 (pp. 58–65).

  • Hu, T. L., & Sheu, J. B. (2003). A fuzzy-based customer classification method for demand-responsive logistical distribution operations. Fuzzy Sets and Systems, 139, 431–450.

    Article  Google Scholar 

  • Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264–323.

    Article  Google Scholar 

  • Jung, S. Y., & Kim, T. S. (2001). An agglomerative hierarchical clustering using partial maximum array and incremental similarity computation method. In First IEEE international conference on data mining (p. 265).

  • Karypis, G., Han, E. H., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer, 32(8), 68–75 (Special Issue on Data Analysis and Mining).

    Google Scholar 

  • Karypis, G., & Kumar, V. (1998). hMETIS 1.5: A hypergraph partitioning package. Technical report, Department of Computer Science, University of Minnesota. http://www.cs.umn.edu/~metis.

  • Lewis, J., & Chase, J. (2004). Data structures. Java edition. Englewood Cliffs: Prentice-Hall.

    Google Scholar 

  • Li, C., Becerra, V. M., & Deng, J. (2004). Extension of fuzzy c-means algorithm. In Proceedings of the IEEE conference on cybernetics and intelligent systems, Singapore, 1–3 December 2004.

  • Lin, C. R., & Chen, M. S. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Transactions on Knowledge and Date Engineering, 17(2).

  • Ng, R., & Han, J. (1994). Efficient and effective clustering method for spatial data mining. In Proceedings of international conference of very large data bases (VLDB’94), Santiago, Chile, September 1994 (pp. 144–155).

  • Olson, D. L., & Zhao, F. (2007). CIOs’ perspectives of critical success factors in ERP upgrade projects. Enterprise Information Systems, 1(1), 129–138.

    Article  Google Scholar 

  • Palomino, M. A., & Whitley, E. A. (2007). The effects of national culture on ERP implementation: a study of Colombia and Switzerland. Enterprise Information Systems, 3(1), 301–325.

    Google Scholar 

  • Qian, Y. T., Shi, Q. S., & Wang, Q. (2002). CURE-NS: A hierarchical clustering algorithm with new shrinking scheme. In Proceedings of the first international conference on machine learning and cybernetics, Beijing, 4–5 November 2002.

  • Sheikholeslami, G., Chatterjee, S., & Zhang, A. (1998). WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the IEEE conference on very large data bases (VLDB’98), New York, August 1998 (pp. 428–439).

  • Wan, L. H., Li, Y. J., Liu, W. Y., & Zha, D. Y. (2005). Application and study of spatial cluster and customer partitioning. In Proceedings of the fourth international conference on machine learning and cybernetics, Guangzhou, 18–21 August 2005.

  • Wang, W., Yang, J., & Muntz, R. (1997). STING: A statistical information grid approach to spatial data mining. In Proceedings of 1997 international conference on very large data bases (VLDB’97), Athens, Greece, August 1997 (pp. 186–195).

  • Wang, Z., He, P. L., Guo, L. S., & Zheng, X. S. (2004). Clustering analysis of customer relationship in securities trade. In Proceedings of the third international conference on machine learning and cybernetics, Shanghai, August 2004 (pp. 26–29).

  • Warfield, J. N. (2007). Systems science serves enterprise integration: a tutorial. Enterprise Information Systems, 2(1), 235–254.

    Article  Google Scholar 

  • Weiss, M. A. (2004). Data structures and algorithm analysis in Java. Englewood Cliffs: Prentice-Hall.

    Google Scholar 

  • Zhang, T., Ramakrishnan, R., & Linvy, M. (1996). Birch: an efficient data clustering method for large databases. In Proc. of 1996 ACM–SIGMOD international conference on management of data, Montreal, Quebec (pp. 103–114).

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kanliang Wang.

Additional information

The research is partially supported by National Natural Science Foundation of China (grants #70372049 and #70121001).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, J., Wang, K. & Xu, L. Chameleon based on clustering feature tree and its application in customer segmentation. Ann Oper Res 168, 225–245 (2009). https://doi.org/10.1007/s10479-008-0368-4

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10479-008-0368-4

Keywords

Navigation