Abstract
Clustering is a difficult problem, especially when the data arrive as a stream of categorical attributes. In this paper, we propose SCLOPE, a novel algorithm based on CLOPE’s intuitive observation about cluster histograms. Unlike CLOPE, however, our algorithm is very fast and operates within the constraints of a data stream environment. In particular, we designed SCLOPE according to the recent CluStream framework. Our evaluation of SCLOPE shows very promising results: it consistently outperforms CLOPE in speed and scalability tests on our data sets while maintaining high cluster purity, and it supports cluster analysis that other algorithms in its class do not.
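The abstract does not spell out the histogram observation, so the following is a minimal sketch of the criterion as it is usually stated for CLOPE (not SCLOPE itself): each cluster is summarised by a histogram of item occurrences, and a clustering scores higher when its histograms are tall relative to their width. The function names, the toy transactions, and the choice of repulsion parameter `r = 2.0` are assumptions made for illustration and do not come from this paper.

```python
from collections import Counter


def cluster_stats(transactions):
    """Return (size, width) of a cluster's item histogram.

    size  = total number of item occurrences in the cluster
    width = number of distinct items in the cluster
    """
    hist = Counter(item for t in transactions for item in set(t))
    return sum(hist.values()), len(hist)


def profit(clusters, r=2.0):
    """CLOPE-style profit: size-weighted average of S(C) / W(C)**r.

    r is the repulsion parameter; 2.0 is an assumed illustrative value.
    """
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    gain = 0.0
    for c in clusters:
        size, width = cluster_stats(c)
        if width == 0:  # skip empty clusters to avoid dividing by zero
            continue
        gain += size / width ** r * len(c)
    return gain / total


# Hypothetical toy transactions, each a set of categorical items.
good = [[{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}],
        [{"d", "e"}, {"d", "e", "f"}]]
bad = [[{"a", "b"}, {"a", "b", "c"}],
       [{"a", "c", "d"}, {"d", "e"}, {"d", "e", "f"}]]

print(profit(good), profit(bad))
```

In this toy case the first grouping keeps transactions with heavily overlapping items together, so its histograms are narrower for the same number of occurrences and its profit is higher than that of the second grouping.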
References
Bradley, P.S., Gehrke, J., Ramakrishnan, R., Srikant, R.: Philosophies and Advances in Scaling Mining Algorithms to Large Databases. Communications of the ACM (2002)
Hulten, G., Domingos, P.: Catching Up with the Data: Research Issues in Mining Data Streams. In: Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara, CA (2001)
Yang, Y., Guan, X., You, J.: CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data. In: Proc. SIGKDD, Edmonton, Canada (2002)
Aggarwal, C., Han, J., Wang, J., Yu, P.S.: A Framework for Clustering Evolving Data Streams. In: Proc. VLDB, Berlin, Germany (2003)
Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns without Candidate Generation. In: Proc. SIGMOD, Dallas, Texas, USA (2000)
Ng, R., Han, J.: Efficient and Effective Clustering Methods for Spatial Data Mining. In: Proc. VLDB, Santiago de Chile, Chile (1994)
Guha, S., Rastogi, R., Shim, K.: ROCK: A Robust Clustering Algorithm for Categorical Attributes. In: Proc. ICDE, Sydney, Australia (1999)
Wang, K., Xu, C., Liu, B.: Clustering Transactions Using Large Items. In: Proc. CIKM, Kansas City, Missouri, USA (1999)
Ong, K.L., Li, W., Ng, W.K., Lim, E.P.: SCLOPE: An Algorithm for Clustering Data Streams of Categorical Attributes. Technical Report (C04/05), Deakin University (2004)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: Proc. SIGMOD, Canada (1996)
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. In: Proc. SIGMOD, Seattle, Washington, USA (1998)
Ganti, V., Gehrke, J., Ramakrishnan, R.: CACTUS: Clustering Categorical Data Using Summaries. In: Proc. SIGKDD, San Diego, California, USA (1999)
Gibson, D., Kleinberg, J.M., Raghavan, P.: Clustering Categorical Data: An Approach Based on Dynamical Systems. In: Proc. VLDB, New York, USA (1998)
O’Callaghan, L., Meyerson, A., Motwani, R., Mishra, N., Guha, S.: Streaming Data Algorithms for High Quality Clustering. In: Proc. ICDE, USA (2002)
Barbara, D.: Requirements for Clustering Data Streams. ACM SIGKDD Explorations 2 (2002)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ong, KL., Li, W., Ng, WK., Lim, EP. (2004). SCLOPE: An Algorithm for Clustering Data Streams of Categorical Attributes. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2004. Lecture Notes in Computer Science, vol 3181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30076-2_21
DOI: https://doi.org/10.1007/978-3-540-30076-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22937-7
Online ISBN: 978-3-540-30076-2
eBook Packages: Springer Book Archive