ABSTRACT
The growing number of traces left behind by user transactions on the Internet (e.g. customer purchases and user navigation) has increased the importance of Web usage data analysis. A notable challenge of this analysis is that the way in which a website is visited can evolve over time. As a result, usage models must be continuously updated to reflect the current behaviour of visitors. In this article, we introduce CAMEUD, a clustering approach to mine and detect changes in evolving usage data. The proposed approach is independent of the clustering algorithm applied to the classification problem, and it can detect and determine the nature of the changes undergone by the usage groups (appearance, disappearance, fusion and split) over successive time intervals. Experiments on synthetic and real usage data sets evaluate the efficiency of CAMEUD.
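The four change types named in the abstract can be illustrated with a small sketch. This is not the CAMEUD procedure itself, which the abstract does not specify. It is a minimal overlap-based comparison of two clusterings of users taken at consecutive time intervals. The function name `detect_changes`, the `threshold` parameter and the rule that an old group "feeds" a new group when enough of its members land there are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# A clustering maps a group label to the set of user ids in that group.
Clustering = Dict[str, Set[str]]


def detect_changes(old: Clustering, new: Clustering,
                   threshold: float = 0.3) -> List[Tuple[str, tuple]]:
    """Label group changes between two consecutive time intervals.

    Illustrative assumption: old group o is linked to new group n when at
    least `threshold` of o's members are found in n.
    """
    links_from_old = {o: [] for o in old}
    links_to_new = {n: [] for n in new}
    for o, o_members in old.items():
        for n, n_members in new.items():
            if not o_members:
                continue
            share = len(o_members & n_members) / len(o_members)
            if share >= threshold:
                links_from_old[o].append(n)
                links_to_new[n].append(o)

    changes: List[Tuple[str, tuple]] = []
    for o, targets in links_from_old.items():
        if not targets:
            # No new group inherits a significant share of o.
            changes.append(("disappearance", (o,)))
        elif len(targets) > 1:
            # Several new groups each inherit a significant share of o.
            changes.append(("split", (o, *sorted(targets))))
    for n, sources in links_to_new.items():
        if not sources:
            # No old group contributes significantly to n.
            changes.append(("appearance", (n,)))
        elif len(sources) > 1:
            # Several old groups were absorbed into n.
            changes.append(("fusion", (*sorted(sources), n)))
    return changes


if __name__ == "__main__":
    old = {"A": {"u1", "u2", "u3", "u4"}, "B": {"u5", "u6"},
           "C": {"u7", "u8"}, "D": {"u9", "u10"}}
    new = {"X": {"u1", "u2"}, "Y": {"u3", "u4"},
           "Z": {"u5", "u6", "u7", "u8"}, "W": {"u11", "u12"}}
    for kind, groups in detect_changes(old, new):
        print(kind, groups)
```

Because the comparison only looks at group memberships, it does not depend on which clustering algorithm produced the groups, which mirrors the independence property stated in the abstract. The threshold value is arbitrary here and would need tuning on real usage data.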
Index Terms
- CAMEUD: clustering approach for mining evolving usage data