Abstract
User grouping has attracted growing interest among web researchers because it allows users to be offered a customized, better environment. This paper attempts to find user groups from their web page navigation patterns. The methodology incorporates five different ways of interpreting the input navigation sequence and investigates whether the interpretation influences the results. For each interpretation, pairwise sequence alignment is carried out to determine the number of aligned pages. A similarity matrix is then formulated from the number of aligned pages and the number of unique pages between users. The matrix is thresholded on the maximum value in each column (column thresholding), and breadth-first search is performed to identify the user groups. The publicly available MSNBC dataset is used to assess the proposed methodology. The Jaccard similarity coefficient is computed to measure inter-group similarity. The values show that the sorted input navigation sequence without redundancy provides effective clustering.
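The steps named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not give the alignment scoring, the similarity formula, or the thresholding rule, so several details are assumptions: the alignment is scored with match = 1 and mismatch = gap = 0 (which reduces to a longest-common-subsequence count), similarity is taken as aligned pages divided by the number of distinct pages across both users, and an edge is kept only where an entry equals the nonzero maximum of its column. All function names are hypothetical.

```python
from collections import deque


def aligned_pages(a, b):
    """Count matched pages in a global alignment of sequences a and b.

    Assumption: match = 1, mismatch = 0, gap = 0, so the alignment score
    equals the length of the longest common subsequence.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_matrix(sessions):
    """Assumed similarity: aligned pages / distinct pages across both users."""
    n = len(sessions)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            unique = len(set(sessions[i]) | set(sessions[j]))
            if unique:
                score = aligned_pages(sessions[i], sessions[j]) / unique
                sim[i][j] = sim[j][i] = score
    return sim


def column_threshold(sim):
    """Keep edge i-j when sim[i][j] is the nonzero maximum of column j (assumed rule)."""
    n = len(sim)
    adj = [set() for _ in range(n)]
    for j in range(n):
        col_max = max((sim[i][j] for i in range(n) if i != j), default=0.0)
        if col_max == 0:
            continue
        for i in range(n):
            if i != j and sim[i][j] == col_max:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def bfs_groups(adj):
    """Return connected components of the thresholded graph via breadth-first search."""
    seen = set()
    groups = []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            u = queue.popleft()
            group.append(u)
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        groups.append(sorted(group))
    return groups


def jaccard(a, b):
    """Jaccard similarity coefficient of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Toy sessions of page-category ids (made up for illustration, not MSNBC data).
raw_sessions = [[1, 2, 3], [2, 3, 1, 1], [6, 7], [7, 6, 6], [9]]
# One of the five interpretations: sorted sequence without redundancy.
prepared = [sorted(set(s)) for s in raw_sessions]
groups = bfs_groups(column_threshold(similarity_matrix(prepared)))
```

Here `prepared` corresponds to the "sorted input navigation sequence without redundancy" interpretation; the other four interpretations are not described in the abstract and are not reproduced. Calling `jaccard` on the page sets of two groups is one possible reading of the inter-group similarity measure, again an assumption.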
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Geetharamani, R., Revathy, P. (2020). Web User Grouping Based on Navigation Patterns Through Pair Wise Sequence Alignment and Breadth First Search. In: Kumar, A., Paprzycki, M., Gunjan, V. (eds) ICDSMLA 2019. Lecture Notes in Electrical Engineering, vol 601. Springer, Singapore. https://doi.org/10.1007/978-981-15-1420-3_180
DOI: https://doi.org/10.1007/978-981-15-1420-3_180
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1419-7
Online ISBN: 978-981-15-1420-3
eBook Packages: Intelligent Technologies and Robotics (R0)