Improving the prediction of page access by using semantically enhanced clustering

Sen, Erman; Toroslu, I. Hakki; Karagoz, Pinar

doi:10.1007/s10844-016-0398-3

Published: 20 April 2016

Volume 47, pages 165–192, (2016)

Journal of Intelligent Information Systems

Erman Sen¹,
I. Hakki Toroslu¹ &
Pinar Karagoz¹

Abstract

Many parameters may affect the navigation behaviour of web users. Predicting the next page that a user is likely to visit is important, since this information can be used for prefetching or for personalizing the page for that user. One successful method for determining the next web page is to construct behaviour models of users by clustering. The success of clustering is highly correlated with the similarity measure used to compare navigation sequences. This work proposes a new approach for determining the next web page by extending standard clustering with a content-based semantic similarity method. The semantics of web pages are represented as sets of concepts, and thus user sessions are modelled as sequences of sets. As a result, session similarity is defined as an alignment of two sequences of sets. The success of the proposed method is shown by applying it to real-life web log data.
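To make the idea of aligning two sequences of sets concrete, the following is a minimal sketch rather than the authors' actual measure. It assumes a Needleman-Wunsch-style global alignment and uses Jaccard overlap as the similarity between two concept sets; the abstract specifies neither choice. The names `jaccard`, `session_similarity`, and the `gap` parameter are hypothetical and introduced only for illustration.

```python
from typing import FrozenSet, Sequence

# A session is a sequence of pages; each page is a set of concept labels.
Session = Sequence[FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two concept sets in [0, 1] (assumed set similarity)."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def session_similarity(s: Session, t: Session, gap: float = 0.0) -> float:
    """Global alignment score of two sessions, divided by the longer length.

    With the default gap score of 0.0 the result lies in [0, 1].
    """
    if not s or not t:
        return 0.0
    n, m = len(s), len(t)
    # score[i][j] holds the best alignment score of s[:i] against t[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + jaccard(s[i - 1], t[j - 1]),  # align pages
                score[i - 1][j] + gap,  # skip a page of s
                score[i][j - 1] + gap,  # skip a page of t
            )
    return score[n][m] / max(n, m)


# Hypothetical concept labels, for illustration only.
a = [frozenset({"course", "graduate"}), frozenset({"staff"})]
b = [frozenset({"course"}), frozenset({"staff"})]
print(session_similarity(a, b))
```

A pairwise similarity of this kind could then be fed to a clustering tool that accepts a similarity matrix or graph as input. Dividing by the longer session length is one possible normalisation among several.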

Notes

CLUTO (scluster, gcluto)—a cross-platform for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters, http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
One year of access web-logs of METU C.Eng. website (http://www.ceng.metu.edu.tr)

Author information

Authors and Affiliations

Computer Engineering Department, Middle East Technical University, 06800, Ankara, Turkey
Erman Sen, I. Hakki Toroslu & Pinar Karagoz

Corresponding author

Correspondence to Pinar Karagoz.

About this article

Cite this article

Sen, E., Toroslu, I.H. & Karagoz, P. Improving the prediction of page access by using semantically enhanced clustering. J Intell Inf Syst 47, 165–192 (2016). https://doi.org/10.1007/s10844-016-0398-3

Received: 22 July 2015
Revised: 24 January 2016
Accepted: 14 March 2016
Published: 20 April 2016
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10844-016-0398-3
