Improving the prediction of page access by using semantically enhanced clustering

Journal of Intelligent Information Systems

Abstract

Many parameters may affect the navigation behaviour of web users. Predicting the next page that a web user is likely to visit is important, since this information can be used for prefetching or for personalizing the page for that user. One successful method for determining the next web page is to construct behaviour models of users by clustering. The success of clustering is highly correlated with the similarity measure used to compare navigation sequences. This work proposes a new approach for determining the next web page by extending standard clustering with a content-based semantic similarity method. The semantics of web pages are represented as sets of concepts, and thus user sessions are modelled as sequences of sets. As a result, session similarity is defined as an alignment of two sequences of sets. The success of the proposed method is shown by applying it to real-life web log data.
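
The idea of session similarity as an alignment of two sequences of sets can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Needleman-Wunsch style global alignment, a Jaccard overlap between concept sets as the match score, and a fixed gap penalty. The function names, the `gap` value, and the example concepts are all hypothetical.

```python
from typing import FrozenSet, Sequence

ConceptSet = FrozenSet[str]


def jaccard(a: ConceptSet, b: ConceptSet) -> float:
    """Overlap of two concept sets in [0, 1] (assumed page-level similarity)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def align_score(s: Sequence[ConceptSet], t: Sequence[ConceptSet],
                gap: float = -0.5) -> float:
    """Global alignment score of two sessions (Needleman-Wunsch recurrence)."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # prefix of s aligned against gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # prefix of t aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Rescale Jaccard from [0, 1] to [-1, 1] so disjoint pages are penalized.
            match = dp[i - 1][j - 1] + 2.0 * jaccard(s[i - 1], t[j - 1]) - 1.0
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


def session_similarity(s: Sequence[ConceptSet], t: Sequence[ConceptSet],
                       gap: float = -0.5) -> float:
    """Alignment score normalized by the longer session; identical sessions give 1.0."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return align_score(s, t, gap) / longest


# Hypothetical sessions: each page visit is a set of concepts.
session_a = [frozenset({"course", "ai"}), frozenset({"exam"})]
session_b = [frozenset({"course", "ml"}), frozenset({"exam"})]
session_c = [frozenset({"staff"}), frozenset({"contact"})]

print(session_similarity(session_a, session_b))
print(session_similarity(session_a, session_c))
```

A pairwise similarity of this kind could then be supplied to a clustering tool in place of a plain sequence similarity. The rescaling of the match score and the normalization by the longer session are design choices of this sketch, not details taken from the article.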

Notes

  1. CLUTO (scluster, gcluto)—a cross-platform for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters, http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview

  2. One year of web access logs from the METU C.Eng. website (http://www.ceng.metu.edu.tr)

Author information

Corresponding author

Correspondence to Pinar Karagoz.

About this article

Cite this article

Sen, E., Toroslu, I.H. & Karagoz, P. Improving the prediction of page access by using semantically enhanced clustering. J Intell Inf Syst 47, 165–192 (2016). https://doi.org/10.1007/s10844-016-0398-3

  • DOI: https://doi.org/10.1007/s10844-016-0398-3

Keywords

Navigation