Abstract
Improving many applications, such as web search, latency reduction, and personalization/recommendation systems, depends on surfing prediction. Predicting user surfing paths involves trade-offs between model complexity and predictive accuracy. In this paper, we combine two classification techniques, the Markov model and Support Vector Machines (SVM), and fuse their predictions using Dempster’s rule. This fusion overcomes the Markov model's inability to predict unseen data and mitigates the multiclass classification problem of SVM, especially when the number of classes is large. We apply feature extraction to increase the discriminative power of SVM. In addition, during prediction we employ domain knowledge to reduce the number of classifiers, which improves accuracy and reduces prediction time. We demonstrate the effectiveness of our hybrid approach by comparing our results with widely used techniques, namely SVM, the Markov model, and association rule mining.
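To make the fusion step concrete, the following is a minimal sketch of Dempster's rule of combination applied to two sources of evidence about the next page a user will visit. It is not the authors' implementation. The page names, the mass values, and the way uncertainty is assigned to the whole frame are illustrative assumptions, since the abstract does not say how masses are derived from the Markov model or the SVM outputs.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the common hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    norm = 1.0 - conflict
    return {h: v / norm for h, v in combined.items()}

# Hypothetical candidate pages (the frame of discernment).
FRAME = frozenset({"A", "B", "C"})

# Made-up masses: each source commits some belief to single pages and
# leaves the remainder on the whole frame to express uncertainty.
markov_mass = {frozenset({"A"}): 0.6, frozenset({"B"}): 0.2, FRAME: 0.2}
svm_mass = {frozenset({"A"}): 0.3, frozenset({"C"}): 0.5, FRAME: 0.2}

fused = dempster_combine(markov_mass, svm_mass)

# Pick the single page with the highest fused mass.
singletons = {next(iter(h)): v for h, v in fused.items() if len(h) == 1}
predicted_page = max(singletons, key=singletons.get)
```

In this toy setup both sources give some support to page `A`, so it ends up with the largest fused mass even though the second source on its own prefers `C`. Leaving part of each mass on the whole frame keeps one source from vetoing pages it has no evidence about.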
Cite this article
Awad, M., Khan, L. & Thuraisingham, B. Predicting WWW surfing using multiple evidence combination. The VLDB Journal 17, 401–417 (2008). https://doi.org/10.1007/s00778-006-0014-1
DOI: https://doi.org/10.1007/s00778-006-0014-1