Abstract
User browsing information, particularly non-search-related activity, reveals important contextual information on the preferences and intents of Web users. In this article, we demonstrate the importance of mining general Web user-behavior data to improve ranking and other aspects of the Web-search experience, with an emphasis on analyzing individual user sessions for creating aggregate models. In this context, we introduce ClickRank, an efficient, scalable algorithm for estimating Webpage and Website importance from general Web user-behavior data. We lay out the theoretical foundation of ClickRank based on an intentional surfer model and discuss its properties. We quantitatively evaluate its effectiveness on the problem of Web-search ranking, showing that it contributes significantly to retrieval performance as a novel Web-search feature. We demonstrate that the results produced by ClickRank for Web-search ranking are highly competitive with those produced by other approaches, yet are achieved with better scalability and substantially lower computational cost. Finally, we discuss novel applications of ClickRank in providing an enriched Web-search experience for users, highlighting the usefulness of our approach for nonranking tasks.
Index Terms
- ClickRank: Learning Session-Context Models to Enrich Web Search Ranking