research-article

ClickRank: Learning Session-Context Models to Enrich Web Search Ranking

Published: 01 March 2012

Abstract

User browsing information, particularly non-search-related activity, reveals important contextual information on the preferences and intents of Web users. In this article, we demonstrate the importance of mining general Web user-behavior data to improve ranking and other aspects of the Web-search experience, with an emphasis on analyzing individual user sessions to create aggregate models. In this context, we introduce ClickRank, an efficient, scalable algorithm for estimating Webpage and Website importance from general Web user-behavior data. We lay out the theoretical foundation of ClickRank based on an intentional surfer model and discuss its properties. We quantitatively evaluate its effectiveness on the problem of Web-search ranking, showing that it contributes significantly to retrieval performance as a novel Web-search feature. We demonstrate that the results produced by ClickRank for Web-search ranking are highly competitive with those produced by other approaches, yet are achieved with better scalability and substantially lower computational cost. Finally, we discuss novel applications of ClickRank in providing an enriched Web-search experience for users, highlighting the usefulness of our approach for nonranking tasks.

      • Published in

        ACM Transactions on the Web, Volume 6, Issue 1
        March 2012
        109 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/2109205

        Copyright © 2012 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 March 2012
        • Accepted: 1 June 2011
        • Revised: 1 June 2010
        • Received: 1 September 2009

        Qualifiers

        • research-article
        • Research
        • Refereed
