ClickRank: Learning Session-Context Models to Enrich Web Search Ranking

Authors:
Guangyu Zhu, University of Maryland, College Park
Gilad Mishne, Yahoo! Labs

ACM Transactions on the Web, Volume 6, Issue 1, Article No. 1, pp. 1–22. https://doi.org/10.1145/2109205.2109206

Published: 01 March 2012

Abstract

User browsing information, particularly non-search-related activity, reveals important contextual information about the preferences and intents of Web users. In this article, we demonstrate the importance of mining general Web user-behavior data to improve ranking and other aspects of the Web-search experience, with an emphasis on analyzing individual user sessions to create aggregate models. In this context, we introduce ClickRank, an efficient, scalable algorithm for estimating Webpage and Website importance from general Web user-behavior data. We lay out the theoretical foundation of ClickRank, based on an intentional surfer model, and discuss its properties. We quantitatively evaluate its effectiveness for Web-search ranking, showing that, as a novel Web-search feature, it contributes significantly to retrieval performance. We demonstrate that the Web-search ranking results produced with ClickRank are highly competitive with those produced by other approaches, yet are achieved with better scalability and substantially lower computational cost. Finally, we discuss novel applications of ClickRank for providing an enriched Web-search experience, highlighting the usefulness of our approach for nonranking tasks.
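The abstract describes building aggregate page and site importance scores from individual user sessions. The following is a minimal sketch of that general idea only, not the ClickRank formulas from the article: it assumes each session is a list of (URL, dwell time in seconds) visits, gives each visit a hypothetical weight combining a position-decay term and a saturating dwell-time term, normalizes each session to unit mass, and sums the session-local scores over all sessions. Site scores are obtained by summing page scores per host. The functions `position_weight`, `dwell_weight`, `session_scores`, and `aggregate`, and the parameters `decay` and `half_saturation`, are illustrative assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def position_weight(index, decay=0.8):
    # Hypothetical: later visits within a session count geometrically less.
    return decay ** index


def dwell_weight(seconds, half_saturation=30.0):
    # Hypothetical: saturating reward for time spent on a page (0 for 0 s).
    return seconds / (seconds + half_saturation)


def session_scores(session):
    """Session-local scores for one session given as [(url, dwell_seconds), ...]."""
    raw = defaultdict(float)
    for i, (url, dwell) in enumerate(session):
        raw[url] += position_weight(i) * dwell_weight(dwell)
    total = sum(raw.values())
    if total <= 0:
        return {}
    # Normalize so every session contributes the same total mass (1.0).
    return {url: w / total for url, w in raw.items()}


def aggregate(sessions):
    """Sum session-local scores into page scores, then roll pages up to sites."""
    page_scores = defaultdict(float)
    for session in sessions:
        for url, w in session_scores(session).items():
            page_scores[url] += w
    site_scores = defaultdict(float)
    for url, w in page_scores.items():
        site_scores[urlparse(url).netloc] += w
    return dict(page_scores), dict(site_scores)


# Usage with two toy sessions (hypothetical URLs and dwell times).
sessions = [
    [("http://a.example/x", 60), ("http://b.example/y", 10)],
    [("http://a.example/x", 5), ("http://a.example/z", 120)],
]
pages, sites = aggregate(sessions)
```

Normalizing per session treats each session as one unit of evidence, so a single very long session cannot dominate the aggregate; whether the article makes the same choice is not stated in this abstract.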

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
  2. Information systems applications
    1. Data mining

Published in

ACM Transactions on the Web Volume 6, Issue 1
March 2012
109 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/2109205

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 2012
- Accepted: 1 June 2011
- Revised: 1 June 2010
- Received: 1 September 2009
Author Tags
ClickRank
Web search
aggregate user behavior
intentional surfer model
learning to rank
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total citations: 7
- Total downloads: 703
- Downloads (last 12 months): 4
- Downloads (last 6 weeks): 0