ABSTRACT
Understanding user intent is key to designing an effective ranking system in a search engine. In the absence of any explicit knowledge of user intent, search engines want to diversify results to improve user satisfaction. In such a setting, the approach based on the probability ranking principle, which presents the most relevant results on top, can be sub-optimal, and hence the search engine would like to trade off relevance for diversity in the results.
In analogy to prior work on ranking and clustering systems, we use the axiomatic approach to characterize and design diversification systems. We develop a set of natural axioms that a diversification system is expected to satisfy, and show that no diversification function can satisfy all the axioms simultaneously. We illustrate the use of the axiomatic framework by providing three example diversification objectives that satisfy different subsets of the axioms. We also uncover a rich link to the facility dispersion problem that results in algorithms for a number of diversification objectives. Finally, we propose an evaluation methodology to characterize the objectives and the underlying axioms. We conduct a large-scale evaluation of our objectives based on two data sets: a data set derived from the Wikipedia disambiguation pages and a product database.
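To make the relevance-versus-diversity trade-off concrete, the following is a minimal sketch of a greedy, dispersion-style selection. It is an illustration only and is not taken from the paper: the function name `greedy_diversify`, the `trade_off` weight, the toy relevance scores, and the topic-based distance are all assumptions chosen for the example.

```python
from typing import Callable, Dict, List, Sequence


def greedy_diversify(
    items: Sequence[str],
    relevance: Dict[str, float],
    distance: Callable[[str, str], float],
    k: int,
    trade_off: float = 1.0,
) -> List[str]:
    """Greedily pick up to k items, scoring each candidate by its relevance
    plus trade_off times its total distance to the items already chosen.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    selected: List[str] = []
    remaining = list(items)
    while remaining and len(selected) < k:
        def gain(candidate: str) -> float:
            spread = sum(distance(candidate, s) for s in selected)
            return relevance[candidate] + trade_off * spread

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): three results for the ambiguous query "jaguar".
topic = {"car_review": "car", "car_dealer": "car", "big_cat": "animal"}
relevance = {"car_review": 0.9, "car_dealer": 0.8, "big_cat": 0.6}


def topic_distance(a: str, b: str) -> float:
    # Distance is 0 for results on the same topic, 1 otherwise.
    return 0.0 if topic[a] == topic[b] else 1.0


by_relevance = greedy_diversify(list(topic), relevance, topic_distance, k=2, trade_off=0.0)
diversified = greedy_diversify(list(topic), relevance, topic_distance, k=2, trade_off=1.0)
print(by_relevance)  # ['car_review', 'car_dealer']
print(diversified)   # ['car_review', 'big_cat']
```

With `trade_off=0.0` the sketch reduces to ranking purely by relevance, so both selected results cover the same intent. With a positive weight, the second slot goes to a less relevant result that covers a different intent, which is the trade-off the abstract describes.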
Index Terms
- An axiomatic approach for result diversification