Abstract
Top-k query processing finds the k results with the largest scores with respect to a user-given query, under the assumption that the k results are independent of each other. In practice, some of the returned top-k results can be very similar to each other and are therefore redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some solve the diversity problem only for a specific application and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that considers only the similarity of the search results themselves. We propose a framework in which most existing solutions for top-k query processing can be extended easily to handle diversified top-k search by applying three new functions: a sufficient stop condition, sufficient(); a necessary stop condition, necessary(); and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm. div-dp decomposes the results into components, searches each component independently using div-astar, and combines the partial results using dynamic programming. div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution to the diversified top-k search problem in seconds, even for k as large as 2,000.
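To make the problem statement concrete, the following is a minimal brute-force sketch, not one of the paper's algorithms (div-astar, div-dp, or div-cut). It assumes one plausible reading of the problem that the abstract does not spell out: choose at most k results, no two of which are similar, so that the total score is maximized. The function name `diversified_top_k`, the `similar_pairs` input, and the toy scores are all hypothetical.

```python
from itertools import combinations


def diversified_top_k(scores, similar_pairs, k):
    """Exhaustively pick at most k mutually dissimilar results with max total score.

    scores: dict mapping result id -> non-negative score (illustrative input)
    similar_pairs: iterable of (a, b) pairs considered similar (assumed given)
    """
    similar = {frozenset(pair) for pair in similar_pairs}
    ids = sorted(scores)
    best_set, best_score = (), 0
    for size in range(1, min(k, len(ids)) + 1):
        for cand in combinations(ids, size):
            # Skip any candidate set that contains two similar results.
            if any(frozenset(p) in similar for p in combinations(cand, 2)):
                continue
            total = sum(scores[r] for r in cand)
            if total > best_score:
                best_set, best_score = cand, total
    return set(best_set), best_score


# Toy instance: "a" has the highest score but is similar to both "b" and "c".
scores = {"a": 10, "b": 9, "c": 8, "d": 1}
similar_pairs = [("a", "b"), ("a", "c")]
chosen, total = diversified_top_k(scores, similar_pairs, k=2)
print(chosen, total)
```

In this toy instance the plain top-2 by score would be "a" and "b", which are similar to each other; under the assumed objective the best diversified pair is "b" and "c". Exhaustive enumeration is exponential in the number of results, which is why a practical method needs the kind of decomposition and pruning the abstract describes.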