research-article

Diversifying top-k results

Authors:
Lu Qin, Jeffrey Xu Yu, Lijun Chang

The Chinese University of Hong Kong, Hong Kong, China

Proceedings of the VLDB Endowment, Volume 5, Issue 11, pp. 1124–1135. https://doi.org/10.14778/2350229.2350233

Published: 01 July 2012

Abstract

Top-k query processing finds a list of k results that have the largest scores with respect to a user-given query, under the assumption that the k results are independent of each other. In practice, some of the top-k results returned can be very similar to each other, and are therefore redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some address diversity only for a specific problem and can hardly be extended to general cases. In this paper, we define a general diversified top-k search problem that considers only the similarity of the search results themselves. We propose a framework in which most existing solutions for top-k query processing can be extended easily to handle diversified top-k search by applying three new functions: a sufficient stop condition sufficient(), a necessary stop condition necessary(), and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm. div-dp decomposes the results into components, which are searched independently using div-astar and then combined using dynamic programming. div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution for the diversified top-k search problem in seconds, even for k as large as 2,000.
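To make the contrast between plain and diversified top-k concrete, the following is a minimal sketch under stated assumptions. It assumes the problem is framed as choosing at most k results, no two of which are marked as similar, so that the total score is maximal; the abstract does not spell out this formulation, so treat it as an illustration. The function name `diversified_top_k`, the `similar_pairs` input, and the toy data are hypothetical, and the exhaustive enumeration is a baseline only, not div-astar, div-dp, or div-cut.

```python
from itertools import combinations


def diversified_top_k(scores, similar_pairs, k):
    """Exhaustively pick at most k pairwise non-similar results with maximum total score.

    scores: dict mapping a result id to a non-negative score (assumed input format).
    similar_pairs: iterable of (u, v) pairs that are too similar to be returned together.
    Exponential in the number of results; suitable only for tiny inputs.
    """
    conflicts = {frozenset(pair) for pair in similar_pairs}
    best_set, best_score = (), 0.0
    ids = sorted(scores)
    for size in range(1, min(k, len(ids)) + 1):
        for subset in combinations(ids, size):
            # Skip any candidate set that contains a similar pair.
            if any(frozenset(pair) in conflicts for pair in combinations(subset, 2)):
                continue
            total = sum(scores[r] for r in subset)
            if total > best_score:
                best_set, best_score = subset, total
    return set(best_set), best_score


# Toy data: "a" and "b" are near-duplicates, as are "b" and "c".
scores = {"a": 10, "b": 9, "c": 8, "d": 3}
similar = [("a", "b"), ("b", "c")]

# Plain top-2 by score ignores similarity and returns two redundant results.
plain_top2 = sorted(scores, key=scores.get, reverse=True)[:2]

# Diversified top-2 under the assumed formulation.
chosen, total = diversified_top_k(scores, similar, k=2)
print(plain_top2, chosen, total)
```

On this toy input the plain top-2 contains the similar pair "a" and "b", while the diversified selection trades "b" for the lower-scored but dissimilar "c". The exponential cost of the enumeration is the reason a practical solver needs pruning and decomposition of the kind the abstract attributes to the three proposed algorithms.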


Published in

Proceedings of the VLDB Endowment Volume 5, Issue 11
July 2012
608 pages
ISSN: 2150-8097
Publisher
VLDB Endowment