research-article

Diversifying top-k results

Published: 01 July 2012

Abstract

Top-k query processing finds a list of k results that have the largest scores with respect to a user-given query, under the assumption that the k results are independent of each other. In practice, some of the returned top-k results can be very similar to one another and are therefore redundant. Diversified top-k search has been studied in the literature to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some address diversity only for a specific problem and can hardly be extended to general cases. In this paper, we define a general diversified top-k search problem that considers only the similarity of the search results themselves. We propose a framework in which most existing solutions for top-k query processing can be extended easily to handle diversified top-k search by applying three new functions: a sufficient stop condition sufficient(), a necessary stop condition necessary(), and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm. div-dp decomposes the results into components, searches each component independently using div-astar, and combines the results using dynamic programming. div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution to the diversified top-k search problem in seconds, even for k as large as 2,000.
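To make the problem concrete, the following is a minimal Python sketch of diversified top-k selection over a fixed set of already-generated results. It rests on assumptions that the abstract does not spell out: the goal is taken to be choosing at most k results, no two of which are similar, so that the total score is maximized, and scores are assumed to be non-negative. The function name diversified_top_k, the dictionary of scores, and the set of similar pairs are hypothetical names chosen for illustration. The search is a plain exhaustive search with a simple score bound; it is not div-astar, div-dp, or div-cut.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def diversified_top_k(
    scores: Dict[str, float],
    similar: Set[FrozenSet[str]],
    k: int,
) -> Tuple[float, List[str]]:
    """Return (best_total_score, chosen_results) by exhaustive search.

    Assumption: at most k results are chosen, no two chosen results are
    similar, and all scores are non-negative.
    """
    # Visit results in descending score order so the bound below is useful.
    items = sorted(scores, key=lambda r: (-scores[r], r))
    best: Tuple[float, List[str]] = (0.0, [])

    def search(i: int, chosen: List[str], total: float) -> None:
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        slots = k - len(chosen)
        if i == len(items) or slots == 0:
            return
        # Optimistic bound: fill the remaining slots with the next-highest
        # scores while ignoring similarity. Prune if it cannot beat the best.
        bound = total + sum(scores[r] for r in items[i:i + slots])
        if bound <= best[0]:
            return
        cand = items[i]
        # Branch 1: take the candidate if it is dissimilar to everything chosen.
        if all(frozenset((cand, c)) not in similar for c in chosen):
            chosen.append(cand)
            search(i + 1, chosen, total + scores[cand])
            chosen.pop()
        # Branch 2: skip the candidate.
        search(i + 1, chosen, total)

    search(0, [], 0.0)
    return best


# Hypothetical data: "a" is similar to both "b" and "c".
example_scores = {"a": 10, "b": 9, "c": 8, "d": 6, "e": 1}
example_similar = {frozenset(("a", "b")), frozenset(("a", "c"))}
total, chosen = diversified_top_k(example_scores, example_similar, k=2)
print(total, chosen)  # 17.0 ['b', 'c']
```

In this example the plain top-2 by score would be "a" and "b", which are similar to each other. The best pair containing "a" is "a" with "d" (total 16), while "b" with "c" reaches 17, so under the assumed objective the highest-scoring single result is left out.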

  • Published in

    Proceedings of the VLDB Endowment  Volume 5, Issue 11
    July 2012
    608 pages

    Publisher

    VLDB Endowment
