
Diversified top-k clique search

  • Regular Paper
  • The VLDB Journal

Abstract

Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, enumerating maximal cliques is time-consuming on large graphs and tends to return an enormous number of heavily overlapping cliques. Motivated by this, in this paper we study the diversified top-k clique search problem, which is to find k cliques that together cover the largest number of nodes in the graph. Diversified top-k clique search has many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution is to keep all maximal cliques in memory and then select k of them that cover the most nodes using the greedy approximation algorithm for max k-cover. However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm that maintains k candidates during maximal clique enumeration. Our algorithm has a limited memory footprint and achieves a guaranteed approximation ratio. We also introduce a novel lightweight \(\mathsf{PNP}\)-\(\mathsf{Index}\), based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies that avoid enumerating all maximal cliques and thus greatly reduce the computational cost. In addition, for massive input graphs that cannot fit in main memory, we develop an I/O-efficient algorithm. We conduct extensive performance studies on real and synthetic graphs; one of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
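The contrast between the naive baseline and the bounded-memory approach can be illustrated with a short sketch. This is not the authors' algorithm. It assumes an in-memory graph stored as adjacency sets, uses textbook Bron–Kerbosch with pivoting for enumeration, and uses the standard greedy algorithm for max k-cover (known to give a 1 − 1/e approximation) as the baseline. For the k-candidate variant it uses a hypothetical swap rule chosen for illustration: replace the candidate with the fewest privately covered nodes if total coverage grows by at least a factor of 1 + 1/k. The paper's exact swap condition, its approximation proof, the PNP-Index, the pruning optimizations and the I/O-efficient variant are not modeled. All function names (`maximal_cliques`, `coverage`, `greedy_k_cover`, `streaming_top_k`) are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set

Graph = Dict[int, Set[int]]   # adjacency sets of an undirected graph
Clique = FrozenSet[int]


def maximal_cliques(adj: Graph) -> Iterator[Clique]:
    """Bron-Kerbosch with pivoting; yields each maximal clique exactly once."""
    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[Clique]:
        if not p and not x:
            yield frozenset(r)  # R cannot be extended, so it is maximal
            return
        # Pivot u with the most neighbours in P; branch only on non-neighbours of u.
        u = max(p | x, key=lambda v: len(p & adj[v]))
        for v in list(p - adj[u]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)  # cliques containing v are done; exclude v from now on
            x.add(v)
    yield from expand(set(), set(adj), set())


def coverage(cliques: Iterable[Clique]) -> Set[int]:
    """Union of the nodes covered by a collection of cliques."""
    covered: Set[int] = set()
    for c in cliques:
        covered |= c
    return covered


def greedy_k_cover(cliques: List[Clique], k: int) -> List[Clique]:
    """Naive baseline: all maximal cliques held in memory, then greedy max k-cover."""
    chosen: List[Clique] = []
    covered: Set[int] = set()
    pool = list(cliques)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda c: len(c - covered))  # largest marginal gain
        if not best - covered:
            break  # no remaining clique covers a new node
        chosen.append(best)
        covered |= best
        pool.remove(best)
    return chosen


def streaming_top_k(cliques: Iterable[Clique], k: int) -> List[Clique]:
    """Keeps at most k candidates while cliques stream in (hypothetical swap rule)."""
    if k <= 0:
        return []
    cand: List[Clique] = []
    for c in cliques:
        if len(cand) < k:
            cand.append(c)
            continue
        cur = len(coverage(cand))
        count = Counter(v for d in cand for v in d)
        # Victim: the candidate covering the fewest nodes that no other candidate covers.
        victim = min(range(k), key=lambda i: sum(count[v] == 1 for v in cand[i]))
        new = len(coverage(cand[:victim] + cand[victim + 1:]) | c)
        # Assumed threshold: swap only if new >= (1 + 1/k) * cur (integer form).
        if k * new >= (k + 1) * cur:
            cand[victim] = c
    return cand
```

For example, on a graph made of the 4-clique {0, 1, 2, 3}, the triangle {3, 4, 5} and the edge {5, 6}, `maximal_cliques` yields exactly those three cliques, and with k = 2 both `greedy_k_cover` and `streaming_top_k` return two cliques covering six of the seven nodes. The baseline must materialize every maximal clique before selecting, whereas the streaming variant holds only k candidates at any time. That is the memory saving the abstract describes, at the cost of results that can depend on enumeration order.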


Acknowledgments

Lu Qin was supported by ARC DE140100999. Xuemin Lin was supported by NSFC61232006, NSFC61021004, ARC DP120104168, and ARC DP140103578. Lijun Chang was supported by ARC DE150100563. Wenjie Zhang was supported by ARC DE120102144 and DP120104168.

Author information

Correspondence to Lu Qin.

About this article

Cite this article

Yuan, L., Qin, L., Lin, X. et al. Diversified top-k clique search. The VLDB Journal 25, 171–196 (2016). https://doi.org/10.1007/s00778-015-0408-z

  • DOI: https://doi.org/10.1007/s00778-015-0408-z
