Research article · DOI: 10.1145/3589334.3645374

A Similarity-based Approach for Efficient Large Quasi-clique Detection

Published: 13 May 2024

ABSTRACT

Identifying dense subgraphs called quasi-cliques is pivotal in various graph mining tasks across domains like biology, social networks, and e-commerce. However, recent algorithms still suffer from efficiency issues when mining large quasi-cliques in massive and complex graphs. Our key insight is that vertices within a quasi-clique exhibit similar neighborhoods to some extent. Based on this, we introduce NBSim and FastNBSim, efficient algorithms that find near-maximum quasi-cliques by exploiting vertex neighborhood similarity. FastNBSim further uses MinHash approximations to reduce the time complexity for similarity computation. Empirical evaluation on 10 real-world graphs shows that our algorithms deliver up to three orders of magnitude speedup versus the state-of-the-art algorithms, while ensuring high-quality quasi-clique extraction.
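The insight about similar neighborhoods can be illustrated with a small sketch. This is not the authors' NBSim or FastNBSim, whose details are not given on this page; it only shows the two ingredients the abstract names: exact Jaccard similarity between vertex neighborhoods, and a MinHash estimate of it. The use of closed neighborhoods, the helper names, the toy graph, and the BLAKE2b-based hash family are all assumptions made for illustration.

```python
import hashlib
from itertools import combinations

def edge_density(adj, nodes):
    """Fraction of possible edges present among `nodes` (1.0 means a clique)."""
    nodes = list(nodes)
    k = len(nodes)
    if k < 2:
        return 1.0
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return present / (k * (k - 1) / 2)

def closed_neighborhood(adj, v):
    """Neighbors of v plus v itself (an assumed choice, not taken from the paper)."""
    return adj[v] | {v}

def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def _hash(vertex, seed):
    """Seeded 64-bit hash of a non-negative integer vertex id (hypothetical helper)."""
    digest = hashlib.blake2b(
        vertex.to_bytes(8, "little"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(items, num_hashes=256):
    """One minimum per hash function; the signature length is fixed by num_hashes."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, which estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Toy graph: vertices 0..4 form a clique minus the edge (0, 1); 5-6-7 is a sparse tail.
edges = [(u, v) for u, v in combinations(range(5), 2) if (u, v) != (0, 1)]
edges += [(4, 5), (5, 6), (6, 7)]
adj = {v: set() for v in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

inside = jaccard(closed_neighborhood(adj, 2), closed_neighborhood(adj, 3))
outside = jaccard(closed_neighborhood(adj, 2), closed_neighborhood(adj, 6))
approx = estimate_jaccard(
    minhash_signature(closed_neighborhood(adj, 0)),
    minhash_signature(closed_neighborhood(adj, 1)),
)
```

In this toy graph, two vertices inside the dense group share their whole closed neighborhood, while a vertex in the tail shares nothing with them. Comparing two signatures costs time proportional to the signature length rather than to the neighborhood sizes, which is the kind of saving the abstract attributes to MinHash.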

Supplemental Material

rfp0357.mp4: supplemental video (MP4, 3.6 MB)


Published in

WWW '24: Proceedings of the ACM on Web Conference 2024
May 2024
4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334

Copyright © 2024 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,899 of 8,196 submissions (23%)