ABSTRACT
Identifying dense subgraphs called quasi-cliques is pivotal in graph mining tasks across domains such as biology, social networks, and e-commerce. However, recent algorithms still suffer from efficiency issues when mining large quasi-cliques in massive and complex graphs. Our key insight is that vertices within a quasi-clique tend to have similar neighborhoods. Based on this, we introduce NBSim and FastNBSim, efficient algorithms that find near-maximum quasi-cliques by exploiting vertex neighborhood similarity. FastNBSim further uses MinHash approximations to reduce the time complexity of similarity computation. Empirical evaluation on 10 real-world graphs shows that our algorithms deliver up to three orders of magnitude speedup over state-of-the-art algorithms, while ensuring high-quality quasi-clique extraction.
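The two ingredients named in the abstract, neighborhood similarity and its MinHash approximation, can be illustrated with a minimal sketch. This is not the NBSim or FastNBSim algorithm, which the abstract does not spell out. It assumes Jaccard similarity over closed neighborhoods as the similarity measure and the edge-density definition of a quasi-clique (the fraction of vertex pairs joined by an edge); both choices, along with every function name and the toy graph, are assumptions made for illustration.

```python
import random
from itertools import combinations

P = 2**61 - 1  # large prime modulus for the illustrative hash family


def closed_neighborhood(adj, v):
    """Neighbors of v plus v itself (assumed choice of neighborhood)."""
    return adj[v] | {v}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hashes(k, seed=0):
    """k hash functions x -> (a*x + b) mod P, standing in for random permutations."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]


def minhash_signature(items, hashes):
    """Signature of a non-empty set of integer ids: the minimum hash value per function."""
    return [min((a * x + b) % P for x in items) for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def edge_density(adj, vertices):
    """Fraction of vertex pairs in `vertices` that are joined by an edge."""
    vs = list(vertices)
    if len(vs) < 2:
        return 1.0
    edges = sum(1 for u, v in combinations(vs, 2) if v in adj[u])
    return edges / (len(vs) * (len(vs) - 1) / 2)


# Hypothetical toy graph: a 5-clique on {0..4} missing edge (0, 1),
# plus a pendant vertex 5 attached to vertex 4.
adj = {
    0: {2, 3, 4},
    1: {2, 3, 4},
    2: {0, 1, 3, 4},
    3: {0, 1, 2, 4},
    4: {0, 1, 2, 3, 5},
    5: {4},
}

hashes = make_hashes(k=128)
signatures = {v: minhash_signature(closed_neighborhood(adj, v), hashes) for v in adj}

inside = jaccard(closed_neighborhood(adj, 0), closed_neighborhood(adj, 2))
outside = jaccard(closed_neighborhood(adj, 5), closed_neighborhood(adj, 2))
approx_inside = estimate_jaccard(signatures[0], signatures[2])
density = edge_density(adj, {0, 1, 2, 3, 4})
```

In this toy graph, vertices 0 and 2 both lie in the dense set and share most of their closed neighborhoods, while the pendant vertex 5 shares almost nothing with vertex 2. Comparing two fixed-length signatures takes time proportional to the signature length rather than to the neighborhood sizes, which is the kind of saving the MinHash approximation is meant to provide.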
Index Terms
- A Similarity-based Approach for Efficient Large Quasi-clique Detection