research-article

A Similarity-based Approach for Efficient Large Quasi-clique Detection

Authors:
Jiayang Pang, The Chinese University of Hong Kong, Shenzhen, Shenzhen, Guangdong, China (ORCID 0009-0006-8976-0111)
Chenhao Ma, The Chinese University of Hong Kong, Shenzhen, Shenzhen, Guangdong, China (ORCID 0000-0002-3243-8512)
Yixiang Fang, The Chinese University of Hong Kong, Shenzhen, Shenzhen, Guangdong, China (ORCID 0000-0002-5047-8593)

WWW '24: Proceedings of the ACM on Web Conference 2024, May 2024, Pages 401–409, https://doi.org/10.1145/3589334.3645374

Published: 13 May 2024

ABSTRACT

Identifying dense subgraphs called quasi-cliques is pivotal in various graph mining tasks across domains like biology, social networks, and e-commerce. However, recent algorithms still suffer from efficiency issues when mining large quasi-cliques in massive and complex graphs. Our key insight is that vertices within a quasi-clique exhibit similar neighborhoods to some extent. Based on this, we introduce NBSim and FastNBSim, efficient algorithms that find near-maximum quasi-cliques by exploiting vertex neighborhood similarity. FastNBSim further uses MinHash approximations to reduce the time complexity for similarity computation. Empirical evaluation on 10 real-world graphs shows that our algorithms deliver up to three orders of magnitude speedup versus the state-of-the-art algorithms, while ensuring high-quality quasi-clique extraction.
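The neighborhood-similarity idea can be illustrated with a minimal Python sketch. This is not the authors' NBSim or FastNBSim; it only shows exact Jaccard similarity between vertex neighborhoods and a generic MinHash estimate of the same quantity. The use of closed neighborhoods (each vertex included in its own neighborhood), the linear hash family, the integer vertex ids, and all function names are assumptions made for illustration.

```python
import random

# Assumption: vertices are non-negative integers smaller than _PRIME.
_PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus


def closed_neighborhoods(edges):
    """Map each vertex v to its closed neighborhood N[v] = N(v) | {v}."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hashes(k, seed=0):
    """Draw k hash functions x -> (a*x + b) mod _PRIME with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(_PRIME)) for _ in range(k)]


def signature(items, hashes):
    """MinHash signature: the minimum hash value of the set per function."""
    return [min((a * x + b) % _PRIME for x in items) for a, b in hashes]


def minhash_estimate(sig_a, sig_b):
    """Fraction of matching signature positions; estimates Jaccard."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    # Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} with a tail 3-4-5.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    nbrs = closed_neighborhoods(edges)
    hashes = make_hashes(k=64)
    sigs = {v: signature(n, hashes) for v, n in nbrs.items()}
    for u, v in [(0, 1), (0, 3), (0, 5)]:
        print(u, v, jaccard(nbrs[u], nbrs[v]), minhash_estimate(sigs[u], sigs[v]))
```

In the toy graph, vertices inside the clique have identical or nearly identical closed neighborhoods, while a vertex on the tail shares nothing with them. Once signatures are built, each pairwise comparison costs time proportional to the signature length rather than to the neighborhood sizes, which is the general reason MinHash is attractive for similarity computation on high-degree vertices.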

Supplemental Material

rfp0357.mp4: supplemental video (MP4, 3.6 MB)

Index Terms

1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms
2. Theory of computation
  1. Design and analysis of algorithms
    1. Graph algorithms analysis

Published in
WWW '24: Proceedings of the ACM on Web Conference 2024
May 2024
4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334
General Chairs: Tat-Seng Chua (National University of Singapore), Chong-Wah Ngo (Singapore Management University)
Proceedings Chair: Roy Ka-Wei Lee (Singapore University of Technology and Design)
Program Chairs: Ravi Kumar (Google), Hady W. Lauw (Singapore Management University)
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
minhash
neighborhoods
quasi-cliques
similarity
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%