Abstract
Given a large data graph, trimming techniques can reduce the search space by repeatedly removing vertices without outgoing edges. One application is speeding up the parallel decomposition of graphs into strongly connected components (SCC decomposition), a fundamental step in graph analysis. We observe that graph trimming is essentially a kind of arc-consistency problem, and that AC-3, AC-4, and AC-6 are the arc-consistency algorithms most relevant to graph trimming. The existing parallel graph trimming methods require worst-case \(\mathcal O(nm)\) time and worst-case \(\mathcal O(n)\) space for graphs with n vertices and m edges. We call these methods parallel AC-3-based trimming because they closely resemble the AC-3 algorithm. In this work, we propose AC-4-based and AC-6-based trimming methods. AC-4-based trimming improves the worst-case time to \(\mathcal O(n+m)\) but requires worst-case space of \(\mathcal O(n+m)\); AC-6-based trimming has the same worst-case time of \(\mathcal O(n+m)\) as AC-4-based trimming but improves the worst-case space to \(\mathcal O(n)\). We parallelize the AC-4-based and AC-6-based algorithms for shared-memory multi-core machines and design them to minimize synchronization overhead. We also prove the correctness of these algorithms and analyze their time complexities in the work-depth model. In experiments, we compare the three parallel trimming algorithms on a variety of real and synthetic graphs on a multi-core machine, where each core corresponds to a worker. Measured by the maximum number of edges traversed per worker with 16 workers, AC-3-based trimming traverses up to 58.3 and 36.5 times more edges than AC-6-based and AC-4-based trimming, respectively. AC-6-based trimming thus traverses far fewer edges than the other methods, which is especially meaningful for implicit graphs.
In terms of practical running time, AC-6-based trimming achieves high speedups on graphs with a large proportion of trimmable vertices.
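The difference between the three trimming strategies can be illustrated with a minimal, sequential sketch. It is not the authors' parallel implementation (see the repository listed in the Notes). It assumes a graph given as hypothetical out-adjacency lists `succ`, and the function names `trim_ac3`, `trim_ac4`, and `trim_ac6` are illustrative. A vertex is trimmed once none of its successors is still present. `trim_ac3` repeatedly sweeps all vertices. `trim_ac4` keeps a counter of live successors and builds predecessor lists, which is where the O(n+m) extra space comes from. `trim_ac6` follows the classic AC-6 idea (an assumption about how it maps to trimming): each vertex keeps a single live "support" successor, and each vertex records which vertices it currently supports, so at most O(n) entries are stored.

```python
from collections import deque


def trim_ac3(succ):
    """AC-3-style: sweep all live vertices until a full pass removes nothing.
    Each pass costs O(n + m), and up to n passes may be needed."""
    n = len(succ)
    alive = [True] * n
    changed = True
    while changed:
        changed = False
        for v in range(n):
            # Remove v if none of its successors is still alive.
            if alive[v] and not any(alive[w] for w in succ[v]):
                alive[v] = False
                changed = True
    return alive


def trim_ac4(succ):
    """AC-4-style: per-vertex counters of live successors.
    O(n + m) time; the predecessor lists need O(m) extra space."""
    n = len(succ)
    pred = [[] for _ in range(n)]
    for v in range(n):
        for w in succ[v]:
            pred[w].append(v)
    count = [len(s) for s in succ]  # live successors, counted with multiplicity
    alive = [c > 0 for c in count]
    queue = deque(v for v in range(n) if count[v] == 0)
    while queue:
        w = queue.popleft()  # w has just been trimmed
        for v in pred[w]:
            if alive[v]:
                count[v] -= 1
                if count[v] == 0:
                    alive[v] = False
                    queue.append(v)
    return alive


def trim_ac6(succ):
    """AC-6-style: each live vertex keeps one live successor as its support.
    O(n + m) time; ptr and supported_by need O(n) extra space."""
    n = len(succ)
    ptr = [0] * n                          # index of v's current support in succ[v]
    supported_by = [[] for _ in range(n)]  # supported_by[w]: vertices whose support is w
    alive = [True] * n
    queue = deque()
    for v in range(n):
        if succ[v]:
            supported_by[succ[v][0]].append(v)
        else:
            alive[v] = False
            queue.append(v)
    while queue:
        w = queue.popleft()  # w has just been trimmed
        for v in supported_by[w]:
            if not alive[v]:
                continue
            # Skipped successors were dead and stay dead, so the pointer only moves forward.
            p = ptr[v] + 1
            while p < len(succ[v]) and not alive[succ[v][p]]:
                p += 1
            ptr[v] = p
            if p < len(succ[v]):
                supported_by[succ[v][p]].append(v)
            else:
                alive[v] = False
                queue.append(v)
        supported_by[w] = []  # w is dead and supports no vertex
    return alive


succ = [[1], [2], [0], [0], [5], [], [4, 3]]
print([v for v, a in enumerate(trim_ac6(succ)) if not a])  # [4, 5]
```

In the example, all three functions trim vertices 4 and 5: vertex 5 has no successors, vertex 4 then loses its only successor, and vertex 6 survives through its edge to vertex 3, which lies on the cycle 0 → 1 → 2 → 0. Because each pointer in `trim_ac6` only advances, the total scanning work is bounded by m, which is why this style of trimming can traverse far fewer edges than repeated sweeps.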
Notes
All our implementations, benchmarks, and results are available at https://github.com/Itisben/graph-trimming.git.
Acknowledgements
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).
About this article
Cite this article
Guo, B., Sekerinski, E. Efficient parallel graph trimming by arc-consistency. J Supercomput 78, 15269–15313 (2022). https://doi.org/10.1007/s11227-022-04457-9
DOI: https://doi.org/10.1007/s11227-022-04457-9