GraphCube: Interconnection Hierarchy-aware Graph Processing

Authors:
Xinbiao Gan, National University of Defense Technology, Changsha, China, https://orcid.org/0000-0003-3622-1772
Guang Wu, National University of Defense Technology, Changsha, China, https://orcid.org/0000-0002-8911-966X
Shenghao Qiu, University of Leeds, Leeds, United Kingdom, https://orcid.org/0000-0001-6345-0306
Feng Xiong, National University of Defense Technology, Changsha, China, https://orcid.org/0009-0008-7057-7231
Jiaqi Si, National University of Defense Technology, Changsha, China, https://orcid.org/0000-0002-9959-611X
Jianbin Fang, National University of Defense Technology, Changsha, China, https://orcid.org/0000-0003-3542-4869
Dezun Dong, National University of Defense Technology, Changsha, China, https://orcid.org/0000-0001-6243-8479
Chunye Gong, National University of Defense Technology, Changsha, China, https://orcid.org/0000-0003-0349-1100
Tiejun Li, National University of Defense Technology, Changsha, China, https://orcid.org/0000-0003-3381-3027
Zheng Wang, University of Leeds, Leeds, United Kingdom, https://orcid.org/0000-0001-6157-0662

PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, March 2024, Pages 160–174. https://doi.org/10.1145/3627535.3638498

Published: 20 February 2024

ABSTRACT

Processing large-scale graphs with billions to trillions of edges requires efficient use of parallel systems. However, current graph processing engines do not scale well beyond a few tens of computing nodes because they are oblivious to variations in communication cost across the interconnection hierarchy. We introduce GraphCube, an approach to optimizing graph processing on large-scale parallel systems with complex interconnections. GraphCube features a new graph partitioning approach that achieves better load balancing and minimizes communication overhead across multiple levels of the interconnection hierarchy. We evaluate GraphCube by applying it to fundamental graph operations on synthetic and real-world graph datasets. Our evaluation used up to 79,024 computing nodes and more than 1.2 million processor cores. These large-scale experiments show that GraphCube outperforms state-of-the-art parallel graph processing methods in throughput and scalability. Furthermore, GraphCube outperformed the top-ranked systems on the Graph 500 list.
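
To make the idea of communication cost varying across an interconnection hierarchy concrete, the following is a minimal sketch of a toy cost model. It is not GraphCube's partitioning algorithm, which this page does not describe. The two-level topology, the per-level costs, the eight-vertex ring graph, and all names (`link_cost`, `communication_cost`, `load_imbalance`, `COORDS`, `LEVEL_COSTS`) are assumptions chosen purely for illustration.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]   # position of a compute node, outermost level first
Edge = Tuple[int, int]    # (source vertex, destination vertex)


def link_cost(a: Coord, b: Coord, level_costs: Sequence[float]) -> float:
    """Cost of one message between two compute nodes.

    The cost is set by the outermost hierarchy level at which the two
    coordinates differ; a message that stays on one node costs nothing.
    """
    for level, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return level_costs[level]
    return 0.0


def communication_cost(edges: Sequence[Edge], owner: Sequence[int],
                       coords: Sequence[Coord],
                       level_costs: Sequence[float]) -> float:
    """Sum the hierarchy-weighted cost over all edges of the graph."""
    return sum(link_cost(coords[owner[u]], coords[owner[v]], level_costs)
               for u, v in edges)


def load_imbalance(edges: Sequence[Edge], owner: Sequence[int],
                   num_nodes: int) -> float:
    """Max edges per node divided by mean edges per node (1.0 is balanced).

    Each edge is charged to the node that owns its source vertex.
    """
    load = [0] * num_nodes
    for u, _ in edges:
        load[owner[u]] += 1
    mean = sum(load) / num_nodes
    return max(load) / mean if mean else 1.0


# Hypothetical machine: 2 groups x 2 nodes; crossing groups is 10x dearer.
COORDS: List[Coord] = [(0, 0), (0, 1), (1, 0), (1, 1)]
LEVEL_COSTS = [10.0, 1.0]  # [inter-group, intra-group], assumed values

# Toy graph: a directed ring over 8 vertices.
RING: List[Edge] = [(v, (v + 1) % 8) for v in range(8)]

# Two placements with identical load balance but different locality.
contiguous = [v // 2 for v in range(8)]   # neighbours tend to share a node
round_robin = [v % 4 for v in range(8)]   # neighbours always on different nodes

if __name__ == "__main__":
    for name, owner in (("contiguous", contiguous), ("round_robin", round_robin)):
        print(name,
              communication_cost(RING, owner, COORDS, LEVEL_COSTS),
              load_imbalance(RING, owner, len(COORDS)))
```

Both placements give every node the same number of edges, yet they differ in how many edges cross the expensive outer level. A partitioner that only counts cut edges or only balances load cannot tell such placements apart by their true cost, which is the kind of blindness to the hierarchy that the abstract attributes to existing engines.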

Supplemental Material

p160-gan-supp.pdf (1.5 MB): Supplemental material.

Index Terms

GraphCube: Interconnection Hierarchy-aware Graph Processing
1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed algorithms
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms

Published in
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
March 2024
498 pages
ISBN: 9798400704352
DOI: 10.1145/3627535
Chair: Michel Steuwer
Program Chairs: I-Ting Angelina Lee; Milind Chabbi (Uber Technologies Inc.)
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph processing
graph partitioning
parallel computing
vectorization
Graph500
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 230 of 1,014 submissions, 23%
