skip to main content
10.1145/3627535.3638498acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

GraphCube: Interconnection Hierarchy-aware Graph Processing

Authors Info & Claims
Published:20 February 2024Publication History

ABSTRACT

Processing large-scale graphs with billions to trillions of edges requires efficiently utilizing parallel systems. However, current graph processing engines do not scale well beyond a few tens of computing nodes because they are oblivious to the communication cost variations across the interconnection hierarchy. We introduce GraphCube, a better approach to optimizing graph processing on large-scale parallel systems with complex interconnections. GraphCube features a new graph partitioning approach to achieve better load balancing and minimize communication overhead across multiple levels of the interconnection hierarchy. We evaluate GraphCube by applying it to fundamental graph operations performed on synthetic and real-world graph datasets. Our evaluation used up to 79,024 computing nodes and 1.2+ million processor cores. Our large-scale experiments show that GraphCube outperforms state-of-the-art parallel graph processing methods in throughput and scalability. Furthermore, GraphCube outperformed the top-ranked systems on the Graph 500 list.

Skip Supplemental Material Section

Supplemental Material

References

  1. Zainab Abbas, Vasiliki Kalavri, Paris Carbone, and Vladimir Vlassov. 2018. Streaming graph partitioning: an experimental study. Proceedings of the VLDB Endowment 11, 11 (2018), 1590--1603.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Soramichi Akiyama. 2020. Assessing Impact of Data Partitioning for Approximate Memory in C/C++ Code. arXiv preprint arXiv:2004.01637 (2020).Google ScholarGoogle Scholar
  3. Scott Beamer, Krste Asanovic, and David Patterson. 2012. Direction-optimizing breadth-first search. In SC'12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. IEEE, 1--10.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Huanqi Cao, Yuanwei Wang, Haojie Wang, Heng Lin, Zixuan Ma, Wanwang Yin, and Wenguang Chen. 2022. Scaling graph traversal to 281 trillion edges with 40 million cores. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 234--245.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Donglin Chen, Jianbin Fang, Shizhao Chen, Chuanfu Xu, and Zheng Wang. 2019. Optimizing sparse matrix-vector multiplications on an armv8-based many-core architecture. International Journal of Parallel Programming 47, 3 (2019), 418--432.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Donglin Chen, Jianbin Fang, Chuanfu Xu, Shizhao Chen, and Zheng Wang. 2020. Characterizing scalability of sparse matrix-vector multiplications on phytium ft-2000+. International Journal of Parallel Programming 48, 1 (2020), 80--97.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Chen, J. Shi, Y. Chen, and H. Chen. 2015. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. European Conference on Computer Systems (2015), 1--15.Google ScholarGoogle Scholar
  8. Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2018. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. ACM Trans. Parallel Comput. 5, 3 (2018), 13:1--13:39.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2019. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing (TOPC) 5, 3 (2019), 1--39.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Yongzhi Chen and Yuefan Deng. 2009. A detailed analysis of communication load balance on BlueGene supercomputer. Computer physics communications 180, 8 (2009), 1251--1258.Google ScholarGoogle Scholar
  11. William James Dally and Brian Patrick Towles. 2004. Principles and practices of interconnection networks. Elsevier.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proceedings of the 39th ACM SIGPLAN conference on programming language design and implementation. 752--768.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Timothy A Davis. 2006. Direct methods for sparse linear systems. SIAM.Google ScholarGoogle Scholar
  14. Jack Dongarra. 2020. Report on the Fujitsu Fugaku system. University of Tennessee-Knoxville Innovative Computing Laboratory, Tech. Rep. ICLUT-20-06 (2020).Google ScholarGoogle Scholar
  15. Wenfei Fan, Tao He, Longbin Lai, Xue Li, Yong Li, Zhao Li, Zhengping Qian, Chao Tian, Lei Wang, Jingbo Xu, et al. 2021. GraphScope: a unified engine for big graph processing. Proceedings of the VLDB Endowment 14, 12 (2021), 2879--2892.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Wenfei Fan, Muyang Liu, Chao Tian, Ruiqi Xu, and Jingren Zhou. 2020. Incrementalization of graph partitioning algorithms. Proceedings of the VLDB Endowment 13, 8 (2020), 1261--1274.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Wenfei Fan, Ruiqi Xu, Qiang Yin, Wenyuan Yu, and Jingren Zhou. 2022. Application-driven graph partitioning. The VLDB Journal (2022), 1--24.Google ScholarGoogle Scholar
  18. Jianbin Fang, Peng Zhang, Chun Huang, Tao Tang, Kai Lu, Ruibo Wang, and Zheng Wang. 2022. Programming Bare-Metal Accelerators with Heterogeneous Threading Models: A Case Study of Matrix-3000. arXiv preprint arXiv:2210.12230 (2022).Google ScholarGoogle Scholar
  19. Xing Feng, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2016. Computing connected components with linear communication cost in pregel-like systems. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 85--96.Google ScholarGoogle ScholarCross RefCross Ref
  20. Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. Map-Graph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of workshop on GRAph data management experiences and systems. 1--6.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Pablo Fuentes, Mariano Benito, Enrique Vallejo, José Luis Bosque, Ramön Beivide, Andreea Anghel, Germán Rodríguez, Mitch Gusat, Cyriel Minkenberg, and Mateo Valero. 2017. A scalable synthetic traffic model of Graph500 for computer networks analysis. Concurrency and Computation: Practice and Experience 29, 24 (2017), e4231.Google ScholarGoogle ScholarCross RefCross Ref
  22. Xinbiao Gan, Yiming Zhang, Ruibo Wang, Tiejun Li, Tiaojie Xiao, Ruigeng Zeng, Jie Liu, and Kai Lu. 2021. TianheGraph: Customizing Graph Search for Graph500 on Tianhe Supercomputer. IEEE Transactions on Parallel and Distributed Systems (2021).Google ScholarGoogle Scholar
  23. Xinbiao Gan, Yiming Zhang, Ruigeng Zeng, Jie Liu, Ruibo Wang, Tiejun Li, Li Chen, and Kai Lu. 2022. XTree: Traversal-Based Partitioning for Extreme-Scale Graph Processing on Supercomputers. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2046--2059.Google ScholarGoogle Scholar
  24. Tao Gao, Yutong Lu, Baida Zhang, and Guang Suo. 2014. Using the intel many integrated core to accelerate graph traversal. The International journal of high performance computing applications 28, 3 (2014), 255--266.Google ScholarGoogle Scholar
  25. Sayan Ghosh, Nathan R Tallent, and Mahantesh Halappanavar. 2021. Characterizing Performance of Graph Neighborhood Communication Patterns. IEEE Transactions on Parallel and Distributed Systems 33, 4 (2021), 915--928.Google ScholarGoogle Scholar
  26. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation. 17--30.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation. 599--613.Google ScholarGoogle Scholar
  28. Samuel Grossman, Heiner Litz, and Christos Kozyrakis. 2018. Making pull-based graph processing performant. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 246--260.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. http://graph500.org/. 2021. The Graph 500 List. https://graph500.org/ Last accessed 03 March 2022.Google ScholarGoogle Scholar
  30. Eu Inc. 2022. url.eu-2015. https://law.di.unimi.it/webdata/eu-2015/ Last accessed 03 December 2022.Google ScholarGoogle Scholar
  31. Twitter Inc. 2021. twitter-2010. https://law.di.unimi.it/webdata/twitter-2010/ Last accessed 03 December 2021.Google ScholarGoogle Scholar
  32. Keita Iwabuchi, Hitoshi Sato, Yuichiro Yasui, Katsuki Fujisawa, and Satoshi Matsuoka. 2014. NVM-based hybrid BFS with memory efficient data structure. In 2014 IEEE International Conference on Big Data (Big Data). IEEE, 529--538.Google ScholarGoogle ScholarCross RefCross Ref
  33. George Karypis and Vipin Kumar. 1995. METIS-unstructured graph partitioning and sparse matrix ordering system, version 2.0. (1995).Google ScholarGoogle Scholar
  34. George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on scientific Computing 20, 1 (1998), 359--392.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. George Karypis, Kirk Schloegel, and Vipin Kumar. 1997. Parmetis: Parallel graph partitioning and sparse matrix ordering library. (1997).Google ScholarGoogle Scholar
  36. John Kim, Wiliam J Dally, Steve Scott, and Dennis Abts. 2008. Technology-driven, highly-scalable dragonfly topology. In 2008 International Symposium on Computer Architecture. IEEE, 77--88.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Deyu Kong, Xike Xie, and Zhuoxu Zhang. 2022. Clustering-based Partitioning for Large Web Graphs. arXiv preprint arXiv:2201.00472 (2022).Google ScholarGoogle Scholar
  38. Dongsheng Li, Yiming Zhang, Jinyan Wang, and KianLee Tan. 2019. TopoX: Topology Refactorization for Efficient Graph Partitioning and Processing. PVLDB 12, 8 (2019), 891--905.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Dongsheng Li, Yiming Zhang, Jinyan Wang, and Kian-Lee Tan. 2019. TopoX: Topology refactorization for efficient graph partitioning and processing. Proceedings of the VLDB Endowment 12, 8 (2019), 891--905.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Z Li, C Wu, and Y Li. 2021. FEP-based large-scale virtual screening for effective drug discovery against COVID-19. In Int. Conf. High Performance Computing, Networking, Storage, and Analysis.Google ScholarGoogle Scholar
  41. Xiangke LIAO, Liquan XIAO, Canqun YANG, and Yutong LU. [n.d.]. MilkyWay-2 supercomputer: system and application. ([n. d.]).Google ScholarGoogle Scholar
  42. Xiang-Ke Liao, Zheng-Bin Pang, Ke-Fei Wang, Yu-Tong Lu, Min Xie, Jun Xia, De-Zun Dong, and Guang Suo. 2015. High performance interconnect network for Tianhe system. Journal of Computer Science and Technology 30, 2 (2015), 259--272.Google ScholarGoogle ScholarCross RefCross Ref
  43. Heng Lin, Xiongchao Tang, Bowen Yu, Youwei Zhuo, Wenguang Chen, Jidong Zhai, Wanwang Yin, and Weimin Zheng. 2017. Scalable graph traversal on sunway taihulight with ten million cores. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 635--645.Google ScholarGoogle ScholarCross RefCross Ref
  44. Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, et al. 2018. Shentu: processing multi-trillion edge graphs on millions of cores in seconds. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 706--716.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Yucheng Low. 2013. Graphlab: A distributed abstraction for large scale machine learning. University of California (2013).Google ScholarGoogle Scholar
  46. Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A Framework for Machine Learning in the Cloud. PVLDB 5, 8 (2012), 716--727.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Kai Lu, Yaohua Wang, Yang Guo, Chun Huang, Sheng Liu, Ruibo Wang, Jianbin Fang, Tao Tang, Zhaoyun Chen, Biwei Liu, et al. 2022. MT-3000: a heterogeneous multi-zone processor for HPC. CCF Transactions on High Performance Computing (2022), 1--15.Google ScholarGoogle Scholar
  48. Meilian Lu, Zhenglin Zhang, Zhihe Qu, and Yu Kang. 2018. LPANNI: Overlapping community detection using label propagation in large-scale complex networks. IEEE Transactions on Knowledge and Data Engineering 31, 9 (2018), 1736--1749.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Yutong Lu. 2019. Paving the way for China exascale computing. CCF Transactions on High Performance Computing 1, 2 (2019), 63--72.Google ScholarGoogle ScholarCross RefCross Ref
  50. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2009. Pregel: a system for large-scale graph processing. Sigmod (2009), 135--146.Google ScholarGoogle Scholar
  51. Ruben Mayer and Hans-Arno Jacobsen. 2021. Hybrid edge partitioner: partitioning large power-law graphs under memory constraints. In Proceedings of the 2021 International Conference on Management of Data. 1289--1302.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Andrey Molyakov. 2019. Age of Great Chinese Dragon: Supercomputer Centers and High Performance Computing. Journal of Electrical and Electronic Engineering 7, 4 (2019), 87--94.Google ScholarGoogle Scholar
  53. Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, and Mitsuhisa Sato. 2020. Performance Evaluation of Supercomputer Fugaku using Breadth-First Search Benchmark in Graph500. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 408--409.Google ScholarGoogle ScholarCross RefCross Ref
  54. Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, and Mitsuhisa Sato. 2021. Performance of the supercomputer fugaku for breadth-first search in graph500 benchmark. In High Performance Computing: 36th International Conference, ISC High Performance 2021, Virtual Event, June 24--July 2, 2021, Proceedings 36. Springer, 372--390.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Mahdi Nikdast, Jiang Xu, Luan HK Duong, Xiaowen Wu, Zhehui Wang, Xuan Wang, and Zhe Wang. 2014. Fat-tree-based optical interconnection networks under crosstalk noise constraint. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23, 1 (2014), 156--169.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Joel Nishimura and Johan Ugander. 2013. Restreaming graph partitioning: simple versatile algorithms for advanced balancing. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. 1106--1114.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Anil Pacaci and M Tamer Özsu. 2019. Experimental analysis of streaming algorithms for graph partitioning. In Proceedings of the 2019 International Conference on Management of Data. 1375--1392.Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Julian Shun and Guy E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In ACM Sigplan Notices, Vol. 48. ACM, 135--146.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. George M Slota, Cameron Root, Karen Devine, Kamesh Madduri, and Sivasankaran Rajamanickam. 2020. Scalable, multi-constraint, complex-objective graph partitioning. IEEE Transactions on Parallel and Distributed Systems 31, 12 (2020), 2789--2801.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Hari Subramoni, Albert Mathews Augustine, Mark Arnold, Jonathan Perkins, Xiaoyi Lu, Khaled Hamidouche, and Dhabaleswar K Panda. 2016. INAM 2: InfiniBand Network Analysis and Monitoring with MPI. In International Conference on High Performance Computing. Springer, 300--320.Google ScholarGoogle Scholar
  61. Toyotaro Suzumura, Koji Ueno, Hitoshi Sato, Katsuki Fujisawa, and Satoshi Matsuoka. 2011. Performance characteristics of Graph500 on large-scale distributed environment. In 2011 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 149--158.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Koji Ueno and Toyotaro Suzumura. 2012. 2d partitioning based graph search for the graph500 benchmark. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. IEEE, 1925--1931.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Koji Ueno and Toyotaro Suzumura. 2012. Highly scalable graph search for the graph500 benchmark. In Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing. 149--160.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, and Satoshi Matsuoka. 2016. Extreme scale breadth-first search on supercomputers. In 2016 IEEE International Conference on Big Data (Big Data). IEEE, 1040--1047.Google ScholarGoogle ScholarCross RefCross Ref
  65. Carnegie Mellon University. 2021. ClueWeb12 Dataset. https://lemurproject.org/clueweb12/ Last accessed 03 December 2021.Google ScholarGoogle Scholar
  66. Erik Vermij, Leandro Fiorin, Christoph Hagleitner, and Koen Bertels. 2017. Boosting the efficiency of HPCG and Graph500 with near-data processing. In 2017 46th International Conference on Parallel Processing (ICPP). IEEE, 31--40.Google ScholarGoogle ScholarCross RefCross Ref
  67. Ruibo Wang, Kai Lu, Juan Chen, Wenzhe Zhang, Jinwen Li, Yuan Yuan, Pingjing Lu, Libo Huang, Shengguo Li, and Xiaokang Fan. 2020. Brief introduction of TianHe exascale prototype system. Tsinghua Science and Technology 26, 3 (2020), 361--369.Google ScholarGoogle ScholarCross RefCross Ref
  68. Yuanwei Wang, Huanqi Cao, Zixuan Ma, Wanwang Yin, and Wenguang Chen. 2022. Scaling graph 500 SSSP to 140 trillion edges with over 40 million cores. In 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE Computer Society, 248--262.Google ScholarGoogle ScholarCross RefCross Ref
  69. Min Xie, Yutong Lu, Kefei Wang, Lu Liu, Hongjia Cao, et al. 2011. Tianhe-1a interconnect and message-passing services. IEEE Micro 32, 1 (2011), 8--20.Google ScholarGoogle Scholar
  70. Xue-Jun Yang, Xiang-Ke Liao, Kai Lu, Qing-Feng Hu, Jun-Qiang Song, and Jin-Shu Su. 2011. The TianHe-1A supercomputer: its hardware and software. Journal of computer science and technology 26, 3 (2011), 344--351.Google ScholarGoogle ScholarCross RefCross Ref
  71. Yuichiro Yasui and Katsuki Fujisawa. 2015. Fast and scalable NUMA-based thread parallel breadth-first search. In 2015 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 377--385.Google ScholarGoogle ScholarCross RefCross Ref
  72. Yuichiro Yasui, Katsuki Fujisawa, and Kazushige Goto. 2013. NUMA-optimized parallel breadth-first search on multicore single-node system. In 2013 IEEE International Conference on Big Data. IEEE, 394--402.Google ScholarGoogle ScholarCross RefCross Ref
  73. Andy Yoo, Edmond Chow, Keith Henderson, William McLendon, Bruce Hendrickson, and Umit Catalyurek. 2005. A scalable distributed parallel breadth-first search algorithm on BlueGene/L. In SC'05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. IEEE, 25--25.Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Jason Jongjin Yoo and Alan E Willner. 2001. A Performance and Implementation Comparison of Bidirectional and Dual Bus 2-D WDM Multiple-Plane Optical Interconnections with Row-ColumnMultihop Network Structures. Journal of lightwave technology 19, 6 (2001), 801.Google ScholarGoogle ScholarCross RefCross Ref
  75. Xin You, Hailong Yang, Zhongzhi Luan, Yi Liu, and Depei Qian. 2019. Performance evaluation and analysis of linear algebra kernels in the prototype tianhe-3 cluster. In Asian Conference on Supercomputing Frontiers. Springer, 86--105.Google ScholarGoogle ScholarCross RefCross Ref
  76. Jeffrey Young, Julian Romera, Matthias Hauck, and Holger Fröning. 2016. Optimizing communication for a 2D-partitioned scalable BFS. In 2016 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1--7.Google ScholarGoogle ScholarCross RefCross Ref
  77. Chenglong Zhang. 2020. A New Perspective of Graph Data and A Generic and Efficient Method for Large Scale Graph Data Traversal. arXiv preprint arXiv:2009.07463 (2020).Google ScholarGoogle Scholar
  78. Hongyang Zhang, Peter Lofgren, and Ashish Goel. 2016. Approximate personalized pagerank on dynamic graphs. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1315--1324.Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Yiming Zhang, Haonan Wang, Menghan Jia, Jinyan Wang, Dong sheng Li, Guangtao Xue, and K. Tan. 2020. TopoX: Topology Refactorization for Minimizing Network Communication in Graph Computations. IEEE/ACM Transactions on Networking 28 (2020), 2768--2782.Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A Computation-Centric Distributed Graph Processing System. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2--4, 2016, Kimberly Keeton and Timothy Roscoe (Eds.). USENIX Association, 301--316. https://www.usenix.org/conference/osdi16/technical-sessions/presentation/zhuGoogle ScholarGoogle Scholar

Index Terms

  1. GraphCube: Interconnection Hierarchy-aware Graph Processing

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
        March 2024
        498 pages
        ISBN:9798400704352
        DOI:10.1145/3627535

        Copyright © 2024 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 February 2024

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate230of1,014submissions,23%
      • Article Metrics

        • Downloads (Last 12 months)176
        • Downloads (Last 6 weeks)66

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader