ABSTRACT
Graph neural networks (GNNs) extract features by learning both the representation of each object (i.e., a graph node) and the relationships across different objects (i.e., the edges that connect nodes), achieving state-of-the-art performance in various graph-based tasks. Despite their strengths, deploying these algorithms in a production environment faces several challenges, as the number of graph nodes and edges can reach billions to hundreds of billions, requiring substantial storage space for training. Unfortunately, state-of-the-art ML frameworks employ an in-memory processing model, which significantly hampers the productivity of ML practitioners because it mandates that the overall working set fit within DRAM capacity. In this work, we first conduct a detailed characterization of a state-of-the-art, large-scale GNN training algorithm, GraphSAGE. Based on this characterization, we then explore the feasibility of utilizing capacity-optimized NVMe SSDs to store memory-hungry GNN data, enabling large-scale GNN training beyond the limits of main memory. Given the large performance gap between DRAM and SSDs, however, blindly using SSDs as a direct substitute for DRAM leads to significant performance loss. We therefore develop SmartSAGE, our software/hardware co-design based on an in-storage processing (ISP) architecture. Our work demonstrates that an ISP-based large-scale GNN training system can achieve both high-capacity storage and high performance, opening up opportunities for ML practitioners to train large GNN datasets without being constrained by the physical limitations of main memory.
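To make the learning step concrete, the sketch below illustrates the kind of computation GraphSAGE performs per node: sample a fixed number of neighbors, aggregate their features (mean aggregator), concatenate with the node's own features, and apply a learned linear transform with a nonlinearity. This is a minimal illustrative sketch using NumPy on a toy graph, not the paper's implementation; the graph, feature dimensions, and weight matrix `W` are made-up assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 6 nodes with adjacency lists and 4-dim input features (assumed values).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
feats = rng.standard_normal((6, 4))
W = rng.standard_normal((5, 8))  # maps concat(self, aggregated neighbors) -> 5-dim output


def sage_layer(feats, adj, W, num_samples=2, rng=rng):
    """One GraphSAGE-style layer with neighbor sampling and mean aggregation."""
    out = []
    for v in range(len(feats)):
        nbrs = adj[v]
        # Sample up to num_samples neighbors to bound the per-node working set.
        sampled = rng.choice(nbrs, size=min(num_samples, len(nbrs)), replace=False)
        agg = feats[sampled].mean(axis=0)          # mean-aggregate neighbor features
        h = np.concatenate([feats[v], agg])        # combine self and neighborhood
        out.append(np.maximum(W @ h, 0.0))         # linear transform + ReLU
    return np.stack(out)


h1 = sage_layer(feats, adj, W)
print(h1.shape)  # (6, 5)
```

Note that each node's update touches only its own feature row and those of a handful of sampled neighbors; at billion-node scale, these fine-grained, random feature reads are exactly the access pattern that makes naive SSD substitution slow and motivates offloading the gather/aggregate step into storage.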