DOI: 10.1145/3352460.3358304

Manna: An Accelerator for Memory-Augmented Neural Networks

Published: 12 October 2019

ABSTRACT

Memory-augmented neural networks (MANNs), which augment a traditional deep neural network (DNN) with an external, differentiable memory, are emerging as a promising direction in machine learning. MANNs have been shown to achieve one-shot learning and complex cognitive capabilities well beyond those of classical DNNs. We analyze the computational characteristics of MANNs and observe that they present a unique challenge due to soft reads and writes to the differentiable memory, each of which requires access to all the memory locations. This results in poor performance of MANNs on modern CPUs, GPUs, and other accelerators. To address this, we present Manna, a specialized hardware inference accelerator for MANNs. Manna is a memory-centric design that focuses on maximizing performance in an extremely low FLOPs/byte context. The key architectural features from which Manna derives efficiency are: (i) investing most of the die area and power in highly banked on-chip memories that provide ample bandwidth, rather than in large matrix-multiply units that would be underutilized due to the low reuse; (ii) a hardware-assisted transpose mechanism for accommodating the diverse memory access patterns observed in MANNs; (iii) a specialized processing tile equipped to handle the nearly equal mix of MAC and non-MAC computations present in MANNs; and (iv) methods for mapping MANNs to Manna that minimize data movement while fully exploiting the little reuse present. We evaluate Manna using a detailed architectural simulator with timing and power models calibrated by synthesis to the 15 nm Nangate Open Cell library. Across a suite of 10 benchmarks, Manna demonstrates average speedups of 39x with average energy improvements of 122x over an NVIDIA 1080-Ti Pascal GPU, and average speedups of 24x with average energy improvements of 86x over a state-of-the-art NVIDIA 2080-Ti Turing GPU.
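To make the bottleneck the abstract describes concrete, the sketch below (not from the paper; a minimal NumPy illustration of the generic content-based soft read/write used in Neural Turing Machine-style MANNs, with all function names, parameters, and dimensions chosen purely for exposition) shows why every soft access must touch all N memory rows and why the resulting arithmetic intensity is so low.

    import numpy as np

    def soft_read(M, k, beta=1.0):
        """Content-based soft read: touches every row of M."""
        # Cosine similarity between the key k and ALL N memory rows.
        sim = (M @ k) / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
        # Sharpened, numerically stable softmax over all rows -> weights w.
        e = np.exp(beta * sim - np.max(beta * sim))
        w = e / e.sum()
        # Weighted sum over the ENTIRE memory: r = sum_i w_i * M[i].
        return w @ M, w

    def soft_write(M, w, erase, add):
        """Soft write: a blended erase/add applied to every row, scaled by w_i."""
        return M * (1.0 - np.outer(w, erase)) + np.outer(w, add)

    # Toy dimensions; real MANN memories are larger but the pattern is the same.
    N, W = 128, 64
    M = np.random.randn(N, W).astype(np.float32)
    k = np.random.randn(W).astype(np.float32)

    r, w = soft_read(M, k, beta=5.0)
    M = soft_write(M, w,
                   erase=np.full(W, 0.1, np.float32),
                   add=np.random.randn(W).astype(np.float32))

    # Arithmetic intensity of the weighted sum alone: ~2*N*W FLOPs over
    # ~4*N*W bytes of float32 memory traffic, i.e. roughly 0.5 FLOPs/byte.

At roughly 0.5 FLOPs/byte, bandwidth rather than multiply throughput is the limiting resource, which is consistent with the abstract's choice to spend die area on highly banked on-chip memories instead of large matrix-multiply units.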


Published in

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019, 1104 pages
ISBN: 9781450369381
DOI: 10.1145/3352460
Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
