research-article
Open Access

A Classification of Memory-Centric Computing

Published: 30 January 2020

Abstract

Technological and architectural improvements have been constantly required to keep up with the demand for faster and cheaper computers. However, CMOS down-scaling is running into three technology walls: the leakage wall, the reliability wall, and the cost wall. On top of that, the performance gains from architectural improvements are gradually saturating because of three well-known architecture walls: the memory wall, the power wall, and the instruction-level parallelism (ILP) wall. Hence, much research focuses on proposing and developing new technologies and architectures. In this article, we present a comprehensive classification of memory-centric computing architectures based on three metrics: computation location, level of parallelism, and the memory technology used. The classification not only provides an overview of existing architectures with their pros and cons but also unifies the terminology that uniquely identifies these architectures and highlights potential future architectures that can be further explored. Hence, it sets a direction for future research in the field.

References

  1. Shaizeen Aga, Supreet Jeloka, Arun Subramaniyan, Satish Narayanasamy, David Blaauw, and Reetuparna Das. 2017. Compute caches. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA’17). IEEE, 481--492.Google ScholarGoogle ScholarCross RefCross Ref
  2. Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2016. A scalable processing-in-memory accelerator for parallel graph processing. ACM SIGARCH Computer Architecture News 43, 3 (2016), 105--117.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Marco A. Z. Alves, Matthias Diener, Paulo C. Santos, and Luigi Carro. 2016. Large vector extensions inside the HMC. In Design, Automation and Test in Europe Conference and Exhibition (DATE'16). IEEE, 1249--1254.Google ScholarGoogle Scholar
  4. Marco Antonio Zanata Alves, Carlos Villavieja, Matthias Diener, Francis Birck Moreira, and Philippe Olivier Alexandre Navaux. 2015. SiNUCA: A validated micro-architecture simulator. In Proceeding of International Conference on High Performance Computing and Communications (HPCC), International Symposium on Cyberspace Safety and Security (CSS), and International Conference on Embedded Software and Systems (ICESS). 605--610.Google ScholarGoogle Scholar
  5. Luca Amarú, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli. 2015. The EPFL combinational benchmark suite. In Proceedings of the 24th International Workshop on Logic 8 Synthesis (IWLS’15).Google ScholarGoogle Scholar
  6. Ali BanaGozar, Kanishkan Vadivel, Sander Stuijk, Henk Corporaal, Stephan Wong, Muath Abu Lebdeh, Jintao Yu, and Said Hamdioui. 2019. CIM-SIM: Computation in memory SIMuIator. In International Workshop on Software and Compilers for Embedded Systems. ACM, 1--4.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. John Barth, Don Plass, Erik Nelson, Charlie Hwang, Gregory Fredeman, Michael Sperling, Abraham Mathews, Toshiaki Kirihata, William R. Reohr, Kavita Nair, and Nianzheng Cao. 2010. A 45nm SOI embedded DRAM macro for the POWER™ processor 32 MByte on-chip L3 cache. IEEE Journal of Solid-State Circuits 46, 1 (2010), 64--75.Google ScholarGoogle ScholarCross RefCross Ref
  8. Gary Benson, Yozen Hernandez, and Joshua Loving. 2013. A bit-parallel, general integer-scoring sequence alignment algorithm. In Annual Symposium on Combinatorial Pattern Matching. Springer, 50--61.Google ScholarGoogle ScholarCross RefCross Ref
  9. Debjyoti Bhattacharjee, Rajeswari Devadoss, and Anupam Chattopadhyay. 2017. ReVAMP: ReRAM based VLIW architecture for in-memory computing. In 2017 Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’17). IEEE, 782--787.Google ScholarGoogle Scholar
  10. Sabpreet Bhatti, Rachid Sbiaa, Atsufumi Hirohata, Hideo Ohno, Shunsuke Fukami, and S. N. Piramanayagam. 2017. Spintronics based random access memory: A review. Materials Today 20, 9 (2017), 530--548.Google ScholarGoogle ScholarCross RefCross Ref
  11. Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 72--81.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, et al. 2011. The gem5 simulator. ACM SIGARCH Computer Architecture News 39, 2 (2011), 1--7.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Evgeny Bolotin, David Nellans, Oreste Villa, Mike O’Connor, Alex Ramirez, and Stephen W. Keckler. 2015. Designing efficient heterogeneous memory architectures. IEEE Micro 35, 4 (2015), 60--68.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Julien Borghetti, Gregory S. Snider, Philip J. Kuekes, J. Joshua Yang, Duncan R. Stewart, and R. Stanley Williams. 2010. Memristive switches enable stateful logic operations via material implication. Nature 464, 7290 (2010), 873--876.Google ScholarGoogle Scholar
  15. S. Borkar. 1999. Design challenges of technology scaling. IEEE Micro 19, 4 (July 1999), 23--29. DOI:https://doi.org/10.1109/40.782564Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Rafmag Cabrera, Emmanuelle Merced, and Nelson Sepúlveda. 2013. A micro-electro-mechanical memory based on the structural phase transition of VO2. Physica Status Solidi (a) 210, 9 (2013), 1704--1711.Google ScholarGoogle ScholarCross RefCross Ref
  17. Meng-Fan Chang, Ching-Hao Chuang, Min-Ping Chen, Lai-Fu Chen, Hiroyuki Yamauchi, Pi-Feng Chiu, and Shyh-Shyuan Sheu. 2012. Endurance-aware circuit designs of nonvolatile logic and nonvolatile SRAM using resistive memory (memristor) device. In 2012 17th Asia and South Pacific Design Automation Conference (ASP-DAC’12). IEEE, 329--334.Google ScholarGoogle ScholarCross RefCross Ref
  18. Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In IEEE International Symposium on Workload Characterization, 2009 (IISWC’09). IEEE, 44--54.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. Chen, D. Apalkov, Z. Diao, A. Driskill-Smith, D. Druist, D. Lottis, V. Nikitin, X. Tang, S. Watts, S. Wang, et al. 2010. Advances and future prospects of spin-transfer torque random access memory. IEEE Transactions on Magnetics 46, 6 (2010), 1873--1878.Google ScholarGoogle ScholarCross RefCross Ref
  20. Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. 2014. Dadiannao: A machine-learning supercomputer. In IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 609--622.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press, 27--39.Google ScholarGoogle Scholar
  22. Gianni Conte, Stefano Tommesani, and Francesco Zanichelli. 2000. The long and winding road to high-performance image processing with MMX/SSE. In Proceedings of the 5th IEEE International Workshop on Computer Architectures for Machine Perception, 2000. IEEE, 302--310.Google ScholarGoogle ScholarCross RefCross Ref
  23. Joao Paulo C. de Lima, Paulo Cesar Santos, Marco A. Z. Alves, Antonio C. S. Beck, and Luigi Carro. 2018. Design space exploration for PIM architectures in 3D-stacked memories. In Computer Frontier. ACM, 295--308.Google ScholarGoogle Scholar
  24. Jaffrey Draper, J. Tim Barrett, Jeff Sondeen, Sumit Mediratta, Chang Woo Kang, Ihn Kim, and Gokhan Daglikoca. 2005. A prototype processing-in-memory (PIM) chip for the data-intensive architecture (DIVA) system. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology 40, 1 (2005), 73--84.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jeff Draper, Jacqueline Chame, Mary Hall, Craig Steele, Tim Barrett, Jeff LaCoss, John Granacki, Jaewook Shin, Chun Chen, Chang Woo Kang, et al. 2002. The architecture of the DIVA processing-in-memory chip. In Proceedings of the 16th International Conference on Supercomputing. ACM, 14--25.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. H. A. Du Nguyen, Jintao Yu, Lei Xie, Mottaqiallah Taouil, Said Hamdioui, and Dietmar Fey. 2017. Memristive devices for computing: Beyond CMOS and beyond von Neumann. In 2017 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC’17). IEEE, 1--10.Google ScholarGoogle ScholarCross RefCross Ref
  27. Hoang Anh Du Nguyen, Lei Xie, Mottaqiallah Taouil, Razvan Nane, Said Hamdioui, and Koen Bertels. 2017. On the implementation of computation-in-memory parallel adder. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 8 (2017), 2206--2219.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P. Dudek and S. J. Carey. 2006. General-purpose 128/spl times/128 SIMD processor array with integrated image sensor. Electronics Letters 42, 12 (2006), 678--679.Google ScholarGoogle ScholarCross RefCross Ref
  29. Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, and Reetuparna Das. 2018. Neural cache: Bit-serial in-cache acceleration of deep neural networks. arXiv preprint arXiv:1805.03718 (2018).Google ScholarGoogle Scholar
  30. Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, and Dean M. Tullsen. 1997. Simultaneous multithreading: A platform for next-generation processors. IEEE Micro 17, 5 (1997), 12--19.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Amin Farmahini-Farahani, Jung Ho Ahn, Katherine Morrow, and Nam Sung Kim. 2015. DRAMA: An architecture for accelerated processing near memory. IEEE Computer Architecture Letters 14, 1 (2015), 26--29.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Tim Finkbeiner, Glen Hush, Troy Larsen, Perry Lea, John Leidel, and Troy Manning. 2017. In-memory intelligence. IEEE Micro 37, 4 (2017), 30--38.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Nadeem Firasta, Mark Buxton, Paula Jinbo, Kaveh Nasri, and Shihjong Kuo. 2008. Intel AVX: New frontiers in performance improvements and energy efficiency. Intel White Paper 19 (2008), 20.Google ScholarGoogle Scholar
  34. Randall James Fisher. 2003. General-purpose SIMD within a register: Parallel processing on consumer microprocessors. Doctoral Dissertation.Google ScholarGoogle Scholar
  35. M. Flynn. 1966. Very high-speed computing systems. Proceedings of the IEEE 54, 12 (Dec. 1966), 1901--1909. DOI:https://doi.org/10.1109/PROC.1966.5273Google ScholarGoogle ScholarCross RefCross Ref
  36. G. D. Fuchs, N. C. Emley, I. N. Krivorotov, P. M. Braganca, E. M. Ryan, S. I. Kiselev, J. C. Sankey, D. C. Ralph, R. A. Buhrman, and J. A. Katine. 2004. Spin-transfer effects in nanoscale magnetic tunnel junctions. Applied Physics Letters 85, 7 (2004), 1205--1207.Google ScholarGoogle ScholarCross RefCross Ref
  37. Daichi Fujiki, Scott Mahlke, and Reetuparna Das. 2018. In-memory data parallel processor. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 1--14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Pierre-Emmanuel Gaillardon, Luca Amar, Anne Siemon, Eike Linn, Rainer Waser, Anupam Chattopadhyay, and Giovanni De Micheli. 2016. The programmable logic-in-memory (PLiM) computer. In 2016 Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’16). IEEE, 427--432.Google ScholarGoogle Scholar
  39. Mingyu Gao, Grant Ayers, and Christos Kozyrakis. 2015. Practical near-data processing for in-memory analytics frameworks. In 2015 International Conference on Parallel Architecture and Compilation (PACT’15). IEEE, 113--124.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Simcha Gochman, Avi Mendelson, Alon Naveh, and Efraim Rotem. 2006. Introduction to Intel core duo processor architecture. Intel Technology Journal 10, 2 (2006), 89--97.Google ScholarGoogle ScholarCross RefCross Ref
  41. Jonathan E. Green, Jang Wook Choi, Akram Boukai, Yuri Bunimovich, Ezekiel Johnston-Halperin, Erica DeIonno, Yi Luo, Bonnie A. Sheriff, Ke Xu, Young Shik Shin, et al. 2007. A 160-kilobit molecular electronic memory patterned at 10 11 bits per square centimetre. Nature 445, 7126 (2007), 414.Google ScholarGoogle Scholar
  42. Beat Halg. 1990. On a micro-electro-mechanical nonvolatile memory cell. IEEE Transactions on Electron Devices 37, 10 (1990), 2230--2236.Google ScholarGoogle ScholarCross RefCross Ref
  43. Said Hamdioui, Koenraad Laurent Maria Bertels, and Mottaqiallah Taouil. 2017. Computing Device for Big Data Applications Using Memristors. US Patent 9,824,753.Google ScholarGoogle Scholar
  44. Said Hamdioui, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Abu Sebastian, Manuel Le Gallo, Sandeep Pande, Siebren Schaafsma, Francky Catthoor, Shidhartha Das, Fernando G. Redondo, et al. 2019. Applications of computation-in-memory architectures based on memristive devices. In 2019 Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’19). IEEE, 486--491.Google ScholarGoogle Scholar
  45. Said Hamdioui, Shahar Kvatinsky, Gert Cauwenberghs, Lei Xie, Nimrod Wald, Siddharth Joshi, Hesham Mostafa Elsayed, Henk Corporaal, and Koen Bertels. 2017. Memristor for computing: Myth or reality? In Proceedings of the Conference on Design, Automation 8 Test in Europe. European Design and Automation Association, 722--731.Google ScholarGoogle ScholarCross RefCross Ref
  46. Said Hamdioui, Lei Xie, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Koen Bertels, Henk Corporaal, Hailong Jiao, Francky Catthoor, Dirk Wouters, Linn Eike, et al. 2015. Memristor based computation-in-memory architecture for data-intensive applications. In Proceedings of the 2015 Design, Automation 8 Test in Europe Conference 8 Exhibition. EDA Consortium, 1718--1725.Google ScholarGoogle Scholar
  47. JongWook Han, Choon-Sik Park, Dae-Hyun Ryu, and Eun-Soo Kim. 1999. Optical image encryption based on XOR operations. Optical Engineering 38, 1 (1999), 47--55.Google ScholarGoogle ScholarCross RefCross Ref
  48. Adib Haron, Jintao Yu, Razvan Nane, Mottaqiallah Taouil, Said Hamdioui, and Koen Bertels. 2016. Parallel matrix multiplication on memristor-based computation-in-memory architecture. In 2016 International Conference on High Performance Computing 8 Simulation (HPCS’16). IEEE, 759--766.Google ScholarGoogle ScholarCross RefCross Ref
  49. John L. Hennessy and David A. Patterson. 2011. Computer Architecture: A Quantitative Approach. Elsevier.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. HMC. 2018. Hybrid Memory Cube Specification 2.1. Retrieved from http://hybridmemorycube.org/.Google ScholarGoogle Scholar
  51. M. Hosomi, H. Yamagishi, T. Yamamoto, K. Bessho, Y. Higo, K. Yamane, H. Yamada, M. Shoji, H. Hachino, C. Fukumoto, et al. 2005. A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM. In IEEE International Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE, 459--462.Google ScholarGoogle ScholarCross RefCross Ref
  52. Rotem Ben Hur and Shahar Kvatinsky. 2016. Memristive memory processing unit (MPU) controller for in-memory processing. In IEEE International Conference on the Science of Electrical Engineering (ICSEE’16). IEEE, 1--5.Google ScholarGoogle Scholar
  53. IBM. 2014. Power 4 - The First Multi-Core, 1GHz Processor.Google ScholarGoogle Scholar
  54. ITRS. 2010. ITRS ERD Report. Retrieved from http://www.itrs.net.Google ScholarGoogle Scholar
  55. Subramanian S. Iyer and Howard L. Kalter. 1999. Embedded DRAM technology: Opportunities and challenges. IEEE Spectrum 36, 4 (1999), 56--64.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Shubham Jain, Ashish Ranjan, Kaushik Roy, and Anand Raghunathan. 2017. Computing in memory with spin-transfer torque magnetic RAM. arXiv preprint arXiv:1703.02118 (2017).Google ScholarGoogle Scholar
  57. Joe Jeddeloh and Brent Keeth. 2012. Hybrid memory cube new DRAM architecture increases density and performance. In 2012 Symposium on VLSI Technology (VLSIT’12). IEEE, 87--88.Google ScholarGoogle ScholarCross RefCross Ref
  58. Zhang Jianwu, Zhao Danying, et al. 2008. Survey on microprocessor architecture and development trends. In 11th IEEE International Conference on Communication Technology, 2008 (ICCT’08). IEEE, 297--300.Google ScholarGoogle Scholar
  59. David Judd, Katherine Yelick, Christoforos Kozyrakis, David Martin, and David Patterson. 2001. Exploiting on-chip memory bandwidth in the VIRAM compiler. In Intelligent Memory Systems. Springer, 122--134.Google ScholarGoogle Scholar
  60. Hongshin Jun, Jinhee Cho, Kangseol Lee, Ho-Young Son, Kwiwook Kim, Hanho Jin, and Keith Kim. 2017. HBM (high bandwidth memory) DRAM technology and architecture. In 2017 IEEE International Memory Workshop (IMW’17). IEEE, 1--4.Google ScholarGoogle ScholarCross RefCross Ref
  61. Ron Kalla, Balaram Sinharoy, William J. Starke, and Michael Floyd. 2010. Power7: IBM’s next-generation server processor. IEEE Micro 30, 2 (2010), 7--15.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Yi Kang, Wei Huang, Seung-Moon Yoo, D. Keen, Zhenzhou Ge, V. Lam, P. Pattnaik, and J. Torrellas. [n.d.]. FlexRAM: Toward an advanced intelligent memory system. In 2012 IEEE 30th International Conference on Computer Design (ICCD’12). 5--14. DOI:https://doi.org/10.1109/ICCD.2012.6378608Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Yi Kang, Wei Huang, Seung-Moon Yoo, Diana Keen, Zhenzhou Ge, Vinh Lam, Pratap Pattnaik, and Josep Torrellas. 2012. FlexRAM: Toward an advanced intelligent memory system. In 2012 IEEE 30th International Conference on Computer Design (ICCD’12). IEEE, 5--14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Doris Keitel-Schulz and Norbert Wehn. 1998. Issues in embedded DRAM development and applications. In Proceedings of the 11th International Symposium on System Synthesis. IEEE Computer Society, 23--31.Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Doris Keitel-Schulz and Norbert Wehn. 2001. Embedded DRAM development: Technology, physical design, and application issues. IEEE Design 8 Test of Computers 18, 3 (2001), 7--15.Google ScholarGoogle Scholar
  66. Kyosun Kim, Sangho Shin, and Sung-Mo Kang. 2011. Stateful logic pipeline architecture. In 2011 IEEE International Symposium of Circuits and Systems (ISCAS’11). IEEE, 2497--2500.Google ScholarGoogle ScholarCross RefCross Ref
  67. David Kirk et al. 2007. NVIDIA CUDA software and GPU parallel computing architecture. In ISMM, Vol. 7. 103--104.Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Christoforos Kozyrakis. 2002. Scalable Vector Media-Processors for Embedded Systems. Technical Report. California University Berkeley Computer Science Division.Google ScholarGoogle Scholar
  69. Christoforos Kozyrakis and David Patterson. 2002. Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks. In Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society Press, 283--293.Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Christoforos E. Kozyrakis, Stylianos Perissakis, David Patterson, Thomas Anderson, Krste Asanovic, Neal Cardwell, Richard Fromm, Jason Golbus, Benjamin Gribstad, Kimberly Keeton, et al. 1997. Scalable processors in the billion-transistor era: IRAM. Computer 30, 9 (1997), 75--78.Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Nasser Kurd, Muntaquim Chowdhury, Edward Burton, Thomas P. Thomas, Christopher Mozak, Brent Boswell, Praveen Mosalikanti, Mark Neidengard, Anant Deval, Ashish Khanna, et al. 2014. Haswell: A family of IA 22nm processors. IEEE Journal of Solid-State Circuits 50, 1 (2014), 49--58.Google ScholarGoogle ScholarCross RefCross Ref
  72. Shahar Kvatinsky, Dmitry Belousov, Slavik Liman, Guy Satat, Nimrod Wald, Eby G. Friedman, Avinoam Kolodny, and Uri C. Weiser. 2014. MAGIC--Memristor-aided logic. IEEE Transactions on Circuits and Systems II: Express Briefs 61, 11 (2014), 895--899.Google ScholarGoogle ScholarCross RefCross Ref
  73. Shahar Kvatinsky, Guy Satat, Nimrod Wald, Eby G. Friedman, Avinoam Kolodny, and Uri C. Weiser. 2014. Memristor-based material implication (IMPLY) logic: Design principles and methodologies. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22, 10 (2014), 2054--2066.Google ScholarGoogle ScholarCross RefCross Ref
  74. Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger. 2010. Phase change memory architecture and the quest for scalability. Communications of the ACM 53, 7 (2010), 99--106.Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Jong Chern Lee, Jihwan Kim, Kyung Whan Kim, Young Jun Ku, Dae Suk Kim, Chunseok Jeong, Tae Sik Yun, Hongjung Kim, Ho Sung Cho, Yeon Ok Kim, et al. 2016. 18.3 A 1.2 V 64Gb 8-channel 256GB/s HBM DRAM with peripheral-base-die architecture and small-swing technique on heavy load interface. In 2016 IEEE International Solid-State Circuits Conference (ISSCC’16). IEEE, 318--319.Google ScholarGoogle ScholarCross RefCross Ref
  76. Eero Lehtonen, Jussi H. Poikonen, and Mika Laiho. 2014. Memristive stateful logic. In Memristor Networks. Springer, 603--623.Google ScholarGoogle Scholar
  77. John D. Leidel and Yong Chen. 2016. Hmc-sim-2.0: A simulation platform for exploring custom memory cube operations. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’16). IEEE, 621--630.Google ScholarGoogle Scholar
  78. Chao Li, Wendy Fan, Bo Lei, Daihua Zhang, Song Han, Tao Tang, Xiaolei Liu, Zuqin Liu, Sylvia Asano, Meyya Meyyappan, et al. 2004. Multilevel memory based on molecular devices. Applied Physics Letters 84, 11 (2004), 1949--1951.Google ScholarGoogle ScholarCross RefCross Ref
  79. Chao Li, Daihua Zhang, Xiaolei Liu, Song Han, Tao Tang, Chongwu Zhou, Wendy Fan, Jessica Koehne, Jie Han, Meyya Meyyappan, et al. 2003. Fabrication approach for molecular memory arrays. Applied Physics Letters 82, 4 (2003), 645--647.Google ScholarGoogle ScholarCross RefCross Ref
  80. Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, and Yuan Xie. 2017. DRISA: A DRAM -based reconfigurable in-situ accelerator. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 288--301.Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Shuangchen Li, Cong Xu, Qiaosha Zou, Jishen Zhao, Yu Lu, and Yuan Xie. 2016. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In Proceeding of ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 173--178.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. E. Linn, R. Rosezin, S. Tappertzhofen, R. Waser, et al. 2012. Beyond von Neumann--logic operations in passive crossbar arrays alongside memory operations. Nanotechnology 23, 30 (2012), 305205.Google ScholarGoogle ScholarCross RefCross Ref
  83. Andrea Lodi, Mario Toma, Fabio Campi, Andrea Cappelli, Roberto Canegallo, and Roberto Guerrieri. 2003. A VLIW processor with reconfigurable instruction set for embedded applications. IEEE Journal of Solid-state Circuits 38, 11 (2003), 1876--1886.Google ScholarGoogle ScholarCross RefCross Ref
  84. Joe Macri. 2015. AMD’s next generation GPU and high bandwidth memory architecture: FURY. In 2015 IEEE Hot Chips 27 Symposium (HCS’15). IEEE, 1--26.Google ScholarGoogle ScholarCross RefCross Ref
  85. Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, and Mark Horowitz. 2000. Smart memories: A modular reconfigurable architecture. ACM SIGARCH Computer Architecture News 28, 2 (2000), 161--171.Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Ariel Maislos et al. 2011. A new era in embedded Flash memory. In Flash Memory Summit.Google ScholarGoogle Scholar
  87. Jack A. Mandelman, Robert H. Dennard, Gary B. Bronner, John K. DeBrosse, Rama Divakaruni, Yujun Li, and Carl J. Radens. 2002. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). IBM Journal of Research and Development 46, 2.3 (2002), 187--212.Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. Pedro Marcuello, Antonio González, and Jordi Tubella. 1998. Speculative multithreaded processors. In Proceedings of the 12th International Conference on Supercomputing. ACM, 77--84.Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. Sparsh Mittal. 2018. A survey of ReRAM-based architectures for processing-in-memory and neural networks. Machine Learning and Knowledge Extraction 1, 1 (2018), 75--114. DOI:https://doi.org/10.3390/make1010005Google ScholarGoogle ScholarCross RefCross Ref
  90. Amir Morad, Leonid Yavits, and Ran Ginosar. 2014. Efficient dense and sparse Matrix multiplication on GP-SIMD. In 2014 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS’14). IEEE, 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  91. Amir Morad, Leonid Yavits, and Ran Ginosar. 2015. GP-SIMD processing-in-memory. ACM Transactions on Architecture and Code Optimization (TACO) 11, 4 (2015), 53.Google ScholarGoogle Scholar
  92. Amir Morad, Leonid Yavits, Shahar Kvatinsky, and Ran Ginosar. 2016. Resistive GP-SIMD processing-in-memory. ACM Transactions on Architecture and Code Optimization (TACO) 12, 4 (2016), 57.Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Onur Mutlu. 2013. Memory scaling: A systems architecture perspective. In 2013 5th IEEE International Memory Workshop (IMW’13). IEEE, 21--25.Google ScholarGoogle ScholarCross RefCross Ref
  94. Ravi Nair. 2015. Evolution of memory architecture. Proceedings of the IEEE 103, 8 (2015), 1331--1345.Google ScholarGoogle ScholarCross RefCross Ref
  95. Ravi Nair, Samuel F. Antao, Carlo Bertolli, Pradip Bose, Jose R. Brunheroto, Tong Chen, C.-Y. Cher, Carlos H. A. Costa, Jun Doi, Constantinos Evangelinos, et al. 2015. Active memory cube: A processing-in-memory architecture for exascale systems. IBM Journal of Research and Development 59, 2/3 (2015), 17--1.Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. H. Noyes et al. 2014. Micron’s automata processor architecture: Reconfigurable and massively parallel automata processing. In Proceedings of 5th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies.Google ScholarGoogle Scholar
  97. NVIDIA. 2012. Tesla K20X GPU Accelerator Board Specification.Google ScholarGoogle Scholar
  98. Mark Oskin, Frederic T. Chong, and Timothy Sherwood. 1998. Active Pages: A Computation Model for Intelligent Memory. Vol. 26. IEEE Computer Society.Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. 1997. A case for intelligent RAM. IEEE Micro 17, 2 (1997), 34--44.Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. David A. Patterson. 2006. Future of computer architecture. In Berkeley EECS Annual Research Symposium (BEARS), College of Engineering, UC Berkeley, US.Google ScholarGoogle Scholar
  101. J. Thomas Pawlowski. 2011. Hybrid memory cube (HMC). In 2011 IEEE Hot Chips 23 Symposium (HCS’11). IEEE, 1--24.Google ScholarGoogle ScholarCross RefCross Ref
  102. Alex Peleg and Uri Weiser. 1996. MMX technology extension to the Intel architecture. IEEE Micro 16, 4 (1996), 42--50.Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. M. Radosavljević, M. Freitag, K. V. Thadani, and A. T. Johnson. 2002. Nonvolatile molecular memory elements based on ambipolar nanotube field effect transistors. Nano Letters 2, 7 (2002), 761--764.Google ScholarGoogle ScholarCross RefCross Ref
  104. R. M. Ramanathan. 2006. Intel® multi-core processors. In Making the Move to Quad-Core and Beyond.Google ScholarGoogle Scholar
  105. Simone Raoux, Feng Xiong, Matthias Wuttig, and Eric Pop. 2014. Phase change materials and phase change memory. MRS Bulletin 39, 8 (2014), 703--710.Google ScholarGoogle ScholarCross RefCross Ref
  106. John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, Ameer Haj Ali, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky. 2017. Memristive logic: A framework for evaluation and comparison. In 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS’17). IEEE, 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  107. Gurtej S. Sandhu. 2013. Emerging memories technology landscape. In 2013 13th Non-Volatile Memory Technology Symposium (NVMTS’13). IEEE, 1--5.Google ScholarGoogle ScholarCross RefCross Ref
  108. Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore. 2003. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In ACM SIGARCH Computer Architecture News, Vol. 31. ACM, 422--433.Google ScholarGoogle Scholar
  109. Vivek Seshadri, Kevin Hsieh, Amirali Boroumand, Donghyuk Lee, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, and Todd C. Mowry. 2015. Fast bulk bitwise AND and OR in DRAM. IEEE Computer Architecture Letters 14, 2 (2015), 127--131.
  110. Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, and Todd C. Mowry. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 273--287.
  111. Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Computer Architecture News 44, 3 (2016), 14--26.
  112. M. A. Shami and A. Hemani. 2012. Classification of massively parallel computer architectures. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW’12). 344--351. DOI:https://doi.org/10.1109/IPDPSW.2012.42
  113. Patrick Siegl, Rainer Buchty, and Mladen Berekovic. 2016. Data-centric computing frontiers: A survey on processing-in-memory. In Proceedings of the 2nd International Symposium on Memory Systems. ACM, 295--308.
  114. A. Siemon, S. Menzel, A. Chattopadhyay, R. Waser, and E. Linn. 2015. In-memory adder functionality in 1S1R arrays. In 2015 IEEE International Symposium on Circuits and Systems (ISCAS’15). IEEE, 1338--1341.
  115. Anne Siemon, Stephan Menzel, Rainer Waser, and Eike Linn. 2015. A complementary resistive switch-based crossbar array adder. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 5, 1 (2015), 64--74.
  116. Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, and Albert-Jan Boonstra. 2018. A review of near-memory computing architectures: Opportunities and challenges. In Proceedings of the 21st Euromicro Conference on Digital System Design (DSD’18).
  117. D. B. Skillicorn. 1988. A taxonomy for computer architectures. Computer 21, 11 (Nov. 1988), 46--57. DOI:https://doi.org/10.1109/2.86786
  118. G. Snider. 2005. Computing with hysteretic resistor crossbars. Applied Physics A: Materials Science & Processing 80, 6 (2005), 1165--1172.
  119. Kyomin Sohn, Won-Joo Yun, Reum Oh, Chi-Sung Oh, Seong-Young Seo, Min-Sang Park, Dong-Hak Shin, Won-Chang Jung, Sang-Hoon Shin, Je-Min Ryu, et al. 2017. A 1.2 V 20nm 307GB/s HBM DRAM with at-speed wafer-level IO test scheme and adaptive refresh considering temperature distribution. IEEE Journal of Solid-State Circuits 52, 1 (2017), 250--260.
  120. Harold S. Stone. 1970. A logic-in-memory computer. IEEE Transactions on Computers C-19, 1 (1970), 73--78.
  121. Arun Subramaniyan, Jingcheng Wang, Ezhil R. M. Balasubramanian, David Blaauw, Dennis Sylvester, and Reetuparna Das. 2017. Cache automaton. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50’17). ACM, New York, NY, 259--272. DOI:https://doi.org/10.1145/3123939.3123986
  122. Jinwoo Suh, Eun-Gyu Kim, Stephen P. Crago, Lakshmi Srinivasan, and Matthew C. French. 2003. A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels. In ACM SIGARCH Computer Architecture News, Vol. 31. ACM, 410--421.
  123. Mark R. Thistle and Burton J. Smith. 1988. A processor architecture for Horizon. In Proceedings of Supercomputing’88. Vol. 1. IEEE, 35--41.
  124. Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy. 1995. Simultaneous multithreading: Maximizing on-chip parallelism. In ACM SIGARCH Computer Architecture News, Vol. 23. ACM, 392--403.
  125. Mario Vestias and Horácio Neto. 2014. Trends of CPU, GPU and FPGA for high-performance computing. In 2014 24th International Conference on Field Programmable Logic and Applications (FPL’14). IEEE, 1--6.
  126. Borui Wang, Martin Torres, Dong Li, Jishen Zhao, and Florin Rusu. 2016. Performance implications of processing-in-memory designs on data-intensive applications. In 2016 45th International Conference on Parallel Processing Workshops (ICPPW’16). IEEE, 115--122.
  127. Jue Wang, Xiangyu Dong, Yuan Xie, and Norman P. Jouppi. 2014. Endurance-aware cache line management for non-volatile caches. ACM Transactions on Architecture and Code Optimization (TACO) 11, 1 (2014), 4.
  128. Ying Wang, Yinhe Han, Lei Zhang, Huawei Li, and Xiaowei Li. 2015. ProPRAM: Exploiting the transparent logic resources in non-volatile memory for near data computing. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 47.
  129. Rainer Waser. 2012. Redox-based resistive switching memories. Journal of Nanoscience and Nanotechnology 12, 10 (2012), 7628--7640.
  130. Rainer Waser and Masakazu Aono. 2007. Nanoionics-based resistive switching memories. Nature Materials 6, 11 (2007), 833.
  131. Stephan Wong, Thijs Van As, and Geoffrey Brown. 2008. ρ-VEX: A reconfigurable and extensible softcore VLIW processor. In International Conference on ICECE Technology, 2008 (FPT’08). IEEE, 369--372.
  132. Wm A. Wulf and Sally A. McKee. 1995. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News 23, 1 (1995), 20--24.
  133. Lei Xie, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Said Hamdioui, and Koen Bertels. 2015. Fast Boolean logic mapped on memristor crossbar. In 2015 33rd IEEE International Conference on Computer Design (ICCD’15). IEEE, 335--342.
  134. Lei Xie, Hoang Anh Du Nguyen, Jintao Yu, Ali Kaichouhi, Mottaqiallah Taouil, Mohammad AlFailakawi, and Said Hamdioui. 2017. Scouting logic: A novel memristor-based logic design for resistive computing. In IEEE Computer Society Annual Symposium on VLSI (ISVLSI’17). IEEE, 335--340.
  135. Sheng Xu, Xiaoming Chen, Ying Wang, Yinhe Han, Xuehai Qian, and Xiaowei Li. 2018. PIMSim: A flexible and detailed processing-in-memory simulator. IEEE Computer Architecture Letters 18, 1 (2018), 6--9.
  136. J. Joshua Yang, Dmitri B. Strukov, and Duncan R. Stewart. 2013. Memristive devices for computing. Nature Nanotechnology 8, 1 (2013), 13--24.
  137. Leonid Yavits, Shahar Kvatinsky, Amir Morad, and Ran Ginosar. 2015. Resistive associative processor. In CAL.
  138. Jintao Yu, Lei Xie, Mottaqiallah Taouil, and Said Hamdioui. 2018. Memristive devices for computation-in-memory. In Design, Automation and Test in Europe (DATE’18).
  139. Shimeng Yu and Pai-Yu Chen. 2016. Emerging memory technologies: Recent trends and prospects. IEEE Solid-State Circuits Magazine 8, 2 (2016), 43--56.
  140. Jian-Gang Zhu. 2008. Magnetoresistive random access memory: The path to competitiveness and scalability. Proceedings of the IEEE 96, 11 (2008), 1786--1798.

      • Published in

        ACM Journal on Emerging Technologies in Computing Systems  Volume 16, Issue 2
        April 2020
        261 pages
        ISSN:1550-4832
        EISSN:1550-4840
        DOI:10.1145/3375712
        • Editor:
        • Zhaojun Bai

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 30 January 2020
        • Accepted: 1 October 2019
        • Revised: 1 September 2019
        • Received: 1 December 2018
        Published in jetc Volume 16, Issue 2

        Qualifiers

        • research-article
        • Research
        • Refereed
