Abstract
Sustaining the demand for faster and cheaper computers has continually required technological and architectural improvements. However, CMOS down-scaling now faces three technology walls: the leakage wall, the reliability wall, and the cost wall. In addition, the performance gains from architectural improvements are gradually saturating because of three well-known architecture walls: the memory wall, the power wall, and the instruction-level parallelism (ILP) wall. Hence, much research focuses on proposing and developing new technologies and architectures. In this article, we present a comprehensive classification of memory-centric computing architectures based on three metrics: computation location, level of parallelism, and the memory technology used. The classification not only provides an overview of existing architectures with their pros and cons but also unifies the terminology that uniquely identifies these architectures and highlights potential future architectures that can be explored further. Hence, it sets a direction for future research in the field.
Index Terms
- A Classification of Memory-Centric Computing