Open Access

A Classification of Memory-Centric Computing

Authors:
Hoang Anh Du Nguyen, Delft University of Technology, Delft, the Netherlands (ORCID 0000-0002-4618-7371)
Jintao Yu, Delft University of Technology, Delft, the Netherlands
Muath Abu Lebdeh, Delft University of Technology, Delft, the Netherlands
Mottaqiallah Taouil, Delft University of Technology, Delft, the Netherlands
Said Hamdioui, Delft University of Technology, Delft, the Netherlands
Francky Catthoor, Inter-university Micro-Electronics Center (IMEC)

ACM Journal on Emerging Technologies in Computing Systems, Volume 16, Issue 2, Article No. 13, pp. 1–26. https://doi.org/10.1145/3365837

Published: 30 January 2020

Abstract

Technological and architectural improvements have constantly been required to sustain the demand for faster and cheaper computers. However, CMOS down-scaling is suffering from three technology walls: the leakage wall, the reliability wall, and the cost wall. On top of that, the performance gains from architectural improvements are also gradually saturating because of three well-known architecture walls: the memory wall, the power wall, and the instruction-level parallelism (ILP) wall. Hence, much research focuses on proposing and developing new technologies and architectures. In this article, we present a comprehensive classification of memory-centric computing architectures based on three metrics: computation location, level of parallelism, and the memory technology used. The classification not only provides an overview of existing architectures with their pros and cons but also unifies the terminology that uniquely identifies these architectures and highlights potential future architectures that can be explored further. Hence, it sets a direction for future research in the field.
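As a rough illustration of how a three-metric classification can both label existing designs and expose unexplored combinations, the sketch below models each metric as an enumeration and a class as one point in their Cartesian product. The abstract names the three metrics but not their values, so every enumeration member here, as well as the names `ArchitectureClass`, `all_classes`, `unexplored`, and the architecture `arch_a`, are illustrative assumptions and should not be read as the article's actual taxonomy.

```python
from dataclasses import dataclass
from enum import Enum


class ComputationLocation(Enum):
    # Assumed values: the abstract names this metric but does not list its categories.
    IN_MEMORY_ARRAY = "in memory array"
    IN_MEMORY_PERIPHERY = "in memory periphery"
    NEAR_MEMORY = "near memory"
    FAR_FROM_MEMORY = "far from memory"


class Parallelism(Enum):
    # Assumed values for the "level of parallelism" metric.
    TASK = "task"
    DATA = "data"
    INSTRUCTION = "instruction"


class MemoryTechnology(Enum):
    # Assumed values for the "memory technology" metric.
    SRAM = "SRAM"
    DRAM = "DRAM"
    NON_VOLATILE = "non-volatile"


@dataclass(frozen=True)
class ArchitectureClass:
    """One cell of the design space: a value for each of the three metrics."""
    location: ComputationLocation
    parallelism: Parallelism
    technology: MemoryTechnology


def all_classes():
    """Enumerate every combination of the three metrics."""
    return [
        ArchitectureClass(loc, par, tech)
        for loc in ComputationLocation
        for par in Parallelism
        for tech in MemoryTechnology
    ]


def unexplored(known):
    """Return the classes that no architecture in `known` (name -> class) occupies."""
    occupied = set(known.values())
    return [c for c in all_classes() if c not in occupied]


# Hypothetical example entry; "arch_a" is a placeholder, not a real design.
known_architectures = {
    "arch_a": ArchitectureClass(
        ComputationLocation.NEAR_MEMORY, Parallelism.DATA, MemoryTechnology.DRAM
    ),
}
gaps = unexplored(known_architectures)
```

Under these assumed category values the design space has 4 × 3 × 3 = 36 cells, and `gaps` lists the cells that no known architecture occupies, which mirrors the abstract's point that the classification highlights architectures still open for exploration.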

References

Shaizeen Aga, Supreet Jeloka, Arun Subramaniyan, Satish Narayanasamy, David Blaauw, and Reetuparna Das. 2017. Compute caches. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA’17). IEEE, 481--492.Google ScholarCross Ref
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2016. A scalable processing-in-memory accelerator for parallel graph processing. ACM SIGARCH Computer Architecture News 43, 3 (2016), 105--117.Google ScholarDigital Library
Marco A. Z. Alves, Matthias Diener, Paulo C. Santos, and Luigi Carro. 2016. Large vector extensions inside the HMC. In Design, Automation and Test in Europe Conference and Exhibition (DATE'16). IEEE, 1249--1254.Google Scholar
Marco Antonio Zanata Alves, Carlos Villavieja, Matthias Diener, Francis Birck Moreira, and Philippe Olivier Alexandre Navaux. 2015. SiNUCA: A validated micro-architecture simulator. In Proceeding of International Conference on High Performance Computing and Communications (HPCC), International Symposium on Cyberspace Safety and Security (CSS), and International Conference on Embedded Software and Systems (ICESS). 605--610.Google Scholar
Luca Amarú, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli. 2015. The EPFL combinational benchmark suite. In Proceedings of the 24th International Workshop on Logic 8 Synthesis (IWLS’15).Google Scholar
Ali BanaGozar, Kanishkan Vadivel, Sander Stuijk, Henk Corporaal, Stephan Wong, Muath Abu Lebdeh, Jintao Yu, and Said Hamdioui. 2019. CIM-SIM: Computation in memory SIMuIator. In International Workshop on Software and Compilers for Embedded Systems. ACM, 1--4.Google ScholarDigital Library
John Barth, Don Plass, Erik Nelson, Charlie Hwang, Gregory Fredeman, Michael Sperling, Abraham Mathews, Toshiaki Kirihata, William R. Reohr, Kavita Nair, and Nianzheng Cao. 2010. A 45nm SOI embedded DRAM macro for the POWER™ processor 32 MByte on-chip L3 cache. IEEE Journal of Solid-State Circuits 46, 1 (2010), 64--75.Google ScholarCross Ref
Gary Benson, Yozen Hernandez, and Joshua Loving. 2013. A bit-parallel, general integer-scoring sequence alignment algorithm. In Annual Symposium on Combinatorial Pattern Matching. Springer, 50--61.Google ScholarCross Ref
Debjyoti Bhattacharjee, Rajeswari Devadoss, and Anupam Chattopadhyay. 2017. ReVAMP: ReRAM based VLIW architecture for in-memory computing. In 2017 Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’17). IEEE, 782--787.Google Scholar
Sabpreet Bhatti, Rachid Sbiaa, Atsufumi Hirohata, Hideo Ohno, Shunsuke Fukami, and S. N. Piramanayagam. 2017. Spintronics based random access memory: A review. Materials Today 20, 9 (2017), 530--548.Google ScholarCross Ref
Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 72--81.Google ScholarDigital Library
Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, et al. 2011. The gem5 simulator. ACM SIGARCH Computer Architecture News 39, 2 (2011), 1--7.Google ScholarDigital Library
Evgeny Bolotin, David Nellans, Oreste Villa, Mike O’Connor, Alex Ramirez, and Stephen W. Keckler. 2015. Designing efficient heterogeneous memory architectures. IEEE Micro 35, 4 (2015), 60--68.Google ScholarDigital Library
Julien Borghetti, Gregory S. Snider, Philip J. Kuekes, J. Joshua Yang, Duncan R. Stewart, and R. Stanley Williams. 2010. Memristive switches enable stateful logic operations via material implication. Nature 464, 7290 (2010), 873--876.Google Scholar
S. Borkar. 1999. Design challenges of technology scaling. IEEE Micro 19, 4 (July 1999), 23--29. DOI:https://doi.org/10.1109/40.782564Google ScholarDigital Library
Rafmag Cabrera, Emmanuelle Merced, and Nelson Sepúlveda. 2013. A micro-electro-mechanical memory based on the structural phase transition of VO2. Physica Status Solidi (a) 210, 9 (2013), 1704--1711.Google ScholarCross Ref
Meng-Fan Chang, Ching-Hao Chuang, Min-Ping Chen, Lai-Fu Chen, Hiroyuki Yamauchi, Pi-Feng Chiu, and Shyh-Shyuan Sheu. 2012. Endurance-aware circuit designs of nonvolatile logic and nonvolatile SRAM using resistive memory (memristor) device. In 2012 17th Asia and South Pacific Design Automation Conference (ASP-DAC’12). IEEE, 329--334.Google ScholarCross Ref
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In IEEE International Symposium on Workload Characterization, 2009 (IISWC’09). IEEE, 44--54.Google ScholarDigital Library
E. Chen, D. Apalkov, Z. Diao, A. Driskill-Smith, D. Druist, D. Lottis, V. Nikitin, X. Tang, S. Watts, S. Wang, et al. 2010. Advances and future prospects of spin-transfer torque random access memory. IEEE Transactions on Magnetics 46, 6 (2010), 1873--1878.Google ScholarCross Ref
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. 2014. Dadiannao: A machine-learning supercomputer. In IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 609--622.Google ScholarDigital Library
Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press, 27--39.Google Scholar
Gianni Conte, Stefano Tommesani, and Francesco Zanichelli. 2000. The long and winding road to high-performance image processing with MMX/SSE. In Proceedings of the 5th IEEE International Workshop on Computer Architectures for Machine Perception, 2000. IEEE, 302--310.Google ScholarCross Ref
Joao Paulo C. de Lima, Paulo Cesar Santos, Marco A. Z. Alves, Antonio C. S. Beck, and Luigi Carro. 2018. Design space exploration for PIM architectures in 3D-stacked memories. In Computer Frontier. ACM, 295--308.Google Scholar
Jaffrey Draper, J. Tim Barrett, Jeff Sondeen, Sumit Mediratta, Chang Woo Kang, Ihn Kim, and Gokhan Daglikoca. 2005. A prototype processing-in-memory (PIM) chip for the data-intensive architecture (DIVA) system. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology 40, 1 (2005), 73--84.Google ScholarDigital Library
Jeff Draper, Jacqueline Chame, Mary Hall, Craig Steele, Tim Barrett, Jeff LaCoss, John Granacki, Jaewook Shin, Chun Chen, Chang Woo Kang, et al. 2002. The architecture of the DIVA processing-in-memory chip. In Proceedings of the 16th International Conference on Supercomputing. ACM, 14--25.Google ScholarDigital Library
H. A. Du Nguyen, Jintao Yu, Lei Xie, Mottaqiallah Taouil, Said Hamdioui, and Dietmar Fey. 2017. Memristive devices for computing: Beyond CMOS and beyond von Neumann. In 2017 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC’17). IEEE, 1--10.Google ScholarCross Ref
Hoang Anh Du Nguyen, Lei Xie, Mottaqiallah Taouil, Razvan Nane, Said Hamdioui, and Koen Bertels. 2017. On the implementation of computation-in-memory parallel adder. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 8 (2017), 2206--2219.Google ScholarDigital Library
P. Dudek and S. J. Carey. 2006. General-purpose 128/spl times/128 SIMD processor array with integrated image sensor. Electronics Letters 42, 12 (2006), 678--679.Google ScholarCross Ref
Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, and Reetuparna Das. 2018. Neural cache: Bit-serial in-cache acceleration of deep neural networks. arXiv preprint arXiv:1805.03718 (2018).Google Scholar
Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, and Dean M. Tullsen. 1997. Simultaneous multithreading: A platform for next-generation processors. IEEE Micro 17, 5 (1997), 12--19.Google ScholarDigital Library
Amin Farmahini-Farahani, Jung Ho Ahn, Katherine Morrow, and Nam Sung Kim. 2015. DRAMA: An architecture for accelerated processing near memory. IEEE Computer Architecture Letters 14, 1 (2015), 26--29.Google ScholarDigital Library
Tim Finkbeiner, Glen Hush, Troy Larsen, Perry Lea, John Leidel, and Troy Manning. 2017. In-memory intelligence. IEEE Micro 37, 4 (2017), 30--38.Google ScholarDigital Library
Nadeem Firasta, Mark Buxton, Paula Jinbo, Kaveh Nasri, and Shihjong Kuo. 2008. Intel AVX: New frontiers in performance improvements and energy efficiency. Intel White Paper 19 (2008), 20.Google Scholar
Randall James Fisher. 2003. General-purpose SIMD within a register: Parallel processing on consumer microprocessors. Doctoral Dissertation.Google Scholar
M. Flynn. 1966. Very high-speed computing systems. Proceedings of the IEEE 54, 12 (Dec. 1966), 1901--1909. DOI:https://doi.org/10.1109/PROC.1966.5273Google ScholarCross Ref
G. D. Fuchs, N. C. Emley, I. N. Krivorotov, P. M. Braganca, E. M. Ryan, S. I. Kiselev, J. C. Sankey, D. C. Ralph, R. A. Buhrman, and J. A. Katine. 2004. Spin-transfer effects in nanoscale magnetic tunnel junctions. Applied Physics Letters 85, 7 (2004), 1205--1207.Google ScholarCross Ref
Daichi Fujiki, Scott Mahlke, and Reetuparna Das. 2018. In-memory data parallel processor. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 1--14.Google ScholarDigital Library
Pierre-Emmanuel Gaillardon, Luca Amar, Anne Siemon, Eike Linn, Rainer Waser, Anupam Chattopadhyay, and Giovanni De Micheli. 2016. The programmable logic-in-memory (PLiM) computer. In 2016 Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’16). IEEE, 427--432.Google Scholar
Mingyu Gao, Grant Ayers, and Christos Kozyrakis. 2015. Practical near-data processing for in-memory analytics frameworks. In 2015 International Conference on Parallel Architecture and Compilation (PACT’15). IEEE, 113--124.Google ScholarDigital Library
Simcha Gochman, Avi Mendelson, Alon Naveh, and Efraim Rotem. 2006. Introduction to Intel core duo processor architecture. Intel Technology Journal 10, 2 (2006), 89--97.Google ScholarCross Ref
Jonathan E. Green, Jang Wook Choi, Akram Boukai, Yuri Bunimovich, Ezekiel Johnston-Halperin, Erica DeIonno, Yi Luo, Bonnie A. Sheriff, Ke Xu, Young Shik Shin, et al. 2007. A 160-kilobit molecular electronic memory patterned at 10 11 bits per square centimetre. Nature 445, 7126 (2007), 414.Google Scholar
Beat Halg. 1990. On a micro-electro-mechanical nonvolatile memory cell. IEEE Transactions on Electron Devices 37, 10 (1990), 2230--2236.Google ScholarCross Ref
Said Hamdioui, Koenraad Laurent Maria Bertels, and Mottaqiallah Taouil. 2017. Computing Device for Big Data Applications Using Memristors. US Patent 9,824,753.Google Scholar
Said Hamdioui, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Abu Sebastian, Manuel Le Gallo, Sandeep Pande, Siebren Schaafsma, Francky Catthoor, Shidhartha Das, Fernando G. Redondo, et al. 2019. Applications of computation-in-memory architectures based on memristive devices. In 2019 Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’19). IEEE, 486--491.Google Scholar
Said Hamdioui, Shahar Kvatinsky, Gert Cauwenberghs, Lei Xie, Nimrod Wald, Siddharth Joshi, Hesham Mostafa Elsayed, Henk Corporaal, and Koen Bertels. 2017. Memristor for computing: Myth or reality? In Proceedings of the Conference on Design, Automation 8 Test in Europe. European Design and Automation Association, 722--731.Google ScholarCross Ref
Said Hamdioui, Lei Xie, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Koen Bertels, Henk Corporaal, Hailong Jiao, Francky Catthoor, Dirk Wouters, Linn Eike, et al. 2015. Memristor based computation-in-memory architecture for data-intensive applications. In Proceedings of the 2015 Design, Automation 8 Test in Europe Conference 8 Exhibition. EDA Consortium, 1718--1725.Google Scholar
JongWook Han, Choon-Sik Park, Dae-Hyun Ryu, and Eun-Soo Kim. 1999. Optical image encryption based on XOR operations. Optical Engineering 38, 1 (1999), 47--55.Google ScholarCross Ref
Adib Haron, Jintao Yu, Razvan Nane, Mottaqiallah Taouil, Said Hamdioui, and Koen Bertels. 2016. Parallel matrix multiplication on memristor-based computation-in-memory architecture. In 2016 International Conference on High Performance Computing 8 Simulation (HPCS’16). IEEE, 759--766.Google ScholarCross Ref
John L. Hennessy and David A. Patterson. 2011. Computer Architecture: A Quantitative Approach. Elsevier.Google ScholarDigital Library
HMC. 2018. Hybrid Memory Cube Specification 2.1. Retrieved from http://hybridmemorycube.org/.Google Scholar
M. Hosomi, H. Yamagishi, T. Yamamoto, K. Bessho, Y. Higo, K. Yamane, H. Yamada, M. Shoji, H. Hachino, C. Fukumoto, et al. 2005. A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM. In IEEE International Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE, 459--462.Google ScholarCross Ref
Rotem Ben Hur and Shahar Kvatinsky. 2016. Memristive memory processing unit (MPU) controller for in-memory processing. In IEEE International Conference on the Science of Electrical Engineering (ICSEE’16). IEEE, 1--5.Google Scholar
IBM. 2014. Power 4 - The First Multi-Core, 1GHz Processor.Google Scholar
ITRS. 2010. ITRS ERD Report. Retrieved from http://www.itrs.net.Google Scholar
Subramanian S. Iyer and Howard L. Kalter. 1999. Embedded DRAM technology: Opportunities and challenges. IEEE Spectrum 36, 4 (1999), 56--64.Google ScholarDigital Library
Shubham Jain, Ashish Ranjan, Kaushik Roy, and Anand Raghunathan. 2017. Computing in memory with spin-transfer torque magnetic RAM. arXiv preprint arXiv:1703.02118 (2017).Google Scholar
Joe Jeddeloh and Brent Keeth. 2012. Hybrid memory cube new DRAM architecture increases density and performance. In 2012 Symposium on VLSI Technology (VLSIT’12). IEEE, 87--88.Google ScholarCross Ref
Zhang Jianwu, Zhao Danying, et al. 2008. Survey on microprocessor architecture and development trends. In 11th IEEE International Conference on Communication Technology, 2008 (ICCT’08). IEEE, 297--300.Google Scholar
David Judd, Katherine Yelick, Christoforos Kozyrakis, David Martin, and David Patterson. 2001. Exploiting on-chip memory bandwidth in the VIRAM compiler. In Intelligent Memory Systems. Springer, 122--134.Google Scholar
Hongshin Jun, Jinhee Cho, Kangseol Lee, Ho-Young Son, Kwiwook Kim, Hanho Jin, and Keith Kim. 2017. HBM (high bandwidth memory) DRAM technology and architecture. In 2017 IEEE International Memory Workshop (IMW’17). IEEE, 1--4.Google ScholarCross Ref
Ron Kalla, Balaram Sinharoy, William J. Starke, and Michael Floyd. 2010. Power7: IBM’s next-generation server processor. IEEE Micro 30, 2 (2010), 7--15.Google ScholarDigital Library
Yi Kang, Wei Huang, Seung-Moon Yoo, D. Keen, Zhenzhou Ge, V. Lam, P. Pattnaik, and J. Torrellas. [n.d.]. FlexRAM: Toward an advanced intelligent memory system. In 2012 IEEE 30th International Conference on Computer Design (ICCD’12). 5--14. DOI:https://doi.org/10.1109/ICCD.2012.6378608Google ScholarDigital Library
Yi Kang, Wei Huang, Seung-Moon Yoo, Diana Keen, Zhenzhou Ge, Vinh Lam, Pratap Pattnaik, and Josep Torrellas. 2012. FlexRAM: Toward an advanced intelligent memory system. In 2012 IEEE 30th International Conference on Computer Design (ICCD’12). IEEE, 5--14.Google ScholarDigital Library
Doris Keitel-Schulz and Norbert Wehn. 1998. Issues in embedded DRAM development and applications. In Proceedings of the 11th International Symposium on System Synthesis. IEEE Computer Society, 23--31.Google ScholarDigital Library
Doris Keitel-Schulz and Norbert Wehn. 2001. Embedded DRAM development: Technology, physical design, and application issues. IEEE Design 8 Test of Computers 18, 3 (2001), 7--15.Google Scholar
Kyosun Kim, Sangho Shin, and Sung-Mo Kang. 2011. Stateful logic pipeline architecture. In 2011 IEEE International Symposium of Circuits and Systems (ISCAS’11). IEEE, 2497--2500.Google ScholarCross Ref
David Kirk et al. 2007. NVIDIA CUDA software and GPU parallel computing architecture. In ISMM, Vol. 7. 103--104.Google ScholarDigital Library
Christoforos Kozyrakis. 2002. Scalable Vector Media-Processors for Embedded Systems. Technical Report. California University Berkeley Computer Science Division.Google Scholar
Christoforos Kozyrakis and David Patterson. 2002. Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks. In Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society Press, 283--293.Google ScholarDigital Library
Christoforos E. Kozyrakis, Stylianos Perissakis, David Patterson, Thomas Anderson, Krste Asanovic, Neal Cardwell, Richard Fromm, Jason Golbus, Benjamin Gribstad, Kimberly Keeton, et al. 1997. Scalable processors in the billion-transistor era: IRAM. Computer 30, 9 (1997), 75--78.Google ScholarDigital Library
Nasser Kurd, Muntaquim Chowdhury, Edward Burton, Thomas P. Thomas, Christopher Mozak, Brent Boswell, Praveen Mosalikanti, Mark Neidengard, Anant Deval, Ashish Khanna, et al. 2014. Haswell: A family of IA 22nm processors. IEEE Journal of Solid-State Circuits 50, 1 (2014), 49--58.Google ScholarCross Ref
Shahar Kvatinsky, Dmitry Belousov, Slavik Liman, Guy Satat, Nimrod Wald, Eby G. Friedman, Avinoam Kolodny, and Uri C. Weiser. 2014. MAGIC--Memristor-aided logic. IEEE Transactions on Circuits and Systems II: Express Briefs 61, 11 (2014), 895--899.Google ScholarCross Ref
Shahar Kvatinsky, Guy Satat, Nimrod Wald, Eby G. Friedman, Avinoam Kolodny, and Uri C. Weiser. 2014. Memristor-based material implication (IMPLY) logic: Design principles and methodologies. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22, 10 (2014), 2054--2066.Google ScholarCross Ref
Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger. 2010. Phase change memory architecture and the quest for scalability. Communications of the ACM 53, 7 (2010), 99--106.Google ScholarDigital Library
Jong Chern Lee, Jihwan Kim, Kyung Whan Kim, Young Jun Ku, Dae Suk Kim, Chunseok Jeong, Tae Sik Yun, Hongjung Kim, Ho Sung Cho, Yeon Ok Kim, et al. 2016. 18.3 A 1.2 V 64Gb 8-channel 256GB/s HBM DRAM with peripheral-base-die architecture and small-swing technique on heavy load interface. In 2016 IEEE International Solid-State Circuits Conference (ISSCC’16). IEEE, 318--319.Google ScholarCross Ref
Eero Lehtonen, Jussi H. Poikonen, and Mika Laiho. 2014. Memristive stateful logic. In Memristor Networks. Springer, 603--623.Google Scholar
John D. Leidel and Yong Chen. 2016. Hmc-sim-2.0: A simulation platform for exploring custom memory cube operations. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’16). IEEE, 621--630.Google Scholar
Chao Li, Wendy Fan, Bo Lei, Daihua Zhang, Song Han, Tao Tang, Xiaolei Liu, Zuqin Liu, Sylvia Asano, Meyya Meyyappan, et al. 2004. Multilevel memory based on molecular devices. Applied Physics Letters 84, 11 (2004), 1949--1951.Google ScholarCross Ref
Chao Li, Daihua Zhang, Xiaolei Liu, Song Han, Tao Tang, Chongwu Zhou, Wendy Fan, Jessica Koehne, Jie Han, Meyya Meyyappan, et al. 2003. Fabrication approach for molecular memory arrays. Applied Physics Letters 82, 4 (2003), 645--647.Google ScholarCross Ref
Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, and Yuan Xie. 2017. DRISA: A DRAM -based reconfigurable in-situ accelerator. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 288--301.Google ScholarDigital Library
Shuangchen Li, Cong Xu, Qiaosha Zou, Jishen Zhao, Yu Lu, and Yuan Xie. 2016. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In Proceeding of ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 173--178.Google ScholarDigital Library
E. Linn, R. Rosezin, S. Tappertzhofen, R. Waser, et al. 2012. Beyond von Neumann--logic operations in passive crossbar arrays alongside memory operations. Nanotechnology 23, 30 (2012), 305205.Google ScholarCross Ref
Andrea Lodi, Mario Toma, Fabio Campi, Andrea Cappelli, Roberto Canegallo, and Roberto Guerrieri. 2003. A VLIW processor with reconfigurable instruction set for embedded applications. IEEE Journal of Solid-state Circuits 38, 11 (2003), 1876--1886.Google ScholarCross Ref
Joe Macri. 2015. AMD’s next generation GPU and high bandwidth memory architecture: FURY. In 2015 IEEE Hot Chips 27 Symposium (HCS’15). IEEE, 1--26.Google ScholarCross Ref
Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, and Mark Horowitz. 2000. Smart memories: A modular reconfigurable architecture. ACM SIGARCH Computer Architecture News 28, 2 (2000), 161--171.Google ScholarDigital Library
Ariel Maislos et al. 2011. A new era in embedded Flash memory. In Flash Memory Summit.Google Scholar
Jack A. Mandelman, Robert H. Dennard, Gary B. Bronner, John K. DeBrosse, Rama Divakaruni, Yujun Li, and Carl J. Radens. 2002. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). IBM Journal of Research and Development 46, 2.3 (2002), 187--212.Google ScholarDigital Library
Pedro Marcuello, Antonio González, and Jordi Tubella. 1998. Speculative multithreaded processors. In Proceedings of the 12th International Conference on Supercomputing. ACM, 77--84.Google ScholarDigital Library
Sparsh Mittal. 2018. A survey of ReRAM-based architectures for processing-in-memory and neural networks. Machine Learning and Knowledge Extraction 1, 1 (2018), 75--114. DOI:https://doi.org/10.3390/make1010005Google ScholarCross Ref
Amir Morad, Leonid Yavits, and Ran Ginosar. 2014. Efficient dense and sparse Matrix multiplication on GP-SIMD. In 2014 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS’14). IEEE, 1--8.Google ScholarCross Ref
Amir Morad, Leonid Yavits, and Ran Ginosar. 2015. GP-SIMD processing-in-memory. ACM Transactions on Architecture and Code Optimization (TACO) 11, 4 (2015), 53.Google Scholar
Amir Morad, Leonid Yavits, Shahar Kvatinsky, and Ran Ginosar. 2016. Resistive GP-SIMD processing-in-memory. ACM Transactions on Architecture and Code Optimization (TACO) 12, 4 (2016), 57.Google ScholarDigital Library
Onur Mutlu. 2013. Memory scaling: A systems architecture perspective. In 2013 5th IEEE International Memory Workshop (IMW’13). IEEE, 21--25.Google ScholarCross Ref
Ravi Nair. 2015. Evolution of memory architecture. Proceedings of the IEEE 103, 8 (2015), 1331--1345.Google ScholarCross Ref
Ravi Nair, Samuel F. Antao, Carlo Bertolli, Pradip Bose, Jose R. Brunheroto, Tong Chen, C.-Y. Cher, Carlos H. A. Costa, Jun Doi, Constantinos Evangelinos, et al. 2015. Active memory cube: A processing-in-memory architecture for exascale systems. IBM Journal of Research and Development 59, 2/3 (2015), 17--1.Google ScholarDigital Library
H. Noyes et al. 2014. Micron’s automata processor architecture: Reconfigurable and massively parallel automata processing. In Proceedings of 5th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies.Google Scholar
NVIDIA. 2012. Tesla K20X GPU Accelerator Board Specification.Google Scholar
Mark Oskin, Frederic T. Chong, and Timothy Sherwood. 1998. Active Pages: A Computation Model for Intelligent Memory. Vol. 26. IEEE Computer Society.Google ScholarDigital Library
David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. 1997. A case for intelligent RAM. IEEE Micro 17, 2 (1997), 34--44.Google ScholarDigital Library
David A. Patterson. 2006. Future of computer architecture. In Berkeley EECS Annual Research Symposium (BEARS), College of Engineering, UC Berkeley, US.Google Scholar
J. Thomas Pawlowski. 2011. Hybrid memory cube (HMC). In 2011 IEEE Hot Chips 23 Symposium (HCS’11). IEEE, 1--24.Google ScholarCross Ref
Alex Peleg and Uri Weiser. 1996. MMX technology extension to the Intel architecture. IEEE Micro 16, 4 (1996), 42--50.Google ScholarDigital Library
M. Radosavljević, M. Freitag, K. V. Thadani, and A. T. Johnson. 2002. Nonvolatile molecular memory elements based on ambipolar nanotube field effect transistors. Nano Letters 2, 7 (2002), 761--764.Google ScholarCross Ref
R. M. Ramanathan. 2006. Intel® multi-core processors. In Making the Move to Quad-Core and Beyond.Google Scholar
Simone Raoux, Feng Xiong, Matthias Wuttig, and Eric Pop. 2014. Phase change materials and phase change memory. MRS Bulletin 39, 8 (2014), 703--710.Google ScholarCross Ref
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, Ameer Haj Ali, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky. 2017. Memristive logic: A framework for evaluation and comparison. In 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS’17). IEEE, 1--8.Google ScholarCross Ref
Gurtej S. Sandhu. 2013. Emerging memories technology landscape. In 2013 13th Non-Volatile Memory Technology Symposium (NVMTS’13). IEEE, 1--5.Google ScholarCross Ref
Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore. 2003. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In ACM SIGARCH Computer Architecture News, Vol. 31. ACM, 422--433.Google Scholar
Vivek Seshadri, Kevin Hsieh, Amirali Boroum, Donghyuk Lee, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, and Todd C. Mowry. 2015. Fast bulk bitwise AND and OR in DRAM. IEEE Computer Architecture Letters 14, 2 (2015), 127--131.Google ScholarDigital Library
Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, and Todd C. Mowry. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 273--287.Google Scholar
Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Computer Architecture News 44, 3 (2016), 14--26.Google ScholarDigital Library
M. A. Shami and A. Hemani. 2012. Classification of massively parallel computer architectures. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW’12). 344--351. DOI:https://doi.org/10.1109/IPDPSW.2012.42Google ScholarDigital Library
Patrick Siegl, Rainer Buchty, and Mladen Berekovic. 2016. Data-centric computing frontiers: A survey on processing-in-memory. In Proceedings of the 2nd International Symposium on Memory Systems. ACM, 295--308.Google ScholarDigital Library
A. Siemon, S. Menzel, A. Chattopadhyay, R. Waser, and E. Linn. 2015. In-memory adder functionality in 1S1R arrays. In 2015 IEEE International Symposium on Circuits and Systems (ISCAS’15). IEEE, 1338--1341.Google Scholar
Anne Siemon, Stephan Menzel, Rainer Waser, and Eike Linn. 2015. A complementary resistive switch-based crossbar array adder. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 5, 1 (2015), 64--74.Google ScholarCross Ref
Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, and Albert-Jan Boonstra. 2018. A review of near-memory computing architectures: Opportunities and challenges. In Proceedings of the 21st Euromicro Conference on Digital System Design (DSD’18).Google ScholarCross Ref
D. B. Skillicorn. 1988. A taxonomy for computer architectures. Computer 21, 11 (Nov. 1988), 46--57. DOI:https://doi.org/10.1109/2.86786Google ScholarDigital Library
G. Snider. 2005. Computing with hysteretic resistor crossbars. Applied Physics A: Materials Science 8 Processing 80, 6 (2005), 1165--1172.Google Scholar
Kyomin Sohn, Won-Joo Yun, Reum Oh, Chi-Sung Oh, Seong-Young Seo, Min-Sang Park, Dong-Hak Shin, Won-Chang Jung, Sang-Hoon Shin, Je-Min Ryu, et al. 2017. A 1.2 V 20nm 307GB/s HBM DRAM with at-speed wafer-level IO test scheme and adaptive refresh considering temperature distribution. IEEE Journal of Solid-State Circuits 52, 1 (2017), 250--260.Google ScholarCross Ref
Harold S. Stone. 1970. A logic-in-memory computer. IEEE Transactions on Computing 100, 1 (1970), 73--78.Google ScholarDigital Library
Arun Subramaniyan, Jingcheng Wang, Ezhil R. M. Balasubramanian, David Blaauw, Dennis Sylvester, and Reetuparna Das. 2017. Cache automaton. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50’17). ACM, New York, NY, 259--272. DOI:https://doi.org/10.1145/3123939.3123986
Jinwoo Suh, Eun-Gyu Kim, Stephen P. Crago, Lakshmi Srinivasan, and Matthew C. French. 2003. A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels. In ACM SIGARCH Computer Architecture News, Vol. 31. ACM, 410--421.
Mark R. Thistle and Burton J. Smith. 1988. A processor architecture for Horizon. In Proceedings of Supercomputing’88. Vol. 1. IEEE, 35--41.
Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy. 1995. Simultaneous multithreading: Maximizing on-chip parallelism. In ACM SIGARCH Computer Architecture News, Vol. 23. ACM, 392--403.
Mario Vestias and Horácio Neto. 2014. Trends of CPU, GPU and FPGA for high-performance computing. In 2014 24th International Conference on Field Programmable Logic and Applications (FPL’14). IEEE, 1--6.
Borui Wang, Martin Torres, Dong Li, Jishen Zhao, and Florin Rusu. 2016. Performance implications of processing-in-memory designs on data-intensive applications. In 2016 45th International Conference on Parallel Processing Workshops (ICPPW’16). IEEE, 115--122.
Jue Wang, Xiangyu Dong, Yuan Xie, and Norman P. Jouppi. 2014. Endurance-aware cache line management for non-volatile caches. ACM Transactions on Architecture and Code Optimization (TACO) 11, 1 (2014), 4.
Ying Wang, Yinhe Han, Lei Zhang, Huawei Li, and Xiaowei Li. 2015. ProPRAM: Exploiting the transparent logic resources in non-volatile memory for near data computing. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 47.
Rainer Waser. 2012. Redox-based resistive switching memories. Journal of Nanoscience and Nanotechnology 12, 10 (2012), 7628--7640.
Rainer Waser and Masakazu Aono. 2007. Nanoionics-based resistive switching memories. Nature Materials 6, 11 (2007), 833.
Stephan Wong, Thijs Van As, and Geoffrey Brown. 2008. ρ-VEX: A reconfigurable and extensible softcore VLIW processor. In 2008 International Conference on Field-Programmable Technology (FPT’08). IEEE, 369--372.
Wm A. Wulf and Sally A. McKee. 1995. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News 23, 1 (1995), 20--24.
Lei Xie, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Said Hamdioui, and Koen Bertels. 2015. Fast Boolean logic mapped on memristor crossbar. In 2015 33rd IEEE International Conference on Computer Design (ICCD’15). IEEE, 335--342.
Lei Xie, Hoang Anh Du Nguyen, Jintao Yu, Ali Kaichouhi, Mottaqiallah Taouil, Mohammad AlFailakawi, and Said Hamdioui. 2017. Scouting logic: A novel memristor-based logic design for resistive computing. In IEEE Computer Society Annual Symposium on VLSI (ISVLSI’17). IEEE, 335--340.
Sheng Xu, Xiaoming Chen, Ying Wang, Yinhe Han, Xuehai Qian, and Xiaowei Li. 2018. PIMSim: A flexible and detailed processing-in-memory simulator. IEEE Computer Architecture Letters 18, 1 (2018), 6--9.
J. Joshua Yang, Dmitri B. Strukov, and Duncan R. Stewart. 2013. Memristive devices for computing. Nature Nanotechnology 8, 1 (2013), 13--24.
Leonid Yavits, Shahar Kvatinsky, Amir Morad, and Ran Ginosar. 2015. Resistive associative processor. IEEE Computer Architecture Letters (CAL).
Jintao Yu, Lei Xie, Mottaqiallah Taouil, and Said Hamdioui. 2018. Memristive devices for computation-in-memory. In Design, Automation and Test in Europe (DATE’18).
Shimeng Yu and Pai-Yu Chen. 2016. Emerging memory technologies: Recent trends and prospects. IEEE Solid-State Circuits Magazine 8, 2 (2016), 43--56.
Jian-Gang Zhu. 2008. Magnetoresistive random access memory: The path to competitiveness and scalability. Proceedings of the IEEE 96, 11 (2008), 1786--1798.

Index Terms

A Classification of Memory-Centric Computing
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
2. Hardware
  1. Emerging technologies
    1. Spintronics and magnetic technologies

Published in
ACM Journal on Emerging Technologies in Computing Systems Volume 16, Issue 2
April 2020
261 pages
ISSN:1550-4832
EISSN:1550-4840
DOI:10.1145/3375712
Editor:
Zhaojun Bai
University of California at Davis, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 30 January 2020
- Accepted: 1 October 2019
- Revised: 1 September 2019
- Received: 1 December 2018
Author Tags
Computation-in-memory
memory-centric computer architectures
resistive computing
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 39
- Total Downloads: 2,576
- Downloads (Last 12 months): 586
- Downloads (Last 6 weeks): 85