tutorial

The Mondrian Data Engine

Authors:
Mario Drumond

EcoCloud, EPFL

EcoCloud, EPFL
View Profile

,
Alexandros Daglis

EcoCloud, EPFL

EcoCloud, EPFL
View Profile

,
Nooshin Mirzadeh

EcoCloud, EPFL

EcoCloud, EPFL
View Profile

,
Dmitrii Ustiugov

EcoCloud, EPFL

EcoCloud, EPFL
View Profile

,
Javier Picorel

EcoCloud, EPFL

EcoCloud, EPFL
View Profile

,
Babak Falsafi

EcoCloud, EPFL

EcoCloud, EPFL
View Profile

,
Boris Grot

University of Edinburgh

University of Edinburgh
View Profile

,
Dionisios Pnevmatikatos

FORTH-ICS & ECE-TUC

FORTH-ICS & ECE-TUC
View Profile

Authors Info & Claims

ACM SIGARCH Computer Architecture News Volume 45 Issue 2May 2017pp 639–651https://doi.org/10.1145/3140659.3080233

Published:24 June 2017Publication History

ACM SIGARCH Computer Architecture News

Abstract

The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, advancing compute capabilities necessitates novel architectural approaches. Near-memory processing (NMP) architectures are reemerging as promising candidates to improve computing efficiency through tight coupling of logic and memory. NMP architectures are especially fitting for data analytics, as they provide immense bandwidth to memory-resident data and dramatically reduce data movement, the main source of energy consumption.

Modern data analytics operators are optimized for CPU execution and hence rely on large caches and employ random memory accesses. In the context of NMP, such random accesses result in wasteful DRAM row buffer activations that account for a significant fraction of the total memory access energy. In addition, utilizing NMP's ample bandwidth with fine-grained random accesses requires complex hardware that cannot be accommodated under NMP's tight area and power constraints. Our thesis is that efficient NMP calls for an algorithm-hardware co-design that favors algorithms with sequential accesses to enable simple hardware that accesses memory in streams. We introduce an instance of such a co-designed NMP architecture for data analytics, the Mondrian Data Engine. Compared to a CPU-centric and a baseline NMP system, the Mondrian Data Engine improves the performance of basic data analytics operators by up to 49x and 5x, and efficiency by up to 28x and 5x, respectively.

References

Daniel Abadi, Peter A. Boncz, Stavros Harizopoulos, Stratos Idreos, and Samuel Madden. 2013. The Design and Implementation of Modern Column-Oriented Database Systems. Foundations and Trends in Databases 5, 3 (2013), 197--280.Google ScholarDigital Library
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA 2015). 105--117. Google ScholarDigital Library
Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA 2015). 336--348. Google ScholarDigital Library
Berkin Akin, Franz Franchetti, and James C. Hoe. 2015. Data reorganization in memory using 3D-stacked DRAM. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA 2015). 131--143. Google ScholarDigital Library
AMD. 2016. High Bandwidth Memory, Reinventing Memory Technology. (2016). Retrieved April 26, 2017 from http://www.amd.com/en-us/innovations/software-technologies/hbm.Google Scholar
ARM. 2017. Cortex-A35 Processor. (2017). Retrieved April 26, 2017 from https://www.arm.com/products/processors/cortex-a/cortex-a35-processor.php.Google Scholar
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload analysis of a large-scale key-value store. In Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2012). 53--64. Google ScholarDigital Library
Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer Özsu. 2013. Multi-core, Main-memory Joins: Sort vs. Hash Revisited. Proceedings of the VLDB Endowment 7, 1 (Sept. 2013), 85--96. Google ScholarDigital Library
Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Ozsu. 2013. Multicore hash joins source code. (2013). Retrieved April 26, 2017 from https://www.systems.ethz.ch/node/334/.Google Scholar
Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M. Tamer Özsu. 2013. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In Proceedings of the 29th International Conference on Data Engineering, (ICDE 2013). 362--373. Google ScholarDigital Library
Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2011). 37--48. Google ScholarDigital Library
Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In Preceedings of the Second Biennial Conference on Innovative Data Systems Research (CIDR 2005). 225--237. http://www.cidrdb.org/cidr2005/papers/P19.pdfGoogle Scholar
John B. Carter, Wilson C. Hsieh, Leigh Stoller, Mark R. Swanson, Lixin Zhang, Erik Brunvand, Al Davis, Chen-Chi Kuo, Ravindra Kuramkote, Michael A. Parker, Lambert Schaelicke, and Terry Tateyama. 1999. Impulse: Building a Smarter Memory Controller. In Proceedings of the 5th International Symposium on High-Performance Computer Architecture (HPCA 1999). 70--79. Google ScholarDigital Library
Ke Chen, Sheng Li, Naveen Muralimanohar, Jung Ho Ahn, Jay B. Brockman, and Norman P. Jouppi. 2012. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory. In 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE 2012). 33--38. Google ScholarDigital Library
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2014). 269--284. Google ScholarDigital Library
Bill Dally. 2015. Keynote: Challenges for Future Computing Systems. (2015). Retrieved April 26, 2017 from https://www.cs.colostate.edu/~cs575dl/Sp2015/Lectures/Dally2015.pdf.Google Scholar
Jeffrey Dean and Luiz André Barroso. 2013. The tail at scale. Commun. ACM 56, 2 (2013), 74--80. Google ScholarDigital Library
Hewlett-Packard Enterprise. 2015. The Machine: A new kind of computer. (2015). Retrieved April 26, 2017 from http://www.labs.hpe.com/research/themachine/.Google Scholar
Babak Falsafi, Mircea Stan, Kevin Skadron, Nuwan Jayasena, Yunji Chen, Jinhua Tao, Ravi Nair, Jaime H. Moreno, Naveen Muralimanohar, Karthikeyan Sankaralingam, and Cristian Estan. 2016. Near-Memory Data Services. IEEE Micro 36, 1 (2016), 6--13. Google ScholarDigital Library
Michael Ferdman, Almutaz Adileh, Yusuf Onur Koçberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi. 2012. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012). 37--48. Google ScholarDigital Library
Apache Software Foundation. 2017. Apache Spark. (2017). Retrieved April 26, 2017 from http://spark.apache.org/.Google Scholar
Mingyu Gao, Grant Ayers, and Christos Kozyrakis. 2015. Practical Near-Data Processing for In-Memory Analytics Frameworks. In Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT 2015). 113--124. Google ScholarDigital Library
Mingyu Gao and Christos Kozyrakis. 2016. HRL: Efficient and flexible reconfigurable logic for near-data processing. In Proceedings of the 2016 International Symposium on High Performance Computer Architecture (HPCA 2016). 126--137.Google ScholarCross Ref
Boris Grot, Joel Hestness, Stephen W. Keckler, and Onur Mutlu. 2011. Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA 2011). 401--412. Google ScholarDigital Library
Linley Group. 2015. Hexagon 680 Adds Vector Extensions. Mobile Chip Report (September 2015).Google Scholar
Linley Gwennap. 2013. Qualcomm Krait 400 hits 2.3 GHz. Microprocessor report 27, 1 (January 2013), 1--6.Google Scholar
Linley Gwennap. 2015. Cortex-A35 Extends Low End. Microprocessor Report 29, 11 (November 2015), 1--10.Google Scholar
Mary W. Hall, Peter M. Kogge, Jefferey G. Koller, Pedro C. Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John J. Granacki, Jay B. Brockman, Apoorv Srivastava, William C. Athas, Vincent W. Freeh, Jaewook Shin, and Joonseok Park. 1999. Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture. In Proceedings of the ACM/IEEE Conference on Supercomputing, (SC 1999). 57. Google ScholarDigital Library
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In Proceedings of the 43rd Annual International Symposium on Computer Architecture (ISCA 2016). 243--254. Google ScholarDigital Library
Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. 2011. Toward Dark Silicon in Servers. IEEE Micro 31, 4 (2011), 6--15. Google ScholarDigital Library
Mark Harris. 2013. Unified memory in CUDA 6. (2013). Retrieved April 26, 2017 from http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3120-Unified-Memory-CUDA-6.0.pdfGoogle Scholar
IBM. 2017. IBM DB2. (2017). Retrieved April 26, 2017 from http://www.ibm.com/analytics/us/en/technology/db2/.Google Scholar
Joe Jeddeloh and Brent Keeth. 2012. Hybrid memory cube new DRAM architecture increases density and performance. In VLSI Technology (VLSIT), 2012 Symposium on. IEEE, 87--88.Google ScholarCross Ref
JEDEC. 2013. Wide I/O 2 Standard. (2013). Retrieved April 26, 2017 from http://www.jedec.org/standards-documents/results/jesd229-2.Google Scholar
JEDEC. 2015. High Bandwidth Memory (HBM) DRAM. (2015). Retrieved April 26, 2017 from https://www.jedec.org/standards-documents/docs/jesd235a.Google Scholar
Svilen Kanev, Juan Pablo Darago, Kim M. Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, and David M. Brooks. 2015. Profiling a warehouse-scale computer. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA 2015). 158--169. Google ScholarDigital Library
Yi Kang, Wei Huang, Seung-Moon Yoo, Diana Keen, Zhenzhou Ge, Vinh Vi Lam, Josep Torrellas, and Pratap Pattnaik. 1999. FlexRAM: Toward an Advanced Intelligent Memory System. In Proceedings of the IEEE International Conference On Computer Design, VLSI in Computers and Processors, (ICCD 1999). 192--201. Google ScholarDigital Library
Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-core CPUs. Proceedings of the VLDB Endowment 2, 2 (Aug. 2009), 1378--1389. Google ScholarDigital Library
Yusuf Onur Koçberber, Boris Grot, Javier Picorel, Babak Falsafi, Kevin T. Lim, and Parthasarathy Ranganathan. 2013. Meet the walkers: accelerating index traversals for in-memory databases. In Proceedings of the 46th Annual International Symposium on Microarchitecture (MICRO 2013). 468--479. Google ScholarDigital Library
Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, and Chuck Bear. 2012. The Vertica Analytic Database: C-store 7 Years Later. Proceedings of the VLDB Endowment 5, 12 (Aug. 2012), 1790--1801. Google ScholarDigital Library
Sheng Li, Ke Chen, Jung Ho Ahn, Jay B. Brockman, and Norman P. Jouppi. 2011. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. In Proceedings of the 2011 International Conference on Computer-Aided Design (ICCAD 2011). 694--701. Google ScholarDigital Library
Kevin T. Lim, Jichuan Chang, Trevor N. Mudge, Parthasarathy Ranganathan, Steven K. Reinhardt, and Thomas F. Wenisch. 2009. Disaggregated memory for expansion and sharing in blade servers. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA 2009). 267--278. Google ScholarDigital Library
Stefan Manegold, Peter A. Boncz, and Martin L. Kersten. 2002. Optimizing Main-Memory Join on Modern Hardware. IEEE Trans. Knowl. Data Eng. 14, 4 (2002), 709--730. Google ScholarDigital Library
Mozhgan Mansuri, James E. Jaussi, Joseph T. Kennedy, Tzu-Chien Hsueh, Sudip Shekhar, Ganesh Balamurugan, Frank O'Mahony, Clark Roberts, Randy Mooney, and Bryan Casper. 2013. A Scalable 0.128-1 Tb/s, 0.8-2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS. J. Solid-State Circuits 48, 12 (2013), 3229--3242.Google ScholarCross Ref
MEMSQL. 2017. MEMSQL: The Fastest In-Memory Database. (2017). Retrieved April 26, 2017 from http://www.memsql.com/.Google Scholar
Micron. 2014. Hybrid Memory Cube Second Generation. (2014). Retrieved April 26, 2017 from http://investors.micron.com/releasedetail.cfm?ReleaseID=828028.Google Scholar
Micron. 2017. DDR3 SDRAM System-Power Calculator. (2017). Retrieved April 26, 2017 from https://www.micron.com/support/tools-and-utilities/power-calc.Google Scholar
Nooshin Mirzadeh, Yusuf Onur Koçberber, Babak Falsafi, and Boris Grot. 2015. Sort vs. hash join revisited for near-memory execution. In Proceedings of the 5th Workshop on Architectures and Systems for Big Data (ASBD 2015). http://acs.ict.ac.cn/asbd2015/papers/ASBD_2015_submission_3.pdfGoogle Scholar
Cavium Networks. 2014. Cavium Announces Availability of ThunderX: Industry's First 48 Core Family of ARMv8 Workload Optimized Processors for Next Generation Data Center & Cloud Infrastructure. (2014). Retrieved April 26, 2017 from http://www.cavium.com/newsevents-Cavium-Announces-Availability-of-ThunderX.html.Google Scholar
Thomas Neumann, Tobias Mühlbauer, and Alfons Kemper. 2015. Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD 2015). 677--689. Google ScholarDigital Library
Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. 2013. Scaling Memcache at Facebook. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2013). USENIX, 385--398. https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/nishtala Google ScholarDigital Library
Mark Oskin, Frederic T. Chong, and Timothy Sherwood. 1998. Active Pages: A Computation Model for Intelligent Memory. In Proceedings of the 25th Annual International Symposium on Computer Architecture (ISCA 1998). 192--203. Google ScholarDigital Library
John K. Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Diego Ongaro, Guru M. Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. 2011. The case for RAMCloud. Commun. ACM 54, 7 (2011), 121--130. Google ScholarDigital Library
D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick. 1997. A case for intelligent RAM. IEEE Micro 17, 2 (Mar 1997), 34--44. Google ScholarDigital Library
Javier Picorel, Djordje Jevdjic, and Babak Falsafi. 2016. Near-Memory Address Translation. CoRR abs/1612.00445 (2016). http://arxiv.org/abs/1612.00445Google Scholar
Seth H. Pugsley, Jeffrey Jestes, Rajeev Balasubramonian, Vijayalakshmi Srinivasan, Alper Buyuktosunoglu, Al Davis, and Feifei Li. 2014. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads. IEEE Micro 34, 4 (2014), 44--52.Google ScholarCross Ref
Seth H. Pugsley, Jeffrey Jestes, Huihui Zhang, Rajeev Balasubramonian, Vijayalakshmi Srinivasan, Alper Buyuktosunoglu, Al Davis, and Feifei Li. 2014. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads. In Proceedings of the 2014 International Symposium on Performance Analysis of Systems and Software (ISPASS 2014). 190--200.Google ScholarCross Ref
Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. 2011. DRAMSim2: A Cycle Accurate Memory System Simulator. Computer Architecture Letters 10, 1 (2011), 16--19. Google ScholarDigital Library
P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. 1979. Access Path Selection in a Relational Database Management System. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data (SIGMOD 1979). ACM, New York, NY, USA, 23--34. Google ScholarDigital Library
Minglong Shao, Anastassia Ailamaki, and Babak Falsafi. 2005. DBmbench: fast and accurate database workload representation on modern microarchitecture. In Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative Research. 254--267. Google ScholarDigital Library
R. Sivaramakrishnan and S. Jairath. 2014. Next generation SPARC processor cache hierarchy. In IEEE Hot Chips 26 Symposium (HCS), 2014. 1--28.Google ScholarCross Ref
Michael Stonebraker and Ariel Weisberg. 2013. The VoltDB Main Memory DBMS. IEEE Data Eng. Bull. 36, 2 (2013), 21--27. http://sites.computer.org/debull/A13june/VoltDB1.pdfGoogle Scholar
Tezzaron. 2017. DiRAM4 3D Memory. (2017). Retrieved April 26, 2017 from http://www.tezzaron.com/products/diram4-3d-memory/.Google Scholar
Stavros Volos, Djordje Jevdjic, Babak Falsafi, and Boris Grot. 2017. Fat Caches for Scale-Out Servers. IEEE Micro 37, 2 (2017), 90--103. Google ScholarDigital Library
Stavros Volos, Javier Picorel, Babak Falsafi, and Boris Grot. 2014. BuMP: Bulk Memory Access Prediction and Streaming. In Proceedings of the 47th Annual International Symposium on Microarchitecture (MICRO 2014). 545--557. Google ScholarDigital Library
Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos. 2008. Temporal streams in commercial server applications. In 4th International Symposium on Workload Characterization (IISWC 2008). 99--108.Google ScholarCross Ref
Thomas F. Wenisch, Roland E. Wunderlich, Michael Ferdman, Anastassia Ailamaki, Babak Falsafi, and James C. Hoe. 2006. SimFlex: Statistical Sampling of Computer System Simulation. IEEE Micro 26, 4 (2006), 18--31. Google ScholarDigital Library
Lisa Wu, Raymond J. Barker, Martha A. Kim, and Kenneth A. Ross. 2013. Navigating big data with high-throughput, energy-efficient data partitioning. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA 2013). 249--260. Google ScholarDigital Library
Lisa Wu, Andrea Lottarini, Timothy K. Paine, Martha A. Kim, and Kenneth A. Ross. 2014. Q100: the architecture and design of a database processing unit. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2014). 255--268. Google ScholarDigital Library
Roland E. Wunderlich, Thomas F. Wenisch, Babak Falsafi, and James C. Hoe. 2003. SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA 2003). 84--95. Google ScholarDigital Library
Marcin Zukowski, Mark van de Wiel, and Peter A. Boncz. 2012. Vectorwise: A Vectorized Analytical DBMS. In Proceedings of the 28th International Conference on Data Engineering (ICDE 2012). 1349--1350. Google ScholarDigital Library

Index Terms

The Mondrian Data Engine

Recommendations

The Mondrian Data Engine
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture

The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, ...
Read More
Hydra: a near hybrid memory accelerator for CNN inference
DATE '22: Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe

Convolutional neural network (CNN) accelerators often suffer from limited off-chip memory bandwidth and on-chip capacity constraints. One solution to this problem is near-memory or in-memory processing. Non-volatile memory, such as phase-change memory (...
Read More
G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing
Abstract
Graph Neural Networks (GNNs) are of great value in numerous applications and promote the development of cognitive intelligence, due to the capability of modeling non-euclidean data structures. However, the inherent irregularity makes GNNs memory-...
Highlights
- G-NMP exploits rank-level parallelism and leverages off-the-shelf CPU and DRAM chips.
- G-ISA instruction sets reduces memory requests and alleviates C/A bandwidth pressure.
- Data flow optimization improves memory-compute overlap and ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGARCH Computer Architecture News Volume 45, Issue 2
ISCA'17
May 2017
715 pages
ISSN:0163-5964
DOI:10.1145/3140659
Editor:
Babak Falsafi
Interim
Issue’s Table of Contents
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
June 2017
736 pages
ISBN:9781450348928
DOI:10.1145/3079856
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 June 2017
Check for updates
Author Tags
Near-memory processing
algorithm-hardware co-design
sequential memory access
Qualifiers
- tutorial
- Research
- Refereed limited
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 95
  Total Citations
  View Citations
- 964
  Total Downloads
- Downloads (Last 12 months)98
- Downloads (Last 6 weeks)12
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The Mondrian Data Engine

ACM SIGARCH Computer Architecture News

Abstract

References

Cited By

Index Terms

Recommendations

The Mondrian Data Engine

Hydra: a near hybri<u>d</u> memo<u>r</u>y <u>a</u>ccelerator for CNN inference

G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing