research-article

AC-DIMM: associative computing with STT-MRAM

Published: 23 June 2013

Abstract

With technology scaling, on-chip power dissipation and off-chip memory bandwidth have become significant performance bottlenecks in virtually all computer systems, from mobile devices to supercomputers. An effective way of improving performance in the face of bandwidth and power limitations is to rely on associative memory systems. Recent work on a PCM-based, associative TCAM accelerator shows that associative search capability can reduce both off-chip bandwidth demand and overall system energy. Unfortunately, previously proposed resistive TCAM accelerators have limited flexibility: only a restricted (albeit important) class of applications can benefit from a TCAM accelerator, and the implementation is confined to resistive memory technologies with a high dynamic range (RHigh/RLow), such as PCM.
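The associative search referred to above can be illustrated with a small functional model. This is a minimal sketch, not the accelerator from the prior work: the `(value, care)` encoding, the names `ternary_match` and `tcam_search`, and the 4-bit example words are assumptions made purely for illustration. A hardware TCAM compares the key against all rows in parallel; the loop here only models the result.

```python
from typing import List, Tuple

# Assumed encoding: a stored ternary word is (value, care).
# Bits where care == 0 are "don't care" and match any key bit.
TernaryWord = Tuple[int, int]


def ternary_match(stored: TernaryWord, key: int, key_mask: int) -> bool:
    """Return True if key agrees with stored on every bit both sides care about."""
    value, care = stored
    active = care & key_mask  # bit positions actually compared in this search
    return ((value ^ key) & active) == 0


def tcam_search(rows: List[TernaryWord], key: int, key_mask: int) -> List[int]:
    """Return the indices of all rows that match the key.

    Functional model only: real hardware evaluates every row at once.
    """
    return [i for i, row in enumerate(rows) if ternary_match(row, key, key_mask)]


# Hypothetical 4-bit table.
rows = [
    (0b1010, 0b1111),  # exact pattern 1010
    (0b1000, 0b1100),  # pattern 10xx (low two bits are don't-care)
    (0b0110, 0b1111),  # exact pattern 0110
]

full_hits = tcam_search(rows, key=0b1011, key_mask=0b1111)    # [1]
masked_hits = tcam_search(rows, key=0b1011, key_mask=0b1110)  # [0, 1]
```

Only match indices (or data derived from them) need to leave the memory, rather than every candidate row, which is the intuition behind the reduced off-chip bandwidth demand.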

This work proposes AC-DIMM, a flexible, high-performance associative compute engine built on a DDR3-compatible memory module. AC-DIMM addresses the limited flexibility of previous resistive TCAM accelerators by combining two powerful capabilities: associative search and processing in memory. Generality is improved by augmenting a TCAM system with a set of integrated, user-programmable microcontrollers that operate directly on search results, and by architecting the system such that key-value pairs can be co-located in the same TCAM row. A new, bit-serial TCAM array is proposed, which enables the system to be implemented using STT-MRAM. AC-DIMM achieves a 4.2X speedup and a 6.5X energy reduction over a conventional RAM-based system on a set of 13 evaluated applications.
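The combination of co-located key-value pairs, bit-serial search, and in-memory processing of search results can be sketched in software as follows. This is an illustrative model under stated assumptions, not the AC-DIMM design: the field widths, the row layout (key in the high bits, value in the low bits), and the names `make_row`, `bit_serial_search`, and `reduce_matches` are all hypothetical, and "bit-serial" is modeled simply as comparing one key bit position per step across all rows.

```python
from typing import Callable, List

KEY_BITS = 8    # hypothetical field widths, chosen only for this example
VALUE_BITS = 8
VALUE_MASK = (1 << VALUE_BITS) - 1


def make_row(key: int, value: int) -> int:
    """Pack a key and its value into one row word (assumed layout: key high, value low)."""
    return (key << VALUE_BITS) | (value & VALUE_MASK)


def bit_serial_search(rows: List[int], key: int) -> List[bool]:
    """Compare the key field of every row against key, one bit position per step."""
    match = [True] * len(rows)
    for bit in range(KEY_BITS):              # one step per key bit
        key_bit = (key >> bit) & 1
        for i, row in enumerate(rows):       # all rows would be checked in parallel in hardware
            row_bit = (row >> (VALUE_BITS + bit)) & 1
            match[i] = match[i] and (row_bit == key_bit)
    return match


def reduce_matches(rows: List[int], match: List[bool],
                   op: Callable[[int, int], int], init: int) -> int:
    """Stand-in for a user-programmed kernel: fold op over the values of matching rows."""
    acc = init
    for row, hit in zip(rows, match):
        if hit:
            acc = op(acc, row & VALUE_MASK)
    return acc


# Hypothetical table: two rows share key 3.
table = [make_row(3, 10), make_row(5, 20), make_row(3, 7)]
hits = bit_serial_search(table, key=3)                      # [True, False, True]
total = reduce_matches(table, hits, lambda a, b: a + b, 0)  # 17
```

Because each value sits in the same row as its key, the reduction step reads its operands directly from the matching rows and needs no separate lookup.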


    • Published in

      ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
      ISCA '13
      June 2013
      666 pages
      ISSN:0163-5964
      DOI:10.1145/2508148
      • ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
        June 2013
        686 pages
        ISBN:9781450320795
        DOI:10.1145/2485922

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
