DOI: 10.1145/3134383.3134410
research-article

Low-power Parallel Data Processing Using Computation Reuse

Published: 28 August 2017

ABSTRACT

A wide range of real-world applications, including DSP, deep learning, multimedia, and scientific algorithms, typically involve fixed-point and floating-point arithmetic operations and trigonometric functions, which have long latency and high power consumption. In this paper, we propose a computation reuse mechanism for multicore processors that reuses the result of an arithmetic operation for subsequent operations with (approximately) the same operands. The mechanism adds a small result cache to every functional unit; the cache keeps a few recent operands and their results, so that repeated operands can be detected and their results reused instead of recomputed. To exploit the value locality inherent in many real-world applications, our architecture uses a multi-stage interconnection network to distribute input data elements across the cores of a multicore processor in a way that increases the data locality of each core. Each core therefore achieves a higher computation reuse rate, which translates into a larger reduction in power consumption. Experimental results show that the proposed mechanism increases the result cache hit rate, which leads to a significant reduction in the power consumption of arithmetic operations.
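The abstract does not specify the result cache organization or how the interconnection network routes data, so the following Python sketch is illustrative only and is not the authors' design. It assumes a small LRU result cache per functional unit, treats operands as "approximately the same" when they agree after truncating a chosen number of low-order mantissa bits (`drop_bits`, a hypothetical parameter), and replaces the multi-stage interconnection network with a simple value-range partition (`range_partition`, a hypothetical helper) to show why grouping similar values on the same core raises the hit rate. `math.sin` stands in for a long-latency trigonometric unit.

```python
import math
import struct
from collections import OrderedDict


def approx_key(x: float, drop_bits: int) -> int:
    """Map a double to a lookup key by zeroing its `drop_bits` lowest mantissa bits.

    Hypothetical matching rule: operands that agree in all remaining bits are
    treated as approximately equal. drop_bits=0 means exact matching.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & ~((1 << drop_bits) - 1)


class ResultCache:
    """Tiny LRU table of recent (operand key -> result) pairs for one functional unit."""

    def __init__(self, op, entries: int = 4, drop_bits: int = 0):
        self.op = op                # the costly operation this unit performs
        self.entries = entries      # assumed cache capacity
        self.drop_bits = drop_bits  # assumed approximation level
        self.table = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def compute(self, x: float) -> float:
        self.lookups += 1
        key = approx_key(x, self.drop_bits)
        if key in self.table:
            self.hits += 1
            self.table.move_to_end(key)  # mark as most recently used
            return self.table[key]       # reuse: the functional unit is not exercised
        result = self.op(x)              # miss: perform the operation
        self.table[key] = result
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict the least recently used entry
        return result


def range_partition(lo: float, hi: float, num_cores: int):
    """Hypothetical stand-in for value-aware routing: send each value to the
    core that owns its sub-range of [lo, hi)."""
    width = (hi - lo) / num_cores

    def assign(i: int, x: float) -> int:
        return max(0, min(int((x - lo) / width), num_cores - 1))

    return assign


def hit_rate(data, num_cores, assign, entries=4, drop_bits=0) -> float:
    """Distribute `data` over cores with `assign(index, value)` and return the
    overall fraction of operations served by the per-core result caches."""
    caches = [ResultCache(math.sin, entries, drop_bits) for _ in range(num_cores)]
    for i, x in enumerate(data):
        caches[assign(i, x)].compute(x)
    return sum(c.hits for c in caches) / len(data)


if __name__ == "__main__":
    data = [float(i % 10) for i in range(100)]  # synthetic input with value locality
    round_robin = hit_rate(data, 4, lambda i, x: i % 4)
    by_value = hit_rate(data, 4, range_partition(0.0, 10.0, 4))
    print(round_robin, by_value)  # prints "0.0 0.9"
```

On this synthetic input, round-robin distribution gives each core five distinct values that cycle through a four-entry LRU cache, so every lookup misses. The range partition leaves at most three distinct values per core, so only the ten compulsory misses remain. Setting `drop_bits` above zero lets nearby operands such as `1.0` and `1.0 + 2**-50` share one entry, at the cost of returning a slightly inexact result.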

    Published in

      ACM Other conferences
      ICICM '17: Proceedings of the 7th International Conference on Information Communication and Management
      August 2017
      181 pages
      ISBN:9781450352796
      DOI:10.1145/3134383

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 August 2017


      Qualifiers

      • research-article
      • Research
      • Refereed limited