Low-power Parallel Data Processing Using Computation Reuse

Authors:
Bita Dabiri
Department of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Seyyed Hossein SeyyedAghaei Rezaei
Department of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Mehdi Modarressi
Department of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Iran

ICICM '17: Proceedings of the 7th International Conference on Information Communication and Management, August 2017, Pages 1–4
https://doi.org/10.1145/3134383.3134410

Published: 28 August 2017

ABSTRACT

Many real-world applications, including DSP, deep learning, multimedia, and scientific algorithms, rely heavily on fixed-point and floating-point arithmetic operations and trigonometric functions, which have long latency and high power consumption. In this paper, we propose a computation reuse mechanism for multicore processors that reuses the result of an arithmetic operation for subsequent operations with (approximately) the same operands. The mechanism adds a small result cache to every functional unit; the cache keeps a few recent operands and their results so that repeated operands can be detected and their results reused. To take advantage of the value locality inherent in many real-world applications, our architecture relies on a multi-stage interconnection network that distributes input data elements across the cores of a multicore processor in such a way that the data locality of each core is increased. As a result, each core achieves a higher computation reuse rate, which translates into a larger reduction in power consumption. Experimental results show that the proposed mechanism increases the result cache hit rate, which leads to a significant reduction in the power consumption of arithmetic operations.
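The result cache and the effect of data distribution on its hit rate can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `ResultCache`, the LRU replacement policy, the `step` quantization used for approximate operand matching, and the `assign` routing function (a stand-in for the multi-stage interconnection network) are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class ResultCache:
    """Software model of a small per-functional-unit result cache (assumed LRU)."""

    def __init__(self, entries=4, step=0.0):
        self.entries = entries      # number of operand/result pairs kept
        self.step = step            # quantization step; 0 means exact matching
        self.table = OrderedDict()  # operand key -> cached result
        self.hits = 0
        self.misses = 0

    def _key(self, a, b):
        # Approximate matching: operands that fall in the same quantization
        # bucket share a key. This scheme is an assumption of the sketch.
        if self.step == 0:
            return (a, b)
        return (round(a / self.step), round(b / self.step))

    def multiply(self, a, b):
        key = self._key(a, b)
        if key in self.table:
            self.table.move_to_end(key)     # refresh LRU position
            self.hits += 1
            return self.table[key]          # reuse: multiplier is not activated
        self.misses += 1
        result = a * b                      # miss: perform the real operation
        self.table[key] = result
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict the least recently used entry
        return result


def hit_rate(assign, operands, cores=4, entries=2):
    """Route each operand pair to a core with `assign`; return the overall hit rate."""
    caches = [ResultCache(entries) for _ in range(cores)]
    for i, (a, b) in enumerate(operands):
        caches[assign(i, a, b) % cores].multiply(a, b)
    return sum(c.hits for c in caches) / len(operands)


# Synthetic input stream with five distinct operand values that repeat cyclically.
operands = [(float(i % 5), 3.0) for i in range(60)]

# Round-robin distribution ignores values; value-based distribution sends
# equal operands to the same core, increasing each core's value locality.
round_robin = hit_rate(lambda i, a, b: i, operands)
by_value = hit_rate(lambda i, a, b: int(a), operands)
```

On this synthetic stream, round-robin distribution gives every core a cycle of five distinct operand pairs, so a two-entry cache never hits, while value-based distribution lets each core see at most two distinct pairs and hit on almost every access. Setting `step` to a nonzero value, for example `ResultCache(entries=4, step=0.5)`, makes nearby operands such as 1.0 and 1.1 share one entry, trading accuracy for a higher reuse rate.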

Index Terms

1. Information systems
  1. Information systems applications
    1. Mobile information processing systems

Published in

ICICM '17: Proceedings of the 7th International Conference on Information Communication and Management
August 2017
181 pages
ISBN:9781450352796
DOI:10.1145/3134383

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Computation Reuse
Interconnection Networks
Low-power
Qualifiers
- research-article
- Research
- Refereed limited