Abstract
In earlier technology nodes, FPGAs consumed little power compared to other compute chips such as CPUs and GPUs. In the 14nm technology node, however, FPGA power consumption has reached an unprecedented 100+W range, making it a pressing concern. To reduce FPGA power consumption, several researchers have proposed deploying dynamic voltage scaling (DVS). While the previously proposed solutions show promising results, they have difficulty guaranteeing safe operation at reduced voltages for applications that use the FPGA hard blocks. In this work, we present the first DVS solution that fully handles FPGA applications that use block RAMs (BRAMs). Our solution robustly tests not only the soft logic component of the application but also all components connected to the BRAMs. We extend a previously proposed CAD tool, FRoC, to automatically generate calibration bitstreams that measure the application’s critical path delays on silicon. The calibration bitstreams also include testers that ensure all used SRAM cells operate safely as the supply voltage (Vdd) is scaled. We show experimentally that our DVS solution saves 32% of the total power consumed by a discrete Fourier transform application, relative to running it at the fixed nominal supply voltage and clocked at the Fmax reported by static timing analysis.
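As a rough illustration of the calibration idea in the abstract, the sketch below steps the supply voltage down from nominal and stops at the last setting where both a delay check and a BRAM check pass. It is not FRoC's algorithm: the function `lowest_safe_vdd_mv`, the two callbacks, and every voltage and delay value are hypothetical stand-ins for measurements that would come from calibration bitstreams on real silicon.

```python
from typing import Callable


def lowest_safe_vdd_mv(
    nominal_mv: int,
    min_mv: int,
    step_mv: int,
    clock_period_ns: float,
    measure_delay_ns: Callable[[int], float],
    bram_test_passes: Callable[[int], bool],
) -> int:
    """Return the lowest Vdd (in mV) reached before either check fails.

    Both callbacks are hypothetical: measure_delay_ns stands in for an
    on-silicon critical path delay measurement, and bram_test_passes
    stands in for an SRAM cell tester.
    """
    safe_mv = nominal_mv
    # Walk downward one step at a time, starting just below nominal.
    for vdd_mv in range(nominal_mv - step_mv, min_mv - 1, -step_mv):
        timing_ok = measure_delay_ns(vdd_mv) <= clock_period_ns
        if not timing_ok or not bram_test_passes(vdd_mv):
            break  # the first failing voltage ends the search
        safe_mv = vdd_mv
    return safe_mv


# Toy models with made-up numbers, only to exercise the search loop.
def toy_delay_ns(vdd_mv: int) -> float:
    return 4000.0 / vdd_mv  # delay grows as the voltage drops


def toy_bram_ok(vdd_mv: int) -> bool:
    return vdd_mv >= 780  # assumed SRAM failure threshold


result = lowest_safe_vdd_mv(1000, 700, 50, 4.9, toy_delay_ns, toy_bram_ok)
print(result)
```

In this toy setup the timing check is the limiting factor; raising the assumed SRAM threshold in `toy_bram_ok` makes the BRAM check stop the search first, which is the case the abstract says earlier DVS schemes could not guarantee.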
Index Terms
- FRoC 2.0: Automatic BRAM and Logic Testing to Enable Dynamic Voltage Scaling for FPGA Applications