Energy savings and speedups from partitioning critical software loops to hardware in embedded systems

Published: 01 February 2004

Abstract

We present results of extensive hardware/software partitioning experiments on numerous benchmarks. We describe our loop-oriented partitioning methodology for moving critical code from software to hardware. Our benchmarks included programs from PowerStone, MediaBench, and NetBench. Our experiments included estimated results for partitioning using an 8051 8-bit microcontroller or a 32-bit MIPS microprocessor for the software, and using on-chip configurable logic or a custom application-specific integrated circuit for the hardware. Additional experiments involved actual measurements taken from several physical implementations of hardware/software partitionings on real single-chip microprocessor/configurable-logic devices. We also estimated results assuming voltage-scalable processors. We provide performance, energy, and size data for all of the experiments. We found that the benchmarks spent an average of 80% of their execution time in only 3% of their code, amounting to only about 200 bytes of critical code. Across the various experiments, we found that moving critical code to hardware resulted in average speedups of 3 to 5 and average energy savings of 35% to 70%, with average hardware requirements of only 5,000 to 10,000 gates. To our knowledge, these experiments represent the most comprehensive hardware/software partitioning study published to date.
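The reported figures can be related to each other with Amdahl's law. The following is a minimal sketch, not the authors' estimation method: it assumes the only change is that the critical loops run faster in hardware, and it ignores communication overhead and any per-benchmark variation. The function names and the loop speedup values are hypothetical and chosen for illustration; only the 80% critical fraction comes from the abstract.

```python
def overall_speedup(critical_fraction: float, loop_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the critical
    fraction of execution time is accelerated by loop_speedup."""
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must be in [0, 1]")
    if loop_speedup <= 0.0:
        raise ValueError("loop_speedup must be positive")
    remaining = 1.0 - critical_fraction           # time still spent in software
    accelerated = critical_fraction / loop_speedup  # time spent in hardware
    return 1.0 / (remaining + accelerated)


def speedup_bound(critical_fraction: float) -> float:
    """Upper bound on speedup as the hardware loop time approaches zero."""
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must be in [0, 1]")
    if critical_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - critical_fraction)


if __name__ == "__main__":
    fraction = 0.80  # average share of time in critical code (from the abstract)
    # Hypothetical hardware speedups of the critical loops alone.
    for loop_speedup in (5.0, 10.0, 100.0):
        total = overall_speedup(fraction, loop_speedup)
        print(f"loop speedup {loop_speedup:6.1f} -> overall {total:.2f}")
    print(f"upper bound -> {speedup_bound(fraction):.2f}")
```

Under these assumptions, an 80% critical fraction caps the overall speedup at about 5, and a tenfold loop speedup yields roughly 3.6, which is consistent with the reported average range of 3 to 5.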

Published in

ACM Transactions on Embedded Computing Systems, Volume 3, Issue 1
February 2004
232 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/972627

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
