Abstract
We present results of extensive hardware/software partitioning experiments on numerous benchmarks. We describe our loop-oriented partitioning methodology for moving critical code from software to hardware. Our benchmarks included programs from PowerStone, MediaBench, and NetBench. Our experiments included estimated results for partitioning using an 8051 8-bit microcontroller or a 32-bit MIPS microprocessor for the software, and using on-chip configurable logic or a custom application-specific integrated circuit for the hardware. Additional experiments involved actual measurements taken from several physical implementations of hardware/software partitionings on real single-chip microprocessor/configurable-logic devices. We also estimated results assuming voltage-scalable processors. We provide performance, energy, and size data for all of the experiments. We found that the benchmarks spent an average of 80% of their execution time in only 3% of their code, amounting to only about 200 bytes of critical code. Across the various experiments, we found that moving critical code to hardware resulted in average speedups of 3 to 5 and average energy savings of 35% to 70%, with average hardware requirements of only 5,000 to 10,000 gates. To our knowledge, these experiments represent the most comprehensive hardware/software partitioning study published to date.
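The relationship between the 80% execution-time figure and the reported speedups of 3 to 5 can be checked with Amdahl's law. The sketch below is only an illustration, not the estimation method used in the experiments. The function names and the kernel speedup values (10 and 100) are assumptions chosen for the example, and the model ignores communication overhead between the processor and the hardware.

```python
def overall_speedup(critical_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the critical
    fraction of execution time is accelerated by kernel_speedup.

    Both parameter names are illustrative, not taken from the paper.
    """
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - critical_fraction            # time left in software
    accelerated = critical_fraction / kernel_speedup  # time now in hardware
    return 1.0 / (remaining + accelerated)


def speedup_limit(critical_fraction: float) -> float:
    """Upper bound on overall speedup as the kernel speedup grows without limit."""
    if not 0.0 <= critical_fraction < 1.0:
        raise ValueError("critical_fraction must be in [0, 1)")
    return 1.0 / (1.0 - critical_fraction)


# 0.8 is the average critical-loop time share reported in the abstract;
# the kernel speedups of 10 and 100 are hypothetical values.
for k in (10.0, 100.0):
    print(f"kernel speedup {k:g}: overall {overall_speedup(0.8, k):.2f}")
print(f"limit: {speedup_limit(0.8):.2f}")
```

Under these assumptions, a 10-fold kernel speedup gives an overall speedup of about 3.6, a 100-fold kernel speedup gives about 4.8, and the bound is 5. That range is consistent with the average speedups of 3 to 5 stated above.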
Index Terms
- Energy savings and speedups from partitioning critical software loops to hardware in embedded systems