Benchmarking for power consumption monitoring

Description of benchmarks designed to expose power usage characteristics of parallel hardware systems, and preliminary results

  • Special Issue Paper
Computer Science - Research and Development

Abstract

This paper presents a set of benchmarks that are designed to measure power consumption in parallel systems. The benchmarks range from low-level, single instructions or operations, to small kernels. In addition to describing the motivation behind developing the benchmarks and the design principles that were followed, the paper also introduces a metric to quantify the power-performance of a parallel system. Initial results are presented and help to illustrate the contribution of the paper.

Notes

  1. http://www.adept-project.eu.

  2. These numbers were extracted from the system when the respective CPU was idle.

References

  1. Amarasinghe S, Campbell D, Carlson W, Chien A, Dally W, Elnozahy E, Harrison R, Harrod W, Hiller J, Karp S, Koelbel C, Koester D, Kogge P, Levesque J, Reed D, Schreiber R, Richards M, Scarpelli A, Shalf J, Snavely A, Sterling T (2009) Exascale software study: software challenges in extreme scale systems

  2. Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301. doi:10.1090/s0025-5718-1965-0178586-1

  3. Hardkernel: ODROID XU+E Specification. Online. http://bit.ly/1sLd62v. Accessed 30 May 2014

  4. Hart A, Richardson H, Doleschal J, Ilsche T, Bielert M, Kappel M (2014) User-level power monitoring and application performance on Cray XC30 supercomputers. In: Proceedings of the Cray User Group (CUG) 2014, Lugano, Switzerland

  5. Juckeland G et al (2004) BenchIT-Performance measurement and comparison for scientific applications. In: Joubert G, Nagel W, Peters F, Walter W (eds) Parallel computing software technology, algorithms, architectures and applications, advances in parallel computing, vol 13. North-Holland, Amsterdam, pp 501–508

  6. OpenMP ARB: OpenMP Specification (2013)

  7. PMaC: MultiMaps. http://bit.ly/1hG2vwr. Accessed 30 May 2014

  8. Samsung: Samsung Exynos 5 Octa Specification. http://bit.ly/OOsOcZ. Accessed 30 May 2014

  9. Staelin C, Hewlett-Packard Laboratories (1996) lmbench: portable tools for performance analysis. In: USENIX annual technical conference, pp 279–294

  10. Towards a breakthrough in software for advanced computing systems. Report from a workshop organised by the European Commission in preparation for HORIZON 2020 (2012)

  11. UPC Consortium: UPC Language Specifications (2005)

Acknowledgments

Thanks to James Perry and Iakovos Panourgias, both EPCC, for testing/reviewing the benchmarks, and to Andrew McCormick from Alpha Data Parallel Systems Ltd for deriving the energy scaling metrics.

Author information

Corresponding author

Correspondence to Nick Johnson.

Additional information

The Adept project is partially funded by the European Commission under the 7th Framework Programme, Grant Agreement Number 610490.

Appendix: ODROID specifications

The board used in the evaluation section of this paper is an ODROID XU+E. This is a complete System-on-Chip based on the Samsung Exynos 5410 Octa processor with two quad-core ARM CPUs [8]: the performance CPU, a complex out-of-order ARM A15 running at 1.6 GHz, and the power-saving CPU, a simple in-order ARM A7 with a clock speed of 200 MHz. Both CPUs have 32 KB L1 instruction and data caches per compute core. However, the L2 cache (which is shared between all cores of the CPU) is 2 MB for the A15, as opposed to only 512 KB for the A7. The ODROID has 2 GB of LPDDR3 DRAM, which runs at 800 MHz and has a maximum bandwidth of 12.8 GB/s. Ordinarily, the system is free to migrate loads between processors; for all results in this paper, however, the load (the benchmark) was fixed to one CPU.
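As a rough consistency check on the quoted peak bandwidth, and assuming (this is not stated above) that the LPDDR3 interface is 64 bits wide in total and transfers data on both clock edges, the figure follows directly from the memory clock:

\[
800 \times 10^{6}~\mathrm{Hz} \;\times\; 2~\frac{\mathrm{transfers}}{\mathrm{cycle}} \;\times\; 8~\frac{\mathrm{bytes}}{\mathrm{transfer}} \;=\; 12.8 \times 10^{9}~\mathrm{B/s} \;=\; 12.8~\mathrm{GB/s}.
\]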

The ODROID has built-in power measurement sensors for both the SoC and the board, allowing easy access to power usage data without external instrumentation. These sensors can measure the voltage, current and power consumption of each of the CPUs, as well as of the memory and the on-board GPU. The sensor readings are reported via the Linux filesystem. The update period for the sensors is set to the default of 262 ms, although it can be lowered to measure shorter loads at the cost of an increased sampling overhead, as for any in-band measurement system. The measurements themselves are taken by INA231 sensor modules from TI, which use 16-bit ADCs with an accuracy of \(2.5~\mu\mathrm{V}\).
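Because the readings are exposed as files, a benchmark harness can poll them and integrate power over time to estimate energy. The following Python sketch illustrates the idea; it is not the tooling used for the results in this paper. The function names are hypothetical, the sensor file is assumed to contain a single decimal power value in watts, and the path to that file is left as a parameter because it depends on the kernel and driver in use.

```python
import time
from pathlib import Path


def read_power_watts(sensor_file):
    """Read one power sample from a text file holding a single number.

    Assumption: the file contains the power in watts as plain text.
    """
    return float(Path(sensor_file).read_text().strip())


def integrate_energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def sample_power(sensor_file, duration_s, period_s=0.262):
    """Poll the sensor file until duration_s has elapsed.

    period_s defaults to the 262 ms sensor update period; polling
    faster than the sensor updates only re-reads the same value.
    """
    samples = []
    start = time.monotonic()
    while True:
        now = time.monotonic() - start
        samples.append((now, read_power_watts(sensor_file)))
        if now >= duration_s:
            break
        time.sleep(period_s)
    return samples
```

A typical use would be to call `sample_power(path, duration_s)` from a separate thread or process while the benchmark runs and then pass the result to `integrate_energy_joules`. Since the polling runs on the system being measured, it contributes to the reading itself, which is the in-band overhead mentioned above.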

A block diagram for the ODROID is shown in Fig. 8.

Fig. 8 ODROID block diagram, courtesy of HardKernel

About this article

Cite this article

Weiland, M., Johnson, N. Benchmarking for power consumption monitoring. Comput Sci Res Dev 30, 155–163 (2015). https://doi.org/10.1007/s00450-014-0260-1

  • DOI: https://doi.org/10.1007/s00450-014-0260-1
