Abstract
This paper presents a set of benchmarks that are designed to measure power consumption in parallel systems. The benchmarks range from low-level, single instructions or operations, to small kernels. In addition to describing the motivation behind developing the benchmarks and the design principles that were followed, the paper also introduces a metric to quantify the power-performance of a parallel system. Initial results are presented and help to illustrate the contribution of the paper.
Notes
These numbers were extracted from the system when the respective CPU was idle.
References
Amarasinghe S, Campbell D, Carlson W, Chien A, Dally W, Elnohazy E, Harrison R, Harrod W, Hiller J, Karp S, Koelbel C, Koester D, Kogge P, Levesque J, Reed D, Schreiber R, Richards M, Scarpelli A, Shalf J, Snavely A, Sterling T (2009) Exascale software study: software challenges in extreme scale systems
Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301. doi:10.1090/s0025-5718-1965-0178586-1
Hardkernel: ODROID XU+E Specification. Online. http://bit.ly/1sLd62v. Accessed 30 May 2014
Hart A, Richardson H, Doleschal J, Ilsche T, Bielert M, Kappel M (2014) User-level power monitoring and application performance on Cray XC30 supercomputers. In: Proceedings of the Cray User Group (CUG) 2014, Lugano, Switzerland
Juckeland G et al (2004) BenchIT-Performance measurement and comparison for scientific applications. In: Joubert G, Nagel W, Peters F, Walter W (eds) Parallel computing software technology, algorithms, architectures and applications, advances in parallel computing, vol 13. North-Holland, Amsterdam, pp 501–508
OpenMP ARB: OpenMP Specification (2013)
PMaC: MultiMaps. http://bit.ly/1hG2vwr. Accessed 30 May 2014
Samsung: Samsung Exynos 5 Octa Specification. http://bit.ly/OOsOcZ. Accessed 30 May 2014
Staelin C, Hewlett-Packard Laboratories (1996) lmbench: portable tools for performance analysis. In: USENIX annual technical conference, pp 279–294
Towards a breakthrough in software for advanced computing systems. Report from a workshop organised by the European Commission in preparation for HORIZON 2020 (2012)
UPC Consortium: UPC Language Specifications (2005)
Acknowledgments
Thanks to James Perry and Iakovos Panourgias, both EPCC, for testing/reviewing the benchmarks, and to Andrew McCormick from Alpha Data Parallel Systems Ltd for deriving the energy scaling metrics.
Additional information
The Adept project is partially funded by the European Commission under the 7th Framework Programme, Grant Agreement Number 610490.
Appendix: ODROID specifications
The board used in the evaluation section of this paper is an ODROID XU+E. It is a complete System-on-Chip platform based on the Samsung Exynos 5410 Octa processor, which has two quad-core ARM CPUs [8]: the performance CPU, a complex out-of-order ARM A15 running at 1.6 GHz, and the power-saving CPU, a simple in-order ARM A7 with a clock speed of 200 MHz. Both CPUs have 32 KB L1 instruction and data caches per compute core. However, the L2 cache (which is shared between all cores of a CPU) is 2 MB for the A15, as opposed to only 512 KB for the A7. The ODROID has 2 GB of LPDDR3 DRAM, which runs at 800 MHz and has a maximum bandwidth of 12.8 GB/s. Ordinarily, the system is free to migrate loads between the two CPUs; for all results in this paper, however, the load (the benchmark) was pinned to one CPU.
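The quoted peak memory bandwidth can be reproduced with a short calculation. The sketch below assumes a double data rate interface (two transfers per clock cycle) and a total bus width of 64 bits; the bus width is not stated in this paper and is an assumption made here for illustration. The function name is likewise hypothetical.

```python
def peak_bandwidth_gb_s(clock_mhz, bus_width_bits, transfers_per_cycle=2):
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    transfers_per_cycle=2 models a double data rate (DDR) interface.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_cycle
    return transfers_per_second * bytes_per_transfer / 1e9


# 800 MHz clock, assumed 64-bit total bus width (not given in the paper).
lpddr3_peak = peak_bandwidth_gb_s(clock_mhz=800, bus_width_bits=64)
```

Under these assumptions the result matches the 12.8 GB/s figure above. This is a theoretical upper bound; sustained bandwidth measured by a benchmark will be lower.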
The ODROID has built-in power measurement sensors for both the SoC and the board, giving easy access to power usage data without external instrumentation. These sensors measure the voltage, current and power consumption of each of the CPUs, as well as of the memory and the on-board GPU. The sensor readings are exposed through the Linux filesystem. The update period of the sensors is left at its default of 262 ms; it can be lowered to measure shorter loads, at the cost of increased sampling overhead, as with any in-band measurement system. The measurements themselves are taken by INA231 sensor modules from TI, which use 16-bit ADCs with an accuracy of \(2.5~\upmu\hbox{V}\).
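Because the readings are exposed as files, a benchmark harness can poll them and integrate power over time to obtain energy. The following is a minimal sketch of that idea, not the tooling used for the results in this paper. It assumes each sensor file holds a single numeric value in watts; the file path, the function names and the trapezoidal integration are all illustrative choices, and the default polling period simply mirrors the 262 ms update period mentioned above.

```python
import time
from pathlib import Path


def read_sensor(path):
    """Return the numeric value stored in a one-line sensor file."""
    return float(Path(path).read_text().strip())


def sample_power(path, duration_s, period_s=0.262,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll a power file and return a list of (elapsed_s, watts) samples.

    At least one sample is always taken; polling stops once the elapsed
    time reaches duration_s.
    """
    samples = []
    start = clock()
    while True:
        elapsed = clock() - start
        samples.append((elapsed, read_sensor(path)))
        if elapsed >= duration_s:
            break
        sleep(period_s)
    return samples


def energy_joules(samples):
    """Integrate (elapsed_s, watts) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total
```

A call such as `energy_joules(sample_power(sensor_path, duration_s=10.0))` would then estimate the energy drawn over ten seconds, where `sensor_path` stands for whichever sensor file the platform provides. Polling faster than the sensor update period only re-reads the same value, and the polling itself runs on the measured CPU, which is the in-band overhead noted above.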
A block diagram for the ODROID is shown in Fig. 8.
Cite this article
Weiland, M., Johnson, N. Benchmarking for power consumption monitoring. Comput Sci Res Dev 30, 155–163 (2015). https://doi.org/10.1007/s00450-014-0260-1
DOI: https://doi.org/10.1007/s00450-014-0260-1