Performance–energy adaptation of parallel programs in pervasive computing

The Journal of Supercomputing

Abstract

Spending a little extra energy is worthwhile when the performance gained outweighs the energy added. Likewise, relaxing a performance constraint slightly is worthwhile when it saves a large amount of energy. Such trades are possible only if the relationship between the performance and the energy of parallel programs is known precisely. This work studies that relationship by recording the performance speedup and energy consumption of parallel programs as the number of cores they run on changes. We demonstrate that the performance improvement and the increased energy consumption have a linear negative correlation. In addition, these relationships can guide performance–energy adaptation under two assumptions. Our experiments show that the average correlation coefficients between performance and energy are higher than 97 %. Furthermore, we find that trading a performance loss of less than 6 % for an energy saving of more than 37 % is feasible, and vice versa.
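As an illustration of the kind of analysis the abstract describes, the following minimal sketch computes a Pearson correlation coefficient between speedup and energy consumption across core counts. It is not the authors' code: the function names `pearson` and `speedups` are hypothetical, and the measurement values are invented placeholders, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def speedups(times):
    """Speedup of each run relative to the first (baseline) run."""
    base = times[0]
    return [base / t for t in times]


# Hypothetical measurements for one program (NOT data from the paper):
# execution time in seconds and energy in joules at each core count.
cores = [1, 2, 4, 8]
exec_time_s = [100.0, 52.0, 28.0, 16.0]
energy_j = [500.0, 540.0, 610.0, 760.0]

r = pearson(speedups(exec_time_s), energy_j)
print(f"correlation between speedup and energy: {r:.3f}")
```

A coefficient whose magnitude is close to 1 would indicate that a straight line describes the speedup and energy relationship well for that program; how the paper defines its two quantities and the sign convention it uses is not visible in this preview.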

Notes

  1. The real speedup is higher than the ideal speedup for some parallel programs at a few points. This is because the per-thread working set shrinks as more threads are spawned, which lowers the per-thread cache miss rate and improves performance. The measured execution time is also affected by the precision of the Sniper simulator.
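The superlinear points described in this note can be detected with a simple check. The sketch below is an assumption-laden illustration, not the authors' method: the helper name `superlinear_points` is hypothetical, and the ideal speedup is assumed to be perfect linear scaling from the baseline run.

```python
def superlinear_points(cores, times):
    """Return the core counts whose measured speedup exceeds the ideal one.

    Measured speedup is times[0] / times[i]. The ideal speedup is assumed
    to be cores[i] / cores[0], i.e. perfect linear scaling from the baseline.
    """
    if len(cores) != len(times) or not cores:
        raise ValueError("cores and times must be non-empty and equal length")
    base_cores, base_time = cores[0], times[0]
    flagged = []
    for n, t in zip(cores, times):
        measured = base_time / t
        ideal = n / base_cores
        if measured > ideal:
            flagged.append(n)
    return flagged


# Hypothetical execution times in seconds (NOT data from the paper).
print(superlinear_points([1, 2, 4], [100.0, 45.0, 25.0]))
```

With these placeholder numbers, only the 2-core run is flagged: its speedup of about 2.22 exceeds the ideal value of 2, while the 4-core run exactly matches the ideal value of 4.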

Acknowledgments

This work was supported by the National High-tech Research and Development Program of China (863 Program) under Grant No. 2012AA010905, the China National Natural Science Foundation under Grant Nos. 61272408 and 61133006, and the Doctoral Fund of the Ministry of Education of China under Grant No. 20130142110048.

Author information

Corresponding author

Correspondence to Hai Jin.

About this article

Cite this article

Zhu, L., Jin, H., Liao, X. et al. Performance–energy adaptation of parallel programs in pervasive computing. J Supercomput 70, 1260–1278 (2014). https://doi.org/10.1007/s11227-014-1226-6

  • DOI: https://doi.org/10.1007/s11227-014-1226-6
