DOI: 10.1145/2925426.2926279
research-article

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes

Published: 01 June 2016

ABSTRACT

Current large-scale systems show increasing power demands, to the point that these demands have become a huge strain on facilities and budgets. Researchers in academia, labs and industry are focusing on dealing with this "power wall", striving to find a balance between performance and power consumption. Some commodity processors enable power capping, which opens up new opportunities for applications to directly manage their power behavior at user level. However, while power capping ensures a system will never exceed a given power limit, it also leads to a new form of heterogeneity: natural manufacturing variability, which was previously hidden by varying power to achieve homogeneous performance, now results in heterogeneous performance, because CPU frequencies differ, potentially for each core, in order to enforce the power limit.

In this work we show how a parallel runtime system can be used to effectively deal with this new kind of performance heterogeneity by compensating for the uneven effects of power capping. In the context of a NUMA node composed of several multi-core sockets, our system is able to optimize the energy and concurrency levels assigned to each socket to maximize performance. Applied transparently within the parallel runtime system, it does not require any programmer interaction, such as changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it can achieve equal performance at a fraction of the cost.
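
As a rough illustration of what jointly choosing per-socket power and concurrency under a node-level cap can look like, the following Python sketch exhaustively searches a small configuration space. It is not the paper's algorithm. The function names, the throughput formula in `toy_rate`, the per-thread overhead, and the efficiency factors are all hypothetical assumptions made for this example; a real runtime would presumably measure task throughput online rather than evaluate a formula.

```python
from itertools import product


def toy_rate(power_w, threads, efficiency, overhead_w=4.0):
    """Hypothetical throughput model (an assumption, not measured data).

    Each active thread pays a fixed power overhead; the remaining per-thread
    power sets the frequency. `efficiency` stands in for manufacturing
    variability: a less efficient socket runs slower at the same power.
    """
    if threads == 0:
        return 0.0
    usable = max(power_w / threads - overhead_w, 0.0)
    return threads * efficiency * usable ** 0.5


def best_config(node_cap_w, efficiencies, power_steps, thread_options):
    """Try every per-socket (power, threads) combination and keep the one
    with the highest total modeled rate whose summed power fits the cap."""
    per_socket = list(product(power_steps, thread_options))
    best, best_rate = None, -1.0
    for config in product(per_socket, repeat=len(efficiencies)):
        if sum(p for p, _ in config) > node_cap_w:
            continue  # violates the node-level power cap
        rate = sum(toy_rate(p, t, e) for (p, t), e in zip(config, efficiencies))
        if rate > best_rate:
            best, best_rate = config, rate
    return best, best_rate


if __name__ == "__main__":
    # Two sockets, the second one 20% less efficient (illustrative numbers).
    config, rate = best_config(100, (1.0, 0.8), (40, 50, 60), (4, 8))
    print(config, round(rate, 3))
```

Under these toy numbers the search gives the more efficient socket the larger power share and more threads, while the other socket runs fewer threads at lower power. Exhaustive search is only workable for a handful of sockets and settings; it is used here purely to keep the sketch short.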

  • Published in

    ICS '16: Proceedings of the 2016 International Conference on Supercomputing
    June 2016
    547 pages
    ISBN:9781450343619
    DOI:10.1145/2925426

    Copyright © 2016 Public Domain

    This paper is authored by an employee(s) of the United States Government and is in the public domain. Non-exclusive copying or redistribution is allowed, provided that the article citation is given and the authors and agency are clearly identified as its source.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 584 of 2,055 submissions, 28%
