ABSTRACT
The power demands of current large-scale systems keep increasing, to the point that power has become a major strain on facilities and budgets. Researchers in academia, national laboratories, and industry are working to overcome this "power wall," striving to balance performance against power consumption. Some commodity processors support power capping, which opens new opportunities for applications to manage their power behavior directly at user level. However, while power capping guarantees that a system never exceeds a given power limit, it also introduces a new form of heterogeneity: natural manufacturing variability, previously hidden by varying the supplied power to achieve homogeneous performance, now results in heterogeneous performance, because the hardware enforces the power limit by running cores, potentially each one, at different CPU frequencies.
In this work we show how a parallel runtime system can effectively handle this new kind of performance heterogeneity by compensating for the uneven effects of power capping. On a NUMA node composed of several multi-core sockets, our system optimizes the energy and concurrency levels assigned to each socket to maximize performance. Because it operates transparently within the parallel runtime system, it requires no programmer intervention, such as changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it achieves equal performance at a fraction of the cost.
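The core idea, redistributing a fixed node-level power budget so that sockets slowed by manufacturing variability receive a larger share, can be sketched as follows. This is a minimal illustrative heuristic, not the paper's actual algorithm: the function name, the inverse-throughput weighting, and the `floor_w` parameter are assumptions made for the example.

```python
def rebalance_power(total_budget_w, throughputs, floor_w=10.0):
    """Split a node-level power budget (watts) across sockets.

    Illustrative heuristic only: each socket's share is inversely
    proportional to its measured throughput under a uniform cap, so
    slower sockets (e.g. weaker silicon) receive more power. A floor
    keeps every socket above a minimum operating cap.
    """
    weights = [1.0 / t for t in throughputs]
    total_weight = sum(weights)
    caps = [max(floor_w, total_budget_w * w / total_weight)
            for w in weights]
    # Renormalize so the floor does not push the sum over the budget.
    scale = total_budget_w / sum(caps)
    return [c * scale for c in caps]
```

In a real runtime, the measured throughputs would come from online profiling of each socket under an equal per-socket cap, and the resulting caps would be applied through a hardware power-capping interface such as RAPL, together with an adjustment of the number of worker threads per socket.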