
Generic topology mapping strategies for large-scale parallel architectures

Authors:
Torsten Hoefler, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Marc Snir, University of Illinois at Urbana-Champaign, Urbana, IL, USA

ICS '11: Proceedings of the international conference on Supercomputing, May 2011, Pages 75–84
https://doi.org/10.1145/1995896.1995909

Published: 31 May 2011

ABSTRACT

The steadily increasing number of nodes in high-performance computing systems and the technology and power constraints lead to sparse network topologies. Efficient mapping of application communication patterns to the network topology gains importance as systems grow to petascale and beyond. Such mapping is supported in parallel programming frameworks such as MPI, but is often not well implemented. We show that the topology mapping problem is NP-complete and analyze and compare different practical topology mapping heuristics. We demonstrate an efficient and fast new heuristic which is based on graph similarity and show its utility with application communication patterns on real topologies. Our mapping strategies support heterogeneous networks and show significant reduction of congestion on torus, fat-tree, and the PERCS network topologies, for irregular communication patterns. We also demonstrate that the benefit of topology mapping grows with the network size and show how our algorithms can be used in a practical setting to optimize communication performance. Our efficient topology mapping strategies are shown to reduce network congestion by up to 80%, reduce average dilation by up to 50%, and improve benchmarked communication performance by 18%.
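The two quality metrics named in the abstract, congestion and dilation, can be illustrated with a small sketch. The definitions below are assumptions chosen for illustration and are not taken from the paper: the dilation of a communication edge is taken to be the hop count of its route, the congestion of a link is the total traffic volume routed over it, and routes are single shortest paths found by breadth-first search. The helper names `ring`, `shortest_path`, and `evaluate_mapping` are hypothetical.

```python
from collections import deque


def ring(n):
    """Build an undirected ring (1-D torus) with n nodes as an adjacency dict."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def shortest_path(topology, src, dst):
    """Return one shortest path from src to dst as a list of nodes (BFS)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in topology[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        raise ValueError(f"no route from {src} to {dst}")
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def evaluate_mapping(topology, traffic, mapping):
    """Return (max link congestion, average dilation) for a process-to-node mapping.

    traffic maps (source process, destination process) to a volume.
    Assumption: both directions of a link share one load counter.
    """
    link_load = {}
    total_hops = 0
    for (p, q), volume in traffic.items():
        path = shortest_path(topology, mapping[p], mapping[q])
        total_hops += len(path) - 1  # dilation of this communication edge
        for u, v in zip(path, path[1:]):
            link = (min(u, v), max(u, v))
            link_load[link] = link_load.get(link, 0) + volume
    avg_dilation = total_hops / len(traffic)
    max_congestion = max(link_load.values(), default=0)
    return max_congestion, avg_dilation


if __name__ == "__main__":
    n = 8
    topology = ring(n)
    # Nearest-neighbor pattern: process p sends one unit to process p + 1.
    traffic = {(p, (p + 1) % n): 1 for p in range(n)}

    identity = {p: p for p in range(n)}
    scattered = {p: (3 * p) % n for p in range(n)}  # a poor but valid placement

    print("identity :", evaluate_mapping(topology, traffic, identity))
    print("scattered:", evaluate_mapping(topology, traffic, scattered))
```

In this toy setup the identity placement keeps every message on a single link, while the scattered placement stretches each message over three links and triples the load on every link. A mapping heuristic would search for placements that lower these two numbers; the sketch only scores a given placement and does not implement any heuristic from the paper.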

Index Terms

Generic topology mapping strategies for large-scale parallel architectures
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Published in
ICS '11: Proceedings of the international conference on Supercomputing
May 2011
398 pages
ISBN: 9781450301022
DOI: 10.1145/1995896
General Chair: David K. Lowenthal, University of Arizona
Program Chairs: Bronis R. de Supinski, Lawrence Livermore National Laboratory; Sally A. McKee, Chalmers University of Technology
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
mpi graph topologies
topology mapping
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 584 of 2,055 submissions, 28%

Article Metrics
- Total Citations: 156
- Total Downloads: 861
- Downloads (Last 12 months): 66
- Downloads (Last 6 weeks): 6