DOI: 10.1145/1995896.1995909
research-article

Generic topology mapping strategies for large-scale parallel architectures

Published: 31 May 2011

ABSTRACT

The steadily increasing number of nodes in high-performance computing systems and the technology and power constraints lead to sparse network topologies. Efficient mapping of application communication patterns to the network topology gains importance as systems grow to petascale and beyond. Such mapping is supported in parallel programming frameworks such as MPI, but is often not well implemented. We show that the topology mapping problem is NP-complete and analyze and compare different practical topology mapping heuristics. We demonstrate an efficient and fast new heuristic which is based on graph similarity and show its utility with application communication patterns on real topologies. Our mapping strategies support heterogeneous networks and show significant reduction of congestion on torus, fat-tree, and the PERCS network topologies, for irregular communication patterns. We also demonstrate that the benefit of topology mapping grows with the network size and show how our algorithms can be used in a practical setting to optimize communication performance. Our efficient topology mapping strategies are shown to reduce network congestion by up to 80%, reduce average dilation by up to 50%, and improve benchmarked communication performance by 18%.
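
The two quality measures named in the abstract, congestion and dilation, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common definitions (dilation of a communication edge is the hop count of its route; congestion of a link is the total traffic volume routed across it), unit link bandwidth, BFS shortest-path routing, and a small ring topology chosen only for illustration. The names `ring_topology`, `shortest_path`, and `evaluate_mapping` are hypothetical.

```python
from collections import defaultdict, deque


def ring_topology(n):
    """Adjacency list of an n-node ring (illustrative topology, an assumption)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def shortest_path(adj, src, dst):
    """Return one BFS shortest path from src to dst as a list of nodes."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def evaluate_mapping(adj, traffic, mapping):
    """Return (average dilation, maximum congestion) of a process-to-node mapping.

    traffic: {(sender, receiver): volume}; mapping: {process: node}.
    Assumes unit link bandwidth, so congestion equals the routed volume.
    """
    link_load = defaultdict(float)
    total_hops = 0
    for (p, q), volume in traffic.items():
        path = shortest_path(adj, mapping[p], mapping[q])
        total_hops += len(path) - 1           # dilation of this edge
        for u, v in zip(path, path[1:]):
            link_load[(u, v)] += volume       # load on the directed link u -> v
    avg_dilation = total_hops / len(traffic)
    max_congestion = max(link_load.values(), default=0.0)
    return avg_dilation, max_congestion


n = 8
ring = ring_topology(n)
# Each process sends one unit of data to its successor (ring-shaped pattern).
traffic = {(p, (p + 1) % n): 1.0 for p in range(n)}

identity = {p: p for p in range(n)}             # pattern matches the topology
scrambled = {p: (3 * p) % n for p in range(n)}  # neighbors end up 3 hops apart

print(evaluate_mapping(ring, traffic, identity))   # (1.0, 1.0)
print(evaluate_mapping(ring, traffic, scrambled))  # (3.0, 3.0)
```

In this toy case the poor mapping triples both the average dilation and the maximum congestion, which is the kind of gap a mapping heuristic tries to close.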

      • Published in

        ICS '11: Proceedings of the international conference on Supercomputing
        May 2011
        398 pages
        ISBN: 9781450301022
        DOI: 10.1145/1995896

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 584 of 2,055 submissions, 28%
