
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters

Published: 17 November 2013

ABSTRACT

Xeon Phi, based on the Intel Many Integrated Core (MIC) architecture, packs up to 1 TFLOP/s of performance on a single chip while providing x86_64 compatibility. InfiniBand, meanwhile, is one of the most popular interconnects for supercomputing systems. The software stack on Xeon Phi allows processes to directly access the InfiniBand HCA on the node and thus provides a low-latency path for internode communication. However, limitations in state-of-the-art chipsets such as Sandy Bridge restrict the bandwidth available for these transfers. In this paper, we propose MVAPICH-PRISM, a novel proxy-based framework to optimize communication performance on such systems. We present several designs and evaluate them using micro-benchmarks and application kernels. Our designs improve internode latency between Xeon Phi processes by up to 65% and internode bandwidth by up to five times. They improve the performance of the MPI_Alltoall operation by up to 65% with 256 processes, and improve the performance of a 3D stencil communication kernel and the P3DFFT library by 56% and 22% with 1,024 and 512 processes, respectively.
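The proxy designs described in the abstract are transparent to the MPI application: the same two-sided MPI calls are used, and the framework redirects the data path through the host. As a rough illustration of how the reported internode latency between Xeon Phi processes is typically measured, the following is a minimal MPI ping-pong micro-benchmark sketch in C. This is not the authors' benchmark code; the message size, iteration count, and two-rank setup are illustrative assumptions.

/* Minimal MPI ping-pong latency sketch (illustrative, not the paper's code).
 * Run with exactly two ranks, e.g. one rank per Xeon Phi on two nodes. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_SIZE 8      /* small message: latency-bound, assumed size */
#define ITERS    1000   /* assumed iteration count */

int main(int argc, char **argv)
{
    int rank;
    char buf[MSG_SIZE];
    memset(buf, 0, sizeof buf);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0)  /* one-way latency is half the round-trip time */
        printf("avg one-way latency: %.2f us\n",
               elapsed / (2.0 * ITERS) * 1e6);

    MPI_Finalize();
    return 0;
}

Launched with two ranks (e.g., mpirun -np 2 with one rank placed on each node's coprocessor), the same unmodified binary exercises the internode path that a proxy-based framework like MVAPICH-PRISM stages through the host.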


Published in

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013, 1123 pages
ISBN: 9781450323789
DOI: 10.1145/2503210
General Chair: William Gropp; Program Chair: Satoshi Matsuoka

          Copyright © 2013 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

          • research-article

          Acceptance Rates

SC '13 paper acceptance rate: 91 of 449 submissions (20%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
