High Performance RDMA-Based MPI Implementation over InfiniBand

  • Published: International Journal of Parallel Programming 32, 167–198 (2004)

Abstract

Although the InfiniBand Architecture is relatively new to high performance computing, it offers many features that can improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA). In this paper, we propose a new design of MPI over InfiniBand that brings the benefits of RDMA not only to large messages, but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation achieves a latency of 6.8 μs for small messages and a peak bandwidth of 871 million bytes/sec. Performance evaluation shows that for small messages, our RDMA-based design reduces latency by 24%, increases bandwidth by over 104%, and reduces host overhead by up to 22% compared with the original send/receive-based design. For large data transfers, we improve performance by reducing the time spent transferring control messages. We also show that the new design benefits MPI collective communication and the NAS Parallel Benchmarks.

Cite this article

Liu, J., Wu, J. & Panda, D.K. High Performance RDMA-Based MPI Implementation over InfiniBand. International Journal of Parallel Programming 32, 167–198 (2004). https://doi.org/10.1023/B:IJPP.0000029272.69895.c1
