High Performance RDMA-Based MPI Implementation over InfiniBand

  • Published: International Journal of Parallel Programming 32, 167–198 (2004)

Abstract

Although the InfiniBand Architecture is relatively new to high performance computing, it offers many features that can improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA). In this paper, we propose a new design of MPI over InfiniBand that brings the benefits of RDMA not only to large messages, but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation achieves a latency of 6.8 μs for small messages and a peak bandwidth of 871 million bytes/sec. Performance evaluation shows that for small messages, our RDMA-based design reduces latency by 24%, increases bandwidth by over 104%, and reduces host overhead by up to 22% compared with the original send/receive-based design. For large data transfers, we improve performance by reducing the time spent transferring control messages. We also show that the new design benefits MPI collective communication and the NAS Parallel Benchmarks.

Cite this article

Liu, J., Wu, J. & Panda, D.K. High Performance RDMA-Based MPI Implementation over InfiniBand. International Journal of Parallel Programming 32, 167–198 (2004). https://doi.org/10.1023/B:IJPP.0000029272.69895.c1
