Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach

Lu, Xiaoyi; Zhang, Jie; Panda, Dhabaleswar K.

doi:10.1007/978-981-10-5026-8_6

Abstract

Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high-speed interconnects such as InfiniBand. SR-IOV-enabled InfiniBand has been widely used in modern HPC clouds with virtual machines and containers. While SR-IOV can deliver near-native I/O performance, recent studies have shown that locality-aware communication schemes play an important role in achieving high I/O performance on SR-IOV-enabled InfiniBand clusters. To show how to build efficient HPC clouds, this chapter presents a novel approach based on the MVAPICH2 library. We first propose locality-aware designs inside the MVAPICH2 library to achieve near-native performance on HPC clouds with virtual machines and containers. We then propose advanced designs that integrate with cloud resource managers such as OpenStack and Slurm, making it easier for users to deploy and run their applications with the MVAPICH2 library on HPC clouds. Performance evaluations with benchmarks and applications on an OpenStack-based HPC cloud (the NSF-supported Chameleon Cloud) show that MPI applications using our designs achieve near bare-metal performance on HPC clouds across different virtual machine and container deployment scenarios. Compared to running default MPI applications on Amazon EC2, our design delivers much better performance. The MVAPICH2 over HPC Cloud software package presented in this chapter is publicly available from http://mvapich.cse.ohio-state.edu.
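
The idea behind a locality-aware communication scheme can be illustrated with a minimal sketch. This is not the MVAPICH2 implementation: the `Placement` record, the `Channel` names, and the `select_channel` function are hypothetical, and the three-way split (same guest, same host, different host) is an assumption about how such a scheme might rank communication paths. The only point it shows is that peers sharing a physical host may not need to go through the SR-IOV device at all.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    """Hypothetical communication paths, ordered from most to least local."""
    SHARED_MEMORY = "shared-memory"    # peers inside the same VM or container
    INTER_VM_SHMEM = "inter-vm-shmem"  # peers in different guests on one host
    SR_IOV = "sr-iov"                  # peers on different physical hosts


@dataclass(frozen=True)
class Placement:
    """Where an MPI process runs (both fields are illustrative identifiers)."""
    host: str   # physical node name
    guest: str  # VM or container name on that node


def select_channel(a: Placement, b: Placement) -> Channel:
    """Pick the most local path available between two processes."""
    if a.host != b.host:
        # Different physical nodes: traffic must cross the InfiniBand fabric.
        return Channel.SR_IOV
    if a.guest != b.guest:
        # Same node, different guests: assume a host-level shared-memory path.
        return Channel.INTER_VM_SHMEM
    # Same guest: ordinary intra-node shared memory.
    return Channel.SHARED_MEMORY


if __name__ == "__main__":
    p0 = Placement(host="node1", guest="vm1")
    p1 = Placement(host="node1", guest="vm2")
    p2 = Placement(host="node2", guest="vm1")
    print(select_channel(p0, p1).value)
    print(select_channel(p0, p2).value)
```

In this sketch the host comparison comes first because guest names are only meaningful within one node; two guests named `vm1` on different hosts must still communicate over the network.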

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Xiaoyi Lu, Jie Zhang & Dhabaleswar K. Panda

Corresponding author

Correspondence to Xiaoyi Lu.

Editor information

Editors and Affiliations

School of Engineering and Applied Science, Ahmedabad University, Ahmedabad, Gujarat, India
Sanjay Chaudhary
Department of Computer Science and Engineering, Central University of Rajasthan, Ajmer, Rajasthan, India
Gaurav Somani
School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia
Rajkumar Buyya

About this chapter

Cite this chapter

Lu, X., Zhang, J., Panda, D.K. (2017). Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach. In: Chaudhary, S., Somani, G., Buyya, R. (eds) Research Advances in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5026-8_6

DOI: https://doi.org/10.1007/978-981-10-5026-8_6
Published: 28 December 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5025-1
Online ISBN: 978-981-10-5026-8
eBook Packages: Computer Science; Computer Science (R0)
