Abstract
Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high-speed interconnects such as InfiniBand. SR-IOV-enabled InfiniBand is widely used in modern HPC clouds with virtual machines and containers. While SR-IOV can deliver near-native I/O performance, recent studies have shown that locality-aware communication schemes play an important role in achieving high I/O performance on SR-IOV-enabled InfiniBand clusters. To show how to build efficient HPC clouds, this chapter presents a novel approach based on the MVAPICH2 library. We first propose locality-aware designs inside the MVAPICH2 library to achieve near-native performance on HPC clouds with virtual machines and containers. We then propose advanced designs that integrate with cloud resource managers such as OpenStack and Slurm, making it easier for users to deploy and run their applications with the MVAPICH2 library on HPC clouds. Performance evaluations with benchmarks and applications on an OpenStack-based HPC cloud (the NSF-supported Chameleon Cloud) show that MPI applications using our designs achieve near bare-metal performance across different virtual machine and container deployment scenarios. Compared to running default MPI applications on Amazon EC2, our designs can deliver much better performance. The MVAPICH2 over HPC Cloud software package presented in this chapter is publicly available from http://mvapich.cse.ohio-state.edu.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Lu, X., Zhang, J., Panda, D.K. (2017). Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach. In: Chaudhary, S., Somani, G., Buyya, R. (eds) Research Advances in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5026-8_6
DOI: https://doi.org/10.1007/978-981-10-5026-8_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5025-1
Online ISBN: 978-981-10-5026-8
eBook Packages: Computer Science, Computer Science (R0)