A Performance Study of Quantum ESPRESSO’s PWscf Code on Multi-core and GPU Systems

Romero, Joshua; Phillips, Everett; Ruetsch, Gregory; Fatica, Massimiliano; Spiga, Filippo; Giannozzi, Paolo

doi:10.1007/978-3-319-72971-8_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10724)

Included in the following conference series:

International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

Abstract

We describe the porting of PWscf (Plane-Wave Self-Consistent Field), a key component of the Quantum ESPRESSO open-source suite of codes for materials modeling, to GPU systems using CUDA Fortran. Kernel loop directives (CUF kernels) are used extensively so that a single source code serves both the CPU and GPU implementations. The results of the GPU version have been carefully validated, and the performance of the code on several GPU systems (both x86 and POWER8 based) has been compared with traditional Intel multi-core (CPU-only) systems. On two input cases widely used as benchmarks on small and large high-performance computing systems, the current GPU version can reduce the time-to-solution by an average factor of 2–3.
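
To make the "factor of 2–3" figure concrete, the following minimal sketch shows one way a time-to-solution reduction factor could be computed from wall-clock times. The case names and timings are hypothetical placeholders, not measurements from the paper, and the use of a geometric mean to average the per-case ratios is an assumption; the abstract does not say how the average was taken.

```python
from statistics import geometric_mean


def reduction_factor(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return the ratio of CPU-only to GPU wall time for the same input case."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("wall times must be positive")
    return cpu_seconds / gpu_seconds


# Hypothetical wall times in seconds: (CPU-only run, GPU run).
# These are illustrative values, NOT results reported by the authors.
hypothetical_runs = {
    "case_a": (600.0, 300.0),
    "case_b": (900.0, 300.0),
}

# Per-case reduction in time-to-solution.
factors = {
    name: reduction_factor(cpu, gpu)
    for name, (cpu, gpu) in hypothetical_runs.items()
}

# Averaging ratios with a geometric mean is an assumption made here.
average_factor = geometric_mean(list(factors.values()))
```

A factor above 1 means the GPU run reached the solution faster. A geometric mean is a common choice for ratios because it treats a 2x gain and a 2x loss symmetrically.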

Acknowledgments

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was also supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID g33. Wilkes-2 is part of the Cambridge Service for Data Driven Discovery (CSD3) system operated by the University of Cambridge Research Computing Service funded by EPSRC Tier-2 capital grant EP/P020259/1, the STFC DiRAC HPC Facility (BIS National E-infrastructure capital grant ST/K001590/1, STFC capital grants ST/H008861/1 and ST/H00887X/1, Operations grant ST/K00333X/1) and the University of Cambridge. CSD3 and DiRAC are part of the UK National e-Infrastructure. Paolo Giannozzi also acknowledges support from the European Union through the MaX Centre of Excellence (Grant No. 676598).

Author information

Authors and Affiliations

NVIDIA Corporation, Santa Clara, USA
Joshua Romero, Everett Phillips, Gregory Ruetsch & Massimiliano Fatica
Research Computing Service, University of Cambridge, Cambridge, UK
Filippo Spiga
Dip. Scienze Matematiche Informatiche e Fisiche, University of Udine, Udine, Italy
Paolo Giannozzi

Corresponding author

Correspondence to Joshua Romero.

Editor information

Editors and Affiliations

University of Warwick, Coventry, United Kingdom
Stephen Jarvis
University of Warwick, Coventry, United Kingdom
Steven Wright
Sandia National Laboratories, Albuquerque, New Mexico, USA
Simon Hammond

About this paper

Cite this paper

Romero, J., Phillips, E., Ruetsch, G., Fatica, M., Spiga, F., Giannozzi, P. (2018). A Performance Study of Quantum ESPRESSO’s PWscf Code on Multi-core and GPU Systems. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2017. Lecture Notes in Computer Science, vol 10724. Springer, Cham. https://doi.org/10.1007/978-3-319-72971-8_4

DOI: https://doi.org/10.1007/978-3-319-72971-8_4
Published: 23 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72970-1
Online ISBN: 978-3-319-72971-8
eBook Packages: Computer Science, Computer Science (R0)
