Skip to main content

A Performance Study of Quantum ESPRESSO’s PWscf Code on Multi-core and GPU Systems

  • Conference paper
  • First Online:
High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation (PMBS 2017)

Abstract

We describe the porting of PWscf (Plane-Wave Self Consistent Field), a key component of the Quantum ESPRESSO open-source suite of codes for materials modeling, to GPU systems using CUDA Fortran. Kernel loop directives (CUF kernels) have been extensively used in order to have a single source code for both CPU and GPU implementations. The results of the GPU version have been carefully validated and the performance of the code on several GPU systems (both x86 and POWER8 based) has been compared with traditional Intel multi-core (CPU only) systems. This current GPU version can reduce the time-to-solution by an average factor of 2–3 running two different input cases widely used as benchmarks on small and large high performance computing systems.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Auckenthaler, T., Blum, V., Bungartz, H.J., Huckle, T., Johanni, R., Krämer, L., Lang, B., Lederer, H., Willems, P.R.: Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations. Parallel Comput. 37(12), 783–794 (2011)

    Article  Google Scholar 

  2. Blackford, L.S., Choi, J., Cleary, A., D’Azeuedo, E., Demmel, J., Dhillon, I., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK User’s Guide. Society for Industrial and Applied Mathematics (1997)

    Google Scholar 

  3. Fatica, M.: Customize CUDA Fortran Profiling with NVTX (2015). https://devblogs.nvidia.com/parallelforall/customize-cuda-fortran-profiling-nvtx

  4. Fatica, M., Ruetsch, G.: CUDA Fortran for Scientists and Engineers. Morgan Kaufmann, Burlington (2014)

    Google Scholar 

  5. Froyen, S.: Brillouin-zone integration by Fourier quadrature: special points for superlattice and supercell calculations. Phys. Rev. B 39, 3168–3172 (1989)

    Article  Google Scholar 

  6. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., et al.: QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condensed Matter 21(39), 395502 (2009)

    Article  Google Scholar 

  7. Dongarra, J., Gates, M., Haidar, A., Kurzak, J., Luszczek, P., Tomov, S., Yamazaki, I.: Accelerating numerical dense linear algebra calculations with GPUs. In: Kindratenko, V. (ed.) Numerical Computations with GPUs, pp. 3–28. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06548-9_1

    Google Scholar 

  8. Johnson, D.D.: Modified Broyden’s method for accelerating convergence in self-consistent calculations. Phys. Rev. B 38, 12807–12813 (1988)

    Article  Google Scholar 

  9. Kohn, W.: Fundamentals of density functional theory. In: Joubert, D. (ed.) Density Functionals: Theory and Applications, pp. 1–7. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0106731

    Google Scholar 

  10. Kraus, J.: CUDA Pro Tip: generate custom application profile timelines with NVTX (2013). https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx

  11. Marek, A., Blum, V., Johanni, R., Havu, V., Lang, B., Auckenthaler, T., Heinecke, A., Bungartz, H.J., Lederer, H.: The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science. J. Phys. Condensed Matter 26(21), 213201 (2014)

    Article  Google Scholar 

  12. Message Passing Interface Forum: MPI: A Message-Passing Interface Standard, Version 2.2. Technical report (2009). http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf

  13. Parr, R.G., Yang, W.: Density-Functional Theory of Atoms and Molecules (International Series of Monographs on Chemistry). Oxford University Press, New York (1994)

    Google Scholar 

  14. Pickett, W.E.: Pseudopotential methods in condensed matter applications. Comput. Phys. Rep. 9(3), 115–197 (1989)

    Article  Google Scholar 

  15. Romero, J.: Developing an Improved Generalized Eigensolver with Limited CPU Offloading. In: GPU Technology Conference, San Jose, CA (2017). http://on-demand.gputechconf.com/gtc/2017/presentation/s7388-joshua-romero-developing-an-improved-generalized-eigensolver.pdf

  16. Spiga, F.: Plug-in code to accelerate Quantum ESPRESSO v5 using NVIDIA GPU. https://github.com/fspiga/qe-gpu-plugin

  17. Spiga, F.: Implementing and testing mixed parallel programming model into Quantum ESPRESSO. In: Science and Supercomputing in Europe - Research Highlights 2009, CINECA Consorzio Interuniversitario, Bologna, Italy (2010)

    Google Scholar 

  18. Spiga, F., Girotto, I.: phiGEMM: a CPU-GPU library for porting Quantum ESPRESSO on hybrid systems. In: 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp. 368–375 (2012)

    Google Scholar 

Download references

Acknowledgments

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was also supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID g33. Wilkes-2 is part of the Cambridge Service for Data Driven Discovery (CSD3) system operated by the University of Cambridge Research Computing Service funded by EPSRC Tier-2 capital grant EP/P020259/1, the STFC DiRAC HPC Facility (BIS National E-infrastructure capital grant ST/K001590/1, STFC capital grants ST/H008861/1 and ST/H00887X/1, Operations grant ST/K00333X/1) and the University of Cambridge. CSD3 and DiRAC are part of the UK National e-Infrastructure. Paolo Giannozzi also acknowledges support from the European Union through the MaX Centre of Excellence (Grant No. 676598).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Joshua Romero .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Romero, J., Phillips, E., Ruetsch, G., Fatica, M., Spiga, F., Giannozzi, P. (2018). A Performance Study of Quantum ESPRESSO’s PWscf Code on Multi-core and GPU Systems. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2017. Lecture Notes in Computer Science(), vol 10724. Springer, Cham. https://doi.org/10.1007/978-3-319-72971-8_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-72971-8_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72970-1

  • Online ISBN: 978-3-319-72971-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics