From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives

Juckeland, Guido; Hernandez, Oscar; Jacob, Arpith C.; Neilson, Daniel; Larrea, Verónica G. Vergara; Wienke, Sandra; Bobyr, Alexander; Brantley, William C.; Chandrasekaran, Sunita; Colgrove, Mathew; Grund, Alexander; Henschel, Robert; Joubert, Wayne; Müller, Matthias S.; Raddatz, Dave; Shelepugin, Pavel; Whitney, Brian; Wang, Bo; Kumaran, Kalyan

doi:10.1007/978-3-319-46079-6_33

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

Current and next-generation HPC systems will exploit accelerators and self-hosting devices within their compute nodes to accelerate applications. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. One of the goals of OpenMP and OpenACC is to let the user specify parallelism via directives so that compilers can generate device-specific code and optimizations. However, porting codes becomes more complex because different architectures offer different types of parallelism and memory hierarchies. In this paper we discuss our experience porting the SPEC ACCEL benchmarks from OpenACC to OpenMP 4.5 using a performance-portable style that lets the compiler make platform-specific optimizations to achieve good performance on a variety of systems. The ported SPEC ACCEL OpenMP benchmarks were validated on different platforms, including Xeon Phi, GPUs, and CPUs. We believe that this experience can help the community and compiler vendors understand how users plan to write OpenMP 4.5 applications in a performance-portable style.
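To illustrate the kind of translation the abstract describes, here is a minimal sketch that is not taken from the SPEC ACCEL sources: a hypothetical SAXPY kernel written in OpenACC (kept in a comment) and one possible OpenMP 4.5 equivalent. The function name `saxpy_omp_target` and the choice of constructs are assumptions for illustration. The OpenACC `kernels` construct only describes the loop as parallelizable, whereas the OpenMP version prescribes the levels of parallelism (`teams distribute parallel for`) but still leaves the number of teams and threads to the implementation, which is one way to keep the code performance portable.

```c
/* Hypothetical example kernel (SAXPY), not taken from SPEC ACCEL.
 *
 * OpenACC original, kept as a comment so this file builds with any C
 * compiler. The "kernels" construct describes the loop as a candidate for
 * parallelization; the compiler decides how to map it onto the device:
 *
 *   #pragma acc data copyin(x[0:n]) copy(y[0:n])
 *   {
 *     #pragma acc kernels loop
 *     for (int i = 0; i < n; ++i)
 *       y[i] = a * x[i] + y[i];
 *   }
 */

/* OpenMP 4.5 translation: the directive prescribes the parallelism levels
 * (a league of teams, then threads within each team), while the number of
 * teams and threads per team is left to the compiler and runtime. */
void saxpy_omp_target(int n, float a, const float *x, float *y)
{
    /* Data clauses: OpenACC copyin -> map(to:), copy -> map(tofrom:) */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* x and y are already present on the device via the enclosing
         * target data region; scalars a and n are firstprivate by default. */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the loop runs serially on the host; compiled with OpenMP but with no offload device available, the target region falls back to host execution. In both cases the result in `y` is the same, which is part of what makes this directive style attractive for a portable benchmark suite.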

Acknowledgments

The authors thank Cloyce Spradling for his work on the SPEC harness as well as the SPEC POWER group for their work on enabling the integration of power measurements into other SPEC suites.

SPEC®, SPEC ACCEL™, SPEC CPU™, SPEC MPI®, and SPEC OMP® are registered trademarks of the Standard Performance Evaluation Corporation (SPEC).

Author information

Authors and Affiliations

SPEC High Performance Group (HPG), Gainesville, USA
Guido Juckeland, Oscar Hernandez, Arpith C. Jacob, Daniel Neilson, Verónica G. Vergara Larrea, Sandra Wienke, Alexander Bobyr, William C. Brantley, Sunita Chandrasekaran, Mathew Colgrove, Alexander Grund, Robert Henschel, Wayne Joubert, Matthias S. Müller, Dave Raddatz, Pavel Shelepugin, Brian Whitney, Bo Wang & Kalyan Kumaran
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
Guido Juckeland & Alexander Grund
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Oscar Hernandez, Verónica G. Vergara Larrea & Wayne Joubert
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Arpith C. Jacob
IBM, Markham, ON, Canada
Daniel Neilson
RWTH Aachen University, Aachen, Germany
Sandra Wienke, Matthias S. Müller & Bo Wang
Intel, Nizhny Novgorod, Russia
Alexander Bobyr & Pavel Shelepugin
AMD, Austin, TX, USA
William C. Brantley
University of Delaware, Newark, DE, USA
Sunita Chandrasekaran
NVIDIA, Santa Clara, CA, USA
Mathew Colgrove
Indiana University, Bloomington, IN, USA
Robert Henschel
SGI, Milpitas, CA, USA
Dave Raddatz
Oracle, Redwood Shores, CA, USA
Brian Whitney
Argonne National Laboratory, Lemont, IL, USA
Kalyan Kumaran

Corresponding author

Correspondence to Guido Juckeland.

Editor information

Editors and Affiliations

University of Delaware, Newark, Delaware, USA
Michela Taufer
Forschungszentrum Jülich, Jülich, Germany
Bernd Mohr
DKRZ, Hamburg, Germany
Julian M. Kunkel

About this paper

Cite this paper

Juckeland, G. et al. (2016). From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_33

DOI: https://doi.org/10.1007/978-3-319-46079-6_33
Published: 06 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science, Computer Science (R0)