DOI: 10.1145/3110355.3110356

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption

Published: 28 July 2017

ABSTRACT

Many modern parallel computing systems are heterogeneous at the node level. Such nodes may combine general-purpose CPUs with accelerators (such as a GPU or an Intel Xeon Phi) that deliver high performance with favorable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures can be challenging. Various parallel programming frameworks exist (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that suits a target context is not straightforward. In this paper, we empirically study the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy consumption. To evaluate programming productivity, we use our tool CodeStat, which determines the percentage of code lines required to parallelize a program with a specific framework. We use our tools MeterPU and x-MeterPU to evaluate energy consumption and performance. Experiments are conducted using the industry-standard SPEC ACCEL benchmark suite and the Rodinia benchmark suite for accelerated computing, on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi co-processor.
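The comparison can be made concrete with a small example. The sketch below is not taken from the paper; it is a minimal, illustrative vector addition (function names are our own) showing how the directive-based models (OpenMP 4.x target offload and OpenACC) and CUDA express the same computation. A CodeStat-style count, i.e. framework-specific lines divided by total lines of code times 100, would attribute essentially one pragma line each to OpenMP and OpenACC, whereas the CUDA variant also contributes a kernel, explicit device-memory management, and a launch.

// Illustrative sketch only (not from the paper). Compiling all three variants in one
// translation unit assumes a toolchain with CUDA plus OpenMP/OpenACC support; compilers
// without such support typically ignore the unknown pragmas with a warning.
#include <cuda_runtime.h>
#include <stdio.h>

// OpenMP 4.x offload version: the single pragma line is the framework-specific code.
void vadd_openmp(const float *a, const float *b, float *c, int n) {
    #pragma omp target teams distribute parallel for map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// OpenACC version: likewise a single directive (honored only by an OpenACC compiler).
void vadd_openacc(const float *a, const float *b, float *c, int n) {
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// CUDA version: the kernel, memory management, and launch are all framework-specific lines.
__global__ void vadd_kernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

void vadd_cuda(const float *a, const float *b, float *c, int n) {
    float *da, *db, *dc;
    size_t bytes = (size_t)n * sizeof(float);
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    vadd_kernel<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
}

int main(void) {
    enum { N = 1 << 20 };
    static float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * (float)i; }
    vadd_cuda(a, b, c, N);          // the OpenMP/OpenACC variants are drop-in replacements
    printf("c[42] = %f\n", c[42]);  // expected: 126.000000
    return 0;
}

Which of these trade-offs pays off in practice is exactly what the paper quantifies with CodeStat (productivity) and MeterPU/x-MeterPU (performance and energy).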

Published in

ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing
July 2017, 38 pages
ISBN: 9781450351164
DOI: 10.1145/3110355
General Chairs: Florin Pop, Radu Prodan, Marc Frincu

          Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ARMS-CC '17 Paper Acceptance Rate: 4 of 11 submissions (36%). Overall Acceptance Rate: 4 of 11 submissions (36%).
