ABSTRACT
Many modern parallel computing systems are heterogeneous at the node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or the Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures can be challenging. Various parallel programming frameworks exist (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy consumption. To evaluate programming productivity we use our tool CodeStat, which enables us to determine the percentage of code lines required to parallelize the code with a specific framework. We use our tools MeterPU and x-MeterPU to evaluate energy consumption and performance. Experiments are conducted using the industry-standard SPEC ACCEL benchmark suite and the Rodinia benchmark suite for accelerated computing, on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi coprocessor.
- Erika Abraham, Costas Bekas, Ivona Brandic, Samir Genaim, Einar Broch Johnsen, Ivan Kondov, Sabri Pllana, and Achim Streit. 2015. Preparing HPC Applications for Exascale: Challenges and Recommendations. In 18th International Conference on Network-Based Information Systems (NBiS). 401--406. https://doi.org/10.1109/NBiS.2015.61
- Siegfried Benkner, Sabri Pllana, Jesper Larsson Traff, Philippas Tsigas, Uwe Dolinsky, Cedric Augonnet, Beverly Bachmayer, Christoph Kessler, David Moloney, and Vitaly Osipov. 2011. PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems. IEEE Micro 31, 5 (Sept 2011), 28--41.
- Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In IISWC 2009. IEEE, 44--54.
- George Chrysos. 2014. Intel® Xeon Phi™ Coprocessor: The Architecture. Intel Whitepaper (2014).
- Daniel Grzonka, Agnieszka Jakobik, Joanna Kolodziej, and Sabri Pllana. 2017. Using a multi-agent system and artificial intelligence for monitoring and improving the cloud performance and security. Future Generation Computer Systems (2017). https://doi.org/10.1016/j.future.2017.05.046
- Guido Juckeland, William Brantley, Sunita Chandrasekaran, Barbara Chapman, Shuai Che, Mathew Colgrove, Huiyu Feng, Alexander Grund, Robert Henschel, Wen-Mei W. Hwu, et al. 2014. SPEC ACCEL: a standard application suite for measuring hardware accelerator performance. In International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems. Springer, 46--67.
- Christoph Kessler, Usman Dastgeer, Samuel Thibault, Raymond Namyst, Andrew Richards, Uwe Dolinsky, Siegfried Benkner, Jesper Larsson Traff, and Sabri Pllana. 2012. Programmability and performance portability aspects of heterogeneous multi-/manycore systems. In 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE). 1403--1408. https://doi.org/10.1109/DATE.2012.6176582
- Xuechao Li, Po-Chou Shih, Jeffrey Overbey, Cheryl Seals, and Alvin Lim. 2016. Comparing programmer productivity in OpenACC and CUDA: an empirical investigation. International Journal of Computer Science, Engineering and Applications (IJCSEA) 6, 5 (2016), 1--15. https://doi.org/10.5121/ijcsea.2016.6501
- Lu Li and Christoph Kessler. 2016. MeterPU: A Generic Measurement Abstraction API Enabling Energy-tuned Skeleton Backend Selection. Journal of Supercomputing (2016), 1--16. https://doi.org/10.1007/s11227-016-1792-x
- Suejb Memeti and Sabri Pllana. 2015. Accelerating DNA Sequence Analysis Using Intel(R) Xeon Phi(TM). In 2015 IEEE Trustcom/BigDataSE/ISPA, Vol. 3. 222--227.
- Sparsh Mittal and Jeffrey S. Vetter. 2015. A survey of CPU-GPU heterogeneous computing techniques. ACM Computing Surveys (CSUR) 47, 4 (2015), 69.
- NVIDIA. 2016. CUDA C Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/. (September 2016). Accessed: 2017-03-06.
- NVIDIA. 2017. What is GPU-Accelerated Computing? http://www.nvidia.com/object/what-is-gpu-computing.html. (April 2017). Accessed: 2017-04-03.
- OpenMP. 2013. OpenMP 4.0 Specifications. http://www.openmp.org/specifications/. (July 2013). Accessed: 2017-03-10.
- Rodinia. 2015. Rodinia: Accelerating Compute-Intensive Applications with Accelerators. (December 2015). http://www.cs.virginia.edu/skadron/wiki/rodinia/index.php/Rodinia:Accelerating_Compute-Intensive_Applications_with_Accelerators. Accessed: 2017-04-10.
- SPEC. 2017. SPEC ACCEL: Read Me First. https://www.spec.org/accel/docs/readme1st.html#Q11. (February 2017). Accessed: 2017-04-10.
- John E. Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering 12, 3 (2010), 66--73.
- Ching-Lung Su, Po-Yu Chen, Chun-Chieh Lan, Long-Sheng Huang, and Kuo-Hsuan Wu. 2012. Overview and comparison of OpenCL and CUDA technology for GPGPU. In 2012 IEEE Asia Pacific Conference on Circuits and Systems. 448--451. https://doi.org/10.1109/APCCAS.2012.6419068
- Andre Viebke and Sabri Pllana. 2015. The Potential of the Intel(R) Xeon Phi for Supervised Deep Learning. In 2015 IEEE 17th International Conference on High Performance Computing and Communications. 758--765. https://doi.org/10.1109/HPCC-CSS-ICESS.2015.45
- Sandra Wienke, Paul Springer, Christian Terboven, and Dieter an Mey. 2012. OpenACC: First Experiences with Real-world Applications. In Proceedings of the 18th International Conference on Parallel Processing (Euro-Par'12). Springer-Verlag, Berlin, Heidelberg, 859--870.
- Yonghong Yan, Barbara M. Chapman, and Michael Wong. 2015. A comparison of heterogeneous and manycore programming models. https://goo.gl/81A4iV. (March 2015). Accessed: 2017-03-31.
Index Terms: Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption