Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems

Steuwer, Michel; Friese, Malte; Albers, Sebastian; Gorlatch, Sergei

doi:10.1007/s10766-013-0265-6

Published: 24 August 2013

International Journal of Parallel Programming, Volume 42, pages 601–618 (2014)

Abstract

Algorithmic skeletons simplify software development: they abstract typical patterns of parallelism and provide efficient implementations of them, allowing the application developer to focus on the structure of algorithms rather than on implementation details. This becomes especially important for modern parallel systems with multiple graphics processing units (GPUs), whose programming is complex and error-prone because state-of-the-art programming approaches like CUDA and OpenCL lack high-level abstractions. We define a new algorithmic skeleton for allpairs computations, which occur in real-world applications ranging from bioinformatics to physics. We develop the skeleton’s generic parallel implementation for multi-GPU systems in OpenCL. To enable the automatic use of the fast GPU memory, we identify and implement an optimized version of the allpairs skeleton with a customizing function that follows a certain memory access pattern. We use matrix multiplication as an application study for the allpairs skeleton and its two implementations and demonstrate that the skeleton greatly simplifies programming, saving up to 90% of lines of code compared with OpenCL. The performance of our optimized implementation is up to 6.8 times higher than that of the generic implementation and is competitive with the performance of manually written, optimized OpenCL code.
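To make the idea of an allpairs computation concrete, the following is a minimal sequential sketch in Python. It is not the OpenCL implementation described in the article. It assumes that the skeleton applies a user-supplied customizing function to every pair consisting of one row from a first matrix and one row from a second matrix. The names `allpairs`, `dot`, `transpose`, and `matmul` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def allpairs(f: Callable[[Vector, Vector], float], a: Matrix, b: Matrix) -> Matrix:
    """Apply f to every pair (row i of a, row j of b).

    The result has len(a) rows and len(b) columns. This is a sequential
    reference; a skeleton library would run the pairs in parallel.
    """
    return [[f(row_a, row_b) for row_b in b] for row_a in a]


def dot(x: Vector, y: Vector) -> float:
    """Customizing function: dot product of two equally long vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def transpose(m: Matrix) -> Matrix:
    """Turn the columns of m into rows."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix multiplication as allpairs over rows of a and columns of b."""
    return allpairs(dot, a, transpose(b))
```

Under these assumptions, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` pairs each row of the first matrix with each column of the second. Swapping `dot` for another pairwise function, such as a distance measure, would reuse the same `allpairs` structure unchanged.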

Acknowledgments

We thank the anonymous reviewers for their valuable comments and NVIDIA for donating hardware.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Muenster, Münster, Germany
Michel Steuwer, Malte Friese, Sebastian Albers & Sergei Gorlatch

Corresponding author

Correspondence to Michel Steuwer.

About this article

Cite this article

Steuwer, M., Friese, M., Albers, S. et al. Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems. Int J Parallel Prog 42, 601–618 (2014). https://doi.org/10.1007/s10766-013-0265-6

Received: 02 March 2013
Accepted: 02 August 2013
Published: 24 August 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s10766-013-0265-6
