NUMA-aware image compositing on multi-GPU platform

Wang, Pan; Cheng, Zhiquan; Martin, Ralph; Liu, Huahai; Cai, Xun; Li, Sikun

doi:10.1007/s00371-013-0803-7

Original Article
Published: 26 April 2013

Volume 29, pages 639–649, (2013)

The Visual Computer

Pan Wang¹,
Zhiquan Cheng¹,
Ralph Martin²,
Huahai Liu¹,
Xun Cai¹ &
Sikun Li¹

Abstract

Sort-last parallel rendering is widely used. Recent GPU developments mean that a PC equipped with multiple GPUs is a viable alternative to a high-cost supercomputer: the Fermi architecture of a single GPU supports unified virtual addressing, providing a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes require users to reconsider parallel rendering algorithms. In this paper, we thoroughly investigate NUMA-aware image compositing, the key final stage in sort-last parallel rendering. Based on the proven radix-k strategy, we find an optimal compositing algorithm that takes advantage of the NUMA architecture of the multi-GPU platform. We quantitatively analyze different image compositing modes for practical image compositing, taking into account peer-to-peer communication costs between GPUs. Our experiments on various datasets show that our image compositing method is very fast: an image of a few megapixels can be composited in about 10 ms by eight GPUs.
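To make the radix-k strategy named in the abstract concrete, the following is a minimal single-process sketch of a generic radix-k schedule. It is not the paper's NUMA-aware algorithm and involves no GPU code or peer-to-peer transfers. The names `over`, `radix_k_composite` and `ks` are hypothetical, and each partial image is assumed to be a flat list of premultiplied `(color, alpha)` pixels, with ranks ordered front to back.

```python
from functools import reduce


def over(front, back):
    """Porter-Duff 'over' for premultiplied (color, alpha) pixels."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)


def radix_k_composite(images, ks):
    """Simulate radix-k compositing of per-rank partial images.

    images[r] is rank r's full-size partial image (rank 0 is frontmost).
    ks lists the group size used in each round; its product must equal
    the number of ranks. Returns the assembled final image.
    """
    p, n = len(images), len(images[0])
    if reduce(lambda a, b: a * b, ks, 1) != p:
        raise ValueError("product of ks must equal the number of images")

    regions = [(0, n)] * p                 # pixel range each rank holds
    data = [list(img) for img in images]   # pixels for that range
    stride = 1
    for k in ks:
        new_regions, new_data = [None] * p, [None] * p
        for r in range(p):
            digit = (r // stride) % k      # position of r inside its group
            base = r - digit * stride
            group = [base + j * stride for j in range(k)]  # front to back
            lo, hi = regions[r]            # shared by the whole group
            size = hi - lo
            # Rank r becomes the owner of piece `digit` of the region.
            plo = lo + digit * size // k
            phi = lo + (digit + 1) * size // k
            # "Receive" that piece from every group member and blend it.
            pieces = [data[g][plo - lo:phi - lo] for g in group]
            new_data[r] = [reduce(over, px) for px in zip(*pieces)]
            new_regions[r] = (plo, phi)
        regions, data = new_regions, new_data
        stride *= k

    # After the last round each rank owns a disjoint slice of the result.
    final = [None] * n
    for r in range(p):
        lo, hi = regions[r]
        final[lo:hi] = data[r]
    return final
```

For eight partial images, `radix_k_composite(images, [2, 4])` runs two rounds with groups of two and then four ranks, `[8]` degenerates to direct send, and `[2, 2, 2]` to binary swap. Because `over` is associative but not commutative, the sketch blends each group in rank order; the results of different factorizations agree up to floating-point rounding.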

Acknowledgements

We would like to thank the anonymous reviewers for their comments. We would also like to thank Hongbin Zhuo of the College of Science at the National University of Defense Technology for providing us with the electromagnetic volume data.

This work is supported by the National Basic Research Program (No. 2009CB723803) and the National Science Foundation Program (Nos. 61103084, 61272334, 61170157 and 61272009) of China, and by the Research Funding Program of the National University of Defense Technology.

Author information

Authors and Affiliations

School of Computer Science, National University of Defense Technology, Hunan, China
Pan Wang, Zhiquan Cheng, Huahai Liu, Xun Cai & Sikun Li
School of Computer Science & Informatics, Cardiff University, Cardiff, UK
Ralph Martin

Corresponding author

Correspondence to Zhiquan Cheng.

About this article

Cite this article

Wang, P., Cheng, Z., Martin, R. et al. NUMA-aware image compositing on multi-GPU platform. Vis Comput 29, 639–649 (2013). https://doi.org/10.1007/s00371-013-0803-7

Issue Date: June 2013
DOI: https://doi.org/10.1007/s00371-013-0803-7
