NUMA-aware image compositing on multi-GPU platform

  • Original Article
The Visual Computer

Abstract

Sort-last parallel rendering is widely used. Recent GPU developments mean that a PC equipped with multiple GPUs is a viable alternative to a high-cost supercomputer: the Fermi GPU architecture supports unified virtual addressing, which provides a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes call for a reconsideration of parallel rendering algorithms. In this paper, we thoroughly investigate the NUMA-aware image compositing problem; image compositing is the key final stage of sort-last parallel rendering. Building on the proven radix-k strategy, we identify an optimal compositing algorithm that takes advantage of the NUMA architecture of the multi-GPU platform. We quantitatively analyze different image compositing modes for practical image compositing, taking into account peer-to-peer communication costs between GPUs. Our experiments on various datasets show that our image compositing method is very fast: an image of a few megapixels can be composited in about 10 ms by eight GPUs.
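
For readers unfamiliar with the radix-k strategy mentioned above, the following is a minimal CPU-only sketch of its round structure. It is not the NUMA-aware GPU algorithm of the paper: the function names (`over`, `composite`, `radix_k`, `make_images`) are hypothetical, images are flat lists of premultiplied (color, alpha) pixels, rank 0 is assumed to be front-most, and the peer-to-peer transfers between GPUs are replaced by plain list slicing.

```python
import random
from functools import reduce


def over(front, back):
    """Porter-Duff 'over' for premultiplied (color, alpha) pixels."""
    cf, af = front
    cb, ab = back
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)


def composite(pieces):
    """Blend equally sized pixel lists, given front-most first."""
    return [reduce(over, pixels) for pixels in zip(*pieces)]


def radix_k(images, k_vector):
    """Simulate radix-k compositing; rank 0 holds the front-most image."""
    p, n = len(images), len(images[0])
    product = 1
    for k in k_vector:
        product *= k
    if product != p:
        raise ValueError("the factors in k_vector must multiply to the rank count")

    lo, hi = [0] * p, [n] * p            # pixel range [lo, hi) owned by each rank
    data = [list(img) for img in images]
    stride = 1
    for k in k_vector:                   # one communication round per factor
        new_lo, new_hi, new_data = [], [], []
        for r in range(p):
            digit = (r // stride) % k    # position of rank r inside its group
            base = r - digit * stride
            members = [base + j * stride for j in range(k)]  # front to back
            size = hi[r] - lo[r]
            a = lo[r] + size * digit // k
            b = lo[r] + size * (digit + 1) // k
            # "Receive" piece [a, b) from every group member and blend it.
            pieces = [data[m][a - lo[m]:b - lo[m]] for m in members]
            new_lo.append(a)
            new_hi.append(b)
            new_data.append(composite(pieces))
        lo, hi, data = new_lo, new_hi, new_data
        stride *= k

    # Every rank now owns a disjoint piece of the final image; gather them.
    final = [None] * n
    for r in range(p):
        final[lo[r]:hi[r]] = data[r]
    return final


def make_images(p, n, seed=0):
    """Random premultiplied test images, one per simulated rank."""
    rng = random.Random(seed)
    images = []
    for _ in range(p):
        image = []
        for _ in range(n):
            alpha = rng.random()
            image.append((alpha * rng.random(), alpha))
        images.append(image)
    return images
```

With eight simulated ranks, `radix_k(make_images(8, 37), [4, 2])` runs two rounds with groups of four and then two. The factorization `[2, 2, 2]` corresponds to binary swap and `[8]` to direct send, so the choice of factors trades the number of rounds against the number of partners per round.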

Acknowledgements

We would like to thank the anonymous reviewers for their comments. We would also like to thank Hongbin Zhuo of the College of Science at the National University of Defense Technology for providing us with the electromagnetic volume data.

This work is supported by the National Basic Research Program (No. 2009CB723803) and the National Science Foundation Program (Nos. 61103084, 61272334, 61170157 and 61272009) of China, and by the Research Funding Program of the National University of Defense Technology.

Author information

Corresponding author

Correspondence to Zhiquan Cheng.

About this article

Cite this article

Wang, P., Cheng, Z., Martin, R. et al. NUMA-aware image compositing on multi-GPU platform. Vis Comput 29, 639–649 (2013). https://doi.org/10.1007/s00371-013-0803-7

  • DOI: https://doi.org/10.1007/s00371-013-0803-7
