Accelerating Parallel Operation for Compacting Selected Elements on GPUs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13440)

Abstract

Compacting is a common and heavily used operation in different application areas such as statistics, database systems, simulations, and artificial intelligence. The task of this operation is to write the selected elements of an input array contiguously to a new, smaller output array. The selected elements are usually defined by means of a bit mask. With the ever-increasing number of data elements to be processed in these application areas, better performance becomes a key factor for this operation. Thus, exploiting the parallel capabilities of GPUs to speed up the compacting operation is of great interest. In this paper, we present different optimization approaches for GPUs and evaluate them (i) on a variety of GPU platforms, (ii) for different sizes of the input array, (iii) for different bit distributions of the corresponding bit mask, and (iv) for different data types. As we show, we achieve significant speedups compared to the state-of-the-art implementation.
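
To make the operation concrete, the following minimal sketch illustrates the compaction semantics described in the abstract: an exclusive prefix sum over the bit mask yields each selected element's output position, and a scatter kernel then writes the selected elements contiguously. This is only an illustrative sketch under assumptions made for brevity (int elements, a hypothetical kernel name scatterSelected, and a host-side prefix sum that a real GPU implementation would replace with a device-side scan such as cub::DeviceScan::ExclusiveSum [4]); it is not the optimized implementation evaluated in the paper.

    // Minimal CUDA sketch of bit-mask compaction (illustrative only).
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Writes in[i] to its compacted position whenever mask[i] is set.
    // positions[i] holds the number of selected elements before index i
    // (exclusive prefix sum of the mask), i.e. the output slot of element i.
    __global__ void scatterSelected(const int *in, const int *mask,
                                    const int *positions, int *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && mask[i]) {
            out[positions[i]] = in[i];
        }
    }

    int main() {
        const int n = 8;
        std::vector<int> in   = {10, 11, 12, 13, 14, 15, 16, 17};
        std::vector<int> mask = { 1,  0,  1,  1,  0,  0,  1,  0};

        // Exclusive prefix sum of the mask, done on the host here for clarity.
        std::vector<int> positions(n);
        int count = 0;
        for (int i = 0; i < n; ++i) { positions[i] = count; count += mask[i]; }

        int *d_in, *d_mask, *d_pos, *d_out;
        cudaMalloc(&d_in,   n * sizeof(int));
        cudaMalloc(&d_mask, n * sizeof(int));
        cudaMalloc(&d_pos,  n * sizeof(int));
        cudaMalloc(&d_out,  count * sizeof(int));
        cudaMemcpy(d_in,   in.data(),        n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(d_mask, mask.data(),      n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(d_pos,  positions.data(), n * sizeof(int), cudaMemcpyHostToDevice);

        scatterSelected<<<(n + 255) / 256, 256>>>(d_in, d_mask, d_pos, d_out, n);

        std::vector<int> out(count);
        cudaMemcpy(out.data(), d_out, count * sizeof(int), cudaMemcpyDeviceToHost);
        for (int v : out) printf("%d ", v);   // prints: 10 12 13 16
        printf("\n");

        cudaFree(d_in); cudaFree(d_mask); cudaFree(d_pos); cudaFree(d_out);
        return 0;
    }

The compacted output (10 12 13 16) keeps exactly the elements whose mask bit is set, in their original order.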

References

  1. Bakunas-Milanowski, D., Rego, V., Sang, J., Yu, C.: Efficient algorithms for stream compaction on GPUs. Int. J. Netw. Comput. 7(2), 208–226 (2017)

  2. Bakunas-Milanowski, D., Rego, V., Sang, J., Yu, C.: A fast parallel selection algorithm on GPUs. In: 2015 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 609–614. IEEE (2015)

  3. Choi, K., Yang, H.: A GPU architecture aware fine-grain pruning technique for deep neural networks. In: Sousa, L., Roma, N., Tomás, P. (eds.) Euro-Par 2021. LNCS, vol. 12820, pp. 217–231. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85665-6_14

  4. CUB: cub::DeviceScan::ExclusiveSum documentation. https://nvlabs.github.io/cub/structcub_1_1_device_scan.html#a02b2d2e98f89f80813460f6a6ea1692b

  5. CUB: Main Page. https://nvlabs.github.io/cub/index.html

  6. CUDA Math API: Integer Intrinsics documentation. https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

  7. Fett, J., Kober, U., Schwarz, C., Habich, D., Lehner, W.: Artifact and instructions to generate experimental results for the Euro-Par 2022 paper: Accelerating Parallel Operation for Compacting Selected Elements on GPUs. In: European Conference on Parallel Processing. Springer, Heidelberg (2022). http://doi.org/10.6084/m9.figshare.19945469

  8. Guo, W., Li, Y., Sha, M., He, B., Xiao, X., Tan, K.: GPU-accelerated subgraph enumeration on partitioned graphs. In: SIGMOD Conference, pp. 1067–1082 (2020)

  9. Hertzschuch, A., Hartmann, C., Habich, D., Lehner, W.: Simplicity done right for join ordering. In: 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, 11–15 January 2021, Online Proceedings (2021)

  10. Hu, L., Zou, L., Liu, Y.: Accelerating triangle counting on GPU. In: SIGMOD Conference, pp. 736–748 (2021)

  11. Lo, S., Lee, C., Chung, I., Chung, Y.: Optimizing pairwise box intersection checking on GPUs for large-scale simulations. ACM Trans. Model. Comput. Simul. 23(3), 19:1–19:22 (2013)

  12. Merrill, D., Garland, M.: Single-pass parallel prefix scan with decoupled look-back. NVIDIA, Technical report, NVR-2016-002 (2016)

  13. Sistla, M.A., Nandivada, V.K.: Graph coloring using GPUs. In: Yahyapour, R. (ed.) Euro-Par 2019. LNCS, vol. 11725, pp. 377–390. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29400-7_27

  14. SPACE GitHub repository. https://github.com/yogi-tud/SPACE/

  15. Turing Tuning Guide: CUDA Toolkit documentation. https://docs.nvidia.com/cuda/turing-tuning-guide/index.html

  16. Ungethüm, A., et al.: Hardware-oblivious SIMD parallelism for in-memory column-stores. In: 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, 12–15 January 2020, Online Proceedings (2020). www.cidrdb.org

  17. Zhou, K., Hou, Q., Wang, R., Guo, B.: Real-time kd-tree construction on graphics hardware. ACM Trans. Graph. 27(5), 126 (2008)

Acknowledgements and Data Availability Statement

The datasets and code generated and/or analyzed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.19945469 [7].

This work is funded by the German Research Foundation within the RTG 1907 (RoSI) as well as by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 957407 (DAPHNE project).

Author information

Correspondence to Dirk Habich.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Fett, J., Kober, U., Schwarz, C., Habich, D., Lehner, W. (2022). Accelerating Parallel Operation for Compacting Selected Elements on GPUs. In: Cano, J., Trinder, P. (eds) Euro-Par 2022: Parallel Processing. Euro-Par 2022. Lecture Notes in Computer Science, vol 13440. Springer, Cham. https://doi.org/10.1007/978-3-031-12597-3_12

  • DOI: https://doi.org/10.1007/978-3-031-12597-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-12596-6

  • Online ISBN: 978-3-031-12597-3

  • eBook Packages: Computer Science, Computer Science (R0)
