Abstract
Compacting is a common and heavily used operation in application areas such as statistics, database systems, simulations, and artificial intelligence. The task of this operation is to produce a smaller output array by writing selected elements of an input array contiguously to a new output array, where the selected elements are usually defined by means of a bit mask. With the ever-increasing number of data elements to be processed in these application areas, higher performance becomes a key factor for this operation. Thus, exploiting the parallel capabilities of GPUs to speed up compacting is of great interest. In this paper, we present different optimization approaches for GPUs and evaluate them (i) on a variety of GPU platforms, (ii) for different sizes of the input array, (iii) for different bit distributions of the corresponding bit mask, and (iv) for different data types. As we show, we achieve significant speedups compared to the state-of-the-art implementation.
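To illustrate the operation itself, the following is a minimal sequential sketch of the prefix-sum formulation that parallel GPU implementations (e.g., those built on CUB's `ExclusiveSum`) commonly build on: an exclusive prefix sum over the bit mask yields each selected element's write position in the output. The function name and structure here are illustrative, not the paper's implementation.

```python
def compact(values, mask):
    """Write elements of `values` whose mask bit is 1 contiguously
    to a new output list (sequential sketch of stream compaction)."""
    # Exclusive prefix sum over the bit mask: positions[i] is the
    # number of selected elements before index i, i.e. the output
    # slot for values[i] if mask[i] == 1.
    positions = []
    total = 0
    for bit in mask:
        positions.append(total)
        total += bit
    # Scatter the selected elements to their computed slots.
    out = [None] * total
    for value, bit, pos in zip(values, mask, positions):
        if bit:
            out[pos] = value
    return out
```

On a GPU, the prefix sum and the scatter step are each data-parallel, which is what makes this formulation attractive for massively parallel hardware.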
References
Bakunas-Milanowski, D., Rego, V., Sang, J., Yu, C.: Efficient algorithms for stream compaction on GPUs. Int. J. Netw. Comput. 7(2), 208–226 (2017)
Bakunas-Milanowski, D., Rego, V., Sang, J., Yu, C.: A fast parallel selection algorithm on GPUs. In: 2015 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 609–614. IEEE (2015)
Choi, K., Yang, H.: A GPU architecture aware fine-grain pruning technique for deep neural networks. In: Sousa, L., Roma, N., Tomás, P. (eds.) Euro-Par 2021. LNCS, vol. 12820, pp. 217–231. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85665-6_14
CUB: cub::DeviceScan::ExclusiveSum documentation. https://nvlabs.github.io/cub/structcub_1_1_device_scan.html#a02b2d2e98f89f80813460f6a6ea1692b
CUB: Main Page. https://nvlabs.github.io/cub/index.html
CUDA Math API: Integer Intrinsics documentation. https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
Fett, J., Kober, U., Schwarz, C., Habich, D., Lehner, W.: Artifact and instructions to generate experimental results for the Euro-Par 2022 paper: accelerating parallel operation for compacting selected elements on GPUs. In: European Conference on Parallel Processing. Springer, Heidelberg (2022). https://doi.org/10.6084/m9.figshare.19945469
Guo, W., Li, Y., Sha, M., He, B., Xiao, X., Tan, K.: GPU-accelerated subgraph enumeration on partitioned graphs. In: SIGMOD Conference, pp. 1067–1082 (2020)
Hertzschuch, A., Hartmann, C., Habich, D., Lehner, W.: Simplicity done right for join ordering. In: 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, 11–15 January 2021, Online Proceedings (2021)
Hu, L., Zou, L., Liu, Y.: Accelerating triangle counting on GPU. In: SIGMOD Conference, pp. 736–748 (2021)
Lo, S., Lee, C., Chung, I., Chung, Y.: Optimizing pairwise box intersection checking on GPUs for large-scale simulations. ACM Trans. Model. Comput. Simul. 23(3), 19:1–19:22 (2013)
Merrill, D., Garland, M.: Single-pass parallel prefix scan with decoupled look-back. NVIDIA, Technical report, NVR-2016-002 (2016)
Sistla, M.A., Nandivada, V.K.: Graph coloring using GPUs. In: Yahyapour, R. (ed.) Euro-Par 2019. LNCS, vol. 11725, pp. 377–390. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29400-7_27
SPACE Github. https://github.com/yogi-tud/SPACE/
Turing Tuning Guide: CUDA Toolkit documentation. https://docs.nvidia.com/cuda/turing-tuning-guide/index.html
Ungethüm, A., et al.: Hardware-oblivious SIMD parallelism for in-memory column-stores. In: 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, 12–15 January 2020, Online Proceedings (2020). www.cidrdb.org
Zhou, K., Hou, Q., Wang, R., Guo, B.: Real-time kd-tree construction on graphics hardware. ACM Trans. Graph. 27(5), 126 (2008)
Acknowledgements and Data Availability Statement
The datasets and code generated during and/or analyzed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.19945469 [7].
This work is funded by the German Research Foundation within the RTG 1907 (RoSI) as well as by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 957407 (DAPHNE project).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Fett, J., Kober, U., Schwarz, C., Habich, D., Lehner, W. (2022). Accelerating Parallel Operation for Compacting Selected Elements on GPUs. In: Cano, J., Trinder, P. (eds) Euro-Par 2022: Parallel Processing. Euro-Par 2022. Lecture Notes in Computer Science, vol 13440. Springer, Cham. https://doi.org/10.1007/978-3-031-12597-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12596-6
Online ISBN: 978-3-031-12597-3
eBook Packages: Computer Science (R0)