Abstract
Compacting is a common and heavily used operation in application areas such as statistics, database systems, simulations, and artificial intelligence. The task of this operation is to produce a smaller output array by writing selected elements of an input array contiguously to a new output array, where the selected elements are usually defined by means of a bit mask. With the ever-increasing number of data elements to be processed in these application areas, higher performance becomes a key factor for this operation. Thus, exploiting the parallel capabilities of GPUs to speed up compacting is of great interest. In this paper, we present different optimization approaches for GPUs and evaluate them (i) on a variety of GPU platforms, (ii) for different sizes of the input array, (iii) for different bit distributions of the corresponding bit mask, and (iv) for different data types. As we show, we achieve significant speedups compared to the state-of-the-art implementation.
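To illustrate the operation itself, the following is a minimal sequential sketch of the prefix-sum formulation that parallel GPU implementations (e.g., those built on CUB's `ExclusiveSum`) commonly build on: an exclusive prefix sum over the bit mask yields each selected element's write position in the output. The function name and structure here are illustrative, not the paper's implementation.

```python
def compact(values, mask):
    """Write elements of `values` whose mask bit is 1 contiguously
    to a new output list (sequential sketch of stream compaction)."""
    # Exclusive prefix sum over the bit mask: positions[i] is the
    # number of selected elements before index i, i.e. the output
    # slot for values[i] if mask[i] == 1.
    positions = []
    total = 0
    for bit in mask:
        positions.append(total)
        total += bit
    # Scatter the selected elements to their computed slots.
    out = [None] * total
    for value, bit, pos in zip(values, mask, positions):
        if bit:
            out[pos] = value
    return out
```

On a GPU, the prefix sum and the scatter step are each data-parallel, which is what makes this formulation attractive for massively parallel hardware.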
References
Bakunas-Milanowski, D., Rego, V., Sang, J., Yu, C.: Efficient algorithms for stream compaction on GPUs. Int. J. Netw. Comput. 7(2), 208–226 (2017)
Bakunas-Milanowski, D., Rego, V., Sang, J., Yu, C.: A fast parallel selection algorithm on GPUs. In: 2015 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 609–614. IEEE (2015)
Choi, K., Yang, H.: A GPU architecture aware fine-grain pruning technique for deep neural networks. In: Sousa, L., Roma, N., Tomás, P. (eds.) Euro-Par 2021. LNCS, vol. 12820, pp. 217–231. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85665-6_14
CUB: cub::DeviceScan::ExclusiveSum documentation. https://nvlabs.github.io/cub/structcub_1_1_device_scan.html#a02b2d2e98f89f80813460f6a6ea1692b
CUB: Main Page. https://nvlabs.github.io/cub/index.html
CUDA Math API: Integer Intrinsics documentation. https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
Fett, J., Kober, U., Schwarz, C., Habich, D., Lehner, W.: Artifact and instructions to generate experimental results for the Euro-Par 2022 paper: accelerating parallel operation for compacting selected elements on GPUs. In: European Conference on Parallel Processing. Springer, Heidelberg (2022). https://doi.org/10.6084/m9.figshare.19945469
Guo, W., Li, Y., Sha, M., He, B., Xiao, X., Tan, K.: GPU-accelerated subgraph enumeration on partitioned graphs. In: SIGMOD Conference, pp. 1067–1082 (2020)
Hertzschuch, A., Hartmann, C., Habich, D., Lehner, W.: Simplicity done right for join ordering. In: 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, 11–15 January 2021, Online Proceedings (2021)
Hu, L., Zou, L., Liu, Y.: Accelerating triangle counting on GPU. In: SIGMOD Conference, pp. 736–748 (2021)
Lo, S., Lee, C., Chung, I., Chung, Y.: Optimizing pairwise box intersection checking on GPUs for large-scale simulations. ACM Trans. Model. Comput. Simul. 23(3), 19:1–19:22 (2013)
Merrill, D., Garland, M.: Single-pass parallel prefix scan with decoupled look-back. NVIDIA, Technical report, NVR-2016-002 (2016)
Sistla, M.A., Nandivada, V.K.: Graph coloring using GPUs. In: Yahyapour, R. (ed.) Euro-Par 2019. LNCS, vol. 11725, pp. 377–390. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29400-7_27
SPACE Github. https://github.com/yogi-tud/SPACE/
Turing Tuning Guide: CUDA Toolkit documentation. https://docs.nvidia.com/cuda/turing-tuning-guide/index.html
Ungethüm, A., et al.: Hardware-oblivious SIMD parallelism for in-memory column-stores. In: 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, 12–15 January 2020, Online Proceedings (2020). www.cidrdb.org
Zhou, K., Hou, Q., Wang, R., Guo, B.: Real-time kd-tree construction on graphics hardware. ACM Trans. Graph. 27(5), 126 (2008)
Acknowledgements and Data Availability Statement
The datasets and code generated during and/or analyzed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.19945469 [7].
This work is funded by the German Research Foundation within the RTG 1907 (RoSI) as well as by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 957407 (DAPHNE project).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Fett, J., Kober, U., Schwarz, C., Habich, D., Lehner, W. (2022). Accelerating Parallel Operation for Compacting Selected Elements on GPUs. In: Cano, J., Trinder, P. (eds) Euro-Par 2022: Parallel Processing. Euro-Par 2022. Lecture Notes in Computer Science, vol 13440. Springer, Cham. https://doi.org/10.1007/978-3-031-12597-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12596-6
Online ISBN: 978-3-031-12597-3
eBook Packages: Computer Science (R0)