Abstract
GPUs play an important role in training deep neural networks. However, when convolutional neural network models are computed on a GPU, different combinations of kernel configuration parameters yield different performance. This paper therefore proposes BAGF, a Bayesian auto-tuning framework for GPU kernels, which parameterizes the factors affecting GPU program performance and uses Bayesian optimization to search the resulting parameter space for the best configuration. Compared with other optimization algorithms, BAGF finds excellent configurations in fewer iterations. We analyze the performance of BAGF on four benchmarks, compare it with other common optimization algorithms, and examine the performance improvement contributed by each parameter. Finally, BAGF is evaluated on the convolution layers of AlexNet, and the results are analyzed with the Roofline model. Compared with the original parameter configuration, BAGF improves speed by 50.09%.
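The core loop the abstract describes (parameterize kernel configuration factors, then let Bayesian optimization pick which configuration to measure next) can be sketched as follows. This is an illustrative toy, not BAGF itself: the search space of thread-block shapes and the `kernel_time` function are synthetic stand-ins for real kernel measurements, and the Gaussian-process surrogate with expected-improvement acquisition is one common choice of Bayesian optimizer.

```python
import math
import random
import numpy as np

random.seed(0)

# Search space: candidate thread-block dimensions (one factor a tuner parameterizes).
space = [(bx, by) for bx in (4, 8, 16, 32) for by in (4, 8, 16, 32)]

def kernel_time(cfg):
    """Stand-in for timing one kernel launch; a real tuner runs the kernel."""
    bx, by = cfg
    threads = bx * by
    # Synthetic cost: best near 256 threads/block, penalize skewed shapes.
    return abs(threads - 256) / 256.0 + 0.1 * abs(math.log2(bx / by))

def rbf(a, b, ls=16.0):
    """Squared-exponential kernel between two point sets."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d / (2 * ls ** 2))

def propose(X, y, cands):
    """Fit a GP to observed (config, time) pairs; return the candidate index
    maximizing expected improvement (we minimize time, so improvement = best - mu)."""
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    ks = rbf(cands, X)
    mu = ks @ Kinv @ y
    var = np.clip(1.0 - np.einsum("ij,jk,ik->i", ks, Kinv, ks), 1e-12, None)
    sigma = np.sqrt(var)
    best = y.min()
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    ei = (best - mu) * cdf + sigma * pdf
    return int(np.argmax(ei))

# Start from a few random measurements, then let the surrogate guide the search.
tried = random.sample(range(len(space)), 3)
for _ in range(8):
    X = np.array([space[i] for i in tried], float)
    y = np.array([kernel_time(space[i]) for i in tried])
    idx_map = [i for i in range(len(space)) if i not in tried]
    cands = np.array([space[i] for i in idx_map], float)
    tried.append(idx_map[propose(X, y, cands)])

best_cfg = min((space[i] for i in tried), key=kernel_time)
print("best configuration:", best_cfg, "time:", round(kernel_time(best_cfg), 3))
```

The point of the surrogate is the one the abstract makes: instead of exhaustively benchmarking all configurations, the optimizer spends its limited kernel launches on the configurations the model predicts are most promising, so good parameters are found in fewer iterations.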
Acknowledgements
This work is funded in part by the Key Research and Development Program of Shaanxi (Program No. 2022ZDLGY01-09), GHfund A No. 202107014474, GHfund 202202036165, the Wuhu and Xidian University special fund for industry-university-research cooperation (Project No. XWYCXY-012021013), and the Cloud Computing Key Laboratory of Gansu Province.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhu, H., Liu, C., Zhang, L., Dong, X. (2024). Bayesian Optimization for Auto-tuning Convolution Neural Network on GPU. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14492. Springer, Singapore. https://doi.org/10.1007/978-981-97-0811-6_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0810-9
Online ISBN: 978-981-97-0811-6