ABSTRACT
We present the design and implementation of an LLVM-based ahead-of-time (AOT) SYCL CPU device that does not use SPIR-V, which we call the non-SPIRV CPU device. The design is intended to illustrate a general SYCL CPU implementation that aims for both debuggability and performance. Our contributions are:
• Streamline the compiler optimization pipeline by integrating kernel optimizations and transformations into the LLVM C++ pipeline.
• Eliminate SPIR-V IR generation during CPU device code compilation and use the LLVM IR from the compiler front end directly, which reduces compilation overhead and preserves IR information, including debug info, across LLVM passes.
Index Terms
- Streamline Ahead-of-Time SYCL CPU Device Implementation through Bypassing SPIR-V