poster

Streamline Ahead-of-Time SYCL CPU Device Implementation through Bypassing SPIR-V

Authors:
Wenju He, Intel, China (ORCID: 0009-0003-1012-8921)
Yilong Guo, Intel, China (ORCID: 0009-0000-5123-8113)
Xinmin Tian, Intel, United States (ORCID: 0000-0001-6228-924X)
Hideki Saito, Intel, United States (ORCID: 0009-0004-5529-7048)
Wenwan Xing, Intel, China (ORCID: 0009-0001-8034-457X)
Feng Zou, Intel, China (ORCID: 0009-0006-9892-3613)
Chunyang Dai, Intel, China (ORCID: 0009-0005-4867-3188)
Maosu Zhao, Intel, China (ORCID: 0009-0007-0747-4099)
Haonan Yang, Intel, China (ORCID: 0009-0008-3842-9896)

IWOCL '23: Proceedings of the 2023 International Workshop on OpenCL, April 2023, Article No.: 28, Pages 1, https://doi.org/10.1145/3585341.3585381

Published: 18 April 2023

ABSTRACT

We present the design and implementation of an LLVM-based Ahead-Of-Time (AOT) SYCL CPU device that does not use SPIR-V, which we call the non-SPIRV CPU device. The design is intended to illustrate a general SYCL CPU implementation that aims for both debuggability and performance. Our contributions are:

• Streamline the compiler optimization pipeline by integrating kernel optimizations and transformations into the LLVM C++ pipeline.

• Eliminate SPIR-V IR generation during CPU device code compilation and use the LLVM IR from the compiler front-end directly, which reduces compilation overhead and preserves IR information, including debug info, across LLVM passes.

Index Terms

Streamline Ahead-of-Time SYCL CPU Device Implementation through Bypassing SPIR-V
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages

Published in

IWOCL '23: Proceedings of the 2023 International Workshop on OpenCL
April 2023
133 pages
ISBN: 9798400707452
DOI: 10.1145/3585341

Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- poster
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 84 of 152 submissions, 55%