
Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13943))

Abstract

In this paper, we propose Xenos, a high-performance edge platform for model inference. Unlike prior works, which mainly focus on operator-centric optimization, Xenos automatically conducts dataflow-centric optimization on the computation graph and accelerates inference along two dimensions. Vertically, Xenos develops an operator linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of Xenos' vertical and horizontal dataflow optimizations, which reduce the inference time by 15.0%–84.9% and 17.9%–89.9%, respectively.
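As a rough, purely illustrative sketch (not the authors' implementation), the two dimensions of dataflow-centric optimization can be pictured as follows: vertical operator linking resembles fusing a chain of elementwise operators so intermediate results stay in a local buffer instead of round-tripping through memory, and horizontal operator split resembles partitioning the rows of a feature map across DSP units. All function and variable names below are hypothetical:

```python
# Illustrative sketch only -- not Xenos' actual implementation.
# "Operator linking" is modeled as fusing a chain of elementwise operators
# so the intermediate result never leaves a local variable (data locality);
# "operator split" is modeled as block-partitioning rows across DSP units.

def link_operators(ops):
    """Fuse a chain of elementwise operators into one pass over the data."""
    def fused(x):
        for op in ops:   # intermediate stays local between operators,
            x = op(x)    # mimicking the locality benefit of linking
        return x
    return fused

def split_across_units(rows, num_units):
    """Partition rows of a feature map into contiguous blocks, one per unit."""
    n = len(rows)
    chunk = (n + num_units - 1) // num_units  # ceiling division
    return [rows[i:i + chunk] for i in range(0, n, chunk)]

# Example: a ReLU followed by a scaling op, linked and run per partition.
relu = lambda v: [max(0.0, e) for e in v]
double = lambda v: [2.0 * e for e in v]
fused = link_operators([relu, double])

feature_map = [[-1.0, 2.0], [3.0, -4.0], [5.0, 6.0], [-7.0, 8.0]]
parts = split_across_units(feature_map, 2)  # two hypothetical "DSP units"
out = [fused(row) for part in parts for row in part]
```

In the real system the split must account for the DSP hardware layout (hence "DSP-aware"); the round-robin block partition here is only the simplest stand-in for that decision.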


Notes

  1. Multi-Core DSP uses the term “DSP core” whereas FPGA uses “DSP slice”. We use “DSP unit” as the general term in the following description.

  2. As a special case, when the feature map is too large to be held by the shared memory, Xenos first slices the feature map as a preprocessing step, and the split procedure then continues to partition the sliced feature map.
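The special case above can be sketched as a two-stage procedure: first slice the feature map so that each slice fits the shared-memory capacity, then split each slice across the DSP units. This is a minimal illustration under assumed names and sizes, not Xenos' actual preprocessing code:

```python
# Hypothetical sketch of the preprocessing step described in the note:
# if a feature map exceeds shared-memory capacity, slice it first, then
# split each slice across DSP units. Capacities and names are illustrative.

def slice_to_fit(feature_map, capacity):
    """Slice rows so that each slice holds at most `capacity` elements."""
    rows_per_slice = max(1, capacity // len(feature_map[0]))
    return [feature_map[i:i + rows_per_slice]
            for i in range(0, len(feature_map), rows_per_slice)]

def split_slice(fm_slice, num_units):
    """Partition one slice's rows into contiguous blocks, one per DSP unit."""
    chunk = (len(fm_slice) + num_units - 1) // num_units  # ceiling division
    return [fm_slice[i:i + chunk] for i in range(0, len(fm_slice), chunk)]

fm = [[float(c) for c in range(8)] for _ in range(6)]  # 6x8 feature map
slices = slice_to_fit(fm, capacity=16)  # 16 elements -> 2 rows per slice
work = [split_slice(s, num_units=2) for s in slices]
```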


Acknowledgement

This work is supported by the National Key Research and Development Program of China (No. 2021ZD0110202). Haojie Wang is supported by the Shuimu Tsinghua Scholar Program.

Author information

Corresponding author

Correspondence to Hongxu Jiang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, R. et al. (2023). Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_35

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-30637-2_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30636-5

  • Online ISBN: 978-3-031-30637-2

  • eBook Packages: Computer Science, Computer Science (R0)
