Abstract
In this paper, we propose Xenos, a high-performance edge platform for model inference. Unlike prior works, which mainly focus on operator-centric optimization, Xenos automatically performs dataflow-centric optimization on the computation graph and accelerates inference along two dimensions. Vertically, Xenos develops an operator-linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator-split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of Xenos' vertical and horizontal dataflow optimizations, which reduce inference time by 15.0%–84.9% and 17.9%–89.9%, respectively.
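The vertical optimization above can be illustrated with a minimal sketch of operator linking as fusion of adjacent element-wise operators, so that intermediate tensors never leave fast local memory. The `Op`/`link_ops` names and the greedy run-merging scheme are illustrative assumptions, not Xenos' actual API.

```python
# Hedged sketch of vertical operator linking: merge runs of adjacent
# element-wise operators so the intermediate result stays local.
# The Op class and link_ops function are hypothetical, for illustration only.

class Op:
    def __init__(self, name, fn, elementwise=True):
        self.name = name
        self.fn = fn
        self.elementwise = elementwise

def link_ops(ops):
    """Greedily merge runs of adjacent element-wise operators into one
    fused operator, avoiding intermediate writes to off-chip memory."""
    fused, run = [], []

    def flush():
        if not run:
            return
        fns = [op.fn for op in run]
        def fused_fn(x, fns=fns):
            for f in fns:          # one pass over the data per fused group
                x = f(x)
            return x
        fused.append(Op("+".join(op.name for op in run), fused_fn))
        run.clear()

    for op in ops:
        if op.elementwise:
            run.append(op)
        else:
            flush()
            fused.append(op)
    flush()
    return fused

pipeline = [Op("scale", lambda x: x * 2), Op("shift", lambda x: x + 1)]
linked = link_ops(pipeline)
assert len(linked) == 1          # two element-wise ops collapse into one
assert linked[0].fn(3) == 7      # (3*2)+1, computed in a single pass
```

Real systems apply the same idea at the tensor-program level rather than on Python callables; the sketch only shows why restructuring the inter-operator dataflow improves locality.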
Notes
- 1. Multi-Core DSP uses the term “DSP core”, whereas FPGA uses “DSP slice”. We use “DSP unit” as the general term in the following description.
- 2. As a special case, when the feature map is too large to fit in shared memory, Xenos first slices the feature map as a preprocessing step, and the split procedure then partitions each sliced feature map.
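The slice-then-split behavior in Note 2 can be sketched as follows. The row-wise partitioning scheme, the `max_rows` shared-memory budget, and all function names are assumptions for illustration, not Xenos' actual implementation.

```python
# Hedged sketch of the DSP-aware operator split with the Note-2 special case:
# a feature map that exceeds the shared-memory budget is sliced first, and
# each slice is then partitioned across the available DSP units.
# Row-wise splitting and all names here are illustrative assumptions.

def slice_feature_map(fmap, max_rows):
    """Pre-slice an oversized feature map into chunks that fit shared memory."""
    return [fmap[i:i + max_rows] for i in range(0, len(fmap), max_rows)]

def split_for_dsp(fmap, num_units, max_rows):
    """Return per-slice row partitions, one partition per DSP unit."""
    chunks = slice_feature_map(fmap, max_rows) if len(fmap) > max_rows else [fmap]
    plan = []
    for chunk in chunks:
        step = -(-len(chunk) // num_units)   # ceiling division
        plan.append([chunk[i:i + step] for i in range(0, len(chunk), step)])
    return plan

fmap = list(range(8))                        # toy 8-row feature map
plan = split_for_dsp(fmap, num_units=2, max_rows=4)
# two 4-row slices, each split across 2 DSP units
assert len(plan) == 2 and all(len(parts) == 2 for parts in plan)
```

Each inner list in `plan` would be dispatched to one DSP unit, so all units work on disjoint rows of the same slice in parallel.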
Acknowledgement
This work is supported by the National Key Research and Development Program of China (No. 2021ZD0110202). Haojie Wang is supported by the Shuimu Tsinghua Scholar Program.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, R., et al. (2023). Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices. In: Wang, X., et al. (eds.) Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol. 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30636-5
Online ISBN: 978-3-031-30637-2
eBook Packages: Computer Science (R0)