
Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13943))

Abstract

In this paper, we propose Xenos, a high-performance edge platform for model inference. Unlike prior works, which mainly focus on operator-centric optimization, Xenos automatically conducts dataflow-centric optimization on the computation graph and accelerates inference along two dimensions. Vertically, Xenos develops an operator linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of Xenos' vertical and horizontal dataflow optimizations, which reduce the inference time by 15.0%–84.9% and 17.9%–89.9%, respectively.
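As a rough, purely illustrative sketch (not the authors' implementation), the two dimensions of dataflow-centric optimization can be pictured as follows: vertical operator linking resembles fusing a chain of elementwise operators so intermediate results stay in a local buffer instead of round-tripping through memory, and horizontal operator split resembles partitioning the rows of a feature map across DSP units. All function and variable names below are hypothetical:

```python
# Illustrative sketch only -- not Xenos' actual implementation.
# "Operator linking" is modeled as fusing a chain of elementwise operators
# so the intermediate result never leaves a local variable (data locality);
# "operator split" is modeled as block-partitioning rows across DSP units.

def link_operators(ops):
    """Fuse a chain of elementwise operators into one pass over the data."""
    def fused(x):
        for op in ops:   # intermediate stays local between operators,
            x = op(x)    # mimicking the locality benefit of linking
        return x
    return fused

def split_across_units(rows, num_units):
    """Partition rows of a feature map into contiguous blocks, one per unit."""
    n = len(rows)
    chunk = (n + num_units - 1) // num_units  # ceiling division
    return [rows[i:i + chunk] for i in range(0, n, chunk)]

# Example: a ReLU followed by a scaling op, linked and run per partition.
relu = lambda v: [max(0.0, e) for e in v]
double = lambda v: [2.0 * e for e in v]
fused = link_operators([relu, double])

feature_map = [[-1.0, 2.0], [3.0, -4.0], [5.0, 6.0], [-7.0, 8.0]]
parts = split_across_units(feature_map, 2)  # two hypothetical "DSP units"
out = [fused(row) for part in parts for row in part]
```

In the real system the split must account for the DSP hardware layout (hence "DSP-aware"); the round-robin block partition here is only the simplest stand-in for that decision.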


Notes

  1. Multi-Core DSP uses the term “DSP core” whereas FPGA uses “DSP slice”. We use “DSP unit” as the general term in the following description.

  2. As a special case, when the feature map is too large to be held by the shared memory, Xenos first slices the feature map as a preprocessing step, and the split procedure then continues to partition the sliced feature map.
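The special case above can be sketched as a two-stage procedure: first slice the feature map so that each slice fits the shared-memory capacity, then split each slice across the DSP units. This is a minimal illustration under assumed names and sizes, not Xenos' actual preprocessing code:

```python
# Hypothetical sketch of the preprocessing step described in the note:
# if a feature map exceeds shared-memory capacity, slice it first, then
# split each slice across DSP units. Capacities and names are illustrative.

def slice_to_fit(feature_map, capacity):
    """Slice rows so that each slice holds at most `capacity` elements."""
    rows_per_slice = max(1, capacity // len(feature_map[0]))
    return [feature_map[i:i + rows_per_slice]
            for i in range(0, len(feature_map), rows_per_slice)]

def split_slice(fm_slice, num_units):
    """Partition one slice's rows into contiguous blocks, one per DSP unit."""
    chunk = (len(fm_slice) + num_units - 1) // num_units  # ceiling division
    return [fm_slice[i:i + chunk] for i in range(0, len(fm_slice), chunk)]

fm = [[float(c) for c in range(8)] for _ in range(6)]  # 6x8 feature map
slices = slice_to_fit(fm, capacity=16)  # 16 elements -> 2 rows per slice
work = [split_slice(s, num_units=2) for s in slices]
```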


Acknowledgement

This work is supported by the National Key Research and Development Program of China (No. 2021ZD0110202). Haojie Wang is supported by the Shuimu Tsinghua Scholar Program.

Author information

Corresponding author

Correspondence to Hongxu Jiang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, R. et al. (2023). Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_35

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-30637-2_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30636-5

  • Online ISBN: 978-3-031-30637-2

  • eBook Packages: Computer Science, Computer Science (R0)
