
The Case for Hierarchical Deep Learning Inference at the Network Edge

Published: 21 June 2023

ABSTRACT

Resource-constrained Edge Devices (EDs), e.g., IoT sensors and microcontroller units, are expected to make intelligent decisions using Deep Learning (DL) inference at the edge of the network. Toward this end, an active area of research is the development of tinyML models, i.e., DL models with reduced computation and memory requirements that can be embedded on these devices. However, tinyML models have lower inference accuracy. On a different front, DNN partitioning and inference offloading techniques have been studied for distributed DL inference between EDs and Edge Servers (ESs). In this paper, we explore Hierarchical Inference (HI), a novel approach proposed in [19] for performing distributed DL inference at the edge. Under HI, for each data sample, an ED first uses a local algorithm (e.g., a tinyML model) for inference. Depending on the application, the ED offloads the data sample only if the local inference is incorrect or further assistance is required from large DL models at the edge or in the cloud. At the outset, HI seems infeasible, as the ED in general cannot know whether the local inference is sufficient. Nevertheless, we demonstrate the feasibility of implementing HI for image classification applications, quantify its benefits, and show that HI provides a better trade-off between offloading cost, throughput, and inference accuracy than alternative approaches.
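The per-sample offloading decision described above can be sketched in a few lines. The abstract does not specify how the ED judges whether its local inference is sufficient, so this minimal Python example assumes one common proxy: the ED accepts the tinyML prediction when its softmax confidence exceeds a threshold and offloads the sample to the Edge Server otherwise. The function name `hi_decide` and the threshold value are illustrative, not taken from the paper.

```python
def hi_decide(local_probs, threshold=0.8):
    """Hierarchical Inference decision on the Edge Device (illustrative).

    local_probs: softmax output of the local (tinyML) classifier.
    Returns (label, offload): the locally predicted class index and
    whether the sample should be offloaded to the large edge/cloud model.
    """
    # Local inference: take the most likely class under the tinyML model.
    label = max(range(len(local_probs)), key=lambda i: local_probs[i])
    # Offload only when the local model is not confident enough,
    # trading offloading cost against inference accuracy.
    offload = local_probs[label] < threshold
    return label, offload

# Confident sample: the ED keeps the local prediction.
print(hi_decide([0.05, 0.90, 0.05]))  # -> (1, False)

# Ambiguous sample: the ED offloads to the Edge Server.
print(hi_decide([0.40, 0.35, 0.25]))  # -> (0, True)
```

Under this rule, the threshold directly controls the trade-off the paper studies: a lower threshold keeps more samples on the ED (cheaper, faster, less accurate), while a higher threshold offloads more samples to the large model.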

References

  1. Ghina Al-Atat, Andrea Fresa, Adarsh P. Behera, Vishnu N. Moothedath, James Gross, and Jaya P. Champati. 2023. The Case for Hierarchical Deep Learning Inference at the Network Edge. arXiv:2304.11763 [cs.DC]
  2. Ying Cui, Bixia Tang, Gangao Wu, Lun Li, Xin Zhang, Zhenglin Du, and Wenming Zhao. 2023. Classification of dog breeds using convolutional neural network models and support vector machine. bioRxiv (2023).
  3. Lei Deng, Guoqi Li, Song Han, Luping Shi, and Yuan Xie. 2020. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proc. IEEE 108, 4 (2020), 485--532.
  4. Chongwu Dong, Sheng Hu, Xi Chen, and Wushao Wen. 2021. Joint optimization with DNN partitioning and resource allocation in mobile edge computing. IEEE Transactions on Network and Service Management 18, 4 (2021), 3973--3986.
  5. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proc. ICLR.
  6. Colby Banbury et al. 2021. MLPerf Tiny Benchmark. In Proc. Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).
  7. Igor Fedorov, Ryan P. Adams, Matthew Mattina, and Paul N. Whatmough. 2019. SpArSe: Sparse architecture search for CNNs on resource-constrained microcontrollers. Advances in Neural Information Processing Systems 32 (2019).
  8. Andrea Fresa and Jaya P. Champati. 2022. An Offloading Algorithm for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System. In Proc. ACM MSWiM. 15--23.
  9. Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In Proc. ICLR.
  10. Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. 2018. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. In Proc. ECCV. 815--832.
  11. Chuang Hu, Wei Bao, Dan Wang, and Fengming Liu. 2019. Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge. In Proc. IEEE INFOCOM. 1423--1431.
  12. Chenghao Hu and Baochun Li. 2022. Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices. In Proc. IEEE INFOCOM. 330--339.
  13. Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size.
  14. Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. 2017. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. In Proc. ACM ASPLOS. 615--629.
  15. Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. 2017. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. SIGARCH Comput. Archit. News 45, 1 (April 2017), 615--629.
  16. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proc. NIPS. 1097--1105.
  17. En Li, Liekang Zeng, Zhi Zhou, and Xu Chen. 2020. Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing. IEEE Transactions on Wireless Communications 19, 1 (2020), 447--457.
  18. Pavel Mach and Zdenek Becvar. 2017. Mobile Edge Computing: A Survey on Architecture and Computation Offloading. IEEE Communications Surveys & Tutorials 19, 3 (2017), 1628--1656.
  19. Vishnu N. Moothedath, Jaya P. Champati, and James Gross. 2023. Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge. arXiv:2304.00891
  20. Ivana Nikoloska and Nikola Zlatanov. 2021. Data Selection Scheme for Energy Efficient Supervised Learning at IoT Nodes. IEEE Communications Letters 25, 3 (2021), 859--863.
  21. Emil Njor, Jan Madsen, and Xenofon Fafoutis. 2022. A Primer for tinyML Predictive Maintenance: Input and Model Optimisation. In Proc. Artificial Intelligence Applications and Innovations. 67--78.
  22. Julius Ruseckas. n.d. EfficientNet on CIFAR10. https://juliusruseckas.github.io/ml/efficientnet-cifar10.html.
  23. Ramon Sanchez-Iborra and Antonio F. Skarmeta. 2020. TinyML-Enabled Frugal Smart Objects: Challenges and Opportunities. IEEE Circuits and Systems Magazine 20, 3 (2020), 4--18.
  24. Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proc. IEEE CVPR. 4510--4520.
  25. Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proc. ICML, Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, 6105--6114.
  26. Surat Teerapittayanon, Bradley McDanel, and H.T. Kung. 2016. BranchyNet: Fast inference via early exiting from deep neural networks. In Proc. ICPR. 2464--2469.
  27. Yundong Zhang, Naveen Suda, Liangzhen Lai, and Vikas Chandra. 2017. Hello Edge: Keyword Spotting on Microcontrollers. CoRR abs/1711.07128 (2017).

Published in

NetAISys '23: Proceedings of the 1st International Workshop on Networked AI Systems
June 2023, 43 pages
ISBN: 9798400702129
DOI: 10.1145/3597062

      Copyright © 2023 Owner/Author(s)

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
