ABSTRACT
Resource-constrained Edge Devices (EDs), e.g., IoT sensors and microcontroller units, are expected to make intelligent decisions using Deep Learning (DL) inference at the edge of the network. Toward this end, developing tinyML models, i.e., DL models with reduced computation and memory requirements that can be embedded on these devices, is an active area of research. However, tinyML models typically achieve lower inference accuracy. On a different front, DNN partitioning and inference offloading techniques have been studied for distributing DL inference between EDs and Edge Servers (ESs). In this paper, we explore Hierarchical Inference (HI), a novel approach proposed in [19] for performing distributed DL inference at the edge. Under HI, for each data sample, an ED first uses a local algorithm (e.g., a tinyML model) for inference. The ED offloads the data sample only if the local inference is incorrect or, depending on the application, further assistance is required from large DL models on the edge or in the cloud. At the outset, HI seems infeasible because the ED, in general, cannot know whether the local inference is sufficient. Nevertheless, we demonstrate the feasibility of implementing HI for image classification applications, quantify its benefits, and show that HI provides a better trade-off between offloading cost, throughput, and inference accuracy than alternative approaches.
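To make the HI decision rule concrete, below is a minimal Python sketch of one simple realization for image classification: the ED accepts the tinyML model's prediction when its softmax confidence clears a threshold, and offloads the sample otherwise. This is an illustrative assumption, not the paper's implementation; the function names (`tinyml_infer`, `offload_to_edge_server`) and the threshold value 0.8 are hypothetical.

```python
import numpy as np

def tinyml_infer(image: np.ndarray, num_classes: int = 10) -> np.ndarray:
    """Hypothetical stand-in for the embedded tinyML classifier.

    On a real ED this would be, e.g., a quantized CNN executed with a
    framework such as TensorFlow Lite Micro; random logits are used here
    only to keep the sketch self-contained and runnable.
    """
    logits = np.random.randn(num_classes)
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def offload_to_edge_server(image: np.ndarray) -> int:
    """Hypothetical RPC: ship the sample to the ES and return its label."""
    return -1  # placeholder for the large DL model's predicted class

def hierarchical_inference(image: np.ndarray, threshold: float = 0.8):
    """HI decision rule (confidence-threshold variant, assumed here):
    keep the local prediction when the tinyML model is confident enough,
    otherwise pay the offloading cost and defer to the large DL model."""
    probs = tinyml_infer(image)
    if probs.max() >= threshold:
        return int(probs.argmax()), "local"
    return offload_to_edge_server(image), "offloaded"

if __name__ == "__main__":
    label, source = hierarchical_inference(np.zeros((32, 32, 3)))
    print(f"predicted class {label} via {source} inference")
```

The design point this sketch illustrates is that only samples the local model is unsure about incur transmission cost, in contrast to always-offload or DNN-partitioning schemes that pay a communication cost for every sample.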
REFERENCES
- Ghina Al-Atat, Andrea Fresa, Adarsh P. Behera, Vishnu N. Moothedath, James Gross, and Jaya P. Champati. 2023. The Case for Hierarchical Deep Learning Inference at the Network Edge. arXiv:2304.11763 [cs.DC]
- Ying Cui, Bixia Tang, Gangao Wu, Lun Li, Xin Zhang, Zhenglin Du, and Wenming Zhao. 2023. Classification of dog breeds using convolutional neural network models and support vector machine. bioRxiv (2023).
- Lei Deng, Guoqi Li, Song Han, Luping Shi, and Yuan Xie. 2020. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proc. IEEE 108, 4 (2020), 485--532.
- Chongwu Dong, Sheng Hu, Xi Chen, and Wushao Wen. 2021. Joint optimization with DNN partitioning and resource allocation in mobile edge computing. IEEE Transactions on Network and Service Management 18, 4 (2021), 3973--3986.
- Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proc. ICLR.
- Colby Banbury et al. 2021. MLPerf Tiny Benchmark. In Proc. Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).
- Igor Fedorov, Ryan P. Adams, Matthew Mattina, and Paul N. Whatmough. 2019. SpArSe: Sparse architecture search for CNNs on resource-constrained microcontrollers. Advances in Neural Information Processing Systems 32 (2019).
- Andrea Fresa and Jaya P. Champati. 2022. An Offloading Algorithm for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System. In Proc. ACM MSWiM. 15--23.
- Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In Proc. ICLR.
- Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. 2018. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. In Proc. ECCV. 815--832.
- Chuang Hu, Wei Bao, Dan Wang, and Fengming Liu. 2019. Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge. In Proc. IEEE INFOCOM. 1423--1431.
- Chenghao Hu and Baochun Li. 2022. Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices. In Proc. IEEE INFOCOM. 330--339.
- Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. 2017. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. In Proc. ACM ASPLOS. 615--629.
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. 2017. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. SIGARCH Comput. Archit. News 45, 1 (April 2017), 615--629.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proc. NIPS. 1097--1105.
- En Li, Liekang Zeng, Zhi Zhou, and Xu Chen. 2020. Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing. IEEE Transactions on Wireless Communications 19, 1 (2020), 447--457.
- Pavel Mach and Zdenek Becvar. 2017. Mobile Edge Computing: A Survey on Architecture and Computation Offloading. IEEE Communications Surveys & Tutorials 19, 3 (2017), 1628--1656.
- Vishnu N. Moothedath, Jaya P. Champati, and James Gross. 2023. Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge. arXiv:2304.00891
- Ivana Nikoloska and Nikola Zlatanov. 2021. Data Selection Scheme for Energy Efficient Supervised Learning at IoT Nodes. IEEE Communications Letters 25, 3 (2021), 859--863.
- Emil Njor, Jan Madsen, and Xenofon Fafoutis. 2022. A Primer for tinyML Predictive Maintenance: Input and Model Optimisation. In Proc. Artificial Intelligence Applications and Innovations. 67--78.
- Julius Ruseckas. n.d. EfficientNet on CIFAR10. https://juliusruseckas.github.io/ml/efficientnet-cifar10.html
- Ramon Sanchez-Iborra and Antonio F. Skarmeta. 2020. TinyML-Enabled Frugal Smart Objects: Challenges and Opportunities. IEEE Circuits and Systems Magazine 20, 3 (2020), 4--18.
- Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proc. IEEE CVPR. 4510--4520.
- Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proc. ICML, Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, 6105--6114.
- Surat Teerapittayanon, Bradley McDanel, and H.T. Kung. 2016. BranchyNet: Fast inference via early exiting from deep neural networks. In Proc. ICPR. 2464--2469.
- Yundong Zhang, Naveen Suda, Liangzhen Lai, and Vikas Chandra. 2017. Hello Edge: Keyword Spotting on Microcontrollers. CoRR abs/1711.07128 (2017).