Real-time neural network inference on extremely weak devices: agile offloading with explainable AI

Authors:
Kai Huang, University of Pittsburgh
Wei Gao, University of Pittsburgh
MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking, October 2022, Pages 200–213. https://doi.org/10.1145/3495243.3560551

Published: 14 October 2022

ABSTRACT

With the wide adoption of AI applications, there is a pressing need to enable real-time neural network (NN) inference on small embedded devices. However, deploying NNs and achieving high inference performance on these devices is challenging because of their extremely weak capabilities. Although NN partitioning and offloading can help with such deployment, they cannot minimize the local costs at embedded devices. Instead, we propose to address this challenge via agile NN offloading, which migrates the computations required for NN offloading from online inference to offline learning. In this paper, we present AgileNN, a new NN offloading technique that achieves real-time NN inference on weak embedded devices by leveraging eXplainable AI (XAI) techniques to explicitly enforce feature sparsity during the training phase and to minimize the online computation and communication costs. Experimental results show that AgileNN's inference latency is more than 6× lower than that of existing schemes, ensuring that sensory data on embedded devices can be consumed in a timely manner. It also reduces the local device's resource consumption by more than 8×, without impairing inference accuracy.
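The abstract does not spell out how feature sparsity is used at inference time, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that an attribution method has already produced one importance score per feature, and that the few most important features could be handled on the device while the rest are sent to a server. The function names, the split size `k`, and all numbers are hypothetical.

```python
# Illustrative sketch only: names, scores, and the split size are
# hypothetical and are not taken from the paper.

def top_k_importance_share(importances, k):
    """Return the fraction of total absolute importance held by the
    k most important features (1.0 means those k features carry it all)."""
    magnitudes = sorted((abs(v) for v in importances), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(magnitudes[:k]) / total


def split_features(features, importances, k):
    """Split features into a small 'local' group (top-k by importance)
    and a 'remote' group (everything else), preserving original order."""
    order = sorted(range(len(features)),
                   key=lambda i: abs(importances[i]),
                   reverse=True)
    local_idx = sorted(order[:k])
    remote_idx = sorted(order[k:])
    local = [features[i] for i in local_idx]
    remote = [features[i] for i in remote_idx]
    return local, remote


# Hypothetical importance scores for six features of one input.
importances = [0.05, 0.60, 0.02, 0.25, 0.03, 0.05]
features = [1.2, -0.7, 0.3, 2.1, 0.0, -0.4]

share = top_k_importance_share(importances, k=2)
local, remote = split_features(features, importances, k=2)
print(f"top-2 share of importance: {share:.2f}")
print("kept on device:", local)
print("sent to server:", remote)
```

Under this reading, the more importance is concentrated in a few features, the less work remains for the device and the more aggressively the remaining features could be compressed before transmission. Whether AgileNN splits features in exactly this way is not stated in the abstract.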

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
2. Computing methodologies
  1. Artificial intelligence


Published in

MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking
October 2022
932 pages
ISBN:9781450391818
DOI:10.1145/3495243

Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
explainable AI
microcontrollers
neural network inference
offloading
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 440 of 2,972 submissions, 15%
