ABSTRACT
With the wide adoption of AI applications, there is a pressing need of enabling real-time neural network (NN) inference on small embedded devices, but deploying NNs and achieving high performance of NN inference on these small devices is challenging due to their extremely weak capabilities. Although NN partitioning and offloading can contribute to such deployment, they are incapable of minimizing the local costs at embedded devices. Instead, we suggest to address this challenge via agile NN offloading, which migrates the required computations in NN offloading from online inference to offline learning. In this paper, we present AgileNN, a new NN offloading technique that achieves real-time NN inference on weak embedded devices by leveraging eXplainable AI techniques, so as to explicitly enforce feature sparsity during the training phase and minimize the online computation and communication costs. Experiment results show that AgileNN's inference latency is >6X lower than the existing schemes, ensuring that sensory data on embedded devices can be timely consumed. It also reduces the local device's resource consumption by >8X, without impairing the inference accuracy.
- [n. d.]. Progressive Automations. https://www.progressiveautomations.com/pages/industrial-linear-actuators.Google Scholar
- [n. d.]. STM32F7/H7 Series Manual. https://www.st.com/resource/en/programmingmanual/pm0253-stm32f7-series-and-stm32h7-series-cortexm7-processor-programming-manual-stmicroelectronics.pdf.Google Scholar
- Mohamed R Abdelhamid, Ruicong Chen, Joonhyuk Cho, Anantha P Chandrakasan, and Fadel Adib. 2020. Self-reconfigurable micro-implants for cross-tissue wireless and batteryless connectivity. In MobiCom'20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking.Google ScholarDigital Library
- Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, and Luc V Gool. 2017. Soft-to-hard vector quantization for end-to-end learning compressible representations. Advances in neural information processing systems 30 (2017).Google Scholar
- Ejaz Ahmed, Michael Jones, and Tim K Marks. 2015. An improved deep learning architecture for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3908--3916.Google ScholarCross Ref
- Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. 2016. Deep speech 2: End-to-end speech recognition in english and mandarin. In International conference on machine learning. PMLR, 173--182.Google Scholar
- Roshan Ayyalasomayajula, Aditya Arun, Chenfeng Wu, Sanatan Sharma, Abhishek Rajkumar Sethi, Deepak Vasisht, and Dinesh Bharadia. 2020. Deep learning based wireless localization for indoor navigation. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking. 1--14.Google ScholarDigital Library
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).Google Scholar
- Anton Bakker and Johan H Huijsing. 1999. A low-cost high-accuracy CMOS smart temperature sensor. In Proceedings of the 25th European Solid-State Circuits Conference. IEEE, 302--305.Google Scholar
- Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, and Paul Whatmough. 2021. Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers. Proceedings of Machine Learning and Systems 3 (2021).Google Scholar
- Léon Bottou. 2012. Stochastic gradient descent tricks. In Neural networks: Tricks of the trade. Springer, 421--436.Google Scholar
- Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5--32.Google ScholarDigital Library
- Nam Bui, Nhat Pham, Jessica Jacqueline Barnitz, Zhanan Zou, Phuc Nguyen, Hoang Truong, Taeho Kim, Nicholas Farrow, Anh Nguyen, Jianliang Xiao, et al. 2019. ebp: A wearable system for frequent and comfortable blood pressure monitoring from user's ear. In The 25th annual international conference on mobile computing and networking. 1--17.Google Scholar
- Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. 2018. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International Conference on Machine Learning. PMLR, 794--803.Google Scholar
- Gioele Ciaparrone, Francisco Luque Sánchez, SihamTabik, Luigi Troiano, Roberto Tagliaferri, and Francisco Herrera. 2020. Deep learning in video multi-object tracking: A survey. Neurocomputing 381 (2020), 61--88.Google ScholarDigital Library
- Alex Davies, Petar Veličković, Lars Buesing, Sam Blackwell, Daniel Zheng, Nenad Tomašev, Richard Tanburn, Peter Battaglia, Charles Blundell, András Juhász, et al. 2021. Advancing mathematics by guiding human intuition with AI. Nature 600, 7887 (2021), 70--74.Google Scholar
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. Ieee, 248--255.Google ScholarCross Ref
- Emily L Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in neural information processing systems. 1269--1277.Google Scholar
- Estefania Munoz Diaz, Oliver Heirich, Mohammed Khider, and Patrick Robertson. 2013. Optimal sampling frequency and bias error modeling for foot-mounted IMUs. In International Conference on Indoor Positioning and Indoor Navigation. IEEE, 1--9.Google ScholarCross Ref
- Amir Erfan Eshratifar, Mohammad Saeed Abrishami, and Massoud Pedram. 2019. JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services. IEEE Transactions on Mobile Computing 20, 2 (2019), 565--576.Google ScholarDigital Library
- Biyi Fang, Xiao Zeng, and Mi Zhang. 2018. Nestdnn: Resource-aware multi-tenant on-device deep learning for continuous mobile vision. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. 115--127.Google ScholarDigital Library
- Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M Roy, and Michael Carbin. 2020. Pruning neural networks at initialization: Why are we missing the mark? arXiv preprint arXiv:2009.08576 (2020).Google Scholar
- Wei Gao, David Hsu, Wee Sun Lee, Shengmei Shen, and Karthikk Subramanian. 2017. Intention-net: Integrating planning and deep learning for goal-directed autonomous navigation. In Conference on robot learning. PMLR, 185--194.Google Scholar
- Graham Gobieski, Brandon Lucia, and Nathan Beckmann. 2019. Intelligence beyond the edge: Inference on intermittent embedded systems. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 199--213.Google ScholarDigital Library
- Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. 2014. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014).Google Scholar
- Rick Groenendijk, Sezer Karaoglu, Theo Gevers, and Thomas Mensink. 2021. Multi-loss weighting with coefficient of variations. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1469--1478.Google ScholarCross Ref
- Song Han, Jeff Pool, John Tran, and William J Dally. 2015. Learning both weights and connections for efficient neural networks. arXiv preprint arXiv:1506.02626 (2015).Google Scholar
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.Google ScholarCross Ref
- Robin Hesse, Simone Schaub-Meyer, and Stefan Roth. 2021. Fast axiomatic attribution for neural networks. Advances in Neural Information Processing Systems 34 (2021), 19513--19524.Google Scholar
- Ke-Jou Hsu, Ketan Bhardwaj, and Ada Gavrilovska. 2019. Couper: Dnn model slicing for visual analytics containers at the edge. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing. 179--194.Google ScholarDigital Library
- Chuang Hu, Wei Bao, Dan Wang, and Fengming Liu. 2019. Dynamic adaptive DNN surgery for inference acceleration on the edge. In IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 1423--1431.Google ScholarDigital Library
- Guosheng Hu, Yongxin Yang, Dong Yi, Josef Kittler, William Christmas, Stan Z Li, and Timothy Hospedales. 2015. When face recognition meets with deep learning: an evaluation of convolutional neural networks for face recognition. In Proceedings of the IEEE international conference on computer vision workshops. 142--150.Google ScholarDigital Library
- Vikram Iyer, Vamsi Talla, Bryce Kellogg, Shyamnath Gollakota, and Joshua Smith. 2016. Inter-technology backscatter: Towards internet connectivity for implanted devices. In Proceedings of the 2016 ACM SIGCOMM Conference. 356--369.Google ScholarDigital Library
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. 2017. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. ACM SIGARCH Computer Architecture News 45, 1 (2017), 615--629.Google ScholarDigital Library
- Bryce Kellogg, Aaron Parks, Shyamnath Gollakota, Joshua R Smith, and David Wetherall. 2014. Wi-Fi backscatter: Internet connectivity for RF-powered devices. In Proceedings of the 2014 ACM Conference on SIGCOMM. 607--618.Google ScholarDigital Library
- Jong Hwan Ko, Taesik Na, Mohammad Faisal Amir, and Saibal Mukhopadhyay. 2018. Edge-host partitioning of deep neural networks with feature space encoding for resource-constrained internet-of-things platforms. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 1--6.Google ScholarCross Ref
- Jakub Konečnỳ, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).Google Scholar
- Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. (2009).Google Scholar
- Stefanos Laskaridis, Stylianos I Venieris, Mario Almeida, Ilias Leontiadis, and Nicholas D Lane. 2020. SPINN: synergistic progressive inference of neural networks over device and cloud. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking. 1--15.Google ScholarDigital Library
- Ya Le and Xuan Yang. 2015. Tiny imagenet visual recognition challenge. CS 231N 7, 7 (2015), 3.Google Scholar
- Didier Le Gall. 1991. MPEG: A video compression standard for multimedia applications. Commun. ACM 34, 4 (1991), 46--58.Google ScholarDigital Library
- Hongshan Li, Chenghao Hu, Jingyan Jiang, Zhi Wang, Yonggang Wen, and Wenwu Zhu. 2018. Jalad: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution. In 2018 IEEE 24th international conference on parallel and distributed systems (ICPADS). IEEE, 671--678.Google ScholarCross Ref
- Yiran Li and Tong Zhang. 2011. Reducing dram image data access energy consumption in video processing. IEEE Transactions on Multimedia 14, 2 (2011), 303--313.Google ScholarDigital Library
- Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han. 2020. Mcunet: Tiny deep learning on iot devices. arXiv preprint arXiv:2007.10319 (2020).Google Scholar
- Zihao Liu, Tao Liu, Wujie Wen, Lei Jiang, Jie Xu, Yanzhi Wang, and Gang Quan. 2018. DeepN-JPEG: A deep neural network favorable JPEG-based image compression framework. In Proceedings of the 55th annual design automation conference. 1--6.Google ScholarDigital Library
- Zihao Liu, Xiaowei Xu, Tao Liu, Qi Liu, Yanzhi Wang, Yiyu Shi, Wujie Wen, Meiping Huang, Haiyun Yuan, and Jian Zhuang. 2019. Machine vision guided 3d medical image compression for efficient transmission and accurate segmentation in the clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12687--12696.Google ScholarCross Ref
- Yunfei Ma, Zhihong Luo, Christoph Steiger, Giovanni Traverso, and Fadel Adib. 2018. Enabling deep-tissue networking for miniature medical devices. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. 417--431.Google ScholarDigital Library
- Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML).Google ScholarDigital Library
- Mark R Nelson. 1989. LZW data compression. Dr. Dobb's Journal 14, 10 (1989), 29--36.Google ScholarDigital Library
- Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. 2011. Reading digits in natural images with unsupervised feature learning. (2011).Google Scholar
- Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. 2020. Patdnn: Achieving real-time dnn execution on mobile devices with pattern-based weight pruning. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 907--922.Google ScholarDigital Library
- Wei Niu, Xiaolong Ma, Yanzhi Wang, and Bin Ren. 2019. 26ms inference time for resnet-50: Towards real-time execution of all dnns on smartphone. arXiv preprint arXiv:1905.00571 (2019).Google Scholar
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should i trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1135--1144.Google ScholarDigital Library
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016).Google Scholar
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4510--4520.Google ScholarCross Ref
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision. 618--626.Google ScholarCross Ref
- Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning Important Features Through Propagating Activation Differences. CoRR abs/1704.02685 (2017).Google Scholar
- Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).Google Scholar
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In International Conference on Machine Learning. PMLR, 3319--3328.Google Scholar
- Mingxing Tan and Quoc Le. 2021. Efficientnetv2: Smaller models and faster training. In International Conference on Machine Learning. PMLR, 10096--10106.Google Scholar
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).Google Scholar
- Gregory K Wallace. 1992. The JPEG still picture compression standard. IEEE transactions on consumer electronics 38, 1 (1992), xviii--xxxiv.Google ScholarDigital Library
- Dan Wang, Dong Chen, Bin Song, Nadra Guizani, Xiaoyan Yu, and Xiaojiang Du. 2018. From IoT to 5GI-IoT: The next generation IoT-based intelligent algorithms and 5G technologies. IEEE Communications Magazine 56, 10 (2018), 114--120.Google ScholarDigital Library
- Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, and Xiangyang Xue. 2015. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In Proceedings of the 23rd ACM international conference on Multimedia. 461--470.Google ScholarDigital Library
- Shuochao Yao, Jinyang Li, Dongxin Liu, Tianshi Wang, Shengzhong Liu, Huajie Shao, and Tarek Abdelzaher. 2020. Deep compressive offloading: Speeding up neural network inference by trading edge computation for network latency. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems. 476--488.Google ScholarDigital Library
- Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, and Qi Tian. 2019. Deep modular co-attention networks for visual question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 6281--6290.Google ScholarCross Ref
- Wuyang Zhang, Zhezhi He, Luyang Liu, Zhenhua Jia, Yunxin Liu, Marco Gruteser, Dipankar Raychaudhuri, and Yanyong Zhang. 2021. Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking. 201--214.Google ScholarDigital Library
- Hengshuang Zhao, Jiaya Jia, and Vladlen Koltun. 2020. Exploring self-attention for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10076--10085.Google ScholarCross Ref
- Wentao Zhao, Wei Jiang, and Xinguo Qiu. 2021. Deep learning for COVID-19 detection based on CT images. Scientific Reports 11, 1 (2021), 1--12.Google Scholar
- Yanbo Zhao and Zhaohui Ye. 2008. A low cost GSM/GPRS based wireless home security system. IEEE Transactions on Consumer Electronics 54, 2 (2008), 567--572.Google ScholarDigital Library
Index Terms
- Real-time neural network inference on extremely weak devices: agile offloading with explainable AI
Recommendations
An Efficient 3-Party Framework for Privacy-Preserving Neural Network Inference
Computer Security – ESORICS 2020Privacy Leakage in Privacy-Preserving Neural Network Inference
Computer Security – ESORICS 2022AbstractThe community has seen many attempts to secure machine learning algorithms from multi-party computation or other cryptographic primitives. An interesting 3-party framework (SCSDF hereafter) for privacy-preserving neural network inference was ...
Novel Recurrent Neural Network for Time-Varying Problems Solving [Research Frontier]
By following the inspirational work of McCulloch and Pitts [1], lots of neural networks have been proposed, developed and studied for scientific research and engineering applications [2][18]. For instance, one classical neural network is Hopfield neural ...
Comments