Research Article | Open Access

Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

Published: 08 October 2019

Abstract

Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT) devices. However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, we propose Network of Neural Networks (NoNN), a new distributed IoT learning paradigm that compresses a large pretrained ‘teacher’ deep network into several disjoint and highly compressed ‘student’ modules, without loss of accuracy. Moreover, we propose a network science-based knowledge partitioning algorithm for the teacher model and then train individual students on the resulting disjoint partitions. Extensive experimentation on five image classification datasets, under user-defined memory/performance budgets, shows that NoNN achieves higher accuracy than several baselines and accuracy similar to that of the teacher model, while using minimal communication among students. Finally, as a case study, we deploy the proposed model for the CIFAR-10 dataset on edge devices and demonstrate significant improvements in memory footprint (up to 24×), performance (up to 12×), and energy per node (up to 14×) compared to the large teacher model. We further show that for distributed inference on multiple edge devices, the proposed NoNN model yields up to a 33× reduction in total latency with respect to a state-of-the-art model compression baseline.
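The abstract outlines the core mechanism: partition the teacher's knowledge with a network-science method and train one student module per disjoint partition. As a rough illustration only, the sketch below builds a co-activation graph over a teacher's final-layer filters from synthetic data and partitions it with greedy modularity maximization via networkx; the graph construction, the thresholding rule, and the choice of community-detection algorithm are assumptions made for illustration, not the paper's exact procedure.

```python
# Illustrative sketch (assumptions, not the paper's algorithm):
# (1) build a filter co-activation graph from the teacher's final conv layer,
# (2) partition it with a community-detection method (greedy modularity here),
# (3) each resulting filter group would guide one student's distillation targets.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Assumption: per-image average activation of each final-layer teacher filter,
# as a (num_images x num_filters) matrix; synthetic data stands in for a real run.
num_images, num_filters, num_students = 256, 64, 4
activations = rng.random((num_images, num_filters))

# Edge weight = how often two filters are simultaneously "important" (above the
# per-image median activation) for the same image; a simple co-activation proxy.
important = activations > np.median(activations, axis=1, keepdims=True)
co_importance = important.T.astype(float) @ important.astype(float)

G = nx.Graph()
for i in range(num_filters):
    for j in range(i + 1, num_filters):
        if co_importance[i, j] > 0:
            G.add_edge(i, j, weight=co_importance[i, j])

# Modularity-based community detection yields disjoint filter groups.
communities = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
partitions = [sorted(c) for c in communities][:num_students]  # cap at student count (assumption)
for k, part in enumerate(partitions):
    print(f"student {k}: {len(part)} teacher filters")
```

In the system described by the abstract, each such partition would then supervise one compact student network, so that the students jointly mimic the teacher while exchanging only minimal information at inference time.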




• Published in:
  ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s
  Special Issue: ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019
  October 2019, 1423 pages
  ISSN: 1539-9087
  EISSN: 1558-3465
  DOI: 10.1145/3365919

            Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 8 October 2019
            • Accepted: 1 July 2019
            • Revised: 1 June 2019
            • Received: 1 April 2019

