Research Article | Open Access
DOI: 10.1145/3578244.3583735

Predicting Inference Latency of Neural Architectures on Mobile Devices

Published: 15 April 2023

ABSTRACT

Due to the proliferation of inference tasks on mobile devices, state-of-the-art neural architectures are typically designed using Neural Architecture Search (NAS) to achieve good tradeoffs between machine learning accuracy and inference latency. Because measuring the inference latency of the huge set of candidate architectures explored during NAS is infeasible, latency must be predicted instead; on mobile devices, this prediction is challenging due to hardware heterogeneity, optimizations applied by machine learning frameworks, and the diversity of neural architectures. Motivated by these challenges, we first quantitatively assess the characteristics of neural architectures and mobile devices that significantly affect inference latency. Based on this assessment, we propose an operation-wise framework that addresses these challenges through operation-wise latency predictors and achieves high accuracy in end-to-end latency predictions, as shown by comprehensive evaluations on multiple mobile devices with multicore CPUs and GPUs. To show that our approach does not require expensive data collection, we also demonstrate that accurate predictions can be achieved for real-world neural architectures using only small amounts of profiling data.
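
To make the operation-wise idea concrete, the sketch below shows one plausible realization; it is an illustrative assumption, not the paper's implementation. A separate regressor (here, scikit-learn's GradientBoostingRegressor, with hypothetical per-operation features such as input size, channels, kernel size, and stride) is trained per operation type from profiled latencies, and an architecture's end-to-end latency is estimated by summing per-operation predictions.

    # A minimal sketch of operation-wise latency prediction (an illustrative
    # assumption, not the paper's actual implementation): one regressor per
    # operation type, trained on profiled latencies; end-to-end latency is
    # estimated as the sum of per-operation predictions.
    from sklearn.ensemble import GradientBoostingRegressor

    class OperationWiseLatencyPredictor:
        def __init__(self):
            # Maps an operation type (e.g., "conv2d") to its trained regressor.
            self.models = {}

        def fit(self, profiles):
            # profiles: {op_type: (X, y)}, where each row of X describes one
            # operation configuration (hypothetical features: input size,
            # channels, kernel size, stride) and y holds the measured
            # latencies in ms on the target device.
            for op_type, (X, y) in profiles.items():
                model = GradientBoostingRegressor()
                model.fit(X, y)
                self.models[op_type] = model

        def predict_end_to_end(self, architecture):
            # architecture: list of (op_type, features) pairs; the end-to-end
            # estimate is the sum of the per-operation predictions.
            return sum(self.models[op_type].predict([features])[0]
                       for op_type, features in architecture)

Note that a plain sum ignores framework-level effects such as operator fusion, one of the optimization challenges the abstract highlights, so a practical predictor would need to account for them.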


Published in

ICPE '23: Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering
April 2023, 244 pages
ISBN: 9798400700682
DOI: 10.1145/3578244

            Copyright © 2023 Owner/Author

            Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ICPE '23 paper acceptance rate: 15 of 46 submissions (33%). Overall acceptance rate: 252 of 851 submissions (30%).
