Abstract
To implement machine learning applications in real-time safety-critical systems, we previously introduced a predictable framework named ACETONE. This framework compiles the detailed description of an offline-trained feed-forward deep neural network into semantically equivalent C code. In this paper, we improve the performance of the generated C code by adding gemm-based convolutions to ACETONE. The code incorporating the gemm routines retains ACETONE's properties of semantics preservation and timing predictability. We compare the proposed method with ACETONE's initial version, Keras2c and uTVM on a realistic set of machine learning benchmarks, and show that the introduced convolution algorithms allow a trade-off between performance and memory footprint.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Notes
An additional parameter, the dilation, is supported by the code generation but not detailed here.
References
Abadi M, Agarwal A, Barham P, et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. URL https://www.tensorflow.org/, software available from tensorflow.org
Alves E, Bhatt D, Hall B, et al (2018) Considerations in assuring safety of increasingly autonomous systems. NASA
Amiri H, Shahbahrami A (2017) High performance implementation of 2D convolution using Intel’s advanced vector extensions. In: 2017 Artificial intelligence and signal processing conference (AISP), pp 25–30, https://doi.org/10.1109/AISP.2017.8324097
Anderson A, Vasudevan A, Keane C, et al (2017) Low-memory GEMM-based convolution algorithms for deep neural networks. https://doi.org/10.48550/arXiv.1709.03395, arXiv:1709.03395 [cs]
ApacheTVM (2021) microTVM: TVM on bare-metal. URL https://tvm.apache.org/docs/topic/microtvm/index.html
Ballabriga C, Cassé H, Rochange C, et al (2010) OTAWA: an open toolbox for adaptive WCET analysis (regular paper). In: IFIP Workshop on software technologies for future embedded and ubiquitous systems (SEUS)
Bhattacharyya S, Cofer D, Musliner D, et al (2015) Certification considerations for adaptive systems. 2015 International conference on unmanned aircraft systems, ICUAS 2015 pp 270–279. https://doi.org/10.1109/ICUAS.2015.7152300
Chellapilla K, Puri S, Simard P (2006) High performance convolutional neural networks for document processing. In: Lorette G (ed) Tenth international workshop on frontiers in handwriting recognition, Université de Rennes 1. Suvisoft, La Baule (France), URL https://hal.inria.fr/inria-00112631, http://www.suvisoft.com
Chen T, Moreau T, Jiang Z, et al (2018a) TVM: end-to-end optimization stack for deep learning. CoRR abs/1802.04799
Chen T, Zheng L, Yan E, et al (2018b) Learning to optimize tensor programs. In: Proceedings of the 32nd international conference on neural information processing systems. Curran Associates Inc., Red Hook, NY, USA, NIPS’18, p 3393-3404
Chetlur S, Woolley C, Vandermersch P, et al (2014) cuDNN: efficient primitives for deep learning. CoRR abs/1410.0759
Chichin S, Portes D, Blunder M, et al (2020) Capability to embed deep neural networks: study on CPU processor in avionics context. In: 10th European congress embedded real time systems (ERTS 2020)
Cong J, Xiao B (2014) Minimizing computation in convolutional neural networks. In: Wermter S, Weber C, Duch W et al (eds) Artificial neural networks and machine learning - ICANN 2014. Springer, Cham, pp 281–290
Conlin R, Erickson K, Abbate J et al (2021) Keras2c: a library for converting Keras neural networks to real-time compatible C. Eng Appl Artif Intell 100:104182
ONNX Runtime developers (2021) ONNX Runtime. URL https://onnxruntime.ai/
Dongarra JJ, Du Croz J, Hammarling S et al (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17. https://doi.org/10.1145/77626.79170
Dukhan M (2019) The indirect convolution algorithm. CoRR abs/1907.02129
EUROCAE WG-114/SAE joint group (2021) Certification/approval of aeronautical systems based on AI. Ongoing standardization
Gholami A, Kim S, Dong Z, et al (2021) A survey of quantization methods for efficient neural network inference. CoRR abs/2103.13630
Gong Y, Liu L, Yang M, et al (2014) Compressing deep convolutional networks using vector quantization. CoRR abs/1412.6115
Goto K, van de Geijn RA (2008) Anatomy of high-performance matrix multiplication. ACM Trans Math Softw 34(3):1–25. https://doi.org/10.1145/1356052.1356053
Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: Bengio Y, LeCun Y (eds) 4th International conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, conference track proceedings, arXiv:1510.00149
Hoseinzade E, Haratizadeh S (2019) CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst Appl 129:273–285
IEEE (2019) IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2019 (Revision of IEEE 754-2008) pp 1–84. https://doi.org/10.1109/IEEESTD.2019.8766229
Jia Z, Padon O, Thomas J, et al (2019) TASO. In: Proceedings of the 27th ACM symposium on operating systems principles. ACM, https://doi.org/10.1145/3341301.3359630
Kalray (2021) MPPA® Coolidge™ Processor - white paper. URL https://www.kalrayinc.com/documentation/
Karmani RK, Agha G, Squillante MS et al (2011) ATLAS (Automatically tuned linear algebra software). Encyclopedia of parallel computing. Springer, New York, pp 95–101
Krizhevsky A (2009) Learning multiple layers of features from tiny images. Tech. Rep. 0, University of Toronto
Lattner C, Amini M, Bondhugula U, et al (2021) MLIR: scaling compiler infrastructure for domain specific computation. In: Lee JW, Soffa ML, Zaks A (eds) International symposium on code generation and optimization, (CGO), pp 2–14
Lavin A, Gray S (2016) Fast algorithms for convolutional neural networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 4013–4021, https://doi.org/10.1109/CVPR.2016.435
LeCun Y, Boser BE, Denker JS et al (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551
Li C, Yang Y, Feng M, et al (2016) Optimizing memory efficiency for deep convolutional neural networks on GPUs. In: Proceedings of the international conference for high performance computing, networking, storage and analysis, SC 2016
Lin S, Liu N, Nazemi M, et al (2018) FFT-based deep learning deployment in embedded systems. In: 2018 Design, automation and test in Europe conference and exhibition (DATE), pp 1045–1050, https://doi.org/10.23919/DATE.2018.8342166
Liu Y, Wang Y, Yu R, et al (2018) Optimizing CNN model inference on CPUs. https://doi.org/10.48550/ARXIV.1809.02697, arXiv:1809.02697
Low TM, Igual FD, Smith TM et al (2016) Analytical modeling is enough for high-performance BLIS. ACM Trans Math Softw 43(2):1–18. https://doi.org/10.1145/2925987
Mathieu M, Henaff M, LeCun Y (2014) Fast training of convolutional networks through FFTs. In: 2nd International conference on learning representations, ICLR 2014, April 14-16, 2014
NVIDIA (2021) TensorRT documentation
Park H, Kim D, Ahn J, et al (2016) Zero and data reuse-aware fast convolution for deep neural networks on GPU. In: Proceedings of the eleventh IEEE/ACM/IFIP international conference on hardware/software codesign and system synthesis. Association for Computing Machinery, New York, NY, USA, CODES '16, https://doi.org/10.1145/2968456.2968476
Paszke A, Gross S, Massa F, et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, et al (eds) Advances in neural information processing systems 32. p 8024–8035
Pearce H, Yang X, Roop PS et al (2020) Designing neural networks for real-time systems. IEEE Embed Syst Lett 13:1–1
Perez-Cerrolaza J, Abella J, Kosmidis L et al (2022) GPU devices for safety-critical systems: a survey. ACM Comput Surv. https://doi.org/10.1145/3549526
Pompougnac H, Beaugnon U, Cohen A, et al (2020) From SSA to synchronous concurrency and back. Research report RR-9380, INRIA Sophia Antipolis - Méditerranée (France), URL https://hal.inria.fr/hal-03043623
Pujol R, Jorba J, Tabani H, et al (2022) Vector extensions in cots processors to increase guaranteed performance in real-time systems. ACM Trans Embed Comput Syst
Ray PP (2022) A review on TinyML: state-of-the-art and prospects. J King Saud Univ Comput Inf Sci 34(4):1595–1623
RTCA/EUROCAE (2011) DO-178C/ED-12C - Software considerations in airborne systems and equipment certification
Schoeberl M, Abbaspour S, Akesson B et al (2015) T-crest: time-predictable multi-core architecture for embedded systems. J Syst Archit 61(9):449–471
Sentieys O, Filip S, Briand D, et al (2021) Adequatedl: approximating deep learning accelerators. In: 24th International symposium on design and diagnostics of electronic circuits systems (DDECS 21)
Silva IDA, Carle T, Gauffriau A, et al (2022) ACETONE: predictable programming framework for ML applications in safety-critical systems. In: 34th Euromicro conference on real-time systems, ECRTS 2022, July 5-8, 2022, Modena, Italy, pp 3:1–3:19
Stahl R (2021) μTVM StaticRT CodeGen. URL https://github.com/tum-ei-eda/utvm_staticrt_codegen
TensorFlow (2022) Simple audio recognition: recognizing keywords. URL https://www.tensorflow.org/tutorials/audio/simple_audio
Texas Instruments (2013) TCI6630K2L Multicore DSP+ARM KeyStone II System-on-Chip. Tech. Rep. SPRS893E, Texas Instruments Incorporated
The Khronos NNEF Working Group (2018) Neural network exchange format
Tollenaere N, Iooss G, Pouget S et al (2022) Autotuning convolutions is easier than you think. ACM Trans Archit Code Optim. https://doi.org/10.1145/3570641
Van Zee FG, van de Geijn RA (2015) BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans Math Softw 41(3):1–33
Warden P (2018) Speech commands: a dataset for limited-vocabulary speech recognition. CoRR abs/1804.03209
Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimizations of software and the ATLAS project. Parallel Comput 27(1–2):3–35. https://doi.org/10.1016/s0167-8191(00)00087-9
Wilhelm R, Engblom J, Ermedahl A et al (2008) The worst-case execution-time problem-overview of methods and survey of tools. ACM Trans Embed Comput Syst 7:1–53
Xianyi Z, Qian W, Saar W (2011) OpenBLAS: an optimized BLAS library. URL https://www.openblas.net/
Zhang J, Franchetti F, Low TM (2018) High performance zero-memory overhead direct convolutions. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, pp 5776–5785, URL https://proceedings.mlr.press/v80/zhang18d.html
Zheng L, Jia C, Sun M, et al (2020) Ansor: generating high-performance tensor programs for deep learning. https://doi.org/10.48550/ARXIV.2006.06762, arXiv:2006.06762
Funding
This work has benefited from the AI Interdisciplinary Institute ANITI, which is funded by the French “Investing for the Future – PIA3” program under the Grant agreement ANR-19-P3IA-0004.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
De Albuquerque Silva, I., Carle, T., Gauffriau, A. et al. Extending a predictable machine learning framework with efficient gemm-based convolution routines. Real-Time Syst 59, 408–437 (2023). https://doi.org/10.1007/s11241-023-09407-z