Extending a predictable machine learning framework with efficient gemm-based convolution routines

Abstract

To implement machine learning applications in real-time safety-critical systems, we previously introduced a predictable framework named ACETONE. This framework compiles the detailed description of an off-line trained feed-forward deep neural network into equivalent C code. In this paper, we improve the performance of the generated C code by including gemm-based convolutions in ACETONE. The code incorporating the gemm routines maintains the ACETONE properties of semantics preservation and timing predictability. We compare the proposed method with ACETONE's initial version, Keras2c, and uTVM on a realistic set of machine learning benchmarks, and show that the introduced convolution algorithms allow a trade-off between performance and memory footprint.
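
To make the idea concrete, a gemm-based convolution lowers each convolutional layer to a single matrix multiplication: an im2col step first unfolds the input feature map into a column matrix, and a GEMM routine then multiplies it by the layer's weight matrix. The C sketch below illustrates this under simplifying assumptions (single image, channel-major layout, stride 1, no padding, no bias); the function names and signatures are ours for illustration and do not reproduce the code that ACETONE actually generates.

    /* Unfold input patches into columns: one column per output position.
     * col has c_in*kh*kw rows and h_out*w_out columns. */
    static void im2col(const float *in, float *col,
                       int c_in, int h_in, int w_in,
                       int kh, int kw, int h_out, int w_out)
    {
        for (int c = 0; c < c_in; c++)
            for (int i = 0; i < kh; i++)
                for (int j = 0; j < kw; j++) {
                    int row = (c * kh + i) * kw + j;
                    for (int y = 0; y < h_out; y++)
                        for (int x = 0; x < w_out; x++)
                            col[row * (h_out * w_out) + y * w_out + x] =
                                in[(c * h_in + y + i) * w_in + (x + j)];
                }
    }

    /* Naive GEMM: out = a (m x k) times b (k x n). A production version
     * would use a blocked routine in the style of BLIS or OpenBLAS. */
    static void gemm_nn(int m, int n, int k,
                        const float *a, const float *b, float *out)
    {
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) {
                float acc = 0.0f;
                for (int p = 0; p < k; p++)
                    acc += a[i * k + p] * b[p * n + j];
                out[i * n + j] = acc;
            }
    }

    /* Convolution as im2col + one GEMM: the weight matrix
     * (c_out x c_in*kh*kw) multiplies the column matrix
     * (c_in*kh*kw x h_out*w_out), yielding the output feature map. */
    void conv2d_gemm(const float *in, const float *weights, float *out,
                     float *col_buf, int c_in, int h_in, int w_in,
                     int c_out, int kh, int kw)
    {
        int h_out = h_in - kh + 1;  /* stride 1, no padding */
        int w_out = w_in - kw + 1;
        im2col(in, col_buf, c_in, h_in, w_in, kh, kw, h_out, w_out);
        gemm_nn(c_out, h_out * w_out, c_in * kh * kw, weights, col_buf, out);
    }

The memory side of the trade-off mentioned above is visible in col_buf: the unfolded matrix replicates each input element up to kh*kw times, which is precisely the overhead that low-memory gemm-based variants such as Anderson et al. (2017) aim to reduce.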



Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

  1. There may be an additional parameter, the dilation, which is supported by the code generation but not detailed here; the sketch below illustrates where it would appear.
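
To show where this parameter would enter a gemm-based lowering, the hypothetical sketch below adds a dilation factor d to an im2col-style patch gather, spacing the kernel taps d input elements apart; d = 1 recovers the standard convolution. It is an illustration only, not the generated code.

    /* Hypothetical im2col gather with dilation factor d. For stride 1 and
     * no padding, the output size shrinks to h_out = h_in - d*(kh - 1)
     * and w_out = w_in - d*(kw - 1). */
    static void im2col_dilated(const float *in, float *col,
                               int c_in, int h_in, int w_in,
                               int kh, int kw, int d,
                               int h_out, int w_out)
    {
        for (int c = 0; c < c_in; c++)
            for (int i = 0; i < kh; i++)
                for (int j = 0; j < kw; j++) {
                    int row = (c * kh + i) * kw + j;
                    for (int y = 0; y < h_out; y++)
                        for (int x = 0; x < w_out; x++)
                            col[row * (h_out * w_out) + y * w_out + x] =
                                in[(c * h_in + y + i * d) * w_in + (x + j * d)];
                }
    }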

References

  • Abadi M, Agarwal A, Barham P, et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. URL https://www.tensorflow.org/

  • Alves E, Bhatt D, Hall B, et al (2018) Considerations in assuring safety of increasingly autonomous systems. NASA

  • Amiri H, Shahbahrami A (2017) High performance implementation of 2D convolution using Intel’s advanced vector extensions. In: 2017 Artificial intelligence and signal processing conference (AISP), pp 25–30, https://doi.org/10.1109/AISP.2017.8324097

  • Anderson A, Vasudevan A, Keane C, et al (2017) Low-memory GEMM-based convolution algorithms for deep neural networks. https://doi.org/10.48550/arXiv.1709.03395, arXiv:1709.03395 [cs]

  • Apache TVM (2021) microTVM: TVM on bare-metal. URL https://tvm.apache.org/docs/topic/microtvm/index.html

  • Ballabriga C, Cassé H, Rochange C, et al (2010) OTAWA: an open toolbox for adaptive WCET analysis (regular paper). In: IFIP Workshop on software technologies for future embedded and ubiquitous systems (SEUS)

  • Bhattacharyya S, Cofer D, Musliner D, et al (2015) Certification considerations for adaptive systems. In: 2015 International conference on unmanned aircraft systems (ICUAS 2015), pp 270–279. https://doi.org/10.1109/ICUAS.2015.7152300

  • Chellapilla K, Puri S, Simard P (2006) High performance convolutional neural networks for document processing. In: Lorette G (ed) Tenth international workshop on frontiers in handwriting recognition. Suvisoft, La Baule, France. URL https://hal.inria.fr/inria-00112631

  • Chen T, Moreau T, Jiang Z, et al (2018a) TVM: end-to-end optimization stack for deep learning. CoRR arXiv:1802.04799

  • Chen T, Zheng L, Yan E, et al (2018b) Learning to optimize tensor programs. In: Proceedings of the 32nd international conference on neural information processing systems. Curran Associates Inc., Red Hook, NY, USA, NIPS'18, pp 3393–3404

  • Chetlur S, Woolley C, Vandermersch P, et al (2014) cuDNN: efficient primitives for deep learning. CoRR arXiv:1410.0759

  • Chichin S, Portes D, Blunder M, et al (2020) Capability to embed deep neural networks: study on CPU processor in avionics context. In: 10th European congress embedded real time systems (ERTS 2020)

  • Cong J, Xiao B (2014) Minimizing computation in convolutional neural networks. In: Wermter S, Weber C, Duch W et al (eds) Artificial neural networks and machine learning - ICANN 2014. Springer, Cham, pp 281–290

  • Conlin R, Erickson K, Abbate J et al (2021) Keras2c: a library for converting Keras neural networks to real-time compatible C. Eng Appl Artif Intell 100:104182

  • ONNX Runtime developers (2021) ONNX Runtime. URL https://onnxruntime.ai/

  • Dongarra JJ, Du Croz J, Hammarling S et al (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17. https://doi.org/10.1145/77626.79170

  • Dukhan M (2019) The indirect convolution algorithm. CoRR arXiv:abs/1907.02129

  • EUROCAE WG-114/SAE joint group (2021) Certification/approval of aeronautical systems based on AI. Ongoing standardization

  • Gholami A, Kim S, Dong Z, et al (2021) A survey of quantization methods for efficient neural network inference. CoRR arXiv:2103.13630

  • Gong Y, Liu L, Yang M, et al (2014) Compressing deep convolutional networks using vector quantization. CoRR arXiv:1412.6115

  • Goto K, van de Geijn RA (2008) Anatomy of high-performance matrix multiplication. ACM Trans Math Softw 34(3):1–25. https://doi.org/10.1145/1356052.1356053

  • Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: Bengio Y, LeCun Y (eds) 4th International conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, conference track proceedings. arXiv:1510.00149

  • Hoseinzade E, Haratizadeh S (2019) CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst Appl 129:273–285

  • IEEE (2019) IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2019 (Revision of IEEE 754-2008) pp 1–84. https://doi.org/10.1109/IEEESTD.2019.8766229

  • Jia Z, Padon O, Thomas J, et al (2019) TASO: optimizing deep learning computation with automatic generation of graph substitutions. In: Proceedings of the 27th ACM symposium on operating systems principles. ACM. https://doi.org/10.1145/3341301.3359630

  • Kalray (2021) MPPA® Coolidge™ Processor - white paper. URL https://www.kalrayinc.com/documentation/

  • Karmani RK, Agha G, Squillante MS et al (2011) ATLAS (Automatically tuned linear algebra software). Encyclopedia of parallel computing. Springer, New York, pp 95–101

  • Krizhevsky A (2009) Learning multiple layers of features from tiny images. Tech. Rep. 0, University of Toronto

  • Lattner C, Amini M, Bondhugula U, et al (2021) MLIR: scaling compiler infrastructure for domain specific computation. In: Lee JW, Soffa ML, Zaks A (eds) International symposium on code generation and optimization, (CGO), pp 2–14

  • Lavin A, Gray S (2016) Fast algorithms for convolutional neural networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 4013–4021, https://doi.org/10.1109/CVPR.2016.435

  • LeCun Y, Boser BE, Denker JS et al (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551

  • Li C, Yang Y, Feng M, et al (2016) Optimizing memory efficiency for deep convolutional neural networks on GPUs. In: Proceedings of the international conference for high performance computing, networking, storage and analysis, SC 2016

  • Lin S, Liu N, Nazemi M, et al (2018) FFT-based deep learning deployment in embedded systems. In: 2018 Design, automation and test in Europe conference and exhibition (DATE), pp 1045–1050. https://doi.org/10.23919/DATE.2018.8342166

  • Liu Y, Wang Y, Yu R, et al (2018) Optimizing CNN model inference on CPUs. https://doi.org/10.48550/ARXIV.1809.02697, arXiv:1809.02697

  • Low TM, Igual FD, Smith TM et al (2016) Analytical modeling is enough for high-performance BLIS. ACM Trans Math Softw 43(2):1–18. https://doi.org/10.1145/2925987

  • Mathieu M, Henaff M, LeCun Y (2014) Fast training of convolutional networks through FFTs. In: 2nd International conference on learning representations (ICLR 2014)

  • NVIDIA (2021) TensorRT documentation

  • Park H, Kim D, Ahn J, et al (2016) Zero and data reuse-aware fast convolution for deep neural networks on GPU. In: Proceedings of the eleventh IEEE/ACM/IFIP international conference on hardware/software codesign and system synthesis. Association for Computing Machinery, New York, NY, USA, CODES '16. https://doi.org/10.1145/2968456.2968476

  • Paszke A, Gross S, Massa F, et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, et al (eds) Advances in neural information processing systems 32, pp 8024–8035

  • Pearce H, Yang X, Roop PS et al (2020) Designing neural networks for real-time systems. IEEE Embed Syst Lett 13:1–1

  • Perez-Cerrolaza J, Abella J, Kosmidis L et al (2022) GPU devices for safety-critical systems: a survey. ACM Comput Surv. https://doi.org/10.1145/3549526

  • Pompougnac H, Beaugnon U, Cohen A, et al (2020) From SSA to synchronous concurrency and back. Research report RR-9380, INRIA Sophia Antipolis - Méditerranée (France), URL https://hal.inria.fr/hal-03043623

  • Pujol R, Jorba J, Tabani H, et al (2022) Vector extensions in COTS processors to increase guaranteed performance in real-time systems. ACM Trans Embed Comput Syst

  • Ray PP (2022) A review on TinyML: state-of-the-art and prospects. J King Saud Univ Comput Inf Sci 34(4):1595–1623

  • RTCA/EUROCAE (2011) DO-178C/ED-12C - Software considerations in airborne systems and equipment certification

  • Schoeberl M, Abbaspour S, Akesson B et al (2015) T-CREST: time-predictable multi-core architecture for embedded systems. J Syst Archit 61(9):449–471

  • Sentieys O, Filip S, Briand D, et al (2021) AdequateDL: approximating deep learning accelerators. In: 24th International symposium on design and diagnostics of electronic circuits systems (DDECS 21)

  • Silva IDA, Carle T, Gauffriau A, et al (2022) ACETONE: predictable programming framework for ML applications in safety-critical systems. In: 34th Euromicro conference on real-time systems, ECRTS 2022, July 5-8, 2022, Modena, Italy, pp 3:1–3:19

  • Stahl R (2021) μTVM StaticRT CodeGen. URL https://github.com/tum-ei-eda/utvm_staticrt_codegen

  • TensorFlow (2022) Simple audio recognition: recognizing keywords. URL https://www.tensorflow.org/tutorials/audio/simple_audio

  • Texas Instruments (2013) TCI6630K2L Multicore DSP+ARM KeyStone II System-on-Chip. Tech. Rep. SPRS893E, Texas Instruments Incorporated

  • The Khronos NNEF Working Group (2018) Neural network exchange format

  • Tollenaere N, Iooss G, Pouget S et al (2022) Autotuning convolutions is easier than you think. ACM Trans Archit Code Optim. https://doi.org/10.1145/3570641

  • Van Zee FG, van de Geijn RA (2015) BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans Math Softw 41(3):1–33

  • Warden P (2018) Speech commands: a dataset for limited-vocabulary speech recognition. CoRR arXiv:1804.03209

  • Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimizations of software and the ATLAS project. Parallel Comput 27(1–2):3–35. https://doi.org/10.1016/s0167-8191(00)00087-9

  • Wilhelm R, Engblom J, Ermedahl A et al (2008) The worst-case execution-time problem – overview of methods and survey of tools. ACM Trans Embed Comput Syst 7:1–53

  • Xianyi Z, Qian W, Saar W (2011) OpenBLAS: an optimized BLAS library. URL https://www.openblas.net/

  • Zhang J, Franchetti F, Low TM (2018) High performance zero-memory overhead direct convolutions. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, pp 5776–5785. URL https://proceedings.mlr.press/v80/zhang18d.html

  • Zheng L, Jia C, Sun M, et al (2020) Ansor: generating high-performance tensor programs for deep learning. https://doi.org/10.48550/ARXIV.2006.06762, arXiv:2006.06762

Funding

This work has benefited from the AI Interdisciplinary Institute ANITI, which is funded by the French “Investing for the Future – PIA3” program under the Grant agreement ANR-19-P3IA-0004.

Author information

Corresponding author

Correspondence to Iryna De Albuquerque Silva.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

De Albuquerque Silva, I., Carle, T., Gauffriau, A. et al. Extending a predictable machine learning framework with efficient gemm-based convolution routines. Real-Time Syst 59, 408–437 (2023). https://doi.org/10.1007/s11241-023-09407-z
