A hybrid GPU-FPGA based design methodology for enhancing machine learning applications performance

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

The high-density computing requirements of machine learning (ML) are a challenging performance bottleneck. Limited by sequential instruction execution, traditional general-purpose processors are not well suited to efficient ML. In this work, we present an ML system design methodology based on GPU and FPGA to tackle this problem. The core idea of our proposal is that, when designing an ML platform, we leverage the high-density computing of the graphics processing unit (GPU) to perform model training and exploit the low latency of the field programmable gate array (FPGA) to perform model inference. In between, we define a model converter, which transforms the model used by the training module into one that can be used by the inference module. We evaluated our approach through two use cases. The first is handwritten digit recognition with a convolutional neural network, while the second is the prediction of a data center’s power usage effectiveness with a deep neural network regression algorithm. The experimental results indicate that our solution can take advantage of the parallel computing capacity of the GPU and FPGA to improve the efficiency of training and inference significantly. Meanwhile, the solution preserves the accuracy and the mean square error when converting the models between the different frameworks.
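The claim that conversion preserves the mean square error can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy one-hidden-layer regressor, the dict-based "training format", the flat layer list used as the "inference format", and all function names are hypothetical and are not taken from the paper or from any real framework. The sketch only shows the kind of check one might run after a conversion step, namely that both representations give the same predictions and therefore the same error on the same data.

```python
# Hypothetical sketch: the model formats and function names below are
# illustrative assumptions, not the converter described in the article.

def mse(y_true, y_pred):
    """Mean square error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def predict_training_format(model, x):
    """Evaluate a one-hidden-layer ReLU regressor stored as named layers."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(model["hidden"]["weights"], model["hidden"]["bias"])
    ]
    out = model["output"]
    return sum(w * h for w, h in zip(out["weights"], hidden)) + out["bias"]


def convert(model):
    """Toy 'model converter': repackage named layers as a flat list of
    (weight_matrix, bias_vector) pairs for the inference side."""
    return [
        (model["hidden"]["weights"], model["hidden"]["bias"]),
        ([model["output"]["weights"]], [model["output"]["bias"]]),
    ]


def predict_inference_format(layers, x):
    """Evaluate the converted model: ReLU after every layer except the last."""
    activations = x
    for i, (weights, bias) in enumerate(layers):
        z = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, bias)
        ]
        is_last = i == len(layers) - 1
        activations = z if is_last else [max(0.0, v) for v in z]
    return activations[0]


# Made-up weights and data, chosen only to exercise the check.
trained = {
    "hidden": {"weights": [[0.5, -0.25], [0.125, 0.75]], "bias": [0.0, 0.25]},
    "output": {"weights": [1.0, -0.5], "bias": 1.0},
}
inputs = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.5]]
targets = [1.0, 0.5, 1.5]

converted = convert(trained)
preds_before = [predict_training_format(trained, x) for x in inputs]
preds_after = [predict_inference_format(converted, x) for x in inputs]
mse_before = mse(targets, preds_before)
mse_after = mse(targets, preds_after)
```

In a real pipeline the two prediction functions would be replaced by the training framework and the FPGA inference runtime, and small numerical differences between them would call for a tolerance rather than an exact comparison.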

Acknowledgements

This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson Research Canada and the Canada Research Chair in Sustainable Smart Eco-Cloud. We would also like to thank Yves Lemieux for his insightful feedback during the research work.

Author information

Corresponding author

Correspondence to Abdelouahed Gherbi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, X., Ounifi, HA., Gherbi, A. et al. A hybrid GPU-FPGA based design methodology for enhancing machine learning applications performance. J Ambient Intell Human Comput 11, 2309–2323 (2020). https://doi.org/10.1007/s12652-019-01357-4

  • DOI: https://doi.org/10.1007/s12652-019-01357-4
