Abstract
Deep neural networks (DNNs) are among the most popular machine learning methods and are widely used in many modern applications. Training a DNN, however, is time-consuming, and accelerating it has been the focus of much research. In this paper, we speed up the training of DNNs for automatic speech recognition on a heterogeneous (CPU + MIC) architecture. We apply asynchronous methods for I/O and communication operations and propose an adaptive load-balancing method. In addition, we employ momentum to speed up the convergence of the gradient descent algorithm. Experimental results show that our optimized algorithm achieves a 20-fold speedup on a CPU + MIC platform compared with the original sequential algorithm on a single-core CPU.
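The momentum technique mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it shows classical momentum gradient descent on a toy one-dimensional quadratic, with generic hyperparameter names (`lr`, `beta`) chosen for illustration.

```python
# Illustrative sketch of gradient descent with classical momentum,
# minimizing f(x) = (x - 3)^2. Not the authors' code; hyperparameters
# lr (learning rate) and beta (momentum coefficient) are assumed values.

def grad(x):
    """Gradient of f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def momentum_descent(x0, lr=0.1, beta=0.8, steps=100):
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # velocity accumulates past gradients
        x = x + v                    # momentum damps oscillations on this curvature
    return x

x_star = momentum_descent(0.0)  # converges toward the minimum at x = 3
```

The velocity term lets successive gradients that point in the same direction reinforce each other, which is why momentum typically reaches the optimum in fewer iterations than plain gradient descent.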
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant No. 61472431. The authors would like to thank Chengkun Wu for his advice, and the anonymous reviewers for their time, work, and valuable feedback.
Fan, S., Fei, J. & Shen, L. Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC. Int J Parallel Prog 46, 660–673 (2018). https://doi.org/10.1007/s10766-017-0535-9