
Hierarchical Expert Neural Network System for Speech Recognition

Journal of Control, Automation and Electrical Systems

Abstract

This work proposes a hierarchical architecture composed of a set of expert neural networks, based on the ensemble method with dynamic selection of classifiers, for application in speech recognition systems. To this end, 30 commands in Brazilian Portuguese were encoded as a two-dimensional time matrix obtained by applying the discrete cosine transform (DCT) to the mel-cepstral coefficients. These patterns were then mapped by a nonlinear transformation to a high-dimensional space through a set of Gaussian radial basis functions (GRBFs) parameterized by the centroid and covariance of each class. Classification was performed with the dynamic classifier selection approach: multilayer perceptron (MLP) and learning vector quantization (LVQ) configurations were analyzed as the multiple classifiers, each specialized in one of the subsets into which the full set of classes was divided. Given a new test pattern, the GRBF with the highest receptive-field response to the input feature vector indicates the class nearest to the pattern, thereby routing it to the expert neural network that provides the final classification based on local accuracy.
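The feature encoding and GRBF routing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the DCT is applied along the time trajectory of each mel-cepstral coefficient and truncated to a low order, uses one GRBF per class with a diagonal covariance in place of the full class covariance, and leaves out the MLP/LVQ experts and the local-accuracy stage. All function, class, and expert names (`dct_time_features`, `route`, `cmd_a`, `expert_1`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Params = Dict[str, Tuple[Vector, Vector]]


def dct2(x: Sequence[float]) -> Vector:
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def dct_time_features(mfcc: List[Vector], order: int) -> Vector:
    """Assumed encoding: each row of `mfcc` is one mel-cepstral coefficient
    over time (one value per frame). A DCT along each row, truncated to its
    first `order` terms, gives a fixed-size (coefficients x order) matrix
    whatever the utterance length; it is flattened into one vector here."""
    feats: Vector = []
    for track in mfcc:
        feats.extend(dct2(track)[:order])
    return feats


def fit_class_gaussians(data: Dict[str, List[Vector]]) -> Params:
    """Centroid and per-feature variance of each class. The diagonal
    covariance is a simplifying assumption of this sketch."""
    params: Params = {}
    for label, vecs in data.items():
        n, d = len(vecs), len(vecs[0])
        mean = [sum(v[j] for v in vecs) / n for j in range(d)]
        var = [max(sum((v[j] - mean[j]) ** 2 for v in vecs) / n, 1e-6)
               for j in range(d)]
        params[label] = (mean, var)
    return params


def log_grbf(x: Vector, mean: Vector, var: Vector) -> float:
    """Log of the receptive field exp(-0.5 * squared Mahalanobis distance)."""
    return -0.5 * sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def grbf_features(x: Vector, params: Params) -> Vector:
    """Nonlinear map of x to the vector of GRBF responses, one per class."""
    return [math.exp(log_grbf(x, *params[c])) for c in params]


def route(x: Vector, params: Params,
          class_to_expert: Dict[str, str]) -> Tuple[str, str]:
    """Pick the class whose GRBF responds most strongly to x and return it
    with the (hypothetical) expert assigned to that class's subset.
    Comparing log-responses avoids underflow when all responses are tiny."""
    nearest = max(params, key=lambda c: log_grbf(x, *params[c]))
    return nearest, class_to_expert[nearest]


if __name__ == "__main__":
    # Toy 2-D data; class and expert names are hypothetical placeholders.
    train = {
        "cmd_a": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]],
        "cmd_b": [[3.0, 3.1], [2.9, 3.0], [3.2, 2.8]],
    }
    params = fit_class_gaussians(train)
    experts = {"cmd_a": "expert_1", "cmd_b": "expert_2"}
    print(route([2.8, 3.0], params, experts))  # ('cmd_b', 'expert_2')
```

Here the selected expert is only named; in the full system it would be the MLP or LVQ network trained on that subset of classes, which then returns the final label.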



Author information

Correspondence to Priscila Rocha.


About this article


Cite this article

Rocha, P., Silva, W. & Barros, A. Hierarchical Expert Neural Network System for Speech Recognition. J Control Autom Electr Syst 30, 347–359 (2019). https://doi.org/10.1007/s40313-019-00459-w
