
An optimized convolutional neural network for speech enhancement

Published in: International Journal of Speech Technology

Abstract

Speech enhancement is important today because many applications rely on voice recognition to perform their operations, and commands can be executed correctly only when the voice is recognized correctly. The speech signal must therefore be enhanced and freed from background noise before recognition. In an existing approach, a recurrent convolutional encoder–decoder is used to denoise the speech signal; it exploits the signal-to-noise ratio (SNR) to enhance the speech and removes noise effectively, achieving a low character error rate (CER). However, it does not specify the range of SNRs of the noise added to the signal. Hence, an optimized deep-learning model is proposed here to enhance the speech signal. Deep learning, a branch of artificial intelligence, mimics the human brain's ability to analyze data and discover patterns for decision-making. An optimized convolutional neural network (CNN) is proposed to enhance speech corrupted by noise at different SNR values. Particle swarm optimization (PSO) tunes the hyper-parameters of the CNN, with the objective of minimizing the character error rate of the signal. The proposed method is implemented in MATLAB R2020b and evaluated using the character error rate, PESQ, and STOI of the signal. The proposed and existing methods are then compared using these metrics at −5 dB, 0 dB, +5 dB, and +10 dB.
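The PSO-based hyper-parameter search described in the abstract can be sketched as follows. The paper's implementation uses MATLAB R2020b; this Python sketch is illustrative only. The objective `surrogate_cer` is a stand-in for the real character-error-rate evaluation (which would require training the CNN for each candidate), and the tuned hyper-parameters (learning rate, filter count), their bounds, and the swarm settings are assumptions for illustration, not values from the paper.

```python
import random

def pso(objective, bounds, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimize `objective` over box `bounds` with a basic particle swarm."""
    dim = len(bounds)
    # Initialize particle positions uniformly within bounds; velocities at zero.
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the global best found so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in objective: in the paper this would be the character error rate
# of the CNN trained with the candidate hyper-parameters. Here it is a
# smooth surrogate with a known minimum at lr = 0.01, filters = 64.
def surrogate_cer(params):
    lr, filters = params
    return (lr - 0.01) ** 2 + (filters - 64) ** 2 / 1e4

random.seed(0)  # reproducible run
best, best_cer = pso(surrogate_cer, bounds=[(1e-4, 0.1), (16, 128)])
```

The swarm weights (`w`, `c1`, `c2`) follow commonly used defaults for PSO; in a real hyper-parameter search each objective evaluation is a full train-and-score cycle, so the particle count and iteration budget would be chosen to fit the available compute.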



Author information


Contributions

Not applicable.

Corresponding author

Correspondence to J. L. Mazher Iqbal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Acknowledgements

Not applicable.

Ethical approval

This study did not involve human participants.

Human and animal rights

This research did not involve human participants or animals.

Funding

No funding was received for this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Karthik, A., Mazher Iqbal, J.L. An optimized convolutional neural network for speech enhancement. Int J Speech Technol 26, 1117–1129 (2023). https://doi.org/10.1007/s10772-023-10073-6

