Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture

Srinivasarao, V.

doi:10.1007/s10772-023-10044-x

Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture

Published: 06 October 2023

Volume 26, pages 735–742, (2023)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

V. Srinivasarao¹

104 Accesses
Explore all metrics

Abstract

Speech enhancement is a process of improving the quality and intelligibility of the degraded speech signal. Various techniques have been demonstrated in the literature employing regression that subtract noise or predict clean signals. This research proposes a novel technique in speech signal analysis with enhancement using wavelet based denoising. Here the input is noisy speech signal and noise is removed using combined wavelet Fourier transform and train stacked encoder with non-negative matrix factorization integrated with convolutional ResNet architecture. Then used pairs of noisy as well as clean speech for training convolutional ResNet architecture at both input and output. The neural network was entirely retrained utilising combinations of wavelet-based denoising and clean speech, using outputs of wavelet denoising as its inputs. Segmental Signal-to-Noise Ratio and Perceptual Evaluation of Speech Quality are the parameters considered in the experimental study.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Study on Effectiveness of Deep Neural Networks for Speech Signal Enhancement in Comparison with Wiener Filtering Technique

Speech Enhancement Based on the Combination of Deep Learning and Wavelet Algorithm

Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN

Article 06 October 2021

Data availability

All the data generated or analysed during this study are included in the published article.

References

Abdulbaqi, J., Gu, Y., & Marsic, I. (2019). RHR-Net: A residual hourglass recurrent neural network for speech enhancement. arXiv preprint arXiv:1904.07294
Abdullah, S. M. S. A., Ameen, S. Y. A., Sadeeq, M. A., & Zeebaree, S. (2021). Multimodal emotion recognition using deep learning. Journal of Applied Science and Technology Trends, 2(02), 52–58.
Article Google Scholar
Abdullah, S., Zamani, M., & Demosthenous, A. (2022). A compact CNN-based speech enhancement with adaptive filter design using gabor function and region-aware convolution. IEEE Access, 10, 130657.
Article Google Scholar
Bie, X., Leglaive, S., Alameda-Pineda, X., & Girin, L. (2022). Unsupervised speech enhancement using dynamical variationalautoencoders. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 2993–3007.
Article Google Scholar
Braithwaite, D. T., & Kleijn, W. B. (2019). Speech enhancement with variance constrained autoencoders. In Interspeech (pp. 1831–1835).
Chaiani, M., Selouani, S. A., Boudraa, M., & Yakoub, M. S. (2022). Voice disorder classification using speech enhancement and deep learning models. Biocybernetics and Biomedical Engineering, 42(2), 463–480.
Article Google Scholar
Gutiérrez-Muñoz, M., & Coto-Jiménez, M. (2022). An experimental study on speech enhancement based on a combination of wavelets and deep learning. Computation, 10(6), 102.
Article Google Scholar
Han, C., Luo, Y., & Mesgarani, N. (2020). Real-time binaural speech separation with preserved spatial cues. In Proceedings of IEEE international conference on acoustics, speech, signal processing (ICASSP) (pp. 6404–6408). IEEE.
Huang, Z., Watanabe, S., Yang, S. W., García, P., & Khudanpur, S. (2022). Investigating self-supervised learning for speech enhancement and separation. In ICASSP 2022–2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6837–6841). IEEE.
Hwang, J. W., Park, R. H., & Park, H. M. (2021). Efficient audio-visual speech enhancement using deep U-Net with early fusion of audio and video information and RNN attention blocks. IEEE Access, 9, 137584–137598.
Article Google Scholar
Jabari, S., Rezaee, M., Fathollahi, F., & Zhang, Y. (2019). Multispectral change detection using multivariate Kullback-Leibler distance. ISPRS Journal of Photogrammetry and Remote Sensing, 147, 163–177.
Article Google Scholar
Karthik, A., & MazherIqbal, J. L. (2021). Efficient speech enhancement using recurrent convolution encoder and decoder. Wireless Personal Communications, 119(3), 1959–1973.
Article Google Scholar
Kim, H., Kang, K., & Shin, J. W. (2022). Factorized MVDR deep beamforming for multi-channel speech enhancement. IEEE Signal Processing Letters, 29, 1898–1902.
Article Google Scholar
Kim, H., & Shin, J. W. (2021). Target exaggeration for deep learning-based speech enhancement. Digital Signal Processing, 116, 103109.
Article Google Scholar
Kominek, J., & Black, A. W. (2004). The CMU Arctic speech databases. In Proceedings of the fifth ISCA workshop on speech synthesis, Vienna, Austria, 20–22 September 2004.
Le, X., Lei, T., Chen, K., & Lu, J. (2022). Inference skipping for more efficient real-time speech enhancement with parallel RNNs. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 2411–2421.
Article Google Scholar
Li, A., Zheng, C., Zhang, L., & Li, X. (2022). Glance and gaze: A collaborative learning framework for single-channel speech enhancement. Applied Acoustics, 187, 108499.
Article Google Scholar
Li, X. X., Li, D., Ren, W. X., & Zhang, J. S. (2022). Loosening identification of multi-bolt connections based on wavelet transform and ResNet-50 convolutional neural network. Sensors, 22(18), 6825.
Article Google Scholar
Lin, J., van Wijngaarden, A. J. D. L., Wang, K. C., & Smith, M. C. (2021b). Speech enhancement using multi-stage self-attentive temporal convolutional networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3440–3450.
Article Google Scholar
Lin, Y. C., Yu, C., Hsu, Y. T., Fu, S. W., Tsao, Y., & Kuo, T. W. (2021a). SEOFP-NET: Compression and acceleration of deep neural networks for speech enhancement using sign-exponent-only floating-points. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 1016–1031.
Article Google Scholar
Llombart, J., Ribas, D., Miguel, A., Vicente, L., Ortega, A., & Lleida, E. (2021). Progressive loss functions for speech enhancement with deep neural networks. EURASIP Journal on Audio, Speech, and Music Processing, 2021(1), 1–16.
Article Google Scholar
Michelsanti, D., Tan, Z. H., Zhang, S. X., Xu, Y., Yu, M., Yu, D., & Jensen, J. (2021). An overview of deep-learning-based audio-visual speech enhancement and separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 1368–1396.
Article Google Scholar
Passos, L. A., Khubaib, A., Raza, M., & Adeel, A. (2022). Multimodal speech enhancement using burst propagation. arXiv preprint arXiv:2209.03275.
Polyak, A., Wolf, L., Adi, Y., Kabeli, O., & Taigman, Y. (2021). High fidelity speech regeneration with application to speech enhancement. In Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP) (pp. 7143–7147). IEEE.
Rao, W., Fu, Y., Hu, Y., Xu, X., Jv, Y., Han, J., Jiang, Xie, L., Wang, Y., Watanabe, S., et al. (2021). Interspeech 2021 conferencing speech challenge: Towards far-field multi-channel speech enhancement for video conferencing. arXiv:2104.00960.
Reddy, C. K., Dubey, H., Koishida, K., Nair, A., Gopal, V., Cutler, R., Braun, S., Gamper, H., Aichner, R., & Srinivasan, S. (2021). Interspeech 2021 deep noise suppression challenge. In Interspeech.
Ribas, D., Miguel, A., Ortega, A., & Lleida, E. (2022). Wiener filter and deep neural networks: A well-balanced pair for speech enhancement. Applied Sciences, 12(18), 9000.
Article Google Scholar
Sun, K., & Zhang, X. (2021). UltraSE: Single-channel speech enhancement using ultrasound. In Proceedings of the 27th annual international conference on mobile computing and networking (pp. 160–173).
Toloosham, B., & Koishida, K. (2022). A training framework for stereo-aware speech enhancement using deep neural networks. In ICASSP 2022–2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6962–6966). IEEE.
Tolooshams, B., Giri, R., Song, A. H., Isik, U., & Krishnaswamy, A. (2020). Channel-attention dense u-net for multichannel speech enhancement. In Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP) (pp. 836–840). IEEE.
Wang, R., Chencho, An, S., Li, J., Li, L., Hao, H., & Liu, W. (2021). Deep residual network framework for structural health monitoring. Structural Health Monitoring, 20, 1443–1461.
Article Google Scholar
Yuliani, A. R., Amri, M. F., Suryawati, E., Ramdan, A., & Pardede, H. F. (2021). Speech enhancement using deep learning methods: A review. Jurnal Elektronikadan Telekomunikasi, 21(1), 19–26.
Article Google Scholar
Zhang, Q., Qian, X., Ni, Z., Nicolson, A., Ambikairajah, E., & Li, H. (2022). A time-frequency attention module for neural speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, 462.
Article Google Scholar
Zheng, C., Liu, W., Li, A., Ke, Y., & Li, X. (2022). Low-latency monaural speech enhancement with deep filter-bank equalizer. The Journal of the Acoustical Society of America, 151(5), 3291–3304.
Article Google Scholar

Download references

Author information

Authors and Affiliations

ECE Department, NRI Institute of Technology, Pothavarappadu, Andhra Pradesh, India
V. Srinivasarao

Authors

V. Srinivasarao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to V. Srinivasarao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Srinivasarao, V. Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture. Int J Speech Technol 26, 735–742 (2023). https://doi.org/10.1007/s10772-023-10044-x

Download citation

Received: 13 February 2023
Accepted: 25 August 2023
Published: 06 October 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s10772-023-10044-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture

Abstract

Access this article

Similar content being viewed by others

A Study on Effectiveness of Deep Neural Networks for Speech Signal Enhancement in Comparison with Wiener Filtering Technique

Speech Enhancement Based on the Combination of Deep Learning and Wavelet Algorithm

Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN

Data availability

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture

Abstract

Access this article

Similar content being viewed by others

A Study on Effectiveness of Deep Neural Networks for Speech Signal Enhancement in Comparison with Wiener Filtering Technique

Speech Enhancement Based on the Combination of Deep Learning and Wavelet Algorithm

Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN

Data availability

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation