ISCA Archive Interspeech 2022

Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network

Nan LI, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li, Jianwu Dang

The global signal-to-noise ratio (gSNR) is defined as the ratio of speech energy to noise energy over an entire noisy audio signal. However, as noise interference increases, generalization ability declines when traditional features (e.g., raw waveforms and MFCCs) are fed directly to a statistical model to estimate a single fullband gSNR. In this paper, we propose a multi-subband-based gSNR estimation network (MSGNet). Specifically, we split the noisy speech waveforms into Bark-scale subbands to obtain higher-resolution signals at the middle and low frequencies. Then, convolutional neural networks (CNNs) are used to learn a non-linear function that estimates the speech-to-noise energy ratio of each subband from the input multi-subband features. Finally, the fullband gSNR is calculated by integrating the subbands with their different speech and noise energies. Extensive experimental results on the AURORA-2J dataset demonstrate that the proposed MSGNet significantly reduces the mean absolute error compared to baseline gSNR estimation methods.
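
The gSNR definition and the subband integration step can be illustrated with a minimal sketch. This is not the MSGNet implementation: it uses oracle speech and noise signals in place of CNN estimates, replaces the Bark-scale filterbank with a simple FFT-bin grouping, and the band edges and function names (`global_snr_db`, `subband_energies`, `fullband_snr_from_subbands`) are assumptions chosen for illustration. It shows only that summing per-subband speech and noise energies recovers the fullband ratio when the subbands do not overlap and together cover the whole spectrum.

```python
import numpy as np

# Illustrative Bark-like band edges in Hz for 8 kHz audio.
# Assumption for this sketch; not the filterbank used in the paper.
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4000]


def global_snr_db(speech, noise):
    """gSNR in dB: total speech energy over total noise energy."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    return 10.0 * np.log10(np.sum(speech ** 2) / np.sum(noise ** 2))


def subband_energies(signal, sample_rate, band_edges_hz=BAND_EDGES_HZ):
    """Energy of a real signal in each band, computed from its one-sided FFT."""
    signal = np.asarray(signal, dtype=np.float64)
    n = signal.size
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # One-sided spectrum: double every bin except DC and, for even n, Nyquist,
    # so that the bin energies sum to the time-domain energy (Parseval).
    weights = np.full(spectrum.size, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0
    bin_energy = weights * np.abs(spectrum) ** 2 / n

    # Assign each FFT bin to a band; bins at or above the last edge
    # are folded into the last band.
    edges = np.asarray(band_edges_hz, dtype=np.float64)
    band_index = np.searchsorted(edges, freqs, side="right") - 1
    band_index = np.clip(band_index, 0, edges.size - 2)
    return np.bincount(band_index, weights=bin_energy, minlength=edges.size - 1)


def fullband_snr_from_subbands(speech_band_energy, noise_band_energy):
    """Integrate subbands: sum the energies first, then take the ratio in dB."""
    total_speech = np.sum(speech_band_energy)
    total_noise = np.sum(noise_band_energy)
    return 10.0 * np.log10(total_speech / total_noise)
```

Note that the integration sums energies before taking the ratio. Averaging the per-subband SNRs in dB would generally give a different value, because subbands with little energy would then count as much as dominant ones.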


doi: 10.21437/Interspeech.2022-154

Cite as: LI, N., Ge, M., Wang, L., Unoki, M., Li, S., Dang, J. (2022) Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network. Proc. Interspeech 2022, 361-365, doi: 10.21437/Interspeech.2022-154

@inproceedings{li22b_interspeech,
  author={Nan LI and Meng Ge and Longbiao Wang and Masashi Unoki and Sheng Li and Jianwu Dang},
  title={{Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={361--365},
  doi={10.21437/Interspeech.2022-154}
}