Enhancement of noisy speech based on a custom thresholding function with a statistically determined threshold

International Journal of Speech Technology

Abstract

The performance of thresholding-based speech enhancement methods depends largely on an accurate estimate of the threshold value and on the choice of thresholding function. This paper presents a speech enhancement method in which a custom thresholding function is applied to the Wavelet Packet (WP) coefficients of the noisy speech. The thresholding function switches between modified hard and semisoft thresholding depending on a parameter that characterizes the signal under consideration. The threshold is determined from a statistical model of the Teager energy operated WP coefficients of the noisy speech. Extensive simulations indicate that this threshold, in conjunction with the custom thresholding function, is very effective in reducing not only white noise but also colored noise, resulting in enhanced speech with better quality and intelligibility. Several standard objective measures and subjective evaluations, including informal listening tests, show that the proposed method outperforms recent state-of-the-art thresholding-based approaches to noisy speech enhancement from high to low SNR levels.
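As a rough illustration of the building blocks named in the abstract, the sketch below shows the discrete Teager energy operator in Kaiser's form together with the textbook hard and semisoft (firm) thresholding rules. It is not the authors' method: the abstract does not give the modified functions, the switching parameter, or the statistical threshold, so the function names, the `use_semisoft` flag, and the thresholds `t1` and `t2` are all assumptions made for illustration.

```python
def teager_energy(x):
    """Discrete Teager energy operator (Kaiser's form):
    psi[n] = x[n]^2 - x[n-1] * x[n+1].
    The two endpoints are copied from their nearest interior neighbor."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples")
    psi = [x[i] * x[i] - x[i - 1] * x[i + 1] for i in range(1, n - 1)]
    return [psi[0]] + psi + [psi[-1]]


def hard_threshold(c, t):
    """Keep a coefficient unchanged if |c| > t, otherwise set it to zero."""
    return c if abs(c) > t else 0.0


def semisoft_threshold(c, t1, t2):
    """Semisoft (firm) thresholding with 0 < t1 < t2:
    zero at or below t1, identity above t2, linear ramp in between."""
    a = abs(c)
    if a <= t1:
        return 0.0
    if a > t2:
        return c
    sign = 1.0 if c > 0 else -1.0
    return sign * t2 * (a - t1) / (t2 - t1)


def custom_threshold(coeffs, t1, t2, use_semisoft):
    """Hypothetical switch between the two standard rules. The paper's
    actual switching parameter and modified functions are not specified
    in the abstract, so this flag is only a stand-in."""
    if use_semisoft:
        return [semisoft_threshold(c, t1, t2) for c in coeffs]
    return [hard_threshold(c, t1) for c in coeffs]
```

In a full pipeline these functions would be applied per subband to wavelet packet coefficients, with the thresholds derived from the Teager energy statistics rather than fixed by hand as in this sketch.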

Author information

Correspondence to Celia Shahnaz.

About this article

Cite this article

Sanam, T.F., Shahnaz, C. Enhancement of noisy speech based on a custom thresholding function with a statistically determined threshold. Int J Speech Technol 15, 463–475 (2012). https://doi.org/10.1007/s10772-012-9144-6

  • DOI: https://doi.org/10.1007/s10772-012-9144-6
