Abstract
The performance of thresholding-based speech enhancement methods depends largely on the accuracy of the threshold estimate and on the choice of thresholding function. This paper presents a speech enhancement method in which a custom thresholding function is applied to the Wavelet Packet (WP) coefficients of the noisy speech. The function switches between modified hard and semisoft thresholding according to a parameter that characterizes the signal under consideration. The threshold is determined from a statistical model of the Teager energy operated WP coefficients of the noisy speech. Extensive simulations indicate that this threshold, used together with the custom thresholding function, is very effective at reducing not only white noise but also colored noise, yielding enhanced speech with better quality and intelligibility. Several standard objective measures and subjective evaluations, including informal listening tests, show that the proposed method outperforms recent state-of-the-art thresholding-based approaches to noisy speech enhancement across high to low SNR levels.
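As a rough illustration of the building blocks named in the abstract, the following Python sketch shows the textbook discrete Teager energy operator (Kaiser, 1993) together with generic hard and semisoft thresholding rules. It is not the custom thresholding function or the statistical threshold proposed in the paper, neither of which is specified in the abstract. The function names, the endpoint handling in `teager_energy`, the thresholds `t1` and `t2`, and the boolean `use_hard` switch are all assumptions made for illustration.

```python
def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Assumption: the two endpoints, where the operator is undefined,
    are copied from their nearest interior neighbour.
    """
    n = len(x)
    if n < 3:
        return [v * v for v in x]
    psi = [x[i] * x[i] - x[i - 1] * x[i + 1] for i in range(1, n - 1)]
    return [psi[0]] + psi + [psi[-1]]


def hard_threshold(c, t):
    """Keep the coefficient if its magnitude exceeds t, else zero it."""
    return c if abs(c) > t else 0.0


def semisoft_threshold(c, t1, t2):
    """Generic semisoft (firm) rule with 0 < t1 < t2.

    Zero below t1, identity above t2, linear interpolation in between.
    """
    a = abs(c)
    if a <= t1:
        return 0.0
    if a >= t2:
        return c
    sign = 1.0 if c > 0 else -1.0
    return sign * t2 * (a - t1) / (t2 - t1)


def switched_threshold(coeffs, t1, t2, use_hard):
    """Hypothetical switch between the two rules for a block of coefficients.

    `use_hard` stands in for the signal-dependent parameter mentioned in
    the abstract; how that parameter is computed is not modelled here.
    """
    if use_hard:
        return [hard_threshold(c, t1) for c in coeffs]
    return [semisoft_threshold(c, t1, t2) for c in coeffs]


if __name__ == "__main__":
    coeffs = [0.5, -1.5, 3.0]
    print(teager_energy([1.0, 2.0, 3.0, 4.0]))
    print(switched_threshold(coeffs, 1.0, 2.0, use_hard=False))
    print(switched_threshold(coeffs, 1.0, 2.0, use_hard=True))
```

In a full pipeline these rules would be applied per subband to the WP coefficients, with the threshold derived from the Teager energy operated coefficients rather than fixed by hand as in this sketch.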
Cite this article
Sanam, T.F., Shahnaz, C. Enhancement of noisy speech based on a custom thresholding function with a statistically determined threshold. Int J Speech Technol 15, 463–475 (2012). https://doi.org/10.1007/s10772-012-9144-6
DOI: https://doi.org/10.1007/s10772-012-9144-6