Abstract
Non-local means (NLM) adaptive filtering is a well-explored technique for denoising images and electrocardiogram signals. In NLM filtering, the signal value at a given sample point is estimated as a weighted average of the sample points in a search neighborhood. The NLM filter removes noise effectively when the samples within the search neighborhood are similar to one another. Because the vocal-tract system and the excitation source are time-varying, the magnitude and frequency content of a speech signal vary over time. Consequently, NLM filtering applied directly to speech is not effective in removing the noise components. The similarity among sample points can be improved by classifying the speech signal into categories according to its magnitude and frequency content. Vowels, semivowels and diphthongs are collectively termed vowel-like speech (VLS); in a given speech signal, VLS regions have higher magnitude than non-VLS regions. In this work, the noisy speech signal is first classified into VLS and non-VLS regions to improve similarity. The non-local similarity within the VLS regions and within the non-VLS regions is then exploited separately for effective speech enhancement through NLM filtering. The experimental results presented in this study show that the proposed approach provides better denoising performance than NLM filtering without speech classification and than recently reported speech enhancement methods. A hardware architecture for the proposed approach is also designed and prototyped on an FPGA.
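The weighted-average estimate and the idea of restricting it to samples of the same class can be illustrated with a minimal sketch. This is not the implementation used in this work: the function names, the parameters `patch_half`, `search_half` and `h`, the Gaussian patch-distance weight, and in particular the short-time energy threshold standing in for VLS detection are all assumptions chosen for illustration.

```python
import math
import random


def nlm_1d(x, patch_half=2, search_half=10, h=0.5, labels=None):
    """Non-local means estimate of a 1-D signal (illustrative sketch).

    Each sample is replaced by a weighted average of the samples in its
    search neighborhood; weights decay with the squared distance between
    the patches centred on the two samples. If `labels` is given, a sample
    is averaged only with samples carrying the same label.
    """
    n = len(x)
    patch_len = 2 * patch_half + 1
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - search_half), min(n, i + search_half + 1)):
            if labels is not None and labels[j] != labels[i]:
                continue  # skip samples from the other class
            d2 = 0.0
            for k in range(-patch_half, patch_half + 1):
                # Clamp indices so patches near the edges stay in range.
                a = x[min(max(i + k, 0), n - 1)]
                b = x[min(max(j + k, 0), n - 1)]
                d2 += (a - b) ** 2
            w = math.exp(-d2 / (2.0 * patch_len * h * h))
            num += w * x[j]
            den += w
        out.append(num / den)  # den >= 1 because j == i always contributes
    return out


def classify_by_energy(x, frame=8, ratio=0.5):
    """Hypothetical stand-in for VLS detection: label a sample 1 when its
    local mean energy is at least `ratio` times the maximum, else 0."""
    if not x:
        return []
    n = len(x)
    energies = []
    for i in range(n):
        lo, hi = max(0, i - frame), min(n, i + frame + 1)
        energies.append(sum(v * v for v in x[lo:hi]) / (hi - lo))
    threshold = ratio * max(energies)
    return [1 if e >= threshold else 0 for e in energies]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    # Toy signal: a low-magnitude segment followed by a high-magnitude one.
    clean = [0.05 * math.sin(0.9 * t) for t in range(100)]
    clean += [math.sin(0.2 * t) for t in range(100)]
    noisy = [s + random.gauss(0.0, 0.1) for s in clean]
    labels = classify_by_energy(noisy)
    print("noisy MSE:         ", mse(clean, noisy))
    print("plain NLM MSE:     ", mse(clean, nlm_1d(noisy)))
    print("class-wise NLM MSE:", mse(clean, nlm_1d(noisy, labels=labels)))
```

Passing `labels` keeps low-magnitude and high-magnitude samples from being averaged together, which is the intuition behind filtering the two classes separately. The toy signal and the parameter values are arbitrary, so the printed errors say nothing about the results reported in this study.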
Acknowledgements
This research work is a sub-module of the project “Development of Speech Based Person Authentication System on FPGA” under the SMDP-C2SD (9(I)/2014-MDD) program and is supported by the Ministry of Electronics and Information Technology (MeitY), Government of India.
Cite this article
Srinivas, N., Pradhan, G. & Kumar, P.K. A Classification-Based Non-local Means Adaptive Filtering for Speech Enhancement and Its FPGA Prototype. Circuits Syst Signal Process 39, 2489–2506 (2020). https://doi.org/10.1007/s00034-019-01267-y
DOI: https://doi.org/10.1007/s00034-019-01267-y