Abstract
In meetings or noisy public places, several speakers are often active simultaneously, and the sources of interest must be separated from interfering speech before they can be recognized robustly. Independent component analysis (ICA) has proven to be a valuable tool for this purpose. Under difficult environmental conditions, however, ICA outputs may still contain strong residual components of the interfering speakers. In such cases, time-frequency masking can be applied to the ICA outputs to reduce the remaining interference. To remain robust against the artifacts and loss of information that masking may introduce, it is helpful to treat the processed speech feature vector as a random variable with time-varying uncertainty rather than as a deterministic quantity. This chapter shows ways of improving the recognition of multiple speech signals by combining nonlinear postprocessing with uncertainty-based decoding techniques.
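As a rough illustration of the time-frequency masking step mentioned in the abstract, the following Python sketch applies a simple amplitude-comparison mask to two separated outputs. It is a minimal sketch under stated assumptions, not the chapter's method: the function name `binary_tf_mask`, the attenuation `floor` value, and the random toy spectrograms are all hypothetical and chosen only for illustration.

```python
import numpy as np

def binary_tf_mask(y_target, y_interf, floor=0.1):
    """Amplitude-comparison mask (illustrative assumption, not the chapter's mask).

    Returns 1.0 in time-frequency bins where the target output has the larger
    magnitude and `floor` elsewhere, so interfered bins are attenuated
    rather than set to zero.
    """
    dominant = np.abs(y_target) > np.abs(y_interf)
    return np.where(dominant, 1.0, floor)

# Toy complex "spectrograms" (frequency bins x frames) standing in for
# the short-time spectra of two ICA outputs.
rng = np.random.default_rng(0)
shape = (4, 5)
y1 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
y2 = rng.normal(size=shape) + 1j * rng.normal(size=shape)

mask1 = binary_tf_mask(y1, y2)   # mask for output 1
y1_masked = mask1 * y1           # attenuate bins dominated by output 2
```

Using a nonzero floor instead of a hard zero is one common way to limit masking artifacts. The attenuated bins are also where the estimate is least reliable, which is the kind of information an uncertainty-based decoder can exploit.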
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hoffmann, E., Kolossa, D., Orglmeister, R. (2011). Recognition of Multiple Speech Sources Using ICA. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_12
DOI: https://doi.org/10.1007/978-3-642-21317-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)