Recognition of Multiple Speech Sources Using ICA

Hoffmann, Eugen; Kolossa, Dorothea; Orglmeister, Reinhold

doi:10.1007/978-3-642-21317-5_12



Abstract

In meetings or noisy public places, several speakers are often active simultaneously, and the sources of interest need to be separated from interfering speech in order to be robustly recognized. Independent component analysis (ICA) has proven to be a valuable tool for this purpose. However, under difficult environmental conditions, ICA outputs may still contain strong residual components of the interfering speakers. In such cases, time-frequency masking can be applied to the ICA outputs to reduce the remaining interference. To remain robust against the artifacts and loss of information that masking may introduce, a helpful strategy is to treat the processed speech feature vector as a random variable with time-varying uncertainty rather than as a deterministic value. This chapter shows ways to improve the recognition of multiple speech signals based on nonlinear postprocessing, applied together with uncertainty-based decoding techniques.
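The masking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes an amplitude-based binary mask, which is one common choice and not necessarily the mask used in the chapter. The function names `binary_masks` and `apply_mask`, the `threshold_db` parameter, and the toy data are hypothetical and introduced only for illustration. The inputs stand in for STFT-domain ICA outputs, and at least two outputs are assumed.

```python
# Hypothetical helpers: amplitude-based binary time-frequency masking.
# Each output is a list of frames; each frame is a list of complex
# frequency bins (a stand-in for the STFT of one ICA output).
# Assumes at least two outputs of identical shape.

def binary_masks(outputs, threshold_db=0.0):
    """Return one 0/1 mask per output.

    A bin is kept for output i only if its magnitude exceeds the
    magnitude of every other output by more than threshold_db decibels.
    """
    ratio = 10.0 ** (threshold_db / 20.0)
    n_src = len(outputs)
    n_frames = len(outputs[0])
    n_bins = len(outputs[0][0])
    masks = [[[0] * n_bins for _ in range(n_frames)] for _ in range(n_src)]
    for t in range(n_frames):
        for k in range(n_bins):
            mags = [abs(outputs[i][t][k]) for i in range(n_src)]
            for i in range(n_src):
                strongest_other = max(m for j, m in enumerate(mags) if j != i)
                if mags[i] > ratio * strongest_other:
                    masks[i][t][k] = 1
    return masks


def apply_mask(output, mask):
    """Zero the bins of one output wherever its mask is 0."""
    return [[y * m for y, m in zip(frame, mask_row)]
            for frame, mask_row in zip(output, mask)]


# Toy data: two outputs, two frames, three bins each.
y1 = [[1.0 + 0j, 0.2 + 0j, 0.5j], [0.9 + 0j, 0.1 + 0j, 0.3 + 0j]]
y2 = [[0.1 + 0j, 0.8 + 0j, 0.4j], [0.2 + 0j, 0.7 + 0j, 0.3 + 0j]]
m1, m2 = binary_masks([y1, y2])
masked_y1 = apply_mask(y1, m1)
```

In this sketch, a bin where both outputs have equal magnitude is discarded from both, which shows how masking can lose information. That loss is the motivation the abstract gives for pairing masking with uncertainty-based decoding.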



Author information

Authors and Affiliations

Electronics and Medical Signal Processing Group, TU Berlin, Einsteinufer 17, 10587, Berlin, Germany
Eugen Hoffmann, Dorothea Kolossa & Reinhold Orglmeister


Corresponding author

Correspondence to Eugen Hoffmann.

Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Universitätsstrasse 150, Bochum, 44801, Germany
Dorothea Kolossa
Dept. of Communications Engineering, University of Paderborn, Warburger Strasse 100, Paderborn, 33098, Germany
Reinhold Häb-Umbach

About this chapter

Cite this chapter

Hoffmann, E., Kolossa, D., Orglmeister, R. (2011). Recognition of Multiple Speech Sources Using ICA. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_12

DOI: https://doi.org/10.1007/978-3-642-21317-5_12
Published: 23 June 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)
