Recognition of Multiple Speech Sources Using ICA

Robust Speech Recognition of Uncertain or Missing Data

Abstract

In meetings or noisy public places, several speakers are often active simultaneously, and the sources of interest must be separated from interfering speech in order to be recognized robustly. Independent component analysis (ICA) has proven to be a valuable tool for this purpose. However, under difficult environmental conditions, ICA outputs may still contain strong residual components of the interfering speakers. In such cases, time-frequency masking can be applied to the ICA outputs to reduce the remaining interference. To remain robust against the artifacts and loss of information that masking may cause, it helps to treat the processed speech feature vector as a random variable with time-varying uncertainty rather than as a deterministic quantity. This chapter shows ways of improving the recognition of multiple speech signals through nonlinear postprocessing applied together with uncertainty-based decoding techniques.

Author information

Corresponding author

Correspondence to Eugen Hoffmann.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hoffmann, E., Kolossa, D., Orglmeister, R. (2011). Recognition of Multiple Speech Sources Using ICA. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_12

  • DOI: https://doi.org/10.1007/978-3-642-21317-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering, Engineering (R0)
