Supervised and unsupervised separation of convolutive speech mixtures using f0 and formant frequencies

International Journal of Speech Technology

Abstract

This paper examines the role of the fundamental frequency f0 and the formants F1, F2 and F3 of the speech signal in supervised and unsupervised source separation of real recorded convolutive speech mixtures. Supervised source separation, in which the sources are assumed to be known a priori, is discussed first, for three cases: (1) only the fundamental frequency f0, (2) only the formants F1, F2 and F3, and (3) both f0 and the formants F1, F2 and F3. The last case, which uses both f0 and the formants, gives the most accurate separation results and serves as the ideal case, or reference, against which the unsupervised results are compared. Unsupervised source separation, in which nothing is known about the sources a priori, is then discussed using (1) cross-correlation of the formants of different frames along with f0 and (2) the standard deviation of the magnitude of the frequency components in the F1, F2 and F3 regions of the spectrogram. The separation results obtained with both unsupervised methods are very close to the ideal case in supervised source separation. The results show that this method works better than some classical blind source separation algorithms, such as independent component analysis and non-negative matrix factorization, which work well only for instantaneous mixtures, where delay is neglected.
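
The abstract names two per-frame quantities, the fundamental frequency f0 and the standard deviation of spectral magnitude in the formant regions, without giving the procedure used to compute them. The sketch below is only an illustration of how such features might be computed for a single frame; it is not the authors' method. The autocorrelation-based pitch estimator, the function names, the search range of 60 to 400 Hz, and the three band edges standing in for the F1, F2 and F3 regions are all assumptions made for this example.

```python
import numpy as np


def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate f0 (Hz) of one frame from the peak of its autocorrelation.

    Assumption: a plain autocorrelation peak picker; the abstract does not
    say which pitch estimator the paper uses.
    """
    frame = frame - np.mean(frame)
    # Keep non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag


def band_magnitude_std(frame, fs, bands):
    """Standard deviation of the FFT magnitude inside each (low, high) band."""
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return [float(np.std(mag[(freqs >= lo) & (freqs < hi)])) for lo, hi in bands]


# Synthetic voiced-like frame: five harmonics of 125 Hz, 40 ms at 8 kHz.
fs = 8000
n = np.arange(320)
true_f0 = 125.0
frame = sum(np.sin(2 * np.pi * k * true_f0 * n / fs) / k for k in range(1, 6))

# Hypothetical band edges (Hz) standing in for the F1, F2 and F3 regions.
bands = [(300.0, 1000.0), (1000.0, 2500.0), (2500.0, 3500.0)]

f0_hat = estimate_f0(frame, fs)
spread = band_magnitude_std(frame, fs, bands)
print(f0_hat, spread)
```

On this synthetic frame the f0 estimate should land at or near 125 Hz, and the first band, which contains the harmonics at 375, 500 and 625 Hz, should show a much larger spread than the other two. Real speech would need voicing detection and frame-by-frame tracking, neither of which is attempted here.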

Author information

Corresponding author

Correspondence to M. K. Prasanna Kumar.

About this article

Cite this article

Prasanna Kumar, M.K., Kumaraswamy, R. Supervised and unsupervised separation of convolutive speech mixtures using f0 and formant frequencies. Int J Speech Technol 18, 649–662 (2015). https://doi.org/10.1007/s10772-015-9309-1

  • DOI: https://doi.org/10.1007/s10772-015-9309-1
