Single-channel speech separation using sparse non-negative matrix factorization

Schmidt, Mikkel N.; Olsson, Rasmus K.

doi:10.21437/Interspeech.2006-655

We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization algorithm, which learns sparse representations of the data in an unsupervised manner. It is used to learn personalized dictionaries from a speech corpus, and these dictionaries are in turn used to separate the audio stream into its components. We show that computational savings can be achieved by segmenting the training data at the phoneme level, using a conventional speech recognizer to split the data. Both the unsupervised and the supervised adaptation schemes yield significant improvements in the target-to-masker ratio.
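The dictionary-based separation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes multiplicative updates for a squared-error cost with an L1 penalty on the activations, plus a simple column normalization of the dictionary, which may differ from the update rule used in the paper. The names `sparse_nmf`, `separate`, and `sparsity` are hypothetical, and random non-negative matrices stand in for magnitude spectrograms of real speech.

```python
import numpy as np

EPS = 1e-9

def sparse_nmf(V, n_atoms, sparsity=0.1, n_iter=200, W=None, seed=0):
    """Approximate V by W @ H with an L1 penalty on H (assumed cost).

    If W is supplied it is held fixed and only H is estimated;
    n_atoms is ignored in that case.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], n_atoms)) + EPS
        W /= np.linalg.norm(W, axis=0, keepdims=True)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        # Multiplicative update for H; the sparsity term shrinks activations.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        if learn_W:
            W *= (V @ H.T) / (W @ (H @ H.T) + EPS)
            # Heuristic: unit-norm atoms, with H rescaled so W @ H is unchanged.
            norms = np.linalg.norm(W, axis=0, keepdims=True) + EPS
            W /= norms
            H *= norms.T
    return W, H

def separate(V_mix, dictionaries, sparsity=0.1, n_iter=200):
    """Explain the mixture with the concatenated, fixed dictionaries,
    then rebuild each source from its own atoms and activations."""
    W = np.hstack(dictionaries)
    _, H = sparse_nmf(V_mix, W.shape[1], sparsity, n_iter, W=W)
    estimates, start = [], 0
    for D in dictionaries:
        stop = start + D.shape[1]
        estimates.append(D @ H[start:stop])
        start = stop
    return estimates

# Synthetic stand-in data: each "speaker" has its own hidden dictionary
# and sparse activations (an assumption made only for this demo).
rng = np.random.default_rng(1)

def fake_speaker(n_frames):
    atoms = rng.random((32, 5))
    acts = rng.random((5, n_frames)) * (rng.random((5, n_frames)) < 0.2)
    return atoms, acts

atoms_a, acts_a = fake_speaker(200)
atoms_b, acts_b = fake_speaker(200)

# Learn one dictionary per speaker from that speaker's training data.
W_a, _ = sparse_nmf(atoms_a @ acts_a, n_atoms=5)
W_b, _ = sparse_nmf(atoms_b @ acts_b, n_atoms=5)

# Mix unseen frames from both speakers and separate them.
test_a = atoms_a @ (rng.random((5, 50)) * (rng.random((5, 50)) < 0.2))
test_b = atoms_b @ (rng.random((5, 50)) * (rng.random((5, 50)) < 0.2))
mix = test_a + test_b
est_a, est_b = separate(mix, [W_a, W_b])
```

In practice the inputs would be magnitude spectrograms, and the estimates would typically be turned into a time-frequency mask applied to the mixture before resynthesis; that step is omitted here.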

Cite as: Schmidt, M.N., Olsson, R.K. (2006) Single-channel speech separation using sparse non-negative matrix factorization. Proc. Interspeech 2006, paper 1652-Thu2FoP.10, doi: 10.21437/Interspeech.2006-655

@inproceedings{schmidt06_interspeech,
  author={Mikkel N. Schmidt and Rasmus K. Olsson},
  title={{Single-channel speech separation using sparse non-negative matrix factorization}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1652-Thu2FoP.10},
  doi={10.21437/Interspeech.2006-655}
}