Structural joint factor analysis for speaker recognition

Ferràs, Marc; Shinoda, Koichi; Furui, Sadaoki

doi:10.21437/Interspeech.2011-66

In recent years, adaptation techniques have received particular attention in speaker recognition. By separating speaker and session variation effects, Joint Factor Analysis (JFA) has established itself as a powerful adaptation framework and has become ubiquitous in recent NIST Speaker Recognition Evaluations (SRE). However, its global parameter sharing strategy is not necessarily optimal when only a small amount of adaptation data is available. In this paper, we address this issue with a regularization approach such as structural MAP. We introduce two variants of structural JFA (SJFA) that, depending on the amount of data, use coarser or finer parameter approximations in the adaptation process. One of these variants is shown to considerably outperform JFA. On the 2006 NIST SRE data, GMM-SVM systems using SJFA achieve relative EER gains of over 25% compared with systems using JFA.

Cite as: Ferràs, M., Shinoda, K., Furui, S. (2011) Structural joint factor analysis for speaker recognition. Proc. Interspeech 2011, 2373-2376, doi: 10.21437/Interspeech.2011-66

@inproceedings{ferras11_interspeech,
  author={Marc Ferràs and Koichi Shinoda and Sadaoki Furui},
  title={{Structural joint factor analysis for speaker recognition}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={2373--2376},
  doi={10.21437/Interspeech.2011-66}
}