We present a multimodal open-set speaker identification system that integrates information from audio, face, and lip-motion modalities. To fuse the modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers, whose order is adaptively determined from the reliability of each modality combination. We also propose a novel reliability measure, genuinely suited to the open-set speaker identification problem, to assess a classifier's accept-or-reject decisions. The proposed adaptive rule is more robust in the presence of unreliable modalities, and it outperforms the hard-level max rule and the soft-level weighted summation rule, provided that the employed reliability measure assesses classifier decisions effectively. Experimental results supporting this assertion are provided.
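The adaptive cascade idea in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the classifier names, the `estimate_reliability` helper, and the fixed accept threshold are all hypothetical stand-ins for the paper's modality-combination reliability measure and accept/reject test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class ModalityClassifier:
    # One classifier per modality combination (e.g. audio, face, audio+lip).
    name: str
    # Maps an observation to (candidate speaker id, confidence score).
    classify: Callable[[dict], Tuple[str, float]]

def estimate_reliability(name: str, observation: dict) -> float:
    # Placeholder reliability measure: the paper proposes a measure tailored
    # to open-set identification; here we simply read a precomputed
    # per-modality quality value attached to the observation.
    return observation.get("quality", {}).get(name, 0.0)

def adaptive_cascade(classifiers: List[ModalityClassifier],
                     observation: dict,
                     accept_threshold: float = 0.5) -> Optional[str]:
    # Adaptively order the cascade: most reliable modality combination first.
    ordered = sorted(classifiers,
                     key=lambda c: estimate_reliability(c.name, observation),
                     reverse=True)
    # Walk the cascade; stop at the first accepted decision.
    for clf in ordered:
        speaker, confidence = clf.classify(observation)
        if confidence >= accept_threshold:
            return speaker  # accept: speaker identified
    return None  # every classifier rejected: open-set reject (unknown speaker)
```

For example, if the audio channel is judged more reliable than the face channel for a given utterance, the audio classifier runs first and, if its decision is accepted, the less reliable modality is never consulted.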
Cite as: Erzin, E., Yemez, Y., Tekalp, A.M. (2004) Adaptive classifier cascade for multimodal speaker identification. Proc. Interspeech 2004, 2493-2496, doi: 10.21437/Interspeech.2004-425
@inproceedings{erzin04_interspeech,
  author={Engin Erzin and Yucel Yemez and A. Murat Tekalp},
  title={{Adaptive classifier cascade for multimodal speaker identification}},
  year={2004},
  booktitle={Proc. Interspeech 2004},
  pages={2493--2496},
  doi={10.21437/Interspeech.2004-425}
}