Analysis of gender normalization using MLP and VTLN features

Schaaf, Thomas; Metze, Florian

doi:10.21437/Interspeech.2010-117

This paper analyzes the capability of multilayer perceptron (MLP) frontends to perform speaker normalization. We find the context decision tree to be a very useful tool for assessing the speaker normalization power of different frontends. We introduce a gender question into the training of the phonetic context decision tree and, after context clustering, count the gender-specific models. We compare this count across the following frontends: (1) Bottle-Neck (BN) with and without vocal tract length normalization (VTLN), (2) standard MFCC, and (3) stacking of multiple MFCC frames with linear discriminant analysis (LDA). We find the BN frontend to be even more effective than VTLN in reducing the number of gender questions. From this we conclude that a Bottle-Neck frontend is more effective than VTLN for gender normalization. Combining VTLN and BN features reduces the number of gender-specific models further.
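The counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the question strings, and the toy tree are hypothetical, and a leaf is treated here as "gender specific" if at least one gender question lies on its path from the root, which is one plausible reading of the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary decision-tree node; question=None marks a leaf (one clustered model)."""
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def count_models(node: Node, under_gender: bool = False) -> Tuple[int, int]:
    """Return (total leaves, leaves reached through at least one gender question)."""
    if node.question is None:
        return 1, int(under_gender)
    # Assumption: gender questions are identified by a "gender" prefix in the question string.
    flag = under_gender or node.question.startswith("gender")
    total_yes, gender_yes = count_models(node.yes, flag)
    total_no, gender_no = count_models(node.no, flag)
    return total_yes + total_no, gender_yes + gender_no


# Toy tree with invented phonetic and gender questions, for illustration only.
toy_tree = Node(
    "left=vowel",
    yes=Node("gender=female", yes=Node(), no=Node()),
    no=Node(
        "right=nasal",
        yes=Node(),
        no=Node("gender=female", yes=Node(), no=Node()),
    ),
)

total, gender_specific = count_models(toy_tree)
print(f"{gender_specific} of {total} models are gender specific")
```

Under this reading, a frontend that normalizes gender well should cause the clustering to select fewer gender questions, so the second count shrinks relative to the first.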

Cite as: Schaaf, T., Metze, F. (2010) Analysis of gender normalization using MLP and VTLN features. Proc. Interspeech 2010, 306-309, doi: 10.21437/Interspeech.2010-117

@inproceedings{schaaf10_interspeech,
  author={Thomas Schaaf and Florian Metze},
  title={{Analysis of gender normalization using MLP and VTLN features}},
  year=2010,
  booktitle={Proc. Interspeech 2010},
  pages={306--309},
  doi={10.21437/Interspeech.2010-117}
}