This paper analyzes the ability of multilayer perceptron (MLP) frontends to perform speaker normalization. We find the phonetic context decision tree to be a useful tool for assessing the speaker normalization power of different frontends. We introduce a gender question into the training of the phonetic context decision tree and, after context clustering, count the resulting gender-specific models. We compare this count for the following frontends: (1) Bottle-Neck (BN) features with and without vocal tract length normalization (VTLN), (2) standard MFCCs, and (3) multiple stacked MFCC frames with linear discriminant analysis (LDA). We find the BN frontend to be even more effective than VTLN at reducing the number of gender questions. From this we conclude that a Bottle-Neck frontend is more effective for gender normalization. Combining VTLN and BN features further reduces the number of gender-specific models.
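The counting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The tree representation (`Leaf`, `Question`) and the helper `count_models` are hypothetical. The sketch also assumes that a clustered model counts as gender-specific when the path from the root to its leaf passes through at least one gender question.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Leaf:
    """A clustered context-dependent model (hypothetical representation)."""
    name: str


@dataclass
class Question:
    """An internal node asking a yes/no question about phonetic context or speaker gender."""
    text: str
    is_gender: bool
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Question]


def count_models(tree: Tree, under_gender: bool = False) -> Tuple[int, int]:
    """Return (total models, gender-specific models).

    Assumption: a leaf is gender-specific if any question on its
    root-to-leaf path is a gender question.
    """
    if isinstance(tree, Leaf):
        return 1, int(under_gender)
    flag = under_gender or tree.is_gender
    total_yes, gender_yes = count_models(tree.yes, flag)
    total_no, gender_no = count_models(tree.no, flag)
    return total_yes + total_no, gender_yes + gender_no


# Toy tree: one context split whose "yes" branch is further split by gender.
tree = Question(
    "left context is a vowel?", False,
    yes=Question("speaker is female?", True, Leaf("s1-f"), Leaf("s1-m")),
    no=Question("right context is a nasal?", False, Leaf("s2"), Leaf("s3")),
)
total, gender_specific = count_models(tree)
print(total, gender_specific)  # 4 2
```

Under this criterion, a frontend that normalizes gender well should yield a tree in which the clustering procedure rarely selects gender questions. The gender-specific count then stays small relative to the total number of models.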
Cite as: Schaaf, T., Metze, F. (2010) Analysis of gender normalization using MLP and VTLN features. Proc. Interspeech 2010, 306-309, doi: 10.21437/Interspeech.2010-117
@inproceedings{schaaf10_interspeech, author={Thomas Schaaf and Florian Metze}, title={{Analysis of gender normalization using MLP and VTLN features}}, year=2010, booktitle={Proc. Interspeech 2010}, pages={306--309}, doi={10.21437/Interspeech.2010-117} }