Deep neural networks have recently shown great promise for language recognition. In particular, the expected counts of clustered context-dependent phone states (senones) can serve as a simple but effective phonotactic system. This paper introduces multinomial i-vectors applied to senone counts and shows that they work better than current PCA approaches. In addition, we show that a new approach using a standard normal prior and MAP multinomial i-vector estimation further improves performance, particularly for shorter test durations. Finally, we present a reduced-complexity version of Newton's method to greatly accelerate multinomial i-vector extraction. Experimental results on the NIST LRE11 task show that this approach performs significantly better than top-performing acoustic and phonotactic systems from that evaluation.
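As a rough illustration of the MAP estimation idea described above, the following is a minimal sketch, not the authors' implementation. It assumes the usual multinomial subspace form, in which senone probabilities are a softmax of `m + T @ w`, and places a standard normal prior on the i-vector `w`. The names `m`, `T`, `counts`, and `map_multinomial_ivector` are hypothetical, chosen only for this example. It uses a plain full Newton update with a simple backtracking safeguard, not the reduced-complexity variant the paper presents.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax of a 1-D array."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def map_objective(w, counts, m, T):
    """Multinomial log-likelihood of the counts plus log N(0, I) prior (up to a constant)."""
    return counts @ log_softmax(m + T @ w) - 0.5 * (w @ w)


def map_multinomial_ivector(counts, m, T, n_iter=20, tol=1e-8):
    """MAP point estimate of w by Newton's method (illustrative sketch).

    counts : (K,) expected senone counts for one utterance
    m      : (K,) bias (log-domain offset), assumed already trained
    T      : (K, D) subspace matrix, assumed already trained
    """
    w = np.zeros(T.shape[1])          # start at the prior mean
    n = counts.sum()                  # total count mass in the utterance
    for _ in range(n_iter):
        p = np.exp(log_softmax(m + T @ w))
        grad = T.T @ (counts - n * p) - w
        if np.linalg.norm(grad) < tol:
            break
        # Negative Hessian: n * T^T (diag(p) - p p^T) T + I (positive definite).
        Tp = T.T @ p
        neg_hess = n * ((T.T * p) @ T - np.outer(Tp, Tp)) + np.eye(w.size)
        step = np.linalg.solve(neg_hess, grad)
        # Backtracking safeguard: halve the step while the objective clearly drops.
        f0 = map_objective(w, counts, m, T)
        t = 1.0
        while map_objective(w + t * step, counts, m, T) < f0 - 1e-9 and t > 1e-6:
            t *= 0.5
        w = w + t * step
    return w


# Toy usage with random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
K, D = 50, 5                                   # hypothetical sizes
m = rng.normal(size=K)
T = 0.3 * rng.normal(size=(K, D))
p_true = np.exp(log_softmax(m + T @ rng.normal(size=D)))
counts = rng.multinomial(200, p_true).astype(float)
w_hat = map_multinomial_ivector(counts, m, T)
```

In this sketch, an utterance with no counts yields the prior mean (the zero vector), and estimates move further from zero as the count mass grows. This is one way to see why a prior might help most for short test durations.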
Cite as: McCree, A., Garcia-Romero, D. (2015) DNN senone MAP multinomial i-vectors for phonotactic language recognition. Proc. Interspeech 2015, 394-397, doi: 10.21437/Interspeech.2015-162
@inproceedings{mccree15_interspeech,
  author={Alan McCree and Daniel Garcia-Romero},
  title={{DNN senone MAP multinomial i-vectors for phonotactic language recognition}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={394--397},
  doi={10.21437/Interspeech.2015-162}
}