Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data

Wan, Vincent; Carmichael, James

doi:10.21437/Interspeech.2005-853

This paper describes a new formulation of a polynomial sequence kernel based on dynamic time warping (DTW) for support vector machine (SVM) classification of isolated words given very sparse training data. The words are uttered by dysarthric speakers who suffer from debilitating neurological conditions that make the collection of speech samples a time-consuming and low-yield process. Data for building dysarthric speech recognition engines are therefore limited. Simulations show that the SVM-based approach is significantly better than standard DTW and hidden Markov model (HMM) approaches when given sparse training data. When the models were constructed from three examples of each word, the SVM approach recorded a 45% lower error rate (relative) than the DTW approach and a 35% lower error rate than the HMM approach.
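The abstract does not give the kernel's formula, so the following is only a minimal sketch of one plausible way to build a polynomial kernel on top of a DTW alignment, not the formulation used in the paper. It assumes each utterance is a list of non-zero feature frames, scales every frame to unit length, aligns two utterances with standard DTW under a Euclidean local cost, averages the frame inner products along the alignment path, and raises that average to a power. The function names, the `degree` parameter, and the choice of local cost are all illustrative assumptions.

```python
import math


def unit_frames(seq):
    """Scale every frame (a list of floats) to unit Euclidean length.

    Assumption for this sketch: no frame is the zero vector.
    """
    out = []
    for frame in seq:
        norm = math.sqrt(sum(v * v for v in frame))
        out.append([v / norm for v in frame])
    return out


def dtw_path(x, y):
    """Return the minimum-cost DTW alignment as a list of (i, j) index pairs."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the final cell to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
        best = steps.index(min(steps))
        if best == 0:
            i, j = i - 1, j - 1
        elif best == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


def poly_dtw_kernel(x, y, degree=3):
    """Illustrative polynomial DTW kernel: (mean aligned inner product) ** degree."""
    xu, yu = unit_frames(x), unit_frames(y)
    path = dtw_path(xu, yu)
    sims = [sum(a * b for a, b in zip(xu[i], yu[j])) for i, j in path]
    return (sum(sims) / len(sims)) ** degree


def gram_matrix(seqs_a, seqs_b, degree=3):
    """Kernel values between every sequence in seqs_a and every one in seqs_b."""
    return [[poly_dtw_kernel(a, b, degree) for b in seqs_b] for a in seqs_a]
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Kernels derived from DTW alignments are not guaranteed to be positive semidefinite in general, so a sketch like this should be validated on real data before being relied on.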

Cite as: Wan, V., Carmichael, J. (2005) Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data. Proc. Interspeech 2005, 3321-3324, doi: 10.21437/Interspeech.2005-853

@inproceedings{wan05b_interspeech,
  author={Vincent Wan and James Carmichael},
  title={{Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={3321--3324},
  doi={10.21437/Interspeech.2005-853}
}