Inclusion of temporal information into features for speech recognition

Milner, Ben

doi:10.21437/ICSLP.1996-83

Conventional methods for incorporating temporal information into speech features apply regression to a series of successive cepstral vectors to generate differential cepstra, or apply a cosine transform to generate cepstral-time matrices. This paper aims to generalise these techniques such that a series of stacked cepstral vectors is multiplied by a temporal transform matrix to produce the final speech feature. This can be made to incorporate both static and dynamic speech information. Using this method, the coding of temporal information is not restricted to regression or cosine coefficients: any suitable transform may be used. Results are presented for a variety of transforms, such as Legendre, Karhunen-Loeve, Cosine and Rectangle, where it is shown that the transform-based techniques offer higher performance than the conventional differential cepstrum.
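
The stacking-and-transform idea described above can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the window width, the odd-width regression window, and the choice to apply the transform independently to each cepstral coefficient along the time axis are all assumptions made for illustration. The sketch shows that a regression (delta) feature and a cosine-based cepstral-time feature differ only in the rows of the temporal transform matrix.

```python
import numpy as np

def regression_transform(width):
    """Hypothetical helper: one-row transform giving the usual regression slope.

    Assumes an odd window width centred on the current frame.
    """
    half = width // 2
    t = np.arange(-half, half + 1)
    return (t / np.sum(t ** 2))[None, :]            # shape (1, width)

def dct_temporal_transform(width, n_coeffs):
    """Hypothetical helper: rows are unnormalised cosine (DCT-II) basis functions."""
    t = np.arange(width)
    k = np.arange(n_coeffs)[:, None]
    return np.cos(np.pi * k * (2 * t + 1) / (2 * width))  # shape (n_coeffs, width)

def temporal_features(cepstra, transform):
    """Stack successive cepstral vectors and multiply by a temporal transform.

    cepstra:   array of shape (n_frames, n_ceps)
    transform: array of shape (n_rows, width)
    Returns an array of shape (n_frames - width + 1, n_rows * n_ceps).
    """
    width = transform.shape[1]
    n_windows = cepstra.shape[0] - width + 1
    # Each window holds `width` successive cepstral vectors.
    stacked = np.stack([cepstra[i:i + width] for i in range(n_windows)])
    # Apply the transform along the time axis of every window.
    out = np.einsum("kw,fwc->fkc", transform, stacked)
    return out.reshape(n_windows, -1)

# Usage: a cepstral track that rises linearly in time.
ramp = np.arange(10)[:, None] * np.array([1.0, 2.0, 3.0])
delta = temporal_features(ramp, regression_transform(5))
cosine = temporal_features(ramp, dct_temporal_transform(5, 3))
```

In this sketch row 0 of the cosine transform sums the window and so carries static information, while the higher rows respond to change over time. A Legendre, Karhunen-Loeve or rectangular transform would be substituted by passing a different matrix to `temporal_features`.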

Cite as: Milner, B. (1996) Inclusion of temporal information into features for speech recognition. Proc. 4th International Conference on Spoken Language Processing (ICSLP 1996), 256-259, doi: 10.21437/ICSLP.1996-83

@inproceedings{milner96_icslp,
  author={Ben Milner},
  title={{Inclusion of temporal information into features for speech recognition}},
  year=1996,
  booktitle={Proc. 4th International Conference on Spoken Language Processing (ICSLP 1996)},
  pages={256--259},
  doi={10.21437/ICSLP.1996-83}
}