ISCA Archive ICSLP 2002

A flexible stream architecture for ASR using articulatory features

Florian Metze, Alex Waibel

Recently, speech recognition systems based on articulatory features such as "voicing" or the position of lips and tongue have gained interest, because they promise advantages with respect to robustness and permit new adaptation methods to compensate for channel, noise, and speaker variability. These approaches are also interesting from a general point of view, because their models use phonological and phonetic concepts, which allow for a richer description of a speech act than the sequence of HMM states used in the ASR architecture prevalent today. In this work, we present a multi-stream architecture in which CD-HMMs are supported by detectors for articulatory features, using a linear combination of log-likelihood scores. This multi-stream approach results in a 15% reduction of word error rate (WER) on a read Broadcast-News (BN) task and improves performance on a spontaneous scheduling task (ESST) by 7%. The proposed architecture potentially allows for new speaker and channel adaptation schemes, including stream asynchronicity.
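
The linear combination of log-likelihood scores mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combine_streams`, the array shapes, the example scores, and the stream weights are all assumptions chosen for illustration. Each HMM state receives one score from the standard acoustic model stream and one from each articulatory feature detector stream, and the combined score is their weighted sum.

```python
import numpy as np


def combine_streams(hmm_loglik, feature_logliks, weights):
    """Weighted sum of per-state log-likelihood scores across streams.

    hmm_loglik:      shape (n_states,), scores from the main HMM stream
    feature_logliks: shape (n_features, n_states), one row per
                     articulatory feature detector (hypothetical layout)
    weights:         shape (1 + n_features,), stream weights; the first
                     entry belongs to the HMM stream (assumed convention)
    """
    scores = np.vstack([hmm_loglik, feature_logliks])
    weights = np.asarray(weights, dtype=float)
    if weights.shape[0] != scores.shape[0]:
        raise ValueError("need exactly one weight per stream")
    # Linear combination in the log domain: sum_i w_i * log p_i(x | state)
    return weights @ scores


# Illustrative values only: two states, one HMM stream, two detectors.
hmm = np.array([-1.0, -2.0])
detectors = np.array([[-0.5, -3.0],
                      [-2.0, -0.1]])
stream_weights = [0.8, 0.1, 0.1]

combined = combine_streams(hmm, detectors, stream_weights)
best_state = int(np.argmax(combined))
```

In this sketch the weights sum to one, which is a convenient but assumed convention; the abstract does not say how the stream weights are chosen or normalized.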


doi: 10.21437/ICSLP.2002-583

Cite as: Metze, F., Waibel, A. (2002) A flexible stream architecture for ASR using articulatory features. Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002), 2133-2136, doi: 10.21437/ICSLP.2002-583

@inproceedings{metze02_icslp,
  author={Florian Metze and Alex Waibel},
  title={{A flexible stream architecture for ASR using articulatory features}},
  year=2002,
  booktitle={Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002)},
  pages={2133--2136},
  doi={10.21437/ICSLP.2002-583}
}