ISCA Archive Eurospeech 1993

Real-time, neural network-based, French alphabet recognition with telephone speech

P. Schmid, Ronald Cole, M. Fanty, Hervé Bourlard, M. Haessen

We describe a real-time, speaker-independent French alphabet recognizer that performs with sufficient accuracy for commercial use. The system (a) digitizes a sequence of letters separated by brief pauses and computes a RASTA-PLP spectral representation, the zero-crossing rate, and peak-to-peak amplitudes of the waveform; (b) uses a neural network to assign 23 phonetic category labels to successive time frames; (c) performs an initial segmentation of the speech by mapping the phonetic label scores for each frame to pronunciation models for each letter using a modified Viterbi search; (d) performs a second classification of each hypothesized letter using the segment boundaries provided by the first-pass segmentation, producing a set of 26 letter scores plus a score for the category "Not-A-Letter"; and (e) uses the letter scores (plus the score for the category "Not-A-Letter") to identify the spelled word from a database. The system has been evaluated on calls that were not used for training either network. It achieved 84.4% first-choice letter recognition accuracy on the test set. It has also been evaluated on 84 spelled names from different callers, where it correctly recognized 92.8% of the names, all of which were contained in a database of 50,000 names. The final system has been optimized to run in real time on a PC board based on a single TMS320C30 DSP. The two passes described above are performed in real time by the DSP, while the name search (up to 50,000 names) is performed by the PC as letters are recognized.
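
To make step (e) concrete, the following is a minimal sketch of how per-letter scores could be matched against a name list. The abstract does not describe the actual search, so everything here is an assumption for illustration: the function names `score_name` and `best_names`, the log-score sum, the `floor` value, the restriction to names whose length equals the number of recognized letters, and the toy scores and names. The "Not-A-Letter" score is ignored in this sketch.

```python
import math

def score_name(name, letter_scores, floor=1e-6):
    """Sum the log scores of a candidate name's letters, position by position.

    letter_scores is a list with one dict per recognized segment, mapping
    a letter to its classifier score. Letters missing from a dict receive
    the (assumed) floor score. Names of the wrong length are rejected.
    """
    if len(name) != len(letter_scores):
        return float("-inf")
    total = 0.0
    for letter, scores in zip(name, letter_scores):
        total += math.log(max(scores.get(letter, 0.0), floor))
    return total

def best_names(letter_scores, database, top_n=3):
    """Return the top_n database names ranked by score_name (best first)."""
    ranked = sorted(database,
                    key=lambda name: score_name(name, letter_scores),
                    reverse=True)
    return ranked[:top_n]

# Toy input: the first letter is confused between B, P and D, so the
# first-choice letter string "BAUL" is not a name in the list.
toy_scores = [
    {"B": 0.5, "P": 0.4, "D": 0.1},
    {"A": 0.9, "K": 0.1},
    {"U": 0.8, "Q": 0.2},
    {"L": 0.7, "M": 0.3},
]
toy_database = ["JEAN", "MARC", "PAUL", "RAUL", "HERVE"]

first_choice = "".join(max(s, key=s.get) for s in toy_scores)
candidates = best_names(toy_scores, toy_database)
```

In this toy case the name list resolves the ambiguous first letter, which illustrates why name-level accuracy can exceed first-choice letter accuracy when the spelled word is known to be in the database.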


doi: 10.21437/Eurospeech.1993-393

Cite as: Schmid, P., Cole, R., Fanty, M., Bourlard, H., Haessen, M. (1993) Real-time, neural network-based, French alphabet recognition with telephone speech. Proc. 3rd European Conference on Speech Communication and Technology (Eurospeech 1993), 1723-1726, doi: 10.21437/Eurospeech.1993-393

@inproceedings{schmid93_eurospeech,
  author={P. Schmid and Ronald Cole and M. Fanty and Hervé Bourlard and M. Haessen},
  title={{Real-time, neural network-based, French alphabet recognition with telephone speech}},
  year=1993,
  booktitle={Proc. 3rd European Conference on Speech Communication and Technology (Eurospeech 1993)},
  pages={1723--1726},
  doi={10.21437/Eurospeech.1993-393}
}