To achieve much faster decoding, or much lower power consumption, we need to liberate speech recognition from the artificial constraints of its current software-only form and move the essential computations directly into silicon. Vast efficiencies are waiting to be unlocked in this application; we need the proper architecture to unlock them. We report results from a first-generation hardware architecture simulated at the bit level, and from a complete, working FPGA-based prototype. Simulation results show that rather modest hardware designs, running 10-20X slower than conventional processors, can already decode at 0.6 xRT on the standard 5K Wall Street Journal benchmark.
Cite as: Lin, E.C., Yu, K., Rutenbar, R.A., Chen, T. (2006) Moving speech recognition from software to silicon: the in silico vox project. Proc. Interspeech 2006, paper 1942-Thu1CaP.12, doi: 10.21437/Interspeech.2006-103
@inproceedings{lin06_interspeech,
  author={Edward C. Lin and Kai Yu and Rob A. Rutenbar and Tsuhan Chen},
  title={{Moving speech recognition from software to silicon: the in silico vox project}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1942-Thu1CaP.12},
  doi={10.21437/Interspeech.2006-103}
}