This paper describes the technical and system-building advances made to the Google Home multichannel speech recognition system, which was launched in November 2016. Technical advances include an adaptive dereverberation frontend, the use of neural network models that do multichannel processing jointly with acoustic modeling, and Grid-LSTMs to model frequency variations. On the system level, improvements include adapting the model using Google Home-specific data. We present results on a variety of multichannel sets. The combination of technical and system advances results in a relative word error rate (WER) reduction of 8–28% compared to the current production system.
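For readers unfamiliar with the "relative" qualifier in the reported WER reduction, the arithmetic can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `relative_wer_reduction` and the WER values used are hypothetical and chosen only to make the calculation easy to follow.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), NOT results from the paper:
# a baseline at 10.0% WER and an improved system at 8.0% WER.
reduction = relative_wer_reduction(baseline_wer=10.0, new_wer=8.0)
print(f"Relative WER reduction: {reduction:.1f}%")
```

With these assumed numbers, the absolute reduction is 2 percentage points while the relative reduction is 20%, which is why the two figures should not be confused when comparing systems.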
Cite as: Li, B., Sainath, T.N., Narayanan, A., Caroselli, J., Bacchiani, M., Misra, A., Shafran, I., Sak, H., Pundak, G., Chin, K., Sim, K.C., Weiss, R.J., Wilson, K.W., Variani, E., Kim, C., Siohan, O., Weintraub, M., McDermott, E., Rose, R., Shannon, M. (2017) Acoustic Modeling for Google Home. Proc. Interspeech 2017, 399-403, doi: 10.21437/Interspeech.2017-234
@inproceedings{li17c_interspeech,
  author={Bo Li and Tara N. Sainath and Arun Narayanan and Joe Caroselli and Michiel Bacchiani and Ananya Misra and Izhak Shafran and Haşim Sak and Golan Pundak and Kean Chin and Khe Chai Sim and Ron J. Weiss and Kevin W. Wilson and Ehsan Variani and Chanwoo Kim and Olivier Siohan and Mitchel Weintraub and Erik McDermott and Richard Rose and Matt Shannon},
  title={{Acoustic Modeling for Google Home}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={399--403},
  doi={10.21437/Interspeech.2017-234}
}