Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings contain specific internal relations between the signals. In this paper, we propose to capture these inter- and intra-structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities. The quaternion algebra replaces the standard dot product with the Hamilton product, thus offering a simple and elegant way to model dependencies between elements. The quaternion layers are then coupled with a recurrent neural network, which can learn long-term dependencies in the time domain. We show that a quaternion long short-term memory neural network (QLSTM), trained on the concatenated multi-channel speech signals, outperforms an equivalent real-valued LSTM on two different tasks of multi-channel distant speech recognition.
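To make the core algebraic idea concrete, the following is a minimal illustrative sketch (not the paper's code) of the Hamilton product that quaternion layers use in place of the standard dot product. A quaternion q = a + bi + cj + dk is stored as a 4-tuple (a, b, c, d); in a multi-channel setup, the four components could, for example, hold four microphone channels at a given time step (an assumption for illustration).

```python
def hamilton_product(q1, q2):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

# The product is non-commutative, e.g. i * j = k but j * i = -k:
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(hamilton_product(i, j))  # (0, 0, 0, 1)  == k
print(hamilton_product(j, i))  # (0, 0, 0, -1) == -k
```

Because a single quaternion weight mixes all four components of its input at once, a quaternion layer ties the four channels together with a quarter of the free parameters an equivalent real-valued layer would need for the same mixing.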
Cite as: Qiu, X., Parcollet, T., Ravanelli, M., Lane, N.D., Morchid, M. (2020) Quaternion Neural Networks for Multi-Channel Distant Speech Recognition. Proc. Interspeech 2020, 329-333, doi: 10.21437/Interspeech.2020-1682
@inproceedings{qiu20_interspeech,
  author={Xinchi Qiu and Titouan Parcollet and Mirco Ravanelli and Nicholas D. Lane and Mohamed Morchid},
  title={{Quaternion Neural Networks for Multi-Channel Distant Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={329--333},
  doi={10.21437/Interspeech.2020-1682}
}