ISCA Archive Interspeech 2019

Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit

Kaylah Lalonde

In audiovisual (AV) speech, correlations over time between visible mouth movements and the amplitude envelope of auditory speech help to reduce uncertainty as to when peaks in the auditory signal will occur. Previous studies demonstrated greater AV benefit to speech detection in noise for sentences with higher cross-modal correlations than for sentences with lower cross-modal correlations.

This study examined whether the mechanisms that underlie AV detection benefits have downstream effects on speech recognition in noise. Participants were presented with 72 sentences in noise, in auditory-only and AV conditions, at either their 50% auditory speech recognition threshold in noise (SRT-50) or at a signal-to-noise ratio (SNR) 6 dB poorer than their SRT-50. They were asked to repeat each sentence. Mean AV benefit across subjects was calculated for each sentence. Pearson correlations and mixed modeling were used to examine whether variability in AV benefit across sentences was related to natural variation in the degree of cross-modal correlation across sentences.
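The sentence-level Pearson analysis described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the author's analysis code. The abstract does not define the benefit metric, so AV benefit is assumed here to be the mean AV score minus the mean auditory-only score for each sentence. The function names (`pearson_r`, `mean_av_benefit`) are hypothetical, and all numbers are synthetic placeholders rather than data from the study.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def mean_av_benefit(av_scores, a_scores):
    """ASSUMED metric: mean AV score minus mean auditory-only score for one sentence."""
    return mean(av_scores) - mean(a_scores)


# Synthetic placeholder values, NOT data from the study.
# Each inner list holds proportion-correct scores for one sentence,
# one value per subject.
av = [[0.8, 0.6], [0.5, 0.7], [0.9, 0.9]]
a_only = [[0.4, 0.4], [0.4, 0.6], [0.3, 0.5]]

# Hypothetical per-sentence cross-modal correlation values
# (mouth movement vs. amplitude envelope), taken as given inputs here.
cross_modal_r = [0.35, 0.10, 0.50]

# One AV benefit value per sentence, then one correlation across sentences.
benefit = [mean_av_benefit(v, a) for v, a in zip(av, a_only)]
r = pearson_r(cross_modal_r, benefit)
print(f"r across sentences = {r:.3f}")
```

A positive `r` in this sketch would correspond to sentences with higher cross-modal correlation showing larger AV benefit. The mixed modeling mentioned in the abstract is not covered here.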

In the more difficult listening condition, higher cross-modal correlations were associated with higher AV sentence recognition benefit. The relationship was strongest in the 0.8–2.2 kHz and 0.8–6 kHz frequency regions. These results demonstrate that cross-modal correlations contribute to variability in AV speech recognition in noise.


doi: 10.21437/Interspeech.2019-2931

Cite as: Lalonde, K. (2019) Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit. Proc. Interspeech 2019, 2260-2264, doi: 10.21437/Interspeech.2019-2931

@inproceedings{lalonde19_interspeech,
  author={Kaylah Lalonde},
  title={{Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2260--2264},
  doi={10.21437/Interspeech.2019-2931}
}