In audiovisual (AV) speech, correlations over time between visible
mouth movements and the amplitude envelope of auditory speech help
to reduce uncertainty as to when peaks in the auditory signal will
occur. Previous studies demonstrated greater AV benefit to speech detection
in noise for sentences with higher cross-modal correlations than for sentences
with lower cross-modal correlations.
This study examined
whether the mechanisms that underlie AV detection benefits have downstream
effects on speech recognition in noise. Participants were presented with
72 sentences in noise, in auditory-only and AV conditions, at either
their 50% auditory speech recognition threshold in noise (SRT-50) or
at a signal-to-noise ratio (SNR) 6 dB poorer than their SRT-50. They
were asked to repeat each sentence. Mean AV benefit across subjects
was calculated for each sentence. Pearson correlations and mixed modeling
were used to examine whether variability in AV benefit across sentences
was related to natural variation in the degree of cross-modal correlation
across sentences.
In the more difficult listening condition, higher cross-modal
correlations were associated with higher AV sentence recognition benefit.
The relationship was strongest in the 0.8–2.2 kHz and 0.8–6
kHz frequency regions. These results demonstrate that cross-modal correlations
contribute to variability in AV speech recognition in noise.
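As a rough illustration of the sentence-level analysis described above, the following sketch computes a mean AV benefit per sentence and its Pearson correlation with a per-sentence cross-modal correlation value. It is not the author's analysis code. The function names, the definition of AV benefit as a simple difference between mean AV and mean auditory-only scores, and all numeric values are assumptions made for illustration; the mixed modeling is not shown.

```python
from math import sqrt


def mean_av_benefit(av_scores, ao_scores):
    """Mean AV benefit for one sentence across subjects.

    Assumption: benefit is the mean AV score minus the mean
    auditory-only score (scores are proportions correct).
    """
    return sum(av_scores) / len(av_scores) - sum(ao_scores) / len(ao_scores)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: per-sentence scores from three subjects in each condition.
av = [[0.60, 0.70, 0.80], [0.50, 0.55, 0.60], [0.90, 0.80, 0.85]]
ao = [[0.40, 0.50, 0.60], [0.45, 0.50, 0.55], [0.50, 0.45, 0.55]]

# Hypothetical cross-modal correlation for each sentence
# (mouth movement vs. auditory amplitude envelope in one frequency band).
cross_modal = [0.35, 0.20, 0.55]

benefit = [mean_av_benefit(a, b) for a, b in zip(av, ao)]
print("AV benefit per sentence:", benefit)
print("Pearson r:", pearson_r(cross_modal, benefit))
```

In practice a statistics library would replace the hand-written `pearson_r` and would also provide a p-value.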
Cite as: Lalonde, K. (2019) Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit. Proc. Interspeech 2019, 2260-2264, doi: 10.21437/Interspeech.2019-2931
@inproceedings{lalonde19_interspeech,
  author={Kaylah Lalonde},
  title={{Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2260--2264},
  doi={10.21437/Interspeech.2019-2931}
}