Impaired speech perception in noise with a normal audiogram: No evidence for cochlear synaptopathy and no relation to lifetime noise exposure

In rodents, noise exposure can destroy synapses between inner hair cells and auditory nerve fibers (“cochlear synaptopathy”) without causing hair cell loss. Noise-induced cochlear synaptopathy usually leaves cochlear thresholds unaltered, but is associated with long-term reductions in auditory brainstem response (ABR) amplitudes at medium-to-high sound levels. This pathophysiology has been suggested to degrade speech perception in noise (SPiN), perhaps explaining why SPiN ability varies so widely among audiometrically normal humans. The present study is the first to test for evidence of cochlear synaptopathy in humans with significant SPiN impairment. Individuals were recruited on the basis of self-reported SPiN difficulties and normal pure tone audiometric thresholds. Performance on a listening task identified a subset with “verified” SPiN impairment. This group was matched with controls on the basis of age, sex, and audiometric thresholds up to 14 kHz. ABRs and envelope-following responses (EFRs) were recorded at high stimulus levels, yielding both raw amplitude measures and within-subject difference measures. Past exposure to high sound levels was assessed by detailed structured interview. Impaired SPiN was not associated with greater lifetime noise exposure, nor with any electrophysiological measure. It is conceivable that retrospective self-report cannot reliably capture noise exposure, and that ABRs and EFRs offer limited sensitivity to synaptopathy in humans. Nevertheless, the results do not support the notion that noise-induced synaptopathy is a significant etiology of SPiN impairment with normal audiometric thresholds. It may be that synaptopathy alone does not have significant perceptual consequences, or is not widespread in humans with normal audiograms.


Electrophysiological recording and analysis methods
Participants reclined with eyes closed in a double-walled, sound-attenuating booth. Auditory stimuli were presented via electromagnetically shielded ER3A insert earphones driven by an Avid FastTrack C400 audio interface. A BioSemi Active2 measurement system recorded from active electrodes at Cz, C7, and both mastoids. Common Mode Sense and Driven Right Leg electrodes were attached at mid-forehead, and electrode offsets remained within ±40 mV throughout all recordings. Data streams from all four electrodes were saved for offline analysis, along with stimulus-timing information received from the external audio interface via a custom-made trigger box.

Auditory brainstem response
Stimuli were designed to focus excitation on the characteristic frequencies typically affected by early noise-induced cochlear damage. Pulses of 100 µs duration were high-pass filtered (first-order Butterworth, 2.4 kHz cutoff) and delivered via ER3A inserts, yielding clicks whose 10 dB bandwidth extended from 1.2 to 4.7 kHz (as recorded in a GRAS IEC60711 occluded-ear simulator). The stimuli were delivered at a level of 102 dB peSPL, sufficient to elicit the half-octave basalward shift in the travelling wave (McFadden, 1986) and provide strong excitation of characteristic frequencies between approximately 2 and 7 kHz. Each ear received 7040 clicks at a rate of 7.05/second. Presentation alternated between ears, giving an overall presentation rate of 14.1/second and halving the recording time relative to testing each ear in turn. The inter-stimulus interval was jittered by up to 10% in order to prevent the accumulation of stationary interference.
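The timing arithmetic of the alternating-ear presentation can be illustrated with a minimal sketch. The click counts and rates are taken from the text; the uniform distribution of the jitter, its symmetric ±10% range, the left-ear-first ordering, and the function name `build_schedule` are assumptions made for illustration only and are not a description of the software actually used.

```python
import random

CLICKS_PER_EAR = 7040      # from the text
RATE_PER_EAR_HZ = 7.05     # from the text
JITTER_FRACTION = 0.10     # "up to 10%"; symmetric uniform jitter is an assumption


def build_schedule(clicks_per_ear=CLICKS_PER_EAR, rate_per_ear=RATE_PER_EAR_HZ,
                   jitter=JITTER_FRACTION, seed=0):
    """Return a list of (onset_seconds, ear) tuples with ears alternating."""
    rng = random.Random(seed)
    overall_rate = 2 * rate_per_ear        # 14.1 clicks/s across both ears
    nominal_interval = 1.0 / overall_rate  # roughly 70.9 ms between successive clicks
    schedule = []
    t = 0.0
    for i in range(2 * clicks_per_ear):
        ear = "left" if i % 2 == 0 else "right"  # starting ear is an assumption
        schedule.append((t, ear))
        # Assumed jitter: interval scaled by a uniform factor within +/-10%.
        t += nominal_interval * (1.0 + rng.uniform(-jitter, jitter))
    return schedule


schedule = build_schedule()

# Nominal recording time with interleaved ears, versus one ear after the other.
interleaved_s = 2 * CLICKS_PER_EAR / (2 * RATE_PER_EAR_HZ)  # about 999 s
sequential_s = 2 * CLICKS_PER_EAR / RATE_PER_EAR_HZ         # about 1997 s
```

Under these assumptions the interleaved recording lasts roughly 16.6 minutes, half the time needed to test the two ears consecutively at 7.05 clicks per second.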
Bioelectrical activity between Cz and ipsilateral mastoid was recorded at a sampling rate of 16384 Hz and divided into epochs extending from 10 ms pre-stimulus to 8 ms post-stimulus. Epochs whose activity exceeded the mean for the recording by more than two standard deviations were rejected. Those that remained were averaged and the resulting waveforms were filtered between 50 and 1500 Hz (fourth-order Butterworth) and corrected for any linear drift by subtracting a linear fit to the pre-stimulus baseline. Waves I and V were then quantified by a peak-picking algorithm that identified features in specified time windows. Wave I was defined as a maximum occurring 1.55-2.05 ms after stimulus peak; wave V as a maximum (or inflection point on a falling portion of the waveform) occurring 5.1-6.6 ms after stimulus peak; the trough of wave I as the lowest point occurring 0.3-1.0 ms after the peak of wave I. Wave I amplitude was measured from peak to trough; wave V amplitude from peak to pre-stimulus baseline. Post-hoc subjective review verified that the algorithm had appropriately interpreted all waveforms (presented in full on pages 7 and 8 of the supplementary material).
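The window-based peak picking described above might be sketched as follows. This is a simplified illustration, not the algorithm used in the study: the function names are hypothetical, the inflection-point fallback for wave V is omitted, the pre-stimulus baseline is taken as the mean of the pre-stimulus samples, and the waveform is a synthetic sum of Gaussian bumps in arbitrary units.

```python
import math

FS = 16384.0  # sampling rate in Hz, from the text


def ms_to_samples(t_ms, fs=FS):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(t_ms * 1e-3 * fs))


def pick_waves(waveform, stim_index, fs=FS):
    """Return wave I (peak-to-trough) and wave V (peak-to-baseline) measures."""
    # Wave I: maximum 1.55-2.05 ms after the stimulus peak.
    lo = stim_index + ms_to_samples(1.55, fs)
    hi = stim_index + ms_to_samples(2.05, fs)
    i_peak = max(range(lo, hi + 1), key=lambda k: waveform[k])
    # Trough of wave I: lowest point 0.3-1.0 ms after the wave I peak.
    lo = i_peak + ms_to_samples(0.3, fs)
    hi = i_peak + ms_to_samples(1.0, fs)
    i_trough = min(range(lo, hi + 1), key=lambda k: waveform[k])
    # Wave V: maximum 5.1-6.6 ms after the stimulus peak
    # (the inflection-point case is not handled in this sketch).
    lo = stim_index + ms_to_samples(5.1, fs)
    hi = stim_index + ms_to_samples(6.6, fs)
    v_peak = max(range(lo, hi + 1), key=lambda k: waveform[k])
    # Assumed baseline estimate: mean of all pre-stimulus samples.
    baseline = sum(waveform[:stim_index]) / stim_index
    return {
        "wave_i_amplitude": waveform[i_peak] - waveform[i_trough],
        "wave_i_latency_ms": (i_peak - stim_index) / fs * 1e3,
        "wave_v_amplitude": waveform[v_peak] - baseline,
        "wave_v_latency_ms": (v_peak - stim_index) / fs * 1e3,
    }


def gaussian(t_ms, centre_ms, amp, sigma_ms=0.2):
    return amp * math.exp(-0.5 * ((t_ms - centre_ms) / sigma_ms) ** 2)


# Synthetic averaged epoch: 10 ms pre-stimulus to 8 ms post-stimulus.
STIM_INDEX = ms_to_samples(10.0)
N_SAMPLES = ms_to_samples(18.0)
waveform = []
for k in range(N_SAMPLES):
    t_ms = (k - STIM_INDEX) / FS * 1e3
    waveform.append(gaussian(t_ms, 1.8, 0.3)     # wave I peak
                    + gaussian(t_ms, 2.4, -0.2)  # wave I trough
                    + gaussian(t_ms, 5.8, 0.5))  # wave V peak

result = pick_waves(waveform, STIM_INDEX)
```

On this synthetic waveform both amplitudes come out close to 0.5 (arbitrary units), with latencies near 1.8 ms and 5.8 ms.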

Envelope-following response
Stimuli were 75 dB SPL transposed tones (Bernstein and Trahiotis, 2002) with carrier frequency 4000 Hz and modulation rate 100 Hz. In order to attenuate off-frequency contributions, tones were presented concurrently with a notched-noise masker (bandwidth 20-10000 Hz, notch width 800 Hz), realized separately for each trial and applied at an SNR of 20 dB (broadband RMS). Stimulus duration was 400 ms with the addition of 15 ms onset and offset ramps. The duration of the inter-stimulus interval was 400 ms on average, jittered by up to 10%. Following the methods of Bharadwaj et al. (2015), tones were of two modulation depths: 0 dB (full modulation) and -6 dB (shallow modulation). Each of these tones was presented 1260 times, half in each polarity. The resulting four stimuli were interleaved throughout the recording, in the sequence 0 dB; 0 dB inverted; -6 dB; -6 dB inverted.
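The interleaving of the four stimulus types can be made concrete with a short sketch. The presentation counts and the cycle order are taken from the text; the tuple representation of a trial and the name `build_sequence` are illustrative assumptions.

```python
from collections import Counter
from itertools import cycle, islice

PRESENTATIONS_PER_DEPTH = 1260  # from the text; half in each polarity

# One cycle of the stated sequence, as (modulation depth in dB, polarity).
STIMULUS_CYCLE = [(0, +1), (0, -1), (-6, +1), (-6, -1)]


def build_sequence(per_depth=PRESENTATIONS_PER_DEPTH):
    """Repeat the four-stimulus cycle until every depth has been presented per_depth times."""
    n_trials = 2 * per_depth  # two modulation depths
    return list(islice(cycle(STIMULUS_CYCLE), n_trials))


sequence = build_sequence()
counts = Counter(sequence)  # presentations per (depth, polarity) combination
```

Each of the four depth-by-polarity combinations is therefore presented 630 times, for 2520 trials in total.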
Bioelectrical activity between Cz and C7 was extracted for epochs extending from 4 to 404 ms after the end of the stimulus onset ramp. For each stimulus modulation depth and polarity, epochs were rejected if their RMS activity exceeded the 99th percentile for the recording. The remaining epochs were averaged and the responses to opposing polarities summed, emphasizing the response to the temporal envelope. Each resulting EFR was subjected to a discrete Fourier transform to yield the response amplitude (at the 100 Hz modulation frequency) and an estimate of the noise floor (based on activity in 10 adjacent frequency bins).
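A minimal sketch of the spectral analysis follows, assuming that the 10 noise bins are the five on either side of the modulation-frequency bin and that the noise floor is their mean magnitude; the text does not specify either detail. The 2000 Hz sampling rate is a hypothetical value chosen to keep the example small, and the input waveforms are synthetic.

```python
import cmath
import math
import random

FS = 2000.0          # hypothetical sampling rate in Hz (not taken from the text)
DURATION_S = 0.4     # analysis epoch, 4 to 404 ms after the onset ramp
MOD_RATE_HZ = 100.0  # modulation rate, from the text
N = int(round(FS * DURATION_S))


def bin_amplitude(x, k):
    """Amplitude of DFT bin k, scaled so a unit-amplitude sinusoid yields 1."""
    n = len(x)
    acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
    return 2.0 * abs(acc) / n


def efr_amplitude_and_noise(avg_pos, avg_neg, fs=FS, f_mod=MOD_RATE_HZ,
                            n_noise_bins=10):
    """Sum opposite-polarity averages, then read the signal bin and its neighbours."""
    summed = [p + q for p, q in zip(avg_pos, avg_neg)]
    k_sig = int(round(f_mod * len(summed) / fs))
    half = n_noise_bins // 2
    # Assumed layout: five bins below and five bins above the signal bin.
    neighbours = [k_sig + d for d in range(-half, half + 1) if d != 0]
    signal = bin_amplitude(summed, k_sig)
    noise = sum(bin_amplitude(summed, k) for k in neighbours) / len(neighbours)
    return signal, noise


# Synthetic averages: the envelope component keeps its sign across polarities,
# whereas the polarity-dependent component inverts and cancels in the sum.
rng = random.Random(1)
t = [i / FS for i in range(N)]
envelope = [0.1 * math.sin(2 * math.pi * 100.0 * ti) for ti in t]
fine = [0.2 * math.sin(2 * math.pi * 300.0 * ti) for ti in t]
avg_pos = [e + f + rng.gauss(0.0, 0.05) for e, f in zip(envelope, fine)]
avg_neg = [e - f + rng.gauss(0.0, 0.05) for e, f in zip(envelope, fine)]

signal, noise = efr_amplitude_and_noise(avg_pos, avg_neg)
snr_db = 20.0 * math.log10(signal / noise)
```

With a 400 ms epoch the frequency resolution is 2.5 Hz, so the 100 Hz component falls exactly on bin 40 and the assumed noise estimate spans 87.5 to 112.5 Hz, excluding 100 Hz itself.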
Following Bharadwaj et al. (2015), we aimed to enhance sensitivity to cochlear synaptopathy by computing an EFR difference measure: the difference in response amplitude (in dB) at the two stimulus modulation depths. This measure is closely related to the "EFR slope" metric of Bharadwaj and colleagues, though based on a two-point rather than a three-point function. Such measures rest on the assumption that synaptopathy preferentially affects high-threshold auditory nerve (AN) fibers and should therefore preferentially degrade the encoding of stimuli with shallow modulations. A schematic illustration of the difference measure is provided in Fig. 1. Since it is possible that responses to both modulation depths might be impaired by synaptopathy, raw response amplitude was also analyzed.
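As a worked illustration, the difference measure can be computed from two linear response amplitudes as below. The direction of subtraction (full modulation minus shallow modulation) and the function name are assumptions; the text specifies only that the measure is a difference in dB.

```python
import math


def efr_difference_db(amp_full, amp_shallow):
    """Difference in EFR amplitude (dB) between 0 dB and -6 dB modulation depths.

    Inputs are linear amplitudes in the same units. The sign convention
    (full minus shallow) is an assumption made for this sketch.
    """
    return 20.0 * math.log10(amp_full) - 20.0 * math.log10(amp_shallow)


# Hypothetical amplitudes: the shallow-depth response is half the full-depth one.
example_db = efr_difference_db(2.0, 1.0)  # about 6.02 dB
```

Under this convention, a larger value indicates a steeper fall in response amplitude as modulation depth is reduced.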