Introduction

Acoustic overexposure and aging are conventionally thought to cause hearing damage by damaging sensory cells (hair cells) in the cochlea leading to a decrease in hearing sensitivity1,2. Such loss of sensitivity is quantified using threshold audiometry, which is the foundation of current clinical diagnostics, audiological counseling, and patient management. Contrary to this view, recent animal data show substantial permanent damage to synapses and auditory afferents innervating the cochlea from noise exposure (NE)3, even in the absence of hair-cell damage. In normal aging, such primary cochlear neural degeneration is evident not only in animal models4, but also in analyses of post-mortem human temporal bones5. Insidiously, even an extreme degree of such cochlear synaptopathy (CS) is unlikely to affect thresholds on clinical audiograms6. Thus, the extent to which such “hidden” damage occurs in behaving humans and contributes to suprathreshold perceptual deficits (e.g., listening in restaurants amidst background noise) is unknown and hotly debated. The emergence of pharmacological treatments to restore synaptic connections in pre-clinical models7 further underscores the urgent need to resolve this foundational question and establish robust assays that can reveal CS non-invasively.

Given that most patients seeking audiological help struggle in listening environments with background noise, considerable effort has been directed to examine associations between risk factors for CS (i.e., NE history, or age), and suprathreshold hearing in noise. Such investigations have yielded mixed results8,9,10,11,12, perhaps because they have been hampered by multiple sources of variability13. First, individuals with a history of greater NE and more advanced age tend to also have greater audiometric threshold elevations making it difficult to accurately estimate the effects attributable to CS. Second, NE history and the history of hearing protection use are difficult to assess retroactively; moreover, cumulative NE levels tend to correlate with age rendering it difficult to disentangle their respective effects. Finally, while non-invasive assays correlated with CS have been successful in certain mouse strains 14,15, whether such assays are sensitive when there are genetic and experiential factors introducing considerable extraneous variance is yet to be established. In the present study, many of these impediments were mitigated by (1) employing two distinct non-invasive assays that were specifically designed to reduce the variability from commonly occurring extraneous factors13, and (2) using parallel measurements in two species, a “wild-type” chinchilla model of cochlear synaptopathy with minimal hair-cell loss16, and human groups with substantially different risk for synaptopathy but tightly matched clinical audiograms. The choice of the chinchilla model for this study was motivated by a confluence of many factors that make it a valuable model for hearing-science in general, and studies of noise-induced hearing loss in particular17. Chinchillas and humans have a similar frequency range of sensitivity18. Non-invasive measures provide readily interpretable assays in chinchillas; e.g., thresholds measured using otoacoustic emissions and auditory brainstem responses predict behavioral19 and single-neuron (auditory-nerve) thresholds20. Further, middle-ear muscle reflexes, which are a focus of the present study, are robustly measurable in chinchillas and known to show similar dependence on sound level and duration as humans21,22. Finally, the use of a wild-type model helps mirror the individual variations both in susceptibility to noise damage23, and in extraneous contributions (e.g., anatomical factors) to non-invasive measures13.

Results and discussion

To examine the integrity of the afferent cochlear nerve, two primary pathways driven by the same auditory afferents were evaluated. The first assay measured the auditory brainstem response (ABR) wave-I amplitude in response to high-pass (HP) clicks (3–8 kHz, for humans) or high-frequency tone-bursts (4 and 8 kHz, for chinchillas), normalized by the wave-V amplitude. The human ABR was acquired using ear-canal “tiptrodes” and 32-channel scalp measurements. Note that although the stimuli used for ABR measurements are slightly different across humans and chinchillas, they target similar tonotopic sections of the early auditory pathway. The second assay was the wideband middle-ear muscle reflex (WB-MEMR) in response to broadband (BB, 0.5–8 kHz, both species) and HP (3–8 kHz, humans only) noise elicitors. These particular choices of assays and measurement parameters were guided by a series of prior experiments, and help reduce many sources of extraneous variance13. Specifically, because the ABR is a neural population response, the relative timing and synchrony of response components arising from high- and low-frequency sections of the cochlea can influence the overall ABR amplitude. Because cochlear response times are slower and show siginificant intersubject variability at low frequencies, it was previously argued that the use of high-pass clicks can help circumnavigate this extraneous variance and increase sensitivity to cochlear synaptopathy13. Furthermore, the use of the normalized ABR wave I/V ratio can help reduce the effects of individual variations in head/brain anatomy13. Finally, a wideband, rather than a clinic-style single-frequency measurement of the MEMR can help reduce extraneous variance from individual variations in the spectral-profile of the reflex-drive immittance changes13. Mechanistically, the WB-MEMR is a particilarly attractive candidate assay for probing CS13, which is known to disproportionately affect nerve fibers with high thresholds and low spontaneous rates (low-SR)14. There is indirect evidence that the afferent limb of the MEMR circuit disproportionately weights such low-SR fibers, and that the circuit is driven by the summed output of such afferents24,25. This is in contrast to the ABR wave-1, which although reflective of population activity of cochlear neurons, seems to weigh the low-SR population little26. Thus, it is reasonable to expect that the WB-MEMR would be more sensitive than the ABR wave-1. In addition to these putative CS assays, distortion-product otoacoustic emissions (DPOAEs) were obtained with f2-primaries spanning 2–16 kHz to control for outer-hair-cell (OHC) damage. Behavioral audiometry from 250 Hz–16 kHz was also done in humans.

To establish the sensitivity of our assays in a genetically heterogeneous cohort, but with known lab-induced pathophysiology, we studied chinchillas exposed to moderate-level noise in a pre-post design (Fig. 1a). Awake chinchillas were exposed to 100-dB-SPL octave-band noise centered at 1 kHz for two hours, which produces histologically confirmed CS with less than 5 dB permanent threshold shift (Fig. 1b). Histological confirmation of synapse counts was conducted in a cohort of chinchillas separate from the animals used for measuring the effects of noise-exposure on the suprathreshold ABR and WB-MEMR. However, the noise exposure paradigm was identical in both groups of chinchillas. Furthermore, DPOAE magnitudes and ABR thresholds were used to ascertain that the cohort of chinchillas used for suprathreshold ABR and WB-MEMR measurements did not sustain threshold shifts two weeks after the exposure (i.e., no permanent threshold shifts).

Fig. 1: Exposing chinchillas to octave-band noise causes TTS and transient DPOAE reductions but sustained CS, reduced WB-MEMR and suprathreshold ABR.
figure 1

a Exposure and measurement timeline. b Confocal imaging reveals broad CS following noise-exposure (NE), octave-band centered at 1 kHz indicated in gray. c Reduced DPOAE amplitudes 1-day post-NE, but full recovery by 2 weeks. d ABR thresholds at 2-weeks post-NE are within 5 dB of pre-NE levels. e Large (>50%) sustained reduction in WB-MEMR amplitudes and increased thresholds. f Reduced ABR wave-I/V ratio at 2-weeks post. All datapoints are estimated mean ± STE bars (N = 7). Measurements from individual subjects are shown as transparent lines color coded by whether the values were obtained pre-, 1-day post-, or 2-weeks post-NE (a–d; Left panel of e), or as dotted gray lines (Right panel of e; f). Underlying data are archived on Zenodo (https://doi.org/10.5281/zenodo.6672827)61.

Although histological analysis confirms CS without hair-cell loss in our chinchilla model, similarly to noise-induce primary neural degeneration in mouse models3, the cochleatopic pattern of damage is slightly different from the typical cochleotopic profiles of CS induced in mouse models3. Here, the CS profile obtained using this exposure paradigm spans 1–10 kHz cochlear places (Fig. 1b), replicating a previous chinchilla study16. In mouse models that used high-frequency noise exposures, no synapse loss is observed in tonotopic regions that are tuned to the noise band; instead, damage is most pronounced in sections tuned to a frequency of about an octave and a half higher. This difference likely arises from the fact that the low-frequency NE (centered around 1 kHz) used in the present study would maximally excite apical sections of the cochlea, whereas the exposures used in prevailing mouse models would maximally excite basal sections. Cochlear mechanical studies suggest that the response properties of the base and apex can be fundamentally different in many respects27. It is likely that such apex-base differences, rather than species differences, lead to different cochleotopic pattern of noise-induced CS in the present study compared to established mouse models. Indeed, in previous experiments of noise-induced hair-cell loss (rather than just CS) where low-frequency exposures have been employed, the cochleotopic pattern of damage is comparable to the present study in that hair-cell loss is observed both in the apex and the base28,29.

Consistent with the lack of permanent OHC loss in response to our NE paradigm, DPOAEs were reduced in magnitude one-day post-NE but recover to pre-exposure levels two-weeks post-NE, thus revealing only a temporary threshold shift (TTS; Fig. 1c). ABR threshold shifts at two-weeks post-NE were also<5 dB (Fig. 1d), confirming TTS. In contrast, the WB-MEMR (Fig. 1e) showed a large (>50%) reduction in amplitudes and elevation in thresholds that did not recover even two-weeks post-NE (F(11, 120) = 6.74, P = 1e-8, N = 7). This suggests that the WB-MEMR is highly sensitive to CS even in a genetically heterogeneous cohort of animals despite likely including many extraneous sources of variance. As a second assay, the magnitude ratio of the ABR wave-I-to-wave-V ratio was computed for high-amplitude 4-kHz and 8-kHz tone bursts to mirror our human HP click protocol13. This assay also showed a reduction post-NE that did not recover at two weeks, but not to a statistically significant degree by conventional criteria (Fig. 1f). Taken together, these results show that while both assays may be sensitive to CS, the WB-MEMR shows greater promise in the presence of genetic heterogeneity.

While it is well established that multiple rodent models are susceptible to CS, whether humans show the same vulnerabilities is still contested30,31. To investigate whether humans show evidence of CS, we studied three groups with varying risk in a cross-sectional design (N = 166 individuals total). The control group (“YCtrl”, N = 55) comprised of young (18–35 years of age) individuals. A middle-aged cohort (“MA”, 36–60 years of age, N = 58) formed our first high-risk group, by virtue of their age. A second high-risk cohort (N = 53) of young individuals (aged 18–35 years) with regular and substantial acoustic exposures (“YExp”) was recruited from the Purdue University marching band and shooting clubs. All groups had clinically normal audiograms (thresholds better than 25 dB HL up to 8 kHz), and crucially, were matched (Supplementary Fig. 1a) in both mean and median threshold within<5 dB at every audiometric frequency in the standard clinical range (0.25–8 kHz). Furthermore, the YCtrl and the YExp groups were matched in mean and median thresholds (<5 dB) even in the extended-high-frequency range (8–16 kHz). This design helps dissociate effects of audiometric loss and the effects of cochlear synaptopathy. In a conservative addition to this design, the effects of residual individual variation in audiometric thresholds were explicitly accounted for during analysis through a linear mixed-effects modeling approach.

Consistent with the audiometric data, DPOAE amplitudes were similar across the three groups in the clinical frequency range with the MA group showing some reductions near and beyond 8 kHz (Fig. 2a). Despite groups being tightly matched in clinical hearing status, the WB-MEMR data revealed significant group differences. With the BB noise elicitor (0.5–8 kHz), the MA group showed significantly reduced MEMR growth functions compared to the YCtrl group when controlling either for the audiometric variations (F(10, 939) = 2.7, P = 0.0028) or for DPOAE amplitudes (F(10, 941) = 2.8, P = 0.0020), whereas the YExp group did not (Fig. 2b). With the HP noise elicitor (3–8 kHz), both high-risk groups showed significantly attenuated MEMR growth function (Fig. 2d), when controlling for either audiometric variations (YCtrl vs. YExp: F(10, 896) = 3.71, P = 7e-5; YCtrl vs. MA: F(10, 939) = 8.57, P = 1.7e-13), or for DPOAE amplitudes (YCtrl vs. YExp: F(10, 898) = 3.84, P = 4.2e-5; YCtrl vs. MA: F(10, 941) = 8.91, P = 4.2e-14). These results suggest substantial CS in the MA group spanning a broad frequency range, similar to our chinchilla model and to previous indications from human post-mortem data5. On the other hand, the results in the YExp group suggest a lesser degree of CS that is localized to the basal regions of the cochlea. These results are corroborated by the findings from our second assay where the ABR wave-I/V ratio for HP (3–8 kHz) clicks is attenuated in both high-risk groups, but more so in the MA group (Fig. 2c) after adjusting for audiometric variations (YCtrl vs. YExp: T(1, 155) = 1.89, P = 0.03; YCtrl vs. MA: T(1,155) = 3.794, P = 0.0001), or for DPOAE amplitudes (YCtrl vs. YExp: T(1, 156) = 1.731, P = 0.047; YCtrl vs. MA: T(1, 156) = 6.024, P = 5e-9). Consistent with the chinchilla findings, a comparison of the test statistics and P-values between WB-MEMR assays and ABR wave-I/V ratio suggests that the HP-noise-elicited WB-MEMR growth function is the most robust measure separating the control and high-risk human groups.

Fig. 2: YExp and MA human groups show reduced WB-MEMR and ABR compared to YCtrl despite tightly matched audiograms.
figure 2

a DPOAE amplitudes are closely matched up to 8 kHz. The MA group shows reduced DPOAEs between 8 to 16 kHz, consistent with audiograms. Box plots of DPOAE amplitudes averaged over two frequency ranges (3–8 kHz, and 9–16 kHz) are shown on the right side of (a) to visualize the full distribution of the data. b, d WB-MEMR to BB elicitors is significantly reduced in the MA group (b), whereas both high-risk groups show reductions with HP elicitors (d). WB-MEMR datapoints are estimated mean ± STE bars. Box plots to visualize the full distribution of the WB-MEMR data for the higher elicitor levels can be found in Supplementary Fig. 8. c ABR wave-I/V ratio is reduced in both high-risk groups. All box plots show the median line enclosed in a box denoting the 25th to 75th percentile range, and whiskers of length 1.5 times the interquartile range. Underlying data are archived on Zenodo (https://doi.org/10.5281/zenodo.6672827)61.

To test the sensitivity of clinically available versions of the MEMR and ABR, the same human groups were also studied using standard clinical equipment and protocols. A comparison of raw effect sizes for various lab and clinic-style measures showed that the clinical measures, although less sensitive, yielded results consistent with our more targeted laboratory assays (Fig. 3a), corroborating the notion that the high-risk groups exhibit CS. Using conventional effect-size interpretations32, raw effect sizes in the small-medium range were obtained for individual physiological metrics for the YCtrl vs. YExp comparison, and medium-large for the combined hybrid metric extracted by the classifier. For YCtrl vs. MA, medium-large effect sizes were obtained for individual metrics and large effect size for the combined hybrid metric extracted by the classifier (Fig. 3a). A support-vector machine classifier was trained to use both WB-MEMR and ABR wave-I/V ratio metrics to blindly classify whether an individual belongs to the YCtrl group or one of the high-risk groups. Leave-one-out cross validation demonstrated substantially above-chance classification (Fig. 3b). Note that although raw effect sizes are larger for ABR than for MEMR metrics, the ABR wave-I amplitude is known to also be strongly correlated with audiometric thresholds and DPOAE amplitudes in the extended-high-frequency range, which complicates the interpretation of ABR-based metrics13. Because individuals in the MA group showed elevated audiometric thresholds in the extended-high-frequency range compared to those in the YCtrl group, we used a simple linear regression to partially adjust for this effect from the “best” WB-MEMR and the “best” ABR metrics (i.e., WB-MEMR thresholds with HP elicitors, and wave I/V ratio with 32-channel measurements respectively). When effect sizes were re-estimated from the adjusted metrics, WB-MEMR thresholds (with HP elicitors) still showed a large effect size whereas the ABR wave-I/V effect size dropped to the medium range (Fig. 3c). Furthermore, it should be noted that the wave I/V normalization, although likely beneficial in reducing the effects of extraneous sources of variance13, entwines the effects of cochlear synaptopathy on wave I and V amplitudes with the effects of accompanying central gain on the wave V. It is possible that the full effects of central gain are not seen in our chinchilla data given that ABRs were measured just two weeks post-NE in the chinchillas. Considering these factors together, WB-MEMR measures seem to be the most sensitive to subclinical cochlear-nerve damage, and also simpler to interpret in the face of audiometric loss beyond 8 kHz and the effects of central gain. Finally, to obtain further insight into the sensitivity of the clinical MEMR, we also analyzed data from a large publicly available bank of audiological measurements from the NHANES 2012 repository, which revealed a steady decline of the MEMR amplitude with age despite normal hearing sensitivity33 (Fig. 3d). The findings from the NHANES 2012 data lend further credence to the notion that clinic-style MEMR measurements also capture the effects of CS, just with lower sensitivity.

Fig. 3: Clinic-style MEMR and ABR measures are also consistent with CS in high-risk human groups, but less sensitive than lab-style measures.
figure 3

a Effect-size comparisons for individual assays. b An SVM classifier performs substantially above chance in blindly classifying individuals into their groups. c Effect size adjusted for audiometric thresholds > 8 kHz is larger for the “best” WB-MEMR than the “best” ABR metric. d NHANES repository data show steady decline in MEMR amplitude with age despite normal audiograms.

Taken together, our results suggest that humans are also susceptible to CS from NE and aging, and that such damage may be widespread even among individuals with good hearing status per current clinical criteria. The large (>50%) reduction in the WB-MEMR amplitudes in our chinchillas and the parallel effects observed in at-risk human groups suggest that the WB-MEMR, particularly with a HP elicitor, is highly sensitive to CS. Another key insight from our results is that while the data from middle-aged listeners is consistent with CS at all frequencies, damage from high acoustic exposure is restricted to higher frequency sections of the cochlea. Interestingly, although we found evidence consistent with CS in our YExp group, we did not find audiometric threshold elevations in this group for extended high frequencies (9–16 kHz) as seen in some previous studies34. Another previous study found that self reports of greater acoustic exposures were associated with elevated thresholds at 16 kHz, but only in female subjects9. Given these inconsistencies, the level and duration of acoustic exposure needed to induce extended high-frequency hearing loss should be systematically investigated in future studies. The data in the middle-aged group correspond well with post-mortem histology data from human temporal bones5; specifically, the percent reduction in the WBMEMR amplitudes in middle age, and the slope of the MEMR vs. age function from the NHANES data are both consistent with the rates at which deafferentation is observed in the post-mortem data. This further supports the notion that the WB-MEMR can robustly indicate CS, at least in the absence of audiometric hearing loss. Thus, our results support the interpretation that CS mediates the observed associations between WB-MEMR measures and speech-in-noise scores35,36. The smaller effect sizes observed in analogous measures with clinic-style protocols can also account for the mixed results in similar comparisons between speech-in-noise scores and clinic-style acoustic reflex thresholds36,37. Finally, the WB-MEMR assay is likely to be available for clinical use widely in the near future; thus, the measure may be appropriate for use in clinical trials of therapeutic drugs intended to re-establish afferent cochlear synapses7 both for candidate selection and as an outcome measure.

Methods

Human groups

All human subject measures were conducted in accordance with protocols approved by the Purdue University Internal Review Board and the Human Research Protection Program. Participants were recruited via posted flyers and bulletin-board advertisements and provided informed consent. All participants had thresholds of 25 dBHL or better at audiometric frequencies in the 0.25–8 kHz range in at least one ear. If a subject met criteria in both ears, measurements were performed in each ear separately and averaged together to provide a single set of measurement per individual. Two groups of subjects who were specifically at risk for cochlear synaptopathy (CS) were tested; a middle-aged group (MA) with ages ranging from 36–60 years (N = 58, 44 Female) and a young group with regular and substantial acoustic exposure (YExp) owing to membership in the Purdue Marching Band or a campus hunting/shooting club (N = 53, ages 18–35 years, 27 Female). A young (ages 18–35) cohort of subjects who answered “No” to the question “Do you consider yourself regularly exposed to loud sound (e.g., play in a band, have a job with noisy tools/ machines)?” were included as the control group (YCtrl, N = 55, 34 Female). The choice of the control group was conservative in that it was minimally restrictive, and likely includes subjects with a range of recreational acoustic-exposure histories that are representative of the local community in this age group. Note that our audiometric criteria for inclusion meant that a greater proportion of YExp and (an even greater proportion of) MA subjects who initially expressed interest had to be excluded compared to those in the YCtrl group.

Although our study design relied on using well-defined groups, a modified version of the noise exposure survey developed by38 was used to obtain a correlate of acoustic exposure in a subset of participants and establish whether the YCtrl and YExp groups were different in their reporting of exposures. The survey was modified to include two additional questions to ask participants about time spent in nightclubs and in bars/pubs as these are sources of noise exposure not queried on the original survey. Following the formulae provided in38, an average yearly noise exposure level was calculated for each subject. The results confirmed that the groups are indeed well-separated (Supplementary Fig. 1b).

Chinchilla cochlear synaptopathy model and measurement setup

Young (<1.5 years old) male chinchillas (N = 7) weighing 400–650 grams were used in accordance with protocols approved by the Purdue Animal Care and Use Committee (PACUC Protocol No: 1111000123). Awake and unrestrained chinchillas were exposed to 100 dB SPL, octave-band noise centered at 1 kHz for 2 hours. By systematically varying the exposure level, prior studies suggested that this exposure paradigm yields a temporary threshold shift (TTS) and concomitant broad cochlear synaptopathy16. The synaptopathic effects of this NE were histologically confirmed in the present study (see section on Inner Hair-Cell Synaptic Cochleogram). To further confirm TTS and characterize the effects of cochlear synaptopathy, a battery of non-invasive assays were measured from the chinchillas pre-exposure, 1-day post exposure, and 2-weeks post exposure (Fig. 1a). The non-invasive measures were acquired either with the animals anesthetized or awake, depending on the measure (described in separate sections on the individual assays), and were carried out in a double-walled, electrically shielded, sound-attenuating booth (Acoustic Systems, Austin, TX, USA). Animals were socially housed in groups of two until they underwent an anesthetized procedure, after which they recovered in their own cage. All animals received daily environmental enrichment. Note that the focus of the present study was the effect of noise-induced cochlear synaptopathy on candidate non-invasive biomarkers for synaptopathy, with TTS-inducing exposure expected to reduce WBMEMR amplitudes and increase WBMEMR thresholds. Only male chinchillas were used here since sex differences in noise susceptibility (with females generally less susceptible than males) have been shown in chinchillas (Trevino et al., 2019). This experimental design was chosen to avoid a known sex-related factor that would likely add substantial variability to the outcomes if we were to pool across animals of different sexes. While this approach allowed us to address one of our specific hypotheses with greater statistical power for this initial study, we acknowledge that this does not allow us address the effects of sex variations that are important to characterize. This should be considered in future studies.

For anesthetized measurements, anesthesia was induced with xylazine (4 mg/kg, subcutaneous), followed a few minutes later with ketamine (40 mg/kg, intraperitoneal). The xylazine reversal agent atipamezole (0.4 to 0.5 mg/kg, intraperitoneal) was used after procedures for faster recovery. Under anesthesia, eye ointment was used to keep the eyes lubricated and the animals’ vital signs were monitored throughout procedures with a pulse oximeter (Nonin 8600V, Plymouth, MN). An oxygen tube was placed near the animals’ nostrils. Body temperature was maintained near 37 °C using a closed-loop heating pad with a rectal probe (50-7053F, Harvard Apparatus). Animals were provided lactated Ringer’s solution both before and after procedures (each 6 cc, subcutaneous).

For awake measurements, chinchillas were positioned in a cylindrical restraining tube modified from39 that terminated in a neck-sized opening that was narrower than the rest of the tube (Supplementary Fig. 2). Movement was restricted in the axial direction of the tube by positioning the head and bullae outside the tube (i.e., beyond the narrow opening), and the rest of the body inside the tube. Head rotation was then restricted by placing a custom adjustable nose holder across the nose with the head at a comfortable downward angle that allowed for unrestricted airflow for breathing, and access to the ears for delivering auditory stimuli. A pulse oximeter (Nonin 8600V, Plymouth, MN) was placed on the pinna to monitor for any signs of restricted airflow, and a webcam was used continuously to monitor for any signs of discomfort. Animals remained comfortable throughout the ~30 min of awake data collection.

Inner hair-cell synaptic cochleogram

The synaptopathy phenotype induced by our noise-exposure paradigm was confirmed by estimating synaptic loss in inner hair cells (IHCs) as a function of cochlear frequency location on a cohort of chinchillas (control: N = 8; exposed: N = 8) separate from the animals used for non-invasive physiological measures in this study. The synaptic loss was quantified by immunostaining and confocal imaging of CtBP2 (pre-synaptic ribbon protein). Animals were intracardially perfused with 4% paraformaldehyde in phosphate-buffered saline at pH 7.3. Cochleas were dissected and immediately perfused through the cochlear scalae, post-fixed for 2 h at room temperature, dissected into six pieces without decalcification (roughly half turns of the cochlear spiral) for whole-mount processing of the cochlear epithelium. Cochlear pieces were transferred to net well sieve in 5 ml disposable cup with ~ 1 cc of 30% sucrose and freeze-thawed for membrane permeabilization40. Immunostaining began with a blocking buffer (PBS with 5% normal horse serum and 0.3–1% Triton X-100) for 1 h at room temperature and followed by overnight incubation at 37 °C with mouse (IgG1) anti-CtBP2 (C-terminal Binding Protein), to quantify pre-synaptic ribbons and rabbit anti-Myosin VIIa to delineate the IHC cytoplasm. Primary incubations were followed by two sequential 60-min incubations at 37 °C in species appropriate secondary antibodies (coupled to Alexafluor dyes) with 0.3–1% Triton X-100. To understand the position of synapses with respect to IHC surface, anti-myosin VIIa & Topro-3 (nuclei dye) were used. Confocal z stacks of frequency-specific regions from each ear were obtained using a high resolution (1.4 NA) oil immersion objective (63x) on an inverted Zeiss LSM 710. Images were acquired in a 1448 × 1604 raster (pixel size = 0.036 μm in x and y, scan speed 7; Averaging of 2 with 8 bit depth) with a z-step-size of 0.25 mm. Image stacks (.lsm files) were ported to an off-line processing station, where further 3-D morphometry was performed using a commercial image-processing program (FEI’s Amira, Visage Imaging). Synaptic ribbons were counted and divided by the total number of IHC nuclei in the microscopic field including fractional estimates. The percentage of IHC synapse loss was plotted as a function of cochlear frequency location (Fig. 1b).

Behavioral audiometry (humans only)

Standard clinical and extended-high-frequency audiograms were measured using a GSI AudioStar Pro audiometer (Eden Prairie, MN) and Sennheiser HDA 200 high-frequency headphones by employing pulsed-tone stimuli. Thresholds were determined for 0.25, 0.5, 1, 2, 3, 4, 6, 8, 10, 12.5, 14 and 16 kHz using a bracketing procedure (modified Hughson and Westlake procedure). An important feature of this study was that the three human groups (YCtrl, YExp, and MA) were tightly matched in audiometric thresholds with mean and median differences being<5 dB at every frequency tested from 0.25 and 8 kHz (Supplementary Fig. 1a). The YCtrl and YExp groups were also matched in the 8–16 kHz range (with mean and median differences<5 dB, again). Owing to substantial effect of age on hearing sensitivity in the extended high-frequency range41, it was infeasible to match the young and MA groups in this range. The MA group thus showed an approximately 25-dB threshold elevation in the extended-high-frequency range. When performing statistical analyses of putative assays of cochlear synaptopathy, the audiograms were summarized into three bins: a low-frequency average (LFA; 0.25–3 kHz), a high-frequency average (HFA; 4–8 kHz), and an extended-high-frequency average (EHFA; 10–16 kHz). The mean and median differences were<5 dB across all three groups for LFA and HFA, and also across YCtrl vs. YExp for EHFA. This tight matching in audiograms between the groups allowed for interpreting the other suprathreshold measures in the context of cochlear synaptopathy.

Wideband middle-ear muscle reflex (WB-MEMR)

Motivated by findings in mice15, the WB-MEMR was a key candidate for an assay of cochlear synaptopathy and was measured in both species using shared methods and software. Chinchilla measurements were acquired with the animals awake and restrained. Human subjects passively watched a muted video with subtitles during data acquisition. The WB-MEMR paradigm was adapted from42. An ER-10X wideband measurement system was used to acquire data from human subjects whereas an ER-10B+ system coupled to ER-2 transducers and foam eartips were used for chinchilla data collection (Etymotic, Elk Grove Village, IL). Both measurement systems allowed for probe stimuli and ipsilateral reflex-eliciting stimuli to be presented from separate speakers to limit interchannel interactions and distortions, and consisted of a microphone to measure sound pressure near the ear tips. A 90-dB peSPL click probe with a flat incident power spectrum in the 0–10 kHz range was used to measure the acoustic immittance properties of the ear-canal middle-ear system. Each WB-MEMR measurement trial consisted of a series of seven clicks alternating with 120 ms-long ipsilateral noise elicitors (Supplementary Fig. 3a). The gap between the peak of the click and the onset ramp of the noise elicitor was 27.94 ms, whereas the gap between the offset ramp of the noise and the next click was 13.97 ms. This trial structure was used for each elicitor level and the level series was repeated 32 times with an intertrial interval of 1.5 seconds to allow for the middle-ear immittance to relax back to baseline levels. For each elicitor level, the immittance measured using clicks numbered two through seven in the sequence were averaged together and the change relative to the first click was calculated as the WB-MEMR metric. Although the data reported in the present study represent averaged measurements from 32 trials, the MEMR was measurable with single trials and thus is readily translatable to clinical applications with limited time availability. The immittance change from the WB-MEMR protocol was quantified using analogous metrics for chinchillas and humans. For both species, the dB-change in ear-canal pressure induced by the MEMR was quantified as a function of frequency to yield a pattern of alternating negative and positive peaks at different frequencies (Supplementary Fig. 3b, Supplementary Fig. 4a). For chinchillas, the absolute value of this pattern was averaged over 0.5–2 kHz to yield a single number per elicitor level (Supplementary Fig. 4b). For human data, additional calibrations which help reduce extraneous variance were leveraged (as described in the Acoustic Calibrations section). Accordingly, the dB-change in ear-canal pressure was added to the dB-change in ear-canal conductance (Supplementary Fig. 3c) to yield a stereotypical pattern of dB-change in absorbed power42 (Supplementary Fig. 3d). The absolute value of this dB-change in absorbed power was averaged over 0.5–2 kHz to yield a single number per elicitor level (Supplementary Fig. 3e). Chinchilla WB-MEMR data were acquired for broadband (BB) noise (0.5–8 kHz) elicitors in the 34–94 dB SPL range in steps of 6 dB. Human WB-MEMR data were acquired both for BB (0.5–8 kHz) and highpass (HP, 3–8 kHz) elicitors in the 34–88 dB FPL (Forward-pressure level) range in 6 dB steps. Noise spectra had a logarithmic roll-off with frequency in both species to produce a flat excitation pattern including adjustments for the sharpening of cochlear tuning at higher frequencies43. A thresholding procedure was used to reject artifactual trials before averaging. The WB-MEMR growth functions (with elicitor level) were baseline corrected by subtracting the mean values across the 34 and 40 dB SPL/FPL points. For plotting and group level analysis, a three-parameter sigmoid function was fit to individual baseline-adjusted WB-MEMR growth functions. The level at which the fitted function crossed 0.1 dB was recorded as the WB-MEMR threshold for each individual.

Acoustic calibrations to reduce human ear-canal filtering effects

For human measures, the ER-10X probe was calibrated using classic methods for characterizing two-port networks44. The Thevenin-equivalent source pressure and impedance for the click probe were estimated by measuring the acoustic response at the ER-10X microphone when the eartip was coupled to loads whose acoustic impedance values can be approximated using theoretical calculations (brass cylinders of 8 mm diameter and five different lengths, that is the ER-10X calibrator). The estimates were refined until the so-called “calibration error” (a dimensionless energy ratio scaled by a factor of 10000, and averaged over 2–8 kHz) was minimized. Error values of<1 are typically considered good quality calibration45; we routinely obtained errors in the 0.01–0.04 range. With the probe properties calibrated, the same click stimulus was then used to estimate the immittance properties of each subject’s ear for each insertion of the probe tip. Following46, we considered the probe as adequately sealed in the ear canal if the low-frequency (0.2–0.5 kHz average) ear absorbance was less than 29% and the admittance phase was greater than 44°. The in-ear calibration measurements were used to derive a voltage to forward-pressure-level (FPL) transfer function that was then used to generate voltage waveforms that would yield stimuli at desired FPL. These calibration methods help reduce extraneous variability from ear-canal filtering both within and across individuals13.

Distortion-product otoacoustic emissions (DPOAEs)

To indirectly assess cochlear mechanical function and outer-hair-cell (OHC) integrity, DPOAEs were measured in both species using logarithmically sweeping primaries47. f2 frequency was swept down from 16 kHz to 2 kHz with f1 = f2/1.225. For human measurements, because the additional calibrations were available (as described in the Acoustic Calibrations section), the primary levels were kept constant across frequencies at 66/56 dB FPL. The use of constant-FPL primaries allowed for the 2f1 - f2 DPOAE level to be quantified in emitted-pressure level (EPL) units48. For chinchilla measurements, constant-SPL primaries at 75/65 dB SPL were used instead with the DPOAE also quantified in SPL units. The identical least-squares approach with 0.5 s-long windows was used in both species to extract the DPOAE; this choice of window length was guided by previous work suggesting that this allows for extracting the contribution of just the distortion source49. To cross-validate our swept-tone methods, standard DP-grams were also acquired from the chinchillas using pure-tone primaries. The DPOAEs acquired using pure tones were comparable to those acquired using swept-tone primaries (Compare Supplementary Fig. 5, and Fig. 1c).

Auditory brainstem responses (ABRs)

ABRs were measured in human subjects using a 32-channel EEG cap (Biosemi, Amsterdam, Netherlands) using HP (3–8 kHz), 105 dB SPL click stimuli and in-ear gold-foil tiptrodes (ER3-26A combined with ER3-28S) coupled to earphone transducers (ER2, Etymotic Research, Elk Grove Village, IL). The use of HP clicks helps reduce extraneous inter-subject variability owing to cochlear dispersion, while simultaneously focusing the assay on the basal regions where audiometric threshold shift owing to noise-induced hearing loss tend to appear13. The presentation timing of the clicks was random with a Poisson distribution50 with a mean rate of 11 clicks per second and randomized polarity with 4000 repetitions per polarity. The ABR was extracted using conventional trial averaging. To quantify the amplitudes of wave I and wave V without the experimenter bias on group differences, an automated procedure based on dynamic time warping was used51. The grand-averaged waveform across all subjects (across all three groups) was defined as the template ABR. Wave I and V peaks and troughs were marked manually on this template waveform; the corresponding points identified by automatic dynamic time warping were identified as the ABR wave I and V peaks and following troughs for individual subjects. The peak-to-trough swings were then extracted as the wave I and V amplitudes. Wave I/V ratio formed the second candidate assay for cochlear synaptopathy13. Separate wave I and V values are plotted in Supplementary Fig. 6.

Anesthetized ABRs in response to tone-burst stimuli were measured from chinchillas pre-TTS exposure and 2-weeks post-TTS exposure. Measurements were acquired using subdermal needle electrodes with a stimulus presentation rate of 20 bursts per second and as a function of level, with 500 repetitions of both positive and negative polarities collected and averaged. ABR thresholds were computed using our standard cross-correlation technique20 at the two time points and were used to establish that the exposure did not lead to a threshold shift (i.e.,<5 dB; Fig. 1d). Suprathreshold amplitudes of the ABRs (averaged at 60, 70, and 80 dB SPL) were averaged across the 4 and 8 kHz tone-burst conditions to yield a metric that is analogous to the high-level 3–8 kHz clicks used in humans. The ABR wave I/V ratio was again evaluated as a candidate metric for cochlear synaptopathy. Separate wave I and V values are shown in Supplementary Fig. 7.

Note that the normalization of the wave I amplitude to extract the I/V ratio entwines two distinct phenomena that may influence the ABR, thus complicating the theoretical interpretation of the group differences. Specifically, cochlear deafferentation can reduce wave I amplitudes, and these reductions may be inherited by later peaks of the ABR. However, central gain, known to occur as a compensatory mechanism, can serve to resist this reduction and help maintain or even enhance the wave V amplitude. Indeed, data in animal models52 and preliminary data from human subjects53 suggest that such compensatory central gain is ubiquitous in young and middle-aged individuals. Thus, purely from an assay-design perspective, the normalization is likely to be beneficial in mitigating the influence of extraneous variables such as head size and tissue geometry13 for the human subject groups, and thus adopted in this study. However, in the chinchilla model where ABRs are measured just two weeks after noise exposure, the full effects of central gain may not be observed.

“Clinic-Style” measures in humans

To compare the sensitivity of ABR and MEMR protocols available in clinical systems to the sensitivity of lab-based targeted procedures, the same human participants were also measured using standard clinical equipment. ABRs were measured using a SmartEP system (Intelligent Hearing Systems, Miami, FL) in response to 80 dB nHL clicks at a presentation rate of 11.1 clicks per second, a gain setting of 150k, and a 50–3000 Hz bandpass filter. ABR wave I and V peaks and troughs were manually marked by trained research assistants who were also graduate students in the clinical audiology program at Purdue University. MEMRs were measured using a Titan System (Interacoustics, Assens, Denmark) in response to 4-kHz tone elictors between 75 and 100 dB HL in 5 dB steps. A 226 Hz tone probe was used for MEMR measurements, as is customary for adult immittance measurements in audiology clinics.

NHANES 2011–2012 MEMR data

A large repository of human audiological data is available from the Centers for Disease Control and Prevention (CDC) as part of the National Health and Nutrition Examination Survey (NHANES). This repository includes MEMR data acquired using the Earscan system and audiometric data in the standard clinical frequency range. MEMR-induced immittance change waveforms (in “Earscan” units) in response to two presentations of a 105 dB HL 2 kHz tone elicitor are available from 4500 adult subjects in the 20 - 69 year age range. Given the results from lab-based WB-MEMR measures suggesting that middle-aged individuals exhibit cochlear nerve degeneration across the frequency range, we asked whether there is evidence of such degeneration in this publicly available dataset. To minimize the contribution of audiometric hearing loss to the MEMR results, we only included the subset of N = 1884 subjects whose audiometric thresholds at 8 kHz was 20 dB HL or better and age of 60 years or less. Following prior work33, the MEMR traces were expressed in Z-scores relative to the last 450 ms of the available recordings. The MEMR amplitude was quantified as the mean Z-score between 350 and 450 ms after the onset of the 2 kHz tone elicitors. This value was plotted as a function of age in 2-year bins, with each containing about 100 subjects (Fig. 3d) and revealed a steady decline with age despite clinically normal hearing. For effect-size calculations the participants were divided into two groups matching the definition of the participant groups recruited in the present study.

Statistics and reproducibility

Statistical inference was performed by fitting linear mixed-effects models54 to the data when multiple data points were measured for the same individual (WB-MEMR data in both species, and pre vs. 2-week-post chinchilla ABR data), or by fitting simple linear models when only one data point was available per subject (Human ABR data). Fixed-effects terms were included to model the effects of group (humans), time-point (pre vs. 2-week-post in chinchillas), and the effects of various covariates (sex, audiometric thresholds, DPOAE amplitudes in humans), whereas subject-related effects were treated as random effects. Homoscedasticity of subject-related random effects was not assumed initially and hence the error terms were allowed to vary and be correlated across the levels of fixed-effects factors. In order to not over-parameterize the random effects, the random terms were pruned by comparing models with and without each term using the Akaike information criterion and log-likelihood ratios55. The best fitting random-effects model turned out to be a single subject-specific random effect that was condition independent in all cases. This random-effect term was used for all subsequent analysis. All model coefficients and covariance parameters were estimated using restricted maximum likelihood as implemented in the lme4 library in R56. To make inferences about the experimental fixed effects, the F approximation for the scaled type-II Wald statistic was employed57. This approximation is more conservative in estimating false-alarm rates than the Chi-squared approximation of the log-likelihood ratios and has been shown to perform well even with fairly complex covariance structures and small sample sizes58. The p-values and F-statistics based on this approximation are reported. For the human ABR data, because a simple linear model is used, group effects are evaluated using T-statistics.

Beyond formal statistical inference on our two key lab-based assays, effect size calculations were performed for each individual lab-based and clinic-style measure using the bias-corrected procedures outlined in ref. 59. Given the larger human sample sizes, mean and standard deviation for effect size metrics were computed using median-based estimates. For the chinchilla data, given smaller sample size, conventional sample statistics were used.

Blind classification of human groups

To assess the sensitivity of our battery of lab-based measures as a whole (rather than individual measures) in predicting acoustic exposure group status and middle-age status, we used a support-vector machine (SVM) classifier to blindly classify individuals into their corresponding groups. WB-MEMR thresholds, high-level amplitudes for both HP and BB elicitors, and the ABR wave I/V ratio were all entered as features (five total) for the classifier to operate on. To obtain the mean and standard error of the classification accuracy, a leave-1-out train-test split procedure was used. The resulting classification accuracies are reported in Fig. 3b. The equivalent effect sizes for the hybrid metric learned by the classifier were estimated from the classification accuracy by computing the p-value of the classifier against a binomial null distribution of chance classification (50%) and using the p-value to obtain equivalent Cohen’s d scores60. These combined/hybrid effect sizes are reported in Fig. 3a.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.