Effects of age on psychophysical measures of auditory temporal processing and speech reception at low and high levels

Highlights
• We found little evidence of greater age-related hearing declines at high sound levels.
• There are age-related temporal-processing declines independent of hearing loss.
• No evidence of age-related speech-reception deficits independent of hearing loss.

Figure S1: Spectrum of a speech-shaped noise (SSN; black line), and of the noise used for the digit triplets test in the current study (red line). Both noises are lowpass filtered at 3 kHz. Due to a bug in version 1.5 of the software (https://it.mathworks.com/matlabcentral/fileexchange/37376-oscillatorand-signal-generator) used to generate the SSN for the current study, its spectral shape had relatively higher energy below ∼200 Hz, and lower energy above ∼2.5 kHz compared to a SSN.

Figure S5: Posterior medians (circles) and 99% CIs for the effects of audiometric thresholds (at 0.5 kHz for the 0.6 kHz pure tone, and at 2 kHz for the other two stimuli) on pure tone (PT) frequency discrimination thresholds, and complex tone (CT) F0 discrimination thresholds estimated by the Bayesian multiple regression models.

Figure S7: Posterior medians (circles) and 99% CIs for the effects of audiometric thresholds (at 2 kHz for the 2-kHz carrier, and at 0.5 kHz for the other two stimuli) on IPD detection thresholds estimated by the Bayesian multiple regression models. MOD refers to conditions in which the IPD was applied to the modulator of an AM tone, and PT to conditions in which the IPD was applied to a pure tone.

Figure S8: Posterior medians (circles) and 99% CIs for the effects of log10 TCNE on IPD detection thresholds estimated by the Bayesian multiple regression models. MOD refers to conditions in which the IPD was applied to the modulator of an AM tone, and PT to conditions in which the IPD was applied to a pure tone.

Figure S11: Posterior medians (circles) and 99% CIs for the effects of PTA 0.125-2 on DTT thresholds estimated by the Bayesian multiple regression model.

Figure S13: Standardized ratings for musical intervals by age. Each panel shows a least squares line fit of standardized ratings by age with 95% confidence intervals as a visual aid. The slope for the effect of age estimated by the Bayesian multiple regression model is not the same as that shown in the figure.

Figure S15: Posterior medians (circles) and 99% CIs for the effects of log10 TCNE on consonance preference scores estimated by the Bayesian multiple regression model.

Table S3: Posterior medians and 99% CIs (in brackets) for the effects of Cog1, Cog2, and MUS estimated by the Bayesian multiple regression models on AM detection thresholds (AMD), pure tone frequency discrimination thresholds (PT FD), complex tone F0 discrimination thresholds (CT F0D), IPD MOD and IPD PT detection thresholds, CRM thresholds in the colocated (Col.), and offset (Off.) conditions, and for the difference between offset and colocated conditions (Off. - Col.), DTT thresholds, consonance preference (Cons. Pref.), and SSQ12 scores.

UML settings
The form of the psychometric function assumed for all tasks was Logistic (Eq. 1). The parameter space for the UML procedure consisted of a three-dimensional grid defining the possible values of the midpoint (α), the parameter controlling the slope (β), the lapse rate (λ), and their prior probabilities. It should be noted that because the psychometric functions were re-fit after data collection (as suggested in Shen and Richards, 2012; Shen et al., 2015), the priors used for the UML procedures did not directly affect the final threshold estimates, but could affect the adaptive placement of the stimuli and the efficiency of the procedure. Data collection occurred over two blocks of trials for each task and condition. To maximize the efficiency of the adaptive procedure, the posterior distribution of the parameters at the end of the first block was saved and used as the prior for the second block of trials.

Besides the limits on the space of the parameters of the psychometric function, lower and upper limits on the values that the stimuli could take were defined. These upper limits also served to define the λ sweetpoint for all the tasks, except for the frequency and F0 discrimination tasks. For these two tasks the λ sweetpoint was initially set at a "suggested" frequency/F0 difference that was deemed sufficiently high to result in asymptotic performance for most listeners (10% frequency difference for the frequency discrimination task, and 80% F0 difference for the F0 discrimination task). However, if the current estimate of the psychometric function indicated that the frequency/F0 difference needed to reach close-to-asymptotic performance (proportion correct of 0.99-λ) was greater than the suggested λ sweetpoint, the λ sweetpoint was changed to that estimated frequency/F0 difference. In this way most listeners would not be presented with very large frequency/F0 excursions across trials, but frequency/F0 differences larger than the suggested λ sweetpoint could be used if the current estimate of the psychometric function suggested that these were needed to more accurately estimate λ.
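The λ sweetpoint rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: it assumes the common Logistic parameterization Pc(x) = γ + (1 - γ - λ) / (1 + exp(-β(x - α))), with γ the guess rate, and the function names (`logistic_pf`, `stimulus_for_pc`, `lambda_sweetpoint`) and example parameter values are hypothetical. The stimulus value x is expressed in whatever units and scale the psychometric function is defined on, and any clipping to the upper stimulus limit is omitted.

```python
import math


def logistic_pf(x, alpha, beta, gamma, lam):
    """Proportion correct at stimulus value x (assumed parameterization)."""
    return gamma + (1.0 - gamma - lam) / (1.0 + math.exp(-beta * (x - alpha)))


def stimulus_for_pc(pc, alpha, beta, gamma, lam):
    """Invert logistic_pf: stimulus value at which proportion correct is pc."""
    f = (pc - gamma) / (1.0 - gamma - lam)
    if not 0.0 < f < 1.0:
        raise ValueError("pc must lie strictly between gamma and 1 - lam")
    return alpha + math.log(f / (1.0 - f)) / beta


def lambda_sweetpoint(suggested, alpha, beta, gamma, lam):
    """Use the suggested sweetpoint unless the current estimate of the
    function needs a larger stimulus value to reach Pc = 0.99 - lam."""
    needed = stimulus_for_pc(0.99 - lam, alpha, beta, gamma, lam)
    return max(suggested, needed)


# Hypothetical current parameter estimates for a two-alternative task.
alpha_hat, beta_hat, gamma, lam_hat = 1.0, 2.0, 0.5, 0.02
print(lambda_sweetpoint(2.0, alpha_hat, beta_hat, gamma, lam_hat))
print(lambda_sweetpoint(5.0, alpha_hat, beta_hat, gamma, lam_hat))
```

With a shallow estimated function the first call returns a value above the suggested sweetpoint, while the second keeps the suggested value because it already exceeds the stimulus needed for close-to-asymptotic performance.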
The parameter space for the slope and lapse rate parameters, as well as their priors, were the same for all tasks. The array of values for β ranged from 0.1 to 10 with logarithmic spacing (step factor of 1.1). The prior for β was uniform in this log space. The array of values for λ ranged linearly from 0.001 to 0.3 in steps of 0.01. A beta prior with a mean of 0.08 and a standard deviation of 0.065 was used for λ.
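As a rough illustration of these settings, the sketch below builds the two grids and converts the stated mean and standard deviation of the beta prior into shape parameters by moment matching. It is not the UML toolbox code: the helper names are hypothetical, and both the handling of the grid endpoints and the way the prior is discretized over the λ grid are assumptions.

```python
import math


def log_grid(lo, hi, factor):
    """Values lo, lo*factor, lo*factor**2, ... not exceeding hi."""
    n = int(math.floor(math.log(hi / lo) / math.log(factor) + 1e-9)) + 1
    return [lo * factor ** k for k in range(n)]


def linear_grid(lo, hi, step):
    """Values lo, lo+step, lo+2*step, ... not exceeding hi."""
    n = int(math.floor((hi - lo) / step + 1e-9)) + 1
    return [lo + step * k for k in range(n)]


def beta_shapes(mean, sd):
    """Shape parameters (a, b) of a beta distribution with given mean and SD."""
    nu = mean * (1.0 - mean) / sd ** 2 - 1.0
    if nu <= 0.0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * nu, (1.0 - mean) * nu


def beta_pdf(x, a, b):
    """Beta density, computed on the log scale for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


slope_values = log_grid(0.1, 10.0, 1.1)        # grid for the slope parameter
lapse_values = linear_grid(0.001, 0.3, 0.01)   # grid for the lapse rate

# Uniform prior over the log-spaced slope grid.
slope_prior = [1.0 / len(slope_values)] * len(slope_values)

# Beta prior for the lapse rate, normalized over the grid points.
a, b = beta_shapes(0.08, 0.065)
weights = [beta_pdf(v, a, b) for v in lapse_values]
lapse_prior = [w / sum(weights) for w in weights]

print(len(slope_values), len(lapse_values), round(a, 3), round(b, 3))
```

Because the slope grid is log spaced, assigning equal probability to every grid point is what makes the prior uniform in log space.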
The parameter space for the midpoint in each task is detailed in Table S4. The midpoint (and the stimulus space) was defined in terms of the modulation index m for the AM detection task, in terms of the frequency or F0 difference (∆F or ∆F0), measured in percent, for the frequency/F0 discrimination tasks, in terms of the IPD, measured in degrees, for the IPD detection task, and in terms of the SNR of the target speech, measured in dB, for the CRM and DTT tasks.

Task   Par.   α   Step   Spacing   x
AM det.   m   0.

Table S4: Table listing the settings for the parameter space of the midpoint (α), and of the stimuli for each task. The second column indicates the parameter that was varied in the task. The third column indicates the lower and upper limits of the midpoint array. The fourth column indicates the step size; for parameters with linear spacing (see fifth column) the step was additive, while for parameters with log spacing the step was a multiplicative factor. The prior was always uniform on the (log or linear) parameter space of the midpoint. The sixth column indicates the lower and upper limits of the parameter space for the stimuli (e.g. for the AM detection task, the lowest and highest values that m could take in any trial).

Psychometric function fits
A Logistic function (Eq. 1) was used for all fits. Fits were obtained via MCMC methods (Kuss et al., 2005) using JAGS (Plummer, 2003) and R (R Core Team, 2020). For the λ parameter, a gamma prior with a mode of 0.01 and a standard deviation of 0.08 was used for all tasks. The priors for the α and β parameters were informed by preliminary fits obtained via maximum likelihood (Wichmann and Hill, 2001). For the β parameters, gamma priors with the mode set at the geometric (across-participants) mean of the maximum likelihood estimates, and a standard deviation of 3, were used for all tasks. For the α parameters, normal priors (on a linear or log scale depending on the task; see Table S5) centered at the (across-participants) mean of the maximum likelihood estimates, and with the standard deviation indicated in Table S5, were used. Example priors for one condition of the pure tone frequency discrimination task are shown in Fig. S17.

The fourth column indicates the value of the γ (guess rate) parameter of the psychometric function, which was fixed at the reciprocal of the number of response alternatives. The last column indicates whether the dependent variable was log-transformed before the fit.
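Gamma priors specified by a mode and a standard deviation, as for λ and β above, have to be converted to shape and rate parameters before they can be passed to JAGS, whose `dgamma` takes a shape and a rate. The sketch below shows the standard algebraic conversion for a gamma distribution with mode greater than zero; the function name is hypothetical, and this is an illustration rather than the code used in the study.

```python
import math


def gamma_from_mode_sd(mode, sd):
    """Shape and rate of a gamma distribution with the given mode and SD.

    Uses mode = (shape - 1) / rate and sd = sqrt(shape) / rate,
    which requires mode > 0 (that is, shape > 1).
    """
    if mode <= 0.0 or sd <= 0.0:
        raise ValueError("mode and sd must be positive")
    rate = (mode + math.sqrt(mode ** 2 + 4.0 * sd ** 2)) / (2.0 * sd ** 2)
    shape = 1.0 + mode * rate
    return shape, rate


# Prior for the lapse rate: mode 0.01, standard deviation 0.08.
shape, rate = gamma_from_mode_sd(0.01, 0.08)
print(round(shape, 4), round(rate, 4))
```

The same function applies to the β priors, with the mode set to the task-specific geometric mean of the maximum likelihood estimates and a standard deviation of 3.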
Goodness of fit was assessed qualitatively, via visual posterior predictive checks (Kruschke, 2013), as well as quantitatively, via the Monte Carlo simulations described by Wichmann and Hill (2001). For the latter, 10,000 simulated datasets were generated using the parameter estimates of the fitted psychometric function. The deviance of each simulated dataset relative to the fitted function was calculated in order to derive the deviance distribution. The deviance of the observed data relative to the fitted function was then compared to the deviance distribution. Deviance values above the 95th percentile of the deviance distribution may be indicative of poor fits. Only 10 of the 2448 fits across tasks had observed deviances above the 95th percentile of the deviance distribution. This indicates that in the vast majority of cases the quantitative goodness of fit check did not detect issues with the fits. Visual inspection of the few poor fits indicated that these were either due to psychometric functions with nonmonotonicities, or to cases in which performance was very poor even for the highest differences between the standard and comparison stimuli (∆s). The former may be due to confusion or to lapses occurring more frequently at higher stimulus levels or in one of the two blocks of trials.

The qualitative posterior predictive checks confirmed that the majority of fits were good. These checks also revealed a number of cases, occurring mostly for MOD IPD detection at 2 kHz, and in the AM detection task, in which performance was poor even at the highest ∆s. Because of this, in some cases the adaptive procedure could not explore a wide range of performance points on the psychometric function, and only a few points at high ∆s were available. Although in these cases the fits were usually good, in the sense that the fitted function passed close to the available data points, the CIs around the parameter estimates were large. This reflects the fact that these functions could be relatively well fit by a large range of combinations of values of the midpoint, slope, and lapse rate. For example, poor performance at the highest ∆s could in some cases be well accounted for by either a very high lapse rate or a very shallow slope; the resulting trade-off between these parameter estimates leads to large uncertainty in their values. Such estimates will necessarily be noisier. However, because the same procedures for fitting the functions were used for all participants, the estimates, even if noisy in some cases, are not biased with respect to age or to the other variables of interest in the study.
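The Monte Carlo deviance check described above can be sketched as follows. This is a simplified stand-in for the procedure of Wichmann and Hill (2001), not the analysis code of the study: it assumes the data are summarized as numbers of correct responses per stimulus level, that the fitted probabilities lie strictly between 0 and 1, and all function names and example data are hypothetical.

```python
import math
import random


def binomial_deviance(n_correct, n_trials, p_fit):
    """Deviance of binomial data relative to fitted proportions correct."""
    dev = 0.0
    for k, n, p in zip(n_correct, n_trials, p_fit):
        if k > 0:
            dev += k * math.log(k / (n * p))
        if k < n:
            dev += (n - k) * math.log((n - k) / (n * (1.0 - p)))
    return 2.0 * dev


def deviance_check(n_correct, n_trials, p_fit, n_sim=10000, seed=0):
    """Compare the observed deviance with a simulated deviance distribution.

    Returns the observed deviance, the 95th percentile of the simulated
    deviances, and a flag that is True when the observed value exceeds it.
    """
    rng = random.Random(seed)
    observed = binomial_deviance(n_correct, n_trials, p_fit)
    simulated = []
    for _ in range(n_sim):
        # Draw a dataset from the fitted function, one level at a time.
        sim = [sum(rng.random() < p for _ in range(n))
               for n, p in zip(n_trials, p_fit)]
        simulated.append(binomial_deviance(sim, n_trials, p_fit))
    simulated.sort()
    cutoff = simulated[int(math.ceil(0.95 * n_sim)) - 1]
    return observed, cutoff, observed > cutoff


# Hypothetical example: 20 trials at each of five stimulus levels.
trials = [20, 20, 20, 20, 20]
fitted = [0.55, 0.65, 0.80, 0.90, 0.95]
print(deviance_check([12, 12, 17, 18, 19], trials, fitted, n_sim=2000))
```

A nonmonotonic dataset, such as one with near-chance performance at a high stimulus level, would yield a large observed deviance and be flagged by this check.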
For some cases in which performance at the highest ∆s was poor, the estimated threshold fell above the possible stimulus range. This is a normal consequence of the fact that performance at the highest possible stimulus level was still below threshold. Extrapolations of thresholds beyond the stimulus range have been performed in previous studies (Hopkins and Moore, 2010; King et al., 2014), relying on an assumed relation between the proportion of correct responses (Pc) and ∆ that is considered valid also beyond the possible stimulus range. The approach employed in the current study, which allows threshold estimates beyond the stimulus range, is in a way similar to that of these previous studies, but relies on fewer assumptions. In particular, in the current study i) the form of the relation between ∆ and Pc is estimated from the data (or, when the data of a given participant do not sufficiently constrain the estimate, is influenced by the priors, which, being based on average across-participant estimates, reflect the overall across-participant trends) rather than being based on further assumptions, and ii) performance is in part explained by the estimated lapse rate, which was always assumed to be zero in previous studies extrapolating thresholds beyond the stimulus range.

Bayesian correlation model
The Bayesian model to estimate the correlations among predictor variables was based on the model of Lee and Wagenmakers (2014, chap. 5) but used vague uniform priors for estimating the standard deviations of the variables instead of inverse-square-root-gamma priors.

Mixed effect multiple regression models
Tables S7, S8, S9, S10, S11, S12, S13, S14, and S15 list all the terms included in each statistical model (excluding the random effect of participant). The first column indicates the variable to which each coefficient refers (abbreviated as previously defined in the main text of the manuscript or as indicated for each model in Table S6; this table also indicates the dummy codes used to encode categorical variables through an unweighted effect coding scheme). The second column indicates the type of variable (continuous, categorical, or interaction). The third column indicates (for all the terms except the intercept) the scale parameter of the 1-degree-of-freedom t distribution used as a prior for the standardized slope coefficient; for the intercept term this column indicates the standard deviation of the zero-centered normal distribution used as a prior for the intercept. The fourth column indicates the same quantity as the third column, but in unstandardized mean-centered units. The fifth column indicates the median of the posterior distribution in unstandardized mean-centered units. The sixth column indicates the 99% CI for the coefficients in unstandardized mean-centered units.
Because Tables S7-S15 list all the terms included in each model, the regression equations can be derived from these tables. Full equations for the regression models are not given here because they would be very long, but an example equation for a model containing only a term for age, one for stimulus level (a categorical predictor encoded as a dummy variable), their interaction, and a random effect of subject is given below:

$$y_{[i,s]} = \beta_0 + \beta_A A_{[s]} + \beta_L L_{[i]} + \beta_{LxA} L_{[i]} A_{[s]} + r_{[s]} + \epsilon_{[i,s]} \qquad (2)$$

where $y_{[i,s]}$ is the score for the $s$th subject at the $i$th stimulus level, $A_{[s]}$ is the age of the $s$th subject, and $L_{[i]}$ is the dummy code for the $i$th stimulus level (-1 = low lev., 1 = high lev.); $\beta_0$ is the intercept, $\beta_A$, $\beta_L$, and $\beta_{LxA}$ are the regression coefficients for the effects of age, stimulus level, and their interaction, respectively; $r_{[s]}$ is the random effect of the $s$th subject, and $\epsilon_{[i,s]}$ is the residual error term.
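To make the role of the -1/1 level coding concrete, the sketch below evaluates the example model with age, stimulus level, their interaction, and a random subject effect, and derives the age slope at each level. The function names and all coefficient values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def predict_score(age, level_code, b0, b_age, b_level, b_inter, subject_effect=0.0):
    """Expected score for one subject at one stimulus level (no residual)."""
    return (b0 + b_age * age + b_level * level_code
            + b_inter * level_code * age + subject_effect)


def age_slope_at_level(level_code, b_age, b_inter):
    """Slope of the score on age at a given level code (-1 = low, 1 = high)."""
    return b_age + b_inter * level_code


# Hypothetical coefficients, in arbitrary units.
b0, b_age, b_level, b_inter = 10.0, 0.20, 1.5, 0.05

slope_low = age_slope_at_level(-1, b_age, b_inter)
slope_high = age_slope_at_level(1, b_age, b_inter)
print(slope_low, slope_high)
print(predict_score(30.0, 1, b0, b_age, b_level, b_inter, subject_effect=-0.4))
```

With this coding the age coefficient is the average of the age slopes at the two levels, and the interaction coefficient is half the difference between them.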
To give a sense of the prior distribution, Figure S18 plots t distributions with the same mean and degrees of freedom as the priors used in the current study for several values of the scale parameter. In each case the prior probability is highest for values around zero; while it is sharply centered around zero for small scale values, it becomes more diffuse as the scale value increases. Even when the scale value is relatively small, the t distributions, owing to their heavy tails, can accommodate coefficients much larger than zero if the likelihood provides clear evidence for this. For a more in-depth overview of t priors see Kruschke (2014).

Figure S17: Priors for A. the α, B. β, and C. λ parameters of the Logistic psychometric function for the 0.6 kHz pure tone frequency discrimination task at a level of 80 dB SPL. The priors for λ were the same for all tasks and conditions. Priors for β had the same standard deviation across tasks/conditions, but their mode was centered at the average value of the parameter for the task/condition obtained via preliminary maximum likelihood fits. Priors for α were similarly centered at the average value of the parameter for the task/condition obtained via preliminary maximum likelihood fits.
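The heavy-tail behaviour of the zero-centered, 1-degree-of-freedom t priors used for the standardized slope coefficients can be illustrated numerically. The sketch below evaluates the density for a few scale values and compares it with a normal density of the same scale; the scale values and function names are hypothetical examples, not the ones used in the study.

```python
import math


def t1_pdf(x, scale):
    """Density of a zero-centered t distribution with 1 degree of freedom."""
    z = x / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


def t1_cdf(x, scale):
    """Cumulative distribution function of the same distribution."""
    return 0.5 + math.atan(x / scale) / math.pi


def normal_pdf(x, sd):
    """Density of a zero-centered normal distribution, for comparison."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


for scale in (0.1, 0.5, 1.0):  # hypothetical scale values
    peak = t1_pdf(0.0, scale)
    tail_t = t1_pdf(3.0, scale)
    tail_normal = normal_pdf(3.0, scale)
    print(scale, round(peak, 3), tail_t, tail_normal)
```

Smaller scales give a sharper peak at zero, yet the density at a coefficient of 3 remains many orders of magnitude larger than under a normal prior with the same scale, which is why clear evidence in the likelihood can still pull the posterior away from zero.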