Effect of hearing aids on the externalization of everyday sounds

This study examined the influence of stimulus properties on sound externalization when listening with hearing aids. Normally hearing listeners were presented with broadband “tokens” (environmental sounds and speech) from loudspeakers, and rated externalization using a continuous scale. In separate blocks, they listened unaided or while wearing behind-the-ear hearing aids with closed domes and low gain (linear or compressive). There was a significant influence of token on ratings, even for unaided listening, and the effect of hearing aids depended on token. An acoustic analysis indicated that hearing aids were more likely to disrupt externalization for peakier sounds with a low-frequency emphasis.


Introduction
The primary function of hearing aids (HAs) is to improve the audibility of speech and other sounds for individuals with hearing impairment. Because HAs interrupt the natural sound path to the ear canals, however, they have the potential to distort spatial cues that listeners rely on for sound localization. Indeed, a number of studies have documented disruptions to sound localization for HA wearers (Noble and Gatehouse, 2006; Akeroyd and Whitmer, 2016), and in a recent study, we showed that listening through HAs can reduce the perceived distance of sounds and even lead to breakdowns of externalization (whereby sound images are perceived at or inside the head instead of out in the world; Best and Roverud, 2024). Disruptions to externalization were most apparent for HAs configured with behind-the-ear (BTE) microphones and occluding domes. While that study examined only speech stimuli, informal interviews with participants who were HA users indicated that certain non-speech sounds were most likely to be internalized in daily life (e.g., dishes clanging, doors slamming, keys jingling, trucks driving by). These reports are consistent with interview data collected by Boyd (2014), in which HA wearers noted internalized perceptions for loud, broadband, and impulsive sounds (e.g., doors banging), as well as for narrow-bandwidth high-frequency sounds (e.g., alarms, doorbells). These anecdotal reports suggest that the effects of HAs on externalization may be more extreme for non-speech than speech sounds, and may depend on the acoustic properties of the sounds. The primary goal of the present study was to provide some behavioral data on this issue.
A secondary aim of the study was to examine whether linear vs compressive HA amplification would be more likely to degrade externalization. HAs programmed with wide-dynamic range compression systematically apply lower gain for higher input levels, with momentary gain determined by attack and release times (Dillon, 2012). Given that sound level is an especially important cue for auditory distance perception (Zahorik et al., 2005), and that interaural level differences are critical for sound localization (Hartmann, 2021), it is often assumed that the nonlinear effects of compression may lead to altered spatial perception. However, the limited available data on this issue are somewhat equivocal. Akeroyd (2010) measured relative distance perception in HA wearers with their own HAs and found no evidence that performance depended on compression parameters. On the other hand, Wiggins and Seeber (2012) reported that compression adversely affected various spatial attributes of sounds (including externalization) for normally hearing listeners. Our previous study (Best and Roverud, 2024) did not examine this issue systematically but found breakdowns of externalization both for normally hearing listeners given linear gain and for hearing-impaired listeners given compressive gain. In the current study, we directly compared linear and compressive HAs in the context of externalization. Moreover, since compression depends strongly on the temporal characteristics of the input signal (Stone and Moore, 1992), we were interested to know whether differences between linear and compressive HAs would vary with sound type.

Participants
In total, 12 young adults with normal hearing (mean age, 24 years; range, 18-34 years) participated. All participants had audiometric thresholds ≤20 dB HL bilaterally at octave frequencies from 250 to 8000 Hz. Seven listeners participated in Experiment 1 and five participated in Experiment 2. Participants were college students or young professionals recruited via flyers and job postings in the Boston University community. They provided informed consent and were paid for their participation.

Hearing aids and fitting procedures
HAs were GN ReSound One receiver-in-the-canal devices with Microphone and Receiver In-The-Ear (M&RIE, Hopkins, MN) technology. These HAs have 14 compression channels and a frequency range from 100 to 9550 Hz in an ear simulator, according to manufacturer specifications. Based on our previous study (Best and Roverud, 2024), the devices were fit so as to have the largest impact on externalization, which meant programming the aids to use the BTE microphones on the body of the aids and coupling the aids to the ear canal using GN ReSound silicone instant ear tips of the strongly occluding power dome style.
Two pairs of HAs were programmed using ReSound SmartFit software. Only mild feedback cancellation was turned on, and all noise reduction and other special features were turned off. One pair of HAs ("linear") was programmed to have 10 dB of linear gain (compression ratio = 1) and a maximum output level ranging from 97 to 107 dB SPL across the range 250 to 8000 Hz. This condition matched the "BTE Closed" condition described in Best and Roverud (2024). The other pair of HAs ("compressive") was programmed to have 10 dB of gain for input levels up to and including 50 dB sound pressure level (SPL), but decreasing gain for input levels above this, with a compression ratio of 1.5 from 250 to 8000 Hz. Syllabic time constants were used: 12-ms attack times and 70-ms release times (except at 250 and 500 Hz, where release times were 120 ms), based on manufacturer settings. The feedback manager calibration was run for each participant and HA pair prior to testing.
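The static input-output rule of such a compressor can be sketched as follows (a minimal illustration of the gain settings described above; the function and parameter names are ours, not the fitting software's, and the sketch omits the attack/release dynamics):

```python
def wdrc_gain_db(input_db, gain_below_knee=10.0, knee_db=50.0, ratio=1.5):
    """Static gain (dB) of a simple WDRC channel: constant gain up to the
    knee point; above it, output grows only 1/ratio dB per input dB, so
    gain falls by (1 - 1/ratio) dB for each input dB above the knee."""
    if input_db <= knee_db:
        return gain_below_knee
    return gain_below_knee - (input_db - knee_db) * (1.0 - 1.0 / ratio)
```

With a 1.5:1 ratio and a 50 dB SPL knee point, a 60 dB SPL input receives roughly 6.7 dB of gain instead of the full 10 dB; in the real devices, the 12-ms attack and 70-ms release times govern how quickly the momentary gain tracks such level changes.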

Equipment
Participants were seated in a single-walled sound-attenuating booth with perforated metal walls/ceiling and a carpeted floor (Industrial Acoustics Company, Hampshire, UK). The reverberation characteristics of this room are described elsewhere (Kidd et al., 2005; "BARE room"). The inner dimensions of the booth were approximately 3.8 × 4.0 × 2.3 m (length × width × height). Seven visible loudspeakers were positioned on a horizontal arc with a radius of 1.5 m centered on the participant, at azimuths of 0°, ±15°, ±30°, and ±90°. In Experiment 1, stimuli were presented only from the loudspeaker at 0°. Experiment 2 made use of all seven loudspeakers. Participants were asked to keep their heads still during the experimental blocks, but their heads were not restrained.
Stimuli were generated on a Lenovo PC using MATLAB 2019b (MathWorks, Inc., Natick, MA) at a sampling rate of 44 100 Hz, passed to a multichannel soundcard (MOTU 16A, MOTU, Inc., Cambridge, MA) and a bank of power amplifiers (Crown Audio XTi 1002, Crown Audio, Los Angeles, CA), all located outside the booth, and then delivered to the loudspeakers (Acoustic Research 215 PS, Acoustic Research, Cambridge, MA). Participants provided responses using a handheld backlit keypad.

Stimuli and procedures
The stimuli included environmental sounds and speech sounds interleaved randomly in 252-trial blocks. Following each stimulus presentation, the participant rated their perceived externalization using a continuous integer scale (10 = at the loudspeaker; 0 = inside the head; −10 = behind the head at an equal distance to the front loudspeaker). Note that while we have chosen to use the term "externalization ratings," these can also be considered near-field distance ratings (for a discussion, see Best et al., 2020).
The environmental sounds were 10 tokens selected from corpora shared by Shafiro (2008), Gygi and Shafiro (2019), and Norman-Haignere et al. (2015). The tokens were selected somewhat arbitrarily, but were all broadband and represented a wide variety of temporal characteristics. Details about the environmental sounds are provided in Table 1 and Fig. 1. The speech sounds were monosyllabic words selected from a corpus recorded at Boston University (described in Kidd et al., 2008; word options selected from those listed in Roverud et al., 2020). The set included 16 words (eight adjectives and eight nouns) spoken by each of four talkers (two male and two female). Table 1 and Fig. 1 include details of the speech stimuli for comparison to the environmental sounds.
In Experiment 1, each of the ten environmental tokens was presented twice at each of three presentation levels (50, 55, and 60 dB SPL), for a total of 60 environmental sounds per block, and each of the 64 speech tokens was presented once at the same three levels (192 speech tokens per block). After seven participants had completed Experiment 1, it was clear that they found the blocks monotonous, and we suspected that the lack of variability in the stimuli was an issue. Thus, the mix of stimuli was modified for Experiment 2 to include seven azimuths instead of one, and to achieve a better balance of environmental and speech sounds. In Experiment 2, each of the ten environmental tokens was presented once for each of the three levels and seven loudspeakers (210 environmental sounds per block). The speech tokens were reduced to one female talker speaking the word "bags" and one male talker speaking the word "hot" (preliminary analysis of the data from Experiment 1 indicated that these specific words yielded the largest HA effects). These two words were presented once for each of the three levels and seven loudspeakers (42 speech tokens per block).
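The block sizes above follow directly from the two factorial designs; as a quick arithmetic check (all counts taken from the text):

```python
# Experiment 1: 10 environmental tokens x 2 repetitions x 3 levels,
# plus 64 speech tokens x 1 repetition x 3 levels, all from one loudspeaker.
env_exp1 = 10 * 2 * 3       # 60 environmental trials per block
speech_exp1 = 64 * 1 * 3    # 192 speech trials per block

# Experiment 2: 10 environmental tokens x 3 levels x 7 loudspeakers,
# plus 2 speech tokens x 3 levels x 7 loudspeakers.
env_exp2 = 10 * 3 * 7       # 210 environmental trials per block
speech_exp2 = 2 * 3 * 7     # 42 speech trials per block

# Both designs fill the same 252-trial block.
assert env_exp1 + speech_exp1 == env_exp2 + speech_exp2 == 252
```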
Each participant completed one block of trials per HA condition (unaided, linear, compressive) in a random order before repeating each of the three conditions in a new randomized order. The exception was the first two participants in Experiment 1, who ran the unaided condition only once, at the end of the session.

Externalization as a function of sound token
Although the response scale allowed for the possibility of front-back reversals, these were relatively rare and not of primary interest in the current experiment, so they are not considered further here. For the analyses that follow, externalization ratings represent absolute values (i.e., rear responses are flipped to the front). Figure 2(A) shows absolute externalization ratings for each HA condition as a function of sound token. These results are based on averages across level, azimuth, experiment, and participant. Figure 2(B) shows the percentage of internalized responses (defined as absolute ratings of 0 or 1, as in Best and Roverud, 2024) in the same format. For all three HA conditions, there are similar variations across tokens, with lower ratings and more internalized responses occurring for the "typing," "clock," and "pour" tokens. Also, ratings were generally higher (and internalized responses rarer) in the unaided condition than in the aided conditions. The differences between aided and unaided ratings were larger for some tokens than others. Differences between results for the linear and compressive HA conditions were generally small. Trial-by-trial absolute externalization ratings were analyzed with a linear mixed-effects model using the lmer function in R.
The independent variables included fixed effects of HA condition (coded categorically), presentation level (coded ordinally), sound token (coded categorically), the interaction of sound token × HA condition, and a random intercept for subject. Before settling on this model, we verified that including experiment (1 or 2) as a fixed effect did not significantly improve the model fit. We did not include a fixed effect of azimuth, given the uneven sampling of azimuth across the two experiments, and because this factor was not of primary interest. Sum contrast coding was used for categorical and ordinal variables. Significance testing for the fixed effects made use of the anova function (type III) in R. In a separate analysis, trial-by-trial data coded as 1 (internalized) or 0 (not internalized) were fitted with a generalized linear mixed model using the glmer function in R. This model was identical in structure to that applied to the rating data. Significance testing for the fixed effects made use of the anova function (type III) in R, which applies Wald chi-square analysis of deviance tests for binomial data. Note that these two separate analyses were performed on the same data set and thus are not independent, but they provide complementary views of the data in terms of distance perception and internalization (see also Best and Roverud, 2024).
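The analyses were run with lmer in R; a structurally similar model can be sketched in Python with statsmodels on synthetic stand-in data (the data-generating numbers below are invented purely for illustration, since the real trial data are not part of this text):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Build a small synthetic data set with the same structure as the design:
# 12 subjects x 3 HA conditions x a few tokens x 3 levels, with a random
# intercept per subject (effect sizes here are illustrative only).
rng = np.random.default_rng(1)
conds = ["unaided", "linear", "compressive"]
tokens = ["typing", "clock", "glass", "ice"]
rows = []
for subj in range(12):
    subj_offset = rng.normal(0.0, 1.0)  # subject-specific random intercept
    for cond in conds:
        for token in tokens:
            for level in (50, 55, 60):
                rating = (7.0 + subj_offset
                          - (0.0 if cond == "unaided" else 2.0)  # aided penalty
                          - 0.05 * (level - 55)                  # level effect
                          + rng.normal(0.0, 1.0))                # trial noise
                rows.append(dict(subject=subj, ha=cond, token=token,
                                 level=level, rating=rating))
df = pd.DataFrame(rows)

# Fixed effects of HA condition, token, their interaction, and level,
# with sum (deviation) contrasts and a random intercept for subject.
model = smf.mixedlm("rating ~ C(ha, Sum) * C(token, Sum) + level",
                    df, groups="subject")
result = model.fit()
```

The binary internalization analysis follows the same structure with a binomial link (glmer in R); statsmodels offers mixed-effects logistic regression via its BinomialBayesMixedGLM class, though the interface differs from the formula shown here.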
As shown in the top section of Table 2, all fixed effects (HA condition, level, token) and the interaction of HA condition × token were statistically significant for both ratings and internalized responses. Post hoc Tukey contrasts for significant effects were conducted with the emmeans function. Contrasts involving HA condition (not shown in table) revealed that both linear and compressive HAs yielded lower ratings and more internalized responses than unaided listening, but the linear and compressive conditions did not differ from each other. All three presentation levels were significantly different from one another (not shown in table), with higher levels associated with lower ratings and more internalized responses than lower levels. The statistical significance of contrasts for the HA condition × token interaction is shown in the bottom section of Table 2. For most tokens, ratings were lower and internalization was more common with HAs than unaided, except for the "Glass," "Ice," and "Scratch" tokens, for which the effect of aiding was absent or inconsistent. There was no token for which the compressive and linear aided conditions differed significantly for both ratings and internalized responses.

Predicting externalization using acoustic properties
To further examine the significant effects involving token, we extracted two key acoustical properties of each token (crest factor and spectral centroid; Table 1). Note that because the crest factor was taken as the ratio of peak level to root-mean-square (RMS) level, and the stimuli were RMS-normalized, crest factor is equivalent to peak level. Crest factor and spectral centroid were not correlated across tokens (Pearson's r = 0.35, p = 0.22).
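Both measures are straightforward to compute from a waveform; a minimal sketch following the definitions above (peak-to-RMS ratio and magnitude-weighted mean frequency; the helper names are ours):

```python
import numpy as np

def crest_factor_db(x):
    """Crest factor in dB: peak absolute amplitude relative to RMS level.
    For RMS-normalized stimuli this is equivalent to peak level."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(np.max(np.abs(x)) / rms)

def spectral_centroid_hz(x, fs):
    """Spectral centroid in Hz: magnitude-weighted mean frequency."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))

# Sanity check on a 1-s, 1-kHz pure tone at fs = 44.1 kHz: crest factor
# should be ~3.01 dB (sqrt(2)) and the centroid should sit near 1000 Hz.
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
```

An impulsive token such as "Typing" would yield a much higher crest factor than a steady token at the same RMS level, which is the distinction driving the analysis that follows.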
These acoustical measures were included in a second set of statistical models similar to those presented previously. The dependent variables were again absolute ratings and internalized responses. In this case, however, the fixed effects were HA condition, level, and z-scored crest factor and spectral centroid values. Given the lack of a significant difference between the compressive and linear HA conditions found previously, HA condition was reduced to two levels for this analysis (unaided or aided). The two-way interactions between HA condition and each of the acoustical measures were also included, as was a random intercept for subject. HA condition and level were coded using sum contrast coding.
The results of these model fits are shown in the top section of Table 3. For ratings, all fixed effects and the two-way interactions of HA condition × crest factor and HA condition × spectral centroid were significant. For internalized responses, most fixed effects were significant (but not spectral centroid), as was the two-way interaction of HA condition × crest factor (but not the interaction of HA condition × spectral centroid). Regarding the post hoc contrasts of the significant main effects (not shown in the table), ratings were lower and internalization was more likely for the aided condition (compared to unaided), for higher presentation levels (compared to lower), and for tokens with a higher crest factor (compared to lower). Additionally, ratings were lower for tokens with a lower spectral centroid (compared to higher). Contrasts for the interactions are shown in the bottom section of Table 3, with estimates indicating slopes relating crest factor or spectral centroid to predicted ratings and log-odds of internalization for the aided and unaided conditions. For both ratings and internalized responses, crest factor was a significant predictor in both aided and unaided conditions, but it was a significantly stronger predictor in the aided condition. For ratings, spectral centroid was a significant predictor in both aided and unaided conditions but was a significantly stronger predictor in the aided condition. For internalized responses, although the HA condition × spectral centroid interaction was not significant, spectral centroid was a significant predictor in the aided condition. Overall, these results indicate that tokens with higher crest factors were more likely to produce breakdowns of externalization in both aided and unaided conditions, but especially when aided. Moreover, with HAs, an additional disruption to externalization was apparent for tokens with lower spectral centroids.

Discussion
The primary aim of this study was to investigate whether certain everyday sounds are more prone to the disrupting effects of HAs on sound externalization. We replicated our previous findings (Best and Roverud, 2024) showing lower externalization ratings and a greater tendency toward internalized responses for aided compared to unaided listening. We also extended this result beyond speech to a range of environmental sounds and found a significant effect of the specific token, with disruptions to externalization for some tokens but not others. An analysis involving two acoustical properties of the different sound types, crest factor (or peak level) and spectral centroid, suggested that crest factor may have contributed to the overall variations in externalization; higher crest factors were associated with weaker externalization in both unaided and aided conditions. The tokens with the highest crest factors included "Typing," "Clock," "Scratch," and "Frying," and these tokens were all rather poorly externalized. The relationship between crest factor and externalization was stronger with HAs, suggesting that HAs exacerbate this general effect. Spectral centroid, which describes the spectral region with the highest concentration of energy for these broadband sounds, was associated with additional variations in externalization with HAs; tokens with a lower-frequency spectral centroid were more likely to be given lower (closer) ratings with HAs than without. Demonstrating this effect, the two tokens with the highest spectral centroids ("Glass," "Ice") were also the tokens that showed little difference between aided and unaided conditions. This result runs somewhat counter to the findings of Boyd (2014) and Wiggins and Seeber (2012), which implicated high-frequency sounds in breakdowns of externalization, but this may reflect the use of broadband (rather than narrowband) stimuli in the current study.
We tested both linear and compressive HAs to understand whether the temporal aspects of compressive gain application were important drivers of any observed HA effects. If the nonlinear effects of compression did indeed alter the stimulus waveforms, the alterations were apparently not sufficient to influence externalization ratings. Instead, our results suggest that the provision of either linear or compressive gain through these particular devices had a disruptive effect on the externalization of many everyday sounds.
Additional information regarding the importance of stimulus acoustics in sound externalization, with and without HAs, may be revealed in future studies using a broader sampling of sounds or systematic manipulations of different features of sounds. Specifically, it would be worth considering narrowband sounds, which are poorly externalized in general (Best et al., 2020) and may be particularly susceptible to the effects of HAs (Boyd, 2014). Another interesting future direction would be to investigate a larger set of broadband sounds with different spectral profiles. Since HAs alter the spectrum of incoming sounds in various ways, their effects on externalization may vary with different sound source spectra. Our preliminary acoustic analysis suggested that spectral centroid may be relevant, but other characteristics may also play a role. For example, effects related to microphone position, which involve high-frequency spectral details, may be most apparent for stimuli with a relatively flat spectrum. Effects related to occluding domes, which tend to boost low frequencies, may be more noticeable for low-frequency dominant sounds such as speech. Another intriguing question is whether "unfamiliar" or "unexpected" sounds in the environment are more difficult to place accurately in extra-personal space. In addition, it is important for future work to extend these results to reverberant environments, in which HAs may also interact with the acoustics of the room to affect externalization (e.g., Hassager et al., 2017). Finally, while our study was limited to listeners with normal hearing, different results may be expected from listeners with hearing loss who are long-term users of HAs. These listeners may have adapted to the specific disruptive effects of their devices (e.g., related to microphone position and dome type) and thus may show better externalization overall when listening aided. On the other hand, most HA users would have higher compression ratios than those used in the current study, and thus may show larger differences between linear and compressive amplification.

Conclusions
This study confirmed that, for listeners with normal hearing who are fit with low-gain HAs, sounds are perceived to be too close and are more often heard inside the head relative to the natural listening situation without HAs. Our results suggest that the disruptive effect of HAs, and the robustness of externalization in general, may vary widely depending on the spectrotemporal characteristics of everyday sounds.

Fig. 1. Waveforms and spectra of each of the tokens. The magnitude values for the spectra are normalized according to the mean value across frequency for that specific token. The male and female voice tokens shown are those from Experiment 2.

Fig. 2. Absolute externalization ratings (A) and the percentage of internalized responses (B) as a function of sound token for each HA condition. Symbols and error bars show across-subject means and standard errors. Symbols within each HA condition are connected by lines to facilitate across-condition comparisons.

Table 2. Statistical analysis of absolute ratings and internalized responses, considering the factors of HA condition, level, and token. Key model results and post hoc contrasts are shown in the top and bottom sections, respectively.

Table 3. Statistical analysis of absolute ratings and internalization using predictors of HA condition, level, z-scored crest factor, and z-scored spectral centroid. Key model results and post hoc contrasts are shown in the top and bottom sections, respectively. (Absolute ratings: type III ANOVA table with Satterthwaite's method; internalization: analysis of deviance table with type III Wald chi-square tests.)