Age differences in central auditory system responses to naturalistic music

Aging affects the central auditory system, leading to difficulties in decoding and understanding overlapping sound signals, such as speech in noise or polyphonic music. Studies of central auditory evoked responses (ERs) have found that, compared to young listeners, older listeners show increased amplitudes (less inhibition) of the P1 and N1 and decreased amplitudes of the P2, mismatch negativity (MMN), and P3a responses. Whereas previous research has focused on simplified auditory stimuli, we tested here whether these age-related differences could be replicated with sounds embedded in medium and highly naturalistic musical contexts. Older (aged 55-77 years) and younger adults (aged 21-31 years) listened to medium naturalistic (synthesized melody) and highly naturalistic (studio recording of a music piece) stimuli. For the medium naturalistic music, the age group differences in P1, N1, P2, MMN, and P3a amplitudes were all replicated. The age group differences, however, appeared reduced for the highly compared to the medium naturalistic music. The finding of lower P2 amplitude in older than in young listeners was replicated for slow event rates (0.3-2.9 Hz) in the highly naturalistic music. Moreover, the ER latencies suggested a gradual slowing of the auditory processing time course for highly compared to medium naturalistic stimuli, irrespective of age. These results suggest that age-related differences in ERs can partly be observed with naturalistic stimuli, which opens new avenues for including naturalistic stimuli in the investigation of age-related central auditory system disorders.


Age-related changes in the peripheral and central auditory system
Age-related changes that influence auditory perception occur independently at multiple stages along the auditory processing pathway (Federmeier et al., 2003;Gates & Mills, 2005;Alain et al., 2013;Bidelman et al., 2014;Ouda et al., 2015;Bidelman et al., 2019). Changes in the peripheral auditory system, e.g., loss of outer hair cells, spiral ganglion cells, or strial cells, or stiffening of the basilar membrane, result in peripheral age-related hearing loss (peripheral presbycusis), which is typically indicated by elevated pure-tone hearing thresholds for high-frequency tones (Gates & Mills, 2005). Independently, changes in the central auditory system due to loss of grey-matter volume in the fronto-temporal cortex (Fjell et al., 2009;Ouda et al., 2015) and degradation of white-matter microstructure (Golub, 2017;Wassenaar et al., 2019) result in central presbycusis, which is often indicated by degraded auditory object perception and decline in higher-level auditory processing, such as impaired perception of speech in noise and of concurrent tones (Gates & Mills, 2005;Alain et al., 2006;Snyder & Alain, 2007;Alain et al., 2012;Alain et al., 2013;Ouda et al., 2015;Bidelman et al., 2019). Even though naturalistic auditory stimuli are processed differently in the brain than simplified stimuli (Schonwiesner & Zatorre, 2009;Theunissen & Elie, 2014;Gould van Praag et al., 2017;Sonkusare et al., 2019), most research has so far investigated auditory aging effects with simplified stimuli (e.g., short consonant/vowel sounds or pure tone sequences). We are not aware of previous studies investigating aging effects on the neural processing of highly naturalistic stimuli (e.g., an unmanipulated, complete music piece).
The ecological validity of the age-related changes in auditory brain function is therefore a pertinent question (Alain et al., 2006;Leung et al., 2013;Rufener et al., 2014;O'Brien et al., 2015;Getzmann & Wascher, 2016;Bidelman et al., 2017;Zendel et al., 2019), which is also relevant for the diagnostics and the selection of appropriate preventive actions and treatments to ameliorate age-related hearing loss.

Age-related changes in cortical auditory evoked responses
Age-related changes in auditory brain function have been investigated noninvasively with electroencephalography (EEG) and magnetoencephalography (MEG), particularly in terms of auditory cortical evoked responses (for reviews, see Alain et al., 2013;Cheng et al., 2013). These methods offer promising possibilities for functional brain research in naturalistic settings, owing to ongoing developments in wearable EEG (for a review, see Casson, 2019) and MEG devices (e.g., Boto et al., 2018). EEG and MEG measures of the cortical auditory P1 and N1 responses, which occur approximately 50 and 100 ms after sound onset, generally show higher P1 amplitudes in older than in young listeners for pure tones, complex tones, and white noise (Ross et al., 2009;Ross et al., 2010;Alain et al., 2012;Alain et al., 2013;Nowak et al., 2016;Brinkmann et al., 2021), and higher N1 amplitudes for pure tones, complex tones, synthesized vowels, white noise, and naturalistic sounds and words (Alain & Woods, 1999;Alain et al., 2013;Dushanova & Christov, 2013;Leung et al., 2013;Bidelman et al., 2014;Rufener et al., 2014;Nowak et al., 2016;Tusch et al., 2016;Bidelman et al., 2017;Strömmer et al., 2017;Basharat et al., 2018;Mahajan et al., 2020;Ruohonen et al., 2020). These higher P1 and N1 amplitudes appear to be caused by less neural adaptation and inhibition of the cortical responses to repetitions of the same stimulus in older than in young listeners (Leung et al., 2013). The enhanced P1 amplitude correlates with peripheral hearing loss, and the increased N1 amplitude is also partly related to peripheral hearing loss and partly to independent changes in the central auditory system (Bidelman et al., 2014;Tusch et al., 2016;Ruohonen et al., 2020). With respect to response latencies, a shorter P1 latency has been observed in older than in young listeners for pure and complex tones (Ross et al., 2009;O'Brien et al., 2015), although one study reported a longer P1 latency for pure tones (Ross et al., 2010).
Generally, the N1 latency is longer in older than in young listeners for pure tones, complex tones, and naturalistic words (Federmeier et al., 2003;Alain & McDonald, 2007;Alain et al., 2012;Dushanova & Christov, 2013;Rufener et al., 2014;Basharat et al., 2018;Brinkmann et al., 2021). These alterations in the P1 and N1 latencies seem to indicate changes in the central auditory system.
The MMN and P3a responses are elicited by deviant sounds interspersed within repeated auditory patterns (Näätänen et al., 2012). In passive listening paradigms, the MMN amplitude is generally lower in older than in young listeners for duration deviants in pure tone and white noise patterns, gap deviants in pure tone patterns, and pitch deviants in pure and complex tone patterns (Pekkonen et al., 1996;Alain & Woods, 1999;Alain et al., 2004;Kiang et al., 2009;Rimmele et al., 2012;Alain et al., 2013;Cheng et al., 2013;O'Brien et al., 2015;Nowak et al., 2016;Mah & Connolly, 2018). Similarly, the P3a amplitude is lower in older than in young listeners for duration deviants in pure tone and white noise patterns, pitch deviants in pure and complex tone patterns, location deviants in naturalistic words, and dog barks and other environmental sounds inserted in pure tone patterns (Knight, 1987;Gaal et al., 2007;Kiang et al., 2009;Rimmele et al., 2012;O'Brien et al., 2015;Correa-Jaraba et al., 2016;Getzmann & Wascher, 2016;Nowak et al., 2016;Mahajan et al., 2020). Furthermore, the MMN latency is longer in older than in young listeners for duration deviants in pure tone patterns, pitch deviants in pure and complex tone patterns, and environmental sounds inserted in pure tone patterns (O'Brien et al., 2015;Correa-Jaraba et al., 2016;Mah & Connolly, 2018). Longer P3a latencies are also observed in older compared to young listeners for pitch deviants in pure and complex tone patterns, location deviants in naturalistic words, and dog barks and environmental sounds disrupting pure tone patterns (Knight, 1987;Gaal et al., 2007;O'Brien et al., 2015;Correa-Jaraba et al., 2016;Getzmann & Wascher, 2016;Mahajan et al., 2020).
The MMN is generally assumed to be an early, automatic, and pre-attentive response to deviants that occurs before the P3a response (Näätänen et al., 2007). The P3a, which follows the MMN, is thought to reflect the early orienting of attention towards the deviant sound (Näätänen et al., 2007). The lower MMN amplitude in older compared to young listeners is presumed to indicate reduced pre-attentive auditory discrimination ability (Näätänen et al., 2012). The interpretation of the age-related reduction or absence of the P3a response in older listeners, however, seems to depend on the experimental task (Rimmele et al., 2012). Some authors argue that it reflects an improved ability to ignore distracting auditory deviants (Getzmann et al., 2013;Mahajan et al., 2020), whereas others argue that it indicates an impaired ability to detect relevant auditory deviants (Nowak et al., 2016).
So far, the investigation of these age-related functional changes in the central auditory system has been constrained to simplified sounds or single instances of naturalistic sounds isolated from their natural auditory context. Thus, an exploration of age group differences in the central auditory system that takes into account the brain's adaptation to naturalistic sounds occurring in ecologically valid auditory contexts is warranted.

Ecological validity of age group differences in cortical auditory evoked responses
We tested the replicability of the previous findings of age-related enhancement of the P1 and N1 amplitudes, reduction of the P2, MMN, and P3a amplitudes, and possibly shorter P1 latency and longer N1, P2, MMN, and P3a latencies in aging. The replicability was tested first with medium naturalistic music stimuli (a repeated melody with interspersed deviant tones) and second with highly naturalistic music stimuli (a studio recording of a music piece). The aim was to test whether the age group differences could be observed with the medium and the highly naturalistic stimuli, and whether the age group differences were similar or modulated across the two stimulus types. Such a modulation is plausible: with medium naturalistic melodies, recent studies found a higher P2 amplitude to the last presented tone in older than in young listeners (Halpern et al., 2017) and a comparable early right anterior negativity (ERAN) amplitude to out-of-key (-/+ 100 cents) deviant tones (Lagrois et al., 2018), which challenge the typical findings observed with simplified stimuli. Throughout this study, we use the term ecological validity to denote an experimental factor that can be manipulated, and we use the comparative adjectives medium naturalistic and highly naturalistic to indicate stimuli with medium and high levels of ecological validity.

Participants
Fourteen older adults and seventeen young adults with normal hearing (Table 1) were recruited for the study via social media. All participants completed the Beltone online adaptive hearing test (https://beltonehearingtest.com), which indicated normal speech-in-noise hearing thresholds for all participants. For ethical reasons, the sound pressure level was adjusted according to each individual's judgment of a level equivalent to normal comfortable speech; this level had a median of 64 dB SPL in both the older and the young group and did not differ significantly between the age groups (Table 1). The older and younger adults were also matched on their music listening preferences (Table 1), according to a modified Danish version of the Iowa Music Background Questionnaire (IMBQ) (Gfeller et al., 2000;Petersen et al., 2013). Compared to the young group, the older group did not report answers indicative of peripheral presbycusis (no preference for loud music) or central presbycusis (no preference for simple melody, harmony, and rhythm) (Table 1).
On the IMBQ, the young group reported more years of musical training than the older group (Table 1). Therefore, we also investigated whether years of musical training influenced the results. There were no significant differences in music genre preferences between the older and young groups. Both groups reported that they enjoyed listening to pop, rock, classical, jazz, blues, and Show Tunes (Musicals); both groups indicated some enjoyment of Rap / Hip-hop and Country Western; and both groups overall indicated no enjoyment of Danish Country ("Dansktop") and Hard Rock / Heavy Metal (Appendix A Table A.1).
In addition to the present study, the participants served as normally hearing controls in separate studies reported elsewhere, which investigated music perception in experienced cochlear implant (CI) users (Petersen et al., 2020) and recently implanted CI users (Seeberg et al., 2023), and CI diagnostics based on individual MMN responses.
Oral and written information about the study was provided to all participants. The study was approved by the Research Ethics Committee of the Central Denmark Region and was conducted in accordance with the Declaration of Helsinki. The participants did not receive monetary compensation for their participation.
The datasets generated for this study are available from the corresponding author on request. Because of the EU General Data Protection Regulation (GDPR), which has applied since 2018, the dataset cannot be made publicly available; it can, however, be obtained by individual researchers under individual research data sharing agreements.

Stimuli
For the medium naturalistic stimuli, the musical multi-feature (MuMuFe) paradigm (Vuust et al., 2011), consisting of a four-note Alberti bass melody, was played. The Alberti bass became popular in 18th-century Western classical music. The MuMuFe paradigm consisted of a single (monophonic) piano melody, which followed a regularly timed MIDI track and was played with tones recorded from real musical instruments (an audio excerpt of the medium naturalistic stimuli is provided as supplementary material). The Alberti bass melody was played with a 200 ms note duration, 18 ms rise and fall times, and a 5 ms silent inter-note interval. The four-note melody was repeated 48 times before being played in another of four keys (C, Eb, Gb, A), with fundamental frequencies spanning the middle register between 208 and 659 Hz. The first, second, and fourth note positions in the four-note melody served as repeated standard tones, whereas deviant tones were inserted at every third note position to measure MMN and P3a responses. The latest CI MuMuFe paradigm was applied (Petersen et al., 2020), which included a set of 16 different deviant tones presented 144 times in pseudo-random order. The total stimulation time for the medium naturalistic stimuli was approximately 30 min.
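The timing parameters above fix a constant stimulus onset asynchrony (SOA) of 200 + 5 = 205 ms. The following minimal sketch, assuming strictly periodic onsets without jitter, derives the SOA, the resulting event rate, and the onset times of the deviant slot at note position three; all function and variable names are hypothetical and not taken from the MuMuFe software.

```python
# Hypothetical sketch of the MuMuFe timing described above; the constants come
# from the text, while the function and variable names are illustrative only.
NOTE_MS = 200          # tone duration, including the 18 ms rise/fall ramps
GAP_MS = 5             # silent inter-note interval
NOTES_PER_MELODY = 4   # four-note Alberti bass figure
DEVIANT_POSITION = 3   # 1-based note position that carries the deviant tone


def onset_times_ms(n_melodies: int) -> list[int]:
    """Return tone onset times (ms) for n_melodies consecutive four-note melodies."""
    soa = NOTE_MS + GAP_MS  # constant stimulus onset asynchrony
    return [i * soa for i in range(n_melodies * NOTES_PER_MELODY)]


def positions(n_melodies: int) -> list[int]:
    """Return the 1-based note position (1-4) of every tone."""
    return [i % NOTES_PER_MELODY + 1 for i in range(n_melodies * NOTES_PER_MELODY)]


soa_ms = NOTE_MS + GAP_MS
event_rate_hz = 1000 / soa_ms           # onsets per second
melody_ms = NOTES_PER_MELODY * soa_ms   # duration of one four-note melody

onsets = onset_times_ms(2)
deviant_slots = [t for t, p in zip(onsets, positions(2)) if p == DEVIANT_POSITION]
print(soa_ms, round(event_rate_hz, 2), melody_ms)  # 205 4.88 820
print(deviant_slots)  # [410, 1230]
```

The 4.88 Hz value matches the constant event rate reported for the standard tones in the statistical analyses below.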
For the highly naturalistic stimuli, the participants listened to a four-minute excerpt of a studio recording of the instrumental tango nuevo piece Adios Nonino by Astor Piazzolla (recorded in Buenos Aires in 1969, from the album Astor Piazzolla y su Quinteto, © Circular Moves 2003). A previous study (Haumann et al., 2021) validated that this tango piece elicited evoked responses in a group of young listeners. The tango excerpt was a single continuous piece of music with overlapping (polyphonic) melodic parts from piano, accordion, violin, guitar, and double bass played by professional musicians (an audio excerpt of the highly naturalistic stimuli is provided as supplementary material).
All auditory stimuli were played at a 44.1 kHz sampling rate. Since the study was part of a project comparing CI users with monaural hearing in one implanted ear to normally hearing controls, the stimuli were presented as a mono track delivered to both ears of the normally hearing participants. All participants first listened passively to the medium naturalistic music (the CI MuMuFe paradigm), then to a popular song with lyrics, then to the highly naturalistic stimuli (the tango nuevo music), and finally to the same popular song without lyrics. (The EEG data for the popular song with and without lyrics are part of a separate study that will be reported elsewhere.)

Detection of sound onsets in naturalistic music stimuli
Table 1. Participant demographics. Stimulus sound pressure level in dB and Iowa Musical Background Questionnaire (IMBQ) (Gfeller et al., 2000) items related to listening preferences and musical training. Music listening preferences were scored as −1 = Dislike, 0 = Irrelevant, and +1 = Like. Numbers and Likert scale ratings were not normally distributed; medians are reported with the interquartile range in parentheses. The age groups were compared using the exact Mann-Whitney U and Pearson's chi-squared (χ²) statistics. *** p < .001, ** p < .01.
Measuring cortical auditory evoked responses to sound onsets in real music remains a methodological challenge, because automatic onset detection requires reliable detection thresholds, which are difficult to define for the typically homophonic or polyphonic mixtures of acoustic signals in the audio waveforms (e.g., see Smith & Fraser, 2004;Thoshkahna & Ramakrishnan, 2008;Alías et al., 2016). A recent study (Haumann et al., 2021), however, showed that manual detection of sound onsets by a musicology expert is currently more accurate than automatic detection. Following the same procedure as in that study, a total of 942 sound onset time points were manually detected through auditory and visual inspection of the audio spectrogram. For visualization of the detected sound onsets, acoustic features were extracted with the Music Information Retrieval (MIR) Toolbox (version 1.6.1) for Matlab (Lartillot & Toiviainen, 2007). The audio spectrogram in dB was calculated using the mirspectrum function with the default Fast Fourier Transform algorithm, a frame length of 50 ms, and a Hamming window, and the average spectrogram across the sound onsets was obtained.
Additionally, the relative sound intensity in dB was measured in terms of root-mean-square (RMS) energy with the mirrms function, and the relative spectral flux in dB, indicating the spectral change between successive time frames, was measured with the mirflux function, using a frame length of 25 ms with 50% frame overlap.
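The two frame-wise descriptors can be illustrated outside Matlab. The following is a minimal NumPy sketch, not the MIRtoolbox implementation of mirrms or mirflux. It assumes 25 ms frames with 50% overlap, RMS expressed in dB relative to an amplitude of 1.0, and spectral flux computed as the Euclidean distance between successive Hamming-windowed magnitude spectra; MIRtoolbox defaults may differ in detail. All function names are hypothetical.

```python
import numpy as np


def frame_signal(x, sr, frame_s=0.025, overlap=0.5):
    """Split a mono signal into overlapping frames (one frame per row)."""
    frame_len = int(round(frame_s * sr))
    hop = int(round(frame_len * (1 - overlap)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx], hop


def rms_db(frames, eps=1e-12):
    """Root-mean-square energy per frame in dB (re. an amplitude of 1.0)."""
    rms = np.sqrt(np.mean(frames**2, axis=1))
    return 20 * np.log10(rms + eps)


def spectral_flux(frames):
    """Euclidean distance between magnitude spectra of successive frames.

    Element i describes the spectral change from frame i to frame i + 1.
    """
    window = np.hamming(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.sqrt(np.sum(np.diff(mag, axis=0) ** 2, axis=1))


# Usage: 0.5 s of silence followed by a 440 Hz tone, sampled at 44.1 kHz.
sr = 44100
t = np.arange(sr) / sr
x = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)
frames, hop = frame_signal(x, sr)
flux = spectral_flux(frames)
levels = rms_db(frames)
# Rough onset estimate: start time of the frame that follows the largest spectral change.
onset_estimate_s = (np.argmax(flux) + 1) * hop / sr
```

Flux peaks where the spectrum changes abruptly, which is why it is useful for visualizing the manually detected onsets. It is not intended here as a replacement for the expert onset annotation.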

Behavioral auditory discrimination test
Prior to the EEG recording, all participants completed a behavioral auditory discrimination test to assess attentive sound discrimination ability (Petersen et al., 2020). The test used the same acoustic differences between deviant and standard tones, in the same melody, as the CI MuMuFe paradigm. In each of 96 trials, the participants indicated which of three melodies contained a deviant. Performance was measured as the percentage of trials in which the melody containing the deviant was correctly identified (chance level 33.3%).
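With 96 three-alternative trials, the 33.3% chance level can be turned into an explicit above-chance criterion. The following minimal sketch assumes independent trials and a one-sided exact binomial test at α = .05. The text does not state that such an individual criterion was applied, and the function names are hypothetical.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n_trials: int, p_chance: float, alpha: float = 0.05) -> int:
    """Smallest number of correct trials whose one-sided binomial p-value is below alpha."""
    for k in range(n_trials + 1):
        if binom_sf(k, n_trials, p_chance) < alpha:
            return k
    return n_trials + 1  # only reached if even a perfect score is not significant


k = min_correct_above_chance(96, 1 / 3)
print(k, round(100 * k / 96, 1))  # threshold in trials and in percent correct
```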

EEG recording
EEG was recorded in an acoustically shielded room at the EEG lab facilities of Aarhus University Hospital. A BrainAmp amplifier system (Brain Products, Gilching, Germany) was used, and electrode impedances were kept below 25 kΩ during preparation. A 32-electrode cap following the standard 10/20 system was applied, and the EEG was digitized at a sampling rate of 1000 Hz. An electrooculogram (EOG) was also recorded with electrodes beside and above the left eye. FCz served as the initial reference electrode.
During the EEG recording, all participants received the audio stimuli via in-ear Shure SE215-CL headphones. The sound level was adjusted to an individually comfortable level for each participant, starting from 65 dB SPL (the final stimulation level had a median of 64 dB SPL in both age groups). For the medium naturalistic stimuli, a standard MMN procedure was applied in which the participants were instructed to ignore the music and attend to a silent movie. For the highly naturalistic stimuli, the participants were instructed to answer questions about their perception of the music piece immediately after listening to it.

EEG preprocessing
The FieldTrip Toolbox for Matlab (Oostenveld et al., 2011) was used to preprocess the EEG data. First, the continuous EEG data were down-sampled to 250 Hz, high-pass filtered with a 1 Hz half-amplitude cutoff frequency, and low-pass filtered with a 25 Hz half-amplitude cutoff frequency using finite impulse response (FIR) filters with the FieldTrip default settings (Hamming window, zero-phase two-pass forward and reverse filtering, high-pass filter order 750, and low-pass filter order 30). This passband is commonly used for investigating the early cortical evoked responses P1/N1/P2 and MMN/P3a (e.g., Alain & McDonald, 2007;Snyder & Alain, 2007;Lijffijt et al., 2009;Rimmele et al., 2012;Bidelman et al., 2014;Rufener et al., 2014;Bidelman & Alain, 2015;Fitzroy et al., 2015;Nowak et al., 2016;Bidelman et al., 2017;Volosin et al., 2017;Basharat et al., 2018;Bidelman et al., 2019). Note that the study did not involve late cortical responses, such as the P3, which might be distorted by high-pass filtering (Widmann et al., 2015). Bad electrodes showing high-amplitude noise or flat lines (on average 0.2 electrodes; range 0-2) were replaced by interpolating the EEG waveforms of the neighboring electrodes, weighted by their distance, using the ft_channelrepair function. The independent component analysis (ICA) infomax algorithm (Makeig et al., 1996;Delorme et al., 2007) was applied to identify and suppress eye movement artifacts: when a clear vertical eye movement component (on average 0.9 components) or a clear horizontal eye movement component (on average 0.8 components) was identified by visual inspection, it was subtracted. After these steps, the EEG was re-referenced to the average across all electrodes. Subsequently, trials were extracted for each sound onset using a −100 to 400 ms time window.
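The last two steps, average re-referencing and cutting trials around each onset, can be sketched in NumPy. This is not the FieldTrip implementation. It assumes the data are already filtered, down-sampled to 250 Hz, and stored as a channels × samples array, and that onsets whose window would extend past the recording edges are skipped. The function names are hypothetical.

```python
import numpy as np

FS = 250  # Hz, sampling rate after down-sampling


def average_reference(eeg):
    """Re-reference channels x samples data to the average across all electrodes."""
    return eeg - eeg.mean(axis=0, keepdims=True)


def extract_epochs(eeg, onset_s, fs=FS, tmin=-0.1, tmax=0.4):
    """Cut channels x samples data into trials x channels x samples epochs.

    Onsets whose -100 to 400 ms window would run past the recording edges are skipped.
    """
    start_off = int(round(tmin * fs))  # -25 samples at 250 Hz
    stop_off = int(round(tmax * fs))   # +100 samples at 250 Hz
    epochs = []
    for t in onset_s:
        centre = int(round(t * fs))
        a, b = centre + start_off, centre + stop_off + 1
        if a >= 0 and b <= eeg.shape[1]:
            epochs.append(eeg[:, a:b])
    times = np.arange(start_off, stop_off + 1) / fs
    return np.stack(epochs), times


rng = np.random.default_rng(0)
eeg = rng.normal(size=(32, 10 * FS))  # 32 channels, 10 s of simulated data
eeg = average_reference(eeg)
epochs, times = extract_epochs(eeg, [0.05, 1.0, 2.5, 9.9])
print(epochs.shape)  # (2, 32, 126): the first and last onsets are too close to the edges
```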
To compare the P1/N1/P2 responses between the medium and highly naturalistic stimuli without contributions from MMN and P3a responses, the trials for the medium naturalistic stimuli were aligned to the onsets of the standard tones at position one within the four-tone melody, excluding the first tone at each key change. For the analysis of the MMN/P3a responses, the trials were aligned to all deviant and standard tones, excluding the first tone at each key change, as in Petersen et al. (2020). No baseline correction was applied to the highly naturalistic stimuli, since evoked responses to a preceding sound onset could occur less than 100 ms before the current sound onset at moments with high sound onset event rates; instead, the 1 Hz high-pass filter suppressed slow drifts in the EEG waveforms. For the medium naturalistic stimuli, the average waveform between −100 and 0 ms preceding the tone onset was subtracted, as in Petersen et al. (2020). Any remaining noisy trials with amplitudes exceeding ±100 μV were automatically removed (on average 0.2% of the trials; range 0.0-2.8%). Subsequently, the average EEG waveforms across trials were obtained for each condition. Finally, to isolate the MMN/P3a responses from the P1/N1/P2 responses, the average standard waveforms were subtracted from the average deviant waveforms.
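The remaining trial-level steps can be summarized in a short NumPy sketch. It assumes epochs stored as trials × channels × samples arrays in microvolts, treats the baseline as the half-open window [−100, 0) ms, and uses hypothetical function names. Baseline correction is applied only to the medium naturalistic trials, as described above.

```python
import numpy as np


def baseline_correct(epochs, times, t0=-0.1, t1=0.0):
    """Subtract each trial's mean over the pre-onset window [t0, t1) (medium naturalistic only)."""
    mask = (times >= t0) & (times < t1)
    return epochs - epochs[..., mask].mean(axis=-1, keepdims=True)


def reject_trials(epochs, limit_uv=100.0):
    """Drop trials in which any channel exceeds +/- limit_uv; also return the rejected fraction."""
    keep = np.all(np.abs(epochs) <= limit_uv, axis=(1, 2))
    return epochs[keep], 1 - keep.mean()


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard difference of the trial averages (isolates MMN/P3a)."""
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)


times = np.arange(-25, 101) / 250          # -100 to 400 ms at 250 Hz
trials = np.zeros((3, 2, times.size))
trials[1, 0, 50] = 150.0                   # one artifact-contaminated trial
clean, rejected_fraction = reject_trials(trials)
print(clean.shape, round(rejected_fraction, 3))  # (2, 2, 126) 0.333
```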

Statistical analyses
IBM SPSS v27 (IBM, Armonk, New York, USA) was used for the statistical analyses. The average waveforms across the Fz and Cz electrodes, which showed the highest signal-to-noise ratios in the grand average across the older and young groups and stimulus conditions, were used for the statistical analyses of P1/N1/P2 amplitudes and latencies (e.g., as in Lijffijt et al., 2009;Volosin et al., 2017). For the P1/N1/P2 responses to the medium naturalistic stimuli, only responses to the standard tones were used. Mean amplitudes were extracted for each participant across a 30 ms time window centred on the peak latencies in the grand-average waveforms across the older and young groups for the medium naturalistic stimuli (P1 = 72 ms; N1 = 132 ms; P2 = 184 ms) and the highly naturalistic stimuli (P1 = 80 ms; N1 = 140 ms; P2 = 212 ms). For each participant, the P1 peak latency was estimated as the most positive peak between 30 and 100 ms, and the N1 peak latency as the most negative peak between 100 and 200 ms. For the medium naturalistic condition, any late P2 response would be obscured by the P1 response to the following tone, whose onset always occurred 205 ms after the current onset (0 ms jitter); the P2 peak latency was therefore estimated as the most positive peak between 100 and 200 ms. For the highly naturalistic stimuli, overlap between evoked responses was not a critical issue, because the naturally inherent jitter in the sound onset intervals (from −164 to +2779 ms around the median stimulus onset asynchrony of 182 ms) cancels out the non-time-locked evoked responses to preceding and following sound onsets in the average waveforms. For the highly naturalistic condition, the P2 peak latency was estimated as the most positive peak between 100 and 300 ms.
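The amplitude and latency measures can be sketched as follows. This is an illustration, not the authors' analysis code. It assumes single-participant average waveforms sampled at 250 Hz, inclusive window bounds, and the grand-average peak latencies listed above as window centres. The function names and the simulated waveform are hypothetical.

```python
import numpy as np


def fz_cz_average(erp, ch_names):
    """Average the Fz and Cz channels of a channels x samples ERP."""
    return erp[[ch_names.index("Fz"), ch_names.index("Cz")]].mean(axis=0)


def mean_amplitude(wave, times, centre_s, width_s=0.030):
    """Mean of a waveform in a width_s window centred on centre_s (bounds inclusive)."""
    half = width_s / 2
    mask = (times >= centre_s - half) & (times <= centre_s + half)
    return wave[mask].mean()


def peak_latency(wave, times, t0, t1, polarity=+1):
    """Latency of the most positive (polarity=+1) or most negative (-1) sample in [t0, t1]."""
    mask = (times >= t0) & (times <= t1)
    return times[mask][np.argmax(polarity * wave[mask])]


# Simulated waveform with a positive peak at 72 ms and a negative peak at 132 ms.
times = np.arange(-25, 101) / 250
wave = np.exp(-(((times - 0.072) / 0.010) ** 2)) - np.exp(-(((times - 0.132) / 0.015) ** 2))
p1_latency = peak_latency(wave, times, 0.030, 0.100, +1)
n1_latency = peak_latency(wave, times, 0.100, 0.200, -1)
p1_amplitude = mean_amplitude(wave, times, 0.072)  # medium naturalistic P1 window
```

At 250 Hz, each 30 ms window spans seven samples. A window centred on a grand-average peak can therefore miss an individual participant's peak by a few samples, which is one reason the latencies are estimated separately within broader search ranges.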
The Fz electrode was used for the statistical analyses of MMN/P3a amplitudes and latencies (e.g., as in Cheng et al., 2013;O'Brien et al., 2015;Petersen et al., 2020). These analyses were based on the difference waveforms, in which the responses to all standard tones were subtracted from the responses to the deviant tones. Feature-specific mean amplitudes were extracted for each participant across a 30 ms time window centred on the peak latencies in the grand-average waveforms across the older and young groups (MMN intensity = 156 ms; MMN pitch = 148 ms; MMN timbre = 132 ms; MMN rhythm = 128 ms) (cf. Petersen et al., 2020), after which the average MMN amplitude across the features was obtained for each participant. Because of high individual variability in P3a latency, the mean P3a amplitude was measured in a time window between 200 and 300 ms. The MMN peak latencies were identified as the most negative peak at Fz between 100 and 250 ms, and the P3a peak latencies as the most positive peak at Fz between 200 and 300 ms.
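The following minimal sketch shows how the MMN amplitude combines feature-specific windows and how the P3a amplitude uses a fixed window. It assumes that the deviant-minus-standard difference waveforms at Fz have already been computed for each deviant feature and that window bounds are inclusive. The names are hypothetical.

```python
import numpy as np

# Grand-average MMN peak latencies (s) per deviant feature, taken from the text.
MMN_PEAKS_S = {"intensity": 0.156, "pitch": 0.148, "timbre": 0.132, "rhythm": 0.128}


def window_mean(wave, times, t0, t1):
    """Mean of a waveform between t0 and t1 (inclusive)."""
    mask = (times >= t0) & (times <= t1)
    return wave[mask].mean()


def mmn_amplitude(diff_waves, times, width_s=0.030):
    """Average over features of the mean amplitude in a 30 ms window at each feature's peak.

    diff_waves maps feature name -> deviant-minus-standard waveform at Fz.
    """
    half = width_s / 2
    per_feature = [
        window_mean(diff_waves[f], times, c - half, c + half) for f, c in MMN_PEAKS_S.items()
    ]
    return float(np.mean(per_feature))


def p3a_amplitude(diff_wave, times):
    """Mean amplitude in the fixed 200-300 ms P3a window."""
    return window_mean(diff_wave, times, 0.200, 0.300)
```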

Age group differences in P1/N1/P2 responses
Normal distributions were observed for the older and young P1/N1/P2 amplitudes and latencies in each naturalistic condition, and for the differences in P1/N1/P2 amplitudes and latencies between the naturalistic conditions. Prior to hypothesis testing, one-sample t-tests on the P1/N1/P2 amplitudes were used to assess whether the P1/N1/P2 responses were measurable in each age group and ecological validity condition. Latency comparisons were excluded from further analysis if they involved a group and condition with a non-significant one-sample t-test result. Such a result indicates that no evoked response diverging from the EEG noise floor could be measured, so there would be no valid peaks for estimating the peak latencies. Amplitude comparisons, however, were conducted regardless of the results of the one-sample t-tests, since a significant versus a non-significant one-sample t-test would indicate an amplitude diverging from versus an amplitude at the EEG noise floor.
To test whether the age group differences could be replicated for the P1/N1/P2 responses to the naturalistic stimuli, and whether the effect of Age Group might be modulated by the degree of ecological validity, the Age Group (Older, Young) by Ecological Validity (medium, high) interaction was assessed with multi-level mixed effects ANOVAs. In these ANOVAs, unequal sample sizes and variances between the age groups were accounted for with the Welch-Satterthwaite degrees of freedom correction and by modeling individual random intercepts for the Ecological Validity factor. This was followed by four planned comparisons: the effect of Age Group in the medium and the high ecological validity conditions was analyzed with Welch's t-tests, which do not assume equal variances and are robust to unequal group sizes; the effects of Ecological Validity in the older and the young groups were tested with paired-samples t-tests. A Bonferroni correction was applied to the four follow-up comparisons by dividing the standard significance threshold by four (.05/4 = .0125). The effect sizes of the predictor variables in the multi-level mixed effects ANOVAs are reported as partial R²β estimates, which, like r², indicate the proportion of variance in the data (from 0 to 1) explained by the predictor variable (Edwards et al., 2008;Volpert-Esmond et al., 2021).
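The Welch statistic and its Welch-Satterthwaite degrees of freedom, together with the Bonferroni threshold, can be written out with the Python standard library. This is a minimal sketch, not the SPSS procedure. P-values would additionally require the t distribution; in Python, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same Welch test. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df


def bonferroni_alpha(n_comparisons, alpha=0.05):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(round(t, 3), round(df, 2))  # -1.732 4.41
print(bonferroni_alpha(4))        # 0.0125
```

Because the degrees of freedom shrink toward the smaller or more variable group, Welch's test stays valid when the older (n = 14) and young (n = 17) groups differ in variance.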
Since the older and younger groups differed in years of musical training, it was necessary to verify that differences in musical training did not confound the age group differences. To this end, additional multi-level mixed effects ANOVAs testing the Age Group by Ecological Validity interaction were conducted with Musical Training measures as covariates (years of instrument/voice training, music group training, elementary school music lessons, music appreciation, and music theory lessons). To ensure that the stimulus sound pressure level did not influence the age group differences, the individually adjusted dB SPL was also tested as a covariate.
For highly naturalistic stimuli, it has previously been reported that the P1/N1/P2 amplitudes are lower for faster sound onset event rates, where the event rate in Hz is defined as one over the interval in seconds from the preceding sound onset (Haumann et al., 2021). The present highly naturalistic stimuli showed high variance in the event rate: minimum 0.3 Hz, median 5.5 Hz, maximum 55.6 Hz, and interquartile range 5.1 Hz. (The event rate of the medium naturalistic stimuli, at the analyzed standard tone 1, was constantly 4.88 Hz with no variance.) Therefore, we tested whether the age group differences in the P1, N1, and P2 amplitudes for the highly naturalistic stimuli were modulated by an Event Rate percentile factor (0-20th, 21-40th, 41-60th, 61-80th, 81-100th percentiles). For this factor, the sound onsets were sorted from the slowest to the fastest event rate and split into five sub-averages of 188 trials each (with the first sound onset excluded). The Age Group by Event Rate interaction was assessed with the same type of multi-level mixed effects ANOVA used to test the Age Group by Ecological Validity interaction (see above). This was followed by seven planned comparisons: the Age Group differences at each of the five Event Rate levels were analyzed with Welch's t-tests, and the effects of Event Rate in the older and the young groups were tested with F-tests derived from the multi-level ANOVA. A Bonferroni correction was applied to the seven follow-up comparisons by dividing the standard significance threshold by seven.
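The event-rate grouping can be sketched with NumPy. This sketch assumes that the event rate of each onset is the reciprocal of the interval since the previous onset, so the first onset has no rate and is dropped. Onsets are sorted from slowest to fastest and split into five near-equal bins with `np.array_split`. The text does not say how the one leftover trial was handled (941 rated onsets do not divide evenly by five), and the function names are hypothetical.

```python
import numpy as np


def event_rates(onsets_s):
    """Event rate (Hz) of each onset after the first: 1 / interval to the preceding onset."""
    return 1.0 / np.diff(np.asarray(onsets_s, dtype=float))


def rate_quintiles(onsets_s, n_bins=5):
    """Indices (into onsets_s) of onsets grouped from slowest to fastest event rate."""
    rates = event_rates(onsets_s)
    order = np.argsort(rates, kind="stable") + 1  # +1: rates[i] belongs to onset i + 1
    return [np.sort(b) for b in np.array_split(order, n_bins)]


onsets = [0.0, 1.0, 1.5, 1.75, 2.75, 2.85]  # toy onset times in seconds
print([b.tolist() for b in rate_quintiles(onsets)])  # [[1], [4], [2], [3], [5]]
```

With 942 onsets, this scheme yields bins of 189, 188, 188, 188, and 188 trials. The Bonferroni threshold for the seven follow-up comparisons is .05/7 ≈ .0071.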

Age group differences in MMN/P3a
The MMN/P3a amplitudes and latencies were normally distributed in the older and the young groups. Prior to hypothesis testing, one-sample t-tests on the MMN/P3a amplitudes were used to assess whether the MMN/P3a responses were measurable in each age group. Then, the age group differences in MMN/P3a amplitude and latency were analyzed with Welch's t-tests, correcting for unequal sample sizes and variances between the age groups.
Adjustments for the possibly confounding effects of Musical Training and individually adjusted dB SPL were conducted with multi-level mixed effects ANOVAs for testing the main effect of Age Group with the insertion of Musical Training (years of instrument/voice training, music group training, elementary school music lessons, music appreciation and music theory lessons) and individually adjusted dB SPL as covariates.

Behavioral auditory discrimination
The percent correct detection of the deviants was not normally distributed in the older and young groups. Therefore, the Mann-Whitney U test was applied to test for an effect of Age Group on the behavioral auditory discrimination ability.
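The Mann-Whitney U statistic can be sketched as a count over all older-young pairs, with ties counted as one half. Ties matter here because almost all young participants scored at the 100% ceiling. The hit rates below are hypothetical, and the p-value (an exact test or a normal approximation with tie correction) is omitted; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: number of (x_i, y_j) pairs with x_i > y_j,
    counting ties as 0.5. U for y equals len(x) * len(y) minus U for x."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical hit rates (%): young group at ceiling, older group more spread out.
older = [80, 90, 100, 100]
young = [100, 100, 100]
u_older = mann_whitney_u(older, young)
print(u_older, len(older) * len(young) - u_older)  # 3.0 9.0
```

Because the test only uses ranks, the ceiling effect in the young group does not violate its assumptions the way it would for a t-test on non-normal data.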

Age group differences in P1/N1/P2 responses
The topographies and waveforms of the P1/N1/P2 responses are shown in Fig. 1. The effects of Age Group were modulated by the Ecological Validity of the auditory stimuli, as indicated by significant Age Group by Ecological Validity interactions on the P1 amplitude (F(3, 41.9) = 5.32, p = .003, R²β = .28), the N1 amplitude (F(3, 42.5) = 7.99, p < .001, R²β = .36), and the P2 amplitude (F(3, 42.5) = 7.07, p < .001, R²β = .33). These Age Group by Ecological Validity interactions remained significant after adjusting for Musical Training and stimulus dB SPL, and the P1/N1/P2 amplitudes and latencies were not significantly influenced by Musical Training or stimulus dB SPL (Appendix C). The planned comparisons suggested that these interactions were consistently driven by reduced age group differences for the highly compared to the medium naturalistic stimuli (Fig. 2, Table 2). Individual participant ER waveforms are provided in an Appendix B figure. The N1 and P2 latencies were excluded from the Age Group analyses, because the one-sample t-tests indicated that, for the medium naturalistic stimuli, the N1 responses were not significantly measurable in the young group and the P2 responses were not significantly measurable in the older group (see Appendix B Table B.1).
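The R²β values reported in this section can be recomputed from each F statistic and its degrees of freedom. The formula in the sketch is inferred from the reported numbers, since it reproduces all of them (.28, .36, and .33 for the interactions above, and .45, .35, and .07 for the event-rate tests below); that the authors computed R²β exactly this way is an assumption, and `r2_beta` is a hypothetical helper name.

```python
def r2_beta(f_value, df_num, df_den):
    """Effect size from an F test: (df_num * F / df_den) / (1 + df_num * F / df_den),
    algebraically equal to df_num * F / (df_num * F + df_den).
    Assumed formula: it reproduces the R2-beta values reported in the text."""
    ratio = df_num * f_value / df_den
    return ratio / (1.0 + ratio)


# Age Group by Ecological Validity interactions reported above.
for label, f_value, df_num, df_den in [
    ("P1", 5.32, 3, 41.9),
    ("N1", 7.99, 3, 42.5),
    ("P2", 7.07, 3, 42.5),
]:
    print(label, round(r2_beta(f_value, df_num, df_den), 2))
# P1 0.28
# N1 0.36
# P2 0.33
```

Because the measure depends only on F and its degrees of freedom, it can be checked against any F-test reported in the text without access to the raw data.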

P1 and N1
For the medium naturalistic stimuli, the effects of Age Group on the P1 and N1 amplitudes were significant (Fig. 2, Table 2). Compared to the young group, the older group showed higher P1 and N1 amplitudes for the medium naturalistic stimuli. For the highly naturalistic stimuli, there were no significant effects of Age Group on the P1 and N1 amplitudes (Fig. 2, Table 2). The older group showed significantly lower P1 and N1 amplitudes for the highly compared to the medium naturalistic stimuli, indicating that the lack of significant aging effects for the highly naturalistic stimuli was related to significantly smaller age group differences rather than to higher variance in the evoked responses to the highly naturalistic stimuli (Fig. 2, Table 2). In the young group, the P1 and N1 amplitudes were not significantly influenced by the ecological validity of the auditory stimuli (Fig. 2, Table 2). Moreover, in the older group, the N1 latency was significantly longer for the highly compared to the medium naturalistic stimuli (Fig. 1) (t(13) = 7.3, mean diff = 14 ms, 95% CI: [7, 20], p < .001, r² = .25).

P2
The effect of Age Group on the P2 amplitude (i.e., lower P2 amplitude in the older than in the young group) was significant for the medium naturalistic stimuli (Fig. 2, Table 2). With the highly naturalistic stimuli, however, the effect only approached significance (Fig. 2, Table 2). The older group also tended to show higher P2 amplitude for the highly compared to the medium naturalistic stimuli, although this tendency did not reach significance (Fig. 2, Table 2). The P2 amplitudes in the young group were not significantly influenced by the ecological validity of the auditory stimuli (Fig. 2, Table 2). Furthermore, in the young group, the P2 latency was significantly longer for the highly compared to the medium naturalistic stimuli (Fig. 1) (t(16) = 7.7, mean diff = 32 ms, 95% CI: [23, 41], p < .001, r² = .47).
(Figure caption, partially recovered) For the medium naturalistic stimuli, sound onsets and evoked responses occur regularly every 205 ms (the onset at 0 ms is marked by a black rectangle, and the following onset at 205 ms is marked by a grey rectangle). (Shaded error bars indicate BCa bootstrap 95% confidence intervals.) (C top) For the highly naturalistic music stimuli, the manually identified sound onsets (starting at 0 ms) can be verified by the increase in sound intensity (relative change in RMS energy in dB) and spectral flux (relative change in dB). (C bottom) Distribution of the sound intensity increase across the audio spectrum (frequency in Hz).
The Age Group by Event Rate interaction on the P2 amplitude was significant (F(9, 102.2) = 9.32, p < .001, R²β = .45). A significant age group difference on the P2 amplitude (i.e., lower P2 amplitude in the older group) was observed at the slowest event rates, whereas no significant age group difference was found at faster event rates (Fig. 3, Table 3). Event Rate significantly influenced the P2 amplitude in the young group (F(4, 116) = 15.46, p < .001, R²β = .35), with the lowest P2 amplitude at the fastest event rate (Fig. 3). The effect of Event Rate on the P2 amplitude did not reach significance in the older group (F(4, 116) = 2.23, p = .070, R²β = .07).

Behavioral auditory discrimination ability
The behavioral hit rates tended to be lower in the older than in the young group, although this tendency did not reach significance (U(31) = 81.5, p = .138, r² = .15), possibly due to high variance in the older group. All young participants except one scored at the ceiling level (100% hit rate), so the young group showed almost no variance (Appendix D Figure D.1).

Ecological validity of age group differences in cortical auditory evoked responses
In this study, we aimed to investigate whether the typical differences between older and young listeners, namely enhanced P1 and N1 amplitudes and reduced P2 amplitude in older listeners, could be replicated with medium and highly naturalistic music stimuli. Furthermore, we tested whether the commonly reduced MMN and P3a amplitudes in older age could be replicated with medium naturalistic music stimuli. With the medium naturalistic stimuli, the age group differences were consistently replicated. With the highly naturalistic stimuli, the age group differences could not be replicated, except at the slowest event rates, where the P2 amplitude was significantly lower in older than in young listeners. The results suggest that the age group differences in these cortical auditory evoked responses might depend on the ecological validity of the stimuli.
The present findings suggest that the older listeners' neural adaptation and inhibition of the P1 and N1 amplitudes (Leung et al., 2013) and enhancement of the auditory object formation-related P2 amplitude (Leung et al., 2013) improved with the highly compared to the medium naturalistic music stimuli. By contrast, the P1/N1/P2 amplitudes in the young group remained the same regardless of the ecological validity of the stimuli. These results support the notion that auditory neural processing is functionally optimized for naturalistic stimuli (Schonwiesner & Zatorre, 2009;Theunissen & Elie, 2014;Sonkusare et al., 2019), possibly resulting in a benefit for older listeners in the processing of highly naturalistic stimuli compared to simplified, more synthetic stimuli (see also Halpern et al., 2017). Future studies should investigate whether effects of aging might be consistently
Fig. 2. Age group differences in P1/N1/P2 amplitudes modulated by ecological validity. Showing the effect of Age Group (older, young) in the medium and highly ecologically valid conditions and the effect of Ecological Validity (medium, high) in the older and young age groups on the P1, N1, and P2 amplitudes.
The absence of a measurable N1 response to the medium naturalistic stimuli in the young group was anticipated, because young listeners typically habituate to repeated standard tones, such as those in the melody, and show a diminished or difficult-to-measure N1 amplitude, whereas older listeners typically show less N1 suppression to repeated tones and thus less reduction of the N1 amplitude (Leung et al., 2013).
The study also showed significant Age Group by Event Rate interactions on the P1 and N1 amplitudes, although the non-significant follow-up tests did not support that these interactions were driven by age differences at any of the tested event rates. Possibly, the interactions were driven by the larger effect sizes for Event Rate in the young compared to the older group, or by combined tendencies towards age differences across several levels of slow event rates (between 0.3 Hz and 6.8 Hz) as opposed to the levels of fast event rates (between 6.8 Hz and 55.6 Hz). The P2 amplitude to the highly naturalistic stimuli was significantly lower in the older compared to the young listeners at the slow sound onset event rates (0.3-2.9 Hz). This could indicate an age-related difference in auditory object formation (Leung et al., 2013) at the slow event rates. Alternatively, the age difference in the P2 amplitude might be better measured at the slow event rates, owing to a higher signal-to-noise ratio at slow event rates (Haumann et al., 2021), or owing to a superposition effect at fast event rates, where the N1 and P2 responses to closely spaced sound onsets overlap in time and cancel each other out in the measured P2 amplitude.
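The superposition account can be illustrated with a toy model in which every sound onset evokes the same waveform, a negative N1 peak followed by a positive P2 peak, and the recorded signal is the sum of the responses to successive onsets. The Gaussian component shapes, the 20 ms width, and the latencies (N1 at 100 ms, P2 at 180 ms) are illustrative assumptions, not estimates from the present data, and all function names are hypothetical. At 1 Hz the signal at the P2 latency stays close to the full P2 amplitude (about 1), whereas at 12.5 Hz (80 ms between onsets) the N1 to the next onset coincides with the P2 and the measured value drops to about zero.

```python
import math


def component(t_ms, latency_ms, amplitude, width_ms=20.0):
    """Gaussian-shaped evoked-response component (toy model)."""
    return amplitude * math.exp(-((t_ms - latency_ms) ** 2) / (2 * width_ms ** 2))


def evoked(t_ms):
    """Toy response to a single onset at 0 ms: N1 (negative) then P2 (positive)."""
    return component(t_ms, 100.0, -1.0) + component(t_ms, 180.0, 1.0)


def measured_p2(event_rate_hz, p2_latency_ms=180.0):
    """Summed signal at the P2 latency when a second onset follows
    1 / event_rate_hz seconds after the first one."""
    ioi_ms = 1000.0 / event_rate_hz  # inter-onset interval in ms
    return evoked(p2_latency_ms) + evoked(p2_latency_ms - ioi_ms)


slow = measured_p2(1.0)   # close to the full P2 amplitude
fast = measured_p2(12.5)  # next onset's N1 cancels the P2
print(round(slow, 3), round(fast, 3))
```

Real responses have broader and more variable components, so cancellation would be partial rather than complete; the sketch only shows the direction of the effect.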
Moreover, the study showed that in the older group the N1 latency was significantly longer by 14 ms for the highly compared to the medium naturalistic stimuli. Also, in the young group, the P2 latency was significantly longer by 32 ms for the highly compared to the medium naturalistic stimuli, whereas the P1 latency was not significantly influenced by the ecological validity. This suggests that the evoked responses in both the older and the young group were gradually delayed at successive neural processing stages for the more complex naturalistic stimuli in comparison to the simplified stimuli. Perhaps comparable with these findings, invasive recordings in mammals have shown that neurons with specific spectrotemporal receptive fields respond to complex naturalistic sounds but not to simplified synthetic sounds, indicating that different neural populations are involved in processing naturalistic and simplified sounds along the pathway from the thalamus and inferior colliculus to the primary auditory cortex (Theunissen & Elie, 2014). Also, localized groups of neurons in the human primary and secondary auditory cortex have been observed to respond to specific combinations of spectrotemporal patterns (Schonwiesner & Zatorre, 2009). Most of the investigated groups of neurons in the human primary and secondary auditory cortex were tuned to relatively sparse spectral (<1.33 cycles per octave) and slow temporal (<10 Hz modulation) patterns, which are common in naturalistic sounds (Schonwiesner & Zatorre, 2009).
Furthermore, in the young group, an early negative deflection was visible prior to the P1 for the medium naturalistic stimuli and for slower event rates between 0.3 and 4.4 Hz in the highly naturalistic stimuli. This deflection is most likely related to auditory steady-state responses (ASSRs), which should be facilitated by the isochronous sound onset intervals in the medium naturalistic stimuli, but to a lesser extent by the variable sound onset intervals in the highly naturalistic stimuli. ASSRs are known to be attenuated in older compared to young listeners (Sauve et al., 2019). Alternatively, the deflection might be an N2 response to the preceding tone, which is known to decrease in amplitude with age (Ponton & Eggermont, 2001;Fitzroy et al., 2015). It is unlikely to be a middle-latency response (MLR), since the MLR amplitude is typically enhanced (along with the P1 and N1) in older compared to young listeners (Amenedo & Diaz, 1998). Overall, these findings point towards a benefit for older listeners in the neural processing of highly naturalistic stimuli, presumably facilitated by compensatory mechanisms in the central auditory system.

Correlations between age-related hearing loss and auditory evoked responses
The present study focused on replicating the age group differences in cortical auditory evoked responses with naturalistic music stimuli. The behavioral scores did not significantly reflect the age-related differences observed in the cortical auditory evoked responses. In this respect, it has previously been proposed that age-related changes in P1/N1/P2 and MMN/P3a amplitudes and latencies can reflect a combination of healthy, degenerative, and compensatory changes (Alain et al., 2004;Alain & McDonald, 2007;Snyder & Alain, 2007;Kiang et al., 2009;Alain et al., 2013;Dushanova & Christov, 2013;Getzmann et al., 2013;Zendel & Alain, 2013;Bidelman et al., 2014;Moran et al., 2014;Rufener et al., 2014;Bidelman & Alain, 2015;Bidelman et al., 2017;Lagrois et al., 2018). Healthy age-related neural changes, e.g., reorganization of the neural source orientations, increased skull thickness, or synaptic pruning, would influence the amplitude or latency values but not the behavioral hearing ability (Snyder & Alain, 2007;Moran et al., 2014). Degenerative age-related neural changes, for instance peripheral, temporal, or frontal atrophy, would be visible as deviations from healthy age-corrected normal amplitude or latency values and would correlate with impaired behavioral hearing ability (Alain & McDonald, 2007;Kiang et al., 2009;Alain et al., 2013;Getzmann et al., 2013;Bidelman et al., 2014;Bidelman et al., 2017).
Table 2 Planned follow-up comparisons on the effects of aging and ecological validity. Showing the Least Significant Differences based on the estimated marginal means from the multi-level mixed effects ANOVA. (Bonferroni-corrected significance: *p < .05/4, ** p < .01/4, *** p < .001/4.)
Finally, compensatory neural mechanisms, such as long-term learning-based predictive coding mechanisms, are considered either to restore amplitude or latency values to healthy age-corrected normal values or to be reflected only in specific neural activity, such as enhanced late-latency P3b responses, and they are assumed to maintain the behavioral hearing ability at a healthy or high-performance level (Alain et al., 2004;Dushanova & Christov, 2013;Getzmann et al., 2013;Zendel & Alain, 2013;Moran et al., 2014;Rufener et al., 2014;Bidelman & Alain, 2015;Lagrois et al., 2018).

Diagnostics and healthy aging of the central auditory system
Across the lifespan, on average 10% of the population encounter disabling hearing loss, and the risk of developing disabling hearing loss increases to about 40% for ages above 65 years (Gates & Mills, 2005). According to a recent review (Sardone et al., 2019), diagnostic assessment of the central auditory system is uncommon and difficult to perform when the peripheral auditory function is impaired, although neurophysiological measurements, such as those presented in the present study, might be applied to diagnose comorbid peripheral and central auditory system disorders. Diagnosing central auditory system disorders is important, because they require treatments different from those for peripheral hearing loss, such as hearing training programs or signal processing techniques implemented in hearing devices to reduce noise sources (Sardone et al., 2019).
While previous research on aging has focused primarily on the diagnosis of disorders, there is now also a growing interest in rehabilitation, health promotion, and prevention of disorders (Reuter-Lorenz & Park, 2010). A promising preventive measure against age-related hearing loss seems to be musical training, as indicated by better speech-in-noise perception thresholds in older musicians compared to non-musicians (Zendel & Alain, 2012) and by better speech-in-noise perception thresholds after ten weeks of choir singing (Dubinsky et al., 2019) and after six months of musical training. Specifically, for older listeners, musical training has been associated with improved auditory gap detection (Zendel & Alain, 2012), improved pitch discrimination (Dubinsky et al., 2019), improved mistuned harmonic detection (Zendel & Alain, 2012), faster behavioral speech sound classification (Bidelman & Alain, 2015), faster brainstem responses (Bidelman & Alain, 2015), improved frequency-following brainstem responses (Dubinsky et al., 2019), shorter N1 and P2 latency (Bidelman & Alain, 2015), shorter MMN and P3a latency (O'Brien et al., 2015), an enhanced positive late-latency neural response emerging after 400 ms (Zendel & Alain, 2012;Zendel et al., 2019), and improved inhibition of the P1 response. The present study did not focus specifically on these effects of musical training in aging. However, we observed smaller age group differences in the P1, N1, and P2 responses with the highly compared to the medium naturalistic stimuli, which suggests that highly naturalistic stimuli are important for future investigations of the role of compensatory mechanisms in healthy aging of the auditory system.

Limitations
The sound intensity levels that participants found pleasant or unpleasant varied across individuals, and, for ethical reasons, the sound intensity was individually adjusted prior to the EEG session, starting from a sound level of 65 dB SPL. Nonetheless, both the older and the young group chose a median of 64 dB SPL for stimulation, there was no significant difference in the applied dB SPL between the age groups, and inserting the individual dB SPL as a covariate in the ANOVA tests showed that variation in dB SPL did not confound the observed Age Group and Ecological Validity effects. Moreover, the results of this study showed that the typical pattern of aging effects on the central auditory system, with enhanced P1 and N1 amplitudes but attenuated P2 amplitude, was clearly present. By contrast, if the sound intensity had differed between the age groups, the P1/N1/P2 responses would have been expected to show a uniform amplitude difference between the age groups (e.g., see Herrmann et al., 2020;Cardon & Sharma, 2021).
It might be argued that the reduced age differences were caused by a lower signal-to-noise ratio of the ERs to the highly compared to the medium naturalistic stimuli, e.g., due to the higher complexity of the highly naturalistic stimuli. However, if this were true, then both the young and the older listeners would be expected to show lower or unmeasurable ER amplitudes for the highly compared to the medium naturalistic stimuli, which was not supported by the present findings. Only in the older group were the ER amplitudes significantly influenced by the level of ecological validity of the stimuli, whereas the ER amplitudes in the young group remained similar between the medium and the highly naturalistic stimuli. Also, for the highly naturalistic stimuli, the P1, N1, and P2 responses were measurable in both age groups, and there was a tendency towards higher P2 amplitude in the older listeners for the highly compared to the medium naturalistic stimuli.
For the medium naturalistic stimuli, we applied the recently developed music multi-feature paradigm, originally designed for cochlear implant (CI) users (Petersen et al., 2020). The EEG waveforms and scalp topographies in the range of 100-150 ms indicate an MMN response in the older listeners, which has also been observed in older experienced CI users (Petersen et al., 2020).
Table 3 Planned follow-up comparisons on the effects of age group and event rate for the highly naturalistic stimuli. Showing the Least Significant Differences based on the estimated marginal means from the multi-level mixed effects ANOVA. (Bonferroni-corrected significance: *p < .05/7, ** p < .01/7, *** p < .001/7.)
It should be noted that subtracting the average standard response from the average deviant response and investigating the latency range between 100 and 150 ms will include the difference in the N1 between deviant and standard responses. This difference could be zero, or it could reflect a dishabituation of the N1, which some consider to be a subcomponent of the MMN that also reflects auditory deviance detection (May & Tiitinen, 2004;Gu et al., 2018).
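The deviant-minus-standard computation described above can be sketched as follows: the average standard waveform is subtracted sample by sample from the average deviant waveform, and the MMN is then quantified in the 100-150 ms window. Quantifying it as the mean amplitude over that window, the 1 kHz sampling rate, the epoch starting at stimulus onset, and the function names are assumptions for illustration. The example also shows that any N1 difference between deviant and standard responses inside that window enters the measure directly.

```python
def difference_wave(deviant_avg, standard_avg):
    """Deviant-minus-standard difference wave (same length, same time base)."""
    if len(deviant_avg) != len(standard_avg):
        raise ValueError("waveforms must have the same number of samples")
    return [d - s for d, s in zip(deviant_avg, standard_avg)]


def mean_in_window(wave, fs_hz, t0_ms, start_ms, end_ms):
    """Mean amplitude between start_ms and end_ms (inclusive), where
    sample 0 corresponds to t0_ms relative to stimulus onset."""
    first = round((start_ms - t0_ms) * fs_hz / 1000.0)
    last = round((end_ms - t0_ms) * fs_hz / 1000.0)
    segment = wave[first:last + 1]
    return sum(segment) / len(segment)


# Hypothetical 1 kHz averages from 0 to 300 ms: the deviant response is
# 1 unit more negative than the standard response between 100 and 150 ms.
standard = [0.0] * 301
deviant = [-1.0 if 100 <= t <= 150 else 0.0 for t in range(301)]
mmn = mean_in_window(difference_wave(deviant, standard), 1000, 0, 100, 150)
print(mmn)  # -1.0
```

A mean-amplitude window is less sensitive to noise than a single peak value, but, as noted above, it cannot by itself separate a genuine MMN from an N1 difference within the same latency range.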
The older group showed a late positive trend in the 300-400 ms latency range. In a previous study in which the same older group from this dataset was compared to a group of cochlear implant users, this late positive trend was shown to arise from misalignment between the deviant and standard response waveforms for the rhythm deviants of the complete music deviant set (for further details, see the discussion section and the positively trending distortions related to specific rhythm deviants in Fig. 4 in Petersen et al., 2020). The trend should therefore not be confused with a late P3a response.

Conclusion
This study showed that the age group differences in cortical evoked responses can partly be replicated with naturalistic music stimuli. The level of ecological validity, however, modulated the age group differences, suggesting that the age group differences were reduced for naturalistic compared to simplified stimuli. These findings underline the importance of future assessments of whether effects observed in controlled laboratory studies can be replicated in more ecologically valid conditions. The continued application of naturalistic approaches in future studies could advance knowledge on how the brain processes naturalistic stimuli across age.

CRediT authorship contribution statement
Niels Trusbak Haumann conducted the analyses and wrote the first version of the manuscript. Bjørn Petersen, Peter Vuust, and Elvira Brattico contributed to the conception and design of the study. Bjørn Petersen conceived the paradigm and the behavioral test and created the stimuli. Peter Vuust contributed funding and manuscript revisions. All authors contributed to manuscript revisions and read and approved the manuscript.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.