Order effects in task-free learning: Tuning to information-carrying sound features

Event-related potentials (ERPs) acquired during task-free passive listening can be used to study how sensitivity to common pattern repetitions and rare deviations changes over time. These changes are purported to represent the formation and accumulation of precision in internal models that anticipate future states based on probabilistic and/or statistical learning. This study features an unexpected finding: a strong order-dependence in the speed with which deviant responses are elicited that anchors to first learning. Participants heard four repetitions of a sequence in which an equal number of short (30 msec) and long (60 msec) pure tones were arranged into four blocks in which one was common (the standard, p = .875) and the other rare (the deviant, p = .125) with probabilities alternating across blocks. Some participants always heard the sequences commencing with the 30 msec deviant block, and others always with the 60 msec deviant block first. A deviance-detection component known as mismatch negativity (MMN) was extracted from responses and the point in time at which MMN reached maximum amplitude was used as the dependent variable. The results show that if participants heard sequences commencing with the 60 msec deviant block first, the MMN to the 60 msec and 30 msec deviant peaked at an equivalent latency. However, if participants heard sequences commencing with the 30 msec deviant first, the MMN peaked earlier to the 60 msec deviant. Furthermore, while the 30 msec MMN latency did not differ as a function of sequence composition, the 60 msec MMN latency did and was earlier when the sequences began with a 30 msec deviant first. By examining MMN latency effects as a function of age and hearing level it was apparent that the differentiation in 30 msec and 60 msec MMN latency expands with older age and raised hearing threshold due to prolongation of the time taken for the 30 msec MMN to peak.
The observations are discussed with reference to how the initial sound composition may tune the auditory system to be more sensitive to different cues (i


Introduction
Our brains possess an exquisite ability to attune to the most informative elements within sensory input. Whilst we may have a sense of agency over this information seeking process, much of the work appears to be performed automatically through pattern recognition. By recording brain event-related potentials (ERPs) we can infer pattern recognition in task-free learning scenarios from the way that responses change over time (Naatanen & Alho, 1997; Naatanen et al., 1978, 2005). The notion that the act of predicting is central to brain function dates back at least 60 years, with the adjustment or tuning in responsivity more recently argued to resemble a Bayesian probabilistic computation where responsiveness indexes a sort of "sensory belief" reflecting a reasonable expectation about the environment (Barascud et al., 2016; Friston, 2005, 2010; Friston & Penny, 2011). Internal models produce predictions about future states that are weighted by the goodness of evidence gleaned over time (Lieder et al., 2013). However, observations from auditory ERPs do not always conform to what we might expect to see based on probability alone, with these exceptions prompting consideration of the other drivers or modifiers of learning. This paper features one such exception in which the timing of auditory responses revealed a lasting sensitivity to first learning.
Our brains automatically extrapolate the most probable soundscapes from even highly abstract and complex statistical regularities (Bendixen et al., 2008; Naatanen, 2008; Winkler, 2007; Yeark et al., 2021). Through the formation of internal models, experience is proposed to reduce uncertainty about the world by helping to anticipate likely input (Mathys et al., 2014). In auditory ERPs this is inferred from the differential responsiveness to pattern-conforming versus pattern-violating sound where deviations elicit larger negative potentials at fronto-central scalp electrodes within 100-250 msec (Naatanen et al., 2007) even when attention is not directed to the sounds (Sussman, 2007). This negative potential, known as mismatch negativity or MMN, occurs when internal model predictions are in error, with large prediction-errors indicating that the environment might have changed in some way that could be relevant, and that the model may need to be updated. Many studies employ very simple sound sequences called traditional oddball designs to quantify experimental or group difference effects on responses (Naatanen et al., 2012; Näätänen et al., 2016). These sequences contain a regular repeating sound and a rare deviating sound, usually differing in a physical attribute such as frequency, intensity or duration, and the repeating properties are fixed so a highly precise model anticipating the most likely attributes can be formed. Rare deviations produce MMN amplitudes that tend to scale based on the degree to which the sound differs from the predicted value (i.e., magnitude of difference such as the frequency discrepancy) and the rarity (larger when rarer) (Kujala et al., 2007; Pakarinen et al., 2007). However, sequences in which patterns change over time require models to be updated, and when patterns alternate, periods of sound are processed differently as a function of prior learning history (Todd et al., 2011).
In alternating oddball sequences two physically different sounds are organised into blocks in which one sound is common and the other rare (Todd et al., 2011). Successive blocks differ in that the sound that was rare in the previous block starts to repeat and the sound that was previously common becomes rare; the two types of block compositions thus alternate over time (see Fig. 1) and predicting the most probable sound properties requires updating of the active internal model. These alternating sequences therefore combine a form of roving paradigm where the repeating element changes every time a deviation occurs (Cowan et al., 1993) with "flip-flop" designs aimed to control for exogenous effects on the response (i.e., those reflecting differences in sound and not the probability per se; Kujala et al., 2007; see also Pulvermuller et al., 2006). In young adults, internal model formation/updating seems to be different in the different block compositions. Specifically, MMN elicited to the rare deviant tone within blocks reaches maximum amplitude early in blocks consistent with how the sequence starts, but it takes much longer to reach maximum amplitude in the alternate composition where it is initially significantly smaller (Todd, Heathcote, Mullens, et al., 2014). This finding of higher precision weighting on early error signals is order-dependent, not feature-dependent, and always associated with the first block composition. The finding can be observed across a range of different sound feature changes (Todd et al., 2013; Todd, Heathcote, Whitson, et al., 2014; Fitzgerald et al., 2018), and similar first-learning effects on auditory ERPs have been reported by other groups (Costa-Faidella et al., 2011; Kotchoubey, 2014).
Whilst the alternating sequences were designed to explore precision-weighting effects on MMN response amplitude, this paper reports on an unexpected order-dependent effect on MMN response latency. Latency refers to the time point at which the amplitude of the MMN component reaches its maximal value. Shorter peak latencies are observed where the deviating feature can be detected earlier in time or where there are differences in representational acuity (Novitski et al., 2004; Kujala et al., 2007; Pakarinen et al., 2007). For example, MMN to an unexpected frequency difference typically peaks earlier than MMN to a duration difference because the difference between the predicted and actual sound frequency can be detected very rapidly from the onset of the sound, whereas the difference between a predicted and actual sound duration can only be detected sometime after sound onset when it finishes earlier or later than expected (Naatanen et al., 2007). MMN latency is also shorter where the physical difference between two tones is more clearly distinct (Novitski et al., 2004; Sams et al., 1985) and will extend to as long as 200-300 msec where barely discriminable (Naatanen et al., 2007).
In this paper, the alternating sequences contained two short duration sounds: a 30 msec and 60 msec pure tone. A duration difference at short sound durations like these is perceived in two ways: by a difference in the temporal extent of sound energy where one ceases earlier than the other, and by a difference in loudness due to integration of energy over time with longer sounds being perceived as louder. Based on psychophysics, a 60 msec sound could be expected to sound ~3-4 dB louder than a 30 msec sound, which is a perceptible difference (Zwislocki, 1969). An order-dependent difference in response latencies could therefore imply a modulation of which feature is used to discriminate the expected from unexpected tone, or a modulation of discrimination ability. Sequential learning may therefore prime feature extraction or acuity. The unexpected order dependence in the latency of MMN is a surprising finding inviting speculation about how and why the auditory system might tune in to different information-carrying features in sound. A test of how this observation varies with age and hearing ability was conducted to inform possible interpretations given prior observations that latency can be affected by these factors (Bertoli et al., 2005).
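As a rough illustration of where a loudness difference of about 3 dB can come from, the sketch below assumes idealised energy integration, in which a tone of constant power accumulates energy in proportion to its duration. That assumption and the function name are ours; the psychophysical estimate cited above is not derived here.

```python
import math

def energy_gain_db(short_ms: float, long_ms: float) -> float:
    """Level difference (dB) between two constant-power tones under
    idealised energy integration (energy proportional to duration)."""
    return 10.0 * math.log10(long_ms / short_ms)

# Doubling the duration from 30 to 60 msec doubles the integrated energy.
print(f"{energy_gain_db(30.0, 60.0):.2f} dB")  # 3.01 dB
```

Under this simplification a doubling of duration yields about 3 dB, which matches only the lower end of the range quoted in the text.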

Explanation of data inclusion
The results of three published studies are included in this paper. The exploratory finding emerged from study 1, when using published data (Todd, Frost, et al., 2021) as a comparison for results on a subsequent manipulation (in preparation). This dataset involves two groups of young participants in a between-subjects design with the order of first sound composition counterbalanced as a between-subjects factor. Study 2 features published data from a separate group of participants, who heard one of these sound compositions, serving as an independent replication of the observation (Frost et al., 2016). Finally, study 3 involves an expanded set of data from study 1, originally published in an aging study (Todd, Yeark, et al., 2021). The present analysis includes an additional cohort of participants aged between the younger and older groups in the published study; together these data are used to test how latency differences vary with age and hearing ability. As noted in the introduction, MMN amplitude and latency change with age (Cheng et al., 2013; Cooper et al., 2006; Gaeta et al., 1998) and the MMN latency is delayed in those with hearing impairment relative to normal hearing adults (Bertoli et al., 2005).

Participants
For study 1 and 2, all participants had to satisfy inclusion criteria that stipulated normal hearing (able to detect sounds between 500 and 4000 Hz at or above 20 dB HL), no history of head injury or neurological condition, no current mental illness or family history of psychosis and no alcohol or substance abuse. Remuneration was offered as course credit to students and monetary vouchers to community volunteers. Written informed consent was obtained from all participants consistent with standards approved by the University of Newcastle Human Research Ethics Committee. In study 3, the same exclusion criteria were applied with the exception that those with raised hearing thresholds were still included so long as the thresholds for the stimulus frequency (1000 Hz) were a min of 40 dB HL.
Fig. 1. Black boxes represent a "Block" of sound during which the 60 msec sound is the common standard (p = .875) and the 30 msec sound is the rare deviant (p = .125), and the grey boxes represent periods where the probabilities are reversed. Each "Sequence" included four blocks of sound alternating between the two compositions without a break between blocks. There was a 1 min silent break between Sequences. All sounds were presented at a regular 300 msec stimulus onset asynchrony (SOA) resulting in a block length of 2.4 min and a sequence length of 9.6 min. Only the data associated with the latter half of a block was used in the analysis to ensure the presence of a clear MMN (as found in the published studies; see text for further discussion). No code was used in statistical analysis and figure generation.
Participants for study 1 and 2 were recruited from healthy community volunteers and undergraduate students from the University of Newcastle, Australia (study 1, n = 33, 14 males, 18-32 years, mean = 22 years, SD = 2 years (Todd, Frost, et al., 2021); study 2, n = 15, 5 males, 18-27 years, mean = 21 years, SD = 2 years (Frost et al., 2016)). Study 3 included participants from an aging study (Todd, Yeark, et al., 2021) that incorporated a subset from study 1 (n = 17, 8 males, 18-35 years, mean = 21.8 years, SD = 4.5 years) who heard the same composition of sounds as those in the older group. The group of older participants were recruited from the Toronto, New South Wales Australia chapter of the University of the Third Age, in addition to community volunteers and undergraduate students (n = 46, 8 male, 31-75 years, mean = 53.6 years, SD = 7.6 years). All participants who were recruited through the University were offered course credit for participation and community volunteers were offered monetary vouchers for their time and inconvenience.

Procedure
Participants were invited to read the information form for the study, ask any questions and then provide written informed consent if they wished to continue. A screening interview was administered to determine whether any exclusion criteria were present and a hearing test was conducted to determine thresholds for detection. All participants completed a hearing test using an audiometer (Earscan ES3S) determining the lowest sound presentation level at which they could detect sound in the left and right ear for frequencies between 500 and 4000 Hz. Participants in study 1 and 3 were then fitted with a 64 Channel Neuroscan Quick Cap with Ag/AgCl electrodes arranged per the extended International 10-20 system including the nose as reference, and bilateral mastoid electrodes, with data acquired from all electrodes. Participants in study 2 (which was conducted earlier) were fitted with a 32 Channel Neuroscan Quick Cap with Ag/AgCl electrodes arranged per the International 10-20 system including the nose as reference, with data acquired from a reduced montage of FZ, CZ, PZ, F3, FC3, C3, F4, FC4, C4, F7, F8 and left and right mastoids.
Electrooculogram was recorded from an additional four electrodes: one above and another below the left eye, and two 1 cm lateral to the outer canthus of each eye. The impedance for all electrodes was reduced to less than 5 kΩ prior to recording and data was acquired continuously with a nose reference at a 1000 Hz sampling rate (highpass .1 Hz, lowpass 70 Hz, notch filter at 50 Hz and a fixed gain of 2010) on a Synamps 2 Neuroscan system.
Participants selected a movie to watch while binaural sounds were presented over stereo headphones (Sennheiser HD280pro) and continuous EEG data was acquired. Each participant was asked to focus attention on the subtitled movie and it was explained that the process being measured was something that the brain does automatically and it is best recorded when attempting to ignore the sounds and focus attention elsewhere as recommended (Kujala et al., 2007).

Sounds and sequences
All studies included sound sequences that featured an alternating oddball design as depicted in Fig. 1. At any given time in the sequence there was a regular repeating sound with rare occurrences of a sound that changed in duration. Two block compositions were possible: a sequence of common 30 msec sounds occasionally interrupted by a rare deviant 60 msec sound or the reverse (where the common was probability = .875 and the rare probability = .125). All sounds were 1000 Hz pure tones and were created with rise/fall times of 5 msec (cosine window) and pedestals of 20 msec and 50 msec, for 30 msec and 60 msec versions, respectively. Each block began with a minimum of five consecutive presentations of the new common standard tone and any occurrence of the rare deviant tones was separated by at least three common standard tones. All sounds were separated by a constant 300 msec stimulus onset asynchrony and were presented at 75 dB SPL.
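The tone construction described above (5 msec cosine rise, a steady pedestal, 5 msec cosine fall) can be sketched as follows. This is a minimal illustration and not the authors' stimulus code: the 48 kHz sampling rate, the raised-cosine form of the ramp, and the function name make_tone are assumptions.

```python
import math

def make_tone(pedestal_ms, ramp_ms=5.0, freq_hz=1000.0, fs=48000):
    """Return samples of a pure tone with raised-cosine onset/offset ramps.

    Total duration = ramp + pedestal + ramp, so a 20 msec pedestal gives a
    30 msec tone and a 50 msec pedestal gives a 60 msec tone.
    """
    n_ramp = round(fs * ramp_ms / 1000)
    n_ped = round(fs * pedestal_ms / 1000)
    n_total = 2 * n_ramp + n_ped
    samples = []
    for i in range(n_total):
        if i < n_ramp:                      # rising ramp
            env = 0.5 * (1 - math.cos(math.pi * i / n_ramp))
        elif i >= n_ramp + n_ped:           # falling ramp (mirror of the rise)
            j = n_total - 1 - i
            env = 0.5 * (1 - math.cos(math.pi * j / n_ramp))
        else:                               # steady pedestal
            env = 1.0
        samples.append(env * math.sin(2 * math.pi * freq_hz * i / fs))
    return samples

short_tone = make_tone(20)   # 30 msec tone
long_tone = make_tone(50)    # 60 msec tone
print(len(short_tone) / 48, len(long_tone) / 48)  # durations in msec: 30.0 60.0
```

Calibration to 75 dB SPL depends on the playback hardware and is not represented in this sketch.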
Participants heard four identical sequences composed of four consecutive blocks of 480 tones without breaks between the blocks and these sequences were repeated with a 1 min silent break between each (see Fig. 1A and B).
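One way to generate a block that satisfies the stated constraints (480 tones, deviant probability .125, at least five standards at block onset, at least three standards between deviants) is sketched below. The gap-filling randomisation and the name make_block are our own illustrative choices; the experiment code referenced by the authors is the authoritative source for how the sequences were actually built.

```python
import random

def make_block(n_tones=480, p_deviant=0.125, lead_in=5, min_gap=3, rng=None):
    """Return a list of 'S' (standard) and 'D' (deviant) labels for one block."""
    rng = rng or random.Random()
    n_dev = round(n_tones * p_deviant)          # 60 deviants
    n_std = n_tones - n_dev                     # 420 standards
    # Minimum run of standards before each deviant, plus a free trailing run.
    gaps = [lead_in] + [min_gap] * (n_dev - 1) + [0]
    spare = n_std - sum(gaps)
    if spare < 0:
        raise ValueError("constraints cannot be met with this many tones")
    for _ in range(spare):                      # scatter the remaining standards
        gaps[rng.randrange(len(gaps))] += 1
    block = []
    for g in gaps[:-1]:
        block.extend("S" * g)
        block.append("D")
    block.extend("S" * gaps[-1])
    return block

block = make_block(rng=random.Random(1))
soa_ms = 300
block_min = len(block) * soa_ms / 1000 / 60
print(len(block), block.count("D"), round(block_min, 1), round(4 * block_min, 1))
# 480 60 2.4 9.6
```

The printed durations reproduce the block length of 2.4 min and sequence length of 9.6 min that follow from 480 tones at a 300 msec SOA.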

Data analysis
Data for all studies were processed in Neuroscan Edit (version 4.5) software. The continuous EEG in all studies was inspected for movement or other large artifacts and these were manually excluded. An algorithm was then run to model and mathematically eliminate eye-blink artifact (Semlitsch et al., 1986) before extracting time-locked epochs starting 50 msec before the sound presentation and ending 300 msec after. Epochs were baseline corrected to zero across the entire period and then any epoch with variation that exceeded ±70 µV was rejected from further processing. Epochs were averaged in accordance with block-type and tone-type but only for the latter half of each block. This data restriction was applied here to ensure that clear MMN responses were present for both the 30 msec and 60 msec tones as deviants, because prior studies exposed the delayed emergence of MMN for the deviant sounds which initially served as standard sounds within the sequence [i.e., for the 30 msec tone for the sequence composition depicted in Fig. 1A (Todd, Frost, et al., 2021; Todd, Yeark, et al., 2021) and the 60 msec tone for the sequence composition depicted in Fig. 1B (Todd, Frost, et al., 2021)]. This process generated a 30 msec and 60 msec deviant and 30 msec and 60 msec standard average. Averages in all studies were digitally low pass filtered at 30 Hz and were re-referenced to the average activity at the left and right mastoid sites to increase signal to noise ratio. Although filtering can affect peak latencies, the use of a digital filter minimises the shifting of the peak and it is a filter setting recommended for extraction of these components (Kujala et al., 2007).
Cortex 172 (2024) 114-124
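The epoching steps above can be illustrated with a small pure-Python sketch. It is not the Neuroscan Edit pipeline: it assumes a single channel sampled at 1000 Hz, reads "baseline corrected to zero across the entire period" as subtracting the mean of the whole epoch, and takes the rejection threshold of 70 to be in microvolts. The function names are hypothetical.

```python
def extract_epochs(eeg_uv, onsets, fs=1000, pre_ms=50, post_ms=300, reject_uv=70.0):
    """Cut epochs around each onset sample, baseline-correct, and reject artifacts."""
    pre = int(fs * pre_ms / 1000)
    post = int(fs * post_ms / 1000)
    kept = []
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > len(eeg_uv):
            continue                                  # epoch runs off the recording
        epoch = eeg_uv[start:stop]
        mean = sum(epoch) / len(epoch)
        epoch = [v - mean for v in epoch]             # baseline over the entire epoch
        if max(abs(v) for v in epoch) > reject_uv:
            continue                                  # amplitude criterion exceeded
        kept.append(epoch)
    return kept

def average(epochs):
    """Point-by-point average across epochs of equal length."""
    return [sum(col) / len(epochs) for col in zip(*epochs)]

# Synthetic check: a flat recording with one large artifact at sample 600.
eeg = [0.0] * 1000
eeg[600] = 200.0
clean = extract_epochs(eeg, onsets=[100, 500])
print(len(clean), len(clean[0]))  # 1 350
```

In the study, averaging would be carried out separately for each block-type and tone-type and restricted to the latter half of each block.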
A difference waveform was generated by subtracting the response to the common occurrence of a tone from that of the rare occurrence of the same tone. The MMN to a duration change is often observed as having a right frontal maximum (Schröger, 1998). Commensurate with this known property we also observed a right-frontal maximum (see Supplementary Materials) and measurements of peak latency were therefore obtained from Fz, FCz, F4, FC4 electrodes centred over this region of interest. As described earlier, the studies were designed to explore MMN peak amplitude and these measures have been analysed and published previously. The data analysis reported here is focused on unexpected observed latency differences. To ensure that these latency differences were not simply reflecting amplitude effects we assessed correlations between amplitude and latency in the largest data set (i.e., study 3, n = 63) where the Pearson's correlation for the 60 msec MMN was r = .25, p = .102 and for the 30 msec was r = .090, p = .555. Thus, the peak amplitude and the peak latency did not significantly correlate in this data set. The amplitude of the MMN is therefore not discussed any further in this manuscript.
The average peak latency over Fz, FCz, F4, FC4 electrodes served as the dependent variable used for data displays in the results section for study 1 and 3 and over the equivalent three available sites Fz, F4, FC4 for study 2. The peak latency was defined as the point at which the difference wave reached the maximum negative amplitude between 100 and 270 msec after sound onset. Where a peak latency measure occurred at the edge of the specified time window the individual participant data was visually inspected for that condition to determine the actual peak latency. A manual inspection/correction to data was required as follows: for one participant in the 60 msec-First-Deviant condition of study 1, for no participants in the 30 msec-First-Deviant condition of study 1, for no participants in study 2, and for seven participants in study 3. To assess the visible trends in the data, the averaged peak latency for the region of interest was entered into a mixed model ANOVA with group as the between-subjects variable and the within-subject variable of tone (30 msec or 60 msec). Significant interactions were followed up with post-hoc tests using the Holm correction. In study 3, Spearman Rank correlations were used to assess the relationship between latency and age and latency and hearing level to capture monotonic relationships. Due to this relationship being significant, a repeated-measures ANCOVA was used to examine the MMN latency with a within-subject variable of tone (30 msec or 60 msec) and age and hearing level as covariates.
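The latency measure defined above can be sketched as follows: the deviant-minus-standard difference wave is searched for its most negative value between 100 and 270 msec after sound onset, and a peak landing on a window edge is flagged for visual inspection. This is an illustrative sketch, assuming epochs that start 50 msec before onset and are sampled at 1000 Hz; the function names are hypothetical.

```python
import math

def difference_wave(deviant, standard):
    """Deviant-minus-standard waveform for the same physical tone."""
    return [d - s for d, s in zip(deviant, standard)]

def peak_latency_ms(diff, fs=1000, pre_ms=50, window_ms=(100, 270)):
    """Latency (msec from onset) of the most negative point in the window.

    Returns (latency, at_edge); at_edge marks peaks on a window boundary,
    which the text says were resolved by visual inspection.
    """
    lo = int((pre_ms + window_ms[0]) * fs / 1000)
    hi = int((pre_ms + window_ms[1]) * fs / 1000)
    idx = min(range(lo, hi + 1), key=lambda i: diff[i])
    return idx * 1000 / fs - pre_ms, idx in (lo, hi)

# Synthetic example: a negative deflection centred 150 msec after onset.
standard = [0.0] * 350
deviant = [-3.0 * math.exp(-((t - 200) / 20.0) ** 2) for t in range(350)]
print(peak_latency_ms(difference_wave(deviant, standard)))  # (150.0, False)
```

For the dependent variable described in the text, this latency would be computed at each region-of-interest electrode and then averaged.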
No part of the study procedures nor analyses were preregistered prior to the research being conducted. The conditions of our ethics approval do not permit public archiving of anonymised study data. Readers seeking access to the data should contact the lead author Juanita Todd. Access will be granted to named individuals in accordance with ethical procedures governing the reuse of sensitive data. There are no conditions that must be met to obtain this data. The experiment code can be accessed at DOI 10.17605/OSF.IO/ZUF37. As an exploratory report, sample size determination is not relevant but inclusion/exclusion criteria for studies have been noted and were established prior to the studies commencing.

Study 1 and study 2
The group averaged deviant-minus-standard responses for the 60 msec-First-Deviant and 30 msec-First-Deviant conditions in study 1, and the 60 msec-First-Deviant condition for study 2 are presented in Fig. 2A. The associated group mean latencies with 95 % confidence intervals are presented in Fig. 2B, and the distributions of the individual participant peak MMN latencies are presented in Fig. 2C. As is clear from Fig. 2, the peak MMN latencies for those hearing the 30 msec-First-Deviant condition in study 1 were visibly earlier for the 60 msec than the 30 msec deviant (60 msec mean latency 149.57 msec and 30 msec mean latency 164.22 msec). In contrast, the MMN peak latencies for the two groups with the 60 msec deviant first tended to be very similar to each other, and later than those for the group who heard the 30 msec-First-Deviant condition for the 60 msec tone. The group mean latency for the peak amplitude of the 30 msec MMN was 174.47 msec for study 1 and 173.46 msec for study 2. The group mean latency for the peak amplitude of the 60 msec MMN was 168.43 msec for study 1 and 176.22 msec for study 2.
Because their latencies were comparable, data for the two groups with the 60 msec-First-Deviant condition were combined into a single set for the subsequent statistical analysis so that the between-subjects factor represented the difference in sequence composition order only (30 msec-First-Deviant vs 60 msec-First-Deviant). A mixed model ANOVA on the MMN peak latency measure revealed a main effect of tone [F(1,50) = 7.49, p < .009, ηp² = .13] with earlier latencies for the MMN elicited by the 60 msec tone, and a main effect of condition [F(1,50) = 12.46, p < .001, ηp² = .20] with earlier latencies for the group with the 30 msec-First-Deviant condition. The tone by group interaction was significant [F(1,50) = 4.58, p < .037, ηp² = .08] with post-hoc analysis supporting a difference in latencies between the 30 msec and 60 msec long tones for the 30 msec-First-Deviant condition only [t(16) = 3.01, pHolm = .016] and the latencies differing significantly between the 30 msec and 60 msec-First-Deviant conditions only for the 60 msec tone [t(49) = 4.11, pHolm = .001]. The density distributions for the latency values show clearer differentiation for the 30 msec-First-Deviant condition in Fig. 2C in line with the differences in group mean latencies in Fig. 2B.
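As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the three F values reported above; the function name is our own.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

reported = [("tone", 7.49), ("condition", 12.46), ("tone x group", 4.58)]
for label, f_value in reported:
    print(f"{label}: {partial_eta_squared(f_value, 1, 50):.2f}")
# tone: 0.13
# condition: 0.20
# tone x group: 0.08
```

The recovered values agree with the effect sizes reported in the text to two decimal places.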
The peak MMN latency differences in data have a visible correspondence to slightly different morphologies for the MMN response to the 30 msec and 60 msec tones. While the 30 msec MMN (and deviant response) is characterised by a fairly clear single peak, the 60 msec MMN and deviant response exhibit an earlier emergent difference between the standard and the deviant response (~100 msec) in addition to the later difference similar to that found for the 30 msec tone. The distribution of peak latencies for the 30 msec-First-Deviant condition demonstrates bi-modality (Fig. 2C) suggesting that for the response to the 60 msec tone there is a subgroup for whom this early peak may be larger and a subgroup exhibiting a larger later peak. A demonstration of these patterns at an individual participant level is provided in Supplementary Materials.

Study 3
As noted in methods, the data presented for the 17 younger adults who heard the 30 msec-First-Deviant condition in study 1 is part of a larger data set inclusive of 46 additional participants expanding the age range from 18 to 75 years. This larger dataset was explored in study 3 with the intent to determine whether the latency differences seen in Fig. 2 varied as a function of age or hearing level. Spearman Rank correlations were used to examine whether the MMN peak latencies to the 30 msec and 60 msec deviants increased with age and hearing level. Table 1 reveals a significant positive relationship between increasing age and the MMN peak latency to 30 msec tones. The same relationship is evident between the MMN peak latency for 30 msec tones and hearing level, albeit stronger for right ear measures. The correlation between the MMN peak latency for 30 msec tones and hearing level in the right ear survived a correction for variance associated with age, suggesting that age and hearing level accounted for partially independent variance in the MMN peak latency. Only four participants had hearing levels outside the normal range (two at 25 dB, one at 35 dB and one at 40 dB) and the correlation between 30 msec peak MMN latency and hearing level in the right ear remained significant even with these cases removed (rs = .45, p < .001).
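For readers unfamiliar with the statistic, a Spearman rank correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a minimal pure-Python version for illustration only; the toy inputs are not study data, and the authors' statistics software is not reproduced here.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values given the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy example: any strictly increasing relationship gives a coefficient of 1.
print(round(spearman([1, 2, 3, 4], [1, 4, 9, 16]), 6))  # 1.0
```

Because only rank order matters, the measure captures monotonic relationships between latency and age or hearing level without assuming linearity.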
Neither age nor hearing levels correlated significantly with the latency of the MMN to the 60 msec sound. The scatterplots of the relationship between the MMN peak latency for 30 msec tones and age and hearing level for the right ear are presented in Fig. 3. When subjected to a repeated measures ANOVA with a within-subject factor of tone (30 msec vs 60 msec), the analysis of study 3 data yielded a main effect of tone [F(1,62) = 49.46, p < .001, ηp² = .44] consistent with the earlier MMN peak for the 60 msec tone relative to the 30 msec tone (mean = 149.74 msec vs 175.60 msec, respectively). However, this effect was abolished by the inclusion of age and hearing level (right ear) as covariates in the ANCOVA with age introduced as a marginal covariate of latency [F(1,58) = 3.61, p = .062]. Hearing level was found to be a significant modulator of the effect of tone [tone by hearing level interaction: F(1,58) = 8.08, p < .006, ηp² = .12]. The data for this study are presented in Fig. 4 with two groups created for display purposes based on whether they had comparatively "good" hearing (thresholds of 10 dB or less, n = 35, mean age = 39 years) or "moderate" hearing (thresholds of 15 dB or higher, n = 26, mean age = 59 years). The two hearing groups were chosen to visually expose the differentiation based on hearing level in the relatively large sample sizes. The tendency for the peaks in the density distributions to differentiate between the tones very clearly increases with hearing level (Fig. 4C).

Discussion
All participants within the studies reported here were presented with simple sound sequences in which a longer (60 msec) and shorter (30 msec) sound alternated as rare or common events over time. Consistent with a vast literature on auditory perceptual inference, the auditory event-related responses to these sounds exhibited the expected changes in amplitude (reduced response to common and enhanced response to rare) that expose the MMN component known to occur to events violating context-specific predictions. Based on physical properties of the sounds themselves one might suppose that the time point at which a prediction-violation would be evident should be equivalent in both contexts given that the earliest point at which a sound differs from expected properties is when the sound energy ramps down earlier than expected (when the 30 msec sound is rare) or fails to ramp down when expected (when the 60 msec sound is rare). This is supported by equivalent latencies for the peak of the MMN response when the sequences start with the 60 msec sound as the rare tone, but not when the sequences start with the 30 msec sound as the rare tone. This order-dependent finding was discovered in an exploratory between-subjects study (study 1 and 2 datasets) and is an unexpected finding because the statistical properties of the sequence compositions were always locally identical, and the physical properties of the tones were also identical across studies so one might expect that any tuning of the auditory system to the change point (i.e., ~30 msec from sound onset) would be equivalent.
The absence of a peak latency difference in the 30 msec and 60 msec MMN was evident in two separate groups of young healthy individuals who heard sequences starting with the 60 msec sound as the rare tone, and the main difference for the group who heard the same sequences with the 30 msec tone as the first rare tone was an earlier peak for the 60 msec MMN. Earlier MMN latencies are thought to represent more distinctive sound representations supporting higher discriminability and are often associated with larger amplitudes (Novak et al., 1992; Novitski et al., 2004; Pakarinen et al., 2007). The latter finding was not observed in the current data, as the MMN peak latencies and amplitudes showed no significant correlation in the largest of the three data sets (see Methods). The encoding of sound duration is thought to be supported by at least three different mechanisms: neurons that respond throughout a sound, neurons that respond selectively to sound offset, and neurons that respond with transient on and off characteristics, with the perception of duration assumed to be most tightly linked to offset responses in the latter (Li et al., 2021). However, the temporal duration of the sound may not be the only cue affecting the response to deviants in these sequences. With sound durations less than ~200 msec, longer sounds tend to sound louder due to the temporal summation of energy (Florentine et al., 1996; Zwislocki, 1969). In other words, a long deviant is perceived to be both longer and louder and conversely, a short deviant is perceived to be not only shorter but also softer. Manipulating the intensity of such sounds or attempting to control for these regular cues can affect the amplitude of the MMN response (Jacobsen & Schroger, 2003; Todd et al., 2001; Todd & Michie, 2000).
To be clear, we do not know why the order effects are present; they were not expected and we know of no precedent. However, this duality in differentiating percepts offers a potential explanation for how and why the latencies of the MMN to the deviant sounds might differ as a function of the order in which the sounds were heard in each role. A possible explanation for these results is that the first heard composition of sounds (i.e., which sound is common, and which one is rare) tunes the auditory system to focus on different cues as the relevant distinction between the two sounds (i.e., focussing on "softer/louder" cues or on "shorter/longer" cues). If what is distinctive about unpredictable sounds in the initial stimulus block is that they are louder, it may follow that the tuning of the change point will be based on the cumulative energy of sounds. This could lead to a similar focus in processing both forms of deviance and therefore, to similarity in the peak latencies between the MMN elicited by duration increase or decrease. Alternatively, if what is distinctive about unpredictable sounds in the initial stimulus block is that they stop early causing an abrupt decrease in energy, it may follow that the tuning of the change point will be based on this discrete shorter/longer cue. If it is easier to detect an unexpected sound continuity (60 msec deviant) than discontinuity (30 msec deviant), this could explain the shorter peak latency for the 60 msec than for the 30 msec deviant. Why could a sound offset (a sudden drop in energy) be more difficult to detect? We know that sound offsets are less salient than onsets, and sound-offset responses have been found to be particularly sensitive to age- and disease-related imbalances in excitation-inhibition (reviewed in Kopp-Scheinpflug et al., 2018). In fact, detecting the offset of the sound takes measurable time, as temporal integration continues (auditory persistence; Miller, 1948; Plomp, 1964), and the threshold by which one decides that the sound has terminated must depend on the context. Although the processing of offsets is only partially understood, it has been suggested that offset responses could depend on many parameters by the level of the inferior colliculus including their perceived relevance (Kopp-Scheinpflug et al., 2018).
Our exploration of age and hearing level indicated that the propensity to show latency differences is affected by mild (subclinical) hearing loss, with the relative delay in the latency of the MMN to 30 msec deviant tones increasing with age and hearing loss. With outer hair cell damage, we become more sensitive to loudness cues, a phenomenon referred to as loudness recruitment and characterized by an abnormal amplification of the perception of medium- to high-intensity sounds (Shi et al., 2022; Shiraki et al., 2022). It is plausible that such factors could be at play for the participants in Study 3. The results are certainly consistent with age and hearing loss being factors that complicate the extraction of the cue used to differentiate a 30 msec deviation, but not the cue used to detect the 60 msec deviant. One way to test this hypothesis would be to remove the informativeness of loudness cues altogether by presenting all sounds (short and long) at randomised presentation intensities. This should force the auditory system to detect the change point by tuning to the discrete shorter/longer cue, given that cumulative loudness would no longer bear a monotonic relationship to tone length (i.e., a longer sound presented at lower intensity would not necessarily be louder than a short sound at higher intensity). Under such circumstances, based on the speculation above, we might expect to see an earlier latency for the 60 msec MMN regardless of how the sequence starts. The complex way in which the auditory system tunes to temporal properties could also mean that cues like the regular silent intervals could alter how sounds are processed, these being shorter on average for blocks where the longer sound is common and longer on average for blocks where the shorter sound is common. This regular cue could be removed by jittering sound onset times. However, it is not clear how this cue could explain the order effects seen here. In summary, the addition of data on persons across a broad age range and with differing hearing levels offers a potential insight into explaining the current latency difference effect. However, future studies are needed to explore whether age- and hearing-level-related differences remain, or indeed whether they affect the data differently, when the current sound compositions are heard in the reverse order.
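The logic of the proposed randomised-intensity test can be illustrated numerically. The sketch below is illustrative only and is not part of the reported analyses. It assumes an idealised complete temporal integration, in which the energy of a tone equals its intensity multiplied by its duration, and a hypothetical uniform rove of ±6 dB around a nominal level; neither assumption is taken from the studies reported here.

```python
import math
import random

SHORT_MS, LONG_MS = 30, 60  # tone durations used in the sequences


def energy_level_db(duration_ms, intensity_db):
    """Relative energy level in dB, assuming energy = intensity x duration
    (an idealisation of temporal summation for brief tones)."""
    return intensity_db + 10.0 * math.log10(duration_ms)


# At a fixed presentation intensity the long tone always carries more energy.
fixed_gap_db = energy_level_db(LONG_MS, 0.0) - energy_level_db(SHORT_MS, 0.0)


def fraction_long_not_louder(rove_db, n_pairs=100_000, seed=0):
    """Fraction of random short/long pairs in which the long tone does NOT
    carry more energy, when each tone's intensity is drawn uniformly from
    +/- rove_db (hypothetical roving range)."""
    rng = random.Random(seed)
    reversed_pairs = 0
    for _ in range(n_pairs):
        short = energy_level_db(SHORT_MS, rng.uniform(-rove_db, rove_db))
        long_ = energy_level_db(LONG_MS, rng.uniform(-rove_db, rove_db))
        if long_ <= short:
            reversed_pairs += 1
    return reversed_pairs / n_pairs


print(f"Energy advantage of the long tone at fixed intensity: {fixed_gap_db:.2f} dB")
print(f"Pairs with reversed energy ordering under a 6 dB rove: "
      f"{fraction_long_not_louder(6.0):.2%}")
```

Under these assumptions, doubling the duration adds about 3 dB of energy at a fixed intensity, whereas with roving a substantial minority of short/long pairs reverse that ordering, so cumulative energy no longer reliably signals which tone is longer.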
It should certainly be noted that the control over attention in these studies was light (a request to focus on the movie and not attend to the sounds) and one cannot exclude the possibility that attention may have been drawn to the sounds on occasion. The method is nonetheless consistent with recommendations for this type of research, and there is no reason to suppose that any wandering of attention would apply differentially to the different sound compositions. It is therefore considered unlikely that wandering attention could explain the different patterns of latency in these data. However, attention is believed to play a key role in tuning the precision of internal models in perception (Hohwy, 2012) and could therefore contribute to which feature or attribute dominates in internal models, to the extent that more attention may be drawn to the sounds at the onset of sequences than during later sequence blocks.
The observed tuning of the auditory system in an order-dependent fashion is a noteworthy observation for several reasons. It highlights the marked context-dependence of ERPs and perception. Far from being a passive register of sound, the brain appears to actively shape and tune to the environment, perhaps in a manner that best differentiates the information content within input (Dean et al., 2005). It suggests that internal models are not formed solely on the basis of sound attributes and their probabilistic factors such as likelihoods, as these are equivalent regardless of the order of the sound compositions. The serendipitous finding reported here adds to a substantial literature on how first learning can exercise lasting influences over what we learn from subsequent information (Bruno et al., 2013; Todd et al., 2013; Bulgarelli & Weiss, 2016; Fitzgerald et al., 2018, 2021; Frost et al., 2016, 2018; Kotchoubey, 2014; Todd et al., 2011, 2016, 2020). While we know this occurs, we know less about why. It may reflect a strategic conservation of energy in resting on that which is initially most informative unless sufficiently significant changes occur later to justify further investment of resources to extract superior predictors (Todd et al., 2020). If so, it is perhaps not surprising that measures derived from automatic perceptual inferences in task-free learning environments are prone to this kind of anchoring.

Fig. 1 – Pictorial representation of the sound sequence structures used in Study 1 (A and B), Study 2 (A) and Study 3 (B). (C) Black boxes represent a "Block" of sound during which the 60 msec sound is the common standard (p = .875) and the 30 msec sound is the rare deviant (p = .125), and the grey boxes represent periods where the probabilities are reversed. Each "Sequence" included four blocks of sound alternating between the two compositions without a break between blocks. There was a 1 min silent break between Sequences. All sounds were presented at a regular 300 msec stimulus onset asynchrony (SOA), resulting in a block length of 2.4 min and a sequence length of 9.6 min. Only the data associated with the latter half of a block were used in the analysis to ensure the presence of a clear MMN (as found in the published studies; see text for further discussion). No code was used in statistical analysis and figure generation.
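The timing values in this caption imply a number of further quantities that can be checked with simple arithmetic. The sketch below is illustrative only: the tone counts are derived from the rounded block length and SOA stated in the caption rather than taken from the Methods, and the silent gap is computed on the assumption that it equals the SOA minus the nominal tone duration.

```python
SOA_MS = 300                      # stimulus onset asynchrony
BLOCK_MIN = 2.4                   # block length in minutes
BLOCKS_PER_SEQUENCE = 4
P_STANDARD, P_DEVIANT = 0.875, 0.125
SHORT_MS, LONG_MS = 30, 60

# Tones per block implied by the block length and the SOA.
tones_per_block = round(BLOCK_MIN * 60 * 1000 / SOA_MS)
deviants_per_block = round(tones_per_block * P_DEVIANT)
standards_per_block = tones_per_block - deviants_per_block
sequence_min = BLOCK_MIN * BLOCKS_PER_SEQUENCE


def mean_silent_gap_ms(standard_ms, deviant_ms):
    """Probability-weighted mean silence between a tone's offset and the
    next onset (assumes gap = SOA - nominal tone duration)."""
    return (P_STANDARD * (SOA_MS - standard_ms)
            + P_DEVIANT * (SOA_MS - deviant_ms))


gap_long_standard = mean_silent_gap_ms(LONG_MS, SHORT_MS)
gap_short_standard = mean_silent_gap_ms(SHORT_MS, LONG_MS)

print(tones_per_block, deviants_per_block, standards_per_block, sequence_min)
print(gap_long_standard, gap_short_standard)
```

On these assumptions each block contains 480 tones (60 deviants and 420 standards), and the mean silent gap is 243.75 msec when the 60 msec tone is the standard and 266.25 msec when the 30 msec tone is the standard.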

Fig. 2 – A. The group-averaged deviant and standard responses for the 30 msec (green) and 60 msec tones (orange), together with the resultant deviant-minus-standard difference waveforms, separately for the 30 msec-First-Deviant (n = 17) and the 60 msec-First-Deviant groups (n = 16 in Study 1 and n = 15 in Study 2). The vertical lines at the abscissa indicate the group mean peak latency for MMN. B. The group mean peak latencies for each group as a function of the tone type (30 vs 60 msec), with error bars indicating 95% confidence intervals. C. The MMN peak latencies for each participant as a function of tone type, separately for each group; the density distributions are shown to the right of each plot.
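The two quantities plotted here, the deviant-minus-standard difference waveform and its peak latency, can be illustrated with a minimal sketch. This is not the authors' procedure: the waveforms are synthetic Gaussian deflections, the 100 to 250 msec search window is a hypothetical choice, and all function names are introduced for illustration only. Because MMN is a negative deflection, the peak is taken as the most negative point of the difference waveform within the window.

```python
import math


def gaussian(t_ms, centre_ms, width_ms, amplitude_uv):
    """Synthetic deflection used to build toy ERP waveforms."""
    return amplitude_uv * math.exp(-0.5 * ((t_ms - centre_ms) / width_ms) ** 2)


def difference_wave(deviant, standard):
    """Point-by-point deviant-minus-standard difference waveform."""
    return [d - s for d, s in zip(deviant, standard)]


def peak_latency_ms(times_ms, wave, window_ms):
    """Latency and amplitude of the most negative point inside the window."""
    lo, hi = window_ms
    candidates = [(v, t) for t, v in zip(times_ms, wave) if lo <= t <= hi]
    if not candidates:
        raise ValueError("search window contains no samples")
    value, latency = min(candidates)
    return latency, value


# Toy data sampled every 1 msec from -50 to 400 msec (hypothetical values).
times = list(range(-50, 401))
standard = [gaussian(t, 100, 20, -1.0) for t in times]
# The deviant response is the standard plus an extra negativity at 170 msec.
deviant = [s + gaussian(t, 170, 25, -2.0) for t, s in zip(times, standard)]

diff = difference_wave(deviant, standard)
latency, amplitude = peak_latency_ms(times, diff, (100, 250))
print(f"Peak latency: {latency} msec, amplitude: {amplitude:.2f} microvolts")
```

Restricting the search to a window keeps earlier components that are common to both responses from being mistaken for the MMN peak; the appropriate window would depend on the data.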

Fig. 3 – Scatterplots (n = 63) of the MMN peak latency for the 30 msec tone against hearing level in the right ear (left panel) and age (right panel).

Fig. 4 – A. Deviant and standard responses averaged separately in two groups based on hearing level (n = 35 and 26, for the Good and Moderate hearing level groups, respectively), separately for the 30 msec (green) and 60 msec tones (orange), together with the resultant deviant-minus-standard difference waveforms. The vertical lines at the abscissa indicate the group mean peak latency for MMN in the difference waveforms. B. The group mean peak latencies for each group as a function of the tone type, with error bars indicating 95% confidence intervals. C. The MMN peak latencies for each participant as a function of tone type, separately for each group; the density distributions are shown to the right of each plot.

Juanita Todd: Conceptualisation, Data Curation, Formal Analysis, Funding Acquisition, Methodology, Supervision, Writing – Original Draft. Mattsen Yeark: Data Curation, Data Processing, Project Administration, Writing – review and editing. Paul Auriac: Data Curation, Writing – review and editing. Bryan Paton: Supervision, Writing – review and editing. István Winkler: Conceptualisation, Writing – review and editing.

Table 1 – Spearman rank correlations (based on the 63 participants of Study 2) between the MMN peak latency measures (30 and 60 msec tones) and age (in years) or hearing level (in dB) for the left and right ear. Correlations in parentheses are those corrected for variance attributable to age.
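For readers unfamiliar with age-corrected rank correlations, the sketch below shows one common way to obtain them: a first-order partial correlation computed on ranks. The caption does not state how the correction was performed, so this is an assumption and not a description of the authors' procedure; the function names and the example values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, covariate):
    """Rank correlation of x and y with the covariate partialled out."""
    r_xy = spearman(x, y)
    r_xz = spearman(x, covariate)
    r_yz = spearman(y, covariate)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for five participants (not data from the study).
latency_ms = [150, 162, 158, 171, 180]
hearing_db = [5, 10, 8, 20, 15]
age_years = [22, 30, 35, 48, 60]

print(spearman(latency_ms, hearing_db))
print(partial_spearman(latency_ms, hearing_db, age_years))
```

The partial coefficient is undefined when either variable is perfectly rank-correlated with the covariate, because the denominator is then zero.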