Task difficulty modulates voluntary attention allocation, but not distraction in an auditory distraction paradigm

Keeping task-relevant sensory events in the focus of attention while ignoring irrelevant ones is crucial for optimizing task behavior. This attention-distraction balance might change with the perceptual demands of the ongoing task: while easy tasks might be performed with low attentional effort, difficult ones require enhanced attention. The goal of the present study was to investigate how task difficulty affected the allocation of attention and distractibility in an auditory distraction paradigm. Participants performed a tone duration discrimination task in which tones were occasionally presented at a rare pitch (distracters), and task difficulty was manipulated by the duration difference between short and long tones. Short tones were consistently 200 ms long, while long tones were 400 ms long in the easy and 260 ms long in the difficult condition. Behavioral results and deviant-minus-standard event-related potential (ERP) waveforms suggested similar magnitudes of distraction in both conditions. ERPs without such a subtraction showed that tone onsets were preceded by a negative-going trend, suggesting that participants prepared for tone onsets. In the difficult condition, N1 amplitudes to tone onsets were enhanced, indicating that participants invested more attentional resources. Increased difficulty also slowed down tone offset processing, as reflected by significantly delayed offset-related P1 and N1/N2 waveforms. These results suggest that although task difficulty compels participants to attend to the tones more strongly, this does not significantly affect distraction-related processing.


Introduction
In many everyday tasks, overall performance depends on our ability to keep focusing on potentially task-relevant stimuli while filtering out task-irrelevant ones. In many cases, however, it is impossible to completely ignore distracting sensory events, especially those in the auditory modality. Although it reduces overall performance, being distracted is potentially useful, because it allows the re-assessment of the situation (i.e., whether the ongoing behavior is still optimal or adaptive). It has been suggested that normal functioning is characterized by a balance between processes maintaining a task-oriented attentional focus and processes allowing one to be distracted by task-irrelevant stimulation (Parmentier, 2014; Schröger, 1996; Volosin et al., 2016). Schröger (1997) suggested that a stronger focus on one's task allows one to set a lower threshold which rare, unpredictably occurring task-irrelevant stimuli have to exceed to enter consciousness and trigger distraction. Although this variable threshold concept intuitively captures how task demands may influence the attention-distraction balance when task-relevant and task-irrelevant stimuli are independent and well separable, understanding the processes contributing to the attention-distraction balance in other stimulation arrangements is less straightforward. While the impact of perceptual task demand on distraction is well documented in vision (perceptual load theory: Lavie, 1995; Lavie et al., 2004; Murphy et al., 2017), studies in the auditory modality show mixed results (for a review see Murphy et al., 2017). The goal of the present study was to investigate how the manipulation of perceptual task demands affected the allocation of attention in one of the most widely used paradigms for the study of event-related potential (ERP) correlates of auditory distraction.
Auditory distraction is extensively studied with paradigms in which task-relevant and task-irrelevant aspects of the stimulation are coupled (Schröger and Wolff, 1998b; Escera and Corral, 2003; Escera et al., 1998). That is, although certain aspects of the stimulation are deemed task-irrelevant by the instructions, the stimulation protocol features constant relationships between these aspects and the task-relevant ones, which participants can exploit to improve task performance (Parmentier, 2014; Wetzel et al., 2013). For example, when the task-relevant stimulus aspect is always preceded by a sensory event at the same time interval (i.e., a cue), this predictable temporal relationship can be exploited to speed up the processing of the task-relevant aspect. If distraction is induced by replacing the standard cue (rarely and unpredictably) with a deviant stimulus, then processes subserving performance optimization and processes subserving the prevention of distraction are in conflict: allocating attention to the cue allows better performance on most trials, but it simultaneously lowers the variable threshold for the stimulus presented at the cue position, and thus "opens up" the cognitive system to potential distracters, making them more efficient (i.e., more difficult to ignore).
In the auditory version of the above-mentioned distraction paradigm (Schröger and Wolff, 1998b), participants perform a two-alternative (short or long) tone duration discrimination task in which the standard pitch is occasionally replaced by a deviant one. Although tone durations ranging between 100 and 600 ms have been used, most studies utilized 200 ms for short and 400 ms for long tones, separated by constant stimulus onset intervals of 1100-1600 ms (Berti et al., 2004; Berti and Schröger, 2001; Horváth et al., 2009; Horváth et al., 2008; Roeber et al., 2003a,b, 2005; Schröger et al., 2000; Schröger and Wolff, 1998a; Wetzel et al., 2006). Responses to (both short and long) deviant tones were typically slower and less accurate than those to standards. The deviant-minus-standard ERP difference waveform (pooling ERPs to short and long tone variants) typically exhibited a characteristic chain of deflections (sometimes referred to as the distraction potential; Escera and Corral, 2003) consisting of three distinct parts, which are thought to reflect the stages of distraction. The first (negative) part (peaking between 100 and 200 ms after deviance onset) encompasses several ERP components and ERP effects (N1 effect, mismatch negativity, MMN; for a summary see Garrido et al., 2009; Rinne et al., 2006) reflecting separate auditory change detection processes. This is followed by a positivity (P3a) peaking around 250-350 ms, interpreted as a reflection of attentional orientation (Polich, 2007) or task switching (Barceló et al., 2006; Hölig and Berti, 2010). Finally, the re-orienting negativity (RON) is observable around 400-600 ms, which is speculated to reflect the re-orientation of attention to the task (Schröger and Wolff, 1998b).
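To make the subtraction concrete, the following Python sketch computes a deviant-minus-standard difference waveform from epoched single-electrode data and locates the most negative sample in a latency window, as one might do for the MMN. It is a minimal illustration under stated assumptions, not the pipeline of any of the cited studies: the array layout (trials × time samples, in µV), the function names, and the synthetic data are all hypothetical.

```python
import numpy as np

def difference_wave(deviant_epochs: np.ndarray, standard_epochs: np.ndarray) -> np.ndarray:
    """Deviant-minus-standard difference of trial-averaged ERPs.

    Assumed input layout: (n_trials, n_times) for a single electrode.
    Averaging only one tone variant (e.g. short tones) keeps offset
    responses time-aligned in both averages.
    """
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)

def peak_in_window(wave, times, t_min, t_max, polarity=-1):
    """Latency and amplitude of the extreme sample in [t_min, t_max].

    polarity=-1 finds the most negative sample (e.g. MMN),
    polarity=+1 the most positive one (e.g. P3a).
    """
    mask = (times >= t_min) & (times <= t_max)
    idx = np.argmax(polarity * wave[mask])
    return times[mask][idx], wave[mask][idx]

# Synthetic example (hypothetical data): 500 Hz sampling, -100..700 ms epochs.
times = np.arange(-100, 702, 2)
standard = np.zeros((40, times.size))
extra_negativity = -3.0 * np.exp(-((times - 140) ** 2) / (2 * 20.0 ** 2))
deviant = np.tile(extra_negativity, (10, 1))

diff = difference_wave(deviant, standard)
latency, amplitude = peak_in_window(diff, times, 100, 200, polarity=-1)
print(latency, amplitude)  # 140 -3.0
```

With real data, the window limits and electrode would follow the study's own analysis plan; the synthetic deviant response here simply peaks at 140 ms by construction.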
Several recent studies (Horváth, 2014a,b; Horváth and Winkler, 2010; Volosin et al., 2017) suggested that in such paradigms, direct contrasts between ERPs elicited by tone on- and offsets (Hillyard et al., 1973; Hillyard and Picton, 1978) may provide information on attention allocation and distraction that cannot be readily observed in deviant-minus-standard differences. In the following, we enumerate the attentional processes that may take place in duration discrimination tasks, with special attention to how task demand may affect them.
It seems plausible that performance in two-alternative duration discrimination can be improved by allocating attention to two time points: tone onsets and the time point when the offset of a short tone may occur. In the vast majority of studies utilizing the distraction paradigm, tones are presented with a constant stimulus onset asynchrony (SOA), resulting in a regular pace which allows participants to prepare for the onset of the next tone, the first task-relevant reference point for duration estimation. Preparation is also manifested in slow, negative-going ERPs before tone onsets (Berti and Schröger, 2001; Horváth, 2014b, 2016; Horváth et al., 2017; Volosin and Horváth, 2014). This effect was also described and labeled "stimulus preceding negativity" (SPN) by van Boxtel and Böcker (2004), who suggested that this slow negative-going wave might be regarded as part of the contingent negative variation (CNV; Walter et al., 1964) elicited by the preceding stimulus, reflecting stimulus anticipatory processes. Besides the constant SOA allowing the trial-by-trial structuring of the task, attending to the onset is also useful, because it may allow one to prepare for and attend to the moment when the offset could occur (for the short tone). Numerous studies show that when a cue precedes a task-relevant stimulus by a constant separation between 100 and 400 ms, task performance increases (see below). This is the foreperiod effect (Holender and Bertelson, 1975).
That participants might take advantage of the temporal regularity of the stimulation to focus on onsets is supported by two lines of evidence. First, in studies in which the temporal separation between two events (the first being either a distracter or a cue, and the second being task-related) was manipulated, distraction was reduced or even abolished when the contingency between them was low, or when the foreperiod was not constant (Parmentier, 2014; Volosin et al., 2016; Wetzel et al., 2012). Second, distraction effects have been reported even for deviance magnitudes close to the level of discriminability: in a duration discrimination task applied by Berti, Roeber and Schröger (2004), tones were 200 or 400 ms long, separated by a constant 1300 ms, and pitch deviance could be 1, 3, 5 or 10%. They demonstrated that even a 1% pitch change, which would otherwise hardly be conspicuous, resulted in reaction time prolongation and in the emergence of the distraction potential (Berti et al., 2004). These results support the notion that participants rely on the informative temporal relationships to orient attention to tone onsets and thus improve task performance, but because they attend to these time points, unpredictably occurring deviance at these onsets leads to increased distraction. On the other hand, when the cues are uninformative, and thus the strategic allocation of attention is not possible, the processing of the distracters and distraction will be weaker.
Because the voluntary allocation of attention requires effort (see Kahneman, 1973; Sarter et al., 2006), it is important to note that the mere presence of an opportunity to improve performance by attending to certain events does not mean that it is exploited: the degree to which it is exploited may well depend on other characteristics of the paradigm and of the participants (e.g., motivation). In some paradigms, the effort vs. performance gain ratio might not be attractive, so it may not be "worth" investing the effort (see e.g., Horváth, 2014a,b). An often-used experimental manipulation to compel participants to increase their attentional effort is to make the task perceptually more difficult. Specifically, when duration discrimination is made more difficult by reducing the duration difference between short and long tones, participants may compensate by focusing on tone onsets more strongly. Consequently, one may hypothesize that rare pitch deviants will lead to enhanced distraction effects. Only two studies have assessed this question directly, with somewhat divergent results (Muller-Gass and Schröger, 2007; Sabri et al., 2006). In both studies, task difficulty was manipulated by adjusting the duration difference between the short and long tones. In the study of Muller-Gass and Schröger (2007), a 100 vs. 400 ms duration discrimination was administered in the easy condition, whereas in the difficult condition tone durations were 190 ms and 310 ms (the onset-to-onset interval was 1800 ms). As is typical in such arrangements, performance (hit rate) was lower, and reaction times were longer for rare pitch deviants than for standards; also, more difficult discrimination resulted in lower hit rates and delayed responses, but no interaction was found, that is, task difficulty did not seem to modulate distraction. Task difficulty did not significantly affect distraction-related ERPs either: MMN and P3a amplitudes were comparable between the easy and difficult conditions.
In a similar paradigm, Sabri et al. (2006) utilized tones with 50 vs. 60 ms durations in the difficult and 50 vs. 100 ms in the easy condition (the onset-to-onset interval was 1400 ms). They found the same pattern of behavioral effects as Muller-Gass and Schröger (2007), but in contrast, deviant-minus-standard difference waveforms showed a larger N1 effect, a smaller MMN, and a larger P3a when the task was more difficult. As revealed by the simultaneous fMRI recording, the contrast between trains containing deviants and those consisting solely of standards showed increased activation in right dorsal superior temporal areas. Moreover, when deviant-minus-standard contrasts were compared between the difficult and easy conditions, larger deviance-induced activation was found in the right supratemporal gyrus, the medial part of the right superior frontal gyrus, the middle frontal gyrus and the insula. Sabri and colleagues suggested that the activation of these areas reflects the N1 and P3a components, implying stronger allocation of attention to tone onsets when discrimination was difficult. This enhanced attention, reflected by the N1, presumably amplified bottom-up deviance detection and triggered an attention shift towards these events, as reflected by the P3a (Sabri et al., 2006).
The somewhat diverging results of Muller-Gass and Schröger (2007) and Sabri et al. (2006) might be explained by differences in the onset-offset separations between the two studies. Whereas the SOA was similar in the two studies, tone durations and the duration ratios between short and long tones were very different: Muller-Gass and Schröger (2007) utilized short tones with durations of 100 and 190 ms, whereas in Sabri et al.'s (2006) experiment the offset of the short tones (that is, the time point at which the task-relevant information became available) was at 50 ms. Because reaction times in tasks utilizing cues or warning signals were found to be fastest when cue-target intervals (foreperiods) were around 100-200 ms (Bertelson, 1967; Proctor and Vu, 2003), and 250-400 ms foreperiods resulted in the optimum level of both speed and accuracy (Bertelson, 1967; Los et al., 2001), the 50 ms interval in the study of Sabri et al. (2006) was too short to be utilized for preparing for the task-relevant moment. Based on this, in the study of Muller-Gass and Schröger (2007) participants had the opportunity to temporally structure their deployment of attention (and exploit the foreperiod effect), whereas in Sabri et al.'s (2006) study, structuring attention allocation in such a way was unfeasible, and thus a continuously enhanced attentional focus might have been required for the whole tone duration, resulting in a more demanding task.
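The foreperiod argument can be restated as a simple check on the constant interval between tone onset and the earliest task-relevant offset, that is, the short-tone duration. The sketch below is illustrative only: the 100 ms lower bound is an assumption distilled from the foreperiod ranges cited above rather than a sharp threshold from the literature, and the helper name is hypothetical.

```python
# Short-tone durations (ms) as reported in the text: the constant interval
# between tone onset and the earliest possible task-relevant offset.
SHORT_TONE_MS = {
    "Muller-Gass & Schröger (2007), easy": 100,
    "Muller-Gass & Schröger (2007), difficult": 190,
    "Sabri et al. (2006)": 50,
    "Present study": 200,
}

# Assumption: below roughly 100 ms, a constant foreperiod leaves too little
# time to prepare for the offset (cf. the 100-400 ms ranges cited above).
MIN_USEFUL_FOREPERIOD_MS = 100

def allows_offset_preparation(foreperiod_ms: float,
                              lower_bound_ms: float = MIN_USEFUL_FOREPERIOD_MS) -> bool:
    """Hypothetical helper: is the short-tone offset far enough from onset to prepare for?"""
    return foreperiod_ms >= lower_bound_ms

for label, foreperiod in SHORT_TONE_MS.items():
    verdict = "preparation feasible" if allows_offset_preparation(foreperiod) else "too short"
    print(f"{label}: {foreperiod} ms -> {verdict}")
```

Only the 50 ms interval of Sabri et al. (2006) falls below the assumed bound, which is the contrast the paragraph draws.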
Both studies referred to above used deviant-minus-standard difference waveforms, in which short and long tones are typically averaged together, to investigate attention- and distraction-related processes. Attention effects might also be manifested in ERP waveforms without the need for such a subtraction. Besides the SPN (van Boxtel and Böcker, 2004), which reflects preparation processes as mentioned above, ERPs elicited by tone on- and offsets are important indicators of attentional focus. Tone onsets elicit a tri-phasic positive-negative-positive waveform: P1 peaks around 50 ms and presumably reflects early sensory processing and gating (Boop et al., 1994; Woodman, 2010); N1 peaks around 100-150 ms and, somewhat similarly to MMN (Näätänen and Winkler, 1999), reflects sensory change detection and the triggering of attention orientation (Näätänen, 1982; Näätänen and Picton, 1987); and finally, P2, peaking around 150-200 ms, is probably related to stimulus evaluation processes (Crowley and Colrain, 2004). Although attention impacts the amplitude of all of these components, the vast majority of studies have focused on N1: N1 amplitude enhancement was found for auditory events in the focus of attention (Hansen and Hillyard, 1980; Hillyard et al., 1973; Lange, 2013), while disruptions of the attention set resulted in decreased N1 amplitudes (Horváth, 2014a,b; Horváth and Winkler, 2010; see also Schröger, 1996). In the context of active attention, it is important to add that other negative components may also occur in the 150-250 ms time window, briefly following or overlapping N1. These waveforms are typically elicited at fronto-central sites and are modulated by increased attention. For example, the negative difference (Nd; Hansen and Hillyard, 1980) and the processing negativity (PN; Alho, 1992; Alho et al., 1986) are hypothesized to index processes matching the input to an attentional trace, and N2 reflects stimulus evaluation processes in the context of task relevance (Ritter et al., 1979).
Following tone onsets, the next task-relevant reference point is the offset of the short tone, which indicates the moment when a decision about tone duration can be made. Tone offsets elicit neural responses comparable to those of other task-relevant events, both at the vertex (Davis and Zerlin, 1966; Hari et al., 1987; Hillyard and Picton, 1978) and at temporal sites (T-complex; Wolpaw and Penry, 1975), and these responses differ as a function of task relevance or attention allocation (Horváth, 2014b, 2016; Horváth et al., 2017). The typically used constant short-tone durations of 150-200 ms (foreperiods) allow the most efficient task preparation, as described above. Specifically, in duration discrimination tasks, when no distracting event was present at tone onsets (i.e., for standards), offset-related P1-N1 elicitation was observable for short tones and in the short-minus-long difference waveform as well (Horváth, 2014b, 2016; Horváth et al., 2017). In contrast, this effect was missing for the offsets of deviant tones, suggesting that attention was captured by onset-related deviance and that the attention set was not restored by the time of the tone offsets, nor until the onset of P3a or RON elicitation (Horváth, 2014b). Similar offset-related N1 modulations were demonstrated when a larger amount of attention was allocated to tone offsets, for example in older adults or when the duration difference between short and long tones was small. In such cases, the T-complex might also be dominated by delayed fronto-central components like N1 or N2 (Horváth, 2016; Horváth et al., 2017).
Based on the studies presented above, the goal of the present study was to investigate the effects of task difficulty on distraction and on the processing of tone on- and offsets as reflections of attention allocation. We utilized a paradigm similar to those of Muller-Gass and Schröger (2007) and Sabri et al. (2006), systematically manipulating the perceptual difficulty of the task via the duration difference between short and long tones, but with two important adjustments. First, Sabri et al. (2006) presented much shorter stimuli (50 vs. 60 ms in the difficult condition) than the majority of studies applying the distraction paradigm (100-400 ms), which probably resulted in a more demanding task than in other studies; to make our results more comparable with the literature, we set the duration of short tones to 200 ms in both the easy and the difficult condition. Second, in contrast to Muller-Gass and Schröger (2007), we utilized short tones of identical duration in both conditions, and only the duration of long tones differed; that is, participants performed a 200 vs. 400 ms discrimination task in the easy condition and a 200 vs. 260 ms discrimination task in the difficult condition. This design enables the comparison of responses to physically identical stimuli, that is, short tones, in different contexts (difficulty levels).
The arrangement presented above allowed us to inspect several reflections of attentional processes, such as distraction, preparation and the allocation of attention to task-relevant auditory events. First, we calculated the "distraction potential" (the sequence of N1 enhancement/MMN, P3a, and RON) to make our results comparable with previous studies. However, to avoid contamination by offset-related responses occurring at different latencies for short and long tones, only short tones were averaged. We hypothesized that in the difficult condition, in which the duration difference between short and long tones was smaller, participants would need to focus more strongly on tone onsets, which, in turn, would amplify the processing of deviance. Therefore, deviants were presumed to cause larger distraction, as reflected by poorer behavioral performance for these tones and by enhanced P3a and RON amplitudes. Second, as the constant SOA allows preparation for tone onsets, the steepness of the pre-stimulus ERP, reflecting preparation effects, and the onset-related N1 amplitudes were compared between the easy and difficult conditions. Because a more demanding task requires a stronger focus on tone onsets, steeper pre-stimulus trends and enhanced onset-related N1 amplitudes were expected in the difficult condition. Finally, we compared the latencies of offset-related P1s and N1s between the easy and difficult conditions. In the difficult condition, we expected delayed N1 responses, as well as the elicitation of endogenous components such as PN or N2, reflecting processes that compensate for onset-related distraction and serve task solution. Importantly, because these components are typically elicited in time intervals close to each other, they might overlap.

ERPs
The numbers of epochs remaining after the exclusion of artifact-contaminated events are presented in Table 1. For Bayes Factors, all proportional error rates were < .01%. To allow the comparison of our data with previous studies on the distraction effect, results on the deviant-minus-standard waveforms are presented first. Then, potential preparation and attention effects on ERPs are assessed in the intervals preceding tone onsets, as well as in onset- and offset-related ERP responses. Because the ERPs were aggregated differently across stimuli for these comparisons (see Methods, below), the relevant components and time intervals are highlighted, while the remaining (non-interpretable or irrelevant) parts of the ERPs are faded in the figures.
The conventional derivation of the distraction-related ERP as a deviant-minus-standard waveform for short tones (Fig. 2, right) showed the expected three deflections: MMN peaked at F1 at 134 ms in the easy condition, with a polarity inversion at the mastoids; P3a peaked at Pz at 384 ms; and RON peaked at F4 at 566 ms.
When ERPs are examined without the conventional subtraction (Fig. 2, left; and Fig. 3), a negative pre-stimulus trend is visible first (Fig. 3A and 3B, left column), with maximal steepness at parieto-occipital sites (PO4). Standard tone onsets (Fig. 3, middle column) elicited three distinct ERP peaks: the onset-related N1 (reaching its most negative value at 104 ms at FCz in the easy condition; Fig. 3A, middle column, and Fig. 3B, right) was followed by a P2, then by the beginning of a sustained negative ERP response. Based on the peak latency difference between short- and long-tone-related ERPs, this third, negative waveform was identified as an offset-related negativity, presumably a mixture of N1 and N2 (because the offset was task-relevant) (Fig. 3A, right column). Short-tone offset-related ERPs were characterized by a tri-phasic waveform as well, starting with a P1 (peaking at 324 ms at the P7 electrode for short standards in the easy condition), followed by the N1/N2 mentioned above, and finally, since all tones were task-relevant, by a large parietal positivity (probably a P3b). As task difficulty strongly affected the latency of the offset-related negativity (Fig. 3, right column), defining a suitable window for amplitude analysis would be problematic. Therefore, no amplitude analysis was applied for this component (the latency analysis is described below). It is important to note that while the onset-related N1 was clearly present at temporal electrodes as well (peaking at FT7 at 140 ms), the offset-related N1/N2 could be observed at temporal sites only for easy standards (peaking at FT7 at 248 ms). This is on a par with the findings of Horváth, Gaál and Volosin (2017).

Deviant-minus-standard difference waveforms
For the MMN amplitudes (measured in the deviant-minus-standard difference waveforms at F1), the Type (standard/deviant) × Condition ANOVA revealed significant Type and Condition main effects (… 11; BF = 157.954), suggesting larger negativities to deviants than to standards, and more negative amplitudes in the difficult condition than in the easy condition. The Type × Condition interaction was not significant (F(1, 17) = 1.467, p = .242, η²G = .002; BF = .458).
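Because both factors of the Type × Condition design have only two levels, each effect has one numerator degree of freedom, and its F(1, n − 1) equals the squared one-sample t statistic of a per-participant contrast. The Python sketch below uses that equivalence as an illustrative assumption about how such F values can be obtained; it is not the authors' analysis code, the array layout and function name are hypothetical, and neither generalized eta squared (η²G) nor Bayes factors are computed.

```python
import numpy as np
from scipy import stats

def rm_anova_2x2(data: np.ndarray) -> dict:
    """F tests for a fully within-subject 2 x 2 design via 1-df contrasts.

    data: shape (n_subjects, 2, 2), indexed [subject, A, B], e.g.
    A = Type (standard/deviant), B = Condition (easy/difficult).
    """
    n = data.shape[0]
    contrasts = {
        "A": data[:, 0, :].mean(axis=1) - data[:, 1, :].mean(axis=1),
        "B": data[:, :, 0].mean(axis=1) - data[:, :, 1].mean(axis=1),
        "AxB": (data[:, 0, 0] - data[:, 0, 1]) - (data[:, 1, 0] - data[:, 1, 1]),
    }
    results = {}
    for effect, c in contrasts.items():
        t, p = stats.ttest_1samp(c, 0.0)
        # For 1-df effects, F = t**2 and the two-sided t-test p equals the F-test p.
        results[effect] = {"F": float(t ** 2), "df": (1, n - 1), "p": float(p)}
    return results

# Usage with simulated amplitudes (hypothetical data, 18 participants):
rng = np.random.default_rng(0)
simulated = rng.normal(size=(18, 2, 2))
simulated[:, 1, :] -= 1.0  # deviants more negative than standards
for effect, res in rm_anova_2x2(simulated).items():
    print(effect, res)
```

The same contrast logic applies to the Component × Condition latency ANOVA reported below, where both factors also have two levels.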

ERPs without subtraction
The preparation effect observable in the pre-stimulus interval (Fig. 3, left column) was characterized by the steepness of the signal, defined as the difference between the signal averages in the −250 to −200 ms and −50 to 0 ms intervals. Steepness was maximal at PO4 in both conditions. The paired t-test revealed no significant difference between the easy and difficult conditions (t(17) = 1.034, p = .316; BF = .387), suggesting no substantial differences in preparation for tone onsets in the two conditions.
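The steepness measure amounts to a difference of two window means. In the sketch below, steepness is assumed to be the late-window mean (−50 to 0 ms) minus the early-window mean (−250 to −200 ms), so that a negative-going trend yields a negative value; the sign convention, the half-open window boundaries, and the function name are assumptions, as the text does not specify them. Per-participant values computed this way at PO4 would then enter the paired t-test.

```python
import numpy as np

def prestimulus_steepness(erp: np.ndarray, times: np.ndarray,
                          early=(-250, -200), late=(-50, 0)) -> float:
    """Mean amplitude in the late window minus mean amplitude in the early window.

    Windows are half-open [start, end) in ms; a negative result indicates a
    negative-going pre-stimulus trend.
    """
    early_mask = (times >= early[0]) & (times < early[1])
    late_mask = (times >= late[0]) & (times < late[1])
    return float(erp[late_mask].mean() - erp[early_mask].mean())

# Synthetic example: a linear drift of -0.01 µV/ms before tone onset.
times = np.arange(-300, 1)              # 1 ms resolution, -300..0 ms
drift = -0.01 * (times + 300)           # 0 µV at -300 ms, -3 µV at 0 ms
print(round(prestimulus_steepness(drift, times), 6))  # -2.0
```

Both windows are 50 ms long, so for a linear drift the measure equals the slope multiplied by the 200 ms separation between the window centers.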
The paired t-test of onset N1 amplitudes to standard tones, measured at FCz, showed a significant difference between the easy and difficult conditions (t(17) = 2.503, p = .023; BF = 2.692). Amplitudes were larger (more negative) in the difficult condition, suggesting a stronger attentional focus on tone onsets in the more demanding condition.
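Bayes factors for paired t-tests such as these can be computed from the t statistic and the sample size alone, for example with the JZS (Jeffreys-Zellner-Siow) Bayes factor. The sketch below implements that formula by numerical integration over the prior on g. The Cauchy prior scale r = √2/2 is an assumption (the text does not state which prior was used), so the output need not reproduce the reported values, and the function name and simulated data are hypothetical.

```python
import numpy as np
from scipy import integrate, stats

def jzs_bf10(t: float, n: int, r: float = np.sqrt(2) / 2) -> float:
    """JZS Bayes factor BF10 for a one-sample or paired t-test.

    t: observed t statistic; n: number of participants (pairs);
    r: scale of the Cauchy prior on the standardized effect size (assumed).
    """
    v = n - 1  # degrees of freedom

    def integrand(g: float) -> float:
        a = 1.0 + n * g * r ** 2
        return (a ** -0.5
                * (1.0 + t ** 2 / (a * v)) ** (-(v + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1.0 / (2 * g)))

    marginal_h1, _ = integrate.quad(integrand, 0, np.inf)
    marginal_h0 = (1.0 + t ** 2 / v) ** (-(v + 1) / 2)
    return marginal_h1 / marginal_h0

# Usage with hypothetical per-participant N1 amplitudes (µV):
rng = np.random.default_rng(1)
easy = rng.normal(-4.0, 1.5, size=18)
difficult = easy - 0.8 + rng.normal(0, 1.0, size=18)
t_stat, p_val = stats.ttest_rel(easy, difficult)
print(f"t({easy.size - 1}) = {t_stat:.3f}, p = {p_val:.3f}, "
      f"BF10 = {jzs_bf10(t_stat, easy.size):.3f}")
```

A BF10 below 1 favors the null hypothesis (e.g., BF = .387 for the steepness comparison), and values between 1 and 3 are conventionally regarded as weak evidence for the alternative.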

Effects of task difficulty on tone offset latencies
Offset-related latencies were determined using the jackknife method (Kiesel et al., 2008): P1 latencies were defined between 200 and 450 ms at the pre-determined Cz electrode using a cutoff value of 0 µV, and N1/N2 latencies between 300 and 600 ms at the pre-determined Fz electrode using a cutoff value of −3 µV. Note that in the following, jackknife-adjusted F- and p-values and unadjusted η²G values are reported. The Component × Condition ANOVA indicated significant main effects of Component (F(1, 17) = 138.331, p < .001, η²G = .999) and Condition (F(1, 17) = 49.421, p < .001, η²G = .996), suggesting that N1/N2 occurred later than P1, and that both components were delayed in the difficult condition. More importantly, the Component × Condition interaction was significant (F(1, 17) = 6.538, p = .02, η²G = .906). This interaction implies that the latency delay in the difficult condition was larger for N1/N2 than for P1, and suggests that the attentional processing of tone offsets was affected more than their sensory processing when the discrimination was more difficult.
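The jackknife procedure can be sketched as follows: latencies are measured on leave-one-out grand averages rather than on noisy single-participant waveforms, and F values computed from these latencies are then corrected with the standard jackknife adjustment F_c = F/(n − 1)², from which the adjusted p-value follows. The threshold-crossing criterion with linear interpolation, the function names, and the synthetic data are assumptions for illustration, not the authors' analysis code.

```python
import numpy as np
from scipy import stats

def crossing_latency(wave, times, t_min, t_max, cutoff, direction):
    """First time in [t_min, t_max] at which `wave` reaches `cutoff`.

    direction=+1: first sample >= cutoff (e.g. offset P1, cutoff 0 µV);
    direction=-1: first sample <= cutoff (e.g. offset N1/N2, cutoff -3 µV).
    Linearly interpolates between the straddling samples; NaN if never reached.
    """
    mask = (times >= t_min) & (times <= t_max)
    w, t = wave[mask], times[mask]
    hits = np.nonzero(direction * (w - cutoff) >= 0)[0]
    if hits.size == 0:
        return np.nan
    i = hits[0]
    if i == 0:
        return float(t[0])
    frac = (cutoff - w[i - 1]) / (w[i] - w[i - 1])
    return float(t[i - 1] + frac * (t[i] - t[i - 1]))

def jackknife_latencies(subject_waves, times, **criterion):
    """Latency of each leave-one-out grand average; input shape (n_subjects, n_times)."""
    n = subject_waves.shape[0]
    total = subject_waves.sum(axis=0)
    return np.array([crossing_latency((total - subject_waves[k]) / (n - 1), times, **criterion)
                     for k in range(n)])

def jackknife_adjusted(F, df_effect, df_error, n):
    """Jackknife-corrected F = F / (n - 1)**2 and its p-value."""
    F_c = F / (n - 1) ** 2
    return F_c, stats.f.sf(F_c, df_effect, df_error)

# Synthetic example: four participants whose P1 rises through 0 µV at 300-303 ms.
times = np.arange(200.0, 451.0)
waves = np.array([0.1 * (times - 300 - k) for k in range(4)])
# Leave-one-out crossings near 302, 301.67, 301.33 and 301 ms.
print(jackknife_latencies(waves, times, t_min=200, t_max=450, cutoff=0.0, direction=+1))
```

The leave-one-out latencies vary far less than single-participant latencies would, which is why the uncorrected F values are inflated by a factor of (n − 1)².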

Discussion
The aim of the present study was to investigate how task demand affected the allocation of attention in the auditory modality, as reflected by behavioral performance and ERPs. We administered a distraction paradigm in which participants performed a tone duration discrimination task. Task difficulty was successfully manipulated by reducing the duration difference between tones, that is, the perceptual discriminability in the temporal dimension: a smaller difference between short and long tones (difficult condition) resulted in slower reaction times and worse discrimination accuracy. Rare pitch-deviant tones led to slower and less accurate responses compared to standards, but the magnitude of distraction was not affected by task difficulty. Although the lack of significant MMN, P3a or RON differences mirrors the behavioral results, ERPs without subtraction showed a slow negative prestimulus trend in both conditions and, more importantly, enhanced onset-N1 amplitudes and delayed offset-P1 and N1/N2 latencies in the difficult condition.
The behavioral results are on a par with those of both Muller-Gass and Schröger (2007) and Sabri et al. (2006), who failed to demonstrate a modulation of the distraction effects by higher task demand, indicating that unexpected, rare deviances captured attention independently of the strength of the attentional focus. Due to the relatively modest Bayes Factors, however, no strong conclusions can be drawn regarding these null effects. Beyond the context of the distraction paradigm, the results also fit those of Murphy, Fraenkel and Dalton (2013), who did not find any effects of perceptual load on selective attention in a variety of behavioral auditory tasks, and proposed that when focusing on a single auditory stream or sound feature dimension, one may not be able to exhaust the capacity of the auditory system, so that capacity to process information from other sources still remains. This is also in line with the conclusion of the review by Murphy, Spence and Dalton (2017), according to which perceptual task demand affects the filtering of irrelevant sensory events less strongly in the auditory modality than in vision. The null effect of task difficulty on distraction-related ERPs observable in the deviant-minus-standard waveform corresponds well to the findings of Muller-Gass and Schröger (2007), but contrasts with the results of Sabri et al. (2006), who found a smaller MMN and larger N1 and P3a amplitudes in the difficult condition. A possible cause of these differences from Sabri et al. (2006) might be the method of data analysis: whereas in the present study only physically identical short tones were averaged for the deviant-minus-standard difference waveform, Sabri and colleagues did not separate short tones from long ones. Thus, it is plausible that offset responses, if elicited, had different latencies, which contributed to the MMN and P3a modulations in their study, as previously suggested by Horváth (2014b).
Note that despite the similar results, comparisons with Muller-Gass and Schröger (2007) should be drawn cautiously, as they measured P3a in ERPs to deviants only (short and long tones pooled together) instead of in difference waveforms.
Although only the two studies mentioned above investigated the effect of perceptual load in an auditory oddball paradigm, comparison with studies manipulating working memory load or using intermodal arrangements is also informative. In a previous study utilizing a working memory manipulation identical to that used by Muller-Gass and Schröger (2007), P3a attenuation was found in the deviant-minus-standard waveform (Berti and Schröger, 2003). Similarly, in auditory-visual paradigms (in which participants performed visual tasks, and tones were task-irrelevant), tone-related P3a amplitude was reduced when the primary visual task was more demanding (Harmony et al., 2000; Muller-Gass et al., 2006; SanMiguel et al., 2008). Taken together with the present study, these results suggest that channel separation has a decisive influence on distraction: whereas it might be easier to suppress the irrelevant modality (when performing a visual task), in purely auditory paradigms, distracting (deviant) and task-relevant event features are often parts of the same auditory stimulus and are thus more difficult to ignore. Moreover, working memory load leaves fewer resources for stimulus processing, as attention needs to be divided between the evaluation of the current stimulus and the motor response to the previous one, which might also lead to a modulation of distraction (Muller-Gass and Schröger, 2007).
The ERPs (without the typically used deviant-minus-standard subtraction) showed several effects. First, tones were preceded by a slow, parietally pronounced negative trend in both conditions, suggesting that participants prepared for tone onsets and, possibly, for the time points of the task-relevant offsets. Second, the enhancement of the onset-related N1 amplitude in the difficult condition suggests that attention to tone onsets might have been stronger in that condition. Third, offset-related positive (P1) and negative (N1/N2) waveforms were elicited with a significant delay when the discrimination task was difficult (and were probably overlapped by a P3b in both conditions). While the present enhancement of the onset-related N1 in the difficult condition is in line with the finding of Sabri et al. (2006), it deviates from the results of Muller-Gass and Schröger (2007), who found N1 enhancement only when task difficulty was manipulated by working memory load and not by perceptual demand.
This discrepancy in difficulty-related N1 enhancement may be due to between-study differences in task difficulty. Expressed as the long-minus-short duration difference relative to the short tone duration, the easy condition used 100% both in the present study and in the study of Sabri et al. (2006). The difficult condition used 30% and 20% in these two studies, respectively, whereas Muller-Gass and Schröger (2007) utilized 200% and 63%. That is, the lack of difficulty-related N1 enhancement in the study by Muller-Gass and Schröger (2007) might have been caused by lower overall demands on effort and focus, which allowed both conditions to be solved without substantial differences in effort. Furthermore, the foreperiod effect provides substantial processing benefits only for constant intervals longer than 100 ms. Consequently, the 50 ms duration in the study of Sabri and colleagues (2006) was too short to allow such benefits (see Bertelson, 1967; Los et al., 2001; Proctor & Vu, 2003). In the present study and in that of Muller-Gass and Schröger (2007), the temporal separation between tone onsets and offsets allowed participants to prepare for short tone offsets by allocating attention to that time point. Participants in the study of Sabri et al. (2006), in contrast, could not exploit the foreperiod effect and probably invested enhanced attention throughout the whole task. Depending on task demands, maintaining such a continuous focus may lead to mental fatigue, which has also been associated with N1 enhancement (for a review see McGarrigle et al., 2014). The difficult condition in Sabri et al. (2006) was presumably the most difficult one among the present study and the two referenced studies. The difficulty-related N1 enhancement in their study may therefore have received a contribution from the need for continuous attention deployment.
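For the present study, the quoted percentages are consistent with expressing the long-minus-short duration difference relative to the short tone duration: (400 − 200)/200 = 100% and (260 − 200)/200 = 30%. A plain 200/400 duration ratio would instead give 50%. The minimal Python check below applies this reading, which is our interpretation, to the present study and to the durations of Horváth, Gaál and Volosin (2017) discussed in the next paragraph. The helper name `relative_difference_pct` is hypothetical.

```python
def relative_difference_pct(short_ms, long_ms):
    """Long-minus-short duration difference as a percentage of the short duration."""
    return 100.0 * (long_ms - short_ms) / short_ms


# Present study: short tones were always 200 ms long.
easy_pct = relative_difference_pct(200, 400)       # 100.0
difficult_pct = relative_difference_pct(200, 260)  # 30.0

# Horváth, Gaál and Volosin (2017): 150, 300 and 450 ms short tones vs. 750 ms long tones.
horvath_2017_pct = [relative_difference_pct(s, 750) for s in (150, 300, 450)]
print(easy_pct, difficult_pct, [round(p, 1) for p in horvath_2017_pct])
```

The last value, about 66.7%, corresponds to the 66% quoted for that study.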
For the present study, one could also hypothesize that the onset N1 enhancement in the difficult condition is an artifact rather than a genuine N1 modulation, because the N1 components were superimposed on a slow negative wave starting before stimulus onset. The emergence of this stimulus-preceding negativity (van Boxtel and Böcker, 2004) is in line with previous studies utilizing a constant SOA (Volosin and Horváth, 2014; Horváth et al., 2017). Importantly, however, the steepness of this trend did not differ significantly between the easy and difficult conditions. It also had a parietal scalp distribution (corresponding to the findings of Horváth et al., 2017), whereas the onset N1 peaked at fronto-central sites, as presented in Fig. 3 (panel B). It is therefore more likely that the N1 effect reflects enhanced attention allocated to tone onsets in the difficult condition compared with the easy one.
In line with previous research, offset-related responses in the present study were dominated by extensive negativities, probably a mixture of N1 and N2 waveforms. The significant difficulty-related delay, a well-known characteristic of N2 (Ritter et al., 1979), supports a strong N2 contribution to this waveform. This delay suggests that although participants could allocate their attention to the offsets of short tones, the task-related decision occurred significantly later when duration discrimination was more difficult. Besides the negative components dominating the offset responses, two distinctive positivities were observable as well. First, preceding N1, the offset of short tones elicited a P1 peaking at temporo-parietal sites, presumably reflecting the sensory processing of tone endings (Horváth, 2014b, 2016; Horváth et al., 2017). Like N1/N2, P1 also emerged with a delay in the difficult condition. In a previous study by Horváth, Gaál and Volosin (2017), no P1 latency modulation was present when the duration difference between short and long tones was manipulated (150 ms, 300 ms and 450 ms vs. 750 ms). In that study, the relative duration differences between short and long tones were 400%, 150% and 66%, respectively, which made the task easier. This large duration difference also made it possible to characterize offset responses in short-minus-long difference waveforms. Such waveforms allow a less ambiguous estimation than the one utilized in the present study. Despite this methodological limitation, tone offsets may have been detected by the sensory system but processed more slowly when discrimination was more demanding and onsets required enhanced attention. The significantly larger N1/N2 delay supports the notion that P1 and N2 reflect different processes, presumably sensory and decision-related ones, respectively.
Because these interpretations are largely post hoc, however, they need to be confirmed by further studies.
In summary, task difficulty had no effects on the conventional deviant-minus-standard difference waveforms. ERPs derived without subtraction, however, made it possible to highlight several sensory and cognitive processes involved in performing a duration discrimination task. First, participants relied on regularities in the temporal structure of the stimulation to improve task performance. This is reflected by a negative trend in the pre-stimulus interval, which suggests preparation for tone onsets. Second, when duration discrimination was more difficult, the onset-related N1 was larger and the offset-related P1 and N1/N2 were delayed. These effects reflect enhanced focus on the tones, slower offset detection and slower decision making. These results imply that task demand affects the attention-distraction balance in hearing: attention allocated to tone onsets is enhanced when duration discrimination is difficult, and the processing of tone offsets is slowed as well. Remarkably, however, enhanced attention did not result in substantial changes in the indices of distraction.

Participants
Twenty healthy university students (14 females, 6 males), recruited through a student part-time job agency, participated in the experiment for modest financial compensation. Because of an excessive amount of eye movements, two participants were excluded from further analysis (13 females and 5 males remained; mean age = 22.13 years, SD = 2.47; 5 left-handed). All participants reported normal hearing and no neurological or psychiatric diseases, and had normal or corrected-to-normal vision. They gave written informed consent after the experimental procedure had been explained to them. The study was approved by the United Review Committee for Research in Psychology (Hungary) and was conducted in accordance with the Declaration of Helsinki.

Stimuli and procedure
Participants sat in a comfortable armchair in a dimly lit room and listened to a sequence of binaurally presented complex tones through headphones (Sennheiser HD-600; Sennheiser, Wedemark, Germany). They held a response button in each hand, one assigned to short and the other to long tones. Their task was to discriminate between short and long tones by button presses while ignoring tone frequency. They were instructed to respond as fast and as accurately as possible. Task difficulty was manipulated by varying the duration difference between short and long tones, resulting in two conditions. Short tones were always 200 ms long, while long tones lasted 400 ms in the "easy" condition and 260 ms in the "difficult" condition. Durations included 10 ms linear rise and fall times. In both conditions, short and long tones were presented in random order with equal (50%) probability. On 87.5% of the trials, tones with a base frequency of 880 Hz (standards) were presented. The remaining tones were deviants, presented pseudo-randomly with a higher (932 Hz; 50% of deviants) or lower (830 Hz; 50% of deviants) pitch. Tone order was pseudo-randomized with the constraint that a deviant was always followed by at least two standards. The stimulus onset asynchrony (SOA) was 1500 ms. Tones were generated with Csound version 5.7.11 (www.csounds.com) at a sampling rate of 44100 Hz. Each tone consisted of the base frequency and its second and third harmonics, with relative amplitudes of 70% and 50% of the fundamental, respectively.
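The stimulus parameters can be illustrated with a small synthesis routine. This is not the authors' Csound orchestra. The sampling rate, the harmonic weights and the 10 ms linear ramps follow the text. Zero starting phase for all harmonics, peak normalization and the function name `complex_tone` are assumptions.

```python
import math

FS = 44100                          # sampling rate (Hz), as reported
HARMONIC_WEIGHTS = (1.0, 0.7, 0.5)  # fundamental, 2nd and 3rd harmonic
RAMP_MS = 10.0                      # linear rise and fall time


def complex_tone(f0_hz, duration_ms, fs=FS):
    """Harmonic complex tone with linear onset/offset ramps, peak-normalized to 1.

    Zero starting phase for every harmonic and peak normalization are assumptions.
    """
    n = round(fs * duration_ms / 1000)
    n_ramp = round(fs * RAMP_MS / 1000)
    samples = []
    for i in range(n):
        t = i / fs
        s = sum(w * math.sin(2 * math.pi * (k + 1) * f0_hz * t)
                for k, w in enumerate(HARMONIC_WEIGHTS))
        # Gain rises linearly from 0 over the first n_ramp samples and falls
        # linearly to 0 over the last n_ramp samples (ramps are part of the duration).
        gain = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)
        samples.append(s * gain)
    peak = max(abs(x) for x in samples)
    return [x / peak for x in samples]


standard_short = complex_tone(880, 200)     # 8820 samples
deviant_long_easy = complex_tone(932, 400)  # 17640 samples
```

Because the ramps are counted within the nominal duration, a 200 ms tone has 180 ms of full amplitude.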
Each block started with the instruction "Press a button to start!" (in Hungarian), displayed on a gray background. After participants pressed a button, the block started with a demonstration sequence of short, long, short and long standard tones. The words "SHORT" and "LONG" appeared on the screen simultaneously with the corresponding tones to familiarize participants with the difference between the tone durations. After the demonstration sequence, a 1-s "START" sign allowed participants to prepare for the task, and then the stimulation started immediately. During the task, a small green square was present in the middle of the screen. At the end of each block, participants received feedback about their mean reaction times and hit rates. Blocks were separated by short breaks, and after the 10th block a longer break was available if required. Twenty blocks were administered, half belonging to the "easy" and half to the "difficult" condition. Each block consisted of 128 tones (112 standards and 16 deviants), giving 1120 standard and 160 deviant trials in each condition. One block therefore lasted about 3.2 min (128 × 1.5 s = 192 s). The blocks were presented in an interwoven sequence ("ABBAABBAABB…", where A and B correspond to the "easy" or "difficult" condition), and the type of the starting block was counterbalanced between participants. The first two blocks of each condition served as training and were excluded from further analysis.
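The sketch below shows one way to build a single block that meets these constraints. It contains 128 tones and 16 deviants split evenly between 932 and 830 Hz. Every deviant is followed by at least two standards, and short and long tones occur in equal numbers. The text does not describe the authors' randomization procedure. The unit-shuffling approach, the exact 64/64 duration split and the helper `make_block` are therefore assumptions.

```python
import random

STANDARD_HZ = 880
DEVIANT_HZ = (932, 830)   # higher and lower deviants, half of the deviants each
SHORT_MS = 200
SOA_S = 1.5


def make_block(long_ms, n_tones=128, n_deviants=16, min_standards_after=2, seed=None):
    """Return one block as a list of (frequency_hz, duration_ms) tuples."""
    rng = random.Random(seed)
    # Each deviant plus its obligatory trailing standards forms one unit; units are
    # shuffled among the remaining standards, so the constraint always holds.
    n_free = n_tones - n_deviants * (1 + min_standards_after)
    slots = ["unit"] * n_deviants + ["standard"] * n_free
    rng.shuffle(slots)
    pitches = [DEVIANT_HZ[i % 2] for i in range(n_deviants)]
    rng.shuffle(pitches)
    freqs = []
    for slot in slots:
        if slot == "unit":
            freqs.append(pitches.pop())
            freqs.extend([STANDARD_HZ] * min_standards_after)
        else:
            freqs.append(STANDARD_HZ)
    # Short and long tones in equal numbers, in random order.
    durations = [SHORT_MS] * (n_tones // 2) + [long_ms] * (n_tones - n_tones // 2)
    rng.shuffle(durations)
    return list(zip(freqs, durations))


easy_block = make_block(long_ms=400, seed=1)
difficult_block = make_block(long_ms=260, seed=2)
block_minutes = len(easy_block) * SOA_S / 60   # 3.2
```

Treating each deviant and its two trailing standards as one unit means a generated block never ends on a deviant. The text does not state whether the original sequences shared this property.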

EEG recording
The continuous EEG was recorded at a 500 Hz sampling rate with a Neuroscan Synamp 2 (Compumedics Inc., Victoria, Australia) amplifier. Sixty-one Ag/AgCl electrodes were mounted on an elastic cap (EASYCAP GmbH, Herrsching, Germany) according to the 10-10 system (Nuwer et al., 1998). Two additional electrodes were attached to the mastoids. The reference and ground electrodes were placed at the tip of the nose and on the forehead, respectively. The horizontal electro-oculogram (EOG) was recorded in a bipolar setup between electrodes placed near the outer canthi of both eyes. The vertical EOG was calculated offline as the difference between the Fp1 electrode and an additional electrode below the left eye. The continuous EEG was filtered offline with a 30 Hz low-pass filter (Kaiser-windowed sinc finite impulse response filter, beta of 5.65, 907 coefficients, 2 Hz transition bandwidth, stopband attenuation of at least 60 dB).
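The filter parameters are consistent with Kaiser's standard design formulas. For A = 60 dB, β = 0.1102(A − 8.7) ≈ 5.65. For Δf = 2 Hz at fs = 500 Hz, the estimated filter order is (A − 7.95)/(2.285 · 2π · Δf/fs) ≈ 906, that is, about 907 coefficients. The pure-Python sketch below builds such a windowed-sinc low-pass filter and checks its response. Treating 30 Hz as the centre of the transition band is an assumption. Applying the filter with its (N − 1)/2-sample group delay removed is also an assumption, since the text does not say how the filter was applied. All function names are hypothetical.

```python
import math

FS = 500.0          # EEG sampling rate (Hz)
CUTOFF_HZ = 30.0    # assumed centre of the 2 Hz transition band
NUMTAPS = 907
BETA = 5.65

# Kaiser's design formulas for A = 60 dB attenuation and a 2 Hz transition band.
A_DB = 60.0
beta_est = 0.1102 * (A_DB - 8.7)                              # about 5.65
order_est = (A_DB - 7.95) / (2.285 * 2 * math.pi * 2.0 / FS)  # about 906


def bessel_i0(x, tol=1e-12):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > tol * total:
        k += 1
        term *= (x / (2 * k)) ** 2
        total += term
    return total


def kaiser_lowpass(numtaps=NUMTAPS, cutoff_hz=CUTOFF_HZ, fs=FS, beta=BETA):
    """Kaiser-windowed sinc low-pass FIR coefficients, normalized to unit DC gain."""
    m = (numtaps - 1) / 2
    fc = cutoff_hz / fs                  # cutoff in cycles per sample
    i0_beta = bessel_i0(beta)
    h = []
    for n in range(numtaps):
        x = n - m
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        r = x / m
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / i0_beta
        h.append(ideal * window)
    dc_gain = sum(h)
    return [c / dc_gain for c in h]


def magnitude(h, f_hz, fs=FS):
    """|H(f)| of the FIR filter h at frequency f_hz."""
    w = 2 * math.pi * f_hz / fs
    re = sum(c * math.cos(w * n) for n, c in enumerate(h))
    im = sum(c * math.sin(w * n) for n, c in enumerate(h))
    return math.hypot(re, im)


def filter_zero_phase(x, h):
    """Convolve x with the symmetric, odd-length h and remove its group delay.

    Samples outside x are treated as zeros, so the edges are attenuated.
    """
    delay = (len(h) - 1) // 2
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = i + delay - k
            if 0 <= j < len(x):
                acc += hk * x[j]
        y.append(acc)
    return y


h = kaiser_lowpass()
passband_gain = magnitude(h, 10.0)                       # close to 1
stopband_gain_db = 20 * math.log10(magnitude(h, 32.0))   # well below -50 dB
```

Because the kernel is symmetric, removing the delay yields a zero-phase (non-causal) filter. Peak latencies are therefore not shifted by filtering, which matters for the latency analyses below.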
For the ERP analysis, tones followed by a button press within 320-1200 ms (calculated from tone onset) were selected. Epochs of 1250 ms, including a 250 ms pre-stimulus baseline, were extracted separately for each stimulus type (standard/deviant) and duration (short/long) in each condition (easy/difficult). Epochs with a signal range exceeding 100 µV on any channel (including EOG) were excluded from further analysis; such ranges were typically due to movement or high-amplitude alpha activity. Epochs corresponding to the first four trials of each block and to standards immediately following deviants were also excluded.
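The trial-selection and artifact-rejection rules can be expressed as a small predicate. The sketch assumes zero-based trial indices within a block, and it treats the 320-1200 ms response window and the 100 µV limit as inclusive bounds. The function names are hypothetical.

```python
FS = 500                 # EEG sampling rate (Hz)
EPOCH_START_MS = -250    # epoch starts 250 ms before tone onset
EPOCH_LEN_MS = 1250      # 250 ms baseline + 1000 ms after onset
RT_WINDOW_MS = (320, 1200)
MAX_RANGE_UV = 100.0


def epoch_slice(onset_sample, fs=FS):
    """Return [start, stop) sample indices of the epoch around a tone onset."""
    start = onset_sample + EPOCH_START_MS * fs // 1000
    return start, start + EPOCH_LEN_MS * fs // 1000


def keep_epoch(trial_in_block, is_standard, follows_deviant, rt_ms, epoch_uv):
    """Decide whether one epoch enters the ERP averages.

    trial_in_block: zero-based position of the tone within its block.
    rt_ms: response time from tone onset, or None if there was no response.
    epoch_uv: one list of samples (µV) per channel, EOG channels included.
    """
    if trial_in_block < 4:                    # first four trials of each block
        return False
    if is_standard and follows_deviant:       # standards right after a deviant
        return False
    if rt_ms is None or not RT_WINDOW_MS[0] <= rt_ms <= RT_WINDOW_MS[1]:
        return False
    # Reject if the peak-to-peak range on any channel exceeds the limit.
    return all(max(ch) - min(ch) <= MAX_RANGE_UV for ch in epoch_uv)
```

At 500 Hz, each epoch spans 625 samples, 125 of which precede tone onset.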

Statistical analysis
For the behavioral analysis, only trials with responses between 320 and 1200 ms after tone onset were included, which corresponds to 120-1000 ms after the time point of the potential short tone offset. Other trials were regarded as anticipatory, missed or late-response trials. Since individual reaction times typically show a skewed distribution, each participant was characterized by the median of their reaction times in each condition. d' sensitivity scores were calculated as described by Macmillan and Creelman (1991). Because the different numbers of standard and deviant trials can bias d' scores, the numbers of hits, false alarms, misses and correct rejections were scaled down to match the number of deviant trials and rounded to the nearest integer. Reaction times (for correct-response trials) and d' scores were submitted to Type (standard vs. deviant) × Condition (easy vs. difficult) repeated-measures ANOVAs.
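A minimal sketch of the sensitivity computation follows. It rests on three assumptions that the text does not state: "long" responses to long tones count as hits and "long" responses to short tones as false alarms; standard-trial counts are rescaled proportionally to the number of deviant trials and then rounded; and hit or false-alarm rates of 0 or 1 are corrected with the common 1/(2N) rule. The helper names are hypothetical. The example counts are invented for illustration and are not data from the study. Per-participant median reaction times could be obtained with Python's `statistics.median`.

```python
from statistics import NormalDist


def scale_counts(counts, n_target):
    """Rescale [hits, misses, false alarms, correct rejections] to about n_target trials."""
    total = sum(counts)
    return [round(c * n_target / total) for c in counts]


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate)."""
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    # Assumed correction: rates of 0 or 1 are clamped to 1/(2N) and 1 - 1/(2N).
    hit_rate = min(max(hits / n_signal, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
    fa_rate = min(max(false_alarms / n_noise, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Invented counts for 1120 standard trials, rescaled to the 160 deviant trials.
standard_counts = scale_counts([490, 70, 56, 504], 160)   # [70, 10, 8, 72]
standard_d = d_prime(*standard_counts)
```

After rescaling, standards and deviants share the same number of trials. Any correction for extreme rates then affects both trial types in the same way.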
For the analysis of ERPs, short and long tones were averaged separately for standards and deviants in the easy and difficult conditions. Preparation-, onset- and offset-related effects were estimated and analyzed separately. First, the usual group-averaged deviant-minus-standard difference waveforms were calculated for short tones only, and MMN, P3a and RON were identified in the easy condition. The between-condition difference was thus not used to select the analysis windows and electrodes, which avoids biasing between-condition statistical comparisons (see Luck and Gaspelin, 2017). For each participant, mean amplitudes were measured in 20 ms windows centered on these latencies at the electrodes exhibiting the largest signal and were submitted to Condition × Type ANOVAs.
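The measurement step can be sketched in three parts. First, a deviant-minus-standard difference wave is computed. Second, a peak latency is picked within a search window of the easy-condition group average. Third, each participant's mean amplitude is measured in a 20 ms window centred on that latency. The 500 Hz sampling rate and the −250 ms epoch start follow the text. Treating the window edges as inclusive (11 samples) and the helper names are assumptions.

```python
FS = 500               # Hz
EPOCH_START_MS = -250  # time of the first sample relative to tone onset


def ms_to_index(t_ms, fs=FS, epoch_start_ms=EPOCH_START_MS):
    """Sample index of time t_ms (relative to tone onset) within an epoch."""
    return round((t_ms - epoch_start_ms) * fs / 1000)


def difference_wave(deviant_erp, standard_erp):
    """Deviant-minus-standard difference, sample by sample."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def peak_latency_ms(wave, search_ms, polarity, fs=FS, epoch_start_ms=EPOCH_START_MS):
    """Latency of the largest positive (polarity=1) or negative (-1) value in search_ms."""
    lo, hi = (ms_to_index(t, fs, epoch_start_ms) for t in search_ms)
    idx = max(range(lo, hi + 1), key=lambda i: polarity * wave[i])
    return epoch_start_ms + idx * 1000 / fs


def mean_amplitude(wave, center_ms, width_ms=20, fs=FS, epoch_start_ms=EPOCH_START_MS):
    """Mean amplitude within +/- width_ms/2 of center_ms, edges included."""
    lo = ms_to_index(center_ms - width_ms / 2, fs, epoch_start_ms)
    hi = ms_to_index(center_ms + width_ms / 2, fs, epoch_start_ms)
    segment = wave[lo:hi + 1]
    return sum(segment) / len(segment)
```

In the procedure described above, `peak_latency_ms` would be applied once to the group-average easy-condition difference wave (for example, with `polarity=-1` for MMN). `mean_amplitude` would then be applied to each participant's waves in both conditions at that latency.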
Second, the preparation effect in each condition was estimated as the steepness of the ERP in the pre-stimulus interval. Steepness was calculated as the difference between the amplitudes measured at the beginning of the baseline (mean amplitude between −250 and −200 ms) and at its end (mean amplitude between −50 and 0 ms; see Horváth et al., 2017). It was computed on the average of all trials (that is, short, long, standard and deviant trials) of each condition. The electrode exhibiting maximal steepness in either condition was selected for statistical analysis, and steepness in the easy and difficult conditions was compared by a paired Student's t-test. The between-condition difference was thus not used to select the electrode, which avoids biasing between-condition statistical comparisons.
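The steepness measure and the paired comparison can be sketched as follows. The windows follow the text. The sign convention (end-of-baseline minus beginning, so that a negative-going trend yields a negative value) and the helper names are assumptions.

```python
import math
from statistics import mean, stdev

FS = 500               # Hz
EPOCH_START_MS = -250  # epoch (and baseline) start relative to tone onset


def window_mean(erp, t0_ms, t1_ms, fs=FS, epoch_start_ms=EPOCH_START_MS):
    """Mean amplitude over the half-open interval [t0_ms, t1_ms)."""
    i0 = round((t0_ms - epoch_start_ms) * fs / 1000)
    i1 = round((t1_ms - epoch_start_ms) * fs / 1000)
    return mean(erp[i0:i1])


def steepness(erp):
    """End-of-baseline minus beginning-of-baseline amplitude.

    A negative-going pre-stimulus trend therefore gives a negative value.
    """
    return window_mean(erp, -50, 0) - window_mean(erp, -250, -200)


def paired_t(a, b):
    """Paired Student's t statistic and degrees of freedom for samples a and b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1
```

In use, `steepness` would be computed for each participant and condition at the selected electrode. The two resulting lists would then be passed to `paired_t`.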
Third, we examined potential task-difficulty-related differences in the allocation of attention to the tones. The N1 elicited by tone onsets was identified in the group-average ERPs (averaged over short and long standards) in the easy condition. Amplitudes were then calculated in 20 ms windows centered on the identified latency, at the identified electrode, for standards in both (easy and difficult) conditions and compared by a paired Student's t-test. As above, the between-condition difference was not used to select the electrode, which avoids biasing between-condition statistical comparisons. Additionally, for the behavioral and ERP amplitude analyses, Bayes factors (BF) were calculated using the BayesFactor R package (version 0.9.12-4.2; Morey et al., 2018) with the default prior settings. They were interpreted based on the guidelines provided by Jeffreys (1961) as adapted by Lee and Wagenmakers (2013). The reported Bayes factors compare the alternative with the null hypothesis: values larger than 1 indicate that the observed data are more likely under the alternative than under the null hypothesis.
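The Bayes factor for a paired comparison can be approximated outside R, as sketched below. The sketch implements the JZS Bayes factor of Rouder et al. (2009) with a Cauchy prior of scale r = √2/2 on effect size. We assume this corresponds to the `ttestBF` default (`rscale = "medium"`) in the BayesFactor package. The function integrates numerically over the prior variance g. It is an approximation, not a substitute for the package, and the name `jzs_bf10` is hypothetical.

```python
import math


def jzs_bf10(t, n, r=math.sqrt(2) / 2, steps=20000, log_g_range=(-20.0, 30.0)):
    """Approximate JZS Bayes factor (alternative vs. null) for a one-sample or paired t.

    Midpoint-rule integration over u = log(g) of the Rouder et al. (2009) integrand,
    with effect size prior Cauchy(0, r).
    """
    nu = n - 1

    def integrand(g):
        a = 1 + n * g * r ** 2
        return (a ** -0.5
                * (1 + t ** 2 / (a * nu)) ** (-(nu + 1) / 2)
                * (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g)))

    lo, hi = log_g_range
    du = (hi - lo) / steps
    marginal_h1 = 0.0
    for k in range(steps):
        g = math.exp(lo + (k + 0.5) * du)
        marginal_h1 += integrand(g) * g * du   # dg = g du
    marginal_h0 = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0
```

For example, `jzs_bf10(t, n=18)` would apply to a paired test with the 18 analysed participants. Values above 1 favour the alternative hypothesis, matching the convention described in the text.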
Finally, offset-related P1 and N1/N2 latencies were investigated with the jackknife method as described by Kiesel, Miller, Jolicoeur and Brisson (2008). ERPs to short standards were analyzed in this way, and the latencies were submitted to a Component (P1 vs. N1/N2) × Condition (easy vs. difficult) ANOVA. To calculate latency differences, the Cz electrode was chosen for P1 and the Fz electrode for N1/N2, based on former results of Horváth (2014b, 2016). Previous studies (Horváth, 2014b, 2016; Horváth et al., 2017) used short-minus-long difference waveforms to investigate offset-related responses to short tones, but we did not use this comparison in the present study. In the difficult condition, the time points of short and long tone offsets were separated by only 60 ms (200 vs. 260 ms), which could lead to overlap between the two offset-related waveforms. The 200 ms separation in the easy condition (200 vs. 400 ms) would not cause such overlap. The short-minus-long difference was therefore calculated for easy standards only and is plotted solely for visualization. All statistical tests were conducted in R (version 3.4.1; R Core Team, 2017). Generalized eta-squared (ηG²) effect sizes are also reported (Bakeman, 2005; Olejnik and Algina, 2003).
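The jackknife logic (Kiesel et al., 2008) can be sketched for a single component and the two-level Condition factor. Latencies are measured on leave-one-out grand averages, and the resulting paired t statistic is divided by (n − 1). Equivalently, an F value is divided by (n − 1)², which is how the correction applies to the Component × Condition ANOVA. The peak-latency criterion, the search window and the helper names are assumptions, because the text does not specify the latency criterion.

```python
import math
from statistics import mean, stdev


def grand_average(waves):
    """Sample-by-sample mean of a list of equally long waveforms."""
    return [sum(samples) / len(samples) for samples in zip(*waves)]


def peak_index(wave, lo, hi, polarity=1):
    """Index of the most positive (polarity=1) or negative (-1) sample in [lo, hi)."""
    return max(range(lo, hi), key=lambda i: polarity * wave[i])


def jackknife_latencies(waves, lo, hi, polarity=1):
    """Peak latency (in samples) of each leave-one-out grand average."""
    return [peak_index(grand_average(waves[:k] + waves[k + 1:]), lo, hi, polarity)
            for k in range(len(waves))]


def jackknife_paired_t(waves_a, waves_b, lo, hi, polarity=1):
    """Mean jackknife latency difference (b - a) and the corrected t = t / (n - 1)."""
    la = jackknife_latencies(waves_a, lo, hi, polarity)
    lb = jackknife_latencies(waves_b, lo, hi, polarity)
    d = [b - a for a, b in zip(la, lb)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return mean(d), t / (n - 1)
```

Each leave-one-out average contains n − 1 of the n participants, so jackknifed latencies vary far less than single-participant latencies. The (n − 1) correction compensates for this reduced variance. At 500 Hz, a latency difference in samples converts to milliseconds by multiplying by 2.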