Temporal expectations modulate face image repetition suppression of early stimulus evoked event-related potentials

Repeated exposure to a stimulus leads to reduced responses of stimulus-selective sensory neurons, an effect known as repetition suppression or stimulus-specific adaptation. Several influential models have been proposed to explain repetition suppression within hierarchically-organised sensory systems, with each specifying different mechanisms underlying repetition effects. We manipulated temporal expectations within a face repetition experiment to test a critical prediction of the predictive coding model of repetition suppression: that repetition effects will be larger following stimuli that appear at expected times compared to stimuli that appear at unexpected times. We recorded event-related potentials from 18 participants and mapped the spatiotemporal progression of repetition effects using mass univariate analyses. We then assessed whether the magnitudes of observed face image repetition effects were influenced by temporal expectations. In each trial participants saw an adapter face, followed by a 500 ms or 1000 ms interstimulus interval (ISI), and then a test face, which was the same or a different face identity to the adapter. Participants' expectations for whether the test face would appear after a 500 ms ISI were cued by the sex of the adapter face. Our analyses revealed multiple repetition effects with distinct scalp topographies, extending until at least 800 ms from stimulus onset. An early (158-203 ms) repetition effect was larger for stimuli following surprising, rather than expected, 500 ms ISI durations, contrary to the predictions of the predictive coding model of repetition suppression. During this time window temporal expectation effects were larger for alternating, compared to repeated, test stimuli. Statistically significant temporal expectation by stimulus repetition interactions were not found for later (230-609 ms) time windows.
Our results provide further evidence that repetition suppression can reduce neural effects of expectation and surprise, indicating that there are multiple interactive mechanisms supporting sensory predictions within the visual hierarchy.


Introduction
Living organisms exhibit a remarkable ability to exploit statistical regularities and recurring patterns that occur within sensory environments. Many vertebrate and invertebrate species can rapidly form predictions based on recurring sequences of stimuli, allowing them to anticipate the identities, locations and timing of upcoming events (e.g., Posner, 1980; Turk-Browne et al., 2009; Meyer and Olson, 2011; Hogendoorn and Burkitt, 2018; Nobre and van Ede, 2018). Responses of sensory neurons are also shaped by recent stimulus exposure. Repeated presentation of a stimulus typically leads to reductions in responses of stimulus-selective cortical and subcortical neurons, known as repetition suppression or stimulus-specific adaptation (Desimone, 1996; Movshon and Lennie, 1979).
Repetition suppression (RS) refers to a reduction in a recorded signal of neuronal activity (e.g. firing rate, local field potential amplitude, fMRI BOLD signal change) to repeated compared to unrepeated stimuli (De Baene and Vogels, 2010; for reviews see Grill-Spector et al., 2006; Kohn, 2007; Vogels, 2016; Larsson et al., 2016). RS is not strictly stimulus-specific, but is typically dependent on the physical or perceptual overlap between concurrently-presented stimuli (e.g., Verhoef et al., 2008). Repetition effects have also been reported when recording EEG/MEG (e.g., Caharel et al., 2015; Feuerriegel et al., 2018a). These effects are widely believed to index RS, due to almost ubiquitous findings of suppression (rather than enhancement) of neural responses when using similar experimental designs combined with different recording modalities (e.g., single unit firing rates: Sawamura et al., 2006; local field potentials: De Baene and Vogels, 2010; fMRI BOLD signals: Grill-Spector et al., 1999). Characterising the neural mechanisms that underlie RS is critical for understanding how we detect rare or novel events in our environment (Nelken, 2014; Solomon and Kohn, 2014), and what occurs when this process functions abnormally in neurological and psychiatric disorders (e.g., Naatanen et al., 2014; Kremlacek et al., 2016). Here we will focus on immediate stimulus repetition, as opposed to delayed repetition (i.e., when several intervening stimuli are presented between the first and repeated presentations of a stimulus; see Henson, 2016).
Several conceptual and computational models have been proposed to explain RS. Early models described local mechanisms that influence the rate, duration, and stimulus selectivity of neural responses (Desimone, 1996; Wiggs and Martin, 1998; reviewed in Grill-Spector et al., 2006). More recent models acknowledge that RS operates within hierarchically organised sensory systems, such as the visual system. These newer models emphasize that repetition effects occur within local, recurrently-connected neural networks, and can be propagated across brain regions. There are currently two dominant models of RS, which both focus on response modulations of stimulus-selective excitatory neurons, such as cortical pyramidal neurons which contribute to scalp-recorded EEG.
Normalisation models (Dhruv et al., 2011; Solomon and Kohn, 2014; Kaliukhovich and Vogels, 2016; Whitmire and Stanley, 2016) describe responses of stimulus-selective neurons according to the interplay between excitatory (i.e., driving afferent) input, corresponding to stimulation within classical receptive fields, and divisive normalising inhibitory input from other neurons within the same network. These are similar networks to those specified in normalisation models of attention (e.g., Reynolds and Heeger, 2009). The nature of the divisive normalising input differs by brain region (reviewed in Carandini and Heeger, 2012), for example expressed as inhibitory 'surround' effects in V1 Kohn, 2012a, 2012b) or competitive interactions between feature-selective neurons in extrastriate visual areas, possibly acting via GABAergic interneurons (e.g., Chelazzi et al., 1998; Kaliukhovich and Vogels, 2016). Importantly, the effects of both excitatory and inhibitory inputs can be reduced by stimulus exposure in these models, for example due to spike frequency adaptation, afterhyperpolarisation or rapid synaptic plasticity (Zucker and Regehr, 2002; Fioravante and Regehr, 2011; reviewed in Whitmire and Stanley, 2016; Vogels, 2016). RS can also propagate across visual areas in a feedforward or feedback manner, due to downstream visual areas receiving altered input from adapted neural populations (e.g., Kohn and Movshon, 2003; Dhruv and Carandini, 2014). Such models postulate that a primary function of RS is to increase the salience of novel stimuli by enhancing responses to novel stimuli compared to those seen in the recent past.
Another dominant model of RS is derived from theories of perception based on predictive coding (e.g., Rao and Ballard, 1999; Friston, 2005) and is described in detail in Auksztulewicz and Friston (2016). Predictive coding models describe RS as a reduction of prediction error signals, due to fulfilled perceptual expectations that are weighted toward recently-encountered stimuli. In this model reductions in responses of superficial pyramidal neurons (which signal prediction errors) occur via inhibitory lateral and feedback connections (Friston, 2005), for example via GABAergic inhibitory interneurons (Chu et al., 2003; Wozny and Williams, 2011). A critical component of this model is sensory precision, which reflects the confidence that a system holds regarding its sensory predictions (Feldman and Friston, 2010). Accordingly, prediction errors are weighted by the sensory precision of predictions. Sensory precision can be manipulated by exogenous factors, such as stimulus signal-to-noise ratio, or endogenous factors, such as focused attention, or the expectation that a certain stimulus will appear (Feldman and Friston, 2010; Auksztulewicz and Friston, 2016). According to the predictive coding model of RS, contexts associated with high sensory precision (i.e., attended and/or expected stimuli) are predicted to lead to larger RS, compared with contexts associated with lower precision (i.e., unattended and/or surprising stimuli).
To test and extend normalisation and predictive coding models of RS researchers have assessed whether RS is modulated by attention and expectation. Studies manipulating attention within immediate repetition designs have reported larger fMRI BOLD RS for stimuli of attended (compared to unattended) spatial locations and stimulus categories (Murray and Wojciulik, 2004; Eger et al., 2004; Yi et al., 2006). The N250r ERP face identity repetition effect was also reduced when attention was diverted towards a distractor face (Neumann and Schweinberger, 2009). These effects of attention are congruent with predictive coding accounts of RS, and signify that normalisation models could be extended to describe interactions between attention and RS.
There is also a substantial literature on RS and perceptual expectations (e.g., expectations based on the contextual likelihood that a given stimulus will appear). Summerfield and colleagues (2008) presented pairs of repeated (i.e., AA) or alternating (i.e., AB) faces in each trial, and manipulated across blocks the proportions of trials with face repetitions (60% vs. 20%). They reported larger face identity BOLD RS in the fusiform face area (FFA; Kanwisher et al., 1997) in blocks with higher proportions of repetition trials compared to those with lower proportions. These findings have been replicated several times using fMRI (e.g., Kovács et al., 2012, 2013; de Gardelle et al., 2013; Grotheer and Kovács, 2014; Choi et al., 2017), and have been widely interpreted as increased RS resulting from higher sensory precision in contexts whereby repetitions were expected to occur. However, the analyses used in these experiments confounded additive and interactive effects of RS and expectation (discussed in Grotheer and Kovács, 2015; Feuerriegel et al., 2018a). Other experiments independently manipulated stimulus repetition and expectation using fMRI and electrophysiological recordings, and observed independent expectation and RS effects (Todorovic and de Lange, 2012; Grotheer and Kovács, 2015; Feuerriegel et al., 2018a). Others reported no effects of expectation on firing rates or local field potential amplitudes of macaque inferior temporal neurons Vogels, 2011, 2014), including when attended, task-relevant face stimuli were presented and when expectations influenced behaviour (Vinken et al., 2018). Another study instead observed larger RS for surprising, rather than expected, stimuli (Amado et al., 2016); this pattern of effects is also visible in many of the Summerfield et al. replications (Kovács et al., 2012; de Gardelle et al., 2013; Larsson and Smith, 2012; Grotheer and Kovács, 2014; Choi et al., 2017; reviewed in Kovács and Vogels, 2014).
Findings of larger RS for surprising (rather than expected) stimuli are the opposite pattern to that hypothesized by the predictive coding model (Auksztulewicz and Friston, 2016). However, these findings could potentially be accommodated by assuming interactions between local or feedforward mechanisms of RS (e.g., inherited adaptation or synaptic fatigue) and expectation effects on the same stimulus-selective neurons. If perceptual expectations modulate the gain of neural responses, similar to effects of attention (e.g., Larsson and Smith, 2012), then reduced stimulus-driven input due to RS would attenuate the influence of surprise-related response gain increases. This would lead to smaller expectation/surprise response differences for repeated stimuli (e.g., Feuerriegel et al., 2018b). Such an effect would be analogous to RS decreasing the salience (i.e., sensory precision) of stimulus-evoked responses within predictive coding models (Solomon and Kohn, 2014; Vogels, 2016).
In the current study we designed a different test of the precision-based modulation hypothesis; we manipulated expectations regarding when an upcoming stimulus would appear (i.e., temporal expectations, Nobre et al., 2007; Nobre and van Ede, 2018). There appear to be multiple, widespread effects of temporal expectation in the brain (reviewed in Nobre and van Ede, 2018). In primate visual cortex temporal expectations can increase firing rates and local field potential amplitudes to expected stimuli in V4 and inferior temporal cortex (Ghose and Maunsell, 2002; Anderson and Sheinberg, 2008), and drive increased gamma band oscillations and suppressed alpha-band activity in V1 (Lima et al., 2011), similar to effects of spatial attention (Fries et al., 2008). Temporal expectations can also modulate visual stimulus evoked potentials and amplify the effects of spatial attention on scalp-recorded ERPs (Doherty et al., 2005; Correa et al., 2006). According to predictive coding models of RS, stimuli that appear at expected times are linked to higher sensory precision, and such stimuli should show larger RS than those which appear at unexpected/surprising times (Auksztulewicz and Friston, 2016).
It is currently unclear whether temporal expectations influence stimulus repetition effects in the visual system. Differences in ERP repetition effect magnitudes have been reported for auditory stimuli in oddball designs, which presented streams of stimuli separated by isochronous or random interstimulus intervals (Costa-Faidella et al., 2011; Schwartze et al., 2013; but see experiments 1 and 2 in Tavano et al., 2014). In these experiments repetition effects were operationalised as the additive effects of stimulus repetition and stimulus feature expectations; repeated stimulus tones were expected, whereas unrepeated tones were surprising. Consequently, it is unclear whether temporal expectations modulated RS-specific processes or effects of stimulus feature expectations (e.g. Tavano et al., 2014), which would lead to similar patterns of effects on recorded ERPs.
To provide a more specific test of temporal expectation effects on RS, we presented pairs of repeated and alternating (unrepeated) faces separated by 500 ms and 1000 ms interstimulus intervals (ISIs). We adapted the design of Grotheer and Kovács (2015) to cue participants' expectations for a 500 ms or 1000 ms ISI, depending on the sex of the first face presented in each trial. Unlike previous studies we also balanced expectations for specific stimulus identities, and temporal expectations, across repeated and alternating stimuli. By recording ERPs evoked by repeated and alternating faces we tested whether the N250r repetition effect, which is influenced by feature-based attention (Neumann and Schweinberger, 2009), could also be modulated by temporal expectations. Using mass univariate analyses we could also map the complex spatiotemporal progression of repetition effects (e.g., Feuerriegel et al., 2018a), and test whether earlier or later effects are modulated by temporal expectations. Predictive coding models (e.g. Auksztulewicz and Friston, 2016) hypothesise larger repetition effects for stimuli with expected onset times due to increased precision of sensory predictions.
The predictions of RS models specifying local or feedforward mechanisms (such as inherited adaptation or synaptic fatigue) differ depending on assumptions regarding the relationship between RS and temporal expectation effects in this design. If temporal expectation effects are independent of RS mechanisms, and modulate firing rates of all neurons selective for repeated or alternating test stimulus images, then such models (e.g., the normalisation model, Kaliukhovich and Vogels, 2016) will predict larger repetition effects for stimuli with expected onset times. However, if local or inherited RS serves to reduce the magnitude of expectation effects (as in Feuerriegel et al., 2018b) then we would expect to see larger repetition effects for stimuli with surprising onset times, due to a reduction in surprise-related responses for repeated stimuli.
Fig. 1. Trial diagram and experimental block types. A) In each trial adapter and test stimuli were presented, separated by either a 500 ms or 1000 ms ISI. Test stimuli were 20% larger than adapters. An example of an alternating trial is displayed, in which adapter and test faces are different identities. B) Examples of stimuli presented in repetition, alternation and target trials. C) Trial structures for each block type. In balanced blocks the probability of a 500 ms or 1000 ms ISI duration was 50% each. In cued blocks the probability of a 500 ms or 1000 ms ISI duration varied by the sex of the adapter face. In this example a female adapter face cues a high (75%) probability of a 500 ms ISI, whereas the male adapter face cues a low (25%) probability of a 500 ms ISI. Rep and Alt stand for repeated and alternating (i.e., unrepeated) test stimuli. D) Block order in the experiment. Two balanced blocks were presented before 6 cued blocks.

Participants
Eighteen people (4 males) participated in this experiment (age range 18-32 years, mean age 23.6 ± 4.9). This sample size was determined to be similar to previous ERP studies that have identified temporal attention and face repetition effects (e.g., Correa et al., 2006; Neumann and Schweinberger, 2009; Neumann et al., 2011). All participants were native English speakers and had normal or corrected-to-normal vision, no history of psychiatric or neurological disorders or substance abuse, no history of unconsciousness for greater than 1 min, and had not taken recreational drugs within the last 6 months. All participants were right-handed as assessed by the Flinders Handedness Survey (Nicholls et al., 2013). This study was approved by the Human Research Ethics committee of the University of South Australia.

Stimuli
We took 49 frontal images of faces (24 male, 25 female) from the Karolinska Directed Emotional Faces database (Lundqvist et al., 1998). Examples of stimuli are shown in Fig. 1A. Selected faces were of neutral expression with no facial piercings or hair that occluded the face. We then converted all images to greyscale, and cropped, resized and aligned them, so that the nose was in the horizontal center of the image. We then vertically aligned the eyes of each face, and resized the images so that at a viewing distance of 60 cm stimuli subtended approximately 3.15° × 3.72° of visual angle (134 × 156 pixels). We created test stimuli to be 20% larger than adapter stimuli to minimise low-level or retinal adaptation. We used the SHINE toolbox (Willenbockel et al., 2010) to equate mean pixel intensity, contrast and Fourier amplitude spectra across the images (mean normalised pixel intensity = 0.52, RMS contrast = 0.16). Stimuli were presented against a grey background (normalised pixel intensity = 0.52).

Procedure
Participants sat in a well-lit testing room 60 cm in front of an LED monitor (refresh rate 60 Hz). We presented stimuli via custom scripts written in MATLAB r2014a (The Mathworks, USA) using functions from PsychToolbox v3.0.11 (Brainard, 1997; Kleiner et al., 2007). Behavioural responses were recorded using a one-button response box.
In each trial faces were presented as adapter stimuli (500 ms) and test stimuli (200 ms) separated by either a 500 ms or 1000 ms ISI (Fig. 1A). Adapters were preceded by a fixation cross for 500 ms. The intertrial interval (including the fixation cross duration) varied pseudorandomly between 1100 and 1300 ms (mean duration = 1200 ms). In each block we presented 4 non-target faces (2 male and 2 female) and one target face (either male or female). In non-target trials the adapter stimulus could be any of the four non-target faces, and the test stimulus could either be a repetition of the same face image (repetition trial) or the other face identity of the same gender (alternation trial; see Fig. 1B). The proportion of trials with face repetitions stayed constant at 37.5%. To prevent across-trial immediate repetition effects the adapter face in one trial could not be the same identity as the test face in the previous trial. Each non-target face appeared equally as adapters and tests, and as repetition and alternating trial stimuli.
There were eight experimental blocks in the experiment. Within the experiment we manipulated the probability that the test face would appear after a 500 ms or 1000 ms ISI (see Fig. 1C). In the first two blocks the test face appeared after a 500 ms or 1000 ms ISI with equal (50%) probability (neutral expectation conditions). These were labelled as 'balanced' blocks. In the remaining six blocks the probability of a 500 ms or 1000 ms ISI was cued by the sex of the adapter face. For example, for one participant a female adapter face cued a 75% probability of a 500 ms ISI (expected 500 ms ISI condition) and a 25% probability of a 1000 ms ISI (surprising 1000 ms ISI condition), whereas a male adapter face instead cued a 25% probability of a 500 ms ISI. These blocks are accordingly labelled as 'cued' blocks. Block order is illustrated in Fig. 1D. We counterbalanced the adapter face sex used to cue each ISI probability across participants. When questioned after testing no participants reported awareness of this cued ISI probability manipulation.
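The cued ISI manipulation can be illustrated with a minimal sketch. It assumes that the 75%/25% split is realised as exact proportions within a set of trials for each adapter sex, which the text does not state; the function name, the trial count of 40 and the cue mapping are hypothetical and chosen only for illustration.

```python
import random

def build_isi_schedule(n_trials, p_short, short_ms=500, long_ms=1000, seed=0):
    """Return a shuffled list of ISIs containing an exact proportion of short ISIs.

    Assumption: proportions are exact rather than sampled independently per trial.
    """
    n_short = round(n_trials * p_short)
    isis = [short_ms] * n_short + [long_ms] * (n_trials - n_short)
    random.Random(seed).shuffle(isis)  # seeded so the sketch is reproducible
    return isis

# Hypothetical cue mapping for one participant; it would be reversed for
# participants in the counterbalanced group.
CUE_P_SHORT = {"female": 0.75, "male": 0.25}

# 40 trials per adapter sex is an arbitrary illustrative number.
schedule = {sex: build_isi_schedule(40, p) for sex, p in CUE_P_SHORT.items()}
```

In this sketch a female adapter is followed by a 500 ms ISI on 30 of 40 trials (expected) and by a 1000 ms ISI on the remaining 10 (surprising), with the reverse pattern for male adapters.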
In target trials (25% of all trials) the target face was presented as the test stimulus. We displayed target faces onscreen during the break before each block for participants to memorise. Target faces allocated to each block were counterbalanced across participants. We instructed participants to press a button with their index finger as quickly as possible after seeing the target face (response hands counterbalanced across participants). We considered responses between 200 and 1000 ms from test stimulus onset as correct responses. The number of male and female target faces was equated within balanced and cued block types. Participants completed a short practice block (24 trials) before the main experiment. We presented a separate set of 5 face images during the practice block, which did not appear in the main experiment.
There were 1600 non-target trials in total: 80 trials for each neutral/balanced (50% probability) and surprising (25% probability) ISI condition, and 240 trials for each expected (75% probability) ISI condition. There were 540 target trials: 54 trials for each neutral and surprising ISI target, and 162 trials for each expected ISI target. Participants were allowed self-paced breaks between blocks. The total time required to complete the experiment (excluding breaks) was 94.5 min.
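The trial totals can be checked with a short calculation. The sketch below assumes that each non-target "condition" is one cell of the expectation × ISI × repetition design, and that each target "condition" is one cell of the expectation × ISI design; this reading reproduces the reported totals but is our interpretation, and the variable names are illustrative.

```python
REP_LEVELS = 2  # repeated vs. alternating test stimuli
ISI_LEVELS = 2  # 500 ms vs. 1000 ms

# Trials per cell, as reported in the text.
non_target_per_cell = {"neutral": 80, "surprising": 80, "expected": 240}
target_per_cell = {"neutral": 54, "surprising": 54, "expected": 162}

# Non-target cells are crossed with ISI and repetition; target cells with ISI only.
n_non_target = sum(n * ISI_LEVELS * REP_LEVELS for n in non_target_per_cell.values())
n_target = sum(n * ISI_LEVELS for n in target_per_cell.values())

target_fraction = n_target / (n_target + n_non_target)
expected_share = non_target_per_cell["expected"] / (
    non_target_per_cell["expected"] + non_target_per_cell["surprising"]
)
```

Under these assumptions the counts sum to 1600 non-target and 540 target trials, targets make up roughly a quarter of all trials, and expected cells account for 75% of cued non-target trials at a given repetition level.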

EEG recording and data processing
We recorded EEG from 128 active electrodes using a Biosemi Active Two system (Biosemi, the Netherlands). Recordings were grounded using common mode sense and driven right leg electrodes (http://www.biosemi.com/faq/cms&drl.htm). We added 8 additional channels: two electrodes placed 1 cm from the outer canthi of each eye, four electrodes placed above and below the center of each eye, and two electrodes placed on the left and right mastoids. EEG was sampled at 1024 Hz (DC-coupled with an anti-aliasing filter, −3 dB at 204 Hz). Electrode offsets were kept within ± 50 µV.
We processed EEG data using EEGLab V.13.4.4b (Delorme and Makeig, 2004) and ERPLab V.4.0.3.1 (Lopez-Calderon and Luck, 2014) running in MATLAB r2015a. We first downsampled EEG data to 512 Hz offline. We used a photosensor to measure the timing delay of the video system (10 ms) and shifted stimulus event codes offline to account for this delay. We identified 50 Hz line noise with Cleanline (Mullen, 2012), using a separate 1 Hz high-pass filtered dataset (EEGLab Basic FIR Filter New, zero-phase, finite impulse response, −6 dB cutoff frequency 0.5 Hz, transition bandwidth 1 Hz). We then subtracted the identified line noise from the unfiltered dataset (as recommended by Bigdely-Shamlo et al., 2015). We identified excessively noisy channels by visual inspection (median noisy channels by participant = 1, range 0-4) and excluded these from average referencing and independent components analysis (ICA) procedures. We then re-referenced the data to the average of the 128 scalp channels. We additionally removed one channel (FCz) to correct for the data rank deficiency caused by average referencing. A separate dataset was processed in the same way, except a 1 Hz high-pass filter was applied (filter settings as above) to improve stationarity for the ICA. We then performed ICA on the 1 Hz high-pass filtered dataset (RunICA extended algorithm, Jung et al., 2000) and transferred the resulting independent component information to the unfiltered dataset. We identified and removed independent components associated with ocular and muscle activity, according to guidelines in Chaumon et al. (2015). Following ICA, we interpolated any noisy channels and FCz using the cleaned data (spherical spline interpolation). We then low-pass filtered the EEG data at 30 Hz (EEGLab Basic Finite Impulse Response Filter New, zero-phase, −6 dB cutoff frequency 33.75 Hz, transition bandwidth 7.5 Hz).
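The reported filter parameters are mutually consistent if the −6 dB cutoff is taken to lie at the centre of the transition band, i.e. half a transition bandwidth away from the nominal passband edge. The sketch below only checks this arithmetic; the centred-cutoff convention is an assumption about how the filter was specified, and the function name is hypothetical.

```python
def cutoff_hz(passband_edge_hz, transition_bw_hz, kind):
    """Return the -6 dB cutoff, assuming it sits mid-way through the transition band."""
    half_band = transition_bw_hz / 2.0
    if kind == "highpass":
        # The transition band lies below the passband edge.
        return passband_edge_hz - half_band
    if kind == "lowpass":
        # The transition band lies above the passband edge.
        return passband_edge_hz + half_band
    raise ValueError("kind must be 'highpass' or 'lowpass'")

highpass_cutoff = cutoff_hz(1.0, 1.0, "highpass")   # 1 Hz high-pass, 1 Hz transition band
lowpass_cutoff = cutoff_hz(30.0, 7.5, "lowpass")    # 30 Hz low-pass, 7.5 Hz transition band
```

Under this assumption the 1 Hz high-pass filter has a cutoff of 0.5 Hz and the 30 Hz low-pass filter a cutoff of 33.75 Hz, matching the values given above.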
Data were epoched from −100 to 800 ms relative to test stimulus onset and baseline-corrected using the prestimulus interval. We excluded from analyses epochs containing ±100 µV deviations from baseline, as well as non-target trials containing button press responses.
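A minimal single-channel sketch of the epoching, baseline correction and amplitude-based rejection steps follows. It is not the EEGLab/ERPLab implementation; the function names are hypothetical and the toy epochs are made-up values. It assumes samples fall at integer multiples of 1/512 s relative to stimulus onset, which yields 461 samples per epoch, consistent with the 59,008 comparisons (461 time points × 128 channels) reported for the mass univariate analyses.

```python
import math

SRATE_HZ = 512

def epoch_sample_indices(start_ms, end_ms, srate_hz):
    """Sample indices (relative to stimulus onset at index 0) inside the epoch window."""
    first = math.ceil(start_ms * srate_hz / 1000)
    last = math.floor(end_ms * srate_hz / 1000)
    return list(range(first, last + 1))

def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline (prestimulus) samples."""
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [v - baseline for v in epoch]

def exceeds_threshold(epoch, threshold_uv=100.0):
    """True if any sample deviates from baseline by more than the threshold."""
    return any(abs(v) > threshold_uv for v in epoch)

indices = epoch_sample_indices(-100, 800, SRATE_HZ)
n_baseline = sum(1 for k in indices if k < 0)  # prestimulus samples

# Made-up toy epochs (in microvolts): one clean, one with a large late deflection.
clean = baseline_correct([10.0] * len(indices), n_baseline)
noisy = baseline_correct([10.0] * (len(indices) - 1) + [150.0], n_baseline)
rejected = [exceeds_threshold(e) for e in (clean, noisy)]
```

In practice the same check would be applied across all channels of an epoch, and an epoch would be dropped if any channel exceeded the threshold.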

Statistical analyses

Behavioural data
We compared mean accuracy percentages and reaction times for targets after expected and surprising 500 ms ISIs at the group level, using 20% trimmed means of the within-subject expected/surprise difference scores and 95% confidence intervals derived from the percentile bootstrap method (10,000 bootstrap samples; Efron and Tibshirani, 1994; Wilcox, 2012). For each bootstrap sample the 20% trimmed mean of the difference scores was calculated. From this distribution the values of the 2.5th and 97.5th percentiles were chosen as the edges of the two-tailed 95% confidence interval. This method gives more accurate probability coverage compared to tests based on the arithmetic mean, and is more robust against problems caused by skew and outliers (Wilcox and Keselman, 2003; Wilcox, 2012). Responses to targets following 1000 ms ISIs were not compared across expectation conditions, as even when participants expected a 500 ms ISI, if the target did not appear by 500 ms then it could always be expected to appear after 1000 ms (see Nobre et al., 2007).
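The percentile bootstrap procedure described above can be sketched as follows. This is an illustrative implementation rather than the code used in the study: the function names are hypothetical, the difference scores are made-up numbers, and the choice of which ordered bootstrap values mark the interval edges follows one common convention that we assume here.

```python
import math
import random

def trimmed_mean(values, proportion=0.2):
    """Mean after discarding the lowest and highest `proportion` of sorted values."""
    xs = sorted(values)
    g = math.floor(proportion * len(xs))  # number trimmed from each tail
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)

def percentile_bootstrap_ci(diffs, n_boot=10000, proportion=0.2, alpha=0.05, seed=1):
    """Two-tailed percentile bootstrap CI for the trimmed mean of difference scores."""
    rng = random.Random(seed)
    n = len(diffs)
    boots = sorted(
        trimmed_mean([diffs[rng.randrange(n)] for _ in range(n)], proportion)
        for _ in range(n_boot)
    )
    lower_rank = int(round(alpha / 2 * n_boot))
    # Assumed convention: take the (lower_rank + 1)-th and (n_boot - lower_rank)-th
    # ordered bootstrap estimates as the interval edges.
    return boots[lower_rank], boots[n_boot - lower_rank - 1]

# Made-up within-subject expected-minus-surprising difference scores (e.g., in ms).
toy_diffs = [5, 7, 6, 8, 9, 4, 6, 7, 5, 30]
ci_low, ci_high = percentile_bootstrap_ci(toy_diffs, n_boot=2000)
```

A difference would be treated as statistically significant when the resulting interval excludes zero. Note how the outlying value of 30 in the toy data is removed by the trimming step before each mean is computed.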

Mass univariate ERP analyses of stimulus repetition effects
To characterise the spatiotemporal pattern of face image repetition effects we compared ERPs evoked by all repeated and alternating faces (pooled across temporal expectation and ISI conditions) using paired-samples mass-univariate analyses, with cluster-based permutation tests to correct for multiple comparisons, implemented in the LIMO EEG toolbox V1.4 (Pernet et al., 2011). These cluster-based multiple comparisons corrections were used because they provide weak control over the family-wise error rate while maintaining high sensitivity to detect broadly-distributed effects (Maris and Oostenveld, 2007; Groppe et al., 2011). Paired-samples tests were performed at all time points between -100 and 800 ms at all 128 scalp electrodes (59,008 comparisons) using the paired samples version of Yuen's t-test (Yuen, 1974). Corrections for multiple comparisons were performed using spatiotemporal cluster corrections based on the cluster mass statistic (Bullmore et al., 1999; Maris and Oostenveld, 2007). Paired-samples t-tests were performed using the original data and 1000 bootstrap samples. For each bootstrap sample data from both conditions were mean-centred, pooled and then sampled with replacement and randomly allocated to each condition (bootstrap-t method). For each bootstrap sample all t statistics corresponding to uncorrected p-values of < 0.05 were formed into clusters with any neighbouring such t statistics. Channels considered spatial neighbours were defined using the 128-channel Biosemi channel neighbourhood matrix in the LIMO EEG toolbox (Pernet et al., 2011, 2015). Adjacent time points were considered temporal neighbours. The sum of the t statistics in each cluster is the 'mass' of that cluster. We used the most extreme cluster masses in each of the 1000 bootstrap samples to estimate the distribution of the null hypothesis.
We compared the cluster masses of each cluster identified in the original dataset to the null distribution; the percentile ranking of each cluster relative to the null distribution was used to derive its p-value. We assigned the p-value of each cluster to all members of that cluster. Electrode/timepoint combinations not included in any statistically significant cluster were assigned a p-value of 1.
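The cluster-forming and cluster-mass steps can be illustrated with a simplified sketch for a single channel, where only adjacent time points count as neighbours. This is not the LIMO EEG implementation: it ignores spatial neighbours, does not split clusters by the sign of the t statistic, and takes the null distribution of maximum cluster masses as given. All names and numbers are hypothetical.

```python
def temporal_clusters(t_values, significant):
    """Group adjacent significant time points; return (start, end, mass) tuples."""
    clusters = []
    start = None
    for i, sig in enumerate(significant):
        if sig and start is None:
            start = i  # a new cluster begins here
        at_end = i == len(significant) - 1
        if start is not None and (not sig or at_end):
            end = i if sig else i - 1
            mass = sum(t_values[start:end + 1])  # cluster mass = summed t statistics
            clusters.append((start, end, mass))
            start = None
    return clusters

def cluster_p_value(observed_mass, null_max_masses):
    """Proportion of null maximum cluster masses at least as extreme as the observed one."""
    n_extreme = sum(1 for m in null_max_masses if abs(m) >= abs(observed_mass))
    return n_extreme / len(null_max_masses)

# Made-up t statistics and their uncorrected significance mask (p < 0.05).
toy_t = [0.5, 2.5, 3.0, 0.2, -2.8]
toy_mask = [False, True, True, False, True]
toy_clusters = temporal_clusters(toy_t, toy_mask)

# Made-up maximum cluster masses standing in for the bootstrap null distribution.
toy_null = [1.0, 2.0, 6.0, 3.0]
toy_p = cluster_p_value(toy_clusters[0][2], toy_null)
```

In the full analysis the same logic extends to two dimensions, with the channel neighbourhood matrix defining which electrodes can join a cluster, and every member of a cluster inherits that cluster's p-value.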

Mass univariate ERP analyses of stimulus repetition effects (localiser dataset)
A second mass univariate analysis was conducted on ERPs to test stimuli in balanced blocks (with 50% probability of a 500 ms ISI); this was our localiser dataset to define regions of interest (ROIs) for analysing interactions between stimulus repetition and temporal expectations in the cued blocks. Repeated and alternating stimuli (pooled across 500 ms and 1000 ms ISIs) were compared using cluster-based permutation tests as described above. We used clusters of statistically significant repetition effects identified using data from the balanced blocks to define ROIs for testing for expectation by repetition interactions for responses to expected and surprising stimuli in cued blocks.
While others have used mass univariate analyses of repetition effects to derive ROIs for notionally orthogonal expectation × repetition interaction effects within the same dataset (e.g., Summerfield et al., 2011) we instead chose to use an independent localiser dataset. This is because unequal group-level variances across conditions (e.g., when trial numbers are not balanced across expected and surprising conditions) can lead to inflated false positive rates when defining ROIs using orthogonal contrasts (see Brooks et al., 2017; Kriegeskorte et al., 2009).

Repetition effect ROI mean amplitude analyses
For each positive-going and negative-going repetition effect identified using the localiser dataset, we calculated cluster mean ERP amplitudes for repeated and alternating stimuli, following expected and surprising 500 ms ISIs. Cluster mean amplitudes were calculated as the 20% trimmed mean of all channel/timepoint combinations included within each cluster. Trimmed means were used to minimise effects of skewed distributions and outliers within ROIs, which can influence ROI-averaged measures of neuroimaging data (Friston et al., 2006). Cluster mean amplitudes for each alternating stimulus condition were subtracted from their corresponding expectation-matched repeated stimulus condition, to derive a cluster mean amplitude repetition effect measure.
Repetition effects for stimuli following expected and surprising 500 ms ISIs were then compared using the percentile bootstrap method with 20% trimmed means (Wilcox, 2012). In this analysis framework statistically significant differences in the magnitude of repetition effects for stimuli following expected compared to surprising ISIs are equivalent to a temporal expectation by repetition interaction. The Holm-Bonferroni method (Holm, 1979) was used to correct for multiple comparisons across ROIs. Please note that the calculations for adjusting p-values within this method can produce multiple adjusted p-values that are identical, as in the case of our results below. This ROI mean amplitude-based approach allowed us to reduce the number of statistical comparisons as compared to mass univariate analyses, and identify attention or expectation effects that may slightly differ in latency from test stimulus onset across individuals (as done by Summerfield et al., 2011). In a separate control analysis we assessed repetition effects in the same way for stimuli following 1000 ms ISIs. The timing of these stimuli could always be expected once the ISI extended beyond 500 ms (see Nobre et al., 2007). Differences in repetition effects by ISI duration cue in these analyses would indicate that differences in responses to the adapter cue stimuli, rather than fulfilled or violated temporal expectations, are responsible for repetition by expectation interactions in our experiment.
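The reason that Holm-Bonferroni adjustment can yield identical adjusted p-values is the monotonicity step: each adjusted value must be at least as large as the one before it in the sorted order. A minimal sketch of the standard step-down procedure follows; the function name and the example p-values are made up for illustration and are not results from this study.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value cannot be smaller than
        # that of any comparison with a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Made-up raw p-values for three hypothetical ROIs.
toy_adjusted = holm_adjust([0.01, 0.02, 0.03])
```

Here the raw products are 0.03, 0.04 and 0.03; the monotonicity step raises the last value to 0.04, so the second and third comparisons share the same adjusted p-value.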
In addition to the clusters identified using the localiser data, a further ROI was added, spanning 230-347 ms at bilateral occipitotemporal electrodes P7/8, P9/10, PO7/8, and PO9/10, corresponding to the N250r ERP face repetition effect (Schweinberger et al., 2002). This ROI was defined based on the time window during which repeated faces evoked more negative-going waveforms compared to unrepeated faces at these channels in the localiser data grand-averaged ERPs. This N250r effect is a robust face repetition effect, and is of high interest in our study as it was found to be modulated by attention (Neumann and Schweinberger, 2009). The time range and electrodes selected for N250r analyses are consistent with those of previous studies (e.g. Neumann and Schweinberger, 2009).
We also performed a 3-way expectation by repetition by ROI repeated measures ANOVA using JASP v0.9.1 (JASP Team, https://jaspstats.org/) to assess whether the observed expectation by repetition interaction effects differed across ROIs (i.e., across electrode clusters or time windows). We did not test for the other main effects and 2-way interactions, as these were not of interest in our study (as opposed to exploratory analyses, see Cramer et al., 2016). Greenhouse-Geisser corrections were applied to correct for violations of sphericity. For this analysis the mean amplitudes for negative repetition effect ROIs were multiplied by -1 to make the observed repetition effects consistent in polarity across ROIs. This was done so that similar expectation-related modulations of repetition effects across ROIs (i.e., more positive repetition-alternation differences in positive repetition effect ROIs, and more negative differences in negative repetition effect ROIs) would not lead to spurious 3-way interaction effects.
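The rationale for the polarity alignment can be illustrated with a small numerical sketch. The amplitude values and ROI names below are invented purely for illustration; they are not data from the experiment.

```python
def align_polarity(effects_by_roi, negative_rois):
    """Multiply repetition effects in negative-going ROIs by -1."""
    return {
        roi: {cond: (-value if roi in negative_rois else value)
              for cond, value in conds.items()}
        for roi, conds in effects_by_roi.items()
    }


def expectation_modulation(conds):
    """Surprising minus expected repetition effect for one ROI."""
    return conds["surprising"] - conds["expected"]


# Hypothetical repetition effects (repeated minus alternating, in microvolts).
# In both ROIs the effect is 0.5 larger in magnitude after surprising ISIs.
effects = {
    "positive_roi": {"expected": 1.0, "surprising": 1.5},
    "negative_roi": {"expected": -1.0, "surprising": -1.5},
}

raw = {roi: expectation_modulation(c) for roi, c in effects.items()}
aligned = {roi: expectation_modulation(c)
           for roi, c in align_polarity(effects, {"negative_roi"}).items()}
```

Here `raw` has opposite signs in the two ROIs even though the modulation is equally large in both, which an ANOVA would register as a 3-way interaction. After alignment the two modulations are identical, so only genuine differences between ROIs contribute to the interaction term.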

Mass univariate ERP analyses of adapter stimuli in cued blocks
In our design there is the possibility that differences in repetition effects by temporal expectation status could be driven by differences in response to the adapter stimuli. For example, if stimuli which cue a 500 ms ISI evoke different visual responses to those that cue 1000 ms ISIs, then this may lead to different repetition effects across expectation conditions. To investigate this, we processed and epoched EEG responses to adapter stimuli in cued blocks in the same way as done for test stimuli. We then included these epochs in a mass univariate analysis, with parameters and multiple comparisons corrections as described above, to compare ERPs evoked by adapters that cue 500 ms compared to 1000 ms ISIs. To maximise our ability to detect effects all adapter stimuli in the cued blocks were included, regardless of whether they were followed by a repetition, alternation or target stimulus.

Task performance
We first assessed whether participants were performing the task correctly, and whether the ISI expectation manipulation affected behavioural responses to target trials. Accuracy for detecting and responding to targets, collapsed across conditions, was near ceiling (20% trimmed mean = 99%, range 93-100%). Accuracy did not differ for targets following expected compared to surprising ISIs (trimmed mean accuracy scores ranged between 98% and 100% across conditions). The trimmed mean reaction time to target faces, collapsed across conditions, was 479 ms (range 371-642 ms across participants). There were no statistically significant differences in reaction times to targets after expected compared to surprising ISIs, for targets after either 500 ms or 1000 ms ISIs (trimmed mean difference for 500 ms ISI = −3 ms, 95% CI = [-10.7, 3.5], p = .20; difference for 1000 ms ISI = 1.6 ms, CI = [-6.6, 8.8], p = .34).

Analyses using all non-target trials
To characterise the spatiotemporal pattern of face image repetition effects we conducted mass univariate analyses of ERPs comparing responses to repeated and alternating test stimuli. These analyses revealed four time periods with distinct topographical patterns of stimulus repetition effects (shown in Fig. 2A-C). The earliest repetition effect (labelled as Cluster 1) spanned 162-211 ms from test stimulus onset, during which repeated stimuli evoked more positive waveforms at bilateral occipitotemporal channels and more negative waveforms at frontocentral channels. A later effect (Cluster 2) spanned 228-369 ms, during which waveforms to repeated stimuli were more negative at left occipitotemporal channels and more positive at frontal sites. Cluster 3 spanned 371-619 ms and consisted of more negative-going waveforms to repeated stimuli at bilateral posterior sites centred around Pz (but extending to PO7 and PO8), accompanied by more positive-going waveforms at frontal channels. The last cluster (Cluster 4) spanned 720-800 ms, and had a similar topography to the first cluster, with more positive-going waveforms at occipitotemporal channels to repeated stimuli, and more negative-going waveforms at central channels.
An earlier repetition effect could also be observed in the group-averaged and single-subject ERPs at electrodes PO7/8, during the time window of the P1 component (100-120 ms; Fig. 2C). Although this repetition effect was not statistically significant, the topography of this effect (depicted in Fig. 2B) was highly similar to that found in our recent study (Feuerriegel et al., 2018a).

Analyses using the localiser dataset
We also ran the same face image repetition effect analyses using data only from the balanced blocks, in order to derive ROIs for analyses of temporal expectation by repetition interactions in the cued blocks. We identified 2 statistically significant clusters (displayed in Fig. 3A, C). An early repetition effect (Localiser Cluster 1) was observed spanning 158-203 ms, during which repeated stimuli evoked more positive waveforms at bilateral occipitotemporal channels and more negative waveforms at frontocentral channels (similar to Cluster 1 in Fig. 2B). A later repetition effect (Localiser Cluster 2) spanned 345-609 ms and consisted of more negative-going waveforms to repeated stimuli at posterior sites (centred around Pz) accompanied by more positive-going waveforms at frontal channels (similar to Cluster 3 in Fig. 2B).

ROI mean amplitude analyses
After deriving ROIs using the localiser dataset we then assessed whether repetition effects captured by these ROIs were modulated by temporal expectations. Grand-average ERPs displaying repetition effects for stimuli following expected and surprising 500 ms ISIs are displayed in Fig. 3B. Estimates of ROI-averaged repetition effects for stimuli following expected and surprising 500 ms ISIs are displayed in Fig. 4.
For negative-going repetition effects, we did not find significant differences between repetition effects following expected and surprising 500 ms ISIs (Fig. 4).
In the control analyses of stimuli following 1000 ms ISIs, we did not find differences in repetition effect magnitudes for positive-going repetition effects (trimmed mean [expected − surprising]  Analyses of responses to stimuli following 1000 ms ISIs did not find differences in repetition effect magnitudes by temporal expectation (trimmed mean [expected − surprising] difference = 0.04 µV, CI = [-0.30, 0.43], p = .738, Holm-Bonferroni adjusted p = 1).

Repetition by expectation by ROI ANOVA results
To assess whether the repetition by expectation interaction effects differed across ROIs, a 3-way repetition by expectation by ROI repeated measures ANOVA was performed. The 3-way interaction was statistically significant (F(1.65, 28.1) = 46.02, p < .001), showing that the interaction effects were not consistent across ROIs (i.e., across electrode clusters and time windows). Combined with our ROI-specific analysis results, this suggests that the observed larger repetition effects for surprising stimuli may be particular to the early (158-203 ms) positive repetition effect ROI.
Fig. 2. Results of mass univariate analyses of repetition effects (i.e., repetition minus alternation differences). A) Spatiotemporal map of statistically significant repetition effects. Yuen's t is plotted for each channel/timepoint combination, thresholded by cluster-level statistical significance. B) Scalp maps are displayed for each of 4 identified time windows showing distinct topographies of repetition effects. The top row displays the mean Yuen's t value over the time window of each cluster. The bottom row shows the [repetition − alternation] average amplitude differences over each time window. The repetition effect during the visual P1 time window was not statistically significant, but was plotted for comparison with Feuerriegel et al. (2018a). C) Grand-averaged ERPs to repeated and alternating stimuli (top row) and grand-average and single-subject repetition-alternation difference waveforms (bottom row). Blue shaded areas denote time windows of statistically significant clusters within which the plotted channel was included.

Mass univariate analyses of adapter ERPs in cued blocks
To investigate whether any observed differences in repetition effects were due to differences in adapter-evoked ERPs across expectation conditions, a mass univariate analysis was performed on ERPs evoked by adapter stimuli in the cued blocks. No statistically significant differences were found between adapters that cued 500 ms compared to 1000 ms ISIs.
Fig. 3. Repetition effect clusters derived from the localiser data, and estimates of repetition effects for stimuli with expected and surprising onset times. A) Repetition effects from mass univariate analyses on the localiser dataset. Yuen's t is plotted for each channel/timepoint combination, thresholded by cluster-level statistical significance. B) Grand-average ERPs to repeated and alternating stimuli for expected and surprising 500 ms ISI conditions. Blue shaded areas denote ROI time windows selected for analysis based on localiser data repetition effects and the N250r ROI definition. C) Topographies of localiser-derived ROIs (left column) and mean amplitudes of repetition effects during ROI time windows for expected and surprising stimuli (center and right columns). In maps of localiser ROIs, electrodes included within a positive repetition effect cluster are coloured red, and electrodes included in a negative repetition effect cluster are coloured blue.

Discussion
We comprehensively investigated the spatiotemporal dynamics of temporal expectation effects on RS in the visual system using ERPs. This permitted us to test a core hypothesis of the predictive coding model of RS: that RS will be larger for stimuli with expected, rather than surprising, stimulus onset times (Auksztulewicz and Friston, 2016). We instead found that repetition effects during an early (158-203 ms) time window were larger for stimuli with surprising temporal onsets, providing evidence against this model. Our findings resemble those of previous experiments investigating face identity expectations (e.g., Amado et al., 2016; Feuerriegel et al., 2018b) demonstrating that expectation and surprise effects are reduced by RS. Taken together, these findings suggest that RS, acting via local or feedforward mechanisms (e.g., reduced input or synaptic fatigue), may interact with a variety of expectation effects that influence responses of the same feature-selective neurons within local excitatory-inhibitory circuits. Such cumulative effects over many stimulus-selective neurons would lead to large-scale modulations of neural activity, as measured using EEG and fMRI (e.g., Amado et al., 2016).
We also replicated the complex progression of ERP face image repetition effects reported in our previous study (Feuerriegel et al., 2018a), providing further evidence for a diverse range of RS effects which extend at least until 800 ms from stimulus onset.

Effects of temporal expectation on repetition suppression
We identified an early (158-203 ms) time window during which temporal expectations modulated RS. During this early time window ERP repetition effects were larger for stimuli with surprising (rather than expected) onset times. Follow-up analyses revealed that temporal expectation effects in this ROI were smaller for repeated compared to alternating stimuli. These results are incompatible with the predictive coding model of RS (Auksztulewicz and Friston, 2016) which predicts larger repetition effects following expected onset times due to increased sensory precision. Our results are also similar to findings from studies which manipulated expectations about the identity of upcoming stimuli, which found larger BOLD repetition effects for surprising stimuli in the FFA, the occipital face area and lateral occipital cortex (e.g., Amado et al., 2016; Kovács et al., 2012; Choi et al., 2017). Interactions between expectation effects and RS have not been explicitly described within normalisation models. One hypothesis compatible with the above findings is that RS, resulting from local or feedforward mechanisms as described in normalisation models, reduces effects of perceptual expectation and surprise. This could be achieved through reductions in stimulus-driven input to excitatory neurons (e.g., via inherited adaptation: Larsson et al., 2016; Feuerriegel, 2016), which may also reduce effects of subsequent inhibitory inputs to stimulus-selective neurons during recurrent network activity (e.g., those of inhibitory interneurons: Auksztulewicz and Friston, 2016), which are associated with prediction error minimisation and perceptual expectation. For alternating (i.e., unrepeated) stimuli, feature-selective excitatory neurons may even be disinhibited (e.g., Kaliukhovich and Vogels, 2016) and expectation effects on neural response gain would then be amplified. This would be equivalent to RS modulating salience or sensory precision within predictive coding models (Solomon and Kohn, 2014; Feuerriegel et al., 2018b). Predictive coding mechanisms could potentially utilise locally-generated RS (e.g., via afterhyperpolarisation or synaptic fatigue) in this way, so as to implement a default expectation for recently-seen stimuli, enabling expectation and surprise effects to preferentially signify important changes in the environment.
Expectation effects (expected − surprise differences) are displayed for repeated and alternating stimuli for the positive repetition effect ROI in Localiser Cluster 1, where there was a significant repetition by expectation interaction. Red lines denote 20% trimmed means. Error bars denote 95% confidence intervals. Dots adjacent to error bars represent individual data points for each condition. P-values are displayed for temporal expectation by repetition interaction tests, both uncorrected for multiple comparisons and corrected using the Holm-Bonferroni method. Please note that repetition effect estimates for the neutral expectation conditions may be inflated, as the same data were used to define the ROIs used for estimation of repetition effects (see Kriegeskorte et al., 2009).
The finding of larger repetition effects for surprising stimuli in our experiment may suggest a role for repetition effects in minimising the effects of surprise on behaviour. Surprising stimuli elicit slower responses relative to expected stimuli (Summerfield and de Lange, 2014; Gold and Stocker, 2017), and lead to slower responses to subsequent stimuli (Wessel, 2017; Wessel and Aron, 2017). If repetition effects preferentially reduce effects of surprise (e.g., as in Amado et al., 2016) then this may serve to avoid surprise-related performance deficits. If this is the case, then one would expect to see less response slowing for stimuli following a surprising repeated stimulus, compared to following a surprising unrepeated stimulus. Such modulations of sequential effects could be tested in future behavioural experiments.
Alternatively, our results could also be explained by interactions between repetition suppression and attention-related neural response gain changes. During the early time window, mean amplitudes were more positive for expected compared to surprising alternating stimuli, which may reflect an increase in the visual P2 component evoked by stimuli appearing at expected (and attended) times. However, such modulations were smaller or absent for repeated stimuli. This suggests that increases in firing rates and local field potentials for stimuli appearing at expected times (e.g., Ghose and Maunsell, 2002; Anderson and Sheinberg, 2008; Lima et al., 2011) may have been reduced by RS. This could conceivably occur if stimulus repetition minimises input to stimulus-selective neurons (e.g., via synaptic depression or inherited adaptation), analogous to decreasing visual stimulus contrast (e.g., in Williford and Maunsell, 2006). Reduced input to stimulus-selective neurons would then render the responses of these neurons less affected by attention-related response gain changes (e.g., Lee and Maunsell, 2009; Reynolds and Heeger, 2009). However, we are hesitant to ascribe the observed ROI-averaged amplitude differences to an expectation-related increase of the visual P2 component, given that our ROI predominantly sits between the peaks of the visual N170 and P2 components. These ROI-averaged effects could instead reflect a reduction of negative-going potentials during the time period of the N170, as found for expected (compared to surprising) stimulus identities (Johnston et al., 2016).
Modulations of RS by temporal expectations were not found during the later time windows (spanning 230-609 ms). Although it appeared that the N250r repetition effect was larger for stimuli with expected temporal onsets, this interaction did not survive correction for multiple comparisons. It is possible that the small sample size (n = 18) in our study prevented us from detecting this interaction effect. However, any conclusions remain speculative until this interaction effect can be replicated using a larger independent sample.

Face image repetition effects
Using mass univariate analyses of ERPs we also identified a complex progression of repetition effects spanning 158-800 ms from stimulus onset, with differing topographies across separate time windows (displayed in Fig. 2). We replicated repetition effects reported in our previous study (Feuerriegel et al., 2018a) with regard to the polarity, latency, scalp topography and approximate magnitude of each effect.
Our results provide further evidence for multiple, distinct RS effects that occur over a wide time range following stimulus onset. Many of these effects were detected at electrodes over visual cortex, indicating that RS as measured by fMRI BOLD in the visual system reflects a mixture of early and late effects. If this is the case, it is unclear whether the effects of attention and expectation on BOLD RS (e.g., Eger et al., 2004; Summerfield et al., 2008) reflect early or late modulations of stimulus-evoked responses. Electrophysiological recordings with higher temporal precision (e.g., Todorovic and de Lange, 2012; Neumann and Schweinberger, 2009) will be needed to characterise the timing of these effects. Such timing data will be critical to identify different mechanisms underlying RS, including the relative timing of contributions by each mechanism. Notably, repetition effects in our study also extended beyond the time range of repetition suppression measured from firing rates and local field potential amplitudes in macaques (typically lasting until 300-400 ms from stimulus onset, e.g., Liu et al., 2009; De Baene and Vogels, 2010; Kaliukhovich and Vogels, 2011). While we are not sure of the reason behind this discrepancy, it could suggest that later repetition effects in our experiment reflect activity of distinct populations of neurons compared to earlier effects, and that these populations are separated enough to avoid detection by microelectrodes used in the above nonhuman primate studies.
The earliest (162-211 ms) statistically significant repetition effect resembled reductions of N170 component amplitudes as reported in previous studies (Caharel et al., 2011, 2015). In Caharel et al. (2015) this effect was found only with identical image repetitions or small viewpoint differences between adapter and test faces, and so is likely to index repetition of low-level image features or stimulus shape. We also identified the N250r face identity repetition effect (Schweinberger et al., 2002) spanning 228-369 ms from test stimulus onset. As in our previous study, we identified a mid-latency repetition effect spanning 371-619 ms from stimulus onset. The latency and broad scalp topography across posterior electrodes suggests widespread secondary effects of RS on recurrent local network activity (e.g., Kaliukhovich and Vogels, 2016; Patterson et al., 2013) or consequences of feedforward and feedback interactions across visual areas (e.g., Ewbank et al., 2011). We also replicated a late repetition effect from 720 ms until the end of the 800 ms epoch. Such a late effect is unlikely to result from changes in stimulus-driven afferent input, and may index feedback input into visual areas from frontal regions.
One difference in results compared to our previous study is that we did not find a very early repetition effect (~100 ms, around the time of the visual P1 component) to be statistically significant. However, we did find a similar topography of effects when averaging ERPs over the time window of the P1 in our experiment (Fig. 2B). It is likely that our smaller sample size (n = 18 vs. n = 36) prevented us from detecting this short-duration effect using a cluster-based permutation test (see Groppe et al., 2011).

Caveats
Our research should be interpreted with the following points kept in mind. Firstly, it is important to note that the repetition effects reported here reflect face image repetition rather than face identity repetition. Early repetition effects, due to repetition of low-level features, may have caused later repetition effects, or enhanced their magnitudes via compounding inherited RS effects across brain regions (reviewed in Larsson et al., 2016; Feuerriegel et al., 2016). Studies aiming to isolate identity repetition effects should use different image sizes (Dzhelyova and Rossion, 2014), or different images with minimal overlap of local features (Schweinberger and Neumann, 2016).
Also, it is likely that the late (720-800 ms) repetition effect described here extends beyond 800 ms after stimulus onset. In both the current experiment and our previous study we could not extend our analysis epochs beyond 800 ms from stimulus onset due to paradigm design (as epochs would overlap with the earliest onset times of the fixation cross in the subsequent trial). Future experiments should use longer intertrial intervals to assess whether immediate stimulus repetition effects extend even further in time than described here.
In our study we could not determine whether the modulations of repetition effects in our experiment are due to temporal attention or expectation. Previous studies of temporal attention have also manipulated expectations regarding the temporal onset of task-relevant stimuli (e.g. Miniussi et al., 1999; Griffin et al., 2002) and so far not many experiments have separately manipulated temporal expectation and attention (but see Schwartze et al., 2011, 2013; Paris et al., 2015). While either effect (temporal attention or temporal expectation) yields the same hypotheses according to the predictive coding model of RS, the distinction between expectation and attention will likely be important when developing mechanistic models of RS effects within local neural networks. Future work that orthogonally manipulates temporal attention and expectation will be able to distinguish between these effects.
In addition, because we used the neutral conditions as a localiser dataset, we could not determine whether the temporal expectation effects on ERPs were predominantly due to fulfilled expectations or surprise. Recent work on stimulus identity expectations that included neutral conditions has shown that surprise effects are larger than those of fulfilled expectations (Amado et al., 2016; see Kovacs and Vogels, 2014). Future work should include neutral expectation conditions, and avoid possible confounds of block order effects which are present in our experiment.

Conclusions
The research reported here shows that temporal expectations modulate RS at short latencies (158-203 ms) from stimulus onset, and that RS is larger for stimuli with surprising onset times, at least during the early period of the stimulus evoked response. Our findings provide evidence against the predictive coding model of RS, and support the idea that expectation and surprise effects are reduced by RS in the visual system. Our results suggest that there are multiple, interactive mechanisms which support sensory predictions within the visual hierarchy.