Temporal uncertainty enhances suppression of neural responses to predictable visual stimuli

Contextual information triggers predictions about the content (“what”) of environmental stimuli to update an internal generative model of the surrounding world. However, visual information dynamically changes across time, and temporal predictability (“when”) may influence the impact of internal predictions on visual processing. In this magnetoencephalography (MEG) study, we investigated how processing feature-specific information (“what”) is affected by temporal predictability (“when”). Participants (N = 16) were presented with four consecutive Gabor patches (entrainers) with constant spatial frequency but with variable orientation and temporal onset. A fifth target Gabor was presented after a longer delay and with higher or lower spatial frequency that participants had to judge. We compared the neural responses to entrainers where the Gabor orientation could, or could not, be temporally predicted along the entrainer sequence, and with inter-entrainer timing that was constant (predictable), or variable (unpredictable). We observed suppression of evoked neural responses in the visual cortex for predictable stimuli. Interestingly, we found that temporal uncertainty increased expectation suppression. This suggests that in temporally uncertain scenarios the neurocognitive system invests fewer resources in integrating bottom-up information. Multivariate pattern analysis showed that predictable visual features could be decoded from neural responses. Temporal uncertainty did not affect decoding accuracy for early visual responses, with the feature specificity of early visual neural activity preserved across conditions. However, decoding accuracy was less sustained over time for temporally jittered than for isochronous predictable visual stimuli.
These findings converge to suggest that the cognitive system processes visual features of temporally predictable stimuli in higher detail, while processing temporally uncertain stimuli may rely more heavily on abstract internal expectations.


Introduction
Our interaction with the external environment is largely shaped by our internal expectations (Clark, 2013; Mechelli et al., 2004; Mumford, 1992). In the primary visual cortex, percepts are decomposed into their fundamental features (such as edges, orientations, colours, shapes) and dissociable correlates of such representational properties can be decoded from neural signals (Carlson et al., 2019; Pantazis et al., 2018). These features are the building blocks that determine the perceptual content: "what" we perceive a stimulus to be. Perception of content is largely modulated by higher level processes. Predictive processing theories propose that an internal mental model of the surrounding environment is used to generate inferences about the external causes of the
Studies on visual perception have paid relatively less attention to temporal (when) than to content (what) predictability (Demarchi et al., 2019; Kok et al., 2017; but see Nobre et al., 2007). The effect of temporal jittering has been explored intensively in visual attention studies that focused on the estimation of the spatial location of a stimulus (Coull and Nobre, 1998).
The role of timing in predictive processing in vision has mainly been studied in relation to the perception of objects in motion. The visual system requires a certain amount of time to process incoming sensory information (Blom et al., 2020; Maunsell and Gibson, 1992). Yet, the neurocognitive system must rapidly extrapolate the trajectory of moving objects to expedite actions. Predictive processing accounts propose a compensatory mechanism to support such extrapolations: the system generates predictions about incoming stimuli, which trigger visual responses before the temporal onset of the actual stimulation. Research on the perception of moving objects thus underscores that what and when stimulus properties are strongly interwoven and shape human perception. Here we focus on the modulation of "expectation suppression" effects (Grill-Spector et al., 2006) to evaluate how predictive processing of "what" properties is affected by the manipulation of "when" properties. Specifically, we test the hypothesis that temporal uncertainty enhances predictive processing of stimulus content. We base this hypothesis on the theoretical claim that sensory systems require internal predictions in order to deal with uncertainty in the external environment; increased uncertainty leads to increased reliance on predictive processing (Clark, 2013).
Repetition suppression or, more generally, expectation suppression is a ubiquitous phenomenon in the processing of repeating stimuli and reveals some of the principles of predictive processing. Specifically, in the setting of predictive coding (Rao and Ballard, 1999; Srinivasan et al., 1982) repetition suppression is often cast in terms of a progressive reduction in the amplitude of evoked responses due to a reduction in prediction errors. In other words, as sensory learning furnishes more accurate predictions of predictable stimuli, the prediction error falls and neuronal responses are attenuated (Garrido et al., 2009). However, this suppression rests upon the predictability of successive stimuli, which, itself, has to be inferred by the brain. In predictive coding, this inference is usually thought of in terms of the precision of prediction errors, i.e., an estimate of predictability (Ainley et al., 2016; Auksztulewicz and Friston, 2016; FitzGerald et al., 2015; Haarsma et al., 2020; Kok et al., 2012; Pinotsis et al., 2014; Shipp, 2016; Spratling, 2017; Sterzer et al., 2018).
In brief, predictive coding accounts state that ascending prediction errors are used to update environmental models at higher hierarchical levels, which then supply descending predictions to form prediction errors via the comparison with stimulus information. The degree to which an environmental event is predictable determines the precision or weight assigned to the prediction and prediction error, leading to the notion of precision-weighted prediction errors (Bastos et al., 2012; Clark, 2013; Hohwy, 2013). Mathematically, precision weighting simply affords more weight to more predictable sources of information, thereby enabling precise prediction errors to have more influence on belief updating. Technically, this is sometimes referred to as Kalman gain in Kalman filtering formulations of predictive coding (Rao and Ballard, 1999). Physiologically, this is usually thought to be mediated by changes in postsynaptic gain or excitability of the sort that mediates attentional gain (Feldman and Friston, 2010). According to such a formulation, there are dual determinants of evoked responses; namely, changes in prediction error and changes in precision. To disentangle these determinants, we examined expectation suppression under different levels of predictability. Our hypothesis was that expectation suppression would, itself, be attenuated when certain attributes (in the present study the stimulus timing) were unpredictable. An interesting corollary of precision-weighted prediction errors in the brain is the representational sharpening that accompanies more precise priors and prediction errors (Kok et al., 2012). This underscores our second hypothesis that the ability to decode stimulus attributes from evoked neuronal responses depends upon predictability. We addressed this hypothesis using multivariate analyses and decoding accuracy.
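As an illustration of the precision weighting described above, the scalar Gaussian case can be written out explicitly. This is a textbook sketch rather than the model used in this study; the symbols (prior mean \mu_{\text{prior}}, prior precision \pi_{\text{prior}}, sensory sample y with precision \pi_{y}) are notation introduced here for the example.

```latex
\begin{aligned}
\varepsilon &= y - \mu_{\text{prior}} \\
K &= \frac{\pi_{y}}{\pi_{y} + \pi_{\text{prior}}} \\
\mu_{\text{post}} &= \mu_{\text{prior}} + K\,\varepsilon
\end{aligned}
```

Here K plays the role of the Kalman gain: when the sensory precision \pi_{y} is high relative to the prior precision, K approaches 1 and the prediction error \varepsilon dominates the update; when it is low, K approaches 0 and the estimate stays close to the prior.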
In the present MEG study, participants viewed four consecutive Gabor patches (henceforth entrainers), then had to make a spatial frequency judgment on a fifth target Gabor. Entrainers followed either a predictable or unpredictable sequence of orientations and had either isochronous or jittered onset times. Formally, our design can be thought of as a factorial design with three factors. First, the stimulus factor with four levels of entrainment (entrainers 1 to 4, which are followed by the target; see Fig. 1). The remaining two factors constitute a 2 × 2 design in which the spatial attributes (the Gabor orientation, reflecting the what factor) were or were not predictable, and the stimulus onsets (when factor) were (fixed SOA of 400 ms) or were not (average SOA of 400 ms ± 130 ms) predictable. This enabled us to examine the interaction between expectation suppression and predictability to test the hypothesis that different sources of predictability attenuated expectation suppression. We planned four main analyses. Firstly, we expected neural responses to entrainers to gradually decrease in amplitude across four-element sequences when orientations were predictable due to expectation suppression. Secondly, we hypothesized that if the cognitive system handles temporal uncertainty by increasing its reliance on internal predictions, expectation suppression should be larger for onset-jittered compared to onset-isochronous entrainers. Thirdly, we expected multivariate pattern analysis (MVPA; Pantazis et al., 2018; King et al., 2016; Cichy et al., 2014; Grootswagers et al., 2017) to reveal increasingly high decoding accuracy of entrainer orientation for predictable entrainers, reflecting the brain's incrementally higher reliance on predictions for successive Gabor orientations. Finally, we planned to analyze the time course of entrainer orientation classification results to understand how (both what and when) predictability affects the stimulus specificity of the neural responses.

Participants
From the initial set of twenty participants, we included data from sixteen participants (7 females; age range: 19-31; M = 24.8; SD = 3.6) in our analyses. Two participants were excluded from the study as they did not complete the whole experiment and two more were excluded from the study due to excessive motion artifacts in their data. The ethical committee and the scientific committee of the Basque Center on Cognition, Brain and Language (BCBL) approved the experiment (following the principles of the Declaration of Helsinki). Participants gave written informed consent and were financially compensated. The participants were recruited from the BCBL Participa website (https://www.bcbl.eu/participa/). Participants did not present any neurological or psychological disorders, and had normal or corrected-to-normal vision.

Experimental procedure
A series of Gabor patches with variable orientations and spatial frequencies (measured in cycles per degree [CPD] of visual angle) was presented. Stimuli were back-projected on a screen placed 60 cm from each participant's nasion. The Gabors were presented in the center of the screen on a gray background, covering the central two degrees of the visual field. Each trial began with a fixation cross (black color) followed by four sequential Gabor patches (entrainers), each presented for 200 ms followed by an interstimulus interval showing an empty gray screen. After a longer interstimulus interval, a fifth Gabor (target) was presented for 200 ms. The entrainers had an intermediate spatial frequency (40 CPD), while the target could have either a higher (60 CPD) or lower (20 CPD) spatial frequency. Participants were required to indicate if the target had a higher or lower spatial frequency than the entrainers using a button press.
Fig. 1. Experimental design. A) Orientations and timing of upcoming entrainers and target are predictable (what + when condition). B) Orientations of upcoming entrainers and target are not predictable but the timings are predictable (when condition). C) Orientation of upcoming entrainers and target are predictable but the timing is not predictable (what condition). D) Orientations and timing of upcoming entrainers and target are not predictable (random condition). Abbreviations: E1 - Entrainer 1, E2 - Entrainer 2, E3 - Entrainer 3, E4 - Entrainer 4, ISI - Inter Stimulus Interval.
Four properties of these sequences were experimentally manipulated (Fig. 1): a) the orientation of the target was either horizontal or vertical; b) the spatial frequency of the target was either higher or lower than the spatial frequency of the entrainers; c) the orientation of the target was either predictable based on the orientations of previous entrainers (i.e., clockwise or counterclockwise rotations of either 15° or 30°; e.g., entrainers of 30°, 45°, 60°, 75° and a target of 90°) or unpredictable (a random selection from a set of 15° or 30° rotations; e.g., 30°, 75°, 60°, 45° and a target of 90°); d) the timing of the interstimulus intervals (blank gray screens) between the four entrainers and between the last entrainer (entrainer 4) and the target was either predictable (i.e., fixed interstimulus intervals of 200 ms between entrainers and 600 ms between entrainer 4 and the target) or unpredictable (varying interstimulus intervals ranging between 70 and 330 ms between entrainers and between 370 and 850 ms between entrainer 4 and the target).
Depending on the timing and orientation of the entrainers and target, trials were divided into four conditions (Fig. 1): (i) in the what + when condition, both the timing and the orientations of successive entrainers (and the final target Gabor) were predictable; (ii) in the when condition, timing was predictable but orientations were unpredictable; (iii) in the what condition, successive entrainers and target orientations were predictable but timing was unpredictable; (iv) in the random condition, both orientations and timing were unpredictable.
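The four conditions can be made concrete with a small trial generator. The following is a minimal sketch, not the presentation script used in the study: the function name `make_trial` is hypothetical, and two details are assumptions that the text does not specify, namely that unpredictable orientation sequences are a reshuffling of the same four orientations and that jittered intervals are drawn uniformly within the stated ranges.

```python
import random


def make_trial(what_predictable, when_predictable, rng=random):
    """Return (orientations_deg, isis_ms) for four entrainers plus the target."""
    target = rng.choice([0, 90])      # horizontal (0) or vertical (90) target
    step = rng.choice([15, 30])       # rotation step in degrees
    direction = rng.choice([-1, 1])   # rotation direction across the sequence

    # Ordered entrainers rotate by a constant step and end one step before the target.
    entrainers = [(target - direction * step * k) % 180 for k in (4, 3, 2, 1)]
    if not what_predictable:
        # Assumption: unpredictable sequences reuse the same orientations
        # in a shuffled order that differs from the ordered one.
        ordered = list(entrainers)
        while entrainers == ordered:
            rng.shuffle(entrainers)

    if when_predictable:
        isis = [200, 200, 200, 600]   # fixed ISIs given in the text
    else:
        # Assumption: jittered ISIs are drawn uniformly within the stated ranges.
        isis = [rng.uniform(70, 330) for _ in range(3)] + [rng.uniform(370, 850)]
    return entrainers + [target], isis
```

In this sketch, `make_trial(True, True)` corresponds to the what + when condition, `make_trial(False, True)` to when, `make_trial(True, False)` to what, and `make_trial(False, False)` to random.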
A total of 160 trials were presented in each condition (80 horizontal and 80 vertical targets, randomly assigned 80 high and 80 low spatial frequencies) for a total of 640 trials per participant. In addition, 80 localizer trials for horizontal and vertical targets were acquired while participants simply fixated on the center of the screen.
On each trial, participants had to indicate whether the target had a higher or a lower spatial frequency [CPD] than the preceding entrainers. Participants responded by pressing a button with their left or right hand, with the response hand counterbalanced across participants. A short optional break (participants pressed the button when they were ready to continue) was available after every 12 trials, and a longer mandatory break took place every 60 trials (the MEG researcher pressed a button from the operating console to pause and restart the presentation).

Data acquisition and preprocessing
MEG data were acquired in a magnetically shielded room using the whole-scalp MEG system (Elekta-Neuromag, Helsinki, Finland) installed at the BCBL (http://www.bcbl.eu/bcbl-facilitiesresources/meg/). The system is equipped with 102 sensor triplets (each comprising a magnetometer and two orthogonal planar gradiometers) uniformly distributed around the head of the participant. Head position inside the helmet was continuously monitored using four Head Position Indicator (HPI) coils. The location of each coil relative to the anatomical fiducials (nasion, left and right preauricular points) was defined with a 3D digitizer (Fastrak Polhemus, Colchester, VA, USA). This procedure is critical for head movement compensation during the data recording session. Digitization of the fiducials plus ~300 additional points evenly distributed over the scalp of the participant were used during subsequent data analysis to
Continuous MEG data were pre-processed off-line using the temporal Signal-Space-Separation (tSSS) method (Taulu & Simola, 2006), which suppresses external electromagnetic interference. MEG data were also corrected for head movements, and bad channel time courses were reconstructed in the framework of tSSS. Subsequent analyses were performed using Matlab R2014b (Mathworks, Natick, MA, USA).

Behavioural data
Behavioural responses to the spatial frequency [CPD] task (on the target) were evaluated in terms of accuracy and Response Times (RTs) for all four conditions (what + when, when, what, and random). Trials with response times longer than 1500 ms were considered to be outliers and were removed from the analysis. The mean RT and standard deviation were computed for each experimental condition.
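The outlier rule and per-condition summary described above amount to a few lines of code. This is an illustrative sketch with a hypothetical function name and input layout (a dictionary mapping each condition to a list of RTs in ms); it assumes the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev

RT_CUTOFF_MS = 1500  # outlier threshold stated in the text


def summarize_rts(rts_by_condition):
    """Return {condition: (mean_rt, sd_rt, n_removed)} after outlier removal."""
    summary = {}
    for condition, rts in rts_by_condition.items():
        # Keep only trials at or below the cutoff.
        kept = [rt for rt in rts if rt <= RT_CUTOFF_MS]
        # Assumption: sample (n - 1) standard deviation.
        summary[condition] = (mean(kept), stdev(kept), len(rts) - len(kept))
    return summary
```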

Sensor level event-related fields (ERFs)
MEG trials were corrected for jump and muscle artifacts using standard automated scripts based on the Fieldtrip toolbox (Oostenveld et al., 2011) implemented in MATLAB 2014B. Heartbeat and EOG artifacts were identified using Independent Component Analysis (ICA) and linearly subtracted from the MEG recordings. The ICA decomposition (30 components extracted per participant) was performed using the FastICA algorithm. ICA components maximally correlated with EOG and ECG recordings were automatically removed. On average, two components were removed per participant. The artifact-free data were bandpass filtered between 0.5 and 45 Hz. Trials were segmented time-locked to each of the entrainers (entrainers 1, 2, 3, and 4) and the target. The trial segments were grouped together for each entrainer and target, and then averaged to compute the ERFs. For each planar gradiometer pair, ERFs were quantified at every time point as the Euclidean norm of the two gradiometer signals. Baseline correction was also applied to the evoked data based on the 400 ms of data prior to the onset of the fixation cross presented at the beginning of each trial.
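The gradiometer combination step can be stated compactly. The sketch below is illustrative only (the original analysis used Fieldtrip in MATLAB); the function name and the plain-list representation of the two time courses are assumptions made for the example.

```python
import math


def combine_planar_pair(grad_a, grad_b):
    """Euclidean norm of two orthogonal planar gradiometer time courses.

    grad_a and grad_b hold the averaged signal of each gradiometer of a
    pair, sampled at the same time points.
    """
    return [math.hypot(a, b) for a, b in zip(grad_a, grad_b)]
```

Because the norm is non-negative, the combined signal reflects the magnitude of the local field gradient irrespective of its direction, which is why it is convenient for comparing response amplitudes across conditions.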
In brief, we will first describe the analysis of the sensor level data to establish expectation suppression and its interactions with different kinds of predictability. We then move on to a more detailed analysis of the functional anatomy of expectation suppression using source-reconstructed data. We applied an ANOVA to sensor-level data to explore the influence of our experimental factors on visual ERFs. First, we extracted ERF amplitudes in the set of five occipital sensors that had shown maximum response to the visual localizers. We then selected the time window classically associated with the initial visual evoked response (85-135 ms post stimulus). A three-way repeated measures ANOVA was computed in JASP (JASP Team, 2020) with these amplitude values as dependent variables and the following factors: entrainer (four levels; corresponding to entrainers 1, 2, 3, 4); what (two levels; predictable/unpredictable entrainer and target orientations); and when (two levels; predictable/unpredictable timing of entrainers and target).
Significant interactions (specifically, the triple interaction "entrainer * what * when") were further investigated through theoretically relevant pairwise comparisons. Pairwise comparisons between conditions were performed using a cluster-based permutation test (Maris & Oostenveld, 2007). A randomization distribution of cluster statistics was constructed for each subject over time and sensors and used to evaluate whether conditions differed statistically over participants. In particular, t-values were computed for each sensor (combined gradiometers) and each time point during the 0-270 ms time window, and were clustered if they had t-values that exceeded a t-value corresponding to the 99.99th percentile of Student's t-distribution, i.e., a two-tailed t-test at an alpha of 0.01, and were both spatially and temporally adjacent. Cluster members were required to have at least two neighboring channels that also exceeded the threshold to be considered a cluster. The sum of the t-statistics in a sensor cluster was then used as the cluster-level statistic, which was then tested by permuting the condition labels 1000 times.
Four different comparisons were carried out. In the first comparison, we contrasted ERFs for the when and the what + when conditions. This comparison evaluated the effect of orientation predictability when the timing of the entrainers and target were predictable. In the second comparison, we compared ERFs for the random and what conditions. This comparison evaluated the effect of orientation predictability when timing was unpredictable. These two comparisons mainly focused on the main effect of orientation predictability (i.e., the what manipulation) revealed by expectation suppression.
We then compared the ERFs for the what + when and what conditions. Here we directly contrasted these two predictable orientation conditions to evaluate the effect of temporal predictability on stimulus predictability. The final comparison contrasted ERFs in the when and random conditions. This comparison was performed to analyze the effect of temporal predictability in the absence of orientation predictability.

Source level event-related fields (ERFs)
Source reconstruction mainly focused on the statistically more reliable effects observed at the sensor level. In the present experimental scenario, we expected to find the strongest modulation of visual evoked responses for the last entrainer of each predictable series, when both what and when expectations would be highest.
MEG-MRI co-registration was performed using MRIlab (Elekta Neuromag Oy, version 1.7.25). Individual T1-weighted MRI images were segmented into scalp, skull, and brain components using the segmentation algorithms implemented in Freesurfer (Martinos Center of Biomedical Imaging, MA; Dale et al., 1999). The source space was defined as a regular 3D grid with a 5 mm resolution and the lead fields were computed using a single-sphere model for 3 orthogonal source orientations. The lead field at each grid point was reduced to its first two principal components. Whole brain source activity was estimated using a linearly constrained minimum variance (LCMV) beamformer approach (Van Veen et al., 1997). Both planar gradiometers and magnetometers were used for inverse modeling. The covariance matrix used to derive the LCMV beamformer weights was estimated from data spanning the pre-stimulus (from 400 ms prior to fixation cross onset) to post-stimulus (400 ms after the presentation of the target) time range.
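For reference, the standard unit-gain LCMV solution can be written as follows. This is the generic textbook form of the beamformer, not a description of any regularization or normalization choices made in this study; the symbols are introduced here for illustration, with L(r) the (reduced) lead field at grid point r, C the sensor covariance matrix, and b(t) the sensor data at time t.

```latex
\begin{aligned}
W(r) &= \left[ L(r)^{\top} C^{-1} L(r) \right]^{-1} L(r)^{\top} C^{-1} \\
\hat{s}(r, t) &= W(r)\, b(t)
\end{aligned}
```

The weights satisfy W(r) L(r) = I, so activity originating at r is passed with unit gain, while the C^{-1} term minimizes the variance contributed by all other sources and by noise.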
The LCMV beamformer focused on the (baseline corrected) evoked data in the time period 85-125 ms post-stimulus (when ERF peak amplitude across participants at the sensor level was largest). A non-linear transformation using the spatial-normalization algorithm (implemented in Statistical Parametric Mapping SPM8: Friston et al., 1994) was employed to transform individual MRIs to the standard Montreal Neurological Institute (MNI) brain. Transformed maps were further averaged across participants. Freesurfer's tksurfer tool was used to visualize the brain maps in MNI space. For each condition (at entrainer 4, E4), we obtained the source value and the MNI coordinates of local maxima (sets of contiguous voxels displaying higher source activation than all other neighbouring voxels; Bourguignon et al., 2018).
Source activity was compared between conditions (e.g., when vs. what + when, random vs. what, what vs. what + when and when vs. random) by extracting a peak value within a 5 mm sphere around the common local maximum in the source space. We used t-tests to evaluate differences between conditions across participants.

MVPA (Multivariate pattern analysis)
An MVPA approach (Pantazis et al., 2018; King et al., 2016; Cichy et al., 2014; Grootswagers et al., 2017) was used to evaluate the stimulus-specificity of the visual neural response across time in the experimental conditions. To validate our method, we decoded both the feature of interest in the present experimental manipulation (i.e., the Gabor orientation) and a control feature of that same stimulus (i.e., its spatial frequency, or CPD).
Time-resolved within-subjects MVPA was performed to decode the features (i.e., the orientation and spatial frequency) of all the Gabors (i.e., E1, E2, E3, E4 and T) from the MEG data. For E1, E2, and E3, data were segmented from 50 ms prior to 250 ms after the onset of the entrainers. The time interval between E4 and the target was longer than the time interval between the rest of the entrainers. For this reason, for E4, the data were segmented from 50 ms prior to 600 ms after the onset of the entrainer. For the target, the data were segmented from −400 ms to 550 ms.
The data were downsampled to 200 Hz prior to the classification procedure. The data were then classified separately for the orientation and spatial frequency of the Gabor using a linear support vector machine (SVM) classifier with L2 regularization and a box constraint of 1. The classifiers were implemented in MATLAB 2014B using the LibLinear package (Fan et al., 2008) and the Statistics and Machine Learning Toolbox (Mathworks, Inc.). Classification was performed separately at each time point. Class labels for every Gabor in a trial (horizontal vs. vertical orientation, or higher vs. lower spatial frequency) were derived from the target of that trial. For example, if the target orientation was horizontal, then all the preceding Gabor orientations in the corresponding condition were labelled as horizontal. A similar rationale was applied to classifying spatial frequency.
To improve classification, we also performed multivariate noise normalization (Guggenmos et al., 2018). The time-resolved error covariance between sensors was calculated based on the covariance matrix of the training set and used to normalize both the training and test sets in order to down-weight MEG channels with higher noise levels.
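In its usual form, multivariate noise normalization is a whitening transform. The expression below is a generic sketch of that idea, with symbols introduced here for illustration: x(t) is the sensor pattern at time t and \hat{\Sigma}(t) is the error covariance estimated from the training set at that time point (in practice typically a regularized estimate, which is an assumption not detailed in the text).

```latex
\tilde{x}(t) = \hat{\Sigma}(t)^{-1/2}\, x(t)
```

After this transform the estimated noise covariance of \tilde{x}(t) is the identity, so noisy or strongly correlated channels contribute less to the classifier's decision boundary.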
Pseudo-trials were generated to improve SNR by averaging trials over bins of 10, without overlap (Dima and Singh, 2018). This pseudo-trial generation was repeated 100 times based on random ordering of the data to generate trials with a higher signal-to-noise ratio. Five-fold cross-validation was used to evaluate classifier performance. The data were randomly partitioned into five sets, of which four were used to train the classifier and one was used for testing. The process was repeated until each set (fold) had been left out once, and classifier accuracy was then averaged across folds. The trial averaging and cross-validation procedure was repeated 25 times to yield more stable estimates.
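The pseudo-trial averaging and fold partitioning described above can be sketched as follows. This is an illustrative outline only, not the MATLAB/LibLinear implementation used in the study: the function names are hypothetical, each trial is represented as a plain list of feature values at one time point, and the classifier itself is deliberately omitted.

```python
import random


def make_pseudo_trials(trials, bin_size=10, rng=random):
    """Average randomly ordered trials in non-overlapping bins of `bin_size`."""
    order = list(range(len(trials)))
    rng.shuffle(order)
    pseudo = []
    # Leftover trials that do not fill a complete bin are dropped.
    for start in range(0, len(order) - bin_size + 1, bin_size):
        members = [trials[i] for i in order[start:start + bin_size]]
        n_features = len(members[0])
        pseudo.append([sum(m[f] for m in members) / bin_size
                       for f in range(n_features)])
    return pseudo


def k_fold_splits(n_items, n_folds=5, rng=random):
    """Yield (train_indices, test_indices) so each item is tested exactly once."""
    idx = list(range(n_items))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]
```

A full decoding loop would call `make_pseudo_trials` per class, train on the four training folds, test on the held-out fold, and average accuracy over folds and over repetitions of the random ordering.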
Cluster-corrected sign permutation tests (one-tailed) (Dima et al., 2018) were applied to the accuracy values obtained from the classifier, with a cluster-defining threshold of p < 0.05 and a corrected significance level (cluster alpha) of p < 0.01.
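The logic of a one-tailed, cluster-corrected sign permutation test on decoding accuracy can be sketched as below. This is a simplified illustration rather than the implementation used here: function names are hypothetical, clusters are formed over adjacent time points only, the cluster statistic is assumed to be the summed t-value, and the default threshold of 1.753 is the one-tailed critical t at p = 0.05 for 15 degrees of freedom (i.e., it assumes 16 participants).

```python
import math
import random


def t_values(data):
    """One-sample t-value per time point; data is a participants x time list."""
    n = len(data)
    out = []
    for t in range(len(data[0])):
        col = [row[t] for row in data]
        m = sum(col) / n
        var = sum((x - m) ** 2 for x in col) / (n - 1)
        # Degenerate zero-variance columns are assigned t = 0.
        out.append(m / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def cluster_masses(tvals, threshold):
    """Sum t-values over runs of adjacent time points above threshold."""
    masses, current = [], 0.0
    for t in tvals:
        if t > threshold:
            current += t
        elif current:
            masses.append(current)
            current = 0.0
    if current:
        masses.append(current)
    return masses


def cluster_sign_permutation(acc, chance=0.5, threshold=1.753,
                             n_perm=1000, rng=random):
    """Return [(cluster_mass, p_value), ...] for above-chance clusters."""
    diffs = [[a - chance for a in row] for row in acc]
    observed = cluster_masses(t_values(diffs), threshold)
    null_max = []
    for _ in range(n_perm):
        # Flip the sign of each participant's whole time course at random.
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * d for d in row])
        null_max.append(max(cluster_masses(t_values(flipped), threshold),
                            default=0.0))
    return [(mass, sum(nm >= mass for nm in null_max) / n_perm)
            for mass in observed]
```

Comparing each observed cluster against the distribution of the maximum permuted cluster mass is what provides the correction for multiple comparisons across time points.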

Behavioural results
On each trial, participants were asked to evaluate if the spatial frequency of the target (cycles per degree) was lower or higher than that of the entrainers. This stimulus dimension was not directly related to the experimental manipulation of timing (when) and Gabor orientation (what). Table 1 presents the accuracy and reaction times (RTs) for all four conditions. We found no significant differences in behavioural accuracy. However, temporal predictability elicited faster RTs (what + when > what | when > random). We fit a linear mixed model (lmer function in R) with participants and observations as random effects and what (orientation: predictable or not), when (timing: predictable or not) and their interaction as fixed effects. We observed an effect of when (t = -2.794, p < 0.05).
Orientation predictability (what) did not elicit statistically significant effects (t = -1.557), probably due to the fact that, in order to perform the task, participants had to actively pay attention to the spatial frequency dimension of the target.

Sensor-level MEG results
Next, we report the analysis of evoked responses to the four entrainers, where visual predictions were incrementally built up. Note that there was no explicit task related to orientation. We first analysed the amplitude of the initial visual evoked response for 5 occipital sensors (the ones showing maximum response to the localizers) to determine how the two-by-two experimental design modulated visual responses across entrainers. In the three-way ANOVA (details in Table 1), we observed a significant main effect of entrainer (p < 0.001) on the peak amplitudes of ERFs. Interestingly, this factor interacted with the factor what (p < 0.001), suggesting that Gabor orientation predictability affected visual-evoked responses differently across entrainers. We should point out that a main effect of what (p < 0.001) supported the observation that orientation predictability influenced visual processing. Importantly, the interaction between the three factors, i.e., entrainers, what and when, was significant (p = 0.004). This triple interaction underlines the fact that timing uncertainty influenced the development of visual predictions across the sequence of four entrainers.
Fig. 2 shows the sensor-level ERFs time-locked to the onset of each entrainer (E1, E2, E3, and E4) and target (T) for the when and the what + when conditions. Here, we can see the influence of orientation predictability (expectation suppression effect) when timing was also predictable. The amplitude of the ERFs was significantly higher (p < 0.01, cluster-based permutation test) in the when than the what + when condition for E2, E3, and E4, but not for E1 and T. The amplitude enhancement for the when compared to the what + when condition emerged in the (expected) 95-105 ms, 96-110 ms, and 97-121 ms time intervals for E2, E3, and E4, respectively. These clusters were located in occipital sensors for all four entrainers.
Fig. 3 shows the sensor-level ERFs for the random versus the what conditions. This comparison highlights the expectation suppression effect when timing was not predictable. The amplitude of the ERFs was significantly higher (p < 0.01) for the random compared to the what condition for E2, E3, and E4 but not for E1 and T. The amplitude enhancement for the random compared to the what condition emerged within the 95-119 ms, 94-123 ms, and 96-127 ms time intervals for E2, E3, and E4, respectively. These clusters were also clearly distinguishable in occipital sensors for all the entrainers.
Since both comparisons involving expectation suppression were significant from entrainer 2 onward, we compared the two orientation predictable conditions with (what + when) and without (what) temporal predictability. This comparison highlights how temporal predictability affects visual predictive processing. Fig. 4 shows that initial early evoked activity at E1 (0-75 ms, preceding the peak reflecting the visual evoked response) is similar for both conditions. As we move across entrainers, these early differences increase and reach statistical significance (p < 0.001), but this effect vanishes at the target. This differential prestimulus activity demonstrates that results for the two orientation predictable conditions depend on temporal predictability. Here, it is worth noting that we used the same baseline time period (the 400 ms before the fixation cross at the beginning of the whole trial) to test the effects of all four entrainers (E1 to E4) and the target (T). Since our focus in this analysis was on early evoked responses to the visual stimulus, which showed robust expectation suppression effects across all predictable entrainers (see Figs. 2 and 3), we selected the time window from 75 to 135 ms, corresponding to the initial ERF peak reflecting early visual processing, for statistical comparison. Across the four entrainers an effect emerged only at E4, where the amplitude suppression was larger for the what than for the what + when condition in a cluster spanning the 106-124 ms time interval. This cluster was located in occipital sensors (Fig. 4).
The differences that emerged between the two orientation predictable conditions (Fig. 4) across the entrainment sequence could be due to carry-over effects from an earlier difference in baseline activity. To evaluate this hypothesis we also compared the two orientation unpredictable conditions (when and random). Here, we expected similar differential baseline activity in the 0-75 ms time interval, but no difference at the peak of the visual response for any entrainer. Fig. 5 shows the sensor level comparison of the ERFs for the when and random conditions. Differences in the initial activity time-locked to the Gabor patch are evident at E2, E3, and E4 within the 0-75 ms time range (p < 0.001, compare Figs. 4 and 5). This difference is not evident at E1 or T. Since evoked responses to all the entrainers (E1, E2, E3, and E4) and T were baseline corrected using the same activity period before the fixation cross at the beginning of the trial, this effect could reflect temporal predictability affecting ongoing brain activity before the initial visual response to each Gabor entrainer. Importantly, we focused on the effect of temporal predictability on the peak visual response. In the selected time window 75-135 ms we did not observe any statistically reliable effect at any entrainer. This null effect is in line with the triple interaction observed in the initial overall ANOVA (Table 1), supporting the idea that temporal predictability affects visual predictive processing.

Source level ERFs
We next identified the brain regions underlying the relevant effects observed at the sensor level. Source activity was estimated around the peak amplitude of the sensor-level ERFs in the 85-125 ms interval. Whole-brain maps of source activity were created for each condition (what + when, when, what and random) at entrainer 4 (E4), i.e., the stimulus where the difference between what + when and what was statistically significant (Fig. 6).
Source activity was localized in bilateral occipital regions for all conditions and compared to baseline at the group level. The first local maximum emerged in visual association areas (Brodmann Area 18: BA 18, average coordinate [-3, -76, -2]) of the left occipital cortex in all conditions.
For this local maximum we evaluated the amplitude of source activity across conditions, following the same rationale described for the sensor-level analyses. Fig. 6A shows the brain maps representing maximum peak activity in the source space.

MVPA results
The ERF analyses showed that the expectation suppression effect grew incrementally larger across the four entrainers, suggesting that orientation predictability reduced visual processing costs, possibly due to increased reliance on internally generated expectations. To further corroborate the hypothesis that the visual system developed expectations for successive Gabor orientations during the entrainer sequence, we performed the following analyses. We first checked whether the orientation of perceived Gabors could be decoded by applying a temporal decoding approach to each entrainer (as a control we performed the same analysis to decode the spatial frequencies of these Gabors). MVPA showed that only those conditions with predictable orientations (what + when and what) revealed above-chance and statistically significant decoding accuracy values compared to the conditions where orientation was not predictable (see Supplementary Figures 1 and 2 for the comparison what + when vs. when and what vs. random, respectively). Fig. 7 shows the decoding accuracy of predictable orientations in conditions with (what + when) and without (what) temporal predictability. Here, we see how target orientation becomes increasingly decodable across entrainers (especially at E3 and E4; t-test between peak decoding accuracy at E1 and E4 across participants and conditions, p < 0.01) and is strongest at the target. By contrast, decoding values for spatial frequencies (high vs. low CPD) were significant only at the target. This was expected since the CPD of the target could not be predicted from the entrainers, which had intermediate spatial frequencies; this provided a good baseline for target orientation decoding effects.
Notably, decoding accuracy (at E3 and E4) was reliable within the time period of the initial evoked response (75-200 ms) for the two orientation predictable conditions whether timing was predictable or not. No statistical difference between the two predictable conditions emerged in this time interval. This indicates that the neural representation of the visual stimulus was preserved independently of the amplitude of the ERF responses. This analysis also showed that decoding accuracy was higher for Entrainer 4 in a later time interval (525-595 ms, p < 0.05) for the what + when than the what condition. This indicates that visual representations were actively maintained for a longer time period when timing was isochronous, while the effect lasted for less time when timing was jittered.
It could be argued that the increase in decoding accuracy for the orientation of the target observed across entrainers could be due to the fact that entrainer orientations gradually approached a horizontal/vertical orientation, making them representationally more similar to the target. We argue that this is not the case, for two reasons. First, if this had been the case, we should have found high decoding accuracy even at the first entrainer. The orientation of the first (compared to following) entrainer was closer to the horizontal vs. vertical orientation contrast, while being orthogonal to the target (if E1 was horizontal, the target would be vertical). The classifier should also have picked up this dissociation at the first entrainer (since the classifier was blind to the actual orientation of the stimuli in any two classes and was simply trained to evaluate whether two classes of data were different). This was not the case (with weak and unstable accuracy for E1), indicating that the classifier was instead picking up increasing "orientation expectation" across the sequence of entrainers. Second, if the classifier had detected overall visual similarity (not specific to orientation) between entrainers and the target, we would have expected a similar trend of increasing accuracy to also emerge in decoding spatial frequencies (CPD). However, at E4 decoding accuracies for spatial frequency were consistently at chance level.
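The temporal decoding logic described in this section (train and test a classifier separately at every time point) can be sketched in Python as follows. This is only an illustration under stated assumptions: the nearest-centroid classifier, the leave-one-trial-out cross-validation, all function names, and the simulated data are hypothetical choices and do not reproduce the classifier or the MATLAB code used in the study.

```python
import random


def centroid(patterns):
    """Mean sensor pattern across a list of trials."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]


def sq_dist(u, v):
    """Squared Euclidean distance between two sensor patterns."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def decode_timepoint(patterns, labels):
    """Leave-one-trial-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(patterns, labels)):
        # Training set: every trial except the held-out trial i.
        train = [(p, l) for j, (p, l) in enumerate(zip(patterns, labels)) if j != i]
        cents = {}
        for lab in set(l for _, l in train):
            cents[lab] = centroid([p for p, l in train if l == lab])
        pred = min(cents, key=lambda lab: sq_dist(x, cents[lab]))
        correct += pred == y
    return correct / len(labels)


def temporal_decoding(data, labels):
    """data[trial][time][sensor] -> one cross-validated accuracy per time point."""
    n_times = len(data[0])
    return [decode_timepoint([trial[t] for trial in data], labels)
            for t in range(n_times)]


def simulate_trials(n_trials=40, n_times=12, n_sensors=8, onset=6,
                    strength=3.0, seed=0):
    """Hypothetical data: class 1 carries extra signal from sample `onset` on."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_trials)]
    data = []
    for lab in labels:
        trial = []
        for t in range(n_times):
            signal = strength if (lab == 1 and t >= onset) else 0.0
            trial.append([rng.gauss(0, 1) + signal for _ in range(n_sensors)])
        data.append(trial)
    return data, labels
```

With `data, labels = simulate_trials()`, `temporal_decoding(data, labels)` yields an accuracy time course that stays near chance before the simulated onset and rises afterwards, which is the kind of curve summarised in Fig. 7. The two labels stand in for any binary contrast, such as two orientations or high vs. low CPD.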

Discussion
In the present study we report robust expectation suppression effects for predictive processing of visual Gabors. Across a series of four entrainers, we observed incrementally larger suppression of visual evoked responses when Gabor orientations were predictable, accompanied by incrementally improved decodability for predictable Gabor orientations. Importantly, these effects were modulated by temporal predictability: expectation suppression of evoked responses was larger when the timing of the entrainers was jittered, while decodability of visual responses was less sustained for jittered timings. These findings suggest that the neurocognitive system invested fewer resources in visual analysis in temporally uncertain scenarios due to precision weighting, i.e., higher reliance on internal predictions.

Expectation suppression effects
The goal of our study was to evaluate how visual expectation differentially modulates prediction error depending on changes in precision weighting due to variable temporal predictability. We mainly focused on the expectation suppression effect (Walsh et al., 2020), and in contrast to previous studies (Auksztulewicz et al., 2018; Utzerath et al., 2017), (i) we did not use mismatched stimuli, and (ii) we made sure that participants were not aware of the experimental manipulations in the study. Given the much debated interaction between attention and predictive processing (Kok et al., 2012), we developed an experimental design which aimed to control for strategic effects related to the processing of the Gabor orientation (the task required that participants instead focus on spatial frequencies, which were the same across conditions). While the orientation manipulation was noticeable, it is important to underscore that our participants did not report having observed any temporal jitter of the visual stimuli in the temporally unpredictable conditions.
There are several studies in which reduced neural responses for predictable stimuli have been found during passive viewing (Alink et al., 2010), as well as when stimuli are fully task irrelevant (Den Ouden et al., 2009). This supports the contention that expectation suppression does not depend on whether participants engage with the task. However, some authors have reported no expectation suppression of sensory activity when stimuli were unattended (Larsson and Smith, 2012). These findings suggest that contextually predictable stimuli may not always result in suppression of early visual neural responses (John-Saaltink et al., 2015). In the present study, we found expectation suppression effects when the orientation of the entrainers was predictable and showed that these effects increased incrementally across the entrainer sequence. We interpret this effect as demonstrating that the visual system develops increasingly strong expectations for the specific orientation of upcoming Gabors across sequences: the stronger the expectation for a Gabor orientation, the larger the suppression of the visual response. This effect was significant in the evoked responses of the second, third and fourth entrainers (Figs. 2 and 3) and was present in conditions both with and without temporal predictability. This visual evoked response possibly originated in visual area 2 (V2), the area which showed reduced activity for predictable stimuli compared to unpredictable stimuli (Fig. 6). The source location of the present effect could reflect some sort of top-down activity generated in an extrastriate region projecting to the primary visual cortex (V1). This possibility should, however, be further validated (possibly by employing direct brain recordings in non-human primates) with additional connectivity analyses to investigate the bidirectional interaction between V1 and V2 and determine if the flow of information in the top-down direction is enhanced for content predictable conditions.
It is worth noting that this incremental effect was not mirrored in the behavioural responses, which probably reflect later decision processes. In addition, expectation suppression effects evident during the entrainer sequence vanished at the presentation of the target Gabor (where participants had to perform the task). At the target, it is possible that neural resources were largely invested in processing the task-relevant spatial frequency difference between the target and the preceding entrainer Gabors. This task likely interacted with and washed out the on-going neural expectation effects, which were observed for entrainers. It would be interesting to evaluate how the different features of the target Gabor (orientation and spatial frequency) were processed in future studies. This could be achieved by using a delayed cueing task in which participants receive a random post-target cue after target presentation indicating whether they should perform an orientation or spatial frequency discrimination task. Another option would be to avoid the use of any task, i.e., employ a passive viewing paradigm, to determine if the expectation suppression effect is nevertheless preserved.

The interaction between stimulus features and temporal predictability
We observed an expectation suppression effect in evoked responses within an early time interval, i.e., at 85-125 ms. The analysis of the amplitude of this initial evoked response showed that there was a significant interaction between the what and when dimensions of visual stimuli. This interaction was mainly driven by the increased suppression of neural responses to temporally jittered vs. isochronous predictable stimuli. It seems that the visual system generates a reduced response to an incoming stimulus whose onset is unpredictable compared to a stimulus whose timing is certain. It could be argued that the visual system is not "capable of preparing" for a temporally unpredictable stimulus. This can be supported by the fact that the effects observed in the evoked peaks at around 100 ms are preceded by a large difference between temporally predictable and unpredictable conditions independently from stimulus content predictability (Figs. 4 and 5). However, at around 100 ms, when the initial visual evoked response is peaking, a difference emerges between the two content predictable conditions (what + when vs. what, Fig. 4) but not between the two content unpredictable conditions (when vs. random, Fig. 5; see also Fig. 6). If the reduced response found for temporally unpredictable stimuli in the what condition (compared to what + when) had only been driven by the inability of the visual system to prepare for the timing of a visual stimulus (regardless of its visual properties), one would have expected a similarly reduced visual response in the random condition (compared to when) as well. One potential explanation for the larger suppression of the early evoked response in the temporally unpredictable what condition is that the visual system assigns more weight to abstract internal predictions and less to sensory evidence when temporal uncertainty is higher. This evidence highlights differential precision weighting of prediction error in different timing scenarios and would support theoretical claims suggesting that predictive mechanisms are essential for reducing uncertainty about the external environment (Clark, 2013).
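The notion of precision weighting invoked here can be made concrete with a textbook Gaussian formulation, in which the gain applied to a prediction error is the sensory precision divided by the sum of sensory and prior precisions. The Python sketch below is purely illustrative: no such model was fitted in this study, the function name and numerical values are hypothetical, and treating temporal jitter as a reduction in sensory precision is an assumption made only to show the direction of the effect.

```python
def precision_weighted_update(prior_mean, prior_precision,
                              observation, sensory_precision):
    """Combine a Gaussian prior with a Gaussian observation.

    Returns (gain, weighted_error, posterior_mean), where
    gain = sensory_precision / (sensory_precision + prior_precision).
    """
    gain = sensory_precision / (sensory_precision + prior_precision)
    prediction_error = observation - prior_mean
    weighted_error = gain * prediction_error
    posterior_mean = prior_mean + weighted_error
    return gain, weighted_error, posterior_mean


# Hypothetical values: same prior and same input, differing only in the
# precision assigned to the sensory evidence (assumed lower under jitter).
isochronous = precision_weighted_update(0.0, 1.0, 1.0, sensory_precision=4.0)
jittered = precision_weighted_update(0.0, 1.0, 1.0, sensory_precision=1.0)
```

Under these assumed values the weighted prediction error is smaller in the `jittered` case than in the `isochronous` case, so the estimate stays closer to the prior. This matches the direction of the interpretation offered above, where sensory evidence contributes less when timing is uncertain.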

Stimulus specific neural activity
Expectation suppression effects provided evidence for reduced visual processing for expected stimuli, but did not provide evidence for expectations of specific visual representations. Our second hypothesis evaluated if decoding stimulus attributes from evoked neuronal responses depends upon predictability. We thus used multivariate pattern analysis to evaluate whether expectations regarding Gabor orientation increased across entrainers. Our results show that decoding accuracy for Gabor orientations increased across entrainers when successive entrainer and target orientations were predictable (Fig. 7). This indicates that stimulus predictability is a crucial factor in enhancing the accuracy of orientation decoding during the presentation of entrainers. In fact, when the stimulus is not predictable, decoding accuracy remains at chance level (Supplementary Figures 1 and 2).
Temporal predictability did not affect the decodability of the predicted visual stimulus in the earlier time interval, when the early visual evoked response emerged. This indicates that the representation of the Gabor orientation was stable and preserved independently of the amplitude of the related evoked response. On the other hand, temporal predictability did affect orientation decoding in a later time interval (525-595 ms), showing that the orientation representation of the fourth entrainer was kept active for a longer period of time if the timing of the stimulus was predictable. In other words, this suggests that the visual system invests more resources and prolongs processing of stimulus features when the temporal onset of the visual stimulus is predictable. This difference mirrors the evoked effects, where we observed stronger visual responses to temporally predictable than temporally unpredictable conditions. The decoding results thus reinforce our hypothesis that prediction error responses are precision-weighted differently for processing expected/temporally predictable visual stimuli, compared to expected/temporally uncertain visual stimuli.
A side note regarding the decoding of spatial frequency: this feature was constant across entrainers and conditions, thus leading to chance level decoding for all four entrainers. At the target, however, spatial frequency showed very high decoding accuracy (around 97-98%), even higher than orientation decoding, within the time interval related to the initial evoked response (~100 ms). Spatial frequency effects were also evident slightly earlier and lasted longer than the orientation effects. Since our task focused on the difference in spatial frequency between the entrainers and target, the neural system likely maintained information regarding the spatial frequency of the target active for a longer interval than information regarding the orientation of the target, thus obscuring or interfering with any on-going expectation suppression effects due to Gabor orientation.

Conclusions
In the present study we investigated the effect of temporal predictability on visual predictive processing. Our results show that temporal predictability modulates processing of expected visual features. We found increased suppression of visual evoked responses for temporally unpredictable relative to temporally predictable visual stimuli. This suggests that the brain assigns less weight to evidence emerging at the sensory level when timing is uncertain.

Data and code availability statement
The MATLAB scripts used for analyzing the data and the preprint version of the manuscript, along with high resolution figures, can be accessed through the Open Science Framework (OSF) repository (https://osf.io/bj6rd/?view_only=68c79ba57d5940fabd8b06c244c39a74). Due to the size of the whole dataset (~15 GB per participant) and given the limited public storage options available, we uploaded the raw and preprocessed data from one representative participant. However, the full dataset is available upon request directed to Dr. Nicola Molinaro (n.molinaro@bcbl.eu) or Sanjeev Nara (s.nara@bcbl.eu). The full dataset could then be shared through the private BCBL secured institutional servers temporarily available for big data transfer.

Declaration of Competing Interest
The authors declare that there is no conflict of interest regarding the publication of this article.

Fig. 2. Sensor level ERFs for the when and what + when conditions. A) For each condition (red, when; blue, what + when) and stimulus (Entrainer 1 [E1], E2, E3, E4 and Target [T]), we show the average of the event related fields (ERFs) in representative channels located above occipital regions (MEG2042/3, MEG2032/3, MEG2342/3, MEG2122/3, and MEG1922/3). Below we also report the ERF difference between the when and what + when conditions (black line). Gray boxes represent time points where the amplitude of the ERFs was higher (p < 0.01, cluster-based permutation test) for the when than the what + when condition. B) Sensor maps of the

Fig. 6. A) Brain maps representing source activity for each condition (what + when, when, what, and random) at Entrainer 4 (E4). We included a view of the medial surface and the occipital lobe of the left (LH) and the right (RH) hemispheres. B) The mean source activity in BA18 (Brodmann Area 18: xyz MNI coordinate: -3, -76, -2) in the four conditions at E4. Asterisks indicate significant differences across conditions.

Fig. 7. Time-resolved decoding accuracy for the what + when condition (blue line) and what condition (orange line) time-locked to Entrainer 1 (E1), E2, E3, E4 and Target (T). The coloured dots under the curves indicate the statistical significance of decoding accuracy across time. The gray box at E4 in Orientation shows the statistically significant differences (p < 0.05) between what + when and what.

Table 1
Accuracy and Reaction Times (RTs) for each condition.

Table 2
Repeated-measures ANOVA with the factors entrainer (four levels, one for each entrainer), what (two levels: orientation predictable or not) and when (two levels: timing predictable or not).

Table 3
Statistically significant classification accuracy for orientation angle and spatial frequencies (CPD) of the target Gabors across entrainers and target for the orientation predictable conditions. Time intervals indicate the windows in which accuracy was statistically above chance. n.s.: not significant.