Decoding attention control and selection in visual spatial attention

Abstract Event‐related potentials (ERPs) are used extensively to investigate the neural mechanisms of attention control and selection. The univariate ERP approach, however, has left important questions inadequately answered. We addressed two questions by applying multivariate pattern classification to multichannel ERPs in two cued visual spatial attention experiments (N = 56): (a) impact of cueing strategies (instructional vs. probabilistic) on attention control and selection and (b) neural and behavioral effects of individual differences. Following cue onset, the decoding accuracy (cue left vs. cue right) began to rise above chance level earlier and remained higher in instructional cueing (~80 ms) than in probabilistic cueing (~160 ms), suggesting that unilateral attention focus leads to earlier and more distinct formation of the attention control set. A similar temporal sequence was also found for target‐related processing (cued target vs. uncued target), suggesting earlier and stronger attention selection under instructional cueing. Across the two experiments: (a) individuals with higher cue‐related decoding accuracy showed higher magnitude of attentional modulation of target‐evoked N1 amplitude, suggesting that better formation of anticipatory attentional state leads to stronger modulation of target processing, and (b) individuals with higher target‐related decoding accuracy showed faster reaction times (or larger cueing effects), suggesting that stronger selection of task‐relevant information leads to better behavioral performance. Taken together, multichannel ERPs combined with machine learning decoding yields new insights into attention control and selection that complement the univariate ERP approach, and along with the univariate ERP approach, provides a more comprehensive methodology to the study of visual spatial attention.

The univariate ERP approach, despite having generated a wealth of insights into the neural mechanisms of attention control and selection, has left some important questions inadequately addressed. For example, upon receiving the attention-directing cue, how long does it take to form the attention control set? Do different cueing strategies (i.e., probabilistic vs. instructional) impact the timing of attention control? Predefined analysis windows used in previous univariate ERP research to measure the effect of attention vary significantly from study to study and do not yield precise answers to these questions.
The impact of cueing strategies remains largely unexplored. In addition, there are significant individual differences in attention control and selection. Does stronger cue-triggered preparatory attention control lead to more effective selection of attended information? To what extent is the attention selection of task-relevant stimuli related to behavioral performance? Univariate ERP studies attempting to link cue-related ERPs with target-related ERPs (e.g., N1) and to link target-related ERPs with behavior (e.g., reaction time [RT]) have again yielded mixed results (Dale et al., 2008; Grent-'t-Jong et al., 2011; Harter et al., 1989; Talsma et al., 2007).
These issues may stem from the fact that univariate ERPs from single electrodes are not able to reflect the contributions of multiple neural processes taking place in distributed brain regions that collectively influence the subsequent neural or behavioral events in spatial attention (Eimer, 1998; Lasaponara et al., 2018; Mangun & Buck, 1998). For example, ADAN and LDAP both appear around 400-500 ms post-cue, but they each index different processes of preparatory attention and have distinct scalp topographies (Hong et al., 2015; Hopf & Mangun, 2000; Jongen et al., 2007; Lasaponara et al., 2018; Nobre et al., 2000). A multivariate approach taking into account the contribution of distributed neural processes that occur concurrently may provide a path forward to overcome the limitations of the univariate approach. Instead of treating different electrodes singly as in the univariate EEG/ERP approach, the multivariate pattern analysis/classification (also referred to as decoding) approach treats measurements from multiple ERP channels as a pattern representing the cognitive variable to be analyzed (Grootswagers, Wardle, & Carlson, 2017; Parra, Spence, Gerson, & Sajda, 2005). To date, EEG-based decoding analysis has been applied to face detection (Cauchoix, Barragan-Jason, Serre, & Barbeau, 2014), working memory (Bae & Luck, 2018), and decision-making (Bode et al., 2012; Philiastides, Ratcliff, & Sajda, 2006). Here, we sought to apply this approach to understand the neural mechanisms of visual spatial attention control and selection. Our goal was to establish and compare the entire time courses of decoding for both cue-related (cue left vs. cue right) and target-related (cued target vs. uncued target) brain states across cueing strategies, and to link individual differences in decoding accuracy with attention enhancement of target processing and behavioral response.
Two cueing paradigms were considered in this study: probabilistic cueing (Mangun & Hillyard, 1991; Posner, 1980; Yamaguchi et al., 1994) and instructional cueing (Bengson, Kelley, & Mangun, 2015; Hopfinger, Buonocore, & Mangun, 2000; Snyder & Foxe, 2010; Worden, Foxe, Wang, & Simpson, 2000). The classic example of probabilistic cueing is the Posner paradigm in which the subject is encouraged to pay attention to a location where the target will be more likely to occur, but is required to respond to targets that are presented in both cued (valid targets) and uncued (invalid targets) locations (Posner, 1980). In this paradigm, there is strategic motivation to divide attention between both the cued location and the uncued location (Snyder & Foxe, 2010). In contrast, in instructional cueing, subjects are instructed to pay full attention to the cued location and respond only to targets appearing at the cued location (ignoring targets presented in the uncued location). Instructional cueing is expected to produce stronger attentional facilitation at the cued location compared with probabilistic cueing. This notion has been suggested but supporting evidence from quantitative comparisons between these two cueing strategies within one study is still lacking (Hong et al., 2015; Hopfinger et al., 2000; Liu, Bengson, Huang, Mangun, & Ding, 2016; Snyder & Foxe, 2010; Worden et al., 2000). Given the possible differences between these two cueing strategies, we expect that decoding accuracy as a function of time during the cue-target interval and during the target processing would rise above chance level sooner and reach higher levels for instructional cueing than for probabilistic cueing. We further predict that across individuals, the decoding accuracy during the cue-target interval (cue left vs. cue right), reflecting the strength of attention control, is positively associated with attention selection of the target, and the decoding accuracy during target processing (cued target vs. uncued target), reflecting the strength of attention selection, is positively associated with behavioral performance.

| Participants
The experimental protocols were approved by the Institutional Review Board of Shanghai Mental Health Center (No. 2017-05R). Sixty-one healthy college students with normal or corrected-to-normal vision gave written informed consent and participated in this study. In traditional visual spatial attention studies using univariate ERP analysis, the typical sample size is between 15 and 20 subjects (Grent-'t-Jong et al., 2011;Hong et al., 2015;Kelly et al., 2009;Nobre et al., 2000). However, since this study was an attempt to apply a novel multivariate decoding approach and perform between-subject correlation analysis between decoding accuracy and target-related ERPs/behavior, we increased our sample size to the range of 30 subjects for each experiment. There were two experiments: Experiment 1 (N = 32) utilized instructional cueing and Experiment 2 (N = 29) utilized probabilistic cueing. Two participants were excluded in Experiment 1 due to poor data quality (i.e., less than 50% of trials remained after preprocessing). Three participants were excluded in Experiment 2 due to (a) poor task performance (N = 1; accuracy <50%) and (b) hardware issues (N = 2; responses were not recorded correctly). Thirty participants were included in final analysis for Experiment 1 (10 females, aged between 18 and 25 years, all right-handed) and 26 participants were included in final analysis for Experiment 2 (10 females, aged between 20 and 24 years, all right-handed).

| Stimuli and procedures
In both experiments, each trial began with an arrow cue (Experiment 1: 2.24° × 1.62°; Experiment 2: 2.29° × 1.62°) presented centrally for 200 ms, which pointed to either the left or right with equal probability. The cue was then replaced by a central fixation point (Experiment 1: a crosshair, 1.38° × 1.38°; Experiment 2: a dot, 0.57° × 0.57°) where subjects were required to maintain their fixation throughout each trial.
Upon seeing the cue, subjects were instructed to shift their attention covertly to the cued direction. Two location markers (Experiment 1: squares, 2.39° × 2.39°, located 9.05° from the vertical meridian and 7.2° below the horizontal meridian; Experiment 2: dots, 0.57° × 0.57°, located 8.87° from the vertical meridian and 7.06° below the horizontal meridian) were presented in the left and right visual fields throughout the trials. After a cue-target interval of 1,000-1,200 ms, a target stimulus (Experiment 1: 1.67° × 1.67°; Experiment 2: 1.72° × 1.72°), either a plus sign or the letter "x" with equal probability, was presented at one of the location markers. The target lasted for 200 and 100 ms in Experiment 1 and Experiment 2, respectively. In Experiment 1 (instructional cueing), the target was presented at the cued or uncued location with equal probability. Subjects were instructed to totally ignore the uncued location and respond only to the plus sign appearing at the cued location. In Experiment 2 (probabilistic cueing), subjects were instructed to respond to the plus sign presented at both the cued (valid trials, 73.3%) and the uncued (invalid trials, 13.3%) locations. The remaining 13.3% of trials in Experiment 2 were neutral trials with bilateral arrow cues, which did not provide any information about the location of forthcoming targets. The neutral trials were included for testing the behavioral effects of attention cueing and were not included in the following ERP or decoding analysis. In both experiments, the intertrial interval between the target offset and the cue onset of the next trial was set at 2,600 ms. Response to the plus sign was made by pressing a button on the response box with the right index finger as quickly and accurately as possible. Only responses made within 1,600 ms after the target offset were considered valid.
In Experiment 1, all stimuli were in black and presented on a white background (Figure 1a). In Experiment 2, all stimuli were in white and presented on a black background (Figure 1b). In both experiments, the paradigms were compiled and executed in the E-Prime 2.0 toolkit (Psychology Software Tools, Inc., Sharpsburg, MD), and all stimuli were presented on a 19-in. LCD monitor positioned 60 cm in front of the subject. Each trial block consisted of 60 trials lasting for about 4-5 min. Subjects were first shown the experimental instructions, and then trained for at least one block to familiarize themselves with the task. After that, each subject completed eight blocks, with a 2-3 min break between successive blocks.

| EEG preprocessing
EEG preprocessing was conducted offline using EEGLAB and ERPLAB toolboxes, following the same general steps for both experiments.
Specifically, continuous EEG data were first band-pass filtered between 0.1 and 40 Hz using a two-way Butterworth filter with zero phase shift (roll-off slope: 12 dB/oct). The power line noise was suppressed by a Parks-McClellan notch filter at 50 Hz. Other artifacts including ocular artifacts were then corrected by independent component analysis (ICA) using the Infomax algorithm (Jung et al., 2000). The EOG channels were included in the ICA to aid the removal of the ocular artifacts. Independent components (ICs) were manually identified according to their time courses and scalp topographies. Typically, for each subject, one IC was identified as related to eye movements and one IC to eye blinks. The ICs that are related to eye movements tend to have a scalp map showing strong contribution from anterior frontal channels (see Figure 2 for representative examples). ICs corresponding to artifacts were rejected and EEG data were reconstructed by adding together the nonartifact ICs. Artifact-corrected EEG data were then re-referenced to the average of the two mastoid electrodes (TP9, TP10). The original recording reference electrode was recalculated as FCz. After that, continuous EEG data were downsampled to 250 Hz and then segmented into two types of epochs: one was time-locked to cue onset (from −1,000 ms pre-cue to 1,400 ms post-cue) and the other was time-locked to target onset (from −500 ms pre-target to 1,000 ms post-target). It is worth noting that we extracted longer epochs than the periods we were interested in, so that we could minimize the filtering-related edge artifacts by trimming the two ends of the epoch (i.e., the first and last 200 ms in both cue-related and target-related epochs; see below).
FIGURE 1 Experimental paradigms for the Instructional cueing dataset (a) and Probabilistic cueing dataset (b). In both paradigms, an arrow cue was first presented, directing the subject to covertly shift attention to either the left or the right lower visual field. Fixation was required throughout. After a random cue-target interval, a visual target (the letter "x" or the plus sign) was presented in one of the location markers, and the subject responded to the target according to the paradigm requirements. In the instructional cueing paradigm, the subject was told to respond only to targets appearing at the cued location and totally ignore targets presented at the uncued location (50% probability). In the probabilistic cueing paradigm, the subject needed to respond to targets presented in both cued (73.3% probability, valid trials) and uncued (13.3%, invalid trials) locations. In the remaining 13.3% trials (neutral trials) in the probabilistic cueing paradigm, the cue was uninformative, consisting of a bilateral arrow. See Section 2.2 for more details.

FIGURE 2 Removal of eye movement confounds. Independent components related to eye movements are shown for three representative subjects in each dataset. The decoding analysis was based on channels F7 and F8 which were near the eyes. Without independent component analysis (ICA) correction, the decoding is above chance-level during the late cue-target interval. Chance level performance (0.5) is indicated by the horizontal dash lines. Gray areas indicate clusters of time points in which the decoding was significantly greater than chance after the false discovery rate (FDR) correction for multiple comparisons. The blue shading indicates ±1 SEM. After removing eye movement-related ICA component, the decoding accuracy returns to chance level.

Epochs meeting one or more of the following criteria were rejected: (a) the maximal voltage difference in any EEG channel exceeded 150 μV within any of the moving windows (width: 200 ms; step: 50 ms) throughout the epoch, which was examined by a peak-to-peak (maximum minus minimum) function, (b) the absolute value of voltage at any time point throughout the whole epoch in any EEG channel exceeded 100 μV, which was examined by a simple voltage threshold function, (c) overt eye movements were detected by a moving window step function (width: 400 ms; step: 10 ms; threshold: 40 μV) based on HEOG amplitude, and (d) overt eye blinks around the cue or target stimulus presentation period (−200 to 200 ms) were detected by a moving window peak-to-peak function (width: 200 ms; step: 10 ms; threshold: 50 μV) based on VEOG amplitude. Instead of finding the maximal peak-to-peak difference, the step function measures the difference in mean amplitude between the first half of the window and the second half of the window (i.e., between the first 200 ms and the last 200 ms of a 400-ms window), and has been shown to be an effective method to detect small eye movements recorded in an HEOG channel (Luck, 2014). As indicated above, EOG channels were included for the ICA correction, which aided the removal of the ocular artifacts. Since trial-wise artifact rejection was performed after ICA correction and artifact rejection steps (Criteria 3 and 4) relied on the uncorrected EOG channels, we replaced the ICA-corrected EOG channels with the uncorrected ones before performing artifact rejection. In addition, grand-averaged HEOG activity during the cue-target interval after artifact rejection is shown in Figure S3 in Supplemental Materials. The difference in cue-related eye movements was measured at <2 μV in HEOG amplitude. This difference corresponded to a difference of <0.2° in eye position (Lins, Picton, Berg, & Scherg, 1993).
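The two moving-window detectors described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the ERPLAB implementation: the function names are hypothetical, the signal is assumed to be a list of voltages in μV for one channel of one epoch, and window sizes are rounded to whole samples.

```python
def _windows(n_samples, srate, width_ms, step_ms):
    """Yield (start, width) in samples for each moving window (rounded to whole samples)."""
    width = int(round(width_ms * srate / 1000))
    step = max(1, int(round(step_ms * srate / 1000)))
    for start in range(0, n_samples - width + 1, step):
        yield start, width


def peak_to_peak_exceeds(signal, srate, width_ms, step_ms, threshold_uv):
    """True if (max - min) within any moving window exceeds the threshold."""
    for start, width in _windows(len(signal), srate, width_ms, step_ms):
        window = signal[start:start + width]
        if max(window) - min(window) > threshold_uv:
            return True
    return False


def step_exceeds(signal, srate, width_ms, step_ms, threshold_uv):
    """True if |mean(second half) - mean(first half)| of any window exceeds the threshold."""
    for start, width in _windows(len(signal), srate, width_ms, step_ms):
        half = width // 2
        first = signal[start:start + half]
        second = signal[start + half:start + width]
        diff = sum(second) / len(second) - sum(first) / len(first)
        if abs(diff) > threshold_uv:
            return True
    return False


# Synthetic HEOG trace at 250 Hz: a sustained 60 uV shift, as a saccade would produce.
heog = [0.0] * 200 + [60.0] * 200
reject_epoch = step_exceeds(heog, srate=250, width_ms=400, step_ms=10, threshold_uv=40)
```

The step function responds to a sustained shift in mean voltage, which is the signature of a saccade in HEOG, whereas a slow drift of the same total size spread over the whole epoch leaves the two half-window means close together and is not flagged.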
The trial rejection rates across subjects (mean ± SD) of cue-related epochs were 17.73 ± 14.00% (cue left) and 17.48 ± 14.02% (cue right) for Instructional cueing dataset, and 19.75 ± 14.69% (cue left) and 19.86 ± 15.50% (cue right) for Probabilistic cueing dataset. There were no significant differences in trial rejection rates between cue left and cue right in either dataset (both p > .7, paired sample t test). The trial rejection rates of target-related epochs were 7.27 ± 7.59% (attended targets) and 8.13 ± 7.88% (ignored targets) for Instructional cueing dataset, and 11.89 ± 12.92% (valid targets) and 11.64 ± 13.32% (invalid targets) for Probabilistic cueing dataset. There were no significant differences in trial rejection rates between cued (attended or valid) and uncued (ignored or invalid) targets in either dataset (both p > .4, paired sample t test). Only epochs with correct behavioral performance that were also artifact-free after correction in all channels were included in the following analysis.

| Univariate ERP analysis
Conventional univariate ERP analysis was performed by averaging EEG epochs of the same condition triggered either by the cue or by the target. Specifically, cue-related epochs were averaged according to the cue direction (left, right) with pre-cue interval (−200, 0 ms) as baseline, yielding cue-related ERPs for each condition, electrode, and participant. Target-related epochs were averaged according to the target location (left, right) and attention (cued, uncued) with pre-target interval (−200, 0 ms) as baseline, yielding target-related ERPs for each condition, electrode and participant.
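As a minimal sketch of the averaging step described above, the following plain-Python example baseline-corrects each epoch and averages across epochs of one condition at one electrode. The function names are hypothetical, and treating the baseline window as half-open (−200 ms inclusive, 0 ms exclusive) is an assumption; the text does not specify whether the sample at 0 ms is included.

```python
def baseline_correct(epoch, times_ms, base_start=-200, base_end=0):
    """Subtract the mean voltage in [base_start, base_end) from every sample."""
    idx = [i for i, t in enumerate(times_ms) if base_start <= t < base_end]
    baseline = sum(epoch[i] for i in idx) / len(idx)
    return [v - baseline for v in epoch]


def average_erp(epochs, times_ms):
    """Average baseline-corrected epochs of one condition at one electrode."""
    corrected = [baseline_correct(e, times_ms) for e in epochs]
    n = len(corrected)
    return [sum(e[i] for e in corrected) / n for i in range(len(times_ms))]


# Two toy epochs sampled at -200, -100, 0, and 100 ms.
times = [-200, -100, 0, 100]
erp = average_erp([[1, 1, 3, 5], [3, 3, 3, 7]], times)
```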
2.6 | Multivariate pattern classification/decoding

2.6.1 | Overview
We examined whether multichannel patterns of ERPs can be used to reveal neural representation of visual spatial attention. Since single-trial EEG data are noisy and attention-related ERPs have small amplitudes (e.g., typically less than 1 μV for EDAN, ADAN and LDAP), it would be difficult to decode their patterns based on single-trial EEG data. However, averaging across trials can substantially improve the signal-to-noise ratio, and thus increase decodability (Grootswagers et al., 2017; Luck, 2014). Thus, we applied a recently proposed ERP-based decoding approach (Bae & Luck, 2018), in which multiple EEG epochs from a given cue or target condition were first averaged to yield the ERPs at each channel, and decoding was then performed on multichannel patterns of ERPs rather than on single-trial data, as is often the case in EEG-based decoding studies. Decoding accuracy was computed based on the average of a test set of epochs with the same condition that was not in the training set used to define the classifier. In addition to ERPs that are phase-locked to the event of interest (cue onset or target onset), EEG data also contain non-phase-locked activity related to visual spatial attention, such as alpha (8-13 Hz) lateralization (Jia, Fang, & Luo, 2019; Liu et al., 2016; Worden et al., 2000). However, alpha lateralization was not the focus of this study. To ensure that our decoding was driven by ERP activity rather than alpha oscillations, we applied low-pass filtering at 8 Hz before decoding using a two-way least-squares FIR filter implemented in the eegfilt() routine in EEGLAB. The filter order was set as 3 × (srate/locutoff), where srate referred to the sampling frequency (250 Hz in this case), and locutoff referred to the cutoff frequency (8 Hz in this case). The filtering was applied to epoched EEG before averaging and decoding.
The first and last 200 ms in both cue-related and target-related epochs were removed to minimize edge artifacts due to filtering. It should be noted, however, that low-pass filtering at 8 Hz may not completely eliminate all activity >8 Hz, as filters always have a gradual transition around the cut-off frequency. To assess whether our findings of ERP decoding were driven by alpha activity, we performed an additional set of analyses by decoding different attention conditions (cue left vs. cue right) using alpha (8-13 Hz) power during the cue-target interval. Results are included as Figure S4 in Supplemental Materials. The weight maps of alpha decoding correspond to classical alpha lateralization but differ significantly from those of ERP decoding (see Figure 4). Furthermore, we correlated the accuracy of alpha decoding with attentional modulation of target-related N1 using the same approach as that for ERP decoding (see Section 2.7.1). As shown in Figure S5 in Supplemental Materials, no significant correlations were found between cue-related alpha decoding and attentional modulation of target-related N1, in contrast to the ERP-based decoding results to be presented below.
Together, such findings suggest that our ERP decoding was not significantly impacted by alpha.
To increase the efficiency of decoding analysis and reduce computation time, we further down-sampled the data to 50 Hz (1 data point every 20 ms). For cue-related epochs, we obtained a four-dimensional data matrix for each participant, with dimensions being time (100 time points), cue direction (cue left vs. cue right), trial, and electrode site (29 electrodes for Instructional cueing dataset, 60 electrodes for Probabilistic cueing dataset). For target-related epochs, we also obtained a four-dimensional data matrix for each participant, with dimensions being time (55 time points), attention condition (cued target vs. uncued target), trial, and electrode site. Target-related epochs were collapsed across the left and right visual fields for the decoding analysis. That is, left-target preceded by left-cue and right-target preceded by right-cue were combined as cued target, and left-target preceded by right-cue and right-target preceded by left-cue were combined as uncued target. We combined left and right targets in the decoding procedure for three reasons. First, our goal of target-related decoding was to determine whether the patterns related to attentional modulation can tell cued from uncued targets irrespective of their laterality. The significant above-chance decoding accuracy suggests that classifiers trained on collapsed target data can accurately tell whether a single instance of a left-target or right-target was cued or not, suggesting there is an overall attentional modulation pattern that is independent of laterality. Second, although the lateralized patterns of early sensory components (i.e., N1) were not taken into account, the attention effect would not be masked by the combination, and it is common in attention research to combine hemispheres to highlight the attentional modulation of ERP components. Third, if cued and uncued targets in the left and right visual fields were decoded separately, the number of trials would be too low to obtain reliable ERP patterns, especially for probabilistic cueing, where the uncued targets made up only 13.3% of the total number of trials.

| Support vector machine classifier
The classifier was based on a linear support vector machine (SVM) and trained through the MATLAB fitcsvm() function. The decoding procedure at a given time point included a training phase and a testing phase. Training and testing phases were based on different trials. Specifically, a threefold cross-validation procedure was applied at each time point. The data from two-thirds of the trials (randomly selected) were used to train a classifier (training), and then the performance of the classifier was assessed with the data from the remaining one-third of trials (testing). For the cue period, we first organized cue-related epochs with respect to the cue direction (left vs. right) and then divided all trials of the same condition into three equal-sized groups.
One or two trials from each cue direction were omitted if the trial number was not evenly divisible by 3. The trials in each group were averaged to yield a scalp distribution of ERPs for the time point being analyzed (a matrix of 3 groups × 2 cue directions × 29/60 electrodes). It should be noted that the accumulation of slow voltage drifts as time passed from the baseline period might result in overall decodable voltage differences that are not related to attention effects between classes (cue left vs. cue right). We thus performed z-score normalization across channels at each time point to minimize possible overall voltage differences between classes (Wen, Duncan, & Mitchell, 2019).
This normalization was performed within each class and each ERP pattern (averaged within each trial group). The data from two of the three groups served as a training dataset, which was used to train an SVM classifier, and the data from the remaining group served as a testing dataset. The trained SVM classifier was then used, with the help of the MATLAB function predict(), to predict the direction of visual spatial attention for the testing dataset. The output of this function provided a predicted cue direction for each observation in the testing dataset. Decoding accuracy was then computed by comparing the true labels of cue direction with the predicted labels. Decoding was considered correct only if the classifier correctly determined the direction of cued attention (left or right). The chance performance was 50%.
This decoding procedure was repeated three times, once with each of the three groups of data serving as the testing dataset. The entire procedure as described above was iterated 20 times, each time with a new random assignment of trials into three groups. This iteration could help to minimize idiosyncrasies associated with trial assignments, and thus yield a more stable result. After that, decoding accuracy was collapsed across 2 cue directions, 3 cross-validations, and 20 iterations, yielding an averaged decoding accuracy for a given time point based on 120 decoding attempts (2 cue directions × 3 cross validations × 20 iterations). After this procedure was applied to each time point from −800 to +1,200 ms (relative to cue onset), the averaged decoding accuracy values were smoothed across time points to minimize noise using a five-point moving window (for a given time point, the five-point window was centered at that point with two points on the left and two points on the right, which was equivalent to a time window of ±40 ms).
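The structure of this procedure (random assignment into three groups, within-group averaging, z-scoring across channels, threefold cross-validation, 20 iterations, and five-point smoothing) can be sketched as follows. This is an illustrative plain-Python outline, not the authors' MATLAB code. To keep it dependency-free, a nearest-centroid classifier stands in for the linear SVM trained with fitcsvm(), so the accuracies it produces are not comparable to those reported. All function names are hypothetical, and shrinking the smoothing window at the edges of the time course is an assumption.

```python
import random
from statistics import mean, pstdev


def zscore_across_channels(pattern):
    """Normalize one averaged ERP pattern to zero mean and unit SD across channels."""
    m, s = mean(pattern), pstdev(pattern)
    return [(v - m) / s for v in pattern]


def grouped_erps(trials, n_groups, rng):
    """Shuffle trials, drop the remainder, and average each of n_groups equal-sized groups."""
    shuffled = trials[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    erps = []
    for g in range(n_groups):
        group = shuffled[g * size:(g + 1) * size]
        erp = [mean(channel) for channel in zip(*group)]
        erps.append(zscore_across_channels(erp))
    return erps


def nearest_centroid_predict(train, test_pattern):
    """Stand-in classifier (NOT an SVM): return the label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label, patterns in train.items():
        centroid = [mean(channel) for channel in zip(*patterns)]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, test_pattern))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def decode_time_point(trials_by_class, n_groups=3, n_iter=20, seed=0):
    """Accuracy over n_iter x n_groups x n_classes attempts (120 with the defaults)."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(n_iter):
        erps = {c: grouped_erps(t, n_groups, rng) for c, t in trials_by_class.items()}
        for test_g in range(n_groups):
            train = {c: [p for g, p in enumerate(e) if g != test_g]
                     for c, e in erps.items()}
            for c, e in erps.items():
                correct += nearest_centroid_predict(train, e[test_g]) == c
                total += 1
    return correct / total


def smooth5(values):
    """Five-point moving average centered on each point (window shrinks at the edges)."""
    return [mean(values[max(0, i - 2):i + 3]) for i in range(len(values))]
```

In use, `trials_by_class` would hold, for one time point, a list of single-trial channel vectors per condition (for example, keys `"left"` and `"right"`). Calling `decode_time_point` once per time point and passing the resulting accuracies to `smooth5` gives the smoothed decoding time course.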
For the target period, SVM was again used to classify the attention condition (cued target vs. uncued target) based on the spatial distribution of target-related ERP signals over the scalp. The decoding procedure was identical to that for the cue period, except that the time period of analysis was the 55 time points from −300 to +800 ms (relative to target onset). Decoding was considered correct only if the classifier correctly determined the attention condition (cued target or uncued target). The chance performance was 50%.
In addition to decoding accuracy, we also examined the extent to which different channels drove the classifier performance by reconstructing the spatial distribution of the transformed classifier weights, namely, the activation patterns or weight maps. This was obtained by multiplying the classifier weights with the covariance matrix of the original data, yielding the weight maps for the classifier at each time point (Haufe et al., 2014).
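The transformation from classifier weights to an activation pattern can be illustrated with a small example. The sketch below is plain Python with hypothetical function names; it multiplies the sample covariance matrix of the data by the weight vector, which for a single linear classifier gives the activation pattern up to a scaling factor.

```python
def covariance_matrix(data):
    """Sample covariance across observations; data is a list of channel vectors."""
    n = len(data)
    n_ch = len(data[0])
    means = [sum(obs[c] for obs in data) / n for c in range(n_ch)]
    return [[sum((obs[i] - means[i]) * (obs[j] - means[j]) for obs in data) / (n - 1)
             for j in range(n_ch)]
            for i in range(n_ch)]


def activation_pattern(data, weights):
    """Activation pattern (up to scaling): covariance matrix times classifier weights."""
    cov = covariance_matrix(data)
    n_ch = len(weights)
    return [sum(cov[i][j] * weights[j] for j in range(n_ch)) for i in range(n_ch)]


# Two channels; the second carries the same signal as the first at twice the amplitude.
data = [[1, 2], [3, 6], [5, 10]]
pattern = activation_pattern(data, weights=[1, 0])
```

In this toy case the classifier places all its weight on the first channel, yet the activation pattern is nonzero at the second channel because the two channels covary. This is why weight maps derived in this way are more interpretable as scalp topographies than the raw classifier weights.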

| Statistical analysis of decoding accuracy
If the multivariate ERP patterns across electrodes contain information about the two conditions being compared (cue left vs. cue right for cue-related activity; cued target vs. uncued target for target-related activity), then the decoding accuracy should be greater than chance level, which was 50%. We tested whether the group-level decoding accuracy at each time point was above chance level by performing a one-tailed signed rank test against the chance level of 50%. False discovery rate (FDR) correction was used to address the multiple comparison problem with q < 0.05. Furthermore, to control for possible isolated time points that might have high decoding accuracy by chance, we removed time windows with fewer than three contiguous time points that survived FDR correction.
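The correction and the contiguity filter can be sketched as below. The text does not state which FDR procedure was used, so the Benjamini-Hochberg step-up procedure here is an assumption. The function names are hypothetical, and the per-time-point p values are taken as given (for example, from a signed rank test run elsewhere).

```python
def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns one boolean per p value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank  # largest rank whose p value passes its threshold
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


def drop_short_runs(flags, min_len=3):
    """Keep only runs of at least min_len contiguous significant time points."""
    out = [False] * len(flags)
    start = None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                for k in range(start, i):
                    out[k] = True
            start = None
    return out


# Toy p values for eight consecutive time points.
p = [0.001, 0.002, 0.003, 0.8, 0.004, 0.9, 0.7, 0.6]
kept = drop_short_runs(fdr_bh(p, q=0.05), min_len=3)
```

In this example four time points survive FDR correction, but the isolated one at the fifth position is removed because it does not belong to a run of at least three contiguous significant points.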

| Minimizing the impact of eye movements
As discussed earlier, two steps were applied to remove ocular artifacts: ICA-based artifact correction and rejecting trials with overt eye movements. Past work has suggested that ICA may not be sufficient to remove eye movement-related artifacts for the purpose of decoding (Quax, Dijkstra, van Staveren, Bosch, & van Gerven, 2019).
We tested this in our data by conducting an additional set of decoding analyses to check whether eye movement-related artifacts were still detectable in the preprocessed data. Two anterior prefrontal EEG channels near the eyes (i.e., F7 and F8) were chosen as proxies of eye movements for decoding analysis. Given that these two channels contain minimal contribution from the posterior regions of the brain underlying the control of spatial attention, above-chance level decoding from F7 and F8, especially during the late cue-target interval, would suggest that there were still decodable eye movement-related artifacts in EEG. By contrast, chance level decoding from these two channels would suggest that there were no substantial eye movement-related artifacts left in EEG data after artifact removal.
These results demonstrated that ICA correction is an important step. Without ICA, the decoding accuracy from the two frontal channels was above chance level during the cue-target interval in both datasets (Figure 2), indicating that systematic eye movements could contribute to EEG decoding even after excluding trials contaminated by overt eye movements. After ICA correction, however, the decoding accuracy from the two frontal channels was at chance level throughout the cue-target interval in both datasets, suggesting that the ICA correction for eye movements was successful and the decoding analysis reported below was not adversely impacted by eye movements.

Hillyard, 1991). In this study, we calculated the differences of N1 amplitudes between cued target and uncued target (with the negative sign maintained) as the index of attention modulation, and correlated it with the decoding accuracy in the cue-target interval using Pearson correlation (two-tailed). This correlation analysis was performed at each time point from 0 to +1,200 ms (relative to cue onset) to yield a time course of r values and corresponding p values.
Although the paradigms of the two experiments differed in terms of whether or not targets presented in the uncued location required response, subjects were expected to voluntarily shift their attention to the cued location in both paradigms. In other words, the general attention orienting process during the cue-target interval is the same between the two datasets, and the cue-target interval is also of the same duration in the two datasets (1,000-1,200 ms). When appropriate, we performed a meta-analysis by combining the p values from the two datasets using the Liptak-Stouffer meta-analysis, a well-validated method for combining multiple datasets (Liptak, 1958; Liu et al., 2017). Specifically, the p value of correlation at each time point was converted to its corresponding Z-score using the equation:

$$Z_i = \Phi^{-1}(1 - p_i)$$

where $\Phi$ is the standard normal cumulative distribution function and $i$ represents the $i$th dataset. Here, $i$ = 1 (Instructional cueing dataset) or 2 (Probabilistic cueing dataset). Next, a combined Z-score for the correlation at each time point was calculated using the Liptak-Stouffer formula:

$$Z = \frac{\sum_{i=1}^{2} w_i Z_i}{\sqrt{\sum_{i=1}^{2} w_i^2}}$$

where $Z$ is the meta-analysis Z-score across the two datasets, $Z_i$ is the Z-score of the $i$th dataset, $w_i = \sqrt{N_i}$ is the weight of the $i$th dataset, and $N_i$ is the number of subjects in the $i$th dataset. Finally, the combined p value was identified from the meta-analysis Z-score at each time point.
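The Liptak-Stouffer combination described above can be sketched with the Python standard library. The function name is hypothetical, and the one-sided conversion between p values and Z-scores (Z = Φ⁻¹(1 − p), combined p = 1 − Φ(Z)) is an assumption about the convention; each p value must lie strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combined_p(p_values, n_subjects):
    """Combine per-dataset p values with weights sqrt(N_i); returns (Z, combined p)."""
    norm = NormalDist()  # standard normal distribution
    z_scores = [norm.inv_cdf(1 - p) for p in p_values]
    weights = [sqrt(n) for n in n_subjects]
    z_combined = (sum(w * z for w, z in zip(weights, z_scores))
                  / sqrt(sum(w ** 2 for w in weights)))
    return z_combined, 1 - norm.cdf(z_combined)


# Sample sizes from the text: N = 30 (instructional) and N = 26 (probabilistic).
z, p_combined = stouffer_combined_p([0.05, 0.05], [30, 26])
```

Two datasets that each sit at p = .05 yield a combined p value well below .05, which illustrates how the method pools consistent evidence across experiments while weighting the larger dataset slightly more.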
Next, we assessed the statistical significance of the correlation results from the above procedure by performing a cluster-based permutation test. Specifically, we first found clusters of contiguous time points for which the combined single-point correlation was significant (meta-analysis p < .05) and computed the cluster size (number of contiguous time points). The minimal cluster size was set as 1. We then asked whether a given cluster size was greater than the size that would be expected by chance through permutation tests. This controls the Type I error rate at the cluster level, yielding a probability of .05 that one or more clusters would be significant if true decoding accuracy was at chance (Groppe, Urbach, & Kutas, 2011 2001). In each iteration, the above permutation was performed in each dataset separately, and the p values at each time point from the two datasets were then combined using the Liptak-Stouffer meta-analysis introduced above. After that, we computed the cluster size for which the combined single-point correlation was significant (p < .05) based on the meta-analysis p values for that permutation iteration. If we observed more than one cluster with significant combined p values, we then took the largest cluster size as the size for that iteration.
The above procedure was iterated 1,000 times to produce a null distribution for the cluster size. To compute the p value for a given cluster size observed in the actual datasets, we simply found where this cluster size fell within the null distribution of cluster size. The p value for a given cluster was then set based on the nearest percentiles of the null distribution. If the obtained cluster size was larger than the maximum permuted cluster size, we reported p < .001. If an observed cluster size was in the top 5% of the null distribution, we rejected the null hypothesis and concluded that the correlation was significant for that observed cluster.
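The cluster-size logic of this permutation test can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the names `max_cluster_size`, `permutation_null`, and `cluster_p_value` are hypothetical, and the per-time-point p values are supplied by a caller-provided function `p_timecourse_fn` (also hypothetical), which would be the place to compute the correlations and, where applicable, the meta-analytic combination.

```python
import random


def max_cluster_size(p_values, alpha=0.05):
    """Length of the longest run of contiguous time points with p < alpha."""
    longest = current = 0
    for p in p_values:
        current = current + 1 if p < alpha else 0
        longest = max(longest, current)
    return longest


def permutation_null(accuracy_by_time, measure, p_timecourse_fn,
                     n_iter=1000, seed=0):
    """Null distribution of the largest cluster size.

    accuracy_by_time -- list over time points of per-subject decoding accuracies
    measure          -- per-subject values (e.g., N1 modulation or RT effect)
    p_timecourse_fn  -- hypothetical callback returning one p value per time point
    """
    rng = random.Random(seed)
    null_sizes = []
    for _ in range(n_iter):
        shuffled = list(measure)
        rng.shuffle(shuffled)  # randomly re-pair subjects' accuracy and measure
        p_values = p_timecourse_fn(accuracy_by_time, shuffled)
        null_sizes.append(max_cluster_size(p_values))
    return null_sizes


def cluster_p_value(observed_size, null_sizes):
    """Fraction of permuted largest-cluster sizes at least as large as observed.

    A result of 0 means the observed cluster exceeded every permuted cluster,
    which would be reported as p < 1 / len(null_sizes).
    """
    n_at_least = sum(1 for size in null_sizes if size >= observed_size)
    return n_at_least / len(null_sizes)
```

Taking only the largest cluster from each permutation is what keeps the error rate controlled at the cluster level, since every observed cluster is compared against the same worst-case null.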
Finally, to see whether univariate ERPs during the cue-target interval were able to predict the attentional modulation of target processing, we performed the same correlation analysis between univariate ERPs and attentional modulation of target-related N1, and used the same permutation test to assess the statistical significance.
This was performed on the basis of ERP difference waves (cue left minus cue right) for each channel separately. FDR correction was applied to correct for multiple comparisons across channels. The corrected p values were then used to find clusters of contiguous time points. Since the channel numbers differed between the two datasets, we reported the results (the cluster-level p value and mean r value in each cluster) for each dataset separately instead of combining the results using meta-analysis.

3.2 | Attentional modulation of target-related N1

| Instructional cueing dataset
We conducted a two-way analysis of variance (ANOVA) with Attention (attend vs. ignore) and Target Location (left vs. right) as within-subject factors on target-related N1 amplitudes. We observed the main effect of Attention (F(1,29) = 60.114, p < .001, ηp² = 0.675), suggesting that attention significantly modulated the sensory processing of targets (see Figure 3a). No main effect or interaction was observed for the factor of Target Location.

| Comparing N1 modulation between the two datasets
We compared the magnitude of attentional modulation of N1 amplitudes (cued target minus uncued target) between the two datasets using a t test.

Instructional cueing dataset
As Figure 4a illustrates, the decoding accuracy began to rise above chance level (50%) shortly after the cue onset, and more specifically, a signed rank test (FDR-corrected) indicated that the decoding was significantly greater than chance level starting at 80 ms and remaining significant until the end of the cue-target analysis interval.
The weight maps showed a frontally-posteriorly lateralized distribution during the 400-600 ms interval, which then diminished as time progressed, and became a centrally lateralized distribution in the later cue-target interval (1,000-1,200 ms). These weight maps were similar to the ERP topographical maps in corresponding time windows (see Figure S1 in Supplemental Materials).

Probabilistic cueing dataset
As Figure 4b illustrates, the decoding accuracy began to rise above chance level (50%) at 160 ms after the cue onset according to a signed rank test (FDR-corrected), and remained significant until the end of the cue-target analysis interval. Overall, the weight maps from probabilistic cueing were consistent with those from instructional cueing, except that in the later cue-target interval (1,000-1,200 ms), the weight maps from probabilistic cueing were more anteriorly distributed.
Comparing the onset of above chance decoding between the two datasets

We used bootstrap resampling (100 times) across subjects to compute a distribution of decoding onset times for each dataset. A t test suggested that the decoding onset time was significantly earlier in Instructional cueing dataset than in Probabilistic cueing dataset (64.8 ± 3.4 ms vs. 161.4 ± 3.5 ms, t(198) = −19.706, p < .001, Cohen's d = 2.787).
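As a consistency check on the statistics reported above, and assuming an independent-samples t test on two groups of 100 bootstrap estimates each (an assumption inferred from the 100 resamples per dataset and the 198 degrees of freedom), the magnitude of Cohen's d follows from the t statistic as:

$$
|d| = |t|\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 19.706 \times \sqrt{\frac{2}{100}} \approx 2.787
$$

Under the same assumption, the implied pooled standard deviation is roughly (161.4 − 64.8) / 2.787 ≈ 35 ms, so the ± values would be consistent with standard errors of the mean (about 3.5 ms for 100 estimates) rather than standard deviations.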

Comparing the decoding accuracy between the two datasets
We divided the cue-target interval after decoding onset into two equal-sized time windows, that is, 200-700 ms (early) and 700-1,200 ms (late), and averaged the decoding accuracy within each window for each subject and dataset. These two time windows roughly corresponded to the shift and the maintenance stage in visual spatial attention, respectively (Dale et al., 2008; Muller et al., 1998; Rihs, Michel, & Thut, 2009). A two-way ANOVA with Dataset (Instructional vs. Probabilistic) as a between-subject factor and Window (early vs. late) as a within-subject factor revealed a main effect of Dataset (F(1,54) = 4.649, p = .036, ηp² = 0.079), suggesting significantly higher decoding accuracy for Instructional cueing dataset than for Probabilistic cueing dataset. No other main effect or interaction was observed for this ANOVA.

| Linking decoding accuracy and attentional modulation of target processing
As with any other neurophysiological variables, decoding accuracy varies significantly across individuals. We take that as an opportunity to examine the functional significance of decoding accuracy. For cue-related decoding accuracy, we correlated it with the magnitude of the attentional modulation of the target-evoked N1 component, the classic marker of attention selection. The correlation coefficients (r values) and corresponding p values for Instructional cueing dataset and Probabilistic cueing dataset are shown in Figure 5a,b as functions of time. We observed negative correlations (i.e., higher decoding accuracy predicts larger N1 attentional modulation) during the cue-target interval for both datasets, and such correlation reached statistical significance around 500 ms; this finding was consistent across the two datasets. For Probabilistic cueing dataset, the direction of this correlation was reversed briefly around 900 ms, but the same was not observed in Instructional cueing dataset.
The correlation results from the two datasets were further combined through the Liptak-Stouffer meta-analysis. The correlation between decoding accuracy within the 460-660 ms post-cue window (see Figure 5c, red region) and attentional modulation of target-related N1 was negative, that is, higher decoding accuracy predicted greater attentional modulation of target-related N1. Further permutation test confirmed that the correlation during this 200 ms length window was statistically significant (p = .037) (see Figure 5d, red line). There was another shorter window (1,020-1,060 ms post-cue) showing significant correlation (see Figure 5c, green region). However, permutation test suggested that the correlation during this 40 ms length window was not significant (p = .443) (see Figure 5d, green line).
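The time-resolved between-subject correlation underlying these results can be sketched as below. This is a hedged, standard-library illustration: the function names and the toy data are hypothetical, and the sketch returns only r values (the corresponding p values would require a t distribution from a statistics library). Because the N1 difference keeps its negative sign, a negative r means that higher decoding accuracy goes with a larger attentional modulation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_time_course(accuracy_by_time, measure):
    """One r value per time point.

    accuracy_by_time -- list over time points of per-subject decoding accuracies
    measure          -- per-subject measure (e.g., N1 cued-minus-uncued difference)
    """
    return [pearson_r(accuracy, measure) for accuracy in accuracy_by_time]


if __name__ == "__main__":
    # Toy data for four hypothetical subjects at two time points
    accuracy_by_time = [
        [0.60, 0.70, 0.80, 0.90],   # accuracy rises across subjects
        [0.75, 0.70, 0.80, 0.72],   # no clear ordering
    ]
    n1_modulation = [-0.5, -1.0, -1.5, -2.0]  # more negative = larger modulation
    print(correlation_time_course(accuracy_by_time, n1_modulation))
```

At the first toy time point the relation is perfectly linear, so r is close to −1; real data would of course yield much weaker correlations.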
Next, we examined whether cue-related univariate ERPs predicted the N1 attention effect. The univariate ERP difference waves (cue left minus cue right; see Figure S1 in Supplemental Materials) were correlated with the attentional modulation of N1, and the results are shown (0-1,200 ms post-cue) in Figure 6. FDR correction was first applied to correct for multiple comparisons across channels, and a permutation test was then applied to find significant clusters of contiguous time points. No significant cluster was identified for either dataset. Thus, unlike the decoding accuracy derived from multichannel ERPs, individual differences in univariate ERPs showed no relation to those in attentional modulation of N1.

FIGURE 3 Target-related event-related potentials (ERPs) and attentional modulation on N1 amplitudes in the Instructional cueing dataset (a) and Probabilistic cueing dataset (b). Target-related ERPs were constructed over posterior scalp regions that were contralateral to target location (i.e., left hemisphere for right targets, right hemisphere for left targets), then averaged across left and right hemispheres.
Finally, we also explored whether cue-related decoding accuracy was correlated with behavior using the same approach as that for the correlation between target-related decoding accuracy and RT or RT difference (see Section 2.7.2). However, no significant results were found for either dataset (see Figure S6 in Supplemental Materials).

Instructional cueing dataset
After the target onset, as Figure 7a illustrates, the decoding accuracy began to rise above chance level (50%) at 100 ms, and remained significantly above chance level for the remainder of the analysis period (mean RT was 470 ms). The weight maps showed a frontal distribution during the 200-400 ms interval, and became more parietal-distributed as time progressed. These weight maps were similar to the ERP topographical maps from corresponding time windows (see Figure S2 in Supplemental Materials).

FIGURE 4 Mean accuracy of event-related potential (ERP)-based multivariate decoding for cue-related neural processing (cue left vs. cue right) in the Instructional cueing dataset (a) and Probabilistic cueing dataset (b). Chance level performance (0.5) is indicated by the horizontal dash lines. Gray areas indicate clusters of time points in which the decoding was significantly greater than chance after the false discovery rate (FDR) correction for multiple comparisons. The blue shading indicates ±1 SEM. Weight maps from successive time points within the indicated windows were averaged and shown on the right.

FIGURE 5 Results of between-subject correlation between cue-related event-related potential (ERP)-based decoding accuracy and attentional modulation of target-related N1 in Instructional cueing dataset (a) and Probabilistic cueing dataset (b). The p values from the two datasets were combined by the Liptak-Stouffer meta-analysis (c). The p = .05 and r = 0 are indicated by the horizontal dash lines. Panel (d) shows the results of permutation tests for the two consecutive time windows identified in Panel (c). The null distribution was estimated from 1,000 permutations of the data, by randomly pairing one subject's decoding accuracy with another subject's N1 modulation. If the window length from the observed data (red and green lines) falls within the top 5% of values from the null distribution (indicated by the yellow area), the observed window is considered to be significant.

Probabilistic cueing dataset
Similarly, as Figure 7b illustrates, the decoding accuracy began to rise above chance level (50%) at 160 ms after the target onset. The decoding accuracy gradually decreased after reaching its peak at 300 ms, but still remained significant until 740 ms (mean RT in invalid trials was 566 ms). Similar to the ERP topographical maps (see Figure S2 in Supplemental Materials), the weight maps showed a frontal distribution during the 200-400 ms interval, which then diminished as time progressed.
Comparing the onset of above chance decoding between the two datasets

We used bootstrap resampling (100 times) across subjects to compute a distribution of decoding onset times for each dataset. A t test suggested that the decoding onset time was significantly earlier in Instructional cueing dataset than in Probabilistic cueing dataset (85.8 ± 3.0 ms vs. 156.0 ± 4.1 ms, t(198) = −13.734, p < .001, Cohen's d = 1.942).

Comparing decoding accuracy between the two datasets
We divided the target analysis interval after decoding onset into two equal-sized time windows, that is, 200-500 ms (early) and 500-800 ms (late), and averaged the decoding accuracy within each window for each subject and dataset. The splitting point, that is, 500 ms, roughly corresponded to the mean RT (to attended or valid targets) in the two experiments, and thus such separation would help to reveal possible differences before and after behavioral responses.

FIGURE 6 Results of between-subject correlation between cue-related univariate event-related potential (ERP) difference waves (cue left minus cue right) and attentional modulation of target-related N1 in Instructional cueing dataset (a) and Probabilistic cueing dataset (b). The p = .05 and r = 0 are indicated by the horizontal dash lines. Five channels with the smallest averaged p values within 0-1,200 ms interval were shown. All p values were false discovery rate (FDR)-corrected for channels at each time point. No significant effects were found.
A two-way ANOVA with Dataset (Instructional vs. Probabilistic) as a between-subject factor and Window (early vs. late) as a within-subject factor revealed a main effect of Dataset (F(1,54) = 187.032, p < .001, ηp² = 0.776), suggesting significantly higher decoding accuracy for Instructional cueing dataset than for Probabilistic cueing dataset. A main effect of Window (F(1,54) = 39.033, p < .001, ηp² = 0.420) suggested that decoding accuracy was significantly lower in the late window than in the early window. No interaction was observed.
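The partial eta squared values reported above can be recovered from the F statistics and their degrees of freedom. The following sketch shows the standard conversion; the function name `partial_eta_squared` is introduced here for illustration and is not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Main effect of Dataset: F(1,54) = 187.032 (reported eta_p^2 = 0.776)
    print(round(partial_eta_squared(187.032, 1, 54), 3))
    # Main effect of Window: F(1,54) = 39.033 (reported eta_p^2 = 0.420)
    print(round(partial_eta_squared(39.033, 1, 54), 3))
```

Both values agree with the reported effect sizes to three decimal places.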

| Linking decoding accuracy with RT
To examine the functional significance of decoding accuracy following the target onset, we correlated individual differences in decoding accuracy at each time point within 0-560 ms post-target interval with individual differences in behavioral performance (RT to attended targets for Instructional cueing dataset, RT difference between invalid and valid trials for Probabilistic cueing dataset). The correlation coefficients (r values) and corresponding p values for Instructional cueing dataset and Probabilistic cueing dataset are shown as functions of time in Figure 8a,b. From 180 to 560 ms, the correlation between decoding accuracy and RT was statistically significant (p < .001) for Instructional cueing dataset, according to a permutation test (see Figure 8a, red line). The negative r values suggested that individuals with higher decoding accuracy had faster responses to attended targets. For Probabilistic cueing dataset, the correlation between decoding accuracy and RT difference was statistically significant (p = .004) from 60 to 340 ms, according to a permutation test (see Figure 8b, red line). The positive r values suggested that individuals with higher decoding accuracy exhibited larger RT differences between invalid and valid trials (i.e., stronger benefits from attention cueing).

FIGURE 7 Mean accuracy of event-related potential (ERP)-based multivariate decoding for target-related epochs (cued vs. uncued) in Instructional cueing dataset (a) and Probabilistic cueing dataset (b). Chance level performance (0.5) is indicated by the horizontal dash lines. Gray areas indicate clusters of time points in which the decoding was significantly greater than chance level after the false discovery rate (FDR) correction for multiple comparison problem. The blue shading indicates ±1 SEM. Weight maps from successive time points within the indicated windows were averaged and shown on the right.
FIGURE 8 Results of between-subject correlation between target-related event-related potential (ERP)-based decoding accuracy and reaction time (RT) effects ((a): RTs to attended targets; (b): invalid RTs minus valid RTs) in Instructional cueing dataset (a) and Probabilistic cueing dataset (b). The p = .05 and r = 0 are indicated by the horizontal dash lines. For each dataset, the null distribution was estimated from 1,000 permutations of the data, by randomly pairing one subject's decoding accuracy with another subject's RT effect. If the window length from the observed data (red line) falls within the top 5% of values from the null distribution (indicated by the yellow area), the observed window is considered to be significant.

Next, we correlated the univariate ERP difference waves (cued target minus uncued target, with left and right targets combined; see Figure S2 in Supplemental Materials) with RT or RT difference across subjects in each dataset. The correlation analysis was performed at each time point within the 0-560 ms post-target interval (Figure 9). FDR correction was first applied to correct for multiple comparisons across channels, and a permutation test was then applied to find significant clusters of contiguous time points. No significant cluster was identified for either dataset.
Finally, it should be noted that the mean RT was used in the present study as the behavioral measure for target processing. This makes our findings more comparable with previous visual spatial attention studies in which the mean RT was typically reported. However, RT distributions are known to be non-Gaussian and right-skewed, and therefore the median RT might be a better estimator than the mean RT as the summary statistic of the distribution. We performed an additional set of correlation analyses with the median RT as the behavioral measure for target processing, that is, median RT to attended targets for Instructional cueing dataset, and difference in median RT between valid and invalid trials (invalid minus valid) for Probabilistic cueing dataset. As shown in Figure S7 in Supplemental Materials, the results were generally similar to those observed using mean RT.

| DISCUSSION
We applied machine learning approaches to multichannel ERP data to examine the dynamics and functional significance of neural representations of attention control and selection in two cued visual spatial attention experiments (probabilistic cueing vs. instructional cueing). SVM-based multivariate decoding was performed at each time point in the cue-related time period (cue left vs. cue right) and in the target-related time period (cued target vs. uncued target). We found that following cue onset, the decoding accuracy began to rise above chance level at 80 ms for the Instructional cueing dataset, and at 160 ms for the Probabilistic cueing dataset. Across subjects, decoding accuracy between 460 and 660 ms post-cue predicted the magnitude of attentional modulation of subsequent target processing indexed by the target-evoked N1 component, indicating that the attentional set or template implemented between 460 and 660 ms directly affected the attention selection of the target. During target processing, the decoding accuracy began to rise above chance level at 100 ms for the Instructional cueing dataset, and at 160 ms for the Probabilistic cueing dataset. Across subjects, decoding accuracy over a broad post-target time window predicted RT (or RT cueing effect), suggesting that the distinctness of neural representations of attended information affected subsequent behavioral performance. In contrast, univariate ERP analysis failed to provide an association between ERP attention effects during the cue-target interval and the attentional modulation of target-evoked response or between ERP attention effects during target processing and subsequent behavioral performance. Together, these findings suggest that multivariate decoding analysis of ERPs is a powerful approach, and along with the conventional univariate ERP analysis, offers more comprehensive insights into the neural mechanisms of attention control and selection in visual spatial attention.

FIGURE 9 Results of between-subject correlation between target-related event-related potential (ERP) difference waves (cued target minus uncued target) and reaction time (RT) effects (Panel (a): RTs to attended targets; Panel (b): invalid RTs minus valid RTs) in Instructional cueing dataset (a) and Probabilistic cueing dataset (b). The p = .05 and r = 0 are indicated by the horizontal dash lines. Five channels with the smallest averaged p values within 0-560 ms interval were shown. All p values were false discovery rate (FDR)-corrected for channels at each time point. No significant effects were found.

| Decoding attention control in anticipatory attention
Consistent with our hypothesis that instructional cueing is associated with earlier formation of the attentional set, the decoding accuracy (cue left vs. cue right) began to rise above chance level at 80 ms following the onset of instructional cue, but at a much delayed time of 160 ms following the onset of probabilistic cue. EDAN, an early ERP component thought to mark the initial attention shift toward the attended location (Harter et al., 1989;Hopf & Mangun, 2000), often appears at 200 ms post-cue (sometimes earlier at 160 ms, see Nobre et al. (2000)) in previous ERP analysis of the spatial cueing paradigms. The timing difference between above-chance-level decoding onset (80 or 160 ms) and EDAN latency (200 ms) suggested that the attention shift was initiated earlier than previously thought.
Although no study has explicitly compared the onset latency of EDAN between instructional cueing and probabilistic cueing, the 80 versus  van Velzen and Eimer (2003).
If this were the case, the decoding accuracy should rise above chance level equally early for both instructional cueing and probabilistic cueing, since similar arrow cues were used in both paradigms. Our finding that the onset of above-chance decoding was 80 ms earlier in instructional cueing than in probabilistic cueing appears to suggest that this is not the case. In the same vein, the correlation between cue-related decoding accuracy and attentional modulation of target-related N1 may also have been sensory-driven, namely, the subjects who had stronger ERP differences between a leftward versus rightward arrow also had stronger differences between attended and unattended N1s. Our result that significant correlation started around 460 ms post-cue (the time of LDAP, see the discussion below), rather than the early parts of cue processing, appears to rule out such sensory confound. Despite the foregoing, how to more thoroughly disambiguate the ERP effects due to differences in stimuli and to differences in the cognitive processes that these stimuli evoke remains a question to be addressed by future studies.
As time progressed, the decoding accuracy (cue left vs. cue right) continued to rise and reached a local maximum around 300 ms in both datasets, and then declined slightly, but remained well above chance level until the end of the cue-target interval. This temporal pattern suggests that the subject, following instructions, was able to maintain a state of covert attention until the onset of target processing. In contrast to the relatively stable multivariate decoding dynamics, univariate ERP analysis showed that ADAN, an ERP component reflecting supramodal mechanisms of attentional engagement in frontal areas (Eimer et al., 2002), appeared at 350 ms in both datasets, whereas LDAP, an ERP component indexing increase in the excitability of occipital cortical neurons (Hopf & Mangun, 2000;Kelly et al., 2009), started to appear at 400 ms in Instructional cueing dataset and at 450 ms in Probabilistic cueing dataset (see Figure S1 in Supplemental Materials). Both ADAN and LDAP vanished after 700 ms, and a contralateral pretarget negativity with a frontal concentration became the dominant ERP phenomenon. Although previous studies have already reported contralateral negativity throughout the pretarget period, the scalp distribution of this negativity varied across studies, that is, within the occipital-parietal area (i.e., BRN) (Grent-'t-Jong et al., 2011;Grent-'t-Jong & Woldorff, 2007), the frontal area (Hopf & Mangun, 2000) or a broader area including frontal and parietal regions (Dale et al., 2008). This indicated that multiple neural processes might underlie the late negativity, making it insufficient to examine these processes in a univariate ERP approach. Despite this uncertainty, our multivariate decoding analysis suggested that the distinctness of neural representation of the attentional set during this late cue-target interval (>700 ms) did not significantly decline compared to the early cue-target interval (<700 ms).
Decoding accuracy (cue left vs. cue right) is an indicator of how well attended information is represented in the brain. As such, it is reasonable to expect that better representation of attended information in the cue-target interval will result in stronger attentional modulation of target processing. We tested this hypothesis by correlating, across subjects, the decoding accuracy derived from multivariate classification analysis with attentional modulation of target-evoked N1.
As shown in Figure 5, during the 460-660 ms period, higher decoding accuracy was associated with a larger attentional modulation of the target-evoked N1 component (the correlation coefficient itself was negative because the N1 difference retains its negative sign). In visual spatial attention, this time period was often regarded as the critical stage in the implementation of the attention control state for the representations of task-relevant locations (Dale et al., 2008; Grent-'t-Jong & Woldorff, 2007; Hopf & Mangun, 2000). Our results thus suggest that the anticipatory attentional state that was implemented at this time can directly impact subsequent attention selection of behaviorally relevant stimuli, and more importantly, this attentional state was indexed as a whole-brain ERP pattern instead of univariate ERP amplitudes based on any single electrode, as our univariate ERP analysis did not reveal any correlation between ERP amplitudes and attentional modulation of target-evoked N1. The weight maps of the classifier during the 400-600 ms post-cue interval showed a frontal-posterior pattern that corresponded with a combined ADAN and LDAP topography (see Figure 4), suggesting that these ERP components jointly, rather than singly, influence the attentional modulation of target-related ERPs.
This may explain why univariate ERP analysis was not able to predict the magnitude of attentional modulation of target-evoked N1. During the latter part of the cue-target interval (>700 ms), although the decoding accuracy did not significantly decline, it no longer predicted attentional modulation of N1. This contrasts with previous reports that late negativity within the occipital-parietal area (Grent-'t-Jong et al., 2011) or the frontal area (Dale et al., 2008) alone predicted attentional modulation of N1. On one hand, the relatively easy discrimination task used in our experiments, compared with a more difficult task, could substantially decrease the amplitudes of the late negativity (Grent-'t-Jong et al., 2011), which might then reduce the decodability during the late cue-target interval. On the other hand, in the late cue-target interval, the neural representation involving multiple neural processes captured by the multivariate decoding analysis may not have the same simple relationship with attentional modulation of N1. The above two aspects may underlie the reason that above chance level decoding in the late cue-target interval did not predict attentional modulation of N1.

| Decoding attention selection during target processing
Following the onset of the target stimulus, the decoding accuracy (cued target vs. uncued target) began to rise above chance level at 100 and 160 ms for the Instructional and Probabilistic cueing datasets, respectively. This again illustrated that attention exerted an earlier influence on target processing under instructional cueing than under probabilistic cueing. Although no study has explicitly compared the onset of attention effects between instructional cueing and probabilistic cueing using univariate ERP analysis, previous studies have shown that attention selection of sensory processing occurred as early as P1 component at 100 ms after stimulus onset (Hillyard & Anllo-Vento, 1998; Luck et al., 2000). In this study, significant attention effect for P1 component was not observed in the univariate ERP analysis, but the multivariate decoding analysis shows that attention effects are represented in multivariate patterns during the same time period. Moreover, the higher decoding accuracy in instructional cueing relative to probabilistic cueing was again as expected, suggesting that unilateral attentional focus facilitates the attention selection of the target stimulus. As time progressed, further differences between the two paradigms emerged. In Instructional cueing dataset, the decoding accuracy remained relatively high (>0.8) after reaching the peak at 300 ms (Figure 7a), while in Probabilistic cueing dataset, the decoding accuracy gradually declined after reaching the peak at 300 ms, and became nonsignificant near the end of the analysis period (Figure 7b). This pattern was also expected from the paradigm requirements.
In the probabilistic cueing experiment, subjects needed to orient their attention from the cued location to the uncued location upon seeing targets in the uncued location, which caused a gradual decrease of the decodability between valid and invalid targets, whereas in the instructional cueing experiment, no such attention shift was needed to complete the task, and the subject's attention focused on the cued location even after the target stimulus appeared in the uncued location (such target stimuli were ignored). Utilizing the individual differences in decoding accuracy and behavioral performance, the functional significance of decoding accuracy was examined by correlating decoding accuracy during target processing and RT (or RT cueing effect) across subjects. Significant correlation was observed soon after target onset (see Figure 8), and this correlation remained significant until the response was made, lending support to the notion that stronger attention selection of the target stimulus leads to better behavioral performance.
It is worth noting that from decoding analysis it is not easy to discern the individual contribution of distributed cognitive processes activated by target onset. Univariate ERP analysis has revealed attentional modulation of neural activity at multiple stages of information processing, including ERP differences following early sensory components (e.g., P1 and N1) as well as prior to response execution (e.g., Nd1, Nd2, and LPD) (Curran et al., 2001; Eimer, 1996; Eimer, 1998; Mangun & Buck, 1998; Mangun & Hillyard, 1991). Although the precise functional correlates of these late ERP activities are still not clear, they should reflect perceptual, cognitive, and motor consequences of spatial attention (Mangun & Buck, 1998). More interestingly, the maximum decoding accuracy at 300 ms roughly corresponds with the latency of Nd2, which is part of the broader LPD (Curran et al., 2001; Eimer, 1996; Eimer, 1998), indicating that the decoding accuracy might be primarily driven by Nd2. This inference was further supported by the similarity between the SVM weight maps (see Figure 7) and ERP topographical maps (see Figure S2 in Supplemental Materials). Despite these important findings, univariate ERP differences between cued target and uncued target at single electrodes did not predict behavioral performance (see Figure 9). One reason may be that the ERP patterns over the scalp could differ between different stages of information processing in spatial attention. By contrast, the decoding analysis can capture the ERP pattern over the whole scalp, which appeared to be a reliable predictor for behavioral performance in spatial attention tasks.
One limitation of the present study is that two different cohorts participated in the two experiments. When comparing results between the two experiments, such a between-subject design generally has lower statistical power than a within-subject design, leading to elevated chances of false negatives, that is, of missing true differences between cueing strategies. Another limitation of the present study is that the duration of the target stimulus differed between the two experiments (200 vs. 100 ms), which might affect the early sensory activity elicited by the target. However, previous research reported no differences in either the amplitude or the latency of N1 due to differences in stimulus duration (50, 150, and 250 ms) (Busch, Debener, Kranczioch, Engel, & Herrmann, 2004). Thus, the attentional modulation of target processing, indexed by the difference in N1 amplitude between cued and uncued targets, is unlikely to be affected by the difference in target duration.

| Conclusions
In the present study, we applied a machine learning approach to analyze neural representations of attention from multichannel ERP patterns over the whole scalp in two independent visual spatial attention experiments. Across the two experiments, the direction of covert attention could be decoded, and the decoding accuracy during the cue-target interval (460-660 ms post-cue), in which the anticipatory attention set was implemented, predicted the attentional modulation of target-related N1 amplitude. Also across the two experiments, after target appearance, attended targets could be decoded from unattended (or less attended) targets, and the decoding accuracy predicted behavioral performance (RT or RT cueing effect). However, no brain-behavior association was observed when the correlation analysis was based on univariate ERP analysis, that is, on ERP amplitudes from single channels. Our findings therefore suggest that top-down attentional control and its modulation of target processing are more comprehensively represented in multichannel ERP patterns over the whole scalp than in ERP amplitudes measured at specific channels. They also suggest that naturally occurring individual differences in neural and behavioral variables enable the study of the functional significance of decoding accuracy derived from multivariate classification in attention control and selection.