Saccade size predicts onset time of object processing during visual search of an open world virtual environment

Objective: To date, the vast majority of research in the visual neurosciences has been forced to adopt a highly constrained perspective of the vision system in which stimuli are processed in an open-loop reactive fashion (i.e., abrupt stimulus presentation followed by an evoked neural response). While such constraints enable high construct validity for neuroscientific investigation, the primary outcome has been a reductionistic approach to isolating the component processes of visual perception. In electrophysiology, of the many neural processes studied under this rubric, the most well-known is, arguably, the P300 evoked response. There is, however, relatively little known about the real-world corollary of this component in free-viewing paradigms where visual stimuli are connected to neural function in a closed loop. While growing evidence suggests that neural activity analogous to the P300 does occur in such paradigms, it is an open question when this response occurs and what behavioral or environmental factors could be used to isolate this component. Approach: The current work uses convolutional networks to decode neural signals during a free-viewing visual search task in a closed-loop paradigm within an open-world virtual environment. From the decoded activity we construct fixation-locked response profiles that enable estimation of the variable latency of any P300 analogue around the moment of fixation. We then use these estimates to investigate which factors best reduce variable latency and, thus, predict the onset time of the response. We consider measurable, search-related factors encompassing top-down (i.e., goal-driven) and bottom-up (i.e., stimulus-driven) processes, such as fixation duration and salience. We also consider saccade size as an intermediate factor reflecting the integration of these two systems. Main results: The results show that of these factors only saccade size reliably determines the onset time of P300 analogous activity for this task.
Specifically, we find that for large saccades the variability in response onset is small enough to enable analysis using traditional ensemble averaging methods. Significance: The results show that P300 analogous activity does occur during closed-loop, free-viewing visual search while highlighting distinct differences between the open-loop version of this response and its real-world analogue. The results also further establish saccades, and saccade size, as a key factor in real-world visual processing.


Introduction
Decades of research in the visual neurosciences have produced a wealth of knowledge about the oscillatory and evoked neural processes that correspond with stimulus encoding, attentional reorienting, and the transfer of low-level percepts to higher-order cognitive systems. Due to the inherently low signal-to-noise ratio (SNR) of the scalp electroencephalogram (EEG), these subtle processes must be observed through ensemble averaging of repeated trials, based on the assumption of an underlying linear model with an additive noise term independent of experimental conditions (Luck, 2014). Obtaining these trials typically involves the use of brief (milliseconds) presentations of static visual stimuli flashed on a computer monitor while participants remain motionless with eyes focused on the center of the screen. Such designs constrain perceptual processes to an open-loop system with discrete (impulse) events (Fig. 1, Top Row). While these approaches maximize SNR by minimizing the variable latency, or temporal uncertainty, of the driving event, these same paradigms fail to place the observed neural processes in the context of the underlying closed-loop dynamic system that they were designed to serve. In closed-loop systems, impulse events may perturb the system away from steady state, but ongoing adaptation allows the system to correct itself to efficiently solve the problem at hand. In the context of visual neuroscience research, closed-loop designs can include both behavioral and environmental pathways (Fig. 1, Middle and Bottom Rows, respectively); however, relatively little is known about the closed-loop analogues of many evoked neural processes.
One neural process that has received much attention in open-loop paradigms is the P300 component, or P3, an evoked response believed to reflect attentional re-orienting and encoding of task-relevant stimuli, serving to maintain awareness of the outside world (Donchin, 1981; Polich, 2007; Nieuwenhuis et al., 2011; Twomey et al., 2015). The P300 was first described by Sutton et al. (1965). The response is believed to be the result of simultaneous, synchronous neural activity from a widespread cortical network involving contributions from temporal-parietal, medial-parietal, and lateral-prefrontal regions (Soltani and Knight, 2000). One of the core aspects of P300 theory is that of context updating, in which the response is believed to play a role in maintaining and revising one's internal model of the environment (Donchin, 1981; Donchin and Coles, 1988). The response has been divided into two separate components: 1) the P3a, which presents predominantly in frontal areas in the theta (4-7 Hz) band and is an indicator of rapid, automatic attentional reorienting, and 2) the P3b, which presents in centro-parietal regions in the delta (1-3 Hz) and theta bands and has been used as a measure of task-relevant stimulus processing (Polich, 2007). Though both components receive the name "P3" due to their positive activations and similar onset times, the P3a and P3b are two distinct components that are often tightly coupled (Fonken et al., 2020). It is the P3b, though, that is generally referred to by the term "P300" and the focus of most research into P300 activity (Fields, 2023). The P3b has also been referred to as a "late positive complex" (LPC), which has been the focus of studies on working memory and its relationship to perceived variations in cognitive ability (Gevins and Smith, 2000; Nittono et al., 1999).
The P300 complex (i.e., P3b) is usually very stereotyped. In response to a target, or task-relevant, stimulus, a positive change in neural activity is observed beginning around 300 ms after stimulus presentation. Measured with EEG, this activation change is first seen in occipital and parietal electrodes. Projections of this activity then rapidly move to electrodes located over centro-parietal regions, before dissipating in lateral frontal regions. The amplitude of this complex is often measured against a prestimulus baseline and is believed to be related to the strength of stimulus encoding (Polich, 1987). When the same stimuli are used with repeated trials, researchers often report a habituation effect in which later trials are associated with diminished response amplitudes. Factors such as novelty of and habituation to the stimulus, target-distractor similarity, and cognitive interference from concurrent tasks are all known to affect P300 amplitude (Polich, 2007). Variations in the latency, on the other hand, arise from cognitive interference as well as the complexity of the stimulus.
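The baseline-and-peak measurement described above can be sketched in a few lines; the sampling rate, baseline length, and search window used here are illustrative defaults, not parameters taken from any particular study.

```python
import numpy as np

def p300_amplitude_latency(epoch, fs=128, baseline_s=0.2, win=(0.25, 0.6)):
    """Peak amplitude (vs. prestimulus baseline) and latency of a P300-like
    deflection.  `epoch` is one channel's voltage, starting `baseline_s`
    seconds before stimulus onset.  All defaults are illustrative."""
    n_base = int(baseline_s * fs)
    corrected = epoch - epoch[:n_base].mean()   # subtract prestimulus baseline
    lo = n_base + int(win[0] * fs)              # search window for the peak,
    hi = n_base + int(win[1] * fs)              # expressed re: stimulus onset
    peak_idx = lo + int(np.argmax(corrected[lo:hi]))
    latency_s = (peak_idx - n_base) / fs
    return corrected[peak_idx], latency_s

# Synthetic epoch: flat 2 uV baseline plus a 5 uV bump peaking ~350 ms post-onset
fs = 128
t = np.arange(-0.2, 0.8, 1 / fs)
epoch = 2.0 + 5.0 * np.exp(-((t - 0.35) ** 2) / (2 * 0.05 ** 2))
amp, lat = p300_amplitude_latency(epoch, fs=fs)
```

Run on the synthetic epoch, the measured amplitude is near 5 and the latency near 0.35 s, i.e., the baseline offset is removed and the peak is found inside the post-stimulus window.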
While most research into the P300 is conducted in open-loop paradigms with a central fixation constraint, the response can also be obtained using co-registered eye movements, where saccades and fixations, rather than stimulus onset, are the time-locking event (Dimigen et al., 2011; Brouwer et al., 2013; Ries et al., 2016; Devillez et al., 2015; Ladouce et al., 2022). Analysis of evoked responses resulting from fixations on a stimulus was first described by Yagi (1979), and since then a number of fixation-related responses, such as the lambda component (P100), N400, N170, and the P300, have been described, as well as their relationship to visual and cognitive function (Dimigen et al., 2011; Ries et al., 2016, 2018; Kazai and Yagi, 2003; Dandekar et al., 2012a). However, these paradigms often use visually sparse displays and peripheral cues to guide and control eye movements. In this way, these designs inherently replicate the open-loop aspect associated with most P300 investigation. For example, in a pair of studies, Touryan et al. (2017) and Brouwer et al. (2017) used structured grids of stimuli, paired with guided fixations, to investigate the neural correlates of fixation-related visual stimulus processing. Fixation rates were controlled to reduce the impact of overlapping responses, and target discrimination required direct fixation on the stimulus. Here, researchers were able to identify P300 responses in the presence of ocular artifacts, and these responses exhibited amplitude and latency variations resulting from increases in cognitive load.

Fig. 1. Top Row: Open-loop paradigm in which eyes are fixed and stimuli are flashed for a brief (e.g., order of milliseconds) window of time. P300 responses have been observed with known sources of temporal and amplitude variability. Middle Row: Unconstrained eye movements are allowed to create a behaviorally closed loop. Static scenes are often used, with presentation durations on the order of seconds. Bottom Row: The brain is allowed to form a true closed loop with both behavioral and environmental pathways. The environment is searched over a period of several minutes, one state leads naturally to the next state, and time pressures are largely removed.
When eye movements are not constrained, or controlled, the brain is able to assert some aspects of closed-loop control over stimulus processing. For example, work in (Ries et al., 2018; Devillez et al., 2015; Kamienkowski et al., 2012; Nikolaev et al., 2023) allowed free-viewing visual search of static scenes with predefined target locations (e.g., behaviorally closed-loop). In such cases, the stimuli are usually only visible for a few seconds (e.g., 4-10 s) and/or designed to make target versus nontarget discrimination especially difficult (e.g., a rotated character inside a small foveal region). Work in (Lapborisuth et al., 2019; Stankov et al., 2021) incorporated dynamic scenes but used pop-up targets that were only visible for a brief amount of time. In both cases, the end result is a step towards ecological validity while leveraging stimulus and task design to minimize the variable latency of the driving perceptual event. Such paradigms have shown enhanced P300 activity for target versus nontarget stimuli in the presence of ocular artifacts, overlapping fixations, and unconstrained behavior (e.g., eye movements). This work has also suggested a role for saccade size in determining the onset and amplitude of subsequent fixation-locked responses (Devillez et al., 2015; Ries et al., 2018) and predicting the generation of evoked P300 activity (Lapborisuth et al., 2019).
More recently, Ladouce et al. (2022) used fixation-locked analyses applied to mobile EEG recordings and reported the presence of P300-like activity during free-viewing visual search in a library task in which participants visually scanned a shelf of books to find a target book (e.g., behaviorally and environmentally closed-loop). The authors used a template matching algorithm, tailored to the participant, to decode the EEG data around the moment of fixation, identify the time block most likely associated with the evoked response, and then retroactively time-lock the analysis to this new sample point. This work demonstrates that visual evoked responses are measurable in such complex paradigms; however, the analysis approach also strongly suggests that a critical confounding factor in environmentally closed-loop paradigms is variable latency. This is because closing the loop through both behavioral and environmental pathways reduces the impact of infrequent impulse responses while favoring steady-state behavior. In such closed-loop systems, causality becomes blurred as information cycles through the system, driving adaptation. Without a well-defined, temporally precise event to associate with the response, it is exceedingly difficult for ensemble methods to generate statistically robust estimates of the evoked activity.
Here, we investigate factors that reduce the uncertainty in the onset time of any P300 analogue that occurs in a closed-loop free-viewing visual search (CL-FVVS) task under environmentally and behaviorally closed-loop conditions. We use the term "P300 analogue" when referring to the specific pattern of neural activity associated with traditional go/no-go target detection paradigms (i.e., P3b, or P300 complex) but manifested in the CL-FVVS task. This same pattern has been observed in open-loop paradigms with well-defined latencies with respect to known exogenous events but has only been isolated in behaviorally closed-loop conditions where stimuli are designed to elicit more discrete, impulse-like responses. When considering steady-state behavior of fully closed-loop systems, it is an open question when and to what extent this response occurs.
In the CL-FVVS task that we employ, participants had to navigate through an open-world virtual environment while searching for and counting the occurrence of target objects. The target objects were randomly placed within the environment along with numerous nontarget (i.e., distractor) objects. We use domain-generalized neural decoders to estimate the probability distribution of P300 analogous events around the moment of fixation. Using the grand average response profiles provided by the neural decoder, we investigate which measurable factors lead to the most robust response, across individuals, with respect to variable latency. We use the term variable latency to refer to the wholesale temporal shift of the entire temporal/spatial pattern collectively known as the P300, or P3b. Given that the task utilizes a dynamic environment, in which the same object may appear at first on the horizon and then, at intervals, successively closer, and movement through that environment is under the control of the participant, we conduct two separate analyses of the data. We begin by analyzing responses for initial fixations on objects only. Next, we repeat this analysis using refixations in order to look for consistency across these two aspects of the task (i.e., initial stimulus identification, or exploration, and subsequent context updating, or information recovery (Nikolaev et al., 2023)).

Influences on cognitive processing during visual search
We consider influences from both top-down and bottom-up processes and use surrogate factors that have previously been shown to be reliable indicators of these converging systems and that map onto known sources of variability in the P300. Specifically, we use fixation duration and salience of the attended region, respectively. We also consider saccade size, or the angular degree of eye movement immediately preceding each fixation, as an intermediate factor, as saccade size is believed to reflect influences from both top-down and bottom-up systems.

Fixation duration
Prior work has suggested that real-world visual search behavior is dominated by top-down guidance (Chen and Zelinsky, 2006; Itti and Borji, 2015; Schütt et al., 2019; Zelinsky et al., 2005). Top-down control systems prioritize processing of goal- and task-relevant information. In visual search tasks (constrained and unconstrained) this has been assessed by comparing the amount of time spent processing each stimulus (e.g., Malcolm and Henderson, 2010; Eimer and Kiss, 2010) and is often reported as fixation duration, or the length of time the eyes remain fixed on a region of the scene. Fixation duration has been shown to be a reliable indicator of target-distractor similarity, with longer fixation durations associated with processing target objects versus nontarget objects or distractors (Reingold and Glaholt, 2014; Enders et al., 2021). From the perspective of neural activation, variability in fixation duration has been found to be largely accounted for by attentional, perceptual, and cognitive processes associated with scene analysis and positively correlated with activation in visual and prefrontal executive control regions (Henderson and Choi, 2015). Similar to analyses conducted on the P300, changes in fixation durations have often been used as indirect measures of mental load or task demands (Liu et al., 2022; Meghanathan et al., 2015).
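As a minimal illustration of how fixation duration can be computed as a surrogate measure, the sketch below groups durations by the label of the fixated object; the (start, end, label) event format is a hypothetical layout, not the study's actual event record.

```python
import numpy as np

def mean_fixation_duration(fixations):
    """Mean fixation duration per object label.

    `fixations` is a list of (start_s, end_s, label) tuples; this layout
    is an illustrative assumption, not the study's event format.
    """
    durations = {}
    for start, end, label in fixations:
        durations.setdefault(label, []).append(end - start)
    return {label: float(np.mean(d)) for label, d in durations.items()}

# Toy gaze record: targets tend to be fixated longer than distractors
events = [
    (0.00, 0.40, "target"),
    (0.55, 0.80, "distractor"),
    (1.00, 1.50, "target"),
    (1.70, 1.90, "distractor"),
]
means = mean_fixation_duration(events)
```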

Salience
Another factor that may influence P300 onset time is the salience of the to-be-attended region, which is believed to influence bottom-up processing. Bottom-up processes are characterized by an automatic, rapid, often pre-attentional allocation of visual processing resources towards an item that has distinguishable features (e.g., color, orientation, spatial frequency). These processes are considered to be distinct from top-down processes that are characterized by attentional guidance towards regions or objects that have perceived importance to the viewer based on task demand or internal goals (Treisman and Gelade, 1980). As noted earlier, the P3a is known to be an indicator of rapid, automatic attentional reorienting, often to novel or attention-grabbing stimuli (Polich, 2007), and is believed to be highly coupled with the subsequent P3b (Fonken et al., 2020), indicating a strong relationship between bottom-up driven re-orienting and subsequent stimulus encoding.
To assess bottom-up, feature-based processing in visual search tasks, researchers have often used predefined salience maps (Itti and Koch, 2000). Such maps can be created based on current understanding of how humans extract features, such as color and orientation, from visual scenes (Treisman and Gelade, 1980) or using various computational models that selectively extract specific features from the scene (Itti and Borji, 2015). Salience maps have been used to predict gaze locations within a given scene and to assess the regions and/or objects that capture an individual's attention. Since the original neurally inspired computational model of Koch and Ullman (1985), several salience models have been proposed, differing mainly in the selection of features and in how these features are extracted and combined. Prior work has shown that salience has an influence on what items are attended in an environment and that salience can be used as an indirect measure of bottom-up processing utilization (Açık et al., 2010; Krishna et al., 2018).
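A drastically simplified, single-feature version of such a salience model can be sketched as a center-surround (difference-of-Gaussians) operation on image intensity; full models such as Itti and Koch's combine many feature channels across multiple scales, so this is only a toy illustration of the principle.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with reflected borders (pure NumPy)."""
    radius = int(3 * sigma)
    k = gaussian_kernel(sigma, radius)
    out = np.apply_along_axis(
        lambda r: np.convolve(np.pad(r, radius, mode="reflect"), k, "valid"), 1, img)
    out = np.apply_along_axis(
        lambda c: np.convolve(np.pad(c, radius, mode="reflect"), k, "valid"), 0, out)
    return out

def center_surround_salience(intensity, sigma_c=1.0, sigma_s=4.0):
    """Single-channel salience: |fine-scale blur minus coarse-scale blur|."""
    return np.abs(blur(intensity, sigma_c) - blur(intensity, sigma_s))

# A dark background with one bright blob: the blob should dominate the map
img = np.zeros((64, 64))
img[30:34, 30:34] = 1.0
sal = center_surround_salience(img)
```

The salience map peaks on the blob and stays near zero over the featureless background, which is the behavior exploited when such maps are used to predict gaze locations.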

Saccade size
While fixation-locked analyses have been used to assess components of stimulus processing, saccadic analysis has primarily focused on the dynamics of attention and attention shifting. Using saccades as the locking mechanism, researchers have found evidence of a shift of attention toward the to-be-fixated region that begins before saccadic eye movements (Deubel, 2008; Stankov et al., 2021; Jonikaitis et al., 2013). This attention shift is covert, occurs during the preparation for the saccade, and has been shown to enhance discrimination performance and processing intensity at the target location (Matin et al., 1993; Sanders and Houtmans, 1985; Irwin, 2003; Findlay, 2013).
Saccade size has been found to be a significant predictor of the amplitude of the subsequent fixation-locked lambda response (e.g., Ries et al., 2018; Thickbroom et al., 1991). Saccade size has also been found to impact the latency of the lambda response when measured against saccade onset (Dandekar et al., 2012b). During free-viewing search of static scenes (e.g., behaviorally closed-loop), Devillez et al. (2015) found that large saccades were associated with longer-latency P300 responses for first fixations on a target. Saccadic movements have also been shown to modulate the firing rate of early visual cortex neural assemblies (McFarland et al., 2015). Saccade generation has been proposed as the output of both anterior and posterior systems enabling rapid, reflex-like activity as well as goal-oriented activity (Schiller and Tehovnik, 2005), indicating that saccade generation and saccade size are modulated by both bottom-up and top-down systems.
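Saccade size, as used here, is the visual angle swept between consecutive gaze points. A sketch of that computation follows, using a 70 cm viewing distance and 1920 × 1080 display as in this study's apparatus; the physical panel dimensions (roughly a 24-inch 16:9 monitor) are an assumed value, not taken from the paper.

```python
import numpy as np

def saccade_size_deg(p0, p1, view_dist_cm=70.0,
                     screen_px=(1920, 1080), screen_cm=(52.7, 29.6)):
    """Angular size of a saccade between two on-screen gaze points.

    p0, p1: (x, y) gaze positions in pixels.  The physical panel size
    (~24-inch 16:9) is an assumed value for illustration.
    """
    px_to_cm = np.array(screen_cm) / np.array(screen_px)
    # Euclidean distance of the saccade on the physical screen, in cm
    d_cm = np.hypot(*((np.array(p1) - np.array(p0)) * px_to_cm))
    # Full angle subtended by the chord at the viewing distance
    return float(np.degrees(2 * np.arctan2(d_cm / 2, view_dist_cm)))

# A ~5 cm horizontal saccade at 70 cm subtends roughly 4 degrees
size = saccade_size_deg((500, 540), (682, 540))
```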

Eye movement confounds
Eye movements are known to produce artifacts such as the saccadic spike, which is noise arising from the electric potentials around the eye, the associated movement of the eye dipole, and contraction of extraocular muscles (Keren et al., 2010), as well as early sensory processing potentials, such as the lambda component (Nikolaev et al., 2016). Furthermore, fixations are known to occur at a faster rate than late-stage cognitive components such as the P300 complex; thus, simply time-locking to all fixations can produce both convolutional effects (i.e., overlaying the response with itself) and additive effects (i.e., overlaying faster sensory processes from neighboring fixations (Kamienkowski et al., 2012; Dandekar et al., 2012b)). However, there are potential avenues for addressing these concerns.
The most challenging of these confounds are, arguably, those that are phase-locked to the response of interest. For fixation-locked P300 analysis, this includes saccadic spikes and lambda potentials. Non-phase-locked confounds arising from neighboring fixations that occur during, or near, the epoch window may "average out" provided they meet certain conditions (e.g., additive, independent, random distribution) (Hiebel et al., 2018). Phase-locked confounds, however, must be addressed. This may include passband filtering and signal cleaning approaches (e.g., independent components analysis and regression techniques) to improve SNR. Another approach is to subtract a "control" response in which the artifact is present but the signal of interest is, presumably, absent or present to a much lesser degree. Alternatively, one can also use neural decoding approaches that provide a more robust estimate of the response in the presence of noise and artifact by modeling the associated pattern in its entirety (Ladouce et al., 2022; Dandekar et al., 2012b; Solon et al., 2019).
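The "control" response subtraction described above amounts to a difference of ensemble averages. A toy sketch, with synthetic data standing in for a saccadic-spike-like artifact and a P300-like response (all waveforms and noise levels invented for illustration):

```python
import numpy as np

def grand_average(epochs):
    """Ensemble average across trials; `epochs` has shape [trials, time]."""
    return np.asarray(epochs).mean(axis=0)

def artifact_corrected_response(signal_epochs, control_epochs):
    """Subtract a 'control' average (artifact only) from the condition of
    interest (artifact plus response); the phase-locked artifact cancels."""
    return grand_average(signal_epochs) - grand_average(control_epochs)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 128)
spike = np.exp(-(t - 0.05) ** 2 / 0.0002)   # saccadic-spike-like artifact
p300 = np.exp(-(t - 0.4) ** 2 / 0.01)       # response of interest

def make_trials(base, n_trials):
    return base + rng.normal(0, 0.2, (n_trials, t.size))

target = make_trials(spike + p300, 200)     # artifact + response
control = make_trials(spike, 200)           # artifact only
cleaned = artifact_corrected_response(target, control)
```

After subtraction, the spike (present in both conditions) cancels, leaving the response of interest peaking near 0.4 s.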

Neural decoders
In the current work, we follow the approach of Ladouce et al. (2022) and Dandekar et al. (2012b) by using a neural decoder to improve estimation of the P300 complex in the presence of noise and eye movement confounds. This is in contrast to other approaches that attempt to separate, or deconvolve, noise and overlapping influences found in free-viewing, fixation-locked data (Ehinger and Dimigen, 2019). Such deconvolution approaches tend to require an a priori regression model describing the relationship between individual terms (e.g., saccade size or fixation count) and the neural response, whereas neural decoders assess the probability, or confidence, that the entire pattern (e.g., scalp space and time) is present regardless of noise. Neural decoders reconstruct information about the world, or experimental design, from the encoded information in the neural record. In this case, the task is to decode the EEG record to determine the state of the P300 analogue (present or not). Neural decoding methods have been applied and studied in contexts ranging from single cell recordings (Quian Quiroga and Panzeri, 2009) to participant-specific EEG recordings (Wang et al., 2009; Sajda et al., 2009; Lee et al., 2022; Aellen et al., 2023) to large-scale analysis of EEG from multiple disparate experiments (Solon et al., 2019; Gordon et al., 2023b). It is the latter area of neural decoding that will be exploited in the current work, and the methods used here are derived from approaches originally developed by the Brain-Computer Interface (BCI) community.
Given the amplitude and utility of the P300 response, decoding this signal has received much attention from the BCI community. As such, there is a wealth of methods and algorithms for decoding P300 responses (Borra et al., 2021; Abibullaev and Zollanvari, 2021; Farahat et al., 2019; Ditthapron et al., 2019; Sajda et al., 2009; Dandekar et al., 2012b). In addition, the BCI community, over the past decade, has transitioned from mostly linear approaches for signal decoding to complex, multilayer, nonlinear networks. Solutions have been developed for a number of application spaces, such as participant-specific methods that maximize classification accuracy (or bit rate) (Vega et al., 2022), transfer learning and domain adaptation methods that minimize calibration time (Zhang et al., 2020), and domain generalization methods that support interrogation and analysis of novel, complex data (Gordon et al., 2023b).
For the current work we will use domain generalization methods to create the neural decoder. Domain generalization methods aim to create computational models of neural activity that do not require participant- or task-specific training data (Gordon et al., 2017). In other words, the model learns a representation of neural activity in one domain (e.g., participant set and/or task set) and must apply that representation to decode data from a new domain (i.e., participant and/or task set). Such methods have previously been used to decode states related to vigilance, alertness, and workload (Ma et al., 2019; Kim et al., 2022; Gordon et al., 2023b), detect emotions (Li et al., 2022; Liang et al., 2022), diagnose neurological conditions (Ayodele et al., 2020), as well as perform more classical BCI tasks, such as motor imagery and P300 detection (Han and Jeong, 2021; Solon et al., 2018).
Prior work using the domain-generalized neural decoder described in (Solon et al., 2019) has demonstrated that this decoder has all of the properties needed to assess P300 analogues in closed-loop paradigms (Gordon et al., 2017; Solon et al., 2017, 2023a). Given the windowed approach of the decoder, if a pattern of neural activity sufficiently similar to the P300 complex appears over a given window of time (from t0 to t1), then the decoder will produce a high-confidence value of "P300" at time t0 (Fig. 2A). We note that associating the decoder outputs with t0 is a convention adopted in (Solon et al., 2019). This time-varying signal can be epoched and averaged to produce a grand average response profile with both mean and standard deviation (Fig. 2A). Prior work in (Solon et al., 2019) showed that for time-locked P300 analyses this response profile peaks at t0 = image onset, and the peak differences are significant for target versus nontarget stimuli. If the latency of the P300 response shifts, the peak of the response profile also shifts (Fig. 2B). If the amplitude of the P300 response changes, so too does the amplitude of the grand average profile (Fig. 2C). This same work also established that the SNR improvements from the decoder supported conclusions of statistically significant greater responses to targets versus nontargets with approximately 75% fewer trials than traditional approaches based in EEG scalp space for a given test set.
Subsequent work in (Gordon et al., 2023a) then showed that in the presence of jitter with respect to stimulus onset (i.e., jittering of the entire evoked event) the response profile increased in width by an amount linear in the jitter (Fig. 2D). The authors also showed that under such jittered conditions, applied equally to both target and nontarget stimuli, the decoder still enabled accurate dissociation between target and nontarget responses in an open-loop paradigm. Finally, work in (Solon et al., 2017) showed that the decoder produced better, more robust estimates of the P300 when the training data were sampled from a diverse set of source domains (i.e., domain-generalized). Essentially, the decoder was able to learn noise- and artifact-invariant representations provided that examples of the noise were included in the training data.
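The profile behavior summarized above (Fig. 2) can be reproduced with a toy simulation: a synthetic decoder-confidence trace with a bump at each event time is epoched and averaged, and adding latency jitter widens the resulting grand-average profile. The bump shape, noise levels, and threshold below are invented for illustration and stand in for real decoder output.

```python
import numpy as np

def response_profile(confidence, event_idx, half_win):
    """Epoch a decoder-confidence trace around events and average, mimicking
    the grand-average profile construction illustrated in Fig. 2."""
    epochs = np.array([confidence[i - half_win:i + half_win] for i in event_idx])
    return epochs.mean(axis=0), epochs.std(axis=0)

rng = np.random.default_rng(1)
n, half = 50_000, 64
events = np.arange(500, n - 500, 500)
bump = 0.8 * np.exp(-np.linspace(-3, 3, 33) ** 2)  # invented "P300 detected" bump

def simulate(jitter):
    """Noisy confidence trace with a bump at each (optionally jittered) event."""
    conf = rng.normal(0.1, 0.02, n)
    for e in events:
        j = e + rng.integers(-jitter, jitter + 1)
        conf[j - 16:j + 17] += bump
    return conf

mean0, _ = response_profile(simulate(0), events, half)
mean10, _ = response_profile(simulate(10), events, half)
# Samples above a threshold as a crude measure of profile width
width0 = int(np.sum(mean0 > 0.3))
width10 = int(np.sum(mean10 > 0.3))
```

With no jitter the profile peaks at the event sample; with jitter the same total confidence is spread over a wider, shorter profile, matching the Fig. 2D description.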

Methods
In the following sections we describe the data and tasks employed in this work. We then describe the neural decoding approach, including how the domain-generalized model is trained and, subsequently, applied to the test data. We also detail the EEG processing approach for generating aggregate response signals.

Data
Data were collected from 45 participants from the greater Los Angeles area (17 female, mean age 36.8 ± 12.3; 28 male, mean age 41.6 ± 14.4) and, later, 21 participants from San Antonio, TX (3 female, mean age 43.7 ± 6.7; 18 male, mean age 32.6 ± 7.9). Data from one of the Los Angeles participants were subsequently excluded due to overall data quality issues in the EEG record, yielding a total of 65 participants collected over a two-year time frame. Data collection was performed at two locations due to the COVID pandemic and regional restrictions on in-person data collection. All participants were at least 18 years of age and signed an Institutional Review Board approved informed consent form prior to participation. All participants had normal, or corrected-to-normal, vision. EEG data were collected using a 64-channel BioSemi Active II using the 10-10 montage, with an original sampling rate of 2048 Hz, two mastoid channels for reference, two vertical electrooculogram (EOG) channels, and two horizontal EOG channels. The data were downsampled to 128 Hz in post-processing. The vertical EOG electrodes were placed along the orbital ridge above and below the right eye. The horizontal EOG electrodes were placed at the lateral junction of the upper and lower eyelid for each eye. Eye tracking data were collected using a Tobii Pro Spectrum (300 Hz). Participants were seated at a distance of approximately 70 cm from the Tobii Pro Spectrum monitor (EIZO FlexScan EV2451) in a quiet room with sound and lighting control. The monitor had a resolution of 1920 × 1080 pixels. The eye tracking data were synchronized with the game state and EEG data using the Lab Streaming Layer protocol (Kothe et al., 2014). Each participant completed a go/no-go open-loop impulse response (OL-IR) P300 task before partaking in the CL-FVVS task. A full description of the experimental apparatus, data collection methods, and study design can be found in (Enders et al., 2021). Brief descriptions of the OL-IR and CL-FVVS tasks are included in the following subsections.

Open-loop impulse response task
In the OL-IR task, participants were presented with a series of images, either of a castle or a fountain, and were asked to maintain their gaze on the center of the screen and press the spacebar if the displayed image was a fountain (target stimulus) versus a castle (nontarget stimulus) (Fig. 3A). This is an open-loop paradigm intended to provide validation of the decoder for the given participant pool with a stereotyped example of the P300 complex. Images were presented asynchronously with an interstimulus interval (ISI) uniformly distributed between 1-4 s. Images were displayed for 250 ms. Between images, a white crosshair was presented in the center of the screen. Three consecutive 5-minute sessions were conducted per participant, yielding 78 target (fountain) and 312 nontarget (castle) trials. A sample collection of target and nontarget images is shown in Fig. 3B.

Fig. 2. A) Windowing method to obtain confidence estimates of P300 activity and associate those estimates with a given time point "t". B) When presented with latency changes in the P300 response, the decoder profile shifts laterally in time. C) When presented with amplitude changes in the P300 response, the decoder profile amplitude changes as well. D) When presented with a jittered signal, the decoder profile gets wider and shorter, indicating that the confidence in P300 presence is spread over a larger temporal window.

Closed-loop free-viewing visual search task
In the CL-FVVS task, the participants were asked to navigate through a virtual environment presented on the monitor. Movement through the environment required using a mouse and keyboard. The environment was developed with Unity 3D, and objects were manually placed throughout (Enders et al., 2021). There was a designated course through the environment that participants were encouraged to follow. The course was marked with yellow trail markers; however, participants were not required to stay on the course. Participants were given 20 min to navigate and search the environment in any manner that they chose. A collection of 234 objects was distributed throughout the environment along the course. Participants were divided into four groups, and each group was given a unique target category. Participants were instructed to find and mentally count the number of target objects in the environment. There were 15 instances of each target class (Humvees, motorcycles, aircraft, or furniture) in the environment. Given the closed-loop design of the task, any object could receive multiple fixations and remain in the participant's field-of-view for a window of time, on average up to 30-45 s. While the ratio of target objects to nontarget objects was just under 7%, prior work has found that in such environments targets receive approximately 30-40% more fixations than nontargets (Horstmann et al., 2019; Draschkow et al., 2014). This would put the expected final ratio of target to nontarget object fixations around 10%, which is consistent with target/nontarget presentation rates from open-loop paradigms. A sample scene from the task is shown in Fig. 4. Here, we have added highlighted squares for trail markers, nontarget objects, and a sample target object. All participants searched the same environment. The only potential differences were the assigned target, the chosen path, and the duration of the task.

Neural decoder
We used the EEGNet architecture (Lawhern et al., 2018) to implement the neural decoder. While there are other convolutional approaches available, this model has been evaluated multiple times with respect to its ability to decode the P300, and we wish to leverage this substantial prior work (Solon et al., 2017; Gordon et al., 2017, 2019, 2023a). EEGNet is a compact (i.e., low number of free parameters) convolutional neural network (CNN) that was specifically designed for EEG data (Lawhern et al., 2018). For a given EEG segment of shape [C, T], where C is the number of EEG channels and T the number of time points, EEGNet first uses 1-D convolutions of shape (1, F) (where F is the temporal filter length) to learn discriminative temporal frequency patterns, then uses depthwise convolutions of shape (C, 1) to learn spatial filters for each temporal frequency pattern individually, and finally applies depthwise separable convolutions (Chollet, 2017). Here, we fit the EEGNet model for 64 EEG channels using four temporal filters, two spatial filters per temporal filter, and eight separable filters (EEGNet 4-2-8, using the notation from Lawhern et al., 2018). We used a temporal filter length of 64 samples, representing 0.5 s of data sampled at 128 Hz. The model was trained for 150 iterations using the Adam optimizer with default parameter settings (Kingma and Ba, 2014) and a mini-batch size of 16 instances, optimizing a binary cross-entropy loss function. The dropout probability was set to 0.25 for all layers. Final model selection was made by choosing the weights that produced the lowest validation loss on 10% of the training data held out for this purpose. The CNN model was implemented using the publicly available source code for EEGNet (https://github.com/vlawhern/arl-eegmodels) with Python 3.7 and Tensorflow 2.1.0.
We trained the CNN using the four data sets outlined in (Solon et al., 2019). While the interested reader is directed to that work for a detailed description of those experimental data sets, here we provide a brief description. Each of these experiments was conducted with a similar 64-channel BioSemi Active II system, and each experiment was designed to investigate some aspect of the P300 visual response. Three of the experiments relied on the open-loop with impulse events paradigm to ensure proper temporal isolation of the P300 event. The fourth experiment used a behaviorally closed-loop paradigm with well-defined stimulus onset/offset events. Two of the experiments used static stimuli flashed at different presentation rates with eyes fixed. The remaining two experiments allowed eye movements. One of the eye movement experiments used guided, well-controlled fixation rates, while the other allowed free search with controlled target onsets and offsets. This selection of training data allowed for the representation and inclusion of multiple sources of noise and artifact during the learning process, including noise associated with comorbid eye movements, which has been shown to produce more robust and noise-invariant estimates of P300 activity (Solon et al., 2017). All five of the data sets (4 training and the test data set investigated in the current work) contained two mastoid signals, which were averaged and used as reference. All of the data sets were downsampled to 128 Hz and bandpass filtered between [0.3, 50] Hz by first low-pass filtering at 50 Hz using a finite impulse response (FIR) filter and then high-pass filtering at 0.3 Hz using another FIR filter. We performed median absolute deviation normalization for the data from each participant prior to training or testing the CNN. This final step was performed to normalize the data prior to decoding as well as subsequent ensemble averaging, given that the data originated from multiple sources (e.g., participants and recording locations).
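As a concrete illustration of the normalization step, a minimal sketch of scaling a signal by its median absolute deviation (MAD) might look as follows. The function name and the exact grouping of samples are our illustrative assumptions, not taken from the authors' code:

```python
from statistics import median

def mad_scale(samples):
    """Scale a 1-D signal by its median absolute deviation (MAD).

    Illustrative sketch only: the paper reports MAD normalization
    per participant; how channels are pooled is an assumption here.
    """
    m = median(samples)
    mad = median(abs(x - m) for x in samples)
    return [x / mad for x in samples]
```

In practice this would be applied to the continuous record before epoching, so that training and test data share a common amplitude scale.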
We used 1.25 s epochs to train the model. This selection was based on both empirical testing and our own prior work with this approach (Gordon et al., 2023a; Solon et al., 2019), and was made to be consistent with prior work using EEGNet as a domain-generalized neural decoder of P300 activity. Empirically, the trade-off when selecting the epoch length is that epochs shorter than 1 s might cut out some of the signal, while longer epochs would needlessly increase the number of parameters in the model. We labeled the partitioned training epochs using the predefined label for that evoked response from the experimental design found in the training data. In the case of class imbalance, we performed randomized downsampling of the majority class per participant. Given the nature of the training data, class imbalance was always in favor of the nontarget responses (i.e., there were more nontarget epochs than target epochs in each of the training sets). Using the previously stated settings for EEGNet resulted in a model with 2801 trainable parameters. Finally, we used an ensemble approach by training five distinct instantiations of the CNN. Each instantiation used a unique, randomized downsampling of the available labeled training data. While the randomized downsampling was performed to ensure class balance, the ensemble approach was used to exploit as much of the labeled training data as possible while also reducing the impact of any single, sub-optimal model. During testing, we averaged the outputs of these five instances to create a single output signal.
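The per-participant class balancing and the five-member ensemble averaging described above can be sketched roughly as follows. The function names and data layouts are ours, not from the released code:

```python
import random

def balance_classes(target_epochs, nontarget_epochs, seed):
    """Randomly downsample the majority (nontarget) class so both
    classes contribute equally; each ensemble member would use a
    different seed and, thus, a different subsample."""
    rng = random.Random(seed)
    keep = rng.sample(nontarget_epochs, k=len(target_epochs))
    return target_epochs + keep

def ensemble_average(outputs_per_model):
    """Average the output time series of the ensemble members."""
    return [sum(vals) / len(vals) for vals in zip(*outputs_per_model)]
```

Training five models, each on a differently seeded balanced subsample, and averaging their outputs approximates using all labeled data while damping any single sub-optimal fit.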
To interrogate the test data (i.e., data from the OL-IR and CL-FVVS tasks), we convolved the pretrained models over the processed EEG signals using a step size of one sample (Fig. 3). Throughout the remainder of this paper, we refer to the beginning of this 1.25 s input window as the application time point for the model. We did this to stay consistent with the convention in (Solon et al., 2019). In other words, the model outputs are produced for a given time point, t0, using data from the window [t0, t0 + 1.25] s. With this convention, t0 = 0 s is typically where the output would peak for laboratory-based tasks in which static images are flashed in rapid succession to an observer. The output of the model for each such window is essentially a probability, or confidence, that a P300-like spatiotemporal pattern is present in the test data in that window. Therefore, the response profiles shown in the results section provide an estimate of the probability of a P300 around the time-locking event. As previously stated, we averaged the outputs from the ensemble of pretrained CNNs to produce a single output. This single value varied over time as a function of the underlying neural patterns in the test data (Fig. 5). To collapse the variations in outputs for cross-participant analysis, we then z-scored the averaged model output time series over the entire set of available data (combined OL-IR and CL-FVVS) from a given participant.
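The sliding-window application and subsequent z-scoring can be sketched as below. Here `model` stands in for the trained ensemble, and the 1-D signal layout is a simplification of the multichannel data:

```python
def decode_timeseries(model, eeg, fs=128, win_s=1.25):
    """Apply `model` to every win_s-second window of `eeg` with a
    step of one sample. The output at index t corresponds to the
    window [t, t + win_s] (the "application time point" convention)."""
    win = int(win_s * fs)
    return [model(eeg[t:t + win]) for t in range(len(eeg) - win + 1)]

def zscore(xs):
    """Z-score a time series over its full length (population SD)."""
    mu = sum(xs) / len(xs)
    sd = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mu) / sd for x in xs]
```

The z-scoring would be applied once per participant, over the concatenated OL-IR and CL-FVVS outputs, so profiles are comparable across participants.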

Eye tracking and fixation detection
The eye tracker was used to collect gaze position and pupil size. A standard 5-point calibration protocol was used to calibrate the eye tracker. The Tobii Pro SDK was used to obtain the gaze vector information for each sample, calculate the 3D gaze vector, and identify the gaze vector collision object (in the Unity environment) for each valid sample. The eye tracker was mounted under the monitor. Head movement was not physically restricted, but participants were instructed to maintain a constant distance from the monitor.
We used a velocity-based algorithm, adapted from (Engbert and Kliegl, 2003), to detect saccades and corresponding fixations for the CL-FVVS task. This approach was adapted from the EYE-EEG plugin (http://www2.hu-berlin.de/eyetracking-eeg) and originally applied to this data in (Enders et al., 2021). As described in that prior work, velocity thresholds for saccade detection were based on the median of the velocity time series, smoothed over a 5-sample window, for each subject. These thresholds were computed independently for horizontal and vertical components. In the current study, we used a velocity factor of six (i.e., six times the median velocity) and a minimum saccade duration of 12 ms. If two or more saccades were detected in a given 50 ms window, we kept only the second saccade if it was the largest of the group and if its subsequent fixation was greater than 50 ms. Once all the saccades and fixations were labeled, we then removed fixations <100 ms for this current analysis (Ouerhani et al., 2004; Mueller et al., 2008).
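A simplified sketch of this velocity-threshold detection is shown below. Note that the published algorithm uses an elliptic criterion over a median-based velocity estimate with smoothing; here we approximate it with a per-axis threshold of six times the median absolute velocity and the 12 ms minimum duration, so this is an illustration rather than the authors' implementation:

```python
from statistics import median

def detect_saccades(vx, vy, fs, factor=6.0, min_dur_ms=12.0):
    """Return (start, stop) sample indices of candidate saccades
    from horizontal (vx) and vertical (vy) velocity traces."""
    tx = factor * median(abs(v) for v in vx)
    ty = factor * median(abs(v) for v in vy)
    above = [abs(x) > tx or abs(y) > ty for x, y in zip(vx, vy)]
    min_len = max(1, int(min_dur_ms * fs / 1000.0))
    saccades, start = [], None
    for i, flag in enumerate(above + [False]):
        if flag and start is None:
            start = i                      # candidate saccade onset
        elif not flag and start is not None:
            if i - start >= min_len:       # enforce minimum duration
                saccades.append((start, i))
            start = None
    return saccades
```

Fixations would then be the inter-saccade intervals, with those shorter than 100 ms discarded as in the text.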
Every eye tracking sample was either associated with an "object" in the environment or blank due to a dropout in the data (most often associated with blinks). For small, distant objects it was possible that only a fraction of the samples for a given fixation would collide with that object, while the remaining samples would collide with the surrounding terrain. This is a limitation in the resolution of the eye tracker for this specific set-up. While uncommon, multiple objects could also be colinear with the participant's gaze vector due to the specific camera angle at that moment. Therefore, for the current analysis, target fixations were defined as those where at least 10% of the available eye tracking samples, and the associated gaze vector collision object, from a single fixation were on a target object. Nontarget fixations were those fixations in which at least 10% of the available eye tracking samples were on a nontarget and 0% of the eye tracking samples were on a target. Fixations that included any portion of a trail marker were discarded for the current analysis. This is because trail markers were highly salient (bright yellow) and visible from a long distance away, but also very small. In particular, the concern was that initial fixations on trail markers were the most likely to be missed due to small errors in the estimation of the gaze vector. Fixations for which the gaze vector only intersected terrain (e.g., trees, hillside, grass, path) were isolated as a third category referred to as "terrain." For target and nontarget objects we defined initial fixations as the first and second fixations on that object. While prior work (Nikolaev, 2023) indicates that first fixations are distinct from subsequent fixations and, thus, combining fixations 1 and 2 may not be appropriate, we took this approach for three reasons. First, as a necessity to increase the number of available fixations when dividing the data based on the specific factors considered for the analysis. As previously stated, the design of our paradigm was intended to match the ratio of the total number of target versus nontarget fixations to the ratios commonly used in prior work. This resulted in a very low number of true "first" fixations. Second, the definition of "first" fixation might not be valid in a dynamic environment, such as the one employed in this work, where items can appear in the distance and gradually come into view. Third, the resolution of the eye tracker to accurately label the true first fixation for any (arbitrarily small) distant object is questionable. Therefore, we subsequently consider a generic class of initial fixations.
All subsequent fixations were considered refixations. The delineation between initial fixations and refixations was determined by keeping independent running counts of all fixations on a given object. Here the term "refixation" means any fixation on an object after the second fixation, regardless of how many other objects were fixated in between successive fixations on that object. One point of note is that fixation count, as a measure, did not apply to terrain fixations in the same way that it applied to discrete objects. In the Unity tool, there was only one single terrain "object" and, thus, the terrain count would continue to increase throughout the task, while the region of terrain attended would, of course, change. Fig. 5 shows an example of the synchronization of the eye tracking and EEG data with labeled fixations and the CNN output time series. Given data in this format, one can perform fixation-locked analysis by identifying the onset of each fixation and then assessing either the original EEG data or the processed CNN outputs. Each fixation trial, in this sense, can then be labeled using the specific entity (or type, such as target, nontarget, and terrain) that was fixated upon.
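The fixation-labeling rules above can be condensed into a small sketch. The label strings and the `hits` representation (one collision label per eye tracking sample, `None` for dropouts) are illustrative assumptions:

```python
def label_fixation(hits):
    """Classify one fixation from its per-sample collision labels."""
    valid = [h for h in hits if h is not None]   # drop blink dropouts
    if not valid or 'trail_marker' in valid:
        return 'discard'                         # trail markers removed
    frac = lambda cls: sum(h == cls for h in valid) / len(valid)
    if frac('target') >= 0.10:                   # >=10% samples on target
        return 'target'
    if frac('nontarget') >= 0.10 and frac('target') == 0.0:
        return 'nontarget'
    if all(h == 'terrain' for h in valid):       # terrain-only fixation
        return 'terrain'
    return 'other'
```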

Salience models
Salience models were used as an indirect measure of bottom-up processing. We implemented six distinct salience models for this analysis. To do this, we took a sampling of multiple current bottom-up models, including models such as the Boolean Map Salience (BMS) (Zhang and Sclaroff, 2013) and the corner-based (CORS) (Rueopas et al., 2016) models, which use well-defined features such as contours, corners, and lines, along with models such as the structural-dissimilarity-based saliency (SDS) (Li and Mou, 2019) model, which is representative of a class of biologically plausible distinctiveness models derived from the original conceptual framework of (Itti et al., 1998). SDS, in particular, computes saliency through local contrast in a manner analogous to the early visual system. Each of these models is a feature-driven method that requires no top-down influence of eye position, scene semantics, or task demands. Table 1 summarizes the models that we used.
From the individual models described in Table 1, we then computed an aggregate salience score. We did this by first z-scoring the salience results from all participants for each model, respectively. We then averaged the six z-scored values per fixation to produce a single salience metric. This aggregation was performed to arrive at a more general notion of salience without overweighting any specific model. It is beyond the scope of this work to assess the performance of specific bottom-up salience methods in the context of the CL-FVVS task.
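This z-score-then-average aggregation can be sketched as follows (the dictionary layout is our illustrative choice):

```python
def aggregate_salience(per_model_scores):
    """per_model_scores maps model name -> one score per fixation.
    Z-score within each model, then average across models to give a
    single aggregate salience value per fixation."""
    def z(xs):
        mu = sum(xs) / len(xs)
        sd = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
        return [(x - mu) / sd for x in xs]
    zed = [z(scores) for scores in per_model_scores.values()]
    return [sum(col) / len(col) for col in zip(*zed)]
```

Z-scoring first puts the six models on a common scale, so no single model's raw output range dominates the average.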
To associate a salience value with a specific fixation, we performed the following four-step process, in order. 1) We extracted a full-scale image (1920 × 1080) of the scene at fixation onset. We did this using a Unity-based frame-grabbing tool that was run in post-processing. This tool extracted an instantaneous snapshot from the exact position and orientation of the camera at the time of fixation. 2) Each snapshot image was then processed by all six salience models. 3) We overlaid the fixation point, computed as the average x, y position on the image. Using this center point, we extracted salience values in a bounding box of width and height of 100 pixels around this point (i.e., a 100-pixel square). 4) The salience values within this box were averaged to produce a single salience value for that fixation. For instances in which the fixation point was within 50 pixels of the side of the screen, we truncated the bounding box as necessary. The width and height of the bounding box used to extract salience were determined using empirical testing with this data set. Minor deviations in these values produced no significant changes in the results.
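Steps 3) and 4) reduce to averaging a border-truncated 100-pixel square of the salience map; a sketch using a plain pixel grid (the data layout is illustrative):

```python
def box_mean(sal_map, cx, cy, half=50):
    """Average salience inside a square of half-width `half` pixels
    around the fixation point (cx, cy), truncated at the border.
    sal_map is a list of rows of pixel values."""
    h, w = len(sal_map), len(sal_map[0])
    x0, x1 = max(0, cx - half), min(w, cx + half)   # truncate at edges
    y0, y1 = max(0, cy - half), min(h, cy + half)
    vals = [sal_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(vals) / len(vals)
```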

Factors
We analyze the CNN responses as a function of three different factors: saccade size, fixation duration, and salience. To do this, we binned the data into the top 30% and bottom 30% of samples for each factor for each participant. Given the unconstrained nature of this task, though, this did not yield an equal number of samples for each participant and each factor. As such, we set a threshold that a participant had to have at least 5 qualifying fixations for a given factor to be considered for that analysis, for example, at least 5 initial target fixations with saccade size in the top 30% of all saccades for that individual. In the results section we provide counts of how many participants meet each criterion. Binning the data in this way is intended to allow rough comparison of the responses as a function of large versus small saccades, long versus short fixation durations, and high versus low salience. We anticipate this approach will help assess whether one factor is predominantly responsible for isolating P300 analogues, but it will not help if P300 analogues can only be temporally isolated using a nuanced combination of these factors. We denote large and small saccades as L-SC and S-SC, respectively; long and short fixation durations as L-FD and S-FD, respectively; and large and small salience values as L-SAL and S-SAL, respectively. For comparison purposes we also defined a control condition, denoted CTRL, which included all initial fixations, or refixations, on targets or nontargets regardless of factor.
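The per-participant binning and eligibility check can be sketched as follows, returning None when a participant lacks the required number of qualifying fixations (names are illustrative):

```python
def split_extremes(values, frac=0.30, min_n=5):
    """Return (bottom, top) index lists holding the lowest and
    highest `frac` of `values` for one participant and one factor,
    or None if either bin would have fewer than `min_n` entries."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    k = int(len(values) * frac)
    if k < min_n:
        return None          # participant excluded for this factor
    return order[:k], order[-k:]
```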

EEG analysis
When presenting analysis of the EEG data in scalp space, we first remove EOG artifacts and reject noisy epochs. We used the IMICA algorithm (Gordon et al., 2015), a form of constrained independent components analysis (ICA), to identify, separate, and remove vertical and horizontal eye movement signals from the EEG record for each participant. While we computed component weights during preprocessing, the removal of these components was only done for the analysis of the aggregate EEG responses, such as a P300 complex or scalp topography. These components were not removed prior to computation of the CNN decoded waveform, because doing so would alter the nature of the test data away from that found in the training data for the CNN.
To identify and reject noisy epochs, we identified those epochs of data whose ICA-cleaned version still produced amplitudes greater than 5 standard deviations from the mean for each channel. To produce any ensemble averages of the EEG data, we referenced and bandpass filtered the data, normalized the data per participant using median absolute deviation, removed eye components, rejected noisy epochs, and then averaged first within participant and then across participants. With the exception of removing EOG components and noisy epochs, these are the same preprocessing steps applied to the EEG data prior to CNN decoding.
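A sketch of the amplitude-based rejection is given below. We assume per-channel mean and standard deviation are computed over the full record; the text does not state the reference window, so that choice is ours:

```python
def reject_noisy_epochs(record, epoch_bounds, n_sd=5.0):
    """record: [channel][sample]; epoch_bounds: list of (start, stop)
    sample indices. Keep epochs whose samples stay within n_sd
    standard deviations of each channel's mean."""
    stats = []
    for ch in record:
        mu = sum(ch) / len(ch)
        sd = (sum((x - mu) ** 2 for x in ch) / len(ch)) ** 0.5
        stats.append((mu, sd))
    kept = []
    for a, b in epoch_bounds:
        noisy = any(abs(x - mu) > n_sd * sd
                    for ch, (mu, sd) in zip(record, stats)
                    for x in ch[a:b])
        if not noisy:
            kept.append((a, b))
    return kept
```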

Results
Fig. 6 provides an estimate of the P300 response for the OL-IR task. Fig. 6A shows the grand average CNN decoded response for target and nontarget stimuli. To compare the magnitudes of these responses, we selected a 500 ms window of time around the peak latency of the aggregate response. These windows are shown by the gray bars in the main figure. We then performed a statistical comparison of the average within this window for target and nontarget responses from all participants, the results of which are shown in the inset of Fig. 6A. Using a paired t-test implemented in Matlab, we found the magnitude of the CNN response profiles for targets to be significantly greater than the magnitude for nontargets. This is indicated by the (*) in the inset. Fig. 6B shows the EEG responses at electrode Pz associated with the same data shown in Fig. 6A, along with the results of another statistical test comparing the average amplitude of responses over a similar window of time. Here, the units of these responses are in median deviations (m.d.) since during preprocessing we divided each participant's EEG data by the median absolute deviation from all channels. As we can see from Fig. 6B, the peak times for the responses are not perfectly aligned, but there is enough overlap that the use of a common time window for the statistical comparison appears justified. Fig. 6C and D show the results from electrodes Cz and Fz, respectively. Fig. 6E shows the scalp topography for the average target minus nontarget EEG response for a selection of time points around the stimulus-onset event. The results in Fig. 6 are as expected: target stimulus onset in the OL-IR task evokes a strong P300 response that originates in occipital areas, where activity is first negative before the positive activation begins, and then the positive activation moves towards parietal and lateral-frontal regions. Nontarget stimuli elicit an attenuated P300 response, and these differences are measurable in both the original EEG space and the CNN decoded profile. This provides a controlled validation test of the CNN decoder for this specific participant pool along with a demonstration of the basic properties of the CNN response profile (e.g., peak at t0 = 0 s associated with a P300 occurring in the window [0, 1.25] s, measurable bounds on the jitter, and dissociations between stimulus type).

Table 1
Salience models used in the current analysis (entries reconstructed from the extracted text; one model name did not survive extraction).
(model name lost in extraction): A probabilistic model that uses a feature space extracting intensity contrast features to predict fixation density.
Corner based model (CORS) (Rueopas et al., 2016): Edges, such as corners, line intersections, and line endings, are evaluated in this map to assess possible figure locations using center-surround and color opponency mechanisms.
SALICON (SAL) (Huang et al., 2015): Integrates salience prediction with a deep neural network (DNN)-based architecture for object recognition.
Structural dissimilarity model (SDS) (Li and Mou, 2019): Computes salience from structural features, specifically local contrast and gradient magnitude, through the use of image quality assessment.
CASNet2 (Fan et al., 2018): A DNN-based framework trained on the spatial and semantic context of a scene in an effort to model human emotion prioritization.

S.M. Gordon et al., NeuroImage 298 (2024)

Fig. 7 provides the results from the CL-FVVS task using fixation onset as the time-locking mechanism. Fig. 7A shows the grand average CNN decoded response for initial target and nontarget fixations along with the result of the same statistical comparison used in Fig. 6A. Immediately we observe that target fixations produce larger amplitudes in the CNN response profile than nontarget fixations, but the CNN response profiles are wider and there is a shift in the peak time of the response of approximately 450 ms. These features imply greater temporal variability in the onset of any P300 analogue (when measured against fixation) and an overall shift of the response to earlier time windows. Fig.
7B shows the three EEG responses for target, nontarget, and terrain fixations at electrode Pz. The target and nontarget fixations are the same trials presented in Fig. 7A. Here we see a peak of activity 300-500 ms after fixation that may coincide with the P300 complex, but the data is noisy: the saccadic spikes and lambda responses appear to be confounding the analysis. Since we hypothesize that the terrain response would have a lower probability of containing a P300 complex (and this is confirmed by the terrain CNN response profile), but the terrain response appears to contain similar levels of these artifacts, we subtract the terrain response from both the target and nontarget fixation-locked responses to help improve overall SNR (i.e., removing some of these phase-locked artifacts). These results are presented in Fig. 7C and D for electrodes Pz and Cz, respectively. However, this cleaning step does not improve our ability to interpret the results. The response is much wider and does not really match anything that we observed in the OL-IR task. We do find significant differences in the amplitudes at these electrodes using the same window applied to the OL-IR task, which may reflect a weakened P300, but inspection of the scalp topography (Fig. 7E) coupled with our understanding of the CNN response profile (Fig. 7A) indicates that either 1) there is simply too much noise to draw strong conclusions, or 2) we have not sufficiently identified the onset time for this analysis and, thus, the entire signal is diffuse and attenuated.
Comparison of Figs. 6A and 7A reveals that the CNN response profile for target fixations in the CL-FVVS task is nearly twice as wide as the CNN response profile for target stimulus onset in the OL-IR task. Since the CNN profile is not a difference waveform, this increase in width is directly related to the temporal uncertainty in the onset of the P300 analogue for the CL-FVVS data (Gordon et al., 2023a) for these fixations. Therefore, we then analyzed the CNN response profiles for each of the factors previously defined (L-SC, S-SC, L-FD, S-FD, L-SAL, and S-SAL). We identified qualifying fixations for each factor and used this information to select eligible participants. We averaged the CNN responses within participants and measured variability in the individual peak times (per participant) for each factor versus the variability observed in the original data (CTRL).
Fig. 8 shows a sampling of this process for fixations with large saccades (L-SC), long fixation durations (L-FD), and large salience values (L-SAL) for both initial fixations on targets (top row) and refixations on targets (bottom row). Here, we break down the grand average CNN response profile into the per-participant averages in order to measure the variance in individual peak times using the grand average peak as the mean. For each example we provide the grand average CNN response profile (thick purple line) and the individual response profiles from each eligible participant (gray lines). We also provide the peak time (dashed red line), computed from the grand average response and used as the basis of the variance estimates from the individual peak times. From this example, we clearly see a clustering of peak times in the individual response profiles around a common time (dashed red line) for large saccades (L-SC) for both initial fixations and refixations. Such clustering is not visible for either long fixation durations (L-FD) or large salience values (L-SAL).
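The variance measurement illustrated in Fig. 8 can be sketched as follows, using the grand-average peak time as the mean, per the text (function names and data layout are ours):

```python
def peak_time(times, profile):
    """Time of the maximum of one response profile."""
    i = max(range(len(profile)), key=lambda j: profile[j])
    return times[i]

def peak_time_variance(times, participant_profiles, grand_peak):
    """Variance of per-participant peak times around the
    grand-average peak time."""
    peaks = [peak_time(times, p) for p in participant_profiles]
    return sum((p - grand_peak) ** 2 for p in peaks) / len(peaks)
```

Comparing this variance for each factor against the CTRL condition then indicates whether a factor tightens or loosens the temporal clustering of the response.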
Tables 2 and 3 provide descriptive information on the number of participants available for each of the six factors for initial fixations (Table 2) and refixations (Table 3). In addition, these tables indicate whether the variance in peak times measured using each factor significantly changed from the variance measured in the original (CTRL) data. Significant changes are marked as "Increase" or "Decrease", indicating the direction of the change, while non-significant changes are marked as "None". Only large saccades (L-SC) produced significant reductions in the variance of the peak times of the individual response profiles for all groups considered: initial fixations on targets, initial fixations on nontargets, refixations on targets, and refixations on nontargets.
Using this knowledge that large saccades decrease the jitter in peak times across individuals, Fig. 9 shows the average CNN response profiles for just these fixations. Here we see that the response profiles for large saccades are highly stereotyped and consistent across both initial fixations and refixations. The CNN profiles now look very similar to those observed for the OL-IR task (Fig. 6A) with just a couple of differences: 1) the amplitudes are lower by approximately a factor of 5, and 2) the peak time has shifted left (i.e., earlier in time) by approximately 450 ms. However, while the responses have shifted earlier in time, this peak of activity is tightly grouped and consistent, meaning that it has been temporally isolated. Given this similarity in the CNN decoded profiles for initial fixations and refixations, the remaining analysis combines these two groups. Fig. 10 shows the variation in CNN response profiles as a function of saccade size across the range of saccade values for all fixations (initial and refixation) on targets. Saccade values greater than 20° were combined in the final bin. Fig. 10A shows the changes in the CNN response profile as a function of saccade size. In Fig. 10B we regress saccade size (degrees) against the amplitude of the CNN response profile measured at t = −450 ms, where we find an almost linear increase in amplitude with r² = 0.95, statistically significant at p << 0.01 using the Matlab regression statistics toolbox. We now revisit the initial fixation-locked analysis for the CL-FVVS task presented in Fig. 7 and focus on the EEG data only. We limit the new analysis to only the previous definition of large saccades (top 30%) but include both initial fixations and refixations while re-using all of the other parameters included in Fig.
7. We also use our knowledge of the properties of the CNN response profiles across the two tasks (OL-IR and CL-FVVS) to linearly adjust the P300 complex observed in the original OL-IR task to provide a visual comparison to the CL-FVVS version. To do this, we time-shifted the OL-IR response to occur 450 ms earlier and scaled the amplitude down by a factor of 5. The results of this new analysis are presented in Fig. 11. Now, we can clearly see activity that looks very similar to the open-loop impulse response version of the P300. Fig. 11A shows a response with a rapid rise time and slow dissipation over electrode Pz. Both the target and nontarget responses from the CL-FVVS data are very similar to the adjusted OL-IR data, and we find no differences in the response at electrode Pz.
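This linear adjustment amounts to a simple time shift and amplitude rescaling of the OL-IR waveform, e.g.:

```python
def adjust_olir(times_s, amplitudes, shift_s=-0.450, scale=1 / 5):
    """Shift a waveform 450 ms earlier and scale its amplitude by
    1/5 for visual comparison with the CL-FVVS responses.
    times_s and amplitudes are parallel lists."""
    return ([t + shift_s for t in times_s],
            [a * scale for a in amplitudes])
```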
Fig. 11B shows the probability density over time of a prior large saccade immediately preceding the large saccades used to select the fixations on targets and nontargets shown in Fig. 11A. Here we see no difference across targets and nontargets in the distribution of prior large saccades for those fixations presented in Fig. 11A, and virtually no probability of a prior large saccade in the 200 ms immediately preceding the large saccade to targets or nontargets. In the CL-FVVS data, for the selected definition of large saccades (top 30%, which yielded saccade sizes of approximately 5° and above) there was an average intersaccade interval of 809 ms (±75 ms), averaged first within participants and then across participants. This information enables us to assert that the P300 analogue activity observed for large saccades in the CL-FVVS task (and presented in Fig. 11A) unfolds faster than the average interval between large saccades. This supports the argument that any observed differences in the target versus nontarget responses are not likely the result of prior eye movement. Though not presented, similar results were obtained for the probability of any prior fixation.
Fig. 11C and D show the target and nontarget responses over electrodes Cz and Fz, respectively. Here we find that as the activity moves towards more frontal regions, traditional target/nontarget differences appear and the time course of the signal is visibly similar to the adjusted OL-IR data. Fig. 11E shows the time course over the entire scalp, in which we observe that the response originates in occipital areas, where activity is first negative before the positive activation begins, and then the positive activation moves towards parietal and lateral-frontal regions, as we would expect with the evoked version of the P300. The final result, therefore, is that in CL-FVVS paradigms the P300 complex appears to shift earlier by 450 ms, so that it nearly co-occurs with the eyes settling upon the attended region and rides on the back of the lambda response. The amplitude, or more likely the probability of occurrence, is determined by saccade size, but the overall probability, even for large saccades, is lower than what was observed in the OL-IR task.

Discussion
Few studies have investigated the P300 in environmentally and behaviorally closed-loop paradigms. By using a CNN to first decode the EEG signal, we were able to show that a pattern of activity analogous to the P300 complex was occurring around the moment of fixation under such conditions. Whereas initial visual inspection of the EEG results in Fig. 7 could have falsely led us to believe that noise was the main limiting factor in obtaining a robust P300 response, the CNN profile suggested we were both 1) looking at the wrong time window and 2) observing a response with a tremendous amount of variable latency. More importantly, though, by exploiting the properties of the CNN response profile we were able to investigate the factors that led to the greatest reduction in this temporal uncertainty. We showed that large saccades were sufficiently linked with the onset of this P300 analogue to enable analysis with standard ensemble averaging methods. We also found that the emergence of this pattern was tightly coupled to saccade size. We believe this is the case because saccades, and saccade size, directly reflect the dynamics of attentional deployment and, as a result, are modulated by both top-down and bottom-up processes. Before a saccade is initiated, cognitive factors (e.g., task properties) and low-level factors (e.g., salience of location) determine the magnitude and location of the subsequent saccade (Findlay, 2013). Fixation duration and salience, on the other hand, tend to be more strongly associated with only one or the other of these processes (top-down or bottom-up, respectively). Naturally, though, if the visual search paradigm we used in this study had been more skewed towards one of these processes, then we may have found a secondary relationship where either fixation duration or salience, along with saccade size, was a reliable indicator of the P300 analogue onset time.
Once we were able to temporally isolate the response, we observed a profile in the EEG scalp space that was very similar to the profile observed in the OL-IR data, with a few notable differences. The most striking of these differences was that the response occurred earlier in time with respect to fixation onset, supporting the idea that in unconstrained, closed-loop, free-viewing search, fixation on a stimulus is not analogous to the stimulus onset events used in evoked-response paradigms. Interestingly, though, the timing of the response was such that the earliest negative activity occurred before the fixation while the positive activity almost perfectly coincided with fixation onset, as if to maximize efficiency by placing the to-be-processed item in the fovea at just the right time. It is worth noting that a defining feature of closed-loop systems is their ability to adapt in order to solve the problem at hand more optimally or efficiently. This shift in time may appear to contrast with prior work suggesting that large saccades delay processing; however, we argue this difference stems from a fundamental difference in our paradigm. Our paradigm created a closed-loop system involving both environmental and behavioral paths and removed time pressures, thus allowing the brain, eyes, and environment to establish an inherent equilibrium. In open-loop paradigms the P300 is, essentially, a reactive process. In our task, this timing shift may indicate a highly optimized, proactive process.
The new time course also results in the blending of the well-known lambda response traditionally associated with fixation-locked analyses with the onset of the P300 analogue. Lambda responses are brief, with approximately 100 ms duration, and peak over the occipital cortex. The topography observed in Fig. 11A starts at nearly the same time as lambda but persists and projects up and across the scalp with a temporal/spatial distribution reminiscent of the P300 complex observed in open-loop paradigms. We recall that prior work has shown that saccade size modulates the amplitude of the lambda response, and that the lambda response is believed to reflect the afferent input of visual information (Ries et al., 2018; Billings, 1989; Yagi, 1979). We suspect this relationship and close temporal alignment are not coincidental; delineating these two components, however, was beyond the scope of the current work and must be left for more focused future work in this area.
Another difference was that the amplitudes measured in the CL-FVVS were substantially lower than those from the OL-IR task. This was present in both the EEG response and the CNN decoded response profile. We argue this difference is expected, as laboratory paradigms for the P300 often focus on maximizing the SNR in the response. In closed-loop paradigms where time pressures have been removed, it is unrealistic to expect such an enhanced SNR for every single "trial". In addition, we could not assess with the current paradigm whether the response happens with every single large saccade, but with diminished amplitude, or whether the amplitude is preserved but the response occurs only with a specific probability that is a function of saccade size. We believe the results presented in Fig. 10, in which the CNN decoded response profile increases in amplitude with saccade size, reflect a change in the probability of the response and not a change in its amplitude, but the current study was not designed to address this specific question.
A third difference was that the measured statistical differences between target versus nontarget fixations in the CL-FVVS data occurred in more frontal regions, while the differences between target versus nontarget stimuli in the OL-IR task occurred in more occipital regions. We believe the primary deviation here, though, is the lack of difference at Fz in the OL-IR data, as most open-loop P300 studies report differences in this region. In the CL-FVVS, the absence of an amplitude difference at electrode Pz could be explained by the presence of the lambda response in this area impacting both target and nontarget fixations.
All of these results, however, must be placed in context. Specifically, most of our data samples should be considered refixations. Even the samples we considered initial fixations would be difficult to validate as true "first fixations" given the resolution of the eye tracker and the continuous nature of the task. The purpose of this work was not to sharply delineate initial versus refixations but rather to investigate P300-analogous behavior in closed-loop systems under steady-state conditions. As stated previously, causality is blurred in closed-loop systems, and our results should be interpreted as describing certain steady-state aspects of visual processing.
Given the current results, future work should aim to replicate and extend prior laboratory findings on the P300 in more naturalistic paradigms. For instance, how do prior results showing that P300 amplitude changes as a function of cognitive load map onto closed-loop paradigms? Is this analogue of the P300 still a reliable indicator of cognitive function, or are the differences in latency and amplitude originally observed in open-loop paradigms lost when the response emerges in closed-loop paradigms? We believe the current work provides the foundation for such questions to be addressed by 1) establishing saccades, and saccade size, as measurable features of CL-FVVS that can be used to isolate the CL-FVVS P300 analogue, and 2) showing how domain-generalized neural decoders can be used to uncover such relationships.

Conclusion
Naturalistic search is not dominated by the abrupt appearance of stimuli requiring immediate attention and processing; rather, in ecologically valid and naturalistic domains, the rate of information change is likely sufficiently low from one moment to the next that the closed-loop dynamic system composed of the environment, neural/visual processes, and their behavioral counterparts is allowed to establish an inherent equilibrium. In such contexts, there has been little evidence of P300 events such as those observed in more controlled settings, and little-to-no knowledge of what factors would predict the onset of such events. As a result, a comprehensive model of how attentional mechanisms, eye movements, and cognitive processes (such as those indexed by the P300) integrate to enable individuals to maintain awareness of the world around them has remained largely speculative. The current work established saccades as a critical factor in this dynamic system and one whose measurable features can be used to temporally isolate subsequent neural processes. This is just one step, though, along the path from analyzing the brain as an open-loop impulse response system to realizing models of the brain (with measurable behavior and neural processes) as the complex, dynamic, closed-loop system that it really is (Schroeder et al., 2010).

Fig. 3 .
Fig. 3. A) Sample target stimuli from the OL-IR task presented in the center of the screen for 250 ms. B) Examples of target and nontarget stimuli for the OL-IR task.

Fig. 5 .
Fig. 5. Example of the synchronized EEG (top row), CNN time series (middle row), and eye tracking (bottom row) data. Fixations in the eye tracking data are color coded by type (target, nontarget, or terrain). Minimally processed EEG data is shown as a raster plot. Test data is preprocessed, and the ensemble of pretrained CNN models is convolved over the test data one sample at a time. The outputs of the ensemble are averaged to produce a time series signal whose value indicates whether the underlying neural patterns in the test data are more similar to a P300 pattern response (target stimulus trials) or not (nontarget/background image trials). Fixation-locked analyses can be performed on either the EEG data or the CNN output space.
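The decoding step depicted in Fig. 5 can be sketched as follows. This is a schematic of the sliding-ensemble idea only, assuming each pretrained model is a callable that maps a fixed-length EEG window to a scalar target-vs-nontarget score; the window length, model interface, and center-alignment choice are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def ensemble_decode(eeg, models, win_len):
    """Slide a fixed-length window across continuous EEG one sample at a
    time and average the outputs of an ensemble of pretrained decoders.

    eeg     : array (n_channels, n_samples) of preprocessed test data
    models  : list of callables, each mapping a window to a scalar score
    win_len : window length in samples
    """
    n_samples = eeg.shape[1]
    out = np.full(n_samples, np.nan)  # NaN where the window does not fit
    for t in range(n_samples - win_len + 1):
        window = eeg[:, t:t + win_len]
        # average the ensemble's scores for this window
        scores = [m(window) for m in models]
        out[t + win_len // 2] = np.mean(scores)  # center-align the output
    return out
```

The resulting time series can then be analyzed with the same fixation-locked epoching applied to the raw EEG, which is what produces the CNN response profiles in the figures.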

Fig. 6 .
Fig. 6. A) Grand average CNN decoded response profile to target versus nontarget stimulus presentations from the OL-IR task. Gray bars indicate the regions used to compute average magnitude and standard error for use in the statistical comparison shown in the inset. Statistical differences are indicated by (*). B) EEG response to the same trials and participants included in (A) at electrode Pz. C) EEG response to the same trials and participants included in (A) at electrode Cz. D) EEG response to the same trials and participants included in (A) at electrode Fz. E) Average scalp topography (EEG) for the target minus nontarget responses shown in (B).

Fig. 7 .
Fig. 7. A) Grand average CNN decoded response profile to initial target, nontarget, and terrain fixations from the CL-FVVS task, where time zero is fixation onset. Gray bars indicate the regions used to compute average magnitude and standard error for use in the statistical comparison shown in the inset. Statistical differences are indicated by (*). B) EEG response for fixations on targets, nontargets, and terrain at electrode Pz. C) EEG response to the same trials and participants included in (A) for electrode Pz. Here the average terrain response has been subtracted to further help eliminate ocular artifacts. D) EEG response to the same trials and participants included in (A) for electrode Pz. E) Average scalp topography (EEG) for the target minus terrain response.

Fig. 8 .
Fig. 8. Individual CNN response profiles for each of the factors l-SC, l-FD, and l-SAL for targets and nontargets, initial and refixations. In each plot, the solid black line is the grand average, the multicolored lines are the individual responses, and the dashed red line indicates the peak time of the grand average.

Fig. 10 .
Fig. 10. A) CNN response profile for all fixations (combined initial and refixations, as well as target and nontarget) as a function of saccade size. B) CNN response profile for data in (A) for only trials not preceded, or followed, by a large saccade.

Fig. 11 .
Fig. 11. Fixation-locked response for fixations on targets and nontargets, for large saccades only, for all fixations (initial and refixations); time zero is fixation onset. Horizontal and vertical EOG signals have been removed, and fixations on terrain have been subtracted to remove any additional ocular-based artifacts. A) Activity at scalp electrode Pz with region of peak amplitude (gray) and pre-fixation baseline (red dash) marked. Inset indicates grand average response over the regions highlighted in gray. Dashed black line is the adjusted response from the OL-IR task, for visual comparison purposes only. B) Probability of a prior large saccade leading up to the large saccades associated with the fixations presented in (A). C) Activity for the same trials shown in (A) but at electrode Cz. D) Activity for the same trials shown in (A) but at electrode Fz. E) Scalp topography for target minus terrain responses presented in (A).

Table 1
Salience models and a description of each.

Table 2
Number of eligible participants and results for the analysis of reduction in variance for initial fixations on targets and nontargets.

Table 3
Number of eligible participants and results for the analysis of reduction in variance for refixations on targets and nontargets.