Neural correlates of virtual reality-based attention training: An fMRI study

Theoretical background: Virtual Reality technology is increasingly used in attention rehabilitation for functional training purposes. However, the neural mechanisms by which Virtual Reality can affect attentional functioning are still unclear. The current study's objective is to examine the effects of stereoscopic vs. monoscopic presentation on neural processing during a visual attention task. Method: Thirty-two healthy participants performed a visual attention task in an immersive virtual environment that was displayed via MR-compatible video goggles in an MRI scanner. The paradigm alternated between trials that required active engagement with the task and mere observation trials. Furthermore, the form of binocular presentation switched between monoscopic and stereoscopic presentation. Results: Analyses yielded evidence for increased activation in stereoscopic compared to monoscopic trials in the tertiary visual cortex area V3A as well as elevated activation in the dorsal attention network when engaging in the attention task. An additional ROI analysis of area V3A revealed significantly lower attentional engagement costs in stereoscopic conditions. Discussion: Results support previous findings suggesting that V3A is involved in binocular depth perception. Furthermore, heightened activation in V3A following stereoscopic presentation seemed to facilitate attentional engagement with the task. Considering that V3A is the origin of the dorso-dorsal, ventro-dorsal, and ventral visual processing pathways, we regard it as a gating area that decides which kind of visual perception is processed.


Virtual reality in attention rehabilitation
In recent decades, Virtual Reality (VR) has become an increasingly valuable tool for simulating real-world environments. Especially in neurorehabilitation, VR has entered several fields of application, including motor rehabilitation (Chen et al., 2018; De Rooij et al., 2016), pain management (He et al., 2022; Eijlers et al., 2019; Mallari et al., 2019), and exposure therapy (Kim and Kim, 2020; Van Loenen et al., 2022; Wechsler et al., 2019). Another, more recent usage of VR concerns the neuropsychological assessment and treatment of cognitive deficits, especially in attention. By modeling aspects of the real world more realistically than paper-and-pencil and other computerized approaches, VR has been shown to reliably assess attention deficits in an ecologically valid way (Gilboa et al., 2018; Neguț et al., 2016; Jahn et al., 2021).
Moreover, many studies have found VR applications to be effective in improving attentional performance on standard neuropsychological tests (Bioulac et al., 2020; Gamito et al., 2015). Overall, VR seems to be a promising technology for the future of cognitive rehabilitation. However, the neural mechanisms underlying VR-based rehabilitation of attentional functions remain largely unclear.

Neural correlates of visual attention
Since VR is particularly useful for simulating convincing visual environments, VR training applications tend to focus on visual attention. Like other computerized trainings, these protocols consist of a visual attention task that is repeated many times over to achieve its rehabilitative effect by stimulating brain networks associated with visual attention. Posner and Petersen (1990) divided visual attention into three subcomponents, each relying on distinct neural circuitry (Fan et al., 2005; Petersen and Posner, 2012). Alertness, or the ability to regulate physiological arousal into a state of readiness and openness to respond to stimuli, is the most basic component and has been linked to the reticular activating system (Moruzzi and Magoun, 1949), including the thalamus and the brainstem. Orienting attention toward a target in space was found to be associated with heightened activation in a fronto-parietal network encompassing the frontal eye fields (FEF) and the intraparietal sulci (IPS), which Corbetta and Shulman (2002) termed the dorsal attention network (DAN). In contrast to the ventral attention network, which is thought to be involved in processing unexpected salient stimuli in the environment and triggering shifts in attention (Vossel et al., 2014), the DAN is crucial for the top-down, goal-directed steering of attentional focus and eye movements. Lastly, executive control refers to the process of selecting the appropriate behavior out of a set of conflicting options in response to a given stimulus. This function was correlated with increased activation in the anterior cingulate cortex and thalamus, also called the conflict network.
In order to examine how VR presentation of a visual attention paradigm might impact any of these neural systems, previous research on the neural correlates of the VR experience has to be considered. Earlier neurophysiological studies on VR have centered around the concept of presence. Presence, or more particularly spatial presence, refers to the feeling of being immersed in a virtual environment and forgetting one's actual surroundings (Slater and Wilbur, 1997). Russo et al. (2017) have argued that the heightened feeling of presence caused by the embodied nature of the VR experience might stimulate cognitive functions in a particularly ecologically valid way, thereby increasing rehabilitative potential.

Neural correlates of virtual reality-induced presence
Over the years, the concept of presence has been discussed, differentiated, and expanded upon by many researchers. Though the exact neural correlates of presence still remain unclear, several factors have been shown to contribute to its emergence (Slater and Wilbur, 1997; Lombard and Ditton, 1997; Steuer, 1992; Slater and Usoh, 1993). One of the most distinctive features of the VR experience that contributes to presence is the realistic tracking and representation of head and hand movements. Due to the technical constraints of magnetic resonance-based imaging techniques in studying participants' head movements, early research has primarily relied on electroencephalography (EEG) to study the neural bases of presence (Baumgartner et al., 2006; Kober et al., 2012; Slobounov et al., 2015). EEG results from these studies point toward correlations between the experience of presence and increased parietal activation as well as decreased activation in frontal cortical areas. However, many researchers have used functional magnetic resonance imaging (fMRI) to examine other contributing factors to presence that do not require participants to move their heads in the scanner. In a study by Baumgartner et al. (2008), the authors altered the sense of presence by using high- vs. low-engaging stimulus material. They found that the experience of presence was negatively correlated with activation in the right dorsolateral prefrontal cortex (DLPFC) and, to a lesser extent, the left DLPFC. Interestingly, they could demonstrate that the experience of spatial presence was negatively correlated with the state of maturation of the DLPFC in children and adolescents. Clemente et al. (2014) showed that sense of presence mediated by the degree of interactivity with the virtual environment (still picture vs. moving picture vs. self-navigation) was associated with lower activation in the DLPFC bilaterally and higher activation in the right insula and left postcentral parietal regions. Another factor demonstrated to contribute to the experience of presence is stereopsis (Ijsselsteijn et al., 2001), i.e., the perception of depth generated by the integration of two spatially offset visual inputs from both eyes. In a study by Gaebler et al. (2014), 25 participants were shown movie clips in either a monoscopic (same image to both eyes) or stereoscopic (spatially offset images to each eye) condition. They found that stereoscopic presentation was associated with higher self-reports of presence as well as increased intersubject correlations of cortical networks. Forlim et al. (2019) expanded this research by presenting an interactive virtual environment, in which participants could navigate upward, downward, left, and right using control buttons while the researchers alternated between stereoscopic, monoscopic, and 2D mirror screen forms of presentation. Results showed higher functional connectivity in the stereoscopic condition between the superior frontal cortex and the insula/putamen when compared to monoscopic presentation.

Objective of the current study
In sum, several brain regions including the DLPFC, parietal cortex, and insula/putamen have been associated with the experience of presence (Jäncke, 2009). However, the neural mechanisms by which different contributors to presence might interact with networks of visual attention remain largely unstudied. In the present study, we aimed to examine the effect of different forms of binocular presentation (stereoscopic "3D" vs. monoscopic "2D") on neural processing during a visual attention task. Participants performed the task lying in an MRI scanner. The virtual environment was displayed to them with the use of MR-compatible video goggles, which allowed us to switch between 2D and 3D presentation.
We hypothesized that stereoscopic presentation of the visual attention task would be positively correlated with activity in area V3A in the tertiary visual cortex, based on previous research which indicated its involvement in stereoscopic depth perception (Tsao et al., 2003), and negatively correlated with activity in the DLPFC, following Baumgartner et al.'s (2008) model on the neural basis of presence. Based on the structure of the visual attention task, we further predicted that engagement with the paradigm would be associated with the alerting and orienting attention networks, i.e., the thalamus and DAN, but not the conflict network. Lastly, we predicted that different forms of binocular presentation would significantly interact with visual attention. We chose area V3A as a region of particular interest in that regard because of its importance for stereopsis but also because of recent models showing V3A to be a key area in the distribution of visual inputs to several visuomotor pathways (Tessari et al., 2021) as well as some research linking it to the location prediction of moving objects (Maus et al., 2010), which was a key component of our visual attention task.

Participants
Thirty-three healthy participants were recruited via flyers that were put up in the University Clinic RWTH Aachen, around the city of Aachen, and uploaded to the university clinic's Intranet webpage. Participants who were interested in the study contacted the researcher directly via email. All participants were comprehensively informed about the study, signed the information material, filled in an MRI contraindication checklist, and provided written informed consent before proceeding with the study. Diagnosed neurological and psychological disorders, psychoactive medication, drug and alcohol abuse, as well as contraindications relating to the MRI procedure (pregnancy, metal implants, claustrophobia, etc.) were defined as exclusion criteria. One participant had to be excluded from the study retroactively due to technical difficulties with the video goggle system. Thus, the final study sample included 32 participants in total. Participants' ages ranged from 18 to 43 (M = 26.22, SD = 5.07). The female-to-male ratio was 15:17. Ethical approval for the study was granted by the local ethics committee of the Faculty of Medicine at the RWTH University of Aachen.

Technical setup and paradigm description
The paradigm used in this study was based on a VR attention training (programmed on Unity 2018.3.14f Personal engine) which had been evaluated for feasibility and efficacy in previous neuropsychological studies (Lorentz et al., 2021, 2023). For the purposes of this study, one training module, a visual attention task targeting selective attention, was adapted for use in an MRI environment. While participants lay in the scanner, the paradigm was presented to them via an MRI-compatible goggle system (Visual System HD, NordicNeuroLab, Bergen, Norway), which was attached to the 32-channel head coil. This allowed changes between monoscopic and stereoscopic presentation of the virtual environment. For the right hand, participants received an MRI-compatible trackball, which they operated with their right thumb and which allowed them to change their first-person perspective and scroll through the virtual scenario. For the left hand, a button box was provided to them. By pressing with the left index finger, they could take a picture in the virtual environment.
When participants entered the virtual scenario, they found themselves in an underwater environment (see Fig. 1). In front of them, they saw a swarm of fish that constantly moved around. A rectangular camera frame with a crosshair in the center was fixed in the middle of the visual field. By moving their right thumb on the trackball, participants could change their first-person perspective in virtual space and target the crosshair at any point in the environment. In the upper right corner of the camera frame, a bright red letter indicated the present condition. An "A" stood for an Action and a "B" for an Observe trial (Beobachten in German). If the condition letter displayed an "A", a two-dimensional representation of one particular kind of fish was presented beneath the camera frame, specifying the target stimulus. During Action trials, the participants' task was to locate the target stimulus within the swarm, direct the crosshair on it by moving the trackball accordingly, and press the response button with their left index finger to take a picture. If participants managed to successfully take a picture of the right stimulus, the target stimulus switched and another target was depicted under the camera frame. Participants were instructed to take as many pictures of the target stimulus as they could for the duration of the Action trial. If the condition letter displayed a "B", no target stimulus was presented. In these Observe trials, the participants were instructed to simply explore the virtual environment with no selective focus. Participants were still encouraged to look around by moving the trackball, but had no particular task at hand other than observing their surroundings. Action and Observe conditions were separated by gray screens of on average 10 s, during which participants were instructed to rest. These pauses were introduced to allow the researchers to change between monoscopic and stereoscopic presentation. The paradigm encompassed four experimental conditions in total: monoscopically presented Observe trials (MO), monoscopically presented Action trials (MA), stereoscopically presented Observe trials (SO), and stereoscopically presented Action trials (SA). A comprehensive description of the sequencing of the four conditions and the intermittent gray screens can be found in the subsection "2.3 Procedure".

Procedure
On the day of the measurement, participants were picked up at the University Hospital lobby and led to the examination room by the attending researcher. After removing all metallic materials, participants were placed into the scanner, where the MRI-goggle system was attached to the head coil and adjusted to their interpupillary distance and optical diopter if necessary. Participants independently filled inflatable cushions to fixate their heads within the coil. A strap of duct tape was placed on the participants' foreheads to alert them to their own head movements, thereby reducing overall motion. Subsequently, they received the trackball for their right and the button box for their left hand. Once participants were comfortably positioned in the scanner, had familiarized themselves with the controls, and technical preparations were finished, the experiment commenced.
The study paradigm consisted of two experimental blocks of approximately 20 min each. Between these two blocks, the structural and the Reverse-Gradient-Polarity images were obtained, which provided participants with a resting phase of about 5 min. In each experimental block, participants went through a pseudorandomized sequence of the four conditions, in which each condition was presented 10 times (see Fig. 2). All conditions lasted for 20 s and were separated by gray screens for approximately 10 s. The duration of the gray screens varied between 8500 and 11,500 milliseconds in order to introduce temporal jittering and desynchronize the paradigm from the TR. Both experimental blocks followed the same pseudorandomized sequencing order.
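The block structure described above can be illustrated with a short Python sketch. It is not the authors' Unity implementation: the plain shuffle, the uniform jitter distribution, the placement of one gray screen after every trial, and all names (`build_block`, `CONDITIONS`, the seed value) are assumptions made for illustration. Only the four condition labels, the 10 repetitions, the 20 s trial length, and the 8500 to 11,500 ms jitter range come from the text.

```python
import random

CONDITIONS = ["MO", "MA", "SO", "SA"]  # mono/stereo x Observe/Action
REPS_PER_CONDITION = 10                # each condition shown 10 times per block
TRIAL_S = 20.0                         # fixed trial duration in seconds
GRAY_MIN_MS, GRAY_MAX_MS = 8500, 11500 # jitter range of the gray screens


def build_block(seed=0):
    """Return one block as a list of (condition, trial_s, gray_s) tuples.

    Assumption: a simple seeded shuffle stands in for the study's
    pseudorandomization, whose exact constraints are not described.
    """
    rng = random.Random(seed)  # fixed seed: both blocks share one order
    order = CONDITIONS * REPS_PER_CONDITION
    rng.shuffle(order)
    block = []
    for condition in order:
        # Assumption: gray-screen durations are drawn uniformly in ms.
        gray_s = rng.randint(GRAY_MIN_MS, GRAY_MAX_MS) / 1000.0
        block.append((condition, TRIAL_S, gray_s))
    return block


block = build_block(seed=0)
total_min = sum(trial + gray for _, trial, gray in block) / 60.0
print(f"{len(block)} trials, about {total_min:.1f} min")
```

With 40 trials of 20 s and 40 gray screens of roughly 10 s each, the total lands close to the reported block length of approximately 20 min.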

Functional MRI data analysis

Scanning procedure
All images were acquired on a Siemens Prisma 3-Tesla scanner (Erlangen, Germany) using a 32-channel head coil. Structural images were obtained using a T1-weighted magnetization-prepared rapid gradient-echo sequence (MPRAGE; TR = 1900 ms, TE = 2.21 ms, TA ≈ 4 min, voxel size 1 × 1 × 1 mm, acquisition matrix = 256 × 256). A T2*-weighted echo-planar imaging (EPI) sequence (TR = 1500 ms, TE = 26.00 ms, image matrix = 64 × 64, slice thickness = 3.00 mm, multi-band acceleration factor = 2, phase-encoding direction P → A, resolution 3 × 3 × 3 mm) was used for the acquisition of the functional blood-oxygen-level-dependent (BOLD) images. Another EPI sequence with phase-encoding set to the opposite direction (A → P) but otherwise identical specifications was applied in order to obtain a single functional image, which was later used for Reverse-Gradient-Polarity (RGP) top-up correction. All images were aligned to the anterior-posterior commissure line using a localizer sequence at the beginning of each measurement.

Pre-processing
The first 6 images of every functional scanning sequence were discarded to ensure longitudinal steady-state magnetization. The remaining images were then realigned to the mean image using SPM12 (Wellcome Department of Cognitive Neurology, London, UK) running on MATLAB R2017b (MathWorks, Natick, MA, USA). Thereafter, the reverse-phase-encoded AP image was combined with the realigned PA images to apply RGP top-up correction using FSL (Analysis Group, FMRIB, Oxford, UK). RGP top-up correction is a pre-processing procedure that is primarily used in diffusion tensor imaging to correct for susceptibility-based inhomogeneities in the static B0 field by combining two functional images with phase-encoding set in opposite directions (Anderson et al., 2003). This approach was chosen because early pilot measurements displayed artifacts in the frontal regions of the brain due to B0 inhomogeneities, which were likely caused by electromagnetic distortions from the video goggle system. In subsequent experiments, RGP correction proved superior to field map correction (Hutton et al., 2002) or non-linear warping approaches (Kybic et al., 2000) in minimizing distortion artifacts. All subsequent preprocessing steps were performed using SPM12. The top-up-corrected images were coregistered to the T1-weighted anatomical image and segmented into gray and white matter and cerebrospinal fluid. Lastly, images were normalized to Montreal Neurological Institute (MNI) space and subsequently smoothed using an 8-mm full-width at half maximum (FWHM) Gaussian kernel to improve signal-to-noise ratio. For every participant, the voxel-specific mean framewise displacement (FD) was calculated according to Power et al. (2012). None of the participants displayed a mean FD exceeding the recommended 0.5 mm threshold.
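As an illustration of the motion check, the following Python sketch computes framewise displacement in the formulation of Power et al. (2012): the sum of absolute volume-to-volume changes in the six realignment parameters, with rotations converted to millimeters on a sphere of 50 mm radius. The function names and the toy parameter rows are hypothetical, and the sketch assumes SPM-style input (translations in mm, rotations in radians); it is not the authors' script, and averaging over transitions only is one of several common conventions.

```python
def framewise_displacement(params, radius_mm=50.0):
    """Return the FD value (mm) for each volume-to-volume transition.

    params: list of rows [tx, ty, tz, rx, ry, rz] with translations in mm
    and rotations in radians (assumed SPM realignment convention).
    """
    fd = []
    for prev, curr in zip(params, params[1:]):
        # Absolute change in the three translations.
        trans = sum(abs(c - p) for c, p in zip(curr[:3], prev[:3]))
        # Rotations become arc lengths on a sphere of the given radius.
        rot = sum(abs(c - p) * radius_mm for c, p in zip(curr[3:], prev[3:]))
        fd.append(trans + rot)
    return fd


def mean_fd(params, radius_mm=50.0):
    """Mean FD across all transitions of one run."""
    fd = framewise_displacement(params, radius_mm)
    return sum(fd) / len(fd)


# Hypothetical realignment parameters for three volumes.
toy_params = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.002],
    [0.1, 0.1, 0.0, 0.0, 0.0, 0.002],
]
toy_mean = mean_fd(toy_params)
exceeds_threshold = toy_mean > 0.5  # 0.5 mm criterion
```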

Contrasts
Four contrasts of interest were included in the final analysis. To assess the main effects of engagement with the attention task and binocular presentation on neural processing, two main contrasts were examined. A two-sided F-contrast of the Action and Observe conditions revealed significant activation in both Action > Observe
To further examine the interaction effect, a mask of the significant activation cluster from the whole-brain analysis (see Fig. 3C and Fig. 4A) was overlaid on the second-level condition contrasts. Whitened and filtered beta values were extracted to MATLAB and exported to SPSS (IBM SPSS Statistics for Windows, Version 28.0; IBM Corp., Armonk, NY, released 2021) to conduct a two-way Analysis of Variance (ANOVA) with attentional engagement (Observe vs. Action) and binocular presentation (Mono vs. Stereo) as independent variables. Bonferroni-corrected post-hoc tests were performed for further level-wise comparisons.
In order to examine whether different levels of overall trackball movement could account for changes in neural processing in the [A > O] contrast, trackball motion was included in the model as a control regressor. All trackball movements performed by the participant within one TR (1500 ms) were summed up and convolved according to SPM's model of the canonical hemodynamic response function (HRF). Subsequently, the data were added to the motion correction realignment file as a seventh regressor. Since movements of the trackball were associated with changes in participants' first-person perspective in virtual space, a fourth contrast was specified to assess the effect of virtual head movement on neural processing in isolation.
All simple contrasts, meaning one-sample t-tests for each condition, and complex contrasts were constructed on the first level for subject-wise analysis. To allow for population inference, simple contrasts were elevated to the second level and used to recreate the complex contrasts by means of a within-subject ANOVA. All reported results passed the family-wise error (FWE)-corrected significance threshold of p < .05 on the cluster level (cluster extent threshold > 20 voxels).
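The construction of the trackball regressor can be sketched in Python as follows. This is a simplified stand-in, not the SPM code used in the study: the double-gamma function below uses the shape parameters 6 and 16 and the 1/6 undershoot ratio of SPM's default HRF but is sampled directly at the TR rather than on SPM's finer time grid, and the function names, the use of absolute movement magnitudes, and the toy events are assumptions.

```python
import math

TR_S = 1.5  # repetition time in seconds


def canonical_hrf(tr_s=TR_S, length_s=32.0):
    """Approximate double-gamma HRF sampled once per TR, normalized to sum 1."""
    n = int(length_s / tr_s) + 1
    hrf = []
    for i in range(n):
        t = i * tr_s
        peak = t ** 5 * math.exp(-t) / math.gamma(6)          # gamma pdf, shape 6
        undershoot = t ** 15 * math.exp(-t) / math.gamma(16)  # gamma pdf, shape 16
        hrf.append(peak - undershoot / 6.0)
    total = sum(hrf)
    return [h / total for h in hrf]


def bin_per_tr(event_times_s, magnitudes, n_volumes, tr_s=TR_S):
    """Sum the movement magnitudes that fall into each TR window."""
    binned = [0.0] * n_volumes
    for t, m in zip(event_times_s, magnitudes):
        idx = int(t // tr_s)
        if 0 <= idx < n_volumes:
            binned[idx] += abs(m)
    return binned


def convolve(signal, kernel):
    """Causal convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out


# Hypothetical trackball events: onset times (s) and movement magnitudes.
binned = bin_per_tr([0.2, 1.0, 1.6, 4.4], [1.0, 2.0, 0.5, 3.0], n_volumes=6)
regressor = convolve(binned, canonical_hrf())
```

The resulting vector has one value per volume and could be appended as an additional column next to the six realignment parameters.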

ROI analysis
An additional Region-of-Interest (ROI) analysis of area V3A was conducted to further investigate the interaction effect between attentional engagement and binocular presentation. We chose area V3A as an ROI based on earlier research demonstrating its involvement in the processing of stereoscopic inputs (Tsao et al., 2003), as well as current models highlighting its role as a hub area for multiple visuomotor pathways (Binkofski and Buxbaum, 2013; Tessari et al., 2021), which would put it in a promising position to mediate between depth perception and visual attention. The SPM12 Anatomy Toolbox was employed to create a mask of V3A (Area hOc4d L + R, see Fig. 4B), which was then used to extract beta values from each condition. These values were then analyzed in a two-way ANOVA following the same procedure as described in subsection "2.4.3 Contrasts".
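For readers who want to see the logic of a two-way within-subject ANOVA on extracted beta values, a minimal Python sketch follows. It relies on the fact that in a fully within-subject 2 × 2 design each effect can be tested on a per-participant contrast score, with F equal to the squared one-sample t statistic. The study itself used SPSS; the function names and the beta values below are hypothetical, and p-values are omitted because they would require an F distribution from a statistics library.

```python
from statistics import mean, variance


def f_from_contrast(scores):
    """F statistic and degrees of freedom for a one-sample test on contrast scores."""
    n = len(scores)
    f_value = n * mean(scores) ** 2 / variance(scores)  # equals t squared
    return f_value, (1, n - 1)


def anova_2x2_within(betas):
    """betas: one dict per participant with mean ROI betas for MO, MA, SO, SA."""
    engagement = [(b["MA"] + b["SA"] - b["MO"] - b["SO"]) / 2 for b in betas]
    presentation = [(b["SO"] + b["SA"] - b["MO"] - b["MA"]) / 2 for b in betas]
    interaction = [(b["SA"] - b["SO"]) - (b["MA"] - b["MO"]) for b in betas]
    return {
        "engagement": f_from_contrast(engagement),
        "presentation": f_from_contrast(presentation),
        "interaction": f_from_contrast(interaction),
    }


# Hypothetical ROI betas for four participants (not data from the study).
toy_betas = [
    {"MO": 1.0, "MA": 2.0, "SO": 2.0, "SA": 2.1},
    {"MO": 0.8, "MA": 1.9, "SO": 2.2, "SA": 2.0},
    {"MO": 1.2, "MA": 2.4, "SO": 1.9, "SA": 2.2},
    {"MO": 0.9, "MA": 1.8, "SO": 2.1, "SA": 2.3},
]
toy_result = anova_2x2_within(toy_betas)
```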

Main effects and interaction
To evaluate the main effects of engagement with the attention task [A > O] and form of presentation [S > M], as well as their interaction effect on neurological processing, complex contrasts were calculated by means of a within-subject ANOVA (see Table 1).
As illustrated in Fig. 3A, engagement with the attention task was associated with several clusters of significantly increased activation encompassing the thalamus (mostly prefrontal projections according to the Oxford Thalamic Connectivity Probability Atlas (OTCPA)), bilateral premotor areas, the IPS, and the superior parietal lobules. Furthermore, a network spanning the cerebellum and bilateral occipital areas V5 was also highly active. Two smaller clusters could be located in the left and right anterior insular cortex. Examining the effect of stereoscopic presentation, results yielded evidence for a single horseshoe-shaped cluster spanning from the primary visual cortex to the left and right superior portion of the lateral occipital cortex corresponding to area V3A (see Fig. 3B). The interaction contrast yielded significant results in occipital areas with local maxima in V2 and V4 lateralized to the left hemisphere (see Fig. 3C). A two-way ANOVA of this cluster indicated a significant main effect of engagement (F(1, 31) = 88.496, p < .001), but not of presentation (F(1, 31) = 2.927, p = .097), as well as a significant interaction effect (F(2, 31) = 51.868, p < .001). As illustrated in Fig. 4A, post-hoc analyses revealed that presentation did have a significant effect on mean activation in both Observe (p < .001) and Action conditions (p = .001). However, the effect of stereopsis was positive during Observe but negative during Action trials, therefore leading to a non-significant main effect. Attentional engagement, on the other hand, was associated with significantly higher activity in both monoscopic (p < .001) and stereoscopic (p < .001) conditions.

Virtual head movement
To assess the effects of trackball movement and its translation into simulated head movement in virtual space, first-level contrasts were aggregated in a separate one-sample t-test (see Table 2). Results show significant activation in the primary and secondary visual areas of the occipital lobes. Other cortical clusters of increased bilateral activation include the inferior parietal lobules, the secondary somatosensory cortex, Broca's areas, and premotor areas. Subcortical clusters encompass the left and right thalamus (projections to primary motor, sensory, and posterior parietal areas according to OTCPA) as well as the corticospinal tracts (see Fig. 3D).

ROI interaction analysis V3A
A two-way ANOVA was conducted to examine the interaction between the two factors attentional engagement and binocular presentation on mean activation in the ROI V3A (see Fig. 4B). Results indicate a significant main effect of binocular presentation (F(1, 31) = 29.259, p < .001), but not of attentional engagement (F(1, 31) = 0.613, p = .439), as well as a significant interaction in the area of interest (F(2, 31) = 24.618, p < .001). Post-hoc analyses revealed a significant effect of engagement in the monoscopic conditions (p = .019), but not in the stereoscopic conditions (p = .333). Moreover, differences between monoscopic and stereoscopic conditions were only significant during Observe (p < .001), but not during Action trials (p = .069).

Discussion
The present study aimed to examine the effects of different forms of binocular presentation on neural processing during a visual attention task. In line with findings from previous research, stereoscopic presentation of the virtual 3D environment was associated with elevated activation in area V3A. Interestingly, activation in this area also increased significantly when switching from Observe to Action trials in the monoscopic conditions but not in the stereoscopic conditions.

Fig. 4. A) Mean signal intensities (beta values) in the interaction cluster from the whole-brain analysis (also Fig. 3C) indicate a larger effect of attentional engagement in the monoscopic conditions compared to the stereoscopic conditions. All post-hoc comparisons are significant. B) Mean signal intensities (beta values) in area V3A display overall higher net activation in stereoscopic over monoscopic trials. In mono, active searching was associated with significantly higher activation than observing, whereas in stereo, activation remained unaffected by attentional engagement. * p < .05, ** p < .001.

Dorsal attention network
As expected, due to the salient nature of our stimuli, engagement with the visual attention task was associated with many clusters of significantly heightened activation. Most prominently, bilateral activation in premotor areas, corresponding to the FEF, and in the superior parietal lobules along the IPS can be attributed to the DAN. As first described by Corbetta and Shulman (2002), the DAN is involved in the top-down allocation of visual attention. Considering the attentional demands of the current paradigm, activation in this circuit seems hardly surprising given the need to orient attention towards a target (Petersen and Posner, 2012) and suggests that the task was effective in stimulating the appropriate attention-related neurocircuitry. Another activation pattern connects the cerebellar vermis VI with area V5 in both the left and right hemispheres. These findings are in line with previous research, which found that V5's primary function lies in the detection and tracking of motion (Zeki, 2015), whereas the cerebellum (Patel and Zee, 2014) and particularly vermian lobules VI and VII have been linked to the control of eye movements, balance, and hand-eye coordination (Park et al., 2018). In the context of the paradigm, these findings suggest that those regions were involved with locking onto a target stimulus during the Action trials. Finally, activation in the thalamus and the bilateral anterior insula can be accounted for by the task's attentional demand to selectively focus on one particular target while ignoring other distracting stimuli. Research by Tokoro et al. (2015) has linked the thalamus, especially those nuclei projecting to prefrontal areas (Sieveritz et al., 2018), with attentional selection of visual and auditory information. Similarly, activation in the anterior insular cortex has been associated with focal attention (Nelson et al., 2010; Molnar-Szakacs and Uddin, 2022). As part of the reticular activating system, the thalamus has also been linked to managing alertness more generally (Moruzzi and Magoun, 1949). Considering that any active engagement with the task would require an increase in alerting attention (Petersen and Posner, 2012) compared to the observation conditions, activations in the thalamus might also be interpreted as an upregulation in tonic alertness.

V3A in depth perception and visual processing pathways
In line with our hypothesis, the current results show increases in BOLD activity in area V3A in the stereoscopically compared to the monoscopically presented conditions.These findings support previous ideas about V3A's integral involvement in the coding of depth cues for the perception of three-dimensional space in both humans and macaque monkeys (Tsao et al., 2003).According to a recent model put forward by Tessari et al. (2021), V3A might also serve as an important trifurcation point for the dorso-dorsal, ventro-dorsal, and ventral pathways.This crucial position within the visuomotor system could account for the significant interaction between attentional engagement and binocular presentation found in the current ROI analysis.The fact that in monoscopic conditions, mean activation in V3A increased significantly when switching from Observe to Action trials suggests that initiating goal-directed behavior required additional neural processing in this region.During the stereoscopic conditions however, mean activation was already elevated during Observe trials and did not rise with attentional engagement (see Fig. 4B).
One possible interpretation of these findings is that, in order to effectively perform the attention task, depth cues had to be integrated into a three-dimensional map of the virtual environment. In monoscopic conditions, this operation was only necessary during Action trials and required additional metabolic resources, leading to increased V3A activation. However, when the environment was presented stereoscopically, this process was initiated regardless of the task at hand. Activation was therefore high during Observe trials and did not increase any further when interacting with the virtual environment, because depth cues had to be processed in either case. The interplay between depth perception and interactivity might be best illustrated by the analogous situation of trying to grasp an object while keeping one eye closed. When merely observing the object, perception seems to be hardly impaired. The need to compensate for the lack of stereoscopic depth only becomes apparent when grasping is initiated. In sum, the results suggest that the stereoscopic form of presentation elicited neural processes that seem to be preparatory for action in the attention task. This interpretation appears plausible considering that a three-dimensional model of one's peripersonal surroundings is essential when manipulating objects and initiating other actions. As Binkofski and Buxbaum (2013) pointed out, referring to findings by Gutteling et al. (2011) and Neggers et al. (2007) as well as to the premotor theory of attention (Rizzolatti et al., 1987), the literature suggests that action and attention might share common neural mechanisms, resulting in the modulation of visual attention by action preparation and vice versa. Moreover, V3A's role as a junction point for the dorso-dorsal, ventro-dorsal, and ventral streams puts it in a perfect position to offer prerequisite input for several visual pathways and affect multiple processes within the visuomotor system. To further investigate V3A's involvement in different neural networks, measures of functional connectivity could be considered in future research.
Interestingly and contrary to our predictions, activity in none of the areas previously associated with the experience of presence, such as the DLPFC, the insula, and the postcentral parietal cortex (Baumgartner et al., 2008; Jäncke, 2009), seemed to be affected by variations in binocular presentation. One possible explanation is that, in the present case, stereoscopic presentation might not have been sufficient to elicit a heightened sense of presence, contrary to research by Ijsselsteijn et al. (2001). On the other hand, previous neuroscientific studies have mostly employed means other than stereoscopic presentation to induce presence, which might indicate that not all contributors to presence rely on a single unitary neural substructure. In the absence of participants' subjective presence ratings, which would allow us to test the former interpretation, no conclusive inference can be drawn from the current findings.

Virtual head movement
Virtual head movements (VHM), initiated by moving the trackball, were added to the analysis as a control regressor to ensure that differences between Action and Observe conditions would not solely be attributable to increased movement during the Action conditions. An isolated analysis of the effects of VHM revealed significant activations in primary and secondary visual processing areas. These results are probably a consequence of the change in first-person perspective. Other bilateral clusters of activation in the inferior parietal lobules, Broca's area (Brodmann's area 44), and the secondary somatosensory cortex can likely be attributed to a parieto-premotor network, first described by Binkofski et al. (1999), that is responsible for the manipulation of objects. Activation in this network seems highly probable considering the complex motor operations participants had to initiate to interact with the trackball. Later studies such as Clower et al. (2001) confirmed that activation in the inferior parietal lobules is associated with eye-hand coordination, which was a key feature of VHM, especially for targeting during Action trials. Similarly, activation in Broca's area has been linked to manual control and fine motor skills (Nishitani et al., 2005), apart from its well-established role in speech production. Finally, activation in the thalamus, with its primary motor and sensory projections, as well as in the corticospinal tracts seems to play an important part in receiving and distributing sensory inputs and motor outputs. Thalamic projections to the posterior parietal lobule might be particularly instrumental for calibrating and fine-tuning trackball movements, since these projections have been found to relate to efference copies in humans (Bellebaum, 2005) and manual tool use in capuchin monkeys (Mayer et al., 2019).
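The rationale for including VHM as a control regressor can be illustrated with a minimal sketch. It does not reproduce the study's actual model: the data are synthetic, the function name `fit_glm` and all numeric values are hypothetical, and hemodynamic convolution is omitted for brevity. The simulated signal depends only on movement, yet a model without the movement regressor attributes that variance to the Action condition.

```python
import numpy as np


def fit_glm(design, signal):
    """Return ordinary least-squares beta estimates for a design matrix."""
    betas, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return betas


# Synthetic, hypothetical data: nothing here comes from the study.
rng = np.random.default_rng(0)
n_scans = 200
intercept = np.ones(n_scans)

# Boxcar alternating 20 Observe scans (0) and 20 Action scans (1).
action = (np.arange(n_scans) % 40 >= 20).astype(float)

# Assumption: movement is more pronounced during Action blocks.
movement = 0.8 * action + 0.2 * rng.standard_normal(n_scans)

# The simulated signal depends on movement only, not on the task itself.
signal = 1.5 * movement + 0.1 * rng.standard_normal(n_scans)

# Model 1: task regressor only; movement-related variance leaks into it.
beta_without = fit_glm(np.column_stack([intercept, action]), signal)

# Model 2: task regressor plus movement as a control regressor.
beta_with = fit_glm(np.column_stack([intercept, action, movement]), signal)

print(f"Action beta without control regressor: {beta_without[1]:.2f}")
print(f"Action beta with control regressor:    {beta_with[1]:.2f}")
```

In this toy setting, the Action estimate shrinks toward zero once movement is modeled, which is the kind of confound the control regressor is meant to guard against.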

Limitations
The current study used very salient visual stimuli and granted participants a high degree of freedom by letting them change their first-person perspective of their own accord. This aspect might have had significant influences on the experiment, particularly on the contrast between Action and Observe trials, since virtual head movement patterns might have differed substantially between conditions even though participants were encouraged to move the trackball in both cases. We attempted to mitigate this shortcoming by recording trackball movements and adding them to the analysis as a control regressor. Still, other control regressors such as eye movements could be integrated in future iterations to better differentiate between the neuropsychological effects of head movements, eye movements, and visual attention during Action trials. Another limitation of the current study is that participants' experience of presence was not assessed with a questionnaire following exposure to the monoscopically and stereoscopically presented virtual environments. This additional variable could have been used to investigate whether subjects' sense of presence mediated activity in area V3A and might explain why no activation in presence-related neurocircuitry could be observed.

Implications
The primary objective of the present study was to examine whether stereoscopic presentation would impact neuropsychological processing during engagement with an attention task in order to investigate the neurological underpinnings of VR-based cognitive training protocols. Even though practical implications for the implementation of VR in cognitive rehabilitation need to be derived from clinical research, the current study finds evidence that stereoscopic presentation inherent to all VR headsets might facilitate attentional engagement by lowering metabolic costs in the intermediate stages of visual processing. These findings might be of particular importance when considering virtual training environments in which proper appraisal of depth cues is essential. These might include simulations of driving a car, which is a frequent setting for computerized cognitive training protocols, but may also apply to other scenarios that aim to simulate activities of daily living in ecologically valid environments and require patients to manipulate objects in peripersonal space.
Apart from its implications for cognitive rehabilitation, this study aimed to shed light on how presenting a virtual environment in VR impacts neuropsychological processing in general. Given the significantly lower metabolic costs when switching from observing to acting in the stereoscopically presented virtual environment, the results provide the first preliminary evidence that VR presentation might stimulate neurocircuitry responsible for the three-dimensional rendering of peripersonal space, which is essential for action preparation. Considering V3A's crucial position within the visuomotor system, further research could employ dynamic causal modeling to assess whether the current findings can be substantiated by differential patterns of functional connectivity in the dorso-dorsal and ventro-dorsal pathways as a function of monoscopic or stereoscopic presentation.

Fig. 1. Video of the attention task. Only during Action trials, a target is displayed underneath the camera frame.

Fig. 2. Schematic depiction of experimental timeline. Conditions were sequenced according to a pseudorandomized presentation order.

Table 1
Analysis of variance results indicate significant activation clusters for main effects and interaction effect (p < .001 FWE at T > 5.77). For purposes of improved cluster parcellation, T-threshold was increased to T > 8.90 in [A > O] contrast.
* Labeled according to Harvard-Oxford Cortical Structural Atlas. ** Labeled according to JHU White-Matter Tractography Atlas.