Perception and Action under Different Stimulus Presentations: A Review of Eye-Tracking Studies with an Extended View on Possibilities of Virtual Reality

Visual anticipation is essential for performance in sports. This review provides information on the differences between stimulus presentations and motor responses in eye-tracking studies and considers virtual reality (VR) as a new way to present stimuli. A systematic literature search of PubMed, ScienceDirect, IEEE Xplore, and SURF was conducted. The number of studies examining the influence of stimulus presentation (in situ, video) is limited but still sufficient to describe differences in gaze behavior. The seven reviewed studies indicate that stimulus presentations can cause differences in gaze behavior. Further research should focus on displaying game situations via VR. The advantages of a scientific approach using VR are experimental control and repeatability. In addition, game situations could be standardized and movement responses could be included in the analysis.


Introduction
Sporting expertise is expressed through exceptional motor skills and abilities and is reflected in the athletes' cognitive abilities [1] at lower levels (executive functions; e.g., [2]) and higher levels (attention [3], pattern recognition [4], and anticipation [5]). Anticipation and pattern recognition are essential parts of expert performance in many sports. The activities in these sports (e.g., soccer goalkeeping, volleyball) often have a strong spatial component. Athletes have to orient themselves in space and pick up the movement kinematics of opponents. The relevant factor in describing the requirements of the situation and the information processing of the athlete is gaze behavior. Eye-tracking technologies are often used in sports science studies examining gaze behavior. The method uses corneal reflections of a closely situated directed light source (often infrared) to track the pupil [6], aiming to infer fixations or other gaze parameters in sports. In addition to assessing the decision, eye-tracking data help analyze anticipation and decision-making processes [7].
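To make concrete how fixations might be inferred from raw gaze samples, the following minimal sketch applies a dispersion-threshold rule (often called I-DT). The choice of algorithm, the threshold values, and the function names are illustrative assumptions and do not describe the procedure of any study reviewed here.

```python
from typing import List, Tuple

# One gaze sample: (time in ms, horizontal position in deg, vertical position in deg).
Sample = Tuple[float, float, float]


def dispersion(window: List[Sample]) -> float:
    """Spread of a window of samples: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    samples: List[Sample],
    max_dispersion: float = 1.0,  # assumed threshold in degrees
    min_duration: float = 100.0,  # assumed minimum fixation duration in ms
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) pairs for each detected fixation."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the window until it spans at least min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough samples left for another fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the samples stay close together.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            fixations.append((samples[i][0], samples[j][0]))
            i = j + 1
        else:
            i += 1  # slide the window start forward by one sample
    return fixations
```

From the returned intervals, the number of fixations and the fixation durations follow directly. Commercial eye-tracking software typically uses its own event-detection settings, so values obtained this way are not necessarily comparable across devices.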
The studies that deal with athletes' perceptual-cognitive abilities in various sports (e.g., tennis) differ in how they present the visual stimuli or the sport-specific situation to be analyzed (e.g., tennis serve). The most common ways to present a stimulus to explore gaze behavior are the in situ perspective [8] and video presentation [9]. In early research, visual behavior was examined with the so-called occlusion technique [10]. In these studies, the opponent's movements (e.g., free kick) were presented in clips that occluded either the opponent's movements at various times (e.g., while hitting the ball) or various parts of the body or the ball [9]. More recent research paradigms have challenged this approach in order to study anticipation and information processing from an ecological perspective [11]. In his work on the methodological procedure in so-called representative study designs, Brunswik [...]. This review gives an overview of the differences between stimulus presentation types (in situ, video-based, and VR) when examining eye movements (via eye-tracking).
Studies published from January 1990 up to September 2020 were searched in several ways. First, we searched the PubMed, ScienceDirect (Web of Science), IEEE Xplore, and SURF (national database) databases using relevant keywords in titles or abstracts of English-language journals. The search was restricted to the period from 1990 onward because of the technical standards of eye-tracking equipment. The following keywords were used for the literature search: (a) for perception: perceptual expertise, visual search, gaze behavior, eye movements, eye-tracking, and (b) for stimulus presentation and test design: test design, in-situ, VR, video presentation, simulation, visual cues, task specificity, task constraint, response. Furthermore, we searched the websites of relevant venues such as ETRA and screened the references of the identified articles. Studies were included if they compared different stimulus presentations, if gaze behavior or the search strategy was recorded (except for the study implementing VR), and if the article was written in English.

Results
The search in the databases identified 1035 (ScienceDirect), 1220 (PubMed), 932 (IEEE Xplore), and 32 articles (SURF database). Screening the references of the identified articles revealed 27 other relevant articles. After the exclusion of duplicates, 1807 journal articles remained. After titles and abstracts were screened, 1776 non-relevant articles were excluded. Thirty-one (full-text) articles were assessed for eligibility. Twenty-three more were excluded because eye-tracking was not performed in these studies, because they dealt with training the visual search strategy, or because they were concerned with the accuracy of the eye-tracking measurement in VR. Another study was excluded because it re-analyzed the data of an already included study. A total of seven studies (in situ vs. video presentation: n = 4; perspectives in video presentations: n = 2; reality vs. VR: n = 1) were included (ScienceDirect: n = 2; PubMed/SURF database: n = 1; Citations: n = 5; see Figure 1). Eye-tracking technology is a frequently researched topic in sports science. The conducted studies cover a wide range of different subject areas (see [7]). However, a meaningful body of studies focusing specifically on the influence of athletes' perspectives (in situ vs. video presentation or in situ vs. aerial view) has been lacking so far.
Nevertheless, the research status allows for conclusions to be drawn about differences in gaze behavior. A reason for the weak state of research could be the complex execution of eye-tracking studies. Researchers have to create natural game situations or real opponents' actions for a study with an in situ design. To create a video presentation design, they have to record the videos of actions, which is laborious. Because of this complexity, VR technologies come into focus.
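The screening counts reported in this section can be tallied with a few lines of arithmetic. The sketch below uses only the numbers given in the text and reads 1776 as the number of records excluded at the title and abstract stage, which is an interpretation; the function and key names are hypothetical.

```python
def screening_flow() -> dict:
    """Tally the literature-screening counts reported in the Results section."""
    database_hits = {
        "ScienceDirect": 1035,
        "PubMed": 1220,
        "IEEE Xplore": 932,
        "SURF": 32,
    }
    reference_hits = 27             # found by screening reference lists
    after_duplicates = 1807         # records left after removing duplicates
    excluded_title_abstract = 1776  # assumed: excluded at title/abstract screening
    excluded_full_text = 23 + 1     # 23 ineligible studies plus one re-analysis

    identified = sum(database_hits.values()) + reference_hits
    full_text_assessed = after_duplicates - excluded_title_abstract
    return {
        "identified": identified,
        "duplicates_removed": identified - after_duplicates,
        "full_text_assessed": full_text_assessed,
        "included": full_text_assessed - excluded_full_text,
    }


if __name__ == "__main__":
    for stage, count in screening_flow().items():
        print(f"{stage}: {count}")
```

Under this reading, the tally ends at the seven included studies and the 31 full-text articles stated in the text.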

In Situ vs. Video Presentation
Following Abernethy et al. [24], Hernández et al. [25] postulated that laboratory settings might not accurately show experts' advantages due to "(a) removal of experience factor(s) are associated with actually performing the task in an ecologically valid setting, (b) the introduction of potential floor or ceiling effects in measurement variability, and (c) constraining the expert's typical responses to either using different information to create a response or preventing access to information normally available in the performance context". Understanding the processes that underpin a qualified decision requires experimental conditions that reproduce the real context and task specificity as closely as possible [26]. Researchers combine as many variables as possible from real situations under laboratory conditions to achieve the highest ecological validity [27]. The question in this context is to what extent eye movements and visual search strategies differ depending on the research design or the way the visual stimuli are presented. Hernández et al. [25] (see Table 1) examined the effects of perceptual information on tennis coaches' visual behavior and error detection. The study provides information on the first research question: To what extent does the stimulus presentation influence fixations, eye movements, and cognitive processes? The authors carried out three gaze behavior measures: a laboratory 2-D presentation (video-based), an in situ setting on the court, and another presentation in the laboratory (video-based). Ten male tennis coaches (five experienced; five novices) had to detect errors in second topspin services (10 times). The outcome measures were the verbalized error detection and gaze behavior parameters (number of fixations, fixation duration, and fixation locations). There were fewer fixations and a shorter fixation duration for the 3-D and the second 2-D presentations (p < 0.001). Experienced coaches showed fewer fixations than novices.
The results of this experiment support the hypothesis that gaze behavior differences are expected due to different stimulus presentations (Q1). Dicks et al. [14] analyzed the gaze behavior, responses (verbal, movement, and interceptive), and performance of eight experienced male goalkeepers. The stimulus that the goalkeepers had to react to was a penalty kick carried out by a trained penalty shooter. The mean fixation duration, the mean number of fixation locations, and the mean number of fixations were measured. The stimuli were presented in an in situ setting and in the laboratory (video-based). Significant differences between conditions were detected for fixation duration (F [4,23] = 3.117, p < 0.05) and for the number of fixation locations (F [4,23] = 4.218, p < 0.01). No significant difference between conditions was found for the number of fixations (F [4,23] = 2.404, p = 0.08). The study shows that gaze behavior differences can arise from different stimulus presentation types and the subsequent response (verbal, movement, or interception). The study indicates that separating vision for perception from vision for the control of action can be useful. Perception, measured by the test subjects' eye movements, seems to depend on the subsequent options for action (Q2).
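The three gaze parameters reported in these studies (number of fixations, mean fixation duration, and number of fixation locations) can be derived from a list of labeled fixations. The following sketch assumes that each fixation has already been assigned to an area of interest; the data layout and the function name are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One fixation: (start in ms, end in ms, label of the fixated location).
Fixation = Tuple[float, float, str]


def summarize_fixations(fixations: List[Fixation]) -> Dict[str, float]:
    """Compute the gaze parameters commonly compared between conditions."""
    if not fixations:
        return {"n_fixations": 0, "mean_duration_ms": 0.0, "n_locations": 0}
    durations = [end - start for start, end, _ in fixations]
    locations = {label for _, _, label in fixations}
    return {
        "n_fixations": len(fixations),
        "mean_duration_ms": mean(durations),
        "n_locations": len(locations),
    }
```

Computing such a summary per participant and condition (e.g., in situ vs. video) yields the values that enter the statistical comparison.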
Afonso et al. [28] examined the visual search behaviors and verbal reports during film-based and in situ representative tasks in volleyball players. Nine experienced female volleyball players participated in the experiment and acted as backcourt defenders in six scenarios. The number of fixations, fixation duration, and fixation locations were measured. The settings could be characterized as in situ and video-based with a verbal response. Participants had shorter fixation durations for the video-based stimulus presentation. The study results provide evidence for the hypothesis that eye movements depend on the type of stimulus presentation (Q1). However, this applies only to fixation duration and not to the number of fixations and fixation locations.
The last study in this field of research was conducted by Zweuts et al. [29]. Thirteen adults (eight females) with no sports context were examined. Participants had to ride a high-quality bicycle path and a low-quality bicycle path (±700 m) by bike. The number of fixations and saccades between Areas of Interest was measured in the in situ and video conditions. Overall, a significant Pearson correlation coefficient of r = 0.507 (p < 0.001) was found for dwell time (in %) between the laboratory and field conditions. A comparison of the two road conditions shows a significant Pearson correlation coefficient for the low-quality cycle path (r = 0.663; p < 0.001). However, this effect could not be confirmed for the high-quality cycle path (r = 0.030; p = 0.821). Under real conditions (field conditions), the participants showed an increased visual search along the sight's vertical line. The authors conclude that the cognitive and motor requirements when cycling were more demanding in the field condition than in the laboratory, where steering no longer played a role. The study again shows differences in gaze behavior between field and laboratory and a dependency on external task constraints (Q1 and Q2).
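The comparison between laboratory and field dwell times rests on the Pearson correlation coefficient. As a minimal sketch of how such a coefficient and its test statistic are obtained, the code below implements the standard formulas; the function names are hypothetical, and no data from the reviewed study are reproduced.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

Here `x` and `y` would hold the dwell-time percentages per area of interest in the laboratory and in the field. A coefficient near zero, as reported for the high-quality path, means that dwell times in one setting say little about dwell times in the other.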

Different Perspectives in Video Presentations
A study conducted by Petit and Ripoll [30] (see Table 2) investigated two video presentation perspectives of simulated game scenes to estimate the effect of different stimulus presentations on expert perception and decision-making (forced-choice decision to pass or not to pass in response to game scenarios). The game scenes were shown from the broadcast point of view and the players' perspective (experienced vs. inexperienced soccer players). The masked prime paradigm was used to examine which parts of the scenes play a decisive role in the perception process. The results show that skilled soccer players make faster decisions. Furthermore, the presentation from the players' perspective (internal presentation mode) led to quicker and more precise decisions. The presentation perspective thus had a fundamental impact on decision latency and visual behavior. Following Davids et al. [18], the task design, viewing perspective, and presentation mode should be oriented toward the organism's natural environment.
In the second reported study by Mann et al. [31], 19 skilled youth soccer players were examined, observing identical game situations from two different viewing perspectives. The first perspective was the player perspective showing the experience of a player in this situation. The second perspective was an aerial view of the game situation from an elevated point overlying the same location on the field. Soccer players had a more extensive search pattern in the aerial perspective with a higher rate of fixations per second (t [12] = 4.90, p < 0.001, d = 0.92). The search rate was shorter in the aerial view compared to the player perspective (t [12] = 4.90, p < 0.002, d = 0.75). There was also a significant effect of the viewing perspective on where they set their fixations (F [8,5] = 6.56, p = 0.027).
It can be summarized that the quality of eye-tracking studies is defined by the compromise, or the interplay, between ecological validity (field study) and internal validity (laboratory study). Sport-specific and task-specific perception often runs synchronously with the motor action [32]. Investigating visual perception and the associated motor reaction from the player's perspective (in situ), with pupil movements detected using eye-tracking glasses, can be regarded as the gold standard [13]. Presentation using video clips offers higher internal validity and has the advantage that significantly more variables can be controlled in the experiment. Video presentation is the only way to show every participant the same game situation (stimuli).

Reality vs. Virtual Reality
Virtual reality technology is gaining interest in many different areas such as rehabilitation, sport, education, or medicine [33]. Especially in a scientific context, VR offers new possibilities to examine and understand human perception and action.
The in situ presentation of stimuli is the most realistic type of simulation to examine gaze behavior. However, realistic game situations cannot be standardized because none of the players involved is able to show exactly the same movement a second time. To achieve this standardization, researchers have used video presentations to show game situations or sports actions as a stimulus. The disadvantage of video presentations is the non-realistic representation regarding the depth of vision and the restriction of the field of view. Visual VR via a head-mounted display (HMD) offers the opportunity to show such natural sports situations differently. The fast growth of VR simulations in academic research led to the validation of training applications for transferable skills in surgery or navigation [34]. Other research groups focused on the possibility of using VR technology to learn general physical skills [35].
First, however, the question should be clarified to what extent VR can support valid gaze behavior studies based on the study results identified in this narrative review. Ref. [36] experimented with gaze behavior and focused on the duel between a handball goalkeeper and a field player (thrower). The authors recorded the thrower's movements to create an avatar based on the movements (motion capturing). In this case, it was not the eye movements that were compared but the gestures used to ward off the ball. The authors examined to what extent the VR presentation produced the same movements for the same throw as the real situation in the field. The results show that the gestures did not differ between real and virtual environments (Q4). The study did not show a real comparison between reality and VR with an HMD presentation and excluded gaze behavior.
Nevertheless, the example shows a great advantage of VR for science: creating different scenarios with real-time interactions between the user and the VR, which cannot be created in reality [37]. In addition, the programmed procedures for an investigation ensure standardized and easy-to-control experimental conditions.
Visual input predominates in VR applications compared with acoustic and haptic information [38]. Integrating further measurement systems such as the eye tracker [39] or the EEG [40] offers new possibilities to get a more in-depth insight into visual information processing. For this purpose, mobile eye trackers were integrated into various existing VR devices such as HMDs or the Cave Automatic Virtual Environment (CAVE). However, this application requires a complex technical implementation. For this reason, some authors carried out studies to measure the accuracy and precision of the point of gaze recorded in VR compared with that recorded in the natural environment [39]. The results show no significant differences between the two forms of presentation, indicating that the VR setting can be used to investigate gaze behavior in sport.
Appl. Sci. 2021, 11, 5546
Furthermore, in perception and anticipation research, occlusion techniques showed that, through spatial depth information, athletes experienced a more realistic feeling in the virtual environment than in a 2-D video presentation. As a result, they were able to react better to opponents' attacks in martial arts [41]. In addition, the athletes' reactions and actions in the CAVE are more similar to those in reality than responses to stimuli shown on a 2-D screen [42].
Moreover, Ref. [43] showed that additional VR training leads to an improvement in reaction behavior. These studies indicate that the use of VR in combination with eye tracking can lead to new insights. Other gaze parameters such as saccades [44] and gaze fixations [45] can also be recorded in VR. Another great advantage of VR lies in the evaluation of the eye movement data, since the positions of the objects being looked at are known in VR.
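Because object positions are known in VR, a gaze sample can be assigned to an object automatically. The sketch below illustrates this with a simple ray test; it assumes that each object is approximated by a bounding sphere and that the viewer is outside all spheres. The function name and the example scene are hypothetical and do not come from the reviewed studies.

```python
import math

def first_object_hit(origin, direction, objects):
    """Return the name of the nearest sphere hit by the gaze ray, or None.

    objects maps a name to (center, radius); all coordinates share one
    frame. Spheres are an assumed simplification of the real geometry.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]  # unit gaze direction
    best_name, best_t = None, math.inf
    for name, (center, radius) in objects.items():
        oc = [c - o for c, o in zip(center, origin)]
        # Distance along the ray to the point closest to the sphere center.
        t_closest = sum(a * b for a, b in zip(oc, d))
        if t_closest < 0:
            continue  # object lies behind the viewer
        dist_sq = sum(a * a for a in oc) - t_closest ** 2
        if dist_sq > radius ** 2:
            continue  # ray passes outside the sphere
        t_hit = t_closest - math.sqrt(radius ** 2 - dist_sq)
        if t_hit < best_t:
            best_name, best_t = name, t_hit
    return best_name

# Hypothetical scene: a ball straight ahead, a thrower two meters to the side.
scene = {"ball": ((0, 0, 5), 0.5), "thrower": ((2, 0, 5), 0.5)}
looked_at = first_object_hit((0, 0, 0), (0, 0, 1), scene)
```

Applying such a test to every gaze sample yields a sequence of object labels from which fixation locations can be derived without manual frame-by-frame coding.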
Nevertheless, negative aspects of VR for the examination of gaze behavior can also be identified. Some studies have already investigated the so-called vergence-accommodation conflict [46]. This phenomenon arises because the eye lens focuses on the screen or projection surface (accommodation), whereas the eye muscles align the visual axes (fixation) with the 3-D image (vergence), which appears either in front of or behind the screen. The conflict between these two adaptations leads to fatigue in some persons using 3-D displays [47]. This potential fatigue has to be considered in scientific VR studies. Another factor that has to be considered when using VR for such experiments is the lack of haptic feedback and locomotion space. Haptic feedback (feeling a ball, touching the racket in tennis) is critical in sports settings because the missing information could lead to a change in motor response and behavior. The space for locomotion is relevant for sports actions that require the participant to move in the environment (e.g., moving to the ball for a tennis return).
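The size of the vergence-accommodation conflict described above can be estimated when planning a VR study. The following sketch assumes that the conflict is expressed as the difference, in diopters, between the reciprocal of the fixed focal distance of the display and the reciprocal of the simulated object distance. The default interpupillary distance of 0.063 m and the function names are illustrative assumptions, not values from [46,47].

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two visual axes when fixating at distance_m.

    ipd_m is an assumed interpupillary distance in meters.
    """
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

def va_conflict_diopters(focal_distance_m, object_distance_m):
    """Mismatch between accommodation and vergence demand in diopters.

    focal_distance_m: distance at which the display is optically focused.
    object_distance_m: simulated distance of the fixated virtual object.
    """
    return abs(1.0 / object_distance_m - 1.0 / focal_distance_m)

# Hypothetical example: display focused at 2 m, virtual ball shown at 0.5 m.
conflict = va_conflict_diopters(2.0, 0.5)
```

In this hypothetical example the mismatch is 1.5 diopters, whereas an object rendered at the focal distance produces none. Such an estimate could help decide how close stimuli should approach the participant.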

Discussion
Regarding the question of whether athletes' gaze behavior differs when stimuli are presented in situ or by video (Q1), the identified studies show a significant effect of this factor [8,14,25,29]. The fixation duration, fixation location, or number of fixations may differ between the two types of study design because of the modified depth perception in the video presentation [48]. In addition, the eye movements carried out by the test persons or athletes (concerning accommodation and vergence) differ between the in situ and the video presentation.
The possible actions or reactions in response to the sequence also significantly influenced the pupil movements and the participants' visual search strategy ([14]; Q2). Dicks et al. [49] discuss the nature of the relation between expertise and perception-action coupling using ideas from ecological psychology [11] and the framework of representative task design by Brunswik [12]. Dicks et al. [49] give an example of perception-action coupling: "When analyzing the pickup of information used in the return of serve in tennis, participants should be exposed to the in-situ actions of an opponent and ball-flight while being allowed to use the attended information for action". In the video simulation, perception is separated from the real action, and the research approaches usually allow only a rudimentary response to the shown stimuli (e.g., [14]). Because perception and action are separated, gaze behavior or visual information pickup differs from that in reality.
In comparing different perspectives on the same game situation (Q3), differences in gaze behavior and visual search strategy could be identified [31]. These differences appear to be determined by the available information. For example, the aerial view shows more details on the open space because of the depth of the perspective. This information leads to a different search strategy to find the best decisions for the given game situation.
The results of the reviewed studies reveal many reasons for using VR in gaze behavior studies: (a) There are significant differences in gaze behavior between 2-D or video screen projection and in situ representation [8,14,25,28,29] and also between the different perspectives of the presentation [30,31]; (b) In the current literature, no significant differences between the in situ representation and the representation of the scenes in VR or 3-D are reported ([36]; Q4); (c) On the VR display, scenes and stimuli can be shown that cannot be shown in reality or that cannot be shown in a standardized manner [37,50]; and (d) In addition to the VR presentation, supplementary examination methods that require a stationary or safe workplace can be used (eye-tracking: [39]; EEG: [40]).
In scientific studies, however, the fatigue caused by the (a) vergence-accommodation conflict should be considered in developing the research design and interpreting the results.
Another aspect that has received little consideration is the perception-action paradigm in the context of sports situations. Most sporting movements and the accompanying gaze movements require the athlete to move. For example, the athlete moves to take a better angle to the opponent or teammate. Immersive designs address this problem by using artificial locomotion (continuous forward movement) or teleporting ("jumping" from one location to another) [51]. However, using these techniques in the sports context is questionable because they increase the complexity of motor control: the player must learn to move with a controller (teleporting) instead of simply using their legs when moving more than one or two meters.
In addition to the small number of studies that could be included, the review has another limitation: the studies report findings from different sports (e.g., tennis, soccer). Gaze behavior could differ between these sports situations, which must be considered when interpreting the results. Therefore, the results cannot be generalized, but they provide an adequate overview because the gaze strategies of individual sports can be described as very similar [1].
Further research should also consider (b) the lack of haptic feedback and the requirement of space. Another question is whether (c) the established metrics for eye-tracking parameters and (d) the established methodology in eye-tracking can be transferred to a VR setting.
Research should focus on the possibilities of conducting eye-tracking studies in a VR setting and show the issues and benefits of this technology. Studies comparing gaze accuracy and precision in the real world and in virtual reality are a first step, but further work is needed.