Stopping by looking: A driver-pedestrian interaction study in a coupled simulator using head-mounted displays with eye-tracking
Applied Ergonomics

Automated vehicles (AVs) can perform low-level control tasks but are not always capable of proper decision-making. This paper presents a concept of eye-based maneuver control for AV-pedestrian interaction. Previously, it was unknown whether the AV should conduct a stopping maneuver when the driver looks at the pedestrian or when the driver looks away from the pedestrian. A two-agent experiment was conducted using two head-mounted displays with integrated eye-tracking. Seventeen pairs of participants (pedestrian and driver) each interacted in a road-crossing scenario. The pedestrians' task was to hold a button when they felt safe to cross the road, and the drivers' task was to direct their gaze according to instructions. Participants completed three 16-trial blocks: (1) Baseline, in which the AV was pre-programmed to yield or not yield, (2) Look to Yield (LTY), in which the AV yielded when the driver looked at the pedestrian, and (3) Look Away to Yield (LATY), in which the AV yielded when the driver did not look at the pedestrian. The driver's eye movements in the LTY and LATY conditions were visualized using a virtual light beam. Crossing performance was assessed based on whether the pedestrian held the button when the AV yielded and released the button when the AV did not yield. Furthermore, the pedestrians' and drivers' acceptance of the mappings was measured through a questionnaire. The results showed that the LTY and LATY mappings yielded better crossing performance than Baseline. Furthermore, the LTY condition was best accepted by drivers and pedestrians. Eye-tracking analyses indicated that the LTY and LATY mappings attracted the pedestrians' attention, while pedestrians still distributed their attention between the AV and a second vehicle approaching from the other direction. In conclusion, LTY control may be a promising means of AV control at intersections before full automation is technologically feasible.


Introduction
Various forms of vehicle automation, such as lane-keeping assistance systems, adaptive cruise control, and autonomous emergency braking, are available on the market today (Bengler et al., 2014). In addition, recently deployed vehicles include traffic light detection, allowing the car to stop in front of an intersection automatically (Nica, 2020; Tesla, 2021). These existing technologies ensure that the car follows a given target (e.g., lane center, headway).
Although current vehicle automation excels in bounded tasks such as those described above, automation technology still has difficulty performing tasks that depend on social context, such as predicting whether a pedestrian will cross the road (Rudenko et al., 2020; Vinkhuyzen and Cefkin, 2016). For this reason, in current AVs, the human driver still supervises the automation, ready to override the system when needed (Lu et al., 2019; Noy et al., 2018). In addition, some vehicles feature a monitoring system that tracks the driver's eye or head movements to verify whether the driver is sufficiently involved (e.g., Cadillac, 2021; Ford, 2021).
One relatively novel way to keep the human involved and allocate responsibility between human and machine is maneuver-based control (Detjen et al., 2021; Hakuli et al., 2011; Manawadu et al., 2018). In maneuver-based control, the driver commands on the maneuver level (e.g., stop the vehicle), while the machine executes tasks on the control level (e.g., braking, turning the steering wheel). Thus, with maneuver-based control, the human is responsible for vital decision-making tasks, whereas low-level control is delegated to automation.
The present study proposes touchless maneuver-based control using the driver's eye movements. This type of control may be efficient because eye movements coincide with attention (Just and Carpenter, 1980) and because body movement (other than turning the head or eyes) is not required. Outside of the domain of driving, studies have shown that gaze-contingent interfaces can have powerful uses. Examples are eye-typing (Majaranta and Räihä, 2002), object positioning (Liu et al., 2020) and navigation (Stellmach and Dachselt, 2012) in virtual worlds, training of perceptual skills (Ryu et al., 2016), surveillance displays (Smith et al., 2015), and teleoperation (drone control: Hansen et al., 2014; surgical robot control: Noonan et al., 2008) (see Reingold et al., 2003, and Duchowski, 2018, for reviews of applications of gaze-contingent control and other forms of gaze-based interaction). Others have used eye-tracking not for gaze-contingent applications but for gaze-ray visualization, in tasks such as joint visual search and solving puzzles (see Erickson et al., 2020; Jing et al., 2021, for collaborative gaze concepts).
So far, only a limited amount of research is available on eye-based control in automated and assisted driving. Ataya et al. (2020) used heads-up gaze-head input, in which drivers first gazed at an icon of interest and then nodded to confirm their selection, and compared this concept with touch-input and voice interaction. They used these input modalities in a driving simulator to obtain drivers' satisfaction ratings of the decision-making of an AV. Jiang et al. (2018) demonstrated a parking assistance system in which drivers indicated the intended direction of their vehicle using eye movements. Wu et al. (2019) developed a system that used the driver's eye gaze to predict the driver's upcoming maneuver; they argued that this concept could be useful in assistance systems that aim to prevent dangerous situations. In Wang et al. (2020), drivers requested an assistance system to take action (e.g., change lanes) by looking at an object and uttering "Watch out". Despite these demonstrations of gaze-contingency in driving, there is still little knowledge about how a maneuver-based controlled AV should be controlled by means of eye movements.
When considering a pedestrian-crossing situation, a gaze-controlled AV could adopt two possible mapping strategies. In the first strategy, Look to Yield (LTY), the AV stops when the driver looks at the pedestrian and continues driving when the driver does not look at the pedestrian. This mapping strategy is consistent with research showing that pedestrians are more willing to cross the road when the driver seeks eye contact than when the driver does not (Faas et al., 2021; Malmsten Lundgren et al., 2017; Onkhar et al., 2022; Yang, 2017).
In the opposite mapping, Look Away to Yield (LATY), the AV stops when the driver is not looking at the pedestrian; conversely, if the driver is paying attention to the pedestrian, the AV continues to drive. This mapping is loosely consistent with the 'minimal risk maneuver' ('fallback maneuver') described in automated driving standards (International Organization for Standardization, 2020; Karakaya and Bengler, 2021). More specifically, if the driver is paying attention to potential hazards (pedestrians), the AV can safely continue driving, knowing that the driver is situationally aware. Conversely, if the driver is not paying attention to potential hazards, the AV automatically stops out of precaution.

Study aims and approach
The study aimed to determine which of the two mappings (Look to Yield vs. Look Away to Yield) for controlling an AV by means of eye movements is best accepted by pedestrians and drivers. For this purpose, a human-subject study was conducted in which the driver and the pedestrian interacted in the same virtual environment. Driver and pedestrian were both immersed using a virtual-reality headset with integrated eye-tracking, and the driver's eye movements were visualized using a colored light beam. Recent research proposes external human-machine interfaces (eHMIs) to let the AV communicate its intentions to pedestrians (Dey et al., 2020). The light beam is a special type of eHMI: it allowed the driver to confirm that his/her eyes were tracked correctly, and it was expected to help the pedestrian predict whether the AV would stop or not.
In real traffic, pedestrians will have to consider multiple road users simultaneously, something that research on eHMIs has so far often overlooked (Tabone et al., 2021a; for exceptions, see Joisten et al., 2021; Mahadevan et al., 2019; Wilbrink et al., 2021). A risk is that pedestrians will over-rely on the attention-grabbing AV with its light-beam display and ignore surrounding road users. Therefore, both mapping concepts were evaluated with and without a second vehicle coming from the other direction.

Participants
Thirty-four participants (17 in the role of driver; 17 in the role of pedestrian) took part in the experiment. The drivers were 13 males and 4 females and had a mean age of 23.6 years (SD = 2.0). The pedestrians were 10 males and 7 females and had a mean age of 23.4 years (SD = 1.8). The nationalities of the participants were Dutch (29), Chinese (3), and Indian (2). Note that India has left-hand traffic, but the Indian participants had resided in the Netherlands for more than a year.
Five participants (1 driver, 4 pedestrians) had previously participated in an experiment on crossing behavior. During the experiment, 9 of the drivers wore corrective eyewear (7 glasses and 2 contact lenses), as did 10 of the pedestrians (5 glasses and 5 contact lenses). Moreover, 12 of the drivers and 12 of the pedestrians had a driver's license, and 12 of the drivers and 11 of the pedestrians had previous experience with virtual reality headsets. All participants provided written informed consent. The research was approved by the Human Research Ethics Committee of the Delft University of Technology.

Hardware
Two Varjo VR-2 Pro head-mounted displays (HMDs) and two Alienware PCs with identical specifications were used for the experiment. The HMDs were equipped with an industrial-grade eye tracker with 0.2° precision and provided a horizontal and vertical field of view of 87°. The view was provided by two low-persistence micro-OLEDs with a display resolution of 1920 × 1080 pixels and two low-persistence AMOLEDs with a display resolution of 1440 × 1600 pixels. Each PC used an Intel Core i9-9900K 3.60 GHz CPU, had 64 GB RAM, and had two GPUs: Intel UHD Graphics 630 and NVIDIA GeForce RTX 2080 Ti. The pedestrians used an HTC Vive Controller 2.0 to indicate whether they felt safe to cross the road. Two SteamVR base stations were used to track the HMDs and the controller. Finally, the pedestrian wore Beyerdynamic DT 770 PRO headphones. Data were recorded at a frequency of 60 Hz.

Virtual reality environment
The virtual environment was based on the open-source coupled-simulator project of Bazilinskyy et al. (2020). Fig. 1 shows the top view of the zebra crossing, including the pedestrian, the AV, and the second vehicle.
The AV approached the zebra crossing from the pedestrian's left side, and the second vehicle approached from the pedestrian's right side. The pedestrian's camera was placed 1.7 m (global average height for men; Roser et al., 2013) above the 0.22-m-high curb and at a perpendicular distance of 1.0 m from the edge of the road. The width of the road was 10 m. The pedestrian and driver could rotate their heads in all directions but could not move their bodies in the virtual world. The driver sat in the passenger seat rather than the driver's seat to enhance the impression of being in an AV.

Eye-gaze visualization
The driver's visual attention was rendered as a colored light beam. A 7-cm-wide semi-transparent (α = 0.2) beam was drawn from the midpoint between the driver's eyes to a point 100 m away in the driver's gaze direction (see Fig. 2). The eye-gaze visualization was realized using the head- and eye-tracking capabilities of the Varjo VR-2 Pro HMD.
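The beam geometry described above can be sketched as follows. This is a minimal illustration, not the simulator's actual code: the coordinate convention (x right, y up, z forward), the yaw/pitch parameterization, and the function names `gaze_direction` and `beam_endpoints` are assumptions; only the eye midpoint as origin and the 100 m beam length come from the text.

```python
import math

BEAM_LENGTH_M = 100.0  # beam ends 100 m along the gaze direction (from the text)


def gaze_direction(yaw_deg, pitch_deg):
    """Unit gaze vector from yaw and pitch angles in degrees.

    Hypothetical convention: x = right, y = up, z = forward;
    yaw > 0 turns the gaze to the right, pitch > 0 raises it.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def beam_endpoints(left_eye, right_eye, yaw_deg, pitch_deg, length_m=BEAM_LENGTH_M):
    """Return (start, end) of the gaze beam.

    The beam starts at the midpoint between the two eye positions and ends
    length_m metres away along the unit gaze direction.
    """
    start = tuple((l + r) / 2.0 for l, r in zip(left_eye, right_eye))
    direction = gaze_direction(yaw_deg, pitch_deg)
    end = tuple(s + length_m * d for s, d in zip(start, direction))
    return start, end


# Example: eyes 64 mm apart at 1.2 m height, looking straight ahead.
start, end = beam_endpoints((-0.032, 1.2, 0.0), (0.032, 1.2, 0.0), 0.0, 0.0)
print(start, end)
```

In a game engine, the same start and end points would typically be handed to a line or cylinder renderer with the stated 7 cm width and α = 0.2.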

Experimental design
Before the session began, participants were assigned to the role of driver or pedestrian. The experiment used a within-subject design with three independent variables: the mapping (Baseline, LTY, LATY), the yielding behavior of the AV (yielding, non-yielding), and the presence of a second vehicle (without, with).
The sessions were divided into three blocks, each containing one mapping. In the LTY and LATY conditions, the driver's eye gaze was visualized, whereas in the Baseline condition, the driver's eye gaze was not visualized.

Fig. 1. Top view of the virtual environment. In all trials, the AV approached from the left, and the second vehicle approached from the right. The AV-pedestrian distance is defined as the distance along the road measured from the pedestrian to the windshield of the AV.
Furthermore, each block contained the same four scenarios (AV yielding or non-yielding, each with or without the second vehicle), each repeated four times. The order of trials within the block was randomized for each participant, so participants could not anticipate whether the AV would yield or whether the second vehicle would be present. All participants began with the Baseline block. This was followed by either the LTY block or the LATY block, depending on the participant number (odd: LTY; even: LATY), with the remaining mapping in the third block. In total, each pair of participants performed 48 trials (3 mappings × 2 yielding conditions × 2 second-vehicle conditions × 4 repetitions).
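A possible way to generate the 48-trial schedule described above is sketched below. It assumes that the third block used whichever eye-gaze mapping was not used in the second block and that trial order was shuffled independently within each block; the function and variable names are hypothetical, and Python's `random` module merely stands in for whatever randomization the experiment software used.

```python
import itertools
import random
from collections import Counter

YIELDING = ("yielding", "non-yielding")
SECOND_VEHICLE = ("without", "with")
REPETITIONS = 4  # each of the four scenarios is repeated four times per block


def block_order(participant_number):
    """Baseline first; odd numbers continue with LTY, even numbers with LATY."""
    if participant_number % 2 == 1:
        return ["Baseline", "LTY", "LATY"]
    return ["Baseline", "LATY", "LTY"]


def trial_schedule(participant_number, seed=None):
    """Return a list of (mapping, yielding, second_vehicle) tuples, 48 in total."""
    rng = random.Random(seed)
    scenarios = list(itertools.product(YIELDING, SECOND_VEHICLE))  # 4 scenarios
    schedule = []
    for mapping in block_order(participant_number):
        block = [(mapping, y, sv) for (y, sv) in scenarios for _ in range(REPETITIONS)]
        rng.shuffle(block)  # randomize the trial order within the block only
        schedule.extend(block)
    return schedule


schedule = trial_schedule(participant_number=3, seed=42)
print(len(schedule))  # 48
print(Counter(mapping for mapping, _, _ in schedule))
```

Shuffling within rather than across blocks keeps each mapping in a contiguous block, as the design requires, while still hiding the yielding and second-vehicle conditions from the participants.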

Vehicle behavior
The AV's yielding behavior in the Baseline and LATY conditions was not interactive. In the LATY condition, the driver was expected not to look at the pedestrian when the AV was to yield, so there was no gaze event that could serve as a trigger for yielding. In real traffic, drivers tend to make their yielding decision less than 30 m before the crosswalk at a vehicle speed of 30 km/h (Schneemann and Gohl, 2016). To mimic this human deceleration behavior, deceleration was started at an AV-pedestrian distance shorter than 30 m. More specifically, the pre-programmed braking was initiated at an AV-pedestrian distance of 22.5 m, resulting in the AV coming to a standstill at a distance of 6.2 m from the pedestrian.
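Assuming constant deceleration from the braking onset, the pre-programmed stop follows from the kinematic relation v² = 2as, with s = 22.5 m − 6.2 m = 16.3 m. The AV's approach speed is not stated in this section, so the sketch below takes it as an input; the function names are hypothetical, and the 25 km/h used in the example is an illustrative assumption.

```python
BRAKE_ONSET_M = 22.5  # AV-pedestrian distance at which pre-programmed braking starts
STOP_AT_M = 6.2       # AV-pedestrian distance at standstill


def required_deceleration(speed_mps, brake_onset_m=BRAKE_ONSET_M, stop_at_m=STOP_AT_M):
    """Constant deceleration (m/s^2) that brings the AV from speed_mps to rest
    over the distance between brake_onset_m and stop_at_m (v^2 = 2 a s)."""
    braking_distance = brake_onset_m - stop_at_m
    if braking_distance <= 0:
        raise ValueError("braking must start farther away than the stopping point")
    return speed_mps ** 2 / (2.0 * braking_distance)


def braking_time(speed_mps, brake_onset_m=BRAKE_ONSET_M, stop_at_m=STOP_AT_M):
    """Duration of the stop: under constant deceleration the mean speed is v/2."""
    return 2.0 * (brake_onset_m - stop_at_m) / speed_mps


# Illustration with an assumed (not reported here) approach speed of 25 km/h.
v = 25 / 3.6
print(round(required_deceleration(v), 2), "m/s^2 over", round(braking_time(v), 2), "s")
```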
For the LTY mapping, the yielding behavior of the AV was interactive. The AV braked when the driver looked at the pedestrian while the AV-pedestrian distance was between 14.4 and 25 m. An adaptive deceleration was used so that the AV always stopped at a distance of 6.2 m from the pedestrian. The yielding trigger had to be activated before the AV came within 14.4 m of the pedestrian to ensure that the deceleration did not exceed an assumed comfortable deceleration of 3 m/s² (Schroeder, 2008). A single glance of the driver within this distance interval was sufficient to initiate the deceleration of the AV. 'Seeing the pedestrian' was defined as directing the eye gaze at the pedestrian avatar's hitbox (an invisible shape used for real-time collision detection). The hitbox was a rectangle as tall as the avatar and 1.9 m wide, equal to the avatar's arm span with the arms stretched horizontally. At a distance of 25 m, the 1.9 m width subtends about 4° of the visual field.
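The LTY trigger logic and the two derived quantities in this paragraph can be sketched as follows, under stated assumptions: constant deceleration from the trigger point, a latched trigger that fires on the first gaze hit inside the 14.4 to 25 m window, and hypothetical names throughout. `latest_trigger_distance` shows where a bound such as 14.4 m can come from: it is the closest distance at which braking can start without exceeding 3 m/s². If the bound was derived this way, it implies an approach speed of about 7 m/s (roughly 25 km/h); that figure is our inference, not a reported value. `visual_angle_deg` reproduces the roughly 4° subtended by the 1.9 m hitbox at 25 m.

```python
import math

TRIGGER_FAR_M = 25.0   # gaze trigger window opens (AV-pedestrian distance)
TRIGGER_NEAR_M = 14.4  # window closes
STOP_AT_M = 6.2        # the AV always stops 6.2 m from the pedestrian
A_COMFORT = 3.0        # assumed comfortable deceleration limit (m/s^2)


def adaptive_deceleration(speed_mps, trigger_distance_m, stop_at_m=STOP_AT_M):
    """Constant deceleration that stops the AV exactly at stop_at_m when braking
    starts at trigger_distance_m (v^2 = 2 a s)."""
    return speed_mps ** 2 / (2.0 * (trigger_distance_m - stop_at_m))


def latest_trigger_distance(speed_mps, a_max=A_COMFORT, stop_at_m=STOP_AT_M):
    """Closest trigger distance at which the required deceleration is still <= a_max."""
    return stop_at_m + speed_mps ** 2 / (2.0 * a_max)


def visual_angle_deg(width_m, distance_m):
    """Full visual angle (degrees) subtended by an object of width_m at distance_m."""
    return math.degrees(2.0 * math.atan(width_m / (2.0 * distance_m)))


class GlanceTrigger:
    """Single-glance rule: latch the AV-pedestrian distance of the first frame in
    which the gaze hits the hitbox while the AV is inside the trigger window."""

    def __init__(self):
        self.trigger_distance_m = None

    def update(self, av_distance_m, gaze_on_hitbox):
        in_window = TRIGGER_NEAR_M <= av_distance_m <= TRIGGER_FAR_M
        if self.trigger_distance_m is None and gaze_on_hitbox and in_window:
            self.trigger_distance_m = av_distance_m
        return self.trigger_distance_m


# Speed at which a 3 m/s^2 limit puts the latest trigger point at 14.4 m.
v = math.sqrt(2.0 * A_COMFORT * (TRIGGER_NEAR_M - STOP_AT_M))
print(round(v, 2), "m/s")                     # about 7 m/s
print(round(visual_angle_deg(1.9, 25.0), 1))  # about 4 degrees
```

Once `GlanceTrigger.update` returns a distance, `adaptive_deceleration(v, distance)` gives the braking level for that trial; earlier triggers produce gentler braking, which is why the LTY stopping times in Phase 2 vary.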
The second vehicle came from the right and maintained a constant speed of 30 km/h. In the non-yielding scenario, the second vehicle passed the pedestrian approximately 0.9 s later than the AV, and in the yielding scenario, the second vehicle passed the pedestrian at the same time as the AV came to a stop. Fig. 3 shows the AV trajectory in the trials without the second vehicle; the trials were divided into three phases:
• Phase 1 is the approaching phase, defined as the period from the start signal until the start of the trigger distance for yielding, i.e., an AV-pedestrian distance of 25 m (time between 0 and 4.3 s).
• Phase 2 is the deceleration phase of the yielding AVs, defined as the period between an AV-pedestrian distance of 25 m and the vehicle coming to a stop (time between 4.3 and 8.2 s). Note that in LTY, the end of Phase 2 varied across yielding trials (end time between 7.5 and 8.5 s) because this condition used adaptive deceleration, as explained above. For non-yielding trials, Phase 2 ended when the AV passed the zebra crossing (time between 4.3 and 8.0 s).
• Phase 3 is the period during which the AV stood still, lasting 2.6 s.
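The phase definitions above can be applied to logged samples with a small labeling function like the one below. It is a sketch under assumptions: each sample carries the AV-pedestrian distance and the AV speed, 'stopped' means a speed below a small threshold, and passing the zebra crossing is approximated by the distance dropping to `pass_m` (0 m by default). The names and thresholds are hypothetical.

```python
def label_phases(distances_m, speeds_mps, trigger_m=25.0, pass_m=0.0, stop_eps=0.01):
    """Label each sample 1, 2 or 3 following the phase definitions, or None after
    a non-yielding AV has passed the crossing."""
    labels = []
    stopped = False
    for distance, speed in zip(distances_m, speeds_mps):
        if not stopped and distance <= trigger_m and speed <= stop_eps:
            stopped = True  # a yielding AV has come to rest: Phase 3 from here on
        if stopped:
            labels.append(3)
        elif distance > trigger_m:
            labels.append(1)  # approaching, before the 25 m trigger range
        elif distance > pass_m:
            labels.append(2)  # deceleration (yielding) or final approach (non-yielding)
        else:
            labels.append(None)  # non-yielding AV has passed: outside Phases 1-3
    return labels


yielding = label_phases([40.0, 30.0, 25.0, 20.0, 10.0, 6.2, 6.2],
                        [7.0, 7.0, 7.0, 5.0, 2.0, 0.0, 0.0])
print(yielding)  # [1, 1, 2, 2, 2, 3, 3]
```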

Procedure and instructions
Because of COVID-19, participants were requested to disinfect their hands and wear facemasks before entering the lab. Participants were first informed about the purpose of the study and were given an informed consent form to read and sign. Next, the participants read the instructions and completed a pre-experiment questionnaire asking about demographics and prior experience in gaming, VR, driving, and road crossing.
The mapping details (Baseline, LTY, LATY) and participants' tasks were provided in written form. The driver's task was to follow the instruction displayed at the beginning of each trial, either "Stop the AV" or "Do not stop the AV". The pedestrian's task was to press and hold a button when they felt safe to cross the road and to release the button when they did not feel safe to cross. At the start of each trial, pedestrians heard "Press now" via the headphones to ensure that they had the button pressed at the beginning of the trial. No other sounds were simulated.
Before the start of each block, the experimenter reminded the participants which mapping would be used and what was expected from them. The experimenter also told the pedestrian that the AV carrying the driver would come from the pedestrian's left side and that the second vehicle would come from the right side.

Fig. 3. AV-pedestrian distance as a function of time for the Baseline, LTY, and LATY conditions. Phase (1) represents the period from the start signal (t = 0 s) until the start of the yielding trigger detection range (25 m). Phase (2) runs from the start of the yielding trigger detection range (25 m) until the standstill of the AV, and Phase (3) is the period when the AV was at a standstill for 2.6 s. Note that the figure was created by plotting all trials without a second vehicle (between 64 and 68 trials per condition).
First, the participants performed a practice run with one trial per mapping. Then, the three blocks of trials were presented. Between blocks, there was a break in which participants completed a questionnaire on their cybersickness state using the misery scale (MISC; Bos et al., 2005) and on their acceptance of the system (Van der Laan et al., 1997). In the acceptance questionnaire, participants responded to the statement "Your judgments of the eye-gaze visualization system are …" on nine five-point items. At the end of the experiment, participants completed a questionnaire about their mapping preference and level of presence (Witmer and Singer, 1998; presence data were not used for this study).

Data analysis
First, trials in which the driver did not act in accordance with the instructions were excluded. Next, the button-press percentage was computed for Phase 2 of each trial (for similar approaches, see De Clercq et al., 2019; Oudshoorn et al., 2021; Sripada et al., 2021). Phase 2 was chosen because differences between mappings were expected to show during this phase. In comparison, in Phase 1, the car is still far away, and in Phase 3, the car is standing still, so no difference between mappings is expected.
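The Phase 2 button-press percentage can be computed per trial as the share of Phase 2 samples during which the button was held; because data were recorded at a fixed 60 Hz, the share of samples equals the share of time. The sketch below assumes per-sample phase labels and a per-trial compliance flag; the data layout and names are hypothetical.

```python
from statistics import mean


def button_press_percentage(pressed, phases, phase=2):
    """Percentage (0-100) of samples in `phase` during which the button was held."""
    in_phase = [held for held, ph in zip(pressed, phases) if ph == phase]
    if not in_phase:
        return float("nan")  # no samples in this phase
    return 100.0 * sum(in_phase) / len(in_phase)


def condition_mean(trials):
    """Mean Phase 2 percentage over the trials in which the driver complied."""
    scores = [button_press_percentage(t["pressed"], t["phases"])
              for t in trials if t["compliant"]]
    return mean(scores)


trials = [
    {"compliant": True, "pressed": [True, True, False, False], "phases": [1, 2, 2, 3]},
    {"compliant": True, "pressed": [True, True, True, True], "phases": [1, 2, 2, 3]},
    {"compliant": False, "pressed": [False] * 4, "phases": [1, 2, 2, 3]},  # excluded
]
print(condition_mean(trials))  # 75.0
```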
Acceptance was computed for two subscales: usefulness (5 items) and satisfaction (4 items). The minimum and maximum possible scores were −2 and +2, respectively. Furthermore, a distribution of the driver's eye-gaze yaw angle over Phase 2 of the trial was created to illustrate the driver's gaze behavior. Finally, to illustrate the gaze behavior of the pedestrian during Phase 2 of the trial, a distribution was created of the difference between the pedestrian's eye-gaze yaw angle and the yaw angle to the AV. A yaw difference of 0° means that the pedestrian looked at the AV, a positive yaw difference means that the pedestrian looked to the left of the AV, and a negative yaw difference means that the pedestrian looked to the right of the AV.
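The two acceptance subscales can be scored as below. This sketch assumes the item order and reverse-worded items of the Van der Laan et al. (1997) scale as commonly described (usefulness: items 1, 3, 5, 7, 9; satisfaction: items 2, 4, 6, 8; items 3, 6, and 8 mirrored), with raw responses coded from +2 at the left-hand adjective to −2 at the right-hand adjective. These details should be verified against the original scale before reuse; the function name is hypothetical.

```python
USEFULNESS_ITEMS = (1, 3, 5, 7, 9)
SATISFACTION_ITEMS = (2, 4, 6, 8)
MIRRORED_ITEMS = (3, 6, 8)  # assumed reverse-worded items (positive adjective on the right)


def acceptance_scores(responses):
    """Return (usefulness, satisfaction), each in [-2, +2].

    `responses` maps item number (1-9) to the raw response, coded +2 at the
    left-hand adjective and -2 at the right-hand adjective.
    """
    if set(responses) != set(range(1, 10)):
        raise ValueError("expected responses for items 1-9")
    coded = {item: -raw if item in MIRRORED_ITEMS else raw
             for item, raw in responses.items()}
    usefulness = sum(coded[i] for i in USEFULNESS_ITEMS) / len(USEFULNESS_ITEMS)
    satisfaction = sum(coded[i] for i in SATISFACTION_ITEMS) / len(SATISFACTION_ITEMS)
    return usefulness, satisfaction


# A respondent choosing the positive pole on every item scores +2 on both subscales.
best = {i: (-2 if i in MIRRORED_ITEMS else 2) for i in range(1, 10)}
print(acceptance_scores(best))  # (2.0, 2.0)
```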

Results
In total, 816 trials were performed (17 pairs of participants × 48 trials per pair), of which 544 (17 × 32) were in the LTY and LATY conditions. Of these 544 trials, 17 were removed because the driver did not comply with the task instructions:
• In the LTY mapping, the instruction "Stop the AV" was given (yielding scenario), but the driver did not look at the pedestrian: 9 of 136 trials.
• In the LTY mapping, the instruction "Do not stop the AV" was given (non-yielding scenario), but the driver looked at the pedestrian: 2 of 136 trials.
• In the LATY mapping, the instruction "Stop the AV" was given (yielding scenario), but the driver looked at the pedestrian: 1 of 136 trials.
• In the LATY mapping, the instruction "Do not stop the AV" was given (non-yielding scenario), but the driver did not look at the pedestrian: 5 of 136 trials.
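The four exclusion rules above reduce to one check: under LTY, the driver should look at the pedestrian exactly when instructed to stop, and under LATY, exactly when instructed not to stop. A minimal sketch with hypothetical names; Baseline trials are treated as always compliant because only LTY and LATY trials were screened.

```python
def driver_complied(mapping, instructed_to_stop, looked_at_pedestrian):
    """True if the driver's gaze matched the instruction for the given mapping."""
    if mapping == "LTY":
        return looked_at_pedestrian == instructed_to_stop
    if mapping == "LATY":
        return looked_at_pedestrian != instructed_to_stop
    return True  # Baseline: AV pre-programmed, not screened for gaze compliance


# The four non-compliant cases listed above.
violations = [
    ("LTY", True, False),    # told to stop, did not look
    ("LTY", False, True),    # told not to stop, looked
    ("LATY", True, True),    # told to stop, looked
    ("LATY", False, False),  # told not to stop, did not look
]
print([driver_complied(*case) for case in violations])  # [False, False, False, False]
```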
In addition, one trial of the Baseline condition (with second vehicle, yielding) was missing because of a data-logging error. Fig. 4 shows the driver's gaze yaw distribution during the experiment and confirms that the mappings and instructions functioned properly. In the yielding scenarios (left two subfigures), the driver's gaze in the LTY mapping was directed to the right (toward the pedestrian), whereas in the LATY mapping it was directed to the left and middle, that is, away from the pedestrian. Conversely, in the non-yielding scenarios (right two subfigures), the driver's gaze in the LATY mapping was directed to the right, and the gaze in the LTY mapping was directed to the left and middle.

Fig. 4. Distribution of the driver's gaze yaw angle in Phase 2, at a 1-deg resolution. A gaze yaw angle of 90° represents the driver looking straight ahead. A gaze yaw angle smaller than 90° represents the right side, and larger than 90° represents the left side of the driver. For creating this figure, missing data (e.g., due to blinks or loss of gaze tracking) were linearly interpolated. The sum of frequencies per condition equals 100.
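The figure captions state that missing gaze samples were linearly interpolated and that yaw distributions were binned at 1° with frequencies summing to 100 per condition. A sketch of both steps follows. How leading or trailing gaps were handled is not stated, so filling them with the nearest valid sample is an assumption, as are the function names; NaN marks a missing sample.

```python
import bisect
import math
from collections import Counter


def interpolate_missing(values):
    """Linearly interpolate NaN samples between valid neighbours; gaps at the
    start or end are filled with the nearest valid sample (assumption)."""
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        return list(values)
    out = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        k = bisect.bisect_left(valid, i)  # index of the first valid sample after i
        prev = valid[k - 1] if k > 0 else None
        nxt = valid[k] if k < len(valid) else None
        if prev is None:
            out[i] = values[nxt]
        elif nxt is None:
            out[i] = values[prev]
        else:
            w = (i - prev) / (nxt - prev)
            out[i] = values[prev] + w * (values[nxt] - values[prev])
    return out


def percentage_histogram(angles_deg, bin_width_deg=1.0):
    """Share of samples (in %) per bin; bins are labelled by their lower edge."""
    counts = Counter(math.floor(a / bin_width_deg) * bin_width_deg for a in angles_deg)
    n = len(angles_deg)
    return {edge: 100.0 * c / n for edge, c in sorted(counts.items())}


yaw = interpolate_missing([89.2, float("nan"), 90.2, 91.5])
print(percentage_histogram(yaw))
```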
Fig. 5 shows the button-press data for the scenarios without the second vehicle. A large decrease in button presses was observed in Phase 2 for the yielding scenario of the Baseline condition (solid blue line). This decrease was considerably smaller for the mappings with eye-gaze visualization (red and yellow solid lines). In other words, the LTY and LATY mappings and the corresponding eye-gaze visualization made pedestrians believe they could cross when it was safe to cross.
For the non-yielding scenarios, an earlier drop in button presses was found for the LTY and LATY conditions (red and yellow dashed lines) compared to the Baseline condition (blue dashed line). In other words, the LTY and LATY conditions discouraged pedestrians from crossing when the AV maintained speed. Fig. 6 shows the button-press data for the scenarios with the second vehicle. In the yielding scenarios, where good performance is characterized by releasing the button in Phase 2 (when the second vehicle is arriving), no clear difference between the three mappings can be seen. For the non-yielding scenarios, an earlier drop in button presses is observed for the LTY and LATY mappings (red and yellow dashed lines) than for Baseline.

Fig. 5. Button-press data for the three mappings for scenarios with yielding and non-yielding AVs, without the presence of the second vehicle.

Table 1 shows the button-press percentages for the three mappings. The LTY and LATY mappings yielded a higher percentage than Baseline when the AV was yielding and a lower percentage when the AV was not yielding (five of six effects were statistically significant; Conditions 1, 2, and 4 in Table 1). In other words, the LTY and LATY mappings improved performance compared to Baseline. When the second vehicle was present and the AV was yielding (Condition 3 in Table 1), however, there were no significant differences between LTY/LATY and Baseline. In other words, the LTY and LATY mappings did not cause participants to cross when they should not. Finally, no significant difference in button-press percentages was observed between the LTY and LATY mappings.

Fig. 7 shows the yaw angle difference between the pedestrian's gaze and the AV in Phase 2. The peak at 0° represents the pedestrian looking at the AV. For the scenarios without the second vehicle, LTY (red) has the highest peak when the AV yielded (top-left subfigure), and LATY has the highest peak when the AV did not yield (bottom-left subfigure). These findings suggest that the gaze visualizations attracted the pedestrian's attention and, especially when the pedestrian was being looked at (yielding & LTY, non-yielding & LATY), encouraged the pedestrian to look back. Fig. 7 further shows that pedestrians distributed their attention when the second vehicle was present (right two subfigures), with a peak about half as high as without the second vehicle (left two subfigures).

Table 2 shows the mean acceptance ratings for the LTY and LATY mappings. The pedestrians rated the LTY mapping as significantly more satisfactory than the LATY mapping.
Furthermore, drivers experienced the LTY condition as more useful than the LATY condition, but satisfaction ratings were only moderate, i.e., near the midpoint of the scale.
These observations were supported by a post-experiment questionnaire in which participants were asked to rank the three mappings (Baseline, LTY, LATY) in terms of their preference. The results revealed the following for the drivers:
• 4 of 17 drivers ranked the Baseline mapping as the most preferred, and 3 ranked it as the least preferred (i.e., third ranking).
• 10 of 17 drivers ranked the LTY mapping as the most preferred, and 4 ranked it as the least preferred.
• 3 of 17 drivers ranked the LATY mapping as the most preferred, and 10 ranked it as the least preferred.
For the pedestrians, the rankings were as follows:
• 3 of 17 pedestrians ranked the Baseline mapping as the most preferred, and 9 ranked it as the least preferred.
• 14 of 17 pedestrians ranked the LTY mapping as the most preferred, and 0 ranked it as the least preferred.
• 0 of 17 pedestrians ranked the LATY mapping as the most preferred, and 8 ranked it as the least preferred.

Table 1
Mean (SD) button-press percentage per mapping condition in the four different scenarios. The minimum possible score is 0; the maximum possible score is 100.

Fig. 7. Yaw angle difference between the pedestrian's eye gaze and the AV in Phase 2, per yielding scenario and mapping, at a 1-deg resolution. The horizontal lines are drawn at the top of the highest frequency of the distribution. For creating this figure, missing data (e.g., due to blinks or loss of gaze tracking) were linearly interpolated. The sum of frequencies per condition equals 100.

Table 2
Mean (SD) acceptance scores for pedestrians (n = 17) and drivers (n = 17). The minimum possible score is −2; the maximum possible score is 2.
Additionally, in the post-block questionnaire, pedestrians were asked whether it was clear to them that the vehicle was going to yield. The mean was 5.65 for LTY and 5.00 for LATY on a scale from 1 (strongly disagree) to 7 (strongly agree). Drivers were asked whether it was easy for them to direct the eye-gaze visualization to where they wanted it to be. The mean was 4.71 for LTY and 5.53 for LATY on a scale from 1 (extremely difficult) to 7 (extremely easy).

Discussion
This study compared two mappings for maneuver-based AV control combined with driver eye-movement visualization. In the LTY mapping, the AV stopped for the pedestrian when the driver looked at the pedestrian. Conversely, in the LATY mapping, the AV stopped when the driver did not look at the pedestrian. An experiment was conducted with 34 participants, divided into 17 pairs who were present in the same virtual environment. In each pair, one participant was the driver of a maneuver-based controlled AV, and the other was a pedestrian who had to indicate their intention to cross the road via a button press. Pressing the button when the AV was yielding and not pressing it when the AV was not yielding were regarded as indicative of high performance. However, when a second vehicle was present, the pedestrian had to refrain from crossing the road; hence, in this situation, a low button-press rate was regarded as indicative of high performance. The scenario with the second vehicle was included to test for pedestrian overreliance on the driver's eye-gaze visualization.
The pedestrians' button-press results indicated that the LTY and LATY mappings yielded improved performance compared to the Baseline condition without eye-gaze visualization. These findings are consistent with much literature showing that eHMIs provide clarity to pedestrians (e.g., Böckle et al., 2017; Chang et al., 2017; De Clercq et al., 2019; Oudshoorn et al., 2021). The current eye-gaze visualization has several interesting qualities compared to existing eHMIs. First, it depicts the driver's intention without further instruction or clarification, consistent with recommendations for eHMIs outlined by the International Organization for Standardization (2018; see also Tabone et al., 2021a; Benderius et al., 2018). Second, the literature suggests that non-directional signaling by eHMIs could confuse pedestrians, as it may be unclear for whom the message is meant (Dey et al., 2021; Hensch et al., 2019). Through eye-gaze visualization, the message can be directed toward one specific pedestrian. Third, in real traffic, it may be unclear whether drivers and pedestrians have seen each other, for example, due to sunlight or glare (AlAdawy et al., 2019). The visualization of eye gaze makes such information salient and explicit. The current study compared the concepts for maneuver-based control with a Baseline condition. A comparison with existing eHMIs in more complex traffic environments would be a welcome topic for future research (for a study on directed vs. undirected signaling, see Dietrich et al., 2018).
The results showed no significant differences in pedestrians' crossing intentions between the LTY and LATY mappings, which may be explained by the fact that drivers and pedestrians were instructed about the meaning of the mappings. The LTY and LATY mappings provided the same information to pedestrians (only with the meaning of the gaze direction reversed), and hence no difference in crossing performance was to be expected. Of note, the LTY and LATY mappings, which informed the pedestrian about whether they could cross, did not cause overreliance in the presence of a second vehicle approaching from the other direction. Even though the eye-movement analysis showed that the LTY and LATY mappings attracted the pedestrians' attention, pedestrians distributed their attention between the AV and the second vehicle.
Although no significant differences in crossing performance between the LTY and LATY conditions were observed, there were notable differences in acceptance. The LTY condition was regarded as most satisfactory by pedestrians and most useful by drivers. Furthermore, when asked to rank the mappings in terms of preference, participants ranked the LTY mapping the highest. Research shows that drivers are inclined to look at pedestrians and other potential hazards (e.g., Underwood et al., 2011); hence, deliberately not looking at a pedestrian may be regarded as counterintuitive. One driver wrote in the post-block questionnaire: "I had to think hard about if I had to look away or not. It was not logical to me." Similarly, recent research indicates that pedestrians are more likely to cross the road when the driver looks at them (Faas et al., 2021; Malmsten Lundgren et al., 2017; Onkhar et al., 2022). One pedestrian stated: "This map [LATY] was confusing me a bit. When the spotlights (the lasers) were on me, I was thinking that it was my turn to cross the road, while I actually should wait."

A noteworthy aspect of the experiment is the use of two participants in the same virtual world, an approach that is gaining popularity in human factors research (Hancock and De Ridder, 2003; Houtenbos et al., 2017; Ko et al., 2022; Lehsing et al., 2015; Muehlbacher et al., 2014; Oeltze and Schießl, 2015; Park et al., 2019; Preuk et al., 2016). The visualization of a human driver's eye gaze may have contributed to more natural and realistic situations compared to a pre-programmed light beam. However, it does introduce more variation, as every person behaves somewhat differently. Another noteworthy aspect of the current research is the combination of eye-tracking and head-tracking in an HMD, something that is gaining popularity (Blattgerste et al., 2018; Chen et al., 2018; Ferris et al., 2018; Park et al., 2019) but is still relatively rare.
Head-tracking in the HMD allowed the pedestrian to look around, resulting in a large field of regard. The wide eye-gaze distribution shown in the Results section would be impossible to obtain using computer monitors or driving simulators. In the current study, a single intersection of the driver's gaze ray with the pedestrian's hitbox was sufficient to trigger the mapping (i.e., to make the AV stop in LTY or not stop in LATY). This approach worked satisfactorily in our experimental setup, with participants inadvertently looking at the pedestrian in only 3 of 272 trials. In real-life applications, however, a temporal threshold may be needed to distinguish a quick glance at the pedestrian from a more prolonged gaze that expresses stopping intent.
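The difference between the single-hit rule used here and a temporal (dwell-time) threshold can be sketched as follows. This is a minimal, hypothetical illustration: the sample format, the 20 Hz sampling rate, and the 0.3 s threshold are assumptions chosen for readability, not values from the study, and the function names are ours.

```python
from typing import Optional, Sequence, Tuple

# One gaze sample: (timestamp in s, whether the driver's gaze ray hit the pedestrian hitbox)
GazeSample = Tuple[float, bool]


def single_hit_trigger(samples: Sequence[GazeSample]) -> bool:
    """Single-hit rule: any one hitbox hit counts as looking at the pedestrian."""
    return any(hit for _, hit in samples)


def dwell_trigger(samples: Sequence[GazeSample], threshold_s: float) -> bool:
    """Hypothetical rule: require an uninterrupted run of hits spanning at least threshold_s."""
    run_start: Optional[float] = None
    for t, hit in samples:
        if not hit:
            run_start = None  # a miss breaks the current run
            continue
        if run_start is None:
            run_start = t  # first hit of a new run
        if t - run_start >= threshold_s:
            return True
    return False


# Hypothetical 20 Hz gaze streams (0.05 s between samples).
glance = [(0.00, False), (0.05, True), (0.10, True), (0.15, False), (0.20, False)]
fixation = [(i * 0.05, True) for i in range(10)]  # continuous hits from 0.00 to 0.45 s

print(single_hit_trigger(glance), dwell_trigger(glance, 0.3))      # True False
print(single_hit_trigger(fixation), dwell_trigger(fixation, 0.3))  # True True
```

For scale, 272 trials equals 17 pairs × 16 trials, so 3 inadvertent looks correspond to about 1.1% of trials. A dwell threshold would suppress brief glances like `glance`, at the cost of delaying the AV's response by at least the threshold duration.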
A limitation of this research is that the participants lacked diversity: our sample consisted predominantly of young Dutch persons. Older persons can be expected to be slower at understanding novel types of HMIs and are generally more cautious when crossing the road (e.g., Dunbar et al., 2004). Additionally, cultural and social norms can be expected to differ across countries (Pelé et al., 2017; Ranasinghe et al., 2020). On the other hand, our recent research suggests that the effects of eHMIs on pedestrians' crossing intentions are cross-nationally robust (Oudshoorn et al., 2021).
Like many other AV-pedestrian interaction studies, the experiment addressed a single pedestrian crossing in front of one or two vehicles. In reality, however, pedestrian crossing situations are more complex. In future traffic, pedestrians may have to deal with many other pedestrians and with AVs at different levels of automation, as well as with different crossing configurations and weather conditions (for recommendations, see Dey et al., 2020). Furthermore, even with a high level of presence in virtual reality, it remains unclear whether testing in virtual environments is as valid as naturalistic testing (Tabone et al., 2021a).
The Baseline block was not included in the block-order randomization, because only minimal learning effects were expected given the inclusion of a practice run. Our analysis of learning curves showed no major learning effects: the Baseline mapping yielded consistently lower performance (i.e., lower button-press rates for the yielding scenario without a second vehicle and higher button-press rates for the non-yielding scenarios) than the LTY and LATY mappings (see Fig. S1 in the supplementary material).
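The link between button-press rates and crossing performance can be made concrete with a small sketch: holding the button is the correct response when the AV yields, and releasing it is correct when the AV does not yield, so a lower rate in the yielding scenario and a higher rate in a non-yielding scenario both mean lower performance. The record format, function names, and toy values below are hypothetical; the values are not data from the study, and the second-vehicle factor is omitted for brevity.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-trial record: (mapping, av_yields, fraction of the trial the button was held)
Trial = Tuple[str, bool, float]


def press_rates(trials: Iterable[Trial]) -> Dict[Tuple[str, bool], float]:
    """Mean button-press rate for each (mapping, av_yields) cell."""
    cells: Dict[Tuple[str, bool], List[float]] = defaultdict(list)
    for mapping, yields, held in trials:
        cells[(mapping, yields)].append(held)
    return {cell: mean(values) for cell, values in cells.items()}


def performance(rate: float, av_yields: bool) -> float:
    """1.0 is ideal: button held throughout when the AV yields, released when it does not."""
    return rate if av_yields else 1.0 - rate


# Toy values for illustration only; these are not data from the study.
toy = [("Baseline", True, 0.6), ("Baseline", True, 0.8),
       ("LTY", True, 0.9), ("LTY", False, 0.1)]
for (mapping, yields), rate in press_rates(toy).items():
    print(mapping, yields, round(performance(rate, yields), 2))
```

Folding both scenario types into one "higher is better" score makes it easier to compare mappings on a single learning curve across trials.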
The technical and practical feasibility of the eye-gaze visualizations was not considered in the design process, and visualizing the driver's eye gaze in practice brings many challenges. An eye tracker and a light projection system would need to be installed in the AV. A possible disadvantage of eye-gaze visualization is that it could distract other drivers. The visibility of the visualization in changing weather conditions also needs to be considered. An alternative approach to implementing eye-gaze visualization is augmented reality (AR). The use of AR for drivers through head-up displays is a common topic of research and development (e.g., Kim et al., 2018; WayRay, 2021), and recent research has started to consider AR for pedestrians as well (Hasan and Hasan, 2022; Hesenius et al., 2018; Tabone et al., 2021b; Tran et al., 2022). AR could solve the problems of distracting other road users and of visibility in changing weather conditions, because the visualization is displayed directly on the wearable device. However, many challenges regarding AR in future traffic still need to be addressed, such as privacy, invasiveness, user-friendliness, technological feasibility, and inclusiveness (Tabone et al., 2021a).

Conclusions
This study compared two forms of eye-based AV control combined with visualization of the driver's eye gaze. The LTY and LATY mappings improved crossing performance compared with the Baseline condition. Furthermore, the LTY mapping achieved higher acceptance ratings than the LATY mapping.
This study provides insights that may be important for the future maneuver-based control of automated vehicles, particularly the control of the vehicle through eye movements. In addition, the results show potential for the communication of the driver's intent by addressing the pedestrian directly via augmented feedback. How to implement our findings in actual practice is an engineering challenge that deserves further investigation.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.