Enhancing Safety in Autonomous Vehicles: The Impact of Auditory and Visual Warning Signals on Driver Behavior and Situational Awareness

Abstract: Semi-autonomous vehicles (AVs) enable drivers to engage in non-driving tasks but require them to be ready to take control during critical situations. This “out-of-the-loop” problem demands a quick transition to active information processing, raising safety concerns and anxiety. Multimodal signals in AVs aim to deliver take-over requests and facilitate driver–vehicle cooperation. However, the effectiveness of auditory, visual, or combined signals in improving situational awareness and reaction time for safe maneuvering remains unclear. This study investigates how signal modalities affect drivers’ behavior using virtual reality (VR). We measured drivers’ reaction times from signal onset to take-over response and gaze dwell time for situational awareness across twelve critical events. Furthermore, we assessed self-reported anxiety and trust levels using the Autonomous Vehicle Acceptance Model questionnaire. The results showed that visual signals significantly reduced reaction times, whereas auditory signals did not. Additionally, any warning signal, together with seeing driving hazards, increased successful maneuvering. The analysis of gaze dwell time on driving hazards revealed that audio and visual signals improved situational awareness. Lastly, warning signals reduced anxiety and increased trust. These results highlight the distinct effectiveness of signal modalities in improving driver reaction times, situational awareness, and perceived safety, mitigating the “out-of-the-loop” problem and fostering human–vehicle cooperation.


Introduction

Background
Autonomous vehicles (AVs) are expected to become a key part of future transportation, offering benefits like improved mobility, reduced emissions, and increased safety by reducing human errors [1][2][3]. They also allow passengers to engage in non-driving-related tasks (NDRTs) during travel, reshaping the concept of private vehicle use [4]. As defined by SAE International, each level of AV automation provides specific advantages. Levels 0 to 2 require human oversight, while Levels 4 and 5 involve high to full automation. Level 3 requires drivers to take vehicle control in situations unmanageable by the AV [5]. This conditional automation has attracted attention due to its potential for driver-autonomous-system interaction [2,6,7]. While this interaction enhances acceptance, a cognitive barrier to AV adoption, it also presents challenges [8][9][10][11]. Acceptance is crucial in determining the feasibility of integrating AVs into everyday life, and it can be effectively achieved through collaborative control and shared decision-making strategies [7,8,12]. On the other hand, constant interaction demands driver readiness, requiring drivers to stay alert and ready to take control. This conflicts with the benefit of engaging in non-driving-related tasks, negatively impacting the user experience, increasing anxiety, and reducing acceptance of such AVs [8,13]. To fully realize the benefits of AVs, we still need to address the existing drawbacks and challenges.
A key issue in Level 3 automation is the "out-of-the-loop" problem, defined as the absence of appropriate perceptual information and motor processes to manage the driving situation [14,15]. The out-of-the-loop problem occurs when sudden take-over requests (TORs) by the vehicle disrupt drivers' motor calibration and gaze patterns, ultimately delaying reaction times, deteriorating driving performance, and increasing the likelihood of accidents [14][15][16][17]. Providing appropriate Human-Machine Interfaces (HMIs) that improve driver-vehicle interaction can enhance the benefits of the AV and mitigate its challenges. They help drivers get back into the loop, ensuring a safe and effective transition from autonomous to manual control. Inadequate or delayed responses to TORs can lead to accidents, posing significant risks to everyone on the road [1,3,18,19]. For instance, multimodal warning signals (such as auditory, visual, and tactile) allow users to perform non-driving-related tasks while ensuring their attention can be quickly recaptured when necessary [20]. These signals convey critical semantic and contextual information, ensuring that drivers can efficiently process the situation, make decisions, and resume control [20][21][22]. The take-over process involves several information-processing stages, including the perception of stimuli (visual, auditory, or tactile), the cognitive processing of the traffic situation, decision-making, resuming motor readiness, and executing the action [19,23,24]. Effective multimodal signals help drivers get back into the loop by moderating information processing during take-over, thereby enhancing situational awareness and performance.
Situational awareness and reaction time are crucial factors in the process of taking over control in semi-autonomous vehicles. Situational awareness is defined as "the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" [25]. This awareness is critical in the early stages of information processing, where the driver perceives potential threats and understands the context of a critical situation [20,25]. Signals, whether unimodal or multimodal, that convey urgency can impact these early stages by capturing the driver's attention [20,26]. For example, auditory non-verbal sounds enhance perceived urgency and refocus attention on driving tasks [27][28][29]. Similarly, visual warnings, alone or combined with auditory signals, effectively increase the awareness of critical situations [22,27,30]. Reaction time, another key metric in evaluating driver behavior during TORs, encompasses the entire process from perceiving the stimuli (perception reaction time) to performing the motor action (movement reaction time) [19,24]. Perception reaction time involves perceiving the stimuli, processing information, and making a decision, while movement reaction time refers to the time needed to perform the motor action [19]. The modality and characteristics of a signal can influence perception reaction times, while movement reaction times are more affected by the complexity of the situation and the expected maneuver [24,31]. Consequently, the visual modality can affect movement reaction times by alerting the driver while simultaneously providing complex spatial information, such as distance and direction [32]. Understanding the roles of situational awareness and reaction times, as well as how different warning signals affect each, is essential for determining when and where specific signals are most beneficial or detrimental, ensuring a safer transition during TORs.
The methods used to measure drivers' situational awareness and reaction times during TORs are as crucial as their definitions. Differences in measurement methods lead to incommensurable results, challenging reproducibility [33,34]. For instance, different metrics have been used to capture drivers' situational awareness and reaction time, including gaze-related reaction time, gaze-on-road duration, and hands-on-wheel reaction time [19,35,36], which leads to mixed results. Conventional driving simulations, typically conducted in risk-free laboratory setups, are costly, space-consuming, and often fail to replicate the realism of actual driving [37,38], leading to questions about the generalizability of their findings. Participants in these simulations are usually aware of the artificial setting, which can reduce the perceived realism and affect their behavior [39,40]. To address these limitations, virtual reality (VR) has been proposed as a more immersive and cost-effective solution [41][42][43][44]. VR allows researchers to isolate and fully engage drivers in critical situations, making simulated events feel more real [45,46], resulting in drivers' behaviors being closer to real life and findings that are more ecologically valid [47,48]. Additionally, VR-based simulations offer advantages like integrated eye tracking for assessing situational awareness [36,49], greater controllability and repeatability [50], and usability for education and safety training [51][52][53]. Overall, VR presents a promising tool for investigating driver behavior in a safe, efficient, and reproducible manner, particularly when designing multimodal interfaces for future autonomous transport systems.

Research Aim
In this study, we used VR to investigate the impact of signal modalities (audio, visual, and audio-visual) on situational awareness and reaction time during TORs in semi-autonomous vehicles. The rapid development of semi-autonomous vehicles has left a gap in understanding how warning signals impact driver behavior and situational awareness. This study aims to fill that gap by investigating the impact of these warning signals within a controlled VR environment, a method that is both innovative and allows for continuous and highly detailed observation of drivers' responses. Unlike previous studies that often focused on single-modality warnings or non-immersive environments, our research combines a multimodal approach with the immersive capabilities of VR technology. We collected quantitative data on both objective behavioral metrics (visual and motor behavior) and subjective experience (AVAM questionnaire) from a large and diverse population sample, encompassing varying ages and driving experience. We hypothesized that the presence of any warning signal (audio, visual, or audio-visual) would lead to a higher success rate compared to a no-warning condition. Specifically, we posited that the audio warning signal would enhance awareness of the critical driving situation, while a visual signal would lead to a faster motor response. Furthermore, we predicted that the audio-visual modality, by combining the benefits of both signals, would result in the fastest reaction times and the highest level of situational awareness. Additionally, we expected that seeing the object of interest, regardless of the warning modality, would directly improve the success rate. Finally, we expected a positive correlation between self-reported preferred behavior and actual behavior during critical events and hypothesized that the presence of warning signals would be associated with increased perceived trust and decreased anxiety levels in drivers. This approach advances our understanding of driver-vehicle interactions and provides practical insights for designing more effective and user-friendly AV systems, with significant implications for improving road safety and reducing accidents.

Research Design
To investigate the impact of warning signal modality on driver behavior and situational awareness, four experimental conditions were designed to alert the driver about critical traffic events, requiring the driver to take over control of the vehicle. The experimental conditions included a base condition with no signal, an audio signal, a visual signal, and an audio-visual signal (Figure 1a). The base condition served as the control, providing no warning feedback to the driver. The audio signal modality featured a female voice delivering a spoken "warning" sound at a normal speech pace. The visual signal modality included a red triangle with the word "warning" and a red square highlighting potential driving hazards in the scene. The size and color of the warning triangle were designed to capture participants' attention, and the warning signal was displayed on the windshield directly in front of the participant's head (Figure 2a). At the moment the triangle appeared, the objects detected by the AV were highlighted with a red border. The audio-visual signal modality combined both sound and visual warnings for dual feedback. Participants were randomly assigned to one of these conditions and drove an identical route, encountering the same 12 critical traffic events in sequential order. The critical events were designed to assess driver behavior during TORs and subsequent driving in hazardous situations (Figure 1b). Each critical event involved the sudden appearance of predefined driving hazards or "object(s) of interest", such as animals, pedestrians, other vehicles, or landslides, that posed challenges for driving. During critical events, participants transitioned from "automatic drive" (hands off the wheel) to "manual drive" (hands on the wheel). For instance, in the audio-visual condition, both visual and auditory feedback alerted the driver about the driving hazard briefly at the start of the event (Figure 2a), after which the signals remained off for the duration of the subsequent manual drive (Figure 2b). The experimental design allowed us to collect behavioral and perceptual metrics and compare drivers' responses across the different conditions. We chose a mixed-methods approach to collect quantitative data on both objective behavioral metrics and subjective experiences of autonomous driving. Objective measures included the driver's reaction time from signal onset to the take-over response, the successful completion in avoiding vehicle collision, and the dwell time of the gaze on the object of interest. Additionally, we used the Autonomous Vehicle Acceptance Model questionnaire to assess self-reported levels of anxiety, trust, and the preferred method of control after participants completed the VR experiment (for the respective items, refer to Appendix A, Table A1) [54]. The current research design enables us to investigate how signal modality impacts driver behavior, situational awareness, and perceived levels of trust when handling TORs from semi-autonomous vehicles. By analyzing these performance metrics, the current research provides insights into the impact of multimodal warning signals on drivers' behavior and situational awareness, ultimately contributing to enhanced driving safety.

Materials and Setup
The current study used an open-access VR toolkit called "LoopAR" to test the driver's interaction with a highly automated vehicle during various critical traffic events. The LoopAR environment was implemented using the Unity® 3D game engine 2019.3.0f3 (64-bit) [55] and designed for assessing driver-vehicle interaction during TORs [56]. LoopAR offers an immersive driving experience and seamlessly integrates with gaming steering wheels, pedals, and eye tracking. The VR environment includes a dynamic VR city prototype comprising numerous vehicles (such as motorbikes, trucks, and cars), 125 animated pedestrians, and several types of traffic signs. The design of the terrain was based on real geographical information from the city of Baulmes in the Swiss Alps. The entire terrain covers approximately 25 km² and offers about 11 km of continuous driving through four types of road scenes (Figure 3). The scenes include a mountain road (3.4 km), the city road "Westbrueck" (1.2 km), a country road (2.4 km), and a highway (3.6 km). In addition to static objects such as trees and traffic signs, the VR environment is populated with animated objects like pedestrians, animals, and cars to create realistic traffic scenarios. This diverse, expansive, and highly realistic VR environment enables the current experiment to measure driving behavior with high accuracy and efficiency. For rendering the VR environment, we used the HTC Vive Pro Eye headset with a built-in Tobii eye tracker connected to SRanipal and the Tobii XR SDK. The eye-tracking component includes eye-tracking calibration, validation, and online ray-casting to record the driver's gaze data during the experiment. To enable and collect steering and braking input data, we used game-ready Fanatec CSL Elite steering wheel and pedal devices. The experimental setup aimed at providing a stable, high-frame-rate visual experience and supported an average of 88 frames per second (fps), closely matching the 90 Hz sampling rate of the HTC Vive eye-tracking device. More detailed information about the LoopAR environment, implementation, and software requirements can be found in [56].

Participants
A total of 255 participants were recruited for the study during two job fairs in the city of Osnabrück. Choosing a public measurement site allowed us to recruit participants from diverse backgrounds and demographic groups, closely reflecting the composition of real-world society. However, some participants were excluded from the final analysis for various reasons. Only the 182 participants who answered the Autonomous Vehicle Acceptance Model questionnaire, comprising 74 females (M = 23.32 years, SD = 8.27 years) and 110 males (M = 27.68 years, SD = 19.75 years), were included in the questionnaire analysis. For the behavioral analysis, further exclusion criteria were applied. A total of 84 participants were excluded due to an eye-tracking calibration error exceeding 1.5 degrees or failure to complete the four queried scene rides. Additionally, 8 participants were excluded due to missing data for two or more of the critical events. Lastly, 6 participants were excluded due to eye-tracking issues such as missing data or large eye blinks. After these exclusions, 157 participants (base = 43; audio-visual = 50; visual = 36; audio = 28) were included in the main behavioral analyses. Demographic information such as gender and age was only available for the 143 participants who had both behavioral and questionnaire data (53 females, M = 22.67 years, SD = 6.79 years, and 90 males, M = 27.38 years, SD = 10.49 years). No demographic-based analysis was conducted in this study. All participants provided written informed consent before the experiment, and data were anonymized to protect participant privacy. The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the University of Osnabrück.

Experimental Procedure
Upon arrival, participants were familiarized with the VR equipment, comprising the VR headset and the wooden car simulator (Figure 4). For the efficient use of time and space, measurements were conducted on both sides of the wooden car. Regardless of the side, the VR simulation consistently replicated right-hand driving. Participants were randomly assigned to one of the four experimental conditions and instructed to complete a VR drive. Before starting the main experiment, participants underwent a training session to acquaint themselves with the equipment and the driving task. Subsequently, we conducted eye-tracking calibration and validation, along with calibration of the seat position and steering wheel. After completing these initial preparations, participants proceeded to the main experiment, which involved a continuous (virtual) drive of around 11 km with 12 critical events. During the drive, all participants traversed an identical route and encountered the same sequence of events. The experiment lasted approximately 10 min, and afterward, participants completed an optional Autonomous Vehicle Acceptance Model questionnaire.

Data Preprocessing
The data were recorded in Unity and collected via the eye tracker integrated into the Head-Mounted Display (HMD) as well as the steering wheel and pedals. They included information about the critical events, participants' visual behavior, and the handling of the vehicle. The participants' eye-tracking data consisted of positions and directions represented as three-dimensional (3D) vectors in Euclidean space. All objects in the VR environment were equipped with colliders, which defined their shapes and boundaries. During the experiment, the eye tracker recorded the vector from each participant's gaze that intersected with the collider of each object. In other words, the participant's gaze on the objects was determined by where their gaze ray hit the objects' colliders. Both the names and the positions of the objects hit by the participant's gaze were recorded as 3D vectors for each timestamp. To filter and narrow down the search for determining which objects the participants paid attention to during the drive, we calculated the distance from the participant's position to each object and selected only the names and positions of the five closest objects to include in the final dataset. The steering data consisted of numerical values representing the rotation of the vehicle's steering wheel in degrees. A value of zero indicated no steering (constant direction), negative values indicated steering to the left, and positive values indicated steering to the right. Similarly, the pedal data consisted of acceleration and braking inputs. While the vehicle was driving autonomously, the values remained constant at either 1 or 0. When the participant provided input during the take-over and manual driving, the values changed to greater than 0. Lastly, the VR scene data consisted of information regarding each event, such as the name, start and end timestamps, and duration, as well as the name of the object that the participant's gaze collided with, if any. Information on whether the event was successfully completed was also included. This process ensured that the data used in the study were reliable and met the necessary requirements for the analysis and hypothesis testing.
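To make the nearest-object filtering step concrete, the following is a minimal sketch of how the five closest objects at one timestamp could be selected. The function name, data layout, and example object names are hypothetical illustrations, not part of the LoopAR toolkit.

```python
import numpy as np

def five_closest_objects(participant_pos, object_positions):
    """Return the names, positions, and distances of the five objects
    nearest to the participant. `object_positions` maps object name to
    an (x, y, z) tuple; this layout is a hypothetical stand-in for the
    recorded gaze/object data."""
    names = list(object_positions)
    coords = np.array([object_positions[n] for n in names])
    dists = np.linalg.norm(coords - np.asarray(participant_pos), axis=1)
    order = np.argsort(dists)[:5]  # indices of the five smallest distances
    return [(names[i], tuple(coords[i]), float(dists[i])) for i in order]

# Example: filter one timestamp's frame down to the nearest objects.
objects = {"pedestrian_03": (12.1, 0.0, 4.2), "car_17": (3.5, 0.0, -1.0),
           "deer_01": (25.0, 0.0, 8.8), "sign_speed_50": (6.2, 2.1, 0.5),
           "tree_210": (2.0, 0.0, 9.9), "truck_02": (40.3, 0.0, -3.3)}
print(five_closest_objects((0.0, 1.2, 0.0), objects))
```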

Performance Metrics

Reaction Times
To evaluate drivers' take-over performance, we calculated their reaction time, defined as the time interval between the onset of the warning signal and the driver's subsequent behavioral response. This involved analyzing drivers' motor engagement when steering, braking, and accelerating the vehicle. For each type of behavior, we first verified whether the input values remained constant or varied, indicating whether drivers maneuvered using that input. If the values varied over the duration of the critical event, we calculated the rate of change per second over time [57,58], smoothed this rate of change using a rolling mean window, and took its absolute value. This approach allowed us to capture the velocity of the steering maneuver over the duration of the manual drive, as well as sudden changes in braking and acceleration.
Following a peak approach, we then calculated the first derivative of the smoothed rate of change to identify all local maxima (peak velocity), which corresponded to pivotal points of significant change when steering, braking, and accelerating [58]. The first local maximum surpassing a predefined threshold represented the driver's reaction time. For braking and acceleration, we set a threshold equal to the median of the initial values of the smoothed rate of change plus 0.01. For steering, we set a threshold of 5 degrees if the maximum smoothed rate of change was greater than 3; otherwise, the maximum smoothed rate of change was set as the threshold [23,58,59]. We then identified the first local maximum that exceeded the threshold and computed the reaction time by subtracting the event start timestamp from the timestamp of that maximum. In instances where no local maxima exceeded the threshold, the reaction time was determined from the timestamp of the first identified local maximum. We repeated this process for each type of motor engagement (steering, braking, and accelerating) and for each of the 12 critical events. Because critical events could vary greatly in nature and could require different types of responses [23], we calculated two types of reaction time: the fastest reaction time and the expected reaction time. The fastest reaction time was determined as the minimum value among steering, braking, and accelerating. The expected reaction time was based on a predefined order of reactions that could be expected from the driver during each event. For instance, some events would require steering to avoid driving hazards on the road, while others would require braking to avoid frontal collisions with pedestrians. This approach allowed us to calculate the reaction time in a way that accounts for the different characteristics of each type of motor engagement and the specific conditions of each event.
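The peak-based procedure described above can be sketched as follows. This is a simplified reconstruction: the rolling-window size and the use of SciPy's peak finder in place of an explicit first-derivative sign test are our assumptions, not reported implementation details.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

def reaction_time(timestamps, values, channel, event_start, window=10):
    """Estimate the reaction time for one input channel ('steering',
    'braking', or 'acceleration') during one critical event. `window`
    (in samples) is a hypothetical choice for the rolling mean."""
    ts = np.asarray(timestamps, dtype=float)
    v = pd.Series(values, dtype=float)
    if v.nunique() <= 1:                    # constant input: no maneuver
        return None
    rate = v.diff() / pd.Series(ts).diff()  # rate of change per second
    smoothed = rate.rolling(window, min_periods=1).mean().abs()
    # Thresholds as described in the text.
    if channel == "steering":
        thr = 5.0 if smoothed.max() > 3 else smoothed.max()
    else:                                   # braking or acceleration
        thr = np.nanmedian(smoothed.iloc[:window]) + 0.01
    peaks, _ = find_peaks(smoothed.fillna(0).to_numpy())  # local maxima
    above = [p for p in peaks if smoothed.iloc[p] >= thr]
    idx = above[0] if above else (peaks[0] if len(peaks) else None)
    return None if idx is None else ts[idx] - event_start
```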

Situational Awareness
To calculate the behavioral indicators of situational awareness, we processed the participants' gaze data. Specifically, we examined whether the participant saw the object of interest ("seeing the object of interest"), the number of times the participant saw the object of interest ("gaze hits"), and the duration of their gaze on the object of interest ("dwell time of gaze"). To determine the "seeing of the object of interest", we identified the data samples where the participant's gaze collided with the object of interest in each event and saved this as binary categorical data. To calculate the "dwell time of gaze", we computed the difference between the start and end timestamps of each run of consecutive data samples in which the participant's gaze collided with the object of interest. We excluded durations shorter than 100 milliseconds, consistent with typical gaze durations in human eye-movement analysis [60,61]. To establish the number of "gaze hits" per participant within each event, we counted the number of meaningful individual gaze durations (>100 ms). Finally, the total "dwell time of gaze" was calculated by summing over the individual gaze durations. These steps enabled us to quantify various aspects of the participant's situational awareness in each event during the experiment.
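A compact sketch of these gaze metrics is given below; the column names (`t` for the timestamp in seconds, `on_object` for whether the gaze ray hit the object-of-interest collider) are assumptions made for illustration.

```python
import pandas as pd

MIN_FIXATION = 0.100  # seconds; gaze episodes shorter than 100 ms are discarded

def gaze_metrics(frame: pd.DataFrame) -> dict:
    """Compute 'seeing', 'gaze hits', and total 'dwell time' for one event."""
    # Group consecutive samples with the same on/off state into episodes.
    episode = (frame["on_object"] != frame["on_object"].shift()).cumsum()
    durations = []
    for _, g in frame.groupby(episode):
        if g["on_object"].iloc[0]:  # episode where gaze was on the object
            durations.append(g["t"].iloc[-1] - g["t"].iloc[0])
    durations = [d for d in durations if d >= MIN_FIXATION]
    return {"seen": bool(frame["on_object"].any()),
            "gaze_hits": len(durations),   # meaningful episodes (>100 ms)
            "dwell_time": sum(durations)}  # total dwell time in seconds
```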

Successful Maneuvering
We assessed participants' successful maneuvering during the VR drive by analyzing their frontal collision performance during critical traffic events. Specifically, we focused on whether participants avoided colliding with objects in the scene. For each of the 12 critical events, a maneuver was considered successful if the participant did not collide with any of the objects of interest within the event. Conversely, an event was flagged as unsuccessful when a collision occurred. Unity performed this "successful event completion" calculation online during the experiment for each event, tracking participants' interactions with the objects in the scene based on the positions of the vehicle and the objects.

Results
A comprehensive analysis was conducted to evaluate the impact of the different types of warning signals (audio, visual, audio-visual, and no signal) on reaction time, situational awareness, and the likelihood of successfully managing critical driving events in a simulated environment. Due to the non-normal distribution of the data across all dependent variables, we conducted the non-parametric Friedman test, an extension of the Wilcoxon signed-rank test. This test serves as the non-parametric equivalent of a repeated-measures analysis of variance (ANOVA) without assuming normality. The results from both the Friedman tests and the logistic regression models provided a detailed understanding of these effects.

Reaction Time Analysis
Investigating the main effect of the warning signal on overall reaction time is one of the main purposes of the current study. A Friedman test was conducted to compare the effects of the different types of warning signals (audio, visual, and audio-visual), as well as the no-signal (reference) condition, on the log-transformed reaction time. The test revealed a significant difference in reaction times across the conditions, χ²(3) = 24.99, p < 0.001. Post hoc Wilcoxon signed-rank tests were conducted to follow up on this finding. Significant differences were found between the base condition and both the audio-visual (W = 51, p < 0.001) and visual conditions (W = 61, p < 0.001), indicating that both audio-visual and visual warnings significantly reduced reaction time compared to no warning signal. Additionally, significant differences were observed between the audio-visual and audio conditions (W = 64, p < 0.001), as well as between the visual and audio conditions (W = 67, p = 0.001), suggesting that visual components, whether alone or in combination with audio signals, are more effective in reducing reaction times than audio signals alone. No significant difference was found between the audio-visual and visual conditions (W = 188, p = 0.745) (Figure 5). These results demonstrate that visual warning signals, whether alone or combined with audio signals, are particularly effective in reducing reaction times in critical driving scenarios, whereas no such effect was observed for audio warning signals alone. This result aligns with our hypothesis, which predicted different speeds of motor response for visual warnings compared to audio warnings.
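For readers wishing to reproduce this pipeline, a minimal sketch with synthetic data is shown below. How reaction times are aggregated into matched samples (here, one value per critical event and condition) is our assumption, since the blocking unit of the Friedman test is not spelled out above.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Synthetic stand-in: one log-transformed reaction time per critical event
# (12 events) and condition; real data would come from the VR recordings.
rng = np.random.default_rng(0)
rt = {c: np.log(rng.uniform(0.8, 2.5, size=12))
      for c in ("base", "audio", "visual", "audio_visual")}

stat, p = friedmanchisquare(rt["base"], rt["audio"],
                            rt["visual"], rt["audio_visual"])
print(f"Friedman chi2(3) = {stat:.2f}, p = {p:.4f}")

# Post hoc pairwise Wilcoxon signed-rank tests on the matched samples.
pairs = [("base", "audio"), ("base", "visual"), ("base", "audio_visual"),
         ("audio", "visual"), ("audio", "audio_visual"),
         ("visual", "audio_visual")]
for a, b in pairs:
    w, pv = wilcoxon(rt[a], rt[b])
    print(f"{a} vs {b}: W = {w:.0f}, p = {pv:.4f}")
```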

Figure 5. This plot displays the distribution of reaction times (y-axis) across the four conditions (x-axis). Each condition is represented by a violin plot, where the outer shape illustrates the probability density of reaction times. The embedded box plot highlights the median (central line) and interquartile range (box). Scatter points show individual data observations. The gray dashed line (*) connects conditions with no statistically significant differences.

Synergy between Warning Signals and Seeing Objects of Interest
As a first step in investigating the impact of situational awareness, we tested whether seeing or not seeing the object of interest influences successful maneuvering. A logistic regression analysis was conducted with the binary variable of seeing the object of interest and the different types of warning signals (audio, visual, and audio-visual, with the no-signal condition serving as the reference category) as predictors of the likelihood of successfully managing critical driving events. The base model, which included 1884 observations, revealed significant positive effects for seeing the object of interest (β = 1.31, SE = 0.11, z = 12.36, p < 0.001), audio warnings (β = 0.70, SE = 0.15, z = 4.62, p < 0.001), visual warnings (β = 0.78, SE = 0.14, z = 5.48, p < 0.001), and audio-visual warnings (β = 0.82, SE = 0.13, z = 6.29, p < 0.001) (Figure 6). The overall model was significant (LLR p < 0.001), with a pseudo-R² value of 0.086, indicating that both the presence of warning signals and seeing the object of interest significantly increase the likelihood of successfully managing critical driving events. An interaction model was also tested to explore the combined effects of seeing the object of interest and the different types of warning signals. The interaction terms, however, were not significant (e.g., object_of_interest * Audio: β = 0.16, SE = 0.32, z = 0.51, p = 0.611), suggesting no synergistic benefit from the combination of seeing the object of interest and receiving a specific type of warning. The interaction model showed a slightly higher pseudo-R² value of 0.087 and a log-likelihood of −1133.5, compared to the base model's −1135.0 (for the full model results, refer to Appendix A, Figure A1). Among the warning signals, audio-visual warnings had the strongest effect, followed by visual and audio warnings, indicating that combining audio and visual signals is particularly effective in increasing the likelihood of success.
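A sketch of how the base and interaction models could be fit with statsmodels' formula API follows; the file name and the column names (success, saw_object, condition) are hypothetical placeholders for the dataset described above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per participant x event; file and column names are assumptions.
df = pd.read_csv("critical_events.csv")  # hypothetical file

# Base model: main effects only, with the base condition as reference.
base = smf.logit(
    "success ~ saw_object + C(condition, Treatment(reference='base'))",
    data=df).fit()
print(base.summary())  # coefficients, z-values, pseudo-R2, LLR p

# Interaction model: does seeing the object interact with signal type?
inter = smf.logit(
    "success ~ saw_object * C(condition, Treatment(reference='base'))",
    data=df).fit()
print(inter.llf, base.llf)  # compare log-likelihoods, as reported above
```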

Dwell Time of Gaze Analysis
As the next step, we investigated whether the quantitative measure of the duration of gaze on the objects of interest provides more detailed information on situational awareness and the effect of warning signals. A Friedman test was conducted to compare the effects of the warning signals on the log-transformed total dwell time across conditions. The test revealed a statistically significant difference in dwell times across the conditions, χ²(3) = 11.23, p = 0.011. Follow-up Wilcoxon signed-rank tests indicated that all warning conditions (audio-visual: W = 92, p = 0.010; visual: W = 96, p = 0.014; audio: W = 106, p = 0.026) resulted in significantly greater total dwell times compared to the base condition, suggesting that any form of warning signal increases drivers' attention. However, no significant differences were found between the audio-visual and visual conditions (W = 167, p = 0.425), the audio-visual and audio conditions (W = 162, p = 0.362), or the visual and audio conditions (W = 178, p = 0.582) (Figure 7). These results demonstrate that the effect of an audio warning signal on increasing driver dwell time was comparable to that of a visual warning signal when compared to no warning signal.

Identifying Success Factors in Critical Event Maneuvering
In the final stage of the analysis, we investigated the impact of warning signals on successful maneuvering and incorporated additional variables, specifically reaction time and dwell time, to examine their combined influence on this success. Comparing these models allows us to discern the relative contributions of reaction time and dwell time, in conjunction with warning signals, to the overall success rates in managing critical driving scenarios. The base model included 1884 observations and showed a significant overall effect (LLR p = 1.014 × 10⁻⁵⁰), with a pseudo-R² of 0.098. The model demonstrated that the presence of an audio warning signal significantly increased the likelihood of success (β = 0.59, SE = 0.15, z = 3.83, p < 0.001), as did visual warnings (β = 0.55, SE = 0.14, z = 3.85, p < 0.001) and audio-visual warnings (β = 0.63, SE = 0.13, z = 4.74, p < 0.001). Longer reaction times were associated with a decrease in the likelihood of success (β = −0.80, SE = 0.13, z = −6.29, p < 0.001), and greater total dwell time increased the likelihood of success (β = 1.22, SE = 0.11, z = 11.03, p < 0.001). The extended model, which included the same 1884 observations and incorporated interaction terms between reaction time, total dwell time, and warning signals, was also significant (LLR p = 4.413 × 10⁻⁴⁷), with a pseudo-R² of 0.105. When interaction terms were included, the main effects of the audio and visual signals were no longer significant; however, the interactions of reaction time and dwell time with the visual and audio-visual conditions were. The model revealed significant interactions between total dwell time and the visual warning (β = 2.68, SE = 0.86, z = 3.10, p = 0.002), total dwell time and the audio-visual warning (β = 2.02, SE = 0.77, z = 2.62, p = 0.009), and reaction time and the visual warning (β = 1.56, SE = 0.55, z = 2.81, p = 0.005). This can be interpreted as the positive effect of shorter reaction times on success being enhanced by the presence of visual signals (comparing β = −0.80 with β = 1.56). Similarly, the positive association between longer dwell time and success was enhanced for the visual (comparing β = 1.22 with β = 2.68) and audio-visual (comparing β = 1.22 with β = 2.02) signals. In addition, the interaction between reaction time and total dwell time was significant (β = 1.21, SE = 0.41, z = 2.99, p = 0.003), suggesting that the combined effect of quick reactions and increased awareness (dwell time) improves success (Figure 8) (for the full model results, refer to Appendix A, Figure A2). Overall, this finding aligns with our hypothesis, suggesting that all types of warning signals significantly enhance the likelihood of successfully managing critical driving events compared to no warning, with audio-visual signals being the most effective. However, simply reacting quickly is not enough; maintaining situational awareness is critical. Additionally, the presence of visual signals enhances reaction-time performance, indicating that visual signals help drivers process and react to critical events more effectively. Furthermore, the increase in successful maneuvering is associated with higher situational awareness, particularly with visual and audio-visual signals. In summary, our findings confirm that while all types of warning signals improve the management of critical driving events, visual and audio-visual signals notably enhance both reaction times and situational awareness, highlighting the crucial role of maintaining visual engagement in ensuring driving safety.
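The extended model could be expressed as follows, reusing the hypothetical `df` from the earlier sketch with assumed reaction_time and dwell_time columns; the exact coding of the interaction terms is our reconstruction, not the reported implementation.

```python
import statsmodels.formula.api as smf

# Main effects of reaction time and dwell time, their interactions with
# signal type, and their mutual interaction; column names are assumptions.
extended = smf.logit(
    "success ~ (reaction_time + dwell_time)"
    " * C(condition, Treatment(reference='base'))"
    " + reaction_time:dwell_time",
    data=df).fit()
print(extended.summary())  # interaction coefficients and pseudo-R2
```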

Questionnaire Analysis
To examine the impact of the different types of warning signals on reducing anxiety and increasing trust in AVs, we asked participants to score six relevant items from the Autonomous Vehicle Acceptance Model questionnaire. Additionally, to compare the objective use of the body with subjective ratings of such use while operating the experimental car, we included three additional items concerning the importance of using the eyes, hands, and feet (for the respective items, refer to Appendix A, Table A1). Since we used a subset of the original questionnaire, we began with an exploratory factor analysis. The analysis identified two primary factors based on the eigenvalues, the percentage of variance explained, and the cumulative percentage. Factor 1 had an eigenvalue of 3.42 and accounted for 38.01% of the variance, while Factor 2 had an eigenvalue of 1.60, explaining 17.83% of the variance. Together, these two factors cumulatively account for 56.09% of the variance in responses, indicating that they capture a substantial portion of the overall variability in the data. Factor one includes six items related to anxiety and perceived safety, while factor two is primarily associated with items related to methods of control (Table 1). The factor scores were computed by summing the products of each item's score and its respective factor loading; both positive and negative loadings were considered when calculating the scores for each factor. Accordingly, the six items related to perceived safety and anxiety were combined into a single "Anxiety_Trust" measure. Due to the distinct characteristics of the methods-of-control questions, they were retained as three separate items. The results of the exploratory factor analysis demonstrated that the selected items for anxiety and trust, as well as methods of control, constitute a valid subset for measuring these constructs in the context of AV usage. To investigate the effect of the warning signal on the relevant acceptance scores, we conducted a Multivariate Analysis of Variance (MANOVA). This analysis examined the differences in the mean vector of the multiple dependent variables (Anxiety_Trust and the three methods-of-control scores) across the levels of the independent variable (the four conditions). The analysis showed significant effects of the conditions on the acceptance of autonomous driving. The multivariate tests for the intercept showed significant effects across all test statistics: Wilks' Lambda, Λ = 0.0694, F(4, 175) = 586.25, p < 0.0001; Pillai's Trace, V = 0.9306, F(4, 175) = 586.25, p < 0.0001; Hotelling-Lawley Trace, T² = 13.40, F(4, 175) = 586.25, p < 0.0001; and Roy's Greatest Root, Θ = 13.40, F(4, 175) = 586.25, p < 0.0001. These results confirm the robustness of the model. Additionally, the multivariate tests for the condition variable indicated significant effects: Wilks' Lambda, Λ = 0.8596, F(12, 463.29) = 2.27, p = 0.0083; Pillai's Trace, V = 0.1445, F(12, 531) = 2.23, p = 0.0092; Hotelling-Lawley Trace, T² = 0.1586, F(12, 301.96) = 2.30, p = 0.0081; and Roy's Greatest Root, Θ = 0.1194, F(4, 177) = 5.28, p = 0.0005. These findings suggest that the different conditions significantly affect the Anxiety_Trust measure of autonomous driving, as indicated by Wilks' Lambda. Post hoc analyses using Tukey's HSD test revealed significant differences among the experimental conditions for the Anxiety_Trust variable. Specifically, there were significant differences between the audio and base conditions (mean difference = 1.03, p = 0.003), between the base and audio-visual conditions (mean difference = 1.02, p < 0.001), and between the base and visual conditions (mean difference = 0.72, p = 0.046). No other pairwise comparisons for Anxiety_Trust were significant (p > 0.05) (for the full results, refer to Appendix A, Figures A3 and A4). For the Methods_of_Control_eye, Methods_of_Control_hand, and Methods_of_Control_feet variables, the post hoc tests indicated no significant differences between any of the experimental conditions (all p > 0.05) (Figure 9). These results suggest that all types of warning signals decrease anxiety and increase trust but do not impact the choice of control methods.
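The questionnaire pipeline (exploratory factor analysis, factor scoring, MANOVA, and Tukey's HSD) could be reproduced along the following lines; the file name, column names, and the choice of the factor_analyzer package are illustrative assumptions.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from statsmodels.multivariate.manova import MANOVA
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# q is assumed to hold the nine AVAM item scores plus a 'condition' column.
q = pd.read_csv("avam_responses.csv")  # hypothetical file
items = q.drop(columns=["condition"])

# Two-factor Varimax-rotated exploratory factor analysis.
fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(items)
print(fa.loadings_)  # which items load on the anxiety/trust factor

# Factor score: sum of each item's score times its loading on factor 1.
q["Anxiety_Trust"] = (items.values * fa.loadings_[:, 0]).sum(axis=1)

# MANOVA across the four conditions; the three methods-of-control items
# are kept as separate dependent variables, as in the text.
mv = MANOVA.from_formula(
    "Anxiety_Trust + Methods_of_Control_eye + Methods_of_Control_hand"
    " + Methods_of_Control_feet ~ condition", data=q)
print(mv.mv_test())  # Wilks, Pillai, Hotelling-Lawley, Roy statistics

# Post hoc pairwise comparisons for the Anxiety_Trust factor.
print(pairwise_tukeyhsd(q["Anxiety_Trust"], q["condition"]))
```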

Questionnaire and Behavioral Data Comparison
In the last stage of the analysis, we aimed to determine the extent to which self-reported measures, such as anxiety, trust, and preferred methods of control, correlate with participants' objective behavior during the experiment. To investigate this, we conducted Spearman's rank-order correlations to assess the relationship of reaction time and dwell time with the questionnaire scales from the factor analysis. The correlation between reaction time and the Anxiety_Trust factor was non-significant. Similarly, the correlations between reaction time and Methods_of_Control_eye, Methods_of_Control_hand, and Methods_of_Control_feet were non-significant, indicating no meaningful relationships between reaction time and these variables. A similar series of Spearman's rank analyses was conducted for dwell time, which also showed no significant correlations with the questionnaire measures (see Figure 10 for correlation scores and significance levels). Overall, these findings reveal that, in the context of semi-autonomous driving studies, self-reported measures of acceptance do not reliably correlate with users' actual behavior while driving. In the next step, we extended the correlation analysis to include the age and gender variables. The results showed no significant correlation between age and Anxiety_Trust, any of the methods of control, reaction time, or dwell time. Similarly, no significant correlations were found between gender and these variables. In addition to demographic variables, we assessed participants' VR experience, racing experience, and weekly driving hours. The analysis likewise revealed no significant correlations between these experience variables and the experimental variables (see Figure 10 for correlation scores and significance levels).
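A minimal sketch of such a rank-correlation matrix is shown below; the summary file and column names are hypothetical stand-ins for the per-participant measures.

```python
import pandas as pd
from scipy.stats import spearmanr

# One row per participant with behavioral means and questionnaire scores.
cols = ["reaction_time", "dwell_time", "Anxiety_Trust",
        "Methods_of_Control_eye", "Methods_of_Control_hand",
        "Methods_of_Control_feet", "age"]
data = pd.read_csv("participant_summary.csv")  # hypothetical file

rho, p = spearmanr(data[cols])  # full rank-correlation and p-value matrices
print(pd.DataFrame(rho, index=cols, columns=cols).round(2))
print(pd.DataFrame(p < 0.05, index=cols, columns=cols))  # flag p < 0.05 pairs
```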

Discussion
The aim of this experiment was to investigate the effect of unimodal and multimodal warning signals on reaction time and situational awareness during the take-over process by collecting visual metrics, motor behavior, and subjective scores in a VR environment. The results demonstrated distinct effects of audio and visual warning signals during the take-over control process. The visual warning signal effectively reduced reaction time and increased dwell time on the object of interest, indicating enhanced situational awareness. In contrast, the audio warning signal increased situational awareness but did not affect reaction time. The multimodal (audio-visual) signal did not exceed the effectiveness of the visual signal alone, suggesting its impact is primarily due to the visual component rather than the audio. Additionally, the positive impact of both unimodal and multimodal signals on successful maneuvering during critical events demonstrates the complex moderating role of these modalities in decreasing reaction time and increasing situational awareness. The results indicate that when visual warnings are present, the negative effect of longer reaction times on success is lessened. In other words, visual warnings, by providing more robust cues, help mitigate the adverse impact of longer reaction times during take-over. Furthermore, the impact of dwell time on success is moderated by the presence of visual or audio-visual warning signals, possibly because the warnings offer sufficient information or cues, reducing the necessity for additional dwell time for successful maneuvering. Finally, the positive impact of providing unimodal and multimodal warning signals on reducing anxiety and enhancing trust was confirmed. In conclusion, the various signal modalities impact reaction time and situational awareness differently, thereby influencing the take-over process, while their effect on the anxiety and trust components is evident but independent of situational awareness and reaction time.
Despite conducting our experiment in a realistic VR environment, this study has some limitations. In designing our study, we aimed to capture a diverse sample of participants by conducting the experiment in a public setting. This approach allowed us to include individuals with a wide range of ages, genders, driving experiences, and prior exposure to VR. Although our sample size was limited by the financial and logistical constraints of VR-based research, this diversity enhances the ecological validity of our findings, making them more representative of the general driving population. We used 12 critical events in succession despite past studies suggesting the importance of lowering their frequency [39]. This can negatively influence participants' experience by inducing motion sickness [62]. To mitigate possible side effects, the wooden car was designed to provide a greater feeling of immersion. Additionally, our choice of specific types of auditory and visual warning signals, as well as measurement methods, may limit direct comparisons with studies using different types of signals [63,64]. For the reaction time calculation, we considered three types of reactions (steering, braking, and acceleration), unlike most studies, which typically focus on one type or a combination [65]. Similarly, for the dwell time of the gaze, we accounted for the entire event duration, not just the time until the first gaze or response [33,36,66]. Despite these limitations, reaction time and situational awareness are commonly used metrics for evaluating warning signals in take-over processes in AVs [35], and the immersive VR environment enabled us to assess ecologically valid driver responses over a large and diverse population sample [47,48,67]. A crucial role of warning signals is their potential to mitigate distraction, particularly in situations involving fatigue and prolonged driving. However, the current experimental setup assessed visual and auditory warning signals within a limited time window. While our study offers insights into the immediate effects of visual and auditory warnings on driver behavior and stress, the responses were measured over a brief period during a 10-min session. Additionally, stress levels were assessed through a post-experiment self-report, which may not fully capture the physiological impact on drivers [68]. More research is required to investigate the effects of these warning signals on stress and fatigue during longer driving scenarios.
The take-over process consists of several cognitive and motor stages, from becoming aware of a potential threat to the actual behavior leading to complete take-over [19,23,69]. This procedure is influenced by various factors, and the effect of signal modality can be examined at each stage. Broadly, the procedure can be divided into two main phases, initial and latter, which overlap with situational awareness and reaction time, respectively. In the initial stage of the take-over process, stimuli such as warning signals alert the driver to the critical situation. The driver perceives this message and cognitively processes the transferred information [19,24]. Endsley's perception and comprehension components of situational awareness align with these initial stages of perceiving and cognitively processing information during the early part of the take-over process [25,70]. Despite the differences between auditory and visual signals [26,71], the current study aligns with previous findings demonstrating the ability of unimodal and multimodal signals to enhance situational awareness [20], thereby supporting the initial stages of the reaction process. While situational awareness characterizes the early stages of the cognitive and motor processes during take-over, reaction time characterizes the latter stages.
Following the initial stage, the subsequent stage involves decision-making and selecting an appropriate response, which is heavily dependent on previous driving experience. Thereafter, the driver's motor actions are calibrated and transition to a ready phase, involving potential adjustments to the steering wheel or pedals. The process culminates with the execution of the reaction required to manage the critical situation [24]. By definition, reaction time passes through two steps: comprehension and movement [19]. Comprehension reaction time overlaps with the concept of situational awareness, while movement reaction time refers to the latter stages of the take-over process, including calibration and the actual motor action. Our calculation of reaction time focused on measuring the actual hand or foot movements from the start of a critical event (including driver calibration) to the movement response. Our findings confirm that visual and audio-visual signals reduce reaction time to the same extent. The audio modality, however, was not able to extend its effect to the reaction time phases. Although the multimodal (audio-visual) signal helps decrease reaction time, it was not a significant moderator of reaction time in ensuring success. This is explainable when considering the late stages of the take-over process. Before movement calibration and the actual action, there is a decision-making stage in which visual warning signals are crucial [19,24]. A fundamental characteristic of visual warning signals is the type of information they convey [32]. Indeed, visual warnings can provide additional information compared to auditory signals, including spatial information (e.g., depth and speed) and distractor information (e.g., type and kind) [26,32,72]. This richer information enhances decision-making about the most appropriate type of reaction for the situation, compensating for the faster effect of auditory signals. While audio signals impact the early stages of take-over, visual signals provide valuable information that assists throughout the process.
Enhancing user experience is a critical aspect of the development and adoption of AV technology [34]. Therefore, the subjective perception of users regarding each aspect of the technology should be developed alongside technical advancements. The methodological approach of the current experiment, based on VR, places a high priority on realism while ensuring safety and inclusivity. This approach combines eye-tracking methods with hand-reaction measurements to comprehensively understand human reactions. To further emphasize realism, participant recruitment was designed to include a diverse group representing various ages, genders, and experiences. Additionally, the combination of behavioral data with subjective measures serves to bridge the gap between actual experiences and self-assessed experiences. The outcome of such a methodological approach will lead to user-friendly designs that are compatible with users' individual characteristics and specific needs. Further studies should delve deeper into extending subjective measurements to encompass the cognitive and behavioral processes that arise during driving.

Conclusions
In conclusion, this study presents a comprehensive investigation into the effects of auditory and visual warning signals on driver behavior in semi-autonomous vehicles, utilizing an immersive VR setup. The study enhances our understanding of the characteristics, benefits, and roles of auditory and visual signals, which can aid in designing HMIs that are compatible with drivers' perceptual and behavioral reactions during take-over requests. Building on previous research that often isolates these modalities, our study integrates them within a more complex and dynamic driving environment that is closer to real-world conditions. Thus, our research offers a more holistic and realistic assessment of how drivers interact with advanced vehicle systems. This approach is designed with drivers' cognitive capabilities and limitations in mind, potentially enhancing driver-vehicle cooperation and intention recognition. By providing new insights into how different sensory modalities can be optimized for driver alerts, our findings contribute to advancements in vehicle safety technology, with potential social implications such as improved safety and public trust in AV systems. These considerations are essential for creating a framework to address the challenge of integrating AVs into future transportation systems.

Figure 1. (a) The figure illustrates the four experimental conditions during semi-autonomous driving, with the green area depicting a critical event from start to end. During critical events, participants transitioned from "automatic drive" (hands off wheel) to "manual drive" (hands on wheel). In the no-signal condition, no signal prompted participants to take over vehicle control. In the audio condition, they would hear a "warning" sound, and in the visual condition, they would see a red triangle on the windshield. In the audio-visual condition, participants would hear and see the audio and visual signals together. (b) Aerial view of the implementation of a critical traffic event in the VR environment.

Figure 2. Exemplary images of the driver's view in the audio-visual condition during two different critical events. (a) Auditory and visual feedback alerting the driver at the beginning of a critical event. (b) The view of the windshield after the take-over, during the subsequent drive. The "warning" logo on the top left was not visible to participants.

Figure 3. Bird's-eye view of the LoopAR road map from start to end. Image from [56].

Figure 4. Two photographs illustrating the experimental setup during measurement. (a) Two participants sitting in the wooden car simulator wearing the VR Head-Mounted Display. (b) An experimenter ensuring correct positioning of the Head-Mounted Display and steering wheel. The photos are reprinted with permission from © 2023 Simone Reukauf.

Figure 6. The plot represents the coefficient estimates of the effect of warning signals and seeing objects of interest on success. The intercept represents the reference level (base condition). The horizontal x-axis represents the regression coefficient, which shows the strength and direction of the relationship between the predictors (each type of warning signal and seeing objects of interest) and the probability of success. The vertical y-axis lists the warning signal types. Each bar represents the estimated coefficient for the predictor. The horizontal lines through the bars (error bars) represent the 95% confidence intervals for the coefficient estimates. The red dashed line at 0 represents the null hypothesis (coefficient = 0).

Figure 7. This plot displays the distribution of gaze dwell time (y-axis) across the four conditions (x-axis). Each condition is represented by a violin plot, where the outer shape illustrates the probability density of dwell times. The embedded box plot highlights the median (central line) and interquartile range (box). Scatter points show individual data observations. The gray dashed line (*) connects conditions with no statistically significant differences.

Figure 8. This plot represents the coefficient estimates of the effects of warning signal, reaction time, and dwell time on success. The intercept represents the reference level (base condition). The horizontal x-axis represents the regression coefficient, which shows the strength and direction of the relationship between the predictor and the probability of success. The vertical y-axis lists the warning signal types. Each bar represents the estimated coefficient for the predictor. The horizontal lines through the bars (error bars) represent the 95% confidence intervals for the coefficient estimates. The red dashed line at 0 represents the null hypothesis (coefficient = 0).

Figure 9. Anxiety_Trust and methods-of-control analysis results. The plot shows the results of post hoc tests for each dependent variable (questionnaire subscales), comparing the mean scores at each level of the independent variable (condition). The error bars represent the standard error of the mean (SEM), illustrating the variability of each mean score across the different conditions.

Figure 10. Correlation matrix. The plot shows the results of the Spearman rank correlations between the behavioral, questionnaire, and demographic data. (*) indicates a significant correlation at the 0.05 level.

Figure A1. Complete logistic regression tables: synergy between warning signals and seeing objects of interest.

Figure A2. Complete logistic regression tables: identifying success factors in critical event maneuvering.

Table 1. Exploratory factor analysis results.
Note: Factor loadings and communalities for the Varimax-rotated factor solution for the nine Autonomous Vehicle Acceptance Model items (N = 182). The largest loading for each item on each factor is in bold.