applied

Featured Application: Using the key parameters identified in this paper, researchers can improve their designs of take-over request (TOR) user interfaces by increasing or decreasing the influence of each key factor, thus contributing to greater safety and comfort in conditionally automated vehicles.
Abstract: In conditionally automated driving, a vehicle issues a take-over request when it reaches the functional limits of self-driving, and the driver must take control. The key driving parameters affecting the quality of the take-over (TO) process have yet to be determined and are the motivation for our work. To determine these parameters, we used a dataset of 41 driving and non-driving parameters from a previous user study with 216 TOs, in which drivers performed a non-driving-related task on a handheld device in a driving simulator. Eight take-over quality aspects, grouped into pre-TO predictors (attention), during-TO predictors (reaction time, solution suitability), and safety performance (off-road drive, braking, lateral acceleration, time to collision, success), were modeled using multiple linear regression, support vector machines, M5', 1R, logistic regression, and J48. We interpreted the best-suited models by highlighting the most influential parameters that affect the overall quality of a TO. The results show that these are primarily maximal acceleration (88.6% accurate prediction of collisions) and the TOR-to-first-brake interval. Gradual braking, neither too hard nor too soft, applied as fast as possible seems to be the strategy that maximizes the overall TO quality. The position of the handheld device and the way it was held prior to TO did not affect TO quality. However, handling the device during TO did affect driver attention: shorter attention times were observed when drivers held their mobile phones in only one hand. In the future, automatic gradual braking maneuvers could be considered instead of immediate full TOs.


Introduction
The ever-advancing automation of vehicles is bringing more and more comfort to our daily migrations. However, along with convenience, many new challenges are being introduced. Recent studies show that in densely populated cities, children, the older population, and individuals with low income would be the main beneficiaries of fully automated vehicles, as they would improve their traveling possibilities [1,2]. For those who would benefit, the main concerns are accessibility, privacy, reliability, and safety of shared automated vehicles [3][4][5]. Insights showed that the elderly population is only willing to use a vehicle equipped with an automated driving system (ADS) if it is supervised by a human operator [6]. Therefore, remote control by a human driver is one of the possible countermeasures, although it is associated with a number of other safety concerns [7]. In terms of safety, Western European countries are more pessimistic than Asian countries [8]. Although potential users would be willing to pay more for a conditionally or fully automated vehicle [9,10], it seems unlikely that the future will be dominated by privately owned automated vehicles, as the total vehicle kilometers traveled could increase by as much as 60%, with more than half of these being zero-occupancy trips [11].
Getting back from the future, today's technology trends are mainly focused on conditionally automated driving, the automation level where the vehicle can drive on its own in some predefined environments (highway, cities with dedicated road infrastructure, sunny weather, etc.), but can request the driver's intervention at any time in case of unpredictable situations that cannot be handled by the internal algorithms. The Society of Automotive Engineers (SAE) defines this as Level 3 (L3) conditional driving automation [12]. During fully automated driving, the driver can engage in secondary tasks, such as checking mail, answering missed calls, etc. However, the driver must always be present in the driver's seat in case the vehicle issues a take-over request (TOR) that requires intervention within a limited time (usually within a few seconds [13]). Then, the driver is solely responsible for resolving the potentially critical situation.
Research by Tomasevic et al. showed that drivers are willing to engage in L3 automated driving if the perceived situations do not appear too complex [14]. Ensuring safe driving continuity is even more important in L3 automated vehicles, as more drivers tend to use their mobile phones or other handheld devices while driving, transforming their vehicles into mobile offices. Multitasking to mitigate the negative effects of travel time and to use travel time more efficiently for work or study is already present in conventional transport modes and is only expected to increase with the introduction of L3 vehicles [15,16].
If the driver is using a handheld device when a TOR is issued, the reaction time consists of periods for gaze switching, information gathering, and cognitive processing, which together result in gaining situational awareness, and a period for decision making [17]. According to Endsley's theory of situational awareness, one must be aware of a situation before any effective action could be taken [18]. Therefore, reduced engagement in driving tasks (due to the use of handheld devices, etc.) results in an additional period for handling the switch between non-driving and driving tasks, thus increasing the time needed to (re)gain situational awareness, to know what is going on around you. From the perspective of a vehicle user interface (UI) developer, it is of crucial significance to use an appropriate interface that makes it easy and efficient for the driver to put away the handheld device and regain situational awareness, e.g., by choosing the appropriate modality, display, timing, etc. [19][20][21].
Nowadays, almost everything in a vehicle is monitored. There are many sensors available, but how can they be used to help the driver regain situational awareness after a TOR? To determine the appropriate and important sensors and parameters, data mining and modeling approaches seem promising. Based on an accurate model of the most important parameters, both workflow and measurement optimization are possible. If the models explained the impact of a particular parameter on the overall process, the designer would know which part of the process needs improvement. Such an approach is already widely used in speech recognition and quality measurements [22,23]. In addition, determining key parameters during the take-over (TO) process would allow for improved driver profiling [24,25] or even the determination of the driver's fitness or unfitness to take over [26].

Related Work on Common Take-Over Parameters
In recent years, researchers have been trying to evaluate TOs and TO performance in many ways. One quality aspect that is common to all of them is the reaction time after a TOR (RT, sometimes referred to as take-over time) [27][28][29][30][31][32][33][34][35][36][37][38][39]. Some authors also distinguish between reaction time to first steering (movement of the steering wheel angle by more than 2 degrees) and reaction time to first braking maneuver (application of the brake pedal by more than 10%) [40][41][42]. In many recent studies, RT is further subdivided into attention time (until the first glance at the dashboard and/or the road), time until motor readiness is established (i.e., hands on the steering wheel, foot on the pedal), and time until a reaction is executed [43][44][45][46]. However, measuring RT alone is not sufficient to determine the overall quality of TO [47][48][49][50]. In fact, Gold et al. have shown that driver reactions are generally of poorer quality when they are faster [51]. To evaluate the TO quality, most researchers considered minimal time to collision (TTC), i.e., the minimum of the measured current times it would take a vehicle to collide with an obstacle, assuming constant speed and acceleration [27,30,39,43,50,52,53,54]. In addition, various measures of lane deviation are usually considered: steering wheel angle variability (standard deviation) [29,53,55], maximal lateral acceleration (deceleration) [30,39,43,54], standard deviation of lane position [31,39,44,56], distance to centerline [34], and slope of obstacle avoiding trajectory [50].
In terms of driver background, aging has been shown to most often negatively affect TO performance, i.e., increasing reaction time [24,29,30,35,53,70]. Other than aging, of the demographic data, only previous experience with TORs has shown a (positive) impact on TO performance [52].
To include all available parameters in a single metric, Agrawal and Peeta proposed the take-over performance index (TOPI), which consists of three parts: risk of collision, response intensity, and trajectory quality [52]. Similarly, Li et al. proposed a risk evaluation method based on the driving safety field [71], and Kolekar et al. proposed the driver risk field-based estimation as a prediction of driver perceived risk [72].

Related Work on Handheld Device Use in Conditionally Automated Vehicles
As we anticipate that the use of handheld devices, particularly mobile phones, will be the primary distraction factor in L3 conditionally automated vehicles, converting them into mobile offices, additional attention should be paid to effectively and efficiently manage the transition between this specific task, using a handheld device, and driving. Consistent with our intentions, Jazayeri et al. showed that mobile phone use was the only type of secondary task in L3 vehicles that caused significantly more near-crashes and crashes [73].
Many general studies indicate that mobile phone use while driving poses risks and a higher crash probability [74][75][76][77]. Yannis et al. found that using a touchscreen phone poses a higher risk than using a conventional phone for texting [75]. Oviedo-Trespalacios et al. reported that drivers performed better when making phone calls than when browsing or texting [78].
In contrast to Jazayeri et al. [73], Kaye et al. reported that mobile phone use in L3 vehicles did not negatively affect younger drivers compared to other cognitively demanding tasks [79]. Li et al. reported that the use of a heads-up display (HUD) reduced TO reaction time compared to looking at the mobile phone [80]. However, there are very few research reports available that specifically address handheld device use in L3 conditionally automated vehicles. Many of them use handheld devices such as mobile phones as a form of a secondary task to distract the driver, but almost none consider the specifics of this task, e.g., where and how does the driver hold the device, how do they drop the device, etc.

Related Work on Modeling in the Field of Driving Automation
Modeling, machine learning, and artificial intelligence approaches that are based on a large number of parameters (clustering, classification, regression, etc.) are not yet commonly used in the domain of take-overs during conditionally automated drives. However, early work that indicates their feasibility does exist.
Gold et al. have recently successfully used regression techniques to model driver reaction times, estimated time to collision, and collision probability [81]. Deng et al. developed a cognitive model based on a queue network (QN-ACTR) that predicts reaction time [82]. Zhang et al. found using a mixed linear model that the average reaction time of a driver is lower in more urgent or critical situations [83]. Deo and Trivedi proposed a long short-term memory (LSTM) network model for the continuous prediction of driver readiness [84]. Kuen et al. proposed a TOR agent that learns to issue TOR at the most appropriate time [85]. To be even more effective in determining the timing and type of TOR, Yang et al. introduced a 3D convolutional neural network to recognize drivers' current activity [86]. Similarly, Samani and Mishra used clustering techniques to assess risky driving styles after TO conditions [24].

Research Questions
From the related work on common take-over parameters (Section 1.1), it could be concluded that scientists have come to an agreement that measuring reaction time alone is not sufficient to assess the TO quality. Therefore, the quality aspects considered in our context also include measurements of reaction intensity, lane deviation, collision avoidance approach, and success. However, it is not yet clear which of the many driving-related and non-driving-related parameters have the greatest influence on these quality aspects and need to be additionally considered.
The aim of this study is to determine the parameters that have the greatest impact on the quality of a TO. For this purpose, an existing dataset from a driving simulator study [87] and proven machine learning algorithms that also allow a reliable interpretation were used. By determining key parameters, researchers will be able to further improve their designs of TOR user interfaces by knowing which parameters to focus on.
As far as the authors are aware, this is the first study to examine different ways of transition between handheld device use (specifically) and the driving task. Using the interpretable models, we hope to determine the effects of each possible action related to device handling at the moment of a TOR. Thus, the two research questions of this paper are:
• "Which parameters have the greatest impact on the quality of TO?" and
• "How does handheld device handling affect the TO process?"
In Section 2, we introduce a few basic and commonly used machine learning algorithms that will be considered later in the analysis. Then we briefly introduce the methodology: the dataset, the parameters, and the procedure used to build the models. In the Results section (Section 4), we present the created models according to TO quality aspects. In the last section, we propose possible interpretations of the results and conclude with suggestions for additional steps to further improve conditionally automated vehicles.

Data Mining
Nowadays, data models are typically built with various machine learning (ML) techniques. While numerous algorithms are already available [88,89], many more are under development at the time of writing. However, the ML algorithms perform differently in different use cases and target knowledge domains. Data mining is therefore still an "experimental science" where multiple algorithms have to be applied to data from the target domain and evaluated or compared to each other to find the one that provides better, more reliable, and more usable results [90].
Generally, a data mining scientist's task is to predict the value of an attribute based on models generated from an already available and labeled training dataset. Depending on the type of predicted attribute, a distinction is made between [90]:
• Classification: when the predicted attribute is nominal, e.g., did the vehicle encounter a collision or not, and
• Regression: when the predicted value is measured on a continuous scale (i.e., numerical), e.g., the predicted driver reaction time.
The basic algorithms that were considered in our analysis are briefly presented in the following subsections.

Baseline
As a starting method, it makes sense to check what performance we get if we do not apply any ML algorithm. The baseline is determined by calculating the mean (regression tasks) or mode (classification tasks) of the predicted attribute from the training dataset and predicting this value for each occurrence.

Multiple Linear Regression (MLR)
In multiple linear regression, the predicted attribute $\hat{Y}_n$ of an instance $n$ is calculated as the weighted sum of the other $K$ attributes $X_{k,n}$ of the same instance (1):
$$\hat{Y}_n = w_0 + \sum_{k=1}^{K} w_k X_{k,n} \quad (1)$$
The weights $w_k$ are iteratively determined in a way to minimize the root mean square error (RMSE) of the prediction $\hat{Y}_n$ against the observed attribute $Y_n$ of $N$ instances (2):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(\hat{Y}_n - Y_n\right)^2} \quad (2)$$
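As an illustration only, the baseline predictor and the quantities referred to as (1) and (2) can be sketched in a few lines of Python. The function names, the explicit intercept term, and the toy numbers are assumptions made for this sketch; they are not taken from the study or from WEKA.

```python
import math
from statistics import mean, mode


def baseline_predict(train_targets, nominal=False):
    """Constant baseline: mode for nominal targets, mean for numeric ones."""
    return mode(train_targets) if nominal else mean(train_targets)


def linear_predict(weights, intercept, x):
    """Weighted sum of the K attribute values of one instance.

    The intercept is an assumption of this sketch.
    """
    return intercept + sum(w * v for w, v in zip(weights, x))


def rmse(predicted, observed):
    """Root mean square error of predictions over N instances."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical reaction times in seconds, used only to exercise the functions.
observed = [2.0, 3.0, 4.0]
constant = baseline_predict(observed)
baseline_error = rmse([constant] * len(observed), observed)
```

Any learned model is then only useful to the extent that its error falls below `baseline_error`.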

Support Vector Machine (SVM) Regression
SVM is originally a classification technique based on sequential minimal optimization (SMO) [91]. Its main advantage over other classification algorithms is its ability to determine class boundaries (models) based on only a few instances from the training dataset that are close to the boundary (i.e., support vectors). Therefore, the produced models are usually simpler and more resilient to overfitting.
Although SVM is originally a classification technique, it is often extended to its regression version, SVM regression [92,93], to account for numerical data.

M5' (Nonlinear Regression)
Common examples of nonlinear regression algorithms are classical linear regression algorithms that additionally consider the weighted squares of the individual attributes, and trees of linear regression models. The M5' algorithm [94,95] is an example of a tree modeling algorithm that builds a separate linear model for each individual tree leaf. As with most regression algorithms, its particularly valuable property is that the resulting model can be interpreted naturally [81].

1R (Decision Based on Only One Attribute)
With many datasets, even very simple classification rules work very well [96]. An example of the simplest algorithm is 1R, an algorithm that considers only one attribute for prediction, the one that leads to the smallest error.
The algorithm is run once for each attribute. It determines which class is most often represented by each outcome (possible value) of an attribute and makes a prediction rule for this outcome of an attribute. At the end, the algorithm calculates the proportion of misclassified instances per attribute and selects a model, based on the attribute that has the lowest proportion of misclassifications.
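The procedure described above can be sketched as follows for nominal attributes. This is a minimal illustration and not WEKA's implementation; numeric attributes would first need to be discretized, and the attribute names and rows in the example are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(instances, class_key):
    """Return (attribute, rules, error_rate) for the single best attribute."""
    best = None
    attributes = [key for key in instances[0] if key != class_key]
    for attr in attributes:
        # Count how often each class occurs for every value of this attribute.
        counts = defaultdict(Counter)
        for row in instances:
            counts[row[attr]][row[class_key]] += 1
        # One rule per attribute value: predict its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in instances if rules[row[attr]] != row[class_key])
        error_rate = errors / len(instances)
        # Keep the attribute with the lowest proportion of misclassifications.
        if best is None or error_rate < best[2]:
            best = (attr, rules, error_rate)
    return best


# Hypothetical toy data, not taken from the study's dataset.
rows = [
    {"hands": "one", "modality": "audio", "collision": "yes"},
    {"hands": "one", "modality": "tactile", "collision": "yes"},
    {"hands": "two", "modality": "audio", "collision": "no"},
    {"hands": "two", "modality": "tactile", "collision": "no"},
]
attribute, rules, error_rate = one_r(rows, "collision")
```

In this toy example, the attribute `hands` separates the classes perfectly, so it is selected over `modality`.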

Logistic Regression
Linear regression can also be used to predict (classify) a binary attribute by establishing a meaningful threshold. If the regression value is above the threshold, an instance is classified into one class; otherwise, it is classified into another. However, a better approach to the same problem is binary logistic regression, which uses the logit transformation of (1) to directly compute the weighted probability $P_n$ that an instance $n$ belongs to a particular class [97] (3):
$$P_n = \frac{1}{1 + e^{-\left(w_0 + \sum_{k=1}^{K} w_k X_{k,n}\right)}} \quad (3)$$
The weights are determined to maximize the log-likelihood $\log(L)$ considering classes $Y_n \in \{0, 1\}$ and attributes $X_n$ of $N$ instances from the training dataset (4):
$$\log(L) = \sum_{n=1}^{N} \left[ Y_n \log P_n + (1 - Y_n) \log(1 - P_n) \right] \quad (4)$$
Multinomial logistic regression for more than two class values is usually performed by calculating separate logistic regression models for each class value. The instances are then classified into the class with the highest probability for each instance.
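To make the two quantities concrete, the following sketch computes the class probability and the log-likelihood for given weights. The function names, the intercept, and the 0/1 class coding are assumptions of this sketch; the fitting step itself (the search for the weights) is deliberately left out.

```python
import math


def logistic_probability(weights, intercept, x):
    """Probability that one instance belongs to the positive class."""
    z = intercept + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(weights, intercept, X, y):
    """Log-likelihood of 0/1 class labels y given the attribute rows X."""
    total = 0.0
    for x, label in zip(X, y):
        p = logistic_probability(weights, intercept, x)
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return total


# Hypothetical one-attribute data: the positive class has the smaller value.
X = [[1.0], [2.0]]
y = [1, 0]
uninformed = log_likelihood([0.0], 0.0, X, y)   # every probability is 0.5
informed = log_likelihood([-2.0], 3.0, X, y)    # weights that separate the classes
```

A fitting procedure would search for the weights that make this value as large as possible; here `informed` is larger than `uninformed`.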

J48 Tree
Classification methods based on decision trees usually work on the principle of "divide et impera," recursively from top to bottom. C4.5 is a commonly used decision tree algorithm, developed by Quinlan in 1993 [98]. J48 is the name of its implementation in Java, which we used in our analysis.
The algorithm repeats the following steps:

1. Choose an attribute for the node and branch (make a new branch for each possible value of this attribute).
2. Assign the dataset instances to the corresponding branches.
3. Repeat the procedure for each branch until all instances in the branch correspond to the same class value.
The hardest part of the algorithm is deciding which attribute to choose for the node. A general principle is that it makes sense to choose a node that divides the dataset into the "cleanest" branches possible, to obtain the smallest tree (to obtain more information). The tree algorithm in the first step of the iterative process thus calculates the entropy that would be obtained by selecting an attribute for the node. To reduce complexity at the expense of errors, the recursive procedure could be stopped before all instances in a branch correspond to the same class value (known as pruning).
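The entropy-based choice of a split attribute can be illustrated with the short sketch below. It computes plain information gain for nominal attributes. C4.5 additionally normalizes this value (gain ratio), which is omitted here for brevity. The function names and toy values are assumptions of this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on one nominal attribute."""
    n = len(labels)
    branches = {}
    for value, label in zip(values, labels):
        branches.setdefault(value, []).append(label)
    # Weighted average entropy of the branches created by the split.
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder


# Hypothetical class labels and two candidate attributes.
labels = ["yes", "yes", "no", "no"]
clean_split = information_gain(["l", "l", "r", "r"], labels)
useless_split = information_gain(["l", "r", "l", "r"], labels)
```

The attribute that yields the "cleanest" branches has the highest gain and would be chosen for the node.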

Other
We are aware that more powerful ensemble learning techniques (such as random forests [99]) could often provide even better modeling results in terms of regression or classification accuracy, but they can be hard and sometimes even impossible to interpret or analyze [90] and are therefore not considered in the scope of this paper.

Materials and Methods
We used a dataset provided by Gruden et al. [87]. They conducted a user study in which they placed drivers in a driving simulator featuring a conditionally automated vehicle. Drivers were instructed to play a game on their smartphone and take over when a take-over request (TOR) was issued by the vehicle.
The aforementioned user study that produced the dataset tested drivers' reactions to take-over requests of different modalities and types. Therefore, we believe that the dataset as a whole represents a general take-over procedure that can sometimes be initiated by an auditory user interface or in other cases by a tactile user interface, either with or without directional cues, etc.

Dataset Overview
A conditionally automated vehicle drove about 20 km on a foggy three-lane highway. While in fully automated mode, it was always in the middle lane and driving at 110 km/h. Meanwhile, drivers were engaged in a secondary visuomotor task: playing a "Tetris" game on a smartphone. The task was mentally demanding, but not time-critical (the player would not fail just for doing nothing). The vehicle issued a take-over request 6 s before the potential impact with the obstacle, i.e., 183 m before the obstacle, as suggested by Shi et al. [100].
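The stated distance follows directly from the speed and the time budget given above, assuming the vehicle keeps a constant speed until the obstacle:

```latex
v = 110\ \mathrm{km/h} = \frac{110}{3.6}\ \mathrm{m/s} \approx 30.56\ \mathrm{m/s},
\qquad
d = v \cdot t \approx 30.56\ \mathrm{m/s} \times 6\ \mathrm{s} \approx 183\ \mathrm{m}
```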
Each of the 36 drivers participated in 6 take-over events where the road was either partially or completely blocked, resulting in a dataset with a total of 216 take-over instances.
In 144 instances (two-thirds), drivers were able to avoid the roadblock by traveling in the third available lane because the obstacle occupied only two of the three highway lanes. In the remaining 72 instances, a full roadblock was in place, requiring drivers to stop their vehicles completely to avoid a collision.
For more information about the dataset, see the referenced article [87].

Attributes
The dataset consists of 41 demographic, driving-related, and other observed take-over attributes. Among the variety, we focused mainly on observations of how the driver handled the transition from the secondary back to the primary task and what they did with the handheld device (and how). In addition, all attributes were grouped chronologically into pre-TO predictors, during-TO predictors, and safety measures. The attributes we used for the generation of data models are listed in Table 1.
ID | Attribute | Type | Values
 | Driving frequency | nominal | "every day", "few times a week", "few times a month", "few times a year"
A7 | Simulator experiences | nominal | "never", "once", "a few times", "many times"
A8 | Gaming experiences | nominal | "never", "once a week", "few times a week", "every day"
A9 | VR experiences | nominal | "never", "once", "a few times", "many times"
A10 | Motion sickness while reading during driving | nominal | "no", "yes, after some time", "yes"
The darker rows present the attributes that, according to our results, most strongly influence the TO quality. * This table also includes the predicted attributes that represent the criteria for different TO quality aspects; they start with the letter "P". When generating models for predicting an attribute, we only consider the attributes that are in the table above the predicted attribute, as they are listed chronologically.
As the overall quality of a take-over is difficult to assess, multiple attributes, TO quality aspects (QAs), could be modeled. Because related work suggests that considering reaction time alone is not sufficient to validly assess the TO quality, we also modeled measures of reaction intensity, lane deviation, approach to collision, and success (TO without collision). The predicted attributes were selected so that each category of quality aspects that are often found in related work is predicted at least once by our models. The complete list of predicted attributes is given in Table 2.

Data Analysis
Using the ML methods described in Section 2, we built various TO quality models to better explain each of the eight studied TO quality aspects (QAs) from Table 2. Of all the attributes listed (Table 1), we want to expose those that have the most influence on each quality aspect. If we knew which factors have the greatest influence, we could try to increase or decrease the influence of each individual key factor by appropriate UI and interaction design, and thus contribute to greater safety and comfort while driving in a conditionally automated vehicle.
For the regression tasks (TO quality aspects P1, P2, P6, and P7 as described in Table 2), we compared the root relative squared error (RRSE). Squared error was used for comparison so that extreme error values have a greater impact on the results; it was rooted to reduce the error to the same dimension as the predicted attribute and presented relative to the error of a baseline predictor (mean) [90]. We tried the following algorithms and selected the one with the lowest RRSE:
• Baseline (mean);
• Multiple linear regression;
• SVM regression;
• M5'.
In classification tasks (TO quality aspects P3, P4, P5, and P8), we compared the classification accuracy of the different models. When the data were not equally distributed between the classes (TO quality aspects P3 and P4), we used the area under the ROC (receiver operating characteristics) curve (AUC) instead, as it also considers selectivity (capability of distinguishing between classes) besides accuracy [90]. The ROC curve is a graph plotting true positive rate against false positive rate for all thresholds, showing the performance of a classification model. The area under the curve (AUC) therefore provides aggregated statistics across all classification thresholds. We tried the following algorithms and selected the one with the largest accuracy or AUC:
• Baseline (mode);
• 1R (based on only one attribute);
• J48 tree;
• Logistic regression.
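Both comparison measures can be sketched compactly. The RRSE below uses the mean of the observed values as the baseline predictor, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one). Both choices, as well as the function names and toy values, are assumptions of this sketch and may differ in detail from WEKA's evaluation code.

```python
import math


def rrse(predicted, observed):
    """Root relative squared error in percent, relative to a mean predictor."""
    mean_obs = sum(observed) / len(observed)
    model_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    baseline_error = sum((mean_obs - o) ** 2 for o in observed)
    return 100.0 * math.sqrt(model_error / baseline_error)


def auc(scores, labels):
    """Area under the ROC curve from pairwise score comparisons (ties count 0.5)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical values: predicting the mean everywhere gives an RRSE of 100%.
mean_model_rrse = rrse([3.0, 3.0, 3.0], [2.0, 3.0, 4.0])
partial_auc = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

An RRSE below 100% therefore means the model beats the mean predictor, and an AUC above 0.5 means it ranks instances better than chance.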
The numerical attributes were Z-normalized before being used in the algorithms so that their weights could be compared with each other, corresponding to the impact of the attribute on the overall result [104]. The models were generated using the open-source software WEKA 3.8 [105]. For each of the eight predicted attributes, 100 models were created using each of the four relevant ML algorithms (regression or classification), as each ML algorithm was evaluated 10 times with a 10-fold cross-validation. A total of 3200 models were created and evaluated. Results were compared using paired-sample t-tests. Unless otherwise stated, an alpha level of α = 0.05 was used.
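A minimal sketch of the normalization step and of the model count is given below. Whether the population or the sample standard deviation is used is an assumption of this sketch (the population form is shown), and the function names are illustrative only.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def total_models(quality_aspects=8, algorithms=4, repetitions=10, folds=10):
    """Number of models built: one per aspect, algorithm, repetition, and fold."""
    return quality_aspects * algorithms * repetitions * folds


# Hypothetical attribute values, used only to exercise the function.
normalized = z_normalize([1.0, 2.0, 3.0])
model_count = total_models()
```

After this rescaling, a larger absolute weight in a linear model can be read as a stronger influence of that attribute, because all attributes share the same scale.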

Results
To determine the key parameters of conditionally automated driving, we modeled eight different take-over quality aspects (Table 2) using the 41 observed TO attributes (Table 1) from a dataset, collected in a previous user study in a driving simulator. By interpreting the best-suited models, we were able to highlight the most influential parameters that affect the overall quality of a TO. In the following subsections, we present the results of eight TO quality models corresponding to the eight studied TO quality aspects. The results are grouped into pre-TO predictors, during-TO predictors, and safety measures. The last subsection provides a summary list of all highlighted parameters by impact.

Pre-Take-Over Predictors

Attention
As a take-over request (TOR) could be issued while the driver is performing a secondary task, a short period for gaining attention, switching gaze back to the road, is added to the reaction time prior to cognitive processing. If one wants to shorten the reaction time, it would also be reasonable to shorten the period for gaining attention, e.g., by using the proper modality for the request. Drivers' attention was measured from the moment of a TOR until the first glance on the road. The detection was performed with eye-tracking glasses.
We modeled attention (P1) with regression algorithms using Attributes A1-A18. We achieved the lowest RRSE with multiple linear regression (RRSE = 88.11%) (see Table 3). The results were significantly better (p = 0.001) than when using the baseline model with mean attention time as a predictor variable but comparable to the other two models. Predicted attention is modeled as a linear combination of four independent variables: TOR modality (A13), repetition number (A14), attention before reaction (A18), and handheld device handling strategy (A17). The histogram of measured attention times with regard to handheld device handling strategy is plotted in Figure 1.

Reaction Time
The most often studied TO quality indicator is reaction time, also commonly referred to as take-over time (TOT). A shorter reaction time gives the driver more time to better resolve (maneuver) a potentially critical situation.
We predicted the reaction times (P2) using Attributes A1-A22 and P1. The histogram of the measured reaction times is plotted in Figure 2a. We achieved the best results with a tree of linear regression models, the M5' algorithm (RRSE = 43.57%), which performed significantly better than the baseline (p < 0.001), MLR (p = 0.001), or SVM (p = 0.041) models (see Table 4). The model tree is presented in Figure 3. For shorter TOR-to-first-brake intervals (A21), the linear regression model (LM) 1 is used, whereas for longer TOR-to-first-brake intervals, LM2, LM3, and LM4 are considered based on the TOR-to-first-steer interval (A22). LM1 depends mainly (91%) on A21 itself, and similarly, LM2 depends mainly (89%) on A22 itself. In LM3, attention before reaction (A18) is exposed as the most informative parameter, revealing that drivers who directed their gaze to the road first had a delayed reaction (longer reaction time). LM4 shows that for the drivers who took longer to brake or steer, reaction time depended on the type of situation (A11: reactions were faster when the driver had to steer to the left) and visibility conditions (A20: reactions were faster when the obstacle was already visible).

Solution Suitability
Suitability refers to the driver's initial reaction and the type of situation: the proposed steering direction or braking request. When the TOR was directional (i.e., stimuli from the left or right), the driver was expected to recognize the directional instruction and steer away from the estimated obstacle position. Otherwise, the driver was expected to reduce speed, i.e., brake.
Figure 2. Histograms of predicted attributes: (a) measured reaction times, where the solid black bar (at reaction time equal to zero) represents the three occurrences where drivers were already driving manually when a TOR was issued; (b) solution suitability, i.e., whether the driver reacted according to the TOR stimuli (steered in the proposed direction or braked when steering was not proposed).

We predicted the solution suitability (P3) using Attributes A1-A22 and P1-P2. The histogram of solution suitability is plotted in Figure 2b. The automatic classification of drivers' correctness seems to be a more complex problem than expected. Although the best results among the four classification algorithms were obtained by binary logistic regression (AUC = 0.67), we were unable to build a model that is significantly better (p = 0.079) than the baseline prediction that every reaction is correct (see Table 5).

Off-Road Drive
Drivers' inability to perform a smooth and efficient take-over can also result in unwanted situations where the vehicle goes off the road. Off-road drive was recorded when the center of the vehicle was off the driving area. In situations featuring a full roadblock, driving off the road may sometimes be even better than staying on the road. Therefore, we only considered the dataset instances where the type of situation (A12) was "steering", when the obstacle could be avoided by only changing lanes.
We predicted the off-road drive (P4) using Attributes A1-A11, A13-A33, and P1-P3. The histogram of off-road drive is plotted in Figure 4c. Two of the classification algorithms used showed statistically comparable results: J48 with AUC = 0.75 and the slightly better binary logistic regression with AUC = 0.84, both statistically outperforming (p < 0.001) the baseline assumption that there were no off-road drives (see Table 6). Odds ratios for off-road drive during a TO determined by binary logistic regression are presented for each attribute in Table 7.

Table 7. Odds ratios for off-road drive (excerpt): TOR-to-first-brake interval, 1.40; A31 maximal brake pedal press, 3.5 × 10⁻³; P3 solution suitability, 0; A30 maximal jerk, 0.
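To make the odds ratios easier to read, the following minimal sketch shows how an odds ratio from a binary logistic regression rescales a probability. The baseline probability of 0.10 is a hypothetical value chosen only for illustration, and the function name is ours. The unit of the attribute is whatever unit the dataset uses.

```python
def apply_odds_ratio(p0: float, odds_ratio: float, delta: float = 1.0) -> float:
    """Return the probability after raising an attribute by `delta` units.

    p0 is the probability before the change; odds_ratio is exp(coefficient)
    of the attribute in a binary logistic regression.
    """
    odds0 = p0 / (1.0 - p0)               # convert probability to odds
    odds1 = odds0 * odds_ratio ** delta   # odds scale multiplicatively
    return odds1 / (1.0 + odds1)          # convert odds back to probability


# Hypothetical baseline probability of an off-road drive (not from the study).
p_base = 0.10
# Odds ratio reported for the TOR-to-first-brake interval.
p_new = apply_odds_ratio(p_base, 1.40)
print(round(p_new, 4))  # 0.1346
```

An odds ratio above 1 raises the odds of an off-road drive as the attribute increases, whereas values close to 0 indicate that larger attribute values are associated with almost no off-road drives.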

Brake Application
In their study, Gold et al. noticed that the TO quality differed between drivers who applied brakes during TO and those who did not [81]. As we tend to encourage drivers to reduce their speed, we are interested in which attributes increase the likelihood of braking.
We predicted the brake application (P5) using Attributes A1-A33 and P1-P4. The histogram of brake application is plotted in Figure 4d. A rather complex decision tree to predict brake application was produced with the J48 algorithm. The accuracy (percent correct) of the tree was 86.8%, which is significantly better (p < 0.001) than the baseline assumption that everyone applied the brake (see Table 8). The produced tree consisted of 10 decision rules and 14 leaves with 3-62 nodes. It included many correlated factors and was therefore too structurally complex to interpret in a simple way. Altogether, in 66 out of 216 TO events (30.6%), the drivers did not apply the brakes to reduce their cruise speed.
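The baseline can be related to the reported counts with a short calculation. This is a sketch under the assumption that the baseline is evaluated on all 216 events at once; the mean scores in Table 8 may have been obtained with a resampling scheme, so the tabulated baseline can differ slightly.

```python
total_events = 216      # TO events in the dataset
no_brake_events = 66    # events in which the brake was not applied

# Baseline: predict "brake applied" for every event.
baseline_accuracy = (total_events - no_brake_events) / total_events
j48_accuracy = 0.868    # accuracy reported for the J48 tree

print(f"baseline: {baseline_accuracy:.1%}")                  # baseline: 69.4%
print(f"gain: {(j48_accuracy - baseline_accuracy) * 100:.1f} points")  # gain: 17.4 points
```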

Lateral Acceleration
Lateral accelerations are often considered part of the TO quality metric. Higher lateral accelerations pose a safety risk to the driver and the vehicle, as the vehicle can easily become unstable. In situations with heavy traffic, sudden lateral accelerations also pose a greater risk of collision with vehicles in adjacent lanes.
Examination of the study's video recordings indicated that the maximal lateral acceleration could exceed 10 m/s² in some extreme cases. This occurred mostly when drivers lost control of the vehicle due to oversteer, causing the vehicle to drift perpendicular to the lane and eventually hit the roadblock with the side.
We predicted the maximal lateral accelerations (P6) using A1-A33 and P1-P5. The histogram of the measured maximal lateral accelerations is plotted in Figure 4a. The lowest RRSE = 85.4% was obtained with support vector machines (see Table 9). It was also the only model that was significantly better (p = 0.005) than the baseline assumption (predicting the mean). The five attributes with the highest absolute weights are maximal steering wheel speed (A33), weight: 0.20; maximal jerk (A30), weight: −0.18; maximal steering wheel angle (A32), weight: 0.16; age (A1), weight: 0.10; TOR-to-first-brake interval (A21), weight: 0.09.

Table 9. Results of the TO quality models for predicting lateral acceleration (mean scores).

Minimal Time to Collision
Calculating minimal TTC is a well-established road-safety measurement [106]. It is determined as the minimum of all discrete TTCs measured between the moment of TOR and the moment of passing the roadblock. A current TTC is the time budget until collision assuming constant vehicle speed and acceleration. The longer the minimal TTC, the safer the take-over.
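The definition of minimal TTC can be sketched as follows. This is an illustrative reading of the definition, not the computation used in the study: it assumes a stationary obstacle, a known remaining gap in meters, and samples given as (gap, speed, acceleration) tuples. All function and parameter names are ours.

```python
import math


def ttc(gap_m: float, speed_mps: float, accel_mps2: float = 0.0) -> float:
    """Time until the gap closes, assuming constant speed and acceleration.

    Solves gap = v*t + 0.5*a*t**2 for the smallest positive t and returns
    math.inf when the vehicle never reaches the obstacle.
    """
    if gap_m <= 0:
        return 0.0
    if abs(accel_mps2) < 1e-9:
        # No acceleration: plain distance over speed.
        return gap_m / speed_mps if speed_mps > 0 else math.inf
    disc = speed_mps ** 2 + 2.0 * accel_mps2 * gap_m
    if disc < 0:
        # Braking stops the vehicle before the obstacle.
        return math.inf
    t = (-speed_mps + math.sqrt(disc)) / accel_mps2
    return t if t > 0 else math.inf


def minimal_ttc(samples) -> float:
    """Minimum of the discrete TTCs over (gap, speed, acceleration) samples."""
    return min(ttc(gap, v, a) for gap, v, a in samples)


# Hypothetical samples recorded between the TOR and passing the roadblock.
samples = [(50.0, 25.0, 0.0), (30.0, 20.0, 0.0), (20.0, 15.0, -10.0)]
print(minimal_ttc(samples))  # 1.5
```

In the third sample the deceleration is strong enough for the vehicle to stop before the obstacle, so that sample does not limit the minimal TTC.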
We predicted the minimal TTC (P7) using Attributes A1-A33 and P1-P6. The histogram of measured minimal TTC is plotted in Figure 4b. The regression model with the lowest RRSE = 66.1% for the approximation of the minimal TTC was created with SVM (see Table 10). It proved to be significantly better (p < 0.001) than the baseline, but not signifi-
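The RRSE values reported for the regression models can be read against the mean-predicting baseline, which scores 100% by construction. The sketch below uses the common textbook definition of root relative squared error. The data are hypothetical, and evaluation toolkits may compute the reference mean from the training folds, so reported values can differ slightly from this in-sample version.

```python
import math


def rrse(actual, predicted) -> float:
    """Root relative squared error relative to predicting the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    return math.sqrt(model_error / baseline_error)


# Hypothetical measured values and predictions (not taken from the study).
actual = [1.0, 2.0, 3.0, 4.0]
mean_baseline = [2.5] * 4
model = [1.5, 2.0, 3.0, 3.5]

print(f"{rrse(actual, mean_baseline):.1%}")  # 100.0%
print(f"{rrse(actual, model):.1%}")          # 31.6%
```

Values below 100% therefore indicate a model that improves on always predicting the mean, and lower values indicate a better fit.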

Overall Success
The driver successfully takes over the vehicle if they can normally continue with the drive after the critical situation is resolved. Usually, a collision is treated as an unsuccessful take-over. We predicted the overall success (P8) using every other attribute (A1-A33 and P1-P7). The histogram of the overall success is plotted in Figure 4e. The 1R (p = 0.002) and binary logistic regression (p = 0.005) models both outperformed the baseline assumption; however, the differences in classification accuracy among algorithms were not statistically significant (p = 0.877) (see Table 11). Therefore, it is reasonable to use the simplest model for interpretation, which was produced by 1R with an average accuracy of 88.6%. According to the model (see Figure 5), a collision does not occur if the maximal achieved acceleration (deceleration) is between 6.6 m/s² and 12 m/s².

Figure 5. Model for predicting overall success, created using the 1R algorithm. The prediction depends only on maximal acceleration (deceleration) during TO, which should be between 6.6 m/s² and 12 m/s² to avoid collision.
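The 1R model is a single-attribute rule and can be written down directly. In this minimal sketch the interval bounds come from the model described above, whereas treating both bounds as inclusive and the function name are our assumptions.

```python
def predict_no_collision(max_decel_mps2: float,
                         lower_mps2: float = 6.6,
                         upper_mps2: float = 12.0) -> bool:
    """One-rule prediction of overall success from maximal deceleration.

    Returns True (no collision predicted) when the maximal deceleration lies
    within the interval; bounds are assumed inclusive.
    """
    return lower_mps2 <= max_decel_mps2 <= upper_mps2


# Hypothetical maximal decelerations in m/s².
for value in (3.0, 8.0, 13.0):
    print(value, predict_no_collision(value))
```

Braking that is too weak (below the interval) or too strong (above it) is predicted to end in a collision, which matches the interpretation of moderate braking as the successful strategy.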

Determined Parameters
According to the results of the TO quality models, the following attributes contribute to more than one TO quality aspect (QA):
• TOR-to-first-brake interval (three QAs);
• Maximal acceleration (three QAs);
• Correct solution of critical situation (two QAs);
• Attention before reaction (two QAs);
• Reaction before the obstacle became visible (two QAs);
• Reaction time (two QAs);
• Maximal jerk (two QAs).
Additionally, other parameters contribute to at least one QA. The resulting parameters are also highlighted with a dark background in Table 1. The darker the color, the more TO QAs are affected by the parameter.

Discussion
In this study, we modeled 8 TO quality aspects based on 41 parameters and determined those that have the greatest impact on TO quality. In addition, we also investigated how the handling of the handheld device used to perform a secondary task affects the TO process. The following subsections discuss the results and their practical implications.

Pre-TO Predictors
The models exposed the correct solution of the critical situation (i.e., the type of situation) and attention prior to a reaction among the predictors that preceded a TO. Faster reaction times were observed when the drivers had to steer to the faster lane (left) to avoid the roadblock, but, on the other hand, those situations also resulted in more off-road drives. Therefore, we would like to emphasize the importance of considering the type of situation when designing TO interaction or evaluating TO quality.
Unlike Wu et al. [29], Körber et al. [30], and Peng et al. [53], we found only a minor effect of the driver's age and no effect of any other demographic parameters on the TO quality.

During-TO Predictors
If a vehicle also monitors the driver's responses during the TO process, the HVI could be continuously adjusted to increase the TO quality. Among the parameters that could be measured during TO, our models exposed the TOR-to-first-brake interval, reaction time, and whether a reaction was made before the obstacle became visible. In agreement with Jing et al. [40], Wandtner et al. [41], and Li et al. [42], it might be beneficial for researchers to distinguish between the reaction times of drivers' first steering reactions and those of their first braking reactions. For example, if the driver reacted quickly, but only by steering, a customized interaction plan could leave longitudinal control of the vehicle to the automation (gradual braking) instead of assuming a full take-over.
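The idea of a customized interaction plan can be sketched as a small decision function. Everything in it is hypothetical: the plan names, the 1.0 s threshold for a "quick" steering reaction, and the fallback of keeping the TOR active are illustrative assumptions, not results of this study.

```python
from typing import Optional


def choose_interaction_plan(steer_rt_s: Optional[float],
                            brake_rt_s: Optional[float],
                            quick_rt_s: float = 1.0) -> str:
    """Pick a hypothetical interaction plan from the two reaction times.

    None means that the corresponding reaction has not been observed.
    """
    if brake_rt_s is not None:
        # The driver already manages speed, so a full take-over is assumed.
        return "full_take_over"
    if steer_rt_s is not None and steer_rt_s <= quick_rt_s:
        # Quick steering without braking: keep gradual braking automated.
        return "automatic_gradual_braking"
    # No usable reaction yet: keep the take-over request active.
    return "keep_tor_active"


print(choose_interaction_plan(0.8, None))   # automatic_gradual_braking
print(choose_interaction_plan(0.8, 1.2))    # full_take_over
print(choose_interaction_plan(None, None))  # keep_tor_active
```

Keeping the steering and braking reaction times as separate inputs is what makes such a partial take-over decision possible in the first place.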
By examining individual TO quality aspects, we observed that although drivers' attention (P1) was delayed when they decided to brake (react) before looking at the road, it later turned out that the TO quality was generally better when they reduced speed in a timely manner. The linear models of reaction time (P2) showed faster reactions for drivers who applied the brake pedal to take over the vehicle (with or without simultaneous steering). The TOR-to-first-brake interval was also positively correlated with maximal lateral acceleration (P6). In many ways, the SMO model of minimal TTC (P7) also showed that larger minimal TTC can be achieved with shorter reaction times. As the results show, TOR-to-first-brake interval seems to be one of the most influential TOR parameters.

Safety Measures
In addition to predictions during the TO process, which focus on monitoring the intervals between the TOR and the TO, some safety performance parameters could also be monitored. If the most exposed parameters, i.e., maximal acceleration (deceleration) and maximal jerk, exceeded a certain threshold, additional actions such as warnings or automatic maneuvers could be performed by the vehicle, regardless of whether the driver performs a TO or not. For example, Minderhoud and Bovy [103] proposed that additional warnings should be issued every time the minimal TTC (P7) drops below three seconds.
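A threshold-based warning of this kind can be sketched as a detector of downward crossings in a stream of TTC samples. The three-second threshold follows the proposal cited above. The sampling scheme, the function name, and the choice to warn once per crossing rather than on every sample below the threshold are our assumptions.

```python
def ttc_warning_indices(ttc_samples_s, threshold_s: float = 3.0):
    """Return the sample indices at which the TTC drops below the threshold.

    A warning is raised once per downward crossing, not for every sample
    that stays below the threshold.
    """
    indices = []
    below = False
    for i, ttc_s in enumerate(ttc_samples_s):
        if ttc_s < threshold_s and not below:
            indices.append(i)
        below = ttc_s < threshold_s
    return indices


# Hypothetical TTC trace in seconds.
trace = [5.0, 3.5, 2.9, 2.5, 3.2, 2.8]
print(ttc_warning_indices(trace))  # [2, 5]
```

The same pattern could be applied to maximal deceleration or jerk, although suitable thresholds for those parameters would first have to be determined.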
The simple model of overall success (no collision, P8) suggests the importance of moderate braking (moderate maximal acceleration). It should be neither too strong, which could be the consequence of a late response, nor too weak, which could be a possible consequence of an inadequate response or no response at all. Separately, the analysis of off-road driving (P4) revealed that drivers with strong accelerations/decelerations also had the highest probability of off-road driving. This suggests a correlation among maximal acceleration (A29), off-road driving (P4), and overall success (P8). On the other hand, no off-road driving was observed with drivers who braked fast enough (with a larger maximal jerk), did so appropriately (not too hard), or responded correctly to the TOR.
It seems somewhat surprising that measurements of lane deviation were not pointed out more often by our models. In contrast to Gold et al. [43] or Dillmann et al. [55], lane deviation measurements do not seem to affect the TO quality as much as the maximal acceleration or braking reaction time do. However, maximal lateral acceleration (P6) is most strongly influenced by the fastest steering wheel rotation. The recorded fast steering wheel turns may have been the result of reckless reactions by drivers. It is debatable whether preventing fast steering wheel turns would benefit the overall quality of a TO (e.g., by stiffening the steering wheel, as lane-keeping systems do when the driver manually changes lanes without signaling).
To finally answer our primary research question (which parameters have the greatest impact), the TOR-to-first-brake interval and maximal deceleration seem to have the greatest impact on the quality of a TO. Therefore, to improve the TO quality, a good TOR UI should provoke quick braking reactions (although not necessarily stronger ones, since excessive braking can negatively affect some TO quality aspects, such as off-road drive or overall success) or, even better, include a controlled automatic braking maneuver. Somewhat generalized, one could say that the best option for the driver would be gradual braking immediately after TOR to gradually reduce speed. Braking, and thereby reducing speed, reduces both the overall reaction time and the maximal lateral acceleration, and increases the minimal TTC. Furthermore, this could be done by an automatic braking maneuver.

Handheld Device Handling
The dataset contains three attributes related to the use of a handheld device during a take-over: A15, the driver's hand position before TOR; A16, the handheld device position (height) before TOR; and A17, the handheld device handling strategy. Our model of attention time only showed the importance of different strategies for handling the device (types of task switching). Drivers who chose to drop the smartphone or hold it in their hand on the steering wheel had a longer attention time. In contrast, drivers with shorter attention times (see histogram in Figure 1) held the device in their free hand rather than on the steering wheel. Since the way drivers handled the handheld device at the time of TOR was not controlled, we cannot confirm causality with attention times; however, we can conclude that the correlation exists.
Contrary to our initial expectations, the position of the driver's hands or the handheld device itself did not affect the quality of TOs. One might naively assume that holding the device at the same height as the windshield and thus watching the road with peripheral vision would lead to better TO quality. For similar reasons, Li et al. suggested using a HUD on the windshield [80]. However, our results did not confirm this assumption. We suspect that using a handheld device was too cognitively demanding (especially when playing a strategic game) to benefit from peripheral vision. Similarly, no effect on the TO quality was found when drivers held the smartphone with one or both hands and when a free hand was positioned on the steering wheel or on the body. It seems that a TOR is too demanding a task interruption for the resulting TO quality to benefit significantly from previous activities or knowledge.

Other Observations and Limitations
We also found that some drivers did not brake at all during TO. Preventive measures such as continuous education of drivers about possible measures and advantages of driving at reduced speed would be advisable to increase overall road safety.
A possible limitation of this study is that it uses a dataset with two specific user interfaces for TOR. However, it should be noted that there is not yet a common agreement on an interface design that can be standardized. Therefore, a possible future work would be to repeat the analysis for other promising TOR interface designs.

Conclusions
In summary, our results emphasize the importance of moderate maximal acceleration and a short TOR-to-first-brake interval; in other words, we emphasize the importance of gradually reducing travel speed as quickly as possible by applying moderate braking as soon as a take-over request is issued in a conditionally automated vehicle. As drivers did not apply the brakes to reduce travel speed in 30.6% of all TO events in the dataset, automated gradual braking maneuvers should be considered. In our case, the appropriate strategy of putting the handheld device away lowered the required attention time.
In the future, further validation experiments confirming that the determined influential parameters affect the quality of TO would be highly desirable. It would also be interesting to investigate whether, given their computational complexity, these models can be used for real-time predictions in vehicles rather than relying only on their general results. Another parameter to be determined is the available time to let the driver decide on an appropriate action, considering the time needed to automatically perform an effective and efficient maneuver. In addition, the vehicle could continue to monitor the driver's (re)actions after the driver has successfully taken over the vehicle until driving conditions return to normal for fully automated driving. This also opens up a promising area for future research related to partial take-overs, where the vehicle does not transition to fully manual driving after a TO but merely reduces the level of automation to its current capabilities.