Effects of non-driving related tasks on mental workload and take-over times during conditional automated driving

Automated driving will be of high value in the future. While in partially automated driving the driver must always monitor the traffic situation, a paradigm shift is taking place in the case of conditional automated driving (Level 3 according to SAE). From this level of automation onwards, the vehicle user is released from permanent vehicle control and environmental monitoring and is allowed to engage in Non-Driving Related Tasks (NDRT) in his or her newly gained spare time. These tasks can be performed until a take-over request informs the user to resume vehicle control. As the driver is still considered to be the fall-back level, this aspect of taking over control is considered especially critical. While previous research projects have focused on the factors influencing the take-over request, this paper focuses on the effects of NDRT on the user of the vehicle during conditional automated driving, especially on the user's workload. Five NDRT (Reading, Listening, Watching a movie, Texting and Monitoring ride) were examined in a static driving simulator at the Institute of Ergonomics & Human Factors with 56 participants in an urban environment. These NDRT were tested for mental workload and the ability to take over in a critical situation. To determine the workload, subjective workload ratings, psychophysiological activity and performance-based parameters of a competing secondary task were used. This study revealed that the selected NDRT vary significantly in their mental workload and that the workload correlates with the time needed to take over control. NDRT which are associated with a high workload (such as Reading or Texting) also lead to longer reaction times.


Motivation
The demand for individual passenger transport has increased considerably in recent decades, which has had not only positive but also negative side effects. The increase in traffic density resulted, among other things, in 2.6 million accidents on German roads in 2018, with around 400,000 people injured and approximately 3300 traffic fatalities. According to studies, around 86% of all accidents involving personal injury are attributable to driver misconduct [1]. According to [2], even 95% of all fatal accidents are caused by human error. To counteract this, great expectations are therefore placed in automated driving. To reduce the number of accidents, automated driving systems can be used to protect the driver in complex driving situations from being overloaded or, in the case of reduced attention, from impending accidents. For many drivers, delegating the driving task is therefore also associated with a gain in comfort [3]. In addition, automated vehicles can potentially improve safety, reduce congestion and thus emissions, and positively influence the independence and mobility of the non-driving population [4]. In the case of L3 driving [5], the driver can relinquish control of the vehicle. This makes it possible to deal with non-driving related tasks (NDRT) while driving. Tasks vary in their type and complexity and therefore require a different level of attention. First studies already show a reduced ability to take over as a result of performing NDRT during automated driving [6,7]. The performed NDRT and the change of tasks may cause a reduction in the take-over control capability.

Scope of this paper
In this paper, the effects of different NDRT on the vehicle user during conditional automated driving are experimentally investigated. Since the vehicle user still serves as a fallback level during conditional automated driving, the change of tasks from NDRT to taking over control is considered critical. The mental workload was investigated by means of psychophysiological and performance-based parameters as well as the subjective task load. Since in conditional automated systems the vehicle user may have to resume control of the vehicle guidance, this aspect is investigated for different NDRT. In particular, the ability to take over after a take-over request (TOR) in urban traffic as well as the relationship between workload and take-over control are analysed in this paper.

Theoretical principles
In the context of conditional automated driving, the vehicle user can turn away from the obligation of permanent vehicle control as well as monitoring the environment and engage in NDRT in his or her newly acquired spare time. These activities may be carried out until a TOR advises the user to resume vehicle control. Since the driver is the fallback level, this aspect of taking control is considered particularly critical as a late reaction of the driver to a TOR could result in accidents. [8] have already proposed various interacting determinants and their implications for automated systems. For example, trust, mental models, experiences, task loads, situation awareness and mental workload should be used to explain behaviour during automation. According to [8][9][10][11] mental workload is a construct for explaining performance and safety in automated systems and is therefore described in more detail. In addition, aspects of take-over request, further literature references to previous research results and the research questions are presented below.

Mental workload
In order to understand the term workload, the stress and strain concept (SSC) (cf. [12,13]) is briefly explained. A simple approach to explain this concept is the cause-effect chain. The stresses are generally causes that are independent of the individual and have an effect on humans. Humans react to this in the form of quantifiable individual strain. In contrast, the workload concept is described as "portion of the operator's limited capacity that is actually required to perform a particular task" [14]. According to [15], workload is understood as "[…] the specification of the amount of information processing capacity that is used for task performance". [16] use the term workload to answer questions such as "How busy is the operator?", "How complex are the tasks that the operator is required to perform?" and "Can any additional tasks be handled above and beyond those that are already imposed?". [17] defines workload as the ratio between the resources required by a task and the resources available to the human. According to [18], emotional and mental load is summarized as psychological load. Furthermore, emotional strain is often seen in direct connection with feelings. Mental workload, on the other hand, describes the cognitive reaction of the human information processing system to the informational parts [19]. It can be summarised from the above definitions that the stresses affecting humans result in strain or workload. The acting stress can be differentiated into task- and situation-specific partial stresses. The partial stresses that affect humans can be summarised to a total stress and cause measurable strain or workload in humans. The strain or workload is therefore the effect or the reaction of a person to external stress factors.

Take-over request
A central problem in conditionally automated vehicle research is how quickly the vehicle user can react to a critical event or a TOR. Until automated systems are able to perform all driving tasks under all conditions, the vehicle users must regain control if the automation fails or reaches its operating limits. Partial automation (L2), which is already provided by several car manufacturers, requires that the vehicle users constantly monitor the road and are able to intervene in case of critical events. In L3 vehicle users can delegate the monitoring task to the system during automated driving and therefore engage in NDRT.
The take-over process was previously described by [20]. The transfer process starts with the conditional automated vehicle guidance. If the automated system issues a TOR, it is necessary for the user to detect and register it. Then the change of tasks to vehicle takeover and guidance takes place by interrupting the NDRT that has been carried out and turning one's gaze back to the road, before a choice of action is made. In parallel, the motor readiness is established. This is characterized by gripping the steering wheel with the hands and/or moving the feet to the pedals. Finally, manual control of the vehicle can be taken over by steering and/or braking. How long the transition to manual driving takes and which factors explain the transfer time has already been investigated in recent years. The reaction time most commonly used in the scientific literature is the takeover time (TOT). It is defined as the time between TOR and the intervention in the vehicle control. This time already shows a wide bandwidth in the publications. In [21] an average brake reaction time of only 0.87 s is found, in a meta-analysis of 25 studies by [22]

Related work
The human-related research on conditional automated driving is primarily concerned with the question of how much time the driver needs to intervene in the driving task again. According to a meta-study by [24] 129 studies have been identified to determine the factors influencing the take-over time. Further analyses of previous studies on transition are provided by [25,26]. In the literature reviews cited above, influencing factors such as urgency, environmental factors (including the complexity of the traffic situation) and the effect of NDRT are particularly mentioned.
Numerous studies have investigated the urgency of a takeover situation depending on the time available until a collision is impending, also called time budget or time-to-collision (TTC). [27] examined various time budgets and found that in more critical takeover situations (lower time budget) the reaction times were faster than with longer time horizons. The authors found that from a time budget of 6 to 8 s, there were no differences in the frequency of take-over control errors. In addition, [28] examined the effects of the time budget. Longer time budgets also led to longer TOT.
The environmental factors, in particular the complexity of the traffic situation, were investigated by [29] as well as [30]. It turned out that a more complex traffic situation leads to longer TOT. However, this negative effect could not be found in [31].
For investigation in the driving context, the literature also distinguishes between standardised and more naturalistic NDRT. Standardised techniques intended to imitate more naturalistic NDRT are, for example, the cognitively loading n-back task [32] or the visual search task SuRT [33]. A list of standardised and naturalistic NDRT studies in the context of different degrees of automation can be found in [6]. Standardised tests have advantages such as better comparability and repeatability. Their disadvantage is the limited transferability of results to reality. Similarly, the motivation to perform tasks is supposedly higher in more naturalistic NDRT than in standardised activities, which can lengthen the time needed to take over control. In [34], test persons performed the visually distracting SuRT and needed more time for a takeover than drivers without NDRT. Studies by [35] as well as [36] also used SuRT as a distracting activity and delivered similar results. [29] compared the effects of different NDRTs, namely SuRT and an n-back task, on the ability to take over vehicle control. The two NDRTs did not show significant differences in driving behaviour during the take-over situation. In the study by [37], a standardised quiz was used as an NDRT. The subjects did not react significantly differently compared to a control group without additional activity. However, they showed a shorter time gap to an obstacle after taking the quiz.
Other studies focused on more naturalistic tasks such as reading news articles [38]. [23] investigated the different emphasis of NDRT in automated driving. In one experiment, several versions of a quiz game were implemented to simulate an increasing workload. In all versions, the question was played audibly, but the answer options were presented differently (acoustically or visually). The answering modalities were also varied (verbal or motor). The greatest impairment of the take-over ability was found for the variant that included a combination of acoustic, cognitive, visual and motor load. In a study by [25], participants performed two NDRTs on a tablet (reading a newspaper article, playing Tetris), and both NDRTs were compared with a baseline test. In comparison to a control group, the takeover times for both NDRTs were significantly longer. However, the comparison among the NDRTs showed no significant difference.
The influence of different writing activities on a mobile device (texting) on the take-over quality during automated driving was investigated by [7]. They concluded that the different task modalities have an influence on the take-over quality. A motor-visual task (texting) shows worse reaction times than other NDRTs (visual-verbal) and than driving without NDRT. [39] also examined the influence of naturalistic NDRT (writing email, reading news and watching video) on takeover performance. No significant effects on reaction times (hand to the steering wheel) were found within the NDRTs investigated.
In summary, different factors influencing the ability to take over during automated driving have already been identified and researched in the literature, and both standardised and naturalistic NDRT have already been investigated. However, comparatively few studies investigated more than one NDRT. In addition, in some of the studies that compared several NDRTs with different demands, no significant differences in the ability to take over control were found among the activities.

Research questions
The workload caused by various naturalistic NDRTs during automated driving has not yet been sufficiently investigated and thus represents a research gap. In this paper the NDRT is considered as an independent object of investigation during automated driving, which results in the following research question:
RQ1: How does the mental workload differ when performing different naturalistic NDRT during conditional automated driving?
Since there is no explicit research on this issue, the following undirected difference hypothesis is proposed. H1: There is a significant difference in the mental workload when performing different naturalistic NDRT during automated driving.
So far, the reviewed studies indicate that NDRT have an impact on the take-over ability of vehicle users. How different naturalistic NDRT affect the ability to take over and whether this can be explained by the previously investigated construct mental workload is to be examined more closely with the second research question.

RQ2: How does the take-over time differ between different naturalistic NDRTs and can this be explained by mental workload?
This leads to the following hypotheses. H2: There is a significant difference in take-over time from automated to manual driving depending on the NDRT performed. H3: With increasing mental workload caused by NDRT during automated driving, the ability to take over significantly decreases.

Examined NDRT
A selection of five NDRT was evaluated by means of an online survey [40]. It was ensured that they differ in terms of their physiological load modalities. The activities to be further investigated are: Reading text (visual load), listening to radio reportage (auditory load), watching video (combination of visual and auditory load), texting (motoric and mental load) and monitoring the ride (baseline, L2 automation). To provide natural NDRT during conditional automated driving, a tablet was placed on the centre console of the vehicle. We made sure that the text is displayed in sufficient font size (about 150 words per DIN A4 page). A radio report was selected for auditory NDRT, which was a podcast for travellers.
When choosing the content for the NDRT Watching video, movies and TV shows were excluded so that the test persons would not already know the content. For this reason, a scientific magazine was selected. To create the highest possible degree of authenticity in texting, the study supervisor was integrated into the experimental setting. A chat program was opened on the tablet, which enabled the subjects to communicate with the supervisor. This included chatting about their favourite food or the last holiday destination. The last activity, Monitoring ride, does not offer the test person any other task in this setting apart from the pure monitoring of the driving. To ensure that the participants performed the NDRT, check questions were asked about the content at the end of a run. To increase motivation to prioritise the NDRT, the participants were promised a higher financial reward if they answered at least half of the control questions during the NDRT correctly. Two subjects, who answered less than 40% of the primary task questions correctly during the particular NDRT, were excluded from the data analysis.

Workload measurement
Since the informational stress and strain cannot be measured directly, mental workload measurements are used as suggested by [14,15]. Subjective, psychophysiological and performance measurement approaches were used in this study and are presented below.
The subjective measure is based on the assumption that the respondents are best able to assess their mental workload themselves [41]. Subjective mental workload measurement methods are popular because of their practical advantages, e.g. low cost (no equipment is required) and high ease of use. The National Aeronautics and Space Administration Task Load Index (NASA-TLX) by [42], the Subjective Workload Assessment Technique (SWAT) by [43] and the Workload Profile (WP) by [44] are the most frequently used subjective methods for mental workload measurement. According to [45], the NASA-TLX has a high validity, reliability and user acceptance compared to SWAT and WP, as well as a high diagnostic accuracy in dynamic environments. Furthermore, [46] show that the NASA-TLX is more sensitive than other subjective evaluation scales. For these reasons, the NASA-TLX is used to measure workload in this study. An increasing value corresponds to an increasing load. According to a meta-analysis by [47], overstraining can occur if the overall NASA-TLX score is 60 or higher; below 37 points, understraining occurs.
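The weighted overall NASA-TLX score can be illustrated with a minimal sketch. It assumes the standard weighting procedure (six dimensions rated from 0 to 100, weights taken from 15 pairwise comparisons) and uses the thresholds of 60 and 37 points cited from [47]. The function names, dimension keys and example values are hypothetical and are not taken from the study.

```python
from typing import Dict

# Standard NASA-TLX dimensions (the key names are illustrative).
DIMENSIONS = ("mental", "physical", "temporal",
              "performance", "effort", "frustration")


def weighted_tlx(ratings: Dict[str, float], weights: Dict[str, int]) -> float:
    """Weighted overall score on a 0-100 scale.

    ratings: rating per dimension (0-100).
    weights: how often each dimension was chosen in the 15 pairwise
             comparisons; the weights therefore have to sum to 15.
    """
    if sum(weights[d] for d in DIMENSIONS) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15.0


def classify(score: float, low: float = 37.0, high: float = 60.0) -> str:
    """Rough classification using the thresholds reported in [47]."""
    if score >= high:
        return "overstrain"
    if score < low:
        return "understrain"
    return "intermediate"


# Hypothetical ratings and weights for one participant and one NDRT.
example_ratings = {"mental": 80, "physical": 10, "temporal": 40,
                   "performance": 30, "effort": 60, "frustration": 20}
example_weights = {"mental": 5, "physical": 0, "temporal": 3,
                   "performance": 1, "effort": 4, "frustration": 2}
example_score = weighted_tlx(example_ratings, example_weights)
```

Under these assumptions, a dimension that was never preferred in the pairwise comparisons (weight 0) does not contribute to the overall score, however high its rating.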
Psychophysiological measures include both the measurement of the physiological reactions of individuals to task performing and the relationship between psychological processes and their underlying physiological characteristics [48]. The physiological responses of the organism are activated autonomously and therefore unconsciously by the peripheral nervous system. Advantages result both from the continuous measurement as well as from the small to non-existent interference with the task fulfilment [15,49]. In addition to the advantages mentioned above, there are also limitations, since other influences such as physical stress, environmental conditions and the individual condition of the subject also affect the measurement results. An electrocardiogram (ECG) records the electrical activity of the heart over time. Relevant for the recording are the R-spikes, which describe the highest positive peak in the ECG signal. The Heart Rate Variability (HRV) is a physiological parameter for mental workload. Based on the R-R interval, heart rate variability is described over time [50]. With increasing load, the differences in R-R distances are reduced and the HRV decreases. According to [51][52][53] HRV decreases under both informational and physical load. The VarioPort measuring system from Becker Meditec GmbH was used to determine the psychophysiological load parameters.
Another possibility is to determine mental workload through performance measures. [15] developed a model, based on the inverted U-function of optimal arousal from [54], which connects mental workload to task performance. Typical performance parameters of driving tasks are the average speed, the standard deviation of the speed or the time distance to the vehicle in front (time-to-collision). However, during automated driving and the assessment of NDRT, these driving context-related parameters can no longer be used. To measure mental workload with performance measures, it is appropriate to measure the spare mental capacity. Therefore, a secondary task is added for the subject. Secondary tasks such as reaction time tests or time estimation tasks are usually found in the literature [55]. Furthermore, measurement with secondary tasks can be divided into two paradigms [56]. With the Loading Task Paradigm, the performance of the secondary task is to be maintained, and the performance loss of the primary task is measured. Within the second paradigm, the Subsidiary Task Paradigm, the subject is instructed to avoid deterioration in the performance of the primary task at the expense of the secondary task.
Depending on the primary task demand, resources are required from the primary task. Due to the fact that resources are limited [17], only the remaining capacity can be used to perform the secondary task. Consequently, the performance of the secondary task varies depending on the task load of the primary task. This difference in performance of the secondary task is measured and can be compared. Figure 1 illustrates that the task load in the form of resource consumption is a fluctuating curve. The task demands are therefore interpreted as a continuum rather than a steady state (cf. [58]). If no differences in secondary task performance are measured for tasks of varied complexity, this may be caused by the subject choosing the priority of the task incorrectly and in favour of the secondary task (change from Subsidiary Task Paradigm to Loading Task Paradigm).
For the study, a Detection Response Task (DRT) according to [59] was chosen, taking the Subsidiary Task Paradigm into account. The participants must react to a stimulus that occurs randomly every 3 to 5 s for approximately 2 min by pressing a button. The stimulus is emitted for 1 s or until the participant responds. A response is valid if the subject presses the button within 100-2500 ms after stimulus onset. Unrealistic responses below 100 ms and responses later than 2500 ms were not evaluated and were coded as faults. These faults are included in the calculation of the percentage hit rate. The visual stimulus (LED 5 mm, light colour 626 nm) was head-mounted at 12 to 13 cm from the left eye. This head-mounted variant offered the advantage that the stimulus was always in the same position in the field of vision even during head movements. In contrast to [59], the response button is located in a comfortable position on the left armrest of the driver's door instead of on the finger itself. This adjustment was necessary due to the design of the NDRT and for enhanced protection against cable rupture.
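The scoring rule for the DRT described above can be sketched as follows. This is a minimal illustration, not the evaluation script of the study: the function name, the data layout (one reaction time per stimulus, `None` for a missed stimulus) and the inclusive treatment of the 100 ms and 2500 ms limits are assumptions.

```python
from typing import Optional, Sequence, Tuple

MIN_RT_MS = 100.0   # faster responses are treated as unrealistic
MAX_RT_MS = 2500.0  # slower responses are treated as misses


def score_drt(
    reaction_times_ms: Sequence[Optional[float]],
) -> Tuple[float, Optional[float]]:
    """Return (hit rate in percent, mean reaction time of valid hits in ms).

    One entry per presented stimulus; None means that no button press
    was recorded for that stimulus.
    """
    if not reaction_times_ms:
        raise ValueError("at least one stimulus is required")
    valid = [rt for rt in reaction_times_ms
             if rt is not None and MIN_RT_MS <= rt <= MAX_RT_MS]
    # Faults (too early, too late, no response) stay in the denominator.
    hit_rate = 100.0 * len(valid) / len(reaction_times_ms)
    mean_rt = sum(valid) / len(valid) if valid else None
    return hit_rate, mean_rt


# Hypothetical run with five stimuli: two valid hits and three faults.
example_hit_rate, example_mean_rt = score_drt([450.0, 80.0, None, 2600.0, 550.0])
```

Keeping the faults in the denominator means that premature and late responses lower the hit rate, while the mean reaction time is based on valid responses only.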

Take-over controllability
During conditional automated driving, the vehicle user must be able to respond to a TOR from the system at any given time and take over vehicle control [60]. This paper focuses only on the time factor in take-over controllability. However, time is not the only consideration; the quality of the take-over also plays a crucial role in this context. For more information, see [40]. In this paper, the term take-over time is used to describe the minimum take-over time. This is the time difference between the start of the TOR and the earlier of the steering or braking intervention. A braking intervention was classified as such if the brake pedal was moved by at least 10%. For a steering intervention, a change in the steering angle of at least 3° has been found to be appropriate (cf. [61]). Generally, shorter reaction times correlate with better take-over controllability.
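The definition of the minimum take-over time can be sketched in a few lines. The thresholds (10% brake pedal travel, 3° steering angle change) come from the text. The sampled-signal layout, the function name and the choice of the steering angle at the TOR as the reference angle are assumptions made for illustration.

```python
from typing import Optional, Sequence

BRAKE_THRESHOLD_PCT = 10.0  # brake pedal travel in percent
STEER_THRESHOLD_DEG = 3.0   # change of the steering angle in degrees


def take_over_time(
    times_s: Sequence[float],
    brake_pct: Sequence[float],
    steering_deg: Sequence[float],
    tor_time_s: float,
) -> Optional[float]:
    """Time from TOR onset to the first braking or steering intervention.

    Returns None if neither threshold is reached after the TOR.
    The steering change is measured against the angle at the first
    sample at or after the TOR (an assumption of this sketch).
    """
    indices = [i for i, t in enumerate(times_s) if t >= tor_time_s]
    if not indices:
        return None
    reference_angle = steering_deg[indices[0]]
    brake_time = None
    steer_time = None
    for i in indices:
        if brake_time is None and brake_pct[i] >= BRAKE_THRESHOLD_PCT:
            brake_time = times_s[i]
        if (steer_time is None
                and abs(steering_deg[i] - reference_angle) >= STEER_THRESHOLD_DEG):
            steer_time = times_s[i]
    interventions = [t for t in (brake_time, steer_time) if t is not None]
    if not interventions:
        return None
    # Minimum take-over time: the earlier of the two interventions.
    return min(interventions) - tor_time_s


# Hypothetical log sampled every 0.5 s with the TOR issued at t = 0.5 s.
example_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
example_brake = [0.0, 0.0, 0.0, 5.0, 12.0, 40.0]
example_steer = [0.5, 0.5, 1.0, 3.6, 5.0, 6.0]
example_tot = take_over_time(example_times, example_brake, example_steer, 0.5)
```

In this hypothetical log the steering threshold is reached at 1.5 s and the braking threshold at 2.0 s, so the steering intervention determines the take-over time.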

Apparatus
At the time of this research, neither a production nor prototype test vehicle was available that could meet the conditional automated driving characteristics as defined by [5]. Therefore, the test trials were carried out on the static driving simulator at the Institute of Ergonomics and Human Factors at TU Darmstadt. The driving simulator consists of a fully assembled vehicle mock-up (Chevrolet Aveo, 2008) surrounded by six projection screens. Three front projection screens provide a forward and side view and another three provide a view of the rear traffic, which the test person can see through the existing exterior and interior mirrors, see Fig. 2.
We used the Silab simulation software by WIVW GmbH for this study. An automation controller for conditional automated driving according to [5] was developed for this investigation. This provided a standardized and thus comparable test drive for each participant. During the automated drive, the driver can intervene at any time and override the automation system.

Driving scenario
For each NDRT to be investigated, a separate urban route was designed. According to [62], a typical urban route has characteristics such as a permissible maximum speed between 30 and 50 km/h, a rather high traffic density, traffic light systems, increased number of road signs as well as turning and braking procedures. The simulated urban route has a length of approximately 19 min (9 km) for each NDRT. To ensure that the participants cannot anticipate an impending TOR, the order of the individual test sections in the route design and the traffic routing was varied for each NDRT. For all five TOR scenarios, no additional traffic was added to keep the influence factor of traffic density constant. The TOR takes place for each NDRT on a straight section of road at a speed of 13.8 m/s after passing a pre-defined waypoint. During the actual TOR, the subject must prevent an impending collision by evading or braking. After driving around the obstacle, the automation controller is reactivated in the original lane and the subject can continue with the NDRT. A schematic overview of a TOR is shown in Fig. 3. During the measurement section of the mental workload and the secondary task, the automated vehicle drove along the city route and no further incidents occurred.

Study design
Given the high number of variables of the constructs investigated, the decision was made to use a dependent sample in a within-subjects study design [63]. In this case all subjects perform all NDRT in a permuted order.
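One simple way to vary the task order in a within-subjects design is a cyclic rotation, sketched below. The text does not state how the permuted order was generated, so the rotation scheme, the function name and the task labels are assumptions for illustration only.

```python
from typing import List, Sequence

# Task labels as used in this paper (the order of the tuple is arbitrary).
NDRTS = ("Reading", "Listening", "Watching video", "Texting", "Monitoring ride")


def rotated_order(participant_index: int,
                  tasks: Sequence[str] = NDRTS) -> List[str]:
    """Task order for one participant, obtained by cyclic rotation.

    Across len(tasks) consecutive participants, every task appears
    exactly once in every position (a cyclic Latin square).
    """
    shift = participant_index % len(tasks)
    return list(tasks[shift:]) + list(tasks[:shift])


# Orders for the first five (hypothetical) participants.
example_orders = [rotated_order(p) for p in range(5)]
```

A cyclic rotation balances the position of each task but not the sequence of tasks, so carry-over effects between specific task pairs are not controlled by this scheme.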
The vehicle was always driven in automated mode while one of the five NDRTs was performed. For each NDRT to be examined, the trial run is divided into three sections: 1) psychophysiological measurement, 2) performance measurement with a secondary task and 3) TOR. After the psychophysiological measurement, the participants answered the NASA-TLX questionnaire for the subjective workload measurement. According to the time requirement of [64], a 7-min section was chosen for the psychophysiological measurement. Since [34] could not detect any effect on the take-over performance after a short trip (5 min) compared to a longer trip (20 min), the TOR was carried out within a 5-min section. A visual, auditory and vibrotactile TOR that had already been empirically evaluated (cf. [61]) was used in this study. A red steering wheel icon was projected onto the road using a head-up display, a warning tone was emitted through the in-car audio system and a vibration was generated by the in-seat motors. All three alert stimuli were delivered simultaneously and did not differ across the study. The secondary task is carried out simultaneously with the NDRT in a seven-minute section, see Fig. 4. Before the actual investigation began, an acclimatisation drive allowed the drivers to get used to the simulator, during which they were already presented with an exemplary TOR. After the approximately five-minute training drive, reference measurements for the psychophysiological measurement and for the secondary task were carried out without an NDRT and without automated driving of the car.

Data analysis
The measured parameters are displayed in box plots. The significance tests were selected based on a decision tree from [65]. The significance level was set to α = 0.05.
Since several NDRT were examined, an ANOVA with repeated measurements was used. If the assumption of sphericity was violated (unequal variances of the differences between the NDRT), a Greenhouse-Geisser correction was applied. When significant differences were found, post-hoc tests were performed to determine differences between the individual NDRTs.
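The test statistic of a one-way repeated-measures ANOVA and the Greenhouse-Geisser epsilon can be sketched with the standard textbook formulas. This is not the analysis script of the study: the function name and the toy data are hypothetical, and the p-value (which requires the F distribution) is left to a statistics package.

```python
from typing import Sequence, Tuple


def rm_anova(data: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """One-way repeated-measures ANOVA.

    data[i][j] is the value of subject i under condition j.
    Returns (F, epsilon, corrected df1, corrected df2), where epsilon
    is the Greenhouse-Geisser estimate.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects and >= 2 conditions, no gaps")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df1, df2 = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df1) / (ss_error / df2)

    # Greenhouse-Geisser epsilon from the double-centred covariance matrix.
    cov = [[sum((row[a] - cond_means[a]) * (row[b] - cond_means[b])
                for row in data) / (n - 1)
            for b in range(k)] for a in range(k)]
    row_means = [sum(cov[a]) / k for a in range(k)]
    cov_mean = sum(row_means) / k
    centred = [[cov[a][b] - row_means[a] - row_means[b] + cov_mean
                for b in range(k)] for a in range(k)]
    trace = sum(centred[a][a] for a in range(k))
    sum_sq = sum(c * c for r in centred for c in r)
    epsilon = trace ** 2 / ((k - 1) * sum_sq)
    return f_value, epsilon, epsilon * df1, epsilon * df2


# Toy data: 4 subjects x 3 conditions (hypothetical values).
toy = [[1, 3, 5], [2, 3, 7], [3, 6, 6], [2, 4, 6]]
toy_f, toy_eps, toy_df1, toy_df2 = rm_anova(toy)
```

The epsilon lies between 1/(k-1) and 1. Both degrees of freedom are multiplied by it before the p-value is looked up, which makes the test more conservative the more sphericity is violated.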

Sample details
Sixty-two subjects were recruited for the study. Six of them had to stop the trial prematurely due to simulator sickness and were not included in the data analysis. For two of the remaining persons, one NDRT data set each had to be excluded because they did not achieve sufficient results on the NDRT control questions. The participants were distributed almost equally across genders. A total of 30 male (53.6%) and 26 female (46.4%) participants were part of the study. The subjects were 19-59 years old and had an average age of 33.2 years (SD 12.0). The experiments took place in February and March of 2019.

Mental workload
The evaluation of the subjective workload was carried out using the NASA-TLX. The weighted overall evaluation shows that the workload for the NDRT Reading is the highest with 52.47 points (SD = 17.68) out of 100 possible points. The detailed results are shown in Fig. 5 and listed in Table 1 (mean value and standard deviation as well as the respective post-hoc tested mean value difference). A significantly higher mental workload can be recognized for all NDRT compared to the reference measurement (23.24 points, SD = 19.11). The result of the multifactorial analysis of variance with repeated measurements confirms this significant difference between the tested factors [F(5, 265) = 28.67, p < 0.001, f = 0.37]. Among the NDRT, the workload differs significantly only between Monitoring ride and Reading. The arithmetic mean of the perceived workload decreases in the following order: Reading, Listening, Watching a movie, Texting and Monitoring ride.
Objectively measured workload is given in this paper by the heart rate variability (HRV) parameter rMSSD. A low rMSSD value indicates a higher mental workload. In comparison to all other activities, significantly lower values can be found for Texting (33.64 ms, SD = 16.14   Table 3.
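For reference, rMSSD is the root mean square of successive differences between adjacent R-R intervals. A minimal sketch of the computation follows. The function name and the example intervals are hypothetical, and artefact correction of the ECG data is not covered.

```python
import math
from typing import Sequence


def rmssd(rr_intervals_ms: Sequence[float]) -> float:
    """Root mean square of successive R-R interval differences in ms."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("at least two R-R intervals are required")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical R-R intervals in milliseconds.
example_rmssd = rmssd([800.0, 810.0, 790.0, 800.0])
```

A perfectly regular heartbeat yields an rMSSD of 0 ms, which is why smaller values are read as lower variability and thus higher load.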

Take-over time
The parameter Take-over time results from the time difference between TOR and steering or braking intervention by the participant. The value should be as low as possible to be able to claim a good take-over capability. The longest average minimum take-over time could be determined for the NDRT Reading   Fig. 8 and Table 4.

Discussion and conclusion
The results presented are used at this point to answer the research questions presented at the beginning. The trimodal approach of subjective, psychophysiological, and performance-based measurement methods was used to assess mental workload. The methods used are reviewed below and the results are discussed in closing. At the beginning of the experiment a reference measurement was carried out for all mental workload characteristics to establish comparability. As this reference measurement was performed in the paused simulator, the test participants might have felt an initial excitement due to the unknown situation. As a result, there may be a bias in the subjective perception and in the psychophysiological data.
The perceived workload was measured using the NASA-TLX questionnaire. Since no data were yet available on the actual performance of naturalistic NDRT during automated driving, these results can be used as a first data basis for further research. A weighting of the six individual dimensions was carried out for each NDRT by pair comparison. Due to the differentiated scores in the individual categories, the total score does not provide clear indications regarding the mental workload of each NDRT. More information can be found in [40]. Benefits of the method include easy handling. Despite the description of the dimensions in the questionnaire, respondents may have made errors in answering it and thus misjudged their workload.
The psychophysiological data collection for mental strain measurement turned out to be less reliable due to the high variance. With the measurement methods used, it cannot be satisfactorily determined at this point which NDRT are more demanding. During the examination of the NDRT, cardiovascular activity was measured in such a way that the physical load was as low as possible. As the psychophysiological measurement showed, Texting was the most demanding compared to the other NDRT. Since typing also involves the motor activity of the hand-arm system, it can be argued that this may have contributed to a lowered HRV. A clear distinction between physical, mental and emotional load is not possible when evaluating the characteristics of the electrocardiogram, so influences of physical and emotional load on the measured mental workload cannot be excluded. A disadvantage of the secondary task method is the increase in load caused by the DRT itself, since it must be considered an independent load [66]. Even though the information-processing requirement of the stimulus-response test can be regarded as minimal and the test can be learned quickly, it cannot be excluded that the DRT biases the simultaneous measurement of psychophysiological data. Even if this is the case, it does not affect the conclusions, since the relative comparison between the NDRT is considered rather than the absolute values. The participants quickly understood how the DRT works. According to the Subsidiary Task Paradigm, the performance drop should occur only in the secondary task. The DRT proved to be a very sensitive measuring tool, since it is able to detect even small differences, cf. [67]. For example, in this study, a significantly higher mental workload, expressed by a longer reaction time, was already observed for Monitoring ride in comparison to the reference measurement.
Furthermore, the DRT reaction times revealed significant differences and small variances compared to the psychophysiological measurements. On the other hand, no significant differences were found in the DRT hit rate.
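The two DRT parameters discussed here, reaction time and hit rate, can be derived from the recorded responses roughly as in the following sketch. The validity window of 0.1 s to 2.5 s is an assumption based on common DRT practice; the window used in this study is not stated in this section. The function name and the example data are hypothetical.

```python
def drt_metrics(reaction_times_s, min_rt=0.1, max_rt=2.5):
    """Return (hit_rate, mean_reaction_time) for one measurement block.

    reaction_times_s: one entry per stimulus; None marks a missing response.
    A response counts as a hit only inside the assumed validity window.
    """
    if not reaction_times_s:
        raise ValueError("no stimuli recorded")
    hits = [rt for rt in reaction_times_s
            if rt is not None and min_rt <= rt <= max_rt]
    hit_rate = len(hits) / len(reaction_times_s)
    mean_rt = sum(hits) / len(hits) if hits else None
    return hit_rate, mean_rt

# Hypothetical block with one miss and one late response (not study data)
print(drt_metrics([0.4, 0.6, None, 3.0]))
```

In this form the mean reaction time is computed over hits only, so a task can lengthen reaction times without changing the hit rate, which is the pattern reported above.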
Having discussed the methods in detail, we summarize the results below. This study showed that the mental workload during conditional automated driving differs depending on the NDRT. For the aforementioned reasons, hypothesis H1 cannot be refuted. The subjective workload perception for each NDRT investigated differs significantly from the reference measurement taken during vehicle standstill without NDRT. Reading was perceived as the most demanding NDRT. However, all examined NDRT showed a high variance, so that a clear distinction is not possible. In addition, the analysis of the individual dimensions showed that the test participants enjoyed Texting in particular, as they indicated a lower frustration level, which can explain the comparatively low subjectively perceived workload.
The psychophysiological parameters also show a high variance and can additionally react sensitively to emotional and physical stress. Cardiovascular activity in the form of HRV has been identified in various literature sources as a mental workload indicator. Significant differences between the NDRT were also found in this study. Texting shows a significantly higher load despite the high variance of the measured values.
The performance-based workload measurement yields results similar to those of the psychophysiological measurements. Here, too, Texting is the most demanding activity. Reading also proves to be more demanding than Watching a movie, Listening or Monitoring ride.
However, the results of the different measurement methods are not consistent in all respects. Large deviations were found in the self-assessment instrument NASA TLX, in which Texting was rated as less mentally demanding. This finding is of great interest, because in the literature the workload is often represented only by the NASA TLX. This can lead to distorted results, as activities with an objectively measured high workload are perceived as comparatively less demanding due to the joy of use (expressed by a low level of frustration). A sensitive tool to determine the mental workload of an NDRT during automated driving is the performance-based measurement by means of a competing secondary task in the form of the DRT. Reading and Texting were consistently identified by the psychophysiological and performance-based measurement methods as the most demanding NDRT. The subjective perception confirms this only for the NDRT Reading. Listening, Watching a movie and Monitoring ride showed no significant differences in the psychophysiological and performance-based parameters.
An essential question in the investigation of conditional automated driving systems is whether the user of the vehicle can quickly take back control of the vehicle in case of a TOR.
As described in the theoretical part, TOT depends on many factors. The influence of the design of the TOR was empirically examined in advance of this study and the best version was chosen [61]. Through a training drive before the actual trial, the participants had already been confronted with a TOR, allowing them to gain sufficient knowledge of the system. A special aspect of this study is the consideration of an urban scenario. Previous studies (such as [29]) showed that a more complex traffic situation results in longer TOT. The purpose of this study was to simulate a complex traffic situation in the urban scenario, so that a worst-case situation could be investigated. Therefore, even longer TOT caused by the traffic scenario can be ruled out. The TOR was issued on a straight road section to be able to differentiate between measured steering angle changes initiated by the test person and the target steering angle of the automation controller. For subsequent experiments, the investigation area should be structured similarly to the one in this study. The time budget of 6 s ensured that there was no collision with the obstacle; it should be shortened in future so that more significant results on take-over can be achieved. In total, the test participants experienced six TOR over the entire study. No learning effects depending on the number of TOR experienced could be determined.
Significant differences were found in the minimum take-over time parameter depending on the performed NDRT. Therefore, hypothesis H2 cannot be refuted either. The mean minimum take-over times in this study lie between 1.10 s (Listening) and 1.64 s (Reading) and are therefore shorter than those reported in the presented literature.
Moreover, the results of this work differ from studies that have also examined multiple NDRT. In [25], no differences were found between the take-over times of two NDRT. [39] also examined the influence of naturalistic NDRT (writing e-mails, reading messages and watching videos) on take-over performance. In their study, no significant differences in take-over times were found in relation to the NDRT examined. A possible explanation for this might be that the NDRT were explicitly the focus of the present experiment and that the participants were questioned about the content of the NDRT. This meant that they were even more involved in performing the NDRT.
In a more detailed analysis, the relationship between the construct mental workload and the take-over time was examined. A regression analysis reveals that the take-over time increases significantly with increasing mental workload [F(1, 268) = 30.74, p < 0.001, R² = 0.103], see Fig. 9. Consequently, NDRT with high mental workload lead to longer reaction times. Hence, hypothesis H3 cannot be refuted. In contrast to the study of [68], a significant correlation between mental workload, measured by means of the DRT, and the ability to react in a critical situation was found in this study. To achieve shorter TOT, it can be concluded from this study that individuals should not be exposed to a high mental workload. The mental workload can be influenced significantly by the task difficulty. A mental workload that is too low can lead to insufficient demand and thus to monotony-induced fatigue (cf. [69,70]), which should also be avoided. Due to the mentioned conflict between the execution of NDRT and a short take-over time, the question arises whether automation level 3 is a desirable automation approach or whether NDRT should only be allowed from level 4 onwards, where individuals no longer have to intervene in the driving process.
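As a plausibility check of the reported regression statistics, the coefficient of determination can be recovered from the F statistic and its degrees of freedom using the general identity R² = F·df1 / (F·df1 + df2). The sketch below applies this identity to the values reported above; the function name is hypothetical, and the snippet only illustrates the arithmetic, not the authors' analysis.

```python
import math

def r_squared_from_f(f_value, df_model, df_residual):
    """Recover R^2 of a linear regression from its overall F statistic."""
    return (f_value * df_model) / (f_value * df_model + df_residual)

# Values as reported in the text: F(1, 268) = 30.74
r2 = r_squared_from_f(30.74, 1, 268)
print(f"R^2 = {r2:.3f}")
# With a single predictor, the absolute correlation is the square root of R^2
print(f"|r| = {math.sqrt(r2):.2f}")
```

The result agrees with the reported R² of 0.103, i.e. mental workload explains roughly 10% of the variance in take-over time, so the effect is statistically significant but leaves most of the variance unexplained.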

Limitations
Every study has certain limitations that must be considered when interpreting the results. Since no vehicle with the appropriate level of automation was available, the research had to be carried out in a driving simulator. As a consequence, the transferability to real road traffic may be limited. For example, the participants may have behaved differently in the simulator due to a potentially increased feeling of safety. In that case, individuals might be less likely to engage with the NDRT in field experiments, and the results could differ. To verify this, a questionnaire was handed out at the end of the experiment to evaluate the test setup. The average answer to the question "In real road traffic I would have behaved differently in an automated vehicle" is 3.49 (SD = 1.20; 1 = Not applicable at all; 5 = Fully applicable). This supports the assumption stated above. Participants reported that they did not trust the automated system and therefore continued to monitor the automated ride. Even though the measured absolute values could differ from reality, the identified relative differences between the NDRT can be transferred to real-world conditions [71]. Due to the number of variables and the sample size, a within-subjects study design was used. Here, learning effects in particular must be considered. To counteract this, the course was designed differently for each NDRT. The duration of the experiment was between 3.5 and 4 h per person, depending on how much time the subjects needed to answer the questionnaires. Overall, there were sufficient breaks between the sections for food intake and recovery. However, some participants stated that they felt the experiment was taking too long. To ensure good transferability to real-world practice, the focus of this research is on naturalistic NDRT. However, the drawback of naturalistic NDRT as opposed to standardised tasks (e.g. n-Back or SuRT) is that the results are less comparable with other studies.

Conclusion
In a simulator study, the effects of NDRT in conditional automated driving were investigated. In contrast to partially automated vehicles, where the driver must monitor the driving situation continuously, the users can turn away from monitoring the road. However, the vehicle user is still considered the fallback level and must therefore be ready to take over the vehicle quickly and safely. The parallel execution of NDRT and a short take-over time in critical situations conflict with each other. Therefore, the gain in comfort from engaging in NDRT results in reduced take-over controllability. To ensure better take-over controllability, it can be concluded from this work that vehicle users should not be exposed to high mental workload. A lack of mental workload, however, can lead to underload and thus to monotony-induced fatigue, which should also be avoided. This can be achieved, for example, through gamification and the targeted use of NDRT [69,70]. Due to the conflict between the execution of NDRT and a short take-over time, the question arises whether conditional automated driving is an automation level worth striving for or whether NDRT should only be permitted from an even higher level of automation, where human intervention in the driving process is no longer required.