Identifying interaction types and functionality for automated vehicle virtual assistants: An exploratory study using speech acts cluster analysis

Onboard virtual assistants with the ability to converse with users are gaining favour in supporting effective human-machine interaction to meet safe standards of operation in automated vehicles (AVs). Previous studies have highlighted the need to communicate situation information to effectively support the transfer of control and responsibility of the driving task. This study explores ‘interaction types’ used for this complex human-machine transaction, by analysing how situation information is conveyed and reciprocated during a transfer of control scenario. Two human drivers alternated control in a bespoke, dual controlled driving simulator with the transfer of control being entirely reliant on verbal communication. Handover dialogues were coded based on speech-act classifications, and a cluster analysis was conducted. Four interaction types were identified for both virtual assistants (i.e., agent handing over control) - Supervisor, Information Desk, Interrogator and Converser, and drivers (i.e., agent taking control) - Coordinator, Perceiver, Inquirer and Silent Receiver. Each interaction type provides a framework of characteristics that can be used to define driver requirements and implemented in the design of future virtual assistants to support the driver in maintaining and rebuilding timely situation awareness, whilst ensuring a positive user experience. This study also provides additional insight into the role of dialogue turns and takeover time and provides recommendations for future virtual assistant designs in AVs.


Introduction
Automated vehicles (AVs) are becoming increasingly dependent on effective human-machine interaction to meet safety standards (Klein et al., 2004; Louw et al., 2015; Walch et al., 2017). Furthermore, high-profile collisions of AVs are frequently attributed to human factors such as situation awareness (SA), trust in automation, workload, and driver attention (Stanton and Young, 2000; Heikoop et al., 2016; Merriman et al., 2021). This is partly attributable to a shift in the role of the driver as automated capabilities increase (Shaw et al., 2020). Due to the pace of this change, there are many unanswered questions on how the human-AV relationship will develop in line with dynamic user requirements and the evolution of this technology (Banks et al., 2014). This, in tandem with the recent endorsement of level 3 AV systems, such as the Automated Lane Keeping System (ALKS) (which paves the way for such vehicles to come to market (BBC, 2021)), means that the AV domain is facing a time-critical engineering problem regarding how these vehicles should be designed to communicate situation information to improve safety and calibrate trust in the user population, whilst maintaining an overall usable experience for the driver. Issues such as these have a profound impact on the safe operation of these systems. For instance, reduced SA can introduce vulnerabilities when taking over control (Endsley and Kiris, 1995; Heikoop et al., 2016; Stanton and Young, 2000), and uncalibrated trust can lead to the misuse or disuse of automated systems (Lee and See, 2004).
Onboard virtual assistants (particularly those that feature Conversational User Interfaces; CUIs) are gaining favour in addressing these issues by keeping the driver apprised of the driving situation, as this form of user-interaction is flexible, immediate, and rich in information (Clark et al., 2019c; Large et al., 2017; Velaga et al., 2021). In this context, previous studies have explored what information should be communicated to the driver during automated driving and the transfer of control (Clark et al., 2017, 2019b; Stanton et al., 2022). Findings typically highlight the need to enhance or rebuild a driver's level of situational information, often referred to as keeping them 'on', and getting them back 'in', the loop of control (Merat et al., 2019; Banks and Stanton, 2016). However, there is limited guidance on how situation information, in particular, should be conveyed and reciprocated, both in terms of messages sent to the human driver and those that are fed back to the automated system. Moreover, in line with modern theories of situation awareness, and as demonstrated in various other domains featuring automated systems, tasks, functions, and responsibilities are expected to become more distributed (Pritchett et al., 2014; Stanton et al., 2006, 2017). Thus, as system complexity increases, the ability to transmit a "complete situation picture" to the driver will become progressively challenging. This further increases the criticality of conducting research into viable and effective solutions to communicate safety-critical information between the system and human driver.
This research study addresses this complex human-AV interaction by exploring the transfer of control between two human drivers operating in a bespoke driving simulator with two sets of primary vehicle controls; each driver interchangeably took on the role of the AV virtual assistant or the human driver. In so doing, it uncovered new speech-based interaction models, applicable to both drivers and virtual assistants, that can be adopted and calibrated through system design to assist the driver in maintaining and rebuilding timely situation awareness, whilst ensuring a positive user experience. The work is predicated on the use of a virtual assistant employing a verbal exchange of information. This approach (i.e. using a spoken, conversational user interface) is already well-established as a viable and effective solution in the context of driving (see: Large et al., 2017), although other human-machine interfaces (HMIs) may be feasible. The main focus of this paper is to inform the design of virtual assistants and conversational user interfaces in this context, rather than to make the case for them.

Driving, automation and situation monitoring
A key challenge in system automation is the inverse relationship between automation and human performance (Banks and Stanton, 2016, 2019). For example, in the context of driving, as control actions and decision-making functions (e.g. deciding and setting the vehicle's lane position, speed and headway) become automated, the driver naturally gives less attention to the driving task (Young et al., 2015). This unintended consequence of automation (Parasuraman et al., 2000) takes the user 'out of the loop' (OOTL) of control, thereby reducing their level of perception and comprehension of the system state and driving environment and the projection of their future state, a construct termed 'situation awareness' (SA) (Endsley, 2017). Situation awareness and monitoring are implicated at all levels of Michon's (1985) hierarchical model of the driving task, which encompasses the different spatiotemporal scales associated with control of subtasks at the operational, tactical and strategic levels. For example, continuous monitoring is required for vehicle operations, whereas monitoring relating to tasks at the tactical level varies in response to characteristics of the driving environment (Merat et al., 2019).
Any vehicle with less than full autonomous driving capability still requires the human driver to remain in the control-feedback loop and to play an active role in the driving task (Banks and Stanton, 2019). In practice, using automation to control parts of the driving task fundamentally restructures the task as a whole, bringing with it changes to the role and responsibilities of the human driver (Banks et al., 2014; Kircher et al., 2014; Banks and Stanton, 2016). For example, in automated mode at SAE level 2, the human driver no longer has physical control of the operational sub-tasks, but is required to continuously monitor the system and driving task and be prepared to take over at any point. Consequently, this level of automation demands that the driver remains 'on the loop' (OTL) during automated periods of driving. This requirement for passive monitoring is typically associated with performance challenges linked with driver distraction, as drivers are likely to enter an OOTL state. At SAE level 3, the human driver is taken out of the loop by design during periods of automated driving (Merat et al., 2019) and can turn their attention to non-driving related tasks. However, the operational design domain (ODD) is typically bounded. Therefore, the driver remains actively responsible for the vehicle and is required to be able to take over control when system boundaries are reached or when the system fails (Stanton, 2023).
Planned takeovers will form a significant new part of the driving task and will predominate in handover scenarios at levels 3 and 4 (SAE, 2016; Morgan et al., 2016). In this situation, the human driver is expected to take control safely and efficiently after being removed from the control loop, either physically, by engaging the system to control the operational movements of the vehicle (i.e. level 2, advanced driver assistance systems, or ADAS), or completely, by relinquishing the entire driving task to the automated systems (i.e. level 3 AVs). Human drivers will therefore need to be able to smoothly transition between control loops (see: Large et al., 2019 for a novel exploration). Performance challenges for the human driver will be associated with the need to understand and calibrate the level of attention and situation awareness required in relation to the mode of automation in a timely manner during dynamic operations (Carsten and Martens, 2019; Merat et al., 2019). With this in mind, a virtual assistant may be employed to coordinate 'take-over' activities, raise situation awareness, and guide the driver towards desired actions (see: Clark et al., 2019b; Klein et al., 2004), as an informed and attentive passenger might, although the nature of the virtual assistant's 'personality' and the manner in which key information is delivered (see: Azmandian et al., 2019) is undefined in this context.

Virtual assistants
Virtual assistants can embody a variety of characteristics, and the type of virtual assistant required for a specific task is dependent on the environment in which it is being operated as well as the nature of the task at hand. For example, Azmandian et al. (2019) conceptualised four different types of virtual assistant: a 'helpdesk', which focuses on delivering key pieces of information; an 'assistant', which can aid in the management of the task and peripheral features; a 'buddy', which features high amounts of personalisation specific to the user; and a 'guardian angel', which actively monitors and stays aware of the environment and user behaviour. The current work builds on Azmandian et al.'s (2019) conceptualisation, aiming to uncover the characteristics of virtual assistants for managing the takeover task in AV operation, as well as outlining human-driver characteristics and interaction strategies to inform the future design of AVs. Because the relationship between humans and autonomy is becoming more complex, dynamic, and emotionally receptive (Chiu et al., 2020; Schewe et al., 2019), human-human communication was used as a platform to explore natural language and coordination during the takeover task (see McDaid, 2009 for an overview of how human-human communication can inform human-computer interaction). In doing so, this study generates recommendations for future designs as they become capable of embodying more sophisticated methods of verbal communication, and also contributes towards the philosophical debate of "how human" virtual assistants should be.

Speech acts theory and Verbal Response Modes
Given that the central focus of communication during the takeover of control is task-orientated, a proposed method for analysing verbal interaction in this context is the use of Speech Act Theory (Searle et al., 1980). The central premise of Speech Act Theory is that utterances convey meaning in the form of an action such as a warning, question, advisement, or a statement. For the takeover task, this is of direct applicability, as such manoeuvres typically involve notifications, situation awareness information and intended actions. Further, verbal components can readily map onto three core factors that influence trust in automation: the presentation of information regarding the performance of the system, the process in which it operates, and the purpose of intended actions (Lee and Moray, 1992), thus providing insight into how virtual assistants can convey intentions and actions. Lampert et al. (2006) developed a classification of the original speech acts proposed by Searle et al. (1980) that has been adapted to be more readily applied to a wider variety of domains, specifically for communication during collaborative tasks (see Table 2 in Section 3.5). These are known as Verbal Response Modes (Lampert et al., 2006; Stiles, 1992). We apply these classifications to simulated human-autonomy interactions to identify how speech can be better utilised to coordinate actions.
The current study presents a speech acts analysis of verbal communication, drawn from Lampert et al.'s (2006) classifications, during an AV takeover task between two humans. The handover of control typically requires a notification, an explanation of the current situation, and some level of physical coordination to ensure that the vehicle remains under safe operation (Clark et al., 2019b; McCall et al., 2016). To identify characteristics for AV virtual assistants and human drivers, verbal interaction between pairs of drivers (alternating in the roles of automation or human driver) formed the core dataset of this research study. Cluster analysis was subsequently performed on the dataset to identify the interaction types that participants embody when handing over control to one another. These analyses inform the AV research and manufacturing communities on the nature of human-autonomy interaction design; how the domain can learn from human-human communication; and how to use virtual assistants to address key concerns within the AV domain such as safety, trust, and usability. Additional considerations such as takeover time (ToT; Eriksson and Stanton, 2017a) and frequency of dialogue turns were also analysed to provide additional insight into efficacy and the development of trust within dialogue, although it is noted that this is not the main contribution of the work, and no objective measurement of trust was made as part of this study.
The work is conceptualised and presented as an exploratory study. Nevertheless, it was hypothesized that:

H1. In line with Speech Acts theory, participants taking on the role of 'automation' (i.e., driving whilst their partner took part in a secondary task) and those acting as 'human driver' (i.e., engaging in a secondary task while the automation controls the vehicle) will exhibit distinct, clustered compositions of speech-acts in dialogue.

H2. The number of speech acts in each exchange will correlate with takeover time and the frequency of turns in dialogue, with more speech acts reflected by longer takeover time and a higher frequency of turns.

H3. In line with Azmandian et al.'s (2019) taxonomy, distinctly different interaction types will emerge from speech-act cluster analysis and common pairings of interaction types (between 'human driver' and 'automation') will emerge.

Method
The research was conducted in two parts. Part one was the experiment itself, in which the data were captured and analysed to explore what information should be communicated to the driver. This received ethical approval through the University of Southampton's ethics board (ERGO number: 26691), and is reported in full in Clark, Stanton & Revell (2019b). Part two (reported here) concerns further analysis using transcripts of speech and speech-acts, with the specific aim of exploring how situational information should be conveyed, and in so doing identifies eight different participation roles. This additional analysis received subsequent approval from the University of Southampton's ethics board (ERGO number: 64307). An overview of the original study is provided in the following sections for context, but readers are recommended to consult Clark, Stanton & Revell (2019b) for full details.

Participants
Twenty pairs of drivers (40 participants) were recruited through the University of Southampton's website and advertisements placed around the campus. Participants were aged 18-61 years (29M, 11F; mean age = 31.1, SD = 10.07). All participants held a full UK driving licence. Participants reported driving a mean of 7,169 miles annually (SD = 5,151 miles). Pairings were randomly allocated, and participants within pairs were unknown to one another prior to the experiment.

Design
Driving pairs took it in turns to drive a simulated vehicle (taking on the role of automation) or read a book (taking on the role of a driver receiving control after a secondary task). Prior to the experimental condition, the pairs took part in five handover conditions, which were inspired by shift-work domains (e.g., Bickmore and Cassell, 2001; Clark et al., 2019a; Rayo et al., 2014; Riesenberg et al., 2009). Each condition featured six transfers of control (three in each role). Table 1 outlines these conditions.

Table 1
Experimental conditions and related descriptions that occurred prior to the final free-form condition.

Condition | Description
Free-form | Participants were free to communicate however they saw fit
Checklist-readback* | 'Automation' delivered key pieces of information from a checklist and required the driver to repeat information back
Checklist-questions* | 'Automation' queried the incoming driver on a number of pre-set questions related to the checklist
Open-questions | The incoming driver was prompted by the 'Automation' to ask any questions they might have before receiving control
Timed | 'Automation' notified the incoming driver to take control and counted down from 60

Note. * = checklist information included: Hazards, Lanes, Fuel, Speed, Exit & Action.

Table 2
Speech Act classifications (Lampert et al., 2006).

Participants took part in the 'free-form' condition at the beginning of the experiment, followed by each of the pre-set conditions presented in a counter-balanced fashion. The checklist used in the checklist conditions comprised key pieces of driving-related information curated during an earlier workshop and inspired by two concepts: IPSGA (used as a driver coaching system; Stanton et al., 2007) and PRAWNS (a checklist used in air-traffic control; Walker et al., 2010; Wilkinson and Lardner, 2013). These conditions served as a way of introducing drivers to a range of information-handover techniques common in professional, shift-handover settings, and of providing them with knowledge on what key pieces of information might be most appropriate to communicate during handover situations in automated vehicles, but did not prescribe specific turn-taking behaviour.
Following the different conditions, participants took part in a final, free-form condition (the 'experimental condition' for this study), where they were given a second opportunity to hand over control in an unrestricted manner. The interactions captured during this final, free-form condition comprise the core dataset used in the analysis reported herein. During this final, free-form experimental condition, participants were neither instructed nor prompted to communicate in any particular manner. This allowed for the collection of 'natural', but also informed, data on verbal communication during the transfer of control. During the experiment, all speech data were recorded via a microphone placed on the dashboard and were subsequently transcribed.

Apparatus
The driving simulation took place in the Southampton University Driving Simulator, featuring a 135-degree view and a rear-view mirror to simulate a UK motorway environment. STISIM Drive was used to emulate the motorway environment and simulate vehicles travelling between 62 and 72 miles per hour, with slower vehicles being generated in the left lane and faster vehicles in the right (the test vehicle was driven in the central lane). Two functioning steering wheels and sets of foot pedals were fitted within the cockpit of the simulator to allow either participant, each of whom occupied one of the front seats of the vehicle, to drive (Fig. 1). A button located on the right side of each steering wheel enabled control to be transferred instantaneously between drivers and was activated by the driver taking control (i.e. when they were ready to receive control). A curtain was placed between the drivers to restrict interactions to verbal exchanges only. A webcam was placed behind each driver to monitor their interactions with their respective steering wheels.

Procedure
Participants were welcomed, briefed, and asked to sign a consent form. They were then asked to fill out a form outlining their demographic information (i.e., age, gender, driving experience). They were introduced to the simulator and shown how they should transfer control to one another using buttons attached to the steering wheel to receive control. Participants were told that the experiment was to explore interactions with automated vehicles and that each participant would take on either the role of an "automated assistant in control of the vehicle" or a "human driver taking part in a secondary task and then taking control of the vehicle".
Drivers took part in a 5-min practice trial to familiarise themselves with the environment and the process of transferring control. Participants then took part in the five 'handover' conditions outlined in Table 1 (starting with free-form and continuing with the four counter-balanced conditions), with each condition comprising six handovers of control between participants (three in each role: automation and driver). Each condition began with one participant driving and the other reading a magazine (the 'secondary task'). After random intervals (between 1 and 2 min), the driver currently in control of the vehicle was tapped on the shoulder as a prompt to begin the handover process. When tapped on the shoulder, the current driver was instructed to hand over control to their partner using verbal instructions only, and to follow the protocol specific to that condition (see Table 1). For free-form conditions, participants were not restricted to any existing technique. Each condition was concluded when the sixth transfer of control had been completed and one additional minute had passed. The sixth condition was a final free-form condition in which pairs could communicate however they saw fit, albeit restricted to verbal communications only.

Method of analysis
For the first study, all transcripts were coded according to the information transmitted verbally and the method of delivery during handover (see Clark, Stanton & Revell (2019b) for full results and analysis). For this paper, only speech data captured during the final, free-form condition were utilised, and these were analysed using content analysis with speech-act classifications as a coding structure. Classifications were drawn from Verbal Response Modes (Lampert et al., 2006; Stiles, 1992), which feature classifications of speech acts that have been repurposed to be universally applicable. The coding structure is outlined in Table 2.
Turn Constructing Units (TCUs), segments of speech that are sufficiently complete to be interpreted as turn-ending by the recipient (Couper-Kuhlen and Selting, 2001; Ford and Thompson, 1996; Selting and Couper-Kuhlen, 2001), formed the units of analysis for the content analysis. For example, "Yes" is only one word, but it conveys meaning and can be interpreted as a complete turn, whereas "where are" is not sufficiently complete to be defined as turn-ending dialogue. A TCU can also result in the continuation of the current speaker's turn through syntactic, prosodic, and/or pragmatic continuation (e.g., "There is a vehicle in the left lane … um … and you are expected to turn off at the next junction", featuring two TCUs that could form a turn individually, but are connected by the utterance 'um' into a single dialogue turn) (Ford and Thompson, 1996). Each TCU was analysed and attributed to a single speech-act classification from Table 2 (Lampert et al., 2006). Pauses, intonations and inflections were not included in the coding process. Video recordings were used to measure times from the beginning of the handover to the moment of control transfer.
Two analysts coded the transcripts for each of the 20 pairs of participants. An inter-rater reliability test showed that the coding was sufficiently reliable (Cronbach's Alpha = .78). Both analysts discussed their coding strategy and jointly constructed a synthesised, final analysis of the transcripts by coming to an agreement on TCUs that had been assigned non-identical speech act classifications.
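As an illustration of the reliability statistic named above, the following is a minimal sketch of Cronbach's alpha for two raters. The source does not state which quantities the alpha was computed over; the sketch assumes, purely for illustration, that each analyst supplies one numeric score per dialogue (for instance, the number of TCUs they coded as a given speech act). The function name and the example scores are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-rater score lists of equal length.

    ASSUMPTION: each inner list holds one rater's numeric scores, one per
    dialogue. The source does not specify the unit the alpha was computed on.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    # Variance of each rater's scores across the rated dialogues.
    rater_variances = [pvariance(scores) for scores in ratings]
    # Variance of the summed score per dialogue.
    totals = [sum(scores) for scores in zip(*ratings)]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(rater_variances) / total_variance)


# Hypothetical counts of 'edification' TCUs per dialogue from two analysts.
analyst_a = [1, 2, 3, 4]
analyst_b = [2, 1, 4, 3]
alpha = cronbach_alpha([analyst_a, analyst_b])
```

Because the statistic is a ratio of variances, using population or sample variance gives the same value as long as the choice is consistent.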
Analyses of speech-acts during handover dialogue included the following:

1. Descriptive statistics on dialogue turns and takeover time and a Pearson's correlation between these two measures (Section 4.1; informs H2).
2. Descriptive statistics on overall speech-act frequencies for both driver and automation roles (Section 4.2; informs H1).
3. A TwoStep cluster analysis on the proportions of speech acts within each handover to identify the various interaction types that participants exhibited (Section 4.3; informs H1).
4. Two multivariate between-subjects ANOVAs (one for the role of driver and one for automation) to analyse the differences in the frequency of turns and ToT between the interaction type clusters (Section 4.4; informs H1).
5. A cross-tabular overview of the pairings of interaction types exhibited within the handovers (Section 4.5; informs H3).

Within these analyses, the term 'role' is used to refer to whether the speaker is taking on the actions of the 'driver' (reading a book during periods of automation and taking control of the vehicle when requested to do so) or the 'automation' (currently operating the vehicle before handing control over to the driver). These roles consist of various 'interaction types' (see below), which are defined by the clusters identified in this analysis and represent the verbal-communication strategy that the speaker has taken on during dialogue.

Frequency of dialogue turns and takeover time
During the experimental condition (i.e. the second free-form handover used for the current analysis), there was a total of 240 handovers. This comprised 3 handovers for each participant within each role (i.e. 3 handovers × 2 roles × 20 pairs × 2 participants per pair). Each pair took it in turns to either be the automated system or the human driver (i.e. 120 handovers in each role).
A dialogue turn was defined as the beginning of a statement or utterance, finishing with the other participant beginning their statement or utterance. The handover process did not have a time limit. Therefore, participants were able to interact for however long they felt necessary and take as many dialogue turns as they needed to complete the handover. Overall, participants exhibited a median of 4 turns to complete the handover, with the lower and upper quartiles lying two turns either side of the median (2 and 6 respectively; see Table 3). This indicates that 75% of all handovers were achieved via 6 turns or fewer (3 per participant). The remaining 25% of dialogues were clustered towards the lower end, with some handovers taking up to 19 turns to complete. Figs. 2 and 3 display density plots and normal distribution plots for ToT and frequency of turns for all handover dialogues within the study. As is apparent from the fluctuations within Fig. 3, takeovers typically featured even numbers of turns (due to the handover typically ending with an acknowledgement or disclosure from the driver taking control).
Takeover time was measured from the beginning of the first utterance to the time of control transition, defined as the instant that the button to take control was pressed. This action was reflected in the simulation data to show that control had been transitioned to the second steering wheel. There was a strong Pearson's correlation between takeover time and the number of turns in the dialogue (r = 0.84, p < .001). Dialogues took a mean of 17.98 s to complete (mdn = 13.5), with a mean of 4.31 turns, indicating a mean time of 4.59 s per dialogue turn (expressed as ToT/Turns in Table 3). Twenty-five percent of handovers took over 22 s to complete, with some handovers taking up to 72 s to complete.
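The descriptive measures in this section can be sketched in a few lines. The example below computes a Pearson correlation between turns and takeover time and a time-per-turn figure for a handful of invented handovers; the data and variable names are hypothetical and are not the study's data. It also shows that the mean of per-handover ToT/Turns ratios generally differs from the ratio of the two means; the source does not say which of these underlies the reported per-turn figure.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# HYPOTHETICAL handovers: (number of dialogue turns, takeover time in seconds).
handovers = [(2, 6.0), (4, 14.0), (4, 12.0), (6, 25.0), (10, 48.0)]
turns = [t for t, _ in handovers]
tot = [s for _, s in handovers]

r = pearson_r(turns, tot)
median_turns = median(turns)
# Mean of the per-handover ratio ToT / turns.
tot_per_turn = mean(s / t for t, s in handovers)
# Ratio of the two means; not the same quantity in general.
ratio_of_means = mean(tot) / mean(turns)
```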

Overall speech act frequencies
Automation and driver roles exhibited markedly different frequencies in speech acts during dialogue. Those in the role of automation focused primarily on edifications and advisements, providing the driver with information to help them through the process, whilst asking questions to ensure the driver was comfortable and able to take control. Those in the driver role typically acknowledged incoming information, providing the automated system with their perceptions and declarations of readiness. Fig. 4 shows these frequencies of speech acts across roles. Speech acts such as confirmations, interpretations and reflections were seldom used during dialogue in either role.

Interaction type clusters
All 120 handover dialogues were attributed a proportion value for each speech act, representative of the percentage of the dialogue that featured the respective speech act. These proportions were analysed using a TwoStep Cluster Analysis for each role (driver and automation) to identify unique, frequent distributions of speech acts during handover dialogue (Norusis, 2011). The highest cluster quality was obtained by including the four most relevant speech acts for each role: both the driver and automation interaction type cluster analyses included 'disclosures', 'questions' and 'edifications'. Additionally, the driver role included 'acknowledgements' whereas the automation role included 'advisements'.
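The proportion values described above can be sketched as follows. This is a minimal illustration that assumes each handover is available as a list of speech-act labels, one per coded TCU, for a single role; the label spellings, the constant name and the example dialogue are hypothetical, and the full coding scheme in Table 2 is not reproduced here.

```python
from collections import Counter

# The four speech acts the text names for the driver-role clustering.
DRIVER_ACTS = ("disclosure", "question", "edification", "acknowledgement")


def speech_act_proportions(coded_tcus, acts):
    """Percentage of a dialogue's coded TCUs that fall in each listed act.

    ASSUMPTION: the denominator is the number of TCUs coded for this speaker
    in this handover; the source does not spell out the denominator.
    """
    counts = Counter(coded_tcus)
    total = len(coded_tcus)
    if total == 0:
        return {act: 0.0 for act in acts}
    return {act: 100.0 * counts[act] / total for act in acts}


# Hypothetical driver-side coding of one handover dialogue.
example_dialogue = ["acknowledgement", "question", "acknowledgement", "disclosure"]
profile = speech_act_proportions(example_dialogue, DRIVER_ACTS)
```

Each handover then contributes one such profile, and the set of profiles is what a clustering procedure would take as input.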
The TwoStep cluster analyses both produced a sample (n = 120) with a silhouette measure of cohesion and separation of 0.5. The analysis identified four frequent combinations of speech acts for each role, presented in Figs. 5 and 6. Both figures show the labels given to each speech act, the proportion of handovers attributed to each cluster, and a density plot for each cluster/speech act combination (the x-axes represent the percentage of dialogue during handover consisting of that given speech act and the y-axes represent the frequency of handovers to which that percentage has been attributed within that cluster). The following clusters were labelled to reflect the speech-act proportions within each interaction type.

Dialogue partner taking on the role of 'automation'
The Information Desk (39.2%) - Provides valuable situation awareness information about the environment and the upcoming situation, either as a response to questions or delivered proactively. The Information Desk may ask the driver whether they have any questions before handing over control.
The Supervisor (32.5%) - Will provide guidance and suggestions for when and how to take over control, as well as delivering some key information relevant to the situation.
The Interrogator (17.5%) - Typically ensures the driver is aware of their surroundings by 'quizzing' them on the situation. The Interrogator will also provide advisements based on answers.
The Converser (10.8%) - A mix of all speech-acts; conversation is mutual and less likely to be goal-oriented.

Dialogue partner taking on the role of 'driver'
The Coordinator (35%) - Features a high number of acknowledgements and disclosures, with a low number of questions and edifications. The Coordinator has a greater focus on the coordinative aspects of the control transition (e.g., confirming control, and relaying preparedness to receive control).
The Perceiver (26.7%) - Features a high number of edifications, acknowledgements, and disclosures, but rarely asks questions about the situation. The Perceiver relays information to the vehicle ('automation'), either to inform it of what they perceive to be happening or to respond to the questions laid out by the automated system.
The Inquirer (24.2%) - Raises situation awareness by querying the automated system about the environment, upcoming actions, or intentions. The Inquirer confirms receipt of information and communicates to the vehicle when ready to receive control.
The Silent Receiver (14.2%) - Communication is short and comprises mainly disclosures. The Silent Receiver will typically relay when ready to take control, but not readily engage in additional dialogue.
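As a quick arithmetic check, the reported cluster percentages can be converted back into handover counts. The counts below are inferred by rounding against the sample of 120 handovers and are not reported in the text; the function name is hypothetical.

```python
N_HANDOVERS = 120

# Cluster shares as reported for each role (percent of handovers).
automation_pct = {"Information Desk": 39.2, "Supervisor": 32.5,
                  "Interrogator": 17.5, "Converser": 10.8}
driver_pct = {"Coordinator": 35.0, "Perceiver": 26.7,
              "Inquirer": 24.2, "Silent Receiver": 14.2}


def implied_counts(percentages, n=N_HANDOVERS):
    """Convert rounded percentages to the nearest whole number of handovers."""
    return {name: round(pct * n / 100) for name, pct in percentages.items()}


automation_counts = implied_counts(automation_pct)
driver_counts = implied_counts(driver_pct)
```

Under this rounding assumption the implied counts are 47, 39, 21 and 13 for the automation role and 42, 32, 29 and 17 for the driver role, each summing to 120.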

Speech Act Clusters and dialogue turns/takeover time
Because the clusters were derived from proportions of speech acts, they were further analysed to explore whether there was a relationship between the distribution of speech acts and the length of dialogue (i.e., number of turns) and takeover time (Fig. 7). A multivariate between-subjects ANOVA showed a significant main effect of speech-act cluster on turns (F(3,1) = 2.96, adj. R² = 0.049, p = .036), and of speech-act cluster on ToT (F(3,1) = 5.74, adj. R² = 0.112, p < .001). A Tukey post-hoc analysis showed that the difference between clusters in the number of turns occurred between the Information Desk cluster and the Coordinator (p < .05). In addition, a Tukey post-hoc analysis for ToT showed that takeover time for the Information Desk cluster was significantly longer than for all other clusters within the analysis (p < .05). All other comparisons were not significant.

Interaction type pairings
Table 4 presents a cross-tabulation of the interaction types identified in the TwoStep cluster analysis, showing how many times each pairing of interaction types occurred within the 120 handovers. The table shows that Perceivers were more commonly paired with Supervisors and Information Desks, Coordinators were more commonly paired with Supervisors and Information Desks, and Inquirers were more commonly paired with Information Desks. Silent Receivers appeared to have no favoured interaction type pairing.
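A cross-tabulation of this kind can be sketched as a simple count of (automation, driver) label pairs, one pair per handover. The example pairs below are invented for illustration and do not reproduce the values in Table 4; the function names are hypothetical.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Count how often each (automation_type, driver_type) pairing occurs."""
    return Counter(pairs)


def row_totals(table):
    """Sum the pairing counts for each automation interaction type."""
    totals = Counter()
    for (automation_type, _driver_type), count in table.items():
        totals[automation_type] += count
    return totals


# Invented cluster labels for five handovers.
handovers = [
    ("Information Desk", "Perceiver"),
    ("Supervisor", "Coordinator"),
    ("Information Desk", "Inquirer"),
    ("Information Desk", "Perceiver"),
    ("Interrogator", "Perceiver"),
]
table = cross_tabulate(handovers)
totals = row_totals(table)
```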

Discussion
By using naturally-occurring human-human speech data, the study aimed to inform the design of virtual assistants and conversational user interfaces for future AVs. The study featured two humans alternating in the roles of 'automation' and 'driver' through handovers of control in a driving simulator, with verbal exchanges being the sole communication method. Speech act classifications (Lampert et al., 2006; Stiles, 1992) and cluster analyses (Norusis, 2011) of speech transcripts identified eight interaction types encompassing the virtual assistant and the human driver, four for each. The results thus provide evidence to suggest that drivers cluster their speech into distinct interaction types (confirming hypothesis H1). Furthermore, the number of speech acts in each exchange differed in terms of takeover time and the frequency of turns (confirming H2), and distinct driver-automation interaction type pairings were evident (confirming H3).

Time of takeover and dialogue turns
Overall, handover interactions typically featured four turns, with a turn taking a median of 4 s to complete. These findings show that it may take a median of 13.5 s and four turns to (re)build sufficient SA and reach a suitable level of trust and coordination between the driver and the automated system prior to taking over control (see Hoff and Bashir, 2015, for an overview of trust formation before and during an interaction). Notably, there were multiple instances of handover interactions taking up to 72 s (19 turns) to complete. Eriksson and Stanton (2017a) had previously found upper bounds of 25.7 s for takeover, although this was notably in the absence of communicating situational information. This suggests that the ability to communicate during the takeover may naturally extend takeover time, and therefore the dialogue should be tailored to ensure that the takeover occurs in a timely fashion and safe operation is maintained. Nevertheless, it has also been argued that takeover time should be paced by the driver and not the machine (Stanton, 2023; Stanton et al., 2021). Regardless, these findings further demonstrate that although communication and the raising of situation awareness are important, AV design should consider the upper bounds of user behaviour.

Characteristics within interaction types
The cluster analyses on speech act classifications found several characteristics that AV virtual assistants and drivers can adopt. In the role of automation, the speech act interaction types were defined as: the Supervisor, the Information Desk, the Interrogator and the Converser. Most frequent of these, representing 39.2% of occurrences, was the Information Desk, which focused on delivering information about the environment (primary speech act classification: edification), either as a response to the driver's questions or given unprompted. Handovers identified as featuring the Information Desk demonstrated a greater level of detail and focus on situation awareness information, which necessitated a longer takeover time and a greater number of turns (median turns = 6, median ToT = 19 s), approximately doubling the takeover times of other automation interaction types. Participants who took on the interaction type of the Information Desk (as automation) were paired most commonly with driver interaction types of Perceivers (those that also shared information about the environment), Coordinators (those that featured a high level of acknowledgements and self-readiness), and Inquirers (those that asked questions about the situation). These findings show that users are likely to want to engage in information transfer during the handover of control. However, this is likely to lead to increased takeover time, which may affect the safe and effective handover of control in certain contexts and situations (Eriksson and Stanton, 2017a).

Table 4
Frequency of interaction type pairings for 'automation' (rows) and 'driver' (columns), with common pairings highlighted in bold/underlined.

Much like the Information Desk, the Interrogator is also concerned with the raising of situation awareness (17.5% occurrence). However, the Interrogator asks questions of the incoming driver to confirm that their level of situation awareness is sufficient and compatible with the system. The Interrogator may therefore be demonstrating indicators of a trust formation process (Morita and Burns, 2014), showing that the automated system may require a demonstration of human-performance measures prior to handing over control. This is supported by current developments in AV technology, which are tailored towards ensuring that the human driver is alert and engaged in the driving task before transferring control (Kashef, 2019).
Together, the Information Desk and the Interrogator (totalling 56.7% of occurrences) focus on transactions in situation awareness and ensuring that compatible mental models are developed and/or updated prior to the takeover of control (Sorensen and Stanton, 2016; Stanton et al., 2017). It follows that these interaction types may be most appropriate for safety-critical interactions. However, additional costs of coordinated activity such as time and cognitive resources may need to be incurred to uphold these interaction types (Klein et al., 2004). Intuitive driver interaction type partners for the Information Desk and the Interrogator are the Inquirer and the Perceiver, who ask about and provide perceptions on, respectively, the situation. Together, Inquirers (24.2% of handovers) and Perceivers (26.7% of handovers) made up a small majority of handovers (50.9%), showing that drivers are generally receptive to transactions in situation awareness, either through querying or being queried.
The Supervisor (with primary speech act: advisement), although partially concerned with the raising of situation awareness, is also concerned with the role of 'directability': "one's ability to direct the behaviour of others and complementarily be directed by others" (Klein et al., 2004; Johnson et al., 2014, p. 52). This interaction type accounted for 32.5% of handover interactions and aims to ensure that drivers receive critical pieces of information, whilst providing an advisement to take control, thus ending the interaction between driver and automation. The Supervisor features a strength that may alleviate some of the concerns about the Information Desk having high ToT upper bounds: it provides a definitive end to the interaction by requesting that the driver takes action to complete the handover. Intuitively, the Supervisor would typically pair well with the Coordinator driver interaction type, who typically expressed their ability to take over control and relayed confirmations of the control transition.
The remaining interaction types, the Converser (10.8% of handovers) and the Silent Receiver (14.2% of handovers), featured in the fewest handovers and offer relatively undefined approaches. Arguably, the Converser represents a hybrid of previous interaction types by providing some information, advisements, and disclosures. This interaction type could therefore be readily implemented in many different contexts. The Silent Receiver, however, features the shortest ToT and fewest turns to complete the handover, but as such, demonstrates that these participants did not engage with the communication of situation awareness information. This category may suffer from complacency or over-trust (Lee and See, 2004), as there is little opportunity to build a situational picture prior to the takeover (see Wilkinson and Lardner, 2013, for further discussion of complacency during shift handover).

Practical implications
The findings show that drivers and virtual assistants may have synergies that complement one another during the handover of the driving task. While one might argue that the characteristics of each of the identified interaction types intuitively suggest natural pairings (see Table 5 for suggested/intuitive pairings), there was only limited evidence of these pairings during the study (see Table 4 in Results). In other words, a number of other pairings also naturally emerged. This suggests that, although ideal pairings may be predicted or assumed, the system must be flexible enough to allow other partnerships to develop depending on user preferences and/or the dynamics of the conversation (see Eriksson and Stanton, 2017b).
From a practical point of view, these characteristics can be instilled within virtual assistants to improve the efficacy of the handover. For example, drivers that want to inquire about the road environment prior to the takeover of control would require a virtual assistant that provides them with key pieces of situational information. In practice, the car would therefore need to categorise its human driver and develop an appropriate virtual assistant around their preferences and needs. Thus, drivers who prefer to coordinate actions and await information related to the road situation (rather than building it themselves) may be assigned the Supervisor, whereas those that want to be prompted to perceive the environment and actively build their own situation awareness (the Perceiver) may be assigned the Interrogator to support their user-interaction style.
It is worth acknowledging that the study was situated within routine handover scenarios, and thus equates loosely to so-called level 3 automation (SAE, 2016). Although routine handovers will likely comprise the majority of exchanges of control at this, and indeed other, intermediate levels of automation (and thus sufficient time should be factored into the handover to ensure that the exchange can finish in a timely manner), it is expected that there may also be emergency or unexpected handovers that require a quicker exchange of information. Some of the conversational partnerships highlighted here may be less than ideal in this situation, extending the conversation beyond the available time. However, the partnership models are not restricted to the perfunctory handover of control and apply equally throughout the journey experience. In other words, the virtual assistant may adopt the most appropriate interaction type (based on the driver's preferences and prior exchanges) to keep them apprised of the road system during periods of automation, thereby ensuring that they are already well-prepared should an emergency or unexpected handover occur (in a similar manner to the 'chatty' co-driver concept proposed by Eriksson and Stanton, 2017b). Moreover, the exchanges could also be used to highlight the key elements that should be communicated, even if time is limited (the so-called 'must-haves'), resulting in truncated exchanges that are still in keeping with the partnership models. Additionally, elements of the partnership models could be adopted within higher levels of automation in which the humans may never take control. Indeed, verbal interaction and social artificial intelligence are expected to take a larger role in the AI community generally (Joo et al., 2019), and would therefore likely become more prevalent in higher levels of automated driving. As such, interaction types, such as those identified in this research study (and indeed, the process by which they were extracted), may form the basis of future verbal interactions with autonomous pods, trams, and 'robotaxis', albeit to enhance the user experience rather than the transfer of control, per se.
The findings in this research study suggest that manufacturers may benefit from offering multiple characteristics (or 'personalities') within their virtual assistants by categorising their drivers, and providing customised interactions based on the desired purpose of the interaction (e.g., raise situation awareness and coordinate actions during the handover of control, or engage in conversation to keep the driver alert). This approach also has the potential to enhance road safety, for example, by managing the time to takeover and the costs incurred by providing greater communication potential and implementing dialogue closers, such as those provided by the Supervisor (e.g., "Please take control of the vehicle now!"). In practice, this could be achieved by designing four to six dialogue turns that take between 14 and 22 s in total to perform but also provide the driver with an alert to the handover, key pieces of information about the environment, and a definitive closer to the handover activity through an advisement.
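The dialogue budget suggested above (four to six turns, 14 to 22 s in total, ending with an advisement) can be expressed as a simple design-time check. The following is a minimal sketch only: the `Turn` structure, the `check_plan` function, the example utterances (other than the Supervisor closer quoted above) and the per-turn durations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str     # "assistant" or "driver"
    speech_act: str  # label from the coding scheme, e.g. "advisement"
    text: str
    seconds: float   # estimated spoken duration (hypothetical values)


def check_plan(turns, min_turns=4, max_turns=6, min_s=14.0, max_s=22.0):
    """Return a list of problems; an empty list means the plan fits the budget."""
    problems = []
    if not min_turns <= len(turns) <= max_turns:
        problems.append(f"expected {min_turns}-{max_turns} turns, got {len(turns)}")
    total = sum(turn.seconds for turn in turns)
    if not min_s <= total <= max_s:
        problems.append(f"expected {min_s}-{max_s} s in total, got {total}")
    assistant_turns = [turn for turn in turns if turn.speaker == "assistant"]
    if not assistant_turns or assistant_turns[-1].speech_act != "advisement":
        problems.append("the assistant's final turn should be an advisement closer")
    return problems


# Invented five-turn plan: alert, information, and a definitive closer.
plan = [
    Turn("assistant", "disclosure", "I will need to hand over control shortly.", 3.0),
    Turn("driver", "acknowledgement", "Okay.", 1.0),
    Turn("assistant", "edification", "Traffic ahead is slowing in our lane.", 5.0),
    Turn("driver", "acknowledgement", "Got it, I'm ready.", 2.0),
    Turn("assistant", "advisement", "Please take control of the vehicle now!", 3.0),
]
problems = check_plan(plan)
```

A check of this kind only constrains the scripted plan; in use, the driver's own turns would determine the actual takeover time.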
Virtual assistant characteristics may also adapt to the context of the handover. For example, when approaching a junction on a highway, the takeover may not be as urgent. In these scenarios, the Information Desk may be more appropriate. On the other hand, in a scenario in which fog has closed in and sensors are no longer functioning at a suitable level, the Supervisor may be most appropriate to guide the driver towards key actions, whilst being conservative with ToT. Further research should explore such contexts to determine how situational factors affect the nature of verbal communication and virtual assistant role attribution in AVs.
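The assignment logic discussed in this section could be sketched as a small selection rule. The mapping below uses only the pairings suggested in the text; the function name, the `time_critical` flag, the precedence of context over driver preference, and the fallback for unmapped driver types are assumptions made for illustration.

```python
# Driver interaction type -> assistant interaction type, as suggested in the text.
PREFERRED_ASSISTANT = {
    "Inquirer": "Information Desk",  # wants key pieces of situational information
    "Coordinator": "Supervisor",     # prefers to coordinate and await information
    "Perceiver": "Interrogator",     # wants to be prompted to perceive
}


def select_assistant(driver_type, time_critical=False):
    """Pick an assistant interaction type for a driver type and context."""
    if time_critical:
        # Assumption: urgency overrides preference, because the Supervisor
        # closes the dialogue with an advisement and is conservative with ToT.
        return "Supervisor"
    # Assumption: unmapped types (e.g. Silent Receiver) fall back to the
    # Information Desk when the takeover is not urgent.
    return PREFERRED_ASSISTANT.get(driver_type, "Information Desk")


routine = select_assistant("Perceiver")
foggy = select_assistant("Perceiver", time_critical=True)
```

Because other pairings also emerged naturally in the study, any such rule would be a starting point that the system should be free to revise as the conversation develops.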

Situation awareness and trust
It is evident that the interaction types emerging from this study have varying characteristics for addressing issues in raising situation awareness, but these also have the potential to mediate trust. Notably, those that are likely to provide additional safety and security are also those that may counteract the age-old issue of 'silent automation' (Norman, 1990). Moreover, interaction types such as the Information Desk (automation) and the Interrogator (driver) are likely to lead to better-aligned mental models of the driving and road situation (Endsley and Kiris, 1995; Heikoop et al., 2016; Stanton and Young, 2000) and address Lee and Moray's (1992) mediating factors of trust formation (performance, process and purpose) by providing insight into what the automated system can perceive, and what its purpose is. However, this may come at a cost, requiring additional dialogue turns and thereby extending takeover time. Designers and manufacturers should be aware of the tools available to them in raising situation awareness and mediating trust, and should consider how far in advance dialogues are initiated prior to handover. The value of virtual assistants has the potential to grow over time and with continued use. For example, in automated driving, the nature of interaction is first and foremost orientated around safety. However, as automated systems develop, and user relationships with technology evolve, virtual assistants have the potential to be driving companions (Lugano, 2017; Large et al., 2019) and organisers of journeys, and to provide an overview of what actions should be performed and when (Walch et al., 2015). Virtual assistants are also expected to be able to analyse the emotions and awareness of the user (Chiu et al., 2020; Schewe et al., 2019).
Finally, it is noted that the study necessitated the frequent transfer of control (every couple of minutes, or so). Naturally, this would be highly unexpected in a real-world scenario. Nevertheless, it was a necessary part of the experimental method as a means to encourage verbal exchanges and to ensure that sufficient, rich data were collected, and is in keeping with similar driving-related studies in which participants repeatedly experience different conditions in rapid succession. This does not negate the ecological validity of the results; indeed, the study was concerned with how participants acted out the transfer of control using verbal exchanges (not their views on the performance and resilience of the vehicle automation) and there is no reason to expect these to differ if exchanges occurred less frequently.

Conclusion
By categorising speech interactions between two human drivers tasked with exchanging control between each other, as a proxy for an automated vehicle handing over control to its driver, this research study identified eight potential interaction types: four for virtual assistants and four for drivers. Participants were more receptive to raising situation awareness collaboratively through user querying or user questioning or the coordination of actions. The interaction types revealed in this study provide readers with a framework for defining driver requirements and can be readily implemented in future virtual assistants. The study also provides additional insight into the role of dialogue turns and takeover time, and how they relate to the raising of situation awareness and the calibration of trust, although such assertions require validation in future work. Regardless, manufacturers and policy makers should consider the trade-offs between takeover time and the necessity to rebuild situation awareness, as virtual assistant interaction types that focus on coordinating actions (rather than building situation awareness per se) may be more suitable in time-critical takeover scenarios.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Driving simulator set-up to simulate handover between two drivers.

Fig. 2. Overall frequency of handovers by time to takeover.

Fig. 3. Overall frequency of handovers by turns in dialogue.

Fig. 4. Overall frequencies of speech act classifications grouped by role.

Fig. 7. Box plots showing the number of dialogue turns grouped by Speech Act Cluster for Automation (A) and Driver (C), and Takeover Time by Speech Act Cluster for Automation (B) and Driver (D).

Table 3
Descriptive statistics for Turns, Takeover Time and the division of Takeover Time by Turn.

Table 5
Predicted (or intuitive) interaction type pairings.