Vulnerable road users and the coming wave of automated vehicles: Expert perspectives

dominant and advised against text-based and instructive eHMIs. AR was commended for its potential in assisting VRUs, but given the technological challenges, its use, for the time being, was believed to be limited to scientific experiments. The present expert perspectives may be instrumental to various stakeholders and researchers concerned with the relationship between VRUs and AVs in future urban traffic.

• Augmented reality (AR): AR allows the user to see the real world with virtual objects overlaid or embedded in it. AR supplements reality rather than replacing it (Azuma, 1997).
• Automated Vehicle (AV): a vehicle capable of driving itself but which requires human intervention at certain points. Automated vehicles are not to be confused with autonomous vehicles, which are vehicles capable of sensing their environment and moving safely without human input, also known as connected and autonomous vehicles (CAVs), driverless vehicles, robotic vehicles, or vehicles that exhibit SAE Level 5 automation (Taeihagh and Lim, 2019).
• Detached eHMIs: eHMIs that are not attached to the vehicle, but which may be projected on the ground or visible elsewhere in the environment or on a wearable.
• Explicit communication: "behaviour signaling perception and/or movement without at the same time achieving either of these" (Schieben et al., 2019a). Examples are hand gestures, vocal communication, and eye contact (Schmidt, 2000).
• Extended reality (XR): an umbrella term for immersive technologies, where 'X' can stand for various spatial computing technologies (e.g., VR, AR).
• External human-machine interfaces (eHMIs): communication devices located on the outside of the vehicle that can communicate to surrounding road users. An example is an electronic display on the front of the car (Colley et al., 2017; Schieben et al., 2019b).
• Implicit communication: "behaviour which is at the same time both achieving and signaling movement and/or perception" (Schieben et al., 2019a). Examples are vehicle speed and gap size (see also Bazilinskyy et al., 2019; Dey and Terken, 2017; Schmidt, 2000).
• SAE levels of automation: classification of different levels of vehicle automation from no automation (Level 0) to full automation (Level 5). From Level 0 to 2, the driver has to be in control of the vehicle, and the automated driving system provides limited assistance. At Level 3 (conditionally automated driving), the driver is no longer required to monitor the road permanently but has to take over control when the system requests it. At Level 4, the vehicle may operate itself without human intervention on certain types of roads, whereas at Level 5, it can drive itself anywhere and under all conditions (SAE International, 2018).
• Segregated traffic: refers to the separation of traffic streams, for example through the subdivision of towns and cities into certain units where road traffic is restricted, and pedestrians predominate (Mayhew, 2015).
• Shared control: a situation in which human and computer are performing the same task at the same time (Sheridan and Verplank, 1978). However, the term shared control is often used more loosely to describe a situation in which a human and computer are performing distinct aspects of a task (e.g., monitoring vs control) at the same time.
• Smart infrastructure: a traffic system which can monitor, measure, analyse, communicate, and act based on sensor-captured information (adapted from Royal Academy of Engineering, 2012).
• Vehicle-to-Everything (V2X) communication: an umbrella term for a vehicle communication system where information from onboard sensors and other sources travels via high-bandwidth wireless links. V2X encompasses vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), vehicle-to-pedestrian (V2P), and vehicle-to-network (V2N) communication. V2X may be part of future automated driving systems, where vehicles will be able to communicate with other vehicles, pedestrians with wearables, data centres, and infrastructure such as signage, dynamic lane markers, cameras, parking meters, and street lighting.
• Virtual reality (VR): a computer-generated simulation of a realistic experience. VR usually blocks out the real world and replaces it with a virtual synthetic environment (Azuma, 1997).
• Vulnerable Road User (VRU): non-motorised road users, such as pedestrians and cyclists as well as motorcyclists and persons with disabilities or reduced mobility and orientation (Directive, 2010/40).
• Wearables: devices that can be worn and which contain computer technology or can connect to the internet.

Introduction
Automated driving is a topic that has been discussed for over 80 years. Already in the first half of the 20th century, futuristic plans were created to roll out an automated highway system (AHS) in the United States (Geddes, 1940; Kröger, 2016). These futuristic visions have turned out to be inaccurate. Automated driving can better be described as an evolutionary process in which more and more computer systems have appeared in cars. Cruise control has been around since the 1950s, adaptive cruise control has been available for more than 20 years (Bärgman and Victor, 2020; Bengler et al., 2014; Stanton et al., 1997; Jurgen, 2006), and there is now Level 2 automation (SAE International, 2018) where the driver can be intermittently hands-free, but not mind-free (Banks et al., 2014; Dikmen and Burns, 2016).
Over the past decade, there have been a large number of research projects that have focused on automated driving for relatively simple road environments such as highways. In particular, since about 2010, there has been a surge of research on highway automation involving SAE Level 3 automation, where the driver is free to perform a non-driving task, such as watching a movie. The research so far has covered topics such as human-machine interface design (as reviewed in Carsten and Martens, 2019), transitions of control (e.g., Eriksson and Stanton, 2017; Forster et al., 2017; Körber et al., 2018; Zeeb et al., 2015), and driver state monitoring to track the driver's level of stress and visual attention (e.g., Cabrall et al., 2018; Dong et al., 2011; Kim and Yang, 2017).
Research is now entering a new phase, with researchers and manufacturers examining automated driving technology in more complex environments such as cities. This direction concerns new research projects, such as the EU-funded project Supporting the Interaction of Humans and Automated Vehicles: Preparing for the Environment of Tomorrow (SHAPE-IT) (European Commission, 2019), which investigates this challenge from a Human Factors perspective. City automation raises various new questions, such as when AVs will be truly self-driving (SAE Levels 4 and 5) or whether such vehicles will have a form of shared control in which continuous input from the human operator is required.
With the advent of vehicle automation in cities, research topics will switch from drivers in AVs towards VRUs and their interactions with AVs (e.g., Applin et al., 2015; Hagenzieker, 2015). Foremost, it should be examined whether AVs will be beneficial to the safety and efficiency of VRUs. Moreover, the role of smart infrastructure in future traffic and whether such infrastructure will be used in the communication between AVs and VRUs must be explored (Banks et al., 2018; Sewalkar and Seitz, 2019; Toh et al., 2020). The question here is whether the road infrastructure should communicate with road users, and how the communication between AVs and VRUs should take place.
One of the research areas dealing with communication between AVs and VRUs concerns external human-machine interfaces (eHMIs). A large variety of eHMI designs exists in academia (see Dey et al., 2020a; Rouchitsas and Alm, 2019) and industry (see Bazilinskyy et al., 2019). These come in different modalities, including LED strips and screens, robotic attachments, projections on the road, and auditory signals, amongst others. The question remains, however, whether AVs will communicate with VRUs via eHMIs or whether AVs will only use traditional signalling methods such as turn indicators and brake lights (see Norman, 2014). The importance of this topic may rise if the behaviour of AVs becomes indistinguishable from their manually driven counterparts (see Emuna et al., 2020; Oliveira et al., 2019; Stanton et al., 2020).
Anthropomorphism could be introduced in future traffic, and some researchers have already proposed human-like eHMIs (e.g., eyes on the car by Chang et al., 2017), to enhance user acceptance and safety. Anthropomorphic communication is popular in robotics (see Duffy, 2003; Fink, 2012), but it is still an open question whether anthropomorphism would be beneficial in eHMIs. Another fundamental topic which is still open for debate is whether communication between AVs and VRUs should be egocentric or allocentric in nature. The former entails that the AV instructs the VRU on their next action (e.g., "cross now"), which may be regarded as clear and unambiguous (Bazilinskyy et al., 2019), whereas the latter communicates the state of the AV and leaves the decision to the VRU, an approach taken by various research groups (e.g., Cefkin et al., 2019).
There are other open questions such as whether eHMIs could or should be detached from the AV (as discussed in Eisma et al., 2020, and Mahadevan et al., 2018). More specifically, the question is whether receiving information that is separate from the AV would prove advantageous over eHMIs that are attached to the body of the AV. Furthermore, an important challenge is how eHMIs should operate in a traffic situation consisting of multiple road users. In some cases, the eHMI may want to send a message to just a single VRU (e.g., 'I have seen you'), which could confuse other road users. These factors, together with VRU workload, cultural and language barriers, as well as the particular user requirements of older persons and children, are among the challenges to be explored.
A solution to these challenges could be the introduction of augmented reality (AR). The effectiveness of AR technology, a subset of extended reality (XR), has already been explored and demonstrated in many domains, including driving (Schall et al., 2013) and navigation (Narzt et al., 2006), but also arts, education, entertainment, medicine, tourism, military applications, and marketing (e.g., Chang et al., 2014; Sanna and Manuri, 2016; Tabone, 2020; Van Krevelen and Poelman, 2010). Recent research has recommended the use of XR technology in AV-VRU communication and pedestrian simulation testing (Perez et al., 2019). Although it is still speculative whether AR will be used in future AV-VRU communication, the recognition of AR in other domains and the fact that AR technology is likely to improve in the coming years suggest that it is a worthwhile area of scientific endeavour. Could AR be a useful addition to daily life and to VRU-AV communication specifically? What information should an AR device show to the VRU? Could AR offer a solution to the above-mentioned problem of multi-agent interactions? And how should the issue be addressed that not all VRUs may have access to AR devices?
Various research groups are currently studying the design of the future city, and dozens of concepts for VRU-AR interaction have been proposed so far, a situation which Dey et al. (2020a) recently characterised as an "eHMI jungle". Given the current state of the field, it seems worthwhile to perform a high-level survey among leading academics in the field. More specifically, the aim of this paper is to shed light on the above questions and uncertainties by soliciting the independent input of renowned Human Factors scientists who are experienced in this area. The idea is to assimilate the views of these researchers and generate an overview of topics of agreement and disagreement. This paper is organised as follows: firstly, the methods used in this study are described (Section 2), followed by 14 narratives that summarise the researchers' views (Section 3), and a discussion that reflects, using academic literature, on the convergences and divergences of the major points examined in the study (Section 4). Key takeaways and concluding remarks are presented at the end (Section 5).

Methods
Human Factors researchers were invited to share their views on the topic of AV-VRU communication in future traffic. The researchers were selected based on their publication record in the area of Human Factors of automated driving and based on their eminence as judged from the number of citations in their Google Scholar profile. Furthermore, preference was given to principal investigators in the SHAPE-IT project, because of its high relevance to the study objective. For diversity of views, it was ensured that a maximum of one researcher was recruited from the same institution, with the exception of TU Delft, which featured two researchers from different faculties, and with the exception of two researchers who themselves invited a colleague to join them in the same interview. The researchers were included as authors for their intellectual contribution.
In order to obtain input from the researchers in equivalent forms in terms of the topics covered, text length, and terminology used, it was decided to solicit their input through semi-structured interviews, which were subsequently summarised by the first author, with the help of the second author, into a concise narrative.
The interview questions (see Appendix A) were split into four main themes: (1) general questions on AVs, (2) eHMIs, (3) AR and AR eHMIs, and (4) VR and AR for experiments. More specifically, the first section explored the researchers' views on the arrival of SAE Level 4 and 5 vehicles, shared control, the future of pedestrian-to-vehicle interaction, and smart road infrastructure. This was followed by a section that explored the researchers' views on the usefulness of eHMIs, design considerations and future design directions such as eHMI detachment from the vehicle. The third section of the interview concerned the exploration of the viability of AR technology for eHMIs. Lastly, the interview concluded with questions related to the effectiveness of VR simulations for investigating pedestrians' behaviour around AVs and the potential use of AR technology in such experiments.
The researchers were interviewed via video conference and recorded with their consent during the months of May and June 2020, with the interview durations varying between 45 and 120 min. Each video interview was transcribed and summarised into a narrative of approximately 700 words. The narratives reflect the researchers' statements during the interviews and do not include citations to academic references unless the researchers verbally referred to literature.
The narrative was sent to each researcher for approval or further adjustment from their end. Each researcher adjusted their contribution to clarify various points, respecting the thematic structure of the original contribution. The fourth theme of the interview (i.e., VR and AR for experiments) was omitted from the individual researcher narratives because the researchers provided similar comments on the positive and negative aspects. The major points made for the fourth part are included in the Discussion.
In summary, the aim was to create a position paper containing the current perspectives of independent researchers. Similar exercises have been conducted previously, in the form of multi-author guideline papers, working groups, expert consensus papers, and Delphi surveys (e.g., Leiter et al., 2011; Lorenzon et al., 2018; Potapov et al., 2019; and see Kyriakidis et al., 2019, which explored the distinctive perspectives regarding Human Factors challenges in the development of AVs focusing on highway driving and the driver inside the AV).

Jonas Bärgman
Automated Vehicles: Because of all the complexities involved, including vehicle performance in different weather conditions, it will be a long time before we see SAE Level 5 vehicles on the roads, if ever. Level 4 vehicles will be available on motorways within five years or so. However, as they would require hand-over at, for example, off-ramps, many may still call it Level 3 automation. So, the SAE levels can be quite confusing. There will be some low-speed whole-trip Level 4 vehicles in cities in the same time frame. Pilots are already ongoing, but still on a relatively small scale. We will see fewer crashes with AVs in urban environments as all human problems related to perception and distraction will be avoided. However, there will likely be other types of crashes where the human would have been better at preventing the crash. There will also be new behavioural risks, where pedestrians will take the opportunity to cross the roads as they know that the AVs would stop. Therefore, AVs will not, at least not initially, be as mobile as manually driven vehicles.
Communication will play an important role in future traffic. However, it will take some time until communication equipment such as transponders in vehicles is pervasive. A complete shift to 5G communication on our mobile phones would greatly help in accomplishing this. Following the creation of standards, 5G would also be useful for vehicles to see beyond objects that obstruct camera and LIDAR vision.
eHMIs: eHMIs will make it easier to communicate the status of the vehicle to VRUs and will allow building predictions on how AVs will perform. The communication of eHMIs should be language independent and focus on auditory, visual, and perhaps tactile signals. There must be standardisation, as it would be highly problematic to learn the eHMI for every car manufacturer. Standardisation would also help cross-cultural communication.
We will see context-aware AR in the distant future but not the kind where there are perfect overlays on the real-world image a user sees. There are too many components right now that still need improvement, and prices need to reduce. I see a lot of challenges for a communication-based AR system.
As an information provider, AR glasses can give the user auditory or textual feedback on objects in their sight, but great care needs to be taken not to overwhelm wearers with information. If at all, indications should only be given when pedestrians are about to commit an action, such as crossing. The interface should be simple and able to reliably handle situations in which multiple cars are approaching. There should be no direct instruction to pedestrians about when they can cross. Rather, the pedestrian symbol should disappear, similar to showing only the red stoplight. In this way, the decision is left to the pedestrian, and only the AV's intent is communicated.
Augmented Reality: We cannot assume that pedestrians will be wearing AR glasses. AR design should be guided by the assumption that at least one of the pedestrians will not have it. Hence, we should still adhere to all the normal rules, and kinematic cues such as keeping speed, braking, or accelerating, will still be the main means for AVs to communicate their intentions. In other words, pedestrians may infer that an AV is automated through an eHMI and base their action on the vehicle's kinematic cues. Kinematic cues would be the most efficient and safe way for one-to-many or many-to-many AV-VRU communication scenarios. It is likely that eHMIs will not be needed anymore when AVs are ubiquitous, and humans have learned to interact with them. While AR information could be relied upon and helpful for pedestrians to navigate traffic situations, I do not think that we are going to see this being used in safety or time-critical situations. It must still be safe without them.

Martin Baumann
Automated Vehicles: There are currently SAE Level 5 people movers that drive at 15 km/h, but it will take 50 to 75 years for Levels 4 and 5 automation to occur on urban roads sharing the space with non-automated traffic participants. VRU-AV communication will mainly consist of AV behaviour, supported by basic eHMIs that communicate the intentions of the AV. I think in the future this can be complemented by wearables on pedestrians that would enable communication with the AVs.
Maybe we will see infrastructure such as lighting on the ground to support the interaction between pedestrians and AVs. Since smart infrastructure is expensive, the question is: who will pay for it? Smart infrastructure might therefore be introduced only in dedicated and selected shared spaces, which could become more efficient with the investment. Smart infrastructure will allow the AV to increase its perceptual horizon and see around corners. Wearables probably will enhance pedestrians' perception of smart infrastructure elements. I think this is something that will happen and will probably be accepted by traffic participants if the privacy issues are treated well.
eHMIs: I am positive that eHMIs will effectively support the interaction between AVs and pedestrians. However, it depends on their design and the information presented. The current ways of communication and interaction between drivers and pedestrians will need to be transferred to AV-pedestrian communication. For example, in case of a deadlock, you would need to communicate explicitly, and so eHMIs would be essential. I think that eHMIs that communicate the intention of the vehicle are more effective than those that give advice to the interaction partner. AVs cannot be culture-independent, and there must be adaptations to the cultural context. For example, when I am standing at a zebra crossing, I would expect the car to stop for me in Germany, but this may not be the case in France or Italy. This means that I will have to communicate and interact in a way that depends on the cultural context. These differences in behaviour and habits must be respected by AVs.
Most of the communication should happen through legacy behaviours such as the movement of the AV. If this is not enough and there is still ambiguity, then simple explicit signals should factor in. Wearables might support this process, but if they cannot provide an unambiguous reference to a vehicle outside, then VRUs may not trust and use them. Receiving a message that a vehicle from the right will stop to let me cross, without specifying which of the vehicles, does not seem very helpful. Thus, eHMIs that are placed elsewhere than on the vehicle itself may not be effective and acceptable.
Augmented Reality: AR glasses could help to solve this problem. They have the potential to make the situation more transparent if they do not overload the user with all possible information but present the relevant things only. Pervasive and context-aware AR could be available soon with a multitude of applications, if it is not already there. An example would be navigation advice through AR.
AR glasses will have to be socially acceptable. One of the problems of Google Glass was that people disliked the thought of being recorded. AR glasses offer the possibility to display more information than an AV-mounted eHMI could, and to identify the AV that is the source of communication and the interaction partner. The AR system should provide information related to safety. For example, the pedestrian could be presented with safety corridors related to which vehicles will stop for them. The advantage of this safety corridor concept is that it is integrating information from several vehicles. In doing so, it is clear to whom the vehicles are communicating, in contrast to the undirected communication of traditional eHMIs. Nevertheless, there will still be a need for eHMIs on the car or infrastructure as a backup since there will not be 100% market penetration for these glasses.

Shuchisnigdha Deb
Automated Vehicles: SAE Level 3 is not safe because it splits the monitoring and control roles between the system and the driver, which confuses drivers. Therefore, SAE Level 4 should be introduced instead of SAE Level 3. Level 4 will be on the roads in 10 years as it is not safe yet. We must rebuild our infrastructure to support such vehicles. SAE Level 5 vehicles will be introduced in urban environments in the very distant future, following their introduction in dedicated lanes. For the time being, I foresee shared control and smart infrastructure solutions, with operators inside the vehicle and in control rooms.
Pedestrians are cautious when told that there will be AVs on the road, but this caution subsides quickly once correct behaviour is observed from the AVs. There is a risk that pedestrians will be jaywalking in front of AVs. Similarly, children may assume that the AV will stop for them, or they could just run out into the middle of the road. This would be very concerning. We cannot simply make AVs very conservative because it means that anybody could obstruct them and create a traffic jam. Road users should be educated on AVs and their capabilities and limits.
eHMIs: Substantial efforts have been made to enhance AV-VRU interaction via eHMIs using LED lights on the vehicle. However, we have not seen such eHMIs on current SAE Level 2 vehicles, and it is unclear how it will work out in the future. I believe that eHMIs are essential on Level 4 and 5 AVs since pedestrians need a signal of some sort. People prefer AVs with eHMIs, and confusion arises when there is no such interface or driver present. If elements are borrowed from current pedestrian signals, we can develop an interface that could become a new standard that pedestrians can learn. Research shows that people usually understand eHMI signals, such as a stop sign or upraised hand, the way they should. However, anthropomorphic interfaces such as a smiley face are not useful as they may elicit surprise and curiosity, which can be counterproductive. Children especially may huddle around the vehicle to explore the eHMI. Text always works well, but it depends on literacy, and it is problematic in cross-country scenarios.
I think it is reasonable to trust eHMIs that are detached from the AV. If we want to connect everything, we will have a mobile application that will give users a signal based on acoustics or other signals like vehicle speed. We are currently exploring whether providing information to cyclists on mobile phones would be overwhelming or useful. AR glasses will probably be useful for cyclists in the future. I am not sure how useful these would be for pedestrians since they move slowly and hence have enough time to decide.
Augmented Reality: Pervasive and context-aware AR will be of great benefit to humans, especially for the training of the workforce, where it is already being used. For example, in the USA, AR is already being used in the Navy and for construction safety and agriculture. However, we need to do a lot of research before we put such devices in front of people's eyes. AR should not overwhelm the user. Information should include highlighting of hazards such as specific alerts that a vehicle is approaching. While this can be a good solution for vulnerable road users like pedestrians, it can also be overwhelming. For situations where not everyone has an AR device, I would like to see information from the smart infrastructure. If there is no such infrastructure, the signal should come from the car itself. There could also be a hybrid approach depending on the location.
AR could be used to alleviate ambiguous situations where an AV is communicating to multiple pedestrians. Such a situation is confusing and may lead to accidents, but it is still better than not having a signal at all. The solution here is once again standardisation and training.

Azra Habibovic
Automated Vehicles: SAE Level 5 vehicles will be driving in urban environments only in the very distant future. For Level 4, it would depend on what is meant by that level. We can have Level 4 in different operational design domains. We will probably first have Level 4 automation in specific operational design domains, such as highways. Vehicles with this potential may be available by 2023-2024, while Level 4 vehicles on a wider scale may surface in 10-15 years. Vulnerable road user safety will be improved if AVs are designed properly, but I think there are still many uncertainties around AVs in general. I do not think that automated driving would eliminate all possible accidents and incidents in traffic, but hopefully, it will make the situation better.
Smart infrastructure will be part of future transport. Although AVs should be able to operate by using the information provided through on-board sensors only, their operation could be improved with the help of digital infrastructure and other vehicles. As it looks right now, information exchange will not necessarily occur using V2V but rather through different cloud systems. However, making use of safety-critical information from other sources is currently challenging.
eHMIs: It is uncertain if eHMIs would be needed or implemented. eHMIs might be necessary when automated driving is not very mature and not accepted in society yet. Another point which is not studied enough is how eHMIs affect traffic flow and efficiency. Also, long-term behavioural effects of eHMIs are largely unknown. It will take many years to be able to decide on the perfect eHMI modality since we would need a large enough fleet of AVs to test this. eHMIs must be based on different modalities and not be based exclusively on light or text. Text-based eHMIs are challenging because they would need to be translated into many different languages, whereas anthropomorphic eHMIs pose a challenge of cost and durability. The vehicle should communicate its intent and status without instructing people as the latter could be the cause of legal incidents. You can never be sure that neighbouring vehicles will also stop for the pedestrian. One should also consider eHMIs in terms of vehicle motion and vehicle appearance, as many important signals are today communicated to other users via such implicit means.
Requirements of people with different needs in society, such as children and people with visual or auditory impairments, must be addressed. The prototypes that exist now and are used in research studies are usually based on one modality, and if they are evaluated, they are not evaluated with children and older persons. We need to have a much wider approach to research and development. Education should ensure that children learn the meaning of an eHMI just as they learn the meaning of traffic lights. Another approach to make eHMIs suitable to children would be using multimodality and existing colour conventions, such as green for 'go' and red for 'stop'. However, with current regulations, this might be difficult as certain colours are already in use in traffic, and we are therefore limited in what colours to use in eHMIs.
Augmented Reality: Augmented reality has potential, but for some reason, it has not yet had its breakthrough. Pervasive and context-aware AR has been under development for many years, but my impression is that AR devices are not moving forward at high speed. It could mean that the technology has not reached an acceptable level of maturity or that we have not found a good application yet. But this could change drastically, as happened with the smartphone.
AR has the potential to amplify information that is already available but is unseen. Accordingly, AR has the potential to be useful for children and people with impairments. I do not think that the information in the AR environment should be any different compared to communication by the AV itself. It should be directed and minimalistic in nature as we do not want to overload the user. In any case, there must be a combination of communication modalities, as not all VRUs will be using AR. AR would be interesting to explore for cases when the AV has to communicate to individual pedestrians.

Marjan Hagenzieker
Automated Vehicles: SAE Level 4 AV shuttles have already been implemented on urban roads on dedicated lanes and sometimes mixed with cyclists and pedestrians on short stretches of road. It can be debated whether they are really Level 4 because there are stewards on-board. The steward needs to intervene often because of technological failure or due to objects on the road. Level 4 and 5 passenger cars are decades away, if they ever materialise for large-scale use at all. It will be particularly challenging to get them implemented on a large scale on urban roads. Since SAE Level 5 is difficult to achieve, I believe that we will probably have shared control for a long time. I am not too optimistic about pedestrian safety as I am afraid that technology is promising more than it can deliver. Although we see that advanced sensors that can detect pedestrians are being developed, it is unclear whether they can predict what pedestrians will do.
It is difficult to say which role smart infrastructure will have in the future because there will be a need for standards and numerous authorities need to collaborate. The European view is that there will be such infrastructure in the future whereas, in other parts of the world and certain industries, there are wishes of independence, where vehicles communicate with each other without the need of specific communication with the infrastructure.
eHMIs: It is not clear yet to what extent eHMIs will be helpful. It is still a new area of research, and most research has been done for a single person interacting with a single vehicle. The research so far shows that it is difficult to convey a message that is understood by everyone in the same way. Road users base their decisions not necessarily on the eHMI but rather on AV speed and distance. The need for communication with pedestrians will decrease as pedestrians build up experience with AVs. The most effective eHMIs appear to be those that communicate the vehicle's intentions not only to pedestrians but to all traffic around it.
I am curious as to whether anthropomorphic eHMIs will be effective.We should look at other fields such as health and robotics, to see how they use anthropomorphic ideas.Moreover, the issue of different cultures needs to be solved.Text messages work best when the receiver knows the language.We have the same problem in traditional traffic, where foreigners do not always understand road signs.There will probably be a multitude of solutions just as has happened with route guidance.It is good to have standards.In this way, we can build up expectations and schemas, and accordingly make fewer errors and become safer.However, we are so much at the beginning; it is too early to work on standardisation.For the time being, we should be open-minded and do out-of-the-box research.
Augmented Reality: Pervasive and context-aware AR will become available soon.Some solutions are already available for specific groups of people, but we are a long way from having it on a larger scale.The information that is useful in AR is not any different from what pedestrians currently need.That is, it would be good to know if there are obstacles on the road or moving objects that the pedestrian should be aware of.The AR system could communicate what behaviour is allowed, similar to how road users currently gather such information through visual and auditory feedback of the road environment.However, one must consider which tasks should stay with humans because humans may be better at certain tasks or can do them naturally without external aids.
AR could be a solution to language barriers: If you know the user, you can implement the right language.Moreover, AR can be helpful in AV-VRU communication, but it is still something that has to be investigated.Maybe one could stimulate road users to use AR equipment even if they do not like it.However, if it is pleasurable to use, they may gain a lot more from it.AR technology will be rejected, or it will be adopted out of free will because it serves lots of purposes, just like the smartphone.As with any automation, technology is one thing and what humans want is another.Sometimes developments go very fast, and sometimes it becomes disastrous and dies out.

P. A. Hancock
Automated Vehicles: AVs will be restricted to freeways in the coming years and then move on to arterial roads and urban areas within ten years.We will asymptote towards SAE Level 5 as automated control penetrates all forms of transportation, including ocean transport such as container ships.The latter will occur silently compared to ground transportation, and once manufacturers realise this, they will progress even more rapidly towards the development of SAE Level 5 AVs.For such vehicles to be deemed safe by pedestrians, we need to show empirically that the AV does no greater harm than the manned vehicle.AVs will first be used in niche areas, where they can best be employed for making a profit.
Pedestrians are highly vulnerable, and high-density pedestrian areas are not a good place for AVs to operate, as a conservative algorithm would drastically slow down the vehicle's movement.Since the intelligence may not necessarily have to be on the AV itself, such pedestrian dense environments can especially benefit from smart infrastructure.However, this presents both political and technical challenges, which include making the intelligent roadway work for non-AVs and the limitation of sensor capacities in different geographical locations.A further issue is the funding model.Infrastructure, since it is a public good, would presumably need a public-private transformation for improvement.
eHMIs: There may be value in using eHMIs for informing pedestrians about the AV's intentions.It would be ideal to provide a unified affordance of the intentions of the vehicle, rather than just a visual colouring.For example, it would be interesting to alter the AV's perceived surface to make it appear as more or less threatening, contingent on need.The message communicated should attempt to resolve as much ambiguity with the shortest possible signal.With regard to AR-based HMIs, text would be too slow and constrained, and therefore, there must be more focus on graphical, aural, and tactile message representations.The focus would be more on collision avoidance than on information transmission per se.
It is unknown how an eHMI should communicate to multiple individual pedestrians; this would also raise the question of liability in case of a collision.The AV's algorithm would need to demonstrate that it did not explicitly discriminate one pedestrian over another.In terms of the Haddon hierarchy (Haddon, 1970), the easiest solution is to keep pedestrians and AVs separated as much as possible and have only limited locations where interaction can occur.If interaction were to occur with detached eHMIs or wearables, it should not impose the cognitive load that is freed from the AV driver onto the pedestrian, as this could lead to overload very quickly.The idea of having these capabilities is not to aid you while you are walking but to permit you to do something else while you are walking.
Augmented Reality: AR will be helpful in VRU-AV communication.Humans prefer accessing information anywhere and anytime as required.This is being strongly demonstrated during the present COVID-19 lockdown period.There is much potential for future use too, such as the integration into spacesuits for use on Mars missions.Even if most people wear AR glasses, some people will still be excluded.Also, there are several Human Factors and Ergonomics issues that need to be solved, including for how long a user could wear AR glasses.Ideally, the user should not know that they are wearing it since the whole point about interfaces is that they disappear.If you push on the idea of Gibsonian affordances (Gibson, 1979), the affordances are not a conscious experience; affordances are an implicit process of the environment around the perceiver.Another important issue is what would happen if you remove or leave the AR off.In this case, one may expect negative transfer, and a possible solution would be to invest in the infrastructure.

Riender Happee
Automated Vehicles: Automation in passenger cars will gradually progress from current SAE Level 2 to Levels 3 and 4, with an operational design domain limited to highways and other roads where pedestrians and cyclists are not to be expected. The perception capabilities of these vehicles will enhance safety on all types of roads through increasingly effective intervention systems, such as automated emergency braking. Level 5 is decades away, but driverless Level 4 shuttles already operate on public roads with a safety steward on board, and several sites aim to drive without a steward in the near future. The presence of a steward is reassuring for occupants but often not visible or known to other road users. Hence, driverless shuttles provide a suitable basis for investigating and improving AV interaction with pedestrians and cyclists. Surveys show that pedestrians and cyclists encountering driverless shuttles are willing to accept such AVs on public roads. However, they also want AVs to express intentions through visual and auditory interfaces.
For milestone improvements in urban road safety and AV acceptance, a bigger change is needed.I believe that AVs and vulnerable road users such as pedestrians and cyclists should not encounter vehicles driving at high speeds and violating traffic rules.This can be achieved through traffic separation, banning manually driven cars and with speed limits that are enforced through communication systems.Pedestrians and cyclists must also do their bit to improve safety, by changing their behaviour.AVs will aim to prevent all accidents, even if they result from other road users' misbehaviour.Therefore, misbehaviours may induce deadlock in dense pedestrian areas.Cities may have to be redesigned to contain well-designed static infrastructure and communication systems.Traffic lights may be replaced by or connected to communication systems in cars and systems worn by pedestrians and cyclists.
eHMIs: Similar to smart infrastructure, eHMIs can contribute to safety and AV acceptance.Our research indicates eHMIs are especially useful at low speeds where pedestrians have time to interpret and react to eHMI signals.At longer distances, recognition of eHMIs is problematic.Our experiments showed surprisingly small differences between fundamentally different types of eHMIs in terms of acceptance and effect on behaviour, and participants learned to use eHMIs quickly.Possibly, our participants simply reacted to the changing eHMI colour, text, or symbol.This eHMI change was always coupled with implicit communication, which remains an important factor.
Overtrust in eHMIs may cause accidents.This ties in with the issue of directing the message to the appropriate actor: pedestrians may see a message that was not intended for them and cross erroneously.One solution is that AVs communicate status rather than instruct.After further research in a preferably worldwide population, we must harmonise eHMIs, similar to current traffic signs and vehicle lights.
Augmented Reality: Regarding signals received wirelessly, the smartphone is too unreliable.AR is promising, but for prolonged use, comfort should be improved in terms of resolution, image stability, and smoothness in order to prevent eyestrain or even motion sickness.AR will enable personalised information to VRUs early during an AV interaction.AR would be ideal for conveying if a vehicle is automated, whether an AV has seen the VRU, and which action the AV will take.An interesting concept would be an augmented 3D traffic light in the form of a virtual fence to stop pedestrians from crossing a vehicle lane.It would be ideal for tram lanes as well, and simple enough for a child to understand even in ambiguous situations when multiple pedestrians are present.
Safe and acceptable AV interaction may be feasible without eHMIs or AR, but I do expect substantial benefits of such systems.To compensate for not everyone having access to the technology, AVs should be designed to be understandable even for those who do not have the technology or have other limitations.

Josef Krems and Claudia Ackermann
Automated Vehicles: Level 4 automation will be available on urban roads in 10 to 20 years, while Level 5 automation may appear 5 years later.However, some people will still like to have the pleasure to drive by themselves.These people may have to be forced to use automation by law to reduce accidents.AVs' communication to pedestrians should be through implicit behaviour from the vehicle, while eHMIs should be used in ambiguous situations.
Another option is to use the infrastructure to communicate to the pedestrian, so the car does not have to be used as a communication entity.Smart infrastructure will play a pivotal role in future electromobility.Silent cars should not be equipped with additional noise as it would counteract the idea of silent cars.A better solution is to use infrastructure to warn people via devices such as smartphones.Infrastructure will also play a key role in traffic separation.Segregating different modalities such as AVs from manually driven cars, cyclists, and pedestrians is very costly, however.
There might be a lot of warning signals in the future, but there is a limit to the number of signals that humans can handle. At the same time, information processing capabilities will increase as people become better at understanding how the connected world works.
eHMIs: eHMIs should be avoided unless we can show that their benefits are large.It is a tricky thing to have new signals on the road as we know that signals tend to add workload and confusion.eHMIs should make crossing easier rather than more difficult.There are still numerous questions related to how the right message should be communicated to the right person.There will not be a need for eHMIs for every single interaction for every pedestrian.There are a lot of technological, regulatory, and standardisation challenges such as regarding projections on the road at daylight, the limited colour options for eHMIs, designing in the appropriate size to make the interfaces visible from 30 to 40 m ahead, and making sure that eHMIs work across different crossing cultures and language barriers.This was one of the main benefits of the UN Vienna Convention (United Nations, 1968).Unfortunately, for the time being, all OEMs are creating their own distinct designs.
Driver gestures do not play a significant role except for very short distances; a pedestrian's decision to cross is usually taken before that.Therefore, there will not be much of a difference between an AV or a manually driven car approaching from a certain distance, and so the same signals and interaction principles apply.Kinematic cues such as vehicle deceleration would still signify that the vehicle has acknowledged the pedestrian and intends to stop.Hence, an eHMI will only be useful when the pedestrian is still unsure whether the car has acknowledged them.
Visual eHMIs are effective, but there is no clear answer as to whether to use symbols or text-based messages.Another issue is that current eHMIs are very artificial and have no cultural backup.They would require a lot of training or time for people to get used to them.While there are eHMI designs that utilise anthropomorphic elements to enhance understandability, future AVs will allow a whole new design for the AV and eHMIs.The inside and outside of the AV will probably not be humanlike but designed so that pedestrians can recognise the AVs and adapt their behaviour.Our research has shown that people want to have instructions from the car about what to do.Participants found information about the status of the AV to be ambiguous and less trustworthy.
Augmented Reality: AR-based eHMIs will not be much different from regular eHMIs.Most people do not like additional tools and will have a problem wearing AR glasses.However, acceptance would be higher if it is something that does not require continuous interaction.It would be important to support situation awareness.We need to find out what kind of information has to be made artificial or augmented.Context information such as regarding the dynamics from approaching vehicles could prove useful.For cases where not everyone has access to AR, intelligent infrastructure such as a flashing zebra crossing or a special traffic light that addresses pedestrians, could be used.
Although AR already exists in research environments and at OEMs, it will not have a role in the market of the near future due to its cost.For the time being, this technology will be used for some special applications and users such as firefighters, police, or maybe rescue personnel.A designed-for-all approach will take a long time.

John D. Lee
Automated Vehicles: SAE Level 5, by definition, will never happen.It is defined as being able to drive under any condition, which is impossible even for manually driven cars.There will always be limitations, such as during a snowstorm.At the same time, Level 4 shuttles that have no driver, steering wheel, or brake pedals already exist in urban environments.As these vehicles become more advanced in the next ten years, they might be perceived as Level 5 vehicles because their passengers feel that they can go anywhere.For the time being, shared control is viable, because people can pair with automation to accommodate unanticipated variability of the driving environment.Autonomous taxis that serve multiple people are really the promise of vehicle automation.But this promise cannot be achieved through shared control.Therefore, to get to the promise of AVs there needs to be Level 4 automation.
Interaction with pedestrians is going to be challenging in several ways. Firstly, the unpredictability of pedestrians makes it almost impossible for people or algorithms to avoid collisions. Secondly, pedestrians interact in a social and culture-specific manner, which makes it challenging to create algorithms with the requisite culturally-specific driving expertise. Lastly, pedestrians may perceive the risk of AVs differently compared to manual vehicles. Pedestrians may reject AVs since pedestrians have no incentive to accept them into their space.
I am not optimistic about smart road infrastructure as this presents a challenge of cost and backward compatibility. A virtual traffic signal might work well for properly equipped vehicles but will be invisible to those without. So, it would be difficult to get people to invest in this. I think the most challenging part would be how to communicate with pedestrians. If it is virtual, then that mandates equipment on the pedestrian to signal to the infrastructure and display the received signals. This may be possible in a country with a high standard of living where everybody can be outfitted with smart glasses. But in other countries, you have economic disparities, which will leave large parts of the population wandering about without instrumentation and displays. Non-smart infrastructure might be a more productive way forward. Best practices of infrastructure design that currently help drivers and pedestrians negotiate the roadways might also help pedestrians and AVs interact safely.
eHMIs: eHMIs will almost certainly be implemented in the future. They will be useful in building trust among pedestrians whilst giving them a feeling that the AV is polite. However, eHMI design should be considered as a secondary communication channel. eHMIs should be paired with vehicle motion cues since these can communicate the intention of the automation more effectively than, say, a text-based interface because people have evolved to communicate through motion. The motion of the vehicle can appear menacing or safe, depending on the deceleration profile. This approach would also be helpful to make such communication cross-cultural, since motion cues are language independent. Motion also addresses a deeper anthropomorphic level based on a fundamental perception of motion and the meaning of that motion. On the other hand, the surface anthropomorphic level is based on elements such as putting eyes on the vehicle. I am uncertain how effective that would be, but work in robotics suggests that it might work as people seem to be sensitive to eye gaze. eHMIs that instruct the pedestrian can be risky as people may over-rely on the automation. Instead, the eHMI should inform the pedestrian of the AV's intent, which would hopefully prime the pedestrian to check for other cues in the environment. A final important aspect for eHMI development is to include children, older people, and other vulnerable populations as priority test cases because what is explainable to them in a few words would hopefully be understandable to the rest of the population.
With regards to detached eHMIs, I think that physical proximity is important, or at least the illusion of it.With detached eHMIs, there may be an additional burden of mental rotation and mapping of the image to the object it refers to.More generally, detached eHMIs need to consider the frame of reference of the information that the person must use to interpret the information.The motion of the vehicle itself is likely the most direct and interpretable cue.
Augmented Reality: AR smart glass technology is here, but its penetration is still low.It is a technology that may be on the cusp of widespread use, but whether people will find it useful and adopt it broadly is the question.This technology will offer both a benefit and disbenefit to people, similar to the smartphone, but it will have even greater power to attract and guide attention.
AR could prove beneficial for both pedestrians and drivers.It could help drivers understand what the automation is doing and why they might have to take back control.On the pedestrian side, AR will enrich the communication between the pedestrian and the driver.Pedestrians could see the vehicle's intent more directly.Another thing that AR can help with is giving directions and public transit coordination, where information is overlaid on the world.However, relying on AR as a safety feature for interacting with AVs is problematic.AR provides more flexibility and opportunity for personalization than physical reality for the design of eHMIs, but personalisation may not be a good idea since users oftentimes do not understand what the best solution is.
Most importantly, AR facilitates communication but only for those who have it and are wearing it, and we should design for those who are not. Consequently, it should be considered a secondary information source that complements non-AR sources of information. Working out the necessary communication in the absence of AR seems important to me. I am inclined to say that the vehicle cues should be sufficient on their own to ensure safety. In other words, the non-augmented layer would have to be sufficient for those who are not seeing the augmented layer.

Marieke Martens
Automated Vehicles: There is a lot of confusion about what the levels actually mean.When thinking this through, there is, for instance, a lot of confusion about Level 4, with a huge difference between SAE Level 4 public transport and SAE Level 4 personal cars.For public transport or robot taxis, you can train a vehicle to drive on specific routes.For the general public, this will be interpreted as fully automated or autonomous driving.However, for personal vehicles, Level 4 means the car can drive itself for a specific amount of time, offering the driver the chance to do something else and requesting the driver to take back control in various conditions.The only difference in this case between Level 3 and Level 4 is that, if the person does not take back control, there is a safety-backup, without specification of how safe this option actually is.Level 5 refers to fully automated or autonomous driving, under all conditions.I do not understand this rush for Level 5; the most important thing is how we can improve traffic safety and avoid confusion for other road users.
eHMIs: The concept of the eHMI has been introduced to support the interaction with other road users and prevent confusion about what the vehicle is doing when in automated mode. The eHMI may need to show that a dual-mode vehicle is in automated mode. Depending on the surroundings, other signs and signals may be needed. One must distinguish between what is absolutely necessary to communicate (to improve traffic safety or at least not cause accidents) and what is not absolutely required but may still be wise, for instance to improve acceptance. Some signals may be the same as those of manually driven vehicles, such as turn signals and stop lights in the rear of the vehicle. Secondly, there is the question of liability, which also guides the way manufacturers will present messages to pedestrians. For example, it is unlikely that an AV will tell the pedestrian what to do. Even though people often indicate that they want to know if the AV has detected them, I believe that we should not do this because it could give rise to misunderstandings and unsafe situations. It makes more sense to indicate what the AV plans to do, the so-called communication of intent. When there are multiple AVs, pedestrians, cyclists, and other road users, it will be difficult to indicate who the message is for. An issue is that eHMIs may distract VRUs from looking at non-AVs, which is undesirable (as also pointed out by International Organization for Standardization [ISO], 2018).
Augmented Reality: Smartphones or wearables could be used to predict the influx of pedestrians such as at schools or at the end of big events, and reroute AVs accordingly. However, I do not believe that the solution is that everybody should be connected and that VRUs are warned about AVs on their phones.
I do not believe in smiling cars, strange designs, text, or voice messages.eHMI messages should be simple and always in relation to the vehicle's behaviour as vehicle motion is really the strongest cue.The largest benefits of eHMIs are in shorter-distance communication, since when a vehicle is further away, people pay attention to other cues such as speed, movement, and distance.It would be interesting to be notified not just about the AV's intention to stop but also about where it intends to stop; this is what we found in research.
With regard to eHMIs not being on the car, I think this is too risky since different cars may project different things and the view may be blocked or changed by the presence of others. And let's not forget that the movements of the car are the primary means for conveying vehicle intent.

Natasha Merat
Automated Vehicles: The deployment of SAE Level 5 vehicles in an urban environment will probably never happen, whereas SAE Level 4 vehicles may be deployed around 2040.Until that time, it would be a good idea to keep the human in the loop, and shared control may be one of the ways to accomplish this.
In the InterACT project (interACT, 2017), we have observed that, while technological developments have made AVs great at obstacle detection, they still have limitations when it comes to seeing around other vehicles and anticipating the future movements of other road users such as cyclists and pedestrians.We must go a step further than object detection and have the AV communicate to different traffic actors so that there is a decent flow of traffic.The general feeling is that people want the AV to communicate with them, as was made very clear in the CityMobil2 project.
eHMIs: We are currently looking at whether eHMIs can replace interactions by drivers.It has become clear that the presence of an eHMI generally translates to quicker pedestrian crossing decisions.Although there is no set formula on how to design these interfaces, we do not think that text is useful, as it is not very international or decipherable at a distance.We are, therefore, testing lights, but there are still uncertainties regarding the choice of colour and presentation methods, and of course, this is still an issue for visually impaired road users.Research on anthropomorphic messaging has suggested that personalised messaging, which, for example, makes use of a family member's voice, can be effective.It would be interesting to investigate whether similar results could be achieved for an AV-based eHMI.
Pedestrians report that knowing that the AV has detected them is important. However, a problem that remains is how to communicate between an AV and multiple actors simultaneously. So far, most studies have been conducted between only two actors: an AV and a pedestrian. Infrastructure and wearables have been mentioned as possible solutions. For smart infrastructure to be part of the equation, there must be much more investment in reliable communication technology.
Augmented Reality: Pervasive and context-aware AR is already being developed and implemented inside the car. So why not outside of the car? AR would help us see things that we cannot see and, for research, investigate traffic situations in which human presence would not be safe. There is a role for psychologists in the design of AR systems; the design should ensure that users direct their attention to the right information at the right time. The use of AR glasses can be powerful as it can allow communication with different people at the same time. Accordingly, we can move forward from the present one-to-many communication, which is currently problematic.
Most eHMI research has been conducted in the Western world, with only a few exceptions.Introducing eHMIs everywhere would be challenging.AR technology may contribute to solving the issue of communicating to multiple cultures as a different interface would be used for each, similar to having different voices for your satnav.In a way, AR will also allow pedestrians to walk around while consuming media and be prompted when a vehicle communicates.This is a similar analogy to the AV, allowing the driver to do other tasks.In a way, AR will free the pedestrian, just like the AV frees up the driver.

Don Norman and Colleen Emmenegger
Automated Vehicles: We do not like the levels.They mislead because they use the wrong dimensions to characterise the complex and subtle distinctions that are necessary.Note that fully autonomous vehicles already exist in mining, agriculture, and factory floors, and there are attempts to make home deliveries autonomous.It will take a very long time until commercial versions of these vehicles are released onto urban roadways without changing the infrastructure and without separating the humans from the AVs.
As regards shared control, we must first look at its definition. It means that both the operator and the machine are performing a task together. What this often means is that a person is supposed to sit and do nothing for hours, but is expected to respond in a tenth of a second in cases of an emergency. That is not sharing but monitoring, and it is an impossible situation. We have long argued that excellent but partial autonomy should be skipped in favour of full autonomy. Complete autonomy in any situation, especially off-road in difficult terrain, will not happen for a very long time, if ever.
There are ways of dividing up the job so that the autonomy does what it is good at and people can do what they are good at, both doing it together, much as a rider and a horse coordinate their activities. Riding a horse is a shared activity, where the horse does all the low-level detailed stuff while the rider controls the goals and the pace. But where necessary, the rider can force the horse to do things it would otherwise not do, and if the rider falls asleep or is otherwise incapable of supervision, the horse can take over, either returning to its home base or simply stopping at a safe location, waiting for assistance.
It is the AV's responsibility to make its intentions clear to all other traffic participants. However, signalling the intention does not mean that everyone who needs to know it receives or understands the message. The AV's intention could be conveyed in many ways, but requiring everyone to carry a special device to read that intention is not a viable solution.
The safest approach is to separate the means of travel, removing the priority over all other users of a shared infrastructure that has been given to the automobile. Safety requires separation of the many different modes of transportation. Fundamentally, the problem is that mixed modes of almost anything are dangerous.
It is incredible how well the current traffic system works. Our research has indicated that pedestrians and other road users pay attention to the movements of cars. A car that stops is signalling to others.
Similarly, a car that travels very slowly, far slower than is permitted, is also signalling. Where a car stops at a crosswalk signals whether the car will wait for pedestrians and other users or whether it is anxious to get moving, which means other road users should be cautious. Any form of explicit communication should be built on top of this baseline. The vehicle should communicate its intention so that other road users can chart their best course of action.
eHMIs: The approach of having the car signal to other road users that it is safe for them to proceed is very dangerous. Drivers do it today with hand signals, sometimes leading to accidents. This type of signalling only works when there are so few road users that the intended recipient is unambiguous and there are no other possible vehicles that might suddenly appear, negating the signal given by the first vehicle.
Our studies show that vehicles should convey intentions and not tell others how to behave. Movement is one way of doing this. The nuances of gestures through vehicle motion cues may be different from one country to the next, but the notion of using motion to communicate is still a valuable one. Using motion would be ideal in a multicultural setting.
If messages are placed outside of the car, human attention becomes more scattered, especially when there are many vehicles, so people may miss a critical signal. The use of sound may help because hearing picks up sounds from all directions. However, if many vehicles are simultaneously signalling through sound, the result might be confusion and chaos. These problems need to be explored deeply. And here is where standardisation is essential.
Augmented Reality: Developing AR-based eHMIs is the wrong perspective, as it puts the responsibility on the pedestrian to act on the presented information. Instead, it should be a pedestrian-centred approach where the signalling comes from the pedestrian and the AV will drive safely, rather than the other way around. Furthermore, we do not know if people will adapt to AR devices. A safety concern is what will happen when the technology does not operate properly. AR glasses are effective and widely used in specialised activities, but they may not be appropriate for solving traffic problems.
Concerning the problem of an AV communicating to multiple pedestrians, we must remember that there is a lot of following behaviour in traffic. A pedestrian decides to cross not only by watching the traffic but also by observing what other pedestrians are doing. All vehicles will stop if a group of pedestrians crosses the road. This happens because the mob of people has signalled its intention to the automobiles. Such communication of intentions goes both ways, and the car may therefore not have to signal to multiple pedestrians.

Thomas B. Sheridan
Automated Vehicles: There are already AVs out there, but it does not mean that people will accept them. Furthermore, there is much to driving that is social. Pedestrians can interpret the face of a driver and their hand signals, which is a difficult task for artificial intelligence to pick up. I believe that, with driving, you need to be in the loop, alert, and attentive. Some of my previous work stated that one should not expect a driver who is not attentive to come back in the loop after an instant take-over signal. It takes maybe 10 to 20 s for attention to be restored. Therefore, you must jump over that level (SAE Level 3) and be smarter. I am a bit sceptical that automated driving will be accepted quickly since it all boils down to trust. In the area of Human Factors, trust in automation has become very important.
I believe that the driver should be in the loop with at least some of the functions of an AV, so either the driver is doing all or most of the tasks, or the computer is doing all or most of the tasks. So, there could be a scenario where the driver handles steering, and the computer handles braking. This is known as traded control and not what people are referring to as shared control.
eHMIs: Smart road infrastructure will have many potential uses, but the problem is that it is expensive, requires maintenance, and is vulnerable to damage by weather. I am worried about the safety and reliability of that infrastructure. On another note, since I am retired, I have not been familiar with the eHMI concept. However, I find it an interesting and wonderful idea. I believe that they would make a difference if they are a substitute for the driver's communication. If the AV is confused and not sure about itself or another actor, then communication, similar to how drivers do with honking or hand movement, would be important to pedestrians. It would be ideal for communicating what is expected of the pedestrian or the driver. Using anthropomorphic communication would be a good thing. In fact, anthropomorphism is common across cultures, and hence it could be instrumental in aiding in the breaking down of cultural barriers for communication with eHMIs.
Augmented Reality: Pervasive and context-aware AR is possible of course and is already being used in various settings such as by mechanics, assembly line workers, and storage facilities, where the glasses help the user navigate towards a package with the queried ID. AR-based eHMIs would be most useful in low-speed scenarios where the vehicle and/or the pedestrian have stopped or are moving very slowly and there is uncertainty about who would go first. This is usually sorted out socially between two people, and in this case, eHMIs would be instrumental in replicating such interaction. AI-based communication through AR could be done even when there is no driver in the car.
For cases when a pedestrian is not wearing the glasses, there would be other cues in addition to the AR eHMIs. The vehicle could honk its horn, edge forward, and also move towards the centreline. This would also be useful in cases where the AV has to communicate with multiple pedestrians. In a manually driven scenario, the vehicle is slowly edged ahead at a crossing, and the driver would check if any pedestrian is moving or not. In case of no movement, the vehicle will be edged beyond and continue driving. Such communication could be replicated by an AV. AR eHMIs would also be useful when travelling to other countries with cultures unfamiliar to the traveller. It would be interesting to have a future scenario where hotels would rent out AR glasses that would help me cross the streets of that particular place.
I do not believe that receiving information from a secondary source such as AR glasses would cause cognitive overload, as the overload is caused by the uncertain situation in the first place. There is more such overload caused by the uncertainty than there is by the certainty of having some kind of a clear signal. A clear signal reduces the cognitive load irrespective of the communication technology that is used. Having everything clear is better than having some degree of uncertainty in a potential accident situation.

Neville A. Stanton
Automated Vehicles: SAE Level 3 vehicles are currently being simulated on the roads through SAE Level 2 vehicles with their early warnings switched off. There are many problems that still need solving, and this pushes the release of Level 4 out to 10-20 years from now. True SAE Level 5 automation that can operate on all roads (urban, rural, multi-carriageway), in all conditions, requires Artificial General Intelligence (AGI) and would, therefore, launch in the distant future, beyond 50 years from now.
With current AVs, it is still unclear who is in control, and this is causing collisions. For the time being, we should use automation only for motorway driving. It is unlikely that there will be a time when VRUs will be completely safe, as the AV would require humanlike AGI to handle all situations in a complex urban setting. This requires a lot of time and training. However, current test vehicles operate at very slow speeds, nearly walking pace, and are ultra-cautious, so I do not necessarily see an immediate risk to people should these be implemented into urban environments.
V2I communication would be useful in promoting safety, but we must be careful not to remove all current infrastructure, since there will be people who would still enjoy driving and riding manually, for example, sports cars and motorcycles as well as classic and vintage vehicles. Smart infrastructure may direct pedestrians to cross anywhere on the road by relying on mapping technology and the AV's current position. Once again, the challenge to solve here is the manually driven/ridden vehicles and how the infrastructure should handle them. In the future, when everything is connected, infrastructure will have an important role in changing the way we operate, and it may enable us to move away from vehicle ownership.
eHMIs: The premise of knowing the intention of the vehicle through eHMIs is a good one, and it is commendable to try and make them anthropomorphic. These interfaces can help but may also confuse and mislead due to cultural bounds. Something as simple as flashing the headlamps may be interpreted differently across cultures and contexts. For example, a signal from the host vehicle to give way to the oncoming vehicle and a signal from the host vehicle to the oncoming vehicle to say "I am coming through, so you need to give way" are polar opposite messages. Therefore, eHMIs must be simple and tested for all cultures and multiple scenarios since even basic symbols have the potential to be misinterpreted. Moreover, as Human Factors teaches us, we must design for all. It is my belief that the design should be the same for all pedestrians irrespective of their age; else we risk confusion.
Augmented Reality: AR can be introduced into the equation since we already have near-field communication based on location. AR glasses may allow for hands-free navigation and assist pedestrians with speed estimations by projecting the AV's trajectory. Moreover, I believe that AR could be beneficial to motorcyclists, as it is essential for them to know other road users' intentions. A problem arises in a mixed manual and automated environment where the VRU might not get the manual car driver's signal as they are looking for other communication. Providing individual context information to all actors around the AV seems very complex and difficult to achieve successfully. It would be much simpler to convey the AV's intentions and let each VRU interpret those intentions with reference to their own context.
In contemporary urban design, mode separation of traffic such as by means of tunnels and bridges is the safest option. It would be interesting to see how the separation of pedestrians and vehicles could be done electronically through AR and smart infrastructure.

Comparison of the researchers' views on the future of AVs
The consensus among the researchers' views was that it will take decades for SAE Level 4 vehicles to be introduced on the roads, first on highways and later in urban environments. The consensus was that SAE Level 5 vehicles are a long way off, with some of the researchers stating that these would never come to fruition, highlighting that it is difficult for AVs to engage in social interaction in mixed traffic (see also Müller et al., 2016), drive in particular road types and weather conditions, and anticipate the behaviour of VRUs. While Level 4 transport shuttles have already been introduced, they are still accompanied by a steward (see Heikoop et al., 2020). Some researchers noted that fully autonomous vehicles (SAE Level 5 automation) are already available in various industries, but that it would be very difficult to introduce their commercial equivalents in urban environments without changing the infrastructure or segregating VRUs from AVs. Of note, the researchers seemed coherent and did not appear to disagree about the state of current and future automation technology. However, the SAE levels themselves were regarded as open to different interpretations. Several researchers critiqued the ambiguity of the SAE levels (see also Hancock, 2020; Inagaki and Sheridan, 2019), and one researcher even mentioned that SAE Level 5 is impossible by definition. Furthermore, the researchers expressed nuance and explained that the answer as to when SAE Level 4 and 5 automation will be available depends on the specific use of automation (e.g., public versus private transport).
Various researchers pointed out that shared control is a viable option in the short term, and the horse metaphor was mentioned as an ideal type of shared control (Abbink et al., 2012; Flemisch et al., 2003; Norman, 2009). However, it was pointed out that shared control is not regarded as desirable in the long term, as shared control requires the human to be in the loop, which counteracts the idea of being able to benefit from a robot taxi (e.g., being able to work inside one's vehicle). Strikingly, many researchers used the words 'shared control' rather loosely (e.g., referring to automation that requires some human involvement) or in fact meant traded control (per Sheridan and Verplank, 1978). Traded control, that is, automation that requires the driver to take over immediately, was generally regarded as a bad idea, as an operator cannot be expected to return into the loop quickly (Stanton et al., 1997; Mok et al., 2015; for further critical review, see Banks et al., 2018; Casner et al., 2016). In the same vein, some of the researchers suggested that intermediate levels of automation should be skipped in favour of full autonomy.
AV-pedestrian interaction in the future urban environment was deemed challenging by the researchers, with various concerns focused on the behaviour of pedestrians themselves (see Domeyer et al., 2020). More specifically, there was a worry that pedestrian-heavy areas will become more susceptible to jaywalking due to the pedestrians' expectation that AVs will always stop (see also Liu et al., 2020; Millard-Ball, 2018). Such a scenario could render AVs immobile or make them drive very slowly under caution. It was suggested that pedestrians should change their behaviour and put safety first. Furthermore, the researchers recommended the segregation of traffic participants but noted that this would be a costly and resource-intensive solution.
A suggested solution for improved interaction was the inclusion of smart infrastructure in urban environments. In fact, the potential of smart infrastructure was recognised by many, but through different interpretations. It was mentioned that future infrastructure could receive state signals from AVs, communicate to VRUs wirelessly, or provide feedback to VRUs via the road surface, smart traffic lights, smartphones, and cloud systems. It was mentioned that smart infrastructure would be beneficial in pedestrian-heavy environments and could enhance the AVs' capabilities, such as by increasing their perceptual horizon. At the same time, it was argued by most of the researchers that cost, maintenance, and the reliability of wireless communication of smart infrastructure are major concerns. For these reasons, several researchers were outright negative about smart infrastructure, including V2I communication, especially for low-income countries. They saw a greater potential in AVs that act independently from infrastructure and recommended further investment in that direction.

Comparison of views on eHMIs and AR for eHMIs
The majority of researchers pointed out that future AVs should or will be equipped with eHMIs. They noted that VRUs would like the AV to communicate to them (see also Habibovic et al., 2018; Nordhoff et al., 2020) and would tend to base their actions on the eHMI message received. It was also argued that car manufacturers would like to ensure that their AV communicates at least some cues for resolving confusion and for liability reasons. The researchers recognised the potential of anthropomorphic eHMIs and found anthropomorphism an interesting area of research. However, they tended to be critical towards superficial forms of anthropomorphism such as artificial eyes and smiles on the AV. It was generally mentioned that eHMIs should convey their state (allocentric communication) and not instruct VRUs what to do (egocentric communication). Furthermore, the majority mentioned that text-based eHMIs should not be used because they require translations to different languages, may be hard to read from a distance, and take time to read. Instead, eHMIs should be simple, perhaps signify a change of state and not much else, so that children can understand them too. The rejection of text-based eHMIs poses a dilemma because several empirical studies have found such eHMIs to be effective or preferred (see Ackermann et al., 2019; Bazilinskyy et al., 2019; Fridman et al., 2017). Implicit communication (i.e., vehicle speed, trajectory, and distance) was regarded as dominant and easiest to interpret (see Ackermann et al., 2018; Moore et al., 2019). According to the researchers, eHMIs are at best a secondary cue: an eHMI should support and confirm the existing implicit communication and not be detached from the AV (such as via infrastructure or projections on the road) because of issues of stimulus-response compatibility.
The researchers pointed out that AR is already successfully used in certain industries (e.g., manufacturing and agriculture) as well as in smartphones. However, they were generally critical towards AR in future traffic, noting challenges of privacy, invasiveness, user-friendliness, technological feasibility (brightness, image stability, and wireless communication reliability), and inclusiveness (i.e., not everybody having access to such devices). Participants were considering different types of AR in their answers, including basic vibrotactile feedback and feedback on smartphones, but also advanced types of visual feedback presented in head-mounted glasses. Furthermore, AR can be non-conformal and conformal, where the latter is defined as feedback that is embedded in the real world. Conformal AR allows for the most innovative feedback opportunities. The researchers noted AR possibilities such as holographic traffic lights and signs, the removal of irrelevant information in the world, an indicator nudging the user's attention towards an AV, the projection of safety zones or coloured road surfaces, or a fence or barrier indicating that one should not cross (see Eriksson et al., 2019 for a similar barrier concept for drivers). Several researchers argued that AR feedback is only useful for tasks that involve longer time constants, such as navigation and wayfinding. They noted that AR for short-term tasks such as collision avoidance is not feasible or useful, since implicit communication is dominant and the temporal window for possible interaction is short. The researchers concurred that AR could resolve the one-to-many problem in eHMIs, or resolve language barriers by providing person-specific feedback, although some doubted this notion. The researchers pointed out that AR should be a secondary cue to implicit communication and eHMIs in the real layer because not everyone can be expected to wear an AR device.
The importance of standardisation of wireless communication protocols and eHMI designs was emphasised by many. The researchers indicated that the current proliferation of eHMI concepts is problematic (see also Dey et al., 2020a; Emmenegger et al., 2016; Merat et al., 2018). There was an emphasis on the importance of standardising colours for use in eHMIs, similar to how there are current international standard colours for various traffic signals, such as red for stop (see Dey et al., 2020b; Faas and Baumann, 2019). Furthermore, training and education on the meaning of eHMI/AR-based feedback were regarded as essential.
It was interesting that the researchers, although often in agreement, differed in their degree of conservatism. Some participants were conservative and noted that much more research is needed about what types of eHMIs are needed if at all, and expressed concerns about VRU workload and reliability, cost, and maintenance of technology. Others were liberal in their thoughts and saw great promise in new technology. For example, they mentioned that AR and eHMIs have the potential to reduce VRU workload and confusion. One researcher mentioned that, given the fact that we develop automation that offloads drivers in AVs so that they can engage in non-driving tasks (e.g., infotainment, working), then why not develop similar support for VRUs, allowing VRUs to consume media and be informed about traffic via AR only when needed?
A number of researchers made enlightening parallels with current traffic. For example, it was mentioned that the car's horn and turn indicators are in fact eHMIs, that blind-spot warning systems and HUDs are existing forms of AR for drivers, that current pedestrian traffic lights are already an example of one-to-many communication (thereby suggesting that this problem does not need to be resolved), and that the issue of cross-cultural interpretation of texts and messages is present also in current traffic.

Comparison of views on VR and AR simulation
There were mixed opinions on the usefulness of VR simulation testing. Some stressed the importance of VR simulation in creating controlled, repeatable, and affordable environments, enabling experiments with large numbers of participants quickly and effectively. VR also allows for testing with vulnerable groups in a less risky environment (e.g., Deb et al., 2020). Some of the researchers who had worked with VR found that their results were in agreement with similar experiments that were carried out in the real world (see also Deb et al., 2017; Fuest et al., 2020; Kaplan et al., 2020; Klüver et al., 2016). However, others stressed the importance of real-life studies, with VR testing being an intermediate step.
Several researchers mentioned that participants in a VR simulator study might not behave naturally due to their awareness that they are inside a VR environment and part of an experiment. It was noted that in experiments with human participants, the participants tend to please the investigator and attempt to do what the investigator wants them to do. Another reported downside to simulation testing is that most VR studies so far do not involve complicated scenarios but focus on one-to-one interactions without traffic, thus offering only a slice of the full experience. Other critical notions on VR simulation were about the difficulty and cost required in creating a physical movement sensation. However, there was a compromise in this argument concerning CAVE simulators, which allow pedestrians to walk through the actual space and encounter cars driven by other humans.
While the researchers were somewhat negative about AR in future traffic, 15 of the 16 researchers regarded AR as a suitable or excellent research tool for simulating and testing eHMIs and training in a realistic context. It was pointed out that in AR-based experiments, there would be more natural triggers, situations, risk, and real decision-making as compared to VR. Whether the promise of AR as a research tool will develop into a commercial product in future traffic remains to be seen.

Conclusions
This study invited 16 Human Factors researchers to share their views on the topic of AV-VRU communication in future traffic, with discussion points ranging from the future of AVs and smart infrastructure to eHMIs, AR for eHMIs, and simulation. The researchers agreed that SAE Level 5 automation is still far away from on-the-road implementation and that intermediate solutions such as shared control could be viable. At the same time, there was substantial heterogeneity in the definition of automated driving, as it can be used in different contexts such as public transport and specialised services. It was also believed that automation will improve safety; however, for that, rigorous measures such as segregated roads would be needed. The majority of researchers expressed concerns about smart infrastructure because of the cost and maintenance issues involved. They therefore felt that AVs that can move independently from infrastructure are the way forward.
The majority of researchers agreed that eHMIs will form part of the future interaction process between VRUs and AVs. However, they noted that there are still several open research questions that should be addressed before moving on to standardisation. The consensus appears to be that text-based eHMIs and eHMIs that provide instructions to VRUs should be avoided. The importance of experience/training and mental model formation was highlighted by several of the researchers. They emphasised that the long-term effects of eHMIs should be studied and that it should be examined whether eHMIs are important relative to implicit communication, which is likely dominant in AV-VRU interaction.
AR technology and wearables were enthusiastically received by the researchers who, however, also highlighted various practical limitations such as user-friendliness, invasion of privacy, and information overload in case multiple streams of information compete for the wearer's attention. The notion that the one-to-many communication problem of current eHMIs could be solved through AR technology was positively welcomed. Various design concepts were provided, such as the use of virtual fences, the use of AR as a secondary cue to implicit communication, and person-specific feedback. However, at the same time, it was recognised that, for the time being, AR would be more of a research tool than something ready for public roads.

Limitations
A limitation of this paper is that it contains personal views that are not always backed up by empirical evidence. The researchers were asked to share their predictions on AVs decades into the future, and these predictions may turn out to be inaccurate. For example, one of the researchers referred to the widespread future use of 5G technology for V2I and V2V communication. Although 5G is indeed regarded as having strong growth potential (Andrews et al., 2014; Li et al., 2018), it remains to be seen whether 5G will be used in future AVs on a broad scale. Similar remarks can be made about AR, which is a technology that is still in a nascent phase.
A second limitation is that although their views were solicited independently, the selected researchers for this study are not entirely independent because they know each other through the academic network.
A third limitation is the lack of industry representation, a decision that was taken so as not to favour one industry entity over another. The industry is likely to bring different sorts of topics into the fray, such as standardisation, regulations, and commercial viability (Emmenegger and Norman, 2019). A mixture of perspectives from Human Factors experts and industry could yield interesting insights that did not come across in the present paper.

Recommendations for Future Research
From the findings in this study, it is recommended that there be a push for standardisation of various eHMI elements. Also recommended are longitudinal studies to learn about the long-term effects of eHMIs (see also Faas et al., 2020). Thirdly, the present interviews revealed that there is a lot to learn from current traffic and available research about anthropomorphism in robotics, HUDs in cars and aircraft, and existing types of communication such as brake lights and turn indicators. Accordingly, we recommend that Human Factors scientists carefully familiarise themselves with the existing literature base before embarking on new empirical research. Finally, there should be a push towards standardisation of terminology in the research field. The present study highlighted differences between researchers in employed definitions, such as regarding shared control, smart infrastructure, and the SAE levels of automation. This semantic issue highlights that more communication is needed across the research field to homogenise such interpretations.

9. There is the issue of different crossing cultures and language barriers. How do you think this communication problem could be solved for eHMIs?
10. How should such technology adapt to children and older people?
A.3. Questions on AR and AR eHMIs
1. Do you think pervasive and context-aware AR technology is possible in the near future and how far away do you think we are from such technology?
2. Do you think this technology will be of benefit to humans?
3. How can AR be used in the future?
4. Would you be comfortable wearing AR wearables (glasses/lenses) when this technology becomes pervasive in the future?
5. Do you think AR would be helpful for VRU-AV communication? Why?
6. Do you think there is potential in using AR for eHMI design? Why?
7. What information do you think is essential for VRUs in an AR environment?
8. There are obviously several problems with this approach, namely not everyone having AR wearables, similar to not everyone being in possession of a smartphone. A possible solution to this would be to combine other eHMIs on the car or infrastructure which would still convey information to a pedestrian. What are your thoughts on this? What do you think would be the best possible combination?
9. With current solutions, it is still difficult to interpret to whom the AV is communicating if there are multiple VRUs, and this could potentially cause accidents. Do you think that this approach would alleviate the ambiguous situation when there is an AV attempting to communicate with a group of VRUs?
10. There have also been occurrences where VRUs were confused in situations where textual eHMIs were utilised, especially in cases where the text was written in a language unfamiliar to them. Do you think a customised AR eHMI approach would solve such a problem? Why?
A.4. Questions on VR and AR experiments
1. Current VR simulation studies investigate the behaviour of pedestrians when interacting with AVs. Do you think such studies replicate the behaviour of pedestrians in real-life traffic conditions? If not, how can this be solved?
2. Do you think there is potential in using AR cars for in situ eHMI simulation testing? Why?
3. Are there any advantages of AR over VR in this context? What are they?