Enhanced Teleoperation Interfaces for Multi-Second Latency Conditions: System Design and Evaluation

Adding human cognitive skills to planetary exploration through remote teleoperation can lead to more valuable scientific data acquisition. Still, even small amounts of latency can significantly affect real-time operations, often leading to compromised robot safety, goal overshoot, and high levels of cognitive workload. Thus, novel operational strategies are necessary to cope with these effects. This paper proposes three augmented teleoperation interfaces that allow the user to operate a robot subject to 3 seconds of latency: (1) Avatar-Aided Interface (AAI), a semi-autonomous approach based on a virtual element; (2) Predictive Interface (PI), an approach with direct control and predictive elements; and (3) Hybrid Interface (HI), where operators can easily switch between PI and AAI. We conducted a systematic within-subject experiment to evaluate the proposed interfaces in a realistic virtual environment with frequent traction losses. The user study compared AAI and PI to a Control Interface (CI), which did not display any augmented elements. The main results of this comparison showed that: (1) AAI led to a significant reduction in workload and a significant increase in usability and robot safety; (2) the use of the PI caused a significant increase in path length, indicating that operators overshoot their goals more often with this approach; (3) PI and AAI had lower reported effort; and (4) AAI is more flexible and effortless than PI and CI. Finally, during traction loss periods, PI and AAI had shortcomings that led to confusion from the operator, showing the need to integrate uncertainty measures in future interface design.


I. INTRODUCTION
With the growing interest in planetary exploration over the past decades, further research is essential to address the new operational challenges. In particular, due to the harsh conditions and expensive operations on planetary surfaces (e.g., Mars or the Moon), human activities in these environments will require a strong collaboration between human and robotic teams. However, direct teleoperation will be challenging due to the physical distance and consequent communication latency between Earth and the planetary surface.
The associate editor coordinating the review of this manuscript and approving it for publication was Antonio Piccinno.
When teleoperating a robot in an unknown, unstructured, or dynamic situation, it becomes difficult for the operator to perceive the remote environment accurately and make timely and effective control decisions [1]. Furthermore, the physical separation between the human operator and the mobile robot during teleoperation raises several challenges. One particular challenge is providing adequate awareness of the robot's situation, known as Situation Awareness (SA). This challenge becomes even more demanding when there is communication latency, as it significantly impacts human cognitive processes and performance. Consequently, a simple driving task becomes demanding and highly stressful for human operators when they teleoperate in a time-delayed environment, even with a latency of only a few seconds [2], [3]. Thus, adequate interaction methods should be integrated into teleoperation systems to mitigate the negative effects of latency on operator performance and robot safety.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. HUMAN PERCEPTION AND CHALLENGES DURING TIME-DELAYED TELEOPERATION
When teleoperating a robot without communication latency, the operator can use low-level commands, e.g., move forward, and see, almost immediately, the result of this action in the image stream of the robot. However, when the teleoperation system is subject to communication latency, the operator's commands become temporally disconnected from their feedback. This often leads operators to overshoot their goals, which might require several corrections to reach the intended goal. Consequently, latency during robot teleoperation often leads to increased collision rates, compromised robot safety [4], [5], [6], and high human cognitive workload [7]. Moreover, extended exposure of the operator to such conditions can lead to mental fatigue due to cognitive overload [8].
For example, in the Earth-to-Moon teleoperation scenario, if the operator sends a command to move forward, the image will only show the forward movement 3 seconds later. Thus, the operator must often employ a move-and-wait approach to control the robot: instead of controlling the robot continuously, the operator repeatedly sends a command and waits for its feedback. Additionally, the operator must continuously estimate where the rover will end up, often leading to a high mental workload and more robot collisions.
To develop effective teleoperation interfaces to cope with significant latency, we must first understand the limitations of human cognition in this context. According to Sheridan [9], humans have a minimum response time of approximately 0.2 seconds, suggesting that humans cannot distinguish between real-time feedback and feedback with time delays of approximately half a second [10]. Lester [11] stated that low-latency telerobotics should be conducted within the human cognitive threshold for real-time perception, allowing for a maximum of 0.4 seconds. Mellinkoff [10] verified that even small amounts of latency (2.6 seconds) could drastically affect real-time operation, thus requiring novel operational strategies that compensate for this effect.

B. CURRENT APPROACHES TO PLANETARY TELEROBOTIC EXPLORATION
Planetary missions often resort to teleoperated robotic systems with different levels of onboard autonomy [12]. For example, robotic platforms on the surface of Mars operate with a two-way communication latency ranging from 8.6 minutes to 40 minutes. Therefore, they require several autonomy components to ensure safety and task completion. Additionally, due to the communication constraints, rovers on Mars must constantly wait for sequences of high-level commands (e.g., position goals) sent from Earth [13], [14]. For lunar missions, however, rovers can be operated from Earth with an average two-way communication latency of about 3 seconds [10], [15], [16]. Such a scenario allows adding human cognitive skills to the control loop, resulting in more effective and valuable scientific data acquisition [13], [14], [17], [18], [19]. For example, Fong et al. [14] showed that scouting missions were more successful when operators could manually control a rover than when relying on autonomous navigation. Thus, direct robot teleoperation with low-level and near real-time commands (e.g., locomotion commands) is possible on the Moon, unlike on Mars.
Consequently, investigating efficient teleoperation systems under communication latency is essential to reduce future mission costs and human risks, as the proposed teleoperation strategy does not require humans to be on the surface of the Moon or even on an orbiting station (e.g., Deep Space Gateway). However, the efficiency and reliability of Earth-to-Moon robot teleoperation are constrained by the challenges imposed by communication latency and should be adequately studied and mitigated.
Since the Deep Space Gateway (DSG) will not be manned year-round, it will also serve as a communication relay between ground assets on the lunar surface and Earth command stations [24], [25]. These technologies will provide an opportunity for novel and more effective interaction methods between humans and robotic platforms for planetary exploration. Thus, it is necessary to consider context-specific requirements, constraints, and challenges to develop effective teleoperation interfaces that can cope with multi-second latency.
Previous experiments on the International Space Station (ISS) demonstrated that, under low-latency conditions (< 0.5 seconds), astronauts can maintain appropriate SA with low effort and workload when using a supervisory control method while ensuring overall mission success [17], [19], [21], [26], [27], [28]. Nevertheless, there are often events that current state-of-the-art autonomy fails to solve and that require human intervention through direct teleoperation. For example, Schreckenghost [29] showed that, during an autonomous recon mission, humans had to spend significant time handling anomalies that interrupted robot activity. On average, operators had to intervene every 24 minutes (minimum: 5.5 min; maximum: 1 h), and each intervention lasted, on average, 5.6 minutes (minimum: 1.6 min; maximum: 17.9 min).
Future telerobotics systems should allow for direct teleoperation so that crew members can perform low-level commands (e.g., wheel motion) while maintaining appropriate SA. However, when designing and implementing these direct teleoperation interfaces, it is necessary to consider the context-specific challenges. For example, for ground rovers in planetary exploration, it is necessary to convey to the operator the appropriate SA of the robot status and any possible mobility faults (e.g., traction loss). The latter are often unexpected events that current state-of-the-art autonomy still fails to solve. Thus, human cognitive and dexterous skills are required to cope with these events and return the system to a nominal state [29].
In this paper, we propose three teleoperation interfaces to aid the navigation of a ground robot in a multi-second latency (3 seconds) scenario. The novelty presented by this paper is two-fold. First, we present the successful integration of three augmented interfaces: Predictive Interface, an interface with a predictive display of the rover path validated by the current literature; Avatar Aided Interface, a novel semi-autonomous approach inspired by line-of-sight teleoperation approaches; and Hybrid Interface, a teleoperation interface that allows switching between two teleoperation approaches (adjustable autonomy) to cope with several operational needs. Second, we systematically evaluate the presented interfaces in a realistic environment in the presence of uncertainty (traction losses) and subjected to multi-second latency conditions.
Additionally, we focused on the Moon exploration case study, where the ground rover is operated from Earth with a latency of 3 seconds. Nonetheless, the practical application of the proposed teleoperation interfaces goes beyond a Moon scenario and can be extended to several multi-second latency applications on Earth.
The remainder of this article is structured as follows: Section II presents a brief review of the literature on methods to aid teleoperation under communication latency; Section III describes the teleoperation architecture and the development and integration of the three augmented interfaces; Section IV presents the design of a human-subject experiment to evaluate the interfaces; Section V reports and discusses the acquired results, system gaps, lessons learned, and insights for future teleoperation interfaces; and, lastly, Section VI presents the conclusions of the paper.

II. RELATED WORK
A review of the literature regarding approaches to compensate for the negative effects of latency in robot teleoperation reveals two main approaches: (1) predictive approaches capable of conveying, to the human operator, the state of the robot and remote environment [2], [30] or capable of predicting human intention [6], and (2) the use of different autonomy levels capable of ensuring robot functionalities and local safety [17], [31]. Thus, the literature suggests that one of these two approaches would be beneficial to apply to an Earth-to-Moon teleoperation scenario (3 seconds latency). However, due to the different evaluation conditions and metrics across the different publications, we cannot directly compare the results of the tested interfaces and make an informed decision on which to apply to the proposed scenario. Moreover, most of the presented work only compares one of the approaches to a control interface (no compensation techniques) and does not compare it to other approaches. Therefore, this paper presents the design and systematic evaluation of three teleoperation interfaces (predictive elements, onboard autonomy, and hybrid) within the context of lunar exploration. Furthermore, the proposed systematic user study directly compares the two main approaches that the literature review suggests have significant advantages for teleoperation under latency conditions.
Currently available methods that resort to predictive approaches for teleoperation under latency conditions are twofold. First, several authors proposed methods that convey to the operator predictions about the rover's status based on delayed telemetry. For example, Hu [32] graphically rendered near-instant visual feedback, and Matheson [2] augmented the delayed image feed with a prediction of the current robot pose. A human-subject experiment showed that this display significantly improved performance in terms of the time taken to complete the courses [2]. Additionally, Burker [30] and Chong [33] investigated using visual predictive displays for telemanipulation by providing a photorealistic display. Second, other authors proposed an alternative to feedback prediction by predicting human intention during time-delayed teleoperation. Su [4] proposed modeling the operator's intention remotely based on previously received commands, while Nieto [5], [6] estimated the operator's position intention, which was later executed by autonomous path planning and motion planning algorithms.
Regarding the use of different autonomy levels to compensate for the adverse effects of latency, the literature shows that when low-latency conditions (<0.5 seconds) are available, operators can employ direct control of the robot. However, as the latency of the teleoperation system increases, so does the need for increased onboard autonomy. For example, Schiele [34], [35] was able to use direct control given the low-latency conditions (0.8 seconds) and described an experiment showing the feasibility of performing haptic interactions between humans from space to the ground. Furthermore, Storms [31] introduced a shared control method for teleoperation under less than one second of latency, where autonomy elements handled obstacle avoidance and control arbitration. Finally, supervisory control was revealed to be one of the most adopted strategies for planetary exploration [17], [19], [21], [26], [27], [28], [36], [37]. This strategy takes advantage of the robot's onboard autonomy while leaving high-level decision-making to the human crew. Additionally, this autonomy level helps the operator cope with higher communication delays, although the reviewed literature tested delays of at most two seconds.
Finally, Walker [38] presented an Augmented Reality (AR) interface to control a drone in line-of-sight using a virtual element (surrogate). With the proposed interface, the operator controlled a virtual surrogate rather than directly operating the robot. The surrogate was then autonomously followed by the robot and provided the operator with foresight regarding where the physical robot will end up and how it will get there. Experimental results showed that the proposed interface significantly lowered task completion time compared to the baseline interface. Furthermore, with this approach, the operator can have a closer experience with a direct teleoperation setup with the surrogate, while the onboard autonomy ensures efficient and safe navigation within the remote environment.
Although this work [38] was only applied to line-of-sight teleoperation, it could potentially be beneficial in a remote teleoperation scenario, as proposed in this paper. However, it raises some questions regarding its application within remote teleoperation. For example, is this teleoperation method still beneficial when applied to a scenario where the operated robot is not in the line of sight? Furthermore, what is the appropriate and efficient way of displaying the surrogate element in a 2D interface? Section III-C presents the design of an augmented interface for remote teleoperation based on the work proposed and evaluated by Walker [38].
In conclusion, this paper presents and systematically evaluates three literature-based operational strategies that compensate for the adverse effects of latency during remote teleoperation. First, we present in Section III-B a teleoperation interface that predicts the robot's position based on the work proposed by Matheson [2]. The second is a teleoperation interface that indirectly allows the operator to control the robot through a virtual avatar (Section III-C), motivated by the work of Walker [38]. Finally, Section III-D describes a third teleoperation interface that allows operators to switch between these two operational strategies as the operator's needs vary during the task.

III. AUGMENTED TELEOPERATION INTERFACES: SYSTEM DESIGN AND IMPLEMENTATION
We propose three augmented teleoperation interfaces to aid the operator during teleoperation under multi-second latency conditions.

A. IMPLEMENTATION OF THE TELEOPERATION SYSTEM
The implementation of the teleoperation system and augmented interfaces was based on ROS (Robot Operating System). In particular, we simulated the robot (Husky) and environments using Gazebo, a ROS-based physics simulator, and resorted to Rviz, a ROS-based visualization tool, to implement the visual interfaces. These open-source tools produced a modular platform that can be easily replicated and adapted to various system requirements. Finally, we resorted to an off-the-shelf (simulated) mobile platform with an onboard camera and a 2D laser scanner, as illustrated in Fig. 1. Regarding the software implementation, Fig. 2 shows the architecture of the implemented teleoperation system and illustrates its main components.
With the implemented teleoperation architecture, the operator receives feedback from the robot through a visual interface and can send motion commands with a controller (e.g., gamepad). However, due to the communication latency, the robot only receives the commands 1.5 seconds after the operator sends them (forward latency). Since latency was not naturally present in the simulated environment, we implemented a system component that artificially adds it to the system (see ''Delay Generator'' in Fig. 2). The implemented delay generator is responsible for the exchange of data between the local machine and the remote robot hardware and simulates the presence of latency in a controlled way (1.5 seconds delay each way). In the remote environment, the robot receives the delayed motion commands and operates accordingly. Additionally, it uses its sensor data to localize itself in a known map of the environment. Information regarding the pose of the robot and the image being streamed by the onboard cameras is then sent back to the local machine and visualized using the ''Augmented Interface''.
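The behaviour of such a delay generator can be illustrated with a minimal sketch. The class below is not the component used in this work; `DelayLine` and its methods are hypothetical names, and the sketch only assumes that messages are released in order once a fixed one-way delay has elapsed.

```python
from collections import deque


class DelayLine:
    """Releases messages a fixed delay after they were pushed (FIFO).

    Hypothetical helper for illustration; not the paper's component.
    """

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self._queue = deque()  # (release_time, message), oldest first

    def push(self, now_s, message):
        # Stamp the message with the time at which it may be delivered.
        self._queue.append((now_s + self.delay_s, message))

    def pop_ready(self, now_s):
        # Return every message whose release time has passed, in order.
        ready = []
        while self._queue and self._queue[0][0] <= now_s:
            ready.append(self._queue.popleft()[1])
        return ready


forward = DelayLine(1.5)   # operator -> robot
backward = DelayLine(1.5)  # robot -> operator

forward.push(0.0, "move forward")
early = forward.pop_ready(1.0)    # command still in transit
arrived = forward.pop_ready(1.5)  # command reaches the robot
backward.push(1.5, "pose after command")
seen = backward.pop_ready(3.0)    # feedback reaches the operator
```

Chaining one instance per direction reproduces the 3-second round trip: a command sent at 0 s is only reflected in the operator's feedback at 3 s.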
The ''Augmented Interface'' component (Fig. 2) was implemented using several Rviz plugins. In particular, the camera display plugin (http://wiki.ros.org/rviz/DisplayTypes/Camera, last accessed: 15 September 2022) overlays the augmented information (avatar or rover's pose prediction) on the delayed image, immediately reacting to the operator's commands. However, monocular cameras, such as the one used in this paper, have a limited field of view (FOV). Hence, when the augmented elements are positioned outside the FOV of the camera, they are no longer visible in the image stream. For example, in Fig. 3, when the operator moves the augmented element (avatar) away from the robot, it exits the FOV of the camera. If only the image stream were available, the operator would have to wait for the delayed movement of the robot to see the augmented information again and continue controlling the rover. Therefore, to compensate for this limitation, the augmented interface also displays a map to aid robot control.
Similarly to the streamed image, the map displays the delayed pose of the rover overlaid with the augmented elements. Consequently, even when the augmented elements are outside the FOV of the camera, the operator can continue controlling the rover by looking at the map. Likewise, the map adopts an ego-referenced configuration to avoid a similar issue of having the augmented elements outside of the visible map. With this map representation, the rover is fixed in the center while the map and augmented items move around it.
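An ego-referenced map amounts to expressing every world-frame element in the rover's own frame before drawing it. The following sketch shows one way to do this for a single point; the function name `world_to_ego` and the convention that the rover faces the +x axis of its own frame are assumptions made for illustration, not details taken from the implemented interface.

```python
import math


def world_to_ego(point_xy, robot_pose):
    """Express a world-frame point in the robot-centred (ego) frame.

    robot_pose is (x, y, theta) in the world frame; in the returned
    coordinates the robot sits at the origin facing +x (assumed).
    """
    px, py = point_xy
    rx, ry, rtheta = robot_pose
    dx, dy = px - rx, py - ry
    c, s = math.cos(rtheta), math.sin(rtheta)
    # Apply the inverse rotation R(theta)^T to the translated point.
    return (c * dx + s * dy, -s * dx + c * dy)


# Rover at (2, 1) facing +y in the world; avatar at (2, 3) in the world.
avatar_in_ego = world_to_ego((2.0, 3.0), (2.0, 1.0, math.pi / 2))
rover_in_ego = world_to_ego((2.0, 1.0), (2.0, 1.0, math.pi / 2))
```

In this example the avatar ends up about 2 m straight ahead of the rover in the ego frame, while the rover itself always maps to the origin, which is what keeps it fixed at the center of the displayed map.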
Finally, the augmented elements displayed in the ''Augmented Interface'' (Fig. 2) depend on which of the three approaches the operator is currently using:
• When using the Predictive Interface (PI), described in Section III-B, the operator directly controls the rover with low-level commands (e.g., move forward/backward) and sees a prediction of the rover's movement and its final position.
• When using the Avatar Aided Interface (AAI), described in Section III-C, the operator controls an avatar that sends high-level commands (navigation goals) to the rover, which moves autonomously to the requested location.
• When using the Hybrid Interface (HI), described in Section III-D, the operator can manually switch between the two previous interaction modalities (PI and AAI).

B. PREDICTIVE INTERFACE (PI)
With the Predictive Interface (PI), the operator can continuously send locomotion commands, e.g., with a gamepad, and see, augmented on the image, a prediction of the path and final position of the robot based on those commands (see Fig. 4). The operator thus receives immediate feedback about the future trajectory of the robot and can anticipate collisions and avoid overshooting goals. The design of the predictive display (robot footprint and trajectory) was an iterative process based on the literature concerning effective robot teleoperation with predictive displays under time delay [2], [31], [39], as these works showed significant improvements in teleoperation for multi-second latency conditions. Given how well established the predictive approach is, we maintained the main features of the previous work [2], [31], [39] and added several features that we considered relevant during design and iteration. These additional features include: (1) using an arrow to indicate the robot's orientation, (2) showing the future pose of the robot instead of the current pose, and (3) augmenting the predictive element both on the image and the map. These features of the predictive interface are described in detail in this section.

1) PREDICTIVE DISPLAY DESIGN
From the operator's perspective, the most relevant information is the future state of the robot ($\hat{s}_2$). Thus, we designed a predictive display that provides information regarding where the robot will be when it receives the command currently being sent with the gamepad, as illustrated in Fig. 5. When the user sends a command at time stamp $t$, the augmented prediction shows where the robot will be at time stamp $t + 3$ seconds. With this information, the operator can visualize the robot's future states and avoid errors associated with latency-based teleoperation, such as collisions or goal overshooting. The final design of the augmented predictive display is shown in Fig. 4. It includes the representation of two elements: the future trajectory of the robot and its future pose, represented by its geometric footprint and an arrow. The augmented trajectory is represented by two green parallel curves and aims at improving awareness of the robot's dimensions in the environment. A notion of the robot's dimensions in future positions can aid the operator in controlling the robot and avoiding collisions in small spaces.
Although the geometric footprint of the robot would be clear enough to represent the robot's orientation, when the prediction coincides with the robot's position, the footprint is outside the FOV of the camera and is no longer visible in the image. If the operators wanted to turn the robot, they would need to switch their attention to the map to see the augmented prediction changing. Therefore, to allow the operators to focus on the image, the augmented geometrical footprint also includes an arrow to indicate its direction, as shown in Fig. 6.

2) PATH AND POSE PREDICTION
The prediction of the robot's future position is calculated based on the kinematic model of the robot and the locomotion commands sent by the operator. As the operators continue to send new commands, these are stored in a buffer, while the commands that have already been executed are removed. Finally, we update the trajectory estimation with this information, and the augmented elements immediately respond to the locomotion commands of the operator.
The implemented buffer has a time length equal to the two-way communication latency ($t_f + t_b$) and contains the information of $N$ consecutive locomotion commands that the robot will execute during that time period. Based on the delayed pose of the robot and the current locomotion commands, we calculate the series of consecutive displacements of the trajectory that will lead to the final augmented prediction, as illustrated in Fig. 7.
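A sliding-window buffer of this kind can be sketched as follows. The class name `CommandBuffer`, its methods, and the boundary convention (a command sent exactly one round trip ago is treated as already reflected in the delayed feedback) are illustrative assumptions rather than details of the implemented system.

```python
from collections import deque


class CommandBuffer:
    """Keeps the commands sent during the last `window_s` seconds.

    Hypothetical helper for illustration only.
    """

    def __init__(self, window_s):
        self.window_s = window_s
        self._items = deque()  # (send_time, v, w), oldest first

    def add(self, send_time, v, w):
        self._items.append((send_time, v, w))

    def prune(self, now_s):
        # Drop commands sent at least one round trip ago; this sketch
        # assumes their effect is already visible in the feedback.
        while self._items and now_s - self._items[0][0] >= self.window_s:
            self._items.popleft()

    def pending(self):
        # Commands not yet reflected in the delayed pose, oldest first.
        return [(v, w) for _, v, w in self._items]


buf = CommandBuffer(3.0)          # two-way latency of 3 s
for i in range(10):               # one command every 0.5 s
    now = 0.5 * i
    buf.add(now, float(i), 0.0)   # v encodes the command index here
    buf.prune(now)
pending = buf.pending()
```

With a 3-second window and one command every 0.5 seconds, the buffer settles at N = 6 pending commands, which are the ones a predictor would integrate on top of the delayed pose.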
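Each locomotion command is composed of two parts: the linear velocity along the x, y, and z axes, and the angular velocity around the x, y, and z axes. Since the robot used (Husky) is a differential-drive robot moving in a 2D plane, its movement is a combination of the linear velocity along the x axis ($v$), the forward direction, and the angular velocity around the z axis ($w$), the vertical axis. Thus, each trajectory segment results from the linear displacement along the x and y axes between two consecutive frames $P_k$ and $P_{k+1}$, where $k \in \{0, \dots, N-1\}$, given by

$$\Delta x = v \, \Delta t \, \cos(\Delta\theta), \tag{1}$$

$$\Delta y = v \, \Delta t \, \sin(\Delta\theta), \tag{2}$$

and the angular displacement around the z axis, given by

$$\Delta\theta = w \, \Delta t, \tag{3}$$

where $\Delta t$ stands for the two-way communication latency ($t_f + t_b$). However, these displacements lead to a pose represented in the $P_k$ frame, when our goal is to calculate the series of frames $P_k$ that compose the trajectory in the world frame ($W$). Thus, using a homogeneous-coordinates representation, the transformation between the $P_k$ and $W$ frames is given by

$${}^{W}H_{P_k} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix},$$

where $R$ is the rotation matrix and $T$ the translation vector.
Since the movement of the rover is limited to a 2D movement ($z = 0$ plane), $R$ can be expressed as

$$R = \begin{bmatrix} \cos\theta_k & -\sin\theta_k & 0 \\ \sin\theta_k & \cos\theta_k & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

where $\theta_k$ is the angle between the two frames $P_k$ and $W$, as represented in Fig. 7. Moreover, using the properties of homogeneous transformations, we can estimate the matrix ${}^{W}H_{P_{k+1}}$:

$${}^{W}H_{P_{k+1}} = {}^{W}H_{P_k} \, {}^{P_k}H_{P_{k+1}},$$

where ${}^{W}H_{P_k}$ stands for the transformation matrix between the world frame ($W$) and the frame $P_k$ (see Fig. 7), and ${}^{P_k}H_{P_{k+1}}$ stands for the transformation matrix between the frames $P_k$ and $P_{k+1}$. Thus, the transformation matrix between $P_{k+1}$ and $W$, represented in homogeneous coordinates, is given by

$${}^{W}H_{P_{k+1}} = \begin{bmatrix} \cos(\theta_k + \Delta\theta) & -\sin(\theta_k + \Delta\theta) & 0 & x_k + \Delta x \cos\theta_k - \Delta y \sin\theta_k \\ \sin(\theta_k + \Delta\theta) & \cos(\theta_k + \Delta\theta) & 0 & y_k + \Delta x \sin\theta_k + \Delta y \cos\theta_k \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \tag{9}$$

where $(x_k, y_k)$ is the position of $P_k$ in $W$. Finally, by knowing the delayed robot pose in the world, $P_0$, and applying equation (9) to all $N$ sequential locomotion commands, we can compute the prediction ($P_N$) of where the robot will be when it receives the next operator command. This information is then used to augment the image and map of the teleoperation interface, as shown in Fig. 4.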
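The chaining of transformations from the delayed pose to the predicted pose can be sketched in a few lines. This is an illustration under stated assumptions, not the implemented predictor: it uses planar 3x3 homogeneous matrices instead of the full 3D form, a first-order displacement model (dtheta = w * dt, dx = v * dt * cos(dtheta), dy = v * dt * sin(dtheta)), and a hypothetical per-command duration `dt`; the function names are also invented for this example.

```python
import math


def se2(x, y, theta):
    """3x3 homogeneous transform for a planar pose (x, y, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s, c, y],
            [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def predict_pose(delayed_pose, commands, dt):
    """Chain one relative transform per buffered (v, w) command.

    delayed_pose: (x, y, theta) of frame P_0 in the world frame.
    commands:     buffered (v, w) pairs, oldest first.
    dt:           assumed duration of each command, in seconds.
    """
    w_h_p = se2(*delayed_pose)
    for v, w in commands:
        # Displacement expressed in the current frame P_k (assumed model).
        dtheta = w * dt
        dx = v * dt * math.cos(dtheta)
        dy = v * dt * math.sin(dtheta)
        # World-from-P_{k+1} = World-from-P_k times P_k-from-P_{k+1}.
        w_h_p = matmul(w_h_p, se2(dx, dy, dtheta))
    x, y = w_h_p[0][2], w_h_p[1][2]
    theta = math.atan2(w_h_p[1][0], w_h_p[0][0])
    return x, y, theta


# Six buffered "drive straight at 0.5 m/s" commands of 0.5 s each.
straight = predict_pose((0.0, 0.0, 0.0), [(0.5, 0.0)] * 6, 0.5)
facing_y = predict_pose((0.0, 0.0, math.pi / 2), [(0.5, 0.0)] * 6, 0.5)
```

For a rover that is stationary in the delayed feedback, six straight commands place the prediction about 1.5 m ahead along its heading, which is the point at which an operator would need to stop commanding forward motion to avoid overshooting.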

C. AVATAR AIDED INTERFACE (AAI)
As an alternative to direct control, the Avatar Aided Interface (AAI) explores the use of high-level commands to control the robot. Here, the operator uses a semi-autonomous approach to control the robot while the interface augments the image and map with the high-level navigation goal (avatar). The pose of the avatar is sent to the rover, which follows it autonomously in the remote environment. This approach aims at reducing the high mental workload caused by communication latency by allowing the operator to focus on higher-level goals of the mission (e.g., finding an item in the remote environment) instead of concentrating on the locomotion and safety of the robot.
With the Avatar Aided Interface, the operator continuously controls the pose of an augmented avatar using a gamepad (see Fig. 8). Here, the motion commands (e.g., move forward/backward) done by the operator with the gamepad are translated into the motion of the avatar element. The choice to use this continuous interaction method instead of only providing discrete way-points on the map, as previously done in the literature [24], [27], [37], was motivated by three main reasons: 1) providing a teleoperation control method that behaves close to a direct teleoperation approach in scenarios with no significant latency; 2) maintaining the interaction method (gamepad) typically used for direct teleoperation, instead of changing to a different control method (e.g., mouse or touch screen); 3) providing adequate situation awareness of future positions of the rover in the environment and its respective dimensions and possible interactions.

1) AVATAR DESIGN
The proposed avatar is a visual representation of the navigation goal of the operator. Hence, we used a visual model of the robot with partial transparency to represent the augmented avatar, as shown in Fig. 8. By using the visual model of the robot, the operator can have an enhanced awareness of the robot's proportions in the remote environment and anticipate possible collisions. Moreover, the transparency of the visual model serves two purposes: (1) to reinforce the knowledge that the avatar is not a physical element interacting with the remote environment but is simply a representation of a goal, and (2) to avoid covering the elements of the streamed image, such as obstacles beyond the avatar (see Fig. 9). The avatar location is sent to the robot as a goal pose to be followed by its onboard autonomous navigation components (see Fig. 2). These autonomous navigation components can then calculate the path the robot should take to safely reach the avatar pose without colliding with the obstacles in the environment. This path is then augmented (with latency) on the visual interface (see Fig. 8) to ensure operator awareness regarding future robot movements. Thus, the operator can continuously move the avatar while the remote robot tries to reach the latest avatar pose safely within the remote environment. This way, the robot avoids possible inefficient or unsafe trajectories produced by the operator. For example, if the operator moves the avatar with a zig-zag motion or crosses an obstacle, the planned path should be the shortest safe path to the latest avatar pose.

2) AVATAR CONTROL
The implemented avatar control is independent of the remote environment and does not consider the avatar's physical interaction with it. For example, the avatar can move through obstacles (e.g., a wall) that would be physically impossible for the robot to traverse. When the operator performs such an action, the robot can plan a path to reach the avatar goal by going around the obstacle. With this avatar implementation, the operators can abstract themselves from the robot's locomotion and focus on the mission goals. For example, if an operator wants to move the robot to a pose behind a long wall, (s)he can move the avatar through the wall and wait for the robot to calculate and execute a safe path to reach the goal (see Fig. 9). In contrast, if we incorporated the physical interaction between the avatar and the remote environment, the operator would have the extra workload of estimating and executing a feasible path to reach the intended pose behind the wall.
The computation of the avatar's pose is based on the kinematic model of the robot and the locomotion commands sent by the operator. As mentioned before, these locomotion commands are a combination of the linear velocity along the x axis ($v$), the forward direction, and the angular velocity around the z axis ($w$), the vertical axis. These linear and angular velocities lead to a displacement in the avatar frame $A_k$, as shown in equations (1), (2), and (3). The avatar's pose, in the world frame ($W$), is calculated using the transformation matrix (${}^{W}H_{A_{k+1}}$) between the avatar frame at each time stamp $k+1$ ($A_{k+1}$) and the world frame ($W$) by applying equation (9), where $\Delta t$ stands for the time interval between two consecutive locomotion commands. Finally, the calculated avatar frame $A_{k+1}$ is sent to the robot as a navigation goal and used to perform autonomous navigation. With the integration of the ''avatar controller'' (see Fig. 2) and the onboard components, the operator can continuously control the avatar at a high frequency, ensuring a visually smooth movement of the avatar.
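The incremental avatar update can be sketched as a small stateful controller. The class below is hypothetical (`AvatarController` and `update` are illustrative names), uses a first-order displacement model as an assumption, and deliberately performs no collision check, mirroring the fact that the avatar is allowed to pass through obstacles. Publishing the returned pose as a navigation goal is left out.

```python
import math


class AvatarController:
    """Integrates gamepad commands into an avatar pose (world frame).

    Hypothetical sketch; the displacement model is an assumption.
    """

    def __init__(self, x, y, theta):
        self.x, self.y, self.theta = x, y, theta

    def update(self, v, w, dt):
        # Displacement expressed in the current avatar frame A_k.
        dtheta = w * dt
        dx = v * dt * math.cos(dtheta)
        dy = v * dt * math.sin(dtheta)
        # Rotate the displacement into the world frame and accumulate.
        c, s = math.cos(self.theta), math.sin(self.theta)
        self.x += c * dx - s * dy
        self.y += s * dx + c * dy
        # Keep the heading wrapped to (-pi, pi].
        self.theta = math.atan2(math.sin(self.theta + dtheta),
                                math.cos(self.theta + dtheta))
        # No collision check: the avatar may pass through obstacles.
        return (self.x, self.y, self.theta)


avatar = AvatarController(0.0, 0.0, 0.0)
avatar.update(1.0, 0.0, 0.1)          # forward 0.1 m along +x
avatar.update(0.0, math.pi, 0.5)      # turn 90 degrees in place
goal = avatar.update(1.0, 0.0, 0.1)   # forward 0.1 m, now along +y
```

Each call returns the latest avatar pose, which is the value that would be forwarded to the robot as its current navigation goal.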

3) AUTONOMOUS NAVIGATION
For the robot's autonomous navigation, we resorted to move_base, 6 an off-the-shelf solution provided by the ROS framework. It provides an implementation of an action that attempts to reach a goal in the world (avatar pose) with a mobile base (remote robot). The move_base package integrates a global 7 and a local planner 8 to accomplish its navigation.
First, the global planner (Dijkstra's algorithm) calculates the robot's trajectory from its current pose to the avatar pose. The planner considers the robot's motion capabilities (differential drive) and a global costmap that integrates information from a previously known map and obstacles. The generated trajectory is then overlaid on the visual interface as a green curve, as shown in Fig. 8.
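To illustrate the idea behind the global planning step, the following sketch runs Dijkstra's algorithm on a small grid costmap. It is a generic illustration and not the move_base implementation: the grid format, the 4-connected neighborhood, and the function name `dijkstra_grid` are assumptions made for this example.

```python
import heapq

def dijkstra_grid(costmap, start, goal):
    """Return the cheapest 4-connected path from start to goal, or None.

    costmap[r][c] is the cost of entering a cell; None marks an obstacle.
    """
    rows, cols = len(costmap), len(costmap[0])
    dist = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        d, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and costmap[nr][nc] is not None:
                nd = d + costmap[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = cell
                    heapq.heappush(queue, (nd, (nr, nc)))
    if goal not in dist:
        return None  # goal unreachable
    # Walk back from the goal to the start to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]
```

With a wall between the start and the goal cell, the returned path goes around the wall, which mirrors the case where the avatar is placed behind an obstacle.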
Second, the local planner provides the motion commands used by the robot's ''motion controller'' component (see Fig. 2) to follow the planned trajectory and reach the final goal (avatar pose). The local planner uses a dynamic window approach to provide the mobile robot with a sequence of velocity commands required to execute the global plan. For this step, the local planner resorts to odometry information and a local costmap, which integrates dynamic and unmapped obstacles detected by onboard sensors.
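A simplified sketch of a dynamic window step is given below. It samples velocity pairs reachable within one control period, forward-simulates each one, discards colliding trajectories, and keeps the best-scoring command. The scoring terms, all parameter values, the point-obstacle format, and the name `dwa_command` are assumptions chosen for illustration; the ROS local planner uses its own cost functions and configuration.

```python
import math

def dwa_command(pose, vel, goal, obstacles, dt=0.1, horizon=1.0,
                max_v=0.5, max_w=1.0, acc_v=0.5, acc_w=2.0, robot_radius=0.2):
    """Pick the (v, w) command with the best score inside the dynamic window.

    pose = (x, y, theta), vel = current (v, w), goal = (x, y),
    obstacles = list of (x, y) points (hypothetical local costmap format).
    """
    x, y, theta = pose
    v0, w0 = vel
    # Dynamic window: velocities reachable within one control period.
    v_lo, v_hi = max(0.0, v0 - acc_v * dt), min(max_v, v0 + acc_v * dt)
    w_lo, w_hi = max(-max_w, w0 - acc_w * dt), min(max_w, w0 + acc_w * dt)
    steps = int(round(horizon / dt))
    best_score, best_cmd = -math.inf, (0.0, 0.0)  # default command: stop
    for i in range(5):
        v = v_lo + (v_hi - v_lo) * i / 4
        for j in range(11):
            w = w_lo + (w_hi - w_lo) * j / 10
            # Forward-simulate the candidate command over the horizon.
            px, py, pth = x, y, theta
            clearance = math.inf
            for _ in range(steps):
                pth += w * dt
                px += v * math.cos(pth) * dt
                py += v * math.sin(pth) * dt
                for ox, oy in obstacles:
                    clearance = min(clearance, math.hypot(px - ox, py - oy))
            if clearance <= robot_radius:
                continue  # trajectory would collide, discard it
            # Score: end close to the goal, with a small bonus for clearance.
            score = -math.hypot(goal[0] - px, goal[1] - py) + 0.1 * min(clearance, 1.0)
            if score > best_score:
                best_score, best_cmd = score, (v, w)
    return best_cmd
```

When every sampled trajectory collides, the sketch returns a stop command, which is one simple way of keeping the robot safe near unmapped obstacles.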
Both the global and local planners contribute to the overall safety of the robot. On the one hand, mapped obstacles are integrated into the trajectory of the global planner. Therefore, the robot can plan a safe trajectory to reach the avatar pose without colliding with mapped obstacles. On the other hand, the local planner copes with obstacles that were not previously mapped but are visible to the onboard sensors (lidar).

D. HYBRID INTERFACE (HI)
When discussing the design of the teleoperation interfaces and the control levels of a robot, some approaches are often optimal for a specific environment or task but not efficient in different scenarios, even within the same overall mission. For example, a higher level of robot autonomy might be advantageous to travel long distances or perform a repetitive task but unable to cope with high dexterity demands or complex decision-making. Similarly, during the METERON experiments on the ISS [37], astronauts reported the need to choose the right command modality to execute the different mission tasks and handle anomalies. Thus, the literature shows that when designing teleoperation interfaces to explore remote environments, the interface often needs flexibility and adaptation capabilities to cope with different operator and mission requirements.
With the proposed Hybrid Interface (HI), the operator can easily switch between the two control methods (direct and semi-autonomous) and the two augmentation representations (prediction and avatar) described in sections III-B and III-C. With this interface, the operator can simply click on a button of the gamepad to request a change between the two control methods.

1) DESIGN OF THE CONTROL METHOD PANEL
With the HI approach, it is essential to efficiently convey to the operator which teleoperation method (avatar or predictive) is active and any ongoing changes between them. Specifically, changes in the control method imply that the same gamepad commands (e.g., move forward) will affect the robot's movement differently. For these reasons, we included a visual panel in the Hybrid Interface (HI) to indicate the currently active method and ongoing method changes (see Fig. 10 and 11).
The active method panel presents a simple design to avoid cluttering the interface or requiring a longer learning process from the operator. Here, the active method is highlighted with color, and the inactive one appears in grey. The colors used to highlight the active method were selected to be easily associated with the corresponding method. This way, the operators can quickly glance at the method panel to perceive the currently active method based on the displayed color. On the one hand, the avatar method is associated with the color yellow, as this is the visually predominant color of this augmented element. On the other hand, the predictive method is associated with the color green, as this color is used to represent the path and final prediction.
Moreover, when the user requests to change between the two control methods, a colored arrow pointing at the requested method appears. This arrow confirms to the operator that the request has been sent, with latency, to the robot. Once the interface receives the delayed confirmation that the new method is active, the colored arrow disappears, and the corresponding name becomes highlighted. Figures 10 and 11 show a snapshot of the interface when one of the teleoperation methods is active and the operator requests to change to the other one.
The indication of change between control methods is particularly relevant due to the communication latency. Although the augmented elements change immediately when the operator requests a switch, the robot's movement remains dependent on the previously active control method for at least 3 seconds. Therefore, it is necessary to inform the operator during these transition periods and provide confirmation of method activation.
In summary, with the hybrid interface, the operator can easily change between direct and semi-autonomous control of the robot while having access to the augmented elements (prediction and avatar). Moreover, the proposed control panel establishes awareness about the currently active teleoperation method and activation requests.
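The request and confirmation behavior of the method panel can be sketched as a small state machine. In this sketch the delayed confirmation is simulated with a timer set to the 3-second delay, whereas in the described system it arrives as a message from the robot; the class name `MethodPanel` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MethodPanel:
    """Tracks the active teleoperation method and a pending switch request."""
    active: str = "PI"
    pending: Optional[str] = None
    confirm_at: Optional[float] = None

    def request_switch(self, now: float, round_trip_delay: float = 3.0) -> None:
        # Ignore new requests while a previous one is still unconfirmed.
        if self.pending is None:
            self.pending = "AAI" if self.active == "PI" else "PI"
            # Assumption: confirmation returns after the round-trip delay.
            self.confirm_at = now + round_trip_delay

    def update(self, now: float) -> None:
        # Once the (simulated) confirmation arrives, the new method is active.
        if self.pending is not None and now >= self.confirm_at:
            self.active, self.pending, self.confirm_at = self.pending, None, None

    def display(self) -> dict:
        # The highlighted name is the active method; the arrow points at
        # the requested method while the request is in flight.
        return {"highlighted": self.active, "arrow_to": self.pending}
```

Keeping the previous method highlighted until confirmation reflects the fact that the robot still executes the old control method during the transition period.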

IV. SYSTEM EVALUATION
We designed and conducted a systematic user study to evaluate the proposed teleoperation interfaces. Similarly to other user studies investigating augmented interfaces, we implemented an additional interface to serve as a control condition. With this control interface, no augmented elements are displayed on the delayed image. The results of this user study will advance the current literature by providing a systematic comparison between a teleoperation interface with a predictive display and an augmented interface with continuous goal-following control of the robot under multi-second latency conditions. Furthermore, even though the current literature reveals the need for adaptable teleoperation interfaces, it is still missing a systematic study of the interaction of the users with this type of interface and how it is affected by the remote environment.

A. USER STUDY DESIGN
To maximize the number of obtained samples, we employed a within-subject design. A total of 30 participants performed all four experimental conditions with a two-way communication delay of 3 seconds:
• Control Interface (CI): teleoperation with direct control of the robot and no augmentation;
• Predictive Interface (PI): teleoperation with direct control of the rover and augmented information regarding the position of the robot and expected trajectory (presented in section III-B);
• Avatar Aided Interface (AAI): delayed teleoperation with semi-autonomous navigation controlled through an augmented avatar overlaid on the image (presented in section III-C);
• Hybrid Interface (HI): delayed teleoperation with the possibility of switching between the PI and AAI methods at any point of the experimental task (presented in section III-D). All experimental trials with HI start with the PI or AAI method active at random, and participants can request to switch at any time.
To minimize the carryover effects inherent to a within-subject design, it was necessary to vary the order in which the participants performed the experimental conditions. However, the HI condition required that the participants had previously performed both the PI and AAI conditions. Hence, the HI condition was always the last condition to be performed. Consequently, the condition permutations only included the CI, PI and AAI conditions. For this reason, the recorded metrics could only be compared between these three conditions, and the HI condition required a separate analysis.
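One possible way to generate such condition orders is sketched below. The text does not state how the permutations were assigned to participants, so cycling through the six orders is an assumption, as is the function name `condition_orders`.

```python
from itertools import permutations

def condition_orders(n_participants):
    """Return one condition order per participant, with HI always last."""
    base = list(permutations(["CI", "PI", "AAI"]))  # the 6 possible orders
    # Assumption: orders are assigned by cycling through the permutations.
    return [list(base[i % len(base)]) + ["HI"] for i in range(n_participants)]
```

With 30 participants, this assignment uses each of the six orders exactly five times.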
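Finally, the goal of the user study is twofold. First, study the impact of the augmented interfaces on operator performance and answer the following research questions: What effect does the use of the augmented teleoperation interfaces (PI and AAI) have on:
• Q1: the task completion time?
• Q2: the robot safety (number of collisions)?
• Q3: the total path length during a task?
• Q4: the workload of the operator during the task?
• Q5: the ease of use of the teleoperation interface?
Second, study how operators use the hybrid interface and answer the following research questions: When using the hybrid interface (HI):
• Q6: do operators use one of the teleoperation methods during more time than the other?
• Q7: in which situations do users switch between teleoperation modes?
• Q8: how do the environment characteristics influence the use of the teleoperation methods?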

B. EXPERIMENTAL APPARATUS
The experimental apparatus was divided into two components: (1) the teleoperation station, described in section IV-B1, and (2) the simulated remote environment, described in section III-A.

1) TELEOPERATION STATION
During the user study, the participants sat in front of a monitor displaying the instructions, the visual interfaces and the questionnaires. Figure 12 shows the experimental setup of the teleoperation station, where the participants received visual feedback from the robot and controlled it with a gamepad. This setup also included a video camera, to record the interaction of the participants with the interfaces and verbal comments made during the experiments, and an eye tracker, to record the points of the interfaces participants most looked at during the tasks (eye gaze).

2) REMOTE ENVIRONMENT
The remote environment, to be explored by the robot during the experimental tasks, was simulated with Gazebo, a realistic physics simulator often used in the literature and robotics community. As shown in Fig. 13, the simulated environment (15 × 15 meters) is physically limited by a series of barriers, and its space contains several obstacles.
The obstacle layout was chosen to minimize any bias of the environment toward particular experimental conditions. For example, in the AAI condition the robot can avoid most obstacles thanks to its onboard collision avoidance safety measures. Hence, if the environment only included obstacles detectable by the onboard sensors (lidar), the answer to our research question Q2, concerning robot safety, would likely be biased because the environment setup would ensure ideal conditions for the AAI condition. Therefore, the environment contains hurdles that often cause safety measures to fail in realistic environments. These hurdles include small unmapped obstacles and barriers with a wide base not detected by the onboard sensors. This way, there is a chance that the robot collides with obstacles during all four conditions. Finally, the complexity of the remote environment, including the size, number, and distribution of the obstacles, was iterated based on a series of pilot tests.
A total of five different remote environment configurations were built, including:
• Three configurations with the same number of obstacles, but different distributions within the environment, used for the CI, PI and AAI experimental conditions. One of these configurations is shown in Fig. 14. This variation in the obstacle distribution minimizes learning effects between the experimental conditions CI, PI, and AAI. Since the three configurations were built with equivalent difficulty, these were not included in the conditions permutations and were assigned randomly.
• One configuration with fewer obstacles and more open areas for the training sessions.
• One configuration with a larger size and more obstacles to evaluate the HI condition. Here, one of the goals is to study the influence of the environment on the choice of control method. Hence, three distinct areas were configured: (1) a large open area, (2) an area with a high density of small obstacles, and (3) an area with big obstacles detectable by the onboard safety measures.
Furthermore, the simulated environment and robot yielded frequent traction losses. This presents a relevant case study largely absent from the literature. Most evaluations of predictive displays and of the use of autonomy test the approaches in ideal environments. However, events such as traction losses are realistic and frequent for ground robots, often leading to cognitive challenges for the operator (e.g. impaired SA or frustration). Testing the teleoperation interfaces in these realistic conditions is one of the significant contributions of this publication to the current literature.

C. PROCEDURE

1) PARTICIPANTS
Thirty unpaid participants aged between 21 and 45, with a mean age of 27 years, voluntarily participated in the user study. Regarding gender, five participants were female and twenty-five male.

2) INSTRUCTIONS AND DEMOGRAPHIC QUESTIONNAIRE
All participants received written instructions about the user study, the experimental apparatus, the procedure and the purpose of the recorded data. After reading them, the participants signed an informed consent form allowing the recording and publication of the experimental data, including image and sound. Additionally, the participants answered a demographic questionnaire. This questionnaire included questions regarding age and how frequently the participants travel new routes, use teleoperated devices and use gamepads to play video games.

3) TRAINING SESSION
Before each experimental trial, the participants completed a learning and training session. During the learning session, the participants watched an instructional video explaining the gamepad controls, the behaviour of the robot and the teleoperation interface under communication latency. Moreover, before each experimental condition, the video demonstrated the particular interaction method and its impact on the robot's movement. After watching the instructional video, the participants practised using the teleoperation interface until they felt confident to start the experimental trial (minimum 2 minutes).

4) INSPECTION TASK
When selecting the type of experimental task, several issues were taken into consideration. First, the task should provide a fair evaluation of all conditions. Since the interfaces affect the robot movement (direct or semi-autonomous), the task should require the operator to navigate the robot within a remote environment. Second, the literature on the evaluation of predictive displays often resorts to navigation challenges. Third, the feedback displayed on the interface focuses on augmenting the image stream from the robot. Thus, to ensure participants focus on the image to control the robot, instead of resorting to the map, the task should involve a search for an item in the environment. Lastly, the task should provide participants with a realistic and engaging challenge.
For those reasons, to evaluate the different interfaces, the participants performed an inspection task during the experimental trials. Participants had to navigate through the remote environment, find five numbered boxes (see Fig. 13) and take a picture (i.e. a screenshot) of each one. During each task, the participants had a time limit of 13 minutes before the task was automatically halted. The number of numbered boxes and the time-out limit of the task were iterated based on a series of pilot tests.

FIGURE 14. Distribution of participants that finished the task before the 13-minute time-out (successful), the ones that could not finish before the time-out (unsuccessful) and the ones that gave up.

5) POST-TRIAL QUESTIONNAIRE
After each experimental trial, the participants answered a NASA-TLX questionnaire [40], to report the task workload, and a truncated USE questionnaire [41], to assess the ease of use of the interfaces.
The USE questionnaire is a seven-point Likert rating scale, where users are asked to rate agreement with the statements, ranging from strongly disagree (1) to strongly agree (7). This questionnaire contains four dimensions, Usefulness, Satisfaction, Ease of Learning, and Ease of Use, that can be adapted to construct a shorter form of the questionnaire. For this user study, we wanted to understand if the interfaces were easy to use and if there was a difference in usability between them. Thus, our post-trial questionnaire contained a truncated version of the USE questionnaire. More specifically, it contained the relevant statements within the Ease of Use dimension: (1) It is easy to use, (2) It is user friendly, (3) It is flexible, (4) Using it is effortless, (5) I can use it without written instructions, (6) I don't notice any inconsistencies as I use it.
Finally, once the first three trials were complete (CI, PI, and AAI), participants reported their preferred teleoperation interface and a justification for that preference.

D. EXPERIMENTAL METRICS
To evaluate the experimental conditions and answer the proposed research questions, we recorded a series of experimental measures that included:
1) Task completion time.
2) The number of collisions, as an implicit measure of robot safety. We assume that a higher number of collisions implies lower robot safety.
3) The total path length during a task, as an implicit measure of goal over-shooting. Here we assume that higher path length implies more over-shootings.
4) Task workload, through the NASA-TLX questionnaire.
5) Ease of use of the teleoperation interfaces CI, PI and AAI, through the truncated USE questionnaire.
6) Eye gaze, to assess whether the participants looked more at the augmented image or at the map during the teleoperation task.
7) Interface preference.
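As an illustration of the path length metric, the sketch below sums the distances between consecutive logged robot positions. The logging format (a list of planar positions) and the function name `path_length` are assumptions made for this example.

```python
import math

def path_length(positions):
    """Total length of a trajectory given as a list of (x, y) positions."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))
```

For a goal located 3 m ahead, a run that overshoots to 4 m and drives back accumulates 5 m, which is why a longer path is taken as an implicit sign of goal over-shooting.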

V. RESULTS AND DISCUSSION
In this section, we perform two separate analyses. First, we analyze and discuss the experimental measures to compare the three conditions: CI, PI, and AAI (section V-A). This first analysis provides the results that support the answers to the research questions Q1 to Q5. Second, we examine the participants' behavior using the HI (section V-B) and quantify experimental metrics to answer research questions Q6 to Q8. Lastly, as mentioned in section IV-B2, the locomotion of the rover was conditioned by traction losses. Thus, it is necessary to contextualize the following results within the described experimental conditions.

A. ANALYSIS OF THE CI, PI AND AAI CONDITIONS

1) SUCCESS FINISHING THE INSPECTION TASK
Fig. 14 displays the data of task success. Here, we see that most participants finished the task before the time-out in all conditions, while two participants gave up when using AAI and CI. In both these cases, the participants gave up because the robot got permanently stuck and they could no longer control it. For this reason, the experimental measures recorded during those experimental trials were considered invalid. Hence, these data points were excluded from the statistical analysis of the metrics completion time, path length, and the number of collisions.

2) COMPLETION TIME
A repeated measures one-way ANOVA with assumed sphericity determined that the difference between mean completion time using different interfaces was not statistically significant (F(2, 54)=0.884, p = 0.419). Moreover, Fig. 15 shows that condition PI (M=526, SD=157) had a slightly lower mean completion time than CI (M=577, SD=158) and AAI (M=560, SD=162). However, the differences between the mean time in the three conditions were not statistically significant. The presented results answer the research question Q1: ''What effect does the use of the augmented teleoperation interfaces (PI and AAI) have on the task completion time?'' Using the Avatar Aided Interface and Predictive Interface does not significantly change the task completion time.
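For readers less familiar with the test, the following sketch computes the F statistic of a one-way repeated measures ANOVA under assumed sphericity. The data layout and the function name `rm_anova_oneway` are assumptions, the sketch does not compute the p-value, and it is not the software used in the study. With three conditions and 28 valid participants, it yields the degrees of freedom (2, 54) reported above.

```python
def rm_anova_oneway(data):
    """One-way repeated measures ANOVA.

    data[i][j] is the measurement of subject i under condition j.
    Returns (F, df_conditions, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    # Partition the total variability into condition, subject and error terms.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_err)
    return f_stat, df_cond, df_err
```

Removing the between-subject variability from the error term is what distinguishes this test from a between-subject ANOVA and is the reason a within-subject design increases sensitivity.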
We expected that the AAI and PI would yield a statistically significant difference in completion time compared to CI. The literature and empirical observation show that the operators often employ a send-and-wait approach when teleoperating a robot under latency conditions with no augmented aids (CI). Additionally, one of the goals of the augmented interfaces (PI and AAI) is to minimize this send-and-wait approach and generate a smoother control by providing augmented elements or enhancing the navigation. However, the traction loss was not taken into account in the design of the prediction and navigation models. On the one hand, the predictions (PI) would sometimes be inaccurate and require a position correction due to traction losses, often leading to frustration from the participants. On the other hand, the semi-autonomous control provided by the avatar interface (AAI) struggled to cope with the traction losses, as it did not take this event into account when actuating the robot. Hence, future teleoperation interfaces would benefit from considering uncertainty in their design and implementation, as realistic environments are likely to generate unpredicted events and significantly impact operator performance.

3) COLLISIONS
A repeated measures ANOVA with a Greenhouse-Geisser correction determined that the mean number of collisions differed significantly between interfaces (F(1.89, 50.94)=11.96, p < 0.001). Post hoc analysis with a Bonferroni adjustment revealed that (1) the number of collisions significantly decreased on average by 4.7 when using AAI (M=3.71, SD=0.66) compared to CI (M=8.57, SD=1.48) (p = 0.007) and (2) the number of collisions decreased on average by 7.4 when using AAI compared to PI (M=11.11, SD=1.52) (p < 0.001). These results are shown in Fig. 16.
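As a consistency check on the reported degrees of freedom, the Greenhouse-Geisser correction scales both degrees of freedom of the uncorrected test by the sphericity estimate. The worked step below assumes k = 3 interfaces and n = 28 valid participants, a number inferred from the F(2, 54) reported for completion time rather than stated explicitly in the text.

```latex
\begin{align}
df_1 &= \hat{\varepsilon}\,(k-1), & df_2 &= \hat{\varepsilon}\,(k-1)(n-1), \\
\hat{\varepsilon} &= \frac{50.94}{(3-1)(28-1)} = \frac{50.94}{54} \approx 0.943, & df_1 &\approx 0.943 \times 2 \approx 1.89 .
\end{align}
```

Both reported values are therefore consistent with a single sphericity estimate of about 0.94, that is, a mild departure from sphericity.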
These results answer the research question Q2: ''What effect does the use of the augmented teleoperation interfaces (PI and AAI) have on the robot safety (number of collisions)?'' Using the Avatar Aided Interface significantly increases robot safety, while using the Predictive Interface does not induce a significant difference.
It is interesting to notice two points about the presented results: (1) onboard autonomy (AAI) led to improved safety, and (2) the average number of collisions with PI is higher compared to CI. We believe that the higher number of collisions with PI occurred mainly due to the traction losses and their impact on prediction errors. When using the CI, participants were more conservative with the number and velocity of the locomotion commands sent to the robot. In contrast, when using the PI, participants often trusted the displayed prediction and employed a more continuous control. However, when traction losses occurred, the displayed prediction would be inaccurate, as future traction losses were not considered in the trajectory and pose estimation. Thus, when participants noticed the robot slipping, it was often too late to correct the trajectory to avoid a collision.
Finally, as mentioned before, these results reinforce the need for integrating uncertainty in the design of augmented teleoperation interfaces. If the operators are aware of the uncertainty of the prediction, they could include that information into their mental processes and adjust their commands or proceed with greater caution to avoid collisions. However, predicting future traction losses and including that in the pose prediction is a complex problem that requires further research.

4) PATH LENGTH (GOAL OVER-SHOOTS)
A repeated measures one-way ANOVA with assumed sphericity determined that the mean of the path length differed statistically significantly between interfaces (F(2,54)=4.906, p = 0.011). Post hoc analysis with a Bonferroni adjustment revealed that path length significantly decreased on average 32.7 when using CI (M=112.09, SD=31.59) compared to PI (M=144.86, SD=55.15) (p = 0.010), as shown in Fig. 17.
These results support the answer to our third research question Q3: ''What effect does the use of the augmented teleoperation interfaces (PI and AAI) have on the total path length during a task?'' The predictive interface induced a significant increase in the path length compared to the control interface, while the avatar aided interface did not show a significant difference.
Since the path length was an implicit measure of goal overshooting, we can infer from the presented results that participants overshot their goals more often with PI than with CI. Furthermore, this result is also aligned with the higher number of PI collisions compared to CI, as operators who overshoot their goals are more likely to cause collisions of the robot with the environment.

5) WORKLOAD
A repeated measures ANOVA with a Greenhouse-Geisser correction determined that the difference in means of the reported NASA TLX score (workload) was statistically significant between interfaces (F(1.83, 53.068)=4.793, p = 0.014). Furthermore, post hoc analysis with a Bonferroni adjustment revealed that NASA TLX scores significantly decreased on average by 11.92 when using AAI compared to using CI (p = 0.021) (see Fig. 18).
These results support the answer to our fourth research question Q4: ''What effect does the use of the augmented teleoperation interfaces (PI and AAI) have on the workload of the operator during the task?'' The use of the Avatar Aided Interface shows a significant decrease in task workload while using the Predictive interface does not yield a significant change compared to the Control Interface.
We were expecting that both PI and AAI would induce a significant reduction in the reported workload. However, we suspect that due to the occasional inaccuracy of the augmented prediction (PI), the participants had a higher workload than we anticipated by mentally calculating the prediction's uncertainty and correcting goal overshoots. Fig. 18 illustrates a reduction of the average workload for PI compared to CI. However, this difference was not statistically significant.
Additionally, we performed a repeated measures ANOVA with a Greenhouse-Geisser correction for each questionnaire dimension (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration). This analysis revealed statistically significant differences for some of these dimensions. Post hoc analysis with a Bonferroni adjustment revealed that: (1) mental demand decreased on average by 14.3 (see Fig. 19) when using AAI compared to CI (p = 0.015), (2) physical demand decreased on average by 13.6 (see Fig. 20) when using AAI compared to CI (p = 0.032), (3) effort decreased on average by 16.2 (see Fig. 21) when using AAI compared to CI (p = 0.005), and (4) effort decreased on average by 14.6 when using PI compared to CI (p = 0.004).

6) EASE OF USE
A Friedman test revealed a statistically significant difference in the reported usability scores depending on the teleoperation interface, χ²(2) = 7.724, p = 0.021. Thus, we performed a post hoc analysis with Wilcoxon signed-rank tests with a Bonferroni correction, resulting in a significance level set at p < 0.017. The median of the reported usability for the AAI and CI trials was 5.25 (4.42 to 5.67) and 4.42 (3.63 to 5.04), respectively. There was only a statistically significant increase in the reported usability when using AAI compared to using CI (Z = −2.418, p = 0.016).
These results support the answer to our fifth research question Q5: ''What effect does the use of the augmented teleoperation interfaces (PI and AAI) have on the ease of use of the teleoperation interface?'' The use of the Avatar Aided Interface leads to an increase in reported usability, while the use of the Predictive Interface did not yield any significant changes compared to the Control Interface.
Finally, we performed an additional analysis on each of the statements on the USE questionnaire. Participants reported: (1) the AAI is more flexible than the CI (Z = −2.702, p = 0.007) and the PI (Z = −2.474, p = 0.013), and (2) using the AAI is more effortless than the CI (Z = −3.185, p = 0.001) and PI (Z = −2.865, p = 0.004).

7) EYE TRACKING
A Mann-Whitney test revealed that the percentage of gaze points in the image area of the screen was statistically significantly higher than the percentage of gaze points in the map area for all conditions, including CI (U = 0.0, p < 0.001), PI (U = 0.0, p < 0.001), AAI (U = 0.0, p < 0.001), and HI (U = 6.50, p < 0.001). These results are shown in Fig. 26 and corroborate that the participants mainly used the augmented image to control the rover, as expected.
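To clarify what a value of U = 0.0 means, the sketch below computes the U statistic directly from its pairwise definition. It assumes that the smaller of the two U values is reported, which is a common convention but is not stated in the text; the function name `mann_whitney_u` and the example values are hypothetical.

```python
def mann_whitney_u(a, b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    # u_a counts the pairs in which the value from a exceeds the value
    # from b; tied pairs contribute one half.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)
```

Under this convention, U = 0.0 corresponds to complete separation of the two samples: every image-area percentage is larger than every map-area percentage.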

8) POST-TASK QUESTIONNAIRE
When asked about their preference regarding the tested teleoperation interfaces, 50% of the participants preferred the Avatar Aided Interface, 40% preferred the Predictive Interface, and 10% preferred the Control Interface. When asked to justify their preference, participants reported that the AAI was a simple and effortless way to move the robot long distances while avoiding obstacles and locally coping with the robot slippage. On the other hand, participants reported that the PI helped perceive the robot's future states and perform delicate motions. Finally, the participants that preferred the CI reported that this interface was less noisy than PI and did not confuse them with inaccurate predictions.
Participants also reported that the prediction would lead to higher confusion and frustration. Additionally, some participants reported frequently forgetting the presence of latency when using the AAI. With this interface, operators mainly focused on the avatar element, which was immediately responsive to all locomotion commands given with the gamepad. However, this forgetfulness sometimes led to frustration because the movement of the rover and the planned trajectory did not respond immediately to the avatar changes due to the existing latency.

B. ANALYSIS OF THE HYBRID INTERFACE CONDITION

1) USAGE OF TELEOPERATION METHODS
When using the hybrid interface, participants used, on average, the PI method 55% (SD=0.26) of the task time and the AAI method 45% of the time (SD=0.26). A t-test did not reveal a statistically significant difference between the use of the AAI and PI methods during the HI condition (t(58) = 1.589, p = 0.118). This provides the answer to our sixth research question Q6: ''When using the hybrid interface (HI), do operators use one of the teleoperation methods during more time than the other?'' Operators do not use one of the methods significantly more than the other. This result is aligned with our expectations, as the task took place in a scenario with diverse and equally distributed characteristics (e.g., open and tight spaces).

2) TELEOPERATION METHOD CHANGES
On average, participants using the Hybrid Interface switched between teleoperation modes 0.81 times per minute (SD = 0.51). We performed two statistical analyses to assess if changing the teleoperation methods impacted the participants' performance. First, a Pearson product-moment correlation determined the relationship between the number of method changes and completion time. Second, Spearman's rank-order correlation determined the relationship between the number of method changes and the number of collisions. There was no statistically significant correlation between the number of method changes and the completion time metric (r = 0.054, n = 30, p = 0.778) or the number of collisions (r_s(30) = −0.094, p = 0.620).
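The two correlation coefficients used above can be sketched as follows. Spearman's coefficient is computed here as the Pearson correlation of the ranked data, with tied values sharing their average rank; the function names are assumptions, and the sketch omits the significance test.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A rank-based coefficient is a natural choice for count data such as the number of collisions, since it only assumes a monotonic relationship.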
To answer research question Q7 ''When using the hybrid interface (HI), in which situations do users switch between teleoperation modes'', we reviewed the participants' video recordings while using the Hybrid Interface. We compiled a list of reasons we observed led participants to change the interaction method. Finally, we summarized the four most common reasons in Fig. 27. On the one hand, we see in Fig. 27(a) that participants mainly changed to the PI method to perform a finer control (e.g., a slight adjustment of the robot's orientation) and to navigate near small obstacles not detected by the onboard autonomy. On the other hand, we observe in Fig. 27(b) that participants mainly changed to the AAI approach to reach distant goals and cope with the robot slippage.
These results are aligned with the assumptions and empirical observations made in the literature. Supervisory control has been shown to be an adequate and efficient method for future crew-centered teleoperation. The literature reveals that this approach is capable of maintaining the situational awareness of the operators with low effort and workload while ensuring overall mission success. However, when using this approach, operators also report the need for low-level control (direct control of robot velocity) to perform more delicate movements or correct issues that onboard autonomy could not solve by itself (e.g., traction losses).

3) ENVIRONMENT CHARACTERISTICS AND METHOD USAGE
A t-test found no statistically significant difference between (1) the use of AAI and PI interfaces in open areas with no obstacles, t(58) = −1.804, p = 0.076 (see Fig. 24), and (2) the use of AAI and PI interfaces in the area with mapped obstacles, t(56) = 1.038, p = 0.304 (see Fig. 25).
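The reported degrees of freedom (58 and 56) are consistent with a pooled-variance two-sample t-test, although the text does not state which variant was used. Under that assumption, a minimal sketch of how the statistic and its degrees of freedom are obtained is shown below; the usage values and names are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t_test(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, df), where df = len(a) + len(b) - 2.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances, n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df

# Hypothetical share of time (in %) each method was used in one area.
aai_usage = [40.0, 55.0, 35.0, 60.0, 45.0]
pi_usage = [60.0, 45.0, 65.0, 40.0, 55.0]

t_value, dof = pooled_t_test(aai_usage, pi_usage)
print(f"t({dof}) = {t_value:.3f}")
```

Two groups of 30 values yield 58 degrees of freedom, as in the first reported test.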
A Mann-Whitney test revealed that the participants used the PI method significantly more than the AAI method in the area with very frequent small obstacles (U = 299.5, p = 0.026), see Fig. 22, and the area with small and mapped obstacles (U = 289, p = 0.017), see Fig. 23.
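For readers less familiar with the Mann-Whitney test, the sketch below computes the U statistic by direct pair counting and a two-sided p-value from the normal approximation, without tie or continuity corrections. It assumes two groups of 30 values each, which the text does not state explicitly; statistical packages may apply corrections or exact methods, and may report either of the two complementary U values.

```python
import math

def mann_whitney_u(a, b):
    """Return (U_a, U_b): U_a counts pairs where a value from `a` exceeds
    a value from `b`, with ties counted as 0.5. U_a + U_b = len(a) * len(b)."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

def normal_approx_p(u, n_a, n_b):
    """Two-sided p-value for U under the normal approximation
    (no tie correction, no continuity correction)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))

# Assumed group sizes of 30 each (an assumption, not stated in the text).
for reported_u in (299.5, 289.0):
    print(reported_u, round(normal_approx_p(reported_u, 30, 30), 3))
```

Under these assumptions, the approximation gives p-values close to the reported 0.026 and 0.017. The half-integer value U = 299.5 indicates that at least one tie occurred between the two groups.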
Based on these results, we can answer our final research question Q8: "When using the hybrid interface (HI), how do the environment characteristics influence the use of the teleoperation methods?" The environment's characteristics significantly impact the use of the teleoperation methods when small obstacles are present. When the environment exhibits these characteristics, operators use the PI method significantly more. On the other hand, operators tend to use the AAI more when controlling a rover in open areas without obstacles, although this difference was not statistically significant.

C. LESSONS LEARNED AND OPEN QUESTIONS
While the participants interacted with the teleoperation system, we observed several issues and relevant occurrences, leading to lessons learned that will be considered in future work and are relevant to the research community. The results of the predictive interface deviated the most from our initial expectations. The frequent traction losses played a significant role in the results obtained under all tested conditions. These results capture a realistic event and provide a useful case study for the current literature. Previous work mainly tested this type of interface (predictive and semi-autonomous) in ideal environments without significant uncertainty (e.g., traction losses or uneven terrain).
During the PI condition, we observed that the traction losses led to cumulative frustration among the participants, as the predicted poses were occasionally wrongly estimated. A similar effect happened with the AAI when the robot got temporarily stuck and the autonomy took significant time to cope with that event and recover the nominal motion of the robot. In this case, the participants displayed signs of frustration and complained that something was wrong with the robot and that it could not move onwards. Thus, unexpected events, such as traction losses, can have a significant impact on the performance of the teleoperated system and, consequently, on the operator's performance. For this reason, it is crucial to design teleoperation systems capable of coping with such unexpected events (e.g., displaying uncertainty of the augmented predictions in the visual interface).
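To make the idea of displaying prediction uncertainty more concrete, the following sketch predicts the robot pose over the latency horizon with a simple unicycle model and computes how far the prediction can drift under an assumed maximum slip ratio. This is an illustrative assumption, not the prediction method used in this paper; the slip model (a uniform reduction of linear speed), the function names, and the parameter values are all hypothetical.

```python
import math

def predict_pose(x, y, theta, v, omega, horizon_s, slip=0.0, dt=0.05):
    """Integrate a unicycle model forward by `horizon_s` seconds.

    `slip` in [0, 1] is a hypothetical slip ratio that scales down the
    effective linear speed (0 = perfect traction, 1 = no forward motion).
    """
    steps = int(round(horizon_s / dt))
    v_eff = v * (1.0 - slip)
    for _ in range(steps):
        x += v_eff * math.cos(theta) * dt
        y += v_eff * math.sin(theta) * dt
        theta += omega * dt
    return x, y, theta

def prediction_spread(x, y, theta, v, omega, horizon_s, max_slip):
    """Distance between the no-slip prediction and the worst-case-slip
    prediction; an interface could draw this as an uncertainty band."""
    nominal = predict_pose(x, y, theta, v, omega, horizon_s, slip=0.0)
    worst = predict_pose(x, y, theta, v, omega, horizon_s, slip=max_slip)
    return math.hypot(nominal[0] - worst[0], nominal[1] - worst[1])

# Hypothetical command: 0.5 m/s straight ahead, 3 s latency, up to 40% slip.
spread_m = prediction_spread(0.0, 0.0, 0.0, 0.5, 0.0, 3.0, max_slip=0.4)
print(f"Predicted position may be off by up to {spread_m:.2f} m")
```

In this simplified setting, the spread grows with commanded speed, latency, and assumed slip, which suggests why a single predicted pose can mislead operators during traction losses.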
One interesting remark made by several participants after the tasks with the AAI was that they sometimes forgot about the communication latency in the system. With this interface, operators mainly focus on the avatar, which responds immediately to all locomotion commands from the gamepad. However, the operator can only see the feedback from the robot's corresponding actions (e.g., planned path or movement) three seconds later. This forgetfulness sometimes led to frustration because the movement of the rover and the planned trajectory did not respond immediately to the avatar changes, as they would in the absence of latency. Such empirical observations raise several interesting questions regarding interface transparency and user trust in the autonomy and predictive elements.
In conclusion, the methods proposed in the literature and in this paper can significantly improve certain aspects of operator performance and robot safety in simplistic environments (simulated, controlled laboratory environments, or even outdoors with flat floors). However, in more realistic environments, aspects of the environment and unexpected events are significantly more challenging to model or predict. Thus, it is crucial to test teleoperation system shortcomings in realistic scenarios and adapt the teleoperation methods accordingly. Future work should include real-robot experiments in rough terrain to evaluate the advantages and limitations of the designed teleoperation interfaces and better prepare for future remote operations.

VI. CONCLUSION
This paper presents and systematically evaluates three human-machine interfaces for teleoperating a ground robot under multi-second latency conditions. In particular, we focused on the case study of three seconds of latency, equivalent to a future Earth-to-Moon teleoperation scenario. First, we presented a teleoperation interface where the delayed image stream is augmented with the future path and position of the robot in the remote environment: Predictive Interface (PI). Second, we explored using a semi-autonomous approach to control the ground rover: Avatar-Aided Interface (AAI). With this approach, the operators control an avatar augmented on the image stream that provides a high-level representation of the navigation goal the operator wants the rover to reach. Third, we presented an interface that lets operators easily switch between the two proposed methods (PI and AAI) to better adapt to their needs during the task: Hybrid Interface (HI). Finally, the complete system architecture and interfaces were implemented with ROS, allowing the system to be easily replicated due to its modular approach and open-source components.
To evaluate the proposed teleoperation interfaces, we conducted a systematic user study with a total of 30 participants. The simulated robot and environment used for the user study exhibited recurring losses of traction. This uncertainty in the environment significantly impacted the accuracy of the pose prediction of the PI. Moreover, onboard autonomy elements also struggled to cope with this uncertainty, which impacted several aspects of the performance and workload of the operators during teleoperation. Thus, the user study provided relevant and realistic experimental conditions that the literature still lacked.
When comparing PI, AAI, and a Control Interface (CI), the analyzed results of the user study showed that: (1) PI did not lead to a significant difference in time to complete a task, unlike similar work presented in the literature; (2) PI led to an increase in goal overshooting and a higher number of collisions (lower safety); (3) AAI led to a decrease in workload (compared to CI); and (4) AAI led to higher usability when compared to PI and CI. These results show that conventional predictive displays could fail to enhance teleoperation and lead to lower robot safety because they cannot cope with uncertainty.
When studying the behaviour of participants using the HI, we found that participants mainly used PI to perform delicate control of the robot motion (e.g., minor adjustments) and in areas of the remote environment with small unmapped obstacles. On the other hand, participants mainly used AAI to navigate to distant goals and to cope with frequent traction losses, and they tended to use it more in open areas of the environment.
Finally, the results of this paper show that augmented teleoperation interfaces could significantly benefit from integrating remote environmental uncertainties (e.g., traction losses and uneven terrain) in the design of the interfaces. Moreover, this paper shows the need for systematic evaluation of these novel interfaces in more realistic and complex environments, as these can significantly impact operator performance. In particular, it is necessary to evaluate the impact on the augmented teleoperation interfaces and operator performance when controlling a rover in rough terrains and with varying latency conditions.

RUTE LUZ received the B.Sc. and M.Sc. degrees in aerospace engineering from the Instituto Superior Técnico, Lisbon, Portugal, where she is currently pursuing the Ph.D. degree. Since 2018, she has been a Researcher at the Institute for Systems and Robotics and the Interactive Technologies Institute, where she worked on multimodal interfaces for a search-and-rescue robot and an underwater vehicle. Her work on haptic devices and effective robot teleoperation has been published in IEEE RO-MAN, IEEE ACCESS, and ASTRA. Her research focuses on the development of effective interfaces for planetary exploration and involved participation in the AMADEE-20 MARS analog mission, organized by the Austrian Space Forum. Her current research focuses on human-robot interaction, haptic interfaces, and robotics, leading to a collaboration with ESA on the development of novel control methodologies for ground rovers.

RODRIGO VENTURA (Member, IEEE) received the Ph.D. degree. He is currently a tenured Assistant Professor with the Electrical and Computer Engineering Department, Instituto Superior Técnico (IST), University of Lisbon, and a Senior Researcher at the Institute for Systems and Robotics (ISR-Lisbon), part of the Laboratory of Robotics and Engineering Systems (LARSyS). He has published more than 130 publications in peer-reviewed international journals and conferences, on various topics intersecting robotics and artificial intelligence (H-index=21). He is also the co-inventor of several national and international patents on innovative solutions for robotic systems. He has participated in several international and national research projects. He is also an Associate Editor of the Journal of Field Robotics, among other journals. He is the Coordinator of the Minor in Space Sciences and Technologies at IST, a Founding Member of the Biologically-Inspired Cognitive Architecture Society, and an Alumnus of the International Space University (ISU). His research interests include the intersection between robotics and artificial intelligence, with particular interest in human-robot interaction, mobile manipulation, biologically-inspired cognitive architectures, and machine learning. This research is driven by applications in space robotics, urban search and rescue robotics, aerial robots, and social service robots.

VOLUME 11, 2023