Cooperative and Multimodal Capabilities Enhancement in the CERNTAURO Human–Robot Interface for Hazardous and Underwater Scenarios

The use of remote robotic systems for inspection and maintenance in hazardous environments is a priority for all tasks potentially dangerous for humans. However, currently available robotic systems lack the level of usability that would allow inexperienced operators to accomplish complex tasks. Moreover, task complexity increases drastically when a single operator is required to control multiple remote agents (for example, when picking up and transporting big objects). In this paper, a system allowing an operator to prepare and configure cooperative behaviours for multiple remote agents is presented. The system is part of a human–robot interface that was designed at CERN, the European Center for Nuclear Research, to perform remote interventions in its particle accelerator complex, as part of the CERNTAURO project. The modalities of interaction with the remote robots are presented in detail. The multimodal user interface enables the user to activate assisted cooperative behaviours according to a mission plan. The multi-robot interface has been validated at CERN in its Large Hadron Collider (LHC) mockup using a team of two mobile robotic platforms, each one equipped with a robotic manipulator. Moreover, great similarities were identified between the CERNTAURO and the TWINBOT projects, which aim to create usable robotic systems for underwater manipulations. Therefore, the cooperative behaviours were also validated within a multi-robot pipe transport scenario in a simulated underwater environment, experimenting with more advanced vision techniques. The cooperative teleoperation can be coupled with additional assistance tools such as vision-based tracking and grasping determination of metallic objects, and communication protocol design. The results show that the cooperative behaviours enable a single user to carry out a robotic intervention with more than one robot in a safer way.


Introduction
Remote robotic systems are increasingly becoming essential for industries hosting hazardous facilities for their personnel. The use of robotic systems could not only increase personnel safety but also improve the monitoring, inspection, and maintenance of environments to which access is constrained, limited in time, or not possible at all. Moreover, such robotic systems could be permanently installed in confined areas, improving the availability of the facility by removing the time necessary to access the area, which could be nonnegligible due to complex access procedures, remote locations, or a mandatory waiting period before ensuring safe human access.
Underwater robotic interventions present similar issues. Particular tasks, such as wreck inspections and manipulation, are far too dangerous for humans and remote robotic systems are necessary to perform them safely.
However, currently available robotic systems lack that level of usability necessary to broaden their use in both scenarios and require a pool of trained expert operators to ensure safe and successful operations in hazardous environments. Moreover, certain tasks may need a team of robots controlled with a certain degree of synchronisation, which is not easily achievable when multiple operators are involved. For example, picking up and transporting material is a common task, which requires big and delicate objects to be transported by multiple agents.
The CERNTAURO project [1] aims at creating a complete system for remote interventions in the European Center for Nuclear Research (CERN)'s accelerator environments, including preparation, preliminary studies, safety, and execution of the tasks. For this purpose, a modular, usable, and multimodal human-robot interface was developed [2] to ensure easier and safer operation of multiple remote robotic systems. The CERNTAURO project tackled a series of recurrent tasks, proposing different robotic solutions to ensure their accomplishment. Among the identified tasks, those requiring the use of multiple robotic systems appeared to be the most complex ones.
The TWINBOT project aims to go beyond the state-of-the-art in the field of underwater cooperative interventions. Common requirements and problems were identified with the CERNTAURO project, and the proposed Human-Robot Interface (HRI) with its functionalities was adapted and further developed to allow control of underwater vehicles. As a matter of fact, at the moment of writing, the TWINBOT project presented results in simulated scenarios for underwater mobile manipulators, including vision and underwater communications. Additional details about the TWINBOT project are available in Section 1.2.
The CERNTAURO project is more mature, and at the moment of writing this paper, more than 100 real interventions in hazardous environments have already been performed, including some dual-arm and cooperative ones, especially thanks to the modular CERNBOT platform [3]. Moreover, some work has been done to solve the communication constraints found between the accelerators' underground tunnels and the operator location. Additional details of the CERNTAURO project are available in Section 1.1.
In this paper, a solution to the problem of allowing a single operator to control multiple agents simultaneously is presented. The proposed solution makes use of behavioural control scripts that are programmed during task preparation and then customised and adapted during task execution. Such scripts allow the creation of additional behaviours interconnecting multiple agents during the operation. The control scripts are executed on the human-robot interface, which can communicate with all the available agents seamlessly. The proposed solution satisfies the multi-agent control requirements of both the CERNTAURO and the TWINBOT projects.
The CERNTAURO and TWINBOT projects are first introduced in more detail, highlighting their synergies as well. Then, the paper introduces the state-of-the-art of HRI for robot teams. Afterwards, the system description of the HRI cooperative functionalities is presented, which proved to be fundamental for carrying out the mentioned operations safely.

CERNTAURO Project
The design and development of custom robotic solutions is steered by CERN's needs for continuous robotic support to perform heterogeneous, complex, and unexpected interventions in hazardous environments. CERN's robotic service provides an emergency best-effort support, capable of deploying a remote robotic solution in less than two hours in CERN's entire accelerators complex. The CERNTAURO project [1] tackles these needs by creating novel robotic solutions that can provide good results in the accelerator infrastructures [4]. For this purpose, a set of customised modular robotic platforms, providing the possibility to be reconfigured and adapted according to the requirements, have been designed. Such robotic platforms triggered research and innovation related to communications, safe interventions, HRI, virtual/augmented reality, and Artificial Intelligence (AI) as well.
In order to improve the availability and efficiency of the CERN robotic service, the issue of multi-agent collaboration during a complex task has been explored. As an example, multiple robotic platforms can be used for grasping and transporting big objects (Figure 1), for improving the communication with the robot, or for providing additional points of view to the operator during a complex manipulation or navigation task. The modularity of the CERNTAURO system is addressed by considering both hardware and software re-configurability. This architecture facilitates adaptation of the system to new specifications, coming from new robotic intervention necessities, new hardware, and new functionalities. Moreover, this allows a short preparation time before a robotic intervention, ensuring a fast emergency response while still guaranteeing overall robot functionality and safety.
In addition, the user-friendly human-robot interface has been designed in order to allow both robotic experts and inexperienced operators to operate the robots, resulting in an overall high transfer level of the system among a broader set of users.

TWINBOT Project
The aim of the TWINBOT project (http://www.irs.uji.es/twinbot) is to overcome the current limitations of cooperative interventions in the field of underwater robotics, enabling the operator to supervise robotic operations through a visually guided, control-based human-robot interface and to perform complex tasks while avoiding the cognitive fatigue that low-level manual control brings, thus augmenting the autonomy and intelligence of the vehicles. A significant issue in this project is the necessity to communicate with the robots wirelessly, which requires the study of advanced underwater communication techniques.
To accomplish this goal, it is necessary to evaluate the operator's interaction with the robot team, composed of a pair of I-AUVs (Intervention Autonomous Underwater Vehicles), which are intended to perform cooperative operations in complex underwater scenarios.
Moreover, the cooperative underwater twin robots are intended to grasp and transport a big object (e.g., a pipe) without the use of any umbilical and with the help of a multimodal human-robot interface (Figure 2). A third robot can be used to provide support, such as enhanced points of view, to the robot team and operator.
The transmission of the images to the operator without the use of an umbilical is possible thanks to an advanced compression algorithm with region-of-interest selection, which allows for adaptation of the image size to the currently available bandwidth via radio frequency and sonar channels, obtaining appropriate image quality with image sizes of several hundreds of bytes [5]. Current efforts are trying to extend the system to Visual Light Communications (VLC) modems.
In order to enable experimentation by the partners before integration in the field, a set of simulation tools has been designed. UWSIM-NET [6] allows the simulation of cooperative interventions including mathematical models of real radio and sonar underwater transceivers, as well as experimentation with their network protocols through the NS3 platform. The realistic simulation used in this article takes as a basis the experience obtained from UWSIM-NET development. Moreover, previous projects such as FP7 TRIDENT [7] and MERBOTS [8] (Figure 3), in collaboration with the University of Girona (UdG) and the University of Balearic Islands (UiB), have laid the groundwork for the current state of the research.

Technologies Used in the Projects: Synergies
Both projects presented synergies from the research and innovation points of view, especially on the necessity to provide a human-supervised HRI to control a coordinated team of robots, including a cooperative mission plan (Figure 4). The presented unified multimodal human-robot interface provides a means for assisted grasping of metallic objects and 3D environment reconstruction, among others.
In summary, both the TWINBOT and CERNTAURO projects tackle semiautonomous cooperative grasping of big objects, a set of techniques at the frontier of knowledge, especially in underwater scenarios (see Figure 4).
Figure 4. Common technological aspects of cooperative interventions in radioactive and underwater environments.

Related Work
The CERNTAURO human-robot interface allows an inexperienced user to easily operate a set of different modular robots, which is usually adapted to the intervention request. However, it did not allow for synchronisation between the different agents, which made tasks requiring a high level of coordination difficult.
The topic of controlling multiple robots during remote intervention has been studied in detail in the past few years. The problem has often been addressed under a different perspective, with the objective of optimising the number of robots and the number of operators during an intervention.
The goal is to maximise the overall intervention performance and to ensure human, environmental, and robot safety at any time. Several factors cooperate in the resulting task performance: human factors such as mental workload, stress, and errors must be taken into account, in particular when the tasks are difficult and safety critical; robot factors comprise the level of autonomy of the robot, the communication, and the sensors and tools available.
It has been demonstrated that, in certain scenarios, such as urban search and rescue, the use of a single operator controlling one robot often resulted in missed opportunities for a more efficient task execution or in key details being left out. In the urban search and rescue scenario, it has been shown that an intervention supervisor collaborating with the robotic operator greatly improves the overall task efficiency [9]. Studies showed how assigning additional robotic roles to task experts affects the operators' multitasking and efficiency [10][11][12].
In certain scenarios, a single operator is in charge of controlling multiple robots, although in most cases, the operator must accomplish a single task using several robots, where there is a context switch cost coming from the transition between multiple parallel tasks [13].
Designing an HRI that enables a single nonexpert user to control a coordinated team of robots is a challenging objective [14], especially when the task requires accurate synchronisation between mobile manipulators [15]. Also, the presence of communication constraints, as found in underground tunnels and underwater scenarios, makes the problem even more complex.
In fact, enabling human supervision of the robot team facilitates the introduction of many teamwork applications, in particular in those scenarios where an expert of the application is necessary, where the expert has more knowledge of the tools and targets to intervene on rather than the robots to be controlled. This finds an important application at CERN as well as in marine archaeology [16].
Increasing the level of autonomy of the agents [17] supports the operator's ability to multitask, resulting in increased efficiency and task safety. It has been demonstrated that an operator can control multiple agents, provided that they exhibit an appropriate level of autonomy [18,19]. In this context, the fan-out measure represents the number of independent and heterogeneous robots that an operator can control simultaneously [20]. Some experiments provide a user interface to enable a team of operators to control a heterogeneous team of robots [21]. This is especially important when control can be shared with the robot team both over time and through task assignment. However, this approach finds limitations when a task has to be performed by two robots in a synchronised manner, especially for grasping interventions. Other works focus on human-robot cooperation not only at a distance but also in the field. For this reason, it is important to find a way to share their internal knowledge (e.g., ontologies) of the application as well as the goal of the cooperative mission [22].
Although the topic of multi-agent collaboration is being investigated in more and more detail, few studies have provided accurate performance metrics to set a baseline for future improvements. In [23], it is proposed to measure efficiency, effectiveness, and user satisfaction, as in the evaluation of standard human-computer interfaces. Furthermore, that work defines four metrics for the evaluation of HRIs when used by nonexpert operators: predictability of the behaviour, capability awareness, interaction awareness, and user satisfaction. In the context of this work, two performance metrics appeared to be particularly suitable: time-based efficiency and expert relative efficiency [24].
Time-based efficiency is a measure of the effectiveness of the interface on a specific task. Its definition involves $R$, the number of tasks; $N$, the number of users; $n_{ij}$, the result of the task (1 if succeeded and 0 if failed); and $t_{ij}$, the time taken to complete the task. The result of this metric is expressed in goals per time unit.
A more interesting way to evaluate the efficiency of an interface in this context is the expert relative efficiency. Its definition involves $N$, the number of tasks; $N_s$, the number of succeeded tasks; $R$, the number of users; $t_{ij}$, the completion time for the task; and $t_{0i}$, the ideal completion time for the task when performed by an expert user.
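The quantities listed for the time-based efficiency suggest the form commonly used in usability engineering. As a sketch under that assumption (the text does not confirm the exact expression, and the symbol $\bar{P}_t$ is introduced here only for illustration), the metric can be written as:

```latex
\[
\bar{P}_t \;=\; \frac{1}{N R} \sum_{j=1}^{R} \sum_{i=1}^{N} \frac{n_{ij}}{t_{ij}}
\]
```

Each successful attempt contributes the inverse of its completion time, each failed attempt contributes zero, and the sum is averaged over all attempts, which is consistent with a result expressed in goals per time unit.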
The physical meaning of the relative expert efficiency is the measure of potential efficiency relative to actual system efficiency with regard to its user effectiveness. Although these two metrics are well known in the context of User Interaction (UI) and User eXperience (UX), they have rarely been used in the context of human-robot interactions for remote robots, and therefore, reference values for demonstrating the effectiveness of an interface are not available.
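To make the time-based efficiency concrete, the following minimal sketch computes it from a table of outcomes and completion times. It assumes the usual form of the metric (the mean of $n_{ij}/t_{ij}$ over all attempts); the function name and the sample data are hypothetical and are not taken from the experiments.

```python
from typing import Sequence


def time_based_efficiency(
    results: Sequence[Sequence[int]],
    times: Sequence[Sequence[float]],
) -> float:
    """Mean of n_ij / t_ij over all attempts (goals per time unit).

    results[j][i] is 1 if attempt (i, j) succeeded and 0 otherwise;
    times[j][i] is the corresponding completion time (must be > 0).
    Assumed form of the metric; not confirmed by the source text.
    """
    total = 0.0
    attempts = 0
    for result_row, time_row in zip(results, times):
        if len(result_row) != len(time_row):
            raise ValueError("results and times must have the same shape")
        for n_ij, t_ij in zip(result_row, time_row):
            if t_ij <= 0:
                raise ValueError("completion times must be positive")
            total += n_ij / t_ij
            attempts += 1
    if attempts == 0:
        raise ValueError("at least one attempt is required")
    return total / attempts


# Hypothetical data: two rows of two attempts each, times in seconds.
sample_results = [[1, 1], [1, 0]]
sample_times = [[10.0, 20.0], [5.0, 40.0]]
efficiency = time_based_efficiency(sample_results, sample_times)
```

With these illustrative values, the three successful attempts contribute 0.1, 0.05, and 0.2 goals per second, the failed one contributes nothing, and the average over four attempts is 0.0875 goals per second.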

Previous Dual-Arm Experiments
The CERNTAURO project has faced a significant number of robotic operations in hazardous environments that required a single mobile platform with two robotic arms [2]. As a matter of fact, the use of the robotic manipulators depended on the specific intervention, having situations where each robot was configured to perform a specific function (e.g., screw/unscrew and holding). Also, in other operations, the robots performed synchronised behaviours, such as holding a part while the other manipulator was performing the intervention (e.g., disconnecting a pipe).
To this end, the CERNTAURO project enables the robots to be adapted to a dual-arm configuration according to the operation necessities (Figure 5). Not only the mechanical components of the robot but also the software, such as the human-robot interface, are adapted to the operations. In Figure 7, specific components interacting with the dual-arm robot configuration can be seen. As can be seen in Figure 8, the first step to recover a pipe was to use a single mobile platform with a dual-arm configuration, using the simulation server connected to the user interface.
This experiment allowed the operator to enable leader-follower behaviours in order to synchronise the grasping, recovery, and transportation of the pipe. A video of this experiment can be found in the Videos section (see Section 4.5).
The lessons learnt from this experiment can be summarised as follows:
• Recovery of the pipe with a single-robot dual-arm configuration was successfully and easily performed by the operator, thanks to the facilities provided by the central head camera.
• Transportation of the pipe with a dual-arm configuration failed in over 30% of the iterations due to the inertia forces of the mobile platform, which could cause the pipe to be dropped.
• To provide safety to the intervention, it was necessary to add three auxiliary robots to the scenario, providing external viewpoints to the operator from the top, left, and right.
In order to better solve these issues and to be able to face further problems with big objects, this paper focuses on grasping, recovering, and transporting the pipe with two single-arm mobile manipulators, which gives more possibilities to the operator. In fact, this enhances stability in the recovery and transportation phases, although it makes the grasping phase more difficult.
The solution presented is a step forward to solving the problem in a safe and reliable way.

State-of-the-Art on Vision
The use of cameras for robotic interventions is an effective and cheap solution which can be used to develop autonomous behaviours and to provide visual feedback to a remote operator. As a matter of fact, visual servoing control is a very important and well-known topic which is usually applied for manipulation applications (e.g., grasping) [25]. This is a crucial step to be carried out in most robotic interventions in which one robot interacts with the environment of the target area [26].
To do so, many strategies and technologies have been tested and integrated, like the implementation of Convolutional Neural Networks (CNN) to determine whether or not visually guided grasping has been successful, apart from using tactile and force sensors [27].
Much research has been carried out for this purpose, with the aim of grasping and manipulating objects, using either monocular [28] or stereo cameras [29].
Robotic interventions in real scenarios can be affected by unexpected situations, such as facing metallic objects, lack of visibility, reflections, and occlusions, where the use of specific techniques has been studied [30]. Besides these, some strategies have been developed for cooperative robot coordination, where computer vision has a significant importance for both robot coordination [31] and grasping determination [32,33].
In summary, vision-based approaches have been studied in detail and offer interesting results, but they still need more effort in order to face realistic environments and cooperative interventions. This paper takes a step forward in this direction.

State-of-the-Art on Networking
Real interventions on hazardous environments, such as underwater and radioactive environments, present limitations on communication, which need special attention to guarantee cooperation between the robots, also allowing supervision of the human operator.
Scenarios such as the ones present at CERN can be affected by radiation, shielding, and huge magnetic fields, which can limit communications and can also leave the robot without connectivity. Underwater scenarios are affected by energy absorption of the water, limiting the communication distance of visual and radio modems and presenting difficulties in sonar-based systems, such as reflections and multi-path. Sonar needs to be used in the open sea, presenting problems in the presence of nearby structures [34].
Communications in these environments are an interesting line of research that requires further effort to enable the system to work safely. Also, the communication system is closely related to fine-grained localization of the robots, especially when working with robot teams, a functionality that has not yet been solved in underwater and symmetrical indoor environments.
Previous experiments with robot teams at CERN have been performed by using a leader-follower configuration, using a local Wi-Fi network to link the robots and a single 3G/4G connection to the surface. The system addresses relative localization of the robots by using RSSI (Received Signal Strength Indicator) techniques [35]. It is necessary to work on improving the localization of robots and the recovery behaviours in the presence of communication failure.
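The RSSI technique used in [35] is not detailed here. As a rough illustration of how signal strength can be turned into a relative distance estimate, the sketch below uses the log-distance path-loss model, which is a common choice but only an assumption in this context. The reference power at 1 m and the path-loss exponent are hypothetical values that would need calibration in a real tunnel.

```python
def rssi_to_distance_m(
    rssi_dbm: float,
    rssi_at_1m_dbm: float = -40.0,    # hypothetical calibration value
    path_loss_exponent: float = 2.0,  # 2.0 corresponds to free space
) -> float:
    """Estimate distance from RSSI with the log-distance path-loss model.

    Model: rssi = rssi_at_1m - 10 * n * log10(d), solved here for d.
    This is an illustrative model, not the method of the cited work.
    """
    if path_loss_exponent <= 0:
        raise ValueError("path_loss_exponent must be positive")
    exponent = (rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent)
    return 10.0 ** exponent


# A reading equal to the 1 m reference maps to 1 m; 20 dB weaker maps to 10 m.
near = rssi_to_distance_m(-40.0)
far = rssi_to_distance_m(-60.0)
```

In cluttered or metallic environments the exponent typically differs from the free-space value and readings fluctuate, which is one reason such estimates are only coarse.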
Some current communications experiments in underwater research systems use RF (Radio Frequency) and VLC (Visual Light Communications) modems for low distances. RF systems are able to provide a rough localization of the nodes and to communicate even with a lack of visibility [36,37], while VLC systems are very convenient when visibility is good and in the presence of magnetic fields [38].
In summary, there is no general solution, making it necessary to continue working in this field and to use multimodal communication systems. In addition, the communication systems for hazardous environments need more research on compression techniques, especially for applications where a human operator is necessary (e.g., expert manipulation and maintenance) [39].

User Interface Overall Description
In this section, an overall description of the user interface is presented, including the submodules that enable control of the robot team in a human-supervisory manner. The system is intended to be used by untrained robotic operators, who might be experts on the scientific scenario where the robotic intervention is going to be performed. Figure 9 provides an overall description of the multimodality features of the user interface, which can be summarised as follows:
• Cooperative Mission Plan: This module has been designed in order to provide the operator with a graphical tool (i.e., mission plan) that shows the steps of the planned mission, their corresponding behaviour's state, as well as the possibility to activate the current state of the mission in a graphical manner. The mission is adapted to the current operation in advance, creating the behaviours by using scripts (see Listing 1) which implement cooperative synchronisation of the robots by processing feedback and control signals appropriately, as presented in more detail in the next section. The script module also makes it possible to implement more sophisticated modes of interaction, such as specific object recognition modules for certain objects as well as their grasping determination procedures.
• Unilateral Input Devices: These are the devices that let the operator control the system without providing feedback.
• Bilateral Input Devices: These devices are specifically designed for manual teleoperation modes, allowing control of the system by using joint and world coordinate commands and giving force feedback to the operator, according to the force sensor integrated into the robot's arms.
• Bilateral Output Devices: These are devices that, for the moment, are used for representation of the information to the user. In the context of the TWINBOT project, the virtual and augmented reality components already enable input and output control signals. Also, haptic devices are attached to the operator's fingers to provide force feedback. Audio is also a very important feedback source for the operator.
• Vision and Grasping: This module is implemented as a vision library that runs on the server side and can also be invoked via scripts from the client's side. It enables the user to interact with the robots in a higher-level way (i.e., objects and regions of interest). The module is able to track and detect trained metallic objects such as pipes in a robust manner. Also, it enables the calculation of distances to targets using on-hand monocular cameras (e.g., laparoscopic).
The grasping determination module, which allows for calculation of potential grasping points for a given contour, can be activated by the user during the operation. This grasping functionality is demonstrated in one of the experiments presented in this article, which addresses, in simulation, the problem of grasping a big object (e.g., a pipe) with two robots, taking into account that, once the object has been approached, it is only partially perceived by the cameras. The experiment needs the user's supervision to select the required grasp for each robot, apart from monitoring the cooperative grasp mission to avoid unexpected situations such as failures in perception. For more details about the system multimodality, for example, the way the controls are used to interact with the robots and user interface, please refer to [2].
The HRI adapts to the current configuration of the robots, a functionality that has been extended for multi-agent operations. To this end, the user is required to select the current robot team configuration, which can be applied to both the real robotic team and the training simulator. This is especially useful during the preparation time of the intervention, since specific tools and cooperative behaviour scripts are implemented during this phase.
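The text does not describe how grasping points are derived from a contour. Purely as an illustration of the idea for an elongated object such as a pipe, the sketch below estimates the principal axis of a 2D contour from its second-order moments and places two grasp points symmetrically along that axis, one per robot. The function name, the `spacing_fraction` parameter, and the sample contour are hypothetical, and this is not the grasp determination method used in the module.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def two_robot_grasp_points(
    contour: List[Point], spacing_fraction: float = 0.5
) -> Tuple[Point, Point]:
    """Place two grasp points along the principal axis of a 2D contour.

    spacing_fraction scales the half-length of the object along its axis:
    0 puts both grasps at the middle, 1 puts them at the two extremes.
    """
    if len(contour) < 2:
        raise ValueError("contour needs at least two points")
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # Second-order central moments of the contour points.
    sxx = sum((x - cx) ** 2 for x, _ in contour)
    syy = sum((y - cy) ** 2 for _, y in contour)
    sxy = sum((x - cx) * (y - cy) for x, y in contour)
    # Orientation of the major axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Extent of the contour along the major axis.
    proj = [(x - cx) * ux + (y - cy) * uy for x, y in contour]
    mid = 0.5 * (max(proj) + min(proj))
    half = 0.5 * (max(proj) - min(proj)) * spacing_fraction
    first = (cx + (mid - half) * ux, cy + (mid - half) * uy)
    second = (cx + (mid + half) * ux, cy + (mid + half) * uy)
    return first, second


# Hypothetical contour: corners of a 10 x 2 rectangle (a pipe seen from above).
pipe_contour = [(0.0, 0.0), (10.0, 0.0), (10.0, 2.0), (0.0, 2.0)]
grasp_leader, grasp_follower = two_robot_grasp_points(pipe_contour)
```

For this rectangle the two grasp points fall on the centre line at one quarter and three quarters of the length. A partially perceived object would shift the estimate, which is consistent with the need for operator supervision mentioned above.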
Once the user interface establishes a connection to the sensors and robots, the control view comes up. It lets the user select camera inputs from any robot, sending control commands and activating the current mission state according to the mission plan. In the top right part of this window, the controls to access the robot team commands can be seen.
The use of 3D sensors to perceive the environment enriches enormously the human-robot interface experience, allowing the use of the 3D virtual/augmented reality module. This enables the unified representation of sensors and robot states in the same view, facilitating the robot operation, especially for nonexpert users.

Multi-Robot Cooperative Behaviours
The multi-robot cooperative behaviours are implemented via scripts, in which the leader and follower roles are assigned automatically (the operator can disable this to choose the roles manually) and which can be integrated into the control loop in several ways. As an example, Listing 2 shows a script that implements a simple leader-follower behaviour. First of all, this script checks the mission plan state to know if the leader-follower behaviour is active. If so, it takes as input the current leader velocity generated by the user interface and creates a second network packet to replicate the velocity command to the follower robot. Once the behaviour state in the mission plan is deactivated, the script stops replicating the control.
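The leader-follower logic described above can be sketched as follows. This is a minimal, self-contained illustration and not the HRI's scripting API: the `VelocityPacket` class, the behaviour name `LEADER_FOLLOWER`, and the use of a plain set to represent the mission plan state are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass(frozen=True)
class VelocityPacket:
    """Hypothetical velocity command addressed to one robot."""
    robot_id: str
    vx: float
    vy: float
    vz: float


def leader_follower_script(
    active_behaviours: Set[str],
    leader_packet: VelocityPacket,
    follower_id: str,
    behaviour: str = "LEADER_FOLLOWER",
) -> List[VelocityPacket]:
    """Return the packets to send for one leader velocity command.

    The leader packet is always forwarded. While the behaviour is active,
    a second packet with the same velocity is created for the follower.
    """
    packets = [leader_packet]
    if behaviour in active_behaviours:
        packets.append(replace(leader_packet, robot_id=follower_id))
    return packets


command = VelocityPacket(robot_id="leader", vx=0.2, vy=0.0, vz=0.0)
while_active = leader_follower_script({"LEADER_FOLLOWER"}, command, "follower")
after_deactivation = leader_follower_script(set(), command, "follower")
```

While the behaviour is active, the command is duplicated for the follower; once it is deactivated, only the leader packet is returned, which mirrors the stop condition described in the text.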
In Figure 10, we can see a second example of script attachment to the control architecture in order to implement a closed-loop recovery behaviour. In this situation, once the operator activates the behaviour, the script in the sensor feedback input flow takes the current leader robot position and generates a velocity as output in order to let the leader reach the surface in a smooth manner. The generated leader velocity is injected to the second script, which replicates the velocity to the follower, according to the active leader-follower behaviour. These examples show that any sensor input or robot control output can be attached with a script that implements a cooperative behaviour.
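The chaining of the two scripts can be illustrated with a short sketch. It assumes a proportional controller with saturation for the surfacing velocity, which is one simple way to obtain a smooth ascent; the controller type, the gain, the speed limit, and all names are hypothetical and are not specified in the text.

```python
from typing import Dict


def surfacing_velocity(
    depth_m: float, gain: float = 0.5, max_speed_mps: float = 0.3
) -> float:
    """Upward speed for the leader, proportional to depth and saturated.

    The speed decreases as the robot nears the surface, so the ascent
    ends smoothly. depth_m <= 0 means the surface has been reached.
    """
    if depth_m <= 0.0:
        return 0.0
    return min(gain * depth_m, max_speed_mps)


def recovery_step(
    leader_depth_m: float, leader_follower_active: bool
) -> Dict[str, float]:
    """One closed-loop iteration: sensor feedback in, velocity commands out.

    The first stage turns the leader depth into a vertical velocity; the
    second stage replicates it to the follower if that behaviour is active.
    """
    vertical_speed = surfacing_velocity(leader_depth_m)
    commands = {"leader": vertical_speed}
    if leader_follower_active:
        commands["follower"] = vertical_speed
    return commands


deep = recovery_step(10.0, leader_follower_active=True)
shallow = recovery_step(0.2, leader_follower_active=True)
surfaced = recovery_step(0.0, leader_follower_active=False)
```

The output of the feedback-side stage becomes the input of the output-side stage, so the follower inherits the closed-loop behaviour without a dedicated controller of its own.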

Mission Plan
The multi-robot cooperative behaviours are activated by using the mission plan tool, which provides a list of steps to be performed during the intervention, where the corresponding behaviours are enabled to perform each step. Also, the currently enabled behaviours are checked by the cooperative scripting module in order to implement the corresponding cooperative functionality.
As an example of the mission plan tool, Figure 11 shows the dependencies applied to the robots and sensors in order to make the team approach the seabed in a supervised closed-loop manner. The enabled behaviours are shown on the right side in blue. The loop can also be closed by camera inputs, which in fact have been implemented to be only run on the server side, as it will be explained in the following section.
In addition, Figure 12 shows step 4 of the pipe pick-up and transport intervention, where both the secondary mobile platform (follower) and the arm follow the primary robot's movements (leader). For a complete demonstration of the mission plan, please refer to the Video section.
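As a rough illustration of how a mission plan can drive the cooperative scripts, the sketch below models a plan as an ordered list of steps, each enabling a set of behaviours, with a query that scripts can use to check whether a behaviour is currently enabled. The class names, method names, and step contents are hypothetical and do not reproduce the actual tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class MissionStep:
    """One step of the intervention and the behaviours it enables."""
    description: str
    behaviours: FrozenSet[str]


@dataclass
class MissionPlan:
    """Ordered steps; the operator activates them one after another."""
    steps: List[MissionStep]
    current: int = -1  # -1 means that no step has been activated yet

    def activate_next(self) -> MissionStep:
        if self.current + 1 >= len(self.steps):
            raise IndexError("the mission plan has no further steps")
        self.current += 1
        return self.steps[self.current]

    def is_behaviour_active(self, name: str) -> bool:
        if self.current < 0:
            return False
        return name in self.steps[self.current].behaviours


plan = MissionPlan(
    steps=[
        MissionStep("Approach the seabed", frozenset({"APPROACH_TARGET"})),
        MissionStep("Transport the pipe", frozenset({"LEADER_FOLLOWER"})),
    ]
)
before_start = plan.is_behaviour_active("APPROACH_TARGET")
plan.activate_next()
during_approach = plan.is_behaviour_active("APPROACH_TARGET")
plan.activate_next()
during_transport = plan.is_behaviour_active("LEADER_FOLLOWER")
```

Keeping the enabled behaviours attached to the step means that a script only needs a single query to decide whether to act, while the operator retains control over when each step starts.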

Behaviour Implementation
The HRI incorporates a module to design cooperative scripts, including support for accessing the available components as well as compilation functionality (see Figure 13). The scripts can be executed in three ways: (1) linked to a sensor input, (2) as an HRI output to a robot controller, or (3) on a closed-loop basis as a parallel thread to the HRI process. As an example, Listings 2 and 3 show some simple behaviours that have been implemented for the current experimentation. They are invoked by the HRI and are able to receive the packet that holds the sensor input (e.g., distance or camera) or the robot movement, so the script can enhance that input information to perform additional cooperative operations.
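The three execution modes can be sketched with a small host object that keeps scripts attached to sensor inputs, scripts attached to robot outputs, and scripts running periodically in their own thread. This is a minimal sketch of the idea only; the `ScriptHost` class, its method names, and the dictionary packets are hypothetical and are not the HRI's real interfaces.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, List


class ScriptHost:
    """Hypothetical host for scripts in the three modes described above."""

    def __init__(self) -> None:
        self._input_scripts: Dict[str, List[Callable]] = defaultdict(list)
        self._output_scripts: Dict[str, List[Callable]] = defaultdict(list)

    def attach_to_input(self, sensor: str, script: Callable) -> None:
        """Mode 1: the script receives and may enrich each sensor packet."""
        self._input_scripts[sensor].append(script)

    def attach_to_output(self, robot: str, script: Callable) -> None:
        """Mode 2: the script may emit extra packets for each command."""
        self._output_scripts[robot].append(script)

    def start_closed_loop(
        self, script: Callable, stop: threading.Event, period_s: float = 0.1
    ) -> threading.Thread:
        """Mode 3: the script runs periodically in a parallel thread."""
        def loop() -> None:
            while not stop.is_set():
                script()
                stop.wait(period_s)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def on_sensor_packet(self, sensor: str, packet: dict) -> dict:
        for script in self._input_scripts[sensor]:
            packet = script(packet)
        return packet

    def on_robot_command(self, robot: str, packet: dict) -> List[dict]:
        outgoing = [packet]
        for script in self._output_scripts[robot]:
            outgoing.extend(script(packet))
        return outgoing


host = ScriptHost()
host.attach_to_input("sonar", lambda p: {**p, "filtered": True})
host.attach_to_output("leader", lambda p: [{**p, "robot": "follower"}])
enriched = host.on_sensor_packet("sonar", {"distance_m": 2.0})
sent = host.on_robot_command("leader", {"robot": "leader", "vz": 0.1})
```

Separating the three attachment points keeps each script small: an input script only transforms a packet, an output script only adds packets, and a closed-loop script owns its own timing.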
Listing 1: Implementation of the behaviour to make the follower robot arm synchronised with the leader manipulator.

    if IsBehaviourActive('G500_1_APPROACH_TARGET_2METERS'):
        # The distance to the sea floor is read from the sonar sensor
        distanceToTarget = header.getDistanceToTarget()
        # According to the distance sensor of G500_1
        # robot is approached to the sea floor
        tcpClientG500_1 = client.getTCPClientFromPort(G500_1_PORT)
        tcpClientG500_2 = client.getTCPClientFromPort(g500_2_PORT)
        tcpClientAUX = client.getTCPClientFromPort(AUX_PORT)
        # Create packet to move the robots vertically
        packet = client.

In summary, the cooperative module architecture and design enable the operator to prepare, activate, and tune very simple scripts that can help during an intervention. In fact, as the interventions are studied in advance, the scripts and corresponding behaviours can also be implemented before facing a real intervention. Also, given the simplicity of the procedure, a trained operator can tune the scripts using the Graphical User Interface (GUI) during an intervention if it becomes necessary.

Experiments
In this section, four experiments will be presented: (1) Validation of the HRI cooperative module in the CERN LHC mockup, (2) validation of the cooperative functionalities in the TWINBOT underwater training simulation, (3) underwater radio-frequency communications, and (4) the vision module to track and calculate multi-robot grasping points.
The first two experiments use leader-follower cooperative behaviours as well as closed-loop control on sensors (e.g., altitude) under the supervision of the user, who initiates the mission plan steps (activating behaviours) and teleoperates the leader robot.
The third experiment demonstrates the particularities of underwater scenarios in terms of communications. Even with specific compression techniques, the bandwidth constraints allow the operator to get only very limited visual feedback (e.g., one frame every 5 seconds, restricted to a region of interest). With such constrained communication links, more sophisticated semiautonomous cooperative behaviours, such as vision for grasping, are needed.
The fourth experiment incorporates a new cooperative behaviour that gives more intelligence to the robot team by calculating the object grasps and letting the robots perform the cooperative grasping in a supervised semiautonomous manner, while the operator initiates and validates the mission steps. In this scenario, user interaction takes place at a higher level: the need to manually control the multi-robot system is reduced, yet the expert user remains in the loop to control the mission steps, confirm the operations, and use the multimodal facilities of the user interface to solve unexpected situations.
Videos of the four experiments can be found in Section 4.5.

Experiment I: Validation in CERN LHC Mockup
The HRI cooperative behaviours module has been validated at the CERN experimental area [40] using two CERNBOT robots with the one-top-arm configuration in order to pick up and transport a pipe in a synchronized way (see Figure 14) by having a single operator in the loop (see Figure 1 and Section 4.5). For this experiment, the cooperative behaviours create leader-follower control commands that are activated by the operator via the mission plan module. The operator interacts with the system by moving the leading platform in velocity. The sequence of the mission plan and user interface interaction can be seen in Figure 15. Table 1 shows a comparison between the time-based efficiencies of the task with and without the cooperative behaviours. It should be noted that the operator was able to complete the task without programmed behaviours as well. Moreover, having synchronisation between the two platforms during the transport phase resulted in a much safer operation.

Table 1. Time-based efficiency of the transport task using the two CERNBots.

Condition | Time-Based Efficiency
Without cooperative behaviours | 0.12 goals/min
With cooperative behaviours | 0.37 goals/min
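The control law behind the leader-follower commands is not detailed in the text. As one plausible illustration only, the sketch below uses planar rigid-body kinematics: if both platforms hold the same pipe, the follower can be commanded with the velocity that a point rigidly attached to the leader would have at the follower's position. The `Twist` type, the function name, and the offset convention (follower position expressed in the leader frame) are assumptions for this example.

```python
from typing import NamedTuple


class Twist(NamedTuple):
    """Planar velocity command: linear x, linear y (m/s) and yaw rate (rad/s)."""
    vx: float
    vy: float
    wz: float


def follower_twist(leader: Twist, offset_x: float, offset_y: float) -> Twist:
    """Velocity of a point rigidly attached to the leader at (offset_x, offset_y).

    Uses v_follower = v_leader + w x r for planar motion, so the distance
    between the two platforms (and hence the pipe) is preserved.
    """
    return Twist(
        vx=leader.vx - leader.wz * offset_y,
        vy=leader.vy + leader.wz * offset_x,
        wz=leader.wz,
    )


# Follower placed 1.5 m behind the leader (hypothetical pipe length).
straight = follower_twist(Twist(0.2, 0.0, 0.0), -1.5, 0.0)
turning = follower_twist(Twist(0.0, 0.0, 0.1), -1.5, 0.0)
```

When the leader drives straight, the follower simply copies the command; when the leader rotates in place, the follower has to move sideways to keep the formation.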

Experiment II: Validation in Underwater Simulator
The behaviour configurations used in the previous experiment have also been adapted to pick up and transport a pipe cooperatively from the seafloor in the simulation training tool of the TWINBOT project (see Figure 16). The intervention video can be found in Section 4.5.
In this experiment, apart from the leader-follower behaviours, the mission plan allows the activation of a closed-loop approach-to-target behaviour that uses the depth sensor. Recovery transport to the surface is performed using a similar procedure.
In Figure 17, the mission plan behaviour's state selection and the effect on the user interface are shown.
The usability of the HRI cooperative behaviours has been validated by first performing an intervention on the training simulator with two operators: the first one controlling the auxiliary robot and the leader, and the second one controlling the second mobile manipulator. The intervention required a lead operator who decided when a synchronised operation should be carried out. As can be seen in Figure 18, the intervention was performed five times and failed three times, mainly during the transport of the pipe (step 5), with some difficulties in the coordinated grasping (step 4).
The experiment was then repeated using one operator and the cooperative behaviours, according to the mission plan (see Figure 19). The first attempt failed due to excessive pressure while grasping the object. The following attempts were successful and very smooth, both when approaching the seafloor and when transporting the pipe to the surface. Table 2 shows the time-based efficiency of task execution. In this case, the calculation of the metric has been slightly modified. By definition, the time-based efficiency is computed from the sum of the terms n_i/t_i, where n_i is 1 if task i was accomplished and 0 otherwise, and t_i is the execution time of the task. This means that, in the context of this experiment, one failed task out of the five in the entire operation would affect only the contribution of that specific task. However, it is safe to assume that a failed task compromises the entire operation. For this reason, if any task was not completed, the efficiency of the entire operation was considered to be 0.
Considering the values of Table 2, the expert-relative efficiency metric can be used to compute the percentage gain in efficiency obtained by using the scripting behaviours and a single operator. In this case, the ideal completion time of an expert user has been replaced by the completion time of the single operator using scripting behaviours. The resulting value is E_e = 31.8%, which is the relative efficiency of two operators completing the operation with respect to the single operator using behavioural scripts to accomplish the same tasks. In other words, the single operator is over three times more efficient in this context (1/0.318 ≈ 3.1).

Table 2. Time-based efficiency of the transport task in the simulated scenario.

Figure: Usability test using one operator to control the team using cooperative behaviours: one experiment failed on grasping, and the following attempts were successful.
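The modified metric can be sketched in a few lines of Python. This is a minimal illustration, assuming that the sum of n_i/t_i is normalised by the number of tasks and that attempts are averaged; the function names and the task times in the example are hypothetical and are not the values of Table 2.

```python
def time_based_efficiency(tasks):
    """Efficiency of one operation in goals/min.

    `tasks` is a list of (completed, minutes) pairs. Following the modification
    described in the text, one failed task zeroes the whole operation.
    """
    if not all(completed for completed, _ in tasks):
        return 0.0
    return sum(1.0 / minutes for _, minutes in tasks) / len(tasks)


def mean_efficiency(attempts):
    """Average efficiency over several attempts of the same operation."""
    return sum(time_based_efficiency(tasks) for tasks in attempts) / len(attempts)


def relative_efficiency(candidate, reference):
    """Efficiency of `candidate` as a percentage of `reference`."""
    return 100.0 * candidate / reference


# Hypothetical attempts with two tasks each (times in minutes).
successful = [(True, 2.0), (True, 4.0)]
failed = [(True, 2.0), (False, 4.0)]

reference = mean_efficiency([successful])        # 0.375 goals/min
candidate = mean_efficiency([successful, failed])  # 0.1875 goals/min
gain_percent = relative_efficiency(candidate, reference)  # 50.0
```

With the reported E_e = 31.8%, the inverse ratio 100/31.8 ≈ 3.14 is what supports the statement that the single operator is over three times more efficient.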

Experiment III: Underwater Communications
In the current experiment, the communications architecture has been studied to enable the underwater team of robots to communicate in a cooperative manner and to be able to transmit status and sensor data (including compressed images) to the surface.
As can be seen in Figure 20, a user interface networking module, including 3D virtual and augmented reality, has been implemented to allow the operator to remotely control an underwater robot using both sonar and radio frequency modems. Wireless underwater image feedback is expensive in terms of consumed bandwidth and requires advanced compression techniques. Once the robot obtains its position, for example by using markers or target localization, the 3D virtual scenario can be used to represent the real position of the robot to the user and to let the user specify the next movement. In Figure 20, the transparent robot represents the next position programmed by the operator, while the opaque 3D model represents the real position of the robot. This type of user interface reduces the consumed bandwidth, enhances the information provided to the user, and enables enrichment of the 3D scene with augmented information, such as the projection of the robot position on the pool floor, angles, etc.
The user interface enables the operator to select the control parameters of the network protocol and compression algorithm, allowing for specification of both the quality and resolution of the transmitted data as input. Also, a Region-Of-Interest (ROI) module has been added, allowing the operator to define multiple layers of compression. Further details about the underwater communication and compression techniques can be found in [8].
The results of the communication experiment are shown in Table 3. From these, we can conclude that, in CERN tunnels, at least the leader of the robot team should have a Global System for Mobile communications (GSM)/3G/4G link to the surface, while the followers or auxiliary robots should connect to the leader using a Wi-Fi ad hoc infrastructure, taking into account that low jitter is mandatory for cooperative team actions.
In addition, for underwater teams, the TWINBOT network architecture has a sonar link between the leader mobile manipulator and the surface. The three robots of the team communicate with each other using radio frequency modems, which have been shown to provide 1.9 Kbps of bandwidth at 5 m and can be upgraded to reach 12 or 30 m of coverage. The sonar link provides 3.2 Kbps, but its propagation and intrinsic delays are too high for cooperative interventions, so it is used to keep the operator in the control loop in a supervised manner.
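To give a sense of what these link rates mean for image feedback, the short Python sketch below computes the largest compressed frame that fits in a given update period and the time needed to send a frame of a given size. It assumes that 1 Kbps equals 1000 bit/s and ignores protocol overhead and retransmissions; the function names and the 2000-byte example are illustrative.

```python
def max_frame_bytes(link_bps: int, period_s: int) -> int:
    """Largest payload (bytes) that fits in one update period, ignoring overhead."""
    return (link_bps * period_s) // 8


def seconds_to_send(frame_bytes: int, link_bps: int) -> float:
    """Transmission time of one frame, ignoring overhead and propagation delay."""
    return frame_bytes * 8 / link_bps


RF_LINK_BPS = 1900     # radio frequency modems, 1.9 Kbps at 5 m
SONAR_LINK_BPS = 3200  # sonar link, 3.2 Kbps

rf_budget = max_frame_bytes(RF_LINK_BPS, 5)        # one frame every 5 s
sonar_budget = max_frame_bytes(SONAR_LINK_BPS, 5)
rf_time_2kb = seconds_to_send(2000, RF_LINK_BPS)   # hypothetical 2000-byte frame
```

At one frame every 5 seconds, the budget is roughly 1.2 kB per frame on the radio frequency link and 2 kB on the sonar link, which is why region-of-interest compression is needed.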
Acoustic communication in this experiment showed greater fluctuation in the intrinsic delay (a standard deviation of 363.853) due to the Media Access Control (MAC) protocol implementation, which sends bursts of packets that require buffering in the modem.
The underwater communication experiment results have been used as input to model the network behaviour in the UWSim underwater simulator (http://www.irs.uji.es/uwsim).

Experiment IV: Vision for Multi-Robot Grasping
The multi-robot grasping determination experiment has been performed using the TWINBOT underwater training simulation server as input. The vision module gets the current image of the pipe as input, taken from the on-hand camera, and applies three actions: (1) object recognition (or ROI provided by the user, see Figure 21), (2) object tracking and visual servoing control to center the model in the image (see Figure 22), and (3) grasping determination algorithm for stable robot team manipulation (see Figure 23).
The first two phases of the vision module have been developed and tested within the CERNTAURO project (Figure 24). The significant contribution of this tracking system is that it enables the recognition of metallic objects, which are very common in CERN scenarios. The third action (multi-robot grasping determination) is a work-in-progress, decoupled solution based on previous experiments at Jaume I University (UJI) on single-robot manipulation. Efforts are under way to integrate this grasping determination algorithm into the real TWINBOT and CERNTAURO experiments.
Grasping determination can be applied to any camera input selected by the operator, belonging to any of the robots. To apply this to a big object, it is necessary to have a more complete view of the object, where several grasps can be offered to the user to be tracked and executed by each particular robot. The computer vision system, which runs on the server side, will be activated from the HRI as a server behaviour.
Phase 1 (see Figure 21) uses the Speeded-Up Robust Features (SURF) [41] algorithm to identify the object in the scene. The object can be either selected by the operator or recognized through the object recognition module.
The second phase takes the ROI of the object as input and applies a tracking-based guidance system (see Figure 22), which overcomes the well-known weaknesses of feature extractor/descriptor and tracking algorithms, such as rescaling, rotation, stability, or partial occlusions, achieving good results in harsh conditions in both metallic and underwater scenarios. This algorithm, which is a novel contribution from CERN [42], calculates the distance to the target by using a single monocular on-hand camera based on the law of sines (Sinus Theorem; see Equation (3)), which is fully developed in [30]. To summarize, the tracking system uses a global ROI that supervises the correct behaviour of the four sub-trackers responsible for meeting the needs of hazardous and industrial scenarios.
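The authors' formulation is given by Equation (3) and [30] and is not reproduced here. Purely as a generic illustration of how the law of sines yields a distance from a single moving camera, the sketch below assumes that the camera observes the same target point from two positions separated by a known baseline and that the interior angles between the baseline and the two lines of sight are known. The function name and its parameters are assumptions for this example.

```python
import math


def distance_by_law_of_sines(baseline_m, angle_a_rad, angle_b_rad):
    """Distance from the first camera position to the target.

    In the triangle formed by the two camera positions (A, B) and the target (T),
    the angle at T is pi - a - b, and the law of sines gives
    |AT| / sin(b) = baseline / sin(pi - a - b).
    """
    angle_t = math.pi - angle_a_rad - angle_b_rad
    if angle_a_rad <= 0 or angle_b_rad <= 0 or angle_t <= 0:
        raise ValueError("the two lines of sight do not intersect in front of the camera")
    return baseline_m * math.sin(angle_b_rad) / math.sin(angle_t)


# Equilateral case: all angles are 60 degrees, so the distance equals the baseline.
d_equilateral = distance_by_law_of_sines(1.0, math.radians(60), math.radians(60))
# Right-angle case: target straight ahead of A, seen at 45 degrees from B.
d_right = distance_by_law_of_sines(1.0, math.pi / 2, math.pi / 4)
```

The estimate degrades as the two lines of sight become nearly parallel, because the angle at the target approaches zero; the explicit check rejects the degenerate case.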
The third phase has been studied in a decoupled manner using the underwater training simulation, considering the multi-robot grasping determination procedure but not yet the collision-free multi-robot execution. Once the target has been approached appropriately using on-hand visual servoing, the contour of the object (i.e., the pipe) is extracted and analysed using a saliency detection-based algorithm [43]. Then, the grasping determination algorithm explained in [44] is applied, which gives the geometrical best-fit ellipse analysis (minimum and maximum inertia axes, I_min and I_max) and the grasping points for stable single-arm manipulation. Next, the pipe contour is divided into two sections, taking the I_max axis of the whole pipe as the division line. After this, the grasping determination algorithm is used to extract the grasping points for the two cooperative robots, as can be seen in Figure 25.
For this underwater multi-robot grasping and recovery experiment, which can be seen in the videos of Section 4.5, the vision system of each robot calculates three grasps: (1) the center of the object, as the intersection between the I_min and I_max axes; (2) grasping 1, which is located on the left side of the center along the I_min axis; and (3) grasping 2, which is located on the right side, equidistant to grasping 1. This automatic grasping selection has been designed for this specific application in order to guarantee recovery stability.
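The geometry of the three grasps can be sketched with second-order central moments. This is a simplified stand-in for the algorithm of [44], not a reproduction of it: it assumes the object is given as a list of 2D pixel coordinates, takes the I_min axis as the direction of largest spread, and places the two grasps at the mean distance of the points from the I_max axis (which, for a symmetric pipe, is the centroid of each half). All function names are hypothetical.

```python
import math


def centroid_and_min_inertia_axis(points):
    """Return the centroid and a unit vector along the minimum-inertia axis."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # Standard orientation of the best-fit ellipse from central moments.
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), (math.cos(theta), math.sin(theta))


def two_robot_grasps(points):
    """Return (center, grasp_1, grasp_2), with both grasps on the I_min axis."""
    (cx, cy), (ax, ay) = centroid_and_min_inertia_axis(points)
    # Signed position of every point along I_min; the I_max axis is where it is 0.
    along = [(x - cx) * ax + (y - cy) * ay for x, y in points]
    offset = sum(abs(s) for s in along) / len(along)
    grasp_1 = (cx - offset * ax, cy - offset * ay)  # left of the center
    grasp_2 = (cx + offset * ax, cy + offset * ay)  # right, equidistant
    return (cx, cy), grasp_1, grasp_2


# Synthetic horizontal "pipe": 11 x 3 grid of pixels.
pipe = [(float(x), float(y)) for x in range(11) for y in range(3)]
center, grasp_1, grasp_2 = two_robot_grasps(pipe)
```

Placing both grasps symmetrically about the center keeps the load balanced between the two robots, which matches the stability argument made in the text.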
Grasping calculation has been performed for each robot camera, as the robots approach the target, enabling the user to center the robot gripper to the corresponding grasp 1 or 2, according to the mission parameters.
The experiment has demonstrated that the proposed solution works safely even when the robots are close to the target and have only a partial view of the pipe.
As can be seen in Figure 26, the use of a module for multi-robot grasping support greatly enhances the safety of an intervention carried out by a single operator, minimising the need for manual control of the robots, which might present stability problems due to the underwater communication constraints. The mission plan has been very useful in allowing the user to confirm the state of the mission before starting the next step, offering a supervised way of operating that avoids cognitive fatigue and increases safety.

VIDEOS
• Preliminary dual-arm experiment: CERN simulation server and HRI pipe recovery with dual-arm mobile manipulator: https://cernbox.cern.ch/index.php/s/6Ud83QBpB1OpJx6
• Experiment I: CERN experimental area validation to recover a pipe with two single-arm cooperative mobile manipulators: https://cernbox.cern.ch/index.php/s/mtyfUhW6Rryqe8k
• Experiment III: underwater training simulator validation of TWINBOT project pipe recovery with two single-arm cooperative mobile manipulators: https://cernbox.cern.ch/index.php/s/iJYnMK8D1srkWbB
• Experiment IV: underwater training simulator validation of TWINBOT project for cooperative grasping with two single-arm mobile manipulators by using the GUI: https://cernbox.cern.ch/index.php/s/ieLYuUCvK1Q7FyO

Conclusions
This paper has shown the current state of the cooperative behaviours module developed at CERN for the CERNTAURO project, which has also proven well suited to controlling similar operations in underwater scenarios. The paper shows the synergies between the CERNTAURO and TWINBOT projects and the technical aspects that can be applied in both radioactive and underwater scenarios.
The cooperative behaviours module, integrated into the HRI, includes mission plan software that enables the user to activate and deactivate the corresponding cooperative behaviours according to the state of the mission and any unexpected situations that might arise. Experience in real interventions shows that, in order to guarantee the safety of human beings and machines, it is very convenient to have a human being in the loop who can supervise the execution of closed-loop behaviours and who can decide the state of the mission, the level of interaction (from manual to autonomous control), and the interaction mode (e.g., scripts, keyboards, master/slave, etc.).
The experiments shown in this paper cover very important aspects of the cooperative intervention: the tracking and object recognition module, the results of the underwater communication experiments, the validation of the system by recovering and transporting a pipe with two robots in the CERN LHC mockup, and the use and validation of the HRI cooperative module to recover a pipe in a simulated underwater training scenario. Efforts are under way to apply the knowledge acquired from this experience to real underwater field experiments. This paper also presents a comparative analysis of the recovery and transport of the pipe, first by using two expert operators and then by using one operator with the cooperative behaviours and the mission plan. The usability results show that the cooperative behaviours module enables a single operator to control the robot team in a safer way and to pick up and transport the pipe smoothly. In addition, the experiments have been performed first with a teleoperated leader-follower technique and later with a multi-robot grasping determination algorithm, which has shown the best results in terms of safety and efficiency.
The system has been enhanced with a vision-based tracking and recognition module that demonstrates very good reliability when facing metallic and underwater images, and an extension of the module to calculate the grasping points of the robot team has also been presented.
Further work will define and evaluate more advanced techniques for multi-robot grasping execution, with a focus on more realistic underwater scenarios.