Recent advances in robot-assisted echography: combining perception, control and cognition

Abstract: Echography is an important imaging technique frequently used in medical diagnostics owing to its low cost, non-ionising nature, and practical convenience. Because skilled technicians are in short supply and physicians sustain injuries from examining many patients, robot-assisted echography (RAE) systems have gained great attention in recent decades. This study presents a thorough review of recent research advances in the perception, control and cognition techniques used in RAE systems. The survey introduces the representative system structure, applications, projects, and products. It summarises the challenges and key technological issues faced by traditional RAE systems and how current artificial intelligence and cobots attempt to overcome them. Furthermore, it identifies significant future research directions in this field: cognitive computing, operational skills transfer, and commercially feasible system design.


Introduction
Echography (also known as sonography or ultrasound (US) imaging [1]) is an important technique in medicine that offers low cost, freedom from ionising radiation, portability, availability and convenience [2,3]. Since the 1980s [6], robot technology has been widely used in health care, especially in remote surgery, patient rehabilitation and pharmaceutical automation [4,5], and robot-assisted echography (RAE) systems have gained attention for the following two main reasons.
(i) Skilled physicians are in limited supply and are completely absent from some healthcare centres. Hence, the RAE system is especially meaningful for remote diagnosis [7]. (ii) Physicians exert large forces on the ultrasound probe, often in uncomfortable postures, and consequently tend to suffer strain injuries [4,8,9].
Researchers have been developing medical RAE systems since the last century [10]. Nowadays, a number of echography systems have been developed and integrated into robotic surgical systems, e.g. the Da Vinci robotic surgical system [11][12][13]. Following the division of control paradigms used for surgical robots [6], previous reviews classified RAE systems into three categories: autonomous US imaging, teleoperated US imaging, and human-robot cooperation [14]. Increasingly, RAE systems are designed with two or more of these functions [15][16][17]. This paper presents the recent advances in RAE systems and provides a comprehensive literature review of their perception, control, and cognition.
The subsequent sections of this paper are organised as follows. Section 2 presents system construction, applications, and related projects and products. Section 3 covers the major challenges and related techniques. Section 4 provides a discussion about the system and the future scope of this research area, highlighting the emerging techniques such as cognitive computing and imitation learning. Section 5 provides the concluding remarks.

Preliminaries
Before turning to the advances in the field of RAE, this section establishes the system construction and presents the applications and the progress made by research groups across the globe.

System construction
Traditionally, RAE systems have been built with three basic components: a robotic arm for manipulating the US probe, a joystick/haptic and display interface, and a robot controller [10,14,15]. Additional equipment, sensors, and software modules are added to enhance the system's abilities and satisfy different demands, e.g. magnetic resonance imaging (MRI), computed tomography (CT), X-ray, electromagnetic tracking systems and surgical needles [18]. Fig. 1 presents a typical experimental setup designed in the early years.
Additionally, the RAE system has been integrated with further abilities: perception, diagnosis based on multimodal information, tool-based manipulation (needle guidance and intraoperative ultrasonography) and human-robot interaction. The robot is a perception/cognition-driven tool that provides accurate position information and enables physicians to treat patients safely and effectively. Fig. 2 illustrates the structure of the RAE system, which supports three control paradigms: autonomous, teleoperated and human-robot cooperative mode.
The process starts from the patient: the perception unit obtains images (ultrasound, camera, CT, MRI, etc.) and forces using medical devices fixed at the robot end or external equipment. The information (red arrow) then returns to the robot control side, where it serves the control loop, and to the cognitive unit, where it increases the feasibility and autonomy of the RAE system. The cognitive results are then sent back as reference commands to the robot control side (blue arrow), or to the robot-human interface (RHI) side for further decisions. The three units (dark orange background) form a closed loop for the autonomous RAE system, which moves according to preprogrammed orders after obtaining the surgeon's approval. In the teleoperated mode, the perceptive feedback signals are delivered to the RHI side. Preliminary cognition results are combined with various physical signals, e.g. position, force, and impedance, and symbolic/graphic signals are presented to the operators through devices for interaction. The surgeon specifies the desired actions and commands the robot's motions directly; the Da Vinci surgical robot is an example. The human-robot cooperative RAE system is described as shared or semi-autonomous control between the user and the robot device [14]. The operator controls some degrees of freedom (DoF) and the robot controls the rest, which relieves the operator's burden and makes surgical tasks more convenient. Examples show that this control mode can be realised both locally and remotely [17]. Detailed examples are presented in Tables 1-3.
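The division of degrees of freedom in the human-robot cooperative mode can be illustrated with a minimal sketch. It assumes a 6-DoF velocity command and a fixed per-axis mask; the function name `blend_commands`, the axis ordering and the numerical values are hypothetical and are not taken from any of the cited systems.

```python
from typing import List, Sequence


def blend_commands(operator_cmd: Sequence[float],
                   robot_cmd: Sequence[float],
                   operator_mask: Sequence[bool]) -> List[float]:
    """Combine two velocity commands axis by axis.

    operator_mask[i] is True when axis i is driven by the human operator
    and False when it is left to the robot's own controller.
    """
    if not (len(operator_cmd) == len(robot_cmd) == len(operator_mask)):
        raise ValueError("all inputs must have the same length")
    return [op if use_op else rb
            for op, rb, use_op in zip(operator_cmd, robot_cmd, operator_mask)]


# Assumed axis order: [vx, vy, vz, wx, wy, wz].
# Hypothetical split: the operator steers the probe over the skin and tilts it,
# while the robot regulates motion along the probe axis (z).
operator = [0.01, -0.02, 0.50, 0.0, 0.1, 0.0]
robot = [0.0, 0.0, -0.003, 0.0, 0.0, 0.0]
mask = [True, True, False, True, True, True]

shared = blend_commands(operator, robot, mask)
print(shared)
```

In this sketch the operator's large z command is ignored and replaced by the robot's value, which is one simple way to keep a safety-critical axis under automatic control.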

Extracorporeal diagnostic:
Originally, the RAE system was created to address the limited availability of skilled personnel (such as ultrasonographers) and to assist with remote diagnosis. Hence, developing the diagnostic capability has been the main priority since the late 1990s [42]. Besides the design of special dexterous devices, the integration of more sensors and devices and the development of smart diagnostic software and algorithms have also been widely studied. To date, both the pure RAE system and systems with multiple diagnostic devices (MRI, CT, X-ray, ultrasound, etc.) have been used for renal tumours [43] and oesophageal conditions [44].

Needle guidance:
Before ultrasound imaging was adopted, MRI and CT models were used for surgical planning [45]. From about 2001, Megali et al. [46] and Tsekos et al. [47] independently developed robotic systems for prostate and breast biopsy and therapeutic interventions. The RAE system provides needle guidance for biopsy, therapeutic interventions, and brachytherapy, which removes the absolute dependence on operator skill and can even improve success rates. Priester et al. [42] reviewed brachytherapy and robotic ablation systems and stated that US scanning was performed internally through surgical incisions. Some recent studies reveal further advantages of ultrasound guidance, such as low cost [48] and the feasibility of data fusion with other techniques, e.g. magnetic resonance imaging [49] (Fig. 3).

Table 1 (partial): project [reference], country, control mode (a), year
[17], France, TEL/HRP, 2008
WTA-2 [9], Japan, AUT, 2010
FP-USM [29], Germany, TEL, 2011
iFIND [30,31], UK, AUT, 2014
a TEL: teleoperation, HRP: human-robot cooperation, AUT: automation.

Intraoperative ultrasonography:
'[...] preoperative imaging and surgical exploration does not exceed 60-80%, however, IOUS allows for early diagnosis and precise localisation of many diseases, and it is an excellent guidance tool for accurate and radical surgical treatment'. IOUS, dating back to the 1960s, has been integrated with robot techniques and used for liver ablation [18] and for the detection of liver metastases [50], liver tumours [51], and intramedullary tumours [52]. IOUS is now being developed together with surgical robotic devices, which [42] categorises by ultrasound imaging mode into three major groups: hand-held, trans-rectal, and robotically integrated. Some recent research focuses on increasing the 'smartness' of surgical instruments, such as the 'smart surgical instrument (SSI)' programme, which proposes integrating ultrasound imaging into the ultrasonic scalpel to improve real-time sensing during surgical procedures [53].

Other applications:
Apart from the three major applications above, RAE systems have been successfully adapted to perform a number of other useful functions. Chang et al. [54] and Stetten et al. [55] proposed a technique that exploits the special geometry of the intraoperative acquisition device to overlay US images onto video images of the patient. Meng et al. [56,57] introduced a mirror RAE system in which a physician on the master side operates a robot holding a US probe to inspect one leg while, on the other side, a robot arm follows the master's motions and scans the other leg under the navigation of a Kinect.

Related projects
Since 1990, several groups and related projects have been supported in Europe, Japan, the US, France, the UK, etc. [58]. Table 1 presents some of these projects, several of which are milestones. Hippocrate [19,20] is known as the first RAE system, although an automated breast ultrasound (ABUS) device had been proposed by Maturo et al. in the 1970s [59]. Hippocrate relied on force sensors, position sensors, motors, control boards, and computers, and provided a special robot arm, force control and feedback, a safety method, calibration, and a user interface. TERESA [21,23], ESTELE [22], and PROSIT [17] were related projects using a similar design developed by Arbeille's group. The objective of the MIDSTEP project is to realise both interventional teleoperated US demonstrators and minimally invasive surgery (MIS) [24]. The iFIND project started in 2014 and aimed to improve the accuracy of routine screening in pregnancy by developing RAE and US technologies that allow foetal screening in an automated and uniform fashion [30,31].

Challenges and key technologies
Although RAE technology has been studied for more than 20 years, it still relies strongly on operator skills, including precise probe positioning, accurate orientation, image analysis, and the mental construction of 3D anatomy from 2D images. Furthermore, humans have limitations: restricted manipulation ability and dexterity outside their natural range, limited geometric accuracy, and proneness to fatigue and negligence [48]. The robot can overcome these limitations thanks to its excellent geometric accuracy, robustness and operational stability. Most previous research on RAE systems, and on surgical systems in general, addresses the following challenges: (i) to provide an effective and pragmatic system for the surgical staff; (ii) to enhance the robot's control ability and performance; (iii) to improve the robot's intelligence for autonomous or semi-autonomous diagnosis and surgery.
In order to solve these diverse challenges, many researchers endeavour to design novel systems by merging a few advanced techniques. Such advances fall into four major categories: mechanical design, sensors and perception, robot control, and autonomous cognition.

Mechanical design
Degoulange et al. [19,20] pioneered a 6-DoF articulated robotic arm with force control to satisfy the rotational constraints at the tip of the probe (Fig. 4). At around the same time, in 1999, Salcudean et al. developed a 6-DoF robot arm with a parallel linkage structure, counterbalancing and backdrivable joints to ensure safety [8] (Fig. 5).
In subsequent research, the jointed robot became one of the main drivers for US imaging. Some researchers even used a redundant robot arm or multiple robots [36] for the RAE system. Table 3 presents some of the designed jointed robot arms and their architectures.
Some researchers designed a special robot end [62], e.g. a wrist [7,63] or a probe holder [22]. In [7], a parallel robot with a 4-DoF serial wrist is designed: 3 DoF for orientation and 1 DoF for probe translation along its axis (Fig. 6b). The robotic wrist for remote ultrasound imaging in [63] is a 4-DoF device that uses parallel mechanisms to provide a remote centre of motion. All four actuators are located on the ground, and each actuator is responsible for one required motion of the ultrasound probe. The probe holder in [22] is a special one that does not depend on the assistance of robots. Instead, its bottom ring is attached to the patient's skin, and it needs to be held by an operator at the patient site under the direction of the expert on the remote side (Fig. 6c). There are many similar products, e.g. OTELO [27,28], TER [15], RUDS [26], Masuda's [64] and Najafi's robots [65].
Apart from the slave robot and actuators, some researchers designed special human-robot interaction (HRI) equipment, though most experiments are carried out with commercial joysticks, e.g. Phantom, Touch, Omega. SHaDe is a spherical parallel 3-DoF haptic device that can be used to control the robot orientation with force feedback.

Sensors and perception
For needle guidance and IOUS, the RAE system is integrated as an important part of the surgical robot. Here, we mainly introduce the equipment and sensors that enhance the diagnostic ability of the RAE system. CT, MRI, X-ray, and US are all well-established imaging modalities in medical diagnosis [26], each with its own advantages and drawbacks. US imaging is used more widely than MRI and CT owing to its low cost, non-ionising nature, availability and convenience [2,3]. However, CT and MRI have better interobserver agreement for the germinal matrix and reveal more instances in some cases [66]. Thus, some researchers use data fusion for registration and guidance. Fig. 7 presents the radiofrequency ablation of a liver mass guided by ultrasound imaging in the CT suite. Table 2 presents a small selection of the perceptive equipment and sensors fitted to RAE systems.
After acquiring the measured data, researchers focused on the design of perception algorithms. The initial purpose of the RAE system was to use the accurate position of the robot to reconstruct 3D US images from 2D US images, giving the physician a better volumetric understanding when selecting the needle entry and target points. Recently, more and more researchers have turned to data fusion methods for the guidance of prostate [67], liver and kidney biopsies [39]. In [32], a new method that simulates medical US from CT in real time, reproducing the majority of ultrasonic imaging effects based on the designed equipment (shown in Fig. 7), is proposed to correct registration errors manually. In an evaluation study involving 25 patients, the data were correctly registered (by the US) without any manual interaction in 76% of the cases.
Sensors integrated with the US probe have been widely studied, e.g. markers, cameras, and force sensors. In [14], LED markers are added at the probe end, and the position tracking error of the probe in the task space is investigated using an optical tracker. More researchers prefer to use cameras or a Kinect for localisation. In [37], a probe-camera-based system was built for 3D US image reconstruction with calibration and localisation functions, and the results showed that the location of the US probe could be estimated with a 2 mm error over a probe travel distance of 200 mm.
Force control is mandatory if clinical tests involve artery elasticity [19]. A force/torque sensor was added to the RAE system in early studies [10,38] (Fig. 8); it conveys the slave environment to the operator in the teleoperation control mode and prevents large contact forces to avoid injury to the patient. However, some surgical robots do not provide force feedback in teleoperation mode. From the viewpoint of teleoperation, a four-channel architecture transmitting both position and force information helps to enhance manipulation transparency [68]. Recently, some research has integrated more sensors into the RAE system. In [40,41], a system framework is presented that comprises a US machine, a depth camera for obtaining the 3D contour, an industrial computer for collecting B-scans and their positional information for the subsequent volume, and two force sensors that measure the contact force between the probe emission plane and the tissue surface for fine-tuning the robotic arm.

Fig. 6 (a) [63], (b) [7], (c) [22], (d) [28]
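The use of the robot's position to place 2D US images in 3D can be sketched as a chain of homogeneous transforms. The example below is only an illustration under stated assumptions: the function `pixel_to_world`, the calibration matrix `T_probe_image`, the robot pose `T_world_probe`, the pixel spacing and all numbers are hypothetical, and a real system would obtain the two transforms from probe calibration and robot kinematics.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def mat_vec(T: Matrix, p: Sequence[float]) -> List[float]:
    """Multiply a 4x4 homogeneous transform by a 4-element point."""
    return [sum(T[r][c] * p[c] for c in range(4)) for r in range(4)]


def pixel_to_world(u: float, v: float,
                   spacing_mm: Sequence[float],
                   T_probe_image: Matrix,
                   T_world_probe: Matrix) -> List[float]:
    """Map an image pixel (u, v) to world coordinates in millimetres.

    The image plane is assumed to be the z = 0 plane of the image frame,
    with x along the lateral direction and y along the depth direction.
    """
    p_image = [u * spacing_mm[0], v * spacing_mm[1], 0.0, 1.0]
    p_probe = mat_vec(T_probe_image, p_image)   # image frame -> probe frame
    p_world = mat_vec(T_world_probe, p_probe)   # probe frame -> world frame
    return p_world[:3]


# Hypothetical calibration: image frame coincides with the probe frame.
T_probe_image = [[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1]]

# Hypothetical robot pose: 90 degree rotation about z, then a translation.
T_world_probe = [[0, -1, 0, 100.0],
                 [1,  0, 0,  50.0],
                 [0,  0, 1,  20.0],
                 [0,  0, 0,   1.0]]

point = pixel_to_world(10, 20, (0.5, 0.5), T_probe_image, T_world_probe)
print(point)
```

Repeating this mapping for every pixel of every tracked frame yields a point set that a reconstruction algorithm can resample into a volume; the resampling step itself is outside the scope of this sketch.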

Robot control
Accurate probe guidance and registration, contact with soft tissue under suitable forces, and cooperative operation of multiple robots all rely heavily on the performance of the robot control. As mentioned above, the three control modes are autonomous, teleoperated and human-robot cooperative control. Furthermore, various control methods such as three-channel environment force compensation (EFC) control [69], PD control [10], adaptive control [70], neural network (NN) control [71] and fuzzy control [72] have been applied to US guidance under uncertain environmental forces [69] and varying time delays. Different control strategies have their own advantages and limitations. The autonomous RAE system completes scanning and imaging along a complex path under programme control. Mostly, the system uses 2D or 3D images for surgical CAD/CAM applications or even needle guidance interventions. However, autonomous systems can only be applied to simple procedures and are hard to use on deforming anatomy. Teleoperation provides real-time signals from an experienced operator, and it is also the main approach for MIS and remote surgery. However, remote distance causes large time delays, a lack of immersion, and an increase in the cost of the surgical robot [73]. The human-robot cooperative mode combines human experience with the precision and strength of robotic devices in a shared control structure, so that the robot can relieve some of the burden on the operator.
From the viewpoint of controller design based on the acquired signals, the common control modes are position control, force control, impedance control, and hybrid control. Most RAE systems have two or more control modes; e.g. [39] provided position control and hybrid control, the latter of which uses a combination of position and rate guiding. Sartori et al. [74] proposed a two-layer passivity-based bilateral teleoperation architecture that keeps the RAE system stable despite unknown varying time delays and allows different kinds of position and force control laws to be implemented. In [75], the authors used a velocity-control-based impedance control for the teleoperated RAE system, which can transmit the contact force to the doctor efficiently with the proposed motion control law. For safety and stability, a force controller is also implemented.
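As an illustration of the simplest controller in the list above, the following sketch simulates PD position control of a single probe axis. It is not the controller of [10]: the unit-mass plant, the gains `kp` and `kd`, the time step and the target are all assumed values chosen so that the loop is critically damped.

```python
def simulate_pd(target_m: float,
                kp: float = 100.0,
                kd: float = 20.0,
                mass_kg: float = 1.0,
                dt: float = 0.001,
                steps: int = 2000) -> float:
    """Drive a 1-DoF point mass to target_m with a PD law and return the final position."""
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        error = target_m - position
        # PD law: proportional term on the position error, damping term on the velocity.
        force = kp * error - kd * velocity
        acceleration = force / mass_kg
        # Semi-implicit Euler integration of the assumed plant.
        velocity += acceleration * dt
        position += velocity * dt
    return position


final_position = simulate_pd(target_m=0.05)
print(f"final position: {final_position:.6f} m")
```

With these assumed gains the natural frequency is 10 rad/s and the damping ratio is 1, so the simulated axis settles on the target without overshoot within the 2 s horizon. Contact forces and time delays, which motivate the more advanced methods cited above, are deliberately left out.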
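One common way to obtain impedance-like behaviour on a velocity-controlled robot is an admittance law that converts the force error into a velocity command. The sketch below shows this idea along the probe axis only; it is not the control law of [75], and the spring model of the tissue, the damping value, the velocity limit and the desired force are assumptions made for illustration.

```python
def admittance_velocity(force_desired: float,
                        force_measured: float,
                        damping: float,
                        v_max: float) -> float:
    """Velocity command (m/s) along the probe axis from the force error (N).

    Positive velocity presses the probe further into the tissue.
    The command is saturated at +/- v_max as a simple safety limit.
    """
    v = (force_desired - force_measured) / damping
    return max(-v_max, min(v_max, v))


def simulate_contact(force_desired: float = 5.0,
                     stiffness: float = 500.0,   # assumed tissue stiffness, N/m
                     damping: float = 200.0,     # assumed virtual damping, N*s/m
                     v_max: float = 0.02,        # assumed velocity limit, m/s
                     dt: float = 0.001,
                     steps: int = 5000) -> float:
    """Simulate pressing on a linear-spring tissue model and return the final contact force."""
    depth = 0.0
    for _ in range(steps):
        force_measured = stiffness * max(depth, 0.0)
        depth += admittance_velocity(force_desired, force_measured, damping, v_max) * dt
    return stiffness * max(depth, 0.0)


final_force = simulate_contact()
print(f"final contact force: {final_force:.3f} N")
```

In this sketch the contact force converges to the desired value because the velocity command vanishes when the force error is zero, and the saturation bounds how fast the probe can approach the patient.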

Autonomous cognition
In the system structure presented in [76], the robot controller provides low-level primitive control signals as motor driving input, while cognition and management rely on the human operator and the high-level cognitive and task planning module. Further, in [77], the authors distinguished between 'detection' and 'diagnosis': detection locates the lesion region in the image, and diagnosis identifies the potential disease. With the rise of artificial intelligence (AI) in the 21st century, RAE systems are designed with increasing complexity and intelligence and provide enhanced convenience. Autonomous cognition has three main aspects: US diagnosis based on image processing, robot learning (from demonstrations) and signal processing (for easy human recognition and understanding).
The first aspect is a part of the computer-aided diagnosis (CAD) system, which was first used for breast tumour diagnosis in the 1960s [78]. The general flowchart of ultrasound CAD is shown in Fig. 9.
Feature selection and classification of the lesion are essential steps towards this goal. Texture, morphology, features based on statistical models, descriptor features, and different kinds of classifiers formed the basis of the traditional US CAD system and inspired research in the field of deep learning (DL) [77]. Around 2006, Hinton and his student published the first paper about DL [79], and learning methods began to be used for CAD systems. In [80], in order to reduce latency for ultrasound-based robotic tracking tasks, a deep learning method was explored to extract pertinent information directly from raw radio-frequency channel data to locate targets of interest from a single plane wave insonification. In [81], a deep convolutional NN (DCNN) architecture was proposed to automatically recognise the foetal facial standard plane (FFSP). A composite NN framework was proposed for the automatic identification of different standard planes from US videos, unlike previous studies designed specifically for individual standard planes. Some researchers studied convolutional NNs (CNNs) for the classification of US images [82] and image enhancement [83]. Compared with traditional methods, DL methods do not need features designed by humans; however, judging from results reported in open sources, their accuracy, sensitivity, and specificity still differ.
The second aspect is skill cognition for US scanning by demonstration. In [84], a learning-based controller was shown to enable autonomous extended focused assessment with sonography in trauma (eFAST) scanning according to expert demonstrations. The designed method was two-fold. One part automatically acquires the US images and sends them to the expert radiologist without the need for robotic teleoperation. The other applies a learning method based on Gaussian mixture modelling (GMM) and Gaussian mixture regression (GMR) to human demonstrations to reduce the complexity of robot programming. In [56], a mirror robotic US scanning system is built with a two-arm robot. The sonographer's operation of the probe was recognised and used to drive the robot's actions under the navigation of a Kinect. Clustered viewpoint feature histogram (CVFH) descriptors of the segmented probe and of CAD training data were calculated for probe recognition.
The final aspect concerns giving human users a better cognitive feel for and understanding of the data. According to [85,86], the spatial representation should be derived not only from modality-specific records of perception but also through common cognitive mediation. The experimental results in [87] showed that presenting CAD images alongside the original ultrasound images can improve the performance of non-radiologists in perceptual tasks. For experts, however, the improvements in perceptual and cognitive tasks are small. Thus, some researchers have used techniques such as augmented reality (AR) [88,89] and virtual reality (VR) [90] to visualise US images and improve the cognitive ability of echography learners.
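The three figures of merit mentioned above follow directly from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the counts are invented for illustration and do not come from any cited study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: diseased cases classified correctly/incorrectly.
    tn/fp: healthy cases classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,        # fraction of all cases classified correctly
        "sensitivity": tp / (tp + fn),        # fraction of diseased cases detected
        "specificity": tn / (tn + fp),        # fraction of healthy cases cleared
    }


# Hypothetical counts for 100 US images.
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
print(metrics)
```

Reporting all three values matters because a classifier evaluated on an imbalanced data set can reach a high accuracy while still missing many lesions, which shows up as a low sensitivity.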
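The regression step of a GMM/GMR pipeline can be sketched for the simplest case of one input (time) and one output (for example, probe position). The sketch assumes the mixture has already been fitted to demonstrations, for instance by expectation-maximisation; the component parameters below are invented, and the code does not reproduce the implementation of [84].

```python
import math
from typing import Dict, List


def gaussian_pdf(t: float, mean: float, var: float) -> float:
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (t - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr(t: float, components: List[Dict[str, float]]) -> float:
    """Conditional expectation E[x | t] under a 2D Gaussian mixture over (t, x).

    Each component holds: prior, mean_t, mean_x, var_t and cov_tx.
    """
    # Responsibility of each component for the query time t.
    weights = [c["prior"] * gaussian_pdf(t, c["mean_t"], c["var_t"]) for c in components]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query lies too far from every component")
    estimate = 0.0
    for w, c in zip(weights, components):
        # Conditional mean of x given t for one Gaussian component.
        cond_mean = c["mean_x"] + c["cov_tx"] / c["var_t"] * (t - c["mean_t"])
        estimate += (w / total) * cond_mean
    return estimate


# Hypothetical two-component model of a demonstrated scanning trajectory.
model = [
    {"prior": 0.5, "mean_t": 0.25, "mean_x": 10.0, "var_t": 0.01, "cov_tx": 0.2},
    {"prior": 0.5, "mean_t": 0.75, "mean_x": 20.0, "var_t": 0.01, "cov_tx": 0.0},
]

trajectory = [gmr(k / 10.0, model) for k in range(11)]
print([round(x, 2) for x in trajectory])
```

Evaluating the regression on a time grid gives a smooth reference trajectory that a robot controller could track, which is how demonstrations reduce the amount of manual programming.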

Discussion
The RAE system has been developed over several decades. During that time, it has absorbed many techniques and been equipped with sensors to enhance scanning accuracy and recognisability. However, it still depends deeply on human experience and cannot complete complex tasks independently. The authors of this paper are of the view that developments in AI and human-friendly robot techniques will provide the tools for solving these problems. This section presents three significant areas for further research in this field.

Cognitive computing
Cognitive computing aims to develop a coherent, unified, universal mechanism inspired by the mind's capabilities [91]. Although state-of-the-art learning methods handle some classification and cognition tasks with acceptable, human-like correctness, they do not inform one another across different tasks. As Allen Newell described, we need 'a single set of mechanisms for all of the cognitive behaviour. Our ultimate goal is a unified theory of human cognition.' [92]. For the RAE system, the ultrasound machine provides the raw data, and a strong cognitive computing system helps to process it and export the diagnosis results directly without the internal steps of data transformation for cognition, especially with the help of human doctors.

Fig. 9 General flowchart of CAD system [77]

Cogn. Comput. Syst., 2020, Vol. 2 Iss. 3, pp. 85-92. This is an open access article published by the IET in partnership with Shenzhen University under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)

Operator skills transfer and learning
Although the RAE system relieves physicians of strain injuries, some in-hand skills, such as adjusting the probe scanning attitude and the corresponding impedance, are lost. Therefore, how to store and transfer human manipulation skills is as important as enhancing image recognition ability. In [93], a training system for US-guided needle insertion procedures was designed. The system addressed surface and volumetric registration, solid texture modelling, spatial calibration, and real-time synthesis and rendering of US images. The authors also provided principles and metrics for verifying the effectiveness of the training process. However, the system does not provide force guidance during operation, whereas skills are easily transferred and learned during human hand-to-hand teaching. Some recent research on human manipulative skills is detailed in [93][94][95][96].

Commercially feasible system design
In [4], the RAE system was designed using a commercial UR5 robot. Since the beginning of the 21st century, collaborative robots (cobots) have been manufactured by many companies, e.g. Cobotics, KUKA, UR, and ABB, and provide safer interaction with human beings in a shared space. Commercial cobots balance cost and safety in a pragmatic manner and are ideal drivers for the development of future RAE systems.

Conclusions and further work
This survey summarised the applications and projects, challenges, technologies, and progress of the RAE system from the viewpoint of system construction, which has three main modules: perception, control, and cognition. The development of each module enhances the abilities of the RAE system. In particular, achievements in AI and cobots help the RAE system to provide a low-cost intelligent service. The further research directions discussed in this paper are cognitive computing, operational skills transfer and learning, and commercially feasible system design, which are also relevant to other robot applications, e.g. surgical robots and service robots.
The literature review makes clear that few studies focus on the operational skills required for hands-on ultrasound scanning. Interaction with suitable contact force, scanning direction, and voice not only gives the patient a comfortable feeling but also improves image quality. As with image-based diagnosis, the recognition of human motions in other areas also widely uses learning methods for training [97] and human-robot collaboration [98]. A comprehensive learning framework covering both US images and human operation skills is expected to bring a smarter and more human-friendly experience to patients.