Automatic Tracking of Surgical Instruments with a Continuum Laparoscope Using Data-Driven Control in Robotic Surgery

In the existing surgery process, surgeons need to manually adjust the laparoscope to provide a better field of view (FOV) during operation, which may distract surgeons and slow down the surgery process. Herein, a data-driven control method that uses a continuum laparoscope to adjust the FOV by tracking the surgical instruments is presented. A Koopman-based system identification method is first applied to linearize the nonlinear system. Shifted Chebyshev polynomials are used to construct observation functions that transfer low-dimension observations to high-dimension ones. The Koopman operator is approximated using a finite-dimensional estimation method. An optimal controller is further developed according to the trained linear dynamic model. Furthermore, a learning-based pose estimation framework is designed to detect keypoints on surgical instruments and provides visual feedback for the control system. Compared with other detection methods, the proposed scheme achieves a higher detection precision and provides more optional keypoints for tracking. Simulation and experiments validate the feasibility of the proposed control method. Experimental results show that the proposed method can automatically adjust the FOV of the continuum laparoscope by tracking surgical instruments in real time and satisfy the clinical requirements.


Introduction
Robot-assisted minimally invasive surgery (RMIS) has received increasing attention because of its unique advantages compared with traditional open surgery. [1][2][3][4][5] In RMIS, laparoscopes used to display the surgical scene on a screen are held by a robotic arm instead of an assistant. Surgeons need to adjust the laparoscope frequently to maintain a proper field of view (FOV) during surgery. This procedure distracts surgeons, thereby affecting the progress of laparoscopic operations.
A control method that adjusts the laparoscopic FOV by automatically tracking surgical instruments during surgery therefore needs to be developed. Currently, rigid laparoscopes are widely used in RMIS with automatic FOV-adjusting algorithms. Yang et al. proposed a region-based visual servoing method to automatically manipulate a laparoscope with colored markers, which improves the control efficiency and safety of FOV adjustment. [6] An autonomous surgical instrument tracking method without any markers was further proposed based on the visual tracking space vector. [7] An eye-tracking camera control method is used in the Senhance surgical robotic system (Asensus Surgical, NC, USA) to adjust the FOV of the laparoscope. However, these methods encounter a workspace problem. When operating a rigid laparoscope with a robotic arm, collisions between the robot arm and the other surgical instruments must be avoided. Operating a rigid laparoscope with a robotic arm in a narrow workspace is difficult, which limits the FOV of the rigid laparoscope.
Continuum manipulators have been widely applied in robotic surgical applications due to their higher dexterity and smaller required workspace. [8][9][10] Recently, continuum manipulators have been used for automatic FOV adjustment with visual servoing in RMIS. [11] However, the dynamics of continuum manipulators are always highly nonlinear and high dimensional due to the mechanical compliance of their structure. [12] These characteristics bring challenges to the precise control of continuum manipulators. Existing methods usually simplify the continuum manipulator based on physical assumptions in establishing dynamic models, such as the piecewise constant curvature model, pseudo-rigid-body, quasistatic, and simplified geometry models. [13][14][15][16][17][18][19][20]
DOI: 10.1002/aisy.202200188
Assumptions in these simplified models may lead to deviation under actual conditions and inaccurate results, which are not feasible for use in practice, especially for scenarios with high precision requirements.
Recently, data-driven control methods, such as neural networks and reinforcement learning, have shown great potential for controlling continuum manipulators. [21,22] The advantage of these methods lies in the input-output mapping of the system derived from sensing data without analytical modeling and complex computation. Given enough input-output data, data-driven models can describe the behavior of the system over its entire operating range. However, these methods usually require many tuning parameters and repeated trials to establish accurate models. Other concerns include low real-time performance and computational complexity. [23] The Koopman operator provides an alternative solution for establishing the dynamic model of a continuum manipulator based on its unique linear structure. [24,25] The Koopman operator lifts the nonlinear dynamic model of the system into an infinite-dimensional space and evolves the state functions, also called observation functions, in the new space. In this way, the dynamic model of the nonlinear system can be easily propagated in a linear manner, relying on input-output data only. As a result, linear control methods can be applied to control the continuum manipulator with high precision.

Apart from accurate system identification, closed-loop control with high-precision visual feedback is also essential. Visual feedback in laparoscopic instrument tracking can be divided into two types: marked methods and unmarked methods. Marked methods manually add a characteristic marker on the instrument for easy detection. Although such a method can localize the target quickly, uncertainty exists due to the presence of blood and gas during surgery. Furthermore, this method provides surgeons with a poor experience and has a low tracking precision because markers are usually located on the instrument rod rather than at the operating point.
Unmarked methods usually choose the whole metal part of the instrument as the detection area and then detect this area as an object detection task using a deep learning algorithm. [26] However, surgeons need to focus on different points of operation at different stages of surgery, and this method is often not flexible enough. For example, the ultrasonic knife is used to resect tissue, so the focused point should be the tip of the instrument. Scissors are used to clamp tissues or needles, so the focused point should be the center of the clasper. Therefore, unmarked methods result in less accurate visual feedback.
In the present work, we focus on autonomous control of a continuum laparoscope to adjust the FOV and keep the surgical instruments at the view center in RMIS. To address this critical issue, an automatic surgical instrument tracking framework is proposed based on a Koopman-based control scheme and learning-based visual feedback. As shown in Figure 1, this framework can be divided into two units. The first is the data-driven system identification unit, which applies the Koopman operator to transform the nonlinear dynamic system into a linear one for closed-loop control. Unlike the Taylor-based method, we introduce Chebyshev polynomials to choose the observation functions. [27,28] Chebyshev polynomial approximation is a global method that does not depend on high-order derivatives of the system state, as existing methods do. The approximation error of the proposed method is also analyzed. A linear quadratic regulator controller is further used for real-time control based on the linear representation of the continuum laparoscope.
The second unit is the visual feedback and optimal control unit, which provides control feedback for the surgical instrument tracking task. In this unit, a deep keypoint detection network is developed to predict the pixel positions of keypoints on surgical instruments. Unlike existing object detection methods, a pose estimation method is developed to detect the keypoints on surgical instruments. The pose estimation method directly regresses the pixel coordinates of the keypoints on surgical instruments instead of detecting the whole area as object detection methods do. This increases the precision of keypoint detection and benefits subsequent control tasks. The keypoint used as the tracking point can be selected flexibly according to the surgery stage in the following control system. In addition, the weights of different surgical instruments can be set by surgeons according to their requirements when multiple instruments are used.

The rest of this article is organized as follows. Section 2 reviews the Koopman operator and describes the linear quadratic regulator (LQR) controller design. The method to select the observation functions through shifted Chebyshev polynomials is introduced, and the approximation error is analyzed. In Section 3, the pose estimation task of surgical instruments and the architecture of the learning-based keypoint detection network are described in detail. Section 4 evaluates the proposed method in a simulation environment. Section 5 introduces the surgical instrument tracking method on a real platform, and the experimental results are also discussed. Finally, conclusions of the work and a discussion of future work are given in Section 6.

Preliminary
Consider the following general discrete nonlinear dynamic system

s_{k+1} = F(s_k, u_k)   (1)

where k is the time step, s_k ∈ ℝ^y represents the state of the system at step k, u_k ∈ ℝ^z is the input of the system at step k, and F denotes the transition function, which advances the state s_k to s_{k+1} with the corresponding input u_k. Furthermore, we define an observation function Ψ, which is used to transfer the observed y-dimension states s_k into Ψ(s_k) in an infinite-dimensional space ℱ. The Koopman operator K, known as an infinite-dimensional linear operator, is applied to approximate the nonlinear system of the continuum laparoscope. [22][23][24][25] The system in Equation (1) can be expressed by a set of Koopman operators K_d and the defined observation function Ψ in the infinite-dimensional function space ℱ

K_d Ψ(s_k, u_k) = Ψ(F(s_k, u_k))   (2)

where d = 1, 2, ⋯ is the index of the Koopman operator. Therefore, the Koopman operator K advances measurements of the states linearly. Equation (2) can be written as follows

Ψ(s_{k+1}) = K_d Ψ(s_k, u_k)   (3)

As a result, the nonlinear dynamical system given by Equation (1) can be controlled in a linear manner by a set of Koopman operators. However, the Koopman operator K is infinite-dimensional, which may be difficult to apply in practice.
Recent studies use the least-squares method to approximate the infinite-dimensional operator K with a finite-dimensional representation with tolerable error. In order to obtain an approximation to the Koopman operator, the observation function Ψ(s, u) in space ℱ can be expressed as

Ψ(s, u) = [ψ_1(s, u), ψ_2(s, u), ⋯, ψ_p(s, u)]   (4)

where p is the number of observable functions. The observed states and their corresponding inputs are assumed to have been collected in advance and expressed in vector form, as follows

S = [s_1, s_2, ⋯, s_{i+1}], U = [u_1, u_2, ⋯, u_{i+1}]   (5)

where i is the number of collected data pairs with the same time step. The Koopman operator K can be calculated using the least-squares method, [29] expressed as follows

K̄_d = argmin_{K_d} (1/2) ‖Γ K_d − Γ'‖²   (6)

Solving the least-squares problem yields

K̄_d = Γ† Γ'   (7)

where † is the Moore-Penrose pseudoinverse and

Γ = [Ψ(s_1, u_1), Ψ(s_2, u_2), ⋯, Ψ(s_i, u_i)]^T, Γ' = [Ψ(s_2, u_2), Ψ(s_3, u_3), ⋯, Ψ(s_{i+1}, u_{i+1})]^T   (8)

This approach generally yields a better approximation with the increase of the output dimension of Ψ. Furthermore, the number of collected data and their distribution across the state space have a significant effect on the computed K̄_d. Thus, it is crucial to choose the observation function Ψ and collect enough data from the unknown system.
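The least-squares approximation above can be sketched in a few lines; the following is a minimal illustration (not the paper's code), where the lifting function, the toy scalar dynamics, and all dimensions are illustrative assumptions.

```python
import numpy as np

def lift(s, u):
    """Example observation function Psi(s, u) = [Psi(s), Psi(u)]:
    state monomials up to degree 2, with Psi(u) = u."""
    s, u = np.atleast_1d(s), np.atleast_1d(u)
    return np.concatenate([s, s**2, u])

def fit_koopman(states, inputs):
    """Least-squares Koopman approximation: stack lifted snapshot pairs
    and solve Gamma @ K ~ Gamma' with the Moore-Penrose pseudoinverse."""
    Gamma = np.array([lift(s, u) for s, u in zip(states[:-1], inputs[:-1])])
    Gamma_next = np.array([lift(s, u) for s, u in zip(states[1:], inputs[1:])])
    return np.linalg.pinv(Gamma) @ Gamma_next

# Toy data from a mildly nonlinear scalar system (an assumption for the demo):
# s_{k+1} = 0.9 s_k - 0.05 s_k^2 + 0.1 u_k, which is linear in the lifted features.
rng = np.random.default_rng(0)
u_seq = rng.uniform(-1.0, 1.0, size=300)
s_seq = np.zeros(300)
for k in range(299):
    s_seq[k + 1] = 0.9 * s_seq[k] - 0.05 * s_seq[k] ** 2 + 0.1 * u_seq[k]

K_hat = fit_koopman(s_seq, u_seq)

# One-step prediction in the lifted space: Psi(s_{k+1}) ~ Psi(s_k, u_k) @ K_hat.
pred = lift(s_seq[100], u_seq[100]) @ K_hat
err = abs(pred[0] - s_seq[101])  # first lifted coordinate is the state itself
```

Because the toy dynamics lie exactly in the span of the chosen observables, the one-step prediction error is essentially zero; for a real continuum laparoscope the residual depends on how well the observables cover the dynamics.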
For simplicity, the observation function can be rewritten as follows

Ψ(s, u) = [Ψ(s), Ψ(u)]   (9)

where Ψ(s) ∈ ℝ^{q_s} and Ψ(u) ∈ ℝ^{q_u} are the functions depending on the states s and the inputs u of the system, respectively. ℝ^{q_s} and ℝ^{q_u} are subsets of the space ℝ^q, and q = q_s + q_u. q is the dimension of the invariant subspace of K̄_d. Then we can rewrite Equation (3) as follows

Ψ(s_{k+1}) = [K̄_{A,d}, K̄_{B,d}] [Ψ(s_k), Ψ(u_k)]^T   (10)

where K̄_{A,d} ∈ ℝ^{q_s×q_s} and K̄_{B,d} ∈ ℝ^{q_s×q_u} are submatrices of K̄_d. Ψ(s_{k+1}) can be expressed as follows

Ψ(s_{k+1}) = K̄_{A,d} Ψ(s_k) + K̄_{B,d} Ψ(u_k)   (11)

To keep the control linear, we use Ψ(u_k) = u_k, and Equation (11) is rewritten as follows

Ψ(s_{k+1}) = K̄_{A,d} Ψ(s_k) + K̄_{B,d} u_k   (12)

Algorithm 1 summarizes the system identification process with data-driven methods. Now, the nonlinear system has been expressed in a linear manner with the Koopman representation. An LQR controller is applied to choose optimal control inputs over a finite time horizon. The minimized objective is defined as follows

J = Σ_k [(Ψ(s_k) − Ψ(s_kd))^T Q (Ψ(s_k) − Ψ(s_kd)) + u_k^T R u_k]   (13)

where Q ≥ 0 ∈ ℝ^{q_s×q_s} and R ≥ 0 ∈ ℝ^{q_u×q_u} are positive definite weight matrices on the states and inputs, which penalize the deviation from the desired observable functions Ψ(s_kd) at step k.
The LQR feedback law for Equation (13) becomes the following

u_k = −K (Ψ(s_k) − Ψ(s_kd))   (14)

where K ∈ ℝ^{n×q_s} represents the state-feedback gain matrix of the LQR control. The matrix K can be calculated by solving the discrete-time algebraic Riccati equation. [30] Algorithm 2 summarizes the optimal control process of the nonlinear system.
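The gain in Equation (14) can be obtained, for example, by iterating the discrete-time Riccati recursion to a fixed point. The sketch below is illustrative only: the iterative solver, the toy lifted model (a double integrator), and the weights are assumptions, not the paper's actual laparoscope model.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Discrete-time LQR: iterate the algebraic Riccati equation to a
    fixed point P, then K = (R + B^T P B)^{-1} B^T P A, giving the
    feedback law u_k = -K (x_k - x_d)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Hypothetical lifted linear model x_{k+1} = A x_k + B u_k (double integrator,
# 0.05 s time step), standing in for the identified Koopman model.
A = np.array([[1.0, 0.05], [0.0, 1.0]])
B = np.array([[0.0], [0.05]])
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])
K = dlqr(A, B, Q, R)

# Closed-loop regulation toward the origin from an offset state.
x = np.array([1.0, 0.0])
for _ in range(400):
    x = A @ x + (B @ (-K @ x)).ravel()
final_dist = np.linalg.norm(x)
```

The closed-loop state converges to the target, mirroring how the controller drives the tracking point toward the FOV center; in practice a dedicated DARE solver can replace the fixed-point iteration.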

Observation Function Selection
As described earlier, a set of observation functions is chosen to approximate the Koopman operator. The error between two adjacent steps can be expressed as follows

ε = ‖Ψ(s_{k+1}) − K̄_d Ψ(s_k, u_k)‖   (15)

To minimize the error ε, selecting an appropriate observation function is important. The Taylor series, which requires the function to be continuously differentiable up to nth order, has been applied to approximate the observation function. Considering that the values of the derivatives in the Taylor series can be evaluated using numerical estimation from state measurements, this method is easy to implement. However, the Taylor series maintains high precision only near the expansion point s_k. Moreover, it imposes strict conditions, such as continuous differentiability.
To solve this problem, we introduce shifted Chebyshev polynomials into the selection of observation functions. [31] Chebyshev polynomial approximation is a global method that gives a better approximation for the truncated series than other techniques. For t ∈ [−1, 1], the well-known Chebyshev polynomials T_m(t) are defined through the following identity

T_m(t) = cos(mθ)   (16)

where θ = arccos(t) and m = 0, 1, 2, ⋯. The polynomials form an orthogonal basis with the corresponding weighting function, which can be expressed as follows

w(t) = 1/√(1 − t²)   (17)

The polynomials can be generated by the following recurrence

T_0(t) = 1, T_1(t) = t, T_{m+1}(t) = 2t T_m(t) − T_{m−1}(t)   (18)

A function f(t) ∈ ℒ²(−1, 1) can be expanded in terms of the Chebyshev polynomials, as follows

f(t) = a_0/2 + Σ_{m=1}^{∞} a_m T_m(t)   (19)

where the coefficients a_m can be calculated by the following

a_m = (2/π) ∫_{−1}^{1} f(t) T_m(t) w(t) dt, m = 0, 1, 2, ⋯   (20)

Now, we can shift the domain of the Chebyshev polynomials from [−1, 1] to [x_k, x_{k+1}], where k = 0, 1, 2, ⋯, through the substitution t = (2x − x_k − x_{k+1})/(x_{k+1} − x_k). The recursion formula of the shifted Chebyshev polynomials T*_m(x) = T_m(t) can be written as

T*_0(x) = 1, T*_1(x) = t, T*_{m+1}(x) = 2t T*_m(x) − T*_{m−1}(x)   (21)

and the corresponding weighting function is changed into

w*(x) = 2/((x_{k+1} − x_k)√(1 − t²))   (22)

Hence, any arbitrary function f(x) ∈ ℒ²[x_k, x_{k+1}] can be written as follows

f(x) = a_0/2 + Σ_{m=1}^{∞} a_m T*_m(x)   (23)

where m ≥ 1 in the sum, and the coefficients a_m can be computed as follows

a_m = (2/π) ∫_{x_k}^{x_{k+1}} f(x) T*_m(x) w*(x) dx   (24)

Truncating the series in Equation (23) at order M, we can rewrite it as follows

f(x) ≈ O^T Φ(x)   (25)

where T denotes matrix transposition. O and Φ(x) represent the column vectors as follows

O = [a_0/2, a_1, ⋯, a_M]^T   (26)
Φ(x) = [T*_0(x), T*_1(x), ⋯, T*_M(x)]^T   (27)

T*_m(x) can be analytically written as a polynomial of degree m

T*_m(x) = Σ_{j=0}^{m} p_{m,j} x^j   (28)

From Equation (26)-(28), Φ(x) can be denoted as follows

Φ(x) = P Q(x)^T   (29)

where Q(x) = (1, x, x², ⋯, x^M), and P is the lower triangular coefficient matrix collecting the monomial coefficients of Equation (28)

P = [p_{0,0} 0 ⋯ 0; p_{1,0} p_{1,1} ⋯ 0; ⋮ ⋮ ⋱ ⋮; p_{M,0} p_{M,1} ⋯ p_{M,M}]   (30)

For any arbitrary x ∈ [x_k, x_{k+1}], there exists a coefficient q (0 < q < 1) that satisfies x_{k+1} = x_k + q. Then we have the following [28]

Q(x_{k+1})^T = R Q(x_k)^T   (31)

The matrix R can be written as follows

R = [r_{0,0} 0 ⋯ 0; r_{1,0} r_{1,1} ⋯ 0; ⋮ ⋮ ⋱ ⋮; r_{M,0} r_{M,1} ⋯ r_{M,M}]   (32)

where

r_{i,j} = (i choose j) q^{i−j}, j ≤ i   (33)

and

r_{i,j} = 0, j > i   (34)

So there exists a matrix S that satisfies the following

Φ(x_{k+1}) = S Φ(x_k)   (35)

where S can be calculated by equating the coefficients of x^i, respectively, and solving the resulting linear equations (equivalently, S = P R P^{−1}). It is found that Equation (3) has the same linear structure as Equation (35). Here we choose the Chebyshev polynomials and the corresponding coefficients as the observables, as follows

Ψ(s) = Φ(s) = [T*_0(s), T*_1(s), ⋯, T*_M(s)]^T   (36)

The method can also be used for a system with multiple states.

Algorithm 1. Nonlinear system identification with data-driven methods.
Input: Collected system states and inputs (s_k, u_k), penalty coefficient λ
Step 1: Lift the system states s_{k+1} to Ψ(s_{k+1}) using Chebyshev polynomials with Equation (36)
Step 2: Build the observation function Ψ(s, u) with Equation (4)
Step 3: Calculate the Koopman operator K̄_d
Step 4: Extract the submatrices K̄_{A,d} and K̄_{B,d} belonging to the states s and inputs u with Equation (10)
Step 5: Build the linear model of the system with Equation (11)
Output: Koopman operator K̄_d

Algorithm 2. Optimal control of the nonlinear system.
Input: Initial state s_1, target state s_{k+1}, weight matrices Q and R, linear model of the system with Equation (11)
Step 1: Build the linear quadratic regulator (LQR) controller and the cost function of the system with Equation (13)
For t = 1, 2, ⋯, k + 1
  Step 2: Calculate the needed input u_t
  Step 3: Apply u_t to the system
  Step 4: Set the time from t to t + 1
  Step 5: Get and update the new state s_{t+1}
End
Output: System input sequences u_1, u_2, ⋯, u_t and state sequences s_1, s_2, ⋯, s_t
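The shifted Chebyshev observables of Equation (36) can be evaluated with the three-term recurrence. This is a minimal sketch; the interval and the evaluation point are illustrative, while M = 5 matches the value used later in the paper.

```python
import numpy as np

def shifted_chebyshev(x, a, b, M):
    """Evaluate shifted Chebyshev polynomials T*_0..T*_M on [a, b] by
    mapping x to t in [-1, 1] and using the recurrence
    T_{m+1}(t) = 2 t T_m(t) - T_{m-1}(t)."""
    t = (2.0 * x - a - b) / (b - a)      # affine map [a, b] -> [-1, 1]
    T = np.empty(M + 1)
    T[0] = 1.0
    if M >= 1:
        T[1] = t
    for m in range(1, M):
        T[m + 1] = 2.0 * t * T[m] - T[m - 1]
    return T

# Lift a scalar state on a hypothetical interval [0, 2] with M = 5.
phi = shifted_chebyshev(1.5, 0.0, 2.0, 5)

# Sanity check against the defining identity T_m(t) = cos(m * arccos(t)).
t = (2 * 1.5 - 0.0 - 2.0) / 2.0          # mapped value t = 0.5
ident = np.array([np.cos(m * np.arccos(t)) for m in range(6)])
max_dev = np.max(np.abs(phi - ident))
```

The recurrence and the trigonometric identity agree to machine precision, which makes the recurrence the cheaper choice when lifting states online at control rate.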

Error Estimation and Overfitting
Considering the shifted Chebyshev approximation of f(x) ∈ ℒ²[x_k, x_{k+1}], if f(x) has M + 1 derivatives, the maximum error in Equation (15) induced by using Chebyshev polynomials to calculate the Koopman operator K̄_d across one time step can be bounded as follows

ε ≤ ((x_{k+1} − x_k)^{M+1} / (2^{2M+1} (M + 1)!)) max_{x∈[x_k, x_{k+1}]} |f^{(M+1)}(x)|   (37)

One of the main factors affecting the data-driven modeling approach is overfitting. Although the least-squares method can minimize the L2-norm error in the training process, it is easily affected by noise and singularity. To solve this problem, we apply the least absolute shrinkage and selection operator (LASSO), [13] which is an L1-regularization method.
Using the LASSO, Equation (6) can be written as follows

K̄_d = argmin_{K_d} ‖Γ K_d − Γ'‖² + λ‖K_d‖_1   (38)

where λ is the weight of the L1 penalty term.
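The L1-regularized problem above can be solved, for example, with a proximal-gradient (ISTA) iteration. This sketch and its toy data are illustrative assumptions; the paper does not specify which LASSO solver it uses.

```python
import numpy as np

def soft_threshold(W, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def lasso_koopman(Gamma, Gamma_next, lam, iters=2000):
    """ISTA for min ||Gamma K - Gamma'||^2 + lam * ||K||_1, which
    discourages overfitted dense operators."""
    step = 1.0 / (2.0 * np.linalg.norm(Gamma, 2) ** 2)  # 1 / Lipschitz const
    K = np.zeros((Gamma.shape[1], Gamma_next.shape[1]))
    for _ in range(iters):
        grad = 2.0 * Gamma.T @ (Gamma @ K - Gamma_next)
        K = soft_threshold(K - step * grad, step * lam)
    return K

# Toy lifted snapshot data generated by a sparse ground-truth operator.
rng = np.random.default_rng(1)
Gamma = rng.normal(size=(200, 5))
K_true = np.zeros((5, 2))
K_true[0, 0], K_true[3, 1] = 1.0, -0.5
Gamma_next = Gamma @ K_true

K_lasso = lasso_koopman(Gamma, Gamma_next, lam=0.1)
err = np.max(np.abs(K_lasso - K_true))
```

On this noiseless example the sparse operator is recovered almost exactly, and the entries that are zero in the ground truth are shrunk to (near) zero, which is the behavior the L1 penalty is meant to enforce on noisy laparoscope data.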

Learning-Based Keypoint Detection
In this section, a learning-based framework is developed to detect and localize the keypoints on the surgical instrument without any artificial markers. Then the pixel position of the keypoint in the image plane is used as the feedback in a closed-loop control. As shown in Figure 2, five keypoints, namely RightClasper, LeftClasper, Head, Shaft, and End, are chosen with consideration of the articulated structure of surgical instruments. [32]

Network Architecture

Existing approaches for pose estimation tasks mainly use heatmap regression to detect keypoints. [33,34] Then a grouping method is used to group the keypoints that belong to the same object. Alternatively, each object can be segmented in advance, and keypoint detection of each object is further carried out. [35] Heatmap regression has several known drawbacks, such as computational inefficiency, inherent quantization error, and sensitivity to spatial resolution. [36] Therefore, current methods cannot achieve real-time performance with high accuracy. To solve this problem, a real-time pose estimation framework is proposed to detect the keypoints of surgical instruments. Small bounding boxes are used to represent the keypoint regions, and the position of each keypoint is located at the center of its region. Details of the architecture are described as follows.
In Figure 3, GhostNet is used as the backbone to extract features, which can generate more features using fewer parameters. [37] A spatial attention bottleneck module is then applied to provide abundant spatial information on the keypoints. [38] A feature pyramid network (FPN) and a path aggregation network (PAN) are used to fuse features at different scales, combining spatial information from low-level feature maps with semantic information from high-level feature maps. [4,39] Meta-ACON is used as the activation function. [40] Nonmaximum suppression (NMS) is applied to obtain the candidate keypoint regions. Associative embedding is used for keypoint grouping after the detection module (grouping keypoints that belong to the same instrument). The grouping process clusters the identity-free keypoints into individual instruments by grouping keypoints whose tags have a small L2 distance.
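The tag-based grouping step can be sketched as follows. The greedy clustering rule, the threshold, and the 1-D tags are illustrative assumptions; the paper only states that keypoints with a small L2 tag distance are grouped into one instrument.

```python
import numpy as np

def group_by_tags(keypoints, tags, threshold=0.5):
    """Greedy associative-embedding grouping: a keypoint joins the first
    existing group whose reference tag is within `threshold` in L2
    distance; otherwise it starts a new group (a new instrument)."""
    groups, group_tags = [], []
    for kp, tag in zip(keypoints, tags):
        tag = np.atleast_1d(tag)
        for g, gt in zip(groups, group_tags):
            if np.linalg.norm(tag - gt) < threshold:
                g.append(kp)
                break
        else:
            groups.append([kp])
            group_tags.append(tag)
    return groups

# Two hypothetical instruments whose predicted tags cluster near 0.1 and 2.0.
kps  = ["RightClasper_A", "Head_A", "RightClasper_B", "Head_B"]
tags = [0.10, 0.12, 2.00, 1.95]
groups = group_by_tags(kps, tags)
```

Here the four detections split cleanly into two groups of two keypoints each; real embeddings are learned so that tags of the same instrument are pulled together and tags of different instruments pushed apart.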

Loss Function
To enhance the pose estimation performance, the loss function ℒ_total for a single image is computed as follows

ℒ_total = ℒ_keypoint + ℒ_grouping   (39)

where ℒ_keypoint represents the complete intersection over union (CIoU) loss of keypoint regions between the ground-truth regions and the predicted regions, and ℒ_grouping represents the loss function of the associative embedding.
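The CIoU term can be sketched for axis-aligned boxes as follows; this is the standard CIoU formula (1 − IoU plus center-distance and aspect-ratio penalties), while the (x1, y1, x2, y2) box format and the stabilizing epsilon are assumptions of this sketch.

```python
import math

def ciou_loss(box_p, box_g):
    """Complete IoU loss between a predicted and a ground-truth box,
    each given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # Squared center distance over squared enclosing-box diagonal.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term.
    v = (4.0 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                                - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / (1.0 - iou + v + 1e-9)
    return 1.0 - iou + rho2 / c2 + alpha * v

loss_same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))  # perfect overlap
loss_off  = ciou_loss((5, 5, 15, 15), (0, 0, 10, 10))  # shifted prediction
```

A perfectly predicted keypoint region incurs (near) zero loss, while a shifted region is penalized both for the reduced overlap and for its center offset, which is what makes CIoU a tighter training signal than plain IoU for small keypoint boxes.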

Simulation
To verify the proposed control method in controlling a continuum laparoscope to adjust the FOV, a set of simulations is conducted on the instrument tracking task using MuJoCo, which provides a simulation environment and physics engine for robotics and biomechanics. [41] As shown in Figure 4a, a continuum laparoscope model is established first. This model consists of 4 joints, and the size of the simulated continuum laparoscope is the same as that of the real one. The two cylinders in Figure 4b represent the surgical instruments. The red markers represent the target keypoints on the surgical instruments. A fixed global camera in the simulation environment monitors the position of the tracking point and the FOV center of the continuum laparoscope. Simulation performances are compared to determine the appropriate parameters of the LQR controller. Then two types of simulation tasks are performed to evaluate the proposed method by tracking static and moving instruments, respectively. For each type of simulation task, the proposed method is evaluated in two surgical scenarios, with single and double instruments. Finally, the tracking errors caused by constructing the observation function of the Koopman operator using the Taylor series and the proposed shifted Chebyshev polynomials are compared. The state of the automatic tracking system is s = [h, w, d]^T, where h and w are the pixel positions of the tracking point in the image plane, and d is the distance between the tracking point and the center of the FOV. u = [u_1, u_2]^T represents the rotation angles of the motors applied to the continuum laparoscope. The parameter M in Equation (35) is set to 5 in simulation. The penalty coefficient λ is 10. The weight matrices Q and R in the LQR controller mainly affect the convergence speed of the system state and the error with respect to the target state.
In order to choose the appropriate weight matrices for the LQR controller, repeated simulations are performed with different Q and R from the same initial position of a single static surgical instrument. As shown in Figure 5, with the increase of Q, the cumulative error to reach the equilibrium state of the system is smaller, but the error of the system at the equilibrium state is larger. With the increase of R, the time required to reach the equilibrium state of the system is shorter, and likewise, the error of the system at the equilibrium state is larger. Considering the convergence speed and the error at the equilibrium state, the weight matrices Q and R in the LQR controller are chosen as diag(10, 10, 1) and diag(1, 1), respectively.

Tracking the Static Instruments
First, we evaluated the proposed method by tracking a single stationary surgical instrument from different initial positions. The continuum laparoscope moved automatically under the proposed control method until the tracking point reached the center of the FOV. The pixel coordinate of the laparoscopic FOV center was (200, 200) in the image plane. As shown in Figure 6a, several repeated trials were performed while the instrument was located at different initial positions in four different areas of the image plane. The initial positions were marked as black points. All initial positions are chosen randomly, so the selected initial positions are basically representative of the entire workspace. The distance between the tracking point and the center of the FOV is also called the tracking error. The time interval between each sample is 0.05 s. The continuum laparoscope can be controlled to adjust the view until the tracking point is located at the center of the FOV in the image plane. As shown in Figure 6b, the distance-step curve oscillates in the first few steps, and then the state of the system becomes stable as the number of steps increases. After approximately 25 steps, the tracking point is located at the center of the FOV with an error of 10 pixels. This status also indicates that tracking is successful, and the continuum laparoscope maintains its current state unless the instrument moves. The errors of the three repeated trials are almost the same, which shows the stability of the proposed optimal control method.
Then we simulate the tracking task for two static surgical instruments at random positions. We set the center of the two detected keypoints on the instruments as the new tracking point. As shown in Figure 6c,d, the tracking point moves from the initial state (black point) to the view center. The tracking error is approximately 4 pixels.

Tracking the Moving Instruments
In this section, tracking of moving instruments is conducted. Figure 7a shows the trajectory of the tracking point on a single instrument. When the surgical instrument moves in a circular trajectory, the FOV center of the continuum laparoscope in the image plane of the global camera also moves along with the surgical instrument. It is seen from the global camera that the continuum laparoscope can track the keypoints with the movement of the surgical instrument. Figure 7b shows the distance between the tracking point and the center of the FOV in the image plane when tracking the moving instrument. Another simulation of tracking two moving surgical instruments is further performed. Figure 7c shows the trajectory while tracking double instruments. The center of the two detected keypoints on the instruments is set as the tracking point. Figure 7d shows the distance between the tracking point and the view center while tracking two instruments simultaneously. In addition, Figure 7d also shows the distance from the detected tracking points on the two surgical instruments to the view center in the simulation environment. The simulation results show the feasibility of our approach while controlling a continuum laparoscope by tracking moving surgical instruments. In addition, the laparoscopic FOV can be well adjusted based on the position of the tracking point. The simulation results also show that the continuum laparoscope can provide a stable FOV. The distances between the tracking point and the center of the FOV while tracking a single moving instrument and two moving instruments are approximately 6.02 and 5.84 pixels, respectively.

Comparison with Taylor-Based Method
In this section, we compare the tracking errors when using the Taylor series and the Chebyshev polynomials to construct the observation functions of the Koopman operator. Figure 8a shows the comparison between the two methods while tracking a static instrument. The Chebyshev polynomials used in our method are more complex; thus, reaching the balanced state of the system (the surgical instrument appearing at the center of the FOV) is slower than with the Taylor series method. On the other hand, the error of our method is smaller than that of the Taylor-based method. We then compare the distances between the tracking point and the center of the FOV while tracking a moving surgical instrument using the two methods. As shown in Figure 8b, the tracking error of the Taylor-based method is 14.76 pixels, whereas our Chebyshev-based method has an error of 11.23 pixels. The smaller error variation shows that our Chebyshev-based method can provide a more stable FOV by controlling the laparoscope automatically.

Experimental Section
Experimental Setup: To validate the data-driven control method, an experimental platform consistent with the simulation environment is built using a continuum laparoscope based on the proposed automatic tracking system. As shown in Figure 9, a 2 mm diameter pinhole camera with a resolution of 400 × 400 pixels is fixed at the end effector of the cable-driven continuum manipulator (Intuitive Surgical, California, USA). The sensing image can be obtained at a frequency of 30 Hz through a USB port. The continuum manipulator has four connected joints, which can be divided into two groups. Joint1 and Joint4 control the movement in the X-axis direction, and both are actuated by a brushless motor (Maxon Group, Sachseln, Switzerland). Joint2 and Joint3 control the movement in the Y-axis direction and are actuated by another brushless motor. The continuum manipulator is fixed in the Z-axis direction, which ensures safety once the initial position is determined. Elmo drivers (Elmo Motion Control Ltd., Israel) are used to actuate the motors precisely by receiving commands from TwinCAT3 (Beckhoff Automation GmbH & Co. KG, Germany) through the EtherCAT bus. The large needle driver and the grasping retractor (Intuitive Surgical, California, USA) are used in this work. [4] The large needle driver holds the needle while forcing the tip through the tissue to complete the suturing task in RMIS. The grasping retractor is used to retract the tissue to reveal the surgical scope so that the surgeon can explore the surgical area and perform surgery.
Data Collection and Implementation: First, in order to train the learning-based pose estimation model, we collected an in-house dataset containing 3000 images of the two surgical instruments using the experimental platform. The in-house dataset is labeled following the labeling rules for the pose estimation task using the LabelImg software.
We performed data augmentations to enlarge the dataset and prevent overfitting. Mosaic, horizontal flips, random crop, color transform, and translation are used as augmentation techniques in this work. The hyperparameters during training are as follows: the number of training steps is 300, the batch size is 32, a polynomial decay learning rate scheduling strategy is adopted with an initial learning rate of 0.1, the momentum is 0.9, and the weight decay is 0.005. We implemented our method using PyTorch on an Ubuntu 18.04 LTS workstation with an Intel Xeon(R) 2.30 GHz CPU and an NVIDIA Tesla P100 16 GB graphics card. Training the entire framework takes around 6.5 h. The processing speed reaches an average of 34.3 fps for videos.
Second, in order to calculate the finite-dimensional approximation of the Koopman operator, a set of system states and corresponding inputs was collected. We collected 100 trials with random initial inputs of the motors. The state s of the automatic tracking system and the inputs u of the two motors are the same as those selected in the simulation environment. The inputs of the two motors are then varied over one cycle according to a trigonometric function to generate the states. Following this rule, the continuum laparoscope runs 500 steps per cycle in the workspace. Finally, we collected 50 000 pairs of data on system states and inputs. The Koopman operator K̄_d of the continuum laparoscope system is estimated from the collected system states and inputs with Equation (7). Then the dynamic model of the system in Equation (12) is determined. The parameters used in the LQR are the same as those selected in the simulation environment. The parameter M in Equation (35) is 5, and the penalty coefficient λ is 10. The weight matrices Q and R in the LQR controller are diag(10, 10, 1) and diag(1, 1), respectively. According to the control strategy of the LQR control, the states of the continuum laparoscope system change with the inputs of the motors. Once the updated system state is determined, the new input of the system can be calculated using Equation (14). Then the motors are controlled through the Elmo drivers and turn to the specified angles, thus achieving the goal of adjusting the FOV of the continuum laparoscope. Furthermore, we also evaluated the dynamic uncertainty of the continuum laparoscope system based on the data collection.
Dynamic uncertainty means that the same inputs may lead to slightly different outputs. This uncertainty is caused by motor encoder error and system assembly error. Specifically, the system is given an input that changes periodically, the output of the system is collected over 100 cycles, and the standard error across the 100 cycles is calculated. The standard error of the system is 19 pixels. This internal error affects our tracking experiment and cannot be eliminated by the designed controller.
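This repeatability measurement can be sketched as follows, assuming the reported standard error is the per-step standard deviation across cycles, averaged over the cycle (the exact aggregation is not specified in the text):

```python
import numpy as np

def repeatability_error(outputs):
    """Average per-step spread across repeated cycles.

    outputs: array of shape (n_cycles, n_steps) holding the measured
    pixel positions under identical periodic inputs. The aggregation
    (mean of per-step standard deviations) is an assumption.
    """
    return float(np.mean(np.std(outputs, axis=0)))
```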
Experiments on Keypoint Detection: Motion blur, specular reflections on surgical instruments, blood, surgical smoke, and tissue occlusion are the main challenges in medical image analysis, [4] and they also apply to the keypoint detection task for surgical instruments. To evaluate the capability of the proposed pose estimation model against these challenges, experiments are first performed on the public EndoVis Challenge dataset, [32] which was collected from several laparoscopic colorectal surgeries. A large needle driver is used in this dataset, the same instrument as in the in-house collected dataset. The EndoVis Challenge dataset contains 1850 images: 940 for training and 910 for testing. The frame resolution is 720 × 576 pixels. To evaluate our proposed framework, we use a standard metric for the pose estimation task, object keypoint similarity (OKS). [35] We report the standard average precision (AP) and average recall (AR) scores, including AP50 (AP at OKS = 0.5), AP75 (AP at OKS = 0.75), AP (the mean of AP scores from OKS = 0.50 to OKS = 0.95 in increments of 0.05), APM (AP for instruments of medium size), APL (AP for instruments of large size), and AR (the mean of AR scores from OKS = 0.50 to OKS = 0.95 in increments of 0.05). Results are shown in Table 1. Figure 10 shows the pose estimation results of the surgical instruments on the two datasets more intuitively. More qualitative results are available in the Video SI, Supporting Information. In addition, to compare the proposed method with other methods, we used the same evaluation criteria: mean average precision (mAP), mean localization error (mLE), and detection time. [7] As shown in Table 2, our method achieves an mAP of 96.27%, an mLE of 1.1 pixels, and a speed of 34.3 fps.
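The OKS metric used above can be sketched in its standard COCO form; the per-keypoint constants kappa for surgical instruments are assumptions here, since instrument-specific values are not given in the text:

```python
import numpy as np

def oks(pred, gt, vis, area, kappa):
    """COCO-style object keypoint similarity.

    pred, gt: (K, 2) predicted/ground-truth pixel coordinates;
    vis:      (K,) visibility flags (>0 means labeled);
    area:     object segment area (scale s**2);
    kappa:    (K,) per-keypoint constants (assumed values for instruments).
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)          # squared pixel distances
    e = d2 / (2.0 * area * kappa ** 2 + np.spacing(1))
    m = vis > 0
    return float(np.sum(np.exp(-e[m])) / max(np.sum(m), 1))
```

A prediction coinciding with the ground truth scores 1.0, and the score decays as a Gaussian in the normalized keypoint distance, which is what makes thresholds like OKS = 0.5 and 0.75 meaningful for AP.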
Experimental results show that our keypoint detection method based on the pose estimation task obtains relatively high keypoint localization precision and avoids the shortcomings of the other methods mentioned in Section 3.1. [7] Through the learning-based pose estimation method, the pixel positions of the different parts of the surgical instrument can be obtained. Different surgical instruments are used in different surgical phases, so the tracking point can be adjusted automatically for a better view according to the shape of each instrument. The tips of the surgical instrument are usually close to the target tissue being operated on, and keeping the target tissue in the center of the FOV gives the surgeon a better surgical experience. Considering the effect of the large needle driver, the grasping retractor, and tissue occlusion in surgery, we choose the head as the reference point for a better view. In addition, we set the weights of the two surgical instruments in the FOV to 1:1 when tracking two instruments.
Experiments on Tracking Static Instruments: Experiments on static surgical instruments are performed to evaluate the autonomous laparoscopic control method for adjusting the FOV of a continuum laparoscope. Repeated trials are conducted for different surgical instruments, with only the initial position of the surgical instrument varied. As shown in Figure 11a, the black point represents the initial position of the tracking keypoint on the surgical instrument, and the scatter points show the position of the tracking point relative to the center of the FOV at each step. Figure 11b shows the distance between the tracking point and the FOV center of the laparoscope. When the continuum laparoscope is controlled to adjust the FOV automatically based on visual feedback, it needs approximately 25 steps to approach the center of the FOV, which is consistent with our verification in the simulation environment. After 25 steps, the continuum laparoscope remains largely stationary, indicating that our approach provides a stable FOV once the tracking goal is reached. The tracking error is approximately 39.1 pixels when the system is stable. Figure 11c shows the keypoints on the two surgical instruments and the pixel position of our tracking point relative to the center of the FOV at each step. Notably, the weights of both surgical instruments in the FOV are the same. Figure 11d shows the changes in the tracking error and the relative positions of the two surgical instruments, which is consistent with the performance in the single-instrument tracking task. The performance in tracking static surgical instruments demonstrates the feasibility of our proposed data-driven control approach. Experiments on Tracking Moving Instruments: First, we evaluate the proposed method with one moving surgical instrument.
As shown in Figure 12a, the scatter points represent the relative positions of the tracking points on the surgical instrument in the FOV of the continuum laparoscope. The color bar indicates the density of the tracking points in the image plane: the higher the value, the more often the tracking point falls in that area as the surgical instrument moves. Most of the tracking points lie near the center of the FOV. The distance between the tracking point and the FOV center of the laparoscope is shown in Figure 12b. The average distance while tracking a moving surgical instrument is approximately 45.77 pixels.
Second, we extend the tracking task to two surgical instruments. Similar to the setting in Section 5.4, we assign the two surgical instruments equal weights during tracking, which places the tracking point at the midpoint between the keypoints on the two instruments. Figure 12c shows the relative pixel positions of the tracking points in the image plane, and Figure 12d shows the tracking errors while tracking two surgical instruments. The average distance is about 28.47 pixels.
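The tracking target for multiple instruments can be sketched as a weighted average of the instrument keypoints; with the 1:1 weights used here it reduces to the midpoint described above (the generalization to arbitrary weights is illustrative):

```python
import numpy as np

def tracking_target(keypoints, weights=None):
    """Weighted tracking point over instrument keypoints.

    keypoints: (n, 2) pixel coordinates, one keypoint per instrument;
    weights:   (n,) relative weights; None means equal weights,
               i.e., the 1:1 setting used in the experiments.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    if weights is None:
        weights = np.ones(len(keypoints))
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * keypoints).sum(axis=0) / w.sum()
```

The controller then drives this single target point toward the FOV center, regardless of how many instruments contribute to it.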
Discussion of Clinical Requirements: Researchers have reported the clinical requirements for adjusting the FOV of the laparoscope with the movement of surgical instruments. [7] They analyzed collected videos of different clinical surgeries and found that the average distance from the tip of the surgical instruments to the FOV center of the laparoscope is about 423.84 pixels, even though the surgical assistant constantly adjusted the laparoscopic field of view during the procedure. These errors are mainly caused by the lack of mutual understanding between the assistant and the surgeons. Considering the resolution of the collected surgical videos, the error is approximately 22.08% of the horizontal resolution and 39.24% of the vertical resolution. In addition, the average moving speed of surgical instruments in the FOV of the laparoscope during surgery is about 131.95 pixels s⁻¹.
We have demonstrated the feasibility of our proposed autonomous laparoscopic control approach to adjust the FOV regardless of whether the surgical instruments are moving. However, fast movement of the surgical instruments degrades the accuracy of keypoint detection and, in turn, the accuracy of surgical instrument tracking. In the experiment, the average moving speed of the surgical instruments is about 173.54 pixels s⁻¹, which is sufficient for clinical surgery. From the experimental results on the automatic tracking task, the distances between the tracking point and the center of the FOV while tracking a single moving instrument and two moving instruments are about 45.77 and 28.47 pixels, respectively, i.e., approximately 11.44% and 7.12% of our continuum laparoscopic FOV, which is much smaller than the FOV error in clinical surgery. Considering the dynamic uncertainty of the continuum laparoscope system, the internal error of the system is about 19 pixels. These results indicate that the proposed autonomous FOV adjusting method with a continuum laparoscope can satisfy the clinical requirements. Beyond laparoscopic FOV adjustment as part of the surgical procedure, our approach can promote the automation of the robotic surgery process.

Conclusion
This article presents a data-driven control method for a continuum laparoscope system with learning-based visual feedback to adjust the FOV automatically in RMIS. We first developed a nonlinear system identification method using the Koopman operator and Chebyshev polynomials. We then built an LQR controller based on the trained Koopman operator with visual feedback. A learning-based keypoint detection method is designed to provide precise visual feedback for the control system without manual markers. This method offers more options for selecting keypoints on surgical instruments while ensuring detection accuracy across different surgical phases. Simulation and experiments are performed to evaluate the proposed methods. The pose estimation method provides higher accuracy than other keypoint detection methods, and the tracking experiments show the feasibility of the proposed data-driven control method for adjusting the FOV of a continuum laparoscope automatically. Experimental results show that the proposed method can also satisfy the clinical requirements. In the future, a degree of freedom of the continuum laparoscope along the Z-axis will be considered to enlarge the view and provide surgeons with a better experience. Constrained workspaces and inputs will be studied to ensure safety during robotic surgery. In addition, we will further explore scene segmentation in surgical images and videos; based on the segmentation results, the target tissue, instead of the surgical instruments, will be chosen as visual feedback to adjust the FOV of the laparoscope automatically.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.