Distilled neural state-dependent Riccati equation feedback controller for dynamic control of a cable-driven continuum robot

This article presents a novel learning-based optimal control approach for the dynamic control of continuum robots. Operation in confined and unstructured environments, nonlinear coupling, and dynamic uncertainty are only some of the difficulties that make developing and implementing a continuum robot controller challenging. Owing to the complexity of the control design process, a number of researchers have used simplified kinematics in the controller design. The nonlinear optimal control technique presented here is based on the state-dependent Riccati equation and is developed with consideration of the dynamics of the continuum robot. To address the high computational demand of the state-dependent Riccati equation controller, a distilled neural technique is adopted to facilitate real-time controller implementation. The efficiency of the control scheme with different neural networks is demonstrated using simulation results.


Introduction
Continuum robotics has made considerable strides in recent years and currently encompasses a diverse variety of applications and technologies. In contrast to most traditional robots comprising rigid linkages, continuum robots (CRs) have flexible parts that enable them to deform continuously, making them ideal for applications requiring intricate motions, ranging from industrial maintenance to surgical operations and space deployment systems. [1][2][3][4][5][6][7] CRs provide a number of benefits over rigid robots: for example, they can operate in restricted and congested environments and, owing to their flexibility, can manipulate objects of various shapes and types without excessive mechanical complexity.
Despite the benefits of CRs discussed above, modeling and control of CRs present difficult challenges due to the high level of dynamic complexity, coupling, and nonlinearity. In general, the deformation of CRs may be represented by a suitable kinematics model, which is also a focus of study in the literature; however, such models do not give insight into dynamic behavior. Furthermore, high-speed movements and inertia decrease the accuracy of the robot's kinematic model significantly. Therefore, for accurate control it is necessary to explore the CR dynamic model to analyze the relationship between driving force and bending deformation. 8 In the literature, four approaches for the dynamic modeling of CRs have been presented: the Lagrangian method, 9 Cosserat rod theory, 10 constant deformation, 11 and center of gravity. 12 The Lagrangian technique is used for dynamic modeling of the CR in this work because of its computational efficiency and ease of real-time implementation. 13 However, the Lagrangian approach described in He et al. 9 lacked a general formulation and relied predominantly on Taylor series expansion and several curve fittings to simplify the CR equations. For these reasons, an appropriate technique (adapted from Samadikhoshkho et al. 13 ) is required to handle CR modeling issues in the initial stage of designing the controller.
An effective dynamic control strategy is also required to deal with the system's complicated behavior. Numerous prior publications have concentrated on the kinematic control of cable-driven CRs, 14 with only a few addressing their dynamic control. [15][16][17] The inherent difficulty in designing a controller for a dynamic model is the primary reason that a limited number of works exist on dynamically controlled CRs. More studies have been conducted on the dynamic modeling and control of pneumatically actuated robots 18-20 than on cable-driven robots. [21][22][23] Along with model-based control strategies for CRs, there are model-free control methods that utilize machine learning. 24 Machine learning research to control CRs often employs a supervised learning technique [25][26][27][28][29] or reinforcement learning (RL). 30,31 Supervised learning approaches are predicated on having the correct answer to the issue available and attempting to learn from it. In Giorelli et al., 25 a feed-forward neural network was trained to learn the inverse kinematics of a cable-driven CR. An experimental study with the same approach 25 was again conducted by Giorelli et al. 26 In Xu et al., 28 the inverse kinematics of the robot was determined using three regression methods, including extreme learning machine, Gaussian mixture regression, and k-nearest neighbors regression. In Bern et al., 29 a neural network was trained to learn the forward kinematics of a soft robot; and an optimal open-loop controller was designed using a gradient-based optimization method.
In RL, a learning agent is rewarded after each action, and this reward indicates how beneficial the action was. The objective of RL is to discover an optimal policy that maximizes the agent's cumulative reward over the course of an episode. A reinforcement approach based on a Q-learning algorithm was employed in You et al. 30 to control a soft manipulator in a 2D plane. In Liu et al., 31 a proximal policy optimization algorithm was used for the locomotion control of a soft snake robot.
RL has several drawbacks. For example, solving a problem with RL requires an appropriate reward function, which can be very challenging to formulate. In addition, every RL algorithm has several hyper-parameters, such as discount factor and time horizon, that significantly affect the solution and need to be carefully tuned. In the present work, a supervised approach is developed based on a specific type of nonlinear optimal control scheme, the state-dependent Riccati equation (SDRE) technique, 32 to optimally control a CR's dynamics. An optimal control strategy should leverage the flexibility of CRs to advance present applications and incorporate online application in congested, dynamic, and unstructured environments. 33 Although SDRE is a powerful control method that is a nonlinear generalization of the linear quadratic regulator controller, it has rarely been used in the past due to its high computational cost, particularly prior to the development of fast computers and learning approaches in recent years. For example, neural SDRE control of the dynamics of a satellite's attitude 34 is a recently developed solution for real-time application of the SDRE method. The dynamics of a CR are more complex than those of a satellite, making SDRE control design and implementation for CRs more challenging. To the best of the authors' knowledge, no optimal SDRE control strategy has yet been developed for CRs due to the complexity of their dynamics.
The purpose of this article is to introduce a novel end-to-end learning-based strategy for dynamic control of a cable-driven CR. Following the training of neural networks to replicate the SDRE controller, the knowledge distillation (KD) method 35 is used to train smaller networks on the basis of previously trained larger ones. Although KD has been mostly applied to classification problems, here it is modified for the regression setting. KD enables training smaller networks with fewer parameters while leveraging prior knowledge from larger networks. Additionally, it has a regularization effect that may help networks achieve better generalization. The impact of several network designs is investigated to determine the most effective architecture and learning technique for neural SDRE control of CRs.

Dynamic modeling
Dynamic modeling of a tendon-driven CR, shown in Figure 1, is provided in this section by adapting from Samadikhoshkho et al. 13 and focusing exclusively on the CR described in that work.
Controllable degrees of freedom for a tendon-driven CR are denoted by the vector $q = [q_1 \; q_2]^T$ in equation (1), where $q_1$ is the horizontal angle of the end-effector with respect to the X-axis in the XY plane and $q_2$ is the vertical angle of the end-effector with respect to the Z-axis.
To model a flexible continuum arm, it is first necessary to consider a section of the CR and find its kinematics and energy functions. Then, by integrating along the length of the CR, the dynamics of the CR can be derived using Euler-Lagrange theory. The position of the end-effector can be determined by rotating the flexible arm coordinate (given by the flexible beam model 9 ) about the Z-axis by the angle $q_1$. In addition, the end-effector orientation (rotation matrix) can be found by performing three rotations: around the Z-axis by angle $q_1$, the Y-axis by angle $q_2$, and the Z-axis by angle $-q_1$, respectively. 9 The rotation matrix between a point of the arm at section $s$ and the base of the arm, $R_s$, can be found in Godage et al. 36 and expressed as equation (2), where the vectors $n_s$, $b_s$, and $t_s$ are the columns of the rotation matrix.
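As a concrete check of this composition, the end-effector rotation $R = R_z(q_1)\,R_y(q_2)\,R_z(-q_1)$ can be sketched numerically. A minimal illustration (the helper names are not from the article):

```python
import numpy as np

def rot_z(a):
    # Elementary rotation about the Z-axis by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rot_y(a):
    # Elementary rotation about the Y-axis by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[ c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def end_effector_rotation(q1, q2):
    # Three rotations: Z by q1, Y by q2, then Z by -q1
    return rot_z(q1) @ rot_y(q2) @ rot_z(-q1)
```

For $q_2 = 0$ the bending vanishes and the matrix reduces to the identity, as expected.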
The angular velocity of each section of the arm, $\omega_s$, is obtained as 37

$$\mathrm{skew}(\omega_s) = \dot{R}_s R_s^T, \qquad \omega_s = J_o(s)\,\dot{q} \quad (3)$$

where $\mathrm{skew}(\cdot)$ denotes the skew-symmetric form and $J_o(s)$ is the orientation Jacobian matrix at section $s$. The linear velocity of the arm at section $s$ is likewise determined as

$$v_s = J_p(s)\,\dot{q} \quad (4)$$

where $J_p(s)$ represents the position Jacobian matrix at section $s$. A dynamic model of the robot can be derived by considering the Euler-Lagrange formulation

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) - \frac{\partial L}{\partial q} = \tau \quad (5)$$

To define the Lagrangian of the system, $L = K - U$, the system's kinetic energy $K$ and potential energy $U$ are required. The kinetic energy of the robot consists of two parts: the first is related to the main backbone, denoted $K_{m.b}$, and the second is the total kinetic energy of the disks, $K_d$. The kinetic energy of the tendons is neglected, as it has a negligible effect on the system dynamics. The kinetic energy of the backbone at each section $s$ is

$$K_{m.b}(s) = \frac{1}{2}\,\rho_s A_s\, v_s^T v_s + \frac{1}{2}\,\omega_s^T I_s\, \omega_s$$

where $\rho_s$, $A_s$, and $I_s$ are the backbone's density, cross section, and moment of inertia matrix, respectively. Similarly, the kinetic energy of the $j$-th disk, $K_d(j)$, located at $s_j = j\,h$, is calculated as

$$K_d(j) = \frac{1}{2}\, m_d\, v_{s_j}^T v_{s_j} + \frac{1}{2}\,\omega_{s_j}^T I_d\, \omega_{s_j}$$

in which $h$, $n$, $m_d$, and $I_d$ denote the distance between disks, the number of disks, and the mass and moment of inertia of each disk, respectively. Finally, the total kinetic energy of the robot, $K$, is expressed as

$$K = \int_0^l K_{m.b}(s)\, ds + \sum_{j=1}^{n} K_d(j)$$

where $l$ is the length of the backbone. By using equations (3) and (4), the total kinetic energy can be written in the quadratic form

$$K = \frac{1}{2}\,\dot{q}^T M(q)\,\dot{q} \quad (11)$$

Similarly, the potential energy of the robot, $U = U_{m.b} + U_d$, is presented in equations (13) to (15), comprising the gravitational energy of the backbone and disks and the elastic energy of the backbone, where $g$ is the acceleration due to gravity, $e_3$ indicates the direction of gravity, and $E_s$ denotes the Young's modulus of the main backbone. Potential energy storage in the cables is neglected. Finally, using equations (11) and (13) to (15), the dynamic model of the system is derived as

$$M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau, \qquad G(q) = \left(\frac{\partial U(q)}{\partial q}\right)^T \quad (16)$$

Although the CR has an infinite number of degrees of freedom, only the two angles of the end-effector can be controlled by the robot actuators pulling the tendons.
Therefore, the end-effector position can be changed by adjusting the end-effector angles. Three tendons are responsible for controlling the tendon-driven CR under consideration. The first tendon actuator is positioned along the X-axis, while the other two are each placed at an angle of 120° with respect to the first tendon. Based on the horizontal angle of the end-effector, two tendons are active at a time: tendons 1 and 2 are activated if $q_1$ is between 0° and 120°; tendons 2 and 3 are involved for $120° < q_1 < 240°$; and tendons 1 and 3 are responsible for $240° < q_1 < 360°$.
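The tendon-selection rule above amounts to a simple partition of $q_1$; a sketch (the function name is illustrative):

```python
def active_tendons(q1_deg):
    # Two of the three tendons act, depending on the horizontal
    # end-effector angle q1 (degrees); tendon 1 lies on the X-axis
    # and tendons 2 and 3 are offset by 120 degrees each.
    q1_deg = q1_deg % 360.0
    if q1_deg < 120.0:
        return (1, 2)
    if q1_deg < 240.0:
        return (2, 3)
    return (1, 3)
```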
The relation between the control signal and the tendon tensions $F_{ten} = [F_1 \; F_2 \; F_3]^T$ is expressed as equation (19) (as stated in Samadikhoshkho et al. 13 ), where $D_{ten}$ is calculated from the tendon displacements in Samadikhoshkho et al. 13

Control design
This section discusses the development of the model-based optimal control and learning approach for controlling a CR.

State-dependent Riccati equation
The model of the CR presented in the preceding section reveals a high degree of nonlinearity in the system dynamics. Therefore, linear control techniques may be inapplicable to such a nonlinear system. To deal with the nonlinearity and to improve the robustness of the controller, the SDRE approach is adopted to compute the optimal control solution at each time step.
To design the controller, it is first necessary to describe the nonlinear equations in a pseudo-linear form with state-dependent coefficients. It is common to express the pseudo-linear structure as

$$\dot{x} = A(x)\,x + B(x)\,u \quad (20)$$

however, a new representation is required to adapt the equations for the CR dynamics:

$$\dot{x} = A(x)\,x + B(x)\,u + F(x) \quad (21)$$

For the optimal reference tracking problem of the system presented in equation (21), the error vector is defined as $e = x - x_d$. The control problem is to find the optimal solution for the system by minimizing the following performance measure (cost function), $J$:

$$J = \frac{1}{2}\int_{0}^{\infty} \left( e^T Q\, e + u^T R\, u \right) dt \quad (22)$$

where $Q$ and $R$ are positive definite weight matrices for the tracking and control effort costs, respectively. To calculate the optimal control signal, the Hamiltonian $H(x(t), u(t), h(t), t)$ is written as equation (23), as cited in Kirk. 38 Here, $h$ denotes the co-states:

$$H = \frac{1}{2}\left( e^T Q\, e + u^T R\, u \right) + h^T \left( A\,e + B\,u + F \right) \quad (23)$$

The optimal control gain can be derived by solving the necessary conditions of optimality (as stated in Kirk 38 )

$$\frac{\partial H}{\partial u} = 0, \qquad \dot{h} = -\frac{\partial H}{\partial e} \quad (24)$$

which result in

$$R\,u + B^T h = 0 \quad (25\mathrm{a})$$
$$\dot{h} = -\left( Q\,e + A^T h \right) \quad (25\mathrm{b})$$

By defining $h = P\,e - g$ and substituting into equation (25a), one obtains

$$u^* = -R^{-1} B^T \left( P\,e - g \right) = K_1\, e + K_2 \quad (26)$$

where the $P$ and $g$ terms are defined later, $K_1 = -R^{-1} B^T P$, and $K_2 = R^{-1} B^T g$. Substituting the optimal control signal, equation (26), into equation (25b) and using the error dynamics $\dot{e} = A\,e + B\,u + F$, the following expression emerges

$$\dot{P}\,e + P\left( A\,e + B\,u^* + F \right) - \dot{g} = -\left( Q\,e + A^T \left( P\,e - g \right) \right) \quad (27)$$

which can be rearranged as

$$\left( \dot{P} + P A + A^T P - P B R^{-1} B^T P + Q \right) e + \left( P B R^{-1} B^T g + P F - A^T g - \dot{g} \right) = 0 \quad (28)$$

The solution of equation (28) leads to the following equations

$$\dot{P} + P A + A^T P - P B R^{-1} B^T P + Q = 0 \quad (29\mathrm{a})$$
$$\dot{g} = \left( P B R^{-1} B^T - A^T \right) g + P F \quad (29\mathrm{b})$$

The expression in equation (29a) is the well-known Riccati equation, which can be solved for steady-state conditions (to facilitate its real-time application) as

$$P A + A^T P - P B R^{-1} B^T P + Q = 0 \quad (30)$$

After solving the Riccati equation for $P$, the term $g$ is calculated by substituting $P$ into the steady state of equation (29b) as

$$g = \left( A^T - P B R^{-1} B^T \right)^{-1} P F \quad (31)$$

Finally, having $P$ and $g$, the optimal control signal can be found from equation (26). For the CR problem, the state variables are selected as

$$x = \left[ q_1 \;\; q_2 \;\; \dot{q}_1 \;\; \dot{q}_2 \right]^T \quad (32)$$

Also, the CR equation, equation (16), can be rewritten in the pseudo-linear form of equation (21) as

$$A(x) = \begin{bmatrix} 0 & I \\ 0 & -M^{-1}C \end{bmatrix}, \qquad B(x) = \begin{bmatrix} 0 \\ M^{-1} \end{bmatrix}, \qquad F(x) = \begin{bmatrix} 0 \\ -M^{-1}G \end{bmatrix} \quad (33)$$

Data acquisition and learning procedure
This work employs a supervised approach to solve the reference tracking problem. Supervised learning aims to estimate the mapping between given input-output pairs, which can then be used to predict the output for any new input. In the scenario under consideration for CR control, the mapping to be estimated is an SDRE controller that receives the dynamical system's state variables and desired states as inputs and produces the appropriate control signals as outputs.
The training data are generated by the SDRE control technique presented in the previous section. Since CR control is a reference tracking problem, both the state vector $x$ and the error vector $e = x - x_d$ are used to find the optimal control signal. Hence, the input of each training sample is defined as $[x^T \; e^T]^T$. To train the network, inputs are randomly generated within the appropriate range, and the corresponding outputs are calculated by the SDRE controller. The output of each sample has two potential alternatives, both of which are tried. The first option is to predict the SDRE control gains, $K_1$ and $K_2$, and calculate the control signal using $u^* = K_1 e + K_2$. The second, and more straightforward, alternative is to use the control signal $u$ as the output, which requires predicting fewer values because $u$ has the same dimension as $K_2$.
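The data-generation loop can be sketched as follows (a hedged illustration; `sdre_control` stands in for the full controller of the previous section, and the sampling ranges are placeholders):

```python
import numpy as np

def generate_dataset(sdre_control, n_samples, low, high, seed=0):
    # Each sample: input = [x; e], target = u* from the model-based
    # SDRE controller. x and x_d are drawn uniformly at random.
    rng = np.random.default_rng(seed)
    inputs, targets = [], []
    for _ in range(n_samples):
        x = rng.uniform(low, high)
        x_d = rng.uniform(low, high)
        e = x - x_d
        inputs.append(np.concatenate([x, e]))
        targets.append(sdre_control(x, e))   # offline, so cost is acceptable
    return np.asarray(inputs), np.asarray(targets)
```

For the CR, $x$ and $e$ are each four-dimensional, giving the eight-dimensional network input used later in the simulations.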
Several types of neural networks can be used to address regression issues, and the problem setup should determine the most appropriate one. The input to the SDRE controller for the CR is a vector with no spatial link between its members, that is, the order of the vector elements is chosen arbitrarily before deriving the dynamical system governing equations of motion. Additionally, because there are no temporal trends in the data, all samples are independent and identically distributed. This is because the SDRE controller computes the control signal only based on the system's present state and disregards past states. The present work uses multilayer perceptron (MLP) neural networks.
A schematic structure of a simple MLP artificial neural network with input size $n$, output size $c$, and two hidden layers is shown in Figure 2. In each neuron (shown as a circle in Figure 2), the input $x$ is multiplied by the weight vector of the neuron, $W$, and a bias $b$ is added; an activation function $\sigma(\cdot)$ is then applied to generate the output of the neuron. Therefore, the output of the $j$-th neuron in the $i$-th layer can be calculated using equation (35)

$$y_j^{(i)} = \sigma\!\left( W_j^{(i)}\, x^{(i)} + b_j^{(i)} \right) \quad (35)$$

where the input of layer $(i)$, $x^{(i)}$, is the output of layer $(i-1)$. Network parameters must be determined by minimizing an appropriate loss function between the network outputs (predictions) and the desired outputs. Following training, the neural network is expected to perform well not only on the training data but also on previously unseen test data. This is referred to as generalization, and it is one of the most challenging aspects of machine learning. Another critical issue in particular applications, such as control of CRs, is the size of the neural network (number of parameters). Reducing the size of the network can improve its generalization and speed up its execution; however, simply decreasing the size is often not a wise approach, because it reduces the network's capacity to learn increasingly complicated tasks.
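Equation (35), applied layer by layer, is all there is to the forward pass; a compact sketch (names illustrative):

```python
import numpy as np

def prelu(x, a=0.25):
    # Parametric ReLU: identity for positive inputs, slope a otherwise
    return np.where(x > 0, x, a * x)

def mlp_forward(x, layers):
    # `layers` is a list of (W, b, activation) triples; each layer
    # computes y = sigma(W x + b) as in equation (35).
    for W, b, act in layers:
        x = act(W @ x + b)
    return x
```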
KD 35 is a technique for training smaller networks based on what a larger network has already learned, similar to the concept of fitting data to functions or reducing the order of a set of governing differential equations. Initially, KD was presented as a solution to the classification problem.
According to KD, when a large network is trained to solve a classification task, the layer preceding the softmax provides an appropriate embedding for the input data. Hinton et al. 35 recommended concurrently training a smaller network that predicts this embedding and labels the sample. This approach is adopted for CR control in this article.
KD begins with the training of a large neural network (teacher network) to solve the regression problem by minimizing the mean squared error (MSE) between the network outputs and the desired outputs. As indicated in Figure 3, a smaller network (student/distilled network) is then trained in such a way that its output layer accurately predicts the desired outputs, while one of its hidden layers predicts the embedding, $z_{teacher}$, generated by one of the teacher network's hidden layers. The loss function for training the student network is defined as the weighted sum of two MSE loss functions

$$\mathrm{Loss} = \alpha\, L_{student} + (1 - \alpha)\, L_{distillation} \quad (36)$$

where

$$L_{student} = \mathrm{MSE}(y_{student},\; y_{desired}) \quad (37)$$
$$L_{distillation} = \mathrm{MSE}(z_{student},\; z_{teacher}) \quad (38)$$

Notably, the teacher is not trained during the student network's training, and back-propagation is performed only on the student network.
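The combined objective described above can be written directly; a minimal sketch, assuming $\alpha$ weights the student term:

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def kd_loss(y_student, y_desired, z_student, z_teacher, alpha=0.5):
    # Weighted sum of the student's output error and the distillation
    # term that matches the student's embedding to the teacher's.
    l_student = mse(y_student, y_desired)       # output-tracking loss
    l_distill = mse(z_student, z_teacher)       # embedding-matching loss
    return alpha * l_student + (1.0 - alpha) * l_distill
```

During training, gradients of this loss flow only through the student; the teacher's outputs and embeddings are treated as constants.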
In a regression problem with $c$ outputs, there are two possible neural network architectures: a single network with $c$ outputs, or a set of $c$ separate networks, each with a single output. For KD, if the teacher is trained in the second scenario (separate networks for the outputs), the networks can either be distilled separately or first merged and then distilled. The performance of these different scenarios is compared in the simulation results section.
A block diagram of the closed-loop system is shown in Figure 4, in which Controller and System refer to the trained neural network and the dynamic model of the system presented by equation (33), respectively.

Simulation results
Simulation was used to assess the efficiency of the proposed neural SDRE control approach for tendon-driven CRs with the same specifications as in Samadikhoshkho et al. 13 Several networks were trained and compared to determine the best neural controller structure for this CR.
The input dimension of all networks is eight, comprising four states and four error variables. The following neural network cases are considered: (1) a single network whose output is the control signal $u = [u_1 \; u_2]^T$; (2) two networks with outputs $u_1$ and $u_2$; (3) two teacher networks with outputs $u_1$ and $u_2$ trained separately, then merged and distilled into a single student network with output $\hat{u} = [\hat{u}_1 \; \hat{u}_2]^T$; and (4) two teacher networks with outputs $u_1$ and $u_2$ trained and distilled separately into networks with outputs $\hat{u}_1$ and $\hat{u}_2$, then merged.
In case (1), the network has three intermediate (hidden) layers with 500, 500, and 100 neurons, while the output layer has two neurons. In case (2), both the $u_1$ and $u_2$ networks have three intermediate layers and an output layer with 200, 100, 50, and 1 neurons, respectively. In both cases, the intermediate layers use the PReLU activation function.
In case (3), the teacher networks for $u_1$ and $u_2$ are the case (2) networks with their last layers removed. This architecture is shown in Table 1. To form the teacher, the $u_1$ and $u_2$ networks are concatenated so that the output has two components. This teacher network is then distilled into the student network with the architecture shown in Table 2.
In case (4), each teacher network is identical to that of case (3); however, they are distilled separately into networks with the architectures presented in Table 3. These separately distilled networks are then concatenated and used as the controller. Each distilled network has 4130 trainable parameters, significantly fewer than the teacher network's 26,953 parameters.
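Parameter counts like these follow directly from the layer sizes; a quick sketch (bias terms included; any learnable activation parameters, such as PReLU slopes, are not counted, so the totals are indicative rather than exact reproductions of Tables 1 to 3):

```python
def mlp_param_count(layer_sizes):
    # Weights plus biases of a fully connected network:
    # sum over consecutive layer pairs of (n_in * n_out + n_out).
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
```

For instance, the case (2) architecture 8-200-100-50-1 gives 27,001 weight-and-bias parameters, in the same range as the teacher-network count quoted above.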
The distillation loss in equation (38) is defined as the MSE between the output of the third hidden layer of the corresponding teacher ($u_1$ or $u_2$) network and that of the third hidden layer of the distilled network, both of which have a dimension of 50. The student loss is defined as the MSE between the output of the distilled network and the ground truth.
To evaluate the performance of the proposed neural network controllers, a desired trajectory corresponding to a circular motion of the end-effector is considered. The control signals (tendon tensions) and the end-effector angles, $q_1$ and $q_2$, obtained from the various neural controllers, as well as those of the model-based SDRE controller, are illustrated in Figures 5 to 9.
The MSEs of the angles $q_1$ and $q_2$ in the different cases, relative to those of the SDRE controller, are compared in Table 4.
The results in Figures 5 to 7 indicate that case (2) (two separate networks without distillation) and case (4) (two separate networks with separate distillation) perform better than the other cases. The deviation from the ideal SDRE control signal is large in case (3) (two separate networks with a single distillation), although the error is greatest in case (1) (a single network without distillation). As seen in Figures 8 and 9, all cases perform well for the angle $q_2$. However, cases (4) and (1) perform best and worst, respectively, in terms of angle $q_1$ control. Cases (2) and (4) have very similar performance, but case (4) has far fewer parameters. This result supports the claim that the distilled network can be implemented as a digital controller with a higher sampling frequency on the same hardware. It can be concluded that, in the multi-output regression problem of CR control, it is preferable to train a separate network for each output to obtain the best performance. Moreover, when one network with several outputs is trained for such a problem, all of the outputs are functions of some mutual weights (i.e. the parameters of the shared layers), which complicates network training. For the same reason, distillation should be performed independently on each network.

Conclusions
The present work uses neural networks with KD to replicate a powerful but computationally expensive SDRE controller, optimally controlling the challenging dynamics of CRs for the first time. While many prior efforts on CR control have focused on the kinematics of the robot, the present work employs the dynamics of the system described by the Lagrangian approach. Different setups for the neural networks and KD were explored, and it was found that training a separate network for each output and distilling those networks independently produced the best results for a simple CR in simulation. The comparison between the exact SDRE controller and the neural SDRE controllers shows that the trained distilled neural SDRE controller requires less computation. To avoid the time-consuming and labor-intensive process of collecting and labeling data from the real world, all neural networks in this work are tuned using synthetic/simulation data. The proposed controller is applicable to all types of CRs whose models are expressed using the Lagrangian technique.