Real-Time Optimal Approach and Capture of ENVISAT Based on Neural Networks

A neural network-based controller is developed to enable a chaser spacecraft to approach and capture a disabled Environmental Satellite (ENVISAT). This task is conventionally tackled by framing it as an optimal control problem. However, the optimization of such a problem is computationally expensive and not suitable for onboard implementation. In this work, a learning-based approach is used to rapidly generate the control outputs of the controller based on a series of training samples. These training samples are generated by solving multiple optimal control problems with successive iterations. Then, Radial Basis Function (RBF) neural networks are designed to mimic this optimal control strategy from the generated data. Compared with a traditional controller, the neural network controller is able to generate real-time high-quality control policies by simply passing the input through the feedforward neural network.


Introduction
ENVISAT is one of the largest satellites launched by the European Space Agency (ESA) which lost communication with the ground station in April 2012 [1]. A collision between ENVISAT and an object of mass 10 kg could produce an extremely large cloud of debris, initiating a self-sustaining chain reaction of collisions and fragmentation with the production of new debris, a phenomenon known as the Kessler syndrome [2].
The capture of ENVISAT is a complex space mission complicated by its uncertain rotation and the coupling motion between the chaser spacecraft and its manipulators. Many efforts were made by researchers to determine the attitude of ENVISAT. Sommer et al. [3] estimated the rotation vector and rotation speed of ENVISAT through ISAR images from 2011 to 2017, and he indicated that the rotational axis is not fixed and ENVISAT performs a nutation. Koshkin et al. [4] determined the spin motion of ENVISAT using photometry and provided an estimation formula to calculate the rotation period. Pittet et al. [5] suggested that ENVISAT per-formed no precession and a possible but limited nutation.
Theoretically, these ground observations pave the way of estimating the rotation of ENVISAT by multiple equations. Unfortunately, in practice, there are still some inconsistencies within these observations that may suffer from different measurement information, measurement equipment, and target motion data. Such inaccuracy affects the performance of the capture mission.
The capture mission is usually divided into two stages, i.e., the approach stage and the capture stage. In the approach stage, only 6-Degree-Of-Freedom (DOF) motion planning and control are considered since the manipulators are locked. In general, motion planning can be executed before the implementation of the tracking control. Boyarko et al. [6] employed Gauss Pseudospectral Method (GPM) to plan 6-DOF time and fuel optimal motion trajectories. Chu et al. [7] further considered an obstacle avoidance constraint during GPM planning. However, [6,7] require accurate measurement of three-dimensional orbit position and relative distance information. In contrast, Zhang et al. [8] introduced an angles-only offline trajectory planning method which is tracked by a nonlinear model predictive control (NMPC). Offline planning has the advantage to optimize trajectories under various performance indices and constraints. However, onboard implementation is not efficient since these algorithms require accurate measurements and high computational expense. Lee and Vukovich [9] established a robust adaptive terminal sliding mode control scheme on the Lie group SE(3) that ensures the finite-time convergence of trajectory tracking errors. Sun and Zheng [10] extended the adaptive control to consider thrust misalignment in spacecraft rendezvous and proximity mission. Zhang et al. [11] further applied an adaptive finite-time fast terminal sliding mode controller to achieve spacecraft formation flying control. These approaches can obtain finite-time convergence control in the presence of model uncertainties and external disturbances.
New approaches have been developed that can yield rapid or even real-time, near-optimal, trajectory generation by taking simplifications [12,13] or approximations [14]. Other methods include planar optimal energy planning [15], inverse methods [16], modified Rapidly exploring Random Tree (RRT * ) [17], modified Fast Marching Tree (FMT * ) [18], convex planning [19], artificial potential field-based guidance [20], and polynomial planning [21]. Convex planning is suitable for problems subject to convex constraints such as descent and landing [19], unmanned path planning [22], and rendezvous and proximity operations [23,24]. On the contrary, time-varying nonconvex motion constraint can be considered by combining artificial potential field with backstepping methodology [20]. Inverse dynamics based polynomial planning methods consisted of parameterizing nominal trajectories via high-order polynomials, converting the original optimal control problem into a nonlinear programming problem [21,25]. These studies mainly focused on the rapid guidance of the chaser body, ending with final conditions suitable for docking or capture, while neglecting the dynamic coupling between the manipulator and the base in the capture stage. The orbit [26] and/or attitude [27] of the space robots are generally assumed to be uncontrolled so that the focus is placed on the motion planning and control of manipulators [26][27][28]; real-time near-optimal [29] or optimal [30] manipulator motion trajectories can be obtained.
Significant advancements have been made with regard to both relative motion control and manipulator motion control; however, the reconciliation between the two fields remains to be solved. The division of the mission into approach stage and capture stage can reduce the complexity of the problem [15], but the sum of the two optimal results of the two stages is not necessarily the optimal solution of the whole mission. Considering the highly dynamic nature of the capture mission, the narrow capture window, and the limited energy, the objective is to find optimal trajectories and controls in real-time, during the whole approach and capture procedure. Benefiting from machine learning technology [31,32], coordination between the optimal control and real-time control becomes possible. Izzo et al. [33] performed a survey on the artificial intelligence trends in spacecraft guidance dynamics and control, which involved a review of the intersection of the guidance and control with evolutionary optimization, tree searches, and machine learn-ing. Sánchez-Sánchez and Izzo [34] applied deep learning methods to represent the optimal control action during a pinpoint landing. Furthermore, Izzo et al. [35] investigated the use of machine learning and evolutionary techniques in interplanetary trajectory design. Biggs and Fournier [36] trained a multilayer perceptron (MLP) to achieve real-time optimal attitude control of a 12U CubeSat. All these inspiring works attempted to train deep neural networks to learn from traditional optimal control solutions and later directly provide control commands by inputting the measured variables or even images.
Offline solutions of optimal control problems require heavy computational costs to solve an iterative optimization that restricts their applications in real time. Meanwhile, online guidance and control only consider designing control policy in a single scenario, e.g., either 6-DOF guidance control or manipulator motion control. In this work, a more comprehensive implementation of the dynamic system that covers the spacecraft, the manipulators, and their inherent interactions is considered. While the system is more complicated, they can be efficiently solved via our proposed neural network-based approach. Three neural networks are trained to mimic the optimal control policy in a data-driven manner. Then, the optimal relative motion guidance and control for 6-DOF relative motion spacecraft can be implemented in real-time through neural network inference. The main contributions are stated as follows: (i) Machine learning technology has been successfully applied to solving complicated optimal control problems. This problem involves considerably more complicated dynamics, DOFs of state variables and control variables compared with landing problems or interplanetary transfer problems (ii) Multiple neural networks are jointly designed to collaboratively learn optimal solutions in a synchronized framework. It is difficult to train a single network with a small number of samples, while the generation of huge optimal control solutions as training samples is costly. Simulation results indicate that multiple networks perform well through carefully designed inputs and outputs even when the number of samples are small (iii) The proposed control strategy exhibits comparable capture performance as tracking of optimal solutions in ideal cases and can be implemented in real time The remainder of this article is organized as follows. Section 2 is devoted to the problem formulation, where the motion dynamic model of the approach and capture mission is presented. In Section 3, control strategies applied to generate optimal relative motion control and manipulator motion control solutions are presented. In Section 4, a series of Radial Basis Function (RBF) neural networks are designed and their inputs and outputs are discussed. As an illustration, numerical simulations are carried out in Section 5. Finally, brief concluding remarks in Section 6 end this study.

2
International Journal of Aerospace Engineering

Problem Formulation
According to the measurement of Kucharski et al. [1], ENVISAT is presently flying in an approximate sunsynchronous orbit with a side rotating attitude. According to the ESA, the main body of ENVISAT can be regarded as a 5 m × 10 m × 5 m cube rotating approximately around its z axis at an angular velocity of 0.8631°/s, as obtained by the approximate calculation of the formula proposed by Koshkin et al. [4]. The +z surface of this body is full of sensors and devices, the -z surface is obscured by antennas, the ±x surfaces are obscured by the Advanced Synthetic Aperture Radar (ASAR), and the -y surface is obscured by the solar array of the satellite. Consequently, only the +y surface is suitable for capture. A chaser spacecraft (hereinafter referred to as the chaser) installed with two 7-DOF manipulators is designed to approach and capture the ENVI-SAT. The chaser is a small spacecraft launched from a mother spacecraft, which reaches up to a dozen meters from the ENVISAT after a Lambert transfer. Subsequently, the spacecraft starts to approach and capture the ENVISAT, which is the focus of this paper. The overall mission is depicted in Figure 1. The inertial coordinate system is termed S i (Ox i y i z i ). The centroid orbital coordinate system is termed S o (Ox o y o z o ). The x axis of this system is along the flight velocity direction, z axis points to the earth, y axis can be determined by right-hand rule, and origin is at the centroid of the ENVISAT. The body coordinate system of the ENVISAT is termed S tb (Ox tb y tb z tb ), and the body coordinate system of the chaser is termed S cb (Ox cb y cb z cb ). The x axes of these systems are along the spacecraft origin forward directions, and their z axes point to the earth when both the spacecraft are in the normal earth pointing flight mode.

Dynamical Equations.
A series of equations are required to describe the motion regulations of the ENVISAT, chaser, and manipulators, namely, the orbit kinematics equations, orbit dynamics equations [37], attitude kinematics equations, attitude dynamics equations [38], multibody kinetics equations, and multibody dynamics equations [39].
The orbit and attitude motion regulation of the target can be expressed as where the subscription "tar" refers to "target," r tar = ½r tarx , r tary , r tarz is the 3-axis position vector of the ENVISAT, v tar = ½v tarx , v tary , v tarz is the 3-axis velocity vector of the ENVISAT, I tar is the moment of inertia of the target, q tar = ½q tar0 , q tar1 , q tar2 , q tar3 is the quaternion of the ENVISAT, ω tar = ½ω tarx , ω tary , ω tarz is the 3-axis angular velocity of the ENVISAT, and Ωðω tar Þ can be expressed as : ð2Þ The motion regulation of the chaser with its manipulators can be expressed as where Θ = ½r cha , e cha , θ is the generalized coordinate of the chaser including its 3-axis position vector r cha = ½r chax , r chay , r chaz , 3-axis Euler angle e cha = ½e chax , e chay , e chaz (which is replaced by chaser's quaternion q cha = ½q cha0 , q cha1 , q cha2 , q cha3 to avoid ambiguity in Equation (5)), and the joint angles of the two 7-DOF manipulators θ = ½θ 1 , θ 2 , ⋯, θ 14 ; HðΘÞ is a symmetrically positive definite inertia matrix of the chaser with manipulators; Cð _ Θ, ΘÞ is a nonlinear matrix including Coriolis and centrifugal forces; and U = ½F, M, T T is the control variables for the chaser including its 3-axis control forces F = ½F x , F y , F z , 3-axis control moment M = ½M x , M y , M z , and the 14 joint torques T = ½T 1 , T 2 , ⋯, T 14 . The subscription "cha" refers to "chaser." Detailed definition of HðΘÞ and Cð _ Θ, ΘÞ can be seen in [40] and is not discussed here. The dynamic coupling of the chaser spacecraft and the manipulators can be more obviously seen if Equation 3 International Journal of Aerospace Engineering is not only related to the position (r cha ), velocity (_ r cha in _ Θ), and acceleration (€ r cha in € Θ) of the chaser spacecraft but also related to e cha and θ. The coupled nature can be more clearly reflected by applying the Newton-Euler method [41] due to its recursive form; i.e., a forward recursion is performed to propagate the link velocities and accelerations, followed by a backward recursion to propagate forces.
The governing dynamics assume 20 controls that can be used by the chaser and the manipulators to achieve the approach and capture mission: which has been fully defined after Equation (3).

Performance
Indices. Three performance indices are considered to evaluate the control optimality of the chaser, namely, the fuel consumption J f , time consumption J t , and impact of the manipulator end effector J v . Considering I sp as the specific impulse of the thrusters, J f can be expressed as Considering t f as the final capture time, J t can be expressed as J v can be measured using the x component of the velocity of the manipulator end effector relative to the desired capture point Δv end as follows: where Δv endx is the x axis projection of Δv end in S tb . By defining the fuel consumption coefficient as k f , time consumption coefficient as k t , and impact coefficient as k v , the overall performance index J can be obtained as follows: 2.3. Constraints. Two main constraints are considered during the approach and capture procedure, namely, the collision avoidance constraint C 1 and state variable constraint C 2 . The collision avoidance can be taken into consideration by imposing the following path constraint: the mass center of the chaser spacecraft must remain at a distance larger than the minimum distance (a "keep out" sphere with a radius Δr min ) from the mass center of the ENVISAT, which can be defined as The state variable constraints state that all the state variables and control variables cannot exceed their range of values. By defining X min and X max as the minimum and maximum values of state vector X, respectively, and U min and U max as the minimum and maximum values of controls U, respectively, the state variable constraints can be expressed as

Real-Time Optimal Control Strategy
It is difficult to directly optimize the entire problem due to the complex coupling motion of the orbit, attitude, and manipulator. Thus, it is necessary to reduce the complexity of the problem step by step. The coupling motion of the orbit and attitude is first considered and optimized using the GPM. The manipulator deployment, point, and capture control are then considered, and a feedforward compensation control algorithm is designed to eliminate the coupling among the manipulators, orbit, and attitude. Finally, an iterative optimization is established to obtain the optimal control strategy of the manipulators, orbit, and attitude. Let W be the range of initial values of the state variables. A set of initial values X 0 can be randomly selected in W. Let Φ be a neighborhood sufficiently close to X 0 . A series of value sets fX i g can be randomly selected in Φ, and each value set is known as a neighborhood member. For each neighborhood member, the optimal control sequence UðtÞ and state sequence XðtÞ can be obtained by using the methods described in the following sections.

Relative Motion Control.
When the manipulators are locked, only the orbit and attitude motions of the chaser are needed to be considered. The state variables and control variables can be written as where Δr and Δv denote the relative position and velocity vectors between the chaser and the ENVISAT, respectively. The relative orbit dynamics of the chaser and ENVISAT in the orbit frame centered at ENVISAT can be described using the Hill-Clohessy-Wiltshire (C-W) equations [42]. By replacing the orbit dynamics equations with the C-W equations and removing the multibody kinetic and dynamic 4 International Journal of Aerospace Engineering equations, a series of explicit differential equations to govern the law of motion of the approaching stage can be obtained as follows: Since the manipulators are not considered, the value of k v in the overall performance index J is zero. Thus, the approach mission can be transferred into an optimal control problem. That is, for a system described by Equation (14), the control u is calculated to transfer the state vector x from the initial state x i to the terminal state x f under constraints C 1 and C 2 while optimizing the performance index J.
In the optimal control problem, besides path constraints C 1 and C 2 , certain extra constraints at the final time t f (often referred to as the terminal constraints) are needed to define x f and make the optimization problem solvable. The terminal constraints include the position constraint T 1 , velocity constraint T 2 , and attitude constraint T 3 as follows: where L oi is the rotation matrix between S i and S o and V is the extra relative velocity. If V = 0, the chaser remains relatively static with the ENVISAT; otherwise, the chaser flies over the ENVISAT at Δr d with a relative speed of V. L ob is the rotation matrix between S cb and S o and is determined by q cha . The above optimal control problem can be solved using the well-developed numerical toolkit Gauss Pseudospectral OPtimization Software (GPOPS) [43]. The output is the optimal state variable sequence xðtÞ and the optimal control variable sequence uðtÞ, t = t 0 , ⋯, t f . Boyarko et al. [6] proved that a pseudospectral solver provided a solution remarkably close to the truly optimal one by using the results obtained by applying the Minimum Principle (MP). In their research, the chaser started to approach at a distance of 5 m, and the MP solution returned a value of a minimum control perfor-mance index J = 0:11341185 compared with J = 0:1133 obtained by the pseudospectral solver. The GPOPS has also been employed to compute the optimal trajectories and generate the appropriate training set in a lunar landing mission [44]. In this case, the training set was used to train a deep Convolutional Neural Network (CNN) followed by a Recurrent Neural Network connecting with a Long Short-Term Memory (RNN-LSTM) and is effective in the planar landing problem.
Given the size of the ENVISAT, a similar approach scenario is considered, in which the chaser starts at a distance of approximately 15 m from the mass center of the ENVISAT. The ENVISAT has an initial small angular velocity around its x body axis and an angular velocity of 0.1484 rad/s around its z body axis. The minimum safe distance Δr min is set to be 6.8 m, which means that the chaser approaches the ENVI-SAT to a distance of 1.8 m since the distance of the mass center of the ENVISAT to its +y surface is 5 m. The detailed ranges of all the initial values are presented in Table 1 to form a neighborhood, from which 388 numbers of neighborhood members (initial conditions) are randomly selected.
Researchers determine the coefficients in the performance index according to the mission requirements. If the chaser is required to approach the ENVISAT as fast as possible, k f is set as zero to ensure that the optimal control problem becomes a minimum time control problem. On the contrary, if the chaser is required to approach the ENVISAT with the least fuel consumption, the problem becomes a minimum control problem, and k t is set as zero.
Here, k f is set to be 2/3 and k t is set to be 1, which means that both the fuel consumption and approach time are considered.
For each neighborhood member, an optimal control sequence and the corresponding optimal trajectory are determined using the GPM. Each trajectory consists of 40 discrete points. The 388 optimal relative orbit trajectories are shown in Figure 2. A sample relative orbit trajectory (m = 12) along with the attitude of the chaser is shown in Figure 3.
Note that not all the optimal trajectories and corresponding control sequences are used to train the networks. 26 trajectories of bad quality are eliminated so that only 362 trajectories are used to train the networks.

Manipulator
Control. The manipulator installed on the chaser spacecraft is shown in Figure 4. The base coordinate system is termed S m (more specifically termed S lm /S rm for the left and right manipulators). The modified Denavit-Hartenberg (D-H) method is used to establish the multibody kinetics of the manipulator, and the D-H parameters are listed in Table 2. The initial joint angles of the folded manipulators are listed in the last column of Table 2.
A common deployment strategy is designed by using the following defined desired joint angular accelerations: By integrating Equation (16), the joint angular velocities can be obtained, and by integrating it again, the joint angles can be obtained. By determining _ ω c , the joint torques T at any time during the deployment can be calculated by using Equation (3).
After deployment, a Proportional-Differential (PD) control algorithm is designed to control the joints of the manipulators. The calculation of the desired joint angles can be divided into the following 4 steps: Step 1. Let θ 3 = θ 3const , and calculate the angles of the other joints as 1 θ i .
The desired angle of joints 4, 2, and 1 can be calculated as follows: The rotation matrix M 04 can be derived from the abovementioned desired angles of joints 1, 2, 3, and 4. M 07 can be obtained according to the desired position and attitude of the manipulator end effector. Thus, M 47 (transformation matrix between the coordinate systems of links 4 and 7) can be  : Subsequently, the desired angles of joints 5, 6, and 7 can be calculated using the components of M 47 as follows: or θ 5 = tan −1 −g 33 −g 13 , For the attitude of the end effector of a certain manipulator, 2 groups of values of θ 5 , θ 6 , and θ 7 are calculated using Equations (20) and (21). For the position of the end effector of a certain manipulator, 4 groups of values of θ 1 , θ 2 , θ 3 , and θ 4 are calculated using Equation (17). Together, 8 groups of desired joint angles are available. One group of joint angles is chosen as the desired joint angles to minimize the sum of the rotation angles from the current angles to the desired angles.
Step 3. Linearize the equation for the manipulator in intervals ½ 1 θ i , 2 θ i . The variation of the angle of joint 3 Δθ 3best can be optimized by using linear programming.
The desired angular velocity of each joint can be calculated by using the Moore-Penrose inverse of the Jacobian matrix JðθÞ as follows: where v e is the desired velocity of the manipulator end effector and ω e is the angular velocity of the manipulator end effector.
Since the desired angle and angular velocity of each joint have been calculated, the PD control law is employed to solve for the joint torque as follows: Twelve neighborhood members are randomly selected from the 388 neighborhood members. Each member of the manipulator neighborhood is calculated, and all the trajectories of the joint angle varying with time are obtained. Taking joint 2 of the left manipulator as an example, the optimal joint angle trajectories of all the 12 neighborhood members are determined, as shown in Figure 5.

Feedforward Compensation and Iterative Optimization.
A recursive form of the multirigid body dynamic model can be established using the Newton-Euler method as follows:   When i = 0, Γ 0 = ½ f 0 , m 0 T denotes the transferring force and moment between the manipulator base and the chaser body. By calculating and applying these values inversely as the feedforward to the chaser orbit and attitude control, the control variables of the chaser can be updated as follows: Since the disturbance is estimated, it cannot be precisely eliminated by using the feedforward compensation using uðtÞ. The possible influences may include the deviation of the optimal trajectory, which makes uðtÞ not optimal, as shown in Figure 6. The orbit and attitude motion trajectory is assumed as xðt 0 Þ, ⋯, xðt i Þ, ⋯, xðt f Þ (solid line) by assigning the terminal state xðt f Þ when the manipulators do not move. When the manipulators move and finish the capturing, the orbit and attitude motion trajectory is modified to x * ðt 0 Þ, ⋯, x * ðt i Þ, ⋯, x * ðt f Þ (dash line) with the terminal state x * ðt f Þ. One or more iterations are performed to update uðtÞ to make x * ðt f Þ closer to xðt f Þ. Considering that Δx * ðt f Þ = x * ðt f Þ − xðt f Þ, the iteration stops when Δx * ðt f Þ is less than a preset constant value, and the optimal orbit and attitude trajectory x ′ ðt f Þ (dot-dash line) is obtained. The corresponding optimal control sequence is UðtÞ = u ′ ðtÞ + Γ 0 ðtÞ.

Learning the Optimal Control
A series of RBF networks are chosen to reduce the optimization time consumption while maintaining the solution to be optimal or at least suboptimal. The network input and output are obtained from the optimal solutions and their initial values of the state variables. By assuming the number of neighborhood members as m and the line number of state sequence X of each neighborhood member as n, the number of lines of all the neighborhood members can be defined as m · n. Considering xðtÞ and uðtÞ as the t th lines of XðtÞ and UðtÞ, respectively, xðtÞ and uðtÞ together form a sample, where xðtÞ is the input and uðtÞ is the output.
It is unrealistic to directly provide all the xðtÞ and uðtÞ as samples to train a single large neural network, and decoupling the input and output according to the motion characteristics of the chaser is necessary. In this case, the motion of the chaser can be divided into 3 parts: the orbit motion, attitude motion, and manipulator motion. Each part can be further divided according to the dimensions.

Coordinate Transformation.
The optimal trajectories and the corresponding optimal controls form the so called "stateaction pairs" and are supposed to be used to train networks. However, the RBF networks cannot be directly trained by using these state-action pairs because the state variables are anisotropic, which means that the same relative trajectory has different relative state variables in different directions. To eliminate the anisotropy, it is necessary to transform the state variables and control variables to a new body-based coordinate system.
Here, a training coordinate system S n is defined as shown by the blue line in Figure 7. The x axis of S n coincides with the x axis of S o . The y axis of S n is the projection of the rotation axis of the ENVISAT on the yoz plane of S o . The z axis can be determined using the right-hand rule. The rotation matrix between the training coordinate system and the orbital coordinate system is L no . By transferring the inputs and outputs to S n , the anisotropy can be eliminated, and the regulations between the inputs and outputs can be more easily learned.

Orbit Control
Networks. Three RBF neural networks N F x , N F y , and N F z are designed according to the three-axis motion of the chaser orbit. These networks are supposed to provide the three-axis control forces by providing certain inputs. x(t i ) x ⁎ (t f ) Figure 6: Iterative optimization process. Therefore, it is vital to determine the factors that affect the decision to implement the control forces. The instant relative position ΔrðtÞ and relative velocity ΔvðtÞ play a dominant role in the decision, while the terminal relative position Δrðt f Þ and relative velocity Δvðt f Þ may influence the decision, especially when the chaser is close to the terminal position. The maximum relative velocity Δv max affects the decision of the acceleration or deceleration. Hence, rðtÞ, vðtÞ, rðt f Þ, vðt f Þ, and Δv max are selected to form n × 5 matrices I F i (i = x, y, z) as the input, and F from uðtÞ is selected to form n × 1 matrices O F i (i = x, y, z) as the output, which can be formulated as The training procedure for the orbit control networks is shown in Figure 8. First, I F i and O F i are transferred to the training coordinate system while neglecting the orbital angular velocity. Next, both I F i and O F i are divided into a training dataset and a testing dataset after normalization. The networks are trained using the training dataset and tested using the testing dataset. The trained networks onboard provide the three-axis forces by inputting the appropriate values from the sensors.
During the normalization procedure, the inputs and outputs are both normalized to the [-1, 1] interval. M F in~i s defined as the overall dataset of the input and output data after normalization, σ F i ∈ ½0, 1 as the test ratio, j = σ F i · n · m as the line number of the training data, and k = 1 − j as the line number of the test data. j lines are randomly selected from M F in~t o form the training dataset M F in train~. The remaining k lines are used to form the test dataset M F in test~. The N F i network can be trained using M F in train~a nd tested using M F in test~a fter setting an appropriate spread speed and training goal.
The output of the trained network when the input is For every moment t k , the deviation can be written as ΔF F in ðt k Þ. Mean square error (MSE) is used to evaluate the training error of the network, which can be calculated as By antinormalizing O F in~, the actual output x, y, z) can be obtained.

Attitude Control Networks.
Similarly, the attitude motion can be controlled by three networks N M x , N M y , and N M z . The training process of N M i (i = x, y, z) is shown in Figure 9. The only difference between the orbit control network and the attitude control network is in terms of the input segment, which contains a process to transform q cha and q tar to relative quaternions Δq and relative angular velocities Δω.
Furthermore, Δq is transferred to the relative Euler angle Δe so that the dimension of xðtÞ can be reduced as follows: Similar to that for the orbit control networks, the selection of the input and output elements for the attitude control networks is determined by analyzing the factors affecting the output moments. Here, the elements selected from x ′ ðtÞ to form the inputs I M x , I M y , and I M z and the outputs O M x , O M y , and O M z are as follows:   International Journal of Aerospace Engineering Note that the attitude control is related not only to the relative attitude states, i.e., Δe i ðtÞ and Δω i ðtÞ, but also to the relative position and velocity states, i.e., Δr i ðtÞ and Δv i ðtÞ in Equation (29), because the orbit and attitude are coupled in relative motion.
The other procedures are similar to those of the orbit control networks and are not repeated here.

Manipulator Joint Control
Networks. Joints 2 and 4 are selected to be controlled by the networks because they exert a considerable influence on the manipulator shape. For a 7-DOF manipulator, once a redundant joint (joint 2 or joint 4) is determined, the other joint angles can be calculated. The left and right manipulator control networks can be designed as N l 4 and N r 4 , respectively. The training process of N l 4 is shown in Figure 10 as an example.
The inputs I l 4 and I r 4 and outputs O l 4 and O r 4 no longer need to be transferred to a certain coordinate system, and they are designed as follows: Note that the outputs of the manipulators are the joint angles θ i , and the joint torques T can be obtained using Equation (23) after determining _ θ i . The other procedures are similar to those of the orbit control networks and are not repeated here.

Training
Process. The training parameters of the orbit, attitude, and manipulator control networks are listed in Table 3. In our investigation, the training performances using the parameters different from those listed in Table 3 either match the training performance using the parameters in the table or exhibit a degradation. The training process of N F x with 50 iterations is shown in the left part of Figure 8, and the training process of the left manipulator joint 4 control network N l 4 with 22 iterations is shown in the right part of Figure 11.
After training, the MSEs of the orbit control networks N F x , N F y , and N F z are 0.499%, 0.788%, and 0.119%, respectively. The MSEs of the attitude control networks N M x , N M y , and N M z are 0.299%, 0.499%, and 0.299%, respectively. The MSEs of the manipulator control networks N l 2 , N l 4 , N r 2 , and N r 4 are 0.084%, 0.099%, 0.085%, and 0.096%, respectively.
The trained networks can be completely described by a series of parameters containing the input dimension n i , output dimension n o , number of neurons n n , input layer weight I w , output layer weight L w , hidden layer weights b 1 and b 2 , maximum value O o max , and minimum value O o min of the normalization range. Once the networks are well trained, the network parameters are loaded into the control computer of the chaser to control its orbit, attitude, and manipulators.

Results and Discussion
5.1. Simulation Case Design. As discussed above, the ENVISAT can be regarded as a 5 m × 10 m × 5 m cube by neglecting its solar array and other deployed devices. The geometric center of the ENVISAT, mass center of the ENVI-SAT, and origin of S tb are considered to coincide for simplicity. According to Gómez and Walker [45], the mass of the ENVISAT is 8211 kg, the moment of inertia is (124801.21, 16979.74, 129180.25) kg·m 2 , and the product of inertia is 0 kg·m 2 . The z axis is the maximum inertia principal axis of the satellite, and ENVISAT is assumed to rotate at an angular velocity of 0.8631°/s [4]. At some point, the instantaneous angular velocity of ENVISAT is assumed to be (0.15, 0, 0.85)°/s and the corresponding nutation angle is 9.6752°, which satisfies the conclusion of Pittet et al. [5]. The instantaneous attitude of ENVISAT is assumed to be (0, 78, 86)°a s if it is "lying" on its side in the orbit. According to Section 2, only the +y surface of the ENVISAT is suitable for capture. Therefore, (0, 5, -0.  A small chaser is designed as a 0:88 m × 1 m × 1 m cuboid with a length of 1.5 m after installing the two unfolded manipulators. The mass of the chaser is 100 kg. The moment of inertia of the chaser is (100, 100, 100) kg·m 2 , and the product of inertia is 0 kg·m 2 . The geometric center of the chaser is fixed with the mass center of the chaser, and it coincides with the origin of S cb . The two manipulators are installed at (0.49, -0.37, 0) m and (0.49, 0.37, 0) m on the +x surface (front surface) of the chaser. The mass and inertia parameters of the manipulator are listed in Table 4 by referring to Papadopoulos and Dubowsky [46] who considered the manipulator as a cylinder for simplicity.
The maximum thrust force of the chaser is 15 N, the maximum control moment is 15 N·m, and the specific impulse is 300 m·s -1 . The maximum joint torque is 40 Nm, and the maximum rotation angular velocity of each joint is 20 rpm.
An overall of six cases listed in Table 5 are designed to evaluate the performance and verify the effectiveness of the network controller. The first three cases are designed to evaluate the performance of the network controller compared to a tracking control strategy of the optimal trajectories without considering measurement uncertainty. The chaser is locating at three different positions expressed in S tb relative to the ENVISAT near (0, 15, 0) m at the initial time. The latter three cases are designed to verify the effectiveness of the network controller under different nutation situations with initial measurement uncertainty. The chaser is locating at the same position expressed in S tb relative to the ENVISAT. In the latter three cases, the ENVISAT has a slightly smaller absolute rotation angular velocity (0.8631°/s compared to 0.8668°/s in the previous three cases). Case 4 simulates the situation in which the ENVISAT has nutation, although it is not considered and measured. Cases 5 and 6 simulate the situations in which the ENVISAT has nutation, but it is not measured accurately at the initial time. The real value of ω t in the table is the instantaneous angular velocity of the ENVI-SAT relative to the inertial coordinate system expressed in S tb at the initial time.
In each case, two different control strategies are applied on the chaser: the traditional control strategy that plans at first and then tracks the optimal control result and the network control strategy proposed in this paper. The following performance indices are used to evaluate the mission performance, they are (i) J t : the overall mission duration, defined by Equation (8) (ii) Δr end = |Δr end |: the relative position between the manipulator end effector and the desired capture    (iv) J f : fuel consumption defined by Equation (7) 5.2. Capture without Measurement Uncertainty. The simulation results of the first three cases are presented in Table 6. The average J t of the traditional control strategy is 34.849 s, and that of the network method is 38.828 s. The average J v of the traditional control strategy is 0.036 m/s, and that of the network method is 0.027 m/s. The average J f of the traditional control strategy is 0.517 kg, and that of the network method is 0.586 kg. Considering that k f is 2/3, k t is 1, and k v is 100 (considerably larger to show the importance of the manipulator end impact and to balance the small magnitude of J v ), the overall J of the tracking control strategy is 27.350, and that of the network controller is 29.171. Compared to the tracking control strategy, the performance loss of the network controller is approximately 6.6%. Since there is no measurement uncertainty in the first three cases, the optimal trajectories planned offline are the true optimal solutions. The tracking result of those optimal trajectories can be regarded as the standard to evaluate the performance of the network controller proposed in this paper. The first three cases indicate that there exist approximately 6.6% performance loss of the network controller. That is, the network controller provides a suboptimal control close to the true optimal control solution. The losses in performance occur due to the following three aspects:   (2) Training errors (commonly evaluated considering the MSE) that reflect the gaps between the output of the networks and the test datasets. The training error can be reduced by selecting the appropriate network parameters, which can be realized with experience by using the Monte Carlo method.
(3) Generalization errors that reflect the prediction precision to the unknown flight conditions. The generalization errors are generally related to the network types and structures. The RBF network is a shallow net-work, and the number of neurons is determined automatically in the training process. The division of the networks according to the dimensions is based on human intelligence. Further research on different kinds of networks and different division methods is in progress.

Capture with Measurement
Uncertainty. The simulation results of the latter three cases are presented in Table 7. The average J t of the traditional control strategy is 36.502 s, and that for the network methods is 41.686 s. The average J v of the traditional control strategy is 0.124 m/s, and that for the network controller is 0.045 m/s. The average J f of the traditional control strategy is 0.517 kg, and that for the network controller is 0.834 kg. Considering that k f is 2/3, k t is 1, and k v is 100, the overall J of the traditional control strategy is 37.252, and that of the network controller is 33.125. Since there exist measurement errors of ω t at the initial time, when the nutation is not measured accurately, the tracking of the offline planned trajectories results in larger Δr end and J v (bold numbers in Table 7). By using the network controller, the average J v is 63.71% lower than that of the tracking control strategy, which means that the capture mission has a lower risk to produce debris. On the contrary, the network controller consumes more time and fuel than those controlled by the traditional control strategy, which can be seen from the average values of J t and J f . Taking the simulation results of case 5 as an example, the relative orbit varying with time expressed in S tb is shown in Figure 12(a). The joint 2 and joint 4 angles of the left manipulator varying with time are shown in Figure 12(b).
The vertical velocities of both the left and right manipulator end effectors relative to the desired capture points in all the latter three cases under different control strategies are shown in Figure 13. When the nutation was neglected, the chaser's manipulator end effectors hit the ENVISAT at a relatively high speed when controlled by the traditional control strategy, as depicted using the red lines in case 4. When the nutation is taken into account but the measurement is not accurate (cases 5 and 6), the chaser's manipulator end effectors still heavily impact the ENVISAT when controlled by the traditional tracking control strategy (red lines), compared to when the chaser is controlled by the networks (blue lines). This indicates that the network controller can conduct a softer capture with less impact.
The relative positions between both the manipulator end effectors and the desired capture points within the capture 14 International Journal of Aerospace Engineering plane at the moment of impact are shown in Figure 14. The dots closer to the origin indicated a smaller capture position deviation. It can be seen from the image that in all the latter three cases, the chaser controlled by the networks exhibits a smaller capture position deviation (blue dots) compared to that of the chaser controlled using the traditional control strategy (red dots). This indicates that the network controller can conduct a more accurate capture action. The total flight history of the chaser in case 5 is shown in Figure 15. Multiple flight moments from the initial time to the final time are shown in Figure 15(a). At the initial time, the chaser is at the right of the ENVISAT, and the ENVISAT rotates approximately around its z axis (not exactly around the z axis because of nutation). The relative flight trajectory between the chaser and the ENVISAT is plotted in the white line. Figure 15(b) shows the deployment, pointing, and capturing procedure of the two manipulators.

Conclusions
This paper proposes a real-time optimal control strategy to capture the ENVISAT in the presence of nutation and measurement uncertainty using artificial intelligence. From the simulation results, the following observations have been obtained. Firstly, in ideal cases, the capture performance of our method is comparable with the tracking control approach. Secondly, the average J v (the relative velocity between the manipulator end effector) is 63.71% lower than that of the tracking control approach, implying the capture mission has a lower risk to produce debris. Thirdly, the solution of the tracking control problem requires an average computational cost of 1128 seconds to generate the policies while our approach is accomplishable in real time. It is noted that the inference time of the neural network is 12.97 milliseconds on ARM (STM32F417) that can be omitted in this case. As a result, our proposed control strategy is particularly suitable to solve a variety of other control tasks where performance and efficiency are both required.

Data Availability
The training sample data and trained networks used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.