Time-Optimal Trajectory Planning for the Manipulator Based on Improved Non-Dominated Sorting Genetic Algorithm II

Abstract: To address the issues of low efficiency and lengthy running time associated with trajectory planning for 6-degree-of-freedom manipulators, this paper introduces a novel solution that generates a time-optimal path for a manipulator while adhering to its kinematic limitations. The proposed method comprises several stages. Firstly, the kinematics of the manipulator are analyzed. Secondly, the manipulator's joint-space path points are interpolated via a quintic B-spline curve. Subsequently, the non-dominated sorting genetic algorithm II (NSGA-II) is improved by applying reinforcement learning to optimize its crossover and mutation probabilities, and the variable neighborhood search (VNS) algorithm is integrated to enhance its local search capability. Finally, the joint increments and running time of the manipulator are optimized using the improved NSGA-II, and the time-optimal trajectory is determined by simulation in MATLAB. Compared with other conventional optimization methods, the proposed approach reduces the total running time by 19.26%, effectively improving the working efficiency of the manipulator.


Introduction
Recent years have seen the rise of manipulators as critical components in numerous industries, as advancements in industrial technology have led to their growing importance. Consequently, the demand for high-performance manipulators has also increased significantly. Research on manipulators primarily includes aspects such as structural design [1], trajectory planning, and trajectory tracking [2,3]. Among these, trajectory planning is crucial in the design process of manipulators.
Many experts and scholars have conducted extensive research on various optimization problems related to manipulators, such as time-optimal [4], energy-optimal [5], impact-optimal [6], and multi-objective optimization [7], and have proposed a variety of solutions. In recent years, research and discussion have increasingly focused on developing efficient trajectory planning methods for manipulators, with a particular emphasis on minimizing the time required to complete a task.
When planning a time-optimal trajectory for manipulators, the goal is to generate a path that satisfies the manipulator's kinematic constraints while also minimizing the time required to complete the task. One approach, proposed in paper [8], is to use multiple quadratic radial basis functions to generate smooth motion trajectories for the manipulator. This method uses two objective functions, one based on the total running time and the other on the square of the jerk, which are minimized to obtain the manipulator's time-optimal trajectory. Experimental results show that this method can generate highly smooth trajectories within a relatively short running time.
Paper [9] proposed a smooth and efficient trajectory planning method by combining an adaptive elitist genetic algorithm with singularity avoidance mechanism and quintic polynomials. The algorithm includes an elite population and an adaptive adjustment mechanism, which improves the recognition ability of the optimal solution. Furthermore, the singularity avoidance mechanism is introduced to prevent singular points in the generated trajectory. The combination of the adaptive elitist genetic algorithm with quintic polynomials generates smooth and efficient motion trajectories.
In paper [10], an enhanced teaching-based optimization algorithm is proposed, which incorporates variable neighborhood search to keep the algorithm out of local optima. The algorithm is further combined with the quintic B-spline function to generate time-optimal motion trajectories.
In paper [11], the particle swarm optimization algorithm was first improved by introducing population exchange operations and a tail-elimination principle to enhance its search ability. Fuzzy reward-penalty theory was then introduced to plan the manipulator trajectory accurately while effectively reducing computational complexity, improving optimization efficiency by minimizing redundant decoupling operations.
An improved adaptive cuckoo search algorithm was presented in paper [12] based on the continuous quintic B-spline curve algorithm for motion trajectory planning of manipulators, which can effectively generate time-optimal joint space smooth trajectories.
Although the papers mentioned above have made positive contributions to time-optimal trajectory planning for manipulators, they did not address the choice of the manipulator's motion path. Specifically, they only achieved time-optimal trajectory planning on a fixed path, without considering whether that path was itself optimal.
In contrast to the previous approaches, paper [13] introduced a method that not only plans the time-optimal trajectory for a manipulator but also searches for the optimal path, considering both aspects simultaneously. Given the start and final points, the intermediate path points and the running time are treated as variables, and the time-optimal trajectory is planned by means of an improved genetic algorithm. Although this method achieved shorter running times compared to other methods, it only utilized a single-objective genetic algorithm that treats running time as the objective function, without considering whether the path itself is the shortest.
This paper presents an improved method for time-optimal trajectory planning of the manipulator. Firstly, a quintic non-uniform rational B-spline (NURBS) curve is used to fit the manipulator trajectory passing through the path points. Then, to address the shortcomings of the traditional fast non-dominated sorting genetic algorithm II (NSGA-II), such as slow convergence and a tendency to become trapped in local optima, the traditional NSGA-II is improved by using a reinforcement-learning algorithm to optimize the crossover and mutation probabilities and by introducing the variable neighborhood search (VNS) algorithm to enhance its local search capability. The improved algorithm and other multi-objective optimization algorithms are evaluated on benchmark test functions to verify the reliability of the improvement. Finally, the improved NSGA-II is used to plan the time-optimal trajectory of the manipulator, taking both path and time optimization into account simultaneously. The decision variables are the path points and their running times. The first objective function is the total joint increment of the six joints of the manipulator along the whole trajectory, the second objective function is the total time of the current path, and a penalty function is introduced to limit the manipulator's angular velocity, angular acceleration, and angular jerk. Through the iterative improvement of NSGA-II, the proposed method generates an optimized trajectory that incorporates both path planning and time optimization. Simulation verification on the PUMA560 in MATLAB demonstrates that the proposed method improves the trajectory optimization of the manipulator compared with other trajectory planning methods.
The remaining sections of this paper are as follows. Section 2 introduces the commonly used PUMA560 manipulator and establishes its model. Section 3 discusses the NURBS interpolation and establishes the interpolation matrix. Section 4 presents the improved NSGA-II and compares it with other algorithms. Section 5 performs trajectory planning for the manipulator and compares it with other methods. The final section presents the conclusion of the paper.

The Manipulator Modelling
The PUMA560 is a six-jointed robot with three joints used for determining the end effector's position, while the remaining three joints determine the wrist orientation. The axis of the first joint is vertical, while the axes of the second and third joints are horizontal and parallel, with a distance of a2 between them. The axes of joints 1 and 2 intersect perpendicularly, while the axes of joints 3 and 4 are perpendicularly offset by a distance of a3. The manipulator's schematic diagram is presented in Figure 1, Figure 2 shows the corresponding link coordinate system, and Table 1 displays the corresponding modified D-H parameters [14].

When solving for the manipulator with joint angles as variables, a coordinate system must be set up for each joint of the manipulator; constructing these coordinate systems requires the parameters in Table 1. The range of rotation of each joint is primarily used to eliminate irrelevant solutions when the manipulator has multiple solutions during the solution process [15]. Using the coordinate system shown in Figure 2 and the data provided in Table 1, each link transformation matrix can be written as Equation (1), in which cθ represents cos θ and sθ represents sin θ. Substituting the PUMA560 data into Equation (1) gives the forward kinematics equation of the PUMA560, Equation (2), which simplifies to Equation (3). The variables in the equation are defined in Equation (4), which describes the position and orientation of coordinate system {6} relative to the base coordinate system {0}, where c23 = cos(θ2 + θ3) and s23 = sin(θ2 + θ3).
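As a concrete sketch of Equations (1)-(3), the modified D-H link transform and the chained forward kinematics can be written as follows. This is a hedged illustration, not the authors' code: `modified_dh` and `forward_kinematics` are hypothetical names, and the `DH` parameter rows are illustrative stand-ins for the values in the paper's Table 1.

```python
import numpy as np

def modified_dh(alpha, a, d, theta):
    """Single-link transform in the modified (Craig) D-H convention,
    as in Equation (1): ct = cos(theta), st = sin(theta)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct,      -st,      0.0,   a],
        [st * ca,  ct * ca, -sa,  -sa * d],
        [st * sa,  ct * sa,  ca,   ca * d],
        [0.0,      0.0,      0.0,  1.0],
    ])

# Illustrative PUMA560-like rows (alpha_{i-1}, a_{i-1}, d_i) in radians/metres;
# the actual Table 1 values from the paper should be substituted in practice.
DH = [
    (0.0,        0.0,    0.0),
    (-np.pi / 2, 0.0,    0.0),
    (0.0,        0.4318, 0.14909),
    (-np.pi / 2, 0.0203, 0.4331),
    (np.pi / 2,  0.0,    0.0),
    (-np.pi / 2, 0.0,    0.0),
]

def forward_kinematics(q):
    """Chain the six link transforms to obtain frame {6} relative
    to the base frame {0}, as in Equation (3)."""
    T = np.eye(4)
    for (alpha, a, d), theta in zip(DH, q):
        T = T @ modified_dh(alpha, a, d, theta)
    return T
```

The returned homogeneous matrix holds the end-effector orientation in its upper-left 3×3 block and the position in its last column, matching the structure of Equation (4).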

The Fundamental Formulas of NURBS
Non-uniform rational B-splines (NURBS) are a mathematical curve representation that describes a curve using a small number of control points and weights. NURBS curves have good local support: changing one control point affects only the adjacent curve segments rather than the entire curve [16]. Using NURBS with a non-uniformly distributed knot vector can effectively prevent abrupt changes. A NURBS curve is represented by a piecewise rational polynomial vector function, Equation (5), where d_i represents the control points, ω_i represents the weight factors of the control points, and N_{i,k} is the B-spline basis function [17], defined by the recursion formulas in Equations (6) and (7). To obtain a trajectory passing through m path points, the NURBS curve requires the control points d_i and a knot vector. Setting u_0 = u_1 = ··· = u_k = 0 and u_n = u_{n+1} = ··· = u_{n+k+1} = 1, the remaining knots are determined from the time intervals h_j between the path points using the chord-length parameterization method [18], where n = m + k − 1.
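The recursive basis definition of Equations (6) and (7) and the rational combination of Equation (5) can be sketched as below for a scalar (single-joint) curve. `bspline_basis` and `nurbs_point` are hypothetical helper names, with the usual convention that 0/0 terms in the recursion are taken as zero.

```python
def bspline_basis(i, k, u, knots):
    """Cox-de Boor recursion for the B-spline basis N_{i,k}(u),
    mirroring Equations (6) and (7); 0/0 terms are treated as zero."""
    if k == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0 if left_den == 0 else \
        (u - knots[i]) / left_den * bspline_basis(i, k - 1, u, knots)
    right = 0.0 if right_den == 0 else \
        (knots[i + k + 1] - u) / right_den * bspline_basis(i + 1, k - 1, u, knots)
    return left + right

def nurbs_point(u, ctrl, weights, k, knots):
    """Evaluate the rational curve of Equation (5): a weighted
    rational combination of the control points."""
    num, den = 0.0, 0.0
    for i, (d, w) in enumerate(zip(ctrl, weights)):
        b = w * bspline_basis(i, k, u, knots)
        num += b * d
        den += b
    return num / den
```

With all weights equal to one, the curve reduces to an ordinary B-spline, which is why NURBS generalizes the quintic B-spline interpolation used here.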

Matrix Expression of Quintic NURBS Curve
Based on the derivative formula of a k-th-degree NURBS curve, the angular velocity, angular acceleration, and angular jerk of the manipulator at parameter u can be obtained from Equation (9), with the derivative coefficients given by Equation (10). To solve the NURBS curve with n + 1 unknown control points, four endpoint constraints must be added so that the system of equations has a unique solution [19]. Therefore, four constraint equations are introduced as Equation (11); they constrain the angular velocity and angular acceleration at the start and final positions of the manipulator. Substituting them into Equation (5) yields a matrix equation for solving all the control points.
The remaining quantities are known numerical values. The displacement equation of the manipulator can be obtained from Equation (5), and the angular velocity, angular acceleration, and angular jerk equations of each joint of the manipulator can be obtained from Equation (9).

The Improved NSGA-II
The NSGA-II optimization method, proposed by Deb et al. in 2002, is an improved version of the NSGA. NSGA-II was developed based on the non-dominated sorting concept of Goldberg and is derived from the genetic algorithm [20]. To distribute the population into different non-dominated levels, NSGA adopts a slow sorting method, and to maintain population diversity, it utilizes the sharing-function method [21]. NSGA-II, by contrast, uses a fast non-dominated sorting method with a relatively low worst-case computational complexity. Practice has shown that NSGA-II improves on NSGA in both optimization effect and computation time and is an excellent multi-objective optimization algorithm [22][23][24].
Some experts and scholars have also made improvements to NSGA-II. In paper [25], the K-means algorithm was used to divide each generation of parent populations into two categories, which were selected for genetic evolution through tournament selection; the differential evolution (DE) algorithm was then integrated to effectively improve the algorithm's search capability. In paper [26], an adaptive NSGA-II (AONSGA-II) was proposed in which the crossover and mutation probabilities change with the iteration number; combined with the 2-opt algorithm, this effectively improved the algorithm's performance.
The genetic evolutionary behavior of NSGA-II mainly depends on the crossover and mutation operations. Because the crossover and mutation probabilities are set randomly, these crucial parameters cannot be adaptively tuned to the current population state during evolution. As a result, when solving large-scale multi-objective task scheduling problems, the algorithm risks getting stuck in local optima and losing computational efficiency, so the efficiency and quality of the solution fail to meet the requirements. In previous algorithms, the crossover and mutation probabilities are often determined through extensive experimentation or prior experience, which is not conducive to preserving excellent individuals in the population. During the evolutionary iterations, to maintain population diversity and preserve individuals with higher fitness, the selection strategy should prioritize retaining such individuals in the population while excluding them from crossover and mutation so that they are not disrupted. Individuals with relatively lower fitness should have a higher probability of participating in crossover and mutation so that the population evolves in a better direction. Reinforcement learning focuses on the autonomous interaction between an agent and its environment, without external interference, and thus provides a new solution for the self-learning of evolutionary parameters.

Reinforcement-Learning Algorithm
Reinforcement learning is a trial-and-error learning method in which an agent interacts with the environment and receives rewards that guide its behavior towards maximizing the total reward obtained over a sequence of actions. Reinforcement learning differs from supervised learning in that the reinforcement signal is a scalar evaluation, provided by the environment, of how good the chosen actions were, rather than an instruction on how to produce the correct action. Since the external information provided by the environment is limited, the reinforcement-learning system (RLS) learns from its own experience by interacting with the environment and adjusting its actions based on the rewards received. In this way, the RLS acquires knowledge about the action-evaluation environment and adapts its strategies accordingly. SARSA and Q-learning are the two main reinforcement-learning algorithms used to train the value function Q [27]. The Q-table is initialized as a matrix whose numbers of rows and columns equal the numbers of states and actions, respectively, with all entries set to zero. As shown in Equation (13), the feedback on actions is updated based on the environment and state.

Design Q-Table
Q-learning is a model-free method that does not require previous state values and can be executed in a stochastic dynamic environment. It updates the Q-table based on the maximum expected reward using Equation (14). SARSA, by contrast, is an on-policy variant that trains the action-value function using Equation (15). Both algorithms have their own advantages: the experimental results presented by Sutton and Barto [28] showed that SARSA converges faster, while Q-learning achieves better final performance. Furthermore, paper [29] showed that combining SARSA and Q-learning can improve the convergence of a single-objective genetic algorithm. In this paper, we adopt the SARSA algorithm in the initial phase and switch to the Q-learning algorithm in the subsequent phase, as shown in Equation (16).
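Under the assumption of a dictionary-based Q-table, the two updates of Equations (14) and (15) can be sketched as follows; the default α and γ match the settings reported later in the paper (α = 0.06, γ = 0.85), and the hybrid scheme of Equation (16) would simply call `sarsa_update` in early generations and `q_learning_update` afterwards.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.06, gamma=0.85):
    """Off-policy update (Equation (14)): bootstrap on the best
    action available in the next state."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.06, gamma=0.85):
    """On-policy update (Equation (15)): bootstrap on the action
    actually taken in the next state."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
```

Here `Q` maps each state to a dict of action values; any other Q-table representation (e.g. a 2-D array) works the same way.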

Design States
The objective of using the reinforcement-learning algorithm is to find the optimal solution for the NSGA-II while considering the diversity and balance of the optimization process. The design of the environment state should consider the fitness of the population as well as other factors such as the crowding distance. The construction of the environment state mainly considers the following aspects. Equation (17) obtains the fitness value of an individual by weighting the objective functions, where gen represents the iteration number, w_i is the weight parameter determined by the requirements of each objective function, f(x_i^gen) represents the linear weighted sum of the objective functions of individual i in the gen-th population, P[N_gen]_dis is the crowding distance of an individual in the gen-th population, balance represents the balance degree of each individual, N is the population size, M is the number of objectives, and F represents the non-dominated layers. Equation (18) normalizes the average fitness of the population by that of the first generation to reflect the convergence of the population. Equation (19) gives the total crowding distance of the population in each generation. Equation (20) converts the crowding distance into a minimization value, where a smaller value corresponds to a larger crowding distance and thus greater population diversity. Equation (21) gives the balance degree of an individual; a smaller balance degree indicates a more uniform distribution over the objectives and a better multi-objective balance. Equation (22) gives the balance degree of the population. Equation (23) obtains the total state value of the population by weighting.
Since the average fitness and diversity of the population are beneficial to improving the quality of the entire population and obtaining excellent individuals, the three weights are designed as 0.4, 0.4, and 0.2.

Designing Actions
Actions refer to the responses made by the agent to the current environmental state. For each generation of the population, the agent takes different actions based on the environmental state, namely setting p_c and p_m. For p_c, the range 0.4 to 0.9 is divided into sub-intervals of width 0.05; when, for example, the sub-interval 0.4 to 0.45 is selected, a random number is chosen within that range. The same method is applied to p_m, whose range of 0.01 to 0.21 is divided with an interval of 0.02 [29].
The action set for crossover probability is shown in Table 2, and the action set for mutation probability is shown in Table 3.
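Following the ranges and interval sizes stated above (Tables 2 and 3), the action sets can be generated programmatically. `build_actions` and `sample_action` are hypothetical helpers; executing an action draws a random value inside the chosen sub-interval, as the text describes.

```python
import random

def build_actions(low, high, step):
    """Discretize a probability range into equal sub-intervals;
    each action corresponds to one sub-interval."""
    n = round((high - low) / step)
    return [(low + i * step, low + (i + 1) * step) for i in range(n)]

# Crossover probability p_c: 0.4-0.9 in steps of 0.05 (Table 2);
# mutation probability p_m: 0.01-0.21 in steps of 0.02 (Table 3).
PC_ACTIONS = build_actions(0.40, 0.90, 0.05)
PM_ACTIONS = build_actions(0.01, 0.21, 0.02)

def sample_action(actions, index):
    """Executing an action draws a uniform random probability
    inside the selected sub-interval."""
    lo, hi = actions[index]
    return random.uniform(lo, hi)
```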

Designing Reward Functions
The agent does not take actions arbitrarily but continuously tries to determine which actions yield higher rewards, steering the algorithm in better directions. The crossover probability p_c is adjusted by comparing the overall states of successive generations using Equation (24), while the mutation probability p_m is adjusted by comparing the crowding distances of successive generations using Equation (25). As the objective functions in this paper are to be minimized, the reward is positive if the population in generation gen is better than that in generation gen − 1, and negative otherwise.

Movement Selection Strategy
At the beginning of the algorithm, the agent has no learning experience to draw on, and all the entries in the Q-table are zero. Therefore, the agent can only explore and learn through trial and error. By constantly exploring the unknown environment, the agent accumulates more and more experience and uses this knowledge to guide its action selection. An ε-greedy formula that balances exploration and exploitation is shown in Equation (26), where ε is the greedy rate and k_{0−1} is a random number between 0 and 1. When selecting an action, the agent can either choose the action that maximizes the Q value, which is called the greedy policy, or perform exploration and choose a random action.
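One common form of the ε-greedy rule of Equation (26) can be sketched as below, under the assumption that the "greedy rate" ε is the probability of exploiting the best-known action (i.e., k_{0−1} < ε selects the greedy action); the paper's equation fixes the exact comparison.

```python
import random

def epsilon_greedy(q_row, epsilon=0.6):
    """Return an action index for one Q-table row: exploit the
    best-known action with probability epsilon, otherwise explore."""
    if random.random() < epsilon:
        # greedy policy: first action with the maximum Q value
        return max(range(len(q_row)), key=lambda a: q_row[a])
    # exploration: uniformly random action
    return random.randrange(len(q_row))
```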

VNS Algorithm
The variable neighborhood search (VNS) is a metaheuristic algorithm proposed by Mladenović and Hansen in 1997 [30]. The VNS algorithm expands the search space to obtain a local optimum by changing the neighborhood structure.

VNS Algorithm Concept
The basic idea of the VNS algorithm is to use multiple different neighborhood structures to search the solution space systematically: first obtain a local optimal solution, then change the neighborhood structure to explore a broader search space and discover another optimum. The algorithm starts the search with the smallest neighborhood structure. If the result cannot be further improved, the algorithm switches to a slightly larger neighborhood structure; whenever an improved result is found, the algorithm returns to the smallest neighborhood structure and continues the search.

Neighborhood Design
Considering that the optimization objectives of this paper are the joint increments and the total time of the manipulator, the time-optimal trajectory can be obtained by minimizing the joint increments while reducing the total time as much as possible. To achieve this goal, five neighborhoods are designed as follows:
1. Randomly select a displacement variable and subtract a certain value from it.
2. Randomly select two time variables and exchange them.
3. Randomly select a time variable and subtract a certain value from it.
4. Randomly select a displacement variable and multiply it by a random number.
5. Randomly select a time variable and multiply it by a random number.
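The five neighborhoods above can be sketched as a single `shake` operator. This is an illustrative implementation: the paper does not specify the magnitude of the "certain value" to subtract, so the shrink amounts below are assumptions, and the solution encoding as a (displacements, times) pair is also assumed.

```python
import random

def shake(solution, k, rng=random):
    """Apply the k-th neighborhood (1-5) to a candidate solution,
    assumed to be a pair (displacements, times)."""
    disp, times = [list(v) for v in solution]
    if k == 1:                      # 1: shrink one displacement variable
        i = rng.randrange(len(disp))
        disp[i] -= rng.uniform(0.0, abs(disp[i]))
    elif k == 2:                    # 2: swap two time variables
        i, j = rng.sample(range(len(times)), 2)
        times[i], times[j] = times[j], times[i]
    elif k == 3:                    # 3: shrink one time variable
        i = rng.randrange(len(times))
        times[i] -= rng.uniform(0.0, times[i] * 0.5)
    elif k == 4:                    # 4: rescale one displacement variable
        i = rng.randrange(len(disp))
        disp[i] *= rng.random()
    elif k == 5:                    # 5: rescale one time variable
        i = rng.randrange(len(times))
        times[i] *= rng.random()
    return disp, times
```

A VNS loop would call `shake` with k = 1, increase k when no improvement is found, and reset k to 1 after each improvement.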

Flow Chart of the Improved NSGA-II
Combining reinforcement learning with NSGA-II requires specifying rewards, states, actions, and a selection policy. The state of NSGA-II is viewed as the environment state, and each iteration updates the state from S_gen to S_gen+1. The learning module consists of a reinforcement-learning agent and a reward function R. The overall execution process involves four steps. Firstly, the agent acquires the environmental state S_gen at the gen-th generation and selects an action a_i based on the action selection policy. The agent adjusts p_c and p_m by executing the action, and NSGA-II uses the adjusted p_c and p_m for its genetic operations. After the evolution operation is completed, the environmental state is updated from S_gen to S_gen+1. Finally, the reward function is used to calculate the reward and update the Q-table values for the corresponding state and action. After gen iterations, the action selections for p_c and p_m are optimized based on past learning experience. The flowchart is shown in Figure 3.

Comparative Verification
This paper selects four multi-objective test functions, ZDT1, ZDT2, ZDT3, and ZDT6, to assess the effectiveness of the improved NSGA-II [31]; they are shown in Table 4. These functions cover convex, concave, and discontinuous features, which can effectively evaluate the comprehensive performance of most multi-objective evolutionary algorithms. The performance of the proposed algorithm is compared with other multi-objective optimization algorithms: the multi-objective particle swarm optimization algorithm (MOPSO), the multi-objective evolutionary algorithm based on decomposition (MOEA/D), the strength Pareto evolutionary algorithm 2 (SPEA2), KNSGA-II [25], and AONSGA-II [26].
The standard NSGA-II is set with population size N = 100, iteration count gen = 300, crossover probability p_c = 0.8, and mutation probability p_m = 0.1. The improved NSGA-II is set with population size N = 100, iteration count gen = 300, learning rate α = 0.06, discount rate γ = 0.85, and greediness rate ε = 0.6.
To avoid making the article too lengthy, only the Pareto optimal frontiers of the traditional NSGA-II and the improved NSGA-II are shown in Figure 4 for each test function. Other optimization algorithms were only compared in terms of the mean inverted generational distance (IGD) and hypervolume (HV) values, as shown in Tables 5 and 6.


The IGD and HV mean values obtained from running each function ten times are shown in Tables 5 and 6, with the best results shown in bold. To ensure a fair comparison, the same population size and number of iterations were used for the other multi-objective algorithms, and the remaining parameters were set to commonly used values.
According to the above results, although the MOEA/D algorithm has a lower average IGD value on the ZDT6 function, it does not demonstrate greater advantages on other test functions. KNSGA-II has higher average HV values, but its IGD results are not very good. Through comparison, it can be found that the proposed improved NSGA-II has better comprehensive performance and better search capability than other optimization algorithms, providing a foundation for the subsequent trajectory planning and enabling the search for better trajectories.
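For reference, the IGD indicator used in Tables 5 and 6 measures the mean distance from each point of a reference Pareto front to the nearest obtained solution (lower is better); a minimal sketch, with `igd` a hypothetical helper name:

```python
import math

def igd(reference_front, obtained_front):
    """Inverted generational distance: average, over the reference
    front, of the distance to the nearest obtained point."""
    total = 0.0
    for r in reference_front:
        total += min(math.dist(r, p) for p in obtained_front)
    return total / len(reference_front)
```

HV, by contrast, measures the objective-space volume dominated by the obtained front relative to a reference point, so higher HV is better while lower IGD is better.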


Model Building
The proposed method's feasibility was demonstrated by implementing it on a PUMA560 manipulator model in MATLAB, using the data presented in Table 1; the model is shown in Figure 5. Considering the working performance and limits of each joint of the manipulator, the kinematic constraints are given in Table 7. To satisfy these constraints, penalty functions were introduced on each objective function to ensure that the motion trajectory meets the kinematic constraints, and the fitness function is designed as shown in Equation (27). In Equation (27), u represents the number of control points, f(1) represents the total joint increment of the six joints of the manipulator along the current motion trajectory, P_i,j represents the joint increment of the i-th joint at the j-th control point of the quintic NURBS curve, f(2) represents the total running time of the manipulator on the current trajectory, and E_a, E_b, and E_c are penalty functions for the fitness functions. When any joint exceeds a kinematic constraint, judged against the limits p_min, p_max, V_max, a_max, and J_max from Tables 1 and 7, the penalty function is activated: E_b and E_c each add fixed penalties of 1000° and 5 s, respectively, to the fitness values; otherwise E_a = 1.
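A simplified sketch of the penalty scheme around Equation (27), under the assumption that each constraint violation adds a fixed penalty of 1000° to the joint-increment objective and 5 s to the time objective; `penalized_objectives` is a hypothetical name, and the exact activation logic of E_a, E_b, and E_c follows the paper's equations.

```python
def penalized_objectives(joint_increment, total_time, violations):
    """Return the two penalized objective values:
    f1 = total joint increment (degrees), f2 = total running time (s).
    Each detected kinematic-constraint violation adds the assumed
    fixed penalties of 1000 degrees and 5 s."""
    penalty_f1 = 1000.0 * violations  # degrees added to f1 per violation
    penalty_f2 = 5.0 * violations     # seconds added to f2 per violation
    return joint_increment + penalty_f1, total_time + penalty_f2
```

Because NSGA-II minimizes both objectives, any violating trajectory is pushed far behind feasible ones in the non-dominated sorting, which is the intended effect of the penalty terms.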

Experimental Simulation
According to the parameters of PUMA560 in Table 1, the algorithm is set with population size N = 100, iteration count gen = 300, learning rate α = 0.06, discount rate γ = 0.85, and greediness rate ε = 0.6. The improved algorithm was used to solve for the time-optimal trajectory of the PUMA560 manipulator; the resulting path points and times are shown in Table 8. Based on the data in Table 8, the angular displacement, angular velocity, angular acceleration, and angular jerk of the manipulator are plotted in Figure 6. As shown in Table 8 and Figure 6, the optimal time for the manipulator is 5.899 s, and the resulting angular displacement, angular velocity, angular acceleration, and angular jerk curves are smooth, continuous, and all within the constraint range.


Comparative Experiment
Through comparative experiments, it has been verified that the proposed method can achieve a shorter running time for the manipulator than other methods. The comparative experiments all use quintic NURBS curves to fit the path points of the manipulator; the difference is that a single-objective genetic algorithm is used to find the optimal time on both a uniform path and a random path. Specifically, a uniform path is one in which the points between the start and final points increase or decrease uniformly, whereas a random path consists of arbitrary points that satisfy the constraints given in Table 7. The experimental results are shown in Table 9.

According to the data in Table 9, three trajectory planning images were plotted using the PUMA560 model generated from Figure 5, as shown in Figure 7. The angular displacement, angular velocity, angular acceleration, and angular jerk curves for the random path are plotted in Figure 8, and the corresponding curves for the uniform path are plotted in Figure 9. The data in Table 9 show that the new time-optimal trajectory planning method proposed in this paper reduces the total running time by 46.23% compared with the traditional random-path (fixed-path) method and by 19.26% compared with the uniform-path method, effectively improving the efficiency of the manipulator. The four images in Figure 6 indicate that the obtained curves satisfy the kinematic constraints in Table 7 and are uniform and continuous, demonstrating the effectiveness of the generated trajectory. Comparing the three trajectory planning methods in Figure 7 shows that the proposed method passes through the start and final points, does not encounter singular points, and has a running trajectory similar to that of the uniform-path method (which has the shorter running trajectory), demonstrating the effectiveness and rationality of the proposed method.
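The two baseline path types described above can be made concrete with a short sketch. This Python snippet is illustrative only: the joint values, joint limits, and number of path points are placeholders, not the Table 7 constraints or the experimental data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative start and final joint configurations (deg) for three joints.
q_start = np.array([0.0, -30.0, 60.0])
q_final = np.array([45.0, 20.0, -10.0])
n_points = 7  # total path points, including the start and final points

# Uniform path: intermediate points increase or decrease uniformly
# between the start and final points.
uniform_path = np.linspace(q_start, q_final, n_points)

# Random path: arbitrary interior points drawn inside illustrative joint
# limits (stand-ins for the Table 7 constraints); the start and final
# points are kept fixed.
q_low = np.array([-90.0, -90.0, -90.0])
q_high = np.array([90.0, 90.0, 90.0])
random_path = rng.uniform(q_low, q_high, size=(n_points, 3))
random_path[0], random_path[-1] = q_start, q_final
```

In the comparative experiments, either path is then fitted with a quintic NURBS curve and its total time optimized by a single-objective genetic algorithm; only the proposed method optimizes the interior path points and the timing jointly.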
In conclusion, the new time-optimal trajectory planning method proposed in this paper offers clear advantages over conventional methods.
To verify the search capability of the improved NSGA-II, trajectory planning for the manipulator was also performed using NSGA-II, MOPSO, MOEA/D, SPEA2, KNSGA-II [25], and AONSGA-II [26]. The running times obtained by the different algorithms are shown in Table 10, and the corresponding trajectory planning curves are plotted in Figures 10-13.

Based on the data in Table 10 and the trajectory curves in Figures 10-13, all algorithms satisfy the kinematic constraints in Table 7, validating the constraining effect of the penalty function on the results. Comparing the results shows that the traditional NSGA-II achieves shorter times than the other optimization algorithms, demonstrating that NSGA-II is better suited to the problem presented in this paper. KNSGA-II and AONSGA-II exhibit better search capabilities than the traditional NSGA-II and therefore achieve shorter times. However, the improved NSGA-II proposed in this paper optimizes the total running time to 5.899 s, which is 38.4% and 54.4% shorter than KNSGA-II (9.591 s) and AONSGA-II (12.939 s), respectively. This further confirms that the improved NSGA-II possesses superior search capability.
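The reinforcement-learning adaptation of the genetic operators can be sketched as follows. The state definition, action set, reward signal, and probability bounds below are illustrative assumptions, not the paper's exact design; only the hyperparameters (α = 0.06, γ = 0.85, ε = 0.06) follow the settings stated earlier.

```python
import random

# Illustrative Q-learning sketch for adapting the crossover probability pc
# (the paper tunes both the crossover and mutation probabilities).
ACTIONS = [-0.05, 0.0, +0.05]        # decrease, keep, or increase pc
ALPHA, GAMMA, EPSILON = 0.06, 0.85, 0.06

q_table = {}                         # (state, action_index) -> Q-value

def choose_action(state):
    """Epsilon-greedy selection over the adjustment actions."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = [q_table.get((state, i), 0.0) for i in range(len(ACTIONS))]
    return values.index(max(values))

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(q_table.get((next_state, i), 0.0)
                    for i in range(len(ACTIONS)))
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

# One illustrative step: the reward is taken as the drop in the best
# running time after a generation, the state as a coarse progress bucket.
pc = 0.8
state, next_state = 0, 1
a = choose_action(state)
update(state, a, reward=0.12, next_state=next_state)
pc = min(1.0, max(0.1, pc + ACTIONS[a]))  # keep pc in a valid range
```

In the improved NSGA-II, such an update runs once per generation, so the crossover and mutation probabilities are adjusted online as the search progresses rather than being fixed in advance.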

Conclusions
The main work of this paper is to use an improved NSGA-II to simultaneously optimize the path points and total running time of the manipulator in order to obtain the time-optimal trajectory. However, the method cannot guarantee that the obtained trajectory is the shortest path of the manipulator.
Firstly, a uniform and continuous trajectory of the manipulator is obtained by quintic NURBS curve interpolation. Then, the reinforcement-learning algorithm and the VNS algorithm are introduced to enhance the search capability of the algorithm. Subsequently, the improved NSGA-II is used to optimize the joint increments and running time of the manipulator. Finally, the correctness of the results is verified using the PUMA560 manipulator in MATLAB and compared with other optimization methods. The results show that the proposed method achieves a shorter running time.
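The VNS component mentioned above can be sketched in a few lines. This is a generic basic-VNS skeleton under illustrative assumptions (a one-dimensional toy objective and two neighborhood sizes), not the paper's exact neighborhood design for joint-space solutions.

```python
import random

def vns(initial, objective, neighborhoods, max_iter=200, seed=1):
    """Basic variable neighborhood search: cycle through neighborhood
    structures, returning to the first whenever a move improves."""
    random.seed(seed)
    best, best_val = initial, objective(initial)
    for _ in range(max_iter):
        k = 0
        while k < len(neighborhoods):
            candidate = neighborhoods[k](best)
            val = objective(candidate)
            if val < best_val:
                best, best_val = candidate, val
                k = 0          # improvement: restart from first neighborhood
            else:
                k += 1         # no improvement: try a larger neighborhood
    return best, best_val

# Illustrative use: minimize a 1-D function with two perturbation sizes.
obj = lambda x: (x - 3.0) ** 2
shakes = [lambda x: x + random.uniform(-0.1, 0.1),
          lambda x: x + random.uniform(-1.0, 1.0)]
x_best, v_best = vns(10.0, obj, shakes)
```

In the improved NSGA-II, such a search is applied to promising individuals after the genetic operators, which strengthens local exploitation without changing the non-dominated sorting framework.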
The main contributions of this paper are the improvement of NSGA-II and a new time-optimal trajectory planning method for the manipulator, which seeks a shorter total running time while minimizing the joint increments. The comparative experiments show that the improved NSGA-II has better overall performance than the other optimization algorithms, and that the proposed method effectively reduces the total running time and improves the working efficiency of the manipulator compared with traditional methods.

While this paper proposes an effective trajectory planning method, some limitations remain. Specifically, the method does not ensure that the global optimal path is found and cannot guarantee the optimal running time of the trajectory; both depend heavily on the search capability of the optimization algorithm. Future work should therefore improve the search capability of the algorithm and design more suitable fitness functions so that the obtained path is the shortest and the obtained time is optimal.

Data Availability Statement: All data generated or analyzed during this study are included in this published article.