Research on improved ant colony optimization for traveling salesman problem

: As one of the most popular combinatorial optimization problems, Traveling Salesman Problem (TSP) has attracted lots of attention from academia since it was proposed. Numerous meta-heuristics and heuristics have been proposed and used to solve the TSP. Although Ant Colony Optimization (ACO) is a natural TSP solving algorithm, in the process of solving it, there are also some shortcomings such as slow convergence speed and prone to fall into local optimum. Therefore, this paper proposes an improved ant colony optimization based on graph convolutional network: Graph Convolutional Network Improved Ant Colony Optimization (GCNIACO). The graph convolutional network is introduced to generate a better solution, and the better solution is converted into the pheromone on the initial path of the ACO. Thereby, the guiding e ﬀ ect of the pheromone concentration for the ants at the beginning of the algorithm is enhanced. In the meantime, through adaptive dynamic adjustment of the pheromone volatility factor and the introduction of the 3-opt algorithm, the algorithm’s ability to jump out of the local optimum is enhanced. Finally, GCNIACO is simulated on TSP datasets and engineering application example. Comparing the optimization results with other classical algorithms, it is veriﬁed that the graph convolutional network improved ant colony optimization has better performance in obtaining the optimal solution.


Introduction
This kind of optimization problem that finds the optimal solution in a finite set of feasible solutions is called combinatorial optimization problem. With the wide application in various industries and the rapid development of computer technology, the combinatorial optimization problem has developed into an independent branch of operational research. The research questions involve financial investment [1], ecological environment [2], medical biology [3], logistics management [4], transportation [5], industrial engineering [6], medical image [7] and many other fields. Traveling salesman problem (TSP) is one of the most popular combinatorial optimization problems in recent years, and has been widely used as a benchmark for various optimization techniques and meta-heuristic searches. The traveling salesman problem was proposed in 1930, its goal is to find the shortest path to visit each city exactly once and return to the starting point city based on a given list of cities and the distance between cities. Since the feasible solution of this problem is the full permutation of all vertices, as the number of vertices increases, combinatorial explosion will occur, so this is also an NP-hard problem. Accurate algorithms such as cutting plane, branch and bound, the shortest spanning tree and the 2-matching can be used to solve the exact solution of the TSP [8]. However, when the traveling salesman problem is too large, the accuracy of the data sample is not enough, the problem target conflicts, etc., the accurate algorithm runs for a long time and it is difficult to obtain satisfactory results. Therefore, scholars use meta-heuristic and heuristic algorithms to effectively solve such problems. At this time, the algorithm does not insist on an accurate optimal solution, but finds an acceptable optimal solution in a reasonable time.
In the meta-heuristic algorithm, the principle and mechanism of ant colony optimization (ACO) [9] make it a natural traveling salesman problem solving algorithm. When ants are in search of food, they transmit information between individuals by releasing pheromone along the path they travel. At the same time, the ants will choose paths with a higher concentration of pheromone to move. With the continuous search of the ant colony, the shorter the path, the more ants will pass, and the higher the pheromone concentration left on the path, prompting more ants to choose this path to move, thus looping to form a positive feedback mechanism. This allows the ant colony to quickly find the shortest path to find food. In 1991, Italian scholar Dorigo et al. just imitated this foraging behavior of ant colonies in the biological world, and proposed a swarm intelligent bionic optimization algorithm: Ant Colony Optimization (ACO). But the ACO itself also has certain defects and shortcomings. With the continuous in-depth development of research, scholars have proposed many improvements and algorithm fusion schemes. Good results have been achieved and many practical problems in different situations have been solved.
With the rise of machine learning, especially the development of graph neural networks, the learning-based approach is also effective in solving the TSP. The motivation of the paper is that although the ACO has defects and deficiencies, it is possible to achieve good results after improvement. At the same time, the graph convolutional network also has the ability to solve the TSP. Therefore, the article proposes a new improved ant colony optimization based on graph convolutional network: Graph Convolutional Network Improved Ant Colony Optimization (GCNIACO). Aiming at shortcomings of the lack of pheromone and slow convergence in the initial stage of the ACO, the graph convolutional network is introduced to generate a better solution, and this better solution is converted into the initial pheromone of the ant colony through a pheromone conversion strategy. Thereby, the initial solving speed of the algorithm has been improved. Then, aiming at the shortcomings of ant colony optimization that it is easy to stagnation and fall into local optimum in the later stage of iterative stage, the ability of the algorithm to jump out of local optimum is enhanced by dynamically adjusting the pheromone volatility factor and introducing the 3-opt algorithm. A new discrete optimization method based on Jaya algorithm and transformation operators is proposed.
In terms of mean, standard deviation and relative error, the results show that DJAYA is competitive and alternative.
Inferior to DSTAII in some cases. D-GWO [11] Combined with the 2-opt algorithm, a new discrete GWO algorithm is proposed.
D-GWO is able to provide adequate and comparable solutions for symmetric TSP.
Poor performance on instances kroc100 and kroe100.
DHOA [12] A new hybrid optimization algorithm is developed by integrating the excellent performance of deer hunting optimization algorithm and earthworm optimization algorithm.
Demonstrate better convergence performance and lower computational complexity.
The applicability of the algorithm in practical engineering problems has not been verified.
S-ROA [13] A new hybrid optimization algorithm is developed by integrating the excellent performance of spotted hyena optimizer algorithm and rider optimization algorithm.
Able to get rid of premature convergence.
It takes longer time in solving for large scale number of cities.
The ability of the algorithm to jump out of the local optimum is enhanced.
DBAL is not suitable for very large-scale TSP datasets (over 1002 cities); and the running time also needs to be further optimized. neural combinatorial optimization framework [19] Design their own critic to compute a baseline for the tour length; and introduce 2-opt heuristic.
Near-optimal results are obtained on a 2D Euclidean graph.
The applicability of the algorithm on real TSP instances has not been verified.
GNN [20] Train the model to be an efficient message-passing algorithm.
GNNs can learn the decision variables of solving the traveling salesman problem with very little supervision.
Model is not trained and evaluated on the comprehensive set of real and random graphs.
model based on attention layers [21] An attention layer-based model is proposed and the model is trained by using REINFORCE with a simple baseline based on a deterministic greedy rollout.
Significantly improve the learning heuristic results of TSP.
The simple masking procedure in the model cannot meet the feasibility constraints of practical problems.

BGNN [22]
Design a bidirectional message passing layer and train the model with imitation learning.
It can balance running time and performance, and has the ability to jump out of local optimum.
The applicability of the algorithm on real TSP instances has not been verified.
This study's major contributions for the literature are as follows: • A new improved ant colony optimization is proposed to improve its performance when solving the TSP. • The simulation test results indicate that GCNIACO can provide the solutions with good competitive performance for the TSP. • Provide a reference method for the combination of swarm intelligence algorithm and machine learning. Introduce fuzzy logic module to calculate pheromone value; and introduce Taguchi concept to optimize algorithm parameters.
The algorithm is more appropriate for handling complex network.
The pheromone formula and algorithm operation in the actual environment are not perfect.
DEACO [24] Dynamically adjust the parameters of the ant colony optimization.
The convergence speed and search accuracy are improved.
Theoretical analysis is difficult, and the experimental basis is more than theoretical research.
PACON [25] Introduce an angle and angle calculation rule in pheromone transfer rule; and increase the weight of pheromone concentration on the optimal path.
The algorithm convergence speed and convergence accuracy have been improved.
The applicability of the algorithm in practical engineering problems has not been verified.
HAACO has better algorithm performance relative to the compared algorithms.
Solve poorly on some very large instances of TSP.

GA-ACO [27]
The ant population is initialized using the genetic algorithm.
It has stronger global search performance and fast convergence.
No simulation comparisons with other excellent algorithms have been performed.
improved ant colony algorithm [28] The heuristic function is improved; the pheromone update mechanism is improved, and the path selection strategy is improved.
It outperforms traditional algorithms in terms of path length and task waiting time.
The simultaneous working of multiple cranes has not been analyzed.
improved ant colony algorithm [29] The initial pheromone differential distribution strategy is adopted; the local path is optimized in blocks; the pheromone update mechanism is improved; and the path selection strategy is improved.
It can plan the optimal path with effective obstacle avoidance and little number of folds.
The running time of the algorithm has not been analyzed.
IDAACO [30] Heuristic strategy with direction information; adaptive pseudo-random transmission strategy; improved local pheromone update mechanism and improved global pheromone update mechanism.
IDAACO has the advantage in terms of practicality and efficiency.
It is difficult to balance the relationship between the parameters and the calculation formula.
multi-factor improved ant colony algorithm [31] Construct multi-factor heuristic function; distribute initial pheromone stepwise; update pheromone by classification; adopt max-min ant strategy and adaptively adjust pheromone volatile factor.
Good ability to adapt to the environment.
It has not been analyzed and compared with other excellent algorithms in a complex terrain barrier-free environment.
IACO [32] The formula of transition probability is improved; the pheromone volatility factor of self-adaptive adjustment is designed; the variable neighborhood local search embedded by insertion operator and exchange operator is introduced.
The improved effect is obviously better than that of the traditional ant colony optimization.
The model does not take into account the dynamics and heterogeneity of customer needs.
IACO [33] A pheromone update mechanism based on information entropy is designed, and variable neighborhood search is introduced.
Global and local search capabilities have been improved.
The running time of the algorithm is prolonged.
The other parts of the thesis are composed as follows: Section 2 is the relevant literature review; Section 3 introduces the basic concept of the traveling salesman problem; Section 4 introduces the basic ant colony optimization model; Section 5 introduces the principle and optimization mechanism of the proposed graph convolutional network improved ant colony optimization. In Section 6, the improved algorithm is simulated and tested on a wide range of benchmark examples, and compared with the traditional meta-heuristic algorithm to verify the improved effect of the algorithm. Section 7 applies some statistical tests to confirm that the observed differences between the original and improved versions are actually meaningful. Section 8 applies the improved algorithm to solve practical engineering problem to verify the practicality of the algorithm. Section 9 summarizes the overall research work and presents an outlook for the future.

Literature review
This part mainly focuses on the literature review on the solution of TSP and the improvement of ACO in recent years.

Traveling salesman problem solving
In order to obtain a satisfactory solution to the NP-hard traveling salesman problem, scholars have proposed lots of meta-heuristics to solve the TSP. Gunduz et al. [10] proposed a new discrete optimization method called DJAYA for the permutation-coded optimization problems, and studied the performance of the proposed algorithm on 14 different symmetric TSP data sets. Experimental results showed that the algorithm was an optional and competitive optimization algorithm. Panwar et al. [11] based on the meta-heuristic algorithm for solving continuous optimization problems: gray wolf optimizer (GWO), combined with the 2-opt algorithm, produced a novel discrete GWO algorithm for solving the symmetric traveling salesman problem. And the effectiveness of the proposed method was verified by experimental simulation. Kanna et al. [12] proposed a new hybrid algorithm by integrating two meta-heuristic algorithms with good performance: earthworm-based deer hunting optimization algorithm (DHOA). The proposed optimization algorithm showed better convergence performance and lower computational complexity when solving the TSP. Based on the rider optimization algorithm (ROA) and the spotted hyena optimizer algorithm (SHO), Krishna et al. [13] constructed a new hybrid algorithm: spotted hyena-based rider optimization (S-ROA). By solving the traveling salesman problem case, the good competitive performance of the constructed method was verified. Saji et al. [14] proposed a new discrete bat algorithm to solve the TSP. Lévy's flights and neutral crossover operator were introduced to enhance the ability of the algorithm to jump out the local optimum. The overall performance of the algorithm was therefore improved. There are also meta-heuristic algorithms such as discrete crow-inspired algorithms (DC) [15], discrete shuffled frog leaping algorithm (DSFLA) [16], differential evolution (DE) [17], discrete spider monkey optimization (DSMO) [18], etc. have also been proposed and applied to solve the TSP, and all have obtained good results. With the in-depth research and development of machine learning and graph neural networks, solving the TSP based on learning has also attracted the attention of scholars. Deudon et al. [19] proposed a neural combinatorial optimization framework to solve the TSP. In the framework, the city coordinates were used as input, and the neural network was trained by using reinforcement learning to predict the distribution of the city arrangement. Then the good performance of the proposed framework was verified by experimental simulation. Prates et al. [20] successfully applied graph neural networks to solve the TSP, proving that graph neural networks could learn the decision variables of solving the traveling salesman problem with little supervision. Kool et al. [21] proposed a model based on the attention layer, and trained the model by using REINFORCE with a simple baseline based on a deterministic greedy rollout. Finally, satisfactory results were obtained when solving problems such as orienteering problem (OP) and prize collecting TSP (PCTSP). Hu et al. [22] proposed a bidirectional graph neural network to solve the traveling salesman problem on any symmetric graph. The network used imitation learning to sequentially generate the next city to visit. At the same time, by designing a bidirectional message transfer layer, the graph was coded based on edge and partial solutions, so as to construct a near-optimum solution for traveling salesman problem on any symmetric graph. A summary of relevant literature reviews was shown in Table 1.

Ant colony optimization improvement
Ragmani et al. [23] proposed a new hybrid algorithm based on fuzzy logic and ACO to improve load leveling in cloud computing environments. The algorithm used Taguchi experimental design to obtain the optimum parameter value of the ant colony optimization, and defined a fuzzy module to assess pheromone value, thereby speeding up the computation of the algorithm. Aiming at the shortcomings of the ant colony optimization's slow convergence speed and prone to fall into the local optimum, Ebadinezhad et al. [24] proposed a dynamic evaporation ant colony optimization (DEACO) that dynamically adjusted the parameters of the ACO. It was used to solve the TSP example, which verified the good performance of the proposed algorithm in terms of convergence speed and search accuracy. Aiming at the shortcomings of low convergence accuracy and slow convergence speed in the ant colony optimization, Li et al. [25] proposed a pseudo-dynamic search ACO with an improved negative feedback mechanism. By introducing an angle and angle calculation rule in the pheromone transmission rule, it affected the probability of city selection and improved the ability of the algorithm to jump out of the local optimum. Then the algorithm updated the pheromone concentration on the optimum path and the worst path at the same time, and enhanced the weight of the pheromone concentration on the optimum path, thereby improving the convergence speed of the algorithm. Tuani et al. [26] proposed a heterogenous adaptive ant colony optimization (HAACO), which modified the transition probability formula in the ACO to introduce the heterogeneity of ACO. Then by introducing a set of parameter adaptation rules, the adaptiveness of α and β parameters in the ant colony optimization was achieved. At the same time, the 3-opt local search algorithm was integrated into the proposed algorithm to further improve the algorithm's search ability. Aiming at the problem that ACO has strong dependence on pheromone and is prone to fall into local optimum, Zheng et al. [27] proposed an improved hybrid genetic ant colony optimization. The optimal solution produced by the genetic algorithm was used as the initial information of the pheromone in the ant colony optimization to improve the algorithm's global search ability and rapid convergence. Li et al. [28] proposed an improved ant colony optimization to solve the path planning problem of unmanned cranes during hoisting, and improved the solution performance of the algorithm by improving the heuristic function, pheromone update mechanism, and path selection strategy. Aiming at the shortcomings of ant colony optimization to solve the path planning problem, such as falling into local optimum and slow convergence speed, Tang et al. [29] proposed an improved ant colony algorithm. They used a differentiated distribution strategy to improve the initial pheromone concentration on the path. Then, the ACO was improved by adopting local path block op-timization strategy, introducing pheromone self-adjustment enhancement factor, introducing random state transition parameters, etc., so as to improve the convergence and stability of the algorithm. Liu et al. [30] proposed an improved dynamic adaptive ant colony optimization (IDAACO) to handle the problem of pipe routing design. IDAACO had designed four improved new mechanisms: heuristic strategy with directional information, adaptive pseudo-random transmission strategy, improved local pheromone update mechanism and improved global pheromone update mechanism. Then, the excellent performance of the improved algorithm in terms of practicability and high efficiency was verified through experimental simulation. Yang et al. [31] proposed a multi-factor improved ant colony optimization to solve the path planning problem of mobile robots. The performance of the algorithm was improved by constructing multi-factor heuristic functions, assigning initial pheromone stepwise, updating pheromone classification, adopting the maximum and minimum ant strategy and adaptively adjusting the pheromone volatility factor. He et al. [32] proposed an improved ant colony optimization (IACO) to solve the vehicle routing problem with soft time windows. Based on the basic ant colony optimization, they improved the transition probability formula, designed an adaptively adjusted pheromone volatility factor, and introduced the variable neighborhood local search embedded by the insertion operator and the exchange operator. They also set the conditions for starting and exiting the local search, and updated the current local optimal solution. Finally, the effectiveness of the improved algorithm was verified through experimental simulation. Wang et al. [33] proposed an improved ant colony algorithm to solve the green periodic vehicle routing problem with time windows. By designing a pheromone update mechanism based on information entropy in the algorithm and introducing variable neighborhood search, the algorithm's global and local search capabilities were improved. A summary of relevant literature reviews was shown in Table 2.

Traveling salesman problem
Imagine you are planning a travel route now. You start from the city where you live and travel through the cities you are interested in sequentially. You have passed through all these cities and only once. Finally, you are back to the city where you live, so how do you plan your route to minimize the distance you traveled? This is the typical traveling salesman problem (TSP). TSP includes symmetric traveling salesman problem (STSP) [34], asymmetric traveling salesman problem (ATSP) [35], pickupand-delivery traveling salesman problem (PDTSP) [36], multiple traveling salesman problem (MTSP) [37], multi-objective traveling salesman problem (MOTSP) [38], etc. This article mainly takes the classic symmetric traveling salesman problem as the research objective, hereinafter referred to as the traveling salesman problem. It can be described as: in the given n cities C = [C 1 , C 2 , ..., C n ], the distance d i j (i, j ∈ 1, 2, ..., n) between any two cities i and j is known, and d i j = d ji ; find a closed patĥ C = [Ĉ 1 ,Ĉ 2 , ...,Ĉ n ] that minimizes the objective function value of Eq (3.1) and satisfies the constraint condition of traversing each city once and finally returning to the starting point.
According to the knowledge of graph theory in operational research, the quiddity of the traveling salesman problem is to seek a Hamilton loop with the smallest weight in a completely undirected graph with weights. Let G = (V, E) be the weighted graph; V = (1, 2, ..., n) is the set of nodes formed by all cities; E is the set of edges; the distance between each node d i j > 0(i, j ∈ V) is known, and d ii = 0, d i j = d ji ; let the expression of x i j (i, j ∈ V) be: Then the TSP can be modeled as the following integer programming problem [10]: where S is all non-empty subsets of the node set V, and |S | represents the number of vertices contained in the set S . Equation (3.3) is the objective function, which means that the sum of distances is the smallest; Eqs (3.4)-(3.7) are all constraints; Eqs (3.4) and (3.5) indicate that each node in the weighted graph G is reached and left only once; Eq (3.6) constrains to eliminate the occurrence of subtour, and ensures that the optimal closed path is a single large loop that passes through all points.

Ant colony optimization
The ACO is inspired by the collaborative work of ants. By abstracting the problem to be studied as the problem of finding the optimal path, the node model is constructed. The construction process of the solution is the selection and movement process of ants between nodes. Solution elements that satisfy the solution conditions are continuously added to construct a complete feasible solution. Finally, it converges to the optimum solution of the problem under the effect of the pheromone positive feedback mechanism. The principle and mechanism of ant colony optimization make it a natural traveling salesman problem solving algorithm. This article also uses the traveling salesman problem as an instance to introduce the classic ant colony optimization model [39]: In order to ensure the generality of the algorithm, the number of cities is set to n; the number of ants is set to m; the distance d i j (i, j ∈ 1, 2, ..., n) between any two cities i and j is known, and d i j = d ji , the Euclidean distance is used in the paper; the pheromone concentration on the path between city i and city j at time t is set to τ i j (t). At the initial moment, the ACO sets the pheromone concentration on all paths to same value. Then let the ant choose the next city to visit according to a certain selection probability p k i j (t), where p k i j (t) is called the transition probability of the ant k at the time of t from the city i to the city j, which is the path selection process of the ant. The specific formula is expressed in Eq (4.1): where η i j (t) is called the heuristic function, which represents expectation degree of the ants from the city i to the city j at time t, generally η i j (t) = 1 d i j ; allow k is the set of cities currently accessible by each ant k; α is a pheromone factor, which reflects the influence degree of pheromone when the ant chooses a path; β is a heuristic function factor, which reflects the influence degree of the heuristic function when the ant chooses a path. When all the ants complete a complete traversal, that is, after each ant independently completes its own feasible solution construction, the pheromone concentration on the inter-city path will increase while the ants secrete the pheromone, and decrease at the same time as the pheromone volatilizes. The specific update mechanism is as follows: where ρ is the pheromone volatility factor, which reflects the degree of pheromone volatility, ρ ∈ (0, 1); ∆τ i j represents the increased pheromone concentration that all ants secrete pheromone on the path between city i and city j in one cycle; ∆τ k i j represents the increased pheromone concentration that a single ant k secretes pheromone on the path between city i and city j in one cycle, the Ant-cycle model [40] that can utilize global information is generally used to define ∆τ k i j : where Q is a pheromone constant, which denotes the total amount of pheromone secreted by each ant after completing a complete traversal; L k denotes the length of the path taken by a single ant k after completing a complete traversal. Equation (4.2) simulates the ant's pheromone update process, including the volatility of pheromone and the superposition of pheromone on the path that the ant passes; Eq (4.3) is to superimpose new pheromone on all nodes path, and the value of ∆τ k i j is calculated according to Eq (4.4).

Graph convolutional network improved ant colony optimization
The ACO has strong robustness, and the positive feedback mechanism of pheromone makes the ant colony optimization have a strong capacity to discover the optimum solution. But at the same time, the algorithm also has the following shortcomings [41]: in the initial stage of the solution, pheromone is lacking, and the solution speed is slow; in the later stage of the solution, under the influence of pheromone, it is easy to fall into the local optimum. Therefore, in view of the shortcomings of the ACO, this paper proposes a graph convolutional network improved ant colony optimization (GCNIACO). First, the graph convolutional network is introduced to generate a better solution, and the better solution is converted into the initial value of pheromone through the pheromone conversion strategy to improve the initial solution speed of the algorithm; the algorithm's capacity to jump out of the local optimum is then enhanced by dynamically adjusting the pheromone volatility factor and combining with the 3-opt algorithm.

Graph convolutional network
At the initial moment in the ant colony optimization, all paths have the same pheromone concentration, and the attraction to ants is also the same. As a result, the pheromone in the initial stage has poor guidance for the ants' choice of paths, and the convergence rate of the optimal solution is slower. In addition, it may search for irrelevant paths that produce a large number of non-optimal path components. The enhancement of pheromone on these paths interferes with the update of the overall pheromone of the algorithm, which is not conducive to exploring a better path. Therefore, in order to improve the search performance of the algorithm in the initial stage. The graph convolutional network is introduced to generate the better solution, and the better solution is converted into the initial value of the pheromone through the pheromone conversion strategy. Then, the difference in pheromone concentration between the better path and the poorer path at the initial stage is increased.
Convolutional neural network (CNN) requires regular data domains, such as 2-dimensional or 3dimensional Euclidean grid images in computer vision (CV), 1-dimensional sequential text in natural language processing, and so on [42]. However, a lot of data is usually not in the domain of regular data, but in the domain of heterogeneous graphs. This requires the use of another common data structure, that is, a graph composed of vertices and edges. But it is also difficult to directly define operations such as convolution and pooling in the graph, which hinders the development of CNN, so these drives form the graph convolutional network (GCN). In 2019, Chaitanya et al. [43] introduced a new learning-based method to solve the traveling salesman problem, used a deep graph convolutional network to construct an efficient traveling salesman problem graph representation, and then output the optimal path in a nonautoregressive manner through a highly parallelized beam search. The main practice is: take a graph as input, extract synthetic features from the nodes and edges of the graph by stacking several graph convolutional layers, and train the graph convolution model to directly output the adjacency matrix corresponding to the path; then use post-hoc beam search technology to convert the adjacency matrix obtained from the model into a valid path. Among them, the adjacency matrix denotes the probability of the edge appearing in the path; the parameters of the graph convolution model are trained end-to-end by minimizing the cross-entropy loss through gradient descent.
In the construction of the graph convolutional layer, let x l i and e l i j respectively denote the node feature vector and edge feature vector at the graph convolutional layer l associated with node i and edge i j then the node feature and edge feature of the next layer are defined as Eqs (5.1) and (5.2): where W ∈ R h×h is the parameter that needed to train by the graph convolution model; h is the hidden dimension of each graph convolutional layer; σ is the sigmoid function; ε is a small value, which can ensure that the denominator of η l i j is not 0; ReLU is rectified linear unit, which is used as an activation function in the graph convolutional network; BN stands for batch normalization. When l = 0, x l=0 i and e l=0 i j respectively represent the node feature and edge feature of the input layer, and their definitions are as shown in Eqs (5.3) and (5.4) respectively: 1, if node i and node j are k nearest neighbors 2, if node i and node j are selfconnected 0, otherwise After the graph convolution model outputs an adjacency matrix predicting edge occurrence probabilities, a probabilistic heatmap of edge connections is obtained. Using the beam search technique, starting from the first node, the probabilistic heatmap is explored by extending the b most likely edge connections between neighbor nodes. The first b local paths of each stage are then iteratively expanded until all nodes in the graph are visited. At the same time, during the search process, the nodes that have been visited before are shielded to ensure the validity of the path. The final predicted optimal path is the path with the shortest length among the b complete paths after the end of the beam search.
When improving the initial pheromone of the ACO, if the optimal solution predicted and generated by the graph convolutional network is directly converted into the initial pheromone, the information of the graph convolutional network solution may not be fully utilized because the number of selections is too small. Therefore, in order to fully and effectively utilize the information of graph convolutional network solutions, according to the literature [44], this paper designs the following pheromone conversion strategy. The main practice is: set the initial pheromone of the ant colony optimization as τ = τ c + τ GCN . Among them, τ c is the initial pheromone constant, and the value can be specifically set based on the actual situation, but please try to ensure that there is a large difference in pheromone concentration between the better path and the worse path after use, and this paper sets it as τ c = 0.1. τ GCN is generated by the conversion of the optimal solution of the graph convolutional network, which can be calculated by the following formulas: is the pheromone concentration on the path between nodes i and j; q is the first q short paths taken from the beam search of the graph convolutional network, according to the literature [44], the value of q is q = 30; k i j is the number of times that every two city nodes (i, j) appear in q paths; n is the number of city nodes. In order to reduce the interference of pheromone reinforcement on poor paths, this paper sets a threshold for k i j . When the number of occurrences of the two city nodes (i, j) in q paths is less than or equal to 10%q, k i j = 0, otherwise the value of k i j remains unchanged. The specific formula is shown in Eq (5.9):

Dynamic pheromone volatility factor ρ
The pheromone volatility factor ρ represents the volatility degree of pheromone in each iteration, which affects the global search ability and convergence speed of the ant colony optimization. When the value of ρ is too large, the pheromone on each path volatilizes faster, and the ant colony may search the path repeatedly, resulting in a decrease in the convergence speed; when the value of ρ is too small, the volatility speed of pheromone on each path is slow, and the accumulation of pheromone concentration on the path is too high, which may affect the randomness and global search ability of ant colony search, resulting in the algorithm falls into local optimum. Therefore, this paper proposes an improved method for dynamic pheromone volatility factor, as shown in Eq (5.10): where NC max is the total number of iterations of the algorithm; NC is the current iteration number of the algorithm; ρ min is the minimum value of ρ, which is set to prevent ρ from being too small and reducing the algorithm convergence speed. And according to the literature [39], the best empirical result range of ρ is 0.1 ≤ ρ ≤ 0.99, the paper sets the value of ρ min to ρ min = 0.1. The improved pheromone volatility factor change curve was shown in Figure 1.
The main idea of dynamic pheromone volatility factor proposed in the paper is: it is hoped that the pheromone volatility factor value will change with the number of iterations. In the initial stage of the algorithm iteration, the pheromone volatility factor value is set larger, the accumulation of pheromone concentration on the path is less, and the ants can search for more paths, so that the algorithm has a strong global search ability. After the algorithm iterates a certain number of times, the numerical setting of the pheromone volatility factor gradually decreases, so that the accumulation of pheromone concentration on the path gradually increases, which enhances the pulling and guiding effect of pheromone on the ant colony. The ants gradually converge to the path with high pheromone concentration, which speed up the algorithm convergence speed.

3-opt algorithm
When the ant colony optimization is searching for a feasible solution of the path, it may generate a suboptimal path with crossover phenomenon similar to Figure 2(a). The more paths intersect, the longer the path length, and the worse the quality of the obtained feasible solution. Therefore, this article uses the 3-opt algorithm [45] to optimize the ant colony optimization to further shorten the path length. That is, the de-crossing operation is performed on the part of the path that has a similar crossover phenomenon in Figure 2(a), and it is optimized into the optimal path in Figure 2(b). The specific operation steps are: starting from the first node t = 1, 5 points c(t), c(t + 1), c(t + 2), c(t + 3), c(t + 4) are continuously selected each time; swap the positions of the three middle points in turn to get six kinds of sequential arrangements, and the calculated lengths are:

GCNIACO flow
The main idea of graph convolutional network improved ant colony optimization is: the better solution generated by the graph convolutional network is converted into the initial pheromone of the ACO through the pheromone conversion strategy, and the difference of initial pheromone concentration on the best path and the suboptimal path is improved; then the pheromone volatility factor is improved to be dynamically adaptive, and the 3-opt algorithm is used for the de-crossing operation, which further optimizes the path length and enhances the algorithm's ability to jump out of the local optimum. The flowchart of graph convolutional network improved ant colony optimization was shown in Figure 3.
The implementation steps of the GCNIACO algorithm are as follows: Step 1: initialize the coefficients related to the ant colony optimization and calculate the distance matrix between nodes; Step 2: initialize the graph convolutional network and generate the better solution; then convert the better solution into the ant colony initial pheromone through the pheromone conversion strategy; Step 3: randomly select the first city node to start for the ants; calculate the transition probability for each ant between city nodes according to Eq (4.1); select the next city node to visit by roulette, and record it in the path record table; repeat this process until all the ants have visited all city nodes; Step 4: calculate the path length L k that each ant passes through in the path record table; Step 5: the 3-opt algorithm is introduced to perform the de-crossing operation on the best path in the path record table; if the optimized path length L 3-opt is less than the best path min(L k ), use L 3-opt to update min(L k ), and at the same time update the corresponding path node sequence in the path record table; if L 3-opt is larger than min(L k ), directly jump to Step 6; Step 6: record the best solution in the current iteration number; Step 7: according to Eqs (4.2)-(4.4), update the pheromone on the path, wherein the pheromone volatility factor ρ is calculated according to Eq (5.10); then clear the path record table, and jump to Step 3 to continue the iteration; Step 8: when the algorithm reaches the specified maximum number of iterations NC max, stop the iteration and output the best solution.

Generalization comparison of GCN models
When the graph convolutional network solves the TSP and generates a better solution, since the parameters of the graph convolutional network model are independent of the size of the instance city node, the model trained on the smaller instance graph can be used to solve any large instance [43]. In order to explore the influence of generalization effect of the graph convolution model in the graph convolutional network improved ant colony optimization on the holistic improvement of the algorithm, this paper conducts experimental simulations on the scale of the TSP with the city node size of 20, 50, and 100. Among them, the city coordinates of all traveling salesman problem instances are randomly and uniformly sampled in the two-dimensional unit square. The dataset and model hyperparameters of the graph convolutional network part in the GCNIACO algorithm are set by referring to [43]. The training data set corresponding to each problem scale is composed of 1,000,000 pairs problem instances and optimal path schemes, and validation data set is composed of 10,000 pairs problem instances and optimal path schemes. The optimal path scheme here is obtained using Concorde [46]. Each graph convolutional network model consists of 30 graph convolution layers and 3 layers of multi-layer perceptron, and the hidden dimension of each layer is h = 300. The beam width in beam search is fixed to b = 1280. During the training of the graph convolutional network, each training epoch randomly selects 10,000 subsets from the training set and divides them into 500 mini-batches for training. The Adam optimizer [47] with an initial learning rate of 0.001 is then used to minimize the cross-entropy loss for each mini-batch. The trained graph convolutional network model is validated on the validation set every five training epochs. If the validation loss does not reduce by at least 1% of the previous validation loss, then dynamically reduce the learning rate by dividing the learning rate by 1.01, thereby improving the learning ability of the model. Parameters of the ant colony optimization part in the GCNIACO algorithm are set in reference [48], the details were shown in Table 3. Then the GCNIACO algorithm uses different graph convolutional network model to solve the TSP 30 times for 20, 50 and 100 city nodes, and the results were demonstrated in Table 4. It can be obtained from Table 4 that the graph convolutional network model has a certain generalization ability, but there are also certain differences in the effect. On the traveling salesman problem with 20 nodes, the GCNIACO algorithm finds the same optimal solution for all three scale-trained graph convolutional network models. But the model trained with 100 nodes achieves better maximum value and average value than the other two models. On the traveling salesman problem with 50 nodes, the model trained with the corresponding 50 nodes is better than the other two models. However, compared with the solution effect of the model trained by 20 nodes, the model trained by 100 nodes is better. On the traveling salesman problem with 100 nodes, although the model trained with 50 nodes can enable the GCNIACO algorithm to find a better minimum value, the model trained with 100 nodes can still maintain the advantage in the obtained maximum value and average value. Therefore, considering various factors, this paper chooses to use the graph convolutional network model trained on 100 nodes in the GCNIACO algorithm.

Validation of proposed improvement strategies
To verify the effectiveness of the improved strategies proposed in the study, the paper chooses to test it on the traveling salesman problem example with city node size of 50 and 100, where all coordinates are randomly and uniformly sampled in the two-dimensional unit square. Table 5 shows the relevant experimental results. Among them, ACO is the basic ant colony optimization, GCNIACO-1 is the improved algorithm for GCN optimization initialization, GCNIACO-2 is the improved algorithm with GCN and dynamic pheromone volatile factor optimization, and GCNIACO is the improved algorithm proposed in this paper. In order to ensure the validity and reliability of the results, each algorithm is run independently for 30 times, and the parameter settings are basically the same as those in Section 6.1, in which the default parameter is set to ρ = 0.5. It can be seen from Table 5 that the addition of the three improved strategies makes the improved ant colony optimization outperform the original algorithm in terms of best value and average value. Although these three strategies work at different stages, there is no doubt that they all have a positive impact on the results. Graph convolutional network can provide "better" initial values for ACO's pheromone memory. The adaptive dynamic adjustment of pheromone volatility factor can improve the global search ability and convergence speed [49]. The 3-opt algorithm can eliminate the intersections in the middle of the path and further optimize the path length.

Improvements on random TSP maps
By using GCNIACO and ACO to solve the TSP of the same city count, the improvement effect of the improved algorithm is analyzed. The size of the city is set to 20, 50, 100, and 200 respectively, and the specific city coordinates are randomly and uniformly sampled in the two-dimensional unit square. The parameter settings in GCNIACO are the same as in Section 6.1. The parameter settings in ACO are basically the same as those in GCNIACO, and the default parameter setting is ρ = 0.5. The results of 30 simulations are demonstrated in Table 6: Where the TSP optimal solution is the optimal path scheme obtained by using the Concorde solver; "Time" is the average running time of 30 simulations; the deviation rate indicates the degree of deviation between the best value obtained by the algorithm simulation and the optimal solution of TSP. The specific Eq of the deviation rate is shown in Eq (6.1): deviation rate = simulation best solution − TSP optimal solution TSP optimal solution × 100% (6.1) It can be learned from Table 6 that when the common parameter setting values are the same, the simulation best solution, worst solution and average value obtained by GCNIACO in solving traveling salesman problem are a lot better than ACO. However, the proposed graph convolutional network improved ant colony optimization also exhibits a no free lunch theorem effect [50], that is, when the algorithm improves the solving performance of one aspect of the problem, the solving performance on another hand is bound to decrease [51]. GCNIACO improves the ability to find the shortest path while making the program run time longer.
For the 20 city-scale traveling salesman problem, both GCNIACO and ACO can find the same optimal solution as the Concorde solver. But the worst solution and average value obtained by GCNIACO simulation are 0.1365 and 0.0546 less than ACO respectively. For the traveling salesman problem of 50, 100, and 200 cities, although neither GCNIACO nor ACO finds the same or smaller path than the optimal solution of TSP, GCNIACO still shows better performance than ACO. The best solution and average value obtained by GCNIACO for TSP50 simulation are respectively 0.087 and 0.0895 smaller than ACO, and the deviation rate is also 1.4395% smaller. The best solution and average value obtained by GCNIACO for TSP200 simulation are respectively 0.3385 and 0.282 smaller than ACO, and the deviation rate is also 3.0787% smaller. For the traveling salesman problem of 100 cities, the worst solution obtained by GCNIACO simulation is even 0.0841 less than the best solution of ACO simulation, and the deviation rate is 3.9708% smaller. Therefore, GCNIACO does have better solution performance than ACO. In order to more intuitively reflect the solution advantages of GCNIACO, the paper presents the optimization paths and optimization curves of two algorithms for solving TSP20, TSP50, TSP100 and TSP200, as shown in Figures 4-15 respectively.            The simulation results of GCNIACO, genetic algorithm (GA) and simulated annealing algorithm (SA) on the random TSP maps are shown in Table 7. The parameters in GA are set according to the literature [53]; the parameters in SA are set according to the literature [54]. Each algorithm is simulated and run 30 times. As can be seen from the above table, GCNIACO exhibits better performance than GA and SA on all four random TSP maps. For the 20 city-scale traveling salesman problem, GCNIACO and GA have similar solving performance, but outperform SA in terms of mean. For the traveling salesman problem of 50 city-scale and 200 city-scale, GCNIACO outperforms GA and SA in terms of both min and mean. For the 100 city-scale traveling salesman problem, the solution performance of GCNIACO is better than SA. Although GCNIACO fails to find a best value smaller than GA in 30 simulation experiments, it still outperforms GA in terms of mean.

Improvements on the classic TSP dataset
The coordinates of the city nodes used in the above experimental simulations are all generated by random sampling. The following selects and uses five classic data sets (lin105, ch130, ch150, kroA200, pr264) in the TSPLIB standard library [52] to conduct GCNIACO simulation tests. Then it is compared and analyzed with the optimization results of ant colony optimization (ACO), genetic algorithm (GA) and simulated annealing algorithm (SA). During simulation, the parameter settings in GCNIACO are the same as in Section 6.1; the parameter settings in ACO are basically the same as those in GCNIACO, and the default parameter is set to ρ = 0.5; the parameters in GA are set according to literature [53]; the parameters in SA are set according to literature [54]. The results of 30 simulations are demonstrated in Table 8: According to Table 8, the graph convolutional network improved ant colony optimization proposed in the thesis shows good solution performance on the five classic TSP datasets. In 30 simulation tests, the best path shorter than ACO [48], GA [53] and SA [54] can be found. The average value of 30 simulation solutions is also better than the above algorithm, which verifies the improvement effect of the algorithm.

Statistical analysis
To confirm that the observed differences between the original and improved versions are actually meaningful, this section uses two statistical analysis methods to evaluate the differences between the GCNIACO, ACO, GA, and SA. Referring to literature [55] and literature [11], the paper first chooses to use Friedman test among nonparametric tests to verify whether there is a significant difference between the algorithms. The Friedman test is to rank each algorithm, and the algorithm with better performance has a smaller ranking value, that is, the best performing algorithm is ranked 1, the second best is ranked 2, and so on. In the event of a tie, the average ranking is calculated [56]. Under the null hypothesis, this hypothesis states that all algorithms behave similarly. When there is a significant difference between the algorithms, the null hypothesis is rejected. The paper uses SPSS software to perform Friedman test on Tables 6-8, and the rank average of each algorithm is shown in Table 9. Among them, the calculated Friedman statistic for 3 degrees of freedom (according to the χ 2 distribution) is 22.0112. And within the 95% confidence interval, the critical value of the χ 2 distribution is 7.81. 22.0112 is greater than 7.81, and the observed p-value is 6.5E-5 (< 0.05), from which it can be seen that GCNIACO is the best algorithm with the smallest rank among the compared algorithms.
In addition, post hoc analysis using Holm test is also chosen to evaluate the significance of GC-NIACO's better statistical performance. GCNIACO is the control algorithm, and the null hypothesis holds that these algorithms are equivalent. Tables 10 shows the unadjusted and adjusted p-values obtained using the Holm test. Since SPSS software is not convenient for scientific notation here, it is uniformly accurate to three decimal places, which does not affect the analysis of results. Analysis of the data shows that all p-values are less than 0.05, so the null hypothesis is rejected and it can be considered that GCNIACO outperforms ACO, GA and SA at the 95% confidence level.

Engineering case
The above content introduces the proposed graph convolutional network improved ant colony optimization, and verifies the effectiveness of the improved algorithm on some TSP datasets. In this section, the graph convolutional network improved ant colony optimization is applied to solve the vehicle routing problem (VRP) in engineering applications. And the solution result will be compared with the basic ant colony optimization to verify the practicality of the improved algorithm.

Vehicle routing problem
The vehicle routing problem (VRP) [57] was proposed by Dantzig and Ramser in 1959, and it is still at the forefront of combinatorial optimization research. In order to facilitate the establishment of the model, it is assumed that there is only one distribution center and the location coordinates are known; all distribution vehicles start from the distribution center and return to the distribution center in turn after delivery is completed. The rated load of the distribution vehicle is known, the demand at the demand point is known, and each demand point has one and only one vehicle responsible for distribution [58]. According to the above assumptions and reference [59], a model aiming at the lowest distribution cost is established: Minimize: where Z represents the distribution cost; c 0 is the unit cost of the vehicle; m is the count of vehicles; L is the count of demand points; c is the unit cost of the distance traveled by the vehicle; d i j (i, j = 1, 2, ..., L) is the distance between the demand point i and the demand point j; c 1 is the overload penalty coefficient; g i is the demand of the demand point i; Q is the rated load of the vehicle; k(k = 1, 2, ..., m ) is the vehicle number. The distribution center number is 0, the demand points number are 1, 2, ..., and the variables x i jk and y ki are respectively defined as:  (8.5) indicate that only one vehicle is allowed to make one delivery at each demand point. Equation (8.6) indicates that the vehicles all start from the distribution center and finally return to the distribution center.

Solving steps
Step 1: initialize the coefficients related to the ant colony optimization and calculate the distance matrix between nodes; Step 2: initialize the graph convolutional network and generate the better solution that meets the vehicle load requirements; then convert the better solution into the ant colony initial pheromone through the pheromone conversion strategy; Step 3: put m ants in the distribution center, each ant selects the next demand point according to Eq (4.1), and records it in the path record table; then judge whether the vehicle reaches the rated load, if so, return to the distribution center, and insert the serial number of the distribution center into the path record table; repeat this process until all the ants have visited all demand points; Step 4: calculate the distribution cost Z k corresponding to each ant in the path record table; Step 5: the 3-opt algorithm is introduced to perform de-crossing operation on the best path in the path record table, and at the same time, it is judged whether the de-crossing operation meets the vehicle load requirements, if not, the de-crossing operation is cancelled; if the optimized distribution cost Z 3-opt is less than the best cost min(Z k ), use Z 3-opt to update min(Z k ), and at the same time update the corresponding path node sequence in the path record table, if Z 3-opt is larger than min(Z k ), directly jump to Step 6; Step 6: record the best solution in the current iteration number; Step 7: according to Eqs (4.2)-(4.4), update the pheromone on the path, wherein the pheromone volatility factor ρ is calculated according to Eq (5.10); then clear the path record table, and jump to Step 3 to continue the iteration; Step 8: when the algorithm reaches the specified maximum number of iterations NC max, stop the iteration and output the best solution.

Simulation solution
The case data such as distribution center coordinates, demand point coordinates, and demand point demand used in this paper are all from the R101 case in http://web.cba.neu.edu/ msolomon/r101.htm. It is set that the distribution center needs to deliver to 100 demand points. The rated load of the distribution vehicle is set to 500. In the simulation, let c 0 = 0, c = 1, c 1 = ∞, then the distribution cost is only related to the distance traveled by the vehicle. The parameter settings of GCNIACO and ACO are basically the same as those in 6.1, and the default parameter is set to ρ = 0.5. In order to ensure the validity and reliability of the results, each algorithm is independently run 30 times during the simulation, and the solution results are shown in Table 11:  where 0 is the number of the distribution center, and 1, 2, ... are the number of demand points. As can be seen from the above table, compared with ACO, GCNIACO can find the best value with lower distribution cost when solving VRP, and also outperforms ACO in terms of average and worst value. The best path diagrams and optimization curves in Figures 16-18 more intuitively demonstrate the solution advantages of GCNIACO.

Conclusions
In this study, a graph convolutional network improved ant colony optimization is proposed. By introducing graph convolutional network, dynamically adjusting pheromone volatile factor and introducing 3-opt algorithm, the deficiencies of ACO in solving the TSP are made up. The study combines graph convolutional network to improve ant colony optimization, which provides an idea for the combination of machine learning and swarm intelligence. At the same time, the generalized graph convolutional network model is selected, and the corresponding graph convolutional network does not need to be trained for each specific city node scale, which reduces operating costs and saves computing resources to a certain extent. And it also makes the proposed algorithm have certain scalability. The simulation test results of the graph convolutional network improved ant colony optimization on TSP datasets and VRP engineering application example verify the excellent performance of the improved algorithm in seeking the optimum solution. GCNIACO is a metaheuristic algorithm based on swarm intelligence. It can be applied to optimization problems in various fields, such as biological sequence alignment, digital microfluidic biochip contamination fault removal, molecular spectrum wavelength selection, molecular docking, epistasis detection of genome-wide association, etc. Moreover, GCNI-ACO, as an improved algorithm of basic ant colony optimization, may obtain relatively better results when solving optimization problems.   In the study, only the pheromone volatility factor in the ant colony optimization is adaptively improved. Therefore, future work can also consider improving other parameters of the ant colony optimization into an adaptive dynamic adjustment method. In addition to combining with graph convolutional network, combining ant colony optimization with other recently proposed algorithms is also one of the future study directions. Applying the improved algorithm to solve practical issues in real life, such as logistics distribution center location problem, robot path planning problem, etc., is also a direction worth considering for future study.