A Novel Multiple Person Pose Estimation Optimization Model Utilizing Genetic Algorithm

Because traditional human pose estimation models rely on a large amount of human body feature information, this paper proposes an optimization model that uses a genetic algorithm to solve the problem of multi-person body part assembly. Unlike other body part assembly methods, the proposed method depends only on joint position information: it takes the sum of the connection distances between the joints as the objective function and searches for the optimal value to obtain the best human pose assembly. The simulation results show that, compared with the traditional OpenPose model, the proposed model obtains the same human skeleton using less position information.


Introduction
Human pose estimation technology has developed rapidly and is widely used in computer vision research. It studies how to accurately recognize the target human body in a given image and estimate its pose [1][2]. Compared with single-person pose estimation, multi-person pose estimation is more complicated and requires more information.
Multi-person methods fall into two categories, namely the top-down method [3][4][5] and the bottom-up method [7][8]. Cao et al. use Part Affinity Fields (PAFs) to quickly connect joints into limbs and match them to different poses [8]. The advantage of this method is that it runs fast and its average detection accuracy is good. However, the method depends on PAFs when the joint coordinates are assembled into a skeleton, so in actual tests missing PAFs cause connection errors in the human skeleton. Since the human body structure has fixed connection rules, multiple joints produce multiple possible connection modes. Body part assembly can therefore be cast as a problem for intelligent optimization algorithms. An intelligent optimization algorithm can quickly find an optimal solution that meets the requirements among a large number of possible results, and at the same time it reduces the algorithm's dependence on prior information while achieving the same effect.
The main innovation of this paper is a model that uses a genetic algorithm to obtain the multi-person body part assembly with less position information.

The Traditional Human 2D Pose Estimation Model
The OpenPose model is essentially a dual-branch parallel convolutional network that uses two convolutional networks to complete body part assembly in the image. As shown in Figure 1, the VGG-19 network first extracts low-level features from the input image. One network uses a non-maximum suppression algorithm to generate a confidence map that locates the coordinates of the human joints. The other network uses the part affinity field (PAF) algorithm to connect key parts into limbs. The outputs of the two networks are then combined, and the Hungarian algorithm is used for body part assembly. The final output is the human pose in the test image.

Problem description and Genetic Algorithm Application
In the body part assembly process of the OpenPose model, the Hungarian algorithm is used to solve the maximum bipartite matching problem. However, the assembly process relies on both the joint confidence map and the affinity field (PAF) information between the limbs. The affinity field information is susceptible to background interference and errors, which leads to mistakes in body part assembly.
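The matching step described above can be illustrated with a minimal sketch. It is not the OpenPose implementation: the score matrix is hypothetical, and an exhaustive search over permutations stands in for the Hungarian algorithm (both return the same optimum, but the exhaustive search is only practical for a handful of candidates).

```python
from itertools import permutations

def max_weight_matching(scores):
    """Exhaustive maximum-weight bipartite matching.

    scores[i][j] is a hypothetical affinity between left candidate i and
    right candidate j. Assumes len(scores) <= len(scores[0]).
    """
    n_left, n_right = len(scores), len(scores[0])
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n_right), n_left):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))

# Two neck candidates (rows) scored against two hip candidates (columns).
scores = [[0.9, 0.2],
          [0.1, 0.8]]
total, pairs = max_weight_matching(scores)
```

If the affinity scores are corrupted, for example by background clutter, the highest-scoring matching can pair joints from different people, which is the failure mode the paragraph above describes.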

Problem description
In an image of two people interacting, there are 36 joints and two pose sequences. Accordingly, the assembly process is transformed into visiting the 18 human body joints in sequence and selecting, for each joint, the optimal point among its candidate points. However, trying all joint connections yields 2^19 possible connections, and this exhaustive calculation consumes a lot of computing resources. Therefore, we propose a connection method based on distance constraints for combining human joints in two-person images. The method is inspired by solving the travelling salesman problem (TSP): joints that are closer together are connected so as to minimize the total joint connection distance. Two distance constraints are used:
1) The overall goal is to minimize the total connection distance between the joints, as shown in equation (1), where n is the number of related nodes in the two-person image and j is the currently detected joint.
2) The connection distance between each pair of candidate joints should be less than the length threshold. As shown in Figure 2, c1 and c2 are the point cluster centers, p1 to p6 are joints, L12 to L46 are the connections between the joints, and TH is the set threshold.
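One way to write the two constraints in symbols is sketched here. The notation is an assumption rather than the paper's own: $p_j$ is the point selected for the $j$-th joint in the connection sequence, $d(\cdot,\cdot)$ is the Euclidean distance, and $D$ is the total connection distance.

```latex
\min D = \sum_{j=1}^{n-1} d\left(p_j, p_{j+1}\right),
\qquad \text{subject to} \qquad
d\left(p_j, p_{j+1}\right) < TH \quad \text{for every connected pair.}
```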

The implementation process of the genetic algorithm
This paper proposes a model that uses a genetic algorithm to solve the matching problem in body part assembly under the dual constraints, without relying on limb confidence information. The traditional model and the improved model are compared in the flowchart in Figure 3. First, the different body parts of the different people are randomly encoded to create chromosomes and the initial population. The population then evolves through operations such as selection, crossover, and mutation, with the objective function value serving as the fitness value. Selection and mutation operations are then applied. The best individuals of each iteration are collected into an elite population, from which a solution that satisfies the dual constraints is selected. This yields the joint sequence of one person in the image.
There are 18 human joints; their distribution and serial numbers are shown in Figure 4. To avoid unreasonable joint connections, for example arm nodes connected to leg nodes, the admissible distances are specified according to the connection order of the human joint pose. The adjacency matrix is shown in Figure 5, where the red areas mark the joint pairs that may be connected. The distance of each admissible connection is computed according to the adjacency matrix, giving the distance matrix that corresponds to the adjacency matrix. Then, for each chromosome's connection sequence, the accumulated distance is summed, and this sum is the fitness value to be minimized.
To set the distance threshold, we first use the k-means algorithm to cluster all joints. Since the number of people is known, the positions of the two cluster centers can be obtained, and the Euclidean distance cdis between them can be calculated. Next, from the computed distance matrix between the related nodes, the distances from the center point to the left and right hip joints are extracted, the smallest and second-smallest values are selected, and their average pxmin is calculated. Since the positions of the people in the image cannot be controlled, cdis and pxmin must be compared: when cdis > pxmin, the distance threshold is th = cdis; when cdis < pxmin, the distance threshold is th = pxmin.
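The fitness value and the threshold rule can be sketched as follows. The encoding is an assumption made for illustration: each gene is 0 or 1 and selects one of the two candidate points of a joint, and `EDGES` is a hypothetical four-joint chain rather than the 18-joint adjacency matrix of Figure 5. The names `fitness` and `distance_threshold` are likewise illustrative.

```python
from math import dist

# Hypothetical connectable joint pairs (a stand-in for the adjacency matrix).
EDGES = [(0, 1), (1, 2), (2, 3)]

def fitness(chromosome, candidates, edges=EDGES):
    """Sum of connection distances over the points the chromosome selects.

    candidates[k] holds the two candidate points of joint k;
    chromosome[k] (0 or 1) picks one of them.
    """
    points = [candidates[k][g] for k, g in enumerate(chromosome)]
    return sum(dist(points[a], points[b]) for a, b in edges)

def distance_threshold(cdis, hip_distances):
    """Return th: cdis if it exceeds pxmin, otherwise pxmin.

    pxmin is the mean of the two smallest centre-to-hip distances.
    """
    smallest, second = sorted(hip_distances)[:2]
    pxmin = (smallest + second) / 2
    return max(cdis, pxmin)

# Two people standing at x = 0 and x = 10, four joints each.
candidates = [[(0.0, float(k)), (10.0, float(k))] for k in range(4)]
clean = fitness([0, 0, 0, 0], candidates)   # all joints from one person
mixed = fitness([0, 1, 0, 0], candidates)   # one joint taken from the other person
```

A chromosome that mixes joints from both people accumulates long cross-person connections, so its fitness value is larger than that of a consistent assignment.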
The mutation operation in the genetic algorithm swaps the positions of two genes on a parent chromosome to form a new offspring individual. Given the characteristics of the joints in two-person images, constraint (2) is first applied so that the distance between each pair of genes satisfies it. This yields a binary matrix for the current pose connection under constraint (2). In this matrix, a true value means that the constraint is exceeded and a false value means that it is satisfied. The indices of the true values give the positions of the genes in the chromosome that require mutation. The candidate points corresponding to these genes are then replaced, as shown in Figure 6.
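A minimal sketch of this constraint-guided mutation is given below. It assumes a binary encoding in which each gene selects one of two candidate points per joint, so "replacing the candidate point" amounts to flipping the gene. The binary matrix is simplified to one flag per connectable pair, and the function names are hypothetical.

```python
from math import dist

def violation_mask(chromosome, candidates, edges, th):
    """True for every connection longer than th, False otherwise."""
    points = [candidates[k][g] for k, g in enumerate(chromosome)]
    return [dist(points[a], points[b]) > th for a, b in edges]

def repair_mutation(chromosome, candidates, edges, th):
    """Flip the second joint of each connection that exceeds th.

    Connections are visited in order, so an earlier flip is taken into
    account when later connections are checked.
    """
    child = list(chromosome)
    for a, b in edges:
        pa = candidates[a][child[a]]
        pb = candidates[b][child[b]]
        if dist(pa, pb) > th:
            child[b] = 1 - child[b]  # switch to the other candidate point
    return child

edges = [(0, 1), (1, 2), (2, 3)]
candidates = [[(0.0, float(k)), (10.0, float(k))] for k in range(4)]
parent = [0, 1, 0, 0]
mask = violation_mask(parent, candidates, edges, th=2.0)
child = repair_mutation(parent, candidates, edges, th=2.0)
```

In this toy case the single misassigned joint makes two connections too long, and one flip repairs both.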

Experimental
The test data come from 8 TikTok videos. One picture is taken from each video, and each picture contains two people. We feed the 8 test pictures into the OpenPose model for prediction and compare the predicted human skeleton with the skeleton predicted by the algorithm proposed in this paper. The results are as follows.

Figure 7. Selected test results
As shown in Figure 7, the first image is the original image. The second image is the human 2D pose predicted by the OpenPose model. The third image is the human 2D pose obtained by applying the optimization algorithm to the predicted joints. The basic GA parameters (no crossover rate is used), namely the population size, the number of generations, and the mutation rate, are set to 90, 10, and 0.5, respectively.
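The parameter settings above can be placed in a minimal selection-plus-mutation loop. This is a sketch under stated assumptions, not the paper's implementation: the binary encoding, the keep-the-better-half elitism, the single-gene flip mutation, and the name `run_ga` are all illustrative choices.

```python
import random
from math import dist

def run_ga(candidates, edges, pop_size=90, generations=10,
           mutation_rate=0.5, seed=0):
    """Evolve binary chromosomes with elitist selection and mutation only."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(chrom):
        pts = [candidates[k][g] for k, g in enumerate(chrom)]
        return sum(dist(pts[a], pts[b]) for a, b in edges)

    population = [[rng.randrange(2) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        for i in range(pop_size - len(elite)):
            child = list(elite[i % len(elite)])
            if rng.random() < mutation_rate:
                k = rng.randrange(n)
                child[k] = 1 - child[k]  # pick the other candidate point
            children.append(child)
        population = elite + children
    best = min(population, key=fitness)
    return best, fitness(best), history

edges = [(0, 1), (1, 2), (2, 3)]
candidates = [[(0.0, float(k)), (10.0, float(k))] for k in range(4)]
best, best_fit, history = run_ga(candidates, edges)
```

Because the elite half is carried over unchanged, the best fitness value never gets worse from one generation to the next.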
As shown in Table 1, each optimal connection solution corresponds to a minimum error and an average error. The value of a connection solution is the total connection distance of one person's joints. Judging by the minimum and average errors, when searching for the optimal solution our model more often finds the optimal solution with the longer distance (the second standard optimal solution column). In pictures with complex figures, the optimal solution with the shorter distance is difficult to find. Since there are only two people in the picture, once one optimal solution is found, the remaining joints all belong to the other person.

Conclusion
In this paper, we use a genetic algorithm to find the best multi-person body part assembly. The results show that, using only joint coordinate information, our model produces connection results consistent with those of the OpenPose model. This achieves the goal of reducing the dependence of the human pose estimation model on features.