Facial feature point recognition method for human motion image using GNN

Abstract: To address the problems of recognition clarity and recognition efficiency for facial feature points under different human motion conditions, a facial feature point recognition method using the Genetic Neural Network (GNN) algorithm is proposed. The HiKey960 development board is used as the technical platform. An optimized BP neural network algorithm is used to collect and classify human motion facial images, and a genetic algorithm is introduced into the neural network algorithm to train on these images. Combined with the improved GNN algorithm, the facial feature points are detected through dynamic transplantation of facial feature points, and the detected points are passed to the face alignment algorithm to realize facial feature point recognition. The results show that the efficiency and accuracy of facial feature point recognition in different human motion images are both higher than 85%, the anti-noise performance is good, the average recall rate is about 90%, and the recognition time is short. The proposed method therefore has reference value in the field of human motion image recognition.


Introduction
Accurate acquisition of effective facial feature points (eyes, nose, etc.) is a prerequisite for correct facial recognition, so locating the basic feature points is the most important step in facial feature point recognition. Facial feature point recognition in human motion images is one type of this task. Facial landmark localization aims to detect predefined points on human faces, and it has improved rapidly with the recent development of neural network methods. However, it remains challenging for faces in unconstrained scenarios, especially with large pose variations. For large-pose faces, a split-and-aggregate strategy has been proposed: an anchor-based design simplifies the regression problem by splitting the search space, and aggregating the prediction results reduces uncertainty and improves localization performance [1,2]. There are various forms of facial feature point recognition. In Literature [3], three-dimensional (3D) face recognition is performed based on the geometric features of the face and local descriptors, and the key points of the obtained facial features are matched with the descriptors of the faces in the library. In addition, a covariance matrix descriptor is extracted to measure the matching degree of the face and improve the efficiency of the recognition algorithm. In Literature [4], a Gaussian pyramid of the facial image is established by multi-resolution analysis, and the Gabor feature spectrum of each layer of the pyramid is extracted. Because Gabor features lack a global description of the facial image, both local and global feature information are captured to realize face classification and recognition, which improves facial image recognition performance in complex and changeable environments.
In Literature [5], the RANSAC algorithm is used to extract facial image feature points and eliminate unstable matching points for accurate 3D face reconstruction, and singular value decomposition (SVD) is used to solve the coarse registration transformation matrix, which reduces the computation of 3D face recognition and improves its real-time performance. In Literature [6], to improve the efficiency of face recognition while keeping as much of the original face information as possible, principal component analysis is used to eliminate the correlation and noise between image features, and a further projection transformation reduces the data dimension and improves the recognition rate. Literature [7] describes the variation of facial expressions, uses dynamic optical flow features to improve the recognition rate of facial expressions, and proposes a new facial expression recognition method that extends traditional linear discriminant analysis and classifies facial expressions on the JAFFE and CK facial expression databases. However, the above methods have disadvantages such as long training time, slow convergence and a tendency to become trapped in local extrema. Therefore, this paper proposes a method for facial feature point recognition in human motion images using GNN. The genetic operator and the face detection algorithm are combined and introduced repeatedly into the training process of the GNN algorithm as a face classifier, which improves anti-interference performance and makes up for the deficiencies of the neural network. Dynamic transplantation of face feature points is carried out using the local features of the face feature points, and the face feature points are detected repeatedly to improve the convergence speed of the whole process.
The detected face feature points are passed to face alignment, which effectively improves the training speed and accuracy of the GNN algorithm and realizes high-quality recognition of face feature points in human motion images. The main contributions of this paper are as follows: 1) Among the many available development platforms, the HiKey960 development board is selected as the hardware according to the actual images used in this paper, which establishes the basic hardware conditions. 2) In human motion image collection, an error analysis is carried out to address the output errors caused by the limited classification training speed, which enhances the accuracy of image information collection. 3) By combining the face detection algorithm with the genetic operator, the GNN algorithm is improved to avoid inaccurate target representation, and its conversion characteristics are strengthened, which lays a foundation for high-precision recognition of face feature points in human motion images.

Related work
Many intelligent methods have been studied for facial feature point recognition in human motion images. In Literature [8], a weakly supervised learning method is proposed that can train a convolutional neural network (CNN) from unlabeled RGBD video with few annotations. More importantly, that work introduces a new data set, the Birmingham nuclear waste simulation data set, and evaluates the method on this new industrial target recognition challenge; the weakly supervised method proved very effective for this new application of RGBD object detection and recognition. Literature [9] studied neural network tracking control of under-actuated systems with unknown parameters and matched and mismatched disturbances, and proposed a new adaptive control scheme using multilayer neural networks, adaptive control and variable structure strategies. To cope with the uncertainty of the approximation error, unknown datum parameters, and time-varying matched and mismatched external disturbances, new auxiliary control variables are designed to establish the controllability of the non-configurable subset of the under-actuated system. Through the design of an appropriate robust compensator, the approximation errors and the matched and mismatched external disturbances are effectively cancelled. Literature [10] proposed a two-step single-trial classification method to identify three movements of the left and right arms (fist, extension and elbow flexion), distinguishing the left and right arms by decoding event-related synchronization. The motion is characterized by cortical coherence, from which the specific motion of the arm is recognized. The research shows that the method is effective for classifying different types of single-arm motions. Literature [11] proposed an improved general spatial pattern feature extraction method. First, for different objects, the Bhattacharyya distance method is used to select the best frequency band of each electrode.
Then the optimal frequency band signal is decomposed into spatial patterns, and the features that describe the largest difference are extracted from the EEG data, which improves the classification effect. However, the above methods all suffer from low recognition efficiency, low recognition accuracy and high noise in feature point recognition. For this reason, this paper proposes to recognize facial feature points in human motion images based on the GNN algorithm. The results show that the proposed method achieves high recognition efficiency and accuracy, a high signal-to-noise ratio and recall rate, and a short recognition time.

Recognition of facial feature points in human motion image
Through the selection and design of the software and hardware development platform, the improvement of the GNN design concept, and the matching and transplantation of facial feature points, the recognition of facial feature points in human motion images is completed.

Development platform design
The main goals of hardware selection are to shorten the development schedule and to evaluate the selected target platform carefully, including the availability of a convenient development environment and technical support. The mobile SoC platform released by Huawei offers high performance, and applying it to the Android system makes practical application development achievable [12,13]. Therefore, the HiKey960 development board is used as the hardware and is configured with the Android Open Source Project (AOSP) open-source system. To facilitate initial debugging, several special pins, such as the Joint Test Action Group (JTAG) interface, are brought out for an external debugging module [14].
A USB camera is used as the video input facility, and a driver is written to call it. The driver is built from the relevant information about the universal serial bus (USB) camera video input and from open-source driver libraries. The driving process includes: Step One: obtain the libusb library, which gives the application access to USB facilities without a restricting application program interface; Step Two: obtain the libuvc library, which is based on libusb, call the USB video facility, implement fine-grained control of the facility, and obtain the video stream; Step Three: obtain the compression library libjpeg-turbo, which serves as the codec for human motion facial images on Android, processes the video stream information, and uses single instruction, multiple data (SIMD) instructions to encode and decode the human motion facial images.
Android 8.0 is selected as the software system. It is smooth and stable, and can efficiently collect and recognize facial feature points in human motion images.

Human motion image acquisition based on optimized BP neural network
The BP neural network is a feedforward network composed of an input layer, a middle layer and an output layer. The middle layer may contain several layers, which helps to capture the interaction between factors. Each layer consists of several neurons; the neurons of two adjacent layers are connected by weights, and the magnitude of a weight indicates the connection strength between the two neurons. The computation of the whole network is a one-way pass from the input layer through the middle layer to the output layer, and the learning algorithm is introduced in this process. The structure of the BP neural network used in this paper is shown in Figure 1.
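The one-way pass described above can be illustrated with a minimal sketch. It assumes a logistic sigmoid activation, 7 input nodes and 12 middle-layer nodes (one reading of the node counts given in this section), and a hypothetical 2-node output layer; the weight initialization range and the input vector are illustrative only.

```python
import math
import random

def sigmoid(x):
    """Logistic activation 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    """One weight row per output neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_forward(weights, inputs):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    outputs = []
    for row in weights:
        # zip stops at len(inputs), so the trailing bias entry is added separately.
        total = row[-1] + sum(w * x for w, x in zip(row, inputs))
        outputs.append(sigmoid(total))
    return outputs

def forward(network, inputs):
    """One-way pass: input layer -> middle layer(s) -> output layer."""
    activations = inputs
    for weights in network:
        activations = layer_forward(weights, activations)
    return activations

rng = random.Random(0)
# Assumed sizes: 7 inputs, 12 middle-layer nodes, 2 hypothetical output classes.
network = [init_layer(7, 12, rng), init_layer(12, 2, rng)]
scores = forward(network, [0.1, 0.4, 0.3, 0.9, 0.5, 0.2, 0.7])
```

Each entry of `scores` lies strictly between 0 and 1 because the sigmoid is applied at the output layer as well.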
The output sigmoid function of the neurons in each layer of the BP neural network is expressed as: where is node interval, is node mapping function. The node weight from the middle layer (input layer) to the output layer (or middle layer) is , is the mapping interval, and the number of node input values is B. = 7 and = 12 refer to the number of nodes from the input layer to the middle layer and the number of nodes from the middle layer to the output layer. Since the number of nodes from the input layer to the middle layer cannot be accurately determined and the classification effect cannot be guaranteed, the revised node weight is . The premise for the revised weight value as is that the original value of ( = 1,2, … 7; = 1 … … 12) is not within the range. And it satisfies the information formula that the intermediate layer accepts the reverse transmission of the output layer: where ( ) refers to the connection weight value at moment from the layer's neutral node (the middle layer or input layer) to the upper layer's neutral node (the input layer or middle layer). The actual output value of neuron in this layer at time is ; , and refer to step size adjustment factor, smoothing factor and error weight adjustment factor, and ∈ (0,1), and ∈ (0,1). The intermediate node and the output node are represented by Eqs (3) and (4), respectively.
where and represent the actual output value and output target value of the intermediate node . Limited by the classification training speed, it is easy to produce output errors [15,16]. Therefore, error analysis is performed. In the case that the intermediate layer accepts the information formula passed by the output layer backward, the relative error function is: where and refer to the network's actual output value and predicted output value. When there is no error in the calculation of the model, the error shows less than the error tolerance of the network; conversely, if the error exceeds the network error tolerance, return to the second step to adjust the weight and continue to calculate until the error tolerance requirement is met [17,18].
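The quantities defined in this subsection (a sigmoid output, a step size adjustment factor and a smoothing factor in (0,1), an error weight adjustment factor, and actual and target outputs) are consistent with the standard momentum form of back-propagation. As a sketch only, and in notation assumed here rather than taken from Eqs (1)-(5) ($w_{ij}$ for the weight, $o_i$ for a neuron output, $\eta$ for the step size, $\alpha$ for the smoothing factor, $\delta_j$ for the error term, $d_j$ for the target, $y$ and $\hat{y}$ for actual and predicted network outputs), the relations could be written as follows. The relative error in the last line is one common definition and is likewise an assumption.

```latex
% Assumed notation; a standard momentum-BP form, not necessarily the paper's Eqs (1)-(5).
\begin{align}
f(x) &= \frac{1}{1 + e^{-x}} \\
w_{ij}(t+1) &= w_{ij}(t) + \eta\,\delta_j\,o_i
              + \alpha\,\bigl[w_{ij}(t) - w_{ij}(t-1)\bigr],
              \quad \eta, \alpha \in (0,1) \\
\delta_j &= o_j\,(1 - o_j)\sum_k \delta_k\,w_{jk} && \text{(intermediate node)} \\
\delta_j &= o_j\,(1 - o_j)\,(d_j - o_j) && \text{(output node)} \\
E &= \frac{\lvert y - \hat{y} \rvert}{\lvert y \rvert}
\end{align}
```

Under this reading, training repeats the weight update until $E$ falls below the network's error tolerance.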

Improve GNN with genetic algorithm
1) Fixed network structure
The genetic algorithm combines Darwin's principle of survival of the fittest with Mendel's laws of inheritance, and operates on an encoding of the problem to be solved. Genetic operations such as selection, crossover and mutation are applied for generational evolution, and the optimal and suboptimal solutions of the problem are finally obtained. This study uses real-valued coding to fix the network structure [19]. Real-valued coding offers high calculation accuracy and simple calculation, and it also allows the output of the neural network to be expressed in coded form. Equation (6) is the coding criterion.
where A represents the node distribution function. In order to reduce the chromosome length of the genetic algorithm and simplify the genetic operations [20], the coding problem in Eq (6) is solved after [ ] is rounded and decoded. The fitness function needs to be satisfied during the rounding process: where refers to non-negative complexity value, and refers to continuous differentiable conditions. This completes the decoding and simplifies the calculation.
2) Improve GNN network structure with genetic operator
Combined with the genetic algorithm after real-valued coding, the GNN algorithm is improved to recognize the facial feature points of human motion images.
Firstly, the human motion image collected by the optimized BP neural network is preprocessed to form the morphological filtering of human motion face image: where ′ represents the morphological filter value. represents the original image shape value. G is a square lattice.
After morphological filtering, in order to ensure the quality of the human motion image and improve the image SNR, each image pixel is tested to decide whether it is an impact (impulse) noise pixel, as shown in Eq (9).
Where ( , ) represents the image pixel value, and are pixel points, and represents the threshold. If the judgment result of the impact noise pixel is 1, the pixel is a noise pixel.
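The threshold test can be sketched as follows. This is an assumed decision rule, not the paper's Eq (9): a pixel is flagged with 1 when it differs from the median of its 3x3 neighbourhood by more than a threshold. The function names, the threshold value of 60 and the sample image are hypothetical.

```python
from statistics import median

def is_impulse(image, i, j, threshold):
    """Return 1 if pixel (i, j) deviates from its 3x3 neighbourhood median
    by more than `threshold`, else 0 (assumed rule, for illustration)."""
    rows, cols = len(image), len(image[0])
    window = [image[r][c]
              for r in range(max(0, i - 1), min(rows, i + 2))
              for c in range(max(0, j - 1), min(cols, j + 2))]
    return 1 if abs(image[i][j] - median(window)) > threshold else 0

def impulse_mask(image, threshold=60):
    """Apply the per-pixel test to the whole image."""
    return [[is_impulse(image, i, j, threshold) for j in range(len(image[0]))]
            for i in range(len(image))]

# Small grey-level patch with one "salt" pixel (255) and one "pepper" pixel (0).
image = [
    [120, 122, 121, 119],
    [118, 255, 120, 121],
    [121, 119,   0, 120],
    [122, 120, 121, 118],
]
mask = impulse_mask(image)
```

Only the two extreme pixels are flagged in this patch; the flagged positions could then be replaced by the neighbourhood median before further processing.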
The variable threshold of image morphological filtering is binarized, and the eye position in human motion face image is determined by intersection and separation transform, and the specific morphological unit is extracted from human motion face image. The equation is: where represents the threshold of a specific shape. represents image morphological space.
represents the maximum threshold. The eyeball is the closest circle in the human motion face image. Therefore, in this formula, the circular structural unit is . Then the unit excluding in the square range of G is * .
The maximum point of the detected morphological transformation is taken as the judgment point, that is, the eyeball point. The positions of the feature points are then determined by projecting the human motion face image according to the clearly identified regional points.
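Assuming that the "intersection and separation transform" refers to the morphological hit-or-miss transform, the eye localization step can be sketched on a binarized image as below. The circular structural unit is approximated by a discrete disc of radius 1, the complementary unit is the rest of a 5x5 square window G, and all names and sizes are illustrative assumptions.

```python
def disc_offsets(radius):
    """Offsets (dr, dc) of a discrete disc: the circular structural unit."""
    return {(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius}

def hit_or_miss(binary, hit, miss):
    """Return centres where every `hit` offset is 1 and every `miss` offset is 0."""
    rows, cols = len(binary), len(binary[0])

    def value(r, c):
        # Pixels outside the image are treated as background (0).
        return binary[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    matches = []
    for r in range(rows):
        for c in range(cols):
            if (all(value(r + dr, c + dc) == 1 for dr, dc in hit)
                    and all(value(r + dr, c + dc) == 0 for dr, dc in miss)):
                matches.append((r, c))
    return matches

square = {(dr, dc) for dr in range(-2, 3) for dc in range(-2, 3)}  # window G
hit = disc_offsets(1)   # circular unit
miss = square - hit     # everything in G outside the circular unit

binary = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
eye_candidates = hit_or_miss(binary, hit, miss)
```

The round blob is matched at its centre, while the square blob in the corner is rejected because it does not fit the circular unit.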

Recognition of human facial feature points
The obtained human motion facial image is transferred to the face alignment method to detect the facial feature points. The detailed process is shown in Figure 2.
The face detection process shown in Figure 2 is written using the computer vision library OpenCV, CAFFE and the programming language Python, and it effectively detects the location of the face together with five feature points. The source code of the face detection algorithm is compiled using MATLAB and CAFFE, and the finished face detection program is compiled into a dynamic link library of the Android application using NDK and CAFFE. Face alignment is realized using the regression tree algorithm: a cascade of regression trees is built to restore the actual shape of the face, Gradient Boosting Decision Tree (GBDT) is used in the alignment process, and the GBDT trees are connected serially. The open-source Dlib library is obtained so that the facial recognition program can be built on multiple platforms. The transplantation process of this open-source library is divided into four parts:
1) The operations on human motion facial images and files, such as image format conversion, are implemented in compiled code and assigned to the project for calling.
2) The human motion facial image information obtained by the face detection algorithm is passed to the face alignment method to complete the face alignment. The NDK command, combined with the computer vision library, is used to compile the face alignment code into a dynamic link library.
3) Functions such as calling the dynamic link library and loading the trained model are applied to the main project.
4) Configuration files such as build.gradle are rewritten, the gradlew command set is used to build all projects and generate an Android installation package, and the adb command set is used to install the package on the development board, where it passes verification.
The program can accurately and effectively obtain the position information of the facial image feature points of 50 human motions and can show the detection results on the display. Following the above process, the facial feature point detection algorithm is realized from the clearly identified regions of the eyes, nose and mouth, which completes the effective recognition of facial feature points in human motion images.
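The serial connection of GBDT trees mentioned above can be illustrated with a toy sketch. It is not the alignment model itself: it fits a scalar target with one-split regression stumps under squared loss, so that each tree corrects the residual left by the trees before it. In cascaded face alignment the target would instead be a landmark shape update predicted from image features. All names and the sample data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split minimising squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_gbdt(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit trees serially; each one is trained on the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, trees

def predict_gbdt(model, x):
    base, learning_rate, trees = model
    return base + learning_rate * sum(predict_stump(t, x) for t in trees)

def mean_squared_error(model, xs, ys):
    return sum((y - predict_gbdt(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: a feature value and the landmark offset to be regressed.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 0.0, 1.0, 1.0, 3.0, 3.0, 6.0, 6.0]
model = fit_gbdt(xs, ys)
baseline = (model[0], model[1], [])  # mean-only model, no trees
```

Comparing `mean_squared_error(model, xs, ys)` with the same measure for `baseline` shows how much the serially connected trees reduce the fitting error.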

The proposed method
In this study, genetic operators are used to improve the network structure, the GNN network is trained until the specified error requirements are met, and accurate GNN optimization is thereby implemented. The optimization process is shown in Figure 3.
6) If the accuracy requirements are met or the designed maximum number of evolutionary generations is reached, the genetic operation is terminated. If not, return to step 2), using real-valued coding so that the coding scheme does not interfere with the design process; 7) According to the GNN logic, the optimal individual in the recognition area is obtained, and the GNN network is trained according to the decoding result of the genetic algorithm; 8) The improved GNN is used to perform precise optimization of the network and determine the location of the facial feature points; 9) Implement the face alignment algorithm; 10) Write the USB camera video input driver; 11) Compile the implemented face detection algorithm into a dynamic link library and run the complete process on the Android platform.
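The evolutionary part of this process, with real-valued coding and termination either at a target accuracy or at a maximum number of generations, can be sketched as follows. The fitness function (inverse squared error of a tiny linear model), tournament selection, arithmetic crossover, Gaussian mutation, elitism and all parameter values are assumptions chosen for illustration; the best chromosome would then be decoded into initial network weights for GNN training.

```python
import random

def fitness(chromosome, samples):
    """Hypothetical fitness: 1 / (1 + squared error) of a linear model."""
    error = sum((sum(w * x for w, x in zip(chromosome, inputs)) - target) ** 2
                for inputs, target in samples)
    return 1.0 / (1.0 + error)

def tournament(population, scores, rng):
    """Selection: the fitter of two randomly drawn individuals."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if scores[a] >= scores[b] else population[b]

def crossover(parent_a, parent_b, rng):
    """Arithmetic crossover, suited to real-valued genes."""
    mix = rng.random()
    return [mix * ga + (1.0 - mix) * gb for ga, gb in zip(parent_a, parent_b)]

def mutate(chromosome, rng, rate=0.1, scale=0.2):
    """Gaussian perturbation applied to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g
            for g in chromosome]

def evolve(samples, n_genes, pop_size=30, max_generations=200,
           target_fitness=0.999, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best, best_score, history = None, -1.0, []
    for _ in range(max_generations):        # stop at the maximum generation
        scores = [fitness(c, samples) for c in population]
        top = max(range(pop_size), key=scores.__getitem__)
        if scores[top] > best_score:
            best, best_score = list(population[top]), scores[top]
        history.append(best_score)
        if best_score >= target_fitness:    # accuracy requirement met
            break
        children = [mutate(crossover(tournament(population, scores, rng),
                                     tournament(population, scores, rng), rng), rng)
                    for _ in range(pop_size - 1)]
        population = [list(best)] + children  # elitism keeps the best individual
    return best, best_score, history

samples = [([1.0, 0.0], 0.3), ([0.0, 1.0], -0.6), ([1.0, 1.0], -0.3)]
best, best_score, history = evolve(samples, n_genes=2)
```

Because the best individual is carried over unchanged, the recorded best fitness never decreases from one generation to the next.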

Data set
The operating system used in this experiment is Windows 10, and the algorithm runs on the Caffe framework with Python. To verify the improvement achieved by the proposed method, the AFW data set [21] and the WFLW data set [22] are selected as data sources. AFW data set: this data set is mainly applicable to face image recognition and includes 473 face markers. Each face is enclosed in a rectangular bounding box, and each image contains 6 landmarks. WFLW data set: this data set marks 98 key points on 10,000 faces; it has a large number of images and diverse environments, with attribute information such as occlusion, illumination and expression. Face images of running, playing table tennis and playing basketball are randomly selected from the above two data sets to form two data sets, which are processed by the face detector to form a large number of overlapping blocks. Each data set contains 30,000 data and 11,000 videos, and 78 typical data points are marked, which are converted into about 200,000 images. Each image is annotated with 68 labels. In this experiment, 200,000 images are selected for data training, and the remaining half are used for experimental test and analysis, with tests carried out on 40,000, 60,000 and 100,000 data respectively.
After the face image is obtained from the bounding box set by the face detector, the specific position of the face is analyzed from the input coordinate data and the face detection data, the specific positions of the feature points are determined, and the training and test images are checked repeatedly against the feature point pixel data. The internal parameter matrix is obtained by the chessboard calibration method, and the existing model is then used to identify facial feature points. From the coordinates of the detected feature points, the homography matrix of the frontal plane spanned by the corners of the eyes and mouth is calculated, and the face angle of each image is then solved under the constraint of the feature points. The face angle is calibrated according to the average distance between the normalized predicted coordinates and the real coordinates, to ensure the practicability and applicability of each image. Partial sample collection results are shown in Figure 4. The proposed method is compared with the methods in [3,8,9] and with the CNN method for facial feature point recognition, and Figure 4 is analyzed to verify the application effect of the proposed method.
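The average distance between normalized predicted coordinates and real coordinates can be computed as in the sketch below. Normalizing by the distance between the two eye points is an assumption made here (a common choice for landmark error), and the five sample points are hypothetical.

```python
import math

def normalized_mean_error(predicted, truth, norm_distance):
    """Mean Euclidean distance between matching points, divided by a
    normalizing length such as the distance between the eyes."""
    distances = [math.hypot(px - tx, py - ty)
                 for (px, py), (tx, ty) in zip(predicted, truth)]
    return sum(distances) / (len(distances) * norm_distance)

# Hypothetical points: left eye, right eye, nose tip, two mouth corners.
truth = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (35.0, 80.0), (65.0, 80.0)]
predicted = [(x + 3.0, y + 4.0) for x, y in truth]  # every point is off by 5 pixels

interocular = math.hypot(truth[1][0] - truth[0][0], truth[1][1] - truth[0][1])
error = normalized_mean_error(predicted, truth, interocular)
```

Here every predicted point is 5 pixels from its true position and the eyes are 40 pixels apart, so the normalized error is 5/40 = 0.125.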

Evaluation criteria
Facial recognition point mining: In the case of two kinds of noise, the proposed method verifies the mining results of face recognition points of different types of human motion images.
SNR: In the experiment, 20% salt and pepper noise and 50% salt and pepper noise are added to verify the SNR of the proposed method. The SNR is calculated as $\mathrm{SNR} = 10 \lg (P_s / P_n)$, where $P_s$ and $P_n$ refer to the effective power of the signal and of the noise; the ratio can also be expressed in terms of voltage amplitudes.
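A minimal sketch of this evaluation is given below, assuming the usual power-ratio definition of SNR in decibels and measuring noise as the difference between the clean and the corrupted image. The helper names, the flat 32x32 test image and the random seed are hypothetical.

```python
import math
import random

def add_salt_and_pepper(image, density, rng):
    """Set a fraction `density` of pixels to 255 (salt) or 0 (pepper)."""
    noisy = [row[:] for row in image]
    for i in range(len(noisy)):
        for j in range(len(noisy[0])):
            if rng.random() < density:
                noisy[i][j] = 255 if rng.random() < 0.5 else 0
    return noisy

def snr_db(clean, noisy):
    """10 * log10(signal power / noise power), noise = clean - noisy."""
    signal_power = sum(p * p for row in clean for p in row)
    noise_power = sum((p - q) ** 2
                      for clean_row, noisy_row in zip(clean, noisy)
                      for p, q in zip(clean_row, noisy_row))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)

rng = random.Random(0)
clean = [[128] * 32 for _ in range(32)]
snr_20 = snr_db(clean, add_salt_and_pepper(clean, 0.20, rng))
snr_50 = snr_db(clean, add_salt_and_pepper(clean, 0.50, rng))
```

The value for the raw corrupted image drops as the noise density rises; the SNR figures reported for a method would be measured on its filtered output instead.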
Recognition accuracy/recognition efficiency: when 20% salt and pepper noise and 50% salt and pepper noise are added, the recognition accuracy and recognition efficiency of the proposed method are verified. The verification equation is: Recognition efficiency： where is total recognition feature points, h is the number of facial feature points extracted, T is the adjustment coefficient, and ( ) is the efficiency function.
In the recognition accuracy analysis, each image is annotated with 68 labels. The feature points are defined according to the regional point projection, so that the face feature points optimized by the adjustment coefficient correspond one-to-one to the extracted feature points. After the face alignment algorithm is applied, the result is transferred to the dynamic link library to complete face alignment and to ensure that the extracted feature points and the total recognized feature points are recognizable feature points with the same location information.
Recognition time: Time consumption is an important index for judging the performance of a method. The recognition time of the proposed method is compared with that of the methods in [3,8,9] and the CNN method.
Recall rate: To further verify the effectiveness of the proposed method, the recall rate is used for verification and analysis. According to Figure 5, when the comparison methods recognize human facial feature points in different motion images, the features they recover in the missing regions are coarse, resulting in a poor pixel mining effect.
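The recall rate can be computed as the share of annotated feature points that are actually recovered. The sketch below assumes that a ground-truth point counts as recalled when some predicted point lies within a pixel tolerance of it; the tolerance and the sample coordinates are hypothetical.

```python
import math

def recall(predicted, truth, tolerance):
    """Fraction of ground-truth points with a prediction within `tolerance` pixels."""
    recalled = sum(
        1 for tx, ty in truth
        if any(math.hypot(px - tx, py - ty) <= tolerance for px, py in predicted)
    )
    return recalled / len(truth)

truth = [(30, 40), (70, 40), (50, 60), (35, 80), (65, 80)]
# Four points found close to their annotations; one mouth corner is missed.
predicted = [(31, 41), (69, 40), (50, 62), (36, 79)]
rate = recall(predicted, truth, tolerance=3.0)
```

In this example four of the five annotated points are matched, so the recall is 0.8.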

Results and discussion
The CNN method recognizes fewer than 29 points, the method in [3] recognizes fewer than 27 points, and the methods in [8,9] recognize fewer still, not reaching 24 points. The proposed method recognizes more than 32 feature points, which can meet the needs of facial feature recognition in motion. After adding 20% and 50% salt and pepper noise, the comparison results for the SNR, recognition accuracy and recognition efficiency of the methods are shown in Tables 1 and 2. To improve the quality of the experimental data, the data in Tables 1 and 2 are average values obtained from multiple experiments. According to Tables 1 and 2, for different human motion images, when 20% salt and pepper noise is added, the recognition efficiency and accuracy of the proposed method for facial feature points are both higher than 85%. When 50% salt and pepper noise is added, they remain higher than 85%, which indicates that the method is hardly affected by environmental noise. In contrast, the methods in [3,8,9] behave clearly differently under the two noise levels, and their overall performance is better under 20% salt and pepper noise.
According to the statistics, under 20% salt and pepper noise, the recognition efficiency and accuracy of the method in [3] are no more than 50%, and those of the method in [8] are no more than 60%. The recognition efficiency and accuracy of the method in [9] do not exceed 70% and 50%, respectively, and those of the CNN method do not exceed 60%, all of which are far lower than the proposed method. In summary, under different levels of salt and pepper noise, the proposed method has obvious advantages in SNR, recognition accuracy and recognition efficiency. This is because the proposed method combines the face detection algorithm with genetic operators to improve the GNN algorithm. The dynamic transplantation results of the face feature points are combined with the projection results of the regional points, which minimizes the influence of the face angle or noise on the recognition results during human motion or in the presence of noise. In other words, the matching degree of the feature points is improved, which in turn improves the recognition accuracy and efficiency. Under 20% and 50% salt and pepper noise, the SNR of the proposed method lies in the range [33 dB, 36 dB]. The SNR of the method in [3] lies in [20 dB, 26 dB], that of the methods in [8,9] in [20 dB, 27 dB], and that of the CNN method in [21 dB, 27 dB]. None of the four comparison methods exceeds 27 dB, which is significantly lower than the proposed method.
The image SNR of the proposed method is higher than that of the traditional methods mainly because this paper uses a double compound morphological filter to preprocess the human motion images collected by the optimized BP neural network, which ensures the quality of the human motion images and improves the image SNR. In addition, each image pixel is further judged to determine whether it is an impact noise pixel, which provides a basis for improving the SNR of human motion images. The methods in [3,8,9], however, do not deal with image noise sufficiently, so their image SNR is lower than that of the proposed method.
The recognition time comparison for the different methods is shown in Table 3. According to Table 3, the recognition time of the proposed method is significantly lower than that of the other methods. Under the different data sizes, the recognition time of the proposed method does not exceed 7 s. Among the other methods, the recognition in [9] takes the longest: when the amount of data is 100,000, its recognition time is up to 30 s. The maximum recognition time in [3] is 23 s, and the maximum recognition time in [8] is 28 s, which are several times higher than that of the proposed method. This demonstrates the advantage of the proposed method. The comparison of the recall rates of the different methods is shown in Figure 6.
According to Figure 6, the recall curve of the proposed method fluctuates around 90%, so its recall rate is high. The recall curves of the other methods are lower than that of the proposed method, and the differences are significant. The recall curves of the methods in [3,8] vary widely, with a maximum of about 80% and a minimum of only about 50%, so their stability is poor. The recall rate of the CNN method varies from 60% to 75%, and the overall recall rate of the method in [9] is the lowest, at less than 70%. It can be seen that the proposed method achieves better recognition.

Conclusions and future works
In this paper, we proposed a facial feature point recognition method for human motion images using GNN. The method is based on the HiKey960 development board, and MATLAB and CAFFE are used to write the source code of the face detection algorithm. Through the transplantation of the facial feature point recognition algorithm, face detection and face alignment, the recognition of facial feature points in human motion images is realized. Experiments show that the recognition efficiency and accuracy for face feature points in different human motion images are high, the signal-to-noise ratio and recall rate are high, and the recognition time is short. The proposed method can recognize the facial feature points of different human motion images accurately and stably, and it can be applied to aerial photography for intelligent transportation. In future work, we need to increase investment in the experimental platform and experimental data to provide a theoretical reference for research on neural networks and related fields.