Research on the Pedestrian Re-Identification Method Based on Local Features and Gait Energy Images

The appearance of pedestrians can vary greatly from image to image, and different pedestrians may look similar in a given image. Such similarities and variabilities in the appearance and clothing of individuals make the task of pedestrian re-identification very challenging. Here, a pedestrian re-identification method based on the fusion of local features and gait energy image (GEI) features is proposed. In this method, the human body is divided into four regions according to joint points. The color and texture of each region of the human body are extracted as local features, and GEI features of the pedestrian gait are also obtained. The local and GEI features of the person are then fused. Independent distance metric learning with the cross-view quadratic discriminant analysis (XQDA) method is used to obtain a similarity function for image pairs under each feature, and the final similarity is obtained by weighted matching. Evaluation of the experimental results by cumulative matching characteristic (CMC) curves reveals that, after fusion of local and GEI features, the pedestrian re-identification rate is higher than that of the compared methods and notably higher than that obtained with a single feature.


Introduction
Pedestrian re-identification is a research hotspot in the field of intelligent video image analysis. The objective of this task is to identify cross-camera pedestrian targets in nonoverlapping monitoring networks [Liu, Tao, Song et al. (2015); Li, Zhou, Yu et al. (2019); Zhou, Ke and Luo (2019)]. Pedestrian re-identification technology is widely used in intelligent video surveillance, criminal investigations, home security management and other fields. In the video surveillance environment, pedestrian re-identification mainly depends on pedestrian appearance and clothing, and pedestrians with similar clothing or who have changed clothing can easily affect the identification results. Therefore, pedestrian re-identification has become a challenging and valuable topic in the field of computer vision.
Pedestrian re-identification involves extracting the features of a person, calculating the distance between feature vectors with a distance measurement function, and then using this distance to describe the similarity between persons. Cheng et al. [Cheng, Cristani and Stoppa (2011)] used a pictorial structure to divide the human body into different parts and extract features from each part. Hervé et al. [Hervé and Ondřej (2012)] estimated the salient region of each image, learned the similarity function of the salient regions, and sorted the results. Prosser et al. [Prosser, Zheng, Gong et al. (2010)] used a support vector machine to measure the similarity of predefined color and texture feature vectors. Liao et al. [Liao, Hu, Zhu et al. (2015)] established a feature expression method using color and texture histograms and proposed the local maximal occurrence (LOMO) model for calculating local color and texture features and obtaining an ultrahigh-dimensional feature vector. Matsukawa et al. [Matsukawa, Okabe, Suzuki et al. (2016)] proposed a Gaussian of Gaussian (GOG) descriptor that adopted a hierarchical Gaussian operator to divide the image into different regions described by multiple Gaussian distributions representing color and texture information. Each Gaussian distribution represented a small image block, and the features of each image block were combined to produce the feature vector of the pedestrian image.
Wang et al. [Wang, Gong, Zhu et al. (2014)] first extracted the motion features of a pedestrian by calculating the gait energy image (GEI) [Hofmann, Bachmann and Rigoll (2012)] of each pedestrian video sequence and then fused HOG3D spatiotemporal features before finally conducting similarity measurements with discriminating video frame ranking models. Deng et al. [Deng, Wang, Cheng et al. (2017)] established a comprehensive feature of human gait dynamics by combining the spatiotemporal and kinematic gait characteristics of the human body. When the walking conditions of the human body change, this method can still achieve excellent performance. However, this method requires a high-quality gait sequence, the computational cost of establishing the model is large, and it is difficult to achieve the ideal effect. Mogan et al. [Mogan, Lee and Tan (2017)] proposed a time gradient pattern method for gait recognition that not only described the spatial contour shape but also implicitly captured contour deformation over time.
Based on the distribution of the feature space and the relationship between sample points, Zheng et al. [Zheng, Zheng and Yang (2018)] proposed an algorithm combining verification loss and classification loss, but the algorithm did not consider image alignment and local features of pedestrian images. Zhao et al. [Zhao, Tian and Sun (2017)] used key points of the human body as prior knowledge to divide it into several fixed rigid structures, thereby strengthening local feature learning; however, the algorithm did not consider the integrity of the images of individuals and background features, resulting in low recognition accuracy. Sun et al. [Sun, Liang, Yang et al. (2017)] proposed a method for dividing the feature image horizontally and fusing multi-granularity features to complete the task of pedestrian re-identification; however, the approach did not consider pedestrian image misalignment.
The rest of this article is organized as follows. In Section 2, color and texture features are extracted as local features, while gait features are extracted in Section 3. In Section 4, local and gait features are fused, and similarity measurement functions are learned. In Section 5, experiments are carried out to verify the effectiveness of the proposed algorithm. Finally, the last section concludes the paper.

Local feature extraction
In this paper, a pedestrian is assumed to be composed of various parts, and a convolution structure is adopted to locate the pedestrian's body nodes [Syed, Tahir, Robina et al. (2019)]; the body structure is then divided according to the node positions, and the images are aligned. The extracted nodes divide the pedestrian into four parts: the head, the trunk and arms, the left leg, and the right leg. Fig. 1 shows the body area division diagram. Each part is scaled to a fixed, uniform pixel size; the set of part images can be described as $P = \{P_1, P_2, P_3, P_4\}$. Finally, the color and texture features of each part are extracted.

Color feature extraction
Hue-Saturation-Value (HSV), Lightness-A-B (LAB) and Luminance-Chroma: Blue-Chroma: Red (YCbCr) are three complementary color spaces; describing a given object in all three can therefore better reflect the differences between samples. This paper extracts features in the HSV, LAB and YCbCr color spaces. Due to illumination, clothing material and other factors, the color of images can vary greatly for the same pedestrian between different cameras and for different pedestrians with the same camera, which affects the extracted color [Wang, Huo, Yu et al. (2019)]. Therefore, before extracting the color feature histogram, the image is first processed with the retinex image enhancement algorithm [Nejad and Shiri (2019)] to reduce the influence of illumination on color [Yang, Yang, Yan et al. (2014)]. According to the image division standard shown in Fig. 1, a pedestrian image is divided into 4 area blocks. For each channel of each color space of each area block, a 16-dimensional color feature histogram is extracted, and all the color feature histograms are then concatenated to generate the color features of the pedestrian image. A 576-dimensional (4×3×3×16) color feature can thus be extracted for each pedestrian image. If the original 576-dimensional features were used to directly learn the similarity metric function, 2×576×576 parameters would have to be estimated, which is too many. Therefore, in the experiment, principal component analysis (PCA) is used to reduce the feature dimension of the pedestrian image to a specific dimension.
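The 4×3×3×16 layout of the color feature can be illustrated with a minimal sketch. It assumes that each of the four body regions has already been retinex-enhanced and converted to the three color spaces, and that every channel is stored as 8-bit values in 0 to 255; the function name `color_histogram_feature`, the per-histogram normalization and the random test data are illustrative assumptions, not part of the paper.

```python
import numpy as np

def color_histogram_feature(regions, bins=16):
    """Concatenate per-channel histograms over regions and color spaces.

    regions: list of 4 body regions; each region is a list of 3 arrays of
    shape (H, W, 3), one per color space (assumed pre-converted, 0..255).
    """
    feats = []
    for region in regions:                 # 4 body regions
        for img in region:                 # 3 color spaces
            for c in range(3):             # 3 channels per color space
                hist, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
                # Normalizing each histogram is an assumption of this sketch.
                feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

# Random stand-in data: 4 regions x 3 color spaces of 32x16 pixels.
rng = np.random.default_rng(0)
regions = [[rng.integers(0, 256, size=(32, 16, 3), dtype=np.uint8)
            for _ in range(3)] for _ in range(4)]
feature = color_histogram_feature(regions)
```

The resulting vector has 4 × 3 × 3 × 16 = 576 entries, which is the dimension that PCA subsequently reduces.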

Texture feature extraction
To better describe the local information of the pedestrian image, the scale invariant local ternary pattern (SILTP) is used to extract the texture information of the pedestrian image. The SILTP [Liao, Zhao, Kellokumpu et al. (2010)] is an improved local binary pattern (LBP) description operator with good robustness to noise in the region. When the detection region is dark, covered by shadows or contains a lot of noise, the operator has strong adaptability. Moreover, the SILTP operator exhibits scale invariance, which makes it robust to brightness changes; thus, the SILTP feature will be only slightly affected, even if the illumination suddenly changes from dark to bright. Fig. 2 shows the encoding process of the SILTP operator in this algorithm.
In Eqs. (1) and (2), $I_c$ is the grayscale value of the image center pixel, $I_k$ is the grayscale value of the $k$-th of the $Q$ neighborhood pixels at radius $R$, $\oplus$ concatenates the binary values of all neighborhood pixels into a string, and $t$ is the threshold range. As shown in Fig. 2, the image is binary coded according to Eqs. (1) and (2), and the center point is coded as 0010010100001001 in counterclockwise order. Fig. 2 reveals that, even if the image contains noise or scale changes within a certain range, the SILTP coding value remains unchanged, which indicates that the operator is robust to problems such as illumination changes. According to the image division standard shown in Fig. 1, for each area block, the SILTP features are extracted in two modes with a threshold of 0.3: a neighborhood of 8 with a radius of 1, and a neighborhood of 16 with a radius of 2. The feature in each mode is 81-dimensional. The features of the four area blocks are then concatenated, and a 648-dimensional (4×2×81) texture feature can be extracted from each pedestrian image. The PCA algorithm reduces the feature to a specific dimension.
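The encoding step can be sketched for a single 3×3 patch. The sketch assumes the standard SILTP thresholding of Liao et al. (2010): a neighbor is coded 01 if $I_k > (1+t) I_c$, 10 if $I_k < (1-t) I_c$, and 00 otherwise. The function name `siltp_code`, the starting neighbor (right of the center) and the example patch are assumptions made for illustration; the paper's Fig. 2 may use a different starting point.

```python
import numpy as np

def siltp_code(patch, t=0.3):
    """Return the 16-bit SILTP string for the center of a 3x3 patch
    (8 neighbors at radius 1, visited counterclockwise)."""
    center = float(patch[1, 1])
    # (row, col) offsets, counterclockwise starting at the right neighbor.
    # The starting point is an assumption of this sketch.
    neighbors = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    bits = []
    for r, c in neighbors:
        value = float(patch[r, c])
        if value > (1 + t) * center:
            bits.append("01")      # clearly brighter than the center
        elif value < (1 - t) * center:
            bits.append("10")      # clearly darker than the center
        else:
            bits.append("00")      # within the tolerance band
    return "".join(bits)

patch = np.array([[100, 140, 60],
                  [100, 100, 100],
                  [131, 69, 100]], dtype=float)
code = siltp_code(patch)
scaled_code = siltp_code(2.0 * patch)   # uniform brightness scaling
```

Because the thresholds scale with the center value, multiplying every pixel by the same factor leaves the code unchanged, which is the scale invariance described above.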

Gait feature extraction
The pedestrian re-identification results are affected to a certain extent by the appearance and clothing of pedestrians [Zhao, Ouyang and Wang (2013)]. Compared with other biometrics, gait features have the advantages of being acquirable noninvasively at a long distance, being difficult to disguise or hide, and being easy to extract and recognize, and they can be used with low-resolution images. GEIs provide abundant static and dynamic information for gait recognition. The LBP is simple to calculate and accurate on simple images, but its description is incomplete and its accuracy is low for complex images. HOG features can suitably describe the edge information of images. To fully consider the local deformation of the human target image and to better describe the boundary contour information of the human body, LBP and HOG features are extracted from the GEI to describe the GEI features and compare the similarities. Therefore, in this paper, the LBP and HOG operators are used to extract pedestrian GEI features as gait features.

Image preprocessing
In this paper, the background modeling method based on the Gaussian mixture model is used to model the background, the foreground is extracted by the background subtraction method, and the obtained binary image is morphologically processed. The image preprocessing flowchart is shown in Fig. 3. In the subsequent analysis, the center point of the human body region is found from the boundaries $x_{\min}, x_{\max}, y_{\min}, y_{\max}$ of the region, and images are cropped to a size of 160×90. The cropped images retain the complete aligned human body regions, while the pixel size of the image has been reduced.
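The centering and cropping step can be sketched as follows. The sketch assumes a binary silhouette is already available, interprets 160×90 as height × width, and simply crops (zero-padding at the image border) around the bounding-box center without any rescaling; the function name `normalize_silhouette` and the test mask are hypothetical.

```python
import numpy as np

def normalize_silhouette(mask, out_h=160, out_w=90):
    """Crop a binary silhouette to out_h x out_w, centered on the body."""
    out = np.zeros((out_h, out_w), dtype=mask.dtype)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return out                      # no foreground found
    # Four boundaries of the body region and the resulting center point.
    y_min, y_max = int(ys.min()), int(ys.max())
    x_min, x_max = int(xs.min()), int(xs.max())
    cy, cx = (y_min + y_max) // 2, (x_min + x_max) // 2
    top, left = cy - out_h // 2, cx - out_w // 2
    # Clip the crop window to the image and copy the overlapping part.
    y0, y1 = max(top, 0), min(top + out_h, mask.shape[0])
    x0, x1 = max(left, 0), min(left + out_w, mask.shape[1])
    out[y0 - top:y1 - top, x0 - left:x1 - left] = mask[y0:y1, x0:x1]
    return out

mask = np.zeros((240, 320), dtype=np.uint8)
mask[50:150, 200:240] = 1               # stand-in for a walking person
cropped = normalize_silhouette(mask)
```

A body taller than 160 pixels or wider than 90 pixels would be truncated by this sketch; a full implementation would rescale the region first.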

Gait cycle extraction
Gait is considered a regular cyclic movement, and the extracted gait information is based on several binary images of the human body during a given period. In this paper, the optical flow method [Chen, Zha, Li et al. (2018)] is adopted to calculate the gait cycle. The optical flow method describes the instantaneous motion shape, including the shape of the moving object (space) and the shape of the motion (time). Different gaits can be distinguished from the periodic difference in the motion shape. In this method, calculations rely solely on the relative motion of adjacent frames without constructing any model or acquiring the image background in advance. The results provide not only motion information about the moving objects but also rich information about the three-dimensional structure of the scene. In the optical flow method, it is reasonable to calculate the gait cycle from the periodic variation in the offset $v$ over the whole gait sequence. Therefore, $T(t)$, expressed in terms of the $v$ component of the gait flow field at frame $t$, is used to calculate the gait cycle:
$$T(t) = \sum_{x=1}^{W} \sum_{y=1}^{H} v(x, y, t) \quad (3)$$
where $x$ and $y$ represent the row and column indices, respectively, $W$ and $H$ represent the numbers of rows and columns, respectively, and $v(x, y, t)$ represents the optical flow component $v$ between frames $t$ and $t+1$ of the sequence at coordinate $(x, y)$. Fig. 5 shows the periodic variation of $T(t)$. The frames between every pair of adjacent peaks or valleys belong to the same period. The frames spanning one such period of $T(t)$ are extracted as a gait cycle.
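Cycle detection from the flow signal can be sketched as below. The sketch assumes the per-frame $v$ component of the optical flow has already been computed by some optical flow routine and that $T(t)$ is its plain sum over all pixels; the function names, the simple local-maximum peak test and the synthetic sinusoidal signal are illustrative assumptions.

```python
import numpy as np

def flow_signal(v):
    """Sum the v component over all pixels; v has shape (frames, rows, cols)."""
    return np.asarray(v, dtype=float).sum(axis=(1, 2))

def gait_cycle_frames(signal):
    """Return (start, end) frame indices of the first pair of adjacent peaks,
    or None if fewer than two peaks are found."""
    s = np.asarray(signal, dtype=float)
    peaks = [i for i in range(1, len(s) - 1)
             if s[i] > s[i - 1] and s[i] >= s[i + 1]]
    if len(peaks) < 2:
        return None
    return peaks[0], peaks[1]

# Synthetic periodic signal standing in for T(t): period of 20 frames.
t = np.arange(60)
signal = np.sin(2 * np.pi * t / 20)
cycle = gait_cycle_frames(signal)
```

On real sequences the signal is noisy, so smoothing it before peak detection would likely be needed.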

Gait energy image extraction
The GEI is a commonly used feature in gait detection; it is easy to extract and reveals gait features such as speed and morphology. It is defined as follows:
$$G(x, y) = \frac{1}{N} \sum_{t=1}^{N} I_t(x, y) \quad (4)$$
where $N$ represents the length of the image sequence of a gait cycle, $t$ is the frame index, and $I_t(x, y)$ is the pixel value of frame $t$ at pixel point $(x, y)$. GEIs are calculated by Eq. (4) over all images within a gait cycle. GEIs [Luo, Yang and Liu (2016)] of different people are shown in Fig. 6. The brighter a position in the image, the more frequently the body occupies that pixel during the cycle. In brighter areas, the changes during walking are smaller, and in darker areas, the changes are larger. If the GEI is directly analyzed as a feature, then the amount of data is very large; thus, PCA is performed on the GEI to reduce the dimensionality of the image.
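Computing a GEI amounts to averaging the aligned binary silhouettes of one gait cycle pixel by pixel. A minimal sketch, assuming the silhouettes are already normalized to the same size and take values 0 or 1 (the function name and the toy frames are illustrative):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average N aligned binary silhouettes of shape (N, H, W) into one GEI."""
    frames = np.asarray(silhouettes, dtype=np.float64)
    return frames.mean(axis=0)

# Toy cycle of 4 frames: one block is always foreground (a static body part),
# another is foreground in only half of the frames (a moving part).
frames = np.zeros((4, 160, 90))
frames[:, 10:20, 5:10] = 1
frames[:2, 30:40, :] = 1
gei = gait_energy_image(frames)
```

Pixels that are foreground in every frame get the value 1, while pixels covered only part of the time get intermediate values.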

HOG feature extraction of gait energy images
The implementation process of the HOG feature extraction algorithm is as follows: ... the feature dimension.
In XQDA, the original features $x_i$, $x_j$ are mapped to a low-dimensional subspace by learning the mapping matrix $W$. The similarity function is defined in Eq. (9):
$$\delta(x_i, x_j) = (x_i - x_j)^T W \left( \Sigma_I'^{-1} - \Sigma_E'^{-1} \right) W^T (x_i - x_j) \quad (9)$$
where $\Sigma_I' = W^T \Sigma_I W$, $\Sigma_E' = W^T \Sigma_E W$, and $\Sigma_I$ and $\Sigma_E$ are the covariance matrices of the intraclass and interclass sample difference distributions, respectively. $W$ is obtained by eigenvalue decomposition of the matrix $\Sigma_I^{-1} \Sigma_E$ and serves as the measurement matrix.
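The XQDA similarity can be sketched numerically, assuming the standard form from Liao et al. (2015): the feature difference is projected by $W$, and the difference of the inverse projected intraclass and interclass covariances acts as the metric. Learning $W$ is not shown; the function name `xqda_distance` and the toy matrices are hypothetical.

```python
import numpy as np

def xqda_distance(x_i, x_j, W, sigma_I, sigma_E):
    """XQDA-style distance between two feature vectors.

    W: (d, r) mapping matrix; sigma_I, sigma_E: (d, d) covariance matrices
    of intraclass and interclass feature differences (assumed given).
    """
    diff = W.T @ (np.asarray(x_i, float) - np.asarray(x_j, float))
    proj_I = W.T @ sigma_I @ W            # projected intraclass covariance
    proj_E = W.T @ sigma_E @ W            # projected interclass covariance
    metric = np.linalg.inv(proj_I) - np.linalg.inv(proj_E)
    return float(diff @ metric @ diff)

# Toy example: 4-dimensional features projected onto the first 2 axes.
W = np.eye(4)[:, :2]
sigma_I = np.eye(4)
sigma_E = 2.0 * np.eye(4)
x_a = np.array([2.0, 0.0, 5.0, 5.0])
x_b = np.zeros(4)
dist_ab = xqda_distance(x_a, x_b, W, sigma_I, sigma_E)
```

A smaller value indicates that the two samples are more likely to show the same person.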
In this paper, the color and texture features of the local regions and the LBP and HOG features of the GEI of a pedestrian are extracted. To fully reflect the differences in feature expression in the different spaces and to fully utilize the complementary effect of the different features, a measurement matrix is learned separately for each feature with the independent measurement learning method. Four measurement matrices $W_1, W_2, W_3, W_4$ are thus acquired from the color and texture features of the local region blocks [Farenzena, Bazzani, Perina et al. (2010)] and the LBP and HOG features of the GEIs. The measurement criteria based on the different features are then used to weight and fuse the similarities of test samples in the different spaces to obtain the final similarity. The similarities $\delta_1, \delta_2, \delta_3, \delta_4$ of the four features can be obtained by Eq. (9). The similarities of the local color and texture features are weighted to obtain the similarity $\delta_{local}$ of the local features, as shown in Eq. (10).
$$\delta_{local}(x_i, x_j) = \gamma \, \delta_1(x_i, x_j) + (1 - \gamma) \, \delta_2(x_i, x_j) \quad (10)$$
The LBP and HOG features of the GEI are weighted for similarity to obtain the similarity $\delta_{gei}$ of the gait, as shown in Eq. (11).
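The weighted fusion can be sketched as two convex combinations followed by a final combination. Several details are assumptions of this sketch rather than statements of the paper: that $\gamma$ weights the color term and $\eta$ the LBP term, that the defaults of 0.35 lie in the 0.3 to 0.4 range reported in the weight experiment, and the final mixing weight `alpha`, which is purely hypothetical because the final fusion rule is not recoverable from the text.

```python
import numpy as np

def fuse_similarities(d_color, d_texture, d_lbp, d_hog,
                      gamma=0.35, eta=0.35, alpha=0.5):
    """Fuse four per-feature similarity scores into one score.

    gamma/eta follow the paper's weight symbols (role assignment assumed);
    alpha is a hypothetical weight between local and gait similarity.
    """
    d_local = gamma * d_color + (1 - gamma) * d_texture   # local features
    d_gei = eta * d_lbp + (1 - eta) * d_hog               # gait (GEI) features
    return alpha * d_local + (1 - alpha) * d_gei

# Scores of one probe against a gallery of three candidates (toy values).
d_color = np.array([0.2, 0.8, 0.5])
d_texture = np.array([0.4, 0.6, 0.5])
d_lbp = np.array([0.1, 0.9, 0.5])
d_hog = np.array([0.3, 0.7, 0.5])
fused = fuse_similarities(d_color, d_texture, d_lbp, d_hog)
best_match = int(np.argmin(fused))   # smallest distance ranks first
```

Because every step is a convex combination, the fused score stays within the range spanned by the four input scores.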

Experimental data and evaluation indicators
In this paper, Dataset B [Zheng, Zhang, Huang et al. (2011)] from the CASIA gait database of the Institute of Automation of the Chinese Academy of Sciences is used to

Performance comparison of algorithms with different weights γ, η
Eq. (12) indicates that the similarity of pedestrian local features is a weighted combination of the color and texture feature similarities, and the GEI feature similarity is a weighted combination of the LBP and HOG feature similarities. Fig. 10 shows the performance comparison results for different weights γ, η on the dataset in this experiment. The experimental results reveal that the identification performance is optimal when γ and η are between 0.3 and 0.4.

Result comparison
Tab. 2 compares the experimental results with those of the gait energy image-region bounded by legs-local binary pattern (GEI-RBL-LBP) [Kumar and Nagendraswany (2014)] and gait energy image-principal component analysis-multiple discriminant analysis (GEI-PCA-MDA) [Rida, Almaadeed and Bouridane (2014)] pedestrian re-identification methods. The comparison uses 120 samples, comprising the 100 data samples from the CASIA Dataset B standard database and the 20 newly added data samples, with the relevant parameters set to their optimal values. The experimental results reveal that the average identification rate of the proposed algorithm is the best among the compared pedestrian re-identification methods; the method used in this paper reaches 88.5% (Tab. 2). In this paper, pedestrian samples with similar clothing and postures are included as interference samples, which increases the difficulty of identification on the dataset. In the proposed method, color, texture and gait features are fused; for pedestrians with similar clothing, the gait features of the pedestrians compensate for the defects of the color and texture features, and for pedestrians with similar postures, the color and texture features suitably compensate for the deficiencies of the gait features. Consequently, the pedestrian re-identification rate is improved.
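Rank-1 rates such as those compared above are read off a CMC curve. A minimal sketch of how such a curve can be computed from a probe-gallery distance matrix, assuming each probe identity appears in the gallery; the function name `cmc_curve` and the toy distances are illustrative and do not reproduce the paper's data.

```python
import numpy as np

def cmc_curve(dist, probe_ids, gallery_ids, max_rank=5):
    """Fraction of probes whose true match appears within the top k ranks.

    dist: (num_probes, num_gallery) distance matrix, smaller is more similar.
    """
    dist = np.asarray(dist, dtype=float)
    gallery_ids = np.asarray(gallery_ids)
    hits = np.zeros(max_rank)
    for p, pid in enumerate(probe_ids):
        ranked = gallery_ids[np.argsort(dist[p])]    # gallery sorted by distance
        match_pos = np.nonzero(ranked == pid)[0]
        if match_pos.size and match_pos[0] < max_rank:
            hits[match_pos[0]:] += 1                 # counted at this rank and beyond
    return hits / len(probe_ids)

dist = [[0.1, 0.5, 0.9],
        [0.6, 0.7, 0.2],
        [0.3, 0.1, 0.8]]
cmc = cmc_curve(dist, probe_ids=[0, 1, 2], gallery_ids=[0, 1, 2], max_rank=3)
rank1 = cmc[0]
```

The first entry of the curve is the rank-1 identification rate, and the curve is nondecreasing in the rank.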

Conclusion
This paper proposes a pedestrian re-identification method that combines color, texture and gait features, thereby exploiting not only the color and texture features but also the gait features of pedestrians. The experimental results show that the method alleviates the low identification rate caused by pedestrians with similar clothing and postures and improves the overall pedestrian re-identification effect. However, the weight coefficients used in similarity fusion in this paper are not general. The coefficients are obtained by trial and error, and the optimal weights of the features cannot be determined adaptively. In future work, the optimal weights under different conditions will be determined by learning.

Figure 1: Body area division diagram

Figure 2: Encoding process of the SILTP operator
Assuming that the pixel position of an image is $(x_c, y_c)$, the SILTP encoding method is as follows:
$$SILTP^{t}_{Q,R}(x_c, y_c) = \bigoplus_{k=0}^{Q-1} s_t(I_c, I_k) \quad (1)$$

Figure 3: Image preprocessing flowchart
The pedestrian binary image obtained after preprocessing is shown in Fig. 4.

Figure 4: Image preprocessing
In gait images, the position of the human body region changes continuously during movement. Subsequent image feature extraction is facilitated by normalizing the image to a uniform size, which ensures that the normalized human body region is located at the center of the image and that human body alignment is realized in all images. Normalization: The image is traversed from left to right and from top to bottom to identify the four boundaries $x_{\min}, x_{\max}, y_{\min}, y_{\max}$ of the human body region.

Figure 6: Gait energy images of different people

Figure 10: Performance comparison of the algorithms for different weights γ, η at a 0-degree visual angle

Table 1: Algorithm identification results of local and gait energy image features at a 0-degree visual angle

Table 2: Rank-1 comparison of the results of the different methods