Research of Metric Learning-Based Method for Person Re-Identification by Intelligent Computer Vision Technology

Person re-identification technology aims to establish an efficient metric model for similarity distance measurement of pedestrian images. Candidate images captured by different camera views are ranked according to their similarity to the target individual. However, the metric learning-based methods commonly used for similarity measurement often fail in person re-identification tasks due to drastic variations in appearance. The main reason for their low identification accuracy is that the metric learning method overfits the training data. Several types of metric learning methods, which differ from each other in the distribution of sample pairs, are summarized in this article in order to analyse and ease the over-fitting problem of metric learning methods. Three different metric learning methods were tested on the VIPeR dataset, and the distributions of the distances of the positive/negative training/test pairs are displayed to demonstrate the over-fitting problem. A new metric model is then proposed by combining the ideas of binary classification and multi-class classification, and related verification experiments were conducted on the VIPeR dataset. In addition, a semi-supervised metric learning approach is introduced to alleviate the over-fitting problem. The experimental results reveal a gap between training pairs and test pairs in the metric subspace; therefore, reducing the difference between training data and test data is a promising way to improve the identification accuracy of metric learning methods.


Introduction
Person re-identification technology has long been a hot spot in the field of computer vision. The technology tries to match pedestrian images across camera views in a non-overlapping surveillance system. Person re-identification has important research significance given its great role in many scenarios. However, it encounters several notable difficulties that face recognition does not. In the application scenario of face recognition, users have relatively fixed postures and the lighting is under control in a close-up shooting system. In person re-identification, by contrast, large intra-class variations and small inter-class variations caused by drastic appearance changes often lead to identification failure.
Early research on person re-identification focused on ways to manually design better visual features or to learn better similarity measures. Many discriminative appearance-based methods and metric learning-based methods have been proposed, including Symmetry-Driven appearance models.

Similarity Distance Metric Model
2.1.1. Euclidean distance. The Euclidean distance is the most direct distance function for measuring the difference between two points in Euclidean space. Assume that $X=\{x_1,x_2,\ldots,x_N\}$ denotes a dataset which consists of $C$ classes of data, and let $x_i=(x_{i1},x_{i2},\ldots,x_{in})^\top$ denote a sample of the $i$-th class, where $x_i$ is an $n$-dimensional feature vector. Then the Euclidean distance between $x_i$ and $x_j$ is

$$d_E(x_i,x_j)=\sqrt{\sum_{k=1}^{n}(x_{ik}-x_{jk})^2} \qquad (1)$$

The corresponding vector expression of formula (1) is

$$d_E(x_i,x_j)=\sqrt{(x_i-x_j)^\top(x_i-x_j)} \qquad (2)$$

2.1.2. Mahalanobis distance. The Mahalanobis distance generalizes (2) by inserting a learned matrix:

$$d_M(x_i,x_j)=\sqrt{(x_i-x_j)^\top M(x_i-x_j)} \qquad (3)$$

where $M$ denotes the Mahalanobis matrix, learned from the training data, and $x_i$, $x_j$ are the feature vectors of the two images of a sample pair. The Mahalanobis distance can effectively avoid interference between variables when the training data represent the distribution of all samples. However, the metric learning method requires a large number of samples: the metric model overfits the training data when the training set is too small to describe the distribution of the population properly.

2.1.3. KISSME. The null hypothesis states that the sample pair $(x_i,x_j)$ is not similar. The likelihood ratio of the negative-pair and positive-pair distributions is established to measure the difference between samples $x_i$ and $x_j$ as follows:

$$\delta(x_i,x_j)=\log\frac{p(x_{ij}\mid \text{dissimilar})}{p(x_{ij}\mid \text{similar})} \qquad (4)$$

where $x_{ij}=x_i-x_j$ is the difference vector of the pair.
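As a concrete reference, the two distance functions can be sketched in a few lines of NumPy (a minimal illustration with made-up vectors, not tied to any particular re-identification codebase):

```python
import numpy as np

def euclidean_dist(xi, xj):
    """Euclidean distance: sqrt((xi - xj)^T (xi - xj))."""
    d = xi - xj
    return float(np.sqrt(d @ d))

def mahalanobis_dist(xi, xj, M):
    """Mahalanobis distance: sqrt((xi - xj)^T M (xi - xj))."""
    d = xi - xj
    return float(np.sqrt(d @ M @ d))

# With M = I the Mahalanobis distance reduces to the Euclidean distance.
xi = np.array([1.0, 2.0, 3.0])
xj = np.array([2.0, 0.0, 3.0])
print(euclidean_dist(xi, xj))               # sqrt(5)
print(mahalanobis_dist(xi, xj, np.eye(3)))  # same value
```

The choice of $M$ is what metric learning optimizes; everything that follows differs mainly in how $M$ (or a projection realizing it) is estimated.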

According to the likelihood function (4), when the likelihood that the sample pair $(x_i,x_j)$ is dissimilar is greater than the likelihood that it is similar, the value of the function is greater than 0 and we accept the null hypothesis: $x_i$ and $x_j$ are not similar. Otherwise, the function value is less than 0 and we reject the null hypothesis. Assume that the difference vector of a positive/negative pair follows a Gaussian distribution with zero mean. The probability that the sample pair $(x_i,x_j)$ belongs to the similar/dissimilar class is then defined as

$$p(x_{ij}\mid \text{similar})=\frac{1}{\sqrt{(2\pi)^n|\Sigma_0|}}\exp\!\left(-\tfrac{1}{2}\,x_{ij}^\top\Sigma_0^{-1}x_{ij}\right) \qquad (5)$$

$$p(x_{ij}\mid \text{dissimilar})=\frac{1}{\sqrt{(2\pi)^n|\Sigma_1|}}\exp\!\left(-\tfrac{1}{2}\,x_{ij}^\top\Sigma_1^{-1}x_{ij}\right) \qquad (6)$$

where $\Sigma_0$ denotes the covariance matrix of the positive pairs and $\Sigma_1$ denotes the covariance matrix of the negative pairs. Dropping the constant terms, function (4) can be rewritten as

$$d(x_i,x_j)=x_{ij}^\top\left(\Sigma_0^{-1}-\Sigma_1^{-1}\right)x_{ij} \qquad (7)$$

The Dual-regularized KISS Metric Learning (DR-KISS) method introduces a regularization approach for the covariance matrix estimation. The two regularized covariance matrices take the shrinkage form

$$\hat{\Sigma}_0=(1-\alpha_0)\Sigma_0+\alpha_0 I,\qquad \hat{\Sigma}_1=(1-\alpha_1)\Sigma_1+\alpha_1 I$$

where $\alpha_0$ and $\alpha_1$ are the regularization parameters of the covariance matrices of the positive pairs and the negative pairs respectively. By regularizing the covariance matrices, the DR-KISS method efficiently eases the over-fitting problem and improves the generalization ability of the metric model.
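A minimal sketch of estimating the KISS metric of formula (7) from pair difference vectors, with an optional identity-shrinkage term standing in for the DR-KISS regularization (the exact regularizer used by DR-KISS may differ; the data here are random placeholders):

```python
import numpy as np

def kissme(diff_pos, diff_neg, alpha=0.0):
    """M = inv(Sigma_0) - inv(Sigma_1), estimated from difference
    vectors of positive (similar) and negative (dissimilar) pairs.
    alpha > 0 shrinks both covariances toward the identity, a simple
    stand-in for the DR-KISS regularization."""
    S0 = diff_pos.T @ diff_pos / len(diff_pos)   # covariance, positive pairs
    S1 = diff_neg.T @ diff_neg / len(diff_neg)   # covariance, negative pairs
    if alpha > 0:
        n = S0.shape[0]
        S0 = (1 - alpha) * S0 + alpha * np.eye(n)
        S1 = (1 - alpha) * S1 + alpha * np.eye(n)
    return np.linalg.inv(S0) - np.linalg.inv(S1)

rng = np.random.default_rng(0)
# toy data: positive-pair differences are small, negative-pair ones large
dp = 0.1 * rng.standard_normal((200, 4))
dn = 1.0 * rng.standard_normal((200, 4))
M = kissme(dp, dn, alpha=0.01)
d = rng.standard_normal(4)
print(d @ M @ d)   # pair score: small for similar pairs, large for dissimilar
```

Under this metric, similar pairs (small difference vectors) receive smaller scores than dissimilar ones, which is exactly the decision rule of the likelihood ratio test.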

PCCA

Pairwise Constrained Component Analysis (PCCA) [26] models metric learning based on pairwise data. The model learns a projection matrix $L$ which projects the raw features to a low-dimensional feature vector, and the Euclidean distance is used to measure similarity based on the projected feature vectors. PCCA formulates penalty terms for the different kinds of sample pairs to reduce the influence of the dimension reduction. The optimization objective function is

$$E(L)=\sum_{n}\ell_\beta\!\left(y_n\left(\left\|Lx_i^{(n)}-Lx_j^{(n)}\right\|^2-1\right)\right)$$

where $\ell_\beta(x)=\frac{1}{\beta}\log\!\left(1+e^{\beta x}\right)$ is a generalized logistic function, $L$ is the mapping which projects the high-dimensional feature vector to a low-dimensional one, and $y_n=1$ if the $n$-th pair is a positive pair while $y_n=-1$ if it is a negative pair.
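The PCCA objective can be sketched as follows (an illustrative loss evaluation only; the optimizer and the kernelization of [26] are omitted, and the projection and pair data here are random placeholders):

```python
import numpy as np

def generalized_logistic(x, beta=3.0):
    """l_beta(x) = (1/beta) * log(1 + exp(beta * x)), a smooth hinge."""
    return np.log1p(np.exp(beta * x)) / beta

def pcca_loss(L, pairs, labels, margin=1.0):
    """Sum over pairs of l_beta(y * (||L xi - L xj||^2 - margin)),
    with y = +1 for positive pairs and y = -1 for negative pairs."""
    total = 0.0
    for (xi, xj), y in zip(pairs, labels):
        d2 = np.sum((L @ xi - L @ xj) ** 2)   # squared distance in subspace
        total += generalized_logistic(y * (d2 - margin))
    return total

rng = np.random.default_rng(1)
L = rng.standard_normal((2, 5))   # projection to a 2-D subspace
pairs = [(rng.standard_normal(5), rng.standard_normal(5)) for _ in range(4)]
labels = [1, -1, 1, -1]
print(pcca_loss(L, pairs, labels))
```

The smooth hinge penalizes positive pairs whose projected distance exceeds the margin and negative pairs whose distance falls below it.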

Projection Subspace Learning-based Method
The Mahalanobis distance can be defined in different forms. The projection subspace learning methods formulate person re-identification as a classification problem, which tries to learn a discriminative classification boundary separating the positive samples from the negative samples. Representative methods include the method of [27], Local Retention Projection (LRP) [28], et al.

1) The matrix $M$ is decomposed as $M=ww^\top$, so that

$$d_M(x_i,x_j)=(x_i-x_j)^\top ww^\top(x_i-x_j)=\left(w^\top x_i-w^\top x_j\right)^2 \qquad (14)$$

where $w$ is the projection vector which projects the image feature vector $x_i$ into the metric subspace. For convenience of calculation, the sample data are standardized by $x_i \leftarrow x_i-\bar{x}$, and formula (14) can be rewritten accordingly in terms of the centred features.

2) Local Retention Projection (LRP) characterizes the local neighbourhood relationship of high-dimensional data based on graph theory. The method learns a projection subspace which keeps the local neighbourhood relations in a relatively low-dimensional space. The LRP model uses the Euclidean distance between samples in the projection subspace and maps it through a Gaussian function:

$$d(x_i,x_j)=1-\exp\!\left(-\frac{\left\|W^\top x_i-W^\top x_j\right\|^2}{\sigma^2}\right) \qquad (16)$$

According to the distance function between two samples defined in (16), the higher the similarity, the closer the function value is to 0. The metric model is then established by minimizing this distance over locally neighbouring pairs.
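The decomposition of $M$ into a projection can be checked numerically: with $M=WW^\top$ the Mahalanobis distance equals the squared Euclidean distance between the projected vectors (random placeholder data below):

```python
import numpy as np

# With M = W W^T, the (squared) Mahalanobis distance equals the squared
# Euclidean distance between the projected vectors W^T x_i and W^T x_j.
rng = np.random.default_rng(2)
W = rng.standard_normal((6, 3))      # 6-D features projected to 3-D
M = W @ W.T
xi, xj = rng.standard_normal(6), rng.standard_normal(6)

d_mahal = (xi - xj) @ M @ (xi - xj)
d_proj = np.sum((W.T @ xi - W.T @ xj) ** 2)
print(np.isclose(d_mahal, d_proj))   # True
```

This identity is why learning $M$ and learning a projection subspace are two views of the same problem.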

3) Large Margin Nearest Neighbour (LMNN)

LMNN introduced the idea of a large margin into Mahalanobis distance learning. It defines samples of the same class as target nearest neighbours, and samples of different classes are separated from them by a large margin. The goal of the Mahalanobis distance model is to gather samples of the same class together while separating samples of different classes. The objective function of the optimization model is formulated as

$$\varepsilon(M)=\sum_{i}\sum_{j\in o_i}d_M(x_i,x_j)+\mu\sum_{i}\sum_{j\in o_i}\sum_{l}\left[1+d_M(x_i,x_j)-d_M(x_i,x_l)\right]_+ \qquad (19)$$

where $o_i$ denotes the set of target neighbours of the same class as the $i$-th sample, $l$ runs over samples of other classes, $\mu$ balances the pull and push terms, and $[\cdot]_+=\max(\cdot,0)$ is the hinge. By enhancing the correlation of the positive pairs and weakening the correlation of the negative pairs, the model seeks a Mahalanobis distance metric which minimizes the objective value of formula (19).
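Evaluating the LMNN objective of formula (19) for a fixed metric can be sketched as below (the target-neighbour sets and impostor lists are hand-picked toy values, and the semidefinite solver of the original method is omitted):

```python
import numpy as np

def lmnn_loss(M, X, targets, impostors, mu=0.5, margin=1.0):
    """Pull target neighbours of the same class together and push
    differently-labelled impostors outside a large margin (hinge term)."""
    def d(a, b):
        diff = X[a] - X[b]
        return diff @ M @ diff            # squared Mahalanobis distance

    pull = sum(d(i, j) for i, j in targets)
    push = sum(max(0.0, margin + d(i, j) - d(i, l))
               for (i, j) in targets for l in impostors[i])
    return pull + mu * push

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 4))
targets = [(0, 1), (2, 3)]     # same-class target neighbours o_i
impostors = {0: [4], 2: [5]}   # differently-labelled samples near each anchor
print(lmnn_loss(np.eye(4), X, targets, impostors))
```

A real LMNN implementation minimizes this value over positive semidefinite $M$; here only the loss itself is shown.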

Supervised Metric Learning-based Method
The supervised methods try to learn a projection subspace from the pairwise information of the samples.
1) Zheng et al. [10] proposed the Relative Distance Comparison (RDC) method to learn a projection subspace which separates the positive samples from the negative samples. It models a Mahalanobis distance function and formulates the distance comparison as

$$f\!\left(x_i^{p}\right)<f\!\left(x_j^{n}\right)$$

where $f(x_i^{p})$ denotes the distance of a positive pair and $f(x_j^{n})$ denotes the distance of a negative pair. The Mahalanobis distance function is defined as

$$f(x)=x^\top M x \qquad (21)$$

where $x$ denotes the difference vector of a sample pair's features. Then, an optimization model is proposed to learn the Mahalanobis distance under which the positive sample is closer to the instance than the negative sample. The objective function is

$$r(f)=\sum_{\left(x_i^{p},\,x_j^{n}\right)}\log\!\left(1+\exp\!\left(f\!\left(x_i^{p}\right)-f\!\left(x_j^{n}\right)\right)\right)$$

2) Elastic Projections-based Metric Learning [31] proposed a pairwise similarity measurement based on elastic projections. Differing from the common metric learning methods, this model learns two projections, a positive projection and a negative projection, to improve the discrimination and robustness of the metric model. In its objective function, $L_p$ denotes the projection matrix of the positive pairs and $L_n$ denotes the projection matrix of the negative pairs.
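The RDC relative distance comparison can be sketched as a logistic loss over positive/negative difference vectors (toy placeholder data; the iterative learning of $M$ in [10] is omitted):

```python
import numpy as np

def rdc_objective(M, pos_diffs, neg_diffs):
    """r(f) = sum over pairs of log(1 + exp(f(x_p) - f(x_n))),
    with f(x) = x^T M x on difference vectors, so that positive
    pairs are pushed to be closer than negative pairs."""
    f = lambda x: x @ M @ x
    total = 0.0
    for dp in pos_diffs:
        for dn in neg_diffs:
            total += np.log1p(np.exp(f(dp) - f(dn)))
    return total

rng = np.random.default_rng(4)
pos = 0.1 * rng.standard_normal((3, 4))   # small differences: positive pairs
neg = 1.0 * rng.standard_normal((3, 4))   # large differences: negative pairs
M = np.eye(4)
print(rdc_objective(M, pos, neg))
```

The loss is small when every positive-pair distance is below every negative-pair distance, and grows as the ordering is violated.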

Mid-level Feature Learning-based Method
Mid-level feature learning-based methods try to learn a dictionary representation which is more robust to appearance variations, including Semi-Supervised Coupled Dictionary Learning (SSCDL) [32] and Least Square Semi-Coupled Dictionary Learning (LSSCDL) [33], et al.
The SSCDL method formulates a dictionary learning model to learn a mid-level feature for person re-identification. A two-dictionary representation model is established based on the pairwise relationship between probe samples and gallery samples respectively. An optimization model of coupled dictionary learning is then proposed in the form

$$\min_{D_x,\,D_y,\,\Lambda_x,\,\Lambda_y}\ \left\|X-D_x\Lambda_x\right\|_F^2+\left\|Y-D_y\Lambda_y\right\|_F^2+\lambda\,\Omega\!\left(\Lambda_x,\Lambda_y\right)$$

where $D_x$ and $D_y$ are the dictionaries of the probe set $X$ and the gallery set $Y$ respectively, $\Lambda_x$ and $\Lambda_y$ are the corresponding coefficient matrices, and $\Omega$ is a coupling regularizer on the coefficients of matched pairs. The SSCDL method utilizes dictionary learning to obtain a robust mid-level feature; the dictionary representations are then used for similarity distance measurement based on the Euclidean distance.
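The coding step of such dictionary models can be illustrated with a ridge-regularized least-squares solve (a simplified stand-in for sparse coding; the semi-supervised coupling of SSCDL is omitted, and the dictionary and data below are random placeholders):

```python
import numpy as np

def code(D, X, lam=0.1):
    """Solve min_A ||X - D A||_F^2 + lam * ||A||_F^2 in closed form:
    A = (D^T D + lam I)^{-1} D^T X."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)

rng = np.random.default_rng(5)
D = rng.standard_normal((8, 5))    # dictionary: 8-dim features, 5 atoms
X = rng.standard_normal((8, 10))   # 10 probe samples as columns
A = code(D, X)
print(np.linalg.norm(X - D @ A))   # reconstruction residual
```

In the full method the dictionary itself is also optimized and the probe/gallery codes are coupled; here only the coefficient update for a fixed dictionary is shown.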
The LSSCDL method first learns an SVM-based classification boundary for each instance to separate the positive samples from the negative samples, and an SSCDL-style semi-coupled dictionary model is used to construct the SVM classification boundary specific to each individual, where $X_p$ denotes the probe sample.

The FDA-based method [29] tries to find a set of hyperplanes in which the projections of samples from different classes are separated as far as possible while the projections of samples from the same class stay as close as possible. The between-class scatter and within-class scatter are used to formulate the metric learning model. The between-class scatter is

$$S_b=\sum_{i=1}^{C}N_i\left(\bar{x}_i-\bar{x}\right)\left(\bar{x}_i-\bar{x}\right)^\top$$

where $N_i$ denotes the number of samples of the $i$-th class, $\bar{x}_i$ denotes the mean of the $i$-th class, and $\bar{x}$ denotes the mean of all samples. The within-class scatter is

$$S_w=\sum_{i=1}^{C}\sum_{j=1}^{N_i}\left(x_{ij}-\bar{x}_i\right)\left(x_{ij}-\bar{x}_i\right)^\top$$

where $x_{ij}$ denotes the $j$-th sample of the $i$-th class. The objective function of FDA is

$$\max_{w}\ \frac{w^\top S_b w}{w^\top S_w w} \qquad (28)$$

According to the properties of the Rayleigh quotient, the solution of the optimization model (28) is given by the leading eigenvectors of $S_w^{-1}S_b$. The Euclidean distance is used to measure the similarity between images in the projection subspace.

The identification result of the aforementioned semi-supervised method is also greater than that of SCIR, the deep learning-based model. In summary, solving the over-fitting problem possesses important research value for improving the identification accuracy of metric learning-based methods.
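The FDA criterion of formula (28) can be sketched directly from the scatter matrices (an illustration on synthetic 2-D data, not the paper's implementation; a small ridge on $S_w$ is added for numerical safety):

```python
import numpy as np

def fda_direction(X, y):
    """Maximize the Rayleigh quotient w^T S_b w / w^T S_w w; the optimum
    is the leading eigenvector of inv(S_w) S_b."""
    mean_all = X.mean(axis=0)
    n_dim = X.shape[1]
    Sb = np.zeros((n_dim, n_dim))
    Sw = np.zeros((n_dim, n_dim))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
        Sw += (Xc - mc).T @ (Xc - mc)
    Sw += 1e-6 * np.eye(n_dim)                     # keep S_w invertible
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    return np.real(vecs[:, np.argmax(np.real(vals))])

# two well-separated 2-D classes along the first axis
rng = np.random.default_rng(6)
X = np.vstack([rng.standard_normal((50, 2)) + [5, 0],
               rng.standard_normal((50, 2)) - [5, 0]])
y = np.array([0] * 50 + [1] * 50)
w = fda_direction(X, y)
print(w)   # dominated by the first (separating) axis
```

The returned direction aligns with the axis along which the class means differ, as the Rayleigh quotient analysis predicts.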

Conclusion
The person re-identification methods of different types were reviewed in this paper, including appearance feature-based methods, deep learning-based methods, and metric learning-based methods. The appearance feature-based and metric learning-based methods correspond to the two parts of the person re-identification task, appearance feature extraction and similarity measurement, while a deep learning model establishes an end-to-end network which formulates the two parts into a unified structure. Our taxonomy provides a guide for person re-identification. In recent years, deep learning-based methods have made significant improvements by introducing body structure detection networks. The improvement indeed alleviates the over-fitting problem, which is the main factor that influences the generalization ability of the metric model. The semi-supervised learning approach shows promising performance in improving the generalization ability of metric learning-based methods. In addition, quantitative experiments were conducted in this paper to demonstrate the over-fitting problem and the effectiveness of the semi-supervised method.