International Journal of Advanced Robotic Systems

Distance Adaptive Tensor Discriminative Geometry Preserving Projection for Face Recognition

Regular Paper

Abstract

There is a growing interest in dimensionality reduction techniques for face recognition; however, traditional dimensionality reduction algorithms often transform the input face image data into vectors before embedding. Such vectorization often ignores the underlying data structure and leads to higher computational complexity. To effectively cope with these problems, a novel dimensionality reduction algorithm termed distance adaptive tensor discriminative geometry preserving projection (DATDGPP) is proposed in this paper. The key idea of DATDGPP is as follows: first, the face image data are directly encoded in a high-order tensor structure so that the relationships among the face image data can be preserved; second, a data-adaptive tensor distance is adopted to model the correlation among different coordinates of the tensor data; third, the transformation matrix, which preserves both discrimination and local geometry information, is obtained by an iterative algorithm. Experimental results on three face databases show that the proposed algorithm outperforms other representative dimensionality reduction algorithms.


Introduction
Over the last decade, face recognition has become one of the most active research areas in multimedia information processing, due to the rapidly increasing requirements of many practical applications, such as identity authentication, information security, human-computer interaction/communication and so on. A major challenge of face recognition, however, is that the captured face image data often lie in a high-dimensional feature space. For example, a face image with a resolution of 128 × 128 pixels is represented as a point in a 16384-dimensional face space. Learning in such high dimensionality is in many cases almost infeasible. Thus, learnability necessitates dimensionality reduction, which aims to find a lower-dimensional feature representation of high-dimensional data with enhanced discriminatory power. Once the high-dimensional face image data are projected into a lower-dimensional feature space, traditional classification schemes can be applied.
The most popular conventional algorithms for dimensionality reduction are principal component analysis (PCA) and linear discriminant analysis (LDA) [1]. PCA maintains the global Euclidean structure of the data in the original high-dimensional space and projects the data points into a lower-dimensional subspace in which the sample variance is maximized. In contrast to the unsupervised PCA, LDA is a supervised learning approach. LDA seeks the projection axes on which the data points of different classes are far from each other, while requiring data points of the same class to be close to each other. The optimal projection of LDA is computed by simultaneously minimizing the within-class scatter and maximizing the between-class scatter. It is generally believed that LDA-based algorithms outperform PCA-based ones, since the former optimizes the low-dimensional representation of the objects with the most discriminant information, while the latter simply targets object reconstruction. In addition, non-negative matrix factorization (NMF) [2] has been proposed for face recognition. NMF learns the parts of objects by using non-negativity constraints, which leads to a parts-based representation of objects. However, the iterative update method for solving the NMF problem is computationally expensive. Moreover, although the above-mentioned algorithms have been widely applied to face recognition, they are designed to discover only the global Euclidean structure, whereas the local manifold structure is ignored.
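As a concrete illustration of the variance-maximizing criterion behind PCA, the following minimal NumPy sketch centres the data, eigendecomposes the sample covariance and keeps the top-q eigenvectors. The function names and the toy data are our own, not from the paper:

```python
import numpy as np

def pca_project(X, q):
    """Project the rows of X (n samples x p features) onto the top-q
    principal components, i.e. the directions of maximal sample variance."""
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # p x p sample covariance
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    U = vecs[:, ::-1][:, :q]                 # top-q eigenvectors as columns
    return Xc @ U, U

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))               # 100 synthetic "face vectors"
Y, U = pca_project(X, 5)                     # 20-D -> 5-D embedding
```

An LDA sketch would differ only in the matrix being eigendecomposed: roughly, the inverse within-class scatter times the between-class scatter, rather than the total covariance.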
Recently, a number of manifold learning algorithms have been proposed to discover the geometric properties of high-dimensional data spaces, and they have been successfully applied to face recognition. Manifold learning aims at discovering the geometric properties of the data space, such as its Euclidean embedding, intrinsic dimensionality, connected components, homology, etc. The desired manifold is an intrinsically lower-dimensional space hidden in the input high-dimensional space. The most well-known manifold learning algorithms include isometric feature mapping (ISOMAP) [3], locally linear embedding (LLE) [4] and the Laplacian eigenmap (LE) [5]. ISOMAP, a variant of multidimensional scaling (MDS), aims to preserve the global geodesic distances between any pair of data points. LLE aims to embed data points in a low-dimensional space by finding the optimal linear reconstruction in a small neighbourhood. LE aims to preserve proximity relationships through manipulations on an undirected weighted graph, which encodes the neighbour relations of pairwise data points. These nonlinear methods have achieved impressive results on some pattern classification tasks; however, the mappings they derive are defined only on the training data points and thus, how to evaluate the maps on novel test data points remains unclear [6]. One common way to cope with this problem is to apply a linearization procedure to construct explicit maps over new samples. The representative example of this approach is locality preserving projection (LPP) [7], a linearization of LE. LPP is obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold. Nevertheless, these dimensionality reduction methods are based on different embedding criteria. Yan et al. demonstrated that most of them can be mathematically unified within a general framework, called graph embedding [8]. Based on the graph embedding framework, marginal Fisher analysis (MFA) was developed to jointly consider the local manifold structure and the class label information for dimensionality reduction. However, MFA extracts discriminative information from only marginal samples, although non-marginal samples also contain discriminative information. Recently, Song et al. proposed discriminative geometry preserving projections (DGPP) [9], which has yielded impressive results on scene classification. DGPP is fundamentally based on manifold learning, but it simultaneously considers both intraclass geometry and interclass discrimination for dimensionality reduction.
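To make the linearization idea behind LPP concrete, the sketch below builds a k-nearest-neighbour heat-kernel graph and solves the generalized eigenproblem X^T L X a = lambda X^T D X a for the smallest eigenvalues. This is a simplified rendering of [7]; the neighbourhood size, kernel width and the small ridge added for numerical stability are our own choices:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, q, k=5, t=10.0):
    """Minimal LPP sketch: rows of X are samples. Build a k-NN heat-kernel
    graph, form the graph Laplacian L = D - W, and take the eigenvectors of
    X^T L X a = lam X^T D X a with the q smallest eigenvalues."""
    n = X.shape[0]
    D2 = cdist(X, X, 'sqeuclidean')
    idx = np.argsort(D2, axis=1)[:, 1:k + 1]       # k nearest neighbours
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i]] = np.exp(-D2[i, idx[i]] / t)  # heat-kernel weights
    W = np.maximum(W, W.T)                         # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-8 * np.eye(X.shape[1])    # ridge for stability
    vals, vecs = eigh(A, B)                        # generalized eigenproblem
    return vecs[:, :q]                             # q smallest eigenvalues

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
U = lpp(X, 3)
```

Because the projection is an explicit linear map, a new test point x is embedded simply as U^T x, which is exactly what resolves the out-of-sample problem mentioned above.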
However, most previous dimensionality reduction algorithms operate on input face image data only after they have been transformed into 1-D vectors. In fact, face image data are intrinsically in the form of second- or higher-order tensors [10,11]. For example, grey-level face images are second-order tensor data (matrices) and can be expanded into third-order tensors by representing sets of images after Gabor filtering. Such vectorization therefore ignores the underlying data structure and often leads to the curse-of-dimensionality dilemma and the small sample size problem. To address these problems, many tensor-based dimensionality reduction algorithms have been proposed. Representative algorithms include tensor principal component analysis (TPCA) [12], tensor linear discriminant analysis (TLDA) [10], tensor locality preserving projection (TLPP) [13], tensor marginal Fisher analysis (TMFA) [8] and so on. However, most of the existing tensor-based algorithms simply use the traditional Euclidean distance to measure the relationships among different data points. Despite its prevalent usage, the Euclidean distance ignores the relationships among different coordinates of high-order data, such as the spatial relations of pixels in images, which have been shown in many previous studies to be very useful for improving learning performance [14]. It is therefore natural that both embedding strategies and their related distance metrics should be considered when designing tensor-based dimensionality reduction algorithms. However, most existing tensor learning algorithms fail to take into account the correlation among different coordinates of data with an arbitrary number of orders.
In this paper, we propose a novel distance adaptive tensor manifold learning algorithm for face recognition. By using the data-adaptive tensor distance metric proposed by Liu et al. [14], we can effectively exploit the spatial correlation relationships of face image data to enhance learning performance. We discuss how to tensorize the discriminative geometry preserving projection, which gives rise to a distance adaptive tensor dimensionality reduction algorithm for face recognition.
The rest of the paper is organized as follows: in Section 2, we provide a brief review of the original vector-based discriminative geometry preserving projection (DGPP) algorithm. Our distance adaptive tensor manifold learning algorithm for face recognition is introduced in Section 3. The experimental results on face recognition are presented in Section 4. Finally, we provide the concluding remarks and suggestions for future work in Section 5.

Brief review of DGPP
The original vector-based discriminative geometry preserving projection (DGPP) [9] is a recently proposed manifold learning algorithm for dimensionality reduction; it can precisely model both the intraclass geometry and the interclass discrimination by using an average weighted adjacency graph and the local linear reconstruction error. In addition, the original vector-based DGPP avoids the out-of-sample problem of traditional manifold learning algorithms by applying a linearization procedure to construct explicit maps over new samples.
Given a set of face images {x_1, x_2, ..., x_n}, DGPP aims to find a linear transformation U that maps each sample x_i in the p-dimensional space to a vector y_i = U^T x_i in the lower q-dimensional space, such that y_i represents x_i well in terms of the discrimination preservation and local geometry preservation criteria. The optimal linear transformation of DGPP is obtained by solving a trace maximization problem of the form arg max_U Tr(·), subject to the orthogonality constraint U^T U = I, where I is an identity matrix. The weighting factor matrix H encodes both the distance weighting information and the class label information, where beta is a positive constant whose setting can be referred to in [5], and the lth class contains n_l samples. The local reconstruction weights W are obtained by minimizing the linear reconstruction error. As can be seen from the above, DGPP looks for a linear transformation matrix U such that the distances between interclass samples are as large as possible, the distances between intraclass samples are as small as possible, and the local geometry of intraclass samples is preserved as much as possible by keeping the linear reconstruction error minimal. Finally, with simple matrix operations, the transformation matrix U that maximizes (1) is given by the maximum-eigenvalue solutions to a standard eigenproblem; that is, U consists of the eigenvectors associated with the largest eigenvalues.
Although the original vector-based DGPP has shown promising results on scene classification, it unfolds each image into a single column vector before dimensionality reduction. In fact, image objects are intrinsically in the form of second- or higher-order tensors. Thus, the vectorization operation of DGPP largely increases the computational cost of image data analysis and seriously destroys the intrinsic structural relationships among the different coordinates of high-order image data, which have been shown in many previous studies to be very useful for improving learning performance [14,15].

Distance adaptive tensor discriminative geometry preserving projection for face recognition
In order to preserve the intrinsic relationships among different coordinates of high-order face image data, a natural way is to perform dimensionality reduction in their original high-order tensor space. In this section, we describe our distance adaptive tensor discriminative geometry preserving projection approach, which is fundamentally based on discriminative geometry preserving projection, as well as on a tensor structure representation and its closely related adaptive tensor distance metric. We begin with a review of a few tensor operations [10,16].

Review of tensor operations
Assume that a data sample is represented as an nth-order tensor X. The inner product of two tensors X and Y of the same size is the sum of the products of their corresponding entries; consequently, the norm of a tensor X is defined as the square root of the inner product of X with itself, and the tensor distance between tensors X and Y is defined as the norm of X - Y. The k-mode unfolding of a tensor rearranges its entries into a matrix whose rows are indexed by the kth mode and whose columns enumerate all the remaining modes. For example, if X is a third-order tensor, then we can obtain three mode unfoldings, one per mode.
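These operations are easy to state concretely in NumPy; the following sketch (our own notation, not the paper's) shows the tensor norm and the k-mode unfolding for a third-order example:

```python
import numpy as np

def unfold(X, k):
    """k-mode unfolding: bring mode k to the front and flatten the rest,
    giving an I_k x (product of the other dimensions) matrix."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def tensor_norm(X):
    """Tensor norm: square root of the inner product <X, X>."""
    return np.sqrt((X * X).sum())

X = np.arange(24, dtype=float).reshape(2, 3, 4)    # a 2 x 3 x 4 tensor
M0, M1, M2 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
```

The three unfoldings have shapes 2 × 12, 3 × 8 and 4 × 6, and the norm is invariant under unfolding.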

Data-adaptive tensor distance
An effective distance function plays a key role in tensor-based dimensionality reduction techniques, and a number of research efforts have shown that the performance of tensor-based dimensionality reduction algorithms not only depends on the embedding strategy, but is also closely related to the distance metric. The most commonly used distance metric in tensor-based techniques is the Euclidean distance. However, the orthogonality assumption of the Euclidean distance ignores the relationships among different coordinates of high-order tensor data, such as the spatial relationships of pixels in images. To relax the orthogonality assumption and enhance the performance of tensor-based dimensionality reduction algorithms, we adopt the data-adaptive tensor distance metric proposed by Liu et al. [14].
Given a high-order tensor X, let x denote its vector-form representation. Then, following the suggestions in [14], the tensor distance between two tensors X and Y can be defined as d(X, Y) = sqrt((x - y)^T G (x - y)), where the entry g_lm of G denotes a metric coefficient and G denotes the metric matrix. The key issue now is the choice of G, so that the tensor distance induced by the data-dependent metric matrix reflects the intrinsic relationships between the different coordinates of high-order tensor data.
In order to model the intrinsic relationships between different coordinates, Wang et al. [15] have already proved the following conclusion: for image data, if the metric coefficients depend properly on the distances between pixel locations, then the obtained distance metric can effectively reflect the spatial relationships between pixels. Following this idea, Liu et al. [14] designed the metric matrix G so that each coefficient g_lm decays with the distance between the coordinate positions of elements l and m, where the decay is controlled by a positive regularization parameter that is empirically set to 1 for simplicity. Once the metric matrix G is obtained according to (8), we eventually get the data-adaptive tensor distance metric of (10). It is important to note that this data-adaptive tensor distance metric reduces to the traditional Euclidean distance when G = I. Therefore, the traditional Euclidean distance is a special case of the proposed data-adaptive tensor distance metric.
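The construction can be sketched as follows for a small grey-level image. Here we assume, as a plausible reading of [14], that g_lm is a Gaussian of the distance between pixel locations l and m; the exact normalization in [14] may differ, and the function names are our own:

```python
import numpy as np

def metric_matrix(h, w, sigma=1.0):
    """Metric matrix G for an h x w image: g_lm decays with the Euclidean
    distance between the pixel locations l and m, so nearby pixels are
    treated as correlated rather than mutually orthogonal."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    P = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    return np.exp(-D2 / (2.0 * sigma ** 2))

def tensor_distance(X, Y, G):
    """d(X, Y) = sqrt((x - y)^T G (x - y)) on the vectorized tensors."""
    d = (X - Y).ravel()
    return np.sqrt(d @ G @ d)

rng = np.random.default_rng(2)
X, Y = rng.random((3, 3)), rng.random((3, 3))
G = metric_matrix(3, 3)
```

With G = I the distance coincides with the ordinary Euclidean distance, matching the special-case remark above.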

Distance adaptive tensor discriminative geometry preserving projection
In order to preserve the intrinsic tensor structure of high-order face image data, a natural way is to perform dimensionality reduction in their original high-order tensor space. In the following, we discuss how to tensorize the discriminative geometry preserving projection, which gives rise to the distance adaptive tensor discriminative geometry preserving projection (DATDGPP) algorithm for face recognition.
Given n face image samples X_1, X_2, ..., X_n in the tensor space, DATDGPP aims to find a set of transformation matrices, one per mode, that map each sample X_j to a low-dimensional tensor Y_j such that Y_j represents X_j well in terms of the discrimination preservation and local geometry preservation criteria.
Similar to the original vector-based DGPP method, in order to preserve both the local geometry and the discriminative information in the low-dimensional feature subspace, the optimal objective function of DATDGPP is defined as a constrained trace maximization, where the weighting factor H_ij encodes both the distance weighting information and the class label information in the high-order tensor space according to (13), in which beta is a positive constant whose setting can be referred to in [5], d_td(X_i, X_j) denotes the tensor distance between the two tensors X_i and X_j defined in (10), and W_ij is the reconstruction weight in the tensor space, which can be obtained by solving a reconstruction error minimization problem. Similar to the original DGPP method, by imposing a sum-to-one constraint on the reconstruction weights in the optimization problem (14), the weights can be computed from the local Gram matrix in the tensor space.
Because of the difficulty of computing all the optimal transformation matrices U_1, U_2, ..., U_N simultaneously for (11), we propose an iterative algorithm inspired by previous research [10]. In this algorithm, we first initialize the transformation matrices; then the optimal mode-k transformation matrix U_k is solved with the other projection matrices fixed. Since the resulting optimization problem (15) subject to (16) can be approximately solved by a standard eigenvalue decomposition, the optimal transformation matrix U_k consists of the eigenvectors associated with the largest eigenvalues. Therefore, the optimal objective function of (11) can be optimized by iteratively updating each transformation matrix while fixing the others.
The resolving procedure of the above iterative algorithm can be described as follows: first, we fix all transformation matrices except U_1 and obtain the optimal U_1 by computing the eigenvectors associated with the largest eigenvalues of the corresponding eigenproblem; then U_2 is updated in the same way, and the rest can be computed by analogy. Finally, we obtain the optimal U_N, and the above iteration procedure is repeated until the algorithm converges. The algorithmic procedure of DATDGPP (Algorithm 1) can be summarized as follows: given the training samples and their class labels, (1) construct the data-adaptive tensor distance metric in terms of (10); (2) construct the weighting factor matrix H in terms of (13); (3) obtain the reconstruction weights W by solving (14); (4) construct the optimal objective function of DATDGPP in terms of (11) and (12); (5) initialize the iteration number t = 0 and the transformation matrices; and (6) iteratively solve the eigenproblem for each mode until convergence. In Algorithm 1, we have described in detail the procedure for learning the transformation matrices in an iterative manner. In the following, we analyse the computational complexity of DATDGPP and prove the convergence of the proposed iterative algorithm.
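The alternating structure of the iteration can be sketched as below. For brevity, the mode-k scatter here is a plain unsupervised covariance of the partially projected samples, standing in for the weighted DATDGPP objective; the function names and the fixed sweep budget are our own simplifications:

```python
import numpy as np

def mode_product(Y, U, m):
    """Contract mode m of tensor Y with U^T (i.e. Y x_m U^T)."""
    return np.moveaxis(np.tensordot(U.T, np.moveaxis(Y, m, 0), axes=1), 0, m)

def alternating_projections(tensors, ranks, n_iter=5):
    """Alternating scheme: update the mode-k projection U_k by an
    eigendecomposition of a mode-k scatter matrix, with all other
    modes held fixed, and sweep over the modes until the budget ends."""
    N = tensors[0].ndim
    Us = [np.eye(tensors[0].shape[k])[:, :ranks[k]] for k in range(N)]
    for _ in range(n_iter):
        for k in range(N):
            S = np.zeros((tensors[0].shape[k],) * 2)
            for X in tensors:
                Y = X
                for m in range(N):
                    if m != k:                      # project all modes but k
                        Y = mode_product(Y, Us[m], m)
                M = np.moveaxis(Y, k, 0).reshape(Y.shape[k], -1)
                S += M @ M.T                        # mode-k scatter
            vals, vecs = np.linalg.eigh(S)
            Us[k] = vecs[:, ::-1][:, :ranks[k]]     # top eigenvectors
    return Us

rng = np.random.default_rng(3)
tensors = [rng.normal(size=(4, 5, 6)) for _ in range(8)]
Us = alternating_projections(tensors, (2, 2, 2), n_iter=3)
```

Each mode update is a small symmetric eigenproblem, which is what makes every iteration of this style of algorithm cheap compared with one eigendecomposition in the full vectorized space.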
To simplify the analysis of computational complexity, we assume that the high-order tensor data have the same size in each dimension, i.e., m_i = m. The original vector-based DGPP algorithm must solve an eigenproblem in the full vectorized space, whereas our proposed DATDGPP algorithm only computes mode-wise scatter matrices and solves a much smaller eigenvalue decomposition in each loop, which is much cheaper than in the original vector-based DGPP algorithm. Although the DATDGPP algorithm has no closed-form solution and many loops are needed to reach convergence, it still runs much faster than the original DGPP algorithm due to the simplicity of each iteration.
Theorem 1. The iterative algorithm for DATDGPP will converge.
Proof: to prove the convergence of the proposed iterative algorithm, we only need to show that the objective function is non-decreasing and has an upper bound. In fact, in each iteration of Algorithm 1, the update of each transformation matrix maximizes the objective of (11) with the other matrices fixed, so the objective function is non-decreasing. In addition, the orthogonality constraints guarantee that an upper bound exists for the objective function in (11). Therefore, the iterative algorithm converges.

To assess the contribution of the data-adaptive tensor distance, we also consider a tensor DGPP (TDGPP) variant that uses the traditional Euclidean distance. In fact, we have shown that the traditional Euclidean distance is a special case of our proposed data-adaptive tensor distance metric in the last paragraph of Section 3.2. In addition, since the TDGPP algorithm is identical to our proposed DATDGPP algorithm except for the tensor distance computation in (13), we omit the detailed algorithmic procedure of TDGPP for simplicity.

Face recognition using DATDGPP
Once the transformation matrices of DATDGPP are obtained, we can apply them to project the face images into a low-dimensional subspace. The face recognition problem then becomes a pattern classification task, and traditional classification algorithms can be applied in the low-dimensional subspace. In this paper, we apply the nearest-neighbour classifier because of its simplicity.
With the learned transformation matrices, the low-dimensional feature representations of all the training face images can be computed. When a new testing face image X arrives, we first obtain its low-dimensional feature representation by the same multilinear projection; the class label of X is then predicted to be that of its nearest neighbour among the training face images.
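A minimal sketch of this recognition stage, assuming second-order face images and already-learned mode matrices U_1 and U_2 (the helper names and the stand-in orthonormal matrices are ours):

```python
import numpy as np

def project(X, Us):
    """Multilinear projection: contract each mode m of X with U_m^T."""
    Y = X
    for m, U in enumerate(Us):
        Y = np.moveaxis(np.tensordot(U.T, np.moveaxis(Y, m, 0), axes=1), 0, m)
    return Y

def nn_classify(X, Us, train_images, train_labels):
    """Project the query and every training image, then return the label
    of the nearest training feature under the Euclidean distance."""
    y = project(X, Us).ravel()
    dists = [np.linalg.norm(y - project(T, Us).ravel()) for T in train_images]
    return train_labels[int(np.argmin(dists))]

rng = np.random.default_rng(4)
Us = [np.linalg.qr(rng.normal(size=(8, 3)))[0],    # stand-in learned U_1
      np.linalg.qr(rng.normal(size=(8, 3)))[0]]    # stand-in learned U_2
train = [rng.random((8, 8)) for _ in range(4)]
labels = ['s1', 's2', 's3', 's4']
```

In practice the training features would be projected once and cached, so only the query needs to be projected at recognition time.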

Experimental results
In this section, we first compare our proposed DATDGPP algorithm with the original vector-based DGPP algorithm and with a tensor DGPP (TDGPP) algorithm that adopts the traditional Euclidean distance metric. Then, we compare the proposed DATDGPP algorithm with tensor principal component analysis (TPCA) [12], tensor linear discriminant analysis (TLDA) [10], tensor locality preserving projection (TLPP) [13] and tensor marginal Fisher analysis (TMFA) [8], four of the most popular tensor-based dimensionality reduction algorithms in face recognition. The CMU PIE face database [17] contains 41,368 face images of 68 subjects in total. The face images were captured by 13 synchronized cameras and 21 flashes, under varying pose, illumination and expression. In this work, we use a subset of five near-frontal poses (C05, C07, C09, C27 and C29) and two illuminations (indexed as 08 and 11). Therefore, each person has ten images. Figure 3 shows ten example images of one person from the PIE database.

Results
We conducted two experiments on each database. In each experiment, the face image set was randomly partitioned into training and testing sets of different sizes. For ease of presentation, the experiments are named Gm/Pn, which means that m images per person are randomly selected for training and the remaining n images are used for testing. We repeat each experiment 20 times on randomly selected training and testing sets and report the average results.
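The Gm/Pn protocol is easy to reproduce; a sketch (the function name and the fixed seed are our own choices):

```python
import numpy as np

def gm_pn_splits(labels, m, trials=20, seed=0):
    """Yield Gm/Pn index splits: in each trial, randomly select m samples
    per class for training; the remaining samples form the test set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    for _ in range(trials):
        train = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            train.extend(rng.choice(idx, size=m, replace=False))
        train = np.sort(np.array(train))
        test = np.setdiff1d(np.arange(labels.size), train)
        yield train, test

# e.g. a Yale-like setting: 15 subjects, 11 images each, G5/P6
labels = np.repeat(np.arange(15), 11)
splits = list(gm_pn_splits(labels, m=5, trials=20))
```

Averaging the per-split accuracy over the 20 trials gives the reported mean result.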
In the first experiment, we compare the recognition accuracy and running time of the DGPP, TDGPP and DATDGPP algorithms under different training and testing partitions. In general, the performance of all these algorithms varies with the number of dimensions. We show the maximal average recognition accuracy, the optimal reduced dimension and the running time obtained by the DGPP, TDGPP and DATDGPP algorithms on the three databases in Tables 1-6. In the second experiment, we compare our proposed DATDGPP algorithm with the four other representative tensor-based dimensionality reduction algorithms: TPCA, TLDA, TLPP and TMFA. Tables 1, 3 and 5 report the maximal average recognition accuracies and the corresponding optimal reduced dimensions of the TPCA, TLDA, TLPP, TMFA and DATDGPP algorithms on the three databases. Due to space limitations, we omit the plots of recognition accuracy versus the reduced dimension on the three databases.
Our empirical study on face recognition was conducted on three real-world face databases: the Yale database, the Olivetti Research Laboratory (ORL) database and the PIE (pose, illumination and expression) database from CMU. In all the experiments, preprocessing to locate the faces was applied: the original images were manually aligned according to the eye positions, cropped and normalized to a resolution of 112 × 92 pixels. In addition, in order to reduce the influence of extreme illumination, histogram equalization was applied in the preprocessing step. Face images in the three databases are naturally second-order tensors. In all the experiments, the recognition process has three steps. First, we calculate the face subspace from the training samples; then, the new face image to be identified is projected into the d-dimensional subspace for the vector-based algorithms or the d × d-dimensional subspace for the tensor-based algorithms; finally, the new face image is identified by the nearest-neighbour classifier in the low-dimensional subspace. The Yale database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) contains 165 front-view face images of 15 individuals. Eleven images were collected from each individual, with varying facial expressions and configurations. Eleven sample images of one person from the Yale database are shown in Figure 1.

Figure 1.Face image examples from the Yale database.

Figure 2. Face image examples from the ORL database.

Figure 3. Face image examples from the PIE database.

Table 1. Comparisons of maximal average recognition accuracy (in percent) and the optimal reduced dimension on the Yale database.

Table 2. Comparisons of running time (seconds) on the Yale database.

Table 3. Comparisons of maximal average recognition accuracy (in percent) and the optimal reduced dimension on the ORL database.

Table 4. Comparisons of running time (seconds) on the ORL database.

Table 5. Comparisons of maximal average recognition accuracy (in percent) and the optimal reduced dimension on the CMU PIE database.

Table 6. Comparisons of running time (seconds) on the CMU PIE database.