Robust Tensor Analysis with Non-Greedy ℓ1-Norm Maximization

The ℓ1-norm based tensor analysis (TPCA-L1) was recently proposed for dimensionality reduction and feature extraction. However, it uses a greedy strategy to solve the ℓ1-norm maximization problem, which makes it prone to getting stuck in local solutions. In this paper, we propose a robust TPCA with non-greedy ℓ1-norm maximization (TPCA-L1 non-greedy), in which all projection directions are optimized simultaneously. Experiments on several face databases demonstrate the effectiveness of the proposed method.


Introduction
Principal component analysis (PCA) is a widely used dimensionality reduction and feature extraction method due to its simplicity and effectiveness [1][2][3]. In image analysis, traditional PCA algorithms reshape the original two-dimensional image into a long one-dimensional vector and aim to find the projection directions that minimize the reconstruction error. However, this vectorization destroys the intrinsic spatial structure of the image. 2DPCA [4] was proposed to alleviate this problem; it can essentially be seen as PCA in matrix form. To capture more spatial information, tensor analysis [5], [6] (also called multilinear subspace analysis) was introduced into PCA, in which an image is treated as a second-order tensor and a sequence of images as a third-order tensor. Tensorface [7] is one of the earliest tensor analysis methods for face recognition; its main idea is the higher-order singular value decomposition (HOSVD). Generalized low rank approximation of matrices (GLRAM) [8] was proposed to solve the problem of low rank approximation of matrices, in which each image is represented as a second-order tensor.
It is known that PCA and 2DPCA are sensitive to outliers because they are based on the ℓ2-norm. By contrast, the ℓ1-norm is more robust to outliers than the ℓ2-norm. To alleviate the effect of outliers, several ℓ1-norm based methods were proposed, including L1-PCA [9], R1-PCA [10] and PCA-L1 [11]. Among them, PCA-L1 is invariant to rotations and also robust to outliers. Based on the work of Ye [8], Pang proposed the TPCA-L1 algorithm, which uses the ℓ1-norm instead of the Frobenius norm [12]. To solve the ℓ1-norm maximization problem, these algorithms use a greedy strategy, which makes them prone to getting stuck in local solutions. In 2011, Nie and Huang [13] proposed PCA with non-greedy ℓ1-norm maximization (PCA-L1 non-greedy), in which all projection directions are optimized simultaneously without increasing the computational complexity. Wang then extended 2DPCA-L1 to its non-greedy version (2DPCA-L1 non-greedy) [14]. In this paper, we propose a tensor principal component analysis with non-greedy ℓ1-norm maximization, termed TPCA-L1 non-greedy. It has three major advantages: 1) it is robust to outliers because it is based on the ℓ1-norm; 2) it preserves more spatial structure information than PCA-L1; 3) all projection directions are optimized simultaneously, and much better recognition accuracy than that of TPCA-L1 is obtained without increasing the computational cost.
The rest of this paper is organized as follows. In Sec. 2, we briefly review the TPCA-L1 greedy algorithm [12]. In Sec. 3, we propose tensor principal component analysis with non-greedy ℓ1-norm maximization. The experimental results and their analysis are reported in Sec. 4. Finally, Sec. 5 concludes the paper and points out future work.

Brief Review of TPCA-L1
Let $\{X_1, \cdots, X_n\}$ be a sequence of image matrices, where n is the number of images. The size of each matrix $X_i$ is h × w, where h and w are the image height and width, respectively. In TPCA-L1 [12], an effective optimization algorithm was proposed to find two projection matrices $U \in \mathbb{R}^{w \times r}$ and $V \in \mathbb{R}^{h \times r}$, each with r columns, that maximize the ℓ1-norm based dispersion, i.e.,

$$\max_{U,V}\; g(U,V) = \sum_{i=1}^{n} \left\| V^T X_i U \right\|_1 \qquad (1)$$

subject to $U^T U = I_r$ and $V^T V = I_r$. The alternating projection optimization procedure is repeated until convergence, which is proved in [8]. The procedures for computing v while u is fixed, and then computing u while v is fixed, are described in detail below.
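To make objective (1) concrete, the following NumPy sketch evaluates the ℓ1-norm dispersion for a stack of images. The function name `l1_dispersion` and the array layout (images stacked as an `(n, h, w)` array, U of shape `(w, r)`, V of shape `(h, r)`) are assumptions for illustration, not part of [12].

```python
import numpy as np

def l1_dispersion(images, U, V):
    """Sum over i of the entrywise l1-norm of V^T X_i U.

    images: (n, h, w) array; U: (w, r); V: (h, r).
    """
    # einsum forms V^T X_i U for every image at once; result shape (n, r, r)
    features = np.einsum("hr,nhw,ws->nrs", V, images, U)
    return float(np.abs(features).sum())
```

For example, with all-ones 3 × 4 images and U, V taken as the first two columns of identity matrices, each $V^T X_i U$ is a 2 × 2 block of ones, so every image contributes 4 to the sum.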

Compute v while u is Fixed
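Beginning with r = 1, equation (1) can be rewritten as

$$\max_{u,v}\; g(u,v) = \sum_{i=1}^{n} \left| v^T X_i u \right| \qquad (2)$$

subject to $u^T u = 1$ and $v^T v = 1$, where $u \in \mathbb{R}^{w}$ and $v \in \mathbb{R}^{h}$. To solve this ℓ1-norm maximization, define a polarity function $p_i(t)$ whose value is either −1 or 1, where t denotes the iteration number:

$$p_i(t) = \begin{cases} 1, & v(t)^T X_i u \ge 0 \\ -1, & v(t)^T X_i u < 0 \end{cases} \qquad (3)$$

With the polarities fixed, g(u, v) can be converted to

$$g(u,v) = \sum_{i=1}^{n} p_i(t)\, v^T X_i u = v^T \left( \sum_{i=1}^{n} p_i(t)\, X_i u \right) \qquad (4)$$

The projection vector v(t + 1) at the (t + 1)th iteration is updated as

$$v(t+1) = \frac{\sum_{i=1}^{n} p_i(t)\, X_i u}{\left\| \sum_{i=1}^{n} p_i(t)\, X_i u \right\|_2} \qquad (5)$$

Let t = t + 1 and repeat (3) and (5); the iteration runs until convergence.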
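The v-update in (3) and (5) can be sketched in NumPy as follows, assuming the images are stored as an `(n, h, w)` array and u is a fixed unit vector. The function name `update_v` is hypothetical, ties at zero are given polarity +1 (an assumption; the text only states that the polarity is ±1), and the stopping rule (stop when v no longer changes, or after `max_iter` steps) is also an assumption.

```python
import numpy as np

def update_v(images, u, v0, max_iter=100):
    """Greedy TPCA-L1 v-step: maximize sum_i |v^T X_i u| over unit v, u fixed."""
    a = images @ u                             # (n, h); row i is X_i u
    v = v0 / np.linalg.norm(v0)
    for _ in range(max_iter):
        p = np.where(a @ v >= 0, 1.0, -1.0)    # polarity p_i(t); ties -> +1 (assumed)
        m = p @ a                              # sum_i p_i(t) X_i u
        norm = np.linalg.norm(m)
        if norm == 0:                          # degenerate case: keep current v
            break
        v_new = m / norm                       # normalized update
        if np.allclose(v_new, v):              # polarities stopped changing v
            return v_new
        v = v_new
    return v
```

Each step cannot decrease $\sum_i |v^T X_i u|$: with the polarities fixed, the new v maximizes the linearized objective, whose value at the old v equals the old objective.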

Compute u while v is Fixed
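The iteration above computes the optimal v while u is fixed. After the optimal v is obtained, the task becomes computing the optimal u that maximizes

$$g(u,v) = \sum_{i=1}^{n} \left| u^T X_i^T v \right| \qquad (6)$$

subject to $u^T u = 1$, where the polarity function $s_i(t)$ is defined as

$$s_i(t) = \begin{cases} 1, & u(t)^T X_i^T v \ge 0 \\ -1, & u(t)^T X_i^T v < 0 \end{cases} \qquad (7)$$

The updating rule for u is

$$u(t+1) = \frac{\sum_{i=1}^{n} s_i(t)\, X_i^T v}{\left\| \sum_{i=1}^{n} s_i(t)\, X_i^T v \right\|_2} \qquad (8)$$

Equations (7) and (8) are then repeated until the iteration converges.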
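Putting the v-step and the u-step together, a rank-one (r = 1) run of the greedy algorithm alternates the two polarity-and-normalize loops. The sketch below is a hypothetical `rank_one_tpca_l1`; it uses fixed numbers of outer and inner iterations instead of a convergence test, gives ties at zero polarity +1, and assumes the weighted sums never vanish. None of these choices is prescribed by [12].

```python
import numpy as np

def _polarity(x):
    # +1 for non-negative projections, -1 for negative ones (tie rule assumed)
    return np.where(x >= 0, 1.0, -1.0)

def rank_one_tpca_l1(images, u0, v0, outer=20, inner=50):
    """Greedy TPCA-L1 for r = 1: alternate the v-step and the u-step.

    images: (n, h, w) array; u0: (w,) and v0: (h,) starting vectors.
    Assumes the weighted sums below never vanish.
    """
    u = u0 / np.linalg.norm(u0)
    v = v0 / np.linalg.norm(v0)
    for _ in range(outer):
        a = images @ u                          # (n, h): row i is X_i u
        for _ in range(inner):                  # v-step, u fixed
            m = _polarity(a @ v) @ a            # sum_i p_i(t) X_i u
            v = m / np.linalg.norm(m)
        b = np.einsum("nhw,h->nw", images, v)   # (n, w): row i is X_i^T v
        for _ in range(inner):                  # u-step, v fixed
            m = _polarity(b @ u) @ b            # sum_i s_i(t) X_i^T v
            u = m / np.linalg.norm(m)
    return u, v
```

Because neither step can decrease $\sum_i |v^T X_i u|$, the objective is non-decreasing over the whole alternation.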

Compute u_k and v_k Based on u_{k−1} and v_{k−1}, Respectively
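From (5), $v_1$ is a linear combination of $X_i u$ ($i = 1, \cdots, n$). Therefore, to calculate $v_k$, each $X_i u$ should be updated as a whole by removing its component along $v_{k-1}$:

$$(X_i u)^{(k)} = (X_i u)^{(k-1)} - v_{k-1} v_{k-1}^T (X_i u)^{(k-1)} \qquad (9)$$

Similarly, to calculate $u_k$, each $X_i^T v$ is updated as

$$(X_i^T v)^{(k)} = (X_i^T v)^{(k-1)} - u_{k-1} u_{k-1}^T (X_i^T v)^{(k-1)} \qquad (10)$$

The TPCA-L1 algorithm does not need to perform the eigendecomposition of covariance-like matrices. When these matrices are large, the computational cost of traditional tensor analysis algorithms is usually very high. However, TPCA-L1 optimizes the projection directions sequentially (greedily), which makes it prone to getting stuck in local solutions.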
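A minimal sketch of the greedy, one-direction-at-a-time extraction: after each direction is found, every stored vector (for example, each $X_i u$) has its component along that direction removed before the next direction is computed. The names `deflate` and `greedy_directions`, the initialization from the longest stored vector, and the fixed iteration count are assumptions for illustration.

```python
import numpy as np

def deflate(vectors, direction):
    """Remove the component along the unit vector `direction` from each row.

    vectors: (n, d) array whose rows are, e.g., the vectors X_i u.
    """
    return vectors - np.outer(vectors @ direction, direction)

def greedy_directions(vectors, r, iters=100):
    """Extract r unit directions one at a time, deflating after each."""
    a = np.array(vectors, dtype=float)
    dirs = []
    for _ in range(r):
        # Start from the longest remaining row (an assumed initialization)
        v = a[np.argmax(np.linalg.norm(a, axis=1))]
        v = v / np.linalg.norm(v)
        for _ in range(iters):
            m = np.where(a @ v >= 0, 1.0, -1.0) @ a   # polarity-weighted sum
            v = m / np.linalg.norm(m)
        dirs.append(v)
        a = deflate(a, v)
    return np.column_stack(dirs)
```

Because every deflated vector is orthogonal to the earlier directions, each new direction (a combination of deflated vectors) is orthogonal to them as well. This sequential structure is what the non-greedy method optimizes away.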

TPCA with Non-Greedy ℓ1-Norm Maximization
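In this section, a general ℓ1-norm maximization problem is discussed first, and then TPCA-L1 is extended to its non-greedy version. The general ℓ1-norm maximization problem is formulated as follows:

$$\max_{v \in \mathcal{C}}\; f(v) + \sum_{i} \left| g_i(v) \right| \qquad (11)$$

which is assumed to have an upper bound. Here $f(\cdot)$ and $g_i(\cdot)$ are arbitrary functions, and $\mathcal{C}$ is an arbitrary constraint set. Equation (11) can be rewritten as

$$\max_{v \in \mathcal{C}}\; f(v) + \sum_{i} p_i\, g_i(v) \qquad (12)$$

where $p_i = \mathrm{sgn}(g_i(v))$ and $\mathrm{sgn}(\cdot)$ is the sign function defined as

$$\mathrm{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$

An effective algorithm for this general ℓ1-norm maximization problem is detailed in [13] and summarized in Appendix: Algorithm 1.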
In each iteration, $p_i$ is calculated from the current solution $v^t$, and the next solution $v^{t+1}$ is obtained with the current $p_i$. The iterative procedure is repeated until the algorithm converges. It has been proved that Algorithm 1 monotonically increases the objective function of (11) in each iteration and usually converges to a local solution [13].
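The iteration of Algorithm 1 can be sketched generically in Python. The caller supplies f, the list of $g_i$, and a problem-specific solver for the linearized problem $\max_{v \in \mathcal{C}} f(v) + \sum_i p_i g_i(v)$; the name `nongreedy_l1_max`, this callable-based interface and the stopping tolerance are assumptions for illustration, not the interface of [13].

```python
import numpy as np

def nongreedy_l1_max(f, gs, solve_linearized, v0, max_iter=100, tol=1e-10):
    """Iterate p_i = sgn(g_i(v^t)), v^{t+1} = argmax_{v in C} f(v) + sum_i p_i g_i(v).

    f: callable v -> float.
    gs: list of callables, each v -> float (the g_i).
    solve_linearized: callable p -> argmax over C of f(v) + sum_i p_i g_i(v);
        this problem-specific solver must be provided by the caller.
    """
    def objective(v):
        return f(v) + sum(abs(g(v)) for g in gs)

    v, obj = v0, objective(v0)
    for _ in range(max_iter):
        p = np.sign([g(v) for g in gs])   # p_i = sgn(g_i(v^t)); 0 when g_i(v^t) = 0
        v_next = solve_linearized(p)
        obj_next = objective(v_next)
        if obj_next - obj <= tol:         # monotone ascent has stalled
            return v_next if obj_next >= obj else v
        v, obj = v_next, obj_next
    return v
```

As a usage example, single-direction PCA-L1 fits this template with f = 0, $g_i(v) = x_i^T v$ and $\mathcal{C}$ the unit sphere; the linearized problem then has the closed form $\sum_i p_i x_i / \|\sum_i p_i x_i\|_2$.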
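Next, we focus on how to extend the TPCA-L1 algorithm to its non-greedy version. The original problem of TPCA-L1 is to minimize the reconstruction error as follows:

$$\min_{U,V}\; \sum_{i=1}^{n} \left\| X_i - V V^T X_i U U^T \right\|_F^2 \qquad (13)$$

subject to $U^T U = I_r$ and $V^T V = I_r$, where $U \in \mathbb{R}^{w \times r}$ and $V \in \mathbb{R}^{h \times r}$ are two projection matrices with orthonormal columns. Under these constraints, (13) is equivalent to maximizing $\sum_i \|V^T X_i U\|_F^2$; replacing the Frobenius norm with the ℓ1-norm, the problem corresponds to

$$\max_{U,V}\; \sum_{i=1}^{n} \left\| V^T X_i U \right\|_1 \qquad (14)$$

subject to $U^T U = I_r$ and $V^T V = I_r$. In this paper, an alternating optimization algorithm is proposed to solve this ℓ1-norm maximization: when U is fixed, V is computed; then, when V is fixed, U is computed. TPCA-L1 is thus extended to its non-greedy version in two steps. The computation of U and V is described below.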

Compute V while U is Fixed
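While U is fixed, problem (1) can be rewritten as

$$\arg\max_{V^T V = I_r} \sum_{i=1}^{n} \left\| V^T X_i U \right\|_1 = \arg\max_{V^T V = I_r} \sum_{i=1}^{n} \sum_{j=1}^{r} \left\| V^T y_{1,j}^{(i)} \right\|_1 \qquad (15)$$

where $Y_1^{(i)} = X_i U \in \mathbb{R}^{h \times r}$ and $y_{1,j}^{(i)}$ denotes the jth column vector of $Y_1^{(i)}$. Define the training matrix as $Y = [y_{1,1}^{(1)}, \cdots, y_{1,r}^{(1)}, \cdots, y_{1,r}^{(n)}] \in \mathbb{R}^{h \times nr}$; then (15) becomes $\max_{V^T V = I_r} \|V^T Y\|_1$, an instance of (11) with f = 0. Following Algorithm 1, let $P = \mathrm{sgn}(V_t^T Y) \in \mathbb{R}^{r \times nr}$ be computed from the current solution $V_t$, and solve

$$\max_{V^T V = I_r} \mathrm{Tr}\left(V^T M\right), \qquad M = Y P^T \in \mathbb{R}^{h \times r} \qquad (16)$$

Let $M = L \Lambda R^T$ be the full singular value decomposition of M, with $L \in \mathbb{R}^{h \times h}$, $\Lambda \in \mathbb{R}^{h \times r}$ and $R \in \mathbb{R}^{r \times r}$. Then

$$\mathrm{Tr}\left(V^T M\right) = \mathrm{Tr}\left(V^T L \Lambda R^T\right) = \mathrm{Tr}\left(\Lambda R^T V^T L\right) = \mathrm{Tr}(\Lambda Z) = \sum_{i=1}^{r} \lambda_{ii} z_{ii} \qquad (17)$$

where $Z = R^T V^T L \in \mathbb{R}^{r \times h}$, and $\lambda_{ii}$ and $z_{ii}$ are the (i, i)th elements of Λ and Z, respectively. Note that $Z Z^T = I_r$, where $I_r$ is the r × r identity matrix, so $z_{ii} \le 1$. Meanwhile, $\lambda_{ii} \ge 0$ because $\lambda_{ii}$ is the ith singular value of M. That is to say, $\mathrm{Tr}(V^T M)$ reaches its maximum when $Z = [I_r, 0]$. So the optimal solution is

$$V = L \begin{bmatrix} I_r \\ 0 \end{bmatrix} R^T \qquad (18)$$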
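The V-step, that is, solving (15) by repeatedly applying the closed-form update (18), can be sketched in NumPy as below. The name `update_V`, the `(n, h, w)` image layout and the relative stopping tolerance are assumptions. The sketch uses the reduced SVD returned by `np.linalg.svd(..., full_matrices=False)`, whose first factor holds the first r columns of L, so `L_r @ Rt` equals $L [I_r; 0] R^T$.

```python
import numpy as np

def update_V(images, U, V, max_iter=100, tol=1e-10):
    """Non-greedy V-step with U fixed.

    images: (n, h, w) array; U: (w, r); V: (h, r) with orthonormal columns.
    Returns an (h, r) matrix with orthonormal columns.
    """
    # Training matrix Y in R^{h x nr}: the columns of X_1 U, ..., X_n U side by side
    Y = np.concatenate([X @ U for X in images], axis=1)
    obj = np.abs(V.T @ Y).sum()                   # sum_i ||V^T X_i U||_1
    for _ in range(max_iter):
        P = np.sign(V.T @ Y)                      # polarity matrix, shape (r, nr)
        M = Y @ P.T                               # shape (h, r)
        # Reduced SVD M = L_r diag(s) R^T, so L_r @ R^T equals L [I_r; 0] R^T
        L_r, _, Rt = np.linalg.svd(M, full_matrices=False)
        V = L_r @ Rt
        new_obj = np.abs(V.T @ Y).sum()
        if new_obj - obj <= tol * max(1.0, obj):  # ascent has stalled
            break
        obj = new_obj
    return V
```

No eigendecomposition of a covariance-like matrix is needed; each step costs one SVD of an h × r matrix.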

Compute U while V is Fixed
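Similarly, while V is fixed, problem (1) can be rewritten as

$$\arg\max_{U^T U = I_r} \sum_{i=1}^{n} \left\| U^T X_i^T V \right\|_1 = \arg\max_{U^T U = I_r} \sum_{i=1}^{n} \sum_{j=1}^{r} \left\| U^T y_{2,j}^{(i)} \right\|_1$$

where $Y_2^{(i)} = X_i^T V \in \mathbb{R}^{w \times r}$ and $y_{2,j}^{(i)}$ denotes the jth column vector of $Y_2^{(i)}$. Similarly, define the training matrix as $Y = [y_{2,1}^{(1)}, \cdots, y_{2,r}^{(1)}, \cdots, y_{2,r}^{(n)}] \in \mathbb{R}^{w \times nr}$, let $P = \mathrm{sgn}(U_t^T Y)$, and set $M = Y P^T \in \mathbb{R}^{w \times r}$ with full singular value decomposition $M = L \Lambda R^T$. Then

$$\mathrm{Tr}\left(U^T M\right) = \mathrm{Tr}(\Lambda Z) = \sum_{i=1}^{r} \lambda_{ii} z_{ii}$$

where $Z = R^T U^T L \in \mathbb{R}^{r \times w}$, and $\lambda_{ii}$ and $z_{ii}$ are the (i, i)th elements of Λ and Z, respectively. As in the computation of V, the optimal solution for U is

$$U = L \begin{bmatrix} I_r \\ 0 \end{bmatrix} R^T$$

The whole procedure is summarized in Appendix: Algorithm 2.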
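Since Appendix: Algorithm 2 is not reproduced in this section, the following is only a sketch of one plausible arrangement of the full non-greedy procedure. It alternates a single closed-form V-step and a single U-step per outer iteration. The name `tpca_l1_nongreedy`, the random orthonormal initialization, the fixed iteration count, and running only one step per alternation are all assumptions that may differ from the authors' algorithm.

```python
import numpy as np

def _nongreedy_step(Y, W):
    """W <- argmax_{W^T W = I} Tr(W^T Y P^T) with P = sgn(W^T Y)."""
    M = Y @ np.sign(W.T @ Y).T
    L_r, _, Rt = np.linalg.svd(M, full_matrices=False)
    return L_r @ Rt

def tpca_l1_nongreedy(images, r, n_iter=50, seed=0):
    """Alternate non-greedy V- and U-steps; returns U (w, r) and V (h, r)."""
    _, h, w = images.shape
    rng = np.random.default_rng(seed)
    # Random orthonormal starting points (assumed; not specified in the text)
    U = np.linalg.qr(rng.standard_normal((w, r)))[0]
    V = np.linalg.qr(rng.standard_normal((h, r)))[0]
    for _ in range(n_iter):
        Y1 = np.concatenate([X @ U for X in images], axis=1)    # h x nr, U fixed
        V = _nongreedy_step(Y1, V)
        Y2 = np.concatenate([X.T @ V for X in images], axis=1)  # w x nr, V fixed
        U = _nongreedy_step(Y2, U)
    return U, V
```

All r columns of U and of V are updated together in every step, which is the non-greedy property; and because each step cannot decrease $\sum_i \|V^T X_i U\|_1$, the objective is non-decreasing across iterations.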

Experimental Results
In this section, four public image databases are selected for performance evaluation. The four databases are briefly described in [6], [15], and some sample images are shown in Fig. 1. In this experiment, we demonstrate the effectiveness of the proposed TPCA-L1 non-greedy algorithm compared with the TPCA-L1 algorithm [12]. The four image databases are first used to compare the objective values. Figure 2 shows the objective values versus the number of features obtained by the TPCA-L1 greedy algorithm and the TPCA-L1 non-greedy algorithm. The proposed algorithm obtains much higher objective values than the TPCA-L1 greedy algorithm on all four image databases.
Classification is performed by a nearest-neighbor rule based on the Euclidean distance between a testing image and each training image [4]. Let $X_j$ ($j = 1, 2, \cdots, m$) denote the training images and $Y_i$ ($i = 1, 2, \cdots, n$) the testing images. The Euclidean distance is evaluated on the projected features:

$$d(Y_i, X_j) = \left\| V^T Y_i U - V^T X_j U \right\|_F$$

For a testing image Y, if $d(Y, X_l) = \min_j d(Y, X_j)$ and $X_l$ belongs to class L, then Y is assigned to class L. Two databases (ORL and Yale) are then selected to compare the proposed algorithm with the TPCA-L1 greedy algorithm for classification. The size of each image is 64 × 64 pixels in this experiment. For each person, 5 images are randomly selected as training samples, and the remaining images are used for testing. A varying percentage of the training images is corrupted by outliers; for convenience, "a% of training images with outliers" denotes that a% of the images in the training set are corrupted by outliers. This percentage is set to 0%, 20% and 40%. Figure 3 shows the recognition accuracy versus the number of features obtained by the TPCA-L1 non-greedy algorithm and the TPCA-L1 greedy algorithm on the two face databases. As can be seen in Fig. 3, the proposed algorithm achieves much better recognition accuracy than the TPCA-L1 greedy algorithm on the ORL Face Database, especially when the number of features is large. On the Yale Face Database, when some images are corrupted by outliers, the proposed algorithm achieves much higher recognition accuracy than the TPCA-L1 greedy algorithm.
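The nearest-neighbor rule can be sketched as follows, assuming that the distance between two images is the Frobenius norm of the difference of their projected features $V^T X U$ (equivalently, the Euclidean distance between the flattened features). The helper names `project` and `classify` and the `(n, h, w)` array layout are hypothetical.

```python
import numpy as np

def project(images, U, V):
    """Map each (h, w) image X to the flattened feature matrix V^T X U."""
    feats = np.einsum("hr,nhw,ws->nrs", V, images, U)
    return feats.reshape(len(images), -1)

def classify(test_images, train_images, train_labels, U, V):
    """Assign each test image the label of its nearest training image."""
    F_train = project(train_images, U, V)
    F_test = project(test_images, U, V)
    # Squared Euclidean distances between feature vectors, shape (n_test, n_train);
    # on flattened features this equals the squared Frobenius distance.
    d2 = ((F_test[:, None, :] - F_train[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(train_labels)[np.argmin(d2, axis=1)]
```

Squared distances are compared directly because squaring does not change which training image is nearest.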
As can be seen, the recognition accuracy decreases markedly as the number of features increases. Note that high-dimensional data are usually highly correlated and contain much redundant information. In a low-dimensional space, redundant information is removed and the recognition accuracy is high. As the number of features increases, more information (including noise and redundant information) is retained, which can degrade the recognition accuracy.
Comparing the performance of TPCA-L1 non-greedy and TPCA-L1, it can be seen from Tab. 1 that TPCA-L1 non-greedy performs better than TPCA-L1 for small numbers of features. The computational cost is determined by the number of iterations, the number of features, the number of images and the image size. Since the number of images and the image size are fixed for a given database, the number of iterations and the number of features are the main factors that determine the computational cost. Here, the time cost is used to represent the computational cost. Figure 4 shows the recognition accuracy versus the time cost for TPCA-L1 non-greedy when the procedure is run 50 times. The results show that the recognition accuracy does not keep increasing as the time cost increases.

Conclusions
In this paper, a non-greedy ℓ1-norm based TPCA, which is robust to outliers, has been proposed. By representing the training images as tensors, the proposed algorithm can exploit more spatial information of the images and thus achieves better performance. In addition, all projection directions are optimized simultaneously with a non-greedy method. Experimental results on the image databases show that the TPCA-L1 non-greedy algorithm outperforms the greedy method in both recognition accuracy and objective values.

Fig. 1.Sample images from the four databases used in the experiments.