Linear Projective Non-negative Matrix Factorization

To address the problem that the basis matrix in Non-negative Matrix Factorization (NMF) is usually not very sparse, a method called Linear Projective Non-negative Matrix Factorization (LP-NMF) is proposed. In LP-NMF, an objective function based on the Frobenius norm is defined from the viewpoint of projection and linear transformation. Using the Taylor series expansion, an iterative algorithm for the basis matrix and the linear transformation matrix is derived, and a proof of its convergence is provided. Experimental results show that the algorithm converges; that, relative to NMF, the orthogonality and the sparseness of the basis matrix are better; and that the recognition accuracy in face recognition is higher. The LP-NMF method is therefore effective.


INTRODUCTION
Projective Non-negative Matrix Factorization (P-NMF), $X \approx WW^T X$ (Yuan and Oja, 2005), was proposed on the basis of NMF (Lee and Seung, 1999). Since it was constructed from the viewpoint of projection, only the basis matrix W is computed in the P-NMF algorithm. The computational complexity of one iteration step is therefore lower for P-NMF, as only one matrix has to be computed instead of two for NMF.
Linear Projection-Based Non-negative Matrix Factorization (LPBNMF), $X \approx WQX$ (Li and Zhang, 2010a), was constructed from the viewpoint of projection and linear transformation. For LPBNMF, a monotonically convergent algorithm was given, and the orthogonality and the sparseness of the basis matrix were computed quantitatively.
Under the optimization rules of P-NMF and LPBNMF, the basis matrices are forced to tend to be orthogonal. So, relative to NMF, the orthogonality and the sparseness of the basis matrices are better, which makes P-NMF and LPBNMF more beneficial to applications such as data dimension reduction and pattern recognition. However, since the algorithm for P-NMF is not convergent, LPBNMF is the more beneficial of the two in applications (Li and Zhang, 2010a).
In this study, another method is proposed based on the LPBNMF model $X \approx WQX$. We call it Linear Projective Non-negative Matrix Factorization (LP-NMF). Relative to the algorithm in (Li and Zhang, 2010a), the iterative formulae of this algorithm are simpler.

LINEAR PROJECTIVE NON-NEGATIVE MATRIX FACTORIZATION (LP-NMF)
Taking the Frobenius norm as the similarity measure, we consider the objective function F of Eq. (1) for the model $X \approx WQX$. The mathematical model in the NMF definition, $X \approx WH$, is based on nonlinear projection. The basic idea of LP-NMF, in contrast, is as follows. Firstly, we turn the data X into QX by a suitable linear transformation Q. Secondly, we may consider QX to be the projection of the sample space X onto a suitable subspace W. Finally, we minimize the objective function F in Eq. (1) to get W and Q. Here, we call W the basis matrix and Q the linear transformation matrix.
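As a minimal sketch of how such an objective could be evaluated, the snippet below assumes that F is half the squared Frobenius norm of the residual $X - WQX$. The factor 0.5, the function name `lp_nmf_objective` and the matrix sizes are assumptions made for illustration only.

```python
import numpy as np

def lp_nmf_objective(X, W, Q):
    """Assumed objective: 0.5 * ||X - W Q X||_F^2 (the 0.5 factor is an assumption)."""
    residual = X - W @ (Q @ X)
    return 0.5 * np.linalg.norm(residual, "fro") ** 2

# Hypothetical sizes: m pixels per sample, n samples, rank r
rng = np.random.default_rng(0)
m, n, r = 6, 10, 3
X = rng.random((m, n))   # non-negative data, one sample per column
W = rng.random((m, r))   # basis matrix
Q = rng.random((r, m))   # linear transformation matrix
value = lp_nmf_objective(X, W, Q)
```

With this layout, W has one column per basis vector and Q maps a sample of length m to a feature vector of length r.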

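The update rule for basis matrix W: For any element $w_{ab}$ of W, let $F_{w_{ab}}$ stand for the part of F relevant to $w_{ab}$ in Eq. (1). Writing $w$ instead of $w_{ab}$ in $F_{w_{ab}}$ gives a function $F_{w_{ab}}(w)$, whose first-order derivative at $w_{ab}$ is the first-order partial derivative of F with respect to $w_{ab}$.
Similarly, the second-order derivative of $F_{w_{ab}}(w)$ at $w_{ab}^{(t)}$ is computed, and an auxiliary function for $F_{w_{ab}}(w)$ is defined according to Definition 1 of (Lee and Seung, 2001). We then calculate the first-order derivative of the auxiliary function with respect to $w$, which leads to the update rule in Eq. (8).
Using this update rule, we may minimize the auxiliary function at each step. If all elements of W are updated by Eq. (8), a local minimum of the objective function F may be obtained. The algorithm converges after finite iterations. Eq. (8) is the update rule for the basis matrix W.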
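For reference, the auxiliary-function argument of (Lee and Seung, 2001) can be stated as follows. The symbol $G$ and the iteration superscript $(t)$ are notation assumed here. A function $G(w, w^{(t)})$ is an auxiliary function for $F_{w_{ab}}(w)$ if

```latex
G(w, w^{(t)}) \ge F_{w_{ab}}(w), \qquad G(w, w) = F_{w_{ab}}(w).
```

Choosing the next iterate as the minimizer of the auxiliary function then cannot increase the objective:

```latex
w^{(t+1)} = \arg\min_{w} G(w, w^{(t)})
\;\Longrightarrow\;
F_{w_{ab}}(w^{(t+1)}) \le G(w^{(t+1)}, w^{(t)}) \le G(w^{(t)}, w^{(t)}) = F_{w_{ab}}(w^{(t)}).
```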

The update rule for linear transformation matrix Q:
Similarly, for any element $q_{ab}$ of Q we can get a function $F_{q_{ab}}(q)$ and write its Taylor series expansion at $q_{ab}^{(t)}$, Eq. (9). When numerical calculation is considered, Eq. (9) is expressed through a further equation. We then define a function of $q$ and $q_{ab}^{(t)}$ that is an auxiliary function for $F_{q_{ab}}(q)$, from which the update rule in Eq. (12) follows. If all elements of Q are updated by Eq. (12), a local minimum of the objective function F may be obtained.
Eq. (12) is the update rule for the linear transformation matrix Q.
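As an illustration of the expansion used in this step, assume that F is a squared Frobenius norm of $X - WQX$. Then $F_{q_{ab}}(q)$ is a quadratic polynomial in the single variable $q$, so its Taylor series about the current value $q_{ab}^{(t)}$ stops after the second-order term and is exact:

```latex
F_{q_{ab}}(q) = F_{q_{ab}}\!\left(q_{ab}^{(t)}\right)
  + F'_{q_{ab}}\!\left(q_{ab}^{(t)}\right)\left(q - q_{ab}^{(t)}\right)
  + \tfrac{1}{2}\, F''_{q_{ab}}\!\left(q_{ab}^{(t)}\right)\left(q - q_{ab}^{(t)}\right)^{2}
```

The same form holds for $F_{w_{ab}}(w)$ with $w$ in place of $q$. The prime notation for derivatives is an assumption of this sketch.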
Algorithm steps: Using Eq. (8) and Eq. (12), we may get an algorithm to compute the basis matrix W and the linear transformation matrix Q, as follows:
Step 1: Initialize W, Q and X with non-negative data
Step 2: Update W by Eq. (8)
Step 3: Update Q by Eq. (12)
Step 4: Repeat Step 2 and Step 3 until the algorithm converges
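The four steps can be sketched as an alternating loop. The exact forms of Eq. (8) and Eq. (12) are not available in this text, so `update_W` and `update_Q` below are stand-ins: standard multiplicative gradient-ratio updates for the assumed objective $\frac{1}{2}\|X - WQX\|_F^2$, not the paper's formulae. All function names, the fixed iteration budget and the small constant `eps` are assumptions.

```python
import numpy as np

def objective(X, W, Q):
    # Assumed objective: 0.5 * ||X - W Q X||_F^2
    return 0.5 * np.linalg.norm(X - W @ (Q @ X), "fro") ** 2

def update_W(X, W, Q, eps=1e-12):
    # Stand-in for Eq. (8): multiplicative step for W with Q held fixed
    H = Q @ X
    return W * (X @ H.T) / (W @ (H @ H.T) + eps)

def update_Q(X, W, Q, eps=1e-12):
    # Stand-in for Eq. (12): multiplicative step for Q with W held fixed
    XXt = X @ X.T
    return Q * (W.T @ XXt) / (W.T @ W @ Q @ XXt + eps)

def lp_nmf_sketch(X, rank, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    W = rng.random((m, rank))      # Step 1: non-negative initialization
    Q = rng.random((rank, m))
    history = [objective(X, W, Q)]
    for _ in range(n_iter):        # Step 4: repeat (fixed budget here)
        W = update_W(X, W, Q)      # Step 2
        Q = update_Q(X, W, Q)      # Step 3
        history.append(objective(X, W, Q))
    return W, Q, history

X = np.random.default_rng(1).random((20, 30))   # synthetic non-negative data
W, Q, history = lp_nmf_sketch(X, rank=5)
```

Because both stand-in updates only multiply non-negative entries by non-negative ratios, W and Q stay non-negative throughout.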

EXPERIMENTS AND ANALYSIS
To verify the convergence of the algorithm and the sparseness of the basis matrix W, we carry out an experiment. In the experiment, X consists of the first five images of each person in the ORL facial image database, a total of 200 images. We randomly initialize W and Q with non-negative data, and set the rank of the basis matrix W to 80. To reduce the amount of computation and speed up the run, every image is reduced to half of the original.
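A possible way to prepare the data matrix is sketched below. It assumes that "reduced to half" means halving each side by averaging 2 x 2 pixel blocks, and that the ORL images have the commonly used 112 x 92 resolution. Random arrays stand in for the real images, and the function names are hypothetical.

```python
import numpy as np

def halve_image(img):
    """Downsample by averaging 2x2 blocks (one possible reading of 'reduced to half')."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2          # drop an odd last row or column
    blocks = img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(1, 3))

def build_data_matrix(images):
    """Stack the downsampled images as the columns of X."""
    return np.stack([halve_image(im).ravel() for im in images], axis=1)

# Synthetic stand-in for 200 training images of 112 x 92 pixels
rng = np.random.default_rng(0)
images = rng.random((200, 112, 92))
X = build_data_matrix(images)
```

Averaging keeps the entries non-negative, which the factorization requires.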

Algorithm convergence:
In the experiment, the curve of the objective function values versus iteration steps is shown in Fig. 1. We can see that the algorithm is convergent, but the convergence speed is relatively low; the reason is that the algorithm is still an alternating optimization method.
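A curve like the one in Fig. 1 can be monitored with two simple checks on the recorded objective values. This is a sketch under assumptions: the tolerance `tol`, the `slack` allowance for rounding and the function names are not from the paper.

```python
def relative_decrease(history):
    """Relative change of the objective between the last two iterations."""
    prev, curr = history[-2], history[-1]
    return abs(prev - curr) / max(abs(prev), 1e-12)

def has_converged(history, tol=1e-6):
    """Stop when the objective has almost stopped changing."""
    return len(history) >= 2 and relative_decrease(history) < tol

def is_non_increasing(history, slack=1e-10):
    """Check that the recorded objective values never go up (within rounding slack)."""
    return all(b <= a + slack for a, b in zip(history, history[1:]))

values = [100.0, 40.0, 25.0, 24.99999]   # made-up objective values
done = has_converged(values)
monotone = is_non_increasing(values)
```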

Analysis of the basis matrix:
Meanwhile, the basis matrix image is shown in Fig. 2. We respectively take the vectors $W^T x$, $(W^T W)^{-1} W^T x$ and $Qx$ as the feature vector of a data x, and the corresponding reconstructed images are respectively shown in images (b) to (d) of Fig. 3. From the basis matrix image, we can see that the basis matrix is very sparse. This shows that the basis matrix W is forced to tend to be orthogonal by optimizing the objective function F.
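The three feature vectors can be computed as sketched below. Reconstructing an image as W times its feature vector is an assumption of this sketch, as are the function names. The least-squares solver returns $(W^T W)^{-1} W^T x$ when W has full column rank.

```python
import numpy as np

def feature_vectors(W, Q, x):
    f_proj = W.T @ x                               # W^T x
    f_ls = np.linalg.lstsq(W, x, rcond=None)[0]    # (W^T W)^{-1} W^T x for full-rank W
    f_lin = Q @ x                                  # Q x
    return f_proj, f_ls, f_lin

def reconstruct(W, f):
    # Assumed reconstruction: map the feature vector back through the basis matrix
    return W @ f

# Small synthetic check: x lies exactly in the column space of W
rng = np.random.default_rng(0)
W = rng.random((8, 3))
Q = rng.random((3, 8))
c = rng.random(3)
x = W @ c
f_proj, f_ls, f_lin = feature_vectors(W, Q, x)
x_hat = reconstruct(W, f_ls)
```

In this check the least-squares feature recovers the coefficients c, so its reconstruction matches x up to rounding.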
From the reconstructed images, we can see that the three reconstructed images are all effective, which shows that the basis matrix W is effective; the reconstructed image of x is better when obtained by [...]. In addition, the orthogonality and the sparseness of the basis matrix may be computed quantitatively (Li and Zhang, 2010a; Yang et al., 2007). Because this method still optimizes the objective function in Eq. (1), the orthogonality and the sparseness of the basis matrix are still better than for NMF. We do not repeat those computations here.
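One way to quantify these two properties is sketched below. The measures are assumptions, not necessarily those used in the cited works: sparseness follows Hoyer's measure based on the ratio of the L1 and L2 norms, and orthogonality is taken as the Frobenius distance between the Gram matrix of the normalized columns and the identity. Columns are assumed to be non-zero.

```python
import numpy as np

def hoyer_sparseness(v):
    """Hoyer's measure: 1 for a vector with one non-zero entry, 0 for a constant vector."""
    n = v.size
    l1 = np.abs(v).sum()
    l2 = np.sqrt((v ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

def mean_column_sparseness(W):
    return float(np.mean([hoyer_sparseness(W[:, k]) for k in range(W.shape[1])]))

def orthogonality_deviation(W):
    """Distance of the normalized Gram matrix from the identity (0 means orthogonal columns)."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    gram = Wn.T @ Wn
    return float(np.linalg.norm(gram - np.eye(W.shape[1]), "fro"))

W_orth = np.eye(4)[:, :2]    # two orthogonal, maximally sparse columns
W_dense = np.ones((4, 2))    # two identical, dense columns
```

A sparser and more orthogonal basis gives a higher mean sparseness and a smaller deviation.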

RESULTS OF FACE RECOGNITION AND ANALYSIS
In the learning phase, X consists of the first five images of each person in the ORL facial image database, a total of 200 images. To reduce the amount of computation and speed up the run, each image is reduced to a quarter of the original. We randomly initialize W and Q with non-negative data. After the algorithm converges, we get the basis matrix W, the matrix Q, $(W^T W)^{-1} W^T X$, and $QX$.
In the test phase of pattern recognition, we take the last five images of each person in the ORL facial image database, a total of 200 images, as test data, and reduce every image to a quarter of the original.
We first decide the feature vector of data x from $(W^T W)^{-1} W^T x$ and $Qx$. We take $(W^T W)^{-1} W^T X$ and $QX$ respectively as the template library, and use the nearest neighbor rule for face recognition. When the rank of the basis matrix W is set to different values, the results of the face recognition are shown in Table 1. From Table 1, when the template library $(W^T W)^{-1} W^T X$ is used, the recognition accuracy is very low when the rank of the basis matrix is greater than or equal to 140. On the contrary, the recognition accuracy is higher when the template library $QX$ is used. However, when the rank of the basis matrix is smaller than 140, the recognition accuracy is slightly higher using $(W^T W)^{-1} W^T X$ than using $QX$. Therefore, the linear transformation matrix Q carries some important information.
So, in the next experiments, we take the matrix $QX$ as the template library, use $Qx$ as the feature vector of a test image x, with the matrix Q obtained in the learning phase, and use the nearest neighbor rule for face recognition. We compare this method with NMF, LNMF (Li et al., 2001), NMFOS (Li et al., 2010b), ONMF (Yoo and Choi, 2010), and DNMF (Buciu and Nafornita, 2009). When the rank (i.e., the feature subspace dimension) of the basis matrix is set to different values, the results of the face recognition are shown in Fig. 4.
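Recognition with the template library $QX$ can be sketched as a nearest neighbor search over its columns. The Euclidean distance, the function names and the synthetic data are assumptions; the text only specifies the nearest neighbor rule.

```python
import numpy as np

def nearest_neighbor_labels(templates, template_labels, features):
    """templates: r x n_train (columns of QX); features: r x n_test (columns Qx)."""
    # Squared Euclidean distance between every test column and every template column
    diff = features[:, :, None] - templates[:, None, :]   # r x n_test x n_train
    d2 = (diff ** 2).sum(axis=0)                           # n_test x n_train
    return template_labels[np.argmin(d2, axis=1)]

def recognition_accuracy(predicted, true):
    return float(np.mean(predicted == true))

# Synthetic stand-in: test samples are slightly perturbed training samples
rng = np.random.default_rng(0)
Q = rng.random((4, 10))
X_train = rng.random((10, 6))
train_labels = np.arange(6)
X_test = X_train + 1e-6 * rng.random((10, 6))

predicted = nearest_neighbor_labels(Q @ X_train, train_labels, Q @ X_test)
accuracy = recognition_accuracy(predicted, train_labels)
```

Replacing `Q @ X_train` and `Q @ X_test` by the least-squares features gives the comparison with the other template library.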
As can be seen from Fig. 4, the recognition accuracy is clearly higher using LP-NMF than using NMF or ONMF. The cause is that the basis matrix W is forced to tend to be orthogonal by the objective function for LP-NMF in Eq. (1), so that the basis matrix is more orthogonal in LP-NMF than in NMF. The discriminative power of the feature vector $Qx$ for LP-NMF is therefore better. Meanwhile, when the rank of the basis matrix is greater than or equal to 60, the recognition accuracy is slightly higher using LP-NMF than using LNMF or NMFOS. This is because the objective functions for LNMF and NMFOS also contain approximate orthogonality constraints on the basis matrices, so that the discriminative power of their feature vectors is also good, although that of the feature vector $Qx$ for LP-NMF is better. Finally, since class information is taken into account in DNMF, its recognition accuracy is also higher.
In addition, when the rank of the basis matrix of LP-NMF is between 40 and 160, the recognition accuracy becomes more stable. This is because the orthogonality and the sparseness of the basis matrix for LP-NMF are consistently better, so that the recognition accuracy is less affected by the rank of the basis matrix.


Fig. 1: Objective function values versus iteration steps when the basis matrix is initialized randomly with non-negative data

Fig. 4: Comparison of the results of face recognition in the ORL facial image database

Table 1.