A Globally Convergent MCA Algorithm by Generalized Eigen-Decomposition

Minor component analysis (MCA) is used in many applications such as curve and surface fitting, robust beamforming, and blind signal separation. Based on the generalized eigen-decomposition, we present a completely different approach that leads to a novel MCA algorithm. First, in the sense of generalized eigen-decomposition, we derive an algorithm for extracting the first minor eigenvector by gradient ascent. Then, an algorithm for extracting multiple minor eigenvectors is proposed by using the orthogonality property. Rigorous theoretical analysis proves that the proposed algorithm converges to the corresponding minor eigenvectors. We identify three important characteristics of the algorithm. The first is that the algorithm for extracting minor eigenvectors can easily be extended to generalized minor eigenvectors. The second is that the corresponding eigenvalues can be computed simultaneously as a byproduct. The third is that the algorithm is globally convergent. Simulations have been conducted to illustrate the efficiency and effectiveness of our algorithm.

1. Introduction

The learning algorithm of MCA neural networks is described by a stochastic discrete-time (SDT) system. Many authors have proven convergence by using a continuous ordinary differential equation (ODE) approximation [22]. Since the convergence of the continuous ODE does not imply the convergence of the corresponding discrete system when the learning rate is a constant [23, 24], the continuous ODE approximation cannot be used to prove convergence. On the other hand, because the behavior of the conditional expectation of the weight vector can be studied by a deterministic discrete-time (DDT) system, it is reasonable to study the SDT algorithm indirectly through its DDT system [24, 29]. The DDT system can at least partially illustrate the phenomena of the corresponding SDT system.
The study of the convergence of the DDT systems of PCA or MCA algorithms is by now very mature; see our recent works [25, 26, 27, 28, 29]. For constant learning rates, the convergence of MCA learning algorithms has been studied systematically via their DDT systems; see [7, 12, 23, 26, 29, 32, 33, 34, 35]. Based on careful analysis of the convergence properties of existing algorithms, many novel convergent algorithms have been proposed; see, for example, [25, 26, 27, 32, 33, 34, 35]. However, most of these algorithms are constructed by mathematical induction for convergence purposes and are not derived from meaningful optimization functions.
In this paper, a novel algorithm is proposed from the viewpoint of generalized eigen-decomposition (GED). The GED problem is to extract the generalized eigenvectors of a matrix pencil (A, B), defined as follows [36]:

$$A w = \lambda B w,$$

where A and B are n × n symmetric positive definite matrices in most signal processing applications. The positive scalar λ and the corresponding vector w are called a generalized eigenvalue and a generalized eigenvector of the matrix pencil (A, B), respectively. Without loss of generality, assume the matrix pencil has n positive generalized eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n > 0 with corresponding generalized eigenvectors φ_1, …, φ_n; then the following properties hold [36]:

$$\phi_i^T B \phi_j = \delta_{ij}, \qquad \phi_i^T A \phi_j = \lambda_i \delta_{ij},$$

where δ_ij is the Kronecker delta function. The case of an eigenvalue with multiplicity larger than one is more difficult, and our algorithm cannot guarantee convergence there. We refer to φ_1 and φ_n as the first principal generalized eigenvector and the first minor generalized eigenvector, respectively. In many practical applications, such as dimension reduction and signal processing, adaptively extracting the principal or minor generalized eigenvectors is critical [30, 31].
It can easily be shown that φ_n is the first principal generalized eigenvector of the matrix pencil (B, A). If the first principal generalized eigenvector w of (B, A) is computed, i.e., Bw = μAw with μ the largest generalized eigenvalue of (B, A), then the first minor generalized eigenvector of (A, B) is obtained accordingly, since

$$A w = \frac{1}{\mu}\, B w,$$

so w is the first minor generalized eigenvector of (A, B) with associated eigenvalue λ_n = 1/μ. Let B = I_n; then extracting the minor eigenvector of matrix A is equivalent to finding the minor generalized eigenvector of the matrix pencil (A, B). So we can extract the minor eigenvector of matrix A by finding the principal generalized eigenvector of the matrix pencil (B, A). Based on the above observations, we propose a new MCA algorithm from this completely different perspective. The major contribution of this paper is that, by using a generalized eigen-decomposition approach, the resulting MCA algorithm has three important characteristics. The first is that the algorithms for extracting minor eigenvectors and minor generalized eigenvectors are unified. Simulation results on extracting minor generalized eigenvectors are shown in [37]. The second is that the corresponding eigenvalues can be computed simultaneously as a byproduct of the algorithm. The third is that the algorithm is globally convergent. The rest of this paper is organized as follows. In Section 2, the algorithm for extracting minor eigenvectors is derived from the perspective of generalized eigen-decomposition. The convergence analysis is presented in Section 3. Section 4 provides simulations to confirm the efficiency and effectiveness of our algorithms. In Section 5, conclusions are presented.
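As a quick numerical check of this observation (our illustration, not part of the original paper), the following Python sketch verifies that the principal generalized eigenvector of the pencil (B, A) with B = I_n is, up to sign and scaling, the minor eigenvector of A, and that the associated eigenvalues are reciprocals; the random matrix A is an arbitrary test covariance assumed only for this check.

```python
import numpy as np
from scipy.linalg import eigh

# Build an arbitrary symmetric positive definite matrix A (assumed test data).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
B = np.eye(5)

# Generalized eigenproblem B w = mu A w; eigh(B, A) returns mu in ascending order.
mu, W = eigh(B, A)
w_principal = W[:, -1]          # eigenvector for the largest mu of pencil (B, A)

# Ordinary eigenproblem A v = lambda v; the minor eigenvector has the smallest lambda.
lam, V = eigh(A)
v_minor = V[:, 0]

# The two directions agree (up to sign), and lambda_n = 1/mu_max.
cos = abs(w_principal @ v_minor) / (np.linalg.norm(w_principal) * np.linalg.norm(v_minor))
print(cos)                      # approximately 1.0
print(lam[0], 1.0 / mu[-1])     # approximately equal
```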


2. The Algorithm
Consider a linear neuron with input x(k) ∈ R^n and output

$$y(k) = w^T(k)\, x(k),$$

where x(k) is a zero-mean discrete-time stochastic process. Such a process is constructed as a sequence x(0), x(1), … of independent and identically distributed samples from the distribution of a random variable. The weight vector w(k) determines the relationship between the input x(k) and the output y(k) at time k. The autocorrelation matrix of the input is A = E[x(k) x^T(k)].
Obviously the eigenvalues of this matrix are nonnegative. In our algorithm some eigenvalues may have multiplicity larger than one; however, for simplicity of the theoretical analysis, we assume

$$\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0.$$

Since A is a symmetric matrix, there exists an orthonormal basis of R^n composed of eigenvectors of A. Suppose that φ_1, …, φ_n are the orthonormal eigenvectors associated with λ_1, …, λ_n. As stated before, extracting the minor eigenvectors of matrix A is equivalent to finding the principal generalized eigenvectors of the matrix pencil (B, A) with B = I_n. Based on this observation, we can derive our new algorithm directly from an objective function. Assume w_1, …, w_{j−1} have already converged to the first j − 1 principal generalized eigenvectors of the matrix pencil (B, A). Taking into account the constraints w_i^T A w_j = δ_ij for i = 1, …, j − 1, the objective function for extracting the jth principal generalized eigenvector of the matrix pencil (B, A) is

$$J(w_j) = w_j^T B w_j + \mu \big(w_j^T A w_j - 1\big) + \sum_{i=1}^{j-1} \alpha_i\, w_i^T A w_j,$$

where μ and α_i are Lagrange multipliers. When J(w_j) reaches its maximum value, the maximizer is the jth principal generalized eigenvector of the matrix pencil (B, A). The optimal values of μ and α_i can be determined by multiplying the gradient of J(w_j) with respect to w_j by w_j^T or w_i^T from the left and equating the result to zero. Taking into account that w_i^T A w_j = δ_ij, the optimum values are μ = −w_j^T B w_j and α_i = −2 w_i^T B w_j. Substituting these optimum values into J, the gradient of J(w_j) with respect to w_j becomes

$$\nabla J(w_j) = 2\Big(B w_j - \big(w_j^T B w_j\big) A w_j - \sum_{i=1}^{j-1} \big(w_i^T B w_j\big) A w_i\Big).$$

Setting B = I_n, applying gradient ascent to each w_j, and absorbing the constant factor into the learning rate, we obtain the following algorithm in matrix form:

$$W(k+1) = W(k) + \eta\,\Big(W(k) - A\,W(k)\,\mathrm{UT}\big[W^T(k)\,W(k)\big]\Big), \qquad (7)$$

where η is the learning rate, W(k) = [w_1(k), …, w_m(k)], and UT[•] sets all elements below the diagonal of its matrix argument to zero, i.e., makes it upper triangular. Replacing A by its instantaneous estimate x(k) x^T(k) yields the online (stochastic) form

$$W(k+1) = W(k) + \eta\,\Big(W(k) - x(k)\,x^T(k)\,W(k)\,\mathrm{UT}\big[W^T(k)\,W(k)\big]\Big). \qquad (8)$$

Remark 1. When the above algorithm converges, the weight vector differs from the corresponding eigenvector by a scale factor equal to the inverse of the square root of the associated eigenvalue. After normalization, the weight vector is the desired minor eigenvector.
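The following Python sketch is our minimal transcription of the online update (8) as reconstructed above, run on synthetic zero-mean Gaussian data; the covariance spectrum, learning rate and iteration count are assumptions chosen only for illustration, and this is not the authors' original MATLAB code.

```python
import numpy as np

def ut(M):
    # UT[.]: set all elements below the diagonal of M to zero (upper triangular).
    return np.triu(M)

def gm_step(W, x, eta):
    # One step of the online update (8): W <- W + eta*(W - x x^T W UT[W^T W]).
    y = x @ W                                    # neuron outputs W^T x
    return W + eta * (W - np.outer(x, y @ ut(W.T @ W)))

# --- demo on synthetic zero-mean Gaussian data (assumed setup) ---
rng = np.random.default_rng(1)
n, m, eta = 6, 2, 0.01
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = np.array([0.2, 0.5, 0.9, 1.4, 2.0, 3.0])     # eigenvalues of the covariance
A = Q @ np.diag(d) @ Q.T
L = np.linalg.cholesky(A)

W = rng.standard_normal((n, m)) * 0.3
for _ in range(30000):
    x = L @ rng.standard_normal(n)               # E[x x^T] = A
    W = gm_step(W, x, eta)

# Columns of W approximate the minor eigenvectors of A, scaled by 1/sqrt(lambda).
lam, V = np.linalg.eigh(A)                       # ascending eigenvalues
for j in range(m):
    w = W[:, j] / np.linalg.norm(W[:, j])
    print(abs(w @ V[:, j]))                      # direction cosine, near 1
    print(1.0 / np.linalg.norm(W[:, j])**2, lam[j])  # eigenvalue estimate (Remark 3)
```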

3. Theoretical Analysis
Direct analysis of the stochastic discrete-time learning algorithm (8) is very difficult. Taking the conditional expectation E{w_j(k+1) | w_j(0), x(i), i < k} of this algorithm and identifying the conditional expected value as the next iterate, we obtain the corresponding DDT algorithm (7). Convergence analysis of this DDT algorithm can partially illustrate the convergence phenomena of its SDT algorithm.
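Written out, the conditional-expectation step is the routine calculation below (using E[x(k) x^T(k)] = A and the independence of x(k) from W(k)); it recovers exactly the DDT form (7):

```latex
\begin{aligned}
E\{W(k+1)\mid W(k)\}
  &= W(k) + \eta\,\Big(W(k) - E\big[x(k)\,x^T(k)\big]\,W(k)\,\mathrm{UT}\big[W^T(k)\,W(k)\big]\Big)\\
  &= W(k) + \eta\,\Big(W(k) - A\,W(k)\,\mathrm{UT}\big[W^T(k)\,W(k)\big]\Big).
\end{aligned}
```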
Here, we can only show the convergence of the algorithm for sequentially extracting minor eigenvectors.
Mathematical induction is used. First, for j = 1, we show the convergence of the following algorithm:

$$w_1(k+1) = w_1(k) + \eta\,\Big(w_1(k) - \big(w_1^T(k)\, w_1(k)\big) A\, w_1(k)\Big). \qquad (9)$$

To show that the weight vector w_1(k) converges to the minor eigenvector up to a scale factor, three steps are used. First, we show that the weight vector remains bounded during the iteration, i.e., our algorithm is feasible in practice. Then, we prove that the direction of the weight vector converges to that of the eigenvector associated with the smallest eigenvalue. Finally, using the results of the previous two steps, we show that the norm of the weight vector converges to a constant, which establishes convergence.
Since the vector set {φ_1, …, φ_n} forms an orthonormal basis of R^n, for each k ≥ 0, w_1(k) can be represented as

$$w_1(k) = \sum_{j=1}^{n-1} \varepsilon_j(k)\, \phi_j + z_n(k)\, \phi_n, \qquad (10)$$

where z_n(k) and the ε_j(k) are the coordinates of w_1(k) in this basis. Substituting (10) into (9), it follows that

$$z_n(k+1) = \Big(1 + \eta\,\big(1 - \lambda_n\, \|w_1(k)\|^2\big)\Big)\, z_n(k),$$

and, for 1 ≤ j ≤ n − 1,

$$\varepsilon_j(k+1) = \Big(1 + \eta\,\big(1 - \lambda_j\, \|w_1(k)\|^2\big)\Big)\, \varepsilon_j(k),$$

where k ≥ 0.

Lemma 1. Suppose that there exists a constant η_0 > 0 such that the learning rate satisfies 0 < η ≤ η_0, and that w_1(0) is not orthogonal to the eigen-subspace V_{λ_n}. Then there exists a time k_0 such that ‖w_1(k)‖ is bounded above and below by positive constants for all k ≥ k_0.
Proof: Given in the Appendix.
Remark 2. It is required that the initial value w_1(0) not be orthogonal to the eigen-subspace V_{λ_n}. If w_1(0) is orthogonal to V_{λ_n}, i.e., w_1(0) ∈ V_{λ_n}^⊥, where V_{λ_n}^⊥ is the orthogonal complement of V_{λ_n}, convergence is not guaranteed. However, in practical applications this unstable situation is not observed, since the dimension of V_{λ_n}^⊥ is less than that of R^n and any small disturbance makes w_1(0) non-orthogonal to V_{λ_n}.
Lemma 1 shows that ‖w_1(k)‖ is bounded below and above if the initial conditions are satisfied; thus, algorithm (9) is reasonable.

Lemma 2. Under the conditions of Lemma 1, the angle between w_1(k) of algorithm (9) and the eigenvector φ_n approaches zero as k → ∞; moreover, quantitative bounds on this angle hold for all k > 0.

Proof: Given in the Appendix.

Lemma 3. Assume the conditions of Lemma 1 hold and 2η(λ_1 + λ_n + √(λ_1 λ_n)) < 1. Then there exist constants k_1, Π_1, θ_1 and θ_2 such that, for all k ≥ k_1, the norm ‖w_1(k)‖^2 approaches its limit at an exponential rate governed by θ_1 and θ_2, where θ_1 = −ln θ and θ_2 = −ln δ for constants θ, δ ∈ (0, 1) determined by η, λ_1 and λ_n.

Proof: Given in the Appendix.
Theorem 1. If the conditions of Lemma 1 and Lemma 3 are satisfied, then w_1(k) of algorithm (9) converges to the eigenvector φ_n with a scale factor:

$$\lim_{k \to \infty} w_1(k) = \pm \frac{1}{\sqrt{\lambda_n}}\, \phi_n.$$

Proof: By Lemma 3, clearly, lim_{k→∞} ‖w_1(k)‖^2 = 1/λ_n (16). Then, using (16) and Lemma 2, ε_j(k) → 0 for j = 1, …, n − 1 and z_n(k) → ±1/√λ_n as k → +∞ (17). From (10) together with (16) and (17), the claimed limit follows. The proof is completed.
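As an illustration of Theorem 1 (our addition, not part of the original paper), the following Python sketch iterates the DDT system (9) directly on a small test matrix A and checks that w_1(k) approaches φ_n/√λ_n; the matrix and learning rate are assumptions chosen so that the conditions of Lemma 3 hold.

```python
import numpy as np

rng = np.random.default_rng(2)
n, eta = 5, 0.05
M = rng.standard_normal((n, n))
A = M @ M.T / n + 0.3 * np.eye(n)            # symmetric positive definite test matrix
lam, V = np.linalg.eigh(A)                   # ascending: lam[0] is the smallest

w = rng.standard_normal(n)                   # generic start, non-orthogonal to phi_n
w /= np.linalg.norm(w)
for _ in range(5000):
    w = w + eta * (w - (w @ w) * (A @ w))    # DDT system (9)

print(np.linalg.norm(w) ** 2, 1 / lam[0])    # squared norm -> 1/lambda_n
print(abs(w @ V[:, 0]) / np.linalg.norm(w))  # direction cosine with phi_n -> 1
```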
Up to now, convergence has been shown for the case j = 1. Assume that, for i ≤ j − 1, w_i has already converged to (1/√λ_{n−i+1}) φ_{n−i+1}; we prove the convergence of w_j in the remaining part of this section. The DDT version of the update for w_j(k) is

$$w_j(k+1) = w_j(k) + \eta\,\Big(w_j(k) - \big(w_j^T(k)\, w_j(k)\big) A\, w_j(k) - \sum_{i=1}^{j-1} \big(w_i^T\, w_j(k)\big) A\, w_i\Big).$$

Expressing w_j(k) in the eigenvector basis as in (10), we obtain a recursion for the coefficient along φ_{n−j+1} and, for the coefficients along φ_i with 1 ≤ i < n − j + 1, analogous recursions, valid for k ≥ 0. Under the same conditions as in Lemma 1, there then exists a time k_1 such that ‖w_j(k)‖ is bounded above and below by positive constants for all k ≥ k_1.

Proof: Given in the Appendix.

By the same method as in Lemma 2, we have the following lemma: the angle between w_j(k) and the eigenvector φ_{n−j+1} approaches zero as k → ∞; moreover, quantitative bounds hold for all k > 0.
Similarly, the following theorem holds.

Theorem 2. If the conditions of Lemma 1 and Lemma 3 are satisfied, then w_j(k) of the DDT algorithm converges to the eigenvector φ_{n−j+1} with a scale factor:

$$\lim_{k \to \infty} w_j(k) = \pm \frac{1}{\sqrt{\lambda_{n-j+1}}}\, \phi_{n-j+1}.$$

Proof: From (10) together with (23) and (24), the claimed limit follows. The proof is completed.
Remark 3. By Theorem 1 and Theorem 2, we know that our algorithm can not only extract the minor eigenvectors but also compute the corresponding eigenvalues, each of which equals 1/‖w‖^2, where w denotes a converged weight vector.

Remark 4. For choosing a practical learning rate η that guarantees convergence: because the exact values of λ_1 and λ_n are unknown, they can be bounded via Trace(A) ≥ λ_1, where Trace(A) can be estimated from the data as (1/N) ∑_{k=1}^{N} ‖x(k)‖^2, which can be computed incrementally. Based on our analysis, a varying learning rate η does not affect the convergence, although a theoretical analysis of the impact of a varying η is very difficult.
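A minimal sketch of the incremental trace estimate mentioned in Remark 4 follows; the class name and the safety factor c = 0.1 in the learning-rate rule are our assumptions for illustration, not prescriptions from the paper.

```python
import numpy as np

class TraceEstimator:
    """Running estimate of Trace(A) = E[||x||^2] from streaming samples."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, x):
        self.total += float(x @ x)           # accumulate ||x(k)||^2
        self.count += 1
        return self.total / self.count       # current estimate of Trace(A)

# Usage: since Trace(A) >= lambda_1, a conservative learning rate can be
# chosen as eta = c / trace_estimate for a small safety factor c (assumed).
est = TraceEstimator()
rng = np.random.default_rng(3)
for _ in range(1000):
    trace_hat = est.update(rng.standard_normal(5))
eta = 0.1 / trace_hat
print(trace_hat, eta)
```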

4. Simulations
In this section, we perform a few simulations of algorithm (8) in MATLAB to confirm the convergence results and illustrate its good performance. There are many MCA algorithms, and we cannot compare against every one of them. Because MÖLLER [15] is a successful improved MCA algorithm and OJA+ [16] is the original online MCA learning algorithm, we compare the performance of the novel algorithm only with that of MÖLLER and OJA+. For simplicity, the novel algorithm (8) is referred to as GM.
A sequence of 20-dimensional signals is generated whose covariance matrix has the two smallest eigenvalues 0.3445 and 0.3838 and the largest eigenvalue 1.9711. We generate the input sequence x(k), 1 ≤ k ≤ 20000, by repeatedly adding zero-mean Gaussian noise with variance 0.1 to the previous signal sequence. If the number of samples is large enough, it is easy to show that the covariance matrix A = E[x(k) x^T(k)] agrees with the signal covariance matrix. The following evaluation function, the direction cosine, is used to measure the accuracy of the direction of the estimated eigenvectors:

$$\mathrm{DC}(k) = \frac{\big|w^T(k)\, \phi\big|}{\|w(k)\|\, \|\phi\|},$$

where w(k) is the weight vector at time k for any one of the aforementioned algorithms, and φ is the actual corresponding eigenvector of matrix A.
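A direct transcription of this evaluation function (the function name is ours):

```python
import numpy as np

def direction_cosine(w, phi):
    # DC(k) = |w^T phi| / (||w|| * ||phi||); equals 1 when the directions coincide.
    return abs(w @ phi) / (np.linalg.norm(w) * np.linalg.norm(phi))
```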
In the first set of experiments, we compute the first minor eigenvector using the algorithms GM, OJA+ and MÖLLER. The learning rate is η = 0.01 for all algorithms, and the same initial weight vector is generated randomly with unit norm. The curves of the evaluation function, averaged over 50 independent runs, are drawn in Fig. 1. From Fig. 1, we observe that the GM algorithm is convergent, which confirms the convergence theory. The convergence rate of GM is almost the same as that of OJA+, and both algorithms are faster than MÖLLER. Because the OJA+ algorithm has the limitation that the minor eigenvalue must be less than 1, our algorithm GM is more suitable in practical applications.
In the second set of experiments, we show the impact of different choices of learning rate. We use the same input sequence as in the first set of experiments. We run algorithm GM to extract the first minor eigenvector with learning rates η = 0.02, 0.01, 0.005 and the same initial weight vector. The curves of the evaluation function are drawn in Fig. 2, which shows that the larger the learning rate, the more the algorithm oscillates.

In the third set of experiments, the ability of algorithm GM to extract multiple minor eigenvectors is shown. The same input sequence as in the previous experiments is used. The first three minor eigenvectors are computed using algorithm GM with learning rate η = 0.01. The curves of the evaluation function are drawn in Fig. 3, which shows the efficiency of algorithm GM in extracting multiple minor eigenvectors.
Please note that the algorithms OJA+ and MÖLLER cannot extract multiple minor eigenvectors. Further experiments were performed to confirm the convergence of our algorithms with different standard deviations of the input data and with higher-dimensional data; due to space limitations, they are not shown here.

5. Conclusions
A completely new approach that leads to a novel algorithm for extracting minor eigenvectors online has been discussed. First, the algorithm was proposed from the perspective of generalized eigen-decomposition. Theoretical analysis shows that this algorithm is globally convergent to the minor eigenvectors. Simulations have been conducted to illustrate the convergence and efficiency of our algorithms. From both the theoretical analysis and the simulations, it can be concluded that our algorithm is a good choice in practical applications.


Fig. 1. Performance of extracting the first minor eigenvector using algorithms GM, OJA+ and MÖLLER, respectively.

Fig. 3. Performance of algorithm GM for extracting the first three minor eigenvectors.