A MULTIGRID METHOD FOR THE MAXIMAL CORRELATION PROBLEM

Abstract. The maximal correlation problem (MCP), which aims at optimizing correlations between sets of variables, plays an important role in several areas of statistical application. Several algorithms have been proposed for MCP. However, none of these algorithms is guaranteed to find a global optimal solution of MCP. Employing an idea from the multigrid method for PDEs, a new method for MCP is proposed in the present paper. Numerical tests are carried out and suggest superior performance of the new method over the others in finding a global optimal solution of MCP.

1. Introduction. The maximal correlation problem (MCP), as a generalization of canonical correlation analysis [9], aims at assessing the relationship between sets of random variables. Suppose A ∈ R^{n×n} is a symmetric and positive definite matrix, and P = {n_1, n_2, ..., n_m} is a set of positive integers with Σ_{i=1}^m n_i = n. Let

Σ_m = { x = (x_1^T, ..., x_m^T)^T ∈ R^n : x_i ∈ R^{n_i}, ‖x_i‖_2 = 1, for i = 1, 2, ..., m }

and ρ(x) = x^T A x. The MCP is formulated as the following optimization problem:

(1.1)    max_{x ∈ Σ_m} ρ(x).

Upon employing the Lagrange multiplier theory, it is easy to see that the first order necessary optimality condition for (1.1) is the existence of real scalars λ_1, ..., λ_m and a vector x ∈ Σ_m that solve the system of equations

(1.2)    A x = Λ x,  x ∈ Σ_m,

where Λ = diag{λ_1 I_{n_1}, ..., λ_m I_{n_m}}. This is the multivariate eigenvalue problem (MEP), and a pair (Λ, x) satisfying (1.2) is called a solution to the MEP. A brief but comprehensive discussion of the statistical background of MCP can be found in Sun [11] and in Chu and Watterson [1]. More applications of the MEP can be found in [10, 13]. Generalizations of MCP can be found in [12]. An important result proved in [1] states that whenever the matrix A has n distinct eigenvalues, there are precisely Π_{i=1}^m (2 n_i) solutions to the MEP (1.2).
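To make the definitions above concrete, the following Python sketch evaluates ρ(x) and the residual of the first order condition (1.2); the function names and the recovery of each multiplier as λ_i = x_i^T (Ax)_i are our own illustrative choices, not code from any of the cited works.

```python
import numpy as np

def rho(A, x):
    """Objective rho(x) = x^T A x of the MCP (1.1)."""
    return float(x @ A @ x)

def mep_residual(A, x, blocks):
    """Residual ||A x - Lambda x||_2 of the MEP (1.2), where each
    multiplier lambda_i = x_i^T (A x)_i is recovered from x itself."""
    Ax = A @ x
    r = Ax.copy()
    start = 0
    for ni in blocks:
        xi, Axi = x[start:start+ni], Ax[start:start+ni]
        lam = float(xi @ Axi)              # Lagrange multiplier of block i
        r[start:start+ni] = Axi - lam * xi
        start += ni
    return float(np.linalg.norm(r))
```

For the special case m = 1 (blocks = [n]), a unit eigenvector of A gives a zero residual, recovering the classical symmetric eigenvalue problem.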
Obviously, the special case of the MEP with m = 1 is the classical symmetric eigenvalue problem. The Horst method [7] is a generalization of the classical power iteration. Its convergence was established later by Hu [8], and independently by Chu and Watterson [1]. A Gauss-Seidel-type iteration was proposed in [1] and analyzed later in [17]. As another generalization of the Horst method, the P-SOR iteration was proposed and analyzed by Sun [11]. These algorithms for the general MCP stop at solutions of the associated MEP for a related matrix A. Recently, Grzegórski [4], and independently Zhang and Liao [18], proposed an alternating projection method (APM) (called the alternating variable method (AVM) in [18]). They proved that (i) the algorithm converges globally and monotonically to a solution of the MEP under some assumptions, and (ii) any accumulation point satisfies a global optimality condition of MCP. However, simple examples show that even the APM cannot be guaranteed to converge to a global optimal solution of MCP (see Section 3 below).
In the search for a global optimal solution of MCP, several achievements have been made by Hanafi and Ten Berge [6], Zhang and Chu [17], and Zhang, Liao and Sun [19]. For the general case, however, solving (1.1) globally still remains very challenging. One idea, suggested by Xu [14] and independently by Zhang and Chu [17], is to design a starting point strategy for the existing algorithms. Even though numerical experiments have demonstrated that such strategies substantially boost the probability of finding a global optimal solution of (1.1), we are still not satisfied with the current algorithms.
Employing an idea from the multigrid method for PDEs, in this paper we propose a multigrid-type method for MCP. The rest of the paper is organized as follows. In the next section, we prove several properties of MCP. In Section 3, simple examples are presented to demonstrate that no algorithm so far can guarantee convergence to a global optimal solution of MCP. We propose a multigrid algorithm in Section 4. Finally, in Section 5, numerical tests are carried out; they suggest superior performance of the new algorithm over the others in finding a global optimal solution of MCP.
2. Properties of MCP. Let A ∈ R^{n×n} be symmetric and positive definite. If P = {n_1, n_2, ..., n_m} is a set of positive integers with Σ_{i=1}^m n_i = n, then we call P a partition of n. Let P = {n_1, n_2, ..., n_m} be a partition of n, and let P̃ = {n_1, ..., n_{k−1}, ñ_k, n_{k+2}, ..., n_m} with ñ_k = n_k + n_{k+1}; that is, P̃ is obtained from P by merging the kth and (k+1)st blocks. Then we call P a finer partition than P̃.
Proof. The second inequality in (2.1) follows from a result proved by Sun [11]. The first inequality follows by a direct comparison. This completes the proof.
The following corollary presents bounds on the optimal value of MCP.
Corollary 2.2. Let A ∈ R^{n×n} be symmetric and positive definite, and let P = {n_1, ..., n_m} be a partition of n. Then
(2.2)    λ_max(A) ≤ max_{x ∈ Σ_m} ρ(x) ≤ m λ_max(A),
where λ_max(A) stands for the largest eigenvalue of A.
Proof. The second inequality follows since ρ(x) = x^T A x ≤ λ_max(A) ‖x‖_2^2 = m λ_max(A) for every x ∈ Σ_m. The first inequality in fact can be reformulated as λ_max(A) = max_{‖y‖_2 = 1} ρ(y) ≤ max_{x ∈ Σ_m} ρ(x), which follows from Theorem 2.1. This completes the proof.
The next theorem presents other bounds for the MCP value.
Theorem 2.3. Let P = {n_1, n_2, ..., n_m} be a partition of n, and partition A into blocks A_ij ∈ R^{n_i×n_j}, i, j = 1, ..., m, according to P. Then
(2.3)    Σ_{i=1}^m λ_max(A_ii) ≤ max_{x ∈ Σ_m} ρ(x) ≤ ( Σ_{i=1}^m λ_max(A_ii)^{1/2} )^2.
Proof. The first inequality is a consequence of [17, Theorem 3.2]. In order to prove the second inequality in (2.3), consider the Cholesky factorization A = L^T L, with L = [L_1, ..., L_m] partitioned by columns according to P. Then for x ∈ Σ_m,
ρ(x) = ‖Lx‖_2^2 = ‖Σ_{i=1}^m L_i x_i‖_2^2 ≤ ( Σ_{i=1}^m ‖L_i‖_2 )^2.
This inequality and the fact A_ii = L_i^T L_i imply that the second inequality in (2.3) holds. This completes the proof.
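The ordering of the bound expressions in Theorem 2.3 can be checked numerically. The following Python sketch (all names are ours) computes the two block bounds for a random symmetric positive definite A; it verifies only the mutual ordering of the bounds and the inequality λ_max(A) ≤ Σ_i λ_max(A_ii) of Theorem 2.4, not the (unknown) optimal value of the MCP itself.

```python
import numpy as np

def block_bounds(A, blocks):
    """Lower/upper bounds of (2.3): the sum of the largest eigenvalues
    of the diagonal blocks, and the squared sum of their square roots."""
    tops = []
    start = 0
    for ni in blocks:
        Aii = A[start:start+ni, start:start+ni]
        tops.append(np.linalg.eigvalsh(Aii)[-1])   # largest eigenvalue of A_ii
        start += ni
    lower = sum(tops)
    upper = sum(np.sqrt(t) for t in tops) ** 2
    return lower, upper

rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 6))
A = Y.T @ Y                                        # symmetric positive definite
lam_max = np.linalg.eigvalsh(A)[-1]
lower, upper = block_bounds(A, [2, 2, 2])
print(f"lambda_max(A) = {lam_max:.2f} <= {lower:.2f} <= optimum <= {upper:.2f}")
```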
The following theorem presents a comparison of the lower bounds in (2.2) and (2.3).
Theorem 2.4. Under the assumptions of Theorem 2.3, λ_max(A) ≤ Σ_{i=1}^m λ_max(A_ii); hence the lower bound in (2.3) is sharper than the one in (2.2).
Proof. Let A = L^T L with L = [L_1, ..., L_m] as in the proof of Theorem 2.3. Obviously, for any unit vector y = (y_1^T, ..., y_m^T)^T ∈ R^n, one has
y^T A y = ‖Σ_{i=1}^m L_i y_i‖_2^2 ≤ ( Σ_{i=1}^m ‖L_i‖_2 ‖y_i‖_2 )^2 ≤ ( Σ_{i=1}^m ‖L_i‖_2^2 )( Σ_{i=1}^m ‖y_i‖_2^2 ) = Σ_{i=1}^m λ_max(A_ii),
by the Cauchy-Schwarz inequality. As a consequence, λ_max(A) = max_{‖y‖_2 = 1} y^T A y ≤ Σ_{i=1}^m λ_max(A_ii). This completes the proof.
The next corollary presents a comparison of the upper bounds in (2.2) and (2.3) under some assumptions.
Corollary 2.5. Suppose that A_ii = I_{n_i} for i = 1, ..., m. Then
(2.5)    m λ_max(A) ≤ ( Σ_{i=1}^m λ_max(A_ii)^{1/2} )^2 = m^2.
Corollary 2.5 demonstrates that under the assumption A_ii = I_{n_i} for i = 1, ..., m, the upper bound in (2.2) is sharper than the one in (2.3). On the other hand, it is worth pointing out that for a general A, m λ_max(A) may be larger than the upper bound in (2.3), as the following example demonstrates.
The next example presents further information on the bounds in (2.2) and (2.3).
One sees that as a → 1 the bounds in (2.2) approach the ones in (2.3). Furthermore, if we choose m = n, then λ_max(A) → m as a → 1, and hence the bound in Corollary 2.5 is approximately sharp.
3. Global solution and the existing algorithms. As mentioned in Section 1, several effective algorithms for solving the MEP have been proposed in the literature. We now briefly describe these algorithms. Consider the following splitting of A:
A = D + L + L^T,
where D = diag(A_11, ..., A_mm) and L is a strictly block lower triangular matrix.
Horst algorithm: given x^(k) ∈ Σ_m, compute y = A x^(k) and set x_i^(k+1) = y_i/‖y_i‖_2 for i = 1, ..., m. For details see Horst [7], Hu [8], and Chu and Watterson [1]. This algorithm defines a map on Σ_m, denoted by Φ_H, so that x^(k+1) = Φ_H(x^(k)).
Gauss-Seidel algorithm: the same block update is performed with the most recent iterates, i.e., y_i = Σ_{j<i} A_ij x_j^(k+1) + Σ_{j≥i} A_ij x_j^(k) and x_i^(k+1) = y_i/‖y_i‖_2. See Chu and Watterson [1], and Zhang and Chu [17]. This algorithm can be written as x^(k+1) = Φ_GS(x^(k)).
P-SOR algorithm: a relaxed version of the above iterations, where M^(k) and Ω^(k) are diagonal matrices. See Sun [11] for details. This algorithm can be expressed as x^(k+1) = Φ_PSOR(x^(k)).
APM (or AVM [18]) algorithm: each step updates one block of variables so as to maximize ρ with the remaining blocks fixed. This algorithm can be expressed as x^(k+1) = Φ_APM(x^(k)). It is not difficult to verify the following results.
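As an illustration, the following Python sketch implements the block power update described above for the Horst algorithm (the function name and interface are ours). For blocks = [n] it reduces to the classical power iteration.

```python
import numpy as np

def horst(A, blocks, x0, iters=500):
    """Horst-type iteration: apply A to the full vector, then
    renormalize every block back onto the unit sphere."""
    x = x0.copy()
    for _ in range(iters):
        y = A @ x
        start = 0
        for ni in blocks:
            yi = y[start:start+ni]
            x[start:start+ni] = yi / np.linalg.norm(yi)
            start += ni
    return x
```

Starting from a generic point, the iterate converges to a solution of the MEP (1.2); as discussed below, this solution need not be a global maximizer of the MCP.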
Proposition 3.1. Let (Λ*, x*) be a solution of the MEP (1.2). Then x* is a fixed point of each of the maps above, i.e., Φ_H(x*) = Φ_GS(x*) = Φ_PSOR(x*) = Φ_APM(x*) = x*.
One consequence of Proposition 3.1 is that if (Λ*, x*) is a solution of the MEP (1.2) but x* is not a global maximizer of the MCP (1.1), and x^(0) = x* is taken as the starting point, then the sequence {x^(k)} generated by any of the algorithms mentioned above does not converge to a global maximizer of the MCP (1.1). Therefore, these algorithms cannot be guaranteed to find a global maximizer of MCP.
In fact, Chu and Watterson [1] observed that the solution computed by the Horst algorithm depends heavily on the selection of the starting point. A similar observation was made by Xu [14] for the P-SOR algorithm.
The following examples demonstrate that the computed results of the APM also depend on the choice of starting point.

4. A multigrid method for MCP. Taking advantage of the bounds established in Theorem 2.1 and employing an idea from the multigrid methods for PDEs [5], we propose in this section a multigrid-type method for MCP.
Several remarks are in order. Firstly, any one of the existing algorithms mentioned previously, i.e., the Horst algorithm, the Gauss-Seidel algorithm, the P-SOR algorithm, and the APM, can be chosen as the iteration algorithm Φ. Both the Gauss-Seidel algorithm [1] and the P-SOR method [11] were proposed as natural improvements on the Horst method for the MEP (1.2). The Gauss-Seidel method is recovered from the P-SOR method by setting all the relaxation parameters ω_i^(k) = 1. An open question associated with the P-SOR method is how to determine the optimal relaxation parameters (if they exist). Numerical tests reported in [14] indicate that for some choices of ω_i^(k), the P-SOR converges more slowly than the Gauss-Seidel algorithm.
For linear systems of equations, an interesting observation is that the convergence rate of the symmetric SOR (SSOR) iteration is less sensitive to the selection of the relaxation parameter than that of the SOR iteration; see, e.g., [15, p. 118]. Therefore, we suggest taking the APM and the symmetric Gauss-Seidel iteration as candidates for the iteration Φ in Algorithm 1. The framework of the APM has been outlined in [4] and [18]. The symmetric Gauss-Seidel method (S-G-S) is summarized in Algorithm 2.

Algorithm 1 A multigrid method for MCP
Given a sequence of successively finer partitions P^(1), ..., P^(L) of n with P^(L) = P, let Σ^(i) denote the constraint set determined by P^(i).
(1) Compute the solution of the following optimization problem:
(4.1)    max_{x ∈ Σ^(1)} ρ(x).
(2) For i = 2, 3, ..., L
    (2.1) Solve
    (4.2)    min_{y ∈ Σ^(i)} ‖y − x^(i−1)‖_2,
    where x^(i−1) is the solution of the problem max_{x ∈ Σ^(i−1)} ρ(x) obtained at the previous level.
    (2.2) Let y^(i) be the solution of (4.2). Take y^(i) as the starting point to solve the following problem by some iteration method Φ:
    (4.3)    max_{x ∈ Σ^(i)} ρ(x).
end
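The two-level case of this scheme can be sketched in a few lines of Python. In the sketch below (all names are ours), the coarse level is the trivial partition, so step (1) is the classical eigenproblem; the projection step (2.1) normalizes each block; and Φ is taken to be a block power sweep, one simple choice among the iterations discussed above.

```python
import numpy as np

def project_to_sigma(v, blocks):
    """Nearest point of Sigma_m to v: normalize each block of v
    (assuming no block of v vanishes)."""
    x = v.copy()
    start = 0
    for ni in blocks:
        x[start:start+ni] /= np.linalg.norm(x[start:start+ni])
        start += ni
    return x

def two_level_mcp(A, blocks, iters=500):
    """Two-level (L = 2) sketch of Algorithm 1."""
    # Step (1): coarse problem on the unit sphere = classical eigenproblem.
    v = np.linalg.eigh(A)[1][:, -1]
    # Step (2.1): project the coarse solution onto Sigma_m.
    x = project_to_sigma(v, blocks)
    # Step (2.2): refine with block power sweeps as the iteration Phi.
    for _ in range(iters):
        x = project_to_sigma(A @ x, blocks)
    return x
```

The design point is that the coarse solution supplies a starting point already associated with the largest eigenvalue of A, in the spirit of the bounds of Section 2.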

Algorithm 2 The framework of the symmetric Gauss-Seidel method (S-G-S)
Select x^(0) ∈ Σ_m
for k = 0, 1, ... do
    set x ← x^(k)
    for i = 1, 2, ..., m do
        y_i = (A x)_i;  x_i ← y_i/‖y_i‖_2
    end for
    for i = m, m−1, ..., 1 do
        y_i = (A x)_i;  x_i ← y_i/‖y_i‖_2
    end for
    x^(k+1) ← x
end for

Secondly, the case L = 2 is of special interest. For such a case, Algorithm 1 can be regarded as an algorithm Φ for MCP with a specially selected starting point x^(0). Similar ideas have appeared in the literature. For example, Zhang and Chu [17] propose the following starting point strategy for the Gauss-Seidel algorithm.
Strategy 1. Set v = (v_1^T, ..., v_m^T)^T as the starting point, where v_i is a unit eigenvector associated with the largest eigenvalue of A_ii.
Theorem 2.3 suggests that this is a good starting point selection. Numerical experiments in [17] indicate that this strategy boosts the probability of finding a global maximizer of (1.1). However, in some cases, e.g., A_ii = I_{n_i} for i = 1, 2, ..., m, this strategy is no longer effective. Theorem 2.1 and Corollary 2.5 suggest the following starting point strategy.
Strategy 2. (1) Compute a unit eigenvector v ∈ R^n associated with the largest eigenvalue of A. (2) Compute the projection x^(0) of v onto Σ_m, and take x^(0) as the starting point for an existing iteration algorithm (for example, the symmetric Gauss-Seidel algorithm) for MCP.
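A single S-G-S sweep, as summarized in Algorithm 2, can be sketched as follows in Python (the function name and block bookkeeping are ours; `blocks` is a list of block sizes):

```python
import numpy as np

def sgs_sweep(A, blocks, x):
    """One symmetric Gauss-Seidel sweep: a forward pass over the blocks
    using the freshest iterates, followed by a backward pass."""
    x = x.copy()
    offsets = np.cumsum([0] + blocks)
    order = list(range(len(blocks)))
    for i in order + order[::-1]:        # forward, then backward sweep
        s, e = offsets[i], offsets[i + 1]
        yi = A[s:e] @ x                  # (A x)_i with the current x
        x[s:e] = yi / np.linalg.norm(yi)
    return x
```

Each block update keeps the iterate on Σ_m, and repeated sweeps drive the iterate toward a solution of the MEP (1.2).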
Finally, we point out that the following optimization problem can be regarded as a relaxation of the MCP (1.1):
max { x^T A x : ‖x‖_2^2 = m },
whose optimal value is m λ_max(A), attained at multiples of unit eigenvectors associated with λ_max(A). More information on relaxation techniques for optimization problems can be found, for example, in [2, 16, 20].
5. Numerical experiments. The selection of {ω_i^(k)} for the P-SOR algorithm is the same as in Sun [11, eqn. (4.1)]. We stopped the iteration when ‖A x^(k) − Λ^(k) x^(k)‖_2 ≤ ε is met for a given tolerance ε.
For comparison, we first carried out numerical tests on problems whose global maximizers are known in advance. Three small examples are available in the literature whose global maximizers have been obtained through an exhaustive search. For reference, we list these examples below. We ran five algorithms on these examples: the APM, the Horst algorithm, the Gauss-Seidel (G-S) algorithm, the symmetric Gauss-Seidel (S-G-S) algorithm, and the P-SOR algorithm.
We ran all algorithms from 10^4 random initial points, with tolerance ε = 10^{-6}. The numerical results are displayed in Table 1. The columns "Avg.Iter.#" give the average number of iterations needed to meet the control precision. The columns "% to global" give the sample probabilities, out of the 10^4 random tests, of convergence to a global maximizer of MCP. All five algorithms succeed in finding a solution of the MEP for all 10^4 random starting points. The results in Table 1 indicate that, among the five methods, the APM requires the fewest iterations and has the largest probability of reaching a global maximizer of MCP. The results for Example 5.1 and Example 5.2 show that with randomly selected starting points, although the APM reaches a global maximizer with larger probability than the others, it cannot be guaranteed to always converge to a global optimal solution. At the same time, our numerical experiments with Algorithm 1 obtained a global maximizer of MCP for each of these examples.
We ran four algorithms with various starting point strategies, and the numerical results are displayed in Table 2. The columns "Iter" give the number of iterations needed to meet the control precision. The columns "Y/N" (Yes/No) indicate whether the algorithm converged to a global optimal solution of MCP, and "MG" stands for the multigrid method.
The results in Table 2 clearly demonstrate the power of starting point strategies in boosting the probability of finding a global maximizer of MCP. What interests us most here is that the multigrid method combined with any one of the four known algorithms always converged to a global optimal solution of MCP. We carried out many further experiments and found that the multigrid method always reached a global maximizer of MCP.
Example 5.4. This example is created as follows. First, an N-by-n (N > n; we let n = 50 and 100, respectively) matrix Y with rank(Y) = n is generated randomly with MATLAB 7.1. Then we let A = Y^T Y, so that A is an n-by-n symmetric and positive definite matrix. Finally, we partition A into blocks according to various choices of m and P, which are left to our decision.
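The construction of Example 5.4 can be reproduced as follows; we sketch it here in Python/NumPy rather than MATLAB, and the function name is our own.

```python
import numpy as np

def make_mcp_instance(N, n, blocks, seed=0):
    """Generate a test matrix as in Example 5.4: A = Y^T Y with a random
    N-by-n matrix Y (rank n with probability one), so that A is an
    n-by-n symmetric positive definite matrix for the partition P."""
    assert N > n and sum(blocks) == n
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((N, n))
    return Y.T @ Y

A = make_mcp_instance(80, 50, [10] * 5)
```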
Numerical results are recorded in Table 3. The column "ρ" gives the computed maxima of MCP. These experimental results seem to suggest that (i) the P-SOR method shows no advantage on larger-scale problems, and (ii) ρ increases as m increases.