Joint CLT for several random sesquilinear forms with applications to large-dimensional spiked population models

In this paper, we derive a joint central limit theorem for random vectors whose components are functions of random sesquilinear forms. This result is a natural extension of the existing central limit theory for random quadratic forms. We also provide applications in random matrix theory related to large-dimensional spiked population models. In the first application, we find the joint distribution of grouped extreme sample eigenvalues corresponding to the spikes. In the second application, under the assumption that the population covariance matrix is diagonal with $k$ (fixed) simple spikes, we derive the asymptotic joint distribution of an extreme sample eigenvalue and its corresponding sample eigenvector projection.


Introduction
The aim of this paper is to derive a joint central limit theorem for a new type of random vector whose components are built from several groups of random sesquilinear forms. To be more specific, we consider a sequence $(x_i, y_i)_{i\in\mathbb{N}}$ of i.i.d. complex-valued, zero-mean random vectors belonging to $\mathbb{C}^K \times \mathbb{C}^K$ ($K$ fixed) with finite fourth-order moments. For a positive integer $n \ge 1$, write
$$x_i = (x_{1i}, \cdots, x_{Ki})^T, \qquad X(l) = (x_{l1}, \cdots, x_{ln})^T \quad (1 \le l \le K), \tag{1.1}$$
with a similar definition for the vectors $\{y_i\}$ and $\{Y(l)\}_{1\le l\le K}$. The covariance between $x_{l1}$ and $y_{l1}$ is denoted by $\rho(l) = E[x_{l1} y_{l1}]$, $1 \le l \le K$. Let $A_n = [a_{ij}(n)]_n$ and $B_n = [b_{ij}(n)]_n$ be two sequences of $n\times n$ Hermitian matrices, and define
$$U(l) := \frac{1}{\sqrt n}\left(X(l)^* A_n Y(l) - \rho(l)\,\mathrm{tr}\,A_n\right), \qquad V(l) := \frac{1}{\sqrt n}\left(X(l)^* B_n Y(l) - \rho(l)\,\mathrm{tr}\,B_n\right). \tag{1.2}$$
If we use only one sequence of Hermitian matrices, say $\{A_n\}$, and consider a single form ($K = 1$), the problem reduces to the central limit theorem for a simple random sesquilinear form. If we further impose $Y \equiv X$, we obtain a classical random quadratic form $U^*(1) := \frac{1}{\sqrt n}\left(X(1)^* A_n X(1) - \rho(1)\,\mathrm{tr}\,A_n\right)$ with independent random variables.
There is an extensive literature on the asymptotic distribution of the quadratic form $U^*(1)$. The pioneering work in this area dates back to Sevastyanov (1961), who dealt principally with the case where the variables in $X$ have a normal distribution. This CLT was extended to arbitrary i.i.d. components of $X$ by Whittle (1964), under additional conditions on the matrix $A$; in particular, $A$ has a zero diagonal (i.e., the quadratic form $\tilde U(1) := \frac{1}{\sqrt n} X(1)^* A_n X(1)$). Later extensions deal with other types of limit theorems (functional CLT, law of the iterated logarithm) or with dependent random variables in $X$; see Rotar' (1973), De Jong (1987), Fox and Taqqu (1987), Mikosch (1991) and Jakubowski and Memin (1994) for reference.
In a different area, Pan et al (2008) and Hachem et al (2013) established the asymptotic behavior of quadratic and bilinear forms, where $A = S_n$ is the sample covariance matrix and $A = (M_n - zI)^{-1}$ is the resolvent of some large-dimensional random matrix $M_n$, respectively. Such CLTs have applications in wireless communications and electrical engineering.
In the paper of Bai and Yao (2008), the authors derived the central limit theorem for $U(l)$ in (1.2) (i.e., with one group of sesquilinear forms) in their Appendix, as a tool for establishing the central limit theory for the extreme sample eigenvalues when the population has a spiked covariance structure.
In this paper, we follow the lines and strategy put forward in Bai and Yao (2008) and extend this CLT to an arbitrary number of groups of random sesquilinear forms; the result is presented in Section 2. This extension has been motivated by applications in random matrix theory related to the spiked population model. When the population has a spiked covariance structure, we establish the asymptotic joint distribution of any two groups of extreme sample eigenvalues corresponding to the spikes. Moreover, when the population has only one simple spike, we use our main result to find the joint distribution of the largest sample eigenvalue and its corresponding sample eigenvector projection. All these applications are developed in Section 3. Section 4 contains the proofs of the theorems in Section 3, and the last section collects some additional technical lemmas.

And let

then the $2K$-dimensional complex-valued random vector $(U(1), \cdots, U(K), V(1), \cdots, V(K))^T$ converges weakly to a zero-mean complex-valued vector $W$ whose real and imaginary parts are Gaussian. Moreover, the Laplace transform of $W$ is given by

where $B$ can be expressed in block form, each block being a $K\times K$ matrix ($l, l' = 1, \cdots, K$) with entries:

Proof (of Theorem 2.1). It is sufficient to establish the CLT for linear combinations of the random Hermitian sesquilinear forms:

where the coefficients $(c_l), (d_l) \in \mathbb{C}^K \times \mathbb{C}^K$ are arbitrary. Also, it holds that

We use the moment method as in Bai and Yao (2008). Define

then we can expand $\eta_n$ as follows:

where $e$ is an edge associated with the vertices $u$ and $v$, i.e. $e = (u,v) \in \{1,\cdots,n\}^2$. To each sum in equation (2.3) we associate a directed graph $G$ by drawing an arrow $u \to v$ for each factor $e_j = (u,v)$. We denote by $G_1$ the subgraph of $G$ corresponding to the coefficients $a\psi$, and by $G_2$ the remainder: $G_2 = G \setminus G_1$. Besides, to a loop $u \to u$ corresponds the product $a_{uu}\psi_{uu} = a_{uu}\sum_{l=1}^{K} c_l (x_{lu} y_{lu} - \rho(l))$, and to an edge $u \to v$ ($u \ne v$) corresponds the product $a_{uv}\psi_{uv} = a_{uv}\sum_{l=1}^{K} c_l x_{lu} y_{lv}$. The same holds for $b_{uu}\varphi_{uu}$ and $b_{uv}\varphi_{uv}$. In the paper of Bai and Yao (2008) (proof of Theorem 7.1), it is shown that only three types of components of the graph $G$ contribute a non-negligible term (see Figure 1).

Figure 1: the three major components of the graph $G$

Because $G_1$ and $G_2$ are subgraphs of $G$, and by the definitions in equations (2.1) and (2.2), $\psi_e$ differs from $\varphi_e$ only through the coefficient $c_l$ or $d_l$ in front. The difference between $\psi_e$ and $\varphi_e$ is therefore at most $O(1)$, which means that the components of $G$ that contribute $o(1)$ to $E n^{K/2} \xi_n^K$ remain negligible (see Bai and Yao (2008)). Based on this fact, we find that this time only the following nine components (shown in Figure 2) count.

Figure 2: the nine major components of the graphs $G_1$, $G_2$
The numbers $k_1, \cdots, k_9$ in Figure 2 stand for the multiplicity of each component. Counting the degrees of the vertices, we also have the restriction $4(k_1 + \cdots + k_9) = 2K$, which means that $K$ must be an even number, denoted $2p$ for convenience.
From the combinatorics, we have this time

The coefficient in front of $D_1 D_2 \cdots D_9$ comes from the following count. Observing the nine components in Figure 2, each component is made of two edges. First, we group the $K$ edges two by two, which gives $C_K^2 C_{K-2}^2 \cdots C_2^2$ choices; second, since the first $k_1$ groups (and likewise the following $k_2, \cdots, k_9$ groups) must be identical, we must divide the total $C_K^2 C_{K-2}^2 \cdots C_2^2$ by the $k_1! \cdots k_9!$ permutations; last, for the three components in the last column of Figure 2, the two edges of each component belong to different subgraphs (one edge in $G_1$ and the other in $G_2$), so an additional factor $2^{k_3+k_6+k_9}$ must be included. Combining these facts leads to the result.
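The counting argument above can be sanity-checked numerically. The following sketch (ours, not from the paper) verifies the identity $C_K^2 C_{K-2}^2 \cdots C_2^2 = K!/2^{K/2}$ underlying the coefficient of $D_1 \cdots D_9$:

```python
from math import comb, factorial

def pair_product(K):
    """Product C(K,2) * C(K-2,2) * ... * C(2,2): the number of ways to
    split K edges into K/2 ordered (labeled) pairs, as used in the
    coefficient in front of D_1 ... D_9."""
    result = 1
    while K >= 2:
        result *= comb(K, 2)
        K -= 2
    return result

for p in range(1, 8):
    K = 2 * p
    # The product of binomial coefficients collapses to K! / 2^(K/2).
    assert pair_product(K) == factorial(K) // 2**p
```

Dividing this product by $k_1! \cdots k_9!$ then removes the ordering among groups of identical components, exactly as in the text.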
We now specify the terms $D_1, D_2, \cdots, D_9$:

Combining these nine terms with equation (2.4), we obtain

which means that $\eta_n \Longrightarrow N(0, \sigma^2)$ by the moment method, with

The proof of Theorem 2.1 is complete.
$(U(1), \cdots, U(K), V(1), \cdots, V(K))^T$ converges weakly to a zero-mean $2K$-dimensional Gaussian vector with covariance matrix $B$.
Theorem 2.1 can be generalized to the joint distribution of several sesquilinear forms; we present this generalization in the following theorem. Recall that in the proof of Theorem 2.1 we used the moment method and found the nine major components presented in Figure 2, each containing two edges. Therefore, if we now consider $k$ sesquilinear forms as a whole, there are $\frac{3}{2}k(k+1)$ major components that lead to a non-negligible contribution. Each component still has two edges, either from the same subgraph (both from some $G_i$, $i = 1, \cdots, k$) or from two different subgraphs (one from $G_i$ and the other from $G_j$, $i \ne j$). This means that the $k$ sesquilinear forms packed together have only pairwise covariance functions. The remaining parts of the proof are similar and omitted.
Let $A_n^{(m)} = [a_{ij}^{(m)}(n)]_n$, $m = 1, \cdots, k$, be $k$ sequences of $n\times n$ Hermitian matrices, and let the vectors $\{X(l), Y(l)\}_{1\le l\le K}$ be defined as in (1.1). Assume that the following limits exist ($m, m' = 1, \cdots, k$ and $m \ne m'$):

And let

then the $(K \cdot k)$-dimensional complex-valued random vector

converges weakly to a zero-mean complex-valued vector $W$ whose real and imaginary parts are Gaussian. Moreover, the Laplace transform of $W$ is given by

where $B$ can be written in block form, each block being a $K\times K$ matrix with entries (for $l, l' = 1, \cdots, K$):

Two applications in large-dimensional spiked population models
It is well known that, under fairly general conditions, the empirical spectral distribution of a large-dimensional sample covariance matrix tends to the Marčenko-Pastur distribution
$$F_y(dx) = \frac{1}{2\pi x y}\sqrt{(b_y - x)(x - a_y)}\,dx, \qquad a_y \le x \le b_y,$$
where $y = \lim p/n$, $a_y = (1-\sqrt y)^2$ and $b_y = (1+\sqrt y)^2$; see Marčenko and Pastur (1967). Moreover, under a fourth-moment condition, the smallest and largest sample eigenvalues converge almost surely to the endpoints $a_y$ and $b_y$, respectively.
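As a quick numerical illustration (our own sketch; the sizes $p$ and $n$ below are arbitrary choices), the spectrum of a pure-noise sample covariance matrix indeed concentrates on $[a_y, b_y]$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 200, 600                        # dimension and sample size, y = p/n = 1/3
y = p / n
a_y, b_y = (1 - np.sqrt(y))**2, (1 + np.sqrt(y))**2

X = rng.standard_normal((p, n))        # null case: identity population covariance
S = X @ X.T / n                        # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

# The extreme sample eigenvalues approach the Marchenko-Pastur edges a_y, b_y.
print(round(eigs.min(), 3), round(a_y, 3))
print(round(eigs.max(), 3), round(b_y, 3))
```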
In recent empirical data analyses, however, it is often the case that some eigenvalues are well separated from the bulk. As a possible explanation of this phenomenon, Johnstone (2001) proposed the spiked population model, in which all population eigenvalues equal 1 except for a fixed number of them (the spikes). Clearly, the spiked population model can be viewed as a finite-rank perturbation of the null case where all population eigenvalues equal 1. This raises the question of the influence of these spikes on the individual sample eigenvalues. Baik et al. (2005) first unveiled the phase transition phenomenon in the case of complex Gaussian variables: when a population spike is above (or below) the threshold $1+\sqrt y$ (or $1-\sqrt y$), the corresponding extreme sample eigenvalues jump out of the bulk. Baik and Silverstein (2006) considered more general random variables, complex or real and not necessarily Gaussian, and derived the same transition phenomenon. As for the central limit theorem, Baik et al. (2005) obtained the result for the largest eigenvalue in the complex Gaussian case. Paul (2007) found the Gaussian limiting distribution when the population vector is real Gaussian, under additional conditions on the population covariance matrix. Bai and Yao (2008) established the central limit theorem for the largest as well as the smallest sample eigenvalues under general population variables. Related central limit theory of extreme eigenvalues for finite-rank perturbed random matrices has also been proposed in the literature.
In this section, we establish two new central limit theorems for the extreme sample eigenvalues as well as for sample eigenvector projections. First, Section 3.1 introduces the model and some preliminary results. In Section 3.2, a joint central limit theorem is proposed for groups of packed sample eigenvalues corresponding to the spikes (the primary CLT in Bai and Yao (2008) concerns only one such group). Next, in Section 3.3, assuming the simple-spike case, we derive a joint CLT for the extreme sample eigenvalue and its corresponding sample eigenvector projection. Such a CLT is a new result; indeed, we are not aware of any CLT related to spiked eigenvectors in the literature. Both applications are based on the general CLT for random sesquilinear forms in our Theorem 2.1.

Some notation and preliminary results
Suppose the zero-mean complex-valued random vector $x$ consists of a first block $\xi$ of fixed dimension $M$ and a second block $\eta$, and that we observe $n$ i.i.d. copies of $x$. Moreover, assume that $E\|x\|^4 < \infty$ and that the coordinates of $\eta$ are independent and identically distributed with unit variance.
The population covariance matrix of the vector $x$ is then

Assume $\Sigma$ has the spectral decomposition

where $U$ is a unitary matrix, the $a_i$'s are positive and different from 1, and the $n_i$'s satisfy $n_1 + \cdots + n_k = M$. Besides, let $M_a$ be the number of $j$'s such that $a_j < 1 - \sqrt y$ (here $y$ is the limit of the dimension-to-sample-size ratio, $y = \lim p/n \in (0,1)$), and let $M_b$ be the number of $j$'s such that $a_j > 1 + \sqrt y$. More specifically, if we arrange the $a_i$'s in decreasing order, then $\Sigma$ can be diagonalized as
$$\mathrm{diag}(\underbrace{a_1, \cdots, a_1}_{n_1}, \cdots, \underbrace{a_k, \cdots, a_k}_{n_k}).$$
The sample covariance matrix of $x$ is

which can be partitioned as

Since $M$ is fixed and $p \to \infty$, $n \to \infty$ such that $p/n \to y \in (0,1)$, the empirical spectral distribution of the eigenvalues of $S_n$, as well as that of $S_{22}$, converges to the Marčenko-Pastur distribution $F_y(dx)$. For a real constant $\lambda \notin [a_y, b_y]$, we define the following integrals with respect to $F_y(dx)$:
$$m_1(\lambda) = \int \frac{x}{\lambda - x}\,F_y(dx), \qquad m_3(\lambda) = \int \frac{x}{(\lambda - x)^2}\,F_y(dx). \tag{3.3}$$
Let $l_1 \ge l_2 \ge \cdots \ge l_p$ be the eigenvalues of $S_n$, and let $s_j = n_1 + \cdots + n_j$ for $1 \le j \le M_b$ or $k - M_a + 1 \le j \le k$. Baik and Silverstein (2006) derived the almost sure limits of these extreme sample eigenvalues: for each $m \in \{1, \cdots, M_b\}$ or $m \in \{k - M_a + 1, \cdots, k\}$ and $s_{m-1} < i \le s_m$,
$$l_i \longrightarrow \lambda_m := \phi(a_m) = a_m + \frac{y a_m}{a_m - 1} \quad \text{almost surely.}$$
In other words, if a spike eigenvalue $a_m$ lies outside the interval $[1-\sqrt y, 1+\sqrt y]$, then the $n_m$-packed sample eigenvalues $\{l_i, i \in J_m\}$ (associated with $a_m$) converge to the limit $\lambda_m$, which is outside the support $[a_y, b_y]$ of the M-P distribution (here we denote $J_m = (s_{m-1}, s_m]$ for $m \in \{1, \cdots, M_b\}$ or $m \in \{k - M_a + 1, \cdots, k\}$).
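The phase transition and the limit $\phi(a_m) = a_m + y a_m/(a_m - 1)$ can be illustrated with a small simulation (ours, with arbitrary parameters): a single supercritical spike produces one sample eigenvalue that separates from the bulk edge $b_y$ and lands near $\phi(a)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 200, 600                      # y = 1/3, threshold 1 + sqrt(y) ~ 1.577
y = p / n
a = 5.0                              # supercritical spike: a > 1 + sqrt(y)
lam = a + y * a / (a - 1)            # phi(a): a.s. limit of the top eigenvalue

sigma_sqrt = np.ones(p); sigma_sqrt[0] = np.sqrt(a)   # Sigma = diag(a, 1, ..., 1)
X = sigma_sqrt[:, None] * rng.standard_normal((p, n))
S = X @ X.T / n
top = np.linalg.eigvalsh(S)[-1]      # largest sample eigenvalue

b_y = (1 + np.sqrt(y))**2
# top separates from the bulk edge b_y and sits near phi(a).
print(round(top, 3), round(lam, 3))
```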
Recently, Bai and Yao (2008) derived the CLT for these extreme sample eigenvalues. More specifically, let $\delta_{n,i} := \sqrt n (l_i - \lambda_m)$, where $m \in \{1, \cdots, M_b\}$ or $m \in \{k - M_a + 1, \cdots, k\}$, $i \in J_m$, and $\lambda_m = \phi(a_m) \notin [a_y, b_y]$ as defined before. They proved that $\delta_{n,i}$ tends to a solution $v$ of the following equation:

where $[U^* R_n(\lambda_m) U]_{mm}$ is the $m$-th diagonal block of $U^* R_n(\lambda_m) U$ corresponding to the indices $\{u, v \in J_m\}$, and
$$R_n(\lambda) = \frac{1}{\sqrt n}\left(\xi_{1:n}\left(I + A_n(\lambda)\right)\xi_{1:n}^* - \Sigma\,\mathrm{tr}\left(I + A_n(\lambda)\right)\right).$$
Let $R(\lambda)$ denote the $M\times M$ matrix limit of $R_n(\lambda)$, and set $\widetilde R(\lambda) := U^* R(\lambda) U$. According to (3.4), $\delta_{n,i}$ tends to an eigenvalue of the matrix $(1 + y m_3(\lambda_m) a_m)^{-1} [\widetilde R(\lambda_m)]_{mm}$. Besides, since the index $i$ is arbitrary over $J_m$, the $n_m$ random variables $\{\sqrt n(l_i - \lambda_m), i \in J_m\}$ converge to the set of eigenvalues of this matrix. The following theorem of Bai and Yao (2008) identifies the covariance structure of the elements of the limit matrix $\widetilde R(\lambda)$. For simplicity, we consider only the real case in all that follows, unless otherwise specified.

Application 1: Asymptotic joint distribution of two groups of extreme sample eigenvalues in the spiked population model
In this subsection, we consider the asymptotic joint distribution of two groups of extreme sample eigenvalues, say $\{l_i, i \in J_m\}$ and $\{l_{i'}, i' \in J_{m'}\}$ ($m \ne m'$), when $\Sigma$ has the structure (3.2); namely, we consider the random vector

Following the work of Bai and Yao (2008), we know that this $(n_m + n_{m'})$-dimensional random vector converges to the eigenvalues of a symmetric $(n_m + n_{m'}) \times (n_m + n_{m'})$ random matrix (3.5). This random matrix (3.5) has two diagonal blocks of dimensions $n_m$ and $n_{m'}$, respectively. The covariance function of the elements within each block has been fully identified by Bai and Yao (2008); see Theorem 3.1. But if we consider the matrix as a whole, it remains to determine the covariance between the elements of the two blocks $[\widetilde R(\lambda_m)]_{mm}$ and $[\widetilde R(\lambda_{m'})]_{m'm'}$. We establish such a covariance function in Theorem 3.2, for real observation vectors $x$, with the help of our Corollary 2.1. The result can also be generalized to the complex case by treating the real and imaginary parts as two independent real random variables, with the help of our Theorem 2.1; interested readers can refer to Bai and Yao (2008) (see the proof of their Proposition 3.2).

Main result
Theorem 3.2. Assume that the variables $\xi$ and $\eta$ are real. Then the two diagonal blocks of the $2M \times 2M$ random matrix are symmetric, with zero-mean Gaussian entries, and the following covariance function between the two blocks holds:
Remark 3.1. If we restrict the indices $(i,j)$ to the region $J_m \times J_m$ and $(i',j')$ to $J_{m'} \times J_{m'}$, we obtain the covariance function between the two blocks of (3.5). Note that the two regions $J_m \times J_m$ and $J_{m'} \times J_{m'}$ do not intersect.
Remark 3.2. If the coordinates $\{\xi(i)\}$ of $\xi$ are independent (thus $\Sigma$ is diagonal and $U = I_M$), Bai and Yao (2008) have already proved that the covariance matrix within each diagonal block in (3.6) is diagonal; in other words, the Gaussian matrices $R(\lambda_m)$ and $R(\lambda_{m'})$ are both made of independent entries. Since the regions $J_m \times J_m$ and $J_{m'} \times J_{m'}$ are disjoint, the only covariance that may exist between the two blocks is

Using (3.7) and the independence of the $\{\xi(i)\}$, we find

which means that the two diagonal blocks in (3.5) are independent. Besides, Bai and Yao (2008) have already given the variances within each block:

asymptotically independent, converging to the eigenvalues of the Gaussian random matrices
Both Gaussian random matrices are made of independent entries, with fully identified variance functions given by (3.8) and (3.9). Moreover, if the observations are Gaussian, (3.9) reduces to $\mathrm{Var}(R(i,i)) = 2\theta\Sigma_{ii}^2$.

Conditions for two groups of packed extreme sample eigenvalues to be pairwise independent
An interesting question in the asymptotic analysis of spiked eigenvalues is whether two groups of packed extreme sample eigenvalues are asymptotically pairwise independent. In Remark 3.2, we have seen that this holds when the coordinates $\{\xi(i)\}$ are independent. We now aim to relax this independence restriction on $\{\xi(i)\}$, under the condition that all the eigenvalues of $\Sigma$ are simple, that is, $\Sigma$ has the spectral decomposition

where the $a_i$'s are arranged in decreasing order. We discuss conditions under which the extreme sample eigenvalues are asymptotically pairwise independent.
Let $l_i$, $l_j$ denote the extreme sample eigenvalues corresponding to the spikes $a_i$ and $a_j$, where $a_i, a_j \notin [1-\sqrt y, 1+\sqrt y]$ and $a_i \ne a_j$. Then the two-dimensional random vector
$$\begin{pmatrix}\delta_{n,i}\\ \delta_{n,j}\end{pmatrix} = \begin{pmatrix}\sqrt n\,(l_i - \lambda_i)\\ \sqrt n\,(l_j - \lambda_j)\end{pmatrix}$$
converges to the eigenvalues of the following random matrix:

Since all the eigenvalues of $\Sigma$ are simple, the multiplicities $n_i$ and $n_j$ both equal 1. Therefore, $[\widetilde R(\lambda_i)]_{ii}$ and $[\widetilde R(\lambda_j)]_{jj}$ are now two Gaussian random variables (actually, they are the $(i,i)$-th and $(j,j)$-th elements of the $M\times M$ Gaussian random matrices $R(\lambda_i)$ and $R(\lambda_j)$, respectively, denoted $R(\lambda_i)(i,i)$ and $R(\lambda_j)(j,j)$). As a result, $(\delta_{n,i}, \delta_{n,j})^T$ actually converges to the Gaussian random vector

From the definitions of $w(i,j)$ and $\theta(i,j)$, using the fact that $m_1(\lambda_i) = 1/(a_i - 1)$ (see Lemma 5.1), we have

The value of $w(i,j)$ is always positive whenever $a_i, a_j \notin [1-\sqrt y, 1+\sqrt y]$, while $\theta(i,j) - w(i,j)$ is negative if $a_i > 1+\sqrt y$ and $0 < a_j < 1-\sqrt y$ (corresponding to one extreme large and one extreme small sample eigenvalue), and positive if $a_i, a_j > 1+\sqrt y$ or $0 < a_i, a_j < 1-\sqrt y$ (corresponding to two extreme large or two extreme small sample eigenvalues).
Therefore, two extreme large (or small) sample eigenvalues are asymptotically independent exactly when $\mathrm{Cov}(R(\lambda_i)(i,i), R(\lambda_j)(j,j)) = 0$, and a necessary and sufficient condition for this is

another way of saying this is

Obviously, when the $\{\xi(i)\}$ are independent, condition ($*$) is satisfied. We consider the special case of Gaussian observations with a diagonal population covariance matrix. This model satisfies condition ($*$): when the observations are Gaussian, uncorrelatedness of $\xi(i)$ and $\xi(j)$ implies independence, which in turn implies that $\xi(i)^2$ and $\xi(j)^2$ are uncorrelated. Therefore, if the observations are Gaussian and the population covariance matrix is diagonal, any two extreme large (or small) sample eigenvalues are asymptotically independent. Furthermore, we can derive the joint distribution of $\delta_{n,i}$ and $\delta_{n,j}$ explicitly. According to (3.10), (3.11) and (3.12), the Gaussian assumption yields the much simpler form

where $\theta(i) = \frac{(a_i-1+y)^2}{(a_i-1)^2-y}$ and $\theta(j) = \frac{(a_j-1+y)^2}{(a_j-1)^2-y}$ by definition. Using the expression $m_3(\lambda) = \frac{1}{(a-1)^2-y}$ (see Lemma 5.1), we finally derive the asymptotic joint distribution:

However, if we only assume that $\Sigma$ is diagonal, without any Gaussian assumption, things are different. One such example is $\xi(i)$ and $\xi(j)$ drawn from the uniform distribution inside the ellipse $\xi(i)^2/36 + \xi(j)^2/16 \le 1$: one can check that $E\xi(i)\xi(j) = E\xi(i)\cdot E\xi(j) = 0$, but $E\xi(i)^2 = 9$, $E\xi(j)^2 = 4$ and $E\xi(i)^2\xi(j)^2 = 24$, that is, $E\xi(i)^2\xi(j)^2 \ne E\xi(i)^2 \cdot E\xi(j)^2$; therefore condition ($*$) is not satisfied. From this example, we see that it can happen that, although $\xi(i)$ and $\xi(j)$ are uncorrelated, $\xi(i)^2$ and $\xi(j)^2$ are correlated. In such a case, even though the population covariance matrix is diagonal, the two extreme large (or small) eigenvalues of the sample covariance matrix may actually be correlated.
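The two identities from Lemma 5.1 used above, $m_1(\phi(a)) = 1/(a-1)$ and $m_3(\phi(a)) = 1/((a-1)^2 - y)$, can be checked by direct numerical integration against the Marčenko-Pastur density. The sketch below is ours, with arbitrary $a$ and $y$, and uses the conventions $m_1(\lambda) = \int \frac{x}{\lambda-x}F_y(dx)$ and $m_3(\lambda) = \int \frac{x}{(\lambda-x)^2}F_y(dx)$, which are consistent with the quoted values:

```python
import numpy as np

y = 0.5
a = 4.0                                # supercritical: a > 1 + sqrt(y) ~ 1.707
lam = a + y * a / (a - 1)              # phi(a) = 14/3
a_y, b_y = (1 - np.sqrt(y))**2, (1 + np.sqrt(y))**2

x = np.linspace(a_y, b_y, 200001)
# Marchenko-Pastur density on [a_y, b_y]
f = np.sqrt(np.maximum((b_y - x) * (x - a_y), 0.0)) / (2 * np.pi * y * x)

def trapezoid(vals, grid):
    # simple trapezoidal rule (avoids NumPy version differences around np.trapz)
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2)

m1 = trapezoid(x / (lam - x) * f, x)
m3 = trapezoid(x / (lam - x)**2 * f, x)

print(round(m1, 4))   # ~ 0.3333 = 1/(a-1)
print(round(m3, 4))   # ~ 0.1176 = 1/((a-1)^2 - y)
```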
A small simulation is conducted below to check this covariance formula in the two cases mentioned above. The dimension $p$ is fixed at 200 and the sample size $n$ at 300. We choose two spikes $a_1 = 9$ and $a_2 = 4$, both larger than the critical value $1+\sqrt y$ ($= 1+\sqrt{2/3}$). We repeat the experiment 10000 times and compute the empirical covariance between the largest ($l_1$) and the second largest ($l_2$) sample eigenvalues. The first case is a two-dimensional Gaussian vector $(\xi(1), \xi(2))^T$ with joint distribution

According to (3.13), the theoretical covariance between $l_1$ and $l_2$ is 0, and the empirical covariance over the 10000 simulated samples turns out to be 0.0019. The second case is the aforementioned uniform distribution inside the ellipse $\xi(1)^2/36 + \xi(2)^2/16 \le 1$. This time, the theoretical covariance between $l_1$ and $l_2$, computed from (3.12), is $-0.0366$, and the empirical covariance over the 10000 simulated samples turns out to be $-0.0371$. In both cases the error is smaller than the order $O(1/\sqrt{10000})$.

Application 2: Asymptotic joint distribution of the largest sample eigenvalue and its corresponding sample eigenvector projection
In this section, let the population covariance matrix have only one simple spike:

where the $\Sigma$ in (3.1) now reduces to a single number $a$ ($a > 1+\sqrt y$). The sample covariance matrix $S_n$ is also partitioned as before:

which are mutually independent. We denote by $\nu_4 = E\xi^4/a^2 - 3$ the kurtosis coefficient of the first coordinate. Now suppose $l$ is the largest eigenvalue of $S_n$, converging to the value $\lambda = \phi(a) = a + ya/(a-1)$, which lies outside the support $[a_y, b_y]$ of the M-P distribution, and let $u$ be the corresponding sample eigenvector, with $u_1$ its first coordinate corresponding to the spike $a$ and $u_2$ the remaining $p\times 1$ component. We have the following central limit theorem concerning the asymptotic joint distribution of the largest sample eigenvalue $l$ and its corresponding sample eigenvector projection $u_1^* u_1$. (Notice that the population eigenvector corresponding to the spike $a$ is simply $e_1 = (1, 0, \cdots, 0)^T$; therefore $u_1$ represents the inner product between $e_1$ and the sample eigenvector $u$.)

where
$$v_{11} = \frac{a^2 y^2 (a^2+y-1)^2}{(a-1)^4 (a-1+y)^4}\,\nu_4 + \frac{2 a^2 y \left((a+y-1)^2 + y a^2\right)}{\left((a-1)^2 - y\right)(a-1+y)^4},$$
$$v_{12} = \frac{y a^2 (a^2-1+y)\left((a-1)^2-y\right)}{(a-1)^6 (a-1+y)^4}.$$

Remark 3.3. If the observations are Gaussian, then $\nu_4 = 0$, and the covariance matrix in the above theorem reduces to a much simpler expression:

Remark 3.4. Trivially, the following central limit theorem for the eigenvector projection holds:

In particular,

Observe that this limit lies in $(0,1)$. In particular, the sample eigenvector $u$ does not converge to the population eigenvector; only their angle tends to a limit. Notice that the limit of the angle has already been established by Paul (2007) for the Gaussian case, and by Benaych-Georges and Nadakuditi (2011) for somewhat different but closely related random matrix models with a finite-rank perturbation.
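Remark 3.4 can be illustrated numerically. The sketch below is ours; the closed form of the projection limit, $(1 - y/(a-1)^2)/(1 + y/(a-1))$, is Paul (2007)'s Gaussian-case angle limit, quoted here as an assumption, and the parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
p_total, n = 201, 600                 # one spiked coordinate plus 200 noise coordinates
y = (p_total - 1) / n                 # ratio used in the limit formula, y = 1/3
a = 5.0                               # single simple spike, a > 1 + sqrt(y)

sigma_sqrt = np.ones(p_total); sigma_sqrt[0] = np.sqrt(a)   # Sigma = diag(a, 1, ..., 1)
X = sigma_sqrt[:, None] * rng.standard_normal((p_total, n))
S = X @ X.T / n

vals, vecs = np.linalg.eigh(S)
u = vecs[:, -1]                       # sample eigenvector of the largest eigenvalue
proj = u[0]**2                        # squared projection on e_1, the spike direction

limit = (1 - y / (a - 1)**2) / (1 + y / (a - 1))   # Paul (2007), Gaussian case
print(round(proj, 3), round(limit, 3))
```

In particular, `proj` stays bounded away from both 0 and 1, consistent with the observation that the sample eigenvector does not converge to the population eigenvector.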
Proof of Theorem 3.2

Proof. By applying Corollary 2.1, we have

We specify these values as follows:

where we have used Lemma 6.1 in Bai and Yao (2008); and

Combining all these, we obtain

The proof of Theorem 3.2 is complete.

Proof of Theorem 3.3
Proof. Since $l$ is the largest eigenvalue of $S_n$ and $u$ its corresponding eigenvector, we have

where $u_1$ is the first component of $u$ and $u_2$ the remaining $p\times 1$ component; this leads to

Consequently,

Moreover, we combine (4.1) with the fact that

Next, we simplify the values of $C_1$ and $C_2$.