Almost sure localization of the eigenvalues in a Gaussian information plus noise model. Applications to the spiked models

Let $\boldsymbol{\Sigma}_N$ be an $M \times N$ random matrix defined by $\boldsymbol{\Sigma}_N = \mathbf{B}_N + \sigma \mathbf{W}_N$, where $\mathbf{B}_N$ is a uniformly bounded deterministic matrix and $\mathbf{W}_N$ is a complex Gaussian matrix with independent identically distributed zero-mean entries of variance $\frac{1}{N}$. The purpose of this paper is to study the almost sure location of the eigenvalues $\hat{\lambda}_{1,N} \geq ... \geq \hat{\lambda}_{M,N}$ of the Gram matrix ${\boldsymbol \Sigma}_N {\boldsymbol \Sigma}_N^*$ when $M$ and $N$ converge to $+\infty$ such that the ratio $c_N = \frac{M}{N}$ converges towards a constant $c>0$. The results are used to derive, with an alternative approach, known results concerning the behaviour of the largest eigenvalues of ${\boldsymbol \Sigma}_N {\boldsymbol \Sigma}_N^*$ when the rank of $\mathbf{B}_N$ remains fixed as $M$ and $N$ converge to $+\infty$.


Introduction
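The addressed problem and the results
Let $\boldsymbol{\Sigma}_N$ be an $M \times N$ complex-valued matrix defined by
$$\boldsymbol{\Sigma}_N = \mathbf{B}_N + \sigma \mathbf{W}_N \tag{1}$$
where $\mathbf{B}_N$ is a uniformly bounded deterministic matrix and $\mathbf{W}_N$ has i.i.d. zero-mean complex Gaussian entries of variance $\frac{1}{N}$. Model (1) is referred to in the literature as the information plus noise model (see e.g. Dozier-Silverstein [14]). In this paper, we assume that $\mathrm{Rank}(\mathbf{B}_N) = K(N) = K < M$, because this assumption holds in a number of practical situations, in particular in the context of the spiked models addressed here.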
The purpose of this paper is to study the almost sure location of the eigenvalues $\hat{\lambda}_{1,N} \geq \ldots \geq \hat{\lambda}_{M,N}$ of the Gram matrix $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ when $M$ and $N$ converge to $+\infty$ such that the ratio $c_N = \frac{M}{N}$ converges towards a constant $c > 0$. We then use these results to obtain, with an approach different from that of Benaych-Nadakuditi [7], the behaviour of the largest eigenvalues of the information plus noise spiked models, for which the rank $K$ of $\mathbf{B}_N$ remains constant as $M$ and $N$ increase to $+\infty$. It is well known that the empirical eigenvalue distribution $\hat{\mu}_N$ of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ has almost surely the same asymptotic behaviour as a deterministic probability measure $\mu_N$ (see e.g. [16, Th.7.4]), whose support $\mathcal{S}_N$ is the union of disjoint compact intervals, called in the following the clusters of $\mathcal{S}_N$. The boundary points of each cluster coincide with the positive extrema of a certain rational function that depends on the empirical spectral measure of matrix $\mathbf{B}_N \mathbf{B}_N^*$, on $\sigma^2$ and on the ratio $c_N = \frac{M}{N}$ (see [28], Theorem 2). Each cluster of $\mathcal{S}_N$ is naturally associated with another interval containing a group of consecutive eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ ([28]). It is shown in [28] that the property proved in Bai-Silverstein [2] holds in the context of model (1). Roughly speaking, this means that if an interval $[a, b]$ lies outside $\mathcal{S}_N$ for all large $N$, then almost surely no eigenvalue of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ belongs to $[a, b]$ for $N$ large enough. In this paper, we establish the analog of the property called "exact separation" in Bai-Silverstein [3]: almost surely, for $N$ large enough, the number of eigenvalues of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ less than $a$ (resp. greater than $b$) coincides with the number of eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ associated with the clusters included in $[0, a]$ (resp. included in $[b, +\infty)$). Note that these results also hold in the case $K = M$, which is not treated in this paper. Indeed, the analysis of the support $\mathcal{S}_N$ provided in [28] can be extended to the case where $\mathbf{B}_N \mathbf{B}_N^*$ is full rank. Once the support is characterized, the probabilistic part of the proof of the above exact separation result can be used verbatim.
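The statement that cluster edges are positive extrema of a rational function can be explored numerically. The Python sketch below is illustrative only and rests on an assumption not restated in this section. It takes the rational function to be $\phi_N(w) = w\,(1-\sigma^2 c_N f_N(w))^2 + \sigma^2(1-c_N)(1-\sigma^2 c_N f_N(w))$ with $f_N(w) = \frac{1}{M}\mathrm{Tr}(\mathbf{B}_N\mathbf{B}_N^* - w\mathbf{I}_M)^{-1}$, a form we derive from the Dozier-Silverstein fixed-point equation. It also assumes that consecutive non-negative extrema, sorted in $w$, delimit one cluster. The helpers `phi_and_derivative` and `cluster_edges` are hypothetical names. As a sanity check, when $\mathbf{B}_N = 0$ this $\phi_N$ reduces to $(w+\sigma^2)(w+\sigma^2 c_N)/w$, whose extrema give the edges $\sigma^2(1\pm\sqrt{c_N})^2$.

```python
import numpy as np


def phi_and_derivative(w, vals, weights, sigma2, c):
    """Evaluate the assumed phi_N(w) and phi_N'(w) at real points w.

    vals/weights describe the spectral measure of B_N B_N^*, so that
    f_N(w) = sum_i weights[i] / (vals[i] - w).
    """
    d = vals[:, None] - w[None, :]
    f = weights @ (1.0 / d)            # f_N(w)
    fp = weights @ (1.0 / d**2)        # f_N'(w)
    g = 1.0 - sigma2 * c * f           # 1 - sigma^2 c_N f_N(w)
    gp = -sigma2 * c * fp
    phi = w * g**2 + sigma2 * (1.0 - c) * g
    dphi = g**2 + 2.0 * w * g * gp + sigma2 * (1.0 - c) * gp
    return phi, dphi


def cluster_edges(eigs_BB, sigma2, c, n_grid=20001, n_bisect=60):
    """Hypothetical helper: approximate clusters [x_q^-, x_q^+] (c < 1 assumed)."""
    vals, counts = np.unique(np.asarray(eigs_BB, dtype=float), return_counts=True)
    weights = counts / counts.sum()
    span = 10.0 * (vals.max() + sigma2 + 1.0)
    cuts = np.concatenate(([-span], vals, [span]))   # poles split the real line

    def dphi_at(x):
        return phi_and_derivative(np.array([x]), vals, weights, sigma2, c)[1][0]

    extrema = []  # (w, phi(w)) at the non-negative local extrema of phi_N
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        w = np.linspace(lo, hi, n_grid)[1:-1]        # stay strictly between poles
        _, dphi = phi_and_derivative(w, vals, weights, sigma2, c)
        for i in np.nonzero(np.sign(dphi[:-1]) * np.sign(dphi[1:]) < 0)[0]:
            a, b = w[i], w[i + 1]
            sa = np.sign(dphi_at(a))
            for _ in range(n_bisect):                # bisection on phi_N'
                m = 0.5 * (a + b)
                if np.sign(dphi_at(m)) == sa:
                    a = m
                else:
                    b = m
            w_star = 0.5 * (a + b)
            x_star = phi_and_derivative(np.array([w_star]), vals, weights, sigma2, c)[0][0]
            if x_star >= 0.0:
                extrema.append((w_star, x_star))
    extrema.sort()
    xs = [x for _, x in extrema]
    # consecutive non-negative extrema (ordered in w) delimit one cluster
    return [(xs[k], xs[k + 1]) for k in range(0, len(xs) - 1, 2)]


noise_only = cluster_edges(np.zeros(100), sigma2=1.0, c=0.5)
one_spike = cluster_edges([4.0] + [0.0] * 199, sigma2=1.0, c=0.5)
print(noise_only, one_spike)
```

With a single eigenvalue $4$ in $\mathbf{B}_N\mathbf{B}_N^*$ (with $M = 200$, $\sigma^2 = 1$ and $c_N = 0.5$), the sketch should return a bulk cluster close to the noise-only edges and a second, narrow cluster containing $(1+4)(0.5+4)/4 = 5.625$. This is consistent with the spiked-model results discussed below.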
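We also use the separation result to study the case where $\mathrm{Rank}(\mathbf{B}_N) = K$ does not depend on $N$. It is assumed that, for each $k = 1, \ldots, K$, the nonzero eigenvalues $\lambda_{1,N} \geq \ldots \geq \lambda_{K,N}$ of $\mathbf{B}_N \mathbf{B}_N^*$ satisfy $\lim_{N \to +\infty} \lambda_{k,N} = \lambda_k$. The support $\mathcal{S}_N$ of $\mu_N$ is first characterized in this case. Then, using the above results on the almost sure location of the $(\hat{\lambda}_{k,N})_{k=1,\ldots,M}$, it is proved that, if $\lambda_k > \sigma^2 \sqrt{c}$, then almost surely
$$\hat{\lambda}_{k,N} \to \frac{(\sigma^2 + \lambda_k)(\sigma^2 c + \lambda_k)}{\lambda_k} \tag{2}$$
and that, if $\lambda_k \leq \sigma^2 \sqrt{c}$, then almost surely
$$\hat{\lambda}_{k,N} \to \sigma^2 (1 + \sqrt{c})^2. \tag{3}$$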
This behaviour was first established in [7] using a different approach.
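The dichotomy between (2) and (3) can be observed in a small Monte Carlo experiment. The sketch below assumes a rank-one $\mathbf{B}_N$ whose only nonzero entry is $\sqrt{\lambda}$ at position $(1,1)$, so that $\mathbf{B}_N\mathbf{B}_N^*$ has the single nonzero eigenvalue $\lambda$. The function names, sizes and seed are illustrative choices. With $c = 0.5$ and $\sigma^2 = 1$, the threshold is $\sigma^2\sqrt{c} \approx 0.707$. For $\lambda = 4$, the largest eigenvalue should therefore be close to $(1+4)(0.5+4)/4 = 5.625$. For $\lambda = 0.25$, it should stay near the edge $\sigma^2(1+\sqrt{c})^2 \approx 2.914$, up to finite-size fluctuations.

```python
import numpy as np


def top_eigenvalues(M, N, lam, sigma=1.0, n_top=2, seed=0):
    """Largest eigenvalues of Sigma Sigma^* for a hypothetical rank-one B_N."""
    rng = np.random.default_rng(seed)
    B = np.zeros((M, N))
    B[0, 0] = np.sqrt(lam)                   # B_N B_N^* = diag(lam, 0, ..., 0)
    # i.i.d. complex Gaussian entries with zero mean and variance 1/N
    W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)
    S = B + sigma * W
    eig = np.linalg.eigvalsh(S @ S.conj().T)  # ascending order
    return eig[::-1][:n_top]


def psi(lam, sigma2, c):
    """Limit (2) for a spike lam above the threshold sigma2 * sqrt(c)."""
    return (sigma2 + lam) * (sigma2 * c + lam) / lam


M, N = 400, 800                               # c = 0.5
edge = (1.0 + np.sqrt(M / N)) ** 2            # sigma^2 (1 + sqrt(c))^2 with sigma = 1
above = top_eigenvalues(M, N, lam=4.0)        # supercritical spike
below = top_eigenvalues(M, N, lam=0.25)       # subcritical spike
print(above, psi(4.0, 1.0, M / N), below, edge)
```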
Motivations
Our work was originally motivated by array processing, in which the signals transmitted by $K < M$ sources are received by an array equipped with $M$ sensors. Under certain assumptions, the $M$-dimensional vector $\mathbf{y}(n)$ received on the sensor array at time $n$ can be written as
$$\mathbf{y}(n) = \sum_{k=1}^{K} s_k(n)\, \mathbf{d}_k + \mathbf{v}(n) \tag{4}$$
where each time series $(s_k(n))_{n \in \mathbb{Z}}$ represents a non-observable deterministic signal corresponding to source $k$, and where $\mathbf{d}_k$ is an unknown deterministic $M$-dimensional vector depending on the direction of arrival of the $k$-th source.
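$(\mathbf{v}(n))_{n \in \mathbb{Z}}$ is an additive complex white Gaussian noise such that $\mathbb{E}(\mathbf{v}(n)\mathbf{v}(n)^*) = \sigma^2 \mathbf{I}_M$. Model (4) raises important statistical problems, such as detecting the number of sources $K$ or estimating the directions of arrival of the $K$ sources. Stacking $N$ observations into $\boldsymbol{\Sigma}_N = \frac{1}{\sqrt{N}}(\mathbf{y}(1), \ldots, \mathbf{y}(N))$ yields a matrix of the form (1) with $\mathrm{Rank}(\mathbf{B}_N) \leq K$. A number of estimation schemes based on the eigenvalues and eigenvectors of matrix $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ were developed and analysed in the regime where $N \to +\infty$ while $M$ remains fixed. If, however, $M$ and $N$ are of the same order of magnitude, these techniques may fail. It is therefore relevant to study these statistical problems in the asymptotic regime $M, N \to +\infty$ with $\frac{M}{N} \to c \in (0, +\infty)$. The number of sources may be constant or may scale with the dimensions $M$ and $N$. In either case, the first step is to evaluate the behaviour of the eigenvalues of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$.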
About the literature
Concerning the zero-mean correlated model. The problems addressed in this paper were studied extensively in the context of the popular zero-mean correlated model defined by
$$\boldsymbol{\Sigma}_N = \mathbf{H}_N \mathbf{W}_N \tag{5}$$
where $\mathbf{H}_N$ is a deterministic $M \times M$ matrix and $\mathbf{W}_N$ is a random matrix with i.i.d. zero-mean, possibly non-Gaussian, entries of variance $\frac{1}{N}$. The most complete results concerning the almost sure localization of the eigenvalues of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ are due to Bai-Silverstein [2,3] and were established in the non-Gaussian case. Spiked models were first proposed by Johnstone [20] in the context of (5), where matrix $\mathbf{H}_N$ is a diagonal matrix defined as a finite rank perturbation of the identity matrix. Later, Baik et al. [4] studied, in the complex Gaussian case, the almost sure convergence of the largest eigenvalues of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ and established central limit theorems. The analysis of [4] makes extensive use of the explicit form of the joint probability distribution of the entries of $\boldsymbol{\Sigma}_N$. Baik-Silverstein [5] used the results of [2,3], as well as the characterization of the support of the limiting distribution $\mu_N$ of the empirical eigenvalue distribution $\hat{\mu}_N$ (see ), to address the non-Gaussian case, and showed the almost sure convergence of certain eigenvalues of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$. Mestre considered in [21] the case where $\mathbf{H}_N \mathbf{H}_N^*$ has a finite number of distinct positive eigenvalues whose multiplicities converge to $+\infty$. He showed how to estimate the eigenvalues of $\mathbf{H}_N \mathbf{H}_N^*$ as well as their associated eigenspaces. Similar ideas were also developed in [22] to address the source localization problem for large sensor arrays when the source signals are i.i.d. sequences. The analysis of Mestre [21,22] is based on the results of [2,3]. It also relies on the observation that one can exhibit contours, depending on the Stieltjes transform of $\mu_N$, that enclose each eigenvalue of $\mathbf{H}_N \mathbf{H}_N^*$.
Paul studied in [26] the behaviour of the eigenvectors associated to the greatest eigenvalues of a Gaussian spiked model (almost sure convergence and central limit theorems). Bai and Yao showed in [1] that certain eigenvalues of a non Gaussian spiked model satisfy a central limit theorem. We finally note that the above results on zero-mean spiked models have been used in the context of source localization (see [19,23]).
Concerning the information plus noise model. Apart from our paper [28], devoted to the localization of deterministic sources, the almost sure location of the eigenvalues of matrix $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ had not been studied previously. In [28], we partly followed the work of Capitaine et al. [9], which was inspired by previous results of Haagerup and Thorbjornsen [17]. [9] is devoted to finite rank deformed Wigner matrices that are Gaussian or satisfy a Poincaré inequality. See also the recent paper [10], in which the rank of the deformation may scale with the size of the matrix. We used the same approach in [28] to prove that, for $N$ large enough, no eigenvalue of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ lies outside the support $\mathcal{S}_N$ of $\mu_N$. In [28], under the assumption that the eigenvalue $0$ of $\mathbf{B}_N \mathbf{B}_N^*$ is "far enough" from the others, we established a partial result showing that the $M - K$ smallest eigenvalues of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ are almost surely separated from the others. In the present paper, we prove a general exact separation property, extending the result of [5] to the complex Gaussian information plus noise model.
The almost sure behaviour (2), (3) of the largest eigenvalues of information plus noise spiked models turns out to be a consequence of the general results of [6,7], devoted to the analysis of certain random models with additive and/or multiplicative finite rank perturbations. (2) and (3) are therefore not new. However, the techniques of [7] differ completely from the approach of the present paper, which can be seen as an extension of [5] to the information plus noise model.
Organization of the paper
In section 2, we review some results of [13] and [28] concerning the support $\mathcal{S}_N$ of $\mu_N$, as well as some useful background material. As [28] assumed $c_N < 1$, we address the case $c_N = 1$ and prove some extra results concerning the behaviour of the Stieltjes transform of $\mu_N$ around $0$. In section 3, we prove the analog of the exact separation of [3]. [9] generalized the approach of [3] to prove this property in the finite rank deformed Wigner model. We show, however, that the ideas of [17] can still be used. We establish that it is sufficient to prove that the mass (w.r.t. $\mu_N$) of any cluster of $\mathcal{S}_N$ is equal to the proportion of eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ associated with this cluster. For this, we evaluate an integral along a certain contour enclosing the eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ associated with the cluster. This contour is the analog of the contour introduced in [21] in the context of model (5), and it was used extensively in [28]. Section 4 addresses the behaviour of the largest eigenvalues of an information plus noise spiked model. We analyse the support $\mathcal{S}_N$ of $\mu_N$, which amounts to evaluating the positive extrema of a certain rational function. Using results on perturbed third order polynomial equations, it is shown that if $\lambda_k \neq \sigma^2 \sqrt{c}$ for $k = 1, \ldots, K$, the intervals of $\mathcal{S}_N$ are where $k$ is any index for which $\lambda_{k,N} > \sigma^2 \sqrt{c}$, and where $\mathcal{O}^+(M^{-1/2})$ represents a positive $\mathcal{O}(M^{-1/2})$ term. The results of section 3 immediately imply (2) and (3) when $\lambda_k \neq \sigma^2 \sqrt{c}$ for $k = 1, \ldots, K$. If one of the $(\lambda_k)_{k=1,\ldots,K}$ is equal to $\sigma^2 \sqrt{c}$, we use an argument similar to Baik-Silverstein [5], which relies on an eigenvalue perturbation technique.

Model and assumptions
We now summarize the model and assumptions which will be used in the paper, and introduce some definitions.
where $\sigma > 0$, and $\mathbf{B}_N$ and $\mathbf{W}_N$ satisfy the two following assumptions. Note that the Gaussian assumption A-2 will only be required in section 3. All the results in section 2 concerning the convergence of the spectral distribution of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ are also valid in the non-Gaussian case. In the following, we study the context where The assumption on the multiplicities of the eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ is not really necessary, but it allows us to simplify the notation. We denote by $K$ the rank of $\mathbf{B}_N \mathbf{B}_N^*$ ($K$ may depend on $N$), and by This of course implies that $c \leq 1$. Assuming $c_N \leq 1$ does not introduce any restriction. If $c_N > 1$, the eigenvalues of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ are $0$, with multiplicity $M - N$, together with the eigenvalues of matrix $\boldsymbol{\Sigma}_N^* \boldsymbol{\Sigma}_N$. The location of this set of eigenvalues can therefore be deduced from the results related to $c_N < 1$.
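The reduction to $c_N \leq 1$ rests on an elementary linear-algebra fact that is easy to check numerically. For $M > N$, $\boldsymbol{\Sigma}_N\boldsymbol{\Sigma}_N^*$ has $M - N$ zero eigenvalues, and its remaining eigenvalues are those of $\boldsymbol{\Sigma}_N^*\boldsymbol{\Sigma}_N$. The sketch below uses an arbitrary complex Gaussian matrix, which has full rank $N$ with probability one. The sizes and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 4                                           # c_N = M / N > 1
S = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

big = np.sort(np.linalg.eigvalsh(S @ S.conj().T))     # M eigenvalues of Sigma Sigma^*
small = np.sort(np.linalg.eigvalsh(S.conj().T @ S))   # N eigenvalues of Sigma^* Sigma

# The M - N smallest eigenvalues of Sigma Sigma^* vanish (up to rounding);
# the others coincide with the eigenvalues of Sigma^* Sigma.
zeros, rest = big[: M - N], big[M - N:]
print(zeros, rest, small)
```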
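In this paper, $\mathcal{C}^\infty_c(\mathbb{R}, \mathbb{R})$ denotes the set of infinitely differentiable functions with compact support, defined from $\mathbb{R}$ to $\mathbb{R}$. If $\mathcal{A} \subset \mathbb{R}$, $\partial \mathcal{A}$ and $\mathrm{Int}(\mathcal{A})$ represent the boundary and the interior of $\mathcal{A}$, respectively.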
We finally recall the definition and some useful well-known properties of the Stieltjes transform, a fundamental tool for studying the eigenvalues of random matrices. Let $\mu$ be a positive finite measure on $\mathbb{R}$. We define its Stieltjes transform $\Psi_\mu$ as the function
$$\Psi_\mu(z) = \int_{\mathbb{R}} \frac{d\mu(\lambda)}{\lambda - z}, \qquad z \in \mathbb{C} \setminus \mathrm{supp}(\mu),$$
where $\mathrm{supp}(\mu)$ represents the support of measure $\mu$. We have the following well-known properties
Characterization of the support $\mathcal{S}_N$ of measure $\mu_N$
In this section, we recall some known results of [13] and [28] related to the support $\mathcal{S}_N$ of measure $\mu_N$. As we assumed in [28] that $c_N < 1$, we also provide, when necessary, some details on the specific case $c_N = 1$.

Convergence of the empirical spectral measure $\hat{\mu}_N$
The following result, concerning the convergence of $\hat{m}_N(z)$, can be found in [14, Th.
The behaviour of the Stieltjes transform $m_N$ near the real axis is fundamental to evaluate the support $\mathcal{S}_N$ of $\mu_N$. The following theorem is essentially due to [13].
1. If $c_N < 1$, the limit of $m_N(z)$ as $z \in \mathbb{C}^+$ converges to $x$ exists for each $x \in \mathbb{R}$, and is still denoted by $m_N(x)$. If $c_N = 1$, the limit exists for each $x \in \mathbb{R}^*$.

Measure $\mu_N$ is absolutely continuous and its density is given by $x \mapsto \frac{1}{\pi} \mathrm{Im}(m_N(x))$.
The statements of this theorem are essentially contained in [13, Th.2.5] (see also [28] for more details), except item 2: [13, Lem.2.1] only shows that $\mathrm{Re}(1 + \sigma^2 c_N m_N(z)) \geq 0$. We therefore prove item 2 in Appendix A.
We note that, as $m_N$ is a Stieltjes transform, it also satisfies $m_N(z^*) = m_N(z)^*$. Therefore, it holds that In the following, we denote by $f_N$, $\phi_N$ and $w_N$ the functions defined by Functions $w_N$ and $\phi_N$ are of crucial importance because, as shown in [28], the interior of $\mathcal{S}_N$ is given by We also note that (6) is equivalent to and that the identity
Lemma 1.

Properties of $\phi_N$
1. The function $\phi_N$ admits $2Q_N$ non-negative local extrema counting multiplicities (with

Theorem 3. The support $\mathcal{S}_N$ is given by
In the same way as in theorem 2, we set $w_N(x) = \lim_{z \in \mathbb{C}^+, z \to x} w_N(z)$ for $x \in \mathbb{R}$ if $c_N < 1$, and for $x \in \mathbb{R}^*$ if $c_N = 1$. We notice that $\lim_{z \in \mathbb{C}^-, z \to x} w_N(z) = w_N(x)^*$. The function $x \mapsto w_N(x)$ satisfies the following properties. ,N ), $(x^+_{q,N})$ of respectively $x^-_{q,N}$ and $x^+_{q,N}$ such that, The lemma was proved in We finish this section by showing that the following result holds. N )), and item 2 of theorem 2, we get that

Corollary 1. We have
This completes the proof.

Almost sure location of the sample eigenvalues.
We first recall the following result of [28, Th.3], which states that, almost surely, no eigenvalue of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ lies outside the support $\mathcal{S}_N$ of $\mu_N$ for all large $N$. This property is well known in the context of zero-mean non-Gaussian correlated matrices (see [2]). We note that the proof of [28, Th.3] relies heavily on the Gaussianity of $\mathbf{W}_N$ (assumption A-2).
Theorem 4. Let $a, b \in \mathbb{R}$, $\varepsilon > 0$ and $N_0 \in \mathbb{N}$ such that $(a - \varepsilon, b + \varepsilon) \cap \mathcal{S}_N = \emptyset$ for each $N > N_0$. Then, with probability one, no eigenvalue of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ belongs to $[a, b]$ for $N$ large enough.
We remark that theorem 4 extends to semi-infinite intervals $[b, +\infty)$ because, as $\|\mathbf{W}_N \mathbf{W}_N^*\| \to (1 + \sqrt{c})^2$ almost surely, it holds that $\hat{\lambda}_{1,N}$ In order to interpret this result, assume that, for each $N > N_1 \geq N_0$, the number of clusters of $\mathcal{S}_N$ does not depend on $N$ (denote by $Q$ the number of clusters), and that, for each $q = 1, \ldots, Q$, the sequences $(x^-_{q,N})$ and $(x^+_{q,N})$ converge to $x^-_q$ and $x^+_q$ respectively. In this context, theorem 4 implies that, almost surely, for each $\varepsilon > 0$, each eigenvalue belongs to one of the intervals $[x^-_q - \varepsilon, x^+_q + \varepsilon]$ for $N$ large enough. We now establish the following property, which is also well known in the literature and is referred to as "exact separation" (see e.g. [3] in the context of non-Gaussian correlated zero-mean random matrices).
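The almost sure limit $\|\mathbf{W}_N\mathbf{W}_N^*\| \to (1+\sqrt{c})^2$, used above to handle semi-infinite intervals, can be illustrated as follows. The helper name `noise_norm`, the sizes and the seed are illustrative. Only approximate agreement should be expected, because the largest eigenvalue fluctuates on a scale of order $N^{-2/3}$.

```python
import numpy as np


def noise_norm(M, N, seed=0):
    """Spectral norm of W_N W_N^* for i.i.d. complex Gaussian entries of variance 1/N."""
    rng = np.random.default_rng(seed)
    W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)
    return np.linalg.eigvalsh(W @ W.conj().T)[-1]   # largest eigenvalue = spectral norm


for M, N in [(100, 200), (400, 800), (200, 800)]:
    print(M, N, noise_norm(M, N), (1 + np.sqrt(M / N)) ** 2)
```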
Under the above simplified assumptions, this result means that, almost surely, for $N$ large enough, the number of sample eigenvalues in each interval $[x^-_q - \varepsilon, x^+_q + \varepsilon]$ coincides with the number of eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ associated with the cluster $[x^-_{q,N}, x^+_{q,N}]$. To prove theorem 5, we use the same technique as in [28], where a less general result is presented in the case $c < 1$.
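Exact separation can be illustrated on a configuration whose clusters are far apart. The example below is an assumed setting, not taken from the paper. $\mathbf{B}_N\mathbf{B}_N^*$ has $K = 50$ eigenvalues equal to $100$ and $M - K = 50$ zero eigenvalues, with $M = 100$, $N = 200$ and $\sigma = 1$. Exact separation then predicts that exactly $M - K$ sample eigenvalues fall in the low cluster. The cut-off value $30$ is a hypothetical point chosen heuristically inside the gap between the two clusters.

```python
import numpy as np


def separation_eigs(M=100, N=200, K=50, lam=100.0, sigma=1.0, seed=0):
    """Sample eigenvalues when B_N B_N^* = diag(lam I_K, 0_{M-K}) (illustrative setting)."""
    rng = np.random.default_rng(seed)
    B = np.zeros((M, N))
    B[np.arange(K), np.arange(K)] = np.sqrt(lam)
    W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)
    S = B + sigma * W
    return np.linalg.eigvalsh(S @ S.conj().T)    # ascending order


eig = separation_eigs()
cut = 30.0                                       # hypothetical point inside the gap
n_low = int(np.sum(eig < cut))
# exact separation predicts n_low == M - K, i.e. 50 eigenvalues in the low cluster
print(n_low, eig[n_low - 1], eig[n_low])
```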

Preliminary results
We first state some useful preliminary lemmas. The first lemma is elementary and concerns the solutions of the equation $1 - \sigma^2 c_N f_N(w) = 0$. The next two lemmas are fundamental; their analogs for Wigner models were proved by Haagerup-Thorbjornsen in [17] (see also [8]). Lemmas 4 and 5 are established in [28, Prop. 4, Lem. 2 and proof of Th.3]. Note that, unlike in section 2, the Gaussian assumption is required here. For the reader's convenience, we give some insights into the proof of these two lemmas.
Proof: Using the integration by parts formula (see e.g. [24], [25]) and the Poincaré inequality for Gaussian random vectors [12], it is proved in [28,Prop.4 where $\chi_N$ is holomorphic on $\mathbb{C} \setminus \mathbb{R}$ and satisfies with $P_1$, $P_2$ two polynomials with positive coefficients independent of $N$ and $z$. The Stieltjes inversion formula gives The polynomial bound (14) implies the bound $\limsup_{y \downarrow 0} \left| \int_{\mathbb{R}} \psi(x)\, \chi_N(x + iy)\, dx \right| \leq C < \infty$, with $C > 0$ independent of $N$ (a result shown in [8, Sec.3.3] using the ideas of [17]). Plugging (13) into (15), we obtain the desired result.
Lemma 5 is not explicitly stated in [28], but it can easily be proved using the derivation of [28, eq. (37)].

Lemma 5.
Let $\psi \in \mathcal{C}^\infty_c(\mathbb{R}, \mathbb{R})$, independent of $N$ and constant on each cluster of $\mathcal{S}_N$ for $N$ large enough. Then, we have

Proof:
We only give a sketch of the proof for the reader's convenience. Using the Poincaré inequality for Gaussian random vectors, we obtain where the last equality follows from applying lemma 4 to the function $\lambda \mapsto \lambda\, \psi'(\lambda)^2$. The conclusion follows from the observation that this function vanishes on $\mathcal{S}_N$ for all large $N$, because $\psi$ is constant on each cluster.
We are now in position to prove theorem 5.

End of the proof
We first prove (12), and assume that $a > 0$ because (12) is obvious if $a \leq 0$. We consider $\eta < \varepsilon$ and assume without restriction that $0 < \eta < a$. We consider a function $\psi_a \in \mathcal{C}^\infty_c(\mathbb{R}, \mathbb{R})$, independent of $N$, such that $\psi_a \in [0, 1]$ and By lemma 4, we have Lemma 5 also implies that Therefore, the Markov inequality leads to which implies that, with probability one, The remainder of the proof is dedicated to the evaluation of − η, a)) = 0. By theorem 2, $\mu_N$ is absolutely continuous with density $\pi^{-1} \mathrm{Im}(m_N(x))$. Therefore, it holds that In order to evaluate the right-hand side of (17), we use the contour integral approach introduced in [28]. For this, we consider the curve $\mathcal{C}_{q,N}$ defined by It follows from items 1, 4 and 7 of lemma 2 that $\mathcal{C}_{q,N}$ is a closed continuous contour enclosing the interval $(w^-_{q,N}, w^+_{q,N})$. $\mathcal{C}_{q,N}$ is differentiable at each point except at $w^-_{q,N}$ and $w^+_{q,N}$ (see item 8 of lemma 2). However, (10) and (11) imply that $|w_N'|$ is integrable on $[x^-_{q,N}, x^+_{q,N}]$. Therefore, for each function $g$ that is continuous in a neighborhood of $\mathcal{C}_{q,N}$ and satisfies $(g(w))^* = g(w^*)$, the contour integral can still be defined: The notation $\mathcal{C}^-_{q,N}$ means that the contour $\mathcal{C}_{q,N}$ is oriented clockwise. Although $\mathcal{C}_{q,N}$ is not differentiable everywhere, the main results on contour integrals of meromorphic functions remain valid. In particular, it holds that In order to evaluate the right-hand side of (17) using a contour integral, we remark that $\forall x \in \mathbb{R} \setminus \partial \mathcal{S}_N$ (see (8) and item 3 of lemma 2). Moreover, by item 5 of lemma 2, we have $w_N'(x)\, \phi_N'(w_N(x)) = 1$ on $(x^-_{q,N}, x^+_{q,N})$. Therefore, we have where $g_N(w)$ is the rational function defined by .
In order to justify the existence of the integral on the right-hand side of (18), we prove that $g_N(w)$ is continuous in a neighborhood of $\mathcal{C}_{q,N}$. We first note that the poles of $g_N(w)$ coincide with the eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ and with the zeros $(z_{k,N})_{k=0,\ldots,K}$ of $1 - \sigma^2 c_N f_N(w)$. It remains to check continuity at $x^-_{q,N}$ and $x^+_{q,N}$. If $c_N < 1$, $w^-_{q,N} = w_N(x^-_{q,N})$ and $w^+_{q,N} = w_N(x^+_{q,N})$ do not coincide with any of the poles of $g_N(w)$. If $c_N = 1$ and $q = 1$, this property still holds, except for $w^-_{1,N}$ (see lemma 3). However, if $c_N = 1$, the solutions of $1 - \sigma^2 c_N f_N(w) = 0$ are not poles of $g_N$ because of a pole-zero cancellation.
can also be written as The integral can be evaluated with the residue theorem; we give here the main steps of the calculation. Define $\mathcal{K}_q = \{k \in \{1, 2, \ldots, K\} : \lambda_{k,N} \in (w^-_{q,N}, w^+_{q,N})\}$ and $L_q = \mathrm{card}(\mathcal{K}_q) > 0$ ($L_q > 0$ by item 3 of lemma 1). Assume $c_N < 1$. Since $\mathcal{C}_{q,N}$ only encloses $(w^-_{q,N}, w^+_{q,N})$, residues appear at the following points:
• for $q = 1$: residues at $z_{0,N}$, $0$, and at $z_{k,N}$, $\lambda_{k,N}$ for $k \in \mathcal{K}_1$;
• for $q \geq 2$: residues at $z_{k,N}$, $\lambda_{k,N}$ for $k \in \mathcal{K}_q$.
If $c_N = 1$, the zeros of $1 - \sigma^2 c_N f_N(w)$ are not poles of $g_N(w)$:
• for $q = 1$: residues at $0$ and $\lambda_{k,N}$ for $k \in \mathcal{K}_1$;
• for $q \geq 2$: residues at $\lambda_{k,N}$ for $k \in \mathcal{K}_q$.
We only consider the case $c_N < 1$ in the following (the calculations for $c_N = 1$ are similar and are therefore omitted). We consider the decomposition $g_N(\lambda) = g_{1,N}(\lambda) + g_{2,N}(\lambda) + g_{3,N}(\lambda)$, with These three functions admit poles at $0$ and at $(\lambda_{k,N})_{k=1,\ldots,K}$, and $g_{3,N}$ also has poles at $(z_{k,N})_{k=0,\ldots,K}$. After tedious but straightforward calculations, we find that for $k \in \{1, 2, \ldots, K\}$, For the residues at $0$, we get Finally, the residues at $z_{k,N}$ for $k = 0, \ldots, K$ are given by $\mathrm{Res}(g_{3,N}, z_{k,N}) = \frac{1 - c_N}{c_N}$. Using these evaluations, we immediately obtain that, if $q \geq 2$, then This coincides with the proportion of eigenvalues of $\mathbf{B}_N \mathbf{B}_N^*$ associated with the cluster $[x^-_{q,N}, x^+_{q,N}]$ (i.e. the eigenvalues $\lambda_{k,N}$ for $k \in \mathcal{K}_q$). If $q = 1$, Therefore, using (16), we get that But almost surely, for $N$ large enough, $\mathrm{Tr}\, \psi_a(\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*)$ is exactly the number of eigenvalues in $[0, a]$, because no eigenvalue of $\boldsymbol{\Sigma}_N \boldsymbol{\Sigma}_N^*$ belongs to $[a - \eta, a]$ (apply theorem 4 with $a - \eta$ in place of $a$). The left-hand side of (19) is thus an integer. Since this integer is $\mathcal{O}(N^{-1/3})$, it is equal to zero for $N$ large enough. (12) follows from the observation that

Applications to the spiked models
In this section, we use the above results to evaluate the behaviour of the largest eigenvalues of information plus noise spiked models. In the remainder of this section, we make the following assumption. Assumption A-5: $K$ does not depend on $N$ and, for all $k = 1, \ldots, K$, the positive sequence $(\lambda_{k,N})$ can be written as $\lambda_{k,N} = \lambda_k + \epsilon_{k,N}$ with $\lim_{N \to +\infty} \epsilon_{k,N} = 0$, and $\lambda_i \neq \lambda_j$ for $i \neq j$.
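We define $K_s = \max\{k : \lambda_k > \sigma^2 \sqrt{c}\}$ and the function $\psi(\lambda) = \frac{(\sigma^2 + \lambda)(\sigma^2 c + \lambda)}{\lambda}$. In the following, we characterize the support $\mathcal{S}_N$ of measure $\mu_N$, and we use the above results on the almost sure location of the sample eigenvalues to prove the following theorem.
Theorem 6. With probability one, $\hat{\lambda}_{k,N} \to \psi(\lambda_k)$ for $k = 1, \ldots, K_s$, and $\hat{\lambda}_{k,N} \to \sigma^2 (1 + \sqrt{c})^2$ for $k = K_s + 1, \ldots, K$.
We note that theorem 6 was already proved in the recent paper [7] using a different approach.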
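An elementary computation shows that both the threshold $\sigma^2\sqrt{c}$ and the limit $\sigma^2(1+\sqrt{c})^2$ in theorem 6 come from the shape of $\psi$ on $(0,+\infty)$. The threshold is the minimizer of $\psi$, and the minimum value is the right edge of the noise-only spectrum. Since $\psi$ is increasing beyond the threshold, spikes above it produce limits strictly larger than that edge:

$$
\begin{aligned}
\psi(\lambda) &= \frac{(\sigma^2+\lambda)(\sigma^2 c+\lambda)}{\lambda} = \lambda + \sigma^2(1+c) + \frac{\sigma^4 c}{\lambda},
\qquad \psi'(\lambda) = 1 - \frac{\sigma^4 c}{\lambda^2}, \\
\psi'(\lambda) &< 0 \ \text{on } (0, \sigma^2\sqrt{c}), \qquad \psi'(\lambda) > 0 \ \text{on } (\sigma^2\sqrt{c}, +\infty), \\
\psi(\sigma^2\sqrt{c}) &= \sigma^2\sqrt{c} + \sigma^2(1+c) + \sigma^2\sqrt{c} = \sigma^2(1+\sqrt{c})^2 .
\end{aligned}
$$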

Preliminary results on perturbed equations
We first state two useful lemmas related to the solutions of perturbed equations. They can be interpreted as extensions of lemmas 3.2 and 3.3 of [5]. In the following, we denote by $\mathcal{D}_o(z, r)$, $\mathcal{D}_c(z, r)$ and $\mathcal{C}(z, r)$ the open disk, the closed disk and the circle of radius $r > 0$ centered at $z$, respectively. Moreover, in this paragraph, the notation $o(1)$ denotes a term that converges to $0$ when the variable $\varepsilon$ converges to $0$. The first result is a straightforward modification of [5, Lem.3.2], so its proof is omitted.
The second result is an extension of [5, Lem.3.3] to certain third degree equations. The proof is given in Appendix C.

Characterization of $\mathcal{S}_N$
From the results of the previous section, it is clear that, almost surely, Therefore, we end up with Since $\psi(\lambda) \to \sigma^2 (1 + \sqrt{c})^2$ when $\lambda \to \sigma^2 \sqrt{c}$, this completes the proof of theorem 6.

B Proof of items 6 and 8 of lemma 2 when q = 1
In order to prove these two statements, we study the behaviour of $w_N(x)$ and of $w_N'(x)$ when $x \to 0$ with $x < 0$, and when $x \to 0$ with $x > 0$.