1 Introduction

1.1 Perturbation of Eigenvalues of Self-Adjoint Matrices

In the introduction we will primarily focus on the finite-dimensional case, although these results will be extended to operators on separable Hilbert spaces. We denote by \({{\mathcal {H}}}_n\) the set of self-adjoint matrices of size \(n\times n\). Given \(A\in {{\mathcal {H}}}_n\), the spectral decomposition gives a unitary matrix \(U_A\) and a diagonal matrix \(\Lambda _\alpha \) such that

$$\begin{aligned} A=U_A\Lambda _\alpha U_A^*. \end{aligned}$$
(1.1)

Here the columns of \(U_A\) are the eigenvectors, \(\alpha \) is the vector of eigenvalues and \(\Lambda _\alpha \) denotes the corresponding diagonal matrix

$$\begin{aligned} \Lambda _\alpha = \begin{pmatrix} \alpha _1 & & & \\ & \alpha _2 & & \\ & & \ddots & \\ & & & \alpha _n \end{pmatrix}. \end{aligned}$$

Given a “small” self-adjoint matrix E we ask how \(\alpha \) changes upon replacing A with \(A+E\). The literature on this topic is immense and can roughly be divided into two groups. One group “freezes” the matrix E and considers \(A+tE\) as a function of the complex variable t, giving rise to a beautiful and rich connection with algebra and analytic function theory [3, 15, 31]. However, this approach lacks a global perspective, in the sense that E is fixed and not a free variable. The second group of results does not “freeze” E, with weaker, more general results as a consequence, such as the estimates by Geršgorin, Weyl, Stewart and Bauer-Fike, to name a few. In this article we present uniform third order bounds for the eigenvalues of the perturbation \(A+E\) under the condition that E is small. Hence we work in a framework which in a sense fits in between, by providing estimates that are global in E but only interesting for small \(\Vert E\Vert \).

A key result in this field is the fact that the eigenvalues \(\xi (t)\) of \(A+tE\) are real analytic functions of \(t\in {{\mathbb {R}}}\) (given a suitable ordering). This result is due to F. Rellich in a sequence of articles from the 1930s [23], and a simple proof in the finite-dimensional setting is found in his monograph [24] (or consult Theorem 6.1, Ch. II in [15]). Even before that, the coefficients of the corresponding series expansion were computed by Lord Rayleigh and later Schrödinger, although they lacked a general proof that the corresponding series converged. These coefficients are typically found in the literature on mathematical physics and quantum physics, rather than in books on pure mathematics such as Kato’s seminal work [15] or Rellich’s own monograph [24], for that matter. For example, they are computed in Reed-Simon’s book [22], Ch. XII.1, using complex analytic tools, under the assumption that the eigenvalues of A are simple. Even without this assumption, Courant and Hilbert [8] compute them by making a simple ansatz and backing out their values from a set of equation systems, see Ch. 5.13. While these coefficients have very complicated expressions, the first and second order terms are manageable: if we suppose for simplicity that a basis has been chosen such that \(A=\Lambda _\alpha \), and moreover such that \(E(i,j)=0\) whenever \(\alpha _i=\alpha _j\) and \(i\ne j\), then

$$\begin{aligned} \xi _j(t)=\alpha _j+t E(j,j)+t^2\sum _{k:\alpha _k\ne \alpha _j}\frac{| E(k,j)|^2}{\alpha _j-\alpha _k}+O(t^3). \end{aligned}$$
(1.2)

Despite the beauty of this formula, it lacks a global perspective. In his 1953 monograph, Rellich himself points out that even introducing two unknown parameters in the perturbation leads to a lack of analyticity and unpredictable behavior. In this paper we prove a generalization of (1.2) which holds for all E (with uniform control on the \(O(\Vert E\Vert ^3)\)-error term).
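The expansion (1.2) is easy to probe numerically. The following NumPy sketch (our own illustration; the matrix A, the random seed and the sizes are arbitrary choices, not taken from the text) compares the eigenvalues of \(A+tE\) with the second order approximation:

```python
import numpy as np

# Sanity check of (1.2). A, the seed and the sizes are illustrative choices.
rng = np.random.default_rng(0)

alpha = np.array([0.0, 0.0, 1.0, 3.0])   # eigenvalues of A, one repeated
A = np.diag(alpha)
n = len(alpha)

# Random self-adjoint E with E(i,j) = 0 whenever alpha_i = alpha_j and i != j,
# as assumed before (1.2).
E = rng.standard_normal((n, n))
E = (E + E.T) / 2
for i in range(n):
    for j in range(n):
        if i != j and alpha[i] == alpha[j]:
            E[i, j] = 0.0

def second_order(j, t):
    # alpha_j + t E(j,j) + t^2 sum_{k: alpha_k != alpha_j} |E(k,j)|^2/(alpha_j - alpha_k)
    s = sum(abs(E[k, j]) ** 2 / (alpha[j] - alpha[k])
            for k in range(n) if alpha[k] != alpha[j])
    return alpha[j] + t * E[j, j] + t ** 2 * s

for t in (1e-2, 1e-3):
    xi = np.sort(np.linalg.eigvalsh(A + t * E))
    approx = np.sort([second_order(j, t) for j in range(n)])
    err = np.max(np.abs(xi - approx))
    print(t, err)   # the error decays roughly like t^3
```

Decreasing t by a factor 10 should decrease the error by roughly a factor 1000, in line with the \(O(t^3)\) term.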

1.2 Novelties

Let \(A\in {{\mathcal {H}}}_n\) be a fixed given matrix, let \(\lambda \) be any one of its eigenvalues, and let m be the dimension of the eigenspace corresponding to \(\lambda \). Let \(U_A\) be a unitary matrix such that

$$\begin{aligned} A = U_A{\hat{A}} U_A^*,\quad {\hat{A}} = \begin{pmatrix} \lambda I_m & 0 \\ 0 & {\hat{A}}_{22} \end{pmatrix} \end{aligned}$$

where \(I_m\) is the \(m\times m\) identity matrix and \({\hat{A}}_{22}\) is a matrix whose eigenvalues are distinct from \(\lambda \). Now consider an arbitrary self-adjoint perturbation \(E\in {{\mathcal {H}}}_n\). Let \(\hat{E} = U_A^*E U_A\) and introduce the block decomposition

$$\begin{aligned} {\hat{E}} = \begin{pmatrix} {\hat{E}}_{11} & {\hat{E}}_{12} \\ {\hat{E}}_{21} & {\hat{E}}_{22} \end{pmatrix} \end{aligned}$$

where \({\hat{E}}_{11}\) is \(m\times m\). We now introduce a matrix B by

$$\begin{aligned} B = {\hat{E}}_{11}-{\hat{E}}_{12}({\hat{A}}_{22}-\lambda I_{n-m})^{-1}{\hat{E}}_{21}. \end{aligned}$$

For simplicity, we here present the results only in the finite-dimensional setting, since they are new also for matrices (at least for \(m>1\)).

Theorem 1.1

Let \(\lambda \) be a fixed eigenvalue of \(A\in {{\mathcal {H}}}_n\) of multiplicity m, and let \(\{\beta _j\}_{j=1}^{m}\) be the eigenvalues of B. Then the eigenvalues \(\{\xi _j\}_{j=1}^{n}\) of \(A+E\) can be arranged such that

$$\begin{aligned} \xi _j = \lambda +\beta _j+O(\left\| E\right\| ^3), \quad 1\le j \le m. \end{aligned}$$

Example

As an example, we took the diagonal operator

$$\begin{aligned} A = \textsf{diag} (0,0,0,1,2,3,4,5,6,7,8,9) \end{aligned}$$

and perturbed it with a random self-adjoint matrix E with norm \(10^{-1}\). In this manner we obtained \((-0.04172, -0.01581, 0.03698)\) for the three smallest values of \(\xi \), whereas the eigenvalues \(\beta _j\) of B were \((-0.04181, -0.01581, 0.03698)\), so the maximal error is \(9\cdot 10^{-5}\), well below \(10^{-3}=\Vert E\Vert ^3\). When \(\Vert E\Vert \) is lowered to \(10^{-4}\), the corresponding error drops to just below \(10^{-12}\), in perfect harmony with Theorem 1.1.
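The experiment above can be repeated with a few lines of NumPy. The sketch below uses its own random E (so the individual digits differ from those quoted), computes B as in Sect. 1.2 with \(\lambda =0\), and compares with the eigenvalues of \(A+E\):

```python
import numpy as np

# Same setup as the example, with a fresh random E (illustrative seed).
rng = np.random.default_rng(1)

A = np.diag([0.0, 0.0, 0.0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
n, m, lam = 12, 3, 0.0

E = rng.standard_normal((n, n))
E = (E + E.T) / 2
E *= 1e-4 / np.linalg.norm(E, 2)               # prescribe the operator norm

# A is already diagonal, so U_A = I and the blocks can be read off directly.
E11, E12, E21 = E[:m, :m], E[:m, m:], E[m:, :m]
A22 = A[m:, m:]
B = E11 - E12 @ np.linalg.solve(A22 - lam * np.eye(n - m), E21)

xi = np.sort(np.linalg.eigvalsh(A + E))[:m]    # the m eigenvalues near lambda
beta = np.sort(np.linalg.eigvalsh(B))
err = np.max(np.abs(xi - (lam + beta)))
print(err)   # of the order of ||E||^3 = 1e-12
```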

We shall also extend the theorem to certain operators on separable Hilbert spaces, which is needed for applications to quantum physics. However, the above result is, to the best of our knowledge, new also for matrices. The proof in the infinite-dimensional setting is more involved, mainly since we cannot rely on Geršgorin’s circle theorem.

As a corollary, in Sect. 4 we show that

$$\begin{aligned} \xi _j=\lambda + \varepsilon _j+O(\Vert E\Vert ^2) \end{aligned}$$
(1.3)

where \(\{\varepsilon _j\}_{j=1}^m\) are the eigenvalues of \({\hat{E}}_{11}\). Although the latter estimate is much simpler to prove, we have not been able to find it in the literature either, for the case \(m>1\). Another corollary, also new for multiplicity greater than one, pertains to the singular value decomposition, but since this is a bit lengthy to state we leave the details to Sect. 5.

The only results that we have found in the existing literature that partially overlap with our results are those by G. W. Stewart, see e.g. [27], and in collaboration with J. Sun [28]. In particular, Theorem 1.1 is an extension of the key result of [27] (which also appears as Lemma 4.5 in [28]) to the case of multiplicity higher than one. In the case of \(\lambda \) being a simple eigenvalue of A, our method can be used iteratively to provide formulas with error \(O(\left\| E\right\| ^k)\), which is another extension of the above mentioned results, see Corollary 4.2. The application to singular values extends Theorem 4.6 of [28], again with the only difference that we include the case of multiplicity greater than one. We consider this application in the final section. Although not immediately obvious, the formula (1.2) for the first and second order Rayleigh-Schrödinger coefficients is a special case of the above theorem, obtained by inserting tE in place of E. (We refer to the article [5], which is a preliminary version of the present work containing a number of additional results and observations.)

Our proof will be based on an iteration of a lemma due to Issai Schur and avoids the use of Geršgorin type estimates, although, based on new results [6], it would have been possible to give a proof along those lines as well. However, we prefer the present self-contained version.

1.3 Related Works

It seems that E. Schrödinger was one of the first to postulate some results and conjectures concerning perturbation series in [25]; in particular, this paper contains the first coefficients in the series expansion (1.2), which Schrödinger attributes to Lord Rayleigh’s investigations of harmonic vibrations of a string [29]. Such results are of key interest to mathematical physics and in particular quantum physics, where more complicated systems are considered as perturbations of simpler systems for which closed form solutions do exist. The classical example is the study of the hydrogen atom, see Example 3, Section XII.2, in Reed and Simon [22]. Many more interesting examples are found in the same chapter, and the “Notes” section contains a more extensive historical exposition. Other books on quantum mechanics that treat perturbation theory include [8, 16, 17]. For a more recent application to quantum information theory, see [10].

F. Rellich was the first to systematically study the topic, in a sequence of papers from the 1930s and 40s (Störungstheorie der Spektralzerlegung I-V), and in particular he proved Schrödinger’s conjectures and established analyticity of eigenvalues and eigenprojections for perturbations (depending analytically on one parameter) of self-adjoint operators. The area was very active through the 1950s and 60s, which led to the classic [15] by T. Kato (see in particular Sec. 6, Ch. II), still today a key reference on perturbation theory. This work is continued e.g. in Baumgärtel [3].

In parallel, global bounds for the perturbation of eigenvalues go back to H. Weyl around 1910 [32], where in particular the famous “Weyl perturbation theorem” is established. Improvements were then given e.g. by Hoffman-Wielandt, Bauer-Fike, Mirsky and later Bhatia. It seems that (1.3) is a sort of local improvement of Weyl’s, Bauer-Fike’s and Hoffman-Wielandt’s theorems, in the sense that these results give less accurate information on the eigenvalues of \(A+E\) than (1.3) for small E. We refer to [14] (Ch. 6), [28] (Ch. IV and V) and [4] (Ch. VI and VII) for more information on this type of result.

From the numerical perspective we mention the books [9, 11, 21] and [33]. Other more recent contributions to perturbation theory for self-adjoint matrices include [2, 12, 18, 20, 30], but the results are of a different nature than those presented here. For example, Section 9 of [18] tries to understand the local behavior of eigenvalues using so called Clarke subdifferentials. The recent article [26] treats the use of Schur complements in spectral problems, but seems to have no overlap with the present article. See also Ch. 15 of [13] for an overview of modern results. The fairly recent book [34] contains a compilation of known uses of Schur complements, see in particular Section 2.5 containing a large amount of estimates on eigenvalues in the self-adjoint case.

Finally, we also mention the works [1, 7, 19] which, among other things, consider the topic of localizing the spectrum as well as contain a number of results on convergence of the spectrum when \((A_n)_{n=1}^{\infty }\) is a sequence of operators converging, in some sense, to A. However, there seems to be no overlap between those results and the theory presented in this paper.

2 The Case of Operators

Let A be a bounded self-adjoint operator on a separable Hilbert space. In general, the spectrum \(\sigma (A)\) of such an operator can be rather complicated, but the discrete spectrum behaves much like the eigenvalues of matrices (we follow [22] for terminology, which differs slightly between various books; see in particular Sec. XII.2). We recall that the discrete spectrum is the complement of the essential spectrum, and that \(\lambda \) lies in the discrete spectrum if and only if it is an isolated point in \(\sigma (A)\) with a finite-dimensional spectral projection, defined e.g. by the Riesz-Cauchy functional calculus

$$\begin{aligned} P_\lambda (A)=\int _{\partial \Omega } (\zeta -A)^{-1}\frac{d\zeta }{2\pi i}, \end{aligned}$$
(2.1)

where \(\Omega \) is a disc around \(\lambda \) at positive distance from the remaining spectrum. The “isolation distance” is the distance between \(\lambda \) and the remainder of the spectrum. For a matrix, the algebraic multiplicity of an eigenvalue coincides with the rank of the corresponding spectral projection. In the infinite-dimensional setting we define the algebraic multiplicity of an eigenvalue \(\lambda \) as the rank of the corresponding projection \(P_{\lambda }\). Hence, if \(\lambda \) has algebraic multiplicity m as an eigenvalue of the self-adjoint operator A, then \(A+E\) will have precisely m eigenvalues inside \(\Omega \) for small enough E, and these are part of the discrete spectrum of \(A+E\). In what follows we denote them by \(\xi _1,\ldots ,\xi _m\), and we assume that they are ordered non-increasingly. Upon introducing an orthonormal basis, we may identify the Hilbert space in question with \(\ell ^2({{\mathbb {N}}})\), and subsequently any linear operator can be identified with an infinite matrix in the obvious way.
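For matrices, the Riesz-Cauchy integral (2.1) can even be evaluated numerically by discretizing the contour. In the following illustrative sketch (our own choice of A, radius and discretization, not taken from the text), the trapezoidal rule recovers the spectral projection, whose rank is the algebraic multiplicity:

```python
import numpy as np

# Riesz-Cauchy projection (2.1) for a small matrix, with the contour integral
# discretized by the trapezoidal rule. A, the radius r and N are illustrative.
A = np.diag([1.0, 1.0, 3.0, 4.0])   # lambda = 1 has multiplicity 2
lam, r, N = 1.0, 0.5, 200           # disc well separated from {3, 4}

P = np.zeros((4, 4), dtype=complex)
for t in np.linspace(0.0, 2 * np.pi, N, endpoint=False):
    z = lam + r * np.exp(1j * t)                    # point on the contour
    dz = 1j * r * np.exp(1j * t) * (2 * np.pi / N)  # d(zeta) at this node
    P += np.linalg.inv(z * np.eye(4) - A) * dz / (2j * np.pi)

print(np.round(P.real, 8))             # approximately diag(1, 1, 0, 0)
mult = int(np.round(np.trace(P).real))
print(mult)                            # rank = algebraic multiplicity = 2
```

Since the integrand is analytic and periodic in t, the trapezoidal rule converges extremely fast here.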

Now let \(\lambda \) be a particular eigenvalue of multiplicity m in the discrete spectrum of A, and assume that the basis has been chosen so that A has the form

$$\begin{aligned} A = \begin{pmatrix} \lambda I_m & 0 \\ 0 & A_{22} \end{pmatrix}, \end{aligned}$$

where \(A_{22}\) is not necessarily diagonal. Denote by \(E_{11},~E_{12},~E_{21}\) and \(E_{22}\) the operators in the corresponding block decomposition of E (where \(E_{11}\) is an \(m\times m\) matrix). Set

$$\begin{aligned} B = E_{11}- E_{12}(A_{22}-\lambda I)^{-1} E_{21}, \end{aligned}$$
(2.2)

and let \(\beta \) denote the eigenvalues of B, which we assume are ordered non-increasingly. The extension of Theorem 1.1 then reads as follows.

Theorem 2.1

Let A and E be bounded self-adjoint operators on a separable Hilbert space. If \(\lambda \) is an eigenvalue of A of multiplicity m then there are eigenvalues \(\xi _1,\dotsc ,\xi _m\) of \(A+E\) ordered non-increasingly such that

$$\begin{aligned} \xi _j = \lambda +\beta _j+O(\left\| E\right\| ^3), \quad 1\le j \le m, \end{aligned}$$

where \(\beta _j\) are the eigenvalues of B from (2.2), ordered non-increasingly.

3 The Proofs

It is easy to see that Theorem 1.1 can be obtained from Theorem 2.1, so we content ourselves with proving the latter. A key ingredient will be the concept of the Schur complement, as introduced by Issai Schur. Given a matrix representation

$$\begin{aligned} F = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix} \end{aligned}$$
(3.1)

the Schur complement of F with respect to the block \(F_{22}\) is denoted by \(F/F_{22}\) and is defined via

$$\begin{aligned} F/F_{22} = F_{11}-F_{12}F_{22}^{-1}F_{21}. \end{aligned}$$
(3.2)

If we recall (2.2) we see that the operator B introduced there is in fact a Schur complement.

Lemma 3.1

(Schur) Let F be an operator acting on \(\ell ^2({{\mathbb {N}}})\) whose matrix representation is given as in (3.1), and assume that \(F_{22}\) is invertible. Then F is similar, via a change of basis, to both of the operators

$$\begin{aligned}&\begin{pmatrix} F/F_{22} & F_{12} \\ F_{22}^{-1}F_{21}(F/F_{22}) & F_{22}+F_{22}^{-1}F_{21}F_{12} \end{pmatrix}, \end{aligned}$$
(3.3)
$$\begin{aligned}&\begin{pmatrix} F/F_{22} & (F/F_{22})F_{12}F_{22}^{-1} \\ F_{21} & F_{22}+F_{21}F_{12}F_{22}^{-1} \end{pmatrix}. \end{aligned}$$
(3.4)

Proof

The matrix \( J= \begin{pmatrix} I & 0 \\ F_{22}^{-1}F_{21} & I \end{pmatrix} \) is invertible with inverse \(J^{-1}=\begin{pmatrix} I & 0\\ -F_{22}^{-1}F_{21} & I \end{pmatrix}\). The first result follows by computing \(JFJ^{-1}\), and the second identity can be proved by a similar argument (or by applying the first to the adjoint). \(\square \)

Decompositions similar to the one in Lemma 3.1 appear throughout the study of Schur complements. This particular result appears for instance as Theorem 1.6 in [34].
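The similarity (3.3) is easy to verify numerically for matrices. In the sketch below (random blocks of illustrative sizes, with \(F_{22}\) shifted so that it is safely invertible) we compute \(JFJ^{-1}\) directly and compare with the right hand side of (3.3):

```python
import numpy as np

# Numerical check of the similarity (3.3); sizes and seed are illustrative.
rng = np.random.default_rng(2)
m, k = 2, 3

F = rng.standard_normal((m + k, m + k))
F[m:, m:] += 3 * np.eye(k)               # push F22 away from singularity
F11, F12 = F[:m, :m], F[:m, m:]
F21, F22 = F[m:, :m], F[m:, m:]

F22inv = np.linalg.inv(F22)
S = F11 - F12 @ F22inv @ F21             # Schur complement F/F22, cf. (3.2)

# J F J^{-1} with J = [[I, 0], [F22^{-1} F21, I]] should reproduce (3.3).
J = np.block([[np.eye(m), np.zeros((m, k))],
              [F22inv @ F21, np.eye(k)]])
lhs = J @ F @ np.linalg.inv(J)
rhs = np.block([[S, F12],
                [F22inv @ F21 @ S, F22 + F22inv @ F21 @ F12]])
print(np.allclose(lhs, rhs))   # True
```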

Concerning the proof of Theorem 2.1, there is clearly no loss in generality in assuming that \(\lambda = 0\) since \(A+E-\lambda I\) has the same eigenvalues as \(A+E\) apart from a translation by \(\lambda \). Moreover we may assume that an orthonormal basis has been chosen so that A has the form

$$\begin{aligned} A = \begin{pmatrix} 0 & 0 \\ 0 & A_{22} \end{pmatrix} \end{aligned}$$
(3.5)

where \(A_{22}\) is self-adjoint and invertible. We let E denote a self-adjoint perturbation and write its corresponding block representation as

$$\begin{aligned} E = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}. \end{aligned}$$
(3.6)

Note that E here coincides with \(\hat{E}=U_A^*E U_A\) from the introduction, as we are assuming that A is diagonal to start with. The Schur complement of \(A+E\) with respect to \(A_{22}+E_{22}\) equals

$$\begin{aligned} {\tilde{B}}:= (A+E)/(A_{22}+E_{22}) = E_{11}-E_{12}(A_{22}+E_{22})^{-1}E_{21}, \end{aligned}$$

which should be compared with the operator B from the introduction, which in the present setting takes the form

$$\begin{aligned} B = E_{11}-E_{12}A_{22}^{-1}E_{21}. \end{aligned}$$

We will need the following result relating the eigenvalues of B with those of \(\tilde{B}\). Note that both matrices are self-adjoint.

Lemma 3.2

Let the eigenvalues of B and \({\tilde{B}}\), ordered non-increasingly, be denoted \(\beta \) and \(\tilde{\beta }\) respectively. Then

$$\begin{aligned} \beta _j=\tilde{\beta }_j+O(\left\| E\right\| ^3),\quad 1\le j \le m. \end{aligned}$$
(3.7)

Proof

We consider the difference

$$\begin{aligned} B-\tilde{B}&= E_{12}(A_{22}+E_{22})^{-1}E_{21}-E_{12}A_{22}^{-1}E_{21}\\&= E_{12}A_{22}^{-1}\Big (A_{22}-(A_{22}+E_{22})\Big )(A_{22}+E_{22})^{-1}E_{21}\\&= -E_{12}A_{22}^{-1}E_{22}(A_{22}+E_{22})^{-1}E_{21}. \end{aligned}$$

Thus \(\Vert B-\tilde{B}\Vert =O(\Vert E\Vert ^3)\), and the desired result follows from Weyl’s perturbation inequality, which implies that \(|\beta _j-\tilde{\beta }_j|\le \Vert B-\tilde{B}\Vert \). \(\square \)
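The factorization of \(B-\tilde{B}\) in the proof can be illustrated numerically. The following sketch (illustrative sizes, spectrum of \(A_{22}\) and \(\Vert E\Vert =10^{-3}\), all our own choices) confirms that \(\Vert B-\tilde{B}\Vert \) is of the order \(\Vert E\Vert ^3\):

```python
import numpy as np

# Illustration of Lemma 3.2: ||B - Btilde|| is third order in ||E||.
rng = np.random.default_rng(3)
m, k = 2, 4

A22 = np.diag(rng.uniform(1.0, 2.0, k))   # invertible, spectrum away from 0

E = rng.standard_normal((m + k, m + k))
E = (E + E.T) / 2
E *= 1e-3 / np.linalg.norm(E, 2)
E11, E12, E21, E22 = E[:m, :m], E[:m, m:], E[m:, :m], E[m:, m:]

B = E11 - E12 @ np.linalg.solve(A22, E21)
Btilde = E11 - E12 @ np.linalg.solve(A22 + E22, E21)

diff = np.linalg.norm(B - Btilde, 2)
print(diff)   # roughly bounded by ||E||^3 / delta^2, with delta = 1 here
```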

Armed with these results, we are now in position to prove the main result.

Proof of Theorem 2.1

Due to Lemma 3.2, it suffices to prove that the eigenvalues \(\xi _1,\ldots ,\xi _m\) of \(A+E\) can be arranged such that

$$\begin{aligned} \xi _j = {\tilde{\beta }}_j+O(\left\| E\right\| ^3), \quad 1\le j \le m, \end{aligned}$$

keeping in mind that we have set \(\lambda =0\). We can then rewrite \(A+E\) as

$$\begin{aligned} \begin{pmatrix} 0 & 0 \\ 0 & A_{22} \end{pmatrix} + \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix} = \begin{pmatrix} \tilde{B}+E_{12}(A_{22}+E_{22})^{-1}E_{21} & E_{12} \\ E_{21} & A_{22}+E_{22} \end{pmatrix}. \end{aligned}$$

Applying (3.3) from Lemma 3.1, we find that \(A+E\) is similar to

$$\begin{aligned}&\begin{pmatrix} \tilde{B} & E_{12}\\ (A_{22}+E_{22})^{-1}E_{21}\tilde{B} & A_{22}+E_{22}+(A_{22}+E_{22})^{-1}E_{21}E_{12} \end{pmatrix} \\&\quad = \begin{pmatrix} \tilde{B} & E_{12}\\ O(\Vert E\Vert ^2) & A_{22}+O(\Vert E\Vert ) \end{pmatrix}. \end{aligned}$$

For sufficiently small values of \(\left\| E\right\| \) the operator \(A_{22}+O(\left\| E\right\| )\) is invertible (since \(A_{22}\) is, by construction). Therefore the similarity (3.3) of Lemma 3.1 is applicable once more, and we find that \(A+E\) is similar to

$$\begin{aligned}&\begin{pmatrix} \tilde{B}-E_{12}(A_{22}+O(\Vert E\Vert ))^{-1}(A_{22}+E_{22})^{-1}E_{21}\tilde{B} & E_{12} \\ (A_{22}+O(\Vert E\Vert ))^{-1}O(\Vert E\Vert ^2)\big (\tilde{B}-E_{12}(A_{22}+O(\Vert E\Vert ))^{-1}O(\Vert E\Vert ^2)\big ) & A_{22}+O(\Vert E\Vert ) \end{pmatrix} \nonumber \\&\quad =\begin{pmatrix} \tilde{B}-E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & E_{12} \\ O(\Vert E\Vert ^3) & A_{22}+O(\Vert E\Vert ) \end{pmatrix} \end{aligned}$$
(3.8)

where we used that \(\tilde{B}=O(\Vert E\Vert )\). Again the lower right corner of the matrix is of the form \(A_{22}+O(\Vert E\Vert )\) and hence this block will be invertible for small enough E. A final application of (3.3) from Lemma 3.1 gives us that

$$\begin{aligned} A+E\sim \begin{pmatrix} \tilde{B}-E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & E_{12} \\ O(\Vert E\Vert ^4) & A_{22}+O(\Vert E\Vert ) \end{pmatrix}. \end{aligned}$$

(This step is only needed for the improved estimate (3.14) below.)

At this point, it is possible to conclude Theorem 1.1 by carefully invoking a new extension of Geršgorin’s theorem to the Hilbert space setting, see [6]. However, we prefer to present a self-contained proof as follows.

We now apply Lemma 3.1 to obtain a similar operator where also the upper right corner is \(O(\Vert E\Vert ^4)\), and then rely on a careful use of the Riesz-Cauchy functional calculus. To begin with, we apply (3.4) from Lemma 3.1. Note that the Schur complement denoted \(F/F_{22}\) in the lemma is in this case equal to \(\tilde{B} -E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4)\) while \(\tilde{B} = O(\Vert E\Vert )\). We apply the lemma three times to conclude that

$$\begin{aligned}&A+E\sim \begin{pmatrix} \tilde{B} -E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & O(\Vert E\Vert )E_{12}(A_{22}+O(\Vert E\Vert ))^{-1}\\ O(\Vert E\Vert ^4) & A_{22}+O(\Vert E\Vert ) \end{pmatrix}\nonumber \\&\quad = \begin{pmatrix} \tilde{B} -E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & O(\Vert E\Vert ^2)\\ O(\Vert E\Vert ^4) & A_{22}+O(\Vert E\Vert ) \end{pmatrix} \nonumber \\&\quad \sim \begin{pmatrix} \tilde{B}-E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & O(\Vert E\Vert )(A_{22}+O(\Vert E\Vert ))^{-1}O(\Vert E\Vert ^2)\\ O(\Vert E\Vert ^4) & A_{22}+O(\Vert E\Vert ) \end{pmatrix}\nonumber \\&\quad \sim \begin{pmatrix} \tilde{B}-E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & O(\Vert E\Vert ^4)\\ O(\Vert E\Vert ^4) & A_{22}+O(\Vert E\Vert ) \end{pmatrix}\nonumber \\&\quad =\underbrace{\begin{pmatrix} \tilde{B} & 0\\ 0 & A_{22}+H(E) \end{pmatrix}}_{\tilde{A}}+\underbrace{\begin{pmatrix} -E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & O(\Vert E\Vert ^4)\\ O(\Vert E\Vert ^4) & 0 \end{pmatrix}}_{{\tilde{E}}} \end{aligned}$$
(3.9)

where H(E) denotes an operator that is not necessarily self-adjoint but \(O(\Vert E\Vert )\). We denote the first operator in the last expression by \({\tilde{A}}\) and the second by \({\tilde{E}}\), so that both depend on E and \(\Vert {\tilde{E}}\Vert =O(\Vert E\Vert ^3)\). Let \(\delta \) be the distance from the spectrum of \(A_{22}\) to 0. We now fix E with the only restriction that \(\Vert E\Vert <r\), where we assume that \(r>0\) is small enough that \(\Vert H(E)\Vert \le \delta /3\) and \(2\Vert {\tilde{E}}\Vert <\delta /6\). Since also \(\tilde{B}\) depends continuously on E, we can in addition assume that r is such that \(|\tilde{\beta }_j|<\delta /6\) holds for all \(1\le j\le m\).

Given any \(\zeta \in {{\mathbb {C}}}\) with \(|\zeta |<\delta /3\), we shall now prove that

$$\begin{aligned} \big (\zeta I-(A_{22}+H(E))\big )^{-1}=\big (I-(\zeta I-A_{22})^{-1}H(E)\big )^{-1}(\zeta I-A_{22})^{-1} \end{aligned}$$
(3.10)

exists and is bounded by \(3/\delta \). To see this, first observe that since \((\zeta I-A_{22})^{-1}\) is a normal operator its norm equals its spectral radius, and the spectrum of \(\zeta I-A_{22}\) is outside the disc with radius \(2\delta /3\). Hence \(\Vert (\zeta I-A_{22})^{-1}\Vert \le \frac{3}{2\delta }\) so \(\Vert (\zeta I-A_{22})^{-1}H(E)\Vert \le \frac{3}{2\delta }\frac{\delta }{3}=\frac{1}{2}\) and therefore

$$\begin{aligned} \big (I-(\zeta I-A_{22})^{-1}H(E)\big )^{-1}=\sum _{k=0}^\infty \big ((\zeta I-A_{22})^{-1}H(E)\big )^{k} \end{aligned}$$

which has norm less than or equal to 2. Since \(2\frac{3}{2\delta }=\frac{3}{\delta }\), the desired estimate follows from (3.10).

Let \(\Omega \) be the union of the open discs with centers at \({{\tilde{\beta }}}_j\), \(j=1,\ldots ,m\), and radius \(2\Vert {\tilde{E}}\Vert \). Since \(2\Vert {\tilde{E}}\Vert <\delta /6\) and \(|{{\tilde{\beta }}}_j|<\delta /6\), \(j=1,\ldots ,m\), by the previous assumptions, \(\Omega \) is contained in the disc with center 0 and radius \(\delta /3\). Moreover, its boundary does not intersect \(\sigma ({\tilde{A}})\). (To see this, note that \(\sigma ({\tilde{A}})=\sigma ({\tilde{B}})\cup \sigma (A_{22}+H(E))\), and the latter stays away from \(\partial \Omega \) by the existence of (3.10).) Given any F, the spectral projection of \(\tilde{A}+F\) onto the eigenspace corresponding to the eigenvalues in \(\Omega \) is then equal to

$$\begin{aligned} \int _{\zeta \in \partial \Omega }(\zeta -({\tilde{A}}+F))^{-1}\frac{d\zeta }{2\pi i}, \end{aligned}$$
(3.11)

and it depends continuously on F as long as F is small enough (so that the inverse exists for all \(\zeta \in \partial \Omega \)). Since the rank of a projection is an integer, this implies that the number of eigenvalues (counted with algebraic multiplicity) in \(\Omega \) is constant for all F in some neighborhood of 0. We now show that this neighborhood includes a disc centered at 0 with radius \(2\Vert {\tilde{E}}\Vert \). To this end, note that for any \(\zeta \in \partial \Omega \) we have

$$\begin{aligned} \left( \zeta I-{\tilde{A}}\right) ^{-1}=\begin{pmatrix} (\zeta I-\tilde{B})^{-1} & 0 \\ 0 & \big (\zeta I-(A_{22}+H(E))\big )^{-1} \end{pmatrix} \end{aligned}$$
(3.12)

whenever the inverse exists. Let \(\Omega _1,\ldots ,\Omega _K\) be the connected components of \(\Omega \), ordered so that \(\Omega _{k+1}\) always lies to the left of \(\Omega _k\), which can be done since the sets are symmetric around the real axis (as \(\tilde{B}\) is self-adjoint). Note that \(\Vert (\zeta I-\tilde{B})^{-1}\Vert =(2\Vert {\tilde{E}}\Vert )^{-1}\) (since \(\zeta I-\tilde{B}\) is normal with spectrum outside the disc with center 0 and radius \(2\Vert {\tilde{E}}\Vert \)). By (3.12) it therefore follows that \(\Vert (\zeta -{\tilde{A}})^{-1}\Vert =(2\Vert {\tilde{E}}\Vert )^{-1}\) whenever \(\zeta \in \partial \Omega \) (since \(3/\delta \) is a bound for the lower right operator and \((2\Vert {\tilde{E}}\Vert )^{-1}>6/\delta \) by assumption). Finally, using a series expansion similar to the one in (3.10), it follows that \((\zeta -({\tilde{A}}+F))^{-1}\) exists for any operator F with \(\Vert F\Vert <2\Vert {\tilde{E}}\Vert \), as desired.

In fact, since the rank of a projection is an integer, continuity implies that each \(\Omega _k\) contains precisely as many eigenvalues of \(\tilde{A}+F\) as \(\tilde{A}\) does, given that \(\Vert F\Vert <2\Vert {\tilde{E}}\Vert \). In particular, setting \(F={\tilde{E}}\) and recalling that \(\tilde{A}+\tilde{E}=A+E\), we find that every set \(\Omega _k\) contains precisely as many \(\xi _j\)’s as \({{\tilde{\beta }}}_j\)’s. Due to the ordering of the sets \(\Omega _k\) and the fact that the \(\xi _j\)’s are also ordered non-increasingly, we conclude that \(\{j:~{{\tilde{\beta }}}_j\in \Omega _k\}=\{j:~\xi _j\in \Omega _k\}\) holds for each k. Since the diameter of each \(\Omega _k\) is at most \(2 (2\Vert {\tilde{E}}\Vert ) \) times the number of discs it is made up of (which is at most m), we see that

$$\begin{aligned} \max \{|\xi _j-{{\tilde{\beta }}}_j|:~1\le j\le m\}\le 4\Vert {\tilde{E}}\Vert m=O(\Vert E\Vert ^3), \end{aligned}$$
(3.13)

as desired. \(\square \)

We remark that the estimate in Theorem 2.1 can be made slightly more precise, as follows:

$$\begin{aligned} |\xi _j -\lambda -\beta _j|\le 5 m\delta ^{-2}\Vert E\Vert ^3+ O(\left\| E\right\| ^4), \quad 1\le j \le m, \end{aligned}$$
(3.14)

where \(\delta \) as above is the isolation distance of \(\lambda \). To derive the above inequality, first note that

$$\begin{aligned} |\beta _j-\tilde{\beta }_j|\le \Vert {\tilde{B}}-B\Vert \le \delta ^{-2}\Vert E\Vert ^3+O(\Vert E\Vert ^4), \end{aligned}$$
(3.15)

(by the computation in the proof of Lemma 3.2), and similarly that \({\tilde{E}} \) has the structure \(\begin{pmatrix} {\tilde{E}}_{11} & {\tilde{E}}_{12} \\ {\tilde{E}}_{21} & 0 \end{pmatrix} \) where \(\max (\Vert \tilde{E}_{12}\Vert ,\Vert \tilde{E}_{21}\Vert ) = O(\Vert E\Vert ^4)\) and \(\Vert \tilde{E}_{11}\Vert \le \delta ^{-2}\Vert E\Vert ^3+O(\left\| E\right\| ^4)\), from which it easily follows that \(\Vert {\tilde{E}}\Vert \le \delta ^{-2}\Vert E\Vert ^3+ O(\Vert E\Vert ^4)\). Combining (3.13) with (3.15), this implies that the left hand side of (3.14) is bounded by \((1+4m)\delta ^{-2}\Vert E\Vert ^3+O(\left\| E\right\| ^4)\) which, since \(1\le m\), gives the desired bound.

4 Further Results

We first prove the estimate (1.3) from the introduction, which to the best of our knowledge is new as well. Recall \(E_{11}\) as defined in (3.6).

Corollary 4.1

Let A and E be bounded self-adjoint operators on a separable Hilbert space. If \(\lambda \) is an eigenvalue of A of multiplicity m then there are eigenvalues \(\xi _1,\dotsc ,\xi _m\) of \(A+E\) such that

$$\begin{aligned} \xi _j = \lambda +\varepsilon _j+O(\left\| E\right\| ^2), \quad 1\le j \le m, \end{aligned}$$

where \(\varepsilon _j\) are the eigenvalues of \(E_{11}\) (ordered non-increasingly).

Proof

By Weyl’s inequality the eigenvalues \( \beta \) of B from Theorem 2.1 satisfy

$$\begin{aligned} |\beta _j-\varepsilon _j|\le \Vert B-E_{11}\Vert =\Vert E_{12}A_{22}^{-1}E_{21}\Vert . \end{aligned}$$

The right hand side is \(O(\Vert E\Vert ^2)\), and hence the desired estimate directly follows from Theorem 2.1. \(\square \)
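As a numerical illustration of Corollary 4.1 (with arbitrary illustrative choices of A, m and \(\Vert E\Vert \), not taken from the text), one may compare the eigenvalues of \(E_{11}\) with those of \(A+E\) near \(\lambda \):

```python
import numpy as np

# Illustration of Corollary 4.1; A, m and ||E|| are illustrative choices.
rng = np.random.default_rng(5)

A = np.diag([2.0, 2.0, 0.0, 5.0, 7.0])   # lambda = 2 with multiplicity m = 2
n, m, lam = 5, 2, 2.0

E = rng.standard_normal((n, n))
E = (E + E.T) / 2
E *= 1e-3 / np.linalg.norm(E, 2)

eps = np.sort(np.linalg.eigvalsh(E[:m, :m]))    # eigenvalues of E11
xi = np.sort(np.linalg.eigvalsh(A + E))[1:3]    # the two eigenvalues near 2
err = np.max(np.abs(xi - (lam + eps)))
print(err)   # O(||E||^2); here ||E||^2 = 1e-6
```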

The main problem with extending the proof of Theorem 2.1 to higher order formulas is that the operator which takes the role of \({\tilde{B}}\) in this setting is no longer self-adjoint. Indeed, (3.9) implies that

$$\begin{aligned} A+E\sim \begin{pmatrix} \tilde{B} -E_{12}A_{22}^{-2}E_{21}E_{11}+O(\Vert E\Vert ^4) & O(\Vert E\Vert ^4) \\ O(\Vert E\Vert ^4) & A_{22}+O(\Vert E\Vert ) \end{pmatrix} \end{aligned}$$

and here \(E_{12}A_{22}^{-2}E_{21}E_{11}\) need not be self-adjoint. For these reasons we refrain from pursuing this in the general case. However, if the eigenvalue is simple, i.e. when \(m=1\), the latter issue disappears. In this case we conclude that a generalized form of the eigenvalue approximation found in Lemma 4.5 in Ch. V.4 of [28] holds.

Corollary 4.2

Let A and E be bounded self-adjoint operators on a separable Hilbert space. Assume that \(\lambda \) is an isolated eigenvalue of A of multiplicity 1. Let \(\xi \) be the corresponding eigenvalue of \(A+E\), and denote the number \(E_{11}\) from (3.6) by \(\varepsilon \). Then

$$\begin{aligned} \xi =\lambda + \varepsilon -E_{12}A_{22}^{-1}E_{21}+E_{12}A_{22}^{-1}E_{22}A_{22}^{-1}E_{21}-\varepsilon E_{12}A_{22}^{-2}E_{21}+O(\left\| E\right\| ^4) \end{aligned}$$

as \(\Vert E\Vert \rightarrow 0\).

Proof

As before we assume that \(\lambda = 0\). By (3.9) \(A+E\) is similar to

$$\begin{aligned}\begin{pmatrix} \varepsilon -E_{12}A_{22}^{-1}E_{21}+E_{12}A_{22}^{-1}E_{22}A_{22}^{-1}E_{21}-\varepsilon E_{12}A_{22}^{-2}E_{21}+ O(\Vert E\Vert ^4) & O(\Vert E\Vert ^4) \\ O(\Vert E\Vert ^4) & A_{22}+O(\Vert E\Vert ) \end{pmatrix} \end{aligned}$$

so the result follows from the same argument as provided in the proof of Theorem 2.1 (with \(m=1\)). \(\square \)
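The formula of Corollary 4.2 can likewise be checked numerically. The sketch below (illustrative A with \(\lambda =0\) simple and \(\Vert E\Vert =10^{-2}\), our own choices) evaluates the third order expression and compares with the eigenvalue of \(A+E\) closest to \(\lambda \):

```python
import numpy as np

# Check of the third order formula of Corollary 4.2 (lambda = 0 simple);
# the matrix A, the seed and ||E|| are illustrative choices.
rng = np.random.default_rng(4)

A = np.diag([0.0, 1.0, 2.0, 3.0, 4.0])
n = 5

E = rng.standard_normal((n, n))
E = (E + E.T) / 2
E *= 1e-2 / np.linalg.norm(E, 2)

eps = E[0, 0]
E12, E21, E22 = E[:1, 1:], E[1:, :1], E[1:, 1:]
A22inv = np.linalg.inv(A[1:, 1:])

approx = (eps
          - (E12 @ A22inv @ E21)[0, 0]
          + (E12 @ A22inv @ E22 @ A22inv @ E21)[0, 0]
          - eps * (E12 @ A22inv @ A22inv @ E21)[0, 0])

xi = min(np.linalg.eigvalsh(A + E), key=abs)   # eigenvalue closest to 0
err = abs(xi - approx)
print(err)   # O(||E||^4)
```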

Remark

We should remark that iterating the process described in the proof of Theorem 1.1 k times gives an explicit approximation with error term of the form \(O(\left\| E\right\| ^{k+1})\), which can be used to extend the above corollary to higher orders. However, we have not observed any particular structure of the approximant that is suitable for a closed formula and hence refrain from pursuing this further.

5 An Application: Singular Values

We recall that every \({n_1}\times {n_2}\) matrix A has a singular value decomposition. That is, we can find unitary matrices U and V of type \({n_1}\times {n_1}\) and \({n_2}\times {n_2}\) respectively such that

$$\begin{aligned} U^*A V = \Sigma \end{aligned}$$

where \(\Sigma \) is an \({n_1}\times {n_2}\) rectangular diagonal matrix with non-negative entries. If E is a given perturbation then \(A+E\) has the same singular values as \(\Sigma +U^*EV\), since singular values are invariant under multiplication by unitary matrices. We can therefore restrict our study of perturbed singular values to matrices whose fixed term is diagonal and non-negative. We now prove the following theorem, which directly generalizes the singular value estimate given as Theorem 4.6 in Chapter V of [28] to the case of singular values of arbitrary multiplicity. To simplify the notation we assume that \({n_1}>{n_2}\), although this is no real restriction since we can always consider the transpose, which has the same singular values. Let \(\varsigma \) be a particular singular value of multiplicity m, and assume that

$$\begin{aligned}\Sigma = \begin{pmatrix} \varsigma I &{}\quad 0 \\ 0 &{}\quad \Lambda _{\tau } \\ 0 &{}\quad 0 \end{pmatrix}, \end{aligned}$$

where \(\Lambda _{\tau }\) is diagonal and contains the remaining singular values (possibly both larger and smaller than \(\varsigma \)). Consider the perturbation \(\Sigma +E\), where E has the corresponding block representation

$$\begin{aligned} E = \begin{pmatrix} P &{}\quad Q\\ R &{}\quad S \\ X &{}\quad Y \end{pmatrix}, \end{aligned}$$

and let \((\sigma _j)_{j=1}^{n_2}\) be the corresponding singular values, ordered non-increasingly. Let \(k(\varsigma )\) denote the number of singular values of \(\Sigma \) larger than \(\varsigma \), counting multiplicity. We then have

Theorem 5.1

Let \(\{\mu _j\}_{j=1}^{m}\) be the non-increasing enumeration of the eigenvalues of

$$\begin{aligned} M= & {} \varsigma (P+P^*)+P^*P+R^*R+X^*X\nonumber \\{} & {} -(\varsigma Q+R^*\Lambda _{\tau })(\Lambda _{\tau }^2-\varsigma ^2 I)^{-1}(\varsigma Q+R^*\Lambda _{\tau })^*. \end{aligned}$$
(5.1)

Then

$$\begin{aligned} \sigma _{k(\varsigma )+j}^2 = \varsigma ^2+\mu _j+O(\Vert E\Vert ^3),\quad 1\le j\le m. \end{aligned}$$
(5.2)

Proof

The singular values of \(\Sigma +E\) are the square roots of the eigenvalues of \((\Sigma +E)^*(\Sigma +E)\). By expanding the parentheses we obtain

$$\begin{aligned} (\Sigma +E)^*(\Sigma +E) = \Sigma ^*\Sigma +\Sigma ^*E+E^*\Sigma +E^*E. \end{aligned}$$
(5.3)

In order to connect with the notation in the previous sections we note that the perturbation now becomes \(\Sigma ^*E+E^*\Sigma +E^*E\) while the fixed term is \(\Sigma ^*\Sigma \). In block form the summands of (5.3) become

$$\begin{aligned} \Sigma ^*\Sigma&= \begin{pmatrix} \varsigma ^2 I &{} \quad 0 \\ 0 &{} \quad \Lambda _{\tau }^2 \end{pmatrix},\\ \Sigma ^*E&= \begin{pmatrix} \varsigma P &{}\quad \varsigma Q\\ \Lambda _{\tau } R &{}\quad \Lambda _{\tau }S \end{pmatrix},\\ E^*\Sigma&= \begin{pmatrix} \varsigma P^*&{}\quad R^*\Lambda _{\tau }\\ \varsigma Q^*&{} \quad S^*\Lambda _{\tau }\\ \end{pmatrix},\\ E^*E&= \begin{pmatrix} P^*P+R^*R +X^*X&{}\quad P^*Q + R^*S+X^*Y\\ Q^*P+S^*R +Y^*X&{}\quad Q^*Q+S^*S+Y^*Y \end{pmatrix}. \end{aligned}$$

The matrix B (recall (2.2)) thus becomes

$$\begin{aligned} {B}=&\varsigma (P+P^*)+P^*P+R^*R+X^*X- \\&(\varsigma Q+R^*\Lambda _{\tau }+P^*Q+R^*S+X^*Y)(\Lambda _{\tau }^2-\varsigma ^2 I)^{-1}\\&(\varsigma Q+R^*\Lambda _{\tau }+P^*Q+R^*S+X^*Y)^*\end{aligned}$$

and hence Theorem 2.1 implies that the singular values of \(\Sigma +E\) satisfy

$$\begin{aligned} \sigma _{k(\varsigma )+j}^2 = \varsigma ^2+ \beta _j+O(\Vert E\Vert ^3),\quad 1\le j\le m, \end{aligned}$$
(5.4)

(where \((\beta _j)_{j=1}^m\) are the eigenvalues of B). Note that the matrix M, as defined in (5.1), differs from B by

$$\begin{aligned}&(P^*Q+R^*S+X^*Y)(\Lambda _{\tau }^2-\varsigma ^2 I)^{-1}(\varsigma Q+R^*\Lambda _{\tau })^*+\\&(\varsigma Q+R^*\Lambda _{\tau })(\Lambda _{\tau }^2-\varsigma ^2 I)^{-1}(P^*Q+R^*S+X^*Y)^*+\\ {}&(P^*Q+R^*S+X^*Y)(\Lambda _{\tau }^2-\varsigma ^2 I)^{-1}(P^*Q+R^*S+X^*Y)^*\end{aligned}$$

which clearly is \(O(\Vert E\Vert ^3)\), and therefore Weyl’s perturbation theorem implies that

$$\begin{aligned} \mu _j = {\beta }_j+O(\Vert E\Vert ^3),\quad 1\le j\le m. \end{aligned}$$

The desired result now follows immediately by combining this with (5.4). \(\square \)
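Theorem 5.1 is also easy to verify numerically in a small example. The sketch below (not from the paper; the sizes \(n_1=4\), \(n_2=3\), the value \(\varsigma =1\) with \(m=2\), and all entries of E are arbitrary choices) builds M from (5.1) and compares \(\varsigma ^2+\mu _j\) with the squared singular values of \(\Sigma +E\):

```python
import numpy as np

vs = 1.0                                   # varsigma, multiplicity m = 2
Sigma = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 3.0],        # Lambda_tau = (3)
                  [0.0, 0.0, 0.0]])        # n1 = 4 > n2 = 3
t = 1e-2
E = t * np.array([[ 0.2, -0.4, 0.3],
                  [ 0.1,  0.5, -0.2],
                  [ 0.4,  0.2,  0.1],
                  [-0.3,  0.1,  0.6]])
P, Q = E[:2, :2], E[:2, 2:]
R, S = E[2:3, :2], E[2:3, 2:]
X, Y = E[3:, :2], E[3:, 2:]
Lt = np.array([[3.0]])

# The matrix M from (5.1).
W = vs * Q + R.T @ Lt
M = (vs * (P + P.T) + P.T @ P + R.T @ R + X.T @ X
     - W @ np.linalg.inv(Lt @ Lt - vs**2 * np.eye(1)) @ W.T)
mu = np.sort(np.linalg.eigvalsh(M))[::-1]          # non-increasing

# Singular values of Sigma + E, non-increasing; k(varsigma) = 1 here
# since only the singular value 3 exceeds varsigma = 1.
sv = np.linalg.svd(Sigma + E, compute_uv=False)
err = np.abs(sv[1:3]**2 - (vs**2 + mu))
print(err)                                          # third order in ||E||
```

Rescaling E by 1/2 should shrink both printed errors by roughly a factor of 8, matching the \(O(\Vert E\Vert ^3)\) remainder in (5.2).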