The Kato–Temple inequality and eigenvalue concentration with applications to graph inference

Abstract: We present an adaptation of the Kato–Temple inequality for bounding perturbations of eigenvalues, with applications to statistical inference for random graphs, specifically hypothesis testing and change-point detection. We obtain explicit high-probability bounds for the individual distances between certain signal eigenvalues of a graph's adjacency matrix and the corresponding eigenvalues of the model's edge probability matrix, even when the latter eigenvalues have multiplicity. Our results extend more broadly to the perturbation of singular values in the presence of quite general random matrix noise.


Overview
Eigenvalues and eigenvectors are structurally fundamental quantities associated with matrices and are widely studied throughout mathematics, statistics, and engineering disciplines. For example, given an observed graph, the eigenvalues and eigenvectors of associated matrix representations (such as the adjacency matrix or Laplacian matrix) encode structural information about the graph (e.g. community structure, connectivity [8]). In the context of certain random graph models, the eigenvalues and eigenvectors associated with the underlying matrix-valued model parameter, namely the edge probability matrix, exhibit similar information. It is therefore natural to study how "close" the eigenvalues and eigenvectors of a graph are to the underlying model quantities. Existing concentration results bound each eigenvalue $\lambda_i(A)$ around its median and around its expectation, $\mathbb{E}[\lambda_i(A)]$ [1]. Unfortunately, since the latter quantities are inaccessible in practice, such bounds are of limited practical use.
By way of contrast, numerous results in the literature bound the spectral norm of the matrix difference, $\|A - \mathbb{E}[A]\|_2$, thereby immediately and uniformly bounding each of the eigenvalue differences $|\lambda_i(A) - \lambda_i(\mathbb{E}[A])|$ via an application of Weyl's inequality. For example, [27] proved an asymptotically-almost-sure spectral norm bound of $\|A - \mathbb{E}[A]\|_2 = O(\sqrt{\Delta \log n})$ for $\Delta = \Omega(\log n)$, where $\Delta \equiv \Delta(n)$ denotes the maximum expected degree of the graph. In [23] the above bound is improved to $\|A - \mathbb{E}[A]\|_2 \le (2 + o(1))\sqrt{\Delta}$ under the stronger assumption that $\Delta = \omega(\log^4 n)$, with further refinement subsequently obtained in [22]. We, on the other hand, show that under certain conditions, for particular eigenvalue pairs one can obtain tighter, non-uniform high-probability bounds of the form $|\lambda_i(A) - \lambda_i(\mathbb{E}[A])| = O(\log^{\delta} n)$ for small $\delta > 0$.
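As a quick numerical illustration of the gap between the uniform Weyl bound and the individual signal eigenvalue deviations, the following sketch (our own, not from the paper) samples a hypothetical two-block stochastic block model and compares $\|A - P\|_2$ with the deviations of the top eigenvalue pairs; all parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-block SBM edge probability matrix P; any inhomogeneous
# Erdos-Renyi P behaves analogously.
n, p, q = 1000, 0.5, 0.1
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], p, q).astype(float)

# Sample a symmetric 0/1 adjacency matrix A with E[A] = P off the diagonal.
U = rng.random((n, n)) < P
A = np.triu(U, 1).astype(float)
A = A + A.T

spectral_norm = np.linalg.norm(A - P, 2)  # the uniform (Weyl) bound

# Compare the two largest eigenvalue pairs: their deviations are far
# smaller than the uniform spectral-norm bound.
eigA = np.sort(np.linalg.eigvalsh(A))
eigP = np.sort(np.linalg.eigvalsh(P))
top_devs = np.abs(eigA[-2:] - eigP[-2:])

print(f"||A - P||_2 ~ {spectral_norm:.1f}, top deviations ~ {top_devs}")
assert np.all(top_devs <= spectral_norm)        # Weyl's inequality
assert np.all(top_devs < 0.25 * spectral_norm)  # much tighter in practice
```

The two signal eigenvalues of this $P$ are $\tfrac{n}{2}(p+q)$ and $\tfrac{n}{2}(p-q)$; their sampled counterparts typically deviate by $O(1)$ while $\|A - P\|_2$ grows like $\sqrt{n}$.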
Spectral theory for random graphs overlaps with the random matrix theory literature. There, asymptotic analysis includes proving, for example, convergence of the empirical spectral distribution to a limiting measure [9]. Related approaches to studying the spectrum of random graphs consider normalized versions of the adjacency matrix [20] and employ standard random matrix theory techniques such as the Stieltjes transform method [4,41]. In contrast, we do not study normalized versions of the adjacency or the edge probability matrix.
Indeed, much of the existing literature focuses on properties of eigenvectors corresponding to random graphs [11,22,35] given, among other reasons, the success of spectral clustering methods for graph inference [38]. We do not consider eigenvectors since our aim is to demonstrate the usefulness of adapting and applying the eigenvalue-centric Kato-Temple framework.
The stochastic block model (SBM) offers an example of an inhomogeneous random graph model which is wildly popular in the literature [6,17,21,22,42] and in which our results apply to the top (signal) eigenvalues of A and P . Previously, the authors in [3] obtained a collective deviation bound on the top eigenvalues of A and P for certain stochastic block model graphs in order to prove the main limit theorem therein. Our Theorem 3.3 improves upon Lemma 2 in [3] by removing a distinct eigenvalue assumption and by yielding stronger high-probability deviation bounds for pairs of top eigenvalues of A and P which are of the same order. This implies a statistical hypothesis testing regime for random graphs which is discussed further in Section 4.

Organization
The remainder of this paper is organized as follows. In Section 2 we introduce notation and the Kato-Temple eigenvalue perturbation framework. In Section 3 we present our results for random graphs and more generally for matrix perturbation theory. There we also include illustrative examples together with comparative analysis involving recent results in the literature. In Section 4 we discuss applications of our results to problems involving graph inference. Sections 5 and 6 contain our acknowledgments and the proofs of our results, respectively.

Setup and notation
Let $\langle\cdot,\cdot\rangle$ denote the standard Euclidean inner (dot) product between two vectors, $\|\cdot\|$ denote the vector norm induced by the dot product, and $\|\cdot\|_2$ denote the spectral norm of a matrix. The identity matrix is implicitly understood when we write the difference of a matrix with a scalar. In this paper, $O(\cdot)$, $\Omega(\cdot)$, and $\Theta(\cdot)$ denote standard big-O, big-Omega, and big-Theta notation, respectively, while $o(\cdot)$ and $\omega(\cdot)$ denote standard little-o and little-omega notation, respectively.
As prefaced in Section 1, we consider simple, undirected random graphs on $n$ vertices generated by the inhomogeneous Erdős–Rényi model, $G \sim G(n, P)$, via the corresponding (binary, symmetric) adjacency matrix $A \equiv A_G$. Given an open interval in the positive half of the real line, $(\alpha, \beta) \subset \mathbb{R}_{>0}$, we denote the $d$ eigenvalues of $P$ that lie in this interval (locally) by $\lambda_1(P) \le \lambda_2(P) \le \cdots \le \lambda_d(P)$, and similarly for $A$, noting that for $A$ this amounts to a probabilistic statement. By symmetry one can just as well handle the case when the interval lies in the negative half of the real line. We are principally interested in eigenvalues that are large in magnitude, so we do not consider the case when the underlying interval contains the origin.
To highlight the Kato–Temple framework for bounding eigenvalues, we now reproduce two lemmas from [18] along with the Kato–Temple inequality as stated in [16] (see Theorem 2.3 below). These results all hold in the following common setting.
Let $H$ be a self-adjoint operator on a Hilbert space. Assume a unit vector $w$ is in the domain of $H$ and define $\eta := \langle Hw, w\rangle$ along with $\epsilon := \|(H - \eta)w\|$, noting that $\eta^2 + \epsilon^2 = \|Hw\|^2$. The quantity $\eta$ may be viewed as an "approximate eigenvalue" of $H$ corresponding to the "approximate eigenvector" $w$, while $\epsilon$ represents a scalar residual term.
Remark 2.4 (Hermitian dilation). Given an $m \times n$ real matrix $M$, it will be useful to consider the corresponding real symmetric $(m+n)\times(m+n)$ Hermitian dilation matrix $\tilde{M}$ given by
$$\tilde{M} := \begin{pmatrix} 0 & M \\ M^{\top} & 0 \end{pmatrix}.$$
It is well-known that the non-zero eigenvalues of $\tilde{M}$ correspond to the signed singular values of $M$ (see Theorem 7.3.3 in [15]). This correspondence between the singular values of arbitrary matrices and the eigenvalues of Hermitian matrices allows our results to generalize beyond the IERM setting to the more general study of matrix perturbation theory for singular values in a straightforward manner.
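The dilation correspondence is easy to verify numerically. The following minimal sketch (an illustration, not from the paper) builds $\tilde{M}$ for an arbitrary rectangular matrix and checks that its spectrum consists of the signed singular values of $M$ padded with zeros.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 6))  # arbitrary rectangular matrix

# Hermitian dilation: Mtilde = [[0, M], [M^T, 0]], symmetric of size (m+n).
m, n = M.shape
Mtilde = np.block([[np.zeros((m, m)), M], [M.T, np.zeros((n, n))]])

svals = np.linalg.svd(M, compute_uv=False)  # singular values of M
evals = np.linalg.eigvalsh(Mtilde)          # eigenvalues of the dilation

# Nonzero eigenvalues of Mtilde are the signed singular values +/- sigma_i(M);
# the remaining |m - n| eigenvalues are zero.
expected = np.sort(np.concatenate([svals, -svals, np.zeros(abs(m - n))]))
assert np.allclose(np.sort(evals), expected)
```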

Results for random graphs
In the IERM setting, a graph's adjacency matrix can be written as A = P + E where E := A−P is a random matrix and P is the (deterministic) expectation of A. We begin with a preliminary observation concerning the tail behavior of A−P which will subsequently be invoked for the purpose of obtaining standard union bounds. The proof follows from a straightforward application of Hoeffding's inequality.
Proposition 3.1 (General IERM concentration). Let $u, v \in \mathbb{R}^n$ denote (nonrandom) unit vectors. Then for any $t > 0$,
$$\mathbb{P}\left[\,|\langle (A - P)u, v\rangle| > t\,\right] \le 2\exp(-t^2).$$
It is indeed possible to invoke more refined concentration inequalities than Proposition 3.1 in the presence of additional structure (e.g. when all entries of $P$ have uniformly very small magnitude). Doing so is particularly useful when it is simultaneously possible to obtain a strong bound on $\|A - P\|_2$. This observation will be made clearer in the context of Theorem 3.3 below. Furthermore, consideration of Proposition 3.1 will facilitate the subsequent presentation of our generalized results which extend beyond the IERM setting.
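As an illustrative sanity check (our own construction, not from the paper), the sketch below estimates the tail of $\langle (A-P)u, v\rangle$ by Monte Carlo for an arbitrary inhomogeneous $P$ and compares it with a sub-Gaussian bound of the form $2\exp(-t^2)$, matching the $(C, c, \gamma) = (2, 1, 2)$ constants quoted later for the IERM setting; all specific choices in the code are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 5000

# Fixed inhomogeneous P and fixed unit vectors u, v (arbitrary choices).
P = rng.random((n, n)); P = (P + P.T) / 2
u = rng.normal(size=n); u /= np.linalg.norm(u)
v = rng.normal(size=n); v /= np.linalg.norm(v)

samples = np.empty(trials)
for s in range(trials):
    U = rng.random((n, n)) < P
    A = np.triu(U, 1).astype(float)
    A = A + A.T
    np.fill_diagonal(A, (rng.random(n) < np.diag(P)).astype(float))
    samples[s] = (A - P) @ u @ v  # <(A - P)u, v>

# Empirical tails sit below the sub-Gaussian bound 2*exp(-t^2).
for t in (0.5, 1.0, 1.5):
    assert np.mean(np.abs(samples) > t) <= 2 * np.exp(-t ** 2)
```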
Remark 3.2. In this paper the main diagonal elements of P are allowed to be strictly positive, in which case realizations of A need not necessarily be hollow (i.e. observed graphs may have self-loops). To avoid graphs with self-loops, one may either condition on the event that A is hollow or set the main diagonal of P to be zero. In the former case, note that P ≡ E[A] no longer holds on the main diagonal. In the latter case, a modified version of Proposition 3.1 holds.
We now present our main results for the IERM setting. The proofs, which are located in Section 6, also formulate a bound for the special case when the upper bound threshold β may be chosen to be infinity. This special case is particularly useful in applications.

Theorem 3.3 (IERM eigenvalue perturbation bounds, conditional version).
Let the matrices $A \in \{0,1\}^{n\times n}$ and $P \in [0,1]^{n\times n}$ correspond to the IERM setting described in Section 2. Suppose the interval $(\alpha, \beta) \subset \mathbb{R}_{>0}$ contains precisely $d$ eigenvalues of $P$, $\lambda_1(P) \le \lambda_2(P) \le \cdots \le \lambda_d(P)$ (possibly with multiplicity), and let $\{w_i\}_{i=1}^{d}$ be an orthonormal collection of eigenvectors of $P$ corresponding to these eigenvalues. Condition on the event that $(\alpha, \beta)$ also contains precisely $d$ eigenvalues of $A$, $\lambda_1(A) \le \lambda_2(A) \le \cdots \le \lambda_d(A)$. Then for each $k \in [d]$ and each $t > 0$, the upper bound $\lambda_k(A) \le \lambda_k(P) + t + \zeta_{+}$ holds with probability at least $1 - k(k+1)\exp(-t^2)$, while the lower bound $\lambda_k(A) \ge \lambda_k(P) - t - \zeta_{-}$ holds with probability at least $1 - l(l+1)\exp(-t^2)$ for $l := d - k + 1$, where $\zeta_{+}$ and $\zeta_{-}$ are the explicit quantities defined in Section 6.

Remark 3.4. Our proof depends upon several new observations with respect to Kato's original argument. In particular, for $w_i$ as defined above, the vectors $w_i$ need not constitute an orthonormal collection of "approximate eigenvectors" of $A$ in the sense of [18]. Instead, here the notion of "approximate" may be interpreted via Proposition 3.1 as the source of randomness which allows the Kato–Temple methodology to be adapted beyond the original deterministic setting. Of additional note is that the vectors $w_i$ as defined in this paper agree in function and notation with Kato's original paper, the operational distinction being that our setting provides a canonical choice for these vectors.
Remark 3.5. We note that the appearance of $\|E\|_2^2$ in the formulations of $\zeta_{+}$, $\zeta_{-}$ can be replaced by taking the appropriate maximum over quantities of the form $\|Ew_i\|^2$ (see Equation (6.19)). That is to say, in the presence of additional local structure and knowledge, one can refine the above bounds in Theorem 3.3.
Remark 3.6. In settings wherein the eigenvalues of interest have disparate orders of magnitude, Kato–Temple methodology is not guaranteed to yield useful bounds. This can be seen in the bounds' dependence on the ratio of eigenvalues of $P$ in Theorem 3.3. Moreover, within the Kato–Temple framework, poor separation from the remainder of the spectrum also deteriorates the bounds, as is evident in the denominators' dependence on the interval endpoints $\alpha$ and $\beta$ along with the smallest and largest local eigenvalues of $P$. On the other hand, by further localizing, i.e. by restricting to a subset of $d' < d$ eigenvalues in a particular interval, applying Theorem 3.3 to said fewer eigenvalue pairs may yield improved bounds (see Example 3.14 and Remark 3.5).
Next, we formulate an unconditional version of Theorem 3.3. For both simplicity and the purpose of applications, Theorem 3.7 is stated in terms of the largest singular values in the IERM setting.
Theorem 3.7 (IERM singular value perturbation bounds, unconditional version). Let the matrices $A \in \{0,1\}^{n\times n}$ and $P \in [0,1]^{n\times n}$ correspond to the IERM setting described in Section 2 with maximum expected degree (via $P$) given by $\Delta \equiv \Delta(n)$. Denote the $d+1$ largest singular values of $A$ by $0 \le \tilde{\sigma}_0 < \tilde{\sigma}_1 \le \cdots \le \tilde{\sigma}_d$, and denote the $d+1$ largest singular values of $P$ by $0 \le \sigma_0 < \sigma_1 \le \cdots \le \sigma_d$. Suppose that $\Delta = \omega(\log^4 n)$, $\sigma_1 \ge C\Delta$, and $\sigma_0 \le c\Delta$ for some absolute constants $C, c > 0$. Then each pair $\{\tilde{\sigma}_i, \sigma_i\}$, $i \in [d]$, satisfies unconditional high-probability deviation bounds of the form given in Theorem 3.3.

A similar version of Theorem 3.7 holds when $\Delta = \Omega(\log n)$ under slightly different assumptions on the entries of $P$, for which one still has $\|A - P\|_2 = O(\sqrt{\Delta})$ with high probability [22]. On a related yet different note, see [20] for discussion of the sparsity regime $\Delta = O(1)$, in which graphs fail to concentrate in the classical sense.
Remark 3.8 (Random dot product graph model). When the edge probability matrix P can be written as P = XX ⊤ for some matrix X ∈ R n×d with d ≪ n, then the IERM corresponds to the popular random dot product graph (RDPG) model [40]. In the random dot product graph model, the largest eigenvalues of A and P are of statistical interest in that they represent spectral "signal" in the model. These eigenvalues are separated from the remainder of their respective spectra and lie in an interval of the form (α, ∞) where, for example, α may be taken to be O( A − P 2 ).
Among its applications, the RDPG model has been used as a platform for modeling graphs with hierarchical and community structure [24]. In addition, a central limit theorem is known for the behavior of the top eigenvectors of adjacency matrices arising from the RDPG model [3]. In particular, the main limit theorem in [3] relies upon a lemma which collectively bounds the differences between the top eigenvalues of $A$ and $P$ while requiring a stringent eigengap assumption. Namely, for $\delta_{\mathrm{gap}} := \min_i (\sigma_{i+1}(P) - \sigma_i(P))/\Delta > 0$, Lemma 2 in [3] yields a collective high-probability deviation bound, (3.5), whose quality degrades as $\delta_{\mathrm{gap}}$ shrinks. In contrast, using Theorem 3.7 with $\sigma_0 := 0$, we do not require the gap assumption $\delta_{\mathrm{gap}} > 0$ and still obtain high-probability deviation bounds for the individual pairs of top eigenvalues. In practice, models involving repeated or arbitrarily close eigenvalues are prevalent and of interest (e.g. Section 4.2). As such, the above improvement is nontrivial and of practical significance.
Remark 3.9 (Latent position random graphs). Theorem 3.3 further extends to the more general setting of latent position random graphs. There, the matrix $P$ is viewed as an operator $[\kappa(X_i, X_j)]_{i,j=1}^{n}$ where $X_i$ and $X_j$ are independent, identically distributed latent positions with distribution $F$ and the positive definite kernel $\kappa$ (viewed as an integral operator) is not necessarily of finite fixed rank as $n$ increases [13,36]. Note that for the RDPG model, the kernel $\kappa$ is simply the standard Euclidean inner product between (latent position) vectors.

Results for matrix perturbation theory
The behavior of the random matrix $A - P$ (see Proposition 3.1) represents a specific instance of more general, widely-encountered probabilistic concentration as discussed in [28] and formulated in the following definition.

Definition 3.10 ([28]). An $m \times n$ random real ("error") matrix $E$ is said to be $(C, c, \gamma)$-concentrated for a trio of positive constants $C, c, \gamma > 0$ if for all unit vectors $u \in \mathbb{R}^n$, $v \in \mathbb{R}^m$ and for every $t > 0$,
$$\mathbb{P}\left[\,|\langle Eu, v\rangle| > t\,\right] \le C\exp(-ct^{\gamma}).$$
In particular, the IERM setting corresponds to $(C, c, \gamma)$-concentration where $m = n$, $C = \gamma = 2$, and $c = 1$. For the Hermitian dilation discussed in Remark 2.4, one has the following important correspondence between $E$ and $\tilde{E}$.

Lemma 3.11 ([28]). Let $E \in \mathbb{R}^{m\times n}$ be $(C, c, \gamma)$-concentrated. Define $\tilde{C} := 2C$ and $\tilde{c} := c/2^{\gamma}$. Then the matrix $\tilde{E} \in \mathbb{R}^{(m+n)\times(m+n)}$ is $(\tilde{C}, \tilde{c}, \gamma)$-concentrated.

Definition 3.10 and Lemma 3.11 together with Remark 2.4 allow Theorem 3.3 to be generalized in a straightforward manner. We frame the generalization in the context of a signal-plus-noise matrix model with tail probability bounds. In particular, replace $A$ with $\hat{M} := M + E$, thought of as an observed data matrix. Replace $P$ with $M$, thought of as an underlying signal matrix, so that the matrix $A - P$ becomes $E$, thought of as an additive error matrix. We emphasize that the following generalization is in terms of the singular values of $M$ and $\hat{M}$. This generalization resembles the formulation of a result obtained in [28] using different methods; however, unlike our Theorem 3.12, the bound in [28] depends upon the rank of $M$ and assumes that the rank is known.
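To make the definition concrete, the following sketch (illustrative, not from [28]) checks empirically that a matrix with iid standard normal entries is $(2, \tfrac{1}{2}, 2)$-concentrated, consistent with the Gaussian case invoked later in Example 3.15: for unit vectors $u, v$, the quantity $\langle Eu, v\rangle = v^{\top}Eu$ is exactly $N(0,1)$, so the tail bound $2\exp(-t^2/2)$ applies.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, trials = 50, 80, 10000

u = rng.normal(size=n); u /= np.linalg.norm(u)
v = rng.normal(size=m); v /= np.linalg.norm(v)

# For E with iid N(0,1) entries, <Eu, v> = v^T E u is exactly N(0,1), so
# P[|<Eu, v>| > t] <= 2 exp(-t^2/2), i.e. (C, c, gamma) = (2, 1/2, 2).
samples = np.array([v @ rng.normal(size=(m, n)) @ u for _ in range(trials)])

for t in (1.0, 2.0):
    assert np.mean(np.abs(samples) > t) <= 2 * np.exp(-t ** 2 / 2)
```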
Given a matrix $M \in \mathbb{R}^{m\times n}$, write its singular value decomposition as $M \equiv U\Sigma V^{\top}$, where $Mv_i = \sigma_i u_i$ holds for the normalized left (resp., right) singular vectors $u_i$ (resp., $v_i$) and singular values $\sigma_i = \Sigma_{i,i}$. For each $i$ such that $\sigma_i > 0$, define $\tilde{w}_i \in \mathbb{R}^{m+n}$ to be the concatenated unit vector $\tilde{w}_i := \tfrac{1}{\sqrt{2}}(u_i^{\top}, v_i^{\top})^{\top}$, a unit eigenvector of the Hermitian dilation $\tilde{M}$ with eigenvalue $\sigma_i$.

Theorem 3.12 (Signal-plus-noise singular value perturbation bounds, conditional version). Let $\hat{M} := M + E$ where $E \in \mathbb{R}^{m\times n}$ is $(C, c, \gamma)$-concentrated, and suppose the interval $(\alpha, \beta) \subset \mathbb{R}_{>0}$ contains precisely $d$ singular values of $M$, denoted $0 < \sigma_1 \le \sigma_2 \le \cdots \le \sigma_d$. Condition on the event that the interval $(\alpha, \beta)$ contains precisely $d$ singular values of $\hat{M}$, denoted $0 < \tilde{\sigma}_1 \le \tilde{\sigma}_2 \le \cdots \le \tilde{\sigma}_d$. Then for each $k \in [d]$, upper and lower bounds of the same form as in Theorem 3.3 hold for $\tilde{\sigma}_k - \sigma_k$, each with probability at least $1 - O(\tilde{C}\exp(-\tilde{c}t^{\gamma}))$. Moreover, the upper and lower bounds hold collectively with high probability.

As with the results in Section 3.1, Theorem 3.12 can be formulated unconditionally and for collections of not-necessarily-the-largest singular values. Both of these aspects are explored in greater detail in Example 3.15. The following technical lemma will subsequently be employed in the application of unconditional bounds in Section 4.

Lemma 3.13. Let $E \in \mathbb{R}^{m\times n}$ be a $(C, c, \gamma)$-concentrated random matrix. Choose $\epsilon > 0$ such that $2 + \epsilon > 2(2\log(9)/c)^{1/\gamma}$ and define the quantity $c_{\epsilon,c,\gamma} := c(1 + \epsilon/2)^{\gamma} - 2\log(9) > 0$. Then
$$\mathbb{P}\left[\,\|E\|_2 > (2 + \epsilon)\max(m,n)^{1/\gamma}\,\right] \le C\exp\left(-c_{\epsilon,c,\gamma}\max(m,n)\right).$$
If in addition $m = n$ and $E$ is assumed to be symmetric, then the quantity $2\log(9)$ above may be replaced by $\log(9)$, an improvement.

Two illustrative examples
In the remainder of this section we present two examples which highlight the usefulness and flexibility of Kato-Temple methodology. We begin with Example 3.14 which presents a simple stochastic block model setting wherein our results compare favorably with those in the recent work of [28], noting that in general neither [28] nor this paper dominates the other.
Example 3.14 (Balanced two-block stochastic block model). Consider an $n$-vertex realization from a two-block (affinity) stochastic block model in which $0 < q < p < 1$, where $p$ and $q$ denote the within-block and between-block edge probabilities, respectively. Suppose each block contains $n/2$ of the graph's vertices. The signal singular values and maximum expected degree of this rank-two model are given by
$$\sigma_1(P) = \tfrac{n}{2}(p - q), \quad \sigma_2(P) = \tfrac{n}{2}(p + q), \quad \Delta = \sigma_2(P). \tag{3.11}$$
For the purposes of large-$n$ comparison, view $\|E\|_2 \approx 2\sqrt{\Delta}$ from [23] and set the lower threshold $\alpha$ to be $\|E\|_2$. Define $r_{p,q}$ to be the edge-probability-dependent parameter $r_{p,q} := (p+q)/(p-q)$. Then via Kato–Temple methodology applied jointly to $\sigma_1(P)$ and $\sigma_2(P)$, one obtains explicit deviation bounds for each singular value, respectively, which hold with probability approximately $0.99$ when $t_{KT} \ge 2.55$. By the same approach one may compute the corresponding bounds obtained in [28]. Direct application of the results in [28] yields probability approximately at least $0.99$ for $t_{OVW} \ge 11.6$, though it appears upon further inspection that this can be improved to, for example, $t_{OVW} \ge 5.6$. The above joint analysis demonstrates that our bounds are favorable for the pair $\{\sigma_1(A), \sigma_1(P)\}$, whereas the opposite is true for the pair $\{\sigma_2(A), \sigma_2(P)\}$. We emphasize that here the upper bounds are of primary importance and interest. Indeed, the $(C, c, \gamma)$ property allows for straightforward lower bounds to be obtained by epsilon-net techniques together with the Courant–Fischer–Weyl min-max principle. For example, note that a single application of $(C, c, \gamma)$-concentration yields that $\sigma_2(A) - \sigma_2(P) \ge -t$ with probability at least $1 - C\exp(-ct^{\gamma})$.
Among the advantages of the Kato–Temple methodology is the ability, in certain cases, to refine one's initial analysis by further localizing the interval $(\alpha, \beta)$. This is possible in the current example, wherein we can "zoom in" further on the largest signal singular value. In particular, keeping the same indexing as above and setting $\alpha$ to be $\|E\|_2 + \sigma_1(P)$, then for $n$ large and with probability approximately $0.99$, a tighter bound holds for the largest singular value pair. Throughout this example, we note the bounds' dependence upon the underlying parameters $p$ and $q$. By virtue of the large-$n$ comparison here, these parameters do not meaningfully influence the underlying probabilistic statement.
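The quantities in this example are straightforward to check numerically. The sketch below (our own; the parameter values are arbitrary choices) verifies the formulas in (3.11) against a direct SVD and compares a sampled $\|E\|_2$ with the $2\sqrt{\Delta}$ benchmark.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 1000, 0.6, 0.2

z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], p, q).astype(float)

# Signal singular values of the rank-two model, per (3.11).
sigma1 = n / 2 * (p - q)  # 200
sigma2 = n / 2 * (p + q)  # 400 = maximum expected degree Delta

svals = np.linalg.svd(P, compute_uv=False)
assert np.isclose(svals[0], sigma2) and np.isclose(svals[1], sigma1)

# Sample A and compare ||A - P||_2 with the benchmark 2*sqrt(Delta).
U = rng.random((n, n)) < P
A = np.triu(U, 1).astype(float)
A = A + A.T
E2 = np.linalg.norm(A - P, 2)
print(f"||E||_2 = {E2:.1f} vs 2*sqrt(Delta) = {2 * np.sqrt(sigma2):.1f}")
assert E2 < 2.5 * np.sqrt(sigma2)
```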
In contrast to the low-rank setting of Example 3.14, Example 3.15 below demonstrates how our results can be applied to the problem of estimating signal in a high-rank matrix setting.

Example 3.15 (High-rank signal-plus-noise model). Consider a high-rank signal matrix $M \in \mathbb{R}^{q\times q}$ whose singular values, denoted up to multiplicity by slight abuse of notation, are $\sigma_1 := 1$, $\sigma_2 := \kappa + 1$, and $\sigma_3 := \tau + \kappa + 1$, where $\tau, \kappa > 0$.
Further suppose that $E \in \mathbb{R}^{q\times q}$ has entries which are independent, identically distributed standard normal random variables. It follows by Gaussian concentration that $E$ is $(C, c, \gamma)$-concentrated with parameters $C = 2$, $c = \tfrac{1}{2}$, and $\gamma = 2$, and so an application of Lemma 3.13 with $\epsilon = 4$ yields that $\|E\|_2 \le 6\sqrt{q}$ with high probability. Define $\hat{M} := M + E$ and organize the singular values of $\hat{M}$ in correspondence with the repeated singular values of $M$. We can then use Weyl's inequality as a preliminary device for selecting the threshold values $\alpha$ and $\beta$; in particular, such analysis yields that, with high probability, each singular value of $\hat{M}$ lies within $\|E\|_2 \le 6\sqrt{q}$ of its counterpart in $M$. For the choices $\alpha = 6\sqrt{q} + 2$ and $\beta = \tau + \kappa - 6\sqrt{q}$, observe that the singular values of $\hat{M}$ paired with $\kappa + 1$ all lie in $(\alpha, \beta) \subset \mathbb{R}_{>0}$, while simultaneously $\{1, \kappa+1, \tau+\kappa+1\} \cap (\alpha, \beta) = \{\kappa + 1\}$. In this setting our perturbation theorems apply for $\kappa$ sufficiently large. Namely, choosing $\delta \in (0, 1]$ and setting $t = \Theta(\log^{\delta} q)$ yields that for each $k \in [n]$ there exist positive constants $c'$ and $c''$ such that, with high probability, the corresponding paired singular values satisfy a deviation bound which improves upon the bound implied by a naïve, terminal application of Weyl's inequality. Moreover, Example 3.15 demonstrates how Weyl's inequality may be invoked for the preliminary purpose of establishing threshold values when the paired singular values (eigenvalues) correspond to the same index after ordering.
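A minimal numerical sketch of this thresholding argument follows; the diagonal construction of $M$ below is a hypothetical stand-in with the stated spectrum (the example's exact construction is not reproduced here), and the values of $q$, $\tau$, $\kappa$, and the multiplicities are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
q = 400
tau, kappa = 30 * np.sqrt(q), 15 * np.sqrt(q)

# Hypothetical high-rank signal: a diagonal M whose singular values are
# tau+kappa+1, kappa+1, and 1 with multiplicities q/4, q/2, q/4.
d = np.concatenate([np.full(q // 4, tau + kappa + 1),
                    np.full(q // 2, kappa + 1),
                    np.full(q // 4, 1.0)])
M = np.diag(d)
E = rng.normal(size=(q, q))
Mhat = M + E

assert np.linalg.norm(E, 2) <= 6 * np.sqrt(q)  # Lemma 3.13 with eps = 4, whp

# Weyl: each sigma_k(Mhat) is within ||E||_2 of sigma_k(M), so the middle
# cluster of singular values lands inside (alpha, beta).
alpha, beta = 6 * np.sqrt(q) + 2, tau + kappa - 6 * np.sqrt(q)
svals = np.sort(np.linalg.svd(Mhat, compute_uv=False))
middle = svals[q // 4 : q // 4 + q // 2]
assert np.all((middle > alpha) & (middle < beta))
```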

Methods of graph inference
The field of statistical inference and modeling for graphs represents a burgeoning area of research with implications for the social and natural sciences among other disciplines [12,19]. Within the current body of research, the pursuit of identifying and studying community structure within real-world networks continues to receive widespread attention [2,5,11,25,26,37]. Still another area of investigation involves anomaly detection for time series of graphs by considering graph statistics such as the total degree, number of triangles, and various scan statistics [33,39]. Here we apply our results to two such detection tasks.

Community detection via hypothesis testing
In this application we view the problem of community detection through the lens of hypothesis testing as in [2,37]. We consider the simple setting of a balanced three-block stochastic block model and the problem of detecting differences in between-block communication. Namely, consider the three-block model with block edge probability matrix $B_0$ determined by within-block edge probability $p = 0.81$ and between-block edge probability $q = 0.2025$, and with block assignment probabilities $(1/3, 1/3, 1/3)$. In this model, vertices have an equal probability of belonging to each of the three blocks. Vertices within the same block have probability $p$ of being connected by an edge, whereas for vertices in different blocks the probability is $q$.
As an aside, we note that this SBM may be cast in the language of random dot product graphs for which the underlying distribution of latent positions $F$ is a mixture of point masses. Specifically, take $F$ to be the discrete uniform distribution on the vectors $x_1 \approx (0.55, 0.32, 0.64)$, $x_2 \approx (-0.55, 0.32, 0.64)$, and $x_3 \approx (0, -0.64, 0.64)$ in $\mathbb{R}^3$ (see Remarks 3.8 and 3.9).
For a graph on $n$ vertices from this three-block model, condition on the graph exhibiting equal block sizes, i.e. $n_1 = n_2 = n_3 = n/3$. For the corresponding $P$ matrix, denoted $P_n(B_0)$, the non-trivial (signal) model eigenvalues themselves exhibit multiplicity (hence Equation (3.5) via [3] does not apply) and are
$$\lambda_1(P_n(B_0)) = \lambda_2(P_n(B_0)) = \tfrac{n}{3}(p - q) \quad \text{and} \quad \lambda_3(P_n(B_0)) = \tfrac{n}{3}(p + 2q). \tag{4.2}$$
In contrast, consider an alternative model in which the first and second blocks exhibit stronger between-block communication. This stronger communication is represented by an additional additive factor $\epsilon \in (0, p - q)$ in the block edge probability matrix $B_{\epsilon}$, where $\epsilon$ is assumed to be bounded away from $p - q$ for convenience.
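The eigenvalues in (4.2) can be verified directly. The following sketch (illustrative; the value of $n$ is an arbitrary choice) builds $P_n(B_0)$ with equal block sizes and checks the repeated signal eigenvalue and the simple top eigenvalue.

```python
import numpy as np

n, p, q = 900, 0.81, 0.2025
z = np.repeat([0, 1, 2], n // 3)
P = np.where(z[:, None] == z[None, :], p, q).astype(float)

# Signal eigenvalues per (4.2): (n/3)(p - q) with multiplicity two,
# and a simple eigenvalue (n/3)(p + 2q); all others are zero.
lam12 = n / 3 * (p - q)      # 182.25
lam3 = n / 3 * (p + 2 * q)   # 364.5

eig = np.sort(np.linalg.eigvalsh(P))[::-1]
assert np.isclose(eig[0], lam3)
assert np.allclose(eig[1:3], lam12)
assert np.allclose(eig[3:], 0.0)
```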
Under $B_{\epsilon}$, the signal eigenvalues of $P_n(B_{\epsilon})$ (equivalently, singular values) can be explicitly computed as functions of $p$, $q$, $n$, and $\epsilon$. Furthermore, the maximum expected degree of the model corresponding to $B_{\epsilon}$ is given by $\Delta_{\epsilon} = \tfrac{n}{3}(p + 2q + \epsilon)$. Given $\epsilon > 0$, one may formulate a simple null versus simple alternative hypothesis test of $H_0 : A \sim G(n, P_n(B_0))$ against $H_A : A \sim G(n, P_n(B_{\epsilon}))$. In what follows we choose the smallest signal eigenvalue as our test statistic and denote it by $\Lambda_1$. We compare our bounds obtained via Kato–Temple methodology with the large-sample approximation bounds implied by [23] for the specified values $n \in \{6000, 9000, 12000, 15000\}$. Similar comparison can be carried out with respect to the results in [28]. Our bounds compare favorably with those in [28] even for conservative choices of $t$ therein.
By Lemma 3.13 and Proposition 3.1, irrespective of $\epsilon > 0$ above, we have the concentration inequality $\mathbb{P}[\|E\|_2 > 3\sqrt{n}] \le 2\exp(-\tfrac{1}{20}n)$. This spectral norm bound allows us to invoke an unconditional version of Theorem 3.3. Specifically, for moderate choices of $t > 0$, the bounds in Theorem 3.3 hold with probability at least $1 - 12\exp(-t^2) - 2\exp(-\tfrac{1}{20}n)$. When $n \ge 6000$, the choice $t \approx 2.66$ yields probability at least $0.99$.
Using these concentration inequality results, we determine confidence intervals which hold for $\Lambda_1$ with probability at least $0.99$ under $H_0$ and $H_A$, respectively. We compute the value $\epsilon_n$ such that the confidence intervals under $H_0$ and $H_A$ no longer overlap for $\epsilon \in (\epsilon_n, 0.2]$, emphasizing that smaller values of $\epsilon_n$ indicate superior performance. This provides us with a region of the alternative in which our statistical test has power at least $0.99$. Our results are summarized in the numerical table below. It is not too difficult to realize that the eigenvalue-based test considered here has asymptotic power equal to one as $n \to \infty$ for any choice of $0 < q < p < 1$ and $\epsilon \in (0, p - q)$. Moreover, as a consequence of Theorem 3.7 and the subsequent discussion, we make the following observation: assume that $q \equiv q_n = \omega(\tfrac{\log n}{n})$ with $q_n < p_n$. Then for $n\epsilon_n = \omega(\log n)$ and $\epsilon_n < p_n - q_n$, the above test using $\Lambda_1$ has asymptotically full power.
Note that the above analysis investigates testing performance as a function of ǫ for graphs with fixed block proportions. Next we investigate a setting wherein ǫ is fixed and the sizes of the graph communities change.

Change-point detection
We now consider a stylized example of change-point detection via hypothesis testing. Let $T^* \ge 1$ and suppose that $G_1, G_2, \dots, G_T$ for $T < T^*$ are Erdős–Rényi graphs on $n$ vertices, while for $T \ge T^*$ the graph $G_T$ is sampled according to a two-block stochastic block model with block edge probability matrix
$$B = \begin{pmatrix} p_{\epsilon} & p \\ p & p \end{pmatrix}$$
for $p_{\epsilon} := p + \epsilon$ and $\epsilon > 0$, with $m$ vertices assigned to the first block and $n - m$ vertices assigned to the second block. We note that $B$ encapsulates a notion of chatter anomaly, i.e., a subset of the vertices in $[n]$ exhibits altered communication behavior in an otherwise stationary setting. For a given value of $T$, we are interested in testing the hypothesis that $T$ is a change-point in the collection $\{G_1, G_2, \dots, G_T\}$. Given two graphs with adjacency matrices $A^{(T-1)}$ and $A^{(T)}$, this can be formulated as the problem of testing the two-sample hypotheses $H_0 : P^{(T-1)} = P^{(T)}$ against $H_A : P^{(T-1)} \neq P^{(T)}$. We emphasize that in the above formulation, the parameter $p$ in $\mathrm{ER}(n, p)$, the size $m$ of the chatter community, and the associated communication probability $p_{\epsilon}$ are generally assumed to be unknown.
Many test statistics are available for this change-point detection problem, including those based on graph invariant statistics (such as the number of edges or the number of triangles) or those based on locality statistics (such as the max degree or scan statistics). For a given graph with adjacency matrix $A$, let $N(i) = \{j : A_{i,j} = 1\}$ denote the collection of vertices adjacent to vertex $i$. Furthermore,
• let $T_k$ count the number of $k$-cliques in $A$ for $k \ge 2$;
• let $\delta(A) := \max_i \sum_j A_{i,j}$ be the max degree statistic of $A$;
• let $\Psi(A) := \max_i \sum_{j,k \in N(i)} A_{j,k}$ be the scan statistic of $A$.
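These statistics admit short direct implementations. The sketch below (our own, with a trivially checkable input) computes $T_2$, $T_3$, $\delta(A)$, and $\Psi(A)$ for a dense adjacency matrix.

```python
import numpy as np

def num_edges(A):
    # T_2: the number of edges (2-cliques).
    return int(A.sum()) // 2

def num_triangles(A):
    # T_3: the number of triangles; trace(A^3) counts each one 6 times.
    return int(round(np.trace(A @ A @ A))) // 6

def max_degree(A):
    # delta(A) := max_i sum_j A_ij.
    return int(A.sum(axis=1).max())

def scan_statistic(A):
    # Psi(A) := max_i sum_{j,k in N(i)} A_jk, i.e. twice the number of
    # edges among the neighbors of i, maximized over vertices i.
    best = 0
    for i in range(A.shape[0]):
        nbrs = np.flatnonzero(A[i])
        best = max(best, int(A[np.ix_(nbrs, nbrs)].sum()))
    return best

# Sanity check on the triangle K_3, where every statistic is explicit.
K3 = np.ones((3, 3)) - np.eye(3)
assert num_edges(K3) == 3 and num_triangles(K3) == 1
assert max_degree(K3) == 2 and scan_statistic(K3) == 2
```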
We note that these test statistics are widely used in anomaly detection for time series of graphs; see [2,29,30,39] and the references therein for a survey of results and applications.
One can then show [31,34] that the test statistics based on $T_2$ and $T_3$ are consistent for the above hypothesis test when $m = \Omega(\sqrt{n})$. More precisely, under the null hypothesis the (suitably normalized) statistics converge in distribution as $n \to \infty$, while under the alternative hypothesis they diverge as $n \to \infty$ at a rate involving positive constants $C_1$ and $C_2$ together with a centering quantity $\mu_{n,m,p,\epsilon}$; in contrast, when $m = o(\sqrt{n})$, no such test statistic is consistent for testing the above hypotheses. Similarly, one can also show [32,34] that the test statistics based on $\delta(A)$ and $\Psi(A)$ are consistent for the above hypothesis test when $m = \Omega(\sqrt{n \log n})$; in particular, the (normalized) limiting distributions of both $\delta(A^{(T)}) - \delta(A^{(T-1)})$ and $\Psi(A^{(T)}) - \Psi(A^{(T-1)})$ are Gumbel.
In the context of this paper, one could also use a test statistic based on the largest eigenvalue. Our earlier results indicate that, under the null hypothesis, with high probability the largest eigenvalues of $A^{(T)}$ and $A^{(T-1)}$ satisfy $|\lambda_{\max}(A^{(T)}) - \lambda_{\max}(P^{(T)})| = O(1)$ and $|\lambda_{\max}(A^{(T-1)}) - \lambda_{\max}(P^{(T-1)})| = O(1)$, along with $|\lambda_{\max}(A^{(T)}) - \lambda_{\max}(A^{(T-1)})| = O(1)$. Meanwhile, under the alternative hypothesis, when $m = o(n)$, then with high probability the difference $|\lambda_{\max}(A^{(T)}) - \lambda_{\max}(A^{(T-1)})|$ is no longer bounded by a constant. Thus the largest eigenvalue test statistic is also consistent when $m = \Omega(\sqrt{n})$. The previous test statistics are all global test statistics in the sense that, if $H_0$ is rejected, the resulting test procedures do not extract the subset of the vertices which exhibits anomalous behavior between $A^{(T)}$ and $A^{(T-1)}$. One can construct related local test statistics which do extract the subset of anomalous vertices, although the resulting test procedure is computationally prohibitive. For example, assuming that $m$ is known, we could replace $\Psi(A)$ with a modified scan statistic $\Lambda_m$ defined by maximizing over vertex subsets of size $m$. For any fixed $p$ and $\epsilon$, the test statistic based on $\Lambda_m$ is then also consistent for the above hypothesis test whenever $m = \Omega(\log n)$ as $n \to \infty$.
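The consistency heuristic for the largest eigenvalue statistic can be illustrated by simulation. In the sketch below (our own; all parameter values are arbitrary assumptions), the null gap between two independent $\mathrm{ER}(n, p)$ samples stays small, while planting an $m$-vertex chatter community visibly shifts $\lambda_{\max}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, p, eps = 1200, 300, 0.1, 0.15

def sample_adj(P):
    U = rng.random(P.shape) < P
    A = np.triu(U, 1).astype(float)
    return A + A.T

# Null: both graphs are ER(n, p). Alternative: the second graph contains an
# m-vertex "chatter" community with elevated probability p + eps.
P_null = np.full((n, n), p)
P_alt = P_null.copy()
P_alt[:m, :m] = p + eps

lam = lambda A: np.linalg.eigvalsh(A)[-1]
null_gap = abs(lam(sample_adj(P_null)) - lam(sample_adj(P_null)))
alt_gap = abs(lam(sample_adj(P_alt)) - lam(sample_adj(P_null)))

# Under H_A the planted community lifts the largest eigenvalue well beyond
# the O(1) null fluctuation.
print(f"null gap ~ {null_gap:.2f}, alternative gap ~ {alt_gap:.2f}")
assert alt_gap > 5 * max(null_gap, 1.0)
```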
In summary, the results in Section 3 facilitate eigenvalue-based test statistics for the change-point detection problem as presented in this section. Furthermore, the resulting procedure is consistent whenever the size of the chatter community m exceeds the threshold of detectability given in [2].

Acknowledgments
The authors thank the anonymous referees for their valuable feedback which has improved the quality of this paper.

Proof of Theorem 3.3
Proof. Let $P, E \in \mathbb{R}^{n\times n}$ be real symmetric matrices such that $E$ satisfies Proposition 3.1. Denote the $d$ largest eigenvalues of $P$ and $A$ by $0 < \lambda_1(P) \le \lambda_2(P) \le \cdots \le \lambda_d(P)$ and $0 < \lambda_1(A) \le \lambda_2(A) \le \cdots \le \lambda_d(A)$, respectively. For each $i \in [d]$ define $\eta_i$ to be an "approximate eigenvalue of $A$ close to $\lambda_i(P)$" in the sense that $\eta_i := \langle Aw_i, w_i\rangle$, and define a corresponding "residual quantity" $\epsilon_i := \|(A - \eta_i)w_i\|$. We now define a collection of "aggregate quantities":
• Define $w$ to be an "aggregate approximate eigenvector of $A$" in the sense that $w := \sum_{i=1}^{k} r_i w_i$ for a collection of normalized coefficients $\{r_i\}_{i=1}^{k}$ such that $\|w\|^2 = \|\sum_{i=1}^{k} r_i w_i\|^2 = 1$, satisfying the under-determined linear system $\langle w, u_i\rangle = 0$ for $i = 1, 2, \dots, k-1$.
• Define $\eta$ to be an "aggregate approximate eigenvalue of $A$" in the sense that $\eta := \langle Aw, w\rangle$.
• Define $\epsilon$ to be the "aggregate residual quantity" $\epsilon := \|(A - \eta)w\|$.
By Lemma 1 in [18], the interval $(\alpha, \eta + \tfrac{\epsilon^2}{\eta - \alpha}]$ contains a point in the spectrum of $A$. Note that by construction, $w \in \mathcal{M}_{k-1}^{\perp} =: \mathcal{N}_{k-1}$; moreover, $Aw \in \mathcal{N}_{k-1}$ as a function of $\{r_i\}_{i=1}^{k}$. In the Hilbert space $\mathcal{N}_{k-1}$, however, the spectrum of $A$ does not contain $\lambda_1(A), \dots, \lambda_{k-1}(A)$ since $u_1, \dots, u_{k-1} \notin \mathcal{N}_{k-1}$. Thus, by another application of Lemma 1 in [18], the eigenvalue of $A$ in the interval $(\alpha, \eta + \tfrac{\epsilon^2}{\eta - \alpha}]$ must be $\lambda_k(A)$ with associated unit eigenvector $u_k$. We pause briefly to make several computational observations. Letting $\delta_{i,j} := \mathbb{I}\{i = j\}$ denote the Kronecker delta function, we have for each $i, j \in [d]$ that $\langle Aw_i, w_j\rangle = \lambda_i(P)\delta_{i,j} + \langle Ew_i, w_j\rangle$. It will also prove useful to expand $\eta = \langle Aw, w\rangle$ in terms of the quantities $\eta_i$ and the cross terms $\langle Ew_i, w_j\rangle$; an application of the Cauchy–Schwarz inequality coupled with subsequent computation then controls these cross terms. By a simple union bound, observe that for $t > 0$,
$$\mathbb{P}\left[\max_{1\le i\le j\le k} |\langle Ew_i, w_j\rangle| > t\right] \le \left(k + \tbinom{k}{2}\right)C\exp(-ct^{\gamma}), \tag{6.12}$$
in which case the relevant cross terms are uniformly bounded by $t$ with high probability. By adding and subtracting $\left(\sum_{i=1}^{k} r_i^2\eta_i\right)k(k-1)t$ to the numerator of Eqn. (6.3), we obtain a bound in which the first term on the right-hand side is the leading term while the second term on the right-hand side corresponds to a residual term.
Now, by the same arguments as in [18], Section 3, Eqns. (22)–(30), the constants $\{r_i\}_{i=1}^{k}$ can be removed: the quantity of interest is bounded above by its counterpart with the $r_i$ eliminated. Note that $\max_{1\le i\le k}\eta_i \le \lambda_k(P) + t$ with high probability, while a simple computation reveals that for each $i \in [k]$, $\epsilon_i^2 = \|Ew_i\|^2 - \langle Ew_i, w_i\rangle^2 \le \|Ew_i\|^2$. Putting all these observations together finally produces an upper bound on $\lambda_k(A)$ of the form $\lambda_k(A) \le \lambda_k(P) + t + \zeta_{+}$, where
$$\zeta_{+} := \frac{k\|E\|_2^2 + \big((\lambda_k(P) - \alpha) + (\lambda_k(P) - \lambda_1(P)) + 3t\big)\,k(k-1)t}{\lambda_1(P) - \alpha - (k(k-1)+1)t}.$$

Proof of Theorem 3.3: lower bound
Fix $k \in [d]$ and let $l := d - k + 1$. Define $\mathcal{M}_{l-1}$ to be the $(l-1)$-dimensional linear manifold given by $\mathcal{M}_{l-1} := \mathrm{span}\{u_{k+1}, \dots, u_d\}$. We now define a collection of "aggregate quantities" similar to the formulation in Section 6.1.1:
• Define $w$ to be an "aggregate approximate eigenvector of $A$" in the sense that $w := \sum_{i=k}^{d} r_i w_i$ for a collection of normalized coefficients $\{r_i\}_{i=k}^{d}$ such that $\|w\|^2 = \|\sum_{i=k}^{d} r_i w_i\|^2 = 1$, satisfying the under-determined linear system $\langle w, u_i\rangle = 0$ for $i = k+1, \dots, d$.
• Define $\eta$ to be an "aggregate approximate eigenvalue of $A$" in the sense that $\eta := \langle Aw, w\rangle$.
• Define $\epsilon$ to be the "aggregate residual quantity" $\epsilon := \|(A - \eta)w\|$.
By Lemma 2 in [18], the interval $[\eta - \tfrac{\epsilon^2}{\beta - \eta}, \beta)$ contains a point in the spectrum of $A$. Note that by construction, $w \in \mathcal{M}_{l-1}^{\perp} =: \mathcal{N}_{l-1}$; moreover, $Aw \in \mathcal{N}_{l-1}$ as a function of $\{r_i\}_{i=k}^{d}$. In the Hilbert space $\mathcal{N}_{l-1}$, however, the spectrum of $A$ does not contain $\lambda_{k+1}(A), \dots, \lambda_d(A)$ since $u_{k+1}, \dots, u_d \notin \mathcal{N}_{l-1}$. Thus, by another application of Lemma 2 in [18], the eigenvalue of $A$ in the interval $[\eta - \tfrac{\epsilon^2}{\beta - \eta}, \beta)$ must be $\lambda_k(A)$ with associated unit eigenvector $u_k$.
Consider first the special case when $\beta = \infty$. By a simple union bound, observe that for $t > 0$,
$$\mathbb{P}\left[\max_{k\le i\le j\le d} |\langle Ew_i, w_j\rangle| > t\right] \le \left(l + \tbinom{l}{2}\right)C\exp(-ct^{\gamma}), \tag{6.21}$$
hence the relevant cross terms are uniformly bounded by $t$ with high probability. Now suppose that $\beta < \infty$. Then, for the lower endpoint of the above interval, reversing the direction of the previous application of the Cauchy–Schwarz inequality in Eqn. (6.9) permits the numerator to be bounded below, whereas the denominator has the expansion $\sum_{i=k}^{d} r_i^2(\beta - \eta_i) + \sum_{k\le i<j\le d} 2r_ir_j\langle Ew_i, w_j\rangle$.
In the numerator of Eqn. (6.23), add and subtract the quantity $\left(\sum_{i=k}^{d} r_i^2\eta_i\right)l(l-1)t$, which is bounded below by $(\lambda_k(P) - t)\,l(l-1)t$. By employing the same approach used to obtain the upper bound and taking negatives when necessary (thereby reversing the direction in which bounds hold), we obtain a lower bound for $\lambda_k(A)$ of the form $\lambda_k(A) \ge \lambda_k(P) - t - \zeta_{-}$, where
$$\zeta_{-} := \frac{l\|E\|_2^2 + \big((\beta - \lambda_k(P)) + (\lambda_d(P) - \lambda_k(P)) + 3t\big)\,l(l-1)t}{\beta - \lambda_d(P) - (l(l-1)+1)t}.$$

Proof of Theorem 3.12
Proof. The proof follows essentially mutatis mutandis as in Theorem 3.3 via Remark 2.4, Definition 3.10, and Lemma 3.11. In particular, observe that one has $\langle(\tilde{M} + \tilde{E})\tilde{w}_i, \tilde{w}_j\rangle = \sigma_i\delta_{i,j} + \langle\tilde{E}\tilde{w}_i, \tilde{w}_j\rangle$ for each pair $i, j$, while at the same time $\|\tilde{E}\|_2 = \|E\|_2$.
If in addition $m = n$ and $E$ is assumed to be symmetric, then since $\|E\|_2 \equiv \sup_{\|x\|=1} |\langle Ex, x\rangle|$, one need only consider the single $\tfrac{1}{4}$-net $X$ for the purposes of a union bound.