Spectral distances on graphs

By assigning to each graph a probability measure via the spectrum of its normalized Laplacian and using the L^p Wasserstein distances between probability measures, we define the corresponding spectral distances d_p on the set of all graphs. This approach extends even to measuring distances between infinite graphs. We prove that the diameter of the set of graphs, as a pseudo-metric space equipped with d_1, is one. We further study the behavior of d_1 as the size of graphs tends to infinity by means of interlacing inequalities, with a view toward large real networks. In simulations, we observe a monotonic relation between d_1 and the evolutionary distance of biological networks.


Introduction
One major interest in graph theory is to explore structural differences between graphs, that is, differences in the sense of graph isomorphism. In computational complexity theory, the subgraph isomorphism problem, like many combinatorial problems in graph theory, is NP-hard. Therefore, a method that gives a quick and easy estimate of the difference between two graphs is desirable [34]. All the topological information of a graph is encoded in its adjacency matrix. Spectral graph theory studies the relationship between the properties of graphs and the spectra of their representing matrices, such as adjacency matrices and Laplace matrices [14,18,17]. In particular, some important topological information of a graph can be extracted from specific eigenvalues, such as the first or the largest one, see e.g. [18,17,39,11,25,12,10]. The approach of reading information from the entire spectrum of a graph was explored in [5,6,7,30,32], among others. In spite of the existence of co-spectral graphs (see [38, Chapter 3] for a general construction and the references therein), graph spectra offer one way to approach problems involving (sub)graph isomorphism, thanks to fast algorithms for computing them and their close relationship with the structure of graphs.
A spectral distance on the set of finite graphs of the same size, i.e. with the same number of vertices, was suggested in a problem of Richard Brualdi in [37] to explore the so-called cospectrality of a graph. It was further studied in [26] using the spectra of adjacency matrices. Employing certain Gaussian measures associated to the spectra of normalized Laplacians and the corresponding L^1 distances, the first named author, Jost, the third named author and Stadler [21,20] explored a spectral distance that is well-defined on the set of all finite graphs, without any constraint on their sizes. In this paper, instead of Gaussian measures, we assign Dirac measures to graphs through the spectra of their normalized Laplacians and use the Wasserstein distances between probability measures to define spectral distances between graphs. In fact, this notion of spectral distances provides a metrization of the notion of spectral classes of graphs introduced in [21] via the weak convergence of the corresponding Dirac measures. The spectral class can be considered as a weak notion of graph limit (see the concepts of graphon, graphing and related theories in the monograph of Lovász [33]). This notion of spectral distances even applies to weighted infinite graphs. We also prove diameter estimates with respect to these distances, which are sharp in certain cases.
A weighted graph G is a triple (V, E, θ), where V is the set of vertices, E is the set of edges and $\theta: E \to (0,\infty)$, $(x,y) \mapsto \theta_{xy}$, is the (symmetric) edge weight function. We write $x \sim y$ or $xy \in E$ if $\theta_{xy} > 0$. We assume that for any vertex x, the weighted degree $\theta_x := \sum_{y \sim x} \theta_{xy}$ is finite and that $\theta_{xx} = 0$ (i.e. there are no self-loops).
Let us first consider finite weighted graphs. The normalized Laplacian of G = (V, E, θ) is defined as, for any function $f: V \to \mathbb{R}$ and any $x \in V$,
$$\Delta_G f(x) := f(x) - \frac{1}{\theta_x}\sum_{y \sim x} \theta_{xy} f(y), \qquad (1)$$
with the convention that the sum vanishes if x is isolated. This operator can be extended to infinite weighted graphs with countable vertex set V that are not necessarily locally finite (see [27] or Section 2 below). As a matrix, $\Delta_G$ is unitarily equivalent to the Laplace matrix studied in [17]. If $x \in V$ is an isolated vertex, i.e. $\theta_x = 0$, then (1) reads $\Delta_G f(x) = f(x)$. Hence an isolated vertex contributes an eigenvalue 1 to the spectrum of $\Delta_G$, denoted by σ(G). Since there are no self-loops, every diagonal entry of $\Delta_G$ equals 1, so the spectrum $\sigma(G) = \{\lambda_i\}_{i=1}^N$ of any finite weighted graph, counted with multiplicity, satisfies the trace condition
$$\sum_{i=1}^N \lambda_i = N, \qquad (2)$$
where N = |V|. It is well known that σ(G) is contained in [0, 2]. We associate to σ(G) a probability measure on [0, 2] as follows:
$$\mu_{\sigma(G)} := \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i}, \qquad (3)$$
where $\delta_{\lambda_i}$ is the Dirac measure concentrated on $\lambda_i$. We call $\mu_{\sigma(G)}$ the spectral measure of a finite weighted graph. (It is known as the empirical distribution of the eigenvalues in random matrix theory.) Denote by P([0, 2]) the set of probability measures on the interval [0, 2]. For any µ ∈ P([0, 2]), the first moment of µ is defined as $m_1(\mu) := \int_{[0,2]} \lambda \, d\mu(\lambda)$. The trace condition (2) then translates to
$$m_1(\mu_{\sigma(G)}) = 1. \qquad (4)$$
Definition 1.1. Given two finite weighted graphs G = (V, E, θ) and G′ = (V′, E′, θ′) and $1 \le p < \infty$, the spectral distance between G and G′ is defined as
$$d_p(G, G') := d^W_p(\mu_{\sigma(G)}, \mu_{\sigma(G')}), \qquad (5)$$
where $d^W_p$ denotes the L^p Wasserstein distance on P([0, 2]).
We denote by FG the space of all finite weighted graphs. Then for any $1 \le p < \infty$, (FG, $d_p$) is a pseudo-metric space. It is not a metric space, since co-spectral graphs are at distance zero. In applications, however, this spectral viewpoint simplifies measuring the discrepancy between graphs.
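As an illustration of Definition 1.1, here is a minimal pure-Python sketch that uses only the standard library. All function names are hypothetical. It computes the spectrum of $\Delta_G$ through the symmetric matrix $I - D^{-1/2} A D^{-1/2}$, which has the same eigenvalues, using cyclic Jacobi rotations. It then evaluates $d_p$ as the $L^p$ distance between the two quantile (inverse distribution) functions on [0, 1]. This is a standard identity for Wasserstein distances on the real line and the content of Lemma 2.2 below. The sketch is intended for small graphs only.

```python
import math


def normalized_laplacian(w):
    """Symmetric matrix I - D^{-1/2} A D^{-1/2}, similar to Delta_G = I - D^{-1} A.

    `w` is a symmetric weight matrix with zero diagonal (no self-loops). An
    isolated vertex keeps the diagonal entry 1, i.e. contributes eigenvalue 1.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    lap = [[float(x == y) for y in range(n)] for x in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and w[x][y] > 0:
                lap[x][y] = -w[x][y] / math.sqrt(deg[x] * deg[y])
    return lap


def jacobi_eigenvalues(s, tol=1e-20, max_sweeps=100):
    """Sorted eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in s]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                sn = t * c
                for k in range(n):  # A <- A R on columns p, q
                    a[k][p], a[k][q] = c * a[k][p] - sn * a[k][q], sn * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- R^T A on rows p, q
                    a[p][k], a[q][k] = c * a[p][k] - sn * a[q][k], sn * a[p][k] + c * a[q][k]
    return sorted(a[i][i] for i in range(n))


def spectral_distance(spec_g, spec_h, p=1.0):
    """d_p(G, H) = d^W_p(mu_sigma(G), mu_sigma(H)) via step quantile functions on [0, 1]."""
    a, b = sorted(spec_g), sorted(spec_h)
    n, m = len(a), len(b)
    cuts = sorted({i / n for i in range(n + 1)} | {j / m for j in range(m + 1)})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # both quantile functions are constant on (lo, hi)
        fa = a[min(int(mid * n), n - 1)]
        fb = b[min(int(mid * m), m - 1)]
        total += (hi - lo) * abs(fa - fb) ** p
    return total ** (1 / p)


K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P2 = [[0, 1], [1, 0]]
s_k3 = jacobi_eigenvalues(normalized_laplacian(K3))
s_p2 = jacobi_eigenvalues(normalized_laplacian(P2))
print(s_k3, sum(s_k3) / len(s_k3), spectral_distance(s_k3, s_p2))
```

Up to rounding, the sketch returns the following for the complete graph on three vertices and the path on two vertices:

- spectra {0, 1.5, 1.5} and {0, 2};
- first moment 1, as required by (4);
- $d_1 = 0.5$.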
One of the main results of our paper is the following theorem. In fact, Theorem 1.2 follows from estimates on the Wasserstein distance between probability measures on [0, 2] with prescribed first moment.
By Proposition 2.1 and Lemma 2.2 below, one easily shows that the above measure-theoretic estimate is equivalent to the following analytic estimate.
Section 3 is devoted to the proofs of Theorems 1.2, 1.4 and 1.5. We extend our approach to spectral distances of infinite graphs (with countable vertex set V) in Section 4. Since the above arguments only use the normalization of the first moment of the spectral measures, i.e. $m_1(\mu_{\sigma(G)}) = 1$, our results generalize to all weighted graphs, including infinite ones. For spectral measures with distinguished vertices on infinite graphs, we refer to Mohar-Woess [36]. We introduce two definitions of spectral measures for infinite graphs. One is defined via an exhaustion of the infinite graph, using the spectral measures of normalized Dirichlet Laplacians on finite subgraphs. The other is defined for random rooted graphs, following Benjamini-Schramm [13], Aldous-Lyons [2] and Abért-Thom-Virág [1].
We denote by G the collection of all (possibly infinite) weighted graphs. For any G ∈ G, we define SM(G) as the set of spectral measures of G obtained by exhaustion, see Definition 4.1; it is a closed subset of P([0, 2]). Then G endowed with the Hausdorff distance induced from the metric space $(P([0,2]), d^W_p)$, denoted by $d_{p,H}$, is a pseudo-metric space. A direct application of Theorem 1.4 yields the following corollary (restated below as Theorem 4.2).
For any D ≥ 1, we denote by $RRG_D$ the collection of random rooted graphs of degree D; see Section 4.2 for definitions. Any finite weighted graph G gives rise to a random rooted graph by choosing the root of G uniformly at random. There are many interesting classes of random rooted graphs, such as unimodular and sofic ones, see [1]. With each random rooted graph $G \in RRG_D$ we associate an expected spectral measure, denoted by $\mu_G$. In this way, $RRG_D$ endowed with the Wasserstein distance $d^W_p$ between expected spectral measures ($d_p$ in short) is a pseudo-metric space. By Theorem 1.4, one can prove the following corollary (restated below as Theorem 4.4).
We then concentrate on the spectral distance $d_1$. In Section 5, we calculate $d_1$ for several particular classes of graphs. With a view toward applications to large real networks, we are mostly concerned with the behavior of $d_1$ as the size N of graphs tends to infinity. For neighboring cubes and paths, we observe that $d_1$ decays like O(1/N) as N → ∞ (see Remarks 5.6 and 5.8). The asymptotic behavior of $d_1$ is studied in general in Section 6 by employing interlacing inequalities for the spectra of finite weighted graphs. For two graphs G and G′ that differ from each other by some standard operations, including e.g. edge deletion, vertex replication, vertex contraction and edge contraction, we prove
$$d_1(G, G') \le \frac{C}{N},$$
where C depends only on the operations and is independent of the size N of G (see Theorem 6.3). From this result, we further derive a convergence result for graphs under the distance $d_1$.
In the last section, we apply the distance $d_1$ to study the evolutionary process of biological networks by simulations. We start from a Barabási-Albert scale-free network, which has been shown to model a very common type of large real networks [8]. We then simulate the evolutionary process by two operations, edge-rewiring and duplication-divergence, separately. We observe a monotonic relation between $d_1$ and the evolutionary distance, which is a crucial point for further applications in exploring the evolutionary history of biological networks.

Preliminaries, spectral measures and spectral distances
In this section, we recall basics about graph spectra and Wasserstein distances on the space of probability measures, and define the spectral distances of finite graphs. Spectral distances of infinite graphs and random graphs are postponed to Section 4.
Let us consider a possibly infinite weighted graph G = (V, E, θ), where V is a countable (possibly infinite) set. We require that the weight function θ satisfies $\sum_{y \in V} \theta_{xy} < \infty$ for all $x \in V$.
The weighted degree of the vertex $x \in V$ is still defined as $\theta_x := \sum_{y \sim x} \theta_{xy}$. The graph is called connected if for every two vertices $x, y \in V$ there exists a finite path $x = x_0 \sim x_1 \sim \cdots \sim x_n = y$ connecting x and y.
We define the (formal) normalized Laplacian ∆ on its formal domain, as in (1). As a linear operator, its restriction to the Hilbert space $\ell^2(V, \theta)$ is bounded and self-adjoint; for details see [27]. If G = (V, E, θ) is a weighted graph without isolated vertices, i.e. $\theta_x > 0$ for all $x \in V$, then the normalized Laplacian of G can be rephrased as
$$\Delta_G = I - D^{-1}A,$$
where D is the degree operator and A is the adjacency operator, defined by $D\tau_x = \theta_x \tau_x$ and $A\tau_x = \sum_{y \sim x} \theta_{yx} \tau_y$, where $\tau_x(y) = 1$ if y = x and 0 otherwise. That is, for any finitely supported function $f: V \to \mathbb{R}$,
$$\Delta_G f(x) = f(x) - \frac{1}{\theta_x}\sum_{y \sim x} \theta_{xy} f(y).$$
Since $D^{-1}A$ is a bounded self-adjoint operator on $\ell^2(V, \theta)$ with operator norm at most 1, the spectrum of $\Delta_G$, denoted by σ(G), is contained in the interval [0, 2]. We order the spectrum of any finite weighted graph G in nondecreasing order:
$$0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N \le 2,$$
where N = |V|. For convenience, we also denote the spectrum of G by a vector, called the spectral vector of G, $\lambda_G := (\lambda_1, \lambda_2, \ldots, \lambda_N)$.
2.1. Spectral measures.
Let G be a finite weighted graph. We denote by $F_G$ the cumulative distribution function associated to $\mu_{\sigma(G)}$ (recall (3)), and by $F_G^{-1}(t) := \inf\{\lambda \in [0,2] : F_G(\lambda) \ge t\}$, $t \in [0,1]$, its (generalized) inverse. Recalling the trace condition (2), we have the following proposition.
Proposition 2.1. Let G = (V, E, θ) be a finite weighted graph. Then the following are true:
Since the spectrum of the normalized Laplacian of a graph lies in the interval [0, 2] ⊂ ℝ, one may calculate the spectral distance (5) explicitly. This is an advantage of probability measures supported in a one-dimensional space. In fact, the spectral distance between two finite weighted graphs G and G′, i.e. the Wasserstein distance between the two spectral measures $\mu_{\sigma(G)}$ and $\mu_{\sigma(G')}$, can be calculated via the inverse cumulative distribution functions $F_G^{-1}$ and $F_{G'}^{-1}$, thanks to the following lemma.
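One can show that if two graphs have the same number of vertices, say N, then the spectral distance between them reduces to the normalized ℓ^p distance between the spectral vectors, i.e. for any $1 \le p < \infty$,
$$d_p(G, G') = N^{-1/p}\,\|\lambda_G - \lambda_{G'}\|_{\ell^p} = \Big(\frac{1}{N}\sum_{i=1}^N |\lambda_i - \lambda'_i|^p\Big)^{1/p}.$$
In this paper, we are interested in the diameter of the pseudo-metric space (FG, $d_p$),
$$\mathrm{diam}(FG, d_p) := \sup_{G, G' \in FG} d_p(G, G').$$
We denote by {·} a graph consisting of a single vertex without any edge. Then by our convention, σ({·}) = {1}. Clearly, for any weighted graph G, $d_p(G, \{\cdot\}) \le 1$, since $|\lambda - 1| \le 1$ for every $\lambda \in [0,2]$. Hence $\mathrm{diam}(FG, d_p) \le 2$ by the triangle inequality. In the following, we use the (integral) Chebyshev inequality to derive a refined upper bound for the diameter.
Lemma 2.3 (Chebyshev inequality, see [22, Section 2.17] or [19]). For any nonnegative, monotonically increasing integrable functions $f, g: [0,1] \to \mathbb{R}$,
$$\int_0^1 f g \ge \int_0^1 f \int_0^1 g. \qquad (9)$$
i.e. for any finite weighted graphs G and G′,
Proof. Let us denote $f = F_G^{-1}$ and $g = F_{G'}^{-1}$. Then by Chebyshev inequality (9) and Proposition 2.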
Hence, for any 1 ≤ p ≤ 2 we have where we have used that f ≤ 2 and g ≤ 2. This proves the theorem.
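To make the Chebyshev step explicit, here is one way to carry out the computation for p = 2. It rests on the quantile representation of Lemma 2.2 and on our reading of Proposition 2.1, whose statement is not reproduced here. Namely, we assume that $f = F_G^{-1}$ and $g = F_{G'}^{-1}$ are nondecreasing on [0, 1], take values in [0, 2], and satisfy $\int_0^1 f = \int_0^1 g = 1$. This sketches how a bound of this type arises and is not a restatement of the theorem:
$$
\begin{aligned}
d_2(G, G')^2 = \int_0^1 (f - g)^2
&= \int_0^1 f^2 + \int_0^1 g^2 - 2\int_0^1 f g \\
&\le 2\int_0^1 f + 2\int_0^1 g - 2\int_0^1 f \int_0^1 g = 2 + 2 - 2 = 2.
\end{aligned}
$$
Here $f^2 \le 2f$ and $g^2 \le 2g$ use $0 \le f, g \le 2$, and the cross term is bounded by (9). Since [0, 1] carries a probability measure, Hölder's inequality then gives $d_p(G, G') \le d_2(G, G') \le \sqrt{2}$ for $1 \le p \le 2$.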
In the next section, we give a tighter upper bound for the diameter. In particular, in the case p = 1, we derive an optimal upper bound; that is, we prove that $\mathrm{diam}(FG, d_1) = 1$. The tightness of this estimate can be seen from the following two examples.
The following example is more convincing.
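Example 2.6. Let G′ = $P_2$ be the path on two vertices and $G_N$ the unweighted (i.e. $\theta_{xy} = 1$ for every edge xy) complete graph on N vertices. Then it is known that
$$\sigma(P_2) = \{0, 2\}, \qquad \sigma(G_N) = \Big\{0, \underbrace{\tfrac{N}{N-1}, \ldots, \tfrac{N}{N-1}}_{N-1}\Big\}. \qquad (10)$$
Therefore we have
$$d_1(G_N, P_2) = \Big(\frac12 - \frac1N\Big)\frac{N}{N-1} + \frac12\Big(2 - \frac{N}{N-1}\Big) = \frac{N-2}{N-1} \to 1 \quad (N \to \infty).$$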
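Two facts from this section can be checked numerically with the pure-Python sketch below:

- the reduction of $d_p$ to a normalized ℓ^p distance for graphs of the same size;
- the value $d_1(G_N, P_2) = (N-2)/(N-1)$ from Example 2.6.

Function names are hypothetical. As before, $d_p$ is evaluated through step quantile functions, which is the one-dimensional formula for the Wasserstein distance.

```python
def d_p_quantile(spec_g, spec_h, p=1.0):
    """d_p between two spectral measures, as the L^p distance between their
    step quantile functions on [0, 1]; the sizes of the spectra may differ."""
    a, b = sorted(spec_g), sorted(spec_h)
    n, m = len(a), len(b)
    cuts = sorted({i / n for i in range(n + 1)} | {j / m for j in range(m + 1)})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # both quantile functions are constant on (lo, hi)
        diff = a[min(int(mid * n), n - 1)] - b[min(int(mid * m), m - 1)]
        total += (hi - lo) * abs(diff) ** p
    return total ** (1 / p)


def d_p_equal_size(spec_g, spec_h, p=1.0):
    """Same-size shortcut: N^{-1/p} * ||lambda_G - lambda_G'||_p for sorted spectral vectors."""
    a, b = sorted(spec_g), sorted(spec_h)
    if len(a) != len(b):
        raise ValueError("the spectra must have the same length")
    return (sum(abs(x - y) ** p for x, y in zip(a, b)) / len(a)) ** (1 / p)


def complete_graph_spectrum(n):
    """Spectrum (10) of the unweighted complete graph on n >= 2 vertices."""
    return [0.0] + [n / (n - 1)] * (n - 1)


P2 = [0.0, 2.0]  # spectrum of the path on two vertices
for n in (3, 10, 100, 1000):
    # d_1(G_N, P_2) computed numerically, next to the closed form (N - 2) / (N - 1)
    print(n, d_p_quantile(complete_graph_spectrum(n), P2), (n - 2) / (n - 1))
```

The printed values increase toward 1, in line with $\mathrm{diam}(FG, d_1) = 1$.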

The proof of the diameter estimate
This section is devoted to the proofs of Theorems 1.2, 1.4 and 1.5. We first prove some lemmata.
We call a function f : In particular, we say f jumps at a and b. Clearly, The name for a 2-step function is evident from the graph of the function. In particular, any inverse function F −1 G of a cumulative distribution function of a graph G with 3 vertices is an admissible 2-step function.
where "=" holds if and only if (ignoring the order of f, g) Observe that the inverse cumulative distribution functions in Example 2.5 are exactly the two functions in (13).
Proof. Let $f: [0,1] \to [0,2]$ (resp. $g: [0,1] \to [0,2]$) be an admissible 2-step function jumping at a and b (resp. c and d). Denote the heights of the first jumps of f and g by $h_1 := \frac{2b-1}{b-a}$ and $h_2 := \frac{2d-1}{d-c}$, respectively. The proof is divided into four cases and several subcases as follows: Fig. 1. For each domain I (resp. II) in Fig. 1, we denote by |I| (resp. |II|) the area of that domain. We reflect the domain II along the line {x = c} to obtain a new domain II′. Since $c \le \frac12$, we have Reflect the domain I along the line {x = d} to obtain I′. Then which is a contradiction. This proves the claim.
By interchanging the role of a, b and c, d, this reduces to the Case 1.
This reduces to Case 2 by the same change as in Case 3.
Combining all the cases and subcases, we prove (12). Finally, we can check case by case that the equality in (12) can be achieved only when f and g are the functions given by the relation (13). This completes the proof.
Before proving the next lemma, we recall some basic facts from convex analysis. Let Ω be a convex subset of $\mathbb{R}^N$, possibly of lower Hausdorff dimension. A function $f: \Omega \to \mathbb{R}$ is called convex if for any $x, y \in \Omega$ and $0 \le t \le 1$,
$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y).$$
In particular, for any norm $\|\cdot\|$ on $\mathbb{R}^N$, the function $f: \mathbb{R}^N \to \mathbb{R}$ defined by $f(x) = \|x - x_0\|$ for some fixed $x_0$ is a convex function. We say a point $x \in \Omega$ is extremal if it cannot be written as a nontrivial convex combination of two other points in Ω, i.e. if $x = t x_1 + (1-t) x_2$ for some $0 < t < 1$ and $x_1, x_2 \in \Omega$, then $x = x_1 = x_2$. The set of extremal points of a convex set Ω is denoted by Ext(Ω). A subset $P \subset \mathbb{R}^N$ is called a (closed) convex polytope if it is the intersection of finitely many half-spaces, i.e. there exist K ∈ ℕ linear functions $\{L_j\}_{j=1}^K$ on $\mathbb{R}^N$ and constants $c_j \in \mathbb{R}$ such that
$$P = \{x \in \mathbb{R}^N : L_j(x) \le c_j,\ 1 \le j \le K\}.$$
We state a well-known fact which will be used to prove the next lemma.
Fact 3.2. Let P be a compact convex polytope and $f: P \to \mathbb{R}$ a convex function. Then $\max_{P} f = \max_{\mathrm{Ext}(P)} f$.
The following lemma is the special case of Theorem 1.2 when two graphs have the same number of vertices.
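Proof. Let P denote the compact convex polytope $\{\alpha \in \mathbb{R}^N : 0 \le \alpha_1 \le \cdots \le \alpha_N \le 2,\ \|\alpha\|_{\ell^1} = N\}$. Then by induction on N, one can show that the set of extremal points of P is
$$\mathrm{Ext}(P) = \{\alpha \in P : \alpha = (0, \ldots, 0, h, \ldots, h, 2, \ldots, 2) \text{ for some } h \in [0,2]\},$$
where each of the three blocks may be empty. Then for any α ∈ P, we define a step function $f_\alpha: [0,1] \to [0,2]$ by $f_\alpha(t) := \alpha_i$ for $t \in \big[\frac{i-1}{N}, \frac{i}{N}\big)$, $1 \le i \le N$, and $f_\alpha(1) := \alpha_N$. In addition, for any γ ∈ Ext(P), $f_\gamma$ is an admissible 2-step function defined in (11).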
Note that for any fixed $\beta_0 \in \mathbb{R}^N$, the function $F: P \to \mathbb{R}$, $F(\alpha) := \|\alpha - \beta_0\|_{\ell^1}$, is convex. By Fact 3.2,
$$\max_{\alpha \in P}\max_{\beta \in P} \|\alpha - \beta\|_{\ell^1} = \max_{\gamma, \theta \in \mathrm{Ext}(P)} \|\gamma - \theta\|_{\ell^1}.$$
This proves the claim. For any γ, θ ∈ Ext(P), noting that $f_\gamma$ and $f_\theta$ are admissible 2-step functions, by Lemma 3.1 we have Combining this with (15), we prove the lemma.
Now we can prove Theorem 1.5. A function $f: [0,1] \to [0,2]$ is called a rationally distributed step function if there is a (rational) partition $0 = r_0 < r_1 < \cdots < r_N = 1$ with $r_i \in \mathbb{Q}$ for all $0 \le i \le N$ and an increasing sequence $0 \le a_1 < \cdots < a_N \le 2$ such that $f(t) = a_i$ for $t \in [r_{i-1}, r_i)$, $1 \le i \le N$.
Proof of Theorem 1.5. First, we consider p = 1. By a standard approximation argument, any such functions f and g can be approximated in the L^1 norm by sequences of rationally distributed step functions, say $\{f_n\}_{n=1}^\infty$ and $\{g_n\}_{n=1}^\infty$, satisfying $\int_0^1 f_n = \int_0^1 g_n = 1$. Hence it suffices to prove the theorem for rationally distributed step functions.
W.l.o.g., we may assume f and g are rationally distributed step functions, say

Hence Lemma 3.3 implies that
For $p \in (1, \infty)$, the result follows easily from the case p = 1. This proves the theorem.
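The same-size case (Lemma 3.3) can be sanity-checked numerically along the lines of its proof. By Fact 3.2, the maximum of the convex function $(\alpha, \beta) \mapsto \frac1N\|\alpha - \beta\|_{\ell^1}$ over P × P is attained at pairs of extremal points. We take these to be the vectors in P of the form $(0, \ldots, 0, h, \ldots, h, 2, \ldots, 2)$, which is our reading of the characterization of Ext(P) in that proof. The sketch below enumerates such vectors for small N. It is a numerical check, not a proof, and its helper names are hypothetical.

```python
from itertools import product


def extremal_points(n):
    """Vectors (0,...,0, h,...,h, 2,...,2) in P = {0 <= a_1 <= ... <= a_n <= 2, sum a_i = n}."""
    points = set()
    for zeros, twos in product(range(n + 1), repeat=2):
        middle = n - zeros - twos
        if middle < 0:
            continue
        if middle == 0:
            if 2 * twos != n:  # the sum constraint fails
                continue
            h = 0.0  # no middle block, so h is irrelevant
        else:
            h = (n - 2 * twos) / middle  # forced by sum a_i = n
            if not 0.0 <= h <= 2.0:
                continue
        points.add((0.0,) * zeros + (h,) * middle + (2.0,) * twos)
    return points


def normalized_l1(a, b):
    """(1/N) * ||a - b||_1, i.e. d_1 between the corresponding spectral measures."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def max_extremal_distance(n):
    pts = extremal_points(n)
    return max(normalized_l1(a, b) for a in pts for b in pts)


for n in range(2, 9):
    print(n, max_extremal_distance(n))
```

For even N, the value 1 is attained by $(1, \ldots, 1)$ against $(0, \ldots, 0, 2, \ldots, 2)$, i.e. by N isolated vertices against N/2 disjoint copies of $P_2$. For N = 3 the maximum is 2/3.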

Theorem 1.4 then follows directly.
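Proof of Theorem 1.4. Let $F_\mu$ and $F_\nu$ denote the cumulative distribution functions of the measures µ and ν, respectively. The total area of the rectangle [0, 2] × [0, 1] is equal to 2, and $m_1(\mu) = \int_0^2 (1 - F_\mu(\lambda))\,d\lambda$. Hence the assumption $m_1(\mu) = m_1(\nu) = 1$ gives
$$\int_0^2 F_\mu(\lambda)\,d\lambda = \int_0^2 F_\nu(\lambda)\,d\lambda = 1.$$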
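Two identities behind this proof can be checked on spectral measures with the pure-Python sketch below. The first is the area identity $\int_0^2 F_\mu = 2 - m_1(\mu)$. The second is the one-dimensional formula $d^W_1(\mu, \nu) = \int_0^2 |F_\mu - F_\nu|\,d\lambda$, a standard fact on the real line. Helper names are hypothetical.

```python
def cdf(spectrum, x):
    """F(x) = mu([0, x]) for mu = (1/N) * sum_i delta_{lambda_i}."""
    return sum(1 for lam in spectrum if lam <= x) / len(spectrum)


def _grid(*spectra):
    # breakpoints where at least one CDF can jump, plus the endpoints of [0, 2]
    return sorted(set().union(*spectra) | {0.0, 2.0})


def cdf_area(spectrum):
    """Area under F on [0, 2]; F is constant on each [lo, hi) between atoms."""
    pts = _grid(spectrum)
    return sum((hi - lo) * cdf(spectrum, lo) for lo, hi in zip(pts, pts[1:]))


def w1_via_cdf(spec_a, spec_b):
    """d^W_1 between two spectral measures as the area between their CDFs."""
    pts = _grid(spec_a, spec_b)
    return sum((hi - lo) * abs(cdf(spec_a, lo) - cdf(spec_b, lo))
               for lo, hi in zip(pts, pts[1:]))


k3, p2, p3 = [0.0, 1.5, 1.5], [0.0, 2.0], [0.0, 1.0, 2.0]
print([cdf_area(s) for s in (k3, p2, p3)], w1_via_cdf(k3, p2))
```

Spectra of graphs have first moment 1, so each area equals 1. A measure such as $\delta_{1/2}$, with first moment 1/2, has area 3/2.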

Spectral distances of infinite graphs
In this section, we introduce two definitions of spectral measures for infinite weighted graphs with countable vertex set and extend our approach of spectral distance to this setting.

Spectral measures by exhaustion.
Let G = (V, E, θ) be an infinite weighted graph and $G_\Omega := (\Omega, E|_\Omega, \theta|_{\Omega \times \Omega})$ a finite connected subgraph of G induced by a subset Ω ⊂ V. We introduce the Dirichlet boundary problem of the normalized Laplacian on Ω, see e.g. [10]. Let $\ell^2(\Omega, \theta)$ denote the space of real-valued functions on Ω. Note that every function $f \in \ell^2(\Omega, \theta)$ can be extended to a function $\tilde f \in \ell^2(V, \theta)$ by setting $\tilde f(x) = 0$ for all $x \in V \setminus \Omega$. The normalized Laplacian with the Dirichlet boundary condition on Ω, denoted by $\Delta_{G_\Omega}$, is defined as
$$\Delta_{G_\Omega}: \ell^2(\Omega, \theta) \to \ell^2(\Omega, \theta), \qquad \Delta_{G_\Omega} f := (\Delta \tilde f)|_\Omega.$$
Thus for x ∈ Ω the Dirichlet normalized Laplacian is pointwise defined by
$$\Delta_{G_\Omega} f(x) = f(x) - \frac{1}{\theta(x)}\sum_{y \in \Omega,\, y \sim x} \theta_{xy} f(y),$$
where θ(x) is the weighted degree in the entire graph. A simple calculation shows that $\Delta_{G_\Omega}$ is a positive self-adjoint operator. We arrange the eigenvalues of the Dirichlet Laplace operator $\Delta_{G_\Omega}$ in nondecreasing order, i.e. $\lambda_1(\Omega) \le \lambda_2(\Omega) \le \cdots \le \lambda_N(\Omega)$, where N = |Ω| is the cardinality of Ω. By the trace condition, we also have the key property
$$\sum_{i=1}^N \lambda_i(\Omega) = N.$$
As for finite graphs, we associate with it the spectral measure
$$\mu_\Omega := \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i(\Omega)}.$$
Hence $m_1(\mu_\Omega) = 1$. A sequence of finite connected subgraphs $\{\Omega_n\}_{n=1}^\infty$ is called an exhaustion of the infinite graph G if $\Omega_n \subset \Omega_{n+1}$ for all n ∈ ℕ and $\bigcup_{n=1}^\infty \Omega_n = V$. Hence we obtain a sequence of probability measures $\{\mu_{\Omega_n}\}_{n=1}^\infty$ on [0, 2]. Since P([0, 2]) is compact under the weak topology, we may assume w.l.o.g., after passing to a subsequence, that $\mu_{\Omega_n} \rightharpoonup \mu$ for some µ ∈ P([0, 2]). Note that any subsequence of an exhaustion is still an exhaustion. Therefore we define the spectral measures of an infinite graph via all possible exhaustions. Note that the convergence of spectral structures was studied in a more general setting by Kuwae-Shioya [29].
For any metric space (X, d), one can define the Hausdorff distance between subsets of X. For any subset A ⊂ X, we define the distance function to A as $X \ni x \mapsto d(x, A) = \inf\{d(x, y) \mid y \in A\}$, and the r-neighborhood of A as $U_r(A) := \{y \in X \mid d(y, A) < r\}$, r > 0.
Given two subsets A, B ⊂ X, the Hausdorff distance between them is defined as
$$d_H(A, B) := \inf\{r > 0 : A \subset U_r(B),\ B \subset U_r(A)\}.$$
One can show that the set of closed subsets of X endowed with the Hausdorff distance is a metric space.
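The Hausdorff distance can be illustrated on finite sets of spectral measures. For finite non-empty sets A and B, $d_H(A, B) = \inf\{r > 0 : A \subset U_r(B),\ B \subset U_r(A)\}$ reduces to the larger of $\sup_{a \in A} d(a, B)$ and $\sup_{b \in B} d(b, A)$. The pure-Python sketch below uses $d^W_1$, evaluated through quantile functions, as the underlying metric. Helper names are hypothetical. Sets such as SM(G) are in general infinite, so this is only a finite-set illustration.

```python
def w1(spec_a, spec_b):
    """d^W_1 between two spectral measures via their step quantile functions."""
    a, b = sorted(spec_a), sorted(spec_b)
    n, m = len(a), len(b)
    cuts = sorted({i / n for i in range(n + 1)} | {j / m for j in range(m + 1)})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        total += (hi - lo) * abs(a[min(int(mid * n), n - 1)] - b[min(int(mid * m), m - 1)])
    return total


def hausdorff(A, B, dist):
    """Hausdorff distance between two finite non-empty subsets of a metric space."""
    def d_to_set(x, S):
        return min(dist(x, y) for y in S)
    return max(max(d_to_set(a, B) for a in A), max(d_to_set(b, A) for b in B))


K3, P2, P3 = (0.0, 1.5, 1.5), (0.0, 2.0), (0.0, 1.0, 2.0)  # spectra of K_3, P_2, P_3
print(hausdorff([K3], [K3, P2], w1), hausdorff([K3, P2], [P3], w1))
```

Adding the measure of $P_2$ to a set containing only that of $K_3$ moves the set by $d_1(K_3, P_2) = 0.5$ in the Hausdorff sense.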
Note that for p ∈ [1, ∞), P([0, 2]) endowed with the p-th Wasserstein distance is a metric space, and SM(G) is a closed subset of P([0, 2]) for any weighted graph G. We denote by G the collection of all (possibly infinite) weighted graphs. Hence G endowed with the Hausdorff distance induced from $(P([0,2]), d^W_p)$, denoted by $d_{p,H}$, is a pseudo-metric space.
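To illustrate spectral measures by exhaustion, consider the unweighted two-way infinite path ℤ, exhausted by $\Omega_n = \{1, \ldots, n\}$. This is an example of our choosing, not one from the text. Every vertex has degree 2 in ℤ, so the Dirichlet normalized Laplacian on $\Omega_n$ is $I - \frac12 A_{\Omega_n}$, whose eigenvalues are $1 - \cos\frac{\pi k}{n+1}$, k = 1, …, n. The pure-Python sketch below does two things:

- it checks $m_1(\mu_{\Omega_n}) = 1$;
- it computes exactly the $d^W_1$ distance from $\mu_{\Omega_n}$ to the law of $1 - \cos(\pi U)$, with U uniform on [0, 1] (the arcsine law on [0, 2]).

The decreasing distances are consistent with $\mu_{\Omega_n}$ converging weakly to that law. Helper names are hypothetical.

```python
import math


def dirichlet_spectrum(n):
    """Eigenvalues of the Dirichlet normalized Laplacian on Omega_n = {1, ..., n} in Z.

    Degrees are taken in the whole line (all equal to 2), so the operator is
    I - A_Omega / 2 and its eigenvalues are 1 - cos(pi k / (n + 1)), k = 1..n.
    """
    return [1 - math.cos(math.pi * k / (n + 1)) for k in range(1, n + 1)]


def first_moment(spectrum):
    return sum(spectrum) / len(spectrum)


def w1_to_arcsine(n):
    """Exact d^W_1 between mu_{Omega_n} and the law of q(U) = 1 - cos(pi U).

    The empirical quantile function equals lambda_k on ((k-1)/n, k/n]; the limiting
    one is q(t) = 1 - cos(pi t), with antiderivative Q(t) = t - sin(pi t) / pi.
    Since q(k/(n+1)) = lambda_k and (k-1)/n < k/(n+1) <= k/n, the sign of
    q - lambda_k changes exactly once on each interval.
    """
    def Q(t):
        return t - math.sin(math.pi * t) / math.pi

    total = 0.0
    for k, lam in enumerate(dirichlet_spectrum(n), start=1):
        lo, c, hi = (k - 1) / n, k / (n + 1), k / n
        total += lam * (c - lo) - (Q(c) - Q(lo))  # q <= lam on [lo, c]
        total += (Q(hi) - Q(c)) - lam * (hi - c)  # q >= lam on [c, hi]
    return total


for n in (10, 40, 160):
    print(n, first_moment(dirichlet_spectrum(n)), w1_to_arcsine(n))
```

Since $|q'| \le \pi$ and each point k/(n+1) lies in its interval, the distance is at most π/(2n).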
A direct application of Theorem 1.4 yields
For any D ≥ 1, we define a subcollection of G,
$$G_D := \{(V, E, \theta) \in G \mid \deg x \le D,\ \theta_{xy} \le D \text{ for all } x, y \in V\},$$
where $\deg x = |\{y \in V \mid y \sim x\}|$, i.e. the set of weighted graphs with bounded (unweighted) degree (≤ D) and bounded edge weights (≤ D). Let $RG_D$ denote the set of graphs G in $G_D$ with a distinguished vertex, called the root of G.
For any x, y ∈ V of G = (V, E, θ), we denote by d C (x, y) the distance between x and y, i.e. d C (x, y)  o 1 ), (G 2 , o 2 ) ∈ RG D with G 1 = (V 1 , E 1 , θ 1 ) and G 2 = (V 2 , E 2 , θ 2 ), we define the rooted distance between G 1 and G 2 as 1/K where respectively. One can prove that RG D endowed with the rooted distance is a compact metric space. By a random rooted graph of degree D we mean a Borel probability distribution on RG D . We denote by RRG D the collection of random rooted graphs of degree D. Any finite weighted graph G gives rise to a random rooted graph by assigning the root of G uniformly randomly.
For a rooted weighted graph $(G, o) \in RG_D$ with G = (V, E, θ), the normalized Laplacian is a bounded self-adjoint operator on $\ell^2(V, \theta)$, which is independent of o. By the spectral theorem, there is a projection-valued measure on [0, 2], denoted by $P_\bullet$, i.e. $P_A$ is a projection on $\ell^2(V, \theta)$ for any Borel set A ⊂ [0, 2]. For any continuous function $f \in C([0,2])$ it yields the functional calculus
$$f(\Delta_G) = \int_{[0,2]} f(x)\, dP_x,$$
where $P_x = P_{[0,x]}$. We define the spectral measure of the rooted graph (G, o) as where ⟨·, ·⟩ is the inner product of $\ell^2(V, \theta)$. One can easily show that $\mu_{G,o}$ is a probability measure on [0, 2]. A further calculation using (16) yields $m_1(\mu_{G,o}) = 1$. Now we can define the expected spectral measure for random rooted graphs.
Definition 4.3. Let G be a random rooted graph. We define the expected spectral measure of G as
$$\mu_G := \mathbb{E}\,[\mu_{G,o}],$$
where the expectation is taken over the distribution on $RG_D$.
Let G be a random rooted graph arising from a finite weighted graph with the uniform distribution on its vertices. A calculation similar to that in Abért-Thom-Virág [1] shows that
$$\mu_G = \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i} = \mu_{\sigma(G)},$$
where $\{\lambda_i\}_{i=1}^N$ is the spectrum of the finite graph. Hence the expected spectral measure of random rooted graphs generalizes the spectral measure of finite graphs. There are other interesting classes of random rooted graphs, such as unimodular and sofic ones, see e.g. [1].
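For a finite graph with a uniformly random root, the relation between rooted and expected spectral measures can be checked numerically. The pure-Python sketch below works with the symmetric matrix $I - D^{-1/2}AD^{-1/2}$. As an assumption about normalization, it takes $\mu_{G,o} = \sum_i \varphi_i(o)^2 \delta_{\lambda_i}$ for an orthonormal eigenbasis $\{\varphi_i\}$. This corresponds to pairing the spectral projections with the unit vector at the root. Since the diagonal entries of the matrix equal 1, every $\mu_{G,o}$ has first moment 1. Helper names are hypothetical.

```python
import math
from collections import defaultdict


def normalized_laplacian(w):
    """I - D^{-1/2} A D^{-1/2}; isolated vertices keep the diagonal entry 1."""
    n = len(w)
    deg = [sum(row) for row in w]
    lap = [[float(x == y) for y in range(n)] for x in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and w[x][y] > 0:
                lap[x][y] = -w[x][y] / math.sqrt(deg[x] * deg[y])
    return lap


def jacobi_eigh(s, tol=1e-20, max_sweeps=100):
    """Eigenvalues and orthonormal eigenvectors (columns of v) by cyclic Jacobi."""
    a = [row[:] for row in s]
    n = len(a)
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                sn = t * c
                for k in range(n):  # columns p, q of A and of V
                    a[k][p], a[k][q] = c * a[k][p] - sn * a[k][q], sn * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - sn * v[k][q], sn * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p, q of A
                    a[p][k], a[q][k] = c * a[p][k] - sn * a[q][k], sn * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def rooted_spectral_measure(w, o):
    """Atoms (lambda_i, phi_i(o)^2); the weights sum to 1 since V is orthogonal."""
    lams, v = jacobi_eigh(normalized_laplacian(w))
    return [(lam, v[o][i] ** 2) for i, lam in enumerate(lams)]


def expected_spectral_measure(w, digits=9):
    """Average of mu_{G,o} over a uniformly random root, atoms merged after rounding."""
    n = len(w)
    mass = defaultdict(float)
    for o in range(n):
        for lam, m in rooted_spectral_measure(w, o):
            mass[round(lam, digits)] += m / n
    return dict(mass)


P3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(rooted_spectral_measure(P3, 0), expected_spectral_measure(P3))
```

For the path on three vertices, the rooted measures differ from root to root. For example, an end vertex gives weights 1/4, 1/2, 1/4 at 0, 1, 2. Their average, however, puts mass 1/3 on each eigenvalue, which is the spectral measure (3).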
The set of random rooted graphs of degree D, $RRG_D$, endowed with the Wasserstein distance $d^W_p$ between expected spectral measures ($d_p$ in short), is a pseudo-metric space. By Theorem 1.4, one can prove the following theorem.

Calculation of examples
From now on, we concentrate on the study of the spectral distance $d_1$. We calculate this distance for several classes of graphs in this section. Rather than the exact value of the $d_1$ distance between two graphs, we are more concerned with the asymptotic behavior of the distance between two sequences of graphs that become larger and larger, since real networks in practice are typically huge. All example graphs we consider in this section are unweighted.
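Proposition 5.1. For two complete graphs G and G′ with N and M (M > N) vertices respectively, we have
$$d_1(G, G') = \frac{2(M-N)}{N(M-1)}.$$
Proof. Recall the spectrum (10) of a complete graph. We then calculate the distance (i.e. the area of the grey region shown in Fig. 7),
$$d_1(G, G') = \Big(\frac1N - \frac1M\Big)\frac{M}{M-1} + \Big(1 - \frac1N\Big)\Big(\frac{N}{N-1} - \frac{M}{M-1}\Big) = \frac{2(M-N)}{N(M-1)}.$$
Proof. The spectrum of a complete bipartite graph G with N vertices is
$$\sigma(G) = \{0, \underbrace{1, \ldots, 1}_{N-2}, 2\}.$$
Then the distance is (the area of the grey region shown in Fig. 8)
Proposition 5.5. For two cubes G and G′ of size $2^N$ and $2^{N+1}$ respectively, we have
Proof. The spectrum of the cube G with $2^N$ vertices is
$$\sigma(G) = \Big\{\frac{2k}{N} \text{ with multiplicity } \binom{N}{k} : k = 0, 1, \ldots, N\Big\}.$$
Secondly, we use the recursive formula $\binom{N+1}{k} = \binom{N}{k} + \binom{N}{k-1}$. Therefore the distance between G and G′ equals the area of the grey region depicted in Fig. 9. Again by the recursive formula for binomial coefficients, we calculate,
Figure 9. An example of two neighboring cubes with N = 3 and N + 1 = 4.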
Remark 5.6. The distance between two neighboring cubes (the N-cube and the (N + 1)-cube) is O(1/N) as N tends to infinity. Note that a crucial difference between this example and the previous ones is that the size difference, $2^N$, is not uniformly bounded as N → ∞.
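The closed-form spectra of this section make $d_1$ easy to evaluate numerically. The pure-Python sketch below encodes three families of spectra, with hypothetical names:

- complete graphs $K_N$, as in (10);
- complete bipartite graphs;
- cubes $Q_N$ on $2^N$ vertices.

It then checks two statements that we derived ourselves from these spectra. The first is $d_1(K_N, K_M) = \frac{2(M-N)}{N(M-1)}$ for M > N. The second is $d_1(Q_N, Q_{N+1}) \le \frac{2}{N+1}$. This bound follows by coupling $K \sim \mathrm{Bin}(N, 1/2)$ with K + B, where B is an independent fair bit. It is consistent with the O(1/N) behavior in Remark 5.6.

```python
from math import comb


def complete(n):
    """K_n, n >= 2: eigenvalue 0 once and n/(n-1) with multiplicity n-1."""
    return [0.0] + [n / (n - 1)] * (n - 1)


def complete_bipartite(n):
    """Complete bipartite graph on n >= 2 vertices (both parts non-empty)."""
    return [0.0] + [1.0] * (n - 2) + [2.0]


def cube(n):
    """Cube on 2^n vertices (n >= 1): eigenvalue 2k/n with multiplicity C(n, k)."""
    return [2 * k / n for k in range(n + 1) for _ in range(comb(n, k))]


def d1(spec_a, spec_b):
    """d_1 as the L^1 distance between the two step quantile functions on [0, 1]."""
    a, b = sorted(spec_a), sorted(spec_b)
    n, m = len(a), len(b)
    cuts = sorted({i / n for i in range(n + 1)} | {j / m for j in range(m + 1)})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        total += (hi - lo) * abs(a[min(int(mid * n), n - 1)] - b[min(int(mid * m), m - 1)])
    return total


for n in range(1, 11):
    # distance between neighboring cubes, next to the coupling bound 2/(n+1)
    print(n, d1(cube(n), cube(n + 1)), 2 / (n + 1))
```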
Proposition 5.7. For two paths G and G′ of size N and N + 1 respectively, we have
Proof. The spectrum of the path G with N vertices is
$$\Big\{1 - \cos\frac{\pi i}{N-1} : i = 0, 1, \ldots, N-1\Big\}.$$
. . , N − 2, and every eigenvalue of a path has multiplicity one, the situation is similar to Proposition 5.5, as shown in Fig. 10.
For the last equality above we use the Lagrange's trigonometric identities and their derivatives.
Remark 5.8. By a Taylor expansion argument, we observe that
Therefore, in this example, we have $d_1(G, G') = O(1/N)$ as N tends to infinity.
We can calculate the example of cycles similarly.
Proposition 5.9. For two cycles G and G ′ of size N and N + 1 respectively, we have
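The path and cycle computations can be reproduced numerically in the same way. In the pure-Python sketch below, the path spectrum is the one quoted in the proof of Proposition 5.7. For the cycle $C_N$ we use the standard spectrum $\{1 - \cos\frac{2\pi i}{N} : i = 0, \ldots, N-1\}$. Names are hypothetical. We computed the checks $d_1(P_3, P_4) = 1/4$ and $d_1(C_3, C_4) = 5/12$ by hand from these spectra. The inequality $N \cdot d_1(P_N, P_{N+1}) \le \pi$ follows from comparing both step quantile functions with $t \mapsto 1 - \cos(\pi t)$, and it illustrates the O(1/N) decay in Remark 5.8.

```python
import math


def path(n):
    """Path on n >= 2 vertices: 1 - cos(pi i / (n - 1)), i = 0, ..., n - 1."""
    return [1 - math.cos(math.pi * i / (n - 1)) for i in range(n)]


def cycle(n):
    """Cycle on n >= 3 vertices: 1 - cos(2 pi i / n), i = 0, ..., n - 1."""
    return [1 - math.cos(2 * math.pi * i / n) for i in range(n)]


def d1(spec_a, spec_b):
    """d_1 as the L^1 distance between the two step quantile functions on [0, 1]."""
    a, b = sorted(spec_a), sorted(spec_b)
    n, m = len(a), len(b)
    cuts = sorted({i / n for i in range(n + 1)} | {j / m for j in range(m + 1)})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        total += (hi - lo) * abs(a[min(int(mid * n), n - 1)] - b[min(int(mid * m), m - 1)])
    return total


for n in (10, 100, 1000):
    # N * d_1 stays bounded, i.e. d_1 = O(1/N) for neighboring paths and cycles
    print(n, n * d1(path(n), path(n + 1)), n * d1(cycle(n), cycle(n + 1)))
```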

Distance between large graphs
In this section we explore the behavior of the spectral distance $d_1$ between large graphs in general. We require that the two large graphs differ from each other by only finitely many operations, which will be made precise in Remark 6.1. The main tool we employ is the so-called interlacing inequalities, which describe how the spectrum changes when we perform certain operations on the underlying graph. Results of this kind for the normalized Laplacian of a graph have been studied in [16,31,15,23,3]. In fact, we can observe the interlacing phenomena of eigenvalues for paths and cycles in Propositions 5.7 and 5.9.
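Let G and G′ have N and N − j vertices respectively, where j ∈ ℤ can be either negative or positive. Assume that $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ and $\lambda'_1 \le \lambda'_2 \le \cdots \le \lambda'_{N-j}$ are the spectra of the corresponding normalized Laplacians $\Delta_G$ and $\Delta_{G'}$. Then interlacing inequalities have the following general form:
$$\lambda_{i - k_1} \le \lambda'_i \le \lambda_{i + k_2}, \qquad i = 1, 2, \ldots, N - j, \qquad (17)$$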
with the convention that $\lambda_i = 0$ for $i \le 0$ and $\lambda_i = 2$ for $i > N$, and $k_1, k_2$ are constants independent of the index i.
Remark 6.1. G ′ can be obtained from G by performing the following operations.
• G′ is the proper difference of G and one of its subgraphs L. We say L is a subgraph of G if the weights satisfy $\theta_{L,uv} \le \theta_{G,uv}$ for all u, v. The proper difference of G and L is the weighted graph with weights $\theta_G - \theta_L$. In this case, see also Butler [15]. This includes the operation of deleting an edge (see Chen et al. [16] for the result for this particular operation). Symmetrically, this also covers the operation of adding a graph; see Butler [15] for particular results and Atay-Tunçel [3] for vertex replication.
• G′ is the image of an edge-preserving map ϕ: G → G′. By an edge-preserving map we mean here an onto map from the vertices of G to the vertices of G′ such that for all vertices x, y of G′, and the degrees of vertices are defined according to the edge weights as usual in both graphs. Notice that, for our purpose, we do not allow ϕ to map two neighboring vertices in G to the same vertex in G′, in order to avoid self-loops. In this case, $k_1 = 0$ and $k_2 = j$.
We prove the following result.
Theorem 6.3. Let G, G′ be two graphs for which the spectra of the corresponding normalized Laplacians satisfy (17). Then we have
Proof. By definition, we have By symmetry, w.l.o.g., we can suppose j ≥ 0. We use a particular transport plan to derive the upper bound. We move the mass 1/N from $\lambda_i$ to $\lambda'_i$ for i = 1, 2, …, N − j. We then move the mass at the remaining positions $\lambda_{N-j+1}, \ldots, \lambda_N$ to fill the gaps at $\lambda'_1, \lambda'_2, \ldots, \lambda'_{N-j}$, at a cost of at most 2 per unit of mass transported. That is, we have In the second inequality above, we used the interlacing inequalities (17). This completes the proof.
Remark 6.4. The disjoint union of a path of size N and an isolated vertex can be obtained from a path of size N + 1 by deleting an edge. A cycle of size N can be obtained from a cycle of size N + 1 by contracting an edge. Recalling our calculations in Propositions 5.7 and 5.9, we see that the estimate (18) is sharp in the order of 1/N.
Remark 6.5. This theorem shows that if two large graphs differ only by a few operations of the above types, then the spectral distance between them is small.
If G′ is obtained from G by performing operations such that $k_1, k_2$ are bounded (then j is also bounded), we say that G′ differs from G by a bounded operation.
Corollary 6.6. Let $\{G_i\}_{i=1}^\infty$ be a sequence of graphs whose sizes $N_i$ tend to infinity. Assume that for every i, $G'_i$ differs from $G_i$ by a uniformly bounded operation. Then $\lim_{i \to \infty} d_1(G_i, G'_i) = 0$.
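Remark 6.4 can be illustrated numerically. Deleting an end edge of the path $P_{n+1}$ leaves $P_n$ plus an isolated vertex, whose spectrum is that of $P_n$ together with the eigenvalue 1. The pure-Python sketch below computes $d_1$ for this pair; names are hypothetical. For the test bound we use $(n+1)\, d_1 \le \pi + 2/\pi$, which we derived by comparing both spectral measures with the law of $1 - \cos(\pi U)$, U uniform on [0, 1]. This bound is consistent with the 1/N order of (18), but it is not the constant of Theorem 6.3.

```python
import math


def path(n):
    """Path on n >= 2 vertices: 1 - cos(pi i / (n - 1)), i = 0, ..., n - 1."""
    return [1 - math.cos(math.pi * i / (n - 1)) for i in range(n)]


def d1(spec_a, spec_b):
    """d_1 as the L^1 distance between the two step quantile functions on [0, 1]."""
    a, b = sorted(spec_a), sorted(spec_b)
    n, m = len(a), len(b)
    cuts = sorted({i / n for i in range(n + 1)} | {j / m for j in range(m + 1)})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        total += (hi - lo) * abs(a[min(int(mid * n), n - 1)] - b[min(int(mid * m), m - 1)])
    return total


def end_edge_deletion_distance(n):
    """d_1(P_{n+1}, P_n + isolated vertex); the isolated vertex contributes eigenvalue 1."""
    return d1(path(n + 1), path(n) + [1.0])


for n in (3, 10, 100, 1000):
    # (n + 1) * d_1 stays bounded as n grows
    print(n, (n + 1) * end_edge_deletion_distance(n))
```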

Applications to biological networks
In real biological networks, such as protein interaction networks, edge-rewiring and duplication-divergence are two edit operations that have been shown to be closely related to evolutionary mechanisms, see [24,28]. For a spectral analysis of the effect of such operations on protein interaction networks, we refer to [4]. In this section, we apply the spectral distance $d_1$ to capture evolutionary signals in protein interaction networks by detecting their structural differences. We evolve graphs by edge-rewiring and duplication-divergence operations, and then examine the relation between the spectral distance $d_1$ and the evolutionary distance (i.e. the number of evolutionary operation steps). In the following, we restrict our simulations to unweighted graphs.
Let us first explain the two edit operations on an unweighted graph G = (V, E) explicitly.
• Edge-rewiring: Randomly select two edges $v_1v_3, v_4v_5 \in E$ on four distinct vertices $v_1, v_3, v_4, v_5 \in V$ (see Fig. 11(a)). Delete these two edges and add the new edges $v_1v_4$ and $v_3v_5$. This operation preserves the size of the graph as well as its degree sequence.
• Duplication-divergence: Randomly select a target vertex $v_3 \in V$. Add a replica $v_2$ of $v_3$ together with new potential edges connecting $v_2$ with every neighbor of $v_3$. Each of these potential edges is activated with a certain probability (0.5 in our simulations). If at least one of these potential edges is established, keep the replica $v_2$; otherwise, delete it (see Fig. 11(b)).
Our simulations are designed as follows. We start from a Barabási-Albert scale-free graph with 1000 vertices. It is obtained through a mechanism incorporating growth and preferential attachment, starting from a small complete graph of size 10, see [8,9]. In each step of preferential attachment, we add one vertex with two edges. We remark that the Barabási-Albert scale-free graph is not necessarily the best starting model for every biological network. However, in many cases it is closer to biological networks than the other two popular models, the Erdős-Rényi random graph and the Watts-Strogatz small-world graph. Therefore, we use it as our starting point here. We carry out edge-rewiring and duplication-divergence operations on this graph iteratively, and then plot the spectral distance between the newly obtained graphs and the original one against the evolutionary distance.
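The two edit operations and the preferential-attachment start can be sketched in plain Python as below. This is our illustration, not the authors' simulation code. Graphs are stored as adjacency sets, and all helper names are hypothetical. Two details not fixed by the text are our assumptions:

- in edge-rewiring, a candidate pair of edges is re-drawn whenever a new edge would duplicate an existing one, so the graph stays simple and degrees are preserved;
- preferential attachment chooses two distinct targets with probability proportional to degree.

Computing $d_1$ between the resulting graphs of about 1000 vertices would additionally require a numerical eigenvalue solver.

```python
import random


def barabasi_albert(n, m0=10, m=2, rng=random):
    """Growth with preferential attachment, starting from the complete graph on m0 vertices."""
    adj = {v: set(range(m0)) - {v} for v in range(m0)}
    # every vertex appears in `ends` once per incident edge, so a uniform draw
    # from `ends` selects a vertex with probability proportional to its degree
    ends = [v for v in range(m0) for _ in range(m0 - 1)]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            ends.extend((new, t))
    return adj


def edge_list(adj):
    return [(u, v) for u in adj for v in adj[u] if u < v]


def edge_rewire(adj, rng=random, max_tries=1000):
    """Replace edges v1v3, v4v5 (four distinct vertices) by v1v4, v3v5."""
    edges = edge_list(adj)
    for _ in range(max_tries):
        (v1, v3), (v4, v5) = rng.sample(edges, 2)
        if rng.random() < 0.5:  # both ways of pairing the endpoints are possible
            v4, v5 = v5, v4
        if len({v1, v3, v4, v5}) < 4 or v4 in adj[v1] or v5 in adj[v3]:
            continue  # assumption: re-draw instead of creating multi-edges
        for x, y in ((v1, v3), (v4, v5)):
            adj[x].discard(y)
            adj[y].discard(x)
        for x, y in ((v1, v4), (v3, v5)):
            adj[x].add(y)
            adj[y].add(x)
        return True
    return False


def duplication_divergence(adj, p=0.5, rng=random):
    """Add a replica v2 of a random v3, linking it to each neighbor of v3 with probability p."""
    v3 = rng.choice(list(adj))
    kept = {u for u in adj[v3] if rng.random() < p}
    if not kept:
        return False  # no potential edge was activated, so the replica is deleted
    v2 = max(adj) + 1
    adj[v2] = kept
    for u in kept:
        adj[u].add(v2)
    return True


g = barabasi_albert(1000, rng=random.Random(0))
print(len(g), len(edge_list(g)))  # 1000 vertices, 45 + 2 * 990 edges
```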
In the plot of Fig. 12, we observe that the spectral distance to the original graph increases more quickly for graphs obtained by edge-rewiring than for those obtained by duplication-divergence. This indicates that, after the same number of operation steps, edge-rewiring introduces more randomness into the graph than duplication-divergence. Recall also that the sizes of graphs are invariant in the former case and vary in the latter case. Although the relation between the two distances is not strictly linear, the spectral distance increases monotonically with the evolutionary distance. This monotonicity is the crucial point that makes the spectral distance useful for exploring the hidden evolutionary history of large real networks.