Metastability for the contact process on the configuration model with infinite mean degree

We study the contact process on the configuration model with a power law degree distribution, when the exponent is smaller than or equal to two. We prove that the extinction time grows exponentially fast with the size of the graph and prove two metastability results. First the extinction time divided by its mean converges in distribution toward an exponential random variable with mean one, when the size of the graph tends to infinity. Moreover, the density of infected sites taken at exponential times converges in probability to a constant. This extends previous results in the case of an exponent larger than $2$ obtained in \cite{CD,MMVY,MVY}.


Introduction
In this paper we will prove metastability results for the contact process on the configuration model with a power-law degree distribution, extending the main results of [CD, MVY, MMVY] to the case when the exponent of the power-law is smaller than or equal to 2.
The contact process is one of the most studied interacting particle systems (see in particular Liggett's book [L]), and is often interpreted as a model for the spread of a virus in a population or a network. Mathematically, it can be defined as follows: given a countable locally finite graph G and λ > 0, the contact process on G with infection rate λ is a continuous-time Markov process (ξ_t)_{t≥0} on {0, 1}^V, with V the vertex set of G. The elements of V, also called sites, are regarded as individuals which are either infected (state 1) or healthy (state 0). By considering ξ_t as a subset of V via ξ_t ≡ {v : ξ_t(v) = 1}, the transition rates are given by
$\xi_t \to \xi_t \setminus \{v\}$ for $v \in \xi_t$ at rate 1, and
$\xi_t \to \xi_t \cup \{v\}$ for $v \notin \xi_t$ at rate $\lambda \deg_{\xi_t}(v)$,
where deg_{ξ_t}(v) denotes the number of edges between v and another infected site (note that if G is a simple graph, in the sense that there is at most one edge between any pair of vertices, then deg_{ξ_t}(v) is just the number of infected neighbors of v at time t).
Since the empty configuration is an absorbing state (and the only one), a quantity of particular interest is the extinction time, defined by τ G = inf{t : ξ t = ∅}.
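The dynamics and the extinction time just defined can be illustrated by a small event-driven simulation. The following is only a sketch under stated assumptions: the function name `simulate_contact_process`, the adjacency-list encoding and the cutoff `t_max` are illustrative choices that do not come from the paper. Each infected vertex recovers at rate 1 and sends an infection along each incident edge at rate λ, which gives a healthy vertex v the infection rate λ deg_{ξ_t}(v).

```python
import random


def simulate_contact_process(adj, lam, initial, t_max, rng=None):
    """Run the contact process until extinction or until time t_max.

    adj     : dict vertex -> list of neighbours (repeat a neighbour for a multi-edge)
    lam     : infection rate (lambda)
    initial : iterable of initially infected vertices
    Returns (t, infected); an empty infected set means t is the extinction time.
    """
    rng = rng or random.Random()
    infected = set(initial)
    t = 0.0
    while infected:
        # An infected vertex v produces events at total rate 1 + lam * deg(v).
        rates = {v: 1.0 + lam * len(adj[v]) for v in infected}
        total = sum(rates.values())
        t += rng.expovariate(total)
        if t >= t_max:
            return t_max, infected
        # Pick the vertex responsible for the event, proportionally to its rate.
        u = rng.random() * total
        for v, r in rates.items():
            u -= r
            if u <= 0:
                break
        if rng.random() < 1.0 / rates[v]:
            infected.discard(v)               # recovery (rate 1)
        else:
            infected.add(rng.choice(adj[v]))  # transmission along a uniform incident edge
    return t, infected
```

Running this from full occupancy on a star graph with a few hundred leaves and a moderate λ typically hits the cutoff `t_max` long before extinction, in line with the exponential survival times discussed below.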
Exploiting the fact that the contact process is stochastically increasing in λ, one can show that some graphs exhibit a nontrivial phase transition, regarding the finiteness of τ G . For instance on Z d , there exists a critical value λ c (d) > 0, such that for λ ≤ λ c (d), τ Z d is a.s. finite (when ξ 0 is finite), whereas when λ > λ c (d), it has positive probability to be infinite (even when starting from a single vertex), see [L] Section I.2 for a proof of this and references.
Here we will only consider finite graphs, in which case the extinction time is always almost surely finite. However, it is still interesting to understand its order of magnitude as a function of the size of the graph. For instance a striking phenomenon occurs on finite boxes ⟦0, n⟧^d: one can show that with high probability (w.h.p.), if the process starts from full occupancy, the extinction time is of logarithmic order when λ < λ_c(d), of polynomial order when λ = λ_c(d) (at least in dimension one), and of exponential order when λ > λ_c(d), see [L] Section I.3 for a discussion on this and a complete list of references.
In fact such a result seems intimately related to the fact that finite boxes converge to Z^d when n tends to infinity, in the sense of Benjamini-Schramm local weak convergence of graphs [BS]. Although a rigorous connection between the two phenomena remains conjectural at the moment, many recent examples lend substantial credit to this conjecture, see for instance [CD, CMMV, MV, MMVY].
The case of the configuration model (a definition will be given later) is particularly interesting in this regard, at least when the degree distribution has finite mean. Indeed in this case it is not difficult to see that when the number of vertices increases, the sequence of graphs converges toward a Galton-Watson tree. In [CD] Chatterjee and Durrett have shown that when the degree distribution has a power law (with exponent larger than two), the extinction time grows faster than any stretched exponential (in the number of vertices), which can be interpreted as saying that the critical value is zero for these graphs (thereby invalidating some physicists' predictions). Since on the other hand one can show that the critical value on the limiting Galton-Watson tree is also zero (the process always has a positive probability to survive for any λ > 0), the conjecture mentioned above is satisfied for this class of examples. It is worth noting that the case of degree distributions with lighter tails than polynomial seems much harder (in particular understanding the case of Poisson distributions would be of great interest due to its connection with Erdős-Rényi random graphs).
But the configuration model is also interesting for another reason, highlighted in [CD]: when the degree sequence has a power law, the contact process exhibits a metastable behaviour. This was first proved under a finite second moment hypothesis (equivalently for exponents larger than three) in [CD], and the result has been later strengthened and extended to exponents larger than two in [MVY, MMVY]. To be more precise now, in [CD] the authors proved that when the degree distribution has a power law with finite second moment, then for some positive constants c and C (independent of λ), where ξ denotes the contact process starting from full occupancy. In [MMVY] the authors have shown that when the degree distribution has finite mean (and a power law), the extinction time is w.h.p. exponential in the size of the graph (when starting from full occupancy), and combined with the results of [MVY], one obtains that for any sequence (t_n) satisfying t_n → ∞ and t_n ≤ exp(cn), where
In this paper we complete this picture by studying the case of power laws with exponents a ∈ (1, 2]. To simplify the discussion and some proofs we have chosen to consider mainly two special choices of degree distribution. Namely we assume that it is given either by
(1) $p_{n,a}(j) = c_{n,a}\, j^{-a}$ for $j = 1, \dots, n$,
for graphs of size n, or by
(2) $p_a(j) = c_{\infty,a}\, j^{-a}$ for all $j \ge 1$,
independently of the size of the graph, where (c_{n,a}) and c_{∞,a} are normalizing constants. However, at the end of the paper we also present straightforward extensions of our results to more general distributions, see Section 7 for more details. Our first main result in this setting is the following:
Theorem 1.1. For each n, let G_n be the configuration model with n vertices and degree distribution given either by (1) or (2) with a ∈ (1, 2]. Consider the contact process (ξ_t)_{t≥0} with infection rate λ > 0 starting from full occupancy on G_n. Then there is some positive constant c = c(λ), such that the following convergence in probability holds:
(3) $|\xi_{t_n}| / n \to \rho_a(\lambda)$,
for any sequence (t_n) satisfying t_n → ∞ and t_n ≤ exp(cn), where
(4) $\rho_a(\lambda) = \sum_{j \ge 1} \frac{j\lambda}{j\lambda + 1}\, c_{\infty,a}\, j^{-a}$.
Note that as λ → 0, $\rho_a(\lambda) \asymp \lambda^{a-1}$ when a ∈ (1, 2) and $\rho_a(\lambda) \asymp \lambda \log(1/\lambda)$ when a = 2, which in particular shows that the guess of Chatterjee and Durrett [CD] that ρ_a(λ) should be O(λ) was not correct.
Now let us make some comments on the proof of this result. One first remark is that one of the main ingredients in the approach of [MVY] completely breaks down when the degree distribution has infinite mean (or when its mean is unbounded like in the case (1)), since in this case the sequence of graphs (G_n) does not locally converge anymore. In particular we cannot transpose the analysis of the contact process on G_n (starting from a single vertex) into an analysis on an infinite limit graph. So instead we have to work directly on the graph G_n.
In fact we will show that it contains w.h.p. a certain number of disjoint star graphs (i.e. graphs with one central vertex and all the others connected to the central vertex), which are all connected, and whose total size is of order n (the size of G_n). It is well known that the contact process on a star graph remains active w.h.p. for a time exponential in the size of the graph. So our main contribution here is to show that when we connect disjoint star graphs together, the process survives w.h.p. for a time which is exponential in the total size of these graphs. To this end we use the machinery introduced in [CD], with their notion of lit stars. We refer to Proposition 4.1 and its proof for more details. Now it is interesting to notice that while this strategy works in all the cases we consider, the details of the arguments strongly depend on whether a < 2 or a = 2, and on the choice of the degree distribution. This explains why we found it interesting to present the proof for the two examples (1) and (2) (note that these distributions were also considered in [VVHZ], where it was already proved that the distance between two randomly chosen vertices was a.s. equal either to two or three).
Then to obtain the asymptotic expression for the density (3), the point is to use the self-duality of the contact process. This allows us to recast the problem on the density of infected sites in terms of the survival of the process starting from a single vertex. But starting from a single vertex, the process has a real chance to survive for a long time only if it infects one of its neighbors before extinction. Moreover, when it does, one can show that w.h.p. it immediately infects one of the star graphs mentioned above, and therefore the virus survives w.h.p. for a time at least t_n. The conclusion of the theorem follows once we observe that the probability to infect a neighbor before extinction starting from any vertex is exactly equal to ρ_a(λ) in case (2) and to
(5) $\rho_{n,a}(\lambda) := \sum_{j=1}^{n} \frac{j\lambda}{j\lambda + 1}\, p_{n,a}(j)$,
in case (1), which converges to ρ_a(λ), as n → ∞.
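The finite-n density in case (1) is a plain weighted sum and is easy to evaluate numerically. The sketch below assumes the truncated power law p_{n,a}(j) proportional to j^{-a} on {1, ..., n}; the function name `rho_n_a` is an illustrative choice and not notation from the paper.

```python
def rho_n_a(lam, a, n):
    """Evaluate sum_{j=1}^{n} (j*lam / (j*lam + 1)) * p_{n,a}(j),
    where p_{n,a}(j) = c_{n,a} * j**(-a) is the truncated power law of case (1)."""
    weights = [j ** (-a) for j in range(1, n + 1)]
    c_n_a = 1.0 / sum(weights)  # normalizing constant c_{n,a}
    total = sum(w * j * lam / (j * lam + 1.0)
                for j, w in enumerate(weights, start=1))
    return c_n_a * total
```

For instance, one can evaluate `rho_n_a(lam, 1.5, 10**5) / lam ** 0.5` for a few small values of `lam` to see how the density compares with λ^{a-1} for small λ, and increase `n` to watch the convergence toward the limiting value.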
Our second result is often considered in the literature as another (weaker) expression of the metastability:
Theorem 1.2. Assume that the degree distribution on G_n is given either by (1) or (2) with a ∈ (1, 2], and let τ_n be the extinction time of the contact process with infection rate λ > 0 starting from full occupancy. Then
(i) the following convergence in law holds: $\tau_n / \mathbb{E}(\tau_n) \xrightarrow{(\mathcal{L})} \mathcal{E}(1)$ as n → ∞, with $\mathcal{E}(1)$ an exponential random variable with mean one,
(ii) there exists a constant C > 0, such that $\mathbb{E}(\tau_n) \le \exp(Cn)$, for all n ≥ 1.
In particular this result shows that Theorem 1.1 cannot be extended to sequences (t_n) growing faster than exponentially. In fact one can prove (see Remark 6.4) that Theorem 1.1 holds true for any constant c smaller than lim inf (1/n) log E(τ_n), and cannot be extended above this limit. This of course raises the question of knowing whether the sequence (1/n) log E(τ_n) admits a limit or not. Such a result has been obtained in a number of contexts, for instance in [MMVY] or on finite boxes ⟦0, n⟧^d (see [L] Section I.3), but we could not obtain it in our setting. One reason, which for instance prevents us from applying the strategy of [MMVY], is that there does not seem to be a natural way to embed G_n into G_{n+1} (or another configuration model with larger size).
Our method for proving Theorem 1.2 (i) is rather general and applies to a variety of other models. For instance it can be used with finite boxes of Z d (when d ≥ 2), the configuration model with power law distribution having a finite mean (with the same hypothesis as in [CD, MVY]), or finite regular trees and random d-regular graphs with d ≥ 3, in the supercritical regime, see Remark 6.3 for more details.
Let us also stress the fact that (ii) would be well known if the graph had order n edges, as when the degrees have finite mean, but here it is not the case, so we have to use a more specific argument, see Section 6.
Now the paper is organized as follows. In the next section, we recall the well-known and very useful graphical construction of the contact process. We also give a definition of the configuration model, fix some notation, and prove preliminary results on the graph structure. In Section 3, we prove that G_n contains w.h.p. a subgraph, called two-step star graph, which is made of several star graphs connected together, whose total size is comparable to the size of the whole graph. We refer to this section for a precise statement, which in fact depends on which case we consider (a < 2 or a = 2, and distribution (1) or (2)). In Section 4 we show that once a vertex (with high degree) of the two-step star graph is infected, the virus survives for an exponential time. Then we prove Theorems 1.1 and 1.2 in Sections 5 and 6 respectively. Finally in the last section we discuss several extensions of our results to more general degree distributions.

Preliminaries
2.1. Graphical construction of the contact process. We briefly recall here the graphical construction of the contact process (see more in Liggett's book [L]).
Fix λ > 0 and an oriented graph G (recall that a non-oriented graph can also be seen as oriented by associating to each edge two oriented edges). Then assign independent Poisson point processes N v of rate 1 to each vertex v ∈ V and N e of rate λ to each oriented edge e. Set also N (v,w) := ∪ e:v→w N e , for each ordered pair (v, w) of vertices, where the notation e : v → w means that the oriented edge e goes from v to w.
We say that there is an infection path from (v, s) to (w, t), and we denote it by
(6) $(v, s) \longleftrightarrow (w, t)$,
either if s = t and v = w, or if s < t and if there is a sequence of times s = s_0 < s_1 < . . . < s_l < s_{l+1} = t, and a sequence of vertices v = v_0, v_1, . . . , v_l = w such that for every i = 1, . . . , l, $s_i \in N_{(v_{i-1}, v_i)}$, and for every i = 0, . . . , l, $N_{v_i} \cap [s_i, s_{i+1}] = \emptyset$. Furthermore, for any A, B two subsets of V_n and I, J two subsets of [0, ∞), we write $A \times I \longleftrightarrow B \times J$, if there exists v ∈ A, w ∈ B, s ∈ I and t ∈ J, such that (6) holds. Then for any A ⊂ V_n, the contact process with initial configuration A is defined by $\xi^A_t := \{v \in V_n : A \times \{0\} \longleftrightarrow (v, t)\}$ for all t ≥ 0. It is well known that (ξ^A_t)_{t≥0} has the same distribution as the process defined in the introduction. Just note that in our definition, the Poisson processes associated to edges forming loops play no role (we could in particular remove them), but this definition will be convenient at one place of the proof (when we will use that the Y_{n,v}'s are i.i.d. in Subsection 5.1). We define next τ^A_n as the extinction time of the contact process starting from A. However, we will sometimes drop the superscript A from the notation when it is clear from the context. We will also simply write ξ^v_t or τ^v_n when A = {v}.
Finally we introduce the following related notation:
(7) $\sigma(v) := \inf\{s \ge 0 : s \in N_v\}$,
(8) $\sigma(e) := \inf\{s \ge 0 : s \in N_e\}$,
for any vertex v and oriented edge e.
2.2. Configuration model and notation. The configuration model is a well known model of random graph with prescribed degree distribution, see for instance [V]. In fact here we will consider a sequence (G_n) of such graphs. To define it, start for each n with a vertex set V_n of cardinality n and construct the edge set as follows. Consider a sequence of i.i.d. integer valued random variables (D_v)_{v∈V_n} (whose law might depend on n) and assume that L_n = Σ_v D_v is even (if not, increase one of the D_v's by 1, which makes no difference in what follows). For each vertex v, start with D_v half-edges (sometimes called stubs) incident to v. Then match all these stubs in pairs, uniformly at random. Once paired, two stubs form an edge of the graph. Note that the random graph we obtain may contain multiple edges (i.e. edges between the same two vertices), or loops (edges whose two extremities are the same vertex).
Now we introduce some notation. We denote the indicator function of a set E by 1(E). For any vertices v and w we write v ∼ w if there is an edge between them (in which case we say that they are neighbors or connected), and v ≁ w otherwise. We also denote by s_v the number of half-edges forming loops attached to a vertex v. We call size of a graph G the cardinality of its set of vertices, and we denote it by |G|.
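The stub-matching construction above can be sketched in a few lines. This is an illustration only, written for the truncated power law of case (1); the function name `configuration_model` and the choice of bumping the degree of vertex 0 when the total is odd are assumptions made for the example. Shuffling the list of stubs and pairing consecutive entries yields a uniform random matching.

```python
import random


def configuration_model(n, a, rng=None):
    """Sample i.i.d. degrees with P(D = j) proportional to j**(-a), j = 1..n,
    then pair the half-edges (stubs) uniformly at random.

    Returns (degrees, edges); edges may contain loops and multiple edges.
    """
    rng = rng or random.Random()
    support = range(1, n + 1)
    weights = [j ** (-a) for j in support]
    degrees = rng.choices(support, weights=weights, k=n)
    if sum(degrees) % 2 == 1:
        degrees[0] += 1  # make the total number of stubs L_n even
    # One list entry per stub, labelled by the vertex it is attached to.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
    return degrees, edges
```

With a close to 1 the total number of stubs is dominated by a few very large degrees, so loops and multiple edges are common in the output; they are kept here, as in the definition above.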
A graph all of whose vertices have degree one, except one which is connected to all the others, is called a star graph. The only vertex with degree larger than one is called the center of the star graph, or central vertex. We call two-step star graph a graph formed by a family of disjoint star graphs, denoted by S(v_i)_{1≤i≤k}, centered respectively in vertices (v_i)_{1≤i≤k}, plus an additional vertex v_0 and edges between v_0 and all the v_i's (or equivalently it is just a tree, which is of height 2 when rooted at v_0). The notation S(k; d_1, . . . , d_k) will refer to the two-step star graph where v_i has degree d_i + 1 for all i (which means that inside S(v_i), v_i has degree d_i, or that S(v_i) has size d_i + 1). These graphs will play a crucial role in our proof of Theorem 1.1.
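To make the definition of S(k; d_1, . . . , d_k) concrete, here is a small sketch that builds its adjacency lists. The vertex labelling (0 for v_0, then 1..k for the centers, then the leaves) and the function name `two_step_star` are illustrative conventions, not notation from the paper.

```python
def two_step_star(ds):
    """Adjacency lists of the two-step star graph S(k; d_1, ..., d_k).

    Vertex 0 plays the role of v_0, vertices 1..k are the centers v_1..v_k,
    and the remaining vertices are the leaves of the stars S(v_i).
    """
    k = len(ds)
    adj = {0: []}
    for i in range(1, k + 1):
        adj[i] = [0]          # edge between v_0 and the center v_i
        adj[0].append(i)
    next_label = k + 1
    for i, d in enumerate(ds, start=1):
        for _ in range(d):    # attach d_i leaves to v_i
            adj[next_label] = [i]
            adj[i].append(next_label)
            next_label += 1
    return adj
```

The resulting graph has 1 + k + d_1 + · · · + d_k vertices, and each center v_i has degree d_i + 1, in agreement with the convention stated above.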
Furthermore we denote by B(n, p) the binomial distribution with parameters n and p. If f and g are two real functions, we write Finally for a sequence of random variables (X n ) and a function f : N → (0, ∞), we say that X n ≍ f (n) holds w.h.p. if there exist positive constants c and C, such that P(cf (n) ≤ X n ≤ Cf (n)) → 1, as n → ∞.
2.3. Preliminary estimates on the graph structure. We first recall a large deviations result which we will use throughout this paper (see for instance [DZ]): if X ∼ B(n, p), then for all c > 0, there exists θ > 0, such that
Now we present a series of lemmas deriving basic estimates on the degree sequence and the graph structure. The first one is very elementary and applies to all the cases we will consider in this paper.
Lemma 2.1. Assume that the degree sequence is given either by (1) or (2), with 1 < a ≤ 2. For j ≥ 1, let A j := {v : D v = j} and n j = |A j |. Then there exist positive constants c and C, such that Proof. Observe that we always have n j ∼ B(n, p j ), for some p j ∈ (c ∞,a j −a , j −a ), with c ∞,a as in (2). Thus the result directly follows from (9).
Our next results depend more substantially on the value of a and the choice of the degree distribution.
Lemma 2.2. Assume that the degree distribution is given by (1), with a ∈ (1, 2). Let E := {v : D_v ≥ n/2}. Let also κ > 2 − a and χ < 1 be some constants. Then the following assertions hold Proof. Let us start with Part (i). It follows from the definition (1) that The result follows by using Chebyshev's inequality. Part (ii) is similar to Lemma 2.1. For Part (iii), let v and w be two vertices such that D_v ≥ n/2 and D_w ≥ n^κ. Then conditionally on (D_z)_{z∈V_n}, the probability that the n/8 first stubs of v do not connect to w is smaller than $(1 - n^{\kappa}/(L_n - n/4))^{n/8}$. Hence, which proves (iii) by using (i) and a union bound.
We now prove (iv). To this end, notice that conditionally on D_v and L_n, s_v is stochastically dominated by a binomial random variable with parameters D_v and D_v/(L_n − 2D_v + 2) (remark in particular that since D_z ≥ 1 for all z, the denominator in the last term is always positive). Hence Markov's inequality shows that The result follows by using (i) and that for any fixed ε > 0, P (D_v It remains to prove (v). Denote the degrees of the neighbors of v by D_{v,i}, i = 1, . . . , D_v. It follows from the definition of the configuration model that for any i ≤ D_v and k ≠ D_v, where we recall that n_k is the number of vertices of degree k. Therefore, Summing over i, we get Moreover, similarly to the proof of (i), we can see that w.h.p.
Together with (i), and using again that D v ≤ n ε w.h.p. for any fixed ε > 0, we get (v).
(ii) For any ε > 0, there exists a positive constant η = η(ε), such that for any fixed k ≥ 1, and an integer k = k(ε), such that Proof. Part (i) is standard, we refer for instance to Lemma 2.1 in [VVHZ]. More precisely let (e_i)_{i≥1} be an i.i.d. sequence of exponential random variables with mean one and Γ_i = e_1 + . . . + e_i, for all i ≥ 1 (in particular Γ_i is a Gamma random variable with parameters i and 1). Then the result holds with for all i ≥ 1, and γ_0 = Σ_i γ_i (which is indeed a.s. a convergent series). For (ii) note that Γ_i/i → 1 a.s. as i → ∞. In particular for any ε, there exists C > 0, such that P(Γ_i ≤ Ci for all i ≥ 1) ≥ 1 − ε/2.
The first assertion follows with (i), using also that P(γ_0 ≤ C) ≥ 1 − ε/2, for C large enough. The second one is an immediate corollary of (i) and the definition of γ_0 as the limit of the partial sums Σ_{i≤k} γ_i, as k → ∞. Parts (iii)-(v) are similar to the previous case.
We now give an analogous result for the case a = 2, which we will not prove here since it is entirely similar to the case a < 2 (just for the case when the degree distribution is given by (2), one can use the elementary fact that w.h.p. all vertices have degree smaller than n log log n).
Lemma 2.4. Assume that the degree distribution is given either by (1) or (2), with a = 2. Let E ′ := {v : D v ≥ n 3/4 }. Then the following assertions hold (i) L n ≍ n log n w.h.p., (v) P (All neighbors of v have degree larger than (log n) 4 ) = 1 − o(1), for any v ∈ V n .

Existence of a large two-step star graph
In this section we will prove that the graph G_n contains w.h.p. a large two-step star graph S(k; d_1, . . . , d_k), the term large meaning that d_1 + · · · + d_k will be of order n, and all the d_i's of order at least log n. However, the precise values of k and the d_i's will depend on which case we consider (to be more precise, in the case of degree distribution given by (2) with a ∈ (1, 2) we prove that for any ε > 0, G_n contains a large two-step star graph with probability at least 1 − ε, with k and the d_i's depending on ε. Nevertheless, the rest of the proof works mutatis mutandis).
3.1.1. Bounded degree sequence. We assume here that the law of the degrees is given by (1). Recall that E = {v : D_v ≥ n/2} and A_1 = {v : D_v = 1}. In addition for any vertex v, let us denote by d_1(v) the number of neighbors of v in A_1.
Lemma 3.1. There exist positive constants β and κ, such that Proof. It follows from the definition of the configuration model that for any w ∈ A_1 and v ∈ E, Similarly for any v ∈ E and w ≠ w′ ∈ A_1, Define now the set Note that the existence of c and C is guaranteed by Lemmas 2.1 and 2.2. Set also β = c/(4C). Then (11) and (12) show that on A_n, Thus by using Chebyshev's inequality, we deduce that on A_n, Therefore, by taking expectation and using (13) we get Then by using Lemma 2.2 (ii) and Chebyshev's inequality again we obtain (10) as wanted.
As a corollary we get the following result: Proposition 3.2. Assume that the law of the degree sequence is given by (1) with a ∈ (1, 2). There exist positive constants β and κ, such that w.h.p. G_n contains as a subgraph a copy of S(k; d_1, . . . , d_k), with k = κn^{2−a} and d_i = βn^{a−1}, for all i ≤ k.

3.1.2. Unbounded degree sequences.
We assume here that the law of the degrees is given by (2). The proof of the next result is similar to the one of Lemma 3.1, so we omit it.
Lemma 3.3. With the notation of Lemma 2.3, let (v i ) i≤n be a reordering of the vertices of G n , such that the degree of v i is D i for all i (in particular v 1 is a vertex with maximal degree). Then for any fixed i,

As a consequence we get
Proposition 3.4. Assume that the degree distribution is given by (2), with a ∈ (1, 2). There exists a constant c > 0, such that for any ε > 0, there exists η = η(ε) > 0 and an integer k = k(ε), such that for n large enough, with probability at least 1 − ε, G n contains as a subgraph a copy of S(k; d 1 , . . . , d k ), with d i ≥ ηi −1/(a−1) n for all i ≥ 1, and d 1 + · · · + d k ≥ cn.

3.2. Case a = 2. In this case we can treat both distributions (1) and (2) in the same way. Recall that E′ = {v : D_v ≥ n^{3/4}}, and that d_1(v) denotes the number of neighbors in A_1 of a vertex v.
Lemma 3.5. There exists a positive constant β, such that Proof. The proof is very close to the proof of Lemma 3.1. First, for any v ∈ E′ and w ∈ A_1, we have and furthermore for any w ≠ w′ ∈ A_1, Then by using Chebyshev's inequality, we get that for any v ∈ E′, for some constant β > 0. The desired result follows by using a union bound and then Lemmas 2.1 and 2.4 (i)-(ii).

As a consequence we get
Proposition 3.6. Assume that the law of the degree distribution is given either by (1) or (2) and that a = 2. There exists a positive constant β such that w.h.p. G n contains as a subgraph a copy of S(k; d 1 , . . . , d k ), with k ≍ n 1/4 , d i ≥ βn 3/4 / log n for all i ≤ k, and d 1 + · · · + d k ≍ n.
Proof. Just take for the v i 's the elements of E ′ . Then use Lemma 2.4 (ii)-(iii) and Lemma 3.5.

Contact process on a two-step star graph
In this section we will study the contact process on a two-step star graph. Our main result is the following: Proposition 4.1. There exist positive constants c and C, such that for any two-step star graph G = S(k; d 1 , . . . , d k ), satisfying d i ≥ C log n/λ 2 , for all i ≤ k, and d 1 + ... + d k = n, where τ v 1 n is the extinction time of the contact process with infection parameter λ ≤ 1 starting from v 1 on S(k; d 1 , . . . , d k ).
Note that since we are only concerned with the extinction time here, there is no restriction in assuming λ ≤ 1, as the contact process is stochastically monotone in λ (see [L]). So when λ > 1 the same result holds; one just has to remove the λ everywhere in the statement of the proposition. Now of course an important step in the proof is to understand the behavior of the process on a single star graph. This has been studied for a long time: for instance it appears in Pemantle [P], and later in [BBCS, CD, MVY]. We will collect all the results we need in Lemma 4.2 below, but before that we give a new definition. We say that a vertex v is lit (the term is taken from [CD]) at some time t if the proportion of its infected neighbors at time t is larger than λ/(16e) (note that in [MMVY] the authors also use the term infested for a similar notion).
Lemma 4.2. There exists a constant c > 0, such that if (ξ t ) is the contact process with parameter λ ≤ 1 on a star graph S with center v, satisfying λ 2 |S| ≥ 64e 2 , then Proof. These results are exactly Lemma 3.1 (ii) and (iii) in [MVY] (similar results can be found in [BBCS, CD, D, P]).
Proof of Proposition 4.1. We first handle the easy case when there is some 1 ≤ i ≤ k, such that deg(v i ) ≥ n/2. First by Lemma 4.2 we know that w.h.p. the virus survives inside S(v 1 ) at least a time exp(cλ 2 d 1 ). Since by hypothesis d 1 diverges when n tends to infinity, and since v 1 and v i are at distance at most two (both are connected to v 0 ), we deduce that w.h.p. v i will be infected before the extinction of the virus. The proposition follows by another use of Lemma 4.2.
We now assume that d i ≤ n/2, for all i. First we need to introduce some more notation. For s < t and v, w ∈ S(v i ), we write if there exists an infection path entirely inside S(v i ) joining (v, s) and (w, t). Similarly if V and W are two subsets of G, we write (15) holds. Now for ℓ ≥ 0 and We claim that for any ℓ ≥ 0 and 1 ≤ i ≤ k, we have for some constant c > 0. To fix ideas we will prove the claim for i = 1 (clearly by symmetry there is no loss of generality in assuming this) and to simplify notation we also assume that ℓ = 0 (the proof works the same for any ℓ). Furthermore, in the whole proof the notation c will stand for a positive constant independent of λ, whose value might change from line to line. Now before we start the proof we give a new definition. We denote by (ξ ′ t ) t≥0 the contact process on S(v 1 ) := S(v 1 ) ∪ {v 0 }, which is defined by using the same Poisson processes as ξ, but only on this subgraph. In particular with ξ ′ , the vertex v 0 can only be infected by v 1 , and thus the restriction of ξ on S(v 1 ) dominates ξ ′ . We also assume that the starting configurations of ξ ′ and of the restriction of ξ on S(v 1 ) are the same. Now for any integer m ≤ n, define be the natural filtration of the process ξ ′ . Then observe that for any vertex w ∈ S(v 1 ), conditionally on F 3m , and on the event at least if w = v 1 . Moreover, the event on the right hand side has probability equal to (1−e −λ ) 2 e −5 , which is larger than cλ 2 , for some c > 0, and a similar result holds if w = v 1 . Therefore for any m and any nonempty subset A ⊂ S(v 1 ),

In other words, if we define
all m ≤ n. By using induction, it follows that

But by construction
Then by repeating the argument in each interval [3Mn, 3(M +1)n], for every M ≤ n/3−1, we get Note that these events are independent of M and E 0,1 , as they depend on different Poisson processes. Note also that by using (9) and thus (since C m,j and C m ′ ,j are independent when m − m ′ ≥ 2), Moreover, by construction if m ∈ M and C m,j holds, then v j is lit at some time t ∈ [m + 1, m + 2]. Therefore by using (17), Finally define U j = exp(cλ 2 d j ), for all j ≤ k, with the constant c as in Lemma 4.2, and take C large enough, so that the hypothesis d j λ 2 ≥ C log n implies U j ≥ 2n 2 . Then (19) together with Lemma 4.2 (i) imply that where for the last inequality we used that d 2 + · · · + d k ≥ n/2. This concludes the proof of (16). The proposition immediately follows, since by using Lemma 4.2, we also know that P(E 0,1 ) = 1 − o(1), when v 1 is infected initially (observe that exp(cλ 2 d 1 ) ≥ n 2 , if the constant C in the hypothesis is large enough).

Proof of Theorem 1.1
The proof is the same in all the cases we considered, so to fix ideas we assume in all this section that the degree distribution is given by (1) with a ∈ (1, 2). The other cases are left to the reader.
Let (t_n) be as in the statement of Theorem 1.1. Define for v ∈ V_n, X_{n,v} = 1({ξ^v_{t_n} ≠ ∅}). The self-duality of the contact process (see (1.7) p. 35 in [L]) implies that for any γ > 0,
(20) $\mathbb{P}\big(|\xi^{V_n}_{t_n}| > \gamma n\big) = \mathbb{P}\big(\sum_{v \in V_n} X_{n,v} > \gamma n\big)$,
and similarly for the reverse inequality. Hence, to prove that |ξ^{V_n}_{t_n}|/n converges in probability to ρ_a(λ), it is sufficient to show that the right-hand side in (20) converges to 0 when γ = ρ_{n,a}(λ) + ε (resp. ρ_{n,a}(λ) − ε for the reverse inequality), for any fixed ε > 0 (recall that ρ_{n,a}(λ) converges to ρ_a(λ), as n → ∞). We will prove these two statements in the next two subsections.
5.1. Upper bound. This part is quite elementary. The idea is to say that if the virus survives for a time t_n starting from some vertex v, then v has to infect one of its neighbors before σ(v) (recall the definition (7)), unless σ(v) ≥ t_n, but this last event has o(1) probability so we can ignore it. Now the probability that v infects a neighbor before σ(v) is bounded by the probability that one of the Poisson point processes associated to the edges emanating from v has a point before σ(v) (actually it is exactly equal to this if there is no loop attached to v). Then having observed that the latter event has probability exactly equal to ρ_{n,a}(λ), we get the desired upper bound, at least in expectation. The true upper bound will follow using Chebyshev's inequality and the domination of the X_{n,v}'s by suitable i.i.d. random variables. Now let us write this proof more formally. Set Y_{n,v} = 1(C_{n,v}), with (recall (8)) $C_{n,v} := \{\exists\, e : v \to \cdot \ \text{such that}\ \sigma(e) < \sigma(v)\}$, where the notation e : v → · means that e is an (oriented) edge emanating from v (possibly forming a loop). By construction the (Y_{n,v})_{v∈V_n} are i.i.d. random variables, and moreover, the above discussion shows that for all v, $\mathbb{P}(Y_{n,v} = 1) = \sum_{j=1}^{n} \frac{j\lambda}{j\lambda + 1}\, p_{n,a}(j) = \rho_{n,a}(\lambda)$.
Therefore it follows from Chebyshev's inequality that for any fixed ε > 0. On the other hand P(σ(v) > t_n) = e^{−t_n} = o(1). Thus by using Markov's inequality we get The desired upper bound follows with (21).

5.2. Lower bound.
This part is more complicated and requires the results obtained so far in Sections 2, 3 and 4. First define Z_{n,v} = 1(A_{n,v} ∩ B_{n,v}), for v ∈ V_n, where A_{n,v} = {v infects one of its neighbors before σ(v)}, and B_{n,v} = {ξ^v_{t_n} ≠ ∅}. Remember that X_{n,v} = 1(B_{n,v}), which in particular gives Z_{n,v} ≤ X_{n,v}. Therefore the desired lower bound follows from the next lemma and Chebyshev's inequality.
Proof. We claim that To see this first use that w.h.p. there is a large two-step star graph in G n (given by Proposition 3.2). Then use Lemma 2.2 (iii) and (v) to see that w.h.p. all neighbors of v have large degree and are connected to all the v i 's of the two-step star graph (recall that by construction D v i ≥ n/2, for all i). Note that in the case a = 2, this is not exactly true, but nevertheless the neighbors of v and the v i 's are still w.h.p. at distance at most two, since they are all connected to the set of vertices z satisfying D z ≥ n/ log n (and w.h.p. this set is nonempty). Now if a neighbor, say w, of v is infected and has large degree, then Lemma 4.2 shows that w.h.p. the virus will survive in the star graph formed by w and its neighbors for a long time. But if in addition w and v 1 are connected (or more generally if they are at distance at most two), then v 1 will be infected as well w.h.p. before extinction of the process. Then Proposition 4.1 gives (23).
On the other hand observe that Therefore (22) and Lemma 2.2 (iv) give Part (i) of the lemma. The second part follows easily by using that we also have A n,v ⊂ C n,v , and that the C n,v 's are independent.
6. Proof of Theorem 1.2

We first prove a lower bound on the probability that the extinction time is smaller than n 2 . Together with the following lemma, this will give assertion (ii) of the theorem:

Lemma 6.1. For every $s > 0$, we have

This lemma is a direct consequence of the Markov property and the attractiveness of the contact process, see for instance Lemma 4.5 in [MMVY].
For simplicity we assume that λ ≤ 1, and leave to the reader the task to slightly modify the values of some constants in the case λ > 1. We also assume first that the degree distribution is given by (1).
Let $\bar{n}_a$ be the number of vertices having degree larger than $n^{1/(2a)}$. Then $\bar{n}_a \sim B(n, \bar{p}_a)$, where $\bar{p}_a = \sum_{j > n^{1/(2a)}} p_{n,a}(j) \asymp n^{(1-a)/(2a)}$. Hence, as for Lemma 2.1, there exists a constant $K > 0$, such that $\mathbb{P}\big(\bar{n}_a \le K n^{(1+a)/(2a)}\big) = 1 - o(1)$. In fact thanks to Lemma 2.1, we can even assume that where $E_n := \{n_j \le K n j^{-a} \text{ for all } j \le n^{1/(2a)}\} \cap \{\bar{n}_a \le K n^{(1+a)/(2a)}\}$. Now if a vertex has degree $j$, the probability that it becomes healthy before spreading infection to another vertex is at least equal to $1/(1 + j\lambda)$ (it is in fact exactly equal to this if there is no loop attached to this vertex). Since this happens independently for all vertices, we have that a.s. for $n$ large enough, on $E_n$, is an exponential random variable with mean 1. Hence, a.s. for $n$ large enough and on $E_n$, The same can be proved in the case when the degree distribution is given by (2). One just has to use that w.h.p. all the degrees are bounded by $n^{2/(a-1)}$, but this does not seriously affect the proof.
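The value $1/(1 + j\lambda)$ used above is the usual competition between independent exponential clocks. As a short worked computation, assume the vertex has $j$ edges and no loop, write $\sigma$ for its recovery time (exponential with mean 1), and write $T_1, \dots, T_j$ for the first points of the independent rate-$\lambda$ Poisson processes on its edges (the symbols $T_i$ are introduced here only for illustration):

```latex
\[
\mathbb{P}\Bigl(\sigma < \min_{1 \le i \le j} T_i\Bigr)
  = \int_0^\infty e^{-s}\, \mathbb{P}\Bigl(\min_{1 \le i \le j} T_i > s\Bigr)\, ds
  = \int_0^\infty e^{-s}\, e^{-j\lambda s}\, ds
  = \frac{1}{1 + j\lambda}.
\]
```

The complementary probability is $j\lambda/(1 + j\lambda)$, the probability that the vertex transmits before recovering.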
We now prove (i). This will be a consequence of a more general result:

Proposition 6.2. Let $(G^0_n)$ be a sequence of connected graphs, such that $|G^0_n| \le n$, for all $n$. Let $\tau_n$ denote the extinction time of the contact process on $G^0_n$ starting from full occupancy. Assume that $(1/d_n) \log \mathbb{E}(\tau_n) \to \infty$, with $d_n$ the diameter of $G^0_n$. Assume further that there exists $\varepsilon > 0$, such that for all $n$, $\mathbb{E}(\tau_n) \ge (\log n)^{1+\varepsilon}$. Then
$$\frac{\tau_n}{\mathbb{E}(\tau_n)} \xrightarrow[n \to \infty]{(\mathcal{L})} \mathcal{E}(1),$$
where $\mathcal{E}(1)$ is an exponential random variable with mean one.
Proof. According to Proposition 1.2 in [M] and Lemma 6.1 above it suffices to show that there exists a sequence $(a_n)$, such that $a_n = o(\mathbb{E}(\tau_n))$ and where $(\xi_t)_{t \ge 0}$ denotes the process starting from full occupancy. We now take $a_n = \mathbb{E}(\tau_n)/(\log n)^{\varepsilon/2}$ (with the same $\varepsilon$ as in the hypothesis of the proposition). To prove (25) it is convenient to introduce the dual contact process. Given some positive real $t$ and $A$ a subset of the vertex set $V_n$ of $G_n$, the dual process $(\hat{\xi}^{A,t}_s)_{s \le t}$ is defined by for all $s \le t$. It follows from the graphical construction that for any $v$, So let us now prove that the last sum above tends to 0 when $n \to \infty$.
Let $w \in V_n$ be some fixed vertex, and define $A_n := \{\hat{\xi}^{w,a_n}_{a_n} \neq \emptyset\}$.
Let then K n be the largest integer smaller than a n /d n , and define for any 0 ≤ k ≤ K n − 1 (with the implicit constant in the o(1/n) independent of v and w) as it immediately implies (25), using (26) (recall that |V n | ≤ n by hypothesis). To this end, note that for any z, z ′ ∈ V n , and any t ≥ 0, for some constant C > 0. Indeed this can be proved by using the same argument as for (18) repeated d n times along a path going from z to z ′ (note that since G 0 n is connected by hypothesis, such path exists, and by definition, its length is bounded by d n ).
We next denote by $\mathcal{G}_k$ the sigma-field generated by all the Poisson processes introduced in the graphical construction outside the time interval $[k d_n, (k+1) d_n]$, or more formally Then by using first the FKG inequality (see for instance Theorem B15 in [L]) and then (28), we get that for any $k \le K_n - 1$, Then by using induction on $k$, we obtain which gives (27) using our choice of $a_n$. This concludes the proof of the proposition.

Remark 6.3. This proposition can be used in various examples, for instance in the case of the configuration model with degree distribution satisfying $p(1) = p(2) = 0$, and $p(k) \sim c k^{-a}$ as $k \to \infty$, for some constants $c > 0$ and $a > 2$. This is the degree distribution considered in [CD, MMVY]. In this case it is known that w.h.p. the graph is connected and has diameter $O(\log n)$, see [CD, Lemma 1.2]. Moreover, [CD] shows that the extinction time is w.h.p. larger than any stretched exponential, therefore the proposition applies here. It also applies to finite boxes of $\mathbb{Z}^d$, namely to $G_n = \{0, \dots, n\}^d$, with $d \ge 2$ and for supercritical $\lambda$, since in this case we can use [L, Theorem 3.9], which shows that the logarithm of the extinction time is w.h.p. of order $n^d$. This provides another proof of the main result of [M]. One can also notice that it gives an alternative proof of Theorem 1.2 in [MMVY] (using their Theorem 1.1), under the stronger condition that $G_n$ has diameter $o(n)$.
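The exponential limit in Proposition 6.2 can be explored empirically on small graphs. The sketch below is only an illustration and plays no role in the proof: it simulates the contact process from full occupancy with a standard Gillespie scheme (recovery at rate 1 per infected site, transmission at rate $\lambda$ along each edge leaving an infected site). The helper names `complete_graph` and `extinction_time`, the cutoff `t_max`, and the choice of a complete graph are all assumptions made for this example.

```python
import random


def complete_graph(n):
    """Adjacency lists of the complete graph on n vertices (illustrative choice)."""
    return [[w for w in range(n) if w != v] for v in range(n)]


def extinction_time(adj, lam, rng, t_max=float("inf")):
    """Simulate the contact process from full occupancy.

    Returns the extinction time, or t_max if the process is still alive then.
    """
    infected = set(range(len(adj)))
    t = 0.0
    while infected:
        inf_list = list(infected)
        out_deg = [len(adj[u]) for u in inf_list]
        rate_recover = float(len(inf_list))      # each infected site recovers at rate 1
        rate_transmit = lam * sum(out_deg)       # each edge end at an infected site fires at rate lam
        total = rate_recover + rate_transmit
        t += rng.expovariate(total)
        if t >= t_max:
            return t_max
        if rate_transmit == 0 or rng.random() * total < rate_recover:
            infected.remove(rng.choice(inf_list))
        else:
            # Pick the transmitting site proportionally to its degree, then a
            # uniform edge end; transmitting to an infected site changes nothing.
            u = rng.choices(inf_list, weights=out_deg)[0]
            infected.add(rng.choice(adj[u]))
    return t


if __name__ == "__main__":
    rng = random.Random(2024)
    graph = complete_graph(6)
    samples = [extinction_time(graph, 0.6, rng) for _ in range(2000)]
    mean = sum(samples) / len(samples)
    # Empirical tail of tau / E(tau) at 1; an exponential law with mean one
    # would give exp(-1), but a graph this small need not be close to the limit.
    print(mean, sum(s > mean for s in samples) / len(samples))
```

Each step recomputes the rates from scratch, which keeps the code short but limits it to small graphs; the hypotheses of the proposition concern the regime where $\log \mathbb{E}(\tau_n)$ is large compared with the diameter, which such a toy run only hints at.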
Remark 6.4. Assume that on a sequence of graphs $(G_n)$, one can prove that w.h.p. $\tau_n \ge \varphi(n)$, for some function $\varphi(n)$, and that at the same time one can prove (25) for some $a_n \le \varphi(n)$. Then observe that if (3) holds with $t_n = a_n$, then by using self-duality, we can see that the same holds as well with $t_n = \varphi(n)$. In particular, in our setting, by using Theorem 1.2, we deduce that (3) holds with $t_n = \exp(cn)$, for any $c < c_{crit} := \liminf (1/n) \log \mathbb{E}(\tau_n)$, but (using again Theorem 1.2) it does not when $c > c_{crit}$. This argument also explains why the combination of the results in [MVY] and [MMVY] gives the statement that was mentioned in the introduction for the case $a > 2$.
Now to complete the proof of Theorem 1.2 (i), it remains to show that the hypotheses of the proposition are satisfied in our case, namely for the maximal connected component (call it $G^0_n$) of the configuration model $G_n$. This amounts to showing first that the size of all the other connected components is much smaller, to ensure that w.h.p. the extinction times on $G_n$ and on $G^0_n$ coincide. Recall that by Theorem 1.1 we know that on $G_n$ the extinction time is w.h.p. larger than $\exp(cn)$. We will also show that the diameter of $G^0_n$ is $o(n)$. Since we could not find a reference, we provide a short proof here (in fact much more is true, see below).
For $v \in V_n$, we denote by $C(v)$ the connected component of $G_n$ containing $v$, and by $\|C(v)\|$ its number of edges. We also define

Lemma 6.5. Let $G_n$ be the configuration model with $n$ vertices and degree distribution given either by (1) or (2), with $a \in (1, 2]$. Let $d_n = \mathrm{diam}(G^0_n)$ be the maximal distance between pairs of vertices in $G^0_n$. Then there exists a positive constant $C$, such that w.h.p.
$$\max(d_n, d'_n) \le \begin{cases} C & \text{when } 1 < a < 2, \\ 4 \log n / \log\log n & \text{when } a = 2. \end{cases}$$
Proof. We only prove the result for $a = 2$ here, the case $a < 2$ being entirely similar. To fix ideas we also assume that the degree distribution is given by (1), but the proof works as well with (2). By construction, the probability that any stub incident to some vertex $v \notin F$ is matched with a stub incident to a vertex lying in $F$ is equal to $R_n/(L_n - 1)$. By iterating this argument, we get for any $k$, where $d(v, F)$ denotes the graph distance between $v$ and $F$ (which by convention we take infinite when there is no element of $F$ in $C(v)$). Then it follows from Lemma 2.4 (i) and the fact that $R_n \asymp n \log\log n$, that
$$\mathbb{P}\big(d(v, F) > k_n \text{ and } \|C(v)\| > k_n\big) \le \left(\frac{C \log\log n}{\log n}\right)^{2 \log n/\log\log n - 1} = o(n^{-1}),$$
for some constant $C > 0$, with $k_n = 2 \log n/\log\log n - 1$. This proves the lemma, using a union bound and (29).
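The statement of Lemma 6.5 can be probed on small samples. The following sketch rests on stated assumptions and is not the construction used in the proof: degrees are drawn from a law proportional to $j^{-a}$ on $\{1, \dots, n\}$ (assumed to be the form of (1)), an odd total number of stubs is made even by adding one stub to the first vertex (a convention chosen here, since this section does not say how parity is handled), and stubs are paired uniformly at random. All function names are hypothetical.

```python
import random
from collections import deque


def sample_degrees(n, a, rng):
    """Draw n i.i.d. degrees with P(D = j) proportional to j**(-a), 1 <= j <= n."""
    support = list(range(1, n + 1))
    weights = [j ** (-a) for j in support]
    degrees = rng.choices(support, weights=weights, k=n)
    if sum(degrees) % 2 == 1:
        degrees[0] += 1  # assumption: parity fix chosen for this sketch
    return degrees


def configuration_model(degrees, rng):
    """Pair stubs uniformly at random; loops and multiple edges are allowed."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    adj = [[] for _ in degrees]
    for i in range(0, len(stubs) - 1, 2):
        u, w = stubs[i], stubs[i + 1]
        adj[u].append(w)
        adj[w].append(u)
    return adj


def bfs_distances(adj, source):
    """Graph distances from source to every vertex of its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def largest_component(adj):
    """Vertices of a largest connected component."""
    seen, best = set(), []
    for v in range(len(adj)):
        if v not in seen:
            comp = list(bfs_distances(adj, v))
            seen.update(comp)
            if len(comp) > len(best):
                best = comp
    return best


def diameter(adj, component):
    """Exact diameter by one BFS per vertex; only suitable for small graphs."""
    return max(max(bfs_distances(adj, v).values()) for v in component)


if __name__ == "__main__":
    rng = random.Random(7)
    degrees = sample_degrees(500, 2.0, rng)
    graph = configuration_model(degrees, rng)
    comp = largest_component(graph)
    print(len(comp), diameter(graph, comp))
```

Running this for several values of `n` gives a rough feel for how slowly the diameter of the largest component grows, which is the qualitative content of the lemma; it is no substitute for the probabilistic estimate above.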
To complete the proof of Part (i) of the theorem, we just need to recall that on any graph with $k$ edges, and for any $t \ge 1$, the extinction time is bounded by $2t$ with probability at least $1 - (1 - \exp(-Ck))^{t}$ (since on each time interval of length 1 the process has probability at least $\exp(-Ck)$ to die out, for some constant $C > 0$, independently of the past). Therefore the previous lemma shows that w.h.p. the extinction times on $G^0_n$ and on $G_n$ are equal, as was announced just above the previous lemma. Then Part (i) of the theorem follows from Proposition 6.2.

7. Extension to more general degree distributions
We present here some rather straightforward extensions of our results to more general degree distributions.
A first one, which was also considered in [VVHZ], is to take distributions which interpolate between (1) and (2): for any fixed $\alpha \in [1, \infty]$, define $p_{n,a,\alpha}(j) := c_{n,a,\alpha}\, j^{-a}$ for all $1 \le j \le n^{\alpha}$, where $(c_{n,a,\alpha})$ are normalizing constants, and with the convention that the case $\alpha = \infty$ corresponds to the distribution given by (2). It turns out that if $a < 2$ and $\alpha < 1/(a-1)$, one can use exactly the same proof as in the case $\alpha = 1$. When $\alpha > 1/(a-1)$, using that w.h.p. all vertices have degree smaller than $n^{1/(a-1)} \log\log n$, one can use the same proof as in the case $\alpha = \infty$. The case $\alpha = 1/(a-1)$ is more complicated, and as in [VVHZ], a proof would require a more careful analysis.
When a = 2, using that w.h.p. all vertices have degree smaller than n log log n, one can see that the same proof applies for any α > 1.
Another extension is to assume that there exist positive constants $c$ and $C$, and some fixed $m \ge 1$, such that for any vertex $v$, $c j^{-a} \le \mathbb{P}(D_v = j) \le C j^{-a}$ for $m \le j \le n^{\alpha}$, say with $\alpha = 1$, but it would work with $\alpha = \infty$ as well. The only minor change in this case is in the proof of Lemma 3.1. But one can argue as follows: just replace the set $A_1$ by the set of vertices in $A_m$ whose first $m - 1$ stubs are not connected to any of the vertices in $E$. By definition these vertices have at most one neighbor in $E$ and moreover, it is not difficult to see that this set also has w.h.p. a size of order $n$. Then the rest of the proof applies, mutatis mutandis. All other arguments in the proof of Theorem 1.1 remain unchanged. Therefore in this case we obtain that
$$\frac{|\xi^{V_n}_{t_n}|}{n} - \rho_{n,a}(\lambda) \xrightarrow{(\mathbb{P})} 0,$$
with $\rho_{n,a}(\lambda)$ as in (5). Theorem 1.2 also remains valid in this setting.