Critical window for the configuration model: finite third moment degrees

We investigate the component sizes of the critical configuration model, as well as the related problem of critical percolation on a supercritical configuration model. We show that, at criticality, a finite third moment assumption on the asymptotic degree distribution is enough to guarantee that the sizes of the largest connected components are of the order $n^{2/3}$, and that the re-scaled component sizes (ordered in a decreasing manner) converge to the ordered excursion lengths of an inhomogeneous Brownian motion with a parabolic drift. We use percolation to study the evolution of these component sizes while passing through the critical window, and show that the vector of percolation cluster sizes, considered as a process in the critical window, converges to the multiplicative coalescent process in the sense of finite-dimensional distributions. This behavior was first observed for Erd\H{o}s-R\'enyi random graphs by Aldous (1997), and our results support the empirical evidence that the nature of the phase transition is universal across a wide array of random-graph models. Further, we show that the re-scaled component sizes and surplus edges converge jointly under a strong topology, at each fixed location of the scaling window.


Introduction
Random graphs are the main vehicles to study complex networks that go through a radical change in their connectivity, often called the phase transition. A large body of literature aims at understanding the properties of random graphs that experience this phase transition in the sizes of the large connected components for various models. The behavior is well understood for Erdős-Rényi random graphs, thanks to a plethora of results [2,19,26,31]. However, these graphs are often inadequate for modeling real-world networks [11,14,28,29], since real-world network data often show a power-law behavior of the asymptotic degrees, whereas the degree distribution of Erdős-Rényi random graphs has exponentially decaying tails. Therefore, many alternative models have been proposed to capture this power-law tail behavior. An interesting fact, however, is that the behavior in most of these models is quite universal, in the sense that there is a critical value where the graphs experience a phase transition, and the nature of this phase transition is insensitive to the microscopic description of the model [4,8,12,20,26,27,32].
In this work, we focus on the configuration model, the canonical model for generating a random multigraph with a prescribed degree sequence. This model was introduced by Bollobás [10] to choose a uniform simple $d$-regular graph on $n$ vertices, when $dn$ is even. The idea was later generalized to general degree sequences $\boldsymbol{d}$ by Molloy and Reed [24] and others. We denote by $\mathrm{CM}_n(\boldsymbol{d})$ the multigraph generated by the configuration model on the vertex set $[n] = \{1, 2, \ldots, n\}$ with the degree sequence $\boldsymbol{d}$. The configuration model, conditioned on simplicity, yields a uniform simple graph with the same degree sequence. Various features related to the emergence of the giant component for this model have been studied recently [15,16,18,20,24,27]. We give a brief overview of the relevant literature in Section 4.1. Our aim is to obtain precise asymptotics for the component sizes of $\mathrm{CM}_n(\boldsymbol{d})$ in the critical window of the phase transition, under optimal assumptions on the degree sequence involving a finite third-moment condition. The re-scaled vector of component sizes (ordered in a decreasing manner) is shown to converge to the ordered excursion lengths of a reflected inhomogeneous Brownian motion with a parabolic drift. This shows that the component sizes of $\mathrm{CM}_n(\boldsymbol{d})$ in the critical regime, for a large collection of possible $\boldsymbol{d}$, lie in the same universality class as the Erdős-Rényi random graph [2] and the inhomogeneous random graph [8]. We use percolation on a super-critical configuration model to show the joint convergence of the scaled vectors of component sizes at multiple locations of the percolation scaling window. We also obtain the asymptotic distribution of the number of surplus edges in each component, and show that the sequence of vectors consisting of the re-scaled component sizes and surpluses converges to a suitable limit under a strong topology, as discussed in [6]. These results give very strong evidence in favor of the structural similarity of the component sizes of $\mathrm{CM}_n(\boldsymbol{d})$ and Erdős-Rényi random graphs at criticality.

Our contribution
The main contribution of this paper is that we derive the strongest results in the literature under a finite third-moment assumption on the degrees. This finite third-moment assumption is also necessary for Erdős-Rényi type scaling limits, since, amongst other reasons, the third moment appears in the scaling limit. In a recent work [13], we consider the infinite third-moment case with power-law degrees and show that the scaling limit of the cluster sizes is quite different. Also, we prove the joint convergence of the component sizes and the surplus edges under a strong topology, which improves the previously known results involving the surplus edges [27]. We also study percolation on the configuration model to gain insight into the evolution of the configuration model over the critical scaling window. This is achieved by studying a dynamic process that generates the percolated graphs for different values of the percolation parameter, a problem that is interesting in its own right.
Before stating our main results, we need to introduce some notation and concepts.

Definitions and notation
We will use the standard notation $\xrightarrow{\mathbb{P}}$ and $\xrightarrow{d}$ to denote convergence in probability and in distribution, respectively. We often use the Bachmann-Landau notation $O(\cdot)$, $o(\cdot)$ for large-$n$ asymptotics of real numbers. The topology needed for the distributional convergence will always be specified, unless it is clear from the context. A sequence of events $(\mathcal{E}_n)_{n \geq 1}$ is said to occur with high probability (whp) with respect to probability measures $(\mathbb{P}_n)_{n \geq 1}$ if $\mathbb{P}_n(\mathcal{E}_n) \to 1$ as $n \to \infty$. For a triangular array of random variables $(f_{k,n})_{k,n \geq 1}$, we write phrases like $f_{k,n} = O_{\mathbb{P}}(n^{\alpha})$ (respectively $o_{\mathbb{P}}(n^{\alpha})$) uniformly over $k \leq n^{\beta}$, to mean that $\sup_{k \leq n^{\beta}} |f_{k,n}| = O_{\mathbb{P}}(n^{\alpha})$ (respectively $o_{\mathbb{P}}(n^{\alpha})$). We also write $f_n = O_E(a_n)$ (respectively $f_n = o_E(a_n)$) to denote that $\sup_{n \geq 1} a_n^{-1} \mathbb{E}[f_n] < \infty$ (respectively $\lim_{n \to \infty} a_n^{-1} \mathbb{E}[f_n] = 0$). Denote by
$$\ell^2_{\downarrow} := \big\{ \mathbf{x} = (x_1, x_2, x_3, \ldots) : x_1 \geq x_2 \geq x_3 \geq \cdots \geq 0,\ \textstyle\sum_{i=1}^{\infty} x_i^2 < \infty \big\}$$
the space of non-negative, non-increasing, square-summable sequences of real numbers, with the metric $d(\mathbf{x}, \mathbf{y}) = (\sum_{i=1}^{\infty} (x_i - y_i)^2)^{1/2}$, and let $(\ell^2_{\downarrow})^k$ denote the $k$-fold product space of $\ell^2_{\downarrow}$. By $\ell^2_{\downarrow} \times \mathbb{N}^{\infty}$, we denote the product topology of $\ell^2_{\downarrow}$ and $\mathbb{N}^{\infty}$, where $\mathbb{N}^{\infty}$ denotes the collection of sequences on $\mathbb{N}$, endowed with the product topology. Define also
$$\mathbb{U}^0_{\downarrow} := \big\{ ((x_i, y_i))_{i \geq 1} \in \ell^2_{\downarrow} \times \mathbb{N}^{\infty} : \textstyle\sum_{i=1}^{\infty} x_i y_i < \infty \text{ and } y_i = 0 \text{ whenever } x_i = 0 \big\},$$
with the metric
$$d_{\mathbb{U}}\big((\mathbf{x}_1, \mathbf{y}_1), (\mathbf{x}_2, \mathbf{y}_2)\big) := \Big( \sum_{i=1}^{\infty} (x_{1i} - x_{2i})^2 \Big)^{1/2} + \sum_{i=1}^{\infty} |x_{1i} y_{1i} - x_{2i} y_{2i}|. \quad (2.3)$$
We usually use the boldface notation $\mathbf{X}$ for a time-dependent stochastic process $(X(s))_{s \geq 0}$, unless stated otherwise. $C[0,t]$ denotes the set of all continuous functions from $[0,t]$ to $\mathbb{R}$, equipped with the topology induced by the sup-norm $\|\cdot\|_t$. Similarly, $\mathbb{D}[0,t]$ (resp. $\mathbb{D}[0,\infty)$) denotes the set of all càdlàg functions from $[0,t]$ (resp. $[0,\infty)$) to $\mathbb{R}$, equipped with the Skorohod $J_1$ topology. $\mathbf{B}^{\lambda}_{\mu,\eta}$ denotes an inhomogeneous Brownian motion with a parabolic drift, given by
$$B^{\lambda}_{\mu,\eta}(s) = \sqrt{\frac{\eta}{\mu}}\, B(s) + \lambda s - \frac{\eta s^2}{2\mu^2}, \quad (2.5)$$
where $\mathbf{B} = (B(s))_{s \geq 0}$ is a standard Brownian motion, and $\mu > 0$, $\eta > 0$ and $\lambda \in \mathbb{R}$ are constants.
Define the reflected version of $\mathbf{B}^{\lambda}_{\mu,\eta}$ as
$$W^{\lambda}(s) := B^{\lambda}_{\mu,\eta}(s) - \min_{u \leq s} B^{\lambda}_{\mu,\eta}(u). \quad (2.6)$$
For a function $f \in C[0,\infty)$, an interval $\gamma = (l, r)$ is called an excursion above past minima, or simply an excursion, of $f$ if $f(l) = f(r) = \min_{u \leq r} f(u)$ and $f(x) > f(r)$ for all $l < x < r$; $|\gamma| = r(\gamma) - l(\gamma)$ will denote the length of the excursion $\gamma$. Also, define the counting process of marks $\mathbf{N}^{\lambda} = (N^{\lambda}(s))_{s \geq 0}$ to be a unit-jump process with intensity $\beta W^{\lambda}(s)$ at time $s$, conditional on $(W^{\lambda}(u))_{u \leq s}$, so that
$$N^{\lambda}(s) - \beta \int_0^s W^{\lambda}(u)\, \mathrm{d}u \quad (2.7)$$
is a martingale (see [2]). For an excursion $\gamma$, let $N(\gamma)$ denote the number of marks in the interval $[l(\gamma), r(\gamma)]$.
Remark 1. By [2, Lemma 25], the excursions of $\mathbf{B}^{\lambda}_{\mu,\eta}$ can be arranged in decreasing order of length, and the ordered excursion lengths can be considered as a vector in $\ell^2_{\downarrow}$, almost surely. Let $\boldsymbol{\gamma}^{\lambda} = (|\gamma^{\lambda}_j|)_{j \geq 1}$ be the ordered excursion lengths of $\mathbf{B}^{\lambda}_{\mu,\eta}$. Then, $(|\gamma^{\lambda}_j|, N(\gamma^{\lambda}_j))_{j \geq 1}$ can be ordered as an element of $\mathbb{U}^0_{\downarrow}$ almost surely by [6, Theorem 3.1 (iii)]. We denote this element of $\mathbb{U}^0_{\downarrow}$ by $\mathbf{Z}(\lambda)$. Finally, we define a Markov process $\mathbf{X} := (\mathbf{X}(s))_{-\infty < s < \infty}$ taking values in $\ell^2_{\downarrow}$, called the multiplicative coalescent process. Think of $\mathbf{X}(s)$ as the collection of masses of the particles (possibly infinitely many) in a system at time $s$, so that the $i$-th particle has mass $X_i(s)$ at time $s$. The system evolves according to the following rule at time $s$: at rate $X_i(s) X_j(s)$, particles $i$ and $j$ merge into a new particle of mass $X_i(s) + X_j(s)$. This process has been extensively studied in [2,3]. In particular, Aldous [2, Proposition 5] showed that this is a Feller process.
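The merging dynamics just described can be simulated directly for finitely many particles. Below is a minimal sketch (function name and interface are illustrative); the standard multiplicative coalescent of [2,3] is the corresponding process on infinite mass vectors in $\ell^2_{\downarrow}$, so this toy finite-state version only illustrates the merging rule. Total mass is conserved and the mass vector is returned in decreasing order.

```python
import itertools
import random

def multiplicative_coalescent(masses, t_max, seed=0):
    """Toy finite-state sketch of the multiplicative coalescent rule: each
    pair of particles (i, j) merges at rate x_i * x_j into one particle of
    mass x_i + x_j.  Runs until time t_max (or one particle remains) and
    returns the masses sorted in decreasing order."""
    rng = random.Random(seed)
    x = list(masses)
    t = 0.0
    while len(x) > 1:
        # Total merger rate: sum_{i<j} x_i x_j = ((sum x)^2 - sum x^2) / 2.
        s1 = sum(x)
        s2 = sum(v * v for v in x)
        total_rate = (s1 * s1 - s2) / 2.0
        t += rng.expovariate(total_rate)  # time of the next merger
        if t > t_max:
            break
        # Choose the merging pair with probability proportional to x_i x_j.
        pairs = list(itertools.combinations(range(len(x)), 2))
        i, j = rng.choices(pairs, weights=[x[a] * x[b] for a, b in pairs])[0]
        x = [x[k] for k in range(len(x)) if k not in (i, j)] + [x[i] + x[j]]
    return sorted(x, reverse=True)
```

Enumerating all pairs is quadratic per step, which is fine for a sketch; the point is only that heavier particles merge faster, which is the mechanism behind Theorem 4.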

Main results
Consider $n$ vertices labeled by $[n] := \{1, 2, \ldots, n\}$ and a sequence of degrees $\boldsymbol{d} = (d_i)_{i \in [n]}$ such that $\ell_n = \sum_{i \in [n]} d_i$ is even. For convenience, we suppress the dependence of the degree sequence on $n$ in the notation. The configuration model on $n$ vertices with degree sequence $\boldsymbol{d}$ is constructed as follows: equip vertex $j$ with $d_j$ stubs, or half-edges. Two half-edges create an edge once they are paired. Therefore, initially we have $\ell_n = \sum_{i \in [n]} d_i$ half-edges. Pick any one half-edge and pair it with a half-edge chosen uniformly at random from the remaining unpaired half-edges, and keep repeating this procedure until all half-edges are paired.
Note that the graph constructed by the above procedure may contain self-loops or multiple edges. It can be shown [30, Proposition 7.15] that, conditionally on $\mathrm{CM}_n(\boldsymbol{d})$ being simple, the law of the resulting graph is uniform over all simple graphs with degree sequence $\boldsymbol{d}$.
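The pairing procedure above is easy to sketch in code. A uniformly random pairing of the half-edges is the same as shuffling the list of half-edges and pairing consecutive entries, which is what the following illustrative function (names are ours) does:

```python
import random

def configuration_model(degrees, seed=0):
    """Uniform pairing of half-edges, as in the construction above.
    Returns the edge list of the resulting multigraph; self-loops and
    multiple edges are allowed."""
    assert sum(degrees) % 2 == 0, "the total degree must be even"
    rng = random.Random(seed)
    # One entry per half-edge, labeled by the vertex it is attached to.
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    # A uniformly random perfect matching of the half-edges is obtained by
    # shuffling them and pairing consecutive entries.
    rng.shuffle(half_edges)
    return [(half_edges[k], half_edges[k + 1])
            for k in range(0, len(half_edges), 2)]
```

Since a uniform shuffle induces a uniform perfect matching on the half-edges, this is equivalent to the sequential pairing described in the text.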
In this section, we discuss the main results of this paper. As discussed in the introduction, our results are twofold and concern (i) general $\mathrm{CM}_n(\boldsymbol{d})$ at criticality, and (ii) critical percolation on a super-critical configuration model, both under a finite third-moment assumption.

Configuration model results
We consider a sequence of configuration models $(\mathrm{CM}_n(\boldsymbol{d}))_{n \geq 1}$ satisfying the following:

Assumption 1. Let $D_n$ denote the degree of a vertex chosen uniformly at random, independently of the graph. Then there exists a random variable $D$ such that

(i) $D_n \xrightarrow{d} D$;

(ii) $\mathbb{E}[D_n^3] \to \mathbb{E}[D^3] < \infty$;

(iii) $\nu_n := \frac{\sum_{i \in [n]} d_i (d_i - 1)}{\sum_{i \in [n]} d_i} = 1 + \lambda n^{-1/3} + o(n^{-1/3})$, for some $\lambda \in \mathbb{R}$.
Suppose that $\mathcal{C}_{(1)}, \mathcal{C}_{(2)}, \ldots$ are the connected components of $\mathrm{CM}_n(\boldsymbol{d})$ in decreasing order of size; in case of a tie, order the components according to the values of the minimal indices of the vertices in those components. For a connected graph $G$, let $\mathrm{SP}(G) :=$ (number of edges in $G$) $- (|G| - 1)$ denote the number of surplus edges. Intuitively, this measures the deviation of $G$ from a tree-like structure. Let $\sigma_r = \mathbb{E}[D^r]$ and consider the reflected Brownian motion, the excursions, and the counting process $\mathbf{N}^{\lambda}$ as defined in Section 2, with parameters
$$\mu = \sigma_1, \qquad \eta = \frac{\sigma_1 \sigma_3 - \sigma_2^2}{\sigma_1}, \qquad \beta = \frac{1}{\sigma_1}. \quad (3.4)$$
Let $\boldsymbol{\gamma}^{\lambda}$ denote the vector of excursion lengths of the process $\mathbf{B}^{\lambda}_{\mu,\eta}$, arranged in non-increasing order. Our main results are as follows:

Theorem 1. Under Assumption 1, as $n \to \infty$,
$$\big( n^{-2/3} |\mathcal{C}_{(j)}| \big)_{j \geq 1} \xrightarrow{d} \boldsymbol{\gamma}^{\lambda}$$
with respect to the $\ell^2_{\downarrow}$ topology.

Recall the definition of $\mathbf{Z}(\lambda)$ from Remark 1. Order the vector of component sizes and surplus edges $\big( n^{-2/3} |\mathcal{C}_{(j)}|, \mathrm{SP}(\mathcal{C}_{(j)}) \big)_{j \geq 1}$ as an element of $\mathbb{U}^0_{\downarrow}$ and denote it by $\mathbf{Z}_n(\lambda)$.
Theorem 2. Fix any $\lambda \in \mathbb{R}$. Under Assumption 1, as $n \to \infty$,
$$\mathbf{Z}_n(\lambda) \xrightarrow{d} \mathbf{Z}(\lambda)$$
with respect to the $\mathbb{U}^0_{\downarrow}$ topology.

In words, Theorem 1 gives the precise asymptotic distribution of the component sizes re-scaled by $n^{2/3}$, and Theorem 2 gives the asymptotic number of surplus edges in each component, jointly with their sizes.
Remark 2. The strength of Theorems 1 and 2 lies in Assumption 1. Clearly, Assumption 1 is satisfied when the distribution of $D$ obeys an asymptotic power law with finite third moment, i.e., $\mathbb{P}(D \geq x) = x^{-(\tau - 1)}(1 + o(1))$ for some $\tau > 4$. Also, if one has a random degree sequence that satisfies Assumption 1 with high probability, then Theorems 1 and 2 hold conditionally on the degrees. In particular, when the degree sequence consists of an i.i.d. sample from a distribution with $\mathbb{E}[D^3] < \infty$ [20], then Assumption 1 is satisfied almost surely. We will later see that degree sequences in the percolation scaling window also satisfy Assumption 1.

Percolation results
Bond percolation on a graph $G$ refers to retaining each edge of $G$ independently with probability $p$ and deleting it otherwise. When $G$ is a random graph, the deletion of edges is also independent of $G$. Consider bond percolation on $\mathrm{CM}_n(\boldsymbol{d})$ with retention probability $p_n$, yielding $\mathrm{CM}_n(\boldsymbol{d}, p_n)$. We assume the following:

Assumption 2. (i) Assumption 1 (i) and (ii) hold for the degree sequence, and $\mathrm{CM}_n(\boldsymbol{d})$ is supercritical, i.e., $\nu = \mathbb{E}[D(D-1)]/\mathbb{E}[D] > 1$.
(ii) (Critical window for percolation) For some $\lambda \in \mathbb{R}$,
$$p_n = p_n(\lambda) := \frac{1}{\nu_n} \big( 1 + \lambda n^{-1/3} \big).$$

Note that $p_n(\lambda)$, as defined in Assumption 2 (ii), is always non-negative for $n$ sufficiently large. Now, suppose $\tilde{d}_i \sim \mathrm{Bin}(d_i, \sqrt{p_n})$, $n_+ := \sum_{i \in [n]} (d_i - \tilde{d}_i)$, and $\tilde{n} = n + n_+$. Consider the degree sequence $\tilde{\boldsymbol{d}}$ consisting of $\tilde{d}_i$ for $i \in [n]$ and $n_+$ additional vertices of degree one, i.e., $\tilde{d}_i = 1$ for $i \in [\tilde{n}] \setminus [n]$. We will show later that the degree $\tilde{D}_n$ of a random vertex from this degree sequence satisfies Assumption 1 (i), (ii) almost surely for some random variable $\tilde{D}$ with $\mathbb{E}[\tilde{D}^3] < \infty$. Moreover, $\tilde{n}/n \to 1 + \mu(1 - \nu^{-1/2}) =: \zeta$ almost surely. Now, using the notation in Section 2, define $\bar{\gamma}^{\lambda}_j = \zeta^{2/3} \tilde{\gamma}^{\lambda}_j$, where $\tilde{\gamma}^{\lambda}_j$ is the $j$-th largest excursion of the inhomogeneous Brownian motion $\mathbf{B}^{\lambda}_{\tilde{\mu},\tilde{\eta}}$ with the parameters given in (3.9). Define the process $\tilde{\mathbf{N}}$ as in (2.7) with the parameter values given by (3.9). Denote the $j$-th largest cluster of $\mathrm{CM}_n(\boldsymbol{d}, p_n(\lambda))$ by $\mathcal{C}^p_{(j)}(\lambda)$. Also, let $\mathbf{Z}^p_n(\lambda)$ denote the vector in $\mathbb{U}^0_{\downarrow}$ obtained by rearranging the critical percolation clusters (re-scaled by $n^{2/3}$) and their surplus edges, and let $\bar{\mathbf{Z}}(\lambda)$ denote the vector in $\mathbb{U}^0_{\downarrow}$ obtained by rearranging $\big( (\bar{\gamma}^{\lambda}_j, \tilde{N}(\tilde{\gamma}^{\lambda}_j)) \big)_{j \geq 1}$.

Theorem 3. Under Assumption 2, as $n \to \infty$,
$$\mathbf{Z}^p_n(\lambda) \xrightarrow{d} \bar{\mathbf{Z}}(\lambda)$$
with respect to the $\mathbb{U}^0_{\downarrow}$ topology.

Next, we consider the percolation clusters for multiple values of $\lambda$. There is a very natural way to couple $(\mathrm{CM}_n(\boldsymbol{d}, p_n(\lambda)))_{\lambda \in \mathbb{R}}$, described as follows: suppose that each edge $(ij)$ of $\mathrm{CM}_n(\boldsymbol{d})$ has an associated uniform $[0,1]$ random variable $U_{ij}$, where the $U_{ij}$'s are i.i.d. and independent of $\mathrm{CM}_n(\boldsymbol{d})$. Now, delete edge $(ij)$ if $U_{ij} > p_n(\lambda)$. The graph thus obtained is distributed as $\mathrm{CM}_n(\boldsymbol{d}, p_n(\lambda))$. Moreover, if we fix the set of uniform random variables and vary $\lambda$, this produces a coupling between the graphs $(\mathrm{CM}_n(\boldsymbol{d}, p_n(\lambda)))_{\lambda \in \mathbb{R}}$. The next theorem shows that the convergence of the component sizes holds jointly at finitely many locations within the critical window, under the above coupling:

Theorem 4. Under Assumption 2, for any $k \geq 1$ and $-\infty < \lambda_1 < \cdots < \lambda_k < \infty$,
$$\Big( \big( n^{-2/3} |\mathcal{C}^p_{(j)}(\lambda_i)| \big)_{j \geq 1} \Big)_{i \in [k]} \xrightarrow{d} \Big( \big( \bar{\gamma}^{\lambda_i}_j \big)_{j \geq 1} \Big)_{i \in [k]}$$
with respect to the $(\ell^2_{\downarrow})^k$ topology.

Remark 3.
The coupling for the limiting process in Theorem 4 is given by the multiplicative coalescent process described in Section 2; this will become clearer when we describe the ideas of the proof. To understand it intuitively, notice that the component $\mathcal{C}^p_{(i)}(\lambda)$ consists of some paired half-edges, which form the edges of the percolated graph, and some open half-edges, whose pairings were deleted due to percolation. Denote by $O^p_i(\lambda)$ the total number of open half-edges of $\mathcal{C}^p_{(i)}(\lambda)$. One can think of $O^p_i$ as the mass of $\mathcal{C}^p_{(i)}$. Now, as we change the value of the percolation parameter from $p_n(\lambda)$ to $p_n(\lambda + \mathrm{d}\lambda)$, exactly one edge is added to the graph, and its two endpoints are chosen proportionally to the number of open half-edges of the components of $\mathrm{CM}_n(\boldsymbol{d}, p_n(\lambda))$. By the above heuristics, $\mathcal{C}^p_{(i)}$ and $\mathcal{C}^p_{(j)}$ merge at rate proportional to $O^p_i O^p_j$ and create a component of mass $O^p_i + O^p_j - 2$. Later, we will show that the mass of a component is approximately proportional to the component size. Therefore, the component sizes merge approximately like the multiplicative coalescent over the critical scaling window.

Remark 4. Janson [16] studied the phase transition of the maximum component size for percolation on a super-critical configuration model. The critical value was shown to be $p = 1/\nu$; this is precisely the reason for taking $p_n$ of the form given in Assumption 2 (ii). The width of the scaling window is intimately related to the asymptotics of the susceptibility function $\sum_i |\mathcal{C}_{(i)}|^2 / n$.
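The edge-marking coupling described above is straightforward to sketch. In the illustrative code below (names are ours), each edge carries a single uniform variable and is kept at retention level $p$ exactly when its mark is at most $p$; since raising $p$ only adds edges, clusters can only merge as the location parameter grows, which is the monotonicity underlying Theorem 4:

```python
import random
from collections import Counter

def coupled_percolation(edges, p_values, seed=0):
    """Sketch of the coupling: each edge e carries one uniform mark U_e, and
    at retention level p we keep e iff U_e <= p.  The same marks are reused
    for every p, so the kept edge sets are nested in p.  Returns the sorted
    component sizes (computed by union-find) at each requested level."""
    rng = random.Random(seed)
    U = [rng.random() for _ in edges]
    n = 1 + max(max(u, v) for u, v in edges)   # vertices assumed 0..n-1

    def component_sizes(p):
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]  # path halving
                a = parent[a]
            return a
        for e, (u, v) in enumerate(edges):
            if U[e] <= p:                      # edge survives at level p
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
        return sorted(Counter(find(v) for v in range(n)).values(),
                      reverse=True)

    return {p: component_sizes(p) for p in p_values}
```

Feeding in the edge list of a configuration model and an increasing grid of retention probabilities exhibits the cluster merging discussed in the remark above.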
In fact, if $\sum_i |\mathcal{C}_{(i)}|^2 \sim n^{1+\eta}$, then the width of the critical window turns out to be $n^{-\eta}$, and the largest component sizes are of the order $n^{(1+\eta)/2}$. This has been observed universally in the random-graph literature [2,8,12,20,25,27], even when the scaling limit is not in the same universality class as the Erdős-Rényi random graph [9,13], and the same turns out to be the case in this paper.

Remark 5. Theorems 1 and 2 also hold for configuration models conditioned on simplicity. We do not give a proof here; the arguments in [20, Section 7] can be followed verbatim to obtain a proof of this fact. As a result, Theorems 3 and 4 also hold, conditioned on simplicity.
The rest of the paper is organized as follows: In Section 4.1, we give a brief overview of the relevant literature. This will enable the reader to better understand the relation of this work to the large body of existing literature. It will also become clear why the choices of the parameters in Assumption 1 (iii) and Assumption 2 (ii) should correspond to the critical scaling window. We prove Theorems 1 and 2 in Section 5. In Section 6, we find the asymptotic degree distribution in each component. This is used along with Theorem 2 to establish Theorem 3 in Section 7. In Section 8, we analyze the evolution of the component sizes over the percolation critical window and prove Theorem 4.

Literature overview
Erdős-Rényi type behavior. We first explain what 'Erdős-Rényi type behavior' means. The study of the critical window for random graphs started with the seminal paper [2] on the Erdős-Rényi random graph with $p = n^{-1}(1 + \lambda n^{-1/3})$. Aldous showed that, in this regime, the largest components are of asymptotic size $n^{2/3}$, and the ordered component sizes (scaled by $n^{2/3}$) asymptotically have the same distribution as the ordered excursion lengths of a Brownian motion with a negative parabolic drift. Aldous also considered a natural coupling of the re-scaled vectors of component sizes as $\lambda$ varies, and viewed it as a dynamic $\ell^2_{\downarrow}$-valued stochastic process. It was shown that this dynamic process can be described by a process called the standard multiplicative coalescent, which has the Feller property. This implies the convergence of the component sizes jointly for different values of $\lambda$. In Theorem 4, we show that similar results hold for the configuration model under a very general set of assumptions. Of course, for general configuration models there is no obvious way to couple the graphs as the location parameter varies over the scaling window, and percolation seems to be the most natural way to achieve this. By [15,16], percolation on a configuration model can be viewed as a configuration model with a random degree sequence, and this is precisely the reason for studying percolation in this paper.
Universality and optimal assumptions. In [8] it was shown that, inside the critical scaling window, the ordered component sizes (scaled by $n^{2/3}$) of an inhomogeneous random graph converge to the ordered excursion lengths of an inhomogeneous Brownian motion with a parabolic drift, under only a finite third-moment assumption on the weight distribution. We establish a counterpart of this for the configuration model in Theorem 1. Later, Nachmias and Peres [25] studied the percolation scaling window for random regular graphs; for percolation on the configuration model, similar results were obtained by Riordan [27] for bounded maximum degrees.
Joseph [20] obtained the same scaling limits as in Theorem 1 for the component sizes when the degrees are i.i.d. samples from a distribution with finite third moment. Theorems 2 and 3 prove stronger versions of all these existing results for the configuration model under optimal assumptions. Further, in Theorem 4, we give a dynamic picture of the percolation cluster sizes in the critical window and show that these dynamics can be approximated by the multiplicative coalescent.
Comparison to branching processes. In [18,24], the phase transition for the component sizes of $\mathrm{CM}_n(\boldsymbol{d})$ was identified in terms of the parameter $\nu = \mathbb{E}[D(D-1)]/\mathbb{E}[D]$. Janson and Luczak [18] showed that the local neighborhoods of the configuration model can be approximated by a branching process $\mathcal{X}$ with expected progeny $\nu$, and thus, when $\nu > 1$, $\mathrm{CM}_n(\boldsymbol{d})$ has a component $\mathcal{C}_{\max}$ of approximate size $\rho n$, where $\rho$ is the survival probability of $\mathcal{X}$. Further, the progeny distribution of $\mathcal{X}$ has finite variance when $\mathbb{E}[D^3] < \infty$. Now, for a branching process with mean $\approx 1 + \varepsilon$ and finite variance $\sigma^2$, the survival probability is approximately $2\sigma^{-2}\varepsilon$ for small $\varepsilon > 0$. This suggests that the largest component size under Assumption 1 should be of the order $n^{2/3}$, since $\varepsilon = \Theta(n^{-1/3})$. Theorem 1 mirrors this intuition and shows that, in fact, all of the largest component sizes are of the order $n^{2/3}$.
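The survival-probability approximation $\rho \approx 2\sigma^{-2}\varepsilon$ can be checked numerically in the textbook Poisson case (our choice of offspring law here, purely for illustration): for Poisson$(1+\varepsilon)$ offspring, the variance is $\sigma^2 = 1+\varepsilon$ and $\rho$ solves the fixed-point equation $\rho = 1 - \mathrm{e}^{-(1+\varepsilon)\rho}$.

```python
import math

def poisson_survival(mean, iters=100_000):
    """Survival probability of a Galton-Watson branching process with
    Poisson(mean) offspring: rho is the largest root of
    rho = 1 - exp(-mean * rho), found by fixed-point iteration from rho = 1."""
    rho = 1.0
    for _ in range(iters):
        rho = 1.0 - math.exp(-mean * rho)
    return rho

# Slightly supercritical case: mean 1 + eps, offspring variance sigma^2 = 1 + eps.
eps = 0.01
rho = poisson_survival(1.0 + eps)
heuristic = 2 * eps / (1.0 + eps)   # the 2 * sigma^{-2} * eps approximation
```

For $\varepsilon = 0.01$ the fixed point and the heuristic agree to within about $10^{-4}$, while in the subcritical case the iteration collapses to zero, matching the dichotomy described above.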

Proof ideas
The proof of Theorem 1 uses a standard functional central limit theorem argument. Indeed, we associate a suitable semimartingale with the exploration algorithm used to explore the connected components of $\mathrm{CM}_n(\boldsymbol{d})$. The martingale part is then shown to converge to an inhomogeneous Brownian motion, and the drift part is shown to converge to a parabola. The fact that the component sizes can be expressed in terms of the hitting times of the semimartingale implies the finite-dimensional convergence of the component sizes. The convergence with respect to $\ell^2_{\downarrow}$ is then concluded using the size-biased point process arguments formulated by Aldous [2]. Theorem 2 requires a careful estimate of the tail probability of the distribution of surplus edges when the component size is small, which we obtain using martingale estimates in Lemma 21. Theorem 3 is proved by showing that the percolated degree sequence satisfies Assumption 1 almost surely. Finally, we prove Theorem 4 in Section 8. The key challenges here are that, for each fixed $n$, the components do not merge according to their component sizes, and that the components do not merge exactly like a multiplicative coalescent over the scaling window. Thus, the main theme of the proof is to approximate the evolution of the component sizes over the percolation scaling window by a suitable dynamic process that is an exact multiplicative coalescent.

Open problems
(i) Theorem 4 proves the joint convergence of the component sizes at finitely many locations in the scaling window. Strengthening this to convergence as a process in $\lambda$, as Aldous [2] proved for the Erdős-Rényi random graph, remains open.
(ii) A reason for studying percolation in this paper is to understand the minimal spanning tree of the giant component. For a super-critical configuration model with i.i.d. edge weights, it should be the case that the minimal spanning tree can be described by the critically percolated graph at a very high location of the scaling window. Such results were obtained in [1] for the minimal spanning tree on the complete graph. The study of minimal spanning trees remains open, even for random regular graphs.
5 Proofs of Theorems 1 and 2

The exploration process
Let us explore the graph sequentially using a natural approach outlined in [27]. At step $k$, divide the set of half-edges into three groups: sleeping half-edges $\mathcal{S}_k$, active half-edges $\mathcal{A}_k$, and dead half-edges $\mathcal{D}_k$. The depth-first exploration can be summarized in the following algorithm:

Algorithm 1 (DFS exploration). At $k = 0$, $\mathcal{S}_k$ contains all the half-edges and $\mathcal{A}_k$, $\mathcal{D}_k$ are empty. While ($\mathcal{S}_k \neq \varnothing$ or $\mathcal{A}_k \neq \varnothing$), we do the following at stage $k + 1$:

S3 If $\mathcal{A}_k = \varnothing$ for some $k$, then take out one half-edge $a$ from $\mathcal{S}_k$ uniformly at random and identify the vertex $v$ incident to it. Declare $v$ to be discovered. Let $r = d_v - 1$, assume that $a_{v1}, a_{v2}, \ldots, a_{vr}$ are the half-edges of $v$ other than $a$, and identify the collection of half-edges involved in self-loops $\mathcal{C}_k$ as in Step 2. Order the half-edges of $v$.

In words, we explore a new vertex at each stage and throw away all the half-edges involved in a loop, multiple edge, or cycle with the vertex set already discovered, before proceeding to the next stage. The ordering of the half-edges is such that the connected components of $\mathrm{CM}_n(\boldsymbol{d})$ are explored in the depth-first way. We call the half-edges of $\mathcal{B}_k \cup \mathcal{C}_k$ cycle half-edges, because they create loops, cycles or multiple edges in the graph, and we let $c_{(j)}$ denote the number of cycle half-edges found at stage $j$. (5.1) Let $d_{(j)}$ be the degree of the $j$-th explored vertex and define the process
$$S_n(i) = \sum_{j=1}^{i} \big( d_{(j)} - 2 - 2 c_{(j)} \big). \quad (5.2)$$
The process $\mathbf{S}_n = (S_n(i))_{i \in [n]}$ "encodes the component sizes as lengths of path segments above past minima", as discussed in [2]. Suppose $\mathcal{C}_i$ is the $i$-th connected component explored by the above exploration process. Define (5.3)
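The encoding of component sizes as excursions above past minima can be sketched on a simple graph, without the half-edge bookkeeping of Algorithm 1 (so this is an illustration of the encoding, not of the algorithm itself). Exploring one vertex per step and moving the walk by (newly discovered neighbors) $- 1$ makes the walk decrease by exactly one over each component, regardless of surplus edges, so component sizes are the excursion lengths:

```python
def exploration_walk(adj):
    """Depth-first exploration of a simple graph given as adjacency lists.
    Each step explores one vertex and moves the walk by
    (# newly discovered neighbors) - 1.  Over a component of size m the walk
    gains m - 1 discoveries and loses m, i.e. it ends one below its starting
    level: component sizes are the excursion lengths above past minima."""
    n = len(adj)
    discovered = [False] * n
    walk, comp_sizes = [0], []
    for start in range(n):
        if discovered[start]:
            continue
        discovered[start] = True
        stack, size = [start], 0
        while stack:
            v = stack.pop()                       # depth-first order
            size += 1
            new = [u for u in adj[v] if not discovered[u]]
            for u in new:
                discovered[u] = True
            stack.extend(new)
            walk.append(walk[-1] + len(new) - 1)  # one step per explored vertex
        comp_sizes.append(size)
    return walk, sorted(comp_sizes, reverse=True)
```

The final walk value equals minus the number of components, which is the discrete analogue of the walk reaching a new minimum exactly when a component is finished.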

Size-biased exploration
The vertices are explored in a size-biased manner with sizes proportional to their degrees; i.e., if we denote by $v_{(i)}$ the $i$-th vertex explored in Algorithm 1 and by $d_{(i)}$ the degree of $v_{(i)}$, then
$$\mathbb{P}\big( v_{(i+1)} = v \mid \mathcal{V}_i \big) = \frac{d_v}{\sum_{u \notin \mathcal{V}_i} d_u}, \qquad v \notin \mathcal{V}_i,$$
where $\mathcal{V}_i$ denotes the set of the first $i$ vertices discovered in the above exploration process. The following lemma will be used crucially in the proof of Theorem 1:

Lemma 5. Suppose that Assumption 1 holds, and denote $\sigma_r = \mathbb{E}[D^r]$ and $\mu = \mathbb{E}[D]$. Then, for all $t > 0$, as $n \to \infty$, (5.5) and (5.6) hold.

The proof of this lemma follows from the two lemmas stated below, where the size of index $i$ is given by (5.7). Then, for any $t > 0$, as $n \to \infty$, $\sup_{u \leq t} |Y(u) - u| \xrightarrow{\mathbb{P}} 0$.
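A size-biased random order as described above can be sampled directly with the standard exponential-clock device, which is also the idea behind the size-biased point process used later in the proof of Theorem 1: attaching an independent $\mathrm{Exp}(d_v)$ clock to each vertex and sorting by the clocks makes vertex $v$ ring first with probability $d_v / \sum_u d_u$. A minimal sketch (function name is ours; degree-zero vertices, never reached by the exploration, are omitted):

```python
import random

def size_biased_order(degrees, seed=0):
    """Return the vertices in size-biased random order: at each step, the
    next vertex is chosen among the remaining ones with probability
    proportional to its degree.  Implemented by sorting independent
    Exp(d_v) clocks, since the minimum of such clocks is attained by v
    with probability d_v / sum(d)."""
    rng = random.Random(seed)
    clocks = {v: rng.expovariate(d) for v, d in enumerate(degrees) if d > 0}
    return sorted(clocks, key=clocks.get)
```

By the memoryless property, conditioning on the set already removed leaves the remaining clocks exponential, so the whole ordering, not just the first draw, is size-biased.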

Estimate of cycle half-edges
The following lemma gives an estimate of the number of cycle half-edges created up to time $t$. This result was proved in [27] for bounded degrees; in our case, it follows from Lemma 5, as we show below.

Lemma 8. Under Assumption 1, (5.9) and (5.10) hold uniformly for $k \leq t n^{2/3}$ and any $t > 0$, where $\mathcal{F}_k$ is the sigma-field generated by the information revealed up to stage $k$. Further, all the $O_{\mathbb{P}}$ and $o_{\mathbb{P}}$ terms in (5.9) and (5.10) can be replaced by $O_E$ and $o_E$.
Proof. Let $U_k := |\mathcal{S}_k|$. First note that, by (5.5), uniformly over $k \leq t n^{2/3}$. Let $a$ be the half-edge being explored at stage $k+1$. Now, each of the $|\mathcal{A}_k| - 1$ half-edges of $\mathcal{A}_k \setminus \{a\}$ is equally likely to be paired with a half-edge of $v_{(k+1)}$, thus creating two elements of $\mathcal{B}_k$. Also, given $\mathcal{F}_k$ and $v_{(k+1)}$, the probability that a half-edge of $\mathcal{A}_k \setminus \{a\}$ is paired to one of the half-edges of $v_{(k+1)}$ is $(d_{(k+1)} - 1)/(U_k - 1)$. Therefore, (5.12) holds, and hence (5.13). Now, using (5.5) and (5.6), uniformly over $k \leq t n^{2/3}$, where the last step follows from Assumption 1 (iii). Further, using the fact that $\mathbb{P}(D = 1) > 0$, we have $U_k \geq c_0 n$ for some constant $c_0 > 0$, uniformly over $k \leq t n^{2/3}$. Combining the above estimates gives (5.9); the fact that all the $O_{\mathbb{P}}$ and $o_{\mathbb{P}}$ terms can be replaced by $O_E$ and $o_E$ follows similarly. To prove (5.10), note (5.15). By Assumption 1 and (5.5), uniformly for $k \leq t n^{2/3}$, and therefore (5.10) holds uniformly over $k \leq t n^{2/3}$. Again, the $O_{\mathbb{P}}$ term can be replaced by $O_E$, as argued before.
The following result is the main ingredient for proving Theorem 1. Recall the definition of B λ µ,η from (2.5) with parameters given in (3.4).
Theorem 9 (Convergence of the exploration process). Under Assumption 1, as $n \to \infty$, $\bar{\mathbf{S}}_n \xrightarrow{d} \mathbf{B}^{\lambda}_{\mu,\eta}$ with respect to the Skorohod $J_1$ topology.
As in [20], we will prove this by approximating $\mathbf{S}_n$ by the simpler process defined by
$$s_n(i) = \sum_{j=1}^{i} \big( d_{(j)} - 2 \big). \quad (5.19)$$
Note that the difference between the processes $\mathbf{S}_n$ and $\mathbf{s}_n$ is due to the cycles, loops, and multiple edges encountered during the exploration. Following the approach of [20], it will be enough to prove the following:

Proposition 10. Under Assumption 1, as $n \to \infty$, $\bar{\mathbf{s}}_n \xrightarrow{d} \mathbf{B}^{\lambda}_{\mu,\eta}$ with respect to the Skorohod $J_1$ topology.
Remark 6. It will be shown that the distributions of $\bar{\mathbf{S}}_n$ and $\bar{\mathbf{s}}_n$ are very close as $n \to \infty$, and therefore Proposition 10 implies Theorem 9. This is achieved by proving that we do not see too many cycle half-edges up to time $\lfloor n^{2/3} u \rfloor$, for any fixed $u > 0$.
From here onwards, we work with the continuous versions of the processes $\bar{\mathbf{S}}_n$ and $\bar{\mathbf{s}}_n$, obtained by linearly interpolating between the values at the jump points, and we keep the same notation. It is easy to see that these continuous versions differ from their càdlàg versions by at most $n^{-1/3} d_{\max} = o(1)$ uniformly on $[0, T]$, for any $T > 0$. Therefore, the convergence in law of the continuous versions implies the convergence in law of the càdlàg versions, and vice versa. Before showing that Theorem 9 is a consequence of Proposition 10, we need to bound the difference of these two processes in a suitable way. We need the following lemma; recall the definition of $c_{(j)}$ from Section 5.1. (5.21)

Proof. Lemma 11 is similar to [20, Lemma 6.1]; we add a brief proof. Note that, for all large $n$, $|\mathcal{A}_k| \leq M n^{1/3}$ on $E_n(t, M)$, where the last step follows by noting that $\min_{j \leq k} s_n(j) \leq \min_{j \leq k} S_n(j) + 2 \sum_{j=1}^{k} c_{(j)}$. By Lemma 8, uniformly for $k \leq t n^{2/3}$. Summing over $1 \leq k \leq t n^{2/3}$ and taking the lim sup completes the proof.
The proof that Theorem 9 follows from Proposition 10 and Lemma 11 is standard (see [20, Section 6.2]), and we skip it for the sake of brevity. From here onward, the main focus of this section is to prove Proposition 10. We use the martingale functional central limit theorem in a similar manner as [2].
Proof of Proposition 10. Let $\{\mathcal{F}_i\}_{i \geq 1}$ be the natural filtration defined in Lemma 8. Recall the definition of $s_n(i)$ from (5.19). By the Doob-Meyer decomposition [21, Theorem 4.10], we can write $s_n$ as the sum of a martingale part and a drift part, as in (5.25a)-(5.25c). Recall that, for a discrete-time stochastic process $(X_n(i))_{i \geq 1}$, we denote $\bar{X}_n(t) = n^{-1/3} X_n(\lfloor t n^{2/3} \rfloor)$. Our result follows from the martingale functional central limit theorem [33, Theorem 2.1] if we can prove four conditions, (5.26a)-(5.26d), for any $u > 0$. The facts that the jumps of both the martingale and the quadratic variation process go to zero, and that the quadratic variation process converges to the quadratic variation of an inhomogeneous Brownian motion, together imply the convergence of the martingale term. These conditions are verified separately in the subsequent part of this section.
Next, we prove Condition (5.26a), which requires some more work. Note that the last step of the split-up in (5.35) follows from Assumption 1 (iii); therefore, (5.36) holds. The following lemma estimates the sums on the right-hand side of (5.36):

Lemma 13. For all $u > 0$, as $n \to \infty$, (5.37) and (5.38) hold. Consequently, (5.39) holds.

Proof. Notice (5.40); then (5.37) follows from (5.6) in Lemma 5. The proof of (5.38) is similar, and it follows from (5.5). We now show (5.39). The relevant bounds hold uniformly over $i \leq u n^{2/3}$, where we use Lemma 5 to conclude the uniformity. Similarly, (5.28) and Assumption 1, combined with (5.38), complete the proof.
Proof. The proof follows by using Lemma 13 in (5.36).

Finite dimensional convergence of the ordered component sizes
Note that the convergence of the exploration process in Theorem 9 implies that, for any large $T > 0$, the $k$ largest components explored up to time $T n^{2/3}$ converge to the $k$ largest excursions above past minima of $\mathbf{B}^{\lambda}_{\mu,\eta}$ up to time $T$. Therefore, we can conclude the finite-dimensional convergence of the ordered component sizes in the whole graph if we can show that the large components are explored early by the exploration process. The following lemma formalizes this statement: (5.44) there exists some constant $C_0 > 0$ such that, for any $T > 0$, (5.45) holds.

Proof. Using a similar split-up as in (5.35), together with (5.5) and (5.6), the required bounds hold uniformly over $i \leq T n^{2/3}$.

Proof of Lemma 15. Let $i_T := \inf\{ i \geq T n^{2/3} : S_n(i) = \inf_{j \leq i} S_n(j) \}$; thus, $i_T$ denotes the first time we finish exploring a component after time $T n^{2/3}$. Note that, conditionally on the vertices explored up to time $i_T$, the remaining graph $\bar{G}$ is still a configuration model. Let $\bar{\nu}_n = \sum_{i \in \bar{G}} d_i (d_i - 1) / \sum_{i \in \bar{G}} d_i$ be the criticality parameter of $\bar{G}$. Then, using (5.45), we can conclude (5.48). Take $T > 0$ such that $\lambda - C_0 T < 0$; thus, with high probability, $\bar{\nu}_n < 1$. Denote the component of a randomly chosen vertex of $\bar{G}$ by $\mathcal{C}_{\geq T}(V_n)$, and the $i$-th largest component of $\bar{G}$ by $\mathcal{C}_{\geq T}^{(i)}$. Also, let $\bar{\mathbb{P}}$ denote the probability measure conditioned on $\mathcal{F}_{i_T}$, and let $\bar{\mathbb{E}}$ denote the corresponding expectation. Now, for any $\delta > 0$, we obtain the required estimate, where the second step follows from the Markov inequality and the last step follows by combining Lemma 16 and (5.48). Noting that $\bar{\nu}_n < 1$ with high probability, we get the desired bound for some constant $C > 0$ and large $T > 0$, and the proof follows.
Theorem 18. The convergence in Theorem 1 holds with respect to the product topology.
Proof. The proof follows from Theorem 9 and Lemma 15.

Proof of Theorem 1
The proof of Theorem 1 uses arguments similar to [2, Section 3.3]. However, the proof is a bit tricky since the components are explored in a size-biased manner, with the sizes being the total degrees of the components (not the component sizes, as in [2]). For a sequence of random variables Y = (Y_i)_{i≥1} satisfying Σ_{i≥1} Y_i^2 < ∞ almost surely, define ξ := (ξ_i)_{i≥1} such that ξ_i | Y ∼ Exp(Y_i) and the coordinates of ξ are independent conditionally on Y. For a ≥ 0, let S(a) := Σ_{ξ_i ≤ a} Y_i. Then the size-biased point process is defined to be the random collection of points Ξ := {(S(ξ_i), Y_i)}_{i≥1} (see [2, Section 3.3]). We will use Lemma 8, Lemma 14 and Proposition 15 from [2].
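As an illustration, the size-biased point process Ξ defined above can be simulated directly. The sketch below (plain Python with hypothetical function and variable names, not from the paper) attaches an independent Exp(Y_i) clock to each mass and records the points (S(ξ_i), Y_i) in clock order; note that larger masses tend to ring earlier, which is exactly the size-biasing.

```python
import random

def size_biased_point_process(masses, seed=None):
    """Attach an independent clock xi_i ~ Exp(rate = Y_i) to each mass Y_i
    and return the points (S(xi_i), Y_i) in increasing clock order, where
    S(a) is the total mass of points whose clocks have rung by time a."""
    rng = random.Random(seed)
    clocks = [rng.expovariate(y) for y in masses]          # xi_i ~ Exp(Y_i)
    order = sorted(range(len(masses)), key=clocks.__getitem__)
    points, running = [], 0.0
    for i in order:
        running += masses[i]    # S(xi_i) counts every mass with xi_j <= xi_i
        points.append((running, masses[i]))
    return points
```

The first coordinates are the partial sums of the masses in their (random) size-biased order, so the last point always carries the total mass.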
(5.51) Also define the point processes Ξ′_n and Ξ_∞, where we recall that l(γ) denotes the left endpoint of an excursion γ of B^λ_{μ,η} and |γ| its length (see (2.6)). Note that Ξ′_n is not a size-biased point process. However, applying [2, Lemma 8] and Theorem 9, we get (5.53). To verify the claim, note that (5.5) and Assumption 1 (iii) together imply, for any t > 0, the bound (5.54); thus, (5.53) follows using (5.54). Now, the point process 2Ξ_∞ satisfies all the conditions of [2, Proposition 15], as shown by Aldous. Thus, [2, Lemma 14] gives (5.55). This implies that (n^{−2/3}|C_{(i)}|)_{i≥1} is tight in ℓ^2_↓, by simply observing that |C_i| ≤ Σ_{k∈C_i} d_k + 1. Therefore, the proof of Theorem 1 is complete using Theorem 18.

Proof of Theorem 2
The proof of Theorem 2 is completed in two separate lemmas. In Lemma 19 we first show that the convergence in Theorem 2 holds with respect to the ℓ^2_↓ × N^∞ topology. The tightness of (Z_n)_{n≥1} with respect to the U^0_↓ topology is ensured in Lemma 20.

Lemma 19. Let N^λ_n(k) be the number of surplus edges discovered up to time k, and let N̄^λ_n denote the rescaled process N̄^λ_n(u) := N^λ_n(⌊un^{2/3}⌋). Then N̄^λ_n converges in distribution to N^λ, where N^λ is defined in (2.7).
Proof. Recall the definitions of a, b, A_k, B_k, C_k, S_k from Section 5.1, and recall also the associated identity. From Lemma 8, we can conclude that, uniformly over k ≤ un^{2/3}, (5.57) holds. The counting process N^λ_n has conditional intensity (conditioned on F_{k−1}) given by (5.57). Writing the conditional intensity in (5.57) in terms of S̄_n, we get that the conditional intensity of the rescaled process N̄^λ_n is given by (5.58). Denote by W̄_n(u) := S̄_n(u) − min_{ũ≤u} S̄_n(ũ) the reflected version of S̄_n. By Theorem 1, W̄_n converges in distribution to W^λ, where W^λ is defined in (2.6). Therefore, we can assume that there exists a probability space on which W̄_n → W^λ almost surely. Using [22, Theorem 1, Chapter 5.3] and the continuity of the sample paths of W^λ, we conclude the proof.
Lemma 20. The vector (Z_n)_{n≥1} is tight with respect to the U^0_↓ topology.

The proof of Lemma 20 makes use of the following crucial estimate of the probability that a component of small size has a very large number of surplus edges:

Lemma 21. Assume that λ < 0. Let V_n denote a vertex chosen uniformly at random, independently of the graph CM_n(d), and let C(V_n) denote the component containing V_n. Let δ_k = δk^{−0.12}. Then, for small δ > 0, the stated bound holds, where C is a fixed constant independent of n, δ and K.
For T > 0 (large), define K_n as in the display; then, by applying the Cauchy-Schwarz inequality over i ∈ K_n, the required bound follows. For the case λ > 0, we can use ideas similar to the proof of Lemma 15: we run the exploration process until time Tn^{2/3}, and by (5.45) the unexplored graph becomes a configuration model with negative criticality parameter for large T > 0. Thus, the proof can be completed using (5.64), the ℓ^2_↓ convergence of the component sizes given by Theorem 1 and Lemma 19, and the proof for the case λ < 0.
Proof of Lemma 21. To complete the proof of Lemma 21, we use martingale techniques coupled with Lemma 16. Fix a small δ > 0. First we describe another way of exploring C(V_n), which turns out to be more convenient to work with.
Algorithm 2 (Exploring C(V_n)). Consider the following exploration of C(V_n):
(S0) Initialize all half-edges to be alive. Choose a vertex from [n] uniformly at random and declare all its half-edges active.
(S1) In the next step, take any active half-edge and pair it uniformly with another alive half-edge. Kill both paired half-edges. Declare all the half-edges corresponding to the newly discovered vertex (if any) active. Keep repeating (S1) until the set of active half-edges is empty.
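The two steps above admit a direct simulation. The sketch below (an illustrative helper with hypothetical names, not the paper's code) pairs an active half-edge with a uniformly chosen alive half-edge until the active set empties, returning the discovered component and the number of explored edges.

```python
import random

def explore_component(degrees, seed=None):
    """Explore the component of a uniformly chosen vertex: repeatedly pair
    an active half-edge with a uniform alive half-edge (step (S1) above).
    Returns the set of discovered vertices and the number of explored edges."""
    rng = random.Random(seed)
    alive = {(v, j) for v, d in enumerate(degrees) for j in range(d)}
    v0 = rng.randrange(len(degrees))
    discovered, active, edges = {v0}, [(v0, j) for j in range(degrees[v0])], 0
    while active:
        a = active.pop()
        if a not in alive:              # a was already used as a partner
            continue
        alive.remove(a)
        b = rng.choice(sorted(alive))   # uniform alive partner
        alive.remove(b)
        edges += 1
        w = b[0]
        if w not in discovered:         # new vertex: activate its other half-edges
            discovered.add(w)
            active.extend((w, j) for j in range(degrees[w]) if (w, j) != b)
    return discovered, edges
```

As in the proof, exactly one edge is created per pairing, so the number of iterations that perform a pairing equals the number of edges in C(V_n).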
Unlike Algorithm 1, we need not see a new vertex at each stage, and we explore only two half-edges at each stage. In this proof, F_l denotes the sigma-field containing the information revealed up to stage l by Algorithm 2, and V_l denotes the vertex set discovered up to time l. Recall that D_n denotes the degree of V_n. Define the exploration process s′_n as in the display; s′_n hits zero when the exploration of C(V_n) terminates, and the hitting time of zero gives the number of edges in C(V_n), since exactly one edge is explored at each time step. We will use a generic constant C to denote a positive constant that may differ from one equation to the next. For H > 0, define the stopping time in (5.66). Note that, uniformly over l ≤ 2δn^{2/3}, for all small δ > 0 and large n, the drift estimate holds, where the last step follows from the fact that λ < 0. Therefore, {s′_n(l)}_{l=1}^{2δn^{2/3}} is a super-martingale. The optional stopping theorem now implies (5.69). To simplify the notation, we write (5.70); here we have used the fact that at least one surplus edge must appear in the relevant time interval. Let SPB(l_1, . . ., l_K) denote the event that surplus edges appear at times l_1, . . ., l_K, that s′_n stays below H on [0, 2δ_K n^{2/3}], and that s′_n stays positive on [0, δ_K n^{2/3}]. Now, using (5.72) and induction, we obtain (5.73), where we have used Stirling's approximation for (K − 1)! in the last step. Since λ < 0, we can use Lemma 16 to conclude that, for all sufficiently large n and some constant C > 0, the required bound holds, which gives the desired bound for (5.70). The proof of Lemma 21 is now complete by applying (5.69) and (5.73) in (5.70).

Vertices of degree k
In this section, we compute the number of vertices of degree k in each connected component at criticality. This will be useful in Sections 7 and 8. Such an estimate was proved in [18, Theorem 2.4] for supercritical graphs under stronger moment assumptions.
Lemma 22. Denote by N_k(t) the number of vertices of degree k discovered up to time t. Then, for any t > 0, the stated estimate holds uniformly over k.

Proof. By setting w_i = 1_{{d_i = k}} in Lemma 6, we can directly conclude the non-uniform version of the estimate. However, one can repeat the arguments leading to the proof of Lemma 6 and obtain the corresponding bound. Now, we can use the finite third-moment assumption to conclude that the numerator on the right-hand side can be taken to be uniform over k. Thus, the proof follows.
Define v_k(G) := the number of vertices of degree k in the connected graph G. As a corollary to Lemma 22 and (5.43), we can deduce (6.4). Moreover, the following also holds. Let ord(x) denote the vector with the elements of x ordered in a non-increasing manner.
uniformly over k. The proof now follows from (6.4) and the ℓ^2_↓-tightness of the component sizes given in Theorem 1.

Percolation on Configuration Model
Let p = p_n ∈ (0, 1) be the percolation parameter. Recall the notation CM_n(d, p) for the random graph obtained after deleting the edges of CM_n(d) independently with probability 1 − p. Suppose d′ is the (random) degree sequence obtained after percolation. Fountoulakis [15] showed that, given d′, the law of CM_n(d, p) is the same as the law of CM_n(d′). We will use the following construction of CM_n(d, p), due to Janson [16]:

Algorithm 3. (S1) For each half-edge e, let v_e be the vertex to which e is attached. With probability 1 − √p, independently for every half-edge, detach e from v_e and associate e to a new vertex v′; color the new vertex red. Let n_+ be the number of red vertices created, let ñ = n + n_+, and let d̃ be the new degree sequence obtained by this procedure; thus d̃_i ∼ Bin(d_i, √p) for i ∈ [n], and each red vertex has degree one.
(S2) Construct CM_ñ(d̃), independently of (S1).
(S3) Delete all the red vertices.
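Stage (S1) of the construction above is easy to simulate. The sketch below (illustrative code with a hypothetical function name) keeps each half-edge at its original vertex independently with probability √p and moves every detached half-edge to a fresh red vertex of degree one, returning the exploded degree sequence d̃ and the number of red vertices.

```python
import random

def explode(degrees, p, seed=None):
    """Stage (S1): keep each half-edge at its vertex with probability sqrt(p);
    every detached half-edge is moved to a fresh red vertex of degree one.
    Returns the exploded degree sequence and the number of red vertices."""
    rng = random.Random(seed)
    keep = p ** 0.5
    new_degrees, n_red = [], 0
    for d in degrees:
        kept = sum(rng.random() < keep for _ in range(d))  # ~ Bin(d_i, sqrt(p))
        new_degrees.append(kept)
        n_red += d - kept
    new_degrees.extend([1] * n_red)   # red vertices all have degree one
    return new_degrees, n_red
```

Since detached half-edges are moved rather than deleted, the total number of half-edges is conserved, which is what makes CM_ñ(d̃) well defined.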
Remark 8. It was argued in [16] that the obtained multigraph also has the same distribution as CM_n(d, p) if we replace (S3) by
(S3′) Instead of deleting the red vertices, choose any n_+ degree-one vertices uniformly at random, independently of (S1) and (S2), and delete them.
Remark 9. The construction of CM_ñ(d̃) in Algorithm 3 consists of two stages of randomization: the first is described by (S1), and the second by (S2). We will consider the following probability space to describe the randomization arising from Algorithm 3 (S1): suppose we have a sequence of degree sequences (d)_{n≥1}; let P^n_p denote the probability measure induced on N^∞ by Algorithm 3 (S1), and denote the product measure of (P^n_p)_{n≥1} by P_p. Thus (S1) is performed independently on d = d(n) as n varies. All the almost sure statements in this section are with respect to the probability measure P_p.
Remark 10. The idea of the proof of Theorem 3 is as follows. We show that d̃, under Assumption 2, satisfies Assumption 1 P_p-almost surely, and then estimate the number of vertices to be deleted from each component using Lemma 22. Since deleting a degree-one vertex does not break up any component, we can simply subtract this number from the component sizes of CM_ñ(d̃) to obtain the component sizes of CM_n(d, p_n(λ)). Moreover, since degree-one vertices are not involved in surplus edges, deleting them does not change the number of surplus edges.

Proof of Theorem 3
We now consider the critical window corresponding to percolation; the goal is to prove Theorem 3. Let n_j and ñ_j be the number of vertices of degree j before and after performing Algorithm 3 (S1), respectively. For convenience, write r_j = P(D = j). Denote by ñ_{jl} the number of vertices that had degree l before, and degree j after, performing Algorithm 3 (S1). Using the strong law of large numbers for triangular arrays, note that, P_p-almost surely, ñ_{jl} concentrates around n_l b_{lj}, where b_{lj} is the corresponding binomial probability. Therefore, using arguments similar to (7.2) again, P_p-almost surely:
(1) under Assumption 2 (i), for r = 1, 2, 3, the empirical r-th moments of d̃ converge;
(2) under Assumption 2, ν̃_n = 1 + λn^{−1/3} + o(n^{−1/3}). (7.7)

Proof. We will make use of [19, Corollary 2.27]. Suppose Z_1, Z_2, . . ., Z_N are independent random variables with Z_i taking values in Λ_i, and suppose f : ∏_{i=1}^N Λ_i → R satisfies the following: if two vectors z, z′ ∈ ∏_{i=1}^N Λ_i differ only in the i-th coordinate, then |f(z) − f(z′)| ≤ c_i for some constant c_i. Then, for any t > 0,
P(|f(Z_1, . . ., Z_N) − E[f(Z_1, . . ., Z_N)]| ≥ t) ≤ 2 exp(−2t² / Σ_{i=1}^N c_i²). (7.8)
Now let I_{ij} denote the indicator that the j-th half-edge of vertex i is kept after Algorithm 3 (S1), and set f_1(I) := Σ_{i∈[n]} d̃_i(d̃_i − 1). (7.9) Note that f_1(I) = Σ_{i∈[ñ]} d̃_i(d̃_i − 1), since the degree-one (red) vertices do not contribute to the sum. One can check that, by changing the status of one half-edge of vertex k, f_1(·) changes by at most 2(d_k + 1). Therefore, (7.8) yields (7.10). Setting t = n^{1/2+ε} for some suitably small ε > 0, and using the finite third moment condition together with the Borel-Cantelli lemma, we conclude that, P_p-almost surely, (7.11) holds, and in particular so does (7.12). Similarly, take f_2(I) = Σ_{i∈[n]} d̃_i(d̃_i − 1)(d̃_i − 2), and note that changing the status of one half-edge changes f_2(·) by at most [2(d_k + 1)]². Thus, (7.8) gives (7.13), which implies (7.14) P_p-almost surely. Now, to prove Lemma 24 (1), note that the case r = 1 follows by simply observing that Σ_{i∈[ñ]} d̃_i = Σ_{i∈[n]} d_i. The cases r = 2, 3 follow from (7.12) and (7.14). Finally, Lemma 24 (2) follows from (7.12), and this completes the proof of Lemma 24.
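The bounded-difference bound (7.8) and the per-half-edge constants used for f_1 can be packaged as a small illustrative computation (hypothetical function names; this only evaluates the bound, it does not reproduce the paper's estimates).

```python
import math

def mcdiarmid_bound(c, t):
    """Bounded-difference inequality: if changing coordinate i moves f by at
    most c_i, then P(|f(Z) - E f(Z)| >= t) <= 2 exp(-2 t^2 / sum_i c_i^2)."""
    return 2.0 * math.exp(-2.0 * t * t / sum(ci * ci for ci in c))

def half_edge_constants(degrees):
    """For f1(I) = sum_i d~_i (d~_i - 1): flipping the status of one half-edge
    of vertex k changes f1 by at most 2(d_k + 1); one constant per half-edge."""
    return [2.0 * (d + 1) for d in degrees for _ in range(d)]
```

With t = n^{1/2+ε} and Σ_i c_i² of order Σ_k d_k(d_k + 1)², the exponent grows polynomially in n, which is what feeds the Borel-Cantelli argument above.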
We will denote by C̃_{(j)} the j-th largest component of CM_ñ(d̃). To conclude Theorem 3, we also need to estimate the number of deleted vertices in each component. Recall from Remark 8 that CM_n(d, p_n(λ)) can be obtained from CM_ñ(d̃) by deleting the relevant number of degree-one vertices uniformly at random. Let v^d_1(C̃_{(j)}) be the number of degree-one vertices of C̃_{(j)} that are deleted while creating CM_n(d, p_n(λ)) from CM_ñ(d̃). Since the vertices to be deleted are chosen uniformly from all degree-one vertices, the number of vertices deleted from C̃_{(j)} is asymptotically the total number of degree-one vertices in C̃_{(j)} times the proportion of degree-one vertices to be deleted. Therefore, (7.16) holds, where the third equality follows from (6.4). The proof of Theorem 3 is now complete by using the ℓ^2_↓ convergence in Lemma 23, (7.16) and Remark 10.
Proof of Lemma 25. Assume that k = 2 for the sake of simplicity. Observe that the total number of perfect matchings of 2k objects is (2k)!/(2^k k!) = (2k − 1)!!. Let E_1 denote the event that a uniform perfect matching of all the half-edges also contains perfect matchings of the half-edges in H_1 and H_2. Then the two displayed identities follow, and the proof is complete.
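The counting identity for perfect matchings used above can be checked numerically; the helper below (an illustrative sketch) evaluates (2k)!/(2^k k!), the double factorial (2k − 1)!!.

```python
from math import factorial

def perfect_matchings(two_k):
    """Number of perfect matchings of 2k labelled objects:
    (2k)! / (2^k * k!) = (2k - 1)!!"""
    assert two_k % 2 == 0, "need an even number of objects"
    k = two_k // 2
    return factorial(two_k) // (2 ** k * factorial(k))
```

For example, 4 half-edges admit 3 perfect matchings and 6 half-edges admit 15, matching the double-factorial sequence 1, 3, 15, 105, . . .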

The dynamic construction
Let us now describe a dynamic construction of CM_n(d) that turns out to be easier to work with. This dynamic construction was introduced in [5] to study the metric-space limits of the large components of the percolated configuration model. The graph process given by Algorithm 5 can also be constructed as follows:

Algorithm 6. Let Ξ_n be an inhomogeneous Poisson process with rate s_1(t) at time t, and let e_1 < e_2 < . . . be the event times of Ξ_n.
(S1) At each event time, choose two unpaired half-edges uniformly at random and pair them.
The graph G_n(t) is obtained by adding this edge to G_n(t−).
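The pairing step of Algorithm 6 can be mimicked as follows; this is an illustrative sketch (hypothetical function name) that takes the number of Poisson events as given rather than simulating the inhomogeneous rate.

```python
import random

def pair_half_edges(degrees, n_events, seed=None):
    """Step (S1) of the dynamic construction: at each event time, pick two
    unpaired half-edges uniformly at random and pair them into an edge.
    Returns the list of created edges as (vertex, vertex) pairs."""
    rng = random.Random(seed)
    unpaired = [v for v, d in enumerate(degrees) for _ in range(d)]
    edges = []
    for _ in range(min(n_events, len(unpaired) // 2)):
        a = unpaired.pop(rng.randrange(len(unpaired)))
        b = unpaired.pop(rng.randrange(len(unpaired)))
        edges.append((a, b))
    return edges
```

Run for enough event times, this exhausts the half-edges and produces a uniform perfect matching, i.e. the configuration model itself; stopped earlier, it yields the partially built graph G_n(t).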
Notice the similarity between Algorithm 4 (S1) and Algorithm 6 (S1). The idea now is to compare the number of half-edges that have been paired by Algorithms 4 and 6. For this, we need the following lemma, which describes the evolution of the total number of open half-edges in Algorithm 6:

Lemma 27 ([5, Lemma 8.2]). Let s_1(t) denote the total number of open half-edges at time t, and suppose that Assumption 2 holds. Then the stated estimate holds for any T > 0 and a suitable constant.

The proof of [5, Lemma 8.2] is stated only under somewhat more stringent assumptions; however, the identical argument can be carried out under Assumption 2. The next proposition ensures that the graphs generated by percolation in Algorithm 4 and by the dynamic construction in Algorithm 5 are uniformly close in the critical window.

Proposition 28. There exists a coupling such that, with high probability, the two graphs agree up to an error ε_n = cn^{−γ_0}, for some 1/3 < γ_0 < 1/2, where the constant c does not depend on λ.
Proof. Let #E(G) denote the number of edges in a graph G. Suppose that we can show (8.12) as n → ∞. Then the choice of the uniform pair of half-edges at the k-th pairing in Algorithm 4 (S1) can be taken to be exactly the same as the k-th pairing in Algorithm 6 (S1), and under this coupling the two graphs agree up to the stated error. Thus, it remains to show (8.12). An application of Lemma 27, along with (8.10), yields the required bound with high probability for some 1/3 < γ_0 < γ < 1/2. Notice that the total number of half-edges in CM_n(d, p_n(λ)) follows a binomial distribution with parameters ℓ_n/2 and p_n(λ); thus the corresponding estimate holds with high probability. The fact that the error can be chosen uniformly over λ ∈ [λ_⋆, λ^⋆] follows from the DKW inequality [23]. Thus, (8.13) and (8.14) together give one direction with high probability; the other direction follows similarly, and the proof is complete.
Remark 11. Notice that the proof of Proposition 28 can be directly modified to show that there exists a coupling under which, with high probability, the corresponding bound holds with ε_n = cn^{−γ_0} for some 1/3 < γ_0 < 1/2, where the constant c does not depend on λ. Therefore, the scaling limits of functionals such as the rescaled component sizes and the surplus edges are the same for G_n(t_n(λ)) and CM_n(d, p_n(λ)).

The modified process
From here onward, we often append λ to previously defined notation to emphasize the dependence on λ. We write C_{(i)}(λ) for the i-th largest component of G_n(t_n(λ)) and define ℓ^o_n(λ) accordingly. By Lemma 27 and (8.10), ℓ^o_n(λ) ≈ nμ(ν − 1)/ν. Now, observe that during the evolution of the graph process generated by Algorithm 5, in the time interval [t_n(λ), t_n(λ + dλ)], the i-th and j-th (i > j) largest components merge at a rate determined by their numbers of open half-edges, by creating an edge between the corresponding vertices. However, the selected half-edges are kept alive, so that they can be selected again.
Remark 12. The only difference between Algorithm 6 and Algorithm 7 is that the paired half-edges are not discarded, so that more edges are created by Algorithm 7. Thus, there is a natural coupling between the graphs generated by Algorithms 6 and 7 such that G_n(t_n(λ)) ⊂ Ḡ_n(t_n(λ)) for all λ ∈ [λ_⋆, λ^⋆], with probability one. In the remainder of this section we always work under this coupling. The extra edges created by Algorithm 7 will be called bad edges.

Multiplicative coalescent with mass and weight
The Feller property of the multiplicative coalescent [2, Proposition 5] ensures the joint convergence of the number of open half-edges in each component of Ḡ_n(t_n(λ)) at multiple values of λ, as we shall see below. To deduce the scaling limits involving the component sizes, we consider a dynamic process that is further augmented by a certain weight. Initially, the system consists of particles (possibly infinitely many), where particle i has mass x_i and weight z_i. Let (X_i(t), Z_i(t))_{i≥1} denote the vector of masses and weights at time t. The dynamics of the system are as follows: at time t, particles i and j coalesce at rate X_i(t)X_j(t) and create a particle with mass X_i(t) + X_j(t) and weight Z_i(t) + Z_j(t).
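The mass-and-weight dynamics just described can be simulated for a finite system with a standard Gillespie scheme. The sketch below (illustrative code with hypothetical names, not the paper's construction) merges pairs at rate X_i X_j and tracks both masses and weights.

```python
import random

def augmented_coalescent(masses, weights, t_max, seed=None):
    """Gillespie simulation of the augmented multiplicative coalescent:
    blocks i and j merge at rate x_i * x_j, adding masses and weights.
    Returns the surviving (mass, weight) pairs ordered by decreasing mass."""
    rng = random.Random(seed)
    x, z, t = list(masses), list(weights), 0.0
    while len(x) > 1:
        total_rate = (sum(x) ** 2 - sum(xi * xi for xi in x)) / 2.0  # sum_{i<j} x_i x_j
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # pick an unordered pair {i, j} with probability proportional to x_i x_j
        while True:
            i = rng.choices(range(len(x)), weights=x)[0]
            j = rng.choices(range(len(x)), weights=x)[0]
            if i != j:
                break
        i, j = min(i, j), max(i, j)
        x[i] += x.pop(j)
        z[i] += z.pop(j)
    return sorted(zip(x, z), key=lambda mz: -mz[0])
```

Both total mass and total weight are conserved at every merge, mirroring the additivity of component sizes and surplus-related quantities under coalescence.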
Denote by MC_2(x, z, t) the vector (X_i(t), Z_i(t))_{i≥1} with initial masses x and weights z. We shall need the following theorem:

Theorem 29. Suppose that (x^n, z^n) → (x, x) in ℓ^2_↓ × ℓ^2_↓. Then, for each t, MC_2(x^n, z^n, t) converges in distribution to MC_2(x, x, t).

Asymptotics for the open half-edges
In this section, we show that the numbers of open half-edges in the components of G_n(t_n(λ)) are approximately proportional to the component sizes. This will enable us to apply Theorem 29 to deduce the scaling limits of the required quantities for the graph Ḡ_n(t_n(λ)).
then take the smallest half-edge a from A_k.
(S2) Take the half-edge b from S_k that is paired to a. Suppose b is attached to a vertex w (which is necessarily not yet discovered). Declare w to be discovered, let r = d_w − 1, and let b_{w1}, b_{w2}, . . ., b_{wr} be the half-edges of w other than b. Declare b_{w1}, b_{w2}, . . ., b_{wr}, b to be smaller than all other half-edges in A_k, and order the half-edges of w among themselves as b_{w1} > b_{w2} > · · · > b_{wr} > b. Now identify B_k ⊂ A_k ∪ {b_{w1}, b_{w2}, . . ., b_{wr}} as the collection of all half-edges in A_k paired to one of the b_{wi}'s, together with the corresponding b_{wi}'s. Similarly, identify C_k ⊂ {b_{w1}, b_{w2}, . . ., b_{wr}} as the collection of self-loops incident to w. Finally, declare

Lemma 15. Let C^{≥T}_max denote the largest among the components whose exploration starts after time Tn^{2/3} in Algorithm 1. Then, for any δ > 0,
lim_{T→∞} limsup_{n→∞} P(|C^{≥T}_max| > δn^{2/3}) = 0. (5.43)
Let us first state the two main ingredients needed to complete the proof of Lemma 15:
Lemma 16 ([17, Lemma 5.2]). Consider CM_n(d) with ν_n < 1, and let C(V_n) denote the component containing V_n, where V_n is a vertex chosen uniformly at random, independently of the graph CM_n(d).
Lemma 17 ([2]). Let C := {C : C is a component of CM_n(d)}. Consider the collection ξ := (ξ(C))_{C∈C} such that, conditionally on (Σ_{k∈C} d_k, |C|)_{C∈C}, ξ(C) has an exponential distribution with rate n^{−2/3} Σ_{k∈C} d_k, independently over C. Then the order in which Algorithm 1 explores the components can be obtained by ordering the components according to their ξ-values. Recall that C_i denotes the i-th component explored by Algorithm 1, and let D_i := Σ_{k∈C_i} d_k. Define the size-biased point process as in (5.51).

Denote r̃_l = P(D̃ = l) = lim_{n→∞} ñ_l/ñ. Let D̃_n denote the degree of a vertex chosen uniformly at random from [ñ], independently of the graph CM_ñ(d̃). Thus, (7.2) and (7.5) imply that D̃_n converges in distribution to D̃. The following lemma verifies the remaining conditions of Assumption 1 for d̃:
Lemma 24. The statements below hold P_p-almost surely:

Also, for percolation on any (random) graph, conditionally on the set of edges of the graph and on the event that k edges have been retained by percolation, the choice of the retained edges is uniformly distributed among all subsets of size k of the edge set. Let E_2 denote the event that |H(CM_n(d, p_1))| = 2k_1 and |H(CM_n(d, p_2))| = 2k_2. The desired identity follows.