Clustering and percolation on superpositions of Bernoulli random graphs

Let $W=\{w_1,\dots, w_m\}$ be a finite set. Given integers $x_1,\dots,x_n\in[0, m]$ and numbers $q_1,...,q_n \in [0,1]$, let $D_1,\dots,D_n$ be independent uniformly distributed random subsets of $W$ of sizes $|D_i|=x_i$, $1\le i\le n$. Let $G_n$ be the union of independent Bernoulli random graphs $G(x_i, q_i),$ $1\le i\le n$, with vertex sets $D_1,\dots, D_n$. For $m,n\to+\infty$ such that $m=\Theta(n)$ we show that $G_n$ admits a tunable (asymptotic) power law degree distribution and non-vanishing global clustering coefficient. Moreover, the clustering spectrum admits a tunable scaling $k^{-\delta}$, $0\le \delta\le 2$, where $k$ is the vertex degree. Furthermore, we show a phase transition in the size of the largest connected component and examine the relation between the clustering spectrum and bond percolation threshold.

Despite remarkable methodological advances, most sparse network models remain somewhat rigid when it comes to modeling finer clustering properties, such as the clustering spectrum (degree-dependent local clustering coefficient) [3,43,47], which may significantly impact the percolation properties of the network [4,19]. A decreasing clustering spectrum reflects the fact that high-degree nodes tend to have sparser local neighborhoods than low-degree nodes. Motivated by the analysis of this phenomenon in a tractable quantitative framework, this article discusses a statistical network model generated as an overlay of mutually independent Bernoulli random graphs $G_1, \dots, G_m$, which can be interpreted as layers or communities. The layers have variable size (number of nodes) and strength (link probability), and they may overlap. A key feature of the model is that the layer sizes and layer strengths are assumed to be correlated, which makes it possible to model and analyze a rich class of networks with a tunable frequency of strong small communities and weak large communities.

Main contributions
This article presents a rigorous mathematical analysis of clustering and percolation of the overlay graph model in the natural sparse limiting regime where the number of nodes $n$ tends to infinity, the number of layers $m$ is linear in the number of nodes, and the joint distribution $P_n$ of layer sizes and layer strengths converges to a limiting distribution $P$. We derive exact formulas for the limiting degree distribution, clustering coefficient, clustering spectrum, and the largest component size in terms of cross-factorial moments and functional transforms of $P$. We also investigate the model under bond and site percolation, and characterize critical parameter values of the associated phase transitions. The descriptive power of the model is illustrated by a detailed investigation of an instance where the layer size follows a power law, and the layer strength is a deterministic function of the layer size following another power law. This setting leads to a power-law degree distribution and a power-law clustering spectrum with tunable exponents in the ranges $(1, \infty)$ and $[0,2]$, respectively. A special case in which layer strengths are inversely proportional to their sizes corresponds to layers of bounded average degree. In this natural parameter regime we discover a remarkable double phase transition phenomenon with two critical values: the first characterizing the emergence of a giant component in the overlay graph, and the second characterizing the emergence of gigantic components in layers covering a typical node.
Finally, we highlight that the modelling framework in this article covers both deterministic and random layer types. Our approach of characterizing the regularity of layer types using averaged empirical distributions allows both cases to be treated in a uniform manner.

Related work
The overlay network model discussed in this article is naturally motivated and implicitly described by classical works in social networks [17,22]. The explanatory power and wide applicability of the model in the context of social, collaboration, and information networks have been demonstrated in [48,49] by experimental studies of a community-affiliation graph, which represents an instance of the present model where the node sets of the layers are nonrandom or otherwise known to the observer. The superposition of Bernoulli random graphs considered here serves as a null model for sparse community-affiliation graphs.
The mathematical analysis in this article builds on earlier works on component evolution and clustering in inhomogeneous random graphs [14] and random intersection graphs [8,9]. The special model instance with unit layer strengths reduces to the so-called passive random intersection graph [26], and as a byproduct, the present article also provides the first rigorous analysis of giant components in general passive random intersection graphs, extending [16,37]. When layer strengths are constant but not necessarily one, clustering properties and subgraph densities of the model have been analyzed in [31,32,40], and the recovery of the layers in [21]. Another related work [45] (also part of [44]) on percolation in overlapping community networks assumes that layers are sampled from an arbitrary distribution on the space of finite connected graphs, and the layers are assigned to nodes via a bipartite configuration model. The restriction to connected layers and the use of a configuration model makes the model in [45] and its analysis fundamentally different from the present one, and limits its applicability by ruling out networks composed of weak communities.
Clustering spectra with power-law exponent 1 have been shown for random intersection graph models [7,9] and spatial preferential attachment models [27,36], and with a tunable power-law exponent in [0, 1] for random intersection graphs [10,12] and recently also for a hyperbolic random geometric graph model [24]. Furthermore, [43] discusses an inhomogeneous Bernoulli graph model where the clustering spectrum vanishes, but its normalized version displays evidence of power-law behavior with exponent in the range (0,2).
To the best of our knowledge, the present work is the first of its kind in which degree-dependent clustering with a tunable power-law exponent in the extended range [0,2] is rigorously analyzed in terms of a simple statistical network model. The model admits a clear explanation of the values of the power-law exponents, and introduces a new analytical framework for studying ordinary and double phase transitions in bond and site percolation on sparse networks of overlapping communities of variable size and strength.

Outline
In the rest of the article, Section 2 presents the model details and notation, and Section 3 the main results. Section 4 illustrates the main results in a power-law setting and confirms the existence of a double phase transition. The remaining Sections 5-10 are devoted to proofs, with technical details postponed to Appendix 11.

Multilayer network
A multilayer network model with $n$ nodes and $m$ layers is defined by a list $((G_1, X_1, Q_1), \dots, (G_m, X_m, Q_m))$ of mutually independent random variables with values in $\mathcal{G}_n \times \{0, \dots, n\} \times [0,1]$, where $\mathcal{G}_n$ is the set of undirected graphs with node set contained in $\{1, \dots, n\}$. We assume that, conditionally on $(X_k, Q_k)$, the probability distribution of the vertex set $V(G_k)$ of $G_k$ is uniform on the subsets of $\{1, \dots, n\}$ of size $X_k$, and that, conditionally on $(V(G_k), X_k, Q_k)$, each node pair of $V(G_k)$ is linked with probability $Q_k$, independently of other node pairs. Thus, $G_k$ is a Bernoulli random graph on the node set $V(G_k)$, with edge set denoted $E(G_k)$. The variables $X_k$, $Q_k$, and $(X_k, Q_k)$ are called the size, strength, and type of layer $k$, respectively. Aggregation of layers produces an overlay random graph $G$ defined by
\[
G \;=\; \Big(\{1,\dots,n\},\; \bigcup_{k=1}^{m} E(G_k)\Big). \tag{1}
\]
This setting includes as special cases: (i) models with deterministic layer types, and (ii) models where the layer types are independent and identically distributed random variables.
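To make the construction concrete, the following minimal Python sketch samples one realization of the overlay graph from given layer types; the function names and the toy layer types in the example are ours, not part of the model specification.

```python
import random
import itertools

def sample_overlay_graph(n, layer_types, rng=random):
    """Sample one overlay graph from layer types (x_k, q_k).

    Each layer receives a uniformly random vertex set of size x_k from
    {0, ..., n-1}, and links each of its node pairs independently with
    probability q_k.  The overlay graph is the union of the layer edge sets.
    """
    edges = set()
    for x, q in layer_types:
        vertices = rng.sample(range(n), x)            # uniform random x-set D_k
        for u, v in itertools.combinations(vertices, 2):
            if rng.random() < q:                      # Bernoulli(q) link
                edges.add((min(u, v), max(u, v)))
    return edges

# Illustrative use only: m = n layers, each of size 3 and strength 1/2.
n = 1000
G_edges = sample_overlay_graph(n, [(3, 0.5)] * n)
print(len(G_edges), "edges")
```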

Large networks
A large network is analyzed by considering a sequence of network models $((G_{n,1}, X_{n,1}, Q_{n,1}), \dots, (G_{n,m}, X_{n,m}, Q_{n,m}))$ indexed by the number of nodes $n = 1, 2, \dots$, so that the number of layers $m = m_n$ tends to infinity as $n \to \infty$. The sequence of corresponding overlay random graphs is denoted $(G^{(n)})$. We shall focus on a sparse parameter regime where there exists a probability measure $P$ on (the Borel $\sigma$-algebra of) $\{0, 1, \dots\} \times [0, 1]$ which approximates, in a sufficiently strong sense, the averaged layer type distribution
\[
P_n(B) \;=\; \frac{1}{m} \sum_{k=1}^{m} P\big((X_{n,k}, Q_{n,k}) \in B\big). \tag{2}
\]
In this fundamental regime, the network features are described by limiting formulas with rich expressive power captured by cross moments and tail characteristics of $P$.

Notations
We denote $\mathbb{Z}_+ = \{0, 1, \dots\}$, $(a)_+ = \max\{0, a\}$, and $(x)_s = x(x-1)\cdots(x-s+1)$. The indicator function of a condition $A$ is denoted by $I(A)$ or $I_A$, whichever is more convenient. Sets of size $x$ are called $x$-sets. Unordered pairs and triples are abbreviated as $ij = \{i,j\}$ and $ijk = \{i,j,k\}$. We write $\sum'_{ij}$ and $\sum'_{ijk}$ to indicate sums over ordered pairs and ordered triples with distinct elements. We write $a_n \ll b_n$ and $a_n = o(b_n)$ when $a_n/b_n \to 0$, $a_n \lesssim b_n$ and $a_n = O(b_n)$ when $\limsup |a_n/b_n| < \infty$, and $a_n \sim b_n$ when $a_n/b_n \to 1$. For a sequence of bivariate random variables $(\xi_n, \zeta_n)$, we write $\xi_n = o_P(\zeta_n)$ whenever $\lim_{n\to\infty} P(|\xi_n| < \varepsilon |\zeta_n|) = 1$ for every $\varepsilon > 0$, and $\xi_n = O_P(\zeta_n)$ if for every $\varepsilon > 0$ there exists a constant $c_\varepsilon > 0$ such that $\lim_{n\to\infty} P(|\xi_n| < c_\varepsilon |\zeta_n|) > 1 - \varepsilon$. The notation $\xi_n = o_P(\zeta_n)$, $\xi_n = O_P(\zeta_n)$ extends to the case where the sequence $\zeta_n$ is deterministic (nonrandom).
A graph is a pair $G = (V, E)$ where $E$ is a set of unordered pairs of elements of $V$. The degree and the component of node $i$ in graph $G$ are denoted by $\deg_G(i)$ and $C_G(i)$, respectively. The transitive closure of a graph $G$ is defined as the graph $\bar G$ with $V(\bar G) = V(G)$ and $E(\bar G) = \{ij : i \in C_G(j),\ j \in V(G)\}$, consisting of the unordered node pairs connected by a path in $G$.
The probability distribution of a random variable $X$ is denoted by $\mathcal{L}(X)$. For probability measures, $d_{\mathrm{tv}}(f, g)$ denotes the total variation distance, $f * g$ the convolution, and $f_n \xrightarrow{w} f$ refers to weak convergence. Convergence in probability is denoted $\xrightarrow{P}$. On countable spaces, the same letter is used for both a probability measure $f(A)$ and its density $f(t)$ with respect to the counting measure. The Dirac measure at $x$ is denoted by $\delta_x$. The densities of the binomial distribution $\mathrm{Bin}(x, q)$, with $x \in \mathbb{Z}_+$ and $q \in [0, 1]$, and of the Poisson distribution $\mathrm{Poi}(\lambda)$ with $\lambda \ge 0$, are denoted by
\[
\mathrm{Bin}(x, q)(t) = \binom{x}{t} q^t (1-q)^{x-t}, \qquad \mathrm{Poi}(\lambda)(t) = e^{-\lambda} \frac{\lambda^t}{t!},
\]
with the convention that the densities are zero for $t$ outside $\{0, \dots, x\}$ and $\mathbb{Z}_+$, respectively. The Bernoulli distribution is denoted $\mathrm{Ber}(q)(t) = \mathrm{Bin}(1, q)(t)$. We also denote by $\mathrm{Bin}^+(x, q)(t)$ the degree distribution of any particular node in the transitive closure $\bar H_{x+1,q}$ of a Bernoulli random graph $H_{x+1,q}$ on the node set $\{1, \dots, x+1\}$, where each node pair is linked with probability $q$, independently of other node pairs. Alternatively, $\mathrm{Bin}^+(x, q)(t)$ equals the probability that the connected component of any particular node in $H_{x+1,q}$ has size $t+1$. Both distributions have the same support $\{0, \dots, x\}$, and $\mathrm{Bin}(x, q) \le_{\mathrm{st}} \mathrm{Bin}^+(x, q)$ in the strong stochastic order. No simple closed form expression is known for $\mathrm{Bin}^+(x, q)(t)$, but its values can be efficiently computed with the help of Gontcharoff polynomials [2,5]. The compound Poisson distribution with rate parameter $\lambda$ and increment distribution $g$ is denoted $\mathrm{CPoi}(\lambda, g)$; recall that this is the law of a random variable $\sum_{k=1}^{\Lambda} X_k$ where $\Lambda, X_1, X_2, \dots$ are mutually independent and such that $\mathcal{L}(\Lambda) = \mathrm{Poi}(\lambda)$ and $\mathcal{L}(X_k) = g$.
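While the article points to Gontcharoff polynomials for exact evaluation of $\mathrm{Bin}^+(x, q)$, a simple Monte Carlo check is easy to sketch in Python (function names are ours); it simulates the Bernoulli graph on $x+1$ nodes and records the component size of a fixed node minus one.

```python
import random
from collections import Counter

def sample_bin_plus(x, q, rng=random):
    """One sample from Bin^+(x, q): component size minus one of node 0
    in a Bernoulli random graph on x + 1 nodes with link probability q."""
    parent = list(range(x + 1))                 # union-find forest
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u in range(x + 1):
        for v in range(u + 1, x + 1):
            if rng.random() < q:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    root = find(0)
    return sum(1 for u in range(x + 1) if find(u) == root) - 1

def bin_plus_pmf_mc(x, q, reps=20000, rng=random):
    """Monte Carlo estimate of the probability mass function Bin^+(x, q)."""
    counts = Counter(sample_bin_plus(x, q, rng) for _ in range(reps))
    return {t: counts[t] / reps for t in range(x + 1)}

# Example: Bin^+(5, 0.3) stochastically dominates Bin(5, 0.3).
print(bin_plus_pmf_mc(5, 0.3))
```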

Degree distribution
The model degree distribution $f^{(n)}$ is defined as the probability distribution of the number of neighbors of a randomly chosen node. Because $G^{(n)}$ is an exchangeable random graph, we see that $f^{(n)}(t) = P(\deg_{G^{(n)}}(v_1) = t)$ for every $t \in \mathbb{Z}_+$. Theorem 3.1. Assume that $m/n \to \lambda \in (0, \infty)$ and $P_n \to P$ weakly together with $(P_n)_{10} \to (P)_{10} \in (0, \infty)$ for some probability measure $P$ on $\mathbb{Z}_+ \times [0, 1]$. Then the model degree distribution $f^{(n)}$ converges weakly to the compound Poisson distribution $f = \mathrm{CPoi}(\lambda (P)_{10}, \mathrm{Bin}_{10}(P))$.
The limiting degree distribution $f$ in Theorem 3.1 can be represented as the law of $\xi = \sum_{k=1}^{\Lambda} \xi_k$, where $\Lambda$ is Poisson distributed with mean $\lambda (P)_{10}$, the variables $\xi_1, \xi_2, \dots$ follow the mixed binomial distribution $\mathrm{Bin}_{10}(P)$, and all the random variables in the sum are mutually independent. Here, $\Lambda$ represents the number of layers covering a particular node, and $\xi_k$ the number of neighbors in a typical layer covering the node. The mean equals $E\xi = \lambda (P)_{21} \le \infty$, and the variance equals $\mathrm{Var}(\xi) = \lambda \big((P)_{21} + (P)_{32}\big)$. The generating function is given by $E(z^{\xi}) = e^{\lambda (P)_{10} (\hat g_{10}(z) - 1)}$, where $\hat g_{10}$ is the probability generating function of $\mathrm{Bin}_{10}(P)$. The structure of $P$ determines whether the limiting degree distribution is light-tailed or heavy-tailed. Section 4 illustrates both cases and provides examples of power laws with a tunable exponent. In Theorem 3.1 we have assumed that $(P)_{10} > 0$. One can show that for $(P)_{10} = 0$ the asymptotic degree distribution is degenerate at zero.
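A minimal Python sketch for sampling from the limiting compound Poisson degree distribution follows. It assumes the size-biased reading of $\mathrm{Bin}_{10}(P)$ described above and uses $\lambda$ as our rendering of the limit of $m/n$; function names and the toy parameters are ours.

```python
import numpy as np

def sample_limit_degree(lmbda, sizes, probs, strength, reps=10000, rng=None):
    """Sample from the limiting degree distribution CPoi(lmbda*(P)_10, Bin_10(P)).

    lmbda    : limit of m/n (rendered as lambda in this text)
    sizes    : possible layer sizes x
    probs    : layer size probabilities p(x)
    strength : map x -> layer strength q (deterministic strength, as in Section 4)
    A node is covered by Poisson(lmbda * E X) layers; each covering layer has a
    size-biased size X and contributes Bin(X - 1, strength(X)) neighbors.
    """
    rng = rng or np.random.default_rng()
    sizes = np.asarray(list(sizes))
    probs = np.asarray(list(probs), dtype=float)
    mean_x = float((sizes * probs).sum())              # (P)_{10} = E X
    size_biased = sizes * probs / mean_x               # law of the size of a covering layer
    out = np.empty(reps, dtype=int)
    for i in range(reps):
        deg = 0
        for _ in range(rng.poisson(lmbda * mean_x)):   # number of covering layers
            x = int(rng.choice(sizes, p=size_biased))
            deg += rng.binomial(max(x - 1, 0), strength(x))
        out[i] = deg
    return out

# Toy check: sizes 2..6 uniform, strength q(x) = 0.8/x, lmbda = 2.
d = sample_limit_degree(2.0, range(2, 7), [0.2] * 5, lambda x: 0.8 / x)
print(d.mean(), d.var())
```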

Clustering
Given a finite (nonrandom) graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the global clustering coefficient $C_{\mathcal{G}}$ and the degree-dependent local clustering coefficient (also called the clustering spectrum) $C_{\mathcal{G}}(k)$ are defined as follows. Let $N_\triangle$ and $N_\vee$ denote the number of triangles and cherries (paths of length 2) of $\mathcal{G}$, respectively, and let $N_\triangle(v)$ be the number of triangles containing vertex $v$. Then $C_{\mathcal{G}}$ and $C_{\mathcal{G}}(k)$ are defined in (8) as ratios formed from these triangle and cherry counts. These network characteristics represent conditional probabilities of a link between two neighbors of a randomly selected vertex.
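For concreteness, a standard formulation consistent with the verbal description above is the following; the symbols are those used in this text, and the exact display labeled (8) in the original may differ in form.
\[
C_{\mathcal{G}} \;=\; \frac{3 N_\triangle}{N_\vee},
\qquad
C_{\mathcal{G}}(k) \;=\; \frac{1}{|\{v \in \mathcal{V} : \deg_{\mathcal{G}}(v) = k\}|}\sum_{v :\, \deg_{\mathcal{G}}(v) = k} \frac{N_\triangle(v)}{\binom{k}{2}}, \qquad k \ge 2.
\]
In words, $C_{\mathcal{G}}$ is the probability that a uniformly random cherry is closed into a triangle, and $C_{\mathcal{G}}(k)$ is the probability that two randomly chosen neighbors of a uniformly random degree-$k$ vertex are linked.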
Let $(v_1, v_2, v_3)$ be an ordered triple of vertices sampled uniformly at random, and consider the ratios (8), where $\mathcal{G}$ is replaced by an instance of the random graph $G$. An argument based on the law of large numbers (applied to the sums of random variables in the numerators and denominators of the ratios (8)) suggests that $C_G - \mathcal{C}(G) = o_P(1)$ and $C_G(k) - \mathcal{C}(G)(k) = o_P(1)$ as the number of vertices $n \to +\infty$. Therefore, the model characteristics $\mathcal{C}(G)$ and $\mathcal{C}(G)(k)$ can be viewed as approximations of the clustering coefficients $C_G$ and $C_G(k)$, and our asymptotic results for $\mathcal{C}(G)$ and $\mathcal{C}(G)(k)$ shown below can likely be extended to $C_G$ and $C_G(k)$. We now focus on the model clustering characteristics $\mathcal{C}(G)$ and $\mathcal{C}(G)(k)$. Since the distribution of $G$ is invariant under permutations of its vertices, these characteristics can be expressed as ratios of sums over ordered triples of distinct nodes involving the events $G(ij)$ that nodes $i$ and $j$ are adjacent. We denote $\mathcal{C}^{(n)} = \mathcal{C}(G^{(n)})$ and $\mathcal{C}^{(n)}(k) = \mathcal{C}(G^{(n)})(k)$.
Theorem 3.2. Assume that (P n ) rs → (P) rs < ∞ for rs = 21, 32, 33, and (P) 21 > 0. Then 32 when m ≪ n and (P) 32 > 0, Remark (constant layer strengths). When Q k = q is constant for all k, we see that (P) rs = (p) r q s where (p) r equals the rth factorial moment of the limiting layer size distribution. In this case the limiting model clustering equals q(p) 3 and agrees with [9,32].
Section 4 illustrates examples where the limiting clustering spectrum (t) follows a power law.

Connected components
We denote by $N_1(G^{(n)}) \ge N_2(G^{(n)})$ the two largest component sizes in $G^{(n)}$. For a probability distribution $h$ on $\mathbb{Z}_+$, we denote by $\rho(h)$ the probability of eternal survival of a Galton-Watson branching process with offspring distribution $h$.
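As a numerical aside, such survival probabilities can be evaluated by iterating the offspring probability generating function; a minimal Python sketch (the function name and the Poisson truncation are ours) follows.

```python
import math

def survival_probability(pmf, iters=200):
    """Survival probability rho(h) of a Galton-Watson process with offspring pmf h.

    pmf is a dict {k: h(k)} with finite support.  The extinction probability is
    the smallest fixed point of the generating function phi(s) = sum_k h(k) s^k,
    obtained here by iterating s <- phi(s) starting from s = 0.
    """
    s = 0.0
    for _ in range(iters):
        s = sum(p * s ** k for k, p in pmf.items())
    return 1.0 - s

# Example: Poisson(1.5) offspring (truncated at 50); survival probability is about 0.58.
lam = 1.5
h = {k: math.exp(-lam) * lam ** k / math.factorial(k) for k in range(51)}
print(survival_probability(h))
```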
In Theorem 3.4 we have assumed that (P) 10 > 0. One can show that for (P) 10 = 0 we have

Site percolation
We may analyze how a subset of nodes $S \subset \{1, \dots, n\}$ is connected by considering a site-percolated graph, defined as the subgraph $\check G$ of $G$ induced by $S$. The site-percolated overlay graph is an instance of the overlay graph model (1) on the vertex set $S$ with layers $(\check G_1, \check X_1, Q_1), \dots, (\check G_m, \check X_m, Q_m)$, where $\check G_k$ is the subgraph of $G_k$ induced by $S$ and $\check X_k = |V(G_k) \cap S|$. Note that the conditional distribution of $\check X_k$ given $X_k = |V(G_k)|$ is hypergeometric. An approximation of the hypergeometric distribution by a binomial distribution $\mathrm{Bin}(X_k, \tau)$ with $|S|/n \approx \tau$ suggests replacing the limiting layer type distribution $P$ by a modified distribution $\check P$ with binomially thinned layer sizes. The following result confirms that this modification is well justified, and summarizes the results of Theorems 3.1-3.4 adjusted to site percolation.
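One explicit way to write this modified distribution, in the notation of this text (the exact display in the original is not shown), is as the law of a binomially thinned layer size with unchanged strength:
\[
\check P(\{t\} \times B) \;=\; \sum_{x \ge t} \mathrm{Bin}(x, \tau)(t)\, P(\{x\} \times B),
\qquad t \in \mathbb{Z}_+, \quad B \subset [0,1] \ \text{Borel},
\]
that is, conditionally on $(X, Q) \sim P$, the thinned size satisfies $\check X \sim \mathrm{Bin}(X, \tau)$ while the strength $Q$ is left unchanged.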

Bond percolation
Bond percolation studies how well the nodes of a graph are connected along a subset of links obtained by random sampling. In a multilayer network, we may either sample (i) a subset of links of the overlay graph, or (ii) independent subsets of links for each layer separately. To analyze these cases for the overlay graph model $G = G^{(n)}$ in (1), we define an overlay bond-percolated graph by $\hat G = G \cap H$ and a layerwise bond-percolated graph by $\tilde G = \bigcup_{k=1}^{m} (G_k \cap H_k)$, where $H, H_1, \dots, H_m$ are mutually independent random graphs on $\{1, \dots, n\}$ in which each node pair is linked with probability $\pi$, independently of other node pairs, and independently of the layers $G_1, \dots, G_m$. To emphasize the dependence on $\pi$ we sometimes write $\hat G = \hat G(\pi)$ and $\tilde G = \tilde G(\pi)$.
In an epidemic modeling context, the standard SIR epidemic model describes individuals who infect each of their neighbors with probability $\pi$, independently of each other [2]. The links of a graph $G$ represent social contacts, and the bond-percolated component of node $i$ corresponds to the set of eventually infected individuals in a population where node $i$ is initially infectious and the other nodes are susceptible. Bond percolation on the overlay graph can be used to develop finer models of contacts generated by social communities (households, workplaces, schools) of variable size and strength. Layerwise percolation $\tilde G$ then models the case where infections occur independently inside the communities, and the overlay bond percolation $\hat G$ models the case where infections occur between individuals regardless of the underlying community structure.
The layerwise bond-percolated graph is an instance of the overlay model (1) with layer types $(X_k, \pi Q_k)$. This suggests considering a modified limiting layer type distribution $\tilde P$, the law of $(X, \pi Q)$ for $(X, Q) \sim P$. We expect the overlay bond-percolated model to behave similarly to the layerwise bond-percolated model in sparse regimes where the layers do not overlap much. The following result confirms this, and summarizes the results of Theorems 3.1-3.4 adjusted to bond percolation.
If we also assume that (P n ) rs → (P) rs ∈ (0, ∞) for rs = 21, 32, 33, then: (iii) The model clustering coefficient converges to where is the corresponding limit of the nonpercolated graph G (n) .
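To illustrate the two bond-percolation mechanisms defined above, a minimal simulation sketch (function names ours) samples the overlay bond-percolated and the layerwise bond-percolated graphs; the second function uses the distributional identity that a Bernoulli layer $G(x, q)$ with each link kept with probability $\pi$ is a Bernoulli layer $G(x, \pi q)$.

```python
import random
import itertools

def bond_percolate_overlay(edges, pi, rng=random):
    """Overlay bond percolation: keep each link of the overlay graph with probability pi."""
    return {e for e in edges if rng.random() < pi}

def sample_layerwise_percolated(n, layer_types, pi, rng=random):
    """Layerwise bond percolation: thin the links of each layer independently.

    In distribution this is the overlay graph built from layer types (x_k, pi*q_k),
    since thinning a Bernoulli layer G(x, q) by pi yields G(x, pi*q).
    """
    edges = set()
    for x, q in layer_types:
        vertices = rng.sample(range(n), x)
        for u, v in itertools.combinations(vertices, 2):
            if rng.random() < pi * q:
                edges.add((min(u, v), max(u, v)))
    return edges
```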

Double phase transition
Theorem 3.6 shows that the largest relative component size in the bond-percolated graph is approximated by the survival probability $\rho(\tilde f^+)$ of a Galton-Watson process with compound Poisson offspring distribution $\tilde f^+ = \mathrm{CPoi}\big(\lambda (P)_{10}, \mathrm{Bin}^+_{10}(\tilde P)\big)$. The mean of the offspring distribution, denoted $R_0(\pi)$, can be expressed in terms of $R(x, q) = \sum_{t \ge 0} t\, \mathrm{Bin}^+(x, q)(t)$, defined using (3), which represents the expected degree of a node in the transitive closure of a Bernoulli random graph with $x+1$ nodes and link probability $q$. Classical branching process theory tells us that $\rho(\tilde f^+) > 0$ if and only if $R_0(\pi) > 1$. Hence the largest component in the bond-percolated graph is sublinear for $\pi < \pi_1$ and linear for $\pi > \pi_1$, where the critical threshold is defined by $\pi_1 = \inf\{\pi \in [0,1] : R_0(\pi) > 1\}$. The overlay graph model studied in this article involves another nontrivial phase transition, associated with a second critical threshold value $\pi_2$. Section 4 describes an example where $0 < \pi_1 < \pi_2 < 1$.
The first phase transition at $\pi_1$ characterizes the emergence of a giant component in a bond-percolated overlay graph. To understand the second phase transition, note that $R_0(\pi)$ is proportional to the expected number of nodes which can be reached by paths within a typical bond-percolated layer covering a particular node. The second phase transition at $\pi_2$ hence amounts to the emergence of gigantic components inside bond-percolated layers covering a typical node.
In the epidemic context discussed in Section 3.5, we note that the critical quantity $R_0(\pi)$ does not refer to the number of individuals directly infected by a reference individual in an otherwise susceptible population, unlike in classical SIR models. Rather, $R_0(\pi)$ also counts the individuals indirectly infected by the reference individual via single-layer infection paths.
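The threshold $\pi_1$ can be located numerically. The sketch below assumes the representation $R_0(\pi) = \lambda\, E\big[X\, R(X-1, \pi Q)\big]$, which is our reading of the (not fully displayed) formula above, together with the factorized power-law type of Section 4 in which $Q = g(X)$; all function names and the toy parameters are ours, and the Monte Carlo evaluation is crude.

```python
import random
import numpy as np

def component_size_minus_one(x_nodes, q, rng=random):
    """Component size minus one of node 0 in a Bernoulli graph on x_nodes nodes,
    i.e. one sample from Bin^+(x_nodes - 1, q)."""
    adj = {u: [] for u in range(x_nodes)}
    for u in range(x_nodes):
        for v in range(u + 1, x_nodes):
            if rng.random() < q:
                adj[u].append(v)
                adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) - 1

def R0(pi, lmbda, sizes, probs, strength, reps=200):
    """Monte Carlo estimate of the assumed form R_0(pi) = lmbda * E[X * R(X-1, pi*strength(X))]."""
    total = 0.0
    for x, p in zip(sizes, probs):
        if x < 2:
            continue                               # size-0/1 layers create no links
        q = pi * strength(x)
        mean_R = np.mean([component_size_minus_one(int(x), q) for _ in range(reps)])
        total += x * mean_R * p
    return lmbda * total

def critical_pi(lmbda, sizes, probs, strength, tol=1e-2):
    """Bisection for pi_1 = inf{pi in [0,1] : R_0(pi) > 1}."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R0(mid, lmbda, sizes, probs, strength) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Toy example in the spirit of Section 4: p(x) ~ x^{-2.5} on 2..50, g(x) = 2/x.
xs = np.arange(2, 51)
ps = xs ** -2.5
ps = ps / ps.sum()
print(critical_pi(1.0, xs, ps, lambda x: min(2.0 / x, 1.0)))
```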

POWER-LAW MODELS
This section illustrates the rich statistical features of the overlay model by discussing the results of Section 3 in a setting where the limiting layer strength is a deterministic function of the layer size, $Q = g(X)$ for some $g: \mathbb{Z}_+ \to [0, 1]$, and the limiting layer type distribution factorizes according to
\[
P(\{x\} \times B) \;=\; p(x)\, \delta_{g(x)}(B), \qquad x \in \mathbb{Z}_+, \quad B \subset [0,1] \ \text{Borel}, \tag{15}
\]
where the layer size distribution $p$ is a probability measure on $\mathbb{Z}_+$. We are especially interested in the case where the probability mass function $p(x) = P(X = x)$ of the layer size distribution and the strength function $g(x)$ follow power laws
\[
p(x) \;=\; L(x)\, x^{-\alpha}, \qquad g(x) \;=\; b\, x^{-\beta}, \tag{16}
\]
with exponents $\alpha \ge 2$ and $\beta \ge 0$. Here $L(x)$ is a slowly varying function at $+\infty$, $b > 0$ is a constant, and we choose $b \le 1$ for $\beta = 0$. We will assume that (16) holds for large $x$. Note that for $r, s$ satisfying $\alpha + \beta s > r + 1$ the cross moment $(P)_{rs}$ is finite. It is also finite in the case where $\alpha + \beta s = r + 1$ and $x^{-1} L(x)$ is integrable at $+\infty$.
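As a quick numerical companion, the sketch below evaluates the cross moments $(P)_{rs} = E[(X)_r\, g(X)^s]$ for a truncated pure power-law size distribution (the slowly varying factor $L$ is taken constant); the function name and parameter values are illustrative only.

```python
import numpy as np

def cross_moment(r, s, alpha, beta, b, x_max=10**5):
    """Numerical cross moment (P)_{rs} = E[(X)_r g(X)^s] for the power-law model.

    Layer sizes follow p(x) proportional to x^{-alpha} on {1, ..., x_max}
    (pure power law: the slowly varying factor L is constant), and the strength
    is g(x) = b * x^{-beta}, capped at 1 to remain a valid link probability.
    """
    x = np.arange(1, x_max + 1, dtype=float)
    p = x ** (-alpha)
    p /= p.sum()
    g = np.minimum(b * x ** (-beta), 1.0)
    falling = np.ones_like(x)
    for j in range(r):
        falling *= (x - j)                  # (x)_r = x (x-1) ... (x-r+1)
    return float((falling * g ** s * p).sum())

# Illustration: alpha = 2.5, beta = 1, b = 2 (layers of bounded mean degree).
for r, s in [(1, 0), (2, 1), (3, 2), (3, 3)]:
    print((r, s), round(cross_moment(r, s, alpha=2.5, beta=1.0, b=2.0), 4))
```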

Degree distribution and clustering spectrum
Theorems 4.1 and 4.2 below establish power laws for the limiting degree distribution and clustering spectrum. Figures 1 and 2 illustrate how the associated power-law exponents relate to the corresponding exponents of the layer sizes and layer strengths. Remarkably, the power law of the clustering spectrum admits a tunable exponent in [0, 2]. A similar power law with exponent 1 has earlier been established for a random intersection graph [9] and for a spatial preferential attachment random graph [27], and with exponent restricted to [0, 1] for inhomogeneous random intersection graphs [7,10,12] and a hyperbolic random geometric graph model [24]. (i) If $\beta \in (0, 1)$, then the limiting degree distribution $f$ satisfies the power law (17) as $t \to +\infty$, with $c_2 = c_1 + c_3$. In particular, $\mathcal{C}(t) \sim b$ for $\beta = 0$.
We remark that the inequality $\alpha + 2\beta > 4$ implies $(P)_{10}, (P)_{21}, (P)_{32} < \infty$; in particular, the asymptotic degree distribution has a finite second moment and variance. Networks with $\mathcal{C}(t) \ll t^{-1}$ are sometimes called weakly clustered, and those with $\mathcal{C}(t) \gg t^{-1}$ strongly clustered [4]. According to Theorem 4.2, the overlay graph model produces weakly clustered networks for $\beta > 1/2$, and strongly clustered networks for $\beta < 1/2$. We believe Theorem 4.2 can be extended to more general subexponential distributions $p(x)$; we do not pursue this line here to avoid unnecessary technicalities.
The above observations confirm the existence of a double phase transition in bond percolation, as postulated in [19], for a natural network model admitting tunable power-law exponents for both the degree distribution and the clustering spectrum. Together with Theorems 4.1 and 4.2, this opens up a flexible framework for studying the significance and interrelations of these power laws to bond and site percolation properties in clustered complex networks. The investigation of how these phase transitions are reflected in the core-periphery organization of the network [4,19] remains an important topic for future research.

NOTATION USED IN PROOFS
Let (X, Q) and (X n, , Q n, ) be bivariate random variables with the distributions P and P n respectively (one may interpret as a integer selected uniformly at random from {1, … , m}). For a random variable 2 The first implication in (18) follows by noting that if xq < 1, then the proof of Theorem 5.4 [30] shows that we denote by P the probability distribution of and write ∼ P (equivalently, P = ( )). For a bivariate random variable ( , ) we denote by P the conditional probability distribution of given . In particular, P (B) = E(I { ∈B} | ), for a Borel set B. In the proof of Theorem 3.1 below we use the inequality For vectors x = (x 1 , … , x m ) ∈ R m and q = (q 1 , … , q m ) ∈ [0, 1] m we denote (x, q) = ((x 1 , q 1 ), … , (x m , q m )). With a little abuse of notation we write (x n , q n ) = ((x n,1 , q n,1 ), … , (x n,m , q n,m )) and (X n , Q n ) = ((X n,1 , Q n,1 ), … , (X n,m , Q n,m )). To stress the dependence of the overlay graph on the sequence (X n , Q n ) we write G (n) = G (X n ,Q n ) . By V = {v 1 , … , v n } we denote the vertex set of G (n) . The vertex sets of the layers G n,j are denoted D j , 1 ≤ j ≤ m.
Next we introduce an -discretization of the space [0, 1] of admissible layer strengths (edge densities) which helps to reduce the analysis of the general model with potentially uncountably many layer types into a one with finitely many layer types. For any ∈ (0, 1) we fix numbers 0 = s 0 < s 1 < · · · < s r = 1 (with r ≤ 2∕ ) such that P(Q = s i ) = 0 for 0 < s i < 1 and |s i − s i−1 | < for 1 ≤ i ≤ r. Relative to this mesh, we define down-rounding and up-rounding operations q → q − and q → q + on [0, 1] using the formulas ) .
Furthermore, we denote for short G + = G (X n ,Q + n ) and G − = G (X n ,Q − n ) . In particular, G + is the superposition of the layers (G + n,1 , X n,1 , Q + n,1 ), … , (G + n,m , X n,m , Q + n,m ). In view of the coordinate-wise inequalities there is a natural coupling of random graphs G − , G + , and G (n) such that P(G − ⊂ G (n) ⊂ G + ) = 1. By , + , and − we denote the degree of vertex v 1 in G (n) , G + and G − respectively. The discretization is used in the proofs of Theorems 3.1 and 3.4. Given ∈ (0, 1) we first establish respective results for G − and G + and then letting ↓ 0 we carry them over to G (n) using the coupling By E (X,Q) and P (X,Q) we denote the conditional expectation and probability given the random vector (X n , Q n ).

ANALYSIS OF DEGREE DISTRIBUTIONS
Here we prove Theorem 3.1. Before the proof we collect some useful facts about the compound Poisson distribution and the total variation distance. Let 1 , 2 > 0. Let , be random variables and let Z ∼ CPoi( 1 , ( )) and Z ∼ CPoi( 2 , ( )) be independent compound Poisson random variables. Let I be Bernoulli random variable independent of ( , ) having the success probability P(I = 1) = In the particular case where and take values in Z + we have If, in addition, ( ) = ( ) then we have ( ) = ( ) = ( ).
Proof of Theorem 3.1. In the proof we drop the subscript n when it does not cause an ambiguity. Thus we write G = G (n) , G j = G n,j , G + n,j = G + j and X j = X n,j , Q j = Q n,j , (X, Q) = (X n , Q n ), X = X n, . We begin by outlining the idea of the proof. The degree of vertex v 1 in the overlay random graph G is approximated by the sum of degrees of v 1 in the layers G j containing this vertex. We denote this sum Here, I {v 1 ∈D j } is the indicator of the event {v 1 ∈ D j } and H(j) stands for the degree of v 1 in G j . In the sparse regime considered it is rather unlikely that two layers intersect in more than one point. Hence we approximate Note that only the layers of size at least 2 may contribute to the sum L A . We denote the respective number of layers In order to analyze the distributions of L A and S A it is convenient to condition on (X j , Q j ), 1 ≤ j ≤ m. Then S A becomes the sum of independent Bernoulli random variables and its distribution is approximately Poi , wherẽ has binomial distribution Bin(X j − 1, Q j ) and the probability that it will contribute to the sum L A is proportional to X j . This in turn yields that the typical contribution to the sum (26) by a layer of size at least 2 has the size biased mixed binomial distribution while the conditional distribution of L A is approximately CPoi In this way we establish the approximation CPoi We conclude the outline with the observation that CPoi( x * , h * ) = CPoi ( (P) 10 , Bin 10 (P)) .
Indeed, a simple calculation shows that These identities imply the relation (P) 10 (ĝ − 1) = x * (ĥ * − 1) between the Fourier transformsĝ and h * of the increment distributions Bin 10 (P) and h * . The latter relation establishes the correspondence between the Fourier transforms of respective compound distributions (30). Finally, we mention that the rigorous proof of Theorem 3.1 is a bit more complex and the proof idea is somewhat hidden by technicalities including the truncation and discretization. Now we give a rigorous proof. We first consider the special case where there exists M > 0 such that X n,j ≤ M almost surely for each n and 1 ≤ j ≤ m.
Given 0 < < 1, consider -discretized random variables Q + and Q − and the overlay graphs G + and G − defined by the vectors (X n , Q (20), (21). Let h + * and h − * denote the distributions defined by (29), where Q is replaced by Q + and Q − respectively. Let + * = + * ( ) and − * = − * ( ) be compound Poisson random variables with the distributions CPoi( x * , h + * ) and CPoi( x * , h − * ). Note that as → 0 Now we turn to the analysis of the degrees , + and − . We will show below that Since the coupling G − ⊂ G ⊂ G + implies the coupling − ≤ ≤ + we have Combining (32) and (33) we obtain Finally, letting ↓ 0 we obtain from (31), . It remains to prove (32). We only show the second relation. The proof of the first one is the same. At this point we need some more notation. Recall -discretization (20). Let k,i for k ≥ 2 and 1 ≤ i ≤ r and j, l ≥ 1 be independent random variables. We assume that  (20). Let where H + (j) stands for the number of neighbors of v 1 created by the layer G + j . In particular, given (X, Q), the random variable H + (j) has binomial distribution Bin(X j − 1, Q + j ). Introduce random sets Next we condition on (X, Q). Given (X, Q) such that S > 0 define random variables For S = 0 we put L 2 ≡ 0 and L 3 ≡ 0.
We observe that the conditional distributions P (X,Q) and P (X,Q) is a compound Poisson distribution. Moreover, using the property (22), (23) one can easily show that P (X,Q) Now we are ready for the proof of (32). We observe that + ≠ L 1 implies that some v l ≠ v 1 is a neighbor of v 1 in at least two different layers. We have, by the union bound and symmetry, that Invoking EX 2 ≤ M 2 and using the fact that ( Next we evaluate tv ((L 2 ), (L 3 )). To this aim we consider an array of random variables starting with L 2 and ending at L 3 where each subsequent element of the sequence is obtained from the previous one by replacing k,i . We proceed until all the products are replaced so that at the array ends with L 3 . By the triangle inequality, the total variation distance between the conditional distributions tv ( P (X,Q) Invoking the bound tv ) ) ≤ 2k 2 ∕n 2 , which follows by Le Cam's inequality [42], we Now an application of (19) yields Finally, we evaluate the distance tv (( Recall that given (X, Q) the conditional distribution of L 3 is CPoi(̂, h + ). It follows from (25) Furthermore, Lemma 6.1 implies (Here we apply Lemma 6.1 to k,i = k,i = H k,i in the case where and are bivariate random variables with the distributions P( = (k, i)) =p k,i and P( = (k, i)) = p + k,i .) We observe that both terms on the right of (38) vanish in probability. Indeed, by the weak law of large numbers (we use Chebyshev's inequality), our assumption P n In view of the obvious inequality (1). Finally, from the latter bound combined with (36), (37) we derive (32), by the triangle inequality: Now we revoke the extra condition that X n,j ≤ M almost surely for each n and 1 ≤ j ≤ m. From now on let {(X n,j , Q n,j ), n ≥ 1, 1 ≤ j ≤ m n } be arbitrary bivariate random variables satisfying conditions of the theorem. Given M > 0, let G [M] (n) be the random overlay graph defined by the sequence ( ( (n) has asymptotic compound Poisson distribution CPoi . Here where stands for the degree of v 1 in G (n) . Moreover, ≠ [M] implies that v 1 belongs to a layer of size greater than M. Hence, by the union bound, Note that the quantity on the right is o(1) uniformly in n as M → ∞. Indeed our assumptions P n w − → P and (P) 10 → (P) 10 imply lim M→∞ sup n EX n,

ANALYSIS OF CLUSTERING
Here we prove Theorems 3.2 and 3.3. In the proof we drop the subscript n when it does not cause an ambiguity. Thus we write G = G (n) , V = V(G (n) ) = {1, 2, … , n} and G k = G n,k . Let K 12 and P( = t,  12 ) defining the ratios (n) = P( 3 ) P( 12 ) and (n) (t) = P( =t,  3 ) P( =t,  12 ) . In the proof of Theorem 7.3 we approximate P( 3 ) by the probability that K 3 is produced by a single layer (with m different layers available). Similarly, in the proof of Theorem 7.4 P( 12 ) is approximated by the sum of two probabilities: the first one being the probability that K 12 is produced by a single layer (with m different layers available) and the second one being the probability that the edges of K 12 are produced by different layers (with m(m − 1) layer pairs available). We note the corresponding sum in the denominator of (10).
Before the proof we introduce some notation and collect auxiliary results. Subgraph frequencies in the overlay graph will be characterized using cross moments of the averaged layer type distribution P n defined by (2), and normalized cross moments defined by where Lemma 7.1. Recall the averaged empirical distribution P n defined by (2). If P n w − → P and (P n ) rs → (P) rs < ∞, then the cross moments defined in (40)-(41) satisfy 10,rs ≪ m(n) −1 r and (P n ) 10,rs ≪ n.
By taking expectations and averaging with respect to k, we find that where A = X , B = (X ) r Q s , and (X , Y ) is a generic P n -distributed random variable. Because the left side above equals m −1 n(n) r 10,rs , we conclude that where c = sup n (P n ) rs , (t) = sup n ∫ I(x > t) P n , and (t) = sup n ∫ (x) r y s I((x) r y s > t) P n . Then the tightness of P n implies that (a n ) → 0 for a n = n 1∕2 . Hence also b n (a n ) → 0 for b n = (a n ) −1∕2 → ∞.
The uniform (x) r y s -integrability of P n further implies that (b n ) → 0. Hence the right side above vanishes and first claim follows. For the second claim, we may repeat the above reasoning to verify that (42) holds also with the E-symbol removed. Therefore, (43) also holds when the left side is replaced by (P n ) 10 Hence the second claim follows by the same argument. ▪ Let k * be a random number uniformly distributed in {1, … , m} and independent of G = G (n) . In the following result G k * = G n,k * represents a randomly chosen layer. Lemma 7.2. Let F rs be a graph with node set in {1, … , n} such that |V(F rs )| = r and |E(F rs )| = s, and let i be a node in V(F rs ) with deg F rs (i) = r − 1. Select k * ∈ {1, … , m} uniformly at random and independently of the layers. Then: The corresponding probability for a randomly selected k * equals The corresponding probability for a randomly chosen k * is so the claim follows by dividing both sides by P(G k * ⊃ F rs ) = (n) −1 r (P n ) rs . ▪ Now we are ready to state and prove Theorems 7.3 and 7.4. We use the short hand notation g (n) rs = Bin rs (P n ), where the mixed binomial distribution Bin rs (P n ) is defined in (5).
is the model degree distribution defined by (7) and the approximation error is bounded by Proof. Denote by  k = {G k ⊃ K 3 } the event that all node pairs of the triangle are linked by the layer k. We also denote k = deg G k (1), Proof of (i). Denote and observe that 0 ≤ 1 (t) ≤ P( = t,  12 ) + P( = t,  111 ), where  12 is the event that there exists one layer covering one link and a different layer covering the remaining two links of K 3 , and  111 is the event that three distinct layers cover distinct links of K 3 . We write p(abc) = P ( , where  ij a denotes the event that node pair ij is linked in layer a. We note that p(abc) = p 21 (a)p 21 (b)p 21 (c), p(aab) = p 32 (a)p 21 (b), and p(aaa) = p 33 (a) for distinct layers a, b, c. Hence (p(aab) + p(aba) + p(baa)) ≤ 3 21 32 , , and hence, noting that 33 ≤ 32 ≤ 21 , By combining this with the bound for 1 (t), we conclude that where Hence claim (i) follows by summing the above equality over t, and noting that ∑ k P( k ) = 33 . Proof of (ii). We start with (45) It follows from Lemma 7.2 that ∑ Hence the last term above equals 33 f (n) * g (n) 33 (t − 2), and to prove the claim it suffices to analyze the approximation errors in (46)- (47).
The approximation error in (47) equals 4 10,33 . Claim (ii) follows by combining the above estimates for the total approximation error (t) = 1 (t) + 2 (t) + 3 (t) + 4 (t). ▪ Theorem 7.4. We have is the degree distribution of G, and the approximation error is bounded by We start with an outline of the proof. First we approximate ≈ ∑ k, so that We note that where the conditional probability is evaluated using Lemma 7.2. Hence the first term on the right of (53) equals Next we approximate, denoting h k (s) = P( k = s,  12 k ), After noting (see Lemma 7.2) that and hence the second term on the right side of (53) is approximately By combining (53), (55) and (58), we conclude that The total approximation error in (59) can be written as (t) = 1 (t) + 2 (t) + 3 (t) + 4 (t), where 1 (t), 2 (t), 3 (t) are the approximation errors in (50), (51), (52), respectively, and the approximation error in (58) equals where 41 (s), 42 (s) denote the errors made in (56), (57), respectively. Now we give a rigorous proof of (i) and (ii), where we analyze the individual approximation errors one by one.

CLUSTERING AND DEGREE IN POWER-LAW MODELS
Here we prove Theorems 4.1 and 4.2. Furthermore, we show below that for $P$ satisfying (15), (16) the mixed binomial distribution (5) follows a power law with parameters given by (67). It is an immediate consequence of (65) and (66) that the limiting degree distribution $f = \mathrm{CPoi}(\lambda (P)_{10}, \mathrm{Bin}_{10}(P))$ obeys the power law (17). We similarly establish the respective power-law asymptotics for the distributions $f * g_{33}$, $f * g_{32}$ and $f * g_{21} * g_{21}$ that appear in (10). Now a simple analysis of the ratio (10) as $t \to +\infty$ yields the asymptotics of Theorem 4.2.
Proof of Theorem 4.1. Statements (i) and (ii) are immediate consequences of (65), (66) as described above. To prove (iii) we note that ≥ 1 implies XQ = Xg(X) ≤ B almost surely. Denote for short = (P) 10 and let H be a random variable with the distribution Bin 10 (P) and Λ be a Poisson random variable with parameter . We have ∑ where ) .
The almost sure inequality $QX \le B$ together with the identity $(P)_{10} = EX$ completes the proof. ▪ Proof of Theorem 4.2. The theory of discrete subexponential densities [23, Lemmas 4.9 and 4.14] implies that $(f_1 * f_2)(t) \sim f_1(t) + f_2(t)$ for all probability densities on the positive integers such that $f_i(t) \sim a_i t^{-\gamma_i}$ with $a_i > 0$ and $\gamma_i > 1$. By Theorem 4.1, we know that $f(t) \sim \lambda (P)_{10}\, c_{10}\, t^{-\gamma_{10}}$, and by (66), we find that $g_{rs}(t) = \mathrm{Bin}_{rs}(P)(t) \sim c_{rs}\, t^{-\gamma_{rs}}$ with parameters given by (67). Because $\gamma_{32} < \gamma_{21} < \gamma_{10}$, it follows that the tail of each convolution $f * g_{33}$, $f * g_{32}$, $f * g_{21} * g_{21}$ appearing in (10) is dominated by its heaviest-tailed factor. Hence, by formula (10), the tail of $\mathcal{C}(t)$ is governed by the competition between $\gamma_{33}$ and $\gamma_{10}$ in the numerator and by $\gamma_{32}$ in the denominator. Because $\gamma_{33} - \gamma_{10} = \frac{3\beta - 2}{1-\beta}$, we see that $\mathcal{C}(t)$ follows a power law with density exponent $\gamma_{33} - \gamma_{32} = \frac{\beta}{1-\beta}$ for $\beta \le \frac{2}{3}$, and density exponent $\gamma_{10} - \gamma_{32} = 2$ for $\beta \ge \frac{2}{3}$. The constant term of the power law is determined by (67). ▪ Let us prove (66). Given $r \in \mathbb{Z}_+$, let $H^{(r)}$ be a mixed binomial random variable with the distribution $P(H^{(r)} = t) = E\big(\mathrm{Bin}(X - r, g(X))(t)\big)$, where the distribution $p(x) = P(X = x)$ of $X$ and the function $g$ are given by (16). In Lemma 8.1 below we show that the distribution of $H^{(r)}$ follows a power law (69). Next we observe that the distribution $\mathrm{Bin}_{rs}(P)$ can be written in the form $E\big(\mathrm{Bin}(\tilde X - r, g(\tilde X))(t)\big)$, where the random variable $\tilde X$ has the power-law distribution $p_{rs}(x) = (x)_r\, g(x)^s\, p(x) / (P)_{rs}$. Hence (69) yields (66). It remains to prove (69). In the proof we use the fact that the binomial distribution is highly concentrated around its mean. We apply this fact to the mixed binomial random variable $H^{(r)}$ conditionally, given the mixing random variable $X$.
In what follows we assume that either $0 < \beta < 1$, or $\beta = 0$ and $b < 1$. We only consider the case where $r = 0$; the proof for $r \ge 1$ is much the same.
In the proof limits are taken as t → +∞. Let H k ∼ Bin(k, g(k)) be a binomial random variable. We use the shorthand notation Given t, let t = t 1∕2 ln 4 t. We split the probability We assume that t is large enough so that g(k) = bk − for k > t. Then k = bk 1− and A 1 ≠ ∅. To prove (69) we show that (1)) and I 1 , Let us evaluate I 2 . By the local limit theorem [39], [50] we approximate uniformly in k ∈ A 2 In what follows we consider the cases 0 < < 1 and = 0 separately.
For 0 < < 1 we have 2 k = k (1 − g(k)) = t(1 + O( t ∕t) + O(t − ∕(1− ) )) uniformly in k ∈ A 2 . Now (71) implies . Next we approximate and use the substitution x = (by 1− − t)∕ √ t to write the integral in the form Now it is easily seen that the integral I 1 converges to √ 2 as t → +∞. Hence, We have arrived to the first relation of (70). For = 0 and b < 1 we have 2 ). Now (71) implies Invoking the approximation S ′ = I ′ + O(1), where we obtain the first relation of (70). We derive the bounds on the right of (70) from the upper bounds that hold uniformly in k ∈ A 1 ∪ A 3 . Indeed, (72) follows from the well-known exponential inequalities for Binomial probabilities see for example, Theorem 2.1 of [30], We only show (72) The second inequality of (73) implies (1)). ▪

GIANT COMPONENT
Here we prove Theorem 3.4. We start with an outline of the proof. In the proof we apply the approach developed in [15]: we approximate the number $N_1(G^{(n)})$ of vertices in the largest connected component of $G^{(n)}$ by the number of vertices $u$ with the property that the breadth-first search (BFS) tree rooted at $u$ contains at least $\omega$ vertices, where $\omega = \omega(n) \to +\infty$ as $n \to +\infty$. The BFS exploration process rooted at $u$ is then approximated by a respective branching process, and the fraction of vertices $u$ having large BFS trees (of size at least $\omega$) is approximated by the survival probability $\rho = \rho(f^+)$ of the branching process. Hence $N_1(G^{(n)})/n \xrightarrow{P} \rho$. We briefly comment on the branching process, which differs considerably from that of [15]. Let us consider the BFS exploration process at the moment when it enters a new layer, say $G_{n,i}$. The first vertex of $G_{n,i}$ detected by the BFS will be included in the BFS tree together with the component of $G_{n,i}$ that the vertex belongs to. The other members of the component are called children of the vertex. Since the vertex may belong to several layers, it can have children from the other layers as well. Consequently, the total number of children is approximated by the sum of the degrees of the vertex in the transitive closures $\bar G_{n,i}$ of the layers $G_{n,i}$ that cover this vertex. Accordingly, the offspring distribution is approximated by $f^+$. We note that the layers need not be connected. Each layer $G_{n,i}$ may split into components, but for moderately growing $\omega = \omega(n)$ the BFS exploration process will not visit the same layer twice within the first $O(\omega)$ steps with high probability. Therefore the offspring numbers are approximately independent.
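For a numerical sanity check of this outline, the following sketch (our names) measures the largest component fraction of a sampled overlay graph with a union-find pass; for large $n$ it should be close to the survival probability $\rho(f^+)$ whenever the latter is positive.

```python
import random
import itertools

def largest_component_fraction(n, layer_types, rng=random):
    """Fraction of vertices in the largest connected component of a sampled overlay graph.

    Layers are generated as in (1) and merged with a union-find pass.
    """
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for x, q in layer_types:
        vertices = rng.sample(range(n), x)
        for u, v in itertools.combinations(vertices, 2):
            if rng.random() < q:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    sizes = {}
    for u in range(n):
        r = find(u)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n

# Example: n = 20000 nodes, m = n layers of size 3 and strength 0.4 (supercritical).
print(largest_component_fraction(20000, [(3, 0.4)] * 20000))
```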
The proof is organized as follows. We start with Lemma 9.1 which establishes the result in the special case where the layer types are deterministic, that is, ∀n P((X n , Q n ) = (x n , q n )) = 1 for some x n = (x n,1 , … , x n,m ) and q n = (q n,1 , … , q n,m ). Furthermore, we assume that the distribution of (X, Q) has a finite support, say, A ⊂ {0, 1, 2, … , M} × [0, 1], where M ≥ 2 is an integer. Hence, P(X = t, Q = q) > 0 whenever (t, q) ∈ A. Moreover, we assume in Lemma 9.1 that the set Note that for each (x n,i , q n,i ) ∈ A ⧵ A 0 respective layer G n,i has no edges. Next, in Lemma 9.2 we relax condition (74) by allowing a negligible fraction of layer types (x n,i , q n,i ) to take their values outside A.
In the last step of the proof we reduce the general case to that considered in Lemma 9.2. To this aim we truncate the layer sizes X n,i (at level M) and -discretize the edge densities Q n,i as in (20). Then we apply Lemma 9.2 conditionally given the truncated and discretized layers Notation. Before the proof we introduce some notation. Given a Galton-Watson (G-W) branching process  we denote by || the total progeny of , (k) () = P(|| ≥ k) and () = P(|| = ∞).

Let T(t, q) = deg H t,q
( ) be the degree of a randomly selected vertex in the transitive closure H t,q of Bernoulli random graph H t,q on t vertices and with edge density q. Let  = {T s (t, q), T (j) s (t, q) ∶ q ∈ [0, 1], j, s, t ∈ N} be a collection of independent random variables such that T s (t, q) and T (j) s (t, q) have the same distribution as T(t, q).
Let C v ⊂ V be the vertex set of the connected component of G = G (n) = G (X n ,Q n ) containing vertex v ∈ V. Let ∶ N → N be a function satisfying (n) → ∞, (n) = o(n) as n → +∞. Let B k = {v ∶ |C v | ≥ k} ⊂ V be the set of vertices belonging to connected components of G of size at least k. We write B = B (n) . Let C ⊂ V denote the vertex set of the largest connected component of G. Note that for each integer k ≥ 1 In Lemma 9.1 below we assume that the distribution of (X, Q) has a finite support A and denote q 0 = min{q ∶ (t, q) ∈ A 0 } and h t,q = P(X = t, Q = q), (t, q) ∈ A.
Given 0 < < 1∕4, let  + ,  − ,  be G-W processes with the offspring numbers Here Λ t,q ∼ Poi( t,q ) and Λ ± t,q ∼ Poi( ± t,q ) with t,q = th t,q and ± t,q = t,q (1 ± ). Furthermore, we assume that the collection of random variables {Λ t,q , Λ + t,q , Λ − t,q , (t, q) ∈ A 0 } and  are independent. Note that offspring numbers (76) have compound Poisson distributions. In particular, Y has the probability distribution CPoi( , (T * )), where = x * with x * = E(XI {X≥2} ) and the random variable T * has the probability distribution By the same reasoning as in (30) above, we obtain the equality of distributions CPoi( , (T * )) = CPoi( (P) 10 , Bin + 10 (P)), where the increment distribution Bin + 10 (P) is defined by (6). We recall that D i denotes the vertex set of layer G n,i , 1 ≤ i ≤ m. With a little abuse of notation we also refer to D i as the layer G n,i . Furthermore, we assign label D i to the edges of G n,i . In particular, an edge of G may receive several labels if it is present in several layers. Given n and (t, q) ∈ A, let D t,q be the collection of layers D i having size x n,i = t and edge density q n,i = q. Put D 0 = ∪ (t,q)∈A 0 D t,q . For every D i ∈ D 0 the probability that a randomly chosen vertex of D i has a neighbor in D i connected by an edge labeled D i is Note thatq ≥ q 0 .
Proof of Lemma 9.1. We recall that the idea of the proof is outlined in the first paragraph of the section above. Proofs of (81), (82), (83) are given in separate steps.
Proof of (81). Note that the distribution of |C v | is the same for each v ∈ V. Hence, it suffices to approximate P(|C v | ≥ (n)) for v = v 1 .
Proof of (90). We fix an order v 1 < v 2 < · · · < v n of elements of V. Let m ′ = ∑ (t,q)∈A 0 m t,q be the number of sets in the collection D 0 . We can assume without loss of generality that Upper bound (the last inequality of (90)). Given v ∈ V, define the list L v of vertices using a BFS type exploration procedure. In the beginning all vertices are uncolored, all sets D i ∈ D 0 are unmarked, and L v = ∅. After a vertex is added to L v the vertex is colored white. We add v to the list. Next we proceed recursively. We choose the oldest (with respect to inclusion to L v ) white vertex, say u, from L v .
For i = 1, 2, … , m ′ such that u ∈ D i and D i is not marked, we mark D i (we say that D i is marked by u) and add to L v (in increasing order) all uncolored vertices of D i that are connected to u by paths of edges labelled D i . We say that D i brings these vertices to the list and attach label D i to each of them. Afterwards we color u black. Vertices added to L v in this step are called children of u. We then chose the oldest white vertex from L v , add to L v its children and color this vertex black etc. We stop when there are no more white vertices in L v or there are no more unmarked sets D i left. We denote L v = {u 1 , u 2 , … }, where elements are listed in the order of their inclusion to the list (u i is older than u j for i < j and [1, i) such that u i is a child of u i * (equivalently, u i * is the parent of u i ). While constructing the list L v we keep track of the sets D i 1 , D i 2 , … that have been marked one after another (D i s was marked before D i t for s < t). For u j ∈ L v the number r = r(j) tells us that u j was brought to the list by D i r , the rth member of the sequence A set D i s marked by u ∈ L v is called void if u has no neighbors in D i s linked to u by edges labeled D i s (in this case D i s brings no children to u). Note that any D i j is void with probability at most 1 −q.
We observe that the number of vertices brought to the list L v by a regular set D i s ∈ D t,q has the same distribution as T(t, q). For a nonregular set this number may be smaller, since white vertices of a nonregular set D i s that have been colored in previous steps of the exploration cannot be brought to L v by D i s . Therefore as long as k ≤ ′ (n) a coupling of the exploration process with the branching process  + shows that (94) Next we show that wherek ∶= 2k∕q. For v with |L v | ≥ k the event {v ∉ V k } implies that one or more nonregular sets have been marked during the exploration. Then either the first marked nonregular set D i s has index s satisfying s ≤k (we denote this event  k ) or we have s >k. In the latter case there are at leastk − k + 2 void sets D i l with l ≤k (this event we denote  k ). Indeed, on the event  k ∩ {v ∉ V k } ∩ {|L v | ≥ k} we have that the index s of the first observed nonregular set D i s satisfiesk < s ≤ r(k). But the inequalityk < r(k) implies that among the firstk sets from D v there are less than k − 1 nonvoid ones as each nonvoid set contributes at least one new vertex to the list. Now (95) follows from the inequalities Here, Y ∼ Bin(k,q) and (96) follows by Chebyshev's inequality. In (97) we estimated P(D i s is non regular) ≤ (k − 1)M 2 ∕(n − (k − 1)). Indeed, given H s−1 = ∪ 1≤ ≤s−1 D i , the size |D i s | = t and the event that D i s is marked by u j , the probability that D i s is non regular is the conditional probability The last fraction upper bounds the probability that Note that (h − j)∕(n − j) ≤ h∕n and h ≤ (k − 1)M and t ≤ M. Therefore the right side of (98) is at most (k − 1)M 2 ∕n. This shows (97) and we arrive to (95). It follows from (95) that for k = k n → +∞ We similarly show that The probability of the latter event is bounded by (99). Now we show that the remaining event (1). We observe 3 that event  k implies that a nonregular set D i s has been marked by some u r ∈ L v , where r < k. Next we consider two alternatives: either the index s of the first marked nonregular set D i s satisfies s ≤k (the probability of such event is upperbounded in (97) and is o(1)) or s >k. But the inequality s >k implies that at most k − 1 elements of the list L w have marked at leastk − k + 2 void sets before a nonregular set was marked. The probability of such event is upperbounded by (96) and is o(1). Finally, we observe that the events combined with (94) imply Lower bound (the first inequality of (90)). We modify a bit our exploration procedure. Given v ∈ V, we construct the list L * v = {u 1 , u 2 , … } similarly as L v above, but now each u j ∈ L * v only accepts children brought by regular sets. Moreover, not every regular set is allowed to contribute to the list L * v . Permission to contribute is granted at random. The construction of L * v is described in the algorithm A, which uses a slightly modified definition of regular set.
In the algorithm A we use the following notation. D * 1 , D * 2 , … denote the regular marked sets that were allowed to contribute to the list L * v one after another during the exploration; , s ≥ 1. We set H * 0 = {u 1 }. Furthermore, M (1) , M (2) , … denote the numbers of sets marked by u 1 , u 2 , · · · ∈ L * v respectively; M (j) t,q denotes the number of sets from D t,q marked by u j (so that t,q ). For each (t, q) ∈ A 0 we define the integer sequence m (j) t,q = m t,q ( ) − (j − 1)⌊3 ln m⌋, j ≥ 1. For integers h, t we denote, see (84), examine each D ∈ D (j) t,q : mark D whenever u j ∈ D, and if D is regular 4 then accept D with probability p *  (h, t, j), where t = |D| and where h = |H * s | refers to the union H * s of regular sets accepted so far. Color white the children of u j brought by D and insert them to the list L * v . SetD t,q to be the family of unmarked sets from the collection t,q > ⌊3 ln m⌋. Let us show that for any sequence k = k n → +∞, k n ≤ nln −2 n we have More precisely, we show that there exist integer n ′ * depending on and {h t,q , (t, q) ∈ A 0 } such that for n > n ′ * we have P(|L * v | ≥ k) ≥ P(| − | ≥ k) + r * n , where the sequence r * n = o(1) may depend on M, , { n } and |A 0 |.
We firstly show that the acceptance probability p * (h, t, j) of step 7 is well defined, that is, for sufficiently large n, m it is always less than 1. Indeed, while deciding whether to accept a regular set D in step 7 we know that |L * v | < k. Hence, the number of sets marked so far is at most as each M (j) contribute at most |A 0 |3 ln m. The number of marked regular accepted sets is even less.
Here we used the fact that the layer sizes |D * l | ≤ M, the set A 0 is finite and m∕n → ∈ (0, +∞). Now for j = O(nln −2 n) and 2 ≤ t ≤ M we have In the inequality above we use n > n ′ * . A similar reasoning shows that each time we perform step 6 the collectionD (j−1) t,q has more than m t,q (2 ) members. Indeed, we have for j < nln −2 n |D (j) In the last inequality above we use n > n ′ * . We secondly show (see (105), (106) below) that the probability that algorithm A stops for the reason (b) is negligibly small. Introduce the event t,q > 3 ln m, for some 1 ≤ j ≤ r and some (t, q) ∈ A 0 }.
If the algorithm did not stop before u j started exploring its children in the layers of size t and strength  ) . In this case we have for some constant c depending on and M To show (104) we couple M (j) t,q with binomial random variable M * j ∼ Bin(m, t∕(n − j + 1)) so that Then we apply exponential Markov inequality P(M * j > x) ≤ e −x Ee M * j with x = ⌊3 ln m⌋ and use the bound 5 Ee M * j ≤ c, where c depends on , M and the sequence { n }. If the algorithm stops before u j starts exploring its children in the layers of size t and strength q, then we set M (j) t,q ≡ 0. In the latter case (104) is obvious. For r ≤ k we obtain from (104) by the union bound that By i * (k) and r * (k) we denote the positive integers such that u k ∈ L * v is a child of u i * (k) ∈ L * v and u k is brought to the list by the set D * r * (k) . Observe that r * (k) ≤ M 1 + · · · + M i * (k) and i * (k) < k. Therefore (105) implies Here, the second relation follows from the first one. The conditioning on  i * (k) means that the algorithm has not stopped for the reason (b). We now are ready to prove (103). We consider the probability P ( of u j is defined by (107), but with t,q . Note that the total variation distance between their distributions ( Here, the second inequality is shown in (104) and the first one follows from the inequality 6 |P(A|B) − P(A)| ≤ P(B), which holds for any events A, B with P(B) > 0. Hence, we have Furthermore, we have P(|| ≥ k) ≥ P(| − | ≥ k}. Indeed, we can represent the offspring number of  as Proof of (82). We use the shorthand notation The first identity combined with (81) yield Proof of (112). We start with an outline. Denote for short k = (n). We select y and perform exploration L y (as in the proof of the upper bound of (90)). We stop the exploration after the first k elements of the list L y are discovered. Let D y denote the collection of sets marked during this exploration and H(y) = ∪ D∈D y D. We will show that |H(y)| = O P (k) (see (115) below). Next we select x. The event  x|y ∶= {x ∉ H(y)} has probability 1 − O(k)∕n = 1 − o(1) (see (116) below). We then consider the exploration L x (until the first k elements of the list L x are discovered) conditionally on the event  x|y . Using the fact that |H(y)| = O P (k) we show that the sets marked during the exploration L x do not intersect with H(y) with high probability (see (118)). Hence the event  x ∶= {|L x | ≥ k} is asymptotically independent of  y ∶= {|L y | ≥ k}. Finally, we establish (112) by approximating P( y ) and P( x | y ) = P( x )(1 + o(1)) by the survival probabilities of related branching processes. The rigorous argument below adds the details.
Recall that V k denotes the set of k-regular vertices. Denote the events wherek = 2k∕q. (100) and the fact that events Note that event  + y ∩ {|D y | >k} implies that among the firstk sets marked by L y less than k − 1 are nonvoid. The probability of such event is o(1), see (96). Since |D y | ≤k implies  y we conclude that Combining the latter bound with the obvious bound we obtain In the last step we used the inequalities Let D x = {D i 1 , D i 2 , … } denote the sets marked during the exploration L x (D i s is marked before D i s+1 ). We call D i s healthy whenever D i s ∩ H(y) = ∅. Exploration L x is called healthy if all marked sets D i s are healthy (recall that we stop marking the sets after L x collects k elements). Introduce events Next we show that Given integer 0 < t ≤ M, let D ⊂ V be a random set of size |D| = t. Assuming that D and H(y) are independent we estimate the conditional probability Now we consider the exploration L x conditionally, given the event  y ∩ x|y . The conditional probability that D i 1 marked by x is not healthy is at most (M − 1)Mk(n − 1) −1 . Here, we applied (119) and used the fact that x ∉ H(y) implies D i 1 ∉ D y and therefore D i 1 and H(y) are (conditionally) independent. Furthermore, for s = 1, 2, … , given the event that D i 1 , … , D i s are all healthy and that D i s+1 was marked by the jth element (where j < k) of the list L x , the probability that D i s+1 is not healthy is at most Here we used the fact that u j ∉ H(y) implies D i s+1 ∉ D y . By the union bound applied to  * x = ∪ 1≤s≤k {D i 1 , … , D i s−1 are healthy and D i s is not healthy}, we have

This bound implies
Furthermore, on the event  *_x the exploration L_x does not encounter H(y), and therefore L_x is determined solely by the sets D_x = {D_{i_1}, D_{i_2}, … } (which are subsets of V ⧵ H(y)). The same argument as that of (96), (97) above yields that the event

In view of the fact that the event {|D_x| ≤ k̄} ∩  *_x implies  _x and the event  _x implies  *_x, we obtain from (121) that

This relation together with (117), (120) implies (118).
In the last step of the proof of (112) we estimate

and

The first bound of (123) follows from (89), (93), (94). The second one is obtained by a similar argument, but now we perform the exploration L_x in the subgraph of G induced by the vertex set V ⧵ H(y). In particular, we set N^+_{t,q} ∼ Bin(m_{t,q}, p*_t) and Ñ^+_{t,q} ∼ Poi(m_{t,q} p*_t) in (85), (86). Here p*_t = t∕(n − k − Mk) upper bounds the probability P(u_j ∈ D) that u_j ∈ L_x = {u_1, u_2, … } with j < k marks a "healthy" random set (1)). Relation (112) follows from (114), (117), (118), (122), and (123).

We have shown (82) in the special case of = , where (n) = ln n. |C| ≤ n () + ′′_n n) ≥ 1 − o(1) follows from (82). Indeed, we can assume without loss of generality that ′′_n from (82) satisfies ′′_n n ≥ ln² n. For (n) = ln n we obtain from (75) for large n, m that

For () = 0 relation (83) follows from (124). It remains to prove (83) for () > 0. To this end we show the matching lower bound |C| ≥ ()n + o_P(n). Fix (t, q) ∈ A_0. Choose = _n > 0 such that ( − ) > 0. We select a subset D′_{t,q} ⊂ D_{t,q} of size |D′_{t,q}| = ⌊ m_{t,q}⌋ and color the sets from D′_{t,q} blue. The collection D*_0 = D_0 ⧵ D′_{t,q} is obtained from D_0 by removing the blue sets. Let G′ (respectively G*) be the overlay graph on the vertex set V defined by the collection of layers D′_{t,q} (respectively D*_0). We color the edges of G′ blue. We couple G, G′ and G* so that G = G′ ∪ G*.

Let (n) = n^{2∕3} and let B* ⊂ V be the set of vertices belonging to connected components of G* having at least (n) vertices. Clearly, there are at most n^{1∕3} such components. Given a pair of such components C′, C′′ ⊂ V, for any D ∈ D′_{t,q}, the probability that C′, C′′ are connected by a blue edge labeled D is at least , the probability that a randomly selected pair of elements of D intersects with C′ and C′′ simultaneously, is at least n^{2∕3} × n^{2∕3} ∕  . Furthermore, by Chernoff's bound (see formula (2.6) of [30]), for any c̃ > 0 the probability that there are fewer than c̃ ln n blue edges between C′ and C′′ is at most

where the random variable X⋆ has the binomial distribution Bin(⌊ m_{t,q}⌋, p⋆). Furthermore, the constant c > 0 in (125) depends on h_{t,q}, the constant c̃ > 0, and the limit of the sequence m_n∕n. Next, by the union bound, the probability that there exists a pair of components connected by fewer than c̃ ln n blue edges is at most

provided that n^{1∕3} ≥ ln² n. We let  = n^{−1∕6} and apply (82) to the set B* of vertices of G*. In view of (126) these vertices belong to the same connected component of G = G′ ∪ G* with high probability. Hence, |C| ≥ |B*| ≥ n () + o_P(n). ▪

In the next lemma, we relax condition (74) by allowing a negligible fraction of layer types (x_{n,i}, q_{n,i}) to take their values outside A.

Proof.
Given n and (x_n, q_n) we color a pair (x_{n,i}, q_{n,i}) red whenever (x_{n,i}, q_{n,i}) ∈ A. Otherwise we color (x_{n,i}, q_{n,i}) blue. The layers defined by red (blue) pairs are colored red (blue) as well. By (127), the number m_B of blue layers satisfies m_B = o(m). The overlay graphs defined by the families of red (blue) pairs are denoted by G_R (G_B). Then G = G_{(x_n, q_n)} is the union G = G_B ∪ G_R. Let C_R and C be the vertex sets of the largest components of G_R and G.
We first show that (83) holds (under the conditions of Lemma 9.2). We observe that the results (81), (82), (83) of Lemma 9.1 apply to G_R because m_B = o(m). In particular, (83) remains true with C replaced by C_R. This observation together with the simple inequality |C| ≥ |C_R| yields the lower bound |C| ≥ n () + o_P(n). To prove the matching upper bound we apply the inequality

where B^k_R is the set of vertices that belong to components of G_R of size at least k. Let us show inequality (128). We observe that each v ∈ B^k ⧵ B^k_R belongs to a component of G_R of size less than k, and this component intersects with some blue layer. Furthermore, each blue layer (being of size at most M) may intersect with at most M distinct components. Hence each blue layer may contribute at most kM vertices to B^k ⧵ B^k_R. Consequently, the blue layers altogether contribute at most m_B Mk vertices. From (75) and (128) we obtain

Choosing k = k_n → +∞ so that

we obtain |C| ≤ |B |, where x_{[M]} = x I_{{x≤M}}. Furthermore, given 0 < < 1 and Q_n = (Q_{n,1}, … , Q_{n,m}), we denote by Q^±_n = (Q^±_{n,1}, … , Q^±_{n,m}) the corresponding discretization as in (20). Let A^± denote the support of (X_{[M]}, Q^±). That is, A^± is a subset of {0, 1, … , M} × {s_0, … , s_r}, where s_0, … , s_r are the possible values of Q^±, and (t, q) ∈ A^± whenever h^±_{t,q} > 0. Here,

We consider the overlay random graphs G^+

Let us show (133). We only consider C^+_{[M]}. For (t, q) ∈ A^+ denote

Let  _n( ) denote the event that max_{(t,q)∈A^+} |h^+_{t,q} − m_{t,q}∕m| ≤ . We claim that there exists a positive sequence  ^+_n ↓ 0 such that

Indeed, for each (t, q) ∈ A^+ we have

In the very last step we use our assumption on P_n. Noting that m_{t,q} is a sum of independent Bernoulli random variables (some of them may be degenerate), we have, by Chebyshev's inequality, for any > 0,

Combining (135) and (136)

Combining this bound with (134) we obtain (133).
In the second step of the proof we let M → +∞. Denote = x*, where x* = E(X I_{{X≥2}}), and let T* be a random variable with the distribution

Note that T* has the same distribution as (77) above, but now we drop the restriction that (X, Q) has finite support. Let  be a Galton-Watson process with the offspring number Y ∼ CPoi( , (T*)). We mention that the equality of distributions (78) extends to the general setup (where we drop the restriction that (X, Q) has finite support). Consequently, the respective survival probabilities () and (f_+) are the same.
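Numerically, a survival probability of this kind can be obtained as one minus the smallest fixed point of the offspring probability generating function. The sketch below is only an illustration under an assumed parametrization of the compound Poisson offspring (a Poisson number of contributing layers, each contributing an independent Binomial(t − 1, q) count); the rate 1.5 and the pair t = 3, q = 0.4 are arbitrary choices, not values taken from the text.

```python
import math

def survival_probability(lam, increment_pgf, tol=1e-12, max_iter=10_000):
    """Survival probability of a Galton-Watson process whose offspring number is
    compound Poisson: a Poisson(lam) number of layers, each contributing an
    independent count with probability generating function `increment_pgf`.

    The offspring pgf is g(s) = exp(lam * (increment_pgf(s) - 1)); the
    extinction probability is its smallest fixed point, found by iterating g
    from s = 0.
    """
    s = 0.0
    for _ in range(max_iter):
        s_next = math.exp(lam * (increment_pgf(s) - 1.0))
        if abs(s_next - s) < tol:
            return 1.0 - s_next
        s = s_next
    return 1.0 - s

# Hypothetical increment: a layer of size t = 3 and strength q = 0.4 contributes
# a Binomial(t - 1, q) number of children.
t, q = 3, 0.4
rho = survival_probability(lam=1.5, increment_pgf=lambda s: (1.0 - q + q * s) ** (t - 1))
print(f"survival probability approx {rho:.4f}")
```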
To prove the first relation of Theorem 3.4 we show that |C| = n () + o_P(n).
We have, by continuity, that (_M) → () as M → +∞. Now the second identity of (131) together with (132) yields the lower bound |C| ≥ n () + o_P(n) as n → +∞. Indeed, given > 0 we choose large

Then for any > 0 and n we have, by Markov's inequality, that

Hence (139) holds uniformly in n. Finally, combining (137), (138) and invoking the simple inequality ^{(k)}(_M) ≤ ^{(k)}(), we obtain, for any k, M, as n, m → +∞,

where o_P(n) depends on k and M. Now, by choosing k large, we can make ^{(k)}() arbitrarily close to (). Then, by choosing M large, we can make kZ_{M,n} arbitrarily close to zero whp. In this way we obtain the desired upper bound |C| ≤ n () + o_P(n).

For the reader's convenience we explain these steps in more detail. From the fact that m∕n → we conclude that the sequence {m∕n} is bounded, that is, m∕n < C⋆ for some C⋆ > 0 and all n (recall that m = m_n). Given > 0 we choose (sufficiently large) k such that ^{(k)}() ≤ () + . Then we choose (sufficiently large) M such that M < 2∕(C⋆ k). Now (137), (138) yield |C| ≤ n () + n + kmZ_{M,n} + k + n r_{k,M,n}, where n r_{k,M,n} stands for the remainder o_P(n) in (138). In particular, r_{k,M,n} = o_P(1) as n → +∞. Consequently, P(|C| > n () + 4 n) ≤ P(kmZ_{M,n} > n) + P(k > n) + P(r_{k,M,n} > ).

PERCOLATION MODELS
Proof of Theorem 3.5. The site-percolated graph Ǧ is an instance of the overlay graph model (1) with ň = |S| nodes and m layers Ǧ_1, … , Ǧ_m, where Ǧ_k is the subgraph of G_k induced by the vertices belonging to S, and G_1, … , G_m are the original layers generating the overlay graph G. The layer types (X̌_k, Q_k) in the site-percolated model are mutually independent, and (X̌_k | X_k = x_k) is hypergeometric with probability mass function

We consider the sequence of site-percolated graphs (Ǧ^{(n)}) defined by a sequence of overlay graphs (G^{(n)}) and sets (S_n). For each n, the site-percolated graph Ǧ^{(n)} is an instance of the overlay model with |S_n| = ň = ň_n nodes, m = m_n layers, and averaged layer type distribution P̌_n involving the hypergeometric kernel Hyp(n, ň, X_{n,·})(t), where the random vector (X_{n,·}, Q_{n,·}) has the distribution P_n. Let (X̌, Q̌) be a random variable with the distribution P̌. To prove the weak convergence P̌_n →_w P̌ we show for every (t, s) ∈ Z_+ × [0, 1] that

and

.
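To make the hypergeometric thinning concrete, the following sketch restricts every layer to a uniformly random retained vertex set S and records the surviving layer sizes. The helper name, the layer representation, and the parameters n = 1000, ň = 400, x = 10 are hypothetical illustration choices, not quantities from the theorem.

```python
import random
from collections import Counter

def site_percolate(layers, n, n_keep, rng=random):
    """Site percolation: keep a uniformly random set S of n_keep vertices out of
    {0, ..., n-1} and restrict every layer to S.  Given an original layer of
    size x, the retained size |D ∩ S| is Hypergeometric(n, n_keep, x)."""
    S = set(rng.sample(range(n), n_keep))
    return S, [(members & S, q) for members, q in layers]

# Empirical check of the layer-size thinning (arbitrary parameters).
n, n_keep, x = 1000, 400, 10
sizes = Counter()
for _ in range(2000):
    D = frozenset(random.sample(range(n), x))
    _, thinned = site_percolate([(D, 0.5)], n, n_keep)
    sizes[len(thinned[0][0])] += 1
print(sorted(sizes.items()))  # mass concentrates near x * n_keep / n = 4
```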
Next we observe that (P̌_n)_{10} = (ň∕n)(P_n)_{10} → (P̌)_{10} = (P)_{10}, and m∕ň → ̌ = ^{−1} . Theorem 3.5(i)-(ii) now follow by applying Theorems 3.1 and 3.4 to Ǧ^{(n)} and noting that ̌(P̌)_{10} = (P)_{10}. Now assume that (P_n)_{rs} → (P)_{rs} ∈ (0, ∞) for rs = 21, 32, 33. A direct computation shows that (P̌)_{rs} = ^r (P)_{rs}. Theorem 3.5(iii) now follows by applying Theorem 3.2 to conclude that the clustering coefficient (Ǧ^{(n)}) converges to

(P̌)_{33} ∕ ((P̌)_{32} + ̌(P̌)²_{21}) = (P)_{33} ∕ ((P)_{32} + (P)²_{21}) = .

Theorem 3.5(iv) follows similarly from Theorem 3.3. ▪

Proof of Theorem 3.6 for the layerwise bond-percolated graph G̃^{(n)}. The graph G̃^{(n)} is an instance of the overlay model with n nodes and m layers G̃_{n,1}, … , G̃_{n,m}, where G̃_{n,k} has size X_{n,k} and strength Q_{n,k}. The layers (G̃_{n,k}, X_{n,k}, Q_{n,k}) are mutually independent, with averaged layer type distribution P̃_n converging according to P̃_n →_w P̃ and (P̃_n)_{10} → (P̃)_{10}. Furthermore, a direct computation shows that (P̃)_{rs} = ^s (P)_{rs}. Statements (i)-(ii) of Theorem 3.6 now follow by Theorems 3.1 and 3.4, noting that (P̃)_{10} = (P)_{10}. Statements (iii)-(iv) follow analogously by Theorems 3.2 and 3.3.

Proof of Theorem 3.6 for the overlay bond-percolated graph Ĝ^{(n)}. In the proof we use a coupling argument. We will utilize the fact that the overlay bond-percolated graph does not differ much from the layerwise bond-percolated graph G̃^{(n)}, for which the theorem has already been proved. The conditional distribution of Ĝ^{(n)} given the layers (G_{n,k}, X_{n,k}, Q_{n,k}) is an inhomogeneous Bernoulli graph on {1, … , n} where each node pair ij is linked with probability p̂_{ij} = (M_{ij} ∧ 1), where M_{ij} = ∑_k I(E(G_{n,k}) ∋ ij) is the number of layers linking the node pair ij. The corresponding conditional distribution of G̃^{(n)} is a similar inhomogeneous Bernoulli graph with link probabilities p̃_{ij} = 1 − (1 − )^{M_{ij}}. Because p̂_{ij} ≤ p̃_{ij}, this suggests the following coupling construction:
In the particular case where the layer sizes are bounded, that is, there exists a constant M > 0 such that

the right-hand side of (144) is o(1) as n → +∞. Now (144) together with the weak convergence (shown above) of the degree distribution of G̃^{(n)} to CPoi( (P)_{10}, Bin_{10}(P)) yields the weak convergence of the degree distribution of Ĝ^{(n)} to CPoi( (P)_{10}, Bin_{10}(P)). Finally, to treat the general case we remove the boundedness condition (145) in the same way as in the proof of Theorem 3.1. The proof of Theorem 3.6(i) is complete.
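One elementary way to realize a monotone coupling of the two conditional link indicators discussed above, given the multiplicities M_ij, is to draw a single uniform variable per node pair, as in the sketch below. The function name, the dictionary representation of M_ij, and the retention value 0.5 are hypothetical; the sketch illustrates the monotonicity p̂_ij ≤ p̃_ij and is not the construction (142) itself.

```python
import random

def couple_bond_percolations(multiplicity, retain_p, rng=random):
    """Monotone coupling of the overlay and layerwise bond-percolated graphs.

    `multiplicity` maps a node pair (i, j) to M_ij, the number of layers linking
    it.  The overlay-percolated graph keeps the pair with probability
    p_hat = retain_p * min(M_ij, 1), the layerwise-percolated graph with
    probability p_tilde = 1 - (1 - retain_p) ** M_ij.  Since p_hat <= p_tilde,
    a single uniform per pair yields edge sets with E_hat a subset of E_tilde.
    """
    E_hat, E_tilde = set(), set()
    for pair, m_ij in multiplicity.items():
        p_hat = retain_p * min(m_ij, 1)
        p_tilde = 1.0 - (1.0 - retain_p) ** m_ij
        u = rng.random()
        if u < p_hat:
            E_hat.add(pair)
        if u < p_tilde:
            E_tilde.add(pair)
    return E_hat, E_tilde

# Hypothetical usage: pair (1, 2) covered by three layers, pair (1, 3) by one.
E_hat, E_tilde = couple_bond_percolations({(1, 2): 3, (1, 3): 1}, retain_p=0.5)
assert E_hat <= E_tilde
```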
Hence (Ĝ^{(n)}) = (G^{(n)}) and the claim follows by applying Theorem 3.2 to the nonpercolated model.

Proof of Theorem 3.6(iv). The proof is similar to that of Theorem 3.3, but it is a bit more technical. For the reader's convenience we give it in the Appendix (Section A) below.
Proof of Theorem 3.6(ii). Let Ñ_1 and Ñ_2 (respectively, N̂_1 and N̂_2) denote the numbers of vertices of the largest and second largest components of G̃^{(n)} (respectively, Ĝ^{(n)}). Let ̃ = (f̃_+) denote the survival probability of the Galton-Watson branching process with the offspring distribution f̃_+.
We observe that the coupling (142) yields a coupling of N̂_1 and Ñ_1 such that Pr(N̂_1 ≤ Ñ_1) = 1. Furthermore, an application of Theorem 3.4 to the sequence of overlay graphs (G̃^{(n)}) yields the approximation Ñ_1 = ̃n + o_P(n). These two facts taken together imply the upper bound N̂_1 ≤ ̃n + o_P(n). To show the matching lower bound N̂_1 ≥ ̃n + o_P(n) and the bound N̂_2 = o_P(n) we use the same argument as that of the proof of Theorem 3.4. The only place where we need a minor modification of the argument is the proof of Lemma 9.1. In what follows we review the proof of Lemma 9.1 and pinpoint the changes that need to be made.
At this point we need some notation. Let Ĉ, Ĉ_v, B̂_k and C̃, C̃_v, B̃_k denote the largest component, the component containing the vertex v, and the set of vertices belonging to components of size at least k in Ĝ^{(n)} and G̃^{(n)}, respectively. Recall the notation = x* and x* = E(X I_{{X≥2}}), and let T̃* be a random variable with the distribution

which is obtained from (77) by attaching the factor to Q. We note the equality of distributions f̃_+ = CPoi( , (T̃*)), which is shown by the same argument as (30) above. In particular, ̃ is the survival probability of the Galton-Watson branching process with the offspring distribution CPoi( , (T̃*)). We claim that the results (81), (82), (83) of Lemma 9.1 hold true if we replace C, C_v, B_k, () by Ĉ, Ĉ_v, B̂_k, ̃, respectively. In the proof we use the fact that the coupling Ĝ^{(n)} ⊂ G̃^{(n)} implies the couplings Ĉ ⊂ C̃, Ĉ_v ⊂ C̃_v, B̂_k ⊂ B̃_k, and that the results of Lemma 9.1 hold true for C̃, C̃_v, B̃_k, ̃.
Proof of (81). We apply Lemma 9.1 to G̃^{(n)} and obtain the upper bound of (81) via the coupling Ĉ_v ⊂ C̃_v:

The corresponding lower bound is obtained by the same argument as in the proof of the respective result of Lemma 9.1 (note that the regular exploration will not detect any difference between the coupled graphs Ĝ^{(n)} ⊂ G̃^{(n)}).

Proof of (82). In the proof we take a shortcut (compared to the original argument of Lemma 9.1) while establishing the analogue of the main intermediate inequality (112). For the overlay graph G̃^{(n)} the inequality (112) holds, and it reads as follows:
This is the analogue of (112) that yields (82) for Ĝ^{(n)}. The rest of the proof of (82) goes through without changes.

Proof of (83). The upper bound follows from the respective upper bound for |C̃| combined with the coupling Ĉ ⊂ C̃. To show the matching lower bound we use the coupling Ĝ^{(n)} ⊂ G̃^{(n)} and slightly modify the corresponding argument of the proof of Lemma 9.1. Recall that Ĝ^{(n)} is obtained from G̃^{(n)} by deleting certain edges at random, and the probability of deletion is at most 1 − p. We call this process thinning. Let Ĉ′, Ĉ′′ be connected components of G̃^{(n)} that have at least n^{2∕3} vertices each. We show that for any pair of such components Ĉ′, Ĉ′′ there is at least one blue edge of G̃^{(n)} (blue edges are defined in the proof of Lemma 9.1 when applied to G̃^{(n)}) connecting Ĉ′ and Ĉ′′ that has not been removed when we intersect H* with G̃^{(n)} to get Ĝ^{(n)} = G̃^{(n)} ∩ H* in our coupling construction (142). For this purpose we choose c̃ = 10^{−1} in (125). Now the probability that all blue edges of G̃^{(n)} connecting a given pair Ĉ′, Ĉ′′ have been removed by the intersection of G̃^{(n)} with H* is at most (1 − )^{c̃ ln n} ≤ n^{−10}. As the number of pairs does not exceed $\binom{\lceil n^{1/3}\rceil}{2}$, the probability that at least one pair Ĉ′, Ĉ′′ is not connected by a blue edge is at most $\binom{\lceil n^{1/3}\rceil}{2} n^{-10} = o(1)$, by the union bound. The rest of the proof of the lower bound of (83) goes through without changes. Hence (83) holds. □

SUPPLEMENTARY RESULTS
Recall the overlay graph G defined by (1). Given A ⊂ [m], we consider the subgraph G_A ⊂ G on the vertex set V(G_A) = V(G) = {1, … , n} defined by the layers (G_a, X_a, Q_a), a ∈ A. Thus, the edge set of G_A is E(G_A) = ∪_{a∈A} E(G_a). In the following two results, we denote by N_A the set of neighbors of a given node i in G_A, and we write D_A = |N_A| for the degree of i in G_A.
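For illustration, the neighbourhood N_A and degree D_A of a node i in the layer-induced subgraph G_A can be sampled as in the following sketch; the representation of layers as (vertex set, strength) pairs and the toy layers below are hypothetical choices, not part of the formal definition.

```python
import random

def layer_subgraph_neighbors(i, layers, A, rng=random):
    """Sample the neighbour set N_A and the degree D_A of node i in the subgraph
    G_A spanned by the layers indexed by A: a link {i, j} appears if some layer
    a in A contains both endpoints and the Bernoulli(q_a) link i-j is realised."""
    N = set()
    for a in A:
        members, q = layers[a]
        if i in members:
            N |= {j for j in members if j != i and rng.random() < q}
    return N, len(N)

# Toy usage with three small hypothetical layers.
layers = [({1, 2, 3}, 0.7), ({2, 3, 4, 5}, 0.3), ({1, 5}, 1.0)]
N_A, D_A = layer_subgraph_neighbors(1, layers, A=[0, 2])
print(N_A, D_A)
```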
Lemma 11.1. Let g be a probability density on Z_+. Let

Hence, it follows that | (t)| ≤ P(D_A ≤ t,  _A,  _B,  ), where the upper bound can be expressed as

Because P(|U ∩ N_B| > 0,  _B) ≤ ∑_{j∈U} P(ij ∈ E(G_B),  _B) ≤ c_B t whenever |U| ≤ t, the inequality | (t)| ≤ c_B t P(D_A ≤ t,  _A) follows.
We split the sum ∑_k and upper bound the conditional probability P( _k, ′_k > 0) using the union bound and symmetry,

Here, (X_k)_2 Q_k

We estimate S′′_1 similarly. The event ′′_k > 0 means that at least one of the links 12 and 13 is present in G_{−k}. Hence (P_n) 3 21 = 2Δ_{21}. Finally, we show S_2 ≤ Δ_{43} + 2Δ_{32} in much the same way as (A12) above. Collecting the bounds for S′_1, S′′_1, and S_2 we obtain (A13). ▪