The threshold for jigsaw percolation on random graphs

Jigsaw percolation is a model for the process of solving puzzles within a social network, which was recently proposed by Brummitt, Chatterjee, Dey and Sivakoff. In the model there are two graphs on a single vertex set (the 'people' graph and the 'puzzle' graph), and vertices merge to form components if they are joined by an edge of each graph. These components then merge to form larger components if again there is an edge of each graph joining them, and so on. Percolation is said to occur if the process terminates with a single component containing every vertex. In this note we determine the threshold for percolation up to a constant factor, in the case where both graphs are Erdős-Rényi random graphs.


Introduction
Jigsaw percolation is a dynamical percolation model on finite graphs, which was proposed by Brummitt, Chatterjee, Dey and Sivakoff [2] as a tool for the study of sequences of interactions within a social network that enable a group of individuals to collectively solve a problem. In the model there are two edge sets defined on a common set of vertices, and, at discrete times, clusters of vertices merge to form larger clusters if they are joined by at least one edge of each type. Before expanding on the motivation for the model, let us give the formal definition. We write [n] for {1, 2, . . . , n}.

Definition 1. For i = 1, 2, let E_i ⊆ [n]^(2) be a set of pairs of elements of V := [n]. Let G be the ordered triple G := (V, E_1, E_2); we call this object a double graph. Jigsaw percolation with input G evolves at discrete times t = 0, 1, . . . according to the following algorithm. At time t there is a partition C_t = {C^1_t, . . . , C^{k_t}_t} of the vertex set [n], which is constructed inductively as follows:
(1) We take k_0 = n and C^i_0 = {i} for all 1 ≤ i ≤ n. That is, at time 0 we begin with every vertex in a separate set of the partition.
(2) At time t ≥ 0, construct a graph G_t on vertex set C_t by joining C^i_t to C^j_t if there exist edges e_1 ∈ E_1 and e_2 ∈ E_2 such that e_ℓ ∩ C^k_t ≠ ∅ for each of the four choices of ℓ ∈ {1, 2} and k ∈ {i, j}.
(3) If E(G_t) = ∅, then STOP. Otherwise, construct the partition C_{t+1} = {C^1_{t+1}, . . . , C^{k_{t+1}}_{t+1}} corresponding to the connected components of G_t, so each part C^i_{t+1} is a union of those parts of C_t corresponding to a component of G_t.
(4) If |C_{t+1}| = 1 then STOP. Otherwise, set t := t + 1 and go to step (2).
Since |C_t| is strictly decreasing while the process continues, the algorithm terminates after at most n steps. We denote the final partition by C_∞ = {C^1_∞, . . . , C^{k_∞}_∞}. We say that there is percolation, or that the double graph is solved, if C_∞ = {V}, i.e., if we stop in step (4).
Less formally, the jigsaw percolation algorithm begins with each vertex considered a separate cluster, and proceeds by merging, at each step, clusters of vertices joined by at least one edge from E 1 and at least one edge from E 2 .
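To make the dynamics concrete, here is a minimal Python sketch of the algorithm of Definition 1. This is our own illustration, not code from the paper; the function name and the edge representation (2-element frozensets) are arbitrary choices.

```python
def jigsaw_percolates(n, red_edges, blue_edges):
    """Run the jigsaw percolation algorithm of Definition 1.

    Vertices are 0, ..., n-1; red_edges and blue_edges are collections of
    2-element frozensets.  Returns True if the final partition is {V},
    i.e. if the double graph percolates.
    """
    cluster = list(range(n))              # cluster[v] = label of v's cluster

    while True:
        # Pairs of distinct clusters joined by at least one edge of a colour.
        def cluster_pairs(edges):
            return {frozenset((cluster[u], cluster[v]))
                    for u, v in (tuple(e) for e in edges)
                    if cluster[u] != cluster[v]}

        joined = cluster_pairs(red_edges) & cluster_pairs(blue_edges)
        if not joined:                    # no edge in the auxiliary graph G_t
            return len(set(cluster)) == 1

        # Merge the connected components of G_t with a tiny union-find
        # over cluster labels.
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for a, b in (tuple(p) for p in joined):
            parent[find(a)] = find(b)
        cluster = [find(c) for c in cluster]
```

For instance, if both colours form the path 0-1-2 on three vertices, the double graph is solved; by contrast, a connected red graph and an edge-disjoint connected blue graph produce no merge at all, since no pair of singleton clusters is joined in both colours.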
Let us mention in passing a superficially similar, but very different, percolation model on double graphs introduced by Buldyrev, Parshani, Paul, Stanley and Havlin [3] in 2010. The set-up is the same, but one defines the partition in a top-down way, finding the maximal sets of vertices connected in both graphs (V, E_1) and (V, E_2) ('mutually connected clusters'). To see the difference, note that if these graphs are edge-disjoint and connected, then in this model V forms a single 'mutually connected cluster', whereas in jigsaw percolation the algorithm stops where it starts, with a partition into singletons.
Returning to the jigsaw model, which we consider throughout this paper, Brummitt, Chatterjee, Dey and Sivakoff [2] suggest that jigsaw percolation may be a suitable model for analysing how a puzzle may be solved by collaboration between individuals in a social network. The premise is that each individual has a 'piece' of the puzzle, and that these 'pieces' must be combined in a certain way in order to solve the puzzle. The process of solving the puzzle is constrained by the social network of the individuals concerned. To model this, the authors of [2] suggest that one of the graphs, G_1 := (V, E_1) say (which they call the people graph, and which we call the red graph), could represent the graph of acquaintances, and that the other graph, G_2 := (V, E_2) (which they call the puzzle graph, and which we call the blue graph), could represent the 'compatibility' between pairs of 'pieces' of the puzzle. The jigsaw percolation algorithm thus represents the merging of 'compatible puzzle pieces' by groups of connected individuals. For an in-depth account of the applications of the model to social networks, we refer the reader to the original article [2].
In [2], the authors prove a number of necessary and sufficient conditions for percolation when the red graph G_1 is the Erdős-Rényi random graph G(n, p) with n vertices and edge probability p, and the blue graph G_2 is a deterministic graph, such as the n-cycle or another graph of bounded maximum degree. For example, they show that there is an absolute constant c > 0 such that if, for each n ∈ N, G^n_2 is an arbitrary connected graph on vertex set [n], and G^n_1 is an Erdős-Rényi graph with edge probability p ≥ c/log n, then the double graph G^n := ([n], E^n_1, E^n_2) percolates with high probability as n → ∞. On the other hand, if the graphs G^n_2 have bounded maximum degree and instead p ≤ n^{−ε} for some ε > 0, then with high probability the double graph does not percolate.
Gravner and Sivakoff [5] observe that for certain deterministic graphs G_2, the jigsaw percolation model behaves similarly to bootstrap percolation on the grid [n]^2, and they use techniques from bootstrap percolation to prove tight bounds in certain cases. For example, if G_1 = G(n, p) and G_2 = C_n is the n-cycle, they determine the critical probability p^{C_n}_c(n) for the corresponding double graph G up to a factor 1 + o(1): they show that p^{C_n}_c(n) log n → π^2/6 as n → ∞. (This critical probability, and specifically the constant π^2/6, will be known to readers who are familiar with bootstrap percolation: it is also (see [6]) the critical probability for the so-called 'modified' bootstrap percolation model on [n]^2; this is, of course, not a coincidence (see [5] for the details).)

In this note we study the case where both underlying graphs are Erdős-Rényi random graphs. In order to state our result, we need a little more notation. For the rest of the paper, we shall take G_1 = (V, E_1) and G_2 = (V, E_2) to be independent Erdős-Rényi random graphs with the same vertex set V = [n], with edge probabilities p_1 and p_2 respectively, and we take G = ([n], E_1, E_2). A first trivial observation is that if the double graph G is to percolate then both G_1 and G_2 must be connected. We shall ensure this by assuming

min{p_1, p_2} ≥ c (log n)/n (1)

for a sufficiently large absolute constant c. Under this condition, in this note we determine the critical value p_c(n) of the product p_1 p_2 up to a constant factor. More precisely, we show that under the assumption (1), if p_1 p_2 ≤ 1/(c n log n) then percolation is very unlikely, and if p_1 p_2 ≥ c/(n log n) then percolation is very likely.

Theorem 2. There is an absolute constant c > 0 such that the following holds. Suppose that G_1 = G(n, p_1) and G_2 = G(n, p_2) are independent Erdős-Rényi random graphs on the same set of n vertices, where p_1 and p_2 are functions of n, and let G be the corresponding double graph.
(i) If p_1 p_2 ≤ 1/(c n log n), then with high probability G does not percolate.
(ii) If p_1 p_2 ≥ c/(n log n) and min{p_1, p_2} ≥ c (log n)/n, then with high probability G percolates.
Informally, this result says that the critical value of the product p_1 p_2 is of order 1/(n log n). The proof of the lower bound in Theorem 2 is straightforward and follows from standard methods; the real content of this paper is the proof of the upper bound.

Proof of part (i) of Theorem 2
Here we present the very brief proof of part (i) of Theorem 2; really it is no more than the argument used by Aizenman and Lebowitz [1] to derive the lower bound, up to a constant factor, on the critical probability for 2-neighbour bootstrap percolation on [n]^2.
The percolation process defined above can be broken down into smaller steps, in each of which two clusters (parts of the current partition) merge: specifically, one can modify step (3) of Definition 1 to merge an arbitrary pair of parts joined in the graph G_t, rather than entire connected components. Since we start with a partition into singletons, considering the first step at which a cluster of size at least log n appears, it follows that if G = ([n], E_1, E_2) percolates, then there is some set A of at least log n but at most 2 log n vertices such that the red and blue graphs restricted to A are both connected. Using the independence of the red and blue graphs, the facts that a connected graph must contain a spanning tree and that there are k^{k−2} labelled trees on k vertices, and the bound binom(n, k) ≤ (en/k)^k, we see that the probability of this event is at most

Σ_{k=log n}^{2 log n} binom(n, k) k^{2(k−2)} (p_1 p_2)^{k−1} ≤ Σ_{k=log n}^{2 log n} n (e n k p_1 p_2)^{k−1}.

For p_1 p_2 ≤ 1/(e^4 n log n), say, the quantity in brackets is at most 1/e^2 and it follows that the final bound is o(1), proving (i).
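To get a feeling for the size of this first-moment bound, the following short script evaluates it numerically. This is a sanity check of ours, not a computation from the paper; the estimate C(n, k) ≤ (en/k)^k is used in place of the exact binomial coefficient.

```python
import math

def first_moment_bound(n, p_product):
    """Upper bound on the probability that some set A with
    log n <= |A| <= 2 log n induces a connected graph in both colours:
    the sum over k of C(n,k) * k^(2(k-2)) * (p1*p2)^(k-1), with C(n,k)
    estimated by (en/k)^k as in the proof of part (i)."""
    total = 0.0
    for k in range(math.ceil(math.log(n)), math.floor(2 * math.log(n)) + 1):
        # (en/k)^k * k^(2k-4) * q^(k-1)  =  e*n/k^3 * (e*n*k*q)^(k-1)
        total += math.e * n / k**3 * (math.e * n * k * p_product) ** (k - 1)
    return total
```

For n = 10^6 and p_1 p_2 = 1/(e^4 n log n) the bound is already minuscule, in line with the o(1) conclusion above.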
We can now move on to the main part of this paper: the second part of Theorem 2.

Proof of part (ii) of Theorem 2
Let us begin with a small number of conventions. As already mentioned, G_1 and G_2 will always be independent Erdős-Rényi random graphs on vertex set V := [n], with densities p_1 and p_2 respectively; without loss of generality we take p_1 ≤ p_2. Throughout, we assume that c is a sufficiently large absolute constant, that the number of vertices n is sufficiently large, and that p_1 and p_2 satisfy

p_1 p_2 = c/(n log n) and min{p_1, p_2} ≥ c (log n)/n. (2)

The assumptions of Theorem 2 (ii) require only p_1 p_2 ≥ c/(n log n) rather than the equality in (2), but we may couple with smaller p_1 and p_2 if necessary so that (2) holds. Constants implicit in O(·) notation (and its variants) are independent of c (and of n). For later use, let us note some immediate consequences of (2); these follow since p_1 ≤ √(p_1 p_2) and p_2 = (p_1 p_2)/p_1 ≤ (p_1 p_2) · n/(c log n):

p_1 ≤ 1/√n and p_2 ≤ 1/(log n)^2. (3)

We need a key definition: that of an 'internally spanned' set of vertices. The definition enables one to say which sets of sites (internally) percolate, without any help from other vertices, and is motivated by several similar notions in the bootstrap percolation literature (see, for example, [1,4]). Our method for showing that G percolates will broadly take the form 'there exists a nested sequence of internally spanned sets U_1 ⊂ · · · ⊂ U_m, with U_m = [n]'; the crux will be finding such a sequence.
Definition 3. A set U ⊆ V(G) is said to be internally spanned (by G) if the double graph (U, E_1[U], E_2[U]) obtained by restricting both edge sets to U percolates. We write I(G, m) for the event that V(G) contains an internally spanned set of size at least m.
The proof of the lower bound of Theorem 2 could now be rephrased as follows. First, observe that if G percolates then, by merging components two at a time, V(G) must contain an internally spanned set of size roughly log n. Second, using well-known properties of trees, one can show that if p_1 p_2 is small then this event is unlikely to occur.
The proof of the upper bound is divided into three parts, with a corresponding division of both the red and blue edges into three subsets. In the first part of the proof we show that with high probability there is a set A of at least (log n)^{3/2} vertices which is internally spanned by the first set of (red and blue) edges. This is the core of the proof: the 'bottleneck' event to percolation (in a certain sense) is the existence of an internally spanned set of size about log n. Then we show, using the second set of edges, that with high probability the set A is contained in a set B of size n/16 which is internally spanned (with respect to the edges revealed so far). Finally, using the condition (1), we show using the third set of edges and the set B that with high probability the whole vertex set is internally spanned.
In each of the first two parts we specify an 'exploration algorithm', in which the edges of each of the underlying graphs are revealed in an order that depends on what has been observed so far. The purpose is to reveal as few edges as possible (in order that we may reveal them later if necessary) in the search for a nested sequence of internally spanned sets. The algorithms are set out explicitly in Definitions 5 and 10.
Between the three parts of the proof, independence is maintained by sprinkling: for each i = 1, 2 and j = 1, 2, 3, we take G^(j)_i to be an independent copy of G(n, p_i), where p_1 and p_2 satisfy the conditions (2) as before; we then set E_i := E^(1)_i ∪ E^(2)_i ∪ E^(3)_i. Constructing the G_i in this way maintains both conditions in (2), with a different value of c.
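For concreteness, the effect of taking the union of the three sprinklings on the edge densities can be recorded as follows (a routine observation, spelled out here for convenience):

```latex
% Each potential edge uv lies in E_i = E_i^{(1)} \cup E_i^{(2)} \cup E_i^{(3)}
% independently of all other pairs, with probability
\[
  \mathbb{P}\bigl(uv \in E_i\bigr) \;=\; 1 - (1 - p_i)^3 \;\in\; [\,p_i,\, 3p_i\,],
\]
% so the product of the two densities changes by at most a factor of 9,
% and both conditions in (2) continue to hold after adjusting the constant c.
```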
3.1. Part I. In this first part of the proof we prove the following lemma.
Lemma 4. The probability that the double graph G^(1) := ([n], E^(1)_1, E^(1)_2) contains an internally spanned set of size at least (log n)^{3/2} is at least 1 − e^{−√n}.
We prove the lemma by repeatedly attempting to build an internally spanned set of size at least (log n)^{3/2} by adding one vertex at a time to a so-called 'trial set' (we call this the 1-by-1 algorithm). If we find a suitable vertex to add to the trial set, then we continue. If not, then we discard the vertices from the trial set, and start again; we call this starting a new round. Discarding the trial set will ensure independence between rounds (see below). More precisely, the algorithm performs a sequence of 'tests', asking whether certain potential red edges or potential blue edges are present. (More precisely still, the 'test' corresponding to a pair {x, y} of vertices and i ∈ {1, 2} asks whether xy ∈ E^(1)_i.) We shall make sure that no test is performed twice.
The subtlety is the order in which we reveal the edges: the aim is to reveal as few edges as possible in the search for each new vertex. Given an internally spanned trial set X, we first reveal all red edges from (not-yet-discarded) vertices outside X to the most recently added vertex v in X. Since a potential red edge is tested immediately after the first time one of its ends is added to X, it cannot be tested twice within a round. (Looking at it from the point of view of vertices, rather than edges, the fact that a potential new vertex has been considered at the tth step, and found not to have a red edge to the most recently added vertex, does not stop us from testing the same vertex again at later steps, since in those steps we will be testing for different red edges.) Let R be the set of vertices outside X incident with such red edges. We test for blue edges, to the whole of the trial set, only from vertices in R. If there is a vertex in R with a blue edge to any vertex in the trial set, then we add one such vertex to the trial set. We discard all other vertices in R until the end of this round; this ensures that no potential blue edge is tested twice within a round.
At the end of a round we permanently discard all vertices in the trial set. Since a tested edge (red or blue) always has at least one end in the trial set, this ensures that no edge is tested in two different rounds, giving us the independence we need.
Here is a formal description of the algorithm.
Definition 5. (The 1-by-1 algorithm.) The algorithm is divided into rounds, indexed by k, and each round is divided into steps, indexed by t. At the start of the kth round there is a set A_k ⊆ [n] of active vertices and a set D_k ⊆ [n] of discarded vertices. We begin with A_1 = [n] and D_1 = ∅. The procedure for the kth round is as follows:
(1) At the start of the tth step of the kth round there is a set X^t_k = {x^1_k, . . . , x^t_k} ⊆ A_k of trial vertices, a set A^t_k ⊆ A_k of active vertices, and a set D^t_k ⊆ A_k of discarded vertices. These sets partition A_k, so for all t, A_k is the disjoint union of X^t_k, A^t_k and D^t_k. To begin, we have X^0_k = D^0_k = ∅ and A^0_k = A_k.
(2) For t = 0, move an arbitrary active vertex to the trial set. That is, pick x^1_k ∈ A^0_k arbitrarily and set X^1_k := {x^1_k}, A^1_k := A^0_k \ {x^1_k} and D^1_k := ∅; then set t := 1.
(3) Reveal all edges of G^(1)_1 (that is, all red edges from the first sprinkling) between A^t_k and {x^t_k}, and let R^t_k := {x ∈ A^t_k : x x^t_k ∈ E^(1)_1}. Then, reveal all edges of G^(1)_2 (that is, all blue edges from the first sprinkling) between R^t_k and X^t_k, and let B^t_k := {x ∈ R^t_k : x x^s_k ∈ E^(1)_2 for some 1 ≤ s ≤ t}.
(4) If B^t_k ≠ ∅, then let x^{t+1}_k be an arbitrary element of B^t_k, and set X^{t+1}_k := X^t_k ∪ {x^{t+1}_k}, A^{t+1}_k := A^t_k \ R^t_k and D^{t+1}_k := D^t_k ∪ (R^t_k \ {x^{t+1}_k}). If instead B^t_k = ∅, the round ends: set A_{k+1} := A_k \ X^t_k and D_{k+1} := D_k ∪ X^t_k, and begin round k + 1 at step (1). If t ≥ (log n)^{3/2} then STOP; otherwise set t := t + 1 and go to step (3).
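The bookkeeping in Definition 5 can be checked mechanically. Below is a Python simulation sketch (our own, with hypothetical parameter choices): edges are revealed lazily, as independent coin flips, which is faithful to the model precisely because the algorithm never performs the same (pair, colour) test twice; the `tested` log lets one verify this property at run time.

```python
import math
import random

def one_by_one(n, p1, p2, rng):
    """Simulation sketch of the 1-by-1 algorithm (Definition 5).

    Each (pair, colour) test is an independent coin flip; `tested` records
    every test performed so that the no-repeated-test property can be
    checked.  Returns (trial_set, tested), with trial_set = None if no
    round builds a set of size (log n)^(3/2).
    """
    target = int(math.log(n) ** 1.5)     # required size (log n)^(3/2)
    active = set(range(n))
    tested = []                          # log of (frozenset({u, v}), colour)
    while len(active) >= n // 2:         # crude cap on the number of rounds
        trial = [active.pop()]           # the trial set X^t_k of this round
        benched = set()                  # vertices discarded until round end
        while len(trial) < target:
            newest = trial[-1]
            # reveal red edges from active vertices to the newest trial vertex
            R = set()
            for u in active:
                tested.append((frozenset((u, newest)), 'red'))
                if rng.random() < p1:
                    R.add(u)
            # reveal blue edges from R towards the trial set (stopping at the
            # first success for each vertex; further pairs stay unrevealed)
            B = set()
            for u in R:
                for w in trial:
                    tested.append((frozenset((u, w)), 'blue'))
                    if rng.random() < p2:
                        B.add(u)
                        break
            if not B:
                break                    # B^t_k is empty: the round fails
            chosen = B.pop()             # an arbitrary element of B^t_k
            trial.append(chosen)
            benched |= R - {chosen}
            active -= R
        if len(trial) >= target:
            return trial, tested         # internally spanned set found
        active |= benched                # benched vertices become active again;
                                         # trial vertices stay discarded forever
    return None, tested
```

With densities well above the threshold (say p1 = p2 = 0.1 for n = 2000) the very first round almost always succeeds, and the log contains no duplicate tests, as the discussion above predicts.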
Before starting the analysis, let us note that since we consider at most n/(2(log n)^{3/2}) rounds, and stop each with a trial set of size at most (log n)^{3/2}, we start each round with at least half of the vertices still active:

|A_k| ≥ n/2 for every k ≤ n/(2(log n)^{3/2}). (4)

Let R^t_k be the event that round k proceeds at least as far as step t, i.e., that the set R^t_k is (defined and) non-empty. We shall show that R^t_k is not too unlikely. For technical reasons, we also need to consider the event

S^t_k := { |R^s_k| ≤ n/4t for s = 1, 2, . . . , t },

that within round k, we have not 'used up' too many vertices by step t. (Here and in what follows we ignore rounding to integers in expressions such as n/4t. This makes essentially no difference.) For k ≤ n/(2(log n)^{3/2}) and t ≥ 1, let

r^t_k := P(R^t_k ∩ S^t_k | R^{t−1}_k ∩ S^{t−1}_k),

noting that S^0_k is the trivial event that always holds. We shall need two different estimates on r^t_k, according to whether t is larger or smaller than (log n)/c.

Lemma 6. Suppose that k ≤ n/(2(log n)^{3/2}) and 1 ≤ t ≤ t_1 = (log n)^{3/2}. Then

r^t_k ≥ 1 − exp(−Ω(n p_1 p_2 t)),

and if moreover n p_1 p_2 t ≤ 1, then r^t_k = Ω(n p_1 p_2 t).

Proof. We condition on the outcome of the exploration so far, up to the start of step t of round k. Note that this information determines whether R^{t−1}_k ∩ S^{t−1}_k holds; we may assume that it does.
Given the information revealed so far, the conditional distribution of |R^t_k| is binomial, Bin(|A^t_k|, p_1). Since |A^t_k| ≤ n, we have

P(|R^t_k| > n/4t) ≤ P(Bin(n, p_1) > n/4t) ≤ exp(−Ω(n/t)),

say, where for the final inequality we used the bounds p_1 ≤ 1/√n (from (3)) and t ≤ (log n)^{3/2}, so that the mean n p_1 is much smaller than n/4t. Now, conditional on the exploration so far, for each x ∈ A^t_k we have

P(x ∈ B^t_k) = p_1 (1 − (1 − p_2)^t) ≥ p_1 p_2 t/2,

using p_2 t ≤ 1, which holds by (3). On the event R^{t−1}_k ∩ S^{t−1}_k we have |A^t_k| ≥ |A^0_k| − (t − 1) · n/4t ≥ n/4, using (4). Since the round proceeds beyond step t if and only if B^t_k ≠ ∅, we thus have

P(B^t_k ≠ ∅ | R^{t−1}_k ∩ S^{t−1}_k) ≥ 1 − (1 − p_1 p_2 t/2)^{n/4} ≥ 1 − exp(−n p_1 p_2 t/8).

The first case of the lemma now follows by combining the two bounds above, since exp(−Ω(n/t)) is much smaller than exp(−n p_1 p_2 t/8). The second case follows from the first and the inequality 1 − e^{−x} ≥ x/2, valid for 0 ≤ x ≤ 1.
In the next two lemmas, we break down the 1-by-1 algorithm into two stages: first, in Lemma 7, we show that the probability that the algorithm reaches step t_0 := (log n)/c in a given round is at least n^{−O(1)/c}. Then, in Lemma 8, we show that the probability that it reaches step t_1 = (log n)^{3/2}, given that it has reached step t_0, is also at least n^{−O(1)/c}.

Lemma 7. Suppose that k ≤ n/(2(log n)^{3/2}). Then

P(R^{t_0}_k ∩ S^{t_0}_k) ≥ n^{−O(1)/c}.

Proof. The definition of t_0 combined with the expression for p_1 p_2 from (2) implies that n p_1 p_2 t ≤ n p_1 p_2 t_0 = 1 for all t ≤ t_0, and hence the second case of Lemma 6 applies for the whole range. Thus,

P(R^{t_0}_k ∩ S^{t_0}_k) = Π_{t=1}^{t_0} r^t_k ≥ Π_{t=1}^{t_0} Ω(n p_1 p_2 t) = Ω(n p_1 p_2)^{t_0} · t_0!.

Recall that p_2 ≤ (log n)^{−2}, so p_2 t_0 ≤ (log n)^{−1} = o(1). Noting that 1 − x ≥ e^{−2x} if 0 ≤ x ≤ 1/2 and that t! ≥ (t/e)^t for all t ∈ N, it follows that the right-hand side is at least (Ω(n p_1 p_2 t_0)/e)^{t_0} = Ω(1)^{t_0} e^{−t_0}. Since p_2 t_0 ≤ 1/log n and t_0 = (log n)/c, we obtain P(R^{t_0}_k ∩ S^{t_0}_k) ≥ e^{−O(t_0)} = n^{−O(1)/c}, as required.
Lemma 8. Suppose that k ≤ n/(2(log n)^{3/2}). Then

P(R^{t_1}_k ∩ S^{t_1}_k | R^{t_0}_k ∩ S^{t_0}_k) ≥ n^{−O(1)/c}.

Proof. For t_0 < t ≤ t_1, we use the first of the two estimates in Lemma 6, the validity of which does not depend on t. Using this, and recalling that p_2 ≤ 1/(log n)^2 (from (3)) and t ≤ (log n)^{3/2}, we have

P(R^{t_1}_k ∩ S^{t_1}_k | R^{t_0}_k ∩ S^{t_0}_k) = Π_{t=t_0+1}^{t_1} r^t_k ≥ Π_{t=t_0+1}^{t_1} (1 − exp(−Ω(n p_1 p_2 t))) ≥ exp(−3 Σ_{t>t_0} exp(−Ω(n p_1 p_2 t))),

where for the final step we used the inequality 1 − x ≥ exp(−3x), valid (by convexity) for 0 ≤ x ≤ 0.9, say. Since n p_1 p_2 = c/log n, the sum in the exponent is dominated by a geometric series and is O(log n)/c. Thus,

P(R^{t_1}_k ∩ S^{t_1}_k | R^{t_0}_k ∩ S^{t_0}_k) ≥ exp(−O(log n)/c) = n^{−O(1)/c},

where the implied constant does not depend on c.
We now put the previous few lemmas together.
Proof of Lemma 4. Let k ≤ n/(2(log n)^{3/2}). Then in the kth round, the probability of finding an internally spanned set of size (log n)^{3/2} is at least n^{−O(1)/c}, by applying Lemmas 7 and 8 in turn. Moreover, this bound holds conditional on the results of all previous rounds, since in proving Lemmas 7 and 8 we conditioned on these previous rounds. Hence the probability that all n/(2(log n)^{3/2}) rounds terminate 'early' (without finding an internally spanned set of size (log n)^{3/2}) is at most

(1 − n^{−O(1)/c})^{n/(2(log n)^{3/2})} ≤ exp(−n^{1−O(1)/c}/(2(log n)^{3/2})) ≤ e^{−√n},

if c is sufficiently large.
3.2. Part II. In this part of the proof we prove the following lemma.
Lemma 9. Given that G^(1) contains an internally spanned set of size at least (log n)^{3/2}, the conditional probability that G^(1) ∪ G^(2) contains an internally spanned set of size at least n/16 is at least 1 − n^{−100}. That is,

P( I(G^(1) ∪ G^(2), n/16) | I(G^(1), (log n)^{3/2}) ) ≥ 1 − n^{−100}. (6)

In this range we use a different vertex exploration algorithm in order to find successively larger internally spanned sets. Rather than adding vertices 1-by-1, as in Part I, we attempt to double the size of the trial set at each step. We start with a set X_0 of size t_1 = (log n)^{3/2} internally spanned by G^(1) (found in Part I), and we continue so that at step t we have a set X_t of size x_t := 2^t t_1 internally spanned by G^(1) ∪ G^(2). In order to maintain independence between steps, we only add a new vertex v to the trial set if there is at least one edge of each colour from G^(2) joining v to the subset of the trial vertices that was added at the previous step.
Definition 10. (The doubling algorithm.) At the start of the tth step there is a set X t of vertices internally spanned by G (1) ∪ G (2) , where |X t | = x t . The set X t is the trial set. The set A t := V (G) \ X t of remaining vertices in the graph is the active set. The algorithm takes as its inputs the double graphs G (1) and G (2) , and a set X 0 of size (log n) 3/2 , internally spanned by G (1) .
(1) At step t ≥ 0, reveal all edges of G^(2) between A_t and X_t \ X_{t−1}, where we set X_{−1} := ∅. Let B_t be the set of vertices of A_t joined to X_t \ X_{t−1} by at least one edge of G^(2)_1 and at least one edge of G^(2)_2. Thus, B_t is the set of active vertices joined to X_t \ X_{t−1} by an edge of each colour from the second sprinkling.
(2) If |B_t| < x_t then STOP. Otherwise, let C_t ⊆ B_t be an arbitrary set of exactly x_t vertices of B_t, and set X_{t+1} := X_t ∪ C_t and A_{t+1} := A_t \ C_t. If |X_{t+1}| ≥ n/16 then STOP, otherwise set t := t + 1 and go to step (1).
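As with the 1-by-1 algorithm, the doubling step is easy to simulate. The following Python sketch (our own; the starting set and parameters are hypothetical stand-ins) reveals the second-sprinkling edges lazily as independent coin flips, which is legitimate because the sets X_t \ X_{t−1} are disjoint over time, so no pair is ever tested twice.

```python
import random

def doubling(n, p1, p2, X0, rng):
    """Simulation sketch of the doubling algorithm (Definition 10).

    X0 stands in for a set already internally spanned by the first
    sprinkling.  Each potential edge between an active vertex and a newly
    added trial vertex is revealed as an independent coin flip.  Returns
    the trial set once it has size at least n/16, or None on failure.
    """
    X = list(X0)
    newest = list(X0)                  # X_t \ X_{t-1}; initially X_0 itself
    active = set(range(n)) - set(X0)
    while len(X) < n // 16:
        x_t = len(X)
        # B_t: active vertices with at least one red and one blue edge
        # into the newly added part of the trial set
        B = [v for v in active
             if any(rng.random() < p1 for _ in newest)
             and any(rng.random() < p2 for _ in newest)]
        if len(B) < x_t:
            return None                # |B_t| < x_t: the step failed, STOP
        newest = B[:x_t]               # C_t: exactly x_t vertices of B_t
        X.extend(newest)
        active -= set(newest)
    return X
```

With densities comfortably above the threshold, each doubling step succeeds with overwhelming probability, mirroring Lemma 11 below.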
First we need a lower bound on the probability that the size of the trial set doubles at step t.
Lemma 11. The probability that |B_t| ≥ x_t, conditional on the doubling algorithm having reached the tth step (that is, X_t ≠ ∅), is at least 1 − exp(−Ω(x_t)).

Proof. Let s_t be the probability in question, so s_t = P(|B_t| ≥ x_t | X_t ≠ ∅). Let q_{t,i} be the probability that a vertex v ∈ A_t is joined to X_t \ X_{t−1} = C_{t−1} by at least one edge of G^(2)_i. Since |C_{t−1}| = x_t/2 for t ≥ 1, while |C_{−1}| = x_0, we have

q_{t,i} = 1 − (1 − p_i)^{x_t/2} ≥ min{p_i x_t/4, 1/2}

for t ≥ 1, and similarly for t = 0. By the definition of the doubling algorithm, we have |X_t| ≤ n/16 (otherwise we would have stopped), so there are at least 15n/16 ≥ n/2 vertices v ∈ A_t. The events that individual vertices are in B_t are independent (because we do not 'retest' edges). Hence if Z is a random variable with the Bin(n/2, q_{t,1} q_{t,2}) distribution, then s_t ≥ P(Z ≥ x_t). Since n p_1 p_2 = c/log n and x_t ≥ (log n)^{3/2}, in each case one checks that E Z ≥ 2x_t, and the lemma follows by a standard Chernoff bound.
Lemma 9 is now immediate.

Proof of Lemma 9. Let t_2 be maximal such that x_{t_2} ≤ n/16, which in particular implies that t_2 = O(log n). By Lemma 11, the left-hand side of (6) is at least

Π_{t=0}^{t_2} (1 − exp(−Ω(x_t))) ≥ 1 − Σ_{t=0}^{t_2} exp(−Ω(2^t (log n)^{3/2})),

which is certainly at least 1 − n^{−100}, as required.
3.3. Part III. It remains to show that if G (1) ∪ G (2) contains an internally spanned set X of size at least n/16 then G = G (1) ∪ G (2) ∪ G (3) is internally spanned with high probability. But this is trivial: using the final sprinkle, i.e., the edges of G (3) , every vertex v ∈ [n] \ X is joined to X by both a red edge and a blue edge, with high probability.
Proof of Theorem 2. As noted above, it remains only to prove (ii), and in doing so we may assume (2). By Lemmas 4 (Part I) and 9 (Part II), G^(1) ∪ G^(2) contains an internally spanned set of size at least n/16 with probability at least 1 − 2n^{−100}. Let X be such a set. Then the probability that there is any vertex of [n] \ X not joined to X by an edge of each colour of G^(3) is at most

2n (1 − min{p_1, p_2})^{n/16} ≤ 2n · n^{−c/16} = o(1),

using the assumption (1) that min{p_1, p_2} ≥ c (log n)/n. Hence with high probability every vertex of [n] \ X can be merged, one at a time, with the internally spanned set X, so that [n] itself is internally spanned and G percolates. This completes the proof of (ii), and hence of Theorem 2.