The $Q_2$-free process in the hypercube

The generation of a random triangle-saturated graph via the triangle-free process has been studied extensively. In this short note our aim is to introduce an analogous process in the hypercube. Specifically, we consider the $Q_2$-free process in $Q_d$ and the random subgraph of $Q_d$ it generates. Our main result is that with high probability the graph resulting from this process has at least $cd^{2/3} 2^d$ edges. We also discuss a heuristic argument based on the differential equations method which suggests a stronger conjecture, and explain the difficulties in making this argument rigorous. We conclude with some open questions related to this process.


Introduction
Let $F$ be a (typically small) graph. A graph $G$ on vertex set $V$ is $F$-saturated if it contains no copy of $F$ as a subgraph but the addition of any new edge in $\binom{V}{2} \setminus E(G)$ creates a copy of $F$. The $F$-free process is a well-known way of generating a random $F$-saturated graph. We fix a finite set $V$ and form a nested sequence $G_0, G_1, \ldots, G_M$ of $F$-free graphs with common vertex set $V$. For each $i$ we have $|E(G_i)| = i$, and $G_{i+1}$ is obtained from $G_i$ by adding a new edge chosen uniformly at random from all of the possible edges which do not create a copy of $F$. The process stops when no new edge can be added; in other words, $G_M$ is an $F$-saturated graph with $M$ edges. We can now ask: what can be said about the properties of the graph $G_M$, and in particular about the random variable $M$? Work in this direction was initiated in 1992 by Ruciński and Wormald [21], who studied the case $F = K_{1,3}$, the star with three leaves, investigating the structure of $G_M$.
A major breakthrough in the area was the 2009 paper of Bohman [4] on the case $F = K_3$, the so-called triangle-free process. Using the differential equations method for random graph processes introduced by Ruciński and Wormald in [21] (see for instance [24] for a survey of the subject), Bohman determined the order of magnitude of $M$ with high probability.
Theorem 1 (Bohman [4]). Let $G_M$ be the graph generated by the triangle-free process with $|V| = n$. Then with high probability,
$$c_1 (\log n)^{1/2} n^{3/2} \le M \le c_2 (\log n)^{1/2} n^{3/2}$$
for some constants $c_1$ and $c_2$.
This result was later refined by Fiz Pontiveros, Griffiths and Morris [20] and independently by Bohman and Keevash [5]. Both sets of authors used a substantial extension of the differential equations method to determine $M$ asymptotically with high probability. They also used their analysis of the triangle-free process to improve the known bounds on the Ramsey number $R(3,t)$.
There is now a large body of work on the $F$-free process for other graphs; see for instance [6,18,22] and the references therein. Now let $H$ be a (typically large) host graph. A graph $G$ is $(H,F)$-saturated if it is an $F$-free subgraph of $H$ but the addition of any new edge in $E(H) \setminus E(G)$ creates a copy of $F$. With this formulation the usual notion of $F$-saturation corresponds to $(K_n, F)$-saturation. Our particular interest is the case where the host graph $H$ is the hypercube $Q_d$.
Analogously to the usual triangle-free process, we define the $Q_2$-free process in $Q_d$ as the random nested sequence of subgraphs of $Q_d$ generated by repeatedly adding a new edge chosen uniformly at random from all those edges of $Q_d$ which do not create a copy of $Q_2$. We consider $Q_2$ as our forbidden subgraph since this is the most natural analogue of the triangle-free process (forbidding $K_3$ in $K_n$). However, the definition naturally extends to other forbidden subgraphs of $Q_d$. For comparison, the unconstrained random process with the hypercube as host graph is considered in [1,7,9].
We will describe this process more formally using an equivalent definition based on random permutations, which turns out to be easier to work with. We first choose a uniformly random permutation of $E(Q_d)$, giving a labelling of these edges as $e_1, e_2, \ldots, e_{|E(Q_d)|}$. From this, we form a nested sequence of subgraphs of $Q_d$ by looking at the edges in turn and adding each edge which does not create a copy of $Q_2$. More precisely, suppose that we have constructed graphs $G_0, G_1, \ldots, G_i$ and have looked at edges $e_1, \ldots, e_{t(i)}$. We look at the edges $e_{t(i)+1}, e_{t(i)+2}, \ldots$ in turn, stopping when we get to some $e_j$ which can be added to $G_i$ without creating a copy of $Q_2$. We add $e_j$ to $G_i$ to form $G_{i+1}$ and let $t(i+1) = j$. The process ends when every edge has been examined, and the final graph $G_M$ is $(Q_d, Q_2)$-saturated. (An edge that cannot be added at some stage can never be added later, so at each step the edge added is uniformly distributed among those that can currently be added; this is why the two definitions agree.)

As in the graph case, our main question is: what can be said about the random variable $M$? Our main result, proved in Section 2, is that with high probability the subgraph of $Q_d$ generated by the $Q_2$-free process in $Q_d$ has at least $cd^{2/3} 2^d$ edges, for some constant $c$. We also establish a local version of this result: with high probability, almost all vertices have degree at least $cd^{2/3}$.
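The permutation formulation translates directly into a single pass over a shuffled edge list. The following Python sketch shows one way to simulate it for small $d$. The encoding of an edge as a pair `(x, i)` (vertex `x` with bit `i` clear, joined to `x | (1 << i)`) and all function names are our own illustrative choices, not taken from any existing implementation.

```python
import random


def cube_edges(d):
    """All edges of Q_d, encoded as (x, i): x has bit i clear and is joined to x | (1 << i)."""
    return [(x, i) for x in range(1 << d) for i in range(d) if not (x >> i) & 1]


def other_square_edges(edge, d):
    """For each direction j != i, yield the other three edges of the Q_2
    spanned by `edge` and direction j."""
    x, i = edge
    for j in range(d):
        if j == i:
            continue
        y = x & ~(1 << j)  # corner of the square with bits i and j both clear
        square = {(y, i), (y | (1 << j), i), (y, j), (y | (1 << i), j)}
        square.discard(edge)
        yield square


def q2_free_process(d, rng=random):
    """Run the Q_2-free process in Q_d driven by a uniformly random edge permutation."""
    order = cube_edges(d)
    rng.shuffle(order)  # the labelling e_1, e_2, ... of E(Q_d)
    G = set()
    for e in order:
        # e can be added iff no square through e already has its other three edges in G
        if not any(others <= G for others in other_square_edges(e, d)):
            G.add(e)
    return G


d = 10
G = q2_free_process(d, random.Random(2024))
print(f"d = {d}: M = {len(G)} of {d * 2 ** (d - 1)} edges")
```

Because an edge that is closed when it is examined stays closed, one pass suffices, and the returned set is $Q_2$-free and $(Q_d, Q_2)$-saturated.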
In Section 3 we consider the differential equations heuristic in the hypercube context. This leads to the conjecture that the number of edges in the graph generated has order of magnitude $(\log d)^{1/3} d^{2/3} 2^d$. We also see that the approach used to make the differential equations method rigorous for the triangle-free process in $K_n$ is unlikely to work in the hypercube without some significant new ideas. Thus the problem of analysing this process may be a natural testing ground for extending the differential equations method further or for developing new techniques.
We conclude, in Section 4, by raising some related open problems. Finally, we note that, complementing these probabilistic questions, saturated graphs have also been studied from an extremal perspective. Indeed, the well-studied Turán number of $F$, denoted by $\mathrm{ex}(n,F)$, can be defined as the maximum number of edges in an $F$-saturated graph on $n$ vertices. As a counterpart to this, the saturation number of $F$, denoted by $\mathrm{sat}(n,F)$, is the minimum number of edges in an $F$-saturated graph on $n$ vertices. See the surveys [11] and [10,19] and the many references therein for more on Turán and saturation numbers. Both Turán and saturation numbers have been studied for the host graph $Q_d$ (see for instance [2,3,8,14] for the former and [13,15,17] for the latter). However, to our knowledge, the associated random process has not been studied.

Main Result
Theorem 2. Let $M$ be the number of edges in the subgraph $G_M$ of $Q_d$ generated by the $(Q_d, Q_2)$-free process. With high probability, $M > cd^{2/3} 2^d$ for some constant $c$.
As we shall see, the constant $c$ can be taken to be arbitrarily close to $1/e$. The proof uses the random permutation formulation of the process. We identify a local condition on the permutation which guarantees that a particular edge appears in the final graph $G_M$. Calculating the probability that this condition is satisfied gives a lower bound on the expected number of edges. Because the condition is local, the dependence between edges is limited, and the second moment method gives a lower bound on $M$ which holds with high probability.
Proof of Theorem 2. Generate a random permutation of the edges of $Q_d$ by assigning to each edge $e$ an independent random variable $T_e$, uniformly distributed on the interval $[0,1]$. We say that $e$ precedes $f$ in our order if $T_e < T_f$ (with probability 1 all the $T_e$ are distinct). Let $G_M$ be the saturated graph yielded by following the $(Q_d, Q_2)$-free process on this permutation.
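We say that an edge $e \in E(Q_d)$ is good if, for every $Q_2$ containing $e$, $e$ is not the last of its four edges in our ordering. If $e$ is good, then $e$ is an edge of $G_M$: when $e$ is examined, every $Q_2$ through $e$ still has an edge that has not yet been examined, so adding $e$ cannot complete a copy of $Q_2$.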
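Let $A_e$ denote the indicator random variable taking the value 1 if $e$ is good and 0 otherwise, and let $A = \sum_{e \in E(Q_d)} A_e$ be the total number of good edges. The edge $e$ lies in exactly $d-1$ copies of $Q_2$, and their other edges are $3(d-1)$ distinct edges. Conditional on $T_e = x$, $e$ is the last edge of a given such $Q_2$ with probability $x^3$, independently over the $d-1$ copies. Considering how the permutation is generated from the variables $T_e$, we obtain
$$\mathbb{P}(A_e = 1) = \int_0^1 (1-x^3)^{d-1}\,dx \ge \int_0^{d^{-1/3}} (1-x^3)^{d-1}\,dx \ge d^{-1/3}\left(1 - \frac{1}{d}\right)^{d-1} \ge \frac{1}{e}\, d^{-1/3},$$
where the second inequality holds as the integrand is decreasing in $x$.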
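Since $Q_d$ has $d2^{d-1}$ edges, linearity of expectation gives
$$\mathbb{E}(A) = d2^{d-1}\,\mathbb{P}(A_e = 1) \ge \frac{1}{e}\, d^{2/3} 2^{d-1}.$$
The event $\{A_e = 1\}$ depends only on $T_e$ and on the variables $T_f$, where $f$ is one of the $3(d-1)$ other edges contained in a $Q_2$ through $e$. It follows that $A_e$ is independent of all but at most $9d^2$ other $A_f$.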
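Since $\mathrm{Cov}(A_e, A_f) \le \mathbb{E}(A_e)$, this gives $\mathrm{Var}(A) \le (9d^2 + 1)\,\mathbb{E}(A) = o(\mathbb{E}(A)^2)$, as $\mathbb{E}(A)$ grows exponentially in $d$. Thus by Chebyshev's inequality, $A \ge cd^{2/3} 2^d$ with high probability, for some constant $c > 0$. (Since $\int_0^1 (1-x^3)^{d-1}\,dx \sim \Gamma(4/3)\, d^{-1/3}$ and $\Gamma(4/3)/2 \approx 0.446 > 1/e$, $c$ can be taken to be arbitrarily close to $1/e$.) This concludes the proof, since $M \ge A$.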
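The deterministic step of the proof, that every good edge survives into $G_M$, is easy to test empirically. The Python sketch below draws the labels $T_e$, runs the process in increasing order of $T_e$, and compares the set of good edges with $G_M$. The edge encoding `(x, i)` (vertex `x` with bit `i` clear, joined to `x | (1 << i)`) and the function names are hypothetical choices made for this illustration.

```python
import random


def cube_edges(d):
    """Edges of Q_d as (x, i): x has bit i clear and is joined to x | (1 << i)."""
    return [(x, i) for x in range(1 << d) for i in range(d) if not (x >> i) & 1]


def other_square_edges(edge, d):
    """Yield the other three edges of each of the d - 1 squares (copies of Q_2) through edge."""
    x, i = edge
    for j in range(d):
        if j != i:
            y = x & ~(1 << j)  # corner with bits i and j both clear
            square = {(y, i), (y | (1 << j), i), (y, j), (y | (1 << i), j)}
            square.discard(edge)
            yield square


def process_and_good_edges(d, rng):
    """Return (T, G_M, good) for one run of the Q_2-free process driven by labels T_e."""
    T = {e: rng.random() for e in cube_edges(d)}
    G = set()
    for e in sorted(T, key=T.get):  # examine edges in increasing order of T_e
        if not any(others <= G for others in other_square_edges(e, d)):
            G.add(e)
    # e is good if, in every square through e, some other edge comes later than e
    good = {
        e for e in T
        if all(max(T[f] for f in others) > T[e] for others in other_square_edges(e, d))
    }
    return T, G, good


T, G, good = process_and_good_edges(8, random.Random(7))
print(len(good), len(G))  # good edges form a subset of G_M
```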
A slightly more careful calculation shows that $d^{2/3} 2^{d-1}$ is the correct order of magnitude of $\mathbb{E}(A)$, so $A = \Theta(d^{2/3} 2^{d-1})$ with high probability. However, because being good is sufficient but not necessary for an edge to be in $G_M$, this observation gives no upper bound on $M$.
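The integral in the proof can be evaluated exactly: substituting $u = x^3$ gives $\int_0^1 (1-x^3)^{d-1}\,dx = \tfrac13 B(\tfrac13, d) = \Gamma(\tfrac13)\Gamma(d)/(3\Gamma(d+\tfrac13))$, which is asymptotic to $\Gamma(4/3)\, d^{-1/3} \approx 0.893\, d^{-1/3}$. The short Python check below (function names are ours) compares this exact value with the cruder bound $d^{-1/3}(1-1/d)^{d-1} \ge e^{-1} d^{-1/3}$, which illustrates why $d^{2/3} 2^{d-1}$ is the right order for $\mathbb{E}(A)$.

```python
import math


def prob_good(d):
    """P(edge is good) = ∫_0^1 (1 - x^3)^(d-1) dx = Gamma(1/3) Gamma(d) / (3 Gamma(d + 1/3))."""
    log_val = math.lgamma(1 / 3) + math.lgamma(d) - math.lgamma(d + 1 / 3) - math.log(3)
    return math.exp(log_val)


def crude_lower_bound(d):
    """Bound from keeping only x <= d^(-1/3): d^(-1/3) * (1 - 1/d)^(d - 1)."""
    return d ** (-1 / 3) * (1 - 1 / d) ** (d - 1)


for d in (10, 100, 1000, 10**6):
    p = prob_good(d)
    # last column tends to Gamma(4/3) ≈ 0.893
    print(d, p, crude_lower_bound(d), p * d ** (1 / 3))
```

Using `math.lgamma` rather than `math.gamma` avoids overflow for large $d$.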
Notice that, in the way we bounded the integral, only good edges with $T_e \le d^{-1/3}$ are counted. This means that with high probability, not only do we finish the process with at least $cd^{2/3} 2^d$ edges, but at least this many edges are added from among the first $(1+o(1))\, d^{2/3} 2^{d-1}$ edges considered.
The same approach gives some information on the degrees in $G_M$. The degree of $v$ in $G_M$ is bounded below by the number of good edges among the edges of $Q_d$ incident to $v$. Unfortunately, the events that these edges are good are pairwise dependent, since any two edges at $v$ lie in a common $Q_2$. However, the dependence is very limited, and so a local analogue of Theorem 2 can be established.
Proposition 3. Let $G_M$ be the graph generated by the $(Q_d, Q_2)$-free process and let $v$ be a randomly chosen vertex of $G_M$. With high probability, the degree of $v$ in $G_M$ is at least $cd^{2/3}$, for some constant $c$.
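Proof. Let $e_1, \ldots, e_d$ be the edges of $Q_d$ incident to a fixed vertex $v$ (by symmetry the choice of $v$ does not matter). We define good edges as in the proof of Theorem 2. Let $A_i$ be the indicator variable of the event '$e_i$ is good', and let $D_v = \sum_{i=1}^d A_i$, so that the degree of $v$ in $G_M$ is at least $D_v$. Then
$$\mathbb{E}(D_v) = dp \qquad\text{and}\qquad \mathrm{Var}(D_v) = dp(1-p) + d(d-1)(r - p^2),$$
where $p = \mathbb{P}(A_1 = 1) = \int_0^1 (1-x^3)^{d-1}\,dx \ge e^{-1} d^{-1/3}$ and $r = \mathbb{P}(A_1 = A_2 = 1)$; by symmetry neither depends on the choice of indices. It will suffice to show that $r - p^2 = o(d^{-2/3})$. From this we deduce that $\mathrm{Var}(D_v) \le d + d^2(r - p^2) = o(d^{4/3}) = o(\mathbb{E}(D_v)^2)$, and the result follows as in the proof of Theorem 2.

Now we use the same method of generating a random permutation as in the proof of Theorem 2, and write $x = T_{e_1}$, $y = T_{e_2}$. The $Q_2$ spanned by $e_1$ and $e_2$ is the only one containing both; for $k \ge 3$, the $Q_2$ spanned by $e_1, e_k$ and the $Q_2$ spanned by $e_2, e_k$ share only the edge $e_k$, and all remaining edges involved are distinct. Conditioning on $x$ and $y$, we have
$$r = \int_0^1\!\!\int_0^1 f(x,y,d)\,dx\,dy, \qquad p^2 = \int_0^1\!\!\int_0^1 g(x,y,d)\,dx\,dy,$$
where, for $x < y$ (the case $x > y$ follows by symmetry),
$$f(x,y,d) = (1-y^2)\,(1 - x^3 - y^3 + x^3y^2)^{d-2}, \qquad g(x,y,d) = (1-x^3)^{d-1}(1-y^3)^{d-1}.$$
Here $1 - y^2$ is the probability that $e_2$ is not the last edge of the $Q_2$ spanned by $e_1, e_2$, and $1 - x^3 - y^3 + x^3y^2$ is the probability that, for a given $k \ge 3$, neither $e_1$ nor $e_2$ is the last edge of its $Q_2$ through $e_k$. It is easy to check that if one of $x, y$ is greater than $d^{-1/4}$, then both $f(x,y,d)$ and $g(x,y,d)$ are at most $\exp(-(d-2)d^{-3/4})$, and so the contribution to $r - p^2$ from this range of $x, y$ is certainly $o(d^{-2/3})$.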
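On the other hand, if $x < y < d^{-1/4}$, then, writing
$$1 - x^3 - y^3 + x^3y^2 = (1-x^3)(1-y^3)(1+\varepsilon), \qquad\text{where}\qquad \varepsilon = \frac{x^3y^2(1-y)}{(1-x^3)(1-y^3)} \le 2d^{-5/4}$$
for large $d$, we conclude that
$$f(x,y,d) - g(x,y,d) \le \big((1-x^3)(1-y^3)\big)^{d-2}\Big((1+\varepsilon)^{d-2} - (1-x^3)(1-y^3)\Big) \le e^{2d^{-1/4}} - 1 + x^3 + y^3 \le 3d^{-1/4}.$$
The region $\{x < y < d^{-1/4}\}$ has area $d^{-1/2}/2$, and the same bound holds for $y < x < d^{-1/4}$ by symmetry. That is, the contribution to $r - p^2$ from this range of $x, y$ is at most $3d^{-3/4} = o(d^{-2/3})$, as required.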
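A quick numerical check of the estimate $r - p^2 = o(d^{-2/3})$ is possible. We assume the integral representation $r = \int_0^1\!\int_0^1 F(x,y)\,dx\,dy$ with $F = (1-M^2)(1 - x^3 - y^3 + m^3M^2)^{d-2}$, where $m = \min(x,y)$ and $M = \max(x,y)$, and $p = \int_0^1 (1-x^3)^{d-1}\,dx$ in closed form. The Python sketch below applies a plain midpoint rule (our choice of quadrature; any accurate rule would do) and prints $d^{2/3}(r - p^2)$, which should shrink as $d$ grows. The function names are hypothetical.

```python
import math

import numpy as np


def p_exact(d):
    """p = ∫_0^1 (1 - x^3)^(d-1) dx = Gamma(1/3) Gamma(d) / (3 Gamma(d + 1/3))."""
    return math.exp(math.lgamma(1 / 3) + math.lgamma(d) - math.lgamma(d + 1 / 3) - math.log(3))


def r_numeric(d, n=1200):
    """Midpoint-rule value of r = ∫∫_{[0,1]^2} F(x, y) dx dy, where with m = min(x, y)
    and M = max(x, y), F = (1 - M^2) (1 - x^3 - y^3 + m^3 M^2)^(d - 2).
    Since F <= exp(-(d - 2) M^3), the square is truncated where that falls below e^-40."""
    cut = min(1.0, (40.0 / (d - 2)) ** (1 / 3))
    h = cut / n
    t = (np.arange(n) + 0.5) * h  # cell midpoints
    x, y = np.meshgrid(t, t, indexing="ij")
    m, M = np.minimum(x, y), np.maximum(x, y)
    base = 1 - x**3 - y**3 + m**3 * M**2
    F = (1 - M**2) * base ** (d - 2)
    return float(F.sum()) * h * h


for d in (10, 100, 1000):
    gap = r_numeric(d) - p_exact(d) ** 2
    print(d, gap, gap * d ** (2 / 3))  # the proof needs d^(2/3) (r - p^2) -> 0
```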
We do not know whether, with high probability, all vertices have degree at least $cd^{2/3}$.

Heuristic
For the triangle-free process, Bohman [4] introduces a heuristic that assumes certain random variables closely follow some trajectories. Using this assumption, he deduces the values of those trajectories in order to bound the number of edges in the resulting graph. This approach can then be made rigorous using martingales.
We use the analogous heuristic for the $(Q_d, Q_2)$-free process to suggest a possible order of magnitude for $M$. However, we also point out some differences between the $(Q_d, Q_2)$-free process and the triangle-free process that cause difficulties in making this argument rigorous.
Let G 0 , . . ., G M be the sequence of graphs generated by the (Q d , Q 2 )-free process.Let u and v be a pair of vertices that are adjacent in Q d .We say that uv is open in G i if there is no path of three G i -edges that connect u to v.In other words uv is open if adding it to G i does not form a copy of Q 2 .We write O i for the number of open pairs in G i .This definition of open pairs is analogous to a definition in [4].
We also define, for each $Q_d$-adjacent pair of vertices $u$ and $v$, three other random variables. Every path of length 3 from $u$ to $v$ in $Q_d$ forms a $Q_2$ together with $uv$, so there are $d-1$ such paths. Let $W_i(uv)$ denote the number of these paths consisting of three open pairs in $G_i$, let $X_i(uv)$ be the number consisting of two open pairs and one $G_i$-edge, and let $Y_i(uv)$ count those consisting of one open pair and two $G_i$-edges.
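To make these definitions concrete, the Python sketch below computes $O_i$ and $(W_i(uv), X_i(uv), Y_i(uv))$ for an arbitrary subgraph $G$ of a small cube. It encodes an edge as `(x, i)` (vertex `x` with bit `i` clear, joined to `x | (1 << i)`), uses the fact that the length-3 paths from $u$ to $v$ are exactly the other three edges of the squares through $uv$, and treats open pairs as non-edges of $G$. The encoding and names are ours, for illustration only.

```python
def cube_edges(d):
    """Edges of Q_d as (x, i): x has bit i clear and is joined to x | (1 << i)."""
    return [(x, i) for x in range(1 << d) for i in range(d) if not (x >> i) & 1]


def other_square_edges(edge, d):
    """Yield the other three edges (= one length-3 path u..v) of each square through edge."""
    x, i = edge
    for j in range(d):
        if j != i:
            y = x & ~(1 << j)  # corner with bits i and j both clear
            square = {(y, i), (y | (1 << j), i), (y, j), (y | (1 << i), j)}
            square.discard(edge)
            yield square


def is_open(e, G, d):
    """Open pair: not an edge of G, and adding it would not complete a copy of Q_2."""
    return e not in G and not any(others <= G for others in other_square_edges(e, d))


def open_pair_count(G, d):
    """O_i for the current graph G."""
    return sum(is_open(e, G, d) for e in cube_edges(d))


def wxy(e, G, d):
    """(W, X, Y) for the Q_d-adjacent pair e = uv."""
    W = X = Y = 0
    for others in other_square_edges(e, d):
        n_open = sum(is_open(f, G, d) for f in others)
        n_edge = sum(f in G for f in others)
        if n_open == 3:
            W += 1
        elif n_open == 2 and n_edge == 1:
            X += 1
        elif n_open == 1 and n_edge == 2:
            Y += 1
    return W, X, Y


# In Q_3, let G consist of the two parallel edges 000-001 and 010-011.
G = {(0b000, 0), (0b010, 0)}
print(open_pair_count(G, 3), wxy((0b000, 1), G, 3))
```

In the empty graph every pair is open, so $O_0 = d2^{d-1}$ and $W_0(uv) = d-1$, matching the initial conditions $q(0) = 1/2$ and $w(0) = 1$ under the scalings $O_i \approx q\,d2^d$ and $W_i \approx w\,d$.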
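For convenience, we also introduce a scaling $t = i/(d^{2/3} 2^d)$. We assume there are continuous functions $q$, $w$, $x$ and $y$ such that for all $i$ and all $Q_d$-adjacent $u$ and $v$:
$$O_i \approx q(t)\, d2^d, \qquad W_i(uv) \approx w(t)\, d, \qquad X_i(uv) \approx x(t)\, d^{2/3}, \qquad Y_i(uv) \approx y(t)\, d^{1/3}.$$
Note that adding a single edge $uv$ to $G_i$ to form $G_{i+1}$ closes $Y_i(uv)$ open pairs (one for each path counted by $Y_i(uv)$), besides removing $uv$ itself from the open pairs. Thus for small $\epsilon$, we expect
$$O_{i + \epsilon d^{2/3} 2^d} - O_i \approx -\epsilon\, d^{2/3} 2^d \cdot y(t)\, d^{1/3} = -\epsilon\, y(t)\, d2^d.$$
This suggests that $\frac{dq}{dt} = -y$. Similar arguments give:
$$\frac{dw}{dt} = -\frac{3wy}{q}, \qquad \frac{dx}{dt} = \frac{3w - 2xy}{q}, \qquad \frac{dy}{dt} = \frac{2x - y^2}{q}.$$
Solving these equations with initial conditions $q(0) = 1/2$, $w(0) = 1$, $x(0) = y(0) = 0$ gives
$$q(t) = \tfrac12 e^{-8t^3}, \qquad w(t) = e^{-24t^3}, \qquad x(t) = 6t\, e^{-16t^3}, \qquad y(t) = 12t^2 e^{-8t^3}.$$
For any $i$, the final number of edges in the process is bounded from above by $i + O_i$ and from below by $i$. If indeed $O_i \approx \frac12 e^{-8t^3} d2^d$, then when $t = \Theta((\log d)^{1/3})$, these bounds are both $\Theta((\log d)^{1/3} d^{2/3} 2^d)$.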
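As a sanity check on the heuristic, one can integrate the system $q' = -y$, $w' = -3wy/q$, $x' = (3w - 2xy)/q$, $y' = (2x - y^2)/q$ with $q(0) = 1/2$, $w(0) = 1$, $x(0) = y(0) = 0$ numerically and compare with the closed form $q = \tfrac12 e^{-8t^3}$, $w = e^{-24t^3}$, $x = 6te^{-16t^3}$, $y = 12t^2e^{-8t^3}$. The Python sketch below uses a hand-written classical fourth-order Runge-Kutta step (a generic choice, nothing depends on it) and also solves $\tfrac12 e^{-8t^3} d = t\,d^{2/3}$ by bisection: this is the time at which the heuristic number of open pairs equals the number of edges added, and it grows like $(\log d/24)^{1/3}$. All names are ours.

```python
import math


def rhs(s):
    """Right-hand side of the heuristic system for (q, w, x, y)."""
    q, w, x, y = s
    return (-y, -3 * w * y / q, (3 * w - 2 * x * y) / q, (2 * x - y * y) / q)


def rk4(t_end, steps=4000):
    """Integrate from t = 0 with q = 1/2, w = 1, x = y = 0 using classical RK4."""
    h = t_end / steps
    s = (0.5, 1.0, 0.0, 0.0)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(tuple(a + h / 2 * b for a, b in zip(s, k1)))
        k3 = rhs(tuple(a + h / 2 * b for a, b in zip(s, k2)))
        k4 = rhs(tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))
    return s


def closed_form(t):
    """Exact solution (q, w, x, y) at time t."""
    return (0.5 * math.exp(-8 * t**3), math.exp(-24 * t**3),
            6 * t * math.exp(-16 * t**3), 12 * t**2 * math.exp(-8 * t**3))


def crossing_time(d, iters=200):
    """Bisection for t with (1/2) e^{-8 t^3} d^{1/3} = t, i.e. O_i = i under the heuristic."""
    g = lambda t: 0.5 * math.exp(-8 * t**3) * d ** (1 / 3) - t  # decreasing in t
    lo, hi = 0.0, 10.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo


for t in (0.25, 0.5, 1.0):
    print(t, rk4(t), closed_form(t))
for d in (10**3, 10**6, 10**12):
    print(d, crossing_time(d), (math.log(d) / 24) ** (1 / 3))
```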
For the triangle-free process, Bohman uses martingales to show that with high probability all the relevant random variables do indeed closely follow their trajectories. By contrast, in our process the situation is more complicated: the random variables we use to track the evolution of the graph do not all follow the trajectory indicated by the differential equations heuristic.
Associated with the $(Q_d, Q_2)$-free process, we have a sequence of graphs $H(j)$, for $j = 0, \ldots, d2^{d-1}$, where $H(j)$ is the graph formed by the first $j$ edges in the randomly chosen permutation. This nested sequence of graphs is a natural analogue of the unconstrained Erdős-Rényi random graph process. We again let $G_i$ denote the graphs of the $(Q_d, Q_2)$-free process for $i = 0, \ldots, M$, but consider $i$ as a function of $j$; that is, we write $i(j)$ for the number of edges added from among the first $j$ edges looked at. For $Q_d$-adjacent vertices $u$ and $v$, note that $Y_{i(j)}(uv) = 0$ whenever $u$ and $v$ are isolated in $H(j)$, since every path counted by $Y_{i(j)}(uv)$ contains a $G_{i(j)}$-edge incident to $u$ or $v$. The $2d-1$ edges of $Q_d$ incident to $u$ or $v$ all lie outside $H(j)$ with probability $\binom{d2^{d-1} - 2d + 1}{j} \big/ \binom{d2^{d-1}}{j}$. Thus,
$$\mathbb{E}\,\big|\{uv \in E(Q_d) : Y_{i(j)}(uv) = 0\}\big| \ge d2^{d-1} \binom{d2^{d-1} - 2d + 1}{j} \Big/ \binom{d2^{d-1}}{j} \ge d2^{d-1}\left(1 - \frac{j}{d2^{d-1} - 2d + 2}\right)^{2d-1}.$$
It follows that there is some constant $c$ (any $c < 1 - 1/\sqrt{2}$ works) such that while $j \le cd2^{d-1}$, we have, in expectation, a large number of pairs $uv$ with $Y_{i(j)}(uv) = 0$. It seems likely that $i$ is approximately concave as a function of $j$ (the number of edges added should grow faster early in the process, when fewer edges have been looked at). If true, this would imply that for some pairs $uv$, the random variable $Y_{i(j)}(uv)$ equals zero for a constant proportion of the process. Thus, unlike in the triangle-free process, we will not typically have every variable following its expected trajectory closely. It is still possible that this approach can be salvaged, for instance by showing that almost every variable follows its trajectory closely, but this does not appear to be straightforward.
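The threshold behaviour of the expected number of pairs with $Y_{i(j)}(uv) = 0$ can be seen numerically. The Python sketch below evaluates the exact probability $\binom{N-m}{j}/\binom{N}{j} = \prod_{k=0}^{m-1} \frac{N-j-k}{N-k}$, with $N = d2^{d-1}$ and $m = 2d-1$, that all edges at $u$ or $v$ avoid $H(j)$, and multiplies by $N$. It compares $j = 0.25N$ (below $1 - 1/\sqrt2 \approx 0.293$) with $j = 0.35N$ (above it). The function name is hypothetical.

```python
def expected_unexamined_neighbourhoods(d, j):
    """Expected number of Q_d-edges uv such that none of the 2d - 1 edges of Q_d incident
    to u or v is among the first j edges of a uniformly random permutation of E(Q_d).
    For each such uv, Y_{i(j)}(uv) = 0."""
    n_edges = d * 2 ** (d - 1)
    m = 2 * d - 1
    if j > n_edges - m:
        return 0.0
    prob = 1.0
    for k in range(m):  # C(N - m, j) / C(N, j) = prod_k (N - j - k) / (N - k)
        prob *= (n_edges - j - k) / (n_edges - k)
    return n_edges * prob


for c in (0.25, 0.35):
    for d in (10, 20, 40, 80):
        j = int(c * d * 2 ** (d - 1))
        print(c, d, expected_unexamined_neighbourhoods(d, j))
```

For $c = 0.25$ the values grow with $d$, while for $c = 0.35$ they tend to 0, in line with the estimate $d2^{d-1}(1-c)^{2d-1}$.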

Further Questions
Given the apparent obstacles to adapting the techniques from the triangle-free process to the cube, the main open problem is to develop tools to understand the $Q_2$-free process in $Q_d$. This could involve either refining the differential equations method or introducing a completely new approach.
The most immediate open problem is to give a good upper bound on $M$, and in particular to answer the following question.

Question 5. Is the true order of magnitude of $M$ given by $(\log d)^{1/3} d^{2/3} 2^d$, as predicted by the differential equations heuristic?
More generally, one could ask about properties other than the number of edges.

Question 6. What can be said about properties of the graph $G_M$ generated by the $Q_2$-free process in $Q_d$, other than the number of edges it has?
The minimum and maximum degrees of $G_M$ are two such quantities. Note that Proposition 3 does not give any information about the minimum degree, since it bounds almost all degrees rather than every degree.
Bohman proves that with high probability the triangle-free process produces a graph with no large independent set. This was used to give improved lower bounds on the Ramsey number $R(3,k)$. In the cube there is no analogous Ramsey result; indeed, for any $d$ there is a 2-colouring of $E(Q_d)$ with no monochromatic $Q_2$. Nevertheless, one could ask about the existence of empty subcubes.

Question 7. What can be said about the number of copies of $Q_k$ in $Q_d$ which contain no edges of $G_M$? For which $k$ is the expected number of empty $Q_k$ bounded away from 0?
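For the claim that $E(Q_d)$ has a 2-colouring with no monochromatic $Q_2$, one explicit construction (given here only as an illustration; it is not taken from the sources cited) colours the edge between $x$ and $x + e_i$, where $x_i = 0$, by the parity of the number of 1s of $x$ in coordinates below $i$. In a $Q_2$ spanned by directions $i < j$, the two edges in direction $j$ have lower endpoints differing exactly in coordinate $i$, so they receive different colours. The Python brute-force check below (our names and edge encoding) confirms this for small $d$.

```python
def colour(edge):
    """Colour of edge (x, i), where x has bit i clear: parity of the 1-bits of x below i."""
    x, i = edge
    return bin(x & ((1 << i) - 1)).count("1") % 2


def has_monochromatic_square(d):
    """Brute force over all copies of Q_2 in Q_d, each given by a corner y and directions i < j."""
    for y in range(1 << d):
        for i in range(d):
            for j in range(i + 1, d):
                if (y >> i) & 1 or (y >> j) & 1:
                    continue  # y must be the corner with bits i and j both clear
                square = [(y, i), (y | (1 << j), i), (y, j), (y | (1 << i), j)]
                if len({colour(e) for e in square}) == 1:
                    return True
    return False


for d in range(2, 9):
    print(d, has_monochromatic_square(d))
```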
More generally, what can be said about the appearance of fixed subgraphs in $G_M$? For the triangle-free process, this question has been addressed by Wolfovitz [23] (sparse subgraphs) and Gerke and Makai [12] (dense subgraphs).
It is worth noting that in general the notion of being a subgraph of $Q_d$ is a little nuanced. Suppose that $H$ is a subgraph of some $Q_k$ with $k \le d$. There are two natural notions of $H$ appearing as a subgraph of $G_M$. We could simply insist that $H$ be a subgraph of $G_M$ in the ordinary graph-theoretic sense. An alternative, stronger concept is that of an isometric subgraph: $H$ is an isometric subgraph of $G_M$ if there is an injective map $\phi$ from $V(Q_k)$ to $V(Q_d)$ such that $\phi(x)\phi(y) \in E(Q_d)$ if and only if $xy \in E(Q_k)$, and $\phi(x)\phi(y) \in E(G_M)$ if $xy \in E(H)$. When $H = Q_k$ these notions coincide, but for many other subgraphs (for instance the 4-vertex path or the 6-cycle), the particular embedding of the graph $H$ in $Q_k$ does make a difference.
Finally, we studied the $Q_2$-free process as a natural special case of the $F$-free process. What can be said about the $F$-free process in the cube for other fixed graphs $F$? Two particularly appealing instances for $F$ are fixed-dimension subcubes $Q_k$ and the star $K_{1,t}$ (the bounded degree process). Once again, one could forbid subgraphs in an isometric sense or not, and this choice will in general change the problem (although for stars and subcubes the notions coincide).
Question 8. What can be said about the graph $G_M$ generated by the $F$-free process in $Q_d$, in particular when $F = Q_k$ or $F = K_{1,t}$?

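Based on the heuristic of Section 3, we propose the following conjecture.

Conjecture 4. Let $G_M$ be the graph generated by the $(Q_d, Q_2)$-free process. With high probability,
$$c_1 (\log d)^{1/3} d^{2/3} 2^d \le M \le c_2 (\log d)^{1/3} d^{2/3} 2^d$$
for some constants $c_1$ and $c_2$.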