Random subcube intersection graphs I: cliques and covering

We study random subcube intersection graphs, that is, graphs obtained by selecting a random collection of subcubes of a fixed hypercube $Q_d$ to serve as the vertices of the graph, and setting an edge between a pair of subcubes if their intersection is non-empty. Our motivation for considering such graphs is to model `random compatibility' between vertices in a large network. For both of the models considered in this paper, we determine the thresholds for covering the underlying hypercube $Q_d$ and for the appearance of s-cliques. In addition we pose some open problems.


Introduction
In this paper we introduce and study two models of random subcube intersection graphs. These are random graph models obtained by (i) selecting a random collection of subcubes of a fixed hypercube $Q_d$ to serve as the vertices of the graph, and (ii) setting an edge between a pair of subcubes if their intersection is non-empty.

Motivation
Our basic motivation is to study a model for 'random compatibility' between vertices.
A typical example of the applications we have in mind comes from social choice theory. Suppose we have a society $V$ which is faced with $d$ political issues, each of which can be resolved in a binary fashion. We represent the two policies possible on each issue by 0 and 1, and the family of all possible sets of policies by a $d$-dimensional hypercube $Q_d$.
Individual members of the society may have fixed views on some issues, but may be undecided or indifferent on others. We can thus associate to each citizen v ∈ V a subcube of acceptable policies f (v) in a natural way. The subcube intersection graph G arising from (V, Q d , f ) then represents political agreement within the society: uv is an edge of G if and only if the citizens u and v can agree on a mutually acceptable set of policies.
A key characteristic of subcube intersection graphs is that they possess the Helly property: if we have $s$ subcubes $f(v_1), f(v_2), \dots, f(v_s)$ of $Q_d$ which are pairwise intersecting, then their total intersection $\bigcap_{i=1}^{s} f(v_i)$ is non-empty. (This is an easy observation, already made by Johnson and Markström in [20].) A consequence of this fact is that in the model for political agreement described above, $s$-cliques represent $s$-sets of citizens able to agree on a mutually acceptable set of policies and, say, unite their forces to promote a common political platform. This example motivates our study of the clique number in (random) subcube intersection graphs.
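The Helly property can be checked exhaustively in small dimension. The following is a minimal sketch, not taken from the paper: it assumes subcubes are encoded as tuples over the symbols `0`, `1` and a wildcard `*` (a free coordinate), and the helper names `meet` and `helly_holds` are hypothetical.

```python
from itertools import product

STAR = "*"  # assumed wildcard symbol for a free coordinate


def meet(a, b):
    """Coordinatewise intersection of two subcubes, or None if it is empty."""
    out = []
    for x, y in zip(a, b):
        if x == STAR:
            out.append(y)
        elif y == STAR or x == y:
            out.append(x)
        else:
            return None  # one side fixes 0 and the other fixes 1
    return tuple(out)


def helly_holds(d, s):
    """Check every family of s subcubes of Q_d: pairwise meeting implies a common point."""
    cubes = list(product("01" + STAR, repeat=d))
    for family in product(cubes, repeat=s):
        pairwise = all(meet(a, b) is not None for a in family for b in family)
        common = family[0]
        for c in family[1:]:
            common = meet(common, c) if common is not None else None
        if pairwise and common is None:
            return False
    return True
```

The reason the check succeeds is visible in `meet`: a family fails to have a common point only if some single coordinate is fixed to 0 by one member and to 1 by another, and that already prevents those two members from intersecting.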
Other examples of compatibility graphs naturally modelled by subcube intersection graphs exist. Some closely resemble the one above: the work of matrimonial agencies or the assignment of room-mates in the first year at university for instance naturally lead to the study of such compatibility graphs. Another class of examples can be found in the medical sciences, more precisely in the context of organ donations. For kidney or blood donations, several parameters must be taken into account to determine whether a potential donor-receiver pair is compatible. Large random subcube intersection graphs may provide a good way of modelling these compatibility relations over a large pool of donors and receivers, and of identifying efficient matching schemes. Our hope is that investigating some properties of random subcube intersection graphs may help shed light on others relevant in applications.

The models
Let us now describe our models more precisely. We begin with some basic definitions and notation.
Definition 1 (Intersection graphs). A feature system is a triple (V, Ω, f ), where • V is a set of vertices, • Ω is a set of features, and • f is a function mapping vertices in V to subsets of Ω.
Given a vertex v ∈ V , we call f (v) ⊆ Ω its feature set.
We construct a graph G on the vertex-set V from a feature system (V, Ω, f ) by placing an edge between u, v ∈ V if their feature sets f (u), f (v) have non-empty intersection. We call G the intersection graph of the feature system (V, Ω, f ).
In this paper we shall study intersection graphs where Ω and the feature sets {f (v) : v ∈ V } have some additional structure. Namely, Ω shall be a high-dimensional hypercube Q d and the feature sets will consist of subcubes of Q d .
Definition 2 (Hypercubes and subcubes). The $d$-dimensional hypercube is the set $Q_d = \{0,1\}^d$. A $k$-dimensional subcube of $Q_d$ is a subset obtained by fixing $d-k$ coordinates and letting the remaining $k$ vary freely. We may regard subcubes of $Q_d$ as elements of $\{0,1,\star\}^d$, where $\star$ coordinates are free and the 0, 1 coordinates are fixed.
We shall define two models of random subcube intersection graphs. Both of these are obtained by randomly assigning to each vertex v ∈ V a feature subcube f (v) of Q d and then building the resulting intersection graph.
Definition 3 (Uniform model). Let $V$ be a set of vertices. Fix $k, d \in \mathbb{N}$ with $k \leq d$. For each $v \in V$ independently select a $k$-dimensional subcube $f(v)$ of $Q_d$ uniformly at random, and set an edge between $u, v \in V$ if $f(u) \cap f(v) \neq \emptyset$.
Denote the resulting random subcube intersection graph by G V,d,k .
Definition 4 (Binomial model). Let $V$ be a set of vertices. Fix $d \in \mathbb{N}$ and $p \in [0,1]$. For each $v \in V$ independently select a subcube $f(v) \in \{0,1,\star\}^d$ at random by setting $(f(v))_i = \star$ with probability $p$ and $(f(v))_i = 0, 1$ each with probability $\frac{1-p}{2}$, independently for each coordinate $i \in \{1, \dots, d\}$. Denote the resulting random subcube intersection graph by $G_{V,d,p}$.
Remark 1. We may view G V,d,p as the intersection of d independent copies of G V,1,p on a common vertex-set V . Indeed an edge uv of G V,d,p is present if and only if f (u) and f (v) agree in each and every one of the d dimensions of Q d .
The graph $G_{V,1,p}$ is itself rather easy to visualise: we first randomly colour the vertices in $V$ with colours from $\{0,1,\star\}$, and then remove from the complete graph on $V$ all edges between vertices in colour 0 and vertices in colour 1.
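The binomial model of Definition 4 can be sampled directly from its description. The sketch below is illustrative only: the function names, the use of `*` for a free coordinate and the `seed` parameter are assumptions made for the example.

```python
import itertools
import random

STAR = "*"  # assumed wildcard symbol for a free coordinate


def sample_subcube(d, p, rng):
    """Sample one feature subcube: each coordinate is free with probability p,
    and fixed to 0 or 1 with probability (1 - p) / 2 each."""
    cube = []
    for _ in range(d):
        u = rng.random()
        if u < p:
            cube.append(STAR)
        elif u < p + (1 - p) / 2:
            cube.append("0")
        else:
            cube.append("1")
    return tuple(cube)


def intersects(a, b):
    """Two subcubes meet unless some coordinate is fixed to 0 in one and 1 in the other."""
    return all(x == y or STAR in (x, y) for x, y in zip(a, b))


def binomial_model(n, d, p, seed=0):
    """Return the feature subcubes and the edge set of a sample on vertices 0..n-1."""
    rng = random.Random(seed)
    cubes = [sample_subcube(d, p, rng) for _ in range(n)]
    edges = {
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if intersects(cubes[u], cubes[v])
    }
    return cubes, edges
```

For instance, `binomial_model(200, 20, 0.35)` returns 200 feature subcubes of $Q_{20}$ together with the edges of the resulting intersection graph. The quadratic loop over pairs is the simplest choice and is adequate only for small vertex sets.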

Degree distribution, edge-density and relation to other models of random graphs
Our two models of random subcube intersection graphs bear some resemblance to previous random graph models. To give the reader some early intuition into the nature of random subcube intersection graphs, we invite her to consider the degree distributions and edge-densities found in them, and to contrast them with models of random graphs with similar degree distributions and edge-densities. The degree of a given vertex in the uniform model G V,d,k is a binomial random variable with parameters |V | − 1 and q, where q is the probability that two uniformly chosen k-dimensional subcubes of Q d meet. If k = k(d) = αd for some fixed α ∈ (0, 1), then one can show This expression is not, however, terribly instructive. The quantity q is also the edge-density of G V,d,k . The appropriate random graph to compare and contrast it with is thus an Erdős-Rényi random graph on V with edge probability q. However G V,d,k displays some significant clustering: our results can be used to show for instance that dependencies between the edges cause triangles to appear well before we see a linear number of edges, in contrast to the Erdős-Rényi model.
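The meeting probability $q$ of the uniform model can be computed exactly for small parameters. The sketch below is not the paper's asymptotic expression. It relies on an elementary observation: a uniform $k$-dimensional subcube is a uniform set of $d-k$ fixed coordinates together with uniform bits on them, so the number of coordinates fixed in both subcubes is hypergeometric, and the subcubes meet exactly when the bits agree on every such coordinate. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def uniform_meet_probability(d, k):
    """Exact probability that two independent uniform k-dimensional subcubes of Q_d meet."""
    m = d - k  # number of fixed coordinates in each subcube
    total = Fraction(0)
    for j in range(m + 1):
        # j coordinates are fixed in both subcubes (hypergeometric weight);
        # the two subcubes must agree on each of them, with probability 1/2 apiece.
        ways = comb(m, j) * comb(d - m, m - j)
        total += Fraction(ways, comb(d, m)) / 2**j
    return total
```

As a sanity check, two uniformly chosen edges of the square $Q_2$ meet unless they are opposite parallel edges, and the function returns $3/4$ for `d=2, k=1`.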
The edge-density of the binomial model $G_{V,d,p}$ is easy to compute: by Remark 1, two feature subcubes are compatible in a given coordinate unless one of them is fixed to 0 and the other to 1 there, so the edge-density is $\left(1 - \frac{(1-p)^2}{2}\right)^d$. The degree distribution of $G_{V,d,p}$ is more complicated, however. Increasing the dimension of a subcube by 1 doubles its volume inside $Q_d$, so that larger subcubes expect much larger degrees. The number of feature subcubes from our graph met by a fixed subcube of dimension $\alpha d$ is a binomial random variable with parameters $|V|$ and $\left(\frac{1+p}{2}\right)^{(1-\alpha)d}$. The number of vertices in $V$ whose feature subcubes have dimension $\alpha d$ is itself a binomial random variable with parameters $|V|$ and $\binom{d}{\alpha d} p^{\alpha d}(1-p)^{(1-\alpha)d}$. As in this paper we will typically be interested in the case where $d$ is large and $V$ has size exponential in $d$, we will expect to see some feature subcubes with dimension much larger or much smaller than $pd$. This will have a noticeable effect on the properties of the graph $G_{V,d,p}$.

[Figure: parameters n = 200, d = 20, p = 0.35.]
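The quantities in this paragraph fit together: averaging the probability that a random feature subcube meets a fixed subcube of dimension $j$ over the binomial distribution of $j$ recovers the edge-density. The sketch below checks this numerically. The closed form $\left(1 - \frac{(1-p)^2}{2}\right)^d$ used for comparison follows from Remark 1, and the function names are hypothetical.

```python
from math import comb


def meet_prob_given_dim(d, p, j):
    """P(a random binomial-model subcube meets a fixed subcube of dimension j):
    on each of the d - j fixed coordinates it must be free or carry the same bit."""
    return ((1 + p) / 2) ** (d - j)


def dim_pmf(d, p, j):
    """P(a feature subcube has dimension j); the dimension is Binomial(d, p)."""
    return comb(d, j) * p**j * (1 - p) ** (d - j)


def edge_density(d, p):
    """Edge-density obtained by conditioning on the dimension of one endpoint."""
    return sum(dim_pmf(d, p, j) * meet_prob_given_dim(d, p, j) for j in range(d + 1))


def edge_density_closed_form(d, p):
    """Per coordinate, an edge fails only on the patterns (0, 1) and (1, 0)."""
    return (1 - (1 - p) ** 2 / 2) ** d
```

The factor $\left(\frac{1+p}{2}\right)^{d-j}$ grows geometrically in $j$, which is the sense in which higher-dimensional feature subcubes expect much larger degrees.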
Among the random graph models studied in the literature, $G_{V,d,p}$ most resembles the multi-type inhomogeneous random graphs studied in [8], though we should point out there are significant differences. First of all some 'types' corresponding to vertices with unusually large or unusually small feature subcubes will have only a sublinear (and random) number of representatives. Secondly, the binomial model shares the clustering behaviour of the uniform model (see Remark 3), differentiating it from the models considered in [8]. (We note that a further general model for inhomogeneous random graphs with clustering was introduced by Bollobás, Janson and Riordan in [9], for which this second point does not apply.)
Finally, let us mention the standard models of random intersection graphs. Write $[m]$ for the discrete interval $\{1, 2, \dots, m\}$. In the binomial random intersection graph model $G(V,[m],p)$, each vertex $v \in V$ is independently assigned a random feature set $f(v) \subseteq [m]$. This feature set is obtained by including $j \in [m]$ in $f(v)$ with probability $p$ and leaving it out otherwise, independently at random for each feature $j \in [m]$. Edges are then added between all pairs of vertices $u, v \in V$ with $f(u) \cap f(v) \neq \emptyset$ to obtain a random intersection graph on $V$. A variant on this model is to choose feature sets $f(v)$ uniformly at random from the $k$-subsets of $[m]$; this yields the uniform random intersection graph model $G(V,[m],k)$.
While these two random intersection graph models bear some resemblance (in terms of clustering, for example) to our random subcube intersection graph models, there are also some significant differences due to the underlying structure of our feature sets. Let us note amongst other things that random intersection graphs do not have the Helly property, and that the effects on the degree of increasing the size of a feature set by 1 in a binomial random intersection graph are far less dramatic than the effects of increasing the dimension of a feature subcube by 1 in a binomial random subcube intersection graph. In particular, the binomial random subcube intersection graph model G V,d,p (which is probably the most natural one to consider given the applications motivating our work) has a much more dramatic variation of degrees than its non-structured counterpart G(V, [m], p).
We end this section by noting that there has been some interest in another model of 'structured' random intersection graphs, namely random interval graphs. The idea here is to associate to each vertex $v \in V$ a feature interval $I_v \subseteq [0,1]$ at random and to set an edge between $u, v \in V$ whenever $I_u \cap I_v \neq \emptyset$. Here 'at random' means the intervals are generated by independent pairs of uniform $U(0,1)$ random variables, which serve as the endpoints. A $d$-dimensional version of this model also exists, where we associate to each vertex a $d$-dimensional box lying inside $[0,1]^d$. This gives rise to (random) $d$-box graphs.
In the setting of interval or $d$-box graphs, we do have the Helly property. The random interval and random $d$-box graph models are however quite different from the random subcube intersection graphs we study in this paper. Indeed, for $d = 1$ subcube intersection graphs can be viewed as interval graphs where the feature intervals $I_v$ are restricted to a small set of possible values, for example $[0,1]$, $[0,1/3]$ and $[2/3,1]$ (to correspond to $\star$, 0 and 1 respectively). This naturally leads to very different dynamics.

Previous work on random intersection graphs and subcube intersection graphs
Subcube intersection graphs were recently introduced by Johnson and Markström [20], with motivation coming from the same example in social choice theory we discussed in Section 1.1. They studied cliques in subcube intersection graphs from an extremal perspective, obtaining a number of results on Ramsey- and Turán-type problems and providing a counterpoint to the probabilistic perspective of the work undertaken in this paper. The random intersection graph models $G(V,[m],p)$ and $G(V,[m],k)$ we presented in the previous subsection have for their part received extensive attention from the research community since they were introduced by Karoński, Scheinerman and Singer-Cohen [22] and Singer-Cohen in her thesis [32]. By now, many results are known on their connectivity [4,19,27,32], hamiltonicity [7,14], component evolution [1,5,27], clique number [6,22,29], independence number [24], chromatic number [2,25], degree distribution [33] and near-equivalence to the Erdős-Rényi model $G_{n,p}$ for some range of the parameters [17,28], amongst other properties. Even more recently, there has been interest in obtaining versions of the results cited above for inhomogeneous random intersection graph models.
Finally there has been some work on random intersection graphs and d-box graphs that runs somewhat parallel to the work of Johnson and Markström and of this paper. From an extremal perspective, sufficient conditions for the existence of large cliques in d-box graphs were investigated by Berg, Norine, Su, Thomas and Wollan [3] in the context of models for social agreement and approval voting, while random interval graphs were introduced by Scheinerman [30], and have been extensively studied [13,18,26,31].

Results of this paper
In this paper we study the behaviour of the binomial and uniform subcube intersection models when $d$ is large (see Remark 2 below for a discussion of the constant $d$ case). We study two main properties, that of containing a clique of size $s = s(d)$, and that of covering the entirety of the underlying hypercube $Q_d$ with the union $\bigcup_{v \in V} f(v)$ of the feature subcubes.
Both of these properties are closed under the addition of vertices to $V$ (or, equivalently, of subcubes $f(v)$ to the family of feature subcubes). The question is then how large $V$ needs to be for these properties to hold with high probability (whp), that is to say with probability tending to 1 as $d \to \infty$. In the case of covering, this question can be thought of as a structured variant of the classical coupon collector problem (see the discussion at the beginning of Section 2.3).
In this paper, we restrict our attention to the binomial model $G_{V,d,p}$ with $p \in (0,1)$ fixed, and to the uniform model $G_{V,d,k}$ with $k = k(d) = \alpha d$ for $\alpha \in (0,1)$ fixed. In both cases, the interesting behaviour occurs when $|V| = e^{xd}$ for $x$ bounded away from 0. We thus typically use $x = \frac{1}{d}\log|V|$ as a parameter, rather than the actual number $|V|$ of vertices in the graph.
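Definition 5. Let $\mathcal{P}$ be a property of subcube intersection graphs that is closed with respect to the addition of vertices. We say that a real number $t \in \mathbb{R}_{\geq 0}$ is a threshold for $\mathcal{P}$ in the binomial model with parameter $p$ if for every sequence of vertex sets $V = V(d)$ the following holds: for $x < t$ and $|V(d)| \leq e^{xd}$ we have $\lim_{d \to \infty} \mathbb{P}(G_{V,d,p} \in \mathcal{P}) = 0$, while for $x > t$ and $|V(d)| \geq e^{xd}$ we have $\lim_{d \to \infty} \mathbb{P}(G_{V,d,p} \in \mathcal{P}) = 1$. Similarly, we say that $T \in \mathbb{R}_{\geq 0}$ is a threshold for $\mathcal{P}$ in the uniform model with parameter $\alpha$ if, with $k = k(d) = \alpha d$, for $x < T$ and $|V(d)| \leq e^{xd}$ we have $\lim_{d \to \infty} \mathbb{P}(G_{V,d,k} \in \mathcal{P}) = 0$, while for $x > T$ and $|V(d)| \geq e^{xd}$ we have $\lim_{d \to \infty} \mathbb{P}(G_{V,d,k} \in \mathcal{P}) = 1$.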
Our main results are determining the thresholds for the appearance of cliques and for covering the ambient hypercube in both the binomial and the uniform model. In most cases we also give some slightly more precise information about the thresholds, going into the lower order terms. We show in particular that around the covering threshold, the clique number of our models undergoes a transition: below the covering threshold, the clique number is whp of order O(1); close to the covering threshold, it is whp of order a power of d; finally above the covering threshold, it is whp of order exponential in d.
Our paper is structured as follows. In Section 2, we state and prove our results for the binomial model. In Section 3 we use these to obtain our results for the uniform model. Finally in Section 4 we discuss small p and large p behaviour, and end with a number of open problems.
Remark 2. In this paper, as we said, we are focussing on our models in the case where d → ∞. What happens when d is fixed and the number of vertices goes to infinity?
In some applications, this may be a more relevant choice of parameters. However, the asymptotic behaviour of our models in this case is not terribly interesting. Indeed, let d be fixed and let U be the family of all subcubes of Q d . We may define a subcube intersection graph G d on U by setting an edge between two subcubes if their intersection is non-empty. The binomial model G V,d,p is then just a random weighted blow-up of G d . Thus knowledge of the finite graph G d will give us essentially all the information we could require concerning the graph G V,d,p as |V | → ∞.
Similarly, the asymptotic behaviour of G V,d,k for d fixed can be inferred from the properties of the intersection graph G d k of the k-dimensional subcubes of Q d . We note that this latter graph G d k may be thought of as a subcube analogue of (the complement of) a Kneser graph, and may be interesting in its own right as a graph theoretical object; this is however outside the scope of the present paper.

A note on approximations
Throughout this paper we shall need some standard approximations. In particular we shall often use m

The binomial model

Summary
In this section, we prove our results for the binomial model. Denote by K s the complete graph on s vertices. Recall that the clique number ω(G) of a graph G is the largest s such that G contains a copy of K s as a subgraph.

Then for every sequence of vertex sets
Corollary 2. Let $p \in (0,1)$ and $s \in \mathbb{N}$ be fixed. The threshold for the appearance of $s$-cliques in $G_{V,d,p}$ is $t_{K_s}(p) = -\frac{1}{s}\log\left(2\left(\frac{1+p}{2}\right)^s - p^s\right)$.

Remark 3. As we shall see in the proof of Theorem 1, from the moment it becomes non-zero, the number of edges in $G_{V,d,p}$ remains concentrated about its expectation $e^{2(x - t_{K_2})d + o(d)}$. If there was no clustering in $G_{V,d,p}$, that is, if cliques appeared no earlier than they would in the Erdős-Rényi model with parameter $e^{-2t_{K_2}d}$, then we would expect $s$-cliques to appear roughly when $x = (s-1)t_{K_2}$. However, it is the case that $t_{K_s} < (s-1)t_{K_2}$ for all $p \in [0,1)$ and all $s \geq 3$. This is an exercise in elementary calculus. In particular, $s$-cliques appear much earlier than we would expect them to given the edge-density of our binomial random subcube intersection graphs. Indeed, letting $p \to 0$, we have for $s \geq 3$ that $t_{K_s}(p) \to \frac{s-1}{s}\log 2$, whereas the number of edges becomes linear in $|V|$ only at $x = 2t_{K_2}(p) \to \log 2$, which is strictly larger; the comparison persists provided $p$ is chosen sufficiently small. Thus for every $s \in \mathbb{N}$, there exists $p_s \in [0,1]$ such that for all fixed $p \in [0, p_s]$, whp we see $s$-cliques appear in $G_{V,d,p}$ before we have a linear number of edges. This stands in stark contrast to the situation for the Erdős-Rényi model.
Corollary 4. Let $p \in (0,1)$ be fixed. Then the threshold for covering the ambient hypercube $Q_d$ with the feature subcubes from $G_{V,d,p}$ is $t_{\mathrm{cover}}(p) = \log\frac{2}{1+p}$.
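A first-moment calculation makes the value $\log\frac{2}{1+p}$ plausible. A fixed point of $Q_d$ lies in a given feature subcube with probability $\left(\frac{1+p}{2}\right)^d$, so with $n = e^{xd}$ subcubes the expected number of uncovered points is $2^d\left(1 - \left(\frac{1+p}{2}\right)^d\right)^n$. The sketch below evaluates the logarithm of this expectation on either side of the threshold. It is only a heuristic illustration with hypothetical function names, not the paper's proof, which also needs a second-moment argument for the lower bound.

```python
from math import exp, log, log1p


def t_cover(p):
    """Covering threshold of the binomial model, as stated in Corollary 4."""
    return log(2 / (1 + p))


def log_expected_uncovered(d, p, x):
    """Natural log of the expected number of points of Q_d missed by e^{xd} subcubes."""
    n = exp(x * d)
    hit = ((1 + p) / 2) ** d  # probability that one subcube contains a fixed point
    return d * log(2) + n * log1p(-hit)
```

With `d = 200` and `p = 0.5`, the value is strongly negative at `x = t_cover(p) + 0.05` and positive at `x = t_cover(p) - 0.05`, matching the stated threshold.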
Theorem 5. Let $p \in (0,1)$ and $\varepsilon > 0$ be fixed, and let $s = s(d)$ be a sequence of integers with $s \gg d/\log d$. Then for every sequence of vertex sets

Theorem 1 is proved in Section 2.2, where in addition we prove some key results on the dimension of the feature subcubes of the vertices in the first $s$-clique to appear in our graph. These will be needed in Section 3 when we study the uniform model. Theorem 3 is proved in Section 2.3, while Theorem 5 is proved in Section 2.4.
Before we proceed to the proofs, let us remark that our results imply that the clique number ω(G V,d,p ) undergoes a transition around the covering threshold.
The following hold: has order e cd+o(d) . Let q(s, d) denote the probability that a given s-set of vertices induces an s-clique in G [n],d,p . We have

Below the covering threshold
It follows by Markov's inequality that whp G [n],d,p contains no s-clique, proving the first part of the theorem.
We use Chebyshev's inequality to show that X is concentrated about this value (and hence that whp G [n],d,p contains an s-clique). Fix i : 0 ≤ i ≤ s. Let A, B be two s-sets of vertices meeting in exactly i vertices. Using Remark 1 and the inclusion-exclusion principle, we compute the probability b i that both A and B induce a copy of K s in G [n],d,p : We claim that the dominating contribution to this sum comes from the i = 0 term. Indeed, for (the second and third term in the exponent coming from log b i and the last two terms coming from with the inequality in the last line coming from substituting t Ks (p) + 2 log s d + ε log d d for x, and rearranging terms. We now resort to the following technical lemma.
Lemma 7. For all y ∈ [0, 1] and all integers 0 ≤ i ≤ s, the following inequality holds: We defer the proof of Lemma 7 (which is a simple albeit lengthy exercise) to Appendix A. Set y = 2p 1+p . As 0 < p < 1 by assumption, we have y ∈ (0, 1). Applying Lemma 7, we have Substituting this into the expression inside the exponential in 1, we get Thus (1)).
In particular, $\mathrm{Var}(X) = o\left((\mathbb{E}X)^2\right)$, and by Chebyshev's inequality whp $X$ is at least $\frac{1}{2}\mathbb{E}X > 0$. Thus whp $G_{[n],d,p}$ contains (many) $s$-cliques, as claimed.

is: This quantity becomes large when $x$ hits the threshold value For every $\varepsilon > 0$, we can deduce from Markov's inequality that for $x \leq t^{\alpha}_{K_s} - \varepsilon$ whp there is no $s$-clique in $G_{[n],d,p}$ in which one of the vertices has a feature subcube of dimension less than $\alpha d$.
Using this fact together with Theorem 1 (or more precisely Corollary 2), we can identify with quite some precision the dimension of the feature subcubes of the vertices which witness the emergence of $s$-cliques in G

Remark 7. For $0 < p < 1$ fixed, the sequence $(\alpha_s)_{s \in \mathbb{N}}$ is strictly increasing and tends to $\frac{2p}{1+p}$ as $s \to \infty$. Note in particular that for all $s > 1$, $\alpha_s(p) > \alpha_1(p) = p$.

Proposition 8. Let $p \in (0,1)$ be fixed. Then for every $s \in \mathbb{N}$ the following equality holds: $t_{K_s} = t^{\alpha_s}_{K_s}$.
Moreover, $t^{\alpha_s}_{K_s}$ is the unique minimum of $t^{\alpha}_{K_s}$ over all $\alpha \in [0,1]$.
Proof. The first part of our proposition is a simple calculation. Recall from the proof of Theorem 1 that $q(s,1) = 2\left(\frac{1+p}{2}\right)^s - p^s$ is the probability that a given $s$-set of vertices forms an $s$-clique in $G_{[n],1,p}$. Note that Thus, as required. Now, let us show that $t^{\alpha_s}_{K_s}$ is in fact the unique minimum of $t^{\alpha}_{K_s}$ over $\alpha \in [0,1]$. This is a straightforward exercise in calculus: making use of our observations above, we may write $s t^{\alpha}_{K_s}$ as The derivative with respect to $\alpha$ is which is strictly negative for $0 \leq \alpha < \alpha_s$, zero for $\alpha = \alpha_s$ and strictly positive for $\alpha_s < \alpha \leq 1$, establishing our claim.

Proof. Pick $\varepsilon > 0$. For all $\alpha$ with $|\alpha - \alpha_s| > \varepsilon$, we have $t^{\alpha}_{K_s} > t_{K_s}$ by Proposition 8. In particular by Markov's inequality, whp the first $s$-clique to appear has no feature subcube of dimension $\alpha d$.
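The one-coordinate clique probability $q(s,1) = 2\left(\frac{1+p}{2}\right)^s - p^s$ can be confirmed by brute force: $s$ symbols from $\{0,1,\star\}$ are mutually compatible exactly when they do not contain both a 0 and a 1. The sketch below enumerates all $3^s$ patterns. The function names are hypothetical, and `first_moment_exponent` is only the first-moment heuristic $-\frac{1}{s}\log q(s,1)$ obtained from $q(s,d) = q(s,1)^d$ (Remark 1), offered as an assumption for illustration.

```python
from itertools import product
from math import log


def clique_prob_one_coordinate(s, p):
    """Enumerate all patterns in {0, 1, *}^s and sum the weight of compatible ones."""
    weights = {"*": p, "0": (1 - p) / 2, "1": (1 - p) / 2}
    total = 0.0
    for symbols in product(weights, repeat=s):
        if "0" in symbols and "1" in symbols:
            continue  # two vertices disagree on this coordinate
        w = 1.0
        for sym in symbols:
            w *= weights[sym]
        total += w
    return total


def closed_form(s, p):
    """All symbols in {*, 0}, or all in {*, 1}, minus the doubly counted all-* pattern."""
    return 2 * ((1 + p) / 2) ** s - p**s


def first_moment_exponent(s, p):
    """Value of x at which e^{sxd} * q(s, 1)^d is of order 1 (heuristic only)."""
    return -log(closed_form(s, p)) / s
```

For `p = 0.3` the heuristic gives a value for triangles that is smaller than twice the value for edges, in line with the clustering phenomenon discussed in Remark 3.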

The covering threshold
We may view the question of covering the hypercube Q d with randomly selected subcubes as an instance of the following problem.

Problem 1 (Generalised Coupon Collector Problem).
Let $\Omega$ be a (large) finite set, and let $X$ be a random variable taking values in the subsets of $\Omega$. Suppose we are given a sequence of independent random variables $X_1, X_2, \dots, X_n$ with distribution given by $X$. When (for which values of $n$) do we have $\bigcup_{i=1}^{n} X_i = \Omega$ holding whp?
When X is obtained by selecting a singleton from Ω uniformly at random, Problem 1 is the classical coupon collector problem (see [23] for an early incarnation of the problem).
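Problem 1 is easy to experiment with by simulation. The sketch below is illustrative only: the names `covering_time` and `singleton_sampler` are assumptions, and a sampler is any function that maps a random number generator to a random subset of $\Omega = \{0, \dots, m-1\}$.

```python
import random


def covering_time(sample, universe_size, rng):
    """Draw independent random subsets until their union is the whole universe;
    return the number of draws used."""
    covered = set()
    n = 0
    while len(covered) < universe_size:
        covered |= sample(rng)
        n += 1
    return n


def singleton_sampler(m):
    """Classical coupon collector: a uniformly random singleton of {0, ..., m - 1}."""
    return lambda rng: {rng.randrange(m)}
```

Calling `covering_time(singleton_sampler(100), 100, random.Random(0))` gives one sample of the classical coupon collector time for 100 coupons. Passing a sampler that returns the set of points of a random subcube would turn this into the subcube covering problem studied here.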
The bounds we give in Proposition 10 are very crude, but are in a sense of the right order. That the upper bound is essentially best possible follows from classical results on the standard coupon collector problem, where $X$ is a singleton from $\Omega$ chosen uniformly at random (see for example Erdős and Rényi's result [15] on the distribution of the least $n$ for which $\bigcup_{i=1}^{n} X_i = \Omega$). To see that the lower bound is basically best possible, consider the following example.  (1) , which is of constant order.
What Example 10 shows is that the variance of the variable $X$ can have a significant influence on the covering time (the number $n$ of independent copies of $X$ necessary to have $\bigcup_{i=1}^{n} X_i = \Omega$ holding whp). In our setting, we have $\Omega = Q_d$, and the $(X_i)_{i=1}^{n}$ are the feature subcubes of vertices in the binomial random subcube intersection graph $G_{[n],d,p}$. The expected volume of a feature subcube $f(v)$ is $\sum_{k=0}^{d}\binom{d}{k}p^k(1-p)^{d-k}2^k = (1+p)^d$. However typical feature subcubes have dimension $pd$ and thus volume $2^{pd}$. Since $2^p < 1+p$ for all $p \in (0,1)$, typical feature subcubes therefore have a volume much smaller than the expected volume. In particular, the variance of the volume of a feature subcube is large, and thus our binomial subcube version of Problem 1 is significantly different from classical coupon collector problems.
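The gap between expected and typical volume is easy to see numerically. The sketch below averages the volume $2^k$ over the Binomial$(d,p)$ distribution of the dimension $k$, which by the binomial theorem equals $(1+p)^d$, and compares it with the volume $2^{pd}$ of a subcube of typical dimension. The function names are hypothetical.

```python
from math import comb


def expected_volume(d, p):
    """E[2^K] for K ~ Binomial(d, p), the dimension of a feature subcube."""
    return sum(comb(d, k) * p**k * (1 - p) ** (d - k) * 2**k for k in range(d + 1))


def typical_volume(d, p):
    """Volume of a subcube whose dimension equals the mean p * d."""
    return 2 ** (p * d)
```

The ratio `expected_volume(d, p) / typical_volume(d, p)` equals $\left(\frac{1+p}{2^p}\right)^d$ and therefore grows exponentially in $d$ for every fixed $p \in (0,1)$.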
On the other hand, we have $\frac{1}{d}\log\frac{|Q_d|}{\mathbb{E}|f(v)|} = \log\frac{2}{1+p}$ for $v \in V$, which is exactly the threshold for covering we prove in Theorem 2.3. Thus despite the large variance in the volume of the feature subcubes $f(v)$ discussed above, it is the case that in the binomial random subcube intersection graph model the covering threshold agrees with the prediction we would make based on the classical coupon collector problem.
This having been said, let us make one more definition before proceeding to the proof of our covering result, Theorem 3.

Above the covering threshold
For n ≥ e xd , we have EVol[n] ≥ e ε s2 d , so that in particular elements of the ambient hypercube expect to be contained in e ε s > s feature subcubes. Thus to show that G [n],d,p whp contains (many) s-cliques for this value of n, it is enough to show that Vol[n] is concentrated about its mean. Again, we use the second-moment method to do this. By linearity of variance we have Applying Chebyshev's inequality, In particular, proving the claimed upper bound on the threshold for the emergence of s-cliques.
In particular the expected number of elements of Q d contained in at least It follows by Markov's inequality that whp there is no such element, and thus, by the Helly property for subcube intersections, that whp G [n],d,p contains no copy of K s .

The uniform model
In this section, we prove our results for the uniform model. We note that these are generally less precise than those we obtained for the binomial model, owing to the greater difficulty of performing clique computations.

Summary
Fix $s \in \mathbb{N}$. We established in Section 2 (Corollary 9) that in $G_{V,d,p}$, whp the feature subcubes of the vertices in the first $s$-clique to appear all have dimension $(\alpha_s + o(1))d$, where $\alpha_s$ is the function: We shall show in Proposition 16 that $\alpha_s$ is a bijection from $(0,1)$ to itself. This will allow us to determine the threshold for the appearance of $s$-cliques in the uniform model.

Theorem 11. Let $\alpha \in (0,1)$ and $s \in \mathbb{N}$ be fixed, and let $k(d) = \alpha d$. Set $p = \alpha_s^{-1}(\alpha)$. Then, the threshold for the appearance of $s$-cliques in

Theorem 12. Let $\alpha \in (0,1)$ and $\varepsilon > 0$ be fixed, and let $k(d) = \alpha d$. Let $V = V(d)$ be a sequence of vertex sets with $|V(d)| = e^{xd}$. Then, for the

Corollary 13. Let $\alpha \in (0,1)$ be fixed, and let $k(d) = \alpha d$. Then the threshold for covering the ambient hypercube $Q_d$ with the feature subcubes from $G_{V,d,k}$ is $T_{\mathrm{cover}}(\alpha) = (1-\alpha)\log 2$.
Remark 9. As we observed in Remark 7, we have $\lim_{s \to \infty}\alpha_s(p) = \frac{2p}{1+p}$. From this we deduce that for large $s$, we have $\alpha_s^{-1}(\alpha) = \frac{\alpha}{2-\alpha} + o(1)$. Substituting this into $T_{K_s}(\alpha)$, we see that as $s \to \infty$, mirroring our observation in Remark 4 for the binomial model.
Remark 10. Theorem 11 and Corollary 13 show how significant 'outliers' (subcubes with unusually high dimension) are for the behaviour of the binomial model. Indeed, Corollary 9 tells us that for $0 < p < 1$ fixed and $s \geq 3$, the vertices in the first $s$-clique to appear in $G_{V,d,p}$ have feature subcubes of dimension $(\alpha_s(p) + o(1))d$. Since $\alpha_s(p) > p$ it shall follow straightforwardly from the proof of Theorem 11 that $t_{K_s}(p) < T_{K_s}(p)$. Similarly, by Corollaries 3 and 13, we have for $0 < p < 1$ fixed that $t_{\mathrm{cover}}(p) = \log\frac{2}{1+p} < (1-p)\log 2 = T_{\mathrm{cover}}(p)$.
From the covering threshold upwards, Corollary 13 and Theorem 14 suggest that the right instance of the binomial model to compare $G_{V,d,\alpha d}$ with is $G_{V,d,2^{\alpha}-1}$ (rather than the $G_{V,d,\alpha}$ we might have expected). For these two models, the covering threshold and the thresholds for higher order cliques coincide. Since both models have the same expected volume of feature subcubes, this vindicates the use of volume/covering arguments for determining the thresholds for higher order cliques. Note however that $G_{V,d,\alpha d}$ and $G_{V,d,2^{\alpha}-1}$ have different thresholds for lower order cliques. Our binomial model and uniform model thus behave very differently.
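The matching of covering thresholds can be verified in a couple of lines. The sketch below reads the binomial parameter in question as $p = 2^{\alpha} - 1$, the value for which the expected volume $(1+p)^d$ equals the volume $2^{\alpha d}$ of an $\alpha d$-dimensional subcube. That reading, and the function names, are assumptions made for the illustration.

```python
from math import log


def matched_p(alpha):
    """Binomial parameter whose expected subcube volume (1 + p)^d equals 2^(alpha * d)."""
    return 2**alpha - 1


def t_cover(p):
    """Covering threshold of the binomial model."""
    return log(2 / (1 + p))


def T_cover(alpha):
    """Covering threshold of the uniform model (Corollary 13)."""
    return (1 - alpha) * log(2)
```

For any `alpha` in $(0,1)$, `t_cover(matched_p(alpha))` agrees with `T_cover(alpha)` up to floating-point error, whereas `t_cover(alpha)` is strictly smaller, which is the inequality recorded in Remark 10.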
Finally, let us add that, just as in the binomial model, the clique number ω(G V,d,k ) in the uniform model undergoes a transition around the covering threshold.
Remark 11. There is a gap here: we do not know what the order of the clique number is when for a fixed real $\gamma$ with $0 < \gamma \leq 1$. We make the natural conjecture that for this value of $x(d)$, we should have $\omega(G_{V,d,k}) = d^{\gamma + o(1)}$, similarly to the binomial model.

Below the covering threshold
Proposition 16. The function α s is a bijection from [0, 1] to itself, and has a continuous inverse over its domain.
Proof. Since α s (0) = 0 and α s (1) = 1, all we have to do is show that the derivative of α s with respect to p is strictly positive in [0, 1], whence we are done by the inverse function theorem.
Thus the minimum of the numerator is attained when y(p) = 1. In particular, In general, computing an explicit closed-form expression for the inverse of α s is difficult, reflecting the fact that computing the probability that the intersection of an s-set of k-dimensional subcubes chosen uniformly at random is non-empty is hard. It is for this reason that in Theorem 11 we give the thresholds for the uniform model in terms of the thresholds for the binomial model.
Proof of Theorem 11. The key observation is that we can view the binomial model as the result of a two stage random process. In the first stage we randomly partition the set of vertices V into sets V 0 , V 1 . . . V d , according to the binomial distribution with parameter p. In the second stage for each k we associate independently to each vertex in V k a feature subcube of dimension k chosen uniformly at random, and then build the subcube intersection graph as normal. In particular, the restriction of G V,d,p to the (random) subset V k is exactly (an instance of) the uniform model G V k ,d,k . We shall use this to pull results back from the binomial model to the uniform model.
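The two-stage description can be written out as a sampler. The sketch below is illustrative, with hypothetical function names and `*` standing for a free coordinate: it first draws the dimension from the Binomial$(d,p)$ distribution and then a uniform subcube of that dimension.

```python
import random

STAR = "*"  # assumed wildcard symbol for a free coordinate


def uniform_subcube(d, k, rng):
    """Uniform k-dimensional subcube of Q_d: a uniform set of k free coordinates
    and independent uniform bits on the remaining d - k coordinates."""
    free = set(rng.sample(range(d), k))
    return tuple(STAR if i in free else rng.choice("01") for i in range(d))


def two_stage_subcube(d, p, rng):
    """Stage 1: draw the dimension K ~ Binomial(d, p).
    Stage 2: draw a uniform subcube of dimension K."""
    k = sum(rng.random() < p for _ in range(d))
    return uniform_subcube(d, k, rng)
```

Each coordinate of the output is free with probability $p$ and otherwise carries a uniform bit, independently across coordinates, so `two_stage_subcube` has the same distribution as a feature subcube in Definition 4. Conditioning on the dimension being $k$ leaves exactly the uniform model.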
By a standard Chernoff bound, for d large enough there are whp at most ,d,p whose feature subcubes have dimension in the range $[k_-, k_+]$. Let $V'$ denote this set of vertices.
For each $v \in V'$ with a feature subcube of dimension $k$ with $k_- \leq k \leq k_+$, select a $(k_+ - k)$-subset of its fixed coordinates uniformly at random from all possibilities, and change those coordinates to wildcards $\star$. This gives a new feature subcube $f'(v)$ with dimension exactly $k_+$.
We now restrict our attention to the subcube intersection graph $G'$ defined by $V'$ and the 'lifted' feature subcubes $(f'(v))_{v \in V'}$. Observe that the distribution on $k_+$-dimensional subcubes given by $f'$ is exactly the uniform distribution. Thus $G'$ is in fact an instance of the uniform model $G_{V',d,k_+}$. Furthermore the 'lifting' procedure we performed on the feature subcubes $(f(v))_{v \in V'}$ has not destroyed any edge of the restriction of $G_{[N],d,p}$ to $V'$ (increasing the dimension of feature subcubes can only add edges), so that whp $G'$ contains an $s$-clique.
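The lifting step can be sketched as follows. The names `lift` and `intersects` and the `*` encoding of free coordinates are assumptions made for the example. The procedure frees a uniformly random subset of the fixed coordinates so that the subcube reaches a prescribed target dimension.

```python
import random

STAR = "*"  # assumed wildcard symbol for a free coordinate


def intersects(a, b):
    """Two subcubes meet unless some coordinate is fixed to 0 in one and 1 in the other."""
    return all(x == y or STAR in (x, y) for x, y in zip(a, b))


def lift(cube, target_dim, rng):
    """Turn a uniformly chosen set of fixed coordinates into wildcards so that
    the result has dimension exactly target_dim."""
    fixed = [i for i, x in enumerate(cube) if x != STAR]
    extra = target_dim - (len(cube) - len(fixed))
    if extra < 0 or extra > len(fixed):
        raise ValueError("target dimension must lie between the current dimension and d")
    freed = set(rng.sample(fixed, extra))
    return tuple(STAR if i in freed else x for i, x in enumerate(cube))
```

Since lifting only replaces fixed symbols by wildcards, the lifted subcube contains the original one, so any two subcubes that met before lifting still meet afterwards. This is the monotonicity used in the proof.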
It follows that the threshold for the appearance of $s$-cliques in the uniform model with parameter $k_+ = \alpha d$ is at most Since $\varepsilon, \eta > 0$ were arbitrary, and since both $p'$ and $t_{K_s}$ are continuous functions (of $\alpha - \eta$ and $p' = (\alpha_s)^{-1}(\alpha - \eta)$ respectively), the threshold for the appearance of $s$-cliques in the $k_+$-uniform model is at most proving the claimed upper bound on $T_{K_s}(\alpha)$. (Recall that $p = (\alpha_s)^{-1}(\alpha) = \lim_{\eta \to 0} p'$.)

Lower bound: consider the binomial random subcube intersection graph $G_{[N],d,p}$, and let $X = \frac{\log N}{d}$. Suppose $X = t_{K_s}(p) - \varepsilon + o(1)$. By Corollary 2, whp $G_{[N],d,p}$ contains no $s$-clique. In particular the subgraph of $G_{[N],d,p}$ induced by the set of vertices $V'$ whose feature subcubes have dimension $\alpha d$ is also $K_s$-free. As we observed, this random subgraph is identical in distribution to the random uniform subcube intersection graph $G_{V',d,k}$. Let $N' = |V'|$ be the number of vertices it contains.
By a standard Chernoff bound, whp […]. It follows that the threshold for the whp appearance of $s$-cliques in the uniform model $G_{V,d,k}$ is at least […]. Since $\varepsilon > 0$ was arbitrary, the claimed lower bound on $T_{K_s}(\alpha)$ follows.

The covering threshold
Proof of Theorem 12. This is very similar to the proof of Theorem 2.3. Assume without loss of generality that $V = [n]$. We let $\alpha \in (0, 1)$ be fixed, set $k = k(d) = \alpha d$ and consider the uniform random subcube intersection graph $G_{[n],d,k}$ […]. Now let $\varepsilon$ be fixed with $0 < \varepsilon < \log 2$.
Upper bound: suppose $n = e^{xd} \ge 2^{d-k} d(\log 2 + \varepsilon)$. Then the expected number of uncovered elements of $Q_d$ is at most $e^{-\varepsilon d + o(d)} = o(1)$, whence by Markov's inequality we have that whp $\bigcup_{v=1}^{n} f(v) = Q_d$, as desired.
Lower bound: suppose $n = e^{xd} = 2^{d-k} d(\log 2 - \varepsilon)$. Then the expected number of uncovered elements of $Q_d$ is $e^{\varepsilon d + o(d)}$, which is large, and we use Chebyshev's inequality to show that the actual number of uncovered elements is concentrated about this value. As before, we compute the expectation of the square of the number of uncovered elements by considering pairs of points lying at Hamming distance $i$ from one another. Let $e_{[i]}$ denote the element of $Q_d$ with $1$ in the first $i$ coordinates and $0$ otherwise.
We now bound the sum above just as we did in the proof of Theorem 2.3, to show […]. Since the details are identical, we omit them. We deduce just as in Theorem […]. By Chebyshev's inequality, whp at least $e^{\varepsilon d + o(d)} \gg 1$ elements of $Q_d$ are not covered by $\bigcup_{v=1}^{n} f(v)$, as required.
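The first-moment computation behind both bounds can be checked numerically. The sketch below assumes $k < d$ and uses the fact that a fixed point of $Q_d$ lies in a uniform $k$-dimensional subcube with probability $2^{-(d-k)}$, so that the expected number of uncovered points is $2^d (1 - 2^{-(d-k)})^n$; the function names are hypothetical.

```python
import math


def log_expected_uncovered(d, k, n):
    """Natural log of E[#uncovered points] = 2^d * (1 - 2^{-(d-k)})^n,
    for n independent uniform k-dimensional subcubes of Q_d (k < d)."""
    q = 2.0 ** (-(d - k))  # chance a fixed point lies in one subcube
    return d * math.log(2) + n * math.log1p(-q)


def threshold_n(d, k, eps):
    """n = 2^{d-k} * d * (log 2 + eps); eps < 0 gives the lower-bound regime."""
    return math.ceil(2 ** (d - k) * d * (math.log(2) + eps))
```

For instance, with $d = 40$, $k = 20$ and $\varepsilon = 0.1$, the logarithm of the expectation is close to $-\varepsilon d = -4$ at `threshold_n(40, 20, 0.1)` and close to $+4$ at `threshold_n(40, 20, -0.1)`, matching the $e^{\mp \varepsilon d + o(d)}$ behaviour used in the proof.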

Above the covering threshold
Proof of Theorem 14. This is similar to the proof of Theorem 5. Without loss of generality, we may assume that $V = [n]$. Fix $\varepsilon > 0$ and $\alpha \in (0, 1)$. Let $k = k(d) = \alpha d$, and consider the random subcube intersection graph $G_{[n],d,k}$. […] we have by the pigeon-hole principle that some $x \in Q_d$ is contained in at least $s$ feature subcubes, and thus $G_{[n],d,k}$ contains a copy of $K_s$.
Lower bound: suppose $n \le (1 - \varepsilon)s2^{d-k}$. Let $\mathbf{0}$ be the all-zero element of $Q_d$. The number $C_{\mathbf{0}}$ of feature subcubes containing $\mathbf{0}$ is the sum of $n$ independent identically distributed Bernoulli random variables with parameter $2^{-(d-k)}$. We have $\mathbb{E}C_{\mathbf{0}} = n2^{-(d-k)} \le (1 - \varepsilon)s$. Applying a Chernoff bound, we deduce that $\mathbb{P}(C_{\mathbf{0}} \ge s) \le e^{-\frac{\varepsilon^2}{3}s}$. In particular, by symmetry and linearity of expectation, the expected number of elements of $Q_d$ contained in at least $s$ feature subcubes is at most $2^d e^{-\frac{\varepsilon^2}{3}s}$, which is $o(1)$ for $s \gg d$. It follows by Markov's inequality and the Helly property for subcube intersections that whp $G_{[n],d,k}$ contains no copy of $K_s$.
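Both halves of the argument rest on the Helly property: the clique number of a subcube intersection graph equals the largest number of feature subcubes containing a common point of $Q_d$. A brute-force sketch for small instances, with hypothetical helper names and a `"*"` wildcard encoding chosen for illustration, is:

```python
import itertools
import random

WILDCARD = "*"  # hypothetical symbol for a free coordinate


def contains(cube, point):
    """True iff the point of Q_d lies in the subcube."""
    return all(c == WILDCARD or c == x for c, x in zip(cube, point))


def intersects(a, b):
    """Two subcubes meet iff no coordinate is fixed to different bits."""
    return all(x == y or WILDCARD in (x, y) for x, y in zip(a, b))


def max_depth(cubes, d):
    """Largest number of subcubes sharing a common point of Q_d."""
    return max(
        sum(contains(c, pt) for c in cubes)
        for pt in itertools.product("01", repeat=d)
    )


def clique_number(cubes):
    """Brute-force clique number of the intersection graph (small inputs only)."""
    n = len(cubes)
    for s in range(n, 0, -1):
        for group in itertools.combinations(range(n), s):
            if all(intersects(cubes[i], cubes[j])
                   for i, j in itertools.combinations(group, 2)):
                return s
    return 0
```

On any small family of subcubes the two quantities agree, and whenever the total number of points covered with multiplicity, $n2^k$, exceeds $(s-1)2^d$, the pigeon-hole principle forces `max_depth` to be at least $s$.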

Concluding remarks

Small p and large p
In this paper we focussed on the case where $p \in (0, 1)$ is fixed (in the binomial model). Let us make here a few remarks about the small $p$ and large $p$ cases. Small $p$: note first of all that the proofs of Theorems 1, 3 and 5 all extend to the case when $p = p(d) \to 0$ as $d \to \infty$. Similarly, Theorems 12 and 14 for the uniform model also hold when $\alpha = \alpha(d) \to 0$ as $d \to \infty$. The proof of Theorem 11 does not, however, go through as it is stated in the paper: it needs stronger concentration for the dimension of the feature subcubes in Corollary 9.
There are two further remarks worth making concerning the small $p$ case. First of all, as $p$ (or $\alpha$) tends to $0$, the covering results Theorems 3 and 12 'converge' to the classical coupon collector problem. Taking $\Omega = Q_d$ and independently drawing random elements from $\Omega$, we expect to make roughly $|\Omega| \log |\Omega| = 2^d d \log 2$ draws before we cover $\Omega$, and this is the limit of $\left(\frac{2}{1+p}\right)^d d \log 2$ as $p \to 0$ (binomial model) and of $2^{(1-\alpha)d} d \log 2$ as $\alpha \to 0$ (uniform model).
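As a quick numerical illustration of this limit, the following sketch evaluates the logarithms of the three counts for a fixed $d$; the function names are hypothetical, and logarithms are used only to avoid overflow for large $d$.

```python
import math


def coupon_log_count(d):
    """log of 2^d * d * log 2 (coupon collector on Q_d)."""
    return d * math.log(2) + math.log(d * math.log(2))


def binomial_cover_log_count(d, p):
    """log of (2 / (1 + p))^d * d * log 2 (binomial model)."""
    return d * math.log(2 / (1 + p)) + math.log(d * math.log(2))


def uniform_cover_log_count(d, alpha):
    """log of 2^{(1 - alpha) d} * d * log 2 (uniform model)."""
    return (1 - alpha) * d * math.log(2) + math.log(d * math.log(2))
```

For fixed $d$, both `binomial_cover_log_count(d, p)` and `uniform_cover_log_count(d, alpha)` increase to `coupon_log_count(d)` as $p$ and $\alpha$ decrease to $0$.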
Secondly, the uniform model with constant parameter $k = 1$ is closely related to bond percolation on the hypercube, which is a well-studied random graph model in its own right (see e.g. [10,11]). On the other hand, the binomial model with parameter $p = \frac{1}{d}$ is different: the dimensions of its feature subcubes have an approximately Poisson distribution, and one does see feature subcubes of large bounded dimension. These will have an impact on the thresholds for lower-order cliques; indeed, a quick calculation shows that for $s$ fixed, the expected dimension of feature subcubes in $s$-cliques of $G_{V,d,\frac{1}{d}}$ […].
Large $p$: in this case, we expect quasirandom behaviour from $G_{V,d,p}$. We establish it below in the special case when $p = 1 - \varepsilon(d)$, with $\varepsilon(d)$ of order $\frac{1}{\sqrt{d}}$, when the edge-density is of constant order.
Proposition 17. Let $\varepsilon(d)$ be a sequence of reals from the interval $[0, 1]$ with $\varepsilon^2 d$ bounded away from both $0$ and $+\infty$. Then for $p = 1 - \varepsilon$, with probability tending to $1$ as $n \to \infty$ the graph $G_{[n],d,p}$ is quasirandom with parameter […].
Proof. We shall use the celebrated quasirandomness theorem of Chung, Graham and Wilson [12], which states (amongst other things) that if the number of $K_2$ (edges) and the number of $C_4$ (4-cycles) contained in a graph $G$ are 'what you would expect if the graph was a typical Erdős-Rényi random graph with parameter $q$', then $G$ is quasirandom with parameter $q$. We now verify that the numbers of edges $\#\{K_2\}$ and of 4-cycles $\#\{C_4\}$ are concentrated about their respective expectations. We appeal to the second moment method once more. For the edge $K_2$ we already established that $\operatorname{Var}\#\{K_2\} = o\left((\mathbb{E}\#\{K_2\})^2\right)$ in the proof of Theorem 1. Thus by Chebyshev's inequality, we have the required concentration: whp […]. Our proposition then follows from the quasirandomness theorem of Chung, Graham and Wilson [12].
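To see why $\varepsilon(d)$ of order $\frac{1}{\sqrt{d}}$ gives an edge-density of constant order, one can compute the edge probability directly. The sketch below is an illustration derived under an assumption, not a formula quoted from the text: it assumes each coordinate of a feature subcube is independently a wildcard with probability $p$ and otherwise a uniform bit, so that two subcubes clash on a given coordinate with probability $\frac{(1-p)^2}{2}$. The function name is hypothetical.

```python
import math


def edge_probability(d, p):
    """Probability that two independent binomial feature subcubes intersect,
    assuming each coordinate is a wildcard w.p. p and else a uniform bit:
    (1 - (1 - p)^2 / 2)^d."""
    clash = (1 - p) ** 2 / 2  # both coordinates fixed, to different bits
    if clash >= 1:
        return 0.0
    return math.exp(d * math.log1p(-clash))
```

With $p = 1 - \frac{c}{\sqrt{d}}$ this evaluates to $\left(1 - \frac{c^2}{2d}\right)^d$, which tends to $e^{-c^2/2}$ as $d \to \infty$, a constant strictly between $0$ and $1$.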
Thus in this case the binomial model behaves like an Erdős-Rényi random graph. It is not hard to use this to show that the uniform model with parameter $k = d - O(\sqrt{d})$ is also quasirandom. As such, there is nothing very novel about our models in this range.

Further questions
There are a number of further natural questions to ask about our models.
Two such questions concern connectivity and component evolution. We address them in a forthcoming paper [16], in which we show amongst other things that the connectivity threshold for the binomial model with $p \le \frac{1}{3}$ is $t_{\mathrm{connect}} = \log \frac{2}{1+p}$, coinciding with the covering threshold. For the range $p > \frac{1}{3}$ on the other hand, we relate the connectivity of the binomial model to that of the uniform model for a suitable choice of parameter $k$.
A big open problem remains to understand independence in the context of subcube intersection graphs. We do not know how to track the independence number of our models, and more generally we do not know how to perform anything but the most basic computations involving non-edges. Similarly, we have a lower bound on the chromatic number coming from the clique number, but no non-trivial upper bound.
Finally, given our motivation for studying subcube intersection graphs, it would be desirable to allow some bias in the distribution of the feature subcubes. For instance, in a polarised society with two-party politics it is likely citizens will have either mostly zeroes ('left-wing opinions') or mostly ones ('right-wing opinions') amongst their opinions. It would then be interesting to study the change in the behaviour of our models as the polarisation becomes stronger.

$g(0) = 2^{s-i} > 1$ and $g(1) = 1$, so we will be done if we can show that the function $g$ is monotone decreasing in the interval $[0, 1]$.
The claim above implies that $g'(y) \le 0$ for all $y \in [0, 1]$, whence $g(y) \ge g(1) = 1$ for all $y \in [0, 1]$, as desired. Let us therefore prove it to conclude the proof of Lemma 7. This completes the proof of Lemma 7.