Packing degenerate graphs

Given $D$ and $\gamma>0$, whenever $c>0$ is sufficiently small and $n$ sufficiently large, if $\mathcal{G}$ is a family of $D$-degenerate graphs of individual orders at most $n$, maximum degrees at most $\tfrac{cn}{\log n}$, and total number of edges at most $(1-\gamma)\binom{n}{2}$, then $\mathcal{G}$ packs into the complete graph $K_{n}$. Our proof proceeds by analysing a natural random greedy packing algorithm. This version of the manuscript corrects a small error that appeared in the published version [Adv Math, 354 (2019), 106739].


Introduction
A packing of a family $G = \{G_1, \dots, G_k\}$ of graphs into a graph $H$ is a colouring of the edges of $H$ with the colours $0, 1, \dots, k$ such that the edges of colour $i$ form an isomorphic copy of $G_i$ for each $1 \le i \le k$. The packing is perfect if no edge has colour 0. We will often say an edge is covered in a packing if it has colour at least 1, and uncovered if it has colour 0.
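For concreteness, the following Python sketch (our own illustration; graphs are represented as edge sets, and a packing as a map from the edges of $H$ to colours) checks this definition by brute force, so it is only usable for very small examples.

    from itertools import permutations

    def is_packing(H_edges, guests, colouring):
        # H_edges: set of frozensets {u, v}; guests: list of edge lists [(a, b), ...];
        # colouring: dict mapping each edge of H to a colour in {0, 1, ..., len(guests)}.
        V_H = sorted({v for e in H_edges for v in e})
        for i, G_edges in enumerate(guests, start=1):
            class_i = {e for e, c in colouring.items() if c == i}
            V_G = sorted({v for e in G_edges for v in e})
            # brute-force search for an injection V_G -> V_H carrying G onto colour class i
            found = any(
                {frozenset((phi[a], phi[b])) for a, b in G_edges} == class_i
                for image in permutations(V_H, len(V_G))
                for phi in [dict(zip(V_G, image))]
            )
            if not found:
                return False
        return True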
Packing problems have been studied in graph theory for several decades. Many classical theorems and conjectures of extremal graph theory can be written as packing problems. For example, Turán's theorem can be read as the statement that if an $n$-vertex graph $G$ does not have too many edges (depending on $r$), then $G$ and $K_r$ pack into $K_n$. Putting extremal statements into this context often suggests interesting generalisations, such as asking for packings of more graphs. However, packings in this context are usually very far from being perfect packings, with a large fraction of $E(H)$ uncovered. By contrast, in this paper we are interested in near-perfect packings, that is, packings in which $o(e(H))$ edges are uncovered.
The first problems asking for perfect packings in graphs actually predate modern graph theory: Plücker [23] in 1835 found perfect packings of $\tfrac13\binom{n}{2}$ copies of $K_3$ into $K_n$ for various values of $n$, and more generally, Steiner [26] in 1853 asked the following question (phrased then in set-theoretic terms).

Question 1. Given $2 \le k \le r$, for which values of $n$ does the complete $k$-uniform hypergraph $K^{(k)}_n$ have a perfect packing with copies of $K^{(k)}_r$?

A packing of this form is called a combinatorial design. There are some simple divisibility conditions on $n$ which are necessary for an affirmative answer. Recently and spectacularly, Keevash [19] proved that for sufficiently large $n$ these conditions are also sufficient. This result was reproved, using a more combinatorial method, by Glock, Kühn, Lo and Osthus [14], who were also able to extend the result to packings with arbitrary fixed hypergraphs in [15]. A related problem, tracing back to Kirkman [21] in 1846, asks for such packings with additional structure. Closest to the present paper is the work of Adamaszek, Allen, Grosu and Hladký [1]. The focus of [1] is the so-called Graceful Tree Conjecture, but there is a well-known observation that this conjecture would imply Ringel's Conjecture; see [1, Section 1.1].
To state our main result, we need to define the degeneracy of a graph $G$. An ordering of $V(G)$ is $D$-degenerate if every vertex has at most $D$ neighbours preceding it, and $G$ is $D$-degenerate if $V(G)$ has a $D$-degenerate ordering. Every graph from a non-trivial minor-closed class has bounded degeneracy. In particular, trees are 1-degenerate and planar graphs are 5-degenerate. Of course, every graph of bounded maximum degree automatically has bounded degeneracy.
Our main result then reads as follows.
Theorem 2. For each $\gamma > 0$ and each $D \in \mathbb{N}$ there exist $c > 0$ and a number $n_0$ such that the following holds for each integer $n > n_0$. Suppose that $(G_t)_{t\in[t^*]}$ is a family of $D$-degenerate graphs, each of which has at most $n$ vertices and maximum degree at most $cn/\log n$. Suppose further that the total number of edges of $(G_t)_{t\in[t^*]}$ is at most $(1-\gamma)\binom{n}{2}$. Then $(G_t)_{t\in[t^*]}$ packs into $K_n$.
Theorem 2 thus strengthens the main results about packings into complete graphs from [6, 22, 10, 20, 11]. The main features of the result are that the guest graphs may be spanning, may be expanding, and may have very high maximum degree.
Moving away from packing into complete graphs, there are several classical conjectures which ask for packing results similar to the above when K n is replaced by a graph of sufficiently high minimum degree, perhaps with additional constraints (such as regularity). Advances have recently been made on several of these, especially by the Birmingham Combinatorics group (see for example [8,4,13]). In particular, we should observe that the near-perfect packing for bounded degree graphs [20] mentioned above actually works in the setting of ε-regular partitions, which turned out to be necessary for the perfect packing results of [18].
Finally, in line with the current trend in extremal combinatorics of asking for random analogues of classical extremal theorems, one can ask for packing results when K n is replaced by a typical binomial random graph G(n, p). This is actually the focus of the paper of Ferber and Samotij [11], and they are able to prove near-perfect packing results even in G(n, p) when p is not much above the threshold for connectivity. Our approach also proves near-perfect packing results (for the same family of graphs) in sufficiently quasirandom graphs of any positive constant edge density (see Theorem 11), and hence in Erdős-Rényi random graphs (see Theorem 12). It might be possible to modify our approach to work in somewhat sparse random graphs as well, but certainly not sparse enough to compete with [11].
Although our current progress with actually proving exact packing conjectures is limited, at least we have not found counterexamples. The existing conjectures point in the following direction.
Meta-Conjecture 3. Let G be any family of sparse graphs, and H be an n-vertex dense graph. If there is no simple obstruction to packing G into H, then a packing exists.
Some obvious examples of obstructions include the total number of edges in the family G being larger than e(H), or any graph in G having more vertices than H. Certainly more subtle obstructions exist. For example it is possible that the total number of edges in graphs of G equals e(H), but all graphs in G have only vertices of even degree, while some vertices of H have odd degree, so that there is a parity obstruction to packing G into H, or that G contains two graphs with vertices of degree n − 1 (or more generally too many vertices of very high degree). More such examples exist, see for example the discussions in [6] (Section 9.1) and [18] (after Theorem 1.7). The meta-conjecture can be read as claiming that there is nevertheless a finite list.
Note that without restriction the problem of packing a given G into a given H is NP-complete (the survey [27] gives several NP-completeness results of which the one in [9] is arguably the most convincing), so in particular we do not expect to find any finite list of simple obstructions to the general packing problem. It follows that 'dense' in the meta-conjecture cannot simply mean large edge-density: one can artificially boost edge density without changing the outcome of this decision problem by taking the disjoint union with a very large clique and adding large connected graphs to G which perfectly pack the very large clique. However a typical random, or quasirandom, graph seems to be a reasonable candidate for 'dense', as does a graph with high minimum degree (in this case, the minimum degree bound must depend on parameters of the graphs G such as chromatic number, otherwise a reduction similar to the edge-density reduction exists).
Finally, on the topic of what constitutes a 'sparse graph', observe that bounded degeneracy is a fairly common and unrestrictive notion. One might ask whether degeneracy growing as a function of n is reasonable (of course, in Theorem 2 one can have a very slowly growing function). However, observe that we do not know the answer to Question 1 when r grows superlogarithmically, even for k = 2, and it seems reasonable to believe that the answer will often be 'no' even when the simple divisibility conditions are met. It is less clear that the maximum degree restriction of Theorem 2 is necessary, and we expect that it can at least be relaxed. However, with no degree restriction at all Theorem 2 becomes false, see Section 8.2.
Proof outline and organisation of the paper. Our proof of Theorem 2 amounts to the analysis of a quite natural randomised algorithm. We first describe a procedure which works if each graph in $G$ has order at most $(1-\delta)n$. We take the graphs in $G$ in succession. For each $G$, we embed vertex by vertex into $K_n$ in a degeneracy order, at each time embedding to a vertex of $K_n$ chosen uniformly at random subject to the constraints that we do not re-use a vertex previously used in embedding $G$, or an edge used in embedding a previous graph. This procedure succeeds with high probability, and after each stage of embedding a graph, the unused edges of $K_n$ are quasirandom (in a sense we will later make precise).
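The following Python sketch (all identifiers ours) illustrates this basic procedure for a single guest graph; `order` is assumed to list $V(G)$ in a degeneracy order, and `used_edges` records edges of $K_n$ covered by previously embedded guests.

    import random

    def random_greedy_embed(G_adj, order, H_adj, used_edges):
        """G_adj, H_adj: dict vertex -> set of neighbours; order: degeneracy order
        of V(G); used_edges: set of frozensets (edges of H already covered).
        Returns an embedding phi, or None if the greedy process gets stuck."""
        phi, used = {}, set()
        for x in order:
            # candidate set: common H-neighbourhood of the images of earlier neighbours
            prev = [phi[y] for y in G_adj[x] if y in phi]
            cands = set(H_adj) if not prev else set.intersection(*(H_adj[v] for v in prev))
            cands = {v for v in cands - used
                     if all(frozenset((v, u)) not in used_edges for u in prev)}
            if not cands:
                return None
            phi[x] = random.choice(sorted(cands))
            used.add(phi[x])
        # mark the newly used edges so that the next guest graph avoids them
        for x in order:
            for y in G_adj[x]:
                if y in phi:
                    used_edges.add(frozenset((phi[x], phi[y])))
        return phi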
To allow for spanning graphs, we modify this slightly. We adjust the degeneracy order so that the last $\delta n$ vertices are independent and all have the same degree; this can be done while at worst doubling the degeneracy of the order. Then for each graph we follow the above procedure to embed the first $(1-\delta)n$ vertices, and finally complete the embedding using a matching argument. We will see that this last step is with high probability always possible. The only slight subtlety is that we have to split $E(K_n)$ into a very dense main part, whose edges we use only for the embedding of the first $(1-\delta)n$ vertices, and a sparse reservoir which we use only for the completion; we do this randomly.

This paper is organised as follows. In Section 2 we introduce the martingale concentration inequalities needed for the analysis of our algorithm. We also establish some basic properties of degenerate graphs. In Section 3 we state our main technical result (Theorem 11) and show how to deduce Theorem 2 from it. In Section 4 we describe in detail our packing algorithm, PackingProcess, and outline the main steps of its analysis. We also state our main lemmas and show how they imply Theorem 11. In Sections 5, 6 and 7 we prove these lemmas. Finally, in Section 8 we give some concluding remarks.
1.1. About the current version of the manuscript. The current version of the manuscript, which we made available in April 2022, corrects a small error in Definition 22 (fixed in the version of June 2021) and an insufficiently small choice of constants (fixed in April 2022). These errors appear in the published version [Advances in Mathematics, Volume 354, 106739]. Calculations had to be adjusted appropriately throughout the paper. The paper was not otherwise updated; in particular, the introduction and the cited literature represent the state at the time of publication.
The neighbourhood of a vertex $v$ in the graph $G$ is denoted by $N_G(v)$. We write $N_G(U) = \bigcap_{v\in U} N_G(v)$ for the common neighbourhood of a set $U \subseteq V(G)$.
The definition of degenerate graphs naturally suggests labelling the vertices of a graph by integers. Suppose that the vertices of a graph $G$ are $V(G) = [\ell]$, and that $i \in V(G)$. Then we write $N^-(i) := N_G(i) \cap [i-1]$ for the left-neighbourhood of $i$, that is, the set of neighbours of $i$ preceding it, and we set $\deg^-(i) := |N^-(i)|$; thus the ordering $1, \dots, \ell$ of $V(G)$ is $D$-degenerate if and only if $\deg^-(i) \le D$ for each $i \in V(G)$. The graphs to be packed in Theorem 2 are denoted by $G_t$ because they are guest graphs. By contrast, during our packing procedure we shall work with host graphs $H_s$, which are obtained from the original $K_n$ by removing what was used previously.

2.2. Probability.
2.2.1. Probability basics. All probability spaces considered in this paper are finite. The implicit sigma-algebra underlying each such space is the sigma-algebra generated by all singletons; in particular, the notion of measurability is trivial in this setting. Recall that if $\Omega$ is a finite probability space, then a sequence of partitions $\mathcal F_0, \mathcal F_1, \dots, \mathcal F_n$ of $\Omega$ is a filtration if each partition $\mathcal F_i$ refines its predecessor $\mathcal F_{i-1}$. In this setting, a function $f : \Omega \to \mathbb R$ is $\mathcal F_i$-measurable if and only if it is constant on each part of $\mathcal F_i$. Recall also that if $\Omega$ is a finite probability space and $f : \Omega \to \mathbb R$ is a function, then the conditional expectation $\mathbb E(f\mid\mathcal F) : \Omega \to \mathbb R$ and the conditional variance $\mathrm{Var}(f\mid\mathcal F) : \Omega \to \mathbb R$ of $f$ with respect to a given partition $\mathcal F$ of $\Omega$ are defined on each part $F \in \mathcal F$ with $\mathbb P(F) > 0$ by
$$\mathbb E(f\mid\mathcal F)(\omega) := \frac{1}{\mathbb P(F)}\sum_{\omega' \in F} \mathbb P(\omega') f(\omega') \quad\text{and}\quad \mathrm{Var}(f\mid\mathcal F)(\omega) := \mathbb E\big((f - \mathbb E(f\mid\mathcal F))^2 \,\big|\, \mathcal F\big)(\omega) \quad\text{for } \omega \in F.$$

2.2.2. Sequential dependence and concentration. In this section we introduce some convenient consequences of standard martingale inequalities. These are generally useful in the analysis of randomised processes, so we try to provide some brief background and motivation. Suppose that we have a randomised algorithm which proceeds in $m$ rounds. We can then denote by $\Omega := \prod_{i=1}^m \Omega_i$ the probability space that underlies an execution of the algorithm. Here $\Omega_i$ is the set of all possible choices the algorithm may make in step $i$. It is important, however, that $\Omega$ as a probability space is not necessarily a product of probability spaces $\Omega_i$; in other words, the algorithm can (and typically will) make its choices in step $i$ depending on the choices it made in steps $1, \dots, i-1$. By a history up to time $t$ we mean a set of the form $\{\omega_1\} \times \dots \times \{\omega_t\} \times \Omega_{t+1} \times \dots \times \Omega_m$, where $\omega_i \in \Omega_i$. We shall use the symbol $\mathcal H_t$ to denote any particular history of this form. By a history ensemble up to time $t$ we mean any union of histories up to time $t$; we shall use the symbol $\mathcal L$ to denote any one such. Observe that there are natural filtrations associated to such a probability space: given times $t_1 < t_2 < \dots$ we let $\mathcal F_{t_i}$ denote the partition of $\Omega$ into the histories up to time $t_i$. We introduce formally a probability space of this type, which we use for the key part of our argument, in Section 4.1.
We recall that if $Y_1, \dots, Y_n$ is a collection of independent random variables with $0 \le Y_i \le a_i$ for each $i$, where the ranges $a_i$ are not too large compared to $n$, then Hoeffding's inequality bounds the tails of their sum: for each $\varrho > 0$,
(2.1) $\mathbb P\Big(\sum_{i=1}^n Y_i \ne \mathbb E\sum_{i=1}^n Y_i \pm \varrho\Big) \le 2\exp\Big(\frac{-\varrho^2}{2\sum_{i=1}^n a_i^2}\Big).$
One should think of the squared range $a_i^2$ as a crude upper bound for $\mathrm{Var}(Y_i)$. There are various improvements, such as the Bernstein inequalities, which take the actual values $\mathrm{Var}(Y_i)$ into account in order to obtain stronger concentration results such as
(2.2) $\mathbb P\Big(\sum_{i=1}^n Y_i \ne \mathbb E\sum_{i=1}^n Y_i \pm \varrho\Big) \le 2\exp\Big(\frac{-\varrho^2}{2\sum_{i=1}^n \mathrm{Var}(Y_i) + 2R\varrho}\Big), \quad\text{where } R := \max_i a_i.$
When the sum of variances is much larger than $R\varrho$, this probability bound is optimal up to small order terms in the exponent; for most applications this means it cannot usefully be improved. However, when analysing randomised algorithms one usually has to deal with a sum of random variables which are not independent, but rather are sequentially dependent, meaning that they come in an order in which earlier outcomes affect the later random variables. A good example is the following procedure (a variant of which we use in this paper) for embedding a graph $G$ on vertex set $[n/2]$ into a graph $H$ on $n$ vertices. We simply embed the vertices in the order $1, \dots, n/2$, at each time $t$ embedding vertex $t$ uniformly at random into the set of all valid choices: that is, choices which extend the current embedding to an embedding of $G[\{1,\dots,t\}]$. In order to show that this procedure is likely to succeed (which is true if $G$ has small degeneracy and $H$ is sufficiently quasirandom) we will want to know how vertices are embedded over time to a given subset $S \subseteq V(H)$. In other words, we define (in this case, Bernoulli) random variables $Y_t$ to be 1 if $t$ is embedded to $S$ and 0 otherwise, and we want to know how the partial sums of these random variables, which are certainly not independent but are sequentially dependent, behave. The point of this section is to observe that in fact more or less the same concentration bounds hold as for independent random variables, except that one has to replace the sum of expectations with a sum of observed expectations, that is, with
$$\sum_{i=1}^n \mathbb E(Y_i \mid \mathcal H_{i-1}),$$
where $\mathcal H_{i-1}$ denotes the history up to time $i-1$, and the sum of variances with a sum of observed variances, similarly defined.
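As a toy illustration of such sequentially dependent variables, the following Python experiment (entirely ours) embeds vertices one by one to distinct uniformly random hosts and tracks the realised sum $\sum_t Y_t$ against the sum of observed expectations $\sum_t \mathbb E(Y_t \mid \mathcal H_{t-1})$ for a fixed target set $S$; the two agree up to fluctuations of order $\sqrt n$, as the lemmas below predict.

    import random

    def track_observed_expectation(n, S_frac=0.3):
        """Embed vertices 1..n/2 to distinct uniformly random hosts in [n];
        Y_t = 1 if vertex t lands in S.  At each step the observed expectation
        E(Y_t | H_{t-1}) = |S minus used| / |free hosts| is recorded."""
        S = set(range(int(S_frac * n)))
        used, realised, observed = set(), 0, 0.0
        for t in range(n // 2):
            free = [v for v in range(n) if v not in used]
            observed += len(S - used) / len(free)   # E(Y_t | H_{t-1})
            v = random.choice(free)
            used.add(v)
            realised += v in S
        return realised, observed

    print(track_observed_expectation(2000))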
In combinatorial applications, one is usually interested in showing that a sum of random variables (which might in general not be Bernoulli) is close to its expectation µ. It is not a priori obvious that concentration bounds such as the above help: after all, the sum of observed expectations is itself a random variable and might not be concentrated near µ (it is easy to come up with examples in which it is not). We deal with this in what follows by defining a good event E, within which the observed sum of expectations is µ ± ν for some (small) ν > 0.
In applications $\mathcal E$ will often be a combinatorial statement about the process, and hence we refer to $\nu$ as the combinatorial error, to distinguish it from the probabilistic error $\varrho > 0$, as in (2.1) and (2.2). It is important to note that $\mathcal E$ is usually not determined before the random variables $Y_i$ (i.e. it may well not be $\mathcal F_i$-measurable for any member $\mathcal F_i$ of the filtration), so we do not condition on $\mathcal E$; rather, we aim to estimate the probability that $\mathcal E$ holds and yet $\sum_{i=1}^n Y_i \ne \mu \pm (\nu + \varrho)$. In order to avoid mentioning any particular process, it is convenient to state the following lemmas in terms of a finite probability space $\Omega$ with a filtration $(\mathcal F_0, \mathcal F_1, \dots, \mathcal F_n)$. We should stress that although in our applications we will always use the same probability space, which underlies our packing process, we will consider different filtrations, always given by the histories up to increasing times, depending on the random variables we wish to sum.
The following lemma, from [1], is a sequential dependence version of Hoeffding's inequality. Note that the lemma as stated in [1] includes the condition P(E) > 0. However if P(E) = 0 the lemma statement is trivially true, so we drop the condition below.
Lemma 4 (Lemma 7, [1]). Let $\Omega$ be a finite probability space, and let $(\mathcal F_0, \mathcal F_1, \dots, \mathcal F_n)$ be a filtration. Suppose that for each $1 \le i \le n$ we have a nonnegative real number $a_i$ and an $\mathcal F_i$-measurable random variable $Y_i$ satisfying $0 \le Y_i \le a_i$, and suppose that we have nonnegative real numbers $\mu$ and $\nu$ and an event $\mathcal E$. Suppose that almost surely, either $\mathcal E$ does not occur or
$$\sum_{i=1}^n \mathbb E(Y_i \mid \mathcal F_{i-1}) = \mu \pm \nu.$$
Then for each $\varrho > 0$ we have
$$\mathbb P\Big(\mathcal E \text{ and } \sum_{i=1}^n Y_i \ne \mu \pm (\nu + \varrho)\Big) \le 2\exp\Big(\frac{-\varrho^2}{2\sum_{i=1}^n a_i^2}\Big).$$
Furthermore, if we weaken the assumption, requiring only that either $\mathcal E$ does not occur or $\sum_{i=1}^n \mathbb E(Y_i \mid \mathcal F_{i-1}) \le \mu + \nu$, then we still have
$$\mathbb P\Big(\mathcal E \text{ and } \sum_{i=1}^n Y_i > \mu + \nu + \varrho\Big) \le \exp\Big(\frac{-\varrho^2}{2\sum_{i=1}^n a_i^2}\Big).$$
We should note that the probability bound in this lemma is what one would obtain from standard martingale inequalities for $\mathbb P\big(\sum_{i=1}^n Y_i \ne \mu \pm (\nu+\varrho)\big)$ if the condition $\sum_{i=1}^n \mathbb E(Y_i \mid \mathcal F_{i-1}) = \mu \pm \nu$ held almost surely. The rôle of $\mathcal E$ is that we can allow this condition to fail outside of $\mathcal E$ but still obtain the same concentration within $\mathcal E$; this is probabilistically fairly trivial but very useful. The same applies for the next lemma.
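To illustrate the typical shape of an application (with parameters chosen purely for this example), suppose each $Y_i$ is an indicator, so that we may take $a_i = 1$, and that on $\mathcal E$ the observed expectations sum to $\mu \pm n^{2/3}$. Taking $\nu = \varrho = n^{2/3}$, Lemma 4 gives
$$\mathbb P\Big(\mathcal E \text{ and } \sum_{i=1}^n Y_i \ne \mu \pm 2n^{2/3}\Big) \le 2\exp\Big(\frac{-n^{4/3}}{2n}\Big) = 2e^{-n^{1/3}/2},$$
which decays fast enough to survive a union bound over polynomially many such sums.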
Lemma 4 gives close to optimal (up to a constant factor in the exponential) results when the random variables $Y_i$ are relatively often close to 0 and $a_i$; in other words, when $a_i^2$ is not much larger than the variance $\mathrm{Var}(Y_i)$. This will turn out to be the case for most of the random sums we need to estimate in this paper. However, when it is not the case, at the cost of a second moment calculation the following version of Freedman's inequality [12] gives much stronger bounds, corresponding to a Bernstein inequality for independent random variables.

Lemma 5 (Freedman's inequality on a good event).
Let $\Omega$ be a finite probability space, and let $(\mathcal F_0, \mathcal F_1, \dots, \mathcal F_n)$ be a filtration. Suppose that we have $R > 0$, for each $1 \le i \le n$ an $\mathcal F_i$-measurable non-negative random variable $Y_i$, nonnegative real numbers $\mu$, $\nu$ and $\sigma$, and an event $\mathcal E$. Suppose that almost surely, either $\mathcal E$ does not occur or we have
$$\sum_{i=1}^n \mathbb E(Y_i \mid \mathcal F_{i-1}) = \mu \pm \nu, \qquad \sum_{i=1}^n \mathrm{Var}(Y_i \mid \mathcal F_{i-1}) \le \sigma^2, \qquad\text{and}\qquad Y_i \le R \text{ for each } i.$$
Then for each $\varrho > 0$ we have
(2.3) $\mathbb P\Big(\mathcal E \text{ and } \sum_{i=1}^n Y_i \ne \mu \pm (\nu + \varrho)\Big) \le 2\exp\Big(\frac{-\varrho^2}{2\sigma^2 + 2R\varrho}\Big).$
Furthermore, if we assume only that either $\mathcal E$ does not occur or we have $\sum_{i=1}^n \mathbb E(Y_i \mid \mathcal F_{i-1}) \le \mu + \nu$ together with the variance and boundedness conditions above, then
$$\mathbb P\Big(\mathcal E \text{ and } \sum_{i=1}^n Y_i > \mu + \nu + \varrho\Big) \le \exp\Big(\frac{-\varrho^2}{2\sigma^2 + 2R\varrho}\Big).$$
As with the Bernstein inequality, this result is essentially optimal when the sum of observed variances is much larger than $R\varrho$. We would like to point out that since $\mathcal E$ is often a combinatorial statement which is not tailored to the specific random variables $Y_i$ we are summing, when we use either lemma to estimate tail probabilities for several sums of random variables we will often use the same event $\mathcal E$ repeatedly; since it then appears only once in union bounds, both lemmas are useful for showing that a.a.s. a collection of many (rapidly growing with $n$) sums are simultaneously close to their expectations, even when the probability of $\mathcal E$ tends to one only quite slowly with $n$.
We deduce Lemma 5 from Freedman's martingale inequality, which we now state.
Theorem 6 (Proposition (2.1), [12]). Let $\Omega$ be a finite probability space, and let $(\mathcal F_0, \mathcal F_1, \dots, \mathcal F_n)$ be a filtration. Suppose that for some $R > 0$ and each $1 \le i \le n$ we have an $\mathcal F_i$-measurable random variable $W_i$ with $W_i \le R$ and $\mathbb E(W_i \mid \mathcal F_{i-1}) = 0$ almost surely. Then for all $\varrho, \sigma > 0$ we have
$$\mathbb P\Big(\sum_{i=1}^n W_i \ge \varrho \text{ and } \sum_{i=1}^n \mathrm{Var}(W_i \mid \mathcal F_{i-1}) \le \sigma^2\Big) \le \exp\Big(\frac{-\varrho^2}{2\sigma^2 + 2R\varrho}\Big).$$
We now deduce Lemma 5, using a similar approach to that used in [1] to prove Lemma 4.
Proof of Lemma 5. We show the required upper bound; the corresponding lower bound follows by symmetry, replacing each $Y_i$ with $R - Y_i$, and this gives the desired two-sided result by the union bound. Observe that if $\mathbb P(\mathcal E) = 0$ then (2.3) holds trivially. We may thus assume $\mathbb P(\mathcal E) > 0$. Now, given $Y_1, \dots, Y_n$, we define random variables $U_1, \dots, U_n$ as follows. We set $U_i = \min(Y_i, R)$ if $\mathbb P(\mathcal E \mid \mathcal F_{i-1}) > 0$, and otherwise $U_i = 0$. Observe that $U_i$ is constant on each part of $\mathcal F_i$ by definition. We claim that for each $1 \le t \le n$ we have almost surely
(2.4) $\sum_{i=1}^t \mathbb E(U_i \mid \mathcal F_{i-1}) \le \mu + \nu \quad\text{and}\quad \sum_{i=1}^t \mathrm{Var}(U_i \mid \mathcal F_{i-1}) \le \sigma^2.$
Indeed, suppose that $t$ is minimal such that this statement fails, and let $F$ be a set in $\mathcal F_{t-1}$ with $\mathbb P(F) > 0$ witnessing its failure. By minimality of $t$, at least one of $\mathbb E(U_t \mid F)$ and $\mathrm{Var}(U_t \mid F)$ is strictly positive. By definition of $U_t$ we then have $\mathbb P(\mathcal E \mid F) > 0$. But since $\mathbb E(U_i \mid \mathcal F_{i-1})$ and $\mathrm{Var}(U_i \mid \mathcal F_{i-1})$ are nonnegative for each $i$, and wherever they are not zero they are at most $\mathbb E(Y_i \mid \mathcal F_{i-1})$ and $\mathrm{Var}(Y_i \mid \mathcal F_{i-1})$ respectively, this shows that with probability at least $\mathbb P(F)\,\mathbb P(\mathcal E \mid F) > 0$ the event $\mathcal E$ occurs and yet one of the conditions $\sum_{i=1}^n \mathbb E(Y_i \mid \mathcal F_{i-1}) \le \mu + \nu$ and $\sum_{i=1}^n \mathrm{Var}(Y_i \mid \mathcal F_{i-1}) \le \sigma^2$ fails. This is a contradiction, so we conclude that (2.4) holds almost surely for each $t$. Furthermore, we have $0 \le U_i \le R$ for each $1 \le i \le n$.
Next, define for each $1 \le i \le n$ the random variable $W_i := U_i - \mathbb E(U_i \mid \mathcal F_{i-1})$. We have $-R \le W_i \le R$ for each $i$; by definition $W_i$ is $\mathcal F_i$-measurable, and almost surely $\mathbb E(W_i \mid \mathcal F_{i-1}) = 0$ and $\mathrm{Var}(W_i \mid \mathcal F_{i-1}) = \mathrm{Var}(U_i \mid \mathcal F_{i-1})$. Thus by Theorem 6 we have
$$\mathbb P\Big(\sum_{i=1}^n W_i \ge \varrho \text{ and } \sum_{i=1}^n \mathrm{Var}(W_i \mid \mathcal F_{i-1}) \le \sigma^2\Big) \le \exp\Big(\frac{-\varrho^2}{2\sigma^2 + 2R\varrho}\Big),$$
and hence, by (2.4), with probability at least $1 - \exp\big(\tfrac{-\varrho^2}{2\sigma^2 + 2R\varrho}\big)$ we have $\sum_{i=1}^n U_i \le \sum_{i=1}^n \mathbb E(U_i \mid \mathcal F_{i-1}) + \varrho \le \mu + \nu + \varrho$. Finally, if $\mathcal E$ occurs then almost surely $Y_i = U_i$ for each $1 \le i \le n$, giving the desired upper bound in (2.3).
Finally, let us note that we shall be using many statements of the form
(2.5) with probability at least $p$: provided event $A$ occurs, event $B$ occurs as well.
We emphasize that such statements are not statements about conditional probabilities. That is, the meaning of (2.5) is $\mathbb P(A \setminus B) \le 1 - p$. A prototypical example is: with probability at least $1 - o(1)$, if a given randomised algorithm does not fail, then it produces an output with certain desired properties.

2.3. Simple properties of degenerate graphs. We need to bound $\sum_{x\in V(G)} \deg(x)^2$ for degenerate graphs $G$. In several applications of Lemma 4 the numbers $a_i$ will be upper bounded by the degrees of vertices in $G$, where $G$ is one of the graphs to be packed, so that $\sum_{x\in V(G)} \deg(x)^2$ is an upper bound for the sum $\sum_i a_i^2$ appearing in Lemma 4.

Lemma 7. Let $G$ be an $n$-vertex graph with degeneracy $D$ and maximum degree $\Delta$. Then we have
$$\sum_{x\in V(G)} \deg(x)^2 \le 2D\Delta n.$$
Indeed, $e(G) \le Dn$, so $\sum_x \deg(x) = 2e(G) \le 2Dn$ and hence $\sum_x \deg(x)^2 \le \Delta \sum_x \deg(x) \le 2D\Delta n$.

We also need to show that degenerate graphs contain large independent sets all of whose vertices have the same degree.
Lemma 8. Let $G$ be a $D$-degenerate $n$-vertex graph. Then there exist an integer $0 \le d \le 2D$ and a set $I \subseteq V(G)$ with $|I| \ge (2D+1)^{-3} n$ which is independent, and all of whose vertices have the same degree $d$ in $G$.
Proof. We first claim that at least $(2D+1)^{-1} n$ vertices of $G$ have degree at most $2D$. Indeed, if this were false then there would be more than $2Dn/(2D+1)$ vertices of $G$ whose degrees are all at least $2D+1$, so that we would obtain $e(G) > Dn$, which contradicts the $D$-degeneracy of $G$. Let $0 \le d \le 2D$ be chosen to maximise the number of vertices of $G$ of degree $d$, and let $S$ be the set of vertices of $G$ with degree $d$. We thus have $|S| \ge (2D+1)^{-2} n$. Now let $I$ be a maximal independent subset of $S$. Each vertex of $I$ has at most $d \le 2D$ neighbours in $S$, so that $\big|I \cup \bigcup_{i\in I} (N(i) \cap S)\big| \le (2D+1)|I|$. By maximality, $I \cup \bigcup_{i\in I} (N(i) \cap S)$ covers $S$, hence $|I| \ge (2D+1)^{-1}|S| \ge (2D+1)^{-3} n$, as desired.
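The proof is constructive, and the following Python sketch (identifiers ours) mirrors it: restrict to the vertices of degree at most $2D$, take the most popular degree $d$ among them, and greedily extract a maximal independent subset.

    from collections import Counter

    def same_degree_independent_set(adj, D):
        """adj: dict vertex -> set of neighbours of a D-degenerate graph.
        Returns (d, I) as in Lemma 8; |I| >= n / (2D+1)^3 is guaranteed."""
        degrees = {v: len(nbrs) for v, nbrs in adj.items()}
        low = [v for v in adj if degrees[v] <= 2 * D]            # >= n/(2D+1) vertices
        d = Counter(degrees[v] for v in low).most_common(1)[0][0]
        S = [v for v in low if degrees[v] == d]                  # >= n/(2D+1)^2 vertices
        I, blocked = [], set()
        for v in S:                    # greedy maximal independent subset of S
            if v not in blocked:
                I.append(v)
                blocked.add(v)
                blocked |= adj[v]      # each v blocks at most d <= 2D others in S
        return d, I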

Reducing the main theorem
We deduce Theorem 2 from the following technical result.
Theorem 9. For each $\gamma > 0$ and each $D \in \mathbb N$ there exist $c > 0$ and a number $n_0$ such that the following holds for each integer $n > n_0$. Suppose that $s^* \le 2n$ and that for each $s \in [s^*]$ the graph $G_s$ is a graph on vertex set $[n]$ with maximum degree at most $cn/\log n$, such that $\deg^-(x) \le D$ for each $x \in V(G_s)$, and such that the last $(D+1)^{-3} n$ vertices of $[n]$ form an independent set in $G_s$ and all have the same degree $d_s$ in $G_s$. Suppose further that the total number of edges of $(G_s)_{s\in[s^*]}$ is at most $(1-3\gamma)\binom{n}{2}$. Then $(G_s)_{s\in[s^*]}$ packs into $K_n$.

Actually, we prove Theorem 9 in a slightly more general form using the concept of quasirandomness, which is crucial for our approach. This concept was introduced by several authors independently in the 1980s (of which the paper [7] is the most comprehensive) and captures the property that the edges of a graph are distributed evenly among its vertices. We give a definition tailored for our needs which is somewhat stronger than the usual definition of quasirandom graphs.
Definition 10 (quasirandom). Suppose that $H$ is a graph with $n$ vertices and density $p$. We say that such a graph $H$ is $(\alpha, L)$-quasirandom if for every set $S \subseteq V(H)$ of at most $L$ vertices we have $|N_H(S)| = (1 \pm \alpha)\, p^{|S|} n$.
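For very small graphs, Definition 10 can be verified directly by brute force; the following Python sketch (ours) does so.

    from itertools import combinations

    def is_quasirandom(adj, alpha, L):
        """adj: dict vertex -> set of neighbours.  Checks Definition 10:
        |N(S)| = (1 +- alpha) p^{|S|} n for every nonempty S with |S| <= L."""
        n = len(adj)
        p = sum(len(nbrs) for nbrs in adj.values()) / (n * (n - 1))
        for size in range(1, L + 1):
            for S in combinations(adj, size):
                common = set.intersection(*(adj[v] for v in S))
                if abs(len(common) - p**size * n) > alpha * p**size * n:
                    return False
        return True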
Theorem 11 (Main technical result). For each $\gamma > 0$ and each $D \in \mathbb N$ there exist numbers $n_0 \in \mathbb N$ and $c, \xi > 0$ such that the following holds for each $n > n_0$. Suppose that $H$ is a $(\xi, 2D+3)$-quasirandom graph with $n$ vertices and density $p > 0$. Suppose that $s^* \le 2n$ and that for each $s \in [s^*]$ the graph $G_s$ is a graph on vertex set $[n]$ with maximum degree at most $cn/\log n$, such that $\deg^-(x) \le D$ for each $x \in V(G_s)$, and such that the last $(D+1)^{-3} n$ vertices of $[n]$ form an independent set in $G_s$ and all have the same degree $d_s$ in $G_s$. Suppose further that the total number of edges of $(G_s)_{s\in[s^*]}$ is at most $(p - 3\gamma)\binom{n}{2}$. Then $(G_s)_{s\in[s^*]}$ packs into $H$.

Theorem 11 indeed generalises Theorem 9, because it is easily checked that for any fixed $D \in \mathbb N$ and $\alpha > 0$ the graph $K_n$ is $(\alpha, 2D+3)$-quasirandom for $n$ sufficiently large. The reason why we give the proof in this greater generality is that the only feature of $K_n$ we actually use is its quasirandomness. We show below that Theorem 9 implies Theorem 2. Note that starting with Theorem 11, the same deduction would yield a version of Theorem 2 for quasirandom host graphs. We state such a version for the dense Erdős–Rényi random graph $G(n,p)$, that is, the $n$-vertex random graph in which each pair of vertices forms an edge independently with probability $p$. These graphs are well known to be, asymptotically almost surely, quasirandom (even in the sense of our Definition 10) with error tending to zero.
Theorem 12. For each $p, \gamma > 0$ and each $D \in \mathbb N$ there exists $c > 0$ such that the following holds asymptotically almost surely as $n \to \infty$. Suppose that $(G_t)_{t\in[t^*]}$ is a family of $D$-degenerate graphs, each of which has at most $n$ vertices and maximum degree at most $cn/\log n$. Suppose further that the total number of edges of $(G_t)_{t\in[t^*]}$ is at most $(p-\gamma)\binom{n}{2}$. Then $(G_t)_{t\in[t^*]}$ packs into $G(n,p)$.
Proof of Theorem 2. To deduce Theorem 2 from Theorem 9, observe that given an integer $D$ and graphs $G = (G_t)_{t\in[t^*]}$ to pack, we may assume without loss of generality that none of the graphs in $G$ has isolated vertices, since such vertices can be erased and then easily packed in the last step.
We now successively modify the family $G$ as follows. If there are two graphs $G, G' \in G$ with $v(G), v(G') \le n/2$, we replace $G$ and $G'$ with the disjoint union $G \cup G'$. We repeat this until no further such pairs exist, giving $G'$.
Observe that the maximum degrees and the degeneracies of the graphs in $G'$ are the same as in $G$. Furthermore, a packing of $G'$ yields a packing of $G$. Finally, there is at most one graph in $G'$ with fewer than $n/2$ vertices, and hence (as no graph has isolated vertices) all but at most one graph have at least $n/4$ edges. We conclude that the total number $s^*$ of graphs in $G'$ satisfies $(s^*-1)n/4 \le (1-\gamma)\binom{n}{2}$, and hence $s^* \le 2n$. Finally, we let the graphs $(G'_s)_{s=1}^{s^*}$ be obtained from the graphs of $G'$ by adding isolated vertices to each, if necessary, in order to obtain $n$-vertex graphs. Now, for each $G'_s$ we choose an order on $V(G'_s)$ as follows. First, we pick an order witnessing the $D$-degeneracy of $G'_s$. Next, we pick an integer $0 \le d_s \le 2D$ and an independent set $I_s$ of $(2D+1)^{-3} n$ vertices, each of which has degree $d_s$ in $G'_s$, and change the order by moving these vertices to the end. Such an integer $d_s$ and independent set exist by Lemma 8. The result is an ordering of $V(G'_s)$ with degeneracy at most $2D$, as required for Theorem 9 with input $2D$ and $\gamma/3$. Then Theorem 9 returns the desired packing.
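The preprocessing in this proof is purely mechanical; a Python sketch of the merging step (identifiers ours; graphs are dicts from integer vertices to neighbour sets) might look as follows.

    def disjoint_union(G1, G2):
        """Disjoint union of two graphs given as dicts vertex -> neighbour set,
        relabelling the vertices of G2 to avoid clashes."""
        off = max(G1) + 1 if G1 else 0
        U = {v: set(nbrs) for v, nbrs in G1.items()}
        U.update({v + off: {u + off for u in nbrs} for v, nbrs in G2.items()})
        return U

    def merge_small_guests(guests, n):
        """While two guests each have at most n/2 vertices, replace them by
        their disjoint union; afterwards at most one guest has fewer than
        n/2 vertices, so the family has at most roughly 2n members."""
        small = [G for G in guests if len(G) <= n / 2]
        big = [G for G in guests if len(G) > n / 2]
        while len(small) >= 2:
            U = disjoint_union(small.pop(), small.pop())
            (small if len(U) <= n / 2 else big).append(U)
        return big + small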

Proof of Theorem 11
For the proof of Theorem 11, we need some algorithms and definitions. We give these now along with a sketch of the proof.
We prove Theorem 11 by analysing a randomised algorithm, which we call PackingProcess, that packs the guest graphs $G_s$ into $H$. We prove that this algorithm succeeds with high probability. In this algorithm we assume that the last $\delta n$ vertices of each graph $G_s$ form an independent set, where $\delta < (D+1)^{-3}$ is to be chosen later.
PackingProcess begins by splitting the edges of the input graph $H$ into a bulk $H_0$ and a reservoir $H^*_0$ by independently selecting edges into the latter, with the probability chosen such that $e(H^*_0) \approx \gamma\binom{n}{2}$. As a result, the graphs $H_0$ and $H^*_0$ are with high probability quasirandom. Now PackingProcess proceeds in $s^*$ stages. In each stage $s$, it runs a randomised embedding algorithm, called RandomEmbedding and explained below, to embed the first $n - \delta n$ vertices of $G_s$ into the bulk $H_{s-1}$. Then, in the completion phase, the last $\delta n$ vertices of $G_s$ are embedded using the reservoir $H^*_{s-1}$. Since there are exactly $\delta n$ vertices of $G_s$ left to embed and exactly $\delta n$ vertices of $V(H)$ unused so far in this stage, we want to find a bijection between these. Since all neighbours of each yet unembedded vertex are already embedded, this completion amounts to choosing a system of distinct representatives. The completion phase does not use randomness: the system of distinct representatives is obtained using Hall's theorem. Now $H_s$ and $H^*_s$ are defined simply by removing the edges used in this embedding.
Both RandomEmbedding and the completion phase may fail at any stage $s$; this means that it is not possible to embed a certain part of $G_s$. In that case PackingProcess fails, too. If PackingProcess does not fail then it always produces a valid packing of $(G_s)_{s\in[s^*]}$ into $H$. So, we need to show that PackingProcess (see Algorithm 1) succeeds with positive probability.
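Schematically, PackingProcess can be summarised by the following Python-style skeleton (an illustration only: `random_embed` and `complete_via_matching` stand in for RandomEmbedding and the Hall-type completion phase, and all names are ours).

    import random

    def packing_process(n, H_edges, guests, gamma, delta,
                        random_embed, complete_via_matching):
        """H_edges: set of frozensets.  Both callbacks return None on failure,
        in which case PackingProcess fails too."""
        p = len(H_edges) / (n * (n - 1) / 2)
        bulk, reservoir = set(), set()
        for e in H_edges:              # independent coin flips: e(H*_0) ~ gamma * C(n,2)
            (reservoir if random.random() < gamma / p else bulk).add(e)
        packing = []
        for G in guests:
            phi = random_embed(G, bulk, delta)               # first (1-delta)n vertices
            if phi is None:
                return None                                  # RandomEmbedding failed
            phi = complete_via_matching(G, phi, reservoir)   # last delta*n vertices
            if phi is None:
                return None                                  # completion failed
            used = {frozenset((phi[x], phi[y])) for x in G for y in G[x] if x < y}
            bulk -= used
            reservoir -= used
            packing.append(phi)
        return packing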
For describing our randomised embedding algorithm RandomEmbedding we need the following definitions. We shall use the symbol $\hookrightarrow$ to denote embeddings produced by RandomEmbedding. We write $G \hookrightarrow H$ to indicate that the graph $G$ is to be embedded into $H$. Also, if $t \in V(G)$, $v \in V(H)$ and $A \subseteq V(H)$, then $t \hookrightarrow v$ means that $t$ is embedded on $v$, and $t \hookrightarrow A$ means that $t$ is embedded on a vertex of $A$.
Given a partial embedding $\psi_j$ of $G\big[[j]\big]$ into $H$ and a vertex $t \in V(G)$ with $t > j$, the candidate set of $t$ at time $j$ is
$$C^j_{G\hookrightarrow H}(t) := \bigcap_{t' \in N^-(t) \cap [j]} N_H\big(\psi_j(t')\big),$$
the common neighbourhood in $H$ of the images of the already-embedded left-neighbours of $t$ (so that $C^j_{G\hookrightarrow H}(t) = V(H)$ when no left-neighbour of $t$ is yet embedded). When $j = t - 1$, we call $C^{t-1}_{G\hookrightarrow H}(t)$ the final candidate set of $t$. RandomEmbedding (see Algorithm 2) randomly embeds a guest graph $G$ into a host graph $H$. The algorithm is simple: we iteratively embed the first $(1-\delta)n$ vertices of $G$, each to a uniformly random vertex of its final candidate set which has not already been used for embedding another vertex.

Algorithm 2: RandomEmbedding
Input: graphs $G$ and $H$, with $V(G) = [n]$ and $v(H) = n$
  $\psi_0 :=$ the empty partial embedding
  for $t = 1$ to $n - \delta n$ do
    if $C^{t-1}_{G\hookrightarrow H}(t) \setminus \operatorname{im}(\psi_{t-1}) = \emptyset$ then halt with failure
    choose $v$ uniformly at random from $C^{t-1}_{G\hookrightarrow H}(t) \setminus \operatorname{im}(\psi_{t-1})$
    $\psi_t := \psi_{t-1} \cup \{t \hookrightarrow v\}$
  return $\psi_{n-\delta n}$

To show that PackingProcess does not fail at any stage, we shall show that the host graph $H_s$ constructed in PackingProcess in embedding stage $s$ is quasirandom in the sense of Definition 10. In fact, in order to analyse the completion phase of PackingProcess we need quasirandomness of the pair $(H_s, H^*_0)$, where $H^*_0$ is the initial reservoir. We now define this coquasirandomness of a pair of graphs. Recall that quasirandomness of one graph means that common neighbourhoods are always about the size one would expect in a random graph of a similar density. Coquasirandomness of two graphs means that the intersection of a common neighbourhood in the first graph and another in the second graph has about the size one would expect in two independent random graphs of the respective densities.
Definition 14 (coquasirandom). For $\alpha > 0$ and $L \in \mathbb N$, we say that a pair of graphs $(F, F^*)$, both on the same vertex set $V$ of order $n$ and with densities $p$ and $p^*$ respectively, is $(\alpha, L)$-coquasirandom if for every set $S \subseteq V$ of at most $L$ vertices and every subset $R \subseteq S$ we have
$$\big|N_F(R) \cap N_{F^*}(S \setminus R)\big| = (1 \pm \alpha)\, p^{|R|} (p^*)^{|S \setminus R|} n.$$
With this we can state the setting of our main lemmas and fix various constants which we will use in the remainder of the paper.
Setting 15. Let $D, n \in \mathbb N$ and $\gamma > 0$ be given. We fix constants $\delta, \eta, \varepsilon, \xi, c, C > 0$ together with the quasirandomness errors $(\alpha_x)_{x\in\mathbb R}$ of (4.1). Suppose that $s^* \le 2n$ and that $G_1, \dots, G_{s^*}$ are graphs on vertex set $[n]$ such that $\deg^-(x) \le D$ for each $x \in V(G_s)$, such that $\Delta(G_s) \le cn/\log n$, and such that the final $\delta n$ vertices of $G_s$ all have degree $d_s$ and form an independent set. Let $H_0$ and $H^*_0$ be two edge-disjoint graphs on the same vertex set of order $n$ such that $H := H_0 \cup H^*_0$ is $(\xi, 2D+3)$-quasirandom with density $p \ge 3\gamma$, and such that $e(H^*_0) = (1 \pm \alpha_0)\gamma\binom{n}{2}$.

Note that in (4.1) we give numbers $\alpha_x$ which we call 'constants' even though $n$ appears in their definition. Observe that $\alpha_x$ is strictly increasing in $x$. We will be interested only in values $0 \le x \le 2n$ (though it is technically convenient to have the definition for all $x \in \mathbb R$), and it is easy to check that neither $\alpha_0$ nor $\alpha_{2n}$ depends on $n$.
The main lemmas for the analysis of PackingProcess are the following. Lemma 16 states that $(H_0, H^*_0)$ is coquasirandom with high probability. Lemma 17 states that with high probability $(H_s, H^*_0)$ continues to be coquasirandom at each stage $s$; the proof of this lemma constitutes the main work of this paper. Lemma 18 states that, given the quasirandomness provided by Lemma 17, the RandomEmbedding of $G_{s+1}$ into $H_s$ is very likely to succeed. Lemma 19 states that with high probability only very few edges of $H^*_0$ are removed at each vertex in forming $H^*_s$; this then implies that $(H_s, H^*_s)$ is also likely to be coquasirandom (though with a much worse error parameter). Finally, Lemma 20 uses the coquasirandomness of $(H_s, H^*_s)$ to argue that at each stage it is very likely that the completion phase succeeds.
We start with the lemma concerning the coquasirandomness of the initial bulk and reservoir.
Lemma 16. For each $D \in \mathbb N$ and each $\gamma > 0$, and for each $n$ sufficiently large, suppose that the constants $\alpha_0$ and $\xi$ are as in Setting 15. Suppose that $H$ is a $(\xi, 2D+3)$-quasirandom graph of order $n$ and density $p \ge 3\gamma$. Let $H^*_0$ be a random subgraph of $H$ in which each edge of $H$ is kept with probability $q = \gamma/p$, and let $H_0$ be the complement of $H^*_0$ in $H$. Then with probability at least $1 - n^{-6}$ we have that $e(H^*_0) = (1 \pm \alpha_0)\gamma\binom{n}{2}$ and the pair $(H_0, H^*_0)$ is $(\tfrac14\alpha_0, 2D+3)$-coquasirandom.

The next lemma states that the coquasirandomness of $(H_s, H^*_0)$ is preserved.

Lemma 17. For each $D \in \mathbb N$ and each $\gamma > 0$, and for each $n$ sufficiently large, the following holds with probability at least $1 - n^{-5}$. Suppose that the constants, the graphs $G_1, G_2, \dots, G_{s^*}$, and the graph $H_0 \cup H^*_0 = H$ are as in Setting 15. When PackingProcess is run, for each $s \in [s^*]$ either PackingProcess fails before completing stage $s$, or the pair $(H_s, H^*_0)$ is $(\alpha_s, 2D+3)$-coquasirandom.

The next lemma estimates the probability that a single execution of RandomEmbedding succeeds.
Lemma 18. For each $D$, each $\gamma > 0$, and any sufficiently large $n$, let $\delta, \eta, \alpha_0, \alpha_{2n}, \varepsilon$ and $c$ be as in Setting 15. Given any $\alpha_0 \le \alpha \le \alpha_{2n}$, let $G$ be a graph on vertex set $[n]$ with maximum degree at most $cn/\log n$ such that $\deg^-(x) \le D$ for each $x \in V(G)$, and let $H$ be any $(\alpha, 2D+3)$-quasirandom $n$-vertex graph with at least $\gamma\binom{n}{2}$ edges. When RandomEmbedding is run to embed $G\big[[n-\delta n]\big]$ into $H$, it fails with probability at most $2n^{-9}$.
Our final two main lemmas concern the completion phase of PackingProcess. The first states that the completion phase is likely to delete very few edges at any vertex of $H^*_0$.

Lemma 19. Given $D \in \mathbb N$ and $\gamma > 0$, let $n$ be sufficiently large. Suppose that the constants, the graphs $G_1, G_2, \dots, G_{s^*}$, and $H$ are as in Setting 15. When PackingProcess is run, with probability at least $1 - n^{-50}$ one of the following three events occurs. First, PackingProcess fails. Second, some pair $(H_s, H^*_0)$ with $s \in [s^*]$ fails to be $(\alpha_s, 2D+3)$-coquasirandom. Third, for each $s \in [s^*]$ only very few edges of $H^*_0$ at each vertex are removed in forming $H^*_s$; in particular, each pair $(H_s, H^*_s)$ is then $(\eta, 2D+3)$-coquasirandom. We will show in the proof of Theorem 11 that the first two events are unlikely, so that the likely event is the last.
Our last lemma states that with high probability, at any stage s, provided (H s−1 , H * s−1 ) is sufficiently coquasirandom, running RandomEmbedding to partially embed G s into H s−1 is likely to give a partial embedding which can be completed to an embedding of G s using H * s . Lemma 20. For each D ∈ N and each γ > 0, and for each n sufficiently large, let the constants be as in Setting 15. Suppose that G is a graph on [n], such that we have deg − (x) ≤ D for each x ∈ V (G), we have ∆(G) ≤ cn/ log n, and such that the final δn vertices of G form an independent set, and all have degree d. Suppose (H, H * ) are a pair of (η, 2D + 3)coquasirandom graphs on n vertices, and H is (α s * , 2D + 3)-quasirandom, with e(H) = p n 2 and e(H * ) = (1 ± η)γ n 2 , where p ≥ γ. When RandomEmbedding is run to embed G[[n−δn]] into H, with probability at least 1−5n −9 it returns a partial embedding φ which can be extended to an embedding φ * of G into H ∪ H * , with all the edges using a vertex in {n − δn + 1, . . . , n} mapped to H * .
Let us briefly explain why we cannot simply perform the whole embedding in the quasirandom $H$, but have to split it into a bulk and a reservoir. In order to analyse RandomEmbedding we require the bulk to be very quasirandom; fortunately, RandomEmbedding is very well-behaved and preserves this good quasirandomness. In contrast, we are not able to show that the completion stage, where we choose a system of distinct representatives for the remaining vertices, is so well-behaved: if we used the bulk for this completion, the errors would rapidly become unacceptably large. However, to show that choosing such a system of distinct representatives is possible, we do not need much quasirandomness. Thus the reservoir $H^*_s$ does rapidly lose its quasirandomness (compared to $H_s$), but what remains is sufficient for the completion.
We now argue that our main lemmas imply Theorem 11.
Proof of Theorem 11. We can assume that p > 3γ as the statement is vacuous otherwise.
Suppose that we run PackingProcess on the input graphs $G_1, \dots, G_{s^*}$. For the course of the analysis of this run, we shall at first ignore possible failures during the completion phase. That is, if any failure during the completion phase occurs, we ignore it and continue embedding using RandomEmbedding into the bulk. Clearly, this does not change the behaviour of future rounds of RandomEmbedding or the evolution of the bulk.
As we said earlier, we need to argue that with positive probability PackingProcess does not fail. Rather than proving this directly, we introduce additional quasirandomness conditions, and prove that with positive probability all these conditions are satisfied up to any given stage, and that if we have the said quasirandomness conditions up to that stage, then RandomEmbedding will proceed successfully through the next stage. (Of course, it could happen that PackingProcess succeeds in the overall embedding even though some of our quasirandomness conditions failed during the course of the packing; we shall pessimistically view such an execution of PackingProcess as unsuccessful.) More precisely, it is clear that PackingProcess does not fail (in the RandomEmbedding stage) unless at least one of the following exceptional events occurs:
(i) The pair $(H_0, H^*_0)$ fails to be $(\tfrac14\alpha_0, 2D+3)$-coquasirandom.
(ii) For some $s \in [s^*]$, PackingProcess does not fail before completing stage $s$, but the pair $(H_s, H^*_0)$ fails to be $(\alpha_s, 2D+3)$-coquasirandom.
(iii) For some $r \in \{0, \dots, s^*-1\}$, PackingProcess runs without failure through the first $r$ stages and the graphs $H_s$ are $(\alpha_s, 2D+3)$-quasirandom for $s \le r$. Then, in stage $r+1$, RandomEmbedding fails.
Lemma 16 gives an upper bound on the probability of the event in (i). Lemma 17 gives an upper bound on the probability of all the events in (ii). For each fixed $r \in \{0, \dots, s^*-1\}$, the event in (iii) can be bounded using Lemma 18. Thus, the probability that PackingProcess fails in the RandomEmbedding part is at most $n^{-6} + n^{-5} + s^* \cdot 2n^{-9}$.
Let us now analyse the completion phases of PackingProcess. If PackingProcess fails in one of the completion phases then one of the following events occurs: (iv) One of the events described under (i)-(iii).
(v) None of (i)–(iii) occurs, and RandomEmbedding and the completion phase proceed successfully through the first $r$ stages (for some $r \in [s^*]$), but the conclusion of Lemma 19 fails: too many edges of $H^*_0$ at some vertex are removed, so that some pair $(H_s, H^*_s)$ with $s \le r$ fails to be $(\eta, 2D+3)$-coquasirandom.
(vi) None of (i)–(iii) occurs, and RandomEmbedding and the completion phase proceed successfully through the first $r$ stages (for some $r \in \{0, \dots, s^*-1\}$), where throughout all the pairs $(H_s, H^*_0)$ and $(H_s, H^*_s)$ are $(\alpha_s, 2D+3)$-coquasirandom and $(\eta, 2D+3)$-coquasirandom, respectively. In stage $r+1$, RandomEmbedding successfully embeds $G_{r+1}\big[[n-\delta n]\big]$, but the completion phase fails.
Lemma 19 bounds the probability of the event in (v) by $n^{-50}$. Finally, Lemma 20 bounds the probability of the event in (vi) for each given $r$ by $5n^{-9}$. Thus, the total probability of failure due to (v) or (vi) is at most $n^{-50} + s^* \cdot 5n^{-9}$.
We conclude that PackingProcess packs the graphs G 1 , . . . , G s * into H with positive probability.
4.1. The probability space for RandomEmbedding. Algorithm 2 gives a sound definition of a randomised algorithm which either provides an embedding of $G\big[[n-\delta n]\big]$ into $H$ or fails, and the probability of any output can in principle be computed. To handle the analysis of RandomEmbedding, which is the most demanding part of this paper, it is useful to properly set up a probability space as indicated at the beginning of Section 2.2.2. Given $G$ and $H$ as in Algorithm 2 (recall that $V(G) = [n]$), let $\Omega_{G\hookrightarrow H} := \big(V(H) \cup \{\ast\}\big)^{n-\delta n}$, where $\ast$ is a special symbol indicating failure. We now need to define the probability measure on $\Omega_{G\hookrightarrow H}$. Let $\omega = (\omega_1, \dots, \omega_{n-\delta n}) \in \Omega_{G\hookrightarrow H}$ be given. Suppose first that $\omega$ consists only of vertices of $V(H)$. Then we define $\mathbb P_{G\hookrightarrow H}(\omega)$ as the probability that RandomEmbedding succeeds in embedding $G\big[[n-\delta n]\big]$ into $H$ and maps each vertex $t \in [n-\delta n]$ of $G$ to the vertex $\omega_t$. Suppose next that $\omega$ contains some $\ast$'s, and that these form a terminal segment of $\omega$, say starting from position $t_0$. Then we define $\mathbb P_{G\hookrightarrow H}(\omega)$ as the probability that RandomEmbedding succeeds in the first $t_0 - 1$ steps, maps each vertex $t \in [t_0-1]$ to $\omega_t$, and then halts with failure in step $t_0$. Last, suppose that $\omega$ contains some $\ast$'s which do not form a terminal segment of $\omega$. We then define $\mathbb P_{G\hookrightarrow H}(\omega) := 0$. It is clear that $\mathbb P_{G\hookrightarrow H}$ is a probability measure on $\Omega_{G\hookrightarrow H}$ which corresponds to possible runs of RandomEmbedding.
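Concretely, unwinding the definition of RandomEmbedding, the measure of any single failure-free history factorises step by step: for a history $\mathcal H_t$ corresponding to choices $\omega_1, \dots, \omega_t \in V(H)$ we have
$$\mathbb P_{G\hookrightarrow H}(\mathcal H_t) = \prod_{j=1}^{t} \frac{1}{\big|C^{j-1}_{G\hookrightarrow H}(j) \setminus \operatorname{im}\psi_{j-1}\big|},$$
provided none of the candidate sets in question is empty; here $\psi_{j-1}$ denotes the partial embedding determined by $\omega_1, \dots, \omega_{j-1}$.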
We shall use the concept of histories and history ensembles, as introduced in Section 2.2.2, in connection with $\Omega_{G\hookrightarrow H}$.

4.2. Organisation of the technical part of the paper. It thus remains to prove all the main lemmas from this section. Lemmas 16 and 17 are proved in Section 6. Lemma 18 is stated here in a simplified form; in actuality, we prove a stronger statement (of which Lemma 18 is a straightforward consequence) in Lemma 24. This stronger form is also needed for proving Lemma 17, and its proof spans the entire Section 5. Lemmas 19 and 20 are proved in Section 7.

Staying on a diet
In this section we consider the running of RandomEmbedding to embed one degenerate graph G into a quasirandom graph H. The results of this section will always be used to analyse one stage s, when we take G = G s and H = H s−1 . We also analyse how RandomEmbedding behaves with respect to the graph H * = H * s−1 . We analyse carefully how fast common neighbourhoods of vertices in H are eaten up by RandomEmbedding, and how often individual vertices of H appear in candidate sets. To make this precise, we introduce the following two definitions.
The diet condition states that during the running of RandomEmbedding, for each t ∈ [n−δn], the fraction of each set N H (S) which is covered by im(ψ t ) is roughly as expected, that is, roughly proportional to | im(ψ t )|/n. As with (co)quasirandomness, we also require a codiet condition, considering the intersection of some vertex neighbourhoods in H and H * .
The diet condition states that during the running of RandomEmbedding, for each $t \in [n-\delta n]$, the fraction of each set $N_H(S)$ which is covered by $\operatorname{im}(\psi_t)$ is roughly as expected, that is, roughly proportional to $|\operatorname{im}(\psi_t)|/n$. As with (co)quasirandomness, we also require a codiet condition, considering the intersection of some vertex neighbourhoods in $H$ and $H^*$.

Definition 21 (diet condition, codiet condition). Let $H$ be a graph with $n$ vertices and $p\binom{n}{2}$ edges, and let $X \subseteq V(H)$ be any vertex set. We say that the pair $(H, X)$ satisfies the $(\beta, L)$-diet condition if for every set $S \subseteq V(H)$ of at most $L$ vertices we have
$$\big|N_H(S) \setminus X\big| = (1 \pm \beta)\, p^{|S|} (n - |X|).$$
Let $H, H^*$ be two graphs with vertex set $V$ of order $n$ and with $p\binom{n}{2}$ and $p^*\binom{n}{2}$ edges, respectively, and let $X \subseteq V$ be any vertex set. We say that the triple $(H, H^*, X)$ satisfies the $(\beta, L)$-codiet condition if for every set $S \subseteq V$ of at most $L$ vertices and for every subset $R \subseteq S$ we have
$$\big|N_H(R) \cap N_{H^*}(S \setminus R) \setminus X\big| = (1 \pm \beta)\, p^{|R|} (p^*)^{|S \setminus R|} (n - |X|).$$
Observe that the $(\beta, L)$-diet condition holding for $(H, \emptyset)$ is simply the statement that $H$ is $(\beta, L)$-quasirandom, and similarly for the codiet condition.
The cover condition, defined below, roughly states that during the embedding of $G$ into $H$ by RandomEmbedding, for each $v$ in the host graph $H$ the right fraction of vertices $x$ of $G$ have $v$ in their final candidate set. Some care is needed in making precise what we mean by 'the right fraction'. Firstly, how likely it is that $v$ is in the final candidate set of $x$ depends on the number of neighbours of $x$ preceding $x$; therefore we partition $V(G)$ according to this number of left-neighbours. For technical reasons we further want to control this fraction in intervals of $V(G)$ of length $\varepsilon n$, where $n$ is the order of $H$. Hence, for a given $\varepsilon > 0$ we define the sets
$$X_{i,d} := \{x \in V(G) : i \le x < i + \varepsilon n \text{ and } \deg^-(x) = d\}.$$
When $G$ is given with a $D$-degenerate ordering it is enough to consider $d \in \{0, 1, \dots, D\}$; that is, $\{x \in V(G) : i \le x < i + \varepsilon n\} = X_{i,0} \cup X_{i,1} \cup \dots \cup X_{i,D}$. So if $H$ is quasirandom and has $p\binom{n}{2}$ edges, then for an arbitrary $v \in V(H)$ we would expect that about a $p^d$-fraction of the vertices $x$ in each $X_{i,d}$ have $v$ in their final candidate sets (recall that the candidate set may also include vertices already used by the embedding). However, this expectation turns out not to be quite true. If a vertex $y$ of $G$ has been embedded to $v$, and a vertex $x$ has a left-neighbour $z$ which is adjacent to $y$, then $x$ is much more likely to have $v$ in its candidate set, because we get 'for free' that $z$ is embedded to a neighbour of $v$. This is the only reason why the above intuition can fail: in particular, the expectation does hold true if $v$ is not in the image of $\psi$, and it turns out that this is all we need.
Definition 22 (cover condition). Suppose that $G$ and $H$ are two graphs such that $H$ has order $n$ and density $p$, and the vertex set of $G$ is $[n]$. Suppose that numbers $\beta, \varepsilon > 0$ and $i \in [n - \varepsilon n]$ are given. Suppose that $\psi$ is a partial embedding of $G$ into $H$ which embeds at least the first $i + \varepsilon n - 1$ vertices of $G$. We say that $\psi$ satisfies the $(\varepsilon, \beta, i)$-cover condition if for each $v \in V(H)$ such that $v \notin \operatorname{im}\big(\psi{\restriction}[i+\varepsilon n-1]\big)$, and for each $d \in \mathbb N$, we have
$$\big|\{x \in X_{i,d} : v \in C^{x-1}_{G\hookrightarrow H}(x)\}\big| = (1 \pm \beta)\, p^d\, |X_{i,d}| \pm \varepsilon^2 n.$$
Note that the corresponding condition for $d = 0$ is trivial, even with zero error parameters.
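To see where the $p^d$-fraction heuristic comes from, condition step by step on the left-neighbours $y_1 < \dots < y_d$ of a vertex $x \in X_{i,d}$: each $y_k$ should land in $N_H(v)$ with probability roughly $p$, so heuristically
$$\mathbb P\big(v \in C^{x-1}_{G\hookrightarrow H}(x)\big) = \prod_{k=1}^{d} \mathbb P\big(y_k \hookrightarrow N_H(v) \,\big|\, y_1, \dots, y_{k-1} \hookrightarrow N_H(v)\big) \approx p^d.$$
Making this heuristic rigorous, with control of the accumulating errors, is exactly the content of the cover lemma (Lemma 25) below.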
For $\lambda > 0$ and a time $t$, we write $\mathrm{DietE}(\lambda; t)$ for the event that $\psi_t$ exists (that is, RandomEmbedding has not failed by time $t$) and the pair $(H, \operatorname{im}\psi_t)$ satisfies the $(\lambda, 2D+3)$-diet condition; $\mathrm{CoDietE}(t)$ for the event that $\psi_t$ exists and the triple $(H, H^*, \operatorname{im}\psi_t)$ satisfies the $(2\eta, 2D+3)$-codiet condition; and $\mathrm{CoverE}(\lambda; t)$ for the event that $\psi_{t+\varepsilon n-1}$ exists and satisfies the $(\varepsilon, \lambda, t)$-cover condition. Note that the events $\mathrm{DietE}(\cdot; t)$ and $\mathrm{CoDietE}(t)$ are determined by histories (as defined in Sections 2.2.2 and 4.1) up to time $t$: for any $\lambda > 0$ and any history $\mathcal H_t$, the event $\mathrm{DietE}(\lambda; t)$ either contains $\mathcal H_t$ or is disjoint from $\mathcal H_t$, and the same property holds for $\mathrm{CoDietE}(t)$. The event $\mathrm{CoverE}(\cdot; t)$ is somewhat different, since its definition involves the set $X_{t,d}$, which looks $\varepsilon n - 1$ steps forward in time. So, for any history $\mathcal H_{t+\varepsilon n-1}$, the event $\mathrm{CoverE}(\lambda; t)$ either contains $\mathcal H_{t+\varepsilon n-1}$ or is disjoint from $\mathcal H_{t+\varepsilon n-1}$.
The following lemma is the crucial accurate analysis of RandomEmbedding which we need in order to show that RandomEmbedding is likely to succeed and in order to derive further properties of the final embedding.
Lemma 24 (Diet-and-cover lemma). For each $D \in \mathbb N$, each $\gamma > 0$, and any sufficiently large $n$, let $\delta, \eta, \alpha_0, \alpha_{2n}, \varepsilon$ and $c, C$ be as in Setting 15. Let $\alpha \in [\alpha_0, \alpha_{2n}]$ be arbitrary. Let $G$ be a graph on vertex set $[n]$ with maximum degree at most $cn/\log n$ such that $\deg^-(x) \le D$ for each $x \in V(G)$, let $H$ be any $(\alpha, 2D+3)$-quasirandom $n$-vertex graph with at least $\gamma\binom{n}{2}$ edges, and let $H^*$ be a graph on $V(H)$ with at least $(1-\eta)\gamma\binom{n}{2}$ edges such that $(H, H^*)$ is $(\eta, 2D+3)$-coquasirandom. When RandomEmbedding is run to embed $G\big[[n-\delta n]\big]$ into $H$, then with probability at least $1 - 2n^{-9}$ the following event holds:
(5.2) RandomEmbedding does not fail; for each $1 \le t \le (1-\delta)n$ the pair $(H, \operatorname{im}\psi_t)$ satisfies the $(C\alpha, 2D+3)$-diet condition and the triple $(H, H^*, \operatorname{im}\psi_t)$ satisfies the $(2\eta, 2D+3)$-codiet condition; and for each $1 \le t \le n + 1 - \varepsilon n$ the embedding $\psi_{(1-\delta)n}$ satisfies the $(\varepsilon, C\alpha, t)$-cover condition.
This lemma immediately implies Lemma 18.
Proof of Lemma 18. Recall that RandomEmbedding fails if and only if $C^{t-1}_{G\hookrightarrow H}(t) \setminus \operatorname{im}(\psi_{t-1}) = \emptyset$ for some $t$, and that $\mathrm{DietE}(C\alpha; t-1)$ in particular provides a lower bound on the size of $C^{t-1}_{G\hookrightarrow H}(t) \setminus \operatorname{im}(\psi_{t-1})$ which is greater than 0. Since the likely event of Lemma 24 is contained in $\mathrm{DietE}(C\alpha; t-1)$ for each $t \ge 2$, and the same lower bound is trivially implied by the $(\alpha, 2D+3)$-quasirandomness of $H$ for $t = 1$ (since $\operatorname{im}\psi_0 = \emptyset$), we conclude that within the likely event of Lemma 24, RandomEmbedding does not fail.
The main difficulty is to establish that the cover and diet conditions hold. We will see that the codiet condition is an easy byproduct. The reason for the difficulty is that the error terms in the cover and diet conditions for small times $t$ feed back into the calculations which establish the cover and diet conditions for larger times $t$, and we have to ensure that this feedback loop does not allow the errors to spiral out of control. To that end, we define in (5.3) a new sequence of error terms $\{\beta_t : t \in \mathbb R\}$, which we need only in the proof of Lemma 24. Given $D$ and $\alpha, \delta, \gamma > 0$, this is a carefully chosen increasing sequence (depending on $\alpha$) such that $\beta_0 = \alpha$ and such that $\beta_n/\beta_0$ is bounded by a constant which does not depend on $\alpha$ (though it does depend on $D$, $\gamma$ and $\delta$). We will mainly take $t$ to be an integer in the range $[0, n]$, but it is convenient to allow $t$ to be any real number. In particular, if we have Setting 15 and $\alpha \ge \alpha_0$ is given, then for each $0 \le t \le n$ we have $20D\beta_t \le C\alpha$.

We split the proof of Lemma 24 into two parts. The cover lemma (Lemma 25) states that if the $(\beta_t, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_i)$ for each $i \in [t-1]$, then it is very unlikely that the $(\varepsilon, 20D\beta_t, t)$-cover condition fails for $\psi_{t+\varepsilon n-2}$. Note that the time $t + \varepsilon n - 2$ is the first time at which the $(\varepsilon, 20D\beta_t, t)$-cover condition is guaranteed to be determined, since at this time all left-neighbours of all the vertices $t, t+1, \dots, t+\varepsilon n-1$ have certainly been embedded.
Lemma 25 (Cover lemma). For each $D$, each $\gamma > 0$ and sufficiently large $n$, let $\alpha_0, \alpha_{2n}, \varepsilon, \delta$ and $c$ be as in Setting 15. Suppose that $\alpha_0 \le \alpha \le \alpha_{2n}$ and $G$ is a graph on vertex set $[n]$, with $\deg^-(x) \le D$ for each $x \in [n]$ and with maximum degree at most $cn/\log n$, and suppose that $H$ is an $n$-vertex graph of density at least $\gamma$. Let $\beta_t$ for $0 \le t \le n$ be defined as in (5.3), and assume that $\beta_n \le \tfrac{1}{10}$. Let $t$ with $1 \le t \le n - \delta n - \varepsilon n + 1$ be fixed. Then we have
$$\mathbb P\Big(\bigcap_{i\in[t-1]} \mathrm{DietE}(\beta_t; i) \setminus \mathrm{CoverE}(20D\beta_t; t)\Big) \le n^{-10}.$$

Let us also record the following deterministic observation. Consider Setting 15, and suppose that for some $0 \le t \le n - \delta n - \varepsilon n$, RandomEmbedding runs up to time $t$ and the $(\beta_t, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_t)$. Let $p := e(H)/\binom{n}{2}$ and suppose that $p \ge \gamma$. Then for each $t+1 \le j \le t+\varepsilon n$ and each set $S \subseteq V(H)$ of at most $2D+3$ vertices, we have
$$\big|N_H(S) \setminus \operatorname{im}\psi_j\big| = \big|N_H(S) \setminus \operatorname{im}\psi_t\big| \pm \varepsilon n = (1 \pm \beta_t)\, p^{|S|} (n - t) \pm \varepsilon n = (1 \pm 2\beta_t)\, p^{|S|} (n - j).$$
Hence the $(2\beta_t, 2D+3)$-diet condition holds deterministically for $(H, \operatorname{im}\psi_j)$; in particular, RandomEmbedding cannot fail before time $t + \varepsilon n$.
Lemma 26 (Diet lemma). For each $D$, each $\gamma > 0$, and any sufficiently large $n$, let $\alpha_0, \alpha_{2n}, \varepsilon, \delta$ and $\eta$ be as in Setting 15. For any $t \le (1-\delta)n$ and $\alpha_0 \le \alpha \le \alpha_{2n}$ the following holds. Suppose that $G$ is a graph on $[n]$ such that $\deg^-(x) \le D$ for each $x \in [n]$, and $H$ is an $(\alpha, 2D+3)$-quasirandom graph with $n$ vertices and $p\binom{n}{2}$ edges, with $p \ge \gamma$. Suppose furthermore that $H^*$ is a graph on $V(H)$ with $\bar p\binom{n}{2}$ edges, where $\bar p \ge (1-\eta)\gamma$, such that $(H, H^*)$ satisfies the $(\eta, 2D+3)$-coquasirandomness condition. Let $\{\beta_\tau : \tau \in [0,n]\}$ be defined as in (5.3) and assume that $\beta_n \le \tfrac{1}{10}$. Let $t$ with $1 \le t \le n - \delta n$ be fixed. Then we have
$$\mathbb P\Big(\mathcal A \setminus \big(\mathrm{DietE}(\beta_t; t) \cap \mathrm{CoDietE}(t)\big)\Big) \le n^{-10},$$
where $\mathcal A$ denotes the event that $\mathrm{DietE}(\beta_j; j)$ holds for each $j \in [t-1]$ and $\mathrm{CoverE}(20D\beta_{j-\varepsilon n+1}; j-\varepsilon n+1)$ holds for each $\varepsilon n - 1 \le j < t$.

Since the graphs $G$ and $H$ are fixed in Lemmas 24, 25 and 26, in this section we drop the subscript in the notation $C^j_{G\hookrightarrow H}(x)$ and write simply $C^j(x)$. Likewise, we write $\mathbb P$ instead of $\mathbb P_{G\hookrightarrow H}$. Last, we write $(\psi_i)_{i \le t^*}$ for the partial embeddings of $G$ into $H$; here $t^*$ is the time at which RandomEmbedding halts. Of course, $t^*$ and $(\psi_i)_{i \le t^*}$ depend on the particular realisation $\omega \in \Omega_{G\hookrightarrow H}$ of the run of RandomEmbedding.
We now show that Lemmas 25 and 26, whose proofs are deferred to later in this section, imply Lemma 24.
Proof of Lemma 24. Suppose that we are given $D$ and $\gamma$. Now, given $\alpha > 0$, we define $\beta_t$ for each $0 \le t \le n$ as in (5.3). For $t = 0, \dots, n - \delta n$, define $\mathcal A_t$ to be the event that $\mathrm{DietE}(\beta_j; j)$ and $\mathrm{CoDietE}(j)$ hold for each $1 \le j \le t$, and that $\mathrm{CoverE}(20D\beta_{j-\varepsilon n+1}; j-\varepsilon n+1)$ holds for each $\varepsilon n - 1 \le j \le t$. Our strategy is first to show that $\mathbb P(\mathcal A_{t-1} \setminus \mathcal A_t)$ is tiny for each $t$. Since $\mathbb P(\mathcal A_0) = 1$, this will imply that $\mathbb P(\mathcal A_{n-\delta n})$ is very close to 1. Last, we shall show that $\mathcal A_{n-\delta n}$ is a subset of the event in (5.2). Indeed, suppose that the event $\mathcal A_{t-1}$ holds. This in particular means that the $(\beta_j, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_j)$ for each $1 \le j < t$, and the $(\varepsilon, 20D\beta_{j-\varepsilon n+1}, j-\varepsilon n+1)$-cover condition holds for $\psi_j$ for each $\varepsilon n - 1 \le j < t$.
Because the events just listed determine $\mathcal A_t$, we have
(5.6) $\mathbb P(\mathcal A_{t-1} \setminus \mathcal A_t) \le \mathbb P\big(\mathcal A_{t-1} \setminus \mathrm{CoverE}(20D\beta_{t-\varepsilon n+1}; t-\varepsilon n+1)\big) + \mathbb P\big(\mathcal A_{t-1} \setminus (\mathrm{DietE}(\beta_t; t) \cap \mathrm{CoDietE}(t))\big).$
Firstly, let us focus on the term $\mathrm{CoverE}(20D\beta_{t-\varepsilon n+1}; t-\varepsilon n+1)$ in (5.6). This term does not exist when $t < \varepsilon n$, so let us assume the contrary. Lemma 25 (applied at time $t - \varepsilon n + 1$) then tells us that
$$\mathbb P\Big(\bigcap_{i\in[t-\varepsilon n]} \mathrm{DietE}(\beta_{t-\varepsilon n+1}; i) \setminus \mathrm{CoverE}(20D\beta_{t-\varepsilon n+1}; t-\varepsilon n+1)\Big) \le n^{-10}.$$
In particular, since $\mathcal A_{t-1}$ implies $\mathrm{DietE}(\beta_i; i)$, and hence $\mathrm{DietE}(\beta_{t-\varepsilon n+1}; i)$, for each $i \in [t-\varepsilon n]$,
(5.7) $\mathbb P\big(\mathcal A_{t-1} \setminus \mathrm{CoverE}(20D\beta_{t-\varepsilon n+1}; t-\varepsilon n+1)\big) \le n^{-10}.$
Secondly, we use Lemma 26 to show that with high probability neither the diet condition nor the codiet condition fails at time $t$. Indeed, Lemma 26 tells us directly that
(5.8) $\mathbb P\big(\mathcal A_{t-1} \setminus (\mathrm{DietE}(\beta_t; t) \cap \mathrm{CoDietE}(t))\big) \le n^{-10}.$
Summing up (5.7) and (5.8), we conclude that $\mathbb P(\mathcal A_{t-1} \setminus \mathcal A_t) \le 2n^{-10}$. Taking a union bound over the at most $n$ choices of $t$, we see that with probability at least $1 - 2n^{-9}$ the good event from the statement of Lemma 24 holds: that is, RandomEmbedding does not fail, and, by the choice of $C$ and by (5.3), for each $1 \le t \le (1-\delta)n$ the pair $(H, \operatorname{im}\psi_t)$ satisfies the $(C\alpha, 2D+3)$-diet condition and the triple $(H, H^*, \operatorname{im}\psi_t)$ satisfies the $(2\eta, 2D+3)$-codiet condition, and for each $1 \le t \le n + 1 - \varepsilon n$ the embedding $\psi_{(1-\delta)n}$ satisfies the $(\varepsilon, C\alpha, t)$-cover condition, as desired.
We now prove the cover lemma.
Proof of Lemma 25. Write $e(H) = p\binom{n}{2} \ge \gamma\binom{n}{2}$. Let $\mathcal D$ be the event that the $(\beta_t, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_i)$ for each $i \in [t-1]$. We fix $v \in V(H)$, and we also fix $1 \le d \le D$. Define $B_{v,d}$ as the event that $\mathcal D$ holds, and that $v$ and $d$ witness the failure of the $(\varepsilon, 20D\beta_t, t)$-cover condition for $\psi_{t+\varepsilon n-2}$. More formally, $B_{v,d}$ is the event that $\mathcal D$ holds, that $v \notin \operatorname{im}\psi_{t+\varepsilon n-1}$, and that
$$\big|\{x \in X_{t,d} : v \in C^{x-1}(x)\}\big| \ne (1 \pm 20D\beta_t)\, p^d\, |X_{t,d}| \pm \varepsilon^2 n.$$
Our aim is to show that
(5.9) $\mathbb P(B_{v,d}) \le d n^{-12}.$
A union bound over the choices of $v$ and $d$ then gives the lemma.
Our strategy for proving (5.9) is as follows. Ideally, we would like to assert that for each $x \in X_{t,d}$ the probability of $v \in C^{x-1}(x)$ is roughly $p^d$ and apply Lemma 4 to bound the probability of the bad event $B_{v,d}$. To this end, we consider a dynamical version of candidate sets, in which we track changes in the set of vertices potentially suitable to accommodate $x$ as we gradually embed more and more left-neighbours of $x$. More precisely, for each $x \in X_{t,d}$ and each time $i$ we write $C^{i,\mathrm{dyn}}(x) := C^i(x)$, and as $i$ increases, the set $C^{i,\mathrm{dyn}}(x)$ shrinks exactly at the times $y \in N^-(x)$ at which left-neighbours of $x$ are embedded.
Unfortunately we are not able to carry out this ideal strategy, because when we apply Lemma 4 what we need to calculate is not the probability of $v \in C^{x-1}(x)$, but this probability in the conditioned space given by the history up to some earlier time. Because the sets $N^-(x)$ interleave each other, this conditional probability will generally not be close to $p^d$, and we were not able to find a good way to estimate it. Hence we refine this strategy by rewriting the event $\{v \in C^{x-1}(x)\}$ as
(5.10) $\{y_1 \hookrightarrow N_H(v)\} \cap \{y_1, y_2 \hookrightarrow N_H(v)\} \cap \dots \cap \{y_1, y_2, \dots, y_d \hookrightarrow N_H(v)\},$
where $y_1, \dots, y_d$ are the left-neighbours of $x$, ordered from left to right. The event $\{y_1, y_2, \dots, y_d \hookrightarrow N_H(v)\}$, of course, equals the entire intersection (5.10). However, this more complicated way of expressing (5.10) suggests introducing, for each $k$, a sequence of random variables that count the events of the form $\{y_1, y_2, \dots, y_k \hookrightarrow N_H(v)\}$, ordered by $y_k$. Intuitively, conditioning on $\{y_1, y_2, \dots, y_k \hookrightarrow N_H(v)\}$ (which is determined by the history up to the time at which we embed $y_k$), we should expect that the probability that $\{y_1, y_2, \dots, y_{k+1} \hookrightarrow N_H(v)\}$ holds is about $p$. We will be able to demonstrate that this is true, even if we condition on a typical history up to the time immediately before embedding $y_{k+1}$, and this allows us to use Lemma 4.
More formally, given $1 \le k \le d$, we define random variables $Y_{k,1}, \dots, Y_{k,t+\varepsilon n-2}$ as follows. For $y \in [t+\varepsilon n-2]$, let $Y_{k,y}$ be the number of vertices $x \in X_{t,d}$ such that $y$ is the $k$-th leftmost vertex of $N^-(x)$ and the first $k$ vertices of $N^-(x)$ are all embedded to $N_H(v)$. Further, for each $0 \le k \le d$, we let $\mathcal Y_k$ be the event that either $v \in \operatorname{im}\psi_{t+\varepsilon n-1}$, or $(1 \pm 10\beta_t)^k p^k |X_{t,d}| \pm k\varepsilon^2 n/d$ vertices $x \in X_{t,d}$ have all of the first $k$ vertices of $N^-(x)$ embedded to $N_H(v)$. Observe that the event $\mathcal Y_k$ is precisely the statement that
(5.11) either $v \in \operatorname{im}\psi_{t+\varepsilon n-1}$ or $\sum_{y=1}^{t+\varepsilon n-2} Y_{k,y} = (1 \pm 10\beta_t)^k p^k |X_{t,d}| \pm k\varepsilon^2 n/d.$

Our bad event then satisfies $B_{v,d} \subseteq \mathcal D \setminus \mathcal Y_d$. In order to bound the probability of $B_{v,d}$ we cover $B_{v,d}$ with $d$ events, each of whose probabilities we can bound with Lemma 4. For this purpose we define, for $1 \le k \le d$, the event
$$\mathcal E_k := \mathcal D \cap \mathcal Y_0 \cap \mathcal Y_1 \cap \dots \cap \mathcal Y_{k-1}.$$
Note that $\mathcal E_1 = \mathcal D$, since $\mathcal Y_0$ holds trivially with probability one. We thus have
$$B_{v,d} \subseteq \mathcal D \setminus \mathcal Y_d \subseteq \bigcup_{k=1}^{d} (\mathcal E_k \setminus \mathcal Y_k).$$
Our aim then is to show that for each $1 \le k \le d$ we have
(5.12) $\mathbb P(\mathcal E_k \setminus \mathcal Y_k) \le n^{-12}.$
Note that this and a union bound over the $d$ choices of $k$ gives (5.9).
To establish (5.12) we would like to apply Lemma 4. Hence we need to argue that either $\mathcal E_k$ fails, or we can estimate $\sum_{y=1}^{t+\varepsilon n-2} \mathbb E(Y_{k,y} \mid \mathcal H_{y-1})$, where $\mathcal H_{y-1}$ is the history of embedding decisions taken by RandomEmbedding up to and including the embedding of vertex $y - 1$. To this end, for $y \in [t+\varepsilon n-2]$ let $Z_{k,y}$ be the number of vertices $x \in X_{t,d}$ such that $y$ is the $k$-th leftmost vertex of $N^-(x)$ and the first $k-1$ vertices of $N^-(x)$ are embedded to $N_H(v)$. Then the quantity $Z_{k,y}$ is determined by $\mathcal H_{y-1}$, and
(5.13) $\mathbb E(Y_{k,y} \mid \mathcal H_{y-1}) = Z_{k,y} \cdot \mathbb P\big(y \hookrightarrow N_H(v) \mid \mathcal H_{y-1}\big).$
Observe further that
(5.14) $\sum_{y=1}^{t+\varepsilon n-2} Z_{k,y} = \sum_{y=1}^{t+\varepsilon n-2} Y_{k-1,y},$
because both sums count the number of vertices $x \in X_{t,d}$ such that the first $k-1$ vertices of $N^-(x)$ are embedded to $N_H(v)$: in the first sum grouped by their $k$-th left-neighbour, and in the second sum by their $(k-1)$-st left-neighbour. Assume now that $y \in [t+\varepsilon n-2]$ is fixed and that $\mathcal H_{y-1}$ is such that $\mathcal H_{y-1} \cap \mathcal E_k \ne \emptyset$, and let us bound $\mathbb P(y \hookrightarrow N_H(v) \mid \mathcal H_{y-1})$. Observe that if $v \in \operatorname{im}\psi_{y-1}$, then we are by definition in the event $\mathcal Y_k$ and hence not contributing to the probability in (5.12). Thus we can assume in what follows that
(5.15) $v \notin \operatorname{im}\psi_{y-1}.$
Since $\mathscr{H}_{y-1}\cap E_k \neq \emptyset$ and $\mathcal{D} \supseteq E_k$, by definition of $\mathcal{D}$ the $(\beta_t, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{y-\varepsilon n})$, where we have to subtract $\varepsilon n$ in the index of $\psi_{y-\varepsilon n}$ because $y$ could be as large as $t+\varepsilon n-2$ (and we only know that the diet condition holds up to time $t-1$). This implies that for each set $S$ of vertices in $H$ with $|S| \le 2D+3$ we have
\[ \big|N_H(S)\setminus\operatorname{im}\psi_{y-1}\big| = (1\pm\beta_t)p^{|S|}(n-y+\varepsilon n+1) \pm \varepsilon n = (1\pm2\beta_t)p^{|S|}(n-y+1), \]
where the last equality follows from $\gamma \le p$ and $\varepsilon \le \alpha\gamma^{2D+3} \le \tfrac12\beta_t\gamma^{2D+3}$. We conclude that the $(2\beta_t, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{y-1})$. Since $\deg^-(y) \le D$ it follows that $\big|C^{y-1}(y)\setminus\operatorname{im}\psi_{y-1}\big| = (1\pm2\beta_t)p^{\deg^-(y)}(n-y+1)$ and
\[ \big|N_H\big(\psi_{y-1}(N^-_G(y))\cup\{v\}\big)\setminus\operatorname{im}\psi_{y-1}\big| = (1\pm2\beta_t)p^{\deg^-(y)+1}(n-y+1). \]
Here we used the diet condition twice, once with the set of vertices $\psi_{y-1}(N^-_G(y))$ and once with the set $\psi_{y-1}(N^-_G(y))\cup\{v\}$. The latter set is indeed one larger than the former since $\psi_{y-1}(N^-_G(y))$ is by definition contained in the image of $\psi_{y-1}$ and $v$ is not by (5.15). Therefore we have $\mathbb{P}\big(y\hookrightarrow N_H(v)\mid\mathscr{H}_{y-1}\big) = (1\pm5\beta_t)p$. We conclude from (5.13) that
(5.16) $\sum_{y=1}^{t+\varepsilon n-2}\mathbb{E}(Y_{k,y}\mid\mathscr{H}_{y-1}) = (1\pm5\beta_t)\,p\sum_{y=1}^{t+\varepsilon n-2}Z_{k,y}$
unless $E_k$ fails. Further, unless $E_k$ fails, we have by (5.14) and the event $\mathcal{Y}_{k-1}$ that $\sum_y Z_{k,y} = \sum_y Y_{k-1,y} = (1\pm10\beta_t)^{k-1}p^{k-1}|X_{t,d}| \pm (k-1)\varepsilon^2 n/d$. Plugging this in (5.16), we get that $E_k$ fails or we have $\sum_y\mathbb{E}(Y_{k,y}\mid\mathscr{H}_{y-1}) = (1\pm10\beta_t)^k p^k|X_{t,d}| \pm (k-1)\varepsilon^2 n/d$. Since $0 \le Y_{k,y} \le \deg(y)$ for each $y$, we can thus apply Lemma 4 with the event $\mathcal{E} = E_k$, with $\mu\pm\nu = (1\pm10\beta_t)^k p^k|X_{t,d}| \pm (k-1)\varepsilon^2 n/d$, and with $\varrho = \varepsilon^2 n/d$ to conclude that
\[ \mathbb{P}\big(E_k\setminus\mathcal{Y}_k\big) \le 2\exp\Big(-\frac{(\varepsilon^2 n/d)^2}{2\sum_y\deg(y)^2}\Big). \]
By Lemma 7 applied to $G$, and because $\Delta(G) \le cn/\log n$, we have $\sum_y\deg(y)^2 \le 2D\Delta(G)n \le 2Dcn^2/\log n$, and hence, because $c \le D^{-4}\varepsilon^4/100$ and $d \le D$, we obtain (5.12) as desired.
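To spell out the final arithmetic (this is our own check, using only the bounds just quoted):
\[ \frac{(\varepsilon^2 n/d)^2}{2\sum_y\deg(y)^2} \;\ge\; \frac{\varepsilon^4 n^2/d^2}{4Dcn^2/\log n} \;=\; \frac{\varepsilon^4\log n}{4Dd^2c} \;\ge\; \frac{\varepsilon^4\log n}{4D^3c} \;\ge\; 25D\log n, \]
where the middle inequality uses $d \le D$ and the last uses $c \le D^{-4}\varepsilon^4/100$. Hence $\mathbb{P}(E_k\setminus\mathcal{Y}_k) \le 2n^{-25D}$, and the union bound over $k \le d \le D$ comfortably gives (5.9).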
Finally, we prove the diet lemma.
Proof of Lemma 26. First observe that if $\psi_{t-1}$ satisfies the $(\beta_{t-1}, 2D+3)$-diet condition, RandomEmbedding cannot fail at time $t$, so $\psi_t$ exists. We first state a claim that if the diet condition holds up to time $t-\varepsilon n$, then for any given large set $T \subseteq V(H)$, with high probability either the cover condition fails at some time before $t-\varepsilon n$, or $\psi_{t-1}$ embeds about the expected fraction of each interval of $\varepsilon n$ vertices to $T$.
Claim 26.1. Let $0 \le j \le t-\varepsilon n$ and let $T \subseteq V(H)\setminus\operatorname{im}\psi_j$ be a set of size at least $\tfrac12\gamma^{2D+3}\delta n$. With probability at least $1-n^{-2D-19}$, one of the following holds.
(a) $\psi_j$ does not have the $(\varepsilon, 20D\beta_j, j)$-cover condition, or
(b) $\big|\{x : j \le x < j+\varepsilon n,\ \psi_{t-1}(x) \in T\}\big| = (1 \pm 40D\beta_j)\tfrac{|T|\varepsilon n}{n-j}$.
We defer the proof of this claim until later, and move on to state a second claim, which we will deduce from Claim 26.1. Let $\ell = \lfloor\tfrac{t}{\varepsilon n}\rfloor$. We claim that either we witness a failure of the diet or cover conditions before time $t$, or the set $N_H(R)\cap N_{H^*}(S\setminus R)\setminus\operatorname{im}\psi_{\ell\varepsilon n}$ has about the expected size for each $R \subseteq S \subseteq V(H)$ with $|S| \le 2D+3$.
Claim 26.2. With probability at least $1-n^{-10}$, one of the following holds.
(a) A failure of the diet or cover condition is witnessed before time $t$, or
(b) for every $R \subseteq S \subseteq V(H)$ with $|S| \le 2D+3$ we have
(5.17) $\big|N_H(R)\cap N_{H^*}(S\setminus R)\setminus\operatorname{im}\psi_{\ell\varepsilon n}\big| = \prod_{k=0}^{\ell-1}\Big(1-(1\pm40D\beta_{k\varepsilon n})\tfrac{\varepsilon}{1-k\varepsilon}\Big)\cdot\big|N_H(R)\cap N_{H^*}(S\setminus R)\big|$.
Before proving these claims, we show that Claim 26.2 implies the lemma. We want to show that (5.17) holding implies that we do not have witnesses for a failure of the diet condition nor the codiet condition at time $t$. Indeed, taking logs in (5.17), the $k$-th factor of the product contributes $\log\big(1-\tfrac{\varepsilon}{1-k\varepsilon}\big) \pm \tfrac{40D\beta_{k\varepsilon n}\,\varepsilon}{1-(k+1)\varepsilon}$, where we use that $1-(k+1)\varepsilon \ge \delta$, and hence by choice of $\varepsilon$ the quantity $\tfrac{40D\beta_{k\varepsilon n}\,\varepsilon}{1-(k+1)\varepsilon}$ is close to $0$; the main terms telescope to $\log\tfrac{n-\ell\varepsilon n}{n}$. Since at most $\varepsilon n$ vertices are removed from $N_H(R)\cap N_{H^*}(S\setminus R)\setminus\operatorname{im}\psi_{\ell\varepsilon n}$ to obtain $N_H(R)\cap N_{H^*}(S\setminus R)\setminus\operatorname{im}\psi_t$, we conclude
(5.18) $\big|N_H(R)\cap N_{H^*}(S\setminus R)\setminus\operatorname{im}\psi_t\big| = \big(1\pm\tfrac12\beta_t\big)\tfrac{n-t}{n}\,\big|N_H(R)\cap N_{H^*}(S\setminus R)\big|$.
We first consider the case $R = S$, when $N_H(R)\cap N_{H^*}(S\setminus R) = N_H(S)$, and deduce that $S$ does not witness a failure of the $(\beta_t, 2D+3)$-diet condition for $(H, \operatorname{im}\psi_t)$. Indeed, from (5.18) we have
\[ \big|N_H(S)\setminus\operatorname{im}\psi_t\big| = \big(1\pm\tfrac12\beta_t\big)\tfrac{n-t}{n}\,(1\pm\alpha)p^{|S|}n = (1\pm\beta_t)\,p^{|S|}(n-t), \]
where the second equality uses the fact that $H$ is $(\alpha, 2D+3)$-quasirandom. We thus have no witness against the diet condition. Now, we let $R$ be any subset of $S$ and aim to establish the codiet condition. Again from (5.18), together with the coquasirandomness of $(H, H^*)$, the set $N_H(R)\cap N_{H^*}(S\setminus R)\setminus\operatorname{im}\psi_t$ has the size required by the codiet condition. This concludes the proof of the lemma, modulo the proofs of Claim 26.1 and Claim 26.2, which we now provide.
Proof of Claim 26.1. Let $j$ and $T$ be as in the statement. Fix $0 \le d \le D$. We want to show how to make use of the $(\varepsilon, 20D\beta_j, j)$-cover condition for $\psi_j$ (which we have when Part (a) fails) to deduce that the assertion of Part (b) holds with high probability. That is, we consider the number of vertices in $X_{j,d}$ embedded to $T$. In order to apply Lemma 4, we want to estimate the sum over $x \in X_{j,d}$ of the probability that $x$ is embedded to $T$, conditioning on $\psi_{x-1}$; that is, we need to estimate the quantity
(5.19) $\frac{\big|T\cap C^{x-1}(x)\big|}{\big|C^{x-1}(x)\setminus\operatorname{im}\psi_{x-1}\big|}$.
By the diet condition, we have $\big|C^{x-1}(x)\setminus\operatorname{im}\psi_j\big| = (1\pm\beta_j)p^d(n-j)$. Since $j < t \le (1-\delta)n$, since $x \le j+\varepsilon n$, since $p \ge \gamma$, and by choice of $\varepsilon$, we have
(5.20) $\big|C^{x-1}(x)\setminus\operatorname{im}\psi_{x-1}\big| = (1\pm2\beta_j)p^d(n-j)$,
thus providing a bound on the denominator in (5.19). (Note that this bound on the denominator does not depend on the choice of $x \in X_{j,d}$.) Now $x$ is embedded uniformly at random into $C^{x-1}(x)\setminus\operatorname{im}\psi_{x-1}$, so it remains to determine the sum
(5.21) $\sum_{x\in X_{j,d}}\big|T\cap C^{x-1}(x)\big|$
of the numerators in (5.19); here we will use that $j \le x < j+\varepsilon n$, that $T \subseteq V(H)\setminus\operatorname{im}\psi_j$, and that $|X_{j,d}| \le \varepsilon n$.
Consider a vertex $v \in T$. If $v \notin \operatorname{im}\psi_{j+\varepsilon n-1}$, then the $(\varepsilon, 20D\beta_j, j)$-cover condition tells us that $v$ contributes $(1\pm20D\beta_j)p^d|X_{j,d}| \pm \varepsilon^2 n$ to the summation (5.21). Since $T \subseteq V(H)\setminus\operatorname{im}\psi_j$, there are at most $\varepsilon n$ vertices $v \in T$ such that $v \in \operatorname{im}\psi_{j+\varepsilon n-1}$, and these contribute between $0$ and $\varepsilon n$ to the summation; in particular each of them too contributes $(1\pm20D\beta_j)p^d|X_{j,d}| \pm \varepsilon^2 n \pm \varepsilon n$. Putting this together, we obtain the estimate
(5.22) $\sum_{x\in X_{j,d}}\big|T\cap C^{x-1}(x)\big| = (1\pm20D\beta_j)p^d|X_{j,d}|\,|T| \pm \varepsilon^2 n|T| \pm \varepsilon^2 n^2$.
We can thus apply Lemma 4, setting $\mathcal{E}$ to be the event that the $(\varepsilon, 20D\beta_j, j)$-cover condition holds for $\psi_j$. The random variables whose sum we are estimating are the Bernoulli random variables indicating whether each $x \in X_{j,d}$ is embedded to $T$, so the sum of squares of their ranges is at most $\varepsilon n$. Combining (5.20) and (5.22), the expected number of vertices of $X_{j,d}$ embedded to $T$ is $(1\pm30D\beta_j)\tfrac{|X_{j,d}|\,|T|}{n-j}$, where we use $n-j \ge \delta n$ and $p \ge \gamma$. The probability that the $(\varepsilon, 20D\beta_j, j)$-cover condition holds for $\psi_j$ and the outcome differs from this by more than $\varepsilon^2 n$ is at most $2\exp(-2\varepsilon^3 n) \le n^{-2D-20}$, so taking the union bound over the $D+1$ choices of $d$ and summing, we conclude that with probability at most $n^{-2D-19}$ the $(\varepsilon, 20D\beta_j, j)$-cover condition holds for $\psi_j$ and the number of vertices $x$ with $j \le x < j+\varepsilon n$ embedded to $T$ is not $(1\pm40D\beta_j)\tfrac{|T|\varepsilon n}{n-j}$, where the final form uses our lower bound on $|T|$ and the choice of $\varepsilon$. This is what we wanted to show.
Proof of Claim 26.2. Given a set $S \subseteq V(H)$ with $|S| \le 2D+3$ and a subset $R \subseteq S$, for each integer $0 \le k < \ell$, we set $T_k = N_H(R)\cap N_{H^*}(S\setminus R)\setminus\operatorname{im}\psi_{k\varepsilon n}$. Observe that as $(H, H^*)$ is $(\eta, 2D+3)$-coquasirandom, the set $T_0 = N_H(R)\cap N_{H^*}(S\setminus R)$ has about the size one would expect in a random graph, and in particular is large. For each $0 \le k < \ell$, suppose that $|T_k|$ is large enough to apply Claim 26.1; this holds at $k=0$, and is maintained at each step since $200D\beta_n\delta^{-1} \le 400CD\alpha\delta^{-1} < 1/100$ by choice of $\alpha$. We can thus apply Claim 26.1 with $T = T_k$ and obtain that with probability at least $1-n^{-2D-19}$ either a failure of the diet or cover condition is witnessed before time $k\varepsilon n$, or we have
\[ |T_{k+1}| = \Big(1-(1\pm40D\beta_{k\varepsilon n})\tfrac{\varepsilon}{1-k\varepsilon}\Big)|T_k|. \]
Observe that then $|T_{k+1}|$ is again large enough, providing the assumption for the use of Claim 26.1 in step $k+1$.
Repeating this process for each $0 \le k \le \ell-1$, we get that with probability at least $1-\varepsilon^{-1}n^{-2D-19}$ either a failure of the diet or cover condition is witnessed before time $\ell\varepsilon n$, or (5.17) holds. Taking a union bound over the at most $(2D+3)n^{2D+3}$ choices of $S$ and the at most $2^{2D+3}$ choices of $R \subseteq S$, we see that with probability at least $1-n^{-10}$ either a failure of the diet or cover condition is witnessed before time $t$, or (5.17) holds for all $|S| \le 2D+3$ and $R \subseteq S$.
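The final union bound is easily checked (our own arithmetic; recall that $D$ and $\varepsilon$ are constants while $n$ is large):
\[ (2D+3)\,n^{2D+3}\cdot 2^{2D+3}\cdot\varepsilon^{-1}n^{-2D-19} \;=\; (2D+3)\,2^{2D+3}\,\varepsilon^{-1}\,n^{-16} \;\le\; n^{-10}. \]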

Maintaining quasirandomness
In this section we provide the proofs of Lemma 16 and Lemma 17.
6.1. Initial coquasirandomness. We begin with the easy proof of Lemma 16, which states that splitting the edges of a quasirandom graph randomly gives a coquasirandom pair with high probability.
Proof of Lemma 16. Using (2.1) we see that the densities $p_0$ and $p^*_0$ of $H_0$ and $H^*_0$ satisfy
(6.1) $p_0 = \big(1\pm\tfrac{\alpha_0}{1000D}\big)(p-\gamma)$ and $p^*_0 = \big(1\pm\tfrac{\alpha_0}{1000D}\big)\gamma$
with probability at least $1-n^{-10}$, giving the first part of Lemma 16. Now, let $R \subseteq S \subseteq V(\hat H)$ be two sets of size at most $2D+3$. By quasirandomness of $\hat H$ we have $|N_{\hat H}(S)| = (1\pm\xi)p^{|S|}n$. Observe that each vertex of $N_{\hat H}(S)$ appears in $N_{H^*_0}(R)\cap N_{H_0}(S\setminus R)$ with probability $\big(\tfrac{\gamma}{p}\big)^{|R|}\big(\tfrac{p-\gamma}{p}\big)^{|S\setminus R|}$. Observe also that for distinct vertices in $N_{\hat H}(S)$ the events whether these appear in $N_{H^*_0}(R)\cap N_{H_0}(S\setminus R)$ are independent. Using again (2.1), with probability at least $1-n^{-2D-10}$ we have that
(6.2) $\big|N_{H^*_0}(R)\cap N_{H_0}(S\setminus R)\big| = \big(1\pm\tfrac{\alpha_0}{100}\big)\big(\tfrac{\gamma}{p}\big)^{|R|}\big(\tfrac{p-\gamma}{p}\big)^{|S\setminus R|}\,\big|N_{\hat H}(S)\big|$.
Taking the union bound we conclude that (6.2) holds for all $S \subseteq V(\hat H)$ with $|S| \le 2D+3$ and $R \subseteq S$ with probability at least $1-n^{-6}$. Now, assume that (6.1) holds. Then the right-hand side of (6.2) can be rewritten in terms of the densities $p_0$ and $p^*_0$. We conclude that $(H^*_0, H_0)$ is $\big(\tfrac{1}{10}\alpha_0, 2D+3\big)$-coquasirandom with probability at least $1-n^{-5}$.
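The random splitting analysed here is straightforward to realise. The following sketch is our own illustration (hypothetical names throughout); sending each edge to $H^*_0$ independently with probability $\gamma/p$ is the natural rule consistent with (6.1):

    import random

    def split_edges(edges, p, gamma, seed=None):
        """Randomly split the edge set of a host graph of density p into
        H0 (density about p - gamma) and H0_star (density about gamma).
        Each edge lands in H0_star independently with probability gamma/p."""
        rng = random.Random(seed)
        H0, H0_star = [], []
        for e in edges:
            if rng.random() < gamma / p:
                H0_star.append(e)
            else:
                H0.append(e)
        return H0, H0_star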
6.2. Maintaining coquasirandomness. In this subsection we prove Lemma 17. We need to show that, provided coquasirandomness is maintained up to stage $s-1$ and RandomEmbedding does not fail, it is likely that coquasirandomness holds after stage $s$, when $G_s$ is embedded into $H_{s-1}$ and we obtain $H_s$. Let us briefly sketch the idea (for convenience focusing only on quasirandomness of $H_s$). We fix a set $R \subseteq V(\hat H)$ with $|R| \le 2D+3$, and consider the running of PackingProcess up to stage $s$. We want to show that it is very unlikely that $R$ witnesses the failure of $H_s$ to be quasirandom, since then the union bound over choices of $R$ tells us that it is likely that $H_s$ is quasirandom. In other words, we want to know that $N_{H_s}(R)$ is very likely close to the expected size. We write $Y_i := \big|N_{H_{i-1}}(R)\setminus N_{H_i}(R)\big|$ for the change at step $i$, and apply Lemma 5 to show that the sum $Y_1+\dots+Y_s$ is very likely to be close to its expectation. So proving Lemma 17 boils down to estimating accurately $\mathbb{E}(Y_i\mid\mathscr{H}_{i-1})$ and finding a reasonable upper bound for $\mathbb{E}(Y_i^2\mid\mathscr{H}_{i-1})$. The latter turns out to be relatively straightforward and is done in Lemma 31. We now outline the route to the former estimation.
Observe that Y i is equal to the number of stars in H i−1 whose leaves are the vertices in R and at least one of whose edges is used in embedding G i to H i−1 . By linearity of expectation, E(Y i |H i−1 ) is equal to the sum, over stars in H i−1 whose leaves are R, of the probability that at least one edge in the star is used in embedding G i . We will see that this probability is about the same for any given star S, and the problem is to calculate it. To do this we need to consider the running of RandomEmbedding.
We begin in Lemma 27 by estimating the chance that a given vertex, or one of a given pair of vertices, is used in a short time interval in RandomEmbedding. From this we deduce in Lemma 28 the probability that a given vertex, or one of a pair, is used in any given time interval. This helps us to establish, in Lemma 29, that any given edge of H i−1 is about equally likely to be used in the embedding of G i . Finally, in Lemma 30 we show that the chance of two or more edges in S being used in the embedding of G i is tiny, from which it follows that the chance of one or more is about |R| times the probability of any given edge being used.
All of these estimations depend upon H i−1 being sufficiently quasirandom, and the errors depend upon the quasirandomness α i−1 . Because the errors add up over time, it is important that the α s increase quite fast with s. Here it is very important that the dependence of the error term in Lemma 24 is linear in the input α and not much worse: otherwise it would not be possible to choose any sequence α s such that the error remains bounded by α s at each stage s.
As the main work is to estimate the probability that, for a given H s−1 and G s , and R and v, RandomEmbedding uses an edge of the star with centre v and leaves R when embedding G s into H s−1 , for most of this section we will consider fixed graphs G and H. We now embark upon this probability estimation.
First, for given u, v ∈ V (H), we estimate the probability that RandomEmbedding embeds a vertex to {u, v} in the short interval of time [t, t + εn), conditioning on not having done so before time t, and the probability that RandomEmbedding embeds a vertex to v in the interval of time [t, t + εn), conditioning on not having done so before time t. In both cases, we need to assume that the history H t−1 of embedding up to time t − 1 is typical (in a sense which we now make precise).
Lemma 27. Given $D \in \mathbb{N}$ and $\gamma > 0$, let $\delta, \alpha_0, \alpha_{2n}, C, \varepsilon$ be as in Setting 15. The following holds for any $\alpha_0 \le \alpha \le \alpha_{2n}$ and all sufficiently large $n$. Suppose that $G$ is a graph on $[n]$ such that $\deg^-(x) \le D$ for each $x \in V(G)$, and $H$ is an $(\alpha, 2D+3)$-quasirandom graph with $n$ vertices and $p\binom{n}{2}$ edges, with $p \ge \gamma$. Suppose that $u$ and $v$ are two distinct vertices of $H$. When RandomEmbedding is run to embed $G[[n-\delta n]]$ into $H$, for any $1 \le t \le n+1-(\delta+\varepsilon)n$ we have the following two statements.
(a) Suppose the history $\mathscr{H}_{t-1}$ up to and including embedding $t-1$ is such that $v \notin \operatorname{im}\psi_{t-1}$, the $(C\alpha, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{t-1})$, and the probability, conditioned on $\mathscr{H}_{t-1}$, that the $(\varepsilon, C\alpha, t)$-cover condition fails is at most $n^{-3}$. Then we have $\mathbb{P}_{G\hookrightarrow H}\big(v \in \operatorname{im}\psi_{t+\varepsilon n-1}\mid\mathscr{H}_{t-1}\big) = (1\pm10C\alpha)\tfrac{\varepsilon n}{n-t}$.
(b) Suppose the history $\mathscr{H}_{t-1}$ up to and including embedding $t-1$ is such that $u, v \notin \operatorname{im}\psi_{t-1}$, the $(C\alpha, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{t-1})$, and the probability, conditioned on $\mathscr{H}_{t-1}$, that the $(\varepsilon, C\alpha, t)$-cover condition fails is at most $n^{-3}$. Then we have $\mathbb{P}_{G\hookrightarrow H}\big(\{u,v\}\cap\operatorname{im}\psi_{t+\varepsilon n-1}\neq\emptyset\mid\mathscr{H}_{t-1}\big) = (1\pm10C\alpha)\tfrac{2\varepsilon n}{n-t}$.
Before proving Lemma 27, we first sketch its proof. For Lemma 27(a), the idea is that either the cover condition fails, or $v$ is in the candidate sets of roughly $p^d|X_{t,d}|$ vertices $x$ of $X_{t,d}$ (for each $d$). Because the diet condition holds at time $t-1$, each of these vertices $x$ is embedded uniformly at random to a set of roughly $p^d(n-t)$ vertices. One would like to say that it follows that the probability that $x$ is embedded to $v$ is thus about $1/(p^d(n-t))$ and the desired result follows by summing these probabilities. Unfortunately this is not true: the probability that $x$ is embedded to $v$ also depends on the probability that no previous vertex was embedded to $v$. In order to get around this, we define the following ModifiedRandomEmbedding, which generates a sequence of embeddings with a distribution identical to RandomEmbedding, but which in addition generates a sequence of reported vertices. The modification we make is simple: at each time $1 \le t' \le n-\delta n$, ModifiedRandomEmbedding chooses a uniform random vertex $w$ of $C^{t'-1}_{G\hookrightarrow H}(t')\setminus(\operatorname{im}\psi_{t'-1}\setminus\{v\})$, and reports this vertex. If the reported vertex $w$ is not in $\operatorname{im}\psi_{t'-1}$, we set $\psi_{t'} = \psi_{t'-1}\cup\{t'\hookrightarrow w\}$, as in RandomEmbedding. If the reported vertex is in $\operatorname{im}\psi_{t'-1}$ (which happens only if $w = v$), we choose $w'$ uniformly at random in $C^{t'-1}_{G\hookrightarrow H}(t')\setminus\operatorname{im}\psi_{t'-1}$, and set $\psi_{t'} = \psi_{t'-1}\cup\{t'\hookrightarrow w'\}$. We will see that it is easy to calculate the expected number of times $v$ is reported, and also easy to show that the contribution due to $v$ being reported multiple times is tiny. The point is that the probability of RandomEmbedding using $v$ is the same as the probability that ModifiedRandomEmbedding reports $v$ at least once, which we can thus calculate.
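The reporting mechanism may be easier to see in code form. Below is a minimal sketch (our own rendering, not the paper's; `candidate_set` is a hypothetical helper standing for $C^{t-1}_{G\hookrightarrow H}(t)$, assumed to return a nonempty list). Note that the embedding produced has the same distribution as RandomEmbedding, while $v$ may be reported more than once:

    import random

    def modified_random_embedding(candidate_set, v, n_embed, rng=random):
        """candidate_set(t, psi): candidate host vertices for guest vertex t,
        given the partial embedding psi. Returns (psi, reports)."""
        psi, image, reports = {}, set(), []
        for t in range(1, n_embed + 1):
            C = candidate_set(t, psi)
            # report a uniform vertex of C with only v allowed to be used already
            pool = [w for w in C if w not in image or w == v]
            w = rng.choice(pool)
            reports.append(w)
            if w not in image:
                psi[t] = w  # embed t to w, exactly as RandomEmbedding would
            else:
                # only possible when w == v is already used: re-draw uniformly
                free = [w2 for w2 in C if w2 not in image]
                psi[t] = rng.choice(free)
            image.add(psi[t])
        return psi, reports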
Lemma 27(b) is established similarly, using a slightly different version of ModifiedRandomEmbedding.
Proof of Lemma 27(a). Instead of RandomEmbedding, we consider ModifiedRandomEmbedding as defined above, which creates the same embedding distribution. For each i, let r(i) be the vertex reported by ModifiedRandomEmbedding at time i. We shall use the following two auxiliary claims.
Define $E$ as the random variable counting the times when $v$ is reported by ModifiedRandomEmbedding in the interval $t \le x < t+\varepsilon n$. The probability that RandomEmbedding uses $v$ in the interval $t \le x < t+\varepsilon n$, conditioned on $\mathscr{H}_{t-1}$, is equal to the probability that ModifiedRandomEmbedding reports $v$ at least once in that interval, which probability is by definition at most $\mathbb{E}(E\mid\mathscr{H}_{t-1})$ and at least $\mathbb{E}(E\mid\mathscr{H}_{t-1}) - \sum_{k\ge2}\mathbb{P}(E\ge k\mid\mathscr{H}_{t-1})$. Our first claim estimates $\mathbb{E}(E\mid\mathscr{H}_{t-1})$.
Claim 27.1. We have that $\mathbb{E}(E\mid\mathscr{H}_{t-1}) = (1\pm4C\alpha)\tfrac{\varepsilon n}{n-t} \pm 8(D+1)\varepsilon^2\gamma^{-2D}\delta^{-2}$.
Our second claim is that the sum in the expression above is small.
Claim 27.2. We have that $\sum_{k\ge2}\mathbb{P}(E\ge k\mid\mathscr{H}_{t-1}) \le 8\varepsilon^2\gamma^{-2D}\delta^{-2}$.
Proof of Claim 27.1. Note that since the $(C\alpha, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{t-1})$, we have
(6.4) $\big|C^{x-1}(x)\setminus\operatorname{im}\psi_{x-1}\big| \ge \tfrac12\gamma^D\delta n$ for each $t \le x < t+\varepsilon n$.
Before trying to obtain an accurate estimate for $\mathbb{E}(E\mid\mathscr{H}_{t-1})$, we give an easy (and not very sharp) upper bound on $\mathbb{P}_{G\hookrightarrow H}(v\in\operatorname{im}\psi_{t+\varepsilon n-1}\mid\mathscr{H}_{t-1})$. When we embed any one vertex $x$ with $t \le x \le t+\varepsilon n-1$, we have a probability of at most $\big|C^{x-1}(x)\setminus\operatorname{im}\psi_{x-1}\big|^{-1}$ of embedding $x$ to $v$ (in fact, the probability is equal either to this figure or to zero). Using the lower bound (6.4) and summing over the $\varepsilon n$ choices of $x$, we see
(6.5) $\mathbb{P}_{G\hookrightarrow H}\big(v\in\operatorname{im}\psi_{t+\varepsilon n-1}\mid\mathscr{H}_{t-1}\big) \le 2\varepsilon\gamma^{-D}\delta^{-1}$.
We now try to estimate the desired expectation. By linearity of expectation, we have
(6.6) $\mathbb{E}(E\mid\mathscr{H}_{t-1}) = \sum_{x=t}^{t+\varepsilon n-1}\mathbb{E}\Big(\big|C^{x-1}(x)\setminus(\operatorname{im}\psi_{x-1}\setminus\{v\})\big|^{-1}\cdot\mathbb{1}_{\{v\in C^{x-1}(x)\}}\,\Big|\,\mathscr{H}_{t-1}\Big)$.
Using the diet condition, we can replace each denominator here by $(1\pm2C\alpha)p^{\deg^-(x)}(n-t)$ at the cost of a small error. Splitting this sum up according to $|N^-(x)|$, and again using linearity of expectation, we are left with estimating, for each $0 \le d \le D$, the expected number of $x \in X_{t,d}$ with $v \in C^{x-1}(x)$; by the assumption that the cover condition fails with conditional probability at most $n^{-3}$, this number is $(1\pm C\alpha)p^d|X_{t,d}| \pm \varepsilon^2 n$ outside an event of negligible probability, and summing over $d$ gives the claim.
Proof of Claim 27.2. Since the $(C\alpha, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{t-1})$, since $p \ge \gamma$, and since $n-t \ge \delta n$, for each $x \in [t, t+\varepsilon n)$, when we embed $x$ we report a uniform random vertex from a set of size at least $\tfrac12\gamma^D\delta n$. The probability of reporting $v$ when we embed $x$ is thus at most $2\gamma^{-D}\delta^{-1}n^{-1}$, conditioned on $\mathscr{H}_{t-1}$ and any embedding of the vertices $[t, x)$. Since the conditional probabilities multiply, the probability that at each of a given $k$-set of vertices in $[t, t+\varepsilon n)$ we report $v$ is at most $2^k\gamma^{-kD}\delta^{-k}n^{-k}$. Taking the union bound over choices of $k$-sets, we have
\[ \sum_{k\ge2}\mathbb{P}(E\ge k\mid\mathscr{H}_{t-1}) \le \sum_{k\ge2}\binom{\varepsilon n}{k}2^k\gamma^{-kD}\delta^{-k}n^{-k} \le \sum_{k\ge2}\big(2\varepsilon\gamma^{-D}\delta^{-1}\big)^k \le 8\varepsilon^2\gamma^{-2D}\delta^{-2}, \]
where we use the bound $\binom{\varepsilon n}{k} \le (\varepsilon n)^k$ and sum the resulting geometric series.
The proof of Lemma 27(b) is similar, and we only focus on the differences.

Proof of Lemma 27(b).
We define MoreModifiedRandomEmbedding, this time reporting a uniform random vertex of $C^{t-1}_{G\hookrightarrow H}(t)\setminus(\operatorname{im}\psi_{t-1}\setminus\{u,v\})$ at each time step $t$, and either embedding $t$ to it (if it is not in $\operatorname{im}\psi_{t-1}$) or otherwise picking as before a uniform random vertex of $C^{t-1}_{G\hookrightarrow H}(t)\setminus\operatorname{im}\psi_{t-1}$ to embed $t$ to. As before, the embedding distribution generated by this procedure is the same as for RandomEmbedding. We let $E'$ be the number of times $u$ or $v$ is reported in the interval $t \le x < t+\varepsilon n$. Again, the probability that RandomEmbedding uses either $u$ or $v$ is equal to the probability that MoreModifiedRandomEmbedding reports $u$ or $v$ at least once, which by definition is at most $\mathbb{E}(E'\mid\mathscr{H}_{t-1})$ and at least $\mathbb{E}(E'\mid\mathscr{H}_{t-1}) - \sum_{k\ge2}\mathbb{P}(E'\ge k\mid\mathscr{H}_{t-1})$. By linearity of expectation, $\mathbb{E}(E'\mid\mathscr{H}_{t-1})$ is equal to the expected number of times $u$ is reported plus the expected number of times $v$ is reported. We now argue that each of these latter quantities is $(1\pm4C\alpha)\tfrac{\varepsilon n}{n-t} \pm 8(D+1)\varepsilon^2\gamma^{-2D}\delta^{-2}$. This follows from the calculations in Claim 27.1, with a small change which we now describe. Note that Claim 27.1 deals with ModifiedRandomEmbedding, where reported vertices are taken from $C^{x-1}(x)\setminus(\operatorname{im}\psi_{x-1}\setminus\{v\})$ and not from $C^{x-1}(x)\setminus(\operatorname{im}\psi_{x-1}\setminus\{u,v\})$. This is corrected if we rewrite (6.6) with the denominator $\big|C^{x-1}(x)\setminus(\operatorname{im}\psi_{x-1}\setminus\{u,v\})\big|$; then the rest of the calculations in Claim 27.1 applies (see Footnote 4). We thus have $\mathbb{E}(E'\mid\mathscr{H}_{t-1}) = (1\pm4C\alpha)\tfrac{2\varepsilon n}{n-t} \pm 16(D+1)\varepsilon^2\gamma^{-2D}\delta^{-2}$. Again, it remains to show that the effect of reporting $u$ or $v$ multiple times is small. This time the probability at any step $x$ that one of $u$ and $v$ is reported, conditioned on the history up to time $x-1$, is at most $4\gamma^{-D}\delta^{-1}n^{-1}$, and by the same calculation as above we conclude that the summation is bounded above by $32\varepsilon^2\gamma^{-2D}\delta^{-2}$, which as before gives Lemma 27(b).
We now use Lemma 27 to estimate the probability of embedding a vertex to v, or to {u, v}, in the interval (t 0 , t 1 ] (which may be of any length). This time, we do not condition on one typical embedding history up to time t 0 , but rather on a history ensemble up to time t 0 which is not very unlikely. This allows us to drop the typicality restriction, simply because only very few histories can be atypical.
Lemma 28. Given $D \in \mathbb{N}$ and $\gamma > 0$, let $\delta, \alpha_0, \alpha_{2n}, C, \varepsilon$ be as in Setting 15. Then the following holds for any $\alpha_0 \le \alpha \le \alpha_{2n}$ and all sufficiently large $n$. Suppose that $G$ is a graph on $[n]$ such that $\deg^-(x) \le D$ for each $x \in V(G)$, and $H$ is an $(\alpha, 2D+3)$-quasirandom graph with $n$ vertices and $p\binom{n}{2}$ edges, with $p \ge \gamma$. Let $0 \le t_0 < t_1 \le n-\delta n$. Let $\mathscr{L}$ be a history ensemble of RandomEmbedding up to time $t_0$, and suppose that $\mathbb{P}(\mathscr{L}) \ge n^{-4}$. Then the following hold for any distinct vertices $u, v \in V(H)$.
(a) If $v \notin \operatorname{im}\psi_{t_0}$ then, up to a multiplicative error close to $1$, we have $\mathbb{P}_{G\hookrightarrow H}\big(v\notin\operatorname{im}\psi_{t_1}\mid\mathscr{L}\big) = \tfrac{n-t_1}{n-t_0}$. (b) If $u, v \notin \operatorname{im}\psi_{t_0}$ then, up to a multiplicative error close to $1$, we have $\mathbb{P}_{G\hookrightarrow H}\big(u, v\notin\operatorname{im}\psi_{t_1}\mid\mathscr{L}\big) = \big(\tfrac{n-t_1}{n-t_0}\big)^2$.
Proof. We write $\mathbb{P}$ for $\mathbb{P}_{G\hookrightarrow H}$. We shall first address part (a). We divide the interval $(t_0, t_1]$ into $k := \lceil(t_1-t_0)/\varepsilon n\rceil$ intervals, all but the last of length $\varepsilon n$. Let $\mathscr{L}_0 := \mathscr{L}$. Let, for each $1 \le i < k$, the set $\mathscr{L}_i$ be the embedding histories up to time $t_0+i\varepsilon n$ of RandomEmbedding which extend histories in $\mathscr{L}_{i-1}$ and are such that $v \notin \operatorname{im}\psi_{t_0+i\varepsilon n}$. Let $\mathscr{L}_k$ be the embedding histories up to time $t_1$ extending those in $\mathscr{L}_{k-1}$ such that $v \notin \operatorname{im}\psi_{t_1}$. Thus we have $\mathbb{P}(\mathscr{L}_k) = \mathbb{P}\big(\mathscr{L} \text{ and } v\notin\operatorname{im}\psi_{t_1}\big)$. Finally, for each $1 \le i \le k$, let the set $\mathscr{L}'_{i-1}$ consist of all histories in $\mathscr{L}_{i-1}$ such that the $(C\alpha, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{t_0+(i-1)\varepsilon n})$ and the probability that the $(\varepsilon, C\alpha, t_0+1+(i-1)\varepsilon n)$-cover condition fails, conditioned on $\psi_{t_0+(i-1)\varepsilon n}$, is at most $n^{-3}$. In other words, $\mathscr{L}'_{i-1}$ is the subset of $\mathscr{L}_{i-1}$ consisting of typical histories, satisfying the conditions of Lemma 27.
We now determine $\mathbb{P}(\mathscr{L}_k)$ in terms of $\mathbb{P}(\mathscr{L}_0)$, and in particular we show inductively that $\mathbb{P}(\mathscr{L}_i) > n^{-5}$ for each $i$. Observe that for any time $t$, the probability (not conditioned on any embedding) that either the $(C\alpha, 2D+3)$-diet condition fails for $(H, \operatorname{im}\psi_i)$ for some $i \le t$ or that the $(\varepsilon, C\alpha, t+1)$-cover condition has probability greater than $n^{-3}$ of failing, is at most $2n^{-6}$ by Lemma 24. In other words, for each $i$ we have $\mathbb{P}(\mathscr{L}_i\setminus\mathscr{L}'_i) \le 2n^{-6}$. Thus by Lemma 27(a) we have
\[ \mathbb{P}(\mathscr{L}_i) = \Big(1-(1\pm10C\alpha)\tfrac{\varepsilon n}{n-t_0-(i-1)\varepsilon n}\Big)\mathbb{P}(\mathscr{L}_{i-1}) \pm 2n^{-6} = \Big(1-(1\pm20C\alpha)\tfrac{\varepsilon n}{n-t_0-(i-1)\varepsilon n}\Big)\mathbb{P}(\mathscr{L}_{i-1}), \]
where the final equality uses the lower bound $\mathbb{P}(\mathscr{L}_{i-1}) \ge n^{-5}$. Similarly, we have $\mathbb{P}(\mathscr{L}_k) = \big(1\pm(1+20C\alpha)\tfrac{\varepsilon n}{n-t_1}\big)\mathbb{P}(\mathscr{L}_{k-1})$. Putting these observations together, we can compute $\mathbb{P}(\mathscr{L}_k)$. Observe that the approximation $\log(1+x) = x\pm x^2$ is valid for all sufficiently small $x$. In particular, since $n-t_0-(i-1)\varepsilon n \ge n-t_1 \ge \delta n$ and by choice of $\varepsilon$, for each $i$ we have
\[ \log\Big(1-(1\pm20C\alpha)\tfrac{\varepsilon n}{n-t_0-(i-1)\varepsilon n}\Big) = -(1\pm30C\alpha)\tfrac{\varepsilon n}{n-t_0-(i-1)\varepsilon n}. \]
Thus we obtain
(6.8) $\log\mathbb{P}(\mathscr{L}_k) = \log\mathbb{P}(\mathscr{L}_0) - (1\pm30C\alpha)\int_0^{t_1-t_0}\tfrac{\mathrm{d}x}{n-t_0-x} = \log\mathbb{P}(\mathscr{L}_0) + (1\pm30C\alpha)\log\tfrac{n-t_1}{n-t_0}$,
where we use $t_1 \le n-\delta n$, and we justify that the integral and sum are close by observing that for each $i$ in the summation, if $(i-1)\varepsilon n \le x \le i\varepsilon n$ then we have $\tfrac{1}{n-t_0-(i-1)\varepsilon n} = (1\pm2\varepsilon\delta^{-1})\tfrac{1}{n-t_0-x}$, where the final estimate uses $n-t_0-i\varepsilon n \ge n-t_1 \ge \delta n$ and the choice of $\varepsilon$. By choice of $\varepsilon$, this gives part (a). Furthermore, (6.8), and the fact $t_1 \le n-\delta n$, imply that $\mathbb{P}(\mathscr{L}_k) \ge n^{-5}$. Since the $\mathscr{L}_i$ form a decreasing sequence of events, the same bound holds for each $\mathscr{L}_i$.
For part (b), we use the identical approach, replacing Lemma 27(a) with Lemma 27(b). Since the difference between these estimates is a factor of $2$, we obtain twice all the terms other than the term $\log\mathbb{P}(\mathscr{L}_0)$ in the above equation, and hence the second statement of the lemma.
Next, we estimate the probability that the edge $uv \in E(H)$ is used by RandomEmbedding when embedding $G$ to $H$. The idea is the following. In order for $uv$ to be used, there must be some $xy \in E(G)$ such that $x$ is embedded to $u$ and $y$ to $v$, or vice versa. These events are disjoint, and so it suffices to estimate the probability of each separately and sum them. Without loss of generality, we can assume $x$ is embedded before $y$. We need to calculate the probability that $x$ is embedded to $u$ and $y$ to $v$. In other words, we need that all left-neighbours of $x$ are embedded to neighbours of $u$, all left-neighbours of $y$ are embedded to neighbours of $v$, other vertices are not embedded to $\{u, v\}$, and when we come to embed $x$ and $y$ we actually do embed them to $u$ and $v$. The point of phrasing it like this is that, provided the diet condition holds, we can estimate accurately all the (conditional) probabilities of embedding individual vertices in $N(x)\cup N(y)\cup\{x,y\}$ to the relevant neighbourhoods or to $u$ or $v$, while Lemma 28 gives accurate estimates for the probability of any other vertex being embedded to $u$ or $v$. Putting this together yields the desired accurate estimate for the probability that we have $x \hookrightarrow u$ and $y \hookrightarrow v$.
Lemma 29. Given $D \in \mathbb{N}$ and $\gamma > 0$, let constants $\delta, \varepsilon, C, \alpha_0, \alpha_{2n}$ be as in Setting 15. Then the following holds for any $\alpha_0 \le \alpha \le \alpha_{2n}$ and all sufficiently large $n$. Suppose that $G$ is a graph on $[n]$ such that $\deg^-(x) \le D$ for each $x \in V(G)$, and $H$ is an $(\alpha, 2D+3)$-quasirandom graph with $n$ vertices and $p\binom{n}{2}$ edges, with $p \ge \gamma$. Let $uv$ be an edge of $H$. When RandomEmbedding is run to embed $G[[n-\delta n]]$ into $H$, the probability that an edge of $G$ is embedded to $uv$ is, up to a multiplicative error close to $1$, equal to $\tfrac{2e(G)}{pn^2}$.
Proof. We first calculate the probability that a given pair $(x, y)$, such that $xy$ is an edge of $G$, is embedded to $(u, v)$, in that order. Without loss of generality, suppose that $x < y$. Let $z_1, \dots, z_k$ be the vertices of $N^-(x)\cup N^-(y)\setminus\{x, y\}$ in increasing order. Let $j \in \{0, \dots, k\}$ be such that $z_j < x < z_{j+1}$ (where the cases $j = 0$ and $j = k$ correspond to the situations when all the $z_i$ are to the right, respectively to the left, of $x$; in these cases some notation below has to be modified in a straightforward way). Define time intervals $I_0, \dots, I_{k+1}$ using $z_1, \dots, z_j, x, z_{j+1}, \dots, z_k, y$ as separators; that is, $I_0, \dots, I_{k+1}$ are the (possibly empty) sets of times lying strictly between consecutive separators, with $I_0$ the times before $z_1$ and $I_{k+1}$ the times between $z_k$ and $y$. We now define a nested collection of events, the first being the trivial (always satisfied) event and the last being the event $\{x \hookrightarrow u, y \hookrightarrow v\}$, whose probability we wish to estimate. These events are simply that we have not yet (by given increasing times in RandomEmbedding) made it impossible to have $\{x \hookrightarrow u, y \hookrightarrow v\}$. We will see that we can estimate accurately the probability of each successive event, conditioned on its predecessor.
Let $\mathscr{L}'_{-1}$ be the trivial (always satisfied) event. If $\mathscr{L}'_{i-1}$ is defined, we let $\mathscr{L}_i$ be the event that $\mathscr{L}'_{i-1}$ holds intersected with the event that
(A1) (if $i \le j$:) no vertex of $G$ in the interval $I_i$ is mapped to $u$ or $v$, or
(A2) (if $i > j$:) no vertex of $G$ in the interval $I_i$ is mapped to $v$.
In other words, L i is the event that we have not covered u or v in the interval I i . It turns out that we do not need to know anything else about the embeddings in the interval I i .
If $\mathscr{L}_i$ is defined, we let $\mathscr{L}'_i$ be the event that $\mathscr{L}_i$ holds and that
(B1) (if $i < j$:) the separator $z_{i+1}$ is embedded to $N_H(u)$ if $z_{i+1} \in N^-(x)$, to $N_H(v)$ if $z_{i+1} \in N^-(y)$, and to $N_H(u)\cap N_H(v)$ if it lies in both;
(B2) (if $i = j$:) $x$ is embedded to $u$;
(B3) (if $j < i \le k$:) the separator $z_i$ is embedded to $N_H(u)$, to $N_H(v)$, or to $N_H(u)\cap N_H(v)$, according to whether it is a left-neighbour of $x$, of $y$, or of both;
(B4) (if $i = k+1$:) $y$ is embedded to $v$.
Again, in order for $\{x \hookrightarrow u, y \hookrightarrow v\}$ to occur we obviously need that a left-neighbour of $x$ is embedded to a neighbour of $u$ and so on, hence the above conditions. By definition, we have $\mathscr{L}'_{k+1} = \{x \hookrightarrow u, y \hookrightarrow v\}$. Since we have $\mathscr{L}'_i \subseteq \mathscr{L}_i \subseteq \mathscr{L}'_{i-1}$ for each $i$ and $\mathscr{L}'_{-1}$ is the sure event, we see
(6.9) $\mathbb{P}\big(x \hookrightarrow u, y \hookrightarrow v\big) = \prod_{i=0}^{k+1}\mathbb{P}\big(\mathscr{L}_i\mid\mathscr{L}'_{i-1}\big)\cdot\mathbb{P}\big(\mathscr{L}'_i\mid\mathscr{L}_i\big)$.
Thus, we need to estimate the factors in (6.9). This is done in the two claims below. In each claim we assume $\mathbb{P}(\mathscr{L}'_i), \mathbb{P}(\mathscr{L}_i) > n^{-4}$. This assumption is justified, using an implicit induction, since the smallest of all the events we consider is $\mathscr{L}'_{k+1}$, whose probability according to (6.13) below is bigger than $n^{-4}$.
Claim 29.1. Up to a multiplicative error close to $1$, we have $\prod_{i=0}^{k+1}\mathbb{P}(\mathscr{L}_i\mid\mathscr{L}'_{i-1}) = \tfrac{(n+1-x)(n+1-y)}{n^2}$.
Proof. By definition of (A1), for each $i = 0, \dots, j$, Lemma 28(b) applied with $\mathscr{L} = \mathscr{L}'_{i-1}$ gives, up to a multiplicative error close to $1$,
(6.10) $\mathbb{P}(\mathscr{L}_i\mid\mathscr{L}'_{i-1}) = \Big(\tfrac{n-1-\max(I_i)}{n+1-\min(I_i)}\Big)^2$.
Note that looking at two consecutive indices $i$ and $i+1$ in (6.10) we have cancellation of the former numerator and the latter denominator, since $n-1-\max(I_i) = n+1-\min(I_{i+1})$. Thus, up to a multiplicative error close to $1$,
(6.11) $\prod_{i=0}^{j}\mathbb{P}(\mathscr{L}_i\mid\mathscr{L}'_{i-1}) = \Big(\tfrac{n+1-x}{n}\Big)^2$.
To express $\prod_{i=j+1}^{k+1}\mathbb{P}(\mathscr{L}_i\mid\mathscr{L}'_{i-1})$, by definition of (A2) we have to repeat the above replacing Lemma 28(b) by Lemma 28(a). We get that, up to a multiplicative error close to $1$,
(6.12) $\prod_{i=j+1}^{k+1}\mathbb{P}(\mathscr{L}_i\mid\mathscr{L}'_{i-1}) = \tfrac{n+1-y}{n+1-x}$.
Putting (6.11) and (6.12) together, we get the statement of the claim.
Claim 29.2. Up to a multiplicative error close to $1$, we have $\prod_{i=0}^{k+1}\mathbb{P}(\mathscr{L}'_i\mid\mathscr{L}_i) = \tfrac{1}{p\,(n+1-x)(n+1-y)}$.
Proof. Suppose that we have embedded up to vertex $\max(I_i)$, and that $\mathscr{L}_i$ holds. The probability of the event $\mathscr{L}'_i$ depends on which of the cases in (B1)-(B4) applies. When $\mathscr{L}'_i$ is defined using the first case of (B1), let $C^{z_{i+1}-1}(z_{i+1})$ be the set of vertices in $H$ to which we could embed $z_{i+1}$, given the embedding of all vertices before $z_{i+1}$. Suppose that the $(C\alpha, 2D+3)$-diet condition holds for $(H, \operatorname{im}\psi_{z_{i+1}-1})$. Then we have
\[ \mathbb{P}\big(z_{i+1}\hookrightarrow N_H(u)\big) = \frac{\big|N_H\big(\{u\}\cup\psi(N^-(z_{i+1}))\big)\setminus\operatorname{im}\psi_{z_{i+1}-1}\big|}{\big|C^{z_{i+1}-1}(z_{i+1})\setminus\operatorname{im}\psi_{z_{i+1}-1}\big|} = (1\pm2C\alpha)\,p, \]
where the last line uses the $(C\alpha, 2D+3)$-diet condition for $(H, \operatorname{im}\psi_{z_{i+1}-1})$ twice, in the denominator with the set $\psi(N^-(z_{i+1}))$ and in the numerator with the set $\{u\}\cup\psi(N^-(z_{i+1}))$. The remaining cases of (B1) and the case (B3) are handled in the same way, each left-neighbour constraint contributing a factor $(1\pm2C\alpha)p$.
Let us now deal with the terms $\mathbb{P}(\mathscr{L}'_j\mid\mathscr{L}_j)$ and $\mathbb{P}(\mathscr{L}'_{k+1}\mid\mathscr{L}_{k+1})$, which correspond to (B2) and (B4), respectively. Suppose first that $\mathscr{L}_j$ holds. In particular, $N^-(x)$ is embedded to $N_H(u)$. Suppose further that the $(C\alpha, 2D+3)$-diet condition for $(H, \operatorname{im}\psi_{x-1})$ holds. With this, conditioning on the embedding up to time $x-1$, the probability of embedding $x$ to $u$ is $(1\pm2C\alpha)\,p^{-\deg^-(x)}\tfrac{1}{n+1-x}$. Similarly, if the $(C\alpha, 2D+3)$-diet condition for $(H, \operatorname{im}\psi_{y-1})$ holds, the probability of embedding $y$ to $v$, provided $N^-(y)$ is embedded to $N_H(v)$, and conditioning on the embedding up to time $y-1$, is $(1\pm2C\alpha)\,p^{-\deg^-(y)}\tfrac{1}{n+1-y}$.
Plugging Claims 29.1 and 29.2 into (6.9), we get that, up to a multiplicative error close to $1$,
(6.13) $\mathbb{P}\big(x \hookrightarrow u, y \hookrightarrow v\big) = \tfrac{1}{pn^2}$.
We now sum over the choices of $(x, y)$ such that $xy \in E(G)$. There are $2e(G)$ such choices, so we conclude that the probability that some edge of $G$ is embedded by RandomEmbedding to $uv$ is, up to a multiplicative error close to $1$, equal to $\tfrac{2e(G)}{pn^2}$, as desired.
We can now estimate the probability that, again for fixed G and H, at least one edge in a given star in H is used by RandomEmbedding.
Lemma 30. Given $D \in \mathbb{N}$ and $\gamma > 0$, let the constants $\delta, \varepsilon, \alpha_0, \alpha_{2n}, C$ be as in Setting 15. Then the following holds for any $\alpha_0 \le \alpha \le \alpha_{2n}$ and all sufficiently large $n$. Suppose that $G$ is a graph on $[n]$ such that $\deg^-(x) \le D$ for each $x \in V(G)$, with at least $n/4$ edges and maximum degree $\Delta(G) \le n/\log n$, and $H$ is an $(\alpha, 2D+3)$-quasirandom graph with $n$ vertices and $p\binom{n}{2}$ edges, where $p \ge \gamma$. Let $u_1, \dots, u_k, v$ be vertices of $H$ for some $k \le 2D+3$, and suppose $u_iv$ is an edge of $H$ for each $i$. When RandomEmbedding is run to embed $G[[n-\delta n]]$ into $H$, the probability that there is at least one $u_iv$ to which some edge of $G$ is embedded is, up to a multiplicative error close to $1$, equal to $k\cdot\tfrac{2e(G)}{pn^2}$.
Proof. Given $u_1, \dots, u_k, v$ and $G$ and $H$, let $S$ be the event that there is at least one $u_iv$ to which some edge of $G$ is embedded.
The expected number $E$ of edges $u_iv$ embedded to by RandomEmbedding is, by Lemma 29 and linearity of expectation, up to a multiplicative error close to $1$, equal to $k\cdot\tfrac{2e(G)}{pn^2}$, and by inclusion-exclusion we have
\[ E - \sum_{1\le i<i'\le k}\mathbb{P}\big(u_iv \text{ and } u_{i'}v \text{ are embedded to by RandomEmbedding}\big) \le \mathbb{P}(S) \le E. \]
We thus simply have to show that the above sum, which has $\binom{k}{2} \le \binom{2D+3}{2}$ terms, is small. We will show that the probability of RandomEmbedding embedding to any two fixed edges $uv, u'v$ is small. This probability is equal to the sum, over triples $x, x', y \in V(G)$ such that $xy, x'y \in E(G)$, of the probability that $x \hookrightarrow u$, $x' \hookrightarrow u'$ and $y \hookrightarrow v$. For any given $y \in V(G)$ there are at most $\deg_G(y)^2$ choices of $(x, x')$, so by Lemma 7 there are at most $2Dn\Delta(G)$ such triples. It is now enough to make the estimate for one such triple. Assuming the $(C\alpha, 2D+3)$-diet condition holds throughout RandomEmbedding, we embed each of $x$, $x'$ and $y$ uniformly at random into a set of size at least $\tfrac12p^D\delta n \ge \tfrac12\gamma^D\delta n$, so the probability of the event $x \hookrightarrow u$, $x' \hookrightarrow u'$, $y \hookrightarrow v$ is at most $8\gamma^{-3D}\delta^{-3}n^{-3}$. Finally, the probability of the $(C\alpha, 2D+3)$-diet condition failing for some $(H, \operatorname{im}\psi_i)$ is by Lemma 26 at most $2n^{-9}$. Putting this together, we have
\[ \mathbb{P}\big(uv \text{ and } u'v \text{ are both embedded to}\big) \le 16D\gamma^{-3D}\delta^{-3}\Delta(G)\,n^{-2} + 2n^{-9}. \]
Because $e(G) \ge n/4$ the first term in the inclusion-exclusion bound, namely $E$, is $\Theta(n^{-1})$, while since $\Delta(G) \le n/\log n$ the other two terms are of asymptotically smaller order. Since $n$ is sufficiently large, this gives the desired result.
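Concretely, using only the bounds already stated ($e(G) \ge n/4$, $\Delta(G) \le n/\log n$, $k \le 2D+3$), the orders compare as follows; this is our own arithmetic:
\[ E \;\ge\; (1-o(1))\,k\cdot\frac{2\cdot n/4}{pn^2} \;=\; \Theta\Big(\frac1n\Big), \qquad \binom{k}{2}\Big(16D\gamma^{-3D}\delta^{-3}\frac{\Delta(G)}{n^2} + 2n^{-9}\Big) \;=\; O\Big(\frac{1}{n\log n}\Big). \]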
In Lemma 30 we estimated the probability of using an edge in a star with a given centre and a given set R of ends. In particular, looking at all stars in H whose ends are R, we get an estimate of the expected number of them from which an edge is used in the embedding. In the following lemma we prove an upper bound on the second moment of this random variable.
Lemma 31. Let $D \in \mathbb{N}$ and let $\gamma > 0$. Let $\delta, \varepsilon, c, C, \alpha_0, \alpha_{2n}$ be as in Setting 15. Then the following holds for any $\alpha_0 \le \alpha \le \alpha_{2n}$ and all sufficiently large $n$. Suppose that $G$ is a graph on $[n]$ such that $\deg^-(x) \le D$ for each $x \in V(G)$, with at least $n/4$ edges and maximum degree $\Delta(G) \le cn/\log n$, and $H$ is an $(\alpha, 2D+3)$-quasirandom graph with $n$ vertices and $p\binom{n}{2}$ edges, where $p \ge \gamma$. Given $R \subseteq V(H)$ with $|R| \le 2D+3$ and any subset $T$ of $N_H(R)$, let $X$ count the number of vertices $v \in T$ such that an edge from $v$ to $R$ is used by RandomEmbedding when embedding $G$ to $H$. Then $\mathbb{E}(X^2)$ is at most a constant (depending on $D$, $\gamma$ and $\delta$) times $\Delta(G)$.
Proof. We can write $X = \sum_{v\in T}W_v$, where $W_v$ is the indicator random variable of the event that some edge from $R$ to $v$ is used in embedding $G$. We have
\[ \mathbb{E}(X^2) = \sum_{v\in T}\mathbb{E}(W_v) + \sum_{v\neq v'\in T}\mathbb{E}(W_vW_{v'}). \]
Since $e(G) \le Dn$, by Lemma 30, applied with $\{u_1, \dots, u_k\} = R$ and for each $v \in T$, we have $\sum_{v\in T}\mathbb{E}(W_v) \le 5D(2D+3)\gamma^{-1}$, where we use $|R| \le 2D+3$ and $|T| \le n$. Thus the main task is to estimate $\mathbb{E}(W_vW_{v'})$ for distinct $v, v' \in T$; note that $W_vW_{v'}$ is equal to $1$ if and only if there is an edge of $G$ embedded to some edge between $R$ and $v$, and another to an edge between $R$ and $v'$. So, in order to refine our strategy, for $v \in T$ and $u \in R$, let $Y_{v,u}$ be the indicator random variable of the event that the edge $uv$ is used in embedding $G$. For each $\{v, v'\} \subseteq T$ we have
(6.14) $\mathbb{E}(W_vW_{v'}) \le \sum_{u,u'\in R,\,u\neq u'}\mathbb{E}(Y_{v,u}Y_{v',u'}) + \sum_{u\in R}\mathbb{E}(Y_{v,u}Y_{v',u})$.
First, we focus on the first term of the right-hand side of (6.14). That is, we need to find an upper bound for the probability that two given disjoint edges $xy$ and $x'y'$ of $G$ are embedded to respectively $uv$ and $u'v'$ for some fixed $u, u' \in R$ and fixed $v, v'$. As RandomEmbedding runs, either for some $t$ we observe that the $(C\alpha, 2D+3)$-diet condition fails for $(H, \operatorname{im}\psi_t)$, or it is successful and at each time $t$ the vertex $t$ is embedded uniformly at random into a set of size at least $\tfrac12\gamma^D\delta n$. The probability of the former occurring is at most $2n^{-9}$ by Lemma 24, while in the latter case the probability of embedding $x, y, x', y'$ to $u, v, u', v'$ in that order is at most $16\gamma^{-4D}\delta^{-4}n^{-4}$. Putting these together, the probability of $xy, x'y'$ being embedded to $uv, u'v'$ in that order is at most $32\gamma^{-4D}\delta^{-4}n^{-4}$. Summing over the at most $8\binom{e(G)}{2} \le 8\binom{Dn}{2}$ choices of edges $xy, x'y'$ and their orderings, we get
(6.15) $\mathbb{E}(Y_{v,u}Y_{v',u'}) \le 4D^2n^2\cdot32\gamma^{-4D}\delta^{-4}n^{-4} = 128D^2\gamma^{-4D}\delta^{-4}n^{-2}$.
There are exactly $|R|^2 - |R| \le (2D+3)^2$ choices of distinct vertices $u, u' \in R$. Hence the first term of (6.14) is at most $(2D+3)^2\cdot128D^2\gamma^{-4D}\delta^{-4}n^{-2}$. Next, we focus on the second term of the right-hand side of (6.14). That is, we now find an upper bound for the probability that RandomEmbedding uses both $uv$ and $uv'$ for some $u \in R$. The only way this can happen is that for some $x, y, y' \in V(G)$ with $xy, xy' \in E(G)$, the vertex $x$ is embedded to $u$ and $y, y'$ to $v, v'$. Again, by Lemma 24, the probability that a fixed such triple $x, y, y'$ is embedded to $u, v, v'$ is at most $2n^{-9} + 8\gamma^{-3D}\delta^{-3}n^{-3}$. By Lemma 7 there are at most $2Dn\Delta(G)$ such triples. Hence, we get
(6.16) $\mathbb{E}(Y_{v,u}Y_{v',u}) \le 2Dn\Delta(G)\big(2n^{-9} + 8\gamma^{-3D}\delta^{-3}n^{-3}\big) \le 20D\gamma^{-3D}\delta^{-3}\Delta(G)\,n^{-2}$.
There are exactly $|R| \le 2D+3$ choices of $u$, so the probability that RandomEmbedding uses both $uv$ and $uv'$ for some $u \in R$ is at most $(2D+3)\cdot20D\gamma^{-3D}\delta^{-3}\Delta(G)\,n^{-2}$. We can now plug (6.15) and (6.16) into (6.14) and sum over the at most $n^2$ choices of $v, v' \in T$, obtaining the desired bound.
We are now in a position to prove Lemma 17.
Proof of Lemma 17. We define $\tilde p$ by $e(H^*_0) = \tilde p\binom{n}{2}$. By assumption we have $\tilde p = (1\pm\eta)\gamma$. Our aim is to show that with high probability, for any given $s$, either PackingProcess fails before completing stage $s$ or the pair $(H_s, H^*_0)$ is $(\alpha_s, 2D+3)$-coquasirandom. Let $S$ be a set of at most $2D+3$ vertices in $V(H^*_0)$, and let $R \subseteq S$. Recall that for $(H_s, H^*_0)$ to be $(\alpha_s, 2D+3)$-coquasirandom means that $N_{H_s}(R)\cap N_{H^*_0}(S\setminus R)$ has about the size one would expect if both graphs were random. For each $1 \le i \le s$, let $Y_i := \big|N_{H_{i-1}}(R)\cap N_{H^*_0}(S\setminus R)\big| - \big|N_{H_i}(R)\cap N_{H^*_0}(S\setminus R)\big|$ be the number of common neighbours lost at stage $i$. We now need to estimate the sum $\sum_{i=1}^{s}\mathbb{E}(Y_i\mid\mathscr{H}_{i-1})$, on the assumption that each $(H_{i-1}, H^*_0)$ is $(\alpha_{i-1}, 2D+3)$-coquasirandom; Lemmas 29 and 30 provide the per-stage estimate (6.17), consisting of a main term and an error term. We first estimate, in (6.18), the sum of the main terms of (6.17). Note that for every $x, h \in [0,1]$ and $a \in \mathbb{N}$, we have $(x+h)^a - x^a = ah(x+h)^{a-1} \pm 2^ah^2$. We use this with $x := p_i$, $h := p_{i-1}-p_i$, and $a := |R|$, which allows us to continue (6.18) as a telescoping sum up to a small error. Next, we bound the sum of the error terms of (6.17), which is small by the maximum degree bound. Let us write $\Delta := cn/\log n$. We wish to estimate $\sum_{i\le s}\mathbb{E}(Y_i^2\mid\mathscr{H}_{i-1})$; by Lemma 31 each summand is at most a constant times $\Delta$, so this sum is at most $\sigma^2$, a constant times $n\Delta$. Furthermore, the range of each $Y_i$ is at most $|S|\Delta(G_i) \le |S|\Delta$. We apply Lemma 5 with $\sigma^2$ as above, $\varrho = \varepsilon n$ and $\mathcal{E}$ the event that the pair $(H_i, H^*_0)$ is $(\alpha_i, 2D+3)$-coquasirandom for each $0 \le i \le s-1$. We obtain that the probability that $\mathcal{E}$ occurs and $Y_1+\dots+Y_s$ differs from the above estimate by more than $\varepsilon n$ is at most
\[ 2\exp\Big(-\frac{(\varepsilon n)^2}{2\sigma^2 + 2|S|\Delta\varepsilon n}\Big) \le n^{-10}, \]
where the last inequality is by choice of $c$. Taking the union bound over all choices of $R \subseteq S$ and $S$ of size at most $2D+3$, and applying Lemma 26, we see that the following event has probability at most $3n^{-9}$: the pair $(H_i, H^*_0)$ is $(\alpha_i, 2D+3)$-coquasirandom for each $0 \le i \le s-1$, but either RandomEmbedding fails to embed $G_s$ or $(H_s, H^*_0)$ is not $(\alpha_s, 2D+3)$-coquasirandom. Taking now the union bound over all choices of $1 \le s \le s^*$, and recalling that $(H_0, H^*_0)$ is by assumption $\big(\tfrac14\alpha_0, 2D+3\big)$-coquasirandom, we conclude that the probability that for some $1 \le s \le s^*$ RandomEmbedding fails to embed $G_s$ or the pair $(H_s, H^*_0)$ fails to be $(\alpha_s, 2D+3)$-coquasirandom is at most $1.5n^{-8}$. This completes the proof.
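For completeness, here is one way to verify the elementary estimate $(x+h)^a - x^a = ah(x+h)^{a-1} \pm 2^ah^2$ used in the proof above, for $x, h \in [0,1]$ and $a \in \mathbb{N}$ (our own derivation):
\[ (x+h)^a - x^a = \int_x^{x+h} as^{a-1}\,\mathrm{d}s = ah(x+h)^{a-1} - a\int_x^{x+h}\big((x+h)^{a-1} - s^{a-1}\big)\,\mathrm{d}s, \]
and for $s \in [x, x+h]$ we have $0 \le (x+h)^{a-1} - s^{a-1} \le (a-1)h$, so the error term is at most $a(a-1)h^2 \le 2^ah^2$.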

Completing the embedding
Recall that we complete the embedding of each graph $G_s$ by embedding the final $\delta n$ vertices using only edges of $H^*_{s-1}$. From Setting 15, these unembedded vertices of $G_s$ form an independent set and each of them has degree $d_s$. Lemma 19 states that it is very likely, provided PackingProcess does not fail and provided $(H_s, H^*_0)$ is coquasirandom for each $s$, that only a few edges of $H^*_0$ are used at any given vertex to form $H^*_s$, and hence $(H_s, H^*_s)$ is also coquasirandom. Complementing this, Lemma 20 states that this coquasirandomness guarantees that completing the embedding is possible. We prove these two lemmas in this section.
To prove Lemma 19, we give an upper bound for the expected number of edges used at $v$ in each stage, and apply Lemma 5 to show that the actual outcome is with high probability not much larger than this upper bound. For each $x \in V(G_s)$, we define the completion degree of $x$, written $\deg^*(x)$, to be the degree of $x$ in the bipartite graph $G_s\big[[n-\delta n], [n]\setminus[n-\delta n]\big]$. Then the number of edges of $H^*_0$ at $v$ used in stage $s$ is $\deg^*(x)$, where $x$ is the vertex of $G_s$ embedded to $v$. Note that since $\sum_{x=n-\delta n+1}^{n}\deg^*(x) = \delta nd_s$, the hand-shaking lemma tells us that
(7.1) $\sum_{x=1}^{n-\delta n}\deg^*(x) = \delta nd_s$.
We note that the number of edges of $H^*_{s-1}$ used in stage $s$ at any given vertex $v$ does not depend upon how the embedding of $G_s$ is completed, but only on how RandomEmbedding embeds the first $n-\delta n$ vertices, so the proof of Lemma 19 will only need to analyse RandomEmbedding. Indeed, if some vertex $x \in V(G_s)$ with $x \le n-\delta n$ is mapped onto $v$, then this number is $\deg^*(x)$. If on the other hand $v$ is not in the image of $G_s[[n-\delta n]]$, then $v$ will be used in the completion phase. In this case, the number of edges used at $v$ will be $d_s$, irrespective of which particular vertex $v$ will host. Let $Y_s$ denote the number of edges of $H^*_{s-1}$ used at $v$ in stage $s$. We have
(7.2) $\mathbb{E}(Y_s\mid\mathscr{H}_{s-1}) = \sum_{x=1}^{n-\delta n}\deg^*(x)\,\mathbb{P}\big(x\hookrightarrow v\mid\mathscr{H}_{s-1}\big) + d_s\,\mathbb{P}\big(v\notin\operatorname{im}\psi_{n-\delta n}\mid\mathscr{H}_{s-1}\big)$.
We define $\mathcal{E}$ to be the event that PackingProcess succeeds and $(H_{s-1}, H^*_0)$ is $(\alpha_{s-1}, 2D+3)$-coquasirandom for each $1 \le s \le s^*$. In other words, $\mathcal{E}$ is the complement of the first two events in the statement of Lemma 19, so to prove Lemma 19 we want to show that the probability of $\mathcal{E}$ occurring and the third event not occurring is very small.
Suppose that $\mathscr{H}_{s-1}$ is an arbitrary history of PackingProcess up to and including stage $s-1$ for which $(H_{s-1}, H^*_0)$ is $(\alpha_{s-1}, 2D+3)$-coquasirandom. We begin by estimating $\mathbb{E}(Y_s\mid\mathscr{H}_{s-1})$. To estimate the desired expectation, we first aim to show
(7.3) $\mathbb{P}\big(t\hookrightarrow v\mid\mathscr{H}_{s-1}\big) \le \tfrac{4\gamma^{-D}}{n} + 2n^{-9}$ for each $1 \le t \le n-\delta n$, and
(7.4) $\mathbb{P}\big(v\notin\operatorname{im}\psi_{n-\delta n}\mid\mathscr{H}_{s-1}\big) \le 2\delta$.
In order to establish (7.3) and (7.4), we need the following consequence of Lemma 28. Conditioning on $\mathscr{H}_{s-1}$, for each $1 \le t \le n-\delta n$, the probability that RandomEmbedding does not embed any of the first $t$ vertices of $G_s$ to $v$ is at most $\tfrac{2(n-1-t)}{n} < \tfrac{2(n-t)}{n}$. This readily establishes (7.4).
Furthermore, under the same conditioning, by Lemma 24, for each $1 \le t \le n-\delta n$, with probability at least $1-2n^{-9}$, we have $\big|C^{t-1}_{G_s\hookrightarrow H_{s-1}}(t)\big| \ge \tfrac12\gamma^D(n+1-t)$. Now, for each $1 \le t \le n-\delta n$, the probability that RandomEmbedding, conditioning on $\mathscr{H}_{s-1}$, embeds $t$ to $v$ is the probability that no vertex is embedded to $v$ before time $t$ times the probability of picking $v$ when choosing uniformly from the candidate set of $t$. This is at most
\[ \frac{2(n-t)}{n}\cdot\frac{2\gamma^{-D}}{n+1-t} + 2n^{-9} \le \frac{4\gamma^{-D}}{n} + 2n^{-9}. \]
This establishes (7.3). Now, we are going to substitute (7.3) and (7.4) into (7.2). To this end, recall that for each $x \in [n-\delta n+1, n]$ we have $\deg^*(x) = d_s$. It follows that
(7.5) $\mathbb{E}(Y_s\mid\mathscr{H}_{s-1}) \le \delta nd_s\Big(\tfrac{4\gamma^{-D}}{n} + 2n^{-9}\Big) + 2\delta d_s \le 7\gamma^{-D}\delta d_s$.
Next, we obtain a similar upper bound for the second moment. Since only one vertex gets embedded to $v$, we have $\mathbb{E}(Y_s^2\mid\mathscr{H}_{s-1}) \le \Delta(G_s)\,\mathbb{E}(Y_s\mid\mathscr{H}_{s-1})$. Since $0 \le Y_s \le \Delta(G_s) \le \Delta$ holds for each $s$, and since $s^* \le 2n$, we can apply Lemma 5, with $\varrho = \delta n$ and with $\mathcal{E}$ as defined above, to conclude that the probability that $\mathcal{E}$ occurs and more than $50\gamma^{-D}D\delta n$ edges of $H^*_0$ are used at $v$ is at most $n^{-100}$, where the final inequality is since $\Delta = cn/\log n$ and by choice of $c$. Taking the union bound over all choices of $v$, we see that the probability that $\mathcal{E}$ occurs and yet more than $50\gamma^{-D}D\delta n$ edges of $H^*_0$ are deleted at any vertex in the running of PackingProcess is at most $n^{-99}$. Because the degree of each vertex in $H^*_s$ is monotone decreasing as $s$ increases, this in particular implies that the probability that there exists $1 \le s \le s^*$ such that PackingProcess completes stage $s$ but $(H_s, H^*_s)$ fails to be $(\eta, 2D+3)$-coquasirandom is at most $n^{-99}$, where the final step is by choice of $\delta$ in (4.1) and since $p \ge \gamma$. Thus $(H_s, H^*_s)$ is $(\eta, 2D+3)$-coquasirandom, as desired.
Recall that Lemma 20 states that it is likely that the partial embedding $\varphi_s$ of each $G_s$ provided by RandomEmbedding can be extended to an embedding $\varphi^*_s$ of $G_s$, with the completion edges used for the extension lying in $H^*$. Since the neighbours of each of the last $\delta n$ vertices of $G_s$ are embedded by $\varphi_s$, the candidate set in $V(H^*_{s-1})\setminus\operatorname{im}\varphi_s$ of each vertex $x$ among these last $\delta n$ vertices is already fixed, and the desired $\varphi^*_s$ exists if and only if there is a system of distinct representatives for the $C^*_s(x)$ as $x$ ranges over the last $\delta n$ vertices of $G_s$. Recall that Lemma 24 states in particular that $(H^*, \operatorname{im}\varphi_s)$ is likely to satisfy the $(2\eta, 2D+3)$-diet condition, which implies both that $C^*_s(x)$ is of size roughly $p^{d_s}\delta n$ for each of these last vertices $x$, and also that the collection of sets is well-distributed (in a sense we will make precise later). We will see that this is almost enough to verify Hall's condition for the existence of a system of distinct representatives, but we need in addition to know that every vertex of $H^*_{s-1}-\operatorname{im}\varphi_s$ is in sufficiently many of these candidate sets. The following lemma states that this typically is the case.
The proof of this lemma is similar to the proof of Lemma 25.
Proof. Fix $v \in V(H^*)$ and let $I$ be the set of the last $\delta n$ vertices of $G$, which by assumption form an independent set. If at any time during the run of RandomEmbedding we embed a vertex to $v$, then there is nothing to prove, so we will suppose that this does not occur. Denote by $N^-_k(x)$ the set of the first $k$ vertices of $N^-(x)$. Let $\mathcal{Y}_k$ be the event that the vertices $N^-_k(x)$ are all embedded to $N_{H^*}(v)$ for about as many $x \in I$ as one would expect, more formally that
(7.6) $\big|\{x \in I : \psi_{n-\delta n}(N^-_k(x)) \subseteq N_{H^*}(v)\}\big| = (1\pm10k\eta)\gamma^k\delta n$.
Let $B$ be the event that the $(2\eta, 2D+3)$-codiet condition fails at some time $t \le n-\delta n$. Let
\[ Z_{k,t} := \big|\{x \in I : \psi_{n-\delta n}(N^-_{k-1}(x)) \subseteq N_{H^*}(v) \text{ and } t \text{ is the } k\text{th vertex of } N^-(x)\}\big|. \]
In other words, when we embed the vertex $t$, if it is embedded to $N_{H^*}(v)$ it will add $Z_{k,t}$ more vertices to the set in (7.6). Let $Y_{k,t} := Z_{k,t}\cdot\mathbb{1}_{\{\psi_{n-\delta n}(t)\in N_{H^*}(v)\}}$.
We want to show that if $\mathcal{Y}_{k-1}$ occurs, then $\mathcal{Y}_k$ is very likely to occur. We will then show this implies the lemma. Observe that $\mathcal{Y}_k$ is the event that $\sum_{t=1}^{n-\delta n}Y_{k,t} = (1\pm10k\eta)\gamma^k\delta n$. Furthermore, $\mathcal{Y}_{k-1}$ implies that $\sum_{t=1}^{n-\delta n}Z_{k,t} = (1\pm10(k-1)\eta)\gamma^{k-1}\delta n$. We would like to calculate $\sum_{t=1}^{n-\delta n}\mathbb{E}(Y_{k,t}\mid\mathscr{H}_{t-1})$, where $\mathscr{H}_{t-1}$ denotes the embedding history of RandomEmbedding up to and including embedding $t-1$. Given a time $t$, if $t$ is the $k$th vertex of $N^-(x)$, then at time $t-1$ the first $k-1$ vertices of $N^-(x)$ have already been embedded, so $Z_{k,t}$ is determined. Thus we have $\mathbb{E}(Y_{k,t}\mid\mathscr{H}_{t-1}) = \mathbb{P}\big(\psi_t(t)\in N_{H^*}(v)\mid\mathscr{H}_{t-1}\big)\cdot Z_{k,t}$.
Applying Lemma 4 with $\varrho = \eta\gamma^k\delta n$, we deduce that the probability that $\mathcal{Y}_k$ fails is very small. Indeed, the probability that $B$ does not hold but $\sum_{t=1}^{n-\delta n}Y_{k,t} \neq (1\pm10k\eta)\gamma^k\delta n$ is at most
\[ 2\exp\Big(-\frac{\eta^2\gamma^{2k}\delta^2n^2\log n}{2\cdot2Dcn^2}\Big) \le n^{-20}, \]
where we use that $Y_{k,t} \le \deg(t)$ and observe that Lemma 7 gives $\sum_{t=1}^{n-\delta n}\deg(t)^2 \le 2D\Delta(G)n \le 2Dcn^2/\log n$. As $\mathcal{Y}_0$ holds trivially with probability one, by a union bound over the choices of $k$ and $v$ we obtain that the probability that $B$ does not hold but there is some $1 \le k \le d$ for which $\mathcal{Y}_k$ fails is at most $2dn^{-19}$. Finally, Lemma 24 states that $B$ holds with probability at most $2n^{-9}$, giving the lemma statement by the union bound.
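The displayed inequality reduces to simple arithmetic. Assuming, as the choice of $c$ in (4.1) is designed to ensure (we state this here only as an assumption, since we do not restate the exact constants of Setting 15), that $c \le \eta^2\gamma^{2D}\delta^2/(100D)$, we get
\[ \frac{\eta^2\gamma^{2k}\delta^2n^2\log n}{4Dcn^2} = \frac{\eta^2\gamma^{2k}\delta^2}{4Dc}\,\log n \ge \frac{\eta^2\gamma^{2D}\delta^2}{4Dc}\,\log n \ge 25\log n, \]
using $\gamma \le 1$ and $k \le D$, so the probability in question is at most $2n^{-25} \le n^{-20}$.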
We are now in a position to prove the completion lemma, Lemma 20.
Suppose that both good events occur, which happens with probability at least $1-5n^{-9}$. We will now show that (deterministically) this implies the existence of a system of distinct representatives for the candidate sets $\{C^*(x) : n-\delta n+1 \le x \le n\}$, which trivially gives an embedding $\varphi^*$ of $G$ into $H\cup H^*$ such that all edges of $G[[n-\delta n]]$ are embedded to $H$ and the rest to $H^*$, as desired.
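Once the candidate sets are in hand, extracting the system of distinct representatives is a standard bipartite matching computation. The following is a minimal sketch of our own (hypothetical names; the paper instead verifies Hall's condition directly, as the next paragraphs do), using augmenting paths:

    def distinct_representatives(cand):
        """cand: dict mapping each remaining guest vertex x to its candidate
        set C*(x) of host vertices. Returns a dict x -> chosen host vertex,
        all distinct, or None if Hall's condition fails."""
        match = {}  # host vertex -> guest vertex currently using it

        def augment(x, seen):
            for w in cand[x]:
                if w in seen:
                    continue
                seen.add(w)
                # w is free, or its current user can be re-routed elsewhere
                if w not in match or augment(match[w], seen):
                    match[w] = x
                    return True
            return False

        for x in cand:
            if not augment(x, set()):
                return None
        return {x: w for w, x in match.items()}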
The final, harder, case is $\tfrac12\gamma^D\delta n < |X| < \delta n - \tfrac12\gamma^D\delta n$. Given $X$ in this size range, let $X'$ be a maximal subset of $X$ with the property $N_G(x)\cap N_G(x') = \emptyset$ for distinct $x, x' \in X'$. Since each vertex of $X'$ has $d \le D$ neighbours, the set $Y = \bigcup_{x\in X'}N_G(x)$ has size at most $D|X'|$. By maximality of $X'$, every vertex in $X$ is adjacent to some vertex of $Y$. Since no vertex of $Y$ has degree more than $\Delta(G) \le cn/\log n$, we conclude
\[ \tfrac12\gamma^D\delta n < |X| \le \Delta(G)|Y| \le \Delta(G)D|X'| \le cnD|X'|/\log n, \]
and hence $|X'| \ge \log n$ by choice of $c$ in (4.1). We will now argue that $Z := \bigcup_{x\in X'}C^*(x)$ satisfies $|Z| \ge \big(1-\tfrac12\gamma^D\big)\delta n$, which implies (7.8). Suppose for a contradiction that $|Z| < \big(1-\tfrac12\gamma^D\big)\delta n$. By definition, we have $C^*(x) \subseteq Z$ for each $x \in X'$. We now aim to estimate the number $N$ of triples $(x, x', z)$ with $x, x' \in X'$ distinct and $z \in Z$ satisfying $z \in C^*(x)\cap C^*(x')$. For each $z$, let $d_z = \big|\{x \in X' : z \in C^*(x)\}\big|$.

Concluding remarks
8.1. Constants in Theorem 2. Given $\gamma$ and $D$ in Theorem 2, the constant $c$ is set in Setting 15. All the dependencies in (4.1) are polynomial, except for the exponentials used to define $C$ and $\alpha_x$. As a result, $c$ depends roughly doubly-exponentially on $D$ and $\gamma$; more precisely, $c \approx \exp\big(-\exp\big(D^{5+o(1)}\cdot\gamma^{-24D-10+o(1)}\big)\big)$ (where $o(1)\to0$ as $D, 1/\gamma\to\infty$). This of course puts an implicit requirement on $n_0$, as instances of the result for which the maximum degree bound $\tfrac{cn}{\log n}$ is less than $1$ are vacuous. By way of brief comparison with other recent packing results, we believe most of the results we cited earlier obtain broadly similar or better constant dependencies to ours (though these bounds are generally not given explicitly and we did not check carefully), unless the Regularity Lemma is used.
8.2. Limits of the method. As Ferber and Samotij [11] point out, a randomised strategy such as the one we use here will not succeed in packing graphs with many vertices of degree $\omega\big(\tfrac{n}{\log n}\big)$, because it is likely to put these vertices unevenly into the host graph, and after packing only half the guest graphs one vertex will probably have degree substantially less than the average. If the remaining graphs are for example Hamilton cycles, this vertex will become a bottleneck which causes the strategy to fail. One might try to pick vertices non-uniformly in order to correct such imbalances as they form, but analysing such a strategy would be challenging and it is not clear that it would work: common neighbourhoods of several vertices will also occasionally be far from the expected size.
Although it might well be that one can obtain near-perfect packings of graphs with degeneracy much bigger than $\log n$ into $K_n$, any strategy like the one we use here will certainly not succeed in doing so. The reason is simply that strategies like ours work by maintaining quasirandomness, and hence work equally well starting with a dense random graph rather than the complete graph. Take $H$ to be a clique of order $3\log_2n$. Then a well-known calculation shows that $G\big(n,\tfrac12\big)$ typically does not contain even one copy of $H$. We have not tried to analyse our approach more carefully in order to work with sparse random or quasirandom graphs. We are confident that (with substantially more work, and using ideas from [2]) one could prove a near-perfect packing result for typical $G(n,p)$, where $p > n^{-\varepsilon}$ for some $\varepsilon > 0$ depending on the degeneracy bound $D$. But we suspect that our approach would not then allow for maximum degrees of the guest graphs as large as $\Omega(pn/\log n)$, even if we asked only to pack almost-spanning graphs, and certainly we cannot take $\varepsilon$ as big as $\tfrac{1}{2D+3}$, since at this point $G(n,p)$ itself is typically not $\big(\tfrac12, 2D+3\big)$-quasirandom. In particular, our approach cannot challenge the tree packing results of [11] in sparse random graphs.
8.3. Perfect packings. It is easy to check that the graph of uncovered edges in the packing of Theorem 11 is $(2\eta, 2D+3)$-quasirandom, and $\eta$ can be chosen arbitrarily small by increasing $D$ if necessary. In particular, this means that the result of Joos, Kim, Kühn and Osthus [18] applies to this leftover. Thus we can extend the result of [18] on the Tree Packing Conjecture to allow many trees whose maximum degree is bounded only by $\tfrac{cn}{\log n}$, provided that it is bounded by $D$ in the remainder. This is, however, a rather peculiar condition.