Bivariate fluctuations for the number of arithmetic progressions in random sets

We study arithmetic progressions $\{a,a+b,a+2b,\dots,a+(\ell-1) b\}$, with $\ell\ge 3$, in random subsets of the initial segment of natural numbers $[n]:=\{1,2,\dots, n\}$. Given $p\in[0,1]$ we denote by $[n]_p$ the random subset of $[n]$ which includes every number with probability $p$, independently of one another. The focus lies on sparse random subsets, i.e.\ when $p=p(n)=o(1)$ as $n\to+\infty$. Let $X_\ell$ denote the number of distinct arithmetic progressions of length $\ell$ which are contained in $[n]_p$. We determine the limiting distribution for $X_\ell$ not only for fixed $\ell\ge 3$ but also when $\ell=\ell(n)\to+\infty$. The main result concerns the joint distribution of the pair $(X_{\ell},X_{\ell'})$, $\ell>\ell'$, for which we prove a bivariate central limit theorem for a wide range of $p$. Interestingly, the question of whether the limiting distribution is trivial, degenerate, or non-trivial is characterised by the asymptotic behaviour (as $n\to+\infty$) of the threshold function $\psi_\ell=\psi_\ell(n):=np^{\ell-1}\ell$. The proofs are based on the method of moments and combinatorial arguments, such as an algorithmic enumeration of collections of arithmetic progressions.


Introduction and main results
An ℓ-term arithmetic progression (ℓ-AP) in a set X ⊂ Z is an (ordered) ℓ-tuple of distinct numbers (a, a + b, . . . , a + (ℓ − 1)b) whose elements belong to X. In Dickson's History of the Theory of Numbers, the analysis of APs is traced back to around 1770, when it became prominent due to Lagrange and Waring investigating how large the common difference of an ℓ-AP of primes must be. Ever since, the study of APs has remained an extremely active domain of research and has led to several results of fundamental importance; for instance, Dirichlet's Theorem [12], proved in 1837, played a key role in the formation of analytic number theory. Perhaps unsurprisingly, APs also became objects of interest in other fields such as combinatorics: van der Waerden's celebrated theorem [36] states that for any given positive integers r and k, there exists some number W(r, k) (the minimal such number being nowadays called the van der Waerden number) such that if the integers {1, 2, . . . , W(r, k)} are coloured with one of r different colours, then there exist at least k integers in arithmetic progression whose elements are of the same colour.
Erdős also stated a number of conjectures related to ℓ-APs [5, pp. 232-233]. In particular, he offered $1000 to solve the following largest progression-free subset problem: find the cardinality of the largest subset of {1, . . . , m} (m ∈ N) which does not contain any ℓ-AP. This problem was solved by Szemerédi with his celebrated density theorem [35]: a subset of N of non-zero upper asymptotic density contains ℓ-APs of arbitrary length ℓ. The case ℓ = 3 was settled in Roth's celebrated theorem, which opened the use of Fourier analysis in additive combinatorics. Subsequently, building on Szemerédi's Theorem, Green and Tao [18] proved the long-standing conjecture on prime APs: (dense subsets of) the primes contain infinitely many ℓ-APs for all lengths ℓ.
In 1936, Cramér [11] conjectured that the gaps between two consecutive primes remain asymptotically bounded by the square of their logarithms and backed this conjecture with a heuristic model that replaces the set P of primes by a random set P′ made up of Bernoulli random variables, where P(m ∈ P′) ≈ 1/log m independently for all integers m ≥ 2. However, the study of APs in random sets does not only provide a nice heuristic for number-theoretic problems but is also a very natural and interesting model from a probabilistic point of view. For instance, Kohayakawa, Łuczak, and Rödl [24] proved that sparse uniformly random subsets M ⊆ {1, . . . , n} of size |M| = Ω(√n) have the property that any (sufficiently) dense subset of M already contains a 3-AP with probability tending to 1 as n → +∞. For recent developments on extremal theorems for random sets (not only for ℓ-APs), we refer to [2,10,33,34].
In this article we focus our attention on longer APs in sparse binomial subsets of {1, . . . , n}, including ℓ-APs with length ℓ = ℓ(n) → +∞ as n → +∞. In particular, we determine the limiting distribution of the number of ℓ-APs and analyse the joint distribution of the numbers of ℓ-APs and ℓ′-APs for different lengths ℓ ≠ ℓ′.

Main results
We consider a family of random subsets of the initial segments [n] := {1, . . . , n} ⊂ N of the integers. For any p = p(n) ∈ [0, 1] let Ξ₁, . . . , Ξₙ be a collection of independent identically distributed Be(p) random variables, denote their product measure by P, and let [n]_p := {i ∈ [n] : Ξᵢ = 1} be the p-percolation of [n], i.e. [n]_p is the random subset of [n] obtained by deleting any of the elements with probability 1 − p, independently of all other elements. We use the term constant to mean independent of the parameter n, and any unspecified asymptotic notation (including limits) is to be understood with respect to n → +∞. Clearly, [n] itself is an n-AP and any ℓ-AP contains a whole number of ℓ′-APs for each 3 ≤ ℓ′ ≤ ℓ − 1. Therefore, the family {X_ℓ}_{3≤ℓ≤n} is obviously correlated in a non-trivial way. While the FKG inequality (see Theorem 2.10) implies that this family is actually positively correlated, it is a priori unclear whether this correlation is asymptotically relevant. The main goal of this article is to study the asymptotic behaviour of the joint distribution of the pair (X_{ℓ₁}, X_{ℓ₂}) with ℓ₁ > ℓ₂.
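The model is straightforward to simulate. The following minimal Python sketch (function names are ours, not the paper's) samples the p-percolation [n]_p and counts, by brute force over starting points and common differences, the ℓ-APs it contains:

```python
import random

def sample_np(n, p, rng):
    """Sample the p-percolation [n]_p of {1, ..., n}: keep each element with prob. p."""
    return {i for i in range(1, n + 1) if rng.random() < p}

def count_aps(subset, n, ell):
    """Count the ell-APs {a, a+b, ..., a+(ell-1)b} of [n] fully contained in subset."""
    count = 0
    for b in range(1, (n - 1) // (ell - 1) + 1):
        for a in range(1, n - (ell - 1) * b + 1):
            if all(a + i * b in subset for i in range(ell)):
                count += 1
    return count

# One realisation of X_3 for n = 20, p = 0.5 (illustrative parameters)
rng = random.Random(0)
x3 = count_aps(sample_np(20, 0.5, rng), 20, 3)
```

For the deterministic extremes this reproduces the obvious counts: the full set [n] contains every ℓ-AP, and the empty set contains none.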
We start by determining the limiting distribution of the number of ℓ-APs to be either a Poisson distribution or a Gaussian distribution. Let σ_ℓ := √V(X_ℓ) denote the standard deviation of X_ℓ. While a priori ℓ could be as large as n, it is easy to see that the random subset [n]_p with p = o(1) (i.e. in the sparse regime) asymptotically almost surely (a.a.s.) does not contain any ℓ-AP with ℓ = ℓ(n) ≥ C log n for any constant C > 0. This follows by a first moment argument, since E(X_ℓ) ≤ n²p^ℓ ≤ exp(2 log n − C log n · log(p⁻¹)) = o(1), (1.1) and thus by Markov's inequality P(X_ℓ = 0) → 1. In other words, Theorem 1.1 is optimal concerning the range of ℓ.¹ Theorem 1.1 shares conceptual similarities with a result of Ruciński [30] that deals with the number of copies of a given graph H in a binomial random graph G(n, p) obtained as the p-percolation of the complete graph K_n. While [30, Thm. 2] deals only with graphs H of fixed size, it considers any possible graph having at least one edge; moreover, the method employed to prove it is the method of moments, which will be used in the proof of Theorem 1.2 but could also be used for Theorem 1.1.
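The first moment bound (1.1) can be tabulated numerically. The sketch below uses the illustrative choices p = 1/log n and C = 1 (our choices, not the paper's) and evaluates n²p^ℓ = exp(2 log n − ℓ log(1/p)), which indeed collapses rapidly as n grows:

```python
import math

def first_moment_bound(n, p, ell):
    """Upper bound E(X_ell) <= n^2 p^ell = exp(2 log n - ell * log(1/p))."""
    return math.exp(2 * math.log(n) - ell * math.log(1 / p))

# Sparse regime p = o(1) with ell >= C log n (here C = 1): the bound vanishes.
vals = []
for n in [10 ** 3, 10 ** 6, 10 ** 9]:
    p = 1 / math.log(n)
    ell = math.ceil(math.log(n))
    vals.append(first_moment_bound(n, p, ell))
```

Since log(p⁻¹) → +∞ for any p = o(1), the exponent 2 log n − C log n · log(p⁻¹) tends to −∞ for every constant C > 0, which is exactly what the decreasing values illustrate.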
We remark that for constant ℓ ≥ 3, Theorem 1.1 hardly comes as a surprise since X_ℓ is a sum of "weakly dependent" Bernoulli random variables. The Gaussian approximation then follows from a sufficient criterion due to Mikhailov (cf. Theorem 3.5). Our main result, Theorem 1.2, concerns the pair (X_{ℓ₁}, X_{ℓ₂}) for 3 ≤ ℓ₂ < ℓ₁ with ℓ₁/log n → 0 and 0 < p = p(n) < 1 such that pℓ₁⁹ → 0 and n²p^{ℓ₁}ℓ₁⁻⁹ → +∞. Interestingly, the strength of the correlation is characterised by the asymptotic behaviour of the function ψ_{ℓ₁} = np^{ℓ₁−1}ℓ₁, which originates from the combinatorial structure of tuples of overlapping APs. There are two structures, loose pairs and overlap pairs (see Definition 2.2), which compete to dominate the centralised second moments of the pair (X_{ℓ₁}, X_{ℓ₂}). The function ψ_{ℓ₁} is obtained as the ratio of the contribution of loose pairs to that of overlap pairs (of ℓ₁-APs); when ψ_{ℓ₁} → 0, overlap pairs dominate, and when ψ_{ℓ₁} → +∞, loose pairs dominate. We call the former the overlap pair regime, and the latter the loose pair regime. An explicit expression for κ_{ℓ₁,ℓ₂} is given in Lemma 2.11 and its proof; its derivation is surprisingly intricate and involves an integral representation.
Furthermore, we want to highlight that when ℓ₂ = ℓ₂(n) → +∞ (and thus also ℓ₁ = ℓ₁(n) → +∞), the random variables X_{ℓ₁} and X_{ℓ₂} are either asymptotically uncorrelated, or converge to the same random variable (once renormalised). However, in all other cases, there exists a regime where the asymptotic correlation is non-trivial.
Lastly, we remark that the slightly more restrictive conditions pℓ₁⁹ → 0 and n²p^{ℓ₁}ℓ₁⁻⁹ → +∞ are an artefact of the proof method; we strongly believe that the result remains true under the weaker assumptions p → 0 and n²p^{ℓ₁−1}ℓ₁⁻¹ → +∞, which characterise the sparse Gaussian regime for ℓ₁-APs, cf. Theorem 1.1(b).

Related work
In the literature, the study of X_ℓ for random subsets of the integers has largely focused on ℓ ≥ 3 being a constant and on estimating the probability of large deviations from its mean, i.e. the upper tail probabilities P(X_ℓ ≥ (1 + ε)E(X_ℓ)) and the lower tail probabilities P(X_ℓ ≤ (1 − ε)E(X_ℓ)). For a recent survey on large deviations in random graphs (and related combinatorial structures) see [8].
For the upper tail, Janson and Ruciński [22] obtained upper and lower bounds on − log P(X_ℓ ≥ (1 + ε)E(X_ℓ)) that are apart by a factor of log(1/p), by extending an earlier result by Janson, Oleszkiewicz, and Ruciński [21] on large deviations for subgraph counts in random graphs. Subsequently, Warnke [37] closed this gap, also supplying the dependency on ε of the implied constants in Θ_ε. Notably, provided that p is in the loose pair regime (more precisely, ψ_ℓ ≥ log n, where ψ_ℓ = np^{ℓ−1}ℓ as in (1.2)), the results in [37] also extend to moderate deviations, i.e. events of the form {X_ℓ ≥ E(X_ℓ) + t} for any t ≥ σ_ℓ. (Here and below, ℓ₁ > ℓ₂ is understood to mean ℓ₂(n) < ℓ₁(n) for all n ≥ 1.) Complementing these results, Bhattacharya, Ganguly,
Shao, and Zhao [3] pinned down the precise large deviation rate function for "sufficiently large" p. By contrast to the approach in [37], the proof in [3] builds on the non-linear large deviation principle by Chatterjee and Dembo [9] and its refinement due to Eldan [13] in terms of the concept of Gaussian width, a particular notion of complexity. Recently, Briët and Gopi [7] derived an upper bound on the Gaussian width leading to an improvement of the lower bound on p given in [3]. The special case ℓ = 3 was already included in [9]. On the other hand, the lower tail has received less attention: for all constants ℓ ≥ 3, Janson and Warnke [23] determined the large deviation rate function up to constants, while Mousset, Noever, Panagiotou, and Samotij [28] concentrated on the probability of [n]_p being ℓ-AP-free, and expressed − log P(X_ℓ = 0) as an alternating sum of certain joint cumulants defined in terms of the dependency graph associated to X_ℓ. The results on ℓ-APs in [28] hold only for p within the overlap pair regime (ψ_ℓ = o(1), where ψ_ℓ = np^{ℓ−1}ℓ as in (1.2)).
We complement the literature on large and moderate deviations by considering typical deviations, thereby determining the limiting distribution of X_ℓ not only for all constants ℓ ≥ 3 but also when ℓ = ℓ(n) → +∞. Additionally, we investigate the interaction of the numbers of APs of different lengths occurring in [n]_p, i.e. typical fluctuations of the pair (X_{ℓ₁}, X_{ℓ₂}). Strikingly, we find a significantly different behaviour of their bivariate fluctuations in the overlap pair regime, as compared to the loose pair regime. By contrast to the results on moderate deviations in [37] or the result in [28], which work only in one of the two regimes, we employ the same approach in both regimes.

Proof method and outline
The main goal of this article lies in the analysis of bivariate fluctuations of the pair (X_{ℓ₁}, X_{ℓ₂}) based on the method of moments: we show that the joint moments of (X_{ℓ₁}, X_{ℓ₂}), once centred and rescaled, converge to the moments of a Gaussian vector, which ensures convergence in distribution. More formally, we apply the combination of the following two classical results. Theorem 1.3 (e.g. Theorem 30.2 in [4]). Let Y be a random variable which is determined by its moments, and let (Yₙ)_{n∈N} be a sequence of random variables having finite moments of all orders.
The origins of this method in the setting of probabilistic combinatorics are difficult to trace, but one can already see it in the proof by Füredi and Komlós [15] of the Wigner semi-circle distribution, or in the work of Ruciński [30], itself inspired by Maehara [25]. Our approach for the analysis of the (normalised) joint moments with growing ℓ was in particular inspired by a recent result of Gao and Sato [17] determining the limiting distribution of the number of matchings of size ℓ = ℓ(n) in G(n, p) to be either a normal or a log-normal distribution. The scheme of the method consists in finding the appropriate combinatorial structure that describes the moments of the limiting distribution and in showing that this structure governs the moments of the actual (normalised) random variables under study. The method can also be extended to joint moments, as long as a relevant combinatorial structure underlies them. It is well known that the odd moments of a centred multivariate Gaussian distribution vanish, while the even moments can be expressed combinatorially: for k ∈ N, the 2k-th moment is given by a sum over all perfect matchings of the set [2k]. The key technical point in [15,17,30] is thus to find a suitable coding of the moments that highlights the combinatorial structure giving the main contribution; hence, the heart of our proof lies in showing that the (even and centred) joint moments of (X_{ℓ₁}, X_{ℓ₂}) are dominated by a matching structure.
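The matching description of Gaussian moments referred to here is Isserlis'/Wick's theorem. The following small sketch (our own illustration, with a toy covariance matrix) enumerates perfect matchings and recovers both the vanishing odd moments and the even-moment formula:

```python
def perfect_matchings(elems):
    """Yield all perfect matchings of a list; none are yielded for odd length."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for i in range(len(rest)):
        for m in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield [(first, rest[i])] + m

def wick_moment(indices, cov):
    """E[Y_{i_1} ... Y_{i_m}] for a centred Gaussian vector via Isserlis' theorem:
    a sum over perfect matchings of products of covariances."""
    total = 0.0
    for matching in perfect_matchings(list(range(len(indices)))):
        prod = 1.0
        for a, b in matching:
            prod *= cov[indices[a]][indices[b]]
        total += prod
    return total
```

For instance, with Cov = [[2, 0.5], [0.5, 3]] the three matchings of [4] give E[Y₁²Y₂²] = 2·3 + 2·(0.5)², matching the classical closed form, and any odd moment returns 0.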
In fact, we will see that this combinatorial structure is encoded in the dependency graph Γ (cf. Definition 3.1) associated with the pair (X_{ℓ₁}, X_{ℓ₂}). Depending on the range of p, the main contribution comes from matchings consisting of overlap pairs and/or loose pairs, and can be determined explicitly. It then remains to bound the contributions of all non-matching configurations. This last step is based on an algorithmic exploration of the components of Γ; a similar argument was previously used by Bollobás, Cooley, Kang, and the second author [6] in the context of jigsaw percolation on random hypergraphs. By contrast, in [17] this last step was based on the switching method introduced by McKay [26], which turned out to be difficult to apply in the setting of APs due to their arithmetic structure.
We close with an outline of the article: Section 2 focusses on counting APs and pairs of APs, and on deriving the joint second moments from these counts. Since we require a high level of precision, the counting argument for loose pairs of APs turns out to be surprisingly challenging. In Section 3 we complete the proof of Theorem 1.1 based on two sufficient criteria from the literature. The higher joint moments of the pair (X_{ℓ₁}, X_{ℓ₂}) are analysed in Section 4, where we also complete the proof of Theorem 1.2 and provide an alternative proof of Theorem 1.1(b). We then conclude with a discussion of open problems in Section 5.

Preliminaries: counting APs and pairs of APs
We start out with determining the asymptotics related to the set of APs in [n]. First, we consider the total number of ℓ-APs, denoted by A_ℓ := |𝒜_ℓ|, where we recall that 𝒜_ℓ denotes the set of ℓ-APs in [n].
In particular, the following asymptotics hold for all 3 ≤ ℓ = ℓ(n) ≤ n: A_ℓ = Θ(n(n − ℓ + 1)ℓ⁻¹). Furthermore, for any 3 ≤ ℓ = ℓ(n) = o(n), we have A_ℓ = (1 + o(1)) n²/(2(ℓ − 1)). Finally, we observe that if ℓ/n → 1, then any ℓ-AP contained in [n] is clearly an interval (its common difference is 1), hence the number of such choices is n − ℓ + 1, completing the proof.
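The exact count behind these asymptotics is elementary: summing over common differences b ≥ 1 gives A_ℓ = Σ_b max(0, n − (ℓ − 1)b). A quick sketch cross-checking this formula against brute-force enumeration and against the asymptotics n²/(2(ℓ − 1)):

```python
from itertools import combinations

def ap_count_exact(n, ell):
    """A_ell: number of ell-APs in [n], summing n - (ell-1)b over differences b."""
    return sum(n - (ell - 1) * b for b in range(1, (n - 1) // (ell - 1) + 1))

def ap_count_brute(n, ell):
    """Brute force: test every ell-subset of [n] for constant consecutive gaps."""
    cnt = 0
    for tpl in combinations(range(1, n + 1), ell):
        if len({tpl[i + 1] - tpl[i] for i in range(ell - 1)}) == 1:
            cnt += 1
    return cnt
```

For ℓ fixed and n large the ratio A_ℓ / (n²/(2(ℓ − 1))) approaches 1, in line with the displayed asymptotics.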

Loose pairs and overlap pairs
Next, we consider pairs of APs of potentially different lengths, and distinguish them by the size of their intersection.
Notice that C_{ℓ,ℓ} = A_ℓ. Computing the asymptotic behaviour of the number of overlap pairs is a corollary of Claim 2.1.
Proof. Note that the number of overlap pairs (T, T′) equals A_ℓ times the number of ℓ′-APs contained in a fixed ℓ-AP, and the latter equals the number of ℓ′-APs in [ℓ]; the claim thus follows from Claim 2.1. Similarly, we obtain an upper bound on the number of pairs intersecting in precisely r elements for 2 ≤ r ≤ ℓ′ − 1. Despite being somewhat crude, this bound will suffice for our purposes. Such a pair (T, T′) is already uniquely determined by choosing the first AP T, for which there are at most O(n²ℓ⁻¹) many choices by Claim 2.1, and then fixing the relative positions of the first two intersection elements within T and T′, for which there are at most ℓ² and (ℓ′)² many choices, respectively. The first claim follows by multiplying.
As for the second bound, assume that r ≥ 2ℓ′/3; then any pair (T, T′) ∈ D^{(r)}_{ℓ,ℓ′} induces an overlap pair consisting of the ℓ-AP T and the r-AP T ∩ T′. By definition the number of such overlap pairs is C_{ℓ,r}, and thus at most O(n²(ℓ − r + 1)(ℓr)⁻¹) by Corollary 2.3. Next, observe that once T and T ∩ T′ are chosen, the common difference of T′ needs to be a divisor of the common difference of T ∩ T′. Moreover, since r ≥ 2ℓ′/3 > ℓ′/2, the set T ∩ T′ contains two consecutive elements of T′, implying that T ∩ T′ and T′ have the same common difference. So we may only choose how many elements of T′ \ T are smaller than the smallest element of T ∩ T′, for which the number of choices is at most ℓ′ − r + 1. Hence in total we obtain the claimed upper bound.
By contrast, determining the asymptotics of the number of loose pairs is much more difficult. In the following we will use the conventions that 1/0 = +∞, min{x, +∞} = x, and x̄ := 1 − x for all x ∈ [0, 1]. We start by proving two technical properties of the functions h_ℓ. Claim 2.5. For any constant ℓ ≥ 3 the function h_ℓ is non-negative and has the following properties:
Proof. For the first claim, we note that min{x/a, x̄/ā} ≥ 1/2 for all 1/3 ≤ a ≤ 2/3 and 1/3 ≤ x ≤ 2/3. We conclude by noting that there is at least one ι ∈ {1, 2, . . . , ℓ} with the required property, and therefore we obtain the claim.
Next, let h∞ : [0, 1] → [0, 1] denote the (binary) entropy function, and observe that h∞ is continuous on [0, 1]. The next statement shows that h∞ arises naturally from h_ℓ when ℓ = ℓ(n) → +∞.
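For concreteness, here is a sketch of the binary entropy function; we assume the usual base-2 normalisation with h∞(0) = h∞(1) = 0 (the excerpt does not spell out the normalisation, so this is our assumption), under which h∞ is indeed continuous on [0, 1] with values in [0, 1]:

```python
import math

def h_inf(x):
    """Binary entropy -x log2(x) - (1-x) log2(1-x), extended by continuity."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)
```

The function is symmetric about x = 1/2, where it attains its maximum value 1, and decays to 0 at both endpoints.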
Moreover, a corresponding pointwise estimate holds for all x ∈ (0, 1). With this preparation we will now determine the number of loose pairs asymptotically.
Expressing T(1), T′(1), T(ℓ), and T′(ℓ′) in terms of m, ι, ι′, δ, and δ′, this condition is equivalent to a system of linear inequalities in m, and so the number of valid choices for m can be written down explicitly. It turns out to be convenient to divide this quantity by n, which yields an expression in terms of a function f : [0, 1]⁴ → R, and it is not hard to show that there exists a constant C > 0 bounding the relevant error terms uniformly. Furthermore, the associated measures {dν_n}_{n∈N} converge weakly to the uniform measure on [0, 1]². Since f is bounded and continuous, the corresponding integrals converge as well. The next goal is to deal with the positive part of the function f: for any (a, a′, u, u′) ∈ [0, 1]⁴ it admits a closed form, using the conventions that 1/0 = +∞, min{x, +∞} = x, and x̄ := 1 − x for all x ∈ [0, 1]. Consequently, by integrating over (u, u′) ∈ [0, 1]² and using Fubini's theorem, we obtain an explicit double integral. Hence, (2.7) simplifies accordingly, where µ and µ′ are the measures defined in (2.1). This settles the case of constant lengths. Assume now that ℓ is a constant, but ℓ′ = ℓ′(n) → +∞ with ℓ′ = o(log n); then an analogous computation completes the claim for this case.
Analogously to the previous case, we obtain the stated constant, where we evaluated the integral using SageMath [32].

Second moments
Given any subset T ⊆ [n], we define the corresponding indicator random variable, and for any 3 ≤ ℓ = ℓ(n) ≤ n we set the centred versions accordingly. First, we prove that the main contribution to the centred second moments comes from loose pairs, overlap pairs, or a combination of both.
Proof. We observe the stated bound for any r ∈ [ℓ′] and (T, T′) ∈ D^{(r)}_{ℓ,ℓ′}. By distinguishing the size of the intersection we obtain the corresponding decomposition, and recall that by definition the summands for r = 1 and r = ℓ′ correspond to loose pairs and overlap pairs, respectively. Therefore, we first consider the contribution of summands with 2 ≤ r ≤ 2ℓ′/3. By the first estimate of Claim 2.4, this contribution is of lower order, where for the last estimate we recall that C_{ℓ,ℓ′} = Θ(n²(ℓ − ℓ′ + 1)(ℓℓ′)⁻¹) by Corollary 2.3. Hence, the main contribution to E(X̄_ℓ X̄_{ℓ′}) comes from the summands for r = 1 and r = ℓ′, i.e. we have E(X̄_ℓ X̄_{ℓ′}) = (1 ± o(1))(B_{ℓ,ℓ′} p^{ℓ+ℓ′−1} + C_{ℓ,ℓ′} p^ℓ), as claimed in the first statement. As for the second statement, we recall that by definition B_{ℓ,ℓ} = B_ℓ and C_{ℓ,ℓ} = C_ℓ = A_ℓ.
Before investigating the limiting correlation in more detail, we recall the classical FKG inequality (see e.g. [20, Thm. 2.12]). We say that a function f on subsets of [n] is increasing if f(A) ≤ f(B) whenever A ⊆ B; decreasing functions are defined analogously. In particular, if Q₁, Q₂ are two increasing (resp. decreasing) families of subsets of [n], then they are positively correlated. An important application of this theorem concerns random variables of the form X := Σ_{S∈𝒮} 1{S ⊆ [n]_p} for a certain family 𝒮 of non-empty subsets of [n]. Note that every random variable 1{S ⊆ [n]_p} is increasing, hence X is increasing. Now, for any 3 ≤ ℓ′ = ℓ′(n) < ℓ = ℓ(n) ≤ n we define κ_{ℓ,ℓ′} := lim_{n→+∞} E(X̄_ℓ X̄_{ℓ′})/(σ_ℓ σ_{ℓ′}) (2.9) and observe that 0 ≤ κ_{ℓ,ℓ′} ≤ 1, by the FKG inequality and the Cauchy-Schwarz inequality.
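The positive correlation asserted by the FKG argument can be verified exactly for tiny n by summing over all 2ⁿ outcomes of the percolation (a brute-force sanity check of ours, not part of the proof):

```python
def aps(n, ell):
    """All ell-APs in [n] as frozensets."""
    return [frozenset(a + i * b for i in range(ell))
            for b in range(1, (n - 1) // (ell - 1) + 1)
            for a in range(1, n - (ell - 1) * b + 1)]

def exact_covariance(n, p, ell1, ell2):
    """Exact Cov(X_ell1, X_ell2) under [n]_p, by enumerating all 2^n subsets."""
    A1, A2 = aps(n, ell1), aps(n, ell2)
    ex1 = ex2 = ex12 = 0.0
    for mask in range(2 ** n):
        subset = {i + 1 for i in range(n) if mask >> i & 1}
        prob = p ** len(subset) * (1 - p) ** (n - len(subset))
        x1 = sum(1 for T in A1 if T <= subset)
        x2 = sum(1 for T in A2 if T <= subset)
        ex1 += prob * x1
        ex2 += prob * x2
        ex12 += prob * x1 * x2
    return ex12 - ex1 * ex2
```

For instance, Cov(X₃, X₄) on [8] with p = 0.3 comes out strictly positive, as the FKG inequality predicts.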
The following proof shows implicitly that κ_{ℓ,ℓ′} is well-defined, i.e. that the limit in (2.9) exists.
Proof. By Lemma 2.9, we have the asymptotics of the centred second moment. Furthermore, using Claim 2.1 and Lemma 2.7 we obtain the asymptotics of the normalising terms. Hence, letting n → +∞ we obtain the claimed value, which is non-negative since we already argued that κ_{ℓ,ℓ′} ≥ 0 by the FKG inequality.
Thus we obtain (2.10). Now let ϕ := h_ℓ if ℓ is a constant, and ϕ := h∞ if ℓ = ℓ(n) → +∞; define ϕ′ analogously (with ℓ′ in place of ℓ). We note that both ϕ and ϕ′ are L²-integrable. Next, we take the limit n → +∞ in (2.10) and apply Lemma 2.7. In particular, the Cauchy-Schwarz inequality, combined with Lemmas 2.9 and 2.7 and using the notation of ϕ and ϕ′ as in the previous case, yields the claim. As before, we observe that ⟨ϕ, ϕ′⟩ > 0, and this implies κ_{ℓ,ℓ′} > 0.
Second, the case where ℓ is a constant can be treated via the equality cases of the Cauchy-Schwarz inequality. We recall that ⟨f, g⟩ = ‖f‖‖g‖ in a given inner product space E iff the two functions are linearly dependent in E, i.e. there exists λ ∈ R* s.t. f = λg with f, g ∈ E. We then observe that h_ℓ and h∞ are linearly independent in L². To see this, let ε = ε(ℓ) > 0 be a sufficiently small constant; by Claim 2.5(b), for sufficiently small ε > 0, we have h_ℓ(x) ≥ 1/(ℓ − 1) on the relevant range, while h∞ does not satisfy the corresponding bound there. Consequently, for any sufficiently small constant ε > 0 the functions h_ℓ and h∞ are not linearly dependent in L², as claimed. Consequently, the Cauchy-Schwarz inequality is strict and we obtain κ_{ℓ,ℓ′} = ⟨ϕ, ϕ′⟩²/(‖ϕ‖₂² ‖ϕ′‖₂²) < 1, completing the proof.

Univariate fluctuations: proof of Theorem 1.1
In this section we focus on univariate fluctuations of X_ℓ, i.e. we prove the two statements of Theorem 1.1. First we treat the Poisson regime, where the result follows directly from an application of the Chen-Stein method and the preliminary computations performed in Section 2 (with ℓ′ = ℓ). Likewise, the Gaussian approximation is a consequence of a classical normality criterion.

Poisson regime: proof of Theorem 1.1(a)
We start by introducing the notion of a dependency graph. We emphasise that this definition is the one that fits our purpose, and that there are many other such notions (see e.g. [14,20]). Moreover, it is clearly not a bipartite graph. We remark that including the possibility ℓ′ = ℓ is just to cover the univariate case, in which 𝒜_ℓ = 𝒜_{ℓ′} and so 𝒜_ℓ ∪ 𝒜_{ℓ′} = 𝒜_ℓ.
We define the following two quantities V₁(G) and V₂(G) associated with a dependency graph G of (Y_i)_{1≤i≤N} (see (3.2)). We use a variant of the Chen-Stein method due to Arratia, Goldstein, and Gordon [1] (in a slightly simplified form).
Let G be a dependency graph of (Y_i)_{1≤i≤N}, and let V₁(G), V₂(G) be as in (3.2). Let Y be a Poisson random variable with mean E(Y) := ζ. Then, for any U ⊂ N, the probability P(Σ_{i=1}^{N} Y_i ∈ U) differs from P(Y ∈ U) by at most a constant multiple of V₁(G) + V₂(G).
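In the standard Arratia-Goldstein-Gordon formulation, the two error terms sum products of means over dependent pairs and cross-moments over distinct dependent pairs; the paper's V₁(G), V₂(G) play the analogous roles. The sketch below computes these standard quantities (our notation b₁, b₂; we do not claim it reproduces the exact display (3.2)) for the ℓ-AP indicators, where two indicators are dependent iff the APs intersect:

```python
def aps(n, ell):
    """All ell-APs in [n] as sets."""
    return [set(a + i * b for i in range(ell))
            for b in range(1, (n - 1) // (ell - 1) + 1)
            for a in range(1, n - (ell - 1) * b + 1)]

def chen_stein_terms(n, p, ell):
    """b1 sums E(Y_i)E(Y_j) over intersecting pairs (including i = j);
    b2 sums E(Y_i Y_j) = p^{|T_i ∪ T_j|} over intersecting pairs with i != j."""
    A = aps(n, ell)
    b1 = b2 = 0.0
    for i, T in enumerate(A):
        for j, U in enumerate(A):
            if T & U:
                b1 += p ** len(T) * p ** len(U)
                if i != j:
                    b2 += p ** len(T | U)
    return b1, b2
```

For n = 5, ℓ = 3, p = 1/2 there are four 3-APs, all pairwise intersecting, and a short hand computation gives b₁ = 16/64 and b₂ = 2·(4/16 + 2/32).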

Remark 3.3. The theorem given in [1] uses an additional quantity V₃(G), due to the use of a different notion of dependency graphs. This quantity is irrelevant for us, as we always have V₃(G) = 0.
Here the last estimate holds due to Claim 2.4 and Lemma 2.7.

Gaussian regime: proof of Theorem 1.1(b)
For the normal approximation we apply a criterion due to Janson [19], which was later refined by Mikhailov [27]. This normality criterion is based on controlling mixed cumulants of sums of random variables by means of an associated dependency graph. We follow the notation of [20]. Theorem 3.5 (e.g. Theorem 6.21 in [20]). Let (X_{i,n})_{1≤i≤N_n} be a family of random variables with dependency graph Γ_n (as defined in Definition 3.1) and suppose that there exist constants {C_r}_{r∈N} independent of n, and quantities M_n and Q_n, such that E(Σ_{i=1}^{N_n} |X_{i,n}|) ≤ M_n, (3.3) and such that for all V of constant size (i.e. |V| is independent of n) the corresponding cumulant bound holds. Note that the proof of Theorem 3.5 shows that the assumption (3.5) becomes weaker as s increases; however, we will see that in our application it is satisfied for any s > 0. Recall that Lemma 2.9 gives σ_n² = (1 ± o(1))(B_ℓ p^{2ℓ−1} + C_ℓ p^ℓ) with C_ℓ = A_ℓ = Θ(n²ℓ⁻¹) and B_ℓ = Θ(n³) by Claim 2.1 and Lemma 2.7, respectively. Thus we have σ_n = Θ(√(n²p^ℓℓ⁻¹(1 + np^{ℓ−1}ℓ))) and we distinguish two cases: If np^{ℓ−1}ℓ ≥ 10, then we have M_n/σ_n = O(n^{1/2}p^{1/2}ℓ⁻¹) and also Q_n/σ_n ≤ np^{ℓ−1}ℓ⁵/σ_n = O(n^{−1/2}p^{−1/2}ℓ⁵). Thus, for any s > 2, the assumption (3.5) follows. Next, we recall that by Remark 3.4 we may additionally assume that p is not too small, e.g. p ≥ εn^{−max{3/(2ℓ−1), 2/(ℓ+1)}} for any ε = ε(n) > 0 with ε → 0. It remains to observe that when ε is decreasing sufficiently slowly, this implies that n²p^ℓ ≥ e^{Ω(log n)/ℓ}. Since ℓ = o(log n), it follows that (3.5) is satisfied, and applying Theorem 3.5 completes the proof of Theorem 1.1(b).

Bivariate fluctuations: proof of Theorem 1.2
For the rest of this section, it will be convenient to assume that ℓ₂ < ℓ₁. More precisely, we let 3 ≤ ℓ₂ = ℓ₂(n) < ℓ₁ = ℓ₁(n) and 0 < p = p(n) < 1 satisfy the assumptions of Theorem 1.2, and consider arbitrary k ∈ N and u₁, u₂ ∈ R. (We recall that σ_i := √V(X_{ℓ_i}) denotes the standard deviation of X_{ℓ_i} for i ∈ {1, 2}.) By definition we have the expansion (4.4), where k_i(T) := |{T ∈ T : |T| = ℓ_i}|, for i ∈ {1, 2}, is the number of ℓ_i-APs in T.

Main contribution to the moments
In (4.4) we expressed the k-th moment of an arbitrary linear combination of X̄_{ℓ₁} and X̄_{ℓ₂} as a sum ranging over k-tuples of APs, each of length ℓ₁ or ℓ₂. We will now show that for even k the main contribution to this sum comes from k-tuples T = (T₁, . . . , T_k) with a certain matching structure: namely, there exists a bijective self-inverse mapping ν : [k] → [k] without fixed points (we will call such a permutation a (perfect) matching) such that T satisfies (4.5). We write F_ν(k) for the set of (ordered) k-tuples satisfying (4.5) for a given matching ν, and observe that any two distinct sets F_ν(k) and F_{ν′}(k), ν ≠ ν′, are disjoint and can be mapped bijectively onto each other. Thus for any even k let ν* be defined by ν*(2i − 1) = 2i for all i ∈ [k/2], and note that there are precisely (k − 1)!! many distinct matchings ν if k is even, and none at all if k is odd.
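The count (k − 1)!! can be checked directly: the matchings ν are exactly the fixed-point-free involutions of [k]. A short enumeration sketch (ours, for illustration):

```python
from itertools import permutations

def double_factorial(m):
    """m!! = m * (m - 2) * ... down to 1 or 2."""
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

def matchings(k):
    """All fixed-point-free involutions nu of [k] = {1, ..., k},
    represented as tuples with nu(i) = s[i-1]."""
    return [s for s in permutations(range(1, k + 1))
            if all(s[i - 1] != i and s[s[i - 1] - 1] == i
                   for i in range(1, k + 1))]
```

In particular ν* above, which pairs 2i − 1 with 2i, corresponds to the tuple (2, 1, 4, 3, . . .), and for odd k the enumeration is empty.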
Let F(k) denote the contribution of k-tuples in ℱ(k) := ⋃̇_ν F_ν(k) to the k-th moment, and set ℱ(k) := ∅ for k odd. Then we let 𝒢(k) := (𝒜_{ℓ₁} ∪ 𝒜_{ℓ₂})^k \ ℱ(k) for all k ∈ N, and denote the contribution of 𝒢(k) by G(k). In other words, the k-th moment decomposes as F(k) + G(k).

We observe that by the previous argument we may express F(k) for even k as in (4.7). Note that the summation now ranges over ordered k-tuples of APs T ∈ F_{ν*}(k), where the first AP is matched with the second, the third with the fourth, and in general any AP in an odd position is matched with its successor. Thus it is convenient to change our perspective slightly: from now on we regard T as an ordered (k/2)-tuple of intersecting pairs of APs (T_{2i−1}, T_{2i}). Each such intersecting pair falls into precisely one of three categories: either both T_{2i−1} and T_{2i} are ℓ₁-APs, both are ℓ₂-APs, or they form a mixed intersecting pair, i.e. one is an ℓ₁-AP while the other is an ℓ₂-AP. Formally, we define corresponding sets of labels Θ₁(T), Θ₂(T), Θ₃(T). Consequently, by parametrising according to Θ₁ and Θ₂, the expression (4.7) turns into a multinomial-type sum, since there are precisely (k/2)!/(θ₁! θ₂! θ₃!) many possibilities for partitioning the set [k/2] into sets of sizes θ₁, θ₂, and θ₃ := k/2 − θ₁ − θ₂. Note that this means we have already fixed a partition [k/2] = ⋃̇_{j=1,2,3} Θ_j(T) when choosing T in the last sum.
We observe that the first part of this expression is already very reminiscent of a multinomial formula. In the next lemma, we show that this intuition is well justified and demonstrate that the leading order term of F (k)/(k − 1)!! is given by a multinomial with three summands (representing the three categories of intersections) and exponent k/2 (the length of T as a tuple of intersecting pairs of APs).
We aim to proceed pair by pair, i.e. for rounds i = 1, . . . , k/2 we enumerate all possible ways of embedding (T_{2i−1}, T_{2i}) into the set [n]. To do so, we need to be careful to avoid reusing points of [n] in different rounds. We now formalise this idea. Given an integer m ∈ N and j ∈ {1, 2, 3} we define the sets M(m, j), where the range of summation of the overlap parameter m is {1, . . . , ℓ₁} when j = 1 and {1, . . . , ℓ₂} when j ∈ {2, 3}. We observe that the factor 2 for j = 3 is due to symmetry, as we consider ordered pairs (T, T′); see also the definition of M(m, 3). Now in any round i ∈ {1, . . . , k/2}, we enumerate all possible choices for (T_{2i−1}, T_{2i}) by first choosing the size of their overlap, say m_i, and then selecting an embedded pair (T, T′) ∈ M_i := M(m_i, j_i), where j_i is given as the unique solution of i ∈ Θ_{j_i}(T). However, choosing (T_{2i−1}, T_{2i}) = (T, T′) may not be a valid choice, since T ∪ T′ may contain elements from T_j for some j ∈ [2i − 2] (thus violating (4.5) and the definition of ℱ(k)). Nonetheless, we claim that almost all of these choices are indeed valid. More formally, let M*_i ⊆ M_i denote the set of valid choices (4.10). Furthermore, as the contribution from each term (T, T′) ∈ M_i is the same, Claim 4.3 shows that the error introduced by replacing M*_i with M_i in (4.10) is negligible: it is accounted for by a factor of (1 ± o(1)). Consequently, by (4.9) we obtain the claimed expression. We first deal with the case m_i ≥ 2. We will see that |M_i| = Ω(n²/ℓ₁) and |M_i \ M*_i| = O(nℓ₁⁵), and thus |M_i \ M*_i| = o(|M_i|) since ℓ₁ = o(log n). Indeed, note that every 2ℓ₁-AP T̂ induces a distinct pair (T, T′) ∈ M_i consisting of an initial and a final segment of T̂ with the prescribed overlap; hence |M_i| = Ω(n²/ℓ₁) by Claim 2.1. On the other hand, to obtain a pair (T, T′) in M_i \ M*_i, we need first to choose some x ∈ (T ∪ T′) ∩ R, for which there are at most |R| ≤ kℓ₁ choices. Then the arithmetic progression containing x, say T, is determined by picking a common difference, for which there are at most n choices.
By the observation above, the number of choices for T′ with |T ∩ T′| ≥ 2 is then O(ℓ₁⁴). Therefore, |M_i \ M*_i| = O(ℓ₁ · n · ℓ₁⁴) = O(nℓ₁⁵), as claimed. We then deal with the case m_i = 1. Similarly, we show that |M_i| = Ω(n³ℓ₁⁻²) and |M_i \ M*_i| = O(n²ℓ₁), and hence |M_i \ M*_i| = o(|M_i|) since ℓ₁ = o(log n). Indeed, to obtain a pair (T, T′) in M_i, we have at least A_{ℓ₁} choices to fix T, and then, upon choosing some x ∈ T as its intersection with T′, there are at least n/(2(ℓ₁ − 1)) choices for the common difference of T′. This is because if x ≥ n/2 (resp. x ≤ n/2), then we can find T′ with x as its last (resp. first) element. Again there are at most O(ℓ₁⁵) such T′ intersecting T in more than one place, so we have |M_i| ≥ A_{ℓ₁} · (n/(2(ℓ₁ − 1)) − O(ℓ₁⁵)) = Ω(n³ℓ₁⁻²). As demonstrated earlier, this also completes the proof of Lemma 4.2.

Minor contribution to the moments
Next we turn our attention to the k-tuples in 𝒢(k), i.e. the minor contribution G(k). We start with some preparation. We will change the order of summation in an algorithmic fashion as described below. First we fix an arbitrary total order π on the set 𝒜_{ℓ₁} ∪ 𝒜_{ℓ₂} such that all ℓ₁-APs come before any ℓ₂-AP, i.e. we have π(T) < π(T′) for all T ∈ 𝒜_{ℓ₁} and T′ ∈ 𝒜_{ℓ₂}. We now explore any (non-empty) finite collection of APs component-wise as follows. Roughly speaking, given T, let H be an auxiliary k-vertex graph in which each vertex represents an AP in T and two vertices are adjacent if and only if the corresponding APs have non-empty intersection. Then we explore V(H), moving from one vertex to one of its neighbours according to the ordering π, and start the search from a new component whenever the current one is exhausted. More precisely, we perform the following algorithm: (I) Initialise the inactive list L_i and the active list L_a: L_i ← T, L_a ← ∅, and j ← 1.
(II) Start a new component: If L_a = ∅, then let L_a ← {min_π L_i}.
(III) Select the next AP: Set T_j ← min_π L_a, remove it from L_a (and from L_i if present), and record t_j, the size of the overlap of T_j with the previously explored APs. (IV) Activate neighbours: Move every AP in L_i that intersects T_j from L_i to L_a. (V) If j = |T|, then STOP; otherwise, set j ← j + 1 and return to step (II).
We remark that our algorithm resembles a Depth-/Breadth-First Search on graphs; the difference is that within a connected component we explore the APs according to the ordering π. Note that any permutation T′ of the input T will result in the same ordered tuple π(T) = (T₁, . . . , T_{|T|}). We now assume that |T| = k. Observe that t and s satisfy
∀i ∈ [k] : s_i ∈ {ℓ₁, ℓ₂}, (4.11)
∀i ∈ [k] : 0 ≤ t_i ≤ s_i, (4.12)
∀i ∈ [k] : {t_i = 0} ⟹ {s_i = ℓ₁} ∨ {s_j = ℓ₂, ∀j = i, . . . , k}, (4.13)
where (4.13) follows from the choice of π. An illustration of (4.13) is given in Example 4.5.
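One natural reading of this exploration can be sketched as follows (details that are garbled in our copy of steps (III)-(IV) are filled in with the interpretation above; the ordering π is simulated by sorting with longer APs first):

```python
def explore(collection):
    """Component-wise exploration of a collection of APs (as Python sets).
    Returns the exploration order and the type vector t of overlap sizes."""
    # pi: longer APs first, ties broken lexicographically (an arbitrary choice)
    pi = sorted(collection, key=lambda T: (-len(T), sorted(T)))
    inactive = list(pi)   # L_i
    active = []           # L_a
    seen = set()          # union of previously explored APs
    order, t = [], []
    while inactive or active:
        if not active:                      # (II) start a new component
            active.append(inactive.pop(0))
        active.sort(key=pi.index)           # (III) pick the pi-smallest active AP
        Tj = active.pop(0)
        t.append(len(Tj & seen))            # t_j: overlap with previous APs
        seen |= Tj
        for U in list(inactive):            # (IV) activate intersecting APs
            if U & Tj:
                inactive.remove(U)
                active.append(U)
        order.append(Tj)
    return order, t
```

As claimed in the text, the output does not depend on the order in which the input collection is presented, and the first AP of each component receives t_j = 0.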
Additionally, note that if the input $\mathbf{T}$ is such that there are two consecutive indices with $t_i = t_{i+1} = 0$, then $Z_{T_i}$ is independent of the remaining variables and $\mathbf{T}$ does not contribute to $G(k)$; hence we may assume
$$\nexists\, i \in [k-1]: \; t_i = t_{i+1} = 0. \qquad (4.14)$$
Similarly, $t_k > 0$, since otherwise $Z_{T_k}$ is independent of $(Z_{T_1}, \dots, Z_{T_{k-1}})$ and thus $\mathbf{T}$ does not contribute to $G(k)$. We write $\mathcal{T}_k := \{\mathbf{t} \in \{0, 1, \dots, \ell_1\}^k : \mathbf{t} \text{ satisfies (4.14) and } t_k > 0\}$ for the set of all type vectors of length $k$ which do not contain two consecutive zeros and do not end in a zero. In particular, this implies that we may assume $|I_0| \le k/2 - 1$ for even $k$ and $|I_0| \le (k-1)/2$ for odd $k$; in other words, we have $|I_{0,\ell_1}| + |I_{0,\ell_2}| \le \lceil k/2 \rceil - 1$. The main idea is to enumerate the sum in (4.4) by first choosing the vector $\mathbf{t} \in \{0, 1, \dots, \ell_1\}^k$, then a valid size-type vector $\mathbf{s} \in S_k(\mathbf{t})$, and lastly a tuple $(T_1, \dots, T_k) \in G(k)$ such that $\tau(T_1, \dots, T_k) = (\mathbf{t}, \mathbf{s})$. In terms of a formula, we obtain $G(k) = \sum_{\mathbf{t}, \mathbf{s}} M_{\mathbf{t},\mathbf{s}} \cdot \mu_{\mathbf{t},\mathbf{s}}$, where $M_{\mathbf{t},\mathbf{s}}$ is the number of such tuples and $\mu_{\mathbf{t},\mathbf{s}}$ is the average contribution to $G(k)$ of a $k$-tuple with given type vectors $(\mathbf{t}, \mathbf{s})$. We first aim to bound the average contribution $\mu_{\mathbf{t},\mathbf{s}}$.

Proposition 4.6. Let $\mathbf{t} \in \mathcal{T}_k$ and $\mathbf{s} \in S_k(\mathbf{t})$; then we have

Proof. Let $\mathbf{T} = (T_1, \dots, T_k) \in G(k)$ with $\tau(\mathbf{T}) = (\mathbf{t}, \mathbf{s})$. Here $T_1, \dots, T_k$ are in the order corresponding to the output of the exploring algorithm, hence we have $|T_i| = s_i$. We see

Thus, it only remains to show that the remaining (constantly many) summands are all of lower order.
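The constraints defining $\mathcal{T}_k$ are easy to enumerate exhaustively for small $k$. The following check (the helper name is ours) confirms a consequence one can read off directly from the definition: the no-consecutive-zeros condition together with $t_k > 0$ forces at most $\lfloor k/2 \rfloor$ zero entries, i.e. at most $\lfloor k/2 \rfloor$ components, since every component must contain at least two APs.

```python
from itertools import product

def type_vectors(k, ell1):
    """Enumerate T_k: vectors t in {0, ..., ell1}^k with no two consecutive
    zeros and t_k > 0 (each zero marks the start of a new component)."""
    return [t for t in product(range(ell1 + 1), repeat=k)
            if t[-1] > 0 and all(t[i] or t[i + 1] for i in range(k - 1))]

# no two consecutive zeros and t_k > 0 force at most floor(k/2) zeros
for k in range(2, 7):
    for t in type_vectors(k, 2):
        assert t.count(0) <= k // 2
```

This verifies only the bound $|I_0| \le \lfloor k/2 \rfloor$ that follows from the definition of $\mathcal{T}_k$ alone; the slightly sharper bound stated above is the one the text works with.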
Let $r \in [k]$ and fix an arbitrary subset $R \subseteq [k]$ of size $r$. The absolute value of its contribution to $E(Z_{T_1} \cdots Z_{T_k})$ is equal to

Furthermore, if this last inequality is not an equality, then

Next, suppose towards a contradiction that equality holds, so

But at the same time we have

and thus all intermediate inequalities above must be equalities. This happens for the first inequality when the $\{T_i\}_{i \in R}$ are pairwise disjoint, and for the second inequality when $(\bigcup_{i \in R} T_i) \cap (\bigcup_{i \notin R} T_i) = \emptyset$. But this in turn implies that for any $i \in R$, the set $T_i$ is disjoint from $\bigcup_{j \ne i} T_j$, so $t_i = t_{i+1} = 0$, contradicting (4.14).
Because these bounds are uniform over the choice of the $k$-tuple $\mathbf{T}$, the statement follows by taking the average.
We now aim at bounding the number of summands $M_{\mathbf{t},\mathbf{s}}$. To do so, recall that in the dependency graph $G_{\ell_1, \ell_2}$ defined in (3.1), each vertex represents an AP in $A_{\ell_1} \cup A_{\ell_2}$, and two vertices form an edge if and only if the corresponding APs have non-empty intersection.
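For small parameters this graph can be built directly. The sketch below (function and variable names are ours) lists all $\ell$-APs in $[n]$ for each prescribed length and joins two of them exactly when they intersect, as in (3.1):

```python
def dependency_graph(n, lengths):
    """Vertices: all ell-APs in [n] for each ell in `lengths`.
    Edges: unordered pairs of distinct APs with non-empty intersection."""
    verts = [tuple(a + i * b for i in range(ell))
             for ell in lengths
             for b in range(1, n)
             for a in range(1, n + 1)
             if a + (ell - 1) * b <= n]
    edges = {(S, T) for S in verts for T in verts
             if S < T and set(S) & set(T)}
    return verts, edges

# [6] contains six 3-APs; only the pairs {(1,2,3),(4,5,6)} and
# {(1,3,5),(2,4,6)} are disjoint, so the dependency graph has 13 edges.
verts, edges = dependency_graph(6, (3,))
assert len(verts) == 6 and len(edges) == 13
```

Passing two lengths, e.g. `dependency_graph(n, (ell1, ell2))`, yields the bipartite-flavoured graph on $A_{\ell_1} \cup A_{\ell_2}$ used in the text.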
We will construct tuples $\mathbf{T}$ with $\tau(\mathbf{T}) = (\mathbf{t}, \mathbf{s})$ in the order given by the reordering $\pi(\mathbf{T}) = (T_1, \dots, T_k)$. In particular, this means that we consider one component at a time. Observe that, by (4.14), the $j$-th component contains at least two APs, $T_{r_j}$ and $T_{r_j + 1}$. As $T_{r_j}$ starts a new component ($t_{r_j} = 0$), the number of choices for $T_{r_j}$ is at most $A_{s_{r_j}} = O(n^2 s_{r_j}^{-1})$ by Claim 2.1. Next we choose $T_{r_j + 1}$: (a) if $t_{r_j + 1} = 1$, then the number of choices is at most $O(n s_{r_j})$, since there are at most $s_{r_j}$ choices for the common vertex $x \in T_{r_j} \cap T_{r_j + 1}$, at most $s_{r_j + 1}$ choices for the position of $x$ within $T_{r_j + 1}$, and $O(n / s_{r_j + 1})$ for the common difference of $T_{r_j + 1}$; (b) if $t_{r_j + 1} = s_{r_j + 1} = s_{r_j}$, then there is only one possibility, namely $T_{r_j + 1} = T_{r_j}$; (c) otherwise, $T_{r_j + 1}$ is determined by choosing two elements from $T_{r_j}$ and their respective positions within $T_{r_j + 1}$, which amounts to at most $O(s_{r_j}^2 s_{r_j + 1}^2)$ many choices.
Similarly, for any remaining $i = r_j + 2, \dots, r_{j+1} - 1$ (there might be none), we use the following bounds on the number of choices for $T_i$: (a) if $t_i = 1$, then the number of choices is at most $O(n \ell_1)$, since there are at most $O(\ell_1)$ choices for the common vertex $x \in T_i \cap (T_{r_j} \cup \dots \cup T_{i-1})$, at most $s_i$ choices for the position of $x$ within $T_i$, and $O(n / s_i)$ for the common difference of $T_i$; (b) otherwise, $T_i$ is determined by choosing two elements from $T_{r_j} \cup \dots \cup T_{i-1}$ and their respective positions within $T_i$, which amounts to at most $O(\ell_1^2 s_i^2)$ many choices.
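The "position of $x$ times common difference" count in case (a) can be replayed directly: for a fixed element $x$, enumerating the position of $x$ within the AP and the common difference recovers every $\ell$-AP through $x$ exactly once, and there are $O(n)$ of them uniformly in $x$. A sketch with our own helper name:

```python
def aps_through(n, ell, x):
    """Count ell-APs in [n] containing x, enumerated by the position of x
    within the AP and the common difference (the choices used in case (a))."""
    count = 0
    for pos in range(ell):          # position of x within the AP
        for b in range(1, n):       # common difference
            a = x - pos * b         # first term of the AP
            if a >= 1 and a + (ell - 1) * b <= n:
                count += 1
    return count

# n = 10, ell = 3, x = 5: two APs with x first (b <= 2), four with x in the
# middle (b <= 4), two with x last (b <= 2), eight in total.
assert aps_through(10, 3, 5) == 8

# uniformly O(n): at most ell * (n-1)/(ell-1) <= 2n choices for ell >= 3
n, ell = 500, 5
assert max(aps_through(n, ell, x) for x in (1, n // 2, n)) <= 2 * n
```

Each position admits at most $(n-1)/(\ell-1)$ common differences, which gives the stated $O(n)$ bound after summing over the $\ell$ positions.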
With this preparation we are now ready to prove Lemma 4.4. We will bound the contribution of each $k$-tuple to $G(k) = \sum_{\mathbf{t}, \mathbf{s}} \mu_{\mathbf{t},\mathbf{s}} M_{\mathbf{t},\mathbf{s}}$ from above component-wise.

Proof of Lemma 4.4. First observe that Lemma 2.9 implies that for any $\ell \ge 3$ we have

$g_{\mathbf{t},\mathbf{s}}(i)\, \sigma_{s_i}^{-1}$,

Moreover, we recall the notation $r_j = \min \{i \in [k] \setminus \{r_1, \dots, r_{j-1}\} : t_i = 0\}$ and $r_{|I_0|+1} := k + 1$ used in the proof of Proposition 4.7. These indices split the interval $[k]$ into $|I_0|$ parts, i.e.
We now treat any (potentially) remaining indices $i = r_j + 2, \dots, r_{j+1} - 1$ and estimate $g_{\mathbf{t},\mathbf{s}}(i)\, \sigma_{s_i}^{-1}$ one by one. The distinction between the different regimes in Theorem 1.2 follows from Lemma 2.11, completing the proof.
The same proof also applies to the study of univariate fluctuations.⁵

Alternative proof of Theorem 1.1(b). For $3 \le \ell = \ell(n) = o(\log n)$ and $0 < p = p(n) < 1$ such that $p \ell^9 \to 0$ and $n^2 p^{\ell - 1} \ell^{-9} \to +\infty$, we obtain

Concluding remarks
The main topic at stake in this article was the joint distribution of the numbers of APs of different lengths in a random subset $M$ of the integers. In the most general setup, we would like to understand the growth behaviour of the family $\{X_\ell\}_{3 \le \ell \le n}$, where $X_\ell = X_\ell(M)$ denotes the number of $\ell$-APs which are (entirely) contained in $M$.
Here, we took a first step in this direction by determining the joint limiting distribution of $(X_{\ell_1}, X_{\ell_2})$ in $M = [n]_p$ for a significant range of parameters $p$ and $3 \le \ell_2 < \ell_1 = o(\log n)$.
We believe that our approach should also allow us to determine the limiting distribution of $r$-tuples $(X_{\ell_1}, X_{\ell_2}, \dots, X_{\ell_r})$ for $r \ge 3$ (within the intersection of their respective Gaussian regimes), and hence to give a functional central limit theorem for e.g. $(X_{\ell_s})_{s \in [0,1]}$ with $\ell = \ell(n) = o(\log n)$. In particular, it would be interesting to know whether for some constants $\ell_1, \ell_2, \dots, \ell_r$, with (constant) $r \ge 3$, the Gaussian limit becomes degenerate. We observed this for $r = 2$ when $\ell_1 = \ell_1(n), \ell_2 = \ell_2(n) \to +\infty$ sufficiently slowly: $X_{\ell_1}$ and $X_{\ell_2}$ are then either asymptotically uncorrelated or converge to the same Gaussian random variable (after re-normalisation).
Furthermore, recall that Theorem 1.2 uses the assumption $n^2 p^{\ell_1 - 1} \ell_1^{-9} \to +\infty$, which guarantees that both $X_{\ell_1}$ and $X_{\ell_2}$ are within their respective Gaussian regimes. One may thus ask what happens for smaller values of $p$. At least heuristically, our results for the overlap pair regime (i.e. $n p^{\ell_1 - 1} \ell_1 \to 0$) suggest that a good candidate for the joint limit consists of two independent random variables having the appropriate marginal distributions (Gaussian or Poisson) determined in Theorem 1.1.
Throughout the article, we focused on $\ell$-APs with $\ell = o(\log n)$, the reason being that the random set $[n]_p$ typically contains no longer APs as long as $p = o(1)$. In order to witness any $\ell$-APs with $\ell / \log n \to +\infty$, we would need to consider $p = p(n) \to 1$.
Borrowing some intuition from Gao and Sato's work [17] on large matchings in the random graph $G(n,p)$, namely the log-normal paradigm of Gao [16], we might expect to see another change of regime to a log-normal limiting distribution for very long APs. However, in this regime various estimates derived in this paper cease to hold, and we leave this as an open problem.
Another question of interest concerns the behaviour of the joint cumulants of $(X_{\ell_1}, X_{\ell_2})$ in the various regimes encountered here. In the Gaussian regime, since the moments of the rescaled random variables converge to the Gaussian moments, their cumulants of order $r \ge 3$ converge to 0. One can ask whether the BFS-type coding allows one to track such behaviour in a finer way, for instance via an asymptotic expansion.

⁵ Albeit with the mild additional assumption $p \ell^9 \to 0$, for technical reasons.
Lastly, we would like to move in a slightly different direction: let $0 < s < t$ and consider the coupling $[\lfloor tn \rfloor]_p = [\lfloor sn \rfloor]_p \cup \{\lfloor sn \rfloor + 1, \dots, \lfloor tn \rfloor\}_p$ for any $p \in [0,1]$. What can be said about the joint distribution of $\big( X_\ell([\lfloor sn \rfloor]_p), X_\ell([\lfloor tn \rfloor]_p) \big)$? More generally, does the random process $\big( X_\ell([\lfloor tn \rfloor]_p) \big)_{t \ge 0}$ satisfy a functional central limit theorem? What about $\big( X_{\ell_s}([\lfloor tn \rfloor]_p) \big)_{s,t \ge 0}$ for $\ell = \ell(n) = o(\log n)$?