Sharp Threshold Asymptotics for the Emergence of Additive Bases

A subset A of {0,1,...,n} is said to be a 2-additive basis for {1,2,...,n} if each j in {1,2,...,n} can be written as j = x + y, with x, y in A, x <= y. If we pick each integer in {0,1,...,n} independently with probability p = p_n tending to 0, thus obtaining a random set A, what is the probability that we have obtained a 2-additive basis? We address this question when the target sumset is [(1-alpha)n, (1+alpha)n] (or, equivalently, [alpha n, (2-alpha)n]) for some 0 < alpha < 1. In either formulation, the Stein-Chen method of Poisson approximation is used, in conjunction with Janson's inequalities, to tease out a very sharp threshold for the emergence of a 2-additive basis. Generalizations to k-additive bases are then given.


Introduction
In 1956, Erdős [3] answered a question posed in 1932 by Sidon by proving that there exists an infinite sequence S of natural numbers and constants c_1 and c_2 such that, for large n,

c_1 log n ≤ r_2(n) ≤ c_2 log n, (1)

where, for k ≥ 2, r_k(n) is the number of ways of representing the integer n as the sum of k elements from S; such an S is a so-called asymptotic basis of order k.
The result was generalized in the 1990 work of Erdős and Tetali [4], which established that there exists an infinite sequence S for which (1) is true for each fixed k ≥ 2, i.e., for each large n,

c_1 log n ≤ r_k(n) ≤ c_2 log n. (2)

To achieve this result, Erdős and Tetali constructed a random sequence S of natural numbers by including z in S with probability

p(z) = C ((log z)/z^{k−1})^{1/k},

where C is a determined constant and z_0 is the smallest integer such that p(z_0) ≤ 1/2. They then showed that this random sequence is a.s. an asymptotic basis of order k and that (2) holds a.s. for large n.
Note that this definition allows for x_i = x_j, i ≠ j. In [4], Erdős and Tetali showed that in some probability space, almost all infinite sequences S satisfy (2) and are asymptotic bases of order k. It is then natural to ask, for finite A, how small A can be while still being a k-additive basis. For k = 2, if A is a 2-additive basis we must clearly have |A|(|A| + 1)/2 ≥ n, so that |A| ≥ (1 + o(1))√(2n) (see, e.g., [7]).
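The counting constraint above is easy to check by brute force for tiny n. The following Python sketch (the function names are ours, and the exhaustive search is feasible only for very small n) finds the least size of a 2-additive basis inside {0, ..., n} and verifies the pair-counting bound |A|(|A|+1)/2 ≥ n:

```python
from itertools import combinations

def is_2_basis(A, n):
    """Check that every j in {1,...,n} equals x + y with x, y in A, x <= y."""
    sums = {x + y for x in A for y in A if x <= y}
    return all(j in sums for j in range(1, n + 1))

def min_2_basis_size(n):
    """Brute-force the least cardinality of a 2-additive basis inside {0,...,n}."""
    for m in range(1, n + 2):
        for A in combinations(range(n + 1), m):
            if is_2_basis(A, n):
                return m
    return None

for n in (6, 10, 12):
    m = min_2_basis_size(n)
    # A basis of size m yields at most m(m+1)/2 distinct 2-sums, so m(m+1)/2 >= n.
    assert m * (m + 1) // 2 >= n
```

Since {0, ..., n} itself is always a 2-additive basis (j = j + 0), the search always terminates.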
In this paper, we will use a probability model in which each integer in [n] ∪ {0} is chosen to be in A with equal (and low) probability p = p_n. We will then give sharp bounds on the probability that the random set A is a k-additive basis. It is evident that smaller numbers must be present in an additive basis, since, e.g., the only way to represent 1 in a 2-additive basis is as 1 + 0, and so in the random model edge effects come into play. Therefore, the random ensemble is unlikely to form an additive basis unless we adopt a different approach. This may be done in two ways, which lead to the following alternative definitions:

Definition 1. A ⊆ {0, 1, ..., n − 1} is a modular k-additive basis if each j ∈ {0, 1, ..., n − 1} can be written as j ≡ x_1 + ... + x_k (mod n), x_i ∈ A.

Definition 2. For 0 < α < 1, A ⊆ {0, 1, ..., n} is a truncated k-additive basis if each integer j ∈ [αn, (k − α)n] can be written as j = x_1 + ... + x_k, x_i ∈ A.

It turns out that definitive results using Definition 1 have been proved in the papers of Yadin [14] and Sandor [12], and we thus focus on developing results using Definition 2. Although our model differs from that of Erdős and Tetali (our set is constructed using constant probability p as opposed to the p(z) used in [4]), the threshold probabilities for the size of a truncated (or modular) k-additive basis end up being remarkably close to the input probabilities used by Erdős and Tetali to construct their asymptotic bases. We will stress the similarities between our results as appropriate. It also bears mentioning that the threshold size for our random A to be a 2-additive basis is, up to a logarithmic factor, similar to the bounds outlined above obtained via analytic techniques.
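The two alternative definitions can be made concrete in code. Here is a minimal Python sketch for k = 2 (the helper names are ours, not the paper's):

```python
import math

def is_modular_2_basis(A, n):
    """Definition 1 (k = 2): every residue mod n is a 2-sum of elements of A, mod n."""
    sums = {(x + y) % n for x in A for y in A}
    return len(sums) == n

def is_truncated_2_basis(A, n, alpha):
    """Definition 2 (k = 2): every j in [ceil(alpha*n), floor((2-alpha)*n)]
    equals x + y with x, y in A."""
    sums = {x + y for x in A for y in A}
    lo, hi = math.ceil(alpha * n), math.floor((2 - alpha) * n)
    return all(j in sums for j in range(lo, hi + 1))
```

The truncated window discards the hard-to-hit edges near 0 and 2n, which is exactly the point of Definition 2.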
Our work is organized as follows: We present threshold results on truncated 2-additive bases in Section 2, using the Stein-Chen method of Poisson approximation (see [2]), which is an alternative to the Janson inequalities and Brun's sieve used in [14], [12]. This method allows one not just to find the limiting probability that A forms an additive basis, but also to approximate the probability distribution L(X) of the random variable X, defined as the number of integers j that cannot be expressed as a sum of elements in A. In the modular case, we will, accordingly, show as a corollary how our results partially extend those in [14], [12]. In Section 4, we consider similar questions for truncated k-additive bases; at times, the Janson exponential inequalities ([1], [8]) are used to estimate some critical baseline quantities that were calculated exactly in Section 2 for k = 2.

Remark 4. Throughout the rest of the paper, we suppress the descriptor "additive", referring simply to "truncated k-bases" and, occasionally, "modular k-bases".

2-Additive Bases
We begin by investigating truncated 2-bases in our random model outlined above. We will then state the analogous result for modular 2-bases and explain how the modular results (and their generalization to arbitrary k-bases) extend the existing work in [14] and [12]. As with all results in this paper, we seek to find the threshold value at which the bases emerge.
We begin by selecting each integer from {0, 1, ..., n} independently with probability p = p_n. We thus obtain a random set A, and denote by X the number of integers in [αn, (2−α)n] that cannot be written in the form x_1 + x_2, x_i ∈ A. Evidently, A is a truncated 2-basis if and only if X = 0. We can write X = Σ_{j=⌈αn⌉}^{⌊(2−α)n⌋} I_j, where I_j equals one or zero according as the integer j cannot or can be represented as a 2-sum of elements in A. To simplify the notation a bit, we will write simply Σ_{j=αn}^{(2−α)n} I_j for Σ_{j=⌈αn⌉}^{⌊(2−α)n⌋} I_j in the sequel. The main results of this section are summarized in the following theorem:

Theorem 5. Pick each integer in [n] ∪ {0} independently with probability p = p_n → 0, thus getting a random set A.
i. Let Y ∼ Po(λ = E(X)) and let δ > 0; then d_TV(X, Y) → 0 for all p satisfying (3).

Proof. Here and throughout the paper, let S denote the random sumset generated by A. The first step in the proof will be to calculate the precise asymptotics of E(X).

Proposition 6. With X as above, E(X) = (1 + o(1)) (4/p²) exp{−αnp²/2}.

Proof. This is a critical computation and needs to be justified for the entire range of p's that we encounter, in particular for all p satisfying (3). One direction is easy; the other requires more care. With p defined as in (4), namely

p = √((2 log n − 2 log log n + αA_n)/(αn)),

we see that E(X) = (1 + o(1)) · 2α exp{−αA_n/2}. As is often the case with threshold phenomena, Markov's inequality can be used to easily establish the first part of ii. To establish the second and third statements in part ii of the theorem, we go beyond estimating the point probability P(X = 0), using instead the Stein-Chen method of Poisson approximation [2] to establish a total variation approximation for the distribution L(X) of X. If the distribution of X is approximately Poisson, then we will have

e^{−λ} − ε_n ≤ P(X = 0) ≤ e^{−λ} + ε_n,

where λ = E(X) is the mean of the approximating Poisson variable and ε_n is the total variation error bound for the approximation. Throughout, writing d_TV(A, B) instead of the more appropriate d_TV(L(A), L(B)), we seek to bound d_TV(X, Y). Following [2], we first need to determine, for each j separately, an auxiliary sequence of variables J^j_i defined on the same probability space with the property that

L(J^j_1, J^j_2, ...) = L(I_1, I_2, ... | I_j = 1). (5)
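For a fixed j, the distinct pairs {x, j − x} involve pairwise disjoint integers, so P(I_j = 1) factors exactly into a product over pairs; E(X), the quantity whose asymptotics Proposition 6 describes, can therefore be computed in closed form for any n and p. A Python sketch (our own code and function names, offered as a numerical check only):

```python
import math

def p_missing(j, n, p):
    """Exact P(I_j = 1): no representation j = x + y with x <= y, x, y in A.
    Distinct pairs {x, j - x} involve disjoint integers, so the events that
    each pair is not fully selected are independent."""
    lo = max(0, j - n)                       # smallest admissible x
    num_pairs = max(0, (j - 1) // 2 - lo + 1)
    prob = (1 - p * p) ** num_pairs
    if j % 2 == 0 and lo <= j // 2 <= n:     # the representation j = j/2 + j/2
        prob *= 1 - p
    return prob

def mean_X(n, p, alpha):
    """E(X): sum of P(I_j = 1) over the truncated window [alpha*n, (2-alpha)*n]."""
    lo, hi = math.ceil(alpha * n), math.floor((2 - alpha) * n)
    return sum(p_missing(j, n, p) for j in range(lo, hi + 1))
```

For small n this exact value can be checked against full enumeration over all subsets of {0, ..., n}.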
Our explicitly constructed coupling of the J^j_i's is as follows: If I_j = 1, set J^j_i = I_i for every i. Otherwise, for each pair x_1 ≠ x_2 with x_1 + x_2 = j and x_1, x_2 ∈ A, we remove x_1 (but not x_2) from A with probability p(1−p)/(1−p²), remove x_2 (but not x_1) with probability p(1−p)/(1−p²), and remove both x_1 and x_2 from A with probability (1−p)²/(1−p²). If x ∈ A and x + x = j, then we remove x from A with probability 1. Finally, define J^j_i = 1 if i ∉ S after the above coupling is implemented. It is clear that (5) is satisfied, since we have "de-selected" offending integers based on the conditional probability of one or both integers in a pair being absent, given that they are not both present. The total variation bounds derived in [2] are expressed in terms of the probability that the coupled indicator variables differ after the coupling is implemented, i.e., I_i = 1, J^j_i = 0 or I_i = 0, J^j_i = 1. Now, P(I_i = 1, J^j_i = 0) = 0, since if the integer i is not present in S, it cannot magically appear after some integers have been de-selected.
The formula we need thus reduces to the sum of two terms. The first is bounded above by a quantity that vanishes under a condition that is satisfied for p satisfying (3).
To bound the second term in the sum, we begin by noting that ((1 − e^{−λ})/λ) Σ_j P(I_j = 1) = 1 − e^{−λ} ≤ 1. We further bound this second term by conditioning on the number of 2-sums of i that are present pre-coupling, denoting this number by B_i. Denote the sumset of A post-coupling by S*. Note that if there exists x such that i = 2x, this case must be treated separately (we write ⌈i/2⌉ to simplify the notation). Our next step is to bound P(i ∈ S* | B_i = k). To bound this term, we will assume all elements of the 2-sums of i are part of 2-sums of j and thus have positive probability of being removed by the coupling process. Fix x, y, x ≠ y, such that x + y = i. For ease of notation, define P_x to be the event that x is part of a pre-coupling 2-sum of j, and analogously define P_y. Further define R_x to be the event that x is removed from A by the coupling, and analogously define R_y. Simple calculations follow, since the only undesirable outcome is if only the "other" component of the 2-sum of j is removed in lieu of x; in the case where x = y, the only case we would need to consider is that x itself is removed. Let C(p) = 2p + 2p³ + p². Our above calculations yield a bound on max_j Σ_{i≠j} P(I_i = 0, J^j_i = 1) in terms of C(p), and hence Σ_1 → 0 if p satisfies (3). For Σ_2, a similar estimate applies, and we conclude that d_TV(X, Y) → 0 if p satisfies (3), which finishes the proof. Note that if we consider a p not in this range, the result will hold by monotonicity.
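The random variable X analyzed in the proof above can also be simulated directly, which makes the threshold behavior easy to observe empirically. A minimal Python sketch (our own code; `sample_X` is our name):

```python
import math
import random

def sample_X(n, p, alpha, rng=random):
    """Draw A from {0,...,n} with i.i.d. inclusion probability p, and return X:
    the number of j in [ceil(alpha*n), floor((2-alpha)*n)] with no 2-sum from A."""
    A = [z for z in range(n + 1) if rng.random() < p]
    sums = {x + y for x in A for y in A}
    lo, hi = math.ceil(alpha * n), math.floor((2 - alpha) * n)
    return sum(1 for j in range(lo, hi + 1) if j not in sums)
```

Averaging sample_X over many draws, with p slightly above or below the threshold scale p ≈ √((2/α)(log n)/n), exhibits the emergence of the truncated 2-basis.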
The modular 2-basis result is proved similarly, and suppressing all details we state the result:

Theorem 7. Pick each integer in [n − 1] ∪ {0} independently with probability p = p_n, thus getting a random set A. Then:

i. Let X denote the number of missing integers in the modulo-n 2-sumset of A, and let Y ∼ Po(λ = E(X)). Let δ > 0; then d_TV(X, Y) → 0 for all p in the corresponding window about the threshold.

The second part of Theorem 7 was proven in [14] using Janson's correlation inequalities and in [12] using the method of Brun's sieve (see, for example, [1]). In [12], the author is able to derive part i of Theorem 7 at the threshold value of p, while the Stein-Chen method allows us to derive the result in a window about the threshold, thus somewhat generalizing the previously known results.
Remark 8. We have so far dealt with random sets of fixed expected size, and now indicate briefly how we can easily transition to the case of random sets of fixed size. This can be done for all the results in this paper, but we indicate the method in the context of Theorem 5: Choose one of the (n+1 choose |A|) sets of size |A| at random, and suppose |A| = √(Kn log n), K > 2/α (this corresponds to the expected size of A with p = √(K log n/n)). Then we reconcile the two models, with p = √(K log n/n), by Theorem 5 and the central limit theorem. A similar argument holds if K < 2/α, or even if |A| = √((2/α)n log n − (2/α)n log log n + nA_n), where |A_n| → ∞. It follows that we can easily go back and forth from the independent model to the fixed-set-size model except possibly when we are at the threshold.
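The two sampling schemes compared in Remark 8 can be sketched side by side in Python (the function names and the parameter choices below are ours and purely illustrative):

```python
import math
import random

def sample_binomial_model(n, p, rng):
    """Independent model: each integer of {0,...,n} enters A with probability p."""
    return {z for z in range(n + 1) if rng.random() < p}

def sample_fixed_size_model(n, m, rng):
    """Fixed-size model: A is a uniformly random m-subset of {0,...,n}."""
    return set(rng.sample(range(n + 1), m))

# Matching the models: take m close to the expected size (n+1)p of a binomial draw.
n, K = 1000, 5.0                      # K is an illustrative constant
p = math.sqrt(K * math.log(n) / n)
m = round((n + 1) * p)
rng = random.Random(1)
A1 = sample_binomial_model(n, p, rng)
A2 = sample_fixed_size_model(n, m, rng)
```

Away from the threshold, the two draws behave alike because the binomial size concentrates around (n+1)p, which is the content of the remark.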

Modular k-Additive Bases
Let us now briefly turn our attention to modular k-additive bases. We define A as in Section 2, choosing integers to be in A with probability p = p_n. Define S = {x_1 + ... + x_k (mod n) : x_i ∈ A}. We define I_j to be 0 if j ∈ S and 1 if j ∉ S. As before, let X = Σ_{j=0}^{n−1} I_j. In [12] it was proven, again using the method of Brun's sieve, that:

Theorem 9. With X defined as above, if we choose elements of [0, n − 1] to be in A with probability given by the threshold expression of [12], then P(X = 0) converges to the Poisson limit determined there.

Using the Stein-Chen method, we are able to reproduce the above result and prove total variation convergence for all p ∈ [p_0, 1], where p_0 > ((Ak! log n)/n^{k−1})^{1/k}, with A > (k − 1)/k. We will provide a proof of our contribution to the window of convergence, omitting proofs of results that were derived also in [12]. With X as above, we first derive (as in [12]) the asymptotics of E(X). Following [2], we next need to determine, for each j separately, an auxiliary sequence of variables J^j_i defined on the same probability space with the property that

L(J^j_1, J^j_2, ...) = L(I_1, I_2, ... | I_j = 1). (6)

Such a coupling can probably be described explicitly, as we did for k = 2, but there is no need to do so: It is clear that the more integers in A, the higher the probability that I_j is 0, as the integer j is more likely to be representable as a k-sum. Therefore, the I_j's are decreasing functions of the baseline i.i.d. random variables {Y_i} with distribution P(Y_0 = 1) = p, P(Y_0 = 0) = 1 − p, and a monotone coupling satisfying (6) exists. Thus, with λ := E(X) and Y ∼ Po(λ), we can apply Theorem 2.E and Corollary 2.C.4 in [2]. From the asymptotic formula for E(X), the first error term goes to 0 as n → ∞ under the stated conditions on p. Next, we need to bound the growth of E(I_i I_j) for i < j. Now, E(I_i I_j) equals the probability that neither i nor j is in S, and thus E(I_i I_j) = P(S′ = 0), where

S′ = Σ_{r∈{i,j}} Σ_{l=1}^{k} Σ_{a={a_1,...,a_l}: Σ a_i ≡ r (mod n)} J_{r,l,a},

and where J_a := J_{r,l,a} equals one if the l integers in a that sum to r (with possible repetition) are all selected to be in A.
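The modular indicators I_j are straightforward to compute via a k-fold modular sumset; a small Python sketch (helper names ours):

```python
def modular_k_sumset(A, n, k):
    """All residues mod n expressible as a sum of exactly k elements of A
    (repetitions allowed)."""
    sums = {0}
    for _ in range(k):
        sums = {(s + a) % n for s in sums for a in A}
    return sums

def modular_X(A, n, k):
    """X = number of residues j in {0,...,n-1} with I_j = 1 (j not a k-sum mod n)."""
    return n - len(modular_k_sumset(A, n, k))
```

For example, with A = {0, 1}, n = 5, and k = 3, the 3-sums mod 5 are {0, 1, 2, 3}, so X = 1.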
We have (see, for example, Lemma 1.5 in [12]) the needed bound on the correlation terms E(J_{r,l,a} J_{s,m,b}). Set {r, l, a} ∼ {s, m, b} if a ∩ b ≠ ∅ and {r, l, a} ≠ {s, m, b}. For the associated dependency graph, we see that

Δ = Σ_{{r,l,a}} Σ_{{s,m,b}∼{r,l,a}} E(J_{r,l,a} J_{s,m,b}),

and thus Janson's inequality gives the required upper bound. Returning to (7), we see that the total variation distance goes to 0 under the stated conditions on p.

Truncated k-Additive Bases

As before, we wish to represent each j ∈ [αn, (k − α)n] as j = x_1 + ... + x_k for x_i ∈ A, where x_1 ≤ x_2 ≤ ... ≤ x_k. Again, for each j, let I_j equal 1 if j cannot be expressed as a k-sum of elements of A and 0 otherwise. We set X := Σ_{j=αn}^{(k−α)n} I_j, and note that X = 0 ⇔ A is a truncated k-basis.

Theorem 10. With X defined as above, if we choose elements of {0} ∪ [n] to be in A with probability p = p_n in a window about the threshold identified below, then d_TV(X, Po(E(X))) → 0; in particular, P(X = 0) → e^{−E(X)}.

Before proving this theorem, we need some preliminary work. Let S_j be the set of all unordered k-tuples of nonnegative integers in {0} ∪ [n] that sum to j.

Claim 11. For j ∈ [αn, n], |S_j| = (1 + o(1)) j^{k−1}/(k!(k−1)!).

Proof. The number of ordered k-tuples of nonnegative integers that sum to j is (j+k−1 choose j) ∼ j^{k−1}/(k−1)!. All such tuples are composed entirely of numbers in {0} ∪ [j], and at most k² · n · (j+k−3 choose j) = O(j^{k−2}) of these contain a number repeated once or more often. We can disregard these in the asymptotic analysis and consider the remaining unordered and ordered tuples. Each remaining unordered tuple appears k! times among the remaining ordered tuples, giving us the desired first-order asymptotics.
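The counts appearing in Claim 11 can be cross-checked by brute force for small parameters. In this Python sketch (names ours, feasible only for tiny j and k), the ordered count matches the stars-and-bars binomial coefficient whenever n ≥ j:

```python
from itertools import combinations_with_replacement, product
from math import comb

def ordered_tuples(j, n, k):
    """Ordered k-tuples from {0,...,n} summing to j."""
    return sum(1 for t in product(range(n + 1), repeat=k) if sum(t) == j)

def unordered_tuples(j, n, k):
    """|S_j|: multisets of k elements from {0,...,n} summing to j."""
    return sum(1 for t in combinations_with_replacement(range(n + 1), k)
               if sum(t) == j)

# With n >= j, every composition of j into k nonnegative parts is admissible,
# so the ordered count equals C(j + k - 1, k - 1).
assert ordered_tuples(8, 8, 3) == comb(10, 2)
```

Dividing the ordered count by k! approximates |S_j| well once tuples with repeated entries become negligible, which is the substance of the claim.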
Lastly, we have:

Proof. We will use the fact that |S_j|, i.e., the number of partitions of j with at most k parts, each part of which is less than or equal to n, is the coefficient of q^j in the q-binomial coefficient (n+k choose k)_q. It is well known (see, for example, [11]) that (n+k choose k)_q = Σ_i a_i q^i is a polynomial in q and that its coefficients are unimodal, namely a_{j−1} ≤ a_j for j ≤ nk/2. Claim 11 provides the asymptotics of the relevant coefficient, and the proof now follows directly from Claim 12.
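The generating-function fact used in this proof is easy to verify computationally. The sketch below (our code) builds the Gaussian binomial coefficient via the q-Pascal recurrence and compares its coefficients against brute-force multiset counts, including the unimodality used in the proof:

```python
from itertools import combinations_with_replacement

def gauss(m, k):
    """Coefficient list of the Gaussian binomial [m choose k]_q,
    via the q-Pascal recurrence [m,k]_q = [m-1,k-1]_q + q^k [m-1,k]_q."""
    if k < 0 or k > m:
        return [0]
    if k == 0 or k == m:
        return [1]
    a, b = gauss(m - 1, k - 1), gauss(m - 1, k)
    out = [0] * max(len(a), len(b) + k)
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i + k] += c
    return out

n, k = 4, 3
coeffs = gauss(n + k, k)
for j in range(n * k + 1):
    # Coefficient of q^j counts partitions of j with <= k parts, each <= n,
    # equivalently k-multisets from {0,...,n} with sum j.
    brute = sum(1 for t in combinations_with_replacement(range(n + 1), k)
                if sum(t) == j)
    assert coeffs[j] == brute
# Unimodality up to the middle degree nk/2, as used in the proof.
assert all(coeffs[j - 1] <= coeffs[j] for j in range(1, n * k // 2 + 1))
```

The coefficient list is also palindromic, reflecting the symmetry of the Gaussian binomial coefficient.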
We next begin our analysis of E(X) with a preliminary claim.
Claim 14. With S_j defined as above, |T_j| = (1 + o(1))|S_j|, where T_j is the set of k-tuples of distinct elements that add to j.

With p from the statement of the theorem, we split E(X) into two sums Σ_1 and Σ_2 and obtain a chain of estimates, in which we used the facts that x/(x + 1) ≤ 1 − e^{−x} in the fourth line and, in the final line of the display, that for all choices of p considered in the theorem, n^{k−2}p^k → 0. Finally, the geometric bound in the second line follows from the fact that the ratio of consecutive terms in the sum is bounded away from 1. A lower bound on Σ_1 (and thus on E(X)) is obtained by using elementary integration by parts to derive a tight estimate for the integral, in much the same way that Gaussian tails are analyzed (the Gaussian case being k = 3). Setting, for t > 0, Ψ(t, k) to be the corresponding tail integral, we obtain, for another two constants E, E′ > 0, matching upper and lower estimates, using (9). It is easy to verify that Σ_2 = o(1)Σ_1, and thus E(X) = (1 + o(1))Σ_1. Let λ = E(X) and let Y ∼ Po(λ). First we note that E(X) tends to zero, to infinity, or to 2α^{k−1}e^{−A/K} according as A_n → ∞, A_n → −∞, or A_n → A, with p as stated in the theorem. To complete the proof of Theorem 10, we use Poisson approximation to show that the total variation distance between X and Y converges to 0, so that, in particular, P(X = 0) → e^{−λ}.
Using Theorem 2.C of [2] with Γ⁰_α = Γ⁻_α = ∅, we must bound Σ_j P²(I_j = 1) together with the cross terms for i ≠ j, ℓ; for the cross-product term E(I_j I_ℓ) we use Janson's inequality as in our Stein-Chen treatment of Theorem 9. For fixed j, ℓ, note that P(I_j I_ℓ = 1) = P(S′ = 0), where

S′ = Σ_{r∈{j,ℓ}} Σ_{s=1}^{k} Σ_{a={a_1,...,a_s}: Σ a_i = r} J_{r,s,a},

and where J_a := J_{r,s,a} equals one if the s integers in a that sum to r (with possible repetition) are all selected to be in A. We then use Janson's inequality and the worst-case estimate for the Δ that arises.

Open Problems

In both the modular and truncated cases, the representation function question has yet to be addressed. We have preliminary results in this direction and plan on publishing them in a future paper.

d_TV(X, Y) ≤ ((1 − e^{−λ})/λ) [ Σ_j P²(I_j = 1) + Σ_j Σ_{ℓ≠j} (E(I_j I_ℓ) − E(I_j)E(I_ℓ)) ];

the above is just a variation of the bound used in the proof of Theorem 5. Now

((1 − e^{−λ})/λ) Σ_j P²(I_j = 1) ≤ max_j P(I_j = 1) ≤ max_j exp{−|T_j| p^k (1 + o(1))} ≤ e^{−δ n^{k−1} p^k (1 + o(1))},

where δ is a constant not depending on p or n, and this term converges to 0 if p = ((G log n)/n^{k−1})^{1/k} for any constant G > 0. For the double sum, we start by using the estimate

P(I_i = 1) ≥ exp{−|S_i| p^k (1 + O(1/np))}.