Tower-type bounds for Roth's theorem with popular differences

Green developed an arithmetic regularity lemma to prove a strengthening of Roth's theorem on arithmetic progressions in dense sets. It states that for every $\epsilon>0$ there is some $N_0(\epsilon)$ such that for every $N \ge N_0(\epsilon)$ and $A \subset [N]$ with $|A| = \alpha N$, there is some nonzero $d$ such that $A$ contains at least $(\alpha^3 - \epsilon) N$ three-term arithmetic progressions with common difference $d$. We prove that the minimum $N_0(\epsilon)$ in Green's theorem is an exponential tower of 2s of height on the order of $\log(1/\epsilon)$. Both the lower and upper bounds are new. This shows that the tower-type bounds that arise from the use of a regularity lemma in this application are quantitatively necessary.


Introduction
A celebrated theorem of Roth [25] states that for each $\alpha > 0$ there is a least positive integer $N(\alpha)$ such that if $N \ge N(\alpha)$ and $A \subset [N] := \{1, \ldots, N\}$ with $|A| \ge \alpha N$, then $A$ contains a three-term arithmetic progression. Over the past six decades, many researchers have made great efforts toward understanding the growth of this function, and despite the introduction of important tools in these efforts, the growth of $N(\alpha)$ is still not well understood. The upper bound was improved by Heath-Brown [21], Szemerédi [30], Bourgain [7,8], Sanders [28,27], and most recently Bloom [5] (see also [6]). The lower bound of Behrend [2] was recently improved a bit by Elkin [10] (see also Green and Wolf [20] for a shorter proof). The best known bounds are of the form
$$\alpha^{-\Omega(\log(\alpha^{-1}))} \le N(\alpha) \le 2^{O(\alpha^{-1}(\log \alpha^{-1})^4)}.$$
Szemerédi [29] extended Roth's theorem to show that any dense set of integers contains arbitrarily long arithmetic progressions. Szemerédi's proof developed an early version of Szemerédi's regularity lemma [31], which gives a rough structural result for large graphs and is arguably the most powerful tool developed in graph theory. It roughly says that any graph can be equitably partitioned into a bounded number of parts so that between almost all pairs of parts, the graph behaves random-like. Szemerédi's proof of the regularity lemma gives an upper bound on the number of parts which is tower-type in an approximation parameter, which gives a seemingly poor bound for the various applications of the regularity lemma. For over two decades there was some hope that a substantially better bound might hold, leading to better bounds in the many applications. This hope was shattered by Gowers [16], who proved that the bound on the number of parts in the regularity lemma must grow as a tower-type function. Further results improving on some aspects of the lower bound were obtained in [9,11,24].
It has been a major program over the last few decades to find new proofs of the various applications of Szemerédi's regularity lemma and its variants that avoid using the regularity lemma and obtain much better quantitative bounds. This program, popularized by Szemerédi and others, has been quite successful, leading to the development of powerful new methods, such as in Gowers' new proof of Szemerédi's theorem [17], which introduced higher order Fourier analysis, and in the resolution of many open problems in extremal combinatorics using the powerful probabilistic technique known as dependent random choice (see the survey [15]). However, until now it was unclear if one could
Theorem 1.5 (Lower bound for intervals). There exist constants $c, \alpha_0 > 0$ such that for every $0 < \alpha \le \alpha_0$, $0 < \epsilon \le \alpha^{12}$ and $N \le \mathrm{tower}(c \log(1/\epsilon))$, there exists $A \subset [N]$ with $|A| \ge \alpha N$ such that for every positive integer $d \le N/2$, one has $x, x + d, x + 2d \in A$ for at most $(\alpha^3 - \epsilon)(N - 2d)$ many integers $x$.
Organization. In the next section, we introduce some helpful notation and preliminaries, including some basic facts from discrete Fourier analysis. In Section 3, we give an overview of the proof strategies for our results. In Section 4, we prove Theorems 1.3 and 1.4, giving the upper bound results. In Section 5, we give some auxiliary results for the probabilistic lower bound construction. In Section 6, we give a lower bound construction for groups which are the product of prime cyclic groups with fast growing order. In Section 7, we then use this construction as an important ingredient to obtain the lower bound construction in intervals.
We often omit floor and ceiling signs when they are not crucial for clarity of presentation.

Notations and Preliminaries
Averaging and expectation. We use $\mathbb{E}$ to denote the averaging operator: given a function $f$ on a finite set $S$, denote $\mathbb{E}_{x \in S} f(x) := \frac{1}{|S|} \sum_{x \in S} f(x)$. We may write $\mathbb{E}_x$ instead of $\mathbb{E}_{x \in S}$ if the domain of $x$ is clear from context (usually over a group). The $L^p$ norms are defined in the usual way: $\|f\|_p := (\mathbb{E}_x |f(x)|^p)^{1/p}$ and $\|f\|_\infty := \sup_x |f(x)|$. As our lower bound construction is probabilistic, we will also need to consider expectations of random variables, for which we use the usual notation $\mathbf{E}$ for expectation (note the difference in font compared to the averaging operator $\mathbb{E}$).
Fourier transform and convolutions. Given a finite abelian group $G$, let $\widehat{G}$ denote its dual group, whose elements are characters of $G$, i.e., homomorphisms $\chi : G \to S^1 := \{z \in \mathbb{C} : |z| = 1\}$. The Fourier transform of $f : G \to \mathbb{C}$ is a function $\widehat{f} : \widehat{G} \to \mathbb{C}$ defined by $\widehat{f}(\chi) := \mathbb{E}_{x \in G} f(x) \overline{\chi(x)}$. We write $\chi^{1/2}$ to denote the character given by $x \mapsto \chi(x/2)$ (we will always work with abelian groups of odd order so that $x/2$ makes sense).
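It is often convenient to explicitly identify the dual group $\widehat{G}$ with $G$ (they are isomorphic for finite abelian groups). For example, for $f : \mathbb{Z}_N \to \mathbb{C}$ and $r \in \mathbb{Z}_N$, we identify $r$ with the character $\chi_r(x) = e(xr/N)$, where we use the standard notation for the complex exponential $e(t) := \exp(2\pi i t)$, $t \in \mathbb{R}$. Likewise, for $f : \mathbb{F}_p^n \to \mathbb{C}$ and $r \in \mathbb{F}_p^n$, we identify $r$ with the character $\chi_r(x) = e(x \cdot r/p)$, where $x \cdot r = x_1 r_1 + \cdots + x_n r_n \in \mathbb{F}_p$ is the dot product in $\mathbb{F}_p^n$. Thus, $\widehat{f}(r) = \mathbb{E}_{x \in \mathbb{Z}_N} f(x) e(-xr/N)$ in the first case and $\widehat{f}(r) = \mathbb{E}_{x \in \mathbb{F}_p^n} f(x) e(-x \cdot r/p)$ in the second.
Given two functions $f$ and $g$ on $G$, their convolution $f * g$ is defined by $(f * g)(x) := \mathbb{E}_{y \in G} f(y) g(x - y)$. We recall several useful properties of the Fourier transform: the inversion formula $f(x) = \sum_{\chi \in \widehat{G}} \widehat{f}(\chi) \chi(x)$, Parseval's identity $\mathbb{E}_x f(x) \overline{g(x)} = \sum_{\chi \in \widehat{G}} \widehat{f}(\chi) \overline{\widehat{g}(\chi)}$, and the convolution identity $\widehat{f * g} = \widehat{f} \cdot \widehat{g}$.
The Fourier transform is also fundamentally related to the count of 3-APs (or the count of solutions to linear equations in general), as is evident in the following key identity already used in the proof of Roth's theorem [25]:
$$\mathbb{E}_{x, d \in G} f(x) f(x + d) f(x + 2d) = \sum_{\chi \in \widehat{G}} \widehat{f}(\chi)^2 \widehat{f}(\chi^{-2}). \qquad (1)$$
It can be easily shown by substituting the Fourier inversion formula for each factor and expanding.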
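As a sanity check of the key identity relating the 3-AP count to Fourier coefficients, the following minimal sketch compares both sides numerically on $\mathbb{Z}_N$. It assumes the averaging normalization $\widehat{f}(r) = \mathbb{E}_x f(x) e(-xr/N)$; the function names and the choice $N = 15$ are ours, for illustration only.

```python
import cmath
import random

def fourier(f):
    """Fourier coefficients f_hat(r) = E_x f(x) e(-x r / N) on Z_N (assumed normalization)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * r / n) for x in range(n)) / n
            for r in range(n)]

def ap3_density(f):
    """Physical side: E_{x,d} f(x) f(x+d) f(x+2d) over all x, d in Z_N."""
    n = len(f)
    total = sum(f[x] * f[(x + d) % n] * f[(x + 2 * d) % n]
                for x in range(n) for d in range(n))
    return total / n ** 2

def ap3_density_fourier(f):
    """Fourier side: sum over r of f_hat(r)^2 * f_hat(-2r)."""
    n = len(f)
    fh = fourier(f)
    return sum(fh[r] ** 2 * fh[(-2 * r) % n] for r in range(n))

rng = random.Random(0)
f = [rng.random() for _ in range(15)]  # arbitrary test function Z_15 -> [0, 1]
lhs = ap3_density(f)
rhs = ap3_density_fourier(f)
```

The two quantities `lhs` and `rhs` should agree up to floating-point error, and the imaginary part of `rhs` should vanish.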
Densities. For an abelian group $G$ with odd order and a function $f : G \to [0, 1]$, we define the density of 3-APs of $f$ as $\Lambda(f) := \mathbb{E}_{x, d \in G} f(x) f(x + d) f(x + 2d)$. We define the density of 3-APs with common difference $d$ of $f$ as $\Lambda_d(f) := \mathbb{E}_{x \in G} f(x) f(x + d) f(x + 2d)$. For a subset $A$ of $G$, when we say "density of 3-APs" of $A$, we mean that of its indicator function as used in Theorem 1.5. In this case, we take the average only over 3-APs supported in $[N]$. It is easy to see that the density from the second definition is always at least the density from the first definition. In particular, the upper bound using the first definition implies the upper bound using the second definition. Similarly, the lower bound using the second definition implies the lower bound using the first definition. Thus, we give the stronger result in each case.
Constants. We use $c > 0$ and $C > 0$ to denote small and large absolute constants, though their values may differ at every instance. One could imagine attaching a unique subscript to each appearance of $c$ and $C$.
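The density of 3-APs with a fixed common difference can be computed by brute force. The sketch below is illustrative only: it treats a function on $\mathbb{Z}_N$ cyclically, and for the interval $[N]$ it averages over the $N - 2d$ starting points $x$ for which $x, x + d, x + 2d$ all lie in $[N]$, matching the count in Theorem 1.5. Indices are shifted to $0, \ldots, N - 1$, and the function names are our own.

```python
def ap3_density_diff_cyclic(f, d):
    """E_x f(x) f(x+d) f(x+2d) with indices taken modulo N = len(f)."""
    n = len(f)
    return sum(f[x] * f[(x + d) % n] * f[(x + 2 * d) % n] for x in range(n)) / n

def ap3_density_diff_interval(f, d):
    """Average of f(x) f(x+d) f(x+2d) over the N - 2d progressions inside the interval."""
    n = len(f)
    count = n - 2 * d
    if d <= 0 or count <= 0:
        raise ValueError("need 0 < d < N/2")
    return sum(f[x] * f[x + d] * f[x + 2 * d] for x in range(count)) / count

# Indicator function of A = {0, 1, 2, 4} inside {0, ..., 6}.
indicator = [1, 1, 1, 0, 1, 0, 0]
interval_d1 = ap3_density_diff_interval(indicator, 1)  # only (0, 1, 2) lies in A
interval_d2 = ap3_density_diff_interval(indicator, 2)  # only (0, 2, 4) lies in A
cyclic_d1 = ap3_density_diff_cyclic(indicator, 1)
```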

Overview of strategy
In this section we sketch the proof ideas of our main theorems, starting with the upper bound (Theorems 1.4 and 1.3), followed by the lower bound (Theorem 1.5).
In both cases, we prove the "functional" versions of the theorems. That is, instead of working with subsets $A \subset G$, we work with functions $f : G \to [0, 1]$, which can also be viewed as subsets with weighted elements. A subset $A \subset G$ can be represented by its indicator function $1_A$. Conversely, given a function $f : G \to [0, 1]$, we can produce from it a random subset $A \subset G$ obtained by putting each element $x \in G$ independently into $A$ with probability $f(x)$. The resulting $A$ has similar statistical properties compared to $f$ due to concentration. Working with functions affords us greater flexibility, which is convenient for both parts of the proofs.
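The passage from a function to a random subset can be sketched as follows. This is a minimal illustration, not the sampling argument of Appendix A; the helper name `sample_subset` and the test function are hypothetical.

```python
import random

def sample_subset(f, rng):
    """Put each x into A independently with probability f(x)."""
    return {x for x, p in enumerate(f) if rng.random() < p}

rng = random.Random(0)
# f is 0 or 1 on the first 100 points and 1/2 on the remaining 1000 points.
f = [0.0, 1.0] * 50 + [0.5] * 1000
A = sample_subset(f, rng)
size_on_half_block = sum(1 for x in A if x >= 100)
```

By concentration, `size_on_half_block` should be close to 500, the sum of $f$ over the last 1000 points.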

Theorem 3.2 (Upper bound for abelian groups, functional version).
There exists a constant $C > 0$ such that the following holds. Let $\epsilon > 0$ and let $G$ be a finite abelian group of odd order with
Green [18] proved the above theorems with a slightly worse bound of $\mathrm{tower}(C(1/\epsilon)^C)$ instead of $\mathrm{tower}(C \log(1/\epsilon))$. Let us first give a quick sketch of Green's approach. It is easier to first explain it for the finite field vector space setting $G = \mathbb{F}_p^n$ with $p$ fixed. Green begins by establishing a regularity lemma. Given $f : G \to [0, 1]$, one finds a subspace $H$ of $G = \mathbb{F}_p^n$ of codimension at most $\mathrm{tower}(C(1/\epsilon)^C)$ such that inside almost all translates of $H$, $f$ behaves "pseudorandomly" in the sense of having small Fourier coefficients (other than the principal "zeroth" Fourier coefficient that records the density). This subspace $H$ is obtained iteratively, similar to the standard energy-increment proofs of regularity lemmas: starting with $H_0 = G$, at each step one checks if $H_i$ has the desired properties, and if not, then one finds a bounded-codimensional subspace $H_{i+1}$ of $H_i$ witnessing the non-uniformity. Each step increases the "energy", or mean-squared density, by at least $\epsilon^{O(1)}$. As the energy can never exceed 1, the process terminates after at most $(1/\epsilon)^{O(1)}$ steps.
Once we have the above bounded-codimensional subspace $H$, let $g$ be the function obtained from $f$ by averaging $f$ inside each translate of $H$. In other words, consider the convolution $g = f * \beta_H$, where $\beta_H$ denotes the averaging measure on $H$ (normalized so that $\mathbb{E}\beta_H = 1$). The regularity property of $H$, namely $f - g$ having small Fourier coefficients when restricted to most $H$-cosets, is enough to deduce that $f$ and $g$ have similar densities of 3-APs with common differences lying in $H$, i.e., Thus the final expression is $\mathbb{E}[g^3] \ge (\mathbb{E}g)^3$ by convexity, and we have $\mathbb{E}g \approx \mathbb{E}f$. Putting everything together, we have (2) If $G$ is large enough, so that $H$ is large enough, then the above inequality implies that there is some nonzero common difference $d \in H$ so that thereby showing that $d$ is a popular common difference.
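The averaging step and the convexity bound can be illustrated numerically. The sketch below is only an analogue under our own assumptions: it uses the cyclic group $\mathbb{Z}_{15}$ with the subgroup $H$ of multiples of 3 in place of a subspace of $\mathbb{F}_p^n$, and the names `coset_average` and `mean` are hypothetical.

```python
import random

def coset_average(f, step):
    """g = f * beta_H on Z_N, where H is the subgroup of multiples of `step` (step divides N)."""
    n = len(f)
    subgroup = range(0, n, step)
    return [sum(f[(x - h) % n] for h in subgroup) / len(subgroup) for x in range(n)]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
N, step = 15, 3
f = [rng.random() for _ in range(N)]
g = coset_average(f, step)

# g is constant on H-cosets, so for d in H each product g(x) g(x+d) g(x+2d) equals g(x)^3.
mean_cube = mean([v ** 3 for v in g])
ap_density_in_H = mean([g[x] * g[(x + d) % N] * g[(x + 2 * d) % N]
                        for x in range(N) for d in range(0, N, step)])
```

Here `ap_density_in_H` coincides with `mean_cube`, which is at least `mean(g) ** 3` by convexity, and `mean(g)` equals `mean(f)`.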
Let us now sketch how to improve the above bound to $\mathrm{tower}(C \log(1/\epsilon))$ in the finite field setting, which was worked out in [12]. Instead of finding an $H$ that regularizes the function $f$, we simply seek to satisfy the inequality (2). One then shows that if (2) is violated, then we can find a bounded-codimensional subspace of $H$, via an application of the weak regularity lemma at a "local" level, so that the corresponding mean cubed density of $f$ (after averaging along the subspace) nearly doubles at each step (instead of merely increasing by $\epsilon^{O(1)}$), so that the iteration process must end after $O(\log(1/\epsilon))$ steps (instead of $(1/\epsilon)^{O(1)}$ steps). Once we obtain an $H$ satisfying (2), the rest of the argument is essentially identical.
For general abelian groups $G$, unlike in the finite field setting $\mathbb{F}_p^n$, the group might not have enough subgroups to run the above arguments. Instead, one uses Bohr sets, which play an analogous role to subgroups. Bohr sets are defined in Section 4.1, and their manipulations are much more delicate compared to subspaces, particularly as they do not have as nice closure properties. Green proved his arithmetic regularity lemma for general abelian groups using Bohr sets as basic structural objects. The strategy remains largely similar in spirit to the finite field setting, though more challenging at a technical level (e.g., the group cannot be partitioned into Bohr sets, unlike with subgroups). For instance, we obtain $g$ from $f$ by setting $g(x)$ to be a certain "smooth average" of $f$ around a certain carefully chosen Bohr neighborhood of $x$. While the values $g(x)$, $g(x + d)$, and $g(x + 2d)$ are no longer necessarily identical, they are hopefully approximately the same if $d$ lies inside some Bohr neighborhood of 0. The rest of Green's argument is similar to the finite field vector space case.
To obtain the corresponding result for intervals, one embeds $[N]$ in $\mathbb{Z}_N$ and considers only Bohr sets whose elements $d$ are all small in magnitude.
In order to improve the bound from $\mathrm{tower}((1/\epsilon)^{O(1)})$ to $\mathrm{tower}(O(\log(1/\epsilon)))$ for general groups and for intervals, we carefully execute a combination of the above ideas. New ideas are required to adapt the mean cube density increment argument from [12] to Bohr sets due to complications that do not arise in the finite field setting. The proof is carried out in full detail in Section 4.
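To get a feeling for how fast the tower function grows, here is a minimal sketch. It assumes the convention suggested by the abstract, namely that tower of height $k$ is an exponential tower of 2s of height $k$, with the empty tower equal to 1 (the base case is our assumption).

```python
def tower(height):
    """Exponential tower of 2s of the given height (tower(0) = 1 is an assumed base case)."""
    value = 1
    for _ in range(height):
        value = 2 ** value
    return value

small_values = [tower(k) for k in range(5)]  # [1, 2, 4, 16, 65536]
digits_of_tower_5 = len(str(tower(5)))       # tower(5) = 2 ** 65536
```

Each additional level exponentiates the previous value, which is why reducing the height from $(1/\epsilon)^{O(1)}$ to $O(\log(1/\epsilon))$ is a substantial improvement.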

Lower bound.
In this section, we give a brief overview of the proof of Theorem 1.5. We will deduce Theorem 1.5 from its functional analogue given below, where we replace the subset $A$ by a function $f : [N] \to [0, 1]$ with density $\alpha$ so that for any nonzero $d$, the density of 3-APs with common difference $d$ of $f$ is at most $\alpha^3(1 - \epsilon)$.

Theorem 3.3 (Lower bound for intervals, functional version).
There are positive absolute constants $c, \alpha_0$ such that the following holds. If $0 \le \alpha \le \alpha_0$, $0 \le \epsilon \le \alpha^7$, and $N \le \mathrm{tower}(c \log(1/\epsilon))$, then there is a function $f$ :
We remark that we have replaced $\epsilon$ by $\epsilon\alpha^3$, which is more convenient to work with in the lower bound construction. Since we treat $\alpha$ as a constant throughout, this has no effect on the behavior of the asymptotic bound we get. The proof of Theorem 1.5 assuming Theorem 3.3 follows from a standard sampling argument, which we defer to Appendix A. There, we also show that it suffices to prove Theorem 1.5 and Theorem 3.3 for $N \ge \epsilon^{-15}$.
In the following subsections, we sketch the construction of the function $f$ in Theorem 3.3. This construction utilizes a construction over cyclic groups which can be factored into a product of groups with appropriate growth in size. The construction in this case is inspired by the recursive construction presented in [12] over finite field vector spaces. However, several important new ideas are needed. The construction of $f$ for such product groups is sketched in Subsection 3.2.1. Using this construction, we can construct $f$ over intervals, using ideas sketched in Subsection 3.2.2, thus proving Theorem 3.3.
We remark that in this section we only give proof sketches without complete detail. The details of each construction and full proofs are presented in Sections 6 and 7.
3.2.1. Product groups. We first give the construction for groups which can be written as a product of appropriately growing cyclic groups of prime order.
Theorem 3.4 (Lower bound for product of growing prime cyclic groups). Let $0 < \alpha \le 1/4$, $0 < \epsilon \le 20^{-9}$, and $G = \mathbb{Z}_n$ where $n$ is a positive integer such that there exist distinct primes $m_1, \ldots, m_s$ with $s \le \log_{150}(\epsilon^{-1/4}\alpha^6/8)$ satisfying , and
Here, the product structure of $G$ and the bound on the growth of $m_i$ allow us to conduct an iterative construction. The basic framework of the construction builds on the construction in [12], which took place in the setting of $\mathbb{F}_p^n$. We build functions $f_1, f_2, f_3, \ldots$ in this order. Here the domain of $f_i$ is $Q_i = \prod_{j=1}^{i} \mathbb{Z}_{m_j}$. We will maintain that $\mathbb{E}f_i = \alpha$, and that for every $d \in Q_i \setminus \{0\}$, we have The function $f_s$ thus gives the desired construction.
Suppose we have already constructed $f_{i-1}$. Here is how we construct $f_i$:
(ii) Choose some $M_i \subset Q_{i-1}$ (with some properties).
(iii) For each $x \in M_i$, fill in values of $f_i$ on the coset $x + \mathbb{Z}_{m_i}$ using a random function chosen from F.
(iv) For each $x \notin M_i$, set all values of $f_i$ on the coset $x + \mathbb{Z}_{m_i}$ to be $f_{i-1}(x)$.
We refer to this process as random modification (the term "perturbation" was used in [12]). We will show that with positive probability, the random function $f_i$ satisfies the desired properties.
The main difference from the finite vector space case is the choice of the family F. In [12], we first choose $g : \mathbb{F}_p \to [0, 1]$ to be a multiple of the indicator of an interval of length $2p/3$. We then choose the family F to be functions of the form $g(x \cdot v)$ for some nonzero $v \in \mathbb{F}_p^{m_i}$. Here, instead, we choose a nice model function $g : \mathbb{Z}_{m_i} \to [0, 1]$ satisfying certain properties to be discussed later. We choose F to consist of functions $g_{a,b}$ : We denote elements of $Q_i$ by $x = (x_1, x_2, \ldots, x_i)$ where $x_j \in \mathbb{Z}_{m_j}$. We write where $a, b$ vary uniformly over all elements of $\mathbb{Z}_{m_i}$ with $a \ne 0$, and $y, z$ vary uniformly over all elements of $\mathbb{Z}_{m_i}$ with $z \ne 0$. The final inequality is due to a property of the model function $g$. It then follows via concentration that with positive probability, We are left with the task of bounding the density of 3-APs with common difference $d$ where $p$ , this is easy, as we can show, using the structure of vector spaces, that under a mild condition, if Such equality does not hold in our current setting. Even though we do not need exact equality, obtaining uniform control over all $d \in Q_i$ with $d_{[i-1]} \ne 0$ using standard concentration inequalities does not work because $n_1$ is small compared to $\epsilon^{-1}$. However, we can indeed guarantee with high probability the equality assuming that the model function $g$ over $\mathbb{Z}_{m_i}$ satisfies some additional nice properties, which we refer to as smoothness.
Roughly speaking, $g$ : To see how this smoothness property helps, assume that we are given Then, for all $d$ with $d$ where $c_1, c_2, c_3$ depend on $x'$ and $d$. Moreover, if $g$ is smooth, then with high probability over random $a_1, a_2, a_3$, one has for all $c_1, c_2, c_3 \in \mathbb{Z}_{m_i}$. In such case, we obtain that for all $d$ with Thus, as long as the parameters $a_1, a_2, a_3$ corresponding to each , we obtain the desired equality We guarantee this property by applying the union bound over possible values of Note that it is crucial here that the smoothness property allows us to avoid the union bound over The model function $g$ over $\mathbb{Z}_{m_i}$ is constructed in Section 5. The essential idea behind the construction is that $g$ should be supported on only a few Fourier characters. The details of the construction over product groups are included in Section 6.
3.2.2. Intervals. Using Theorem 3.4, we can prove Theorem 3.3, giving the desired lower bound over intervals. The construction of the function $f$ in Theorem 3.3 over intervals consists of three steps.
In the first step, we construct a function $f_1$ with density $\alpha$ which is 0 in the interval $[N' + 1, N]$ for $N'$ slightly smaller than $N$. This sets the density of 3-APs with common difference $d$ close to $N/2$ to 0.
In the second step, we let $f_2$ be the function obtained from the following procedure applied to $f_1$: (i) Partition $[N']$ into $N'/q$ intervals $I_1, I_2, \ldots, I_{N'/q}$ of length $q$, where $q$ can be written as a product of prime numbers as required in Theorem 3.4. (ii) Using Theorem 3.4, construct a function $g$ : For each $j = 1, 2, \ldots, N'/q$, identify each interval $I_j$ with $\mathbb{Z}_q$ and place a copy of $g$ on each of them. For any $d$ with $0 < d < N/2$ and $q \nmid d$, one can show that the density of 3-APs with common difference $d$ of $f_2$ is at most $\alpha^3(1 - \epsilon)$. However, for $d$ divisible by $q$, the density of 3-APs with common difference $d$ of $f_2$ is larger than $\alpha^3$.
Note that the function $f_2$ constructed in the second step is constant on each mod $q$ residue class in $[N']$. In the third step, we construct the function $f_3$ as follows: (i) Construct a subset $X$ of $\mathbb{Z}_{N'/q}$ with many fewer 3-APs compared to the random bound using a variant of the Behrend construction. (ii) With some appropriate $T \subseteq \mathbb{Z}_q$, for each $t \in T$, take a random linear transformation of $X$ inside $\mathbb{Z}_{N'/q}$, and set $f_3$ on $P_t$ to be the indicator function of this randomly transformed $X$. The function $f_3$ has the property that in expectation, for a nonzero $d$ divisible by $q$, the density of 3-APs with common difference $d$ of $f_3$ is at most $\alpha^3(1 - \epsilon)$.
We let $f = f_3$. Using concentration inequalities, we can show that with positive probability (over the randomness in the third step), for any $d \in [N/2]$, proving Theorem 3.3. The details of this construction are contained in Section 7.

Upper bound
In this section, we prove Theorem 1.4 and Theorem 1.3, showing the existence of a popular difference for 3-APs when $|G| \ge \mathrm{tower}(C \log(1/\epsilon))$ or $N \ge \mathrm{tower}(C \log(1/\epsilon))$. Here $G$ always denotes a finite abelian group of odd order. For $x \in G$, we write $x/2$ to mean the image of $x$ under the inverse of the isomorphism $x \mapsto 2x$. In Section 4.1, we give some preliminaries on Bohr sets, which are an important tool to make Fourier analysis work over general abelian groups. In Section 4.2, we give the complete proofs of Theorem 1.4 and Theorem 1.3.
We often drop $S$ and $\rho$ from the notation and denote the Bohr set by $B$ if it is clear from context. Given a Bohr set $B$, we write $S(B)$ to denote the frequency set of $B$. For a real number $\nu \ge 0$, we denote by $(B)_\nu$ the Bohr set with the same frequency set and scaled radius, $B(S, \nu\rho)$.
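For concreteness, Bohr sets in $\mathbb{Z}_N$ can be computed directly. The sketch below assumes the standard definition $B(S, \rho) = \{x : \|\arg(\chi(x))/(2\pi)\|_{\mathbb{R}/\mathbb{Z}} \le \rho \text{ for all } \chi \in S\}$, with each character identified with a frequency $r \in \mathbb{Z}_N$; this definition is an assumption on our part, chosen to agree with how the radius is used in the proofs of Theorems 1.4 and 1.3. The function names are hypothetical.

```python
def bohr_set(n, freqs, rho):
    """Elements x of Z_n with ||x r / n||_{R/Z} <= rho for every frequency r in freqs."""
    def dist_to_integer(x, r):
        t = (x * r) % n
        return min(t, n - t) / n
    return {x for x in range(n) if all(dist_to_integer(x, r) <= rho for r in freqs)}

def dilate(n, freqs, rho, nu):
    """The Bohr set (B)_nu with the same frequency set and radius nu * rho."""
    return bohr_set(n, freqs, nu * rho)

N = 101
B_single = bohr_set(N, [1], 0.1)      # an interval around 0: |x| <= 10
B = bohr_set(N, [1, 10], 0.1)         # adding a frequency can only shrink the set
B_half = dilate(N, [1, 10], 0.1, 0.5)
```

Bohr sets contain 0 and are symmetric, but unlike subgroups they are in general not closed under addition, which is the source of the technical difficulties mentioned above.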

Define the normalized indicator function of Bohr sets by $\beta_B := \frac{|G|}{|B|} 1_B$ and let $\phi_B := \beta_B * \beta_B$, where we chose the normalization so that $\mathbb{E}_x[\beta_B(x)] = 1$. Then we also have $\mathbb{E}_x[\phi_B(x)] = 1$. The function $\beta_B$ can be thought of as the density with respect to the uniform distribution on $G$ of the uniform distribution on $B$, and $\phi_B$ is the density of a smoothened version of the uniform distribution on $B$. In general, a function $\tau : G \to [0, \infty)$ with $\mathbb{E}\tau = 1$ can be thought of as the density of a distribution with respect to the uniform distribution on $G$.
Conventions. For simplicity of notation, we often omit the subscript $B$ and use consistent subscripts throughout. For example,
As introduced by Bourgain [7], it is often useful to work with regular Bohr sets, those for which a small change to the radius does not significantly change the size of the Bohr set.
In the next proposition, we state some basic properties of Bohr sets, whose proofs can be found in [32, Section 4.4]. Denote by $2 \cdot X = \{2x : x \in X\}$ the dilation of $X$ by a factor 2. Recall that for a character $\chi$, we denote by $\chi^{1/2}$ the character given by $x \mapsto \chi(x/2)$.
The next estimate shows that regular Bohr sets are essentially invariant under convolutions with a distribution whose support is contained in a Bohr set with the same frequency set and smaller radius. This is analogous to the additive closure property of subgroups. and Furthermore, for any $f : G \to [0, 1]$, letting $\kappa$ be either $\beta$ or $\phi$,
Proof. We have . For (5), note that (by the triangle inequality) (by (4)) $= 160\nu d$.
(by (4), (5))
The next lemma says that if $B_2 \subseteq (B_1)_{\nu/2}$ and $\tau = \beta_{B_2}$ or $\tau = \phi_{B_2}$, then the $k$-th moment of $f_\tau$ is at least the $k$-th moment of $f_{\phi_{B_1}}$ up to a small error term. Let Let $k \ge 1$ be an integer and $\nu \le 1/(80d_1)$. Then the following statements hold. If
Proof. By (6) of Proposition 4.5, applied with Thus, By Jensen's inequality applied to the convex function $t \mapsto t^k$, we obtain The proof of (8) follows similarly.
4.2. Proofs of Theorem 1.4 and Theorem 1.3. In the following, we prove two results that are used in the proof of Theorem 1.4, the counting lemma (Lemma 4.7) and the mean-cube density increment (Lemma 4.9). For a function $\phi$ on $G$ with $\mathbb{E}\phi = 1$ and a function $f : G \to [0, 1]$, we denote
Lemma 4.7 (Counting lemma). Let $B_1 = B(S_1, \rho_1)$ and $B_2 = B(S_2, \rho_2)$ be two Bohr sets. Let
Proof. By expanding in the Fourier basis,

By similar expansion for
Note that and We can now bound the first term as (by (9)) (by (10)) , where in the last inequality, we used the fact that $\sup_x \phi_1(x) \le |G|/|B_1|$ and $\mathbb{E}\phi_1 = 1$.
The two remaining terms are bounded similarly.
We next state and prove the mean-cube density increment lemma. We make use of the following classical inequality in the proof of the lemma:
Theorem 4.8 (Schur's inequality). For real numbers $a, b, c \ge 0$, one has
Lemma 4.9 (Mean-cube density increment).
Proof of Theorem 1.4. We define inductively parameters $\rho_i$ such that $\rho_1 = \epsilon^{10}$, and for $i \ge 2$, since $\|\arg(\chi(x))/(2\pi)\|_{\mathbb{R}/\mathbb{Z}} \le \rho_i/(4\pi)$ for all $x$ such that $\beta_i(x) \ne 0$. Hence, Observe that $\mathbb{E}[f_{\phi_i}^3] \ge \alpha^3$ by convexity, and $\mathbb{E}[f_{\phi_i}^3] \le 1$ for all $i$. By Lemma 4.10, there exists Fix such an $i$. We have and furthermore $\chi^{1/2} \in S_{i+1}$ so . By Lemma 4.9, there exists a regular Bohr set $B = (B_i)$ By our choice of $\rho_i, \nu_i$, we have Observe that there exists an absolute constant $C' > 0$ so that $\rho_i \ge 1/\mathrm{tower}(C' i)$. Furthermore, the codimension of $B_i$ is bounded above by $5\rho_i^{-2}$ and the radius of $B_i$ is $\rho_i/(4\pi)$. Hence, we obtain a Bohr set $B$ with size at least $|G|/\mathrm{tower}(10C' \log(1/\epsilon))$ such that $\Lambda_\phi(f) \ge \alpha^3 - 7\epsilon/8$. Hence, for a sufficiently large constant $C > 0$, assuming that $|G| \ge \mathrm{tower}(C \log(1/\epsilon))$, we have Thus, there exists $d \ne 0$ such that
Proof of Theorem 1.3. We can assume without loss of generality that $N$ is odd by possibly increasing $N$ by 1. Let $G = \mathbb{Z}_N$. We repeat the proof of Theorem 1.4 with the inclusion of the character $\chi_0(x) = e^{2\pi i x/N}$ in the sets $S_i$. We then obtain a Bohr set $B$ whose frequency set contains $\chi_0$ such that $B$ has size at least $|G|/\mathrm{tower}(C' \log(1/\epsilon))$ and $\Lambda_\phi(f) \ge \alpha^3 - 7\epsilon/8$. Assuming that $N \ge \mathrm{tower}(C \log(1/\epsilon))$ for sufficiently large $C$, following the last step in the proof of Theorem 1.4, we obtain a positive integer $d < N/2$ such that $d \in \mathrm{supp}(\phi)$ when viewed as an element of $\mathbb{Z}_N$ and Since $d \in \mathrm{supp}(\phi) \subseteq B + B$ and $\chi_0$ is in the frequency set defining $B$, $\|\arg(\chi_0(d))/(2\pi)\|_{\mathbb{R}/\mathbb{Z}} \le 2\rho \le 2\epsilon^{10}$. Thus, as a positive integer less than $N/2$, we have $d < 2\epsilon^{10} N$. Thus, restricting to $x \in [N - 2d]$ in the above expectation, we have

Lower bound construction: preparations
We assume throughout this section that $N$ is an odd prime number. As a building block in our construction, we will make use of a function $g$ which has relatively low 3-AP density (considerably smaller than the random bound given the density of $g$), but behaves random-like in the following way. If $a_1, a_2, \ldots, a_h$ are chosen independently and uniformly at random from the nonzero elements of $\mathbb{Z}_N$, then with high probability, for all $b$ In the following, we identify $\widehat{\mathbb{Z}_N}$ with $\mathbb{Z}_N$, so that we write
Lemma 5.1. Suppose $g : \mathbb{Z}_N \to [0, 1]$ and $a_1, a_2, \ldots, a_h \in \mathbb{Z}_N \setminus \{0\}$ satisfy the following properties: (i) The support of $\widehat{g}$ has size at most $\ell$.
Furthermore, if $a_1, a_2, \ldots, a_h$ are chosen from $\mathbb{Z}_N \setminus \{0\}$ uniformly and independently at random, then Property (ii) is satisfied with probability at least $1 - \ell^h/(N - 1)$.

Proof. By the Fourier inversion formula,
Note that $\prod_{j=1}^{h} \widehat{g}(0) = \mathbb{E}[g]^h$. Consider $(r_1, r_2, \ldots, r_h) \ne (0, \ldots, 0)$ where $\sum_{j=1}^{h} r_j a_j = 0$. Property (ii) guarantees that $\widehat{g}(r_j) = 0$ for some $j \in [h]$, so $\prod_{j=1}^{h} \widehat{g}(r_j) = 0$. Hence, if $a_1, a_2, \ldots, a_h$ satisfy Property (ii), then
Next, we show that if $a_1, a_2, \ldots, a_h$ are chosen uniformly and independently at random from $\mathbb{Z}_N \setminus \{0\}$, then Property (ii) is satisfied with probability at least $1 - \ell^h/(N - 1)$. Indeed, consider a fixed $(r_1, r_2, \ldots, r_h) \ne (0, 0, \ldots, 0)$ such that $r_j$ is in the support of $\widehat{g}$ for each $j \in [h]$. There exists $i \in [h]$ such that $r_i \ne 0$. For each fixed choice of $a_j$ for $j \in [h] \setminus \{i\}$, there is a unique choice of $a_i$ such that $\sum_{j=1}^{h} r_j a_j = 0$. Hence, the probability that $\sum_{j=1}^{h} r_j a_j = 0$ is at most $1/(N - 1)$. By the union bound over the choice of $r_j$ in the support of $\widehat{g}$, we obtain that Property (ii) is violated with probability at most $\ell^h/(N - 1)$.
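The mechanism in the proof above can be checked numerically. The sketch below is an illustration under our own assumptions, not the function $g_\alpha$ constructed later: it uses the toy function $g(x) = \alpha(1 + \cos(2\pi x/N))$ on $\mathbb{Z}_{101}$, whose Fourier support is $\{0, 1, -1\}$, and reads Property (ii) as the requirement that $\sum_j r_j a_j \ne 0$ for all $(r_1, \ldots, r_h)$ in the Fourier support that are not all zero.

```python
import itertools
import math

N = 101                      # odd prime
alpha = 0.3
g = [alpha * (1 + math.cos(2 * math.pi * x / N)) for x in range(N)]
support = [0, 1, N - 1]      # frequencies r with nonzero Fourier coefficient of g

def satisfies_property_ii(a, support, n):
    """No nonzero tuple (r_1..r_h) from the Fourier support has sum r_j a_j = 0 mod n."""
    for r in itertools.product(support, repeat=len(a)):
        if any(r) and sum(rj * aj for rj, aj in zip(r, a)) % n == 0:
            return False
    return True

def product_average(g, a, b):
    """E_x prod_j g(a_j x + b_j) over x in Z_N."""
    n = len(g)
    return sum(math.prod(g[(aj * x + bj) % n] for aj, bj in zip(a, b))
               for x in range(n)) / n

good_a = (1, 3, 9)           # satisfies Property (ii)
bad_a = (1, 1, 2)            # 1 + 1 - 2 = 0 violates Property (ii)
good_value = product_average(g, good_a, (5, 17, 60))
bad_value = product_average(g, bad_a, (0, 0, 0))
```

For `good_a` the average equals $\mathbb{E}[g]^3 = \alpha^3$ for every shift $(b_1, b_2, b_3)$, whereas for `bad_a` extra Fourier terms survive and the average is strictly larger.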
Next, for each $\alpha \le 1/2$, we construct a function $g_\alpha$ with mean $\alpha$ and prove that $g_\alpha$ has the desired properties in Lemma 5.1. We recall that the 3-AP density of a function $g$ is denoted by Then $g_\alpha$ satisfies the following properties.
From (1), and This proves Property (ii). Property (iii) follows directly from Lemma 5.1 applied to the function $g_\alpha$ and $\ell = 5$.
To prove Property (iv), notice that Finally, Property (v) follows from Parseval's identity,

Lower bound construction for product groups
In this section, we prove Theorem 3.4. For convenience, we recall the theorem statement here.
Theorem. Let $0 < \alpha \le 1/4$, $0 < \epsilon \le 20^{-9}$, and $G = \mathbb{Z}_n$ where $n$ is a positive integer such that there exist distinct primes $m_1, \ldots, m_s$ with $s \le \log_{150}$ satisfying
We first make a few notation conventions. Note that if $n = \prod_{i=1}^{s} m_i$ for distinct primes $m_i$, then Each element of $G$ can be represented by an $s$-tuple $(x_1, x_2, \ldots, x_s)$ where We can think of $Q_i$ as a quotient of $G$ by the subgroup $H_i = \{x \in G : x_j = 0 \text{ for all } j \le i\}$. We identify $\mathbb{Z}_{m_i}$ as the subgroup of $Q_i$ consisting of elements with $x_j = 0$ for $j < i$, and we identify the quotient For an element $x \in G$ or $x \in Q_j$ with $j \ge i$, we denote $x_{[i]} = (x_1, \ldots, x_i)$. For $j < i$, we say that an element $x$ of $Q_i$ is a lift of an element $y$ in $Q_j$ if $x_{[j]} = y$. In the following discussion, when the level $i$ is clear from context, if not specified otherwise, the 3-APs would refer to 3-APs in $Q_i$.
In each level $i$, for $i \in [s]$, we construct a function $f_i$ : We introduce parameters $\mu_1 = \epsilon^{1/4}$ and $\mu_i = 150^{i-1} \alpha'^{-6} \epsilon^{1/4}$ for $i \ge 2$, where $\alpha' = \alpha(1 + \frac{1}{m_1 - 1})$. In the first level, define For $i \ge 2$, we extend In level $i$, we define $f_i$ to be a random function as follows.
For each $x \in M_{i-1}$, we choose $a_x \in \mathbb{Z}_{m_i} \setminus \{0\}$ and $b_x \in \mathbb{Z}_{m_i}$ uniformly and independently at random. For each $y \in Q_i$ such that $y_{[i-1]} = x$, we define where $g_{\alpha'}$ is the function with density $\alpha'$ and with low 3-AP density defined earlier in Lemma 5.2.
Otherwise, for $x \notin M_{i-1}$ and $y \in Q_i$ such that $y_{[i-1]} = x$, we define We refer to this as the random modification in level $i$. This defines (random) We will show that with positive probability, for each level $i$, we can pick $f_i$ such that the function $f$ has the desired properties claimed in Theorem 3.4.
6.2. Proof of Theorem 3.4. We first claim that the construction is feasible with the above choice of parameters. Note that $\mu_1 \ge 1/m_1$, so $f_1(x) = \alpha'$ for all but a $\mu_1$ fraction of elements $x \in Q_1$. For $i \ge 2$, observe that if $f_i(y) \ne \alpha'$, then we must have $y_{[1]} = 0$ or $y_{[j]} \in M_j$ for some $j < i$. Thus, the fraction of $y \in Q_i$ for which $f_i(y) \ne \alpha'$ is at most $\sum_{j=1}^{i} \mu_j$. Since , it is possible to choose We next prove that the function $f_i$ has density $\alpha$ and $f_i$ maps $Q_i$ to $[0, 1]$. This is true for $i = 1$. Assuming that $f_{i-1}$ has density $\alpha$ and takes values in $[0, 1]$, we show that $f_i$ also has these properties.
Hence, the density of $f_i$ is the same as the density of $f_{i-1}$, and $f_i$ takes values in $[0, 1]$. By induction, the density of $f_i$ is $\alpha$ and the values of We denote by $\mathbf{E}_{f_i}$ the expectation over the randomness of $f_i$ (the local modifications in level $i$), conditioned on a fixed choice of $f_{i-1}$. Furthermore, all of the probabilities we consider will be conditioned on this fixed choice of $f_{i-1}$, hence in level $i$ we only consider the randomness of the random modification in level $i$.
The random modification in level $i$ has the following key property. For any This is because when $a$ is chosen uniformly at random from $\mathbb{Z}_{m_i} \setminus \{0\}$ and $b$ is chosen uniformly at random from $\mathbb{Z}_{m_i}$, then for any fixed $x_i$ and nonzero $d_i$, $(ax_i + b, ax_i + ad_i + b, ax_i + 2ad_i + b)$ is distributed uniformly among all 3-APs with nonzero common difference in $\mathbb{Z}_{m_i}$. We now proceed to prove that there exists a choice of the modification in each level so that for any $d \in G \setminus \{0\}$, The main idea is to maintain by induction that for any $i \in [s]$, we can choose $f_i$ which is a random modification of $f_{i-1}$ so that for any $d \in Q_i \setminus \{0\}$, For all $d$ such that $d_{[i-1]} = 0$, the above property follows from observation (14) and concentration inequalities. If )] is small by the induction hypothesis. We guarantee that with large probability, is small. Combining these two cases, we obtain a modification $f_i$ of $f_{i-1}$ whose density of 3-APs with common difference $d$ is small for all nonzero $d \in Q_i$.
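The uniformity claim behind the key property can be verified exhaustively for a small prime. The following sketch is illustrative only (the name `affine_images` and the values $m = 7$, $x = 3$, $d = 2$ are our choices): it enumerates all pairs $(a, b)$ with $a \ne 0$ and records the first term and common difference of the image progression.

```python
from collections import Counter

def affine_images(m, x, d):
    """Count how often each 3-AP (y, y+z, y+2z) arises as the image of (x, x+d, x+2d)
    under t -> a t + b, over all a in Z_m minus {0} and b in Z_m."""
    counts = Counter()
    for a in range(1, m):
        for b in range(m):
            y = (a * x + b) % m      # first term of the image progression
            z = (a * d) % m          # common difference of the image progression
            counts[(y, z)] += 1
    return counts

counts = affine_images(7, 3, 2)
```

Every pair $(y, z)$ with $z \ne 0$ appears exactly once, so a uniformly random $(a, b)$ yields a uniformly random 3-AP with nonzero common difference. The primality of $m$ is what makes $a \mapsto ad$ a bijection on the nonzero elements.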
We now give the proof of Theorem 3.4.
Proof of Theorem 3.4. It is easy to see that for $\epsilon \le 20^{-9}$. Inductively, if where the first inequality is by (ii) in Lemma 5.2 as we apply the local modification to a $\mu_i$ fraction of the $\mathbb{Z}_{m_i}$-cosets, getting at most a $\frac{1}{2}\alpha'^3$ increment in the mean cube density over each of them, and the second inequality follows from our choice of parameters $\mu_i \ge 150\mu_{i-1}$ for all $i \ge 2$.
Let $P(i)$ be the property that for all $d \in Q_i \setminus \{0\}$, We prove by induction that in level $i$, the modifications can be chosen so that $P(i)$ holds.

Consider the base case
This establishes $P(1)$. Next, we continue with the inductive step. Assume that $P(i - 1)$ holds. We prove that we can choose the modification in level $i$ so that $P(i)$ also holds. This follows from the following two claims.
Claim 6.1. With probability larger than 1/2, conditioned on a fixed choice of
Claim 6.2. With probability larger than 1/2, conditioned on a fixed choice of
Combining Claims 6.1 and 6.2, by the union bound, the modification in level $i$ fails to satisfy $P(i)$ with probability strictly less than 1. Thus we can choose a modification satisfying $P(i)$ in level $i$. This completes the induction. Thus, there exists $f = f_s$ which satisfies $P(s)$, so for any nonzero $d$ in $G$, This completes the proof of Theorem 3.4. Now we turn to the proofs of Claims 6.1 and 6.2.
Proof of Claim 6.1.
Hence, for $y \in M_{i-1}$, Note that the random variables for $y \in M_{i-1}$, are independent (under the randomness of the modification in level $i$, conditioned on a fixed choice of $f_{i-1}$). Thus the probability that where the first equality follows from , and the inequality follows from (15) and $f_{i-1}(y) = \alpha'$ for $y \in M_{i-1}$. Thus, if /64, by the union bound, the probability that there exists d , where we used the upper bound on $m_i$ in the theorem statement.
Proof of Claim 6.2. Recall that for a $\mathbb{Z}_{m_i}$-coset represented by $w$ where as defined in Lemma 5.2 and $a_w \in \mathbb{Z}_{m_i} \setminus \{0\}$ and $b_w \in \mathbb{Z}_{m_i}$ are chosen uniformly and independently for each $w \in M_{i-1}$. For each 3-AP $(w, w + d', w + 2d')$ with common difference $d' \in Q_{i-1} \setminus \{0\}$, and for any lift $d$ of $d'$, we have where $g^{(j)} : \mathbb{Z}_{m_i} \to [0, 1]$ can be either the function $g_{\alpha'}$ or a constant function, $a_j \in \mathbb{Z}_{m_i} \setminus \{0\}$, $b_j \in \mathbb{Z}_{m_i}$ are chosen uniformly and independently at random, and Note that if we fix the modification (i.e., fix each $a_j$ and $b_j$), changing $d$ to a different lift of $d'$ would only change $c_j$ in equation (17), and would not change the coefficients of $y$ in $g^{(1)}, g^{(2)}, g^{(3)}$ in the last line of equation (17). Let $J \subseteq [3]$ be the set of indices such that $g^{(j)} = g_{\alpha'}$. By Lemma 5.1 applied to the function $g_{\alpha'}$ and $h = |J| \le 3$, with probability at least $1 - 125/(m_i - 1)$, for all $u_j \in \mathbb{Z}_{m_i}$. Since $g^{(j)}$ is a constant function for $j \notin J$, we obtain that with probability at least $1 - 125/(m_i - 1)$, Thus, by the union bound, with probability at least $1 - 125 n_{i-1}^2/(m_i - 1)$, for every 3-AP $(w, w + d', w + 2d')$ in $Q_{i-1}$ with nonzero common difference $d'$, and for all lifts i−1 , and $m_i \ge m_2 \ge \epsilon^{-2}/64 > 10^6$, so $125 n_{i-1}^2/(m_i - 1) < 1/2$. Hence with probability larger than 1/2, for all Thus, assuming that $P(i-1)$ holds, we can choose the modification in level $i$ so that $P(i)$ holds. By induction, we can find a function $f_s$ which satisfies $P(s)$. Notice that $f_s(x) = \alpha'$ for at least a 3/4 fraction of $x \in G$ by (13), and $\mathbb{E}_x[f_s(x)^3] \le 3\alpha^3/2$ by (15) with $i = s$. The function $f = f_s$ then satisfies the conclusion of Theorem 3.4.

Lower bound construction for intervals
In this section we prove Theorem 3.3, restated below for convenience.
Theorem. There are positive absolute constants $c, \alpha_0$ such that the following holds. If $0 \le \alpha \le \alpha_0$, $0 < \epsilon \le \alpha^7$, and $N \le \mathrm{tower}(c \log(1/\epsilon))$, then there is a function $f$ :
By Appendix A, in order to prove Theorem 3.3, we can (and will) assume that $N \ge \epsilon^{-15}$. Before proving Theorem 3.3, we first need an auxiliary construction of a set with relatively low 3-AP density given its density. Recall from the introduction that $N(\alpha)$ is the least positive integer such that if $N \ge N(\alpha)$ and $A \subset [N]$ with $|A| \ge \alpha N$, then $A$ contains a 3-AP.
So we may assume $n > 4N$. Integers $x, y, z$ form an approximate 3-AP if $|2z - x - y| \le 1$. Let $S := \{2a : a \in A\}$, so $S$ has no approximate 3-AP. Let $t = \lfloor n/(4N) \rfloor$. Consider the set $I_i := \{(i-1)t + 1, (i-1)t + 2, \ldots, (i-1)t + t\}$ of $t$ consecutive integers. Let $T$ be the union of the sets $I_i$ with $i \in S$. The set $T$ has size $|T| = |A|t \ge \alpha n$. Also, every element of $T$ is a positive integer at most $(2N - 1)t + t \le n/2$. So if $x, y, z \in T$ are such that $(x, y, z) \pmod{n}$ form a 3-AP in $\mathbb{Z}_n$, then $(x, y, z)$ is also a 3-AP of integers. Since $S$ has no approximate 3-AP, it follows that the only 3-APs in $T$ are those where the three terms are in the same interval $I_i$. In each interval $I_i$, which has size $t$, the number of 3-APs (with any integer difference allowed) is $t + 2\lfloor (t^2 - 1)/4 \rfloor$. There are $|A|$ intervals $I_i$ whose union is $T$. The number of 3-APs in $\mathbb{Z}_n$ is $n^2$. Hence, the 3-AP density of $T$ as a subset of
The lower bound of Behrend [2] on $N(\alpha)$ implies that if $\alpha > 0$ is sufficiently small, then $N(6\alpha) \ge 2^{\frac{1}{9}(\log_2 1/\alpha)^2}$. Together with the previous lemma, we have the following immediate corollary.
Lemma 7.2. If $\alpha > 0$ is sufficiently small, then for any positive integer $n$, there is a subset of $\mathbb{Z}_n$ with density at least $\alpha$ and 3-AP density at most $\max\left(\frac{1}{n}, 2^{-\frac{1}{9}(\log_2 1/\alpha)^2}\right)$.
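The blow-up step in the proof of Lemma 7.1 (the case $n > 4N$) can be illustrated on a toy input. The Python sketch below is not part of the paper: the helper names `blow_up`, `block_index` and `aps_mod_n`, and the small set $A = \{1, 2, 4, 5\} \subset [5]$, are hypothetical choices made only for illustration. It builds $T$ from the blocks $I_i$ with $i \in S = \{2a : a \in A\}$ and checks that every 3-AP of $T$ in $\mathbb{Z}_n$ lies inside a single block.

```python
def blow_up(A, N, n):
    """Union of the blocks I_i = {(i-1)t+1, ..., it} over i in S = {2a : a in A},
    where t = floor(n / (4N)). Returns (T, t)."""
    t = n // (4 * N)
    T = set()
    for a in A:
        i = 2 * a
        T.update(range((i - 1) * t + 1, i * t + 1))
    return T, t


def block_index(x, t):
    """Index i of the block I_i containing the positive integer x."""
    return (x - 1) // t + 1


def aps_mod_n(T, n):
    """All triples (x, y, z) in T^3 with x + z = 2y (mod n), trivial ones included."""
    return [(x, y, z) for x in T for y in T for z in T
            if (x + z - 2 * y) % n == 0]


# Hypothetical toy input: A = {1, 2, 4, 5} has no nontrivial 3-AP in [5].
A, N, n = {1, 2, 4, 5}, 5, 64
T, t = blow_up(A, N, n)
aps = aps_mod_n(T, n)

# Every 3-AP of T (as a subset of Z_n) stays inside one block I_i.
same_block = all(block_index(x, t) == block_index(y, t) == block_index(z, t)
                 for x, y, z in aps)
density = len(aps) / n ** 2  # 3-AP density of T in Z_n
print(sorted(T), same_block, density)
```

The brute-force enumeration is cubic in $|T|$, so this is only meant for small parameters; it confirms on one example that doubling the indices and keeping $T$ inside $[n/2]$ rules out 3-APs that span several blocks or wrap around modulo $n$.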
7.1. The construction and proof of Theorem 3.3. We next construct a function $f$ : The construction is done in three steps.
We let $f = f_3$. It is easy to see that $\mathbb{E}_x[f_3(x)] = \alpha$. We now prove that there exists a choice of randomness (in Step 3) such that for each positive integer $d < N/2$, into different classes according to the congruence class modulo $q$ of the 3-AP (so the class a 3-AP belongs to is determined by the congruence class modulo $q$ of the first element of the 3-AP). Since $\beta N \alpha^3 (1 - \epsilon) > q^2$, all classes of 3-APs modulo $q$ with common difference $d_q$ appear, each class with at least $q$ elements, and any two classes differ in size by at most 1. Hence, By the construction, if $d_q \neq 0$ then $\mathbb{E}_{y \in \mathbb{Z}_q}[g(y) g(y + d_q) g(y + 2d_q)] \le \alpha^3 (1 - 3\epsilon)$, and for $d$ divisible by $q$,
In the third step, suppose $d$ is nonzero and divisible by $q$ and let $t \in \mathbb{Z}_q$ with $g(t) = \alpha^*$. For Recall that $a_t$ is uniformly distributed over $\mathbb{Z}_n \setminus \{0\}$ and $b_t$ is uniformly distributed over $\mathbb{Z}_n$, so $(a_t \phi_t(x) + b_t, a_t \phi_t(x + d) + b_t, a_t \phi_t(x + 2d) + b_t)$ is uniformly distributed over the 3-APs in $\mathbb{Z}_n$ with nonzero common difference. Thus, where $\lambda(\xi)$ is the density of 3-APs with nonzero common difference of $\xi$, $\Lambda(\xi)$ is the density of 3-APs of $\xi$, and we have used that $n \ge \sqrt{N}/2 \ge \epsilon^{-15/2}/2 \ge \alpha^{-10}$ and $\alpha \le \alpha_0$ is sufficiently small. Thus, for each $t \in \mathbb{Z}_q$ such that $g(t) = \alpha^*$, For each $t \in \mathbb{Z}_q$ such that $g(t) \neq \alpha^*$, Hence, where in the first inequality we used (18), (19) and (20) together with the fact that $g(t) = \alpha^*$ for at least a 3/4 fraction of $t \in \mathbb{Z}_q$, and in the second inequality we used that $q > N^{1/5} \ge \epsilon^{-3} \ge \alpha^{-21} > 120/\alpha^{*3}$. Notice that for fixed nonzero $d$ divisible by $q$, the random variables for $t \in \mathbb{Z}_q$, are independent. By Hoeffding's inequality, the probability that , by the union bound, the probability that there exists a nonzero common difference $d$ which is divisible by $q$ is at most $(N/q) \exp(-72^{-1} \alpha^{*6} q) < q^4 \exp(-72^{-1} \alpha^{*6} q) < 1/2$, as $N < q^5$, $q > \epsilon^{-3}$ and $\epsilon \le \alpha^7$.
If $d$ is not divisible by $q$, each 3-AP with common difference $d$ occupies three different classes modulo $q$, and hence the weights of the elements in the 3-AP are independent random variables. By construction, for each Thus, For this fixed $d$, we can partition $\mathbb{Z}_q$ into five sets $S_1, S_2, \ldots, S_5$ such that for each $i$, $1 \le i \le 5$, the 3-APs $(t, t + d, t + 2d)$, $t \in S_i$, are disjoint, and $|S_i| \ge q/10$. For each set $S_i$, the random variables for $t \in S_i$, are independent. By Hoeffding's inequality, the probability that is at most $\exp(-2(\epsilon\alpha^3)^2 q/10) = \exp(-5^{-1} \epsilon^2 \alpha^6 q)$. By the union bound, the probability that there exists a common difference $d$ not divisible by $q$ with is at most $5N \exp(-5^{-1} \epsilon^2 \alpha^6 q) < 1/2$, where we used $N < q^5$, $q > \epsilon^{-3}$ and $\epsilon \le \alpha^7$. Since $f_3(x) = 0$ for all $x \notin [N']$, Hence, with positive probability, the function $f_3$ satisfies the required properties in Theorem 3.3.
We have ≤ $N \le \mathrm{tower}(c \log(1/\epsilon_0))$, so we can apply Theorem 1.5 with $\epsilon_0$ in place of $\epsilon$ to obtain the desired set $A$. The same argument also shows that we only need to prove Theorem 3.3 when $N \ge \epsilon^{-15}$.
We next discuss how to obtain a set $A$ with the properties in Theorem 1.5 from Theorem 3.3 when $N \ge \epsilon^{-15}$. This follows via a standard sampling argument which is essentially similar to Lemma 9 in [12]. However, there are some small differences in the argument which we now highlight. Given a function $f : [N] \to [0, 1]$ such that the density of 3-APs with common difference $d$ of $f$ is small for all $0 < d < N/2$, we sample a set $A$ where each element $x \in [N]$ is in $A$ with probability $f(x)$, independently of the other elements. If the density of 3-APs with common difference $d$ in $A$ is concentrated around its expectation, which is the density of 3-APs with common difference $d$ of $f$, then it is small with high probability. However, for $d$ near $N/2$, there are very few 3-APs with common difference $d$, and we do not have sufficiently strong concentration to be able to take a union bound over all such $d$. To get around this, we define a function $f'$ which is 0 for all $x$ close to $N$, and which is equal to $f$ elsewhere, and sample the set $A$ from $f'$. This ensures that for common differences $d$ which are close to $N/2$, the set $A$ contains very few 3-APs with common difference $d$.
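The truncate-then-sample idea can be illustrated numerically. The Python sketch below is a toy example under stated assumptions, not the paper's argument: the constant function $f \equiv 1/2$, the values `N = 200` and `cutoff = 20`, and the helper names `truncate`, `sample_set` and `ap_count` are all hypothetical. It zeroes $f$ on the last `cutoff` points, samples $A$ independently from the resulting $f'$, and counts 3-APs for common differences close to $N/2$.

```python
import random


def truncate(f, N, cutoff):
    """f' agrees with f on [1, N - cutoff] and is 0 on the last `cutoff` points."""
    return {x: (f[x] if x <= N - cutoff else 0.0) for x in range(1, N + 1)}


def sample_set(f_prime, rng):
    """Include each x independently with probability f_prime[x]."""
    return {x for x, p in f_prime.items() if rng.random() < p}


def ap_count(A, d):
    """Number of 3-APs (x, x+d, x+2d) with all three terms in A."""
    return sum(1 for x in A if x + d in A and x + 2 * d in A)


N, cutoff = 200, 20                      # hypothetical toy parameters
f = {x: 0.5 for x in range(1, N + 1)}    # hypothetical weight function
f_prime = truncate(f, N, cutoff)
rng = random.Random(0)                   # fixed seed for reproducibility
A = sample_set(f_prime, rng)

# For 2d >= N - cutoff, any x >= 1 has x + 2d > N - cutoff, so x + 2d is not in A.
large_d = range((N - cutoff) // 2, N // 2)
counts = [ap_count(A, d) for d in large_d]
print(counts)
```

In this toy setting the truncation removes all 3-APs for the largest common differences, so no concentration estimate is needed for those $d$; the choice of how many points to zero out in the actual proof is made in the details that follow.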
We now carry out the details. By Theorem 3.3 applied with $\alpha$ replaced by $\alpha + 2\epsilon$ and $\epsilon$ replaced by $12\epsilon/\alpha^3 \le \alpha^7$, we can find a function $f$ :

$1_A$, and likewise for "density of 3-APs with common difference $d$" of $A$. Over the interval $[N]$, we have two possible notions for the density of 3-APs with common difference $d$ of a function $f : [N] \to [0, 1]$. One can define the density of 3-APs with common difference $d$ of $f$ as $\frac{1}{N} \sum_{x \in [N - 2d]} f(x) f(x + d) f(x + 2d)$, as used in Theorem 1.3. This defines the density of 3-APs with common difference $d$ as the average weight of the 3-APs $(x, x + d, x + 2d)$ for $x \in [N]$, setting the value of $f$ outside $[N]$ to 0. The other possible definition of the density of 3-APs with common difference $d$ of $f$ is

3.1. Upper bound. Theorems 1.3 and 1.4 follow from the functional forms below by setting $f = 1_A$. Theorem 3.1 (Upper bound for intervals, functional version). There exists a constant $C > 0$ such that the following holds. Let $\epsilon > 0$ and $N$

4.1. Bohr sets. Denote the distance from $x \in \mathbb{R}$ to the nearest integer by $\|x\|_{\mathbb{R}/\mathbb{Z}} := \min_{n \in \mathbb{Z}} |x - n|$. Let $\arg(z)$ denote the argument of $z \in \mathbb{C}$, so that $\arg(e^{it}) \in [0, 2\pi]$ and $e^{it} = e^{i \arg(e^{it})}$. Definition 4.1. Let $G$ be an abelian group of odd order. For a subset $S \subseteq \widehat{G}$ of characters of $G$ and a parameter $\rho \in [0, 1]$, define the Bohr set $B(S, \rho) = \{x \in G : \|\arg(\chi(x))/(2\pi)\|_{\mathbb{R}/\mathbb{Z}} \le \rho \ \forall \chi \in S\}$. We call $S$ the frequency set of the Bohr set $B(S, \rho)$ and $\rho$ the radius. The codimension of the Bohr set is $|S|$.
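For a concrete feel for Definition 4.1, the Python sketch below computes Bohr sets in the cyclic group $\mathbb{Z}_n$ with $n$ odd. It is an illustration under an added assumption: each frequency $r \in \mathbb{Z}_n$ is identified with the character $x \mapsto e^{2\pi i r x/n}$, so that $\arg(\chi(x))/(2\pi)$ equals $rx/n$ modulo 1. The function names `dist_to_nearest_integer` and `bohr_set` are hypothetical, and exact rational arithmetic is used to avoid floating-point issues at the boundary.

```python
from fractions import Fraction


def dist_to_nearest_integer(x):
    """Distance from the rational number x to the nearest integer."""
    frac = x - (x.numerator // x.denominator)  # fractional part in [0, 1)
    return min(frac, 1 - frac)


def bohr_set(n, freqs, rho):
    """B(S, rho) in Z_n, where r in `freqs` stands for the character
    x -> exp(2*pi*i*r*x/n)."""
    rho = Fraction(rho)
    return {x for x in range(n)
            if all(dist_to_nearest_integer(Fraction(r * x, n)) <= rho
                   for r in freqs)}


# Codimension 1: elements x of Z_11 with ||x/11|| <= 1/10.
B1 = bohr_set(11, [1], Fraction(1, 10))
# Codimension 2: adding a frequency can only shrink the set.
B2 = bohr_set(11, [1, 3], Fraction(1, 5))
print(sorted(B1), sorted(B2))
```

Every Bohr set contains 0 and is symmetric under $x \mapsto -x$, which the small examples above reflect.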

Proposition 4.5. Let $B$ be a regular Bohr set of codimension $d$, $\beta = \beta_B$ and $\phi$

Lemma 7.1. For $\alpha > 0$ sufficiently small, there is a subset $T \subset \mathbb{Z}_n$ with $|T| \ge \alpha n$ and with 3-AP density at most $\max\left(\frac{1}{n}, \frac{2\alpha}{N(6\alpha)}\right)$.
Proof. Let $N = N(6\alpha) - 1$, so there is $A \subset [N]$ with $|A| = \lceil 6\alpha N \rceil$ which has no nontrivial 3-AP. First assume $n \le 4N$. Partition $[N]$ into at most $2N/n + 1 \le 6N/n$ intervals of length at most $\lceil n/2 \rceil$. The set $A$ contains at least $|A|/(6N/n) \ge \alpha n$ elements in one of these intervals. Viewing these elements as a subset of $\mathbb{Z}_n$, we obtain a subset of $\mathbb{Z}_n$ with density at least $\alpha$ and with no nontrivial 3-AP, and hence 3-AP density at most $1/n$.

Proof of Theorem 3.3. We reuse the notation from the description of the construction. For a 3-AP $(x, x + d, x + 2d)$ in $[N]$, we refer to $f_3(x) f_3(x + d) f_3(x + 2d)$ as its weight. For any common difference $d \ge \frac{N'}{2}$, all the 3-APs $(x, x + d, x + 2d)$ in $[N]$ have zero weight since $x + 2d > N'$. Hence the density of 3-APs with common difference $d$ of $f_3$ is 0. For any common difference $d \ge \frac{N' - \beta N \alpha^3 (1 - \epsilon)}{2}$, let $t = N' - 2d$. Then $t \le \beta N \alpha^3 (1 - \epsilon)$. The number of 3-APs with common difference $d$ in $[N']$ is at most $t$, and hence the number of 3-APs in $[N]$ with nonzero weight is at most $t$. The number of 3-APs with common difference $d$ in $[N]$ is $N - 2d = \beta N + t$, so the density of 3-APs with common difference