Rado's criterion over squares and higher powers

We establish partition regularity of the generalised Pythagorean equation in five or more variables. Furthermore, we show how Rado's characterisation of a partition regular equation remains valid over the set of positive $k$th powers, provided the equation has at least $(1+o(1))k\log k$ variables. We thus completely describe which diagonal forms are partition regular and which are not, given sufficiently many variables. In addition, we prove a supersaturated version of Rado's theorem for a linear equation restricted either to squares minus one or to logarithmically-smooth numbers.


Introduction
Schur's theorem [Sch1916] is a foundational result in Ramsey theory, asserting that in any finite colouring of the positive integers there exists a monochromatic solution to the equation $x + y = z$ (a solution in which each variable receives the same colour). A notorious question of Erdős and Graham asks if the same is true for the Pythagorean equation $x^2 + y^2 = z^2$, offering \$250 for an answer [Grah07,Grah08]. The computer-aided verification [HKM16] of the two-colour case of this problem is reported to be the largest mathematical proof in existence, consuming 200 terabytes [Lam16]. We provide an affirmative answer to the analogue of the Erdős-Graham question for generalised Pythagorean equations in five or more variables.
Theorem 1.1 (Schur-type theorem in the squares). In any finite colouring of the positive integers there exists a monochromatic solution to the equation (1.1).
Rado's criterion for one equation. Let $c_1, \dots, c_s \in \mathbb{Z} \setminus \{0\}$, where $s \geq 3$. Then the equation $\sum_{i=1}^{s} c_i x_i = 0$ is (non-trivially) partition regular over the positive integers if and only if there exists a non-empty set $I \subset [s]$ such that $\sum_{i \in I} c_i = 0$.
A number of authors [Ber96,Ber16,Grah08,DNB18] have sought algebraic characterisations of partition regularity within families of non-linear Diophantine equations. The example of the Fermat equation shows that one cannot hope for something as simple as Rado's criterion for diagonal forms. Nevertheless, provided that the number of variables $s$ is sufficiently large in terms of the degree $k$, we establish that the same criterion characterises partition regularity for equations in $k$th powers.
Theorem 1.3 (Rado over $k$th powers). There exists $s_0(k) \in \mathbb{N}$ such that for $s \geq s_0(k)$ and $c_1, \dots, c_s \in \mathbb{Z} \setminus \{0\}$ the following holds. The equation $\sum_{i=1}^{s} c_i x_i^k = 0$ (1.2) is (non-trivially) partition regular over the positive integers if and only if there exists a non-empty set $I \subset [s]$ such that $\sum_{i \in I} c_i = 0$. Moreover, we may take $s_0(2) = 5$, $s_0(3) = 8$ and $s_0(k) = k(\log k + \log\log k + 2 + O(\log\log k/\log k))$. (1.3)
Notice that Rado's criterion for a linear equation shows that the condition $\sum_{i \in I} c_i = 0$ is necessary for (1.2) to be partition regular. The content of Theorem 1.3 is that this condition is also sufficient.
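The zero-sum condition in Rado's criterion is a finite check on the coefficients. The following is a minimal sketch of that check by exhaustive search over subsets; the function name `satisfies_rado_criterion` is ours and purely illustrative.

```python
from itertools import combinations


def satisfies_rado_criterion(coeffs):
    """Return True if some non-empty subset of `coeffs` sums to zero."""
    if any(c == 0 for c in coeffs):
        raise ValueError("coefficients must be non-zero integers")
    indices = range(len(coeffs))
    # Exhaustive search: exponential in s, which is fine for small s.
    for size in range(1, len(coeffs) + 1):
        for subset in combinations(indices, size):
            if sum(coeffs[i] for i in subset) == 0:
                return True
    return False


# Schur's equation x + y - z = 0: the subset {1, -1} sums to zero.
print(satisfies_rado_criterion([1, 1, -1]))
# Coefficients of x1^2 - x2^2 - x3^2 - x4^2 - x5^2 = 0.
print(satisfies_rado_criterion([1, -1, -1, -1, -1]))
# x + y - 3z = 0 has no zero-sum subset of coefficients.
print(satisfies_rado_criterion([1, 1, -3]))
```

The search only tests the condition on the coefficients; whether that condition suffices for a given degree $k$ and number of variables $s$ is exactly the content of Theorem 1.3.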
For higher-degree equations one cannot avoid the assumption of some lower bound on the number of variables, as the example of the Fermat equation demonstrates. Given current knowledge on the solubility of diagonal Diophantine equations [Woo92], the bound (1.3) is at the cutting edge of present technology. Indeed, it is unlikely that one could improve this condition without making an analogous breakthrough in Waring's problem, since partition regularity implies the existence of a non-trivial integer solution to the equation (1.2).
We remark that one could use the methods of this paper to establish the weaker but explicit bound $s_0(k) \leq k^2 + 1$.
This follows by utilising the work of Bourgain-Demeter-Guth [BDG16] on Vinogradov's mean value theorem, eschewing smooth numbers, as in [Cho17]. We are also able to establish the sufficiency of Rado's criterion for other sparse arithmetic sets of interest, such as logarithmically-smooth numbers and shifted squares. For these sets we avoid certain local issues which must be surmounted for perfect powers, and thereby prove stronger quantitative variants of partition regularity, analogous to work of Frankl, Graham and Rödl [FGR88] counting monochromatic solutions to a linear equation.
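To get a feel for the gap between the bound (1.3) and the explicit bound $k^2 + 1$, one can tabulate the two. The sketch below evaluates only the main term $k(\log k + \log\log k + 2)$ of (1.3); the $O(\log\log k/\log k)$ term is omitted, so the numbers are indicative rather than admissible values of $s_0(k)$. The function names are ours.

```python
import math


def smooth_bound_main_term(k):
    """Main term k(log k + log log k + 2) of (1.3); the error term is omitted."""
    return k * (math.log(k) + math.log(math.log(k)) + 2)


def explicit_bound(k):
    """The weaker explicit bound k^2 + 1."""
    return k * k + 1


for k in (3, 5, 10, 20, 50):
    print(k, round(smooth_bound_main_term(k), 1), explicit_bound(k))
```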
Theorem 1.4 (Supersaturation in squares minus one). Let $c_1, \dots, c_s \in \mathbb{Z} \setminus \{0\}$ with $s \geq 5$ and suppose that $\sum_{i \in I} c_i = 0$ for some non-empty $I$. Define the set of shifted squares by $S := \{x^2 - 1 : x \in \mathbb{Z}\}$. For any $r \in \mathbb{N}$ there exist $c_0 > 0$ and $N_0 \in \mathbb{N}$ such that for any $N \geq N_0$ if we have an $r$-colouring of $S$ then
When $R$ is logarithmic in $N$, of the form $R = \log^K N$, then $|S(N; \log^K N)| \sim N^{1 - K^{-1} + o(1)}$ $(N \to \infty)$, so logarithmically-smooth numbers constitute a polynomially sparse arithmetic set [Gran08]. A recent breakthrough of Harper [Har16] gives a count of the number of solutions to an additive equation in logarithmically-smooth numbers. We are able to extend this count to finite colourings as follows.
Theorem 1.7 (Supersaturation in the smooths). Let $c_1, \dots, c_s \in \mathbb{Z} \setminus \{0\}$, and suppose that $\sum_{i \in I} c_i = 0$ for some non-empty $I$. Then for any $r \in \mathbb{N}$ there exist $c_0 > 0$ and $C, N_0 \in \mathbb{N}$ such that if $N \geq N_0$, $R \geq \log^C N$ and $S(N; R)$ is $r$-coloured then
The term 'supersaturation', from extremal combinatorics, describes when we wish to "determine the minimum number of copies of a particular substructure in a combinatorial object of prescribed size" [NSS18]. For us, the substructure is defined by a Diophantine equation.
As for shifted squares, we emphasise that the corresponding upper bound in (1.5) follows (when $s \geq 3$) from the methods of Harper [Har16].
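The sparsity of smooth numbers can be observed numerically. The sketch below assumes the usual convention that $S(N; R)$ consists of those $n \in [N]$ all of whose prime factors are at most $R$ (with 1 counted as smooth); this convention and the function names are our assumptions for illustration.

```python
def largest_prime_factor(n):
    """Largest prime factor of n >= 2, found by trial division."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    # Whatever remains above 1 is a prime exceeding every factor removed so far.
    return max(largest, n) if n > 1 else largest


def smooth_numbers(N, R):
    """All n in [1, N] whose prime factors are all at most R."""
    return [n for n in range(1, N + 1)
            if n == 1 or largest_prime_factor(n) <= R]


print(smooth_numbers(10, 3))
```

Calling `len(smooth_numbers(N, math.log(N) ** K))` for growing `N` (after `import math`) gives an empirical view of the count $N^{1 - K^{-1} + o(1)}$, although convergence in the exponent is slow.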
1.1. Non-triviality. It may be that (1.2) possesses a wealth of monochromatic solutions for 'trivial' reasons. For instance, if $c_1 + \dots + c_s = 0$ then taking $x_1 = \dots = x_s$ yields many uninteresting solutions. We have distinguished between partition regularity and non-trivial partition regularity to ensure that Rado's criterion still has content in such a situation. However, since Rado's criterion is necessary for 'trivial' partition regularity, the two notions are in fact equivalent.

1.2. Previous work.
To the knowledge of the authors, work on non-linear partition regularity begins with papers of Furstenberg and Sárközy [Fur77,Sár78], independently resolving a conjecture of Lovász, a line of investigation which culminates in the polynomial Szemerédi theorem of Bergelson-Leibman [BL96], proved using ergodic methods. Such methods have also established colouring results for which no density analogue exists, such as partition regularity of the equation $x - y = z^2$ [Ber96,p.53]. Interestingly, the story is more complicated for the superficially similar equation $x + y = z^2$ studied in [KS06,CGS12,GL16,Pac18].
A recent breakthrough of Moreira [Mor17] resolves a longstanding conjecture of Hindman [Hin79], proving partition regularity of the equation $x + y^2 = yz$. More intuitively: in any finite colouring of the positive integers there exists a monochromatic configuration of the form $\{a, a + b, ab\}$. This result is a consequence of a general theorem which also yields partition regularity of equations of the form $x_0 = c_1 x_1^2 + \dots + c_s x_s^2$, subject to the condition that $c_1 + \dots + c_s = 0$.
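The link between the configuration $\{a, a+b, ab\}$ and the equation $x + y^2 = yz$ can be made explicit. The substitution below is one natural choice, written out here as a short worked check rather than quoted from [Mor17]: take $x = ab$, $y = a$ and $z = a + b$.

```latex
\begin{align*}
x + y^2 &= ab + a^2 \\
        &= a(a + b) \\
        &= yz .
\end{align*}
```

Thus a monochromatic set $\{a, a+b, ab\}$ yields a monochromatic solution $(x, y, z) = (ab, a, a+b)$.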
Notice that all of the above results involve an equation with at least one linear term. There are fewer results in the literature concerning genuinely non-linear equations such as (1.2). Certain diagonal quadrics are dealt with in Lefmann [Lef91, Fact 2.8], using Rado's theorem to locate a long monochromatic progression whose common difference possesses a (well-chosen) multiple of the same colour. This results in the following sufficient condition for partition regularity.
This result reduces the combinatorial problem of establishing partition regularity of (1.7) to a task in number theory: find a rational point of a certain form on a variety determined by a diagonal quadric and linear equation. In Appendix F we derive general algebraic criteria guaranteeing such a rational point using the Hardy-Littlewood circle method.
Theorem 1.8 (Lefmann + Hardy-Littlewood circle method). Let $c_1, \dots, c_s \in \mathbb{Z} \setminus \{0\}$, and suppose that $\sum_{i \in I} c_i = 0$ with $I \neq \emptyset$. Suppose in addition that $|I| \geq 6$ and at least two $c_i$ are positive and at least two are negative. Then $c_1 x_1^2 + \dots + c_s x_s^2 = 0$ (1.8) is partition regular.
We emphasise that Lefmann's criterion cannot hope to be a necessary condition for partition regularity, as there are partition regular equations for which the auxiliary Lefmann system (1.6) has no rational point of the required form. Such equations include the generalised Pythagorean equation (1.1), as well as the 'convex' equation (1.9) addressed in [BP17].
In the same article, Lefmann [Lef91, Theorem 2.6] established Rado's criterion for reciprocals. This demonstrates the partition regularity of $1/x + 1/y = 1/z$, answering a question of Erdős and Graham. If one is prepared to relax the definition of partition regularity, so that certain variables are not constrained to receive the same colour as the remainder, then specific homogeneous equations of arbitrary degree are dealt with in Frantzikinakis-Host [FH14]. For instance, one consequence of their methods is that in any finite colouring of the positive integers there exist distinct $x, y$ of the same colour, along with $\lambda$ (possibly of a different colour) such that $9x^2 + 16y^2 = \lambda^2$. (1.10)
However, for these techniques to succeed, not only must one variable of (1.10) be free to take on any colour, but it is also necessary for the solution set to possess a well-factorable parametrisation, allowing for the theory of multiplicative functions to come into play.
When the coefficients of (1.2) sum to zero, partition regularity follows easily, since any element of the diagonal constitutes a monochromatic solution. However, there are results in the literature which also guarantee non-trivial partition regularity in this situation, provided that $s \geq k^2 + 1$. This was first established for quadrics in [BP17] and for general $k$ in [Cho17]. In fact in [Cho17] it is established that, under these assumptions, dense subsets of the primes contain many solutions to (1.2). Density results were obtained for nondiagonal quadratic forms in at least 9 variables by Zhao [Zha17], subject to the condition that the corresponding matrix has columns which sum to zero.
We believe that when the solution set of a given equation contains the diagonal it is more robust with respect to certain local issues; indeed one expects dense sets (such as congruence classes) to contain solutions under this assumption. As a consequence, the local issues for such equations are easier to handle using elementary devices, such as passing to a well-chosen subprogression. The novelty in our methods is that for general equations, instead of tackling the somewhat thorny local problem head on, we show how we may assume our colouring possesses a certain homogeneous structure, and this structure allows the same devices available in the dense regime to come into play.
We remark that it appears to be a challenging problem to decrease $s_0(k)$ substantially below $k^2 + 1$ for the density analogue of Theorem 1.3. In order to show that $s_0(k) = (1+o(1))k\log k$ is admissible in our partition result we make heavy use of the fact that a colouring of the positive integers induces a colouring of the smooth positive integers, and we obtain a monochromatic solution to our equation in the smooths. Sets of positive density, however, may not contain any smooth numbers. We are therefore in the curious situation where we can prove that relatively dense sets of smooth numbers possess solutions to certain diagonal equations, but cannot say the same for dense sets of integers.
It is interesting to compare our results with partition regularity results over the primes. Here congruence obstructions mean that one cannot hope to establish a Rado-type criterion. For example, a parity obstruction prohibits Schur's equation from being partition regular over the primes. The situation is markedly different if one considers modifications of the primes with no local obstructions, such as the set of primes minus one. Partition regularity of the Schur equation over this set was established by Li-Pan [LP12], then generalised to the full Rado criterion for systems of linear equations by Lê [Lê12]. This latter result utilised the full strength of Green and Tao's asymptotic for linear equations in primes [GT10a], together with a characterisation of so-called 'large' sets due to Deuber [Deu73]. Neither of these tools is available, or reasonable to expect, for $k$th powers.
The argument of Li-Pan for Schur's theorem in primes minus one is a direct application of the Fourier-analytic transference principle pioneered by Green [Gre05], elucidated by the same author in the context of partition regularity in a comment on MathOverflow. This approach cannot hope to succeed for perfect powers, at least when the coefficients of the equation do not sum to zero, since one can no longer pass to the same (affine) subprogression in all of the variables. The introduction of homogeneous sets (Definition 2.2) allows us to circumvent these difficulties. However, for squares minus one, or smooth numbers, one need only pass to projective subprogressions when enacting the transference principle. The methods of Part 3 therefore use a direct form of the transference principle analogous to Li-Pan. We include the argument to illustrate the subtleties which must be overcome for perfect powers.
1.3. Notation. We adopt the convention that $\varepsilon$ denotes an arbitrarily small positive real number, so its value may differ between instances. We shall use Vinogradov and Bachmann-Landau notation: for functions $f$ and positive-valued functions $g$, write $f \ll g$ or $f = O(g)$ if there exists a constant $C$ such that $|f(x)| \leq C g(x)$ for all $x$. At times we opt for a more explicit approach, using $C$ to denote a large absolute constant (whose value may change from line to line), and $c$ to denote a small positive absolute constant. The notation $f \asymp g$ is the same as $f \ll g \ll f$. We write $[Y] := \{1, 2, \dots, \lfloor Y \rfloor\}$. We write $\mathbb{T}$ for the torus $\mathbb{R}/\mathbb{Z}$. For $x \in \mathbb{R}$ and $q \in \mathbb{N}$, put $e(x) = e^{2\pi i x}$ and $e_q(x) = e^{2\pi i x/q}$. If $S$ is a set, we denote the cardinality of $S$ by $|S|$ or $\#S$.
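For readers who wish to experiment, the characters $e(x)$ and $e_q(x)$ translate directly into code. The sketch below also checks the standard orthogonality relation for additive characters modulo $q$, which is a general fact and is not taken from this section; the helper name `orthogonality_sum` is ours.

```python
import cmath


def e(x):
    """Additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def e_q(x, q):
    """e_q(x) = exp(2*pi*i*x/q)."""
    return e(x / q)


def orthogonality_sum(a, q):
    """Sum of e_q(a*n) over n = 0, ..., q-1: q if q divides a, otherwise 0."""
    return sum(e_q(a * n, q) for n in range(q))


print(abs(orthogonality_sum(3, 5)))   # numerically close to 0
print(abs(orthogonality_sum(10, 5)))  # numerically close to 5
```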
Throughout we use counting measure on $\mathbb{Z}^d$ and Haar probability measure on the dual $\mathbb{T}^d$:
Define the Fourier transform of $f$ by
We endow $\mathbb{T}^d$ with the metric $(\alpha, \beta) \mapsto \|\alpha - \beta\|$, where

Methods
All of the essential ideas required for Theorem 1.3 are contained in the proof of the following finitary analogue of Theorem 1.1, whose deduction is the focus of this section.
Theorem 2.1 (Finitary Schur-type theorem in the squares). For any $r \in \mathbb{N}$ there exists $N_0 = N_0(r)$ such that for any $N \geq N_0$ the following is true. Given an $r$-colouring of $[N]$ there exists a monochromatic solution to the equation $x_1^2 - x_2^2 = x_3^2 + x_4^2 + x_5^2$.
Chapman [Cha18] has observed that this is a quantitative variant of what it means to be multiplicatively syndetic (see ), and that such sets appear to have a number of interesting properties in regard to the partition regularity of homogeneous systems of polynomial equations.
We leave it as an exercise for the reader to verify that if $B$ is an $M$-homogeneous set then $|B \cap [N]| \gg_M N$ for $N$ sufficiently large in terms of $M$, so homogeneous sets are dense (see Lemma 4.2). In fact they are dense on all sufficiently long homogeneous arithmetic progressions.
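Homogeneity is easy to test on small examples. The sketch below assumes the following reading of the definition, inferred from how it is used in §4.2.2 and not restated in this section: $B$ is $M$-homogeneous in $[N]$ if $B$ meets the dilated interval $q \cdot [M] = \{q, 2q, \dots, Mq\}$ for every positive integer $q \leq N/M$. The function name is ours.

```python
def is_homogeneous_in(B, M, N):
    """Assumed definition: B meets {q, 2q, ..., Mq} for every 1 <= q <= N/M."""
    B = set(B)
    return all(
        any(q * m in B for m in range(1, M + 1))
        for q in range(1, N // M + 1)
    )


evens = range(2, 101, 2)
odds = range(1, 101, 2)
print(is_homogeneous_in(evens, 2, 100))  # every q*[2] contains the even number 2q
print(is_homogeneous_in(odds, 2, 100))   # 2*[2] = {2, 4} contains no odd number
```

Under this reading, each of the $\lfloor N/M \rfloor$ dilates contains an element of $B$ and each element lies in at most $M$ dilates, which is one way to see a lower bound of the shape $|B \cap [N]| \geq \lfloor N/M \rfloor / M$.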
We demonstrate the utility of this definition by giving a proof of Schur's theorem. The argument is prototypical for that employed in the proof of Theorem 2.1.
Proof of Schur's theorem. We induct on the number of colours $r$ to show that there exists $N_r \in \mathbb{N}$ such that however $[N_r]$ is $r$-coloured there exist $x_1, x_2, x_3 \in [N_r]$ all of the same colour with $x_1 + x_2 = x_3$.
The base case of 1-colourings follows on taking $N_1 = 2$, so we may assume that $r \geq 2$. Let $N$ be a large positive integer, whose size (depending on $r$) is to be determined, and fix an $r$-colouring $[N] = C_1 \cup \dots \cup C_r$. Set $M := N_{r-1}$ and consider two possibilities.
The inhomogeneous case: Some colour class $C_i$ is not $M$-homogeneous in $[N]$. From the definition of homogeneity it follows that there exists a pos-
Since $M = N_{r-1}$ it follows from our induction hypothesis that there exist
Schur's theorem follows in this case on setting $x_t := q x'_t$ for $t = 1, 2, 3$.
The homogeneous case: All colour classes are $M$-homogeneous in $[N]$. In this case it turns out that every colour class contains a solution to the Schur equation, provided that $N$ is sufficiently large in terms of $r$. To prove this we invoke the following.
then there exist $x, x' \in A$ and $y \in B$ such that $x - x' = y$.
The claim settles the homogeneous case of Schur's theorem on taking $A = B$ to be any colour class, since $M$-homogeneous sets have density at least $M^{-2} + o(1)$ in $[N]$ (see Lemma 4.2; one could have alternatively taken the largest colour class).
To prove the claim we invoke Szemerédi's theorem! This yields $N_0 = N_0(\delta, M)$ such that for any $N \geq N_0$ if $A \subset [N]$ with $|A| \geq \delta N$ then $A$ contains an arithmetic progression of length $M + 1$, so that there exist $x$ and $q > 0$ for which
Taking $x' = x + y$ establishes the claim and completes our proof of Schur's theorem.
It may seem excessive to employ a density result in the proof of a colouring result, since (typically) density results lie deeper and require more work to prove. We have described this approach to motivate our proof of Theorem 1.1, which uses an analogous non-linear density result. We also believe the proof offers an alternative reason for why Schur's theorem is true: there is always a long homogeneous arithmetic progression on which one of the colour classes is multiplicatively syndetic. This exemplifies a well-used philosophy in Ramsey theory that underlying every partition result there is some notion of largeness.
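For small numbers of colours, the finitary form of Schur's theorem can be confirmed by exhaustive search. The sketch below is a plain backtracking search and is not the inductive argument given above; it returns the largest $N$ for which $[N]$ admits an $r$-colouring with no monochromatic solution to $x_1 + x_2 = x_3$ (with $x_1 = x_2$ permitted, matching the base case $N_1 = 2$). The function name is ours.

```python
def largest_schur_free(r):
    """Largest N such that [N] has an r-colouring with no monochromatic x + y = z."""
    best = 0
    classes = [set() for _ in range(r)]

    def extend(n):
        nonlocal best
        # Reaching this call means 1, ..., n-1 have been coloured without a solution.
        best = max(best, n - 1)
        for c in classes:
            # n may join class c only if no x in c has n - x in c as well.
            if all((n - x) not in c for x in c):
                c.add(n)
                extend(n + 1)
                c.remove(n)
            if not c:
                # Symmetry breaking: all remaining classes are empty too,
                # so trying them would only permute colour names.
                break

    extend(1)
    return best


for r in (1, 2, 3):
    print(r, largest_schur_free(r))
```

The search is feasible only for very small `r`, since the relevant thresholds grow quickly with the number of colours.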
To prove partition regularity of the generalised Pythagorean equation we induct on the number of colours as in our proof of Schur's theorem. The inhomogeneous case follows with minimal change to the argument. In the remaining case we may assume that all colour classes are homogeneous. In this situation we are able to show that every colour class contains many solutions to our non-linear equation by employing the following density result.
Using Green's Fourier-analytic transference principle [Gre05], as elucidated for squares in [BP17,Pre17a], the deduction of Theorem 2.3 is reduced (in §§5-6) to a linear analogue in which the squares have been removed from the dense variables. This can be thought of as a generalisation of the Furstenberg-Sárközy theorem [Fur77,Sár78], extended to homogeneous sets.
tuples $(x, y) \in A^2 \times B^3$ satisfying the equation
Our ability to remove the squares from the dense variables is intrinsically linked to the fact that the coefficients corresponding to these variables sum to zero. One consequence of this is that we may restrict all of the dense variables to lie in the same congruence class, without destroying solutions to the equation in the process.
Theorem 2.4 is ultimately derived (in §8) from the following result, which is both more general and at the same time slightly weaker than Theorem 2.4. It is weaker in that it yields only one solution to (2.2), yet it applies to the more general context of multidimensional sets of integers. The increase in dimension allows us to deduce a supersaturation result for (2.2) by bootstrapping the existence of a single solution to the existence of many solutions, using an averaging argument first implemented by Varnavides [Var59].
$x, x' \in A$ and $y_1 \in B_1, \dots, y_d \in B_d$ such that $x - x' = (y_1^2, \dots, y_d^2)$. (2.3)
In §7 this theorem is proved using the Fourier-analytic density increment strategy pioneered by Roth [Rot53], a proof which yields quantitative bounds on $N_0$. One can deduce the qualitative statement in a few lines from the multidimensional polynomial Szemerédi theorem of Bergelson and Leibman [BL96], see Corollary 9.1. The general Rado criterion of Theorem 1.3 requires a more complicated density result for which Fourier analysis does not appear sufficient and which therefore necessitates the invocation of this deep result.

Open problems
3.1. The supersaturation result. Frankl, Graham and Rödl [FGR88] establish that for any $r$-colouring of $[N]$, a linear equation $\sum_{i=1}^{s} c_i x_i = 0$ satisfying Rado's criterion has $\gg_r N^{s-1}$ monochromatic solutions. Our methods do not yield the analogous supersaturation result for equation (1.2). We instead find that if $N$ is sufficiently large in terms of $M$ then $[N]$ contains a homogeneous arithmetic progression of length $M$ which possesses at least $\gg_r M^{s-k}$ monochromatic solutions to (1.2). This deficiency is an artefact of our method where, to avoid tackling certain local issues, we iteratively pass to a well-chosen homogeneous subprogression.
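The quantity bounded in the Frankl-Graham-Rödl theorem can be computed directly for small $N$. The sketch below counts monochromatic solutions to a linear equation by brute force; it is an illustration of the count only, and the function name and the dictionary representation of a colouring are our choices.

```python
from itertools import product


def count_mono_solutions(coeffs, colouring):
    """Count tuples x, all in one colour class, with sum(c_i * x_i) == 0.

    `colouring` maps each integer in [N] to its colour.
    """
    classes = {}
    for n, colour in colouring.items():
        classes.setdefault(colour, []).append(n)
    total = 0
    for members in classes.values():
        # Ordered tuples, repetitions allowed, drawn from a single colour class.
        for xs in product(members, repeat=len(coeffs)):
            if sum(c * x for c, x in zip(coeffs, xs)) == 0:
                total += 1
    return total


schur = (1, 1, -1)
print(count_mono_solutions(schur, {1: 0, 2: 0, 3: 0}))
print(count_mono_solutions(schur, {1: "red", 4: "red", 2: "blue", 3: "blue"}))
```

The running time is of order $N^s$, so this is only suitable for experimentation with small parameters.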
It may be possible to establish a supersaturation result if one is prepared to replace the homogeneous arithmetic progressions appearing in this paper with quadratic Bohr sets. Informally, let us call a set quadratic Bohr homogeneous if it has large intersection with all quadratic Bohr sets (centred at zero). Then our methods reduce to showing that if $A$ is a dense subset of a quadratic Bohr set and if $B$ is quadratic Bohr homogeneous, then there are many solutions to the equation $x_1^2 - x_2^2 = y_1^2 + y_2^2 + y_3^2$ with $x_i \in A$ and $y_i \in B$. A promising strategy for obtaining such a result proceeds by decomposing $1_A$ according to a variant of the arithmetic regularity lemma developed by Green and Tao [GT10b]. It is in fact this strategy which informs the simpler approach developed in this paper.
3.2. Quantitative bounds. Define the Rado number (see [GRS90,p.103]) of the equation (1.2) to be the smallest positive integer $R_{c,k}(r)$ such that any $r$-colouring of the interval $\{1, 2, \dots, R_{c,k}(r)\}$ results in at least one monochromatic tuple $(x_1, \dots, x_s)$ satisfying (1.2) with all $x_i$ distinct. For linear equations, this quantity has been extensively studied by Cwalina and Schoen [CS17], with near optimal bounds extracted for certain choices of coefficients. In [BP17] it is shown that when $k = 2$, $c_1 + \dots + c_s = 0$ and $s \geq 5$ then there exists a constant $C_c$ such that $R_{c,2}(r) \leq \exp\exp\exp(C_c r)$. (3.1)
It is feasible that the methods of this paper lead to quantitative bounds for the Rado number of the equation (1.2) provided that there exist coefficients with $c_i = -c_j$. In this situation, all of the results we employ in our argument can be proved using Fourier-analytic methods, where the quantitative machinery is well-developed. However, these bounds are sure to be of worse quality than (3.1) due to our induction on the number of colours, a feature of the argument not present in [BP17].
If there are no coefficients satisfying $c_i = -c_j$, then any hope of extracting quantitative bounds on $R_{c,k}(r)$ is diminished, since the methods of this paper invoke the multidimensional (polynomial) Szemerédi theorem, a result for which there are no quantitative bounds presently known. It would be interesting if one could avoid calling on such a deep result.
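The definition of the Rado number $R_{c,k}(r)$ can be turned into a naive search, useful only for very small cases. The sketch below enumerates every $r$-colouring of $[N]$ for increasing $N$; the function name and the `limit` cut-off are ours, and the approach is hopeless for the equations with $k \geq 2$ studied in this paper, where the relevant numbers are far beyond exhaustive search.

```python
from itertools import permutations, product


def rado_number(coeffs, k, r, limit):
    """Smallest N <= limit such that every r-colouring of [N] contains a
    monochromatic solution of sum(c_i * x_i**k) == 0 with distinct x_i.
    Returns None if no such N <= limit exists."""
    s = len(coeffs)
    for N in range(1, limit + 1):
        # All solutions in [N] with pairwise distinct coordinates.
        solutions = [
            xs for xs in permutations(range(1, N + 1), s)
            if sum(c * x ** k for c, x in zip(coeffs, xs)) == 0
        ]

        def has_mono(colouring):
            return any(len({colouring[x - 1] for x in xs}) == 1 for xs in solutions)

        if all(has_mono(col) for col in product(range(r), repeat=N)):
            return N
    return None


# Linear case k = 1 for x + y = z with distinct variables and two colours.
print(rado_number((1, 1, -1), 1, 2, limit=12))
```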
3.3. Systems of equations. Rado [Rad33] characterised when systems of linear equations are partition regular. This criterion says that a system Ax = 0 is partition regular if and only if the integer matrix A satisfies the so-called columns condition (see [GRS90,p.73]). We conjecture that the columns condition is sufficient for systems of equations in kth powers, provided that the number of variables is sufficiently large in terms of the degree and the number of equations, and that the matrix of coefficients is sufficiently generic. For instance, in analogy with results of Cook [Coo71] we posit the following.
Conjecture 3.1. Let $a_1, \dots, a_s, b_1, \dots, b_s \in \mathbb{Z} \setminus \{0\}$. Then the system of equations
$a_1 x_1^2 + \dots + a_s x_s^2 = 0$
$b_1 x_1^2 + \dots + b_s x_s^2 = 0$
is non-trivially partition regular, provided that
(i) $s \geq 9$;
(ii) the matrix $A := \begin{pmatrix} a_1 & \dots & a_s \\ b_1 & \dots & b_s \end{pmatrix}$ satisfies the columns condition;
(iii) for any real numbers $\lambda, \mu$ that are not both zero, the vector $(\lambda, \mu)A$ has at least five non-zero entries, not all of which have the same sign.
Condition (ii) is certainly necessary for partition regularity, by Rado's criterion. Weakening conditions (i) and (iii) would presumably require improvements in circle method technology.
3.4. Roth with logarithmically-smooth common difference. Using the arguments of §9 one can prove the following (see Remark 9.3).
Theorem 3.2. If $A \subset [N]$ lacks a three-term arithmetic progression with $R$-smooth common difference, where $10 \leq R \leq N$, then (3.2)
When $R = \log^K N$ for some fixed absolute constant $K$, the set of $R$-smooth numbers in $[N]$ has cardinality $N^{1 - K^{-1} + o(1)}$. Common differences arising from such a set are therefore polynomially sparse, and Theorem 3.2 results in a density bound of the form $(\log\log N)^{-1+o(1)}$.
The argument for Theorem 3.2 really only uses the fact that the R-smooths contain the interval [R], and that A must be dense on a translate of this set, so we are in fact locating a 'short' arithmetic progression. Since smooth arithmetic progressions are much more abundant than short arithmetic progressions, it would be interesting if one could obtain a better density bound by exploiting this.
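The hypothesis of Theorem 3.2 is straightforward to test on explicit finite sets. The sketch below checks whether a set contains a three-term arithmetic progression whose common difference is $R$-smooth, taking $R$-smooth to mean that every prime factor is at most $R$; that convention and the function names are our assumptions.

```python
def is_smooth(n, R):
    """True if every prime factor of the positive integer n is at most R."""
    p = 2
    while p <= R and n > 1:
        while n % p == 0:
            n //= p
        p += 1
    return n == 1


def has_smooth_3ap(A, R):
    """True if the non-empty set A contains a, a + d, a + 2d with d >= 1 R-smooth."""
    A = set(A)
    top = max(A)
    return any(
        is_smooth(d, R) and a + d in A and a + 2 * d in A
        for a in A
        for d in range(1, (top - a) // 2 + 1)
    )


print(has_smooth_3ap({1, 8, 15}, 5))  # the only candidate difference is 7
print(has_smooth_3ap({1, 8, 15}, 7))
```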
The only other bound known for Roth's theorem with common difference arising from a polynomially sparse arithmetic set can be found in [Pre17b], which deals with perfect $k$th powers. This also results in a double logarithmic bound, of the form $(\log\log N)^{-c_k}$ for some small $c_k > 0$. Breaking the double logarithmic barrier for the smooth Roth problem may be a tractable intermediate step towards improving bounds in the polynomial Roth theorem.

Part 1. The generalised Pythagorean equation
In this part we establish partition regularity of the 5-variable Pythagorean equation. The proof contains all of the essential ideas required for Theorem 1.3 but is more transparent, avoiding notational complexities and the need for smooth number technology. Unlike the general case, we show that all requisite steps can be established using Fourier analysis, avoiding recourse to deeper results involving higher-order uniformity and the multidimensional Szemerédi theorem. This may be of use to those interested in quantitative bounds and supersaturation.
Throughout this part we assume familiarity with the high-level schematic outlined in §2.
By Theorem B.1, there exist $N_1 \in \mathbb{N}$ and $c_1 > 0$ such that for $N \geq N_1$ we have
Since the latter quantity is positive, Theorem 2.1 follows for 1-colourings (the base case of our induction).
4.2. The inductive step. Let $[N] = C_1 \cup \dots \cup C_r$ be an $r$-colouring. We split our proof into two cases depending on the homogeneity of the $C_i$.
4.2.1. The inhomogeneous case. Let $M := N_0(r-1)$ be the quantity whose existence is guaranteed by our inductive hypothesis. We first suppose that some
By the induction hypothesis, there exist $y_k \in C'_j$ for some $j \neq i$ such that $y_1^2 - y_2^2 = y_3^2 + y_4^2 + y_5^2$. Setting $x_k := q y_k$ we obtain elements of $C_j$ which solve the generalised Pythagorean equation.
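The final step of the inhomogeneous case uses only that the equation is homogeneous of degree two, so that dilating a solution by $q$ gives another solution. Written out with $x_k = q y_k$:

```latex
\begin{align*}
x_1^2 - x_2^2 &= q^2 \left( y_1^2 - y_2^2 \right) \\
              &= q^2 \left( y_3^2 + y_4^2 + y_5^2 \right) \\
              &= x_3^2 + x_4^2 + x_5^2 .
\end{align*}
```

The same computation applies to any diagonal equation in $k$th powers, with $q^2$ replaced by $q^k$.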

4.2.2. The homogeneous case. In this case every colour class is $M$-homogeneous in $[N]$. We claim that Theorem 2.3 then implies that each $C_i$ contains a solution to the generalised Pythagorean equation. First we observe that each colour class is dense.
Proof. We proceed by a variant of Varnavides averaging [Var59]. For each $q \leq N/M$ the definition of homogeneity gives
Summing over $q$ then yields
Interchanging the order of summation, we see that
The result follows on noting that
Since the latter quantity is positive the induction step follows, completing the proof of Theorem 2.1. Note that a quantity dependent on $M = N_0(r-1)$ is ultimately dependent only on $r$.

A pseudorandom Furstenberg-Sárközy theorem
In §4 we reduced partition regularity of the generalised Pythagorean equation (1.1) to Theorem 2.3. In §6 we deduce the latter result from Theorem 2.4. To prepare the ground for this deduction, we first modify Theorem 2.4 to accommodate sets which are relatively dense in a suitably pseudorandom set. The goal is to find the weakest possible pseudorandomness conditions required for such a result to hold. Our primary quantity of interest is the following.
Definition 5.1 ($T_1$ counting operator). Given functions $f_1, f_2 : \mathbb{Z} \to \mathbb{C}$ with finite support and $B \subset \mathbb{Z}$, define
We
and an application of Theorem 2.4 completes the proof.
Our next step is to weaken the assumptions of Theorem 2.4 even further, replacing bounded functions with unbounded functions which are sufficiently pseudorandom. The pseudorandomness we enforce posits the existence of a 'random-like' majorising function $\nu$, whose properties are given in the following two definitions.
Definition 5.3 (Fourier decay). We say that $\nu$ :
Definition 5.4 ($p$-restriction). We say that $\nu$ :
Theorem 5.5 (Pseudorandom Sárközy). For any $\delta > 0$ and $K, M \in \mathbb{N}$ there exist $N_0, c_0, \theta > 0$ such that for any $N \geq N_0$ the following holds. Let $B$ be an $M$-homogeneous set of positive integers. Let $\nu : [N] \to [0, \infty)$ satisfy a 4.995-restriction estimate with constant $K$, and have Fourier decay of level $\theta$.
Then for any $f$ :
Proof. Since $\nu$ has Fourier decay of level $\theta$, we may apply the dense model lemma recorded in [Pre17a, Theorem 5.1], rescaling as appropriate, to conclude the existence of $g$ :
Provided that $\theta \leq \exp(-C\delta^{-1})$ with $C$ a large positive constant, we can compare Fourier coefficients at 0 to deduce that $\|g\|_1 \gg \delta N$. Applying Lemma 5.2 then gives
Let $h$ denote the indicator function of the set
The function $h$ is majorised by the indicator function of the set
which, by Lemma B.3, satisfies a 4.995-restriction estimate with constant $O(1)$. The function $g$ is majorised by $1_{[N]}$, which satisfies a 4.995-restriction estimate with constant $O(1)$. Employing the generalised von Neumann lemma (Lemma C.3), together with (5.1) and (5.3), we deduce that
Combining this with (5.2) and choosing $\theta \leq \theta_0(\delta, M, K)$ completes the proof.

6. The $W$-trick for squares: a simplified treatment
In this section we deduce our non-linear density result (Theorem 2.3) from its pseudorandom analogue (Theorem 5.5). The heart of the matter is massaging the set of squares to appear suitably pseudorandom. This is accomplished using a version of the $W$-trick for squares, simplified from that developed in Browning-Prendiville [BP17].
It is useful to have a non-linear version of the operator T 1 introduced in §5.
Definition 6.1 ($T_2$ counting operator). Given functions $f_1, f_2 : \mathbb{Z} \to \mathbb{C}$ with finite support and $B \subset \mathbb{Z}$, define
Assuming the notation and premises of Theorem 2.3, our objective is to obtain a lower bound for $T_2(A; B)$ by relating it to an estimate for $T_1(f; B)$, where $f$ is a function bounded above by a pseudorandom majorant $\nu$, as in Theorem 5.5. Let
is a constant to be determined, and the product is over primes. By Lemma A.4, applied with $S = [N]$, there exists a $w$-smooth positive integer $\zeta \ll_{\delta,w} 1$, and $\xi \in [W]$ with $(\xi, W) = 1$, such that
and, noting that $(2W)^{1/2}$ is a positive integer, set
One may check that $B_1$ is $M$-homogeneous, and that there exists an absolute constant $C$ such that if $N \geq C(\delta\zeta W)^{-1}$ then
By the binomial theorem
We note that although the squares are not equidistributed in arithmetic progressions with small modulus, the same cannot be said of the set
This is the reason for our passage from $A$ to $A_1$; the latter is a subset of the more pseudorandom set (6.4). Unfortunately, the (truncated) Fourier transform of (6.4) still does not behave sufficiently like that of an interval: they decay differently around the zero frequency, reflecting the growing gaps between consecutive elements of (6.4). To compensate for this, we must work with a weighted indicator function of $A_1$ that counteracts this increasing sparsity.
We first observe that $A_1$ is contained in the interval $[X]$, where
Define a weight function $\nu$ :
Since the results we are about to invoke are independent of the normalisation of $\nu$, we note that we could replace the weight $Wx + \xi$ in the above definition by $x$, or even by $\sqrt{n}$. We have chosen to incorporate the more complicated weight in order to make calculations a little cleaner. The weight $\nu(\cdot)$ has average value 1, since
Lemma 6.2 (Density transfer). For $N$ large in terms of $w$ and $\delta$ we have
Proof. For $N$ sufficiently large in terms of $\delta$ and $w$ the estimate (6.2) holds so, with $Z > 0$ a parameter, we have
An application of (6.6) completes the proof.
The following two ingredients are established in Appendices D and E.
Lemma 6.3 (Fourier decay). We have Lemma 6.4 (Restriction estimate). For any real number p > 4 we have Proof of Theorem 2.3. Let K denote the absolute constant implicit in Lemma 6.4 when p = 4.995. Let N 0 and θ denote the parameters occurring in Theorem 5.5 with respect to a density of δ 2 /256, restriction constant K and homogeneity of level M. Employing Lemma 6.3, we may choose w = w(δ, M) sufficiently large to ensure that ν has Fourier decay of level θ with respect to 1 [X] . Setting f = ν1 A 1 in Theorem 5.5 yields . This inequality completes the proof of Theorem 2.3 on noting that X ≫ δ,M N 2 and ν ∞ ≪ N.

Multidimensional homogeneous Furstenberg-Sárközy
It remains to establish Theorem 2.4. In §8 we derive this supersaturated counting result from a multidimensional 'existence' result, Theorem 2.5, whose proof is the aim of this section. One can prove Theorem 2.5 succinctly using the multidimensional polynomial Szemerédi theorem of Bergelson-Leibman [BL96]; see Corollary 9.1 for such an argument. One may regard such an approach as overkill, and of little utility if one is interested in quantitative bounds. In this section we opt for a more circuitous approach which demonstrates how Fourier analysis suffices for Theorem 2.5. More precisely, we adapt the Fourier-analytic density increment strategy originating with Roth [Rot53] and Sárközy [Sár78], and show how it may accommodate the presence of homogeneous sets. The structure of our argument is based on Green [Gre02].
Proof of Theorem 2.5 given Lemma 7.1. Let us assume that A ⊂ [N] d has size at least δN d and lacks solutions to (7.1) with y i ∈ B i , where the B i are Mhomogeneous sets. Setting A 0 := A, we iteratively apply Lemma 7.1 to obtain a sequence of sets A 0 , A 1 , A 2 , . . . , each contained in an ambient grid [N n ] d with If this iteration continues until n is sufficiently large in terms of d, δ, M, we obtain a density exceeding 1, which would be impossible. Hence for some n ≪ d,δ,M 1 the inequality (7.2) is satisfied with N n in place of N therein. Therefore We henceforth proceed with the proof of Lemma 7.1. Put Write T B (f ) for T B (f, f ). With this notation, our assumption is that Then by bilinearity Hence there exists g : [N] → [0, 1] with g 2 √ δN and such that Since the balanced function f has average value 0, one can regard (7.3) as exhibiting the fact that f displays some form of non-uniformity. In order to demonstrate this formally we require the following lemmas.
Lemma 7.2 (Homogeneous counting lemma). Let $B = B_1 \times \cdots \times B_d$ be a product of $M$-homogeneous sets. Then for $N \geq 64M^2$ we have

Proof. It suffices to prove the result for $d = 1$, since

If $y \in [\sqrt{N/2}]$ then $y^2 \in [N/2]$, so for $y$ in this interval we have

Summing over $y$ lying in the intersection of this interval with a homogeneous set $B$, we apply Lemma 4.2 to deduce that

The result follows provided that $N$ is sufficiently large.
Then for i = 1, 2 we have Proof. We prove the result for i = 1, the other case being similar.
By orthogonality and Hölder's inequality, we have The result now follows on incorporating Parseval's identity The latter mean value estimate follows from orthogonality and Theorem B.1.
When taken in conjunction with (7.3), Lemmas 7.2 and 7.3 imply that for N 64M 2 there exists α ∈ T d for which Lemma 7.4 (Fragmentation into level sets). If α ∈ T d , Q 1 and P ∈ N then there exist positive integers q i Q and a partition of Z d into sets R of the form such that for any g : Z d → [−1, 1] with finite L 1 norm we have the estimate Proof. By a weak form of a result of Heilbronn [Hei48], there are q 1 , . . . , q d Q such that We partition Z d into congruence classes of the form then partition each copy of Z appearing in this product into a union of intervals of the form 2nP + (−P, P ] with n ∈ Z. This yields a partition of Z d into sets R of the form (7.5).
It then follows from the triangle inequality that Let us take P := N 1/9 and Q := N 3/8 . Then, provided that (7.2) fails to hold, we have With these bounds in hand, we claim that we may apply Lemma 7.4 to (7.4) and conclude that there exists a set R contained in [N] d and of the form (7.5) for which Let us presently set about showing this. The first bound in (7.8), together with (7.6), implies that By definition, the balanced function has average value x f (x) = 0, so adding this quantity to either side of the inequality gives Inspection of the proof of Lemma 7.4 reveals that the number of R which The second inequality in (7.8) now implies that By (7.8) and (7.10), the number of R contained in An application of the pigeonhole principle finally confirms (7.9). The estimate (7.9) completes our proof of Lemma 7.1, for if R takes the form (7.5) with P = N 1/9 then we may take N 1 := 2P , B ′ i := {y ∈ N : q i y ∈ B i } and

Varnavides averaging for supersaturation
We complete the proof of Theorem 2.1 by deducing the counting result, Theorem 2.4, from the multidimensional existence result, Theorem 2.5. The deduction proceeds by collecting a single configuration from many subprogressions, then establishing that these configurations do not coincide too often. This random sampling argument originates with Varnavides [Var59].
Proof. For q, n ∈ Z d write q ⊗2 ⊗ n for the tuple (q 2 1 n 1 , . . . , q 2 d n d ) and write Let N 0 = N 0 (δ/2 1+d , d, M) be the quantity given by Theorem 2.5. Suppose that N N 0 and write Q := N/N 0 . Averaging, we have By the definition of Q, there are at most (2N) d choices for z for which there exists q ∈ [Q] d such that Hence there are at least 1 Call each such choice of (z, q) a good tuple. Define Translating and dilating, we deduce that each set As there are at most N 0 choices for m i for fixed y i , there are at most N d 0 choices for q. Once one has fixed this choice of q we have so there are at most N d 0 choices of z for fixed x. This establishes the claim. Invoking the claim gives Next we interchange the order of summation to find that It follows that The result follows since N 0 ≪ δ,d,M 1.
Proof that Proposition 8.1 implies Theorem 2.4. We prove a more general result for sums of $d$ squares. First note that, by translation, Proposition 8.1 remains valid for dense subsets of

For each such tuple the sum $x = x_1 + \cdots + x_d$ is an element of the one-dimensional set $A$, as is $x + y_1^2 + \cdots + y_d^2$. As each element of $A$ has at most $(2N + 1)^{d-1}$ representations of the form $x_1 + \cdots + x_d$, it follows that the number of solutions to x

Part 2. Rado's criterion over squares and higher powers
In this part we prove Theorem 1.3. Let $\eta = \eta_k > 0$ be a fixed constant, where $\eta_2 = 1$, and $\eta_k$ is sufficiently small when $k \geq 3$. In other words, we will work with smooth numbers when $k \geq 3$, but not when $k = 2$. This choice will improve our mean value estimate in the former situation, and our minor arc estimate in the latter.

The smooth homogeneous Bergelson-Leibman theorem
We begin our investigation of Rado's criterion in $k$th powers by generalising Theorem 2.4, which asserts that dense multidimensional sets contain configurations of the form $(x_1, \dots, x_d)$, $(x_1 + y_1^2, \dots, x_d + y_d^2)$ with the $y_i$ lying in a homogeneous set. We require a version of this result which concerns affine configurations determined by $k$th powers, similar in flavour to the following special case of the multidimensional polynomial Szemerédi theorem of Bergelson-Leibman [BL96].
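To make the configurations in Theorem 2.4 concrete, the following is a minimal brute-force sketch that counts pairs $(x, y)$ with both $(x_1, \dots, x_d)$ and $(x_1 + y_1^2, \dots, x_d + y_d^2)$ lying in a finite set. It is an illustration only: the function name `count_configurations` is ours, and for simplicity every $y_i$ is drawn from a single finite set `B` rather than from separate homogeneous sets.

```python
from itertools import product

def count_configurations(A, B, d):
    """Count pairs (x, y) with x in A, y in B^d and (x_i + y_i^2)_i in A.

    A is a collection of d-tuples of integers; B is a finite collection of
    positive integers standing in for the set of admissible shifts.
    """
    A = set(A)
    count = 0
    for x in A:
        for y in product(B, repeat=d):
            shifted = tuple(xi + yi * yi for xi, yi in zip(x, y))
            if shifted in A:
                count += 1
    return count

# Toy example with d = 2.
A = {(1, 1), (2, 5), (5, 2), (2, 2)}
B = {1, 2}
print(count_configurations(A, B, 2))  # 3
```

A supersaturation statement asserts that, for dense $A$, this count is within a constant factor of the trivial upper bound rather than merely positive.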
We require a version of this result in which the kth power comes from a homogeneous set. Fortunately, this strengthening can be deduced from the original. It is convenient to set up the following notation.
Notation. Given q, y, k ∈ N d define q ⊗ y := (q 1 y 1 , . . . , q d y d ), Here is our version of the Bergelson-Leibman theorem with common difference arising from a homogeneous set. x + y ⊗k ⊗ F ⊂ A. (9.1) Proof. Let K := i k i and consider the finite set By the Bergelson-Leibman theorem, provided that N ≫ M,K,F,δ 1, there exist x ∈ Z d and t ∈ N such that The result follows if the progression t K · [M K ] contains an element of the form Next we require a counting analogue of this result. In fact, we need to count the number of configurations arising from a smooth common difference. Before stating the theorem, we remind the reader of what it means for a set to be M-homogeneous in the N η -smooths (see Definitions 1.6 and 2.2).
Theorem 9.2 (Varnavides averaging). Let $k_1, \dots, k_d, M \in \mathbb{N}$, $\eta, \delta \in (0, 1]$, and let $F \subset \mathbb{Z}^d$ be a finite set. There exist $N_0 \in \mathbb{N}$ and $c_0 > 0$ such that for any $N \geq N_0$, if $A \subset [N]^d$ has $|A| \geq \delta N^d$ and $B \subset \mathbb{N}$ is $M$-homogeneous in the $N^\eta$-smooths, then the number of tuples $(x, y) \in \mathbb{Z}^d \times B^d$ for which (9.1) holds is at least

Proof. Increasing the size of $F$ if necessary, we may assume that $F$ contains two elements which differ in the $i$th coordinate for each $i \in [d]$. Let $N_0$ be the quantity given by Corollary 9.1 with respect to the density $\delta/2^{d+1}$. Suppose that $N \geq N_0^{1/\eta}$ (9.2), and define the following sets of smooths:

Interchanging the order of summation, we have

Notice that there are at most $(2N)^d$ choices for $z$ for which there exists $q \in S_1 \times \cdots \times S_d$ such that

Hence there are at least $\tfrac{1}{2} \delta N^d |S_1| \cdots |S_d|$ choices for $(z, q) \in \mathbb{Z}^d \times \prod_i S_i$ for which

Call such a choice of $(z, q)$ a good tuple.
Claim 1. For each good tuple $(z, q)$ the set $A \cap (z + q^{\otimes k} \otimes [N_0]^d)$ contains a configuration of the form $x + y^{\otimes k} \otimes F$ for some $x \in \mathbb{Z}^d$ and some $y \in B^d$. To see this, define

Using the fact that $B$ is $N^\eta$-smoothly $M$-homogeneous, together with (9.2), one can check that each $B_i$ is $M$-homogeneous (not just smoothly homogeneous). Invoking Corollary 9.1, we see that there exist $x \in \mathbb{Z}^d$ and $y \in B_1 \times \cdots \times B_d$ such that $x + y^{\otimes k} \otimes F \subset A_{z,q}$.
Translating and dilating, we deduce that $A \cap (z + q^{\otimes k} \otimes [N_0]^d)$ contains a configuration of the form $x' + (q \otimes y)^{\otimes k} \otimes F$. By definition of the $B_i$ and the fact that $F$ is non-constant in each coordinate, we see that $y \in [N_0]^d$ and thus each coordinate of $q \otimes y$ lies in $B$. This establishes Claim 1.
(9.5) Then interchanging the order of summation shows that the sum (x,y)∈A G(x, y) is at least Applying Lemma A.2 (for N sufficiently large) we deduce that Since the theorem asserts a lower bound on the size of A, the result is proved provided we have the following upper bound on G(x, y).
Claim 2. Suppose that $F$ contains two elements which differ in the $i$th coordinate for each $i \in [d]$. Then $G(x, y) \leq N_0^{2d}$.
To see this, first note that if x + y ⊗k ⊗ F ⊂ z + q ⊗k ⊗ [N 0 ] d then, since F contains two elements differing in their ith coordinate, there exist integers . Subtracting these elements, we deduce that there exists n i ∈ [N 0 ] for which As there are at most N 0 choices for n i , and y i is fixed, there are at most N d 0 choices for q. Once one has fixed this choice of q, for any f ∈ F we have so there are at most N d 0 choices for z. In summary G(x, y) N 2d 0 , which establishes Claim 2.

A supersaturated generalisation of both Roth and Sárközy's theorems
In this section we deduce a one-dimensional counting result analogous to Theorem 2.4 by projecting down the multidimensional Theorem 9.2. Again we remind the reader of what it means to be $M$-homogeneous in $S(N^{1/k}; N^\eta)$ (see Definition 2.2).
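For readers who wish to experiment, the sets of smooth numbers appearing throughout can be enumerated directly. The sketch below assumes the standard convention that $S(N; R)$ consists of the positive integers $n \leq N$ all of whose prime factors are at most $R$; the function name `smooth_numbers` is ours.

```python
def smooth_numbers(n_max, r):
    """Return the sorted list of n in [1, n_max] with all prime factors <= r."""
    result = []
    for n in range(1, n_max + 1):
        m = n
        p = 2
        # Strip every factor p <= r. Composite p never divide m at this
        # point, because their prime factors have already been removed.
        while p <= r and m > 1:
            while m % p == 0:
                m //= p
            p += 1
        if m == 1:
            result.append(n)
    return result

print(smooth_numbers(20, 3))  # [1, 2, 3, 4, 6, 8, 9, 12, 16, 18]
```

Trial division is adequate for small ranges; it is not intended as an efficient sieve.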
Since there are at most 1 2 |A| elements x of A satisfying the inequality x 1 2 |A|, it follows that for N C s,t δ −1 we have In the statement of Theorem 10.1, at least one of the coefficients λ i must be positive. Relabelling indices, we may assume that λ s > 0. For a technical reason, it will be useful in a later part of the argument if we can ensure that Define F ⊂ Z s+t−2 to be the set consisting of the zero vector together with the rows of the following matrix  Consider the setB := {y ∈ N : λ s y ∈ B} ∪ (N 1/k λ −1 s , ∞). Provided that N η max {λ s , M} (as we may assume), we see thatB is Mhomogeneous in the N η -smooths. Applying Theorem 9.2, we find that there are at least c 0 N s+t−2+s−2+ t k tuples (x, y, z) ∈ Z s+t−2 ×B s−2 ×B t such thatÃ contains the configuration , hence by definition ofB we deduce that λ s z i ∈ B. Projecting down to one dimension and taking into account the multiplicities of representations, we obtain ≫ N s+ t k −1 tuples (x, y, z) ∈ Z × N s−2 × N t with λ s z i ∈ B and such that A contains the configuration Let us set x i := x − λ s y i for i = 1, . . . , s − 2, along with x s−1 = x and One can then check that the tuple (x 1 , . . . , x s , λ s z 1 , . . . , λ s z t ) is an element of A s ×B t satisfying (10.1). By construction there are ≫ N s+ t k −1 such tuples.

Pseudorandom Roth-Sárközy
In this section we develop a pseudorandom variant of Theorem 10.1. As in Part 1, we begin by relaxing Theorem 10.1 to encompass general bounded functions. In order to count solutions to our equation weighted by general functions, we use the following notation.
Remark 11.2 (Dependence on constants). In the sequel we regard the coefficients $\lambda_i$ and $\mu_j$ as fixed, and suppress their dependence in any implied constants. Similarly for the degree $k$ and the number of variables $s + t$. We also fix $\eta = \eta_k$ globally: recall that this is 1 if $k = 2$, and a small positive constant if $k \geq 3$. We opt to keep any dependence on the following explicit: the level of homogeneity $M$, and the density $\delta$.

Then

and an application of Theorem 10.1 completes the proof.
Our next step is to weaken the assumptions of Theorem 10.1 even further, replacing bounded functions with unbounded functions which are sufficiently pseudorandom, in that they possess a majorant with good Fourier decay (Definition 5.3) and p-restriction (Definition 5.4).

The W -trick for smooth powers and a non-linear Roth-Sárközy theorem
Our objective in this section is to use Theorem 11.4 to deduce the following non-linear density result. Recall that $\eta = \eta_k$ is 1 if $k = 2$, and a small positive constant if $k \geq 3$.
This deduction proceeds by developing a W -trick for smooth kth powers, analogous to that developed for prime powers in [Cho17]. Let where w = w(η, δ, M) is a constant to be determined, and the product is over primes. We apply Lemma A.4 with S = S(N; N η ), using Lemma A.2 in the process. This allows us to conclude that there exists a w-smooth positive integer ζ ≪ η,δ,w 1 and ξ ∈ [W ] with (ξ, W ) = 1 such that #{x ∈ Z : ζ(ξ + W x) ∈ A} 1 2 δ#{x ∈ Z : ζ(ξ + W x) ∈ S(N; N η )}. (12.2) Define and set . Combining (12.2) and Lemma A.5, we have the lower bound (12.5) Noting that (kW ) 1/k is a positive integer, let otherwise.
Lemma 12.2. We have n ν(n) = ρ(1/η)X + O η,w (P k / log P ). (12.9) Proof. Throughout the following argument, all implied constants in our asymptotic notation are permitted to depend on k, η, w. Bear in mind that η η k is small. From the definition n ν(n) = x∈S(P ;P η ) x≡ξ mod W so, by the mean value theorem and the boundedness of ρ ′ , it remains to show that P P 1/2 Integration by parts gives and the estimate now follows from the boundedness of ρ, ρ ′ .

Lemma 12.3 (Density transfer).
For N large in terms of k, η, w and δ we have Proof. We employ (12.5) in conjunction with (12.10) to conclude that # x ∈ S(P ; P η ) : Using Lemma A.2 and recalling (12.3) we obtain Taking N sufficiently large, an application of (12.9) completes the proof.
The following two ingredients are established in Appendices D and E.
Proof of Theorem 12.1. We employ Theorem 11.4 with majorant ν given by (12.8), homogeneous set B 1 ⊂ S(X 1/k ; X η ) given by (12.6), and function f = ν1 A 1 (recall (12.4)). It is first necessary to check that these choices satisfy the hypotheses of Theorem 11.4. By Lemma 12.5, the function ν satisfies a (s + t − 10 −8 )-restriction estimate with constant K = O η,k (1). Let c η,k denote the implied constant in (12.11) and setδ := c η,k δ k . Theorem 11.4 guarantees the existence of a positive constant θ = θ(η,δ, M, K) (12.13) such that provided ν has Fourier decay of level θ and f 1 δ ν 1 we may conclude that (11.1) holds. Taking w = C η θ k guarantees sufficient Fourier decay, by Lemma 12.4. We note that this choice of w satisfies w ≪ η,δ,M 1, as can be checked by unravelling the dependencies in (12.13). We obtain f 1 δ ν 1 via Lemma 12.3. This requires us to take N sufficiently large in terms of k, η, w and δ. By our choice of w, this is ensured if N is sufficiently large in terms of η, δ and M (as we may assume).

Deducing partition regularity
In this final section of this part of the paper we prove a finitary version of Theorem 1.3.

Let $c_1, \dots, c_s \in \mathbb{Z} \setminus \{0\}$ and suppose that $\sum_{i \in I} c_i = 0$ for some non-empty $I$. Then, for any $r \in \mathbb{N}$, there exists $N_0 \in \mathbb{N}$ such that the following holds: for any $N \geq N_0$, if we have a finite colouring of the $N^\eta$-smooth numbers in $[N]$, $S(N; N^\eta) = C_1 \cup \cdots \cup C_r$, then there exists a colour $i \in [r]$ and distinct $x_1, \dots, x_s \in C_i$ solving (1.2).
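The hypothesis that some non-empty subset of the coefficients sums to zero is a finite check. The following minimal sketch tests it by brute force; the function name `has_zero_sum_subset` is ours, and the search is exponential in $s$, which is harmless for the small coefficient vectors of interest.

```python
from itertools import combinations

def has_zero_sum_subset(coeffs):
    """Return True if some non-empty subset of coeffs sums to zero."""
    for size in range(1, len(coeffs) + 1):
        for subset in combinations(coeffs, size):
            if sum(subset) == 0:
                return True
    return False

# Coefficients of a five-variable Pythagorean-type form: the subset {1, -1} works.
print(has_zero_sum_subset([1, 1, 1, 1, -1]))  # True
# All coefficients positive: no non-empty subset can sum to zero.
print(has_zero_sum_subset([1, 1, 3]))         # False
```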
13.1. The inductive base: one colour. As in §4, given functions f 1 , . . . , f s : Z → C with finite support, define the counting operator and write T (f ) for T (f, f, . . . , f ). It follows from Theorem B.1 that there exist η = η(k) > 0, N 1 = N 1 (η, k, c) ∈ N and c 1 = c 1 (η, k, c) > 0 such that for N N 1 and we have By Lemma B.4, the number of trivial solutions in S(N; N η ) is o(N s−k ), so there must be at least one non-trivial solution (x 1 , . . . , x s ) ∈ S(N; N η ) s to (1.2) for N sufficiently large in terms of η, k, s and c. The base case follows.
13.2. The inductive step. Let S(N; N η ) = C 1 ∪· · ·∪C r . Re-labelling indices, we may assume that C r is the largest colour class, so that : qx ∈ C i } . Then it follows from (13.2) that C ′ 1 ∪ · · · ∪ C ′ r−1 = S(M; M η ). By the induction hypothesis, there exist distinct elements of some C ′ i which solve (1.2). Since this equation is homogeneous, we obtain a non-trivial solution in C i by multiplying the equation through by q k . 13.2.2. The homogeneous case. We now assume that C r is M-homogeneous in S(N; N η ). We apply Theorem 12.1, taking A = B = C r . By (13.1) the density of A in S(N; N η ) is at least 1 r . Theorem 12.1 then implies that, provided N N 0 (η, 1/r, M) we have T (1 Cr ) c 0 (η, 1/r, M)N s−k .
By Lemma B.4, the number of solutions in $S(N; N^\eta)$ with two or more coordinates equal is $o(N^{s-k})$, hence taking $N$ sufficiently large yields at least one non-trivial solution in $C_r$. We note that a quantity dependent on the tuple $(\eta, 1/r, M)$ is ultimately dependent only on $\eta$ and $r$, by the definition of $M$. The induction step thereby follows, completing the proof of Theorem 13.1.

Part 3. Supersmooths and shifted squares
In this part we establish Rado's criterion for a linear equation in logarithmically-smooth numbers (Theorem 1.7). Furthermore, we show how a direct application of the transference principle yields a supersaturated version of this result, and analogously for a linear equation in the set of squares minus one (Theorem 1.4). Both of these results are established without recourse to properties of homogeneous sets. This reflects the fact that supersmooths and shifted squares possess subsets which can be projectively transformed to obtain equidistribution in congruence classes to small moduli, ruling out possible local obstructions to partition regularity; these are obstructions which must be surmounted when working with perfect squares and higher powers. This phenomenon manifests itself when massaging the perfect powers to obtain equidistribution; this can be done, but requires an affine transformation, as opposed to a projective one. Unfortunately, a typical equation satisfying Rado's criterion is only projectively invariant, so the methods of this part do not succeed in establishing partition regularity for equations in perfect powers.

Modelling a pseudorandom partition with a colouring
As described above, the proofs of Theorems 1.4 and 1.7 proceed by first passing to a subset of the sparse arithmetic set of interest (supersmooths or shifted squares). We then projectively transform this subset to obtain a set which is well distributed in arithmetic progressions to small moduli. We can then define a weight $\nu : [N] \to [0, \infty)$ supported on our equidistributed set which has nice pseudorandomness properties.
Given a finite colouring of our original arithmetic set, the above procedure induces a finite partition of our pseudorandom weight function into non-negative functions $f_i$, so that $\nu = \sum_i f_i$. Deducing supersaturation then amounts to showing that the count of solutions to our equation weighted by some $f_i$ is within a constant factor of the maximum possible.
The main tool in deriving this lower bound is to model the $f_i$ with functions $g_i$ whose sum dominates the indicator function of the interval $1_{[N]}$. It is a short step to show that, in essence, we may assume that the $g_i$ correspond to indicator functions of a colouring of $[N]$. For such colourings there is already a supersaturation result in the literature due to Frankl, Graham and Rödl [FGR88, Theorem 1]. Employing this theorem and then (quantitatively) retracing our steps yields Theorems 1.4 and 1.7.
In this section we establish the modelling part of the above procedure: non-negative functions $f_i$ with pseudorandom sum $\sum_i f_i$ have approximants $g_i$ whose sum dominates the constant function $1_{[N]}$. This 'transference principle' for colourings is based on Green's transference principle for dense sets [Gre05], as exposited in [Pre17a]. We recall the concepts of Fourier decay and $p$-restriction given in Definitions 5.3 and 5.4.
Let κ, ε > 0 be parameters, to be determined later. In proving this result we utilise the large spectrum of f i , which we take as Define the Bohr set with frequencies S := S 1 ∪ · · · ∪ S r−1 and width ε 1/2 by where, for finitely supported f i , we set We first estimate The key identity is If α ∈ T \ S then by the definition (14.1) of the large spectrum we have f i (α) If α ∈ S, then for each n ∈ B we have e(αn) = 1 + O(ε). Hencê and consequently f i (α) Combining both cases gives f i From this it is apparent we should choose κ = ε, which we do. We will show that, for any n, the sum i r−1 g i (n) is almost bounded above by 1. By positivity and orthogonality, we have Inserting our Fourier decay assumption, and using Parseval, yields and that f i (1 i r − 1).

A pseudorandom Rado theorem
We begin the proof of this theorem by generalising [FGR88] from colourings to bounded weights.
Lemma 15.2 (Functional FGR). Let c 1 , . . . , c s ∈ Z \ {0} with i∈I c i = 0 for some non-empty I ⊂ [s]. For any r there exists N 0 ∈ N and c 0 > 0 such that for N N 0 and g 1 , . . . , g r : Proof. By the pigeonhole principle, for each x ∈ [N] there exists i ∈ [r] such that g i (x) 1/r. Let i be minimal with this property, and assign x the colour i. By the result of Frankl, Graham and Rödl, for some such choice of i there are at least c ′ 0 N s−1 tuples x where each coordinate receives the colour i and such that c · x = 0. It follows that With this in hand, we proceed to prove Proposition 15.1. Since ν satisfies a (s−0.005)-restriction estimate with constant K, and has Fourier decay of level 1/M, we may apply the modelling lemma (Proposition 14.1, provided M M 0 (s, K) as we may assume) to conclude the existence of g i : where p = s − 0.005. This also implies that (1 i r).
Applying Lemma 15.2 (provided that $N \geq N_0(r, c)$, as we may assume) furnishes a colour class $i$ for which $\sum_{c \cdot x = 0} g_i(x_1) \cdots g_i(x_s) \gg_{r,c} N^{s-1}$.
Our assumption that $\sum_{i \in I} c_i = 0$ ensures that $s \geq |I| \geq 2$. We may in fact assume that $s \geq 3$, for if $s = |I| = 2$ then Proposition 15.1 is trivial. Hence $(1 + M^{-1/2}) 1_{[N]}$ satisfies a $(s - 0.005)$-restriction estimate with constant 1, and majorises each $g_i$. Employing the generalised von Neumann lemma (Lemma C.3), with $i$ as in the previous paragraph, we deduce that

Assuming that $M \geq M_0(r, c, K)$ completes the proof of Proposition 15.1.
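The colour assignment used in the proof of Lemma 15.2 can be sketched as follows. This is an illustration under the assumption that the weights satisfy $g_1(x) + \cdots + g_r(x) \geq 1$ at every point, so that the pigeonhole principle applies; the function name `colour_by_weights` and the toy data are ours. Exact rational arithmetic is used to avoid rounding issues in the comparison with $1/r$.

```python
from fractions import Fraction

def colour_by_weights(g):
    """Assign to each point the minimal index i with g_i(x) >= 1/r.

    g[i][x] holds the weight g_{i+1}(x+1); colours are returned 1-based.
    """
    r = len(g)
    n = len(g[0])
    colours = []
    for x in range(n):
        # Pigeonhole: if the weights at x sum to at least 1, one is >= 1/r.
        # next() raises StopIteration if that hypothesis fails at x.
        i = next(i for i in range(r) if g[i][x] * r >= 1)
        colours.append(i + 1)
    return colours

g = [
    [Fraction(1, 4), Fraction(1, 2), Fraction(1)],
    [Fraction(3, 4), Fraction(1, 2), Fraction(0)],
]
print(colour_by_weights(g))  # [2, 1, 1]
```

Any monochromatic solution in colour $i$ then contributes at least $r^{-s}$ to the weighted count for $g_i$, which is how a colouring result transfers to bounded weights.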

Supersaturation for shifted squares
In this section we relate a colouring of the shifted squares to a partition of a pseudorandom majorant ν satisfying the hypotheses of Proposition 15.1, and thereby prove Theorem 1.4. As in §6, we accomplish this through the W -trick for squares.
Define W by (6.1), where w = w(c, r) is a constant to be determined. Let If c is an r-colouring of the squares minus one, we induce an r-colouring of S ′ via . . , S ′ r denote the induced colour classes. From the definition of S ′ and the homogeneity of the equation, we see that the left-hand side of (1.4) is at least as large as As in (6.5), define a weight function ν : We reassure the reader that neither the constant term 1 nor the factor W appearing above are necessary, but their presence is consistent with (6.5) and (12.8). A calculation similar to (6.6) gives where S is the set of shifted squares as defined in the theorem.
We recall that W ultimately depends only on w = w(c, r). Therefore, to show that (16.1) is of order |S ∩ [N]| s N −1 , and hence to prove Theorem 1.4, it suffices to establish that for Appendices D and E yield the following.

Supersaturation for logarithmically-smooth numbers
The proof of Theorem 1.7 follows in analogy with the argument of the prior section. The situation is somewhat simpler in this context, as there is no need to massage the set of smooths to exhibit sufficient pseudorandomness.
Define the indicator function ν : The relevant pseudorandomness properties follow from work of Harper [Har16].
Proof of Theorem 1.7. We are assuming that $\sum_{i \in I} c_i = 0$ for some $I \neq \emptyset$, and this forces $s \geq 2$. If $s = 2$ then we are counting monochromatic solutions to $x_1 - x_2 = 0$, for which we have the lower bound $|S(N; R)| \geq |S(N; R)|^2 N^{-1}$.
Let us therefore assume that $s \geq 3$. Provided that $R \geq \log^C N$ we have that $\nu$ satisfies a $p = 2.995$ restriction estimate with constant $K = O(1)$. Applying Proposition 15.1 with these parameters, there exist $N_0, M, c_0 > 0$ such that (1.5) holds, provided that $\nu$ has Fourier decay of level $M^{-1}$. This can be guaranteed on employing Lemma 17.2 and ensuring that

where $C = C(r, c)$ is sufficiently large.
The result follows on noting that 1 + 1

Notice that if $W$ is a $w$-smooth positive integer divisible by the primorial $\prod_{p \leq w} p$, then every positive integer can be written in the form $\zeta(\xi + Wy)$ for a unique choice of a $w$-smooth positive integer $\zeta$ and a unique $\xi \in [W]$ with $(\xi, W) = 1$.
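The factorisation $n = \zeta(\xi + Wy)$ can be computed directly: $\zeta$ is the $w$-smooth part of $n$, and $\xi$ is the residue of the cofactor modulo $W$. The sketch below assumes, as in the remark above, that $W$ is $w$-smooth and divisible by every prime $p \leq w$; the function name `w_trick_decompose` and the toy values $w = 3$, $W = 6$ are ours and are not the choice of $W$ made in the paper.

```python
from math import gcd

def w_trick_decompose(n, w, W):
    """Return (zeta, xi, y) with n == zeta * (xi + W * y).

    zeta is w-smooth, xi lies in [1, W] with gcd(xi, W) == 1, and y >= 0.
    Assumes W is w-smooth and divisible by every prime p <= w.
    """
    zeta, m = 1, n
    for p in range(2, w + 1):
        while m % p == 0:  # move every factor p <= w from m into zeta
            m //= p
            zeta *= p
    # m now has no prime factor <= w, hence is coprime to W.
    xi = m % W
    if xi == 0:  # only possible in the degenerate case W == 1
        xi = W
    y = (m - xi) // W
    return zeta, xi, y

print(w_trick_decompose(84, 3, 6))  # (12, 1, 1), since 84 = 12 * (1 + 6 * 1)
print(w_trick_decompose(35, 3, 6))  # (1, 5, 5), since 35 = 1 * (5 + 6 * 5)
```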
Lemma A.5. For any K 1 we have Proof. By Lemma A.2, we have The estimate now follows from the mean value theorem, since ρ ′ is bounded and log N η log N + log K − 1 η ≪ 1 log N .

Appendix B. The unrestricted count and mean value estimates
Recall that η is 1 if k = 2 and a small positive constant if k 3. The following is a consequence of the current state of knowledge in Waring's problem. Let c 1 , . . . , c s ∈ Z \ {0} with i∈I c i = 0 for some non-empty subset I of [s]. Then, for k 2, there exists s 0 (k) ∈ N such that if s s 0 (k) and N N 0 then # x ∈ S(N; N η ) s : Moreover, one can take s 0 (2) = 5, s 0 (3) = 8, and s 0 (k) satisfying (1.3).
The k = 2 case was known to Hardy and Littlewood. In an influential paper, Kloosterman [Klo27] opens with a discussion of this, then adapts the Hardy-Littlewood method to address the quaternary problem. Details of a proof may be found in [Dav2005, Ch. 8].
As we cannot find the precise statement that we require for $k \geq 3$ in the literature, we outline a proof below. The conclusion largely follows from the earlier techniques of Vaughan and of Wooley [Vau89, VW91, Woo92], but we find it convenient to also draw material from other sources. Indeed, the aforementioned articles on Waring's problem involve a combination of smooth and full-range variables, so for our lower bound the results cannot be imported directly. Theorem B.1 is an indefinite version of a special case of [DS16, Theorem 2.4]; the latter is more profound, as it tackles a more challenging smoothness regime. One approach would be simply to imitate the proof of that theorem, until needing to treat the local factors; this is approximately what we do below.
Proof. Let k 3, and let η = η k be a small positive constant. By orthogonality, our count is for some c = c(k) > 0. Therefore First we prune our major arcs down to a lower height. Set Q 1 = √ log N . Let As (a, q) = 1 and (rc s , b) ≪ 1, we have q ≍ r, |rc s α − b| ≍ |qα − a|, and it now follows from [VW91, Lemma 8.5] that For q ∈ N, a ∈ Z and β ∈ R, define S(q, a) = x q e q (ax k ), and W (α, q, a) = q −1 S(q, a)w(α − a/q), where as before ρ denotes the Dickman-de Bruijn ρ-function. Next, we apply [Vau89, Lemma 5.4] to c i α, for 1 i s and α ∈ N(q, a) ⊂ N, where 0 a < q Q 1 and |qα − a| By (B.1) and (B.2), together with Hölder, we now have 1 0 g 1 (α) · · · g s (α)dα The bound (B.3) enables us to extend the integral to [−1/2, 1/2] s and then the outer sum to infinity with o(N s−k ) error, as is usual for a major arc analysis [Dav2005, Vau97]. We thus obtain q −s S(q, c 1 a) · · · S(q, c s a) and As discussed in [Dav2005, Ch. 8], the singular series is a product of p-adic densities, and is strictly positive if and only if χ p > 0 for all p. The positivity of the p-adic densities χ p follows from the assumption that i∈I c i = 0 for some non-empty I ⊆ [s]: one takes a non-trivial solution in {0, 1} s , and this is a non-singular p-adic zero.
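The complete exponential sums $S(q, a) = \sum_{x \leq q} e_q(a x^k)$ entering the singular series are finite and easy to evaluate numerically. The sketch below assumes the standard convention $e_q(t) = e^{2\pi i t/q}$, which is not restated in this excerpt; the function name `complete_sum` is ours.

```python
import cmath

def complete_sum(q, a, k):
    """Evaluate S(q, a) = sum over 1 <= x <= q of exp(2*pi*i*a*x^k / q)."""
    total = 0j
    for x in range(1, q + 1):
        # Reduce x^k modulo q first so the phase stays numerically accurate.
        phase = (a * pow(x, k, q)) % q
        total += cmath.exp(2j * cmath.pi * phase / q)
    return total

# For k = 2 and an odd prime q not dividing a this is a quadratic Gauss sum,
# whose absolute value is sqrt(q).
print(abs(complete_sum(7, 1, 2)))   # approximately sqrt(7)
print(abs(complete_sum(11, 3, 2)))  # approximately sqrt(11)
```

The square-root cancellation visible here for $k = 2$ is the simplest instance of the bounds on $S(q, a)$ that make the singular series converge.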
Our final task is to show that J ≍ N s−k . By orthogonality With c > 0 small, we have the crude lower bound since the c i are not all of the same sign. We also have the complementary upper bound Remark B.2. By working harder, we could have obtained a main term λN s−k , for some positive constant λ = λ(c), similarly to Drappeau-Shao [DS16].
We also need the following bounded restriction inequalities.
Proof. The quadratic statement is a direct consequence of [Bou89, Eq. (4.1)]. Assuming for the time being that $k \geq 4$, write $2t$ for the smallest even integer greater than or equal to the integer $s_0(k)$ appearing in Theorem B.1. Note that modifying $s_0(k)$ by adding a constant does not affect the veracity of (1.3), and so we will prove the statement for $s \geq s_0(k) + 2$ in this case.
By orthogonality, the triangle inequality and Theorem B.1, we have The trivial estimate x∈S(N ;N η ) f (x)e(αx k ) N completes the proof when k 4.
For $k = 3$ we require a more elaborate argument to prove that the precise value of $s_0(3) = 8$ is admissible. In particular, our approach relies on a 'subconvex' mean value estimate of Wooley [Woo95]. Define $\phi : \mathbb{Z} \to \mathbb{C}$ by $\phi(n) = f(x)$ if $n = x^3$ for some $x \in S(N; N^\eta)$, and zero otherwise. Our objective is to show that $\int_{\mathbb{T}} |\hat{\phi}(\alpha)|^{8 - 10^{-8}} \, d\alpha \ll N^{5 - 10^{-8}}$.
Proof. Let $s_0(k)$ be as in Lemma B.3. By the union bound, it suffices to prove an estimate of the required shape for the number of solutions with $x_{s-1} = x_s$. In this case we are estimating

# x ∈ S(N; N η ) s−1 :

It may be that $c_{s-1} + c_s = 0$, so we estimate the contribution from the $x_{s-1}$ variable trivially. Using orthogonality and Hölder's inequality, it therefore suffices to prove that

It remains to check that $s - 2 - \frac{k(s-2)}{p} < s - 1 - k$, or equivalently that $2 + p(1 - \frac{1}{k}) < s$. Since $s > p$, this follows if $p/k \geq 2$, which we can certainly ensure without affecting the bound (1.3).
Proof. Let p = s − δ. By Lemma C.1, the weight satisfies a p-restriction estimate with constant K and majorises the difference Observing that this weight has L 1 norm equal to two, the lemma follows on applying the telescoping identity together with Lemma C.2.

Appendix D. Pointwise exponential sum estimates
The primary objective of this section is to establish the Fourier decay estimates in Lemmas 6.3, 12.4 and 16.1. Of these, Lemma 12.4 concerns an exponential sum over smooth numbers. As before, put $R = P^\eta$, with $\eta = 1$ when $k = 2$ and $\eta = \eta_k$ a small positive number when $k \geq 3$, and define $P$ and $X$ by (12.3). Our weight function $\nu$ is defined by (12.8), with $k = 2$ when dealing with Lemmas 6.3 and 16.1, and additionally with $\xi = 1$ for Lemma 16.1. This is consistent with (6.5) and (16.2). We assume throughout that $X$ is sufficiently large in terms of $w$.
Our goal is to prove the inequality (12.12), using the Hardy-Littlewood circle method. More explicitly, we wish to show that if $\alpha \in \mathbb{T}$ then

We treat the $k \geq 3$ and $k = 2$ cases separately, as smooth numbers are used for the former.

D.1. Smooth Weyl sums.
We first consider the case $k \ge 3$, recalling that here we choose $\eta = \eta_k$ sufficiently small. The idea is to consider a rational approximation $a/q$ to $\alpha$; there will ultimately be four regimes to consider, according to the size of $q$. We begin with a variant of [Vau89, Lemma 5.4], which is useful for low height major arcs. Let

Lemma D.1 (First level). Suppose $q \in \mathbb{N}$ and $a \in \mathbb{Z}$, with $q \le R/W$ and $\|q\alpha\| = |q\alpha - a|$. Then

Proof.
In particular, if $\alpha(x)$ equals $e\bigl(\frac{a}{q} \cdot \frac{x^k - \xi^k}{kW}\bigr)$ when $x \equiv \xi \bmod W$ is $R$-smooth and $0$ otherwise, then

By partial summation and the boundedness of $\rho'$, we also have

and so

Next, observe that with $\beta = \alpha - a/q$ we have $|\beta| = q^{-1}\|q\alpha\|$ and

Partial summation gives

and with the boundedness of $\rho'$ it also implies that

Meanwhile, Euler-Maclaurin summation [Vau97, Eq. (4.8)] yields

Substituting these estimates into (D.2) concludes the proof.
We supplement this by bounding $S_{q,a}$ and $I(\beta)$.
A standard calculation provides the following bound.
Lemma D.3. We have

Before continuing in earnest, we briefly describe the plan. We can modify [Vau89, Theorem 1.8] to handle a set of minor arcs. At that stage, our major and minor arcs fail to cover the entire torus $\mathbb{T}$, but we can bridge the gap using a classical circle method contraption known as pruning (also used in Appendix B). Adapting [VW91, Lemma 7.2], we can prune down to $q \le (\log P)^{A}$. Finally, by adapting [VW91, Lemma 8.5], we prune down to $q \le (\log P)^{1/4}$. In order to tailor the classical theory to suit our needs, we begin with the observation that

The inner summation is a classical quantity with a linear twist.
Remark D.5. We will later apply this with $\varepsilon = \varepsilon_k$, so that the condition $\eta \le \eta_0(\varepsilon, k)$ will be met.
Proof. Following the proof of [Vau89, Theorem 1.8], we find that if $\alpha \in \mathfrak{m}_1$ and $1 \le m \le P$ then

Indeed, already built into that proof are bounds uniform over linear twists; see [Vau89, Eq. (10.9)]. The sum above is over $x \in S(m; R)$, where $1 \le m \le P$, rather than over $x \in S(P; R)$; however, we can assume that $\sqrt{P} \le m \le P$ and then run Vaughan's argument. Now, by (D.7), we have $\sum_{x \in S(m;R),\, x \equiv \xi \bmod W} e\bigl(\alpha \frac{x^k - \xi^k}{kW}\bigr) \ll P^{1+\varepsilon}(P^{-\delta} + P^{-\iota(k)})$.
The remainder of the proof of [VW91, Lemma 7.2] carries through in the present context, mutatis mutandis. The eventual outcome of the changes above is to increase the term $q^{1/4} P (R/M)^{1/2}$ to $q^{\frac{1}{2} - \frac{1}{2k}} P (R/M)^{1/2}$, and we obtain the asserted bound.
Lemma D.7 (Second pruning step). Suppose $R = P^{\eta}$ with $0 < \eta < 1/2$, and that $a, q \in \mathbb{Z}$ with $(a, q) = 1$ and $1 \le q \le (\log P)^{A}$. Then for some $c = c(\eta, A)$ we have

As $m \le P$ and $kW \ll_{k,W} 1$, the outcome of this calculation is unaffected, and we obtain $e_Q(Ay^k + By)$.
Our final task is to show that if $d \mid kWq$ then $W(d, a(kWq/d)^{k-1}, tkq) \ll_{k,w,\varepsilon} q^{1 - \frac{1}{k} + \varepsilon}$.

In both cases we have
To tie together what we have gleaned, we make a Hardy-Littlewood dissection. For $q \in \mathbb{N}$ and $a \in \mathbb{Z}$, let $\mathfrak{M}(q, a)$ be the set of $\alpha \in \mathbb{T}$ such that $|\alpha - a/q| \le (\log P)^{1/4}/P^k$. Let $\mathfrak{M}(q)$ be the union of the sets $\mathfrak{M}(q, a)$ over integers $a$ such that $(a, q) = 1$, and let $\mathfrak{M}$ be the union of the sets $\mathfrak{M}(q)$ over $q \le (\log P)^{1/4}$. By identifying $\mathbb{T}$ with a unit interval, we may write $\mathfrak{M}(q)$ as a disjoint union

First we consider the minor arcs $\mathfrak{m} := \mathbb{T} \setminus \mathfrak{M}$.
Proof. Let $\alpha \in \mathfrak{m}$. If $\frac{\alpha}{kW} \in \mathfrak{m}_1$, where $\mathfrak{m}_1$ is as in Lemma D.4 with $\delta = (4k)^{-1}$, then Lemma D.4 applies and is more than sufficient (recall (12.3)). We may therefore assume that $\frac{\alpha}{kW} \notin \mathfrak{m}_1$, and then deduce the existence of relatively prime integers $q > 0$ and $a$ for which $q + P^k|q\alpha - a| \ll P^{3/4}$. If the 'natural height' $q + P^k|q\alpha - a|$ exceeds $(\log P)^{9k}$, then an application of Lemma D.6 with $M \asymp RP^{3/4}$ suffices. So we may suppose instead that $q + P^k|q\alpha - a| \le (\log P)^{9k}$. As $\alpha \notin \mathfrak{M}$, we must also have $q + P^k|q\alpha - a| \ge \max\{q, P^k|\alpha - \frac{a}{q}|\} > (\log P)^{1/4}$, and now Lemma D.7 delivers the sought inequality.
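The lower bound for the natural height in the last step follows from the definition of the major arcs. The sketch below spells this out, assuming (as in the dissection above) that $\mathfrak{M}(q, a)$ is the set of $\alpha$ with $|\alpha - a/q| \le (\log P)^{1/4}/P^k$ and that $(a, q) = 1$ with $q \ge 1$.

```latex
% Step 1: since q >= 1, |q\alpha - a| = q |\alpha - a/q| >= |\alpha - a/q|, so
\[
  q + P^k |q\alpha - a|
  \;\ge\; \max\bigl\{ q,\; P^k |q\alpha - a| \bigr\}
  \;\ge\; \max\Bigl\{ q,\; P^k \Bigl|\alpha - \frac{a}{q}\Bigr| \Bigr\}.
\]
% Step 2: two cases for the coprime pair (a, q).
%   (i)  If q > (\log P)^{1/4}, the maximum exceeds (\log P)^{1/4} trivially.
%   (ii) If q <= (\log P)^{1/4}, then \mathfrak{M}(q,a) is one of the arcs making up
%        \mathfrak{M}; as \alpha \notin \mathfrak{M} we get
\[
  \Bigl|\alpha - \frac{a}{q}\Bigr| > \frac{(\log P)^{1/4}}{P^k}
  \quad\Longrightarrow\quad
  P^k \Bigl|\alpha - \frac{a}{q}\Bigr| > (\log P)^{1/4}.
\]
% In both cases the maximum, and hence the natural height, exceeds (\log P)^{1/4}.
```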
We are ready to prove Lemma 12.4, in the case $k \ge 3$. As discussed at the beginning of this appendix, our task is to establish the estimate (D.1). It will be useful to have (12.3) and (12.9) in mind. By a geometric series calculation, we have

First suppose $\alpha \in \mathfrak{m}$. By Dirichlet's approximation theorem, we obtain relatively prime integers $q$ and $a$ such that $1 \le q \le (\log P)^{1/4}$, $|q\alpha - a| \le (\log P)^{-1/4}$.
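For reference, the form of Dirichlet's approximation theorem being invoked is the standard one. The display below is a sketch; the parameter name $Q$ is introduced here for illustration and does not appear in the text.

```latex
% Dirichlet's approximation theorem (standard form):
% for every real \alpha and every real Q >= 1 there exist
% integers a, q with (a, q) = 1 such that
\[
  1 \le q \le Q, \qquad |q\alpha - a| \le \frac{1}{Q}.
\]
% Specialising to Q = (\log P)^{1/4}, which is at least 1 once P >= e:
\[
  1 \le q \le (\log P)^{1/4}, \qquad |q\alpha - a| \le (\log P)^{-1/4}.
\]
```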
D.2. Quadratic Weyl sums. The purpose of this subsection will be a proof of Lemmas 6.3 and 16.1, together with the $k = 2$ case of Lemma 12.4. In all of these cases $k = 2$, so $\eta = 1$, and the weight function is simpler, namely

For the Fourier transform of this weight function, we can obtain a power saving on the minor arcs, as in [BP17]. This will be used in the next appendix, in the proof of the restriction estimate. We keep this brief, as the analysis is essentially the same as that of [BP17].
As discussed at the beginning of this appendix, we seek to establish (D.1). The Fourier transform is given by

The following is a straightforward adaptation of [BP17, Lemma 5.1].
Lemma D.9 (Major arc asymptotic). Suppose that $\|q\alpha\| = |q\alpha - a|$ for some $q, a \in \mathbb{Z}$ with $q > 0$. Then

Lemmas D.2 and D.3 still hold when $k = 2$, with the same proof. Following [BP17], put $\tau = \frac{1}{100}$, and to each reduced fraction $a/q$ with $0 \le a < q \le X^{\tau}$ associate a major arc

Let $\mathfrak{M}_2$ denote the union of all major arcs, and define the minor arcs by $\mathfrak{m}_2 = \mathbb{T} \setminus \mathfrak{M}_2$. The following is a straightforward adaptation of [BP17, Eq.
We have examined all cases, thereby completing the proofs of Lemmas 6.3, 12.4 and 16.1.

Appendix E. Restriction estimates
In this section we prove the restriction estimates claimed in Lemmas 6.4, 12.5 and 16.2. The core elements of our setup are the same as in Appendix D, but we repeat all of this for clarity. Put $R = P^{\eta}$, and define $P$ and $X$ by (12.3). In the cases of Lemmas 6.3 and 16.1 let $\eta = 1$ and $k = 2$, and $\xi = 1$ in the latter scenario. Our weight function $\nu$ is defined by (12.8). When $k \ge 3$, we choose $\eta = \eta_k$ sufficiently small. We assume throughout that $X$ is sufficiently large in terms of $w$.
Let $\phi : \mathbb{Z} \to \mathbb{C}$ with $|\phi| \le \nu$ pointwise. For an appropriate restriction exponent $p$, our task is to establish the restriction inequality

The implied constant, in particular, will not depend on $w$.
it suffices to show this when

where $s_0(k) \in \mathbb{N}$ is as in Theorem 1.

Fix this choice of p.
To summarise what is written above, we seek to establish the restriction inequality (E.1) when the exponent p is given by (E.3). This will prove Lemmas 6.4, 12.5 and 16.2 at one fell swoop.
Even moments play a key role, owing to the presence of an underlying Diophantine equation. In particular, they allow bounded weights to be freely removed. Let 2m be the greatest even integer strictly less than p.
Remark E.2. The sixth moment estimate, for the case k = 3, has a slightly different flavour; it is a consequence of Wooley's 'subconvex' mean value estimate [Woo95]. It is this that ultimately enables us to procure a p-restriction estimate with p < 8.
Proof. By orthogonality and the triangle inequality

where $N$ is the number of solutions $(x, y) \in S(P; P^{\eta})^m \times S(P; P^{\eta})^m$ to the Diophantine equation $x_1^k + \dots + x_m^k = y_1^k + \dots + y_m^k$.
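The mechanism behind this step is the standard link between even moments and solution counts. The following sketch records it in a generic form; the finite set $A$, the sum $g$, and the normalisation $\hat{\phi}(\alpha) = \sum_n \phi(n) e(\alpha n)$ with $e(\theta) = e^{2\pi i \theta}$ are assumptions made for illustration and are not taken from the text.

```latex
% Orthogonality on \mathbb{T} = \mathbb{R}/\mathbb{Z}:
\[
  \int_{\mathbb{T}} e(\alpha n)\, d\alpha =
  \begin{cases} 1, & n = 0, \\ 0, & n \in \mathbb{Z} \setminus \{0\}. \end{cases}
\]
% Unweighted case: for a finite set A of positive integers and
% g(\alpha) = \sum_{x \in A} e(\alpha x^k), expand |g|^{2m} = g^m \overline{g}^m
% and integrate term by term:
\[
  \int_{\mathbb{T}} |g(\alpha)|^{2m}\, d\alpha
  = \#\Bigl\{ (x, y) \in A^m \times A^m :
      x_1^k + \dots + x_m^k = y_1^k + \dots + y_m^k \Bigr\}.
\]
% Weighted case: if \phi has finite support and |\phi| \le \nu pointwise, with \nu \ge 0,
% the same expansion followed by the triangle inequality gives
\[
  \int_{\mathbb{T}} |\hat{\phi}(\alpha)|^{2m}\, d\alpha
  \;\le\; \sum_{\substack{n_1, \dots, n_m,\, n'_1, \dots, n'_m \\
                          n_1 + \dots + n_m = n'_1 + \dots + n'_m}}
      \nu(n_1) \cdots \nu(n_m)\, \nu(n'_1) \cdots \nu(n'_m).
\]
```

In particular, bounded weights can be replaced by their supremum at the cost of a constant factor, which is why even moments allow weights to be removed.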
Note that adding a constant to $s_0(k)$ in the case $k \ge 4$ does not cause it to violate (1.3), and so we may assume that $2m \ge s_0(k)$ for the quantity $s_0(k)$ appearing in Theorem B.1. For $k \ge 4$ we therefore have, by Theorem B.1, that $\int_{\mathbb{T}} |\hat{\phi}(\alpha)|^{2m} \, d\alpha \ll P^{2m(k-1)} P^{2m-k} = P^{k(2m-1)} \ll (WX)^{2m-1}$.
The case $k = 2$ is similar, as the crude bound $N \ll_{\varepsilon} P^{2 + \frac{\varepsilon}{2}}$ is standard. When $k = 3$ the proof may be concluded using [Woo95, Theorem 1.2], which implies that $N \ll P^{3.25 - 10^{-4}}$.
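The exponent bookkeeping in the bound for $k \ge 4$ can be checked directly. In the sketch below, the final comparison with $(WX)^{2m-1}$ rests on a relation of the shape $P^k \ll WX$, which is assumed here to be what (12.3) supplies.

```latex
\[
  2m(k-1) + (2m - k) \;=\; 2mk - k \;=\; k(2m-1),
\]
\[
  P^{2m(k-1)} \cdot P^{2m-k} \;=\; P^{k(2m-1)} \;=\; \bigl(P^{k}\bigr)^{2m-1}.
\]
% Assumption (to be read off from (12.3)): P^k \ll WX. Then
\[
  \bigl(P^{k}\bigr)^{2m-1} \;\ll\; (WX)^{2m-1}.
\]
```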
These estimates fall short of being sharp. By increasing the exponent, we are able to make them sharp, using Bourgain's epsilon-removal procedure [Bou89]. In the case k = 3, an additional intermediate exponent is required.
E.1. Epsilon-removal. In this subsection we assume that $k \ne 3$. The case $k = 3$ is treated in the next subsection by incorporating a small finesse. Denote by $\delta$ a parameter in the range $0 < \delta \ll 1$.
We obtain (E.9), but with k = 2 in the definition of G(·), and Bourgain's argument again completes the proof.

E.2. An intermediate exponent.
In this subsection let k = 3, and let η be a small positive constant as before. We proceed in two steps, effectively 'pruning' the large spectrum. In the first step, we use a power-saving minor arc estimate for an auxiliary majorant to come close to a sharp restriction estimate. In the second step, we no longer require a power saving on the minor arcs, so we are able to obtain a sharp restriction estimate by reverting to the majorant ν.
E.2.1. A close estimate. Here we concede a small loss. By slightly increasing the exponent, we will recover it in the next subsection. Our goal for the time being is to establish the following.

Similarly to the $k \ne 3$ case, it suffices to prove that $\operatorname{meas}(R_{\delta}) \ll \frac{1}{\delta^{8 - 10^{-5}} X}$, where it is now convenient to redefine $R_{\delta} = \{\alpha \in \mathbb{T} : |\hat{\phi}(\alpha)| > \delta W X\}$.
We have considered all cases, thereby completing the proof of Lemmas 6.4, 12.5 and 16.2.

Appendix F. Lefmann's criterion
In this section we prove Theorem 1.8, which is a consequence of Lefmann's lemma [Lef91, Fact 2.8]. The theorem is a special case of Theorem 1.3, but can be established more simply, and we presently provide a proof. By rearranging the variables, we may suppose that for some $t \in \{6, 7, \dots, s\}$ we have $c_1 + \dots + c_t = 0$.
The following obscure fact was shown by Lefmann [Lef91, Fact 2.8].
To complete the proof of Theorem 1.8, it remains to prove that the system

has a solution $(y, \mathbf{y}) \in (\mathbb{Z} \setminus \{0\}) \times \mathbb{Z}^t$. The number of such solutions in $[-P, P]^{t+1}$ is $N_1 - N_2$, where $N_1$ is the total number of integer solutions $(y, \mathbf{y}) \in [-P, P]^{t+1}$ and $N_2$ is the number of integer solutions $\mathbf{y} \in [-P, P]^t$ to $c_1 y_1 + \dots + c_t y_t = c_1 y_1^2 + \dots + c_t y_t^2 = 0$. Here $P$ is a large positive real number.
Lemma F.2. We have $N_2 \ll P^{t-3} \log P$.
Proof. Let $Q(y_1, \dots, y_{t-1}) = c_t^{-1}(c_1 y_1 + \dots + c_{t-1} y_{t-1})^2 + \sum_{i \le t-1} c_i y_i^2$, and put $C = |c_1| + \dots + |c_t|$. Now $N_1$ is greater than or equal to the number of integer solutions $(y, y_1, \dots, y_{t-1}) \in [-P/C, P/C]^t$ to $a y^2 + Q(y_1, \dots, y_{t-1}) = 0$ with $c_1 y_1 + \dots + c_{t-1} y_{t-1} \equiv 0 \bmod c_t$. By considering only multiples of $c_t$, we find that $N_1$ is greater than or equal to the number of integer solutions $x \in [-P/C^2, P/C^2]^t$ to $Q_1(x) := Q(x_1, \dots, x_{t-1}) + a x_t^2 = 0$.

For the sake of brevity, we appeal to Birch's very general theorem [Bir61, Theorem 1]. The Birch singular locus is the set $S$ of $x \in \mathbb{C}^t$ at which the gradient of $Q_1$ vanishes identically. (In this instance, the Birch singular locus coincides with the usual singular locus.) We compute that $\frac{1}{2} \partial_i Q(y_1, \dots, y_{t-1}) = (c_i + c_i^2/c_t) y_i + \sum_{j \le t-1,\, j \ne i} c_i c_j y_j / c_t$, and so $\frac{c_t}{2c_i} \partial_i Q = (c_t + c_i) y_i + \sum_{j \le t-1,\, j \ne i} c_j y_j = c_t(y_i - y_t) + \sum_{j \le t} c_j y_j = c_t(y_i - y_t)$, where $y_t := -c_t^{-1}(c_1 y_1 + \dots + c_{t-1} y_{t-1})$. Therefore $S = \{(x, x, \dots, x, 0) \in \mathbb{C}^t\}$, and in particular $\dim S = 1$.

As $t - \dim S > 4$, Birch's theorem [Bir61, Theorem 1] gives $N_1 = \mathfrak{S} J P^{t-2} + O(P^{t-2-\delta})$, (F.4) for some constant $\delta > 0$, where $\mathfrak{S}$ and $J$ are respectively the singular series and singular integral arising from the circle method analysis. Birch notes in [Bir61, §7] that $\mathfrak{S}$ is positive as long as $Q_1$ has a non-singular $p$-adic zero for each prime $p$, and that $J$ is positive as long as $Q_1$ has a real zero outside of $S_1$. Note that $Q$ has a zero $x^* \in \mathbb{Z}^{t-1}$ with pairwise distinct coordinates; this follows from [Kei14, Theorem 1.1], or from a circle method analysis. Now $(x^*, 0)$ is a real zero of $Q_1$ outside of $S_1$, and is also a non-singular $p$-adic zero for each $p$. Hence $\mathfrak{S} J > 0$, and by (F.4) the proof is now complete.
The previous two lemmas yield $N_1 > N_2$, and this completes the proof of Theorem 1.8.
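The singular locus computation can be followed step by step. The sketch below assumes that $Q(y_1, \dots, y_{t-1}) = c_t^{-1} L(\mathbf{y})^2 + \sum_{i \le t-1} c_i y_i^2$, which is the form consistent with the partial derivatives stated in the proof, and that $a \ne 0$; the abbreviation $L$ is introduced here for illustration only.

```latex
% Notation (illustrative): L(y) = c_1 y_1 + ... + c_{t-1} y_{t-1},  y_t := -L(y)/c_t.
\begin{align*}
  \tfrac{1}{2}\,\partial_i Q
    &= c_i y_i + \frac{c_i}{c_t}\, L(\mathbf{y})
     = \Bigl(c_i + \frac{c_i^2}{c_t}\Bigr) y_i
       + \sum_{\substack{j \le t-1 \\ j \ne i}} \frac{c_i c_j}{c_t}\, y_j , \\
  \frac{c_t}{2 c_i}\,\partial_i Q
    &= c_t y_i + L(\mathbf{y})
     = c_t (y_i - y_t) \qquad (1 \le i \le t-1).
\end{align*}
% The gradient of Q_1(x) = Q(x_1, ..., x_{t-1}) + a x_t^2 vanishes iff
%   (i)  x_1 = ... = x_{t-1} = x, and this common value equals
%        -L/c_t = -x (c_1 + ... + c_{t-1})/c_t = x,  using c_1 + ... + c_t = 0;
%   (ii) 2 a x_t = 0, that is, x_t = 0 (as a \ne 0 by assumption).
\[
  S = \{ (x, x, \dots, x, 0) : x \in \mathbb{C} \}, \qquad \dim S = 1,
  \qquad t - \dim S > 4 \iff t \ge 6 .
\]
```

Condition (i) is automatically self-consistent because $c_1 + \dots + c_t = 0$, so the locus is a full line, and the requirement $t \ge 6$ matches the range of $t$ fixed at the start of this appendix.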
Remark F.4. Lefmann's lemma generalises straightforwardly to higher degrees. We do not explore this avenue further, as any results thus obtained are likely subsumed by Theorem 1.3.