Coordinate Distribution of Gaussian Primes

We study the problem of writing Gaussian primes as the sum of two squares, both of which are interesting arithmetically, in particular, when one is the square of a prime and the other the square of an almost-prime.


Introduction and statement of results
The modern history of prime number theory might well be said to begin with the statement of Fermat to the effect that the primes of the form 4m + 1 can be written as the sum of two squares. The first recorded proof is due to Euler. We think of these today as being the primes which occur as the norms of the unramified splitting primes a + 2bi in the Gaussian field Q(i) and we shall refer to them as Gaussian primes. Following the proof of the prime number theorem, we have the well-known asymptotic formula for the number of these: $\psi(x; 4, 1) = \sum_{n \le x,\ n \equiv 1 \pmod 4} \Lambda(n) \sim x/2$, where we are going to restrict to a and b being positive. Beginning in the 1990s one began to see how to count the frequency of subsets of these primes for which one of the squares has an additional interesting arithmetic property. The first result to note in this connection was the work [FoIw] of E. Fouvry and H. Iwaniec in which the asymptotic formula was obtained for the case wherein one of the squares was the square of a prime (actually their result was rather more general). Subsequently, in [FrIw1], the current authors obtained the asymptotic in the setting where one of the squares was the square of a square and thus for the number of primes which could be written as the sum of a square plus a fourth power. This result had an additional interest in first successfully establishing the asymptotic formula for a thin set of prime values of a polynomial, that is, one having density $\ll x^{1-\delta}$ for some positive δ.
Following a gap of some fifteen to twenty years, there have now been a number of newer developments along these lines of research. R. Heath-Brown and X. Li [HL] have shown that, in the statement of [FrIw1], one can replace the fourth power of an integer by the fourth power of a prime and still establish for these the relevant asymptotic formula. Very recently, K. Pratt [Pr] has succeeded with the thin set obtained when one of the squares is the square of an integer which is missing three prescribed digits from its decimal expansion. P. Lam, D. Schindler and S. Xiao [LSX] have succeeded in extending the original work [FoIw], replacing the Gaussian integers and Gaussian primes by the corresponding values of an arbitrary irreducible positive definite binary quadratic form.
In all of these highly interesting works one is concerned with the specialization to a particular subset of those values taken on by one of the two coordinates. In this work we shall be motivated by the question wherein we ask something special about both of them.
We are going to count the primes π = a + 2bi in the ring Z[i] which have their coordinates a, b restricted to special integers. Ideally, we would like to reach π = a + 2bi with a and b both primes, but we are too old to reach these by currently developed technology. However, we still have enough strength for catching π = a + 2bi with a prime and b almost-prime.
We accomplish the goal by estimating sums of type (1.1) $G(x) = \sum_{4k^2+\ell^2 \le x} \beta_k \gamma_\ell \Lambda(4k^2+\ell^2)$ with coefficients $\beta_k$, $\gamma_\ell$ which live on primes and almost-primes. In most parts of our considerations these coefficients can be quite general, but sometimes we have to specialize. Let $\Lambda_r = \mu * (\log)^r$ denote the von Mangoldt function of order $r \ge 1$ and $\Lambda = \Lambda_1$. The $\Lambda_r(n)$ vanish unless n has at most r distinct prime factors and, in any case, $0 \le \Lambda_r(n) \le (\log n)^r$. In the Appendix we shall give some heuristic arguments leading to the determination of an asymptotic formula for $G_r(x) = \sum_{4k^2+\ell^2 \le x} \Lambda_r(k)\Lambda(\ell)\Lambda(4k^2+\ell^2)$.
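For orientation, the definition $\Lambda_r = \mu * (\log)^r$ and the sum $G_r(x)$ can be evaluated by brute force for very small x. The following is only an illustrative sketch: the function names `mobius`, `von_mangoldt` and `G_r` are our own, the summation is taken over positive k and ℓ, and nothing here reflects how the sums are treated in the proofs.

```python
from math import log


def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def von_mangoldt(n, r=1):
    """Lambda_r(n) = sum over divisors d of n of mu(d) * (log(n/d))**r."""
    total = 0.0
    for d in range(1, n + 1):
        if n % d == 0:
            total += mobius(d) * log(n // d) ** r
    return total


def G_r(x, r):
    """Brute-force value of G_r(x), summing over k, l >= 1 with 4k^2 + l^2 <= x."""
    total = 0.0
    k = 1
    while 4 * k * k < x:
        l = 1
        while 4 * k * k + l * l <= x:
            n = 4 * k * k + l * l
            total += von_mangoldt(k, r) * von_mangoldt(l) * von_mangoldt(n)
            l += 1
        k += 1
    return total
```

For instance, `von_mangoldt(30, 2)` is zero up to rounding, since 30 has three distinct prime factors, while `von_mangoldt(6, 2)` equals 2 log 2 log 3.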
Conjecture. We have The case r = 1 is most challenging, because it requires breaking the parity barrier of sieve theory.
We are able to estimate $G_r(x)$ positively for $r \ge 7$.
Remarks 1.1. If n is not squarefree or n has a small prime factor then $\Lambda_r(n)$ contributes to $G_r(x)$ a negligible amount, so we are really catching primes $4k^2 + \ell^2$ with ℓ prime and k having at most r prime factors, all distinct.
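To make the objects being counted concrete, one can list the primes $4k^2 + \ell^2 \le x$ with ℓ an odd prime and k squarefree with at most r prime factors. This is a minimal sketch with hypothetical helper names (`is_prime`, `squarefree_omega`, `catch`); it excludes k = 1 by assumption and ignores the condition on small prime factors, which is vacuous at this scale.

```python
def is_prime(n):
    """Primality test by trial division (adequate for tiny n)."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True


def squarefree_omega(k):
    """Number of prime factors of k if k is squarefree, otherwise None."""
    count = 0
    p = 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:
                return None  # repeated prime factor
            count += 1
        p += 1
    if k > 1:
        count += 1
    return count


def catch(x, r):
    """Triples (k, l, p) with p = 4k^2 + l^2 <= x prime, l an odd prime,
    and k >= 2 squarefree with at most r prime factors."""
    found = []
    k = 2
    while 4 * k * k < x:
        omega = squarefree_omega(k)
        if omega is not None and omega <= r:
            l = 3
            while 4 * k * k + l * l <= x:
                p = 4 * k * k + l * l
                if is_prime(l) and is_prime(p):
                    found.append((k, l, p))
                l += 2
        k += 1
    return found
```

With x = 100 and r = 1 this returns the two triples (2, 5, 41) and (3, 5, 61), corresponding to the Gaussian primes 5 + 4i and 5 + 6i.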
In fact we shall estimate a more restricted sum.
Theorem 1.2 (Almost Primes Theorem). Let $\beta_k = 1$ if k has at most 7 prime factors, all of which are larger than $k^{1/49}$, and $\beta_k = 0$ otherwise. Then
Remarks 1.2. The lower bound of (1.4) follows from the lower bound of (1.5), because $\beta_k (\log k)^7 \ll \Lambda_7(k)$. The upper bounds can be derived directly by application of any crude sieve method so we skip the proof.
We shall establish an asymptotic formula for G(x) with relatively small error term where β is the convolution 1 * λ with λ supported on a relatively short segment. Put with $|\lambda_h| \le 1$ for h squarefree, $h \le y$ say, $\lambda_h = 0$ otherwise. Obviously we have in mind the sieve weights $\lambda_h$ of level y. Having the weights $\lambda_h$ at our disposal we can build the $\beta_k$ having our favorite property. There are numerous possibilities to play with these weights. The second coordinate ℓ is counted with weight $\gamma_\ell$ about which we do not need to know much. However, after serious attempts to handle $\gamma_\ell$ in great generality we gave up this ambition, because of tremendous complications in resolving the main term in certain bilinear forms over the Gaussian domain. We are going to assume that (1.7) $|\gamma_\ell| \le \log \ell$, if ℓ is an odd prime, and $\gamma_\ell = 0$, otherwise. Moreover, we need the asymptotic formula to hold for every $q \ge 1$, (a, q) = 1, $x \ge 2$ and any $B \ge 2$, the implied constant depending only on B.
Remarks 1.3. For $\gamma_\ell = \log \ell$ the formula (1.8) is just the Siegel-Walfisz theorem. Keep in mind that our assumption (1.8) is meaningful for $q < (\log x)^A$ with any $A \ge 2$, but has no value for much larger moduli. By resizing, one is allowed to change (1.7) and (1.8) by a fixed positive constant independent of the residue classes a (mod q).
Before getting to the Main Theorem let us express some principles of its proof. First of all, our arguments borrow substantial parts from the works [FoIw] and [FrIw2], but as we impose restrictions on both coordinates of the Gaussian integers ℓ + 2ki some fresh ideas occur. We consider the sequence $A = (a_n)$ of numbers (1.13) $a_n = \sum_{4k^2+\ell^2 = n} \beta_k \gamma_\ell$ and count them over primes. There will be a lot of Fourier analysis performed so it helps to start with a smoothed counting. Let f(t) be a function supported on $\frac{1}{2}x \le t \le x$, twice differentiable and such that (1.14) $|t^j f^{(j)}(t)| \le 1$, j = 0, 1, 2.
We are going to evaluate asymptotically the sum (1.15) $S(x) = \sum_n a_n f(n)\Lambda(n)$.
with any $A \ge 2$, the implied constant depending only on A.
It is not difficult to derive MT from SMT; see a brief explanation in Section 18. Classical ideas for estimating sums of type (1.15) begin by partitioning into a sum of sums which we call "congruence sums", and double sums with suitable coefficients $u_m$, $v_n$, which we call "bilinear forms". There are plenty of possibilities, see Chapters 17 and 18 of [FrIw2]. For our purpose we choose Theorem 18.5 of [FrIw2], which is derived by finessing Bombieri's asymptotic sieve.
The congruence sums are treated in Sections 3, 4 with an application of the large sieve type inequality for roots of the quadratic congruence $\nu^2 + 1 \equiv 0 \pmod d$, see Lemma 3.1. The bilinear forms are treated in Sections 7-16. These bilinear forms are modified in various directions to create special features, as required for the application of distinct tools.

Interlude: an easier result
If one stares at our sum it seems only natural to ask what happens when we consider the visually similar sum Actually, this is a much easier problem and we can obtain the correct order of magnitude as soon as $r \ge 3$. In the Appendix we give a very short proof of the following result.

The congruence sums
In this section we extract the main term from the congruence sum $A_d(x)$ and provide a Fourier series expansion for the error term. Then we estimate the absolute remainder (the sum of absolute values of the error terms) in Section 4.
We have The summation is void if d is even so we always assume that d is odd. Taking advantage of ℓ being an odd prime we insert the restriction (ℓ, d) = 1 up to an error term Keep in mind that ρ(d) is multiplicative with ρ(p) = 1 + χ(p) where χ is the non-principal character modulo 4. Consequently (h, d) = 1. Now we split the inner sum over b into residue classes $b \equiv \nu \ell\, \overline{2h} \pmod d$ and divide the congruence by ℓ getting Recall the popular notation $\bar{a}$ (mod d) which stands for the multiplicative inverse of a (mod d); $a\bar{a} \equiv 1 \pmod d$ if (a, d) = 1. Do not confuse it with complex number conjugation. Working with (3.3) we no longer need the restriction (ℓ, d) = 1 so we drop it up to the same error term which we committed when installing it. First we evaluate (3.3) quickly by Hence our congruence sum satisfies the approximation If we use the assumption (1.8) (the PNT for $\gamma_\ell$ with q = 1) we get However, to maintain transparency we shall keep the original expression (3.7) until (1.8) is really needed. The elementary formula (3.5) suffices for d odd, uniformly in the range $d \le y^{-1}\sqrt{x}(\log x)^{-A}$. By the large sieve for characters χ (mod d) we can get good results on average over $d < \sqrt{x}(\log x)^{-A}$.
However we can do even better by applying Poisson's formula to (3.3). We extend the summation to all integers b, thus counting every term twice, except for b = 0 in which case d = 1. We find that (3.3) is equal to is the Weyl harmonic from the theory of equidistribution of the roots of (3.2). Hence, for d odd we have Here the main term comes from the zero frequency s = 0 and $r_d(x)$ can be considered to be an error term because it will turn out to have small effect due to cancellation in the Weyl harmonics. The last term in (3.11) is negligible. There is a considerable cancellation of the terms in (3.12) due to the spacing of the fractions ν/d modulo 1 as ν runs over the roots of (3.2). This property of ν/d leads to the following large sieve type inequality.
Lemma 3.1. Let $h \ge 1$. For any complex numbers $\alpha_n$ we have
Proof. See Section 20.2 of [FrIw2].

Estimation of the remainder
We need a bound for the remainder where $r_d(x)$ is given by the Fourier series (3.12). Since we shall not take advantage of the summation over h we partition (3.12) into and we estimate the partial remainders separately for every $h \le y$ and $1 \le 2X \le D$. We have where X = D/2, D/4, D/8, · · · . In order to apply (3.13) we build a single variable n = sℓ out of the two variables s, ℓ which we need to separate from the modulus d. We accomplish the separation by the change of the variable t in the Fourier integral (3.9) into $t\sqrt{x}/s$ getting The trivial bound If s is larger we can gain by twice integrating (4.6) by parts. We obtain another expression where the derivatives f′, f′′ are evaluated at $\ell^2 + xt^2/s^2$. Now estimating (4.8) trivially we get Let $S_0 \ge 1$. The part of (4.3) with $S_0 \le s < 2S_0$ is estimated by Summing (4.10) over $X < d \le 2X$ with (d, 2h) = 1 we derive by (3.13) (apply the Cauchy-Schwarz inequality) that the partial remainder (4.4) restricted by $S_0 \le s < 2S_0$ is bounded by We have derived (4.12) using the formula (4.6). Similarly, if we use the formula (4.8), then we get the bound (4.12) with an extra factor $(S/S_0)^2$. Combining both bounds we see that, with optimal cutoff point, the worst result comes from $S_0 \asymp S = hX/\sqrt{x}$. Hence, we conclude that Finally, inserting (4.13) into (4.5) we obtain
Proposition 4.1. We have (4.14)
Remark 4.1. The bound (4.14) is useful if $y^2 D \ll x(\log x)^{-A}$.

A model for A = (a_n)
By means of multiplicative functions we construct a sequence for which the main terms of the congruence sums agree with those for $A_d(x)$. We consider $B = (b_n)$ with the numbers where the multiplicative functions ψ(n) and φ(h) are given by if $p \ne 2$ and $\alpha \ge 1$. Recall that ρ(p) = 1 + χ(p) is the number of roots of $\nu^2 + 1 \equiv 0 \pmod p$.
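The root-counting function ρ is easy to check numerically. The sketch below compares a brute-force count of the roots of $\nu^2 + 1 \equiv 0 \pmod d$ with the product of 1 + χ(p) over the primes p dividing d, for odd d. The function names are our own; the agreement at odd prime powers is the standard consequence of Hensel's lemma and is not taken from the text above.

```python
def chi4(n):
    """The non-principal character modulo 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def prime_divisors(d):
    """Distinct prime divisors of d, by trial division."""
    primes = []
    p = 2
    while p * p <= d:
        if d % p == 0:
            primes.append(p)
            while d % p == 0:
                d //= p
        p += 1
    if d > 1:
        primes.append(d)
    return primes


def rho_bruteforce(d):
    """Number of residues nu modulo d with nu^2 + 1 = 0 (mod d)."""
    return sum(1 for nu in range(d) if (nu * nu + 1) % d == 0)


def rho_product(d):
    """Product of 1 + chi(p) over the primes p dividing an odd d."""
    result = 1
    for p in prime_divisors(d):
        result *= 1 + chi4(p)
    return result
```

For example ρ(65) = 4, with roots 8, 18, 47, 57 modulo 65, while ρ(21) = 0 because 3 and 7 are both congruent to 3 modulo 4.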
Let w(y) be a smooth function supported on 0 < y < 1 with We are going to evaluate asymptotically the sum The implied constant in (5.6) depends only on the crop function w.
Proof. We execute the summation via L-functions rather than by Poisson's formula. We have The corresponding Dirichlet series is equal to Now we borrow L(s, χ)/ζ(2s) and return it in the form of its Euler product, getting For $p \ne 2$ the local factor of P(s) is $1 + \chi(p)/\big((p - 1 + \chi(p))(p^s + \chi(p))\big)$ so the product converges for Re s > 0. We compute the residue of L(s) at s = 1 Checking the local factors we find Finally (5.6) follows by contour integration with the error term obtained by trivial estimations on the line Re s = 1/2.
Remarks 5.1. The main term of (5.6) agrees with that of (3.11) after normalization. Checking the local factors of H in (5.7) and κ in (1.12) against L(1, χ) we see that

Sums over primes
Theorem 18.5 of [FrIw2] gives an inequality between a sum over primes, sums of congruence sums and a bilinear form. We can use this inequality as it stands, but we get faster results with a slightly different inequality (which is actually derived in [FrIw2], but not stated explicitly).
Proposition 6.1. Let $1 < z \le \sqrt{x}$. For any complex numbers $c_n$ we have
Remarks 6.1. The double sum over m, n is a bilinear form. The key feature of this form is that the inner sum is weighted by the clean Möbius function µ(m); it is not contaminated by some incomplete Dirichlet convolutions presented by similar identities in the literature. Moreover, we sum $\mu(d)C'_d(x)$ with the Möbius factor µ(d) rather than with absolute values. This slight (not vital) difference will simplify our work.
We apply (6.1) with $z = x^{\delta}$, $0 < \delta \le 1/8$, for the sequence of numbers where $A = (a_n)$ is our target sequence (1.13) and $B = (b_n)$ is its model (5.1). Note that $c_n = 0$, unless n < x, n odd. The congruence sums of $C = (c_n)$ have no main term; compare (3.11) with (5.6).
On the left-hand side of (6.1) we get (up by the PNT, where A is any number $\ge 2$. On the right-hand side of (6.1) we get three sums. The first sum is The second sum is The third sum is the bilinear form We estimate R′ by applying two elementary approximations to the main terms, namely (3.5) with f(t) replaced by f(t) log t and (5.6) with w(y) replaced by w(y) log xy. We obtain Note that the extra logarithmic factors log t and log xy in the crop functions make the resulting main terms different. They do not match exactly, yet they are close. If f(t) is supported in a relatively short interval centered at cx with the constant $c = \exp\left(\int w(y) \log y \, dy\right)$ then the above main terms cancel out up to a sufficiently small error term, showing that R′ is negligible. But we do not need to make such a restriction for f(t), because we may exploit cancellation from the summation over d. Indeed, by the PNT we get with any $A \ge 2$, the implied constant depending only on A.
In the second sum R the main terms match exactly, they cancel out and the remaining terms are estimated in (4.14), (5.6), respectively. We get In the bilinear form B we also get cancellation due to sign changes of the Möbius function µ(m). It is difficult to see that µ(m) does not correlate with the original sequence $a_{mn}$, but this is clear for the model sequence $b_{mn}$. We have By the PNT we find that the last sum over m is $\ll n^{-1} x(\log x)^{-A-3}$. Next, summing over $h \le y$ and $n < xz^{-1}$ we lose a factor $(\log x)^2$. Hence the total contribution of the model sequence to the bilinear form Adding up the above estimates we conclude this section with the following result which does not contain the model sequence.
If we assume (1.8), then W (x) satisfies (3.8) so (6.8) becomes To complete the proof of (1.16) it remains to show that (6.10) subject to the condition (6.7).

Bilinear forms in the Gaussian Domain
It remains to estimate the bilinear form (4.11). We need the bound with any $A \ge 2$. In this section we make several simplifications before launching the essential arguments.
First we split the segment $z < m \le z^2$ into dyadic intervals $M < m \le 2M$. Assume for simplicity that log z/ log 2 is an integer so we cover the segment exactly with log z/ log 2 dyadic intervals. We get where M runs over the numbers z, 2z, 4z, . . . . Next we transfer the common factor c = (m, n) from m to n getting
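The dyadic subdivision is elementary, but it may help to see it spelled out. The sketch below lists the intervals M < m ≤ 2M with M = z, 2z, 4z, . . . covering $z < m \le z^2$, under the simplifying assumption made in the text that z is a power of 2; the name `dyadic_cover` is our own.

```python
def dyadic_cover(z):
    """Intervals (M, 2M] with M = z, 2z, 4z, ... covering z < m <= z^2.

    Assumes z is a power of 2, so that the cover is exact.
    """
    intervals = []
    M = z
    while M < z * z:
        intervals.append((M, 2 * M))
        M *= 2
    return intervals


def count_containing(m, intervals):
    """Number of intervals (lo, hi] that contain m."""
    return sum(1 for lo, hi in intervals if lo < m <= hi)
```

For z = 8 this gives (8, 16], (16, 32], (32, 64], and every integer m with 8 < m ≤ 64 lies in exactly one of them.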

The contribution of terms with c > C is estimated trivially by
This bound satisfies (7.1) if $C = (\log x)^{A+3}$. Now we ignore the condition $c^2 \mid n$ for $c \le C$ getting for some M with $z/C \le M < z^2$. Note that the support of f(t) implies that n runs over the segment N/4 < n < N with MN = x.
Note that we have introduced the restriction (m, 2h) = 1, which is permitted because it is redundant. Indeed, if $e = (m, 2h) \ne 1$, then $e \mid \ell^2$, $e \mid \ell$, $e^2 \mid mn$, $e^2 \mid m$, contradiction! Typically, for bilinear forms of this nature, one applies Cauchy's inequality and interchanges the order of summation. However, in our case $a_{mn}(h)$ has multiplicity which would become more difficult to treat after application of Cauchy's inequality. Our next step is to express the variables in terms of Gaussian integers so that there is no multiplicity, after which Cauchy's inequality can be applied without leading to such complications.
In the following the gothic letters $\mathfrak{a}, \mathfrak{b}, \mathfrak{m}, \mathfrak{n}, \ldots$ denote Gaussian integers and the corresponding Latin letters a, b, m, n, . . . denote the norms; $a = \mathfrak{a}\bar{\mathfrak{a}}$, $b = \mathfrak{b}\bar{\mathfrak{b}}$, $m = \mathfrak{m}\bar{\mathfrak{m}}$, $n = \mathfrak{n}\bar{\mathfrak{n}}$, . . . . By the unique factorization in Z[i] Here we put Note that $m = \mathfrak{m}\bar{\mathfrak{m}}$ is squarefree odd so this inner sum runs over Gaussian integers $\mathfrak{m}$ with $(\mathfrak{m}, \bar{\mathfrak{m}}) = 1$ (called primitive). In this case the Möbius function µ(m) in rational integers agrees with the Möbius function $\mu(\mathfrak{m})$ in Gaussian integers. For notational convenience we shall be writing $\mathfrak{m} \sim M$ to say that $m = \mathfrak{m}\bar{\mathfrak{m}} \sim M$. The condition (m, n) = 1 was needed for performing the unique factorization in Z[i]. After that, the resulting condition $(\mathfrak{m}, \mathfrak{n}) = 1$ is a hindrance so we are going to remove it using a similar argument by which we inserted it, but now in the Gaussian domain.
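The remark that a Gaussian integer with squarefree odd norm is automatically primitive can be checked by listing all u + iv of a given norm. This is a small sketch with hypothetical names; it uses the elementary fact that, for odd norm, u + iv is coprime to its conjugate exactly when gcd(u, v) = 1.

```python
from math import gcd, isqrt


def representations(m):
    """All pairs (u, v) of integers with u^2 + v^2 = m,
    i.e. all Gaussian integers u + iv of norm m."""
    pairs = []
    s = isqrt(m)
    for u in range(-s, s + 1):
        rest = m - u * u
        v = isqrt(rest)
        if v * v == rest:
            pairs.append((u, v))
            if v > 0:
                pairs.append((u, -v))
    return pairs


def all_primitive(m):
    """True if every Gaussian integer of odd norm m has coprime coordinates."""
    return all(gcd(u, v) == 1 for u, v in representations(m))
```

The norm 65 = 5 · 13 is squarefree and odd, and all 16 Gaussian integers of norm 65 are primitive. By contrast 45 is not squarefree, and 3 + 6i has norm 45 but is divisible by 3.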
We start from the formula We keep the terms with $c \le C_1 = (\log x)^{2A+8}$ and estimate the remaining terms with larger c trivially getting Now we need to show that Some properties of $\mathfrak{n}$ in the outer sum of (7.5) are hidden but can be inferred from the equation $mn = 4b^2h^2 + \ell^2$ and the support of f(mn) being x/2 < mn < x. In particular, the inequality $\ell < \sqrt{x}$ is redundant information in every expression containing the crop function f. From now on the dyadic segment m ∼ M never changes so sometimes we skip writing m ∼ M or $\mathfrak{m} \sim M$, but never forget it. Now we are ready to apply Cauchy's inequality as follows: Note that we borrowed a factor h into C(M). Now we need to show that Squaring out and interchanging the order of summation we write where the summation runs over all Gaussian integers $\mathfrak{n}$ satisfying (7.14) $\operatorname{Im} \mathfrak{m}_1\mathfrak{n} \equiv \operatorname{Im} \mathfrak{m}_2\mathfrak{n} \equiv 0 \pmod{2h}$.

In the off-diagonal area
From now on we assume that $\Delta = \Delta(m_1, m_2) \ne 0$. Now the system of equations (7.17) has a unique solution in the complex number n given by Since n must be a Gaussian integer this means $\ell_1$, $\ell_2$ satisfy (9.2) $\ell_1 m_2 \equiv \ell_2 m_1 \pmod{\Delta}$.
By the distribution of primes $\ell_1$, $\ell_2$ in arithmetic progressions we expect that the main term of (9.14) should be which is regarded as an error term. We need to sum $E_h(m_1, m_2)$ and $R_h(m_1, m_2)$ over $m_1$, $m_2$ as in (7.12) and over h as in (7.9) restricted by $\Delta(m_1, m_2) \equiv 0 \pmod{2h}$, see (9.10). Therefore our moduli 2∆h run over multiples of $4h^2$.

Separation of variables
In Section 12 we shall estimate the error terms by means of the large sieve. To this end we need to separate the variables $\ell_1$, $\ell_2$ from $m_1$, $m_2$, because $m_1$, $m_2$ are constituents of the moduli $\Delta(m_1, m_2)h$. Although in most cases the determinant $\Delta(m_1, m_2)$ is as large as M, it can take smaller values which require special attention. Our technique of separation of variables addresses this issue.
For the proof of (10.3) apply polar coordinates. In our case $s = a^2 + b^2 = (\alpha_1 I)^2 + (\alpha_2 + \alpha_1 R)^2$ is the quadratic form Going through the Fourier transform we lost sight of the ranges of $\ell_1$, $\ell_2$ so let us record that This information is redundant when the original function (10.1) is present. Estimating (10.6) directly and after integrating by parts two times we find that Hence F(s) is very small if $s > x^{-1}(\log x)^{2C}$ so the integration (10.8) runs effectively over the set (ellipse) (10.11) $S = \{(\alpha_1, \alpha_2) \in \mathbb{R}^2 : s(\alpha_1, \alpha_2) = (\alpha_1 I)^2 + (\alpha_2 + \alpha_1 R)^2 \le x^{-1}(\log x)^{2C}\}$ whose volume (the Lebesgue measure) is equal to Note that the trivial integration shows that (10.8) is bounded Similarly we find that the integral over $\mathbb{R}^2 \setminus S$ is small; Therefore, we lost essentially nothing by the separation of the variables $\ell_1$, $\ell_2$ through the Fourier transform (10.8). We get Recall that the error term $R_h(m_1, m_2)$ is given by (9.17). Introducing (10.13) into (9.17) we get $\gamma_{\ell_1}\gamma_{\ell_2}\, e(\alpha_1\ell_1 + \alpha_2\ell_2)$.
The total contribution of $R''_h(m_1, m_2)$ to $C_h(M)$ is (see (7.12) and (9.10)) The determinant $\Delta = \Delta(m_1, m_2)$ occurs with a multiplicity which is bounded by 8M so the above contribution is bounded by Inserting this bound into (7.9) we find that the total contribution of $R''_h(m_1, m_2)$ to C(M), say C′′(M), satisfies This bound satisfies our requirement (7.11) if we take C to be a sufficiently large constant, specifically $C \ge 8A + 30$.
We get the bound From the second sum of integrals we get Multiplying both estimates we conclude that This bound is better than (11.2) for Therefore we are done in this case.
Estimation of R′_h(m_1, m_2) on average

In most cases is not smaller than (12.4). Assuming I does not satisfy (12.4) we give a better treatment of $R'_h(m_1, m_2)$ using the Siegel-Walfisz condition and the large sieve inequality. We begin by removing the twists by additive characters from the multiplicative character sum (12.2). To this end we apply partial summation losing factors $1 + 2\pi|\alpha_1|\sqrt{x}$ and $1 + 2\pi|\alpha_2|\sqrt{x}$.
Specifically, we apply the expression (13.2) $e(\alpha\ell) = 1 + 2\pi i\alpha \int_0^{\ell} e(\alpha t)\, dt$ to the sums over ℓ in (12.2) getting for some $0 < t_1, t_2 < \sqrt{x}$. The loss is not large because, for $(\alpha_1, \alpha_2)$ in S, Integrating this over S against $F(s(\alpha_1, \alpha_2)) \ll x$ we conclude by (11.3) that
Remarks 13.1. The cropping parameters $t_1$, $t_2$ come from integration in the expression (13.2). We could carry such integration to the very end of our arguments and only then choose the worst values $t_1$, $t_2$ which are independent of the preceding variables $m_1$, $m_2$, h. To simplify the presentation we accept (13.3) having $t_1$, $t_2$ independent of $m_1$, $m_2$, h.
By $2G(t_1, t_2) \le G(t_1, t_1) + G(t_2, t_2)$ we arrive at We need to sum $R'_h(m_1, m_2)$ over $m_1$, $m_2$ as in (7.12) and over h as in (7.9) subject to the condition $\Delta = \Delta(m_1, m_2) \equiv 0 \pmod{2h}$, see (9.10). The total contribution of Recall that $m_1 \sim M$, $m_2 \sim M$ and $m_1$, $m_2$ are primitive. The determinant ∆ occurs with certain multiplicity which is bounded by 8M, so Each character $\chi \ne \chi_0$ is induced by a unique primitive character $\chi_1$ (mod q) with $q \ne 1$, $q \mid rh^2$ and $\chi(\ell) = \chi_1(\ell)$ for primes $\ell > rh^2$. Hence Using the S-W condition for small q and the large sieve inequality for larger q we get Finally, the total contribution of the error terms $R_h(m_1, m_2)$ to C(M) is bounded by This bound satisfies our requirement (7.11) if we take B large. Every bound obtained so far satisfies our requirements subject to the conditions (6.7) and (13.7). It remains to estimate the contribution of the main terms $E_h(m_1, m_2)$ to $C_h(M)$ on average over h, see (9.16), (7.9), (7.11). It turns out that the main term is a harder piece than the error terms!
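The identity (13.2) used for removing the additive twists is just the fundamental theorem of calculus applied to e(αt) = exp(2πiαt). As a sanity check it can be verified numerically; the sketch below uses a midpoint rule, and the function names and the number of steps are our own illustrative choices.

```python
import cmath


def e(x):
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def rhs_13_2(alpha, ell, steps=20000):
    """Right-hand side 1 + 2*pi*i*alpha * integral_0^ell e(alpha*t) dt,
    with the integral approximated by the midpoint rule."""
    h = ell / steps
    integral = sum(e(alpha * (j + 0.5) * h) for j in range(steps)) * h
    return 1 + 2j * cmath.pi * alpha * integral


def discrepancy(alpha, ell):
    """Absolute difference between the two sides of (13.2)."""
    return abs(e(alpha * ell) - rhs_13_2(alpha, ell))
```

Since the integral has absolute value at most ℓ, the right-hand side also makes visible the factor 1 + 2π|α|ℓ lost in the partial summation.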

Preparation of the main terms
Recall that the main terms $E_h(m_1, m_2)$ are defined by (9.16) and we need to estimate the sums Our goal is to show that with any $B \ge 2$, which bound is fine for the requirement (7.11).
In this section we make preparations for the application of tools in the next two sections. First it helps to execute the summation over ℓ 1 , ℓ 2 in (9.16). To this end we exploit our assumption (1.8) for q = 1, that is the PNT for the γ ℓ 's.
Let us check that the restrictions (10.9) are redundant. Indeed, from the support of f and n given by (9.15) we get Interchanging $\ell_1$, $\ell_2$ and $m_1$, $m_2$ we get a similar formula for $m_1 n$, hence $\ell_1 < \sqrt{x}$.
We show that the partial derivatives of $f(x_1, x_2)$ defined by (10.1) satisfy To this end, we compute as follows:
Similarly for f (m 1 n(x 1 , x 2 )) and for the partial derivatives with respect to x 2 . Hence, (14.4) holds.

Hence the contribution of
From now on we assume that The condition $(m_1 m_2, 2h) = 1$ in the inner sum of (14.13) is equivalent to $(m_1 m_2, c/(c, d)) = 1$. This is a harmless, but inconvenient condition. We are going to remove it by a cute trick. Let T* denote the sum over $m_1$, $m_2$ with the condition $(m_1 m_2, c/(c, d)) = 1$ and T the sum without this condition. We show that (14.15) $0 \le T^* \le T$.
By the above considerations we derive the following inequality µ(m)f(tm)e a c uw 2 and the sums ′ ′ are restricted by the conditions (14.10) and (14.14). Moreover we dropped out of (14.16) a few parts which we already showed to be admissible for the goal (14.8). Here we Hence (14.20) Writing a/c in the lowest terms we get (14.21)

Small moduli
We can estimate the sum over m = u + ibw in (14.22) using the Siegel-Walfisz type theorem in the Gaussian domain. See Lemma 5 of [Fog] or Lemma 16.1 of [FrIw1] and the references therein. µ(m)f(tm)e a q uw and the partial sum of (14.21) with $q \le Q_0$, say $K(q \le Q_0)$, satisfies This bound satisfies (14.8) if

Large moduli
It remains to estimate the partial sums of (14.21) with $Q_1 < q \le 2Q_1$, say $K(q \sim Q_1)$, for In this range we no longer need help from the Möbius function µ(m); the cancellation is due to the variation of e a q uw . We need a saving a bit larger than the size of the conductor q so the saving from averaging over the classes a (mod q) (making the Ramanujan sum) is not enough. But even a little extra averaging extracted from q would do the job by means of the large sieve inequality. However, we do not have any multiplicative structure of q from which to borrow a little extra averaging so we throw the whole range $q \sim Q_1$ into the game.
For m = u + ibw with m ∼ M the first coordinate u runs over the segment $|u| \le \sqrt{2M}$ which is sufficiently long for exploiting the large sieve inequality effectively. Because we do not need help from the second coordinate v = bw, (w, qr) = 1, we can simplify the matter by estimating (14.22) as follows Summing over $q \sim Q_1$ we get by the large sieve inequality Recall that M satisfies (7.7). Hence

Proof of SMT. Conclusion
Putting together the results of Sections 6-16 we complete the proof of (6.10) and of SMT (see (6.8)) under the following conditions: y 2 (log x) 2A+4 z x The choice $z = x^{1/6}$ and $y = x^{\theta}$ with any θ < 1/12 is good. This completes the proof of (1.16).

Derivation of MT
It is not hard to derive MT from SMT simply by subdividing the range $1 \le t \le x$ into dyadic segments $T < t \le 2T$, $T = 2^{-a}x$, a = 1, 2, . . . and smoothing at the end points over two short intervals T < t < T(1 + δ), 2T(1 − δ) < t < 2T.
The total contribution of n's in the short intervals is estimated trivially by $O(\delta x(\log x)^4)$ which is absorbed by the error term in (1.10) if The resulting smooth function f(t) supported in a given dyadic segment is f(t) = 1, except for t in the short intervals adjacent to the end points where $t^j f^{(j)}(t) \ll \delta^{-j}$. Because we require only j = 0, 1, 2, the condition (1.14) can be secured by resizing f(t) by a factor $\delta^2$. This factor does not ruin (1.16), because we can use (1.16) with A replaced by 3A + 8.
We have k c k = X + O x(log x) −A with X = κx. For any 1 h y, h squarefree, we set the error terms r h = k≡0 (mod h) c k − g(h)X and we derive by (1.10) with some λ h = ±1 that h y In other words, speaking the language of sieve theory, our sequence C = (c k ) has the absolute level of distribution y and the density function g(h) satisfies the linear sieve condition (5.38) of [FrIw2]. Therefore, Theorem 25.1 of [FrIw2] is applicable giving (k,P (z))=1 ν(k) r c k ≍ x(log x) −1 , Hence, the remainder of level D is estimated as follows: |E(x; d, a)| 2 1 2 + D(log x) 6 ≪ x 2 (log x) −A by the Barban-Davenport-Halberstam Theorem (see (9.75) of [FrIw2]), where A is any positive number and D = x 2 (log x) −B with some B = B(A). Therefore the sequence A = (a n ) is supported on n N = 5x 2 , it satisfies the linear sieve conditions and it has level of distribution D ≍ N 1 2 (log N ) −B . Now, just about any sieve, such as for example Theorem 6.9 of [FrIw2], gives the upper bound claimed in the proposition. Since also ∆ 3 > 4 − log 4/ log 3 > 2, it follows from Theorem 25.1 of [FrIw2] that the lower bound in the proposition holds and specifically (A1) ω(n) 3 n,P (D 1 4 ) =1 a n ≍ x 2 (log x) −1 , which implies the proposition.
We conclude the paper with heuristics supporting the formula (1.2). If $r \ge 2$ we use Bombieri's sieve in Theorem 3.5 of [FrIw2] showing that (1.2) holds with the constant Recall that κ is given by (1.12), hence c is given by (1.3).
Of course, this result is conditional subject to the assumption that the sequence $C = (c_k)$ has exponent of distribution as large as 1, meaning (1.10) holds for $y = x^{\theta}$ with any θ < 1/2. If r = 1, we write $\Lambda(k) = \sum_{h \mid k} \lambda_h$, $\lambda_h = -\mu(h)\log h$, and apply (1.10). For r = 1 Bombieri's sieve gives no help so we simply ignore that (1.10) is applicable unconditionally only for h < y, because we believe that for larger h the Möbius function does not correlate with anything "different" on its way. We arrive at (GPC) with the constant $\kappa \sum_h \mu(h)(-\log h)g(h) = c$.