Dirichlet $L$-functions of quadratic characters of prime conductor at the central point

We prove that more than nine percent of the central values $L(\frac{1}{2},\chi_p)$ are non-zero, where $p\equiv 1 \pmod{8}$ ranges over primes and $\chi_p$ is the real primitive Dirichlet character of conductor $p$. Previously, it was not known whether a positive proportion of these central values are non-zero. As a by-product, we obtain the order of magnitude of the second moment of $L(\frac{1}{2},\chi_p)$, and conditionally we obtain the order of magnitude of the third moment. Assuming the Generalized Riemann Hypothesis, we show that our lower bound for the second moment is asymptotically sharp.


Introduction and results
The values of $L$-functions at special points in the complex plane are of great interest. At the fixed point of the functional equation, called the central point, the question of non-vanishing is particularly important. For instance, the well-known Birch and Swinnerton-Dyer conjecture [43] relates the order of vanishing of certain $L$-functions at the central point to the arithmetic of elliptic curves. Katz and Sarnak [22] discuss several examples of families of $L$-functions and describe how the zeros close to $s=\frac12$ give evidence of an underlying symmetry group for each of these families. They suggest that understanding these symmetries may in turn lead to a natural spectral interpretation of the zeros of the $L$-functions. The analysis of each family they discuss leads to a Density Conjecture that, if true, would imply that almost all $L$-functions in the family do not vanish at the central point. Iwaniec and Sarnak [19] show that the non-vanishing of $L$-functions associated with holomorphic cusp forms is closely related to the Landau-Siegel zero problem. Thus the question of non-vanishing at the central point is connected to many deep arithmetical problems.
A considerable amount of research has been done towards answering this question for families of Dirichlet $L$-functions. Chowla conjectured that $L(\frac12,\chi)\neq0$ for $\chi$ a primitive quadratic Dirichlet character [7, p. 82, problem 3]. It has since become a sort of folklore conjecture that $L(\frac12,\chi)\neq0$ for all primitive Dirichlet characters $\chi$. One family that has attracted a lot of attention is the family of $L(s,\chi)$ with $\chi$ varying over primitive characters modulo a fixed conductor. This family is widely believed to have a unitary symmetry type, in the sense of the philosophy of Katz and Sarnak. Balasubramanian and Murty [3] were the first to prove that a (small) positive proportion of this family does not vanish at the central point. They used the celebrated technique of mollified moments, a method that has been highly useful in other contexts (see, for example, [4,9,38]). Iwaniec and Sarnak [18] developed a simpler, stronger method and improved this proportion to $\frac13$. The approach of Iwaniec and Sarnak has since become standard in the study of non-vanishing of $L$-functions at the central point. Bui [5] and Khan and Ngo [26] introduced new ideas and further improved the lower bound $\frac13$. The second author [35] has shown that more than fifty percent of the central values are non-vanishing when one additionally averages over the conductors. For further interesting research on this and other families of $L$-functions, see [6,10,23,24,25,27,28,29,30,31].
The family of $L(s,\chi)$ with $\chi$ varying over all real primitive characters has also been extensively studied. This family is of particular significance because it seems to be of symplectic rather than unitary symmetry, so we encounter new phenomena not seen in the unitary case. For $d$ a fundamental discriminant, set $\chi_d(\cdot)=\left(\frac{d}{\cdot}\right)$, the Kronecker symbol. Then $\chi_d$ is a real primitive character with conductor $|d|$. The hypothetical positivity of the central values $L(\frac12,\chi_d)$ has implications for the class number of imaginary quadratic fields [17, p. 514]. Jutila [21] initiated the study of non-vanishing at the central point for this family and proved that $L(\frac12,\chi_d)\neq0$ for infinitely many fundamental discriminants $d$. His methods show that $\gg X/\log X$ of the quadratic characters $\chi_d$ with $|d|\le X$ have $L(\frac12,\chi_d)\neq0$. Özlük and Snyder [32] examined the low-lying zeros of this family, and found the first evidence of its symplectic behavior. Assuming the Generalized Riemann Hypothesis (GRH), they showed that more than $\frac{15}{16}$ of the central values $L(\frac12,\chi_d)$ are non-zero [33]. Katz and Sarnak independently obtained the same result in unpublished work (see [22,39]).
Soundararajan [39] made a breakthrough when he proved unconditionally that more than $\frac78$ of the central values $L(\frac12,\chi_d)$ with $d\equiv0\pmod8$ are non-zero. The biggest difficulty lies in analyzing the contribution of the "off-diagonal" terms in the evaluation of a mollified second moment. Soundararajan discovered that there is, in fact, a main contribution arising from these off-diagonal terms. (See Section 3 for more discussion.)

The case of real primitive characters with prime conductor is more difficult still. Jutila [21] initiated the study of $L(\frac12,\chi_p)$, where $p$ is a prime. His methods yield that $\gg X/(\log X)^3$ of the primes $p\le X$ satisfy $L(\frac12,\chi_p)\neq0$. The difficulty in studying this family is that its moments involve sums over primes, which are more complicated to investigate. In fact, Jutila only evaluated the first moment of this family. As far as the authors are aware, no asymptotic evaluation of the second moment has appeared in the literature. However, Andrade and Keating [2] asymptotically evaluated the second moment of an analogous family over function fields. Andrade and the first author [1] have continued the study of the family of $L(\frac12,\chi_p)$, showing that it is likely governed by a symplectic law. Conditionally on GRH, they prove that more than 75% of primes $p\le X$ satisfy $L(\frac12,\chi_p)\neq0$.

We prove an unconditional positive proportion result for the central values $L(\frac12,\chi_p)$. In fact, we prove that more than nine percent of these central values are non-zero (Theorem 1.1). The proof of Theorem 1.1 proceeds via the mollification method, which we discuss briefly in Section 3 below. Our methods build on those of Jutila [21] and Soundararajan [39]. As in the work of Soundararajan, the main difficulty lies in evaluating the contribution of certain off-diagonal terms. The difference now is that we are summing over primes instead of over square-free integers, and so we cannot directly use his approach.
A key idea in the proof of Theorem 1.1 is the use of upper bound sieves to turn intractable sums over primes into manageable sums over integers. The use of sieves in studying central values of L-functions has also appeared in some other contexts (see [16], also [36, p. 1035]).
One would rather have an upper bound in Theorem 1.2 that asymptotically matches the lower bound, but this seems difficult to prove unconditionally. By adapting a method of Soundararajan and Young [41], however, we are able to prove such an asymptotic formula on GRH. After we completed this paper, Maksym Radziwiłł informed us about work in progress with Julio Andrade, Roger Heath-Brown, Xiannan Li, and K. Soundararajan in which they derive an unconditional asymptotic formula for the second moment of $L(\frac12,\chi_p)$. Their approach similarly introduces sieve weights, and they also observed that this idea could lead to a non-vanishing result.
Our methods further yield the order of magnitude of the third moment of $L(\frac12,\chi_p)$, assuming that the central values $L(\frac12,\chi_n)$ are non-negative for certain fundamental discriminants $n$. This non-negativity hypothesis follows, of course, from GRH.

Theorem 1.4. Assume that for all positive square-free integers $n$ with $n\equiv1\pmod8$ it holds that $L(\frac12,\chi_n)\ge0$. Then for large $X$
$$\sum_{\substack{p\le X\\ p\equiv1\ (\mathrm{mod}\ 8)}}(\log p)\,L\big(\tfrac12,\chi_p\big)^3\asymp X(\log X)^6.$$
Throughout this paper, we work exclusively with p ≡ 1 (mod 8) for convenience, but our methods are not specific to this residue class. With some modifications one could state similar results for other residue classes modulo 8. See the end of Section 3 for more details.
Our work indicates that Soundararajan's lower bound [39] for the proportion of non-vanishing for fundamental discriminants $d\equiv0\pmod8$ also holds for fundamental discriminants $d\equiv1\pmod8$. Proving this involves redoing the calculations in Section 7, but without applying an upper bound sieve. To complete the proof, one would also need a first moment calculation. We omit the details and instead refer the reader to [39, Section 4].
It is natural to ask about the limitations of our method, and how much we can increase the lower bound in Theorem 1.1. If we assume that we can use arbitrarily long mollifiers [12], then we obtain a higher percentage of non-vanishing. However, in view of the parity problem of sieve theory [13], we could not reach a proportion greater than $\frac12$ via our method. On the other hand, by a different method [1], the Density Conjecture of Katz and Sarnak would imply that 100% of the central values $L(\frac12,\chi_p)$ are non-zero.

The outline of the rest of the paper is as follows. In Section 2 we establish some notation and conventions that hold throughout this work. Section 3 outlines the basic strategy for the proof of Theorem 1.1. In Sections 4 and 5 we state a number of important technical results which are used in the proofs of our theorems. The proof of Theorem 1.1 is spread across Sections 6, 7, and 8. In Section 6 and its subsections we study the mollified first moment problem. The very long Section 7 and its subsections handle the mollified second moment. We choose our mollifier and finish the proof of Theorem 1.1 in Section 8. We prove Theorems 1.2 and 1.3 in Section 9, and we prove Theorem 1.4 in Section 10.

Notation and conventions
We define $\chi_n(\cdot)=\left(\frac{n}{\cdot}\right)$, the Kronecker symbol, for all nonzero integers $n$, even if $n$ is not a fundamental discriminant. Note that this means $\chi_n$ has conductor $|n|$ only when $n$ is a fundamental discriminant. We write $S(Q)$ for the set of all real primitive characters $\chi$ with conductor $\le Q$. For an integer $n$, we write $n=\square$ or $n\neq\square$ according to whether or not $n$ is a perfect square.
We let $\varepsilon>0$ denote an arbitrarily small constant whose value may vary from one line to the next. When $\varepsilon$ is present, in some fashion, in an inequality or error term, we allow implied constants to depend on $\varepsilon$ without necessarily indicating this in the notation. At times we indicate the dependence of implied constants on other quantities by use of subscripts: for example, $Y\ll_A Z$.
Throughout this paper, we denote by $\Phi(x)$ a smooth function, compactly supported in … We could state our results for arbitrary smooth functions supported in $[\frac12,1]$, but we avoid this for the sake of simplicity.
We write $e(x)=e^{2\pi ix}$. For $g$ a compactly supported smooth function, we define the Fourier transform $\hat g(y)$ of $g$ by
$$\hat g(y)=\int_{\mathbb R}g(x)e(-xy)\,dx.$$
At times, however, we find it convenient to use a slightly different normalization of the Fourier transform (see Lemma 5.2).
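We define the Mellin transform $g^\dagger(s)$ of $g$ by
$$g^\dagger(s)=\int_0^\infty g(x)x^{s-1}\,dx.$$
It is also helpful to define a modified Mellin transform $\check g(w)$ by
$$\check g(w)=\int_0^\infty g(x)x^{w}\,dx.$$
Observe that $\check g(w)=g^\dagger(1+w)$. Lastly, for a complex number $s$, we define $g_s(t)=g(t)t^{s/2}$.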
Note that $\check\Phi$ …

The letter $p$ always denotes a prime number. We write $\phi$ for the Euler phi function, and $d_k$ for the $k$-fold divisor function. If $a$ and $b$ are integers we write $[a,b]$ for their least common multiple and $(a,b)$ for their greatest common divisor. It will always be clear from context whether $[a,b]$, say, denotes a least common multiple or a real interval.
Given coprime integers $a$ and $q$, we write $\overline{a}\pmod q$ for the multiplicative inverse of $a$ modulo $q$.
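The notation of this section can be made concrete in a few lines of Python. This is an illustrative sketch, not code from the paper: the helper names `jacobi`, `kronecker`, `is_square` and `inverse_mod` are hypothetical, with `kronecker(n, m)` playing the role of $\chi_n(m)=\left(\frac{n}{m}\right)$, `is_square(n)` encoding $n=\square$, and `inverse_mod(a, q)` returning $\overline{a}\pmod q$. The Jacobi symbol is computed with the usual reciprocity algorithm, and the Kronecker symbol extends it to even and negative $m$.

```python
from math import gcd, isqrt


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:              # remove factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def kronecker(a: int, n: int) -> int:
    """Kronecker symbol (a/n); in the notation above chi_a(n) = kronecker(a, n)."""
    if n == 0:
        return 1 if abs(a) == 1 else 0
    result = 1
    if n < 0:                          # (a/-1) = -1 if a < 0, else 1
        n = -n
        if a < 0:
            result = -result
    while n % 2 == 0:                  # (a/2) = 0 if a even, -1 if a = +-3 (mod 8)
        n //= 2
        if a % 2 == 0:
            return 0
        if a % 8 in (3, 5):
            result = -result
    return result * jacobi(a, n)


def is_square(n: int) -> bool:
    """True exactly when n is a perfect square."""
    return n >= 0 and isqrt(n) ** 2 == n


def inverse_mod(a: int, q: int) -> int:
    """The inverse of a modulo q, defined only when (a, q) = 1."""
    if gcd(a, q) != 1:
        raise ValueError("a and q must be coprime")
    return pow(a, -1, q)               # requires Python 3.8+
```

For an odd prime $p$, `kronecker(a, p)` should agree with Euler's criterion $a^{(p-1)/2}\bmod p$, which gives a quick independent check.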
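
Outline of the proof of Theorem 1.1
The proof of Theorem 1.1 proceeds through the mollification method. The method was introduced by Bohr and Landau [4], and later greatly refined in the hands of Selberg [38]. The idea is to introduce a Dirichlet polynomial $M(p)$, known as a mollifier, which dampens the occasional wild behavior of the central values $L(\frac12,\chi_p)$. We study the first and second moments
$$S_1=\sum_{p\equiv1\ (\mathrm{mod}\ 8)}(\log p)\,\Phi\Big(\frac pX\Big)L\big(\tfrac12,\chi_p\big)M(p),\qquad S_2=\sum_{p\equiv1\ (\mathrm{mod}\ 8)}(\log p)\,\Phi\Big(\frac pX\Big)L\big(\tfrac12,\chi_p\big)^2M(p)^2.$$
If the mollifier is chosen well then $S_1\gg X$ and $S_2\ll X$. By the Cauchy-Schwarz inequality we have
$$\sum_{\substack{p\equiv1\ (\mathrm{mod}\ 8)\\ L(\frac12,\chi_p)\neq0}}(\log p)\,\Phi\Big(\frac pX\Big)\ge\frac{S_1^2}{S_2},$$
and this implies that a positive proportion of the central values $L(\frac12,\chi_p)$ are non-zero. Our mollifier takes the form … for some coefficients $b_m$ we describe shortly. Here we set … The larger one can take $\theta$, the better the proportion of non-vanishing one can achieve.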
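To see how $S_1\gg X$ and $S_2\ll X$ translate into an actual percentage, one can compare with the total weight of the primes $p\equiv1\pmod8$. The derivation below is a sketch that assumes $\Phi\ge0$ (needed for the Cauchy-Schwarz step, and not visible in the surviving text of Section 2). It also uses that terms with $L(\frac12,\chi_p)=0$ contribute nothing to $S_1$, the prime number theorem in arithmetic progressions with $\phi(8)=4$, and $\hat\Phi(0)=\int_{\mathbb R}\Phi(x)\,dx$.

```latex
\begin{gather*}
S_1^2 \le \Bigg(\sum_{\substack{p\equiv 1\ (\mathrm{mod}\ 8)\\ L(\frac12,\chi_p)\neq 0}} (\log p)\,\Phi\Big(\frac pX\Big)\Bigg)\, S_2,
\qquad
\sum_{p\equiv 1\ (\mathrm{mod}\ 8)} (\log p)\,\Phi\Big(\frac pX\Big) \sim \frac{X}{4}\,\hat\Phi(0),\\[4pt]
\text{hence}\qquad
\frac{\sum_{p\equiv 1\,(8),\ L(\frac12,\chi_p)\neq 0}(\log p)\,\Phi(p/X)}{\sum_{p\equiv 1\,(8)}(\log p)\,\Phi(p/X)}
\ge \big(1+o(1)\big)\,\frac{4\,S_1^2}{X\,\hat\Phi(0)\,S_2}.
\end{gather*}
```

So if, say, $S_1\ge A_1X$ and $S_2\le A_2X$ (with $A_1,A_2$ hypothetical constants), the weighted proportion of non-vanishing is at least $(1+o(1))\,4A_1^2/(A_2\hat\Phi(0))$, which is why the constants in both bounds matter.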
The coefficients $b_m$ are a smoothed version of the Möbius function $\mu(m)$. Specifically, we choose
$$b_m=\mu(m)H\Big(\frac{\log m}{\log M}\Big),$$
where $H(t)$ is a smooth function compactly supported in $[-1,1]$, which we choose in Section 8. It will be convenient in a number of places that $b_m$ is supported on square-free integers.
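For intuition, the coefficients $b_m=\mu(m)H(\log m/\log M)$ can be tabulated directly. In this sketch the bump $H(t)=\exp(1-1/(1-t^2))$ on $(-1,1)$ is a hypothetical stand-in, chosen only because it is smooth, compactly supported in $[-1,1]$ and satisfies $H(0)=1$; it is not the function chosen in Section 8, and the value of $M$ used below is arbitrary.

```python
import math
from typing import Callable, Dict


def mobius(n: int) -> int:
    """Moebius function mu(n) for n >= 1, by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0               # d^2 divides the original n
            result = -result
        d += 1
    return -result if n > 1 else result  # a leftover n > 1 is a prime


def bump(t: float) -> float:
    """Hypothetical smooth H supported in [-1, 1], with bump(0) = 1."""
    if abs(t) >= 1:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - t * t))


def mollifier_coefficients(M: float, H: Callable[[float], float] = bump) -> Dict[int, float]:
    """b_m = mu(m) H(log m / log M) for 1 <= m <= M, stored only for square-free m."""
    coeffs: Dict[int, float] = {}
    for m in range(1, math.floor(M) + 1):
        mu = mobius(m)
        if mu != 0:
            coeffs[m] = mu * H(math.log(m) / math.log(M))
    return coeffs
```

With $M=100$, for example, `mollifier_coefficients(100.0)` has $b_1=1$ and no entries at non-square-free $m$ such as $4$ or $12$, reflecting the support property used repeatedly below.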
We outline our strategy for estimating $S_1$ and $S_2$. We simplify the presentation here in comparison to the actual proofs. The sum $S_1$ is by far the simpler of the two, so we start there (see Section 6). Using an approximate functional equation for the central value $L(\frac12,\chi_p)$ (Lemma 4.2), we write $S_1$ as a sum over $m$ and $k$ of terms involving the character values $\chi_p(mk)$. The main term arises from the "diagonal" terms $mk=\square$. The character values $\chi_p(mk)$ are then all equal to one, and we simply use the prime number theorem in arithmetic progressions modulo eight to handle the sum on $p$. The sum on $k$ contributes a logarithmic factor, but this logarithmic loss is canceled by a logarithmic gain coming from cancellation in the mollifier coefficients. This yields the main term for $S_1$, which is of size $\asymp X$ (Proposition 6.1). The "off-diagonal" terms $mk\neq\square$ contribute only to the error term. After some manipulations the off-diagonal terms reduce essentially to a sum $E$, over $q$, of character sums over primes twisted by $\chi_q$, weighted by coefficients $\alpha(q)$ satisfying $|\alpha(q)|\ll_\varepsilon q^\varepsilon$. We assume here for simplicity that all of the characters $\chi_q$ are primitive. We bound the character sum over primes in $E$ in three different ways, depending on the size of $q$. These three regimes correspond to small, medium, and large values of $q$. Some of the arguments are similar to those of Jutila [21].
In the regime of small $q$ we appeal to the prime number theorem in arithmetic progressions with error term. The sum on primes $p$ is small, except in the case where one of the characters $\chi_{q^*}$ is exceptional: that is, the associated $L$-function $L(s,\chi_{q^*})$ has a real zero $\beta^*$ very close to $s=1$. Siegel's theorem gives $q^*\ge c(B)(\log X)^B$ with $B>0$ arbitrarily large. This would immediately dispatch any exceptional characters, but unfortunately the constant $c(B)$ is not effectively computable. To get an effective estimate we use Page's theorem, which states that at most one such exceptional character $\chi_{q^*}$ exists. We then study carefully the contribution of this one exceptional character and show it is acceptably small.
In the regimes of medium and large $q$, we take advantage of the averaging over $q$ present in $E$. We bound $E$ in terms of sums $E(Q)$ attached to a parameter $Q$, where $Q$ is of moderate size or is large. When $Q$ is medium-sized, we use the explicit formula to bound $E(Q)$ by sums over zeros of the $L$-functions $L(s,\chi_q)$, and we then use zero-density estimates.
We are left with the task of bounding $E(Q)$ when $Q$ is large, which means that $Q$ is larger than $X^\delta$ for some small, fixed $\delta>0$. Rather than treating the sum on primes analytically, as we did when $Q$ was small or medium-sized, we treat the sum on primes combinatorially. We use Vaughan's identity to write the character sum over the primes as a linear combination of linear and bilinear sums. The linear sums are handled easily with the Pólya-Vinogradov inequality. We bound the bilinear sums by appealing to a large sieve inequality for real characters due to Heath-Brown (Lemma 4.4).
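Vaughan's identity is only named in this outline, so here is one standard form of it, checked numerically; it is not necessarily the exact variant used in Section 6. For $n\ge1$ and parameters $U,V\ge1$,
$$\Lambda(n)=\Lambda(n)1_{n\le V}+\sum_{\substack{bc=n\\ b\le U}}\mu(b)\log c-\sum_{\substack{bcd=n\\ b\le U,\ c\le V}}\mu(b)\Lambda(c)+\sum_{\substack{bcd=n\\ b>U,\ c>V}}\mu(b)\Lambda(c).$$
Roughly, the sums with a long smooth variable ($\log c$, or the free variable $d$) lead to linear sums, and the last sum leads to bilinear sums. The function names and the values of $U,V$ in the check are hypothetical.

```python
import math


def mobius(n: int) -> int:
    """Moebius function mu(n) for n >= 1, by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


def vaughan_rhs(n: int, U: int, V: int) -> float:
    """Right-hand side of Vaughan's identity with parameters U and V."""
    divs = [d for d in range(1, n + 1) if n % d == 0]
    small = von_mangoldt(n) if n <= V else 0.0
    type_one = sum(mobius(b) * math.log(n // b) for b in divs if b <= U)
    # c runs over divisors of n // b, so that b * c * d = n for some d
    short = sum(mobius(b) * von_mangoldt(c)
                for b in divs if b <= U
                for c in divs if c <= V and (n // b) % c == 0)
    bilinear = sum(mobius(b) * von_mangoldt(c)
                   for b in divs if b > U
                   for c in divs if c > V and (n // b) % c == 0)
    return small + type_one - short + bilinear
```

The identity is exact for every choice of $U,V\ge1$; in applications $U$ and $V$ are powers of $X$ chosen to balance the linear and bilinear estimates.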
We now describe our plan of attack for $S_2$ (see Section 7). Recall the definition (3.1) of $S_2$. As we see from Theorem 1.3, we only barely obtain an asymptotic formula for the second moment
$$\sum_{\substack{p\le X\\ p\equiv1\ (\mathrm{mod}\ 8)}}(\log p)\,L\big(\tfrac12,\chi_p\big)^2$$
under the assumption of the Generalized Riemann Hypothesis. Thus it might seem doubtful that one can say anything useful about $S_2$, since the central value $L(\frac12,\chi_p)^2$ is further twisted by the square of a Dirichlet polynomial. The key idea is that we do not need an asymptotic formula for $S_2$, but only an upper bound of the right order of magnitude (with a good constant). We therefore avail ourselves of sieve methods (see Section 5). By positivity, we may bound $S_2$ from above by a sum over all integers $n$ in which the restriction to primes is replaced by the weights $\sum_{d\mid n}\lambda_d$, where $(\lambda_d)$ is an upper bound sieve supported on $d\le D$. Since we are now working with ordinary integers instead of prime numbers, the analysis for $S_2$ becomes similar to the second moment problem considered in [39] (see [39, Section 5]). We begin by splitting this sum into two parts, using a parameter $Y$ which is a small power of $X$. One part is an error term, and is shown to be small in a straightforward fashion by applying moment estimates for $L(\frac12,\chi_n)$ due to Heath-Brown (Lemma 4.5). The main task is therefore to asymptotically evaluate the other part. We use an approximate functional equation to represent the central values $L(\frac12,\chi_n)^2$ and arrive at expressions involving a weight $\omega$, where $\omega(x)$ is some rapidly decaying smooth function that satisfies $\omega(x)\approx1$ for small $x$. We then change variables $n=m[d,\ell^2]$.
We use Poisson summation to transform the sum on $m$ into a sum over frequencies $k$ involving some smooth function $F_\nu$. The zero frequency $k=0$ gives rise to a main term. Since $\left(\frac{0}{h}\right)=1$ or $0$ depending on whether $h$ is a square, the $k=0$ contribution represents the expected "diagonal" contribution from $m_1m_2\nu=\square$. There is an additional, off-diagonal, main term which arises, essentially, from the terms with $[d,\ell^2]k=\square$. We adapt here the delicate off-diagonal analysis of [39]. The situation is complicated by the presence of the additive character $e(\cdot)$, which is not present in [39]. The additive character necessitates a division of the integers $k$ into residue classes modulo $8$. We then use Fourier expansion to write the additive character as a linear combination of multiplicative characters. After many calculations the off-diagonal main term arises as a sum of complex line integrals. When we combine the various pieces the integrand becomes an even function, exhibiting a symmetry which none of the pieces separately possessed. This fact proves to be very convenient in the final steps of the main term analysis.
One intriguing feature of the main term in S 2 is a kind of "double mollification". We must account for the savings coming from the mollifier M(n), but must also account for the savings coming from the sieve weights λ d , which act as a sort of mollifier on the natural numbers. It is crucial that we get savings in both places, and therefore our sieve process must be very precise. We find that a variation on the ideas of Selberg (see e.g. [17, Section 6.5]) is sufficient.
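The positivity mechanism behind an analytic Selberg sieve can be illustrated in isolation. The sketch below builds $\lambda_d=\sum_{[m_1,m_2]=d}\rho(m_1)\rho(m_2)$ with $\rho(m)=\mu(m)G(\log m/\log R)$, so that $\sum_{d\mid n}\lambda_d=\big(\sum_{m\mid n}\rho(m)\big)^2\ge0$, with value $1$ at $n=1$ and at every prime $n\ge R$. This is only the generic Selberg component: it omits the Brun pure-sieve factor that, judging from Section 5, also enters the paper's coefficients $\lambda_d$. The piecewise-linear $G$ (not smooth, unlike the $G$ of Section 5), the level $R=30$ and all function names are hypothetical.

```python
import math
from math import gcd
from typing import Dict


def mobius(n: int) -> int:
    """Moebius function mu(n) for n >= 1, by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def G(t: float) -> float:
    """Hypothetical stand-in for t >= 0: G(t) = 1 - t on [0, 1] and 0 beyond."""
    return max(0.0, 1.0 - t)


def rho(R: float) -> Dict[int, float]:
    """rho(m) = mu(m) G(log m / log R) for square-free m < R."""
    return {m: mobius(m) * G(math.log(m) / math.log(R))
            for m in range(1, math.ceil(R)) if mobius(m) != 0}


def selberg_weights(R: float) -> Dict[int, float]:
    """lambda_d = sum over pairs with lcm(m1, m2) = d of rho(m1) rho(m2)."""
    r = rho(R)
    lam: Dict[int, float] = {}
    for m1, r1 in r.items():
        for m2, r2 in r.items():
            d = m1 * m2 // gcd(m1, m2)
            lam[d] = lam.get(d, 0.0) + r1 * r2
    return lam


def majorant(n: int, lam: Dict[int, float]) -> float:
    """sum_{d | n} lambda_d, which equals (sum_{m | n} rho(m))^2 >= 0."""
    return sum(value for d, value in lam.items() if n % d == 0)
```

Because the weights are a square, they can be inserted under a non-negative summand such as $L(\frac12,\chi_n)^2M(n)^2$ at no cost, which is the "by positivity" step in the outline above.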
At length we arrive at an upper bound $S_{2,U}$, say, for $S_2$ of size $S_{2,U}\ll X$. We make an optimal choice of the function $H(x)$ in Section 8 to maximize the ratio $S_1^2/S_{2,U}$. The resulting mollifier is not the optimal mollifier, but it gives results that are asymptotically equivalent to those attained with the optimal mollifier. This yields Theorem 1.1.
To treat other residue classes of $p\pmod8$, we make the following changes. First, we change the definition of $\chi_p(\cdot)$ to $\left(\frac{(-1)^ap}{\cdot}\right)$, where $a=0$ if $p\equiv1\pmod4$ and $a=1$ if $p\equiv3\pmod4$. Thus $\chi_p$ is still a primitive character of conductor $p$. Second, we use a variant of the approximate functional equation (Lemma 4.2) in which $\omega_j$, defined in (4.1), is replaced by a modified weight involving a function $W(s)$. Here $W(s)=16\left(s^2-\frac14\right)^2$; its purpose is to cancel potential poles at $s=\frac12$ in the analysis.
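Both modifications can be checked by hand, and a tiny sketch makes the check mechanical. The helper names `signed_prime` and `W` are hypothetical. The first confirms that $(-1)^ap\equiv1\pmod4$, so that $\left(\frac{(-1)^ap}{\cdot}\right)$ is attached to a fundamental discriminant of absolute value $p$. The second confirms that $W(s)=16(s^2-\frac14)^2$ satisfies $W(0)=1$ and vanishes at $s=\pm\frac12$.

```python
def signed_prime(p: int) -> int:
    """(-1)^a * p with a = 0 if p = 1 (mod 4) and a = 1 if p = 3 (mod 4)."""
    if p % 4 == 1:
        return p
    if p % 4 == 3:
        return -p
    raise ValueError("p must be an odd prime")


def W(s: complex) -> complex:
    """W(s) = 16 (s^2 - 1/4)^2, which has double zeros at s = 1/2 and s = -1/2."""
    return 16 * (s * s - 0.25) ** 2
```

For instance, $p=7$ gives the discriminant $-7$, while $p=5$ keeps the discriminant $5$.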

Lemmata
We represent the central values of $L$-functions by using an approximate functional equation. We first investigate some properties of the smooth functions which appear in our approximate functional equations. For $j=1,2$ and $c>0$, define $\omega_j(\xi)$ by the integral (4.1) along the vertical line $\mathrm{Re}(s)=c$.

Lemma 4.1. Let $j=1,2$. The function $\omega_j(\xi)$ is real-valued and smooth on $(0,\infty)$. If $\xi>0$ we have … For any fixed integer $\nu\ge0$ and $\xi\ge4\nu+10$, we have …

Proof. The proof is similar to [39, Lemma 2.1], but we give details for completeness. The function $\omega_j(\xi)$ is real-valued because the change of variable $\mathrm{Im}(s)\to-\mathrm{Im}(s)$ shows that $\omega_j(\xi)$ is equal to its complex conjugate. Moreover, uniform convergence for $\xi$ in compact subintervals of $(0,\infty)$ shows that $\omega_j$ is smooth.
Recall that $|\Gamma(x+iy)|\le\Gamma(x)$ for $x\ge1$ and $z\Gamma(z)=\Gamma(z+1)$. Thus, for $c\ge2$ we obtain …

Proof. The proof follows along standard lines (e.g. [17, Theorem 5.3]), but we give a proof since our situation is slightly different. Let $\Lambda(z,\chi_n)=\left(\frac n\pi\right)^{z/2}\Gamma\left(\frac z2\right)L(z,\chi_n)$. Since $n\equiv1\pmod4$ we have $\chi_n(-1)=1$, and therefore we have the functional equation (see [8, Proposition 2.2.24], [11, Chapter 9])
$$\Lambda(z,\chi_n)=\Lambda(1-z,\chi_n).$$
Recall also that $\Lambda(z,\chi_n)$ is entire because $\chi_n$ is primitive. Now consider the sum … We use the definition of $\omega_j$ and interchange the order of summation and integration. Since $\chi_n(2)=1$ we have … We move the line of integration to $\mathrm{Re}(s)=-c$, picking up a contribution from the simple pole at $s=0$. In this latter integral we change variables $s\to-s$ and then apply the functional equation $\Lambda(\frac12-s,\chi_n)=\Lambda(\frac12+s,\chi_n)$ to obtain … We then rearrange to obtain the desired conclusion.
We frequently encounter exponential sums which are analogous to Gauss sums. Given an odd integer $n$, we define $G_k(n)$ for all integers $k$ by … We require knowledge of $G_k(n)$ for all $n$.
The following two results are useful for bounding various character sums that arise. Both results are corollaries of a large sieve inequality for quadratic characters developed by Heath-Brown [15].

Lemma 4.4. Let $N$ and $Q$ be positive integers, and let $a_1,\dots,a_N$ be arbitrary complex numbers. Then
$$\sum_{\chi\in S(Q)}\Big|\sum_{n\le N}a_n\chi(n)\Big|^2\ll_\varepsilon(QN)^\varepsilon(Q+N)\sum_{n_1n_2=\square}|a_{n_1}a_{n_2}|,$$
for any $\varepsilon>0$.

Let $M$ be a positive integer, and for each $|m|\le M$ write $4m=m_1m_2^2$, where $m_1$ is a fundamental discriminant and $m_2$ is positive. Suppose the sequence $a_n$ satisfies $|a_n|\ll n^\varepsilon$. Then …

Proof. This is […].

Proof. This is [39, Lemma 2.5].

Sieve estimates
Our main sieve will be a variant of the Selberg sieve (see [14, Chapter 7]). To lessen the volume of calculations, we also use Brun's pure sieve [14, Chapter 6] as a preliminary sieve to handle small prime factors. We set … Given a set $\mathcal A$ of integers we write $1_{\mathcal A}(n)$ for the indicator function of this set. For $y>2$ we define $P(y)=\prod_{p\le y}p$.
We use an "analytic" Selberg sieve (e.g. [34]) for the second factor of (5.3). We introduce a smooth, non-negative function $G(t)$ which is supported on the interval $[-1,1]$. We further require $G(t)$ to satisfy $|G(t)|\ll1$ and $|G^{(j)}(t)|\ll_j(\log\log X)^{j-1}$ for $j$ a positive integer, and on the interval $[0,1]$ we require $G(t)=1-t$ for $t\le1-(\log\log X)^{-1}$. Then … We mention also that the properties of $G$ imply … , where the coefficients $\lambda_d$ are defined by … If $b\mid P(z_0)$ and $\omega(b)\le2r_0$, then $b\le z_0^{2r_0}=\exp(2(\log X)^{2/3})$. Hence $\lambda_d\neq0$ only for $d\le D$, where … In our evaluation of sums involving the sieve coefficients (5.8) we use the following version of the fundamental lemma of sieve theory (see also [14, Section 6.5]).
Lemma 5.1. Let $0<\delta<1$ be a fixed constant, $r$ a positive integer with $r\asymp(\log X)^\delta$, and $z_0$ as in (5.1). Suppose that $g$ is a multiplicative function such that $|g(p)|\ll1$ uniformly for all primes $p$. Then … uniformly for all positive integers $\ell$.
Proof. The proof is standard. Complete the sum on the left-hand side by adding to it all the terms with $\omega(b)>r$, dropping by positivity the condition $(b,\ell)=1$. The error introduced in doing so is $\ll\exp(-(1+o(1))r\log r)\ll\exp(-r\log\log r)$ (e.g. [17, §6.3]). The completed sum is equal to the Euler product on the right-hand side.
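The completion step in this proof can be seen in a toy case. The sketch compares the truncated sum of $\mu(b)g(b)/b$ over square-free $b\mid P(z)$ with $\omega(b)\le r$ against the Euler product $\prod_{p\le z}(1-g(p)/p)$. The choices $g\equiv1$ or $g\equiv\frac12$ and $z=30$, and the omission of the condition $(b,\ell)=1$, are hypothetical simplifications; the exact left-hand side of Lemma 5.1 is not reproduced here.

```python
import math
from itertools import combinations
from typing import Callable, List


def primes_up_to(z: int) -> List[int]:
    """Primes p <= z (z >= 2) via the sieve of Eratosthenes."""
    is_p = [True] * (z + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, math.isqrt(z) + 1):
        if is_p[i]:
            for j in range(i * i, z + 1, i):
                is_p[j] = False
    return [i for i in range(z + 1) if is_p[i]]


def truncated_sum(z: int, r: int, g: Callable[[int], float] = lambda p: 1.0) -> float:
    """Sum over square-free b | P(z) with omega(b) <= r of mu(b) g(b) / b."""
    ps = primes_up_to(z)
    total = 0.0
    for k in range(r + 1):
        for combo in combinations(ps, k):   # b = product of the k primes in combo
            term = 1.0
            for p in combo:
                term *= -g(p) / p           # mu(p) g(p) / p for each prime factor
            total += term
    return total


def euler_product(z: int, g: Callable[[int], float] = lambda p: 1.0) -> float:
    """The completed sum, prod_{p <= z} (1 - g(p) / p)."""
    return math.prod(1 - g(p) / p for p in primes_up_to(z))
```

When $0\le g(p)\le p$, the Bonferroni inequalities show that the truncation error is at most the first omitted elementary symmetric sum of the numbers $g(p)/p$, which decays factorially in $r$; this is the source of the $\exp(-(1+o(1))r\log r)$ saving.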
The basic tool in our application of the Selberg sieve is the following lemma.
Lemma 5.2. Let $z_0=\exp((\log X)^{1/3})$. Let $G$ be as above. Suppose $h$ is a function such that $|h(p)|\ll_\varepsilon p^{-\varepsilon}$ uniformly for all primes $p$. Let $A>0$ be a fixed real number. Then there exists a function $E_0(X)$, which depends only on $X$, $G$, and $\vartheta$ (see …), such that …

Proof. Let $S$ denote the left-hand side of (5.10). If $m,n\le R$ and $(mn,P(z_0))=1$, then $\omega(mn)\ll\log R$, and each prime dividing $mn$ is larger than $z_0$. Thus … and so … We may ignore the condition $(mn,\ell)=1$ in (5.11) because … We next insert the Fourier inversion formula … We then interchange the order of summation and integration and write the sum as an Euler product to deduce (5.14). By integrating (5.13) by parts repeatedly we see … and we have the trivial bound … Therefore, we may truncate the double integral in (5.14) to the region $|z_1|,|z_2|\le\sqrt{\log R}$, with an error of size $O_A((\log R)^{-A})$. After doing so, we multiply and divide the integrand by Euler products of zeta-functions to arrive at (5.15). The product over primes $p>z_0$ in (5.15) is $1+O(1/z_0)$. To estimate the product over … We may also expand each zeta-function in (5.15) into its Laurent series. With these approximations, we deduce from (5.15) that … uniformly for $\log\ell\ll\log X$. Here $E(X,\vartheta,z_1,z_2)$ tends to zero as $X\to\infty$. By the rapid decay of $g(z)$, we may extend the range of integration to $\mathbb R^2$ without affecting our bound for the error term. By differentiating (5.12) under the integral sign and using Fubini's theorem, we find … The lemma now follows from (5.16) and (5.6).

Lemma 5.3. Let $\lambda_d$ and $D$ be as defined in (5.8) and (5.9), respectively. Suppose that $g$ is a multiplicative function such that $g(p)=1+O(p^{-\varepsilon})$ for all primes $p$. Then with $E_0(X)$ as in Lemma 5.2 we have …

Proof. The definitions (5.8) and (5.9) of $\lambda_d$ and $D$ imply … In the sum on the right-hand side, $g(b[m,n])=g(b)g([m,n])$ because $b$ and $mn$ are coprime. Thus we may apply Lemma 5.2 and then Lemma 5.1 to arrive at Lemma 5.3.
Lemma 5.4. Let $\lambda_d$, $D$, $g$ be as in Lemma 5.3. Suppose that $h$ is a function such that $|h(p)|\ll_\varepsilon p^{-1+\varepsilon}$ for all primes $p$. Then with $E_0(X)$ as in Lemma 5.2 we have … uniformly for all integers $\ell$ such that $\log\ell\ll\log X$. (Here, the index $q$ runs over primes $q$.)

Proof. The definitions (5.8) and (5.9) of $\lambda_d$ and $D$ imply … Since $b$ and $mn$ are coprime, … We may ignore the sum over the $p\mid mn$ because the conditions $(mn,P(z_0))=1$ and $mn\le R^2$ imply … We factor out $g(b)$ and $\sum_{p\mid b}h(p)$ from the sum over $m,n$ and then apply Lemma 5.2 to deduce that … To estimate the $b$-sum, we interchange the order of summation and then relabel $b$ as $bp$ to write … Lemma 5.4 now follows from Lemma 5.1 and (5.17).

The mollified first moment
Our goal in this section is to asymptotically evaluate $S_1$. Recall its definition from (3.1), the definition of $M(p)$ from (3.3), and the choice (3.5) we made for the mollifier coefficients $b_m$. We shall prove the following result.

Proposition 6.1. …
The implied constant in the error term is effectively computable.
Let us begin in earnest, following the outline in Section 3. We apply Lemma 4.2 to write $L(\frac12,\chi_p)$ as a Dirichlet series. We insert the definition of $M(p)$ and obtain … The main term arises from the terms with $mn=\square$. Let us denote this portion of $S_1$ by $S_1^{\square}$. We denote the complementary portion with $mn\neq\square$ by $S_1^{\neq}$. Therefore $S_1=S_1^{\square}+S_1^{\neq}$. We treat first the main term $S_1^{\square}$, and later we will bound the error term $S_1^{\neq}$.
6.1. Main term. Recall that $b_m$ is supported on square-free integers $m$. Therefore $mn=\square$ if and only if $n=mk^2$, where $k$ is a positive integer. We make this change of variables and then interchange orders of summation to obtain … By the rapid decay of $\omega_1$ (Lemma 4.1) we see that the contribution from those $k$ with … is negligible, so we may safely ignore this condition. We may also ignore the condition $(m,p)=1$, since $m\le M<p$. We insert the definition (4.1) of $\omega_1(\xi)$ and interchange to deduce that for any $c>0$ we have … We move the line of integration to $\mathrm{Re}(s)=-\frac12+\varepsilon$, leaving a residue at $s=0$ (see (3.4)). Writing the residue at $s=0$ as an integral along a small circle around $0$, we deduce that … From repeated integration by parts we obtain … Expanding … as a power series, we arrive at … We may extend the range of integration to the entire real line, with negligible error, because of (6.1.4). The definition of $H(t)$ implies that … We insert (6.1.5) into (6.1.1) to obtain … We evaluate the integral using the formula
$$\operatorname*{Res}_{s=0}g(s)=\frac{1}{(n-1)!}\,\frac{d^{\,n-1}}{ds^{\,n-1}}\Big(s^ng(s)\Big)\Big|_{s=0}$$
for a pole of a function $g(s)$ at $s=0$ of order at most $n$. This yields … By the support of $\Phi$ we have $\log p=\log X+O(1)$. We then use the prime number theorem in arithmetic progressions and partial summation to obtain … Now (6.1.7) gives the main term for Proposition 6.1.

6.2. Preparation of the off-diagonal. We turn to bounding $S_1^{\neq}$. In order to complete the proof of Proposition 6.1, we prove … We need to perform some technical massaging before $S_1^{\neq}$ is in a suitable form. Recall from (6.1) that … We begin by uniquely writing $n=rk^2$, where $r$ is square-free and $k$ is a positive integer (this variable $k$ is unrelated to the variable $k$ appearing in the analysis of $S_1^{\square}$). The condition $mn\neq\square$ is equivalent to $m\neq r$, since both $m$ and $r$ are square-free. It follows that … We next factor out the greatest common divisor, say $g$, of $m$ and $r$. We change variables $m\to gm$, $r\to gr$ and obtain … Observe that the support of $b_{gm}$ forces $g\le M<X^{1/2}$, but we prefer not to indicate this explicitly.
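The two changes of variables above (writing $n=rk^2$ with $r$ square-free, then pulling out $g=(m,r)$) are elementary but easy to get wrong. The hypothetical helpers below implement them by trial division and check the key equivalence: for square-free $m$, $mn=\square$ holds exactly when $m=r$.

```python
from math import gcd, isqrt
from typing import Tuple


def is_square(n: int) -> bool:
    """True exactly when n is a perfect square."""
    return n >= 0 and isqrt(n) ** 2 == n


def squarefree_decomposition(n: int) -> Tuple[int, int]:
    """Return (r, k) with n = r * k**2 and r square-free (the decomposition is unique)."""
    r, k, m, p = 1, 1, n, 2
    while p * p <= m:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        k *= p ** (e // 2)              # even part of the exponent goes into k^2
        if e % 2 == 1:
            r *= p                      # odd exponent leaves one copy of p in r
        p += 1
    return r * m, k                     # a leftover m > 1 is a prime to the first power


def is_squarefree(n: int) -> bool:
    return squarefree_decomposition(n)[1] == 1


def split_gcd(m: int, r: int) -> Tuple[int, int, int]:
    """Return (g, m', r') with g = (m, r), m = g m', r = g r'."""
    g = gcd(m, r)
    return g, m // g, r // g
```

After the split, $mr=g^2m'r'$ with $(m',r')=1$, which is why only the coprime product $m'r'$ survives in the character once $p\nmid g$.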
Clearly we have $\left(\frac{g^2k^2}{p}\right)=1$ for $p\nmid gk$ and $=0$ otherwise. Since $g\le M<p$, the condition $p\nmid g$ is automatically satisfied. By Lemma 4.1 we may truncate the sum over $k$ to $k\le X^{1/4+\varepsilon}$ at the cost of an error $O(X^{-1})$, say. We may similarly truncate the sum on $r$ to $r\le X^{1/2+\varepsilon}$. With $k$ suitably reduced we may drop the condition $p\nmid k$, and then we use the rapid decay of $\omega_1$ again to extend the sum on $k$ to infinity. It follows that … We next detect the congruence condition $p\equiv1\pmod8$ with multiplicative characters modulo $8$. Therefore … Since $m$ and $r$ are odd and square-free and $(m,r)=1$, it follows that $mr$ is odd and square-free. Hence, for each $\gamma\in\{1,-1,2,-2\}$, the integer $\gamma mr$ is square-free. Therefore $\gamma mr\equiv1$, $2$, or $3\pmod4$. If $\gamma mr\equiv1\pmod4$, then $\left(\frac{\gamma mr}{\cdot}\right)$ is a real primitive character modulo $|\gamma mr|$, while if $\gamma mr\equiv2$ or $3\pmod4$, then $\left(\frac{4\gamma mr}{\cdot}\right)$ is a real primitive character modulo $|4\gamma mr|$ (see […]). … , where $\chi_{\gamma mr}(\cdot)=\left(\frac{\gamma mr}{\cdot}\right)$ if $\gamma mr\equiv1\pmod4$, and $\chi_{\gamma mr}(\cdot)=\left(\frac{4\gamma mr}{\cdot}\right)$ if $\gamma mr\equiv2$ or $3\pmod4$, so that $\chi_{\gamma mr}$ is a real primitive character for all the relevant $\gamma,m,r$. Also, since $mr>1$, we see that $\gamma mr$ is never $1$, so each $\chi_{\gamma mr}$ is nonprincipal.
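For odd primes $p$, one way to carry out this detection is the identity $1_{p\equiv1\,(8)}=\frac14\sum_{\gamma\in\{1,-1,2,-2\}}\left(\frac{\gamma}{p}\right)$. It holds because $\left(\frac{-2}{p}\right)=\left(\frac{-1}{p}\right)\left(\frac{2}{p}\right)$, so the sum factors as $\big(1+\left(\frac{-1}{p}\right)\big)\big(1+\left(\frac{2}{p}\right)\big)$. The sketch below checks this numerically and also encodes the rule that the primitive character attached to a square-free $\gamma mr$ has modulus $|\gamma mr|$ or $|4\gamma mr|$. All helper names are hypothetical.

```python
import math
from typing import List


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, by Euler's criterion."""
    t = pow(a % p, (p - 1) // 2, p)
    return 0 if t == 0 else (1 if t == 1 else -1)


def indicator_one_mod_eight(p: int) -> float:
    """(1/4) * sum over gamma in {1, -1, 2, -2} of (gamma/p), for odd primes p."""
    return sum(legendre(g, p) for g in (1, -1, 2, -2)) / 4


def discriminant(d: int) -> int:
    """For square-free d != 1: d if d = 1 (mod 4), else 4d.

    Its absolute value is the conductor of the real primitive character attached to d.
    """
    return d if d % 4 == 1 else 4 * d


def odd_primes_up_to(z: int) -> List[int]:
    """Odd primes 3 <= p <= z, by trial division."""
    return [n for n in range(3, z + 1, 2)
            if all(n % q for q in range(3, math.isqrt(n) + 1, 2))]
```

For example, $\gamma mr=-15$ is already $\equiv1\pmod4$ and gives modulus $15$, while $\gamma mr=15$ gives modulus $60$.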
We insert the definition of $\omega_1$ into (6.2.4) in order to facilitate a separation of variables. Recalling (6.2.2) and (6.2.3), we interchange the order of summation and integration to obtain … We choose $c=\frac1{\log X}$, so that $p^{s/2}$ is bounded in absolute value. We can put the summation on $k$ inside the integral, where it becomes a zeta factor, and we obtain … It is more convenient to replace the $\log p$ factor with the von Mangoldt function $\Lambda(n)$. By trivial estimation we have … When we sum the error term over $m,g,r$ and integrate over $s$, the total contribution is $O(X^{1-\varepsilon})$, provided $\varepsilon=\varepsilon(\theta)>0$ is sufficiently small. By the rapid decay of the $\Gamma$ function in vertical strips we can truncate the integral to $|\mathrm{Im}(s)|\le(\log X)^2$, at the cost of a negligible error. We therefore obtain … Having arrived at (6.2.5), we are finished with the preparatory technical manipulations. We proceed to show that $S_1^{\neq}$ is small. As discussed in Section 3, we apply three different arguments, depending on the size of $mr$. We call these ranges Regimes I, II, and III; they correspond to small, medium, and large values of $mr$.
… , and Regime III corresponds to $X^{1/10}\ll mr\ll MX^{1/2+\varepsilon}$. We then write … where $E_1$ contains those terms with $mr\ll\exp(\varpi\sqrt{\log X})$, and $E_2$ contains those terms with $mr\gg\exp(\varpi\sqrt{\log X})$. We claim the bounds … where $c>0$ is some absolute constant. Taking together (6.2.6) and (6.2.7) clearly gives (6.2.1), and this yields Proposition 6.1. It therefore suffices to show (6.2.7).

6.3. Regime I. We first bound $E_1$, which is precisely the contribution of Regime I. By definition, we have … We transform the sum on $n$ with partial summation to obtain … where $c_1>0$ is some absolute constant, and the term $-w^{\beta_1}/\beta_1$ only appears if $L(s,\chi_{\gamma mr})$ has a real zero $\beta_1$ which satisfies $\beta_1>1-\frac{c_2}{\log|\gamma mr|}$ for some sufficiently small constant $c_2>0$. All the constants in (6.3.3), implied or otherwise, are effective.
The contribution from the error term in (6.3.3) is easy to control. Observe that uniformly in $s$ with $\mathrm{Re}(s)$ bounded. Taking (6.3.1), (6.3.2), and (6.3.4) together, we see that the error term of (6.3.3) contributes to $E_1$, where $c_3 > 0$ is some absolute constant. The bound (6.3.5) is more than adequate for (6.2.7) provided we choose $\varpi > 0$ sufficiently small in terms of $c_1$.
The conductor of the primitive character . We apply Page's theorem [11, equation (9) of Chapter 14], which implies that, for some fixed absolute constant $c_4 > 0$, there is at most one real primitive character $\chi_{\gamma mr}$ with modulus $\le \exp(2\varpi\sqrt{\log X})$ for which the $L$-function $L(s, \chi_{\gamma mr})$ has a real zero satisfying To estimate the contribution of the possible term $-w^{\beta_1}/\beta_1$, we evaluate the integral arising from (6.3.2) and (6.3.3). We make the change of variable $w/X \to u$ and integrate by parts to see that this integral equals We assume that a real zero satisfying (6.3.6) does exist, for otherwise we already have an acceptable bound for $E_1$. Let $q^*$ denote the conductor of the exceptional character $\chi_{\gamma mr}$ for which the real zero $\beta_1$ satisfying (6.3.6) exists. Then we have where $c_5 > 0$ is some constant, and $\gamma^*$ is some bounded power of two. We next write $b_{gm} = \mu(gm)H\!\left(\frac{\log gm}{\log M}\right)$ and apply Fourier inversion as in (6.1.2), (6.1.3) to obtain which is acceptable. We therefore have We handle the $s$-integral in (6.3.9) by moving the line of integration to $\mathrm{Re}(s) = -\frac{c_6}{\log\log X}$, where $c_6 > 0$ is small enough that $\zeta\!\left(1 + s + \frac{1+iz}{\log M}\right)$ has no zeros in the region $\mathrm{Re}(s) \ge -\frac{c_6}{\log\log X}$, $|\mathrm{Im}(s)| \le (\log X)^2$. In moving the line of integration we pick up a contribution from the pole at $s = 0$. We write this residue as an integral around a circle of small radius centered at the origin, and thereby deduce We have the bound where $c_7 > 0$ is a fixed absolute constant (see [11, equation (12) of Chapter 14]). If $q^*$ satisfies $|q^*| \le (\log X)^{2-\varepsilon}$, then by (6.3.11) we derive By estimating (6.3.10) trivially we then obtain which is an acceptable bound. We may therefore assume that $q^*$ satisfies $|q^*| > (\log X)^{2-\varepsilon}$ (6.3.12). For $|s| = \frac{1}{\log X}$ we have the bounds Using these bounds and (6.3.12) we deduce by trivial estimation that This completes the proof of the bound for $E_1$ in (6.2.7).
6.4. Regime II. It remains to prove the bound for $E_2$ in (6.2.7).
From (6.2.5) and (6.2.6) we see that $E_2$ is the contribution from those $m$ and $r$ in Regimes II and III. The estimates in Regimes II and III are less delicate than those in Regime I, and consequently the arguments are easier.
In (6.2.5) we write $q = \gamma mr$. After breaking $q$ into dyadic segments we find Here $s_0$ is some complex number with $\mathrm{Re}(s_0) = \frac{1}{\log X}$ and $|\mathrm{Im}(s_0)| \le (\log X)^2$. In order to prove (6.2.7) it therefore suffices to show that In this subsection we treat the $Q$ belonging to Regime II, that is, those $Q$ which satisfy $Q \ll X^{1/10}$. In the next subsection we treat the $Q$ in Regime III, which satisfy $Q \gg X^{1/10}$. In Regime II we employ zero-density estimates. We begin by writing $\Phi_{s_0}$ as the integral of its Mellin transform, yielding Observe that repeated integration by parts gives for every non-negative integer $j$.
We shift the line of integration to $\mathrm{Re}(w) = -\frac12$, picking up residues from all of the zeros in the critical strip. On the line $\mathrm{Re}(w) = -\frac12$ we have the bound $\frac{L'}{L}(w, \chi) \ll \log(q|w|)$, and this yields We have written here $\rho = \beta + i\gamma$. The error term is, of course, completely acceptable for (6.4.1) when summed over $q \ll Q$. By (6.4.2), the contribution to $E(Q)$ from those $\rho$ with $|\gamma| > Q^{1/2}$ is $\ll XQ^{-100}$, say, and this gives an acceptable bound. We have therefore obtained In order to bound the right side of (6.4.3), we first need to introduce some notation. For a primitive Dirichlet character $\chi$ modulo $q$, let $N(T, \chi)$ denote the number of zeros of $L(s, \chi)$ in the rectangle For $T \ge 2$, say, we have [11, Chapter 16] $N(T, \chi) \ll T\log(qT)$. In $N(\alpha, Q, T)$ the summation on $\chi$ is over primitive characters. We employ Jutila's zero-density estimate [20, (1.7)] which holds for $\alpha \ge \frac45$. In (6.4.3), we separate the zeros $\rho$ according to whether $\beta < \frac45$ or $\beta \ge \frac45$. Using (6.4.4) we deduce For those zeros with $\beta \ge \frac45$ we write We then embed $S(Q)$ into the set of all primitive characters with conductors $\le Q$. Applying (6.4.6) and (6.4.5), we obtain Since $Q \ll X^{1/10}$, the integrand of this latter integral is maximized when $\alpha = 1$. It follows that  (recall (3.4)). Here we depart from the philosophy of the previous two regimes, in that we do not bound $E(Q)$ by considerations of zeros of $L$-functions. Rather, we exploit the combinatorial structure of the von Mangoldt function and Lemma 4.4.
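The text does not spell out which combinatorial decomposition of $\Lambda$ is used here. The simplest identity of this kind is $\Lambda = \mu * \log$, that is, $\Lambda(n) = \sum_{d \mid n} \mu(d)\log(n/d)$; finer identities such as Vaughan's are built from it. The Python sketch below only checks this basic identity numerically, as an assumption about the flavor of the argument, not a reconstruction of the paper's decomposition.

```python
import math


def mobius(n: int) -> int:
    """Mobius function by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    return -result if m > 1 else result


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


def lambda_via_mobius(n: int) -> float:
    """Right-hand side of Lambda(n) = sum_{d | n} mu(d) log(n / d)."""
    return sum(mobius(d) * math.log(n // d) for d in range(1, n + 1) if n % d == 0)


print(von_mangoldt(27), lambda_via_mobius(27))  # both equal log 3
```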
We observe that in Regime III one may still proceed with zero-density estimates by appealing to Heath-Brown's zero-density estimate for $L$-functions of quadratic characters [15, Theorem 3]. We present our method for the sake of variety, and because it might prove useful in other contexts.
We have By partial summation and the Pólya-Vinogradov inequality, we find that the last inequality holding for $\varepsilon = \varepsilon(\theta) > 0$ sufficiently small.
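For orientation, the Pólya-Vinogradov inequality bounds a character sum over any interval by $\ll \sqrt{q}\log q$ for a primitive character modulo $q$. The Python sketch below computes the largest partial sum $\max_N |\sum_{n \le N} (\frac{n}{q})|$ for Legendre characters of small prime modulus and compares it with $\sqrt{q}\log q$ (implied constant 1 is our choice). It illustrates the size of the bound and proves nothing.

```python
import math


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def max_partial_sum(q: int) -> int:
    """max over N of |sum_{n <= N} (n/q)|.

    One period suffices, because the character sums to 0 over a full period.
    """
    s = best = 0
    for n in range(1, q + 1):
        s += legendre(n, q)
        best = max(best, abs(s))
    return best


def odd_primes(limit: int) -> list:
    """Odd primes up to limit, by trial division."""
    return [n for n in range(3, limit + 1, 2)
            if all(n % d for d in range(3, math.isqrt(n) + 1, 2))]


for q in (7, 101, 499):
    print(q, max_partial_sum(q), round(math.sqrt(q) * math.log(q), 1))
```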

Applying Lemma 4.4 yields
The last inequality follows since V = X
6.6. Dénouement. We can extract from our proof of Proposition 6.1 the following result on character sums over primes, which we shall have occasion to use later. Lemma 6.1. Let $X$ be a large real number, and let $\delta > 0$ be small and fixed. Let $s_0$ be a complex number with $|\mathrm{Re}(s_0)| \le \frac{A_1}{\log X}$ and $|\mathrm{Im}(s_0)| \le (\log X)^{A_2}$, for some positive real numbers $A_1$ and $A_2$. Given any positive real numbers $A_3$, $A_4$, and $B$, we have The implied constant is ineffective.
Proof. Follow the proof of (6.2.7), but instead use the lower bound $q^* > c(D)(\log X)^D$, which holds for arbitrary $D > 0$. The constant $c(D)$ is ineffective if $D \ge 2$. Lemma 6.1 is quite strong, since it corresponds, roughly, to square-root cancellation on average in the sums over $p$. Thus, one would not expect to be able to prove an analogue of Lemma 6.1 with the upper bound for $q$ replaced by $X^{1+\varepsilon}$ for any $\varepsilon > 0$.

The mollified second moment
In this section we derive an upper bound of the correct order of magnitude for the sum $S_2$ defined in (3.1). Our main result for this section is the following (recall (3.4) and (5.2)).
Proposition 7.1. Let $\delta > 0$ be small and fixed, and let $\theta, \vartheta$ satisfy $\theta + 2\vartheta < \frac12$. If $X \ge X_0(\delta, \theta, \vartheta)$, then The proof of Proposition 7.1 follows the ideas outlined in Section 3. First, we note that $\log p \le \log X$ in (3.1) because $\Phi$ is supported on $[\frac12, 1]$. By positivity we may apply the upper bound sieve condition (5.7) to write where $S^+$ is defined by Note that $d$ is odd since $d \mid n$ and $n \equiv 1 \pmod 8$. Also, $\lambda_d \neq 0$ only for square-free $d$ by the definition (5.8), and so $\lambda_d = \mu^2(d)\lambda_d$. We use Lemma 4.2 to write $L(\frac12, \chi_n)^2 = D_2(n)$, then insert (3.6) into (7.1) to write We first obtain a bound on $S^+_R$. The remainder of this section will then be devoted to an analysis of $S^+_N$.
7.1. The contribution of $S^+_R$. In this subsection we show The arguments here are almost identical to those in [39, Section 3]. Observe that $R_Y(n) = 0$ unless $n = \ell^2h$ with $\ell > Y$ and $h$ square-free. If $n \equiv 1 \pmod 8$ then $\ell$ and $h$ are odd and $h \equiv 1 \pmod 8$. By the divisor bound we have and therefore There is a mild complication compared to [39] in that it is possible to have $h = 1$, in which case the character $\chi_h$ is principal. We apply Cauchy-Schwarz and obtain We have for some coefficients $\alpha(m)$ satisfying $|\alpha(m)| \ll m^\varepsilon$. For $h = 1$ we use the trivial bound $M(\ell^2)^4 \ll M^2X^\varepsilon$. For $h > 1$ we use Lemma 4.4. We therefore have Now observe that, for any $c > \frac12$, If $h = 1$ then $L^2(\frac12 + s, \chi_h) = \zeta^2(\frac12 + s)$. In either case, we move the line of integration to $c = \frac{1}{\log X}$, and we do not pick up contributions from any poles. When $h > 1$ this is obvious, and when $h = 1$ the double pole of $\zeta^2(\frac12 + s)$ at $s = \frac12$ is canceled by the double zero of $(1 - 2^{-(1/2-s)})^2$. By trivial estimation we then have $|D_2(\ell^2)| \ll X^\varepsilon$. For $h > 1$ we apply Cauchy-Schwarz to obtain Summing over $h$ and using Lemma 4.5, we obtain We next apply Poisson summation to evaluate the $n$-sum. Denote the $n$-sum in (7.2.1) by $Z$, i.e.
define $Z$ by We insert the definition (3.7) of $N_Y(n)$ and interchange the order of summation to write $Z$ as where $F_\nu(t)$ is defined by If $\alpha$ and $d$ are square-free, then $[\alpha^2, d] = \alpha^2 d_1$, where .
We may thus relabel $n$ as $\alpha^2 d_1 m$ in (7.2.3), and then split the resulting sum on $m$ according to the congruence class of $m \pmod{m_1m_2\nu}$. We deduce from (7.2.3) that By the Chinese Remainder Theorem, we may write the congruence conditions on $m$ as a single condition $m \equiv \gamma \pmod{8m_1m_2\nu}$ for some integer $\gamma$ depending on $\alpha$, $d$, $b$. Thus, we may relabel $m$ as $8jm_1m_2\nu + \gamma$, where $j$ ranges over all integers, and arrive at (7.2.6) We apply Poisson summation to the $j$-sum to write We insert this into (7.2.6), apply the reciprocity relation $e\!\left(\frac{k\gamma}{8m_1m_2\nu}\right) = e\!\left(\frac{k\overline{8}\,b}{m_1m_2\nu}\right) e\!\left(\frac{k\alpha^2 d_1 m_1m_2\nu}{8}\right)$, and then evaluate the $b$-sum using the definition (4.3) of the Gauss sum. Therefore

Recalling (7.2.1) and (7.2.2), we arrive at
Note that we may impose the condition $(m_1m_2\nu, d) = 1$ because otherwise $\left(\frac{2d_1}{m_1m_2\nu}\right) = 0$. We write (7.2.7) as where $T_0$ is the contribution from $k = 0$ in (7.2.7), while $B$ is the contribution from $k \neq 0$ in (7.2.7). We evaluate $T_0$ in the next subsection, and $B$ in later subsections.
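The Poisson step above is applied to a sum over a single residue class. In the normalization $e(x) = e^{2\pi i x}$, $\hat F(\xi) = \int F(t)e(-\xi t)\,dt$, it reads $\sum_{m \equiv \gamma \ (\mathrm{mod}\ q)} F(m/X) = \frac{X}{q}\sum_{k \in \mathbb{Z}} \hat F\!\left(\frac{kX}{q}\right) e\!\left(\frac{k\gamma}{q}\right)$, where in the text $q = 8m_1m_2\nu$. The Python sketch below checks this numerically for a Gaussian test function, which is our own choice (it equals its own Fourier transform) and not the paper's weight $F_\nu$.

```python
import cmath
import math


def F(t: float) -> float:
    """Gaussian test function; with e(x) = exp(2 pi i x) it is its own Fourier transform."""
    return math.exp(-math.pi * t * t)


def progression_sum(X: float, q: int, gamma: int, J: int = 60) -> float:
    """sum over j in Z of F((q*j + gamma) / X), i.e. F(m/X) over m = gamma (mod q)."""
    return sum(F((q * j + gamma) / X) for j in range(-J, J + 1))


def poisson_side(X: float, q: int, gamma: int, K: int = 60) -> complex:
    """(X/q) * sum over k in Z of Fhat(k X / q) * e(k gamma / q)."""
    return (X / q) * sum(
        F(k * X / q) * cmath.exp(2j * math.pi * k * gamma / q) for k in range(-K, K + 1)
    )


print(progression_sum(2.0, 3, 1), poisson_side(2.0, 3, 1).real)
```

With $X = 2$ and $q = 3$ the $k = \pm 1$ terms on the Poisson side are not negligible, so the agreement is a genuine check of the phases $e(k\gamma/q)$.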
7.3. The contribution from $k = 0$. By (4.3), $\tau_0(n) = \varphi(n)$ if $n$ is a perfect square, and $\tau_0(n) = 0$ otherwise. Hence the term $T_0$ in (7.2.7) is We first extend the sum over $\alpha$ to infinity. Since $\varphi(n) \le n$, the error introduced in doing so is By Lemma 4.1, $\hat F_\nu(0) \ll 1$ uniformly for all $\nu > 0$, and $\hat F_\nu(0) \ll \exp(-\frac{\pi\nu}{8X})$ for $\nu > X^{1+\varepsilon}$. Moreover, (5.8) implies that $|\lambda_d| \ll d^\varepsilon$, while $|b_m| \ll 1$ by (3.5). It follows from these bounds that (7.3.2) is Since $m_1m_2\nu$ is a perfect square, the sum over $m_1$, $m_2$, $\nu$ in (7.3.3) is $\ll X^\varepsilon$. Also, the definition (7.2.5) of $d_1$ implies that This bounds the error in extending the sum over $\alpha$ in (7.3.1) to infinity, and we arrive at Writing the sum on $\alpha$ as an Euler product, we deduce that We next evaluate the sum over $d$.
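The evaluation of $\tau_0(n)$, and the relations $\tau_{4k}(n) = \tau_k(n)$ and $\tau_{8k}(n) = \left(\frac{2}{n}\right)\tau_k(n)$ for odd $n$ used in Subsection 7.8, can be checked numerically. Since (4.3) is not reproduced here, the Python sketch below assumes that $\tau_k(n) = \sum_{a \bmod n} \left(\frac{a}{n}\right) e\!\left(\frac{ak}{n}\right)$ with the Jacobi symbol; under that assumption, substituting $a \to \overline{m}a$ gives $\tau_{mk}(n) = \left(\frac{m}{n}\right)\tau_k(n)$ for $(m, n) = 1$.

```python
import cmath
import math


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # quadratic reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def tau(k: int, n: int) -> complex:
    """Assumed form of (4.3): tau_k(n) = sum_{a mod n} (a/n) e(ak/n), n odd."""
    return sum(jacobi(a, n) * cmath.exp(2j * math.pi * a * k / n) for a in range(n))


def phi(n: int) -> int:
    return sum(1 for a in range(1, n + 1) if math.gcd(a, n) == 1)


def is_square(n: int) -> bool:
    return math.isqrt(n) ** 2 == n


print(round(tau(0, 9).real), phi(9), round(abs(tau(0, 15)), 9))  # 6 6 0.0
```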
Recall that $E_0(X) \to 0$ and depends only on $X$, $G$, and $\vartheta$. Henceforth we simply write $o(1)$ instead of $E_0(X)$.
We may omit the condition $p \le z_0$ by trivial estimation and (5.1). It follows from (7.3.5) and (7.3.4) that Y. (7.3.6)
The next task is to carry out the summation over $m_1$, $m_2$, and $\nu$. Let $\Upsilon_0$ be defined by We insert into (7.3.7) the definition (3.5) of $b_m$ and the definitions (7.2.4) and (4.1) of $F_\nu$ and $\omega_2$, and then apply the Fourier inversion formula (6.1.2). After interchanging the order of summation, we arrive at This can also be written as where $Q(w_1, w_2, s)$ is an Euler product that is uniformly bounded and holomorphic when each of $\mathrm{Re}(w_1)$, $\mathrm{Re}(w_2)$, and $\mathrm{Re}(s)$ is $\ge -\varepsilon$. From this definition of $Q$ and a calculation, we see that $Q(0, 0, 0) = 1$ (7.3.10), a fact we use shortly. We insert the expression (7.3.9) for the $m_1, m_2, \nu$-sum into (7.3.8) and arrive at By (6.1.4) and the rapid decay of the gamma function, we may truncate the integrals to the region $|z_1|, |z_2| \le \sqrt{\log M}$ and $|\mathrm{Im}(s)| \le (\log X)^2$, introducing a negligible error. We then deform the path of integration of the $s$-integral to the path made up of the line segment $L_1$ from $\frac{1}{\log X} - i(\log X)^2$ to $-\frac{c'}{\log\log X} - i(\log X)^2$, followed by the line segment $L_2$ from $-\frac{c'}{\log\log X} - i(\log X)^2$ to $-\frac{c'}{\log\log X} + i(\log X)^2$, and then by the line segment $L_3$ from $-\frac{c'}{\log\log X} + i(\log X)^2$ to $\frac{1}{\log X} + i(\log X)^2$, where $c'$ is a constant chosen so that for $\mathrm{Re}(z) \ge -c'/\log|\mathrm{Im}(z)|$ and $|\mathrm{Im}(z)| \ge 1$ (see, for example, Theorem 3.5 and (3.11.8) of Titchmarsh [42]). This leaves a residue from the pole at $s = 0$. The contributions of the integrals over $L_1$ and $L_3$ are negligible because of the rapid decay of the $\Gamma$ function, while the contribution of the integral over $L_2$ is negligible because $X^s \ll \exp\!\left(-c'\frac{\log X}{\log\log X}\right)$ for $s$ on $L_2$. Hence the main contribution arises from the residue of the pole at $s = 0$. Writing this residue as an integral along a circle centered at $0$, we arrive at We may expand the zeta-functions and the function $Q$ into Laurent series.
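Writing a residue as an integral over a small circle, as done here and in Subsection 6.3, is also convenient numerically: with $s = re^{i\theta}$ one has $\frac{1}{2\pi i}\oint f(s)\,ds = \frac{1}{2\pi}\int_0^{2\pi} f(s)\,s\,d\theta$, and the trapezoidal rule converges very fast for such periodic analytic integrands. The Python sketch below uses the toy function $X^s/s^3$ (our choice, not an integrand from the paper), whose residue at $0$ is $\frac12(\log X)^2$, on the circle $|s| = 1/\log X$.

```python
import cmath
import math


def residue_via_circle(f, radius: float, n_points: int = 256) -> complex:
    """(1/(2 pi i)) * contour integral of f over |s| = radius, positively oriented.

    Since ds = i s d(theta), the integral equals the mean of f(s) * s over
    equally spaced points s on the circle (trapezoidal rule).
    """
    total = 0j
    for m in range(n_points):
        s = radius * cmath.exp(2j * math.pi * m / n_points)
        total += f(s) * s
    return total / n_points


X = 1e6
L = math.log(X)
# X^s / s^3 has a triple pole at s = 0 with residue (log X)^2 / 2.
res = residue_via_circle(lambda s: X ** s / s ** 3, radius=1 / L)
print(res.real, L * L / 2)
```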
The main contribution arises from the first terms of the Laurent expansions, and so we deduce using (7.3.10) By (6.1.4), we may extend the integrals over $z_1$, $z_2$ to $\mathbb{R}^2$, introducing a negligible error. We then apply the formula to obtain We evaluate the $s$-integral as a residue using (6.1.6). The result is From this, (7.3.6), and the definition (7.3.7) of $\Upsilon_0$, we arrive at Y.
7.4. The contribution from $k \neq 0$: splitting into cases. Having estimated the term $T_0$ in (7.2.8), we now begin our analysis of $B$, which is much more complicated than the analysis of $T_0$. The behavior of the additive character $e(k\alpha^2 d_1 m_1m_2\nu/8)$ in (7.2.7) depends upon the residue class of $k$ modulo 8. We therefore distinguish the following cases for $k$: $k$ is odd, $k \equiv 2 \pmod 4$, $k \equiv 4 \pmod 8$, or $k \equiv 0 \pmod 8$. We split our analysis of the sum $B$ in (7.2.8) according to these four cases. For the terms with odd $k$, we use the identity and treat separately the contributions of each term on the right-hand side. Moreover, for the terms with odd $k$ or $k \equiv 2 \pmod 4$, we use the second expression in (4.3) for $\tau_k(n)$ and treat separately the contributions of the terms $\frac{1+i}{2}G_k(n)$ and $\left(\frac{-1}{n}\right)\frac{1-i}{2}G_k(n)$. We can treat these two contributions together as one combined sum for the terms with $k \equiv 0, 4 \pmod 8$, because, for those $k$, the additive character $e(k\alpha^2 d_1 m_1m_2\nu/8)$ is constant and the conditions $k \equiv 0, 4 \pmod 8$ are invariant under the substitution $k \to -k$. Hence, in view of these considerations, (7.2.7), and (7.2.8), we write

7.5. Evaluation of the sum with $Q_1$. In this subsection, we evaluate the sum with $Q_1$ defined by (7.4.2). We may cancel the two Jacobi symbols $\left(\frac{2}{m_1m_2\nu}\right)$ in (7.4.2), insert the resulting expression into (7.5.1), and then apply the Mellin inversion formula to the $\nu$-sum to deduce that for any $c > 1$. The interchange in the order of summation is justified by absolute convergence. The next step is to write the $\nu$-sum as an Euler product, as follows.
We also need some analytic properties of the function $h(\xi, w)$ defined for $\mathrm{Re}(w) > 0$ by These are embodied in the following lemma. As a bit of notation, for a real number $x$ we define The integral above may be expressed as ds s Now, by these lemmas and the rapid decay of $\hat\Phi(w)$ as $|\mathrm{Im}(w)| \to \infty$ in a fixed vertical strip, we may move the line of integration of the $w$-integral in (7.5.2) to $\mathrm{Re}(w) = -\frac12 + \varepsilon$. This leaves a residue from a pole at $w = 0$ only when $\chi_{k_1}$ is a principal character, which holds if and only if $k_1 = 1$. By (7.5.3), $k_1 = 1$ if and only if $kd_1$ is a perfect square. Hence $Q_1^* = P_1 + R_1$ (7.5.6), where $P_1$ is defined by and $R_1$ is defined by We bound $R_1$ in Subsection 7.6. To estimate $P_1$, observe that $d_1$ is square-free by its definition (7.2.5) and the fact that $d$ is square-free. This implies that $kd_1$ is a perfect square if and only if $k$ equals $d_1$ times a perfect square. Hence, in (7.5.7), we may relabel $k$ as $d_1j^2$, where $j$ runs through all the odd positive integers. With this and Lemma 7.2, we deduce from (7.5.7) that where $\Gamma_2(u)$ is defined by $\Gamma_2(u) = (2\pi)^{-u}\Gamma(u)\left(\cos\frac{\pi u}{2} - i\sin\frac{\pi u}{2}\right)$ (7.5.10), and where we take $c > \frac12$ to guarantee the absolute convergence of the $j$-sum. We next write the $j$-sum in (7.5.9) as an Euler product. By (ii) of Lemma 4.3, if $j$ is a positive integer then $\left(\frac{d_1}{p^\beta}\right)G_{d_1j^2}(p^\beta) = G_{j^2}(p^\beta)$ for all $p \nmid 2\alpha d$ and $\beta \ge 1$. From this and the definition of $G_0$ in Lemma 7.1, we see that where $G$ is defined by [39, (5.8)]. Hence we may write the inner $j$-sum in (7.5.9) as an Euler product $p^{2b(w-s)}$ $G_p(1 + w; p^{2b}, m_1m_2, \alpha d)$.
where $\ell_1$ is the square-free integer defined by the equation $m_1m_2 = \ell_1\ell_2^2$, $\mu^2(\ell_1) = 1$, $\ell_2 \in \mathbb{Z}$ (7.5.12), and $H_1$ is defined by an Euler product The local factors $H_{1,p}$ are (7.5.13) Inserting this expression for the $j$-sum into (7.5.9), we find that (7.5.14)  The next step is to extend the $\alpha$-sum to infinity and show that the error introduced in doing so is small. To do this, we need to move the line of integration in (7.5.15) closer to $0$ to guarantee the absolute convergence of the $\alpha$-sum. We first evaluate the residue to see that (7.5.15) is the same as w=0 ds s . (7.5.16) Here $\gamma$ denotes the Euler-Mascheroni constant. The definition (7.5.13) of $H_1(s - w, 1 + w; m_1m_2, \alpha d)$ implies that it is holomorphic for $\mathrm{Re}(s) > 0$ and $|w| < \max\{\frac12, 2|s|\}$, and that it and its first partial derivatives at $w = 0$ are bounded by $\ll (\alpha X)^\varepsilon$ for $\mathrm{Re}(s) \ge \frac{1}{\log X}$. Thus, by the rapid decay of the gamma function, we may move the line of integration in (7.5.16) to $\mathrm{Re}(s) = \frac{1}{\log X}$. There is no residue because the poles of $\zeta(2s)$ and $\frac{\zeta'}{\zeta}(2s)$ at $s = \frac12$ are canceled by the zero of the factor $(1 - 2^{s-1/2})^2$. Using well-known bounds for $\zeta(2s)$ and $\zeta'(2s)$ implied by the Phragmén-Lindelöf principle, we see that the new integral is now bounded by which is $\ll m_1m_2\,\ell_1^{-1/2+\varepsilon}\alpha^\varepsilon X^\varepsilon$ by the rapid decay of the gamma function. Dividing this bound by $\alpha^2 d_1$ and summing the result over all $\alpha > Y$, we deduce that From (7.5.14), (7.5.17), and (7.5.15), now with $c = \frac{1}{\log X}$, we arrive at (7.5.19) where, as before, $\ell_1$ is defined by (7.5.12), $d_1$ is defined by (7.2.5), and $H_1$ is defined as the product of (7.5.13) over all primes. It is convenient for later calculations to write $P_1$ in terms of a residue, as in (7.5.18), rather than in terms of logarithmic derivatives as in (7.5.16).
7.6. Bounding the contribution of $R_1$. Having handled $P_1$ in (7.5.6), we next turn to $R_1$, defined by (7.5.8).
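The relabeling $k = d_1j^2$ in Subsection 7.5 rests on the fact that, for square-free $d_1$, the product $kd_1$ is a perfect square exactly when $k$ is $d_1$ times a perfect square (every prime dividing $d_1$ must divide the square root of $kd_1$). The Python sketch below checks this for small positive $k$ and $d_1$, and shows that square-freeness of $d_1$ is needed. The helper names are ours.

```python
import math


def is_square(n: int) -> bool:
    return n >= 0 and math.isqrt(n) ** 2 == n


def is_squarefree(n: int) -> bool:
    return all(n % (p * p) for p in range(2, math.isqrt(n) + 1))


def claim_holds(k: int, d1: int) -> bool:
    """True when [k*d1 is a square] agrees with [k = d1 * (a square)]."""
    return is_square(k * d1) == (k % d1 == 0 and is_square(k // d1))


squarefree_d1 = [d for d in range(1, 80) if is_squarefree(d)]
print(all(claim_holds(k, d1) for d1 in squarefree_d1 for k in range(1, 3000)))  # True
print(claim_holds(1, 4))  # False: d1 = 4 is not square-free, yet 1 * 4 is a square
```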
It will be convenient to denote , w L(1 + w, χ k 1 ) 2 G 0 (1 + w; k, ℓ, α, d) dw, (7.6.1) so that R 1 = m 1 m 2 R(m 1 m 2 , d). We will bound $|R(\ell, d)|$ on average as $\ell$ and $d$ each range over a dyadic interval. Let $\beta_{\ell,d} = \overline{R(\ell, d)}/|R(\ell, d)|$ if $R(\ell, d) \neq 0$, and $\beta_{\ell,d} = 1$ otherwise. Then $|\beta_{\ell,d}| = 1$ and $|R(\ell, d)| = \beta_{\ell,d}R(\ell, d)$. We sum this over all $\ell$, $d$ with $J \le \ell < 2J$ and $V \le d < 2V$, where $J, V \ge 1$. We then insert the definition (7.6.1) and bring the $d, \ell$-sum inside the integral to deduce that (7.6.2) where for brevity we denote We split the $k$-sum into dyadic blocks $K \le |k| < 2K$, with $K \ge 1$, and apply Cauchy's inequality to write where $k_2$ is defined by (7.5.3). To bound the first factor on the right-hand side of (7.6.3), we split the $k$-sum according to the values of $k_1$ and $k_2$ and interchange the order of summation. Then we use the fact that $d_1 \ge d/\alpha$ by (7.2.5) to deduce that We estimate the inner sum using the divisor bound, and find that the above is by Lemma 4.5. It follows from this and (7.6.3) that The next task is to bound the second factor on the right-hand side. To this end we prove the following two lemmas.
Lemma 7.3. Let $\alpha \le Y$, $d$, $K$, and $J$ be positive integers, and suppose $w$ is a complex number with real part $-\frac12 + \varepsilon$. Then for any choice of complex numbers $\gamma_\ell$ with $|\gamma_\ell| \le 1$, and also by Lemma 7.4. Let $\delta_\ell \ll \ell^\varepsilon$ be any sequence of complex numbers and let $\mathrm{Re}(w) = -\frac12 + \varepsilon$. Then Proof of Lemma 7.3 assuming Lemma 7.4. To prove the first bound, we use the triangle inequality and apply the bounds for $G_0$ from Lemma 7.1 and $h(\xi, w)$ from Lemma 7.2 to deduce that the sum in question is We then estimate the $k$-sum by splitting it according to the values of $k_1$ and $k_2$ and using (ℓ, k 2 4 ) ≤ k 2 4 ≤ k 2 2 , which follows from (7.5.3) and (7.5.4). This leads to the first bound of the lemma.
To prove the second bound, we apply Lemma 7.2 and write the integral (7.5.5) as $\frac{1}{2\pi i}\int_{(c)} g(s, w; \mathrm{sgn}(\xi))\left(\frac{X}{\pi|\xi|}\right)^s ds$ with $c = \varepsilon$. We then bring the $\ell$-sum inside the integral and use the triangle inequality to deduce that Thus, since $g(s, w; \mathrm{sgn}(k)) \ll_\varepsilon (1 + |w|)^\varepsilon\exp(-(\frac{\pi}{2} - \varepsilon)|\mathrm{Im}(s)|)$ by Stirling's formula, it follows from Cauchy's inequality that The second bound of the lemma follows from this and Lemma 7.4.

7.7. Conditions for the parameters. From (7.5.1), (7.5.6), (7.5.18), and (7.6.12), we see that the total contribution of the sum with $Q_1$ to $B$ in (7.4.1) is and we take the parameter $Y$ in (3.6) to be $Y = X^\delta$ with $\delta = \delta(\theta, \vartheta)$ sufficiently small.

7.8. Evaluating the sums of the other terms with $k \neq 0$. The procedure for evaluating the sum with $Q_2$ in (7.4.1) is largely similar to the above process for $Q_1$, with only a few differences. The main difference arises from the negative sign in the character $\left(\frac{-2d_1}{m_1m_2\nu}\right)$ in (7.4.3). This causes the residues in the versions of (7.5.6) and (7.5.7) for $Q_2$ to require $-kd_1$, instead of $kd_1$, to be a perfect square. This means $\mathrm{sgn}(k) = -1$. Hence, because of the factor $\mathrm{sgn}(\xi)$ in (7.5.5), the version of (7.5.9) for $Q_2$ has the function $(2\pi)^{-u}\Gamma(u)\left(\cos\frac{\pi u}{2} + i\sin\frac{\pi u}{2}\right)$ in place of the function $\Gamma_2(u)$ defined by (7.5.10). These lead to a version of (7.7.1) for $Q_2$ that we may combine with (7.7.1) using the identity The result is where $\Gamma_1(u) = (2\pi)^{-u}\Gamma(u)\left(\cos\frac{\pi u}{2} + \sin\frac{\pi u}{2}\right)$ (7.8.3), and the bound $O(X^{1-\varepsilon})$ for the error term is guaranteed by the conditions in Subsection 7.7. The evaluation of the sums in (7.4.1) with $Q_3$ and $Q_4$ defined by (7.4.4) and (7.4.5) is similar. The version of (7.5.7) for $Q_3$ has an extra factor $-1$ because the Kronecker symbol $\left(\frac{-2}{kd_1}\right)$ equals $-1$ when $-kd_1$ is an odd perfect square. The resulting expression for the sums in (7.4.1) with $Q_3$ and $Q_4$ is exactly the same as the right-hand side of (7.8.2). Therefore To estimate the sum with $U_1$ in (7.4.1), we first relabel $k$ in (7.4.6) as $2k$, now with $k$ odd, to write From the definition (4.2) of $G_k(n)$, we see that $G_{2k}(n) = \left(\frac{2}{n}\right)G_k(n)$ for all odd integers $n$. Also, the orthogonality of Dirichlet characters modulo 4 implies that $e(\frac{h}{4}) = i\left(\frac{-1}{h}\right)$ for odd $h$. It follows from these and (7.8.5) that We then proceed as we did for $Q_1$. We treat the sum with $U_2$, defined by (7.4.7), in a similar way. We combine the resulting expressions using the identity (7.8.1), and we arrive at Next, to evaluate the sum with $V$ in (7.4.1), we relabel $k$ in (7.4.8) as $4k$, now with $k$ odd, to see that since $e(h/2) = -1$ for odd $h$ and $\tau_{4k}(n) = \tau_k(n)$ for odd $n$ by (4.3).
Into this we insert the second expression for $\tau_k(n)$ in (4.3). Since $\left(\frac{-1}{n}\right)G_k(n) = G_{-k}(n)$ by (4.2), we may split our sum expression for $V$ into two, one with $G_k(n)$ and the other with $G_{-k}(n)$. We relabel $k$ as $-k$ in the latter and combine the result with the former to arrive at where $\tilde F(\xi)$ is defined by We then proceed as we did for $Q_1$, using [39, Lemma 5.2] instead of Lemma 7.2. We arrive at versions of (7.5.6), (7.5.7), and (7.5.8) which show that the residue at $w = 0$ equals zero because $2kd_1$ is not a perfect square when $kd_1$ is odd. This leads to (7.8.8) under the conditions in Subsection 7.7. Lastly, to estimate the sum with $W$ in (7.4.1), we relabel $k$ in (7.4.9) as $8k$ to write using the fact that $e(h) = 1$ for any integer $h$ and $\tau_{8k}(n) = \left(\frac{2}{n}\right)\tau_k(n)$ for odd $n$ by (4.3). Into this we insert the second expression for $\tau_k(n)$ in (4.3), apply $\left(\frac{-1}{n}\right)G_k(n) = G_{-k}(n)$, and recombine the $k$ and $-k$ terms as we did for $V$ in (7.8.7) to deduce that We then proceed as we did for $Q_1$, using [39, Lemma 5.2] instead of Lemma 7.2. Since we are now summing over all nonzero integers $k$ and not just the odd ones, instead of (7.5.11) we use We arrive at (7.8.9)
7.9. Putting together the estimates. From (7.4.1), (7.8.4), (7.8.6), (7.8.8), and (7.8.9), we deduce that We next evaluate the residue at $w = 0$. Note that, for fixed $s$, the integrand has a pole of order at most 2 at $w = 0$. We use (6.1.6) with $n = 2$ to write are even functions of $s$. Hence (7.9.2) and (7.9.3) are even functions of $s$. It follows that the integrand in (7.9.1) is an odd function of $s$. We move the line of integration in (7.9.1) to $\mathrm{Re}(s) = -\frac{1}{\log X}$, leaving a residue at $s = 0$. In the new integral, we make the change of variables $s \to -s$ to see that, since its integrand is odd, it equals the negative of the original integral in (7.9.1). Therefore twice the original integral equals the residue at $s = 0$.
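The symmetry argument can be tested on a toy integrand. If $f$ is odd, decays in vertical strips, and has only a pole at $s = 0$ in between, then $\frac{1}{2\pi i}\int_{(c)} f(s)\,ds = \frac12\,\mathrm{Res}_{s=0} f(s)$. The Python sketch below uses $f(s) = e^{s^2}(X^s - X^{-s})/s^2$, an odd function of our own choosing (not the integrand of (7.9.1)) with residue $2\log X$ at $0$, so the line integral should equal $\log X$.

```python
import math


def f(s: complex, X: float) -> complex:
    """Odd toy integrand: e^{s^2} and s^2 are even, X^s - X^{-s} is odd."""
    return math.e ** (s * s) * (X ** s - X ** (-s)) / (s * s)


def line_integral(X: float, c: float = 0.5, T: float = 10.0, steps: int = 20000) -> complex:
    """(1/(2 pi i)) * integral of f over Re(s) = c, i.e. (1/(2 pi)) * integral of
    f(c + it) dt, by the trapezoidal rule on [-T, T] (the tails are negligible
    because |e^{s^2}| = e^{c^2 - t^2})."""
    h = 2 * T / steps
    total = 0j
    for m in range(steps + 1):
        weight = 0.5 if m in (0, steps) else 1.0
        total += weight * f(complex(c, -T + m * h), X)
    return total * h / (2 * math.pi)


print(line_integral(100.0).real, math.log(100.0))  # half of the residue 2 log X
```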
We write this residue as an integral along the circle $|s| = \frac{1}{\log X}$, taken in the positive direction, and arrive at In view of the expressions (7.9.2) and (7.9.3) and the definitions (7.9.5) and (7.9.6), it now follows from (7.9.4), (7.9.10) and (7.9.11) that From these, the definition (3.5) of $b_m$, and the Fourier inversion formula (6.1.2), we deduce from (7.9.14) that Thus, writing the sum as an Euler product, we see that We write this as (7.9.20) where $W(s, z_1, z_2, \frac{1}{\log M})$ is an Euler product that is bounded and holomorphic for $|s| \le \varepsilon$ and complex $z_1$, $z_2$ with $|\mathrm{Im}(z_1)|, |\mathrm{Im}(z_2)| \le \varepsilon\log M$. Note that this definition of $W$ implies $W(0, 0, 0, 0) = 8$ (7.9.21), a fact we use shortly. By (6.1.4), we may truncate the integrals in (7.9.20) to the range $|z_1|, |z_2| \le \sqrt{\log M}$, introducing a negligible error. On this range of $z_1$ and $z_2$, the function $W$ and the zeta-functions in (7.9.20) may be written as Laurent series. The contributions of the terms other than the first terms of these Laurent expansions are smaller by a factor of $(\log X)^{1-\varepsilon}$ than the contribution of the first terms. The first term of the Laurent expansion of $W$ is given by (7.9.21). We thus arrive at As $H(x)$ is a smooth function supported in $[-1, 1]$, we have $H(1) = H'(1) = 0$. For notational simplicity we set $H(0) = A$ and $-H'(0) = B$. Since we have We choose $H(x)$ such that on $[0, 1]$ it is a smooth approximation to the optimal function $H_*(x)$ which minimizes the integral By the Euler-Lagrange equation, we find that an $H_*(x)$ which minimizes (8.3) must satisfy $H_*^{(4)}(x) = 0$. Thus, $H_*(x)$ is a polynomial of degree at most three. Recalling the boundary conditions, we find By direct computation we obtain and therefore It is now a straightforward, but tedious, calculus exercise to find that is an optimal choice. Thus Since $\varrho$ is invariant under multiplication of $H$ by scalars, we arrive at the convenient expression
The proof of (9.1.2) follows the lines of the proof of Proposition 7.1, taking $M(p) = 1$. We employ positivity to replace $\log p$ by $\log X$ and then introduce an upper bound sieve. After applying the approximate functional equation we split $\mu^2(n) = N_Y(n) + R_Y(n)$, and employ the bound (7.1.1).
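The splitting $\mu^2(n) = N_Y(n) + R_Y(n)$ can be made concrete. Definition (3.7) is not reproduced in this excerpt, so the Python sketch below assumes the standard choice $N_Y(n) = \sum_{a^2 \mid n,\ a \le Y}\mu(a)$ and $R_Y(n) = \sum_{a^2 \mid n,\ a > Y}\mu(a)$. This is consistent with $\mu^2(n) = \sum_{a^2 \mid n}\mu(a)$ and with the observation in Subsection 7.1 that $R_Y(n) = 0$ unless $n = \ell^2h$ with $\ell > Y$ and $h$ square-free; both facts are checked below.

```python
import math


def mobius(n: int) -> int:
    """Mobius function by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    return -result if m > 1 else result


def square_roots_of_square_divisors(n: int) -> list:
    """All a >= 1 with a^2 dividing n."""
    return [a for a in range(1, math.isqrt(n) + 1) if n % (a * a) == 0]


def N_Y(n: int, Y: float) -> int:
    """Assumed reading of (3.7): sum of mu(a) over a^2 | n with a <= Y."""
    return sum(mobius(a) for a in square_roots_of_square_divisors(n) if a <= Y)


def R_Y(n: int, Y: float) -> int:
    """The complementary sum over a^2 | n with a > Y."""
    return sum(mobius(a) for a in square_roots_of_square_divisors(n) if a > Y)


n, Y = 2 * 3 ** 2 * 5 ** 2, 10  # n = 450 = 2 * 15^2, so ell = 15 > Y
print(N_Y(n, Y), R_Y(n, Y))     # prints: -1 1  (they cancel: 450 is not square-free)
```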
We follow the argument of Section 7 down to (7.2.8), obtaining $S^+_N = T_0 + B$. Since we have no mollifier here, we find We insert into this the definitions (7.2.4) and (4.1) of $F_\nu$ and $\omega_2$, interchange the order of summation, and then write the sum on $\nu$ as an Euler product. The result is As before, we truncate the integral to the range $|\mathrm{Im}(s)| \le (\log X)^2$, and then deform the path of integration to the path made up of the line segments $L_1$, $L_2$, $L_3$ defined above (7.3.11) to see that the main contribution arises from the residue of the integrand at $s = 0$. We evaluate the residue using (6.1.6) and arrive at Recalling the definition of $c$, we have Moreover, we see from (7.9.12) that if $M = 1$ and $b_1 = 1$, then since we may deform the path of integration in (7.9.12) to a circle $|s| = \varepsilon$. The condition $\theta + 2\vartheta < \frac12$ in Subsection 7.7 with $\theta = 0$ allows us to take $\vartheta = \frac14 - \varepsilon$ in (9.1.3). We then set We wish to separate the variables $s_1$ and $s_2$. Since $c_\ell > 0$ we expand $\zeta(1 + s_1 + s_2)$ as an absolutely convergent Dirichlet series. Interchanging the order of summation and integration, we obtain To truncate the summation over $n$, first we move the contours of integration to the right to $c_1 = c_2 = 1$. By trivial estimation we deduce that the contribution from n ≫ X α 1 +α 2 4 × ζ(1 + 2s 1 )ζ(1 + 2s 2 ) ds 1 ds 2 s 1 s 2 dx + O(X(log X) 2 ).
The following standard result implies that the contribution to (9.3.2) from $mn \neq \square$ is $O(X/\log X)$, say.
Lemma 9.1. Let $\chi$ be a non-principal Dirichlet character modulo $q$. Let $\chi^*$ be the primitive character inducing $\chi$, and assume that GRH holds for $L(s, \chi^*)$. If $q \le X^M$ for some fixed positive constant $M$, then $\sum_{p \le X}\chi(p)\log p \ll_M X^{1/2}(\log X)^2$.
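Lemma 9.1 is conditional, so no computation can confirm it; the Python sketch below only shows the size of $\sum_{p \le X}\chi(p)\log p$ for the non-principal character modulo 4 at a few small $X$, next to $X^{1/2}(\log X)^2$ with implied constant 1 (our choice). The observed sums are far smaller than $X$, in line with square-root cancellation.

```python
import math


def primes_upto(n: int) -> list:
    """Primes up to n (sieve of Eratosthenes on a bytearray)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def chi_minus4(n: int) -> int:
    """The non-principal character mod 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)


def theta(x: int, chi) -> float:
    """sum_{p <= x} chi(p) log p."""
    return sum(chi(p) * math.log(p) for p in primes_upto(x))


for x in (10 ** 3, 10 ** 4, 10 ** 5):
    print(x, round(theta(x, chi_minus4), 1), round(math.sqrt(x) * math.log(x) ** 2))
```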
The proof of (9.3.3) is more subtle. Here the method of proof is that of Soundararajan and Young [41]. As the arguments are very similar, our exposition will be sparse, and we refer the reader to [41] for more details. We perform some initial manipulations, and then we state the main proposition which will yield (9. To state the proposition we need, we first establish some notation, following [41, Section 6]. Given $x \ge 10$, say, and a complex number $z$, we define $(\log x)^{-1} \le |z| \le 1$, $0$, $|z| \ge 1$.
It is helpful to know that for the values of $z_1$ and $z_2$ we consider, we have $\log\log X \le V(z_1, z_2, X) \le 4\log\log X$.
The following result, an analogue of [41, Theorem 6.1], is the key input we need. Proposition 9.3. Let $X$ be large, and let $z_1$ and $z_2$ be complex numbers with $0 \le \mathrm{Re}(z_i) \le \frac{1}{\log X}$ and $|z_i| \le X$. Assume the Riemann Hypothesis for the Riemann zeta function $\zeta(s)$ and for all Dirichlet $L$-functions $L(s, \chi_p)$ with $p \equiv 1 \pmod 8$. Then for any real $r > 0$ and any $\varepsilon > 0$ we have $\sum_{p \le X,\ p \equiv 1 \pmod 8}\left|L(\frac12 + z_1, \chi_p)L(\frac12 + z_2, \chi_p)\right|^r \ll_{r,\varepsilon} \frac{X}{(\log X)^{1-\varepsilon}}\exp\!\left(rM(z_1, z_2, X) + \frac{r^2}{2}V(z_1, z_2, X)\right)$.
Then use Proposition 9.4.
We use the following lemma to determine how frequently a Dirichlet polynomial can be large. We write $\log_2 X$ for $\log\log X$.
Lemma 9.2. Let $X$ and $y$ be real numbers and $k$ a natural number with $y^k \le X^{\frac12 - \frac{1}{\log_2 X}}$. For any complex numbers $a(q)$ we have p≤X p≡1 (mod 8) 2<q≤y a(q)χ p (q) where the implied constant is absolute.
Proof. This result is similar to [41, Lemma 6.3], so we give only a sketch. Since we are assuming GRH we could use Lemma 9.1, but we get an unconditional result that is almost as good by appealing to sieve theory. Since $p \equiv 1 \pmod 8$, we have $\chi_p(q) = \chi_{q^*}(p)$, where for an odd integer $n$ we define $n^* = (-1)^{\frac{n-1}{2}}n$. Observe that $\chi_{q^*}$ is a primitive character with conductor $\le 4q$. We then introduce an upper bound sieve supported on $d \le D = X^{1/\log_2 X}$. With the upper bound sieve in place we drop the congruence condition modulo 8 and the condition that $p$ is a prime. Opening the square and using the Pólya-Vinogradov inequality, the sum in question is then a(q)χ q * (n) q 1 2 2k ≪ q i ≤y q 1 ···q 2k = |a(q 1 ) · · · a(q 2k )| √ q 1 · · · q 2k n≤X   d|n λ d   + D log(y 2k ) q 1 ,...,q 2k ≤y |a(q 1 ) · · · a(q 2k )|.
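The identity $\chi_p(q) = \chi_{q^*}(p)$ is quadratic reciprocity in the form $\left(\frac{p}{q}\right) = \left(\frac{q^*}{p}\right)$ for distinct odd primes. The Python sketch below checks it for primes $p \equiv 1 \pmod 8$ below 3000 and odd primes $q < 150$; for odd prime $q$, the value $\chi_p(q)$ is the Legendre symbol $\left(\frac{p}{q}\right)$. The helper names are ours.

```python
import math


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def star(n: int) -> int:
    """n* = (-1)^((n-1)/2) n for odd n > 0."""
    return n if n % 4 == 1 else -n


def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))


P = [p for p in range(3, 3000) if is_prime(p) and p % 8 == 1]  # primes p = 1 (mod 8)
Q = [q for q in range(3, 150) if is_prime(q)]                  # odd primes q
# chi_p(q) is the Legendre symbol (p/q); chi_{q*}(p) is (q*/p).
mismatches = [(p, q) for p in P for q in Q
              if q != p and legendre(p, q) != legendre(star(q), p)]
print(len(P), len(Q), mismatches)
```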
We put $V = V(z_1, z_2, X)$, and define V > 1 16 V log log log X. We take $x = X^{T/V}$, and $z = x^{1/\log\log X}$.

V T ,
where $S_1$ is the sum on $q$ truncated to $q \le z$, and $S_2$ is the remainder of the sum. Since $\log\left|L(\frac12 + z_1, \chi_p)L(\frac12 + z_2, \chi_p)\right| \ge V + M(z_1, z_2, X)$, we have We take $k = \lfloor(\frac12 - \frac{1}{\log_4 X})\frac{V}{T}\rfloor - 1$ in Lemma 9.2 and apply the usual Chebyshev-type maneuver to deduce that the number of $p \le X$ with $S_2 \ge V/T$ is It remains to bound the number of $p$ for which $S_1$ is large. By Lemma 9.2, for any $k \le (\frac12 - \frac{1}{\log_2 X})\frac{V\log\log X}{T}$ the number of $p \le X$ with $S_1 \ge V_1$ is ≪ X log 2 X log X 2kV(z 1 , z 2 , X) + O(log log log X) eV 2 1 k .
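The Chebyshev-type maneuver is Markov's inequality applied to $|S|^{2k}$: the number of $p$ with $|S(p)| \ge V$ is at most $V^{-2k}\sum_p |S(p)|^{2k}$, and Lemma 9.2 supplies the moment bound. The Python sketch below illustrates only the counting step, on a toy Dirichlet polynomial $S(p) = \sum_{2 < q \le 50} \chi_p(q)/\sqrt{q}$ over primes $p \equiv 1 \pmod 8$; the coefficients, the range, and the threshold $V = 2$ are our own choices and are unrelated to the parameters in the text.

```python
import math


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))


# Toy Dirichlet polynomial S(p) = sum_{2 < q <= 50} chi_p(q) / sqrt(q), chi_p(q) = (p/q).
Q = [q for q in range(3, 51) if is_prime(q)]
P = [p for p in range(53, 20000) if is_prime(p) and p % 8 == 1]
S = [sum(legendre(p, q) / math.sqrt(q) for q in Q) for p in P]


def chebyshev_bound(values, V: float, k: int) -> float:
    """V^(-2k) * sum |S|^(2k), an upper bound for #{S : |S| >= V} (Markov)."""
    return sum(abs(s) ** (2 * k) for s in values) / V ** (2 * k)


V = 2.0
count = sum(1 for s in S if abs(s) >= V)
for k in range(1, 6):
    print(k, count, round(chebyshev_bound(S, V, k), 2))
```

Larger $k$ helps only while the $2k$-th moment stays controlled, which is why the text chooses $k$ in terms of $V$ and $V_1$.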
For $V \le (\log\log X)^2$ we take $k = \lfloor V_1^2/2V\rfloor$, and for $V > (\log\log X)^2$ we take $k = \lfloor 10V\rfloor$. It follows that the number of $p$ for which $S_1 \ge V_1$ is where $\omega_3(\xi)$ is defined by taking $j = 3$ in (4.1). Our function $\omega_3(\xi)$ is not the same as the function $\omega_3(\xi)$ in [39]. After using the approximate functional equation to represent $L(\frac12, \chi_n)^3$, we write $\mu^2(n) = N_Y(n) + R_Y(n)$. The contribution from $R_Y(n)$ is bounded using arguments similar to those in Subsection 7.1. For $N_Y(n)$ we use Poisson summation as before. Up to negligible error, we therefore have the upper bound where F ν (t) = Φ(t)ω 3 ν π tX 3/2 .
We treat separately the contributions from $k = 0$ and $k \neq 0$. The calculations are somewhat easier in that ultimately we seek only upper bounds, not asymptotic formulas. The contribution from $k = 0$ is treated as in Subsection 7.3, and is ≪ X log X log R (log X) 6 ≪ X(log X) 6 .
For $k \neq 0$ the presence of the additive character necessitates splitting $k$ into residue classes modulo 8. When necessary, we write the additive character as a linear combination of multiplicative characters. We use the identity and treat the two terms separately. We then follow the method of Section 7 to obtain that the contribution from $k \neq 0$ is ≪ X log X log R (log X) 6 ≪ X(log X) 6 .
One difference arises in proving analogues of Lemma 7.2. Here we have $\hat\Phi(w + \frac{s}{2})$ inside an integral, instead of just $\hat\Phi(w)$ outside an integral. It is helpful to use the bound Φ (y) ≪ j log X |y| j .
Another difference is that we have a factor of $X^{s/2}$ in the integrals, whereas this factor disappeared for the $k \neq 0$ terms in Section 7. We therefore do not need to concern ourselves with any symmetry properties of the integrand (cf. the symmetry argument yielding (7.9.4)).