The best constant in a Hilbert-type inequality

We establish that \[\sum_{m=1}^\infty \sum_{n=1}^\infty a_m \overline{a_n} \frac{mn}{(\max(m,n))^3} \leq \frac{4}{3}\sum_{m=1}^\infty |a_m|^2\] holds for every square-summable sequence of complex numbers $a = (a_1,a_2,\ldots)$ and that the constant $4/3$ cannot be replaced by any smaller number. Our proof is rooted in a seminal 1911 paper concerning bilinear forms due to Schur, and we include for expositional reasons an elaboration on his approach.


Introduction
Set $\mathbb{R}_+ = (0,\infty)$. Suppose that $K$ is a function defined on $\mathbb{R}_+ \times \mathbb{R}_+$ that is continuous, positive, symmetric, and homogeneous of degree $-1$, i.e. such that $K(\lambda x, \lambda y) = \lambda^{-1} K(x,y)$ holds for all $x, y, \lambda > 0$. The problem of interest is to identify the smallest real number $C = C(K)$ such that the inequality \[\sum_{m=1}^\infty \sum_{n=1}^\infty a_m \overline{a_n} K(m,n) \leq C \sum_{m=1}^\infty |a_m|^2 \tag{1}\] holds for every square-summable sequence of complex numbers $a = (a_1, a_2, \ldots)$. The canonical example of an inequality of the form (1) is Hilbert's inequality, where the kernel is $K(x,y) = (x+y)^{-1}$. In this case, it was Hilbert who proved that the inequality holds with the constant $C = 2\pi$ before Schur [9] established that the best constant is $C = \pi$. It is for this reason that inequalities of the form (1) are commonly referred to as Hilbert-type inequalities.
Schur actually established a much more general result. A hint to his approach can be found in our list of assumptions above, since the requirement that $K$ be a continuous function on $\mathbb{R}_+ \times \mathbb{R}_+$ may seem incongruous as we only evaluate it at pairs of positive integers in (1). Schur's main idea is to first study the continuous analogue of (1) and then reduce the continuous case to the discrete case.
To state his result, we introduce the auxiliary function \[k(y) = \frac{K(1,y)}{\sqrt{y}} \tag{2}\] for $0 < y < \infty$. The reduction from the continuous case to the discrete case goes through when $k$ enjoys certain monotonicity properties.
Theorem 1 (Schur, 1911). Let $K$ be a continuous, positive, and symmetric kernel of homogeneity $-1$ and let $k$ be as in (2).
(a) If $k$ is decreasing on the interval $(1,\infty)$, then \[C(K) \geq \int_0^\infty k(y)\,dy.\]
(b) If $k$ is decreasing on the interval $(0,\infty)$, then \[C(K) = \int_0^\infty k(y)\,dy.\]
The reader is invited to verify that $K(x,y) = (x+y)^{-1}$ satisfies the assumption of Theorem 1 (b) and to check that the integral for $C(K)$ indeed equals $\pi$.
Chapter IX in the classical monograph Inequalities by Hardy, Littlewood and Pólya [5] is devoted to the further development of Schur's idea. Theorem 318 in that text is a generalization of (b), while (a) is implicitly contained in Section IX.5. However, the authors are keenly aware of the limitations of Schur's approach:
"If the reader will try to deduce Theorem 331 from Theorem 328 similarly, he will find some difficulty. Something is lost in the passage from integrals to series, and it is by no means always possible that (as here) the passage can be made without damage to the final result." (Hardy, Littlewood and Pólya [5, p. 249])
There is a vast literature (which we make no attempt at delineating) of extensions and generalizations of Theorem 1 in various directions; common to them is that Schur's approach works with only superficial modifications. We will instead consider a family of kernels exemplifying the phenomenon quoted above, namely \[K_\alpha(x,y) = \frac{(xy)^{\alpha-\frac{1}{2}}}{(\max(x,y))^{2\alpha}}\] for $0 < \alpha < \infty$. Accordingly, we let $C_\alpha = C(K_\alpha)$ denote the best constant in the inequality (1) with the kernel $K = K_\alpha$. Since \[k_\alpha(y) = \frac{K_\alpha(1,y)}{\sqrt{y}} = \begin{cases} y^{\alpha-1}, & 0 < y \leq 1, \\ y^{-\alpha-1}, & 1 < y < \infty, \end{cases}\] it is plain that $K_\alpha$ satisfies the assumption of Theorem 1 (a) for all $0 < \alpha < \infty$ and the assumption of Theorem 1 (b) only for $0 < \alpha \leq 1$. The integral of $k_\alpha$ over $(0,\infty)$ equals $1/\alpha + 1/\alpha = 2/\alpha$. Consequently, the best constant satisfies $C_\alpha \geq 2/\alpha$ for all $0 < \alpha < \infty$ and $C_\alpha = 2/\alpha$ when $0 < \alpha \leq 1$. We are interested in whether the latter conclusion holds for some $\alpha > 1$.
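The value of the integral in Theorem 1 can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the kernel $K_\alpha(x,y) = (xy)^{\alpha-1/2}/\max(x,y)^{2\alpha}$ (which matches the abstract for $\alpha = 3/2$), and the function names `hilbert_kernel`, `power_kernel` and `schur_integral` are hypothetical. The substitution $y = e^u$ turns $\int_0^\infty K(1,y)\,y^{-1/2}\,dy$ into an integral over the real line with an exponentially decaying integrand, which the trapezoid rule handles well.

```python
import math

def hilbert_kernel(x, y):
    """Hilbert's kernel K(x, y) = 1 / (x + y)."""
    return 1.0 / (x + y)

def power_kernel(x, y, alpha):
    """Assumed form of K_alpha: (xy)^(alpha - 1/2) / max(x, y)^(2 alpha)."""
    return (x * y) ** (alpha - 0.5) / max(x, y) ** (2 * alpha)

def schur_integral(K, half_width=40.0, step=0.01):
    """Trapezoid approximation of the integral of K(1, y) / sqrt(y) over (0, inf).

    With y = exp(u) we have dy = y du, so the integrand in u is
    K(1, y) * sqrt(y), sampled on the symmetric grid u = j * step.
    """
    n = int(round(half_width / step))
    total = 0.0
    for j in range(-n, n + 1):
        y = math.exp(j * step)
        weight = 0.5 if abs(j) == n else 1.0  # trapezoid end-point weights
        total += weight * K(1.0, y) * math.sqrt(y)
    return step * total

# Hilbert's kernel should give a value close to pi,
# and K_alpha with alpha = 3/2 a value close to 2 / alpha = 4/3.
print(schur_integral(hilbert_kernel), math.pi)
print(schur_integral(lambda x, y: power_kernel(x, y, 1.5)), 4 / 3)
```

For $K_\alpha$ the transformed integrand is $e^{-\alpha|u|}$, which has a kink at $u = 0$, so the trapezoid rule is only second-order accurate there; the step size above is chosen with that in mind.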
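If $K = K_\alpha$, then the left-hand side of (1) enjoys the integral representation \[\alpha \sum_{m=1}^\infty \sum_{n=1}^\infty a_m \overline{a_n} K_\alpha(m,n) = \int_{-\infty}^\infty \Big|\sum_{m=1}^\infty a_m m^{-\frac{1}{2}-it}\Big|^2 \frac{\alpha^2}{\alpha^2+t^2}\,\frac{dt}{\pi}, \tag{3}\] which can be established by expanding the absolute values into a double sum. Similar formulas can be obtained for other Hilbert-type inequalities via the Mellin transform (see e.g. [10, Sec. IV]). For each fixed square-summable sequence $a$, the right-hand side of (3) is increasing as a function of $\alpha$. The same must also be true for the left-hand side, so the function $\alpha \mapsto \alpha C_\alpha$ is increasing. It follows that if $C_\alpha \leq 2/\alpha$ holds for some $\alpha$, then the same estimate also holds for all $0 < \beta \leq \alpha$, since $\beta C_\beta \leq \alpha C_\alpha \leq 2$.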
To prove that C α = 2/α holds beyond α = 1, we will rely on another innovation of Schur's seminal paper [9], namely the Schur test. See [4,Sec. 3] for a historical account of the Schur test. In its simplest form, the Schur test is just the weighted Cauchy-Schwarz inequality with an unspecified weight. The strategy is to first use this inequality, then analyze the resulting expression and try to identify a good weight. Unfortunately, once a good weight is found the proof of the resulting inequality is often written up without any mention of how the weight was found. Indeed, Schur's (!) proof of Theorem 1 in Section 7 of [9] makes no reference to the Schur test first introduced in Section 3 of the very same paper.
The first goal of the present note is therefore to give a complete account of Theorem 1 including a clear explanation of how the weight is found. After analyzing how the proof of Theorem 1 (b) fails for the kernels $K_\alpha$ when $\alpha > 1$, we are next led by the Schur test to a sufficient condition for $C_\alpha \leq 2/\alpha$. We will finally use Euler-Maclaurin summation to show that this condition is satisfied for $\alpha = 3/2$, thereby answering a question raised by the present author in [1, Sec. 5.2].

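Theorem 2. The inequality \[\sum_{m=1}^\infty \sum_{n=1}^\infty a_m \overline{a_n} \frac{mn}{(\max(m,n))^3} \leq \frac{4}{3}\sum_{m=1}^\infty |a_m|^2\] holds for every square-summable sequence of complex numbers $a = (a_1, a_2, \ldots)$ and the constant $4/3$ cannot be replaced by any smaller number.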
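As a sanity check of the upper bound $4/3$, one can look at finite sections of the matrix $\big(mn/\max(m,n)^3\big)_{m,n \leq N}$: restricting the inequality to sequences supported on $\{1,\ldots,N\}$ shows that every Rayleigh quotient of such a section is at most $4/3$. The sketch below uses plain power iteration; the function names `kernel` and `top_rayleigh_quotient` are hypothetical, and the computation illustrates the statement rather than proving anything.

```python
import math

def kernel(m, n):
    """Matrix entry mn / max(m, n)^3 from the main inequality."""
    return m * n / max(m, n) ** 3

def top_rayleigh_quotient(N=100, iterations=100):
    """Power iteration on the N x N section of the kernel matrix.

    Returns the Rayleigh quotient of the final iterate, which is a lower
    bound for the largest eigenvalue of the section.
    """
    A = [[kernel(m, n) for n in range(1, N + 1)] for m in range(1, N + 1)]
    x = [1.0] + [0.0] * (N - 1)  # unit vector e_1 as starting point
    quotient = A[0][0]
    for _ in range(iterations):
        y = [sum(a * b for a, b in zip(row, x)) for row in A]
        quotient = sum(a * b for a, b in zip(x, y))  # x has unit norm
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return quotient

# The printed value should lie strictly between 1 (the entry at m = n = 1)
# and 4/3 (the constant in the inequality).
print(top_rayleigh_quotient())
```

Since the constant $4/3$ is only approached in the limit, the finite sections are expected to stay visibly below it for moderate $N$.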
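By the fact that $\alpha \mapsto \alpha C_\alpha$ is increasing, as discussed above, and Theorem 1 (a), we also obtain the following.
Corollary 3. If $0 < \alpha \leq 3/2$, then the inequality \[\sum_{m=1}^\infty \sum_{n=1}^\infty a_m \overline{a_n} \frac{(mn)^{\alpha-\frac{1}{2}}}{(\max(m,n))^{2\alpha}} \leq \frac{2}{\alpha}\sum_{m=1}^\infty |a_m|^2\] holds for every square-summable sequence of complex numbers $a = (a_1, a_2, \ldots)$ and the constant $2/\alpha$ cannot be replaced by any smaller number.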
Corollary 3 is an improvement on [1, Thm. 1] due to the present author, which implies that $C_\alpha = 2/\alpha$ for $0 < \alpha \leq \alpha_0 = 1.48\ldots$. Here $\alpha_0$ denotes the unique positive solution of the equation $\alpha\zeta(1+\alpha) = 2$ where $\zeta$ is the Riemann zeta function. The Riemann zeta function makes an appearance due to the relation
Let us close out this introduction by briefly mentioning some interesting properties of the constants $C_\alpha$. The determination of $C_\alpha$ is equivalent to a problem arising in the theory of composition operators on the Hardy space of Dirichlet series through [6, Prob. 3] and [1, Thm. 2]. The connection to composition operators provides at once the lower bound $C_\alpha \geq \zeta(1+2\alpha)$. The question of whether $C_\alpha = 2/\alpha$ is related to the discrete spectrum of certain Jacobi matrices by [2, Thm. C], which in turn is related to the reproducing kernel thesis for certain composition operators through material from [7, Sec. 5] and [2, Sec. 5]. We refer to [8, Ch. 8] for a general account of the theory of composition operators on Hardy spaces of Dirichlet series.
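The threshold $\alpha_0$ defined by $\alpha\zeta(1+\alpha) = 2$ can be approximated with a few lines of code. The sketch below is illustrative only: `zeta` is a hypothetical helper that evaluates the Riemann zeta function for real $s > 1$ by summing the first terms directly and approximating the tail with the leading terms of the Euler-Maclaurin formula, and `alpha_zero` applies bisection on the interval $[1, 2]$, where the function $\alpha\zeta(1+\alpha) - 2$ changes sign.

```python
import math

def zeta(s, cutoff=1000):
    """Riemann zeta for real s > 1: direct head sum plus Euler-Maclaurin tail.

    The tail sum over n >= cutoff is approximated by
    integral + f(cutoff)/2 - f'(cutoff)/12 with f(x) = x^(-s).
    """
    head = sum(n ** -s for n in range(1, cutoff))
    tail = (cutoff ** (1 - s) / (s - 1)
            + 0.5 * cutoff ** -s
            + s * cutoff ** (-s - 1) / 12)
    return head + tail

def alpha_zero(lo=1.0, hi=2.0, tol=1e-12):
    """Bisection for the root of alpha * zeta(1 + alpha) = 2 in [lo, hi]."""
    def g(alpha):
        return alpha * zeta(1 + alpha) - 2
    # g(1) = zeta(2) - 2 < 0 and g(2) = 2 * zeta(3) - 2 > 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(alpha_zero())
```

The computed root is consistent with the value $1.48\ldots$ quoted above and lies below $3/2$, which is the sense in which Corollary 3 extends the earlier range.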

The Schur test
We follow the strategy outlined by Schur [9, p. 2] and begin by investigating the continuous analogue of (1). Suppose that the kernel $K$ is continuous, positive, symmetric, and homogeneous of degree $-1$. Consider the inequality \[\int_0^\infty \int_0^\infty K(x,y) f(x) \overline{f(y)}\,dx\,dy \leq B \int_0^\infty |f(x)|^2\,dx. \tag{5}\] We want to find the smallest constant $B = B(K)$ such that the inequality (5) holds for every square-integrable complex-valued function $f$ on $\mathbb{R}_+$.
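By the symmetry and positivity of the kernel $K$, we may assume without loss of generality that $f$ is nonnegative on $\mathbb{R}_+$. After inspecting (5), it is natural to expect that the proof of such an estimate would involve the Cauchy-Schwarz inequality. A naive first attempt is to use the symmetry and positivity of $K$ to write \[K(x,y) f(x) f(y) = \sqrt{K(x,y)}\, f(x) \cdot \sqrt{K(y,x)}\, f(y)\] before applying the Cauchy-Schwarz inequality. It turns out that a slightly more refined approach is called for. A continuous function $\omega \colon \mathbb{R}_+ \to \mathbb{R}_+$ will be called a weight in what follows. For an unspecified weight $\omega$, we write \[f(x) f(y) = f(x) \sqrt{\frac{\omega(y)}{\omega(x)}} \cdot f(y) \sqrt{\frac{\omega(x)}{\omega(y)}}. \tag{6}\] By the Cauchy-Schwarz inequality and symmetry, we deduce from (5) and (6) that \[\int_0^\infty \int_0^\infty K(x,y) f(x) f(y)\,dx\,dy \leq \int_0^\infty |f(x)|^2 \left(\int_0^\infty K(x,y) \frac{\omega(y)}{\omega(x)}\,dy\right) dx. \tag{7}\] If we can find a weight $\omega$ and a constant $A$ such that the estimate \[\int_0^\infty K(x,y) \frac{\omega(y)}{\omega(x)}\,dy \leq A \tag{8}\] holds for every $0 < x < \infty$, then plainly $B(K) \leq A$. This is the Schur test.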
The plan is now to study the integral on the left-hand side of (8) in order to identify a suitable weight $\omega$. Due to the homogeneity of $K$, we can write \[\int_0^\infty K(x,y) \frac{\omega(y)}{\omega(x)}\,dy = \int_0^\infty K(1,y) \frac{\omega(xy)}{\omega(x)}\,dy.\] From this we see that the easiest way to attain the estimate (8) is to choose a weight which satisfies $\omega(xy) = \omega(x)\omega(y)$ for every $x$ and $y$ in $\mathbb{R}_+$. By the assumption that $\omega$ is continuous, this is only possible if $\omega(x) = x^r$ for a fixed real number $r$.
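We now want to pick $r$ to minimize the resulting integral. Appealing to the symmetry and homogeneity of $K$ yet again, we find that \[\int_0^\infty K(1,y)\, y^r\,dy = \int_1^\infty K(1,y) \left(y^r + y^{-r-1}\right) dy.\] The minimum of the integrand on the right-hand side is attained at $r = -1/2$ for each fixed $1 < y < \infty$. Hence it follows that \[B(K) \leq 2\int_1^\infty \frac{K(1,y)}{\sqrt{y}}\,dy = \int_0^\infty \frac{K(1,y)}{\sqrt{y}}\,dy. \tag{9}\] Is this the best constant? The only estimate we have used is the Cauchy-Schwarz inequality in (7). To attain the equality here with a non-trivial function, there must be a constant $C \neq 0$ such that $f = C\omega$. However, this is not permissible since $\omega$ is not square-integrable on $\mathbb{R}_+$. To overcome this issue, we fix $\varepsilon > 0$ and set \[f_\varepsilon(x) = \begin{cases} 0, & 0 < x \leq 1, \\ x^{-\frac{1}{2}-\varepsilon}, & 1 < x < \infty. \end{cases}\] In view of the final equality in (9), it is sufficient to consider a test function supported on $1 < x < \infty$. It is plain that the square-integral of $f_\varepsilon$ on $\mathbb{R}_+$ is equal to $(2\varepsilon)^{-1}$. By the homogeneity of $K$ and integration by parts, we find that \[\int_0^\infty \int_0^\infty K(x,y) f_\varepsilon(x) f_\varepsilon(y)\,dx\,dy = \frac{1}{\varepsilon} \int_1^\infty K(1,y)\, y^{-\frac{1}{2}-\varepsilon}\,dy.\] Dividing by the square-integral $(2\varepsilon)^{-1}$ and letting $\varepsilon \to 0^+$, we find that the estimate in (9) indeed yields the best constant in (5). We have consequently established the following result.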

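Theorem 4. Let $K$ be a continuous, positive, and symmetric kernel of homogeneity $-1$. The inequality (5) holds for every square-integrable function $f$ on $\mathbb{R}_+$ with \[B = \int_0^\infty \frac{K(1,y)}{\sqrt{y}}\,dy\] and the constant $B$ cannot be replaced by any smaller number.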
Let us next turn to the discrete case and the proof of Theorem 1. Although we will not explicitly use Theorem 4 in our proof, we are influenced by the choice of weight and test function made above.

Proof of Theorem 1 (a).
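If $a = (a_m)_{m \geq 1}$ is defined by $a_m = m^{-\frac{1}{2}-\varepsilon}$ for some $\varepsilon > 0$, then $\sum_{m \geq 1} |a_m|^2 = \zeta(1+2\varepsilon)$. Moreover, \[\sum_{m=1}^\infty \sum_{n=1}^\infty a_m a_n K(m,n) = -\sum_{m=1}^\infty a_m^2 K(m,m) + 2\sum_{m=1}^\infty \sum_{n=m}^\infty a_m a_n K(m,n).\] The first sum remains bounded as $\varepsilon \to 0^+$ and can be ignored. We need to estimate the double sum from below, and we rewrite it using homogeneity to find that \[\sum_{m=1}^\infty \sum_{n=m}^\infty a_m a_n K(m,n) = \sum_{m=1}^\infty m^{-1-2\varepsilon}\, \frac{1}{m} \sum_{n=m}^\infty k\Big(\frac{n}{m}\Big) \Big(\frac{n}{m}\Big)^{-\varepsilon}.\] We recognize the inner sum as a left Riemann sum with uniform partition size $m^{-1}$ for the integral \[I_\varepsilon = \int_1^\infty k(y)\, y^{-\varepsilon}\,dy.\] Combining the assumption that $k$ is decreasing on the interval $(1,\infty)$ with the fact that the function $y \mapsto y^{-\varepsilon}$ is decreasing on the same interval and a geometric argument, it can be seen that $I_\varepsilon$ is a lower bound for every left Riemann sum (see Figure 1 for an example). Thus, \[\sum_{m=1}^\infty \sum_{n=1}^\infty a_m a_n K(m,n) \geq -K(1,1)\zeta(2+2\varepsilon) + 2 I_\varepsilon \zeta(1+2\varepsilon).\]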
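Dividing by $\zeta(1+2\varepsilon)$, which tends to infinity as $\varepsilon \to 0^+$, and letting $\varepsilon \to 0^+$, we find that $C(K) \geq 2I_0$. We finish the proof by using the homogeneity of $K$ as in (9).
Let us now turn to the Schur test in the discrete case. As above, if there is a weight $\omega \colon \mathbb{N} \to \mathbb{R}_+$ and a constant $A$ such that \[\sum_{n=1}^\infty K(m,n) \frac{\omega(n)}{\omega(m)} \leq A \tag{10}\] holds for every $m \geq 1$, then the best constant $C$ in the Hilbert-type inequality (1) satisfies $C \leq A$.

Proof of Theorem 1 (b).
Following our analysis of the continuous case above it is natural to choose the weight $\omega(m) = 1/\sqrt{m}$ for $m \geq 1$. By (10), we then find that \[C(K) \leq \sup_{m \geq 1} \frac{1}{m} \sum_{n=1}^\infty k\Big(\frac{n}{m}\Big). \tag{11}\] We recognize the right-hand side of (11) as right Riemann sums of uniform partition size $m^{-1}$ for the integral \[I = \int_0^\infty k(y)\,dy.\] The assumption that $k$ is decreasing on the interval $(0,\infty)$ and a geometric argument (see Figure 2 for an example) show that the supremum in (11) is attained as $m \to \infty$ and is equal to $I$. Hence we get that $C(K) \leq I$ by the Schur test. Since $C(K) \geq I$ by Theorem 1 (a), the proof is complete.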
A similar Riemann sum argument starting from (11), under the assumption that $k$ is increasing on $(0,1)$ and decreasing on $(1,\infty)$, gives that \[C(K) \leq K(1,1) + \int_1^\infty k(y)\,dy. \tag{12}\] This estimate is in general unlikely to be sharp, since we have to choose $m = 1$ to attain the supremum for the integral over the interval $(0,1)$ and $m \to \infty$ to attain the supremum for the integral over the interval $(1,\infty)$. For the kernels $K_\alpha$ the estimate (12) becomes $C_\alpha \leq 1 + 1/\alpha$. This is sharp if and only if $\alpha = 1$, when $k_1(y) = 1$ for all $0 < y < 1$. This case is presented in Figure 2. The typical situation for $\alpha > 1$ is presented in Figure 3. One possible strategy to improve (12) is to keep track of the overestimates on $(0,1)$ and underestimates on $(1,\infty)$ for each fixed $m \geq 1$ and try to compute the supremum in (11). This plan was carried out in [1, Lem. 8] and it led to the proof that $C_\alpha = 2/\alpha$ for $0 < \alpha \leq 1.48\ldots$ mentioned in the introduction.
We will instead take an alternative approach, which in addition to giving a stronger result is also somewhat easier to handle from a computational point of view. Our approach is based on the observation that $K_\alpha(1,1) = k_\alpha(1) = 1$, so as $\alpha$ increases this term will be increasingly dominant. The plan is therefore to adjust the weight in the Schur test accordingly. The key idea of the next result is that we require of the unspecified weight $\omega$ that (10) is satisfied with $A = 2/\alpha$. We then try to choose a parameter in the weight in order to satisfy this requirement.
Proof. We already know that $C_\alpha \geq 2/\alpha$ by Theorem 1 (a), so it is enough to consider the upper bound $C_\alpha \leq 2/\alpha$. We will use the Schur test (10) with the weight for some parameter $\delta_\alpha > 0$. To conclude that $C_\alpha \leq 2/\alpha$, we need to establish that holds for every $m \geq 1$. There are two cases. First, if $m = 1$, then (14) yields the requirement Note that for $0 < \alpha < 2$ we may choose a positive $\delta_\alpha$ satisfying this estimate as required by the Schur test. Second, if $m \geq 2$, then (14) yields the requirement We can find $\delta_\alpha > 0$ which satisfies both requirements whenever for every $m \geq 2$. This is equivalent to (13) by a computation involving (4).
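The effect of adjusting the weight at $m = 1$ can be explored numerically. The sketch below is an illustration under explicit assumptions and may differ from the parametrisation by $\delta_\alpha$ used in the proof: it assumes the Schur test in the form $\sum_n K_\alpha(m,n)\,\omega(n)/\omega(m) \leq 2/\alpha$ and a hypothetical weight with $\omega(1) = w$ and $\omega(n) = n^{-1/2}$ for $n \geq 2$. The names `zeta` and `schur_rows` are likewise hypothetical. Only finitely many rows are checked, so this is numerical evidence, not a proof.

```python
def zeta(s, cutoff=1000):
    """Riemann zeta for real s > 1: direct head sum plus Euler-Maclaurin tail."""
    head = sum(n ** -s for n in range(1, cutoff))
    tail = (cutoff ** (1 - s) / (s - 1)
            + 0.5 * cutoff ** -s
            + s * cutoff ** (-s - 1) / 12)
    return head + tail

def schur_rows(alpha, w, rows=500):
    """Row sums sum_n K_alpha(m, n) * omega(n) / omega(m) for m = 1..rows.

    Assumed weight: omega(1) = w and omega(n) = n^(-1/2) for n >= 2.
    For m >= 2 the row sum simplifies to
        m^(-alpha) * (w - 1 + sum_{n <= m} n^(alpha - 1))
        + m^alpha * sum_{n > m} n^(-alpha - 1).
    """
    z = zeta(1 + alpha)
    result = [1 + (z - 1) / w]  # row m = 1
    low = 1.0                   # sum_{n <= m} n^(alpha - 1), currently m = 1
    tail = z - 1.0              # sum_{n > m} n^(-alpha - 1), currently m = 1
    for m in range(2, rows + 1):
        low += m ** (alpha - 1)
        tail -= m ** (-alpha - 1)
        result.append(m ** -alpha * (w - 1 + low) + m ** alpha * tail)
    return result

alpha = 1.5
target = 2 / alpha
# With w = 1 (the weight n^(-1/2) throughout) the first row equals zeta(1 + alpha).
print(max(schur_rows(alpha, 1.0)), target)
# A slightly larger value at m = 1 lowers the first row at the cost of the others.
print(max(schur_rows(alpha, 1.03)), target)
```

In this experiment the unmodified weight violates the bound $2/\alpha$ in the row $m = 1$ when $\alpha = 3/2$, while the choice $w = 1.03$ keeps the first 500 rows below it, which is consistent with the strategy of tuning a single parameter in the weight.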

Euler-Maclaurin summation
We require two estimates in order to establish that the requirement (13) from Lemma 5 holds for $\alpha = 3/2$. These estimates can be extracted from [1, Sec. 4], but we include a complete account here to ensure that the present note is self-contained. Although we shall use the two estimates only for $\alpha = 3/2$ in the proof of Theorem 2, we state them somewhat generally. What we need to know about Euler-Maclaurin summation can be found in [3, Sec. 11.5].
Proof. Let $f$ be a function defined on the interval $[m,\infty)$ which has continuous derivatives of order three on the same interval. If both $f$ and $f'$ vanish at infinity, then one step of the Euler-Maclaurin summation formula yields that where $\{x\}$ denotes the fractional part of $x$ and is an increasing function on the interval $[m,\infty)$, then a symmetry consideration shows that We apply (15) and (16) to $f(x) = x^{-\alpha-1}$ and obtain the stated result.
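To get a feeling for how accurate one step of Euler-Maclaurin summation is for $f(x) = x^{-\alpha-1}$, the leading terms $\int_m^\infty f(x)\,dx + f(m)/2 - f'(m)/12$ can be compared with a tail sum that is known in closed form. The sketch below is illustrative: the function names are hypothetical, and the comparison uses $\alpha = 1$, where $\sum_{n \geq 1} n^{-2} = \pi^2/6$ gives the exact tail.

```python
import math

def tail_euler_maclaurin(m, alpha):
    """Leading Euler-Maclaurin terms for sum_{n >= m} n^(-alpha - 1).

    With f(x) = x^(-alpha - 1):
      integral of f over [m, inf) = m^(-alpha) / alpha,
      f(m) / 2                    = m^(-alpha - 1) / 2,
      -f'(m) / 12                 = (alpha + 1) * m^(-alpha - 2) / 12.
    """
    return (m ** -alpha / alpha
            + 0.5 * m ** (-alpha - 1)
            + (alpha + 1) * m ** (-alpha - 2) / 12)

def tail_exact_alpha_one(m):
    """Exact value of sum_{n >= m} n^(-2), using zeta(2) = pi^2 / 6."""
    return math.pi ** 2 / 6 - sum(n ** -2 for n in range(1, m))

m = 10
approx = tail_euler_maclaurin(m, 1.0)
exact = tail_exact_alpha_one(m)
print(approx, exact, approx - exact)
```

For this choice of $f$ the three leading terms slightly overestimate the tail, and the discrepancy at $m = 10$ is already far smaller than the terms themselves, which indicates why estimates of this kind are sharp enough for requirements that must hold for every $m \geq 2$.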
Proof. Let $f$ be a function defined on the interval $[1,m]$ which has continuous derivatives of order five on the same interval. Two steps of the Euler-Maclaurin summation formula yield that
We can now deduce our main result from Lemma 5, Lemma 6, and Lemma 7.