Hardy’s inequality and its descendants: a probability approach

We formulate and prove a generalization of Hardy's inequality [27] in terms of random variables and show that it contains the familiar continuous and discrete forms of Hardy's inequality. Next we improve the recent version by Li and Mao [42] of Hardy's inequality with weights for general Borel measures and mixed norms, so that it implies the discrete version of Liao [43] and the Hardy inequality with weights of Muckenhoupt [48], as well as the mixed norm versions due to Hardy and Littlewood [29], Bliss [8], and Bradley [14]. An equivalent formulation in terms of random variables is given as well. We also formulate a reverse version of Hardy's inequality, the closely related Copson inequality, a reverse Copson inequality, and a Carleman-Pólya-Knopp inequality via random variables. Finally we connect our Copson inequality with counting process martingales and survival analysis, and briefly discuss other applications.



Introduction
The classical Hardy inequality is often presented as the following pair of inequalities: the continuous (or integral form) inequality says, if p > 1 and ψ is a nonnegative p-integrable function on (0, ∞), then
$$\int_0^\infty \Big( \frac{1}{x} \int_0^x \psi(y)\,dy \Big)^p dx \le \Big( \frac{p}{p-1} \Big)^p \int_0^\infty \psi^p(y)\,dy \tag{1.1}$$
holds, while the discrete (or series form) inequality says, if p > 1 and c_1, c_2, . . . are nonnegative numbers, then
$$\sum_{n=1}^\infty \Big( \frac{1}{n} \sum_{i=1}^n c_i \Big)^p \le \Big( \frac{p}{p-1} \Big)^p \sum_{n=1}^\infty c_n^p \tag{1.2}$$
holds. For example, see pp. 239-243 of [30], Exercises 3.14 and 3.15 of [55], [41], or Chapter 9 of [59]. As Hardy [27] mentions in his Section 5, Landau pointed out that the discrete inequality follows from the integral one by noting that c_1 ≥ c_2 ≥ · · · may be assumed, and by choosing an appropriate step function as ψ; see Section 8 of [39].
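The pair is easy to probe numerically. The following sketch (our illustration, not part of the paper) verifies a finite-sum instance of the discrete inequality (1.2) for randomly generated nonnegative numbers; the finite-sum version is also valid, cf. Lemma 9.1 below.

```python
# Numerical sanity check of the discrete Hardy inequality (1.2).
import numpy as np

rng = np.random.default_rng(0)
p = 2.0
c = rng.exponential(size=10_000)                 # nonnegative c_1, c_2, ...

partial_means = np.cumsum(c) / np.arange(1, c.size + 1)
lhs = np.sum(partial_means ** p)
rhs = (p / (p - 1)) ** p * np.sum(c ** p)
print(lhs <= rhs, lhs, rhs)                      # True: constant (p/(p-1))^p = 4
```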
Our main objective here is to give a unified formulation and proof of the inequalities (1.1) and (1.2) using the notation and language of probability theory. Along the way we will obtain a large family of other corollaries related to weighted Hardy inequalities (as given in [39] and in the book-length treatments [41] and [40]); see Section 2.
Such versions usually involve two arbitrary Borel measures. A very recent result by Li and Mao [42] is not yet optimal, because it does not contain the discrete version given by Liao [43]. In Section 3 we formulate an improvement of the result of [42] that contains the discrete version of [43] as a special case; in fact, our proof of this improvement is based on the discrete result of [43]. An equivalent formulation of our version of Hardy's inequality with weights in terms of random variables will also be given.
Furthermore, we apply our methods from Section 2 to Copson's inequality [18] in Section 5 and to the reverse Hardy inequality in Section 4; cf. [53] and [6]. We treat reverse Copson inequalities in the same style in Section 6, and we provide a probabilistic version of the inequalities of Carleman, Pólya, and Knopp in Section 7. In Section 8 we connect our new versions of Copson's inequality formulated in probability terms with counting process martingales arising in survival analysis and reliability theory. The appendix, Section 12, elaborates on survival analysis by briefly explaining connections with the forward (and backward) versions of the Kaplan-Meier estimators appearing in right (and left) censored survival data, including a short description of the analysis of data arising from the question of "when do the baboons come down from the trees". Other applications are presented briefly in Section 11 and a summary of the new inequalities is given in Section 10. Most of the proofs are collected in Section 9.

Hardy's inequality
Here is our version of Hardy's inequality that implies both (1.1) and (1.2).

Theorem 2.1. Hardy's inequality
Let X and Y be independent random variables with distribution function F on (R, B), and let ψ be a nonnegative measurable function on (R, B). For p > 1
$$E\bigg( \frac{1}{F(X)} \int_{(-\infty, X]} \psi(y)\,dF(y) \bigg)^p \le \Big( \frac{p}{p-1} \Big)^p E\,\psi^p(X) \tag{2.1}$$
holds. For continuous distribution functions F this inequality may be rewritten as
$$\int_0^1 \Big( \frac{1}{v} \int_0^v \psi_F(u)\,du \Big)^p dv \le \Big( \frac{p}{p-1} \Big)^p \int_0^1 \psi_F^p(v)\,dv$$
with ψ_F(v) = ψ(F^{-1}(v)), 0 < v < 1, and for such F the constant (p/(p − 1))^p is the smallest possible one.
The strength of this inequality (2.1) lies in the fact that it implies both the continuous and the discrete version of Hardy's inequality.
Corollary 2.2. (i) The continuous Hardy inequality (1.1) holds. (ii) The discrete Hardy inequality (1.2) holds.

Proof. (i) and (ii) follow from Theorem 2.1 by taking F to be the distribution function corresponding to the uniform probability measure on [0, K] and on {1, . . . , K}, respectively, multiplying by K, and taking limits as K → ∞.
Translating Theorem 2.1 from random variable notation back into analysis yields the following corollary.

Corollary 2.3. For any p > 1, distribution function F on R, and ψ ∈ L_p(F) we have
$$\int_{\mathbb{R}} |H_F \psi|^p\, dF \le \Big( \frac{p}{p-1} \Big)^p \int_{\mathbb{R}} |\psi|^p\, dF,$$
where H_F is the F-averaging operator defined for x ∈ R and ψ ∈ L_p(F) by
$$H_F \psi(x) = \frac{1}{F(x)} \int_{(-\infty, x]} \psi(y)\,dF(y) = E[\psi(Y) \mid Y \le x]. \tag{2.3}$$
Note that H_F generalizes both the discrete and the continuous Hardy averaging operators; see e.g. [39], page 715. Observe that |H_F ψ| ≤ H_F |ψ| holds for all measurable ψ, with equality if ψ is nonnegative F-a.e. This shows the equivalence of Theorem 2.1 and Corollary 2.3.
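For a finitely supported F the operator H_F is a simple cumulative average, and (2.1) can then be checked exactly. The sketch below is our illustration for a random discrete distribution; all names are ours.

```python
# The F-averaging operator H_F of Corollary 2.3 for a finitely supported F,
# together with an exact check of Hardy's inequality (2.1).
import numpy as np

rng = np.random.default_rng(1)
m = 30
prob = rng.dirichlet(np.ones(m))      # point masses of F on x_1 < ... < x_m
psi = rng.exponential(size=m)         # nonnegative psi on the support
F = np.cumsum(prob)                   # F(x_k)

H_F_psi = np.cumsum(psi * prob) / F   # H_F psi(x_k) = F(x_k)^{-1} sum_{j<=k} psi_j p_j

p = 3.0
lhs = np.sum(prob * H_F_psi ** p)     # E (H_F psi(X))^p
rhs = (p / (p - 1)) ** p * np.sum(prob * psi ** p)
print(lhs <= rhs, lhs, rhs)
```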
Remark 2.4. If the average in (2.1) is replaced by its left-continuous variant, with integration over (−∞, X) and with F(X) replaced by F(X−) (with the convention 0/0 = 0), then the inequality does not hold anymore for some distribution functions with jumps. In particular, for X and Y Bernoulli with success probability P(X = 1) = q and with ψ(0) = 1, ψ(1) = 0 we get
$$q \le \Big( \frac{p}{p-1} \Big)^p (1 - q), \tag{2.6}$$
which fails for q sufficiently close to 1.

Remark 2.5. There are distributions for which the constant in (2.1) is not optimal for any p > 1. This is the case for all Bernoulli distributions. Let X and Y have a Bernoulli distribution with P(X = 1) = q = 1 − P(X = 0). Then with ψ(0) = a ≥ 0 and ψ(1) = b ≥ 0 our Hardy inequality (2.1) becomes
$$(1-q)\,a^p + q\big( (1-q)a + qb \big)^p \le \Big( \frac{p}{p-1} \Big)^p \big( (1-q)\,a^p + q\,b^p \big).$$
Since convexity of x ↦ x^p yields ((1 − q)a + qb)^p ≤ (1 − q)a^p + qb^p, the left hand side is at most (1 + q)((1 − q)a^p + qb^p). Consequently, for the Bernoulli distribution with success probability q the optimal constant in our Hardy inequality equals at most 1 + q.
Remark 2.6. Note that (2.10) can be rewritten in terms of the "mean residual life function" corresponding to the distribution function F. It turns out that for ψ(Y) ∈ L_2(F) and F continuous we have
$$E\big( \psi(X) - H_F\psi(X) \big)^2 = \mathrm{Var}\big( \psi(X) \big),$$
so that the conditional centering operator I − H_F is an isometry. For more on this and connections to counting process martingales and survival analysis see [54], [22], and [7]; [60] studies I − H and I − H* as operators on L_p(R_+, λ), where λ denotes Lebesgue measure.
Remark 2.7. Since the conditional distribution of X given X ≤ c has distribution function F(·)/F(c) for c ∈ R, and the same holds for Y, we have the following conditional version of (2.1):
$$E\Big[ \big( H_F\psi(X) \big)^p \,\Big|\, X \le c \Big] \le \Big( \frac{p}{p-1} \Big)^p E\big[ \psi^p(X) \,\big|\, X \le c \big].$$

Remark 2.8. The Hardy inequality for weighted L_p spaces on (0, ∞), such as Theorem 1.2.1 of [3], also follows from our Hardy inequality for random variables. With 0 ≤ ε < (p − 1)/p and K a large constant, we choose F(x) = (x/K)^{1−εp/(p−1)} ∧ 1, x ≥ 0. This results in an inequality that, after taking limits as K → ∞ and writing Ψ(y) = ψ(y) y^{−εp/(p−1)}, turns into the weighted Hardy inequality for Ψ.

Hardy's inequality with weights and mixed norms
To the best of our knowledge the most recent and most general versions of Hardy's inequalities with weights and mixed norms are presented by Liao [43] and Li and Mao [42]. We shall improve the result of [42] so that it contains the discrete version of [43] as a special case. To this end we prove the result of [42] with (−∞, x) in the inner integral replaced by (−∞, x].

Theorem 3.1. Hardy's Inequality with Weights and Mixed Norms
Let 1 < p ≤ q < ∞, and suppose that µ and ν are σ-finite Borel measures on R. Then
$$\bigg( \int_{\mathbb{R}} \Big( \int_{(-\infty, x]} \psi \, d\nu \Big)^q d\mu(x) \bigg)^{1/q} \le k_{q,p}\, B\, \bigg( \int_{\mathbb{R}} \psi^p \, d\nu \bigg)^{1/p}$$
holds for all measurable ψ : R → [0, ∞), where k_{q,p} is the Hardy-Littlewood-Bliss constant and B is defined by
$$B = \sup_{r \in \mathbb{R}} \big( \mu([r, \infty)) \big)^{1/q} \big( \nu((-\infty, r]) \big)^{(p-1)/p}.$$

Remark 3.3. Testing with suitable ψ implies the well known inequality B ≤ C for the smallest constant C in this inequality. By Theorem 3.1 we also have C ≤ k_{q,p} B, so C < ∞ if and only if B < ∞. The constants k_{q,p} first appeared via a (1923) conjecture of Hardy and Littlewood [29], which was later confirmed by Bliss [8]. See Chapter 5 of [40] for a very complete history of these developments and further results.
Theorem 3.1 and Remark 3.3 may be reformulated in terms of random variables as follows.

Theorem 3.4. Probability Version of Hardy's Inequality with Weights and Mixed Norms
Let X and Y be independent random variables with distribution functions F and G respectively, let 1 < p ≤ q < ∞, and let U and V be nonnegative measurable functions on (R, B). Furthermore let C ∈ [0, ∞] be the smallest constant such that
$$\Big( E\Big[ U(X) \big( E\big[ \psi(Y) 1_{[Y \le X]} \mid X \big] \big)^q \Big] \Big)^{1/q} \le C \Big( E\big[ V(Y)\, \psi^p(Y) \big] \Big)^{1/p}$$
holds for all nonnegative measurable functions ψ on (R, B). Then C satisfies the analogues of the bounds of Theorem 3.1. Note that any σ-finite Borel measure is dominated by a probability measure. Let F and G be the distribution functions of probability measures dominating the measures µ and ν, respectively, from Theorem 3.1. The choices U(x) = dµ/dF(x) and V(y) = (dν/dG(y))^{1−p} show that Theorem 3.4 implies Theorem 3.1.
Following the arguments of Muckenhoupt [48], in Section 9 we prove the following generalization of his result, which is the special case q = p of our Theorems 3.1 and 3.4.

Theorem 3.5. Probability Version of Muckenhoupt's Inequality
Let X and Y be independent random variables with distribution functions F and G respectively, let p > 1, and let U and V be nonnegative measurable functions on (R, B). Furthermore let C ∈ [0, ∞] be the smallest constant such that
$$E\Big[ U(X) \big( E\big[ \psi(Y) 1_{[Y \le X]} \mid X \big] \big)^p \Big] \le C\, E\big[ V(Y)\, \psi^p(Y) \big] \tag{3.9}$$
holds for all nonnegative measurable functions ψ on (R, B). With
$$B = \sup_{r \in \mathbb{R}} E\big[ U(X) 1_{[X \ge r]} \big] \Big( E\big[ V(Y)^{-1/(p-1)} 1_{[Y \le r]} \big] \Big)^{p-1} \tag{3.10}$$
the string of inequalities
$$B \le C \le \frac{p^p}{(p-1)^{p-1}}\, B \tag{3.11}$$
holds, even for B = ∞.

Remark 3.6. With U = G^{−p}, V = 1 and G = F the second inequality in (3.11) does not imply our Hardy inequality (2.1). Indeed, for Bernoulli random variables with P(X = 1) = 1/p = 1 − P(X = 0) the factor B then equals 1 + (p − 1)^{p−1}/p^p, and hence the upper bound on C equals 1 + p^p/(p − 1)^{p−1}, which is larger than (p/(p − 1))^p for p ≥ p_0 ≈ 1.77074. However, with U = G^{−p}, V = 1 and G = F a continuous distribution function the factor B equals 1/(p − 1), which shows that (3.11) does imply our Hardy inequality (2.1) in this case. If X is stochastically larger than Y, Y ⪯ X, and they have no point masses at the same location, then Theorem 3.5 yields an inequality very similar to (2.1). A comparable result is obtained for X ⪯ Y.
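The crossover point p_0 of Remark 3.6 can be located numerically; the following bisection sketch (ours) compares the two constants.

```python
# Locate p0 of Remark 3.6: the smallest p for which the Muckenhoupt-type
# bound 1 + p^p/(p-1)^(p-1) is at least the Hardy constant (p/(p-1))^p.
def gap(p):
    return 1.0 + p ** p / (p - 1) ** (p - 1) - (p / (p - 1)) ** p

lo, hi = 1.5, 2.0                  # gap(1.5) < 0 < gap(2.0)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if gap(mid) < 0 else (lo, mid)
print(round(0.5 * (lo + hi), 5))   # ~ 1.77074
```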

Corollary 3.7. Stochastic ordering
Let X and Y be independent random variables with distribution functions F and G respectively, let p > 1, and let ψ be a nonnegative measurable function on (R, B).
(a) If Y is stochastically smaller than X, Y ⪯ X, and X and Y have no point masses at the same location, then inequality (3.12) is valid.
(b) If X is stochastically smaller than Y, X ⪯ Y, and F is continuous, then inequality (3.13) is valid.
Proof. In case (a) we apply Theorem 3.5 with U = G^{−p} and V = 1, and bound the factor B from (3.10). If F has no point mass at r, then the stochastic ordering Y ⪯ X yields the bounds (3.14)-(3.16). Combining (3.14)-(3.16) and (3.9)-(3.11) we arrive at (3.12).
In case (b) we apply Theorem 3.5 with U = F^{−p} and V = 1. Then the continuity of F and G ≤ F imply that B from (3.10) is bounded by 1/(p − 1), and hence that (3.13) holds.

A reverse Hardy inequality
There are also reversed versions of the classical Hardy inequality: the continuous (or integral form) inequality says, if p > 1 and ψ is a nonnegative, nonincreasing p-integrable function on (0, ∞), then
$$\int_0^\infty \Big( \frac{1}{x} \int_0^x \psi(y)\,dy \Big)^p dx \ge \frac{p}{p-1} \int_0^\infty \psi^p(y)\,dy, \tag{4.1}$$
while the discrete (or series form) inequality says, if p > 1 and c_1 ≥ c_2 ≥ · · · ≥ 0, then
$$\sum_{n=1}^\infty \Big( \frac{1}{n} \sum_{i=1}^n c_i \Big)^p \ge \zeta(p) \sum_{n=1}^\infty c_n^p. \tag{4.2}$$
Here, ζ(·) is the zeta function. These inequalities have been obtained independently by Renaud [53] and Bennett [6]; see also Lemma 2.1 of [47]. By taking ψ the indicator function of the unit interval we see that (4.1) is sharp, and by taking c_1 = 1, c_2 = c_3 = · · · = 0 that (4.2) is sharp.
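A numerical illustration of (4.2) (ours; for the chosen sequence the truncation error is negligible):

```python
# Check of the reverse discrete Hardy inequality (4.2) for the
# nonincreasing sequence c_n = 1/n^2, with zeta(2) = pi^2/6.
import numpy as np

p = 2.0
n = np.arange(1, 200_001)
c = 1.0 / n ** 2
zeta_p = np.pi ** 2 / 6

partial_means = np.cumsum(c) / n
print(np.sum(partial_means ** p) >= zeta_p * np.sum(c ** p))   # True
```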
Here are our random variable versions of (4.1) and (4.2).

Theorem 4.1. Reverse Hardy inequality
Let X and Y be independent random variables both with distribution function F on (R, B), and let ψ be a nonnegative, nonincreasing measurable function on R. For p > 1 and F absolutely continuous
$$E\big( H_F\psi(X) \big)^p \ge \frac{p}{p-1}\, E\Big[ \psi^p(Y) \big( 1 - F^{p-1}(Y) \big) \Big] \tag{4.3}$$
$$\ge E\,\psi^p(Y) \tag{4.4}$$
holds with equalities if ψ is constant. If F is general, but p ≥ 2 is an integer, then, with X, Y, X_1, . . . , X_p independent and identically distributed and with X_{(p)} = max{X_1, . . . , X_p}, we have
$$E\big( H_F\psi(X) \big)^p \ge E\Big[ \psi^p\big( X_{(p)} \big)\, 1_{[X_{(p)} \le X]}\, F^{-p}(X) \Big] \tag{4.5}$$
with equality if ψ is constant.
The continuous version (4.1) of the reverse Hardy inequality is contained in (4.3) and the discrete version (4.2) for integer p follows from (4.5).
For further developments concerning reverse Hardy type inequalities, see [24].

Copson's inequality
Copson [18] presented the following pair of inequalities: the continuous (or integral form) inequality says, if p > 1 and ψ is a nonnegative p-integrable function on (0, ∞), then
$$\int_0^\infty \bigg( \int_x^\infty \frac{\psi(y)}{y}\,dy \bigg)^p dx \le p^p \int_0^\infty \psi^p(x)\,dx \tag{5.1}$$
holds, while the discrete (or series form) inequality says, if p > 1 and a_i and λ_i, i = 1, 2, . . . , are nonnegative numbers and Λ_n = \sum_{i=1}^n λ_i, then
$$\sum_{n=1}^\infty \lambda_n \bigg( \sum_{k=n}^\infty \frac{\lambda_k a_k}{\Lambda_k} \bigg)^p \le p^p \sum_{n=1}^\infty \lambda_n a_n^p \tag{5.2}$$
holds. We generalize Copson's inequalities as follows.
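Again the discrete form admits a direct finite-sum check (ours; the finite-sum version is Lemma 9.3 below):

```python
# Check of the weighted discrete Copson inequality (5.2), truncated at m terms.
import numpy as np

rng = np.random.default_rng(2)
m = 5_000
a = rng.exponential(size=m) / np.arange(1, m + 1)   # nonnegative a_n
lam = rng.uniform(0.5, 1.5, size=m)                  # weights lambda_n
Lam = np.cumsum(lam)                                 # Lambda_n

p = 2.5
inner = np.cumsum((lam * a / Lam)[::-1])[::-1]       # sum_{k >= n} lambda_k a_k / Lambda_k
lhs = np.sum(lam * inner ** p)
rhs = p ** p * np.sum(lam * a ** p)
print(lhs <= rhs, lhs, rhs)
```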

Theorem 5.1. Copson's inequality
Let X and Y be independent random variables with distribution function F on (R, B), and let ψ be a nonnegative measurable function on (R, B). For p > 1
$$E\bigg( \int_{[X, \infty)} \frac{\psi(y)}{F(y)}\,dF(y) \bigg)^p \le p^p\, E\,\psi^p(X) \tag{5.3}$$
holds. For absolutely continuous distribution functions F the constant p^p is the smallest possible one.
The strength of this inequality (5.3) lies in the fact that it implies both the continuous and the discrete version of Copson's inequality.
(i) can be seen by choosing X and Y uniform on (0, K) and taking limits as K → ∞.
(ii) needs a longer argument. One takes X and Y uniform on {1, . . . , K} for some natural number K and defines a bounded continuous function ψ that interpolates appropriately. Taking limits, first as K_2 → ∞ and subsequently as K_1 → ∞, we arrive at (5.2).
Comparison of the left side of (5.3) with the left side of (2.1) and the definition of H_F in (2.3) leads us to define the Copson (or dual) operator H*_F as follows: for x ∈ R and ψ ∈ L_p(F)
$$H_F^*\psi(x) = \int_{[x, \infty)} \frac{\psi(y)}{F(y)}\,dF(y).$$
(This operator returns in our discussion of the inequalities of Carleman and Pólya-Knopp in Section 7.) As pointed out by Hardy in [28], the discrete Copson inequality is a "reciprocal" or "dual" inequality of the discrete Hardy inequality (1.2), in the sense that one implies the other. But this holds in other senses as well. For a treatment of (1.1) and (5.1) based on the duality of L_p and L_q with 1/p + 1/q = 1, see [25], section 6.3, especially his Theorem 6.20 and Corollary 6.2.1. In particular, when viewed as operators on L_2(F), H_F and H*_F are adjoint operators: for ψ and χ in L_2(F) we have
$$\int_{\mathbb{R}} (H_F\psi)\, \chi \, dF = \int_{\mathbb{R}} \psi\, (H_F^*\chi) \, dF.$$
So H_F and H*_F have the same norms for p = 2, and indeed the bounds in (10.1) and (10.2) are the same for p = 2. Applying Hardy's approach we obtain the equivalence of (2.1) and (5.3).

Theorem 5.3. Equivalence of Hardy's and Copson's inequality
Let X and Y be independent random variables with distribution function F on (R, B). Then Hardy's inequality (2.1) and Copson's inequality (5.3) are equivalent in the sense that each implies the other. Although this Theorem 5.3 (formally) renders one of our proofs of Hardy's and Copson's inequality superfluous, we have included both proofs in Section 9 to illustrate the different methods.
Remark 5.4. A direct argument bounds the left hand side of (5.3) via Jensen's inequality and the convexity of x ↦ x^p, x ≥ 0, leading to (5.8). The right hand side of (5.8) is bounded by the right hand side of (5.3); here the strict inequality holds since p ↦ p log p − (p − 1) log 2 is strictly increasing on [1, ∞) with value 0 at p = 1, and the last expression is the upper bound in (5.3).
Remark 5.5. Theorem 5.3 gives a qualitative connection between Hardy's inequality and Copson's inequality (or the "dual Hardy inequality"). The papers by [38], [35], and [36] quantify these connections. These results are strongly related to further work on the connections between the I − H and I − H* operators on the one hand, and between the I − H_F and I − H*_F operators on the other hand. Also see [12]. Recall that
$$\bar\Lambda(x) = \int_{[x, \infty)} \frac{dF(y)}{F(y)}, \qquad \Lambda(x) = \int_{(-\infty, x]} \frac{dF(y)}{1 - F(y-)} \tag{5.10}$$
are the backward cumulative hazard function and the (forward) cumulative hazard function of survival analysis.

A reverse Copson inequality
Reversed versions of the classical Copson inequality are given in Theorems 2 and 4 of Renaud (1986) [53]. His continuous (or integral form) inequality may be rephrased as follows: if p ≥ 1 holds and ψ is a nonnegative p-integrable function on (0, ∞) satisfying an appropriate monotonicity condition, then a reversed version of (5.1) holds. His discrete form says: if p ≥ 1 holds and a_1/1 ≥ a_2/2 ≥ · · · are nonnegative numbers, then a reversed version of (5.2) holds.
It seems natural to consider a reverse Copson inequality formulated in terms of random variables. Here is our result in this direction.

Theorem 6.1. Reverse Copson inequality
Let X and Y be independent random variables both with distribution function F on (R, B), and let ψ be a nonnegative p-integrable function on (R, B). If p ≥ 1 holds, F is continuous, and x ↦ ψ(x)/F(x) is nonincreasing, then
$$E\big( H_F^*\psi(X) \big)^p \ge E\,\psi^p(X) \tag{6.3}$$
holds with equality if ψ = F or p = 1 holds. If the distribution function F is continuous, ψ is nonincreasing, and p is an integer, then (6.4) holds with equality if ψ is constant or p = 1 holds. If the distribution function F is arbitrary, ψ is nonincreasing, and p is an integer, then (6.5) holds with equality if ψ equals 0, or F is degenerate (i.e. F is concentrated at one point), or p = 1 holds.
We conjecture that (6.4), with p! replaced by Γ(p + 1), and (6.5) hold for all p ≥ 1, but we have no proof. Note that for F continuous (6.5) with p ∈ [1, ∞) follows from (6.3). For the situations of the continuous and discrete versions of the original Copson inequality our reverse Copson inequality implies reversed versions of (5.1) and (5.2); in particular, (ii) holds if p ≥ 1 is an integer and ψ is a nonnegative, nonincreasing, p-integrable function.
The proof of this corollary is almost the same as the proof of Corollary 5.2 in Section 5 (but with the inequality signs reversed and the constants changed), and therefore it is omitted.

Remark 6.3. Without continuity of F inequality (6.3) is not generally valid for p > 1. Again a counterexample is provided by the Bernoulli distribution. Take ψ = F and p > 1. Now, as a function of the success probability q, the left minus the right hand side of (6.3) equals
$$(1 - 2q) + q^{p+1} - (1 - q)^{p+1},$$
which is negative for 1/2 < q < 1.

A Carleman-Pólya-Knopp inequality

The classical inequality of Pólya-Knopp states that for nonnegative ψ ∈ L_1(0, ∞)
$$\int_0^\infty \exp\Big( \frac{1}{x} \int_0^x \log \psi(y)\,dy \Big)\, dx \le e \int_0^\infty \psi(y)\,dy, \tag{7.1}$$
while Carleman's inequality states that for any positive summable sequence {c_k}
$$\sum_{n=1}^\infty (c_1 c_2 \cdots c_n)^{1/n} \le e \sum_{n=1}^\infty c_n; \tag{7.2}$$
see [32] and [50]. By now the reader will anticipate our impulse to reformulate and unify these two inequalities in a more probabilistic vein involving random variables and distribution functions as follows.

Theorem 7.1. Let ψ be a positive valued function on R and let X, Y be independent random variables with distribution function F. If Eψ(X) < ∞, then
$$E \exp\Big( E\big[ \log \psi(Y) \,\big|\, Y \le X \big] \Big) \le e\, E\,\psi(X)$$
holds.

Corollary 7.2. (i) For any nonnegative ψ ∈ L_1, inequality (7.1) holds. (ii) For any positive sequence {c_k} ∈ ℓ_1 the inequality (7.2) holds.
The proof of Corollary 2.2 is applicable to Corollary 7.2 as well.
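Carleman's inequality (7.2) is easy to probe numerically; the sketch below is ours, with the computation done in log space for stability.

```python
# Check of Carleman's inequality (7.2): sum of geometric means <= e * sum.
import numpy as np

rng = np.random.default_rng(3)
c = rng.exponential(size=500) * 2.0 ** (-np.arange(500.0))   # positive, summable
geo_means = np.exp(np.cumsum(np.log(c)) / np.arange(1, c.size + 1))
print(np.sum(geo_means) <= np.e * np.sum(c))                  # True
```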
Kaijser et al. [33] rewrite the classical integral version of the Carleman inequality as follows: replacing ψ(y) in (7.1) by ψ(y)/y yields
$$\int_0^\infty \exp\Big( \frac{1}{x} \int_0^x \log \psi(y)\,dy \Big)\, \frac{dx}{x} \le \int_0^\infty \psi(y)\, \frac{dy}{y}. \tag{7.3}$$
This follows by elementary manipulations together with the identity ∫_0^x log y dy = x(log x − 1). [33] prove (7.1) with strict inequality by proving (7.3) with strict inequality via the following simple convexity argument. By convexity of exp, it follows from Jensen's inequality followed by Fubini's theorem that
$$\int_0^\infty \exp\Big( \frac{1}{x} \int_0^x \log \psi(y)\,dy \Big) \frac{dx}{x} \le \int_0^\infty \frac{1}{x^2} \int_0^x \psi(y)\,dy\, dx = \int_0^\infty \psi(y)\, \frac{dy}{y}.$$
Strict inequality follows because equality in Jensen's inequality almost everywhere forces ψ to be constant a.e., but this contradicts finiteness of ∫_0^∞ ψ(y)/y dy.
Now several questions arise: is there a corresponding rewrite of our probabilistic version of the inequalities of Carleman and Pólya-Knopp? The answer is clearly "yes" for continuous distribution functions F. Replacing ψ by ψ/F in (7.1) and arguing as above, but using the corresponding identity for ∫_0^v log u du, yields a rewritten inequality. This is a "left tail inequality" with motivations from survival analysis. For the corresponding "right tail inequality" we instead replace ψ by ψ/(1 − F). Then reasoning as above yields, for continuous F, an analogous inequality expressed in terms of Λ, where Λ(x) ≡ ∫_{(−∞, x]} dF(y)/(1 − F(y−)).

Note: This notation goes against the classical notation of survival analysis but is in keeping with the current notation of our paper. The usual notation for the "right side" or forward cumulative hazard function is simply Λ(x) = ∫_{(−∞, x]} dF(y)/(1 − F(y−)).

Martingale connections and the H operators
In this section we expand on the comments in Sections 2, 5, and 7 concerning martingales, counting processes, and the residual life and dual Hardy operators.
We will also need the classical Hardy operators H and H* defined by
$$H\psi(x) = \frac{1}{x} \int_0^x \psi(y)\,dy, \qquad H^*\psi(x) = \int_x^\infty \frac{\psi(y)}{y}\,dy$$
for ψ ∈ L_p(R_+, λ), where λ denotes Lebesgue measure. Krugliak et al. [37] (see also [38]) established further properties of these operators. It is well known (see e.g. [16]) that I − H is an isometry on L_2(R_+, λ).
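The isometry claim can be checked numerically for a concrete ψ; here (our sketch) ψ(x) = e^{−x}, for which Hψ(x) = (1 − e^{−x})/x and both squared norms equal 1/2.

```python
# Numerical check that ||(I - H) psi||_2 = ||psi||_2 for psi(x) = exp(-x).
import numpy as np
from scipy.integrate import quad

psi = lambda x: np.exp(-x)
Hpsi = lambda x: (1.0 - np.exp(-x)) / x
norm2_psi, _ = quad(lambda x: psi(x) ** 2, 0, np.inf)
norm2_residual, _ = quad(lambda x: (psi(x) - Hpsi(x)) ** 2, 0, np.inf)
print(norm2_psi, norm2_residual)   # both ~ 0.5
```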
[54] showed that R ≡ I − H_F is an isometry of L_2(R, F); see also [7], Appendix A.1, pages 420-424. These authors also showed that R and L ≡ I − H*_F satisfy L ∘ R = I on L^0_2(F), and we see that the analogue of the identity (8.2) becomes (8.3), where Λ is as defined in (5.10).
To see that this is fundamentally linked to counting process martingales, let X have distribution function F on R_+, and define a one-jump counting process {N(t) : t ≥ 0} by
$$N(t) = 1_{[X \le t]}.$$
This process is (trivially) seen to be nondecreasing in t with probability 1, and hence is a sub-martingale (a process increasing in conditional mean). By the Doob-Meyer decomposition theorem there is an increasing predictable process A with N − A = M, where {M(t) : t ≥ 0} is a mean-0 martingale. In fact, for this simple counting process it is well-known that
$$A(t) = \int_{(0, t]} 1_{[X \ge s]}\, d\Lambda(s).$$
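A quick Monte Carlo illustration of this decomposition (ours, with the standard exponential as a concrete choice, so that Λ(s) = s):

```python
# For X ~ Exp(1): N(t) = 1[X <= t], A(t) = Lambda(t ^ X) = min(X, t),
# and M(t) = N(t) - A(t) should have mean 0 for every t.
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(size=200_000)
for t in (0.5, 1.0, 2.0):
    M_t = (X <= t).astype(float) - np.minimum(X, t)
    print(t, M_t.mean())   # ~ 0 up to Monte Carlo error
```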
Comparing this with the identity (8.3) rewritten for a distribution function F on R_+, with ψ_t(x) = 1_{[x ≤ t]}, and evaluating the resulting identity at x = X, we get the process Y(t) ≡ E[ψ(X) | F_t]. Since the σ-fields {F_t}_{t≥0} are nested, {Y(t) : t ≥ 0} is a martingale (and it is often called "Doob's martingale"). Furthermore, it can be represented in terms of the basic martingale M using the fundamental identity L ∘ R = I on L^0_2(F) discussed above: since ψ = L ∘ Rψ we see that
$$Y(t) = E\,\psi(X) + \int_{(0, t]} R\psi(s)\,dM(s).$$
This set of connections deserves to be explored further. In particular we conjecture that many of the interesting properties of the classical Hardy operator H and the dual Hardy operator H* established in the series of papers by [37], [38], [12], [35], [13], [36], and [60] will have useful analogues for H_F and H*_F in the probability setting for Hardy's inequalities which we have considered here. On the other hand, the martingale connections of the operators L and R perhaps deserve to be better known in the world of classical Hardy type inequalities.

For further explanation of the connections of these processes with right and left censored data problems in survival analysis, see the Appendix, Section 12.
If X_1, . . . , X_n are i.i.d. with continuous distribution function F, then
$$N_n(t) = \sum_{i=1}^n 1_{[X_i \le t]}$$
is a counting process which is simply the sum of n independent one-jump counting processes, and the sum of the corresponding counting process martingales is again a counting process martingale:
$$M_n(t) = N_n(t) - \int_{(0, t]} Y_n(s)\,d\Lambda(s), \qquad Y_n(t) = \sum_{i=1}^n 1_{[X_i \ge t]},$$
where Y_n(t) is the number of X_i's "at risk" at time t.

Proofs for Section 2
In order to prove our random variable version of Hardy's inequality we need a Lemma. The proof of this Lemma has the same structure as Broadbent's proof of Hardy's inequality (1.2), which is a slightly improved version of Elliot's proof; see [15], [23], and [30], page 240.

Lemma 9.1. Let p > 1 and let a_1, . . . , a_m and p_1, . . . , p_m be nonnegative numbers. With P_n = \sum_{i=1}^n p_i and B_n = \sum_{i=1}^n a_i p_i / P_n the inequality
$$\sum_{n=1}^m p_n B_n^p \le \Big( \frac{p}{p-1} \Big)^p \sum_{n=1}^m p_n a_n^p \tag{9.1}$$
holds. With p_i = 1 this inequality is a finite sum version of the discrete Hardy inequality (1.2). Taking limits as m → ∞, first on the right hand side and subsequently on the left hand side of (9.1) with p_i = 1, we obtain the discrete Hardy inequality itself.
Proof of Lemma 9.1. With the notation P_n = \sum_{i=1}^n p_i, A_n = \sum_{i=1}^n a_i p_i, B_n = A_n/P_n, n = 1, . . . , m, and A_0 = B_0 = P_0 = 0, we rewrite
$$a_n p_n B_n^{p-1} = (A_n - A_{n-1})\, B_n^{p-1} = (P_n B_n - P_{n-1} B_{n-1})\, B_n^{p-1} \tag{9.2}$$
into
$$P_n B_n^p = a_n p_n B_n^{p-1} + P_{n-1} B_{n-1} B_n^{p-1}.$$
Proofs for Section 3

Proof of Lemma 9.2. By symmetry it suffices to prove (9.32). With the random variable U uniformly distributed on the unit interval, the left hand side of this inequality can be rewritten and bounded via U, which yields the claim.

Proof of Theorem 3.5. Choosing suitable test functions ψ in (3.9) implies the first inequality in (3.11). Inequality (9.32) of Lemma 9.2 with χ = V^{−1/(p−1)} and γ = 1 − 1/p = (p − 1)/p yields (9.39). By the definition of B in (3.10) the right hand side of (9.39) is bounded from above by an expression to which (9.33) of Lemma 9.2 applies. By the definition of B the last expression is bounded by the right hand side of (3.11), which completes the proof of (3.11).

Proofs for Section 4
Proof of Theorem 4.1. Let f be a density of F. The monotonicity of ψ implies
$$\frac{d}{dx} \Big( \int_{(-\infty, x]} \psi\,dF \Big)^p = p\, \psi(x) f(x) \Big( \int_{(-\infty, x]} \psi\,dF \Big)^{p-1} \ge p\, \psi^p(x) f(x) F^{p-1}(x)$$
for Lebesgue almost all x ∈ R. So we have, by Fubini's theorem,
$$E\big( H_F\psi(X) \big)^p \ge \frac{p}{p-1}\, E\Big[ \psi^p(Y) \big( 1 - F^{p-1}(Y) \big) \Big],$$
which is the first inequality of (4.3). Since ψ^p and 1 − F^{p−1} are both nonincreasing, ψ^p(Y) and 1 − F^{p−1}(Y) are nonnegatively correlated and consequently their covariance is nonnegative, i.e.,
$$E\Big[ \psi^p(Y) \big( 1 - F^{p-1}(Y) \big) \Big] \ge E\,\psi^p(Y)\; E\big[ 1 - F^{p-1}(Y) \big] = \frac{p-1}{p}\, E\,\psi^p(Y). \tag{9.44}$$
This results in the second inequality of (4.3). Note that inequality (4.4), and hence the inequality between the left hand side and the right hand side of (4.3), is obvious as ψ is nonincreasing. Let F be general and p integer. As X_1, . . . , X_p are independent and identically distributed and ψ(·)1_{[· ≤ x]} is nonincreasing, we have
$$\Big( \int_{(-\infty, x]} \psi\,dF \Big)^p = E\Big[ \prod_{i=1}^p \psi(X_i)\, 1_{[X_i \le x]} \Big] \ge E\Big[ \psi^p\big( X_{(p)} \big)\, 1_{[X_{(p)} \le x]} \Big] \tag{9.45}$$
and hence, dividing by F^p(x) and taking the expectation over X, we arrive at (4.5).
Proof of Corollary 4.2. Let X and Y be uniformly distributed on the interval (0, K). Our reverse Hardy inequality (4.3) becomes (9.48). Taking limits for K → ∞ and subsequently ε ↓ 0 we arrive at (4.1).
For the second part of the corollary we take X and Y uniformly distributed on {1, . . . , K}. In view of P(X_{(p)} ≤ n) = (n/K)^p, our inequality (4.5) with ψ(k) = c_k reduces to an inequality that holds by (9.52) for n ≥ 2. As equality holds in (9.52) for n = 1, the proof that for integer p inequality (4.2) can be obtained from our inequality (4.5) is complete.

Proofs for Section 5
We will use the following Lemma, which shows the structure of Copson's proof of his Theorem B with sums over infinitely many terms replaced by finite sums; see [18].

Lemma 9.3. Let p > 1 and let a_1, . . . , a_m and p_1, . . . , p_m be nonnegative numbers. With P_n = \sum_{i=1}^n p_i and A_n = \sum_{k=n}^m p_k a_k / P_k the inequality
$$\sum_{n=1}^m p_n A_n^p \le p^p \sum_{n=1}^m p_n a_n^p$$
holds.
Note that part of Theorem B of [18] follows from this inequality by taking limits for m → ∞, first at the right hand side, subsequently within the p-th power at the left hand side, and finally for the first sum at the left hand side.
Proof of Lemma 9.3. With the notation above, Young's inequality (as in the proof of Lemma 9.1) yields
$$A_n^p p_n - p A_n^{p-1} a_n p_n = A_n^p p_n - p A_n^{p-1} P_n (A_n - A_{n+1}) \tag{9.55}$$
$$\le (p_n - p P_n) A_n^p + P_n \big( (p-1) A_n^p + A_{n+1}^p \big) = P_n A_{n+1}^p - P_{n-1} A_n^p$$
for n = 1, . . . , m, with A_{m+1} = 0. Summing this inequality over n we obtain
$$\sum_{n=1}^m A_n^p p_n - p \sum_{n=1}^m a_n A_n^{p-1} p_n \le 0.$$

Proof for Section 6
Proof of Theorem 6.1. First we prove that inequality (9.68) holds for p ∈ [1, ∞), for arbitrary F, and for x ↦ ψ(x)/F(x) nonincreasing. Observe that for continuous F this implies (6.3). To prove (9.68) we follow the line of argument in the proof of Theorem 4 of Renaud [53]. For x < y the monotonicity of ψ/F implies (9.71). In view of F(F^{−1}(u)−) ≤ u, and since u ≤ F(y−) implies F^{−1}(u) ≤ y, Fubini's theorem shows that the right hand side of (9.71) equals an integral satisfying (9.72). Furthermore, for fixed x we define an auxiliary distribution function that shows that the left hand side of (9.71) is bounded from above appropriately. Combining this with (9.71) and (9.72) we arrive at (9.68) and hence at (6.3).
To prove (6.4) and (6.5) we restrict attention to integer p and let X, Y, Y_1, . . . , Y_p be independent random variables all with distribution function F. If F is continuous, the monotonicity of ψ yields (6.4), and a similar argument yields (6.5) for arbitrary F.

Proofs for Section 7
Proof of Theorem 7.1. By Hardy's inequality in the probability form (2.1) with ψ replaced by ψ^{1/p} we have
$$E\Big( E\big[ \psi^{1/p}(Y) \mid Y \le X \big] \Big)^p \le \Big( \frac{p}{p-1} \Big)^p E\,\psi(X).$$
The left hand side is bounded from below by
$$E \exp\Big( E\big[ \log \psi(Y) \mid Y \le X \big] \Big),$$
where the inequality holds in view of Jensen's inequality for conditional expectations and the convexity of exp. Letting p → ∞, so that (p/(p − 1))^p decreases to e (cf. (2.9)), completes the proof.

Summary
Our sharp inequalities related to Hardy's inequality read as follows:
$$E\,\psi^p(Y) \le E\big( H_F\psi(X) \big)^p \le \Big( \frac{p}{p-1} \Big)^p E\,\psi^p(Y), \tag{10.1}$$
where the first inequality holds if F is absolutely continuous and ψ is nonincreasing.
Our sharp inequalities related to Copson's inequality are the following:
$$E\,\psi^p(X) \le E\big( H_F^*\psi(X) \big)^p \le p^p\, E\,\psi^p(X), \tag{10.2}$$
where the first inequality holds if F is continuous and x ↦ ψ(x)/F(x) is nonincreasing.

Our Hardy inequality with weights and mixed norms is the string of inequalities B ≤ C ≤ k_{q,p} B of Theorems 3.1, 3.4, and 3.5.
Detailed conditions are given in the respective Theorems.

Applications and Related Work
We close with a few brief comments concerning applications and related work.
As noted by Diaconis [21], Hardy's inequality (1.2), and especially the weighted version thereof due to Muckenhoupt [48], has been applied by Miclo [46] to obtain useful bounds for the spectral gap for birth-and-death Markov chains. He provides a nice overview of alternative methods and their potential drawbacks. Bobkov and Götze [10] extend the methods of [48] to study optimal constants in log-Sobolev inequalities on R. Because log-Sobolev inequalities are preserved by the formation of products of independent distributions (i.e. tensorization), their results yield log-Sobolev inequalities for product measures. Their results have been refined by Barthe and Roberto [4] who go on in [5] to study modified log-Sobolev inequalities. Saumard and Wellner [56] use the "two-sided" Hardy inequality given by (2.14) to give an alternative proof of Cheeger's inequality. Applications of the Hardy inequality (2.1) with F continuous to semiparametric models for survival analysis were given by Ritov and Wellner [54] and Bickel et al. [7]. As noted in Sections 2, 5, 7, and 8, these results yield martingale connections with the operators H_F and H*_F. There has been some related work on Hardy type inequalities with similar unification (of continuous and discrete cases) as an explicit goal: for example, see Kaijser et al. [33] and Evans et al. [24], page 45. Li and Mao [42], pages 257-258, refer to Prokhorov [52]. They all study general measures.
What about related work on formulating probabilistic versions of Hardy type inequalities? We have not found any results in this direction. Despite the many applications of Hardy and Muckenhoupt type inequalities in probability theory over the past 30 years, we are unaware of any explicit mention of these inequalities in terms of random variables. It seems to us that these inequalities should be better known in both the probability and statistics communities, and the probability versions may stimulate both further applications and further theoretical developments. In any case, it seems to be worthwhile to understand when several different formulations can be unified.
In Section 8 we sketched the connection between the operators H_F and H*_F appearing in our probabilistic versions of Hardy's and Copson's (dual) inequalities and a simple counting process martingale. The key functions Λ̄(x) and Λ(x) appearing in those operators (recall (5.10) for the explicit definitions) play an extremely important role in survival analysis and reliability theory. Note also that they do not appear without the probabilistic perspective adopted in our approach. In the Appendix (Section 12) we discuss how these functions arise in connection with left and right censored survival data.

Appendix
Here we go further with the discussion concerning the forward and backward hazard functions connected with our random variable versions of the Copson inequalities.

Censored survival data: from the right and from the left
Suppose that X_1, . . . , X_n are i.i.d. survival times with d.f. F on [0, ∞). Furthermore, suppose that Y_1, . . . , Y_n are i.i.d. censoring times (independent of X_1, . . . , X_n) with distribution function G. Unfortunately we do not get to observe the X_i's. Instead, for each individual we observe
$$(Z_i, \delta_i) = \big( X_i \wedge Y_i,\; 1_{[X_i \le Y_i]} \big).$$
Nevertheless, our goal is to estimate the cumulative hazard function Λ_F(t) = ∫_{(0, t]} (1 − F(s−))^{−1} dF(s) and the survival function 1 − F nonparametrically. Actually, once we have an estimator Λ_{F,n} of Λ_F, then estimation of 1 − F (and hence also F) is immediate, since 1 − F can be recovered from Λ_F via the product integral. The resulting estimators of Λ_F and 1 − F are the famous Nelson-Aalen estimator of Λ_F and the Kaplan-Meier estimator 1 − F_n of 1 − F. This is the random censorship version of right-censored survival data. For treatments of fixed (i.e. deterministic) censoring times, see Pollard [51] and Meier [45].
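A minimal sketch of the two estimators computed from the pairs (Z_i, δ_i) (our illustration; it assumes no ties among the Z_i for simplicity):

```python
# Nelson-Aalen and Kaplan-Meier estimators from right-censored data.
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
X = rng.exponential(1.0, size=n)        # survival times, F = Exp(1), Lambda_F(t) = t
Y = rng.exponential(1.5, size=n)        # censoring times
Z = np.minimum(X, Y)
delta = (X <= Y).astype(float)

order = np.argsort(Z)
Z, delta = Z[order], delta[order]
at_risk = n - np.arange(n)              # number of Z_j >= Z_i (no ties)

dLambda = delta / at_risk
nelson_aalen = np.cumsum(dLambda)       # estimates Lambda_F
kaplan_meier = np.cumprod(1 - dLambda)  # estimates 1 - F

i = min(np.searchsorted(Z, 1.0), n - 1)
print(nelson_aalen[i], 1.0)             # Lambda_F(1) = 1
print(kaplan_meier[i], np.exp(-1.0))    # (1 - F)(1) = e^{-1}
```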
Before discussing right-censoring further, suppose instead that we observe
$$\big( U_i \vee V_i,\; 1_{[U_i \ge V_i]} \big), \qquad i = 1, \ldots, n,$$
where the U_i's are i.i.d. with d.f. F, and the V_i's are i.i.d. G (and independent of the U_i's).
The goal again is to estimate the (reverse or backwards) cumulative hazard function Λ̄_F(t) ≡ ∫_{[t, ∞)} dF(s)/F(s) and the d.f. F. This is left-censored survival data. Note that Λ̄_F is the function which arose naturally in the random variable version of Copson's inequality in Section 8. A famous example of left-censored data is the data which arose in a study of the descent times of baboons in the Amboseli Reserve, Kenya. See [63], [64], [19], [20].
In this study the U i 's represent the times when the baboons descended from the trees in the morning while the V i 's represent the times at which the investigators arrived at the study site. If a baboon descended before its observer arrived at the study site, then that baboon's U i is regarded as being "left -censored". Again the goal is nonparametric estimation of the d.f. of the U i 's.
In this setting, once we have an estimator Λ̄_{F,n} of Λ̄_F, then estimation of F is immediate, since F can be recovered from Λ̄_F via the corresponding backward product integral.

Nonparametric estimation for right or left censored survival data
First consider the classical and frequently occurring censoring from the right. To see that Λ_F and 1 − F can be estimated nonparametrically from the observed data, consider the following empirical distributions:
$$K_n^{uc}(t) = \frac{1}{n} \sum_{i=1}^n 1_{[Z_i \le t,\, \delta_i = 1]}, \qquad K_n^{c}(t) = \frac{1}{n} \sum_{i=1}^n 1_{[Z_i \le t,\, \delta_i = 0]}, \qquad K_n = K_n^{uc} + K_n^{c},$$
where "uc" stands for "uncensored" observations and "c" stands for "censored" observations. By the strong law of large numbers, K_n^{uc} and K_n converge a.s. to limits K^{uc} and K satisfying dK^{uc}(s) = (1 − G(s−)) dF(s) and 1 − K(s) = (1 − F(s))(1 − G(s)), so that
$$\Lambda_F(t) = \int_{(0, t]} \frac{1}{1 - K(s-)}\, dK^{uc}(s)$$
can be estimated by its empirical counterpart
$$\hat\Lambda_n(t) = \int_{(0, t]} \frac{1}{1 - K_n(s-)}\, dK_n^{uc}(s).$$
For more on left-censoring, the data in the baboon study, and a plot of the resulting backwards Kaplan-Meier estimator, see Andersen et al. [1], pages 24, 162-165, and 273-274.