Sharp Lower and Upper Bounds for the Covariance of Bounded Random Variables

In this paper we derive sharp lower and upper bounds for the covariance of two bounded random variables when knowledge about their expected values, variances or both is available. When only the expected values are known, our result can be viewed as an extension of the Bhatia-Davis Inequality for variances. We also provide a number of different ways to standardize covariance. For a pair of binary random variables, one of these standardized measures of covariation agrees with a frequently used measure of dependence between genetic variants.


Introduction
What can be said about the statistical dependency between two random variables X and Y, when some information about their marginal distributions is available? The answer to this question depends on the dependency measure being used, as well as the type of restrictions that are imposed on the marginal distributions of X and Y. The covariance Cov(X, Y) = E(XY) − E(X)E(Y) is one of the most frequently employed measures of dependence between two random variables X and Y, when these are measured on an interval scale. The above question can then be phrased as finding lower and upper bounds of Cov(X, Y) that incorporate any available information about the marginal distributions of X and Y. The most well known such covariance bounds,

−√(Var(X)Var(Y)) ≤ Cov(X, Y) ≤ √(Var(X)Var(Y)),   (1)

follow from the Cauchy-Schwarz Inequality, originally stated by Augustin-Louis Cauchy in 1821, and later proved independently by Viktor Bunyakovsky and Hermann Schwarz. The lower and upper bounds in (1) only involve the variances Var(X) and Var(Y) of X and Y, and they are attained when Y = kX + l is a linear function of X with negative and positive slope respectively. A related class of covariance bounds involves not only the marginal distributions of X and Y, but more generally the variance of some function h(X, Y) of X and Y (Koop, 1964, Kimeldorf and Sampson, 1973). There is also a large literature on covariance bounds when X = f(Z) and Y = g(Z) are functions of the same random variable Z. These results make use of various mathematical tools such as the Hoeffding Inequality (Hoeffding, 1940), Chebyshev's Integral Inequality and Stein operators; see for instance Egozcue (2015), He and Wang (2015), Ernst et al. (2019) and references therein.
In this article we consider a pair of bounded random variables X and Y, when knowledge about their marginal distributions is given in terms of their expected values E(X) and E(Y), and/or their variances Var(X) and Var(Y). Barnett and Dragomir (2004) considered the case when the expected values of X and Y are known, and they derived lower and upper bounds for the covariance of X and Y. However, these bounds are not sharp, and may thus include values of the covariance that are logically impossible, given the expected values. We provide sharp lower and upper bounds for Cov(X, Y) when the expected values of X and Y are known, which extend well known results for binary random variables (Ferguson, 1941, Cureton, 1959, Guilford, 1965, Davenport and El-Sanhurry, 1991). These bounds can also be viewed as a generalization of the Bhatia-Davis Inequality (Bhatia and Davis, 2000), which provides an upper bound on the variance of a bounded random variable when its expected value is known. We demonstrate that our covariance bounds are attained when the joint distribution of X and Y is discrete, with at most three possible outcomes. We also derive lower and upper bounds of Cov(X, Y) when the variances of X and Y are known. These bounds are either equal to or truncated versions of the Cauchy-Schwarz bounds in (1), depending on whether the expected values of X and Y are unknown or known.
The covariance bounds that we propose naturally lead to four different standardized measures of covariation between bounded random variables, depending on whether the expected values and variances of these two random variables are known or not. In particular, for binary random variables with known expected values, the corresponding standardized measure of covariation coincides with a measure of dependence used to quantify linkage disequilibrium between two biallelic genetic variants (Lewontin, 1965, Chapter 8 of Thomas, 2004).
Our paper is organized as follows: In Section 2 we present our new and sharp covariance bounds of X and Y when the expected values but not the variances of these two random variables are known. Then in Section 3 we derive covariance bounds of X and Y when the variances of these two random variables are known, whereas the expected values are either known or not. The four standardized measures of covariation are introduced in Section 4, and finally a discussion in Section 5 concludes.

Covariance bounds when variances are unknown
In particular, the lower covariance bound in (2) is attained for a pair (X, Y) of discrete random variables having at most three possible outcomes, with a distribution of the form (3), whereas the upper covariance bound in (2) is attained for another pair (X, Y) of discrete random variables having at most three possible outcomes, with a distribution of the form (4). It turns out that Theorem 1 is related to the Bhatia-Davis Inequality for the variance of bounded random variables. This inequality implies

Var(X) ≤ (b − E(X))(E(X) − a).   (5)

Setting X = Y, a = c, and b = d, we find that the upper bound of Cov(X, X) = Var(X) in (2) agrees with (5). It is possible to combine the Bhatia-Davis Inequality with the Cauchy-Schwarz Inequality (1). Indeed, inserting (5) into (1) we deduce

−√((b − E(X))(E(X) − a)(d − E(Y))(E(Y) − c)) ≤ Cov(X, Y) ≤ √((b − E(X))(E(X) − a)(d − E(Y))(E(Y) − c)).   (6)

It follows from Theorem 1 that the bounds in (6) are at least as wide as those in (2). We will give precise conditions under which the bounds in (6) are strictly wider. To this end, it is helpful to rewrite the expected values of X and Y as

E(X) = a + α(b − a),  E(Y) = c + β(d − c),   (7)

for some constants 0 ≤ α, β ≤ 1. These numbers quantify the expected values of X and Y on a relative scale, and as the following result shows, they determine when the Cauchy-Schwarz covariance bounds are strictly wider than those of Theorem 1:

Corollary 1. The covariance bounds of (6) are at least as wide as those of (2). The lower covariance bound of (6) equals the one in (2) if and only if α + β = 1, and then the random vector (X, Y) in (3) that attains this lower bound has a two point distribution supported on (a, d) and (b, c).
Whenever the lower covariance bounds of (2) and (6) differ, the random vector (X, Y) in (3) that attains the lower bound of (2) has a three point distribution.
The upper covariance bound of (6) equals the one in (2) if and only if α = β, and the random vector (X, Y) in (4) that attains this upper bound has a two point distribution supported on (a, c) and (b, d). Whenever the upper bounds of (2) and (6) differ, the random vector (X, Y) in (4) that attains the upper bound of (2) has a three point distribution.
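As a numerical sanity check of these relations, the following sketch compares the bounds of Theorem 1 with the combined Bhatia-Davis/Cauchy-Schwarz bounds (6). The closed-form product expressions used for the Theorem 1 bounds, and the function names, are our own reading of (2) and should be checked against the displayed theorem.

```python
import math

def theorem1_bounds(a, b, c, d, mx, my):
    # Assumed product form of the sharp Theorem 1 bounds for a <= X <= b,
    # c <= Y <= d with known means mx = E(X) and my = E(Y)
    lower = -min((mx - a) * (my - c), (b - mx) * (d - my))
    upper = min((mx - a) * (d - my), (b - mx) * (my - c))
    return lower, upper

def combined_bounds(a, b, c, d, mx, my):
    # Bounds (6): Cauchy-Schwarz with the Bhatia-Davis variance bounds inserted
    half_width = math.sqrt((b - mx) * (mx - a) * (d - my) * (my - c))
    return -half_width, half_width

# E(X) = 0.3, E(Y) = 0.4 on [0, 1] x [0, 1]: here alpha + beta != 1 and
# alpha != beta, so both Theorem 1 bounds lie strictly inside the bounds (6)
L, U = theorem1_bounds(0, 1, 0, 1, 0.3, 0.4)
L6, U6 = combined_bounds(0, 1, 0, 1, 0.3, 0.4)
assert L6 < L < U < U6
```

With alpha + beta = 1 (for instance E(X) = 0.3, E(Y) = 0.7) the two lower bounds coincide, in line with Corollary 1.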
Barnett and Dragomir (2004) investigated upper and lower bounds of the covariance Cov(X, Y) of two bounded random variables with known expected values. At the end of Section 8 of their paper, they obtain the bounds (8). The following result details how (8) compares to the covariance bounds of Theorem 1:

Corollary 2. The upper and lower covariance bounds in (2) are strictly sharper than those obtained from (8), even when the two terms b − a and d − c are removed from the right-hand side of (8).
By minimizing (maximizing) the left-hand (right-hand) side of (2) with respect to E(X) and E(Y), it is possible to derive sharp lower (upper) bounds of Cov(X, Y) when the expected values are unknown:

Covariance bounds when variances are known

In this section we will assume that the variances Var(X) and Var(Y) of X and Y are known. To begin with, we also assume that the expected values E(X) and E(Y) are known. The following result unifies Theorem 1 with the Cauchy-Schwarz Inequality (1):

Theorem 2. Assume that the expected values E(X) and E(Y), as well as the variances Var(X) and Var(Y), of a ≤ X ≤ b and c ≤ Y ≤ d are known. Then the truncated bounds, obtained by taking the maximum of the lower bounds and the minimum of the upper bounds in (1) and (2), provide sharp lower and upper bounds for the covariance of X and Y.
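A minimal sketch of the truncation behind Theorem 2, under the assumption (ours to check against the displayed theorem) that the Theorem 1 bounds take the product form below: the admissible covariance range is the intersection of the Theorem 1 range (from the expected values) and the Cauchy-Schwarz range (from the variances).

```python
import math

def theorem2_bounds(a, b, c, d, mx, my, vx, vy):
    # Intersect the assumed Theorem 1 range (known means mx, my) with the
    # Cauchy-Schwarz range (known variances vx, vy)
    t1_lo = -min((mx - a) * (my - c), (b - mx) * (d - my))
    t1_hi = min((mx - a) * (d - my), (b - mx) * (my - c))
    cs = math.sqrt(vx * vy)
    return max(t1_lo, -cs), min(t1_hi, cs)

# Three-point marginals of Example 1 with most mass at the mean (r = s = 0.9):
# the Cauchy-Schwarz part of the intersection is the active one
alpha, beta, r, s = 0.3, 0.4, 0.9, 0.9
vx = (1 - r) * alpha * (1 - alpha)
vy = (1 - s) * beta * (1 - beta)
lo, hi = theorem2_bounds(0, 1, 0, 1, alpha, beta, vx, vy)
assert abs(lo + math.sqrt(vx * vy)) < 1e-12
assert abs(hi - math.sqrt(vx * vy)) < 1e-12
```

Lowering r and s makes the Theorem 1 part of the intersection active instead, matching the discussion in Example 1 below.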
Example 1 (Three point distributions). To gain intuition for the results in Theorems 1 and 2, and their relation to the Cauchy-Schwarz bounds, it is instructive to consider the special case when both X and Y have three-point distributions, as follows. Let X ∈ {a, E(X), b} and Y ∈ {c, E(Y), d} with P(X = E(X)) = r, P(Y = E(Y)) = s, and

P(X = a) = (1 − r)(1 − α),  P(X = b) = (1 − r)α,  P(Y = c) = (1 − s)(1 − β),  P(Y = d) = (1 − s)β.

With these figures we have that the expected values of X and Y are given by (7), whereas

Var(X) = (1 − r)α(1 − α)(b − a)²,  Var(Y) = (1 − s)β(1 − β)(d − c)².   (11)

Define the log odds

ψ_X = log(α/(1 − α)),  ψ_Y = log(β/(1 − β)).

Computing the ratio between the lower bound for Cov(X, Y) in Theorem 1 and the Cauchy-Schwarz lower bound gives, after some algebra,

exp(−|ψ_X + ψ_Y|/2) / √((1 − r)(1 − s)).   (12)

Similarly, computing the ratio between the upper bound for Cov(X, Y) in Theorem 1 and the Cauchy-Schwarz upper bound gives

exp(−|ψ_X − ψ_Y|/2) / √((1 − r)(1 − s)).   (13)

If either r or s approaches 1, then the ratios in (12) and (13) both approach infinity. Hence, when most of the probability mass is located at the mean for either X or Y, the Cauchy-Schwarz bounds tend to be more informative (i.e. narrower) than the bounds in Theorem 1, or equivalently, the Cauchy-Schwarz bounds will appear in Theorem 2. Conversely, if both r and s approach 0, then the ratios in (12) and (13) approach their numerators, respectively, which are both ≤ 1.
Hence, when most of the probability mass is located at the extreme ends for both X and Y, the bounds in Theorem 1 tend to be more informative than the Cauchy-Schwarz bounds, and therefore the bounds of Theorem 1 will also appear in Theorem 2. An exception from the latter occurs for the lower bound when ψ_X = −ψ_Y (or α = 1 − β), that is, when X and Y have opposite skews (cf. Corollary 1). In this case the numerator of (12) is equal to 1, which implies that the lower bound in Theorem 1 is never more informative than the Cauchy-Schwarz lower bound. A similar exception occurs for the upper bound when ψ_X = ψ_Y (or α = β), that is, when X and Y have the same skews (cf. Corollary 1). In this case the numerator of (13) is equal to 1, which implies that the upper bound in Theorem 1 is never more informative than the Cauchy-Schwarz upper bound.
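The algebra behind the ratios (12) and (13) can be verified numerically. The sketch below (our own illustration, with a = c = 0 and b = d = 1) assumes the product form of the Theorem 1 bounds and the log-odds expressions exp(−|ψ_X ± ψ_Y|/2)/√((1 − r)(1 − s)), and checks them against direct computation.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def ratio_check(alpha, beta, r, s):
    # Exact variances of the three-point marginals of Example 1 (a = c = 0, b = d = 1)
    vx = (1 - r) * alpha * (1 - alpha)
    vy = (1 - s) * beta * (1 - beta)
    # Direct ratio: assumed Theorem 1 lower bound over Cauchy-Schwarz lower bound
    lo_ratio = min(alpha * beta, (1 - alpha) * (1 - beta)) / math.sqrt(vx * vy)
    # Log-odds expression (12)
    lo_formula = math.exp(-abs(logit(alpha) + logit(beta)) / 2) / math.sqrt((1 - r) * (1 - s))
    # Direct ratio: assumed Theorem 1 upper bound over Cauchy-Schwarz upper bound
    hi_ratio = min(alpha * (1 - beta), (1 - alpha) * beta) / math.sqrt(vx * vy)
    # Log-odds expression (13)
    hi_formula = math.exp(-abs(logit(alpha) - logit(beta)) / 2) / math.sqrt((1 - r) * (1 - s))
    return lo_ratio, lo_formula, hi_ratio, hi_formula

lo_r, lo_f, hi_r, hi_f = ratio_check(0.3, 0.4, 0.2, 0.5)
assert abs(lo_r - lo_f) < 1e-12 and abs(hi_r - hi_f) < 1e-12
```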
Example 2 (Continuous distributions). In order to illustrate the difference between Theorems 1 and 2 for continuous random variables, assume that rescaled versions of X and Y have beta distributions. Given numbers 0 < α, β, r, s < 1, we postulate

(X − a)/(b − a) ∼ Beta(αr/(1 − r), (1 − α)r/(1 − r)),  (Y − c)/(d − c) ∼ Beta(βs/(1 − s), (1 − β)s/(1 − s)),

where the limits r → 0, s → 0 (r → 1, s → 1) correspond to the same two point (one point) distributions of X and Y as in Example 1. Using formulas for the expected value and variance of a beta distribution, it follows that the expected values and variances of X and Y are the same as in Example 1 (cf. (7) and (11)), for any values of α, β, r, and s. Therefore, the ratios between the bounds of Theorem 1, and the corresponding Cauchy-Schwarz bounds, are the same as in (12)-(13).
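The moment matching in this example can be checked with the standard formulas for the mean and variance of a Beta(p, q) distribution. The shape parameters αr/(1 − r) and (1 − α)r/(1 − r) below are our reconstruction of the postulated rescaling and should be read as an assumption.

```python
def beta_moments(p, q):
    # Mean and variance of a Beta(p, q) distribution
    mean = p / (p + q)
    var = p * q / ((p + q) ** 2 * (p + q + 1))
    return mean, var

def example2_params(alpha, r):
    # Assumed shape parameters matching the Example 1 moments (a = 0, b = 1)
    return alpha * r / (1 - r), (1 - alpha) * r / (1 - r)

alpha, r = 0.3, 0.2
mean, var = beta_moments(*example2_params(alpha, r))
assert abs(mean - alpha) < 1e-9                       # E(X) = alpha, as in (7)
assert abs(var - (1 - r) * alpha * (1 - alpha)) < 1e-9  # Var(X) as in (11)
```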
By minimizing (maximizing) the left-hand (right-hand) side of (2) it is possible to derive lower (upper) bounds of the covariance of X and Y when the variances but not the expected values of these two random variables are known:

Standardized Measures of Covariation
In this section we will present four different ways of standardizing the covariance of a ≤ X ≤ b and c ≤ Y ≤ d, so that all values in [−1, 1] are possible for the standardized measure.The form of these standardized covariances will depend on whether the expected values and variances of X and Y are known or not.

No moments known
When neither the expected values nor the variances of X and Y are known we use Corollary 3 and introduce 4 Cov(X, Y)/((b − a)(d − c)) as a standardized covariance.

Variances known
When the variances but not the expected values of X and Y are known, we employ Corollary 4 and use the ordinary correlation coefficient as a standardized version of the covariance.

Expected values known
Assume that the expected values but not the variances of X and Y are known.
Then we use Theorem 1 and define

D′(X, Y) = Cov(X, Y)/U if Cov(X, Y) ≥ 0, and D′(X, Y) = Cov(X, Y)/|L| if Cov(X, Y) < 0,

where L and U denote the lower and upper bounds of (2), as a standardized covariance. In particular, when X and Y have two point distributions on {a, b} and {c, d}, D′(X, Y) is a well known measure of dependence (Ferguson, 1941). In genetic epidemiology it is a frequently used measure of linkage disequilibrium between two biallelic genetic variants (Lewontin, 1965).
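For binary variables this standardization can be computed directly. The sketch below implements D′ with the product-form bounds we read off from (2) (an assumption to check against the displayed theorem), and recovers D′ = 1 for a binary pair whose covariance attains its upper bound.

```python
def d_prime(a, b, c, d, mx, my, cov):
    # Standardized covariance D': divide by the upper bound of (2) when
    # cov >= 0, and by the absolute value of the lower bound otherwise
    lower = -min((mx - a) * (my - c), (b - mx) * (d - my))
    upper = min((mx - a) * (d - my), (b - mx) * (my - c))
    return cov / upper if cov >= 0 else cov / (-lower)

# Binary X, Y in {0, 1} with P(X=1) = 0.3, P(Y=1) = 0.4 and P(X=1, Y=1) = 0.3,
# i.e. X = 1 implies Y = 1, so the covariance attains its upper bound
cov = 0.3 - 0.3 * 0.4
assert abs(d_prime(0, 1, 0, 1, 0.3, 0.4, cov) - 1.0) < 1e-9
```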

Expected values and variances known
If the expected values and variances of X and Y are known, it is natural to use Theorem 2 for standardizing the covariance of X and Y. This amounts to a definition analogous to D′(X, Y), with the bounds of Theorem 2 in place of those of Theorem 1.

Relations between the standardized measures of covariation
Our four measures of standardized covariation have a partial ordering. There is however no general ordering between |r(X, Y)| and |D′(X, Y)|: although |r(X, Y)| ≤ |D′(X, Y)| holds for binary random variables, we recall from Examples 1-2 that this inequality sometimes goes in the other direction when X and Y have three point distributions or beta distributions.

Discussion
In this paper we derived sharp lower and upper bounds for the covariance of two bounded random variables X and Y when their expected values and/or their variances are known. This resulted in various ways of standardizing covariances, some of which are well known, whereas others are new. A number of extensions are of interest. A first extension is to find the minimum and maximum covariance of two bounded random variables under other moment constraints than expected values and variances. More generally, it would be of interest to derive covariance bounds under various types of restrictions on the marginal distributions of X and Y. A second extension is to obtain bounds for other types of dependency measures between X and Y, under various restrictions on the marginal distributions of these two random variables. Examples of alternative dependency measures include the kappa statistic (Cohen, 1960) and proportional reduction in entropy (Theil, 1970) for nominal random variables, and the gamma statistic (Goodman and Kruskal, 1954) for ordinal random variables.

A Appendix
A.1 Proofs from Section 2.
Proof of Theorem 1. Since the covariance operator as well as the lower and upper bounds of (2) are bilinear, equation (2) is invariant with respect to linear transformations of X and Y. We may therefore without loss of generality assume a = c = 0 and b = d = 1. Thus our objective is to prove

−min{E(X)E(Y), (1 − E(X))(1 − E(Y))} ≤ Cov(X, Y) ≤ min{E(X)(1 − E(Y)), (1 − E(X))E(Y)}   (A.2)

for pairs (X, Y) of random variables satisfying 0 ≤ X, Y ≤ 1. Moreover, we also need to show that the lower (upper) bounds of (A.2) are attained by a binary pair of random variables satisfying (3) and (4) respectively, with a = c = 0 and b = d = 1.
It is clear that q11 + c is minimized when c is chosen as small as possible, and yet satisfies (A.6). This corresponds to c = −min(q00, q11), in agreement with the lower bound of (A.2). Analogously, q11 + c is maximized for c = min(q10, q01), in agreement with the upper bound of (A.2). The proof is finalized by noticing that the two vectors p = q − min(q00, q11)v and p = q + min(q10, q01)v correspond to the bivariate distributions of (X, Y) in (3) and (4) respectively.

Proof of Corollary 1. The lower bound of (A.8) is at least as small as that in (A.7). Moreover, it is clear that the lower bounds of (A.7) and (A.8) agree if and only if (A.9) holds. Since the function involved is strictly increasing on (0, 1), it follows that (A.9) is equivalent to α = 1 − β. Moreover, when α = 1 − β, the random vector (X, Y) of (3) has a two point distribution, since P(X = x, Y = y) = 0 when (x, y) equals (a, c) and (b, d). This concludes the proof for the lower covariance bounds (2) and (6). The proof for the upper covariance bounds is analogous.
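The perturbation argument can be checked numerically: starting from the independent coupling q on {0,1}² and moving along v = (1, −1, −1, 1) leaves the marginals unchanged, and the extreme feasible step sizes reproduce the bounds of (A.2). A small sketch of our own, writing α = E(X) and β = E(Y):

```python
alpha, beta = 0.3, 0.4

# Independent coupling q on {0,1}^2 and the marginal-preserving direction v
q = {(0, 0): (1 - alpha) * (1 - beta), (0, 1): (1 - alpha) * beta,
     (1, 0): alpha * (1 - beta), (1, 1): alpha * beta}
v = {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1}

# Extreme step sizes keeping all probabilities nonnegative
c_min = -min(q[0, 0], q[1, 1])
c_max = min(q[1, 0], q[0, 1])

for c in (c_min, c_max):
    p = {xy: q[xy] + c * v[xy] for xy in q}
    assert all(val >= -1e-12 for val in p.values())        # valid distribution
    assert abs(p[1, 0] + p[1, 1] - alpha) < 1e-12          # P(X = 1) preserved
    assert abs(p[0, 1] + p[1, 1] - beta) < 1e-12           # P(Y = 1) preserved
    cov = p[1, 1] - alpha * beta                           # covariance equals c
    assert abs(cov - c) < 1e-12
```

Here c_min and c_max coincide with the lower and upper bounds of (A.2) for these marginals.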
Proof of Corollary 2. Denote the upper and lower covariance bounds of (2) by U and L, whereas those in (8) are denoted U_BD and L_BD respectively. We will start by comparing the two upper covariance bounds. Recall the expression for U from (A.7), whereas the upper covariance bound of (8) takes the form U_BD, where in the last step we made use of (7). Hence U < U_BD. For the lower covariance bounds we similarly derive L_BD < L.

Proof of Corollary 3. We will only verify the upper bound of (9), since the proof of the lower bound is analogous. By maximizing the upper bound in (9) with respect to E(X) and E(Y), and making use of the parametrization (7), it follows from (A.10), where in the second step we invoked Corollary 1, that the upper bound of Cov(X, Y) is at most equal to (b − a)(d − c)/4. The fact that the upper bound of Cov(X, Y) indeed has this value follows from the fact that there is equality in the second step of (A.10) when α = β = 0.5, the values of α and β for which the maximum in the third step of (A.10) was attained.
since x1 = x1(p) and x2 = x2(p) are uniquely determined by p through the system of equations (A.13), and therefore the two point distribution of X has only one degree of freedom.
In the fourth step of (A.16) we introduced the binary random variable X*, with P(X* = x1) = 1 − p and P(X* = x2) = p for some 0 ≤ x1 < E(X) < x2 ≤ 1 and p that satisfy (A.13), so that E(X*) = E(X) and Var(X*) = Var(X). The inequality in the last step of (A. It remains to verify that (A.15) equals (A.12), and this requires an explicit formula for κ. To this end we first note that the upper equation of (A.13)

Theorem 1.
Throughout this article we assume that a ≤ X ≤ b and c ≤ Y ≤ d are two bounded random variables, restricted by lower and upper bounds −∞ < a < b < ∞ and −∞ < c < d < ∞ respectively. In this section we will investigate which values are attainable for the covariance Cov(X, Y) of X and Y, when the variances of X and Y are unknown, whereas the expected values E(X) and E(Y) are either known or not. The following theorem treats the case when the expected values are known: Assume that the expected values E(X) and E(Y) of a ≤ X ≤ b and c ≤ Y ≤ d are known. Then the covariance of X and Y satisfies

−min{(E(X) − a)(E(Y) − c), (b − E(X))(d − E(Y))} ≤ Cov(X, Y) ≤ min{(E(X) − a)(d − E(Y)), (b − E(X))(E(Y) − c)}.   (2)

Corollary 4.
Assume that the variances Var(X) and Var(Y) of a ≤ X ≤ b and c ≤ Y ≤ d are known. Then the Cauchy-Schwarz Inequality (1) provides sharp lower and upper bounds for the covariance of X and Y.