Confidence intervals for the means of the selected populations

Consider an experiment in which $p$ independent populations $\pi_i$ with corresponding unknown means $\theta_i$ are available, and suppose that for every $1 \le i \le p$ we can obtain a sample $X_{i1}, \ldots, X_{in}$ from $\pi_i$. In this context, researchers are sometimes interested in selecting the populations that yield the largest sample means as a result of the experiment, and then estimating the corresponding population means $\theta_i$. In this paper, we present a frequentist approach to the problem and discuss how to construct simultaneous confidence intervals for the means of the $k$ selected populations, assuming that the populations $\pi_i$ are independent and normally distributed with a common variance $\sigma^2$. The method, based on the minimization of the coverage probability, yields confidence intervals that attain the nominal coverage probability for any $p$ and $k$, taking into account the selection procedure.


Introduction
Given a set of $p$ available features, researchers must often determine which one is the best, or simply rank them according to a prespecified criterion. For instance, researchers may be interested in determining which treatment is more efficient in fighting a certain disease, or in ranking the level of gene expression in a genomics experiment. Problems of this type are commonly referred to as ranking and selection problems, and specific solutions and methods have been proposed in the literature since the second half of the 20th century, with a start that is usually traced back to the pathbreaking works of Bechhofer [2] and Gupta & Sobel [16].
In his paper, Bechhofer presents a single-sample multiple decision procedure for ranking means of normal populations. Assuming the variances of the populations are known, he is able to obtain closed form expressions for the probabilities of a correct selection in a number of cases. The solution is based on the use of an indifference zone, which allows him to determine a minimum guaranteed probability of selecting the population with the largest mean, as long as that mean is separated from the second largest by a prespecified distance $\delta$ [see 3]. Alternatively, Gupta and coauthors pioneered the subset selection approach, in which a subset of populations is selected so that it contains the population with the largest mean with probability at least $P^*$ [see 15].
Note that both of these approaches are mainly concerned with the problem of correct selection of the population with the largest mean rather than estimation of the selected mean. This second problem has also been widely discussed in the literature, and in the following two sections we present a brief summary of the main findings, giving separate consideration to the point estimation and interval estimation procedures.

Point estimation
There are two formulations of the point estimation problem under selection, with subtle differences between them. Suppose that we have $p$ populations with unknown means $\theta_i$ ($1 \le i \le p$). Assuming that for every $1 \le i \le p$ we can obtain a sample $X_{i1}, \ldots, X_{in_i}$ from the population $\pi_i$, we can either:
1. Attempt to select the population that has the largest parameter, i.e. $\max\{\theta_1, \ldots, \theta_p\}$, and estimate its value.
2. Select the population that gives the largest sample mean, and estimate the corresponding $\theta_i$.
The first of these problems has been widely discussed in the literature. For example, Blumenthal & Cohen [6] consider estimating the larger mean from two normal populations and compare different estimators, but they do not discuss how to make the selection. In this direction, Guttman & Tiao [17] propose a Bayesian procedure consisting of the maximization of the expected posterior utility for a certain utility function U (θ i ). In the same direction, but from a frequentist perspective, Saxena & Tong [27], Saxena [26], and Chen & Dudewicz [9] consider point and interval estimation of the largest mean.
Surprisingly, despite its relevance from a practitioner's perspective, the second problem has received less attention. In this context, a common and widely used estimator is $\delta(X) = \sum_{i=1}^{p} \bar{X}_i \, I(\bar{X}_i = \bar{X}_{(1)})$, where $\bar{X}_i$ is the sample mean of the $i$-th population, $\bar{X}_{(1)} = \max\{\bar{X}_1, \ldots, \bar{X}_p\}$, and $I(\cdot)$ denotes the indicator function. The properties of this estimator have been discussed in the literature, and it is known to be biased [see 21]. This is particularly evident when all the populations are identically distributed (the iid case), where the bias arises from estimating the common mean using an extreme observation.
Other properties of the estimator have also been evaluated. In particular, minimaxity and admissibility of the naive estimator $\delta(X)$ were first established by Stein [29] for the case $p = 2$. For larger dimensions, Sackrowitz & Samuel-Cahn [25] proved that the estimator is not minimax for the normal family when $p \ge 3$. Standard techniques to determine admissibility, following the ideas in Berger [5], Brown [7] and Lele [20], are not straightforward in this problem. However, Brown [8] is often credited with having shown admissibility of the estimator for the general case.
Nevertheless, the problem of improving the intuitive estimator is technically difficult. Dahiya [12] addresses this problem for the case of two normal populations and proposes estimators that perform better in terms of mean squared error (MSE). In this direction, progress was made by Cohen & Sackrowitz [10,11] and Gupta & Miescke [14], where Bayes and generalized Bayes rules were obtained and studied. Venter [31] considered a bias correction approach to the problem, obtaining estimators that perform well in terms of frequentist risk, and, following this idea, Venter & Steel [33] later introduced the ω-estimators, which are essentially weighted averages of the order statistics. Despite these results, performance theorems are scarce. Some exceptions are Hwang [18], who proposes an empirical Bayes estimator and shows that it performs better in terms of the Bayes risk with respect to any normal prior, and Sackrowitz & Samuel-Cahn [24], who find UMVUE and minimax estimators of the mean of the selected population for the negative exponential distribution.
Reid & Tibshirani [23] consider the selection problem under an implicit sparsity assumption, with many effect sizes $\theta_i = 0$. They adapt theory developed in Lee et al. [19] to carry out post-selection inference with the Lasso. Simon & Simon [28] considered adjusting the selection bias of the naive estimate from a frequentist perspective. They start by estimating the mean by maximum likelihood, and then estimate the bias in order to achieve bias reduction. To further improve on this estimate, the authors suggested estimating the second order bias as well. They also compare the bias from their frequentist approach to the bias from the empirical Bayes approach in Efron [13], and the two turn out to be very similar. However, an advantage of the approach in [28] is that it is not limited to the Gaussian setting.
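As a rough illustration of the bias in the iid case, the following sketch simulates the naive estimator when every population is $N(0, 1)$ and a single observation is taken from each. The function name `naive_estimator_bias`, the seed and the number of simulations are illustrative assumptions and are not taken from any of the cited studies.

```python
import random

def naive_estimator_bias(p, n_sim=20000, seed=0):
    """Monte Carlo estimate of E[max_i X_i] - theta for X_i ~ N(0, 1) iid (theta = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        # The naive estimator reports the largest observation as the selected mean.
        total += max(rng.gauss(0.0, 1.0) for _ in range(p))
    return total / n_sim

if __name__ == "__main__":
    for p in (1, 2, 5, 20):
        print(p, round(naive_estimator_bias(p), 3))
```

With a single population the estimate should be close to zero, while the simulated bias grows with the number of populations, which is the behavior described above.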

Interval estimation
The problem of interval estimation is equally challenging. Typically, confidence intervals are constructed in the usual way, using the standard normal distribution as a reference to attain the desired confidence level. However, these intervals fail to maintain the nominal coverage probability as the number of populations increases. As a correction, the intervals are sometimes constructed using the Bonferroni bounds determined by the number of selected populations $k$. This approach is clearly a misuse of the Bonferroni correction and, in practice, the resulting intervals also fail to achieve the desired nominal level. To illustrate this issue, Figure 1 shows the results of a small numerical experiment in which we observe the behavior of the confidence coefficient against the number of populations in the iid case for the traditional intervals (Trad), Bonferroni adjusting by $k = p$ populations (Bonfp) and Bonferroni adjusting by $k = 5$ (Bonf5), using a nominal level of 95%. A valid approach would be to obtain simultaneous intervals for all the populations under consideration (and not just the selected ones) using the Bonferroni bounds. However, such an approach is typically uninteresting, either because Bonferroni intervals are known to be too conservative as the number of populations $p$ increases, or simply because they do not offer a direct solution to the problem.
Venter [32] considers this problem and discusses how to construct confidence intervals when only one population is selected. By examination and optimization of the coverage probability, he proposes the construction of asymmetric intervals obtained as the intersection of two one-sided confidence regions. These intervals perform better than Bonferroni's simultaneous intervals in terms of length, but he does not extend his results to higher dimensions. More recently, Qiu & Hwang [22] propose an empirical Bayes approach to construct simultaneous confidence intervals for K selected means. Using a Bayesian framework, they consider a normal-normal model for the mean of the selected population, assuming that each population mean $\theta_i$ follows a normal distribution. Under these assumptions, they make use of a James-Stein type estimator to center the intervals and obtain simultaneous confidence intervals that asymptotically maintain the nominal coverage probability and are substantially shorter than the ones obtained using the Bonferroni bounds. However, these intervals are not guaranteed to attain the nominal level for any $p$, and their solution does not take into consideration the selection mechanism. Moreover, since their coverage probabilities are obtained by averaging over both sample space and prior, they do not give a valid frequentist interval.
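The following sketch illustrates the kind of experiment described above for the iid case. It assumes $\sigma = 1$, all means equal to zero, symmetric intervals $X_{(j)} \pm z$ for the $k$ selected populations, and joint coverage as the criterion. The helper names `joint_coverage_iid` and `cutoff`, the seed and the simulation size are illustrative assumptions and do not reproduce the settings behind Figure 1.

```python
import random
from statistics import NormalDist

def joint_coverage_iid(p, k, z, n_sim=10000, seed=1):
    """Monte Carlo joint coverage of X_(j) +/- z, j = 1..k, when all theta_i = 0 and sigma = 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        top_k = sorted((rng.gauss(0.0, 1.0) for _ in range(p)), reverse=True)[:k]
        # In the iid case every selected mean equals 0, so coverage means |X_(j)| <= z.
        hits += all(abs(x) <= z for x in top_k)
    return hits / n_sim

def cutoff(alpha, m):
    """Two-sided normal cutoff with a Bonferroni adjustment over m intervals (m = 1: unadjusted)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))

if __name__ == "__main__":
    alpha, k = 0.05, 5
    for p in (5, 10, 50):
        trad = joint_coverage_iid(p, k, cutoff(alpha, 1))
        bonf_k = joint_coverage_iid(p, k, cutoff(alpha, k))
        bonf_p = joint_coverage_iid(p, k, cutoff(alpha, p))
        print(p, trad, bonf_k, bonf_p)
```

Under these assumptions, only the adjustment by all $p$ populations is expected to stay at or above the nominal level as $p$ grows, at the price of wider intervals.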
With the explosion of big data, modern variations of this problem have become popular calling for the development of new methodologies. For instance, Benjamini & Yekutieli [4] and Zhao & Hwang [35] focus on multiplicity correction and discuss how to control for the false coverage rate (FCR), that is, the proportion of confidence intervals that fail to cover the parameter of interest. In their approach, given 0 < q < 1 they discuss how to construct confidence intervals so that the expected FCR for the selected intervals does not exceed q. In that context, the number of selected populations is a random quantity depending on q, and the resulting intervals (by construction) do not necessarily achieve the joint nominal level, which is the main purpose of this paper. We are not aware of any other attempts to solve this problem, in particular of any solutions that take explicit consideration of the selection mechanism.
In this paper, we focus on the problem of interval estimation following selection from a frequentist perspective and discuss how to construct simultaneous confidence intervals for the selected means. In Section 2 we obtain closed form expressions for the coverage probability of interest in this context and find tight lower bounds by minimizing the coverage probability function. In Section 3 we use these bounds to obtain valid confidence intervals, when the common variance $\sigma^2$ is known or unknown, introducing asymmetric intervals. In Section 4 we present some numerical results that illustrate the performance of the proposed intervals, and we finish with a brief discussion in Section 5. Some technical details are included in the Appendix.

Coverage probability results
Let $X_1, \ldots, X_p$ be independent random variables such that $X_i \sim N(\theta_i, \sigma^2)$ for $i = 1, \ldots, p$, where $\theta_i$ is unknown, but the common variance $\sigma^2$ is known. For simplicity, we take $\sigma^2 = 1$. Define the order statistics $X_{(1)}, \ldots, X_{(p)}$ as the sample values placed in descending order, that is, the order statistics satisfy $X_{(1)} \ge \ldots \ge X_{(p)}$. In this context, the problem is to construct confidence intervals for the means of the populations that give the largest sample means as a result of the experiment. Formally, for any $0 < \alpha < 1$ and $1 \le k \le p$ specified prior to the experiment, our aim is to construct simultaneous confidence intervals for $\theta_{(1)}, \ldots, \theta_{(k)}$ with a joint confidence coefficient of at least $1 - \alpha$, where $\theta_{(j)} = \sum_{i=1}^{p} \theta_i \, I(X_i = X_{(j)})$ for $1 \le j \le k$ and $I(\cdot)$ is the indicator function.
Even for $k = 1$ it is not difficult to see that the traditional confidence intervals fail to maintain the nominal coverage probability. For instance, if $\theta_1 = \cdots = \theta_p$, then $P(\theta_{(1)} \in X_{(1)} \pm z_{\alpha/2}) = \Phi(z_{\alpha/2})^p - \Phi(-z_{\alpha/2})^p = (1 - \alpha/2)^p - (\alpha/2)^p$, where $\Phi(\cdot)$ is the cumulative distribution function (cdf) of the standard normal distribution, and $z_{\alpha/2}$ is the corresponding $(1 - \alpha/2)$-th percentile. Then, for $p = 3$, we obtain $(1 - \alpha/2)^3 - (\alpha/2)^3 < 1 - \alpha$ for any $0 < \alpha < 1$. In fact, it is easy to show that the coverage probability of the traditional intervals maintains the nominal level only for $p = 1, 2$, and then decreases as $p$ grows. A similar argument shows why the Bonferroni correction based on $k < p$ leads to the construction of invalid intervals, and constitutes a misinterpretation of the Bonferroni procedure. The problem with both of these approaches is that they fail to take into account the selection mechanism, resulting in symmetric intervals centered at $X_{(j)}$, and therefore ignoring the possible bias of the naive estimators. We address these issues by considering the partition of the sample space induced by the order statistics and obtaining a closed form expression for the coverage probability, which allows us to construct asymmetric intervals.
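The failure of the traditional interval for $k = 1$ can be checked directly in the equal-means case. With $\sigma = 1$, the largest of $p$ independent standardized observations has cdf $\Phi(\cdot)^p$, so the coverage of $X_{(1)} \pm z_{\alpha/2}$ is $\Phi(z_{\alpha/2})^p - \Phi(-z_{\alpha/2})^p$. The sketch below evaluates this expression; the function name `traditional_coverage_iid` is an illustrative assumption.

```python
from statistics import NormalDist

def traditional_coverage_iid(p, alpha):
    """Exact coverage of X_(1) +/- z_{alpha/2} when theta_1 = ... = theta_p and sigma = 1."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    upper = std_normal.cdf(z)    # equals 1 - alpha/2
    lower = std_normal.cdf(-z)   # equals alpha/2
    # P(-z <= max_i Z_i <= z) = Phi(z)^p - Phi(-z)^p for iid standard normal Z_i.
    return upper ** p - lower ** p

if __name__ == "__main__":
    for p in (1, 2, 3, 10, 100):
        print(p, traditional_coverage_iid(p, 0.05))
```

For $p = 1$ and $p = 2$ the expression equals $1 - \alpha$ exactly, and for $p \ge 3$ it falls below the nominal level.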

Selecting the best population
We begin by considering the case $k = 1$. The coverage probability of interest is of the form $P(X_{(1)} - c\sigma \le \theta_{(1)} \le X_{(1)} + d\sigma)$, where $c, d > 0$ are constants. Typically, we are interested in the case $d \le c$ to correct for the selection bias. Assuming that the variance $\sigma^2$ is known, we define $Z_j = (X_j - \theta_j)/\sigma$ for $j = 1, \ldots, p$, and obtain
$$P(X_{(1)} - c\sigma \le \theta_{(1)} \le X_{(1)} + d\sigma) = \sum_{j=1}^{p} P(-d \le Z_j \le c, \; X_j = X_{(1)}). \qquad (1)$$
Hence, considering the transformation $T : (z_1, z_2, \ldots, z_p) \to (z, y_2, \ldots, y_p)$ with $z = z_1$ and $y_j = z_1 - z_j$ ($j = 2, \ldots, p$), we can write the first term of the sum in (1) in terms of the parameters $\Delta_{21}, \ldots, \Delta_{p1}$, where $\Delta_{ij} = \theta_i - \theta_j$. We obtain where, without loss of generality, we have taken $\sigma^2 = 1$ to ease the notation.
Noticing that, for fixed $z$, the integrals within the curly brackets { } are essentially the tail probabilities of a normal distribution centered at $z$, we can write where $\varphi(\cdot)$ denotes the probability density function (pdf) of the standard normal distribution. It follows that we can entirely describe the probability in (1) in terms of the differences $\Delta_{ij}$. Under this representation, the values of the coverage probability are completely determined by the relative distances between the population means $\theta_i$ ($1 \le i \le p$) and therefore, we can think of the coverage probability as a function $h = h(\Delta \mid c, d)$, where $\Delta$ denotes the vector of possible configurations of the $\Delta_{ij}$.
For instance, when p = 3, we can fully describe the problem in terms of the parameters Δ 21 and Δ 32 . Figure 2 shows the form of the coverage probability surface in terms of these parameters for two different values of c, d. If we further assume, without any loss of generality, that θ 1 ≤ θ 2 ≤ θ 3 , we can write the probability of interest as where Δ 21 , Δ 32 ≥ 0. Observe that a trivial lower bound for the probability in (2) is given by Φ(c) − Φ(−d), obtained by conveniently setting the values of Δ ij equal to 0 or ∞ in each term of the expression. In order to obtain a sharper bound we look at the behavior of the coverage probability in order to find its minimum. Looking at the partial derivatives, we obtain Combining terms and changing variables, we can rewrite (3) as ∂h/∂Δ 21 The values of c, d can be chosen so that D 1 and D 2 are both positive (see Lemma A.1 in the Appendix), and therefore, ∂h/∂Δ 21 > 0. A similar argument gives ∂h/∂Δ 32 > 0, and we obtain that the coverage probability is minimized at Δ 21 = Δ 32 = 0, or equivalently, at θ 1 = θ 2 = θ 3 . The general result follows from induction on p, leading to the following theorem: Theorem 1. Let X 1 , . . . , X p be independent random variables with X i ∼ N (θ i , σ 2 ) for i = 1, . . . , p. Then, for any c > 0 there exists 0 < d 0 ≤ c such that: for some 1 ≤ q ≤ p.
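The claim that the coverage probability is smallest when all means coincide can be explored numerically. The sketch below assumes the interval has the form $(X_{(1)} - c, X_{(1)} + d)$ with $\sigma = 1$, uses Monte Carlo for an arbitrary configuration of means, and compares it with the equal-means value $\Phi(c)^p - \Phi(-d)^p$, which follows from the cdf of the maximum of $p$ iid standard normal variables. The function names, the seed and the simulation size are illustrative assumptions, and the condition on $d$ required by Theorem 1 is not checked here.

```python
import random
from statistics import NormalDist

def coverage_selected_best(theta, c, d, n_sim=40000, seed=2):
    """Monte Carlo estimate of P(X_(1) - c <= theta_(1) <= X_(1) + d) with sigma = 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(t, 1.0) for t in theta]
        best = max(range(len(theta)), key=x.__getitem__)  # index of the largest observation
        hits += (x[best] - c <= theta[best] <= x[best] + d)
    return hits / n_sim

def coverage_equal_means(p, c, d):
    """Exact coverage when all means are equal: Phi(c)^p - Phi(-d)^p."""
    phi = NormalDist().cdf
    return phi(c) ** p - phi(-d) ** p

if __name__ == "__main__":
    c = d = 2.12
    print(coverage_equal_means(3, c, d))
    print(coverage_selected_best([0.0, 0.0, 0.0], c, d))
    print(coverage_selected_best([0.0, 0.0, 6.0], c, d))
```

In this symmetric example the simulated coverage for well separated means should exceed the value obtained with equal means, in line with the minimization argument above.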

Selecting the top k populations
For general k, the coverage probability of interest is of the form

C. Fuentes et al.
Observe that each term of the sum in (6) is determined by the joint distribution of X 1 , . . . , X p . For instance, when (j 1 , . . . , j k ) = (1, . . . , k), the corresponding piece of relevant probability is Combining (6) and (7), it is not difficult to obtain the following expression for the coverage probability where I j = {j 1 , . . . , j k } and I c j = {j k+1 , . . . , j p } are respectively the set of indices for the top k variables and the bottom p − k variables in the j-th arrangement.
Before we move forward, let us take a closer look at the formula in (8) and consider the case $p = 6$ and $k = 3$. Then, the sum will have $\binom{6}{3} = 20$ terms determined by the configurations
123|456 124|356 125|346 126|345 134|256 135|246 136|245 145|236 146|235 156|234
234|156 235|146 236|145 245|136 246|135 256|134 345|126 346|125 356|124 456|123
where the numbers to the left of the vertical line correspond to the indices of the set $I_j$ (the populations being selected in the respective arrangement) and the numbers on the right to the indices of the set $I_j^c$ (the non-selected populations). Observe that all the indices appear on the left side (and on the right side) the same number of times. Now, suppose that $\theta_1 \le \theta_2 \le \theta_3 \le \theta_4 \le \theta_5 \le \theta_6$ and let $\theta_6 \uparrow \infty$. Then, all the terms where population 6 is on the right side approach zero. On the other hand, for those terms where 6 appears on the left, the value of $\Phi(\min_{\ell = 1, \ldots, k}\{z_\ell + \theta_\ell\} - \theta_m)$ becomes independent of $\theta_6$. It follows that, as $\theta_6 \uparrow \infty$, the coverage probability is determined by the configurations
12|345 13|245 14|235 15|234 23|145 24|135 25|134 34|125 35|124 45|123
which correspond to the number of ways to choose 2 out of 5 populations.
Repeating the argument, but letting $\theta_5 \uparrow \infty$, we obtain the configurations 1|234 3|124 2|134 4|123, which are the possible ways to choose 1 out of 4 populations, and we know, from Section 2, that the minimum is reached at $\theta_1 = \theta_2 = \theta_3 = \theta_4$. To extend this argument to the general case ($1 \le k < p$), we first notice that the number of possible configurations is $\binom{p}{k} = \binom{p-1}{k} + \binom{p-1}{k-1}$, where $\binom{p-1}{k}$ is the number of times that any given index $i$ appears on the right side (population $i$ is not selected) and $\binom{p-1}{k-1}$ is the number of arrangements that have the index $i$ on the left side (population $i$ is selected).
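The counting argument can be verified by enumerating the arrangements. The sketch below lists every split of the indices into selected and non-selected sets and checks the counts used above; the function name `configurations` is an illustrative assumption.

```python
from itertools import combinations
from math import comb

def configurations(p, k):
    """All (selected, non_selected) index splits for choosing k out of p populations."""
    indices = range(1, p + 1)
    return [(sel, tuple(i for i in indices if i not in sel))
            for sel in combinations(indices, k)]

if __name__ == "__main__":
    p, k = 6, 3
    configs = configurations(p, k)
    selected_6 = [cfg for cfg in configs if 6 in cfg[0]]
    # Total number of arrangements, and those that keep population 6 on the left side.
    print(len(configs), comb(p, k))
    print(len(selected_6), comb(p - 1, k - 1))
```

For $p = 6$ and $k = 3$ this gives 20 arrangements, of which 10 have population 6 among the selected ones, matching the reduction to choosing 2 out of 5 populations.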
Due to the symmetry of the problem, we assume (without any loss of generality) that θ 1 ≤ . . . ≤ θ p . Also, for every j such that p ∈ I j we define where I(·) is the indicator function. Then, we can write and therefore At the same time, for every j such that p ∈ I c j , we have and consequently, as θ p ↑ ∞, the coverage probability converges to Integrating (9) with respect to z p , we obtain where the quantity in square [ ] brackets is exactly the coverage probability for selecting k − 1 out of p − 1 populations. Repeating the argument, but now letting θ p−1 ↑ ∞, we obtain Continuing this process (k − 1 times) until the sets I j consist of only one element, we find that the coverage probability converges to as θ p−k increases to infinity. Finally, we observe that the expression in square [ ] brackets in (10) corresponds to the coverage probability for selecting 1 out of p−k+1 populations, and we showed in Theorem 1 that the sum is minimized at θ 1 = . . . = θ p−k+1 whenever d > d 0 . We obtain the following general result that we prove in Appendix A.2. Theorem 2. Let X 1 , . . . , X p be independent random variables with X i ∼ N (θ i , σ 2 ) for i = 1, . . . , p. Then, for any c > 0 and 1 ≤ k ≤ p there exists

Post-selection confidence intervals
For $1 \le i \le p$, let $X_{i1}, \ldots, X_{in}$ be a random sample from a population $\pi_i$ with unknown mean $\theta_i$ and variance $\sigma^2$. We assume the populations $\pi_i$ are independent and normally distributed or that $n$ is large enough so we can safely apply the central limit theorem. In other words, we assume that for every $i$, the corresponding sample mean $\bar{X}_i = n^{-1} \sum_{j=1}^{n} X_{ij}$ follows a $N(\theta_i, \sigma^2/n)$ distribution. We now apply the bounds obtained in Section 2 to obtain the desired confidence intervals. These results are a direct consequence of Theorem 2.

Theorem 3. Let $0 < \alpha < 1$ and for $i = 1, \ldots, p$, suppose that $X_{i1}, \ldots, X_{in}$ is a random sample from a $N(\theta_i, \sigma^2)$ population, where $\theta_i$ is unknown, but the variance $\sigma^2$ is known. Then, simultaneous confidence intervals for $\theta_{(1)}, \ldots, \theta_{(k)}$, with a joint confidence coefficient of (at least) $1 - \alpha$ are given by
Corollary 1. Let $0 < \alpha < 1$ and for $i = 1, \ldots, p$, suppose that $X_{i1}, \ldots, X_{in}$ is a random sample from a $N(\theta_i, \sigma^2)$ population, where $\theta_i$ is unknown, but $\sigma^2$ is known. Then, a confidence interval for $\theta_{(1)}$, with a confidence coefficient of (at least) $1 - \alpha$ is given by
For given $0 < \alpha < 1$, the specific values of $c, d$ can be determined numerically using the conditions in Theorem 1, optimizing for the length of the intervals. For the case $k = 1$, we can use equation (1) in Corollary 1 to explore the asymptotic behavior of the cutoff value $c$. It can be shown that the length of the intervals grows approximately at the rate $2\sqrt{\log p}$ as $p$ increases. However, more interestingly, we observe that the limits of the intervals are determined by the cdf of the first order statistic of a random sample from a standard normal population and therefore, from extreme value theory results, we obtain where $a_p = (2 \log p - \log \log p - \log 4\pi)^{1/2}$ and $b_p = 1/a_p$ (see [1]). The problem of determining the asymptotic behavior of the intervals for the general case is more involved and will not be addressed in this note.
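As a minimal sketch of how the cutoffs might be computed for $k = 1$, the code below works with the equal-means coverage $\Phi(c)^p - \Phi(-d)^p$ on the standardized scale ($\sigma = 1$) and sets it equal to $1 - \alpha$. The function names `cutoff_c_given_d` and `symmetric_cutoff` are illustrative assumptions. The sketch does not verify the lower threshold on $d$ required by Theorem 1, so a value of $d$ chosen too small could invalidate the bound.

```python
from statistics import NormalDist

_N = NormalDist()

def cutoff_c_given_d(p, alpha, d):
    """Solve Phi(c)^p - Phi(-d)^p = 1 - alpha for c (equal-means configuration, sigma = 1)."""
    target = 1.0 - alpha + _N.cdf(-d) ** p
    if not 0.0 < target < 1.0:
        raise ValueError("d is too small for the requested level")
    return _N.inv_cdf(target ** (1.0 / p))

def symmetric_cutoff(p, alpha, tol=1e-10):
    """Bisection for c = d with Phi(c)^p - Phi(-c)^p = 1 - alpha."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The left-hand side increases from 0 to 1 as the cutoff grows.
        if _N.cdf(mid) ** p - _N.cdf(-mid) ** p < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    alpha = 0.05
    for p in (2, 3, 10, 100):
        bonferroni = _N.inv_cdf(1.0 - alpha / (2.0 * p))
        print(p, symmetric_cutoff(p, alpha), bonferroni)
```

In the symmetric case the resulting cutoff can be compared with the Bonferroni cutoff $z_{\alpha/(2p)}$ for all $p$ populations; for $p = 1$ and $p = 2$ it reduces to the usual $z_{\alpha/2}$.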

The unknown variance case
If the variance σ 2 is unknown, we need to estimate its value. We assume that we have an independent estimate s 2 of σ 2 , such that t = s/σ has a pdf f (t). In a normal experiment, where we observe a sample of size n from each population, s 2 can be taken as the pooled variance estimate, so that νs 2 /σ 2 ∼ χ 2 ν , a chi-square distribution with ν = p(n − 1) degrees of freedom.
For $k = 1$ we observe that, conditioning on $t = s/\sigma$, we can rewrite the coverage probability as a mixture and obtain where, for fixed $t$, the probability in the integrand is minimized by the expression in Corollary 1.
Using the same strategy, we can determine a lower bound for the general case and obtain the following result:
Theorem 4. Let $0 < \alpha < 1$ and for $i = 1, \ldots, p$, suppose that $X_{i1}, \ldots, X_{in}$ is a random sample from a $N(\theta_i, \sigma^2)$ population, where $\theta_i$ is unknown. If the variance $\sigma^2$ is unknown, consider the estimate of $\sigma^2$ given by the pooled variance $s^2 = \frac{1}{p(n-1)} \sum_{i=1}^{p} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2$. Then, simultaneous confidence intervals for $\theta_{(1)}, \ldots, \theta_{(k)}$, with a joint confidence coefficient of (at least) $1 - \alpha$ are given by
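The mixture representation can be sketched numerically for the equal-means configuration. The code below assumes an interval of the form $(X_{(1)} - cs, X_{(1)} + ds)$ on the standardized scale, so that the conditional coverage given $t = s/\sigma$ is $\Phi(ct)^p - \Phi(-dt)^p$, and integrates it against the density of $t$ implied by $\nu s^2/\sigma^2 \sim \chi^2_\nu$ using Simpson's rule. The function names, the integration range and the number of steps are illustrative assumptions.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def scale_ratio_pdf(t, nu):
    """Density of t = s/sigma for t > 0 when nu * s^2 / sigma^2 is chi-square with nu d.f."""
    if t <= 0.0:
        return 0.0  # the integrand below vanishes at t = 0 in any case
    log_f = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu)
             + (nu - 1.0) * math.log(t) - 0.5 * nu * t * t - math.lgamma(0.5 * nu))
    return math.exp(log_f)

def coverage_unknown_variance(p, nu, c, d, t_max=6.0, steps=2000):
    """Equal-means coverage of (X_(1) - c*s, X_(1) + d*s), averaged over t = s/sigma."""
    h = t_max / steps  # Simpson's rule needs an even number of steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        conditional = _N.cdf(c * t) ** p - _N.cdf(-d * t) ** p  # coverage given t
        total += weight * conditional * scale_ratio_pdf(t, nu)
    return total * h / 3.0

if __name__ == "__main__":
    for nu in (4, 20, 200):
        print(nu, coverage_unknown_variance(3, nu, 2.12, 2.12))
```

For $p = 1$ the computation reduces to a Student $t$ probability, which gives a simple check, and as $\nu$ grows the value approaches the known-variance coverage $\Phi(c)^p - \Phi(-d)^p$.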

Numerical studies
In order to evaluate the performance of the new intervals we need to look at width and coverage probability. For the case k = 1, we first look at the behavior of the actual confidence coefficient in terms of the total number of populations p. As we discussed in Section 2, the use of intervals of the form X (1) ± z α/2 σ (traditional intervals) or even using a Bonferroni correction based on k < p populations will result in intervals that have a poor performance in terms of coverage probability and do not deliver the desired confidence coefficient. This feature is particularly noticeable when all the populations have similar population means, in which case, the confidence coefficient of the procedures that do not take into account the selection mechanism approaches zero rapidly.
In terms of length, in order to maintain the desired confidence coefficient the intervals need to get wider as the number of populations increases. However, even for $c = d$ (the symmetric case), the new intervals perform better than Bonferroni. Venter's method produces asymmetric intervals that are substantially shorter than Bonferroni's, but by optimizing the length in our approach we can obtain asymmetric confidence intervals that are consistently shorter than Venter's while maintaining the desired confidence coefficient. Figure 3 shows a plot of the interval lengths against the number of populations for each one of these procedures for $\alpha = 0.05$. We observe that the new intervals are significantly shorter than those of the other methods, even for a small number of populations.
In applied situations, the population means $\theta_i$ ($1 \le i \le p$) will rarely be equal. If the largest $\theta_i$ is far from the rest, then the selected population is likely to be the population with the largest mean. When this is the case, the selection mechanism will have little or negligible effect on the outcome of the experiment and the traditional intervals will be adequate. On the other hand, by construction, the proposed intervals will have a larger confidence coefficient than the nominal level for any configuration of the $\theta_i$ that does not correspond to the iid case. Therefore, it is important to evaluate the performance of the intervals for different configurations of the population means. We observe that the confidence coefficient of the traditional intervals is below the nominal level even when the population means are relatively separate. At the same time, the new intervals will have a coverage probability closer to the nominal level than Bonferroni and Venter intervals for any configuration. In Table 1 we summarize some of these results obtained for the case $p = 6$. The first column shows the true values of the population means corresponding to independent normal populations with variance one, and the remaining columns correspond to the actual coverage probabilities for the traditional intervals, Venter's, Bonferroni's and the new procedure. Notice that even when the population means are relatively far apart (rows 5 and 6) the traditional intervals fail to maintain the nominal coefficient and that the new intervals perform better than Bonferroni and Venter's intervals for any configuration.
For $k > 1$ selected populations, the width of the intervals will also grow as $k$ and $p$ increase. Of course, for fixed $p$ the length of the new intervals will approach the length of Bonferroni's intervals when $k$ gets closer to $p$, but it is important to study the behavior of the new intervals as the number of populations $p$ increases and $k$ is fixed or increases at a certain rate. Numerical studies show that in either case, the new intervals will be shorter than Bonferroni's, for any value of $k$. In Figure 4 we show the results of one of these studies, where we considered the number of selected populations to be $k = 5$ (fixed) regardless of the total number of populations, and $k$ (variable) equal to 5% of the total number of populations $p$. In both cases we observe that the proposed new intervals are shorter than Bonferroni's intervals and the difference can be significant, even when the total number of populations is relatively small.
Regarding the coverage probability, we conducted numerical studies to determine the actual confidence coefficient of the intervals for different configurations of the population means. In Table 2 we summarize the results of one of these studies for the case $p = 6$. Again, the first column shows the true values of the population means corresponding to independent normal populations with variance one, and the remaining columns indicate the observed coverage probabilities for the corresponding number of selected populations $k$. The results are based on 10 replications of 25K simulations, with a reported standard error < 0.0001. We observe that the actual confidence coefficient is never below the nominal level and does not exceed that of Bonferroni's intervals. Interestingly, the coverage probabilities get closer to the nominal level when we have some separation between the true population means (rows 5 and 6). However, it is difficult to establish an accurate pattern, since the actual coverage probability depends not only on the configuration of the $\theta_i$, but also on $k$ and $p$, as can be seen in equation (8).

Discussion
A practitioner might wish to use the data to select scientifically relevant variables and then make inference on the selected ones. In this article we present a framework for post-selection inference by constructing simultaneous confidence intervals for the means of $k \ge 1$ selected populations. In this context, we were able to derive a closed form exact expression for the coverage probability of such intervals and minimize it to determine a lower bound. As a result, we are able to obtain simultaneous confidence intervals that maintain the nominal confidence coefficient and retain the frequentist interpretation. The proposed intervals take into account the selection mechanism and therefore their length increases as the number of populations grows. Furthermore, the approach allows us to construct asymmetric intervals that correct the selection bias of the order statistics used as point estimators and shorten the length of the intervals. For $k = 1$, the proposed intervals perform better than Venter's and Bonferroni's intervals in both length and coverage probability.
Although the assumptions of normality and equal variances are necessary in order to obtain the exact results in closed form, the results in Theorem 2 are still valid under some mild degree of asymmetry and therefore hold for a larger family of distributions. This, in conjunction with the length of the intervals, allows the methodology to perform well under mild violations of the assumptions.
Finally, we observe that for $k > 1$ we do not need to consider fixed common values of $c, d$ for all the intervals. This opens the door to investigating alternatives that optimize the length of the intervals obtained using this method and improve the results.
The approach of this article should be contrasted with the asymptotic approaches of Van de Geer & Dezeure [30] and Zhang & Zhang [34]. These articles approach the post-selection inference issue by first applying a lasso procedure to select relevant features and then using a debiased version of the lasso estimator to account for the inherent randomness in the selection procedure. Symmetric intervals are then constructed based on the scaled debiased lasso estimator. The confidence intervals we construct maintain exact, not asymptotic, coverage and do not depend on the implicit selection of tuning parameters.

A.1. Lemma in Theorem 1
We begin by proving the following lemma which may be of independent interest: Lemma A.1. Let Δ > 0 and suppose that f 1 (z) and f 2 (z) are integrable functions over the real line such that f 1 (z) > 0 is non-decreasing and f 2 (z) > 0 is concave and symmetric with respect to Δ/2 and non-decreasing in (−∞, Δ/2).
The inequalities are strict if the function f 1 (·) is strictly increasing in z.
Proof. Since $\Delta > 0$, cases (i) and (ii) are straightforward. For (iii), there are two possibilities for the intervals of integration, depending on whether they overlap or not. Consequently, if we denote by $R_1$ and $R_2$ the non-common regions of integration, there are two possible cases:
1) $R_1 = (-d, \Delta - d)$ and $R_2 = (c, \Delta + c)$
2) $R_1 = (-d, c)$ and $R_2 = (\Delta - d, \Delta + c)$
Regardless of the case, $R_1$ and $R_2$ are intervals of the same length. Also, since $f_2(z)$ is symmetric around $\Delta/2$, we have $f_2(-c) = f_2(\Delta + c)$ and therefore, $g(-c) \le g(\Delta + c)$. In fact, for any $0 < \delta < \mathrm{length}(R_1)$, we have $g(-c + \delta) \le g(\Delta + c - \delta)$ and therefore $\int_{R_1} g(z)\,dz \le \int_{R_2} g(z)\,dz$, which completes the proof for part (a) when $d_0 = 1$. Observe that the argument is still valid for $0 < d_0 < 1$ as long as $f_1(z)$ is strictly increasing in some interval contained in $R_2$. In such a case, the value of $d_0$ will depend on the rate at which $f_1(z)$ increases. For part (b) the inequalities reverse and the result follows, completing the proof of the Lemma.
Following a similar argument, cases (i) and (ii) in Lemma A.1 lead to the construction of one-sided intervals discussed in [32].
and $H_p(p) = M_p(p)$. Furthermore, $M_p(k)$ is convex as a function of $k$ and $H_p(k)$ is either convex, concave or changes from concave to convex. Regardless of the case, it is sufficient to prove that $H_p(p - 1) \ge M_p(p - 1)$ to obtain the result.