Variability and stability of the false discovery proportion

Much effort has been devoted to controlling the “false discovery rate” (FDR) when m hypotheses are tested simultaneously. The FDR is the expectation of the “false discovery proportion” FDP = V/R, the ratio of the number of false rejections V to the number of all rejections R. In this paper, we take a closer look at the FDP for adaptive linear step-up multiple tests. These tests extend the well-known Benjamini and Hochberg test by estimating the unknown number m0 of true null hypotheses. We give exact finite sample formulas for higher moments of the FDP and, in particular, for its variance. These formulas allow a precise discussion of the stability of the FDP, i.e., of when the FDP is asymptotically close to its mean. We present sufficient and necessary conditions for this stability. They include the presence of a stable estimator for the proportion m0/m. We apply our results to convex combinations of generalized Storey type estimators with various tuning parameters and (possibly) data-driven weights. The corresponding step-up tests allow a flexible adaptation. Moreover, these tests control the FDR at finite sample size. We compare these tests to the classical Benjamini and Hochberg test and discuss their advantages. MSC 2010 subject classifications: Primary 62G10; secondary 62G20.


Introduction
Testing m ≥ 2 hypotheses simultaneously is a frequent issue in statistical practice, e.g., in genomic research. A widely used criterion for deciding which of these hypotheses should be rejected is the so-called "false discovery rate" (FDR) promoted by Benjamini and Hochberg [3]. The FDR is the expectation of the "false discovery proportion" (FDP), the ratio of the number of false rejections V_m to the number of all rejections R_m. We call a multiple test procedure (FDR-)α-controlling for a pre-specified level α ∈ (0, 1) when FDR_m = E(FDP_m) ≤ α. Under the so-called basic independence (BI) assumption, which will be introduced in more detail below, the classical Benjamini and Hochberg linear step-up test, in short BH test, is α-controlling. In fact, there is an exact formula for its FDR, namely FDR_m = (m_0/m)α [3,19], where m_0 is the unknown number of true null hypotheses. In particular, if m_0/m is not close to 1, the BH test should be improved regarding a better exhaustion of the FDR level in order to achieve higher power. For this purpose, so-called adaptive procedures can be used. The basic idea is to estimate m_0 by an appropriate estimator m̂_0 in a first step and to apply the BH test at the data dependent level α̂ = (m/m̂_0)α in the second step. We can expect a better FDR exhaustion for a good estimator m̂_0 ≈ m_0 because, heuristically, FDR_m ≈ (m_0/m)α̂ ≈ α.
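The two-step idea above can be sketched in a few lines. The following Python snippet is an illustrative sketch only, not the paper's formal definition via critical values; the function name bh_stepup and the toy p-values are our own.

```python
def bh_stepup(pvals, alpha, m0_hat=None):
    """Linear step-up test: with p-values sorted increasingly, let
    R = max{i : p_(i) <= i * alpha / m0_hat} (R = 0 if no such i) and
    reject the hypotheses belonging to the R smallest p-values.
    m0_hat = m gives the classical BH test; plugging in an estimate
    of m0 gives the adaptive variant at level (m / m0_hat) * alpha."""
    m = len(pvals)
    if m0_hat is None:
        m0_hat = m  # classical BH test
    order = sorted(range(m), key=lambda i: pvals[i])
    r = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m0_hat:
            r = rank  # step-up: keep the largest qualifying rank
    return set(order[:r])

p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74, 0.97]
classical = bh_stepup(p, alpha=0.05)            # rejects {0, 1}
adaptive = bh_stepup(p, alpha=0.05, m0_hat=4)   # rejects {0, 1, 2, 3}
```

With m̂_0 = 4 < m = 7 the working level is enlarged to (7/4)·0.05, so two additional hypotheses are rejected in this toy example.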
Various estimators have been suggested in the literature [4,5,7,8,36,38,39,40,42]. Generalized Storey estimators with data dependent weights, discussed by Heesen and Janssen [24], will be our prime example in later discussions of our general results. These estimators lead to α-controlling procedures, whereas other approaches are often only asymptotically α-controlling, i.e., lim sup_{m→∞} FDR_m ≤ α. Sufficient conditions for estimators leading to (finite sample) α-controlling procedures can be found in Sarkar [32] and Heesen and Janssen [23,24]. Adaptive procedures are also used to obtain procedures controlling the family-wise error rate FWER_m = P(V_m > 0), another criterion for multiple tests; for details we refer to Finner and Gontscharuk [17] and Sarkar et al. [33] as well as the references therein.
Due to the additional estimation step, the variability of FDP_m is higher for adaptive procedures. This runs contrary to the actual idea of α-controlling methods, namely to ensure in a certain way that the proportion of false rejections FDP_m is small. In fact, methods are preferable for which the inequality FDP_m ≤ α + ε holds with high probability and small ε > 0. That is why we address the question for which adaptive procedures this property can be expected. Ferreira and Zwinderman [15] presented formulas for higher moments of FDP_m for the BH test and Roquain and Villers [31] did so for step-up and step-down tests with general (but data independent) critical values. We extend these formulas to adaptive procedures. In particular, we derive an exact finite sample variance formula for FDP_m. Combining this with Chebyshev's inequality, we obtain an upper bound for the undesired event of a relatively large FDP for α-controlling procedures: P_m(FDP_m ≥ α + ε) ≤ Var(FDP_m)/ε² for all ε > 0. (1.1) Under mixture p-value models Chi and Tan [11] already derived bounds and asymptotic results for P(V_m > αR_m), see also Chi [10]. In the spirit of (1.1), for good procedures we expect that the variance of FDP_m is small or even vanishes in the asymptotic set-up, i.e., if the number of hypotheses m tends to infinity. In the latter case, we say that FDP_m is stable. To be mathematically more precise, FDP_m is (asymptotically) stable if Var(FDP_m) = Var(V_m/R_m) → 0 as m → ∞. (1.2) Note that E(V_m/R_m) is not convergent in general but along appropriate subsequences. Along these subsequences V_m/R_m converges in probability to a constant under (1.2). Using our exact variance formula for FDP_m we determine sufficient and necessary conditions for stability in the sense of (1.2). We also treat the more challenging case of sparsity in the sense that m_0/m → 1 as m → ∞. This situation can be compared to the one of Abramovich et al. [1], who derived an estimator of the (sparse) mean of a multivariate normal distribution using FDR procedures.
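The role of Var(FDP_m) can be illustrated numerically. The following sketch (our own, with arbitrary parameters) draws FDP_m for the classical BH test under a Dirac-uniform configuration, where the exact FDR under BI is (m_0/m)α; the simulated mean and variance can then be plugged into a Chebyshev bound of the type (1.1).

```python
import random

def simulate_fdp(m0, m1, alpha, reps=2000, seed=1):
    """Monte Carlo draws of FDP_m = V_m / max(R_m, 1) for the classical
    BH test in a Dirac-uniform configuration DU(m, m1): m1 p-values of
    false nulls are fixed at 0, the m0 p-values of true nulls are i.i.d.
    uniform. Returns the empirical mean and variance of FDP_m."""
    rng = random.Random(seed)
    m = m0 + m1
    fdps = []
    for _ in range(reps):
        pv = [0.0] * m1 + [rng.random() for _ in range(m0)]
        true_null = [False] * m1 + [True] * m0
        order = sorted(range(m), key=lambda i: pv[i])
        r = 0
        for rank, i in enumerate(order, start=1):
            if pv[i] <= rank * alpha / m:
                r = rank
        v = sum(true_null[i] for i in order[:r])
        fdps.append(v / max(r, 1))
    mean = sum(fdps) / reps
    var = sum((x - mean) ** 2 for x in fdps) / reps
    return mean, var

mean, var = simulate_fdp(m0=80, m1=20, alpha=0.1)
# Under BI the exact FDR of the BH test is (m0/m)*alpha = 0.08; with
# mean <= alpha, Chebyshev gives P(FDP > alpha + eps) <= var / eps**2.
```

The empirical mean should be close to 0.08 here, and a small empirical variance makes the Chebyshev bound informative.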
In the asymptotic set-up, stochastic process methods were applied to study the asymptotic behavior of FDP_m and FWER_m; e.g., asymptotic confidence intervals were calculated [20,27,28,29]. Since FDP_m is an unobservable quantity in practice, its estimation is a further interesting topic. For various correlated test statistics, mainly normal and χ²-statistics, estimators of FDP_m and FDR_m were studied [13,30,35,39,41]. Outline of the results. In Section 2, we introduce the model as well as the adaptive step-up tests and, in particular, the generalized Storey estimators serving as our prime examples. Section 3 provides exact finite sample variance formulas for FDP_m under the BI model. Extensions to higher moments can be found in the appendix, see Section 9. These results are applied to the variability and the stability of FDP_m, see Section 4. Roughly speaking, we have stability if we have a stable estimator m̂_0/m ≈ C_0 and the number of rejections tends to infinity. Section 5 is devoted to concrete adaptive step-up tests mainly based on convex combinations of generalized Storey estimators with data dependent weights. We will see that stability cannot be achieved in general. Under mild assumptions the adaptive tests based on the estimators mentioned above are superior to the BH test: 1. The adaptive procedures exhaust the FDR level better while remaining α-controlling. 2. The corresponding FDP is stable whenever the FDP of the BH test is stable. In Section 6, we discuss least favorable configurations, which serve as useful technical tools. For the reader's convenience we add a discussion and summary of the paper in Section 7. All proofs are collected in Section 8.

The model and general step-up tests
Throughout the paper, let m_0 ≥ 1 be nonrandom. Similarly to Heesen and Janssen [24], our results can be extended to more general models with random m_0 by conditioning on m_0. By this modification our results can easily be transferred to familiar mixture models discussed, among others, by Abramovich et al. [1] and Genovese and Wasserman [20]. We study adaptive multiple step-up tests with estimated critical values extending the famous Benjamini and Hochberg [3] test. Let V_m be the number of falsely rejected null hypotheses and R_m the number of all rejections. Then the false discovery proportion FDP_m and the false discovery rate FDR_m are given by FDP_m = V_m/max(R_m, 1) and FDR_m = E(FDP_m). Good multiple tests like the BH test or the frequently applied adaptive test of Storey et al. [39] control the FDR at a pre-specified acceptance error bound α at least under the BI assumption; in short, we say that they are α-controlling.
In addition to this property, two further aspects of multiple procedures are of importance and discussed below: (i) To make the test sensitive for signal detection the FDR should exhaust the level α as well as possible. (ii) On the other hand, the variability of FDP_m is of interest in order to judge the stability of FDP_m.
For a large class of adaptive tests exact FDR formulas were established in Heesen and Janssen [24]. These formulas are now complemented by formulas for exact higher FDP moments and, in particular, for the variance. These results can be used to discuss conditions for (1.2). Throughout the paper, lim inf_{m→∞} FDR_m > 0 is assumed. Then the following condition is necessary for the stability of FDP_m: V_m → ∞ in P_m-probability as m → ∞. (2.4) As already stated, we cannot expect stability in general. In the following we discuss two negative results concerning the stability of the BH test.
see Finner and Roters [19] and Theorem 4.8 of Scheer [34]. The limit variable belongs to the class of linear Poisson distributions [12,18,25]. The requirement for stability will lie somewhere between DU(m, m_1) alternatives and the setting of Example 2.1(b), where the assumption m_1 → ∞ will always be needed. More information about DU(m, m_1) and least favorable configurations can be found in Section 6.

Our step-up tests and underlying assumptions
In the following we introduce the adaptive step-up tests. Let 0 < α < 1 be a fixed level. A tuning parameter λ ∈ [α, 1) is chosen such that no null H_{i,m} with p_{i,m} > λ should be rejected. For instance, it is uncommon to reject a null if the corresponding p-value is larger than λ_0 = 1/2; even a rejection when p_{i,m} > λ_1 = α is rather unusual. In this spirit, we split the range [0, 1] of the p-values into a decision region [0, λ], where we may reject the corresponding null hypotheses, and an estimation region (λ, 1], where we use the p-values to estimate m_0, see Figure 1. To be more specific, we consider estimators m̂_0 for m_0 which are measurable functions depending only on (F̂_m(t))_{t≥λ}, where F̂_m denotes, as usual, the empirical distribution function of the p-values; here we promote the use of the upper bound λ as Heesen and Janssen [24] already did. The following two quantities will be frequently used: R_m(λ) = m F̂_m(λ), the number of p-values in the decision region, and V_m(λ), the number of p-values of true nulls lying in the decision region. Throughout this paper, we investigate different mild assumptions. For our main results we fix the following two assumptions: (A1) m_0/m → κ_0 ∈ (0, 1] as m → ∞, and (A2) m̂_0 ≥ (α/λ) max{R_m(λ), 1}. If only 0 < lim inf_{m→∞} m_0/m is valid then our results apply to appropriate subsequences. The most interesting case is κ_0 > α since otherwise (if m_0/m ≤ α) the FDR can be controlled, i.e., FDR_m ≤ α, by rejecting everything. If (A2) is not fulfilled then consider the estimator max{m̂_0, (α/λ)R_m(λ)} instead of m̂_0. Note that both estimators lead to the same critical values (2.6) and, thus, it is irrelevant which of these two estimators is used. Consequently, (A2) is not a restriction for the practical application but improves the readability of our formulas significantly. Remark 2.2. Under (A2) the FDR of the adaptive multiple test was obtained for the BI model by Heesen and Janssen [24]. In particular, we obtain an upper bound which is always strictly smaller than 1 for finite m.
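The split of the p-value range can be sketched as follows; the helper name split_pvalues and the toy data are our own illustration.

```python
def split_pvalues(pvals, lam):
    """Split the p-values at the tuning parameter lam: R_m(lam) counts the
    p-values in the decision region [0, lam] (rejection candidates), the
    remainder lies in the estimation region (lam, 1] and is used only to
    estimate m0. Returns (R_m(lam), m - R_m(lam))."""
    r_lam = sum(p <= lam for p in pvals)
    return r_lam, len(pvals) - r_lam

p = [0.01, 0.04, 0.30, 0.55, 0.62, 0.81, 0.97]
r_lam, n_est = split_pvalues(p, lam=0.5)   # (3, 4): three rejection candidates
```

Only the second component, the count in the estimation region, feeds the estimators m̂_0 considered below.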
A prominent α-controlling adaptive test is based on the Storey estimator m̂_0^Stor(λ) = (m − R_m(λ) + 1)/(1 − λ). To obtain an estimator fulfilling (A2), we consider max{m̂_0^Stor(λ), (α/λ)R_m(λ)} instead. A refinement was established by Heesen and Janssen [24]: they introduced a couple of inspection points λ = λ_0 < λ_1 < · · · < λ_k ≤ 1 and corresponding estimators (2.11). Liang and Nettleton [26] already used the estimators (2.11) in another context. The Storey estimator can be rewritten as a convex combination of these estimators. Heesen and Janssen [23] allowed the weights to be data dependent and proved that the corresponding adaptive test is α-controlling under the BI assumption, see Proposition 2.3 below. In Section 5, we discuss the stability of FDP_m for these procedures.
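A Storey-type estimator and a convex combination over several inspection points can be sketched as follows. The exact displayed forms of (2.11) and (2.12) were lost in extraction, so this sketch uses the common (m − R_m(λ) + 1)/(1 − λ) variant with deterministic weights as an assumption; the paper additionally allows data-driven weights.

```python
def storey(pvals, lam):
    """Storey-type estimator of m0 at tuning parameter lam, using only the
    p-values in the estimation region (lam, 1]; the +1 is the usual
    finite-sample correction. (Common variant, not a verbatim transcription
    of the paper's display (2.11).)"""
    m = len(pvals)
    return (m - sum(p <= lam for p in pvals) + 1) / (1 - lam)

def combined_storey(pvals, lams, weights):
    """Convex combination of Storey-type estimators at several inspection
    points, in the spirit of the weighted estimators (2.12); here the
    weights are deterministic."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-12
    return sum(w * storey(pvals, l) for w, l in zip(weights, lams))

p = [0.05, 0.2, 0.6, 0.8, 0.9]
est = combined_storey(p, lams=[0.5, 0.8], weights=[0.5, 0.5])
```

For the toy data the two component estimators equal 8 and 10, so the equally weighted combination yields 9.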
The adaptive step-up test using the following estimator m̂_0 is α-controlling: (2.12) Finally, we want to present a sufficient condition for asymptotic α-control.
It should be mentioned that Proposition 2.4 is even valid for reverse martingale models, a huge model class including, among others, the BI model. Finner and Gontscharuk [17] proved asymptotic FWER-control under the same conditions.

Moments
This section provides exact second moment formulas for FDP_m = V_m/R_m for our adaptive step-up tests under a fixed regime P_m. Our method of proof relies on conditioning with respect to the σ-algebra F_{λ,m}. Conditionally on the (non-observable) σ-algebra F_{λ,m} the quantities m̂_0, R_m(λ) and V_m(λ) are fixed values, but only R_m(λ) = m F̂_m(λ) and m̂_0 are observable. The FDR formula (2.8) is now completed by an exact variance formula. The proof also offers a rapid approach to the known moment formulas of Ferreira and Zwinderman [15] for the Benjamini and Hochberg test (with m̂_0 = m and λ = α). Without loss of generality we reorder the p-values such that the p-values corresponding to true null hypotheses come first. Exact higher moment formulas are established in Section 9.

The variability and stability of FDP m
In this section, we use the exact variance formula to study conditions for the stability (1.2) of FDP_m. For this purpose we need a further mild assumption (A3): m̂_0 ≤ K m for some constant K > 0. We want to point out that any K > 0, not only K = 1, is allowed and, hence, Assumption (A3) is not a real restriction. Clearly, (A3) is fulfilled for all generalized weighted estimators of the form (2.12). Note that (A1) and (A3) imply lim inf_{m→∞} FDR_m > 0 and, hence, (2.4) is a necessary condition for stability in this case. Below, we give bounds for the variance of FDP_m = V_m/R_m depending on the leading term in the variance formula of Theorem 3.1.
The conditions (4.3) and (4.4) are competing. The choice m̂_0 = m (BH test) always fulfills (4.3), whereas a random m̂_0 may lead to more rejections, which is preferable regarding (4.4). In Example 4.3 below, we present a situation in which FDP_m is stable when using the Storey estimator but is not stable for the BH test. But first, we want to point out that (4.3) does not imply that the estimator m̂_0 is consistent for m_0, i.e., m̂_0/m_0 → 1 in probability. Consistent estimators only exist under strong additional assumptions, see, e.g., Genovese and Wasserman [20]. Although not being consistent, the usual (random) estimators m̂_0 fulfill the stability condition (4.3), see Section 5.1 for a detailed discussion concerning generalized Storey estimators. The results concerning the BH test can be motivated by Figure 2. For this purpose, recall that the BH procedure can also be formulated by using the Simes line (0, 1) ∋ t → f_α(t) = t/α. First, the empirical distribution function F̂_m of the p-values p_{1,m}, . . . , p_{m,m} and the Simes line f_α are compared. Let t* be their largest intersection point. Then all hypotheses with p_{i,m} ≤ t* are rejected. By the Glivenko-Cantelli Theorem, F̂_m converges uniformly to its limit F. It is easy to check that F and f_{1/2} have a non-trivial intersection point, namely t* = 1/3, whereas F and f_{1/4} intersect only at t* = 0. By the first observation it follows that the number of rejections tends in probability to ∞ for the BH test with α = 1/2. In Section 8, we give a rigorous proof that this is not the case when α = 1/4.
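The "largest crossing" formulation of the BH test can be computed directly from the order statistics; the following sketch (our own helper name) returns t* and thereby the BH rejection threshold.

```python
def simes_crossing(pvals, alpha):
    """Largest point t* at which the empirical distribution function of
    the p-values still lies on or above the Simes line f_alpha(t) = t/alpha.
    Only order statistics can be crossing points: p_(i) qualifies iff
    p_(i) <= i * alpha / m. The BH test rejects exactly {i : p_i <= t*}."""
    m = len(pvals)
    t_star = 0.0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * alpha / m:
            t_star = p
    return t_star

p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74, 0.97]
t_star = simes_crossing(p, alpha=0.05)   # 0.008: BH rejects two hypotheses
```

If no order statistic lies below the Simes line, t* = 0 and nothing is rejected, mirroring the trivial-intersection case in the discussion above.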
The previous example, in particular the part concerning the BH test, is in line with the results of Chi and Tan [11], see their Section 4.3. For a mixture p-value model they showed for the BH test that the number of rejections remains finite for all levels α smaller than some threshold α* ∈ (0, 1) and that the number of rejections tends to ∞ for all levels α > α*.
In the following we want to discuss the condition (4.4) more closely. Although R_m → ∞ implies R_m^{(1,λ)} → ∞, both in probability, the reverse is not obvious and may be false. But it is easy to see that, at least, E(R_m) → ∞ holds under stability: under (A1)-(A3), stability, i.e., Var(FDP_m) → 0, forces E(R_m) → ∞. Before we present the corresponding theorem, we recall that V_m and R_m depend, of course, on the pre-specified level α. That is why, for this theorem only, we use the notation V_{m,α} and R_{m,α}.

Various stability and instability results
To avoid the ugly estimator m̂_0 = (α/λ) max{R_m(λ), 1}, which could lead to rejecting all hypotheses with p_{i,m} ≤ λ, we introduce a further assumption (A4). Note that (A4) guarantees that (A2) holds at least with probability tending to one. The next theorem yields a necessary condition for stability. Ferreira and Zwinderman [15] found conditions such that R_m/m → C > 0 in P_m-probability. However, the sparse signal case κ_0 = 1 is more delicate since R_m/m always tends to 0, even for adaptive tests. Below, we discuss the convergence behavior of R_m more closely, in particular for the sparse case.
A detailed proof is given in Section 8. As already stated, stability only holds under certain additional assumptions. In the following we compare the stability of the classical BH test and of adaptive tests with appropriate estimators. Under some mild assumptions Lemma 5.5 is applicable to the weighted estimator (2.12), see Corollary 5.6(c) for sufficient conditions.

Combination of generalized Storey estimators
In this section, we become more concrete by discussing the combined Storey estimators m̂_0 introduced in (2.12). For this purpose we need the following assumption to ensure that (A4) is fulfilled.

Corollary 5.6. Let (A1), (A5) and all assumptions of Proposition 2.3 be fulfilled. Consider the adaptive multiple test with
with probability one for every m ∈ N. Moreover, assume that and all i = 1, . . . , k. If there is some j ∈ {1, . . . , k} and δ > 0 such that and for all i = 1, . . . , k. Moreover, suppose that for some j ∈ {1, . . . , k}, where β_0 := 0 =: λ_{-1}. Additionally, assume It is easy to see that the assumptions of (c) imply the ones of (b). Typically, the p-values p_{i,m}, i ∈ I_{1,m}, from the false nulls are stochastically smaller than the uniform distribution, i.e., P_m(p_{i,m} ≤ x) ≥ x for all x ∈ (0, 1) (with strict inequality for some x = λ_i). This may lead to (5.7) or (5.10).

Asymptotically optimal rejection curve
Our results can be transferred to general deterministic critical values (2.1), which are not of the form (2.6) and do not use a plug-in estimator for m_0. Analogously to Section 4 and Section 9, we define R_m^{(j)} for j ∈ {1, . . . , m_0} by setting j p-values from true null hypotheses to 0. By the same arguments as in the proof of Theorem 3.1 we obtain two moment formulas. The first formula can also be found in Benditkis et al. [2], see the proof of Theorem 2 therein. The proof of the second one is left to the reader. By these formulas we can now treat an important class of critical values given by min(a, b). These critical values are closely related to the asymptotically optimal rejection curve f_α defined by f_α(t) = t/(t(1 − α) + α), which is the optimal curve in terms of asymptotic power, i.e., there is no other curve being (asymptotically) α-controlling and having a higher power, see Finner et al. [16] for more details. Note that the case i = m is excluded on purpose because it would lead to α^{AORC}_{m:m} = 1. The remaining coefficient α^{AORC}_{m:m} has to be defined separately such that α^{AORC}_{m−1:m} ≤ α^{AORC}_{m:m} < 1, see Finner et al. [16] and Gontscharuk [21] for a detailed discussion. It is well-known that neither for (5.12) with b = 0 and a > 0 nor for (5.14) do we have control of the FDR by α over all BI models simultaneously. This follows from Lemma 4.1 of Heesen and Janssen [23] since α_{1:m} > α/m. However, Heesen and Janssen [23] proved that for all fixed b > 0, α ∈ (0, 1) and m ∈ N there exists a unique parameter a_m ∈ (0, b) such that the supremum of the FDR equals α, where the supremum is taken over all BI models P_m at sample size m. The value a_m may be found under the least favorable configuration DU(m, m_1) using numerical methods.
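Inverting f_α at the points i/m gives the AORC-induced critical values explicitly. The following sketch computes them; setting the last coefficient by repeating α^{AORC}_{m−1:m} is our own simplifying assumption, one admissible choice among the refined adjustments the text cites.

```python
def aorc_critical_values(m, alpha):
    """Critical values induced by the asymptotically optimal rejection
    curve f_alpha(t) = t / (t*(1 - alpha) + alpha): inverting f_alpha at
    i/m gives alpha_{i:m} = i*alpha / (m - i*(1 - alpha)) for i < m.
    Since i = m would give alpha_{m:m} = 1, the last coefficient must be
    chosen separately; here we simply repeat alpha_{m-1:m} (an assumption,
    not the paper's exact adjustment)."""
    cv = [i * alpha / (m - i * (1 - alpha)) for i in range(1, m)]
    cv.append(cv[-1])  # placeholder satisfying alpha_{m-1:m} <= alpha_{m:m} < 1
    return cv

cv = aorc_critical_values(10, 0.05)
# alpha_{1:10} = 0.05/9.05, strictly increasing up to the adjusted last value
```

Note that α_{1:m} = α/(m − (1 − α)) > α/m, which is exactly the source of the FDR-control issue mentioned above.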
By transferring our techniques to this type of critical values we get the following sufficient and necessary conditions for stability.
) is sufficient for stability and, moreover, FDR m → α holds.

Least favorable configurations
Below, least favorable configurations (LFC) are derived for the p-values (p_{i,m})_{i∈I_{1,m}} corresponding to false null hypotheses. Subsequently, we use "increasing" and "decreasing" in their weak form, i.e., equality is allowed, whereas other authors use "nondecreasing" and "nonincreasing" instead. When the deterministic critical values i → α_{i:m}/i are increasing, then the FDR is decreasing in each argument p_{i,m}, i ∈ I_{1,m}, for fixed m_1, see Benjamini and Yekutieli [6] or Benditkis et al. [2] for a short proof. In that case the Dirac uniform configuration DU(m, m_1), see Example 2.1, has maximum FDR; in other words, it is an LFC. Such least favorable configurations are useful tools for all kinds of proofs. Below, we condition on (p_{i,m})_{i∈I_{1,m}}. Due to the independence assumption in (BI2) we may write P_m = P_{0,m} ⊗ P_{1,m}, where P_{j,m} represents the distribution of (p_{i,m})_{i∈I_{j,m}}, and E(X | (p_{i,m})_{i∈I_{1,m}}) only depends on the portion of p-values p_{i,m} > λ, i ∈ I_{1,m}. While any deterministic convex combination of Storey estimators m̂_0^Stor(λ_i) fulfills (A6), it may fail for estimators of the form (2.11). But if the weights fulfill (5.6) then (A6) also holds for a convex combination (2.12) of these estimators. This follows from the other representation of the estimator (2.12) used in the proof of Corollary 5.6(c).

Discussion and summary
In this paper, we presented finite sample variance and higher moment formulas for the false discovery proportion (FDP) of adaptive step-up tests. These formulas allow a better understanding of the FDP. Among other things, the formulas can be used to discuss the stability of FDP_m, which is preferable in applications since the fluctuation, and hence the uncertainty, vanishes. We determined a sufficient and necessary two-part condition for stability: (i) We need a stable estimator in the sense that m̂_0/m − E(m̂_0/m) tends to 0 in probability. (ii) The p-values corresponding to false null hypotheses have to be stochastically small "enough" compared to the uniform distribution such that the number of rejections tends to ∞ in probability.
Since the latter is more difficult to verify, we gave a sufficient condition for it, see (5.2). This condition also applies to the sparse signal case m_0/m → κ_0 = 1, which is more delicate than the usually studied case κ_0 < 1. In addition to the general results we discussed data dependently weighted combinations of generalized Storey estimators. Tests based on these estimators were already discussed by Heesen and Janssen [24], who showed finite sample (FDR-)α-control. Heesen [22] and Heesen and Janssen [24] presented practical guidelines on how to choose the data dependent weights. For our results, the additional condition (5.6) is required. We briefly summarize the advantages of these tests in comparison to the classical BH test (see Corollary 5.6(c)): • The adaptive tests attain (if κ_0 = 1) or even exhaust (if κ_0 < 1) the (asymptotic) FDR level κ_0 α of the BH test. • Under mild assumptions, stability of FDP_m for the BH test always implies stability of FDP_m for the adaptive test.
In Section 5.2 we explained that our results can also be transferred to general deterministic critical values α i:m , which are not based on plug-in estimators of m 0 . The same should be possible for general random critical values under appropriate conditions. Due to lack of space, we leave a discussion about other estimators for future research.

Proof of Theorem 3.1
To improve the readability of the proof, all indices m are omitted, i.e., we write p_i instead of p_{i,m} etc. First, we determine E(FDP² | F_λ). Without loss of generality we can assume, conditionally on F_λ, that the first V(λ) p-values correspond to true nulls and p_1, . . . , p_{V(λ)} ≤ λ. In particular, we may consider p^{(1)} = (0, p_2, p_3, . . . , p_m) and p^{(2)} = (0, 0, p_3, . . . , p_m) if V(λ) ≥ 1 and V(λ) ≥ 2, respectively. Since α̂_{R:m} ≤ λ we deduce from (BI3) that It is easy to see that p_1 ≤ α̂_{R:m} implies R = R^{(1,λ)} and, thus, P(p_1 ∈ (α̂_{R:m}, α̂_{R^{(1,λ)}:m}]) = 0. Both facts were already known and used, for instance, in Heesen and Janssen [23,24]. Since p_1 and R^{(1,λ)} are independent conditionally on F_λ, we obtain from Fubini's Theorem that Hence, we get the second summand of the right-hand side in (a). To obtain the first term, it is sufficient to consider V(λ) ≥ 2. Since p_1, p_2 and R^{(2,λ)} are independent conditionally on F_λ we get, similarly to the previous calculation: which completes the proof of (a). Combining (a), (2.8) and the variance formula Var(Z) = E(Z²) − E(Z)² yields (b). The proof of (c) is based on the same techniques as the one of (a); to be more specific:

Proof of Lemma 4.1
To improve the readability, all indices m are omitted except for K_m. (a): By Theorem 3.1(b) and (A2) it remains to show that the term in question is smaller than 2/(λ(m_0 + 1)). It is known and can easily be verified that E((X + 1)^{-1}) ≤ ((n + 1)p)^{-1} for X ∼ B(n, p). From this and V(λ) ∼ B(m_0, λ) we obtain the desired upper bound, see also p. 47ff of Heesen and Janssen [24] for details. (b): We can deduce from (8.3) that Note that Thus, by Markov's inequality it remains to verify E(Y_λ) ≤ D_λ. We divide the discussion of E(Y_λ) into two parts. First, we use Hoeffding's inequality. Second, we can conclude from Jensen's inequality and Theorem 3.1.
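The binomial inverse-moment fact invoked above can be checked numerically; the parameter values below are arbitrary, chosen only for illustration.

```python
from math import comb

def inv_moment_binomial(n, p):
    """E(1/(X + 1)) for X ~ B(n, p), computed by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) / (k + 1)
               for k in range(n + 1))

# Closed form behind the inequality: E(1/(X+1)) = (1 - (1-p)^(n+1)) / ((n+1)p),
# hence E(1/(X+1)) <= 1/((n+1)p). With X = V(lambda) ~ B(m0, lambda) this
# yields the kind of bound used in the proof of Lemma 4.1.
n, p = 25, 0.3
exact = inv_moment_binomial(n, p)
closed = (1 - (1 - p) ** (n + 1)) / ((n + 1) * p)
```

The closed form follows from C(n, k)/(k + 1) = C(n + 1, k + 1)/(n + 1) and the binomial theorem.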
Finally, combining this with (4.1) yields the statement.

Proof of Theorem 4.2
Since V m /R m is bounded by 1 the stability statement in (a) is equivalent

Proof of Example 4.3
The statements concerning the Storey procedure follow immediately from Corollary 5.6(c). It remains to verify that FDP_m is not stable for the BH test when the underlying level is α := 1/4. In this case the Simes line is given by t → f_α(t) = 4t. Clearly, the Simes line lies strictly above F, the uniform limit of F̂_m, on (0, 1), see also Figure 2.
Then P m (α Rm:m ≤ λ 0 ) → 1 follows. That is why we can restrict our asymptotic considerations to the portion of p-values with p i,m ≤ λ 0 and the instability follows analogously to the proof of Theorem 5.1(b).

Proof of Theorem 4.4
Along an appropriate subsequence n(m) → ∞ we can always obtain We suppose, contrary to our claim, that V_{m,α} does not converge to ∞ in P_m-probability for some α ∈ (α_1, α_2). Since α → V_{m,α} is increasing, we can suppose without loss of generality that λ^{-1}αC ∉ Q (otherwise take a smaller α > α_1). By our contradiction assumption there is some k ∈ N ∪ {0} and a subsequence of {n(m) : m ∈ N}, which for simplicity we also denote by n(m), with n(m) → ∞ such that P_{n(m)}(V_{n(m),α} = k) → β ∈ (0, 1]. We can then deduce from (2.8) that the necessary condition (2.4) for stability is not fulfilled.

Proof of Lemma 5.2
Analogously to the proof of Theorem 5.1(b), we condition on F_{λ,m} and introduce the new p-values q_{i,R_m(λ)} and the new critical values, where t → f_{α̂_m}(t) := t/α̂_m is the corresponding Simes line.

Proof of Theorem 5.3
Clearly, all hypotheses with p_{i,m} ≤ t_m are rejected and, in particular, Combining this, (5.1), (5.2) and (8.6) yields Finally, the statement follows from V_m(t_m) ∼ B(m_0, t_m) and m_0 t_m → ∞.

Proof of Remark 5.4
By Theorem 5.3 it remains to show that the probability in question converges to 1. Note that the left-hand side of the last row converges in distribution to Z ∼ N(0, 1). Moreover, by straightforward calculations it can be concluded from (5.3) and C_0 ≥ κ_0 α that the right-hand side tends to −∞, which completes the proof.

Proof of Lemma 5.5
It is easy to see that (5.4) always holds if C_0 < 1. From (5.4) we obtain immediately that Moreover, we observe that the right-hand side only depends on those p*_i, i ∈ I_{1,m}, with p*_i > λ. Consequently, we obtain the statement.
(aii): Due to (ai) it remains to show that the conditional second moment is minimal under DU_cond(m, M_{1,m}(λ)). Clearly, BI and (A2) are also fulfilled conditionally on p*_{λ,m}. From Theorem 3.1(a) we obtain It is easy to see that V_m(λ) and m̂_0 are not affected and R_m^{(1,λ)} increases if we set all M_{1,m}(λ) p-values p_{i,m} ≤ λ, i ∈ I_{1,m}, to 0.
(bi): Since V_m(λ) is not affected by any p_{i,m}, i ∈ I_{1,m}, the first statement follows from (A6) and (2.8). If some p_{i,m} ≤ λ, i ∈ I_{1,m}, decreases, then V_m(λ) and m̂_0 are not affected, and R_m as well as R_m^{(1,λ)} increase. Hence, the second statement follows from Theorem 3.1(b).

Appendix: Higher moments
We extend the idea of the definition of p^{(1)} and p^{(2)} from the proof of Theorem 3.1. Remark 9.2. (a) If we set m̂_0 = m_0 and λ = 1 then this formula coincides, up to the factor C_{j,k}, with the result of Ferreira and Zwinderman [15]. By carefully reading their proof it can be seen that the coefficients C_{j,k} have to be added. It is easy to check that C_{1,k} = C_{k,k} = 1 but C_{j,k} > 1 for all 1 < j < k. In particular, the coefficients C_{1,1}, C_{1,2}, C_{2,2}, which are needed for the variance formula, are equal to 1. (b) For treating one-sided null hypotheses the assumption (BI3) needs to be extended to i.i.d. p-values (p_{i,m})_{i∈I_{0,m}} of the true null hypotheses which are stochastically larger than the uniform distribution, i.e., P(p_{i,m} ≤ x) ≤ x for all x ∈ [0, 1]. In this case the equality in Theorem 9.1 is not valid in general, but the statement remains true if "=" is replaced by "≤", analogously to the results of Ferreira and Zwinderman [15].
Proof of Theorem 9.1. For the proof we extend the ideas of the proof of Theorem 3.1. In particular, we condition on F_{λ,m}. First, observe that one can count the ways of picking elements from {M_1, . . . , M_j} such that every M_s, 1 ≤ s ≤ j, is picked at least once, see, e.g., (II.11.6) in Feller [14]. Consequently, we obtain from (BI3) that (9.1) equals