On Stepwise Control of the Generalized Familywise Error Rate

A classical approach for dealing with the multiple testing problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of at least one false rejection. In many applications, one might be willing to tolerate more than one false rejection provided the number of such cases is controlled, thereby increasing the ability of the procedure to detect false null hypotheses. This suggests replacing control of the FWER by controlling the probability of $k$ or more false rejections, which is called the $k$-FWER. In this article, a unified approach is presented for deriving the $k$-FWER controlling procedures. We first generalize the well-known closure principle in the context of the FWER to the case of controlling the $k$-FWER. Then, we discuss how to derive the $k$-FWER controlling stepwise (stepdown or stepup) procedures based on marginal $p$-values using this principle. We show that, under certain conditions, generalized closed testing procedures can be reduced to stepwise procedures, and any stepwise procedure is equivalent to a generalized closed testing procedure. Finally, we generalize the well-known Hommel procedure in two directions, and show that any generalized Hommel procedure is equivalent to a generalized closed testing procedure with the same critical values.

procedures are proposed to control the $k$-FWER approximately, which account for the dependence structure of the individual test statistics or $p$-values. Their results were generalized in Romano and Wolf (2007). In van der Laan et al. (2004), alternative procedures controlling the $k$-FWER are provided by augmenting single-step and stepwise FWER procedures. Further methods are discussed in Dudoit et al. (2004) and van der Laan et al. (2005).
In numerous settings, it is easier to derive powerful procedures controlling the $k$-FWER than procedures controlling the popular false discovery rate (FDR). For example, suppose we are examining all pairwise comparisons in the one-way ANOVA model, in which the number of treatments is moderate. In this situation, the assumption of positive regression dependence of the underlying test statistics is not satisfied (Yekutieli, 2008), so the procedure of Benjamini and Hochberg (1995) is not applicable (Benjamini and Yekutieli, 2001), and no other FDR controlling procedure is available for this problem.
An alternative is to control the $k$-FWER, since powerful $k$-FWER controlling procedures that account for the special dependence structure of the individual test statistics can be developed relatively easily. Hence, in many applications, the $k$-FWER can be regarded as a good complement to the FWER and FDR. For further discussion of the $k$-FWER criterion, see Hommel and Hoffmann (1987), Korn et al. (2004), Lehmann and Romano (2005), and van der Laan et al. (2004). In addition, based on a rationale similar to that of the $k$-FWER, Sarkar (2008b) advocated the $k$-FDR, defined through the expected ratio of $k$ or more false rejections to the total number of rejections, which is a generalization of the FDR. Several procedures controlling the $k$-FDR have also been developed in Sarkar (2008b) and Sarkar and Guo (2008a, b).
In this paper, we focus on the control of the $k$-FWER. Instead of pursuing a piecemeal approach, we provide a unified approach for the construction of $k$-FWER controlling procedures based on marginal $p$-values. The main motivation comes from one particular paradigm of research on the FWER, where the well-known closure principle plays a fundamental role in the construction of FWER controlling procedures. We believe that a generalization of the closure principle will play a similar key role in the construction of $k$-FWER controlling procedures. To begin with, we generalize the closure principle, and then derive several general results on the relationship between generalized closed testing procedures and stepdown, stepup, and generalized Hommel procedures. As an application, we then show that the existing procedures can be derived directly from the generalized closure principle, and that they are equivalent to some generalized closed testing procedures.
This paper is organized as follows. In Section 2, we set up the terminology. A generalization of the closure principle and several global tests are provided in Section 3. In Section 4, we discuss the relationship between generalized closed testing procedures and stepdown and stepup procedures. Several general results are obtained. In Section 5, we generalize the Hommel procedure and show that generalized Hommel procedures are equivalent to generalized closed testing procedures with the same critical values. In Section 6, we offer some concluding remarks.
2. Basic Setting. Consider the problem of simultaneously testing a family of $n$ null hypotheses $H_1, \ldots, H_n$. Suppose that the family satisfies the free combination condition of Holm (1979), that is, for any $I \subseteq \{1, \ldots, n\}$, there exists a distribution $P \in \Omega$ for which all $H_i$, $i \in I$, are true and all $H_j$, $j \notin I$, are false, where $\Omega$ is the set of all possible distributions of the data.
Suppose $V$ is the number of true null hypotheses falsely rejected. The generalized familywise error rate ($k$-FWER) is defined to be the probability of at least $k$ false rejections, where $k$ is pre-specified with $1 \le k \le n$. That is, (2.1) $k\text{-FWER} = P\{V \ge k\}$. If $k = 1$, the $k$-FWER is the usual familywise error rate (FWER). When testing $H_1, \ldots, H_n$, we assume that the $p$-values $P_1, \ldots, P_n$ are available, and that the $p$-values associated with the true null hypotheses satisfy (2.2) $P\{P_i \le u\} \le u$ for any $u \in (0, 1)$.
There are two main avenues open for developing multiple testing procedures based on marginal $p$-values: stepup or stepdown. We generalize these procedures to accommodate control of the $k$-FWER. Let $P_{(1)} \le \cdots \le P_{(n)}$ denote the ordered $p$-values and $H_{(1)}, \ldots, H_{(n)}$ the corresponding null hypotheses. A (generalized) stepup procedure based on the critical values $\alpha_i$, $k \le i \le n$, which is slightly different from the usual one, is described below. If $P_{(n)} \le \alpha_n$, then reject all null hypotheses; otherwise, reject hypotheses $H_{(1)}, \ldots, H_{(r)}$, where $r \ge k$ is the smallest index satisfying $P_{(n)} > \alpha_n, \ldots, P_{(r+1)} > \alpha_{r+1}$. If, for all $r \ge k$, $P_{(r)} > \alpha_r$, then reject the first $(k - 1)$ most significant hypotheses.
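A minimal Python sketch may help fix the indexing of the generalized stepup rule. It assumes that the rule amounts to rejecting $H_{(1)}, \ldots, H_{(r)}$ for the largest $r \ge k$ with $P_{(r)} \le \alpha_r$, and only the $(k - 1)$ most significant hypotheses when no such $r$ exists. The function name `generalized_stepup`, the callable `alpha_seq`, and the numerical values are illustrative assumptions; the critical values shown are not claimed to give $k$-FWER control for a stepup procedure.

```python
def generalized_stepup(pvals, k, alpha_seq):
    """Return the set of (0-based) indices rejected by the generalized stepup rule.

    pvals     : list of marginal p-values P_1, ..., P_n
    k         : number of false rejections in the k-FWER (1 <= k <= n)
    alpha_seq : callable giving the critical value alpha_i for k <= i <= n
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # most significant first
    # Scan from the least significant p-value downwards, stopping at rank k.
    for r in range(n, k - 1, -1):
        if pvals[order[r - 1]] <= alpha_seq(r):
            return set(order[:r])
    # No rank r >= k satisfies P_(r) <= alpha_r: reject only the k - 1
    # most significant hypotheses.
    return set(order[:k - 1])


# Illustrative values only (n = 6, k = 2).
pvals = [0.001, 0.018, 0.019, 0.02, 0.2, 0.6]
stepup_rejections = generalized_stepup(pvals, 2, lambda i: 0.1 / (8 - i))
```

In this example the scan stops at rank 4, because $P_{(4)} = 0.02 \le 0.025$, so the four most significant hypotheses are rejected.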
Similarly, a (generalized) stepdown procedure, which is slightly different from the usual one, is described below. If $P_{(k)} > \alpha_k$, reject the first $(k - 1)$ most significant hypotheses. Otherwise, reject hypotheses $H_{(1)}, \ldots, H_{(r)}$, where $r \ge k$ is the largest index satisfying $P_{(k)} \le \alpha_k, \ldots, P_{(r)} \le \alpha_r$. Note that, if $k = 1$, the stepwise (stepup or stepdown) procedures described above are the same as the usual ones.
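The generalized stepdown rule can be sketched in the same way. The sketch assumes that the rule rejects $H_{(1)}, \ldots, H_{(r)}$ for the largest $r \ge k$ such that $P_{(j)} \le \alpha_j$ for every $k \le j \le r$. The name `generalized_stepdown` and the callable `alpha_seq` are hypothetical; the critical values $k\alpha/(n - i + k)$ in the example are those attributed later in the text to Hommel and Hoffmann (1987) and Lehmann and Romano (2005), used here with $\alpha = 0.05$.

```python
def generalized_stepdown(pvals, k, alpha_seq):
    """Return the set of (0-based) indices rejected by the generalized stepdown rule.

    pvals     : list of marginal p-values P_1, ..., P_n
    k         : number of false rejections in the k-FWER (1 <= k <= n)
    alpha_seq : callable giving the critical value alpha_i for k <= i <= n
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # most significant first
    r = k - 1  # the k - 1 most significant hypotheses are rejected automatically
    # Extend the rejection set while the next ordered p-value passes its threshold.
    while r < n and pvals[order[r]] <= alpha_seq(r + 1):
        r += 1
    return set(order[:r])


n, k, alpha = 6, 2, 0.05


def lr_crit(i):
    # Critical value k * alpha / (n - i + k) for rank i.
    return k * alpha / (n - i + k)


first = generalized_stepdown([0.001, 0.004, 0.012, 0.03, 0.2, 0.6], k, lr_crit)
second = generalized_stepdown([0.001, 0.018, 0.019, 0.02, 0.2, 0.6], k, lr_crit)
```

For the first vector the procedure stops at rank 4 and rejects three hypotheses; for the second it already stops at rank $k = 2$ and rejects only the single automatically rejected hypothesis.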
Evidently, from the definition of the $k$-FWER, one can always reject the $(k - 1)$ most significant hypotheses without violating control of the $k$-FWER. This is the reason why we give slightly different definitions of stepup and stepdown procedures, in which the $(k - 1)$ most significant hypotheses are automatically rejected. An alternative choice is to let $\alpha_i = \alpha_k$, $1 \le i < k$, as in Hommel and Hoffmann (1987) and Lehmann and Romano (2005).
For convenience of discussion, in the subsequent sections, all procedures, including the closed testing procedures described in Section 3 and the generalized Hommel procedures defined in Section 5, are also assumed to reject the $(k - 1)$ most significant hypotheses automatically.
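For any subset $I \subset \{1, \ldots, n\}$ with $|I| \ge k$, let $P_{i:I}$ denote the $i$th smallest of the $p$-values $P_j$, $j \in I$, and consider the local test $(I : \alpha_{i,|I|}, k \le i \le |I|)$, which takes the value 1 if $P_{i:I} \le \alpha_{i,|I|}$ for some $k \le i \le |I|$, and otherwise 0. Then, by Lemma 3.1, the Type I error rate of the local test satisfies the bound (3.2). Evidently, if the right side of (3.2) is bounded above by $\alpha$, then the local test is a level $\alpha$ test under arbitrary dependence of the $p$-values.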
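The local test can be written as a short Python function. This is a sketch under the assumption that the test rejects $H_I$ exactly when $P_{i:I} \le \alpha_{i,|I|}$ for some $k \le i \le |I|$; the names `local_test` and `crit` are illustrative. The example uses the critical constants $\alpha_{i,|I|} = k\alpha/|I|$, which appear later in the text, with $\alpha = 0.05$.

```python
def local_test(p_sub, k, crit):
    """Return True if the local test rejects the intersection hypothesis H_I.

    p_sub : p-values P_j for j in I (any order), with len(p_sub) >= k
    k     : number of false rejections in the k-FWER
    crit  : callable crit(i, m) giving the critical constant alpha_{i,m}
    """
    m = len(p_sub)
    ordered = sorted(p_sub)  # ordered[i - 1] is P_{i:I}
    return any(ordered[i - 1] <= crit(i, m) for i in range(k, m + 1))


def bonferroni_type(i, m, k=2, alpha=0.05):
    # Critical constants k * alpha / m, constant in i for a given m.
    return k * alpha / m


rejects = local_test([0.9, 0.01, 0.02], 2, bonferroni_type)
accepts = local_test([0.01, 0.5, 0.9], 2, bonferroni_type)
```

With $k = 2$ and $|I| = 3$ the threshold is $0.1/3$, so the first call rejects because the second smallest $p$-value is $0.02$, while the second call does not.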
After obtaining the symmetric families of local tests, we now generalize the usual closure principle for controlling the $k$-FWER.

Proof. Let $I_0$ be the set of indices of true hypotheses. Assume $|I_0| \ge k$, or there is nothing to prove. Define the event $A = \{V \ge k\}$. The occurrence of event $A$ implies that there exists $i \ge k$ such that the null hypothesis $H_{i:I_0}$ is rejected. From the description of the generalized closed testing procedure, rejection of $H_{i:I_0}$ implies that $H_{I_0}$ is rejected. Therefore, $k\text{-FWER} = P(A) \le P\{H_{I_0} \text{ is rejected}\} \le \alpha$.

In what follows, the closed testing procedures considered are always built on symmetric families of local tests characterized by the critical constants $\alpha_{i,|I|}$, $k \le i \le |I|$.

4. Stepwise Procedure. In this section, we discuss how to apply the generalized closure principle enunciated in Theorem 3.1 to derive stepwise (stepup or stepdown) procedures with the $k$-FWER controlling property.
It is generally not easy to show directly that a specific stepwise procedure has the $k$-FWER controlling property. Our strategy is instead first to build a closed testing procedure based on a family of level $\alpha$ local tests, and then to prove that the specific stepwise procedure is equivalent to or dominated by the closed testing procedure. We now qualify equivalence or dominance of two procedures (Liu, 1996; Grechanovsky and Hochberg, 1999).

with the $k$-FWER controlling property, which is a generalization of Holm's procedure (Holm, 1979). In this subsection, we provide a general result (The-

Proof. (i) We first show that, for any individual hypothesis $H_{(i)}$ with index $(i)$, which corresponds to the $i$th smallest $p$-value $P_{(i)}$, if it is rejected by the stepdown procedure, it is also rejected by the closed testing procedure.
If $i < k$, $H_{(i)}$ is automatically rejected by these two procedures. We assume $i \ge k$. If $H_{(i)}$ is rejected by the stepdown procedure, then $P_{(j)} \le \alpha_{k,(n-j)+k}$ for all $k \le j \le i$. Consider any subset $I \subset \{1, \ldots, n\}$ with $(i) \in I$, $|I| \ge k$, and $P_{(i)} \ge P_{k:I}$. Suppose $P_{k:I} = P_{(l)}$. Then $k \le l \le i$ and $|I| \le k + (n - l)$. Consequently, $P_{k:I} = P_{(l)} \le \alpha_{k,(n-l)+k} \le \alpha_{k,|I|}$, where the last inequality follows from the inequality $|I| \le k + (n - l)$. That is, $H_I$ is rejected. Then, from Theorem 3.1, $H_{(i)}$ is rejected by the closed testing procedure. Therefore, the stepdown procedure with the critical values $\alpha_{k,(n-i)+k}$, $k \le i \le n$, is dominated by the closed testing procedure.
(ii) We now show that, when $\alpha_{i,|I|}$, $k \le i \le |I|$, are constant for each given $|I|$, if $H_{(i)}$ is rejected by the closed testing procedure, it is also rejected by the stepdown procedure.
Let

We now focus on the converse of Theorem 4.1. In Theorem 4.2, we show that any stepdown procedure is equivalent to some closed testing procedure.
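The relationship between stepdown and closed testing procedures can be checked numerically on small families. The brute-force Python sketch below assumes that the generalized closed testing procedure rejects $H_i$ when every $H_I$ with $i \in I$, $|I| \ge k$, and $P_i \ge P_{k:I}$ is rejected by its local test, in addition to rejecting the $(k - 1)$ most significant hypotheses automatically. The names `local_rejects` and `closed_test` are illustrative, and the enumeration over all subsets is only feasible for small $n$.

```python
from itertools import combinations


def local_rejects(ordered, k, crit):
    """Local test on sorted p-values: some P_{t:I} <= alpha_{t,|I|} with t >= k."""
    m = len(ordered)
    return any(ordered[t - 1] <= crit(t, m) for t in range(k, m + 1))


def closed_test(pvals, k, crit):
    """Brute-force generalized closed testing; returns rejected (0-based) indices."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = set(order[:k - 1])  # automatic rejections
    for i in range(n):
        if i in rejected:
            continue
        all_rejected = True
        for m in range(k, n + 1):
            for subset in combinations(range(n), m):
                if i not in subset:
                    continue
                ordered = sorted(pvals[j] for j in subset)
                # Only subsets with P_i >= P_{k:I} matter for H_i.
                if pvals[i] < ordered[k - 1]:
                    continue
                if not local_rejects(ordered, k, crit):
                    all_rejected = False
                    break
            if not all_rejected:
                break
        if all_rejected:
            rejected.add(i)
    return rejected


k, alpha = 2, 0.05
pvals = [0.001, 0.004, 0.012, 0.03, 0.2, 0.6]
closed_rejections = closed_test(pvals, k, lambda t, m: k * alpha / m)
```

With these $p$-values and the constants $\alpha_{i,|I|} = k\alpha/|I|$, the stepdown thresholds $\alpha_{k,(n-i)+k} = k\alpha/(n - i + k)$ are approximately $0.0167$, $0.02$, and $0.025$ for $i = 2, 3, 4$, so the stepdown procedure stops at $i = 4$ and rejects the three most significant hypotheses, the same set that the brute-force closed testing procedure returns.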

Proof. (i) We show that, for any individual hypothesis $H_{(i)}$, if it is rejected by the stepup procedure, it is also rejected by the closed testing procedure.
If $i < k$, $H_{(i)}$ is automatically rejected by these two procedures. So, we assume $i \ge k$. If $H_{(i)}$ is rejected by the stepup procedure, then there exists $j \ge i$ satisfying $P_{(j)} \le \alpha_{k,(n-j)+k}$. Consider any subset $I \subset \{1, \ldots, n\}$ with $(i) \in I$, $|I| \ge k$, and $P_{(i)} \ge P_{k:I}$. Let $l = \max\{i' : P_{i':I} \le P_{(j)}\}$. Since $j \ge i$ and $(i) \in I$, $l \ge k$. If $l < |I|$, then $P_{l+1:I} > P_{(j)}$, so $|I| \le (n - j) + l$. Evidently, when $l = |I|$, it follows that $|I| \le (n - j) + l$, too. Hence, $P_{l:I} \le P_{(j)} \le \alpha_{k,(n-j)+k} \le \alpha_{l,(n-j)+l} \le \alpha_{l,|I|}$. The third inequality in the chain above follows from the assumption that $\alpha_{l,(n-i)+l}$ is increasing in $l$, and the last inequality follows from the inequality $|I| \le (n - j) + l$. Consequently, $H_I$ is rejected. Then, by Theorem 3.1, $H_{(i)}$ is rejected by the closed testing procedure. Hence, the stepup procedure with the critical values $\alpha_{k,(n-i)+k}$ is dominated by the closed testing procedure.
(ii) We show that when $\alpha_{l,(n-j)+l}$, $k \le l \le n$, are constant for each given

That is, there exists $j \ge i$ satisfying $P_{j:I} \le \alpha_{j,|I|}$. Since $P_{j:I} = P_{(i+j-k)}$, $\alpha_{j,|I|} = \alpha_{j,n-i+k}$, and $\alpha_{l,(n-i')+l}$, $k \le l \le n$, are constant for each given

We now focus on the converse of Theorem 4.3. In Theorem 4.4, we show that any stepup procedure is equivalent to some closed testing procedure.

(1986)'s test, which is more powerful than the stepup procedure of Hochberg (1988) (Hommel, 1989). Hommel's procedure is described as follows: compute $j = \max\{i \in \{1, \ldots, n\} : P_{(n-i+l)} > l\alpha/i \text{ for } l = 1, \ldots, i\}$. If the maximum does not exist, reject all $H_i$ ($i = 1, \ldots, n$); otherwise reject all $H_i$ with $P_i \le \alpha/j$ ($i = 1, \ldots, n$). In this section, we generalize Hommel's procedure in two directions. In one direction, we move from the critical values of Simes' test to any double-indexed critical values satisfying certain properties, and in the other, we move from the control of the FWER to that of the $k$-FWER.
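Hommel's procedure as described above translates directly into a few lines of Python. The sketch follows the stated rule literally; the function name `hommel` and the example $p$-values are illustrative.

```python
def hommel(pvals, alpha):
    """Return the set of (0-based) indices rejected by Hommel's procedure."""
    n = len(pvals)
    p = sorted(pvals)  # p[r - 1] is the ordered p-value P_(r)
    # Indices i for which P_(n-i+l) > l * alpha / i holds for every l = 1, ..., i.
    candidates = [
        i for i in range(1, n + 1)
        if all(p[n - i + l - 1] > l * alpha / i for l in range(1, i + 1))
    ]
    if not candidates:
        # The maximum does not exist: reject all hypotheses.
        return set(range(n))
    j = max(candidates)
    return {i for i, pv in enumerate(pvals) if pv <= alpha / j}


some_rejected = hommel([0.5, 0.01, 0.04, 0.03], 0.05)
all_rejected = hommel([0.01, 0.02, 0.03, 0.04], 0.05)
```

In the first call $j = 3$, so only the hypothesis with $p$-value $0.01 \le 0.05/3$ is rejected; in the second call no index qualifies and all four hypotheses are rejected.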
For convenience of discussion, a formal definition of a generalized Hommel procedure is first given as follows.

We now consider the case that $j$ exists. First, we show that, for any individual hypothesis $H_{(i)}$, if it is rejected by the generalized Hommel procedure, it is also rejected by the closed testing procedure.
If $H_{(i)}$ is rejected by the generalized Hommel procedure, then $P_{(i)} \le \alpha_{k,j}$.

Consequently, $H_I$ is rejected. If $|I| > j$, then from the definition of $j$, there exists $l$ satisfying $k \le l \le |I|$ such that $P_{(n-|I|+l)} \le \alpha_{l,|I|}$. Let $i_0 = \max\{i' \ge k : P_{i':I} \le P_{(n-|I|+l)}\}$. Note that $(|I| - k + 1) + (n - |I| + l) = n + l - k + 1 > n$, which implies that the maximum $i_0$ exists. From the definition of $i_0$, we have $P_{i_0+1:I} > P_{(n-|I|+l)}$, so $|I| \le n - (n - |I| + l) + i_0$. That is, $l \le i_0$. Therefore, $P_{i_0:I} \le P_{(n-|I|+l)} \le \alpha_{l,|I|} \le \alpha_{i_0,|I|}$. Hence, $H_I$ is rejected. Consequently, $H_{(i)}$ is rejected by the closed testing procedure. Therefore, the generalized Hommel procedure with the critical values $\alpha_{i,|I|}$ is dominated by the closed testing procedure.

Next, we show that, if $H_{(i)}$ is rejected by the closed testing procedure, it is also rejected by the generalized Hommel procedure.
(ii) Since $(I : \alpha_{i,|I|}, k \le i \le |I|)$ is a level $\alpha$ local test for each $I \subset \{1, \ldots, n\}$ with $|I| \ge k$, it follows from Theorem 3.1 that the closed testing procedure controls the $k$-FWER at level $\alpha$. Since the generalized Hommel procedure is equivalent to the closed testing procedure, the generalized Hommel procedure also controls the $k$-FWER at level $\alpha$.

For example, suppose $\alpha_{i,|I|} = \frac{k}{|I|}\alpha$. Let $\hat{j} = \max\{i \in \{k, \ldots, n\} : P_{(n-i+k)} > \frac{k}{i}\alpha\}$; then $\hat{j}$ can be expressed as $n - \tilde{j} + k$, where $\tilde{j} = \min\{j \in \{k, \ldots, n\} : P_{(j)} > \frac{k}{n-j+k}\alpha\}$. Note that $\tilde{j} - 1$ is the number of hypotheses rejected by the stepdown procedure of Hommel and Hoffmann (1987) and Lehmann and Romano (2005), and $(k - 1)$ is the maximal number of false rejections that one is willing to tolerate in this procedure. Thus we use $n - (\tilde{j} - 1) + (k - 1) = n - \tilde{j} + k = \hat{j}$ as an estimate of the number of true null hypotheses in the family of hypotheses $H_1, \ldots, H_n$. Hence, each generalized Hommel procedure with the critical values $\alpha_{i,|I|}$ can be interpreted as a two-stage procedure: first estimate the number of true null hypotheses by $\hat{j}$, and then, based on the estimate $\hat{j}$, apply a single-step procedure with the critical value $\alpha_{k,\hat{j}}$.
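The two-stage reading can be illustrated in Python for the special case $\alpha_{i,|I|} = k\alpha/|I|$. The sketch assumes, by analogy with Hommel's procedure, that all hypotheses are rejected when $\hat{j}$ does not exist, and that the $(k - 1)$ most significant hypotheses are always rejected; the function name `two_stage_k_hommel` is hypothetical.

```python
def two_stage_k_hommel(pvals, k, alpha):
    """Two-stage procedure for critical constants alpha_{i,m} = k * alpha / m.

    Stage 1 estimates the number of true nulls by
        j_hat = max{ i in {k, ..., n} : P_(n-i+k) > k * alpha / i }.
    Stage 2 rejects every hypothesis with p-value <= k * alpha / j_hat.
    Returns the set of rejected (0-based) indices.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    p = [pvals[i] for i in order]  # p[r - 1] is P_(r)
    candidates = [i for i in range(k, n + 1) if p[n - i + k - 1] > k * alpha / i]
    if not candidates:
        # Assumption: no estimate exists, so every hypothesis is rejected.
        return set(range(n))
    j_hat = max(candidates)
    rejected = set(order[:k - 1])  # automatic rejections
    rejected |= {i for i in range(n) if pvals[i] <= k * alpha / j_hat}
    return rejected


two_stage = two_stage_k_hommel([0.001, 0.004, 0.012, 0.03, 0.2, 0.6], 2, 0.05)
```

Here $\hat{j} = 4$, which matches $n - \tilde{j} + k$ with $\tilde{j} = 4$, and the single-step threshold $k\alpha/\hat{j} = 0.025$ rejects the three most significant hypotheses.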
6. Concluding Remarks. The original closure principle was formulated by Marcus et al. (1976) in the context of the FWER and has since been a powerful tool for deriving multiple testing procedures controlling the FWER. In fact, almost all FWER controlling procedures are either derived using this principle or can be rewritten as associated closed testing procedures. The only disadvantage is that closed testing procedures are computationally complex.
In this paper, we have generalized the closure principle for the $k$-FWER. In the same vein as the usual closure principle, the value of the generalized closure principle is that, instead of simultaneously testing multiple hypotheses, one only needs to carry out a number of single tests of intersection null hypotheses. The construction of single tests based on marginal $p$-values is relatively easy. The reason is that, as all $p$-values associated with true null hypotheses are marginally stochastically dominated by uniformly distributed random variables, some powerful probability inequalities on uniformly distributed random variables are available, such as the Bonferroni inequality, Simes' inequality, and the generalized Simes' inequality, which are useful in building single tests; see Sarkar (2008a).
The generalized closed testing procedures are also computationally complex. We have discussed how to reduce generalized closed testing procedures to simple stepwise procedures, and shown that, under certain conditions, a generalized closed testing procedure can be formulated as a stepwise procedure. We have also generalized the well-known Hommel procedure, and shown that each generalized closed testing procedure is equivalent to a simple generalized Hommel procedure with the same critical values.
We need to point out that, in this paper, we have only discussed how to derive $k$-FWER controlling procedures based on marginal $p$-values using the generalized closure principle. A direction for future research is to use this principle to develop resampling-based methods such as those described in Dudoit

We also need to note that the generalized closure principle is derived for families of non-hierarchical null hypotheses. In many cases, we need to test families of hierarchical null hypotheses simultaneously. An interesting direction for future research is to modify the generalized closure principle for families of hierarchical null hypotheses, extending the work of Hommel (1986) on the FWER to the $k$-FWER.