The Control of the False Discovery Rate in Fixed Sequence Multiple Testing

Controlling the false discovery rate (FDR) is a powerful approach to multiple testing. In many applications, the tested hypotheses have an inherent hierarchical structure. In this paper, we focus on the fixed sequence structure, where the testing order of the hypotheses has been strictly specified in advance. We are motivated to study such a structure since it is the most basic of hierarchical structures, yet it is often seen in real applications such as statistical process control and streaming data analysis. We first consider a conventional fixed sequence method that stops testing once an acceptance occurs, and develop such a method controlling the FDR under both arbitrary and negative dependencies. The method under arbitrary dependency is shown to be unimprovable without losing control of the FDR; unlike existing FDR methods, it cannot be improved even by restricting to the usual positive regression dependence on subset (PRDS) condition. To account for any potential mistakes in the ordering of the tests, we extend the conventional fixed sequence method to one that allows more acceptances, up to a pre-specified number. Simulation studies show that the proposed procedures can be powerful alternatives to existing FDR controlling procedures. The proposed procedures are illustrated through a real data set from a microarray experiment.


Introduction
In many applications of multiple testing, such as genomic research, clinical trials, and statistical process control, the hypotheses are so structured that they are to be tested in a particular sequence. This structure may be a natural one, as in Goeman and Mansmann (2008), where Gene Ontology imposes a directed acyclic graph structure onto the tested hypotheses, or it can be formed by using a data-driven approach for specifying the testing order of the hypotheses, as in Kropf and Läuter (2002), Hommel and Kropf (2005), Finos and Farcomeni (2011), etc. In some applications, it is not even possible to use conventional p-value based multiple testing methods, because of some inherent structure among the tested hypotheses. For example, the hypotheses associated with streaming data in sequential change detection problems (Ross et al., 2011) have a natural temporal structure, but none of the conventional methods, such as the stepwise procedures, which are applicable only when all of the p-values are available, can be used here, since the decision concerning a hypothesis has to be made even before the data associated with the remaining hypotheses are observed. Some progress has been made in testing structured hypotheses; however, it has primarily focused on controlling the familywise error rate (FWER). For instance, Mehrotra and Heyse (2004) developed methods for testing hypotheses with a specific hierarchical structure, where the structure is limited to only two levels. Yekutieli (2008) discussed a method that controls the FDR when the tested hypotheses have a general hierarchical structure; however, that method is shown to control the FDR only under independence.
The primary objective of this paper is to help advance the theory and methods for controlling the FDR when testing structured hypotheses. We do so by focusing on a structure where the hypotheses have a fixed pre-defined testing order, since this is the simplest of hierarchical structures, yet it is often seen in real applications such as clinical trials, statistical process control, and streaming data analysis. We will refer to such a structure as a fixed sequence structure throughout this paper. Very recently, several methods have been introduced for controlling the FDR while testing pre-ordered hypotheses. Farcomeni and Finos (2013) developed a 'single-step' FDR controlling method with a common critical value α, which tests each hypothesis at level α until a stopping condition is reached. Barber and Candes (2015), G'Sell et al. (2016), and Li and Barber (2016) developed several different 'step-up' FDR controlling procedures in the context of high-dimensional regression for testing hypotheses with a fixed sequence structure, for which hypotheses are tested from highest-ranked to lowest-ranked, and Lei and Fithian (2016) performed an asymptotic power analysis for such 'step-up' procedures. In addition, Javanmard and Montanari (2015) developed procedures for controlling the FDR in an online manner while testing a possibly infinite sequence of pre-ordered hypotheses.
In this paper, we develop 'step-down' FDR controlling methods that fully exploit the fixed sequence structural information, in which hypotheses are tested from lowest-ranked to highest-ranked. We first consider a conventional fixed sequence multiple testing method that keeps rejecting until an acceptance occurs, and develop such a method controlling the FDR under arbitrary dependence. It is shown to be optimal in the sense that it cannot be improved by increasing even one of its critical values without losing control of the FDR, not even by imposing a positive dependence condition on the p-values, such as the standard PRDS (positive regression dependence on subset) condition of Benjamini and Yekutieli (2001). This is different from what happens in the case of non-fixed sequence multiple testing. For instance, the so-called BY method of Benjamini and Yekutieli (2001), which controls the FDR under arbitrary dependence, can be significantly improved upon by the BH method of Benjamini and Hochberg (1995) once this PRDS condition is imposed. Since our procedure cannot be improved under positive dependence, we consider the case of negative dependence and develop a more powerful conventional fixed sequence multiple testing method controlling the FDR under negative dependence, which includes independence as a special case.
There is a potential loss of power in a conventional fixed sequence multiple testing method if the ordering of the hypotheses, particularly of the earlier ones, does not match that of their true effect sizes: an early hypothesis may be accepted, leaving the follow-up hypotheses no chance to be tested. To mitigate this, we generalize the conventional fixed sequence multiple testing method to one that allows more than one acceptance, up to a pre-specified number, and develop such generalized fixed sequence multiple testing methods controlling the FDR under both arbitrary dependence and independence.
It is not always the case in real data applications that the hypotheses will have a natural fixed sequence structure or that information about how to order them will be available a priori. Nevertheless, the data itself can often provide information on how to order the hypotheses. In this paper, we discuss such a data-driven ordering strategy, which can be applied to a broad spectrum of multiple testing problems, such as one-sample and two-sample t-tests, and one-sample and two-sample nonparametric tests. Through simulation studies and a real microarray data analysis, this strategy coupled with our proposed fixed sequence methods is seen to perform favorably against the corresponding non-fixed sequence methods under certain settings.
The paper is organized as follows. With some concepts and background information given in Section 2, we present the developments of our conventional and generalized fixed sequence procedures controlling the FDR under various dependencies in Sections 3 and 4, respectively. Our fixed sequence procedures, coupled with a data-driven ordering strategy for the hypotheses, are applied to a real microarray data set in Section 5. The findings from some simulation studies on the performances of our procedures are given in Section 6.
Some concluding remarks are made in Section 7 and proofs of some results are given in the Appendix.

Preliminaries
Suppose that H_i, i = 1, . . . , m, are the m null hypotheses that are ordered a priori and are to be simultaneously tested based on their respective p-values P_i, i = 1, . . . , m. Let m_0 and m_1 of these null hypotheses be true and false, respectively. For notational convenience, we denote the index of the i-th true null hypothesis by u_i and the set of indices of the true null hypotheses by I_0. Let V and S be the numbers of true and false null hypotheses, respectively, among the R rejected null hypotheses in a multiple testing procedure. Then, the familywise error rate (FWER) and false discovery rate (FDR) of this procedure are defined respectively as FWER = Pr(V > 0) and FDR = E[V/max(R, 1)]. Typically, the hypotheses are ordered based on their p-values and multiple testing is carried out using a stepwise or single-step procedure. However, when these hypotheses are ordered a priori and not according to their p-values, multiple testing is often performed using a fixed sequence method. Given a non-decreasing sequence of critical constants 0 < α_1 ≤ . . . ≤ α_m, a conventional fixed sequence method is defined as follows.

Definition 1 (Conventional fixed sequence method)
1. If P_1 ≤ α_1, then reject H_1 and continue to test H_2; otherwise, stop.
2. If P_i ≤ α_i, then reject H_i and continue to test H_{i+1}; otherwise, stop.
Thus, a conventional fixed sequence method continues testing in the pre-determined order as long as rejections occur. Once an acceptance occurs, it stops testing the remaining hypotheses. In Section 4, we will generalize a conventional fixed sequence method to allow a given number of acceptances. It should be noted that a conventional fixed sequence method with common critical constant α, which is often called the fixed sequence procedure in the literature, strongly controls the FWER at level α (Maurer et al., 1995). We will refer to it as the FWER fixed sequence procedure in this paper in order to distinguish it from other fixed sequence methods designed to control the FDR.
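As a concrete illustration, the stopping rule described above can be sketched in a few lines of Python; `fixed_sequence_test` is a hypothetical helper name, not code from the paper.

```python
def fixed_sequence_test(pvalues, alphas):
    """Conventional fixed sequence method: test the hypotheses in their
    pre-specified order, rejecting H_i when P_i <= alpha_i, and stop
    testing at the first acceptance."""
    rejected = []
    for i, (p, a) in enumerate(zip(pvalues, alphas)):
        if p > a:
            break  # first acceptance: stop testing the remaining hypotheses
        rejected.append(i)
    return rejected

# With a common critical constant alpha, this is the FWER fixed sequence
# procedure; here the first two hypotheses are rejected and testing stops
# at the third, so the small fourth p-value is never examined.
print(fixed_sequence_test([0.01, 0.02, 0.20, 0.01], [0.05] * 4))  # → [0, 1]
```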
Regarding the assumptions we make about the p-values in this paper, we assume that the true null p-values are marginally stochastically dominated by the uniform distribution; that is, for each i ∈ I_0,

Pr(P_i ≤ p) ≤ p for any p ∈ (0, 1). (1)
One of several types of dependence, such as arbitrary dependence, positive dependence, negative dependence, and independence, is typically assumed to characterize the dependence structure among the p-values. In particular, the true null p-values are said to be negatively dependent on one another if, for each i ∈ I_0 and any p ∈ (0, 1),

Pr(P_i ≤ p | P_j ≤ p_j, j ∈ I_0 \ {i}) ≤ Pr(P_i ≤ p) (2)

for all fixed p_j's.
Several multivariate distributions possess the conventional negative association property, including the multivariate normal with non-positive correlations, the multinomial, the Dirichlet, and the multivariate hypergeometric (Joag-Dev and Proschan, 1983). It is easily seen that independence is a special case of negative dependence.

Conventional Fixed Sequence Procedures
In this section, we present the developments of two simple conventional fixed sequence procedures controlling the FDR under both arbitrary dependence and negative dependence conditions on the p-values.

Procedure under arbitrary dependence
Since the FDR is more liberal than the FWER, a conventional fixed sequence method controlling the FDR under arbitrary dependence is expected to have critical values that are at least as large as α, the common critical constant of the FWER fixed sequence procedure. In the following, we present such a simple conventional fixed sequence FDR controlling procedure.

Theorem 3.1 (i) The conventional fixed sequence method with critical constants

α^(1)_i = mα/(m − i + 1), i = 1, . . . , m,

controls the FDR at level α under arbitrary dependence of the p-values.

(ii) One cannot increase even one of the critical constants α^(1)_i, i = 1, . . . , m, while keeping the remaining ones fixed, without losing control of the FDR. This is true even when P = (P_1, . . . , P_m) is assumed to be PRDS on P_0, the vector of true null p-values.
Proof of (i). Since u_1 is the index of the first true null hypothesis, the first u_1 − 1 null hypotheses are all false. Note that the event {V > 0} implies that S ≥ u_1 − 1 and R ≥ u_1, so that on this event V ≤ m − u_1 + 1 and hence V/(V + S) ≤ (m − u_1 + 1)/m. Therefore, we have

FDR ≤ ((m − u_1 + 1)/m) Pr(P_{u_1} ≤ α^(1)_{u_1}) ≤ ((m − u_1 + 1)/m) α^(1)_{u_1} = α,

where the second inequality uses (1) and the last equality uses α^(1)_{u_1} = mα/(m − u_1 + 1). For a proof of part (ii), see the Appendix. Although the procedure in Theorem 3.1 cannot be improved under the PRDS condition, in the next subsection we consider the condition of negative dependence, which includes independence as a special case, and under this condition develop a more powerful conventional fixed sequence method that controls the FDR.
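Numerically, the behavior of the critical constants is easy to inspect. The sketch below assumes they take the form α^(1)_i = mα/(m − i + 1), which is the form the bound in the proof of part (i) requires; this is an assumption to be checked against the exact statement of Theorem 3.1.

```python
m, alpha = 10, 0.05

# Assumed form of the critical constants of the arbitrary-dependence
# procedure: alpha_i = m * alpha / (m - i + 1), i = 1, ..., m.
crit = [m * alpha / (m - i + 1) for i in range(1, m + 1)]

# The constants are non-decreasing and start at the FWER fixed sequence
# level alpha, growing as more hypotheses are rejected.
assert all(a <= b for a, b in zip(crit, crit[1:]))
assert abs(crit[0] - alpha) < 1e-12
print([round(c, 4) for c in crit])
```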

Procedure under negative dependence
When the p-values are negatively dependent as defined in Section 2, the critical constants of the conventional fixed sequence procedure in Theorem 3.1 can be further improved, as in the following.

Theorem 3.2 The conventional fixed sequence method with critical constants α^(2)_i = iα/(1 + (i − 1)α), i = 1, . . . , m, controls the FDR at level α when the true null p-values are negatively dependent on one another.

To prove Theorem 3.2, we use the following lemma, whose proof is given in the Appendix.

Lemma 3.1 Let m_{0,i} and m_{1,i} respectively denote the numbers of true and false null hypotheses among the first i null hypotheses. Then, the FDR of any fixed sequence procedure can be expressed as a sum of terms d_i, i = 1, . . . , m, where each d_i involves only the first i p-values.

Proof of Theorem 3.2. Suppose first that u_1 = 1. The second equality follows from the fact that d_i = 0 for i = 1, . . . , u_1 − 1.
To see this, we consider, separately, the case when i ∈ I_0 and when i ∉ I_0. The first and second inequalities follow from (2) and (1), respectively. In the second equality, we use the fact that m_{0,i−1} + m_{1,i−1} = i − 1.
Applying (4) to (3) yields the desired bound. The equality follows from the fact that m_{1,u_1−1} = u_1 − 1, since the first u_1 − 1 hypotheses are false. When m_1/m approaches π_1 as m → ∞ with 0 ≤ π_1 < 1, a simple calculation gives an approximate lower bound on the FDR of this procedure, and this lower bound is very close to the pre-specified level α, so the procedure gives away little of the nominal level.

Generalized Fixed Sequence Procedures

A conventional fixed sequence method stops testing as soon as a single acceptance occurs, so one mistakenly ordered early hypothesis can cost all of the remaining rejections. To remedy this, we generalize the conventional fixed sequence method to one that allows a certain number of acceptances: the procedure keeps testing hypotheses until a pre-specified number of acceptances has been reached. The same idea has also been used by Hommel and Kropf (2005) to develop FWER controlling procedures in fixed sequence multiple testing.
Suppose k is a pre-specified positive integer and α 1 ≤ · · · ≤ α m is a non-decreasing sequence of critical constants. A fixed sequence method that allows more acceptances is defined below.
Definition 4 (Fixed sequence method stopping on the k-th acceptance)
1. If P_1 ≤ α_1, then reject H_1; otherwise, accept H_1. If fewer than k hypotheses have been accepted so far, continue to test H_2; otherwise, stop.
2. If P_i ≤ α_i, then reject H_i; otherwise, accept H_i. If fewer than k hypotheses have been accepted so far, continue to test H_{i+1}; otherwise, stop.

It is easy to see that when k = 1, the fixed sequence method stopping on the k-th acceptance reduces to the conventional one.

Theorem 4.1 The fixed sequence method stopping on the k-th acceptance, with critical constants α^(3)_i, i = 1, . . . , m, controls the FDR at level α under arbitrary dependence of the p-values.

Proof. Let U be the index of the first rejected true null hypothesis; if no true null hypothesis is rejected, set U = 0. We will show that, for i = 1, . . . , m_0, the contribution to the FDR of the event {U = u_i} is suitably bounded. If i ≤ k, this follows directly. Now, assume i > k. Note that the event {U = u_i} implies V ≤ m − u_i + 1 and S ≥ u_i − k, because the first u_i − 1 hypotheses were either false nulls or non-rejected true nulls, and among the first u_i − 1 hypotheses tested there can be at most k − 1 acceptances. The first inequality then follows from the fact that V/(V + S) is an increasing function of V and a decreasing function of S.
Here, the first equality follows from the fact that if none of the first k true null hypotheses is rejected, then V = 0.
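The stopping rule analyzed above can be sketched as follows; `fixed_sequence_test_k` is a hypothetical helper, and the common critical constant 0.05 is for illustration only.

```python
def fixed_sequence_test_k(pvalues, alphas, k):
    """Fixed sequence method stopping on the k-th acceptance: test the
    hypotheses in order and stop as soon as k of them have been accepted."""
    rejected, acceptances = [], 0
    for i, (p, a) in enumerate(zip(pvalues, alphas)):
        if p <= a:
            rejected.append(i)
        else:
            acceptances += 1
            if acceptances == k:
                break  # k-th acceptance: stop testing
    return rejected

# k = 1 reduces to the conventional fixed sequence method, while k = 2
# survives a single early acceptance caused by a badly ordered hypothesis.
print(fixed_sequence_test_k([0.20, 0.01, 0.02], [0.05] * 3, k=1))  # → []
print(fixed_sequence_test_k([0.20, 0.01, 0.02], [0.05] * 3, k=2))  # → [1, 2]
```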
We should point out that the result in Theorem 4.1 is weaker than that in Theorem 3.1, although the method in Theorem 4.1 reduces to that in Theorem 3.1 when k = 1. More specifically, we cannot prove that the procedure in Theorem 4.1 is optimal in the sense that its critical constants cannot be further improved without losing control of the FDR. Under independence of the p-values, however, the critical constants can be improved, as in the following theorem.

Theorem 4.2 The fixed sequence method stopping on the k-th acceptance, with critical constants α^(4)_i, i = 1, . . . , m, controls the FDR at level α when the p-values are independent.

We use the following two lemmas, with proofs given in the Appendix, to prove the theorem.

Lemma 4.1 The FDR of any fixed sequence method stopping on the k-th acceptance can be expressed exactly in terms of the indices J_i of the successive rejections.

Proof of Theorem 4.2. By Lemmas 4.1 and 4.2, we have the desired FDR control. It is also easy to see that, under such a configuration, when k = 2 the FDR of this procedure equals α/(2 − α) + α/2 > α.

Data Driven Ordering
The applicability of the aforementioned fixed sequence methods depends on the availability of a natural ordering structure among the hypotheses. When the hypotheses cannot be pre-ordered, one can in some cases use available pilot data to establish a good ordering among them. For example, in replicated studies, the hypotheses for the follow-up study can be ordered using the data from the primary study. However, when prior information is unavailable, ordering information can usually be assessed from the data itself. Such data-driven ordering has been used by several authors in fixed sequence multiple testing. The strategy proceeds in two steps:
1. Ordering statistics Y_1, . . . , Y_m, chosen to be independent of the corresponding test statistics under the null hypotheses, are computed from the data, and the hypotheses are ordered according to the values of these statistics.
2. The hypotheses are tested using a fixed sequence procedure based on the p-values P_1, . . . , P_m and the testing order established in Step 1.
We give a few examples to further illustrate the approach.
Example 1: One-sample t-test. Consider simultaneously testing H_i: µ_i = 0 against H'_i: µ_i ≠ 0, i = 1, . . . , m, where X_ij follows a N(µ_i, σ²) distribution. Let X̄_i = Σ_{j=1}^n X_ij/n and s²_i = Σ_{j=1}^n (X_ij − X̄_i)²/(n − 1) be the sample mean and variance, respectively, based on the observations X_i1, . . . , X_in. Let Y_i = Σ_{j=1}^n X²_ij be the ordering statistic; that is, the hypotheses are ordered according to the values of the corresponding sums of squares, since Y_i tends to increase with |µ_i|.
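The ordering step of Example 1 can be sketched with simulated data; the variable names and the specific means below are hypothetical, chosen only to show that the sum-of-squares statistic tends to rank the non-null coordinates first.

```python
import random

random.seed(1)

# Hypothetical data: m = 4 coordinates with n = 50 observations each;
# only the last two coordinates have nonzero means.
m, n = 4, 50
means = [0.0, 0.0, 1.0, 2.0]
X = [[random.gauss(mu, 1.0) for _ in range(n)] for mu in means]

# Ordering statistic of Example 1: Y_i = sum_j X_ij^2, which tends to
# increase with |mu_i| and, under normality, is independent of the
# one-sample t statistic when H_i is true.
Y = [sum(x * x for x in row) for row in X]
order = sorted(range(m), key=lambda i: -Y[i])
print(order)  # coordinates with larger |mu_i| tend to be ranked first
```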
Example 2: Two-sample t-test. Consider simultaneously testing H_i: µ^(1)_i = µ^(2)_i, i = 1, . . . , m, using n = n_1 + n_2 data vectors. Suppose X^(l)_ij, j = 1, . . . , n_l, follows a N(µ^(l)_i, σ²) distribution for l = 1, 2. Then, the hypotheses can be tested using the two-sample t-test statistics T_i and ordered through the values of the 'total sum of squares' statistics Y_i = Σ_l Σ_j (X^(l)_ij − X̄_i)², where X̄_i = Σ_l Σ_j X^(l)_ij/n, for i = 1, . . . , m. The rationale behind this is the independence between Y_i and T_i under H_i (see, for instance, Westfall et al., 2004), together with the fact that Y_i tends to be large when the means of the X^(1)_ij's and X^(2)_ij's differ.
When our proposed fixed sequence procedures are used in applications coupled with the aforementioned data-driven ordering strategy, FDR control is still maintained under the independence assumption, provided the ordering statistics are chosen to be independent of the corresponding test statistics, even though the same data are repeatedly used for ordering and testing the hypotheses. We have the following result: if, for each i, Y_i is independent of T_i under H_i and the data for different hypotheses are mutually independent, then each of the procedures in Theorems 3.1, 3.2, 4.1, and 4.2, applied with the data-driven ordering, controls the FDR at level α.

Proof. Assume without any loss of generality that Y_1 ≥ · · · ≥ Y_m, so that, conditional on the Y_i's, H_i is the i-th hypothesis to be tested in our fixed sequence multiple testing methods. When H_i is true, P_i is independent of both Y_i and X_j, j = 1, . . . , m with j ≠ i. This follows from the independence of the X_i's and that of Y_i and T_i under H_i. Thus, conditional on the Y_i's, each true null p-value P_i still satisfies (1) and is independent of all other p-values P_j with j ≠ i. Therefore, for each of the procedures in Theorems 3.1, 3.2, 4.1, and 4.2,

FDR = E[ E( V/max(R, 1) | Y_1, . . . , Y_m ) ] ≤ E[α] = α.

This proves the desired result.
We applied our proposed methods to the HIV microarray data (van't Wout et al., 2003) used by Efron (2008). These data consist of m = 7680 gene expression levels across eight subjects, four HIV infected and four uninfected. The data were log-transformed and normalized. Our goal is to determine which genes are differentially expressed by simultaneously testing H_i: µ^(1)_i = µ^(2)_i for i = 1, . . . , 7680, where µ^(1)_i and µ^(2)_i are the gene-specific mean expression levels for HIV infected and uninfected subjects, respectively.
We applied our proposed procedures with the p-values generated from two-sample t-tests for the genes. Since there is no natural ordering among the genes, we used the ordering statistics for two-sample t-tests described in Example 2 to order the tested hypotheses.
We compared the procedure in Theorem 4.2 with the BH procedure. The results are summarized in Table 1 for different values of k, with k/m = 0.05, 0.1, and 0.15. As seen from Table 1, for all values of k except k = 1, the procedure in Theorem 4.2 generally has more rejections than the BH procedure. When α is small, k/m = 0.05 tends to yield the most rejections, but for large α, k/m = 0.1 yields the most. We also compared the procedure in Theorem 4.1 with the BY procedure; the results are displayed in Table 2. As seen from Table 2, for most values of k, our procedure outperforms the BY procedure in terms of the number of rejections.

Simulation Study
A simulation study was conducted to assess the performances of the proposed procedures in Theorems 3.1, 3.2, 4.1, and 4.2. In each simulation, n independent m-dimensional normal random vectors with covariance matrix Σ and components Z_i ∼ N(µ_i, 1), i = 1, . . . , m, were generated. The p-value for testing H_i: µ_i = 0 vs. H'_i: µ_i > 0 was calculated using a one-sided, one-sample t-test for each i. The µ_i corresponding to each false null hypothesis was set to the value at which the power of the one-sample t-test at level 0.05 is 0.75. As for the joint dependence, we considered a common correlation structure in which Σ has off-diagonal components equal to ρ and diagonal components equal to 1.
We set α = 0.05 and m = 100. The hypotheses were ordered using the 'sum of squares' ordering of Example 1 from Section 5. We ran 5,000 simulations for each of the procedures considered and recorded, in each run, the false discovery proportion and the proportion of correctly rejected false null hypotheses. The simulated FDR and average power (the expected proportion of correctly rejected false null hypotheses) were obtained by averaging the corresponding 5,000 values.
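The equicorrelated covariance matrix Σ used in this design can be generated without any linear-algebra routines via a one-factor representation; the helper below is a hypothetical sketch, with an empirical check that the common correlation comes out near ρ.

```python
import random

random.seed(2025)

def equicorrelated_normal(m, rho):
    """One m-vector with unit variances and common correlation rho,
    using Z_i = sqrt(rho)*W0 + sqrt(1 - rho)*W_i with iid N(0, 1)
    variables W0, W_1, ..., W_m (valid for 0 <= rho < 1)."""
    w0 = random.gauss(0.0, 1.0)
    return [rho ** 0.5 * w0 + (1.0 - rho) ** 0.5 * random.gauss(0.0, 1.0)
            for _ in range(m)]

# Empirical check: the sample covariance of the first two coordinates
# approximates the correlation rho, since both variances are 1.
runs, rho = 20000, 0.5
pairs = [equicorrelated_normal(2, rho) for _ in range(runs)]
m0 = sum(a for a, _ in pairs) / runs
m1 = sum(b for _, b in pairs) / runs
corr = sum((a - m0) * (b - m1) for a, b in pairs) / runs
print(round(corr, 2))
```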
The simulated FDRs of the procedures are displayed in Figure 2. In terms of power, Figure 3 shows that the powers of all the procedures tend to increase over the corresponding powers shown in Figure 2.

Concluding Remarks
In this paper, we have developed 'step-down' procedures which control the FDR and exploit the structure of pre-ordered hypotheses. We have been able to produce the desired methods in the simplest as well as in more general settings, covering different dependence scenarios. Our simulation study and real data analysis show that, in some cases, the proposed procedures can be powerful alternatives to existing FDR controlling procedures.
Using some of the techniques developed in this paper, it is possible to develop other types of fixed sequence procedures controlling the FDR, such as a fallback-type procedure. Unlike the conventional and generalized fixed sequence procedures developed in this paper, the fallback-type procedure tests the remaining hypotheses no matter how many earlier hypotheses are accepted, which is needed for analyzing streaming data in sequential change detection problems.
Although we have only considered the simplest hierarchical structure, the fixed sequence structure, by using techniques similar to those presented in this paper we were able to develop simple and powerful procedures that control the FDR under various dependencies when testing multiple hypotheses with a more complex hierarchical structure. We plan to present these procedures in a future communication.
Appendix

By (7) and Lemma 4.1, the desired result follows.

Proof of Lemma 4.1
It is easy to see that the desired result follows.

Proof of Lemma 4.2
It is easy to see that, for i = 1, . . . , m, if there are at least i rejections, then i ≤ J_i ≤ min(i + k − 1, m), where J_i denotes the index of the i-th rejection. For ease of notation, let j*_i = min(i + k − 1, m). For i, j = 1, . . . , m, define the quantities f_i(j) and W_i(j) = 1{J_{i−1} ≤ j, J_i > j}. Regarding the relationship between J_i and W_i(j), two equalities, (8) and (9), are available. The first follows from the fact that, for i = 1, . . . , m and j = i, . . . , j*_i, when J_i = j there are i − 1 rejections among the first j − 1 tested hypotheses and the i-th rejection is exactly the j-th tested hypothesis. The second follows from the fact that the event {W_i(j) = 1} implies that there are exactly i − 1 rejections among the first j tested hypotheses, for j = i − 1, . . . , j*_{i−1}; here, the third equality follows from the fact that 1{P_j > α^(4)_j} = 1 − 1{P_j ≤ α^(4)_j}, and the fourth follows from (8).
By using the above two equalities, we can prove the two inequalities below, which are needed in the proof of this lemma. First, we show by using (8) that inequality (10) holds.

Proof of (10). We consider, separately, the case when j ∈ I_0 and when j ∉ I_0.
Suppose j ∈ I_0; then S_i = S_{i−1} and V_i = V_{i−1} + 1 when J_i = j. Using the fact that V_{i−1} + S_{i−1} = i − 1, the left-hand side of (10), after some algebra, reduces as required. The first equality follows from the fact that when J_i = j, i.e., the i-th rejection is exactly the j-th tested hypothesis, there are i − 1 rejections among the first j − 1 tested hypotheses. Now suppose j ∉ I_0; then S_i = S_{i−1} + 1 and V_i = V_{i−1}. Similarly, using the fact that V_{i−1} + S_{i−1} = i − 1, the left-hand side of (10), after some algebra, yields the stated bound. The inequality follows from the fact that j ≥ i, so that k − j + i ≤ k; in addition, in the last line we use the fact that V_{i−1} + S_{i−1} = i − 1.
Next, we show by using (9) that inequality (11) holds.

Proof of (11). By using (9), we have the desired result. Here, the second equality follows from the fact that if j*_{i−1} = m, then j*_i = m; otherwise, j*_{i−1} = i + k − 2 and j*_i = i + k − 1, so that f_{i−1}(j*_i) = 0. The fourth equality follows from (9) and the fact that W_i(i − 1) = 1{J_{i−1} = i − 1}. The inequality follows from the definition of f_{i−1}(j).
Finally, by combining the two inequalities (10) and (11), we obtain the desired result.