A Package to Study the Performance of Step-Up and Step-Down Test Procedures

The package can be used to analyze the performance of step-up and step-down procedures. It can be used to compare powers, calculate the “false discovery rate”, to study the eﬀects of reduced step procedures, and to calculate P [ U ≤ k ], where U is the number of rejected true hypotheses. It can be used to determine the maximum number of steps that can be made and still guarantee (with a given probability) that the number of false rejections will not exceed some speciﬁed number. The test statistics are assumed to have a multivariate- t distribution. Examples are included.


Introduction
There are many situations where a researcher is interested in the outcome of a family of tests. A classical example is the case where m experimental drugs are compared with a standard with respect to an outcome. Analysts are increasingly confronting large-mining environments, due to advances in computing facilities and data collection strategies. Multiple testing has increased in importance and micro-array experiments are common in such diverse areas as neuro-imaging, genomics and astronomy.
In multiple testing, it is important to control the probability of rejecting true hypotheses. A standard procedure has been to control the family-wise error rate (FWER), the probability of rejecting at least one true null hypotheses. For large numbers of hypotheses, using FWER can result in very low power for testing single hypotheses. Recently powerful multiple step methods have been developed for controlling the false discovery rate (FDR), that is, the expected proportion of type I errors. More recently Van der Laan, Dudoit, and Pollard (2004) proposed controlling a generalized family error rate k-FWER (also called gFWER(k)), defined as the probability of at least (k + 1) type I errors (k = 0 for the usual FWER). Lehmann and Romano (2005) suggested both a single step and a step-down procedure for controlling k-FWER. Somerville and Hemmelmann (2008) proposed limiting the number of steps in step-up or step-down procedures (reduced step procedures) to control k-FWER (and the proportion of false positives).

Step-down and step-up procedures
Step-down test procedures may be described as follows. Let t 1 , t 2 , . . . , t m be the test statistics corresponding to the null hypotheses H 1 , H 2 , . . . , H m . Denote by T i the random variable associated with t i .
Let T (1) ≤ T (2) ≤ · · · ≤ T (m) be the ordered values for the test statistics and denote the corresponding hypotheses H

Defining the problem
Assume there are m multivariately distributed test statistics. The n T test statistics corresponding to the true hypotheses have zero means, and the n F test statistics corresponding to the false hypotheses have means equal to δ. The test statistics corresponding to the true and false hypotheses have common correlations T and F respectively, with the common correlation between the true and false test statistics being C . Using Monte Carlo methods the program FDR generates n m-dimensional random vectors from the multivariate distribution, and for each uses the step-up or step-down procedure to determine the number of true and false hypotheses rejected using the designated test procedure. Powers, probabilities, etc., are thus estimated.

Using the package
The package is designed to study the performance of arbitrary step-up and step-down procedures. There are three modes for the Fortran program FDR. Mode 1 was designed to calculate three kinds of power (per pair, all pairs and any pair). See Horn and Dunnett (2004). In addition FDR is calculated. Using mode 1, the user specifies the number of steps to be used, and one or more series of ranges for nF. The output file PWR.out contains, for all the values for nF specified, the three kinds of power, and the false discovery rate (FDR). Somerville and Hemmelmann (2008) showed that, under fairly general conditions, when the means of test statistics corresponding to false hypotheses increase without limit, , as a function of nF, is minimized when nF is less than or equal to the number of steps minus 1. Extensive calculations, using a Fortran program BHmax with extended precision, strongly suggest, at least for the Benjamini and Hochberg (1995, BH for short) procedure, and most practical cases, that the minimum occurs when nF is exactly equal to the number of steps minus (k + 1). Mode 1 can be used to verify this for particular sets of values of m, nF, d, etc.
While mode 1 uses a fixed value for the number of steps, and one or more series of ranges of values for nF, mode 2 uses nF equal to the number of steps minus (k + 1), and one or more series of ranges for the number of steps.
A file CV.in must exist before executing FDR. The file must contain in non-decreasing order of magnitude, the m critical values for the procedure. A Fortran program makeCV, which calculates the ctitical values for the BH procedure, and for the step-down procedure of Lehmann and Romano (2005) is given in the supplementary materials. The user should be able to follow the instructions given in FDR.
The Fortran program FDR has been compiled using Lahey/Fujitsu Fortran 95 on Microsoft platforms XP, Vista and Windows 7, and using g95 on Microsoft platform Windows 7.

Selecting the value of kSTAR
If the object is to calculate the maximum number of steps such that P [U ≤ k] ≥ GAMMA, then the proper value for kSTAR is k. However the user may be interested in several values of k for several values of GAMMA. The minimum value of s is usually overestimated if kSTAR is chosen otherwise. The error is less serious if a lesser value is chosen for kSTAR. Especially for large values of m, kSTAR may be chosen slightly smaller with minimal error. Choosing kSTAR larger is usually more serious, with the error increasing much more rapidly than the choice of choosing the value slightly smaller. Example 3 in Appendix C gives a more detailed analysis.

Accuracy of estimated probabilities
The number of random m-dimensional vectors generated is given by n, a user input. Values of n of 100, 1000, 10000 and 100000 for the estimation of a probability of 0.95 result in standard errors of 0.0218, 0.0069, 0.0022 and 0.0007 respectively.

Miscellaneous
The following procedure is suggested when both one-sided and two-sided hypotheses are included. Calculate the p values corresponding to each test statistic, and then use the test statistic value which corresponds to that p value as a one-sided hypotheses.
Clearly the more steps that are used in a procedure, the more powerful the procedure will be. Mode 2 was added to the predecessor package to assist in determining the maximum number of steps which will still guarantee P [U ≤ k] ≥ GAMMA.
The value of n, the number of generations of m-dimensional random vectors from a multivariatet distribution, determines the accuracy of the estimates and the results. A useful technique is to begin a study with small values of n, but very large numbers of steps or numbers of false hypotheses. These preliminary runs can determine the values for the number of steps or false hypotheses in future runs where increased accuracies are needed.
The package has been used with the number of hypotheses as large as 1,000,000. With such large values for m, the value of n, the number of random generations of the m-dimensional vectors must be reduced so as to result in reasonable running times.
The original version of the program FDR was used to produce tables for two papers published in "Recent Developments in Multiple Comparison Procedures", Institute of Mathematical Statistics, one by Somerville (2004) and one by Horn and Dunnett (2004).

Summary and conclusions
The Fortran program FDR can be a very useful tool in the evaluation of step-up or step-down procedures. It can be used to compare competing procedures.
It is worth noting that, since single step procedures are special cases of multi-step procedures, all of the capabilities of the program apply to single step procedures. See Example 5 in Appendix C.
A. Corollary to Theorem 6.1, Somerville and Hemmelman Somerville and Hemmelmann (2008) proved the following: Theorem: Let d 1 ≤ d 2 ≤ · · · ≤ d m be the critical values corresponding to a step-up or step-down procedure used to simultaneously test m hypotheses. Let the random variable T i , associated with the test statistic t i be absolutely continuous or discrete, have mean µ i , and a distribution such that where a, µ and δ are arbitrary constants.
Let m = n T + n F , where n T and n F , respectively are the number of true and false hypotheses. Suppose T 1 , T 2 , . . . , T m have means µ 1 , µ 2 , . . . , µ nF , 0, . . . , 0, respectively. If U is the number of true hypotheses which are rejected, then, for the 1-sided case, We show that the following corollary is true.
Corollary: Let P * (n F , k, m) be the minimum value of P [U ≤ k] when there are m hypotheses, of which n F are false. Under the conditions of the above theorem, the minimum value of P * , as a function of n F , occurs for a value of n F which is never greater than s − 1.
of the s steps of an s-step procedure will result in the rejection of a false hypothesis and no true hypotheses can be rejected, i.e. P [U ≤ k] = 1 for all possible k values. Thus a minimum value of P * (n F , k, m), if one exists, cannot occur for a value of n F greater than s − 1.
Comments: The program FDR (mode 2) uses the value of n F as s − k − 1 in any calculations requiring P * (n F , k, m). The program can also be used (mode 1) to check that n F = s − k − 1 gives the smallest value of P [U ≤ k] for arbitrary k, m, s and critical values. (See example 2 in Appendix C.) Extensive calculations (primarily using critical values for the BH procedure) strongly suggest that, for most practical applications, P * (n F , k, m), as a function of n F , is minimized when n F = s − k − 1. Using a Fortran program BHmax with extended precision, no cases were encountered where a value of n F resulted in P [U ≤ k] less than its value when n F = s − k − 1.

B. Random multivariate normal vector generator
This appendix outlines the methodology used to construct an m-dimensional vector with the first n T components having mutual correlation T , and the last n F = m − n T components having mutual correlation F , and with the correlation between elements of the two sets of components having mutual correlation C .
For the i-th m-dimensional random vector, generate z T , z F and z C as follows: where rnor(nz, nw) are randomly generated N (0, 1) variates. (Note that the generation requires that C can never be larger than the smaller of T and F .) Generate n T random normal variates: Generate n F = m − n T random normal variates: .

C. Worked examples (tutorial)
This appendix illustrates how to use the Fortran program FDR for 6 different objectives. (A seventh possible objectivemfor the user is listed but not illustrated. The program is interactive and its use begins with the user typing FDR and pressing ENTER for each example. All the program responses and the user actions are replicated. Upon completeion of the calculations, FDR notifies the user of the output files which contain the results of the computations (e.g. The output file is AT1.out.) The appropriate output file is explicitly presented for the user.
The seven different objectives are: 1. Calculate powers and false discovery rate (FDR) for multiple step procedures.
2. Check that minimum of P [U ≤ k] occurs when nF = s − k − 1.
3. Determine the maximum number of steps so that P [U ≤ k] ≥ GAMMA.
4. Check accuracies when kSTAR is not chosen to equal k.
5. Calculate powers and false discovery rate (FDR) for single step procedures.
6. Examine effects of number of steps on powers and probabilities.
7. Compare two step-up or step-down procedures (not included).
In all cases, before executing the program FDR, the file CV.in must exist and contain the critical values for the step-up or step-down procedure to be analysed by FDR. The critical values must be arranged in non-descending order from smallest to largest.
A Fortran program makeCV, which can be used to give the critical values (in CV.in) for both the BH procedure and the step-down procedure of Lehmann and Romano (2005), is provided in the supplementary materials. Also included are the CV.in files needed for the examples in the appendix.
For all of the examples we assume the conditions for Theorem 6.1 of Somerville and Hemmelmann (2008), see Appendix A. Assume also that the means of the test statistics corresponding to a null hypothesis equal zero.

Example 1
Calculate powers and false discovery rate (FDR) for multiple step procedures.
(a) There are 50 null one-sided hypotheses to be simultaneously tested. Using the BH procedure with q = 0.05 = α, determine the per pair, all pairs and any pair power, and the actual false discovery rate (FDR) when there are 1, 11, 21, 31 or 41 false hypotheses, each of which are 3 standard errors greater than those of the true hypotheses (delta = 3). Use n = 100000 and seed = 757. Assume the test statistics have a multivariate-t distribution with degrees of freedom equal to infinity and rhoT = rhoF = rhoC = 0.0. We will need a file CV.in, containing the critical values (ascending order) for the BH procedure.
(a) Executing FDR, we obtain: TYPE 1 and press ENTER for MODE 1 TYPE 2 and press ENTER for MODE 2 TYPE 3 and press ENTER for MODE 3 1 In the next step user will need to give values for the following parameters. n number of random vectors used iseed seed used for generation, e.g. 757, 1234998 m number of hypotheses df degrees of freedom for test statistics use "-1" if infinity nbrsided use "1" for 1-sided tests use "2" for two-sided tests upordown use pos. integer or zero for stepup use neg. integer for down The program will make tables of P[U <= k] for values of k from kBEG to kEND in intervals of kINT Maximum number of tabulated k values in a run is 110! Give kBEG, kEND and press "ENTER" Warning:kEND must not be larger than m, the number of hypotheses to be tested. The file AT1.out is as follows: .000 1.000 1.000 1.000 1.000 1.000 1.000 950 0.949 0.994 0.999 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 999 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 Observe from AT1.out that P [U ≤ k] as a function of nF is ∪-shaped.
Executing the second time with nF ranging from 130 to 145, the file AT1.out is as follows: U is # of false rejections nF\k 1 2 3 4 5 6 7 8 9 10 11 From this AT1.out file, we observe that the value of P [U ≤ k] is never less than its value at nF = s − k −

Example 3
Determine the maximum number of steps so that P [U ≤ k] ≥ GAMMA. (a) Two approaches can be used. We can make a preliminary study using numbers of steps from 3 to 400 in intervals of 10, (for example) using a small value of n (say 1000). The subroutine POST will use linear interpolation to give preliminary values, and then we can make a second study using one or more series of values for the number of steps and a larger value of n, and all numbers of steps from 3 to 400 in intervals of 1, and use n = 100000. This is much more time consuming.
Be sure that CV.in has the critical values (non-decreasing order) for the desired procedure. The required file for the BH procedure can be found in the supplementary materials. The first step is to execute FDR, using mode 2, and delta = 10.
Executing FDR, using n=100000, we have: In the next step user will need to give values for the following parameters. n number of random vectors used iseed seed used for generation, e.g. 757, 1234998 m number of hypotheses df degrees of freedom for test statistics use "-1" if infinity nbrsided use "1" for 1-sided tests use "2" for two-sided tests upordown use pos. integer or zero for stepup use neg. integer for down GIVE the following PARAMETERS needed for FDRpwr (as integers) and TYPE ENTER n iseed m df upordown nbrsided 100000 757 1000 -1 1 1

FDR
Type a value for delta, the common expected value of the test statistics corresponding to the false hypotheses.
If you wish to have all false hypotheses rejected, TYPE "10" Press ENTER 10 rhoT is the correlation among TRUE test statistics rhoF is the correlation among FALSE test statistics rhoC is the correlation between TRUE and FALSE Give rhoT,rhoF and rhoC 0.0 0.0 0.0

Type a value for kSTAR
To give the most accurate computation of P[U <= k] use kSTAR=k. However, especially for large values of m, the value of P is only very slightly overestimated using values of k which are moderately larger than kSTAR. Using values for k smaller than kSTAR usually results in large overestimates of P. Thus a value of kSTAR greater than contemplated values of k is not recommended A value of kSTAR equal to 1 is often sufficient. For which values of k would you like tables? The program will make tables of P[U <= k] for values of k from kBEG to kEND in intervals of kINT Maximum number of tabulated k values in a run is 110! Give kBEG, kEND, kINT and press "ENTER" WARNING: kEND must not be larger than m, the number of hypotheses tested The program can calculate the largest number of steps such that P[U <= k] >= GAMMA. How many Gammas will you be using? The file MAXsteps.out is as follows: m = 1000 df= -1 rhoT= 0.00000000 rhoF= 0.00000000 rhoC= 0.00000000 seed= 757 n= 100000 1 -sided test STEP UP testing delta= 10.0000000 mode= 2 NOTE: A negative number (or zero) for the max number of steps (eg -xx) indicates an undetermined max less than xx. A number equal to nsEND usually indicates an undetermined max number of steps exceeding nsEND. However too small a value for n and or nsEND could be the cause. Examination of the appropriate AT file, or a rerun with a larger nsINT (quickest) or a larger value of n is suggested. nsEND= 400. Example 4 Check accuracies when kSTAR is not chosen to equal k.
We have demonstrated that P [U ≤ k] as a function of nF is ∪-shaped, and have proved that, under the conditions of Theorem 6.1 of Somerville and Hemmelmann (2008), the minimum occurs when nF is not greater than s−1, and assert that, in most practical cases, the minimum occurs when nF equals s − k − 1.
We have observed that the slope of P [U ≤ k] is steepest when nF is less than s − k, and is nearly flat for a large proportion of the range from nF = s − k to m. Table 1 illustrates the differences between the maximum number of steps such that P [U ≤ k] ≥ GAMMA when kSTAR = k, and when kSTAR is 5.
Note that, as previously indicated, using kSTAR larger or smaller, in FDR results in an overestimate of the number of steps which can be used, and still maintain P [U ≤ k] ≥ GAMMA. If kSTAR is less than k, the overestimate is usually slight, particularly if m or k or both are large. However, when kSTAR is greater than k the overestimate is much more serious, even when the difference is small.

Example 5
Calculate powers and false discovery rate (FDR) for single step procedures.
A single step procedure can be regarded as a multi-step procedure with s = 1. The following example uses FDR to determine the powers and false discovery rate (FDR) for the Bonferroni procedure. Note that, using s = 1, the file CV.in can use the critical values for the Holm procedure, the Lehman-Romano procedure with k = 1, or the BH procedure.
Problem: For m = 1000, q = 0.05 and delta = 3, find the powers and false discovery rate (FDR) of the Bonferroni procedure.
Use s = 1, and assume all correlations equal to 0.
Executing FDR we obtain the followimg: TYPE 1 and press ENTER for MODE 1 TYPE 2 and press ENTER for MODE 2 TYPE 3 and press ENTER for MODE 3

1
In the next step user will need to give values for the following parameters. n number of random vectors used iseed seed used for generation, e.g. 757, 1234998 m number of hypotheses df degrees of freedom for test statistics use "-1" if infinity nbrsided use "1" for 1-sided tests use "2" for two-sided tests upordown use pos. integer or zero for stepup use neg. integer for down Example 6 Examine effects of number of steps on powers and probabilities.
Compare the per pair power and P [U ≤ 1] for s = 1, 11, ..., 110 and 800 for values of nF = 1, 11, ..., 110, and 800. Use q = 0.05, m = 1000, n = 100000, delta = 3, rhoT = rhoF = rhoC = 0.0 for the BH procedure. We will need a run for each value of s. We illustrate for s = 1. Executing FDR, we obtain: TYPE 1 and press ENTER for MODE 1 TYPE 2 and press ENTER for MODE 2 TYPE 3 and press ENTER for MODE 3 1 In the next step user will need to give values for the following parameters. n number of random vectors used iseed seed used for generation, e.g. 757, 1234998 m number of hypotheses df degrees of freedom for test statistics use "-1" if infinity nbrsided use "1" for 1-sided tests use "2" for two-sided tests upordown use pos. integer or zero for stepup use neg. integer for down GIVE the following PARAMETERS needed for FDR (as integers) and TYPE ENTER n iseed m df upordown nbrsided 100000 757 1000 -1 1 1 For which values of k would you like tables? The program will make tables of P[U <= k] for values of k from kBEG to kEND in intervals of kINT Maximum number of tabulated k values in a run is 110! Give kBEG, kEND, kINT and press "ENTER" Warning: kEND must not be larger than m, the number of hypotenuses