Admissible, consistent multiple testing with applications including variable selection

: For multivariate normal models and some exponential family models a multiple testing stepwise method is oﬀered that is both admissible and consistent. The method is readily adaptable to selecting variables in linear regression models where it is akin to the forward selection method plus a screening stage plus a sign compatibility stage.


Introduction
Multiple testing in general linear models is a long standing statistical practice. The subject has been given great impetus over the last two decades with applications arising in fields where the numbers of hypotheses to be tested can be extremely large (namely thousands). Such applications are needed in microarrays, astronomy, mutual fund evaluations, proteomics, disclosure risk, cytometry, imaging and others. Traditional methods, usually classified as single step methods, are deemed too conservative, particularly when the number of hypotheses to be tested is very large. To compensate, stepwise procedures were introduced, which enable more rejection of hypotheses. See, for example, Dudoit and Van Der Laan (2008) and Lehmann and Romano (2005) for description and some properties of single step and stepwise procedures. Many stepwise procedures are based on P-values derived from the marginal distributions of relevant test statistics that test an individual hypothesis. Oftentimes, these and other multiple testing procedures (MTPs) are focused on controlling some type of error rate, namely, the familywise error rate (FWER), weak or strong, or the false discovery rate (FDR). Oftentimes the error rate controlling procedures seek to have good average power. While focusing on error rate control of the overall procedure, sometimes the properties of the ensuing tests of each individual hypothesis is not given enough attention. It is true that the resulting test for an individual hypothesis resulting from an MTP can be complicated. Nevertheless, examining the properties of the individual tests can be important. In fact, a decision theory approach to multiple testing, focusing on individual tests, can suggest procedures that do well when evaluated by the expected number of type I errors and expected numbers of type II errors. This represents another way to evaluate MTPs. See Dudoit and Van Der Laan (2008) where attention is given to type I and type II errors as well as to error rate controlling procedures.
In a series of papers, Cohen and Sackrowitz (2005a,b, 2007 and Cohen, Kolassa, and Sackrowitz (2007) have shown that many of the standard stepwise procedures under a wide variety of assumptions, often turn out to be inadmissible for a variety of risk functions that involve both expected type I and expected type II errors. For exponential family models the inadmissibility is often shown to follow because in testing an individual hypothesis, relevant acceptance sections are not intervals. This represents a disturbing practical shortcoming of many of the usual procedures.
In response to the inadmissibility property and the fact that many stepwise procedures are based on the marginal distributions of test statistics, even when they are statistically dependent, Cohen, Sackrowitz, and Xu (2009) recommend a new MTP method called maximum residual down (MRD). MRD takes correlation into account and is admissible (in many exponential family models) for a risk function that focuses on expected type I and type II errors.
For some models the MRD method is not consistent. See Section 2 for a formal definition of consistency. Informally, consistency means that the probability of correctly testing each hypothesis tends to 1 as the sample size tends to infinity and as critical values tend to infinity at a certain rate.
In this paper we introduce a modification of MRD which offers a procedure that is admissible and consistent. The modification, discussed in Section 3, adds a screening stage following MRD, and a sign stage following the screening stage. The screening stage, which is intuitively desirable, enables the overall procedure to be consistent. Its addition however can alter the admissibility property of MRD. The sign stage restores the admissibility and retains the consistency property. Call the new procedure MRDSS.
For the most part we assume a multivariate normal model with unknown mean vector and a covariance matrix that is known or known up to a scalar multiple. The tests of hypotheses concern the components of the mean vector. The results also apply, with minor modifications, to many , but not all, exponential family models. For example, the results apply to testing differences of binomial parameters, Poisson parameters, or scale parameters of exponential distributions. Furthermore the MRD, screening, and sign stages suggest an approach with the potential to yield an admissible and consistent MTP for more complicated exponential family models. One such model entails testing whether all local log odds ratios are zero in an R × C contingency table. This latter problem is treated in the companion paper by Chen, Cohen, and Sackrowitz (2009).
In that paper all three stages require the development of nontrivial analogues of the MRD, screen, and sign stages that enable an admissible, consistent MTP to ensue. Furthermore, as in the multivariate normal case and other straightforward exponential family cases, the resulting MTP has other desirable properties. These include computational feasibility and as justified by simulation results, they do considerably better than typical stepwise procedures in terms of type I and type II errors over large portions of the parameter space.
We will apply the MRDSS method to selection of variable models in regression. For such models, the MRD method is shown to be equivalent to some forward method of selecting variables. See Miller (2002) for a discussion of forward methods. Both MRD and the forward method are stepwise methods based on what we define as residuals. Forward methods and hence MRD are typically not consistent. See, for example, An and Gu (1985). We show that MRDSS, and thus a modification of a forward method, with a screen and sign stage added, is admissible and consistent under modest and typical conditions on the critical values and design matrix.
The results will be proved assuming a full rank multivariate normal model with unknown mean vector and known covariance matrix while testing components of the mean vector. However, extension to the following cases, sometimes with modifications are possible: covariance equal to an unknown variance times a known positive definite matrix; testing functions of natural parameters of exponential family densities; regression models with suitable design matrices.
Through the first five sections the focus will be on the multivariate normal model. Definitions and other preliminaries are given in Section 2. MRDSS is described in Section 3. Section 4 is concerned with results on admissibility and consistency. A small simulation is also reported in Section 4. The simulation shows the improvement of MRDSS relative to the FDR controlling step-up procedure over a large portion of the parameter space. In Section 5 we demonstrate that MRD is equivalent to a forward method of variable selection in a linear regression model. Some remarks about the backward method are also made in Section 5. Section 6 contains extensions to some exponential family models. The proof of admissibility of MRDSS is in the Appendix.

Multivariate normal models
Let X α , α = 1, . . . , n, be a random sample of M × 1 vectors from a multivariate normal distribution whose mean vector is µ and whose covariance matrix is Γ. Sometimes Γ is known and sometimes Γ = σ 2 Σ where Σ is known and σ 2 is unknown. In the later case, is an unbiased estimator of σ 2 , which is independent ofX = n α=1 X α /n. For this model with Γ known and nonsingular the density ofX is We wish to test H i : µ i = 0 vs K i : µ i = 0, i = 1, 2, . . ., M . Sometimes one sided alternatives K i : µ i > 0, are of interest.
The general linear model has Y = Aβ + ǫ where Y is an n × 1 vector which is multivariate normal with mean vector Aβ and covariance matrix σ 2 I. A = (a 1 , . . . , a M ) is an n × M fixed design matrix and β is an M × 1 vector of parameters. If A has rank M , thenβ = ( which is called mean squared error and which is independent ofβ. Thus this general linear model reduces to our multivariate normal model with Γ = σ 2 S −1 . We wish to test H i : β i = 0 vs K i : β i = 0, i = 1, 2, . . . , M . For notational convenience we consider the model in (2.1) with n = 1 for all matters not concerning consistency. This is done without loss of generality. An MTP Φ(x) = (φ 1 (x), . . . , φ M (x)) ′ where φ i (x) is a test function for testing H i vs K i , i = 1, . . . , M . That is, φ i (x) is the probability of rejecting H i for the sample point x. A type I error is made if H i is rejected when µ i = 0 and a type II error is made if H i is not rejected and µ i = 0. The expected type I error is The risk function for the procedure Φ(x) is defined as the vector risk function with strict inequality holding for some i and some µ.
Now we return to the model in (2.1) for the case of n > 1. Consistency is defined in An and Gu (1985) and in Bunea, Wegkamp, and Auguste (2006). Let We say Φ is consistent when P {Î = I} tends to 1 when n tends to infinity.

MRDSS: Multivariate normal model
We start with the multivariate normal model with Γ = Σ = (σ ij ) known and nonsingular. Without loss of generality for now we let n = 1. (For n > 1,X would replace X where X is given in the next line.) is the (M − r − 1) × 1 vector of covariances between X j and all variables except X i1 , .., X ir and X j .
is the conditional variance of X j , given all variables except X i1 , . . . , X ir , X j . Now define the jth normalized residual at step m to be We note that if D m is the diagonal matrix of the terms in (3.1), then Let C 1 > C 2 > · · · > C M > 0 be a given set of constants. Then the MTP, called maximum residual down (MRD), is determined as follows: At step 1, consider U 1j (X), j ∈ 1, . . . , M. Let j 1 = j 1 (X) be such that |U 1j1 (X)| = max j |U 1j (X)|. If |U 1j1 (X)| < C 1 , stop and accept all H i . Otherwise reject H j1 and continue to step 2.
In general, at step m, m = 1, . . . , M , consider M − m + 1 functions If and when H i is rejected, say at step r, record the sign of U ri .
To add a screening stage to MRD, let C U > C L > 0 be two additional constants. Typically C L ≤ C M < C U . After MRD is done, each hypothesis is accepted or rejected.
Let H j1 , . . . , H jp be those hypotheses that are rejected. Should any . . , p, then reverse the reject decision to an accept decision. For those hypotheses that MRD accepted, say, H jp+1 , . . . , H jM , reject any for which To add a sign stage first consider only those hypotheses rejected by MRD, i.e., H j1 , . . . , H jp , and such that C L < |X ji |/ √ σ jiji < C U , i = 1, . . . , p. Then switch those to accept whenever the sign of X ji is different from the sign of U m * ji where m * is the step when H ji was rejected by MRD. The MRD procedure followed by a screening stage and a sign stage is called MRDSS.
First we apply MRD. At step 1 the residuals are: Since |U 11 | is the largest and |U 11 | > 2.3 = C 1 , MRD rejects H 1 and goes to the next step. Now with X 1 eliminated, Since |U 22 |, |U 23 | and |U 24 | are all less than C 2 = 2.0, MRD stops and accepts H 2 , H 3 , and H 4 .
Next we do the screen stage by comparing each |X i / √ 2| to C L = .6 and C U = 3.6. We see that Lastly we perform the sign stage. This applies only to hypotheses H i that were rejected by MRD and for which C L < |X i / √ 2| < C U . In this example only H 1 falls into this category. Upon examination we see that H 1 was rejected at the first step of MRD stage with U 11 = −2.46 while X 1 / √ 2 = .707. Since these signs are opposite we reverse the MRD decision to reject H 1 .
In conclusion MRDSS will accept H 1 , H 2 , H 3 and reject H 4 .
For the multivariate normal model where Γ = σ 2 Σ , σ 2 is unknown and n > 1, a modification of U mj is called for. The numerator would use the component ofX instead of X, and the denominator of U mj would be multiplied by the square root of an estimator of σ 2 divided by √ n. The estimator of σ 2 could be s 2 defined in Section 2 or it could be V = n α=1 x α Σ −1 x α /n, which is an unbiased estimator of σ 2 when µ = 0. Simular modifications would also be made in the screen stage determined by (3.3) and (3.4). The advantage of using V 1/2 is that both admissibility and consistency of the overall procedure are easily established. The advantage of using s is that the overall procedure is both location and scale invariant.

Admissibility and consistency of MRDSS
Let Φ M RDSS (X) be the MRDSS MTP for the multivariate normal model with σ 2 known. When studying admissibility, without loss of generality, n = 1. Proof. See Appendix.
Note that MRD has been proven to be admissible in Cohen, Sackrowitz, and Xu (2009).
Let Σ be positive definite and let g i be the i-th column of Σ. Then define the class of MTPs with: Property N : Suppose Ψ(X) is an MTP with individual test functions Ψ i (X). Suppose Ψ i (X), for every i, is such that the acceptance section along the line x + rg i is an interval. Then the MTP is said to have Property N. It is noted in the Appendix that Property N is necessary and sufficient for an MTP to be admissible.
Next we address the issue of consistency of MRDSS. We first note that MRDSS is consistent if and only if for each i, the individual test of hypothesis H i : µ i = 0 vs K i : µ i = 0 is consistent.
In discussing consistency the sample size n > 1 and the random vector X is replaced byX. The screen step of MRDSS becomes accept H i if , where now the critical values change with n.
The proof of Theorem 4.2, with appropriate modification would establish consistency of MRDSS in this case. That is, the statistic for the screen step ii . In (4.1) we would replace √ n(X i − µ i )/σ 1/2 ii by (t i − Et i )/ var(t i ). Using properties of the noncentral t distribution as given in Johnson and Kotz (1970), and analogues of (4.1) and (4.2) would establish consistency. Consistency also follows if √ n|X i |/V 1/2 σ 1/2 ii is used in the screen stage.
Remark 4.4. In the next section we will show that MRD is the same as a forward method in selection of variables in a multiple linear regression model. For this application the statistic used for the screening step is T i = |β i |/sλ 1/2 ii where λ ii is the i − th diagonal element of S −1 . Then for consistency we require that max λ ii (n) → 0 as n → ∞ in such a way that λ ii = O(1/n). A different requirement for λ ii can be made if C L and C U are chosen to tend to ∞ at a rate different from the one given in Theorem 4.2.
At this point we note that MRD alone is oftentimes not consistent. For a simple example we need only consider the model of the illustrative example of Section 3. We need only begin by thinking of the X i as sample means based on n independent observations and change the covariance matrix to 1 n Σ. Next we study the distribution of (U 11 , U 12 , U 13 , U 14 ) when the parameter point is µ = (0, r, r, r) ′ . From (3.5) it is clear the covariance matrix is symmetric and the mean vector is 4/5(−3r/4, r/2, r/2, r/2). Therefore, regardless of n, P µ |U 11 | = max(|U 11 |, |U 12 |, |U 13 |, |U 14 |) > 1/4. Thus, with probability > 1/4, the decision at the first step of MRD will be based on |U 11 |. If |U 11 | ≥ C 1 , H 1 is rejected and a type I error is made. If |U 11 | < C 1 , the process stops and all hypotheses, including H 2 , H 3 and H 4 are accepted. In this case three type II errors are made. Here for any n, the probability that some error will be made is > 1/4. We conclude this section with the results of a simulation. As mentioned in Cohen, Sackrowitz, and Xu (2009), the MRD procedure can be viewed as a family of procedures that is parameterized by a set of constants C 1 , . . . , C M . MRDSS has two more parameters: C L and C U .
The simulation concerns a multivariate normal model with a mean vector of order 1000 × 1 and a covariance matrix which is intraclass with equal diagonal elements and equal off-diagonal elements. The parameter points are such that at most 25 percent are non-zero. This is not atypical for many practical models when the number of hypotheses to be tested is very large. We perform MRDSS and an FDR controlling step-up procedure at FDR level .05. Type I and type II errors are compared as well as total number of mistakes. For MRD and MRDSS, we compute the percent decrease of total number of mistakes relative to the step-up procedure. For all simulations, we used 2000 iterations.

Equivalence of MRD and a forward selection method
For this section we assume the general linear model described in Section 2 with σ 2 known. We test H i : β i = 0 vs K i : β i = 0, i = 1, . . . , M . Let C 1 > C 2 > · · · > C M > 0 be a given set of constants.
Recall A = (a 1 , . . . , a n ) and let A be partitioned as (A 1 , A 2 ) where A 1 is n × (m − 1) and A 2 is n × (M − m + 1). Let V = R(A) be the linear space spanned by the columns of A and V 1 = R(A 1 ) be the linear space spanned by the columns of A 1 . Furthermore, let P V be the projection matrix for space V and let P 1 be the projection matrix for space (a 2·1m , . . . , a 2·1M ). Finally let P 2 denote the projection matrix for V 2 so that P V = P 1 + P 2 .
Forward selection methods are studied in Miller (2002) and many other texts studying linear regression. They are methods that select fixed variables to enter a linear regression model in steps. In the first step one considers the random variables (a ′ i y)/σ a i , i = 1, . . . , M . The forward method selects the fixed variable a i1 provided that If U 1i1 ≤ C 1 , no variables are chosen and the procedure stops. Now assume (m − 1) variables have been selected. Without loss of generality say they are (a 1 , . . . , a m−1 ) = A 1 . Then at stage m, the method selects a im provided If U mim ≤ C m , the procedure stops. Thus at step m, the forward method is tantamount to rejecting H im : β im = 0 by virtue of (5.2). Note that the vector of components a ′ 2·1j y in (5.2) can be written as A ′ 2·1 y. Now recall MRD applied toβ ∼ N (β, σ 2 S −1 ). At step one the numerator of the residual is proportional to Sβ = A ′ y, which is the vector of components in the expression in the numerator in (5.1). At step m, assuming H i : β i = 0, i = 1, . . . , m−1 have been rejected, MRD deletes the first m−1 rows and first m−1 columns of S −1 and considersβ 2 ∼ N (β 2 , Λ 22 ) where β 2 = (β m , . . . , β M ) ′ and S −1 = Λ 11 Λ 12 Λ 21 Λ 22 , with Λ 11 (m−1)×(m−1) and Λ 22 (M −m+1)×(M −m+1).
The residuals U m at this step are proportional to Λ −1 22β 2 . That is, Multiply both sides of (5.4) by P 2 to get A 2·1β 2 = P 2 P V y = P 2 y . (5.5) , we find using (5.3) and (5.5) that Note that the components of (5.6) are (a ′ 2·1 y), the unnormalized terms in the expression (5.2). Thus we have proved Theorem 5.1. For the assumption of the general linear regression model, the forward selection method is equivalent to MRD.
Remark 5.2. Whereas the forward method and MRD are oftentimes not consistent procedures, MRDSS and thus the forward method plus screening, plus sign is consistent. It is interesting to note that Christensen (1987) p.288, claims that the precise screening step we recommend is an improvement on the forward method.
The backward method starts by assuming a linear regression model with M parameters. At step 1, one can decide that Otherwise the procedure stops and no variables are eliminated.
If min m≤j≤M of the terms in (5.7) ≤ C M −m+1 , and the index is i m then decide β im = 0 and continue. Otherwise stop.
Remark 5.3. The backward method is consistent under mild conditions (see An and Gu, 1985). Nevertheless the backward method is akin in some sense to the step-up method of multiple testing. They are identical at the first step. As previously mentioned step-up methods are often inadmissible in normal and exponential family models and it can be shown that the backward method would sometimes be inadmissible for a risk function whose components are expected number of type I errors and expected number of type II errors.

Exponential family models
There are many models involving distributions that belong to the exponential family. For example, hypothesis testing problems involving differences of binomial parameters, Poisson parameters, and scale parameters of exponential distributions. Such exponential family models lend themselves to a somewhat straightforward extension of the MRDSS method and result in admissible and consistent MTPs. In this section we describe the extension. In other exponential family models it can be non-trivial to find (if such exist) functions with the properties needed to play the roles of the analogues of MRD, screen and sign stage. One such case is an R × C contingency table where the hypotheses of interest concern all local log odds ratios. Such a case will be treated in the companion paper by Chen, Cohen, and Sackrowitz (2009).
For exponential families, where the hypotheses of interest may be differences or contrasts among natural parameters, we describe MRDSS.
We wish to test H i : θ i = 0 vs K i : θ i = 0, i = 1, . . . , M . Note that the first M components of X are contrasts among the components ofZ while i=1Z i . Also note that the density in (6.1) is similar to the density (2.1). The MRD stage for this model will entail residuals W mj defined as follows: The numerator of W mj , the analogue of U mj of (3.2) , is defined exactly as in (3.2). The denominator is defined as an estimator of the standard deviation of the numerator assuming all ω i are equal. The estimator will be a function of X M +1 /(M + 1) which is the (M + 1)st component of x ′ Σ −1 . The term X M +1 /(M + 1) is the average of theZ i , i = 1, . . . , M . Such a definition of W mj will ensure that MRD will be admissible in this case. The screen stage will be determined by statistics whose numerators are the absolute values of |X i |, i = 1, . . . , M and whose denominator is also a function of X M +1 /(M + 1). Such statistics along with a sign stage based on the signs of W * mji and X ji will ensure that the resulting procedure will be admissible and consistent under conditions similar to those assumed in the normal case.
the reasoning used to identify the possible paths suppose stage 1 results in the sequence AAR. By definition stage 3 cannot impact the first two actions. Thus the only two paths possible, begining with AAR, are AAR → ARR → ARA and AAR → ARA → ARA. Following these rules we find that there are 18 paths ( listed in Table A.1 ) that must be studied. We will show that each of these paths leads to contradiction.
Note that the fourth row in each cell of Table A.1 indicates how |X i | relates to the screen stage. This will help in showing that each of the 18 cases cannot happen.
At this point we note a fact.

Now for the 18 cases listed in
Case 4: Cannot occur since MRD is admissible.
Case 5: From Table A.1 we see that at x * * the screen stage took a reject to an accept implying |x * * 1 | < C L . But |x * 1 | > C L implying x * 1 < 0. But MRD is ARR implying U m * 1 (x * ) > 0. Hence MRDSS being ARA is contradicted at x * .
Case 6: From the first row in this cell we see that U m * 1 (x * * ) > 0. From the sign change at x * * , we find x * * 1 < 0 which implies x * 1 < 0. But this violates the sign rule.
Case 8: From the fourth row we find x * 1 < 0 which implies x 1 < 0. Since MRD is RAA, U m * 1 (x) < 0. However from row 3 we have a sign change, implying a contradiction.
Case 12: Same reasoning as Case 11.
Case 13: From the fourth row x * 1 > 0 and between C U and C L . From the first row U m * 1 (x) < 0. This contradicts row 3 at x * .
Case 16: Note that a sign action took place at x and at x * * . Applying F1, we have C L < |x * 1 | < C U . Now if x 1 < 0 that implies U m * 1 (x) > 0 which in turn implies U m * 1 (x * ) > 0 and U m * 1 (x * * ) > 0. Since there is a sign change at x * * then we have x * * 1 < 0 which means x * 1 < 0 and a sign change should have occurred at x * . This is a contradiction. Now if x 1 > 0, similar reasoning leads to a contradiction at x * .
Hence we have C L < |x * 1 | < C U and x * 1 < 0 and U m * 1 (x * ) > 0. This contradicts the sign stage at x * .