Multiple testing in ordinal data models

: Consider an R × C contingencytable in which the categoriesare ordered. Multiple testing of the hypotheses that each local odds ratio is one is carried out. The methodology to perform the multiple tests is an exten- sion of the MRDSS method of Chen, Cohen, and Sackrowitz (2009). The MRDSS method extends the MRD method of Cohen, Sackrowitz, and Xu (2009) by adding a screen stage and a sign stage to MRD. The MRDSS method as well as the extension here is admissible and consistent. Both Fisher-type statistics and Chi-square statistics are used. Examples and a simulation study are included.

procedures based on P-values derived from Fisher's exact test statistics or from Pearson's Chi-square statistics for local 2 × 2 subtables, have an undesirable property. Namely, that relevant acceptance sections of individual hypotheses are not always convex. This renders these multiple testing procedures (MTPs) inadmissible in terms of expected number of type I errors and expected number of type II errors.
New admissible and consistent MTPs are offered. The new procedures represent an extension of the MRDSS method developed in the companion paper by Chen, Cohen, and Sackrowitz (2009). Precise definitions of admissibility and consistency are given in that reference. MRDSS is a three stage multiple testing method. The first stage called maximum residual down was developed by Cohen, Sackrowitz, and Xu (2009). It is a step-down method with several desirable practical and theoretical properties. However, MRD is not always consistent in the sense that asymptotically (as sample size tends to infinity) it can sometimes make mistakes. A screen stage added to MRD plus a sign stage added to MRD yields MRDSS which is both admissible and consistent.
The MRD stage of MRDSS is based on statistics called residuals. These statistics arise naturally in multivariate normal models and in some exponential family models. The residuals have certain monotonicity properties that are essential for admissibility of individual tests. In the contingency table models analogues of the residuals, with the desirable properties, are found. The new, so called residuals, can be determined as Fisher statistics or Chi-square statistics for certain 2 × 2 tables obtained by collapsing R × C tables and certain subtables. Sometimes this involves segmenting the overall table into smaller tables that subsequently are collapsed into 2 × 2 tables. Systems of equations need to be derived and solved to provide estimated expected frequencies and estimated variances of frequencies for the Chi-square statistics. This all must be done so that the analogues of the residuals have the necessary monotonicity properties enjoyed by the residuals of MRD. Precise definitions are forthcoming.
The screen stage and sign stage of MRDSS also must be developed for the R × C model. The new screen stage must depend on statistics for local 2 × 2 tables that yield consistency. For MRDSS applied in normal models the choice of statistic for the screen stage is obvious. For the R × C case there are several choices of statistics. The screen stage statistic is formed for local 2 × 2 tables. The sign stage for the R × C table is not literally a comparison of signs. This time "signs" match or not, depending whether a rejection by MRD is for a left or right side rejection as compared to whether the screen stage statistic would reject on the right side or left side. Again precise definitions are forthcoming.
Multiple testing of odd ratios in 2 × 2 subtables of R × C tables is a problem discussed in Gabriel (1966), Hirotsu (1978, 1983 and Hochberg and Tamhane (1987). The procedures recommended are all single step and are very conservative. Shaffer (1986) recommends stepwise procedures for testing 2 × 2 subtables. We regard testing all local odds ratios as a meaningful problem for R × C tables with both factors having ordered categories. Such testing enables detections of change points in directions of each factor, enables examination of likelihood ratio order (all log odds non-negative one-sided testing) between pairs of rows or pairs of columns, and helps identify the adjacent levels of each factor where a true association exists.
Textbook discussions on stepwise methods for multiple testing in general (not necessarily for categorical data) can be found in Hochberg and Tamhane (1987), Lehmann and Romano (2005) and Dudoit and Van Der Laan (2008). Another useful reference for these methods is Dudoit, Shaffer, and Boldrick (2003). Cohen and Sackrowitz (2005b, 2005a and Cohen, Kolassa, and Sackrowitz (2007) demonstrate the inadmissibility of the usual step-up and step-down procedures for a wide variety of models, loss functions and conditions for some one-sided and some two-sided hypotheses. Cohen, Sackrowitz, and Xu (2009) and Chen, Cohen, and Sackrowitz (2009) offer new methodologies in models concerned with means of normal variables.
In the next section we pose several discrete models for an R × C contingency table and give some other preliminaries. In Section 3 we cite examples for the 2 × C case in which step-up or step-down procedures based on Fisher exact test statistics or based on Pearson's chi-squared statistics are inadmissible. We give examples since discreteness and other conditions prevent a general result.
Nevertheless the examples are such that it will be clear that inadmissibility will often be the case. Examples of inadmissibility of typical procedures for the R×C case will be similar to the 2 × C case.
New methodology, first for 2×C tables will be given in Section 4. An example will be given. New methodology will be given in Section 5 for the R × C case. Pearson's chi-squared statistics will be used for the R × C case since Fisher's statistic becomes less computationally feasible. An example is given in Section 6 for the R × C case and a simulation study is offered in Section 7.

Models and preliminaries
Consider an R × C contingency table with ordered categories in both rows and columns. Let X ij be the frequency in the ijth cell, i = 1, . . . , R, j = 1, . . . , C. Assume either the full multinomial model or the product multinomial model with cell probabilities p ij . In either case, if we condition on both the column totals t j , j = 1, . . . , C and the row totals R i , i = 1, . . . , R, the conditional distribution of X ij , i = 1, . . . , R − 1, j = 1, . . . , C − 1 given all R i , t j is multivariate The conditional distribution expressed in exponential family form is where ν ij are log odds ratios, i.e., ν ij = log pij pRC Cohen and Sackrowitz (2000). If we let S ij = i k=1 j l=1 X kl and let µ ij = log[p ij p (i+1)(j+1) /p i(j+1) p (i+1)j ] then (2.2) Note (2.2) would also ensue for the model in which X ij are independent Poisson variables with parameter λ ij . Then upon conditioning on R i , X i is multinomial with parameter p ij = λ ij / C j=1 λ ij . Conditioning next on t j yields the multivariate hypergeometric of (2.1).
Hypotheses of interest for the R × C table are and let T ij be a statistic to test H ij . Typical step-down and step-up procedures are based on statistics T ij that depend only on the cell frequencies in (2.5). Such statistics are often Fisher's exact test statistics or Pearson's chi-square statistics.
To describe a class of step-down procedures based on P-values let 0 < α 1 < · · · < α q , where q = (R − 1)(C − 1), be a sequence of critical values. Let P ij (T ij ) be the P-value for testing H ij and let P (1) ≤ · · · ≤ P (q) correspond to the ordered P-values.
The critical values α i can be chosen in variety of ways. Often times α 1 is chosen at step 1 to control the weak familywise error rate (FWER). That is, the probability of at least one false rejection when all nulls are true. The other α's are often chosen to control strong FWER or the false discovery rate (FDR). See for example, Dudoit, Shaffer, and Boldrick (2003).
To describe a class of step-up procedures again consider a sequence of critical values 0 < α 1 < · · · < α q , not necessarily the same as in the step-down case.
(i) If P (q) < α q , stop and reject all H (i) . If P (q) ≥ α q accept H (q) and go to step (ii).
The critical constants are often chosen to control FDR (or FWER). See for example, Benjamini and Hochberg (1995).
We evaluate the collection of q tests by evaluating each individual test by its expected type I error and expected type II error. If T represents the collection of test statistics T ij (x) and ψ ij (x) represents the test function based on T ij for testing H ij then the risk for the individual test is One can consider a q-vector risk function consisting of q components, each representing the risk for an individual test. In terms of admissibility, multiple testing procedures that are inadmissible for this vector risk would remain so if the risk is any non-decreasing function of the collection of individual components of the vector risk.

2 × C tables -examples of non-convex acceptance sections
Poisson, multinomial, or independent binomial models Our starting point in this section is to assume the model where upon conditioning on row and column totals we have the multivariate hypergeometric distribution given in (2.2). We start with a result for R × C tables.
Lemma 3.1. A necessary and sufficient condition for a test ψ ij (x) to be admissible is that for all S i ′ j ′ (i ′ = i, j ′ = j), except S ij , fixed, the acceptance sections of the test are convex in S ij .
Whereas one cannot demonstrate that step-down and step-up procedures based on Fisher exact tests or Pearson's chi-square tests are always inadmissible, we give examples (which are more typical than not) to indicate that those procedures often are inadmissible. It sufficies to work with 2 × 3 tables since extensions to 2 × C tables would easily follow.

2 × C tables. New methodology
Our starting point for this section is the case R = 2 and we observe S with distribution given in (2.2).
The methods offered are related to the MRDSS method of Chen, Cohen, and Sackrowitz (2009) which extends the MRD method of Cohen, Sackrowitz, and Xu (2009). The new method also features a 3-stage process in the spirit of the original MRDSS. Although the statistics used at each stage are quite different we will still refer to the new method as MRDSS. Here it involves using Fisher's test statistics (or Pearson's chi-square statistics) for various sometimes collapsed tables that wind up as 2 × 2 tables. We give the method using Fisher's statistics P 1j although Pearson's chi-square statistics could also be used.
If 1 < j 1 < C − 1, segment the 2 × C table into 2 tables. The first table is the 2 × j 1 table consisting of the first j 1 columns of the original table and the second table is 2 × (C − j 1 ) consisting the last (C − j 1 ) columns of the table. Now treat each table as in the 2 × C table case and form 2 × 2 tables as in step (i). That is, form (j 1 − 1) P-values for the 2 × j 1 table and C − j 1 − 1 P-values for the 2 × (C − j 1 ) table and then get P At stage 2, screen the results of stage 1, as follows: Let α L < α U be two critical values between 0 and 1. Typically α L < α C−1 ≤ α U . Find Fisher's statistic for each local 2 × 2 table. If the Fisher statistic is less than α L and at stage 1 the test accepted, switch to rejection. If the statistic exceeds α U and at stage 1 the test rejected, switch to accept. Otherwise maintain the results of stage 1.
At stage 3, switch a reject at stage 1 to a final accept at stage 3, if when screening, the statistic lies between α L and α U and if the Fisher statistic used in stages 1 and 2 were compiled using opposite sides of their respective 2 sided rejection regions. For example, suppose we are testing H 11 : µ 11 = 0 vs K 11 : µ 11 = 0. Suppose that at stage 1, step 1 the table S 11 S iC − S 11 t 1 − S 11 t 2 − S 1C + S 11 yielded the minimum P-value < α 1 and thus H 11 was rejected at stage 1. Note that (1 -P-value) based on Fisher's statistic for this table as a function of S 11 is decreasing and then increasing. See Cohen (1987). Suppose further that the observed value of S 11 was on the increasing part of the function. Now for the table x 11 x 12 x 21 x 22 assume the Fisher test statistic P-value was between α L and α U . Further suppose for the observed value of x 11 that (1 -P-value) as a function of x 11 was on the decreasing part of the function. This would call for a switch of reject H 11 to accept H 11 .
Remark 4.1. The critical values 0 < α 1 < · · · < α C−1 in stage 1 and α L ≤ α U in stage 2 can be chosen according to some considerations. They can be chosen so that the overall procedure satisfies an error rate control such as FWER or FDR. They can be chosen so that FDR is controlled only when the numbers of non-nulls is large. This can sometimes be accomplished by simulation. The degree of conservativeness, total sample size, namely S RC and the total number of hypotheses to be tested influence the choice of constants. At stage 2 α L and α U would be chosen so that changes are made only if the evidence is compelling. In other words α L is usually small and α U is relatively large. Sometimes critical values are obtained via simulation. In this case we seek FDR control for all situations except perhaps when the number of true alternations is small.
We now illustrate the method for 2 × C tables. The data are from Agresti (1984a) and are provided in Table 4.1.
Stage 1-step 1. Form 3 2 × 2 tables where the entries in the first row-first column are S 11 , S 12 and S 13 respectively. These 3 tables and their corresponding P-values based on Fisher statistics are: The minimum P-value among the 3 tables occurs for the second table. Using critical values (.05, .075, .1) we reject H 12 : µ 12 = 0.
Stage 1-step 2. Since µ 12 = 0 was rejected the 2 × 4 table is segmented into two 2 × 2 tables. Namely 12 10 4 6 and 5 8 8 11 The P-values for these are .4887 and 1 respectively. Since P-values are less than .075, both H 11 and H 13 are not rejcted and Stage 1 is complete.
Stage 2 (screen stage): The 3 local 2 × 2 tables are considered. These include the two considered at Stage 1 -step 2 and the third is Thus there is evidence that the ratio of the proportion of larger craters to less than 2/3 healed craters is different for the two treatments.
Note if in stage 2 α U = .2, the decision to reject H 12 at stage 1 would be reversed and all 3 null hypotheses would have been accepted.
At step 2, find the statistics i2j2 and continue.
At step 2, suppose H i1j1 is rejected by the end of step 1. Estimate the means of cells again by using the same R×C equations, except now replace the equation In general at step m, we remove (m − 1) equations of the type (5.6) and add (m − 1) equations of the type (5.8).
To get the estimateV we think of the cell frequencies x ij as independent normal variables with mean λ ij and variance λ ij . Recall S ij = i l=1 j k=1 x lk , so S = AX for the appropriate A, It would follow that S ∼ N (Aλ, AΣA ′ ) with Σ = diag(λ 11 , . . . , λ RC ) being the covariance matrix of X. Having estimated λ ij byλ ij we thus have an estimator of Σ S = AΣA ′ . However to findV First, we solve the equations for estimating λ ij . This is done by transforming the equation-solving problem into a minimization problem.
At step m, let It is clear that f m (λ) ≥ 0 for all λ.
Letλ be the unique non-negative solution of equations (5.3)-(5.8). This implies f m (λ) = 0. Henceλ is the point that minimizes f m . So by minimizing f m , we can get the solution of equations (5.3)-(5.8). It is obvious that f m is a convex function for λ ≥ 0, hence the minimization of f m can be done very easily and efficiently. In our study, we use the nlm function in the popular R(2009) software to minimize f m .
After we solve the equations, i.e, we getλ, we compute the conditional variance of S ij . This enables us to complete the computation of U (m) ij . After the MRD stage, we proceed to stage 2, the screen stage. This time we compute the statistics for each local 2 × 2 table and find the P-values for these statistics, comparing them to α L and α U . Then we proceed to stage 3, the sign stage. Here we proceed as done in the 2 × C case.
We conclude this section with: Theorem 5.1. For an R × C contingency table, assuming either the product multinomial model, full multinomial model or independent Poisson model, the MRDSS method using Pearson's Chi-square statistic or Fisher's statistic in the 2 × C case is admissible.
Proof. Chen, Cohen, and Sackrowitz (2009) prove that for some exponential family models adding a screen and sign stage to an MRD procedure maintains its admissibility property. A similar argument in this case can be used to demonstrate that the screen and sign stage will enable an admissible MRD procedure to maintain its admissibility property. Thus for MRDSS to be admissible it suffices to prove that MRD is admissible. In Cohen, Sackrowitz, and Xu (2009) MRD was shown to be admissible in a multivariate normal model where residuals were defined as functions of the coefficients of the parameters to be tested. For the R × C contingency table model the S ij , i = 1, . . . , R − 1; j = 1, . . . , C − 1 are the coefficients of the parameters to be tested and suitably centered and normalized they have the identical properties as the residuals U mj in Cohen, Sackrowitz, and Xu (2009). That is, suppose we are examing the admissibility of the individual test of H ij : µ ij = 0 vs K ij : µ ij = 0. Then |U (m) ij | given in (5.1) and (5.2) and given implicitly for m > 2 decrease and then increase as a function of S ij while all other S ij are fixed. Also all other |U (m) ij | do not change. These are the properties needed to prove MRD is admissible.
Remark 5.2. In the case of a 2×C table Fisher's test statistic can be used instead of Pearson's Chi-Square statistic and it too has the same required properties; namely the statistic for testing H ij is decreasing and then increasing while other S i ′ j ′ remain fixed and all other Fisher statistics for hypotheses H i ′ j ′ remain unchanged.

R × C example
In this section we illustrate the method with an 3 × 3 table example. The data to be used are shown below in Table 6.1.
The data are from Agresti (1984b): There are a total of 2 ×2 = 4 local log odds ratios, i.e., a total of 4 hypotheses to be tested.
For the MRD stage, we use critical values obtained from the significance level 1 −(1 −α) 1/k , k = 1, . . . , 4 for our test statistics at step 4,3,2 and 1, respectively, where α = 0.05. Hence the critical values are: (Ω 1 , . . . , Ω 4 ) = (2.49, 2.39, 2.24, 1.96) At stage 1 step 1, we first solve the equations (5.3)-(5.6) to get the estimated mean of each cell given the row and column totals, under the assumption all local log odds ratios equal to 0. The mean value of each cell is shown below in Table 6.2. After this, we can compute the conditional variance of S ij . The conditional variance is shown below in Table 6.3. Note the zeros in Table 6.3 reflect the fact that the corresponding S ij variables are fixed. Now, we can compute the test statistics U ij , shown in Table 6.4. The maximum of these is 7.99, which is greater than the critical value 2.49, hence we reject the hypothesis H 11 and continue to the next step. We also record the sign of the test: +. At step 2, we estimate the mean of each cell conditioned on the row and column summation being fixed, and conditioned on all local log odds ratios except µ 11 are 0, and also conditioned on S 11 fixed. Solve the subset of equations (5.3)-(5.8) as explained earlier to get the estimated mean of each cell: shown in Table 6.5. Table 6.5 Step 2 -Expected cell frequencies As in step 1, we can compute the conditional variance of S ij : conditioned on S iC , S Rj , S 11 are known and fixed. The conditional variance of S ij are shown below in Table 6.6: The maximum of these is 5.06, which is greater than 2.39, so we reject H 22 and continue to the next step. Also we record the sign of the test: +.
Step 3: We estimate the mean of each cell conditioned on the row and column sum fixed, conditioned on all local log odds ratios (except µ 11 and µ 22 ) are 0, and also conditioned on S 11 and S 22 are fixed. As in step 2, solve the subset set of equations (5.3)-(5.8) to get the estimated mean of each cell. Now, we can compute the conditional variance of S ij : conditioned on S iC , i = 1, . . . , R − 1, S Rj , j = 1, . . . , C, S 11 and S 22 are all fixed. The conditional variances of S ij are in Table 6.9.
The test statistics U ij are in Table 6.10. The maximum of these is 2.19, which is smaller than the critical value 2.24, So MRD stops here, and all remaining hypotheses are accepted.
Now the screening stage. The P-values based on Pearson's Chi-square test statistics for the 4 local 2 × 2 tables are shown below: Thus there is evidence of opinion differences only at the extremes. That is, the "generally disapprove" and "middle position" categories when paired with "less than high school" and "high school" as well as "middle position" and "generally approve" categories when paired with "high school" and "more than high school".

Simulation for R × C table
We did some simulations to compare our method, MRDSS, to Holm's step down method. FWER is controlled at level α = 0.05 for Holm's step down (SD). For the MRD stage of MRDSS we use critical values obtained from the normal distribution based on significance levels 1 − (1 − α)  For the screen stage we use α L = 1 − (1 − α) 1/M , α U = α = .05. For a 3 × 3 table, M = 4; for a 3 × 4 table, M = 6. The type I and type II error columns reflect the average number of type I and type II errors respectively.
We define the power of a MTP as: power = 1 -E { # of type II errors } / # of true non-null hypotheses.
The percentage power increase of MRDSS relative to SD is defined as: power incr = 100×(power M RDSS -power SD )/power SD We first list 34 sets of local odds ratios. For the 3 × 3 table, each set contains 4 local odds ratios. We compare the expected number of type I errors, expected number of type II errors and FWER of MRDSS and SD. We also give the percentage power increase of MRDSS relative to SD.
For the 3×3 table, note that both MRDSS and SD control FWER at α = .05. Also note that in 29 out of 33 cases MRDSS had an increase in power over SD and in many cases the increase was substantial.
In Table 7.2 and 7.3, we list the simulation results for a 3 × 4 table. There are a total of 70 configurations of local odds ratios. For each row, the row sum of the Poisson parameter λ is 80, and for each column, the column sum is 60.
Note that MRDSS strongly controls FWER at level α = .05 in all 70 cases, except one (where MRDSS has a FWER = 0.051). We also compare the percentage power increase of MRDSS relative to stepdown method. MRDSS had an increase in power with only one exception. The increase was substantial in most cases.