A New Method for Detecting Signiﬁcant p -values with Applications to Genetic Data

A new method for detecting signiﬁcant p -values is described in this paper. This method, based on the distribution of the m -th order statistic of a U (0 , 1) distribution, is shown to be suitable in applications where m → ∞ independent hypothesis are tested and it is of interest for a ﬁxed type I error probability to determine those being signiﬁcant while controlling the false positives. Equivalencies and comparisons between our method and others methods based-on p -values are also established, and a graphical representa-tion of the distribution of the test statistic is depicted for diﬀerent values of m . Finally, our proposal is illustrated with two microarray data sets.


Introduction
Genome-wide association studies (GWAS) are aimed at identifying genetics variants associated with a trait (Manolio 2010). For this, hundred of thousands participants with and without a particular disease (or trait) are required, and hundred of thousand of genetic variants, i.e., single nucleotide polymorphisms (SNPs), are read using SNPs arrays. Associated variants are further determined after performing (not necessarily) independent statistical tests comparing either the allele frequency or the distribution of the genotypes of these SNPs between cases and controls. Further, the correspondent p-value for each SNP is used to determine whether it is associated with the disease.
As a total of m → ∞ independent SNPs are being tested in a typical GWAS, the problem of determining which variants are associated with the specific trait can be reduced to a multiple testing problem (for a review see Shaffer 1995) and so the family-wise error rate (FWER), i.e., the probability that one or more of the significance tests results in a type I error, must be controlled at level α. For such purpose, several methods can be applied (Bonferroni 1935, Shaffer 1995, Benjamini & Hochberg 1995, Nyholt 2004, Liu et al. 2010. In general terms, these methods use the p-values for each SNP and compare with a (adaptative) threshold, such that the SNPs associated with the trait are those for which the p-value is grater (or lower) than that threshold.
Here we describe a new method to detect p-values while controlling the FWER at level α. This method is heavily based on extreme values theory and considers the distribution of m-th order statistic of a U (0, 1). We derive the test statistic, show its equivalency with Bonferroni's method, and provide asymptotic results for its limiting distribution. In addition, we report preliminary results of a simulation study in which, under the null hypothesis, i.e., p ∼ U (0, 1), the limiting distribution and the simulated values are depicted for different values of m. Finally, we apply our method to two well-known microarray data sets (Golub et al. 1999, Mootha et al. 2003).

Background
Suppose that m → ∞ independent hypotheses of the form are tested, with θ i some parameter of interest and Θ the parameter space. Let α ∈ (0, 1) be the type I error probability at which the ith hypothesis is tested and be its P -value. In (2), T i is the test statistic for the ith hypothesis and G its cumulative distribution function (cdf ). Under H 0 , P 1 , P 2 , . . . , P m is a random sample from a U (0, 1) (Sackrowitz & Samuel-Cahn 1999, Murdoch, Tsai & Adcock 2008. Let V be a random variable with cdf F , and let V (m) = max{V 1 , V 2 , . . . , V m } be its maximum in a random sample of size m. The exact distribution of V (m) is given by Casella & Berger (2001): Note that if F is not known, (3) cannot be calculated. However, Serfling (1980, pp. 89) presents an alternative using extreme values theory and asymptotic results. As in a GWAS m → ∞ independent hypothesis are being tested, to build up our methodology on such results seems intuitive.

The Test
Consider the random variable with V (m) as previously defined. For some choices of constants {a m } and {b m }, the limiting distribution of D m is known (Serfling 1980, pp. 89). It follows from the U (0, 1) null distribution of the p-values that − log(p) has a standard exponential distribution with parameter λ = 1 , and choosing a m = log(m) and b n = 1 yields (Serfling 1980, pp. 90) making possible the calculation of (3). It is straightforward to show that the limiting density function of D m is given by We shall say that the ith p-value is significant at level α if where is the test statistic and t c the critical value of the test at level α, e.g., t c is such that Combining (5) and (9), and solving for t leads to In Figure 1 we depict the simulation-based distribution of t * when P 1 , P 2 , . . . , P m iid ∼ U (0, 1) for different values of m. It is also possible to establish some equivalencies between our proposed method and others. For instance, if the Bonferroni (1935) method is to be applied to control by multiple testing (Shaffer 1995), the critical value should be used instead of (10). This result is particularly useful in situations where a stringent control of the FWER (and hence the false positives) is required.

Using the Test
The following steps are suggested for detecting those p-values being statistically significant: 1. For each p-value, calculate t * i as in (8) and denote them as t * 1 , t * 2 , . . . , t * m . Here, higher values of t * indicate strong evidence against H 0 in (1).
2. Determine which t * i s are greater than t c (or t * c ). 3. Define the p-values from step 2 as potential candidates.
In order to facilitate the use of our proposal, an implementation of the aforementioned steps in R (R Core Team 2013) is provided in 4. This function takes a vector of p-values as the main argument, calculates the test statistic and the critical value, and prints the number of rejected p-values as well as the rejection rate. Furthermore, an invisible object (a list) with three components is returned; this list contains the actual p-value, the test statistic and the correspondent decision (significant: TRUE; not significant: FALSE). If necessary, such an object can be used for further analyses.

Examples
In this section, we consider two gene expression data sets to illustrate the usefulness of our proposed method for the identification of significant p-values. Golub et al. (1999) present a generic approach to cancer classification based on gene expression monitoring by DNA microarrays. As a test case, the authors use gene expression data from 3,051 genes in 38 tumor mRNA samples from patients with leukemia; 27 samples come from patients with lymphoblastic leukemia (ALL)(cases) and 11 from patients with acute myeloid leukemia (AML)(controls). For analysis, the processed data was obtained from the multtest package (Pollard, Gilbert, Ge, Taylor & Dudoit 2011).  Golub et al (1999). The vertical dotted line represents the critical value of the test for α = 0.05 when no correction for multiple testing is applied.

Tumor Data
We tested whether the ith gene (i = 1, 2, . . . , m = 3, 051) was differentially expressed (DE), i.e., if there was any statistical difference between the expression levels in cases and controls. This is equivalent to test H 0,i : µ ALL,i = µ AML,i vs. H 1,i : µ ALL,i = µ AML,i As implemented in the genefilter package (Gentleman, Carey, Huber & Hahne 2011), we used a two-sample t-test for testing (12) and calculated the p-value for each gene. Further, these p-values were used to calculate (10) and (11).
In Figure 2 we present the distribution of t * using equation (8) for the m genes. When no correction for multiple testing is applied on the p-values, a total of 1,045 (34.3%, t c = 2.97) genes were found to be DE, which were reduced to 98 (3.2%, t * c = 10.99) when a Bonferroni correction was applied. On the other hand, when the p-values were FDR-corrected before applying our methodology, 681 (22.3%, t c = −5.05) were found to be DE. Equivalent results were obtained using built-in R function p.adjust(). Mootha et al. (2003) presented an analytical strategy for detecting modest but coordinate changes in gene expression using DNA microarray data. This data consists of 22,283 gene expression levels measured in 43 age-matched males skeletal muscle biopsy samples, 17 with normal glucose tolerance (NGT), 8 with impaired glucose tolerance (IGT) and 18 with type 2 diabetes (T2D).

Type 2 Diabetes Data
After randomly selecting 1,000 gene expression levels for T2D samples from the original data, the linear correlation coefficient ρ for each pair of genes was calculated. ρ might be seen as a «proxy» of the potential interacting effects between pair of genes.
were tested. For α = 0.05, 52,576 (10.53%, t c = 2.97) correlation coefficients were significant when no correction for multiple testing was applied, which reduced to 319 (0.06%, t c = 2.97) and 6 (∼0%, t * c = 16.09), respectively, when the FDR and Bonferroni corrections were used. Results for the latter are presented in Table 1.

Discussion
In this paper, we propose a new method to determine whether a p-value is significant under a multiple testing setting while controlling (or not) the FWER. Our proposal, based on the m-th order statistic of a U (0, 1) distribution, has been shown to give equivalent results to Bonferroni's method while controlling the FWER, and to classical methods while not. Furthermore, under the null hypothesis, the proportion of true null hypothesis being rejected is close to the nominal level α. Observe that, by no means, we are stating that our method is improving any of the other alternatives available in the literature to correct by multiple testing, and which have extensively been applied in the genetics field.
The contribution of this paper can be seen under two perspectives. First, it offers a graphical alternative to represent p-values and the cutoff value beyond which, in the genetic context, we consider that a SNP (or gene in a microarray) is statistically significant. Second, the use of asymptotic statistics and extreme values theory in genetics. In a review of the literature previous to the writing of this paper, we found no mention or application of these two important concepts in genetics. The main advantages of this new approach are the direct calculation of the cutoff value labelling a p-value as significant, the simplicity of its calculations, and how easy it is to graphically represent the results. Computationally, our approach is better than the FDR (Benjamini & Hochberg 1995) as it does not require to store all the p-values.
Although in our applications section we showed how to use our approach to determine significant p-values with GWAS and microarray data, it is not limited, under any circumstance, to these type of data. The main reason for this is that our approach uses the p-values of the hypotheses tested regardless of the type(s) of data on which they have been tested. Future extensions of this methodology include considering correlated tests as those proposed by Benjamini & Yekutieli (2001).