A direct approach to estimating false discovery rates conditional on covariates

Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate (FDR) is one of the most commonly used approaches for measuring and controlling error rates when performing multiple tests. Adaptive FDRs rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here, we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini–Hochberg adjusted p-values, leading to a plug-in FDR estimator. We apply our method to a genome-wise association meta-analysis for body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios. We provide an implementation of this novel method for estimating the proportion of null hypotheses in a regression framework as part of the Bioconductor package swfdr.

G x i (λ) = P r(P i ≤ λ|θ i = 1, X i = x i ). (1) The major assumption we make moving forward is that conditional on the null, the p-values do not depend on the covariates.
Theorem S1 Suppose that m hypotheses tests are performed and that conditional on the null, the p-values do not depend on the covariates. Furthermore, the null p-values have a Uniform(0,1) distribution, whereas the alternative p-values have a distribution with cdf G x i , as defined above. Then: We first review the algorithm which yields an estimator of π 0 for the no-covariate case, which is used by Storey (2002), then develop a procedure based on Theorem S1 to obtain an estimator of π 0 (x i ). Both of them are based on assuming reasonably powered tests and a large enough λ, so that Theorem S1 can then be applied assuming no covariates, leading to: resulting in: Using a method-of-moments approach, one may consider the estimator: which is used by Storey (2002).
For the GWAS meta-analysis dataset, using this approach with λ = 0.8 leads to anπ 0 = 0.951 and λ = 0.9 tô π 0 = 0.949. Note that in practice one may smooth over a series of thresholds, as described below; otherwise, fixed thresholds between 0.8 and 0.95 are often used. This means that G x i (λ) will be very close to 1, but λ will not be large enough to lead to numerical instability issues when dividing by 1 − λ.
For the covariate case, applying the same steps with Theorem S1, we get: We can use a regression framework to estimate E(Y i |X i = x i ), then estimate π 0 (x) by: obtaining Step (c) in the algorithm.
Note that thus far we have considered the estimate of π 0 (x i ) at a single threshold λ, so thatπ 0 (x i ) is in factπ λ 0 (x i ).
We generally prefer to smooth over a series of thresholds to obtain the final estimate, as done by Storey and Tibshirani (2003). The estimates should generally be thresholded at 1, as Eq. (3) may otherwise lead to values greater than 1. It is also possible but less likely that the smoothed estimate would be below 0, hence we also threshold it at 0. If we assume that the p-values are independent, we can also use bootstrap samples of them to obtain a confidence interval forπ 0 (x i ) - Steps (e) and (f) in Algorithm 1.
In order to obtain Step (g) in the algorithm and estimate FDR(x i ), we multiply the BH adjusted p-values byπ 0 (x i ), thus leading to a simple plug-in estimator, denoted FDR(x i ). This is done in the spirit of Storey (2002), whose approach uses an estimate which is not conditional on covariates.
S2 Special cases

S2.1 No covariates
If we do not consider any covariates, the usual estimatorπ 0 from Eq. (2) can be deduced from applying Algorithm 1 by fitting a linear regression with just an intercept.

S2.2 Partioning the features
Now assume that the set of m features is partitioned into S sets, namely that a collection of sets S = {A s : 1 ≤ s ≤ S} is considered such that all sets are non-empty, pairwise disjoint, and have the set of all the features as their union. This idea has been proposed before, for example in Hu et al. (2010), but we propose it here as a natural subcase of our approach. We consider the sets ordered for the sake of convenience, for example, the first set in S is A 1 et cetera, but note that this ordering does not necessarily have scientific relevance. In the GWAS meta-analysis dataset, the SNPs are partitioned according to their MAFs. Other examples of such partionings include all possible atoms resulting from gene-set annotations or brain regions of interest in a functional imaging analysis, when considering only the genes or voxels that are annotated (Boca et al., 2013). We then consider vectors x i of length S, 1 ≤ i ≤ m, such that element s of x i is defined, using the indicator notation, as: For example, if S = 3 and feature 1 was in set A 1 , then x 1 = (1, 0, 0) . Since all features i in a set A s have the same vector x i , we denote it by e As to emphasize this. Taking into account the partition, a natural way of estimating π 0 (e As ) is to just apply the estimatorπ 0 from Eq. (2) to each of the S sets: where the numerator i∈As Y i |As| represents the fraction of features in A s that are not discoveries at the λ threshold.
A related idea has been proposed for partitioning hypotheses into sets to improve power (Efron, 2008). These results would be obtained directly from our approach if we considered linear instead of logistic regression and fit a linear regression with no intercept and the covariates x i in Algorithm 1, or instead, set one of the sets as the baseline and also considered an intercept. As we are considering a logistic regression approach, our results will be slightly different.

S3 Theoretical results
We now proceed to explore some theoretical properties of the estimatorπ λ 0 (x i ). Applying Theorem S1 to Eq.
(3), we is positive or negative and small in absolute value, thenπ λ 0 (x i ) is a conservative estimator of π 0 (x i ). For example, if we had considered a correctly specified linear regression model, this would always hold; indeed the case where π 0 is shared by all the features, i.e. in the case of no dependence on covariates, this is shown in Storey (2002). Given that here we are takingÊ(Y i |X i = x i ) to be the MLE from the logistic regression model, we know that it represents a consistent estimator of E(Y i |X i = x i ) if the model is correctly specified for m → ∞, given certain technical conditions, for instance those specified in Gourieroux and Monfort (1981). Thus, we can show thatπ λ Theorem S2 Under a correctly specified model and technical regularity conditions, as m → ∞.
Eq. (5) also leads to Var{π λ 0 (x i )} = V ar{b(x i )} (1−λ) 2 . Once again, using the properties of the MLE, under appropriate conditions: for some σ 2 , leading to V ar{π λ 0 (x i )} being approximately inversely proportional to m for large values of m.
We note that our approach to estimating π 0 (x i ) does not place any restrictions on its range. In practice, the values will also be thresholded to be between 0 and 1, as detailed in Algorithm 1. In Result S3, we show that implementing this thresholding decreases the mean squared error of the estimator. The approach is similar to that taken in Theorem 2 in the work of Storey (2002). Then:

S4 Proofs of analytical results
Proof of Theorem S1 Then, using the assumption that conditional on the null, the p-values do not depend on the covariates:

Proof of Result S3
We prove this result by showing that: and: Then, we can combine them as follows: In Eq. (6): In Eq. (7): S5 Functions π 0 (x i ) used in simulation scenarios Below, we refer to scenarios I-IV, as in Figure 3: In scenarios I-IV, the values of x 1 are equally spaced between 0 and 1, with the number of points being equal to m, the number of features considered.
• Scenario IV: π 0 (x 1 , x 2 ) is the same function as in scenario III multiplied by 0.6.
• Scenario V: π 0 (x 1 ) = x 1     Figure S2: Simulation results for m=10,000 features and t-distributed independent test statistics showing the true function π 0 (x i ) in black and the empirical means ofπ 0 (x i ), assuming different modelling approaches in orange (for our approach, Boca-Leek = BL), blue (for the Scott approach with the theoretical null = Scott T), and brown (for the Storey approach.) The scenarios considered are those in Figure 3.