Estimation of false discovery proportion in multiple testing: From normal to chi-squared test statistics

Abstract: Multiple testing based on chi-squared test statistics is common in many scientific fields such as genomics research and brain imaging studies. However, the challenges of designing a formal testing procedure when there exists a general dependence structure across the chi-squared test statistics have not been well addressed. To address this gap, we first adopt a latent factor structure ([14]) to construct a testing framework for approximating the false discovery proportion (FDP) for a large number of highly correlated chi-squared test statistics with a finite number of degrees of freedom k. The testing framework is then used to simultaneously test k linear constraints in a large dimensional linear factor model with some observable and unobservable common factors; the result is a consistent estimator of the FDP based on the associated factor-adjusted p-values. The practical utility of the method is investigated through extensive simulation studies and an analysis of batch effects in a gene expression study.


Introduction
Large-scale multiple testing is common in the analysis of high dimensional data, and it has wide application in scientific fields such as biology, medicine, genomics, neuroscience, economics, and finance. For example, in a genome-wide association study, hundreds of thousands of genetic polymorphisms are scanned simultaneously; in this process, multiple testing procedures, especially false discovery rate (FDR) methods ([4]), play an important role in detecting the association between genetic variants and some complex diseases ([32]).
Benjamini and Hochberg's (BH) procedure and its variants (e.g., the stepwise multiple testing procedures in [23] and false discovery rate estimators in [27]) have been shown to be theoretically valid when the test statistics are independent or have some form of weak dependence, such as positive regression dependence on a subset (PRDS) ([5]; [23]) or general weak dependence on p-values ([27]). There has been some recent research on quantifying the effect of correlation in false discovery rate analysis. For instance, [7] argued that the difficulties caused by dependence in multiple testing tend to diminish as the number of simultaneous tests increases when the dependence is weak, provided that the test statistics have light-tailed marginal distributions; [31] generalized the popular random effect model that is assumed on the p-values to a conditional dependence model that allows weak dependence among the null hypotheses and can be useful for characterizing the spatial structures of the null hypotheses. When the dependence is strong, the effect of correlation on the variance of some error measures, such as the number of false discoveries or the estimator of the false discovery rate, does not diminish as the number of tests increases and should be carefully accounted for ([22]; [11]; [25]; [2]). In particular, [13] proposed that the false discovery proportion (FDP) can be consistently estimated when the test statistics follow a multivariate normal distribution with arbitrary covariance dependence. Their method is quite useful in applications where the normality approximation of the test statistics is adequate, such as the expression quantitative trait loci (eQTL) analysis in genome-wide association studies and the classical two-sample comparison in microarray studies. In practice, chi-squared or F tests are popular in many areas, including (but not limited to) the analysis of variance (ANOVA) for testing fixed or random effects in the analysis of experimental data, Pearson's chi-squared tests of goodness of fit and independence, and many variants of likelihood-ratio tests. In all of the above testing problems, it would be beneficial to extend [13]'s method to accommodate chi-squared or F tests in multiple testing under general dependence structures. It is worth noting that [23] showed that a class of multivariate F distributions that arises in many-to-one comparisons of variances with one-sided alternatives satisfies the PRDS property, and thus the BH procedure can be legitimately applied. However, this class of multivariate F distributions is very restrictive in applications, as it is constructed from a series of independent chi-squared random variables.
In this article, we provide a new method for solving multiple testing problems that arise when the test statistics follow a multivariate non-central chi-squared distribution with arbitrary underlying covariance dependence. Specifically, we construct a testing framework for a large number of highly correlated chi-squared test statistics, based on a small panel of data with dimension m by k. The dataset consists of k independent multivariate normally distributed random vectors with an arbitrary common covariance dependence structure, where k is the number of degrees of freedom for each chi-squared test statistic and m is the number of tests. For each multivariate normally distributed random vector, we adopt a latent factor structure ([14]) to model the arbitrary covariance dependence structure. That is, the covariance dependence can be additively decomposed into two separate sources of variation: (i) major dependent variation shared across the tests in terms of low dimensional latent common factors; and (ii) test-specific weakly dependent variation. Under this setting, we approximate the FDP based on the associated p-values with respect to a threshold t by summing a series of non-central chi-squared distribution functions, which involve the latent common factors and factor loadings.
We then use the testing framework to simultaneously test k linear constraints in a large dimensional linear factor model, with another latent factor structure assumed on the random errors. This model has wide application in genomics research, where some unobservable batch effects (or confounders) that contribute to gene expression differentiation can be viewed as latent common factors. This naturally leads to a factor-adjusted multiple testing procedure ([20]; [16]). The factor-adjusted procedure is more robust and powerful than the traditional unadjusted method, because the latent common factors are taken into account at an early stage, before any tests are performed, resulting in a new ranking of the hypotheses being tested. To make the testing procedure operational, the latent common factors need to be accurately estimated, both theoretically and computationally. Both goals are achieved via a proposed restricted principal component analysis (Restricted-PCA) algorithm. We further show that the estimated FDP of the associated factor-adjusted p-values converges to the true one, even when the idiosyncratic errors are allowed to be weakly correlated. Note that factor-adjusted multiple testing that accounts for latent factors in a linear model has been studied in the literature; see [20], [16], [28], and [17]. Although we present latent models in a similar fashion, our model settings (see (13) and (14)) are more general in the sense that the idiosyncratic errors can accommodate various dependence structures. Moreover, we provide a simple and analytically tractable Restricted-PCA algorithm to adjust for the latent factors. This is the key ingredient of our method; it enables us to derive a consistent estimated FDP.
The rest of this article is organized as follows. The testing framework for approximating the FDP based on a large number of highly correlated chi-squared test statistics is presented in detail in Section 2. Section 3 discusses the connection between simultaneously testing k linear constraints in the large dimensional linear factor model and the testing framework, and forms a consistent estimator of the FDP based on the factor-adjusted p-values. The Restricted-PCA algorithm is described in Section 4. The numerical performance of the method is extensively investigated through simulation studies and a real data application in Sections 5 and 6, respectively. Concluding remarks in Section 7 end this article. All of the proofs are deferred to the Appendix. For notational convenience, three tables summarizing the most common recurring notations in this article are provided in Appendix A.

Testing framework from normal to chi-squared test statistics
In this section, we systematically construct and study a testing framework for a large number of highly correlated chi-squared test statistics based on a small panel of data with dimension m by k, where m is the number of tests and k is the number of degrees of freedom for each chi-squared test statistic.

Multiple testing under multivariate non-central chi-squared distribution
We begin with a small panel of data W = (w_{iℓ})_{m×k} ∈ R^{m×k}. As a convention, we denote by W_ℓ = (w_{1ℓ}, ..., w_{mℓ})' the ℓ-th column of W, and assume that {W_ℓ, ℓ = 1, ..., k} are independent and that W_ℓ follows a multivariate normal distribution with mean μ_ℓ = (μ_{1ℓ}, ..., μ_{mℓ})' and correlation matrix Σ_W = (σ_{W,i1i2})_{m×m} (σ_{W,ii} = 1, for i = 1, ..., m). For notational convenience, we denote by W_i = (w_{i1}, ..., w_{ik})' and μ_i = (μ_{i1}, ..., μ_{ik})' the i-th row of W and its mean, respectively. Under the above setup, the columns of W are independent, so that vec(W) ~ N(vec((μ_1, ..., μ_k)), I_k ⊗ Σ_W). Within W, suppose we are interested in testing the following hypotheses:

H_{0i}: μ_i = 0 versus H_{1i}: μ_i ≠ 0, for i = 1, ..., m. (1)

By construction, T_i = ||W_i||_2^2 can serve as the test statistic for the i-th hypothesis, which follows a chi-squared distribution with k degrees of freedom under the true null. Therefore, the p-value for the i-th test is calculated as P_i = 1 − χ_k(T_i; 0), where χ_k(t; λ) is the cumulative distribution function (CDF) of a chi-squared random variable with k degrees of freedom and non-centrality parameter λ. Denote by I_0 and I_1 the sets of indices corresponding to the true null and non-null, and let m_0 and m_1 be the cardinality of I_0 and I_1, respectively. Hence, the asymptotic proportion of the true null is defined as π_0 = lim_{m→∞} m_0/m. As noted by [10], correlation must be accounted for in deciding which null hypotheses are significant, because the accuracy of FDR controlling techniques is compromised in high correlation situations. To capture the correlation structure of Σ_W, we assume that {W_i, i = 1, ..., m} satisfies the following latent factor structure ([3]; [14]):

W_i = μ_i + Zγ_i + η_i, i = 1, ..., m, (2)

where Z = (Z_1, ..., Z_k)' ∈ R^{k×r} and Z_ℓ = (Z_{ℓ1}, ..., Z_{ℓr})', for ℓ = 1, ..., k, is an r × 1 vector of latent common factors with the identification restriction E(Z_ℓ) = 0 and cov(Z_ℓ) = I_r. The accompanying γ_i are the unknown factor loadings, and η_i = (η_{i1}, ..., η_{ik})' are the idiosyncratic errors that are independent of Z. In the spirit of the model setup in [13], we further assume that {η_ℓ = (η_{1ℓ}, ..., η_{mℓ})', ℓ = 1, ..., k} are independent and that η_ℓ satisfies a weak dependence assumption as

(1/m^2) Σ_{i1=1}^m Σ_{i2=1}^m |ρ_{η,i1i2}| = O(m^{−δ}) for some δ > 0, (3)

where ρ_{η,i1i2} is the coefficient of correlation between η_{i1ℓ} and η_{i2ℓ}. Here we would like to point out that Z drives the dependence among the m tests; hence the number of factors r depends on m. Consequently, the mean μ_i, the factor loadings γ_i, and the covariance of the idiosyncratic errors σ_{η,i1i2} also depend on m. For notational simplicity, we suppress this dependence on m in the model assumptions (2) and (3).
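To make the construction concrete, the following sketch (the sizes and seed are our own illustrative choices, not from the paper) simulates a true-null panel W with independent columns and computes the chi-squared statistics T_i = ||W_i||_2^2 and their p-values:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
m, k = 1000, 3                        # m tests, k degrees of freedom (illustrative)
W = rng.standard_normal((m, k))       # true-null panel: mu_i = 0, no common factors

T = np.sum(W**2, axis=1)              # T_i = ||W_i||_2^2 ~ chi-squared_k under H_0i
pvals = chi2.sf(T, df=k)              # P_i = 1 - chi_k(T_i; 0)
```

Under this global null the p-values are approximately uniform on [0, 1], which is the baseline that dependence through the common factors distorts.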
Remark 1. [13] introduced a definition of weakly dependent normal variables to guarantee the validity of their approximated FDP. Their definition is essentially of the same form as in (3), except that the correlation matrix is replaced by the covariance matrix. If the diagonals of the covariance matrix are bounded away from zero and infinity, these two definitions are equivalent. A more general definition of weakly correlated standard normal variables is given in Definition 1 of [2], which is a necessary and sufficient condition for the consistency of the associated empirical cumulative distribution function in L_2 norm. Another related condition was given in [25], where the authors assumed that the empirical moments of all of the pairwise correlations of the chi-squared test statistics shrink to zero as the number of tests diverges to infinity. Under this condition and a novel decomposition of the bivariate chi-squared density function, they showed that the fraction of discoveries has asymptotically vanishing variance.
We remark here that the technique for decomposing the bivariate chi-squared density function is still not applicable to our case because the condition in (3) is imposed directly on the underlying correlation matrix.

Approximation of FDP
Traditional multiple testing procedures are based on a sequence of p-values for a family of hypotheses, where the hypotheses with p-values less than or equal to some threshold t are rejected. Define the following empirical processes: for any t ∈ [0, 1],

V(t) = #{i ∈ I_0 : P_i ≤ t}, S(t) = #{i ∈ I_1 : P_i ≤ t}, and R(t) = #{i ∈ I : P_i ≤ t}, (4)

where I = I_0 ∪ I_1. Then V(t), S(t), and R(t) are the number of falsely rejected hypotheses, the number of correctly rejected hypotheses, and the total number of rejected hypotheses, respectively. The false discovery proportion with respect to the threshold t is defined as FDP(t) = V(t)/{R(t) ∨ 1} with R(t) ∨ 1 = max{R(t), 1}. Note that V(t) is unobserved but realized through an experiment, whereas R(t) can be observed. Essentially, estimating the FDP is equivalent to approximating V(t)/m_0. As discussed in [27], if the p-values satisfy the weak dependence assumption, then V(t)/m_0 would converge to some distribution function almost surely. However, a strong dependence structure on Σ_W would affect or even violate the convergence of V(t)/m_0. From (2), after controlling for the effect of the latent common factors Z, the correlation among the idiosyncratic errors η is weak, according to assumption (3). Hence, it is possible to seek an approximated expression for V(t)/m_0 conditional on the latent common factors Z, which in turn can be used to estimate the FDP.

Proposition 1. Suppose assumptions (2) and (3) hold, and further assume condition (i). Then V(t), R(t), and FDP(t) can be approximated, conditional on Z, as in (5), (6), and (7), respectively. The proof can be found in Appendix B.
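In code, these empirical processes are simple counts over a known null set; a minimal helper (the names are ours) with a tiny worked example:

```python
import numpy as np

def empirical_processes(pvals, null_mask, t):
    """Return V(t), S(t), R(t), and FDP(t) = V/(R v 1) for threshold t,
    given an indicator of which hypotheses are true nulls."""
    reject = pvals <= t
    V = int(np.sum(reject & null_mask))    # falsely rejected true nulls
    S = int(np.sum(reject & ~null_mask))   # correctly rejected non-nulls
    R = V + S                              # total rejections
    return V, S, R, V / max(R, 1)

# tiny worked example: rejects the p-values 0.001, 0.04, and 0.03
p = np.array([0.001, 0.04, 0.20, 0.50, 0.03])
nulls = np.array([False, True, True, True, False])
V, S, R, FDP = empirical_processes(p, nulls, t=0.05)   # V=1, S=2, R=3, FDP=1/3
```

In practice the null indicator is of course unknown; the point of the section is to approximate V(t) without it.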
The above results could be generalized to the case where the number of degrees of freedom k is allowed to grow to infinity, by assuming that the average of the coefficients of correlation in assumption (3) shrinks to zero at the rate O(m^{−δ}k^{−1}). Essentially, this proposition indicates that the difference between the two random parts (i.e., V(t) or R(t) and the corresponding approximation) is negligible. This is achieved by treating the latent factor part Zγ_i as fixed and basing the analysis solely on the dependence structure of η_i.
Proposition 1 approximates the corresponding FDP(t) by a series of non-central chi-squared distribution functions, which involve some latent common factors and factor loadings. In practice, the approximation of V(t) in (5) is intractable due to the unknown true null set I_0. If the corresponding summation over the alternative set I_1 in (5) is negligible, we can conservatively estimate V(t) as in (8). Recall that FDP(t) = V(t)/{R(t) ∨ 1}, in which R(t) is observed. From (8), we propose to estimate FDP(t) as in (9). To demonstrate the usefulness of the testing framework, in the next section we obtain a consistent estimator of the FDP when the test statistics are multivariate non-central chi-squared distributed within a large dimensional linear factor model.
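The conditional-approximation idea can be checked numerically. The sketch below (a toy setup of our own; the loadings, sizes, and the exact display of Proposition 1 are assumptions) simulates true-null statistics given one draw of the factors and compares the realized V(t) with the sum of non-central chi-squared tail probabilities:

```python
import numpy as np
from scipy.stats import chi2, ncx2

rng = np.random.default_rng(1)
m, k, r, t = 2000, 3, 2, 0.05
gamma = rng.uniform(-0.5, 0.5, size=(m, r))   # hypothetical loadings, ||gamma_i|| < 1
sig2 = 1.0 - np.sum(gamma**2, axis=1)          # idiosyncratic variances (unit total var)
Z = rng.standard_normal((k, r))                # one realization of the latent factors
common = Z @ gamma.T                           # k x m common component Z gamma_i

# simulate true-null data conditional on Z and form the chi-squared statistics
eta = rng.standard_normal((k, m)) * np.sqrt(sig2)
T = np.sum((common + eta)**2, axis=0)
q_t = chi2.isf(t, df=k)                        # p-value threshold t on the chi2 scale
V_emp = int(np.sum(T > q_t))                   # realized number of false rejections

# conditional approximation: sum of non-central chi-squared tail probabilities,
# since T_i / sig2_i | Z ~ chi2_k(lambda_i) with lambda_i = ||Z gamma_i||^2 / sig2_i
lam = np.sum(common**2, axis=0) / sig2
V_approx = float(np.sum(ncx2.sf(q_t / sig2, df=k, nc=lam)))
```

Conditional on Z, each summand is the exact rejection probability of the i-th test, so V_emp concentrates around V_approx as m grows.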
Remark 2. Our approximated FDP in Proposition 1 is closely related to two existing studies in the literature as follows.
The weak dependence case given in [27]. By letting Z = 0, the W_i's are weakly correlated in the absence of the latent common factors Z, in which case the summation part of (5) is m_0·t. This essentially implies that the empirical distribution of the chi-squared type p-values under the true null converges to the standard uniform distribution, which is a generalization of Theorem 1 in [2]. Thus, we can adopt the method of [27] to estimate FDP(t) as m·π̂_0(λ)t/{R(t) ∨ 1} with π̂_0(λ) = #{i : P_i > λ}/{m(1 − λ)}. Here λ is selected such that the number of hypotheses under the alternative with p-values greater than λ is negligible.

The strong dependence case given in [13].
When the number of degrees of freedom k reduces to 1, noting that μ_i = 0 for i ∈ I_0, the approximated FDP in (7) can also be reformulated through

V(t) ≈ Σ_{i∈I_0} [Φ((z_{t/2} + Zγ_i)/(1 − ||γ_i||_2^2)^{1/2}) + Φ((z_{t/2} − Zγ_i)/(1 − ||γ_i||_2^2)^{1/2})], (10)

where Φ(·) and z_{t/2} = Φ^{−1}(t/2) are the CDF and the t/2 lower quantile of a standard normal distribution, respectively. The above approximation is exactly the same as in [13]. Thus, our approximation of the FDP can accommodate the cases where the test statistics are normally or chi-squared distributed.
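The k = 1 reduction can be verified directly: the chi-squared tail of a squared normal equals the two-sided normal tail. A quick check (our own snippet):

```python
import numpy as np
from scipy.stats import chi2, norm

# P(chi2_1 > w^2) = 2 * Phi(-|w|): for k = 1 the chi-squared p-value is the
# usual two-sided normal p-value
w = np.array([-2.5, -0.3, 0.0, 1.0, 1.96])
p_chi2 = chi2.sf(w**2, df=1)
p_norm = 2.0 * norm.sf(np.abs(w))

# the rejection thresholds also match: the chi2_1 quantile equals z_{t/2}^2
t = 0.05
q_chi2 = chi2.isf(t, df=1)
z_half = norm.ppf(t / 2.0)            # z_{t/2}, the t/2 lower normal quantile
```

This identity is what lets the chi-squared framework collapse to the normal-statistic approximation of [13] when k = 1.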

Simultaneously testing k linear constraints in a large dimensional linear factor model
The testing framework in Section 2 is derived under a very special data matrix W ∈ R m×k , which is motivated by the structure of multivariate chi-squared distribution.To illustrate the practical utility of the testing framework, we provide a specific example, that is, simultaneously testing k linear constraints in a large dimensional linear factor model.

Model and assumptions
Assume we observe a large panel of data, that is, Y = (Y_{ij})_{m×n} ∈ R^{m×n}, with a total of m units, where each unit has n observations. Let X_j = (X_{j1}, ..., X_{jp})' ∈ R^p be the observable explanatory variables. To link the panel data to the explanatory variables, we consider the following multivariate linear regression:

Y_i = Xβ_i + e_i, i = 1, ..., m, (11)

where Y_i = (Y_{i1}, ..., Y_{in})' is the i-th row of Y, X = (X_1, ..., X_n)' ∈ R^{n×p} is the design matrix, β_i ∈ R^p are the regression coefficients, and the random errors e_i admit the latent factor structure

e_i = Zγ_i + η_i, (12)

where Z = (Z_1, ..., Z_n)' ∈ R^{n×r} and {Z_j = (Z_{j1}, ..., Z_{jr})', j = 1, ..., n} are independent and each is an r × 1 vector of latent common factors with the identification restriction E(Z_j) = 0 and cov(Z_j) = I_r, γ_i are the corresponding factor loadings (r × 1), and η_i = (η_{i1}, ..., η_{in})' are the idiosyncratic errors, which are independent of Z. Moreover, {η_j = (η_{1j}, ..., η_{mj})', j = 1, ..., n} are supposed to be i.i.d., and η_j satisfies a similar weak dependence assumption as in (3), that is,

η_j ~ N(0, Σ_η) with Σ_η = (σ_{η,i1i2})_{m×m}, and (1/m^2) Σ_{i1=1}^m Σ_{i2=1}^m |ρ_{η,i1i2}| = O(m^{−δ}) for some δ > 0, (13)

with ρ_{η,i1i2} = σ_{η,i1i2}/(σ_{η,i1i1} σ_{η,i2i2})^{1/2} being the coefficient of correlation between η_{i1j} and η_{i2j}. After incorporating the latent factor structure (12), model (11) can be further written as

Y_i = Xβ_i + Zγ_i + η_i, i = 1, ..., m, (14)

which is called the large dimensional linear factor model with some observable and unobservable common factors.
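As a concrete illustration of the data-generating process, the sketch below simulates a small panel from this factor model structure (all sizes and parameter values are our own choices, not calibrated to the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p, r = 500, 100, 3, 2                    # units, observations, covariates, factors

X = rng.standard_normal((n, p))                # observable explanatory variables
B = np.zeros((m, p))                           # regression coefficients beta_i
B[:25] = rng.normal(scale=1.0, size=(25, p))   # a few non-null units
Z = rng.standard_normal((n, r))                # latent factors: E(Z_j)=0, cov(Z_j)=I_r
Gamma = rng.normal(scale=0.5, size=(m, r))     # factor loadings gamma_i
eta = rng.standard_normal((m, n))              # idiosyncratic errors (independent here)

# Y_i = X beta_i + Z gamma_i + eta_i, stacked over the m units
Y = B @ X.T + Gamma @ Z.T + eta                # m x n panel
```

Panels of this form are used throughout the remainder of the article: the rows are units (e.g., genes), the common factors induce strong cross-unit correlation, and only the first block of units carries signal.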
In regression analysis, testing the regression coefficients is of great interest in practice. Most testing problems related to the regression coefficients for the i-th unit in (14) can be formulated by a set of linear constraints as follows:

H_{0i}: Aβ_i = 0 versus H_{1i}: Aβ_i ≠ 0, for i = 1, ..., m, (15)

where A ∈ R^{k×p} is a full row rank matrix with rank(A) = k, and k ≤ p. When all m hypotheses of the form (15) are tested simultaneously, multiplicity must be accounted for. In the following Sections 3.2 and 3.3, we first elaborate a factor-adjusted multiple testing procedure to address the multiplicity issue, and then resort to the theoretical framework in Section 2 to provide a rigorous mathematical proof of its consistency.

A factor-adjusted multiple testing procedure
Motivated by the parametric F-test in linear regression and ignoring the latent common factors Z, we construct an unadjusted test statistic T_i^u for (15) and approximate the corresponding p-value P_i^u as in (16), where β̂_i = (X'X)^{−1}X'Y_i is the ordinary least-squares estimate of β_i. A drawback of (16) is that the ranking of statistical significance is completely determined by that of the unadjusted statistics T_i^u, which is undesirable and inefficient in practice. Moreover, as mentioned in [8], the approximated FDP based on unadjusted p-values tends to be highly unstable due to dependence, even when the dependence is fully parametrized and used in the procedure. In other words, any downstream methods (e.g., adjusting test statistics) are naturally at a disadvantage. To provide a more robust and powerful remedy, the latent common factors Z must be taken into account at an early stage, before any tests are performed, resulting in a new ranking of the hypotheses being tested. Based on the above rationale, the testing problem (15) can be conducted using factor-adjusted test statistics under model setup (14).
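For reference, an unadjusted Wald-type statistic for H_{0i}: Aβ_i = 0 can be sketched as follows (our own implementation; the paper's display (16) may standardize by an estimated error variance rather than the known σ² assumed here):

```python
import numpy as np
from scipy.stats import chi2

def unadjusted_test(Y_i, X, A, sigma2=1.0):
    """Wald-type chi-squared statistic and p-value for H_0: A beta_i = 0,
    based on the OLS fit of a single unit (sigma2 treated as known here)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y_i                 # OLS estimate of beta_i
    Ab = A @ beta_hat
    cov_Ab = sigma2 * A @ XtX_inv @ A.T            # covariance of A beta_hat
    T = float(Ab @ np.linalg.solve(cov_Ab, Ab))    # ~ chi-squared_k under H_0
    return T, float(chi2.sf(T, df=A.shape[0]))

# sanity check on a unit with a strong signal and no noise
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3))
A = np.eye(3)[:2]                                  # test the first two coefficients
T0, p0 = unadjusted_test(X @ np.array([1.0, 0.0, 0.0]), X, A)
```

With a noiseless unit whose first coefficient is nonzero, T0 is large and p0 is essentially zero; under a true null the statistic would be chi-squared with k = 2 degrees of freedom.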
From model (14), Y_i = Xβ_i + Zγ_i + η_i, the latent common factors Z should be extracted before the test statistics are formed. This motivates us to introduce an oracle factor-adjusted test statistic T_i^ad and p-value P_i^ad for the i-th unit as in (17). To make the testing procedure operational, the latent factor part Zγ_i and the variance of the idiosyncratic error σ_{η,ii} should be estimated by some algorithm and then used to replace the true ones. Specifically, the factor-adjusted test statistics T̂_i^ad and the associated p-values P̂_i^ad for (15) can be constructed as in (18), where Ẑγ̂_i and σ̂_{η,ii} can be obtained by the algorithms discussed in Section 4.
As in (4), we denote by {V^ad(t), V̂^ad(t)} and {R^ad(t), R̂^ad(t)} the numbers of false rejections and total rejections with respect to a threshold t based on the P_i^ad's and P̂_i^ad's, respectively. Hence, the false discovery proportion of the oracle factor-adjusted p-values is defined as FDP^ad(t) = V^ad(t)/{R^ad(t) ∨ 1}.
Intuitively, if the latent common factors and the variance of the idiosyncratic error can be accurately estimated, the behavior of the P̂_i^ad's would resemble that of the oracle P_i^ad's to a large extent. Moreover, the dependence between the oracle factor-adjusted test statistics or p-values comes only from the idiosyncratic error part η_i, which is weakly dependent according to assumption (3). Hence, we can adopt the traditional multiple testing procedure ([27]) to estimate FDP^ad(t) as

FDP̂_λ^ad(t) = m·π̂_0(λ)t/{R̂^ad(t) ∨ 1}, (19)

where π̂_0(λ) = #{i : P̂_i^ad > λ}/{m(1 − λ)} is an estimate of the proportion of true nulls π_0, and λ is defined in Remark 2.
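The estimator (19) is straightforward to compute from the factor-adjusted p-values; a minimal sketch (the function names are ours):

```python
import numpy as np

def pi0_hat(pvals, lam=0.5):
    """Estimate of the null proportion: #{i : P_i > lambda} / {m (1 - lambda)}."""
    return np.sum(pvals > lam) / (len(pvals) * (1.0 - lam))

def fdp_hat(pvals, t, lam=0.5):
    """Estimated FDP(t) = m * pi0_hat(lambda) * t / (R(t) v 1), capped at 1."""
    m = len(pvals)
    R = max(int(np.sum(pvals <= t)), 1)
    return min(m * pi0_hat(pvals, lam) * t / R, 1.0)

# on an exactly uniform grid of p-values the estimate is 1, as it should be:
# pi0_hat = 1, R(0.05) = 50, estimate = 1000 * 1 * 0.05 / 50 = 1
p = np.arange(1, 1001) / 1000.0
est = fdp_hat(p, t=0.05)
```

Applied to the factor-adjusted p-values, the same two lines of arithmetic give FDP̂_λ^ad(t); the theoretical work of the section is in justifying that this plug-in is consistent.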

Consistency of the estimated FDP
In this subsection, we connect the chi-squared type factor-adjusted p-values to the testing framework in Section 2 to explore the theoretical properties of the estimated FDP in (19). It has been shown that the BH procedure ([4]) and Storey's method ([27]) continue to control the FDR even when the test statistics or the p-values are weakly dependent. Does the weak dependence assumption hold for the factor-adjusted p-values? To answer this question, we propose a two-stage argument. In the first stage, we connect the structure of the oracle factor-adjusted p-values (i.e., P_i^ad) to the theoretical framework in Section 2 to show that the weak dependence assumption is satisfied for the oracle factor-adjusted p-values; in the second stage, we supply a sufficient condition on the estimators of the latent factors and the variance of the idiosyncratic error, under which the difference between the factor-adjusted p-values (i.e., P̂_i^ad) and the oracle factor-adjusted p-values is asymptotically negligible.

Stage I: Weak dependence of the oracle factor-adjusted p-values
To this end, we first heuristically argue that, by decomposing (T_1^ad, ..., T_m^ad)', a data matrix W^ad as in Section 2.1 can be recovered through Y and the design matrix X, and that W^ad satisfies assumptions (2) and (3). This can be achieved by resorting to Proposition 1 in the testing framework. More specifically, by comparing the structure of (T_1, ..., T_m)' in Section 2 with that of (T_1^ad, ..., T_m^ad)', we observe the decomposition in (20), which is closely related to the model assumptions (2) and (3) in the testing framework. To link the parameters in the large dimensional linear factor model to our testing framework, we directly compare (20) with (2), which yields the correspondence in (21), where μ_i, γ_i, Z, and η_i are as in (2) and (3). Note the matrix P appearing in this correspondence; accordingly, we term P the connection matrix. As P'P = I_k, one can readily verify that η_i ~ N(0, I_k), and that (3) holds with D_η = diag{(σ_{η,11})^{−1/2}, ..., (σ_{η,mm})^{−1/2}}, where P_ℓ is the ℓ-th column of P. Hence, assumptions (2) and (3) in Section 2 are satisfied. In particular, as the latent common factors Z have been adjusted before the test statistics are formed, there are no common factors involved in (20). Therefore, the corresponding non-centrality parameter λ_{0i}^ad/λ_{1i}^ad and the variance of the idiosyncratic error σ_{η,ii}^ad, as in (5), are reduced to 0/||μ_i||_2^2 and 1, respectively. From Proposition 1, we obtain the corresponding approximations of V^ad(t) and R^ad(t).

Stage II: When can we ignore the difference between the oracle factor-adjusted p-values and the factor-adjusted p-values?
In the second stage, we supply a sufficient condition on the estimators of the latent common factors and the variance of the idiosyncratic error, under which the differences between m_0^{−1}V^ad(t) and m_0^{−1}V̂^ad(t) (and between m^{−1}R^ad(t) and m^{−1}R̂^ad(t)) are negligible. The results are summarized in the following proposition.

Proposition 2. For any estimators Ẑ, γ̂_i, and σ̂_{η,ii} that satisfy max_{i∈I} |σ̂_{η,ii} − σ_{η,ii}| →_p 0 and max_{i∈I} ||P'Ẑγ̂_i − P'{Q(X) + PP'}Zγ_i||_2 →_p 0, the above differences vanish with probability tending to one.

The above conditions essentially mean that σ̂_{η,ii} − σ_{η,ii} and P'Ẑγ̂_i − P'Zγ_i converge to zero in probability uniformly over i, so that the effect of estimating the latent common factors vanishes uniformly for all i ∈ I as m and n increase. For more detailed derivations, see the proof in Appendix B.
Combining the above two stages, the consistency of FDP̂_λ^ad(t) can be readily obtained, as summarized in the following theorem.

Theorem 1. Suppose the conditions in Proposition 2 and assumption (13) hold. Then, for a finite number of degrees of freedom k, FDP̂_λ^ad(t) − FDP^ad(t) →_p 0.

The proof can be found in Appendix B. Theorem 1 indicates that the estimated FDP can consistently match the true FDP of the oracle factor-adjusted p-values when the tuning parameter λ is well chosen, such that F_1(λ) = 1.

Remark 3.
We remark here that we can use techniques similar to those in Section 3.3 to decompose the unadjusted test statistics T_i^u and connect the decomposed formulas to the theoretical framework in Section 2, so as to approximate the false discovery proportion of the unadjusted p-values P_i^u with respect to a threshold t, where R^u(t) is the corresponding number of total rejections. Nevertheless, this estimator overestimates the true FDP under the non-sparse assumption, and the associated power is lower than that based on the factor-adjusted procedure.

Departure from normality
In this subsection, we discuss a possible relaxation of the normality assumption (13) on η_j. We conjecture that, as long as the η_j's are i.i.d. with mean 0 and a finite covariance matrix Σ_η, the theoretical properties in Section 3.3 remain valid. For illustration, we consider the large dimensional linear factor model (14) as a one-way ANOVA model with p treatments. As the design matrix X has a special structure within this model, P_ℓ (the ℓ-th column of the connection matrix) is a vector with p distinct values. For notational convenience, suppose that the g-th treatment has n_g replications with Σ_{g=1}^p n_g = n. Consequently, by definition, the transformed idiosyncratic error can be decomposed into weighted averages of {η_gh, g = 1, ..., p, h = 1, ..., n_g}, the idiosyncratic errors in (12) under the one-way ANOVA model. Then, as min{n_g, g = 1, ..., p} → ∞, according to the central limit theorem (CLT), the transformed error is asymptotically multivariate normally distributed with mean 0 and covariance matrix D_η Σ_η D_η. Therefore, all of the results in Section 3.3 are expected to hold beyond normality. For simplicity, we leave the rigorous proof of the above conjecture for future research. To support our conjecture, in Section 5 we evaluate the performance of the proposed factor-adjusted procedure using various idiosyncratic error distributions.
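The CLT heuristic is easy to see numerically: standardized averages of heavily skewed errors are already close to normal for moderate group sizes. A small check (our own snippet, using centered exponential errors):

```python
import numpy as np

rng = np.random.default_rng(3)
n_g, reps = 400, 5000
# skewed i.i.d. idiosyncratic errors: centered exponential, mean 0, variance 1
errs = rng.exponential(1.0, size=(reps, n_g)) - 1.0
means = np.sqrt(n_g) * errs.mean(axis=1)       # standardized group averages

# third moment: 2/sqrt(n_g) = 0.1 in theory, shrinking to 0 as n_g grows,
# versus skewness 2 for the raw exponential errors
skew = float(np.mean(means**3))
```

The standardized averages have mean near 0, standard deviation near 1, and nearly vanishing skewness, consistent with the conjecture that normality of η_j is needed only through the averaged, transformed errors.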

Restricted-PCA algorithm on an initial subset
This section provides a new algorithm for estimating the latent common factors, and shows that they indeed satisfy the sufficient condition given in Proposition 2.
In the literature, the method of principal component analysis (PCA) has been developed for consistently estimating the latent common factors Z in model (12) when the pure error matrix E is observed; see, e.g., [3] and [14]. In practice, E is not directly observed; rather, we observe Y, which is the sum of BX and E with B = (β_1, ..., β_m)' ∈ R^{m×p} (model (11)). Under model (14), [20] introduced an iteratively re-weighted surrogate variable analysis (IRW-SVA) to estimate Z. In essence, the weight vector in IRW-SVA is iteratively estimated and converges to a steady state in which big weights are assigned to the rows of Y corresponding to the true null, whereas small weights (close to zero) are assigned to those related to the non-null. Note that under the null hypothesis that β_i = 0 (in other cases, we can transform Y accordingly), Y_i reduces to the pure error structure Zγ_i + η_i in (12). Hence, when the small weights vanish after several iterations, the subsequent standard factor analysis ([3]; [14]) can accurately estimate the latent factors Z. Analogously, [17] suggested using negative control genes to perform factor analysis to remove unwanted variation (RUV), where the negative control genes can be regarded as a subset mimicking the true null set. In their approach, the negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. In summary, both methods essentially perform the standard PCA algorithm on a subset of the units, and can thus be treated as subset-PCA algorithms. As an alternative to the subset algorithms, [16] proposed approximating the latent common factors using some variants of EM algorithms based on a hybrid residual matrix (on a full set) obtained through a combination of restricted and unrestricted least squares. As mentioned in [16], the estimation of the regression coefficients β_i differs depending on whether i falls in the true null. That is, β_i should be estimated by the unrestricted least-squares estimate for i ∈ I_1, and by the restricted least-squares estimate for i ∈ I_0. To achieve a preliminary classification, they suggested separating the true null and non-null sets based on the ranking of the unadjusted p-values. However, thus far, there has been no rigorous theoretical support to explain how the estimated true null set (obtained by the empirical posterior probability in [20], by some prior information in [17], or by the unadjusted p-values in [16]) affects the estimate of the latent common factors.
To quantify how the preliminarily estimated null set affects the subsequent factor analysis, we propose estimating the latent common factors Z via a restricted-PCA algorithm on an initial subset, and we show that the resulting estimate of the latent common factors is consistent. Specifically, we first estimate all of the regression coefficients β_i in a selected initial subset using the restricted least-squares estimate. The PCA method is subsequently used to estimate the latent factors based on the induced residual matrix. As long as the initial subset possesses some sparsity properties, the proposed estimation procedure remains valid; see Proposition 3 for details. Accordingly, we term this method the restricted principal component analysis (Restricted-PCA) algorithm; it can be regarded as a subset-PCA algorithm. The new method is outlined below.
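A bare-bones sketch of the idea (our own simplified implementation: restricted least squares on the initial subset, then PCA on the induced residuals; the normalizations and refinements of the actual algorithm may differ):

```python
import numpy as np

def restricted_pca(Y, X, A, subset, r):
    """Estimate the latent factors Z: project OLS estimates of the subset's
    coefficients onto {A beta = 0} (restricted least squares), then take the
    top-r principal components of the induced residual matrix."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_ols = Y @ X @ XtX_inv                            # m x p unrestricted OLS
    # restricted LS: beta_r = (I - C) beta_ols with C the constraint projector
    C = XtX_inv @ A.T @ np.linalg.inv(A @ XtX_inv @ A.T) @ A
    B_res = B_ols - B_ols @ C.T
    resid = Y[subset] - B_res[subset] @ X.T            # residuals on the initial subset
    # PCA: top-r right singular vectors of the residual matrix, scaled by sqrt(n)
    _, _, Vt = np.linalg.svd(resid, full_matrices=False)
    return np.sqrt(n) * Vt[:r].T                       # n x r estimate of Z

# toy check: pure one-factor data (beta = 0) should be recovered up to sign
rng = np.random.default_rng(5)
n, m = 80, 60
Z = rng.standard_normal((n, 1))
G = rng.standard_normal((m, 1))
Y = G @ Z.T + 0.05 * rng.standard_normal((m, n))
X_toy = rng.standard_normal((n, 2))
Z_hat = restricted_pca(Y, X_toy, np.eye(2), np.arange(m), r=1)
```

In the toy check the restriction Aβ = 0 with A = I forces the restricted fit to zero, so the residual matrix is Y itself and the leading principal component recovers the factor up to sign.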
According to the theoretical results given in [1] and [29], the number of latent common factors r can be selected by the eigenvalue ratio test in part (b) with a probability tending to one.Consequently, we assume that r is known when investigating the theoretical properties of the Restricted-PCA algorithm.In addition, we impose some regularity conditions, which are not the weakest possible conditions, but those that facilitate the technical derivations.
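The eigenvalue-ratio selection of r mentioned above can be sketched as follows (our own minimal version; [1] and [29] give the formal statements):

```python
import numpy as np

def eigenvalue_ratio(resid, r_max=10):
    """Select the number of factors as the maximizer of lambda_j / lambda_{j+1},
    where lambda_j are the ordered eigenvalues of resid @ resid.T."""
    vals = np.linalg.svd(resid, compute_uv=False) ** 2
    ratios = vals[:r_max] / vals[1:r_max + 1]
    return int(np.argmax(ratios)) + 1

# two strong factors plus noise: the ratio test should recover r = 2
rng = np.random.default_rng(6)
L = rng.standard_normal((100, 2))
F = rng.standard_normal((200, 2))
R = L @ F.T + 0.1 * rng.standard_normal((100, 200))
r_hat = eigenvalue_ratio(R)
```

The ratio spikes at the last factor eigenvalue because the factor eigenvalues grow with the panel dimensions while the noise eigenvalues stay bounded.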

Conditions.
C1. Assume p = O(n^{ζ1}), r = O(n^{ζ2}), and log(m) = O(n^{ζ3}), for some constants ζ1, ζ2, ζ3 ≥ 0.

C2. Assume σ_0^2 ≤ min_i σ_{η,ii} ≤ max_i σ_{η,ii} ≤ σ_1^2 for some positive constants σ_0^2 and σ_1^2.

C3. Assume there exists some positive definite matrix Σ_γ of dimension r such that m^{−1} Σ_{i=1}^m (√r γ_i)(√r γ_i)' → Σ_γ, with the eigenvalues of Σ_γ uniformly bounded away from zero and infinity. Moreover, there exists some constant γ_max such that lim_{m→∞} max_i ||γ_i||_2 ≤ γ_max.

Condition C1 allows a diverging number of factors (observable and latent) to capture all of the correlations among a large number of units, which is more realistic because the correlation structure is not truly low-dimensional under this setting. By condition C1, the number of units m can grow exponentially with the sample size n, so that m can be much larger than n. Condition C2 is commonly assumed in the literature; see, for example, [29]. Condition C3 requires that the L_2 norms of the factor loadings (i.e., ||γ_i||_2) be bounded, so that the variance of the response does not go to infinity. Moreover, the √r γ_i's can be regarded as m independent realizations from a non-degenerate population with a finite second moment. Condition C4 is directly borrowed from [12] and can be verified via [3]. Condition C5 requires that the signals on the alternative within the initial subset I* are sparse. Under the above conditions, we demonstrate the following result.

Proposition 3. Suppose conditions C1-C5 hold, and further assume that η_ij is normally distributed. Then, for sufficiently large m and n, the conclusions (27) and (28) hold, where ν is as in condition C4 and Σ_X ∈ R^{p×p} is the asymptotic covariance matrix of the design matrix X.
The proof can be found in Appendix C. The Restricted-PCA algorithm first adjusts the main effect X via restricted least squares on a selected initial subset, and then estimates the latent factors using the restricted residual matrix with the row indices corresponding to the initial subset I*. In practice, I* can be selected by the empirical posterior probability as in [20], the unadjusted p-values as in [16], or prior information as in [17]. The number of alternative hypotheses in the initial subset I* affects the performance of the estimated latent factors, resulting in two additional terms (i.e., b_{m,n}) in the asymptotic variance in (27) and (28). Note that both terms are of the order Card(I* ∩ I_1)/Card(I*) × n√r if the signals Aβ_i under the alternative are comparable and bounded, which is asymptotically ignorable under condition C5. Under the local alternative that Aβ_i is of order n^{-1/2}, the term b_{m,n} goes to zero if √r Card(I* ∩ I_1)/Card(I*) goes to zero. This indicates that the choice of the initial subset is flexible. Equation (27) implies that Ẑγ̂_i obtained from the Restricted-PCA algorithm does not converge to its true counterpart Zγ_i, but rather to {Q(X) + PP'}Zγ_i. Thus, the Restricted-PCA algorithm produces biased estimators in the sense that it projects Zγ_i into a subspace spanned by Q(X) and P. However, the additional space spanned by the connection matrix P induced by restricted least squares is sufficient to verify the sufficient condition of Proposition 2. Compared with previous methods ([20]; [16]; [17]), the Restricted-PCA algorithm has a formal theoretical justification. The good theoretical properties of the Restricted-PCA algorithm are essential to derive the consistency of the proposed estimator FDP^ad_λ(t); see Theorem 1 for details.

Simulation studies
In this section, we gauge the performance of our proposed procedure on the basis of one important metric: the accuracy of the estimated FDP, via simultaneously testing k linear constraints in the large dimensional linear factor model (11). It is well known that lifestyle, geography, and biotic factors are at least as important as genetic divergence in modulating gene expression variation in humans. This enables us to use one-way ANOVA to model the gene differentiation, where lifestyle, geography, and biotic factors serve as common factors (observed and unobserved) and the idiosyncratic error explains some genetic factors, such as methylation, that contribute little to the observed differentiation. To make the simulation more realistic, the model setup and parameters are calibrated from a real data set in a biological environment: detecting differentially expressed genes in peripheral blood leukocyte samples from desert nomadic, mountain agrarian, and coastal urban Moroccan Amazigh individuals ([18]).

Model setup: one-way ANOVA model
(1) To mimic the biological environment of detecting differentially expressed genes across three geographically defined populations ([18]), we set the large dimensional linear factor model (11) so that X_{j1} = 1, X_{j2} = I(j ∈ mountain agrarian), and X_{j3} = I(j ∈ desert nomadic), with I(·) denoting an indicator function. Each simulated replication consists of m tests and n samples. The samples are split into three equal-sized groups parameterized by X, where the first column parameterizes the intercept, and the second and third columns denote group membership.

(2) We are interested in testing whether these three groups show differential expression on each gene i (i.e., Y_i), that is, H_{i0}: β_{i2} = β_{i3} = 0. For each simulated replication, 5% of the tests have nonzero β_{i2} or β_{i3} of order C/√n, where C is a constant. The reference level β_{i1} is randomly sampled from a standard normal distribution.
(3) It is well known that microarray gene expression data are confounded with batch effects ([19]), such as an individual's age and gender, resulting in strong correlations between genes. Many of these batch effects are unfortunately unknown to statisticians or unavailable in many public gene expression data repositories ([9]). Based on these observations, we assume the following latent factor structure on ε_{ij} to account for the potential dependence among the genes induced by unobservable batch effects. Here, Z_{j1} and Z_{j2} are discrete random variables, i.i.d. taking values −1 and 1 with equal probability 0.5, and γ_{i1} and γ_{i2} are sampled from a standard normal distribution.

(4) The idiosyncratic errors η_j, used to account for limited genetic variation, are sampled from a multivariate normal distribution with mean 0 and covariance matrix Σ_η = (σ_{η,i1i2})_{m×m} with σ_{η,i1i2} = ρ^{|i1−i2|} and ρ = 0.9. One can easily verify that this AR(1) covariance structure satisfies the weak dependence assumption (13). The results for other covariance structures, for example, banded and blocked covariance structures, are numerically similar to those for the AR(1) structure.
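The setup in steps (1)-(4) can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's calibrated configuration: the sizes, seed, and variable names below are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 60                      # illustrative sizes, not the calibrated ones
n_g = n // 3                         # three equal-sized groups

# design matrix: intercept, I(mountain agrarian), I(desert nomadic)
X = np.zeros((n, 3))
X[:, 0] = 1.0
X[n_g:2 * n_g, 1] = 1.0
X[2 * n_g:, 2] = 1.0

# 5% of tests are non-null with signal C / sqrt(n); beta_i1 is standard normal
C = 6.0
beta = np.zeros((m, 3))
beta[:, 0] = rng.standard_normal(m)
n_null = int(0.95 * m)
beta[n_null:, 1:] = C / np.sqrt(n)

# latent factors Z_j1, Z_j2 i.i.d. +/-1 with probability 0.5; loadings standard normal
Z = rng.choice([-1.0, 1.0], size=(n, 2))
gamma = rng.standard_normal((m, 2))

# AR(1) idiosyncratic errors across tests with rho = 0.9
rho = 0.9
idx = np.arange(m)
Sigma_eta = rho ** np.abs(idx[:, None] - idx[None, :])
eta = np.linalg.cholesky(Sigma_eta) @ rng.standard_normal((m, n))

Y = beta @ X.T + gamma @ Z.T + eta   # m x n panel of simulated expression
```

Each simulated replication would then feed Y, X into the testing procedure; the calibrated (m, n) pairs used in the paper are given in Section 5.2.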
To determine the initial subset used in the Restricted-PCA algorithm, we adopt the variance filter proposed in [6] to select the 95% of tests with the smallest σ_{E,ii} throughout Sections 5 and 6. Unless otherwise specified, we set the tuning parameter λ = 0.2 and the threshold t = 0.01.
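A minimal sketch of such a variance filter (the helper name and the toy variances are ours, not from [6]):

```python
import numpy as np

def initial_subset(sigma_E, keep=0.95):
    """Return indices of the `keep` fraction of tests with the
    smallest estimated variances sigma_E (a length-m vector)."""
    m = len(sigma_E)
    n_keep = int(np.floor(keep * m))
    return np.argsort(sigma_E)[:n_keep]   # ascending variance, keep the smallest

# toy example: 5 tests, keep the 80% (4 tests) with smallest variance
sigma = np.array([0.5, 2.0, 0.1, 1.0, 3.0])
idx = initial_subset(sigma, keep=0.8)
```

Here `idx` would exclude only the largest-variance test (index 4).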

FDP estimate: From negative dependence to consistency
We first confirm that the estimator FDP^ad_λ(t) in (19) can consistently estimate the true FDP of the oracle factor-adjusted p-values. For this purpose, we set β_{i2} = β_{i3} = C/√n for i ∈ I_1 with C = 6, 12, respectively. The number of tests m and the sample size n are calibrated from two real data sets (m = 10177, n = 46 in [18] and m = 4397, n = 166 in [26]). From Figure 1(a)-(d), when the signal is 6/√n, the estimated FDP and the true FDP follow a 45-degree line, and as m and n increase, the variation decreases. Remarkably, from Figure 1(e)-(h), the estimated and the true FDPs tend to exhibit a negative dependence pattern as the signals increase. According to the discussion given in [24], the reason can be explained as follows. When the signals are large enough (i.e., 12/√n) that all of the p-values under the alternative fall below the threshold t = 0.01, the slope of any two replications is roughly −(m_0/m_1) × t (−0.19 in our setting). Rigorous proofs of this negative dependence pattern can be found in Appendix D. Moreover, under the strong signal setting, the true FDP is expressed as V^ad(t)/{m_1 + V^ad(t)}, with mean asymptotically close to m_0 t/(m_1 + m_0 t) = π_0 t/{(1 − π_0) + π_0 t} = 0.16. This is confirmed by the x-axis of Figure 1(g)-(h). In summary, the negative dependence pattern does not violate the consistency of our FDP estimator, because the result of Theorem 1 concerns the deviation between the estimated and the true values, a quantity analogous to the length of the "negative" line, and as m and n increase, this length decreases.
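The two constants quoted above follow from simple arithmetic with π_0 = 0.95 and t = 0.01:

```python
pi0, t = 0.95, 0.01                          # null proportion and threshold from the text

slope = -(pi0 / (1 - pi0)) * t               # -(m0/m1) * t
mean_fdp = pi0 * t / ((1 - pi0) + pi0 * t)   # pi0*t / {(1 - pi0) + pi0*t}

print(round(slope, 2), round(mean_fdp, 2))   # -0.19 0.16
```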

Factor-adjusted procedure via various algorithms for estimating the latent common factors
To demonstrate the effectiveness of the Restricted-PCA algorithm in estimating the FDP, we adapt the factor-adjusted procedure by replacing the Restricted-PCA algorithm with three algorithms for estimating the latent common factors as follows.
PCA: similar to the Restricted-PCA algorithm except that the residuals are obtained using unrestricted least squares.

IRW-SVA: the iteratively re-weighted surrogate variable analysis algorithm given in [20]. In each iteration, it first calculates the empirical posterior probability P(β_i = 0, γ_i ≠ 0 | Y, X, Ẑ^(old)) as a weight vector, and then updates Ẑ^(new) using the right singular vectors of a weighted matrix obtained by multiplying the weight vector into the corresponding rows of Y. Once the final estimate Ẑ is formed, γ_i is calculated through the least-squares estimate based on the model Y_i = Xβ_i + Ẑγ_i + η_i, as if Ẑ were observed.

FAMT: the factor analysis and multiple testing method given in [16]. Specifically, it first obtains a hybrid residual matrix by combining subsets of restricted and unrestricted residuals, and then applies the standard factor-analysis EM algorithm to learn a set of latent factors. The subsets are determined using a preliminary (unadjusted) p-value cut-off of 0.05.

To benchmark the performance of the above algorithms, we consider an oracle procedure in which the true latent factors Z and factor loadings γ_i are directly incorporated into FDP^ad_λ(t). This oracle procedure serves as a gold standard because the factor-adjusted procedure with each of these algorithms attempts to get as close to the oracle one as possible. From Figure 2, the oracle procedure performs best. Among the three algorithms, PCA estimates Zγ_i by unrestricted least squares. Because it ignores the fact that β_{i2} = β_{i3} = 0 for i ∈ I_0, this estimator converges to Q(X)Zγ_i, a quantity orthogonal to the connection matrix P.
In such a circumstance, the sufficient condition given in Proposition 2 is not satisfied; the result is a lower estimated FDP, as shown in Figure 2(b). Compared with PCA, the cut-off value of 0.05 for the unadjusted p-values is too subjective for FAMT to separate the true null and non-null sets in the initial step. As a result, certain hypotheses in the true null set are incorrectly identified as statistically significant, and the hybrid residual matrix used in the subsequent factor analysis is unbalanced (with more rows erroneously estimated from unrestricted least squares). Consequently, the FAMT-based factor-adjusted procedure fails to capture the true FDP when the true value is large; see Figure 2(c) for details. In contrast, from Figure 2(b), the Restricted-PCA algorithm borrows strength from both the restricted least-squares estimate and the sparsity assumption on the selected initial subset, and is thus able to help the factor-adjusted procedure estimate the true FDP consistently. Note that IRW-SVA obtains results as good as Restricted-PCA, as it estimates the space spanned by the latent factors Z well thanks to the powerful weight vector, as shown in Figure 2(d).
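The IRW-SVA update described above — weight each row of the residual matrix by its posterior probability, then take top right singular vectors — can be sketched as follows. The residual matrix, the weights, and r here are synthetic stand-ins, not the estimates [20] would actually produce:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 300, 40, 2
R = rng.standard_normal((m, n))   # stand-in for the residual matrix of Y on X
w = rng.uniform(size=m)           # stand-in for P(beta_i = 0, gamma_i != 0 | Y, X, Z_old)

# one IRW-SVA-style update: weight the rows of R, keep top-r right singular vectors
_, _, Vt = np.linalg.svd(w[:, None] * R, full_matrices=False)
Z_new = Vt[:r].T                  # n x r surrogate-variable update

# loadings by least squares, as if Z_new were observed
gamma_hat, *_ = np.linalg.lstsq(Z_new, R.T, rcond=None)   # r x m
```

In the actual algorithm this update is iterated, with the weights recomputed from the current factor estimate until convergence.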

Performance of the factor-adjusted procedure beyond normality
The results in Sections 5.1 and 5.2 show that the factor-adjusted procedure can consistently estimate the true FDP when the idiosyncratic error η_j is normally distributed. Does this conclusion hold for other types of distributions? To this end, we modify the model setup (4) given in Section 5.1 by replacing the normal distribution on η_{ij} with the double exponential distribution, the t distribution with 3 degrees of freedom, and the chi-squared distribution with 3 degrees of freedom (centered so that it has mean 0), respectively. To make a fair comparison, we independently sample η_{ij} from each designated distribution. Figure 3 supports the conjecture made in Section 3.4. The factor-adjusted procedure in conjunction with Restricted-PCA continues to correctly track the true pattern of the FDP when the idiosyncratic error follows the double exponential distribution or the t distribution with low degrees of freedom, both of which have finite second moments and are symmetric about zero. Surprisingly, under the chi-squared distribution with 3 degrees of freedom, the factor-adjusted procedure still performs quite well with reasonable variation, revealing that our method is robust against asymmetric distributions.
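The three alternative error distributions can be drawn as below; all are centered, and only the shifted chi-squared one is asymmetric. The variable names and sample size are ours, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
size = 200_000

laplace = rng.laplace(size=size)                 # double exponential
t3 = rng.standard_t(df=3, size=size)             # t with 3 degrees of freedom
chi3 = rng.chisquare(df=3, size=size) - 3.0      # chi-squared(3), centered to mean 0

# all three are (approximately) mean zero; chi3 is right-skewed,
# which is what makes the chi-squared setting a test of robustness
skew_laplace = ((laplace - laplace.mean()) ** 3).mean() / laplace.std() ** 3
skew_chi3 = ((chi3 - chi3.mean()) ** 3).mean() / chi3.std() ** 3
```

(The empirical skewness of the t(3) sample is omitted: its third moment does not exist, so the estimate is unstable.)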

An application: Tackling batch effects in gene expression among ethnic groups
In this section, we apply our method to a microarray data set with strong batch effects. This data set is publicly available in the Gene Expression Omnibus (GEO: GSE5859) and contains the expression of 8793 genes for 166 subjects (60 individuals of European descent from the Utah pedigrees of the Centre d'Etude du Polymorphisme Humain (CEU), 41 Han Chinese in Beijing (CHB), 41 Japanese in Tokyo (JPT), and 24 validation samples from Han Chinese of Los Angeles (CHLA)). While the previous work in [26] found a large proportion of differentially expressed genes among the populations or specific genetic polymorphisms, some follow-up articles (e.g., [19]) questioned the validity of these results due to strong batch effects. They argued that the pervasive signature of differential expression observed in [26] is a systematic bias introduced during sample preparation or microarray expression measurement. Specifically, they found that the arrays used to measure expression for the CEU individuals were primarily processed from 2003 to 2004, whereas the arrays used to measure expression for the individuals of Asian descent were all processed in 2005-2006. Other unobserved variations may exist among the samples. We intend to further tackle the unobservable batch effects in gene expression among the four distinct groups using the factor-adjusted procedure in conjunction with the Restricted-PCA algorithm.
The model setup is identical to that in the simulation studies except that we compare four distinct groups here. Before conducting the significance analysis, genes that are not well expressed in tissue must be excluded, as genes with little variance in their expression are not proper candidates for differential expression. Moreover, genes with little variance tend to inflate the test statistics and distort the associated p-values toward zero. Based on this rationale, a preliminary screening step is implemented to eliminate 50% of the genes using an independent filter based on the overall sample variance; see [6] for details. For the remaining 4397 genes, we calculated the unadjusted p-values and four types of factor-adjusted p-values based on the PCA, Restricted-PCA, IRW-SVA, and FAMT algorithms, respectively; the corresponding histograms are shown in Figure 4. The unadjusted p-values display troubling behavior, with most p-values shrinking to zero, and this phenomenon further deteriorates with the factor adjustment using the PCA, IRW-SVA, and FAMT algorithms. To further gauge the performance of our Restricted-PCA algorithm, we apply the factor-adjusted procedure proposed in Section 3.2 with latent factors estimated from Restricted-PCA, PCA, IRW-SVA, and FAMT, respectively. From Figure 5(a), we observe that the factor-adjusted procedure combined with PCA severely underestimates the FDP, as it estimates only one principal component and by design this component is orthogonal to the design matrix. The same dilemma occurs for the factor-adjusted procedure based on the FAMT algorithm. In particular, due to the instability of its method for selecting the number of latent factors, IRW-SVA detects 23 principal components as the latent common factors; the redundant components naturally lead to an inaccurate FDP estimator. In contrast, Restricted-PCA estimates two principal components as surrogate variables for the batch effects and incorporates them to correct the null distribution of the factor-adjusted p-values. Hence, the Restricted-PCA based factor-adjusted procedure provides a more accurate estimation of the FDP and finds fewer significant genes at commonly used significance levels, as shown in Figure 5(d).

Concluding remarks
In this article, we provide a new method for accurately approximating the true FDP when the test statistics follow a multivariate non-central chi-squared distribution with arbitrary covariance dependence; this method shares a similar flavor with previous works based on the assumption of normality ([13]; [12]). Our testing framework is constructed from k independent multivariate normal random vectors with an arbitrary common covariance dependence structure, where k is the number of degrees of freedom of each chi-squared test statistic. Under assumptions (2) and (3), we theoretically derive a valid FDP approximation for a large number of highly correlated chi-squared test statistics. This new type of approximation is then used to simultaneously test k linear constraints in a large dimensional linear factor model with a latent factor structure. The specific testing problem is solved via a factor-adjusted procedure, which is more robust and powerful than the traditional unadjusted method, because the latent common factors are taken into account at an early stage before any tests are performed, resulting in a new ranking of the hypotheses being tested. We further propose a Restricted-PCA algorithm to estimate the latent common factors, and incorporate them to form an FDP estimator based on the factor-adjusted p-values. Under some mild conditions, we show that this estimator consistently estimates the true FDP.
To broaden the usefulness of the proposed procedure, we conclude the article by identifying two possible research avenues. Theoretically, it would be of interest to extend our testing framework to accommodate multivariate non-central chi-squared distributions with large degrees of freedom. This situation is particularly relevant in large-scale hypothesis testing problems with a nonparametric alternative; see the generalized likelihood ratio principle in [15] for details. Empirically, multiple testing procedures based on chi-squared test statistics are popular in applications such as semiparametric inference for identifying significantly activated voxels in brain imaging studies ([33]) and genome-wide SNP-set testing based on the logistic kernel machine model for assessing the association between a genotyped SNP-set and disease risk ([30]). In both scenarios, nonparametric functions are involved in addition to explanatory covariates, a setting more general than the large dimensional linear factor model. How to appropriately connect these extended models to our testing framework and provide valid multiple testing procedures are topics for future study.

Appendix A: Notation index
The following three tables summarize the most common recurring notations and indicate the section where each symbol is defined. As a convention, we denote by ‖v‖_2 the L_2 norm of a vector v, and by ‖M‖_F = {tr(M'M)}^{1/2} the Frobenius norm of an arbitrary matrix M. Moreover, we denote by M_{S1S2} the submatrix of M with the rows and columns corresponding to the index sets S_1 and S_2, where S_1 or S_2 could be a single index i or j, or even a full set. When S_1 or S_2 is a full set, we replace it by a center dot for notational simplicity.

Table 1 General notations
m_0 — The number of true null hypotheses (Section 2)
m_1 — The number of alternative hypotheses (Section 2)
n — The number of repeated observations (Section 3)
k — The degrees of freedom of the chi-squared test (Section 2)
k — The number of linear constraints
i = 1, . . ., m — The indices used for the tests (Section 3)
j = 1, . . ., n — The indices used for the observations (Section 3)
ℓ = 1, . . ., k — The indices used for the degrees of freedom (Section 2)
p — The number of observable factors (Section 3)
r — The number of latent factors (Section 3)
lim_m a_m — The limit of the sequence {a_m}, when it exists (Section 2)

Table 2 Notations related to the model W
W — The small panel of data matrix for the test statistics (Section 2)
Z ∈ R^{k×r} — The latent factor matrix attached to W (Section 2)
γ_i — The factor loadings associated with Z (Section 2)
η ∈ R^{m×k} — The idiosyncratic error matrix attached to W (Section 2)
T_i — The test statistic for testing H_{i0} (Section 2)
V(t) — The number of false rejections for the P_i's at the threshold t (Section 2)
R(t) — The number of total rejections for the P_i's at the threshold t (Section 2)
FDP(t) — The false discovery proportion for the P_i's at the threshold t (Section 2)

Table 3 Notations related to the model for Y
Y — The observed panel data matrix
X — The observed design matrix
Z ∈ R^{n×r} — The latent factor matrix attached to Y
γ_i — The factor loadings associated with Z
η ∈ R^{m×n} — The idiosyncratic error matrix attached to Y
T^u_i — The unadjusted test statistic for testing H_{i0}
T^ad_i — The oracle factor-adjusted test statistic for testing H_{i0}
V^ad(t) — The number of false rejections for the P^ad_i's at the threshold t
R^ad(t) — The number of total rejections for the P^ad_i's at the threshold t
FDP^ad(t) — The false discovery proportion for the P^ad_i's at the threshold t
T̂^ad_i, P̂^ad_i — The realized versions of T^ad_i and P^ad_i
V̂^ad(t), R̂^ad(t) — The realized versions of V^ad(t) and R^ad(t)
FDP̂^ad_λ(t) — The estimator of FDP^ad(t)
Ẑ — The estimator of Z by Restricted-PCA
I* — The initial subset used for Restricted-PCA
m* — The cardinality of I*

Proof of Lemma 1
The result can be obtained using techniques similar to those in Theorem 1 of [21]. By condition (2) that Σ_{m≥1} a_m/m < ∞ and Lemma 2 of [21], there exists an increasing sequence of integers {m_q} such that Σ_{q≥1} a_{m_q} < ∞ and m_{q+1}/m_q → 1. By Lemma 3 of [21], we can obtain that m_q^{-1} Σ_{i≤m_q} x_{i,m_q} → 0 a.s. Lemma 1 can then be proved by applying a suitable maximal inequality to interpolate between {m_q, q ≥ 1}. Specifically, for m_q ≤ m < m_{q+1}, where the second term converges to zero almost surely as m_q → ∞. By condition (1) that |x_{i,m_q+s}| ≤ 1, the third term is bounded by (m_{q+1} − m_q)/m_q → 0. The first term is asymptotically negligible by condition (3) that x_{i,m_q} − x_{i,m_q+s} converges to zero uniformly. Combining the above results, the proof is complete.

Proof of Proposition 1
Whenever necessary, we use the superscript m to emphasize that the vector or matrix therein depends on m. We first prove (6). The expectation of I(P_i ≤ t | Z) is derived as in (29). In view of (29), the goal is to show (30). We first show a weaker version of (30). As discussed in Section 2.

Proof of Proposition 2
We first verify the conclusion in (24). Motivated by the structure of P^ad_i, we write W^ad_i as follows. After further derivations, W^ad_i is expressed in a simpler form. For ease of presentation, let a = σ By the sufficient condition, we have a →_p 1 and b →_p 0 uniformly for i ∈ I. By definition, for any ε > 0, there exist a δ > 0 and thresholds m̄ and n̄ such that when m > m̄ and n > n̄, we have P(Ω_1) ≥ 1 − ε and P(Ω_2) ≥ 1 − ε, giving (37). Similarly, when m > m̄, n > n̄, and ω ∈ Ω_1 ∩ Ω_2, we have (38). By letting ε → 0 and δ → 0, (37) and (38) imply that, with probability tending to one, the stated limit holds as m, n → ∞. Thus V̂^ad(t)/m_0 − V^ad(t)/m_0 converges to zero almost surely. (25) can be derived in a similar way as (24); hence the proof is omitted.
As a result, R̂^ad(t)/m ≥ t/2 when m is sufficiently large. Similarly, the corresponding limit equals cπ_0 t almost surely. This together with (22) implies that {m π̂_0(λ)t − cV̂^ad(t)}/m converges to zero almost surely, which completes the proof of Theorem 1.

Appendix C: Consistency of the Restricted-PCA algorithm
For notational simplicity, we denote by Ž_j, Λ̌_e, γ̌_i, σ̌_{η,ii} and Ẑ_j, Λ̂_e, γ̂_i, σ̂_{η,ii} the estimators of Z_j, Λ_e, γ_i, σ_{η,ii} based on the error matrices {Q(X) + PP'}E_{I*} and E_{I*} in the Restricted-PCA algorithm, respectively. Before proving Proposition 3, we present four useful lemmas. Lemma 2 can be obtained directly through the Bonferroni inequality. Lemma 3 consists of the so-called Weyl theorem and sin(θ) theorem used in [14]. Hence, the proofs of both lemmas are omitted here. The proofs of the last two lemmas involve more tedious derivations than those in [3], [14], and [12], because we allow p and r to grow to infinity at polynomial rates. For completeness, we supply detailed proofs of Lemmas 4 and 5 in this appendix.

Proof of Lemma 4
We prove the lemma in the following two steps. In the first step, we demonstrate that max_{i∈I} |σ̂_{η,ii} − σ_{η,ii}| = O_p{log(m)/n + (r + p)/n + r/√n + r²/n}; in the second step, we extend the result to σ̌_{η,ii}.
We first show that max_{i∈I} |σ̂_{η,ii} − σ_{η,ii}| = O_p{log(m)/n + (r + p)/n + r/√n + r²/n}. Note that σ̂_{η,ii} = n^{-1} η̂_{i·}'η̂_{i·} with η̂' = Q(Ẑ)Q(X)E', where Ẑ and Q(Ẑ) are defined analogously to those in Section 4. As a result, we have σ̂_{η,ii} = n^{-1} E_i'Q(X)Q(Ẑ)Q(X)E_i, which leads to (41). We then consider the two parts of (41) separately. We first consider the second part. Note that η_i follows a multivariate normal distribution with mean zero and covariance matrix σ_{η,ii} I_n. As a result, η_i'Q(X, Z)η_i/σ_{η,ii} follows a chi-squared distribution with n − r − p degrees of freedom. Thus, according to Lemma 2, we can obtain the desired bound. We next consider the first part of (41), and treat the three parts of (42) in the following steps.

STEP I. We first consider the first part of (42). According to the result of Theorem 2 in [29] that tr{Q(Ẑ) − Q(Z)}² = O_p(r²/n), we can obtain the bound, where the last equality is due to the fact that max_i η_i'Q(X)η_i = O_p(n − p) by Lemma 2.

STEP II. We next consider 2n^{-1} η_i'Q(X)Q(Ẑ)Q(X)Zγ_i, which is bounded by |2n^{-1} η_i'Q(Ẑ)Zγ_i|. As Q(Z) is orthogonal to Z, the order of 2n^{-1} η_i'Q(Ẑ)Zγ_i is the same as that of 2n^{-1} η_i'{Q(Ẑ) − Q(Z)}Zγ_i, by the results of Theorem 2 in [29].

STEP III. We last consider n^{-1} γ_i'Z'Q(X)Q(Ẑ)Q(X)Zγ_i. By condition C3 and the fact that tr{Z'Q(Ẑ)Z} = O_p(r²) as demonstrated in [29], we obtain the bound. Combining all of these results completes the first step of the proof of Lemma 4.
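The distributional claim above — that η_i'Q(X, Z)η_i/σ_{η,ii} is chi-squared with n − r − p degrees of freedom — admits a quick Monte Carlo sanity check. All sizes below are illustrative, and the random matrix W stands in for the columns of [X, Z]:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r, sigma2 = 50, 3, 2, 1.5
W = rng.standard_normal((n, p + r))      # stand-in for the columns of [X, Z]
Q, _ = np.linalg.qr(W)
M = np.eye(n) - Q @ Q.T                  # Q(X, Z): projection off col([X, Z])

# draws of eta' Q(X, Z) eta / sigma^2 with eta ~ N(0, sigma^2 I_n)
draws = np.empty(10_000)
for b in range(draws.size):
    eta = np.sqrt(sigma2) * rng.standard_normal(n)
    draws[b] = eta @ M @ eta / sigma2

# chi-squared with n - p - r = 45 degrees of freedom: mean 45, variance 90
```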
We next consider the second step of the proof of Lemma 4. Note that {Q(X) + PP'}E' can be written as {Q(X) + PP'}Zγ' + {Q(X) + PP'}η'. As a result, {Q(X) + PP'}E' also follows a latent factor structure of dimension r. Thus, by the results in the first step of the proof, we can obtain that max_{i∈I} |σ̌_{η,ii} − σ_{η,ii}| = O_p{log(m)/n + (r + p)/n + r/√n + r²/n}.
This completes the proof of Lemma 4.

We then consider the above two terms separately. The first term of (43) can be bounded by

Proof of Lemma 5
Note that the Z'P_ℓ, ℓ = 1, . . ., k, are i.i.d. with mean zero and identity covariance matrix of dimension r, due to the fact that P'P = I_k. Hence, the trace of Σ_{ℓ=1}^k Z'P_ℓ P_ℓ'Z is O_p(r). As a result, we obtain the bound on max_{i∈I}. We consider the above two parts separately. First note the bound on the first part. Moreover, according to condition C3, we have max_i ‖γ_i‖ bounded. Here, (Z_1, . . ., Z_r) = Z. Consequently, we obtain the stated order. We next consider the second part of (44). By the Cauchy-Schwarz inequality, we can bound ‖Z'Q(X)η_i‖. For the second part, similar derivations yield the bound on the maximum over i ∈ I.
(a) [Determine the initial set]: Select an initial subset I* with cardinality Card(I*) = m*, such that the number of alternative hypotheses within I* is negligible.

(b) [Estimate the latent factors]: We first extract the effect of the observable explanatory variables X by regressing Y_i on X under the constraint Aβ_i = 0 for i ∈ I*, and obtain the restricted residual matrix E_{I*} = {Q(X) + PP'}Y_{I*}. We next denote by {(Λ̂_e, ν̂_e), e = 1, . . ., n} the eigenvalue-eigenvector pairs of (nm*)^{-1} E_{I*}'E_{I*}, with Λ̂_1 ≥ · · · ≥ Λ̂_n. The number of factors can be selected by maximizing the eigenvalue ratios, r̂ = arg max_{1≤e≤n−1} Λ̂_e/Λ̂_{e+1} ([1]). Accordingly, Ẑ = n^{1/2}(ν̂_1, . . ., ν̂_r̂).

(c) [Estimate the unknown parameters]: Using the least-squares estimate, for i ∈ I, γ_i and σ_{η,ii} can be estimated by
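Steps (a)-(c) can be sketched in numpy as below. This is a minimal sketch, assuming that the constraint Aβ_i = 0 simply zeroes out some columns of X, so the restricted fit regresses on the remaining columns X_null (as in the one-way ANOVA setting); the function name and all sizes are ours:

```python
import numpy as np

def restricted_pca(Y, X_null, I_star, r_max=10):
    """Sketch of Restricted-PCA steps (a)-(c): restricted residuals on the
    initial subset, eigenvalue-ratio selection of r, then least squares."""
    m, n = Y.shape
    m_star = len(I_star)

    Qn, _ = np.linalg.qr(X_null)

    def proj(M):                                   # residuals after regressing on X_null
        return M - (M @ Qn) @ Qn.T

    # step (b): restricted residuals on the initial subset I*
    E_sub = proj(Y[I_star])                        # m* x n
    evals, evecs = np.linalg.eigh(E_sub.T @ E_sub / (n * m_star))
    evals, evecs = evals[::-1], evecs[:, ::-1]     # descending eigenvalues

    # number of factors by maximizing eigenvalue ratios
    r_hat = int(np.argmax(evals[:r_max] / evals[1:r_max + 1])) + 1
    Z_hat = np.sqrt(n) * evecs[:, :r_hat]          # n x r_hat factor estimate

    # step (c): least-squares loadings and residual variances for every test
    E = proj(Y)
    gamma_hat, *_ = np.linalg.lstsq(Z_hat, E.T, rcond=None)
    resid = E.T - Z_hat @ gamma_hat
    sigma_hat = (resid ** 2).sum(axis=0) / n
    return Z_hat, gamma_hat.T, sigma_hat, r_hat

# quick check on synthetic data with two strong latent factors (all tests null,
# so the whole index set is a valid initial subset here)
rng = np.random.default_rng(0)
m, n = 500, 100
X_null = np.ones((n, 1))                           # intercept-only restricted fit
Z_true = rng.choice([-1.0, 1.0], size=(n, 2))
gamma_true = 2.0 * rng.standard_normal((m, 2))
Y = gamma_true @ Z_true.T + 0.5 * rng.standard_normal((m, n))
Z_hat, g_hat, s_hat, r_hat = restricted_pca(Y, X_null, np.arange(m))
```

With two strong factors planted, the eigenvalue-ratio criterion recovers r̂ = 2 here; in the paper's setting the residual additionally retains the connection-matrix directions PP' rather than only Q(X).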

Fig 1. The estimator FDP^ad_λ(t) given by the factor-adjusted (FA) procedure with the latent factors estimated by the Restricted-PCA algorithm, for various signals β ≡ β_{i2} = β_{i3} = C/√n, C = 6, 12, respectively. Here, the x-axis is the true FDP of the oracle factor-adjusted p-values with the threshold t = 0.01, and the results are based on 500 replications.

Fig 3. The estimator FDP^ad_λ(t) versus FDP^ad(t) for the factor-adjusted (FA+Restricted-PCA) procedure when the idiosyncratic error η_ij is independently sampled from the double exponential distribution, the t distribution with 3 degrees of freedom, and the chi-squared distribution with 3 degrees of freedom (centered so that it has mean 0), respectively. Here, m = 4397, n = 165, β = β_{i2} = β_{i3} = 12/√n for i ∈ I_1 with the threshold t = 0.01, and the results are based on 500 replications.

Fig 4. (a): Histogram of the unadjusted p-values resulting from tests of differential expression among four groups without a latent factor structure; (b)-(e): Histograms of the factor-adjusted p-values when the latent factors are estimated by the Restricted-PCA, PCA, IRW-SVA, and FAMT algorithms, respectively.

Fig 5. Estimated FDP versus the number of rejected genes, from 20 to 400 in multiples of 20, for the factor-adjusted (FA) procedure combined with four different algorithms. Here, r̂ denotes the estimated number of latent factors by each specific algorithm.

2 2 ≤ 2 2 + 2 Z 2 .
2‖(Ẑ − Z)'Q(X)η_i‖_2 ‖Q(X)η_i‖_2. The first part can be further bounded as max_{i∈I}

the regression coefficients of unit i, and E_i = (ε_{i1}, . . ., ε_{in})' ∈ R^n are the random errors that are independent of X.
Similar to the proof of Lemma 4, to prove Lemma 5 we only need to show that max_{i∈I} ‖P̂Ẑγ̂_i − PZγ_i‖_2 = O_p{r/n(r + √(r log(m)))} + O_p(m^{−ν}√r). By the Cauchy-Schwarz inequality, we can obtain

‖P̂Ẑγ̂_i − PZγ_i‖_2 = ‖P̂Ẑγ̂_i − P̂Zγ_i + P̂Zγ_i − PZγ_i‖_2 ≤ √2 ‖P̂Ẑγ̂_i − P̂Zγ_i‖_2 + √2 ‖P̂Zγ_i − PZγ_i‖_2. (43)