Exact distributions of statistics for making inferences on mixed models under the default covariance structure

At a time when mixed models are heavily employed in applications ranging from clinical research to business analytics, the purpose of this article is to extend the exact distributional result of Wald (Ann. Math. Stat. 18: 586–589, 1947) to handle models involving a number of variance components. Due to the unavailability of exact distributional results for the underlying statistics, currently available methods provide small group/sample inference only for balanced ANOVA models or simple regression models. The exact distributional results developed in this article should prove useful for making inferences by such methods as the parametric bootstrap, fiducial, and generalized p-value approaches when there are a number of variance components to deal with.

In fact, in Corporate America the BLUP has replaced the LSE as the most widely used statistical technique in advanced business analytics. In mixed-effects regression models, methods based on the MLE (Maximum likelihood estimator) are essentially the only class of methods available for making inferences on models involving a number of random effects. But such solutions are asymptotic, requiring not only that the sample sizes be large, but also that the number of levels in a grouping structure be large, an unreasonable assumption in practice. Therefore, in this article we will also briefly develop tests and interval estimates for variance components and their functions.

The model and problems
To describe the underlying problems more precisely, consider a regression model in a mixed-model setting with a number of random effects in a grouping structure,

$$y = X\beta + Z_1 u_1 + Z_2 u_2 + \cdots + Z_J u_J + \epsilon, \qquad (1)$$

where $X$ is an $N \times k$ matrix formed by a set of covariates corresponding to $k$ fixed effects $\beta$, and the $Z_j$ are $N \times k_j$ matrices formed by covariates corresponding to the random effects $u_j$ of the $j$th factor with $k_j$ levels. We make the usual assumption that $\epsilon \sim N(0, \sigma_e^2 I_N)$. Also, using the compound symmetric covariance structure, which is the most widely used and is the default setting in statistical software packages, we make the assumption $u_j \sim N(0, \sigma_j^2 I_{k_j})$, $j = 1, \cdots, J$, and that the $u_j$ are distributed independently of each other and of the error terms. It should be noted that the vector of fixed effects $\beta$ contains the overall mean effect of each of the factors having random effects, which we denote as $\beta_{zj}$, $j = 1, \cdots, J$.
The model can also be written in the familiar compact form

$$y = X\beta + Zu + \epsilon, \qquad (2)$$

where $Z = (Z_1, \cdots, Z_J)$ and $u = (u_1', \cdots, u_J')'$. Henderson (1975) provided the formulas for the BLUEs (Best linear unbiased estimators) of the fixed effects and the BLUPs of the random effects when the variances are known. Robinson (1991) showed that the various derivations available in the literature for these quantities all lead to Henderson's equations

$$X'X\hat\beta + X'Z\tilde u = X'y, \qquad Z'X\hat\beta + \left(Z'Z + \rho_\Delta^{-1}\right)\tilde u = Z'y, \qquad (3)$$

where $\rho_\Delta = \mathrm{Var}(u)/\sigma_e^2$ and $\tilde u$ is the BLUP of $u$. Under the compound symmetric covariance structure, the default in statistical software packages, $\rho_\Delta = \mathrm{diag}(\rho_1 I_{k_1}, \rho_2 I_{k_2}, \cdots, \rho_J I_{k_J})$, where $\rho_j = \sigma_j^2/\sigma_e^2$. The equations in (3) can be solved explicitly for $\hat\beta$ and $\tilde u$. It is evident from (2), and was formally proved by Henderson et al. (1975), that $\hat\beta$ is actually the GLSE (Generalized least squares estimator)

$$\hat\beta = \left(X'\Sigma^{-1}X\right)^{-1} X'\Sigma^{-1} y, \qquad (4)$$

where $\Sigma = I_N + Z \rho_\Delta Z'$.
Then, we can easily obtain the explicit solution for the random part of the predictor as

$$\tilde u = \rho_\Delta Z' \Sigma^{-1}\left(y - X\hat\beta\right). \qquad (5)$$
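As a concrete check of Henderson's equations and the GLSE, the following sketch (a minimal illustration with simulated data and hypothetical dimensions, not code from the paper) solves the joint system for one random factor and verifies that the resulting fixed-effect estimate agrees with the GLSE, and that the solved random part agrees with the explicit BLUP formula:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, k1 = 30, 2, 4            # observations, fixed effects, levels of one random factor
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = np.eye(k1)[rng.integers(0, k1, size=N)]   # one-hot group membership
sigma2_e, sigma2_1 = 1.0, 0.5
rho1 = sigma2_1 / sigma2_e
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(0, np.sqrt(sigma2_1), k1) \
    + rng.normal(0, np.sqrt(sigma2_e), N)

# Henderson's equations; here rho_Delta = rho1 * I, so its inverse is I / rho1
A = np.block([[X.T @ X,            X.T @ Z],
              [Z.T @ X, Z.T @ Z + np.eye(k1) / rho1]])
b = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(A, b)
beta_hat, u_tilde = sol[:k], sol[k:]

# the GLSE with Sigma = I_N + rho1 * Z Z' must agree with beta_hat
Sigma = np.eye(N) + rho1 * Z @ Z.T
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
assert np.allclose(beta_hat, beta_gls)

# and the explicit BLUP solution u_tilde = rho1 * Z' Sigma^{-1} (y - X beta_hat)
u_check = rho1 * Z.T @ Si @ (y - X @ beta_hat)
assert np.allclose(u_tilde, u_check)
print(np.round(beta_hat, 3))
```

The agreement of the two routes to $\hat\beta$ is the standard equivalence result attributed to Henderson; the example merely exercises it numerically on a toy design.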

The issue with available results
When we go about making inferences in applications involving a number of variance components, we encounter the fact that the variance components are unknown parameters, and hence need to be tackled before we can carry out any kind of inference. The most widely used method is to estimate them by REML (Restricted maximum likelihood) or ML (Maximum likelihood), and then carry out tests and interval estimation using their asymptotic distributions. It is well known (cf. Weerahandi (2004) and Gamage et al. (2013)) that such inferences can be seriously inaccurate, because the underlying asymptotic assumptions require not only the sample sizes to be large, but also the number of factor levels in a grouping structure to be large, an unreasonable assumption in practice. Therefore, the purpose of this article is to develop inference methods without relying on asymptotic distributional results. Since variance components are important in their own right in a variety of applications, we will undertake one application in the area of industrial process control.

Exact distributions of LSEs
To solve problems involving variance components, regardless of whether one takes the GPQ (Generalized pivotal quantity) based approach, the parametric bootstrap approach, or the generalized fiducial approach, we need to tackle the variance components, which are typically unknown. Inferences based on the MLE and asymptotic results are highly inaccurate in that they seriously miss the intended Type I error in testing of hypotheses and the intended probability coverage in interval estimation.

Distributions for inferences on variance components
By considering the ordinary least squares regression of $y$ on $\tilde X = (X, Z)$, Wald (1947) showed, albeit in non-matrix notation, that the distribution of the residual sum of squares,

$$S_e = y'\left(I_N - \tilde X(\tilde X'\tilde X)^{-1}\tilde X'\right)y,$$

is related to the familiar chi-squared distribution as

$$\frac{S_e}{\sigma_e^2} \sim \chi^2_e, \qquad (6)$$

where $e = N - \mathrm{rank}(\tilde X)$.
Moreover, $S_e$ was shown to be distributed independently of $u$. Typically $\tilde X$ is not of full rank, as Wald (1947) implicitly assumed, but the assumption can be maintained by redefining $\tilde X$ as the non-singular sub-matrix that one gets if the regression is run blindly. Gamage et al. (2013) showed how one variance component can be tackled by transforming $\tilde X$ by an orthonormal basis for the vector space spanned by its columns. Actually, any number of variance components in a compound symmetric structure can be handled more conveniently and efficiently based on the LSE results, because the underlying LSE computations based on the QR algorithm are highly efficient and foolproof in handling rank-deficient design matrices. This is accomplished by extending Wald's argument for any number of random effects and properly redefining the $\tilde X$ matrix. From the theory of regression and the extended Wald results, it is known that

$$\hat u \mid u \sim N\left(u,\; \sigma_e^2 (\tilde X'\tilde X)^{-1}_u\right),$$

where $(\tilde X'\tilde X)^{-1}_u$ is the sub-matrix of $(\tilde X'\tilde X)^{-1}$ corresponding to the $u$ part of the LSE regression. Therefore, the conditional distribution of $(\hat u - u)$ given $u$,

$$(\hat u - u) \sim N\left(0,\; \sigma_e^2 (\tilde X'\tilde X)^{-1}_u\right), \qquad (7)$$

is also its unconditional distribution, and $(\hat u - u)$ is distributed independently of $u$. Hence the LSE $\hat u$ of $u$, as opposed to Henderson's predictor $\tilde u$, is distributed as

$$\hat u \sim N\left(0,\; \sigma_e^2 (\tilde X'\tilde X)^{-1}_u + \Sigma_u\right),$$

where $\Sigma_u$ is the covariance matrix of $u$. In particular, individual variance components and variance ratios of practical importance can be tackled based on the distributional result

$$\hat u_j \sim N\left(0,\; \sigma_e^2 \left[(\tilde X'\tilde X)^{-1}_j + \rho_j I_{k_j}\right]\right). \qquad (8)$$

In computing the $\hat u_j$ by LSE, it is perhaps convenient to run the unconstrained regression, and then center each $\hat u_j$ by subtracting its mean, the mean of the estimated $\beta_{zj}$.
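Wald's residual sum of squares result is easy to verify numerically. The sketch below (a toy design with one grouping factor; dimensions are hypothetical) forms $\tilde X = (X, Z)$, projects out its column space with a pseudoinverse so the rank deficiency of the dummy coding is handled automatically, and checks by Monte Carlo that $S_e/\sigma_e^2$ behaves like a chi-squared variable with $e = N - \mathrm{rank}(\tilde X)$ degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(1)
N, k1 = 40, 5
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = np.eye(k1)[rng.integers(0, k1, size=N)]
Xt = np.column_stack([X, Z])          # the (rank-deficient) LSE design matrix
e = N - np.linalg.matrix_rank(Xt)     # error degrees of freedom

# orthogonal projector onto col(Xt); pinv handles the intercept/dummy collinearity
P = Xt @ np.linalg.pinv(Xt)

def resid_ss(y):
    r = y - P @ y
    return r @ r

# S_e depends only on the error part, since X beta and Z u lie in col(Xt)
sigma2_e = 2.0
draws = np.array([resid_ss(X @ np.array([1.0, 2.0]) + Z @ rng.normal(0, 1.0, k1)
                           + rng.normal(0, np.sqrt(sigma2_e), N)) / sigma2_e
                  for _ in range(2000)])
print(e, draws.mean())   # the Monte Carlo mean should be close to e
```

Note that the fixed and random effect terms project out exactly, which is why the simulated values track the chi-squared mean regardless of the true $\beta$ and $u$.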

Distributions for functions of variance components
The distributional result (8) allows us to handle any variance component by the distribution of the corresponding component of $\hat u$. In solving some of the inference problems in quality assurance and in applications of the BLUP, however, we also need to tackle multiple variance components simultaneously. To develop the necessary distributional results for such applications, consider two factors of interest, and let $\hat u_1$ and $\hat u_2$ be the least squares estimators of their random effects $u_1$ and $u_2$, each of which corresponds to a non-singular sub-matrix of $\tilde X$. Then it follows from (7) that

$$\begin{pmatrix} \hat u_1 \\ \hat u_2 \end{pmatrix} \sim N\left(0,\; \sigma_e^2 \Sigma_{x\rho}\right), \qquad (9)$$

where

$$\Sigma_{x\rho} = \Sigma_x(\rho_1, \rho_2) = (\tilde X'\tilde X)^{-1}_{12} + \begin{pmatrix} \rho_1 I_{k_1} & 0 \\ 0 & \rho_2 I_{k_2} \end{pmatrix}.$$
Let $\Sigma_{x\rho}^{-1/2}$ be the inverse symmetric square root of the matrix $\Sigma_{x\rho}$, found by eigenvalue decomposition or otherwise. Then, from (9) we can obtain two independent random vectors as

$$\begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \frac{1}{\sigma_e}\,\Sigma_{x\rho}^{-1/2}\begin{pmatrix} \hat u_1 \\ \hat u_2 \end{pmatrix} \sim N(0, I). \qquad (10)$$

Now it is evident that the two variance components can be tackled using the two independent chi-squared random variables

$$V_l = w_l' w_l \sim \chi^2_{a_l}, \quad l = 1, 2, \qquad (11)$$

where $a_l = \mathrm{rank}(\tilde X_l)$, which in many cases equals $k_l - 1$. As shown by Weerahandi and Gamage (2016), an alternative way of making inferences in problems involving two parameters is via the marginal distribution of one statistic and the conditional distribution of the other. In our application we readily have the marginals given by (8); the conditional distributions can be found using multivariate normal distribution theory. For example, the conditional distribution of $\hat u_2$ given $\hat u_1$ is the normal distribution with mean and covariance given by the corresponding blocks of $\sigma_e^2 \Sigma_{x\rho}$. It is now evident that we can tackle the two variance components of interest using the chi-squared statistics $V_1$ and $V_{21}$, formed from the marginal distribution of $\hat u_1$ and the conditional distribution of $\hat u_2$ given $\hat u_1$. However, it is inconvenient to work with conditional distributions in deriving inference methods. We can avoid the above conditional distribution by applying the transformation suggested by Weerahandi and Gamage (2016). In our problem this is accomplished by considering the distribution of the CDF (Cumulative distribution function) of $V_{21}$, which is distributed as

$$U_{21} = C(V_{21}) \sim \mathrm{Uniform}(0, 1), \qquad (12)$$

where uppercase variables represent the observable random quantities, and $C$ is the CDF of the chi-squared distribution with $a_2$ degrees of freedom, which is readily available from statistical software packages. The above uniform distribution is free of $\hat u_1$. Therefore, the random variable $U_{21}$ is distributed as (12) independently of $U_1$. Obviously, the above argument can be extended from two sets of random effects to enable handling of any number of variance components in an iterative manner.
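The whitening step that produces the two independent chi-squared variables can be sketched as follows. The block $(\tilde X'\tilde X)^{-1}_{12}$ here is a made-up positive-definite matrix purely for illustration, and the blocks are taken to be of full rank so the degrees of freedom are simply $k_1$ and $k_2$:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(2)
k1, k2, n = 3, 4, 20_000
sigma2_e, rho1, rho2 = 1.0, 0.4, 0.9

# a made-up positive-definite stand-in for the (X'X)^{-1}_{12} block
A = rng.normal(size=(k1 + k2, k1 + k2))
XtX_inv_12 = A @ A.T / (k1 + k2)
Sigma_xrho = XtX_inv_12 + np.diag([rho1] * k1 + [rho2] * k2)

# draw (u1_hat, u2_hat) jointly, then whiten with the inverse symmetric square root
L = np.linalg.cholesky(sigma2_e * Sigma_xrho)
u_hat = L @ rng.normal(size=(k1 + k2, n))           # columns ~ N(0, sigma2_e * Sigma_xrho)
S_inv_half = np.linalg.inv(linalg.sqrtm(Sigma_xrho).real)
w = S_inv_half @ u_hat / np.sqrt(sigma2_e)          # columns ~ N(0, I)

V1 = (w[:k1] ** 2).sum(axis=0)    # ~ chi-squared with k1 df
V2 = (w[k1:] ** 2).sum(axis=0)    # ~ chi-squared with k2 df, independent of V1
print(V1.mean(), V2.mean())       # approximately k1 and k2
assert abs(np.corrcoef(V1, V2)[0, 1]) < 0.05
```

The near-zero sample correlation between the two quadratic forms illustrates the independence that makes the iterative treatment of several variance components workable.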

Illustration: Inference on functions of variance components
In this section we illustrate the application of the foregoing results by undertaking a function of variance components that arises in the practice of statistical quality assurance and in some applications of BLUPs. A variety of problems are associated with Gauge "Repeatability" and "Reproducibility"; the reader is referred to Wu and Hamada (2009) for a detailed discussion of these notions and for the various sums and ratios of variance components that are important in quality assurance measurement systems. We will illustrate the approach that one can take in such applications by considering the problems of making inferences on a variance component and on the sum of one variance component $\sigma_j^2$ and the error variance $\sigma_e^2$. Making inferences about the variance ratio $\rho_j$ itself is a trivial task because

$$S(\rho_j) = \frac{\hat u_j'\left[(\tilde X'\tilde X)^{-1}_j + \rho_j I_{k_j}\right]^{-1}\hat u_j}{\sigma_e^2} \sim \chi^2_a, \qquad (15)$$

where $a = k_j - 1$ and $(\tilde X'\tilde X)^{-1}_j$ is the sub-matrix of $(\tilde X'\tilde X)^{-1}$ corresponding to the $u_j$ part of the regression. In some applications it is more convenient to work with the alternate form of (15). If needed, the computation of $S(\rho_j)$ can be simplified by diagonalizing the $(\tilde X'\tilde X)^{-1}_j$ matrix by means of an orthogonal matrix, say $P$, as

$$S(\rho_j) = \frac{1}{\sigma_e^2}\sum_i \frac{\tilde u_{ji}^2}{\lambda_{ji} + \rho_j}, \qquad (16)$$

where $D_j = P(\tilde X'\tilde X)^{-1}_j P'$ is a diagonal matrix formed by the eigenvalues $\lambda_{ji}$ of $(\tilde X'\tilde X)^{-1}_j$ and $\tilde u_j = P\hat u_j$. It follows from (15) and (6) that

$$F = \frac{s(\rho_j)/a}{S_e/e} \sim F_{a,e}, \quad \text{where } s(\rho_j) = \sigma_e^2 S(\rho_j),$$

is a test statistic involving no nuisance parameters. For example, it is easily seen that an upper $100\gamma\%$ confidence bound for $\rho_j$ is given by $s_j^{-1}\left(\frac{a}{e}\, s_e\, q_{1-\gamma}\right)$, where $q_{1-\gamma}$ is the $(1-\gamma)$th quantile of the F-distribution with $a$ and $e$ degrees of freedom, and $s_j^{-1}$ is the inverse function of $s(\rho_j)$.
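Because $s(\rho_j)$ is a sum of positive terms, each decreasing in $\rho_j$, the inversion needed for the confidence bound is a simple monotone root-finding problem. The sketch below uses placeholder eigenvalues and rotated LSE values (the names `lam` and `u_t` are hypothetical stand-ins for $\lambda_{ji}$ and $\tilde u_j = P\hat u_j$) and inverts $s(\cdot)$ at an arbitrary target value by bisection:

```python
import numpy as np
from scipy import optimize

k_j = 6
lam = np.linspace(0.1, 1.0, k_j)   # hypothetical eigenvalues of (X'X)^{-1}_j
u_t = np.sqrt(lam + 0.5)           # placeholder rotated LSE values u~_j = P u^_j

def s(rho):
    # s(rho) = sum_i u~_{ji}^2 / (lambda_{ji} + rho); strictly decreasing in rho
    return float(np.sum(u_t ** 2 / (lam + rho)))

# invert s at a target value c by bisection; in the confidence-bound formula the
# target would be built from the observed s_e and an F quantile
c = 4.0
rho_root = optimize.brentq(lambda r: s(r) - c, 1e-9, 1e6)
print(round(rho_root, 4))
assert abs(s(rho_root) - c) < 1e-7
```

The bracketing interval works because $s(\rho) \to \sum \tilde u_{ji}^2/\lambda_{ji}$ as $\rho \to 0$ and $s(\rho) \to 0$ as $\rho \to \infty$, so any target between those extremes has a unique root.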
Making inferences on individual variance components, and on sums and some ratios of variance components, is not a trivial task. In fact, the classical approach to inference fails to provide small-sample inference even in simple balanced mixed models (cf. Weerahandi (1992)), whereas inference on any function of variance components is easily accomplished by taking the generalized approach to inference. It should be emphasized that one can obtain equivalent results by taking the generalized fiducial inference approach introduced by Hannig et al. (2006). For example, most of the problems undertaken by Cisewski and Hannig (2012) can be tackled using exact distributional results without relying on any asymptotic or approximate results. Moreover, now that we have exact distributional results, one can also take the parametric bootstrap approach to make inferences involving exact probability statements, just as generalized inference does.

Inference on individual variance components
Perhaps the easiest way to make inferences on individual variance components, as well as on functions of variance components, is by replacing each nuisance parameter by its GPQ (Generalized pivotal quantity) that reduces to the parameter at the observed sample point, and then showing that the resulting quantity leads to a well-defined extreme region in the problem of testing the parameter of interest. The error variance $\sigma_e^2$ is easily handled by the obvious GPQ suggested by (6), which is basically the same as the classical PQ (Pivotal quantity). The GPQs of the other variance components are not that easy to derive, but by applying what is known as the substitution method (cf. Weerahandi (2004)) to Eqs. (6) and (15), we can obtain a GPQ $G_{\sigma_j^2}$, given in (19), that reduces to $\sigma_j^2$ at the observed sample point, where $s()^{-1}$ is the inverse function of $s()$ unless it is negative, in which case the GPQ is set to 0. Obviously, expression (19) provides a GPQ for $\sigma_j^2$, because it has the required properties:
• The distribution of $G_{\sigma_j^2}$ is free of unknown parameters.
• At observed sample points, $G_{\sigma_j^2}$ depends only on the parameter of interest, $\sigma_j^2$.
In fact, at observed sample points, $G_{\sigma_j^2}$ reduces to $\sigma_j^2$. Nevertheless, the inverse function $s^{-1}()$ of $s()$ can be avoided by first considering the problem of testing null hypotheses of the form $H_0: \sigma_j^2 \le \sigma_0^2$. A GTV (Generalized test variable), a notion introduced by Tsui and Weerahandi (1989), for testing the above hypotheses can also be obtained by the substitution method in alternative equivalent forms, as in (20). Notice that the distribution of the above test variable is free of nuisance parameters, because the expression for $T$ in (20) involves only $\sigma_j^2$, the parameter of interest. In addition, from its first expression it is clear that the test variable reduces to 1 at the observed values of the random variables, because at the observed values $S_e$ becomes equal to $s_e$ and $S$ becomes equal to $s$, thus leading to a well-defined point on the boundary of extreme regions based on $T$. Moreover, the random variable $T$ tends to be stochastically increasing in $\sigma_j^2$, because it follows from (16) that $S()$ is a stochastically decreasing function of $\sigma_j^2$. Hence, the generalized p-value for testing $H_0$ can be obtained as in (21), where $G_a$ is the CDF of the chi-squared distribution with $a$ degrees of freedom and the expectation is taken with respect to the random variable $U \sim \chi^2_e$. Generalized confidence intervals for the variance component $\sigma_j^2$ can be derived from the GPQ (19) or deduced directly from the p-value in (21). In the latter approach, if $\sigma_1^2$ and $\sigma_2^2$ are chosen such that the expectations in (21) evaluated at $\sigma_1^2$ and $\sigma_2^2$ equal $(1-\gamma)/2$ and $(1+\gamma)/2$, respectively, then $(\sigma_1^2, \sigma_2^2)$ is a $100\gamma\%$ generalized confidence interval for $\sigma_j^2$.
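The expectation in (21) is over a single chi-squared variable, so it is easily evaluated by Monte Carlo. In the sketch below the function `h` is a hypothetical stand-in for the argument of $G_a$ (the exact expression involves $s()$ and is not reproduced here), and `s_e`, `s_obs`, `e`, `a` are made-up observed values and degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
e, a = 34, 5              # hypothetical error and factor degrees of freedom
s_e, s_obs = 30.0, 9.0    # hypothetical observed values of S_e and s

def h(U, sigma2_j):
    # stand-in for the argument of G_a in (21); not the paper's exact formula
    return s_obs * U / (s_e * (1.0 + sigma2_j))

U = rng.chisquare(e, size=200_000)
p = stats.chi2.cdf(h(U, sigma2_j=1.0), df=a).mean()  # Monte Carlo estimate of E[G_a(.)]
print(round(p, 3))
```

The same loop, run over a grid of $\sigma_j^2$ values, is what one would use to solve for the $(1\mp\gamma)/2$ endpoints of the generalized confidence interval.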

Weerahandi and Yu, Journal of Statistical Distributions and Applications (2020) 7:4

Inference on sums of variances
To illustrate the approach one can take in dealing with the various sums and ratios of variance components that arise in quality assurance applications (cf. Wu and Hamada (2009)), consider the particular problem of making inferences about quantities such as $\sigma_1^2 + \sigma_2^2$. Unlike with simple variance ratios, the problem is non-trivial even if one of the two variances is the error variance $\sigma_e^2$. So, in order to illustrate the approach that one can take in dealing with sums of variance components, consider the particular problem of constructing interval estimates for the sum $\sigma_s^2 = \sigma_e^2 + \sigma_j^2$. It follows from (18) and (19) that

$$G_s = \frac{s_e}{U} + G_{\sigma_j^2}, \quad \text{where } U = \frac{S_e}{\sigma_e^2} \sim \chi^2_e,$$

is a potential GPQ for the sum. It is indeed a GPQ for $\sigma_s^2$, because it has the required properties:
• The distribution of $G_s$ is free of unknown parameters.
• At observed sample points, $G_s$ depends only on the parameter of interest.
Moreover, at observed sample points $G_s$ reduces to $\sigma_s^2$. Therefore, $100\gamma\%$ generalized confidence intervals on $\sigma_s^2$ can be constructed by first finding quantiles of the GPQ and obtaining one-sided interval estimates such as $\sigma_s^2 \le q_\gamma$. The quantiles can be computed by exact numerical integration, but the generalized confidence limits can be computed more conveniently using Monte Carlo integration. For example, 95% generalized confidence intervals for $\sigma_s^2$ can be obtained in the following steps:
• Generate a large set of random numbers from the independent chi-squared random variables $(U_i, V_i)$, with $U_i \sim \chi^2_e$ and $V_i \sim \chi^2_a$, say $i = 1, 2, \cdots, 100000$ pairs,
• For each pair, compute $G_i = G_s(U_i, V_i)$, $i = 1, 2, \cdots, 100000$,
• Sort the $G_i$ samples to obtain the order statistics $G_{(1)}, G_{(2)}, \cdots, G_{(100000)}$,
• Obtain the 95% lower confidence bound for $\sigma_s^2$ as $G_{(5000)}$,
• Obtain the 95% upper confidence bound for $\sigma_s^2$ as $G_{(95000)}$,
• Obtain the 95% equal-tail confidence interval for $\sigma_s^2$ as $(G_{(2500)}, G_{(97500)})$.
Testing of hypotheses can also be carried out using the above interval estimates. For example, the null hypothesis $H_0: \sigma_s^2 = \delta$ is rejected at the 0.05 level of significance if the interval estimate $(G_{(2500)}, G_{(97500)})$ does not contain the hypothesized value $\delta$.
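The Monte Carlo steps above can be sketched directly. The function `G_s` below is an illustrative stand-in for the GPQ of the sum (the exact expression requires the inverse function $s^{-1}()$ from (19), which depends on the data), and the degrees of freedom and observed value are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
e, a = 34, 5      # hypothetical degrees of freedom
s_e = 30.0        # hypothetical observed residual sum of squares
n = 100_000

U = rng.chisquare(e, size=n)
V = rng.chisquare(a, size=n)

def G_s(U, V):
    # illustrative stand-in for the GPQ of sigma2_e + sigma2_j, not the exact (19)
    return s_e / U + (s_e / U) * V / a

G = np.sort(G_s(U, V))
lower_95 = G[int(0.05 * n) - 1]               # 95% lower confidence bound
upper_95 = G[int(0.95 * n) - 1]               # 95% upper confidence bound
equal_tail = (G[int(0.025 * n) - 1], G[int(0.975 * n) - 1])
print(round(lower_95, 3), round(upper_95, 3))

# reject H0: sigma2_s = delta at the 5% level if delta falls outside the interval
delta = 2.0
reject = not (equal_tail[0] <= delta <= equal_tail[1])
```

Only the form of `G_s` changes from problem to problem; the sorting-and-order-statistics machinery is identical for any GPQ.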

Inference on fixed effects
Next consider the problem of making inferences on some or all fixed effects in the presence of random effects. Confidence intervals and tests available from widely used software packages such as SAS PROC MIXED and R lme are based on asymptotic distributions of the REML and ML estimates of the variance components. Unlike inferences directly on variance components, these methods are not too bad. But since inferences on fixed effects also depend on variance components to some extent, the purpose of this section is to show how one can make inferences based on exact probability statements. If the variance components were known parameters, then we could make inferences about $\beta$ based on its GLSE (Generalized least squares estimator) specified by (4), which is distributed as

$$\hat\beta \sim N\left(\beta,\; \sigma_e^2 \left(X'\Sigma^{-1}X\right)^{-1}\right). \qquad (22)$$

But the issue of concern is that, while it is trivial to handle the error variance $\sigma_e^2$, the classical approach fails to provide satisfactory inferences when the among-group variances are unknown, as is usually the case. The inference problem is not simple because $\hat\beta$ is itself a function of the nuisance parameters, namely $\hat\beta = \hat\beta(\rho)$. There is of course no issue in point estimation; in fact, simple LSE-based point estimates are unbiased, and are as good as the point estimates provided by the REML and ML methods. The problem of making inferences about components of $\beta$ beyond point estimation, however, is not trivial. So we will next develop generalized inference methods to tackle $\beta$.
To show how generalized inferences based on exact probability statements can be developed based on (22), let us consider the case where we have one factor with between-group variance $\sigma^2$. In this case the covariance matrix appearing in (4) reduces to $\Sigma = I_N + \rho Z Z'$, where $\rho = \sigma^2/\sigma_e^2$. Consider the problem of testing individual components of $\beta$, the results of which can be easily extended to test a vector of fixed effects. In the generalized approach to inference, a nuisance parameter such as $\rho$ can be tackled by its GPQ, the random quantity which reduces to $\rho$ at the observed set of sample points. In Section 4 we saw that this GPQ yields the same inferences on $\rho$ as the classical PQ, but as we see below, it has greater potential when treated as a GPQ. Now we can make inferences on the $\beta$ vector and its components based on the result (23) using the resulting GPQ. To illustrate the steps in making such generalized inferences, consider the specific problem of testing hypotheses of the form

$$H_0: \beta_i = \beta_0, \qquad (24)$$

where $\beta_0$ is a specified constant, with $\beta_0 = 0$ in (24) being the most common situation of testing the significance of a variable of special interest, such as the efficacy of a drug. If there were no random effects, we would use the t-test suggested by classical regression theory to perform the test. To develop its counterpart in the presence of random effects, let us use the distributional result

$$\hat\beta_i(\rho) \sim N\left(\beta_i,\; \sigma_e^2 \left(X'\Sigma^{-1}X\right)^{-1}_{ii}\right),$$

where $\hat\beta_i(\rho)$ is the $i$th component of $\hat\beta(\rho)$ corresponding to the parameter $\beta_i$ being tested. This result along with (6) implies that

$$T = \frac{\hat\beta_i(\rho) - \beta_i}{\sqrt{(S_e/e)\left(X'\Sigma^{-1}X\right)^{-1}_{ii}}} \sim t_e, \qquad (25)$$

a result we could have used if the variance ratio $\rho$ were known. Moreover, recall that we can tackle the $\rho$ parameter using the distributional results (6) and (15), namely $S_e/\sigma_e^2 \sim \chi^2_e$, where $e = N - \mathrm{rank}(\tilde X)$, and $S(\rho) \sim \chi^2_a$ with $a = l - 1$ in the case of one grouping structure with $l$ groups.
Since $\rho$ is unknown in practice, the above result leads only to an approximate t-test. However, we can derive a generalized test based on exact probability statements using the foregoing distributional results. In view of the form of the $T$ statistic in the known-$\rho$ case, it is evident that the unknown variance ratio can be tackled by using the potential generalized test variable $\tilde T$ given in (26), where $R$ is a function of $U$ and $V$, and $Z \sim N(0, 1)$, $U \sim \chi^2_e$, and $V \sim \chi^2_a$ are mutually independent random variables.
It is clear from the second expression of (26) that the distribution of $\tilde T$ is free of unknown parameters, and from the first expression it is clear that at the observed set of sample points the test variable reduces to $1/\sqrt{s_e}$. Moreover, the distribution of $\tilde T$ is stochastically monotonic in $\beta_i$. Therefore, hypotheses of the form (24) can be tested based on the generalized p-value given by (27), or $1 - p$, where the expectation in (27) is taken with respect to the chi-squared random variables $U$ and $V$, and $S(\rho)$ is given by (15).
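Because $\tilde T$ is built from the three mutually independent variables $Z$, $U$, and $V$, its tail probabilities are straightforward to estimate by simulation. In the sketch below both `R` and the form of `T` are hypothetical stand-ins (the exact expressions in (26) involve $s()$ from (15) and the observed design), so the code shows only the generic shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(6)
e, a, n = 34, 4, 200_000

Z = rng.normal(size=n)
U = rng.chisquare(e, size=n)
V = rng.chisquare(a, size=n)

# R is a function of U and V recovering the variance-ratio GPQ; this stand-in
# simply uses a ratio of scaled chi-squareds
R = (V / a) / (U / e)

# hypothetical generalized test variable built from Z, U, and R as in (26)
T = Z / np.sqrt(U / e) * np.sqrt(1.0 + R)
t_obs = 2.1                                 # hypothetical observed statistic
p = (np.abs(T) >= t_obs).mean()             # two-sided generalized p-value estimate
print(round(p, 4))
```

In practice the only problem-specific work is coding the exact $R(U, V)$ implied by the data; the Monte Carlo shell is unchanged.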

Simulation study
The purpose of this section is to conduct a Monte Carlo simulation study of the performance of the proposed method compared with the widely used MLE based methods that are available from such software packages as SAS Proc Mixed and R lme. Weerahandi (2004, Chapter 4) provides simulated results showing serious Type I error issues of ML and REML based tests in the context of random effects ANOVA. In our simulation study concerning the regression case, we considered 3, 5, and 10 groups, and the error variance was fixed at 1. Though not necessary, to minimize the extent of the simulation, the sample size of each group was assumed to be equal and set at the values n = 10 and 100. In each of the studies below, we also included one X variable with fixed effect 10, where X takes on normal random numbers with mean 0 and variance 0.1. Moreover, except when more than one random effect is necessary, Z was taken as normal random numbers with mean 1 and variance 1. In typical applications where mixed-effects models add value over fixed-effects models, the factor variance tends to be smaller than the error variance, and so the latter was fixed at 1 while the former was varied to have standard deviations of 0.01, 0.1, and 1.
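A data-generating sketch for one replication of this design might look as follows. This is our own simplified reading of the setup (the grouping is equal-sized, the `z` covariate carries the random effect, and the function name `simulate` is ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate(groups, n_per_group, sigma_A, beta=10.0, sigma_e=1.0):
    """One simulated data set following the design described above (our sketch)."""
    N = groups * n_per_group
    x = rng.normal(0.0, np.sqrt(0.1), size=N)      # covariate with fixed effect beta
    z = rng.normal(1.0, 1.0, size=N)               # Z covariate: mean 1, variance 1
    g = np.repeat(np.arange(groups), n_per_group)  # equal-sized group labels
    u = rng.normal(0.0, sigma_A, size=groups)      # one random effect per group
    y = beta * x + z * u[g] + rng.normal(0.0, sigma_e, size=N)
    return x, z, g, y

x, z, g, y = simulate(groups=5, n_per_group=10, sigma_A=0.1)
print(len(y))   # 50 observations
```

Looping this over the grid of group counts, sample sizes, and factor standard deviations, and applying each competing test to every replication, reproduces the structure of the study summarized in Tables 1 and 2.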

Testing the variance component
The literature provides comparisons of tests for variance components basically for ANOVA-type models only. Here we compare the performance of the proposed generalized test in Section 4 against two methods available from the R software package. One widely used test is an REML based method available from the R lme function with the "interval" procedure, which is an improved REML based Wald-type confidence interval. The other competing method is the PL (Profile likelihood) based confidence interval available from R lmer with the "confint" procedure. Table 1 below shows the Type I error of the alternative tests for testing hypotheses of the form $H_0: \sigma_A^2 \le \sigma_0^2$. When the sample size and the factor variance are both large, the REML method fails due to non-convergence, and such cases are indicated as NA. It is evident from Table 1 that the proposed test outperforms both competing methods under all scenarios. The REML method seriously misses the intended size of tests when the factor variance is small and becomes conservative when the factor variance is large, whereas the PL method is too conservative in all cases. The proposed method also becomes somewhat conservative when the factor variance is large. The performance of the likelihood based methods improves somewhat as the number of factor levels increases.

Testing fixed effects
The literature does not seem to provide any comparison of tests for the fixed effects of mixed-effects regression models, perhaps on the thinking that their optimality is covered by classical theory. Although the situation is not as serious as that of variance components, the assumption is valid only if the variance components are known. Therefore, here we compare the performance of the proposed generalized test in Section 5 against the two methods available from software packages such as SAS and R, namely the REML and ML based tests. Again, since multiple covariates add no value in understanding the comparative performance of competing tests, we consider one covariate X, whose fixed effect is denoted as β. The performance was found to vary somewhat with the choice of x, and therefore, to understand the overall performance of competing tests for $H_0: \beta = 0$, we generated X as a 50% mixture of the X used in the previous section and one taking equally spaced values from -1 to 1. Table 2 below shows the performance of the competing methods as the number of groups takes on the typical values 3, 5, and 10, and the sample size from each group takes on the values 10 and 100. In each case, the performance of the competing tests was studied as the factor standard deviation $\sigma_A$ varies from 0.1 to 1.0. It is evident from Table 2 that when the number of groups is small, the Type I error of the MLE based methods tends to be somewhat larger than the intended size of 0.05, whereas the proposed method stays closer to the intended size. The ML test is worse than the REML test, with Type I error as large as .084, though not as bad as in tests on variance components. When the sample size is small, the Type I error of each test tends to go up with the factor variance, whereas when the sample sizes are large, the Type I errors of the REML and PL methods tend to go down. When the number of groups and the sample sizes are both large, all three methods behave practically the same.

Discussion
In this article we have shown how to make inferences in mixed-effects models based on exact distributions of the underlying statistics when the covariance structure is compound symmetric. The simulation studies undertaken in this article have shown the potential of developing tests and confidence intervals that are better than currently available asymptotic methods. Our results also show that there is potential for further improvement of inferences, especially when factor variances are very large or very small compared to the error variance. Our tests and interval estimates are based on the generalized inference approach. Researchers are encouraged to develop alternative inferences by taking such other promising approaches as parametric bootstrap and fiducial inference, with a view towards further improving the performance of tests, especially when factor variances are large compared to the error variance. It is also desirable to extend the results to other widely used and practically useful covariance structures. Gamage et al. (2013) developed generalized prediction intervals for BLUPs of mixed models. Further research is necessary to extend such results to compare two BLUPs in the same model or the BLUPs of two independent models. The results presented in this article should pave the way for such developments.