Maximum Likelihood and Restricted Likelihood Solutions in Multiple-Method Studies

A formulation of the problem of combining data from several sources is discussed in terms of random effects models. The unknown measurement precision is assumed not to be the same for all methods. We investigate maximum likelihood solutions in this model. By representing the likelihood equations as simultaneous polynomial equations, the exact form of the Groebner basis for their stationary points is derived when there are two methods. A parametrization of these solutions which allows their comparison is suggested. A numerical method for solving the likelihood equations is outlined, and an alternative to the maximum likelihood method, the restricted maximum likelihood, is studied. In the situation when the method variances are considered to be known, an upper bound on the between-method variance is obtained. The relationship between the likelihood equations and moment-type equations is also discussed.

Restricted maximum likelihood is considered in Sec. 3; an explicit formula for the restricted likelihood estimator in the case of two methods is derived in Sec. 3.2. Section 4 deals with the situation when the method variances are considered to be known, and an upper bound on the between-method variance is obtained. Section 5 discusses the relationship between the likelihood equations and moment-type equations, and Sec. 6 gives some conclusions. All auxiliary material related to an optimization problem and to elementary symmetric polynomials is collected in the Appendix.

ML Method and Polynomial Equations
To model the interlaboratory testing situation, denote by n_i the number of observations made in laboratory i, i = 1, …, p, whose observations x_ij have the form

x_ij = μ + b_i + ε_ij, j = 1, …, n_i. (1)

Here μ is the true property value, b_i represents the method (or laboratory) effect, which is assumed to be normal with mean 0 and unknown variance σ^2, and the ε_ij represent independent normal, zero-mean random errors with unknown variances τ_i^2.
For a fixed i, the i-th sample mean x̄_i = ∑_j x_ij / n_i is normally distributed with mean μ and variance σ^2 + σ_i^2, where σ_i^2 = τ_i^2 / n_i. If the σ's were known, the best estimator of μ would be the weighted average of the x̄_i with weights proportional to 1/(σ^2 + σ_i^2). Since these variances are unknown, the weights have to be estimated. Traditionally, to evaluate σ_i^2 one uses the classical unbiased statistic s_i^2 = ∑_j (x_ij − x̄_i)^2 / (n_i ν_i), ν_i = n_i − 1, which has the distribution σ_i^2 χ^2(ν_i)/ν_i. Since the statistics x̄_i, s_i^2, i = 1, …, p, are sufficient, we use the likelihood function based on them.
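The computations just described are straightforward; the following minimal Python sketch (function names are illustrative, not from the paper) computes the statistics x̄_i and s_i^2 and the weighted mean with weights proportional to 1/(σ^2 + σ_i^2), assuming the variances are supplied:

```python
def sufficient_stats(lab_data):
    """Per-laboratory sample means x_bar_i and unbiased statistics
    s_i^2 = sum_j (x_ij - x_bar_i)^2 / (n_i * (n_i - 1)),
    which estimate sigma_i^2 = tau_i^2 / n_i."""
    stats = []
    for x in lab_data:
        n = len(x)
        xbar = sum(x) / n
        s_sq = sum((xj - xbar) ** 2 for xj in x) / (n * (n - 1))
        stats.append((xbar, s_sq, n))
    return stats

def weighted_mean(xbars, sigma_i_sqs, sigma_sq):
    """Weighted average of the x_bar_i with weights 1/(sigma^2 + sigma_i^2),
    the best estimator of mu when all variances are known."""
    w = [1.0 / (sigma_sq + v) for v in sigma_i_sqs]
    return sum(wi * xi for wi, xi in zip(w, xbars)) / sum(w)
```

With estimated variances in place of the known ones, the same formula yields the plug-in weighted means statistics discussed later in Sec. 5.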
The ML solution minimizes in μ, σ^2, σ_i^2, i = 1, …, p, the negative logarithm of this function, which is proportional to

∑_i (x̄_i − μ)^2/(σ^2 + σ_i^2) + ∑_i ν_i s_i^2/σ_i^2 + ∑_i log(σ^2 + σ_i^2) + ∑_i ν_i log σ_i^2. (2)

It follows from (2) that if σ̂_i^2 is the ML estimator of σ_i^2, then σ̂_i^2 > 0. However, it is quite possible that σ̂^2 = 0. In order to find these estimates one can replace μ in (2) by the weighted mean

μ̃ = ∑_i w_i x̄_i / ∑_i w_i, w_i = 1/(σ^2 + σ_i^2), (3)

which reduces the number of parameters from p + 2 to p + 1.
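For concreteness, here is a sketch evaluating a negative log-likelihood of this type. The form assumed in the code (two weighted quadratic terms plus two logarithmic terms, additive constants dropped) is our reading of (2) from the surrounding text; the overall scaling is immaterial for minimization.

```python
import math

def neg_log_lik(mu, sigma_sq, sigma_i_sqs, xbars, s_sqs, ns):
    """Negative log-likelihood, up to additive constants, assuming
    x_bar_i ~ N(mu, sigma^2 + sigma_i^2) and
    nu_i * s_i^2 / sigma_i^2 ~ chi^2(nu_i), with nu_i = n_i - 1."""
    total = 0.0
    for xb, s_sq, n, v in zip(xbars, s_sqs, ns, sigma_i_sqs):
        nu = n - 1
        total += (xb - mu) ** 2 / (sigma_sq + v) + math.log(sigma_sq + v)
        total += nu * s_sq / v + nu * math.log(v)
    return total
```

For fixed variances this function is quadratic in μ and is minimized by the weighted mean (3).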
Our goal is to represent the set of all stationary points of the likelihood equations as solutions to simultaneous polynomial equations. To that end, note that with w_i = 1/(σ^2 + σ_i^2),

∑_i w_i (x̄_i − μ̃)^2 = ∑_{i<j} w_i w_j (x̄_i − x̄_j)^2 / ∑_i w_i. (4)

This formula easily follows from the Lagrange identity [7, Sec. 1.3]. We introduce a polynomial P of degree p in σ^2,

P(σ^2) = ∏_{i=1}^p (σ^2 + σ_i^2) = ∑_{k=0}^p e_k σ^{2(p−k)}, (5)

where e_k = e_k(σ_1^2, …, σ_p^2) is the k-th elementary symmetric polynomial, and another polynomial of degree p − 2,

Q(σ^2) = ∑_{i<j} (x̄_i − x̄_j)^2 ∏_{k≠i,j} (σ^2 + σ_k^2). (6)

Since

∑_i w_i = P′(σ^2)/P(σ^2), (7)

the identity (4) can be written as

∑_i w_i (x̄_i − μ̃)^2 = Q(σ^2)/P′(σ^2). (8)

The negative log-likelihood function (2) in this notation is

Q/P′ + log P + ∑_i [ν_i s_i^2/σ_i^2 + ν_i log σ_i^2]. (9)

Let P_i = ∏_{k≠i} (σ^2 + σ_k^2) be the partial derivative of P with respect to σ_i^2; denote by Q_i the same partial derivative of Q and by P′_i = ∑_{j≠i} P_ij the partial derivative of P′(σ^2, σ_1^2, …, σ_p^2), where P_ij = ∏_{k≠i,j} (σ^2 + σ_k^2). By differentiating (9), we see that the stationary points of (2) satisfy polynomial equations (10). Each of these polynomials has degree 4 in σ_i^2, i = 1, …, p. When σ^2 = 0, P = ∏_i σ_i^2, and the equations (10) simplify to equations (11), whose polynomials have degree 3 in σ_i^2. If σ^2 > 0, in addition to (10) one has the equation F = 0 (12), where F has degree 3p − 3 in σ^2.
In both cases the collection of all stationary points forms an affine variety whose structure can be studied via the Groebner basis of the ideal of polynomials (11) or (10) and (12) which vanish on this variety. The Groebner basis allows for successive elimination of variables leading to a description of the points in the variety, i.e., to the characterization of all (complex) solutions. There are powerful numerical algorithms for evaluation of such bases [8]. Many polynomial likelihood equations are reviewed in Ref. [9]. We determine the Groebner basis for equations (11) when p = 2 in the next section.
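The elimination property can be illustrated in a few lines with SymPy on a toy system (not the actual equations (11)): under the lexicographic order x > y, the last basis polynomial involves y alone, so y can be solved for first and x recovered by back-substitution.

```python
from sympy import symbols, groebner, sqrt

x, y = symbols('x y')
# Toy polynomial system standing in for likelihood equations:
# the circle x^2 + y^2 = 1 intersected with the line x = y.
G = groebner([x**2 + y**2 - 1, x - y], x, y, order='lex')
polys = list(G.exprs)
# Under lex order x > y, the last polynomial is univariate in y,
# realizing the elimination of x.
```

The same mechanism, applied to (13) below, produces the basis polynomials (16) and (17).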

ML Method: p = 2
When p = 2, Q = (x̄_1 − x̄_2)^2. If σ^2 = 0, the polynomial equations (11) take the form (13). The Groebner basis is useful for solving these equations as it allows elimination of one of the variables, say σ_2^2. With n = n_1 + n_2, under the lexicographic order σ_1^2 > σ_2^2, the Groebner basis for equations (13), written in the form (14)–(15), consists of two polynomials, (16) and (17). This fact can be derived from the definition of the Groebner basis and confirmed by existing computational algebra software.
Thus, there are 3 × 6 = 18 complex root pairs, out of which the positive roots u, v are to be chosen. Although in practice most of these roots are complex or negative and are not meaningful, sometimes the number of positive roots is fairly large; in each particular case one has to compare the likelihood function evaluated at these solutions. This fact excludes the possibility stated in (24). Unfortunately, the Groebner basis for equations (21)–(23) has a much more complicated structure: the form of the monomials entering into the basis polynomials depends on z_1, z_2, n_1 and n_2. To find solutions to (21)–(23) we start with conditions on u and v under which y > 0 in (21)–(23).
For fixed u and v, the behavior of the derivative of (9) with respect to σ^2 is determined by that of the cubic polynomial (12), which now takes the form F(y); this derivative is positive if and only if F(y) > 0.
In this and only in this situation (whose probability is zero) equality holds. To study the roots of the polynomial in (33) in the interval (−1, 1), observe first that it cannot have more than two roots there. Indeed, the polynomial in (33) assumes negative values at the end points, and it has exactly two roots if and only if there is a point in this region at which that polynomial is positive.
We summarize the obtained results.
We conclude by noting the relationship of the problem discussed here to the likelihood solutions of the classical Behrens-Fisher problem [10], which assumes that σ^2 = 0.

Solving Likelihood Equations Numerically
In view of the difficulty of evaluating the Groebner basis for p > 2, an iterative method to solve the optimization problem in (2) is of interest. Notice that the sum of the first two terms in (2) is a homogeneous function of σ^2, σ_1^2, …, σ_p^2, which reduces the problem to minimization of the objective function F in (39)–(40). To minimize it, one can impose a restriction on y_0, y_1, …, y_p such as ∑ y_i = 1 or λ_0 = 1. Note that F tends to +∞ when y_i → 0 or y_i → +∞ for any fixed y_0 > 0, since its derivative (41) with respect to y_i must be positive for large y_i and negative for sufficiently small y_i. If y_0 > 0, equating this derivative to zero, we obtain equation (42). The equation (42) must have a positive root, which implies the inequalities (43) for λ_i evaluated at a stationary point, and (44). It follows that for any stationary point 2 s_i^2 > λ_0 y_i, i.e., (45) holds. For a fixed value of λ_0, say λ_0 = 1, Eq. (42) leads to the iteration scheme (46) with specified initial values of y_0^(0) and μ^(0). We take these to be the estimates arrived at by the method of moments as described later in Sec. 5; once they are given, one can solve the cubic equation (47) for v_i = y_0 / y_i. It is easy to see that each of these equations has either one or three positive roots. If there is just one root, then it uniquely defines v_i; in the case of three positive roots, the root which minimizes (40) is chosen. As in [11], (46) defines a sequence converging to a stationary point, and at each step the value of (40) is decreased.

Theorem 2.2 Successive-substitution iteration defined by equations (46), (47) and (48) converges to a stationary point of (40) so that at each step this function decreases. At any stationary point inequalities (43) and (44) hold and (45) is valid.
Notice that Vangel and Rukhin tacitly assume in [11] that y_0 > 0. The case of the global minimum attained at the boundary, i.e., when y_0 = 0, can be handled by equating (41) to zero. The maximum likelihood estimator and the restricted maximum likelihood estimator discussed in the next section can be computed via their R-language implementation, the lme function from the nlme library [12]. However, this routine has a potential (false or singular) convergence problem occurring in some practical situations.

Restricted Maximum Likelihood
A possible drawback of the maximum likelihood method is that it may lead to biased estimators. Indeed, the maximum likelihood estimators of the σ's do not take into account the loss in degrees of freedom that results from estimating μ. This undesirable feature is eliminated in the restricted maximum likelihood (REML) method, which maximizes the likelihood corresponding to the distribution of linear combinations of the observations whose expectation is zero. This method, discussed in detail by Harville [13], has gained considerable popularity and is now employed as a tool of choice by many statistical software packages.
The negative restricted likelihood function has the form (51). By using the notation of (5) and (6), its stationary points in σ^2 are roots of a polynomial H (53), whose degree in σ^2 is lower than 3p − 3, the degree of the corresponding polynomial F in (12) under the ML scenario.
An application of (54) shows (55). Since P(σ^2) > 0, if F has a positive root σ̂^2, then H also must have a positive root τ̃^2. Moreover, σ̂^2 ≤ τ̃^2, so that the ML estimator of σ^2 never exceeds the REML estimator of the same parameter for the same data.
The polynomial equations for the restricted likelihood method have the form (56), with each of these polynomials being of degree 3 in σ_i^2, i = 1, …, p. When σ^2 > 0, Eqs. (56) have to be augmented by the equation H = 0.
When y_0 > 0, the function in (57) takes the form (58); its derivative with respect to y_i is (59). As in Sec. 2.2, for a fixed value λ_0 = 1 we can use an iterative scheme based on (59). However, now one has to specify initial values of y_0^(0), μ^(0) and of A^(0) = ∑_j (y_0 + y_j)^{−1}, after which the cubic equations (60) are solved for i = 1, …, p, with updating as in (47). The solutions obtained via (60) and (64) are to be compared by evaluating RL in (57).

REML: p = 2
To find the REML solutions σ̃^2 when p = 2, one can again use a Groebner basis. This Groebner basis consists of three polynomials (70), whose coefficients are not needed here. All stationary points of the restricted likelihood equations can be found by solving the cubic equation G_5(v) = 0 for v > 0 and substituting this solution in (71), which allows one to express u through v. Thus, in this case there are 3 complex non-zero root pairs u, v.
Another approach is to use the argument in Sec. 3.1, which for a fixed sum 2y_0 + y_1 + y_2 = 1 leads to the optimization problem (73). If the condition of Lemma 1 in the Appendix holds, then for a fixed sum y_1 + y_2 = y the minimum in (73) monotonically decreases when 0 < y ≤ 1, so that this minimum is attained at the boundary, y_1 + y_2 = 1, in which case indeed σ̃^2 = 0. Thus, one can solve the cubic equation (74) for w = 1 − 2y_1. An alternative method is to use the iteration scheme (64) to solve (57).

Conditions and Bounds for Strictly Positive Variance Estimates
In view of the rather complicated nature of the likelihood equations, in many situations it is desirable to have a simpler estimating method for the common mean μ. The most straightforward way is to assume that the variances σ_i^2, i = 1, …, p, are known. In this case, essentially suggested in Ref. [15] but also pursued in Refs. [16, 17], these variances are taken to be s_i^2 (or a multiple thereof). Because of (3), the only parameter to estimate is y = σ^2. Lemma 3 in the Appendix then gives, for the polynomial H in (53), the bounds (75) and (76). When p = 2, the bounds (75) and (76) are sharp, as (24) and (67) show; when p increases, their accuracy decreases. Section 4.2 gives necessary and sufficient conditions for σ̂^2 = 0 when p = 3. It is possible to get better estimates under additional conditions: for example, under an ordering condition on the sequences (77) and (78), the two bounds can be replaced by their average.

Example: Restricted Maximum Likelihood for p = 3
When p = 3, Q(y) = q_0 y + q_1 is a polynomial of degree one, and H in (84) is a polynomial of degree three. If H(0) = h_3 < 0, it has a positive root, which means that σ̃^2 > 0. Otherwise the existence of a positive root is related to the sign of the discriminant D in (85). If D < 0 and h_3 ≥ 0, there is just one real root, which must be negative, and then σ̃^2 = 0. If D ≥ 0, then there are three real roots. The condition h_3 ≥ 0 implies that either two of them are positive and one is negative (so that at least one of the coefficients h_1 or h_2 is negative) and σ̃^2 > 0, or all three roots are negative (so that σ̃^2 = 0). The discriminant of (87) is negative, and according to Descartes's rule of signs there are always two complex roots and one positive root. It is easy to check that E_1 = E_1(λ) is monotonically increasing in λ and that E_1(1/16) = 1/4. In this notation, when λ ≤ 1/16 (so that h_2 ≤ 0 implies h_1 ≤ 0), the region {(E_1, E_2) : E_2 ≤ E_1^2/3, h_3 ≥ 0, σ̃^2 > 0} is bounded by three curves, the first of which is h_3 = 0; the probability of the likelihood solution falling on these boundary curves is zero.

Weighted Means Statistics
When the within-lab and between-lab variances σ_i^2 and σ^2 are known, the best (in terms of mean squared error) unbiased estimator of the treatment effect μ in the model (1) is the weighted means statistic (3) with w_i = w_i^0 = 1/(σ^2 + σ_i^2). Even without the normality assumption, for these optimal weights the variance of this statistic is [∑_i 1/(σ^2 + σ_i^2)]^{−1} (88). If Var(x̄_i) = σ_i^2 + σ^2 but the weights w_i are arbitrary, the variance of (3) is ∑_i w_i^2 (σ^2 + σ_i^2) / (∑_i w_i)^2. The simplest estimate of the within-trial variances σ_i^2 is the available s_i^2, but the problem of estimating the between-trial component of variance σ^2 remains.

DerSimonian-Laird Procedure
By employing the idea behind the method of moments, DerSimonian and Laird [18] derived a moment estimator of σ^2 from the weighted sum of squared deviations of the x̄_i about the Graybill-Deal estimator of the common mean, i.e., about the statistic x̄_0 formed with the weights w_i = 1/s_i^2.
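The DerSimonian-Laird procedure admits a compact implementation. The sketch below follows the commonly stated form of the method (moment equation based on the weighted sum of squares about the Graybill-Deal mean, truncated at zero); its correspondence with the displays omitted here is an assumption.

```python
def dersimonian_laird(xbars, s_sqs):
    """Moment estimator of the between-lab variance sigma^2 and the
    resulting weighted mean; s_sqs play the role of the s_i^2."""
    p = len(xbars)
    w = [1.0 / s for s in s_sqs]
    sw = sum(w)
    x_gd = sum(wi * xi for wi, xi in zip(w, xbars)) / sw  # Graybill-Deal mean
    q = sum(wi * (xi - x_gd) ** 2 for wi, xi in zip(w, xbars))
    denom = sw - sum(wi ** 2 for wi in w) / sw
    y = max(0.0, (q - (p - 1)) / denom)  # truncated moment estimate of sigma^2
    w2 = [1.0 / (y + s) for s in s_sqs]  # re-weight with 1/(y + s_i^2)
    mu = sum(wi * xi for wi, xi in zip(w2, xbars)) / sum(w2)
    return y, mu
```

When the data show no dispersion beyond the within-lab variances, the truncation returns y = 0 and the estimator of μ reduces to the Graybill-Deal statistic.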

Mandel-Paule Method
The Mandel-Paule algorithm uses weights of the form (91) as well, but now y = y_MP, which is designed to approximate σ^2, is found from the moment-type estimating equation (93); see Refs. [15], [19]. The modified Mandel-Paule procedure with y = y_MMP is defined as above, but p − 1 in the right-hand side of (93) is replaced by p. Notice that when p = 2, the DerSimonian-Laird estimator coincides with the Mandel-Paule rule. It was shown in Ref. [20] that the modified Mandel-Paule rule is characterized by the following fact: the ML estimator σ̂^2 of σ^2 coincides with y_MMP if, in the reparametrized version of the likelihood equations, the weights w_i admit representation (91). As a consequence, the corresponding weighted means statistic (3) must be close to the ML estimator, so that the modified Mandel-Paule estimator approximates its ML counterpart. Thus, the modified Mandel-Paule estimator can be interpreted as a procedure which uses the weights 1/(y + s_i^2), instead of the solutions of the likelihood equations that are more difficult to find, and still maintains the same estimate of σ^2 as ML.
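The Mandel-Paule equation can be solved by simple bisection, since its left-hand side decreases in y. The sketch below assumes the commonly stated form of (93), ∑_i (x̄_i − x̃(y))^2 / (y + s_i^2) = p − 1, with x̃(y) the weighted mean under the weights 1/(y + s_i^2):

```python
def mandel_paule(xbars, s_sqs, tol=1e-10):
    """Solve sum_i (xbar_i - x~(y))^2 / (y + s_i^2) = p - 1 for y >= 0 by
    bisection; x~(y) is the weighted mean with weights 1/(y + s_i^2).
    The left-hand side decreases in y, so a sign change brackets the root."""
    p = len(xbars)

    def lhs(y):
        w = [1.0 / (y + s) for s in s_sqs]
        m = sum(wi * xi for wi, xi in zip(w, xbars)) / sum(w)
        return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, xbars))

    if lhs(0.0) <= p - 1:            # the equation has no positive root
        return 0.0
    lo, hi = 0.0, 1.0
    while lhs(hi) > p - 1:           # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) > p - 1 else (lo, mid)
    return 0.5 * (lo + hi)
```

For p = 2 this root agrees with the DerSimonian-Laird estimate, consistent with the coincidence of the two rules noted in the text.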
A similar interpretation holds for the original Mandel-Paule rule and the REML function [21]. For this reason both Mandel-Paule rules are quite natural. It is also natural to use the weights (91) with y = y_MP determined from the Mandel-Paule equation (93) as a first approximation when solving the likelihood equations via the iterative scheme (47) and (48).

Uncertainty Assessment
It is tempting to use formula (88) to obtain an estimator of the variance of x̃. For example, DerSimonian and Laird [18] suggested an approximate formula for the estimate of the variance of their estimator; similarly motivated by (88), Paule and Mandel [19] suggested using [∑_i (y_MP + s_i^2)^{−1}]^{−1} to estimate the variance of x̃_MP. However, these estimators typically underestimate the true variance (Ref. [5]). They are not GUM consistent [22] in the sense that the variance estimate is not representable as a quadratic form in the deviations x̄_i − x̃.
Horn, Horn and Duncan [23], in the more general context of linear models, suggested the following GUM-consistent estimate of Var(x̃): ∑_i ω_i^2 (x̄_i − x̃)^2 / (1 − ω_i), with ω_i = w_i / ∑_k w_k. When the plug-in weights are proportional to (y + s_i^2)^{−1} and p = 2, the resulting estimate δ is an increasing function of y with the largest value (x̄_1 − x̄_2)^2 / 4 attained as y → ∞.
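The Horn-Horn-Duncan estimate as quoted above is easy to compute; note that for p = 2 with equal weights it equals (x̄_1 − x̄_2)^2 / 4, matching the limiting value just mentioned.

```python
def hhd_variance(xbars, weights):
    """GUM-consistent variance estimate of the weighted mean:
    sum_i omega_i^2 * (xbar_i - x~)^2 / (1 - omega_i),
    where omega_i = w_i / sum_k w_k."""
    sw = sum(weights)
    om = [w / sw for w in weights]
    m = sum(o * xi for o, xi in zip(om, xbars))
    return sum(o ** 2 * (xi - m) ** 2 / (1.0 - o) for o, xi in zip(om, xbars))
```

Being a quadratic form in the deviations x̄_i − x̃, this estimate is GUM consistent by construction.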
An alternative method of obtaining confidence intervals for μ on the basis of REML estimators was suggested in Ref. [24]; it uses an adjusted variance estimator based on the inverse of the Fisher information matrix. This (not GUM consistent) estimator is more complicated.

Conclusions
The original motivation for this work was an attempt to employ modern computational algebra techniques by evaluating the Groebner bases of likelihood polynomial equations for the random effects model. While this attempt leads to an explicit answer when there are two labs with no between-lab effect, it was not successful in the more general situation. The classical iterative algorithms appear to be more efficient in this application, although they do not guarantee the global optimum. Simplified, method-of-moments based procedures, especially the DerSimonian-Laird method, deserve much wider use in interlaboratory studies.

An Optimization Problem: Lemma 1
Lemma 1 asserts that, under its condition z_1/ν_1 + z_2/ν_2 > 1, the minimum in (100) is monotonically nonincreasing in the interval (0, 1).
Proof: With B = (1 + z_1/y_1 + z_2/y_2)/(n − 1), any stationary point (y_1, y_2) in (100) satisfies the simultaneous equations (101), where ζ is the Lagrange multiplier. By multiplying these equations by y_1 and y_2 respectively and adding them, we see that ζ y = 1 − 1/B. Notice that B ≤ 1 if and only if ζ ≤ 0, which would mean that B ≥ z_i/(ν_i y_i) for i = 1, 2. Then z_1/ν_1 + z_2/ν_2 ≤ 1, which contradicts the condition of Lemma 1. Therefore, B > 1.
To prove Lemma 1, it suffices to show that the derivative of (100) is negative. This fact follows from the inequality (102), which can be proven by differentiation of (101): for i = 1, 2 one obtains the identities (103). Since y_1 + y_2 = y and y′_1 + y′_2 = 1, adding these identities gives (104).

Elementary Symmetric Polynomials: Lemmas 2 and 3
We note the following identities for elementary symmetric polynomials.
Combined with (105), identities (106) and (107) demonstrate the formula (108), which the author was unable to find in the literature. Lemma 2 states this result under the notation above; the following Lemma 3 gives an upper bound on the coefficients of the polynomial Q.