Maximum likelihood degree of variance component models

Most statistical software packages implement numerical strategies for computation of maximum likelihood estimates in random effects models. Little is known, however, about the algebraic complexity of this problem. For the one-way layout with random effects and unbalanced group sizes, we give formulas for the algebraic degree of the likelihood equations as well as the equations for restricted maximum likelihood estimation. In particular, the latter approach is shown to be algebraically less complex. The formulas are obtained by studying a univariate rational equation whose solutions correspond to the solutions of the likelihood equations. Applying techniques from computational algebra, we also show that balanced two-way layouts with or without interaction have likelihood equations of degree four. Our work suggests that algebraic methods allow one to reliably find global optima of likelihood functions of linear mixed models with a small number of variance components.


Introduction
Linear models with fixed and random effects are widely used for dependent observations. Such mixed models are typically fit using likelihood-based techniques, and the necessary optimization problems can be solved using the numerical methods implemented in various statistical software packages, as discussed, for instance, in [Far06]. Such software typically takes into account that the variance parameters are nonnegative. However, general-purpose optimization procedures do not give any guarantees that a global optimum is found; compare Section 1.8 in [Jia07]. It can thus be appealing to compute maximum likelihood (ML) estimates algebraically. Since linear mixed models have rational likelihood equations, this involves careful clearing of denominators and applying symbolic and specialized numerical techniques to determine all solutions of the resulting polynomial system. An explanation of what we mean by careful clearing of denominators is given in [DSS09,Chap. 2]. While solving likelihood equations algebraically may not be feasible in large models with several random factors, modern computational algebra does allow one to fully understand the likelihood surface in practically relevant settings.
The main contribution of this paper is a study of the algebraic complexity of ML estimation in the unbalanced one-way layout with random effects. This model concerns a collection of grouped observations (1) Y ij = µ + α i + ε ij , i = 1, . . . , q, j = 1, . . . , n i .
The overall mean µ ∈ R is a fixed ('non-random') but unknown parameter. The random effects α i and the error terms ε ij are mutually independent normal random Key words and phrases. Analysis of variance, linear mixed model, maximum likelihood, restricted maximum likelihood, variance component.
variables. More precisely, α i ∼ N (0, τ ) and ε ij ∼ N (0, ω), where τ and ω denote the common variances of the random effects and the error terms, respectively. Clearly, the distribution of observation Y ij is N (µ, τ + ω), and two observations Y ij and Y ik from the ith group are dependent with covariance τ . A detailed discussion and examples of applications of this specific model can be found, for instance, in Chapter 3 of [SCM92] and in Chapter 11 of [SO05].
The covariance matrix of the joint multivariate normal distribution for all Y ij defined by (1) is the product of the scalar ω and a matrix that is a function of the variance ratio θ = τ /ω. Therefore, when θ is known, the likelihood equations for µ and ω are of the type encountered in generalized least squares calculations, with a unique solution that is a rational function of the data and the known value of θ. We may thus eliminate µ and ω from the likelihood equations, which then reduce to a single univariate equation. Before turning to a first example, we remark that we always tacitly assume suitable sample size conditions to be satisfied such that ML estimates exist. In particular, we assume there to be q ≥ 2 groups with at least one group of size n i ≥ 2. A definitive answer to the existence problem in linear mixed models is given in [DM99] who also treat restricted maximum likelihood (REML) estimation; see [MN89] for an introduction to this technique.
Example 1. Textbook data from [DG72,§6.4] give the yield of dyestuff from 5 different preparations from each of q = 6 different batches of an intermediate product; the data are also available in the R package lme4. The layout is balanced, that is, all batch sizes are equal, here n i = 5. In this case, the likelihood equations are well-known to be equivalent to a linear equation system, and the ML estimators are rational functions of the observations Y ij . In terminology we will use later on, balanced one-way layouts have ML degree one. Exactly the same is true for REML.
A different picture emerges in the unbalanced case, when the batch sizes are not all equal. For illustration, we remove the first, second and sixth observation from the data. The first batch then only comprises n 1 = 3 preparations, and the second batch only n 2 = 4. The remaining batches are unchanged with n i = 5 for i ≥ 3. In this unbalanced case, the solutions of the likelihood equations correspond to the solutions of the polynomial equation (2) − 245488320000 θ 7 − 277109078400 θ 6 − 58814614680 θ 5 + 54052612853 θ 4 + 37792395524 θ 3 + 10086075110 θ 2 + 1279832076 θ + 64175517 = 0.
As the large integers suggest, this equation is exact given the input. However, the measurements that enter the computation are, of course, rounded.
Numerical optimization using the R package lme4 yields a local maximum of the likelihood function that corresponds to θ ≈ 0.5585. We may check whether this local maximum is unique, or at least a global optimum, by finding all roots of the above univariate polynomial. This is a task that can be done reliably in computer algebra systems. However, such a computation is not needed here. The polynomial in (2) has exactly one sign change in its coefficient sequence. Hence, by Descartes' rule of signs, it has precisely one positive real root. The mere construction of the polynomial thus reveals that the local maximum we computed is the unique local (and global) maximum of the likelihood function; recall that the parameter θ is restricted to be nonnegative.
We note that equations (2) and (3) cannot be solved by radicals. The Galois groups are the symmetric groups S 7 and S 5 , respectively.
It is natural at this point to ask for the maximum likelihood degree of the one-way layout as a function of the number of groups q and the group sizes n 1 , . . . , n q . The ML degree is the number of complex solutions to the likelihood equations when the data are generic. Indeed, the number of complex solutions is constant with probability one, and a data set is generic if it is not part of the null set for which the number of complex solutions is different. The REML degree is defined in just the same way, but starting from different equations. Either degree measures the algebraic complexity of the computation of the estimates. In Example 1, it is simply the degree of a univariate polynomial in θ = τ /ω whose roots yield the possibly complex vectors (µ, ω, τ ) that solve the likelihood equations. For more background on ML degrees, see [HKS05,CHKS06,BHR07,DSS09,Stu09,HS10].
Our main result answers the above question. Theorem 1, which we prove in later sections, gives formulas for both the ML and the REML degree of possibly unbalanced one-way layouts and offers a direct comparison of the algebraic complexity of the two approaches. The theorem is conveniently stated using a notion of multiplicities. Suppose v = (v 1 , . . . , v q ) ∈ Z q is a tuple of integers. If v has M distinct entries, then the multiplicities of v form the integer multiset {m 1 , . . . , m M }, where m j counts how often the jth distinct entry of v appears among all entries of v.
Theorem 1. Consider a one-way layout with random effects for q groups that are of sizes n 1 , . . . , n q . Suppose M of the group sizes are distinct, with associated multiplicities m 1 , . . . , m M . Let M 2 = #{j : m j ≥ 2}. Then the ML degree is 3M + M 2 − 3, and the REML degree is 2M + 2M 2 − 3. The ML degree exceeds the REML degree unless M 2 = M , in which case equality holds.
The condition M 2 = M holds if each group size appears at least twice. In the balanced case, we have M = M 2 = 1 and the theorem recovers the well-known fact that both degrees are one; compare [Hoc85,SCM92,SO04]. Each degree is maximal when the group sizes n 1 , . . . , n q are pairwise distinct. The degrees are then 3q − 3 for ML and 2q − 3 for REML.
The remainder of the paper is structured as follows. In Section 2, we review the derivation of the likelihood equations for ML and REML estimation. Section 3 contains the proof of the ML degree formula from Theorem 1, and Section 4 treats the REML degree. Each proof consists of a detailed study of a univariate rational equation in the variance ratio θ. In Section 5, we demonstrate that algebraic computations are feasible for more general linear mixed models. More precisely, we treat a one-way layout with q = 109 unbalanced groups and a mean structure given by two covariates that is relevant in a recent application. In Section 6, we consider balanced two-way layouts. These are known to have REML degree equal to one, and we show that the ML degree is four, which means that ML estimates are available in closed form in the sense of Cardano's formula. Our conclusions are summarized in Section 7, where we also give two examples of unbalanced one-way random effects models with bimodal likelihood functions.

The likelihood equations
Let n 1 , . . . , n M be unique group sizes with associated multiplicities m 1 , . . . , m M . Let Y ij = (Y ij1 , . . . , Y ijni ) be the vector comprising the observations in the jth group of size n i . Then the model for the one-way layout given by (1) can equivalently be described as stating that Y 11 , . . . , Y 1m1 , Y 21 , . . . , Y MmM are independent multivariate normal random vectors with where the covariance matrix is Σ ni (ω, τ ) = ωI ni + τ 1 ni 1 T ni . Here, 1 n = (1, . . . , 1) T ∈ R n , and I n is the n × n identity matrix.
Let N = m 1 n 1 + · · · + m M n M be the total number of observations. For each i = 1, . . . , M , define the group averages Y ijk , j = 1, . . . , m i , and the average across the groups of equal sizē From the averages, compute the between-group sum of squares Note that, for generic data, B j = 0 if and only if m j = 1. Therefore, it suffices to consider the sums of squares B i with m i ≥ 2. Finally, define the within-group sum of squares which is positive for generic data.
Proposition 1. Upon the substitution κ = 1/ω and θ = τ /ω, the log-likelihood function for the one-way layout can be written as Proof. Applying (5), the quadratic form in (4) can be expanded into Using this expression and (6), the log-likelihood function is seen to be equal to The claimed form of ℓ(µ, κ, θ) is now obtained by expanding the last sum as The partial derivatives of the log-likelihood function from Proposition 1 are Since N = 0, the equation system obtained by setting the three partials to zero has the same solution set as the equation system Now we can solve equation (14) for µ, substitute the result into equation (15) and solve for κ. Both µ and κ are then expressed in terms of θ. Substituting the expressions into (16), we obtain a univariate rational equation in θ. Our proof of the ML degree formula in Theorem 1 proceeds by cancelling terms from the numerator and denominator of this rational expression. This is the topic of Section 3.

2.2.
Restricted maximum likelihood. The REML method uses a slightly different likelihood function that is obtained by considering a projection of the observed random array (Y ijk ) ∈ R N . The mean of this array has all entries equal to µ. In other words, it is modelled to lie in the space L ⊂ R N spanned by the array with all entries equal to one. The likelihood function used in REML is obtained by taking the observation to be the projection of (Y ijk ) onto the orthogonal complement of L. The distribution of the projection no longer depends on µ and so the REML function only has (τ, ω) or, equivalently, (κ, θ) as arguments.
Using the formulas given, for instance, in [MN89], and simplifying the resulting expressions similar to what was done in the proof of Proposition 1, we obtain the following expression for the restricted log-likelihood function.
Proposition 2. Upon the substitution κ = 1/ω and θ = τ /ω, the restricted loglikelihood function for the one-way layout can be written as Note thatμ(θ) is the solution to the equation in (14). Computingμ(θ) is the standard way to obtain an estimate of µ from a REML estimate of θ.
The partial derivatives of the restricted log-likelihood function from Proposition 2 are The equation ∂l/∂κ = 0 is easily solved. Substituting the unique solutionκ(θ) into the equation ∂l/∂θ = 0 yields again a univariate rational equation in θ. The proof of the REML degree formula in Theorem 1 requires studying cancellations from the numerator and denominator of this equation, which is the topic of Section 4.

Proof of formula for ML degree
Our proof of the ML degree formula in Theorem 1 proceeds in two steps. First, in Lemma 1 we derive a univariate rational equation whose number of zeros is the ML degree of the model. Second, we simplify it in Lemmas 2 and 3 by clearing common factors from the numerator and the denominator.
Fix the following notation, used throughout. For a vector a = (a 1 , . . . , a M ) ∈ R M , define the rational functions We write r 1 , r B/m , r Y , r Y 2 for the functions r a that have respectively. It is clear from Section 2 that forming a common denominator for the rational equations to be studied involves the product (1 + n i θ).
For a vector a ∈ R M , define the degree M − 1 polynomial (1 + n j θ) and the degree 2(M − 1) polynomial Lemma 1. The ML degree of the one-way layout is the degree of the numerator created when cancelling all common factors from numerator and denominator of the following rational function in θ: Proof. Adopting the notation above, the solution of the first of the likelihood equations in (14) can be written as Next, rewrite the following term from the system of the three critical equations: Solving the second equation in (15) with µ =μ(θ) for κ thus giveŝ Substitutingμ(θ) andκ(θ) into the third and last equation in (16), we obtain the univariate rational equation where we have divided by the non-zero rational expressionκ(θ). According to (24), this is Reexpress (26) in terms of the f and g polynomials as The claim now follows by forming a common denominator.
The denominator given in (21) in Lemma 1 has degree 2M + 2(M − 1) = 4M − 2. The numerator in (21) has degree 3(M − 1) + M = 4M − 3; the highest degree term involves the within-group sum of squares W . The next two lemmas imply that, after cancelling common factors, the numerator of the univariate rational function from Lemma 1 has the degree claimed in the ML degree formula from Theorem 1.
Lemma 3. If d 1 (θ) is cleared from both the numerator and the denominator of the rational function given in (21), then the new numerator and denominator are relatively prime for generic sufficient statisticsȲ 1 , . . . ,Ȳ M , W , and B j with m j ≥ 2.
Proof of Lemma 2. Let m t = 1. To show that (1 + n t θ) divides the numerator, it is sufficient to show (1 + n t θ) divides the sum of The product f 1 (θ) 2 g Y 2 (θ) in the first term of (29) may be rewritten as Combining this expression with the analogous expansions of the other three terms shows that the polynomial in (29) is equal to N times The polynomial in (30) can be expanded similarly. We find (1 + n l θ).
Expanding f 1 (θ) 2 as well, we obtain that the polynomial in (30) is equal to Now notice that (1 + n t θ) divides every summand in (31) and (33) unless i = k = r = t in the first summation, or i = k = r = u = t in the second summation. So it suffices to only consider these 'diagonal' terms. However, under the equality of indices, the quadratic expressions in the averagesȲ i cancel. Hence, the terms missing a factor of (1 + n t θ) in (29) and (30) sum to Throughout the paper, we assume that we have at least two groups with at least one group size n i ≥ 2. Moreover, for generic data, B t = 0 if and only if m t = 1. Hence, for generic data, the expression in (34) is zero if and only if m t = 1. We conclude that d 1 (θ) divides the numerator of the rational function in (21).
Note that the last part of the above proof shows not only that d 1 (θ) divides the numerator of (21), but that (1 + n t θ) does not divide the numerator when B t = 0, which holds generically if m t ≥ 2.
Proof of Lemma 3. Clearing d 1 (θ) from the denominator in (21) yields the polynomial N d 2 (θ)d(θ)f 1 (θ) 2 . From the preceding comment, we know that d 2 (θ) and the numerator are relatively prime for generic dataȲ 1 , . . . ,Ȳ M , W > 0, and B j > 0 with m j ≥ 2. To establish our claim, we will first show that f 1 (θ) does not share a common factor with the numerator by showing f 1 (θ) and f Y (θ) 2 g 1 (θ) to be relatively prime; all terms other than f Y (θ) 2 g 1 (θ) in the numerator of (21) are multiples of f 1 (θ). Then, we will show that after clearing d 1 (θ) in (21), d 1 (θ) and the new numerator are relatively prime.
Let θ 1 , . . . , θ M−1 be the (possibly complex) roots of the degree M − 1 polynomial A generic vector of group means (Ȳ 1 , . . . ,Ȳ M ) lies outside this lower-dimensional set, which means that f 1 (θ) and f Y (θ) are relatively prime for generic data.
To show that f 1 (θ) and g 1 (θ) are relatively prime, assume θ 0 = a + ib is a root of f 1 (θ) and g 1 (θ). Since g 1 (θ) is a sum of squares that is positive on R, we must have θ 0 / ∈ R and hence b = 0. Without loss of generality, let n 1 be the least of the group sizes n i . Rewriting f 1 (θ 0 ) = 0, we get The imaginary part of the right side of this equation must equal 0 since n 1 is an integer. Substituting a + ib for θ 0 , the imaginary part of (35) is Since each term in the sum is positive, we obtain that b = 0. Consequently, θ 0 ∈ R, which is a contradiction. Therefore, f 1 (θ) and g 1 (θ) are relatively prime.
It remains to show that the numerator and denominator obtained by clearing the factor d 1 (θ) in (21) are relatively prime for generic data. We claim that if m t = 1 then (1 + n t θ) divides are relatively prime for generic data. The ratio in (36) equals (31) divided by d 1 (θ). We may rewrite (31) as (1 + n s θ) 2 .
It is clear that the square (1 + n t θ) 2 divides all terms in the sum (38) except those for r = i = t or r = k = t. However, the quadratic form in the averagesȲ i vanishes if r = i or r = k. Since the terms in question have r = t, and B r = B t = 0 because m t = 1, we conclude that (1 + n t θ) 2 divides the entire sum (38), which proves that d 1 (θ) divides the ratio in (36).
We are left to show that d 1 (θ) and W f 1 (θ)d 2 (θ) + F (θ) are relatively prime for generic data. Let θ 1 , . . . , θ M−M2 be the roots of d 1 (θ); each root is equal to −1/n i for some index i. Since the n i are distinct, no root of d 1 (θ) is a root of d 2 (θ). Moreover, it is easy to see that no root of d 1 (θ) is a root of f 1 (θ). Now let I be the ideal generated by the where the B ′ i stand for the between-group sums of squares B i with multiplicity m i ≥ 2. Pick sufficient statistics W =Ȳ 1 = . . . =Ȳ M = 0 and B ′ 1 = . . . = B ′ M = 0. Since no root of d 1 (θ) is a root of d 2 (θ) or f 1 (θ), (32) implies that for these special data W f 1 (θ k )d 2 (θ k ) + F (θ k ) = 0 for each k. The zero locus V (I) is thus a proper algebraic subset of C M+M2+1 . Such a set is of lower dimension and, thus, d 1 (θ) and W f 1 (θ)d 2 (θ) + F (θ) are relatively prime for generic data.

Proof of formula for REML degree
For the proof of the REML degree formula in Theorem 1, we proceed in the same way as for the ML degree. We begin by deriving the univariate rational function whose number of roots is the REML degree.
Lemma 4. Consider the rational function whose numerator is and denominator is The REML degree is the degree of the numerator of this rational function after clearing common factors from the given numerator and denominator.
Proof. The equation ∂l/∂κ = 0 has the unique solution recall (20). We can now simplify and rewrite (41), forming a common denominator, to obtain the desired rational function.
The degree of the numerator in Lemma 4 is 4M −3, but it shares common factors with the denominator. In fact, in the proof of Lemma 2, we have shown that the denominator from (40). To prove Theorem 1, it remains to prove the following two facts.
Lemma 6. After clearing d 1 (θ) 2 from (39) and (40), the new numerator and new denominator are relatively prime for generic data.
Proof of Lemma 5. From the proof of Lemma 2, we know that d 1 (θ) divides the polynomial f Y 2 (θ)f 1 (θ) − f Y (θ) 2 + f 1 (θ)f B/m . Moreover, as shown in the proof of Lemma 3, the square d 1 (θ) 2 divides To complete the proof of the present lemma, it suffices to show that d 1 (θ) divides g 1 (θ) − f 1 (θ) 2 . However, with some distributing and grouping, we see which is divisible by (1 + n t θ) if and only if m t = 1.
Proof of Lemma 6. We first show that if m t ≥ 2, then, for generic data, (1 + n t θ) and the numerator from (39) are relatively prime. Consider Using the results from the proof of Lemma 2 and writing out the involved summations, (42) is seen to be equal to (1 + n l θ) The factor (1+n t θ) divides every summand in the above summations unless t = i = k = r = u, so it suffices to only consider these terms. Letting t = i = k = r = u, the terms missing a factor of (1 + n t θ) sum to a term we already encountered, namely, that in (34). The discussion following display (34) shows that if the data is generic and m t ≥ 2, then (1 + n i θ) does not divide the numerator given in (39). Continuing to work through the factors of the denominator from (40), assume that θ 0 is a root of f 1 (θ). Then everything vanishes in the numerator except for two terms −g 1 (θ 0 )f Y (θ 0 ) 2 and (N −1)f Y (θ 0 ) 2 g 1 (θ 0 ), which add to (N −2)f Y (θ 0 ) 2 g 1 (θ 0 ). From the proof of Lemma 3, we know f Y (θ 0 ) 2 g 1 (θ 0 ) = 0 for generic data, so since we are working under the assumption of at least two groups and at least one group size n i ≥ 2, the numerator and f 1 (θ) are relatively prime for generic data.
Finally, we need to show and G(θ) := W f 1 (θ)d 2 (θ) + F (θ), are relatively prime for generic data W ,Ȳ 1 , . . .Ȳ M , and B i with m i ≥ 2; the polynomial F (θ) was defined in (37). We will again denote the between-group sums of squares with multiplicities m i ≥ 2 as B Picking W not to satisfy (44) shows that Res(G, H) is not the zero polynomial in Hence, the zero locus of Res(G, H) is a set of lower dimension, and we conclude that H and G are relatively prime for generic data.

General mean structure in the one-way layout
The one-way layout as specified in (1) postulates a common mean µ for all observations Y ij . Often the interest is instead in a more general mean space. Formally, consider the model where the random effects α i ∼ N (0, τ ) and the error terms ε ij ∼ N (0, ω) are again all mutually independent. However, the array of means (µ ij ) may now belong to a linear subspace of R N that we assumed to be spanned by the independent columns of a full rank design matrix X ∈ R N ×p ; as before, N = n 1 + · · · + n q is the sample size. In other words, for some unknown (fixed) mean parameter vector β ∈ R p . ML and REML estimation with more general mean structure can be approached algebraically in the exactly the same way as before. It is convenient to reparametrize the covariance matrix in terms of κ = 1/ω and θ = τ /ω. For known covariance parameters, the ML estimateβ(θ) of β is obtained by generalized least squares and depends on θ but not on κ. For fixed θ, it is then also straightforward to solve the ML or REML equations for κ. This way we may reduce algebraic solution of the likelihood equations to solving a single rational equation in θ. In this section we demonstrate that the involved algebraic computations are feasible in a larger example. Before going into the details of the example, we would like to offer the following conjecture based on numerical experiments with smaller models and randomly chosen design matrices. It states that the ML and REML degrees for the model specified by (45) and (46) cannot exceed the largest possible respective degrees in the model with common mean µ. Recall that the largest degrees arise in the entirely unbalanced case with group sizes n 1 , . . . , n q that are pairwise distinct.
Conjecture 1. For any design matrix X ∈ R N ×p that has the vector (1, . . . , 1) T in its column span span(X), the ML degree for the one-way layout with mean space span(X) and q random group effects is bounded above by 3q − 3. Similarly, the REML degree is bounded above by 2q − 3.
According to this conjecture, the degrees would grow only linearly with the number of groups, which would suggest that a moderately large number of unbalanced groups can be handled in algebraic computations.
Example 3. With the goal of providing linguistic support for an African origin of modern humans, Atkinson [Atk11] fits regression models to data on the phonemic diversity of languages. The data, which can be obtained from the journal's online supplementary material, concern N = 504 languages that are classified into q = 109 language families. Besides quantitative summary measures of phonemic diversity, the available information includes the size of the population speaking each language and the distance between a chosen center for each language and an inferred origin in Africa, the latter being the main covariate of interest.
One model of interest in this application is a one-way layout with groups corresponding to the language families. The response Y ij is the phonemic diversity of the jth language in the ith family, which, as in (45) and (46), is modelled as (47) Y ij = β 0 + β 1 log(P ij ) + β 2 D ij + α i + ǫ ij , i = 1, . . . , q, j = 1, . . . , n i .
Here, P ij stands for the population size and D ij is the distance from the origin in Africa. As can be expected, the data is unbalanced. The group sizes n 1 , . . . , n 109 fall into the range from 1 to 62. There are M = 17 distinct group sizes of which M 2 = 9 have multiplicity two or larger. Hence, by Theorem 1, the one-way layout with all means equal has ML degree 57 and REML degree 49. However, as we show next, the mean structure can affect the ML and REML degree. Computations we did using the software Maple show that the ML degree of the model given by (47) is 83, whereas the REML degree is 71. Exact computations in analogy to the ones given in Example 1 produce large integer coefficients, too large to display on paper but easily handled by a computer. Solving the polynomial equations for ML and REML numerically, each equation is seen to have a unique positive root, namely, Each root gives a local and, thus, global maximum of the concerned likelihood function. We remark that the ML equation has twelve negative real roots. The REML equation has no other real roots. Running the numerical optimizers implemented in the R package lme4 yields estimates that agree with (48) in all the given digits. As in Example 1, the fact that our two univariate polynomials each have a unique positive root manifests itself in a single sign change in the coefficient sequence. Finally, we remark that when omitting either the covariate log(P) or the covariate D, the ML degree drops to 72 and the REML degree drops to 61.

Balanced two-way layouts
Suppose we have observations Y ijk that are cross-classified according to two factors and model the observations in an additive two-way layout as i = 1, . . . , r, j = 1, . . . , q, k = 1, . . . , n.
The interaction terms γ ij are again mutually independent and independent of all other random variables appearing on the right hand side of (50).
The models in (49) and (50) are balanced; the groups of observations Y ij1 , . . . , Y ijn specified by the different index pairs (i, j) are all of size n. It is known that REML leads to closed form estimates for each of the two balanced models; compare [Hoc85,SCM92,SO04]. In other words, the REML degree of either model is one. ML estimation, however, presents a non-trivial algebraic problem. The ML degree can be derived using Gröbner basis calculations, and we see that balanced two-way layouts have closed form ML estimates in the sense of Cardano's formula.
Theorem 2. The ML degree of balanced additive two-way layout with random effects is four. The same holds for the model with random interaction.
Proof. Define the sum of squares where we use the convention that the overbar indicates that an average was formed and the ' ' subscripts specify which indices were averaged over.
(No interaction) The ML equations for the additive model given by (49) are derived, for instance, in Chapter 4.7.d of [SCM92] and in Chapter 3 of [SO04]. One equation leads to the ML estimatorμ =Ȳ .
The rational equations for the variance components may be written as Clearing the common denominators ω 2 , (ω + qnτ 1 ) 2 , (ω + rnτ 2 ) 2 , and ω + qnτ 1 + rnτ 2 gives a polynomial equation system. However, multiplying each equation with the relevant product of these denominators introduces new solutions that are not solutions of the original rational equations. Using saturation as explained in Chapter 2 of [DSS09], we can remove these extraneous solutions and obtain a polynomial equation system of degree 4. (We remark that software such as Maple is able to produce a lexicographic Gröbner basis over the field of fractions in r, q, n, and the four sums of squares.) (With interaction) Chapter 4.7.d of [SCM92] also gives the ML equations for the model with interaction defined by (50); see also Chapter 4 of [SO04]. Two equations determine the ML estimatorŝ .
Clearing the denominators carefully via saturation yields a polynomial equation system of degree 4. (Again, a lexicographic Gröbner basis can be obtained with r, q, n and the sums of squares as parameters to the equations.) We briefly illustrate algebraic computation of the ML estimators in an example that involves the additive two-way layout.
Example 4. The R package lme4 contains data from experiments for an assessment of the variability between samples of penicillin. The data are described in detail in [DG72]. The response is a diameter measurement of the zone in which growth of an organism is inhibited by the penicillin. The experiments are cross-classified according to the assay plate and the penicillin sample used. The former is a factor with r = 24 levels, the latter has q = 6 levels. There are no replications to be considered in this case, that is, n = 1. We will consider the additive model for which the relevant sums of squares are SSA = 105 8 9 , SSB = 449 2 9 , SSAB + SSE = 34 7 9 .
Using the saturation computation alluded to in the proof of Theorem 2, we obtain the It defines the unique global maximum of the likelihood function.

Conclusion
This paper takes a first step towards understanding the algebraic complexity of ML and REML estimation in linear mixed models. Our main results in Theorem 1 concern the unbalanced one-way layout with common mean for all observations. It would be interesting to generalize the results to one-way classifications with more complicated mean spaces; recall Conjecture 1. Similarly, it would be interesting to study unbalanced two-and higher-way layouts, although these models would require more sophisticated mathematical treatment because it is no longer possible to analyze a single univariate rational equation; compare Section 6.
A remarkable feature common to Examples 1 and 3 is that Descartes' rule of sign applied to a univariate polynomial in the variance ratio θ reveals that there is a unique feasible solution to the ML/REML equations. The same was true for many other examples of unbalanced one-way classifications that we computed. This said, we also saw cases with more than one sign change and the number of positive solutions for θ not matching up with the sign changes.
To our knowledge, the literature does not supply many examples of linear mixed models with multimodal likelihood functions. We conclude by giving two simulated examples that demonstrate the mathematical possibility of more than one mode. Such examples were rare in our simulations, which is in agreement with findings of [SM84] who also treat the unbalanced one-way layout. While uniqueness of local optima is not explicitly discussed in [SM84], the authors remark in their conclusion that "varying the iteration starting point slightly affects the rate of convergence, but not the [mean square errors] or biases of the [ML and REML] estimators." The examples we give involve three positive roots to the ML or REML equations for the variance ratio θ. We do not know of examples with more positive roots.
Let the sufficient statistics be the five group averages having specified six digits we should add that the solutions were computed treating the above rational fractions as the input. The solutionθ ML,1 yields the global maximum of the likelihood function, whereasθ ML,2 andθ ML,3 determine a saddle point and local maximum, respectively. In contrast, the restricted likelihood function has a unique local and global maximum for θ REML ≈ 0.771763.
The data was simulated from the model with mean µ 0 = 0, and variance components τ 0 = 3 and ω 0 = 2, which gives θ 0 = 3/2. The solutionθ REML,1 gives the global maximum of the restricted likelihood function. The solutionsθ REML,2 andθ REML,3 determine a saddle point and a local maximum, respectively. The data was simulated as in Example 5.
Readers experimenting with the two examples just given will find the likelihood functions to be rather flat between the three stationary points, which give loglikelihood values that differ by less than 0.1.
In both Example 5 and Example 6, the first group is of the smallest size but has group mean that is largest in absolute value. The other means are comparatively close to each other. We experimented with permuting the means, while holding the group sizes fixed. In Example 6, eight out of 120 permutations give bimodal restricted likelihood functions. Two permutations yield three positive roots to the REML equations. The other six cases have two positive roots, and one of the two local maxima occurs for θ = 0. The eight permutations generate the group of permutations that keep the first mean fixed. In this example, there is clearly negative correlation between the group sizes n i and the group meansȲ i . (In practice, such dependence could arise from selection effects.) The eight permutations of interest turn out to give the eight most negative correlations between group sizes and means. In similar experiments for Example 5, which features positive correlation between group sizes and means, bimodal likelihood functions are obtained for 18 permutations. Again, these permutations keep the first mean fixed. Only three permutations give three positive roots to the ML equations. The 18 permutations include the top six permutations in terms of large positive correlation but also the permutation whose associated correlation ranks 43rd.
While dependence between group means and sizes plays a role in Examples 5 and 6, the precise interplay between them appears to be subtle. For instance, when varying the meanȲ 1 in Example 5 and keeping all other sufficient statistics fixed, we find that there are three positive roots to the ML equations when −5.47 ≤ Y 1 ≤ −5.08 but a unique root otherwise; we experimented with a grid of values in [−10, 10]. In particular, the likelihood function is unimodal for larger negative values ofȲ 1 . It would be interesting, but presumably difficult, to get a better understanding of the semi-algebraic set of sufficient statistics that give (restricted) likelihood functions with more than one local maximum.