Decision Theory for the Variance Ratio in One-Way ANOVA with Random Effects

Estimating a variance component in the model of analysis of variance with random effects and testing the hypothesis that the variance vanishes are important issues in many applications. Such inferences are beyond the confines of the standard (asymptotic) theory because a zero variance is on the boundary of the parameter space and the maximum likelihood or another reasonable estimator of variance has a non-trivial probability of zero in many settings. We derive decision rules regarding the variance ratio in balanced one-way analysis of variance, in both the frequentist and Bayesian perspectives. We argue that this approach is superior to hypothesis testing because it incorporates the consequences of the two kinds of error (incorrect choice) that may be committed. An application to a track athlete’s training performance is presented.


Introduction
Testing the hypothesis that between-cluster variance vanishes in the mixed model of one-way analysis of variance (ANOVA) and its extensions has received considerable attention recently (Crainiceanu & Ruppert 2004, Greven, Crainiceanu, Küchenhoff & Peters 2008, Giampaoli & Singer 2009, Andrade, Longford & Tovar 2014). The principal findings in these references are that asymptotic theory, or its adaptation for the non-standard nature of the inferential problem, provides a poor approximation for small and moderately large samples, and that the likelihood ratio test statistic has a distribution well approximated by a mixture of the constant zero and one or several $\chi^2$ distributions. The mixture probabilities are specific to the setting (design).
We regard hypothesis testing as problematic in general, because it has no means of incorporating the consequences of the two kinds of error (bad choice) that may be committed (Longford 2005, 2012b). We follow up on this criticism by solving the decision-theoretical version of the problem, in which we choose whether to act as if the ratio of between- and within-cluster variances, $\omega = \sigma^2_B/\sigma^2_W$, were smaller or greater than a given positive constant $\omega_0$, called the threshold. We specify a loss function (and later a set of loss functions) which quantifies the consequences (ramifications) of the two kinds of bad decision. We choose the course of action, that is, one of the two verdicts, $\omega \le \omega_0$ and $\omega > \omega_0$, for which the expected loss is smaller. We use asymmetric loss functions that reflect the dependence of the loss on both the size of the error $|\omega - \omega_0|$ and its sign, when the inappropriate action is chosen. Choosing the appropriate action is associated with no loss. The outcome of our analysis is the preferred action. In contrast, the outcome of a hypothesis test has to be interpreted, and the consequences of the two kinds of error considered ad hoc.
We address the uncertainty about the loss function by considering a range of plausible loss functions and, in effect, solving the problem for every one of them. Owing to some monotonicity properties, it suffices to solve the problem for the loss functions that delimit the plausible set. This can be regarded as a form of sensitivity analysis. The outcome may be an inferential impasse, but its threat is an incentive for more detailed elicitation and declaration of a narrower range of loss functions. Solutions are developed in the Bayesian paradigm, using prior densities with analytically convenient functional forms. They have a natural frequentist interpretation in terms of additional observations (degrees of freedom).
For background to decision theory we refer to DeGroot (1970, Part 3), Berger (1985) and Lindley (1985), and for a new perspective to Longford (2013). An application of our approach to compare two normally distributed random samples is presented in Longford (2012a).
The next section presents a condensed background to one-way ANOVA with random effects. For a comprehensive treatment of the topic, including numerous generalisations, see Searle, Casella & McCulloch (2006). The following section deals with decisions about the variance ratio, namely whether it is greater or smaller than a given positive constant. The decision is based on the comparison of the expected losses associated with the contemplated courses of action. Central to our approach is the evaluation of configurations of values of estimates of the variance ratio and of a parameter involved in the loss function for which the two expected losses coincide. These configurations (equilibria), described by a function, divide the space of loss functions into two subsets corresponding to the preference for either action. Section 4 extends the results to unbalanced one-way designs by approximations. An example from athletics training is presented in Section 5. Apart from addressing a substantive issue, we use it to highlight the inappropriateness of hypothesis testing as a basis for decision-making. Technical derivations and related details are collected in a set of appendices.

Variance Ratio in One-Way ANOVA
In this section, we derive the ML estimator of $\omega$ and show that its distribution is a linear transformation of an F distribution. This simplifies the discussion of the properties of $\hat\omega$ and prepares the main development in Section 3.
We consider the balanced one-way ANOVA design with clusters $k = 1, \ldots, K$ of $m$ observations each,

$$ y_{ik} = \mu + \delta_k + \varepsilon_{ik}, \tag{1} $$

where $\delta_k$, $k = 1, \ldots, K$, and $\varepsilon_{ik}$, $i = 1, \ldots, m$, are two mutually independent random samples from centred normal distributions with respective variances $\sigma^2_B$ and $\sigma^2_W$. Denote the variance ratio $\omega = \sigma^2_B/\sigma^2_W$ and the overall sample size $n = Km$.
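The design described above can be sketched as a small simulation. This is only an illustration, assuming the usual additive form of the model, $y_{ik} = \mu + \delta_k + \varepsilon_{ik}$; the function name and the parameter values are hypothetical.

```python
import random

def simulate_balanced_anova(K, m, mu, sigma2_b, sigma2_w, seed=None):
    """Return K clusters of m observations from y_ik = mu + delta_k + eps_ik.

    Assumption: additive one-way random-effects model with normal terms.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(K):
        delta = rng.gauss(0.0, sigma2_b ** 0.5)  # cluster-level deviation
        cluster = [mu + delta + rng.gauss(0.0, sigma2_w ** 0.5) for _ in range(m)]
        data.append(cluster)
    return data

# Hypothetical setting: 10 clusters of 8 observations each.
y = simulate_balanced_anova(K=10, m=8, mu=55.0, sigma2_b=0.0125, sigma2_w=0.08, seed=1)
```

Fixing the seed makes the simulated dataset reproducible, which is convenient when the same data are to be analysed under several loss functions.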
By setting the partial derivatives of the loglikelihood for the model in (1) to zero, we obtain the estimator $\hat\mu = \sum_k \sum_i y_{ik}/n$ and the equations (2), in which $e$ is the $(n \times 1)$ vector of residuals $e_{ik} = y_{ik} - \hat\mu$, composed of the $K$ within-cluster subvectors $e_k = (e_{1k}, \ldots, e_{mk})^\top$; $\bar e_k$ is the within-cluster average residual, $\bar e_k = \frac{1}{m} e_k^\top 1_m$, where $1_m$ is the vector of unities of length $m$. See Appendix A for the derivation of (2).
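We decompose $e^\top e$ into the within- and between-cluster sums of squares,

$$ e^\top e = S_W + S_B, $$

where $S_W = \sum_k (e_k - \bar e_k 1_m)^\top (e_k - \bar e_k 1_m)$ and $S_B = m \sum_k \bar e_k^2$. For the balanced design, these two statistics are independent and have scaled $\chi^2$ distributions with $n - K$ and $K - 1$ degrees of freedom and respective scaling by $\sigma^2_W$ and $m\sigma^2_B + \sigma^2_W$:

$$ S_W \sim \sigma^2_W\, \chi^2_{n-K}, \qquad S_B \sim (m\sigma^2_B + \sigma^2_W)\, \chi^2_{K-1}. $$

By simple manipulation of (2) we obtain the identity

$$ \hat\omega = \frac{1}{m} \left\{ \frac{(m-1) S_B}{S_W} - 1 \right\}. \tag{3} $$

Therefore $\hat\omega > 0$ when $S_W < (m - 1) S_B$. When $S_W > (m - 1) S_B$, we truncate $\hat\omega$ at zero, to conform with $\omega \ge 0$. To avoid any ambiguity, we denote by $\hat\omega_0$ the version of $\hat\omega$ given by (3), but truncated at zero. The random variable

$$ X = \frac{(n-K)\, S_B}{(K-1)(1 + m\omega)\, S_W} $$

has F distribution with $K - 1$ and $n - K$ degrees of freedom, and (3) is equivalent to

$$ \hat\omega = \frac{1}{m} \left( \frac{X}{u} - 1 \right), \tag{4} $$

where $u = K/\{(K - 1)(1 + m\omega)\}$. Hence $P(\hat\omega_0 = 0) = P(\hat\omega \le 0) = P(X \le u)$, but this probability depends on $\omega$.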
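The computation of the two sums of squares and of the estimator can be sketched as follows. The sketch assumes the form $\hat\omega = \{(m-1)S_B/S_W - 1\}/m$, which agrees with the stated condition that $\hat\omega > 0$ exactly when $S_W < (m-1)S_B$; the function name and the toy data are hypothetical.

```python
def variance_ratio_ml(data):
    """Within/between sums of squares and the ML variance ratio (balanced design).

    Assumption: omega_hat = ((m - 1) * S_B / S_W - 1) / m.
    """
    K, m = len(data), len(data[0])
    n = K * m
    grand_mean = sum(sum(cluster) for cluster in data) / n
    cluster_means = [sum(cluster) / m for cluster in data]
    # S_W: squared deviations of observations from their cluster means
    s_w = sum((y - cm) ** 2 for cluster, cm in zip(data, cluster_means) for y in cluster)
    # S_B: m times the squared deviations of cluster means from the grand mean
    s_b = m * sum((cm - grand_mean) ** 2 for cm in cluster_means)
    omega_hat = ((m - 1) * s_b / s_w - 1) / m
    return s_w, s_b, omega_hat, max(omega_hat, 0.0)  # last entry: truncated at zero

# Toy data: three clusters of three observations.
toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
s_w, s_b, omega_hat, omega_hat_trunc = variance_ratio_ml(toy)
```

For the toy data the total sum of squares about the grand mean is 48, split into 6 within and 42 between clusters.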
Denote by $f_{k_1,k_2}$ and $F_{k_1,k_2}$ the respective density and distribution function of the F distribution with $k_1$ and $k_2$ degrees of freedom. To establish the properties of the estimators $\hat\omega$ and $\hat\omega_0$, we use the identities (5) and (6), which are derived in Appendix B.
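The dependence of the probability of a zero estimate on $\omega$ can be explored by simulation. The sketch below is a Monte Carlo approximation rather than an evaluation of the identities in Appendix B; it assumes that $\hat\omega \le 0$ exactly when $X \le u$, with $u = K/\{(K-1)(1+m\omega)\}$ and $X$ distributed as F with $K-1$ and $K(m-1)$ degrees of freedom. The function name is hypothetical.

```python
import random

def prob_zero_estimate(K, m, omega, reps=20000, seed=0):
    """Monte Carlo approximation of P(omega_hat <= 0) = P(X <= u)."""
    rng = random.Random(seed)
    d1, d2 = K - 1, K * (m - 1)
    u = K / ((K - 1) * (1 + m * omega))
    hits = 0
    for _ in range(reps):
        # A chi-square variate with d degrees of freedom is Gamma(d / 2, scale 2).
        num = rng.gammavariate(d1 / 2, 2.0) / d1
        den = rng.gammavariate(d2 / 2, 2.0) / d2
        hits += (num / den) <= u
    return hits / reps

p_at_zero = prob_zero_estimate(K=4, m=6, omega=0.0)
p_at_half = prob_zero_estimate(K=4, m=6, omega=0.5)
```

With a common seed, the same F variates are compared with a smaller $u$ as $\omega$ increases, so the approximated probability cannot increase with $\omega$.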

Decision Theory for ω
In this section, we derive the posterior distributions of $\omega$ for a range of prior distributions and use them to formulate a decision rule for choosing between the actions (options) A and B. Although the derivations refer to the Bayesian paradigm, they retain a frequentist interpretation by equating informative priors to specific data alterations. The decision rule is based on the balance of the expected losses for the two options; we prefer the option for which the expected loss is smaller.
Suppose two courses of action, A and B, are contemplated. They are complementary (one of the actions has to be taken) and exclusive (both actions cannot be taken). Suppose further that A is appropriate when $\omega \le \omega_0$ and B when $\omega > \omega_0$; $\omega_0$ is a given scalar, but $\omega$ is not known, and all the information about it is contained in the vector of outcomes $y$ and a prior distribution for $\omega$.
The inverse of the identity in (4) is

$$ \omega = \frac{v}{X} - \frac{1}{m}, \tag{7} $$

where $v = K(\hat\omega + 1/m)/(K - 1)$. This operation is associated with the so-called fiducial argument, originating from Fisher (1935, 1956). Its validity has been extensively discussed (Lindley 1985, Seidenfeld 1992, Hannig 2009). In fact, (7) is an example of failure of the argument, because the solution $\omega$ as a random variable has a positive probability of being negative.
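The inversion can be written out step by step. The lines below assume that the identity being inverted has the form $\hat\omega = (X/u - 1)/m$ with $u = K/\{(K-1)(1+m\omega)\}$; this form is an assumption made for the sketch, chosen to agree with the definition of $v$ given above.

```latex
\begin{align*}
1 + m\hat\omega &= \frac{X}{u} = X\,\frac{(K-1)(1+m\omega)}{K}, \\
1 + m\omega &= \frac{K\,(1+m\hat\omega)}{(K-1)\,X}, \\
\omega &= \frac{K\,(\hat\omega + 1/m)}{(K-1)\,X} - \frac{1}{m}
        = \frac{v}{X} - \frac{1}{m}.
\end{align*}
```

Under this form, the solution is negative whenever $X > mv$, an event with positive probability for any realised $\hat\omega$.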
We consider the non-informative improper prior $g(y) = 1$, $y \ge 0$, and a parametric class of proper priors for $\omega$, and base our choice of option (the course of action) on the posterior distribution of $\omega$. The proper prior densities, given by (8), are defined for $y > 0$ and a parameter $q > 1$. A selection of these densities is drawn in Figure 1 for $m = 10$, with the value of the parameter $q$ printed at the left-hand margin. For large $q$, the densities are highly informative and have large values in the vicinity of $\omega = 0$. We acknowledge that the presence of $m$ in the prior densities may be seen as problematic.
The posterior densities of $\omega$ for these priors as well as the (improper) constant prior are derived in Appendix C. They are given by the expression (9), in which $z(y; \hat\omega) = (1 + my)/(1 + m\hat\omega)$, $k_1(q) = n - K - 2q + 2$ and $k_2(q) = K + 2q - 3$, and $H_q$ is a constant defined alongside (9). The non-informative prior corresponds to $q = 0$. For $q = 1$, when the prior is not defined, the 'posterior' density in (9) is well defined, with $H_1 = (K - 1)/K$. This is the conditional distribution of $\omega$ in (7) given that $\omega > 0$. The posterior distribution for $q > 1$ can be interpreted as an addition of $2q$ clusters to the data, with the corresponding reduction of the cluster size $m$ to $n/(K + 2q)$, to keep the overall sample size $n$ unchanged. The estimator $\hat\omega$ is also changed, multiplied by $H_1/H_q$. Thus, a prior in (8) is equivalent to altering the dataset, and the posterior distribution of $\omega$ can be treated as the sampling distribution of this altered dataset. Of course, we have to permit fractional within-cluster sample sizes and numbers of clusters. In what follows, we drop $q$ in the arguments of $k_1$ and $k_2$ and in the subscript of $H$.

We assume that the appropriate course of action results in no loss. However, when B is chosen (claiming that $\omega > \omega_0$), even though $\omega \le \omega_0$, we incur unit loss, and when A is chosen even though $\omega > \omega_0$, the loss is equal to $R$. We refer to $R$ as the penalty ratio. We want to choose the action for which the expected loss is smaller. The choice is based solely on the posterior distribution of $\omega$. Table 1 illustrates the loss function, with values of 0, 1 and $R$ for the subsets of the space of pairs $(\omega, \hat\omega)$. It represents the client's perspective, and is therefore subjective. This perspective may be difficult to establish and quantify by specifying the value of $R$. We address this issue in Section 3.2 by considering a range of plausible values of $R$, and later we introduce classes of loss functions other than piecewise constant.
Revista Colombiana de Estadística 38 (2015) 181-207

The expected loss associated with action B is $L_B = P(\omega \le \omega_0 \mid \hat\omega)$, evaluated from the posterior distribution of $\omega$; the expected loss associated with action A is $L_A = R\, P(\omega > \omega_0 \mid \hat\omega)$. We define the balance function as $\Delta L(\hat\omega; R, \omega_0) = L_A - L_B$. We choose action A when $\Delta L < 0$ and action B when $\Delta L > 0$. For greater values of $\hat\omega$ we should be more inclined toward choosing action B, implying that $\Delta L$, as a function of $\hat\omega$, is increasing. We have no analytical proof of this conjecture. The respective limits of $\Delta L$ as $\hat\omega \to -1/m$ and $\hat\omega \to +\infty$ are $-1$ and $R$. Therefore there is an equilibrium value $\hat\omega^*_q$, for which $\Delta L(\hat\omega^*_q; R, \omega_0) = 0$. In an exhaustive (empirical) search, we have found this equilibrium to be unique. When $\hat\omega < \hat\omega^*_q$ we choose A and when $\hat\omega > \hat\omega^*_q$ we choose B. Note that the equilibrium may be negative; in fact, $\hat\omega^*_q < 0$ when $\Delta L(0; R, \omega_0) > 0$. This is equivalent to a lower bound on $R$ or an upper bound on $\omega_0$, that is, it occurs when the consequences of choosing action A incorrectly are sufficiently serious (large $R$) or we are very strict about what we regard as small (small $\omega_0$). The equilibrium value is found by the Newton method or a similar algorithm. Methods that use the analytical derivatives $\partial \Delta L/\partial \hat\omega$ are not practical because the expressions involved are quite lengthy.
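A derivative-free search for the equilibrium can be sketched as follows. The sketch is not the posterior-based balance of the paper: as a stand-in, it assumes that $\omega = v/X - 1/m$ with $X$ distributed as F with $K-1$ and $K(m-1)$ degrees of freedom (the fiducial distribution, without conditioning on $\omega > 0$), so that $P(\omega \le \omega_0 \mid \hat\omega) = P\{X \ge c\}$ with $c = K(1+m\hat\omega)/\{(K-1)(1+m\omega_0)\}$. The helper `f_cdf`, the function names and the use of bisection are all choices made for this illustration.

```python
import math

def f_cdf(x, d1, d2, steps=4000):
    """P(F <= x) via Simpson's rule on the beta density; intended for d1, d2 > 2."""
    if x <= 0:
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    t = d1 * x / (d1 * x + d2)  # F(x) equals the regularised incomplete beta at t
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def dens(s):
        if s <= 0.0 or s >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(s) + (b - 1) * math.log1p(-s))

    h = t / steps
    total = dens(0.0) + dens(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dens(i * h)
    return total * h / 3

def balance_constant(omega_hat, R, omega0, K, m):
    """Balance L_A - L_B for piecewise constant loss (fiducial stand-in)."""
    c = K * (1 + m * omega_hat) / ((K - 1) * (1 + m * omega0))
    p_small = 1.0 - f_cdf(c, K - 1, K * (m - 1))  # P(omega <= omega0)
    return R * (1.0 - p_small) - p_small

def equilibrium(R, omega0, K, m, hi=50.0, tol=1e-8):
    """Bisection for the root of the balance on (-1/m, hi)."""
    lo = -1.0 / m
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if balance_constant(mid, R, omega0, K, m) < 0:
            lo = mid
        else:
            hi = mid
    return lo

eq = equilibrium(R=1.0, omega0=0.2, K=10, m=8)
```

Under this stand-in the balance is increasing in $\hat\omega$, with limits $-1$ and $R$, so bisection is safe; for a balance function that may have more than one root, the bracketing interval would have to be chosen with care.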
The decision rule based on the sign of $\Delta L$ is substantially different from the outcome of a hypothesis test based on an F distribution. As an aside, we note the complications with $\omega$ possibly being at the boundary of the parameter space. The hypothesis in a test is often that $\omega = 0$. However, in such a case we would be satisfied if the hypothesis were not rejected even when $\omega$ is positive but small. The value of $\omega_0$ for the decision rule is comparable to the largest value of $\omega$ that we would still call 'small' (the largest unimportant value). In this perspective, $\omega_0 = 0$ is not appropriate. For a decision, we always choose $\omega_0 > 0$. In decision theory, the two alternatives, $\omega \le \omega_0$ and $\omega > \omega_0$, have equal status, whereas in hypothesis testing the hypothesis defines the default course of action, which is overruled only when there is sufficient evidence against it. In hypothesis testing, a semicontinuous prior for $\omega$, with a mass at zero, can be declared. Such a distribution is a mixture of the mass at zero and a continuous distribution. For a given continuous component, greater prior mass at zero leads to greater posterior mass at $\omega = 0$. Transparency of the declaration, a clear justification for the declared mass, is therefore essential. In the approach we propose, we do not have to resort to such a device.
Instead of $\omega = 0$, we may test the hypothesis that $\omega \le \omega_0$. However, then the power of the test for values slightly greater than $\omega_0$ is very small, so $\omega_0$ is not a tangible quantity in hypothesis testing as it is in our approach, where it represents a clear boundary between the two options. Note that tests (and decisions) for $\omega$ and the intraclass correlation coefficient, $\rho = \sigma^2_B/(\sigma^2_B + \sigma^2_W) = \omega/(1 + \omega)$, are equivalent. Although the scale of $\rho$ is preferred by some for interpretation, evaluations for $\omega$ are easier for both hypothesis testing and decision making because of its relation to the F distribution.
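A threshold declared on one scale is easily converted to the other. The sketch assumes the usual definition of the intraclass correlation, $\rho = \sigma^2_B/(\sigma^2_B + \sigma^2_W) = \omega/(1+\omega)$; the function names are hypothetical.

```python
def icc_from_ratio(omega):
    """Intraclass correlation for a given variance ratio omega >= 0."""
    return omega / (1.0 + omega)

def ratio_from_icc(rho):
    """Variance ratio for a given intraclass correlation, 0 <= rho < 1."""
    return rho / (1.0 - rho)

rho0 = icc_from_ratio(0.20)  # a threshold omega_0 = 0.20 expressed on the rho scale
```

Because the transformation is strictly increasing, the verdict $\omega \le \omega_0$ is the same statement as $\rho \le \omega_0/(1+\omega_0)$.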
Having to set the penalty ratio $R$ might seem an additional burden to the analyst. However, its importance is obvious from how $\hat\omega^*_q$ depends on $R$. Figure 2 displays the values of $\hat\omega^*_0$ as a function of $R$ for the pairs $(K, m)$ set to (4, 6), (7, 10) and (10, 20), and thresholds $\omega_0 = 0.1$ and $\omega_0 = 0.25$ drawn in black and gray, respectively. The noninformative prior ($q = 0$) is assumed. The functions are drawn on the original (linear) and log scales, to obtain high resolution for both large and small values of $R$. The functions are decreasing with limits $+\infty$ as $R \to 0$ and $-1/m$ as $R \to +\infty$. They are drawn either for $R < 20$ or up to the value of $R$ at which the equilibrium reaches the minimum of $-1/m$. For $(K, m) = (10, 20)$, the equilibria are above the minimum of $-0.05$ for both $\omega_0 = 0.1$ and 0.25 even at $R = 20$, but for the other settings the limits are reached for $R < 20$.
For very small penalty ratio $R$, action B is selected only for extremely large values of $\hat\omega$ because the error of incorrectly choosing action A has no serious repercussions. When the equilibrium is at $-1/m$ we choose action B for all values of $\hat\omega$. The equilibrium functions $\hat\omega^*(R)$ have steep gradients for small values of $R$. The equilibria have much less curvature on the log scale. A higher value of $\omega_0$ is associated with uniformly higher equilibrium. The differences of the pairs of equilibrium functions decrease with $R$.

Piecewise Linear Loss
In some settings, the loss depends not only on the sign of the error made, but also on its magnitude $|\omega - \omega_0|$. One such class comprises the piecewise linear loss functions, defined as $\omega_0 - \omega$ when we choose B inappropriately (when $\omega \le \omega_0$) and $R(\omega - \omega_0)$ when we choose A inappropriately (when $\omega > \omega_0$).
For a given value of $\hat\omega$, we compare the expected losses $L_A$ and $L_B$, and choose the action with the smaller expected loss. In Appendix D, we derive the expression (11) for the balance function, in which $G(y) = H_q z(y; \hat\omega) = H_q (1 + my)/(1 + m\hat\omega)$. This identity holds only when $K > 3 - 2q$. Otherwise $L_A$ is infinite, so action B is chosen, irrespective of $\hat\omega$, $\omega_0$ or $R$. When $\Delta L$ is well defined (finite) we choose A if $\Delta L(\hat\omega; R, \omega_0) < 0$, and choose B otherwise.
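When a closed-form balance is not at hand, the two expected losses can be approximated from a sample of values drawn from the posterior distribution of $\omega$. The sketch below is such a Monte Carlo stand-in for the analytical expression; how the draws are obtained is left open, and the function names and the toy draws are hypothetical.

```python
def expected_losses_linear(draws, omega0, R):
    """Monte Carlo expected losses for piecewise linear loss.

    draws: values of omega sampled from its posterior distribution (assumed given).
    """
    n = len(draws)
    loss_a = R * sum(max(w - omega0, 0.0) for w in draws) / n  # A chosen, omega > omega0
    loss_b = sum(max(omega0 - w, 0.0) for w in draws) / n      # B chosen, omega <= omega0
    return loss_a, loss_b

def choose_action(draws, omega0, R):
    """Return 'A' when the balance L_A - L_B is negative, and 'B' otherwise."""
    loss_a, loss_b = expected_losses_linear(draws, omega0, R)
    return "A" if loss_a - loss_b < 0 else "B"

toy_draws = [0.0, 0.1, 0.3, 0.5]
action_equal_penalty = choose_action(toy_draws, omega0=0.2, R=1.0)
action_low_penalty = choose_action(toy_draws, omega0=0.2, R=0.5)
```

With the toy draws, halving the penalty ratio switches the preferred action from B to A, which illustrates why the value of $R$ has to be elicited with care.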
Figure 3 presents the equilibrium functions for a selection of designs ($K$, $m$, indicated in the diagram), and $\omega_0$ set to 0.1 and 0.25 (distinct colors), as functions of the penalty ratio $R$. All six functions in the diagram are non-increasing: greater $R$ makes the choice of B more attractive. Each equilibrium converges to the minimum value of $\hat\omega$, equal to $-1/m$. For $\omega_0 = 0.1$, $K = 8$ and $m = 15$, this limit is reached at $R \doteq 5.92 < 20$. For $\omega_0 = 0.1$, $K = 10$ and $m = 14$, the limit is reached at $R \doteq 23.26$, off the horizontal scale. A smaller value of $\omega_0$ is associated with lower equilibrium, and therefore increased preference for action B.
Earlier we conjectured that the balance function for the piecewise constant loss is increasing. This is not the case for the piecewise linear loss in general. A set of examples is drawn in Figure 4 for $K = 8$ and $m = 15$. All the functions in the diagram converge to zero as $\hat\omega \to -1/m$. However, for $R$ smaller than a critical value $R^\dagger = R(K, m)$, the balance has a dip before increasing. Without choosing the initial values carefully, a search for the root of $\Delta L$ may converge to $-1/m$, even when there is another root. For $R < R^\dagger$, the balance in favor of action A is narrower for $\hat\omega$ close to $-1/m$ than for larger values of $\hat\omega$ because larger values of $\hat\omega$ yield posterior distributions with greater dispersion. This adds strength to the choice of B, although the decision rule remains reasonable:
• for $R$ up to a certain value $R^*$, choose A for $\hat\omega$ smaller than a critical value; otherwise, choose B;
• for $R$ greater than this value, choose B for all values of $\hat\omega$.
The borderline value $R^*$ depends on $K$, $m$ and $\omega_0$.
A more radical dependence of the loss on the magnitude of the error is represented by the quadratic loss function, given as $(\omega - \omega_0)^2$ when we erroneously choose action B, even though $\omega < \omega_0$, and as $R(\omega - \omega_0)^2$ when we choose action A, even though $\omega > \omega_0$. The balance function for this loss is derived in Appendix E.
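The piecewise constant, linear and quadratic losses differ only in the power to which the error $|\omega - \omega_0|$ is raised (0, 1 and 2, respectively), so a single routine can approximate all three balances from draws of $\omega$. This is a Monte Carlo sketch, not the closed-form balance of Appendix E; the function name, the `power` argument and the toy draws are introduced here for illustration.

```python
def balance_from_draws(draws, omega0, R, power):
    """Monte Carlo balance L_A - L_B for loss |omega - omega0| ** power.

    power = 0, 1, 2 gives piecewise constant, linear and quadratic loss.
    draws: values of omega sampled from its posterior distribution (assumed given).
    """
    n = len(draws)
    loss_a = R * sum(abs(w - omega0) ** power for w in draws if w > omega0) / n
    loss_b = sum(abs(w - omega0) ** power for w in draws if w <= omega0) / n
    return loss_a - loss_b

toy_draws = [0.0, 0.4]
balance_quadratic = balance_from_draws(toy_draws, omega0=0.2, R=2.0, power=2)
balance_constant_loss = balance_from_draws(toy_draws, omega0=0.2, R=2.0, power=0)
```

In each case the balance is a linear function of $R$, because $R$ multiplies only the expected loss of action A.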

Plausible Loss Functions, Priors and Values of $\omega_0$
The values of the penalty ratio $R$ and $\omega_0$ are elicited from the subject-matter expert (the client). Often no single pair of values is arrived at; instead, ranges of plausible values $\mathcal{R} = (R_L, R_U)$ and $\Omega = (\omega_{0L}, \omega_{0U})$ are declared. A range $\mathcal{R}$ (or $\Omega$) is said to be plausible if any value $R$ (or $\omega_0$) outside the range can be ruled out. Thus, if a range is plausible, then so is any (wider) range that contains it.
With plausible ranges of $R$ and $\omega_0$, we might solve the problem for a grid of plausible values $(R, \omega_0) \in \mathcal{R} \times \Omega$ and choose action A or B if the expected loss with that action is smaller for every plausible pair $(R, \omega_0)$. If action A is preferred for some plausible pairs $(R, \omega_0)$ but action B for some others, then an impasse results, which can be resolved only by reviewing and reducing the ranges $\mathcal{R}$ and $\Omega$. Uncertainty about the prior parameter $q$ can be dealt with similarly.
We assume that the plausible values of $R$ and $\omega_0$ form a rectangle, $\mathcal{R} \times \Omega$. In practice, it suffices to find the signs of the balance $\Delta L$ for the vertices of this rectangle. If $\Delta L < 0$ at all four vertices, then action A is selected; if $\Delta L > 0$ at all four vertices, then action B is selected. In these two cases, we have unequivocal decisions; the same action is chosen for all plausible configurations of $R$ and $\omega_0$. Otherwise we reach an impasse. If the plausible ranges of $R$ and $\omega_0$ are reviewed in further elicitation, the reduced ranges have to remain plausible.
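The check at the four vertices can be sketched as follows. The routine takes the balance as a callable of $(R, \omega_0)$ for the realised $\hat\omega$; the function name and the toy balance used in the example are hypothetical and serve only to exercise the three possible outcomes.

```python
from itertools import product

def decide_on_rectangle(balance, r_range, omega0_range):
    """Evaluate the balance at the four vertices of the plausible rectangle.

    balance: callable (R, omega0) -> L_A - L_B for the realised estimate.
    Returns 'A', 'B' or 'impasse'.
    """
    values = [balance(R, w0) for R, w0 in product(r_range, omega0_range)]
    if all(v < 0 for v in values):
        return "A"
    if all(v > 0 for v in values):
        return "B"
    return "impasse"

# Toy balance, increasing in R and decreasing in omega0 (for illustration only).
toy_balance = lambda R, w0: R - 10.0 * w0

verdict = decide_on_rectangle(toy_balance, (1.0, 3.0), (0.15, 0.25))
```

The shortcut relies on the monotonicity of the balance in $R$ and $\omega_0$; without it, the interior of the rectangle would have to be searched as well.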
In an alternative approach, we split the space of all pairs $(R, \omega_0)$ according to the sign of $\Delta L(\hat\omega; R, \omega_0)$. For $\omega_0$ close to zero, $L_B$ is very small, and so B is the preferred action. For very large $\omega_0$, action A is preferred. For $\hat\omega$ and $\omega_0$ fixed, the balance functions are linear in $R$. Therefore $\Delta L = 0$ for a unique value $R^* = R(\omega_0; \hat\omega)$. This function of $\omega_0$ and $\hat\omega$ splits the space $(R, \omega_0)$ into the regions in which A or B has smaller expected loss. Figure 5 displays these functions for piecewise constant loss and a selection of values $\hat\omega$ indicated at the right-hand margin. The curves drawn in black are for the design ($K = 8$, $m = 11$) and those in gray for ($K = 8$, $m = 10$), to explore the dependence of $R^*$ on $m$. Less information in the data requires a smaller penalty ratio $R$ to reach the balance of the two expected losses. This applies also when $K$ is reduced; details are omitted.
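For the piecewise constant loss, the linearity in $R$ and the resulting equilibrium can be written out directly from the definition of the loss (unit loss for choosing B when $\omega \le \omega_0$, and loss $R$ for choosing A when $\omega > \omega_0$). The lines below are a sketch in terms of posterior probabilities, not the closed-form expressions of the paper.

```latex
\begin{align*}
\Delta L(\hat\omega; R, \omega_0)
  &= R\, P(\omega > \omega_0 \mid \hat\omega) - P(\omega \le \omega_0 \mid \hat\omega), \\
R^*(\omega_0; \hat\omega)
  &= \frac{P(\omega \le \omega_0 \mid \hat\omega)}{P(\omega > \omega_0 \mid \hat\omega)} .
\end{align*}
```

The numerator increases and the denominator decreases with $\omega_0$, so $R^*$ is an increasing function of $\omega_0$; it tends to zero as $\omega_0 \to 0$, in agreement with the preference for B when $\omega_0$ is close to zero.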
If plausible ranges are declared for $R$ and $\omega_0$ and $R^*$ intersects the plausible rectangle, we have an impasse, because for some plausible pairs $(R, \omega_0)$ one action, and for other pairs the other action is preferred. The equilibrium value $R^*$ is an increasing function of $\omega_0$. Therefore, if the same action is preferred for both extremes $R_L$ and $R_U$, then it is preferred for all $R \in (R_L, R_U)$. If one action is preferred for some plausible values of $R$ and the other action for other plausible values, then we have an impasse that can be resolved only by resuming elicitation to narrow down the plausible range $(R_L, R_U)$.
Although the standard Bayesian analysis deals with a single prior, it is sometimes practical to consider a range of plausible priors, or prior parameters, $(q_L, q_U)$, reflecting the lack of consensus in the elicitation or uncertainty admitted by the expert. See Longford (2010) for an application and related discussion. An example is given in Figure 6, for which a single class of loss functions is assumed. The functions drawn in the diagram are the equilibrium functions $R(q)$ which satisfy the identity $\Delta L_q(\hat\omega; R(q), \omega_0) = 0$. The solutions for $\hat\omega = 0.05$ are drawn by black lines for the values of $\omega_0$ printed at the right-hand margin. The solutions for $\hat\omega = 0.055$ and the same values of $\omega_0$ are drawn by gray lines. For $\omega_0 = 0.10$ the two curves are difficult to discern.

Suppose the value $\hat\omega = 0.05$ was realised and the plausible range (0.15, 0.20) was declared for $\omega_0$. Further, suppose the respective plausible ranges for $R$ and $q$ are (4, 6) and (1.3, 1.4); this rectangle is filled in the diagram in gray. The plausible rectangle is above the equilibrium curve $R(q)$ for any plausible value of $\omega_0 \in (0.15, 0.20)$, so action B is selected unequivocally.
Suppose next that the plausible range of $\omega_0$ is (0.20, 0.25). Then nearly the entire plausible rectangle lies between the functions $R(q)$ for $\omega_0 = 0.20$ and $\omega_0 = 0.25$. In this case, we have an impasse: for some plausible values of $\omega_0$ action A and for other values action B is preferred. Finally, suppose the plausible range of $\omega_0$ is (0.25, 0.30). The plausible rectangle intersects the region delimited by the functions $R(q)$ for $\omega_0 = 0.25$ and 0.30, but if the lower limit of plausible values of $q$ could be increased, or the upper limit for $R$ reduced, the reviewed plausible rectangle might be entirely under the curve $R(q)$ for $\omega_0 = 0.25$ and action A would then be preferred unequivocally. Such a reviewed plausible rectangle is delimited in the diagram by dots.
These examples imply a strong incentive to declare as small a set of plausible values as possible for all the parameters involved, to reduce the chances of an equivocal decision.But the client has to be comfortable with the implication that all pairs outside the declared set can be ruled out.

Designs Without Balance
For designs without balance we do not have a closed-form expression for the conditional distribution of $\hat\omega$ given $\omega$, nor a tractable posterior density of $\omega$. We approximate this distribution by its match among the balanced designs. In the approximation, we use the synthetic number of clusters $\tilde K = n^2/\sum_k n_k^2$ and the harmonic mean of the cluster sizes $\tilde m = K/\sum_k (1/n_k)$ in the respective roles of $K$ and $m$. These proposals are based on Potthoff, Woodbury & Manton (1992) and Longford (2000). Potthoff et al. (1992) derived a generalisation of the equation for $\tilde K$ by matching the information in a sample with unequal sampling weights with a sample that would have equal weights. The approximation for $\tilde m$ is derived directly from the information about the variance ratio in the likelihood maximisation. We note that these approximations are poor for large $\omega$, such as $\omega > 0.5$.
Figure 7 presents the histograms of the ML estimates $\hat\omega$ obtained in 10,000 replications each for the values of $\omega$ listed in the titles, for the design with two clusters of each of the sizes 6, 7 and 8, for which the synthetic values are $\tilde K = 5.92$ and $\tilde m = 6.90$. The approximation is not perfect, but in the context of substantial dispersion of $\hat\omega$ it is acceptable.
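The two synthetic quantities for this design can be reproduced with a few lines. The function name is hypothetical; the formulas are the synthetic number of clusters $n^2/\sum_k n_k^2$ and the harmonic mean of the cluster sizes.

```python
def balanced_match(cluster_sizes):
    """Synthetic number of clusters and harmonic-mean cluster size."""
    n = sum(cluster_sizes)
    k_synthetic = n ** 2 / sum(nk ** 2 for nk in cluster_sizes)
    m_harmonic = len(cluster_sizes) / sum(1.0 / nk for nk in cluster_sizes)
    return k_synthetic, m_harmonic

# Two clusters of each of the sizes 6, 7 and 8.
k_syn, m_har = balanced_match([6, 6, 7, 7, 8, 8])
```

Rounded to two decimal places, the results agree with the values 5.92 and 6.90 quoted for this design.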
An alternative is to assume that the statistic $S_B$ is associated with $K - 1$ degrees of freedom and $S_W$ with degrees of freedom in the range $(n_l, n_u)$, where $n_l = K(m_l - 1)$ and $n_u = K(m_u - 1)$, and $m_l$ and $m_u$ are the sample sizes of the smallest and largest clusters, respectively, 6 and 8 in the example above. Then we solve the problem for $K(m_l - 1)$ and $K(m_u - 1)$ degrees of freedom associated with $S_W$. If we arrive at the same conclusion in both cases, then it applies also to the original dataset. The method is poorly suited for extremely unbalanced data.

Another alternative is motivated by methods for missing data (Little & Rubin 2002, Rubin 2002). We assume that the (unbalanced) observed dataset is incomplete, and its complete version is balanced, with $m_u$ observations in each cluster. The EM algorithm (Dempster, Laird & Rubin 1977) is particularly easy to implement for this setting, but it does not yield the sampling distributions of the estimators. By multiple imputation, we generate a number of plausible completions of the observed data, and then analyse each dataset separately. If in every case we prefer the same action, we have an unequivocal conclusion. Otherwise an impasse results. The problem with this approach is that the chances of an impasse increase substantially when the sample sizes $n_k$ are in a wide range, because the missing data contain a large fraction of the complete information.
We note that a similar approach to hypothesis testing requires an adjustment because by augmenting the observed (incomplete) dataset to make it balanced we bias the results toward the alternative.See Li, Meng, Raghunathan & Rubin (1991) for a solution.

Application
We illustrate the methods on the data from ten training sessions of a track athlete. Each session comprised eight 400 metre runs (one lap of the track), separated by jogging 400 metres for recovery. The athlete ran unaccompanied, and was not informed about any intermediate times (e.g., at 200 metres), nor when completing a lap; he could inspect the eight times only at the end of the session. He aimed to run each lap in 55.0 seconds. The purpose of the sessions was to develop a good judgement of speed, discounting any fatigue and any external factors (weather). The collected data, times in seconds with precision to one decimal place, are presented in Figure 8, with sessions marked as A to J.

We ignore the sequential order of the runs and sessions. Running a lap in 55.0 seconds does not require a full effort of the athlete, who could run the distance in well below 50 seconds. Of course, fatigue accumulates over the laps, but there is an equal threat of under- and over-compensating for it. Also, initial data exploration suggests no presence of a trend over the eight laps nor any temporal dependence across the sessions.

The sufficient statistics for the random-effects ANOVA are $S_W = 5.595$ and $S_B = 1.797$ and the sample mean is $\hat\mu = 55.16$. The maximum likelihood estimates (MLE) of the variance components are $\hat\sigma^2_W = 0.0799$ and $\hat\sigma^2_B = 0.0125$.
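The reported estimates can be checked from the two sums of squares. The sketch assumes the ML solution for the balanced design in the form $\hat\sigma^2_W = S_W/(n-K)$ and $\hat\omega = \{(m-1)S_B/S_W - 1\}/m$, with $\hat\sigma^2_B = \hat\omega\,\hat\sigma^2_W$; the function name is hypothetical.

```python
def ml_from_summaries(s_w, s_b, K, m):
    """ML estimates from the within- and between-cluster sums of squares.

    Assumptions: sigma2_w = S_W / (n - K) and
    omega_hat = ((m - 1) * S_B / S_W - 1) / m for the balanced design.
    """
    n = K * m
    sigma2_w = s_w / (n - K)
    omega_hat = ((m - 1) * s_b / s_w - 1) / m
    sigma2_b = omega_hat * sigma2_w
    return sigma2_w, sigma2_b, omega_hat

# Ten sessions (clusters) of eight laps each.
sigma2_w, sigma2_b, omega_hat = ml_from_summaries(s_w=5.595, s_b=1.797, K=10, m=8)
```

After rounding, the three values agree with the estimates 0.0799 and 0.0125 given above and with the estimate 0.156 of the variance ratio used in the analysis.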
If the times achieved do not differ much across the sessions (and are sufficiently close to 55.0 seconds on average), then the athlete can progress to the next stage of training, for which the ability to maintain the particular speed is essential. One aspect of the ability is a sufficiently small value of the variance ratio $\omega$; its MLE is $\hat\omega = 0.156$. Elicitation from the coach and his colleagues concluded that $\omega \le 0.20$ is necessary for progressing to the next stage and that piecewise linear loss functions with $R \in (1/5, 1/3)$ are plausible. The noninformative prior ($q = 0$) is assumed.
Contrary to popular perceptions, (professional) athletes and their coaching and management staff are well aware of uncertainty about future fitness and performance, which they consider in preparing training schedules, planning attendance in competitions and assessing the athlete's prospects. A lot of data is nowadays collected in training, and statistical definitions of distributions and their dispersion are relatively easy to introduce. In the elicitation process, we communicated with the coaching staff mainly through graphs of large (simulated) sets of times, and asked them whether the displayed variation was acceptable or not for progressing to the next stage. We settled first on $\omega_0 \in (0.17, 0.24)$ and later agreed on $\omega_0 = 0.20$. Disagreement and uncertainty persisted about the value of $R$, which compares the harm done by the two kinds of erroneous choice, and that is why we consider a plausible range for it.
Figure 9 summarises the results of the analysis by the plot of the equilibrium function (marked $q = 0$). The plausible range of $R$ is marked by shading and the value of $\hat\omega$ by horizontal dots. Since $\hat\omega$ is above the equilibrium function throughout the plausible range of $R$, the action with smaller expected loss is not to proceed to the next stage; the estimated variance ratio is too large. The equilibrium values $\hat\omega^*$ are 0.101 and 0.074 for the respective values $R = 1/5$ and $1/3$, in line with the equilibrium being a decreasing function of $R$. For $R = 0.075$, $\hat\omega^* = 0.157$, so the decision would not be affected if values of $R$ much smaller than $1/5$ were regarded as plausible. If instead of a single value, $\omega_0 = 0.20$, the plausible range $\omega_0 \in (0.17, 0.24)$ were agreed, as it was in an earlier round of elicitation, the equilibrium functions drawn by black dashes and marked by the values of $\omega_0$ would be obtained. Since both curves are entirely below $\hat\omega = 0.156$ for all $R \in (0.20, 0.33)$, the conclusion 'Not to proceed' would not be affected. By trial and error, we found that for $\omega_0 = 0.272$ the equilibrium at $R = 0.2$ is equal to $\hat\omega$. Thus, if a standard more lenient than $\omega_0 = 0.272$ for the athlete's consistency were plausible, then the decision would be equivocal (in doubt) because the equilibrium curve would then intersect the horizontal line at $\hat\omega$ for a plausible value of $R$. For $\omega_0 = 0.32$, the equilibrium curve is equal to $\hat\omega$ at the upper limit of plausible values of $R$, 0.33. So, if $\omega_0$ were greater than 0.32 the appropriate decision would be 'To proceed'.
We add a word of caution at this point. Suppose the prior with $q = 1.01$ is adopted. The proximity of $q$ to unity might suggest that this prior is only mildly informative. However, the equilibrium function, drawn in gray, differs from the equilibrium for $q = 0$ substantially. In fact, with $q = 1.01$, the decision to proceed (that $\omega$ is small) would be preferred, because the 'gray' equilibrium function is entirely above $\hat\omega$ throughout the plausible range of $R$.
In an established approach, we would test the hypothesis that $\omega \le 0.20$. This hypothesis is not rejected; the p value, derived from (4), is 0.48. Commonly, one would conclude that the action appropriate when the hypothesis applies should be taken. Such a decision process is logically incorrect, confusing failure to reject a hypothesis with its acceptance. That is, the analysis started by assuming that the hypothesis is valid, and no contradiction with it was found. The appropriate conclusion is that of ignorance, that we do not know whether $\omega \le 0.20$ or $\omega > 0.20$, or more precisely, that the data yield sufficient evidence for neither the hypothesis nor the alternative. We note that the hypothesis $\omega > 0.20$ (exchanging the roles of the original hypothesis and alternative) would not be rejected either, further compounding the illogicality of a decision based on the result of a hypothesis test.
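The p value can be approximated with the standard library alone. The sketch assumes that, at the boundary $\omega = \omega_0$, the statistic $X = K(1+m\hat\omega)/\{(K-1)(1+m\omega_0)\}$ has F distribution with $K-1$ and $K(m-1)$ degrees of freedom and that large values of $X$ count against the hypothesis; the helper `f_cdf` (Simpson's rule on the beta density) and the function names are introduced here for illustration.

```python
import math

def f_cdf(x, d1, d2, steps=4000):
    """P(F <= x) via Simpson's rule on the beta density; intended for d1, d2 > 2."""
    if x <= 0:
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    t = d1 * x / (d1 * x + d2)  # F(x) equals the regularised incomplete beta at t
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def dens(s):
        if s <= 0.0 or s >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(s) + (b - 1) * math.log1p(-s))

    h = t / steps
    total = dens(0.0) + dens(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dens(i * h)
    return total * h / 3

def p_value_ratio(omega_hat, omega0, K, m):
    """One-sided p value for the hypothesis omega <= omega0 (assumed form of X)."""
    x = K * (1 + m * omega_hat) / ((K - 1) * (1 + m * omega0))
    return 1.0 - f_cdf(x, K - 1, K * (m - 1))

p = p_value_ratio(omega_hat=0.156, omega0=0.20, K=10, m=8)
```

If the assumed form of the statistic is right, the result should be close to the p value of 0.48 quoted above; a numerical library with an F distribution function would serve equally well in place of the helper.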

Discussion and Conclusion
The outcome of a hypothesis test is often used to support a decision to continue an analysis as if the hypothesis or the alternative were valid. This is widely acknowledged to be inappropriate when we fail to reject the hypothesis, and yet act as if it did apply, but this is often ignored in practice. In our perspective, such use of a hypothesis test is inappropriate even when the hypothesis is rejected, and thus evidence against it obtained, because the consequences of the type II error are not taken into account. The pragmatic arguments for hypothesis testing, such as relatively simple computational procedures and reference to asymptotic theory, do not have a good foundation in the case of variance estimation, especially in experiments with small or moderate sample sizes, in which the expenditure on the study and the ramifications of the errors of the two kinds are important factors.
In our approach based on decision theory, the consequences of the two kinds of bad decision are represented by a loss function, and the uncertainty about it by a set of plausible loss functions. The loss functions are elicited from the expert (client), together with the prior information about the variances. The prior distribution of the variances is useful, but specifying it is not an imperative in our analysis. In fact, it can be formulated as an alteration of the realised dataset, so the analyst does not have to subscribe to the Bayesian paradigm. Plausible priors (or prior data) and loss functions have an important role as conduits for sensitivity analysis, exploring how the conclusion is changed as a result of (small) changes in the input. This is greatly simplified by the choice of classes of loss functions, for which the balance is a linear function of the penalty ratio R; see (10), (11) and (16). The priors used in them involve the within-cluster sample size m. They are chosen so that the posterior would have a closed form. An alternative to them is the class of uniform priors on $(0, \omega^\dagger)$, with a value or a plausible range specified for $\omega^\dagger$. The balance functions with these priors are obtained from (11) and (16) by replacing each term where respectively $k_1 = k_1(q)$, $k_1(q) + 2$ or $k_1(q) + 4$ and similarly for $k_2$; $h = 1$, $h_1$ or $h_2$.
Restricted maximum likelihood (REML; Patterson & Thompson 1971) is often considered for random effects models. Our method has a simple adaptation for REML. It amounts to setting u = 1/(1 + mω) in (4) and in (9). No other changes are required in the subsequent equations.
The task solved by our approach is to select the action that is appropriate when the unknown value of the variance ratio ω is smaller than a set threshold $\omega_0 > 0$, or the complementary action; $\omega_0$ can be interpreted as the smallest important deviation from zero. Although a typical hypothesis test about ω is for ω = 0, failure to reject it is regarded as appropriate when ω is positive but small. Our approach requires a quantification of what 'small' means, by $\omega_0$, and setting it to zero would not be reasonable. Its magnitude should be informed by the methods (steps in the analysis or in experimentation) contemplated after the decision. Uncertainty about it and the contentious nature of having to specify a single value are dealt with by using a plausible range for $\omega_0$.
Our development is exact only for balanced (one-way) designs; the proposed solutions for unbalanced designs refer to the results for similar balanced designs. Extensions to more complex (multiway) designs are on our research agenda; for such designs the reference to balanced designs may be rather restrictive.
Decision theory for some elementary statistical problems, such as estimating a mean (and fixed-effects ANOVA), a variance, and classifying units to two categories, is developed in Longford (2013). With fixed-effects ANOVA, the group-level means $\mu_k$, $k = 1, \ldots, K$, are parameters, and $\mu_1 = \cdots = \mu_K$ is the commonly tested hypothesis. A measure of the departure from this hypothesis, needed for applying our approach, can be defined through the (finite-sample) variance of the means, $\sigma_G^2 = \sum_k (\mu_k - \mu)^2 / K$, which is similar to, but not the same as $\sigma_B^2$. The variance $\sigma_G^2$ is estimated by the method of moments, adjusting $\sum_k (\hat\mu_k - \hat\mu)^2 / K$ for its bias.
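The bias adjustment can be illustrated with a minimal sketch. It assumes a balanced design with K groups of m observations each, for which the naive statistic has expectation $\sigma_G^2 + (K - 1)\sigma_W^2/(Km)$; the adjustment below subtracts this term with $\sigma_W^2$ replaced by the pooled within-group variance. The function name `sigma2_g_moment` and this particular form of the adjustment are our assumptions for illustration.

```python
from statistics import mean


def sigma2_g_moment(groups):
    """Bias-adjusted moment estimate of the finite-sample variance of group means.

    groups: list of K lists, each holding m observations (balanced design assumed).
    """
    K = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("a balanced design is assumed")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)
    # Naive plug-in statistic: sum_k (mu_hat_k - mu_hat)^2 / K
    naive = sum((gm - grand_mean) ** 2 for gm in group_means) / K
    # Pooled within-group variance on K(m - 1) degrees of freedom
    ss_within = sum((y - gm) ** 2
                    for g, gm in zip(groups, group_means) for y in g)
    s2_within = ss_within / (K * (m - 1))
    # E(naive) = sigma2_G + (K - 1) sigma2_W / (K m); subtract the estimated bias
    return naive - (K - 1) * s2_within / (K * m)
```

The estimate can be negative when the group means differ by less than the within-group variation would suggest; how to treat such a value (for instance, truncation at zero) is a separate choice not made here.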
All computing was carried out in R, and the software developed is available from the first author (NTL). All our evaluations are based on closed-form expressions for the expected losses, and are executed instantly. When a prior is declared for which the posterior density has to be evaluated numerically, Markov chain Monte Carlo (MCMC) calculations can be employed (Robert & Casella 2004). Large samples from the joint posterior distribution of the parameters are generated and the integrals involved in the expected loss are evaluated from these samples empirically. Much of the calculus, similar to that presented in the Appendices, can be dispensed with, in exchange for approximate results (due to the stochastic nature of MCMC) and concerns related to the convergence of the chain(s). Although MCMC evaluations are computationally much more demanding, they should not restrict the scope of sensitivity analysis, which we regard as an integral element of our method.
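The empirical evaluation of the expected losses from posterior draws can be sketched as follows. The sketch assumes a piecewise linear loss: action A (act as if ω ≤ ω0) incurs loss ω − ω0 when ω > ω0, and action B (act as if ω > ω0) incurs loss R(ω0 − ω) when ω < ω0. Attaching the penalty ratio R to action B, and the function names, are assumptions made for illustration; the draws would come from an MCMC sampler, but any list of values of ω can be supplied.

```python
def expected_losses(draws, omega0, R):
    """Empirical expected losses of actions A and B from posterior draws of omega."""
    n = len(draws)
    # Action A (act as if omega <= omega0) is penalised only when omega > omega0
    loss_a = sum(max(w - omega0, 0.0) for w in draws) / n
    # Action B (act as if omega > omega0) is penalised only when omega < omega0,
    # with the penalty ratio R applied (an assumed convention)
    loss_b = R * sum(max(omega0 - w, 0.0) for w in draws) / n
    return loss_a, loss_b


def preferred_action(draws, omega0, R):
    """Return the label of the action with the smaller empirical expected loss."""
    loss_a, loss_b = expected_losses(draws, omega0, R)
    return "A" if loss_a <= loss_b else "B"


def sensitivity(draws, omega0, penalty_ratios):
    """Preferred action for each penalty ratio in a plausible range."""
    return {R: preferred_action(draws, omega0, R) for R in penalty_ratios}
```

Calling `sensitivity` over a grid of plausible values of R (and repeating it for a range of ω0) shows whether the verdict is stable across the plausible inputs, at the cost of Monte Carlo error in each expected loss.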

Appendix A. Maximum Likelihood Estimation
We derive the score function for the model in (1). The joint distribution of the n × 1 vector of outcomes y is normal with block-diagonal variance matrix with (identical) blocks $\sigma_W^2 W_k = \sigma_W^2 W_1 = \sigma_W^2 (I_m + \omega J_m)$ corresponding to clusters $k = 1, \ldots, K$; $I_m$ is the m × m identity matrix and $J_m$ the m × m matrix of unities. Denote by $1_m$ the m × 1 vector of unities, so that $J_m = 1_m 1_m^\top$. Let $e_k = y_k - \mu 1_m$ be the vector of residuals in cluster k.

The loglikelihood is
The expressions for the roots of these equations are given in (2).
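As a sketch of the standard form that the loglikelihood takes under the definitions of this appendix, we use $\det(I_m + \omega J_m) = 1 + m\omega$, $(I_m + \omega J_m)^{-1} = I_m - \frac{\omega}{1 + m\omega} J_m$ and $n = Km$. The symbol $\ell$ is introduced here for illustration.

```latex
\ell(\mu, \sigma_W^2, \omega)
  = -\frac{1}{2} \left\{ n \log(2\pi) + n \log \sigma_W^2 + K \log(1 + m\omega)
    + \frac{1}{\sigma_W^2} \sum_{k=1}^{K}
      \left[ e_k^\top e_k - \frac{\omega}{1 + m\omega}
             \left( 1_m^\top e_k \right)^2 \right] \right\}
```

Differentiating this expression with respect to $\mu$, $\sigma_W^2$ and $\omega$ and setting the derivatives to zero yields the score equations.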

Appendix B. A Link Among F Densities
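We prove the identities in (5). By $\Gamma_2$ we denote the half-gamma function, $\Gamma_2(x) = \Gamma(\frac{1}{2}x)$. The density of the F distribution with $k_1$ and $k_2$ degrees of freedom is
$$ f_{k_1,k_2}(y) = \frac{\Gamma_2(k_1 + k_2)}{\Gamma_2(k_1)\,\Gamma_2(k_2)} \left( \frac{k_1}{k_2} \right)^{k_1/2} \frac{y^{k_1/2 - 1}}{(1 + y k_1/k_2)^{(k_1 + k_2)/2}} . $$
Hence
$$ y f_{k_1,k_2}(y) = \frac{k_1}{k_1 + 2} \, f_{k_1+2,\,k_2-2}(h_1 y) , $$
where $h_1 = k_1 (k_2 - 2) / \{k_2 (k_1 + 2)\}$. The identity in (13) is obtained by matching the expression for $y f_{k_1,k_2}(y)$ with another F density. First, the factor $y^{k_1/2}$ implies $k_1 + 2$ degrees of freedom instead of $k_1$; then the term $1 + y k_1/k_2$ in the denominator implies the argument $h_1 y$ instead of $y$, and its exponent $(k_1 + k_2)/2$ implies the change from $k_2$ degrees of freedom to $k_2 - 2$. The constant factors remaining from the match with $f_{k_1+2,\,k_2-2}(h_1 y)$ reduce to $k_1/(k_1 + 2)$. By reusing the identity in (13) we obtain
$$ y^2 f_{k_1,k_2}(y) = \frac{k_2 (k_1 + 2)}{(k_2 - 2)(k_1 + 4)} \, f_{k_1+4,\,k_2-4}(h_2 y) , $$
where $h_2 = k_1 (k_2 - 4) / \{k_2 (k_1 + 4)\}$.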
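The matching argument can be checked numerically. The sketch below assumes the scaling $h_j = k_1 (k_2 - 2j) / \{k_2 (k_1 + 2j)\}$, $j = 1, 2$, which is the value that makes the term $1 + y k_1/k_2$ reappear in the shifted density, and the constants $k_1/(k_1 + 2)$ and $k_2(k_1 + 2)/\{(k_2 - 2)(k_1 + 4)\}$; the function names are ours. Only the standard library is used, with the F density evaluated on the log scale.

```python
from math import exp, lgamma, log


def f_density(y, k1, k2):
    """Density of the F distribution with k1 and k2 degrees of freedom, for y > 0."""
    log_const = (lgamma((k1 + k2) / 2) - lgamma(k1 / 2) - lgamma(k2 / 2)
                 + (k1 / 2) * log(k1 / k2))
    return exp(log_const + (k1 / 2 - 1) * log(y)
               - ((k1 + k2) / 2) * log(1 + k1 * y / k2))


def h(k1, k2, j):
    """Scaling of the argument after shifting to k1 + 2j and k2 - 2j degrees of freedom."""
    return k1 * (k2 - 2 * j) / (k2 * (k1 + 2 * j))


def first_moment_form(y, k1, k2):
    """y * f_{k1,k2}(y) written as a multiple of f_{k1+2,k2-2}(h_1 y); needs k2 > 2."""
    return k1 / (k1 + 2) * f_density(h(k1, k2, 1) * y, k1 + 2, k2 - 2)


def second_moment_form(y, k1, k2):
    """y^2 * f_{k1,k2}(y) written as a multiple of f_{k1+4,k2-4}(h_2 y); needs k2 > 4."""
    const = k2 * (k1 + 2) / ((k2 - 2) * (k1 + 4))
    return const * f_density(h(k1, k2, 2) * y, k1 + 4, k2 - 4)
```

Comparing `y * f_density(y, k1, k2)` with `first_moment_form(y, k1, k2)` on a few values of y, and similarly for the second form, gives agreement up to rounding error.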

Appendix C. Posterior Distribution of ω
Using the terminology associated with Bayes' theorem, the conditional distribution $(\hat\omega \mid \omega)$ is derived from (4). Its density is

For ω we choose the noninformative prior g(y) = I(y > 0), where I denotes the indicator function; it is equal to unity when its argument, read as a statement, is correct, and to zero otherwise. The posterior distribution of ω is the standardised product of the densities of $(\hat\omega \mid \omega)$ and ω:

) and $D = D(\hat\omega; K, n)$ is the standardising function (the denominator in Bayes' theorem). By rearranging the penultimate factor as a power of a linear function of 1 + my, we obtain
Except for the indicator I(y > 0) and a scalar D, this matches the density

where

Let Y be a random variable with F distribution with n − K + 2 and K − 3 degrees of freedom. Then the function in (15) is the density of

The posterior distribution of ω is the same as the distribution of Z, except for the condition that Z > 0, that is, $Y > H_0 m/(1 + m\hat\omega)$. Therefore, the posterior density of ω is

where $z = (1 + my)/(1 + m\hat\omega)$.

Revista Colombiana de Estadística 38 (2015) 181-207

A Class of Informative Priors
A class of proper densities for which the posterior density can be obtained in a closed form is
$$ g(y) = \frac{m(q - 1)}{(1 + my)^{q}} \, I(y > 0) $$
for q > 1. For orientation, a selection of densities is drawn in Figure 1. For q close to unity, the prior is highly dispersed (the density for q = 1.2 is drawn in gray). The densities are decreasing for y > 0 and g(0) = m(q − 1). For greater q, the density has greater mass around zero and has a steeper slope for small y. Although this is quite a flexible class of functions, the dependence on m is an obvious drawback. Note that q should not be set to a large value (say, q > 5), because it corresponds to information comparable to that in a very large sample.
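The dispersion of these priors can be explored with a short sketch. It reads the density as $g(y) = m(q - 1)/(1 + my)^q$ for $y > 0$, which integrates to unity for q > 1 and has distribution function $1 - (1 + my)^{-(q - 1)}$; the function names are ours, and the value at y = 0 is set to m(q − 1) to agree with the statement about g(0).

```python
def prior_density(y, m, q):
    """Prior density m(q - 1) / (1 + m y)^q on y >= 0, for q > 1."""
    if q <= 1:
        raise ValueError("the density is proper only for q > 1")
    # The boundary value y = 0 is included so that g(0) = m(q - 1)
    return m * (q - 1) / (1 + m * y) ** q if y >= 0 else 0.0


def prior_cdf(y, m, q):
    """Distribution function 1 - (1 + m y)^(-(q - 1)) of the prior."""
    if q <= 1:
        raise ValueError("the density is proper only for q > 1")
    return 1 - (1 + m * y) ** (1 - q) if y > 0 else 0.0


def prior_quantile(p, m, q):
    """Quantile of the prior, obtained by inverting prior_cdf; 0 <= p < 1."""
    if q <= 1:
        raise ValueError("the density is proper only for q > 1")
    return ((1 - p) ** (-1 / (q - 1)) - 1) / m
```

For m = 10, the prior median is about 3.1 with q = 1.2, whereas with q = 1.01 it is of the order of $10^{28}$, which shows how quickly the prior changes as q approaches unity.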
To reduce the typographical length of some subscripts, we introduce the notation $k_1(q) = n - K - 2q + 2$ and $k_2(q) = K + 2q - 3$, and drop the argument q when its value is not specified or is obvious from the context. Following the outline for the non-informative prior, we obtain
g(y; ω, q) = D q m (1 + mω) q+1 I(y > 0) B 2 (n − K, K − 1) ,
where

The constant factor of the concluding term is

By similar steps we obtain the expression

and hence
(16)
It holds only when K > 5 − 2q; otherwise $L_A$ is infinite and B is the preferred action. The factor $[1 - F_{k_1,k_2}\{G(0)\}]^{-1}$ is not relevant for finding the root of ∆L, and can be dropped. Figure 10 displays the equilibrium values for a selection of designs (K, m) and $\omega_0$ set to 0.1 and 0.25. The equilibrium functions are decreasing and reach their minima of −1/m for finite R. In the diagram, the functions are interrupted at that point. For K = 9, m = 7 and $\omega_0$ = 0.1, no line is drawn because the equilibrium is equal to −1/m even for very small ω.

Figure 1: A set of prior densities given by (8) for cluster size m = 10.

Figure 2: The equilibrium values ω* as functions of the penalty ratio R for number of clusters K and cluster size m indicated in the left-hand panel; non-informative prior for ω.

Figure 3: The equilibrium values ω* as functions of the penalty ratio R for number of clusters K and cluster size m indicated in the left-hand panel. Piecewise linear loss and non-informative prior for ω.

Figure 4: The balance functions ∆L(ω) for the piecewise linear loss; K = 8, m = 15, ω0 = 0.2 and non-informative prior for ω.The values of R are indicated on the curves.

Figure 5: The equilibrium functions R* = R(ω0; ω) for the piecewise constant loss function and values of ω indicated at the right-hand margin; K = 8 and m = 11 (black lines) and m = 10 (gray).

Figure 6: The equilibrium functions R(q) for the piecewise quadratic loss function and values of ω0 indicated at the right-hand margin and at the top; ω = 0.05 (black lines) and ω = 0.055 (gray), K = 10 and m = 10.The plausible rectangle is filled by gray color, and its reduced version is delimited by dots.

Figure 8: The times for running 400 metres (a lap of the track) in sessions A -J comprising eight separate laps (repeats) each.

Figure 9: The equilibrium values of ω for the study of the track athlete's times for 400 metres.

Figure 10: The equilibrium values ω* as functions of the penalty ratio R for number of clusters K and cluster size m indicated in the left-hand panel. Piecewise quadratic loss and non-informative prior for ω.