Empirical likelihood for the analysis of experimental designs

Empirical likelihood enables a nonparametric, likelihood-driven style of inference without restrictive assumptions routinely made in parametric models. We develop a framework for applying empirical likelihood to the analysis of experimental designs, addressing issues that arise from blocking and multiple hypothesis testing. In addition to popular designs such as balanced incomplete block designs, our approach allows for highly unbalanced, incomplete block designs. We derive an asymptotic multivariate chi-square distribution for a set of empirical likelihood test statistics and propose two single-step multiple testing procedures: asymptotic Monte Carlo and nonparametric bootstrap. Both procedures asymptotically control the generalised family-wise error rate and efficiently construct simultaneous confidence intervals for comparisons of interest without explicitly considering the underlying covariance structure. A simulation study demonstrates that the performance of the procedures is robust to violations of standard assumptions of linear mixed models. We also present an application to experiments on a pesticide.


Introduction
In designed experiments, questions of particular interest frequently involve differences in means of a set of treatments and multiple comparisons. Classical parametric tools for analysis, such as the F-test, Tukey test, or Ryan/Einot-Gabriel/Welsch test, provide efficient ways for testing hypotheses and constructing simultaneous confidence intervals (SCIs) but rely on restrictive assumptions on underlying distributions, variances, and sample sizes. Issues of misspecification and robustness of inference arise when these assumptions are not met. In randomized block designs, rank-based, distribution-free multiple comparison procedures have been suggested, going back to Friedman (1937) and Nemenyi (1963). Mansouri and Shaw (2004) developed a Tukey-type nonparametric pairwise comparison procedure for balanced incomplete block designs. More recently, Eisinga, Heskes, Pelzer, and Te Grotenhuis (2017) proposed an exact test for simultaneous pairwise comparison of Friedman rank sums with a method to quickly calculate the exact p-values and associated statistics. Rank-based approaches, however, have limitations in that they do not fully utilize the available data and have a well-known cycling inconsistency issue (Lehmann and D'Abrera 1975; Fey and Clarke 2012).
Empirical likelihood (Owen 1988) can be helpful as a nonparametric alternative in such situations. With suitably defined estimating functions, empirical likelihood enables nonparametric, likelihood-driven inferences without distributional specifications. It is well established that various forms of empirical likelihood ratio functions admit a nonparametric version of Wilks' theorem under mild conditions, providing a basis for an asymptotic test based on a chi-square null distribution; see, e.g., Qin and Lawless (1994) and Owen (2001). In addition, the empirical distribution of the data determines the shape and orientation of confidence regions. The coverage accuracy of the confidence regions can further be improved by bootstrap or Bartlett correction (DiCiccio, Hall, and Romano 1991). In the context of the analysis of designed experiments, empirical likelihood has been studied for inference on the median using ranking data by Liu, Yuan, Lin, and Zhang (2012) and Alvo (2015). Inference on the mean is also available by formulating an appropriate estimating function. Designs without a blocking factor, for example, can be analyzed as an analysis of variance problem (Owen 1991). Popular block designs such as randomized complete block designs or balanced incomplete block designs can also be reconfigured as a multivariate mean problem.
The existing literature has mainly focused on establishing limit theorems for a single empirical likelihood (ratio) statistic with a single hypothesis. Wang and Yang (2018) applied F-distribution calibrated empirical likelihood statistics to multiple hypothesis tests, assuming independence between tests. However, these results cannot be directly extended to various dependence scenarios, including the problem of multiple comparisons. Although individual p-values from empirical likelihood tests can be substituted into many existing multiple testing procedures, constructing SCIs for the comparisons based on empirical likelihood has not yet been investigated. This article addresses the challenges of the multiplicity of comparisons by introducing an asymptotic framework for general block designs that leads to manageable inference. In particular, each confidence interval has a variable length that accommodates the underlying covariance structure without explicit studentization, and the SCIs achieve the target coverage probability asymptotically. We also propose empirical likelihood-based multiple testing procedures that rest on this framework. These procedures are generally applicable to other models and estimating functions.
The article is organized as follows. Section 2 introduces some preliminary concepts and conditions used in the rest of the article. Section 3 develops an asymptotic theory for a set of empirical likelihood test statistics. We propose two multiple testing procedures in Section 4 and evaluate the performances of the procedures in Section 5 through a simulation study. An application to pesticide concentration experiments is discussed in Section 6. We conclude with a discussion of directions for future research in Section 7. The proofs of the theoretical results are provided in the Appendix.

General block designs
A block design is an ordered pair (T, B), where T is a set of p points that we call treatments, and B is a collection of n nonempty subsets of T called blocks. We consider general block designs where each block size is b_i, with 1 ≤ b_i ≤ p, for i = 1, ..., n. Treatment k is contained in r_k blocks and each pair of distinct treatments k and l is contained in λ_kl blocks, for k, l = 1, ..., p. Then, with B_k ⊆ {1, ..., n} denoting the index set of the blocks containing treatment k, we have r_k = |B_k| and λ_kl = |B_k ∩ B_l|. Let C_n denote the associated n × p binary incidence matrix with the (i, k) component given by c_ik = 1(i ∈ B_k), where 1(·) is the indicator function of its argument. The ith row sum is then b_i and the kth column sum is r_k.
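The quantities b_i, r_k, and λ_kl can all be read off the incidence matrix: the row sums give the block sizes, the column sums give the replication counts, and the off-diagonal entries of C_n'C_n collect the pairwise concurrences. A minimal sketch (the small design below is hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical design with n = 4 blocks and p = 3 treatments;
# rows are blocks, columns are treatments.
C = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
])

b = C.sum(axis=1)   # block sizes b_i (row sums)
r = C.sum(axis=0)   # replication counts r_k (column sums)
L = C.T @ C         # (k, l) off-diagonal entry is lambda_kl;
                    # the diagonal recovers r_k
```

The limit condition n^{-1} C_n'C_n → D then asks that both the scaled replication counts and the scaled concurrences stabilize as the number of blocks grows.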
The n blocks are regarded as random samples from an unknown population. Specifically, we assume independent and identically distributed (i.i.d.) p-dimensional random variables X_1, ..., X_n defined on a probability space (Ω, F, P) with mean E(X_1) = θ_0 ∈ int(Θ) and positive definite covariance matrix Var(X_1) = Σ, where Θ ⊆ R^p denotes the parameter space. The parameter of interest is θ_0, the treatment effects. According to the design and C_n, we only observe those X_ik from X_i = (X_i1, ..., X_ip) for which c_ik = 1. Since C_n is always available, we do not make a notational distinction between the underlying random variable X_i and its observable components; it will be clear from the context what we are referring to. In order to work with empirical likelihood, we require that n^{-1} C_n'C_n → D as n → ∞ for some matrix D with positive diagonal entries.

Empirical likelihood for block designs
We introduce the general setup for empirical likelihood within the block design framework. The available data are denoted by X_n = {X_1, ..., X_n}. Inference for θ_0 is based on a p-dimensional estimating function g(X_i, θ), where g(X_i, θ) equals X_i − θ with the unobserved components set to 0. More explicitly, we write

g(X_i, θ) = c_i • (X_i − θ),    (1)

where c_i is the ith row of C_n and • is the Hadamard product. The (profile) empirical likelihood ratio function, evaluated at θ, is defined as

R_n(θ) = max{ ∏_{i=1}^n n w_i : w_i ≥ 0, ∑_{i=1}^n w_i = 1, ∑_{i=1}^n w_i g(X_i, θ) = 0 }.

A unique solution exists if the zero vector is contained in Conv_n(θ), where Conv_n(θ) denotes the interior of the convex hull of {g(X_i, θ) : i = 1, ..., n}. The Lagrange multipliers λ ≡ λ(θ) of the dual optimization problem solve

(1/n) ∑_{i=1}^n g(X_i, θ) / (1 + λ'g(X_i, θ)) = 0.

We denote minus twice the log empirical likelihood ratio function by

l_n(θ) = −2 log R_n(θ) = 2 ∑_{i=1}^n log(1 + λ'g(X_i, θ)).

In the case of g(X_i, θ) = X_i − θ, Owen (1990) showed that l_n(θ_0) converges in distribution to χ²_p, a chi-square distribution with p degrees of freedom. Similar results also hold for other forms of estimating functions (Qin and Lawless 1994), and it can be shown that l_n(θ_0) → χ²_p in distribution for our general block designs under some regularity conditions. A confidence region for θ_0 can then be constructed as {θ : l_n(θ) ≤ χ²_{p,α}}, where χ²_{p,α} is the (1 − α)th quantile of a χ²_p distribution.
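To make the dual problem concrete, the following sketch computes l_n(θ) by Newton iterations on the Lagrange multiplier. It is a minimal illustration under the setup above, not the paper's implementation, and it omits the step-halving safeguards a production solver would use:

```python
import numpy as np

def el_logratio(G, max_iter=100, tol=1e-10):
    """Minus twice the log empirical likelihood ratio.

    G is an (n, p) array whose rows are g(X_i, theta); for a block
    design, row i is c_i * (X_i - theta) with unobserved entries 0.
    The multiplier lam solves sum_i g_i / (1 + lam'g_i) = 0, found
    by Newton's method on the concave dual sum_i log(1 + lam'g_i).
    """
    n, p = G.shape
    lam = np.zeros(p)
    for _ in range(max_iter):
        denom = 1.0 + G @ lam
        if np.any(denom <= 0):          # weights would be negative
            return np.inf
        grad = (G / denom[:, None]).sum(axis=0)
        hess = -(G.T / denom**2) @ G    # negative definite
        step = np.linalg.solve(hess, grad)
        lam = lam - step
        if np.linalg.norm(step) < tol:
            break
    return 2.0 * np.sum(np.log(1.0 + G @ lam))
```

At θ equal to the maximum empirical likelihood estimate the statistic is zero, and evaluated at θ_0 it is asymptotically chi-square distributed.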
For the case of a subset of the parameter vector, let θ = (θ_1, θ_2) for a q-dimensional parameter θ_1 with q ≤ p, and consider testing a hypothesis H : θ_1 = θ_1*. Under additional assumptions, the relevant test statistic l_n(θ_1*, θ̂_2) → χ²_q in distribution, where θ̂_2 minimizes l_n(θ_1*, θ_2) with respect to θ_2 (Qin and Lawless 1994, Corollary 5). More generally, for a q-dimensional constraint h(θ) = 0, Qin and Lawless (1995) showed that if h(θ_0) = 0, then l_n(θ̃) → χ²_q in distribution, where θ̃ denotes the minimizer of the constrained problem. Adimari and Guolo (2010) extended hypothesis testing with empirical likelihood to show that the chi-square calibration holds for an even broader class of estimating functions. We can apply these results to general block designs and perform some important tests, including the test of no treatment effect or the interaction between treatments and the blocking variable. The applicability, however, is still restricted to a single hypothesis test.

Multiple testing
Consider simultaneously testing m null hypotheses H_j, j = 1, ..., m. We assume that each H_j corresponds to a nonempty subset of Θ through a smooth q_j-dimensional function h_j such that H_j = {θ ∈ Θ : h_j(θ) = 0}. We have θ_0 ∈ H_j under H_j (when H_j is true). The complete null hypothesis H_0 = ∩_j H_j is also assumed to be nonempty. We denote a multiple testing procedure by φ = {φ_j : j = 1, ..., m}, where φ_j maps the data X_n into {0, 1}, and H_j is rejected if and only if φ_j = 1. We restrict our attention to procedures that provide a common cutoff value c_α at a nominal level α ∈ (0, 1). Given a vector of m test statistics T_n = (T_n1, ..., T_nm), we reject H_j if T_nj > c_α. The total number of false rejections is V_m = ∑_{j∈I_0} 1(φ_j = 1), where I_0 = {j : θ_0 ∈ H_j} is the index set of true null hypotheses.
Of the various Type I error rates for multiple testing, the most common choice in designed experiments is the family-wise error rate (FWER). When the number of hypotheses is large, one can consider the generalized family-wise error rate (gFWER) as a less stringent alternative, which is defined as the probability of v or more false rejections for some v ≤ m. A discussion of procedures for gFWER control can be found in Lehmann and Romano (2005). Formally, a procedure φ is said to control gFWER (strongly) at level α if gFWER_θ(φ) = P_θ(V_m ≥ v) ≤ α for all θ ∈ Θ.
When v = 1, this reduces to FWER control. We say that φ controls gFWER asymptotically if lim sup_{n→∞} gFWER_θ(φ) ≤ α for all θ ∈ Θ. This article addresses single-step procedures for gFWER control with consideration of the joint distribution of the empirical likelihood statistics.
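The single-step rule can be sketched directly: given draws of the m statistics under the complete null, v or more false rejections occur exactly when the vth largest statistic exceeds the common cutoff, so the cutoff is a quantile of that order statistic. A hedged illustration (the independent χ²_1 margins below are assumed only for the demonstration, not implied by the theory):

```python
import numpy as np

def single_step_cutoff(null_draws, alpha=0.05, v=1):
    """Common cutoff for gFWER(v) control.

    null_draws: (B, m) array; each row is one draw of the m test
    statistics under the complete null H_0.  At least v rejections
    occur iff the v-th largest statistic exceeds c, so c is taken
    as the (1 - alpha) quantile of that order statistic.
    """
    vth_largest = np.sort(null_draws, axis=1)[:, -v]
    return np.quantile(vth_largest, 1.0 - alpha)

# Demonstration with m = 5 independent chi-square(1) margins.
rng = np.random.default_rng(0)
draws = rng.chisquare(df=1, size=(100_000, 5))
c1 = single_step_cutoff(draws, v=1)   # FWER cutoff
c2 = single_step_cutoff(draws, v=2)   # laxer gFWER(2) cutoff
```

Here c1 exceeds the per-test χ²_1 cutoff 3.84, and c2 < c1, reflecting that gFWER control for larger v is less stringent.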

Multivariate chi-square calibration
In order to address the multiplicity of our problem and formalize asymptotic multiple testing procedures based on empirical likelihood statistics, we first need a multivariate generalization of chi-square calibration, a multivariate chi-square distribution. The class of multivariate distributions with marginal chi-square distributions is much too broad to be useful in practice, and there is no universal definition of a multivariate chi-square distribution. In what follows, we adopt a particular type of multivariate chi-square distribution introduced in Dickhaus (2014). This distribution naturally arises as the joint limiting distribution of many Wald-type statistics and allows for varying degrees of freedom in each marginal. A comprehensive overview of different types of multivariate chi-square distributions and their applications can be found in Dickhaus and Royen (2015).
We now establish a multivariate extension that covers general block designs as a special case. To this end, we do not require i.i.d. observations X_i, and we allow the p-dimensional estimating function g(X_i, θ) to take forms different from (1). Let θ be a parameter of interest (not necessarily the mean parameter) and define

G_n(θ) = (1/n) ∑_{i=1}^n g(X_i, θ),  S_n(θ) = (1/n) ∑_{i=1}^n g(X_i, θ) g(X_i, θ)',

and let G(θ) denote the limit of E[G_n(θ)], with the property that G(θ_0) = 0. Here and throughout, we use |·| to denote the Euclidean norm for vectors. For matrices, ‖·‖ and ∂_θ(·) denote the Frobenius norm and the Jacobian matrix, respectively. All limits are taken as n → ∞. We assume the following regularity conditions:

Condition 1 P(0 ∈ Conv_n(θ_0)) → 1.

Condition 2 g(X_i, θ) and G(θ) are continuously differentiable in a neighborhood N of θ_0 almost surely.

Condition 3 There exists a matrix function V(θ) with positive definite V(θ_0) such that sup_{θ∈N} ‖S_n(θ) − V(θ)‖ → 0 in probability and sup_{|θ−θ_0|≤b_n} ‖V(θ) − V(θ_0)‖ → 0, for any sequence of positive real numbers b_n → 0.

Condition 4 a_n G_n(θ_0) → U in distribution for a sequence of positive real numbers a_n → ∞, where U ∼ N(0, V(θ_0)).

Condition 5 max_{1≤i≤n} sup_{θ∈N} |g(X_i, θ)| = o_P(a_n).

Condition 6 The function h_j defining the null hypothesis H_j is continuously differentiable on N with Jacobian matrix J_j = ∂_θ h_j(θ_0) of full rank q_j ≤ p, j = 1, ..., m.
Condition 1 is the basic existence condition for empirical likelihood in the asymptotic setting. Since the computation of l_n(θ) involves the quadratic forms G_n and S_n, Conditions 2 and 3 are required for l_n(θ), as a smooth function of θ, to be evaluated in a neighborhood of θ_0. Condition 4 implies that, asymptotically, the quadratic forms have marginal and joint multivariate chi-square distributions. Condition 5 demands that the remainder terms be negligible. Condition 6 completes the statement of the constrained empirical likelihood problems for multiple testing and holds for most practical applications that we consider.
For j = 1, ..., m, we define the empirical likelihood statistic associated with hypothesis H_j as T_nj = inf_{θ∈H_j∩K_n} l_n(θ) for some sequence of local neighborhoods K_n of θ_0 (see the Appendix).

THEOREM 3.1 Under H_0 and Conditions 1-6, T_n = (T_n1, ..., T_nm) → T ∼ χ²(m, q, R) in distribution. Here, q = (q_1, ..., q_m) and R is the correlation matrix of (Z_1, ..., Z_m), with Z_j = (J_j M J_j')^{-1/2} J_j W U.
Remark 1 For the general block designs introduced in Section 2.1, it can be shown that l_n is convex in θ, so we can find a solution θ̂_c of the optimization problem in (2). In other cases with general estimating functions, identification of θ_0 may require additional conditions, such as compactness of Θ, or θ_0 being the unique zero of G(θ) (see, e.g., Yuan and Jennrich 1998; Jacod and Sørensen 2018).
Remark 2 T_n satisfies the so-called subset pivotality condition (Westfall and Young 1993) asymptotically, in the sense that, for any subset S ⊆ {1, ..., m}, the joint limiting distribution of {T_nj : j ∈ S} remains the same under ∩_{j∈S} H_j and H_0.

Illustration of the theory for general block designs
We give an illustration of the preceding theory by verifying Conditions 1-5 of Theorem 3.1 for general block designs. Condition 1 holds by applying the Glivenko-Cantelli argument over the half-spaces of Owen (2001, p. 219). Condition 2 is checked by noting that both g(X_i, θ) = c_i • (X_i − θ) and G(θ) are continuously differentiable, with ∂_θ g(X_i, θ) = −diag(c_i) and ∂_θ G(θ) = −diag(D), where diag(·) denotes the diagonal matrix of its argument (either a vector or a matrix); the result follows since we have assumed that n^{-1} C_n'C_n → D. With the (k, l) component of Σ denoted by σ_kl, we have S_n(θ) → V(θ) almost surely, uniformly in θ ∈ N, by the (uniform) law of large numbers, which verifies the first requirement of Condition 3, namely sup_{θ∈N} ‖S_n(θ) − V(θ)‖ → 0 in probability. The second requirement follows from the continuity of V(θ) near θ_0, establishing Condition 3. For Condition 4, we take a_n = √n. We have V_n → V = Σ • D, where V is positive definite, and it follows from the Lindeberg-Feller central limit theorem that a_n G_n(θ_0) → N(0, V) in distribution, so Condition 4 holds. Finally, since the X_i are i.i.d. with finite variance, max_{1≤i≤n} |g(X_i, θ_0)| = o_P(√n), and Condition 5 is checked.

Asymptotic Monte Carlo
This section extends the multivariate empirical likelihood theory developed in Section 3 to specific multiple testing procedures for general block designs. We propose two procedures for calibration of the common cutoff value, where the finite-sample null distribution of T_n is approximated by employing appropriate schemes. Both procedures determine cutoffs that provide asymptotic gFWER control (see Remark 2). As a multivariate analog of chi-square calibration, one may consider relying on multivariate chi-square quantiles of T as a cutoff. In practice, however, the covariance matrix V of U, and thus the correlation matrix R of T, is rarely known, making it impossible to compute the multivariate quantiles directly. As an alternative, the asymptotic Monte Carlo (AMC) procedure relies on the stochastic representation in Theorem 3.1 to produce a simulation-based approximation to the distribution of T up to any desired precision.
Suppose that we have a consistent estimator θ̂ of θ_0. It can be shown from Condition 3 that S_n(θ̂) → V in probability; see Hjort, McKeague, and van Keilegom (2009, Remark 2.2). Then, the AMC procedure consists of replacing V with S_n(θ̂) and simulating samples from the resulting approximate null distribution, defined conditionally on the observed data X_n.

Algorithm 1: AMC
Data: X_n
Result: cutoff c_α^AMC and adjusted p-values p̃_1, ..., p̃_m
1. Compute T_n, S_n(θ̂), and A_1, ..., A_m.
2. Monte Carlo simulation for approximation: for b = 1, ..., B, draw Û^(b) ~ N(0, S_n(θ̂)) and compute the simulated statistic vector T̂^(b) from the quadratic forms with A_1, ..., A_m.
3. Cutoff: set c_α^AMC to the (1 − α)th quantile of the vth largest component of T̂^(1), ..., T̂^(B).
4. Adjusted p-values and multiple testing: for j = 1, ..., m, compute p̃_j from the simulated statistics and reject H_j if T_nj > c_α^AMC.

With P̂_n denoting the conditional distribution of the simulated statistics, the following theorem ensures that the distance between P̂_n and the distribution of T_n converges to zero in probability.
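The simulation step of the AMC procedure can be sketched for contrast hypotheses, where each marginal statistic has the asymptotic quadratic form (u_j'U)²/(u_j'V u_j) with U ~ N(0, V), and S_n(θ̂) stands in for V. This is an illustrative reduction of the procedure, not its full implementation; the function name and arguments are our own:

```python
import numpy as np

def amc_cutoff(V_hat, contrasts, alpha=0.05, v=1, B=10_000, seed=0):
    """Monte Carlo cutoff for contrast hypotheses H_j : u_j' theta = 0.

    V_hat:     estimated covariance S_n(theta_hat) of the score, p x p
    contrasts: list of p-vectors u_j
    For each draw U ~ N(0, V_hat), the j-th statistic is approximated
    by its asymptotic quadratic form (u_j'U)^2 / (u_j' V_hat u_j),
    marginally chi-square with 1 degree of freedom.
    """
    rng = np.random.default_rng(seed)
    U = rng.multivariate_normal(np.zeros(len(V_hat)), V_hat, size=B)
    T = np.column_stack([
        (U @ u) ** 2 / (u @ V_hat @ u) for u in contrasts
    ])                                  # (B, m) joint null draws
    vth_largest = np.sort(T, axis=1)[:, -v]
    return np.quantile(vth_largest, 1.0 - alpha)
```

The joint draws share one U per replicate, so the correlation among the statistics is captured without ever forming R explicitly.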
THEOREM 4.1 Under H_0 and Conditions 1-6, the conditional distribution P̂_n converges weakly to the distribution of T in probability.

Theorem 4.1 guarantees the asymptotic validity of the AMC procedure described in Algorithm 1. For v = 1, the procedure reduces to controlling the asymptotic FWER, and the cutoff c_α^AMC is the (1 − α)th quantile of the largest simulated statistic. When the hypotheses H_j are defined by contrasts of the form ∑_{k=1}^p u_k θ_k with known constants u_1, ..., u_p, asymptotic 100(1 − α)% SCIs for the contrasts are obtained by collecting, for each j, the values of the contrast not rejected at cutoff c_α^AMC. The test procedure and the SCIs are compatible, i.e., whenever H_j is rejected, the corresponding interval does not include the null value, and vice versa.
Remark 3 Rather than generating draws from an approximate multivariate chi-square distribution, low-dimensional multiplicity-adjusted quantiles can be computed numerically if the underlying correlation matrix fulfills certain structural properties (Stange, Loginova, and Dickhaus 2016).

Nonparametric bootstrap
It has been widely noted that the error rates of tests based on the asymptotic chi-square calibration tend to be higher than the nominal levels, especially in small sample or high-dimensional problems; see, e.g., Qin and Lawless (1994) and Tsao (2004). This issue persists in our setting with multiple empirical likelihood statistics. Moreover, considering the incomplete nature of block designs, convergence to a multivariate chi-square distribution may be slow. As an alternative, Owen (1988) proposed a bootstrap calibration for the mean. Resampling from the original data X_n yields bootstrap replicates X_1*, ..., X_n*. In our setting, let X̃_n be the null-transformed data with H_0 imposed on the observed data X_n (see (4) below as an example). Then we denote the (nonparametric) bootstrap samples drawn from X̃_n by X̃_1*, ..., X̃_n*, with the corresponding statistic vector T_n*. Theorem 4.2 ensures that the conditional distribution of T_n* approximates the multivariate chi-square distribution of T. Adding the continuity of T implies that the procedures for gFWER control can be asymptotically calibrated by the bootstrap replicates of T_n*, namely T_n*(1), ..., T_n*(B). In Algorithm 2 we describe the nonparametric bootstrap (NB) procedure. It differs from the AMC procedure only in the cutoff c_α^NB and the resulting adjusted p-values. Our experience with the procedure shows that the NB procedure is better tuned to the distribution from which the data arise and that c_α^NB is typically larger than c_α^AMC.
Remark 4 Other bootstrap schemes can be considered as well. The block bootstrap, for instance, may be adapted to produce bootstrap replicates that better preserve the original design structure. This can be of great importance when n is small and the convex hull constraint is of concern. In this regard, it is worth examining the applicability of alternative formulations of empirical likelihood that are free from the constraint (see, e.g., Chen, Variyath, and Abraham 2008; Tsao and Wu 2013).

Simulation study
In this section, we carry out a simulation study on a balanced incomplete block design for all pairwise comparisons of treatment effects. The design has five treatments with n blocks, and each block consists of a pair of treatments, with each pair appearing in 0.1n blocks. We have a (5, n, 0.4n, 2, 0.1n)-design for short. Finite sample performances of the AMC and the NB procedures are evaluated for controlling FWER and constructing SCIs. We simulate data from the following standard linear mixed effect model:

X_ik = θ_k + β_i + ε_ik,    (3)

where both β_i and ε_ik are i.i.d. random variables for block effects and errors, respectively. The null hypothesis for treatment pair (k, l) is H_kl : θ_k − θ_l = 0 for k, l = 1, ..., 5 with k < l. We denote the pairwise differences between treatment effects by δ_j for j = 1, ..., 10, with the corresponding hypothesis H_j and test statistic T_nj.
For comparison, we consider the single-step procedure proposed by Hothorn, Bretz, and Westfall (2008) as a benchmark. This procedure (henceforth HBW) is based on an asymptotic multivariate normal distribution for the point estimates and a consistent plug-in estimate of the associated covariance matrix. We apply HBW to restricted maximum likelihood estimates of δ_j, assuming the additive form and compound symmetry that are present in the model. We refer to Hothorn et al. (2008) for technical details.
where Gamma(2, 1) denotes a Gamma distribution with shape parameter 2 and scale parameter 1. Each pair of scenarios has a distinct distributional specification. In each pair, the first scenario is of the form (3). The second scenario, however, has negligible block effects and larger variance for the fifth treatment, breaking some assumptions of the model. In each scenario, we consider three different numbers of blocks n ∈ {50, 100, 200} and three different values of θ to vary the number of true null hypotheses. Given specific values of n and θ, simulation results for the AMC procedure are obtained as follows. For S = 10,000 simulation runs indexed by s: Step 1 Simulate data from the given scenario and compute T_n(s).
Step 2 With B = 10,000, apply the AMC procedure in Algorithm 1 to obtain c_α^AMC and the adjusted p-values. The results for the NB procedure are obtained similarly.
Step (i) is modified to set up bootstrap sampling that respects H_0. Before drawing the bootstrap replicates, pass from X_ik to

X̃_ik = X_ik − X̄_k,    (4)

where X̄_k = r_k^{-1} ∑_{i∈B_k} X_ik is the maximum empirical likelihood estimate for θ_k. Applying Algorithm 2 in Step (ii), we obtain the same T_n, but the cutoff c_α^NB is different from c_α^AMC, which produces different SCIs, FWER, average interval length (AL), and coverage probability (CP). All simulations are performed in R (R Core Team 2023). We implement AMC and NB with the melt package (Kim 2022). For HBW, we fit (3) via the lme4 package (Bates, Mächler, Bolker, and Walker 2015) and then pass the result to the multcomp package (Hothorn et al. 2008).
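The null transformation and resampling loop can be sketched as follows. The helper below is our own construction: it centers each treatment at its estimate so that all pairwise differences are zero under the resampling distribution, and it takes the statistic computation as a black-box argument standing in for the empirical likelihood statistics:

```python
import numpy as np

def nb_cutoff(X, C, stat_fn, alpha=0.05, v=1, B=1000, seed=0):
    """Nonparametric bootstrap cutoff, sketched.

    X:       (n, p) data with unobserved entries arbitrary
    C:       n x p binary incidence matrix
    stat_fn: maps a data set and incidence matrix to the vector of
             m test statistics (placeholder for the EL statistics)
    """
    rng = np.random.default_rng(seed)
    r = C.sum(axis=0)
    xbar = (X * C).sum(axis=0) / r       # treatment means over observed blocks
    X0 = (X - xbar) * C                  # null-transformed data, eq. (4)
    m = len(stat_fn(X0, C))
    stats = np.empty((B, m))
    for b in range(B):
        idx = rng.integers(0, len(X0), size=len(X0))   # resample blocks
        stats[b] = stat_fn(X0[idx], C[idx])
    vth_largest = np.sort(stats, axis=1)[:, -v]
    return np.quantile(vth_largest, 1.0 - alpha)
```

Resampling whole rows keeps each block's observed pattern intact, which is what lets the bootstrap mimic the incomplete design.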
Tables 1-3 summarize the simulation results. In all scenarios and procedures, the FWER is largest for all n when H_0 holds and decreases as the number of false hypotheses increases, since there are fewer opportunities to reject the true null hypotheses. By construction, AL and CP are not related to θ. The intervals are shorter when the model generates less variation in the data. FWER and CP approach their respective targets, 0.05 and 0.95, under H_0 as n increases. For AMC, FWER and CP are quite far from the targets when n = 50 and are also sensitive to the distribution of the data. As can be seen from Table 2, the FWER and CP tend to be worse when the distribution is highly skewed and has a thick tail. The estimates computed with only 20 observations per treatment can be inaccurate in the presence of skewness and outliers. On the other hand, NB provides FWER and CP close to target even when n = 50. NB outperforms AMC in FWER and CP but has larger AL in all scenarios. This finding is consistent with our experience that c_α^AMC < c_α^NB holds in most cases and in keeping with the slow convergence of Wald-type statistics to chi-square distributions (see, e.g., Pauly, Brunner, and Konietschke 2015). The performances of AMC and NB are similar when n = 200 or when the data range is restricted (Table 3). Interestingly, NB is more conservative when n = 50 than when n = 100 or n = 200. This is partly due to the higher chance that a bootstrap sample may not satisfy the convex hull constraint, contributing to the large cutoff of NB (see Remark 4).
The HBW procedure, contrary to the empirical likelihood-based procedures, depends heavily on the assumptions in (3). HBW performs well in scenarios where the compound symmetry assumption is met (S1-1, S2-1, and S3-1). FWER and CP are robust across different distributions for the block effects and the errors. Except for n = 200, HBW outperforms AMC and comes close to NB with considerably shorter AL. In scenarios where compound symmetry is violated (S1-2, S2-2, and S3-2), however, HBW shows a substantial performance deterioration. The AL of HBW is larger than those of AMC and NB when n is 100 or 200. FWER and CP are far from their target values, and the rate at which they improve is much slower. Figure 1 further shows the impact of the violation on AL and CP by gradually decreasing the degrees of freedom for the distribution of ε_i5 in S3-2 when n = 200 and θ = (0, 0, 0, 0, 0). The AL of AMC and NB is much larger for the intervals involving θ_5 than for the rest, and only the AL of these intervals increases as the degrees of freedom decrease to 2 (infinite variance). As a result of this adjustment, the CP of the individual interval is maintained above 0.95 for AMC and NB. In contrast, all intervals of HBW have the same length. This implies that the intervals involving θ_5 are not wide enough as SCIs, causing the under-coverage shown in Figure 1. For AMC and NB, additional simulation results for gFWER control are also shown in Table 4. In summary, AMC and NB show robust performance without relying on restrictive model assumptions. Notably, NB successfully approximates the target error rate and CP even with small sample sizes. The performance gap between AMC and NB is modest for larger n, where the computational burden of the bootstrap and optimization involved in NB gives the advantage to AMC.

Application to pesticide concentration experiments
We apply the methodology developed in Sections 3 and 4 to analyze the clothianidin concentration data in Alford and Krupke (2017). Clothianidin, a neonicotinoid pesticide, is a potent agonist of the nicotinic acetylcholine receptor in insects and is extensively applied in the United States to maize seeds before planting. Quantifying the amount of clothianidin translocated into plant tissue, coupled with its potential for environmental accumulation via runoff or leaching, provides information on the costs and benefits of this delivery method.
Figure 1. AL and CP of S3-2 with varying degrees of freedom for the distribution of ε_i5 when n = 200 and θ = (0, 0, 0, 0, 0). For AMC and NB, separate results are presented for the comparisons that involve θ_5 (+5) and those that do not (−5). AL and CP are computed groupwise.
Alford and Krupke (2017) investigated clothianidin concentration in three regions (seed, root, and shoot) of maize plants in the early growing season. Two field experiments were conducted in 2014 and 2015 with four seed treatments: untreated (Naked), fungicide only (Fungicide), low rate of 0.25 mg clothianidin/kernel (Low), and high rate of 1.25 mg clothianidin/kernel (High). In a randomized complete block design with four blocks for each year, the treatments were applied to four plots in each block. Clothianidin contamination of the untreated plots was expected due to subsurface flow and proximity between plots. Sampling was carried out at 6, 8, 10, 13, 15, 17, 20, and 34 days post-planting in 2014, and at 5, 7, 9, 12, 14, 16, 19, 47, and 61 days post-planting in 2015. On each sampling day, up to ten plants were randomly sampled from each plot, and three or five of them were processed for chemical analysis. Some plant observations were lost before the analysis. Root and shoot regions of the remaining observations were scored as "complete" (> 80% present) or "incomplete" (< 80% present). In this way, the experimental design had a hierarchical structure of sampling (year/block/days post-planting/plot) that is unbalanced and incomplete in several layers. The clothianidin concentration data were log-transformed to conform more closely to a normal distribution. Alford and Krupke (2017) fit a linear mixed model to test two contrasts: Naked + Fungicide vs. Low and Naked + Fungicide vs. High. Jensen, Schaarschmidt, Onofri, and Ritz (2018) used the same data and performed a variety of post-hoc pairwise comparisons with various linear mixed models.
We subdivide the original blocks into 68 new blocks according to the days post-planting, considering that plant tissue clothianidin dissipates over time following an exponential decay pattern (Alford and Krupke 2017, Fig. 2). For illustration, we analyze only the "incomplete" shoot region observations. This results in 32 blocks. Plot level replicates, if any, are averaged over the treatments within these blocks, resulting in 102 observations in total.

Table 4. Simulation results for the gFWER control. For the three simulation scenarios, gFWER of each procedure is computed for v = 2, 3, 4, 5 with θ = (0, 0, 0, 0, 0).

To summarize, empirical likelihood has the advantage of avoiding clearly inappropriate assumptions. For these data, imposing these assumptions and performing a standard analysis leads to different conclusions.

Discussion
Several extensions remain open for future research. We are primarily interested in producing common cutoffs for pairwise comparisons and SCIs, but the AMC and NB procedures can be modified to yield common quantiles. Common quantile procedures are appropriate when the asymptotic multivariate chi-square distribution in Theorem 3.1 has different degrees of freedom for each marginal. While preserving the gFWER control, improvement in power can be achieved by adapting to stepwise procedures in both approaches.

As previously mentioned in Section 3.1, it is also possible to study other parameters and estimating functions in a similar fashion. AMC and NB would need minor adjustment once an asymptotic multivariate distribution is established, though it can be challenging to specify the null transformation for NB. Due to nonconvexity, a major challenge would be the computation of the statistics in (2). See Tang and Wu (2014) for general strategies for computing constrained empirical likelihood problems.

Finally, another interesting topic concerning multiple testing is the use of high-dimensional estimating functions with a growing number of parameters. Hjort et al. (2009) and Tang and Leng (2010) investigated the feasibility of empirical likelihood methods when p, the dimension of the parameter, is allowed to increase with n. In such high-dimensional settings, however, the typical objective is often to control other types of error rates, such as the false discovery rate, which is outside the scope of this article.
for some K > 0 and K_n. Now consider hypotheses H_j for j = 1, ..., m. For each j, there exist K_j > 0 and a corresponding sequence satisfying the same property. We may take K = max{K_1, ..., K_m} and define T_nj = inf_{θ∈H_j∩K_n} l_n(θ). Applying the Cramér-Wold device, under H_0, (Z_1, ..., Z_m) has a multivariate normal distribution with mean 0 and correlation matrix R, where R is a (∑_{j=1}^m q_j) × (∑_{j=1}^m q_j) block matrix whose kth diagonal block is I_{q_k} and whose (k, l) off-diagonal block is the cross-correlation of Z_k and Z_l. Then, T follows a multivariate chi-square distribution with parameters m, q = (q_1, ..., q_m), and R, i.e., T ∼ χ²(m, q, R).
Proof of Theorem 4.2. We present the proof of Theorem 4.2 first, followed by the proof of Theorem 4.1, since the latter readily follows from the former. We first set up some notation and terminology. Recall that the bounded Lipschitz metric $d_{BL}$ between two probability measures $P$ and $Q$ on $\mathbb{R}^m$, $m \in \mathbb{N}$, metrizes weak convergence and is defined by
$$d_{BL}(P, Q) = \sup_{f \in BL_1} \left| \int f \, dP - \int f \, dQ \right|,$$
where $BL_1$ denotes the set of functions $f : \mathbb{R}^m \to [-1, 1]$ such that $|f(x) - f(y)| \le |x - y|$ for all $x, y \in \mathbb{R}^m$. As a mapping from the common probability space $(\Omega, \mathcal{F}, P)$ into the set of probability measures on $\mathbb{R}^m$, let $\hat{P}_n$ be a sequence of random probability measures such that $\int f \, d\hat{P}_n$ is measurable for any bounded and Lipschitz continuous function $f$. Then we say that $\hat{P}_n$ converges weakly to $P$ in probability if
$$\int f \, d\hat{P}_n \to \int f \, dP \quad (12)$$
in probability for all $f \in BL_1$. Also, (12) holds if and only if $d_{BL}(\hat{P}_n, P) \to 0$ in probability. Moreover, if the distribution function of $P$ is continuous, this is equivalent to $d_K(\hat{P}_n, P) \to 0$ in probability (see, e.g. Bücher and Kojadinovic 2019, Lemma 2.5), where
$$d_K(P, Q) = \sup_{x \in \mathbb{R}^m} \left| P((-\infty, x]) - Q((-\infty, x]) \right|$$
denotes the Kolmogorov distance between $P$ and $Q$. Let $X^*_i$ be an independent observation from $X_n$ for $i = 1, \ldots, n$. It can be shown that, for each $j$, $T^*_{nj} = n \, G^*_n(\bar{X})^\top A^*_j \, G^*_n(\bar{X}) + o_P(1)$, where $G^*_n(\bar{X}) = n^{-1} \sum_{i=1}^n g(X^*_i, \bar{X})$ and $A^*_j$ is defined accordingly. We first establish a bootstrap central limit theorem for $\sqrt{n} \, G^*_n(\bar{X})$ as in Singh (1981). Observe that $n^{-1} \sum_{i=1}^n g(X_i, \bar{X}) = 0$, and $\mathrm{Var}^* \, g(X^*_i, \bar{X}) = n^{-1} \sum_{i=1}^n g(X_i, \bar{X}) g(X_i, \bar{X})^\top = S_n(\bar{X}) = S_n(\theta_0) + o(1)$ almost surely. Then $S_n(\bar{X}) \to V$ almost surely by the law of large numbers. Applying the Lindeberg–Feller central limit theorem, for any $\epsilon > 0$ the Lindeberg condition holds almost surely. It follows that $\sqrt{n} \, G^*_n(\bar{X})$ converges weakly to a $N(0, V)$ distribution almost surely, i.e.
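The Kolmogorov distance $d_K$ used above to metrize the bootstrap consistency statements can be computed directly between two empirical distributions; a minimal sketch (the function name is ours):

```python
import bisect

def d_kolmogorov(xs, ys):
    """Kolmogorov distance between the empirical distributions of xs and ys:
    d_K = sup_x |F_n(x) - G_n(x)|, where F_n and G_n are the two empirical
    CDFs. For step functions the supremum is attained at a pooled sample
    point, so it suffices to scan the pooled observations."""
    xs, ys = sorted(xs), sorted(ys)
    best = 0.0
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)  # F_n(t), right-continuous
        fy = bisect.bisect_right(ys, t) / len(ys)  # G_n(t)
        best = max(best, abs(fx - fy))
    return best
```

For identical samples the distance is 0; for completely separated samples it is 1.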
This implies that $S^*_n \to V$ in probability. It follows from the continuous mapping theorem and (13) that $d_K(\mathcal{L}^*(T^*_{nj}), \mathcal{L}(T_j)) \to 0$ in probability for each $j$. Then, for every fixed $\lambda = (\lambda_1, \ldots, \lambda_m) \in \mathbb{R}^m$, an application of the continuous mapping theorem implies that

$X^{(b)}_n$, $b = 1, \ldots, B$. For each $X^{(b)}_n$, the empirical likelihood statistic $l^{(b)}_n(\bar{X})$ is computed at the sample mean $\bar{X}$ of $X_n$. The cutoff is obtained as the sample $(1-\alpha)$th quantile of $\{l^{(b)}_n(\bar{X}) : b = 1, \ldots, B\}$.

[Algorithm: NB procedure. Data: $X_n$ and $\bar{X}$. Result: cutoff $c_\alpha$ and adjusted p-values $p_1, \ldots, p_m$. Step 1: compute $T_n$. Step 2: for $b = 1, \ldots, B$, simulate $X^{(b)}_n$ and compute the bootstrap statistics.]

and its length $|I^{AMC}_j(s)|$ for each $j$. The empirical FWER, average length (AL), and coverage probability (CP) of the SCIs are calculated
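The single-step bootstrap cutoff and adjusted p-values can be sketched as follows. This is a hedged illustration only: `nb_cutoff` and `stat` are our own names, and a squared t-type statistic stands in for the empirical likelihood statistic $T_{nj}$ (whose computation requires constrained optimization); each bootstrap dataset is recentred at its own sample mean, mimicking the null transformation described in the text.

```python
import random
import statistics

def stat(sample, theta0=0.0):
    # Placeholder marginal statistic: a squared t-type statistic standing in
    # for T_nj = inf_{theta in H_j} l_n(theta).
    n = len(sample)
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return n * (m - theta0) ** 2 / s ** 2

def nb_cutoff(samples, alpha=0.05, B=2000, seed=1):
    """Single-step nonparametric-bootstrap common cutoff and adjusted p-values.

    samples : list of m datasets, one per hypothesis H_j.
    """
    rng = random.Random(seed)
    T = [stat(s) for s in samples]                          # observed statistics
    centred = [[x - statistics.fmean(s) for x in s] for s in samples]
    maxima = []
    for _ in range(B):
        boot = []
        for s in centred:
            resample = [rng.choice(s) for _ in s]           # i.i.d. draws from X_n
            boot.append(stat(resample))                     # null-centred statistic
        maxima.append(max(boot))                            # max over the m margins
    maxima.sort()
    c_alpha = maxima[int((1 - alpha) * B) - 1]              # sample (1-alpha) quantile
    p_adj = [sum(mx >= t for mx in maxima) / B for t in T]  # adjusted p-values
    return c_alpha, p_adj
```

Taking the maximum over the margins before computing the quantile is what delivers a common cutoff without modelling the covariance structure explicitly.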

Figure 2. Summary of data for each treatment: (a) box plot of log-transformed clothianidin concentration with median (solid line) and mean (dashed line); (b) density plot of clothianidin concentration.

Figure 3. Pairwise scatter plots of observations. Each dot represents a pair of observations in a block; incomplete pairs are discarded in each plot. The overlaid heat maps show densities. Many dots are located either in the bottom-left or upper-right corner.

Figure 4. Asymptotic 95% simultaneous confidence intervals for pairwise comparisons. The estimates are given as dots inside the error bars. As a result of the larger cutoff, the NB interval contains the corresponding AMC interval for every comparison.
$$d_K\left( \mathcal{L}^*\left( \sqrt{n} \, G^*_n(\bar{X}) \mid X_n \right), N(0, V) \right) \to 0 \quad (13)$$
almost surely. Next, let $s^*_{jk}$ denote the $(j, k)$ component of $S^*_n$. Then $E^*(S^*_n) = S_n(\bar{X})$ and, by the law of iterated expectation, $E\|S^*_n - S_n(\bar{X})\| \to 0$, so that
$$S^*_n - S_n(\bar{X}) \to 0 \quad (14)$$
in probability. From the subsequential property of convergence in probability (Dudley 2002, Theorem 9.2.1), there exists a subsequence along which (14) holds almost surely. Then the Cramér–Wold device implies that $T^*_n$ converges weakly to $T$ almost surely along the subsequence. Another application of the subsequential argument shows that $d_K(\mathcal{L}(T^*_n \mid X_n), \mathcal{L}(T)) \to 0$ in probability. Finally, $T_n \to T$ in distribution under $H_0$ and the result follows from the continuity of $T$.

Proof of Theorem 4.1. Since $S_n(\hat{\theta}) \to V$ in probability, we have $\hat{A}_j \to A_j$ in probability for $j = 1, \ldots, m$, and the continuity of the characteristic function of the normal distribution implies that $d_K(\mathcal{L}(U_n \mid X_n), N(0, V)) \to 0$ in probability. For any $\lambda = (\lambda_1, \ldots, \lambda_m) \in \mathbb{R}^m$, the continuous mapping theorem, the Cramér–Wold device, and the subsequential argument applied to (14) complete the proof.
Let $X^*_i$, $i = 1, \ldots, n$, be i.i.d. observations from $X_n$. Conditional on $X_n$, we denote the bootstrap empirical likelihood statistic by $l^*_n(\theta)$ and the test statistics by $T^*_n = (T^*_{n1}, \ldots, T^*_{nm})$, where $T^*_{nj} = a_n^2 n^{-1} \inf_{\theta \in H_j} l^*_n(\theta)$. We establish another consistency result that provides the weak convergence of $T^*_n$ to $T$ in probability. As is customary, we denote the bootstrap distribution conditional on the data by $P^*_n$.
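The empirical log-likelihood ratio $l_n(\theta)$ on which these statistics are built can be evaluated through its dual problem. A minimal sketch for a univariate mean, using a damped Newton iteration on the Lagrange multiplier and assuming $\theta$ lies strictly inside the convex hull of the data (the function name is ours):

```python
import math

def el_log_ratio(data, theta, iters=50):
    """Empirical log-likelihood ratio l_n(theta) for a univariate mean.

    Solves the dual problem: find the multiplier lam with
    sum_i z_i / (1 + lam * z_i) = 0 for z_i = x_i - theta, then return
    l_n(theta) = 2 * sum_i log(1 + lam * z_i).
    """
    z = [x - theta for x in data]
    lam = 0.0
    for _ in range(iters):
        num = sum(zi / (1 + lam * zi) for zi in z)              # gradient
        den = -sum(zi * zi / (1 + lam * zi) ** 2 for zi in z)   # second derivative
        step = num / den
        # damp the step so every weight 1 + lam * z_i stays positive
        while any(1 + (lam - step) * zi <= 1e-10 for zi in z):
            step /= 2
        lam -= step
        if abs(num) < 1e-10:
            break
    return 2 * sum(math.log(1 + lam * zi) for zi in z)
```

At the sample mean the ratio is zero, and it grows as the hypothesized mean moves away, which is the behaviour the chi-square calibration exploits.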
θ3, θ4, and θ5 are set to zero. The largest standard error of the results is 0.003 when n = 50.

Table 2. Simulation results under scenario S2. θ3, θ4, and θ5 are set to zero. The largest standard error of the results is 0.003 when n = 50.
θ3, θ4, and θ5 are set to zero. The largest standard error of the results is 0.007 when n = 50.
The largest standard error of the results is 0.003 when n = 50.

Table 5. Pairwise comparisons between treatments: Naked (N), Fungicide (F), Low (L), and High (H). The estimates are obtained from empirical likelihood (EL) and the mixed-effects model.