Maximum empirical likelihood estimation and related topics

Abstract: This article develops a theory of maximum empirical likelihood estimation and empirical likelihood ratio testing with irregular and estimated constraint functions that parallels the theory for parametric models and is tailored for semiparametric models. The key is a uniform local asymptotic normality condition for the local empirical likelihood ratio. This condition is shown to hold under mild assumptions on the constraint function. Applications of our results are discussed to inference problems about quantiles under possibly additional information on the underlying distribution and to residual-based inference about quantiles.


Introduction
Let (Z, S) be a measurable space, Q a family of probability measures on S, and κ a function from Q onto an open subset Θ of R^k. Let Z_1, ..., Z_n be independent and identically distributed Z-valued random variables with an unknown distribution Q belonging to the model Q. We are interested in inference about the characteristic θ = κ(Q) of Q. Let us look at the following case.
(K0) There is a function u from Z × Θ into R^m, with m ≥ k, such that, for every R in Q, the identity ∫ u(z, κ(R)) dR(z) = 0 holds and the matrix W(R) = ∫ u(z, κ(R)) u(z, κ(R))^⊤ dR(z) is positive definite.
We refer to u as a constraint function. To simplify notation we abbreviate W(Q) by W and set U_n = n^{-1/2} Σ_{j=1}^n u(Z_j, θ). Let P_n denote the closed probability simplex of dimension n, P_n = {(p_1, ..., p_n) ∈ [0,1]^n : p_1 + ... + p_n = 1}. To construct confidence sets for θ, Owen (1988 [11], 1990 [12], 2001 [13]) introduced the empirical likelihood R_n(ϑ) = sup{ Π_{j=1}^n n p_j : (p_1, ..., p_n) ∈ P_n, Σ_{j=1}^n p_j u(Z_j, ϑ) = 0 }, ϑ ∈ Θ, and proved the following theorem.

Theorem 1.1. Suppose (K0) holds. Then −2 log R_n(θ) is asymptotically chi-square with m degrees of freedom.
This allowed him to show that the set {ϑ ∈ Θ : −2 log R_n(ϑ) < χ²_{1−α}(m)}, where χ²_{1−α}(m) denotes the (1 − α)-quantile of the chi-square distribution with m degrees of freedom, is a confidence set for θ of asymptotic size 1 − α. Indeed, this follows from P(−2 log R_n(θ) < χ²_{1−α}(m)) → 1 − α. Of course, this result can also be used to test whether θ equals some specific value, say θ_0. The corresponding test rejects the null hypothesis H_0: θ = θ_0 if the test statistic −2 log R_n(θ_0) equals or exceeds χ²_{1−α}(m). This test has asymptotic size α. Soon it was realized that the empirical likelihood can also be used to construct point estimators. Qin and Lawless (1994 [16]) studied the maximum empirical likelihood estimator (MELE) θ̂ = arg max_{ϑ∈Θ} R_n(ϑ).
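For a one-dimensional constraint, the statistic −2 log R_n(θ_0) is easy to compute via the standard Lagrange-multiplier dual, in which the maximizing weights are p_j = 1/(n(1 + λ u_j)) with λ solving Σ_j u_j/(1 + λ u_j) = 0. The following sketch (our own illustrative code, not tied to any particular package) uses bisection over the range of λ that keeps all weights positive:

```python
import numpy as np

def neg2_log_elr(u):
    """-2 log empirical likelihood ratio for E[u(Z, theta)] = 0, given the
    evaluated constraints u_j = u(Z_j, theta) (scalar case).  Requires that
    the u_j straddle zero.  Uses p_j = 1/(n(1 + lam*u_j)) with lam solving
    sum u_j/(1 + lam*u_j) = 0."""
    u = np.asarray(u, dtype=float)
    # lam must keep 1 + lam*u_j > 0 for every j
    lo = (-1 + 1e-10) / u.max()
    hi = (-1 + 1e-10) / u.min()
    for _ in range(200):  # the dual equation is decreasing in lam: bisect
        lam = 0.5 * (lo + hi)
        if np.sum(u / (1 + lam * u)) > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2 * np.sum(np.log1p(lam * u))
```

The statistic vanishes when the constraint already holds for the empirical distribution and grows as θ_0 moves away from it; under (K0) it is compared with the chi-square quantile χ²_{1−α}(m), here with m = 1.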
Similar to the classical theory for parametric models, where the behavior of the maximum likelihood estimator is tied to the behavior of the local log-likelihood ratio, the behavior of the empirical likelihood analogs is now linked to the behavior of the local empirical log-likelihood ratio t ↦ log R_n(θ + n^{-1/2} t) − log R_n(θ). The local empirical log-likelihood is said to satisfy the uniform local asymptotic normality (ULAN) condition if the expansion sup_{|t| ≤ C} | log R_n(θ + n^{-1/2} t) − log R_n(θ) − t^⊤ Γ_n + (1/2) t^⊤ J t | = o_P(1) holds for all finite constants C, some invertible k × k dispersion matrix J, and random vectors Γ_n satisfying Γ_n =⇒ N(0, J).
Qin and Lawless (1994 [16]) obtain this condition under regularity and integrability conditions on the constraint function u and its partial derivatives with respect to the parameter. Their assumptions imply the following conditions with the matrix A equal to −E[u̇(Z, θ)], where u̇ denotes the derivative of u with respect to the parameter. (Q2) There is an m × k matrix A of full rank k such that the expansion sup_{|t| ≤ C_n} | n^{-1/2} Σ_{j=1}^n [u(Z_j, θ + n^{-1/2} t) − u(Z_j, θ)] + At | = o_P(1) holds for each sequence C_n of positive numbers satisfying C_n = o(n^{1/2}).
The conditions (Q1) and (Q2) are also implied by Assumptions 2.1 and 2.2 in Parente and Smith (2011 [14]), who allow for irregular u and treat generalized empirical likelihood. Under their assumptions the matrix A equals the derivative of the map ϑ ↦ −E[u(Z, ϑ)] at θ.
Here we shall show that the ULAN condition holds under the following weaker conditions (K1) and (K2), which restrict the sequences C_n in (Q1) and (Q2) to be bounded; in particular, (K2) requires that sup_{|t| ≤ C} | n^{-1/2} Σ_{j=1}^n [u(Z_j, θ + n^{-1/2} t) − u(Z_j, θ)] + At | = o_P(1) holds for every finite constant C.

Theorem 1.2. Suppose (K0)-(K2) hold. Then the expansion
sup_{|t| ≤ C} | −2 log R_n(θ + n^{-1/2} t) + 2 log R_n(θ) + 2 t^⊤ A^⊤ W^{-1} U_n − t^⊤ A^⊤ W^{-1} A t | = o_P(1)  (1.1)
holds for every finite C. Thus the local empirical log-likelihood satisfies the ULAN condition with J = A^⊤ W^{-1} A and Γ_n = A^⊤ W^{-1} U_n.
The expansion (1.1) is critical to the study of maximum empirical likelihood estimation. Assume for the moment that the map ϑ ↦ R_n(ϑ) attains a maximum on each compact subset of Θ. This is the case when the map is upper semi-continuous or if it takes only finitely many values. Note that the function h defined by h(t) = t^⊤ Γ_n − (1/2) t^⊤ J t, t ∈ R^k, (1.2) is uniquely maximized by t = J^{-1} Γ_n. This shows that under the ULAN condition the random function ϑ ↦ R_n(ϑ) has a local maximum θ̂ such that n^{1/2}(θ̂ − θ) = J^{-1} Γ_n + o_P(1). (1.3) In particular, if ϑ ↦ R_n(ϑ) has one local maximizer with probability tending to 1, then this local maximizer θ̂ will obey the expansion (1.3). The theory becomes more involved if ϑ ↦ R_n(ϑ) has several local maxima or if maxima do not exist.
For J and Γ_n of Theorem 1.2, the expansion (1.3) can be written as n^{1/2}(θ̂ − θ) = (A^⊤ W^{-1} A)^{-1} A^⊤ W^{-1} U_n + o_P(1). (1.4) If m = k, then A will be invertible, and (1.4) simplifies to n^{1/2}(θ̂ − θ) = A^{-1} U_n + o_P(1). We call an estimator θ̂ that satisfies (1.4) central. Qin and Lawless (1994 [16]) have shown that central estimators possess an optimality property: they are efficient in the model defined by the constraint function u, which is the largest model for which (K0) holds.
Qin and Lawless (1994 [16]) claim that the MELE is central under their assumptions. Their proof, however, only establishes that a maximizer in a neighborhood of the true parameter of radius n^{-1/3} has this property; see the proof of their Lemma 1. Parente and Smith (2011 [14]) consider a compact parameter space and an interior point θ of this parameter space and show that the MELE is central under their assumptions, which imply (Q1) and (Q2). The assumption that the parameter space is compact is mathematically convenient, but typically not realistic in applications. We shall obtain central estimators under the weaker conditions (K1) and (K2) and without compactness, but require the availability of a √n-consistent estimator. In Section 3 we address two methods of constructing central estimators, namely one-step maximum empirical likelihood estimation and guided maximum empirical likelihood estimation. These methods yield the following constructive existence result for central estimators.

Theorem 1.3. Suppose (K0)-(K2) hold and a √n-consistent estimator of θ is available. Then one can construct a central estimator.
It follows from the previous two theorems that, under the assumptions of Theorem 1.3, every central estimator θ̂ satisfies the expansion −2 log R_n(θ̂) = Ũ_n^⊤ (I − Π_A) Ũ_n + o_P(1) with Ũ_n = W^{-1/2} U_n and Π_A the idempotent matrix Π_A = W^{-1/2} A (A^⊤ W^{-1} A)^{-1} A^⊤ W^{-1/2}. Since the m-dimensional random vector Ũ_n is asymptotically standard normal, we see that −2 log R_n(θ̂) is asymptotically chi-square with m − k degrees of freedom provided m is greater than k. If m equals k, then −2 log R_n(θ̂) converges to zero in probability. For m > k, a similar result has been proved in Corollary 4 by Qin and Lawless (1994 [16]) under their regularity assumptions and, more recently for the irregular case under stronger conditions, in Theorem 1 by Lopez, Van Keilegom and Veraverbeke (2009 [7]) and in Theorem 3.1 by Parente and Smith (2011 [14]). We avoid some of the difficulties by working with central estimators instead of the maximum empirical likelihood estimator. The latter satisfies the expansion (1.4) only under additional requirements such as consistency. Note the simplicity of our conditions as compared to conditions (C0)-(C6) of Lopez, Van Keilegom and Veraverbeke (2009 [7]).
In the above we have focused on maximum empirical likelihood estimation. The key to this was the ULAN condition. As this condition plays a key role in the theory of likelihood ratio tests for parametric models, it should not be surprising that the theory of likelihood ratio testing for parametric models carries over to the empirical likelihood setting. Indeed, Qin and Lawless (1994 [16]) have already discussed this under their sufficient conditions for ULAN. We shall develop the appropriate theory for empirical likelihood ratio testing in Section 4.
So far we have discussed a simple approach to maximum empirical likelihood estimation which generalizes results of Qin and Lawless (1994 [16]) to allow for irregular constraint functions and relaxes the conditions in Parente and Smith (2011 [14]). Of great interest are extensions to constraint functions that depend on nuisance parameters. Generalizations of Theorem 1.1 that allow for estimated constraint functions have been developed in Hjort, McKeague and Van Keilegom (2009 [3]) and Peng and Schick (2013 [15]). Here we are interested in developing a theory parallel to Theorems 1.2 and 1.3 that allows for constraint functions with estimated nuisance parameters. The theory will be developed in Section 2.
The remainder of this paper is organized as follows. In Section 2 we discuss the case when the constraint function depends on characteristics of the underlying distribution and is thus unknown. We develop a theory parallel to that given in this introduction based on estimates of the unknown constraint function. The key result is Theorem 2.2, which gives the ULAN condition for the local empirical likelihood based on random constraint functions. In Section 3 we address the construction of central estimators in the more general setting of Section 2. Section 4 treats empirical likelihood ratio testing, again for random constraint functions. In Section 5 we treat several inference problems related to quantiles, as these provide constraints that are not regular. In particular, we treat maximum empirical likelihood estimation of quantiles with and without additional information, and empirical likelihood ratio testing about quantiles and about the equality of median and mean. The results of a simulation study are reported in Section 6, where we compare the behavior of the various constructions of central estimators in small to moderate sample sizes and present simulated significance levels and powers of empirical likelihood ratio tests. Residual-based inference about a quantile is considered in Section 7 for regression models. We first treat linear regression and then discuss how the results carry over to nonparametric and semiparametric regression models. In Section 8 we present a uniform expansion for an abstract general empirical likelihood process. This result is then used to prove Theorem 2.2 and other related expansions.

Maximum empirical likelihood estimation in the presence of nuisance parameters
Our goal is to extend the results discussed in the Introduction beyond the basic assumption (K0). We are interested in extensions that allow for nuisance parameters. This is important for applications to semiparametric models. A formulation that allows for this is given next. Again, let m be an integer satisfying m ≥ k.
(L0) There are functions u_R from Z × Θ into R^m, R ∈ Q, such that, for every R in Q, the identity ∫ u_R(z, κ(R)) dR(z) = 0 holds and the matrix W(R) = ∫ u_R(z, κ(R)) u_R(z, κ(R))^⊤ dR(z) is positive definite.
Note that (K0) is the special case of (L0) in which u_R = u for all R ∈ Q. To simplify notation we abbreviate W(Q) by W and set U_n = n^{-1/2} Σ_{j=1}^n u_Q(Z_j, θ). (2.1) Since u_Q is unknown, the empirical likelihood is now based on an estimator û_n of u_Q and is given by R̂_n(ϑ) = sup{ Π_{j=1}^n n p_j : (p_1, ..., p_n) ∈ P_n, Σ_{j=1}^n p_j û_n(Z_j, ϑ) = 0 }, ϑ ∈ Θ. Under appropriate conditions, −2 log R̂_n(θ) is asymptotically chi-square with m degrees of freedom.
Next we are looking for a generalization of Theorem 1.2. The corresponding local empirical log-likelihood ratio is t ↦ log R̂_n(θ + n^{-1/2} t) − log R̂_n(θ). Motivated by the conditions (K1) and (K2), we introduce the following conditions.
(L1) For every finite constant C one has the analogue of (K1) for the estimated constraint function û_n. (L2) There is an m × k matrix A of full rank k such that the expansion sup_{|t| ≤ C} | n^{-1/2} Σ_{j=1}^n [û_n(Z_j, θ + n^{-1/2} t) − û_n(Z_j, θ)] + At | = o_P(1) holds for each finite constant C.
Theorem 1.2 is a special case of this theorem. To see this, take u_R = u for all R ∈ Q and û_n = u. Theorem 2.2 also lets us treat the case when (K0) holds, but we want to work with a slightly perturbed version u_n of u. In this case û_n = u_n is non-stochastic. In particular, this allows the treatment of smoothed versions of u. If Θ = R^k, a possible smoothed version is given by u_n(z, ϑ) = ∫ u(z, ϑ + b_n t) K(t) dt, where K is a kernel and b_n is a bandwidth.
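As a small numerical illustration of such a smoothing (a sketch under our own choices: a Gaussian kernel truncated at ±4 and a Riemann-sum approximation of the integral; none of this is prescribed by the text):

```python
import numpy as np

def smooth_u(u, z, theta, b_n, grid_size=201):
    """Approximate the smoothed constraint u_n(z, theta) =
    integral of u(z, theta + b_n * t) K(t) dt for a standard normal
    kernel K, using a Riemann sum on [-4, 4]."""
    t = np.linspace(-4.0, 4.0, grid_size)
    dt = t[1] - t[0]
    K = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
    vals = np.array([u(z, theta + b_n * s) for s in t])
    return float(np.sum(K * vals) * dt)
```

For an indicator constraint such as u(z, ϑ) = 1[z ≤ ϑ] − γ this produces a smooth, monotone function of ϑ, which is the point of the perturbation.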
Having obtained the ULAN property for the modified empirical likelihood, the theory for central estimators based on it can be developed as before. Now a central estimator must satisfy the expansion n^{1/2}(θ̂ − θ) = (A^⊤ W^{-1} A)^{-1} A^⊤ W^{-1} U_n + o_P(1) with U_n as in (2.1). The following theorem is a consequence of the results of Section 3.

Theorem 2.3. Suppose (L0)-(L2) hold, and θ̃ is a √n-consistent estimator of θ. Then one can construct a central estimator.
One has to be careful in selecting the functions {u_R : R ∈ Q} in order to achieve (L2). This will be explained by means of an example in Section 7.

On the construction of central estimators
In this section we address the construction of central estimators. We shall restrict our attention to the more general case when the assumptions (L0)-(L2) are met. Results for this case immediately yield results for the case (K0)-(K2); simply take u_R = u and û_n = u. All our methods require the availability of a preliminary √n-consistent estimator of θ. Thus throughout this section we shall always assume that the following condition is met.

(A) The conditions (L0)-(L2) hold and θ̃ is a √n-consistent estimator of θ.
We abbreviate W(Q) from (L0) by W and let U_n be the random vector defined in (2.1). It follows from Theorem 2.2 that the ULAN condition holds with J = A^⊤ W^{-1} A and Γ_n = A^⊤ W^{-1} U_n. We begin with a simple observation: a n^{1/2}-consistent (generalized) MELE is central.
Then θ is central.
Then, on the event A_n that n^{1/2}|θ̂ − θ| ≤ C, we derive from the ULAN condition and the identity (1.2) the corresponding expansions for B_{n1} and B_{n2}. On this event we also have B_{n1} ≥ B_{n2} − 1/n, and therefore 1[A_n] Δ̂ = o_P(1) by the positive definiteness of J. Since this holds for every C, we obtain the desired result in view of the n^{1/2}-consistency of θ̂.
The previous lemma was formulated for a generalized MELE, which, in contrast to a MELE, always exists. The practical value of this lemma is limited, as it does not provide a method of constructing a n^{1/2}-consistent generalized MELE and hence a central estimator. Explicit methods of constructing central estimators are discussed next.
Method 1: One-step maximum empirical likelihood estimation. One-step maximum likelihood estimators were introduced by Le Cam (1960 [6]), who showed that such estimators are asymptotically efficient in parametric LAN families. He actually used a discretized preliminary estimator in his construction. Discretization is not needed here, in view of the more stringent ULAN condition.
A one-step maximum empirical likelihood estimator is of the form θ* = θ̃ + n^{-1/2} Ĵ^{-1} Γ̂_n, where the matrix Ĵ is a consistent estimator of J, (3.1) and the random vector Γ̂_n obeys the expansion Γ̂_n = Γ_n − J n^{1/2}(θ̃ − θ) + o_P(1). (3.2) It is easy to see that such an estimator is central.
There are several ways to construct the quantities Ĵ and Γ̂_n. One such method is described in Fabian and Hannan (1982 [1]), who use first and second order differences of the log-likelihood. Here we follow a different approach which uses least squares.

Let us set r(t) = log R̂_n(θ̃ + n^{-1/2} t) − log R̂_n(θ̃), t ∈ R^k. Then r(t) equals L̂_n(Δ̂ + t) − L̂_n(Δ̂), and from the ULAN condition we obtain the identity sup_{|t| ≤ C} | r(t) − t^⊤(Γ_n − J Δ̂) + (1/2) t^⊤ J t | = o_P(1) for every finite C. This identity can be rewritten as a linear model in the unknown coefficients, where, for a k × k matrix M, diag(M) denotes the vector formed by the diagonal of M and φ(M) denotes the vector of the entries above the diagonal, ordered by row index and then by column index. Now let t_1, ..., t_L be vectors in R^k, set r = (r(t_1), ..., r(t_L))^⊤, and let D be the L × K matrix whose i-th row collects the coordinates of t_i, the halved squares of these coordinates, and their pairwise products.
Then we have the identity r = DV + e with coefficient vector V and error vector e. Assume now that the matrix D has full rank K, and let b̂ = (D^⊤D)^{-1}D^⊤ r denote the least squares estimator of V. Then the requirements (3.1) and (3.2) are met if we take Γ̂_n = (b̂_1, ..., b̂_k)^⊤ and Ĵ the symmetric matrix with diagonal −(b̂_{k+1}, ..., b̂_{2k}) and upper triangular part formed by the last K − 2k entries of −b̂ in an obvious way. We refer to the one-step MELE with these choices of Ĵ and Γ̂_n as the least squares one-step MELE, or LSMELE for short. Let us summarize our findings in the following theorem.
Theorem 3.1. Suppose condition (A) holds and the vectors t_1, ..., t_L are chosen such that the matrix D has full rank K. Then the LSMELE is central.
In the case k = 1, we can take t_1, ..., t_L to be distinct non-zero numbers; the LSMELE then takes on a simple explicit form. The above theorem remains true even if the vectors t_1, ..., t_L are replaced by random vectors which are bounded in probability and for which D^⊤D is positive definite. Moreover L, the number of vectors, can be made random as long as we keep it bounded by a fixed number L_0.
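For k = 1 the construction can be sketched in a few lines: evaluate the local log empirical likelihood at the design points, fit the quadratic r(t) ≈ Γt − (1/2)Jt² by least squares, and take one step. The code below is an illustration under assumed names (loglik stands for the map ϑ ↦ log R̂_n(ϑ)); it is not the authors' implementation.

```python
import numpy as np

def ls_one_step(loglik, theta_tilde, n, t_points):
    """Least squares one-step MELE for k = 1: fit
    r(t) = loglik(theta_tilde + t/sqrt(n)) - loglik(theta_tilde)
    by the quadratic Gamma*t - 0.5*J*t**2 and return
    theta_tilde + J^{-1} Gamma / sqrt(n)."""
    t = np.asarray(t_points, dtype=float)
    r = np.array([loglik(theta_tilde + ti / np.sqrt(n)) for ti in t])
    r -= loglik(theta_tilde)
    D = np.column_stack([t, -0.5 * t ** 2])  # design columns for Gamma and J
    (gamma_hat, j_hat), *_ = np.linalg.lstsq(D, r, rcond=None)
    return theta_tilde + gamma_hat / (j_hat * np.sqrt(n))
```

When the local log-likelihood is exactly quadratic the fit recovers Γ and J exactly; in practice the o_P(1) error in the ULAN expansion is absorbed into the least squares residual e.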
Method 2: Guided maximum empirical likelihood estimation using one-step estimators. Let θ* denote a one-step MELE such as the LSMELE. Although this estimator is central under condition (A), we might want to modify it slightly so that it resembles a MELE more closely. Roughly speaking, our second method works with an approximate maximizer of R̂_n(ϑ) in a ball of radius c n^{-1/2} centered at the one-step MELE. More precisely, we call an estimator θ̂ that approximately maximizes R̂_n over such a ball a guided one-step maximum empirical likelihood estimator (GOMELE). Thus we have the corresponding expansion, and from this and the invertibility of J we immediately conclude the desired result Δ̂ = o_P(1).
Method 3: Guided maximum empirical likelihood estimation using a n^{1/2}-consistent estimator. Guided (generalized) maximum empirical likelihood estimation can also be done using the n^{1/2}-consistent estimator θ̃ rather than the one-step estimator. This, however, requires a larger neighborhood and a stronger version of the ULAN condition.

Theorem 3.3. Let condition (A) hold and let C_n be a sequence of positive numbers tending to infinity such that the strengthened ULAN condition holds with C replaced by C_n. Then an estimator θ̂ that approximately maximizes R̂_n over a ball of radius C_n n^{-1/2} centered at θ̃ is central. Indeed, with h as in (1.2) and λ the smallest eigenvalue of J, one first obtains the n^{1/2}-consistency of θ̂, since c_n is arbitrary; the desired result then follows as in Lemma 3.1.
From a practical point of view it is preferable to work with a very slowly growing C_n, say C_n = (log n)^{1/2}. Sufficient conditions for the strengthened version of ULAN needed in the theorem can be given by strengthening (L0)-(L2); a general result will be given in Section 8. Here we mention only a special case.

Empirical likelihood ratio testing
In this section we shall discuss empirical likelihood ratio testing. For this we assume again the setting of the Introduction and require that (L0)-(L2) hold, so that we have the ULAN condition for the likelihood ratio. We do not separately discuss the case of the conditions (K0)-(K2), as this is just the special case with u_R = u for all R ∈ Q and û_n = u. We begin with a preliminary result. Let us set Ũ_n = W^{-1/2} U_n. In view of Theorems 2.2 and 2.3, a central estimator θ̂ satisfies the expansion −2 log R̂_n(θ̂) = Ũ_n^⊤ (I − Π_A) Ũ_n + o_P(1) with Π_A the idempotent matrix Π_A = W^{-1/2} A (A^⊤ W^{-1} A)^{-1} A^⊤ W^{-1/2}. We are interested in testing the null hypothesis H_0: θ ∈ Θ_0 for some subset Θ_0 of Θ. We assume that Θ_0 is the image {ψ(t) : t ∈ Δ} of some open subset Δ of R^l under some injective differentiable function ψ whose derivatives have full rank l < k. With Θ_0 we associate the submodel Q_0 = {R ∈ Q : κ(R) ∈ Θ_0} and the functional κ_0 from Q_0 onto Δ defined by κ_0(R) = ψ^{-1}(κ(R)), where ψ^{-1}: Θ_0 → Δ is the inverse map of ψ. Suppose from now on that θ belongs to Θ_0, so that the null hypothesis is true. Then there is a unique τ in Δ such that θ = ψ(τ), and the derivative B of ψ at τ has full rank l. Thus the analogues of the conditions (L0)-(L2) hold for the submodel Q_0 and the functional κ_0; the roles of R̂_n, θ, and A are now played by R̂_n ∘ ψ, τ = κ_0(Q), and AB. Thus Theorem 2.2 yields the corresponding expansions for every finite C with Γ_n = A^⊤ W^{-1} U_n and J = A^⊤ W^{-1} A. Hence a central estimator τ̂ of τ for the submodel satisfies n^{1/2}(τ̂ − τ) = (B^⊤A^⊤ W^{-1} AB)^{-1} B^⊤A^⊤ W^{-1} U_n + o_P(1), and the delta-method yields the expansion n^{1/2}(ψ(τ̂) − θ) = B n^{1/2}(τ̂ − τ) + o_P(1). Thus we find −2 log R̂_n(ψ(τ̂)) = Ũ_n^⊤ (I − Π_{AB}) Ũ_n + o_P(1) with Π_{AB} the idempotent matrix defined by Π_{AB} = W^{-1/2} AB (B^⊤A^⊤ W^{-1} AB)^{-1} B^⊤A^⊤ W^{-1/2}. Analogous to the classical likelihood ratio, the empirical likelihood ratio test rejects the null hypothesis for small values of the test statistic sup_{ϑ∈Θ_0} R̂_n(ϑ) / sup_{ϑ∈Θ} R̂_n(ϑ).
It will be more convenient to work instead with the test statistic T_n = R̂_n(ψ(τ̂)) / R̂_n(θ̂), where θ̂ is a central estimator in the full model and τ̂ is a central estimator in the submodel Q_0 with functional κ_0. In view of the previous results, we have the expansion −2 log T_n = Ũ_n^⊤ (Π_A − Π_{AB}) Ũ_n + o_P(1). Since Π_A − Π_{AB} is idempotent of rank k − l, the statistic −2 log T_n is asymptotically chi-square with k − l degrees of freedom, and the test 1[−2 log T_n ≥ χ²_{1−α}(k − l)] has asymptotic size α. The above shows that the empirical likelihood ratio test behaves like the usual parametric likelihood ratio test.

Inference about quantiles
Throughout this section X_1, ..., X_n are independent copies of a random variable X with distribution function F. We shall focus on inference problems related to quantiles, as these provide constraints that are not regular. We let F^{-1} denote the left inverse of F defined by F^{-1}(t) = inf{x : F(x) ≥ t}, 0 < t < 1. We say F is γ-regular if γ belongs to the interval (0, 1) and F has a positive derivative F'(F^{-1}(γ)) at F^{-1}(γ). In this case θ = F^{-1}(γ) is the unique γ-quantile, and the sample γ-quantile q̂_γ obeys the expansion n^{1/2}(q̂_γ − θ) = n^{-1/2} Σ_{j=1}^n (γ − 1[X_j ≤ θ]) / F'(θ) + o_P(1). In the following examples the verification of the conditions (K1) and (K2) will rely on the following well known result.

Lemma 5.1. Let C be a finite constant and q be a real number. Then the sums of the indicators 1[X_j ≤ q + n^{-1/2}t] admit a uniform approximation over |t| ≤ C if F is continuous at q, and a uniform linear expansion if F is differentiable at q.

Example 5.1. Let us assume that F is γ-regular. We want to estimate the γ-quantile θ = F^{-1}(γ) of F using the empirical likelihood approach. Since F is continuous at θ, θ satisfies E[1[X_1 ≤ θ]] = γ. This suggests looking at the empirical likelihood R_n(ϑ) = sup{ Π_{j=1}^n n p_j : (p_1, ..., p_n) ∈ P_n, Σ_{j=1}^n p_j (1[X_j ≤ ϑ] − γ) = 0 }. In view of Lemma 5.1 and the γ-regularity of F, the conditions (K0)-(K2) hold with u(z, ϑ) = 1[z ≤ ϑ] − γ, W = γ(1 − γ) and A = −F'(θ). Thus a central estimator obeys the expansion n^{1/2}(θ̂ − θ) = n^{-1/2} Σ_{j=1}^n (γ − 1[X_j ≤ θ]) / F'(θ) + o_P(1) and hence is asymptotically equivalent to the sample quantile.
Here we have an explicit formula for the empirical likelihood: with k = Σ_{j=1}^n 1[X_j ≤ q], the maximizing weights are γ/k for the observations at most q and (1 − γ)/(n − k) for the others, so that log R_n(q) = k log(nγ/k) + (n − k) log(n(1 − γ)/(n − k)) for X_(1) ≤ q < X_(n), q ∈ R. This follows as in Owen (2001 [13]), page 43, who considered a slightly modified version. From the formula we derive the identity R_n(X_(k)) = g_γ(k, n), where X_(1), ..., X_(n) are the order statistics. It is easy to check that the function x ↦ g_γ(x, n) is increasing on the interval (0, nγ] and decreasing on the interval [nγ, n). This shows that, almost surely, the function ϑ ↦ R_n(ϑ) is piecewise constant, non-decreasing on (−∞, X_(k_n+1)) and non-increasing on [X_(k_n+1), ∞), where k_n is the integer part of nγ. Thus a MELE is given by θ̂ = X_(k_n+1).

Remark 5.1. For γ = 1/2, we can also use the empirical likelihood based on the constraint function sign(z − ϑ). Then the conditions (K0)-(K2) hold with u(z, ϑ) = sign(z − ϑ), W = 1 and A = 2F'(θ). Thus a GOMELE obeys the corresponding expansion. It is easy to show that the sample median is a MELE.
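The piecewise-constant structure is easy to verify numerically from the closed form. The sketch below (our own function name) evaluates log R_n(q) via the maximizing weights γ/k and (1 − γ)/(n − k), where k is the number of observations at most q:

```python
import numpy as np

def log_elr_quantile(x, q, gamma):
    """log R_n(q) for the constraint P(X <= q) = gamma:
    k*log(n*gamma/k) + (n-k)*log(n*(1-gamma)/(n-k)), k = #{x_i <= q}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = int(np.sum(x <= q))
    if k == 0 or k == n:
        return -np.inf  # constraint infeasible outside the sample range
    return k * np.log(n * gamma / k) + (n - k) * np.log(n * (1 - gamma) / (n - k))
```

The function of q only changes value when q crosses an observation, is largest when k/n is closest to γ, and equals zero exactly when k = nγ.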
The above example is easily extended to cover the simultaneous estimation of several quantiles. Let us sketch this briefly.
Here (K0)-(K2) hold with A = −diag(F'(θ_1), ..., F'(θ_m)) and W the matrix with entries W_ij = γ_i − γ_i γ_j for 1 ≤ i ≤ j ≤ m. From this and Theorem 1.3 we find that the i-th component θ̂_i of a GOMELE θ̂ satisfies the corresponding Bahadur-type expansion. The above empirical likelihoods can be used to test composite hypotheses about quantiles. We explain this in a concrete example.
Example 5.4. We assume that X has a finite variance σ² and its distribution function F has a positive derivative F'(m_F) at its (unique) median m_F. We want to test whether the mean μ_F of F equals the median m_F. For this we look at the empirical likelihood R_n(q, r) with constraint functions corresponding to the median q and the mean r. The assumptions (K0)-(K2) hold with W and A determined by σ², F'(m_F) and ρ, where ρ is the covariance of ε = X − μ_F and sign(X − m_F). This follows from Lemma 5.1 and simple calculations. The map ψ can be taken to be ψ(t) = (t, t), which has derivative (1, 1)^⊤ of rank 1. It is easy to see that R_n(q, r) is maximized by (q̂, r̂), where q̂ is the sample median and r̂ is the sample mean, and that R_n(q̂, r̂) = 1. The empirical likelihood ratio statistic T_n simplifies to T_n = R_n(τ̂, τ̂), where τ̂ is a GOMELE, under the null hypothesis, of the common value τ of μ_F and m_F, and −2 log T_n has a limiting chi-square distribution with one degree of freedom. From this we conclude that the test 1[−2 log T_n ≥ χ²_{1−α}(1)] has asymptotic size α.
In the next examples we address estimation of a quantile under additional assumptions on the underlying distribution function F.

Example 5.5. Suppose F is γ-regular and has zero mean and finite variance σ². We estimate θ = F^{-1}(γ) using the empirical likelihood with the additional mean constraint, R_n(ϑ) = sup{ Π_{j=1}^n n p_j : (p_1, ..., p_n) ∈ P_n, Σ p_j X_j = 0, Σ p_j (1[X_j ≤ ϑ] − γ) = 0 }. It is easy to check that (K0)-(K2) hold in this case with u(z, ϑ) = (z, 1[z ≤ ϑ] − γ)^⊤, where ρ is the covariance between X and 1[X ≤ θ], and a central estimator has asymptotic variance (γ(1 − γ) − ρ²/σ²)/(F'(θ))².

Example 5.6. Suppose F is γ-regular for some γ ≠ 1/2 and has known median 0. To estimate θ = F^{-1}(γ), we rely on the empirical likelihood with the additional sign constraint, R_n(ϑ) = sup{ Π_{j=1}^n n p_j : (p_1, ..., p_n) ∈ P_n, Σ p_j sign(X_j) = 0, Σ p_j (1[X_j ≤ ϑ] − γ) = 0 }. It is easy to check that (K0)-(K2) hold in this case with u(z, ϑ) = (sign(z), 1[z ≤ ϑ] − γ)^⊤, where ρ is the covariance between sign(X) and 1[X ≤ θ], and a central estimator has asymptotic variance (γ(1 − γ) − ρ²)/(F'(θ))².

Simulations
To study the performance of one-step and guided maximum empirical likelihood estimation and of empirical likelihood ratio tests in small to moderate sample sizes we carried out a small simulation study. This was done with the aid of the R package [17]. We used the function elm provided by Art Owen to calculate the log-empirical likelihood.

Simulations for Example 5.5
We first looked at estimating the median θ when the distribution has known mean zero, utilizing the empirical likelihood of Example 5.5 with γ = 1/2. The theory for this was treated in Example 5.5, where we considered the more general problem of estimating a quantile when the mean is known. We chose this problem as the criterion function is irregular. Table 1 reports n times the simulated mean square errors of five estimators of θ, the sample median (SMED), LSMELE, GOMELE (guided by LSMELE), GMELE (guided by the sample median), and MELE, for four different distributions and four sample sizes (n = 30, 60, 90, 120). The distributions chosen were (1) the standard normal distribution, (2) the logistic distribution, (3) the double exponential distribution, and (4) a shifted exponential distribution. The respective asymptotic variances of the sample median for these four distributions are π/2 ≈ 1.5708, 4, 1, and 1, while those of a central estimator are π/2 − 1 ≈ 0.5708, 4 − 48(log 2)²/π² ≈ 1.6634, 0.5, and 1 − (log 2)² ≈ 0.5195. These numbers show that much can be gained from the knowledge that the mean is zero.
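The benchmark figures above are straightforward to reproduce by Monte Carlo. The following script (our own, independent of the authors' R code) checks the sample-median entry for the standard normal, where n times the mean square error should be close to the asymptotic variance π/2 ≈ 1.5708:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 60, 2000
# simulated MSE of the sample median of n standard normal observations
mse = np.mean([np.median(rng.standard_normal(n)) ** 2 for _ in range(reps)])
print(n * mse)  # should be in the vicinity of pi/2
```

The same recipe, with the constrained estimators substituted for the sample median, reproduces the remaining entries of Table 1.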
The present empirical likelihood is piecewise constant between neighboring order statistics and thus only needs to be calculated at the midpoints between neighboring order statistics. We used this observation to calculate LSMELE, GOMELE, GMELE and MELE. The GOMELE was chosen as a maximizer over a neighborhood of the LSMELE of radius 2.5 σ̂ n^{-1/2}, and the GMELE over a neighborhood of the sample median of radius σ̂ log(n) n^{-1/2}, with σ̂ = n / (2 Σ_{j=1}^n 1[|X_j − θ̃| < 0.5]) an estimator of the asymptotic standard deviation of the sample median. Since the maximizers are not unique, we used the midpoint of the maximizing interval. For the LSMELE, we first computed θ̂_i = θ̃ + n^{-1/2} σ̂ (−2.55 + i/10), i = 0, ..., 51, then associated with θ̂_i the average θ*_i of the largest observation less than or equal to θ̂_i and the smallest observation larger than θ̂_i, and computed the log-empirical likelihood at these points. From Table 1 we see that the LSMELE performs best in all cases considered and that GOMELE, GMELE and MELE perform about the same. The performance of the LSMELE is better than suggested by the asymptotic theory for three of the four densities.

Simulations for Example 5.6
We also looked at an example with a smooth constraint function, namely estimating the mean when the median is known to be zero, using the empirical likelihood based on the constraint function (sign(z), z − ϑ)^⊤. A √n-consistent estimator is given by the sample mean X̄. Table 2 reports again n times the simulated mean square errors of the sample mean (SM), the GMELE (guided by the sample mean), and versions of the LSMELE associated with different design points and the corresponding GOMELE, for the first three densities, which are symmetric about zero and thus have median zero. The respective asymptotic variances of the sample mean for these three distributions are 1, π²/3 ≈ 3.290 and 2, while those of a central estimator are 1 − 2/π ≈ 0.3634, 1.3681 and 1. These numbers show that much can be gained from the knowledge that the median is zero. Each entry in Table 2 is the sample size times the simulated mean square error for the corresponding estimator, sample size and error density; the results are based on 4000 repetitions.
We ran the simulations for the sample sizes n = 50 and n = 100 and used 4000 iterations in each case. The GMELE was found via a grid search using the grid { X̄ + i σ̂ C_n / (100 √n) : i = −100, ..., 100 } with σ̂ the sample standard deviation and C_n = 4 + √(log n). For a = 1, 1.5, 2, we used the design points { t_i = i σ̂ C_n / 100, i = −a_n, ..., a_n }, where a_n is the integer closest to 100a/C_n, to compute the LSMELE L(a), and used a grid search with the above grid points within 2σ̂/√n units of L(a) to find the GOMELE G(a). From Table 2 we see that the performance of the GOMELEs and the GMELE are the same, and that the performance of the LSMELEs is slightly worse and seems to be better for smaller a.

Simulations for Example 5.3
Here we report simulation results for the empirical likelihood ratio test described in Example 5.3, which addresses testing the null hypothesis F^{-1}(1/4) + F^{-1}(3/4) − 2F^{-1}(1/2) = 0. The data were generated from F = F_0 for computing the significance level and from the mixture distribution F = 0.65 F_0 + 0.35 G_β for computing the power. The distribution F_0 was taken to be symmetric about zero so that the null hypothesis was met by F = F_0. The distribution G_β was taken from a parametric family with parameter β. The parameter β was selected so that the difference δ = F^{-1}(1/4) + F^{-1}(3/4) − 2F^{-1}(1/2) took the values 0.4, 0.6 and 0.8. We worked with three choices for F_0, the standard normal distribution N(0, 1), the Cauchy distribution Cau(0), and the Laplace distribution Lap(0). We picked six choices for G_β, namely, the Cauchy distribution Cau(β) with location parameter β, the exponential distribution Exp(β) with rate parameter β, the Laplace distribution Lap(β) with location parameter β, the logistic distribution Logis(β) with location parameter β, the normal distribution N(β, 1) with mean β and variance 1, and the uniform distribution Unif(β, 2β).
The R function constrOptim was used to compute the two-dimensional GMELE of the parameter τ = (τ_1, τ_2). Here the arguments ui and ci (the constraint matrix and constraint vector) of the R function constrOptim were set to ensure that the search region is 0 < τ_1 ≤ τ̃_1 + 3 log(n)/n and τ̃_2 − 3 log(n)/n ≤ τ_2 ≤ τ̃_2 + 3 log(n)/n, where τ̃_1 was chosen to be the difference of the third and first sample quartile, and τ̃_2 was taken to be the sample median. Table 3 reports the simulated significance levels and powers of the empirical likelihood test given in Example 5.3 at the nominal level 0.05. The results are based on 2000 repetitions and the sample sizes n = 120, 140 and 160; data were generated from F = F_0 and F = 0.65 F_0 + 0.35 G_β.

Simulations for Example 5.4
Here we report simulation results for the empirical likelihood ratio test described in Example 5.4 to test for the equality of mean and median. The data were generated from F_0 for computing the significance level and from the mixture distribution F = 0.95 F_0 + 0.05 G_β for computing the power. Here again F_0 is a symmetric distribution and G_β comes from a parametric model. We selected F_0 to be the standard normal distribution N(0, 1), the standard logistic distribution Logis(0, 1), the t-distribution t(4) with 4 degrees of freedom, and the slash t-distribution SLt(4, 5) with 4 degrees of freedom and tail index 5 (stochastically, SLt(4, 5) = t(4)/Unif(0, 1)^{1/5}). We used two choices for G_β, the exponential distribution Exp(β) with rate β and the gamma distribution Gam(β, 5) with rate β and shape equal to 5. The parameter β was selected so that the difference δ = μ_F − m_F took the values 0.4, 0.6 and 0.8.
The R function optimize was used to find the one-dimensional GMELE, with the argument interval having endpoints τ̃ ± σ̂ log(n), where τ̃ is the average of the sample mean and sample median, and σ̂ is the jackknife estimator of the standard error of τ̃.
Table 4 reports the simulated significance level and power of the empirical likelihood test from Example 5.4. Also listed in the table are the values of β which correspond to the values of δ. The results are based on 2000 repetitions and the sample sizes n = 40, 80 and 120.

Residual-based inference about a quantile
Let Z_1, ..., Z_n be independent copies of the random vector Z = (X^⊤, Y)^⊤ which forms the linear regression model Y = β^⊤X + ε, where ε and X are independent, X has a positive definite dispersion matrix, and ε has mean zero, a finite variance σ², and a uniformly continuous density f with {f > 0} an interval. We are interested in estimating the γ-quantile θ of ε for some 0 < γ < 1. If the error variables ε_1, ..., ε_n were observable, we could work with the empirical likelihood from Example 5.5, which takes into account the fact that the errors are centered. A naive approach would now be to replace the unobservable error variables by the residuals ε̂_1, ..., ε̂_n based on the least squares approach. While this choice yields the desired (L1), it does not produce (L2). This follows from the fact that the sum of the residuals is zero. We should also point out additional properties (7.3) of the residuals. To find an appropriate choice of u_Q, we start with the fact that the least squares residuals satisfy the uniform stochastic expansion (7.1) for the residual-based empirical distribution function. This can be derived from results of Koul (1969 [4]) for the fixed design case and Müller, Schick and Wefelmeyer (2007 [8]) for the random design case used here; see also Remark 2 in Müller, Schick and Wefelmeyer (2009 [9]). The expansion (7.1) and the fact that the residuals sum to zero determine the required correction. The needed uniform consistency follows from uniform consistency of the error-based kernel estimator f̃ (defined as f̂ but with ε_j in place of ε̂_j) and the inequality |f̂(t) − f̃(t)| ≤ L (n b²)^{-1} Σ_{j=1}^n |ε̂_j − ε_j|, with L the Lipschitz constant for K.
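The failure of the naive residual plug-in is concrete: with an intercept in the design, the normal equations X^⊤(Y − Xβ̂) = 0 force the least squares residuals to sum to zero, so the mean constraint Σ p_j ε̂_j = 0 is satisfied by the uniform weights in every sample and carries no information. A quick numerical check with made-up data (a sketch, not the authors' simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + slope
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
# normal equations: X^T resid = 0, so in particular sum(resid) = 0
print(resid.sum())
```

The printed sum is zero up to floating-point rounding for every realization, which is exactly why the constraint function must be corrected as described above.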
for several choices of c. Here θ̂ denotes the RBSM and σ̂ denotes the jackknife estimate of the standard error of θ̂. Reported in Table 5 are the simulated mean squared errors, multiplied by the sample size n, of the estimators EBSM, RBSM and G(c) with c = 0.1, 0.2, 0.3, 0.4, 0.5. We looked at odd sample sizes since for an even sample size n = 2m the RBSM (ε̂_(m) + ε̂_(m+1))/2 is also a MELE. The table shows that the RBSM has the smallest simulated mean squared error in all cases considered and that the mean squared errors of the estimators G(c) increase with c.
Based on this we recommend the use of the RBSM.
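For concreteness, the RBSM is the γ = 1/2 sample quantile of the residuals; for even n = 2m it is the average of the two middle residual order statistics, as noted above. A small sketch (the residual values are hypothetical):

```python
def rbsm(residuals):
    """Residual-based sample median: middle order statistic for odd n,
    average of the two middle order statistics for even n = 2m."""
    s = sorted(residuals)
    n = len(s)
    m = n // 2
    return s[m] if n % 2 == 1 else 0.5 * (s[m - 1] + s[m])

print(rbsm([0.4, -1.1, 0.2, 0.9, -0.3]))        # odd n: middle order statistic
print(rbsm([0.4, -1.1, 0.2, 0.9, -0.3, 0.6]))   # even n: average of middle two
```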
The above results for linear regression carry over to nonparametric regression. Let us explain this in the simplest case, when Z = (X, Y) and Y = r(X) + ε, with r a twice continuously differentiable function, X and ε independent, ε having mean zero and finite variance, and X quasi-uniform on the unit interval [0, 1]. The latter means that X has a density g that is bounded and bounded away from zero on its support [0, 1]. Under the additional assumptions that the density f is Hölder of order 1/3 and has a finite moment of order greater than 8/3, Müller et al (2007 [8]) have shown that there are estimators r̂_n of r such that (7.1) also holds for the nonparametric residuals ε̂_j = Y_j − r̂_n(X_j). These nonparametric residuals satisfy, for some ρ > 1/2, the stated bound. It is now easy to check that the kernel density estimator f̂ based on these nonparametric residuals is uniformly consistent for f if also n^ρ b^4 → ∞. One verifies (L1) and (L2) with A = −f(θ) and again obtains the expansion (7.2) for the corresponding GMELE. Extensions to nonparametric regression models with multivariate covariates are possible using the results of Müller et al (2009 [9]). The results in Müller et al (2007 [8], 2012 [10]) can be used to obtain extensions to the partly linear regression model and to the additive nonparametric regression model, while those of Koul, Müller and Schick (2017 [5]) can be used for extensions to single-index models. In all these models one can construct residuals so that (7.1) and (7.3) hold and then obtains the expansion (7.2). We should mention that we do not get the second part of (7.3) in all regression models. This is already so in linear regression without an intercept.

Remark 7.1.
A key point of this section is that one has to be careful in selecting the constraint function in order to achieve (L1) and (L2). We are not the first to observe this. Zhu and Xue (2006 [20]) have pointed this out in the context of a single-index model. Here we look at a more general single-index random-coefficient model, where V is a random variable, U is a k-dimensional random vector, X is a q-dimensional random vector, the error variable is independent of the covariates (U, V, X) and has mean zero and finite positive variance, β is an unknown smooth function from R into R^q, and the k-dimensional vector θ is the parameter of interest.
If q = 1 and X = 1, then this model reduces to the single-index model. For this model, Xue and Zhu (2006 [19]) used the constraint function with I_t = V + t^⊤U and were unable to verify (L1) and (L2). Zhu and Xue (2006 [20]) considered instead the constraint function given below and were able to verify (L1) and (L2).
In the general case, the constraint function used by Xue and Wang (2012 [18]) is not suitable for obtaining (L1) and (L2). Instead, one should work with the constraint function in which the matrix above appears. To this end we shall use the following result, which is a special case of Lemma 5.2 of Peng and Schick (2013 [15]).
We impose the following conditions. We have the following result. The first conclusion in the theorem follows from (8.6), (8.5) and (B3). The second conclusion is a simple consequence of the first one.
From the above we immediately derive the following result, which gives sufficient conditions for (3.3). The assumptions used in this result imply (L0)-(L2). We use the notation of Section 2. We need the following stronger version of (L2).

(K1) For every finite constant C, one has
D_n(C) = sup_{|t| ≤ C} (1/n) ∑_{j=1}^n |u(Z_j, θ + n^{−1/2} t) − u(Z_j, θ)|² = o_P(1).

(K2) There is an m × k matrix A of full rank k such that the corresponding expansion holds.

Moreover, sup_{|t| ≤ C} n^{−1/2} |∑_{j=1}^n ε_j| converges to zero in probability for any estimator f̂ of f. This suggests working with the empirical likelihood

R̃_n(ϑ) = sup{ ∏_{j=1}^n n π_j : π ∈ P_n, ∑_{j=1}^n π_j ( 1[ε̂_j ≤ ϑ] − γ + f̂(ϑ) ε̂_j ) = 0 },

where f̂ is a residual-based kernel density estimator of f,

f̂(y) = (1/(nb)) ∑_{j=1}^n K((y − ε̂_j)/b), y ∈ R,

with K a symmetric Lipschitz-continuous density and b a bandwidth satisfying nb^4 → ∞ and b → 0. Then the residual-based density estimator f̂ is uniformly consistent.
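As a minimal numerical sketch of such a residual-based kernel density estimator (the Gaussian kernel is used as one admissible symmetric Lipschitz-continuous choice, and b = n^{−1/5} so that b → 0 and nb^4 = n^{1/5} → ∞; the "residuals" are simulated stand-ins, not the paper's data):

```python
import math
import random

def kernel_density(residuals, y, b):
    """f_hat(y) = (1/(n*b)) * sum_j K((y - e_j)/b), Gaussian kernel K."""
    n = len(residuals)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((y - e) / b) for e in residuals) / (n * b)

random.seed(1)
resid = [random.gauss(0.0, 1.0) for _ in range(500)]
b = len(resid) ** (-1 / 5)   # satisfies b -> 0 and n*b^4 -> infinity

# Sanity check: the estimate integrates to about 1 (Riemann sum on a wide grid).
grid = [i * 0.05 for i in range(-200, 201)]
total = sum(kernel_density(resid, y, b) * 0.05 for y in grid)
```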

Table 1
Estimating Median When Mean Is Zero. Each entry is the sample size times the simulated mean squared error for the corresponding estimator, sample size and error density. The results are based on 4000 repetitions.

Table 2
Estimating Mean When Median Is Zero

Table 4
Simulated significance level and power of the EL test of H_0: δ = μ_F − med_F = 0 at the nominal level 0.05 for several sample sizes n. Data were generated from F_0 and from the contaminated symmetric distribution F = 0.95 F_0 + 0.05 F_1.
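For illustration, data from such a contamination mixture can be generated as below; F_0 and F_1 are not specified in this excerpt, so a standard normal F_0 and a heavier-tailed normal F_1 (keeping F symmetric) are assumed purely as placeholders:

```python
import random

def contaminated_sample(n, eps=0.05, seed=7):
    """Draw n observations from F = (1 - eps) F0 + eps F1.
    F0 and F1 here are placeholder choices (N(0,1) and N(0,9)),
    not the distributions used in the paper."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 3.0) if rng.random() < eps else rng.gauss(0.0, 1.0)
            for _ in range(n)]

sample = contaminated_sample(80)
```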

A general result
Let T_{n1}(t), . . ., T_{nn}(t) be m-dimensional random vectors indexed by t ∈ R^k, where k ≤ m. Let C_n be a sequence of positive numbers such that inf_n C_n > 0 and C_n = o(n^{1/2}). We are interested in the asymptotic behavior of the empirical likelihood process

R_n(t) = sup{ ∏_{j=1}^n n π_j : π ∈ P_n, ∑_{j=1}^n π_j T_{nj}(t) = 0 }, |t| ≤ C_n.

(In the single-index random-coefficient model of the previous section, the constraint function involves the projection of XU onto the space of functions of the form A(I_t)X with A a function into R^{k×q} satisfying E_R[|A(I_t)|²] < ∞; there one needs to assume that the matrix E_R(XX^⊤ | I_t) is invertible.)
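For fixed t this supremum can be computed via the standard Lagrange-multiplier dual. As an illustrative sketch for a scalar constraint (m = 1; this is not the paper's algorithm), one solves ∑_j T_j/(1 + λT_j) = 0 for λ by bisection and sets π_j = 1/(n(1 + λT_j)):

```python
import math

def neg2_log_EL(T):
    """-2*log sup{ prod(n*pi_j) : pi in the simplex, sum pi_j*T_j = 0 }
    for scalar constraint values T_j, via the dual pi_j = 1/(n*(1+lam*T_j))."""
    tmin, tmax = min(T), max(T)
    if not (tmin < 0.0 < tmax):
        return float("inf")        # 0 not in the convex hull: likelihood is 0
    # g(lam) = sum T_j/(1+lam*T_j) is strictly decreasing on (-1/tmax, -1/tmin),
    # running from +inf to -inf, so bisection finds its unique root.
    lo, hi = -1.0 / tmax + 1e-10, -1.0 / tmin - 1e-10
    g = lambda lam: sum(t / (1.0 + lam * t) for t in T)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # -2 log R_n = -2 sum log(n*pi_j) = 2 sum log(1 + lam*T_j)
    return 2.0 * sum(math.log(1.0 + lam * t) for t in T)

value = neg2_log_EL([-0.8, 0.3, 1.1, -0.2, 0.5])   # positive: T_j don't average to zero
```

By the theorem of Owen quoted in the introduction, such values of −2 log R_n are asymptotically chi-square, which is what makes them usable as test statistics.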