The empty set and zero likelihood problems in maximum empirical likelihood estimation

We describe a previously unnoted problem which, if it occurs, causes the empirical likelihood method to break down. It is related to the empty set problem, recently described in detail by Grendár and Judge (2009), which is the problem that the empirical likelihood model is empty, so that maximum empirical likelihood estimates do not exist. An example is the model that the mean is zero, while all observations are positive. A related problem, which appears to have gone unnoted so far, is what we call the zero likelihood problem. This occurs when the empirical likelihood model is nonempty but all its elements have zero empirical likelihood. Hence, also in this case inference regarding the model under investigation breaks down. An example is the model that the covariance is zero, and the sample consists of monotonically associated observations. In this paper, we define the problem generally and give examples. Although the problem can occur in many situations, we found it to be especially prevalent in marginal modeling of categorical data, when the problem often occurs with probability close to one for large, sparse contingency tables.


Description of the empirical likelihood method
With $\mathcal{P}$ a set of probability distributions, a subset of $\mathcal{P}$ is called a model in $\mathcal{P}$. A useful and general way to define a model is as
$$\Phi(\theta, \mathcal{P}) = \{P \in \mathcal{P} \mid \theta(P) = 0\},$$
where $\theta$ is an appropriate map from $\mathcal{P}$ into a Hilbert space. For a topological space $\Omega$, let $\mathcal{P}(\Omega)$ be the set of Borel probability measures on $\Omega$, called the saturated model. With $\Phi_\Omega = \Phi(\theta, \mathcal{P}(\Omega))$, we now consider maximum empirical likelihood (MEL) estimation of a distribution $P \in \Phi_\Omega$.
Consider a sample of iid observations $X_1, \ldots, X_n$ ($X_i \in \Omega$) with unknown common distribution $P \in \mathcal{P}(\Omega)$. The empirical likelihood sample space is the finite set $\mathcal{X} = \{X_1, \ldots, X_n\} \subseteq \Omega$, and the empirical likelihood of $Q \in \mathcal{P}(\mathcal{X})$ is
$$L(Q) = \prod_{i=1}^{n} Q(X_i).$$
Note that, for all $Q \in \mathcal{P}(\mathcal{X})$,
$$\sum_{x \in \mathcal{X}} Q(x) = 1.$$
The unrestricted MEL estimator of $P \in \mathcal{P}(\Omega)$ is defined as
$$\hat{P}_1 = \arg\max_{Q \in \mathcal{P}(\mathcal{X})} L(Q).$$
Via the inequality of arithmetic and geometric means, it is straightforward to show that $\hat{P}_1$ is the empirical distribution, i.e.,
$$\hat{P}_1(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i = x),$$
where $I$ is the indicator function (in the continuous case, $\hat{P}_1(X_i) = 1/n$ for all $i$ with probability 1).
The empirical likelihood model $\Phi_{\mathcal{X}} \subseteq \Phi_\Omega$ is defined as
$$\Phi_{\mathcal{X}} = \Phi(\theta, \mathcal{P}(\mathcal{X})).$$
Thus, $\Phi_{\mathcal{X}}$ consists of those probability distributions in $\Phi_\Omega$ whose support is contained in $\mathcal{X}$. The MEL estimator of $P \in \mathcal{P}(\Omega)$ in $\Phi_\Omega$ is
$$\hat{P}_0 = \arg\max_{Q \in \Phi_{\mathcal{X}}} L(Q).$$
A particularly important application of MEL estimation is the testing of the null hypothesis that $P \in \Phi_\Omega$ against the alternative hypothesis that $P \in \mathcal{P}(\Omega) \setminus \Phi_\Omega$. For this purpose, it is common to use the log likelihood ratio test statistic
$$T = -2 \log \frac{L(\hat{P}_0)}{L(\hat{P}_1)}.$$
In this context, it is assumed that $\theta : \mathcal{P}(\Omega) \to \mathbb{R}^p$ for some $p \ge 1$; then if $P \in \Phi_\Omega$, and $\theta$ satisfies certain smoothness conditions, $T$ has an asymptotic chi-square distribution with $p$ degrees of freedom (Qin and Lawless, 1994; Owen, 2001). (NB: these authors gave conditions on estimating equations which imply $\theta(P) = 0$; so those are actually implicit conditions on $\theta$.) So-called inversion of the likelihood ratio test can be used to construct confidence intervals for $\theta(P)$.
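The definitions of the empirical likelihood and of the unrestricted MEL estimator can be illustrated with a minimal Python sketch. The function names `empirical_distribution` and `empirical_likelihood` are hypothetical and not part of the paper; exact rational arithmetic is used only so that likelihood values can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(sample):
    """Return the empirical distribution: each distinct point gets mass count/n."""
    n = len(sample)
    return {x: Fraction(count, n) for x, count in Counter(sample).items()}


def empirical_likelihood(q, sample):
    """Return L(Q), the product over all observations X_i of Q(X_i).

    q maps sample points to probabilities; points missing from q have mass 0.
    """
    result = Fraction(1)
    for x in sample:
        result *= q.get(x, 0)
    return result


sample = [0, 1, 1, 2]
p_hat = empirical_distribution(sample)
uniform = {x: Fraction(1, 3) for x in set(sample)}
point_mass = {0: Fraction(1)}

# The empirical distribution has a larger likelihood than the uniform
# distribution on the distinct points, and a point mass has likelihood zero.
print(empirical_likelihood(p_hat, sample))       # 1/64
print(empirical_likelihood(uniform, sample))     # 1/81
print(empirical_likelihood(point_mass, sample))  # 0
```

Any distribution that puts zero mass on an observed point has empirical likelihood zero, which is the mechanism behind the zero likelihood problem.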

The empty set and zero likelihood problems
For finite samples, there may be problems which cause the MEL method to break down. Grendár and Judge (2009) considered in detail the problem that $\Phi_{\mathcal{X}} = \emptyset$, which they called the empty set problem. Another problem which can occur, and which we have not seen explicitly described before, is that $\Phi_{\mathcal{X}}$ is nonempty but $L(P) = 0$ for all $P \in \Phi_{\mathcal{X}}$. We call this the zero likelihood problem. Note that every $P \in \Phi_{\mathcal{X}}$ then trivially maximizes the empirical likelihood. Evidently, the zero likelihood problem occurs if and only if $\Phi_{\mathcal{X}}$ is nonempty and, for all $P \in \Phi_{\mathcal{X}}$, there exists an index $i_P$ with $1 \le i_P \le n$ such that $P(X_{i_P}) = 0$. The next example illustrates the two problems and how they compare.
Example 1. Suppose $\Omega = \mathbb{R}$ and $\theta$ is the mean, i.e., $\Phi_\Omega$ is the model that the mean is zero. The empty set problem occurs if and only if $X_i > 0$ for all $i$ or $X_i < 0$ for all $i$, i.e., 0 lies outside the convex hull of the data points (Qin and Lawless, 1994). Alternatively, the zero likelihood problem occurs if and only if $X_i \ge 0$ for all $i$, with $X_i = 0$ for at least one $i$ and $X_i > 0$ for at least one $i$, or the same with reverse inequalities: then $\Phi_{\mathcal{X}}$ consists of the degenerate probability distribution with all mass at 0. If the population mean is zero, the zero likelihood problem can occur if and only if $P(X = 0) > 0$, $P(X > 0) > 0$ and $P(X < 0) > 0$. To take a specific example, suppose we have three observations $X_1 = 0$, $X_2 = 1$, and $X_3 = 2$. Writing $p_i = P(X_i)$, the empirical likelihood is
$$L(P) = p_1 p_2 p_3,$$
to be maximized subject to $p_1 + p_2 + p_3 = 1$, $p_i \ge 0$, and the mean constraint $p_2 + 2 p_3 = 0$. The only solution is $p_1 = 1$ and $p_2 = p_3 = 0$. Hence, the empirical likelihood equals zero. A numerical example is $P(X = 0) = 0.5$, $P(X = 1) = 0.2$, $P(X = 2) = 0.2$, $P(X = -6) = 0.1$.
Then for sample size three, it can be verified that the probability of the zero likelihood problem occurring is 0.63 (i.e., the probability that exactly one or exactly two observations are zero and the remaining observations all have the same sign).
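The probability stated for the numerical example can be checked by exact enumeration of all samples of size three. The following is a minimal sketch for the mean-zero model of Example 1 only; the function names and the label "regular" (for samples where neither problem occurs) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def classify_mean_zero(sample):
    """Classify a real-valued sample for the model 'the mean is zero'."""
    lo, hi = min(sample), max(sample)
    if lo > 0 or hi < 0:
        return "empty set"        # 0 lies outside the convex hull
    if lo < 0 < hi:
        return "regular"          # 0 is interior: positive weights exist
    if lo == hi:
        return "regular"          # all observations equal 0
    return "zero likelihood"      # 0 is an endpoint of a nondegenerate hull


def problem_probabilities(population, n):
    """Exact probability of each outcome for iid samples of size n.

    population maps each support point to its probability (a Fraction).
    """
    totals = {"empty set": Fraction(0),
              "zero likelihood": Fraction(0),
              "regular": Fraction(0)}
    for sample in product(population, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= population[x]
        totals[classify_mean_zero(sample)] += prob
    return totals


population = {0: Fraction(1, 2), 1: Fraction(1, 5),
              2: Fraction(1, 5), -6: Fraction(1, 10)}
probs = problem_probabilities(population, 3)
print(probs["zero likelihood"])  # 63/100
```

The same enumeration gives the probability of the empty set problem for this population, which is the probability that all three observations have the same strict sign.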
We next give three examples with $\Omega \subseteq \mathbb{R}^2$, denoting the sample points as $(X_1, Y_1), \ldots, (X_n, Y_n)$, where each $(X_i, Y_i)$ is an independent replication of $(X, Y)$.
Example 2. Let $\Omega = \mathbb{R}^2$ and let $\theta$ be the covariance, i.e., $\Phi_\Omega$ is the model that the covariance is zero. Since the covariance of a degenerate distribution (a point mass) is zero, the empty set problem cannot occur. On the other hand, the zero likelihood problem occurs if $n \ge 2$ and the observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ are monotonically associated, in the sense that there is a permutation $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$ such that
$$X_{i_1} < X_{i_2} < \cdots < X_{i_n} \quad \text{and} \quad Y_{i_1} < Y_{i_2} < \cdots < Y_{i_n},$$
or the same with the inequalities for the $Y_i$ reversed. In that case, $\Phi_{\mathcal{X}}$ consists of the degenerate probability distributions which have all probability mass at a single point $(X_i, Y_i)$, so $L(P) = 0$ for all $P \in \Phi_{\mathcal{X}}$.
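The reason that only point masses remain in the model of Example 2 can be made explicit with a standard identity for the covariance of a discrete distribution. The notation $p_i$ for the mass that $P$ puts on the $i$th sample point $(x_i, y_i)$ is introduced here for illustration, and the sample points are assumed to be distinct.

```latex
\operatorname{cov}_P(X, Y)
  = \sum_{i} p_i x_i y_i - \Big(\sum_{i} p_i x_i\Big)\Big(\sum_{j} p_j y_j\Big)
  = \frac{1}{2} \sum_{i} \sum_{j} p_i p_j \,(x_i - x_j)(y_i - y_j).
```

If the observations are strictly increasingly associated, then $(x_i - x_j)(y_i - y_j) > 0$ for all $i \ne j$, so every term in the double sum is nonnegative. The covariance can then be zero only if $p_i p_j = 0$ for all $i \ne j$, that is, only if $P$ is a point mass. The decreasing case is the same with all signs reversed.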
Example 3. Let $\Omega = \{0, 1\} \times \mathbb{R}$ and let $\theta$ be the difference in conditional variances, conditioning on $X$, i.e.,
$$\theta(P) = \operatorname{var}_P(Y \mid X = 0) - \operatorname{var}_P(Y \mid X = 1).$$
Hence, $\Phi_\Omega$ is the model of equal conditional variances. The empty set problem occurs if there is no $i$ such that $X_i = 0$ or if there is no $i$ such that $X_i = 1$.
The zero likelihood problem occurs if there is exactly one $(X_i, Y_i)$ with $X_i = 0$, and there are $(X_i, Y_i) \ne (X_j, Y_j)$ with $X_i = X_j = 1$, or vice versa with the roles of 0 and 1 interchanged. In the first case, in the empirical likelihood model $\operatorname{var}(Y \mid X = 0) = 0$, and hence also $\operatorname{var}(Y \mid X = 1) = 0$, which forces zero probability on at least one of the observed points with $X = 1$. The second case is analogous. Thus, in either case, $\Phi_{\mathcal{X}}$ consists of those probability distributions whose probability mass is nondegenerately distributed on two points, one with $X = 0$ and one with $X = 1$.
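The two conditions of Example 3 can be checked mechanically for a given sample. The sketch below is illustrative only: the function name and the return labels are hypothetical, and it counts distinct $Y$ values within each group, which treats repeated identical observations as a single point (an assumption that goes slightly beyond the wording of the example).

```python
def classify_equal_conditional_variances(sample):
    """Classify a sample of (x, y) pairs, x in {0, 1}, for the model
    var(Y | X = 0) = var(Y | X = 1), using the conditions of Example 3."""
    ys0 = {y for x, y in sample if x == 0}
    ys1 = {y for x, y in sample if x == 1}
    if not ys0 or not ys1:
        # A conditional variance is undefined when one group is unobserved.
        return "empty set"
    if (len(ys0) == 1) != (len(ys1) == 1):
        # One group forces variance zero, the other has at least two
        # distinct points, one of which must receive zero mass.
        return "zero likelihood"
    return "neither"


print(classify_equal_conditional_variances([(0, 1.0), (1, 2.0), (1, 3.0)]))
# zero likelihood
print(classify_equal_conditional_variances([(1, 2.0), (1, 3.0)]))
# empty set
```

The label "neither" means only that neither of the two conditions stated in the example holds for the sample.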

Remarks
We first encountered the zero likelihood problem when trying to fit categorical marginal models (CMMs; see Bergsma, Croon, and Hagenaars, 2009) for large, sparse contingency tables using MEL. The simplest CMM, marginal homogeneity for a 2 × 2 table, is described in Example 4. The reason we used empirical likelihood rather than ordinary maximum likelihood is that the latter is computationally infeasible for very large tables, say with millions of cells. Such tables occur regularly in practice: for example, with ten variables of five categories each, the number of cells is approximately ten million. Unfortunately, we found that the zero likelihood problem occurred regularly, in many cases with probability close to one, and so we decided it was necessary to determine the general nature of the problem; this resulted in the present paper. In general, we would expect the zero likelihood problem to occur most frequently for MEL estimation of bivariate or multivariate (continuous or discrete) models.
A possible solution to both the zero likelihood and empty set problems is to augment the empirical sample space with one or more well-chosen points. To solve the empty set problem, Chen et al. (2008) proposed adding a single point, namely a negative multiple of the sample mean (see also Liu and Chen, 2010). We are currently investigating augmenting the empirical sample space for CMMs, for which it is typically necessary to add more than one point in order to solve the two problems.
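For the scalar mean model of Example 1, the idea of adding a single point on the opposite side of the hypothesized mean can be sketched as follows. This is a simplified illustration in the spirit of the single-point augmentation, not a reproduction of any published procedure: the function names and the multiplier `a` (with its default value) are assumptions made here.

```python
def augment_for_mean(sample, mu0=0.0, a=1.0):
    """Append one pseudo-observation at mu0 - a * (sample mean - mu0).

    a > 0 is a hypothetical tuning constant; its choice is not specified here.
    """
    n = len(sample)
    mean_deviation = sum(x - mu0 for x in sample) / n
    return list(sample) + [mu0 - a * mean_deviation]


def mean_is_interior(sample, mu0=0.0):
    """True if mu0 lies strictly inside the convex hull of the sample."""
    return min(sample) < mu0 < max(sample)


original = [0.0, 1.0, 2.0]              # zero likelihood problem for mean zero
augmented = augment_for_mean(original)  # adds the point -1.0
print(mean_is_interior(original))       # False
print(mean_is_interior(augmented))      # True
```

Whenever the empty set or zero likelihood problem occurs for this model, the sample mean differs from the hypothesized mean, so the added point lies strictly on the other side and the hypothesized mean becomes an interior point of the convex hull.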
MEL estimation involves maximization of a multinomial likelihood subject to constraints on the multinomial probabilities. Typically in the empirical likelihood literature, this maximization is done with the help of so-called empirical estimating equations (Qin and Lawless, 1994). Using this technique a large class of models can be written via linear constraints on the multinomial probabilities. The Lagrangian dual is then a convex optimization problem which is, in principle, easy to solve. An alternative Lagrange multiplier method for maximizing a multinomial likelihood subject to constraints, which does not require the potentially cumbersome specification of the estimating equations, is described by Bergsma (1997) (see also Bergsma et al., 2009). We found this method to have very good numerical properties in practice. Another Lagrange multiplier method suitable for MEL estimation was described by Bergsma and Rapcsák (2005); this method turns a smooth constrained optimization problem into an unconstrained one.
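As an illustration of the estimating-equations route, the following sketch computes MEL weights for the model that the mean equals `mu0`, using the well-known dual form $p_i = 1 / \{n (1 + \lambda (X_i - \mu_0))\}$ with a scalar multiplier $\lambda$. It is not the method of Bergsma (1997) or of Bergsma and Rapcsák (2005). The function name, tolerance, and iteration cap are assumptions, and the sketch assumes a sample without ties in which `mu0` lies strictly inside the convex hull.

```python
import math


def el_mean_weights(sample, mu0=0.0, tol=1e-12, max_iter=200):
    """Return (weights, T) for the model 'mean = mu0' by bisection on the
    Lagrange multiplier. Raises ValueError if mu0 is not interior."""
    z = [x - mu0 for x in sample]
    n = len(z)
    if not (min(z) < 0 < max(z)):
        raise ValueError("mu0 is not strictly inside the convex hull: "
                         "empty set or zero likelihood problem")

    # Every 1 + lam * z_i must stay positive, which bounds the multiplier.
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)

    def score(lam):
        # Derivative of the dual objective; strictly decreasing in lam.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    lam = 0.5 * (lo + hi)
    weights = [1.0 / (n * (1.0 + lam * zi)) for zi in z]
    # Log likelihood ratio statistic: -2 * sum(log(n * p_i)).
    t_stat = 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
    return weights, t_stat


weights, t_stat = el_mean_weights([-1.0, 0.5, 2.0])
print(sum(weights))                                       # close to 1
print(sum(w * x for w, x in zip(weights, [-1.0, 0.5, 2.0])))  # close to 0
```

The explicit check at the start is where the two problems surface in this formulation: when the hypothesized mean is outside or on the boundary of the convex hull, the dual has no finite solution, so the sketch raises an error instead of iterating.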
Besides the aforementioned hypothesis testing, another important application of empirical likelihood is the construction of confidence intervals for a parameter $\theta(P)$. This is typically done via inversion of the likelihood ratio test, a method which can be formulated in terms of the profile likelihood. Evidently, if the zero likelihood problem occurs for the hypothesis $\theta(P) = \theta_0$, then the value $\theta_0$ will necessarily not be in such a confidence interval, regardless of the confidence level used. So, the zero likelihood problem is also a major potential issue in confidence interval construction.
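The exclusion of $\theta_0$ can be written out as follows. The notation $\hat{P}_{\theta_0}$ for the MEL estimator under the hypothesis $\theta(P) = \theta_0$, $T(\theta_0)$ for the corresponding test statistic, and $\chi^2_{p, 1-\alpha}$ for the chi-square quantile is introduced here for illustration only.

```latex
C_{1-\alpha} = \bigl\{ \theta_0 : T(\theta_0) \le \chi^2_{p,\,1-\alpha} \bigr\},
\qquad
T(\theta_0) = -2 \log \frac{L(\hat{P}_{\theta_0})}{L(\hat{P}_1)}.
```

If the zero likelihood problem occurs for $\theta_0$, then $L(\hat{P}_{\theta_0}) = 0$ while $L(\hat{P}_1) > 0$, so $T(\theta_0) = +\infty$ and $\theta_0 \notin C_{1-\alpha}$ for every $\alpha \in (0, 1)$.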
Of some independent interest in this paper may be the formulation of the empirical likelihood method given in Section 1, which is more general than the estimating equations formulation normally used in the literature. For example, the hypothesis that the medians of two probability distributions are equal seems hard to formulate using estimating equations.