On evaluating the efficiency of the delta-lognormal mean estimator and predictor

Specifications table
Subject area: Probability distributions
More specific subject area: Relative efficiency of statistical estimators or predictors
Method name: UMVU-based estimation or prediction of the mean for delta-lognormal data
Name and reference of original method: • J. Aitchison

Method details
Let $U$ be a finite population of sampling units, unambiguously identifiable by integer labels $i = 1, 2, \ldots, N$. Let $y$ be a variable of interest measured or observed on the sampling units, and let $t_U = N \bar{y}_U$ be its total, with $\bar{y}_U$ the mean of $y$ over $U$. A sample $s \subseteq U$ of size $n$ is drawn from $U$ by an ignorable selection mechanism. Classical (frequentist) statistics assumes that the $y_i$ ($i \in U$) are random variables with joint distribution $\xi$. In other words, $U$ is itself a random sample drawn from an infinite set of populations sharing the same general statistical properties (i.e., a superpopulation), described by the stochastic model $\xi$. In this article, the purpose is, from the sample $s$ at hand, either to estimate the expectation of $y$ under the model $\xi$ or to predict the finite population mean $\bar{y}_U$ (or, equivalently, the total $t_U$).
We consider here a variable of interest $y$ taking nonnegative values ($y \ge 0$). The sample $s$ can be partitioned into $s = s_0 \cup s_1$, $s_0 \cap s_1 = \emptyset$, with $s_0$ of size $n_0$ containing the zero values ($y = 0$) and $s_1$ of size $n_1$ containing the positive values ($y > 0$). If $n_1 = 1$, we denote the unique positive value by $y_{s_1}$. Such data may exhibit a high proportion of zero values. To take this into account in a sufficiently flexible manner, one approach is to use a two-component mixture model. There are two possibilities: (i) increasing the probability of zero values of a distribution defined for $y \ge 0$ (zero-inflated distributions); (ii) introducing a dichotomy between $y = 0$ and $y > 0$ in a mixture model with two separately estimable parts (hurdle-at-zero, conditional, two-part, and delta distributions all designate the same thing). If the second possibility is adopted, then a suitable model for nonnegative values is written as:
$$G(y; p_0, \theta) = p_0 + (1 - p_0)\, F(y; \theta), \qquad y \ge 0,$$
where $p_0$ is the probability of a zero value and $F(y; \theta)$ is a cumulative distribution with parameters $\theta$, corresponding to a positive distribution, either discrete or continuous; here we consider the lognormal distribution.

Lognormal distribution
Let $z$ be a random variable following the standard normal distribution (i.e., $z \sim \mathrm{Norm}(0, 1)$), with probability density:
$$\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right).$$
Then $y = \exp(\mu + \sigma z)$ follows a lognormal distribution that is completely specified by $\mu$ and $\sigma^2$.

Delta-lognormal distribution
The lognormal distribution is no longer appropriate when zero values must be accounted for. This leads to the delta-lognormal distribution [2], often also called the Δ-distribution [1] and occasionally the Bernoulli-lognormal two-part model (e.g., [3, p. 703]).
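The delta-lognormal distribution is a mixture of a Dirac mass at 0 with probability $p_0$ and a lognormal distribution with probability $(1 - p_0)$, that is:
$$f(y; p_0, \mu, \sigma^2) = p_0\, \delta(y) + (1 - p_0)\, \frac{1}{y \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln y - \mu)^2}{2 \sigma^2}\right) \mathbb{1}_{\{y > 0\}}, \qquad (4)$$
where $\delta(y)$ is a Dirac distribution that concentrates a unit mass at 0.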
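The mixture described above can be simulated directly from its definition. The following is a minimal sketch, assuming only the two-part construction stated in the text (a zero with probability $p_0$, otherwise $\exp(\mu + \sigma z)$ with $z$ standard normal); the function name `sample_delta_lognormal` and the parameter values are illustrative choices, not taken from the article.

```python
import math
import random


def sample_delta_lognormal(n, p0, mu, sigma, rng):
    """Draw n values: 0 with probability p0, otherwise exp(mu + sigma * z)."""
    values = []
    for _ in range(n):
        if rng.random() < p0:
            values.append(0.0)  # Dirac mass at zero
        else:
            z = rng.gauss(0.0, 1.0)  # standard normal deviate
            values.append(math.exp(mu + sigma * z))  # lognormal part
    return values


# Illustrative parameter values (assumptions, not from the article).
rng = random.Random(12345)
y = sample_delta_lognormal(10_000, p0=0.5, mu=0.0, sigma=1.0, rng=rng)
share_zero = sum(v == 0.0 for v in y) / len(y)
```

With a large sample, `share_zero` should be close to the chosen $p_0$.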
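The first three cumulants (i.e., expectation, variance, and third central moment) of the distribution (4) are written as [1, p. 95, Eq. (9.43)-(9.45)]:
$$\kappa_1 = (1 - p_0)\, \alpha,$$
$$\kappa_2 = (1 - p_0)\, \alpha^2 \left[ e^{\sigma^2} - (1 - p_0) \right],$$
$$\kappa_3 = (1 - p_0)\, \alpha^3 \left[ e^{3\sigma^2} - 3 (1 - p_0)\, e^{\sigma^2} + 2 (1 - p_0)^2 \right],$$
with:
$$\alpha = \exp\left(\mu + \sigma^2 / 2\right).$$
Although it is not necessarily the best possible definition, skewness is defined in this article in the classical way as:
$$\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}}.$$
The skewness $\gamma_1$ of the delta-lognormal distribution increases dramatically as $\sigma^2$ increases. For small values of $\sigma^2$, as $p_0$ increases, $\gamma_1$ first decreases and then increases. For approximately $\sigma^2 > 1$, $\gamma_1$ only increases as $p_0$ increases (Fig. 1).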
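The behaviour of the skewness described above can be checked numerically. The sketch below derives the three cumulants from the raw moments of the mixture, $E(y^k) = (1 - p_0) \exp(k\mu + k^2\sigma^2/2)$; the function names are hypothetical and the test values of $p_0$ and $\sigma^2$ are arbitrary illustrations.

```python
import math


def delta_lognormal_cumulants(p0, mu, sigma2):
    """Return (kappa1, kappa2, kappa3) of the delta-lognormal distribution."""
    q = 1.0 - p0  # probability of a positive value
    alpha = math.exp(mu + sigma2 / 2.0)
    w = math.exp(sigma2)
    kappa1 = q * alpha
    kappa2 = q * alpha**2 * (w - q)
    kappa3 = q * alpha**3 * (w**3 - 3.0 * q * w + 2.0 * q**2)
    return kappa1, kappa2, kappa3


def delta_lognormal_skewness(p0, sigma2, mu=0.0):
    """Skewness kappa3 / kappa2**1.5; it does not depend on mu."""
    _, kappa2, kappa3 = delta_lognormal_cumulants(p0, mu, sigma2)
    return kappa3 / kappa2**1.5


# For a small sigma^2 the skewness first decreases, then increases with p0.
small = [delta_lognormal_skewness(p0, 0.01) for p0 in (0.0, 0.3, 0.9)]
```

For $p_0 = 0$ the function reduces to the usual lognormal skewness $(e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$, which provides a simple sanity check.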

Relative efficiency assessment
In the context of a finite population, one can consider either the mean $\bar{y}_U$ or the total $t_U$ as the quantity of interest. For the finite population that has actually been sampled, these quantities have fixed values that would be known exactly if $s = U$ (ignoring possible measurement or observation errors). Under a superpopulation model $\xi$, they are random variables whose values one wants to predict. In an infinite population (superpopulation), one is interested in estimating the expectation $E(y) = \kappa_1$.
For delta-lognormal data, depending on the level of skewness of the distribution, the question arises of how much precision can be gained by relying on the uniformly minimum-variance unbiased estimator (UMVUE) rather than on the sample mean $\bar{y}_s$, either for estimating $\kappa_1$ or for predicting $\bar{y}_U$ (or, equivalently, $t_U$). In other words, one may compare the situation where the shape of the distribution is known with the situation where it is unknown (or known but not taken into account), first in the case of an infinite population (estimation context), then in that of a finite population (prediction context).
In the estimation context, Aitchison and Brown [1, p. 98, Fig. 9.1] provided relative efficiency results only for $p_0 = 0.5$ and for the degenerate case of the lognormal distribution ($p_0 = 0$), using a variance approximation (see the validation section). As a result, the sample size $n$ is disregarded in the relative efficiency assessment. Shimizu [2] did not document the relative efficiency in the case of the delta-lognormal distribution. Smith [4] considered the relative efficiency for $p_0 = 0.1$ and $p_0 = 0.5$, for very small sample sizes, using exact or approximate variance. To our knowledge, the relative efficiency in the prediction context has not yet been documented.
In this technical article, after providing a compendium of fundamental formulas for UMVU estimation in the case of the delta-lognormal distribution, we document the relative efficiency more thoroughly than in the past by considering both the estimation and prediction contexts, taking into account the sample size (and the finite population size in the prediction context), and varying the probability of getting a zero value up to $p_0 = 0.9$. In all cases we use the exact expression of the variance of the estimator (or predictor).

Unknown shape of the distribution
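Let $s$ be a sample of size $n$ drawn by random sampling from an infinite population of unknown shape. The unbiased estimator of $\kappa_1$ is the sample mean $\bar{y}_s$, and its sampling variance is written as:
$$V(\bar{y}_s) = \frac{\kappa_2}{n},$$
where $\kappa_2$ is estimated without bias by:
$$\hat{\kappa}_2 = \frac{1}{n - 1} \sum_{i \in s} (y_i - \bar{y}_s)^2.$$
The sampling variance is then estimated without bias by:
$$\hat{V}(\bar{y}_s) = \frac{\hat{\kappa}_2}{n}.$$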
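As a small worked illustration of the distribution-free estimators just described, the sketch below computes the sample mean, the unbiased variance estimate (divisor $n - 1$), and the estimated sampling variance of the mean. The function name and the toy data are assumptions made for the example.

```python
def sample_mean_and_variance(y):
    """Return (sample mean, unbiased estimate of kappa2, estimated variance of the mean)."""
    n = len(y)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(y) / n
    kappa2_hat = sum((v - mean) ** 2 for v in y) / (n - 1)  # unbiased for kappa2
    return mean, kappa2_hat, kappa2_hat / n


# Toy sample with two zeros and two positive values (illustrative only).
mean, kappa2_hat, var_hat = sample_mean_and_variance([0.0, 0.0, 1.0, 3.0])
```

Here the squared deviations sum to 6, so the variance estimate is 2 and the estimated variance of the mean is 0.5.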

Known shape of the distribution
When $y$ is distributed according to a delta-lognormal distribution, $\kappa_1$ can be estimated by the UMVUE [1, p. 97, Eq. (9.54)] (typo corrected); [4]: with: (16) and $g_m(t)$ an infinite series introduced by Finney [ The exact variance of $\hat{\kappa}_1$ (14) was provided by Smith [4, Eq. (6)].
The function $g_m(t)$ belongs to the class of generalized hypergeometric functions and can be written as a particular instance of the confluent hypergeometric limit function (here denoted as ${}_0F_1$) as [6, Eq. (2.1)]: where $(a)_j$ is the notation used in special function theory for the rising factorial:
$$(a)_j = a (a + 1) \cdots (a + j - 1), \qquad (a)_0 = 1.$$
The numerical evaluation of the special functions ${}_0F_1(a; z)$ and $g_m(t)$ is addressed later in the article.
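For orientation, the estimator referred to above is commonly stated in the following form in the literature on the delta-lognormal distribution. This is a sketch under stated assumptions, not a transcription of equations (14)-(19): the symbols $\bar{l}$ and $s_l^2$ (mean and unbiased variance of the log-transformed positive values) are introduced here for illustration, and the index convention $m = n_1 - 1$ is assumed.

```latex
\[
\hat{\kappa}_1 =
\begin{cases}
\dfrac{n_1}{n}\,\exp(\bar{l})\; g_m\!\left(\dfrac{s_l^2}{2}\right), \quad m = n_1 - 1, & n_1 > 1,\\[2ex]
\dfrac{y_{s_1}}{n}, & n_1 = 1,\\[1ex]
0, & n_1 = 0,
\end{cases}
\qquad
\bar{l} = \frac{1}{n_1}\sum_{i \in s_1} \ln y_i,
\quad
s_l^2 = \frac{1}{n_1 - 1}\sum_{i \in s_1} \left(\ln y_i - \bar{l}\right)^2
\]
\[
g_m(t) = 1 + \frac{m}{m+1}\,t
  + \sum_{j \ge 2} \frac{m^{2j-1}}{(m+1)^j\,(m+2)(m+4)\cdots(m+2j-2)}\,\frac{t^j}{j!}
  = {}_0F_1\!\left(\frac{m}{2};\ \frac{m^2\,t}{2(m+1)}\right),
\qquad
{}_0F_1(a; z) = \sum_{j \ge 0} \frac{z^j}{(a)_j\, j!}
\]
```

Under this convention, the smallest admissible case $n_1 = 2$ gives $m = 1$ and a first parameter $a = 1/2$, for which ${}_0F_1(1/2; z) = \cosh(2\sqrt{z})$ and hence $g_1(t) = \cosh(\sqrt{t})$.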

Relative efficiency
We compare the precision of the estimator of $\kappa_1$ according to whether the distribution shape is unknown or known. Taking the knowledge of the distribution shape into account is expected to yield a gain in precision. The relative efficiency is defined as the ratio of the two variances:
$$\mathrm{eff}_1 = \frac{V(\hat{\kappa}_1)}{V(\bar{y}_s)}. \qquad (25)$$
Because $\hat{\kappa}_1$ is the UMVUE, $\mathrm{eff}_1 \le 1$; that is, taking into account the shape of the distribution leads to a gain in precision, which is higher when $\mathrm{eff}_1$ is low ($V(\hat{\kappa}_1) < V(\bar{y}_s)$). Hence, we quantify the gain in precision by expressing it as a function of $p_0$, $\sigma^2$ (or, equivalently, $\sigma$) and $n$. The parameter $\mu$ vanishes through the elimination of the $\alpha^2$ term that appears in both the numerator and the denominator of the relative efficiency.
To illustrate the speed of convergence of $\mathrm{eff}_1$ toward its asymptotic limit, we set $\sigma^2 = 2$ and repeat the calculations for $p_0 = 0.0(0.1)0.9$ and $50 \le n \le 1\,000$. The asymptotic limit and the speed of convergence of $\mathrm{eff}_1$ depend on $p_0$: the lower $p_0$ is, the higher the efficiency gain and the speed of convergence (Fig. 3).

Prediction context
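Denoting $r = U - s$, the population total can be written as:
$$t_U = \sum_{i \in s} y_i + \sum_{i \in r} y_i = t_s + t_r,$$
that is, the sum of the total over the sample ($t_s$) and the total over the remaining part of the population ($t_r$). A predictor of $t_U$ can be written as:
$$\tilde{t}_U = t_s + \tilde{t}_r.$$
Consider a simple mean model where the random variables $y_i$ ($i = 1, \ldots, N$) have the same expectation and variance and are uncorrelated, that is:
$$E(y_i) = \kappa_1, \qquad (28a)$$
$$V(y_i) = \kappa_2, \qquad (28b)$$
$$\mathrm{Cov}(y_i, y_j) = 0, \quad i \ne j. \qquad (28c)$$
The empirical predictor is written as:
$$\tilde{t}_U = t_s + (N - n)\, \tilde{\kappa}_1, \qquad (29)$$
where $\tilde{\kappa}_1$ denotes an unbiased estimator of $\kappa_1$ computed from the sample. Predictor (29) is model-unbiased; that is, if the model is correct, then $E(\tilde{t}_U - t_U) = 0$. The variance of the prediction error is obtained as:
$$V(\tilde{t}_U - t_U) = V(\tilde{t}_r - t_r) = V(\tilde{t}_r) + V(t_r) - 2\, \mathrm{Cov}(\tilde{t}_r, t_r) = V(\tilde{t}_r) + V(t_r). \qquad (30)$$
The $\xi$-covariance between $\tilde{t}_r$ and $t_r$ is zero since: (i) $\tilde{t}_r$ is a function of the set of values in $s$ ($\{y_i, i \in s\}$), not of the set of values in $r$ ($\{y_i, i \in r\}$), which we do not know; and (ii) the two sets of values $\{y_i, i \in s\}$ and $\{y_i, i \in r\}$ are uncorrelated under the model (see (28c)).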

Distribution of unknown shape
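Predictor (29) can be written as:
$$\tilde{t}_U^{\,\mathrm{exp}} = t_s + (N - n)\, \bar{y}_s = N \bar{y}_s,$$
which we can designate as an expansion predictor [13]. From relation (30), the prediction error variance of $(\tilde{t}_U^{\,\mathrm{exp}} - t_U)$ is obtained as:
$$V(\tilde{t}_U^{\,\mathrm{exp}} - t_U) = (N - n)^2\, \frac{\kappa_2}{n} + (N - n)\, \kappa_2 = N^2 \left(1 - \frac{n}{N}\right) \frac{\kappa_2}{n}.$$
The predictor of the mean is $\bar{y}_s$, and its prediction error variance is $V(\bar{y}_s - \bar{y}_U) = (1 - n/N)\, \kappa_2 / n$. The following limit results:
$$\lim_{N \to \infty} V(\bar{y}_s - \bar{y}_U) = \frac{\kappa_2}{n} = V(\bar{y}_s).$$
For prediction error variance (34), an unbiased estimator is obtained by substituting the estimator $\hat{\kappa}_2$ (22) for the parameter $\kappa_2$.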

Distribution of known shape
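Predictor (29) can be written as:
$$\tilde{t}_U^{\,\mathrm{mvu}} = t_s + (N - n)\, \hat{\kappa}_1.$$
From relation (30), the prediction error variance of $(\tilde{t}_U^{\,\mathrm{mvu}} - t_U)$ is obtained as:
$$V(\tilde{t}_U^{\,\mathrm{mvu}} - t_U) = (N - n)^2\, V(\hat{\kappa}_1) + (N - n)\, \kappa_2,$$
where the variance $V(\hat{\kappa}_1)$ is given by expression (23). The mean predictor is $\tilde{t}_U^{\,\mathrm{mvu}} / N$.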

Relative efficiency
We compare the precision of the prediction of $t_U$ depending on whether the shape of the distribution is unknown or known. In addition to $p_0$, $\sigma^2$ and $n$, we must also vary the finite population size $N$.
The relative efficiency is defined as:
$$\mathrm{eff}_2 = \frac{V(\tilde{t}_U^{\,\mathrm{mvu}} - t_U)}{V(\tilde{t}_U^{\,\mathrm{exp}} - t_U)}. \qquad (41)$$
First, we examine the effect of the population size $N$ for a fixed sample size $n$, which is equivalent to examining the effect of the sampling fraction $f = n/N$. We set $n = 50$ and vary the sampling fraction as $f = 0.4, 0.2, 0.1, 0.05$ ($N = 125, 250, 500, 1\,000$). As before, we vary $p_0 = 0(0.025)0.9$ and $\sigma = 0.05(0.05)2$. The results (Fig. 4) show a smaller gain in precision than in the case of an infinite population (Fig. 2a). This finite population effect becomes less pronounced as $f$ tends toward 0 ($N \to \infty$), since the situation then tends toward the asymptotic result, which corresponds to Fig. 2a.
To illustrate the speed of convergence of $\mathrm{eff}_2$ toward its asymptotic limit $\mathrm{eff}_1$, we set $\sigma^2 = 2$ and repeat the calculations for $p_0 = 0.0(0.1)0.9$, $n = 50$ and $500 \le N \le 10\,000$. As in the infinite population case, the asymptotic limit and the speed of convergence of $\mathrm{eff}_2$ depend on $p_0$. The lower $p_0$ is, the higher the efficiency gain ($\mathrm{eff}_2$ decreases). In contrast, the speed of convergence decreases when $p_0$ decreases (Fig. 5). In practice, convergence can be considered almost reached as soon as $N = 5\,000$ (or $N = 10\,000$ to be more conservative). Note that the limit values correspond to the values for $n = 50$ in Fig. 3 and are represented by dotted half lines in Fig. 5.
Finally, we examine the gain in precision when $n$ and $N$ increase jointly, keeping the sampling fraction constant. For $f = 0.1$, we increase the population size as $N = 500, 1\,000, 5\,000, 10\,000$ ($n = 50, 100, 500, 1\,000$). The results (Fig. 6) show a gain in precision that increases ($\mathrm{eff}_2$ decreases) as $N$ and $n$ jointly increase. Note that Fig. 6a is the same as Fig. 4c ($N = 500$, $n = 50$). There is only a small difference between the case $N = 5\,000$, $n = 500$ (Fig. 6c) and the case $N = 10\,000$, $n = 1\,000$ (Fig. 6d), which suggests the convergence of $\mathrm{eff}_2$ toward its asymptotic limit (in the sense that $n$ and $N$ jointly increase with $f = 0.1$).
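The finite population effect can be made explicit with a few lines of code. The sketch below assumes the simple mean model, under which the prediction error variances are $(N - n)^2 V(\hat{\kappa}_1) + (N - n)\kappa_2$ for the UMVU-based predictor and $N^2 (1 - n/N) \kappa_2 / n$ for the expansion predictor, and takes $\mathrm{eff}_2$ as their ratio. Under these assumptions the ratio simplifies to $f + (1 - f)\,\mathrm{eff}_1$ with $f = n/N$ and $\mathrm{eff}_1 = V(\hat{\kappa}_1) / (\kappa_2 / n)$. The function names are hypothetical, and the exact variance $V(\hat{\kappa}_1)$ is taken as a user-supplied input rather than computed.

```python
def eff2_direct(var_kappa1_hat, kappa2, n, N):
    """Ratio of prediction error variances (UMVU-based over expansion predictor)."""
    var_mvu = (N - n) ** 2 * var_kappa1_hat + (N - n) * kappa2
    var_exp = N**2 * (1.0 - n / N) * kappa2 / n
    return var_mvu / var_exp


def eff2_from_eff1(eff1, n, N):
    """Same quantity written as a weighted average of 1 and eff1."""
    f = n / N  # sampling fraction
    return f + (1.0 - f) * eff1


# Arbitrary illustrative inputs: V(kappa1_hat) = 0.03, kappa2 = 2, n = 50, N = 500.
direct = eff2_direct(0.03, 2.0, 50, 500)
shortcut = eff2_from_eff1(0.03 / (2.0 / 50), 50, 500)
```

Written this way, $\mathrm{eff}_2$ lies between $\mathrm{eff}_1$ and 1 and tends to $\mathrm{eff}_1$ as $f \to 0$, which is consistent with the smaller gain reported for finite populations.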

Numerical evaluation of the generalized hypergeometric functions
In this article, we need to numerically evaluate ${}_0F_1(a; z)$ ($a \in \mathbb{R}_+^*$ and $z \in \mathbb{R}_+^*$), or equivalently $g_m(t)$, to calculate $V(\hat{\kappa}_1)$ (23), which is directly involved in the relative efficiency $\mathrm{eff}_1$ (25) and, through formula (38), in the relative efficiency $\mathrm{eff}_2$ (41).

Evaluating ${}_0F_1(a; z)$ by recurrence
The confluent hypergeometric limit function (19) can be written as:
$${}_0F_1(a; z) = \sum_{j=0}^{\infty} u_j, \qquad u_j = \frac{z^j}{(a)_j\, j!},$$
which gives the recurrence relation:
$$u_0 = 1, \qquad u_j = u_{j-1}\, \frac{z}{(a + j - 1)\, j}, \quad j \ge 1,$$
and translates into Algorithm 3. The series is infinite, but in practice the sum only needs to be computed until it reaches a level of convergence considered sufficient; for example, with the convergence criterion $|S_j - S_{j-1}| < \varepsilon$, where $S_j$ denotes the partial sum up to term $j$, we used $\varepsilon = 10^{-10}$. Another way to proceed is to determine the number of terms necessary to reach the precision allowed by the computer at hand (see [15, pp. 88-89]). Computing successive terms of the series by means of the recurrence relation is a computational method often used by default (e.g., [16, p. 99]). For other methods, the interested reader is referred to [17,18].
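The term-by-term summation described above can be sketched as follows. This is an illustrative implementation of the standard series for ${}_0F_1(a; z)$ with the stopping rule $|S_j - S_{j-1}| < \varepsilon$, not a transcription of Algorithm 3; the function name `hyp0f1_series` and the `max_terms` safeguard are assumptions added for the example, and the function is intended for $a > 0$ and $z > 0$ only.

```python
import math


def hyp0f1_series(a, z, eps=1e-10, max_terms=100_000):
    """Sum the 0F1(a; z) series until the last added term is below eps."""
    term = 1.0   # u_0
    total = 1.0  # S_0
    for j in range(1, max_terms + 1):
        term *= z / ((a + j - 1.0) * j)  # u_j from u_{j-1}
        total += term                    # S_j = S_{j-1} + u_j
        if abs(term) < eps:              # |S_j - S_{j-1}| < eps
            return total
    raise ArithmeticError("series did not converge within max_terms")


# Check against the closed form 0F1(1/2; z) = cosh(2 * sqrt(z)).
value = hyp0f1_series(0.5, 1.0)
reference = math.cosh(2.0)
```

The closed form for $a = 1/2$ offers a convenient check at the lower end of the range of $a$; for large $z$ the terms first grow before decreasing, so an absolute stopping rule is applied only once the terms have become small.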

Evaluating $g_m(t)$ by recurrence
One can proceed in the same way as previously for the function $g_m(t)$ (17), which can be written as (see, for example, [4, Eq. (1)], with $m = n_1 - 1$):
$$g_m(t) = \sum_{j=0}^{\infty} v_j, \qquad v_0 = 1, \qquad v_j = v_{j-1}\, \frac{m^2\, t}{(m + 1)(m + 2j - 2)\, j}, \quad j \ge 1,$$
and translates into Algorithm 4.
The examination of $V(\hat{\kappa}_1)$ (23) shows that the minimal value that $a$ can take in this expression is $a = 0.5$ and that, for the maximal value $n = 1\,000$ used in this article, we have $a \approx 500$. In addition, we may consider $0 < z \le 1\,000$ to cover the situations addressed in [19] and similar future examples.
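The same summation strategy applies to $g_m(t)$. The sketch below assumes the series form $g_m(t) = 1 + \frac{m}{m+1} t + \sum_{j \ge 2} \frac{m^{2j-1}}{(m+1)^j (m+2) \cdots (m+2j-2)} \frac{t^j}{j!}$ with $m = n_1 - 1$, and then uses it in the commonly stated form of the UMVU estimator of the mean. Neither function reproduces Algorithm 4; the names `g_series` and `umvue_mean` are hypothetical.

```python
import math


def g_series(m, t, eps=1e-10, max_terms=100_000):
    """Sum g_m(t) term by term (series form assumed, with m = n1 - 1, t >= 0)."""
    term = 1.0
    total = 1.0
    for j in range(1, max_terms + 1):
        term *= (m * m * t) / ((m + 1.0) * (m + 2.0 * j - 2.0) * j)
        total += term
        if abs(term) < eps:
            return total
    raise ArithmeticError("series did not converge within max_terms")


def umvue_mean(sample):
    """UMVU-type estimate of the mean for nonnegative data with zeros."""
    n = len(sample)
    if n == 0:
        raise ValueError("empty sample")
    positives = [v for v in sample if v > 0]
    n1 = len(positives)
    if n1 == 0:
        return 0.0
    if n1 == 1:
        return positives[0] / n
    logs = [math.log(v) for v in positives]
    mean_log = sum(logs) / n1
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n1 - 1)
    return (n1 / n) * math.exp(mean_log) * g_series(n1 - 1, var_log / 2.0)


estimate = umvue_mean([0.0, 0.0, 2.0, 2.0])
```

When all positive values are equal, the log-scale variance is zero, $g_m(0) = 1$, and the estimate coincides with the sample mean, which gives a quick consistency check.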

Additional information
Pennington [12] introduced the use of the delta-lognormal distribution in marine biology to estimate mean abundance more efficiently than can be achieved with the sample mean in the case of highly skewed distributions. The article by Pennington [12] is widely cited in the literature (603 citations according to Google Scholar at the time of writing this article). Regarding the robustness of this approach, the reader is referred to Myers and Pepin [21, 22], Syrjala [23] and Christman [24]. In the context of the mean (or total) prediction, see the recent contribution [19].

Declaration of Competing Interest
The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.