Multiple monotonicity of discrete distributions: The case of the Poisson model

Shape constrained estimation in discrete settings has received increasing attention in statistics. Among the most important shape constrained models is multiple monotonicity, including k-monotonicity, for a given integer k ∈ [1,∞), and complete monotonicity. Multiple monotonicity provides a nice generalization of monotonicity and convexity and has been successfully used in applications related to estimation of species richness. Although fully nonparametric, it is of great interest to determine some of the well-known parametric distributions which belong to this model. Among the most important examples are the family of Poisson distributions and mixtures thereof. In Giguelay (2017) k-monotonicity of Poisson distributions was connected to the roots of a certain polynomial, but a typographical error occurred while writing its expression. In this note, we correct that typographical error and give a detailed proof that a Poisson distribution with rate λ ∈ [0,∞) is k-monotone if and only if λ ≤ λk, where λk is the smallest zero of the k-th degree Laguerre polynomial Lk(x) = ∑k j=0(−1) (k j ) xj/j!, x ≥ 0. This result yields the sufficient condition that a mixture of Poisson distributions is k-monotone if the support of the mixing distribution is included in [0, λk]. Furthermore, we show that the only complete monotone Poisson distribution is the Dirac distribution at 0.


Background and motivation
Let p be a given function defined on N. The r-th differential operator Δ (r) is defined as follows: Δ (0) p(i) = p(i) and Δ (r) p(i) = ΔΔ (r−1) p(i) for all i ∈ N, with Δ the usual difference operator. Thus, Δ (r) is recursively defined with Δ (0) being the identity. For example, Δ (1) p(i) = p(i + 1) − p(i) gives the slopes of p, whereas Δ (2) p(i) = p(i + 2) + p(i) − 2p(i + 1) measures its curvature at i ∈ N. Furthermore, it can be easily shown that for all r, i ∈ N. For a given integer k ≥ 1 p is said to be k-monotone if (−1) k Δ (k) p(i) ≥ 0 for all i ∈ N. Therefore, k-monotonicity generalizes monotonicity (k = 1) and convexity (k = 2). Research work on k-monotone sequences can be traced back to the paper of Knopp (1925). From the definition of k-monotonicity, it is clear that this property is preserved under an arbitrary linear combination with positive coefficients. In Lefèvre and Loisel (2013) it is shown that p is a k-monotone probability mass function (pmf) if and only if there exists a sequence of probabilities (π k j ) j∈N such that p can be written as where Q k j is the finitely supported k-monotone spline given by see the proof of their Proposition 2.3. Above, the notation I is used for the indicator function. Thus, k-monotonicity defines a mixture model, and estimation of an unknown pmf p in this class is then equivalent to estimation of the mixing sequence (π k j ) j∈N . Other interesting results including alternative stochastic representations can be found in Lefèvre and Loisel (2013). Given n independent random variables X 1 , . . . , X n assumed to be i.i.d. from an unknown k-monotone pmf p, Giguelay (2017) considered unconstrained and constrained least squares estimation. The resulting least square estimators (LSE) are found to have a convergence rate of order √ n. Although no limit distribution was derived, this gives a nice generalization for the rate of convergence of the Gnenander estimator of Jankowski and Wellner (2009) and the convex least squares estimator of Durot et al. (2013). See also Balabdaoui et al. (2017). According to the numerical results obtained by Giguelay (2017), it is more advantageous to fit the k-monotone LSEs than the empirical estimator even for moderate sample sizes n, provided that the model is well-specified. Note that k-monotone pmfs with large values of k exhibit a deeper hollow than those with smaller values. For an illustration of this fact see Figure 1 where plots of the k-monotone spline Q k 10 are shown for different values of k. The same feature can be observed in the plots of the probability mass functions of the k-monotone Poisson distributions for k ∈ {2, 4, 6, 8, 10}; see Figure 3. Extending the work of Durot et al. (2015), Giguelay and Huet (2018) showed that k-monotone estimation can be applied to estimating species richness; i.e., the total (unknown) number of species of a given population. If p is the pmf of abundances, then assuming that p is k-monotone comes with the interpretation that π k j in the mixture model (1.1) is the probability that a species belongs to the j-th sub-population whose abundance is distributed according to the pmf Q k j . Since Q k 0 = δ 0 , the Dirac at 0, we necessarily have π k 0 = 0 since π k 0 is then the probability that a species belongs to the sub-population whose individuals cannot be observed! Chee and Wang (2016) also considered an estimation approach based on kmonotonicity. Their method differs however in that they considered mixtures of the k-monotone splines s θ Also, the authors considered maximum likelihood instead of least squares estimation. In applications related to species richness, the authors found that their method gives better results than those obtained with mixtures of Poisson distributions, a model that has become popular for estimation of species richness since the work of Norris and Pollock (1998). Interestingly, it can be shown that if a mixture of Poisson distributions has a mixing distribution with a restricted support then this mixture is k-monotone; see Corollary 2.7 below.
We would like to mention that for a density f with respect to Lebesgue measure which is defined on [0, ∞), k-monotonicity can be also defined in terms of higher derivatives: (−1) k−2 f (k−2) has to be non-negative and convex on [0, ∞). Characterization of k-monotonicity through scale mixtures of Beta(1, k) goes back to Williamson (1956). See also Fejér (1936), Feller (1939, Lefèvre and Loisel (2013), Balabdaoui and Wellner (2007) and Balabdaoui and Wellner (2010) where estimation via maximum likelihood and least squares methods was considered.
The other multiple monotonicity model is that of complete monotonicity. As k → ∞ the classes of k-monotone pmfs converge, in the sense of intersection, to the class of completely monotone of pmfs, formally, p defined on N is completely monotone if it is k-monotone for all k ≥ 1. It has been established in Steutel (1969) that p is a completely monotone pmf on N if and only if it is a mixture of geometric distributions. In the recent work of Balabdaoui and de Fournas-Labrosse (2019) nonparametric least squares estimation in this class was considered. There, it was shown that the complete monotone LSE converges at the rate √ n, to a limit distribution which is characterized as the unique solution of fully described limiting quadratic criterion.
When considering a nonparametric class of distributions such as the multiple monotone class, it is of much interest to identify some of the well-known parametric models which are elements thereof. This can be very helpful when looking for concrete examples in simulation settings. This also gives reference distributions representing the associated model. In the log-concave model, it is known that Binomial, Poisson, Skellam (convolution of two Poisson distributions) and geometric distributions are all log-concave on their respective supports, without any restriction on their parameters. As k gets larger the constraint of kmonotonicity becomes much more difficult to check. As opposed to log-concavity of a pmf p where only the sign of Δ (2) log p(i) for i in the support of p needs to be determined, it follows from Property 2 of Giguelay (2017) that k-monotonicity of p actually implies that the k conditions (−1) r Δ (r) p(i) ≥ 0, r = 1, . . . , k should be satisfied for all i ∈ N. Obviously, the splines Q k j , j ∈ N which form the basis of the class of k-monotone pmfs are themselves k-monotone. However, they are not as meaningful as Poisson distributions and their mixtures. In Giguelay and Huet (2018), the problem of testing k-monotonicity was considered with the goal of estimating the parameter k based on a random sample from a given multiple monotone pmf. To evaluate the power of these nested testing procedures a simulation study was designed using the Poisson distribution as the truth. This necessitates knowing the range of the rate λ of the Poisson distribution such that the latter is k-monotone but not (k + 1)-monotone. Indeed, if we write M k the class of k-monotone pmfs, then the procedures in Giguelay and Huet (2018) are designed for testing It can be easily shown that a Poisson pmf is 1-monotone; i.e., monotone nonincreasing, if and only if its rate parameter belongs to [0, 1]. With some addi-tional effort, one can show that if the rate belongs to [0, 2 − √ 2], then the corresponding Poisson distribution is 2-monotone; i.e., convex (and non-increasing). Our first goal in this paper is to give a precise characterization of k-monotonicity of such an important discrete distribution, thereby correcting the typographical error occurring in the condition given in Property 8 of Giguelay (2017). For the interested reader, we draw attention to the fact that values of the roots given by Giguelay (2017) in page 14 are correct, but they are actually not the roots of the polynomial considered in Property 8. Rather, they are the roots of certain Laguerre polynomials. The second main goal of this paper is to make the link between the roots of these polynomials and the range of the rate parameter of a k-monotone Poisson distribution.
The paper is organized as follows. In the next section we recall the k-monotonicity condition for the Poisson model and then make the connection between this property and the (α, k)-Laguerre polynomials where α ∈ N and k ∈ N \ {0}. Our main result given in Proposition 2.6 shows that a necessary and sufficient condition for k-monotonicity of a Poisson distribution with rate λ is that λ ≤ λ k with λ k the smallest zero of the (0, k)-Laguerre polynomial. A byproduct of this characterization is a sufficient condition of k-monotonicity of a mixture of Poisson distributions. Using monotonicity of λ k in k, we show that the result further implies that the sole complete monotone Poisson distribution is the Dirac at 0. In Section 3 we gather some concluding remarks and remaining questions. A short appendix is added for an intermediate result and a proof.

Characterization of k-monotonicity of Poisson distribution
Let λ ∈ [0, ∞) and P(λ) denote the Poisson distribution with rate λ. For λ = 0, P(λ) coincides with δ 0 , the Dirac distribution at 0. Let p λ denote the probability mass function of P(λ); i.e., for i ∈ N Thus, saying that p λ is k-monotone is equivalent to the statement that (−1) k Δ (k) p λ (i) ≥ 0 for all i ∈ N, with Δ (k) is the k-th differential operator already defined above. In other words, the parameter λ needs to satisfy

Link to Laguerre polynomials
We now make the connection between k-monotonicity of P(λ) and certain Laguerre polynomials. Before doing so we would like to first recall their definition.
Given a real number α ∈ (−1, ∞) and an integer k ≥ 1, consider the ordinary differential equation also referred to as the Laguerre's equation. Its polynomial solution is the well- admits k simple zeros, which all belong to the oscillatory region (0, 4k + 2α + 2), see for example Gatteschi (2002). Several properties of the zeros of L (α) k have been extensively studied in the literature. This includes lower and upper bounds, asymptotic behavior when k → ∞ and α is fixed, monotonicity in certain functions of α and Stieltjes interlacing, to mention only a few. Investigation of these properties and related theorems can be found for example in Gatteschi (2002), Krasikov (2003), Natalini and Palumbo (2003), Dimitrov andRafaeli (2009) and. For our purpose, we will mainly need three results which we will present next. In the sequel, λ (α) k,1 < . . . < λ (α) k,k denote the k simple zeros of L (α) k , rearranged in increasing order. The following theorem is due to Natalini and Palumbo (2003).
As already noted by Natalini and Palumbo (2003) Theorem 2.1 implies that λ (α) k,r increases in α. This property will be used in the proof of Proposition 2.6 below. For illustration, we have plotted in Figure 2 the (α, 3)-Laguerre polynomials, L 3 , for α ∈ {0, 2, 4, 6}. It can be seen from the plots that each of the three roots of L (α) 3 increases in α. The second crucial result on a remarkable interlacing property of zeros of shifted Laguerre polynomials is due to Driver and Jordaan (2007). The complete version of their result can be found in their Theorem 2.3. For more results on interlacing properties of zeros of generalized Laguerre and other orthogonal polynomials we refer to Driver (2009), Driver (2012 and the references therein. Theorem 2.2. Driver and Jordaan (2007) Fix α > −1 and k ≥ 1 an integer. If λ (α) k,r , r = 1, . . . , k are the k zeros of the Laguerre polynomial L We illustrate this important result in Table 1 and 2 where it can be seen from the pattern exhibited by the boxes and circles how the zeros of L k,1 ) k≥1 is decreasing. This property is well known and is a special case of a more general decreasing monotonicity property of the zeros of Laguerre polynomials L (α) k when k increases and α is held fixed. See for example the statement in Driver and Jordaan (2007) on pages 615-616 in the Introduction, where n can be taken here to be k and the polynomial p n to be L (0 k , with the corresponding weight w(x) = e −x . Showing this property is not difficult as it only uses induction and some well-known recursive relationships satisfied by Laguerre polynomials. For completeness we give a proof of Proposition 2.3 in the appendix. Now we shall make the link between k-monotonicity of P(λ) and the generalized Laguerre we have the following proposition.

Characterization of k-monotonicity of Poisson distributions
The proof of Proposition 2.4 is immediate as the result follows directly from the definition of L  k,2 ], and so on as the polynomial switches sign to the right of every zero. This yields the following corollary.

Corollary 2.5. For a given integer k ≥ 1, P(λ) is k-monotone if and only if
To guarantee a consistent notation in the corollary above we have used the convention λ (i) k,k−1 = 0 if k = 1. In the sequel we will write λ k instead of λ (0) k,1 . The following proposition characterizes k-monotonicity of P(λ), which is one of our main goals in this work.

Proposition 2.6. For a given integer k ≥ 1, P(λ) is k-monotone if and only
In Table 3 we give the values of λ k for k ∈ {2, . . . , 10}. For k > 3, the value is preceded by "≈" to indicate that it is only an approximation. The latter was found using the R-function "uniroot". Figure 3 shows the plots of the probability  Table 3 The values of the first root λ k for k ∈ {2, 3, 4, 5, 6, 7, 8, 9, 10}. that P(λ k ) is k-but not (k + 1)-monotone. The shapes of the pmfs of P(λ k ) exhibit deeper hollows as k grows. Note also that their value at 0, e −λ k , should increasingly converge to 1 the value taken by the Dirac at 0, the completely monotone limit of P(λ k ), as k → ∞.
Proof of Proposition 2.6. Let k be an even positive integer. To have a more readable proof, let us write Then, using the fact that intersection is distributive over union, the region R + can also be re-written as Since {0, . . . , k/2} is finite, there must exist an r ∈ {0, . . . , k/2} such that the set S r =: {i ∈ N : r(i) = r} is infinite. Theorem 2.1 implies that the sequences (λ Thus, it follows that This allows us to conclude that i∈N I (i) k,r(i) = ∅ whenever S r is infinite for some r > 0. We consider now the case where S 1 , . . . , S k/2 are all finite, which implies that S 0 =: {i ∈ N : r(i) = 0} must be infinite. We will now show that (2.4) k,r = ∅. Hence, the intersection on the right side of the display in (2.4) has to be empty. We conclude that the only element remaining in the union defining R + is the one corresponding to the case S 0 = N, that is, The same reasoning applies to the case when k is odd. To show the second part of the proposition, we know that P(λ) is k-monotone but not (k + 1)-monotone if and only if λ ≤ λ k and λ > λ k+1 . Also, by Proposition 2.3, we know that λ k+1 < λ k . Thus, the latter statement is equivalent to λ ∈ (λ k+1 , λ k ], which completes the proof.
Proposition 2.6 implies the two following results.
Corollary 2.7. Let F be some distribution function supported on [0, ∞). Then, the pmf of the mixture of Poisson distributions with mixing distribution F given by The proof of this corollary is immediate, and hence is omitted. Note that the condition is only sufficient. More discussion about this result can be found in Section 3.

Corollary 2.8. A Poisson distribution P(λ) is completely monotone if and only
if it is equal to the Dirac distribution at 0; i.e., λ = 0.
Proof. It follows from Theorem A.1 (see Appendix) that with j 0,1 ≈ 2.40 is the first zero of the Bessel function of the first kind, J 0 . Now, Proposition 2.6 and the inequality in (2.6) imply that P(λ) is completely monotone if λ ≤ 3/(2k + 1) for all k ∈ N, that is, if λ = 0. This finishes the proof.

Some conclusions
In this article we give a necessary and sufficient on λ for P(λ) to be k-monotone for a given integer k ≥ 1 and a formal proof thereof. It turned out that kmonotonicity of this distribution holds if and only if λ is smaller or equal than λ k , the first root of the Laguerre polynomial L k . The values reported for k ∈ {2, 3, 4, 5} in p.14 of Giguelay (2017) and k ∈ {2, 3, . . . , 10} in Table 1 on p. 99 of Giguelay and Huet (2018) match exactly with the values of λ k given in our Table 3. However, the theoretical statement of Property 8 has a typographical error because it involves the first root of a polynomial that is different from L (0) k . The main result of Proposition 2.6 shows also that λ ∈ (λ k+1 , λ k ] is the characterizing condition for P(λ) to be exactly k-monotone; i.e., k-but not (k + 1)-monotone. This condition gives a full theoretical justification for the choice of λ in the simulation study in Giguelay and Huet (2018). We would like to note that the result of Proposition 2.6 is in a perfect agreement with the intuition: as k grows the value of the k-monotone pmf at 0 should increase, and hence the rate λ should decrease with k. It is also worthy to note that P(λ) is not k-monotone for any k ≥ 1 if λ ∈ (1, ∞). For that range, the Poisson model is log-concave with mode location different from 0. Thus, the adaptive shape of the Poisson distributions and related models can be used differently depending on the statistical application. Chee and Wang (2016) and Giguelay and Huet (2018) both considered kmonotone pmfs to model the distribution of species abundances with the aim of estimating species richness of a given population. Their approaches to kmonotone estimation are different since Chee and Wang (2016) considered maximum likelihood whereas the method implemented by Giguelay (2017) was based on least squares estimation. Also, Chee and Wang (2016) considered the cone generated by the splines s θ (i) = (θ − i) k−1 + / j≤θ j k−1 for i ∈ N, θ ∈ [1, ∞), instead of the family of splines (Q k j ) j≥0 given in (1.2), which is known to generate the whole k-monotone class of pmfs. If it is easy to show that s θ is k-monotone for any θ ∈ [1, ∞), it is less clear whether these splines also generate all kmonotone pmfs. Chee and Wang (2016) showed that their k-monotone maximum likelihood estimation improves upon the one based on mixture of Poisson distributions under a variety of scenarios; see their Table 3. According to Corollary 2.7 mixtures of Poisson distributions are actually part of the k-monotone model provided that the range of the rate is confined to the interval [0, λ k ]. This condition is only sufficient. In fact, in the simplest situation of a mixture of two Poisson distributions with probability mass function πμ i 1 e −μ1 /i! + (1 − π)μ i 2 e −μ2 /i!, i ∈ N, we can check that this pmf is monotone non-increasing (1-monotone) if π = 1/2, μ 1 = 0.5 and μ 2 = 2. In this example, the support of the mixing distribution is L k+1 (λ) = (2k + 1 − λ)L k (λ) − kL k−1 (λ) k + 1 for all λ ∈ [0, ∞). Thus, by definition of λ k we have that where the negative sign is a consequence of the fact that it is assumed that λ k−1 > λ k , which means that L k−1 ≥ 0 on [0, λ k ]. As L k+1 ≥ 0 on [0, λ k+1 ] the inequality L k+1 (λ k ) < 0 implies that we necessarily have λ k > λ k+1 . This completes the proof.
The following theorem is due to Gatteschi (2002).