A Novel Zero-Truncated Katz Distribution by the Lagrange Expansion of the Second Kind with Associated Inferences

In this article, the Lagrange expansion of the second kind is used to generate a novel zero-truncated Katz distribution, which we refer to as the Lagrangian zero-truncated Katz distribution (LZTKD). Notably, the zero-truncated Katz distribution is a special case of this distribution. Along with the closed-form expressions of its statistical characteristics, the LZTKD is shown to provide an adequate model for both underdispersed and overdispersed zero-truncated count datasets. Specifically, we show that the associated hazard rate function has increasing, decreasing, bathtub, or upside-down bathtub shapes. Moreover, we demonstrate that the LZTKD belongs to the Lagrangian distribution of the first kind. Then, applications of the LZTKD in statistical scenarios are explored. The unknown parameters are estimated using the well-established method of maximum likelihood. In addition, the generalized likelihood ratio test procedure is applied to test the significance of the additional parameter. In order to evaluate the performance of the maximum likelihood estimates, simulation studies are also conducted. The use of real-life datasets further highlights the relevance and applicability of the proposed model.


Introduction
In probability theory, positive discrete distributions called "zero-truncated distributions" are used to model data that exclude zero counts. Examples include the number of times a voter casts a ballot during the general election, the number of journal articles published in various disciplines, the number of stressful events reported by patients, and the length of a hospital stay, which must be at least one day. Various zero-truncated discrete distributions, such as the zero-truncated Poisson distribution (ZTPD) (see [1]), zero-truncated negative-binomial distribution (see [2]), zero-truncated Katz distribution (ZTKD) (see [3]), zero-truncated generalized negative-binomial distribution (ZTGNBD) (see [4]), zero-truncated generalized Poisson distribution (see [5]), intervened Poisson distribution (IPD) (see [6]), intervened generalized Poisson distribution (IGPD) (see [7]), a generalization of the Poisson-Sujatha distribution (AGPSD) (see [8]), and zero-truncated discrete Lindley distribution (ZTDLD) (see [9]), have been proposed in the literature to model such count data. Despite the abundance of practical situations involving count data without a zero category, zero-truncated discrete distributions remain notably sparse in the scientific literature, in contrast to the vast number of classical discrete distributions.
Since the early 1970s, researchers studying discrete distributions seem to have focused more on "Lagrangian distributions", so named because they are connected to the Lagrange expansions (see [10,11]). The authors in [12] considered the possibility of using Lagrangian distributions to address inferential problems in random mapping theory. A study in [13] showed that, in certain circumstances, all the discrete Lagrangian distributions converged to the Gaussian distribution and the inverse Gaussian distribution. The authors in [14] proposed certain mixture distributions based on Lagrangian distributions. Recently, Lagrangian distributions were used for turbulent collisional fluid-particle flows (see [15]). A unified method for creating the class of "quasi" distributions, which includes the quasi-binomial, quasi-Polya, quasi-hypergeometric, and several new quasi-distributions, was presented in [16] using the Lagrange expansions. As a result, distributions arising from Lagrange expansions have gained traction from both theoretical and applied perspectives.
The class of Lagrangian distributions was first divided into the Lagrangian distributions of the first kind (LD$_1$) and the Lagrangian distributions of the second kind (LD$_2$). The authors in [13] were the first to present and study the LD$_1$. Several Lagrangian distributions have been constructed using the LD$_1$, but four fundamental distributions, namely the generalized negative binomial distribution, the generalized geometric series distribution, the generalized Poisson distribution, and the generalized logarithmic series distribution, are of particular note and have proven to be very useful in practical applications (see [4]). The authors in [17] defined a Lagrangian Katz distribution (LKD) using the LD$_1$. The author in [18] showed that the LKD was a subclass of the generalized Polya-Eggenberger family of distributions. The authors in [19] obtained the LKD as a limiting distribution of the Markov-Polya distribution. The authors in [20] discussed the application of the LKD to time series data.
On the other hand, the authors in [21,22] conducted extensive research on the LD$_2$. The Geeta distribution and its characteristics were derived in [23] based on the LD$_2$. The authors in [24] proposed the Dev distribution and some of its applications in queuing theory by using the LD$_2$. Ref. [25] proposed the Harish distribution and inferred some of its characteristics, with applications in the branching process and queuing theory based on the LD$_2$. Furthermore, the authors in [18] also used the LD$_2$ to create the generalized LKD of type two. The competence of the distributions based on the LD$_2$ strongly attracted our team, and as a result, we suggested the Lagrangian versions of the ZTPD, the zero-truncated binomial distribution, and the IPD (see [26-28]). Moreover, the authors in [24] demonstrated that every member of the LD$_2$ was also a member of the LD$_1$. Thus, we observed from the literature that several members of both the LD$_1$ and the LD$_2$ were based on variants of classical discrete distributions that have been thoroughly explored. Analogously, we were motivated to address the sparseness of zero-truncated discrete distributions by considering the probability-generating function (PGF) of the ZTKD and generalizing it through the LD$_2$; we named the new distribution the LZTKD.
An overview of the remaining study sections is provided below: Section 2 provides a brief summary of the Lagrange expansions. The construction of the LZTKD and its statistical features are explored in Section 3 and Section 4, respectively. In Section 5, it is established that the LZTKD belongs to the LD 1 class. In Section 6, the maximum likelihood (ML) estimation approach is employed to explore the parameter estimation of the LZTKD. The significance of the additional parameter in the LZTKD is evaluated using the likelihood ratio test in Section 7. The simulation results based on the maximum likelihood estimates (MLEs) are included in Section 8. Section 9 provides an empirical illustration of the LZTKD, and Section 10 concludes the article.

Some Basic Preliminary Results
In this section, we review some fundamental concepts, such as the Lagrange expansions underlying the LD$_1$ and LD$_2$, as well as some distributions belonging to the LD$_1$ and LD$_2$ that have already been published in the literature.

Lagrange Expansions
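Let us first present the Lagrange expansions described in [10,11]. These expansions are described as

$$k_2(z) = k_2(0) + \sum_{y=1}^{\infty} \frac{u^y}{y!}\, D^{y-1}\left[ (k_1(z))^y\, k_2'(z) \right]_{z=0} \qquad (1)$$

and

$$\frac{k_2(z)}{1 - z\,k_1'(z)/k_1(z)} = \sum_{y=0}^{\infty} \frac{u^y}{y!}\, D^{y}\left[ (k_1(z))^y\, k_2(z) \right]_{z=0}, \qquad (2)$$

where $D^r = \partial^r/\partial z^r$ and $z = u\,k_1(z)$, under the conditions that $k_1(z)$ and $k_2(z)$ are two analytic functions of $z$ in $[-1, 1]$, which are successively differentiable with respect to $z$ and such that $k_1(0) \neq 0$.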
These expansions are at the basis of our findings.

Lagrangian Distribution of the First Kind
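Along with the Lagrange expansion given in Equation (1), under the following additional conditions: $k_1(1) = k_2(1) = 1$, $k_2(0) \geq 0$, and $D^{y-1}\left[(k_1(z))^y k_2'(z)\right]_{z=0} \geq 0$ for $y = 1, 2, \ldots$, so that the coefficients of $u^y$ for $y = 0, 1, 2, \ldots$ in Equation (1) are non-negative, we can define the probability mass function (PMF) of the LD$_1$ as

$$P(Y = y) = \begin{cases} k_2(0), & y = 0, \\ \dfrac{1}{y!}\, D^{y-1}\left[ (k_1(z))^y\, k_2'(z) \right]_{z=0}, & y = 1, 2, \ldots \end{cases} \qquad (3)$$

The class of Lagrangian distributions given in Equation (3) is sometimes denoted as LD$_1$($k_1(z)$, $k_2(z)$). The corresponding PGF of the PMF given in Equation (3) is

$$G(u) = k_2(z),$$

where $u = z/k_1(z)$. The functions $k_1(z)$ and $k_2(z)$ are called the transformed function and transformer function, respectively. Some important members belonging to the LD$_1$ available in the literature are discussed below.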

Generalized Katz Distribution
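A special case of the LD$_1$ includes the generalized Katz distribution (GKD) given in [4]. It is generated through the PGF of the Katz distribution (KD). That is, the PMF of the GKD is obtained by applying $k_1(z) = \left(\frac{1-\alpha}{1-\alpha z}\right)^{\beta/\alpha}$ and $k_2(z) = \left(\frac{1-\alpha}{1-\alpha z}\right)^{\gamma/\alpha}$ in Equation (3). Hence, it is given by

$$P(Y = y) = \frac{\gamma/\alpha}{\gamma/\alpha + \beta y/\alpha + y} \binom{\gamma/\alpha + \beta y/\alpha + y}{y}\, \alpha^{y} (1-\alpha)^{\gamma/\alpha + \beta y/\alpha}, \quad y = 0, 1, 2, \ldots,$$

where $\binom{x}{y}$ is the generalized binomial coefficient, that is, $\binom{x}{y} = \frac{x(x-1)\cdots(x-y+1)}{y!}$, $\gamma > 0$, $0 < \alpha < 1$, and $\beta > 0$.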

Lagrangian Distribution of the Second Kind
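Along with the Lagrange expansion given in Equation (2), under the conditions $k_1(1) = k_2(1) = 1$, $k_1'(1) < 1$, and $D^{y}\left[(k_1(z))^y k_2(z)\right]_{z=0} \geq 0$ for $y = 0, 1, \ldots$ in Equation (2), we can define the PMF of the LD$_2$ (see [21,29]). Explicitly, it is given by

$$P(Y = y) = \frac{1 - k_1'(1)}{y!}\, D^{y}\left[ (k_1(z))^y\, k_2(z) \right]_{z=0}, \quad y = 0, 1, 2, \ldots \qquad (4)$$

The class of Lagrangian distributions given in Equation (4) is sometimes denoted as LD$_2$($k_1(z)$, $k_2(z)$). The corresponding PGF is given by

$$G(u) = \frac{\left(1 - k_1'(1)\right) k_2(z)}{1 - z\,k_1'(z)/k_1(z)}, \qquad (5)$$

where $u = z/k_1(z)$.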
In this case, the functions k 1 (z) and k 2 (z) are also called the transformed function and transformer function, respectively. Numerous members of the LD 2 are available in the literature, some of them are described below.

Weighted Consul Distribution
A special case of the LD 2 includes the weighted Consul distribution (WCD) given in [4], which is generated through the PGF of the binomial distribution and an analytic function. That is, the PMF of the WCD is obtained by applying k 1 (z) = z and k 2 (z) = (1 − α + αz) β in Equation (4). It is given as where 0 < α < 1 and β < α −1 .

Rectangular-Poisson Distribution
A special case of the LD 2 includes the rectangular-Poisson distribution (RPD) given in [4], which is generated through the PGF of the rectangular distribution and the PGF of the Poisson distribution. That is, the PMF of the RPD is obtained by applying k 1 (z) = e α(z−1) and k 2 (z) = 1−z n n(1−z) in Equation (4). Hence, it is expressed as where n > 0 is an integer, a = min(y, n − 1), 0 < α < 1 .

Rectangular-Binomial Distribution
The rectangular-binomial distribution (RBD) given in [4] is a special case of the LD 2 , which is generated by the PGF of the binomial and rectangular distributions, respectively. That is, the PMF of the RBD is obtained by applying k 1 (z) = (1 − α + αz) β and k 2 (z) = 1−z n n(1−z) in Equation (4). It is thus obtained as where n > 0 is an integer, a = min(y, n − 1), 0 < α < 1, and β < α −1 .
Given the applications of the Lagrangian distributions generated with various PGFs, it is worthwhile to investigate further Lagrangian distributions that make use of new PGFs. This motivates the distribution proposed in this study, which is presented below.

Lagrangian Zero-Truncated Katz Distribution (LZTKD)
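In this section, we adopt the PMF of the LD$_2$ given in Equation (4) to derive the PMF of the LZTKD. Here, we consider $k_1(z)$ as the PGF of the KD with parameters $0 < \alpha < 1$ and $\beta < 1 - \alpha$, and $k_2(z)$ as the PGF of the ZTKD with parameters $0 < \alpha < 1$ and $\gamma > 0$ to generate the LZTKD.
That is, we take

$$k_1(z) = \left(\frac{1-\alpha}{1-\alpha z}\right)^{\beta/\alpha}, \qquad k_2(z) = \frac{(1-\alpha z)^{-\gamma/\alpha} - 1}{(1-\alpha)^{-\gamma/\alpha} - 1}. \qquad (6)$$

The analytic functions given in Equation (6) satisfy the conditions presented in Section 2.3. That is, we have $k_1(1) = k_2(1) = 1$, $k_1(0) = (1-\alpha)^{\beta/\alpha} \neq 0$, and $k_1'(1) = \beta/(1-\alpha) < 1$. Then, under the transformation $z = u \left(\frac{1-\alpha}{1-\alpha z}\right)^{\beta/\alpha}$, the PMF of the LD$_2$ given in Equation (4) can be derived as follows:

$$P(Y = y) = \frac{1 - \frac{\beta}{1-\alpha}}{y!} \cdot \frac{(1-\alpha)^{\beta y/\alpha}}{(1-\alpha)^{-\gamma/\alpha} - 1}\, D^{y}\left[ (1-\alpha z)^{-(\beta y + \gamma)/\alpha} - (1-\alpha z)^{-\beta y/\alpha} \right]_{z=0}.$$

Hence, the definition of the LZTKD can be formalized as follows: Assume that a random variable (RV) $Y$ follows the LZTKD, with $0 < \alpha < 1$, $0 < \beta < 1 - \alpha$, and $\gamma > 0$. Then, the PMF of $Y$ is given by

$$f(y) = \frac{(1-\alpha-\beta)\,(1-\alpha)^{\beta y/\alpha - 1}\,\alpha^{y}}{(1-\alpha)^{-\gamma/\alpha} - 1} \left[ \binom{\frac{\beta y + \gamma}{\alpha} + y - 1}{y} - \binom{\frac{\beta y}{\alpha} + y - 1}{y} \right], \qquad (7)$$

with $y = 1, 2, 3, \ldots$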
This distribution is denoted as LZTKD(α, β, γ), and one can write Y ∼ LZTKD(α, β, γ) to indicate that Y follows the LZTKD with the parameters α, β, and γ. Figure 1 portrays the PMF of the LZTKD for different values of α, β, and γ. We see that the PMF is monotonically decreasing in y when the parameters α and γ increase and the parameter β decreases. In addition, the PMF takes on a bell-shaped appearance if both α and γ increase while β remains constant.
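As a numerical companion to this definition, the following minimal Python sketch evaluates the PMF. It assumes that $k_1(z) = ((1-\alpha)/(1-\alpha z))^{\beta/\alpha}$ and that $k_2(z)$ is the ZTKD PGF, in which case the LD$_2$ formula gives $f(y) = \frac{(1-\alpha-\beta)(1-\alpha)^{\beta y/\alpha - 1}\alpha^{y}}{(1-\alpha)^{-\gamma/\alpha}-1}\left[\binom{(\beta y+\gamma)/\alpha + y - 1}{y} - \binom{\beta y/\alpha + y - 1}{y}\right]$. The function names `log_rising_binom` and `lztkd_pmf` are illustrative and are not taken from the article or its appendix.

```python
import math

def log_rising_binom(c, y):
    # log of the generalized binomial coefficient C(c + y - 1, y), valid for c > 0
    return math.lgamma(c + y) - math.lgamma(y + 1) - math.lgamma(c)

def lztkd_pmf(y, alpha, beta, gamma):
    """Assumed closed form of P(Y = y); needs 0 < alpha < 1, 0 <= beta < 1 - alpha, gamma > 0."""
    if y < 1:
        return 0.0  # zero-truncated support: y = 1, 2, 3, ...
    c1 = (beta * y + gamma) / alpha
    c2 = beta * y / alpha
    log_pref = (math.log(1 - alpha - beta)
                + (c2 - 1) * math.log(1 - alpha)
                + y * math.log(alpha)
                - math.log((1 - alpha) ** (-gamma / alpha) - 1))
    first = math.exp(log_pref + log_rising_binom(c1, y))
    # For beta = 0 the second coefficient is C(y - 1, y) = 0, so the term vanishes.
    second = math.exp(log_pref + log_rising_binom(c2, y)) if c2 > 0 else 0.0
    return first - second

# Sanity check: the probabilities should add up to (approximately) one.
total = sum(lztkd_pmf(y, 0.35, 0.09, 3.12) for y in range(1, 400))
print(total)
```

Working on the log scale with `math.lgamma` avoids overflow in the binomial coefficients for large y; the truncation point 400 is an arbitrary choice that is ample for this parameter set.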
The hazard rate function (HRF) of the LZTKD is obtained by substituting the PMF in the following equation:

$$h(y) = \frac{f(y)}{P(Y \geq y)} = \frac{f(y)}{\sum_{j=y}^{\infty} f(j)}. \qquad (8)$$

From Equation (8), it is clear that a closed-form expression of the HRF is difficult to obtain. However, to determine the shape of the HRF, we sketched its graph. Figure 2 demonstrates that it has increasing, decreasing, bathtub, and upside-down bathtub shapes for various parameter values.
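Since the HRF has no convenient closed form, it can be evaluated numerically. The sketch below uses the standard discrete definition $h(y) = f(y)/P(Y \geq y)$ together with an assumed closed form of the PMF (Katz-type $k_1$ and ZTKD-type $k_2$); `lztkd_pmf` and `lztkd_hazard` are hypothetical helper names.

```python
import math

def lztkd_pmf(y, alpha, beta, gamma):
    # Assumed closed form of the LZTKD PMF (see the assumptions in the lead-in).
    if y < 1:
        return 0.0
    log_c = lambda c: math.lgamma(c + y) - math.lgamma(y + 1) - math.lgamma(c)
    c1, c2 = (beta * y + gamma) / alpha, beta * y / alpha
    log_pref = (math.log(1 - alpha - beta) + (c2 - 1) * math.log(1 - alpha)
                + y * math.log(alpha)
                - math.log((1 - alpha) ** (-gamma / alpha) - 1))
    second = math.exp(log_pref + log_c(c2)) if c2 > 0 else 0.0
    return math.exp(log_pref + log_c(c1)) - second

def lztkd_hazard(y, alpha, beta, gamma):
    # h(y) = f(y) / P(Y >= y), with P(Y >= y) = 1 - sum_{j < y} f(j)
    survival = 1.0 - sum(lztkd_pmf(j, alpha, beta, gamma) for j in range(1, y))
    return lztkd_pmf(y, alpha, beta, gamma) / survival

# Tabulate the hazard for the first few support points of one parameter set.
for y in range(1, 6):
    print(y, lztkd_hazard(y, 0.35, 0.09, 3.12))
```

For large y the subtraction `1.0 - sum(...)` loses precision, so this direct form is only suitable for the moderate range over which the shape is usually inspected.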
Proposition. For β = 0, the LZTKD defined with the PMF given in Equation (7) reduces to the ZTKD. In this sense, the LZTKD is a generalization of the ZTKD.
Proof. For β = 0 in Equation (6), the PMF of the LD 2 given in Equation (4) can be rederived as follows: which is the PMF of the ZTKD given in [3]. The proof is completed.
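The reduction to the ZTKD can be written out step by step. The following worked equations are a sketch under the assumption that the LZTKD PMF takes the closed form obtained from the LD$_2$ formula with $k_1(z) = ((1-\alpha)/(1-\alpha z))^{\beta/\alpha}$ and the ZTKD PGF as $k_2(z)$; the arrangement of terms may differ from the one used by the authors.

```latex
\begin{align*}
% Assumed closed form of the LZTKD PMF
f(y) &= \frac{(1-\alpha-\beta)\,(1-\alpha)^{\beta y/\alpha - 1}\,\alpha^{y}}
             {(1-\alpha)^{-\gamma/\alpha}-1}
        \left[ \binom{\frac{\beta y+\gamma}{\alpha}+y-1}{y}
             - \binom{\frac{\beta y}{\alpha}+y-1}{y} \right],
        \quad y = 1, 2, 3, \ldots \\[4pt]
% Setting beta = 0: the prefactor simplifies and the second coefficient vanishes
(1-\alpha-0)\,(1-\alpha)^{-1} &= 1,
\qquad
\binom{y-1}{y} = \frac{(y-1)(y-2)\cdots 1 \cdot 0}{y!} = 0 \quad (y \geq 1), \\[4pt]
% Resulting PMF
f(y)\Big|_{\beta = 0} &= \frac{\alpha^{y}}{(1-\alpha)^{-\gamma/\alpha}-1}
        \binom{\frac{\gamma}{\alpha}+y-1}{y},
        \quad y = 1, 2, 3, \ldots \\[4pt]
% Normalization check via the negative binomial series
\sum_{y=1}^{\infty} \binom{\frac{\gamma}{\alpha}+y-1}{y}\,\alpha^{y}
      &= (1-\alpha)^{-\gamma/\alpha} - 1 .
\end{align*}
```

The last line confirms that the reduced PMF sums to one, as expected for a Katz-type distribution with the zero class removed.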

Mathematical Properties
In this section, we present some important mathematical properties of the LZTKD, including the median, mode, factorial moments, mean, variance, coefficient of variation (CV), index of dispersion (IOD), skewness, and kurtosis.

Median
Let Y be a RV following the LZTKD. The median of Y is then defined as the smallest integer $k \in \{1, 2, 3, \ldots\}$ such that $P(Y \leq k) \geq \frac{1}{2}$, also written as

$$\sum_{y=1}^{k} f(y) \geq \frac{1}{2}. \qquad (9)$$

Mode
Let Y be a RV following the LZTKD. Then, the mode of Y, denoted by y m , exists in {1, 2, 3 . . . }. It corresponds to the integer y for which the PMF f (y) has the greatest value. That is, we aim to solve f (y) ≥ f (y − 1) and f (y) ≥ f (y + 1). First, we note that f (y) can also be written as Moreover, the inequality f (y) ≥ f (y + 1) implies that By combining Equations (10) and (11), we obtain the following condition:
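Both the median and the mode can be located by a direct scan over the support. The sketch below does this numerically, assuming the closed-form PMF implied by a Katz-type $k_1$ and a ZTKD-type $k_2$; the helper names and the scan limit `y_max` are illustrative choices, not part of the article.

```python
import math

def lztkd_pmf(y, alpha, beta, gamma):
    # Assumed closed form of the LZTKD PMF (see the assumptions in the lead-in).
    if y < 1:
        return 0.0
    log_c = lambda c: math.lgamma(c + y) - math.lgamma(y + 1) - math.lgamma(c)
    c1, c2 = (beta * y + gamma) / alpha, beta * y / alpha
    log_pref = (math.log(1 - alpha - beta) + (c2 - 1) * math.log(1 - alpha)
                + y * math.log(alpha)
                - math.log((1 - alpha) ** (-gamma / alpha) - 1))
    second = math.exp(log_pref + log_c(c2)) if c2 > 0 else 0.0
    return math.exp(log_pref + log_c(c1)) - second

def lztkd_median(alpha, beta, gamma, y_max=100000):
    # Smallest k in {1, 2, ...} with P(Y <= k) >= 1/2
    cumulative = 0.0
    for k in range(1, y_max + 1):
        cumulative += lztkd_pmf(k, alpha, beta, gamma)
        if cumulative >= 0.5:
            return k
    raise ValueError("median not reached within y_max")

def lztkd_mode(alpha, beta, gamma, y_max=500):
    # Integer in 1..y_max with the largest PMF value (first one in case of ties)
    return max(range(1, y_max + 1), key=lambda y: lztkd_pmf(y, alpha, beta, gamma))

print(lztkd_median(0.35, 0.09, 3.12), lztkd_mode(0.35, 0.09, 3.12))
```

A scan is adequate here because the support is discrete and the probabilities decay quickly; the inequalities $f(y) \geq f(y-1)$ and $f(y) \geq f(y+1)$ are implicitly satisfied by the maximizer.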

Probability Generating Function
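The Lagrangian transformation $z = u \left(\frac{1-\alpha}{1-\alpha z}\right)^{\beta/\alpha}$, when expanded in powers of $u$, provides the PGF of the LD$_2$ given in Equation (5). That is,

$$G(u) = \frac{(1-\alpha-\beta)(1-\alpha z)}{(1-\alpha)\left(1-(\alpha+\beta) z\right)} \cdot \frac{(1-\alpha z)^{-\gamma/\alpha} - 1}{(1-\alpha)^{-\gamma/\alpha} - 1}, \qquad (13)$$

where $z = u \left(\frac{1-\alpha}{1-\alpha z}\right)^{\beta/\alpha}$.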

Remark 1.
The moment-generating function (MGF) of a RV Y following the LZTKD is obtained by putting z = e s and u = e v in Equation (13). This yields where

Distribution of Sample Sum
Let Y 1 , Y 2 , . . . , Y n be n independently and identically distributed (iid) RVs following the LZTKD. Then, the distribution of the sample sum W = ∑ n i=1 Y i has the following PGF: Indeed, based on the PGF of the LZTKD given in Equation (13), the PGF of the RV W becomes

Factorial Moment
For any integer r ≥ 1, the rth factorial moments µ [r] of the LZTKD is calculated by successively differentiating G(u) in Equation (4) r times with respect to u, and by setting u = z = 1. Thus, we consider Taking the first derivative with respect to u on both sides, we obtain Then, taking second derivative with respect to u, we obtain Proceeding like this, we obtain an rth derivative of the following form: For u = z = 1, Equation (15) can be written as We have k 1 (z) = 1−αz , which are substituted in Equation (16) to yield

Mean and Variance
The mean (µ 1 ) and variance (σ 2 ) for the LZTKD are now determined. Using Equation (17), we have

Index of Dispersion and Coefficient of Variation
A normalized measure of dispersion can be obtained by using the variance-to-mean relationship. This measure, the well-known IOD, is given by $\mathrm{IOD} = \sigma^2/\mu_1$. Analogously, the CV of the RV Y has the following form: $\mathrm{CV} = \sigma/\mu_1$. The skewness and kurtosis coefficients of a distribution are frequently used to measure the degree of asymmetry and flatness, respectively. These coefficients are essential to characterize the shape of any distribution, but for the LZTKD, the expressions obtained for such measures were too lengthy to report. However, they can be calculated numerically. They are given in Table 1, as well as the mean, variance, CV, and IOD for particular values of the parameters.
It is clear from this table that for α > 0 and β > 0, the LZTKD exhibits overdispersion (IOD > 1) and for α → 0 and β → 0, the LZTKD exhibits underdispersion (IOD < 1). When the value of the parameter γ increases, the mean and variance of the LZTKD increase. Moreover, it is noteworthy that the LZTKD has various kurtosis levels and is mainly right-skewed.
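The summary measures discussed above can be approximated by truncated sums over the support. The following sketch computes the mean, variance, IOD, and CV numerically under the assumed closed-form PMF (Katz-type $k_1$, ZTKD-type $k_2$); the function names and the truncation point `y_max` are assumptions made for illustration only.

```python
import math

def lztkd_pmf(y, alpha, beta, gamma):
    # Assumed closed form of the LZTKD PMF (see the assumptions in the lead-in).
    if y < 1:
        return 0.0
    log_c = lambda c: math.lgamma(c + y) - math.lgamma(y + 1) - math.lgamma(c)
    c1, c2 = (beta * y + gamma) / alpha, beta * y / alpha
    log_pref = (math.log(1 - alpha - beta) + (c2 - 1) * math.log(1 - alpha)
                + y * math.log(alpha)
                - math.log((1 - alpha) ** (-gamma / alpha) - 1))
    second = math.exp(log_pref + log_c(c2)) if c2 > 0 else 0.0
    return math.exp(log_pref + log_c(c1)) - second

def lztkd_summary(alpha, beta, gamma, y_max=2000):
    # Truncated sums for the first two raw moments; y_max must cover the bulk of the mass.
    probs = [(y, lztkd_pmf(y, alpha, beta, gamma)) for y in range(1, y_max + 1)]
    mean = sum(y * p for y, p in probs)
    second_moment = sum(y * y * p for y, p in probs)
    variance = second_moment - mean ** 2
    return {
        "mean": mean,
        "variance": variance,
        "IOD": variance / mean,            # index of dispersion
        "CV": math.sqrt(variance) / mean,  # coefficient of variation
    }

print(lztkd_summary(0.35, 0.09, 3.12))
```

Parameter sets with β close to 1 − α have slowly decaying tails, so `y_max` may need to be increased there before the truncated sums are trusted.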
Proof. For the PMF of the LD 1 given in Equation (3) which belongs to the LD 1 .
For the PMF of the LD 2 given in Equation (4), we have This completes the proof.
To show that the LZTKD belongs to the LD$_1$, we adopt the following equivalence theorem given in [24], also discussed in [4].
Theorem 2. Let $k_1(z)$, $k_2(z)$, and $k_3(z)$ be three analytic functions, which are successively differentiable for $|z| \leq 1$ and such that $k_1(0) \neq 0$ and $k_1(1) = k_2(1) = k_3(1) = 1$. Then, under the transformation $z = u\,k_1(z)$, every member of the LD$_2$ is a member of the LD$_1$ by choosing

$$k_2(z) = \left(1 - k_1'(1)\right) \left(1 - \frac{z\,k_1'(z)}{k_1(z)}\right)^{-1} k_3(z).$$

Proof. The proof is not new; it is given in [4] and hence omitted.
Proof. The LZTKD belongs to the LD 1 by choosing Proof. For the LD 2 (k 1 (z), k 3 (z)), the PMF can be rewritten as It is the same PMF as the one of the zero-truncated generalized Katz distribution (ZTGKD). It is given in [4] and belongs to the LD 1 .

Estimation of the Parameters
In this section, we estimate the unknown parameters of the LZTKD by the ML method.
As a first remark, the model related to the LZTKD is a three-parameter model with parameters α, β, and γ. Consider a random sample of size $n$ from the LZTKD and let the observed frequencies be $n_y$, $y = 1, 2, 3, \ldots, k$, so that $\sum_{y=1}^{k} n_y = n$, where $k$ is the largest observed value having a nonzero frequency. Then, the corresponding likelihood function is given by

$$L = \prod_{y=1}^{k} \left[f(y)\right]^{n_y},$$

where $f(y)$ is the PMF given in Equation (7). Thus, the log-likelihood function is obtained as

$$L_n = \log L = \sum_{y=1}^{k} n_y \log f(y),$$

in which the terms that are linear in $y$ can be collected through the sample mean $\bar{y} = \frac{1}{n} \sum_{y=1}^{k} y\, n_y$. The maximization of $L_n$ with respect to the parameters gives their respective MLEs. They can also be obtained by considering the following differentiation approach. The score function associated with this log-likelihood function is $\left(\frac{\partial L_n}{\partial \alpha}, \frac{\partial L_n}{\partial \beta}, \frac{\partial L_n}{\partial \gamma}\right)^{T}$. Now, by setting $\frac{\partial L_n}{\partial \alpha} = 0$, $\frac{\partial L_n}{\partial \beta} = 0$, and $\frac{\partial L_n}{\partial \gamma} = 0$ simultaneously, we obtain the associated nonlinear likelihood equations, whose solutions give the MLEs.
In this research, we maximized the log-likelihood function numerically to find the MLEs. The fitdistrplus package of the R software was used, with a lower and an upper bound fixed for each parameter through the numerical optimization technique "L-BFGS-B" (see [30]). When there are uncertainties about the initial guesses and the convergence of the algorithm, fitdistrplus is a highly useful tool for obtaining the MLEs. In order to provide the algorithm with good starting values, we employed the prefit function of that package. Convergence is indicated by an integer code among the components returned by the mledist function, with "0" denoting successful convergence and "1" denoting that the maximum number of iterations was reached. In addition, a value of "10" indicates degeneracy of the algorithm, and a value of "100" indicates that the algorithm encountered an internal error. Further information about this package is available at https://CRAN.R-project.org/package=fitdistrplus (accessed on 3 January 2023). The corresponding R code is given in Appendix A.
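To make the objective of the numerical optimization concrete, the sketch below evaluates the log-likelihood from a frequency table. It is not the authors' R code from Appendix A: it is a Python illustration that assumes the closed-form PMF implied by a Katz-type $k_1$ and a ZTKD-type $k_2$, and the names `lztkd_pmf` and `lztkd_loglik` as well as the toy frequency table are hypothetical. A bounded optimizer such as L-BFGS-B would then be applied to the negative of this function.

```python
import math

def lztkd_pmf(y, alpha, beta, gamma):
    # Assumed closed form of the LZTKD PMF (see the assumptions in the lead-in).
    if y < 1:
        return 0.0
    log_c = lambda c: math.lgamma(c + y) - math.lgamma(y + 1) - math.lgamma(c)
    c1, c2 = (beta * y + gamma) / alpha, beta * y / alpha
    log_pref = (math.log(1 - alpha - beta) + (c2 - 1) * math.log(1 - alpha)
                + y * math.log(alpha)
                - math.log((1 - alpha) ** (-gamma / alpha) - 1))
    second = math.exp(log_pref + log_c(c2)) if c2 > 0 else 0.0
    return math.exp(log_pref + log_c(c1)) - second

def lztkd_loglik(params, freq):
    """Log-likelihood sum_y n_y * log f(y) for a table freq = {y: n_y}."""
    alpha, beta, gamma = params
    # Outside the parameter space the likelihood is treated as zero.
    if not (0 < alpha < 1 and 0 <= beta < 1 - alpha and gamma > 0):
        return float("-inf")
    return sum(n_y * math.log(lztkd_pmf(y, alpha, beta, gamma))
               for y, n_y in freq.items() if n_y > 0)

# Toy frequency table (illustrative values only, not data from the article).
toy_freq = {1: 5, 2: 3, 3: 2}
print(lztkd_loglik((0.35, 0.09, 3.12), toy_freq))
```

Returning negative infinity outside the parameter space mirrors the role of the lower and upper bounds in a box-constrained optimizer.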

Likelihood Ratio Test
In this section, we test the significance of an additional parameter included in the LZTKD using the generalized likelihood ratio test (GLRT) (see [31]).
More precisely, to test the significance of the parameter β of the LZTKD(α, β, γ), we consider the GLRT procedure. The null hypothesis is $H_0$: Y follows the ZTKD, against the alternative hypothesis $H_1$: Y follows the LZTKD. In this setting, the test statistic is given by

$$-2 \log \Lambda = 2\left[\log L(\hat{\Theta}; y) - \log L(\hat{\Theta}^{*}; y)\right], \qquad (19)$$

where $\hat{\Theta}$ is the vector of MLEs of $\Theta = (\alpha, \beta, \gamma)$ with no constraints, and $\hat{\Theta}^{*}$ is the vector of MLEs of $\Theta$ under $H_0$. The test statistic presented in Equation (19) is asymptotically distributed as the $\chi^2$ distribution with one degree of freedom.
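The arithmetic of the test can be sketched in a few lines. The example below computes the statistic from two maximized log-likelihood values and its asymptotic p-value, using the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$; the function name `glrt_one_df` and the input values are hypothetical and are not results from the article.

```python
import math

def glrt_one_df(loglik_full, loglik_null):
    """Likelihood ratio statistic and chi-square(1) p-value.

    loglik_full: maximized log-likelihood under H1 (unconstrained model)
    loglik_null: maximized log-likelihood under H0 (beta fixed at 0)
    """
    statistic = 2.0 * (loglik_full - loglik_null)
    # Survival function of chi-square with one degree of freedom:
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value

# Hypothetical log-likelihood values chosen only to illustrate the call.
stat, p = glrt_one_df(-250.0, -253.0)
print(stat, p)
```

The `max(statistic, 0.0)` guard only protects against tiny negative values caused by imperfect numerical maximization; a clearly negative statistic would indicate that the unconstrained fit did not converge.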

Simulation Study
To evaluate the performance of the estimates obtained using the ML estimation approach, we ran a brief simulation exercise in this section. We simulated LZTKD random samples using the inverse transformation method (see [32]). The following is the inverse transform algorithm for generating a value from the LZTKD:
Step 1: Generate a random number U from the uniform U(0, 1) distribution.
Step 2: Set i = 1, P = P(X = 1), and F = P.
Step 3: If U < F, set X = i and stop.
Step 4: Set i = i + 1, P = P(X = i), and F = F + P.
Step 5: Go to Step 3.
In the above description, P is the probability that X = i, and F is the probability that X is less than or equal to i. The iteration process was repeated N = 1000 times and three parameter sets were considered. The specification of these sets was as follows: (i) α = 0.80, β = 0.03, and γ = 0.80. (ii) α = 0.35, β = 0.09, and γ = 3.12.
Thus, we computed the average of the mean square error (MSE), and average absolute bias using the MLEs.
The average absolute bias of the simulated estimates was calculated as $\frac{1}{1000} \sum_{i=1}^{1000} |\hat{\omega}_i - \omega|$ and the average MSE of the simulated estimates was calculated as $\frac{1}{1000} \sum_{i=1}^{1000} (\hat{\omega}_i - \omega)^2$, in which $i$ indexes the iterations, $\omega \in \{\alpha, \beta, \gamma\}$, and $\hat{\omega}_i$ is the MLE of $\omega$ obtained in the $i$th iteration. Table 2 provides a summary of the study for samples of sizes 50, 250, 500, and 1000. As the sample size increases and for the three parameter sets, it can be seen that the MSEs decrease and the MLEs of the parameters move closer to their true values, which is in line with the consistency property.
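The inverse transform algorithm and the two accuracy measures of this section can be sketched as follows. The code assumes the closed-form PMF implied by a Katz-type $k_1$ and a ZTKD-type $k_2$, recomputes P(X = i) at each step rather than using a recursion, and uses hypothetical names (`sample_lztkd`, `average_abs_bias`, `average_mse`); it illustrates the procedure and does not reproduce the authors' simulation code.

```python
import math
import random

def lztkd_pmf(y, alpha, beta, gamma):
    # Assumed closed form of the LZTKD PMF (see the assumptions in the lead-in).
    if y < 1:
        return 0.0
    log_c = lambda c: math.lgamma(c + y) - math.lgamma(y + 1) - math.lgamma(c)
    c1, c2 = (beta * y + gamma) / alpha, beta * y / alpha
    log_pref = (math.log(1 - alpha - beta) + (c2 - 1) * math.log(1 - alpha)
                + y * math.log(alpha)
                - math.log((1 - alpha) ** (-gamma / alpha) - 1))
    second = math.exp(log_pref + log_c(c2)) if c2 > 0 else 0.0
    return math.exp(log_pref + log_c(c1)) - second

def sample_lztkd(alpha, beta, gamma, rng, y_max=100000):
    u = rng.random()                        # Step 1: U ~ U(0, 1)
    i = 1                                   # Step 2: start at the first support point
    f_cum = lztkd_pmf(i, alpha, beta, gamma)
    while u >= f_cum and i < y_max:         # Step 3: stop as soon as U < F
        i += 1                              # Step 4: move on and accumulate
        f_cum += lztkd_pmf(i, alpha, beta, gamma)
    return i                                # y_max guards against rounding in F

def average_abs_bias(estimates, true_value):
    return sum(abs(e - true_value) for e in estimates) / len(estimates)

def average_mse(estimates, true_value):
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

rng = random.Random(2023)
draws = [sample_lztkd(0.35, 0.09, 3.12, rng) for _ in range(1000)]
print(min(draws), sum(draws) / len(draws))
```

In a full study, each simulated sample would be passed to the ML routine and the resulting estimates collected before calling `average_abs_bias` and `average_mse`.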

Presentation
The purpose of this section is to demonstrate the LZTKD's empirical relevance. To this end, two COVID-19 datasets were considered. In the first COVID-19 dataset, daily newly reported cases were included, while in the second COVID-19 dataset, daily deaths were included. Since the outbreak's detection, almost every country has reported at least one new positive case and death each day. To the best of our knowledge, zero-truncated distributions are the most suitable statistical model in this case. In order to show how the LZTKD might be useful, we compared the fits of the various competing distributions, which are presented in Table 3. To evaluate these datasets numerically, we used the R software, version 4.2.1. The HRF of the datasets was determined using a graphical technique based on the total time on test (TTT) plot. If a TTT plot is convex, concave, convex then concave, or concave then convex, the corresponding HRF has a decreasing, increasing, bathtub, or upside-down bathtub shape, respectively (see [33]).
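The empirical TTT plot mentioned above is built from the scaled TTT transform, which for the ordered sample $y_{(1)} \leq \cdots \leq y_{(n)}$ plots $r/n$ against $\left[\sum_{i=1}^{r} y_{(i)} + (n-r)\, y_{(r)}\right] / \sum_{i=1}^{n} y_{(i)}$. The sketch below computes these points; the function name `scaled_ttt` and the small input sample are illustrative assumptions, not material from the article.

```python
def scaled_ttt(data):
    """Return the points (r/n, scaled TTT value) for r = 1, ..., n."""
    ordered = sorted(data)
    n = len(ordered)
    total = sum(ordered)
    points = []
    running = 0.0
    for r, value in enumerate(ordered, start=1):
        running += value  # sum of the r smallest observations
        # Total time on test up to the r-th order statistic, scaled by the grand total
        points.append((r / n, (running + (n - r) * value) / total))
    return points

# Small illustrative sample (hypothetical values).
for x, t in scaled_ttt([2, 10, 6, 9, 12, 4]):
    print(round(x, 3), round(t, 3))
```

Plotting these points against the diagonal then allows the convex or concave pattern described in the text to be read off visually.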

Daily New Cases of COVID-19 Dataset
Here, we considered a dataset of daily newly reported COVID-19 cases from Algeria in North Africa, recorded between 13 June 2022 and 3 October 2022. These data are accessible at http://covid19.who.int/data (accessed on 20 October 2022). The dataset is: 2 10 6 9 12 4 3 4 10 8 13 9 10 5 8 11 13 11 14 18 10 13 19 17 17 21 26 18 11 17 29 25 28

The descriptive measures of this dataset, which include sample size (n), minimum (min), first quartile ($Q_1$), median ($M_d$), third quartile ($Q_3$), maximum (max), and interquartile range (IQR), are given in Table 4. In addition, Figure 3 shows the corresponding empirical TTT plot. It revealed an upside-down bathtub-shaped HRF.

We compared the competitive distributions to the LZTKD employing the following statistical measures: the negative log-likelihood (−log L), Akaike information criterion (AIC), Bayesian information criterion (BIC), and $\chi^2$ value. Table 5 displays the corresponding MLEs, model adequacy measures, and $\chi^2$ values. As can be seen in this table, the model adequacy measures and $\chi^2$ value of the LZTKD are lower than those of the other studied distributions. The suggested model is therefore the most suitable one to model the provided dataset. In the case of the GLRT, the calculated value based on the test statistic given in Equation (19) was 2(−532.3369 + 637.6204) = 210.567 (p-value = 0.03620). As a result, at any level > 0.03620, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter β in the LZTKD is significant in light of the test procedure outlined in Section 7.
In addition, Figure 4 shows an empirical TTT plot for the COVID-19 dataset from Bosnia and Herzegovina and it shows an increasing HRF.
We used well-established statistical measures to compare the competitive distributions to the LZTKD, including the −log L, AIC, BIC, and $\chi^2$ value. Table 7 displays the corresponding MLEs, model adequacy measures, and $\chi^2$ values. It is observed that the LZTKD's model adequacy measures and $\chi^2$ value are lower than those of the other distributions studied. Because of this, the suggested model is the best choice for modeling the considered dataset. In the case of the GLRT, the calculated value based on the test statistic given in Equation (19) was 2(−1422.617 + 1764.195) = 683.156 (p-value = 0.02620). As a result, at any level > 0.02620, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter β in the LZTKD is significant in light of the test procedure outlined in Section 7.

Concluding Remarks
In this article, we proposed a novel zero-truncated Lagrangian distribution called the "LZTKD" using the Lagrange expansion of the second kind. We demonstrated that the ZTKD was a special case of the LZTKD. We examined the shape properties of the PMF and HRF of the LZTKD. The expressions for the factorial moments, generating functions, mean, and median were derived. Using the equivalence theorem of the class of Lagrangian distributions, we demonstrated that the LZTKD belonged to the LD$_1$. Subsequently, the ML method was employed to estimate the model parameters of the LZTKD. Using the GLRT procedure, we tested the significance of the additional parameter included in the LZTKD. Simulation studies were conducted to assess the performance of the MLEs. Two real datasets were used to illustrate the results, which showed that the LZTKD offered a better fit than the competing models. The LZTKD may also act as a baseline distribution for the development of hurdle models. Constructing the bivariate version of the LZTKD and the corresponding regression model would open a new direction for this research. This task requires substantial further work, which we leave for future study.

Acknowledgments: The editors and the anonymous reviewers are to be thanked for their insightful comments, which helped to substantially improve the current version of our work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: