Classical and Bayesian inference for the discrete Poisson Ramos-Louzada distribution with application to COVID-19 data

Abstract: The present study derives a new extension of the Poisson distribution using the Ramos-Louzada distribution. Several statistical properties of the new distribution are derived, including factorial moments, the moment-generating function, probability moments, skewness, kurtosis, and the dispersion index. Some reliability properties are also derived. The model parameter is estimated using several classical estimation techniques, and a comprehensive simulation study is used to identify the best estimation method. Bayesian estimation with a gamma prior is also used to estimate the parameter. Three examples demonstrate the utility of the proposed model. These applications show that the Poisson Ramos-Louzada (PRL) model outperforms several competing one-parameter discrete models, namely the discrete Rayleigh, Poisson, discrete inverted Topp-Leone, discrete Pareto, and discrete Burr-Hatke distributions.


Introduction
Data modeling has become extremely complicated in recent years as a result of the massive amount of data collected from many sectors, mainly engineering, medicine, ecology, and renewable energy. The most popular option for analyzing count datasets is the Poisson distribution. Its drawback is that it cannot represent overdispersed datasets, that is, datasets in which the variance exceeds the mean. For count datasets, many researchers have proposed mixed-Poisson distributions such as the Poisson inverse Gaussian [1], Conway-Maxwell-Poisson [2], generalized Poisson Lindley [3], Poisson Weibull [4], Poisson Ishita [5], Poisson quasi-Lindley [6], Poisson Xgamma [7,8], Poisson XLindley [9], and Poisson moment exponential [10], among others. Even though there are several discrete models in the literature, there is still plenty of room to suggest a new discrete model that performs well under a variety of scenarios.
Let X be a random variable following the Ramos-Louzada (RL) distribution [11], a one-parameter continuous distribution whose probability density function (PDF) is indexed by a scale parameter.
In this study, a new one-parameter discrete distribution for modeling count observations is introduced by compounding the Poisson distribution with the Ramos-Louzada (RL) distribution. The resulting model is called the Poisson Ramos-Louzada (PRL) distribution. The RL distribution is chosen as the compounding distribution mainly because of its simple form, which makes it possible to compute the statistical properties of the proposed distribution and to estimate the unknown parameter. The proposed model may be used to model count datasets, which are frequently seen in real-world data modeling. To build a mixed Poisson model, it is assumed that the Poisson model's parameter is a random variable (RV) with a continuous distribution, and the count variable is drawn from the Poisson distribution conditional on the random parameter. As a result, the count variable's marginal distribution is a mixed Poisson distribution.
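The two-stage construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PRL sampler itself: a gamma mixing distribution with shape 2 and scale 3 is used as a stand-in for the RL distribution, and the helper names are hypothetical. For any mixed Poisson model, Var(X) = E[Λ] + Var(Λ), where Λ is the random rate, so the simulated counts should show a variance well above their mean.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method.

    Suitable for moderate rates only (exp(-rate) must not underflow).
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_mixed_poisson(n, draw_rate, rng):
    """Two-stage draw: a random rate first, then a Poisson count given that rate."""
    return [sample_poisson(draw_rate(rng), rng) for _ in range(n)]


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


rng = random.Random(2023)
# Stand-in mixing distribution (an assumption, not the RL distribution):
# gamma with shape 2 and scale 3, so E[rate] = 6 and Var[rate] = 18.
counts = sample_mixed_poisson(20000, lambda r: r.gammavariate(2.0, 3.0), rng)
mean, variance = mean_and_variance(counts)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, index = {variance / mean:.2f}")
```

Replacing the `draw_rate` argument with a sampler for the RL distribution would give draws from the PRL model; the rest of the sketch would stay the same.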
The remainder of the paper is structured as follows. Section 2 describes the new model and gives graphical representations of its PMF and HRF. Section 3 derives several mathematical properties. Section 4 estimates the PRL parameter using the following classical methods: maximum likelihood estimation (MLE), Anderson-Darling (AD), Cramer-von Mises (CVM), ordinary least squares (OLS), and weighted least squares (WLS); a simulation study is also given. Section 5 discusses the Bayesian model formulation for the proposed distribution. Section 6 examines three real-world datasets to demonstrate the versatility of the PRL distribution and includes a Bayesian analysis of these datasets using Markov chain Monte Carlo methods. Section 7 concludes with some recommendations.

The Structure of the new model
The PMF behavior of the Poisson Ramos-Louzada distribution for various parameter values is shown in Figure 1.
As can be seen, the PMF is positively skewed, so the distribution can be used to model positively skewed count data. The hazard rate function (HRF) and the reversed hazard rate function of the discrete PRL distribution are defined from the PMF and the corresponding CDF as h(x; τ) = P(X = x) / P(X ≥ x) and r(x; τ) = P(X = x) / P(X ≤ x), respectively. The graphs below depict the behavior of the HRF of the discrete PRL distribution for various parameter values.
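For any distribution on the non-negative integers, the hazard rate and reversed hazard rate can be computed directly from a vector of PMF values. The sketch below assumes the standard discrete definitions, h(x) = P(X = x) / P(X ≥ x) and r(x) = P(X = x) / P(X ≤ x). It uses a geometric PMF as a stand-in, because the geometric hazard is known to be constant; the function name is hypothetical.

```python
def hazard_rates(pmf_values):
    """Return (hrf, reversed_hrf) for support points 0, 1, ..., len(pmf_values) - 1.

    hrf[x]          = P(X = x) / P(X >= x)
    reversed_hrf[x] = P(X = x) / P(X <= x)
    """
    hrf, reversed_hrf = [], []
    cdf_before = 0.0  # running value of P(X <= x - 1)
    for p in pmf_values:
        survival = 1.0 - cdf_before  # P(X >= x)
        cdf_here = cdf_before + p    # P(X <= x)
        hrf.append(p / survival if survival > 0 else float("nan"))
        reversed_hrf.append(p / cdf_here if cdf_here > 0 else float("nan"))
        cdf_before = cdf_here
    return hrf, reversed_hrf


# Stand-in PMF (an assumption, not the PRL PMF): geometric with success
# probability 0.2, whose hazard rate is constant at 0.2.
geometric_pmf = [0.2 * 0.8 ** x for x in range(30)]
hrf, reversed_hrf = hazard_rates(geometric_pmf)
```

Passing PRL PMF values instead of `geometric_pmf` would give the HRF values plotted for the PRL distribution.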

Statistical properties of PRL distribution
This section examines some statistical measures of the PRL distribution, among them the moments, the moment-generating function (MGF), and the probability-generating function (PGF).

Moments of PRL distribution
Let X be a PRL random variable. The rth factorial moment of X is derived first, and from it the first four factorial moments and the first four moments about the mean of the PRL distribution are obtained.
The moment-generating function and the probability-generating function of X are derived as well.
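The chain from factorial moments to the dispersion index, skewness, and kurtosis can be checked numerically for any PMF. The following sketch is illustrative only: it uses the Poisson PMF as a stand-in, because its moments are known in closed form, and truncates the support at a hypothetical upper bound of 400. Raw moments are recovered from factorial moments through Stirling numbers of the second kind.

```python
import math


def poisson_pmf(x, rate):
    """Poisson PMF, used here only as a stand-in with known moments."""
    return math.exp(-rate + x * math.log(rate) - math.lgamma(x + 1))


def factorial_moment(pmf, r, upper=400):
    """E[X (X - 1) ... (X - r + 1)], truncating the support at `upper`."""
    total = 0.0
    for x in range(r, upper + 1):
        falling = 1.0
        for j in range(r):
            falling *= x - j
        total += falling * pmf(x)
    return total


def summary_measures(pmf, upper=400):
    """Mean, variance, dispersion index, skewness and kurtosis from a PMF."""
    f1, f2, f3, f4 = (factorial_moment(pmf, r, upper) for r in (1, 2, 3, 4))
    # Raw moments from factorial moments (Stirling numbers of the second kind).
    m1 = f1
    m2 = f2 + f1
    m3 = f3 + 3 * f2 + f1
    m4 = f4 + 6 * f3 + 7 * f2 + f1
    # Central moments from raw moments.
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return {
        "mean": m1,
        "variance": var,
        "dispersion_index": var / m1,
        "skewness": mu3 / var ** 1.5,
        "kurtosis": mu4 / var ** 2,
    }


measures = summary_measures(lambda x: poisson_pmf(x, 3.0))
```

For the Poisson stand-in the dispersion index equals 1; a mixed Poisson PMF such as the PRL would give a value above 1.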

Parameter estimation
In this section, the parameter of the PRL distribution is estimated using several classical estimation approaches: maximum likelihood, Anderson-Darling, Cramer-von Mises, ordinary least squares, and weighted least squares.

Maximum likelihood estimation
Let x_1, x_2, …, x_n be a random sample of failure times from the PRL distribution. The likelihood function for the parameter τ is the product of the PMF evaluated at the observations, and the log-likelihood function, Eq (17), is its logarithm. Differentiating Eq (17) with respect to τ gives the score equation, Eq (18). The ML estimate is obtained by equating Eq (18) to zero and solving for τ. However, this equation has no closed-form solution, so the estimate must be obtained by iterative numerical procedures.
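One simple iterative procedure for a one-parameter model is to minimize the negative log-likelihood directly with a derivative-free line search. The sketch below uses golden-section search and assumes the negative log-likelihood is unimodal on the search interval. The Poisson log-PMF is a stand-in, since its ML estimate is known to equal the sample mean; the function names and the interval (0.01, 200) are illustrative choices, not taken from the paper.

```python
import math


def golden_section_minimize(objective, lower, upper, tol=1e-8):
    """Minimize a unimodal function of one variable on [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = objective(c), objective(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = objective(c)
        else:
            # The minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = objective(d)
    return (a + b) / 2.0


def negative_log_likelihood(param, data, log_pmf):
    """Minus the sum of log-PMF values over the sample."""
    return -sum(log_pmf(x, param) for x in data)


def poisson_log_pmf(x, rate):
    """Stand-in log-PMF (an assumption); replace with the PRL log-PMF."""
    return -rate + x * math.log(rate) - math.lgamma(x + 1)


# Data I from the Application section.
data = [1, 5, 6, 11, 12, 19, 20, 22, 23, 31, 37, 46, 54, 60, 66]
estimate = golden_section_minimize(
    lambda t: negative_log_likelihood(t, data, poisson_log_pmf), 0.01, 200.0
)
```

With the Poisson stand-in, `estimate` should agree with the sample mean of Data I, which gives a quick check that the optimizer behaves as intended before the PRL log-PMF is substituted.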

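Anderson-Darling estimation
The Anderson-Darling (AD) estimator τ̂ of the parameter τ is obtained by minimizing, with respect to τ, the Anderson-Darling distance between the fitted CDF and the empirical distribution of the ordered sample.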

Ordinary least squares estimation
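The ordinary least-squares (OLS) estimator of the PRL model parameter is obtained by minimizing, with respect to τ, the sum of squared differences between the fitted and empirical CDF values at the ordered observations. Equivalently, the OLS estimate of τ solves the corresponding first-order condition.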

Weighted least-square estimation
The WLS estimate (WLSE) of τ is determined by minimizing the weighted version of the least-squares criterion with respect to τ. The WLSE of τ can also be obtained by solving the corresponding first-order condition, in which ϕ(x_{i:n} | τ) is presented in (19).

Cramer-von Mises estimation
The Cramer-von Mises (CVM) estimator is a minimum-distance estimator. The CVM estimate (CVME) of the PRL parameter is obtained by minimizing the Cramer-von Mises distance with respect to the parameter τ.
The CVME of τ can also be obtained by solving the corresponding first-order condition.
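The four minimum-distance criteria of this section can be sketched as objective functions of the fitted CDF values at the ordered sample. The forms below are the ones commonly used in the literature (plotting positions i/(n+1) for OLS and WLS, (2i−1)/(2n) for CVM); treating them as the exact expressions used for the PRL model is an assumption. The geometric CDF, the grid search, and the small clipping constant in the AD criterion are illustrative stand-ins.

```python
import math


def ad_objective(F):
    """Anderson-Darling criterion; F holds fitted CDF values at the sorted data."""
    n = len(F)
    eps = 1e-12  # numerical guard so the logarithms stay finite
    total = 0.0
    for i in range(1, n + 1):
        low = min(max(F[i - 1], eps), 1.0 - eps)
        high = min(max(F[n - i], eps), 1.0 - eps)
        total += (2 * i - 1) * (math.log(low) + math.log(1.0 - high))
    return -n - total / n


def ols_objective(F):
    """Sum of squared gaps between fitted CDF values and i / (n + 1)."""
    n = len(F)
    return sum((F[i - 1] - i / (n + 1)) ** 2 for i in range(1, n + 1))


def wls_objective(F):
    """OLS criterion weighted by (n + 1)^2 (n + 2) / (i (n - i + 1))."""
    n = len(F)
    return sum(
        (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) * (F[i - 1] - i / (n + 1)) ** 2
        for i in range(1, n + 1)
    )


def cvm_objective(F):
    """Cramer-von Mises criterion with positions (2i - 1) / (2n)."""
    n = len(F)
    return 1.0 / (12 * n) + sum(
        (F[i - 1] - (2 * i - 1) / (2 * n)) ** 2 for i in range(1, n + 1)
    )


def minimum_distance_estimate(objective, cdf, data, grid):
    """Grid search: the grid value whose fitted CDF minimizes the criterion."""
    ordered = sorted(data)
    return min(grid, key=lambda t: objective([cdf(x, t) for x in ordered]))


def geometric_cdf(x, p):
    """Stand-in CDF (an assumption); replace with the PRL CDF."""
    return 1.0 - (1.0 - p) ** (x + 1)


# Data I from the Application section.
data = [1, 5, 6, 11, 12, 19, 20, 22, 23, 31, 37, 46, 54, 60, 66]
grid = [k / 1000 for k in range(1, 1000)]
estimates = {
    name: minimum_distance_estimate(fn, geometric_cdf, data, grid)
    for name, fn in [("AD", ad_objective), ("OLS", ols_objective),
                     ("WLS", wls_objective), ("CVM", cvm_objective)]
}
```

A grid search is used here only because it is easy to verify; in practice a numerical optimizer over τ would play the same role.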

Simulation
In this section, we perform a simulation study to evaluate the accuracy of all considered estimators. In each simulation run, we generate 10,000 samples of size n = 10, 25, 50, 100, 200, and 300 from the PRL distribution and then calculate the average estimate (AE), absolute bias (AB), mean relative error (MRE), and mean square error (MSE). For this purpose, we consider six values of the parameter τ. The simulation results are presented in Tables 2-7.
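The four accuracy measures can be computed from the replicated estimates as in the sketch below. The definitions used here (AB as the mean absolute deviation from the true value, MRE as that deviation relative to the true value) are the common ones and are an assumption about the exact formulas behind Tables 2-7; the input values are hypothetical.

```python
def simulation_summary(estimates, true_value):
    """Average estimate, absolute bias, mean relative error and mean square error."""
    n = len(estimates)
    ae = sum(estimates) / n
    ab = sum(abs(e - true_value) for e in estimates) / n
    mre = sum(abs(e - true_value) / true_value for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {"AE": ae, "AB": ab, "MRE": mre, "MSE": mse}


# Hypothetical estimates from four replications with true parameter value 2.0.
summary = simulation_summary([1.8, 2.2, 2.0, 2.4], 2.0)
```

In a full study, `estimates` would hold the 10,000 estimates produced by one method at one sample size.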

Bayesian analysis
The Bayesian parameter estimation technique is an alternative to classical maximum likelihood estimation. In Bayesian estimation, a prior distribution must be defined for each unknown parameter. Consider a set of data x = (x_1, x_2, …, x_n) taken from the discrete PRL distribution, with the likelihood function given in Eq (20). The Bayesian model is constructed by specifying the prior distribution for the model parameter and then multiplying it by the likelihood function for the observed data, using Bayes' theorem, to obtain the posterior distribution. The prior distribution of the parameter τ is denoted by π(τ).
For the proposed distribution, a gamma distribution with known hyperparameters is taken as the prior, that is, τ ∼ Gamma(a, b). The posterior density, up to proportionality, is obtained by multiplying the likelihood by the prior. The posterior density is not mathematically tractable; for inference purposes, we therefore use the Markov chain Monte Carlo (MCMC) approach to simulate posterior samples, which allows straightforward sample-based inference.
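Sampling from a posterior known only up to proportionality can be sketched with a random-walk Metropolis algorithm. This is a minimal illustration, not the MCMCpack routine: the Poisson log-likelihood is a stand-in for the PRL log-likelihood, the gamma prior is read as shape 0.001 and rate 0.1 (the parametrization is an assumption), and the step size, chain length, burn-in, and thinning values are hypothetical.

```python
import math
import random


def log_gamma_prior(tau, shape, rate):
    """Log-density of a Gamma(shape, rate) prior, up to an additive constant."""
    return (shape - 1.0) * math.log(tau) - rate * tau


def poisson_log_likelihood(tau, data):
    """Stand-in log-likelihood (an assumption); replace with the PRL version."""
    return sum(-tau + x * math.log(tau) - math.lgamma(x + 1) for x in data)


def metropolis(log_posterior, start, step, n_iter, burn_in, thin, seed=1):
    """Random-walk Metropolis for one positive parameter."""
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    kept = []
    for it in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        if proposal > 0:  # proposals outside the support are rejected
            proposal_lp = log_posterior(proposal)
            accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
            if rng.random() < accept_prob:
                current, current_lp = proposal, proposal_lp
        # Keep every `thin`-th draw once the burn-in phase is over.
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append(current)
    return kept


# Data I from the Application section.
data = [1, 5, 6, 11, 12, 19, 20, 22, 23, 31, 37, 46, 54, 60, 66]
draws = metropolis(
    lambda t: poisson_log_likelihood(t, data) + log_gamma_prior(t, 0.001, 0.1),
    start=10.0, step=2.0, n_iter=30000, burn_in=3000, thin=10,
)
posterior_mean = sum(draws) / len(draws)
```

With the Poisson stand-in the posterior is a gamma distribution with shape 0.001 + Σx and rate 0.1 + n, so the simulated mean can be compared against a known value before the PRL likelihood is plugged in.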
In the present study, we use the MCMC algorithms implemented in the R package MCMCpack to simulate samples from the posterior distribution. For this purpose, we generated 1,006,000 samples from the posterior distribution of interest. The effects of the initial values in the iterative process are eliminated by discarding a burn-in phase of 6000 simulated samples. To achieve approximately independent samples, a thinning interval of size 300 was used. The Bayes estimates of the parameter were obtained as the mean of the generated samples. Traceplots and the Geweke diagnostic were used to monitor the convergence of the simulated sequences. The Geweke convergence diagnostic is based on the difference between the means of two non-overlapping parts of a simulated Markov chain divided by the asymptotic standard error of that difference. Since this z-score asymptotically follows a standard normal distribution, we may say that a chain has reached convergence if its absolute z-score is smaller than 1.96. The posterior summaries of interest were also computed using the R package MCMCpack.
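The convergence check and the interval summaries can be illustrated on any list of posterior draws. The sketch below is a simplified version of the Geweke diagnostic: it compares the first 10% and last 50% of the chain (the conventional fractions, assumed here) and uses plain sample variances rather than spectral density estimates, which is reasonable only when the thinned draws are roughly independent. A sample-based highest density interval is included as well; both function names are hypothetical.

```python
import math


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score comparing early and late segments of a chain.

    Uses plain sample variances, which assumes the (thinned) draws are roughly
    independent; the full diagnostic uses spectral density estimates instead.
    """
    n = len(chain)
    early = chain[: int(first * n)]
    late = chain[n - int(last * n):]

    def mean_var(values):
        m = sum(values) / len(values)
        return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

    m_early, v_early = mean_var(early)
    m_late, v_late = mean_var(late)
    return (m_early - m_late) / math.sqrt(v_early / len(early) + v_late / len(late))


def highest_density_interval(draws, mass=0.95):
    """Shortest interval containing about `mass` of the sorted draws."""
    ordered = sorted(draws)
    n = len(ordered)
    width = max(1, int(math.ceil(mass * n)))
    starts = range(n - width + 1)
    best = min(starts, key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[best], ordered[best + width - 1]


# A stationary toy chain gives a z-score of zero; a drifting one does not.
stationary = [0.0, 1.0] * 500
drifting = [float(i) for i in range(1000)]
z_stationary = geweke_z(stationary)
z_drifting = geweke_z(drifting)
```

An absolute z-score below 1.96 is read as no evidence against convergence, matching the rule stated above.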

Application
This section demonstrates the usefulness of the discrete Poisson Ramos-Louzada distribution in modeling three datasets. We compare the fit of the proposed distribution with that of several well-known one-parameter discrete distributions: the discrete Rayleigh [12], Poisson, discrete Pareto [13], discrete Burr-Hatke [14], and discrete inverted Topp-Leone [15] distributions. The Kolmogorov-Smirnov (KS) test, Akaike information criterion (AIC), and Bayesian information criterion (BIC) are used to compare the fitted models. We also illustrate the Bayesian estimation procedure described in the previous section on the same three datasets.
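The three comparison measures can be sketched as follows. AIC and BIC use their standard definitions. The KS statistic is written for integer-valued data, where the largest gap between the empirical and fitted CDFs is found by checking each distinct observation and the support point just below it. The geometric CDF in the usage lines is a stand-in, not one of the fitted models.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k log n - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def ks_statistic(data, cdf):
    """sup |F_n - F| for integer-valued data and a fitted discrete CDF."""
    n = len(data)
    gap = 0.0
    for x in sorted(set(data)):
        at_or_below = sum(1 for v in data if v <= x) / n
        below = sum(1 for v in data if v < x) / n
        gap = max(gap, abs(at_or_below - cdf(x)), abs(below - cdf(x - 1)))
    return gap


def geometric_cdf(x, p=0.5):
    """Stand-in CDF (an assumption); returns 0 below the support."""
    return 1.0 - (1.0 - p) ** (x + 1) if x >= 0 else 0.0


# Toy usage: a one-parameter model with log-likelihood -60 on 15 observations.
toy_aic = aic(-60.0, 1)
toy_bic = bic(-60.0, 1, 15)
toy_ks = ks_statistic([0, 0, 1, 1], geometric_cdf)
```

Smaller values of all three measures indicate a better-fitting model, which is how the comparison tables are read.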

Data set I (Failure times of electronic components)
The first dataset consists of the failure times of 15 electronic components in an accelerated life test [16]. The observations are 1, 5, 6, 11, 12, 19, 20, 22, 23, 31, 37, 46, 54, 60, and 66. The mean and variance of the first dataset are 27.533 and 431.94, respectively. The dispersion index is 15.689, which indicates that the dataset is overdispersed. We determine the MLEs, standard errors (SE), and model selection measures (AIC, BIC, and KS) for the first dataset using the maxLik package in R. These results are shown in Table 8. For the Bayesian analysis, the parameter τ of the PRL distribution was assigned a gamma prior distribution, that is, τ ∼ Gamma(0.001, 0.1). Figure 4 depicts the posterior samples for the parameter τ. The behavior of the MCMC draws across iterations is assessed using the traceplot, posterior density, and ACF plot. The traceplot shows that the samples attained acceptable convergence, and the ACF plot indicates that the posterior samples are uncorrelated. Furthermore, the z-score of the Geweke test is -0.2498, indicating that the samples have sufficiently converged to a stable distribution. The posterior mean for τ is 13.00418 with a standard deviation of 2.18641, and the corresponding 95% highest density interval (HDI) is (9.008356, 17.3976). We observe that the ML and Bayesian estimates are quite similar.
For the second dataset, the corresponding results are shown in Table 9. For the Bayesian analysis, the parameter τ of the PRL distribution was again assumed to have a gamma prior distribution. The associated Geweke z-score is -0.08203, which likewise indicates that the samples have sufficiently converged to a stable distribution. The posterior mean for τ is 32.0684 with a standard deviation of 2.89397, and the 95% HDI is (26.20931, 37.44432). The ML and Bayesian estimates are again similar to one another.
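The overdispersion of the first dataset can be checked directly from the listed observations. The sketch below assumes the sample variance uses the n − 1 denominator and defines the dispersion index as the variance divided by the mean; the function name is hypothetical.

```python
def dispersion_index(data):
    """Return the sample mean, unbiased sample variance, and their ratio."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, variance, variance / mean


# Failure times of the 15 electronic components (first dataset).
data_one = [1, 5, 6, 11, 12, 19, 20, 22, 23, 31, 37, 46, 54, 60, 66]
mean, variance, index = dispersion_index(data_one)
print(f"mean = {mean:.3f}, dispersion index = {index:.3f}")
```

An index far above 1, as here, is what rules out the plain Poisson model and motivates a mixed Poisson alternative.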

Data set III (Deaths due to COVID-19 in Pakistan)
The third dataset also concerns deaths due to COVID-19 in Pakistan, recorded from 18 March 2020 to 30 June 2020. The data are: 1, 6, 6, 4, 4, 4, 1, 20, 5, 2, 3, 15, 17, 7, 8, 25, 8, 25, 11, 25, 16, 16, 12, 11, 20, 31. The corresponding results are shown in Table 10. For the third dataset, the gamma distribution is again taken as the prior distribution, and the posterior samples for the parameter are described in Figure 8. The Geweke z-score, used as a diagnostic measure, is -0.03794, suggesting convergence of the samples to a stable distribution. The posterior mean for the third dataset is 46.96159 with a standard deviation of 4.92385, and the corresponding 95% HDI is (37.94273, 57.07319). The ML and Bayes estimates are quite similar to each other.

Conclusions
In this paper, we introduce a one-parameter discrete distribution by compounding the Poisson distribution with the Ramos-Louzada distribution. The proposed distribution is unimodal and positively skewed, and its failure rate is increasing. The statistical properties derived include the moment-generating function, probability-generating function, factorial moments, dispersion index, skewness, and kurtosis. The model parameter is estimated using the maximum likelihood approach, and the behavior of the resulting estimator is assessed via a simulation study. The usefulness of the proposed distribution is demonstrated using three real-life datasets, on which it outperforms all considered competing distributions. A Bayesian analysis is also performed using MCMC approximation.

"Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.