A new discrete pareto type (IV) model: theory, properties and applications

Discrete analogue of a continuous distribution (especially in the univariate domain) is not new in the literature. The work of discretizing continuous distributions begun with the paper by Nakagawa and Osaki (1975) to the best of the knowledge of the author. Since then several authors proposed discrete analogues of known continuous models. In this paper, we propose and study a discrete analogue of the continuous Pareto (type IV) distribution, namely the discrete Pareto (type IV) distribution (DPIV, henceforth, in short) that has three parameters. Its probability mass function can be approximately symmetric, right-skewed and left-skewed shapes, and the hazard rate function possesses decreasing and upside-down bathtub shapes. Also, the proposed discrete distribution can be under-, over- or equi- dispersion. The flexibility of the new discrete model is illustrated by means of three applications to real life data sets arising out of various domains affecting our life.


Introduction
The discrete distributions are useful when count phenomenon occurs. The discrete models are as important as the continuous models. Nowadays, both types of models can be used in fascinating ways to explore real life data sets available in different fields of studies. One convincing way is the compounding technique, where the discrete and continuous models are mixed together for better exploration of phenomenons under study. The other interesting technique began with the work of (Nakagawa and Osaki 1975), who first introduced the concept of discretizing a continuous model into discrete one. There are many situations where it is inappropriate to describe the lifetime of devices on a continuous scale. For example, a piece of equipment operates in cycles and experimenter observes that the number of cycles successfully completed prior to the failure. In such case, the time to failure is more appropriately represented by the number of times they are used before they fail, which is a discrete random variable. Salvia and Bollinger (1982), and Padgett and Spurrier (1985) discussed discrete hazard rate functions (h.r.f.) and failure rate models by giving illustrations to such situation. Xie et al. (2002) also defined another discrete h.r.f. h(k) = log[ S(k − 1)/S(k)] for a random variable K, where S(· · · ) is the reliability function, which gives similar results to those for continuous h.r.f. 's. (Bracquemond and Gaudoin 2003) presented a survey on discrete lifetime distributions and suggested two ways by which discrete distributions can be derived from the continuous ones; (i) to consider a characteristic property of a continuous distribution and to build the similar property in discrete time, and (ii) to consider discrete lifetime as the integer part of continuous lifetime. Lai (2013) also appreciated the discretization of a continuous model but stressed that a continuous lifetime random variable may be characterized either by its cumulative distribution function (c.d.f.), probability density function (p.d.f.) or h.r.f. which are equivalent in the sense that one can be uniquely determined by the other.
In reliability theory, classification of lifetime models is defined in terms of their survival function (s.f.) and other reliability characteristics. For example, the increasing (decreasing) failure rate IFR (DFR) class, increasing (decreasing) failure rate average IFRA (DFRA) class, the new better (worse) than used NBU (NWU) class, new better (worse) than used in expectation NBUE (NWUE) class and increasing (decreasing) mean residual lifetime IMRL (DMRL) class etc. (See Kemp 2004) and the references cited therein). The discretization of a continuous lifetime distribution retains the same functional form of the survival function, therefore, many reliability characteristics and properties shall remain unchanged. Thus, discretization of a continuous lifetime model is an interesting and simple approach to derive a discrete lifetime model corresponding to the continuous one.
In the last two or three decades, there has been a growing interest in introducing discretized continuous distributions. For more details the reader is referred to (Chakraborty 2015) and the references cited therein. In the literature, there are several different methods of discretizing a continuous probability model. We will consider the approach which is due to (Nakagawa and Osaki 1975) and (Roy 2003). This is a rather more general approach for discretizing continuous models, adopted in this paper. The following proposition may lead to this approach.
Proposition 1 Given a continuous random variable X with survival function S X (x), a discrete random variable Y can be defined as Y = x , where x = max{m ∈ Z | m ≤ x}, the floor function. The probability mass function (p.m.f.) P (Y = y) of Y is then given by where y ∈ Z, and Z is the set of integers. Consequently, a continuous failure-time model can be used to generate a discrete time model by introducing a grouping on the time axis. To put it in a simple way, if X is a continuous random variable, then the p.m.f. of its integer part, that is T = dX = X , can be viewed as a discrete concentration of the p.d.f. of X. Such discretized distributions retain the same functional form of the survival function as that of the continuous ones and the reliability characteristics also do not change.
(3) will be said to have the DPIV distribution.
From (3), the c.d.f. and the survival function of a random variable that follows the DPIV distribution are given as follows The next result discusses the limiting behavior of the DPIV distribution corresponding to various parameter choices at the boundary.
Result 1 Some representative plots of the DPIV p.m.f. are provided in Fig. 1, for varying parameter values.
From the plots in Fig. 1, it appears that the distribution is right skewed. If we wanted to apply this to some real life application, we would desire the data to also possess this right skewed characteristic for a better model fit. It is also important to note that the Pareto (Type IV) distribution is highly sensitive to changes in the α parameter as this is the shape parameter (also known as the tail index). Furthermore, from Fig. 1, it represents the fact that for larger values of the parameter α, and γ the mode moves to the right, indicating that the proposed distribution is quite versatile in nature; while smaller values of α appear to have a significant effect on the respective probabilities, and of course, on the values of the moments. We found that mass points were more evenly distributed when α ≤ 2, γ = 0.4. As we will see in the "Estimation" section, this sensitivity will become an issue in estimating the parameters under the maximum likelihood method.
The rest of the paper will be organized in the following way. "Structural properties" introduces the discrete Pareto (IV) distribution. The maximum likelihood estimation in DPIV distribution is discussed in detail with simulation studies in "Estimation". For illustrative purposes, three different data sets from various real life scenarios are re-analyzed to show the applicability of the proposed DPIV distribution "Application". Finally, we conclude this paper by providing some final remarks in "Concluding remarks" sections.

Structural properties
In this section, we discuss some important structural properties of the DPIV distribution. At first, we have the following lemma.
Proof Follows immediately from (3). Stochastic ordering is an integral tool to judge comparative behaviors of random variables. Many stochastic orders exist and have various applications. Theorem 1 and Corollary 1 (below) give some results on the stochastic orderings of the DPIV. The orders considered here are the stochastic order ≤ st , and the expectation order ≤ E .
Proof Follows immediately from the c.d.f. of the DPIV (θ, γ , σ ) distribution. Next, we describe the expectation ordering in the next Corollary which follows from Theorem 1.

Ghosh Journal of Statistical Distributions and Applications
(2020) 7:3 Page 5 of 17 Notice that for the DPIV (θ, γ , σ ) distribution For θ > 0, σ > 0, γ > 0, the distribution is infinitely divisible since the expression in Eq. (6) can be negative and g(0) = 0, g(1) = 0, see Warde and Katti (1971) for details. Then, in this case the distribution has its mode at zero. Furthermore, the infinitely divisible distribution plays an important role in many areas of statistics, for example, in stochastic processes and in actuarial statistics. When a distribution G is infinitely divisible, then for any integer y ≥ 2, there exists a distribution G y such that G is the y-fold convolution of G, namely, G = G * x x . Also, when a distribution is infinitely divisible an upper bound for the variance can be obtained when θ > 0, σ > 0, γ > 0 (see Johnson and Kotz (1982), page 75), which is given by The following results hold for the p.m.f. of the DPIV distribution in Eq.
• The cumulants of an infinitely divisible distribution on the set of non-negative integers (as far as they exist) are non-negative, see Steutel and Van Harn (2004), page 47, Corollary 7.2. This will imply that the skewness of the new distribution is positive, since the third cumulant equals the third central moment.

Increasing and decreasing failure rate
The purpose of this section is to find a relationship between the parameters of this model in order to study the failure rates. From the beginning of this section, we have that the failure rate, r(x) is given by : If γ is an integer, the failure rate is indeterministic. Therefore, this parametric relationship yields information about the  (4) is increasing. Furthermore, we observe the following: • If 1 γ < 1, then φ(x) is increasing, and with θ increasing (equivalently α decreasing), r(x) is decreasing. This proves the fact that it has a DFR in such a scenario.
is decreasing, and with θ decreasing (equivalently α increasing), r(x) is increasing. This proves the fact that it has a IFR in such a scenario.
According to (Kemp 2004), page 3074 one can say the following relationships for discrete distributions which are applicable to the DPIV distribution Eq. (3) given below.

Moments and generating functions
The r th moment of a random variable X with the p.m.f. in Eq.

Probability generating function and factorial moments
The relationship between the probabilities and the associated factorial moments (see Johnson et al. (2005), page 59) [x+r] x! r! , which is due to (Frechet 1940;1943). Again, one may write which is due to (Laurent 1965).

Theorem 3 Characterization via minimization
Let X i s, i = 1, 2, 3, ... be non-negative independent and identically distributed (i.i.d.) integer valued random variables with X (1) = min 1≤i≤n X i . Then, Proof Sufficiency part: Necessary part: Hence the proof.
Theorem 4 Let X i s, i = 1, 2, 3, ... be non-negative independent integer valued random variables with Proof It is similar to that of Theorem 3 and hence omitted.
Proof Let x → ∞, and also let t = t(x) be the unique integer such that t(x) ≤ x ≤ t(x) + 1. As a consequence, we have S (t(x)) ≥ P (Y > x) ≥ S (t(x) + 1) . Therefore, Next, note that since x t(x) → 1, the sequence in the middle is bounded by two sequences which converges to 1 as x → ∞. Hence, the proof.

Estimation
For a random sample of size n drawn from the p.m.f. in Eq. (3), the log-likelihood function is given by The corresponding maximum likelihood equations are (by taking partial derivatives of w.r.t. θ, σ and γ respectively) The maximum likelihood estimates of θ, σ and γ can be obtained by setting Eqs. (8)-(10) equal to zero and solving simultaneously using bivariate Newton-Raphson method. The asymptotic variance-covariance matrix of the MLEs of parameters θ, σ and γ are obtained by inverting the Fisher's information matrix with elements which are negative expected values of second order derivatives of the log-likelihood function ( ). Using the general theory of MLEs, (see Appendix A) the asymptotic distribution of θ ,σ ,γ is a trivariate normal with mean (θ, σ , γ ) and variance-covariance matrix is given by ⎡ The exact expressions for various expectations above are cumbersome. However, in practice we would estimate above matrix by the inverse of observed Fisher's information matrix using the following approximations The expressions are provided in Appendix B. For simulation study, we consider the following choices of the parameters (for a random sample of sizes n = 50, 100, 200) respectively. A thorough simulation study was performed by generating 2000 samples of various sizes from DPIV (θ, γ , σ ) .
Comment on the simulation study: From Tables 1, 2, 3, 4, 5, 6 and 7, we observe that the convergence of N-R method is slightly strong for smaller values of θ, γ , and σ as θ, γ , and σ are close to the corresponding actual values in expectation. The associated       confidence intervals (in the parenthesis) cover the actual values of θ, γ , and σ quite satisfactorily. From the estimated value of θ, i.e., θ one can obtain an estimate of α by using invariance property of the MLE. Also, notice that there exists an appreciable amount of error when the sample size is very small (values of the estimates for n = 50 in Tables  1, 2, 3, 4, 5, 6 and 7. Since, α appears in powers of X i 's (from the original model, before reparametrization) it is expected to have more fluctuations in N-R method, resulting in weak convergence of the method which results in an unstable estimated values in some sense. Furthermore, as mentioned in the "Introduction" section, for a choice of α ≤ 2 (equivalently θ = exp(−α) ≤ 0.1353, the estimated value of θ is not good. A full scale simulation study with all possible such combination of values of each of the parameters of DPIV distribution under the Bayesian paradigm is required which will be a subject matter of a separate article. One may consider the method of moments estimation to examine if such an anomaly can be removed or at least, in principle be reduced to a desired accuracy level.

Data set 1
In this section we re-analyze the data which is used by Krishna and Pundir (2009). The data set comprises the recordings of (Phyo 1973) of the total number of carious teeth among the four deciduous molars in a sample of 100 children 10 and 11 years old. Symmetry between right and left molars is presumed and only the right molars are considered with a time unit of two years. The data are given in Table 8. The p-values of χ 2 -statistic are 0.000024, 0.2614, 0.5438, 0.4169, 0.1339 and 0.6238 for Poisson, geometric, DBD, DPareto, GBPareto(discrete) and DPIV distributions, respectively. This reveals that Poisson and Geometric distributions are not good fit at all, whereas GBPareto (discrete), DBD and DPareto are good fit with DPIV being the best one. We compute the expected frequencies (E i ) for fitting Poisson, Geometric, DBD, DPareto, GBPareto, DPIV distributions and pool the frequencies for 3 or more in order to apply χ 2 -test for goodness of fit. For the calculation of expected frequencies we use ML estimates in each case. The estimated value of the parameter is given in parenthesis in column one of Table 9.

Data set 2
A second application of the distributions is for modeling discrete data in which the frequencies at successive values increase. In general, most real discrete data are unimodal, multimodal or with decreasing frequencies. However, for this data set a reverse pattern can be observed. As an illustration, we consider data on duckweed fronds for plants growing in pure water observed weekly (for details, see Hand et al. (1993)) presented in Table  10. We fit this data to DPIV(θ, γ , σ ) (equivalently, DPIV), Poisson, Geometric, and to the generalized Poisson distribution (GPD) with p.m.f. (see, for details, Consul (1989)) given by where θ > 0, max(−1, −θ/m) ≤ λ ≤ 1 and m(≥ 4) is the largest positive integer for which θ + mλ ≥ 0 when λ < 0. From the above table (Table 11) it appears that clearly, the DPIV distribution provides the best fit.

Data set 3
In this Section, the discrete gamma-Lomax distribution is applied to a data set that is taken from (Consul 1989). The data set represents the observed frequencies of the number of outbreaks of strike in the coal-mining industry in the U.K. during 1948 − 1959. The data are depicted in Table 12 along with the expected frequencies corresponding to DPIV, Poisson, Geometric, and the Generalized Poisson distribution utilized by Consul (1989) as given in earlier "Data set 2" section. Consul (1989) applied the Generalized Poisson distribution (GPD) to this data set to examine the efficacy of the GPD model.
From the above table (Table 13) it appears that clearly, the DPIV distribution provides the best fit.

Concluding remarks
In this paper, we have proposed a new discrete analogue of the continuous Pareto (IV) distribution (DPIV distribution in short with three parameters) and derived some of its interesting distributional properties. The DPIV distribution offers good flexibilities in terms of shapes for the probability mass functions and hazard rate functions. The "Application" shows that the DPIV can be useful in fitting data which are positively skewed as well as to other data sets with slightly different shapes.The estimation of the model parameters are discussed in the classical set up under the method of maximum likelihood. From the "Application" sections, it appears that the DPIV distribution provides a better alternative to the existing discrete Pareto probability models. In this section we provide some justification on the consistency of the maximum likelihood estimators which are given as follows: Consistency of MLEs: Under certain regularity conditions, Lehman and Casella (1998) in Theorem 3.10 page 449 has proved that MLEδ is a consistent estimator of the parameter δ. Several conditions can be checked accordingly in our case. First condition states that the parameter space = (θ ∈ (0, 1), σ ∈ (0, ∞), γ ∈ (0, ∞)) is a subset on the real line. The support of the random variable X is independent of the parameters. Next, the second condition E ∂ log f ∂δ = 0 can be verified easily. The third condition that the is positive definite which can be verified easily. The final condition to verify is that In this case, we consider M( . For a carefully selected large k, we verified numerically (on using Mathematica software) that for all = (θ ∈ (0, 1), σ ∈ (0, ∞), γ ∈ (0, ∞)) and x = 1, 2, · · · is finite whenever E(X) and Var(X) are finite. Therefore, all the regularity conditions are satisfied, one may say the MLE of δ given byδ is a consistent estimator of δ and in our caseδ ∼ N 3 δ, [I(δ)] −1 .

Appendix B
In this section, we provide the elements of the observed Fisher Information matrix which are as follows: