Article

Two Families of Continuous Probability Distributions Generated by the Discrete Lindley Distribution

by Srdjan Kadić 1,*,†, Božidar V. Popović 1,† and Ali İ. Genç 2,†
1 Faculty of Science and Mathematics, University of Montenegro, 81000 Podgorica, Montenegro
2 Department of Statistics, Cukurova University, 01290 Adana, Turkey
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2023, 11(2), 290; https://doi.org/10.3390/math11020290
Submission received: 21 November 2022 / Revised: 17 December 2022 / Accepted: 26 December 2022 / Published: 5 January 2023
(This article belongs to the Special Issue State-of-the-Art Mathematical Applications in Europe)

Abstract: In this paper, we construct two new families of distributions generated by the discrete Lindley distribution. Some mathematical properties of the new families are derived. Special distributions in these families can be constructed by choosing baseline distributions, such as the exponential, Pareto and standard logistic distributions. We study in detail the properties of the two models resulting from the exponential baseline, among others. These two models have different shape characteristics. The model parameters are estimated by maximum likelihood, and related algorithms are proposed for the computation of the estimates. The existence of the maximum-likelihood estimators is discussed. Two applications demonstrate their usefulness in fitting real data.

1. Introduction

Compound discrete distributions serve as probabilistic models in various areas of application, for instance, in ecology, genetics and physics; see, for example, [1]. Distributions obtained by compounding a parent distribution with a discrete distribution are very common in statistics and in many applied areas. Suppose we have a system consisting of N components, the lifetime of each of which is a random variable. Let X be the maximum lifetime of the components. Clearly, X has a compound distribution arising out of a random number N of components; i.e., X = max{Z_1, …, Z_N}. On the other hand, for a system consisting of N components whose energy consumptions are random variables, taking Z to be the component whose energy consumption is minimal, we obtain the compound distribution of Y = min{Z_1, …, Z_N}. The compounding principle is applied in many different areas: insurance [2], ruin problems [3], and compound risk models and their actuarial applications [4,5]. The development of the theory of compound distributions is skipped here, because it has been covered in detail in [6].
The random variable N is often determined by the economy, customer demand, etc. There is also a practical reason why N might be considered a random variable: a failure can occur due to initial defects present in the system. A discrete version of the Lindley distribution was studied in [7], with applications to count data arising in insurance.
We say that a random variable X follows the discrete Lindley distribution introduced by [7] if its probability mass function is given by
\[ P(X=x) = \frac{\lambda^{x}}{1-\log\lambda}\left[\lambda\log\lambda + (1-\lambda)\left(1-\log\lambda^{x+1}\right)\right], \]
where x = 0, 1, … and 0 < λ < 1. The probability generating function (PGF) given as Equation (4) in [7] contains a typographical error. The corrected version is
\[ \Phi(s) = \frac{(\lambda-1)(\lambda s-1) - \left(1-2\lambda+\lambda^{2}s\right)\log(\lambda)}{(1-\lambda s)^{2}\left(1-\log(\lambda)\right)}, \quad s < 1/\lambda,\ 0 < \lambda < 1. \]
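As a quick numerical sanity check, the pmf and the corrected PGF above can be sketched in Python; the values λ = 0.5 and s = 0.7 below are illustrative choices, not taken from the paper:

```python
import math

def dlindley_pmf(x, lam):
    # pmf of the discrete Lindley distribution, x = 0, 1, ..., 0 < lam < 1
    return lam**x / (1 - math.log(lam)) * (
        lam * math.log(lam) + (1 - lam) * (1 - (x + 1) * math.log(lam))
    )

def dlindley_pgf(s, lam):
    # corrected PGF, valid for s < 1/lam
    num = (lam - 1) * (lam * s - 1) - (1 - 2 * lam + lam**2 * s) * math.log(lam)
    return num / ((1 - lam * s)**2 * (1 - math.log(lam)))

lam = 0.5
total = sum(dlindley_pmf(x, lam) for x in range(200))            # should be ~1
series = sum(0.7**x * dlindley_pmf(x, lam) for x in range(200))  # E[s^N] at s = 0.7
```

The truncation at 200 terms is more than enough here, since the summands decay geometrically in λ.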
In this manuscript, we consider the discrete Lindley distribution described above for the random variable N. Why do we assume a discrete Lindley distribution? Using a Poisson distribution, for example, relies on an important assumption: equidispersion of the data. This assumption is often not valid in real cases. Some alternative distributions for modelling overdispersed data are available, such as the negative binomial, generalized Poisson and zero-inflated Poisson distributions. However, judging by the number of parameters used, these alternatives are more complex than the Poisson distribution. The continuous Lindley distribution, like the Poisson distribution, has a single parameter. However, it is less suitable for modelling the number of claims, because claim counts are discrete, as opposed to the Lindley distribution's continuous nature. That is why the discrete Lindley distribution, created through discretisation of the one-parameter continuous Lindley distribution, was introduced.
Assuming that M is the zero-truncated version of N, where N has the PGF (1), we construct two new families of distributions: the discrete Lindley generated families of distributions of the first and second kinds.
The paper is organized as follows. In Section 2, we construct the two discrete Lindley generated families. Section 3 is devoted to shape characteristics. In Section 4, we derive some mathematical properties of the families. Estimation issues are investigated in Sections 5 and 6. The simulation study is presented in Section 7. Two applications to real data are addressed in Section 8. The paper is finalized with concluding remarks.

2. Construction of the Families of Distributions

There are various methods for obtaining the discrete Lindley distribution. For example, in [8], the authors considered an infinite series method for constructing the discrete Lindley distribution. On the other hand, in [9], the discrete Lindley distribution was built using the survival function method. In this manuscript, we employ the so-called max-min procedure. This construction is widely used in practice. For a comprehensive literature review, we refer the reader to [10] and the references therein.
In this section, we introduce two new families of distributions as follows. Let {Z_i}_{i≥1} be a sequence of independent and identically distributed (iid) random variables with baseline cumulative distribution function (CDF) F(x) = F(x; ψ), where x ∈ ℝ and ψ is the parameter vector. Suppose that N is a discrete random variable with the PGF Φ(s), and let M have the zero-truncated distribution of N, obtained by removing zero from the support of N. Then, the probability mass function (pmf) of M is given by
\[ P(M=m) = \frac{P(N=m)}{1-\Phi(0)}, \quad m\in\{1,2,\ldots\}. \]
In order to prove that \(\sum_{m=1}^{+\infty} P(M=m) = 1\), let us recall that P(N=m) = Φ^{(m)}(0)/m!. After some algebra, we find
\[ P(N=m) = \frac{\lambda^{m}\left(\lambda-1+\log\lambda-2\lambda\log\lambda\right)}{\log\lambda-1} + \frac{m\,\lambda^{m}(1-\lambda)\log\lambda}{\log\lambda-1}. \]
Using the series representations \(\sum_{m=1}^{+\infty}\lambda^{m} = \frac{\lambda}{1-\lambda}\) and \(\sum_{m=1}^{+\infty}m\lambda^{m} = \frac{\lambda}{(1-\lambda)^{2}}\), one can calculate
\[ \sum_{m=1}^{+\infty} P(N=m) = \frac{\lambda\left(1-2\log\lambda\right)}{1-\log\lambda}. \]
Equation (3) coincides with 1 − Φ(0). This completes the proof that \(\sum_{m=1}^{+\infty} P(M=m) = 1\).
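The normalization above is easy to confirm numerically; the following short sketch (the value λ = 0.3 is an arbitrary illustration) compares the tail sum with the closed form:

```python
import math

lam = 0.3
log_l = math.log(lam)

def pmf_N(m):
    # P(N = m) for the discrete Lindley distribution
    return lam**m / (1 - log_l) * (lam * log_l + (1 - lam) * (1 - (m + 1) * log_l))

tail = sum(pmf_N(m) for m in range(1, 500))          # sum over m >= 1
closed = lam * (1 - 2 * log_l) / (1 - log_l)         # right-hand side of (3)
```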
First, we introduce the family of distributions based on the maximum of random variables. We define the random variable X = max{Z_i}_{i=1}^{M}. Then, the CDF and probability density function (PDF) of X are given by
\[ G_X(x) = \frac{\Phi[F(x)]-\Phi(0)}{1-\Phi(0)}, \quad x\in\mathbb{R}, \]
and
\[ g_X(x) = \frac{f(x)\,\Phi'[F(x)]}{1-\Phi(0)}, \quad x\in\mathbb{R}, \]
respectively.
Further, if we suppose that the random variable N has the PGF given by (1), the CDF and PDF of X for x ∈ ℝ, λ ∈ (0,1) are given by
\[ G_1(x) = G_1(x;\psi,\lambda) = \frac{F(x)\left[1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left(1-\lambda+(2\lambda-1)\log(\lambda)\right)F(x)\right]}{\left(1-2\log(\lambda)\right)\left[1-\lambda F(x)\right]^{2}}, \]
and
\[ g_1(x) = \frac{f(x)\left[1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left(1-\lambda+\lambda\log(\lambda)\right)F(x)\right]}{\left(1-2\log(\lambda)\right)\left[1-\lambda F(x)\right]^{3}}, \]
respectively. We say that the family of distributions defined by (4) and (5) is the discrete Lindley generated family of the first kind ("LiF1" for short). A random variable X having PDF (5) is denoted by X ∼ LiF1(λ, ψ).
The hazard rate function (HRF) of X can be expressed as
\[ \tau_1(x) = \frac{h_F(x)\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+\lambda\log(\lambda)\right)F(x)\right]}{\left[1-\lambda F(x)\right]\left[1-2\log(\lambda)-\lambda\left(1-\log(\lambda)\right)F(x)\right]}, \quad x\in\mathbb{R},\ \lambda\in(0,1), \]
where h_F(x) = f(x)/\bar F(x) denotes the HRF of the baseline distribution.
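The three functions above can be sketched for an arbitrary baseline as follows; the exponential baseline with hypothetical values θ = 1.3 and λ = 0.6 is used only to check numerically that g₁ integrates to one:

```python
import math

def G1(x, lam, F):
    # CDF of the first family for a baseline CDF F
    Fx, log_l = F(x), math.log(lam)
    num = Fx * (1 - lam + (3*lam - 2)*log_l
                - lam*(1 - lam + (2*lam - 1)*log_l)*Fx)
    return num / ((1 - 2*log_l) * (1 - lam*Fx)**2)

def g1(x, lam, F, f):
    # PDF of the first family for a baseline CDF F and PDF f
    Fx, log_l = F(x), math.log(lam)
    num = f(x) * (1 - lam + (3*lam - 2)*log_l
                  - lam*(1 - lam + lam*log_l)*Fx)
    return num / ((1 - 2*log_l) * (1 - lam*Fx)**3)

def tau1(x, lam, F, f):
    # HRF of the first family, written as g1 / (1 - G1)
    return g1(x, lam, F, f) / (1 - G1(x, lam, F))

# exponential baseline with hypothetical parameters
theta, lam = 1.3, 0.6
F = lambda x: 1 - math.exp(-theta*x)
f = lambda x: theta*math.exp(-theta*x)

# trapezoidal check that the density integrates to ~1
n, xmax = 20000, 40.0
h = xmax/n
area = sum(0.5*h*(g1(i*h, lam, F, f) + g1((i+1)*h, lam, F, f)) for i in range(n))
```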
Let us study the identifiability of the distribution given by (4) under the exponential baseline distribution F(x; θ) = 1 − e^{−θx}. We obtain the discrete Lindley exponential distribution of the first kind, which we denote by LiE1.
Theorem 1. 
The LiE1 distribution is identifiable with respect to the parameters λ and θ .
Proof. 
Let us suppose that
\[ G_1(x;\theta_1,\lambda_1) = G_1(x;\theta_2,\lambda_2) \]
for all x > 0, where F(x) is the CDF of the exponential distribution. Letting x → ∞ on both sides of (7) and after some algebra, it can be concluded that λ_1 = λ_2. Now it is not hard to verify that θ_1 = θ_2. Hence the proof of the theorem.    □
Second, in [6], it was demonstrated that the random variable Y = min{Z_i}_{i=1}^{M} has CDF and PDF given by
\[ G_Y(y) = 1 - \frac{\Phi[1-F(y)]-\Phi(0)}{1-\Phi(0)}, \quad y\in\mathbb{R}, \]
and
\[ g_Y(y) = \frac{f(y)\,\Phi'[1-F(y)]}{1-\Phi(0)}, \quad y\in\mathbb{R}, \]
respectively.
Now, inserting (1) into Equation (8), the CDF of the random variable Y becomes
\[ G_2(x) = G_2(x;\psi,\lambda) = \frac{F(x)\left[1-2\log(\lambda)-\lambda\left(1-\log(\lambda)\right)\bar F(x)\right]}{\left(1-2\log(\lambda)\right)\left[1-\lambda \bar F(x)\right]^{2}}, \quad x\in\mathbb{R},\ \lambda\in(0,1), \]
where \(\bar F(x) = 1-F(x)\) is the survival function of the random variable Z_1.
In a similar manner, replacing (1) in Equation (9), the PDF of Y reduces to
\[ g_2(x) = \frac{f(x)\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+\lambda\log(\lambda)\right)\bar F(x)\right]}{\left(1-2\log(\lambda)\right)\left[1-\lambda \bar F(x)\right]^{3}}, \quad x\in\mathbb{R},\ \lambda\in(0,1). \]
A random variable Y having the PDF (11) is said to belong to the discrete Lindley generated family of the second kind, denoted Y ∼ LiF2(λ, ψ).
From Equations (10) and (11), the HRF of Y follows as
\[ \tau_2(x) = \frac{h_F(x)\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+\lambda\log(\lambda)\right)\bar F(x)\right]}{\left[1-\lambda \bar F(x)\right]\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+(2\lambda-1)\log(\lambda)\right)\bar F(x)\right]}, \quad x\in\mathbb{R},\ \lambda\in(0,1), \]
where h_F(x) = f(x)/\bar F(x) is the HRF of the random variable Z_i.
There are at least four motivations for introducing these two families of distributions.

Reliability: from the stochastic representations X = max{Z_i}_{i=1}^{M} and Y = min{Z_i}_{i=1}^{M}, we note that the two families can arise in parallel and series systems with identical components, which appear in many industrial applications and biological organisms.

The first-activation scheme: suppose that an individual is susceptible to a certain cancer type. Let M be the number of carcinogenic cells that survived the initial treatment, and let Z_i be the time needed for the i-th carcinogenic cell to metastasise into a detectable tumour, for i ≥ 1. If {Z_i}_{i≥1} is a sequence of iid random variables, all independent of M, where M is given by (2), then the time to relapse of cancer of a susceptible individual is given by the random variable Y.

The last-activation scheme: suppose that M is the number of latent factors that all have to be active for failure to occur, and that Z_i is the time of disease resistance due to the i-th latent factor. Under the last-activation scheme, the failure occurs once all M factors are active. If the Z_i's are iid random variables independent of M, with baseline distribution F, where M follows (2), then the random variable X models the time to failure under the last-activation scheme.

The times to the last and first failures: suppose that a device fails because of M initial defects, which can be identified only after causing the failure and are then repaired perfectly. Define Z_i as the time to the device failure due to the i-th defect, for i ≥ 1. Under the assumption that the Z_i's are iid random variables independent of M given by (2), the random variables X and Y are appropriate for modeling the times to the last and first failures, respectively.

3. Shape Characteristics of the Proposed Models under the Exponential Baseline Distribution

Let us examine the shapes of the PDF and HRF in the case of the exponential baseline distribution. Let the random variables Z_i have the exponential distribution with parameter θ > 0. Setting F(x) = 1 − e^{−θx} in (5), we obtain the LiE1 distribution. Its PDF, for x > 0, θ > 0, λ ∈ (0,1), is
\[ g_1(x;\theta,\lambda) = \frac{\theta e^{-\theta x}\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+\lambda\log(\lambda)\right)\left(1-e^{-\theta x}\right)\right]}{\left(1-2\log(\lambda)\right)\left[1-\lambda\left(1-e^{-\theta x}\right)\right]^{3}}. \]
The exponential distribution is widely used due to its simplicity and applicability. For its usage in the theory of the compounding distribution, we recommend [10], where it is possible to find a long list of the corresponding references.
In order to study the shape of this PDF, we first give the following example. It plays a crucial role in establishing the inequality needed to draw conclusions about the PDF's shape in the proof of Theorem 2.
Example 1. 
Suppose λ ∈ (0,1). Find λ such that \((8\lambda^{2}-9\lambda+2)\log(\lambda) > 2\lambda^{2}-3\lambda+1\).
Solution: An analytical solution of the above inequality is not available, so we use numerical algorithms. Consider the corresponding equation \((8\lambda^{2}-9\lambda+2)\log(\lambda) = 2\lambda^{2}-3\lambda+1\). Using the function Solve in the Mathematica software ([11]), we get λ ≈ 0.3536. Furthermore, using the function Reduce, we see that the inequality holds for λ ∈ (0.3536, 1). The graphical solution is given in Figure 1.
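The same root can be recovered without Mathematica; a minimal bisection sketch in Python (the bracketing interval (0.2, 0.6) is chosen by inspection):

```python
import math

def d(lam):
    # difference between the two sides of the inequality in Example 1
    return (8*lam**2 - 9*lam + 2)*math.log(lam) - (2*lam**2 - 3*lam + 1)

# d changes sign on (0.2, 0.6), so bisect there
lo, hi = 0.2, 0.6
for _ in range(60):
    mid = 0.5*(lo + hi)
    if d(lo)*d(mid) <= 0:
        hi = mid
    else:
        lo = mid
root = 0.5*(lo + hi)   # ~0.3536
```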
Theorem 2. 
The PDF of LiE1 with parameters θ > 0 and λ ( 0 , 1 ) is unimodal if λ ( 0.3536 , 1 ) . Otherwise, it is decreasing.
Proof. 
The first derivative of the logarithm of the PDF g_1(x) can be represented in the form
\[ \left[\log g_1(x)\right]' = \frac{-\theta\, s(x)}{\left[1-\lambda\left(1-e^{-\theta x}\right)\right]\left[a+b\left(1-e^{-\theta x}\right)\right]}, \]
where \(s(x) = (a+b)(1-\lambda) - 2(a\lambda+b)e^{-\theta x} + \lambda b\, e^{-2\theta x}\), \(a = 1-\lambda+(3\lambda-2)\log(\lambda)\) and \(b = -\lambda\left(1-\lambda+\lambda\log(\lambda)\right)\). Setting y = e^{−θx}, we transform the function s(x) into the quadratic function \(s(y) = \lambda b y^{2} - 2(b+a\lambda)y + (a+b)(1-\lambda)\), y ∈ [0,1]. Let y_1 and y_2 represent the roots of the equation s(y) = 0. Some calculations indicate that a > 0, b < 0, b + aλ > 0 and a + b > 0. Thus,
\[ y_1+y_2 = \frac{2(a\lambda+b)}{\lambda b} < 0, \qquad y_1 y_2 = \frac{(a+b)(1-\lambda)}{\lambda b} < 0, \]
so we have y_1 < 0 < y_2 and |y_1| > y_2. After some calculations, it can be shown that the discriminant \(D = 4(a\lambda+b)^{2} - 4\lambda b(a+b)(1-\lambda)\) is positive and that s(y) is concave. We need to find when the solution y_2 lies in (0,1). If we set u = −b, one gets
\[ y_2 = \frac{\sqrt{(a\lambda-u)^{2}+\lambda u(a-u)(1-\lambda)}-(a\lambda-u)}{\lambda u}. \]
If y_2 < 1, then
\[ \sqrt{(a\lambda-u)^{2}+\lambda u(a-u)(1-\lambda)} < \lambda u + a\lambda - u. \]
It is not difficult to verify that the right-hand side of the last inequality is positive, so we can square (13). Then, the inequality (13) reduces to
\[ \lambda u\left(3\lambda a - u - a\right) > 0. \]
Now, the assertion of the first part of the theorem follows from Example 1.
In the case λ < 0.3536, s(y) is always positive on the interval (0,1), and hence the PDF is decreasing.    □
Different shapes of the PDF in cases of LiE1 model are given in Figure 2.
The HRF of the LiE1 distribution is
\[ h_1(x) = \frac{\theta\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+\lambda\log(\lambda)\right)\left(1-e^{-\theta x}\right)\right]}{\left[1-\lambda\left(1-e^{-\theta x}\right)\right]\left[1-2\log(\lambda)-\lambda\left(1-\log(\lambda)\right)\left(1-e^{-\theta x}\right)\right]}, \quad x>0,\ \theta>0,\ \lambda\in(0,1). \]
Determining the shape of the HRF of a distribution is an important issue in statistical reliability and survival analysis. We address it for the LiE1 model in the following theorem.
Theorem 3. 
The HRF of the LiE1 with parameters θ > 0 and λ ( 0 , 1 ) is an increasing function.
Proof. 
The first derivative of \(\log h_1(x)\) can be represented as
\[ \left[\log h_1(x)\right]' = \frac{-\theta e^{-\theta x}\, s(x)}{\left[a+b\left(1-e^{-\theta x}\right)\right]\left[1-\lambda\left(1-e^{-\theta x}\right)\right]\left[d-c\left(1-e^{-\theta x}\right)\right]}, \]
where a and b were defined in Theorem 2, \(c = \lambda\left(1-\log(\lambda)\right)\), \(d = 1-2\log(\lambda)\) and
\[ s(x) = \lambda b c\, e^{-2\theta x} - 2\lambda c(a+b)e^{-\theta x} + 2\lambda ac - \lambda ad - bd + \lambda cb - ca. \]
After extensive calculations, it can be shown that \(2\lambda ac - \lambda ad - bd + \lambda cb - ca < 0\).
Again, using the transformation y = e^{−θx}, where y ∈ [0,1], we get the quadratic equation s(y) = 0 with
\[ y_1+y_2 = \frac{2\lambda c(a+b)}{\lambda bc} = \frac{2(a+b)}{b} < 0, \qquad y_1y_2 = \frac{2\lambda ac-\lambda ad-bd+\lambda cb-ca}{\lambda bc} > 0. \]
Thus, we have y_1 < y_2 < 0. The function s(y) is concave, and it holds that s(y) < 0 for all y ∈ [0,1]. Finally, the HRF is increasing. Hence, we proved the theorem.    □
Different shapes of the HRF in the case of the LiE1 model are outlined in Figure 3.
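Theorem 3 can also be illustrated numerically; the sketch below evaluates h₁ on a grid for the hypothetical values θ = 2 and λ = 0.7 and checks monotonicity and the limiting value θ:

```python
import math

def h1(x, theta, lam):
    # HRF of the LiE1 distribution
    log_l = math.log(lam)
    u = 1 - math.exp(-theta*x)
    num = theta*(1 - lam + (3*lam - 2)*log_l - lam*(1 - lam + lam*log_l)*u)
    den = (1 - lam*u)*(1 - 2*log_l - lam*(1 - log_l)*u)
    return num/den

theta, lam = 2.0, 0.7   # hypothetical parameter values
vals = [h1(0.01*i, theta, lam) for i in range(1, 600)]
increasing = all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))
```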
Now, we study the shapes of the discrete Lindley exponential distribution of the second kind (LiE2). Replacing \(\bar F(x) = e^{-\theta x}\) in Equation (11), we obtain the PDF of the LiE2 distribution as
\[ g_2(x;\theta,\lambda) = \frac{\theta e^{-\theta x}\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+\lambda\log(\lambda)\right)e^{-\theta x}\right]}{\left(1-2\log(\lambda)\right)\left(1-\lambda e^{-\theta x}\right)^{3}}, \quad x>0,\ \theta>0,\ \lambda\in(0,1). \]
The shapes of the LiE2 distribution are given by the following theorem.
Theorem 4. 
The PDF of the LiE2 with parameters θ > 0 and λ ∈ (0,1) is a decreasing function with
\[ \lim_{x\to 0} g_2(x) = \frac{\theta\left(1-\lambda+(\lambda-2)\log(\lambda)\right)}{(1-\lambda)^{2}\left(1-2\log(\lambda)\right)} \quad\text{and}\quad \lim_{x\to\infty} g_2(x) = 0. \]
Proof. 
Similarly to Theorem 2, we have
\[ \left[\log g_2(x)\right]' = \frac{-\theta\, s(x)}{\left(1-\lambda e^{-\theta x}\right)\left(a+b e^{-\theta x}\right)}, \]
where \(s(x) = a + 2(b+a\lambda)e^{-\theta x} + \lambda b\, e^{-2\theta x}\), \(a = 1-\lambda+(3\lambda-2)\log(\lambda)\) and \(b = -\lambda\left(1-\lambda+\lambda\log(\lambda)\right)\). We prove that s(x) is positive for all x > 0. Letting y = e^{−θx}, we transform the function s(x) into the quadratic function \(s(y) = b\lambda y^{2} + 2(b+a\lambda)y + a\), y ∈ [0,1]. Since bλ < 0, s(y) is concave, so on [0,1] it is bounded below by the smaller of its endpoint values. We have s(0) = a > 0 and, after some calculations, \(s(1) = a(1+2\lambda) + b(2+\lambda) > 0\). Hence s(y) > 0 for all y ∈ [0,1]. Finally, s(x) > 0 for all x > 0 and \(\left[\log g_2(x)\right]' < 0\), so g_2 is decreasing.    □
The HRF of the LiE2 distribution for x > 0, θ > 0, λ ∈ (0,1) is given by
\[ h_2(x) = h_2(x;\theta,\lambda) = \frac{\theta\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+\lambda\log(\lambda)\right)e^{-\theta x}\right]}{\left[1-\lambda e^{-\theta x}\right]\left[1-\lambda+(3\lambda-2)\log(\lambda)-\lambda\left(1-\lambda+(2\lambda-1)\log(\lambda)\right)e^{-\theta x}\right]}. \]
The shape of the HRF of the LiE2 distribution is given in the following theorem.
Theorem 5. 
The HRF of the LiE2 distribution with parameters θ > 0 and λ ∈ (0,1) is a decreasing function with
\[ \lim_{x\to 0} h_2(x) = \frac{\theta\left(1-\lambda+(\lambda-2)\log(\lambda)\right)}{(1-\lambda)^{2}\left(1-2\log(\lambda)\right)} \quad\text{and}\quad \lim_{x\to\infty} h_2(x) = \theta. \]
Proof. 
We consider the logarithm of the HRF h_2(x). Its first derivative can be expressed as
\[ \left[\log h_2(x)\right]' = \frac{-\theta e^{-\theta x}\, t(x)}{\left(a+b e^{-\theta x}\right)\left(1-\lambda e^{-\theta x}\right)\left(a+c e^{-\theta x}\right)}, \]
where a and b are defined as in the proof of the previous theorem, \(c = -\lambda\left[1-\lambda+(2\lambda-1)\log(\lambda)\right]\) and \(t(x) = bc\lambda e^{-2\theta x} + 2ac\lambda e^{-\theta x} + a(b+a\lambda-c)\). By letting y = e^{−θx}, we transform the function t(x) into the quadratic function \(t(y) = bc\lambda y^{2} + 2ac\lambda y + a(b+a\lambda-c)\), y ∈ (0,1). As before, let y_1 < y_2 be the roots of the equation t(y) = 0. Some calculations indicate that a > 0, b < 0, c < 0 and b + aλ − c > 0, which implies that
\[ y_1+y_2 = -\frac{2a}{b} > 0, \qquad y_1y_2 = \frac{a(b+a\lambda-c)}{bc\lambda} > 0, \]
\[ (1-y_1)(1-y_2) = 1 + \frac{a}{bc\lambda}\left(b+a\lambda+2c\lambda-c\right) = 1 + \frac{a}{bc}\left[1-3\lambda+2\lambda^{2}-\left(3-6\lambda+4\lambda^{2}\right)\log(\lambda)\right] > 0. \]
Thus, two cases can be considered: 0 < y_1 < y_2 < 1 and 1 < y_1 < y_2. The first case is not possible, since
\[ y_1y_2 - 1 = \frac{a(b+a\lambda)-c(a+b\lambda)}{bc\lambda} > 0, \]
which follows from the fact that \(a+b\lambda = (1-\lambda)^{2}\left(1+\lambda-(\lambda+2)\log(\lambda)\right) > 0\). Thus, 1 < y_1 < y_2. Since bcλ > 0 and the discriminant \(D = -8ac\lambda^{3}(1-\lambda)^{2}\log^{2}(\lambda)\) is positive, it follows that t(y) is a convex function and positive on (0,1). This implies that t(x) is positive for all x > 0. Finally, \(h_2'(x) < 0\), which means that the HRF is a decreasing function.    □
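A quick numerical check of this monotonicity, with arbitrary illustrative values θ = 1 and λ = 0.5:

```python
import math

def h2(x, theta, lam):
    # HRF of the LiE2 distribution
    log_l = math.log(lam)
    e = math.exp(-theta*x)
    num = theta*(1 - lam + (3*lam - 2)*log_l - lam*(1 - lam + lam*log_l)*e)
    den = (1 - lam*e)*(1 - lam + (3*lam - 2)*log_l
                       - lam*(1 - lam + (2*lam - 1)*log_l)*e)
    return num/den

theta, lam = 1.0, 0.5   # hypothetical parameter values
vals = [h2(0.01*i, theta, lam) for i in range(1, 1200)]
decreasing = all(a >= b - 1e-12 for a, b in zip(vals, vals[1:]))
```

The hazard decreases from the starting value given in Theorem 5 towards the exponential rate θ.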
Using similar calculations, we can derive the shapes of the PDF and HRF of X and Y given by (5), (6), (11) and (12), respectively, under various baseline distributions.
Figure 4 presents plots of the LiE2 density function, while Figure 5 shows plots of the LiE2 hazard rate function for various parameter values.
Theorem 6. 
The LiE2 distribution function is identifiable with respect to the parameters θ and λ.
Proof. 
As in the proof of Theorem 1, we assume that G_2(x; θ_1, λ_1) = G_2(x; θ_2, λ_2) for all x > 0, where F(x) is the CDF of an exponential distribution. As a consequence, we have h_2(x; θ_1, λ_1) = h_2(x; θ_2, λ_2). Then, from Theorem 5, letting x → ∞, we obtain θ_1 = θ_2. Now, since θ_1 = θ_2, after some algebra it can be shown that h_2(0; θ_1, λ_1) = h_2(0; θ_2, λ_2) implies λ_1 = λ_2.    □

4. Some Mathematical Properties

4.1. Mixture Representations

In this section, we obtain a very useful representation for the LiF1 density function. For |z| < 1 and ρ > 0, we can write
\[ (1-z)^{-\rho} = \sum_{j=0}^{\infty} w_j z^{j}, \]
where \(w_j = \Gamma(\rho+j)/\left[\Gamma(\rho)\, j!\right]\) and \(\Gamma(\rho) = \int_0^{\infty} t^{\rho-1}e^{-t}\,dt\) is the gamma function. For λ ∈ (0,1), we can apply (14) in Equation (5) to obtain
\[ g_1(x) = f(x)\left[a(\lambda)+b(\lambda)F(x)\right]\sum_{j=0}^{\infty} v_j F(x)^{j}, \]
where \(a(\lambda) = 1-\lambda+(3\lambda-2)\log(\lambda)\), \(b(\lambda) = -\lambda\left[1-\lambda+\lambda\log(\lambda)\right]\) and
\[ v_j = v_j(\lambda) = \frac{\Gamma(j+3)\,\lambda^{j}}{2\left(1-2\log(\lambda)\right)j!}. \]
Henceforth, a random variable T_a is said to have the exponentiated-F ("exp-F") distribution with power parameter a > 0, written T_a ∼ exp-F(a), if its PDF and CDF are given by
\[ h_a(x) = a\, f(x)\, F^{a-1}(x) \quad\text{and}\quad H_a(x) = F^{a}(x), \]
respectively.
Then, using the exp-F distribution, we can write Equation (15) as
\[ g_1(x) = \sum_{j=0}^{\infty}\left[t_j\, h_{j+1}(x) + s_j\, h_{j+2}(x)\right] = \sum_{j=0}^{\infty} p_j\, h_{j+1}(x), \]
where \(t_j = a(\lambda)v_j/(j+1)\), \(s_j = b(\lambda)v_j/(j+2)\), \(p_j = t_j + s_{j-1}\) (for j ≥ 0) and \(s_{-1} = 0\).
Equation (16) is this section's main result. It shows that the LiF1 family density function is a mixture of exp-F distributions. Therefore, several structural properties of the LiF1 family (for instance, incomplete and ordinary moments, generating functions and mean deviations) can be obtained from the corresponding properties of the exp-F distribution. The mathematical properties of exp-F distributions have been studied by many authors in recent years; see, for example, Nadarajah and Kotz [12]. In the following sections, we provide some mathematical properties of the LiF1 family.
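For the exponential baseline, the mixture representation can be verified numerically; the following sketch (with hypothetical values θ = 1 and λ = 0.4) compares the direct density with a truncated version of the series:

```python
import math

theta, lam = 1.0, 0.4    # hypothetical parameter values
log_l = math.log(lam)
a = 1 - lam + (3*lam - 2)*log_l
b = -lam*(1 - lam + lam*log_l)

def g1(x):
    # direct LiE1 density
    Fx = 1 - math.exp(-theta*x)
    return theta*math.exp(-theta*x)*(a + b*Fx)/((1 - 2*log_l)*(1 - lam*Fx)**3)

def v(j):
    # v_j = Gamma(j+3) * lam^j / (2 * (1 - 2 log lam) * j!)
    return math.gamma(j + 3)*lam**j/(2*(1 - 2*log_l)*math.factorial(j))

def g1_series(x, J=80):
    # truncated mixture of exp-F densities h_{j+1} and h_{j+2}
    Fx = 1 - math.exp(-theta*x)
    fx = theta*math.exp(-theta*x)
    tot = 0.0
    for j in range(J):
        t_j = a*v(j)/(j + 1)
        s_j = b*v(j)/(j + 2)
        tot += t_j*(j + 1)*fx*Fx**j + s_j*(j + 2)*fx*Fx**(j + 1)
    return tot
```

The terms decay like \(j^{2}\lambda^{j}\), so 80 terms are far more than needed at λ = 0.4.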

4.2. Moments

Henceforth, let T_{j+1} have the exp-F density h_{j+1}(x) with power parameter j+1, that is, T_{j+1} ∼ exp-F(j+1). A first formula for the nth moment of the LiF1 family can be obtained from (16) as
\[ \mu_n' = E(X^{n}) = \sum_{j=0}^{\infty} p_j\, E\!\left(T_{j+1}^{\,n}\right). \]
Nadarajah and Kotz [12] provide explicit expressions for moments of some exponentiated distributions. They can be used to produce μ n .
A second formula for \(\mu_n'\) can be obtained from (17) in terms of the baseline quantile function (qf) \(Q_F(u) = F^{-1}(u)\). We obtain
\[ \mu_n' = \sum_{j=0}^{\infty} (j+1)\, p_j\, \tau(n,j), \]
where \(\tau(n,j) = \int_0^{1} Q_F(u)^{n} u^{j}\,du\).
Even though there is an infinite sum in the moments' equation, it is not difficult to calculate its value. For example, if we set the error tolerance to \(10^{-6}\), then four iterations would be enough for the computation of the moments.
Equations (17) and (18) can be used to directly determine the ordinary moments of some LiF1 distributions. Here, we consider three examples. The moments of the LiE1 distribution (with parameter θ > 0 from the exponential baseline distribution) are given by
\[ \mu_n' = \frac{n!}{\theta^{n}} \sum_{j=0}^{\infty}\sum_{i=0}^{j}\binom{j}{i}(-1)^{i}\, p_j\,(j+1)\, \frac{1}{(i+1)^{n+1}}. \]
In particular, we have
\[ E(X) = \frac{1}{\theta}\sum_{j=0}^{\infty} p_j\left[\psi(j+2)-\psi(1)\right], \]
where ψ(·) is the digamma function defined by \(\psi(\cdot) = \Gamma'(\cdot)/\Gamma(\cdot)\).
For the discrete Lindley Pareto distribution of the first kind (LiPa1), the baseline distribution is \(F(x) = 1-(1+x)^{-\nu}\), x > 0, and we have
\[ \mu_n' = \sum_{j=0}^{\infty}\sum_{i=0}^{n}\binom{n}{i}(-1)^{n-i}\, p_j\,(j+1)\, B\!\left(j+1,\, 1-\frac{i}{\nu}\right), \quad \nu > n, \]
where \(B(a,b) = \int_0^{1} t^{a-1}(1-t)^{b-1}\,dt\) is the beta function.
For the discrete Lindley standard logistic distribution of the first kind (LiSL1), the baseline distribution is \(F(x) = (1+e^{-x})^{-1}\), −∞ < x < ∞. Using an integral result from [13], we have
\[ \mu_n' = \sum_{j=0}^{\infty}\sum_{i=0}^{n}\binom{n}{i}(-1)^{2n-i}\, \frac{(j+1)\, p_j}{\Gamma(j+2)}\, \Gamma^{(i)}(1)\,\Gamma^{(n-i)}(j+1), \]
where
\[ \Gamma^{(m)}(a) = \int_0^{\infty} (\ln x)^{m}\, x^{a-1} e^{-x}\,dx. \]
Further, the central moments, that is, the moments about the mean, can also be computed. The relation between the central moments \(\mu_r\) and the moments about the origin is given by
\[ \mu_r = \sum_{k=0}^{r} (-1)^{k} \binom{r}{k} \left(\mu_1'\right)^{k} \mu_{r-k}'. \]
The cumulants of the distribution can also be computed together with the skewness and kurtosis measures. For this approach, we refer the reader to [14]. The skewness and kurtosis plots for these distributions are sketched in Figure 6, Figure 7, Figure 8 and Figure 9. We observe that various skewness and kurtosis values can be obtained from these models.
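As an illustration of the moment series above, the mean of the LiE1 model can be computed using the harmonic-number identity \(\psi(j+2)-\psi(1) = H_{j+1}\) and compared with a crude numerical integral; θ = 1 and λ = 0.4 are hypothetical values:

```python
import math

theta, lam = 1.0, 0.4    # hypothetical parameter values
log_l = math.log(lam)
a = 1 - lam + (3*lam - 2)*log_l
b = -lam*(1 - lam + lam*log_l)
D = 1 - 2*log_l

def v(j):
    return (j + 1)*(j + 2)*lam**j/(2*D)   # Gamma(j+3) lam^j / (2 D j!)

def p(j):
    # p_j = t_j + s_{j-1}, with s_{-1} = 0
    t_j = a*v(j)/(j + 1)
    s_prev = b*v(j - 1)/(j + 1) if j >= 1 else 0.0
    return t_j + s_prev

H = lambda m: sum(1.0/k for k in range(1, m + 1))   # harmonic number
mean_series = sum(p(j)*H(j + 1) for j in range(200))/theta

def g1(x):
    Fx = 1 - math.exp(-theta*x)
    return theta*math.exp(-theta*x)*(a + b*Fx)/(D*(1 - lam*Fx)**3)

# trapezoidal approximation of E(X)
n, xmax = 40000, 50.0
h = xmax/n
mean_num = sum(0.5*h*((i*h)*g1(i*h) + ((i + 1)*h)*g1((i + 1)*h)) for i in range(n))
```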

4.3. Generating Function

As far as the moment generating function (mgf) \(M(t) = E(e^{tX})\) of X is concerned, we provide two formulae. The first formula for M(t) comes from (16) as
\[ M(t) = \sum_{j=0}^{\infty} p_j\, M_{j+1}(t), \]
where \(M_{j+1}(t)\) is the mgf of \(T_{j+1}\). Therefore, M(t) is determined by the generating function of the exp-F(j+1) distribution. The second formula for M(t) is derived from (16) as
\[ M(t) = \sum_{j=0}^{\infty} (j+1)\, p_j\, \rho(t,j), \]
where ρ(t,j) can be calculated from \(Q_F(u)\) as
\[ \rho(t,j) = \int_0^{1} \exp\!\left[t\, Q_F(u)\right] u^{j}\,du. \]
Equations (20) and (21) can be used to directly obtain the mgfs of several LiF1 distributions. For example, the mgfs of the LiE1 (with exponential parameter θ) and LiSL1 distributions are
\[ M(t) = \sum_{j=0}^{\infty} (j+1)\, B\!\left(j+1,\, 1-\frac{t}{\theta}\right) p_j, \quad t < \theta, \]
and
\[ M(t) = \sum_{j=0}^{\infty} (j+1)\, B\!\left(t+j+1,\, 1-t\right) p_j, \quad t < 1, \]
respectively.
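A numerical sketch confirming the LiE1 mgf formula against direct integration (hypothetical values θ = 1, λ = 0.4, t = 0.3):

```python
import math

theta, lam, t = 1.0, 0.4, 0.3   # hypothetical values with t < theta
log_l = math.log(lam)
a = 1 - lam + (3*lam - 2)*log_l
b = -lam*(1 - lam + lam*log_l)
D = 1 - 2*log_l
v = lambda j: (j + 1)*(j + 2)*lam**j/(2*D)
p = lambda j: a*v(j)/(j + 1) + (b*v(j - 1)/(j + 1) if j >= 1 else 0.0)
beta = lambda x, y: math.gamma(x)*math.gamma(y)/math.gamma(x + y)

mgf_series = sum((j + 1)*beta(j + 1, 1 - t/theta)*p(j) for j in range(150))

def g1(x):
    Fx = 1 - math.exp(-theta*x)
    return theta*math.exp(-theta*x)*(a + b*Fx)/(D*(1 - lam*Fx)**3)

# trapezoidal approximation of E(exp(t X))
n, xmax = 40000, 60.0
h = xmax/n
mgf_num = sum(0.5*h*(math.exp(t*i*h)*g1(i*h)
                     + math.exp(t*(i + 1)*h)*g1((i + 1)*h)) for i in range(n))
```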

4.4. Incomplete Moments and Mean Deviations

For empirical purposes, the shapes of many distributions can be conveniently described by their incomplete moments. Such moments are important in measuring inequality; for example, income quantiles and the Lorenz and Bonferroni curves all depend on the incomplete moments of a distribution. The nth incomplete moment of the random variable X is defined as
\[ m_n(y) = \int_0^{y} x^{n}\, g_1(x)\,dx = \sum_{j=0}^{\infty} (j+1)\, p_j \int_0^{F(y)} Q_F(u)^{n} u^{j}\,du. \]
The integral in (22) can be computed in the closed-form for several baseline F distributions.
The mean deviations about the mean \(\left(\delta_1 = E(|X-\mu_1'|)\right)\) and about the median \(\left(\delta_2 = E(|X-M|)\right)\) of X can be expressed as \(\delta_1 = 2\mu_1' G_1(\mu_1') - 2m_1(\mu_1')\) and \(\delta_2 = \mu_1' - 2m_1(M)\), respectively, where \(\mu_1' = E(X)\) and M = Median(X) is the median of X, computed from
\[ G_1(M) = \frac{F(M)\left\{1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left[1-\lambda+(2\lambda-1)\log(\lambda)\right]F(M)\right\}}{\left[1-2\log(\lambda)\right]\left[1-\lambda F(M)\right]^{2}} = 0.5, \]
\(G_1(\mu_1')\) is easily calculated from (4), and \(m_1(z) = \int_{-\infty}^{z} x\, g_1(x)\,dx\) is the first incomplete moment.
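The median equation has no closed form for most baselines, but it is easy to solve numerically; a bisection sketch for the LiE1 case with hypothetical values θ = 1 and λ = 0.4:

```python
import math

theta, lam = 1.0, 0.4   # hypothetical parameter values
log_l = math.log(lam)

def G1(x):
    Fx = 1 - math.exp(-theta*x)
    num = Fx*(1 - lam + (3*lam - 2)*log_l
              - lam*(1 - lam + (2*lam - 1)*log_l)*Fx)
    return num/((1 - 2*log_l)*(1 - lam*Fx)**2)

# bisection for G1(M) = 0.5; G1 increases from 0 to 1
lo, hi = 0.0, 50.0
for _ in range(100):
    mid = 0.5*(lo + hi)
    if G1(mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = 0.5*(lo + hi)
```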
We provide two ways to compute \(\delta_1\) and \(\delta_2\). First, a general equation for \(m_1(z)\) can be derived from (16) by setting u = F(x):
\[ m_1(z) = \sum_{j=0}^{\infty} (j+1)\, p_j\, A_j(z), \]
where
\[ A_j(z) = \int_0^{F(z)} Q_F(u)\, u^{j}\,du, \quad\text{so that}\quad \int_{-\infty}^{z} x\, h_{j+1}(x)\,dx = (j+1)A_j(z). \]
Equation (24) provides the basic quantity for computing the mean deviations of the exp-F distributions. Hence, the mean deviations \(\delta_1\) and \(\delta_2\) depend only on the exp-F mean deviations, and alternative representations are \(\delta_1 = 2\mu_1' G_1(\mu_1') - 2\sum_{j=0}^{\infty}(j+1)p_j A_j(\mu_1')\) and \(\delta_2 = \mu_1' - 2\sum_{j=0}^{\infty}(j+1)p_j A_j(M)\).
In a similar way, the mean deviations of any LiF1 distribution can be computed from Equations (23) and (24). For example, for the LiE1 (with exponential parameter θ), LiPa1 (with parameter ν > 1) and LiSL1 distributions, they are determined (by using the generalized binomial expansion) from the functions
\[ A_j(z) = \sum_{m=0}^{j}\binom{j}{m}(-1)^{m}\, \frac{1-\left[1+(m+1)\theta z\right]e^{-(m+1)\theta z}}{\theta (m+1)^{2}}, \]
and
\[ A_j(z) = \sum_{m=0}^{j}\binom{j}{m}(-1)^{m}\,\nu\left[\frac{1-(1+z)^{1-(m+1)\nu}}{(m+1)\nu-1} - \frac{1-(1+z)^{-(m+1)\nu}}{(m+1)\nu}\right], \]
and
\[ A_j(z) = F(z)^{j+1}\left[\frac{\log F(z)}{j+1} - \frac{1}{(j+1)^{2}}\right] + \sum_{m=1}^{\infty} \frac{F(z)^{j+m+1}}{m(j+m+1)}, \quad F(z) = \left(1+e^{-z}\right)^{-1}, \]
respectively.
The Bonferroni and Lorenz curves for a given probability π are defined by \(B(\pi) = m_1(q)/(\pi\mu_1')\) and \(L(\pi) = m_1(q)/\mu_1'\), respectively, where \(\mu_1' = E(X)\) and q = Q(π) is the qf of the LiF1 distribution at π.

5. On the Maximum-Likelihood Estimation of Parameters

We propose to use the maximum likelihood (ML) estimation method for the parameter estimation of the introduced distributions. The log-likelihood function for the general case (5) is given by
\[ L(\lambda,\psi) = -n\log\!\left(1-2\log(\lambda)\right) - 3\sum_{i=1}^{n}\log\!\left(1-\lambda F(x_i;\psi)\right) + \sum_{i=1}^{n}\log f(x_i;\psi) + \sum_{i=1}^{n}\log\!\left[1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left(1-\lambda+\lambda\log(\lambda)\right)F(x_i;\psi)\right]. \]
In this special case, we consider the exponential baseline distribution. Thus, for the LiE1 model, the estimating equations are given by
\[ \frac{\partial L(\lambda,\theta)}{\partial\theta} = 3\lambda\sum_{i=1}^{n}\frac{x_i e^{-\theta x_i}}{1-\lambda\left(1-e^{-\theta x_i}\right)} + b\sum_{i=1}^{n}\frac{x_i e^{-\theta x_i}}{a+b\left(1-e^{-\theta x_i}\right)} + \frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0, \]
\[ \frac{\partial L(\lambda,\theta)}{\partial\lambda} = 3\sum_{i=1}^{n}\frac{1-e^{-\theta x_i}}{1-\lambda\left(1-e^{-\theta x_i}\right)} + \frac{2n}{\lambda\left(1-2\log(\lambda)\right)} + \sum_{i=1}^{n}\frac{2-\frac{2}{\lambda}+3\log(\lambda) - \left(1-e^{-\theta x_i}\right)\lambda\log(\lambda) - \left(1-e^{-\theta x_i}\right)\left(1-\lambda+\lambda\log(\lambda)\right)}{1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left(1-e^{-\theta x_i}\right)\left(1-\lambda+\lambda\log(\lambda)\right)} = 0, \]
where a and b are as defined in Theorem 2.
Now, we will study the existence of the ML estimators when the other parameter is known in advance (or given).
Theorem 7. 
If the parameter λ is known, then the Equation (25) has at least one root in the interval ( 0 , + ) .
Proof. 
One can readily verify that \(\lim_{\theta\to+\infty} \frac{\partial L(\lambda,\theta)}{\partial\theta} = -\sum_{i=1}^{n} x_i < 0\) and \(\lim_{\theta\to 0^{+}} \frac{\partial L(\lambda,\theta)}{\partial\theta} = +\infty\). Thus, there exists at least one root of Equation (25).    □
Theorem 8. 
Assuming that
\[ \sum_{i=1}^{n} e^{-\theta x_i} < \frac{n}{2} \]
and if the parameter θ is known, then (26) has at least one root on the interval ( 0 , 1 ) .
Proof. 
Applying L'Hôpital's rule, we get \(\lim_{\lambda\to 1^{-}} \frac{\partial L(\lambda,\theta)}{\partial\lambda} = -\infty\) and \(\lim_{\lambda\to 0^{+}} \frac{\partial L(\lambda,\theta)}{\partial\lambda} = 3\sum_{i=1}^{n}\left(1-e^{-\theta x_i}\right) - \frac{3n}{2}\).
In order to have at least one solution, it is enough to have \(3\sum_{i=1}^{n}\left(1-e^{-\theta x_i}\right) - \frac{3n}{2} > 0\), which holds under the stated assumption. Hence the theorem.    □
On the other hand, the estimating equations for the LiE2 model are given by
\[ \frac{\partial L(\lambda,\theta)}{\partial\theta} = -3\lambda\sum_{i=1}^{n}\frac{x_i e^{-\theta x_i}}{1-\lambda e^{-\theta x_i}} - b\sum_{i=1}^{n}\frac{x_i e^{-\theta x_i}}{a+b e^{-\theta x_i}} + \frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0, \]
\[ \frac{\partial L(\lambda,\theta)}{\partial\lambda} = 3\sum_{i=1}^{n}\frac{e^{-\theta x_i}}{1-\lambda e^{-\theta x_i}} + \frac{2n}{\lambda\left(1-2\log(\lambda)\right)} + \sum_{i=1}^{n}\frac{2-\frac{2}{\lambda}+3\log(\lambda) - e^{-\theta x_i}\lambda\log(\lambda) - e^{-\theta x_i}\left(1-\lambda+\lambda\log(\lambda)\right)}{1-\lambda+(3\lambda-2)\log(\lambda) - \lambda e^{-\theta x_i}\left(1-\lambda+\lambda\log(\lambda)\right)} = 0. \]
The next two theorems examine the existence problem of the ML estimates via (27) and (28). Their proofs are very similar to those cases of Theorems 7 and 8, so we here omit them.
Theorem 9. 
If the parameter λ is known, then the Equation (27) has at least one root on the interval ( 0 , + ) .
Theorem 10. 
If the parameter θ is known and if it is assumed that
\[ \sum_{i=1}^{n} e^{-\theta x_i} > \frac{n}{2}, \]
then the Equation (28) has at least one root on the interval ( 0 , 1 ) .
Clearly, the log-likelihood estimating equations for the parameters are nonlinear in the sense that the estimators cannot be obtained in closed forms. Thus, a numerical iterative method such as the Newton–Raphson one should be used in the estimation.
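As a sketch of how the ML fitting can be carried out in practice (all numbers here are synthetic and illustrative), one can simulate from the LiE1 model by inverting its CDF and then evaluate the log-likelihood; a Newton-Raphson step or a general-purpose optimizer would be applied to the same function:

```python
import math, random

theta_true, lam_true = 1.0, 0.6   # hypothetical "true" parameters

def G1(x, theta, lam):
    log_l = math.log(lam)
    Fx = 1 - math.exp(-theta*x)
    num = Fx*(1 - lam + (3*lam - 2)*log_l
              - lam*(1 - lam + (2*lam - 1)*log_l)*Fx)
    return num/((1 - 2*log_l)*(1 - lam*Fx)**2)

def sample(n):
    # inverse-transform sampling via bisection on the CDF
    random.seed(7)
    out = []
    for _ in range(n):
        u = random.random()
        lo, hi = 0.0, 100.0
        for _ in range(60):
            mid = 0.5*(lo + hi)
            if G1(mid, theta_true, lam_true) < u:
                lo = mid
            else:
                hi = mid
        out.append(0.5*(lo + hi))
    return out

def loglik(data, theta, lam):
    log_l = math.log(lam)
    a = 1 - lam + (3*lam - 2)*log_l
    b = -lam*(1 - lam + lam*log_l)
    s = 0.0
    for x in data:
        Fx = 1 - math.exp(-theta*x)
        s += (math.log(theta) - theta*x + math.log(a + b*Fx)
              - math.log(1 - 2*log_l) - 3*math.log(1 - lam*Fx))
    return s

data = sample(1000)
```

With a moderate sample size, the log-likelihood at the generating parameters clearly dominates its value at distant parameter pairs.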

6. Estimation of Parameters via the EM Algorithm

We use the method of maximum likelihood to estimate the parameters of the introduced models. The construction of the models suggests using an EM (expectation maximization) algorithm. In this section, we provide EM algorithms for the estimation of the unknown parameters θ and λ for both exponential-discrete Lindley distributions.

6.1. EM Algorithm for the LiE1 Model

The missing data are modeled by the random variable M with the zero-truncated discrete Lindley distribution. Its probability mass function is
$$P(M=m) = \frac{P(N=m)}{1-P(N=0)} = \frac{\lambda^{m-1}\left[\lambda\log(\lambda) + (1-\lambda)\left(1-(m+1)\log(\lambda)\right)\right]}{1-2\log(\lambda)}, \quad m=1,2,\ldots,$$
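As a quick numerical sanity check, the truncated probabilities should sum to one over $m \ge 1$. A minimal Python sketch (the function name and the value of λ are illustrative):

```python
import math

def ztdl_pmf(m, lam):
    """Zero-truncated discrete Lindley pmf P(M = m), m = 1, 2, ..."""
    log_l = math.log(lam)
    return (lam**(m - 1)
            * (lam * log_l + (1 - lam) * (1 - (m + 1) * log_l))
            / (1 - 2 * log_l))

lam = 0.5
# The geometric factor lam**(m-1) makes the tail negligible well before m = 2000.
total = sum(ztdl_pmf(m, lam) for m in range(1, 2000))
```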
where N is a random variable following the discrete Lindley distribution with parameter $\lambda \in (0,1)$. Next, the random variable $X=\max(Z_1,\ldots,Z_M)$ for a given $M=m$ has the CDF $\left(1-e^{-\theta x}\right)^m$. Then, the PDF of the complete-data distribution is given by
$$f(x,m) = \frac{\theta m \lambda^{m-1}\left\{\lambda\log(\lambda) + (1-\lambda)\left[1-(m+1)\log(\lambda)\right]\right\} e^{-\theta x}\left(1-e^{-\theta x}\right)^{m-1}}{1-2\log(\lambda)}.$$
The marginal PDF of X is given by
$$f_X(x) = \frac{\theta e^{-\theta x}\left\{1-\lambda+(3\lambda-2)\log(\lambda) - \left(1-e^{-\theta x}\right)\lambda\left[\lambda(\log(\lambda)-1)+1\right]\right\}}{(1-2\log(\lambda))\left[1-\left(1-e^{-\theta x}\right)\lambda\right]^{3}}.$$
Then, the conditional PDF of M for given X = x is given by
$$f_{M|X}(m|x) = \frac{m\left(1-e^{-\theta x}\right)^{m-1}\lambda^{m-1}\left[1-\left(1-e^{-\theta x}\right)\lambda\right]^{3}\left\{\lambda\log(\lambda) + (1-\lambda)\left[1-(m+1)\log(\lambda)\right]\right\}}{1-\lambda+(3\lambda-2)\log(\lambda) - \left(1-e^{-\theta x}\right)\lambda\left[\lambda(\log(\lambda)-1)+1\right]},$$
where m = 1 , 2 , 3 , .
The E-step of the EM algorithm requires the computation of the conditional expectation of the random variable M for a given X = x . Now, we have
$$E(M \mid X=x) = \frac{\lambda\log(\lambda)\left(3+4\xi-\xi^{2}\right) - (4\xi+2)\log(\lambda) + (1-\lambda)\left(1-\xi^{2}\right)}{(1-\xi)\left[1-\lambda+(3\lambda-2)\log(\lambda) - \xi\left(1-\lambda+\lambda\log(\lambda)\right)\right]},$$
where $\xi = \xi(x;\lambda,\theta) = \lambda\left(1-e^{-\theta x}\right)$.
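The closed form for the conditional expectation can be checked against direct summation of the conditional PMF. The following Python sketch (with arbitrary illustrative values of λ, θ and x) compares the two:

```python
import math

lam, theta, x = 0.5, 1.0, 1.0
log_l = math.log(lam)
xi = lam * (1 - math.exp(-theta * x))

# Normalizing constant of the conditional pmf of M given X = x.
D = 1 - lam + (3 * lam - 2) * log_l - xi * (1 - lam + lam * log_l)

def cond_pmf(m):
    """Conditional pmf f_{M|X}(m | x) written in terms of xi = lam*(1 - exp(-theta*x))."""
    return (m * xi**(m - 1) * (1 - xi)**3
            * (lam * log_l + (1 - lam) * (1 - (m + 1) * log_l)) / D)

# Direct summation of m * f_{M|X}(m | x); the tail is negligible long before m = 1000.
direct = sum(m * cond_pmf(m) for m in range(1, 1000))

# Closed-form E(M | X = x).
closed = ((lam * log_l * (3 + 4 * xi - xi**2) - (4 * xi + 2) * log_l
           + (1 - lam) * (1 - xi**2)) / ((1 - xi) * D))
```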
In the M-step, we consider the complete data log-likelihood function, which is given by
$$\ell_c(\theta,\lambda) = n\log(\theta) + \sum_{i=1}^{n}\log(m_i) - \theta\sum_{i=1}^{n}x_i + \sum_{i=1}^{n}(m_i-1)\log\left(1-e^{-\theta x_i}\right) + \left(\sum_{i=1}^{n}m_i - n\right)\log(\lambda) + \sum_{i=1}^{n}\log\left\{\lambda\log(\lambda) + (1-\lambda)\left[1-(m_i+1)\log(\lambda)\right]\right\} - n\log(1-2\log(\lambda)).$$
Maximizing the log-likelihood function $\ell_c(\theta,\lambda)$, the estimates obtained in the $(k+1)$th iteration are given by
$$\theta^{(k+1)} = n\left[n\bar{x} - \sum_{i=1}^{n}\frac{x_i\left(m_i^{(k+1)}-1\right)e^{-\theta^{(k+1)}x_i}}{1-e^{-\theta^{(k+1)}x_i}}\right]^{-1}$$
$$\lambda^{(k+1)} = \left[\frac{n\left(1+2\log(\lambda^{(k+1)})\right)}{2\log(\lambda^{(k+1)})-1} - \sum_{i=1}^{n}m_i^{(k+1)}\right]\times\left[\sum_{i=1}^{n}\frac{\left(m_i^{(k+1)}+2\right)\log(\lambda^{(k+1)}) - \frac{1-\lambda^{(k+1)}}{\lambda^{(k+1)}}\left(m_i^{(k+1)}+1\right)}{\lambda^{(k+1)}\log(\lambda^{(k+1)}) + \left(1-\lambda^{(k+1)}\right)\left[1-\left(m_i^{(k+1)}+1\right)\log(\lambda^{(k+1)})\right]}\right]^{-1},$$
where x ¯ is the sample mean and
$$m_i^{(k+1)} = \frac{\lambda^{(k)}\log(\lambda^{(k)})\left(3+4\xi_i^{(k)}-\left(\xi_i^{(k)}\right)^{2}\right) - \left(4\xi_i^{(k)}+2\right)\log(\lambda^{(k)}) + \left(1-\lambda^{(k)}\right)\left(1-\left(\xi_i^{(k)}\right)^{2}\right)}{\left(1-\xi_i^{(k)}\right)\left[1-\lambda^{(k)}+\left(3\lambda^{(k)}-2\right)\log(\lambda^{(k)}) - \xi_i^{(k)}\left(1-\lambda^{(k)}+\lambda^{(k)}\log(\lambda^{(k)})\right)\right]},$$
with $\xi_i^{(k)} = \xi(x_i;\lambda^{(k)},\theta^{(k)})$.
The solutions of these equations can be found using an iterative numerical process. For example, one can use the uniroot function in R (R Core Team, 2020).
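To make the E- and M-steps concrete, the following Python sketch runs one EM iteration for θ with λ held fixed (a simplification; the paper updates both parameters). The data values are illustrative, and the θ-update is solved by simple bisection rather than uniroot:

```python
import math

def e_step(x, lam, theta):
    """E-step: conditional expectations m_i = E(M | X = x_i) at the current (lam, theta)."""
    log_l = math.log(lam)
    ms = []
    for x_obs in x:
        xi = lam * (1 - math.exp(-theta * x_obs))
        D = 1 - lam + (3 * lam - 2) * log_l - xi * (1 - lam + lam * log_l)
        num = (lam * log_l * (3 + 4 * xi - xi**2) - (4 * xi + 2) * log_l
               + (1 - lam) * (1 - xi**2))
        ms.append(num / ((1 - xi) * D))
    return ms

def m_step_theta(x, ms, lo=1e-6, hi=50.0, iters=200):
    """M-step for theta with lam held fixed: root of the complete-data score by bisection."""
    n = len(x)
    def score(t):
        s = n / t - sum(x)
        for x_obs, m in zip(x, ms):
            e = math.exp(-t * x_obs)
            s += (m - 1) * x_obs * e / (1 - e)
        return s
    # The score goes from +inf (t -> 0+) to -sum(x) (t -> inf), so a root is bracketed.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(lo) * score(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

data = [0.4, 1.1, 0.7, 1.9, 0.8, 1.3]
m_vals = e_step(data, lam=0.5, theta=1.0)
theta_new = m_step_theta(data, m_vals)
```

In practice the λ-update would be solved by a second one-dimensional root search in the same iteration.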

6.2. EM Algorithm for the LiE2 Model

In this case, the random variable $Y=\min(Z_1,\ldots,Z_M)$ for a given $M=m$ has the exponential distribution with rate parameter $\theta m$. Thus, the PDF of the hypothetical complete-data distribution is
$$f(y,m) = \frac{\lambda^{m-1}\left[\lambda\log(\lambda) + (1-\lambda)\left(1-(m+1)\log(\lambda)\right)\right]\theta m e^{-\theta m y}}{1-2\log(\lambda)}, \quad y>0,\ m=1,2,\ldots$$
Following some calculations, we can deduce that the marginal PDF of the random variable Y is given by
$$f(y) = \frac{\theta e^{-\theta y}\left[1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left(1-\lambda+\lambda\log(\lambda)\right)e^{-\theta y}\right]}{(1-2\log(\lambda))\left(1-\lambda e^{-\theta y}\right)^{3}}, \quad y>0,$$
which implies that the conditional PDF of M for given Y = y has the form
$$f_{M|Y}(m|y) = \frac{m\lambda^{m-1}e^{-\theta(m-1)y}\left(1-\lambda e^{-\theta y}\right)^{3}\left[\lambda\log(\lambda) + (1-\lambda)\left(1-(m+1)\log(\lambda)\right)\right]}{1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left(1-\lambda+\lambda\log(\lambda)\right)e^{-\theta y}}, \quad m=1,2,\ldots$$
The E-step of the EM algorithm requires the computation of the conditional expectation of the random variable M for a given Y = y . We have that
$$E(M \mid Y=y) = \frac{1-\lambda+(3\lambda-2)\log(\lambda) - 4(1-\lambda)\lambda\log(\lambda)e^{-\theta y} - \lambda^{2}\left(1-\lambda+\lambda\log(\lambda)\right)e^{-2\theta y}}{\left(1-\lambda e^{-\theta y}\right)\left[1-\lambda+(3\lambda-2)\log(\lambda) - \lambda\left(1-\lambda+\lambda\log(\lambda)\right)e^{-\theta y}\right]}.$$
In the M-step, we need the complete data log-likelihood function, which is given by
$$\ell_c(\theta,\lambda) = n\log(\theta) + \sum_{i=1}^{n}\log(m_i) - \theta\sum_{i=1}^{n}m_i y_i + \left(\sum_{i=1}^{n}m_i - n\right)\log(\lambda) + \sum_{i=1}^{n}\log\left[\lambda\log(\lambda) + (1-\lambda)\left(1-(m_i+1)\log(\lambda)\right)\right] - n\log(1-2\log(\lambda)).$$
By maximizing the log-likelihood function l c ( θ , λ ) , we obtain the estimates in the k + 1 iteration as follows:   
$$\theta^{(k+1)} = \frac{n}{\sum_{i=1}^{n} y_i m_i^{(k+1)}},$$
$$\sum_{i=1}^{n}\frac{\lambda^{(k+1)}\left(m_i^{(k+1)}+2\right)\log(\lambda^{(k+1)}) - \left(1-\lambda^{(k+1)}\right)\left(1+m_i^{(k+1)}\right)}{\lambda^{(k+1)}\log(\lambda^{(k+1)}) + \left(1-\lambda^{(k+1)}\right)\left[1-\left(m_i^{(k+1)}+1\right)\log(\lambda^{(k+1)})\right]} + \frac{2n}{1-2\log(\lambda^{(k+1)})} = n - \sum_{i=1}^{n}m_i^{(k+1)},$$
where
$$m_i^{(k+1)} = \frac{1-\lambda^{(k)}+\left(3\lambda^{(k)}-2\right)\log(\lambda^{(k)}) - 4\left(1-\lambda^{(k)}\right)\lambda^{(k)}\log(\lambda^{(k)})e^{-\theta^{(k)}y_i} - \left(\lambda^{(k)}\right)^{2}\left(1-\lambda^{(k)}+\lambda^{(k)}\log(\lambda^{(k)})\right)e^{-2\theta^{(k)}y_i}}{\left(1-\lambda^{(k)}e^{-\theta^{(k)}y_i}\right)\left[1-\lambda^{(k)}+\left(3\lambda^{(k)}-2\right)\log(\lambda^{(k)}) - \lambda^{(k)}\left(1-\lambda^{(k)}+\lambda^{(k)}\log(\lambda^{(k)})\right)e^{-\theta^{(k)}y_i}\right]}.$$
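A single LiE2 EM iteration is particularly simple because the θ-update is available in closed form. The following Python sketch (with illustrative data and starting values) computes the E-step expectations and the resulting θ-update:

```python
import math

def e_step_lie2(y, lam, theta):
    """E-step: m_i = E(M | Y = y_i) for the LiE2 model at the current (lam, theta)."""
    log_l = math.log(lam)
    ms = []
    for yi in y:
        e1 = math.exp(-theta * yi)
        eta = lam * e1
        num = (1 - lam + (3 * lam - 2) * log_l
               - 4 * (1 - lam) * lam * log_l * e1
               - lam**2 * (1 - lam + lam * log_l) * e1**2)
        den = (1 - eta) * (1 - lam + (3 * lam - 2) * log_l
                           - lam * (1 - lam + lam * log_l) * e1)
        ms.append(num / den)
    return ms

y = [12.0, 5.5, 30.0, 2.0, 18.0]
m_vals = e_step_lie2(y, lam=0.4, theta=0.05)
# Closed-form M-step for theta: theta_new = n / sum(y_i * m_i)
theta_new = len(y) / sum(yi * m for yi, m in zip(y, m_vals))
```

The λ-update would then be obtained by a one-dimensional root search on its estimating equation.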

7. Simulation Study

In this section, we consider the LiE1 and LiE2 models and present a simulation study assessing the performance of the estimators based on the EM algorithm. We generated 10,000 random samples of sizes 50, 100 and 200 from both models.
We can generate random numbers from the LiE1 distribution by using the inverse transform method. Let u be a random number from the uniform distribution on $[0,1]$. Employing some algebra, we have $x = -\log(1-y)/\theta$, a number from the LiE1 distribution. Here,
$$y = \frac{2\lambda a u + c - \sqrt{\Delta_1}}{2\left(b + \lambda^{2} a u\right)},$$
where $a = 1-2\log(\lambda)$, $b = \lambda\left[1-\lambda+(2\lambda-1)\log(\lambda)\right]$, $c = 1-\lambda+(3\lambda-2)\log(\lambda)$ and $\Delta_1 = (2\lambda a u + c)^{2} - 4\left(\lambda^{2} a u + b\right) a u$.
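The generator can be validated with a round-trip check: feeding the generated y back into the LiE1 CDF, written in terms of $t = 1-e^{-\theta x}$ as $F(t) = (ct - bt^{2})/[a(1-\lambda t)^{2}]$ (our derivation from the quadratic above, stated here as an assumption), should recover u. A Python sketch:

```python
import math

def lie1_quantile(u, lam, theta):
    """Inverse-transform step for LiE1: maps u in (0,1) to a variate x (and the root y)."""
    log_l = math.log(lam)
    a = 1 - 2 * log_l
    b = lam * (1 - lam + (2 * lam - 1) * log_l)
    c = 1 - lam + (3 * lam - 2) * log_l
    disc = (2 * lam * a * u + c) ** 2 - 4 * (lam**2 * a * u + b) * a * u
    y = (2 * lam * a * u + c - math.sqrt(disc)) / (2 * (b + lam**2 * a * u))
    return -math.log(1 - y) / theta, y

def lie1_cdf_t(t, lam):
    """LiE1 CDF in t = 1 - exp(-theta*x); derived from the quadratic (assumption)."""
    log_l = math.log(lam)
    a = 1 - 2 * log_l
    b = lam * (1 - lam + (2 * lam - 1) * log_l)
    c = 1 - lam + (3 * lam - 2) * log_l
    return (c * t - b * t**2) / (a * (1 - lam * t) ** 2)

x, y = lie1_quantile(0.3, lam=0.65, theta=1.0)
```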
Similarly, we can generate random numbers from the LiE2 distribution by using the inverse transform method. Let u be a random number from the uniform distribution on $[0,1]$. Following some calculations, we have $y = -\log(x)/\theta$, a number from the LiE2 distribution. Here,
$$x = \frac{d + a(1-2\lambda u) - \sqrt{\Delta_2}}{2\left(d - \lambda^{2} a u\right)},$$
where $d = \lambda\left[1-\log(\lambda)\right]$ and $\Delta_2 = \left[(2\lambda u - 1)a - d\right]^{2} - 4a\left(d - \lambda^{2} a u\right)(1-u)$.
We used R (R Core Team, 2020) with the uniroot function to run the EM algorithms. We took the true parameter values as the starting points for the iterations. The algorithms stopped when $|\lambda^{(k+1)} - \lambda^{(k)}| < 10^{-5}$. The simulation results for the empirical means and mean square errors (MSEs) are reported in Table 1 and Table 2. We observe that the estimates are close to the true parameter values and that the MSEs decrease with increasing sample size. This makes the EM algorithm a plausible estimation method.

8. Real Data Fitting

In this section, we investigate the performance of the introduced distributions in data fitting. We also compare them with their natural competitor, the generalized exponential (GE) distribution studied in [15]. The GE distribution was proposed as an alternative to the exponential, gamma and Weibull distributions. Much work in the literature has shown that it is a flexible model for fitting reverse J-shaped and positively skewed unimodal data. The PDF of the GE distribution is given by
$$f(x;\alpha,\theta) = \alpha\theta e^{-\theta x}\left(1-e^{-\theta x}\right)^{\alpha-1}, \quad x, \alpha, \theta > 0.$$
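As a small check of this parameterization, numerically integrating the density should reproduce the closed-form CDF $(1-e^{-\theta x})^{\alpha}$ of the GE distribution. A Python sketch with arbitrary parameter values:

```python
import math

def ge_pdf(x, alpha, theta):
    """Generalized exponential density f(x; alpha, theta)."""
    return alpha * theta * math.exp(-theta * x) * (1 - math.exp(-theta * x)) ** (alpha - 1)

def ge_cdf(x, alpha, theta):
    """Closed-form CDF of the GE distribution."""
    return (1 - math.exp(-theta * x)) ** alpha

# Trapezoidal integration of the density over [0, upper] should match the CDF there.
alpha, theta, upper = 2.5, 1.2, 3.0
m = 20000
h = upper / m
num = h * (0.5 * ge_pdf(0, alpha, theta) + 0.5 * ge_pdf(upper, alpha, theta)
           + sum(ge_pdf(i * h, alpha, theta) for i in range(1, m)))
```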
We consider the maximum-likelihood method for estimation. Since we compare the models, we used direct maximization of the respective log-likelihood functions.

8.1. Carbon Data Set

Let us consider an uncensored data set from [16], which includes 100 observations of the breaking stress of carbon fibers in GPa. The data are given in Table 3.
The data were also used in [17].
We used the LiE1 distribution in fitting instead of LiE2, since the data exhibit a unimodal shape (see Figure 10). One can also use the total time on test (TTT) plot procedure to determine an appropriate model shape.
The TTT plots were introduced by [18] for model identification purposes, that is, for choosing a suitable lifetime distribution. These plots were studied in detail by [19]. Let $x_{(1)} \leq \cdots \leq x_{(n)}$ denote the ordered observations from a random sample of size n. The TTT plot is obtained in the following way:
  • Let $s_0 = 0$ and $x_{(0)} = 0$.
  • Calculate the TTT values $s_j = s_{j-1} + (n-j+1)\left(x_{(j)} - x_{(j-1)}\right)$ for $j = 1, 2, \ldots, n$.
  • Obtain the normalized TTT values $u_j = s_j / s_n$ for $j = 0, 1, 2, \ldots, n$.
  • Plot the points $(j/n, u_j)$ for $j = 0, 1, 2, \ldots, n$, and then join them by line segments.
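The steps above can be sketched directly in Python (illustrative helper name and sample values):

```python
def ttt_values(sample):
    """Normalized TTT transform values u_0, ..., u_n, following the steps above."""
    xs = sorted(sample)
    n = len(xs)
    s = [0.0]          # s_0 = 0
    prev = 0.0         # x_(0) = 0
    for j, xj in enumerate(xs, start=1):
        s.append(s[-1] + (n - j + 1) * (xj - prev))
        prev = xj
    sn = s[-1]         # s_n equals the sample total
    return [sj / sn for sj in s]

u = ttt_values([0.39, 0.81, 0.85, 0.98, 1.08, 1.12])
```

Plotting the points $(j/n, u_j)$ then gives the TTT plot; a concave trace suggests an increasing failure rate.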
A TTT plot is a diagnostic tool in the sense that it provides insight into the aging properties of the underlying distribution, so that one can choose an appropriate lifetime distribution for modeling the data. For example, when the TTT plot is concave, a life distribution with an increasing failure rate should be used. The TTT plot for the carbon data set is sketched on the left-hand side of Figure 10. It can be seen that it is concave. Thus, a model with an increasing failure rate, such as LiE1, should be used.
Further, the HRF can be not only increasing, but also constant, decreasing or even U-shaped. These features may also be inferred from the TTT plot. The HRF is constant when the TTT plot is a straight diagonal, decreasing when the TTT plot is convex, and U-shaped when the TTT plot is S-shaped, that is, first convex and then concave. When the ordering is reversed in the S-shaped case, an HRF with a unimodal characteristic is obtained.
Alternatively, we also fitted the LiSL1 and GE distributions to this data set and computed the parameter estimates using the optim function in R [20]. The results are reported in Table 4. We observe that the LiE1 distribution performs better than the others according to the Akaike information criterion (AIC). The Kolmogorov–Smirnov test statistic was 0.074605 with a p-value of 0.6338. Figure 10 also supports this good fit. On the other hand, the EM algorithm gave $\hat{\lambda} = 0.9415187$ and $\hat{\theta} = 1.432148$, which are similar to the values obtained from direct maximization.

8.2. Failure Data Set

The data set is based on the numbers of successive failures of the air conditioning systems of 13 Boeing 720 airplanes. The data set is from [21] and was recently analyzed in [22]. Since the data exhibit a reversed J-shape (see Figure 11), we used the LiE2 distribution in fitting. The TTT plot sketched on the left-hand side of Figure 11 also supports this choice, since it has a convex shape.
For convenience, the data are given in Table 5.
The fitting results are given in Table 6. According to the AIC, the LiE2 fit is better than the GE fit. The Kolmogorov–Smirnov test statistic is 0.050017 with a p-value of 0.7347. In addition, the EM algorithm gave $\hat{\lambda} = 0.3837683$ and $\hat{\theta} = 0.007553028$, which are close to the values obtained from direct maximization.

9. Conclusions

In this manuscript, we constructed two general probability distribution families using the discrete Lindley distribution. The families contain a baseline distribution which can be chosen by the user to obtain probability distributions of different shapes. The resulting distributions are not overly complex in the sense that the number of parameters of the baseline distribution is increased by only one. As an alternative to direct maximization of the log-likelihood, we constructed an EM algorithm to compute the ML estimates of the parameters. We mainly focused on the exponential baseline distribution and used the newly defined distributions in real data fitting.
As a part of further research, the introduced distributions may be studied in detail using other simple baseline distributions, such as the Pareto. Also, the Marshall–Olkin approach to constructing bivariate distributions can be used to define bivariate extensions of the introduced models.

Author Contributions

Conceptualization, S.K. and B.V.P.; methodology, S.K., B.V.P. and A.İ.G.; software, A.İ.G.; validation, S.K., B.V.P. and A.İ.G.; investigation, B.V.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sets are given in the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2005.
  2. Hu, X.; Zhang, L.; Sun, W. Risk model based on the first-order integer-valued moving average process with compound Poisson distributed innovations. Scand. Actuar. J. 2018, 5, 412–425.
  3. Asmussen, S. Ruin Probabilities; World Scientific Publishing: Singapore, 2000.
  4. Klugman, S.; Panjer, H.H.; Willmot, G.E. Loss Models: From Data to Decisions; Wiley: New York, NY, USA, 1998.
  5. Panjer, H.H.; Willmot, G.E. Insurance Risk Models; Society of Actuaries: Schaumburg, IL, USA, 1992.
  6. Nadarajah, S.; Popović, B.V.; Ristić, M.M. Compounding: An R package for computing continuous distributions obtained by compounding a continuous and a discrete distribution. Comput. Stat. 2013, 28, 977–992.
  7. Gómez-Déniz, E.; Calderín-Ojeda, E. The discrete Lindley distribution: Properties and applications. J. Stat. Comput. Simul. 2011, 81, 1405–1416.
  8. Abebe, B.; Shanker, R.A. A discrete Lindley distribution with applications in biological sciences. Biom. Biostat. Int. J. 2018, 7, 48–52.
  9. Oliveira, R.P.; Mazucheli, J.; Achcar, J.A. A comparative study between two discrete Lindley distributions. Cienc. Nat. 2017, 39, 539–552.
  10. Tahir, M.H.; Cordeiro, G.M. Compounding of distributions: A survey and new generalized classes. J. Stat. Distrib. Appl. 2016, 3, 13.
  11. Wolfram Research, Inc. Mathematica, Version 9.0; Wolfram Research, Inc.: Champaign, IL, USA, 2012.
  12. Nadarajah, S.; Kotz, S. The exponentiated type distributions. Acta Appl. Math. 2006, 92, 97–111.
  13. Brazauskas, V. Information matrix for Pareto (IV), Burr, and related distributions. Commun. Stat. Theory Methods 2003, 32, 315–325.
  14. Cordeiro, G.M.; Brito, R.S. The beta power distribution. Braz. J. Probab. Stat. 2012, 26, 88–112.
  15. Gupta, R.D.; Kundu, D. Generalized exponential distributions. Aust. N. Z. J. Stat. 1999, 41, 173–188.
  16. Nichols, M.D.; Padgett, W.J. A bootstrap control chart for Weibull percentiles. Qual. Reliab. Eng. Int. 2006, 22, 141–151.
  17. Lemonte, A.J.; Cordeiro, G.M. The exponentiated generalized inverse Gaussian distribution. Stat. Probab. Lett. 2011, 81, 506–517.
  18. Barlow, R.E.; Campo, R. Total time on test processes and applications to failure data analysis. In Reliability and Fault Tree Analysis; Barlow, R.E., Fussell, J., Singpurwalla, N.D., Eds.; SIAM: Philadelphia, PA, USA, 1975; pp. 451–481.
  19. Klefsjö, B. TTT-plotting: A tool for both theoretical and practical problems. J. Stat. Plan. Inference 1991, 29, 99–110.
  20. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020. Available online: https://www.R-project.org/ (accessed on 12 November 2022).
  21. Proschan, F. Theoretical explanation of observed decreasing failure rate. Technometrics 1963, 5, 375–383.
  22. Al-Saiary, Z.A.; Bakoban, R.A. The Topp-Leone generalized inverted exponential distribution with real data applications. Entropy 2020, 22, 1144.
Figure 1. Graphical solution of the inequality $f_1(\lambda) > f_2(\lambda)$, where $f_1(\lambda) = (8\lambda^2-9\lambda+2)\log(\lambda)$ and $f_2(\lambda) = 2\lambda^2-3\lambda+1$.
Figure 2. The plots of the density function of the LiE1 distribution for various choices of parameters with θ = 1 (left) and λ = 0.65 (right).
Figure 3. The plots of the HRF of the LiE1 distribution for various choices of parameter λ with θ = 1 .
Figure 4. The plots of the density function of the LiE2 distribution for various choices of parameters with θ = 1 (left) and λ = 0.5 (right).
Figure 5. The hazard plots of the LiE2 distribution for various choices of parameter λ with θ = 1 .
Figure 6. Skewness and kurtosis plots of the LiE1 distribution as a function of parameter λ .
Figure 7. Skewness and kurtosis plots of the LiPa1 distribution as a function of parameter λ .
Figure 8. Skewness and kurtosis plots of the LiPa1 distribution as a function of parameter ν .
Figure 9. Skewness and kurtosis plots of the LiSL1 distribution as a function of parameter λ .
Figure 10. TTT plot of the data set (on the left) and several fits for the Carbon data (on the right).
Figure 11. TTT plot of the data set (on the left) and two competing fits for the Failure data (on the right).
Table 1. Empirical means and MSEs (in parentheses) of the maximum-likelihood estimates of the LiE1 for different values of the parameters.

  n     λ     θ      λ̂ (MSE)          θ̂ (MSE)
  50    0.6   0.5    0.5954 (0.0163)   0.5159 (0.0089)
  50    0.6   1.0    0.5963 (0.0161)   1.0350 (0.0369)
  50    0.6   2.0    0.5965 (0.0159)   2.0675 (0.1448)
  100   0.6   0.5    0.5952 (0.0081)   0.5068 (0.0041)
  100   0.6   1.0    0.5957 (0.0080)   1.0143 (0.0164)
  100   0.6   2.0    0.5952 (0.0082)   2.0264 (0.0685)
  200   0.6   0.5    0.5954 (0.0040)   0.5020 (0.0020)
  200   0.6   1.0    0.5970 (0.0041)   1.0067 (0.0083)
  200   0.6   2.0    0.5949 (0.0040)   2.0058 (0.0334)
Table 2. Empirical means and MSEs (in parentheses) of the maximum-likelihood estimates of the LiE2 for different values of the parameters.

  n     λ     θ      λ̂ (MSE)          θ̂ (MSE)
  50    0.6   0.2    0.5335 (0.0368)   0.2379 (0.0119)
  50    0.6   1.0    0.5344 (0.0375)   1.1898 (0.2993)
  50    0.8   1.0    0.6883 (0.0434)   1.7007 (1.7356)
  100   0.6   0.2    0.5526 (0.0240)   0.2252 (0.0068)
  100   0.6   1.0    0.5455 (0.0248)   1.1424 (0.1737)
  100   0.8   1.0    0.7331 (0.0213)   1.4154 (0.7965)
  200   0.6   0.2    0.5668 (0.0127)   0.2176 (0.0036)
  200   0.6   1.0    0.5669 (0.0128)   1.0838 (0.0884)
  200   0.8   1.0    0.7616 (0.0093)   1.2371 (0.3335)
Table 3. Data on the breaking stress of carbon fibers.

0.39 0.81 0.85 0.98 1.08 1.12 1.17 1.18 1.22 1.25
1.36 1.41 1.47 1.57 1.57 1.59 1.59 1.61 1.61 1.69
1.69 1.71 1.73 1.80 1.84 1.84 1.87 1.89 1.92 2.00
2.03 2.03 2.05 2.12 2.17 2.17 2.17 2.35 2.38 2.41
2.43 2.48 2.48 2.50 2.53 2.55 2.55 2.56 2.59 2.67
2.73 2.74 2.76 2.77 2.79 2.81 2.82 2.83 2.85 2.87
2.88 2.93 2.95 2.96 2.97 2.97 3.09 3.11 3.11 3.15
3.15 3.19 3.19 3.22 3.22 3.27 3.28 3.31 3.31 3.33
3.39 3.39 3.51 3.56 3.60 3.65 3.68 3.70 3.75 4.20
4.38 4.42 4.70 4.90 4.91 5.08 5.56
Table 4. Maximum-likelihood estimates with standard errors in parentheses, log-likelihood and AIC values for the carbon data.

Model    λ̂                 θ̂                 α̂                 log-lik      AIC
LiE1     0.9419 (0.0169)   1.4344 (0.1187)   –                 −142.1633    288.3266
LiSL1    0.9528 (0.0127)   1.5067 (0.1109)   –                 −142.9535    289.9069
GE       –                 1.0132 (0.0875)   7.7883 (1.4962)   −146.1823    296.3646
Table 5. Data on the successive failures of the air conditioning system of each member of a fleet of 13 Boeing 720 jet airplanes.

194 413 90 74 55 23 97 50 359 50 130 487 57 102 15
14 10 57 320 261 51 44 9 254 493 33 18 209 41 58
60 48 56 87 11 102 12 5 14 14 29 37 186 29 104
35 98 54 100 11 181 65 49 12 239 14 18 39 3 12
5 36 79 59 33 246 1 79 3 27 201 84 27 156 21
16 88 130 14 118 44 15 42 106 46 230 26 59 153 104
20 206 5 66 34 29 26 35 5 82 31 118 326 12 54
36 34 18 25 120 31 22 18 216 139 67 310 3 46 210
57 76 14 111 97 62 39 30 7 44 11 63 23 22 23
14 18 13 34 16 18 130 90 163 208 1 24 70 16 101
52 208 95 62 11 191 14 7
Table 6. Maximum-likelihood estimates with standard errors in parentheses, log-likelihood and AIC values for the failure data.

Model    λ̂                 θ̂                 α̂                 log-lik      AIC
LiE2     0.3800 (0.1180)   0.0076 (0.0014)   –                 −1033.644    2071.288
GE       –                 0.0102 (0.0010)   0.9005 (0.0852)   −1036.907    2077.814