A weighted transmuted exponential distribution with environmental applications

In this paper, we introduce a new three-parameter distribution based on the combination of a re-parametrization of the so-called EGNB2 distribution and the transmuted exponential distribution. This combination modifies the transmuted exponential distribution through the incorporation of an additional parameter, which mainly adds a high degree of flexibility to the mode and affects the skewness and kurtosis of the tail. We explore some mathematical properties of this distribution, including the hazard rate function, moments, the moment generating function, the quantile function, various entropy measures and (reversed) residual life functions. A statistical study investigates the estimation of the parameters by the method of maximum likelihood. The distribution, along with other existing distributions, is fitted to two environmental data sets, and its superior performance is assessed by means of some goodness-of-fit tests. As a result, some environmental measures associated with these data are obtained, such as the return level and the mean deviation about this level.


Introduction
The precise analysis of a wide variety of data sets is limited by the use of models based on the classical distributions (normal, exponential, logistic, ...). For instance, the analysis of environmental data sets collected from observations of complex natural phenomena requires special treatment to reveal all the underlying information. Over the last decades, statisticians have provided numerous solutions, including several methods which aim to increase the flexibility of the classical distributions. Among these methods, a popular one is to construct a generator of distributions by compounding continuous distributions with well-known discrete distributions. This compounding is usually motivated by practical problems, such as those involving the cdf of the minimum or maximum of several independent and identically distributed random variables. An exhaustive survey on the construction of such generators, with the presentation of new ones, can be found in [22] and the references therein. Among the long list, let us briefly present the EGNB2 distribution introduced in [22, Remark 2 (ii)]. Starting from a baseline cumulative distribution function (cdf) G(x), its cdf, referred to as (1) below, involves the parameters α, η and υ; its general form is given in [22]. The EGNB2 distribution can be viewed as an extension of the G-negative binomial families introduced by [9] and [17]. It enjoys remarkable theoretical and practical properties.
In this study, we consider a particular case of the EGNB2 distribution, obtained through a re-parametrization of the parameters α, η and υ appearing in (1), as described below. Let γ > 0, η = −γ/(γ+1), υ = −(γ+1) and α = 1. This yields a cdf of the (simple) form:

$$F(x)=\frac{[1+\gamma G(x)]^{(\gamma+1)/\gamma}-1}{(1+\gamma)^{(\gamma+1)/\gamma}-1}. \tag{2}$$

Let us now explain the importance of this re-parametrization of (1), together with some statistical features. One can observe that F(x) has the following integral form:

$$F(x)=\int_0^{G(x)}p(t)\,dt,\qquad p(t)=\frac{\gamma+1}{(1+\gamma)^{(\gamma+1)/\gamma}-1}\,(1+\gamma t)^{1/\gamma},\quad t\in[0,1],$$

where p(t) is a pdf on [0, 1]. Hence, F(x) is a new particular case of the T-X family of cdfs introduced by [4]. Another remark is that, when G(x) → 0, we have

$$F(x)\sim\frac{\gamma+1}{(1+\gamma)^{(\gamma+1)/\gamma}-1}\,G(x),$$

and, when γ → 0, F(x) → (e^{G(x)} − 1)/(e − 1). This limiting transformation of cdfs corresponds to the one proposed in [10]. The resulting distributions have demonstrated good performance in the analysis of real-life data sets. Furthermore, the probability density function (pdf) associated with (2) is given by

$$f(x)=\frac{(1+\gamma)\,g(x)\,[1+\gamma G(x)]^{1/\gamma}}{(1+\gamma)^{(\gamma+1)/\gamma}-1},$$

where g(x) = G'(x) is the pdf associated with G(x). Note that we can also express it as a weighted pdf: f(x) = c w(x) g(x), where w(x) = [1 + γG(x)]^{1/γ} and c = (1+γ)/[(1+γ)^{(γ+1)/γ} − 1] is a normalizing constant. It thus belongs to the family of weighted distributions; further details on such distributions can be found in [19]. On the other hand, [5] introduced the transmuted exponential distribution defined by the cdf

$$G(x)=(1+\theta)H(x)-\theta H(x)^{2},\qquad |\theta|\le1,$$

where H(x) denotes the cdf of the exponential distribution. It has been shown that the additional parameter θ can significantly increase the flexibility of the exponential distribution, yielding a better fit than the exponential distribution itself. We refer the reader to [16] and the references therein.
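The following Python sketch (standard library only; the function names are hypothetical and chosen for this illustration) evaluates the transformation (2), F = [(1 + γG)^{(γ+1)/γ} − 1]/[(1 + γ)^{(γ+1)/γ} − 1], for a baseline value G = G(x) in [0, 1]. It can be used to check numerically that F(G)/G tends to c = (1 + γ)/[(1 + γ)^{(γ+1)/γ} − 1] as G → 0, and that F tends to (e^G − 1)/(e − 1) as γ → 0.

```python
import math


def generator_cdf(G, gam):
    """Transformation (2): F = [(1 + gam*G)^s - 1] / [(1 + gam)^s - 1], s = (gam + 1)/gam."""
    s = (gam + 1.0) / gam
    return ((1.0 + gam * G) ** s - 1.0) / ((1.0 + gam) ** s - 1.0)


def normalizing_constant(gam):
    """c = (1 + gam) / [(1 + gam)^((gam + 1)/gam) - 1] in the weighted form f = c w g."""
    s = (gam + 1.0) / gam
    return (1.0 + gam) / ((1.0 + gam) ** s - 1.0)


def weight(G, gam):
    """Weight w = [1 + gam*G]^(1/gam) of the weighted-pdf representation."""
    return (1.0 + gam * G) ** (1.0 / gam)


if __name__ == "__main__":
    # gamma -> 0: F approaches (e^G - 1)/(e - 1)
    print(generator_cdf(0.5, 1e-5), (math.exp(0.5) - 1.0) / (math.e - 1.0))
    # G -> 0: F(G)/G approaches the normalizing constant c
    print(generator_cdf(1e-8, 2.0) / 1e-8, normalizing_constant(2.0))
```

Because dF/dG = c[1 + γG]^{1/γ}, the chain rule gives f(x) = c w(x) g(x), which is the weighted form of the pdf.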
In this paper, we introduce a new three-parameter distribution which combines the features of the distribution characterized by (2) and the transmuted exponential distribution. This combination modifies the transmuted exponential distribution by incorporating the parameter γ, thereby taking advantage of the flexibility of the EGNB2 distribution. The main role of γ is to add a high degree of flexibility to the mode and to the skewness and kurtosis of the tail. We thus obtain a very flexible distribution, which opens new perspectives for the construction of statistical models for data analysis. Its theoretical and practical aspects are explored in detail. The theoretical ones include expansions of the cdf, pdf and hazard rate function (hrf), the quantile function, moments, the moment generating function, various entropy measures, residual life functions, conditional moments, mean deviations and the reversed residual life function. We investigate the estimation of its parameters via the maximum likelihood method. Two real-life data sets in environmental sciences are analyzed to show its superior performance in terms of fit in comparison with well-known distributions: the gamma distribution, the Marshall-Olkin exponential distribution [11], the Nadarajah-Haghighi exponential distribution [15], the exponentiated exponential distribution [7], the transmuted Weibull distribution [5], the transmuted generalized exponential distribution [8], the transmuted linear exponential distribution [23] and the Kappa distribution [13]. This performance recommends the proposed distribution as a hydrologic probability model, alongside the best-known ones such as the Kappa and gamma distributions. This motivates the estimation of important hydrologic parameters of these data sets by means of the new distribution.
The rest of this article is organized as follows. In Section 2, we present our main distribution. Some of its mathematical properties are studied in Section 3. Residual life functions are determined in Section 4. The estimation of the parameters is investigated in Section 5. Applications to two real-life data sets are provided in Section 6. Concluding remarks are given in Section 7.

A new weighted transmuted exponential distribution
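In this section, we specify the cdf G(x) used in (2). [21] and [5] introduced the quadratic rank transmutation map (QRTM) to propose new distributions based on the Weibull and exponential distributions, with great flexibility and a good fit to real-life data. In current studies, these distributions remain serious competitors in terms of modelling precision (see [16]). For these reasons, we use the QRTM in our study. We consider the cdf

$$G(x)=(1+\theta)H(x)-\theta H(x)^{2},\qquad |\theta|\le1,$$

where H(x) is the cdf of the exponential distribution with parameter λ > 0: H(x) = 1 − e^{−λx}, x > 0. Equivalently, G(x) = (1 − e^{−λx})(1 + θe^{−λx}). Substituting this expression into (2), we introduce a new cdf defined by

$$F(x)=\frac{\left[1+\gamma\left(1-e^{-\lambda x}\right)\left(1+\theta e^{-\lambda x}\right)\right]^{(\gamma+1)/\gamma}-1}{(1+\gamma)^{(\gamma+1)/\gamma}-1},\qquad x>0. \tag{3}$$

Another useful expression is the following one:

$$F(x)=\frac{(1+\gamma)^{(\gamma+1)/\gamma}\left[1-\frac{\gamma}{1+\gamma}\,e^{-\lambda x}\left(1-\theta+\theta e^{-\lambda x}\right)\right]^{(\gamma+1)/\gamma}-1}{(1+\gamma)^{(\gamma+1)/\gamma}-1}.$$

We will refer to the distribution given by (3) as the new weighted transmuted exponential distribution and denote it by NWTE(λ, γ, θ), where λ > 0, γ > 0 and |θ| ≤ 1. The corresponding pdf is given by

$$f(x)=\frac{(1+\gamma)\,\lambda e^{-\lambda x}\left(1-\theta+2\theta e^{-\lambda x}\right)\left[1+\gamma\left(1-e^{-\lambda x}\right)\left(1+\theta e^{-\lambda x}\right)\right]^{1/\gamma}}{(1+\gamma)^{(\gamma+1)/\gamma}-1},\qquad x>0. \tag{4}$$

The associated hrf is given by

$$\tau(x)=\frac{f(x)}{1-F(x)}=\frac{(1+\gamma)\,\lambda e^{-\lambda x}\left(1-\theta+2\theta e^{-\lambda x}\right)\left[1+\gamma\left(1-e^{-\lambda x}\right)\left(1+\theta e^{-\lambda x}\right)\right]^{1/\gamma}}{(1+\gamma)^{(\gamma+1)/\gamma}-\left[1+\gamma\left(1-e^{-\lambda x}\right)\left(1+\theta e^{-\lambda x}\right)\right]^{(\gamma+1)/\gamma}}. \tag{5}$$

Let us now discuss the possible shapes of the pdf (4) and the hrf (5).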
As x → 0, the pdf (4) tends to f(0) = (1+γ)(1+θ)λ/[(1+γ)^{(γ+1)/γ} − 1], and f(x) → 0 as x → ∞. On the other hand, the hrf (5) equals f(0) at x = 0 and tends to λ as x → ∞ when θ < 1 (to 2λ when θ = 1). In order to visualize the wide variety of shapes, some plots of the pdf (4) and hrf (5) are given in Figures 1 and 2. We see that γ has a great impact on the mode of the NWTE distribution. Moreover, the hrf can exhibit sudden spikes at the end of upside-down bathtub shapes, which makes the model suitable for analyzing non-stationary real-life data.
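The closed forms (3), (4) and (5) translate directly into code. In the sketch below (standard library only, hypothetical function names), the survival function is evaluated as (1 + γ)^{(γ+1)/γ}[1 − (1 − γh/(1 + γ))^{(γ+1)/γ}]/[(1 + γ)^{(γ+1)/γ} − 1], with h = e^{−λx}(1 − θ + θe^{−λx}), through `math.log1p` and `math.expm1`; computing 1 − F(x) directly would lose accuracy in the right tail, where the hrf is a ratio of two very small numbers.

```python
import math


def nwte_cdf(x, lam, gam, theta):
    """cdf (3) of NWTE(lam, gam, theta); lam > 0, gam > 0, |theta| <= 1."""
    u = math.exp(-lam * x)
    G = (1.0 - u) * (1.0 + theta * u)              # transmuted exponential cdf
    s = (gam + 1.0) / gam
    return ((1.0 + gam * G) ** s - 1.0) / ((1.0 + gam) ** s - 1.0)


def nwte_sf(x, lam, gam, theta):
    """S(x) = 1 - F(x), evaluated with log1p/expm1 to avoid cancellation."""
    u = math.exp(-lam * x)
    h = u * (1.0 - theta + theta * u)              # h(e^{-lam x}) = 1 - G(x)
    s = (gam + 1.0) / gam
    big = (1.0 + gam) ** s
    a = gam / (1.0 + gam)
    return -big * math.expm1(s * math.log1p(-a * h)) / (big - 1.0)


def nwte_pdf(x, lam, gam, theta):
    """pdf (4)."""
    u = math.exp(-lam * x)
    G = (1.0 - u) * (1.0 + theta * u)
    g = lam * u * (1.0 - theta + 2.0 * theta * u)  # transmuted exponential pdf
    s = (gam + 1.0) / gam
    return (1.0 + gam) * g * (1.0 + gam * G) ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def nwte_hrf(x, lam, gam, theta):
    """hrf (5) = f(x) / S(x)."""
    return nwte_pdf(x, lam, gam, theta) / nwte_sf(x, lam, gam, theta)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0):
        print(x, nwte_pdf(x, 1.0, 2.0, 0.5), nwte_hrf(x, 1.0, 2.0, 0.5))
```

With λ = 1, γ = 2 and θ = 0.5 (arbitrary illustrative values), the printed hrf settles near λ for large x, in line with its limiting behaviour when θ < 1.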

Expansion for the associated functions
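Expansion for the cdf function. First of all, set

$$h(u)=u(1-\theta+\theta u),\qquad u\in[0,1],$$

so that h'(u) = 1 − θ + 2θu ≥ 0 for |θ| ≤ 1, that is, h is increasing. Since h(0) = 0 and h(1) = 1, we have 0 < h(u) < 1 for all u ∈ (0, 1). Moreover, 1 − G(x) = h(e^{−λx}), so that 1 + γG(x) = (1+γ)[1 − (γ/(1+γ))h(e^{−λx})]. Since 0 < γ/(1+γ) < 1 and 0 < e^{−λx}(1 − θ + θe^{−λx}) = h(e^{−λx}) < 1, the generalized binomial expansion gives

$$\left[1-\frac{\gamma}{1+\gamma}\,h\!\left(e^{-\lambda x}\right)\right]^{(\gamma+1)/\gamma}=\sum_{k=0}^{\infty}\sum_{j=0}^{k}w_{k,j}\,e^{-(k+j)\lambda x}, \tag{6}$$

where

$$w_{k,j}=\binom{(\gamma+1)/\gamma}{k}\binom{k}{j}\left(-\frac{\gamma}{1+\gamma}\right)^{k}(1-\theta)^{k-j}\theta^{j}.$$

Therefore, we can expand the cdf function as

$$F(x)=\frac{(1+\gamma)^{(\gamma+1)/\gamma}\sum_{k=0}^{\infty}\sum_{j=0}^{k}w_{k,j}\,e^{-(k+j)\lambda x}-1}{(1+\gamma)^{(\gamma+1)/\gamma}-1}. \tag{7}$$

Expansion for the pdf function. Similar mathematical arguments to those used for (6), applied to [1 + γG(x)]^{1/γ}, give

$$f(x)=\lambda\sum_{k=0}^{\infty}\sum_{j=0}^{k}v_{k,j}\left[(1-\theta)e^{-(k+j+1)\lambda x}+2\theta e^{-(k+j+2)\lambda x}\right], \tag{8}$$

where

$$v_{k,j}=\frac{(1+\gamma)^{(\gamma+1)/\gamma}}{(1+\gamma)^{(\gamma+1)/\gamma}-1}\binom{1/\gamma}{k}\binom{k}{j}\left(-\frac{\gamma}{1+\gamma}\right)^{k}(1-\theta)^{k-j}\theta^{j}.$$

On the survival function. Note that

$$S(x)=1-F(x)=\frac{(1+\gamma)^{(\gamma+1)/\gamma}-[1+\gamma G(x)]^{(\gamma+1)/\gamma}}{(1+\gamma)^{(\gamma+1)/\gamma}-1}. \tag{9}$$

Using (7) and w_{0,0} = 1, we have the following expansion:

$$S(x)=-\frac{(1+\gamma)^{(\gamma+1)/\gamma}}{(1+\gamma)^{(\gamma+1)/\gamma}-1}\sum_{k=1}^{\infty}\sum_{j=0}^{k}w_{k,j}\,e^{-(k+j)\lambda x}. \tag{10}$$

Expansion for the hrf function. Using (5), (8) and (10), an expansion of the hrf τ(x) = f(x)/S(x) is given by

$$\tau(x)=-\frac{\lambda\left[(1+\gamma)^{(\gamma+1)/\gamma}-1\right]\sum_{k=0}^{\infty}\sum_{j=0}^{k}v_{k,j}\left[(1-\theta)e^{-(k+j+1)\lambda x}+2\theta e^{-(k+j+2)\lambda x}\right]}{(1+\gamma)^{(\gamma+1)/\gamma}\sum_{k=1}^{\infty}\sum_{j=0}^{k}w_{k,j}\,e^{-(k+j)\lambda x}}. \tag{11}$$

Another expansion comes from the geometric series decomposition 1/(1 − F(x)) = Σ_{n≥0}[F(x)]^n, valid since 0 ≤ F(x) < 1:

$$\tau(x)=\sum_{n=0}^{\infty}f(x)\,[F(x)]^{n}.$$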
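The expansions (7) and (8) can be checked against the closed forms (3) and (4). The Python sketch below is an illustration under stated assumptions: the outer sums are truncated at a hypothetical cut-off K, and the generalized binomial coefficients are built by the recursion C(a, k) = C(a, k − 1)(a − k + 1)/k. Since the k-th outer term is of order (γh(e^{−λx})/(1 + γ))^k, a moderate K suffices away from x = 0, while more terms are needed as x → 0.

```python
import math


def nwte_cdf(x, lam, gam, theta):
    """Closed form (3)."""
    u = math.exp(-lam * x)
    s = (gam + 1.0) / gam
    return ((1.0 + gam * (1.0 - u) * (1.0 + theta * u)) ** s - 1.0) / ((1.0 + gam) ** s - 1.0)


def nwte_pdf(x, lam, gam, theta):
    """Closed form (4)."""
    u = math.exp(-lam * x)
    s = (gam + 1.0) / gam
    base = 1.0 + gam * (1.0 - u) * (1.0 + theta * u)
    g = lam * u * (1.0 - theta + 2.0 * theta * u)
    return (1.0 + gam) * g * base ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def cdf_series(x, lam, gam, theta, K=60):
    """Expansion (7) with the coefficients w_{k,j} of (6), outer sum truncated at K."""
    s, a = (gam + 1.0) / gam, gam / (1.0 + gam)
    big, u = (1.0 + gam) ** s, math.exp(-lam * x)
    total, binom_s = 0.0, 1.0                     # binom_s = C(s, k)
    for k in range(K + 1):
        if k > 0:
            binom_s *= (s - k + 1) / k
        for j in range(k + 1):
            w = binom_s * math.comb(k, j) * (-a) ** k * (1.0 - theta) ** (k - j) * theta ** j
            total += w * u ** (k + j)
    return (big * total - 1.0) / (big - 1.0)


def pdf_series(x, lam, gam, theta, K=60):
    """Expansion (8) with the coefficients v_{k,j}, outer sum truncated at K."""
    s, a = (gam + 1.0) / gam, gam / (1.0 + gam)
    big, u = (1.0 + gam) ** s, math.exp(-lam * x)
    total, binom_r = 0.0, 1.0                     # binom_r = C(1/gam, k)
    for k in range(K + 1):
        if k > 0:
            binom_r *= (1.0 / gam - k + 1) / k
        ck = big / (big - 1.0) * binom_r * (-a) ** k
        for j in range(k + 1):
            v = ck * math.comb(k, j) * (1.0 - theta) ** (k - j) * theta ** j
            total += v * ((1.0 - theta) * u ** (k + j + 1) + 2.0 * theta * u ** (k + j + 2))
    return lam * total


if __name__ == "__main__":
    for x in (0.05, 0.5, 2.0):
        print(x, cdf_series(x, 1.0, 2.0, 0.5), nwte_cdf(x, 1.0, 2.0, 0.5))
```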

Quantile function
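The quantile function is in widespread use in general statistics to obtain mathematical properties of a distribution, and key percentiles are often tabulated from it. For generating data from the NWTE model, let U ∼ U(0, 1). Then, by inverting the cdf (3), and after some algebra, we get the quantile function

$$Q(u)=\frac{1}{\lambda}\log\left\{\frac{1-\theta+\sqrt{(1-\theta)^{2}+4\theta\left(1-G_u\right)}}{2\left(1-G_u\right)}\right\},\qquad G_u=\frac{\left[1+u\left((1+\gamma)^{(\gamma+1)/\gamma}-1\right)\right]^{\gamma/(\gamma+1)}-1}{\gamma}, \tag{12}$$

for u ∈ (0, 1), so that X = Q(U) follows the NWTE(λ, γ, θ) distribution. The variability of the skewness and kurtosis of X can be investigated with quantile-based measures. The Bowley skewness is given by

$$S=\frac{Q(3/4)+Q(1/4)-2Q(1/2)}{Q(3/4)-Q(1/4)},$$

and the Moors kurtosis by

$$K=\frac{Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)}{Q(6/8)-Q(2/8)},$$

where Q(u) is given by (12). These measures are less sensitive to outliers, and they exist even for distributions without moments. Since Q(u) is proportional to 1/λ, S and K do not depend on λ. Figure 3 displays plots of S and K as functions of θ and γ, which show their variability in terms of these shape parameters.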
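The quantile function (12) gives a direct way to simulate NWTE data by inversion and to evaluate the Bowley and Moors measures. The sketch below (standard library only; function names and seeds are hypothetical) implements these steps; comparing the results for two values of λ illustrates that S and K depend only on γ and θ.

```python
import math
import random


def nwte_cdf(x, lam, gam, theta):
    """cdf (3) of NWTE(lam, gam, theta)."""
    u = math.exp(-lam * x)
    G = (1.0 - u) * (1.0 + theta * u)
    s = (gam + 1.0) / gam
    return ((1.0 + gam * G) ** s - 1.0) / ((1.0 + gam) ** s - 1.0)


def nwte_quantile(p, lam, gam, theta):
    """Quantile function (12) for 0 <= p < 1."""
    s = (gam + 1.0) / gam
    D = (1.0 + gam) ** s - 1.0
    G_p = ((1.0 + p * D) ** (1.0 / s) - 1.0) / gam   # baseline cdf value solving F = p
    c = 1.0 - G_p                                     # c = e^{-lam x}(1 - theta + theta e^{-lam x})
    root = math.sqrt((1.0 - theta) ** 2 + 4.0 * theta * c)
    return math.log((1.0 - theta + root) / (2.0 * c)) / lam


def nwte_sample(n, lam, gam, theta, seed=None):
    """Inverse-transform sampling: X = Q(U) with U ~ U(0, 1)."""
    rng = random.Random(seed)
    return [nwte_quantile(rng.random(), lam, gam, theta) for _ in range(n)]


def bowley(lam, gam, theta):
    """Bowley skewness from the quartiles."""
    def q(p):
        return nwte_quantile(p, lam, gam, theta)
    return (q(0.75) + q(0.25) - 2.0 * q(0.5)) / (q(0.75) - q(0.25))


def moors(lam, gam, theta):
    """Moors kurtosis from the octiles."""
    def q(p):
        return nwte_quantile(p, lam, gam, theta)
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))


if __name__ == "__main__":
    print(nwte_sample(5, 1.0, 2.0, 0.5, seed=1))
    print(bowley(0.5, 2.0, 0.5), bowley(2.0, 2.0, 0.5))   # same value: S is free of lam
    print(moors(0.5, 2.0, 0.5), moors(2.0, 2.0, 0.5))     # same value: K is free of lam
```

The square-root branch is the one that returns Q(0) = 0 and keeps e^{−λx} in (0, 1] for every |θ| ≤ 1, so no separate case is needed for θ = 0.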

Moments and moment generating function
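Moments. Using expansion (8) and the gamma function Γ(ν) = ∫_0^{+∞} x^{ν−1}e^{−x}dx, the r-th moment about the origin is given by

$$E(X^{r})=\frac{\Gamma(r+1)}{\lambda^{r}}\sum_{k=0}^{\infty}\sum_{j=0}^{k}v_{k,j}\left[\frac{1-\theta}{(k+j+1)^{r+1}}+\frac{2\theta}{(k+j+2)^{r+1}}\right], \tag{13}$$

where the coefficients v_{k,j} are those of the pdf expansion (8).

The moment generating function. Similarly, the moment generating function associated with the NWTE distribution is given by, for t < λ,

$$M_X(t)=\lambda\sum_{k=0}^{\infty}\sum_{j=0}^{k}v_{k,j}\left[\frac{1-\theta}{(k+j+1)\lambda-t}+\frac{2\theta}{(k+j+2)\lambda-t}\right]. \tag{14}$$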
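The series (13) and (14) can be checked against a direct numerical integration of the pdf (4). The sketch below (standard library only) is an illustration, not the authors' code: the cut-off K, the Simpson rule and all function names are assumptions. It builds the coefficients v_{k,j} = (1 + γ)^{(γ+1)/γ}[(1 + γ)^{(γ+1)/γ} − 1]^{−1} C(1/γ, k) C(k, j)(−γ/(1 + γ))^k(1 − θ)^{k−j}θ^j of the pdf expansion.

```python
import math


def v_coeffs(gam, theta, K=80):
    """(k, j, v_{k,j}) triples of the pdf expansion, truncated at k = K."""
    s = (gam + 1.0) / gam
    big = (1.0 + gam) ** s
    a = gam / (1.0 + gam)
    out, binom_r = [], 1.0                        # binom_r = C(1/gam, k)
    for k in range(K + 1):
        if k > 0:
            binom_r *= (1.0 / gam - k + 1) / k
        ck = big / (big - 1.0) * binom_r * (-a) ** k
        for j in range(k + 1):
            out.append((k, j, ck * math.comb(k, j) * (1.0 - theta) ** (k - j) * theta ** j))
    return out


def moment(r, lam, gam, theta, K=80):
    """r-th raw moment E(X^r), series (13)."""
    total = sum(v * ((1.0 - theta) / (k + j + 1) ** (r + 1) + 2.0 * theta / (k + j + 2) ** (r + 1))
                for k, j, v in v_coeffs(gam, theta, K))
    return math.gamma(r + 1) / lam ** r * total


def mgf(t, lam, gam, theta, K=80):
    """Moment generating function M_X(t), series (14), stated for t < lam."""
    if t >= lam:
        raise ValueError("series (14) is stated for t < lam")
    total = sum(v * ((1.0 - theta) / ((k + j + 1) * lam - t) + 2.0 * theta / ((k + j + 2) * lam - t))
                for k, j, v in v_coeffs(gam, theta, K))
    return lam * total


def nwte_pdf(x, lam, gam, theta):
    """pdf (4), used only for the numerical cross-check."""
    u = math.exp(-lam * x)
    G = (1.0 - u) * (1.0 + theta * u)
    s = (gam + 1.0) / gam
    g = lam * u * (1.0 - theta + 2.0 * theta * u)
    return (1.0 + gam) * g * (1.0 + gam * G) ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def simpson(fn, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    acc = fn(a) + fn(b)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * fn(a + i * h)
    return acc * h / 3.0


if __name__ == "__main__":
    p = (1.0, 2.0, 0.5)
    print(moment(1, *p), simpson(lambda x: x * nwte_pdf(x, *p), 0.0, 60.0, 12000))
    print(mgf(0.0, *p))   # should be 1
```

For θ < 0 the inner sums alternate in sign and the truncated series can converge slowly or lose accuracy, so a direct numerical integration such as `simpson` is a useful cross-check.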

Entropies
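An entropy can be considered as a measure of the uncertainty of the probability distribution of a random variable. We therefore derive three entropies for the NWTE distribution and compare them in a numerical study. Entropy 1. Let us consider the Shannon entropy [20]:

$$H(X)=-E\left[\log f(X)\right]=-\int_0^{\infty}f(x)\log f(x)\,dx.$$

By (4), with c = (1+γ)/[(1+γ)^{(γ+1)/γ} − 1] and G(x) = (1 − e^{−λx})(1 + θe^{−λx}),

$$H(X)=-\log(c\lambda)+\lambda E(X)-E\left[\log\left(1-\theta+2\theta e^{-\lambda X}\right)\right]-\frac{1}{\gamma}E\left[\log\left(1+\gamma G(X)\right)\right].$$

Let us now expand the two remaining expectations by using the logarithmic expansion log(1 − z) = −Σ_{n≥1} z^n/n, |z| < 1, with z = θ(1 − 2e^{−λX}) and z = (γ/(1+γ))h(e^{−λX}), respectively, where h(u) = u(1 − θ + θu) and 1 + γG(X) = (1+γ)[1 − (γ/(1+γ))h(e^{−λX})]. This gives

$$H(X)=-\log(c\lambda)+\lambda E(X)-\frac{\log(1+\gamma)}{\gamma}+\sum_{n=1}^{\infty}\frac{\theta^{n}}{n}\sum_{i=0}^{n}\binom{n}{i}(-2)^{i}M_X(-i\lambda)+\frac{1}{\gamma}\sum_{n=1}^{\infty}\frac{1}{n}\left(\frac{\gamma}{1+\gamma}\right)^{n}\sum_{j=0}^{n}\binom{n}{j}(1-\theta)^{n-j}\theta^{j}M_X\!\left(-(n+j)\lambda\right),$$

where M_X(t) denotes the moment generating function defined by (14).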
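Entropy 2. Let us now focus our attention on the Rényi entropy [18]:

$$I_R(\beta)=\frac{1}{1-\beta}\log\int_0^{\infty}[f(x)]^{\beta}\,dx,\qquad \beta>0,\ \beta\neq1.$$

Similar mathematical arguments to those used for (6) give

$$[1+\gamma G(x)]^{\beta/\gamma}=(1+\gamma)^{\beta/\gamma}\sum_{k=0}^{\infty}\sum_{j=0}^{k}a_{k,j}\,e^{-(k+j)\lambda x},\qquad a_{k,j}=\binom{\beta/\gamma}{k}\binom{k}{j}\left(-\frac{\gamma}{1+\gamma}\right)^{k}(1-\theta)^{k-j}\theta^{j}.$$

On the other hand, observing that |θ(1 − 2e^{−λx})| < 1, similar mathematical arguments give

$$\left(1-\theta+2\theta e^{-\lambda x}\right)^{\beta}=\sum_{i=0}^{\infty}\sum_{l=0}^{i}b_{i,l}\,e^{-l\lambda x},\qquad b_{i,l}=\binom{\beta}{i}\binom{i}{l}(-\theta)^{i}(-2)^{l}.$$

Hence, with c = (1+γ)/[(1+γ)^{(γ+1)/γ} − 1], [f(x)]^β can be expanded as

$$[f(x)]^{\beta}=c^{\beta}\lambda^{\beta}(1+\gamma)^{\beta/\gamma}\sum_{k=0}^{\infty}\sum_{j=0}^{k}\sum_{i=0}^{\infty}\sum_{l=0}^{i}a_{k,j}\,b_{i,l}\,e^{-(\beta+k+j+l)\lambda x},$$

so that

$$I_R(\beta)=\frac{1}{1-\beta}\log\left\{c^{\beta}\lambda^{\beta-1}(1+\gamma)^{\beta/\gamma}\sum_{k=0}^{\infty}\sum_{j=0}^{k}\sum_{i=0}^{\infty}\sum_{l=0}^{i}\frac{a_{k,j}\,b_{i,l}}{\beta+k+j+l}\right\}.$$

Entropy 3. We now focus our attention on the entropy introduced by [12]:

$$J_{MH}(\delta)=\frac{1}{\delta-1}\left\{\int_0^{\infty}[f(x)]^{2-\delta}\,dx-1\right\},\qquad \delta\neq1,\ \delta<2.$$

The integral follows from the expansion of [f(x)]^β above with β = 2 − δ. Hence

$$J_{MH}(\delta)=\frac{1}{\delta-1}\left\{c^{2-\delta}\lambda^{1-\delta}(1+\gamma)^{(2-\delta)/\gamma}\sum_{k=0}^{\infty}\sum_{j=0}^{k}\sum_{i=0}^{\infty}\sum_{l=0}^{i}\frac{a_{k,j}\,b_{i,l}}{2-\delta+k+j+l}-1\right\},$$

where a_{k,j} and b_{i,l} are computed with β = 2 − δ. Some numerical values of the three entropies are given in Table 1. It can be observed that these entropies decrease as the parameter values increase. Moreover, J_MH(δ) takes the smallest values among the entropies considered here.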
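As an independent check of the series expressions, the three entropies can be evaluated by numerical integration. The sketch below assumes a composite Simpson rule on a finite interval [0, upper]; the cut-off `upper` is a hypothetical tuning choice that should be large compared with 1/(βλ), so that the neglected tail of f^β is negligible. It also assumes the form J_MH(δ) = (∫f^{2−δ}dx − 1)/(δ − 1), δ ≠ 1, δ < 2. Function names are hypothetical.

```python
import math


def nwte_pdf(x, lam, gam, theta):
    """pdf (4) of NWTE(lam, gam, theta)."""
    u = math.exp(-lam * x)
    G = (1.0 - u) * (1.0 + theta * u)
    s = (gam + 1.0) / gam
    g = lam * u * (1.0 - theta + 2.0 * theta * u)
    return (1.0 + gam) * g * (1.0 + gam * G) ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def simpson(fn, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    acc = fn(a) + fn(b)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * fn(a + i * h)
    return acc * h / 3.0


def shannon(lam, gam, theta, upper=60.0, n=12000):
    """Shannon entropy -int f log f."""
    def integrand(x):
        fx = nwte_pdf(x, lam, gam, theta)
        return -fx * math.log(fx) if fx > 0.0 else 0.0
    return simpson(integrand, 0.0, upper, n)


def renyi(beta, lam, gam, theta, upper=60.0, n=12000):
    """Renyi entropy log(int f^beta) / (1 - beta), beta > 0, beta != 1."""
    integral = simpson(lambda x: nwte_pdf(x, lam, gam, theta) ** beta, 0.0, upper, n)
    return math.log(integral) / (1.0 - beta)


def j_mh(delta, lam, gam, theta, upper=60.0, n=12000):
    """J_MH(delta) = (int f^(2 - delta) - 1) / (delta - 1), delta != 1, delta < 2."""
    integral = simpson(lambda x: nwte_pdf(x, lam, gam, theta) ** (2.0 - delta), 0.0, upper, n)
    return (integral - 1.0) / (delta - 1.0)


if __name__ == "__main__":
    p = (1.0, 2.0, 0.5)
    print(shannon(*p), renyi(0.5, *p), j_mh(0.5, *p))
```

Both the Rényi entropy (as β → 1) and J_MH (as δ → 1) tend to the Shannon entropy, which gives a convenient sanity check of an implementation.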

Conditional moments and mean deviations
Here, we introduce an important lemma which will be used in the next sections.

Lemma 1
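Let J_r(t) = ∫_0^t x^r f(x)dx and let γ(t, ν) = ∫_0^t x^{ν−1}e^{−x}dx be the lower incomplete gamma function. Then, with the coefficients v_{k,j} of the pdf expansion (8), we have

$$J_r(t)=\frac{1}{\lambda^{r}}\sum_{k=0}^{\infty}\sum_{j=0}^{k}v_{k,j}\left[\frac{(1-\theta)\,\gamma\!\left((k+j+1)\lambda t,\ r+1\right)}{(k+j+1)^{r+1}}+\frac{2\theta\,\gamma\!\left((k+j+2)\lambda t,\ r+1\right)}{(k+j+2)^{r+1}}\right].$$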

Proof
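Using expansion (8), we have

$$J_r(t)=\lambda\sum_{k=0}^{\infty}\sum_{j=0}^{k}v_{k,j}\left[(1-\theta)\int_0^{t}x^{r}e^{-(k+j+1)\lambda x}\,dx+2\theta\int_0^{t}x^{r}e^{-(k+j+2)\lambda x}\,dx\right],$$

and the change of variables y = mλx gives ∫_0^t x^r e^{−mλx}dx = γ(mλt, r+1)/(mλ)^{r+1}, which yields the stated result.

The r-th conditional moments of the NWTE distribution are given by

$$E(X^{r}\mid X>t)=\frac{E(X^{r})-J_r(t)}{S(t)},$$

which can be computed using (9), (13) and Lemma 1. The same remark holds for the r-th reversed moments of the NWTE distribution, given by

$$E(X^{r}\mid X\le t)=\frac{J_r(t)}{F(t)}.$$

The mean deviation of X about the mean µ = E(X) can be expressed as δ = 2µF(µ) − 2J_1(µ), and the mean deviation of X about the median M has the form η = µ − 2J_1(M).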
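For integer r, the lower incomplete gamma function has the closed form γ(z, r + 1) = r![1 − e^{−z}Σ_{i=0}^{r} z^i/i!], which makes Lemma 1 easy to code. The sketch below (standard library only; the truncation level K, the bisection used for the median and all names are assumptions of this illustration) evaluates J_r(t) by the series of Lemma 1, with t = ∞ giving E(X^r), and then the mean deviations δ = 2µF(µ) − 2J_1(µ) and η = µ − 2J_1(M).

```python
import math


def v_coeffs(gam, theta, K=80):
    """(k, j, v_{k,j}) triples of the pdf expansion, truncated at k = K."""
    s = (gam + 1.0) / gam
    big = (1.0 + gam) ** s
    a = gam / (1.0 + gam)
    out, binom_r = [], 1.0                        # binom_r = C(1/gam, k)
    for k in range(K + 1):
        if k > 0:
            binom_r *= (1.0 / gam - k + 1) / k
        ck = big / (big - 1.0) * binom_r * (-a) ** k
        for j in range(k + 1):
            out.append((k, j, ck * math.comb(k, j) * (1.0 - theta) ** (k - j) * theta ** j))
    return out


def lower_gamma_int(z, r):
    """int_0^z y^r e^{-y} dy for an integer r >= 0 (z = inf gives r!)."""
    if math.isinf(z):
        return float(math.factorial(r))
    partial = sum(z ** i / math.factorial(i) for i in range(r + 1))
    return math.factorial(r) * (1.0 - math.exp(-z) * partial)


def J(r, t, lam, gam, theta, K=80):
    """Lemma 1: J_r(t) = int_0^t x^r f(x) dx for an integer r >= 0."""
    total = 0.0
    for k, j, v in v_coeffs(gam, theta, K):
        m1, m2 = k + j + 1, k + j + 2
        total += v * ((1.0 - theta) * lower_gamma_int(m1 * lam * t, r) / m1 ** (r + 1)
                      + 2.0 * theta * lower_gamma_int(m2 * lam * t, r) / m2 ** (r + 1))
    return total / lam ** r


def nwte_cdf(x, lam, gam, theta):
    u = math.exp(-lam * x)
    s = (gam + 1.0) / gam
    return ((1.0 + gam * (1.0 - u) * (1.0 + theta * u)) ** s - 1.0) / ((1.0 + gam) ** s - 1.0)


def nwte_pdf(x, lam, gam, theta):
    u = math.exp(-lam * x)
    s = (gam + 1.0) / gam
    base = 1.0 + gam * (1.0 - u) * (1.0 + theta * u)
    g = lam * u * (1.0 - theta + 2.0 * theta * u)
    return (1.0 + gam) * g * base ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def simpson(fn, a, b, n):
    """Composite Simpson rule, used here only for cross-checks."""
    h = (b - a) / n
    acc = fn(a) + fn(b)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * fn(a + i * h)
    return acc * h / 3.0


def median(lam, gam, theta):
    """Median M, by bisection on F(x) = 1/2."""
    lo, hi = 0.0, 1.0
    while nwte_cdf(hi, lam, gam, theta) < 0.5:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if nwte_cdf(mid, lam, gam, theta) < 0.5:
            lo = mid
        else:
            hi = mid
    return hi


def mean_deviations(lam, gam, theta):
    """(delta, eta): mean deviations about the mean and about the median."""
    mu = J(1, math.inf, lam, gam, theta)
    M = median(lam, gam, theta)
    delta = 2.0 * mu * nwte_cdf(mu, lam, gam, theta) - 2.0 * J(1, mu, lam, gam, theta)
    eta = mu - 2.0 * J(1, M, lam, gam, theta)
    return delta, eta
```

Since the median minimizes E|X − a| over a, one should always observe η ≤ δ.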

Residual lifetime function
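The residual life is described by the conditional random variable R(t) = X − t | X > t, t ≥ 0. Using (10), the survival function of the residual lifetime R(t) for the NWTE distribution is given by

$$S_{R(t)}(x)=\frac{S(x+t)}{S(t)},\qquad x\ge0,$$

where S is given by (9) or, in expanded form, by (10). The associated cdf is given by

$$F_{R(t)}(x)=1-\frac{S(x+t)}{S(t)}.$$

The corresponding pdf is given by

$$f_{R(t)}(x)=\frac{f(x+t)}{S(t)}.$$

The associated hrf is given by

$$\tau_{R(t)}(x)=\frac{f(x+t)}{S(x+t)}.$$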
The mean residual life is defined as

$$m(t)=E(X-t\mid X>t)=\frac{1}{S(t)}\int_t^{\infty}(x-t)f(x)\,dx=\frac{E(X)-J_1(t)}{S(t)}-t,$$

where f(x) is given by (4), S(t) by (9), E(X) by (13) and J_1(t) by Lemma 1. Further, the variance residual life is given by

$$\sigma^{2}(t)=\mathrm{Var}(X-t\mid X>t)=\frac{E(X^{2})-J_2(t)}{S(t)}-2t\,m(t)-t^{2}-m(t)^{2},$$

where E(X²) is given by (13) and J_2(t) by Lemma 1. Some numerical values of the mean residual life are displayed in Table 2 for various choices of the parameters γ and θ at the time points t = 1, 3, 5, 7, 10. It can be seen that the mean residual life increases with t and decreases as γ and θ increase.
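The mean and variance residual life can be evaluated numerically and compared with the classical identity m(t) = ∫_t^∞ S(x)dx / S(t). In the sketch below (standard library only, hypothetical names), E(X) − J_1(t) and E(X²) − J_2(t) are obtained by Simpson integration on [t, upper], where `upper` is an assumed cut-off beyond which the tail is negligible, and S(t) is computed in a cancellation-free form.

```python
import math


def nwte_pdf(x, lam, gam, theta):
    """pdf (4) of NWTE(lam, gam, theta)."""
    u = math.exp(-lam * x)
    G = (1.0 - u) * (1.0 + theta * u)
    s = (gam + 1.0) / gam
    g = lam * u * (1.0 - theta + 2.0 * theta * u)
    return (1.0 + gam) * g * (1.0 + gam * G) ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def nwte_sf(x, lam, gam, theta):
    """Survival function (9), evaluated with log1p/expm1 to avoid cancellation."""
    u = math.exp(-lam * x)
    h = u * (1.0 - theta + theta * u)              # 1 - G(x)
    s = (gam + 1.0) / gam
    big = (1.0 + gam) ** s
    return -big * math.expm1(s * math.log1p(-gam / (1.0 + gam) * h)) / (big - 1.0)


def simpson(fn, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    acc = fn(a) + fn(b)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * fn(a + i * h)
    return acc * h / 3.0


def mean_residual_life(t, lam, gam, theta, upper=80.0, n=16000):
    """m(t) = [E(X) - J_1(t)] / S(t) - t, with E(X) - J_1(t) = int_t^inf x f(x) dx."""
    tail1 = simpson(lambda x: x * nwte_pdf(x, lam, gam, theta), t, upper, n)
    return tail1 / nwte_sf(t, lam, gam, theta) - t


def variance_residual_life(t, lam, gam, theta, upper=80.0, n=16000):
    """sigma^2(t) = [E(X^2) - J_2(t)] / S(t) - 2 t m(t) - t^2 - m(t)^2."""
    m = mean_residual_life(t, lam, gam, theta, upper, n)
    tail2 = simpson(lambda x: x * x * nwte_pdf(x, lam, gam, theta), t, upper, n)
    return tail2 / nwte_sf(t, lam, gam, theta) - 2.0 * t * m - t * t - m * m


if __name__ == "__main__":
    for t in (1, 3, 5, 7, 10):
        print(t, mean_residual_life(t, 1.0, 2.0, 0.5))
```

Looping over t = 1, 3, 5, 7, 10 with λ = 1 and a grid of (γ, θ) values produces a table with the same layout as Table 2.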

Reversed residual life function
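The reversed residual life is described by the conditional random variable R̄(t) = t − X | X ≤ t, t ≥ 0. Using (3), the survival function of the reversed residual lifetime R̄(t) for the NWTE distribution is given by

$$S_{\bar R(t)}(x)=\frac{F(t-x)}{F(t)},\qquad 0\le x<t.$$

The associated cdf is given by

$$F_{\bar R(t)}(x)=1-\frac{F(t-x)}{F(t)}.$$

The corresponding pdf is obtained as

$$f_{\bar R(t)}(x)=\frac{f(t-x)}{F(t)}.$$

The associated hrf is given by

$$\tau_{\bar R(t)}(x)=\frac{f(t-x)}{F(t-x)}.$$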
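Moreover, the mean reversed residual life is defined as

$$\bar m(t)=E(t-X\mid X\le t)=\frac{1}{F(t)}\int_0^{t}(t-x)f(x)\,dx=t-\frac{J_1(t)}{F(t)},$$

where f(x) is given by (4), F(t) by (3) and J_1(t) by Lemma 1. Also, the variance reversed residual life is given by

$$\bar\sigma^{2}(t)=\mathrm{Var}(t-X\mid X\le t)=2t\,\bar m(t)-\bar m(t)^{2}-t^{2}+\frac{J_2(t)}{F(t)},$$

where J_2(t) is given by Lemma 1.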
In Table 3, we give some numerical values of the mean reversed residual life for different choices of the parameters γ and θ at the time points t = 1, 3, 5, 7, 10. From this table, the mean reversed residual life increases with t, as well as with γ and θ.
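Similarly, the mean reversed residual life satisfies m̄(t) = ∫_0^t F(x)dx / F(t), which follows from integrating (t − x)f(x) by parts on [0, t]. The sketch below (standard library only, hypothetical names) computes m̄(t) and the variance reversed residual life from J_1(t) and J_2(t), which are obtained here by Simpson integration rather than by the series of Lemma 1.

```python
import math


def nwte_cdf(x, lam, gam, theta):
    """cdf (3) of NWTE(lam, gam, theta)."""
    u = math.exp(-lam * x)
    s = (gam + 1.0) / gam
    return ((1.0 + gam * (1.0 - u) * (1.0 + theta * u)) ** s - 1.0) / ((1.0 + gam) ** s - 1.0)


def nwte_pdf(x, lam, gam, theta):
    """pdf (4) of NWTE(lam, gam, theta)."""
    u = math.exp(-lam * x)
    s = (gam + 1.0) / gam
    base = 1.0 + gam * (1.0 - u) * (1.0 + theta * u)
    g = lam * u * (1.0 - theta + 2.0 * theta * u)
    return (1.0 + gam) * g * base ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def simpson(fn, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    acc = fn(a) + fn(b)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * fn(a + i * h)
    return acc * h / 3.0


def mean_reversed_residual_life(t, lam, gam, theta, n=4000):
    """mbar(t) = t - J_1(t) / F(t), with J_1(t) = int_0^t x f(x) dx."""
    j1 = simpson(lambda x: x * nwte_pdf(x, lam, gam, theta), 0.0, t, n)
    return t - j1 / nwte_cdf(t, lam, gam, theta)


def variance_reversed_residual_life(t, lam, gam, theta, n=4000):
    """2 t mbar(t) - mbar(t)^2 - t^2 + J_2(t) / F(t)."""
    m = mean_reversed_residual_life(t, lam, gam, theta, n)
    j2 = simpson(lambda x: x * x * nwte_pdf(x, lam, gam, theta), 0.0, t, n)
    return 2.0 * t * m - m * m - t * t + j2 / nwte_cdf(t, lam, gam, theta)


if __name__ == "__main__":
    for t in (1, 3, 5, 7, 10):
        print(t, mean_reversed_residual_life(t, 1.0, 2.0, 0.5))
```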

Estimation
When the parameters λ, γ and θ of the NWTE distribution need to be estimated, several estimation approaches are possible. In this section, we investigate the maximum likelihood estimates (MLEs) of these parameters. We then present three goodness-of-fit statistics to compare the distributions fitted to a data set.

Maximum likelihood estimation
Let (x_1, ..., x_n) be a random sample of size n from the NWTE distribution. Set Θ = (λ, γ, θ)^T; then the MLE of Θ can be determined by maximizing the log-likelihood function ℓ(Θ) given by

$$\ell(\Theta)=n\log\left[(1+\gamma)\lambda\right]-n\log\left[(1+\gamma)^{(\gamma+1)/\gamma}-1\right]-\lambda\sum_{i=1}^{n}x_i+\sum_{i=1}^{n}\log\left(1-\theta+2\theta e^{-\lambda x_i}\right)+\frac{1}{\gamma}\sum_{i=1}^{n}\log\left[1+\gamma\left(1-e^{-\lambda x_i}\right)\left(1+\theta e^{-\lambda x_i}\right)\right].$$

Alternatively, the MLE of Θ can be obtained by solving the nonlinear system of likelihood equations

$$\frac{\partial\ell(\Theta)}{\partial\lambda}=0,\qquad\frac{\partial\ell(\Theta)}{\partial\gamma}=0,\qquad\frac{\partial\ell(\Theta)}{\partial\theta}=0.$$

Solving these equations simultaneously gives the MLE Θ̂ of Θ, with components λ̂, γ̂ and θ̂. Various numerical iterative techniques can be used to compute these estimates. In this study, we use the iterative algorithm behind the NMaximize command of the symbolic computation package Mathematica. Under some regularity conditions, the MLEs are asymptotically normal: the asymptotic distribution of (Θ̂ − Θ) is N_3(0_3, I(Θ)^{−1}), where I(Θ) = E(J(Θ)) is the expected information matrix and J(Θ) = {J_{rs}(Θ)}, with J_{rs}(Θ) = −∂²ℓ(Θ)/∂r∂s for r, s ∈ {λ, γ, θ}, is the observed information matrix. Thus, approximate confidence intervals and Wald tests can be constructed for the parameters.
Other estimation methods can also be considered, such as those used in [3].
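The paper maximizes ℓ(Θ) with Mathematica's NMaximize. Purely as an illustration in another language, the Python sketch below codes ℓ(Θ) from (4) and maximizes it with a basic compass (pattern) search; this optimizer is a deliberately simple stand-in, not the algorithm used in the paper, and the function names, starting point, step sizes and simulated sample are all hypothetical. Points outside the parameter space λ > 0, γ > 0, |θ| ≤ 1 receive a log-likelihood of −∞.

```python
import math
import random


def nwte_quantile(p, lam, gam, theta):
    """Quantile function (12), used here to simulate data by inversion."""
    s = (gam + 1.0) / gam
    D = (1.0 + gam) ** s - 1.0
    c = 1.0 - ((1.0 + p * D) ** (1.0 / s) - 1.0) / gam     # c = 1 - G_p
    root = math.sqrt((1.0 - theta) ** 2 + 4.0 * theta * c)
    return math.log((1.0 - theta + root) / (2.0 * c)) / lam


def loglik(params, data):
    """Log-likelihood l(Theta); -inf outside lam > 0, gam > 0, |theta| <= 1."""
    lam, gam, theta = params
    if lam <= 0.0 or gam <= 0.0 or abs(theta) > 1.0:
        return -math.inf
    s = (gam + 1.0) / gam
    norm = (1.0 + gam) ** s - 1.0
    if norm <= 0.0:
        return -math.inf
    n = len(data)
    ll = n * (math.log1p(gam) + math.log(lam) - math.log(norm))
    for x in data:
        u = math.exp(-lam * x)
        lin = 1.0 - theta + 2.0 * theta * u
        if lin <= 0.0:
            return -math.inf
        ll += -lam * x + math.log(lin) + math.log1p(gam * (1.0 - u) * (1.0 + theta * u)) / gam
    return ll


def pattern_search(fun, x0, steps, tol=1e-7, max_iter=5000):
    """Maximize fun by compass search: try +/- step per coordinate, halve steps when stuck."""
    x, fx, steps = list(x0), fun(list(x0)), list(steps)
    for _ in range(max_iter):
        moved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * steps[i]
                fy = fun(y)
                if fy > fx:
                    x, fx, moved = y, fy, True
                    break
        if not moved:
            steps = [st / 2.0 for st in steps]
            if max(steps) < tol:
                break
    return x, fx


if __name__ == "__main__":
    rng = random.Random(42)
    data = [nwte_quantile(rng.random(), 1.0, 2.0, 0.5) for _ in range(100)]
    est, best = pattern_search(lambda q: loglik(q, data), [0.8, 1.0, 0.0], [0.25, 0.5, 0.25])
    print("estimates (lam, gam, theta):", est, "log-likelihood:", best)
```

In practice, a quasi-Newton method with several starting points is usually faster, and standard errors can be obtained from the inverse of the observed information matrix J(Θ̂).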

Goodness-of-fit statistics
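In order to evaluate the goodness-of-fit of the fitted models, we consider the Anderson-Darling statistic (A*), the Cramér-von Mises statistic (W*) and the Kolmogorov-Smirnov statistic (K-S), given by

$$A^{*}=\left\{-n-\frac{1}{n}\sum_{i=1}^{n}(2i-1)\log\left[z_i\left(1-z_{n-i+1}\right)\right]\right\}\left(1+\frac{0.75}{n}+\frac{2.25}{n^{2}}\right),$$

$$W^{*}=\left\{\sum_{i=1}^{n}\left(z_i-\frac{2i-1}{2n}\right)^{2}+\frac{1}{12n}\right\}\left(1+\frac{0.5}{n}\right),$$

$$\text{K-S}=\max_{1\le i\le n}\left\{\frac{i}{n}-z_i,\ z_i-\frac{i-1}{n}\right\},$$

where z_i = F(y_i) and the y_i's are the ordered observations. The associated P-values are also determined. The best distribution in terms of fit is the one with the smallest statistics and the largest P-values.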
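The three statistics are straightforward to compute once the fitted cdf is available. The sketch below (standard library only, hypothetical names) assumes the usual forms of A* and W* with the small-sample correction factors (1 + 0.75/n + 2.25/n²) and (1 + 0.5/n); the P-values, which require the null distributions of these statistics, are not computed here.

```python
import math


def gof_statistics(data, cdf):
    """Return (K-S, W*, A*) for a fitted cdf, with z_i = F(y_i) for the ordered data y_i."""
    y = sorted(data)
    n = len(y)
    z = [cdf(v) for v in y]
    ks = max(max(i / n - z[i - 1], z[i - 1] - (i - 1) / n) for i in range(1, n + 1))
    w = sum((z[i - 1] - (2 * i - 1) / (2 * n)) ** 2 for i in range(1, n + 1)) + 1.0 / (12 * n)
    w_star = w * (1.0 + 0.5 / n)
    # z[n - i] is z_{n-i+1} in the 1-based notation of the formula
    a = -n - sum((2 * i - 1) * math.log(z[i - 1] * (1.0 - z[n - i])) for i in range(1, n + 1)) / n
    a_star = a * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return ks, w_star, a_star


if __name__ == "__main__":
    n = 10
    grid = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
    # an idealized "perfect" uniform fit: K-S = 1/(2n) and W* = (1/(12n))(1 + 0.5/n)
    print(gof_statistics(grid, lambda x: x))
```

To use it with the NWTE model, pass a cdf with the parameters fixed at their ML estimates, for instance `lambda x: nwte_cdf(x, lam_hat, gam_hat, theta_hat)`, where these names are placeholders.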

Applications
This section is devoted to the analysis of two data sets in environmental sciences, more precisely in hydrology, where we compare the fit of our new distribution with that of some well-known distributions. The best model among them is then selected.

Data fitting
We consider the following data sets: the "Ground-water data (GWD)" described in Table 1 of Bhaumik and Gibbons [6] and the "Flood data (FD)" described in Akinsete et al. [2]. The GWD represent vinyl chloride concentrations (n = 34) collected from clean upgradient monitoring wells. The FD represent flood rates for the years 1935-1973 (n = 39) for the Floyd River located in James, Iowa, USA. The descriptive statistics of both data sets are summarized in Table 4. From this table, both data sets are over-dispersed and skewed. For each data set, the NWTE model is compared with the following distributions:
• the gamma distribution;
• the Marshall-Olkin exponential distribution (MOE) [11];
• the Nadarajah-Haghighi exponential distribution (NHE) [15];
• the exponentiated exponential distribution (EE) [7];
• the transmuted Weibull distribution (TW) [5];
• the transmuted generalized exponential distribution (TGE) [8];
• the transmuted linear exponential distribution (TLE) [23];
• the Kappa distribution [13].

The MLEs with their standard errors are given in Tables 5 and 6 for both data sets, along with the goodness-of-fit statistics for each distribution. Tables 5 and 6 show that the NWTE distribution has the smallest statistics and the largest P-values, so it provides the best fit among the considered distributions. This conclusion is confirmed by Figure 4.

Hydrologic parameters
The good fit of the NWTE distribution motivates the determination of three important hydrologic parameters for the considered data sets: the return level, the conditional mean of the event data and the mean deviation about the return level. This supports the NWTE distribution as a hydrologic probability model, alongside the best-known ones such as the Kappa and gamma distributions.

Return level
A return period is an estimate of the likelihood of an event, such as a flood or a river discharge flow, to occur. The exceedance probability, return period and return level of the flood data and the ground-water contamination data can be estimated using the equations P(x_T) = 1 − F(x_T), T = 1/P(x_T) and x_T = F^{−1}(1 − 1/T), respectively, where F^{−1}(·) is the inverse of the cdf F(x) and P(x_T) is called the exceedance probability (see, for instance, [1, 14]). The return level x_T under the NWTE distribution is obtained by

$$x_T=Q\left(1-\frac{1}{T}\right)=\frac{1}{\lambda}\log\left\{\frac{1-\theta+\sqrt{(1-\theta)^{2}+4\theta\left(1-G_T\right)}}{2\left(1-G_T\right)}\right\},\qquad G_T=\frac{\left[1+\left(1-\frac{1}{T}\right)\left((1+\gamma)^{(\gamma+1)/\gamma}-1\right)\right]^{\gamma/(\gamma+1)}-1}{\gamma},$$

where x_T > 0 and T > 1. Table 7 provides estimates of the return level x_T of the ground-water contamination data and the flood data for the return periods T = 2, 5, 10, 20, 50, 100, 200 years, obtained by replacing the parameters λ, γ, θ by their ML estimates in Tables 5 and 6. Moreover, the return periods of some of the largest values of both data sets are reported in Table 8; they are computed using T = 1/P(x_T), where P(x_T) = Ŝ(x_T) is the estimated survival function of the NWTE distribution given by

$$\hat S(x_T)=\frac{(1+\hat\gamma)^{(\hat\gamma+1)/\hat\gamma}-\left[1+\hat\gamma\left(1-e^{-\hat\lambda x_T}\right)\left(1+\hat\theta e^{-\hat\lambda x_T}\right)\right]^{(\hat\gamma+1)/\hat\gamma}}{(1+\hat\gamma)^{(\hat\gamma+1)/\hat\gamma}-1},$$

where λ̂, γ̂ and θ̂ are the ML estimates corresponding to the used data, given in Tables 5 and 6.
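Return levels and return periods are inverse operations: x_T = Q(1 − 1/T) and T = 1/S(x_T). The sketch below (standard library only, hypothetical names) codes both; the parameter values used in the example are arbitrary placeholders, not the ML estimates of Tables 5 and 6.

```python
import math


def nwte_quantile(p, lam, gam, theta):
    """Quantile function of NWTE(lam, gam, theta) for 0 <= p < 1."""
    s = (gam + 1.0) / gam
    D = (1.0 + gam) ** s - 1.0
    c = 1.0 - ((1.0 + p * D) ** (1.0 / s) - 1.0) / gam     # c = 1 - G_p
    root = math.sqrt((1.0 - theta) ** 2 + 4.0 * theta * c)
    return math.log((1.0 - theta + root) / (2.0 * c)) / lam


def nwte_sf(x, lam, gam, theta):
    """Survival function S(x), evaluated with log1p/expm1 to avoid cancellation."""
    u = math.exp(-lam * x)
    h = u * (1.0 - theta + theta * u)                        # 1 - G(x)
    s = (gam + 1.0) / gam
    big = (1.0 + gam) ** s
    return -big * math.expm1(s * math.log1p(-gam / (1.0 + gam) * h)) / (big - 1.0)


def return_level(T, lam, gam, theta):
    """Return level x_T = Q(1 - 1/T) for a return period T > 1."""
    return nwte_quantile(1.0 - 1.0 / T, lam, gam, theta)


def return_period(x, lam, gam, theta):
    """Return period T = 1 / P(x), with exceedance probability P(x) = S(x)."""
    return 1.0 / nwte_sf(x, lam, gam, theta)


if __name__ == "__main__":
    params = (1.0, 2.0, 0.5)   # placeholder values, NOT the ML estimates of Tables 5 and 6
    for T in (2, 5, 10, 20, 50, 100, 200):
        print(T, round(return_level(T, *params), 4))
```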

Conditional mean of the event data
The conditional mean of the event (GWD or FD) data, based on equation (17), is defined as

$$E(X\mid X>Q)=\frac{1}{S(Q)}\int_Q^{\infty}x f(x)\,dx=\frac{E(X)-J_1(Q)}{S(Q)},$$

where S(x) is the survival function of the NWTE distribution and Q is a value of the event. For example, E(X | X > 8.0) = 10.1378 for the GWD and E(X | X > 71500) = 81788.2 for the FD.
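The conditional mean of the event data and the mean deviation about a return level, δ(x_T) = ∫|x − x_T| f(x)dx = E(X) − x_T + 2x_T F(x_T) − 2m(x_T) with m(x_T) = ∫_0^{x_T} x f(x)dx, can both be evaluated numerically. In the sketch below (standard library only, hypothetical names), the integrals are truncated at an assumed cut-off of 60/λ beyond their lower limit, and x_T would in practice be taken as Q(1 − 1/T).

```python
import math


def nwte_pdf(x, lam, gam, theta):
    """pdf of NWTE(lam, gam, theta)."""
    u = math.exp(-lam * x)
    G = (1.0 - u) * (1.0 + theta * u)
    s = (gam + 1.0) / gam
    g = lam * u * (1.0 - theta + 2.0 * theta * u)
    return (1.0 + gam) * g * (1.0 + gam * G) ** (1.0 / gam) / ((1.0 + gam) ** s - 1.0)


def nwte_sf(x, lam, gam, theta):
    """Survival function S(x), evaluated with log1p/expm1 to avoid cancellation."""
    u = math.exp(-lam * x)
    h = u * (1.0 - theta + theta * u)
    s = (gam + 1.0) / gam
    big = (1.0 + gam) ** s
    return -big * math.expm1(s * math.log1p(-gam / (1.0 + gam) * h)) / (big - 1.0)


def simpson(fn, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    acc = fn(a) + fn(b)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * fn(a + i * h)
    return acc * h / 3.0


def conditional_mean(q, lam, gam, theta, span=60.0, n=12000):
    """E(X | X > q) = int_q^inf x f(x) dx / S(q), tail truncated at q + span/lam."""
    tail = simpson(lambda x: x * nwte_pdf(x, lam, gam, theta), q, q + span / lam, n)
    return tail / nwte_sf(q, lam, gam, theta)


def mean_deviation_return_level(xT, lam, gam, theta, span=60.0, n=12000):
    """delta(x_T) = E(X) - x_T + 2 x_T F(x_T) - 2 m(x_T)."""
    mean = simpson(lambda x: x * nwte_pdf(x, lam, gam, theta), 0.0, span / lam, n)
    m_xT = simpson(lambda x: x * nwte_pdf(x, lam, gam, theta), 0.0, xT, n)
    F_xT = 1.0 - nwte_sf(xT, lam, gam, theta)
    return mean - xT + 2.0 * xT * F_xT - 2.0 * m_xT


if __name__ == "__main__":
    p = (1.0, 2.0, 0.5)   # illustrative values only
    print(conditional_mean(2.5, *p), mean_deviation_return_level(2.3, *p))
```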

Mean deviation about the return level
The mean deviation about the return level is the mean of the distances of the values from the return level; it is a measure of the scatter in a population. The mean deviation about the return level can be defined as

$$\delta(x_T)=\int_0^{\infty}|x-x_T|\,f(x)\,dx=E(X)-x_T+2x_TF(x_T)-2m(x_T),$$

where m(x_T) = ∫_0^{x_T} x f(x)dx and f(x) is the pdf of the NWTE distribution. Table 7 provides the mean deviation about the return level, δ(x_T), for the return periods T = 2, 5, 10, 20, 50, 100, 200 for the GWD and FD, respectively, where the parameters in f(x) are replaced by their ML estimates for the corresponding data.

Concluding remarks

In this article, we introduce and study a new three-parameter distribution, called the NWTE distribution, which combines the respective flexibility of the EGNB2 and transmuted exponential distributions. Some of its mathematical properties are discussed, including the hazard rate function, moments, the moment generating function, the quantile function, various entropy measures and (reversed) residual life functions. The NWTE distribution is thus investigated from both the theoretical and practical points of view. In particular, the estimation of the parameters is performed by the method of maximum likelihood. By considering two environmental data sets, it is shown that it can provide better fits than eight well-established statistical models. Thanks to its high degree of flexibility, we believe that the NWTE model can find a place of choice for the analysis of data in other areas, including engineering, medicine, science, ecology, biology and finance.

Figure 3. Plots of the skewness and kurtosis of the NWTE distribution for λ = 0.5.

Figure 4. Plots of the estimated pdfs and cdfs of the NWTE distribution, superimposed on the histograms and empirical cdfs, respectively, for the used data sets.

Table 2. Mean residual life function for arbitrary parameter values with λ = 1.

Table 3. Mean reversed residual life function for arbitrary parameter values with λ = 1.

Table 4. Descriptive statistics of both data sets.

Table 5. MLEs (standard errors) and goodness-of-fit statistics of the fitted distributions for the GWD.

Table 6. MLEs (standard errors) and goodness-of-fit statistics of the fitted distributions for the FD.

Table 7. Return level estimates x_T for the return periods T, and mean deviation about x_T.

Table 8. Return periods for some of the largest values of the GWD and FD.