The Exponentiated Kumaraswamy-G Class: General Properties and Application

We propose a new family of distributions called the exponentiated Kumaraswamy-G class with three extra positive parameters, which generalizes the Cordeiro and de Castro's family. Some special distributions in the new class are discussed. We derive some mathematical properties of the proposed class including explicit expressions for the quantile function, ordinary and incomplete moments, generating function, mean deviations, reliability, Rényi entropy and Shannon entropy. The method of maximum likelihood is used to t the distributions in the proposed class. Simulations are performed in order to assess the asymptotic behavior of the maximum likelihood estimates. We illustrate its potentiality with applications to two real data sets which show that the extended Weibull model in the new class provides a better t than other generalized Weibull distributions.


Introduction
One of the preferred area of research in the led of distribution is that of generating new distributions starting with a baseline distribution by adding one or more additional parameters.A generalized distribution may be important because it is connected with other special distributions in interesting ways (via transformations, limits, conditioning, etc.).In some cases, a parametric family may be important because it can be used to model a wide variety of random phenomena.
In many cases, a special parametric family of distributions will have one or more distinguished standard members, corresponding to specied values of some of the parameters.Usually the standard distributions will be mathematically simpler, and often other members of the family can be constructed from the standard distributions by simple transformations on the underlying standard random variable.An incredible variety of special distributions have been studied over the years, and new ones are constantly being added to the literature.Notable among them are the Azzalini's skewed family (Azzalini 1985), Marshall-Olkin extended (MOE) family (Marshall & Olkin 1997), exponentiated family of distributions (Gupta, Gupta & Gupta 1998), or the composite methods of combining two or more known competing distributions through transformations like beta generated family (Eugene, Lee & Famoye 2002, Jones 2004), gamma-generated familiy (Zografos & Balakrishnan 2009, Ristic & Balakrishnan 2012), Kumaraswamy-G (Kw-G) family (Cordeiro & de Castro 2011), McDonald-G family (Alexander, Cordeiro, Ortega & Sarabia 2012), beta extended-G family (Cordeiro, Silva & Ortega 2012), Kumaraswamy-beta generalized family (Pescim, Cordeiro, Demetrio, Ortega & Nadarajah 2012), exponentiated transformed transformer family (Alzaghal, Felix & Carl 2013), exponentiated generalized family (Cordeiro, Ortega & Cunha 2013), geometric exponential-Poisson family (Nadarajah, Cancho & Ortega 2013), truncated-exponential skew symmetric family (Nadarajah, Ja-Revista Colombiana de Estadística 42 (2019) 133 yakumar & Ristic 2013), logistic generated family (Torabi & Montazari 2014), Kumaraswamy Marshall-Olkin-G family (Alizadeh, Tahir, Cordeiro, Zubair & Hamedani 2015), generalized gamma-Weibull distribution (Meshkat, Torabi & Hamedani 2016), generalized odd log-logistic family (Cordeiro, Alizadeh, Tahir, Mansoor, Bourguignon & G. 2017), generalized transmuted-G family (Nofal, Afy, Yousof & Cordeiro 2017) and odd Lindley-G family (Gomes- Silva, Percontini, Brito, Ramos, Silva & Cordeiro 2017).While the additional parameter(s) bring in more exibility at the same time they also complicate the mathematical form of the resulting distribution, often considerably enough to render it not amenable to further analytical and numerical manipulations.But with the advent of sophisticated powerful mathematical and statistical softwares more complex distributions are getting accepted as useful models for data analysis.Tahir & Nadarajah (2015) provided a detail review of how new families of univariate continuous distributions can be generated through introduction of additional parameter(s).Cordeiro & de Castro (2011) dened the Kw-G family as follows.If G(x) denotes the cumulative distribution function (cdf ) of a random variable, the Kw- where a > 0 and b > 0 are two additional shape parameters to the G distribution, whose role is to govern skewness and tail weights.The cdf (1) compares extremely favorably in terms of simplicity with the beta cdf.The probability density function where g(x) = dG(x)/dx.Equation (2) does not involve any special function, such as the incomplete beta function, as is the case of the beta-G family pionnered by Eugene et al. (2002).So, the Kw-G family is obtained by adding two shape parameters a and b to the G distribution.The generalization (2) contains distributions with unimodal and bathtub shaped hazard rates.It also contemplates a broad class of models with monotonic hazard rate functions (hrf 's).
In this paper, we dene a new class of distributions that extends the Kw-G family and derive some of its structural properties.Based on a continuous cdf H a,b (x) given by ( 1), the class of exponentiated H a,b distributions is dened by where a > 0, b > 0 and c > 0 are three additional shape parameters to the G distribution.The pdf corresponding to (3) is and the associated hrf reduces to .
Revista Colombiana de Estadística 42 (2019) 133 The exponentiated Kw-G class (EKw-G for short) of densities (4) allows for greater exibility of its tails and can be widely applied in many areas of engineering and biology.We study some structural properties of (4) because it extends several well-known distributions in the literature.In the next sections some mathematical properties of the new class are derived.The density function (4) will be most tractable when the cdf G(x) and pdf g(x) have simple analytic expressions.Note that even if g(x) is a symmetric distribution, the distribution f (x  4) is its ability of tting skewed data that can not be properly tted by existing distributions.
We dene the exponentiated-G (Exp-G) random variable Z with power parameter c > 0 from an arbitrary baseline distribution G, say Z ∼ Exp c G, if Z has cdf and pdf given by Π The EKw-G class shares an attractive physical interpretation whenever a, b and c are positive integers.Consider a device made of c independent components in a parallel system.Further, each component is made of b independent subcomponents identically distributed according to G(x) a in a series system.The device fails if all c components fail and each component fails if any subcomponent fails.Let X j1 , . . ., X jc denote the lifetimes of the subcomponents within the jth component (j = 1, . . ., b) with common cdf G(x).Let X j denote the lifetime of the jth component and let X denote the lifetime of the device.Thus, the cdf of X is given by So, the lifetime of the device follows the EKw-G family of distributions.

Special Models
Next, we present four EKw-G distributions.

Exponentiated Kumaraswamy Weibull (EKwW)
The Weibull cdf with parameters β > 0 and α > 0 is G(x) = 1 − e −(βx) α (for x > 0).The cdf of a random variable X having the EKwW distribution, say X∼ EKwW(a, b, c, α, β), can be expressed as and the associated density function reduces to (5) The hrf corresponding to (5) is given by For c = 1, we obtain as a special model the Kw-Weibull (KwW) distribution.The most important case of ( 5) is the exponentiated Weibull (EW) (when b = c = 1) pioneered by Mudholkar and Srivastawa (Mudholkar & Srivastava 1993).Plots of the density and hrf of the EKwW distribution for some parameter values are displayed in Figures 1 and 2, respectively.

Exponentiated Kumaraswamy Gumbel (EKwGu)
The Gumbel cdf (for , where the parameters are µ real and σ > 0. The EKwGu cdf can be expressed as and the associated density function is Plots of the EKwGu density function for some parameter values are displayed in Figure 3. Revista Colombiana de Estadística 42 (2019) 133

Exponentiated Kumaraswamy Gamma (EKwGa)
The gamma cdf (for x > 0) with shape parameter α > 0 and scale parameter dw is the gamma function and γ(α, x) = x 0 w α−1 e −w dw is the lower incomplete gamma function.

Useful Expansions
The pdf (4) can be expressed as a linear combination of Kw-G density functions.
Using the generalized binomial expansion, we can rewrite (4) as where The cdf corresponding to (6) can be expressed as Based on equations ( 6) and ( 7) some structural properties of the EKw-G class can be obtained from well-established Kw-G properties.These equations can also be expressed as linear combinations of Exp-G distributions.Substituting (2) in equation ( 6) and using the binomial expansion, we obtain where Integrating (8), we have where Π (k+1) a (x) denotes the Exp-G cdf with power parameter (k + 1) a. Revista Colombiana de Estadística 42 (2019) 133

Quantile Function
The EKw-G quantile function, say Q(u) = F −1 (u), is straightforward to be computed by inverting (3) provided a closed-form expression for the baseline quan- For example, the EKwBXII quantile function comes by inverting the cdf in Section 2.4 as The shortcomings of the classical kurtosis measure are well-known.There are many heavy-tailed distributions for which this quantile is innite.So, it becomes uninformative precisely when it needs to be.Indeed, our motivation to use quantile-based measures stemmed from the non-existence of classical skewness and kurtosis for several generalized distributions.The Bowley skewness (Kenney & Keeping 1962) is based on quartiles whereas the Moors kurtosis (Moors 1988) is based on octiles where Q(•) denotes the quantile function given by (10).Plots of the B and M functions for the EKwBXII distribution computed from (11) are displayed (for some parameter values) in Figures 8 and 9, respectively.These plots indicate a high dependence of these measures on the generator parameters of the new class.
Here, we derive a power series for the quantile function (10).If the baseline does not have a closed-form expression, it can usually be expressed in terms of a power series where the coecients a i are suitably chosen real numbers which depend on the parameters of the G distribution.For several important distributions, such as the normal, Student t, gamma and beta distributions, Q G (u) does not have explicit expressions, but it can be expanded as in equation ( 12).As a simple example, for the normal N (0, 1) distribution, a i = 0 for i = 0, 2, 4, . . .and a 1 = 1, a 3 = 1/6, a 5 = 7/120 and a 7 = 127/7560, . ... Using the binomial expansion, we obtain and then using ( 12), the EKw-G quantile function can be expressed as where (−1) j+k a i i/a j j/b k .
For 0 < u < 1, we have an expansion for u ρ which holds for ρ > 0 real noninteger where Setting ρ = k/c in ( 14) and substituting in (13), we can write where q l = ∞ k=0 g k s l (k/c).
Equation ( 15) is the main result of this section since it allows to obtain various mathematical quantities for the EKw-G class as proved in the next sections.
Revista Colombiana de Estadística 42 (2019) 133 The formula derived throughout the paper can be easily handled in most symbolic computation software platforms such as Maple and Mathematica since they have currently the ability to deal with analytic expressions of formidable size and complexity.The innity limit in the sums can be substituted by a large positive integer such as 20 or 30 for most practical purposes.

Moments
Hereafter, we shall assume that G is the cdf of a random variable Y and that F is the cdf of a random variable X having density function (4).The moments of X can be obtained from the (r, s) th probability weighted moments (PWMs) of Y dened by An alternative formula for τ r,s can be based on the baseline quantile function We use throughout the paper a result of Gradshteyn and Ryzhik (Gradshteyn & Ryzhik 2007) for a power series raised to a positive integer k (for k ≥ 1) where the coecients c k,l (for l = 1, 2, . ..) are easily obtained from the recurrence equation (with Clearly, c k,l can be determined from c k,0 , . . ., c k,l−1 and then from the quantities b 0 , . . ., b l . We can write using ( 12), ( 17) and ( 18) where e r,i = (i a 0 ) −1 i m=1 [m(r + 1) − i] a m e r,i−m and e r,0 = a r 0 and then an alternative expression for τ r,s follows as Revista Colombiana de Estadística 42 (2019) 133 From equation ( 8), we can write where Thus, the EKw-G moments can be expressed as an innite weighted sum of the baseline PWMs.The ordinary moments of several EKw-G distributions can be determined directly from equations ( 16) and ( 21).Expressions for moments of several exponentiated distributions are given by Nadarajah and Kotz (Nadarajah & Kotz 2006), which can be useful to produce E(X r ).For lifetime models, it is usually of interest to compute the hth incomplete moment of X dened by m h (z) = z 0 x h f (x)dx.The quantity m h (z) can be calculated from (8) as where The quantity ϑ h,k (z) is available for some baseline distributions and can also be computed numerically for most of them.

Generating Function
The moment generating function (mgf ) M (t) = E(e tX ) of X comes from ( 8) as an innite weighted sum where M (k+1) a (t) is the mgf of Y (k+1) a ∼ Exp-G((k + 1)a).Hence, M (t) can be determined from the mgf of Y (k+1) a given by Setting G(x) = u, we can write M k (t) in terms of the baseline quantile function Now, we provide four representations for M (t).The rst one comes from (8) as

t).
A second representation for M (t) is obtained from where 23), a third representation follows as A fourth representation can be determined from ( 19) by expanding the exponential function in the last equation e n,i t n (i + a + 1) n! .
The best representation to derive a closed-form expression for M (t) depends essentially on the forms of the pdf, cdf and quantile function of G.

Entropies
The entropy of a random variable X with density function f (x) is a measure of variation of the uncertainty.Two popular entropy measures are due to Shannon and Rényi (Shannon 1951, Rényi 1961).A large value of the entropy indicates the greater uncertainty in the data.The Rényi entropy is dened by (for γ > 0 and γ = 1) Based on the pdf (4), the Rényi entropy of the EKw-G distribution is given by

Revista Colombiana de Estadística 42 (2019) 133
where We can determine the Rényi entropy using the integral and then expanding the binomial and changing variables (−1) j+k (c − 1)γ j Here, K(γ, k) denotes the integral which can be calculated for each G model.If γ > 1 and a > 1, the EKwexponential, where G(x) = 1 − e −λ x (with parameter λ), EKw-standard logistic, where G(x) = (1 − e −ν x ) −1 , and EKw-Pareto, where G(x) = 1 − x −ν (with parameter ν), distributions, are given by respectively.In Table 2 we present a small illustration in which we calculate the Rényi entropy for EKw-exponential with some values of γ and λ.
The quantity δ X = E {log[h a,b (X)]} can be determined for special forms of h a,b (x).
Thus, we obtain Equations ( 25) and ( 26) are the main results of this section.

Reliability
The component fails at the instant that the random stress X 2 applied to it exceeds the random strength X 1 , and the component will function satisfactorily whenever X 1 > X 2 .Hence, R = P (X 2 < X 1 ) is a measure of component reliability.It has many applications especially in the area of engineering.We derive the reliability R when X 1 and X 2 have independent EKw-G(a 1 , b 1 , c 1 ) and EKw-G(a 2 , b 2 , c 2 ) distributions with the same parameter vector η for G.The reliability The pdf of X 1 and cdf of X 2 are obtained from equations ( 8) and ( 9) as Revista Colombiana de Estadística 42 (2019) 133 Finally, the reliability of the X reduces to Table 3 gives some values of R for dierent parameter values.

Estimation
We determine the maximum likelihood estimates (MLEs) of the parameters of the EKw-G distribution from complete samples only.Let x 1 , . . ., x n be a observed sample of size n from the EKw-G(a, b, c, η) distribution, where η is a p × 1 vector of unknown parameters in the baseline distribution G(x; η).The log-likelihood function for the vector of parameters θ = (a, b, c, η) T can be expressed as The components of the score vector U (θ) are For interval estimation of the model parameters, we require the total observed information matrix J n (θ), whose elements can be obtained from the authors upon request.Let θ be the MLE of θ.Under standard regularity conditions (Cox & Hinkley 1974), we can approximate the distribution of √ n( θ − θ) by the multivariate normal N (p+3) (0, K(θ) −1 ), where K(θ) = lim n→∞ n −1 J n (θ) is the unit information matrix.Based on the approximate multivariate normal N (p+3) (θ, J n ( θ) −1 ) distribution of θ, where J n ( θ) is the observed information matrix evaluated at θ, we can construct approximate condence regions for the model parameters.

Simulations
We perform some simulations with the objective to note the behavior of the MLEs obtained by the BFGS method.It is used for maximizing the log-likelihood function of a probabilistic model.In some complex distributions the task of optimization can be quite complicated.
We consider ten thousand replicas of Monte Carlo (MC) under dierent sample sizes (n = 20, 60, 100, 200, 500 and 1000).For each sample size, we compute the average MLEs obtained by the BFGS method and correct these estimates by using non-parametric bootstrap.We also compute the bootstrap errors and the biases of the MLEs for the model parameters by considerig the exponential distribution with parameter λ > 0 as the baseline in Table 4. Similarly, we perform other simulations with the same scenarios by considering the Weibull distribution with parameters α = 1.1 and β = 1.5 (xed) as the baseline (Table 5).

Revista Colombiana de Estadística 42 (2019) 133
The corrected MLEs are based on 500 replicas of bootstrap and the true parameter values considered are a = b = c = λ = 1.1.For each MC iteration, we consider only samples in which there have been convergence of the BFGS method.
It is very important to eliminate the non-convergence of the simulations, for not mislead the results obtained and with the penalty of having simulations in some problems involving the new class of distributions.Computationally intensive of akwards log-likelihood for som families of distributions are fairly complicated, having many regions approximately at.In our case, to carry out the simulations we use a computer with 32 GB of RAM memory, operating system Arch Linux with processor Intel Core i7-4710QM octa core with each core working at a frequency of 3.5 GHz.
We implement the code using the Julia version 0.6.3programming language which is a pseudo-compiled general-purpose language and has several functions that facilitate its use in scientic computing.Julia is a language that provides several advantages when considering the implementation in statistical simulations that are usually computationally intensive.Among these advantages can be highlighted its computational eciency by using a Low Level Virtual Machine (LLVM) based compiler with run-time compilation (JIT).JIT computing solutions exist in language like R (see compiler package), however it is something that has been recently developed unlike Julia which is a compiled language by denition.Several benckmarks can be obtained on the web showing the superiority with respect to the computational eciency of the language Julia comparing it with other languages.
In addition to being a compiled language by denition, Julia is a programming language that is being designed for parallel computing and does not impose any specic parallelism to the programmer.Instead, it provides several important building blocks for distributed computing, making it exible enough to support multiple styles of parallelism and allowing users to add other styles.In order to use all the computational resources of the available hardware, each Monte Carlo iteration was broken into threads using the macro @threads.In this way, it was possible to perform eight iterations simultaneously at every step.In order to be able to execute the code below it is necessary to install the libraries Distributions and Optim to have access to some functions of probability distributions and global optimization, respectively.In Appendix we provide the Julia script setting the exponential distribution for G.By the simulations it is possible to observe improvements in the MLEs obtained when using bootstrap correction.However, the corrections are more signicant and perhaps more justiable in small sample sizes (20 or 60).
The statistics W * and A * for all the models are given in Tables 8 and 9     @time mc_estimates, mc_estimates_boot, mc_error_boot = ekwg_monte_carlo_bias(m, b, n, true_parameters, par1) print("\n---> Average uncorrected estimates: ", mc_estimates,"\n") print("\n---> Mean of the bootstrap-corrected estimates: ", mc_estimates_boot,"\n") print("\n---> Average bootstrap error estimates: ", mc_error_boot) will not be symmetric.The three extra parameters in (4) can control both tail weights and possibly adding entropy to the center of the EKw-G density function.Henceforth, X ∼EKw-G(a, b, c) denotes a random variable having density function (4).Each new EKw-G distribution can be obtained from a specied G model.For a = b = c = 1, the G distribution is a special model of the EKw-G class with a continuous crossover towards cases with dierent shapes (e.g., a particular combination of skewness and kurtosis).Some cases of EKw-G have been discussed and explored in recent works.Here, we refer to the papers and baseline distributions: Huang & Oluyede (2014) for the Dagum, Rodrigues & Silva (2015) for the exponential and Rodrigues, Silva & Hamedani (2016) for the inverse Weibull.One major benet of the family ( This model is also called the Lehmann type I distribution.Note that there is a dual transformation Exp c (1 − G) referred to as the Lehmann type II distribution corresponding to the cdf F (x) = 1 − [1 − G(x)] c .Thus, equation (3) encompasses both Lehmann type I (Exp c G for a = b = 1) and Lehmann type II (Exp c (1−G) for a = c = 1) distributions (Lehmann 1953).Clearly, the triple construction Exp c 1-Exp b 1-Exp a G generates the EKw-G class of distributions.Several properties of the EKw-G class can be derived using this triple transformation.

Figure 1 :Figure 2 :
Figure 1: Plots of the EKwW density function for some parameter values.

Figure 3 :
Figure 3: Plots of the EKwGu density function for some parameter values.

a b c − 1 .Figure 4 :
Figure 4: Plots of the EKwGa density function for some parameter values.

Figure 5 :
Figure 5: Plots of the EKwGa hrf for some parameter values.

Figure 6 :
Figure 6: Plots of the EKwBXII density function for some parameter values.

Figure 7 :
Figure 7: Plots of the EKwBXII hrf for some parameter values.
reveals that the EKw-G density function is a linear combination of Exp-G densities.Thus, some structural properties of the EKw-G class such as the ordinary and incomplete moments and generating function can be obtained from well known Exp-G properties.Several Exp-G properties have been studied by many authors in recent years, seeMudholkar and Srivastava (Mudholkar & Srivastava 1993) and Mudholkar et al.(Mudholkar, Srivastava & Kollia 1996) for exponentiated Weibull, Gupta et al.(Gupta et al. 1998) for exponentiated Pareto, Gupta and Kundu(Gupta & Kundu 1999) for exponentiated exponential, Nadarajah(Nadarajah 2005) for exponentiated Gumbel, Nadarajah and Gupta(Nadarajah & Gupta 2007) for exponentiated gamma and Lemonte et al.(Lemonte, Barreto-Souza & Cordeiro 2013) for exponentiated Kumaraswamy distributions.See, also, Nadarajah and Kotz(Nadarajah & Kotz 2006), among others.Equations (6)-(9) are the main results of this section.
(θ) = n log(a) + n log(b) + n log(c) + n j=1 log[g (x j ; η)] for the current data.The proposed EKwW model ts these data better than the other models based on the values of W * and A * .This model may be an interesting alternative to other models available in the literature for modeling positive real data.
Table 1 gives some values from equation (21) for the EKw-exponential model (parameter λ) with dierent parameter values.

Table 1 :
The moments of EKw-exponential for (a = 2, b = 1 and c = 1) and some values of r and λ.

Table 4 :
Estimates of maximum likelihood corrected, errors and biases obtained by bootstrap in dierent sizes of samples for EKw-exponential distribution.

Table 5 :
Estimates of maximum likelihood corrected, errors and biases obtained by bootstrap in dierent sizes of samples for EKwW distribution (with β = 1.5 xed).

Table 7 :
MLEs (standard errors in parentheses). of these statistics, the better the t to the data.Let F (x; θ) be the cdf, where the form of F is known but θ (a k-dimensional parameter vector, say) is unknown.To obtain the statistics W * and A * , we can proceed Revista Colombiana de Estadística 42 (2019) 133 as follows: (i) Compute v i = F (x i ; θ), where the x i 's are in ascending order; (ii) Compute y i = Φ −1 (v i ), where Φ(•) is the standard normal cdf and Φ −1 (•) its inverse; (iii) Compute u i = Φ{(y i − ȳ)/s y }, where ȳ = (1/n)

Table 8 :
Formal statistics