A New Extended-X Family of Distributions: Properties and Applications

During the past couple of years, statistical distributions have been widely used in applied areas such as reliability engineering, medical, and financial sciences. In this context, we come across a diverse range of statistical distributions for modeling heavy tailed data sets. Well-known distributions are log-normal, log-t, various versions of Pareto, log-logistic, Weibull, gamma, exponential, Rayleigh and its variants, and generalized beta of the second kind distributions, among others. In this paper, we try to supplement the distribution theory literature by incorporating a new model, called a new extended Weibull distribution. The proposed distribution is very flexible and exhibits desirable properties. Maximum likelihood estimators of the model parameters are obtained, and a Monte Carlo simulation study is conducted to assess the behavior of these estimators. Finally, we provide a comparative study of the newly proposed and some other existing methods via analyzing three real data sets from different disciplines such as reliability engineering, medical, and financial sciences. It has been observed that the proposed method outclasses well-known distributions on the basis of model selection criteria.


Introduction
In the practice of statistical theory, particularly, in engineering, medical, and financial sciences, data modeling is an interesting research topic. In this context, the statistical distributions are worthwhile for modeling such data sets. The most frequently used statistical distributions are exponential, Rayleigh, Weibull, beta, gamma, log-normal, Pareto, Lomax, and Burr, among others. However, these traditional distributions are not flexible enough for countering complex forms of the data sets. For example, in reliability engineering and biomedical sciences, the data sets are usually unimodal and skewed to the right; see Demicheli et al.'s [1], Lai and Xie's [2], Zajicek's [3], and Almalki and Yuan's [4] studies. Hence, in such cases, the utilization of the exponential, Rayleigh, Weibull, or Lomax distributions may not be a suitable choice to employ. On the other hand, the gamma, beta, and lognormal distributions do not have closed forms for the cumulative distribution function (cdf) causing difficulties in estimating the parameters.
Furthermore, in financial and actuarial risk management problems, the data sets are usually unimodal, skewed to the right, and possess a thick right tail; for details, see Cooray and Ananda's [5] and Eling's [6] studies, among others. The distributions that exhibit such characteristics can be used quite effectively to model insurance loss data to estimate the business risk level. The distributions commonly used in the literature include Pareto by Cooray and Ananda [5], Lomax by Scollnik [7], Burr by Nadarajah and Bakar [8], and Weibull by Bakar et al. [9], which are particularly appropriate for modeling insurance losses, financial returns, file sizes on network servers, etc. Unfortunately, these distributions have some deficiencies. For example, the Pareto distribution, due to the monotonically decreasing shape of its density, does not provide the best fit in many applications, whereas the Weibull model is capable of covering the behavior of small losses but fails to cover the behavior of large losses.
Moreover, Dutta and Perry [10] provided an empirical study on loss distributions using exploratory data analysis and other empirical approaches to estimate the risk. They rejected the idea of using exponential, gamma, and Weibull distributions due to their poor results and pointed out that one would need to use a model that is flexible enough in its structure. Hence, there are only a few probability distributions capable of modeling heavy-tailed data sets, and none of them is flexible enough to provide greater accuracy in fitting complex forms of data.
To address the problems stated above, researchers have shown an increased interest in defining new families of distributions by incorporating one or more additional parameters into well-known distributions. The new families have been defined through many different approaches, introducing additional location, scale, shape, and transmuted parameters to generalize the existing distributions. These generalizations are mainly based on, but not limited to, the following approaches: (i) transformation of the variable and (ii) compounding of two or more models; for details, we refer interested readers to the studies by Tahir and Cordeiro [11], Bhati and Ravi [12], and Ahmad et al. [13].
One of the most interesting methods of adding a shape parameter to existing distributions is exponentiation. The exponentiated family, pioneered by Mudholkar and Srivastava [14], is defined by the following cdf:
$$G(x; a, \xi) = F(x; \xi)^{a}, \tag{1}$$
where $a > 0$ is the additional shape parameter. Marshall and Olkin [15] pioneered a simple approach of introducing a single scale parameter to a family of distributions. The cdf of the Marshall-Olkin (MO) family is given by
$$G(x; \sigma, \xi) = \frac{F(x; \xi)}{1 - (1 - \sigma)\,[1 - F(x; \xi)]}, \tag{2}$$
where $\sigma > 0$ is the additional scale parameter. Cordeiro and Castro (2010) proposed the Kumaraswamy-G family defined by
$$G(x; a, b, \xi) = 1 - \left[1 - F(x; \xi)^{a}\right]^{b}, \tag{3}$$
where $a > 0$ and $b > 0$ are the additional shape parameters.
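As an illustration of how such generators act on a baseline cdf, the following Python sketch applies the exponentiated, Marshall-Olkin, and Kumaraswamy-G constructions, in their commonly cited forms, to a Weibull baseline. The function names and the parameter values are hypothetical and chosen only for demonstration.

```python
import math


def weibull_cdf(x, alpha, gamma):
    """Baseline Weibull cdf F(x) = 1 - exp(-gamma * x**alpha); zero for x <= 0."""
    return 1.0 - math.exp(-gamma * x ** alpha) if x > 0 else 0.0


def exponentiated(F, a):
    """Exponentiated family: G(x) = F(x)**a."""
    return lambda x: F(x) ** a


def marshall_olkin(F, sigma):
    """Marshall-Olkin family: G(x) = F(x) / (1 - (1 - sigma) * (1 - F(x)))."""
    return lambda x: F(x) / (1.0 - (1.0 - sigma) * (1.0 - F(x)))


def kumaraswamy_g(F, a, b):
    """Kumaraswamy-G family: G(x) = 1 - (1 - F(x)**a)**b."""
    return lambda x: 1.0 - (1.0 - F(x) ** a) ** b


def baseline(x):
    """Illustrative baseline: Weibull with alpha = 1.5 and gamma = 0.5 (arbitrary values)."""
    return weibull_cdf(x, alpha=1.5, gamma=0.5)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, baseline(x), exponentiated(baseline, 2.0)(x),
              marshall_olkin(baseline, 2.5)(x), kumaraswamy_g(baseline, 2.0, 3.0)(x))
```

Setting each additional parameter to 1 returns the baseline cdf, which is a quick sanity check for any implementation of these families.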
So far in the literature, mostly either a scale or a shape parameter has been introduced to propose a new family of distributions. Introducing both scale and shape parameters to a family of distributions may increase the level of flexibility. However, the number of parameters then increases, and the estimation of parameters and the computation of many mathematical properties become complicated.
In view of the above, a new attempt has been made to introduce more flexible probability distributions by introducing a single additional parameter which serves as a scale as well as a shape parameter and provides greater accuracy in fitting real-life data in applied fields such as reliability engineering, medical, and financial sciences. Hence, in this paper, a new method is proposed to introduce new statistical distributions. The proposed family may be named the new extended-X (NE-X) family. A random variable X is said to follow the proposed family if its cdf is given by expression (4). The introduction of the additional parameter θ in expression (4) adds greater distributional flexibility to the baseline distribution with cdf $F(x; \xi)$, which may depend on the parameter vector ξ. The additional parameter plays the role of both scale and shape parameters. The probability density function (pdf) corresponding to (4) is given by expression (5). We focus on a special submodel of the proposed family, called the new extended Weibull (NE-W) distribution.
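As a working sketch, one pair of cdf and pdf that is consistent with the substitutions used later in the moment derivation ($x = (1-\theta)F(x;\xi)^{2}$ with $n = \theta + 1$, and $y = F(x;\xi)^{2}$ with $m = \theta - 1$) and with a closed-form quantile function is written below. The exact form is an assumption made here for illustration and should be checked against the published expressions before use.

```latex
\begin{align*}
G(x;\theta,\xi) &= 1 - \left[\frac{1 - F(x;\xi)^{2}}{1 - (1-\theta)\,F(x;\xi)^{2}}\right]^{\theta},
  \qquad \theta > 0, \\
g(x;\theta,\xi) &= \frac{2\theta^{2}\, f(x;\xi)\, F(x;\xi)\,\left[1 - F(x;\xi)^{2}\right]^{\theta-1}}
  {\left[1 - (1-\theta)\,F(x;\xi)^{2}\right]^{\theta+1}} .
\end{align*}
```

Under this assumption, the second line follows from the first by the chain rule, since the ratio inside the brackets has derivative $-\theta/[1-(1-\theta)u]^{2}$ with respect to $u = F(x;\xi)^{2}$.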
Finally, we direct our attention to the results related to the NE-W model with real life data in three different disciplines. The first data set is taken from biomedical field, and the results of the proposed model are compared with five other competitive models including (i) two-parameter Weibull distribution and (ii) three-parameter models such as flexible Weibull extended (FWE), alpha power transformed Weibull (APTW), Marshall-Olkin Weibull (MOW), and modified Weibull (MW) distributions. The second data set is taken from reliability engineering, and the results of the proposed model are compared with three other well-known distributions such as (i) the three-parameter extended alpha power transformed Weibull (Ex-APTW), (ii) four-parameter Kumaraswamy Weibull (Ku-W), and (iii) beta Weibull (BW) distributions. The third data set is taken from financial sciences, and the results of the proposed model are compared with the Weibull and other heavy tailed models including Lomax and Burr-XII (B-XII) distributions.
The rest of the paper is organized as follows: in Section 2, a special case of the proposed family is introduced and the shapes of its density and hazard functions are investigated. Some mathematical properties of the proposed family are derived in Section 3. Maximum likelihood estimators of the model parameters are obtained in Section 4. In the same section, a Monte Carlo simulation study is conducted. Practical applications are analyzed in Section 5. Here, the NE-W distribution is compared with the models mentioned above under different measures of discrimination and other goodness of fit measures. Finally, some concluding remarks are given in the last section.

Model Description
In this section, we introduce the NE-W distribution. Consider the two-parameter Weibull distribution with shape parameter α > 0 and scale parameter γ > 0, whose cdf is $F(x; \xi) = 1 - e^{-\gamma x^{\alpha}}$, $x \ge 0$, and whose pdf is $f(x; \xi) = \alpha\gamma x^{\alpha-1} e^{-\gamma x^{\alpha}}$, where $\xi = (\alpha, \gamma)$. The cdf and the density function of the NE-W distribution follow by substituting these expressions into (4) and (5), respectively. Some possible shapes for the density and hazard functions of the NE-W distribution are sketched in Figures 1 and 2, respectively. In Figure 1, we plot different shapes of the density of the NE-W distribution. When α, θ < 1, the density of the proposed model behaves like that of the exponential distribution. As the values of these parameters increase, the proposed model captures the characteristics of the Rayleigh and Weibull distributions. However, the proposed model has certain advantages over these distributions, since it provides the best fit to data in different disciplines, as shown in Section 5. The hazard rate function (hrf) is plotted in Figure 2. The hazard function of the proposed model is very flexible in accommodating different shapes, namely, decreasing, increasing, unimodal, and bathtub; hence, the NE-W distribution becomes an important model for fitting real lifetime data in applied areas such as reliability, survival analysis, economics, and finance.
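A minimal Python sketch of the NE-W cdf, density, and hazard function is given below. It assumes that the NE-X cdf has the form $G = 1 - [(1 - F^{2})/(1 - (1-\theta)F^{2})]^{\theta}$, which agrees with the substitutions used in the moment derivation of Section 3 but is an assumption here and should be verified against the published NE-W expressions. All function names are hypothetical.

```python
import math


def weibull_cdf(x, alpha, gamma):
    """Baseline Weibull cdf F(x) = 1 - exp(-gamma * x**alpha), for x > 0."""
    return 1.0 - math.exp(-gamma * x ** alpha)


def weibull_pdf(x, alpha, gamma):
    """Baseline Weibull pdf f(x) = alpha * gamma * x**(alpha - 1) * exp(-gamma * x**alpha)."""
    return alpha * gamma * x ** (alpha - 1.0) * math.exp(-gamma * x ** alpha)


def ne_w_cdf(x, alpha, gamma, theta):
    """ASSUMED NE-W cdf: 1 - [(1 - F^2) / (1 - (1 - theta) * F^2)]**theta."""
    f2 = weibull_cdf(x, alpha, gamma) ** 2
    return 1.0 - ((1.0 - f2) / (1.0 - (1.0 - theta) * f2)) ** theta


def ne_w_pdf(x, alpha, gamma, theta):
    """ASSUMED NE-W pdf, i.e. the derivative of ne_w_cdf with respect to x."""
    big_f = weibull_cdf(x, alpha, gamma)
    small_f = weibull_pdf(x, alpha, gamma)
    num = 2.0 * theta ** 2 * small_f * big_f * (1.0 - big_f ** 2) ** (theta - 1.0)
    return num / (1.0 - (1.0 - theta) * big_f ** 2) ** (theta + 1.0)


def ne_w_hazard(x, alpha, gamma, theta):
    """Hazard rate h(x) = g(x) / (1 - G(x))."""
    return ne_w_pdf(x, alpha, gamma, theta) / (1.0 - ne_w_cdf(x, alpha, gamma, theta))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for x in (0.25, 0.5, 1.0, 2.0):
        print(x, ne_w_pdf(x, 1.5, 0.5, 2.0), ne_w_hazard(x, 1.5, 0.5, 2.0))
```

Evaluating `ne_w_hazard` on a grid of x values for several (α, γ, θ) combinations is one way to explore which hazard shapes the assumed form can produce.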

Mathematical Properties of the NE-X Distributions
In this section, we study some mathematical properties of the NE-X distributions such as the quantile function, r th moment, and moment generating function.
3.1. Quantile Function. The quantile function of the NE-X distributions is given by expression (8), where $u \in (0, 1)$. From expression (8), we can see that the proposed model has a closed-form quantile function, which makes it easier to generate random numbers for the subcases of the NE-X family.
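The following Python sketch shows how a closed-form quantile function can be used for random number generation in the NE-W case. It assumes the NE-X cdf $G = 1 - [(1 - F^{2})/(1 - (1-\theta)F^{2})]^{\theta}$ (an assumption, not a quotation of expression (8)); solving $G(x) = u$ under that form gives $F(x)^{2} = (1 - t)/(1 - (1-\theta)t)$ with $t = (1-u)^{1/\theta}$. Function names are hypothetical.

```python
import math
import random


def ne_w_cdf(x, alpha, gamma, theta):
    """ASSUMED NE-W cdf with Weibull baseline F(x) = 1 - exp(-gamma * x**alpha)."""
    f2 = (1.0 - math.exp(-gamma * x ** alpha)) ** 2
    return 1.0 - ((1.0 - f2) / (1.0 - (1.0 - theta) * f2)) ** theta


def ne_w_quantile(u, alpha, gamma, theta):
    """Invert the assumed cdf for 0 <= u < 1."""
    t = (1.0 - u) ** (1.0 / theta)
    # Value of the baseline cdf F at the u-th quantile.
    base = math.sqrt((1.0 - t) / (1.0 - (1.0 - theta) * t))
    # Weibull quantile: F^{-1}(p) = (-log(1 - p) / gamma)**(1 / alpha).
    return (-math.log(1.0 - base) / gamma) ** (1.0 / alpha)


def ne_w_sample(n, alpha, gamma, theta, seed=None):
    """Inverse-cdf sampling: X = Q(U) with U uniform on [0, 1)."""
    rng = random.Random(seed)
    return [ne_w_quantile(rng.random(), alpha, gamma, theta) for _ in range(n)]


if __name__ == "__main__":
    # Illustrative parameter values only.
    draws = ne_w_sample(5, alpha=1.5, gamma=0.5, theta=2.0, seed=42)
    print(draws)
    print(ne_w_cdf(ne_w_quantile(0.5, 1.5, 0.5, 2.0), 1.5, 0.5, 2.0))  # close to 0.5
```

The round trip `ne_w_cdf(ne_w_quantile(u, ...), ...)` returning `u` is a convenient check that the cdf and the quantile function are mutually consistent.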

3.2. Moments. This subsection deals with the derivation of the $r$th moment of the NE-X distributions. The $r$th moment of the NE-X distributions is defined in (9). Using (5) in (9) gives (10). Using the expansion (11) (https://math.stackexchange.com/questions/1624974/series-expansion-1-1-xn) with $x = (1 - \theta)F(x; \xi)^{2}$ and $n = \theta + 1$, we get (12). Also, using the series representation (13) with $y = F(x; \xi)^{2}$ and $m = \theta - 1$, we get (14). Using (12) and (14) in (10), we obtain expression (15). Numerical values for the mean, variance, skewness (Sk), and kurtosis (Kur) of the NE-W distribution for some selected values of the parameters are given in Tables 1 and 2. To check the effect of the additional parameter on Sk and Kur, (i) we kept the parameters α and γ constant and allowed θ to vary, and then (ii) we kept the parameters θ and γ constant and allowed α to vary.
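The two series used in this derivation are presumably the negative binomial series and the generalized binomial series. Under that assumption, and with the substitutions stated above, they can be written as follows, where binomial coefficients with a non-integer upper index are understood through the gamma function.

```latex
\begin{align*}
\frac{1}{(1-x)^{n}} &= \sum_{i=0}^{\infty}\binom{n+i-1}{i}\,x^{i}, \quad |x|<1
&&\Longrightarrow&
\frac{1}{\left[1-(1-\theta)F(x;\xi)^{2}\right]^{\theta+1}}
  &= \sum_{i=0}^{\infty}\binom{\theta+i}{i}\,(1-\theta)^{i}\,F(x;\xi)^{2i}, \\
(1-y)^{m} &= \sum_{j=0}^{\infty}(-1)^{j}\binom{m}{j}\,y^{j}, \quad |y|<1
&&\Longrightarrow&
\left[1-F(x;\xi)^{2}\right]^{\theta-1}
  &= \sum_{j=0}^{\infty}(-1)^{j}\binom{\theta-1}{j}\,F(x;\xi)^{2j}.
\end{align*}
```

The first expansion requires $|(1-\theta)F(x;\xi)^{2}| < 1$, which holds wherever $F(x;\xi) < 1$ when $0 < \theta \le 2$; for larger θ it holds only where $F(x;\xi)^{2} < 1/(\theta - 1)$.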
From the numerical results provided in Table 1, it is clear that as the additional parameter θ increases, the mean and variance decrease, whereas the Sk and Kur of the model increase, showing that the proposed distribution is leptokurtic, unimodal, and skewed to the right. The increase in right skewness with θ also indicates a heavy right tail. Also, from the results in Table 2, we can see that as the parameter α increases, the distribution produces skewness to the right.
Using (15) in (17), we get the mgf of the NE-W distributions.
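The summary measures discussed for Tables 1 and 2 (mean, variance, Sk, and Kur) can also be approximated numerically instead of through the series expression. The sketch below uses the identity $E[X^{r}] = \int_{0}^{1} Q(u)^{r}\,du$ with a midpoint rule, and it assumes the NE-X cdf $G = 1 - [(1 - F^{2})/(1 - (1-\theta)F^{2})]^{\theta}$ with a Weibull baseline; that form, the function names, and the number of nodes are assumptions made for illustration, so the output is not claimed to reproduce the published tables.

```python
import math


def ne_w_quantile(u, alpha, gamma, theta):
    """Quantile function implied by the ASSUMED NE-W cdf, for 0 <= u < 1."""
    t = (1.0 - u) ** (1.0 / theta)
    base = math.sqrt((1.0 - t) / (1.0 - (1.0 - theta) * t))
    return (-math.log(1.0 - base) / gamma) ** (1.0 / alpha)


def ne_w_summary(alpha, gamma, theta, nodes=100_000):
    """Approximate (mean, variance, skewness, kurtosis) by a midpoint rule in u."""
    h = 1.0 / nodes
    raw = [0.0, 0.0, 0.0, 0.0]  # raw moments of order 1 to 4
    for k in range(nodes):
        x = ne_w_quantile((k + 0.5) * h, alpha, gamma, theta)
        power = x
        for r in range(4):
            raw[r] += power * h
            power *= x
    m1, m2, m3, m4 = raw
    var = m2 - m1 ** 2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / var ** 1.5
    # Non-excess kurtosis (a normal distribution would give 3).
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt


if __name__ == "__main__":
    # Illustrative values: hold alpha and gamma fixed and vary theta.
    for theta in (0.5, 1.0, 2.0):
        print(theta, ne_w_summary(alpha=1.5, gamma=0.5, theta=theta))
```

Because the integrand grows near u = 1, higher-order moments converge more slowly than the mean; increasing `nodes` is the simplest remedy in this sketch.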

Maximum Likelihood Estimation and Simulation Study
This section presents the maximum likelihood estimators of the model parameters and provides a Monte Carlo simulation study to assess the behavior of these estimators.
The log-likelihood function can be maximized either directly by using SAS (PROC NLMIXED) or by solving the nonlinear likelihood equations obtained by differentiating (18). The partial derivatives of (18) are given in (19). Equating $\partial \ell_n(\Theta)/\partial\theta$ and $\partial \ell_n(\Theta)/\partial\xi$ to zero and solving the resulting nonlinear system simultaneously yields the MLEs $\hat{\theta}$ and $\hat{\xi}$, respectively. From expressions (19), it is clear that these equations do not have explicit solutions. Therefore, computer software can be used to solve them numerically. We use the optim() R function with the argument method = "SANN" to obtain the maximum likelihood estimators. The expression (18) can be used to obtain the MLEs for any subcase of the proposed family. For the NE-W distribution, the expressions for the MLEs are derived in the appendix.
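To make the numerical maximization concrete, the sketch below codes an NE-W log-likelihood in Python so that it can be handed to any general-purpose optimizer. It is written for the assumed density $g = 2\theta^{2} f F (1 - F^{2})^{\theta-1} / [1 - (1-\theta)F^{2}]^{\theta+1}$ with a Weibull baseline, which is an assumption rather than a transcription of (18); the function names and the simulated data are hypothetical.

```python
import math
import random


def ne_w_loglik(data, alpha, gamma, theta):
    """Log-likelihood of the ASSUMED NE-W density for positive observations."""
    total = 0.0
    for x in data:
        z = gamma * x ** alpha            # Weibull cumulative hazard
        surv = math.exp(-z)               # baseline survival 1 - F
        big_f = -math.expm1(-z)           # baseline cdf F, computed stably
        log_f = math.log(alpha * gamma) + (alpha - 1.0) * math.log(x) - z
        # log(1 - F^2) = log(surv) + log(2 - surv) avoids cancellation in the tail.
        log_one_minus_f2 = -z + math.log(2.0 - surv)
        total += (math.log(2.0) + 2.0 * math.log(theta) + log_f + math.log(big_f)
                  + (theta - 1.0) * log_one_minus_f2
                  - (theta + 1.0) * math.log(1.0 - (1.0 - theta) * big_f ** 2))
    return total


def simulate(n, alpha, gamma, theta, seed=0):
    """Inverse-cdf draws from the assumed NE-W model (used only to create test data)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = (1.0 - rng.random()) ** (1.0 / theta)   # 1 - U is uniform on (0, 1]
        base = math.sqrt((1.0 - t) / (1.0 - (1.0 - theta) * t))
        out.append((-math.log(1.0 - base) / gamma) ** (1.0 / alpha))
    return out


if __name__ == "__main__":
    sample = simulate(500, alpha=1.5, gamma=0.5, theta=2.0, seed=1)
    print(ne_w_loglik(sample, 1.5, 0.5, 2.0))   # at the generating values
    print(ne_w_loglik(sample, 0.7, 0.5, 2.0))   # at a perturbed alpha
```

Minimizing the negative of `ne_w_loglik` over (α, γ, θ), with all three constrained to be positive, plays the same role as the optim() call described above; a derivative-free or stochastic method is a natural choice because the likelihood equations are not explicit.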

Monte Carlo Simulation Study.
In this subsection, we investigate the performance of the maximum likelihood estimators of the proposed distribution. For the simulation purposes, the NE-W distribution is considered. We use the inverse cdf method for generating random numbers from the NE-W distribution. If $U \sim U(0, 1)$ and if G has an inverse function, then

Comparative Study
As we have mentioned earlier, researchers have been developing new distributions to provide the best fit to real-life data in applied areas such as reliability engineering, medical, actuarial, and financial sciences. Therefore, in this section, we consider three real-life applications from different disciplines, including medical, engineering, and financial sciences. For each data set, the NE-W distribution is compared with different well-known distributions, and we observed that the proposed distribution outclasses the other competitors.
To decide about the goodness of fit among the applied distributions, we consider certain analytical measures. In this regard, we consider two discrimination measures, namely, the Akaike information criterion (AIC) introduced by Akaike [16] and the Bayesian information criterion (BIC) of Schwarz [17] and Scollnik [18]. These measures are given as follows:
(i) The AIC is given by
$$\mathrm{AIC} = 2k - 2\ell.$$
(ii) The BIC is given by
$$\mathrm{BIC} = k\log(n) - 2\ell,$$
where ℓ denotes the log-likelihood function evaluated at the MLEs, k is the number of model parameters, and n is the sample size.
In addition to the discrimination measures, we further consider other goodness of fit measures such as the Anderson-Darling (AD) test statistic, the Cramér-von Mises (CM) test statistic, and the Kolmogorov-Smirnov (KS) test statistic with the corresponding p values. In their standard forms, with $G$ denoting the fitted cdf, these measures are given as follows:
(i) The AD test statistic is
$$\mathrm{AD} = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\log G(x_i) + \log\{1 - G(x_{n-i+1})\}\right],$$
where n is the sample size and $x_i$ is the $i$th sample value when the data are sorted in ascending order.
(ii) The CM test statistic is
$$\mathrm{CM} = \frac{1}{12n} + \sum_{i=1}^{n}\left[\frac{2i - 1}{2n} - G(x_i)\right]^{2}.$$
(iii) The KS test statistic is given by
$$\mathrm{KS} = \sup_{x}\left|G_n(x) - G(x)\right|,$$
where $G_n(x)$ is the empirical cdf and $\sup_x$ is the supremum of the set of distances.
A distribution with lower values of these analytical measures is considered to be a good candidate model among the applied distributions for the underlying data sets. By considering these statistical tools, we observed that the NE-W distribution provides the best fit compared to the other distributions because the values of all of the selected goodness of fit criteria are smaller for the proposed distribution.
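The measures described above can be computed with a few lines of code once a fitted cdf and the maximized log-likelihood are available. The Python sketch below uses the standard textbook forms of the AIC, BIC, AD, CM, and KS statistics; the function names are hypothetical, p values are not computed, and `cdf` stands for any fitted distribution function supplied by the user.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k*log(n) - 2*loglik."""
    return k * math.log(n) - 2.0 * loglik


def anderson_darling(data, cdf):
    """AD = -n - (1/n) * sum (2i - 1) * [log G(x_i) + log(1 - G(x_{n-i+1}))]."""
    x = sorted(data)
    n = len(x)
    s = sum((2 * i - 1) * (math.log(cdf(x[i - 1])) + math.log(1.0 - cdf(x[n - i])))
            for i in range(1, n + 1))
    return -n - s / n


def cramer_von_mises(data, cdf):
    """CM = 1/(12n) + sum [(2i - 1)/(2n) - G(x_i)]^2."""
    x = sorted(data)
    n = len(x)
    return 1.0 / (12.0 * n) + sum(((2 * i - 1) / (2.0 * n) - cdf(x[i - 1])) ** 2
                                  for i in range(1, n + 1))


def kolmogorov_smirnov(data, cdf):
    """KS = sup |G_n(x) - G(x)|, evaluated at the jumps of the empirical cdf."""
    x = sorted(data)
    n = len(x)
    d_plus = max(i / n - cdf(x[i - 1]) for i in range(1, n + 1))
    d_minus = max(cdf(x[i - 1]) - (i - 1) / n for i in range(1, n + 1))
    return max(d_plus, d_minus)


if __name__ == "__main__":
    # Toy example: two observations checked against a unit exponential cdf.
    def exp_cdf(v):
        return 1.0 - math.exp(-v)

    toy = [0.3, 1.2]
    print(anderson_darling(toy, exp_cdf), cramer_von_mises(toy, exp_cdf),
          kolmogorov_smirnov(toy, exp_cdf))
```

When several candidate models are fitted to the same data, these functions can be applied to each fitted cdf in turn and the resulting values tabulated side by side.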

A Real Life Application of Biomedical Analysis.
Bladder cancer is the ninth most frequently diagnosed malignancy worldwide [19] and one of the most prevalent, representing 3% of cancers diagnosed globally [20]. Bladder cancer accounts for an estimated 386,000 new diagnoses and 150,000 related deaths annually. Early detection of bladder cancer remains one of the most urgent research issues. The first data set is taken from Lee and Wang [21]; the authors studied the remission times (in months) of a random sample of 128 bladder cancer patients. They rejected the hypothesis of using the exponential and Weibull distributions for modeling medical sciences data having a nonmonotonic hazard function. The authors observed that extended versions of these classical distributions can be used quite effectively to model such data. The proposed NE-W model is applied to this data set in comparison with other well-known competitors. The distribution functions of the following competitive models are considered: (1) FWE distribution, (2) APTW distribution, (3) MOW distribution, and (4) MW distribution. The maximum likelihood estimators with standard errors (in parentheses) of the models for the analyzed data are presented in Table 3. The discrimination measures along with the goodness of fit measures of the proposed and other competitive models are provided in Table 4. From the results provided in Table 4, it is clear that the proposed distribution has lower values of these measures than the other models. The data are spread out. From Figure 8, we can easily detect that the data have a heavy tail skewed to the right (box plot) and that the proposed model closely follows the PP plot.

A Real Life Application from Reliability Engineering.
Here, we investigate the NE-W distribution by analyzing a reliability engineering data set taken from Algamal [22], representing the failure times of a coating machine. To show the potential of the proposed method, the proposed model and other competitive distributions are applied to this data set, and it is observed that the NE-W model again outclasses the well-known distributions. The distribution functions of the following competitive models are considered for the second data set: (1) Ex-APTW distribution, (2) Ku-W distribution, and (3) BW distribution. Corresponding to data set 2, the values of the model parameters are reported in Table 5. The analytical measures of the proposed and other competitive models are provided in Table 6. The estimated cdf and Kaplan-Meier survival plots are sketched in Figure 9, which shows that the proposed distribution fits the estimated cdf and Kaplan-Meier survival plots very closely. The PP plot and box plot are sketched in Figure 10. From the box plot of the second data set, it is also clear that the data set has a heavy tail.

A Real Life Application from Insurance Sciences.
The third data set was taken from the insurance sciences and represents vehicle insurance losses; it is available at http://www.businessandeconomics.mq.edu.au/our_departments/Applied_Finance_and_Actuarial_Studies/research/books/GLMsforInsuranceData. We fitted the proposed model in comparison with the other models. The distribution functions of the following competitive models are considered: (1) Lomax distribution and (2) Burr-XII distribution. For the third data set, parameter values are reported in Table 7, and the analytical measures are presented in Table 8. The estimated cdf and Kaplan-Meier survival plots are sketched in Figure 11. The PP plot and box plot are

Concluding Remarks
The importance of the extended distributions was first realized in financial sciences and later in other applied fields such as engineering and medical sciences. To cater to data in those fields, a number of methods have been introduced. In this context, we have proposed a versatile three-parameter distribution, called the new extended Weibull distribution, using a new approach that allows closed-form expressions for some basic mathematical and other related properties. The applicability of the proposed family has been illustrated via three data sets from medical, engineering, and financial sciences, and the model performs reasonably well as compared to some well-known distributions.
This new development, which offers a promising approach for data modeling in these fields, may be very useful for practitioners who handle such data sets. For that reason, it can be deemed an alternative to the Weibull and other well-known competitors.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no competing interests regarding the publication of this paper.

Computational and Mathematical Methods in Medicine