A New Extended Weibull Distribution with Application to Influenza and Hepatitis Data

Abstract: The Weibull is a popular distribution for modeling data with monotone failure rates. In this work, we introduce the four-parameter Weibull extended Weibull distribution, which offers greater flexibility and can model data with bathtub-shaped and unimodal failure rates. Some of its mathematical properties, such as the quantile function, linear representation and moments, are provided. Maximum likelihood is adopted to estimate its parameters, and the log-Weibull extended Weibull regression model is presented. In addition, some simulations are carried out to examine the consistency of the estimators. We illustrate the greater flexibility and performance of this distribution and the regression model through applications to influenza and hepatitis data. The new models perform much better than some of their competitors.


Introduction
The Weibull is a traditional distribution for positive real data. However, it does not accommodate data with unimodal or bathtub-shaped hazard functions. Several modifications of the Weibull have been proposed to model non-monotone hazard rates, including the extended Weibull (EW) model [1]. There are also many references on extensions that seek hazard functions that are unimodal or bathtub shaped (see [2-4], which provide a survey of the modified Weibull distributions). Most recently, refs. [5,6] defined the Maxwell-Weibull and the alpha power Kumaraswamy Weibull distributions, respectively.
Two papers on the EW distribution [7,8] have been the most seminal, in that they pioneered the development of distributions for bathtub-shaped hazard rates. Since the publication of these papers, many distributions, and in particular other generalizations of the two-parameter Weibull distribution, have been proposed, each allowing for non-monotone and bathtub-shaped hazard rates. It has been shown in the literature that the EW distribution provides significantly better fits than traditional models based on the exponential, gamma, Weibull and lognormal distributions. This is the main reason for choosing this distribution as the baseline model in this article.
Influenza data have been studied from a non-parametric point of view [16], by logistic regression [17] and by functional data analysis [18]. On the other hand, spatial regression [19], machine learning models [20], Markov chains [21], and epidemiological models involving the fractal-fractional Caputo category [22] have been used in studies with hepatitis data. Our main aim with the applications to real data is to show the flexibility of the new distribution, which adds one more parameter to the EW distribution, and of the new log-Weibull extended Weibull (LWEW) regression model. As an example of the application of the distribution, we use times (in days) from the date of hospitalization until cure of influenza patients. To apply the LWEW regression model, we use a data set from the literature on a study with hepatitis patients, in which the variable of interest is the time until death from hepatitis. The "time until the occurrence of an event of interest" is the response variable in survival analysis studies, and one of the main characteristics of this type of study is censoring, i.e., the partial observation of the response. Furthermore, the regression structure allows us to analyze how characteristics of the individuals in the sample may influence the response variable.
The three-parameter EW probability density function (pdf) of the random variable X is given by (1), where α ≥ 0 and β > 0 are shape parameters and λ > 0 is a scale parameter. The support of the EW distribution is $\mathbb{R}^+$, and its rth ordinary moment is given by (2), where Γ(·) and B(·, ·) are the gamma and beta functions, respectively. For lifetime models, it is of interest to know the rth incomplete moment of X, say $T_r(x) = \int_0^x u^r f(u)\,du$, which is given by (3), where ${}_2F_1$ is the hypergeometric function and $\gamma(s, x) = \int_0^x t^{s-1} e^{-t}\,dt$, $s > 0$, is the incomplete gamma function.
We define the Weibull extended Weibull (WEW) distribution in Section 2. The quantile function (qf) and linear representation are reported in Section 3. Estimation by the maximum likelihood method is discussed in Section 4. A simulation and a misspecification study are presented in Section 5. We define the log-Weibull extended Weibull (LWEW) regression in Section 6 and perform a simulation study for this model. Applications to influenza and hepatitis data are reported in Section 7. Some conclusions are summarized in Section 8.

The WEW Distribution
Consider the W-G class of distributions [9] with scale a = 1 and shape b > 0. By taking the pdf (1) as the baseline in this class, we obtain the cumulative distribution function (cdf) and the pdf (5) of the WEW distribution for x > 0. Henceforth, we change the notation and let X ∼ WEW(b, α, β, λ) have pdf (5). The WEW distribution has some special cases: the EW when b = 1, the W-Weibull (WW) when α = 0, and the W-exponential (WE) when α = 0 and β = 1. Figures 1 and 2 report the densities and hazard rate functions (hrfs), respectively, for fixed parameter values. The WEW hrf can be inverted bathtub, bathtub, monotonically increasing, or monotonically decreasing.
In Figure 3a, the skewness B decreases (for fixed β) when b grows. In Figure 3b, B increases with α for β = 0.5, but for larger fixed values of β, it tends to become constant. In Figure 3c, B decreases (for any α) when β grows. In Figure 4a, the kurtosis M decreases for β = 1 if b grows. For large fixed values of β, M first drops sharply when b grows, and then the behavior reverses and M increases with b. In Figure 4b, as the parameter α increases, M increases for β = 0.3, β = 0.5 and β = 1, and it tends to become constant for β = 2. In Figure 4c, M decreases for any α when β grows.
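The quantile-based measures discussed above can be computed from any quantile function. The following is a minimal sketch, assuming that B and M denote the Bowley skewness and the Moors kurtosis (the usual quantile-based choices; their definitions are not spelled out in this section). The unit exponential quantile function `exp_quantile` is only a stand-in, since the WEW qf is not reproduced here.

```python
import math

def bowley_skewness(Q):
    """Bowley skewness computed from the quartiles of a quantile function Q."""
    q1, q2, q3 = Q(0.25), Q(0.5), Q(0.75)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def moors_kurtosis(Q):
    """Moors kurtosis computed from the octiles of a quantile function Q."""
    e = [Q(k / 8.0) for k in range(1, 8)]  # e[0] = Q(1/8), ..., e[6] = Q(7/8)
    return (e[2] - e[0] + e[6] - e[4]) / (e[5] - e[1])

def exp_quantile(u):
    """Stand-in quantile function (unit exponential), not the WEW qf."""
    return -math.log1p(-u)

print(bowley_skewness(exp_quantile), moors_kurtosis(exp_quantile))
```

Replacing `exp_quantile` by the WEW qf and evaluating both functions over a grid of parameter values would give curves of the kind described for Figures 3 and 4.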

Linear Representation
Appendix A gives a linear representation for the WEW pdf in terms of EW densities.
This representation is important since the complete and incomplete moments, generating function, mean deviations, and reliability of X can be determined from those of the EW distribution.

Moments
We can study some important characteristics of the distribution through its moments. The rth ordinary moment of X follows from Equations (2), (A5) and (A6). It is simple to verify from Equations (3), (A5) and (A6) that the rth incomplete moment $T_r(x)$ admits a similar expression. We can obtain the mean deviations and the Lorenz and Bonferroni curves from the first incomplete moment.
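For reference, the standard relations that link these quantities to the first incomplete moment $T_1(x)$ are sketched below. The symbols $\mu'_1 = E(X)$, $\tilde{m}$ (the median), $\delta_1$, $\delta_2$ and $q = Q(\pi)$ are notation assumed here for illustration, with F and Q denoting the cdf and qf of X.

```latex
\delta_1 = E\,|X - \mu'_1| = 2\,\mu'_1\,F(\mu'_1) - 2\,T_1(\mu'_1),
\qquad
\delta_2 = E\,|X - \tilde{m}| = \mu'_1 - 2\,T_1(\tilde{m}),
\\[4pt]
L(\pi) = \frac{T_1(q)}{\mu'_1},
\qquad
B(\pi) = \frac{T_1(q)}{\pi\,\mu'_1},
\qquad q = Q(\pi), \quad 0 < \pi < 1.
```

Here $\delta_1$ and $\delta_2$ are the mean deviations about the mean and the median, and $L(\pi)$ and $B(\pi)$ are the Lorenz and Bonferroni curves.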

Estimation
Let $x_1, \ldots, x_n$ be a sample of size n from (5). The log-likelihood function for θ = (b, α, β, λ) from this sample reduces to (10). Equation (10) for α = 0 gives the log-likelihood for the WW distribution. The maximum likelihood estimates (MLEs) can be found by maximizing l(θ) using the AdequacyModel library [25] of the R software. Other options are the maxLik function of the maxLik library, which provides a convenient interface for the MLEs [26], and the optim function, which allows the choice of an optimization method (for example, BFGS, CG, or SANN) and can also return the Hessian matrix. We can also maximize (10) numerically using SAS (PROC NLMIXED) or the Ox program (sub-routine MaxBFGS), among others. The score components in
We report the average estimates, biases and mean squared errors (MSEs) in Table 1. The biases and MSEs decrease when n grows, which indicates that the estimators are consistent.
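The simulation loop behind such a table can be sketched as follows. This is only an illustration under stated assumptions: the two-parameter Weibull density λβx^(β−1)exp(−λx^β) stands in for the WEW log-likelihood (10), which is not reproduced here, and a simple derivative-free compass search is swapped in for the BFGS, CG and SANN optimizers of R so that the sketch needs no external library. All function names are hypothetical.

```python
import math
import random

def weibull_neg_loglik(params, data):
    """Negative log-likelihood of the stand-in Weibull model.
    params = (log beta, log lam), so the search is unconstrained."""
    beta, lam = math.exp(params[0]), math.exp(params[1])
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)
    sum_pow = sum(x ** beta for x in data)
    return -(n * (params[0] + params[1]) + (beta - 1.0) * sum_log_x - lam * sum_pow)

def compass_search(fun, x0, step=0.5, tol=1e-6, max_iter=5000):
    """Minimize fun by trying +/- step along each coordinate and
    halving the step whenever no coordinate move improves the value."""
    x, fx = list(x0), fun(x0)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = fun(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x, fx

def simulate_weibull(n, beta, lam, rng):
    """Inverse transformation method: x = (-log(1 - u) / lam) ** (1 / beta)."""
    return [(-math.log(1.0 - rng.random()) / lam) ** (1.0 / beta) for _ in range(n)]

def monte_carlo_summary(estimates, true_value):
    """Average estimate (AE), bias and MSE over the replications."""
    r = len(estimates)
    ae = sum(estimates) / r
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return ae, ae - true_value, mse

rng = random.Random(2023)
beta_true, lam_true = 2.0, 3.0            # illustrative values only
beta_hats = []
for _ in range(20):                       # far fewer replications than in the study
    sample = simulate_weibull(100, beta_true, lam_true, rng)
    fit, _ = compass_search(lambda p: weibull_neg_loglik(p, sample), [0.0, 0.0])
    beta_hats.append(math.exp(fit[0]))
print(monte_carlo_summary(beta_hats, beta_true))
```

Working on the log scale keeps β and λ positive without explicit constraints. Repeating the loop for increasing n and tabulating the output of `monte_carlo_summary` gives the kind of AE, bias and MSE columns described above.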

Misspecification Study
We investigated the behavior of the MLEs of the parameters of the WEW distribution under misspecification by carrying out Monte Carlo simulations based on 1000 replications (for n = 100). The observations were simulated by taking b = 0.8, α = 3, β = 2 and λ = 3. We used the maxLik library with the SANN method for each generated data set. In Table 2, the observed values are generated from the gamma extended Weibull (GEW) distribution [27] by taking a = 0.8, α = 3, β = 2, and λ = 3. In Table 3, the observed values are generated from the EW distribution by setting α = 3, β = 2, and λ = 3. Further, in Table 4, the observed values are generated from the WW distribution with b = 0.8, β = 2, and λ = 3.
Table 2. Simulation results for the GEW distribution when n = 100, a = 0.8, α = 3, β = 2 and λ = 3.
In addition to the average estimates (AEs), the relative biases (RBs), and the MSEs, we present the mean values of the global deviance (GD), say GD = −2l(θ̂), where l(θ̂) is the maximized log-likelihood function (10), and of the AIC and BIC. They indicate that there are small sample biases in the parameter estimation. The average measures of GD, AIC and BIC for the estimated WEW distribution are very close to the values obtained from the true distributions used to generate the observed values. Hence, the WEW distribution provides consistent MLEs even when the data are generated from different distributions.

Clearly, the goodness-of-fit measures (GD, AIC, and BIC) are lower for the distribution from which the data are generated.
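The three measures are simple functions of the maximized log-likelihood. A minimal sketch is given below, using the usual definitions AIC = GD + 2k and BIC = GD + k log(n), where k is the number of estimated parameters. The function name and the numerical inputs are hypothetical and are not taken from the tables.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Global deviance, AIC and BIC from a maximized log-likelihood."""
    gd = -2.0 * loglik
    aic = gd + 2.0 * n_params
    bic = gd + n_params * math.log(n_obs)
    return {"GD": gd, "AIC": aic, "BIC": bic}

# Hypothetical example: four parameters (as in the WEW model), n = 100.
print(information_criteria(loglik=-120.5, n_params=4, n_obs=100))
```

Lower values indicate a better fit, so the comparison across models amounts to evaluating this function at each fitted model and ranking the results.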

The LWEW Regression Model
If X has the WEW pdf (5), then Y = log(X) has the log-Weibull extended Weibull (LWEW) density, with support on the real line and reparameterized in terms of $\sigma = \beta^{-1}$ and $\mu = -\sigma \log(\lambda)$, where b, α, σ > 0 and µ ∈ R. For α = 0, we obtain the log-Weibull Weibull (LWW) model, where µ is a location and σ is a scale. The density of the standardized variable Z = (Y − µ)/σ (for z ∈ R) is given by (13).
We construct a regression model based on the LWEW distribution, $y_i = v_i^\top \gamma + \sigma z_i$ for $i = 1, \ldots, n$, where $z_i$ has pdf (13), $\gamma = (\gamma_1, \ldots, \gamma_p)^\top$ is the vector of coefficients, and $v_i = (v_{i1}, \ldots, v_{ip})^\top$ is the vector of covariates for the ith response $y_i$, which models the location parameter $\mu_i = v_i^\top \gamma$.
Let F and C denote the sets of individuals that failed and that were censored, respectively. The log-likelihood for $\theta = (b, \alpha, \sigma, \gamma^\top)^\top$ can be found from (13) and (14) and is given by (15), where q is the number of failures and $z_i = (y_i - v_i^\top \gamma)/\sigma$. The MLE $\hat{\theta}$ of θ can be found by maximizing (15).
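The structure of a censored log-likelihood of this kind can be sketched as follows: failures contribute the log-density of the standardized residual minus log(σ), and censored responses contribute the log-survival function. This is an illustration only. The standard extreme-value (log-Weibull) error law is used as a stand-in because the LWEW density (13) is not reproduced here, right censoring is assumed, and all function names are hypothetical.

```python
import math

def censored_loglik(y, v, delta, gamma, sigma, log_pdf_z, log_surv_z):
    """Log-likelihood for y_i = v_i' gamma + sigma * z_i under right censoring.
    delta[i] = 1 marks a failure (set F), delta[i] = 0 a censored response (set C)."""
    total = 0.0
    for y_i, v_i, d_i in zip(y, v, delta):
        mu_i = sum(g * x for g, x in zip(gamma, v_i))   # location v_i' gamma
        z_i = (y_i - mu_i) / sigma
        if d_i == 1:
            total += log_pdf_z(z_i) - math.log(sigma)   # density contribution
        else:
            total += log_surv_z(z_i)                    # survival contribution
    return total

def log_pdf_ev(z):
    """Stand-in log-density: standard extreme-value law, f(z) = exp(z - exp(z))."""
    return z - math.exp(z)

def log_surv_ev(z):
    """Stand-in log-survival function: S(z) = exp(-exp(z))."""
    return -math.exp(z)

# Hypothetical toy data: intercept plus one covariate, one failure, one censored.
y = [0.0, 1.0]
v = [(1.0, 0.0), (1.0, 1.0)]
delta = [1, 0]
print(censored_loglik(y, v, delta, gamma=(0.0, 0.0), sigma=1.0,
                      log_pdf_z=log_pdf_ev, log_surv_z=log_surv_ev))
```

Passing the log-density and log-survival function as arguments means that the same routine could be reused for the LWEW model and its sub-models once their standardized densities are coded.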

Regression Simulation Study
A simulation study was conducted using the BFGS algorithm in R to examine the accuracy of the MLEs of the LWEW regression model with parameters γ 0 = 2.2, γ 1 = 1.2, σ = 1.5, b = 2 and α = 5. We considered 1000 Monte Carlo replications for n = 30, 50, and 100, and censoring percentages of 0%, 10%, 30%, and 66%, with data generated using the inverse transformation method. Bernoulli variates with success probability (1 − p) are generated to obtain the censored observations, where p is the percentage of censoring. The location parameter is $\mu_i = \gamma_0 + \gamma_1 v_{i1}$. The AEs, biases, and MSEs are reported in Table 5. The biases and MSEs usually decrease when n grows. By increasing the percentage of censoring for a fixed sample size, the biases and MSEs decrease for most AEs. Thus, an improvement in the accuracy of the estimators occurs. The same behavior is not observed for b. A probable explanation is that the estimators are naturally biased, since the likelihood function in the presence of censoring includes the contribution of the survival function.

The WEW Distribution
Consider a data set from the City of São Paulo (Brazil) obtained from the Severe Acute Respiratory Syndrome database on the platform of the Ministry of Health (BD-SRAG at https://opendatasus.saude.gov.br/dataset/srag-2021-a-2023, accessed on: 26 May 2022), which comprises events from 31 December 2021 to March 2022. The data set was filtered to obtain 162 times (measured in days) of influenza patients from the date of admission to the hospital until cure.
The MLEs and their standard errors (SEs, in parentheses) found via the SANN method (with the AdequacyModel, GenSA and MASS libraries of the R software) are reported in Table 6. We adopted the well-known Cramér-von Mises (W*), Anderson-Darling (A*) and Kolmogorov-Smirnov (KS) statistics to compare the WEW distribution with some competing distributions. We used AIC, CAIC, and BIC to compare the new distribution with some special cases. The findings are reported in Table 7. Further, the likelihood ratio (LR) statistic in Table 6 confirms the superiority of the WEW distribution for these data.
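An LR comparison between a model and a nested sub-model that differs by a single restriction (for example, b = 1) can be sketched as below. The function name and the log-likelihood values are hypothetical and are not taken from Table 6. The sketch assumes the usual chi-square reference distribution with one degree of freedom, for which the upper tail probability has the closed form erfc(sqrt(w/2)).

```python
import math

def lr_test_one_restriction(loglik_full, loglik_reduced):
    """LR statistic w = 2 * (l_full - l_reduced) and its chi-square(1) p-value."""
    w = 2.0 * (loglik_full - loglik_reduced)
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(max(w, 0.0) / 2.0))
    return w, p_value

# Hypothetical maximized log-likelihoods of a full model and a sub-model.
w, p = lr_test_one_restriction(loglik_full=-350.2, loglik_reduced=-356.9)
print(round(w, 2), p)
```

A small p-value favors the full model over the sub-model. When the restricted value lies on the boundary of the parameter space, the chi-square reference should be read as an approximation.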
Further, we compared the proposed distribution with the previous models via the generalized likelihood ratio (GLR) test [33]. The results in Table 8 indicate that the WEW distribution is the most suitable model. The histogram and the best four fitted pdfs are displayed in Figure 5a. Figure 5b reports the empirical and estimated cdfs. They also reveal the superiority of the WEW distribution.

The LWEW Regression Model
We used a data set from a randomized clinical trial carried out to investigate the effect of steroid therapy in the treatment of acute viral hepatitis [34]. Twenty-nine patients with this disease were randomized to receive either a placebo (lactose) or the steroid (methylprednisolone) treatment. Each patient was followed for 16 weeks or until death (event of interest) or loss to follow-up (censoring). The observed survival times, in weeks, for the two groups are reported in Table 9. The explanatory variable in this work is (v 1 ): treatment (placebo = 1, steroid = 2).
We fit the LWEW regression model $y_i = \gamma_0 + \gamma_1 v_{i1} + \sigma z_i$, where $z_i$ has pdf (13). Some competing models for the regression modeling are the log-gamma extended Weibull (LGEW) [27], log-beta Weibull (LBW) [35], and log-Kumaraswamy-Weibull or Kumaraswamy Gumbel (KwGu) [36]. Table 10 provides the MLEs for the fitted LWEW, LEW, LWW, LGEW, LBW and KwGu regressions obtained via the maxLik function and the BFGS method in R software. The codes can be accessed at https://github.com/elisangelacbiazatti/WEW (accessed on 28 April 2023). This table shows that the LWEW is the best model. The LR statistic confirms the superiority of the LWEW model over both its sub-models at the 1% level of significance.
Further, control treatment and steroids are statistically different. Thus, patients who received control treatment had a shorter time to death than patients who received steroids, since the estimate of the coefficient of the treatment variable (v 1 ) is negative. The plots of the Kaplan-Meier and estimated survival functions in Figure 6 support that the LWEW model is the best among the fitted models. The deviance residuals, which are randomly scattered around zero, are plotted in Figure 7a. A normal plot with an envelope is shown in Figure 7b. The model fits the data reasonably well.
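The empirical curve used in such a comparison is the Kaplan-Meier estimate. A minimal sketch is shown below. The toy times and censoring indicators are hypothetical and are not the data of Table 9, and the function name is an assumption made for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (t, S(t)) at each distinct failure time.
    events[i] = 1 marks a death and events[i] = 0 a censored time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        ties = sum(1 for tt, _ in data if tt == t)              # subjects leaving at t
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk                    # product-limit step
            curve.append((t, surv))
        n_at_risk -= ties
        i += ties
    return curve

# Hypothetical survival times in weeks with censoring indicators.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Computing this step function separately for each treatment group and overlaying the fitted survival functions would give a plot of the kind described for Figure 6.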

Conclusions
We introduced the Weibull extended Weibull density and provided some of its properties. The consistency of the maximum likelihood estimators is supported by a simulation study. An application to real influenza data revealed its flexibility. We constructed the log-Weibull extended Weibull regression model and performed some simulations to study the behavior of the estimators in small and large samples. We compared its fit to acute viral hepatitis data with those of other existing models and performed a residual analysis for the final model. Overall, the two applications showed the utility of the new models for symmetric and asymmetric data, censored or uncensored. In future work, we can, for example, select other systematic components for the regression model and, as an alternative method, estimate the model parameters using a Bayesian approach.