The Hjorth's IDB Generator of Distributions: Properties, Characterizations, Regression Modeling and Applications

We introduce a new flexible class of continuous distributions via the Hjorth’s IDB model. We provide some mathematical properties of the new family. Characterizations based on two truncated moments, conditional expectation as well as in terms of the hazard function are presented. The maximum likelihood method is used for estimating the model parameters. We assess the performance of the maximum likelihood estimators in terms of biases and mean squared errors by means of simulation studies. A new regression model as well as residual analysis are presented. Finally, the usefulness of the family is illustrated by means of four real data sets. The new model provides consistently better fits than other competitive models for these data sets.

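Hence, if U is a uniform random variable on (0, 1), then $Q_{Hj}(U)$ is an IDB random variable, where $Q_{Hj}(\cdot)$ is the solution of Equation (2). On the other hand, Alzaatreh et al. [2] proposed a new technique to construct wider families by using any pdf as a generator. This generator, called the T-X family of distributions, has cdf defined by

$$F(x) = \int_{0}^{W[G(x)]} r(t)\,dt,$$

where $r(t)$ is the pdf of a random variable T with support $[0, \infty)$ and $W[G(x)] = -\log[1 - G(x)]$. Based on the above T-X generator, we propose a new wider family of continuous distributions, the Hjorth-G (Hj-G) family, by replacing $r(t)$ with the Hjorth density function. Its cdf is given by

$$F(x; \alpha, \beta, \theta, \xi) = 1 - \frac{\exp\left\{-\frac{\alpha}{2}\left[\log \bar{G}(x;\xi)\right]^{2}\right\}}{\left[1 - \beta \log \bar{G}(x;\xi)\right]^{\theta/\beta}}, \qquad (3)$$

where $\bar{G}(x;\xi) = 1 - G(x;\xi)$ is the baseline survival function depending on a $q \times 1$ vector $\xi$ of unknown parameters, $G(x;\xi)$ is the baseline cdf, and $\alpha$, $\beta$ and $\theta$ are the extra scale and shape parameters, which add flexibility to the baseline distribution. The pdf corresponding to Equation (3) is given by

$$f(x; \alpha, \beta, \theta, \xi) = \frac{g(x;\xi)}{\bar{G}(x;\xi)}\, \frac{\theta - \alpha \log \bar{G}(x;\xi)\left[1 - \beta \log \bar{G}(x;\xi)\right]}{\left[1 - \beta \log \bar{G}(x;\xi)\right]^{\theta/\beta + 1}}\, \exp\left\{-\frac{\alpha}{2}\left[\log \bar{G}(x;\xi)\right]^{2}\right\}. \qquad (4)$$

Hereafter, the random variable X with pdf (4) is denoted by $X \sim \text{Hj-G}(\alpha, \beta, \theta, \xi)$. Further, we can omit the dependence on the parameter vector $\xi$ and simply write $G(x;\xi) = G(x)$ and $g(x;\xi) = g(x)$. If T is an IDB random variable with cdf (1), then $X = G^{-1}(1 - e^{-T})$ is the Hj-G random variable, where $G^{-1}(\cdot)$ is the qf of the baseline distribution. Hence, the qf of X is the solution $x = Q_X(u)$ of the nonlinear equation

$$\frac{\alpha}{2}\left[\log \bar{G}(x)\right]^{2} + \frac{\theta}{\beta} \log\left[1 - \beta \log \bar{G}(x)\right] + \log(1 - u) = 0, \quad u \in (0, 1).$$

The hazard rate function (hrf) of X is given by

$$h(x) = \frac{g(x)}{\bar{G}(x)} \left[\frac{\theta}{1 - \beta \log \bar{G}(x)} - \alpha \log \bar{G}(x)\right].$$

The goal of this work is to introduce a new flexible and wider family of distributions based on the T-X family using the IDB model. We are motivated to introduce the Hj-G family because it exhibits increasing, decreasing, constant, upside-down bathtub, unimodal-then-bathtub as well as bathtub hazard rates, as shown in Figures 1 and 2. The members of the Hj-G family can also be viewed as suitable models for fitting bimodal, unimodal, U-shaped and other shaped data. The Hj-G family outperforms several well-known lifetime distributions in three real data applications, as illustrated in Section 9.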
The new log-regression model based on the Hj-Weibull distribution provides better fits than the log-Topp-Leone odd log-logistic-Weibull [3] and log-Weibull regression models for the Stanford heart transplant data set.
The paper is organized as follows: Some sub-families of the new family are introduced in Section 2. In Section 3, the series expansions for cdf and pdf of the new family are presented. In Section 4, some of its mathematical properties are derived. Section 5 deals with some characterizations of the new family. In Section 6, the maximum likelihood method is used to estimate the parameters. A new regression model as well as residual analysis are presented in Section 7. In Section 8, two simulation studies are performed to evaluate the efficiency of the maximum likelihood estimates. In Section 9, we illustrate the importance of the new family by means of three applications to real data sets. The paper is concluded in Section 10.

SPECIAL HJ-G DISTRIBUTIONS
The Hj-G family can extend any baseline distribution by means of its additional shape and scale parameters. So, the pdf (4) generates more flexible distributions than the baseline model. Also, the Hj-G family includes some sub-families such as the Rayleigh-G, exponential-G and linear failure rate-G families for $\theta = 0$, $\alpha = \beta = 0$ and $\beta = 0$, respectively. We note that the Rayleigh-G and exponential-G families are special members of the Weibull-G family, which was introduced by Cordeiro et al. [4]. Here, we obtain three special models of the Hj-G family. These special models extend some well-known distributions given in the literature.
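
The construction and its sub-families can be checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes the IDB survival function $\exp(-\alpha t^2/2)(1+\beta t)^{-\theta/\beta}$, the transformation $W[G(x)] = -\log[1-G(x)]$, and the parameter names alpha, beta, theta; the function names and the logistic baseline are hypothetical choices made only for illustration. The case beta = 0 is handled as the limit $(1+\beta t)^{-\theta/\beta} \to e^{-\theta t}$.

```python
import math

def hj_survival(t, alpha, beta, theta):
    """IDB (Hjorth) survival function at t >= 0; beta = 0 is treated as a limit."""
    if beta == 0.0:
        power_term = math.exp(-theta * t)  # limit of (1 + beta*t)**(-theta/beta)
    else:
        power_term = (1.0 + beta * t) ** (-theta / beta)
    return math.exp(-0.5 * alpha * t * t) * power_term

def hj_g_cdf(x, alpha, beta, theta, base_cdf):
    """Hj-G cdf: the IDB cdf evaluated at t = -log(1 - G(x))."""
    t = -math.log1p(-base_cdf(x))
    return 1.0 - hj_survival(t, alpha, beta, theta)

def hj_g_pdf(x, alpha, beta, theta, base_cdf, base_pdf):
    """Hj-G pdf: IDB hazard times IDB survival times the baseline hazard g/(1-G)."""
    big_g = base_cdf(x)
    t = -math.log1p(-big_g)
    idb_hazard = alpha * t + theta / (1.0 + beta * t)
    return idb_hazard * hj_survival(t, alpha, beta, theta) * base_pdf(x) / (1.0 - big_g)

# Hypothetical baseline used only for illustration: the standard logistic law.
def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    p = logistic_cdf(x)
    return p * (1.0 - p)

if __name__ == "__main__":
    # alpha = beta = 0 reduces to the exponential-G form 1 - (1 - G)^theta.
    print(hj_g_cdf(0.3, 0.0, 0.0, 2.0, logistic_cdf))
    print(1.0 - (1.0 - logistic_cdf(0.3)) ** 2.0)
```

Passing the baseline as a pair of callables keeps the sketch independent of any particular choice of G.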

The Hj-Normal (Hj-N) Distribution
The normal distribution is a very useful model in statistics and related fields. Since it has an increasing hrf and a symmetric, uni-modal pdf, its range of application in data modeling can be limited. To extend the normal distribution, we consider the Hj-N distribution as our first example by taking $G(x; \mu, \sigma) = \Phi\left(\frac{x-\mu}{\sigma}\right)$ and $g(x; \mu, \sigma) = \sigma^{-1}\phi\left(\frac{x-\mu}{\sigma}\right)$ as the baseline cdf and pdf, respectively, where $x, \mu \in \mathbb{R}$, $\sigma > 0$, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and cdf of the standard normal distribution, respectively. We denote this distribution by Hj-N($\alpha, \beta, \theta, \mu, \sigma$). Some plots of the Hj-N density and hrf for selected parameter values are displayed in Figure 1. We note that the pdf of the Hj-N distribution can be skewed, bi-modal or uni-modal. Also, its hrf can be increasing, or first increasing and then bathtub shaped.

The Hj-Weibull (Hj-W) Distribution
As our second example, we consider the Weibull distribution, which has a monotone hrf and a decreasing or uni-modal pdf, with shape parameter $\gamma > 0$ and scale parameter $\lambda > 0$. Its cdf and pdf are given by $G(x; \gamma, \lambda) = 1 - \exp\{-(\lambda x)^{\gamma}\}$ and $g(x; \gamma, \lambda) = \gamma \lambda^{\gamma} x^{\gamma - 1} \exp\{-(\lambda x)^{\gamma}\}$ for $x > 0$, respectively. We denote this distribution by Hj-W($\alpha, \beta, \theta, \gamma, \lambda$). Some plots of the Hj-W density and hrf for selected parameter values are displayed in Figure 2. Figure 2 shows that the pdf and hrf shapes of the Hj-W distribution can be very flexible. For example, the pdf of the new extended Weibull distribution can be bi-modal, uni-modal, decreasing, or first increasing and then U shaped. Moreover, its hrf can be monotone or non-monotone, for instance bathtub shaped, or first increasing and then bathtub shaped.
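
As an illustration of how such hazard shapes arise, the sketch below evaluates the Hj-W hazard in closed form. It is a hedged example that assumes the Weibull baseline $G(x) = 1 - \exp\{-(\lambda x)^{\gamma}\}$ and the IDB hazard $\alpha t + \theta/(1+\beta t)$ evaluated at $t = (\lambda x)^{\gamma}$; the function names and the parameter values are hypothetical and are not taken from Figure 2.

```python
import math

def hj_weibull_survival(x, alpha, beta, theta, gam, lam):
    """Hj-W survival function for x > 0 and beta > 0 (assumed parameterisation)."""
    t = (lam * x) ** gam  # equals -log of the Weibull survival function
    return math.exp(-0.5 * alpha * t * t) * (1.0 + beta * t) ** (-theta / beta)

def hj_weibull_hazard(x, alpha, beta, theta, gam, lam):
    """Hj-W hazard: Weibull hazard times the IDB hazard evaluated at t = (lam*x)^gam."""
    t = (lam * x) ** gam
    weibull_hazard = gam * lam ** gam * x ** (gam - 1.0)
    return weibull_hazard * (alpha * t + theta / (1.0 + beta * t))

if __name__ == "__main__":
    # Illustrative values only: theta*beta > alpha with gam = 1 gives a bathtub shape.
    for x in (0.01, 0.5, 1.0, 2.0, 5.0):
        print(x, hj_weibull_hazard(x, 0.5, 2.0, 2.0, 1.0, 1.0))
```

With $\gamma = \lambda = 1$ the hazard reduces to the IDB hazard itself, which first decreases and then increases whenever $\theta\beta > \alpha$.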

The Hj-Uniform (Hj-U) Distribution
As our third example, let the baseline distribution be the uniform distribution on the interval (a, b), with cdf $G(x; a, b) = (x - a)/(b - a)$ and pdf $g(x; a, b) = 1/(b - a)$ for $a < x < b$. We denote this distribution by Hj-U($\alpha, \beta, \theta, a, b$). Some plots of the Hj-U density and hrf for selected parameter values are displayed in Figure 3. Figure 3 shows that the pdf can be increasing, decreasing, uni-modal, first decreasing and then uni-modal, or U-shaped. Its hrf can be increasing or bathtub shaped. Hence, the Hj-U distribution may be suggested for modeling bathtub-shaped hazard rates.

USEFUL EXPANSIONS
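In this section, we provide a useful linear representation of the Hj-G cdf. The derivation proceeds as follows. The quantity $A_i$ is expanded in a power series. The quantity $B_i$ is expanded via a Taylor series in which j is a positive integer and $(\cdot)_j$ denotes the descending factorial. The quantity $C_i$ is expanded by using the expansion $\log(1 - z) = -\sum_{k=1}^{\infty} z^{k}/k$ for $|z| < 1$. Using the equation by Gradshteyn and Ryzhik [5], page 17, for a power series raised to a positive integer n,

$$\left(\sum_{k=0}^{\infty} a_k z^{k}\right)^{n} = \sum_{k=0}^{\infty} c_{n,k}\, z^{k},$$

where the coefficients $c_{n,k}$ (for k = 1, 2, ...) are easily determined from the recurrence relation

$$c_{n,k} = \frac{1}{k\,a_0} \sum_{m=1}^{k} \left[m(n+1) - k\right] a_m\, c_{n,k-m},$$

with $c_{n,0} = a_0^{n}$. The coefficient $c_{n,k}$ can be calculated from $c_{n,0}, \ldots, c_{n,k-1}$ and hence from the quantities $a_0, \ldots, a_k$. Combining these expansions, we obtain the linear representation (6) of the cdf with coefficients $v_{k+1}$, where $\Pi_{k+1}(x) = G(x)^{k+1}$ is the cdf of the exponentiated-G (exp-G) class with power parameter k + 1. Upon differentiating (6), we obtain the linear representation (7) of the pdf, in which the coefficient of $\pi_{k+1}(x)$ equals $-v_{k+1}$ and $\pi_{k+1}(x) = (k + 1)\, g(x)\, G(x)^{k}$ denotes the exp-G class density with power parameter k + 1.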
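
The recurrence for the coefficients $c_{n,k}$ is straightforward to implement. The short Python sketch below is an illustration only; the function name and the truncation argument num_terms are hypothetical. It computes the first coefficients of a truncated power series raised to a positive integer power, assuming $a_0 \neq 0$.

```python
def power_series_coeffs(a, n, num_terms):
    """Return c[0..num_terms-1] with (sum_k a[k] z^k)^n = sum_k c[k] z^k.

    `a` holds the known coefficients a_0, a_1, ...; missing ones are taken as 0.
    Requires a[0] != 0 and a positive integer n.
    """
    c = [float(a[0]) ** n]
    for k in range(1, num_terms):
        total = 0.0
        for m in range(1, k + 1):
            a_m = a[m] if m < len(a) else 0.0
            total += (m * (n + 1) - k) * a_m * c[k - m]
        c.append(total / (k * a[0]))
    return c

if __name__ == "__main__":
    # (1 + z)^3 = 1 + 3z + 3z^2 + z^3
    print(power_series_coeffs([1.0, 1.0], 3, 5))
```

Each coefficient costs O(k) operations, so the first K coefficients are obtained in O(K^2) time without expanding the product explicitly.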

General Properties
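By the linear representation (7), the rth ordinary moment of X, $\mu'_r = E(X^{r})$, is a linear combination of the rth moments of exp-G random variables with power parameter k + 1. The values of the mean, variance, skewness and kurtosis for selected Hj-W distributions are shown in Table 1. Table 1 shows that the Hj-W distribution can be left skewed or right skewed and can take a range of kurtosis values. Hence, the Hj-W model can be useful for data modeling in terms of skewness and kurtosis.
The nth descending factorial moment of X (for n = 1, 2, …) is

$$\mu'_{(n)} = E\left[X(X-1)\cdots(X-n+1)\right] = \sum_{j=0}^{n} s(n, j)\, \mu'_j,$$

where $s(n, j)$ is the Stirling number of the first kind. Here, we provide two formulae for the moment generating function (mgf) $M_X(t) = E(e^{tX})$. Clearly, the first one can be derived from Equation (7) as a linear combination of the mgfs of exp-G random variables with power parameter k + 1. Hence, $M_X(t)$ can be determined from the exp-G generating function. A second formula for $M_X(t)$ follows from (7) in terms of an integral involving the baseline qf $Q_G(u; \xi)$. This integral can be determined analytically for special models with closed-form expressions for $Q_G(u; \xi)$, or computed at least numerically for most baseline distributions.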
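
The relation between descending factorial moments and ordinary moments can be sketched in a few lines of Python. The example below is illustrative; the function names are hypothetical. It builds the signed Stirling numbers of the first kind from the standard recurrence $s(m+1, j) = s(m, j-1) - m\,s(m, j)$ and then converts ordinary moments into a factorial moment.

```python
def stirling_first_kind_row(n):
    """Signed Stirling numbers of the first kind s(n, 0), ..., s(n, n)."""
    row = [1]  # s(0, 0) = 1
    for m in range(n):  # build row m + 1 from row m
        new_row = [0] * (m + 2)
        for j, value in enumerate(row):
            new_row[j + 1] += value   # s(m, j) contributes to s(m+1, j+1)
            new_row[j] -= m * value   # and -m * s(m, j) contributes to s(m+1, j)
        row = new_row
    return row

def factorial_moment(raw_moments, n):
    """E[X(X-1)...(X-n+1)] from raw moments mu'_0 = 1, mu'_1, ..., mu'_n."""
    s = stirling_first_kind_row(n)
    return sum(s[j] * raw_moments[j] for j in range(n + 1))

if __name__ == "__main__":
    # Raw moments of a Poisson law with mean 2, used only as a convenient check:
    # its nth factorial moment equals 2**n.
    print(factorial_moment([1, 2, 6, 22], 3))
```

The Poisson moments serve only as a convenient test case; for the Hj-G family the raw moments would come from the linear representation or from numerical integration.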

Order Statistics
Suppose $X_1, \ldots, X_n$ is a random sample from any Hj-G distribution. Let $X_{i:n}$ denote the ith order statistic. The pdf of $X_{i:n}$ can be expressed as

$$f_{i:n}(x) = \frac{f(x)}{B(i, n-i+1)} \sum_{j=0}^{n-i} (-1)^{j} \binom{n-i}{j} F(x)^{j+i-1}.$$

Following similar algebraic developments of Nadarajah et al. [6], we can write the density function of $X_{i:n}$ as the linear combination (8) of exp-G densities, whose coefficients involve the quantities given in Section 3 and the quantities $f_{j+i-1,k}$, which can be determined recursively. Equation (8) is the main result of this section. It reveals that the pdf of the Hj-G order statistics is a linear combination of exp-G density functions. So, several mathematical quantities of the Hj-G order statistics, such as ordinary, incomplete and factorial moments, mean deviations and several others, can be determined from the corresponding quantities of the exp-G distribution.

CHARACTERIZATION
This section deals with various characterizations of the Hj-G distribution. These characterizations are based on (i) a simple relationship between two truncated moments; (ii) the hazard function and (iii) the conditional expectation of a function of the random variable. It should be mentioned that for characterization (i) the cdf is not required to have a closed form. We present our characterizations (i) to (iii) in three subsections.

Characterizations Based on Two Truncated Moments
In this subsection we present characterizations of the Hj-G distribution in terms of a simple relationship between two truncated moments. The first characterization result employs a theorem due to Glänzel [7]; see Theorem 5.1.1 below. Note that the result holds also when the interval H is not closed. Moreover, as mentioned above, it can also be applied when the cdf F does not have a closed form. As shown in Glänzel [8], this characterization is stable in the sense of weak convergence.

Theorem 5.1.1. Let $(\Omega, \mathcal{F}, P)$ be a given probability space and let $H = [a, b]$ be an interval for some $a < b$ ($a = -\infty$, $b = \infty$ might as well be allowed). Let $X : \Omega \to H$ be a continuous random variable with the distribution function F and let $q_1$ and $q_2$ be two real functions defined on H such that

$$E\left[q_2(X) \mid X \geq x\right] = E\left[q_1(X) \mid X \geq x\right] \eta(x), \quad x \in H,$$

is defined with some real function $\eta$. Assume that $q_1, q_2 \in C^{1}(H)$, $\eta \in C^{2}(H)$ and F is a twice continuously differentiable and strictly monotone function on the set H. Finally, assume that the equation $\eta q_1 = q_2$ has no real solution in the interior of H. Then F is uniquely determined by the functions $q_1$, $q_2$ and $\eta$, particularly

$$F(x) = \int_{a}^{x} C \left|\frac{\eta'(u)}{\eta(u)\, q_1(u) - q_2(u)}\right| \exp\left(-s(u)\right) du,$$

where the function s is a solution of the differential equation $s' = \dfrac{\eta'\, q_1}{\eta\, q_1 - q_2}$ and C is the normalization constant, such that $\int_H dF = 1$.
} for x ∈ ℝ. The random variable X has pdf (4) if and only if the function η defined in Theorem 5.1.1 has the form
Proof. Let X be a random variable with pdf (4), then and finally Conversely, if η is given as above, then and hence The general solution of the differential equation in Corollary 5.1.1 is where D is a constant. Note that a set of functions satisfying the above differential equation is given in Proposition 5.1.1 with D = 0. However, it should also be noted that there are other triplets (q 1 , q 2 , η) satisfying the conditions of Theorem 5.1.1.

Characterization Based on Hazard Function
It is known that the hazard function, $h_F$, of a twice differentiable distribution function, F, with density f satisfies the first order differential equation

$$\frac{f'(x)}{f(x)} = \frac{h_F'(x)}{h_F(x)} - h_F(x).$$

For many univariate continuous distributions, this is the only characterization available in terms of the hazard function. The following proposition establishes a non-trivial characterization of the Hj-G distribution in terms of the hazard function, which is not of the above trivial form.
Proof. If X has pdf (4), then clearly the above differential equation holds. Now, if the differential equation holds, then , which is the hazard function of the Hj-G distribution.

Characterizations Based on Conditional Expectation
The following proposition has already appeared in Hamedani [9], so we just state it here. It can be used to characterize the Hj-G distribution.
if and only if

MAXIMUM LIKELIHOOD ESTIMATION (MLE)
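We consider the estimation of the unknown parameters of the new family from complete samples only, by the maximum likelihood method. Let $x_1, \ldots, x_n$ be a random sample from the Hj-G family with a $(q + 3) \times 1$ parameter vector $\Theta = (\alpha, \beta, \theta, \xi^{\top})^{\top}$, where $\xi$ is a $q \times 1$ baseline parameter vector. The log-likelihood function for $\Theta$ is given by

$$\ell(\Theta) = \sum_{i=1}^{n} \log g(x_i;\xi) - \sum_{i=1}^{n} \log \bar{G}(x_i;\xi) + \sum_{i=1}^{n} \log\left\{\theta - \alpha \log \bar{G}(x_i;\xi)\left[1 - \beta \log \bar{G}(x_i;\xi)\right]\right\} - \left(\frac{\theta}{\beta} + 1\right) \sum_{i=1}^{n} \log\left[1 - \beta \log \bar{G}(x_i;\xi)\right] - \frac{\alpha}{2} \sum_{i=1}^{n} \left[\log \bar{G}(x_i;\xi)\right]^{2}.$$

The components of the score vector,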

SIMULATION STUDY
In this section, the performance of the MLEs of the Hj-W distribution is assessed via a simulation study. The inverse transform method is used to generate random variables from the Hj-W distribution.
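
The inverse transform step can be sketched as follows. This is a minimal Python illustration, not the authors' R code. It assumes the IDB cumulative hazard $\alpha t^2/2 + (\theta/\beta)\log(1+\beta t)$ with $\beta > 0$, the Weibull baseline $G(x) = 1 - \exp\{-(\lambda x)^{\gamma}\}$, and a simple bisection solver; all function and argument names are hypothetical.

```python
import math
import random

def idb_quantile(u, alpha, beta, theta, tol=1e-12):
    """Solve alpha*t^2/2 + (theta/beta)*log(1 + beta*t) = -log(1 - u) for t >= 0."""
    target = -math.log1p(-u)

    def cumulative_hazard(t):
        return 0.5 * alpha * t * t + (theta / beta) * math.log1p(beta * t)

    lo, hi = 0.0, 1.0
    while cumulative_hazard(hi) < target:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):    # bisection on an increasing function
        mid = 0.5 * (lo + hi)
        if cumulative_hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hj_weibull_quantile(u, alpha, beta, theta, gam, lam):
    """X = G^{-1}(1 - exp(-T)) with T the IDB quantile; for this Weibull, T^(1/gam)/lam."""
    t = idb_quantile(u, alpha, beta, theta)
    return t ** (1.0 / gam) / lam

def hj_weibull_sample(n, alpha, beta, theta, gam, lam, seed=None):
    """Draw n pseudo-random Hj-W variates by the inverse transform method."""
    rng = random.Random(seed)
    return [hj_weibull_quantile(rng.random(), alpha, beta, theta, gam, lam) for _ in range(n)]

if __name__ == "__main__":
    print(hj_weibull_sample(5, 0.5, 2.0, 2.0, 1.5, 0.8, seed=1))
```

Bisection is chosen here for robustness, since the cumulative hazard is strictly increasing whenever $\alpha$ and $\theta$ are not both zero; a Newton iteration would be faster but needs safeguarding.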

LOG-HJ-W REGRESSION MODEL
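Consider the Hj-W distribution with five parameters presented in Subsection 2.2. Henceforth, X denotes a random variable following the Hj-W distribution and Y = log(X). The density function of Y (for $y \in \mathbb{R}$), obtained by replacing $\gamma = 1/\sigma$ and $\lambda = 1/\exp(\mu)$, can be expressed as

$$f(y) = \frac{1}{\sigma} \exp\left(\frac{y-\mu}{\sigma}\right) \frac{\theta + \alpha\, e^{(y-\mu)/\sigma}\left[1 + \beta\, e^{(y-\mu)/\sigma}\right]}{\left[1 + \beta\, e^{(y-\mu)/\sigma}\right]^{\theta/\beta + 1}} \exp\left\{-\frac{\alpha}{2}\, e^{2(y-\mu)/\sigma}\right\}, \qquad (9)$$

where $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ is the scale parameter and $\alpha > 0$, $\beta > 0$ and $\theta > 0$ are the shape parameters. We refer to Equation (9) as the log-Hj-W (LHj-W) distribution, say $Y \sim \text{LHj-W}(\alpha, \beta, \theta, \sigma, \mu)$. Figure 5 provides some plots of the density function (9) for selected parameter values. They reveal that this distribution is a good candidate for modeling left skewed and bimodal data sets.
The survival function corresponding to (9) is given by

$$S(y) = \frac{\exp\left\{-\frac{\alpha}{2}\, e^{2(y-\mu)/\sigma}\right\}}{\left[1 + \beta\, e^{(y-\mu)/\sigma}\right]^{\theta/\beta}}, \qquad (10)$$

and the hrf is simply h(y) = f(y)/S(y). The standardized random variable $Z = (Y - \mu)/\sigma$ has density function

$$f(z) = e^{z}\, \frac{\theta + \alpha\, e^{z}\left(1 + \beta\, e^{z}\right)}{\left(1 + \beta\, e^{z}\right)^{\theta/\beta + 1}} \exp\left\{-\frac{\alpha}{2}\, e^{2z}\right\}, \quad z \in \mathbb{R}. \qquad (11)$$

Based on the LHj-W density, we propose a linear location-scale regression model linking the response variable $y_i$ to the vector of explanatory variables $v_i$, given by

$$y_i = v_i^{T} \tau + \sigma z_i, \quad i = 1, \ldots, n, \qquad (12)$$

where the random error $z_i$ has the density function (11), and $\tau = (\tau_1, \ldots, \tau_p)^{T}$, $\sigma > 0$, $\alpha > 0$, $\beta > 0$ and $\theta > 0$ are unknown parameters. The parameter $\mu_i = v_i^{T} \tau$ is the location of $y_i$. The location parameter vector $\mu = (\mu_1, \ldots, \mu_n)^{T}$ is represented by a linear model $\mu = V\tau$, where $V = (v_1, \ldots, v_n)^{T}$ is a known model matrix. Consider a sample $(y_1, v_1), \ldots, (y_n, v_n)$ of n independent observations, where each random response is defined by $y_i = \min\{\log(x_i), \log(c_i)\}$. We assume non-informative censoring such that the observed lifetimes and censoring times are independent. Let F and C be the sets of individuals for which $y_i$ is the log-lifetime or log-censoring time, respectively. The log-likelihood function for the vector of parameters $(\alpha, \beta, \theta, \sigma, \tau^{T})^{T}$ from model (12) has the form

$$\ell = -r \log \sigma + \sum_{i \in F} z_i + \sum_{i \in F} \log\left[\theta + \alpha u_i (1 + \beta u_i)\right] - \left(\frac{\theta}{\beta} + 1\right) \sum_{i \in F} \log(1 + \beta u_i) - \frac{\alpha}{2} \sum_{i \in F} u_i^{2} - \frac{\alpha}{2} \sum_{i \in C} u_i^{2} - \frac{\theta}{\beta} \sum_{i \in C} \log(1 + \beta u_i), \qquad (13)$$

where $u_i = \exp(z_i)$, $z_i = (y_i - \mu_i)/\sigma$, and r is the number of uncensored observations (failures). The MLE of the vector of unknown parameters can be obtained by maximizing the log-likelihood function (13). The R software is used for the estimation.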
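
The censored log-likelihood can be written compactly by adding the log-density for failures and the log-survival function for censored observations. The Python sketch below is an illustration under the assumptions that the error density is $e^{z}[\alpha e^{z} + \theta/(1+\beta e^{z})]$ times the survival function $\exp(-\alpha e^{2z}/2)(1+\beta e^{z})^{-\theta/\beta}$ and that the regression coefficients are collected in a vector here called tau; the function name and argument layout are hypothetical, and the paper's own fit was done in R.

```python
import math

def lhjw_loglik(y, V, censored, tau, sigma, alpha, beta, theta):
    """Censored log-likelihood of the LHj-W regression model (assumed form).

    y        : log-times, y_i = min(log lifetime, log censoring time)
    V        : rows v_i of the model matrix (lists of floats)
    censored : True where y_i is a log-censoring time
    """
    total = 0.0
    for y_i, v_i, is_censored in zip(y, V, censored):
        mu_i = sum(t * v for t, v in zip(tau, v_i))  # location mu_i = v_i' tau
        z = (y_i - mu_i) / sigma
        u = math.exp(z)
        log_surv = -0.5 * alpha * u * u - (theta / beta) * math.log1p(beta * u)
        if is_censored:
            total += log_surv
        else:
            # log hazard on the y scale: log[(u / sigma) * (alpha*u + theta/(1 + beta*u))]
            log_hazard = z - math.log(sigma) + math.log(alpha * u + theta / (1.0 + beta * u))
            total += log_hazard + log_surv
    return total

if __name__ == "__main__":
    y = [0.2, 1.1, -0.4]
    V = [[1.0, 0.5], [1.0, 1.5], [1.0, -0.3]]
    print(lhjw_loglik(y, V, [False, True, False], [0.1, 0.3], 0.9, 1.0, 0.5, 0.8))
```

A function of this form can be passed, with its sign reversed, to any general-purpose numerical optimiser.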

Residual Analysis
Residual analysis plays a critical role in checking the adequacy of the fitted model. In order to analyze departures from the error assumption, two types of residuals are considered: martingale and modified deviance residuals.

Martingale Residual
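The martingale residual is defined in the counting process framework and takes values between +1 and −∞ (see [10] for details). The martingale residual for the LHj-W model is

$$r_{M_i} = \begin{cases} 1 + \log \hat{S}(y_i), & i \in F, \\ \log \hat{S}(y_i), & i \in C, \end{cases} \qquad \log \hat{S}(y_i) = -\frac{\hat{\alpha}}{2}\, \hat{u}_i^{2} - \frac{\hat{\theta}}{\hat{\beta}} \log\left(1 + \hat{\beta}\, \hat{u}_i\right),$$

where $\hat{u}_i = \exp(\hat{z}_i)$ and $\hat{z}_i$ is the standardized residual $(y_i - \hat{\mu}_i)/\hat{\sigma}$ evaluated at the MLEs.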

Modified Deviance Residual
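The main drawback of the martingale residual is that, even when the fitted model is correct, it is not symmetrically distributed about zero. To overcome this problem, the modified deviance residual was proposed by Therneau et al. [11]. The modified deviance residual for the LHj-W model is

$$r_{D_i} = \begin{cases} \operatorname{sign}(r_{M_i}) \left\{-2\left[r_{M_i} + \log\left(1 - r_{M_i}\right)\right]\right\}^{1/2}, & i \in F, \\ \operatorname{sign}(r_{M_i}) \left\{-2\, r_{M_i}\right\}^{1/2}, & i \in C, \end{cases}$$

where $r_{M_i}$ is the martingale residual.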
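
Both residuals depend on the fitted model only through the estimated survival probability at each observation. The following Python sketch illustrates this, assuming the standard definitions $r_M = \delta + \log \hat{S}$ and the modified deviance transformation of Therneau et al.; the function names are hypothetical and the survival values would come from the fitted LHj-W model.

```python
import math

def martingale_residual(surv, observed):
    """r_M = 1 + log S for an observed failure, log S for a censored observation.

    `surv` is the fitted survival probability, assumed to lie strictly in (0, 1).
    """
    return (1.0 if observed else 0.0) + math.log(surv)

def modified_deviance_residual(surv, observed):
    """Modified deviance residual built from the martingale residual."""
    r_m = martingale_residual(surv, observed)
    if observed:
        inner = -2.0 * (r_m + math.log(1.0 - r_m))
    else:
        inner = -2.0 * r_m
    inner = max(inner, 0.0)  # guard against tiny negative values from rounding
    return math.copysign(math.sqrt(inner), r_m)

if __name__ == "__main__":
    for s, obs in [(0.9, True), (0.5, False), (0.1, True)]:
        print(s, obs, martingale_residual(s, obs), modified_deviance_residual(s, obs))
```

The deviance-type residuals are then plotted against the observation index and against N(0, 1) quantiles.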

REAL DATA APPLICATIONS
In this section, we consider three applications to real data sets to show the modeling ability of the Hj-N, Hj-W and Hj-U distributions. We compare these models with members of the T-X family for which W[G(x)] is equal to − log[1−G(x)], and with some generalizations of the ordinary normal, Weibull and uniform distributions. These families and generalized models are the McDonald-G (Mc-G) family [12], Gompertz-G (Gom-G) family [13], generalized odd log-logistic-G (GOLL-G) family [14], Weibull-G (W-G) family [4], Lomax-G (Lx-G) family [15], Lindley-G (Li-G) family [16], logistic-G (L-G) family [17], Kumaraswamy odd log-logistic normal (KwOLLN) distribution [18], odd Burr normal (OBN) distribution [19], Zografos-Balakrishnan odd log-logistic Weibull (ZBOLLW) distribution [20], additive Weibull (AW) distribution [21] and gamma uniform (GU) distribution [22]. The cdfs of these distributions are available in the literature. To determine the best model, we compute the estimated log-likelihood value $\hat{\ell}$, the Akaike information criterion (AIC), corrected Akaike information criterion (CAIC), Bayesian information criterion (BIC), Hannan-Quinn information criterion (HQIC), and the Cramér-von Mises (W*) and Anderson-Darling (A*) goodness-of-fit statistics for all distribution models. We note that the statistics W* and A* are described in detail in [23]. In general, the best model is the one with the smallest values of the AIC, CAIC, BIC, HQIC, W* and A* statistics and the largest values of $\hat{\ell}$ and the p-values. All computations are performed with the maxLik routine in R. The details are given below.
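
For reference, the information criteria can be computed from the maximized log-likelihood, the number of estimated parameters k and the sample size n. The Python sketch below uses the usual textbook definitions and assumes that CAIC denotes the small-sample corrected AIC, as the text spells it out; the function name and the numerical inputs are hypothetical and are not results from the paper.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, corrected AIC, BIC and HQIC from a maximized log-likelihood.

    loglik : maximized log-likelihood value
    k      : number of estimated parameters
    n      : sample size (needs n > k + 1 for the corrected AIC)
    """
    aic = -2.0 * loglik + 2.0 * k
    caic = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)  # equals -2*loglik + 2*k*n/(n-k-1)
    bic = -2.0 * loglik + k * math.log(n)
    hqic = -2.0 * loglik + 2.0 * k * math.log(math.log(n))
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}

if __name__ == "__main__":
    # Illustrative inputs only: a five-parameter model, 52 observations, loglik = -100.
    print(information_criteria(-100.0, 5, 52))
```

Smaller values indicate a better trade-off between fit and model complexity, which is how the criteria are used in the comparisons that follow.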

Otis IQ Scores of Non-White Males Data Set
The first real data set consists of the Otis IQ scores of 52 non-white males hired by a large insurance company in 1971. This data set has been analyzed by [24–26] and [27]. On this data set, we compare the Hj-N model with the Mc-N, Lx-N, W-N, KwOLLN, L-N, OBN, GOLL-N, Gom-N and Li-N models. Table 2 shows the MLEs and standard errors of the estimates for the first data set. Table 3 lists the information criteria results and goodness-of-fit statistics. Table 3 clearly shows that the Hj-N model has the smallest values of the AIC, CAIC, BIC, HQIC, W* and A* statistics and the largest values of $\hat{\ell}$ and the two p-values among the fitted models. So, it can be chosen as the best model based on these criteria. For this data set, the plots of the fitted pdfs and cdfs for all models are shown in Figure 6. From this figure, we see that the Hj-N, Mc-N and KwOLLN models fit the data with a bi-modal shape, whereas the OBN model fits the data with a uni-modal shape.

Failure Times Data Set
The second data set represents the times between successive failures (in thousands of hours) of secondary reactor pumps, studied by [28,29] and [30]. This data set is also known to be bathtub shaped. So, for this data set, we compare the Hj-W model with the AW, Mc-W, Lx-W, W-W, ZBOLLW, L-W, GOLL-W, Gom-W and Li-W models. We fitted the Hj model and obtained its $\hat{\ell}$ value as −31.2520. Table 4 shows the MLEs and standard errors of the estimates for the second data set. Table 5 lists the information criteria results and goodness-of-fit statistics.
The Hj-W model has the smallest values of the AIC, HQIC, W* and A* statistics and the largest values of $\hat{\ell}$ and all p-values among the fitted models. For this data set, the plots of the fitted pdfs and cdfs for all models are shown in Figure 7. From this figure, we see that the Hj-W model fits the histogram of the data more adequately than the Li-W and other models.

Table 2 MLEs and standard errors of the estimates (in parentheses) for the first data set.

Figure 6
The fitted probability density functions (pdfs) and cumulative distribution functions (cdfs) for the first data set.

Student's Cognitive Skill Data
The third data set contains the students' cognitive skills for the Organisation for Economic Co-operation and Development (OECD) countries. The score of students' cognitive skills represents the average score in reading, mathematics and science as assessed by the OECD's Programme for International Student Assessment (PISA). The data set can be found at https://stats.oecd.org/index.aspx?DataSetCode=BL. By using this data set, we compare the Hj-U model with the GU, L-U and W-U models. We note that since a < x < b, the MLEs of a and b are the minimum order statistic x 1∶n and the maximum order statistic x n∶n, respectively. Hence, we take a = 416 and b = 529 for all fitted models. Table 6 shows the MLEs and standard errors of the estimates for the third data set. Table 7 lists the information criteria results and goodness-of-fit statistics. The Hj-U model has the smallest values of the AIC, CAIC, HQIC, W* and A* statistics and the largest values of $\hat{\ell}$ and all p-values among the fitted models. So, it can be chosen as the best model based on these criteria. For this data set, the plots of the fitted pdfs and cdfs for all models are shown in Figure 8. From this figure, we see that the Hj-U model fits the data with a uni-modal shape.

Table 4 MLEs and standard errors of the estimates (in parentheses) for the second data set.

Finally, when we observe all results, we can say that the Hj-N, Hj-W and Hj-U models could be chosen as the best models for the three data sets via the above criteria.

Figure 7
The fitted probability density functions (pdfs) and cumulative distribution functions (cdfs) for the second data set.

Stanford Heart Transplant Data
Recently, Brito et al. [3] introduced the Log-Topp-Leone odd log-logistic-Weibull (Log-TLOLL-W) regression model and used the Stanford heart transplant data set to demonstrate its usefulness. Here, we use the same data set to demonstrate the flexibility of the LHj-W regression model against the Log-TLOLL-W and Log-Weibull regression models. This data set is available in the p3state.msm package of the R software. The sample size is n = 103 and the percentage of censored observations is 27%. The goal of this study is to relate the survival times (t) of patients to the following explanatory variables: x 1 - year of acceptance to the program; x 2 - age of patient (in years); x 3 - previous surgery status (1 = yes, 0 = no); x 4 - transplant indicator (1 = yes, 0 = no); c i - censoring indicator (0 = censoring, 1 = lifetime observed). The regression model fitted to the Stanford heart transplant data is given by

$$y_i = \tau_0 + \tau_1 x_{i1} + \tau_2 x_{i2} + \tau_3 x_{i3} + \tau_4 x_{i4} + \sigma z_i,$$

where the random variable $y_i$ follows the LHj-W distribution given in (9). The MLEs of the model parameters and their SEs, p-values and the −ℓ, AIC and BIC statistics for the above regression models are listed in Table 8. Based on the figures in Table 8, the LHj-W model has the lowest values of the −ℓ, AIC and BIC statistics. Therefore, the LHj-W regression model outperforms the others for this data set. According to the results of the LHj-W regression model, $\tau_0$, $\tau_1$ and $\tau_2$ are statistically significant at the 1% level.
Figure 9 displays the index plot of the modified deviance residuals and their Q-Q plot against the N(0, 1) quantiles for the Stanford heart transplant data set. Based on Figure 9, we conclude that none of the observed values appears to be a possible outlier. Therefore, the fitted model is appropriate for this data set.

CONCLUSIONS
In this work, we introduce a new flexible class of continuous distributions via the Hjorth's IDB model. We provide some mathematical properties of the new family. Characterizations based on two truncated moments, conditional expectation as well as in terms of the hazard function are presented. The maximum likelihood method is used for estimating the model parameters. We assess the performance of the maximum likelihood estimators in terms of the biases and mean squared errors by means of two simulation studies. A new regression model as well as residual analysis are presented. Finally, the usefulness of the family is illustrated by means of three real data sets. The new model provides consistently better fits than other competitive models for these data sets.

Table 8 MLEs of the parameters to Stanford Heart Transplant Data for Log-Weibull, Log-TLOLL-W and LHj-W regression models with corresponding SEs, p-values and −ℓ, AIC and BIC statistics.

Figure 9 Index plot of the modified deviance residual (left) and Q-Q plot for modified deviance residual (right).