The New Odd Log-Logistic Generalized Inverse Gaussian Regression Model

We define a new four-parameter model called the odd log-logistic generalized inverse Gaussian distribution, which extends the generalized inverse Gaussian and inverse Gaussian distributions. We obtain some structural properties of the new distribution. We construct an extended regression model based on this distribution with two systematic structures, which can provide more realistic fits to real data than other special regression models. We adopt the method of maximum likelihood to estimate the model parameters. In addition, various simulations are performed for different parameter settings and sample sizes to check the accuracy of the maximum likelihood estimators. We provide a diagnostic analysis based on case-deletion and quantile residuals. Finally, the potential of the new regression model to predict the price of urban property is illustrated by means of real data.


Introduction
The inverse Gaussian (IG) distribution is widely used in several research areas, such as lifetime analysis, reliability, meteorology and hydrology, engineering, and medicine. Some extensions of the IG distribution have appeared in the literature. For example, the generalized inverse Gaussian (GIG) distribution with positive support was introduced by Good [1] in a study of population frequencies. Several papers have investigated the structural properties of the GIG distribution. Sichel [2] used this distribution to construct mixtures of Poisson distributions. Statistical properties and distributional behavior of the GIG distribution were discussed by Jørgensen [3] and Atkinson [4]. Dagpunar [5] provided algorithms for simulating this distribution. Nguyen et al. [6] showed that it has positive skewness. More recently, Madan et al. [7] proved that the Black-Scholes formula in finance can be expressed in terms of the GIG distribution function. Koudou [8] presented a survey of its characterizations, and Lemonte and Cordeiro [9] obtained some mathematical properties of the exponentiated generalized inverse Gaussian (EGIG) distribution.
In this paper, we study a new four-parameter model named the odd log-logistic generalized inverse Gaussian (OLLGIG) distribution, which contains as special cases the GIG and IG distributions, among others. Its major advantage is the flexibility in accommodating several forms of the density function, for instance, bimodal and unimodal shapes. It is also suitable for testing goodness-of-fit of some submodels.
Our main objective is to study a new regression model with two systematic structures based on the OLLGIG distribution. We obtain some mathematical properties and discuss maximum likelihood estimation of the parameters. For these models, we present some ways to perform global influence analysis (case-deletion) and, additionally, we develop a residual analysis based on the quantile residual. For different parameter settings and sample sizes, various simulation studies are performed, and the empirical distribution of the quantile residual is displayed and compared with the standard normal distribution. These studies suggest that the empirical distribution of the quantile residuals agrees with the standard normal distribution.
We denote by $Z \sim \mathrm{GIG}(\mu, \sigma, \nu)$ a random variable having density function (2). The mean and variance of $Z$ are available in closed form.
The moment generating function (mgf) of $Z$ is given in (4). We use the reparameterized GIG distribution according to GAMLSS in the R software. For example, we have $\mathrm{GIG}(\mu, \sigma\mu^{1/2}, -0.5) = \mathrm{IG}(\mu, \sigma)$. Other properties of the GIG distribution are investigated by Jørgensen [3].
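For reference, the following is a sketch of the GIG density, mean, and variance in the GAMLSS-style parameterization mentioned above. The constant $c$ and the layout are assumptions written from the usual form of this parameterization, not a transcription of equation (2); $K_{\nu}(\cdot)$ denotes the modified Bessel function of the third kind.

```latex
g(y \mid \mu, \sigma, \nu)
  = \left(\frac{c}{\mu}\right)^{\nu}
    \frac{y^{\nu-1}}{2\,K_{\nu}(\sigma^{-2})}
    \exp\left\{-\frac{1}{2\sigma^{2}}
      \left(\frac{c\,y}{\mu} + \frac{\mu}{c\,y}\right)\right\},
  \qquad y > 0,
\qquad
c = \frac{K_{\nu+1}(\sigma^{-2})}{K_{\nu}(\sigma^{-2})},

E(Z) = \mu,
\qquad
\mathrm{Var}(Z) = \mu^{2}\left[\frac{2\sigma^{2}(\nu+1)}{c} + \frac{1}{c^{2}} - 1\right].
```

Under this form, $\nu = -1/2$ gives $c = 1$ because $K_{1/2} = K_{-1/2}$, which is consistent with the IG special case quoted in the text.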
Henceforth, we write $G(y) = G_{\mu,\sigma,\nu}(y)$ to simplify the notation. The OLLGIG density function can be expressed as

$$f(y) = \frac{\tau\, g(y)\,\{G(y)[1-G(y)]\}^{\tau-1}}{\{G(y)^{\tau} + [1-G(y)]^{\tau}\}^{2}},$$

where $g(y)$ is the GIG density (2). The main motivations for the OLLGIG distribution are to make its skewness and kurtosis more flexible (compared to the GIG model) and also to allow bimodality. We have $\tau = \log[F(y)/\bar{F}(y)] / \log[G(y)/\bar{G}(y)]$, where $\bar{F}(y) = 1 - F(y)$ and $\bar{G}(y) = 1 - G(y)$. Thus, the parameter $\tau$ represents the quotient of the log odds ratio for the new and baseline distributions. Note that the pdf and cdf of the OLLGIG distribution depend on integrals, which are calculated numerically in the same way as those of the Birnbaum-Saunders distribution.
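The role of the extra shape parameter can be checked numerically. The sketch below assumes the standard odd log-logistic construction, $F = G^{\tau}/\{G^{\tau} + (1-G)^{\tau}\}$, applied to a baseline cdf value $G$ and density value $g$; the function names are illustrative and not taken from any package.

```python
import math

def oll_cdf(G, tau):
    """OLL-G cdf value for a baseline cdf value G in (0, 1)."""
    a = G ** tau
    b = (1.0 - G) ** tau
    return a / (a + b)

def oll_pdf(g, G, tau):
    """OLL-G density value from the baseline density g and baseline cdf G."""
    num = tau * g * (G * (1.0 - G)) ** (tau - 1.0)
    den = (G ** tau + (1.0 - G) ** tau) ** 2
    return num / den

def log_odds(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# Illustrative values (assumptions): baseline cdf value 0.3 and tau = 0.4.
G, tau = 0.3, 0.4
F = oll_cdf(G, tau)
ratio = log_odds(F) / log_odds(G)  # recovers tau up to rounding error
print(F, ratio)
```

Setting `tau = 1` returns the baseline itself, which is how the GIG model arises as a sub-model.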
Some plots of the OLLGIG density for selected parameter values are displayed in Figure 1. It is evident that the proposed distribution is much more flexible, especially in relation to bimodality (for $0 < \tau < 1$), than the GIG and IG distributions.
Equation (5) has tractable properties, especially for simulations, since its quantile function (qf) takes the simple form

$$y = Q_{\mathrm{GIG}}\left(\frac{u^{1/\tau}}{u^{1/\tau} + (1-u)^{1/\tau}}\right),$$

where $Q_{\mathrm{GIG}}(u) = G^{-1}_{\mu,\sigma,\nu}(u)$ is the qf of the GIG distribution. This scheme is useful because of the existence of fast generators for GIG random variables in some statistical packages. For example, we can fit the generalized additive models for location, scale, and shape (GAMLSS) in R.
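A minimal sketch of the quantile scheme follows, assuming the odd log-logistic form $F = G^{\tau}/\{G^{\tau} + (1-G)^{\tau}\}$. The helper names are hypothetical, and the unit exponential quantile function is used only as a placeholder baseline; a GIG quantile routine would take its place in practice.

```python
import math

def oll_level(u, tau):
    """Probability level at which the baseline qf must be evaluated."""
    a = u ** (1.0 / tau)
    b = (1.0 - u) ** (1.0 / tau)
    return a / (a + b)

def oll_cdf(G, tau):
    """OLL-G cdf value for a baseline cdf value G."""
    a = G ** tau
    b = (1.0 - G) ** tau
    return a / (a + b)

def oll_quantile(u, tau, baseline_qf):
    """OLL-G quantile: baseline qf applied to the transformed level."""
    return baseline_qf(oll_level(u, tau))

# Round trip: applying the OLL cdf to the transformed level returns u.
tau = 0.4
levels = [0.05, 0.25, 0.5, 0.75, 0.95]
round_trip = [oll_cdf(oll_level(u, tau), tau) for u in levels]

# Placeholder baseline (assumption): unit exponential quantile function.
exp_qf = lambda p: -math.log(1.0 - p)
quantiles = [oll_quantile(u, tau, exp_qf) for u in levels]
print(round_trip, quantiles)
```

Because the transformation of `u` is monotone, the resulting quantiles are increasing in `u` whenever the baseline qf is.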

Properties of the OLLGIG Model
where !,   = ∑ (,)∈   , , and To calculate , the index  can stop after a large number of summands.

Two Properties. Equation (19) becomes useful in deriving several mathematical properties of the proposed distribution using well-known properties of the GIG distribution. We provide only two examples. First, the ordinary moments of the OLLGIG random variable follow from (19) and the $r$th moment about zero of the $\mathrm{GIG}(\mu, \sigma, \nu)$ random variable defined by (2). Second, by combining (19) and (4), the generating function of the OLLGIG random variable takes the form (22).

The OLLGIG Regression Model
In many practical applications, the lifetimes are affected by explanatory variables such as sex, smoking, diet, blood pressure, cholesterol level, and several others. So, it is important to explore the relationship between the response variable and the explanatory variables. Regression models can be proposed in different forms in statistical analysis. In this section, we define the OLLGIG regression model with two systematic structures based on the new distribution. It is a feasible alternative to the GIG and IG regression models for data analysis. Regression analysis involves specifications of the distribution of $Y$ given a vector $\mathbf{x} = (x_1, \ldots, x_p)^{\top}$ of covariates. We relate the parameters $\mu$ and $\sigma$ to the covariates by the logarithm link functions $\mu_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}_1)$ and $\sigma_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}_2)$, $i = 1, \ldots, n$. The log-likelihood function (24) for the parameter vector $\boldsymbol{\theta} = (\boldsymbol{\beta}_1^{\top}, \boldsymbol{\beta}_2^{\top}, \nu, \tau)^{\top}$ involves $K_{\nu}(\cdot)$ and $G(\cdot)$, which are defined in Section 2.

The MLE $\hat{\boldsymbol{\theta}}$ of $\boldsymbol{\theta}$ can be calculated by maximizing the log-likelihood (24) numerically in the GAMLSS package of the R software. The advantage of this package is that we can adopt many maximization methods, which will depend only on the current fitted model. Initial values for $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ are taken from the fit of the GIG regression model with $\tau = 1$. We encountered no problems in maximizing this log-likelihood function. This fact is shown in Section 4.1, where some simulations of the proposed regression model are given under different scenarios.
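As a sketch of what is being maximized, the log-likelihood of the regression model can be written in terms of the baseline GIG density and cdf. This assumes the standard odd log-logistic density, $f = \tau g\{G(1-G)\}^{\tau-1}/\{G^{\tau} + (1-G)^{\tau}\}^{2}$, and independent observations; the shorthand $g_i$ and $G_i$ is introduced here for illustration, so the layout may differ from equation (24).

```latex
\ell(\boldsymbol{\theta})
  = n \log \tau
  + \sum_{i=1}^{n} \log g_i
  + (\tau - 1) \sum_{i=1}^{n} \log\{G_i\,(1 - G_i)\}
  - 2 \sum_{i=1}^{n} \log\{G_i^{\tau} + (1 - G_i)^{\tau}\},

\text{where } g_i = g(y_i \mid \mu_i, \sigma_i, \nu), \quad
G_i = G(y_i \mid \mu_i, \sigma_i, \nu), \quad
\mu_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}_1), \quad
\sigma_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}_2).
```

For $\tau = 1$ the last two sums vanish and the expression reduces to the GIG regression log-likelihood, which is why the GIG fit is a natural source of initial values.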
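Under general regularity conditions, the asymptotic distribution of $(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})$ is multivariate normal $N_{2p+2}(0, K(\boldsymbol{\theta})^{-1})$, where $K(\boldsymbol{\theta})$ is the expected information matrix. The asymptotic covariance matrix $K(\boldsymbol{\theta})^{-1}$ of $\hat{\boldsymbol{\theta}}$ can be approximated by the inverse of the $(2p+2) \times (2p+2)$ observed information matrix $-\ddot{L}(\boldsymbol{\theta})$. The elements of this matrix are calculated numerically. The approximate multivariate normal distribution $N_{2p+2}(0, -\ddot{L}(\hat{\boldsymbol{\theta}})^{-1})$ for $\hat{\boldsymbol{\theta}}$ can be used in the classical way to construct approximate confidence regions for the parameters in $\boldsymbol{\theta}$.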
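We can use the likelihood ratio (LR) statistic for comparing some special sub-models with the OLLGIG regression model. We consider the partition $\boldsymbol{\theta} = (\boldsymbol{\theta}_1^{\top}, \boldsymbol{\theta}_2^{\top})^{\top}$, where $\boldsymbol{\theta}_1$ is a subset of parameters of interest and $\boldsymbol{\theta}_2$ is a subset of remaining parameters. The LR statistic for testing the null hypothesis $H_0: \boldsymbol{\theta}_1 = \boldsymbol{\theta}_1^{(0)}$ versus the alternative hypothesis $H_1: \boldsymbol{\theta}_1 \neq \boldsymbol{\theta}_1^{(0)}$ is given by $w = 2\{\ell(\hat{\boldsymbol{\theta}}) - \ell(\tilde{\boldsymbol{\theta}})\}$, where $\tilde{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\theta}}$ are the estimates under the null and alternative hypotheses, respectively. The statistic $w$ is asymptotically (as $n \to \infty$) distributed as $\chi^2_k$, where $k$ is the dimension of the subset of parameters $\boldsymbol{\theta}_1$ of interest. For example, the test of $H_0: \tau = 1$ versus $H: \tau \neq 1$ is equivalent to comparing the OLLGIG regression model with the GIG regression model, and the LR statistic reduces to $w = 2\{\ell(\hat{\boldsymbol{\beta}}_1, \hat{\boldsymbol{\beta}}_2, \hat{\nu}, \hat{\tau}) - \ell(\tilde{\boldsymbol{\beta}}_1, \tilde{\boldsymbol{\beta}}_2, \tilde{\nu}, 1)\}$, where $\hat{\boldsymbol{\beta}}_1$, $\hat{\boldsymbol{\beta}}_2$, $\hat{\nu}$, and $\hat{\tau}$ are the MLEs under $H$ and $\tilde{\boldsymbol{\beta}}_1$, $\tilde{\boldsymbol{\beta}}_2$, and $\tilde{\nu}$ are the estimates under $H_0$.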
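The LR comparison can be sketched in a few lines. The example below uses closed forms of the chi-square survival function for one and two degrees of freedom, which covers testing one restriction (such as fixing the extra shape parameter at 1) or two restrictions. The log-likelihood values are hypothetical and serve only to show the arithmetic.

```python
import math

def lr_statistic(loglik_full, loglik_null):
    """LR statistic: twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_full - loglik_null)

def chi2_sf(w, df):
    """Chi-square survival function, closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(w / 2.0))
    if df == 2:
        return math.exp(-w / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

# Hypothetical maximized log-likelihoods (not taken from the paper).
w = lr_statistic(loglik_full=-100.0, loglik_null=-103.0)
p_value = chi2_sf(w, df=1)
print(w, p_value)
```

A small p-value favors the larger model; for more than two restrictions a general chi-square routine from a statistics library would be needed.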

Simulation Study.
We conduct the simulation study in two parts. First, we perform a simulation to study the behavior of the MLEs of the parameters of the OLLGIG distribution without systematic structures. Second, we evaluate the behavior of the parameter estimates considering two systematic structures.
The OLLGIG Distribution. Some properties of the MLEs are evaluated using a classical analysis by means of a simulation study. We simulate the OLLGIG distribution as follows: (i) Compute the inverse function $F^{-1}(\cdot)$ from the cumulative distribution (1).
(iv) The values $y = Q(u)$ are generated from the OLLGIG distribution, where $Q(u)$ is the inverse of (1).
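The inversion scheme can be sketched as an inverse-transform sampler. To keep the example dependency-free, the baseline here is the inverse Gaussian IG(mean `mu`, shape `lam`), whose cdf has a closed form, and its quantile is found by bisection; this is an assumption made for illustration, and a GIG cdf or quantile routine would replace `ig_cdf` and `ig_quantile` for the full model. All names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def ig_cdf(x, mu, lam):
    """Closed-form cdf of the inverse Gaussian with mean mu and shape lam."""
    r = math.sqrt(lam / x)
    return (_STD_NORMAL.cdf(r * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _STD_NORMAL.cdf(-r * (x / mu + 1.0)))

def ig_quantile(p, mu, lam, tol=1e-10):
    """Baseline quantile by bisection on the cdf."""
    lo, hi = 1e-12, 1.0
    while ig_cdf(hi, mu, lam) < p and hi < 1e6:  # expand the bracket
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if ig_cdf(mid, mu, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_oll_ig(n, mu, lam, tau, rng):
    """Draw n values by transforming uniforms and inverting the baseline cdf."""
    draws = []
    for _ in range(n):
        u = rng.random()
        a, b = u ** (1.0 / tau), (1.0 - u) ** (1.0 / tau)
        draws.append(ig_quantile(a / (a + b), mu, lam))
    return draws

rng = random.Random(2018)
sample = r_oll_ig(2000, mu=1.0, lam=1.0, tau=0.5, rng=rng)
print(min(sample), max(sample))
```

Since the odd log-logistic transform maps level 0.5 to 0.5, the sample median should sit near the baseline median, which gives a quick sanity check on the generator.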
We take $n = 20$, 50, 150, and 350 for each replication and then evaluate the estimates $\hat{\mu}$, $\hat{\sigma}$, $\hat{\nu}$, and $\hat{\tau}$. We repeat this process 1,000 times and then calculate the average estimates (AEs), biases, and mean squared errors (MSEs). In the first scenario, we take $\tau = 0.3662$, $\mu = 5.7915$, $\sigma = 0.0658$, and $\nu = 12.7216$. These are the values obtained from the fit to the iris data set in Section 6. The estimates of the model parameters are computed using the GAMLSS package of the R software. The results of the Monte Carlo study under maximum likelihood are given in Table 1. They indicate that the MLEs are accurate. Further, the MSEs of the MLEs of the model parameters decay toward zero when $n$ increases, in agreement with first-order asymptotic theory.
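The three summaries reported for each parameter can be computed from the replicated estimates as sketched below. The function name and the replicate values are hypothetical; only the definitions of the average estimate, bias, and MSE are taken from the text.

```python
from statistics import fmean

def mc_summary(estimates, true_value):
    """Average estimate, bias, and mean squared error over replications."""
    estimates = list(estimates)
    ae = fmean(estimates)
    bias = ae - true_value
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return ae, bias, mse

# Hypothetical replicate estimates of a parameter whose true value is 0.5.
replicates = [0.48, 0.55, 0.51, 0.46, 0.53]
ae, bias, mse = mc_summary(replicates, true_value=0.5)
print(ae, bias, mse)
```

The MSE combines the variance of the estimates with the squared bias, so a decaying MSE indicates that both shrink as the sample size grows.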
The OLLGIG Regression Model. We examine the performance of the MLEs in the OLLGIG regression model by means of some simulations with sample sizes $n = 100$, 300, and 500. We simulate 1,000 samples from two scenarios ($\tau = 0.5$ and $\tau = 1.5$) by considering $\mu_i = \beta_{10} + \beta_{11} x_i$ and $\sigma_i = \beta_{20} + \beta_{21} x_i$. For both cases, we take $\nu = 0.53$. The explanatory variable is generated by $x_i \sim U(0,1)$ and the response variable is generated by $y_i \sim \mathrm{OLLGIG}(\mu_i, \sigma_i, \nu, \tau)$. For each fitted model, we compute the AEs, biases, and MSEs. Based on the results given in Table 2, we note that the MSEs of the MLEs of $\beta_{10}$, $\beta_{11}$, $\beta_{20}$, $\beta_{21}$, and $\tau$ decay toward zero when the sample size $n$ increases, as usually expected under first-order asymptotic theory. Further, the AEs of the parameters tend to be closer to the true parameter values when $n$ increases. These facts suggest that the asymptotic normal distribution provides an adequate approximation to the finite sample distribution of the estimates.

Checking Model: Diagnostic and Residual Analysis
A first tool to perform sensitivity analysis, as stated before, is global influence starting from case-deletion [11,12]. Case-deletion is a common approach to study the effect of dropping the $i$th observation from the data set. In the following, a quantity with subscript "$(i)$" means the original quantity with the $i$th observation deleted. For model (25), the log-likelihood function of $\boldsymbol{\theta}$ is denoted by $\ell_{(i)}(\boldsymbol{\theta})$. Let $\hat{\boldsymbol{\theta}}_{(i)} = (\hat{\boldsymbol{\beta}}_{1(i)}^{\top}, \hat{\boldsymbol{\beta}}_{2(i)}^{\top}, \hat{\nu}_{(i)}, \hat{\tau}_{(i)})^{\top}$ be the MLE of $\boldsymbol{\theta}$ from $\ell_{(i)}(\boldsymbol{\theta})$. To assess the influence of the $i$th observation on the MLEs $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\beta}}_1^{\top}, \hat{\boldsymbol{\beta}}_2^{\top}, \hat{\nu}, \hat{\tau})^{\top}$, we can compare the difference between $\hat{\boldsymbol{\theta}}_{(i)}$ and $\hat{\boldsymbol{\theta}}$. If deletion of an observation seriously influences the estimates, more attention should be paid to that observation. Hence, if $\hat{\boldsymbol{\theta}}_{(i)}$ is far from $\hat{\boldsymbol{\theta}}$, then the $i$th observation can be regarded as influential.

A first measure of the global influence is defined as the standardized norm of $\hat{\boldsymbol{\theta}}_{(i)} - \hat{\boldsymbol{\theta}}$ (generalized Cook distance) given by

$$GD_i(\boldsymbol{\theta}) = (\hat{\boldsymbol{\theta}}_{(i)} - \hat{\boldsymbol{\theta}})^{\top} [-\ddot{L}(\hat{\boldsymbol{\theta}})] (\hat{\boldsymbol{\theta}}_{(i)} - \hat{\boldsymbol{\theta}}).$$

Another alternative is to assess the values of $GD_i(\boldsymbol{\beta}_1)$, $GD_i(\boldsymbol{\beta}_2)$, and $GD_i(\nu, \tau)$, since these values reveal the impact of the $i$th observation on the estimates of $\boldsymbol{\beta}_1$, $\boldsymbol{\beta}_2$, and $(\nu, \tau)$, respectively. Another popular measure of the difference between $\hat{\boldsymbol{\theta}}_{(i)}$ and $\hat{\boldsymbol{\theta}}$ is the likelihood distance given by

$$LD_i(\boldsymbol{\theta}) = 2\{\ell(\hat{\boldsymbol{\theta}}) - \ell(\hat{\boldsymbol{\theta}}_{(i)})\}.$$

Once the model is chosen and fitted, the analysis of the residuals is an efficient way to check the model adequacy. The residuals also serve to identify the relevance of an additional factor omitted from the model and to verify whether there are indications of serious departures from the distribution assumed for the random error. Further, since the residuals are used to identify discrepancies between the fitted model and the data set, it is convenient to define residuals that take into account the contribution of each observation to the goodness-of-fit measure. In summary, the residuals allow measuring the model fit for each observation and enable studying whether the differences between the observed and fitted values are due to chance or to a systematic behavior that can be modeled. The quantile residuals (qrs) [13] for the OLLGIG regression model with two systematic structures are defined by

$$qr_i = \Phi^{-1}\{F(y_i \mid \hat{\mu}_i, \hat{\sigma}_i, \hat{\nu}, \hat{\tau})\},$$

where $F(\cdot)$ is given in (1) and $\Phi^{-1}(\cdot)$ is the inverse cumulative standard normal distribution.
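Computing quantile residuals amounts to pushing the fitted cdf values through the standard normal quantile function. The sketch below takes those cdf values as input, so it does not depend on how the fitted cdf is evaluated; the function name, the clipping constant, and the example probabilities are assumptions for illustration.

```python
from statistics import NormalDist

def quantile_residuals(cdf_values, eps=1e-12):
    """Standard normal quantiles of the fitted cdf evaluated at each observation."""
    std_normal = NormalDist()
    residuals = []
    for p in cdf_values:
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        residuals.append(std_normal.inv_cdf(p))
    return residuals

# Hypothetical fitted cdf values at three observations.
fitted_cdf_values = [0.025, 0.5, 0.975]
qr = quantile_residuals(fitted_cdf_values)
print(qr)
```

If the fitted model is adequate, the cdf values behave like uniform draws and the residuals like standard normal draws, which is what the normal probability plot examines.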
Atkinson [14] suggested the construction of an envelope to have a better interpretation of the normal probability plot of the residuals. The simulated confidence bands of the envelope should contain the residuals. If the model is well fitted, the majority of points will be within these bands and randomly distributed. The construction of the confidence bands follows the steps:
(i) Fit the proposed model and calculate the residuals $qr_i$;
(ii) Simulate $k$ samples of the response variable using the fitted model;
(iii) Fit the model to each sample and calculate the residuals $qr_{ij}$ ($j = 1, \ldots, k$ and $i = 1, \ldots, n$);
(iv) Arrange each group of $n$ residuals in rising order to obtain $qr_{(i)j}$ for $j = 1, \ldots, k$ and $i = 1, \ldots, n$;
(v) For each $i$, calculate the mean, minimum, and maximum of $qr_{(i)j}$, namely, $qr_{(i)M} = \frac{1}{k}\sum_{j=1}^{k} qr_{(i)j}$, $qr_{(i)B} = \min\{qr_{(i)j} : 1 \le j \le k\}$, and $qr_{(i)H} = \max\{qr_{(i)j} : 1 \le j \le k\}$;
(vi) Include the means, minimum, and maximum together with the values of $qr_i$ against the expected percentiles of the standard normal distribution.
The minimum and maximum values of $qr_{(i)j}$ form the envelope. If the model under study is correct, the observed values should be inside the bands and distributed randomly.
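The ordering and summarizing steps of the envelope can be sketched as follows. The function expects one list of residuals per simulated sample. In the actual procedure each list comes from simulating the response from the fitted model and refitting; here standard normal draws stand in for those residuals, which is an assumption made only to keep the example self-contained.

```python
import random
from statistics import fmean

def envelope_bands(residual_samples):
    """Mean, minimum, and maximum of each order statistic across samples."""
    ordered = [sorted(sample) for sample in residual_samples]  # rising order
    n = len(ordered[0])
    means = [fmean([s[i] for s in ordered]) for i in range(n)]
    lower = [min(s[i] for s in ordered) for i in range(n)]
    upper = [max(s[i] for s in ordered) for i in range(n)]
    return means, lower, upper

# Stand-in residuals: k samples of n standard normal draws (assumption).
rng = random.Random(1)
k, n = 19, 30
simulated = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
means, lower, upper = envelope_bands(simulated)
print(lower[0], means[0], upper[0])
```

The observed ordered residuals are then plotted against normal percentiles together with `lower`, `means`, and `upper`; points outside the band flag a possible lack of fit.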
Simulation Study. A simulation study is conducted to investigate the behavior of the empirical distribution of the qrs for the OLLGIG regression model. We generate 1,000 samples based on the algorithm presented in Section 4.1. We also give the normal probability plots to assess the degree of deviation from the normality assumption of the residuals. Based on the plots in Figures 3 and 4, representing the first and second scenarios, respectively, we conclude that the empirical distribution of the qrs agrees with the standard normal distribution in both scenarios. This empirical distribution becomes closer to the standard normal distribution when $n$ increases in both scenarios.

Applications
In this section, we provide two applications to real data to demonstrate empirically the flexibility of the OLLGIG model. The calculations are performed with the R software.

Application 1: Iris Data.
In the first application, the OLLGIG distribution is compared with the nested GIG and IG distributions. The data set is iris, which provides measurements in centimeters of the length and width of the sepal and the length and width of the petal for 50 flowers of each of the 3 iris species (setosa, versicolor, and virginica). In this application, the variable sepal length (Sepal.Length) is used. This data set has been analyzed by several authors in multivariate analysis, for example, Anderson (1935) and Fisher [15]. We show that the distribution of these data presents bimodality.
Table 3 provides a descriptive summary of these data and indicates positively skewed distributions with varying degrees of variability, skewness, and kurtosis.
A brief descriptive analysis of the data in Table 3 reveals that the mean of the variable sepal length is 5.843 and the median is 5.800, thus indicating that the data have an approximately symmetric distribution.
In Table 4, we report the MLEs of the model parameters and their standard errors (SEs) in parentheses. We give in Table 5 the following goodness-of-fit measures: Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Cramér-von Mises ($W^*$), Anderson-Darling ($A^*$), and Kolmogorov-Smirnov (KS) test statistics. The smaller the values of these measures, the better the fit. The figures in Table 5 indicate that the OLLGIG distribution has the lowest values of AIC, CAIC, BIC, HQIC, $W^*$, $A^*$, and KS among those of the fitted models, and therefore it could be chosen as the best model.
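Three of the information criteria have standard closed forms in the maximized log-likelihood, the number of parameters, and the sample size, sketched below. CAIC is left out because its definition varies between sources. The log-likelihood values are hypothetical and only illustrate the comparison between a four-parameter and a three-parameter fit on 150 observations.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC, and HQIC from a maximized log-likelihood."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "HQIC": deviance + 2.0 * n_params * math.log(math.log(n_obs)),
    }

# Hypothetical maximized log-likelihoods (not taken from the paper).
four_param = information_criteria(loglik=-180.0, n_params=4, n_obs=150)
three_param = information_criteria(loglik=-184.0, n_params=3, n_obs=150)
for name in ("AIC", "BIC", "HQIC"):
    print(name, four_param[name], three_param[name])
```

Each criterion penalizes the extra parameter differently, so the larger model is preferred only when its gain in log-likelihood outweighs the penalty.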
We consider LR statistics to compare nested models. The OLLGIG distribution includes some submodels as mentioned above, thus allowing their evaluation relative to each other and to a more general model. The values of the LR statistics are listed in Table 6. It is evident from the figures in this table that the OLLGIG distribution outperforms its submodels according to the values of the LR statistics. So, the OLLGIG model provides a better fit to these data than its sub-models.
More information is provided by a visual comparison of the histogram of the data and the fitted density functions and cumulative functions.The plots of the fitted OLLGIG, GIG, and IG densities are displayed in Figure 5(a).The estimated OLLGIG density provides the closest fit to the histogram of the data.In order to assess if the model is appropriate, the plots of the fitted OLLGIG, GIG, and IG cumulative distributions and the empirical cdf are displayed in Figure 5(b).They indicate that the OLLGIG distribution provides a good fit to these data.
In Figure 6, we note that the iris data have a bimodal shape, which the GIG and IG distributions cannot accommodate (see Figure 5(a)).

Application 2: Price of Urban Property Data.
Here, we provide a second application of the OLLGIG regression model to evaluate the price of urban residential properties for sale in the municipality of Paranaíba in the State of Mato Grosso do Sul (MS) in Brazil. These data, collected in 2017, refer to $n = 45$ houses for sale in the municipality. In the context of real estate appraisal, it is necessary to develop scientifically rigorous statistical methodologies for residential property prices. Moreover, such methodologies are rarely used by the real estate market. We construct an OLLGIG regression model with two systematic components to describe the relationship between real estate prices and other explanatory variables, thus allowing an understanding of the behavior of the price variable [16,17]. The following variables are considered ($i = 1, \ldots, 45$):
(i) price of the property $y_i$; this variable was divided by 10,000;
(ii) area $x_{i1}$ of land in square meters;
(iii) number of parking spaces $x_{i2}$ in the residence (0 = no space, 1 = one space, and 2 = more than one space); in this case, two dummy variables, $x_{i21}$ and $x_{i22}$, are created;
(iv) number of rooms with suites $x_{i3}$ in the residence (0 = no suites, 1 = one suite, and 2 = more than one suite); in this case, two dummy variables, $x_{i31}$ and $x_{i32}$, are created;
(v) whether the residence has a swimming pool $x_{i4}$ (0 = no, 1 = yes);
(vi) whether the residence is located in the center of the city $x_{i5}$ (0 = no, 1 = yes).
In the descriptive analysis of the data from Table 7, the mean of the price variable is 24.98, which is not close to the median 17.00, thus indicating that the data have an asymmetric distribution.
We define the OLLGIG regression model by two systematic structures for $\mu_i$ and $\sigma_i$, both with logarithm links. The influence measures are those presented in Section 5. The index plots of these influence measures are displayed in Figure 8. These plots indicate that the cases #7, #43, and #45 are possible influential observations. In addition, Figure 9(a) provides plots of the qrs for the fitted model, showing that all observations are in the interval $(-3, 3)$ and that the residuals behave randomly.
Hence, there is no evidence against the current assumptions of the fitted model. In order to detect possible departures from the assumed error distribution, as well as outliers, we present the normal plot for the qrs with a generated envelope in Figure 9(b). This plot reveals that the OLLGIG regression model is very suitable for these data, since there are no observations falling outside the envelope. Also, no observation appears as a possible outlier.

Concluding Remarks
We present a four-parameter distribution called the odd log-logistic generalized inverse Gaussian (OLLGIG) distribution, which includes as special cases the generalized inverse Gaussian (GIG) and inverse Gaussian (IG) distributions. We provide some of its mathematical properties. Further, we define the OLLGIG regression model with two systematic structures based on this new distribution, which is very suitable for modeling censored and uncensored data. The proposed model serves as an important extension to several existing regression models and could be a valuable addition to the literature. Some simulations are performed for different parameter settings and sample sizes. The maximum likelihood method is described for estimating the model parameters. Diagnostic analysis is presented to assess global influences. We also discuss the sensitivity of the maximum likelihood estimates from the fitted model via quantile residuals. The utility of the proposed OLLGIG regression model is demonstrated by means of a real data set of prices of urban residential properties in the municipality of Paranaíba in the State of Mato Grosso do Sul, Brazil.

Figure 1: Plots of the OLLGIG density for some parameter values.

Figure 5: (a) Estimated densities of the OLLGIG, GIG, and IG models for iris data. (b) Estimated cumulative functions of the OLLGIG, GIG, and IG models and the empirical cdf for iris data.

Figure 7: Estimated cdf from the fitted OLLGIG regression model and the empirical cdf for the price of urban property data. (a) For covariate $x_4$, and (b) for covariate $x_5$.

Figure 9: (a) Index plot of the qrs and (b) normal probability plot with envelope for the qrs from the OLLGIG regression model fitted to urban property data.

Table 1: AEs, biases, and MSEs for the parameters of the OLLGIG distribution.

Table 2: AEs, biases, and MSEs for the OLLGIG regression model under scenarios 1 and 2.

Table 3: Descriptive statistics for iris flower data.

Table 4: MLEs and SEs (in parentheses) of the model parameters for the iris data.

Table 5: Goodness-of-fit measures for the iris data.

Table 6: LR tests for the iris data.

Table 10: LR tests for the price of urban property data.