A Bayesian Approach to Mixed Gamma Regression Models

Gamma regression models are a suitable choice to model continuous variables that take positive real values. This paper presents a gamma regression model with mixed effects from a Bayesian approach. We use the parametrisation of the gamma distribution in terms of the mean and the shape parameter, both of which are modelled through regression structures that may involve fixed and random effects. A computational implementation via Gibbs sampling is provided and illustrative examples (simulated and real data) are presented.


Introduction
Mixed models are used in a wide variety of disciplines: physical, biological and social sciences, among others.These models have been particularly useful to represent clustered and, therefore, dependent data, arising for example when data are collected hierarchically, when observations are taken about related individuals, or when data are gathered over time regarding the same individuals (Ronquist & Huelsenbeck 2003, Vonesh 2006, Brown & Prescott 2014, Demidenko 2013).The study of hierarchical data has been largely framed for variables with normal distribution or at least symmetric distributions.However, models may not be appropriate when the response variable takes only positive values, such as, in the process of rate setting in the framework of heterogeneous insurance portfolios (vehicles, personal injury, etc.) (De Jong, Heller et al. 2008), where one possibility is to use a gamma regression model.
The substantial advantage of considering a gamma model is due to their flexibility compared to other models, such as exponential and Poisson, among others.Thus, gamma regression models allow for monotone models with no constant hazard in survival.McCullagh & Nelder (1989) presented a gamma regression model where the coefficient of variation is assumed constant for all observations.Cepeda (2001) and Cepeda & Gamerman (2005) proposed an extension of the gamma regression models, assuming regression structures for both mean and dispersion parameters, and presented a Bayesian method to fit regression models in the two-parameter exponential family of distributions.Specifically for gamma observations, they modeled the mean and shape or variance parameters.
Following Cepeda (2001), our proposed model uses a parameterization of the gamma distribution in terms of the mean and the shape parameters.In this paper, we model the mean and the shape of a gamma distribution taking into account that, in this model, the variance is a function of the mean and shape parameters and that the mean and shape parameters are orthogonal in the Box-Cox sense (Cox & Reid 1987, Cepeda 2001).This property is not satisfied by the mean and variance.
The mean of the response variable is linked to a mixed-effects regression structure by the log and identity functions.An extended version of this model is also considered, assuming the shape parameter is not constant over the observations, but rather is related to a mixed-effects regression through the log link, similar to the mean.This paper proposes a Bayesian method to fit gamma regression models, that include fixed and random effects in the mean and in the shape parameters.This paper is organized as follows: after an introduction, Section 2 present the Bayesian mixed gamma regression.In Section 3 a Bayesian method to fit the proposed models is presented.Section 4 show results of simulation studies.Section 5 include one aplication to real data set.Finally, in section 6 some conclutions are formulated.The outline of the OpenBUGS software (Thomas 2006) used, is presented in an appendice.

Bayesian mixed gamma regression models
Due of its flexibility and other characteristics, the gamma distribution is frequently used to model continuous data taking only positive values.The probability density function of a variable y that follows a gamma distribution, parameterized in terms of its mean µ (µ > 0) and shape parameter α (α > 0), is given by: where α > 0 and Γ(.) is the gamma function (Cepeda-Cuervo, 2001).In this case, E(y) = µ and Var(y) = µ 2 α .If y has density function (1), we write y ∼ G(µ, α).Now, let y 1 , . . .y n be independent random variables such that y i ∼ G(µ i , α).The gamma regression model is defined assuming that where β = (β 1 , . . ., β p ) ′ is a vector of unknown regression parameters (p < n), x i = (x i1 , . . ., x ip ) ′ is a vector of covariates of the i-th observation and η i is a linear predictor.Usually x i1 = 1 for all i, so that the model has a mean intercept.
From generalized linear models, the canonical link function g(.) : (0, ∞) → R, is the inverse function g(µ) = 1/µ.However, the usual link functions in the gamma regression models are g(µ) = log(µ) and g(µ) = µ, due to the underlying constraints in the canonical link .
In the gamma regression model presented by Cepeda (2001), the shape parameter is not constant for every observation and it is modeled like the mean.Here, we assume n independent random variables, y i ∼ G(µ i , α i ), i = 1, . . ., n, with mean and shape parameters given by where β = (β 1 , . . ., β p ) ′ and δ = (δ 1 , . . ., δ k ) ′ , p + k < n, are the sets of regression parameters, x i and w i are observed values of the covariates, and η i and τ i are the linear predictors at the i-th observation.
Since the shape parameter is strictly positive, the log link function is a natural choice for h.It is then assumed that log(α i ) = w ′ i δ where w i is a vector of covariates and δ denotes a vector of unknown regression coefficients.Again, it is convenient to take the first component of the w i vector as 1 to allow for an intercept in the shape regression structure.There is no restriction on whether or not the w i ′ s contain the same predictor variables as the x i ′ s.
The gamma regression model described above does not involve random effects.To extend work in Bayesian generalized linear models and specifically in Bayesian gamma regression (Cepeda 2001, Cepeda & Gamerman 2004, Cepeda-Cuervo, Migon, Garrido & Achcar 2014), in this paper we propose two gamma mixed regression models, the first assuming that the shape parameter is constant for all observations and the second involving a mixed-effects model for the shape parameter.
Let y 1 , . . ., y m be independent continuous random vectors, where y i = (y i1 , . . ., y ini ) ′ represents a vector of responses observed (with n i measurements) for a sample unit i (in chronological order) and for which each of its components y ij , j = 1, . . .n i takes positive real number values.
Also consider a regression model with the following structure: with i = 1, . . ., m, where G(.) is a vector-function, linking the response vector of the conditional mean with the mixed linear model where X i is the n i × p design matrix, β = (β 1 , . . ., β p ) ′ are regression coefficients or fixed effects, and Z i is the n i × q design matrix associated with the vector of random effects b i = (b i1 , . . ., b iq ) ′ .
For the identity link function, the j-th component of ( 2) is where For the log link function, the j-th component of ( 2) is: x ij = (x ij1 , . . ., x ijp ) ′ and z ij = (z ij1 , . . ., z ijq ) ′ , which is equivalent to In this paper, we first assume that conditional on b i , β and α, that y ij , i = 1, . . ., m; j = 1, . . ., n i , are independent and have probability density function given by (1) with µ replaced by µ ij (specified by (3) or (4) ), depending on a set of explanatory variables through the selected link.In this formulation, α represents a constant shape parameter.
In mixed models, the random effects, b 1 , . . ., b m , are typically assumed to be independent and normally distributed: where Σ b is a q × q positive-definite matrix.The normality assumption may be questionable in some practical situations when there are outliers, when the data Revista Colombiana de Estadstica 42 (2019) 81-99 exhibit fat tails or the behavior of the data turns out to be asymmetric.An alternative is to consider other types of distributions such as the multivariate tdistribution with ν b > 0 degrees of freedom, location vector µ b ∈ R q and positivedefinite dispersion matrix Σ b to model the random effects b For a more general formulation of this model, we consider a different shape parameter, α ij , for each response y ij .Let us now assume a mixed model for the logarithm of α ij .This is where w ij = (w ij1 , . . ., w ijp * ) ′ is the design vector corresponding to the vector p * × 1, δ, of fixed effects, and h ij = (h ij1 , . . ., h ijq * ) ′ is the design vector corresponding to the vector, d i , of random effects.It may be noted that the matrices W i = (w i1 , . . ., w ini ) and H i = (h i1 , . . ., h ini ) can contain the same covariate matrices X i = (x i1 , . . ., x ini ) and Z i = (z i1 , . . ., z ini ), but this is not required.
Here it can be assumed that In order to apply Bayesian methods to fit the gamma Bayesian mixed model, we assume multivariate normal prior distributions for the fixed effects, that is Vague priors are specified by taking large values for the prior variances.An alternative strategy is to consider a multivariate t-distribution, i.e., and to specify an appropriate value for ν β , the degrees of freedom parameter.If the vector of random effects is assumed to follow a multivariate t-distribution, i.e., b i |ν b , µ b , Σ b ∼ t q (ν b , 0, Σ b ), then the prior distribution for the degrees of freedom can be discrete as in Albert & Chib (1993) and Besag, Green, Higdon & Mengersen (1995), or continuous as in Geweke (1992).We have chosen the latter alternative.More specifically, we consider an exponential prior distribution with mean 1/a, for the degrees of freedom, which we denote ε(a).The prior distribution for the scale matrix of random effects Σ b is chosen, mainly for computational simplicity, to be an inverted Wishart distribution as in Fong, Rue & Wakefield (2010) In summary, in this paper we study two gamma mixed regression models, where: 1. Model 1: The mean follows a mixed regression structure given by equation (2) modeled as in (3) or (4) (Model 1A and Model 1B, respectively) and both have a constant shape parameter α.

Model 2:
The mean follows a mixed regression structures given by (2) and the shape follows a regression structures given by (5).

Model 1
Assuming that the parameters Σ b , ν b , α and β are independents, the joint posterior distribution is given by: Thus, samples of f (β, Σ b , ν b , α, η|y) are obtained by an iterative process from the full conditional distributions: The algorithm can be implemented using OpenBUGS and this software can also be used to obtain posterior parameter inferences.

Model 2
, where τ i = (τ i1 , . . ., τ ini ) ′ , and assuming (conditional independence of y ij ) δ, Σ d and ν d , the τ ′ i s are independent and have density function Revista Colombiana de Estadstica 42 (2019) 81-99 Thus, a Gibbs sampling algorithm also can be used to generate samples from: through iteratively sampling from the following full conditional distributions: for i, k = 1, . . ., m, and i ̸ = k.The algorithm for this model can also be implemented in OpenBUGS, and posterior inferences for the parameters can be made with this software.

Simulation studies
We conducted a simulation study to examine how similar the estimates of the parameters of the models are, compared with the true values of the parameters.

Model type 1
In this section we consider two classes of models: Model 1A.In this section, we assume the model: α), i = 1, . . ., n, and j = 1, . . ., 5, and b i |ν b , Σ b ∼ t 2 (ν b , 0, Σ b ), assuming as link function the inverse of the canonical link.
For n = 50, 100 and 150, values of the explanatory variables X 2 , X 3 and Z 2 were generated from uniform distributions U[0,30], U[0,15] and U[10,20] respectively, and we set ν b = 5, α = 13, β = (1.5, 2.0, 3.0) ′ and The following prior specifications were adopted: The true values of the parameters and their estimations and standard deviations are presented in Table 1.In this model we consider 100,000 iterations, discarding the first 10,000 iterations as burn-in.The parameter estimates improve and standard deviations decline as sample size increases.To monitor the chains' global convergence, we calculated the potential scale reduction factor(psrf) proposed by Brooks & Gelman (1998) (the psrf is an estimated factor by which the scale of the current distribution for the target distribution might be reduced if the simulations were continued for an infinite number of iterations).When the psrf is high (perhaps greater than 1.1 or 1.2), then we should run our chains longer to improve convergence to the stationary distribution.We obtained psrf = 0.995, 0.996 and 0.99 for n = 50, 100 and 150, respectively.The tests indicated that the Markov chain converged to its stationary distribution.Also, jumps were performed every 5 iterations to remove effects of autocorrelations in each chain.
The study of the convergence of the individual chains was performed using the CODA package (Plummer, Best, Cowles & Vines 2006), in the R software (R Core Team 2017).This package provides different diagnostic methods to check convergence: Gelman and Rubin's diagnostic (Geweke 1992), Geweke's diagnostic (Geweke 1992), Heidelberg and Welch's diagnostic (Heidelberger & Welch 1981) and Lewisťs diagnostic (Raffery & Lewis 1992).
Table 2 shows, for n = 50, the posterior correlations between the posterior parameters chains.Small correlations can be seen between them, with values between −0.11 and 0.15.Note that the correlations between β i and α are near zero.The correlations considering the other two sample sizes were similar to those shown in Table 2. Model 1B.We assume the model: . ., n, and j = 1, . . ., 5, For n = 50, 100 and 150, values of the explanatory variables X 2 , X 3 and Z 2 were generated, again, from uniform distributions U[0,30], U[0,15] and U [10,20], respectively, and we set ν b = 5, α = 13, β = (1, 0.2, −0.03) ′ , and Prior distributions for the parameters ν b , Σ b , β and α are the same as those proposed for model 1A.
The parameter estimations and standard deviations are given in Table 3.We consider 100,000 iterations, discarding the first 10,000 iterations as burn-in, like in the above model.Again we analyze the convergence of the chains with the above mentioned criteria.These diagnostic tests suggest good multivariate (psrf = 1.02) and individual behavior of the chains.Table 4 shows for n = 50, posterior correlations between the estimated parameters.As above, small correlations can be seen, with values between -0.13 and 0.16.Note, again, that the correlation between samples of β 2 , β 3 and α is equal to zero.The correlations considering the other two sample sizes were similar to those shown in Table 4.
Prior distributions for the parameters ν b , Σ b and β are the same as those proposed for model 1A.Prior distribution for the parameter δ is the same as the parameter β: t 3 (ν β , µ β , Σ β ).Also, for model specifications that include random effects for the shape parameter, we assume that the shape random effects d i have the same distribution as the location random effects b i : t 2 (ν b , 0, Σ b ).
In this model, we again considered 100,000 Monte Carlo iterations and the estimates were obtained using the samples obtained in the last 90,000 iterations.Testing the chains indicated very good convergence behavior.Table 6 shows, for n = 50, posterior correlations between posterior samples of the parameters.There are small correlations between them, with values between -0.15 and 0.17.Here, the correlations between posterior samples of β 2 , β 3 , and δ 1 , δ 2 , δ 3 are equal to zero.The correlations considering the other two sample sizes were similar to those shown in Table 6.The parameter estimates are given in Table 5.
We considered different values for ν b = 3, 7, 10 for the models 1A, 1B and 2. The results obtained were good enough.
According to the results obtained in the above simulated models, the estimates obtained are reliable.

Application
Life insurance is a form of personal insurance that covers the risk of death of the insured, the occurrence of a serious illness or an unforeseen event that causes total and permanent disability of the insured.
The insurance data correspond to life insurance premiums (The values are premiums corresponding to contracts perfected or extended in the year, whose receipts have been issued in the corresponding period), in millions of Colombian pesos, of 19 insurance companies which operate the branches of life and person according to the Superintendence of Colombia from 2004 to 2015 (Source: Federation of Insurers of Colombia: FASECOLDA).
The data set is illustrated in Figure 1.This figure shows incresing behavior of the mean over time and that, conditional on the time, the data set presents a skewness like in a gamma distribution.Thus, giving the positivity of the data, it is assumed that the average life insurance premiums data can be modeled using a gamma distribution, with an appropriate link function.

Models with Constant Shape Parameter
Here, a mixed gamma regression model with a constant shape parameter is considered (Model 1), assuming location regression structure given by: Model 1C and Model 1D) t = 1, 2, . . ., 12, i = 1, 2, . . ., 19.For the shape parameter α, an inverse gamma distribution α ∼ IG(ϵ, ϵ), with ϵ = 0.001 is assumed.• Model 1A: In this model, the following prior distributions were considered: • Model 1B: In this model, the following prior specification was assumed: • Model 1D: In this model, the following prior distributions were assumed:  These models were fitted using the OpenBugs program given in the Appendix.The correspondign DIC values for the fitted models are given by: for Model 1A, DIC=2305; for Model 1B, DIC=2376; for Model 1C, DIC=2286 and, for Model 1D, DIC=2293.Table 7 shows the posterior parameter estimates of Model 1C, which provide the least DIC value.

Models with Non-Constant Shape Parameter
Here a mixed gamma regression model is considered, assuming that α is not constant through the observations and that its behavior can be explained by a time quadratic or a lineal relation.In this section, we consider four different prior specifications for the parameters of models: 2A, 2B, 2C and 2D: Cepeda

• Model 2A
In this model, we assume that mean and shape have regression structures are given by with the following prior specifications:

• Model 2B
In this model we assume the mean and shape regression structures given by ( 6) and ( 7), and the following prior distributions: with a 0 = 0.1, c = 5, ν β = 10, and ), with a 0 = 0.1, c = 5, ν δ = 10, and In this model, we assume the following regression structures: and the following prior dsitributions:

• Model 2D
In this model we assume the mean and shape regression structures given by ( 8) and ( 9), and the following prior distributions: These models were fitted using the OpenBugs program given in the Appendix.The correspondign DIC values for the fitted models are given by: for Model 2A, DIC = 3744; for Model 2B, DIC = 3520; for Model 2C, DIC = 3815 and, for Model 2D, DIC = 3855.Table 8 gives the posterior estimates of the parameters associated with model 2B, which provide the least DIC value.
We considered 20000 Monte Carlo iterations (to secure convergence) and our results were obtained with the posterior samples obtained from last 17000 iterations.Additionally, we performed the diagnostic tests reported for the simulated data, all of which suggested suitable behavior of the chains.For Model 2B, similar diagnostic evidences werw obtained (psrf = 1.07).Figure 1 shows the posterior mean estimation μ through the time and the estimated trajectory of each of the premiums.The results suggest that the premiums average tends to increase over time, in non-linear form, as obtained in the model through a quadratic polynomial time.The Figure 1 shows that the variances are not homogeneous and tend to increase over time and that this variance can be modeled by a quadratic or cubic time polynomial.Between the constant shape model (Model 1C) and the variable shape model (Model 2B), the DIC values suggest that the first model is the best.

Conclusions and Extensions
The implementation of the mixed gamma regression model for random effects that are normally distributed and non-normally distributed is very challenging.In this sense, this approach is more flexible that others, because one can easily implement it when the distribution of the random effects follows the Student-t, skew normal or another distribution, by using simple and accessible software such as OpenBUGS.Another advantage of this approach is the easy implementation for the imputation of missing values, a common situation in longitudinal data for which a classic approach is much more complicated.
In the application we choose the identity link for the mean parameter.We did not have problem with this selection.However, other options, like the log link, are possible.
We selected the exponential distribution for the prior of the degrees of fredom of the multivariate t distribution, but other distributions may be used.A posterior study using different distributions can be implemented, such as that proposed by Fonseca, Ferreira & Migon (2008), and a comparison can be made.In general, a study of sensitivity to the choice of priors can be done, selecting different types of distributions, such as hierarchical distributions, among others.
It also possible to perform the study by modeling the mean and variance, instead of modeling the mean and shape, and comparing the obtained results.

Figure 2 :
Figure 2: Behavior of the sample path.Model 1C.

Table 1 :
True and estimated parameter values, and standard deviations for model 1A.

Table 3 :
True parameter values and standard deviations for model 1B.

Table 5 :
True and estimated parameter values, and standard deviations for model 2.

Table 7 :
Estimated posterior medians and means, 95% credibility intervals (CI) for the parameters in Model 1C.

Table 8 :
Estimated posterior medians and means, 95% credibility intervals (CI) for the parameters in model 2B.