TWEEDIE COMPOUND POISSON MODEL WITH FIRST ORDER AUTOREGRESSIVE TIME RANDOM EFFECT
ADAM, KURNIA, PURNABA, MANGKU, SOLEH

Modeling with the Tweedie compound Poisson distribution is mostly based on the generalized linear model (GLM). A GLM can be extended to a generalized linear mixed model (GLMM) when both fixed effects and random effects are present. GLMMs with a Tweedie compound Poisson response variable are still rarely used because the distribution is not analytically tractable and its density function cannot be stated in closed form. With the h-likelihood method, a GLMM with a Tweedie compound Poisson response can be fitted numerically. This research models a Tweedie compound Poisson response variable using a GLMM with two random effects: region, and time, which is assumed to follow a first-order autoregressive process. A simulation study is carried out and evaluated using the average relative bias and the average MSE. The simulation results show that a larger autoregressive coefficient gives a smaller relative bias. MSE values close to zero indicate that the model describes the data very well. An application, which models the total number of claims in a given area and time based on the 2014 profile of risk and loss of motor vehicle insurance in Indonesia, shows that the model has small absolute bias and MSE.


INTRODUCTION
The exponential dispersion model (EDM) is an exponential family distribution with an additional dispersion parameter. The EDM plays an important role in modern data analysis because it can handle response variables that are not normally distributed. The density function of a random variable $Y$ whose distribution belongs to the EDM family is
$$f(y; \theta, \phi) = a(y; \phi) \exp\left( \frac{1}{\phi} \left( y\theta - \kappa(\theta) \right) \right), \qquad (1)$$
where $a$ and $\kappa$ are known functions, $\phi > 0$ is the dispersion parameter, and $\theta$ is the natural parameter [2]. A characteristic of the EDM is its mean-variance relation when the dispersion parameter is considered constant. In other words, if $Y$ has an EDM distribution with mean $\mu$ and variance function $V(\mu)$, then $\mathrm{Var}(Y) = \phi V(\mu)$. The Tweedie compound Poisson distribution is the EDM with $V(\mu) = \mu^p$ and $1 < p < 2$.
There are many applications of the Tweedie compound Poisson distribution. In the actuarial field, it is used to model the total number of insurance claims [1], [5]. In climatology, it is used to model rainfall [6], [8]. In fisheries, it models the amount of fish caught [12].
All of the examples above are based on a generalized linear model (GLM). The GLM extends ordinary regression to response variables that do not necessarily come from a normal distribution. A GLM can be extended to a generalized linear mixed model (GLMM) when both fixed effects and random effects are present. GLMMs whose response variable has a Tweedie compound Poisson distribution are still rarely used, because the distribution itself is not analytically tractable and its density function cannot be stated in closed form [9], [10]. As a result, modeling that involves the Tweedie compound Poisson distribution must rely on numerical approximation.
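Even though the density has no closed form, drawing from the distribution is straightforward, because for $1 < p < 2$ a Tweedie variable is a Poisson number of gamma-distributed summands. The sketch below is only an illustration in Python, not the authors' code: the function names and the values $\mu = 2$, $\phi = 1$, $p = 1.5$ are assumptions, and the parameter mapping is the standard one for the compound Poisson-gamma form. It checks by simulation that the mean is close to $\mu$ and the variance close to $\phi\mu^p$.

```python
import math
import random

def tweedie_cp_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson-gamma parameters."""
    lam = mu ** (2 - p) / (phi * (2 - p))    # Poisson rate (expected number of summands)
    shape = (2 - p) / (p - 1)                # gamma shape of each summand
    scale = phi * (p - 1) * mu ** (p - 1)    # gamma scale of each summand
    return lam, shape, scale

def rpois(lam, rng):
    """Draw Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    n, t = 0, rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    return n

def rtweedie(mu, phi, p, rng):
    """Draw one Tweedie compound Poisson variate as a Poisson sum of gammas."""
    lam, shape, scale = tweedie_cp_params(mu, phi, p)
    n = rpois(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

# Illustrative (assumed) parameter values.
mu, phi, p = 2.0, 1.0, 1.5
rng = random.Random(1)
draws = [rtweedie(mu, phi, p, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print("mean:", sample_mean, "target:", mu)
print("variance:", sample_var, "target:", phi * mu ** p)
print("share of exact zeros:", sum(d == 0 for d in draws) / len(draws))
```

The exact zeros in the output correspond to draws with no summands, which is the feature that makes the distribution attractive for claim totals.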
In general, the numerical methods used are mostly based on penalized quasi-likelihood (PQL) [7]. However, the PQL method is only able to estimate the regression parameters; it does not estimate the variance parameters needed in a GLMM.
Several methods have been proposed to overcome this problem, such as the Laplace approximation and adaptive Gauss-Hermite quadrature, which yield variance parameter estimators in addition to regression parameter estimates [14].
An alternative method is hierarchical likelihood (h-likelihood) [13]. H-likelihood combines the fixed and random effects of a GLMM into an extended likelihood function. In this method the random effect does not have to be normally distributed, unlike in most GLMMs. For example, the random effect may have a gamma distribution while the response variable has a Poisson distribution [3].
The h-likelihood method avoids the integration otherwise needed to obtain the marginal likelihood. It also provides estimates of both the regression parameters and the variance parameters.
This raises the research question of how to model a response variable that has a Tweedie compound Poisson distribution with two random effects. If the added random effect is time, assumed to follow a first-order autoregressive process [11], the next research question is whether the h-likelihood method can be used to estimate the regression and variance parameters of such a model.
In a GLMM, the mean $\mu$ of the response is related to the fixed and random effects through the link function
$$g(\mu) = X\beta + Zb, \qquad (2)$$
where $\beta$ is the fixed effect vector, $b$ is the random effect vector, $X$ and $Z$ are the associated design matrices, and $b$ is assumed $\sim N(0, \Sigma)$. The variance component $\Sigma$ is further expressed in terms of the relative covariance factor $\Lambda$ such that $\Sigma = \Lambda\Lambda'$ [14]. As a result, the specification of equation (2) can be expressed as
$$g(\mu) = X\beta + Z\Lambda b^*, \qquad (3)$$
where $b^* \sim N(0, I)$.
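The model with a region random effect and a first-order autoregressive time random effect is
$$g(\mu_{it}) = \beta_0 + x_{it}'\beta_1 + v_i + u_t, \qquad u_t = \rho u_{t-1} + e_t, \qquad (4)$$
where $\beta_0$ and $\beta_1$ are the fixed effects, $v_i \sim N(0, \sigma_v^2)$ is the random effect of the $i$-th region, and $u_t$ is the random effect of time, assumed to follow the first-order autoregressive process in which $e_t$ is an error term assumed $\sim N(0, \sigma_e^2)$ and $\rho$ is the autoregressive coefficient. The random effects $v_i$ and $u_t$ are assumed to be independent.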
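Suppose there is one covariate. Then for each region $i$ at time $t$, equation (4) becomes
$$g(\mu_{it}) = \beta_0 + \beta_1 x_{it} + v_i + u_t,$$
where $v_i$ is the region random effect and $u_t$ is the time random effect. If $\log \mu_{it} = \eta_{it}$, then the above equation becomes
$$\eta_{it} = \beta_0 + \beta_1 x_{it} + v_i + u_t. \qquad (6)$$
So for region $i$, equation (6) becomes
$$\eta_i = X_i\beta + 1_T v_i + u, \qquad (7)$$
where $\eta_i = (\eta_{i1}, \ldots, \eta_{iT})'$, $X_i$ has rows $(1, x_{it})$, $\beta = (\beta_0, \beta_1)'$, $1_T$ is a vector of ones, and $u = (u_1, \ldots, u_T)'$. Suppose that the number of observations is balanced. Writing $Z_1$ and $Z_2$ for the design matrices of the region and time random effects, the model for all regions is
$$\eta = X\beta + Z_1 v + Z_2 u.$$
Equation (4) assumes $v_i \sim N(0, \sigma_v^2)$, so the expectation and covariance matrix of the vector $v$ are
$$E(v) = 0, \qquad \mathrm{Cov}(v) = \sigma_v^2 I. \qquad (10)$$
In equation (4), the errors $e_t \sim N(0, \sigma_e^2)$ are assumed independent and $u_t$ follows an AR(1) process with autoregressive coefficient $\rho$, so the expectation and covariance matrix of the vector $u$ are
$$E(u) = 0, \qquad \mathrm{Cov}(u) = \frac{\sigma_e^2}{1-\rho^2} R, \qquad (11)$$
where $R$ is a symmetric matrix of size $T \times T$ whose element $(t, t')$ is $\rho^{|t-t'|}$, $t, t' = 1, \ldots, T$. Equations (10) and (11) are rearranged in the form of relative covariance factors $\Lambda_1$ and $\Lambda_2$, so that
$$\eta = X\beta + Z_1\Lambda_1 v^* + Z_2\Lambda_2 u^*, \qquad (14)$$
where $v = \Lambda_1 v^*$ with $\Lambda_1 = I$ and $v^* \sim N(0, \sigma_v^2 I)$, and $u = \Lambda_2 u^*$ with $u^* \sim N\left(0, \frac{\sigma_e^2}{1-\rho^2} I\right)$, in which $\Lambda_2$ is the Cholesky factor of the first-order autoregressive correlation matrix, $R = \Lambda_2\Lambda_2'$.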
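The first-order autoregressive correlation matrix and its Cholesky factor described above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the function names are hypothetical, $T = 4$ is an arbitrary illustrative size, and $\rho = 0.362$ is borrowed from the application study purely as an example value.

```python
import math

def ar1_corr(rho, T):
    """T x T matrix whose (t, t') element is rho ** |t - t'|."""
    return [[rho ** abs(t - s) for s in range(T)] for t in range(T)]

def cholesky(A):
    """Lower-triangular L with A = L L' for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

rho, T = 0.362, 4               # illustrative values (assumed)
R = ar1_corr(rho, T)
Lam2 = cholesky(R)              # plays the role of the relative covariance factor

# Rebuild R from the factor to confirm R = Lam2 Lam2'.
R_rebuilt = [[sum(Lam2[i][k] * Lam2[j][k] for k in range(T)) for j in range(T)]
             for i in range(T)]
max_err = max(abs(R[i][j] - R_rebuilt[i][j]) for i in range(T) for j in range(T))

for row in Lam2:
    print(["%.4f" % x for x in row])
print("max reconstruction error:", max_err)
```

For this matrix the factor has a simple pattern: the first column holds powers of $\rho$ and every later nonzero entry is $\sqrt{1-\rho^2}$ times a power of $\rho$.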

Parameter Estimation
In equation (14) above, it can be seen that the first parameters to be estimated are the regression parameters $\beta$ and the region and time random effects $v^*$ and $u^*$. These parameters are estimated by the h-likelihood method. According to [3], the h-likelihood is defined as
$$h = \ell_1 + \ell_2 + \ell_3, \qquad (15)$$
where $\ell_1 = \log f(y \mid v^*, u^*)$ is the log-density function of $y$ given $v^*$ and $u^*$, $\ell_2$ is the log-density function of $v^*$ with $v^* \sim N(0, \sigma_v^2 I)$, and $\ell_3$ is the log-density function of $u^*$ with $u^* \sim N\left(0, \frac{\sigma_e^2}{1-\rho^2} I\right)$. Since $y$ has a Tweedie compound Poisson distribution, $y$ belongs to the EDM family with $V(\mu) = \mu^p$ and $1 < p < 2$. As a result, the density function of $y$ is as defined in equation (1).
The parameter estimates are obtained by maximizing the h-likelihood function above, that is, by solving the equations $\partial h / \partial \beta = 0$, $\partial h / \partial v^* = 0$, and $\partial h / \partial u^* = 0$. Since the Tweedie compound Poisson density has no closed form, a numerical approximation is carried out.
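One reason the score equations remain workable is that the intractable factor $a(y; \phi)$ of the EDM density does not involve the regression parameters or the random effects. As a sketch, for fixed $p$ and $\phi$ and a log link, the contribution of a single observation to the score with respect to the linear predictor $\eta$ can be written out as follows. The identities $\theta = \mu^{1-p}/(1-p)$ and $\kappa(\theta) = \mu^{2-p}/(2-p)$ are the standard Tweedie ones and are an assumption here, since the text does not state them explicitly.

```latex
\begin{align*}
\ell_1 &= \log a(y;\phi)
        + \frac{1}{\phi}\left( y\,\frac{\mu^{1-p}}{1-p} - \frac{\mu^{2-p}}{2-p} \right), \\
\frac{\partial \ell_1}{\partial \mu}
       &= \frac{1}{\phi}\left( y\,\mu^{-p} - \mu^{1-p} \right)
        = \frac{y-\mu}{\phi\,\mu^{p}}, \\
\frac{\partial \ell_1}{\partial \eta}
       &= \frac{\partial \ell_1}{\partial \mu}\,\frac{\partial \mu}{\partial \eta}
        = \frac{(y-\mu)\,\mu^{1-p}}{\phi},
        \qquad \eta = \log \mu .
\end{align*}
```

The term $\log a(y;\phi)$ drops out of every derivative taken with respect to the regression parameters or the random effects, so it matters only when $\phi$ or $p$ themselves are estimated.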
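Parameter estimation can be solved by the Newton-Raphson method as follows:
$$\delta^{(k+1)} = \delta^{(k)} - H^{-1}\,\frac{\partial h}{\partial \delta}, \qquad H = \frac{\partial^2 h}{\partial \delta\,\partial \delta'}, \qquad (17)$$
where $\delta$ collects the regression parameters and the two random effect vectors, and the derivatives are evaluated at $\delta^{(k)}$. After the estimates of the regression parameters and random effects have been obtained, the variance parameters are estimated. To estimate the variance parameters, [13] defined the adjusted hierarchical likelihood $h_A$ as
$$h_A = h - \frac{1}{2} \log \left| -\frac{1}{2\pi} H \right|,$$
where $h$ is the h-likelihood function from equation (15) and $H$ is the Hessian matrix from equation (17).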
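The variance parameters $\sigma_v^2$ and $\sigma_e^2$ are obtained by iteratively solving
$$\tau^{(k+1)} = \tau^{(k)} - H_A^{-1}\,\frac{\partial h_A}{\partial \tau}, \qquad \tau = (\sigma_v^2, \sigma_e^2)',$$
until convergence, where $H_A$ is the Hessian matrix containing the second derivatives of the adjusted hierarchical likelihood $h_A$ with respect to $\tau$.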
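The Newton-Raphson iteration for the mean parameters can be illustrated on a deliberately reduced problem. The sketch below is not the authors' algorithm: it assumes a single region with a known intercept, a log link, known $p$ and $\phi$, and a single normal random effect $v$ with known variance, and it maximizes only the part of the h-likelihood that depends on $v$. All names and data values are hypothetical.

```python
import math

def score_and_hessian(v, y, beta0, p, phi, sigma2):
    """First and second derivative in v of
    sum_k [y_k mu^(1-p)/(1-p) - mu^(2-p)/(2-p)] / phi - v^2 / (2 sigma2),
    with mu = exp(beta0 + v)."""
    mu = math.exp(beta0 + v)
    a = mu ** (1 - p)
    b = mu ** (2 - p)
    grad = sum(yk * a - b for yk in y) / phi - v / sigma2
    hess = sum((1 - p) * yk * a - (2 - p) * b for yk in y) / phi - 1.0 / sigma2
    return grad, hess

def newton_raphson(y, beta0, p, phi, sigma2, tol=1e-10, max_iter=50):
    """Iterate v <- v - grad / hess from v = 0 until the step is negligible."""
    v = 0.0
    for _ in range(max_iter):
        grad, hess = score_and_hessian(v, y, beta0, p, phi, sigma2)
        step = grad / hess
        v -= step
        if abs(step) < tol:
            break
    return v

# Hypothetical claim totals for one region (zeros are allowed).
y = [0.0, 3.2, 1.5, 0.0, 4.1]
v_hat = newton_raphson(y, beta0=0.0, p=1.5, phi=1.0, sigma2=0.5)
unpenalized = math.log(sum(y) / len(y))   # maximizer when the normal term is dropped

print("estimated random effect:", v_hat)
print("estimate without the random effect penalty:", unpenalized)
```

The estimate lies between zero and the unpenalized value, which shows the shrinkage that the normal log-density term of the h-likelihood imposes on a random effect.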

Simulation Design
In this section a simulation study is conducted to evaluate the goodness of the developed model. The parameter values in this simulation follow [11] and [14]. The stages of the simulation are as follows:
2. Estimate the parameters of the fixed effect $\beta$, the region random effect $v^*$, and the time random effect $u^*$ using the h-likelihood method until convergence.
3. Estimate the variance components $\sigma_v^2$ and $\sigma_e^2$ using the adjusted profile likelihood $h_A$ method until convergence.
6. Repeat steps 1 to 5 above 100 times.
7. Evaluate the model as in [14] by the average relative bias and the average MSE.
In this simulation study, the parameters of both models were estimated using the h-likelihood method. The computational program was written in R.
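The evaluation step can be sketched as follows. The exact formulas used in the study are not reproduced in the text, so the definitions below are assumptions: relative bias is taken as the mean estimate minus the true value, divided by the true value, and MSE as the mean squared deviation from the true value. The sketch is in Python rather than the R used by the authors, and the estimates shown are made-up numbers.

```python
def relative_bias(estimates, true_value):
    """(mean of the estimates - true value) / true value."""
    mean_est = sum(estimates) / len(estimates)
    return (mean_est - true_value) / true_value

def mse(estimates, true_value):
    """Mean squared deviation of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Hypothetical estimates of one parameter over four replications.
estimates = [0.9, 1.1, 1.0, 1.2]
true_value = 1.0

rb = relative_bias(estimates, true_value)
m = mse(estimates, true_value)
print("relative bias:", rb)
print("MSE:", m)
```

In a full study these two quantities would be computed per parameter over all 100 replications and then averaged across parameters.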

Simulation Result
As in [11], the study was conducted with known autoregressive coefficient values. In addition, the dispersion parameter and the Tweedie index parameter were also assumed to be known, following [14]. Simulation results with 100 replications can be seen in Table 1. For Model 1 in Table 1, for the same autoregressive coefficient, the greater the region variance, the smaller the relative bias. Likewise, the greater the autoregressive coefficient, the smaller the relative bias. In other words, the autoregressive coefficient and the region variance influence the bias of the estimators. Table 1 also shows that the MSE values of the two models approach zero, which indicates that both models describe the data very well.

APPLICATION STUDY
The data come from the Financial Services Authority (FSA) and consist of a report on the risk profile and losses of motor vehicle insurance of a general insurance company in Indonesia in 2014.
The total claim is the observed response variable, the deductible is the fixed effect, and the region code (35 regions) and the month of occurrence are the random effects.
The month of occurrence is assumed to follow a first-order autoregressive process. For the purposes of this study, a subset of the data was drawn by simple random sampling from 175,000 records of actual data. In each region and month, 10 policy numbers were taken.
Similar to the simulation study, the application study was also carried out on two models.
Model 1 uses the autoregressive assumption on the time random effect as defined in equation (4), and Model 2 omits the autoregressive assumption on the time random effect, as in equation (20).
To show that the response variable has a Tweedie compound Poisson distribution, the index parameter $p$ of the distribution must lie between 1 and 2, that is, $1 < p < 2$. The index parameter is obtained with the tweedie.profile() function of the tweedie package in R. The function also produces the dispersion parameter $\phi$. Figure 2 shows that the highest profile likelihood value is achieved at an index parameter between 1.5 and 1.6.
The correlation coefficient between the total claims at time $t$ and $t-1$ is $\rho = 0.362$, and the initial value $\sigma_e^2 = 0.1$ is obtained as the variance of the difference between the total claims at time $t$ and $\rho$ times the total claims at time $t-1$. The initial value of the region random effect variance is $\sigma_v^2 = 0.5$, and the initial regression parameters $\beta_0 = 0$ and $\beta_1 = 0$ follow [4].
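The way the initial autoregressive values are described above can be sketched in a few lines. This is an illustration in Python under stated assumptions, not the authors' R code: the function names are hypothetical, the series is made up, and the scale on which the authors computed their values is not stated in the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ar1_initial_values(series):
    """Lag-1 correlation and the sample variance of y_t - rho * y_{t-1}."""
    prev, curr = series[:-1], series[1:]
    rho = pearson(curr, prev)
    resid = [c - rho * q for c, q in zip(curr, prev)]
    mean_r = sum(resid) / len(resid)
    var_e = sum((r - mean_r) ** 2 for r in resid) / (len(resid) - 1)
    return rho, var_e

# Hypothetical monthly totals, for illustration only.
monthly_totals = [2.1, 2.4, 1.9, 2.8, 3.0, 2.6, 2.2, 2.9, 3.3, 3.1, 2.7, 3.4]
rho0, var_e0 = ar1_initial_values(monthly_totals)
print("initial rho:", rho0)
print("initial innovation variance:", var_e0)
```

These quantities serve only as starting points for the iterative estimation, so a rough value is sufficient.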
From Table 3 it can be seen that the two models produce different estimates of the regression parameters. In Model 1, the estimate of $\beta_0$ is -2.357819 and that of $\beta_1$ is 13.291004, while in Model 2 the estimate of $\beta_0$ is -2.378214 and that of $\beta_1$ is 13.2967676. As a result, the interpretation of the fixed effect in Model 1 is that if the deductible changes to one rupiah, the expected total value of claims becomes 56004.37 rupiahs, while in Model 2 the expected total value of claims becomes 55241.07 rupiahs.
In the estimation of the random effect variances, Model 2 produces relatively smaller variances than Model 1. In both models, the variance of the region random effect is greater than the residual variance. This shows that the total value of claims varies between regions. In contrast, the variance estimate of the time random effect is smaller than the variance estimate of the residuals. This shows that total claims vary within a time period but vary little between time periods.
The models are evaluated by calculating the mean absolute bias and the MSE of both models. Table 4 shows that Model 1 produces a smaller mean absolute bias than Model 2. In addition, the MSE value of Model 1 is smaller than that of Model 2. This shows that in the application study, Model 1 is relatively better than Model 2.
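The reported expected claim values can be checked against the printed coefficients. The sketch below assumes the log link used in the model and sets both random effects to zero; the function name is hypothetical. With the deductible equal to one, the expected claim is $\exp(\beta_0 + \beta_1)$.

```python
import math

def expected_claim(beta0, beta1, deductible, v=0.0, u=0.0):
    """Expected total claim under a log link; v and u are the random effects."""
    return math.exp(beta0 + beta1 * deductible + v + u)

# Coefficients as printed in the text.
model1 = expected_claim(-2.357819, 13.291004, 1.0)
model2 = expected_claim(-2.378214, 13.2967676, 1.0)
print("Model 1:", round(model1, 2))
print("Model 2:", round(model2, 2))
```

For Model 1 this agrees with the reported 56004.37 to within rounding. For Model 2 the printed coefficients give a value near 55191, close to but not identical with the reported 55241.07. The gap may come from rounding in the printed estimates, although the text does not say.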

CONCLUSION
The h-likelihood method can be used to estimate the regression and variance parameters of a GLMM whose response variable has a Tweedie compound Poisson distribution with two random effects, namely region and time, where time is assumed to follow a first-order autoregressive process. The simulation study shows that, for models in which time is assumed to follow a first-order autoregressive process, the greater the autoregressive coefficient, the smaller the relative bias, and the greater the region variance, the less biased the estimators. In other words, the autoregressive coefficient and the region variance influence the bias of the estimators.
MSE values close to zero indicate that the model describes the data well. The application study shows that the absolute bias and the MSE of the model in which time is assumed to follow a first-order autoregressive process are smaller than those of the model without the first-order autoregressive assumption. In general, models whose time random effect is assumed to follow a first-order autoregressive process perform relatively better than models whose time random effect has no first-order autoregressive assumption.

ACKNOWLEDGMENT
We would like to thank Kemenristek Dikti Republic of Indonesia for funding this research through the 2020 Doctoral Grant Program.