THE SPATIO-TEMPORAL MODEL FOR THE TWEEDIE COMPOUND POISSON GAMMA RESPONSE IN STATISTICAL DOWNSCALING

This research aims to develop a spatio-temporal generalized linear mixed model (GLMM), estimated with the h-likelihood method, for statistical downscaling with a Tweedie compound Poisson gamma response. The method produces estimates of the fixed effects, random effects, and variance components simultaneously. The results show that the proposed model performs well: it has the lowest root mean square error of prediction (RMSEP) among the models compared and reduces the variability of the random effects caused by spatial and temporal dependencies.


INTRODUCTION
Statistical downscaling (SD) is a technique in climatology that uses statistical modeling to develop functional relationships between large-scale (global) data and small-scale (local) data. SD modeling uses General Circulation Model (GCM) precipitation output as the explanatory variables (global scale) and rainfall data (local scale) as the response. The GCM output is spatially and temporally related because the data are taken from several grid cells over time; the GCM variables are therefore correlated with each other, which causes multicollinearity. Multicollinearity can be handled by dimension reduction, variable selection, or shrinkage in parameter estimation, for example with principal component analysis, the lasso, the fused lasso, or the elastic net.
Rainfall data have two components, one discrete and one continuous. Whether or not it rains is a discrete outcome, while the intensity of the rain, given that it rains, is continuous [1]. Rainfall modeling in SD usually treats the two components with separate distributions and excludes the zero values that arise when there is no rain event. [2] carried out SD research using the normal distribution with fused lasso dimension reduction, but the study was limited to rainfall intensity and did not cover non-rainy events. [3] proposed a mixed distribution from the Tweedie family to model the two components of rain simultaneously. The Tweedie family is flexible enough to handle non-negative, highly right-skewed, or symmetric data, including data with exact zeros [4]. The Tweedie distribution is a special case of the exponential dispersion model (EDM) with variance function $V(\mu) = \mu^{p}$, that is, the variance is proportional to a power of the mean, where $p$ is the index (power) parameter [5]. Depending on the index parameter $p$, the Tweedie family includes discrete, continuous, and mixed distributions. The discrete member is the Poisson distribution with $p = 1$. The continuous members are the normal distribution with $p = 0$, the gamma distribution with $p = 2$, and the inverse Gaussian distribution with $p = 3$. The mixed member, which combines the Poisson and gamma distributions, is called the Tweedie compound Poisson gamma (TCPG) distribution and has $1 < p < 2$. Modeling the occurrence and the amount of rain simultaneously with an appropriate distribution avoids the difficulty of modeling the two components separately and the loss of information in making predictions.
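As an illustration, the correspondence between the index parameter and the Tweedie family members named above can be written as a short Python sketch. The function names `tweedie_variance` and `tweedie_member` are hypothetical, and only the cases listed in the text are covered.

```python
def tweedie_variance(mu: float, p: float) -> float:
    """Tweedie variance function V(mu) = mu**p."""
    return mu ** p


def tweedie_member(p: float) -> str:
    """Name the Tweedie family member selected by the index parameter p."""
    if p == 0:
        return "normal"
    if p == 1:
        return "Poisson"
    if 1 < p < 2:
        return "compound Poisson-gamma (TCPG)"
    if p == 2:
        return "gamma"
    if p == 3:
        return "inverse Gaussian"
    return "not covered in the text"


if __name__ == "__main__":
    for p in (0, 1, 1.5, 2, 3):
        print(p, tweedie_member(p), tweedie_variance(2.0, p))
```

For rainfall, the relevant case is $1 < p < 2$, where the distribution has positive probability at zero and a continuous density on the positive values.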
SD is more difficult to model when the data are taken from several locations and at several times for each location. Such a data structure induces spatial and temporal dependencies, so the model must account for both, which calls for spatio-temporal data modeling.
Spatial dependence arises because data from adjacent locations are more closely related, and temporal dependence arises because successive observations at each location are interconnected. These dependencies are left unaccounted for if the spatio-temporal data are modeled with a generalized linear model, which assumes independent observations.
Research on spatio-temporal data in SD has been carried out by [6] using the normal distribution and by [7] using three different distributions, namely the gamma distribution to model rainfall, and the Bernoulli distribution and the generalized Pareto distribution to select extreme rainfall.
Both studies used the INLA parameter estimation method based on the Bayes method. However, these two studies have not modeled the two components of rainfall simultaneously.
The SD study using the Tweedie distribution has been carried out by [8] comparing three different models to see which model is the best. The results showed that the mixed distribution of the Tweedie family, also known as the Tweedie compound Poisson gamma (TCPG) distribution with lasso regularization, had good modeling abilities, indicated by the smallest RMSEP value and the large correlation close to one. The model used is still limited to one location and does not consider the temporal dependencies.
Spatio-temporal data can be modeled with a generalized linear mixed model (GLMM), which can handle data that are correlated because of spatial and temporal dependencies. Estimating the GLMM parameters involves complex integration to obtain the marginal model [9].
Several methods have been proposed to approximate this integral, such as the Laplace method and the Gauss-Hermite quadrature (GHQ) method. These methods cannot be used for data analysis with more than two random effects because the computational speed decreases rapidly as random effects are added [10].
The Bayes method is widely used to estimate GLMM parameters with spatio-temporal random effects [11]. However, it requires a prior distribution for each estimated parameter and is computationally demanding, especially in reaching convergence of the parameter estimates. The GLMM therefore requires an estimation method that can handle these problems. [12] proposed the hierarchical likelihood (h-likelihood) estimation method, which avoids complex integration and slow or failed convergence, and does not require a prior distribution for each estimated parameter [13] [14].
SD research using the GLMM has been carried out by [15] and [16], but it is still limited to the Gaussian distribution. Research on the TCPG distribution with a GLMM has been done by Zhang [17] in the field of insurance, but spatial and temporal dependencies were not included in the model. This study therefore aims to build a GLMM that accounts for spatio-temporal dependencies with a Tweedie compound Poisson gamma (TCPG) response for SD modeling, called the STGLMM. The regression parameters are estimated with the h-likelihood method, and multicollinearity in the GCM output data is handled with principal component analysis (PCA). Unlike the distributions commonly used, the TCPG distribution can predict not only the intensity of rain but also the probability of no rain and the number of rain events in each month.
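The PCA step used against multicollinearity can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of this study: the function name `pca_scores`, the 95% variance threshold, and the synthetic predictor matrix are all hypothetical.

```python
import numpy as np


def pca_scores(X, var_explained=0.95):
    """Return uncorrelated principal component scores of the columns of X.

    X holds one row per month and one column per GCM grid cell.
    The number of components kept is the smallest one whose cumulative
    variance ratio reaches var_explained (an assumed threshold).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each grid-cell series before the decomposition.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(ratio), var_explained) + 1)
    k = min(k, len(s))
    return Xc @ Vt[:k].T, ratio[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, strongly correlated predictors (8 grid cells, 120 months).
    signal = rng.normal(size=(120, 2)) @ rng.normal(size=(2, 8))
    X = signal + 0.05 * rng.normal(size=(120, 8))
    scores, ratio = pca_scores(X)
    print(scores.shape, ratio.round(3))
```

The resulting score columns are mutually orthogonal, so they can replace the correlated grid-cell predictors in the regression part of the model.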

TWEEDIE COMPOUND POISSON GAMMA RESPONSE
This section describes the development of the GLMM with spatially and temporally dependent random effects. The Tweedie compound Poisson gamma (TCPG) distribution is used for the response, with explanatory variables entering as fixed effects. The parameters are estimated with the h-likelihood method, which yields the fixed effects, the random effects, and the variance components of the model. The estimation steps are as follows:
1. Specification of the STGLMM with the TCPG distribution.
2. Estimation of the fixed effect and random intercept parameters using equation (5).
3. Estimation of the variance components and the range parameter of the spatial and temporal random effects using equation (6).
Solutions for steps two and three are obtained using the Newton-Raphson iteration method.

Spatio-Temporal GLMM Model Specification with TCPG Distribution
The TCPG distribution can be used for modeling in meteorology, especially rainfall. Rainfall data are characterized by positive continuous values together with exact zeros. In the TCPG model of rainfall, $Y$ is the total monthly rainfall, $N$ is the number of rain events per month, and $X_i$ is the precipitation from the $i$-th event. $N$ has a Poisson distribution, $N \sim \mathrm{Pois}(\lambda)$, and the amount of rainfall is the total of the amounts from the individual rain events, written mathematically as:

$$Y = \sum_{i=1}^{N} X_i .$$

The amounts $(X_i)_{i \ge 1}$ are assumed to follow a gamma distribution, $X_i \sim \mathrm{Gamma}(\alpha, \gamma)$, namely:

$$f(x) = \frac{x^{\alpha - 1} e^{-x/\gamma}}{\Gamma(\alpha)\, \gamma^{\alpha}}, \qquad x > 0,$$

which is a probability density function with mean $\alpha\gamma$ and variance $\alpha\gamma^{2}$. If $N = 0$, then $Y = 0$ [18]. The spatio-temporal observations $Y(s,t)$ are assumed to follow the TCPG distribution, $Y(s,t) \sim \mathrm{Tw}_p(\mu(s,t), \phi)$ with $1 < p < 2$, where $Y(s,t)$ denotes the observation at location $s = 1, 2, \ldots, S$ and month $t = 1, 2, \ldots, T$.
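The compound Poisson gamma construction can be checked by simulation. The sketch below assumes the gamma distribution is parameterized by shape $\alpha$ and scale $\gamma$ (so the mean per event is $\alpha\gamma$) and uses the standard mapping to the Tweedie mean, dispersion, and index; the function names and the numerical values are hypothetical.

```python
import numpy as np


def simulate_tcpg(lam, alpha, gamma, size, rng):
    """Draw monthly totals Y = X_1 + ... + X_N with N ~ Pois(lam)
    and X_i ~ Gamma(shape=alpha, scale=gamma)."""
    n_events = rng.poisson(lam, size=size)
    y = np.zeros(size)
    wet = n_events > 0
    # A sum of n independent Gamma(alpha, gamma) draws is Gamma(n * alpha, gamma).
    y[wet] = rng.gamma(shape=alpha * n_events[wet], scale=gamma)
    return y, n_events


def tweedie_parameters(lam, alpha, gamma):
    """Map (lam, alpha, gamma) to the Tweedie mean mu, dispersion phi, index p."""
    p = (alpha + 2.0) / (alpha + 1.0)
    mu = lam * alpha * gamma
    phi = lam ** (1.0 - p) * (alpha * gamma) ** (2.0 - p) / (2.0 - p)
    return mu, phi, p


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y, n = simulate_tcpg(lam=2.0, alpha=1.5, gamma=10.0, size=100_000, rng=rng)
    mu, phi, p = tweedie_parameters(2.0, 1.5, 10.0)
    print("share of dry months:", np.mean(y == 0), "vs exp(-lam):", np.exp(-2.0))
    print("sample mean:", y.mean(), "vs mu:", mu)
    print("sample variance:", y.var(), "vs phi * mu**p:", phi * mu ** p)
```

The share of exact zeros in the simulated totals approximates $\exp(-\lambda)$, which is why a single TCPG response can describe both rain occurrence and rain amount.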
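[19] state that the GLMM is a model with a hierarchical structure. Level one is the discrete or continuous response variable, which follows an exponential family distribution. Level two is the unobservable latent variables, referred to as random effects. The GLMM for spatio-temporal data can be written as a hierarchical model, namely:

$$Y(s,t) \mid u(s), v(t) \sim \mathrm{Tw}_p(\mu(s,t), \phi),$$

where $u(s)$ is a spatial random effect and $v(t)$ is a temporal random effect following a first-order autoregressive process of the form $v(t) = \rho\, v(t-1) + \varepsilon(t)$, $|\rho| < 1$. The spatial random effects have covariance $\Sigma_s = \sigma_s^{2} R(\theta)$, where $R(\theta)$ is an exponential spatial correlation matrix with elements $\exp(-d_{ss'}/\theta)$, $d_{ss'} = \sqrt{(x_s - x_{s'})^{2} + (y_s - y_{s'})^{2}}$ is the Euclidean distance between locations $s$ and $s'$, and $\theta$ is the range parameter. The temporal random effects have covariance $\Sigma_t = \sigma_t^{2} A(\rho)$, where $A(\rho)$, with elements $\rho^{|t - t'|}/(1 - \rho^{2})$, is the temporal correlation matrix [20].

The TCPG distribution belongs to the exponential family. Thus, the model can use a link function $g(\cdot)$ that connects the expectation of the observed data with the regression equation, namely:

$$g\big(\mu(s,t)\big) = x(s,t)^{\top}\beta + u(s) + v(t). \qquad (1)$$

This research adopts the results of [17] and [21]. Based on [17], the regression model in equation (1) is rewritten in terms of the relative correlation factor matrices $L_s$ and $L_t$, which are Cholesky factors such that $\Sigma_s = \sigma_s^{2} R(\theta) = \sigma_s^{2} L_s L_s^{\top}$ and $\Sigma_t = \sigma_t^{2} A(\rho) = \sigma_t^{2} L_t L_t^{\top}$. This change adjusts the regression model to the model developed by [21]. With the relative correlation factors, equation (1) becomes equation (2):

$$g\big(E(Y \mid u, v)\big) = X\beta + Z_1 L_s u + Z_2 L_t v, \qquad (2)$$

where $u$ and $v$ now denote the vectors of standardized spatial and temporal random effects. Equation (2) can be written in the form:

$$\eta = g\big(E(Y \mid u, v)\big) = X\beta + Z_1^{*} u + Z_2^{*} v, \qquad (3)$$

in which $Z_1^{*} = Z_1 L_s$, $Z_2^{*} = Z_2 L_t$, $u \sim N(0, \sigma_s^{2} I)$, and $v \sim N(0, \sigma_t^{2} I)$.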
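The two correlation structures and their Cholesky factors can be sketched in Python. The sketch assumes the exponential correlation has elements $\exp(-d/\theta)$ with planar station coordinates and that the temporal matrix has elements $\rho^{|t-t'|}/(1-\rho^2)$; the function names, coordinates, and parameter values are hypothetical.

```python
import numpy as np


def exponential_correlation(coords, theta):
    """Spatial matrix with elements exp(-d / theta), d = Euclidean distance."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-dist / theta)


def ar1_matrix(n_times, rho):
    """Temporal matrix with elements rho**|t - t'| / (1 - rho**2), |rho| < 1."""
    t = np.arange(n_times)
    return rho ** np.abs(t[:, None] - t[None, :]) / (1.0 - rho ** 2)


if __name__ == "__main__":
    coords = [[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]]  # hypothetical station positions
    R = exponential_correlation(coords, theta=5.0)
    A = ar1_matrix(n_times=4, rho=0.5)
    # Lower-triangular Cholesky factors, so that R = L_s L_s' and A = L_t L_t'.
    L_s = np.linalg.cholesky(R)
    L_t = np.linalg.cholesky(A)
    print(np.allclose(L_s @ L_s.T, R), np.allclose(L_t @ L_t.T, A))
```

Multiplying the random-effect design matrices by these factors gives the starred design matrices, so the remaining random effects are independent with a common variance.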

Estimation of Fixed and Random Effects
The regression parameters are estimated with the hierarchical likelihood (h-likelihood) method, which has the general form shown in equation (4):

$$h = \log f(y \mid v) + \log f(v), \qquad (4)$$

where $f(y \mid v)$ is the conditional density of the response given the random effects and $f(v)$ is the density of the random effects. Based on (4), the h-likelihood function for this study, with a spatial random effect $u$ and a temporal random effect $v$, is:

$$h = \log f(y \mid u, v) + \log f(u) + \log f(v). \qquad (5)$$

The fixed effect parameters $\beta$, the spatial random effect $u$, and the temporal random effect $v$ are estimated by maximizing $h$ in equation (5), that is, by solving $\partial h / \partial \beta = 0$, $\partial h / \partial u = 0$, and $\partial h / \partial v = 0$ using the chain rule as in [22] and [23]. The three score equations have no closed-form solution, so solving them by hand is impractical. They can be solved with an iterative method such as the Newton-Raphson method.
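The Newton-Raphson iteration for score equations of this kind can be sketched in Python. To keep the example short, it is applied to a toy Poisson log-likelihood with a log link rather than to the TCPG h-likelihood of this study; the function names and data are hypothetical.

```python
import numpy as np


def newton_raphson(score, hessian, start, tol=1e-10, max_iter=50):
    """Solve score(theta) = 0 by theta <- theta - H(theta)^{-1} score(theta)."""
    theta = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(theta), score(theta))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


def fit_poisson_log_link(X, y):
    """Toy example: maximize sum(y * eta - exp(eta)) with eta = X @ beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def score(beta):
        return X.T @ (y - np.exp(X @ beta))

    def hessian(beta):
        mu = np.exp(X @ beta)
        return -(X.T * mu) @ X

    # Start from an intercept-only fit.
    start = np.zeros(X.shape[1])
    start[0] = np.log(y.mean())
    return newton_raphson(score, hessian, start)


if __name__ == "__main__":
    x = np.arange(6.0)
    X = np.column_stack([np.ones_like(x), x])
    y = np.exp(0.2 + 0.5 * x)  # noise-free response, so the solution is (0.2, 0.5)
    print(fit_poisson_log_link(X, y))
```

In the h-likelihood setting the same update is applied to the stacked vector of fixed and random effects, with the score and Hessian taken from $h$ instead of the toy log-likelihood.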

Estimation of the Variance Components
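One concern in random effects modeling is to develop better methods for estimating the variance (dispersion) components. With the classical likelihood approach, estimating parameters in the presence of random effects involves complex integration, while the Bayes method requires priors and convergence is difficult to obtain. [12] developed the maximum adjusted profile hierarchical likelihood estimator (MAPHLE), which is defined as follows:

$$p_{\beta,u,v}(h) = \left[\, h - \tfrac{1}{2} \log \det\left\{ \frac{D(h, (\beta,u,v))}{2\pi} \right\} \right]_{\beta = \hat{\beta},\; u = \hat{u},\; v = \hat{v}}, \qquad (6)$$

where $D(h, (\beta,u,v)) = -\partial^{2} h / \partial(\beta,u,v)\, \partial(\beta,u,v)^{\top}$. The variance components are estimated by maximizing (6).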

Prediction Algorithm for the Spatio-Temporal GLMM (STGLMM)
Prediction is done to see how close the predicted results are to the actual data, and the model can also be used to predict several future time periods. The following prediction algorithm is used:
1. Retrieve the parameter estimates $\hat{\beta}$, $\hat{u}$, $\hat{v}$ that have been obtained.
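A minimal sketch of how fitted values could be computed from the estimated fixed and random effects is given below. It assumes a log link and a location-by-location ordering of the observations, neither of which is stated in the text; the function names `design_matrices` and `predict_mean` are hypothetical.

```python
import numpy as np


def design_matrices(n_loc, n_time):
    """Indicator matrices for the spatial and temporal random effects.

    Rows are assumed to be ordered location by location, with months
    running fastest within each location.
    """
    Z1 = np.kron(np.eye(n_loc), np.ones((n_time, 1)))
    Z2 = np.kron(np.ones((n_loc, 1)), np.eye(n_time))
    return Z1, Z2


def predict_mean(X, beta_hat, Z1_star, u_hat, Z2_star, v_hat):
    """Predicted mean exp(X beta + Z1* u + Z2* v); the log link is an assumption."""
    eta = X @ beta_hat + Z1_star @ u_hat + Z2_star @ v_hat
    return np.exp(eta)


if __name__ == "__main__":
    Z1, Z2 = design_matrices(n_loc=2, n_time=3)
    X = np.ones((6, 1))  # intercept only, for illustration
    mu_hat = predict_mean(X, np.array([1.0]), Z1, np.array([0.1, -0.1]),
                          Z2, np.array([0.0, 0.2, -0.2]))
    print(mu_hat.round(3))
```

Here `Z1_star` and `Z2_star` stand for the design matrices after multiplication by the Cholesky factors of the spatial and temporal correlation matrices; the demonstration passes the plain indicator matrices only to keep the example short.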

Data Source
This study uses monthly rainfall and precipitation data from the GCM with a period of January

Rainfall Prediction using STGLMM in Statistical Downscaling
This study aims to build a GLMM with spatio-temporal dependence and a Tweedie compound Poisson gamma response, called the STGLMM. Three models are compared: the spatio-temporal GLMM (STGLMM), in which locations are assumed to be exponentially correlated and time is assumed to follow a first-order autoregressive process; the spatial GLMM (SGLMM), in which locations are assumed to be exponentially correlated and time points are assumed to be independent; and the temporal GLMM (TGLMM), in which locations are assumed to be independent and time is assumed to follow a first-order autoregressive process. The three models are compared to determine which performs best and to see how each random effect contributes to the GLMM.
Modeling with the TCPG distribution needs to consider the scale of the data. Large-scale data need to be divided by a constant before modeling; otherwise they cause problems such as singular Hessian matrices, inconsistent results, and infinite predictions. The STGLMM parameters to be estimated are the fixed effect parameters, the spatial and temporal random effects, and the variance components of the spatial and temporal random effects. Model performance is measured with the root mean square error of prediction (RMSEP) and the correlation between the actual data and the predictions.
The data exploration is shown in Figure 1, from which it is suspected that the data follow the TCPG distribution. The next step is to estimate the index parameter $p$, which confirms the TCPG distribution if its estimate lies in $1 < p < 2$. Some of the important parameters for modeling are listed in Table 1. Table 2 gives the parameter estimates for the fixed effects and the variance components of the STGLMM.
Table 3 shows that the STGLMM has the smallest RMSEP compared with the SGLMM and TGLMM. The highest correlation between the observed and predicted data belongs to the TGLMM, followed by the other two models. This shows that the three models are similarly good at modeling rainfall data with spatial and temporal dependencies. However, the STGLMM reduces the variability due to spatial and temporal dependencies and has the smallest RMSEP of the three models. The modeling results show that the STGLMM has a good ability to predict the two components of rainfall simultaneously. The plots in Figures 3(a) to 3(i) show that the predictions of the three models follow almost the same pattern, with the STGLMM closest to the actual data.
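The two performance measures can be computed as in the following Python sketch. The function names and the example values are hypothetical and are not results from this study.

```python
import numpy as np


def rmsep(actual, predicted):
    """Root mean square error of prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def correlation(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    return float(np.corrcoef(actual, predicted)[0, 1])


if __name__ == "__main__":
    actual = [120.0, 80.0, 0.0, 15.0, 200.0]      # illustrative monthly rainfall
    predicted = [110.0, 95.0, 5.0, 10.0, 180.0]   # illustrative predictions
    print(rmsep(actual, predicted), correlation(actual, predicted))
```

A smaller RMSEP and a correlation closer to one both indicate predictions that track the observed rainfall more closely.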
The nine rain stations in the figure panels show the same predicted pattern, namely the monsoon pattern. The monsoon pattern is a unimodal type of rainfall, with one rainy-season peak in December-January-February and a dry season in June-July-August. From the TCPG model, several rainfall characteristics can be obtained, such as the intensity of rain, the average number of rain events per month λ, the average rainfall per event αγ, the probability of no rain events in a month π = exp(−λ), and the number of events without rain Nπ. The predicted rainfall characteristics for the Malabar rain station are described in Table 4. To make it easier to see the predicted pattern of rainfall characteristics for other regions, the estimates of λ, αγ, π, and Nπ from the three models for several regions are plotted in Figures 4(a) to 4(l). Table 4 can be interpreted as follows: in January, the average number of daily rain events per month (λ) is two, the average rainfall per event (αγ) is 151, the probability of no rain events in the month (π) is 0.21, and the number of events without rain (Nπ) is one.
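The rainfall characteristics can be derived from the fitted Tweedie parameters with the standard compound Poisson gamma mapping, sketched below in Python. The function name and the input values are hypothetical; the gamma distribution is assumed to be parameterized by shape α and scale γ.

```python
import math


def rainfall_characteristics(mu, phi, p):
    """Convert a Tweedie mean mu, dispersion phi, and index p (1 < p < 2)
    into compound Poisson gamma rainfall characteristics."""
    if not 1.0 < p < 2.0:
        raise ValueError("p must lie strictly between 1 and 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # mean number of rain events
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape
    gamma = phi * (p - 1.0) * mu ** (p - 1.0)   # gamma scale
    return {
        "events_per_month": lam,
        "rain_per_event": alpha * gamma,
        "prob_no_rain": math.exp(-lam),
    }


if __name__ == "__main__":
    # Illustrative inputs only, not estimates from this study.
    print(rainfall_characteristics(mu=30.0, phi=6.4, p=1.4))
```

Because the mean satisfies μ = λαγ, the product of the first two returned values recovers the predicted monthly rainfall.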

CONCLUSIONS
Based on the results above, it can be concluded that the spatio-temporal generalized linear mixed model (STGLMM) is well suited to rainfall modeling, as shown by its RMSEP, which is the smallest compared with the SGLMM and TGLMM. The TCPG distribution can predict not only the intensity of rainfall but also the number of rain events and the probability that it will not rain in a given month.