Spatially Filtered Ridge Regression Modeling to Find out the Rice Production Factors in East Java, Indonesia

The research aims to model rice production in East Java using the Spatially Filtered Ridge Regression (SFRR) method, to ensure that all violations of assumptions are resolved, and to identify the direct and indirect effects of the predictor variables. The data are secondary data sourced from the publication of Badan Pusat Statistik containing provincial food crop agriculture statistics in East Java and the 2018 publication of Dinas Pertanian Jawa Timur (literally translated as Agriculture Department of East Java). The data analysis is carried out with RStudio and ArcMap 10.3 software. The observation units are the 38 regencies or cities in East Java. The analysis results show that SFRR with queen contiguity weighting can overcome spatial autocorrelation and multicollinearity in rice production data in East Java. In the established model, the variables of rice field area, urea fertilizer, Phonska fertilizer, SP-36 fertilizer, and tractor have a significant effect on rice production, whereas ZA fertilizer has no significant effect. The magnitudes of the direct and indirect impacts of each predictor variable are also compared. Generally, direct impacts are greater than indirect impacts.


I. INTRODUCTION
The agricultural sector is essential in the development of the national economy. The role of the agricultural sector includes providing food and industrial raw materials, absorbing labor, and being the main source of rural household income. As a food provider, the agricultural sector should be constantly supervised because food is an important human need. Rice is the main food commodity of Indonesian society: almost the entire population of the country consumes rice every day [1].
Received: Aug. 31, 2020; received in revised form: Nov. 11, 2020; accepted: Nov. 11, 2020; available online: Nov. 23, 2020.
The agricultural sector is closely related to geographical conditions. Geographical conditions include altitude, air temperature, soil type, soil pH, and water content, which make an area suitable for certain types of plants. Because of those conditions, the development of rice crops is related to spatial effects. Tobler's first law of geography states that the condition at one point or area is related to the condition at other adjacent points or areas [2]. For rice production, if an area has high rice production, the surrounding areas also tend to have high rice production potential because adjacent areas have almost the same geographical conditions. This indicates that there is spatial autocorrelation in rice production.
Rice production factors include the area of rice fields, the amount of labor, fertilizer, pesticides, and irrigation systems [3]. Among the factors that influence rice production, there is a correlation between predictor variables: the wider the cultivated rice fields, the more seeds and fertilizers are needed. When two or more predictor variables are highly correlated, it is difficult to separate the influence of each variable on the response variable. This condition is called multicollinearity. If the issue is not addressed, it may lead to an estimator with a large variance or standard error. Multicollinearity also produces high R^2 values even though many predictor variables show no statistically significant effect. One method of analysis that can be used to address multicollinearity is the ridge method [4].
Some assumptions must be fulfilled in the multiple linear regression model: the residuals should be normally distributed with a zero mean, their variance should be constant, and they should not be correlated with themselves or with other variables [4]. If the non-autocorrelation assumption is violated, the multiple linear regression model is not appropriate. The violation leads to an estimator with large variance, so another model is required that can accommodate violations of the non-autocorrelation assumption, such as the Spatial Autoregressive (SAR) model [5]. If the SAR model in turn violates the non-multicollinearity assumption, another method is needed to overcome that violation. Spatially Filtered Ridge Regression (SFRR) is a combination of SAR and the ridge method. It yields stable parameter estimates that are resistant to multicollinearity and to dependency between observation areas (spatial autocorrelation).
Previous research applies SFRR to the factors affecting temperature in land cover issues related to Urban Heat Island (UHI). Most of the methods used in such cases are Ordinary Least Squares (OLS), but the non-multicollinearity assumption is violated. The violation causes the parameter estimates to vary from one sample to another, and this large sample variability increases the variance of the estimates. As a result, predictors that should have a significant effect appear insignificant. Once the parameter that measures the degree of spatial spillover (ρ) has been estimated, it is substituted into (I − ρW)Y, and the result is used as the response variable for the ridge regression method. That research shows that the SFRR model produces the lowest Mean Squared Error and Akaike's Information Criterion (AIC) values compared to spatial regression, the Spatial Autoregressive model, and ridge models [6]. This result guides the selection of the Spatially Filtered Ridge Regression model here.
In addition to information on spatial autocorrelation between observation units based on their relative positions, it is interesting to know the direct and indirect impacts of the predictors. Research on rice production with spatial influences is conducted by [7] with the Geographically Weighted Regression (GWR) model. That research finds that Phonska fertilizer and harvest ingestion influence the increase in rice production in East Java. Other previous research using the Spatial Durbin Error Model (SDEM) approach mentions that rice field area, harvest area, and rice productivity significantly affect rice production [8]. The amount of fertilizer used also has a significant effect on rice productivity. The increase in rice production can also be influenced by the agricultural machinery that farmers use [9]. Similarly, previous research shows the effect of tractors on agricultural income. It mentions that using tractors can reduce labor, accelerate land preparation activities, enable timely planting in the growing season, and increase rice production [10].
The research aims to model rice production in East Java using the SFRR method. It addresses the problems of multicollinearity and spatial autocorrelation in rice production data. It is expected to assist the government in supervising areas with low rice production so that there is no shortage of rice, especially in highly populated areas with no potential as rice producers.

II. RESEARCH METHOD
A. Data Sources and Research Variables
The data used are secondary data sourced from the Badan Pusat Statistik 'Central Bureau of Statistics' (BPS) publication containing East Java Provincial Food Crop Agriculture Statistics and from the 2018 publication of Dinas Pertanian Jawa Timur. The data analysis is carried out using RStudio and ArcMap 10.3 software. The observation units are the 38 regencies or cities in East Java. East Java is selected because it has the potential to be the largest rice-producing area in Indonesia. The variables used in the research are listed in Table I.

B. Analysis Method
The analysis proceeds in several steps. First, explore the rice production distribution map (Y) and the descriptive statistics of the predictor variables (X_p). Second, perform the multiple linear regression analysis to obtain the model residuals. The multiple linear regression model describes the relationship between the response variable and the predictor variables, as formulated by [4]. Third, define a row-standardized spatial weighting matrix (W) with the queen contiguity method [10]. Fourth, test for spatial heterogeneity in the residuals of the multiple linear regression model with the Breusch-Pagan test formulated by [11]. The test checks whether the residual variance of the multiple linear regression model varies between locations [12]. Fifth, conduct spatial autocorrelation testing on the residuals of the multiple linear regression model and on the response variable (rice production). The presence of spatial autocorrelation in the residuals indicates a violation of the assumption of residual independence in hypothesis testing. With correlated residuals, the estimator no longer has minimum variance [13]. The spatial autocorrelation test uses Moran's I test [5]. Sixth, perform multicollinearity testing using the Variance Inflation Factor (VIF) and the correlation coefficients between predictor variables. Multicollinearity occurs if the VIF value is more than 10 [4]. The correlation coefficient (r) describes the closeness of the relationship between two variables [14].
Seventh, perform the spatial filtering process in the following stages. Estimate the parameter ρ, which measures the degree of spatial spillover, in the SAR model (Eq. (1)) [10]. In Eq. (1), Y is the vector of the dependent variable, X is the matrix of independent variables, β is the vector of regression coefficients, I is an identity matrix, and ν is the vector of residuals.
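The weighting and autocorrelation steps described above can be sketched in a few lines. The example below is a minimal illustration, assuming a hypothetical chain of four regions rather than the East Java map. It row-standardizes a binary contiguity matrix and computes the Moran's I statistic. The function names and the toy values are assumptions made for illustration only, and no significance test is included.

```python
def row_standardize(adjacency):
    """Divide each row of a binary contiguity matrix by its row sum."""
    weights = []
    for row in adjacency:
        total = sum(row)
        weights.append([v / total if total else 0.0 for v in row])
    return weights


def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = deviations."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Hypothetical chain of four regions: 0-1, 1-2 and 2-3 share a border.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
W = row_standardize(adjacency)
toy_values = [1.0, 2.0, 3.0, 4.0]  # made-up values that rise along the chain
moran = morans_i(toy_values, W)
print(moran)
```

A positive value, as in this toy chain, corresponds to the clustering pattern in which neighboring regions carry similar values.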
When the ρ parameter has been estimated, it is substituted into Eq. (2). The result, Y_filtered, is the vector of observations on the spatially filtered dependent variable and is used as the response variable for the ridge regression method [6].
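For orientation, the two relations referred to as Eqs. (1) and (2) can be written as below. This is a sketch that assumes the standard spatial lag form of the SAR model and the symbols defined in the text; the hat on ρ is added here only to mark the estimated value.

```latex
\begin{align}
Y &= \rho W Y + X\beta + \nu
  \quad\Longleftrightarrow\quad (I - \rho W)\,Y = X\beta + \nu, \tag{1}\\
Y_{\text{filtered}} &= (I - \hat{\rho} W)\,Y. \tag{2}
\end{align}
```

Under this form, filtering removes the estimated spatial lag from the response, so the filtered response can then be regressed on X alone.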
Eighth, estimate the SFRR model (Eq. (3)) in several stages. In Eq. (3), e is the vector of residuals.
Firstly, transform the response variable and the predictor variables into standardized form as in Eq. (4) [15]. X*_ip is the standardized value of an independent variable, and Y*_i is the standardized value of the dependent variable. S_Y is the standard deviation of Y, and S_p is the standard deviation of each predictor variable. p is the total number of parameters, and n is the number of observations.
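As an illustration of the standardization stage, the sketch below assumes that Eq. (4) is the usual correlation transformation, in which each centered value is divided by the standard deviation times the square root of n − 1. That form is an assumption, as are the function name and the toy data.

```python
import math


def correlation_transform(values):
    """Scale a variable as (v - mean) / (sqrt(n - 1) * sd).

    Assumed form of the standardization; sd is the sample standard deviation.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    scale = math.sqrt(n - 1) * sd
    return [(v - mean) / scale for v in values]


toy_predictor = [2.0, 4.0, 6.0, 8.0]  # made-up values
z = correlation_transform(toy_predictor)
print(z)
```

With this scaling every transformed variable has zero sum and unit sum of squares, so the cross-product matrix of the transformed predictors equals their correlation matrix.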
Secondly, determine the bias constant (λ) based on the VIF values in Eq. (5). The best value of the bias constant (λ) results in VIF values of less than 1.
Thirdly, estimate the parameters of the SFRR model as in Eq. (6). β_SFRR is the parameter vector of SFRR (p × 1), and λ is the bias constant in the interval 0 ≤ λ < 1. X* is the matrix of standardized predictor variables (n × p). Y*_filtered is the vector of the standardized response variable (n × 1). Fourthly, to form the SFRR model, the SFRR regression estimates are restored to the original scale of the model as in Eqs. (7) and (8) [6]. β_k is a coefficient of SFRR, and β_0 is the intercept coefficient.
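The estimation and back-transformation stages can be sketched as follows. The code assumes the standard ridge estimator (X*'X* + λI)^(-1) X*'Y*_filtered and the usual rescaling β_k = (S_Y / S_k) β*_k with intercept β_0 = mean(Y) − Σ β_k mean(X_k). Both forms, the helper names, and the toy inputs are assumptions for illustration, not the computation carried out in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge(x_std, y_std, lam):
    """Ridge coefficients (X'X + lam I)^(-1) X'y for already scaled inputs."""
    p = len(x_std[0])
    xtx = [[sum(r[j] * r[k] for r in x_std) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(x_std, y_std)) for j in range(p)]
    return solve(xtx, xty)


def back_transform(beta_std, s_y, s_x, y_mean, x_means):
    """Return (intercept, slopes) on the original scale of the variables."""
    slopes = [b * s_y / s for b, s in zip(beta_std, s_x)]
    intercept = y_mean - sum(b * m for b, m in zip(slopes, x_means))
    return intercept, slopes


# Hypothetical orthogonal toy design, kept unscaled so the arithmetic is easy.
x_toy = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y_toy = [1.0, 2.0, -1.0, -2.0]
beta_ols = ridge(x_toy, y_toy, lam=0.0)    # no shrinkage
beta_ridge = ridge(x_toy, y_toy, lam=2.0)  # coefficients shrink toward zero
intercept, slopes = back_transform(beta_ridge, s_y=2.0, s_x=[1.0, 4.0],
                                   y_mean=10.0, x_means=[1.0, 2.0])
print(beta_ols, beta_ridge, intercept, slopes)
```

Setting λ to zero recovers ordinary least squares, which makes the shrinkage introduced by a positive λ easy to see.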
Ninth, test the residual assumptions of the SFRR model to ensure that the assumption violations are resolved. The normality test uses the Jarque-Bera (JB) test, which is based on skewness and kurtosis measures [4]. The spatial autocorrelation test uses Moran's I test [5]. Tenth, test the significance of the SFRR model parameters using the Wald test [5]. Finally, interpret the SFRR model and determine the direct and indirect effects.
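The normality check can be illustrated with the textbook Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4), where S is the skewness and K the kurtosis of the residuals. The sketch below assumes this standard form; the function name and the three toy residuals are made up for illustration.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n  # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


toy_residuals = [-1.0, 0.0, 1.0]  # made-up, symmetric residuals
jb = jarque_bera(toy_residuals)
print(jb)
```

The statistic is compared with a chi-square critical value with 2 degrees of freedom (5.991 at the 5% level); values below it give no evidence against normality.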

III. RESULTS AND DISCUSSION
A. Rice Production Spread Pattern in East Java
Kementerian Pertanian Republik Indonesia 'Ministry of Agriculture Republic Indonesia' published the results of the rice production improvement program in 2018. It noted that East Java was the highest rice producer in Indonesia in 2018 with 13,000,475 tons [16]. The spread of rice production across the regencies and cities of East Java Province is mapped into three categories, namely the highest, moderate, and lowest rice production. The categories are formed in ArcGIS with the Natural Breaks classification method. Figure 1 shows the results of mapping rice production data in tons for all observation sites. Six regencies or cities are in the highest production category (532,815-924,212 tons). The majority are coastal areas. Lamongan Regency has the highest rice production of 924,212 tons, followed by the regencies of Bojonegoro, Ngawi, Jember, Tuban, and Banyuwangi. A further 19 regencies or cities fall into moderate rice production (179,915-532,814 tons). The remaining 13 regencies or cities are classified into low rice production (4,903-179,914 tons). Areas with low rice production are mostly urban areas, such as Mojokerto City with 4,903 tons. This pattern is due to differences in geographical conditions and economic activity in each region.

B. Description of Predictor Variables of Rice Production in East Java in 2018
Several factors can cause a high amount of rice production in an area. These include the area of rice fields, fertilizer, and agricultural machinery. The geographical location of a region also affects rice production. Table II provides descriptive statistics of the predictor variables. The pattern of the functional relationship between the predictor and response variables can be examined with scatterplots of both variables. Fig. 2 shows that the rice field area (Fig. 2a), urea fertilizer (Fig. 2b), Phonska fertilizer (Fig. 2c), SP-36 fertilizer (Fig. 2d), and ZA fertilizer (Fig. 2e) have a positive and linear relationship with rice production, as indicated by data points that follow the regression line. However, the relationship between tractor (Fig. 2f) and rice production does not form a linear pattern.

C. Multiple Linear Regression Analysis
Multiple linear regression analysis is performed to check whether spatial autocorrelation and spatial heterogeneity are present in the residuals. The residuals of the multiple linear regression model are the differences between the observed rice production (Y_i) and the value estimated by the model (Ŷ_i) in Eq. (9).
\[ \hat{Y}_i = -1.158 \times 10^{5} + 45.59X_1 + 4.73X_2 + 9.54X_3 + 9.64X_4 - 6.87X_5 + 816.4X_6 \tag{9} \]
The multiple linear regression model is not interpreted since its results cannot be trusted: in theory, rice production involves spatial interaction. The model is only used to obtain residual values for spatial effect testing.

D. Spatial Heterogeneity Test
This test is carried out to check whether the residual variance of the multiple linear regression model varies between locations. The Breusch-Pagan (BP) statistic is 12.4, which is less than χ²(0.05;6) = 12.59. It means H_0 is not rejected: the variance is the same for all observations. Hence, it can be concluded that there is no spatial heterogeneity in the rice production data of East Java in 2018.
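The decision above can be cross-checked with a p-value. For an even number of degrees of freedom, the upper-tail probability of the chi-square distribution has a closed form, exp(−x/2) · Σ (x/2)^k / k! for k from 0 to df/2 − 1. The sketch below uses that identity; the function name is illustrative, and only the statistic 12.4, the critical value 12.59, and the 6 degrees of freedom come from the text.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # (x/2)^k / k!, built up term by term
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even(12.4, 6)       # BP statistic reported in the text
p_at_critical = chi2_sf_even(12.59, 6)  # critical value quoted in the text
print(p_value, p_at_critical)
```

The p-value for the reported statistic lies slightly above 0.05, which agrees with not rejecting H_0, although only by a narrow margin.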

E. Spatial Autocorrelation Test
In Fig. 1, the adjacent locations have the same color categories. Thus, there may be a spatial dependency.
Table III reports the spatial autocorrelation test results. For the response variable (rice production), the p-value is less than α = 5%, so H_0 is also rejected, as it is for the residuals. It means that the residuals are correlated between regency or city locations. Moran's I value is positive, so the spatial pattern is a clustered pattern with positive spatial autocorrelation. The spatial autocorrelation tests therefore indicate spatial autocorrelation in both the residuals of the multiple linear regression model and the response variable. Because linear regression analysis is consequently less precise, the analysis is continued with spatial regression. The model that can be used is SAR. It is used when the response variable at a neighboring location (j) influences the response variable at the observation location (i).

F. Multicollinearity Test
Multicollinearity testing in the research is conducted using the Variance Inflation Factor criterion and correlation coefficients. Based on the VIF values in Table IV, the Phonska fertilizer (X_3) has a VIF > 10. It indicates that there is a multicollinearity issue between predictor variables. To determine which predictor variables are related to Phonska fertilizer (X_3), the researchers calculate the correlation coefficients between predictor variables. Table V shows a very strong correlation between Phonska fertilizer and the urea and ZA fertilizers. The correlation coefficient (r) is close to +1, and the p-value is 0.00 < α = 0.05. It strengthens the indication of multicollinearity in the rice production data. Phonska is a compound fertilizer containing three main nutrients, namely Nitrogen, Phosphorus, and Potassium, that are indispensable for plants. If the Phonska predictor variable is removed, it will cause a specification error in the model of rice production factors.
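The link between a pairwise correlation and the VIF threshold can be illustrated with the simplest case. With only two predictors, the R² from regressing one on the other equals r², so VIF = 1 / (1 − r²). The sketch below assumes this two-predictor shortcut; with more predictors, as in the study, each predictor has to be regressed on all the others. The data and names are made up.

```python
import math


def pearson_r(x, z):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)


def vif_two_predictors(x, z):
    """VIF = 1 / (1 - r^2); valid only when the model has two predictors."""
    r = pearson_r(x, z)
    return 1.0 / (1.0 - r * r)


toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up predictor values
toy_z = [2.0, 4.0, 5.0, 4.0, 5.0]  # made-up predictor values
r = pearson_r(toy_x, toy_z)
vif = vif_two_predictors(toy_x, toy_z)
print(r, vif)
```

In this two-predictor case a VIF above 10 corresponds to an absolute correlation above roughly 0.95.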

G. Estimation of Spatial Spillover (ρ) Degree Parameters in Spatial Autoregressive (SAR) Model
The first step in the spatial filtering process is to estimate the parameter ρ, which measures the degree of spatial spillover, in the SAR model (Eq. (1)). The SAR parameters are estimated with the Maximum Likelihood Estimator (MLE) method, and the estimated model is given in Eq. (10). Based on the estimation of the SAR model parameters, the estimated value of ρ is 0.301. The SAR model in Eq. (10) is not interpreted because its results cannot be trusted due to the violation of the non-multicollinearity assumption.

H. Spatial Filtering Process in Response Variables
The spatial filtering process substitutes the estimated ρ of 0.301 and the response variable (rice production) into the function (I − ρW)Y as in Eq. (2). The result is used as the new response variable for the ridge regression method.
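The filtering step amounts to subtracting ρ times the spatial lag WY from Y. The sketch below is a minimal illustration that uses the reported ρ of 0.301 but a hypothetical four-region chain and made-up production values instead of the 38 regencies and cities. The function name is an assumption.

```python
def spatial_filter(y, w, rho):
    """Return (I - rho W) y, that is, y minus rho times its spatial lag."""
    lag = [sum(wij * yj for wij, yj in zip(row, y)) for row in w]
    return [yi - rho * li for yi, li in zip(y, lag)]


# Row-standardized weights for a hypothetical chain of four regions.
w_toy = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
y_toy = [10.0, 20.0, 30.0, 40.0]  # made-up production values
y_filtered = spatial_filter(y_toy, w_toy, rho=0.301)
print(y_filtered)
```

Each filtered value is the original value reduced by 0.301 times the average of its neighbors, which removes the estimated spillover component before the ridge step.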

I. Parameter Estimation of Spatially Filtered Ridge Regression (SFRR)
The SFRR parameter estimation utilizes the bias constant (λ) and the spatially filtered response variable (Y_filtered,i). To form the SFRR model, the SFRR parameter estimates are returned to the scale of the original variables as in Eqs. (7) and (8). The SFRR model formed using rice production data of East Java in 2018 is given in Eq. (11). This SFRR model is not interpreted until it is confirmed that all assumptions are met.

J. Residual Assumption Test of Spatially Filtered Ridge Regression (SFRR)
In the assumption test of SFRR, there are two stages of testing: normality and spatial autocorrelation. These tests ensure that the assumption violation is resolved.

K. Parameter Test of Spatially Filtered Ridge Regression (SFRR)
Once it is confirmed that the assumption violations have been resolved, the estimated parameters of the SFRR model are tested partially to determine whether each parameter has a significant effect. Partial testing uses the Wald test statistic. If the Wald statistic is greater than χ²(0.05;1) = 3.841, H_0 is rejected, which means that the model parameter has a significant effect. Table VI shows that the predictor variables of rice field area, urea fertilizer, Phonska fertilizer, SP-36 fertilizer, and tractor in the SFRR model have a significant effect on rice production. ZA fertilizer has no significant effect on rice production.
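The decision rule can be illustrated with a short sketch. It assumes the common single-parameter form of the Wald statistic, the squared ratio of an estimate to its standard error, compared with the 3.841 critical value quoted in the text. The function names and the numbers passed in are hypothetical and are not values from Table VI.

```python
CHI2_CRIT_1DF_5PCT = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def wald_statistic(estimate, std_error):
    """Squared ratio of a coefficient estimate to its standard error."""
    return (estimate / std_error) ** 2


def is_significant(estimate, std_error, critical=CHI2_CRIT_1DF_5PCT):
    """True when the Wald statistic exceeds the critical value."""
    return wald_statistic(estimate, std_error) > critical


# Hypothetical coefficient and standard error pairs.
strong = is_significant(2.0, 1.0)  # Wald = 4.00, above 3.841
weak = is_significant(1.5, 1.0)    # Wald = 2.25, below 3.841
print(strong, weak)
```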

L. Direct and Indirect Effects
Dependency between location units affects rice production in East Java. It implies that a change in a predictor variable at one location will affect rice production at the same location (direct effect). In turn, it will affect rice production in other locations (indirect effect) [12]. The direct effect values of rice field area, urea fertilizer, Phonska fertilizer, SP-36 fertilizer, ZA fertilizer, and tractor are obtained from the diagonal elements of the matrix (I − ρW)^(-1) β_k. The values of ρ and β_k are taken from the SFRR model. As an example, the direct effects and their interpretation for Lamongan Regency are presented in Table VII. The direct effect value indicates that an increase in the area of rice fields in Lamongan Regency by 1 ha will directly increase rice production by 42.696 tons. An increase in urea fertilizer use by 1 ton will increase rice production by 3.046 tons. A rise in Phonska fertilizer by 1 ton will directly increase rice production by 3.907 tons. Similarly, an increase in SP-36 fertilizer by 1 ton will directly increase rice production by 15.092 tons. If the use of ZA fertilizer rises by 1 ton, it will directly increase rice production by 0.1029 tons. Finally, an increase of 1 tractor unit will change rice production by about 654.3145 tons.
Moreover, the indirect effect values of changes in rice field area, urea fertilizer, Phonska fertilizer, SP-36 fertilizer, ZA fertilizer, and tractor for each regency or city are obtained from the off-diagonal elements of the matrix (I − ρW)^(-1) β_k. An example of the indirect effect of Lamongan Regency on Gresik Regency is given in Table VIII. Every increase in rice field area in Lamongan Regency by 1 ha will indirectly increase rice production in Gresik Regency by 2.748 tons. An increase in urea fertilizer use by 1 ton will indirectly increase rice production in Gresik Regency by 0.196 tons. If Phonska fertilizer increases by 1 ton in Lamongan Regency, rice production in Gresik Regency will increase by 0.251 tons. Similarly, an increase in SP-36 fertilizer in Lamongan Regency by 1 ton will indirectly affect rice production in Gresik Regency by 0.971 tons. If ZA fertilizer in Lamongan Regency increases by 1 ton, it will indirectly increase rice production in Gresik Regency by 0.007 tons. Finally, an increase of 1 tractor unit in Lamongan Regency will indirectly increase rice production in Gresik Regency by 42.1 tons.
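The split into direct and indirect effects can be sketched with the spatial multiplier matrix (I − ρW)^(-1), the standard form for SAR-type models. The example approximates the inverse with the series I + ρW + ρ²W² + ..., which converges for a row-standardized W and |ρ| < 1. It uses the reported ρ of 0.301 but a hypothetical two-region map and a made-up coefficient, so the printed numbers do not correspond to Tables VII or VIII.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def spatial_multiplier(rho, w, terms=100):
    """Approximate (I - rho W)^(-1) by the series sum of (rho W)^k."""
    n = len(w)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]  # starts as the identity matrix
    for _ in range(terms):
        power = [[rho * v for v in row] for row in matmul(power, w)]
        total = [[t + p for t, p in zip(tr, pr)]
                 for tr, pr in zip(total, power)]
    return total


rho = 0.301                       # estimate reported in the text
w_two = [[0.0, 1.0], [1.0, 0.0]]  # two hypothetical regions that are neighbors
beta_k = 10.0                     # made-up coefficient of one predictor
effects = [[beta_k * v for v in row] for row in spatial_multiplier(rho, w_two)]
direct = effects[0][0]    # diagonal element: effect within a region
indirect = effects[1][0]  # off-diagonal element: effect on the neighbor
print(direct, indirect)
```

Because the spillover is damped by ρ at every step, the diagonal entries exceed the off-diagonal ones, which matches the finding that direct effects are generally larger than indirect effects.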

IV. CONCLUSION
The Spatially Filtered Ridge Regression model formed with queen contiguity weighting can solve the spatial autocorrelation and multicollinearity problems. The analysis has also been carried out to determine the direct and indirect effects. The variables of rice field area, urea fertilizer, Phonska fertilizer, SP-36 fertilizer, and tractor have a significant impact on rice production, both directly and indirectly. However, ZA fertilizer has no significant effect on rice production. The magnitudes of the direct and indirect effects of each predictor variable are also compared. Generally, direct effects are greater than indirect effects.
The research limitation is that only six predictor variables are used. Hence, future research is suggested to select more varied variables from many aspects, such as rice seeds, harvest area, and soil nutrients. Moreover, instead of a weighting matrix based on the concept of area, future research can use a weighting matrix based on the concept of distance.