Identifying the best spatial interpolation method for estimating spatial distribution of PM2.5 in Jakarta

Many researchers are currently analyzing the association between PM2.5 concentration and respiratory diseases. PM2.5 is one of the most threatening air pollutants for human health in cities and causes an increasing number of deaths. However, obtaining detailed PM2.5 concentration data is one of the obstacles to analyzing its relationship with human health effects. This study aims to select the best spatially explicit model for predicting PM2.5 in Jakarta and to estimate its spatial distribution in this region over the 2019-2020 period. PM2.5 observations were available from eight monitoring points spread across Jakarta. The data form a two-year daily time series from 2019 to 2020, which was then processed into annual averages. Seven spatial interpolation methods were compared to identify which produces the most realistic estimates of PM2.5 concentration. From the results, we conclude that Spline with Tension was the best interpolation method based on 2D visualization and model evaluation. In the model evaluation, the Spline with Tension method generated the best model with minimum error, with RMSE, MSE, MAE, and MAPE values of 0.0533, 0.0028, 0.0400, and 0.0008, respectively. Meanwhile, Ordinary Kriging with the spherical semivariogram model had the largest error.


Introduction
Jakarta, a coastal megacity, is one of the large cities with the worst air quality in Indonesia. Air pollution has become one of the most critical problems in megacities because of their dense populations [1,2]. The impacts of declining air quality include environmental damage, decreased visibility, global warming, and increased health effects on humans [3][4][5][6][7]. One of the harmful pollutants in Jakarta is particulate matter [5]. The concentration of particulate matter in Jakarta exceeds the threshold standard of the WHO [8]. Particulate matter (PM) is a mixture of solids and liquids with different sizes and chemical characteristics suspended in the air, collectively forming aerosols such as dust, dirt, soot, smoke, and water droplets [9,10]. Particles with diameters of less than 2.5 μm are referred to as PM2.5 [9].
The sources of particulate matter are divided into two groups, natural and anthropogenic. Natural particulate matter comes from volcanoes, forest fires, and biological sources. In contrast, anthropogenic sources include transportation, burning of fossil fuel in stationary sources, various industrial processes, solid waste disposal, and others, including agricultural activities and boron emissions from highways [11,12]. Based on the formation process, PM is divided into primary and secondary particles [11]. Primary particles are emitted directly from the source, while secondary particles are formed through chemical processes [11].
Anthropogenic activity is one of the causes of increasing PM2.5 concentrations in major cities, including Jakarta [11]. The main causes are the energy sources used by transportation, power plants, industries, and food production systems [13]. Moreover, other factors such as meteorology, topography, structure, and urban settlement problems also significantly influence the behavior and chemical transformation of pollutants [14]. Previous research has shown that PM2.5 is a harmful pollutant, both to the environment and to human health. PM2.5 is a dangerous type of pollutant because its effects on human health range from cardiovascular to respiratory diseases [4,5][15][16][17][18]. Therefore, research on PM2.5 is fundamental, especially in urban areas, which are prone to the risks of PM2.5 exposure.
Analyzing PM2.5 concentration is necessary to ascertain the influence of this pollutant on human health [19]. In addition, determining the distribution of PM2.5 concentration in a region enables the Government to make policies to reduce the concentration and, in turn, the risks of air pollution for humans and the environment. However, complete PM2.5 data for Jakarta are currently difficult to obtain. The limited number of air quality monitoring stations (AQMS) in Jakarta is one of the problems affecting the availability of pollutant data, especially PM2.5. Limited information on PM2.5 concentration hinders the analysis of its relationship with human health. The lack of spatial information on the distribution of PM2.5 is also an obstacle to research. Therefore, pollutant concentrations need to be estimated, and spatial interpolation is one way to do so.
Identifying appropriate interpolation methods in GIS applications to estimate pollutant concentrations poses several challenges [20]. Modeled fields are usually complex, the data are spatially heterogeneous, and sample data are scarce and may be noisy [20]. It is often difficult to find reliable, suitable interpolation tools. Choosing the wrong method can lead to wrong results and to interpretations based on misleading spatial information, thereby producing false spatial patterns [20]. Therefore, selecting adequate methods with appropriate parameters in a GIS application is critical for obtaining realistic spatial interpolation results [20].
The objectives of this study are: 1) to select the best spatially explicit model for predicting PM2.5 in Jakarta; and 2) to estimate the spatial distribution of PM2.5 in Jakarta over the 2019-2020 period. There is already a large body of literature on estimating the distribution of pollutants with interpolation methods [19,21,22]. This research compares several spatial interpolation methods for estimating PM2.5 concentration and identifies the method that produces the most realistic values. In the future, the results will be used to analyze the influence of PM2.5 on respiratory diseases in humans.

Study area
The research location is Jakarta. The city is located at 6°12' South Latitude and 106°48' East Longitude, with an average elevation of 7 meters above sea level. Jakarta is divided into five cities (South Jakarta, East Jakarta, Central Jakarta, West Jakarta, and North Jakarta) and one administrative district, Kepulauan Seribu. The city has a total area of 662.33 km² [23].
This study used eight sampling locations to estimate PM2.5 concentration, corresponding to the air quality monitoring stations (AQMS) spread across Jakarta. PM2.5 data were obtained from the Department of Environment of DKI Jakarta Province, the Indonesian Ministry of Environment and Forestry, and the United States Embassy. The stations are marked with the codes DKI 1, DKI 2, DKI 3, DKI 4, DKI 5, KLHK-GBK, US Embassy 1, and US Embassy 2, as shown in Table 1. Figure 1 shows the study area and the locations of the AQMS in Jakarta.

Data
This study used daily PM2.5 data from eight observation points in Jakarta, collected from the Environmental Agency, the United States Embassy, and the Ministry of Environment and Forestry. The simple average of daily PM2.5 concentrations was calculated to obtain the annual spatial distribution of PM2.5. Only pollutant data from the urban areas of Jakarta were used. Therefore, the air quality in Kepulauan Seribu was not analyzed, as the islands do not contribute significantly to emissions. We also collected coordinate data for all air quality monitoring station locations, as shown in Table 1.
Based on previous research, the impact of air pollution continues to be a significant concern in respiratory disease prevention, as it affects individuals living in urban areas around the world [17]. The latest two-year time series, from 2019 to 2020, was used so that air quality could be assessed both before and during the COVID-19 pandemic. Daily PM2.5 data were averaged into annual data to determine the changes from year to year. In addition, coordinate data were used for each air quality monitoring station in Jakarta. Eight coordinate points were used, one for each of the air quality monitoring stations listed in Table 1.
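The annual averaging step described above can be sketched as follows. This is a minimal illustration, not the processing actually used in the study: the function name `annual_means`, the `(station, date, value)` record layout, and the sample readings are hypothetical, and skipping missing days is an assumption, since the text does not state how gaps were handled.

```python
from collections import defaultdict
from statistics import mean


def annual_means(records):
    """Average daily readings per (station, year), skipping missing values."""
    groups = defaultdict(list)
    for station, date, value in records:
        if value is None:  # assumption: days without a reading are ignored
            continue
        year = int(date[:4])  # assumes ISO "YYYY-MM-DD" date strings
        groups[(station, year)].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


# Made-up daily readings in µg/m³, for illustration only (not study data)
daily = [
    ("DKI 1", "2019-01-01", 50.0),
    ("DKI 1", "2019-01-02", 54.0),
    ("DKI 1", "2020-01-01", 40.0),
    ("DKI 1", "2020-01-02", None),
    ("DKI 2", "2019-01-01", 60.0),
]
result = annual_means(daily)
```

Each key of `result` pairs a station code with a year, so the 2019 and 2020 values of one station can be compared directly.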

Spatial Interpolation Methods
Spatial interpolation methods are used to predict values of spatial phenomena in unsampled areas from a set of point data, i.e., data that have a value at a specific location in a study area [24]. Surface data, by contrast, divide the study area into cells, with a data value for each cell. A spatial interpolation method converts point data into surface data by dividing the study area into small cells and assigning a value to each cell location within the study area, whether sampled or not [24]. In this study, spatial interpolation methods were used to estimate PM2.5 concentrations by converting the available point data to surface data in order to determine the spatial distribution of PM2.5 in Jakarta. We used spline and kriging interpolation methods. Jakarta is an urban area with a high population density, and it is unusual in that it borders both an industrial area and the sea. Kriging was considered suitable for Jakarta because previous research found that kriging produced a more realistic model than the IDW method in urban areas with characteristics similar to those of Jakarta [19]. In addition, spline is a good method for estimating pollutants in urban areas [25].
Seven spatial interpolation methods were compared to select the best method for mapping the spatial distribution of PM2.5: Spline with Tension; Regularized Spline; Ordinary Kriging with spherical, Gaussian, and linear semivariogram models; and Universal Kriging with linear and quadratic semivariogram models.
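The conversion from point data to surface data can be illustrated with a small sketch. The code below is only an assumed, simplified picture of what a GIS does internally: `cell_centers` and `rasterize` are hypothetical helper names, and `predict` stands for whichever interpolation method is being evaluated.

```python
def cell_centers(xmin, ymin, xmax, ymax, cell):
    """Return rows of (x, y) cell-centre coordinates covering a bounding box.

    Rows run from north to south, as in a typical raster layout.
    """
    ncols = int(round((xmax - xmin) / cell))
    nrows = int(round((ymax - ymin) / cell))
    return [
        [(xmin + (c + 0.5) * cell, ymax - (r + 0.5) * cell) for c in range(ncols)]
        for r in range(nrows)
    ]


def rasterize(predict, grid):
    """Evaluate an interpolator predict(x, y) at every cell centre."""
    return [[predict(x, y) for x, y in row] for row in grid]


# Toy example: a 4 x 2 box with unit cells and a placeholder predictor
grid = cell_centers(0.0, 0.0, 4.0, 2.0, 1.0)
surface = rasterize(lambda x, y: x + y, grid)
```

Every cell receives a value whether or not it contains a monitoring station, which is what distinguishes the surface from the original point data.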

Spline
Spline is an interpolation method suited to flexible surfaces; it relies on a physical model in which flexibility is provided by changes in the elastic properties of the interpolation function [20]. Its main advantage is that it produces accurate and visually appealing surface data from only a few sample points [24]. Furthermore, it represents a two-dimensional curve on a three-dimensional surface [24]. The Spline interpolation methods used in this study were Spline with Tension and Regularized Spline.
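To make the idea of a spline surface concrete, the sketch below fits a plain thin-plate spline, which passes exactly through the sample points. This is a stand-in chosen for brevity and is not the Spline with Tension or Regularized Spline of the GIS software used in the study; those variants use modified basis functions with additional parameters. All function names, coordinates, and values are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def tps_basis(r):
    """Thin-plate spline radial basis U(r) = r^2 ln r, with U(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def fit_tps(points, values):
    """Return (weights, poly) so the surface passes through every sample point."""
    n = len(points)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            a[i][j] = tps_basis(math.hypot(xi - xj, yi - yj))
        a[i][n:] = [1.0, xi, yi]  # linear polynomial part
        a[n][i], a[n + 1][i], a[n + 2][i] = 1.0, xi, yi  # side conditions
    coeffs = solve(a, list(values) + [0.0, 0.0, 0.0])
    return coeffs[:n], coeffs[n:]


def predict_tps(points, weights, poly, x, y):
    """Evaluate the fitted surface at (x, y)."""
    smooth = sum(
        w * tps_basis(math.hypot(x - px, y - py))
        for w, (px, py) in zip(weights, points)
    )
    return poly[0] + poly[1] * x + poly[2] * y + smooth


# Hypothetical station coordinates (km) and annual means (µg/m³)
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 4.0)]
pm25 = [48.0, 55.0, 51.0, 60.0, 50.0]
weights, poly = fit_tps(stations, pm25)
centre_estimate = predict_tps(stations, weights, poly, 5.0, 5.0)
```

Because the surface is forced through the observations, differences between spline variants appear only between the stations, which is where tension or regularization parameters change the result.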

Kriging
Kriging is a stochastic method that uses a linear combination of weighted values at known locations to estimate the value at an unknown location [24]. This method requires as input a measure of the spatial correlation between two points, known as a variogram [24]. Furthermore, it uses the variogram as an essential tool for selecting the appropriate model, and it provides optimal, unbiased linear estimates for random spatial structures and parameters [26]. In this study, the air pollution measurements had complex and unstructured spatial variations. Therefore, this method relies on variograms that summarize these complex variations [19].
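The semivariogram models named in this study (spherical, Gaussian, and linear) can be written as simple functions of the separation distance h. The sketch below uses one common parameterization; the parameter names `nugget`, `partial_sill`, `rng`, and `slope` are assumptions, and software packages differ in how they scale the range, particularly for the Gaussian model.

```python
import math


def spherical(h, nugget, partial_sill, rng):
    """Spherical model: reaches nugget + partial_sill at h = rng, then stays flat."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    ratio = h / rng
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


def gaussian(h, nugget, partial_sill, rng):
    """Gaussian model: approaches nugget + partial_sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-((h / rng) ** 2)))


def linear(h, nugget, slope):
    """Linear model: grows without bound as distance increases."""
    if h == 0:
        return 0.0
    return nugget + slope * h


# Illustrative parameter values only (not fitted to the study data)
half_range_value = spherical(2.5, 0.0, 10.0, 5.0)
```

The three models differ mainly in how quickly the semivariance rises near the origin and in whether it levels off, which in turn changes the weights that kriging assigns to nearby stations.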
The main advantage of kriging is that it can quantify the error or uncertainty of the predicted surface [24]. The kriging methods used in this research were Ordinary and Universal Kriging, each with several semivariogram models. Ordinary Kriging assumes that the population mean is unknown and that the spatial data contain no trends or extreme values [27]. In Ordinary Kriging, the local variance of the data within the search ellipsoid is used to estimate the value at a specified location when the input data are similar [28]. The kriging variance is minimized, subject to the unbiasedness constraint, using an additional parameter called the Lagrange multiplier (μ) [28]. The Universal Kriging method is used for spatial data that are not stationary, under the assumption that the data are stationary once the trend is removed [27,28]. In such data, the mean varies and is not constant across the study area, and the variable is said to be non-stationary [28].
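The role of the Lagrange multiplier (μ) in Ordinary Kriging can be seen in a small worked sketch. The code solves the standard Ordinary Kriging system, in which the weights are constrained to sum to one, and returns both the estimate and the kriging variance. It is an illustration under assumed inputs, not the GIS implementation used in the study: the function names, the station layout, the values, and the linear semivariogram with slope 0.5 are all hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, variogram):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(points)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            a[i][j] = variogram(math.dist(p, q))
        a[i][n] = a[n][i] = 1.0  # unbiasedness constraint: weights sum to 1
    rhs = [variogram(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(a, rhs)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, rhs[:n])) + mu
    return estimate, variance, weights


def linear_variogram(h):
    """Hypothetical linear semivariogram without nugget."""
    return 0.5 * h


# Hypothetical stations on the corners of a 10 km square, values in µg/m³
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
pm25 = [48.0, 55.0, 51.0, 60.0]
estimate, variance, weights = ordinary_kriging(
    stations, pm25, (5.0, 5.0), linear_variogram
)
```

At the centre of this symmetric layout every station receives the same weight, so the estimate equals the simple mean of the four values, while the kriging variance gives the uncertainty measure mentioned above.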

Quantitative assessment
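The interpolation methods used to estimate PM2.5 values were validated using cross-validation and 2D visualization. Cross-validation was performed by removing a single sample point with a known data value and using the remaining values to predict the value at the location of the removed point [24]. The known values were then compared with the predicted values, and the accuracy of the predictions was calculated [24]. Cross-validation was used to confirm the robustness of the PM2.5 model [8]. Model evaluation was performed with four metrics, i.e., the root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), given here in their standard forms:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[z(x_i) - z^*(x_i)\right]^2}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left[z(x_i) - z^*(x_i)\right]^2$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|z(x_i) - z^*(x_i)\right|$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{z(x_i) - z^*(x_i)}{z(x_i)}\right|$$

where n is the sample size, z(xi) is the observed value at location i, and z*(xi) is the interpolated or predicted value at location i [24,29].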
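The leave-one-out procedure and the four error metrics reported in this study (RMSE, MSE, MAE, and MAPE) can be sketched as follows. The metrics follow their standard definitions, with MAPE expressed as a fraction. To keep the example short, inverse distance weighting (IDW) is used as a stand-in predictor; it is not one of the seven methods compared here, and the function names, coordinates, and values are hypothetical.

```python
import math


def idw(points, values, x, y, power=2.0):
    """Inverse distance weighting, used only as a simple stand-in predictor."""
    num = den = 0.0
    for (px, py), z in zip(points, values):
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def leave_one_out(points, values, predictor):
    """Predict each sample from all the others and return the predictions."""
    preds = []
    for i, (x, y) in enumerate(points):
        rest_pts = points[:i] + points[i + 1:]
        rest_vals = values[:i] + values[i + 1:]
        preds.append(predictor(rest_pts, rest_vals, x, y))
    return preds


def error_metrics(observed, predicted):
    """Return RMSE, MSE, MAE, and MAPE (as a fraction) for paired values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "MAPE": mape}


# Hypothetical station coordinates (km) and annual means (µg/m³)
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 4.0)]
observed = [48.0, 55.0, 51.0, 60.0, 50.0]
predicted = leave_one_out(stations, observed, idw)
scores = error_metrics(observed, predicted)
```

Swapping `idw` for another predictor with the same call signature yields comparable scores for each interpolation method. With only a handful of stations, each held-out point removes a large share of the data, so the scores should be read with caution.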

2D Visualization
Figure 2 shows the estimated spatial distribution of annual PM2.5 concentrations in Jakarta in 2019, obtained with seven different interpolation methods. The PM2.5 concentration distribution on the 2D map differs for each method. Results can differ even with the same technique and the same input point data, because different parameters may produce different surfaces [24]. The color contours on the map show the estimated PM2.5 concentration: dark brown signifies higher estimated values, and lighter colors signify lower values. The estimates obtained with the Spline interpolation method were more precise. This method also showed a more realistic distribution than the others, as seen from the color contours. The colors shown by the Spline method varied more than those of the kriging methods (Ordinary and Universal). The more color variation is shown, the more realistic the estimated results are taken to be. Maps generated with the Spline with Tension method showed a better distribution of color contours than those from the Regularized Spline, and its color contours were more realistic than those of the other methods. Therefore, the PM2.5 concentrations estimated with the Spline with Tension method were judged more accurate and indicated good model performance. On the other hand, the Regularized Spline estimates showed a rough spread of color contours, with fluctuating and extreme predicted PM2.5 values. Overall, the Regularized Spline model estimated higher concentrations in southern Jakarta than in northern Jakarta.
The kriging estimates, both Ordinary and Universal, showed more monotonous color contours in the 2D visualization. The estimates from Ordinary Kriging appeared to show only one color. Therefore, the model produced by Universal Kriging is better than that produced by Ordinary Kriging, as seen from the color variations of Universal Kriging. The Ordinary Kriging models with spherical, Gaussian, and linear semivariograms showed light brown contours with an estimated PM2.5 concentration ≤ 55.68 µg/m³ all over the Jakarta region. This indicates that the accuracy and performance of the Ordinary Kriging models were lower than those of the Spline with Tension, Regularized Spline, and Universal Kriging methods.
The distribution map of PM2.5 concentration generated with the Universal Kriging method displayed more varied color contours than that of the Ordinary Kriging method. Universal Kriging with the quadratic model produced extreme estimated values, but its results were more varied and realistic than those of Universal Kriging with the linear model. The linear model displayed color contours resembling stripes.
A smoother interpolated surface with fewer extreme values is not in itself an indication of good or bad model performance [24]. The maps from the seven interpolation methods show only the overall trend of the interpolated surfaces, and the models still need to be evaluated by quantitative assessment [24].

Quantitative Assessments
The calculated errors are shown in Table 2. Smaller values indicate fewer errors, meaning that the model is more accurate and performs better than the other models [24]. However, note that cross-validation only validates the accuracy of estimates or predictions at the sample locations and cannot reflect the differences between spatial interpolation techniques [29].
The model evaluation showed that the Spline with Tension method has the highest accuracy, as indicated by the lowest RMSE, MSE, MAE, and MAPE values (low error rate). On the other hand, Ordinary Kriging with the spherical model is the lowest-performing method, as its estimated or predicted values are inaccurate. The RMSE, MSE, MAE, and MAPE values of the three Ordinary Kriging methods were relatively high at 5.98, 35.77, 4.61, and 0.09, respectively. Therefore, the Ordinary Kriging method has the largest error. These results support the visual evaluation of the 2D maps of the seven methods discussed earlier. The Spline with Tension method generates the most accurate and realistic model and therefore performs better than the other models. The other methods fail to estimate the PM2.5 concentrations because the limited number of AQMS means that the variability of air pollution at each location is not well quantified, and the interpolation methods cannot account for variability without more monitored values [19].

PM2.5 Spatial Distribution
Spline with Tension was the best of the seven spatial interpolation methods based on 2D visualization and model evaluation. Figure 3 shows maps of the estimated annual distribution of PM2.5 concentrations in Jakarta in 2019 (left) and 2020 (right), generated with the Spline with Tension interpolation method. According to the estimates, the PM2.5 concentration in Jakarta in 2019 was relatively high at the edges of the region, especially in West and East Jakarta. Based on the contours shown, the highest estimated PM2.5 concentration in Jakarta in 2019 reached 65 μg/m³, and the lowest was 39 μg/m³. The areas with the lowest PM2.5 concentrations were the central part of Jakarta (Central Jakarta) and the northeast of Jakarta (North Jakarta). In general, the PM2.5 concentration in Jakarta in 2019 was relatively high in some regions.
The estimated PM2.5 concentration in 2020 showed a decrease in most areas of Jakarta. The annual PM2.5 concentration in 2020 was generally lower in the northern part of Jakarta than in the central part. Furthermore, the highest PM2.5 concentrations were found in small parts of western and eastern Jakarta. Based on the color scale, the highest concentration was below 65 μg/m³, and the lowest was 39 μg/m³. To clarify the changes in concentration from 2019 to 2020, Figure 4 shows a map of the changes in PM2.5 concentration.
The scale bar to the right of the map gives the value of each color on the map. Cyan indicates the largest change in PM2.5 concentration, while magenta signifies smaller changes. The map shows the difference in concentration between the period before the pandemic (2019) and the period during the pandemic (2020). The largest change occurred in the northern part of Jakarta, shown by cyan contours, with a change of 23.33 µg/m³. In contrast, the change in PM2.5 concentration was small in the western and eastern parts of Jakarta, as indicated by magenta, with a change of 2.79 µg/m³. The concentration in small parts of the central and southern areas of Jakarta also decreased, but not as much as in the northern part. In general, all areas of Jakarta showed a decrease in PM2.5 concentration.

Discussion
The lack of air quality monitoring stations in Jakarta hampers the analysis of the association between air pollution and human health. The quality and number of monitoring stations are critical. They provide data for assessing air quality, identifying potential and relevant sources of air pollution, strengthening air quality management and control, and advising policymakers, especially the Government [8]. The lack of air quality information is a problem that requires an urgent solution. At present, there are no satellite models or data for PM2.5 observations, and the available reanalysis data represent only black carbon. Therefore, remote sensing and GIS techniques are among the applicable solutions for estimating PM2.5 concentrations, allowing air quality to be managed for the protection of human health [30].
The main limitation of this research in generating models with GIS is the limited data from air quality monitoring stations in Jakarta. Moreover, the AQMS are clustered rather than evenly distributed. This limitation affects the estimates of the models, and the generated models differ dramatically from one another. The results can also be biased because of the small sample [24]. Moreover, a large search radius means that sample locations can be very far from the predicted location, so their relationship with the air pollution level at the predicted location becomes weak [19].
The best estimates came from the model generated with the Spline with Tension method. Evaluation through cross-validation and 2D maps showed that interpolation with Spline with Tension provided the most accurate and realistic results compared with the other methods. Therefore, the distribution of PM2.5 concentration was estimated using this method. In general, the PM2.5 concentration in 2020 was lower than in 2019. This decrease occurred during the pandemic. The Government policy of limiting all activity during the pandemic (PSBB) is assumed to be one of the factors that had a significant effect on the decrease in concentration in Jakarta. Another finding from the distribution map is that the western, southern, and eastern regions of Jakarta had the highest PM2.5 concentrations, while the concentration was relatively low in the northern and central parts. In the eastern part of Jakarta, around industrial areas such as Cakung and the Bekasi area, the PM2.5 concentration was high [31]. The main sources of pollutants may be industrial waste and large vehicles using fossil fuels [31]. Besides pollutant sources, meteorological factors, topography, structure, and urban settlement problems are factors that affect the behavior and transformation of pollutants [14]. Further analysis of the sources and causes of high PM2.5 concentrations in Jakarta is needed.

Conclusion
The lack of air quality monitoring stations in Jakarta is a weakness in analyzing the influence of air pollution on human health. Therefore, remote sensing and GIS techniques are one solution for estimating PM2.5 concentration. However, not all interpolation methods can generate accurate estimates. All models need to be evaluated with both 2D visualization and quantitative assessment (cross-validation) to determine which is best at predicting or estimating PM2.5 concentrations. This research showed that the Spline with Tension interpolation method generated the most realistic and accurate model, with better performance than the other methods. The RMSE, MSE, MAE, and MAPE values were 0.053, 0.003, 0.040, and 0.0008, respectively. These results showed that Spline with Tension is the method with the lowest error rate. In addition, its distribution map displayed more realistic and varied color contours without extreme values. The other methods failed to estimate the PM2.5 concentrations because the limited number of AQMS means that the variability of air pollution at each location is not well quantified, and the interpolation methods cannot account for variability without more monitored values. Moreover, the main advantage of the Spline method is that it produces accurate and visually appealing surface data from only a few sample points. The Spline with Tension method was used to make distribution maps of PM2.5 in Jakarta for 2019 and 2020. In general, the PM2.5 concentration in 2020 was lower than in 2019. The decrease in PM2.5 concentration occurred during the pandemic. We conclude that the decrease occurred because of the Government policy of limiting all forms of activity during the pandemic. In addition, the annual average concentrations showed that PM2.5 is high, or accumulates, in the western, eastern, and southern regions of Jakarta.