A generalized spatial autoregressive neural network method for three-dimensional spatial interpolation

. Spatial interpolation, a fundamental spatial analysis method, predicts unsampled spatial data from the values of sampled points. Generally, the core of spatial interpolation is ﬁtting spatial weights via spatial correlation. Traditional methods express spatial distances in a conventional Euclidean way and conduct relatively simple spatial weight calculation processes, limiting their ability to ﬁt complex spatial nonlinear characteristics in multidimensional space. To tackle these problems


Introduction
Due to the difficulties of establishing abundant observation stations and the existence of unobservable positions in space, research areas in geospatial subjects typically contain many unsampled data points.Estimating unknown data based on sampled point values and expanding discrete and sparse data into continuous field are the main goals of spatial interpolation models.Spatial interpolation is widely applied in many research fields, including air quality (Tang et al., 2017), climate and hydrology (Arowolo et al., 2017;Adhikary et al., 2017;Cheng et al., 2017), marine environment (Gao et al., 2020;Zhang et al., 2021), ecosystem (Pan et al., 2021), city (Hu et al., 2013;Szczepańska et al., 2020;Ma et al., 2019;Aumond et al., 2018), and agriculture (da Silva Júnior et al., 2019).Therefore, accurately fitting the spatial correlation between elements and improving model spatial interpolation abilities are important for exploring spatial distribution patterns and change trends and solving myriad problems encountered in nature and society.
According to Tobler's first law of geography, "everything is related to everything else, but near things are more related to each other" (Tobler, 1970).It proposes the existence of spatial correlation, which is a general feature of geospatial data as well as a core theory supporting spatial interpolation modeling.Following spatial correlation theory, most spatial interpolation methods define the value of an unknown point as the weighted sum of the values of surrounding sample points.In the spatial weight calculation process, spatial distance is the most fundamental and direct data used to measure spatial correlation.Therefore, (i) the expression of spatial distance and (ii) the solution method and precision of spatial correlation weights are the key to spatial interpolation modeling and determine the reliability of interpolation prediction.In fact, interpolation can be regarded as the prob-lem of mining complicated nonlinear relationships between spatial distances and spatial weights.
Since the 1950s, scholars have proposed various classical spatial interpolation methods through extensive practical explorations, including inverse distance weighted (IDW), Kriging, natural neighbor, spline, trend surface, and radial basis function, and they can use sampled points to model and restore spatial feature fields to a certain extent.Many studies have been conducted to improve and reform the traditional methods from the following perspectives: design search strategy of adjacent sampled points (Babak and Deutsch, 2009;Sun et al., 2020), change the measuring method of spatial distance (Greenberg et al., 2011;Aumond et al., 2018), change the calculation method of spatial weights based on data distribution characteristics (Lu and Wong, 2008;Li et al., 2020), incorporate other variables and information (Kumar et al., 2012;Adhikary et al., 2017), and improve the efficiency of interpolation calculation (Liang et al., 2018;Wang, 2015).However, most methods are still based on simple mathematical formulas and parameter calculations and have difficulty describing nonlinear and complex relationships in spatial processes.These limitations prevent these interpolation approaches from accurately reflecting the relevant characteristics of geographical elements, restricting their spatial interpolation abilities.
In recent years, machine learning theories have developed rapidly, which has provided new solutions for accurate spatial interpolation.A number of strategies and models of machine learning were introduced to solve interpolation problem, such as random forest (da Silva Júnior et al., 2019;Sekulić et al., 2020), support vector machine (Li et al., 2018;X. Zhang et al., 2017), and neural network (Rigol et al., 2001;Kanevski et al., 2008;Tao et al., 2019;Zeng et al., 2020).These models enable spatial interpolation methods to fit the nonlinear features.In particular, Zeng et al. (2020) proposed the spatial autoregressive neural network (SARNN) model for two-dimensional spatial interpolation by integrating the neural network with spatial autoregression theory, achieving superior performance compared with traditional spatial interpolation methods.However, these methods still lack consideration for the sufficient expression of spatial distance and their applicability in three-dimensional spaces with more complex feature fields.
With regard to spatial distance expression, traditional methods and the SARNN model employ Euclidean distances calculated using a fixed formula, treating all directions in space equivalently.However, spatial anisotropy, the universal feature of spatial element distribution and change, should be considered for accurate spatial interpolation, especially in three-dimensional space (Wu et al., 2020).For example, mineral resource distribution exhibits directional differences affected by geological structures (Samal et al., 2011), soil nutrient content gradients have specific orientation patterns (Abd El-Hady et al., 2018), and climate elements such as surface temperature and precipitation can be strongly direction-dependent on spatial scales (Chen et al., 2016;Y. Zhang et al., 2017;Wang et al., 2018).In three-dimensional spatial interpolation, spatial isotropic distance expression implies that any point with the same distance from a target point will exert the same effect on it, even if they are from different directions.It ignores the effects of differences and the complex coupling of various spatial axes on spatial weights, resulting in insufficient spatial correlation mining.
To address these limitations, we propose a generalized spatial distance neural network (GSDNN) unit to express distances in multidimensional space with nonlinear characteristics.In the GSDNN, generalized spatial distances between elements are fitted using multidirectional distance components.Furthermore, by combining the GSDNN unit with the SARNN, we integrated generalized distances into the spatial interpolation method and developed a generalized spatial autoregressive neural network (GSARNN) model to realize complex nonlinear spatial interpolation modeling in threedimensional space, improving spatial interpolation prediction and fitting abilities.
The remaining sections of this paper are organized as follows.Section 2 briefly introduces two traditional interpolation methods; defines the SARNN model and GSDNN unit; and describes the overall GSARNN model framework, training strategy, and evaluation method.In Sect.3, we perform interpolation experiments on two cases and compare the IDW, Kriging, SARNN, and GSARNN model results.The discussion and conclusion are given in Sects.4 and 5, respectively.
2 Generalized spatial autoregressive neural network

Traditional spatial interpolation
Interpolation methods can be divided into deterministic interpolation and geostatistical interpolation approaches, according to their mathematical principles.Deterministic interpolation, such as IDW, spline, and trend surface methods, builds the fitting surface according to the smoothness of the whole spatial surface or the similarities of spatial information elements to predict data in unknown regions.Geostatistical interpolation, such as the Kriging method, builds the sample point spatial structure by analyzing the distribution laws and relevant features of the sample points in space and predicting the change trend of the whole spatial area.

IDW interpolation
IDW interpolation (Shepard, 1968) is a deterministic interpolation method (Watson and Philip, 1985).IDW regards the value at an unsampled location as the distance-weighted average of the sampled point values (Longley et al., 2011).For an unsampled point, the closer the sampled point is, the greater an influence it exerts; the influence is inversely proportional to the distance.IDW can be expressed as where ẑi is the predicted value at the unsampled point i, z j is the observed value of point j , d ij is the Euclidean distance between point i and point j , and p is the power parameter that defines the weight decline rate as the distance increases.By defining a larger p, the influence of closer points is strengthened, affecting the smoothness of the interpolation results.
Due to the simplicity, convenience, and intuitiveness of the IDW method, it has been widely used in many fields, including geography, agriculture, oceanography, and environmental studies; however, extreme values among the sampled points can have a substantial impact on IDW spatial prediction results.

Kriging interpolation
Kriging methods, such as ordinary Kriging (OK), universal Kriging, and co-Kriging, are spatial interpolation methods designed to solve the problems of mineral deposit predication and error estimation (Krige, 1952;Matheron, 1963).These methods generate unbiased optimal variable estimations in a finite area using the variation function to perform moving average interpolation according to the differences of the sample points' positions and spatial correlation degree.Among the Kriging methods, OK is the most commonly used.
Kriging can be expressed as where z * (x 0 ) is the predicted value, and λ i and z (x i ) are, respectively, the weight coefficient and observed value of point i.
Kriging methods involve the calculation of the weight coefficient λ i , for which the key is to satisfy the unbiasedness and optimality.Unbiasedness means that z * (x 0 ) is the unbiased estimate of z(x), that is which can derive the following constraints on λ i : Optimality means that z * (x 0 ) is the optimal estimate of z(x i ), and the variance between the predicted value of the unsampled points and the estimated value of the observed points is the smallest, that is Var(z * (x 0 ) − z(x)). (5) Define the cost function and try to figure out a set of weights λ i that satisfy unbiasedness and minimize the cost function.Finally, the following equation set can be derived: where r ij is the semi-variogram between point i and point j , which can be expressed as where σ 2 is the variance of z(x), which is a constant in OK. r ij can be simply determined by z i and z j .Kriging assumes that there is a functional relationship between r ij and d ij (the distance between point i and point j ).By taking any two sampled points from the dataset, a total of n(n−1) 2 (r, d) pairs can be generated.We can use a linear, Gaussian, spherical, or exponential model to fit the relationship between r ij and d ij .
Using the fitted function, we can calculate r j 0 through d j 0 .Thereby, the optimal weight set λ i for Eq. ( 2) can be solved through Eq. ( 6).

SARNN model
Summarizing the principles of most traditional interpolation methods, it can be found that they are modeled following the core concept of fitting the relationship between spatial distance and spatial weight, a relationship that is often complicated, containing nonlinear characteristics.Thus, achieving accurate fitting using only simple mathematical functions is difficult.Establishing a nonlinear expression between the spatial distance d ij and the weight coefficient w ij is necessary to interpolate unsampled points from observed points.The spatial weight of sampled points to point i is defined as where w i represents the spatial weight vector of point i, w ij is the spatial weight of point j to point i, and d s ij is the spatial distance between point i and point j .
To characterize complex nonlinear relationships in space, Zeng et al. (2020) designed the SARNN model, exploiting the powerful modeling and nonlinear fitting capabilities of neural networks to fit the spatial weight w i .
It should be noted that there is a weight w ii in the vector w i which represents the spatial weight of point i to itself.To avoid overfitting, this weight should be set to 0: The spatial weights of all sampled point pairs can be expressed by an n•n weight matrix W. According to Eq. ( 9), the https://doi.org/10.5194/gmd-16-2777-2023 Geosci.Model Dev., 16, 2777-2794, 2023 weights on the diagonal of W should be reset to 0. Therefore, W can be defined as where ρ is the spatial weight component, and K is the standard weight component, which ensures that the neural network weight is independent of the point itself in the training process.k ij in K can be expressed as Next, the problem of solving the spatial weight can be transformed into the problem of constructing and training the neural network.The distance from the point to be interpolated to the observed point is the network input, the hidden layers are defined, and the spatial weight vector ρ i is the output, that is where T represents the vector of distances from point i to other sample points, and ρ ij represents the spatial weight of point j to point i. ρ ij is correspondingly multiplied by k ij to obtain the weight coefficient w ij .The matrix form is as follows: The product of the final spatial weight matrix W and the sampled value vector y is the unsampled point estimation results.ŷ can be expressed as 2.3 GSARNN model

Model definition
Spatial distance is the most important indicator of the relationship between two objects as well as the basis of spatial weight fitting.The essence of spatial interpolation is establishing a distance-based mapping relationship between the sampled region and the unsampled region.
For any two vectors α, β in the n-dimensional linear space V , there are a pair of coordinates α under the orthonormal basis.There are many ways to define spatial distances, such as the Manhattan distance, Euclidean distance, and Minkowski distance, which can be expressed as The traditional two-dimensional spatial interpolation methods always use Euclidean spatial distance as the basis for expressing spatial correlation, treating different spatial relative positions equivalently.However, in geographic space -especially in three-dimensional and higher-dimensional spaces -the changing trend and rate of elements often differ along various axes, and there is local variability in the data.Using Euclidean distance for three-dimensional spatial interpolation is an isotropic solution (Allard et al., 2016) that reduces the dimensionality of the raw data, discards a large amount of relative position information between points, and cannot adequately reflect the complicated nonlinear characteristics of data change, restricting the accuracy of interpolation in multidimensional linear space.
To solve these problems, we propose a generalized expression of spatial distance.The generalized spatial distance d G ij of α and β in n-dimensional linear space is defined as the function of the coordinate difference under the orthonormal basis, which can be expressed as The distance components of the point to be interpolated (x, y, z) and the known sample point (x i , y i , z i ) under the three-dimensional orthonormal basis are To fully and adaptively capture the nonlinear effect of the elements' changing trend in three-dimensional space, we designed a GSDNN unit that generates generalized spatial distances considering anisotropy based on the distance components of each axis.It can be simply expressed as Through network training, the generalized spatial distance automatically output by this network unit will reflect the complex characteristics of the specific spatial elements.The GSDNN structure is shown in the dashed box in Fig. 1.By replacing the input of the SARNN model with the GS-DNN unit, Eq. ( 12) can be refined as The refined model is the GSARNN model, and the overall model structure is shown in Fig. 1.
In the modeling process, the distance components from the unknown point to the sampled points in three-dimensional space are input into the GSDNN, and all GSDNN units share network weights and biases.Through the training process, the generalized spatial distance between the two points under the specific spatial context of the interpolation element is output and simultaneously becomes the input of GSARNN.After the hidden layer calculations, the output layer finally outputs the spatial weight component, which is multiplied by the standard weight component and the observed values of the sampled points.The sum of the output tensor is the interpolated value of the unsampled point.Note that since there is no recognized true value of generalized spatial distance for training process, the GSDNN unit can only be embedded in the neural-network-based method and participates in its overall training and calculation process.In other words, the generalized spatial distance is determined by the spatial characteristics of the elements to be interpolated, owning a specific connotation based on specific context of spatial elements.2014) is integrated to strengthen the generalizability of the model.

Model training and validation
We use the 10-fold cross-validation method for model training.The dataset is randomly divided into 10 equal portions, among which 9 portions serve as the training set, and the remaining portion is used as the validation set in turn.The training set is used to fit the data characteristics, and the validation set is used to verify the generalization performance of the model.The cross-validation method averages the training results of each group, reduces the sensitivity to data division, https://doi.org/10.5194/gmd-16-2777-2023 Geosci.Model Dev., 16, 2777-2794, 2023 Figure 3.The variable learning rate change line.

Evaluation method
To quantitatively measure the performance of the IDW, OK, SARNN model, and GSARNN model methods, we use the determination coefficient (R 2 ), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) as evaluation indicators.The formulas are as follows: R 2 is a relative indicator that compares the model with the baseline using the average value as the interpolation result.The RMSE, MAE, and MAPE are absolute indicators that reflect the interpolation error; smaller values indicate higher model accuracy.

Experiments and results
We use two three-dimensional datasets with distinct characteristics to test the interpolation performance of the GSARNN model in different scenarios, comparing it with the traditional IDW and OK methods and the SARNN model.In case one, we conduct experiments using a simulated dataset, which can be generated arbitrarily and controllably.By simulating a dataset with complex features and conducting a quantitative cross-validation interpolation experiment on it, we can fully test the feature extraction and fitting ability of the GSARNN model.In case two, we experiment on a measured Argo temperature dataset in the western Pacific area, which reflects the authentic natural characteristics.In this case, in addition to the cross-validation interpolation, we select several spatial sections for interpolation prediction.By qualitatively analyzing the section interpolation results, the GSARNN model's ability to restore spatial element field patterns in practical interpolation applications is examined.

Case one: simulated dataset
To examine the ability of the GSARNN model to handle data with complex characteristics in three-dimensional space, we combine trends of gradual change and sudden variation to simulate a dataset in the three-dimensional spatial field, repeating the simulation and interpolation for 100 times.

Dataset
Defining the three-dimensional area of a cube with the unit length, the side length of the cube is 6, and the distance between adjacent data points is defined as 0.5.Therefore, the three-dimensional spatial research area contains 12•12•12 = 1728 data points, and the spatial coordinates (x i , y i , z i ) of each point are generated according to the following formulas: The simulated value V is defined as V 1 is a three-dimensional spatial data item with a gradual change trend, calculated as follows: where (x i , y i , z i ) are the coordinates of the sample point i in the three-dimensional spatial field.V 1 has a gradual gradient along the z = y = x direction, as shown in Fig. 4a.
V 2 is a three-dimensional spatial data item with local spatial variability, calculated as follows: V 2 has a spatial mutation at the two spherical surfaces 1.5 and 3 units away from the center of the three-dimensional space.The values of the entire dataset appear to diffuse from the center of the sphere to the surroundings, as shown in Fig. 4b.V 2 possesses high local spatial variability, which imposes certain challenges to the interpolation work.
The term ε is a random item that brings some uncertainty to the simulated dataset and can be expressed as The final three-dimensional spatial simulated dataset V is generated by adding V 1 , V 2 , and ε, as shown in Fig. 4c.The figure shows that the simulated data have both the gradient feature from V 1 and the mutation feature from V 2 .
This case compares the interpolation abilities of the GSARNN model and the other three models in threedimensional space using the simulated dataset above.

Experiment implementation
According to the data division method, the simulated dataset is randomly divided into 10 equal parts for the crossvalidation experiments.Each experiment has 1555 data points in the training set and 173 data points in the validation set.The validation set interpolation results of each fold are merged to obtain the complete interpolated dataset.
Considering that the four-layered feedforward network is a simple but efficient network structure (Tamura and Tateishi, 1997), we design the GSARNN architecture with one input https://doi.org/10.5194/gmd-16-2777-2023 Geosci.Model Dev., 16, 2777-2794, 2023 layer, two hidden layers, and one output layer.The number of neurons in the input layer and output layer is equal to the number of sample points in the training set.There is no standard method to determine the optimal number of neurons in two hidden layers.Instead, we determine the optimal number using a simple combination strategy proposed by Du et al. (2020).Table 1 lists the optimal network structure settings and hyper-parameters of the GSARNN model in this case.Besides, the power parameter of the IDW method is 4, and in the Kriging method we adopt the Gaussian model to fit the functional relationship between the semi-variogram and the spatial distance, which turns out to be the optimal variation function model among linear, Gaussian, spherical, and exponential models.The generalized spatial distance output by the GSDNN unit serves as the input for the GSARNN model, while the three-dimensional spatial Euclidean distance serves as the input for the three comparison methods.
The GSARNN and SARNN models are implemented using TensorFlow-GPU 1.13.0 and Python 3.5.4.

Results
Under the same conditions, interpolation experiments are conducted on the three-dimensional simulated dataset 100 times using the IDW and OK methods and the SARNN and GSARNN models.The mean statistical indicator results of the cross-validation experiments are shown in Table 2.
Compared with the traditional IDW and OK methods, the two neural network methods show significant improvements on all statistical indicators.The R 2 value of the SARNN model (0.8804) is 19.62 % higher than that of the OK method (0.7360).After integrating the GSDNN unit, the R 2 increases by 5.95 % to 0.9328 for the GSARNN, for an overall increase of 26.74 %.In addition, the RMSE, MAE, and MAPE values of the OK method are 1.4466, 1.0040, and 92.69 %, respectively, decreasing to 0.7298, 0.5280, and 40.99 %, respectively, for the GSARNN.Among the four models, the GSARNN model achieves the best performance in all indicators.
Figure 5 shows the three-dimensional diagrams of the simulated dataset example in Fig. 4 and its corresponding cross-validation interpolation results generated by the four methods.Taking the simulated dataset as a reference, all four methods express the overall change trend, but the IDW and OK methods perform poorly in the mutation area, which presents as the weakening of the mutation trend and the existence of an obvious interpolation transition zone.The interpolation results of the SARNN and GSARNN models capture and display the mutation characteristics well, and the overall pattern is basically consistent with the simulated data.
Figure 6 shows the detailed interpolation results of Fig. 5 in the form of section images, which are cut along the X-Y plane.In the IDW and OK method results, the low values of the mutation area are obviously overestimated, and the high values are underestimated.Moreover, under the influence of the central high value, unexpected imprints appear in the 4th and 10th layers, which reflects the limitations of traditional interpolation methods in handling mutation.The SARNN and GSARNN models largely restore the data characteristics of the simulated dataset, and the mutation is properly interpolated.Furthermore, the GSARNN model achieves more accurate results at the mutation interface.
The real value and the cross-validation interpolation result values of the four models in Fig. 5 are drawn in line charts in Fig. 7. To evaluate the model performance in different value ranges, the line charts are drawn in ascending order of the real value, which is shown as a rising blue curve.The red line is connected by the model interpolation result points corresponding to the points in the simulated dataset, shown as a fluctuating broken line.In the median value area, the interpolation results of the four methods fluctuate relatively slightly near the real value.The IDW and OK method results show obvious low-value overestimation and high-value underestimation in a large range of low and high values, corresponding to both sides of the mutation interface.Limited by the interpolation mechanism and simplicity of traditional methods, it is difficult for them to interpolate elements containing mutation characteristics.However, the fluctuation amplitude and deviation degree of the OK method result are slightly smaller than those of the IDW method.The interpolation performance of the SARNN and GSARNN models in each value

GSDNN
Input Hidden 1 Hidden 2 Output  range is comparatively stable.Only a slight overestimation is observed in the low-value area, but there are individual points with large deviations in the high-value area.By contrast, the interpolating capacity of the GSARNN for mutant elements is significantly better than that of the SARNN.
In addition, compared with multifarious models in the fields of deep learning, the structure of GSARNN is relatively lightweight, so its training and calculation efficiency can be quite high.Taking advantage of mighty parallel computing capabilities of GPU units and distributed computing structures to accelerate the training process, the GSARNN model usually converges to the optimal state within 15-20 min in our cases.Although the efficiency of the Kriging method is better than the GSARNN model, under the same condition, it still takes about 10 min to fit the functional relationship between the semi-variogram and the distance.

Study area and dataset
The second case uses the measured Argo ocean dataset.The study area is in the northern part of the western Pacific, which is located near the Equator, and is one of the main sources of atmospheric water vapor.The sea-atmosphere interaction in this area is strong and exerts certain influences on natural phenomena such as El Niño (Jian and Jin, 2008); therefore, it is of practical significance to conduct research in this region.Water temperature is one of the most important oceanographic elements.Because the western Pacific is the divergent center of three major monsoon circulations and multiple ocean currents converge here, the seawater temperature in https://doi.org/10.5194/gmd-16-2777-2023 Geosci.Model Dev., 16, 2777-2794, 2023 this area has a substantial impact on the natural environment.This case uses the sea temperature in the western Pacific as the interpolation object.Three-dimensional temperature data were obtained from the Argo (Array for Real-time Geostrophic Oceanography) project, which was initiated to study global oceanic climate change.The Argo observation network has launched 3000 profile buoys that measure the ocean temperature and salinity in the depth range of 2000 m (Riser et al., 2016).Argo data have become the main source of marine climate infor-mation and are widely used in marine and climate research (Liu et al., 2017).However, the Argo buoys are sparsely distributed, and the practical applications of the discrete data they collect are limited.Therefore, interpolating Argo data is necessary for generating a continuous data field and enhancing the practicability of the data products.
The data used in this case were obtained from China's Argo Real-time Data Center (http://www.argo.org.cn/, last access: 6 May 2022).The data collection time is early August 2018; the space range is 0-34 • N, 115-160 • E; and the depth  range is 0 to 1000 m below the sea surface.The data include measurements from 144 buoy stations and 1944 monitoring items.The distribution of buoy stations on the sea surface is relatively uniform, as shown in Fig. 8.
A three-dimensional visualization of the Argo dataset is shown in Fig. 9.The temperature field data in the western Pacific region are distributed regularly, with obvious and uniform variation trends and strong spatial correlation.Little temperature variation is observed in the longitudinal di-rection.In the latitudinal direction, the boundary between the low-temperature region and the high-temperature region sinks obviously, and the overall temperature increases with increasing latitude from 0 to 35 • N. In the vertical direction, the temperature decreases gradually with increasing water depth.The mean, minimum, maximum, and standard deviation of the Argo temperature dataset are shown in Table 3.
This case compares the interpolation abilities of the GSARNN model and the other three models in threedimensional space using real temperature data collected by Argo buoys.

Experiment implementation
The model details are determined in a similar way to case one.The optimum network structure settings and hyperparameters of the GSARNN model for case two are listed in Table 4.

Results
Under the same conditions, interpolation experiments are conducted on the three-dimensional measured Argo dataset using the IDW, OK, SARNN model, and GSARNN model methods.The statistical indicators for the cross-validation experiments are shown in Table 5.In contrast to the simulated dataset of case one, the values of the Argo dataset mainly change in a gradual manner, which is relatively simple.Therefore, all four methods achieve satisfying interpolation experimental results on the whole.However, we notice that certain differences of interpolation accuracy exist in https://doi.org/10.5194/gmd-16-2777-2023 Geosci.Model Dev., 16, 2777-2794, 2023  Figure 10 shows three-dimensional diagrams of the crossvalidation interpolation results generated by the four methods and their corresponding interpolation errors.In interpolation error diagrams, red represents overestimation and blue represents underestimation.Taking the real dataset in Fig. 9 as a reference, the four models restore the data features in most areas, which is consistent with the statistical indicator results, with small differences in some details.The IDW method evidently underestimates the temperature at shallow depths, which may be because its interpolation mechanism can produce large errors at the edge points of a given space.In the OK results, the coexistence of underestimation and overestimation around the sea surface is observed, indicating that the OK method also has some limitations in edge-area interpolation.The SARNN and GSARNN models slightly overestimate the temperature of the bottom area.The error of GSARNN is generally smaller than that of SARNN.Further quantitative analysis is needed to elucidate more details of the interpolation experiment results.
The cross-validation interpolation result values of the four models and the real values are respectively drawn as line charts and scatter diagrams, as shown in Fig. 11a and b.In the low-value area, the fluctuations of the four models are generally small.Several points with large errors are in similar positions for all models, indicating the presence of potential outliers in the dataset; however, the GSARNN model has the strongest ability to minimize these errors.In addition, the IDW, SARNN, and GSARNN methods marginally overestimate the lowest value.Entering the median area, the fluctuation of the four models begins to increase gradually; IDW produces the highest amplitude, followed in descending order by the OK, SARNN, and GSARNN methods.The GSARNN method avoids potential large errors in several positions to the greatest extent.In the high-value area, there  are significant performance differences among the four models.The IDW method underestimates the high values across a large range, the SARNN model slightly underestimates them, the OK method fluctuates around the real values, and the GSARNN model hovers within a narrow range.The information conveyed by the scatter diagrams is consistent with the line charts.The scatter diagrams show that the scatter points of all four methods are concentrated around the diagonal, and the trend lines almost coincide with the standard trend line.Among them, the performance of the GSARNN is quantitatively best.
To compare the visual performance and effects of the four methods for practical interpolation applications, we interpolate and render horizontal sections at 100 m depth intervals in this area.Each method generates nine sections of 0-800 m depth, as shown in Fig. 12.The four methods produce similar interpolation results on the overall pattern, but there are great differences in detail.Due to the sparsity of the sampled points, the points closer to the section have a more prominent impact than the distant points in the interpolation results of the IDW method, producing many noticeable speckles on the interpolation surface.The OK method uses the statistical calculation process to fit the spatial features to a certain extent, alleviating the speckle problem; however, uneven color bands with abrupt color changes can still be observed.The SARNN and GSARNN models fit the continuous temperature field characteristics using the same set of sparse Argo temperature data.The overall change trend of the interpolated sections is consistent with the traditional methods but is significantly smoother and more uniform, reflecting the actual temperature field characteristics.Compared with the SARNN, the GSARNN presents richer details on the basis of smoothness, more exhaustively describing the ocean temperature field characteristics, showing the qualitatively best performance.

Discussion
In case one, the comparison between two traditional interpolation methods and two neural-network-based methods demonstrates that introducing neural networks for powerful nonlinear fitting improves interpolation performance, enabling the adequate extraction and construction of complex change characteristics of spatial elements such as mutation.The comparison of the SARNN and GSARNN models shows that deconstructing and remodeling the expression and solution of spatial distances, and subsequently applying the generalized expression in interpolation calculations, enables the model to mine and restore the characteristics of the original https://doi.org/10.5194/gmd-16-2777-2023 Geosci.Model Dev., 16, 2777-2794, 2023 data to the greatest extent, effectively improving the interpolation accuracy and optimizing the interpolation result.
In case two, the section interpolation prediction performance of the four methods varies considerably.The spatial distribution of the Argo buoys is sparse, uneven, and irregular, which is common in most practical interpolation scenarios.When interpolating such datasets, the traditional methods tend to produce dominant weights on the points adjacent to the point to be interpolated, which may lead to disproportionate regional impacts of specific sample points around them, resulting in uneven speckles and bands.Traditional methods lack the global consideration of the comprehensive effect of all sample points on the interpolation area.In contrast, neural-network-based models generate a smoother interpolation surface than traditional methods.This indicates that neural-network-based models can greatly reduce the influence of local extreme points on points to be interpolated and acquire quite reasonable spatial patterns of geospatial elements exploiting the non-linear fitting ability of neural networks.In particular, the GSARNN model incorporates the raw coordinate vectors as the network input and fits the generalized spatial distances in the three-dimensional spatial element field, extracting more detailed data features, generating interpolation results that are more consistent with reality.
In summary, in case one, we test the quantitative interpolation performance of the four methods on a dataset with complex characteristics; in case two, we examine the qualitative performance of the four methods in a practical interpolation application.The experiment results indicate that traditional methods are sensitive and dependent on the spatial distribution and data characteristics of the sampled points.By applying the concepts of neural networks, spatial autoregression, and generalized spatial distances to three-dimensional spatial interpolations, the GSARNN model can effectively optimize the interpolation result and improve the adaptability of interpolation methods in various scenarios.

Conclusions
In this study, we focus on the integration of interpolation and neural network model in three-dimensional space, in which the spatial elements possess complex characteristics.To improve the interpolation effect, we remodel the expression and solution of spatial distances and spatial weights -two critical elements in spatial interpolation.For spatial distance, we employ the generalized spatial distance expression and propose a GSDNN unit to adaptively generate the generalized spatial distance, replacing the conventional Euclidean spatial distance as the interpolation network input.For spatial weight, we construct the GSARNN model by integrating the GSDNN unit into the SARNN model.Exploiting the powerful feature extraction and nonlinear fitting abilities of neural networks, we can realize accurate spatial weight calculations.
Experiments are conducted on two three-dimensional cases: a simulated case and a real Argo temperature case.The GSARNN model is compared with the traditional IDW and OK methods and the advanced SARNN model.The experiment results indicate that the GSARNN model achieves the best interpolation performance among the four methods, especially on the complex three-dimensional spatial dataset with discontinuous features and sparse and irregular distribution.The GSARNN model can effectively extract subtle spatial correlation characteristics and accurately fit the spatial weights, adapting well in three-dimensional space.
The GSARNN can perform spatial interpolation with high accuracy at the cost of longer model training and calculation time.Therefore, the GSARNN is more suitable for interpolation scenarios with complex characteristics and strict demands on the result quality.For interpolation tasks with relatively simple spatial characteristics and specific requirements for efficiency, traditional methods may be a better choice.
In the future, we plan to consider the time dimension in addition to the spatial dimension to develop an accurate spatiotemporal data interpolation model.Meanwhile, based on the interpolation-dependent variable, the relevant regression variable factors can be introduced for further interpolation statistical analyses.In addition, as the number of sampled points increases, the number of input neurons and output neurons of the GSARNN will also increase, resulting in the expansion of network parameters and the extension of training time inevitably.Therefore, how to maintain a stable and acceptable training time given different sample data volumes is an important problem to be tackled in further research.
Review statement.This paper was edited by Rohitash Chandra and reviewed by Chao Ma and three anonymous referees.
To improve the transferability and adaptability of the GSARNN and solve the problems of overfitting and gradient vanishing in neural network training, we design a set of model training strategy based on the cross-validation method, including the overall training framework, parameter initialization method, activation function definition, and training optimization algorithms.A complete set of training processes is established to improve training quality and interpolation accuracy, as shown in Fig. 2. We employ several neural network structure design and model optimization techniques to improve training efficiency.For each hidden layer, we first use the robust parameter initialization method proposed by He et al. (2015).Second, the batch normalization method of Ioffe and Szegedy (2015) is adopted to accelerate the model training convergence speed and improve the training process stability.Third, the PReLU (parametric rectified linear unit) proposed by He et al. (2015) is used as the activation function to improve model performance.Finally, the dropout strategy developed by Srivastava et al. (

Figure 2 .
Figure 2. The network training framework of the GSARNN model.

Figure 4 .
Figure 4.An example of a simulated dataset.(a) Component V 1 with a gradual change trend; (b) component V 2 with a mutation; (c) the simulated dataset combining V 1 , V 2 , and ε.

Figure 5 .
Figure 5. Three-dimensional diagrams of the simulated dataset example in Fig. 4 and its corresponding cross-validation interpolation results.

Figure 6 .
Figure 6.Section images cut along the X-Y plane of Fig. 5.

Figure 7 .
Figure 7. Line charts of real and interpolated values of the four models in Fig. 5.

Figure 8 .
Figure 8. Distribution map of Argo buoy stations represented by blue points, 144 stations in total (the base map is from ESRI maps).

Figure 9 .
Figure 9. Three-dimensional visualization of the Argo temperature dataset.

Figure 10 .
Figure 10.Three-dimensional diagrams of the cross-validation interpolation results and interpolation errors.

Figure 11 .
Figure 11.The quantitative analysis results of the four models.(a) Line charts in ascending order of real value; (b) scatter diagrams with trend lines.

Table 1 .
Network structure settings and hyper-parameters of the GSARNN model in case one.

Table 2 .
The mean statistical evaluation results of 100 experiments on the simulated dataset using the IDW, OK, SARNN, and GSARNN methods.

Table 3 .
Basic statistics of the Argo temperature dataset with 1944 monitoring points in total.

Table 4 .
Network structure settings and hyper-parameters of the GSARNN model in case two.

Table 5 .
The statistical evaluation results of the Argo temperature dataset experiments using the IDW, OK, SARNN, and GSARNN methods.