Geostatistical characterization of the soil of Aguascalientes, México, by using spatial estimation techniques

Four spatial estimation techniques available in commercial computational packages are evaluated and compared, namely: regularized splines interpolation, tension splines interpolation, inverse distance weighted interpolation, and ordinary Kriging estimation, in order to establish the best representation for the shallow stratigraphic configuration in the city of Aguascalientes, in Central Mexico. Data from 478 sample points along with the software ArcGIS (Environmental Systems Research Institute, Inc. (ESRI), ArcGIS, ver. 9.3, Redlands, California 2008) to calculate the spatial estimates. Each technique was evaluated based on the root mean square error, calculated from a validation between the generated estimates and measured data from 64 sample points which were not used in the spatial estimation process. The present study shows that, for the estimation of the hard-soil layer, ordinary Kriging offered the best performance among the evaluated techniques.

method is to simulate a sector, testing different estimation techniques and compare the results with the simulated reality (Bell and Whateley 1994).
The objective of this study is to evaluate and compare four spatial estimation techniques: the regularized spline interpolation, by splines in tension, by the inverse distance weighted and ordinary Kriging estimation to determine which best suits the surface stratigraphy settings in the city of Aguascalientes.

Geographical location
The city of Aguascalientes is located between the extreme projected coordinate X1 = X2 = 2416000 and 2430000 north latitude, and between Y1 = Y2 = 774.000 and 786.000 west longitude coordinate system Universal Transverse Mercator, North American Datum of 1927 (UTM-NAD27, Zone 13 North) (Fig. 1), with an average elevation of 1870 m above mean sea level, nestled in the Valley of Aguascalientes which is a morphological subunit crosses the east-central part of the state, with an orientation substantially NS, and that an elongated depression is between 10 and 20 km. wide, extending beyond state boundaries, from south of Zacatecas to the eastern end of Jalisco (Arroyo- Contreras et al. 2004). Depression is an alluvial plain formed by more or less regular reaching an altitude between 1850 and 1900 m levels, where the river runs central Aguascalientes (also known as San Pedro River), which flows from north to south and is the most important state flow. The Valle de Aguascalientes corresponds to a tectonic graben structure, known in geological literature as the "Graben of Aguascalientes", bounded on the east and west by two large faults oriented north-south, this is not no rock formation emerges as filling materials are unconsolidated alluvial deposited on a bedrock these fillers thickness reaches more than 600 m. in the deeper parts and not less than 200 m. in the more shallow (Aranda-Gómez and Aranda-Gómez 1985;Arroyo-Contreras et al. 2004).
From 1979 cracks were detected in the urban area of the city of Aguascalientes and other towns in the surrounding area as Jesus Mary and Puertecito Chicalote of the Virgin, all located in the Quaternary alluvial fill (Aranda-Gómez and Aranda-Gómez 1985). The direction of these cracks coincides with the general direction of the valley with maximum vertical displacement of 1 m or more. Invariably, with few exceptions, the downthrown the crack is located to the west of the city, that is, toward the San Pedro River. On the eastern edge of the city is located a normal fault with approximately NS direction, which is considered that originated the Graben of Aguascalientes, from this failure, the surface of the city has an east-west slope greater than 1 %.

Methods
The computer software ArcGIS (ESRI 2008) was used to implement the spatial estimation techniques by regularized splines, Splines in tension, weighted inverse distance and ordinary kriging estimation, and determine which one best represents the stratigraphy surface conditions of the city of Aguascalientes.
The data used in estimating the records consist of 478 polls open pit type (PCA) performed within the urban area of the city (Fig. 1), which were obtained from the files of the Department of Splits Development Secretariat Aguascalientes urban township, and company Integral Civil Works Supervisor, S. A. de CV, from the same sources were obtained records of 64 polls PCA type ( Fig. 1) with additional assessment was made of each technique.
The open pits are scattered all over, with no obvious pattern of the spacing between them, as they were made for particular studies and do not share the same information among each other. The depth of the open holes varies between 3.00 and 5.00 m. Generally speaking, two layers can be identified: a collapsible surface layer composed mainly of clay or silt of low compressibility (CL or ML) and a second resistant layer, with high stiffness, composed by cemented silty sands or clayey materials (SM or SC). The depth of the hard layer is the basic parameter for the analysis. The mechanical properties of the soil obtained from a stratigraphic profile were used to establish the depth of this layer.
The data collected were synthesized in a database created with the coordinates of all the polls and the depth of the hard layer (surface layer thickness) in meters represented negative (−m) to characterize the relief of the hard layer (composed silty sand or gravelsand). For the location of the probes was used UTM projected coordinate system, with the North American Datum of 1927 (Fig. 1).
The hard layer depth was estimated using four techniques, obtaining for each estimate a raster file which was processed and compared with data from the probes 64 prepared for evaluation, using the calculation of the root mean square error (RMSE) to evaluate and compare each technique. RMSE was chosen as a benchmark as it retains the same units as the input data.

Spatial estimation techniques
The hard layer depth was estimated using the four techniques mentioned above. Since the base of each appears widely described in the literature (Isaaks and Srivastava 1989;Armstrong 1998;Webster and Oliver 2001), only given general elements each.

Regular splines and splines in tension
These are techniques which estimate deterministic interpolation values using a function that minimizes the overall curvature of the surface, resulting in a smooth surface that passes exactly through all known points of entry.
The basic shape of the spline interpolation imposes two conditions on the interpolation function: 1. The surface must pass exactly through the input data. 2. The surface must have a minimum curvature: the sum of the squares of the second derivatives calculated at each point of the surface should be as short as possible.
Splines There are two techniques: the regularized and the voltage in the regularized creates a smooth, gradual change with values that may be generated outside the range of the data sample, and the tensile stiffness is controlled surface according to the nature of the phenomenon to model, it creates a smooth surface with values less linked to the range of the sample data.
Splines regularized technique modifies the criterion of minimizing the curvature, incorporating the third derivative and assigning a weight, referred to as τ in the literature. This method is especially useful if required the calculation of the second derivatives of the surface of interpolation.
Splines technique used in tension arising in the first criterion of minimizing the curvature, assigning a weighting referred to as φ in the literature. The generated surface is continuous and smooth, while the first derivative is continuous, but is not smooth.
Both techniques use the following expression Splines to generate surface interpolation: where N, is the number of input points, λ j are the weight coefficients and r j is the distance between the point to be estimated (x, y) and the j input point.
T (x, y) y R (r j ) are defined according to the specific Spline used, as follows:for the regular splines: for the splines in tension: where τ y φ are the weighing parameters, r is the distance between the point to be estimated and the input point, K 0 is the modified Bessel function, c = 0.577215, and a i are coefficients of the associated system of linear equations.

Weighted inverse of distance
This method implicitly assumes that parts of the data that are nearby are more similar than those further away. To estimate a value at a given point using a linear combination of neighboring data, so it is assumed that each input has a local influence that decreases with distance.
The general term used for this method is: where x 0 is the point where the estimation is to be made, z(x i ), i = 1, 2, …, N, are the actual measurements of the point x i , and λ i are the weight coefficients associated to each measurement; the estimation itself is symbolized by z * (x 0 ). The weighting coefficients are calculated based on the inverse of the distance between the target point and the locations x 0 x i input measurements: The preferred choice for β is 2, thus, the estimation corresponds to the square of the inverse of the distance.
The only requirement of this method is that the sum of weights equals λ i unit. Thus, if x 0 x i match then the weight for that point becomes unitary and x z* takes the value 0 z (x i), so that interpolation is accurate.
An attractive feature of this technique is that the weights decrease rapidly as the distance increases, so that the local estimation is significantly further since the weights never become null no discontinuities. However, the selection of the weighting function is arbitrary, and there is a measure of the estimation error. Furthermore, the created surface always has a zero gradient in the entry points, so that the maxima and minima occur only at those sites.

Standard Kriging
Kriging methods rely on mathematical and statistical models, and it is precisely the addition of statistical models which separates deterministic Kriging methods, the Kriging uses or functions of semi-variance or variance as a function of distance, which is characteristic geostatistics distinctive and allows models based on this semi-variance.
The technique of ordinary Kriging is to use estimators that are linear combinations of the data: for point estimates of the variable V (X) at a point x0 we use the estimator V * x 0, as a linear combination of known data V (x i): To ensure that the estimator is unbiased, the sum of the weights must be equal to unity: And the expected error should be zero: Thus, the variance of the estimate is: where γ (xi, xj) is the semi-variance of V (X) between the data points x i and x j, and γ (x i, x0) is the semi-variance between the i-th data point and the target point x 0, with the following expression for the semi-variance: where h is a scalar equal to the distance between the points and x j x i.
For each estimated value exists kriging variance associated, which we denote as σ 2 ( 0), defined by Eq. (9). The next step is to determine the coefficients which minimize λ i variance value, this is done using the method of Lagrange multipliers.
First we define an auxiliary function f (λ i, ψ) containing that we want to minimize the variance plus a term that contains a multiple of Lagrange, ψ, for ordinary Kriging this function is: After defining the partial derivatives of the function with respect to the coefficients λ and ψ regarding: for i = 1, 2,…, N. This leads to a system of N + 1 equations with the same number of unknowns: The solution of this system of equations provides the coefficients λ i, which are used in Eq. (8) for the estimates.

Evaluation and comparison
To compare the results obtained by each technique was used root mean square error (RMSE): where N is the number of validation points, and ν i and ν i* are estimated and measured values at each location i. Table 1 shows the basic statistical data of the layer depth.

Results and discussion
The data distribution is asymmetrical negative as the average is less than the median, having a bias towards the left, it can be seen that the shape of the histogram (Fig. 2).
We also carried out an analysis of spatial trend, showing a clear trend in the data, which was modeled by a quadratic polynomial fit (Fig. 3). The trend analysis and the analysis of the distribution were useful for the implementation of the estimation by ordinary kriging, since this technique is based on statistical parameters and spatial variability (Isaaks and Srivastava 1989;Webster and Oliver 2001). The variograms adjusted (Fig. 4) permitted the identification anisotropy in the spatial variability since not all the directions have the same distance phenomenon correlation, the correlation distance is minimum is in the direction of azimuthal = 107.10° and is 2170.84 m. and the higher correlation distance of 4030.82 m. in the azimuth = 17.10° (Figs. 4,5) could also capture the variability in distance below the sample, through the "nugget effect" (or "nugget" in English) found 0256.

Regularized splines
The estimate generated by this technique presents very erratic values and even positive (Fig. 6), the phenomenon being modeled is hard layer depth in meters negative (-m). As RMSE values, this technique had the highest value of four, with RMSE = 4.937 (Table 2), so that this technique is more yielding inaccurate results.

Splines in tension
The estimate generated by this technique has less erratic values, but still has some positive values (Fig. 7). For RMSE, this technique had the second highest value of 1.237 (Table 2) so the relief representation of the hard layer was acceptable either.

Weighted inverse technique
The estimate generated by the technique of Inverse weighted distance presents quite acceptable values at first, since they fail to generate positive (Fig. 8), but from the RMSE a value of 0.843 was obtained (Table 2), which is the third highest value, so, despite being more accurate than the two other techniques, Splines does not offer an accurate representation of the hard layer.

Standard Kriging
The estimate generated from ordinary Kriging technique has very acceptable values, as with the inverse distance weighted positive values are not generated, and as to the RMSE have the lowest value of all 0.814 (Table 2), so is the characterization technique that generated the relief of the hard layer more accurately (Fig. 9).

Conclusions
The methodology used to evaluate the accuracy of different estimation techniques and spatial interpolation in order to decide which one provides the most accurate results. The study shows that within deterministic methods to the case study, the technique of inverse distance weighted yielded the most accurate results, with a value of root mean square error (RMSE) of 0.843, while the technical regularized spline was the most inaccurate of all, with a RMSE of 4.937.
Noteworthy is the large difference between the RMSE values reported by regularized spline technique (RMSE = 4.937) and spline in tension (RMSE = 1.237).
The ordinary kriging estimator is the space that offered the most accurate results among the four techniques evaluated, with the lowest value of the root mean square error RMSE = 0.814.
These methods were chosen, among several techniques available in the literature, namely inverse weighted distance, regularized spline, tension spline and ordinary kriging, to estimate the depth of the hard layer of the Valley of Aguascalientes due to its popularity and robustness as interpolation methods for spatial prediction of variables  in similar cases reported. Indeed, the technique of weighted inverse distance provides average values depending on the distance, perhaps this technique works better when the unknown points are closer to the sampling, so it is necessary to have a high density of known points. Regularized spline and tension spline tension interpolation functions, even when passing through the points of the model, are not able to predict the values of the checkpoints because of the roughness of the functions. These techniques provide good results when the variation of the surface is slight. Tension spline technique provides a closer approximation as modifies the minimization criterion incorporating the terms of the first derivative considering more abrupt variations without sacrificing the smoothness of the function. This technique is used when you want to preserve shape properties. Ordinary Kriging technique, unlike the previous ones, uses random field theory to establish the spatial correlation of the variables achieving better prediction with a stochastic model. The advantage of the Kriging is that ensures that the average errors is zero. In ordinary Kriging, local media should not necessarily resemble the average of the entire population, so just take the closest points to make the interpolation, so the variables are correlated locally and not with the entire population. For this reason, ordinary Kriging is widely used for the prediction of variables whose value depends less on all samples collected, but the nearest properties such as geographic regions samples.
The hard layer depth in the valley of Aguascalientes has a clear spatial structure, capable of being modeled by adjusting variograms for estimation with ordinary Kriging technique. Generally, it is recommended for the prediction of properties of clearly defined geographical areas, where properties to predict not depend greatly from the distant properties, but local, such as processes of formation of soil, deposition sequences and climatic focalized actions, so ordinary Kriging be preferred over other techniques.