Geostatistical Analyst for Deciding Optimal Interpolation Strategies for Delineating Compact Zones

Variability maps of Hydraulic conductivity (K) were generated by using geo statistical analyst extension of ARC GIS for delineating compact zones in a farm. In the initial exploratory spatial data analysis, K data for 0 15 and 15 30 cm soil layers showed spatial dependence, anisotropy, normality on log transformation and linear trend. Outliers present in both layers were also removed. In the next step, cross validation statistics of different combinations of kriging (Ordinary, simple and universal), data transformations (none and logarithmic) and trends (none and linear) were compared. Combination of no data transformation and linear trend removal was the best choice as it resulted in more accurate and unbiased prediction. It thus, confirmed that for generating prediction maps by kriging, data need not be normal. Ordinary kriging is appropriate when trend is linear. Among various available anisotropic semivariogram models, spherical model for 0 15 cm and tetra spherical model for 15 30 cm were found to be the best with major and minor ranges between 273 410 m and 98 213 m. The kriging was superior to other interpolation techniques as the slope of the best fit line of scatter plot of predicted vs. measured data points was more (0.76) in kriging than in inverse distance weighted interpolation (0.61) and global polynomial interpolation (0.56). In the generated prediction maps, areas where K was <12 cm·day were delineated as compact zone. Hence, it can be concluded that geostatistical analyst is a complete package for preprocessing of data and for choosing the optimal interpolation strategies.


Introduction
Precision agriculture concept relies on the existence of in-field variability.It requires the use of new technologies, such as global positioning system (GPS) and information management tools such as Geographical Information System (GIS) to assess and understand spatio-temporal variations.Collected information may be used to precisely evaluate inputs need such as tillage and fertilizer requirements so as to improve the input use efficiency of the farm.Precision tillage or site-specific tillage is a component of precision agriculture farming that employs detailed site-specific soil compaction information by determining variability of soil bulk density (BD) or saturated hydraulic conductivity (K) within the field so as to optimize the tillage requirement in different areas of the farm.
Analysis of the spatial variability of the above soil properties could be done by the use of geostatistics which deals with the spatial structure analysis and spatial interpolation for preparing prediction maps of these properties.These maps are used to define the constrained areas where the values of these properties are beyond their critical ranges and accordingly site-specific management practices could be suggested to ameliorate these constraints.
In past studies, various semivariogram models were developed for examining the spatial structure of several measured soil properties such as saturated hydraulic conductivity, % sand, saturated water content, available soil water storage capacity, soil bulk density, nutrient content of soil and pesticide distribution in soil [1][2][3][4][5][6][7].details of various steps of spatial variability analysis were discussed [8] including collection of samples, binning, drawing of empirical semivariogram, development of semivariogram model, checking of data for sta-tionarity, analysis of data for presence or absence of trend or drift, neighborhood search strategy, different forms of kriging and their applications under different situations.
In most of the recent softwares developed for preparing kriged maps of properties, information about their spatial structure (semivariogram models) are required in the input files.Hence spatial structure analysis should be carried out carefully for preparing semivariogram models.In softwares like surfer, S+ SpatialStats, GEOEAS (Geostatistical Environmental Assessment Software), GS+ etc, semivariogram models developed are isotropic and there is limited provision for checking the data for normality, trend, outliers and anisotropy.In one such study on spatial variation of soil properties of a farm [9], data of various properties were not explored for normality, presence of outliers, anisotropy and presence of trend which were essential for checking the stationarity.Kriging is not appropriate without removing the cause of nonstationarity in data.Hence the range of BD as computed were 1053 m and they concluded that their BD and organic carbon content data showed large amount of nugget variation, whereas few earlier studies [8,[10][11] reported relatively less range and strong to medium spatial dependence of these soil properties.Similarly, in another study [12], stationarity in data was not checked and soil properties were presumed to be isotropic in nature as GS+ software was used to determine the spatial structure and thus it was reported that BD data in their field study was spatially uncorrelated.
Again in one earlier study [11], data for normality and trend were explored and omnidirectional semivariogram models were developed by using S+ Spatial Stats software, which do not have options about presenting anisotropic nature of semivariogram With the introduction of Geostatistical Analyst extension in ArcGIS, it is now possible to carry out exploratory spatial data analysis followed by spatial structure analysis for preparing semivariogram models which can be used for interpolating data at unvisited sites by choosing appropriate kriging techniques.A distinctive feature of this statistical package is that it is fully integrated into the ArcGIS environment, so there is no need to use other software for preprocessing, statistical analysis and interpolation.
Hence, the present study was conducted with the objective to demonstrate the step by step use of geostatistical analyst extension of ArcGIS 9.2 for preparing kriged maps of saturated hydraulic conductivity of soils of IARI farm for delineation of compact zones.

Material and Methods
The study was conducted after the harvest of rainy sea-son crop at Indian Agricultural Research Institute farm, New Delhi, India.The study area was in main block of IARI farm, situated between 77 °8'45'' to 77 °10'30''E and 28 °37'15'' to 28 °39'00''N (Figure 1) and had semiarid climate.98 sampling points were selected on a rectangular grid at 100 m × 70 m interval.Land uses were predominantly maize-wheat and maize-mustard.The soils (Typic Ustrocrepts) had sandy loam texture.
For determination of K by constant head method, at each site a core auger of 5 cm diameter was used to take triplicate soil samples from 0 -15 and 15 -30 cm soil depths.

Geospatial Analysis
Drawing of prediction maps of hydraulic conductivity parameter using Geostatistical Analyst, involved three key steps [13] : 1) Exploratory spatial data analysis 2) Spatial Structural analysis 3) Surface prediction and assessment of results

Exploratory Spatial Data Analysis
The boundary shape file of IARI farm along with sampling shape file (point data) were added as layers to ArcMap.Exploratory Spatial Data Analysis (ESDA) involves exploring the distribution of the data, looking for outliers, trends and examining spatial autocorrelation.Each ESDA tool provides a different view of the data and is displayed in a separate window.The different Tools of ESDA are Histogram, Voronoi Map, Trend Analysis, Semivariogram/Covariance Cloud.Views under all tools interact with one another and with the Arc-Map map.

Histogram Tool 1)
For examining normal distribution: Several methods in the Geostatistical Analyst require that the data should be normally distributed.The tool displays the frequency distribution for the dataset and calculates summary statistics.For normally distributed data (a bellshaped curve), the mean and median should be similar, the skewness should be near zero, and the kurtosis should be near 3.If the data is highly skewed, then linear, box-cox or logarithmic techniques should be used to make data normal.
2) For detecting global outliers: Outliers can have several detrimental effects on prediction surface including effects on semivariogram modeling and the influence of neighboring values.A global outlier is a measured sample point that has a very high or a very low value relative to all of the values in a dataset.The histogram

Voronoi Map Tool
Cluster varoni maps are prepared for studying the local outliers.A local outlier is a measured sample point that has a value that is within the normal range for the entire dataset, but it is unusually high or low when compared to the the surrounding points.Voronoi maps are constructed from a series of polygons formed around the location of a sample point.Voronoi polygons are created so that every location within a polygon is closer to the sample point in that polygon than any other sample point.After the polygons are created, neighbors of a sample point are defined as any other sample point whose polygon shares a border with the chosen sample point.Using this definition of neighbors, Cluster Voronoi maps are prepared.Here all cells are placed into five class intervals.If the class interval of a cell is different from each of its neighbors, the cell is colored grey to distinguish it from its neighbors.These grey polygons indicating local outliers should be removed from the data layer before proceeding for geostatistical analysis.

Trend Analysis Tool
Data should also be subjected to trend analysis for checking the presence of global trend, which may arise due to topography.Trend if present should be removed so that data could meet the condition of stationarity, which is necessary for using kriging as an interpolation technique.
The Trend analysis tool can help identify global trends in the input dataset.It provides a three-dimensional perspective of the data.The locations of sample points are plotted on the x, y plane.Above each sample point, the value is given by the height of a stick in the z dimension.
The unique feature of the Trend Analysis tool is that the values are then projected onto the x, z plane and the y, z plane as scatter plots.A best fit line (a polynomial) is drawn through the projected points, which model trends in specific directions.If the line were flat, this would indicate that there would be no trend.

Semivariogram/Covariance Cloud
The semivariogram/covariance cloud shows the empirical semivariogram for all pairs of locations within a dataset and plots them as a function of the distance between the two locations.The semivariogram/covariance cloud can be used to examine the local characteristics of spatial autocorrelation within a dataset e.g. to check if the sampling interval is well within the range of the spatial correlation between the data points and to explore the directional influence.It is also examined for presence of outliers.Each red dot in the Semivariogram/Covariance Cloud represents a pair of locations.Since closer locations should be more alike, in the semivariogram the close locations (far left on the x-axis) should have small semivariogram values (low on the y-axis).As the distance between the pairs of locations increases (move right on the x-axis), the semivariogram values should also increase (move up on the y-axis).However, a certain distance is reached where the cloud flattens out, indicating that the relationship between the pairs of locations beyond this distance is no longer correlated.Looking at the semivariogram, if it appears that some data locations that are close together (near zero on the x-axis) have a higher semivariogram value (high on the y-axis) than others, these pairs of locations should be checked to see the inaccuracy in data (Local Outliers) and if it is so, in accurate data points should be removed.

Spatial Structural Analysis
Steps for use of Geostatistical wizard tool for Investigating spatial structure (variography): 1) In first step of geostatistical wizard, input data layer and attribute field on which kriging was to be performed were selected.Three options of kriging types i.e.Ordinary, Simple and Universal; Two data transformation i.e. none and log and two trend types i.e none and Linear were taken.In all 12 combinations were chosen.
2) In the second step of the wizard, for each combination semivariogram model was developed.For this, the wizard first determined a good lag size for grouping empirical semivariogram pairs (binning) for reducing their number.Then it fitted a spherical semivariogram model (default model) and computed its associated parameter values namely nugget, range, and partial sill.To actually account for the directional influences on the semivario-gram model, directional search tool was used and anisotropical semivariogram model was developed.

Surface Prediction and Assessment of Results
In this step of geostatistical wizard, neighborhood search strategy was decided by taking appropriate shape of search neighborhood (circular for isotropic and elliptical for anisotropic data) and dividing it in four quadrants for avoiding biasness in particular direction.Neighborhood radius was taken as half of the computed range.Maximum and minimum number of neighborhood points in each quadrant were kept between 2 -5 so as to use a total of 8 -20 neighbor points for interpolation.
The Geostatistical Analyst predicted the value at unknown location by using the equation: where Z(s i ) is the measured value at the i th location and i  is an unknown weight for measured value at the i th location, o s is the prediction location.N is number of measured points used for estimation.To ensure the predictor is unbiased for the unknown measurement, the sum of the weights i  must equal one.These weights could be computed by solving the ordinary kriging equations given below: Fitted semivariogram models were used for creating Г matrix and g vectors which then were used for solving for the kriging weights vector λ by matrix.Summation of the weight for each measured value times the value would be the final prediction value for the location.

Performing Cross Validation to Assess Parameter Selections
Cross-validation helps in deciding the model which makes the best predictions.The calculated statistics serve as diagnostics that indicate whether decisions on choice of method of kriging, as well as on choice of transformation and order of trend decided from ESDA are reasonable to provide an unbiased and more accurate prediction of parameter values along with valid prediction standard errors.Cross-validation tool of geostatistical wizard gives a scatter plot of predicted vs measured values and a best fit linear line (green solid line) through it.Higher the slope of this line (magnitude always <1), nearer it will be to 1:1 line (black dashed line) which is an indicator of good prediction choices.If all the data are spatially independent, best fit line would be horizontal.For a model that provides unbiased predictions, the mean of prediction errors should be close to zero.Again for the correct assessment of the variability and to check if the prediction standard errors are appropriate and valid, the root-mean-square prediction error and average standard prediction error should be similar and the root-mean square standardized prediction error should be close to 1.

Delineation of Compact Zones
For delineation of compaction zones, prediction map of the best choice i.e. the most suited kriging type, data transformation, trend type and semivariance model should be used.Map should be divided into five hydraulic conductivity classes namely 0 -6, 6 -12, 12 -24, 24 -72 and 72 -150 cm•day -1 as used in physical rating of soils [14].The areas of the map with saturated hydraulic conductivity values <12 cm•day -1 were considered as compact zones.

Exploratory Data Analysis
First of all, histogram tool was used to examine the frequency distribution of data for checking its normality and presence of outlier (Figure 2 and Table 1).As the differences between mean and median were 5.04 and 3.13, respectively, for 0 -15 and 15 -30 cm layers.Similarly the skewness values of both layers were 1.52 and 0.78.The above results thus indicated that data were far away from being normal.But data on log transformation of for both layers showed normal distribution.
In second step, varoni map tool was used to prepare cluster varoni map for examining the presence of outliers which were shown as grey cell polygons.These data points were highlighted in Arc map and were removed as shown in Figure 3.
In third step semivariance tool was used for examining the empirical semivariogram which clearly indicated that data were spatially correlated.Each red dot in semivariogram cloud represents a semi variance value which is the difference squared between the values of each pair of location and is plotted on y-axis relative to the distance separating each pair on the x-axis.Green points on the  other hand indicates the number of pairs made by a given point with its neighborhood.It appeared that some data locations that were close together (near zero on the xaxis) had very high semivariance values (high on the yaxis) which indicated the possibility of inaccuracy in data.i.e. the presence of local outliers.Such local outliers (red dots with very high variance at small lag distances) were highlighted and removed (Figure 4).To explore for a directional influence in the semivariogram cloud, search direction tool was used, which indicated anisotropic nature of data.
In the fourth step, trend Analysis tool was used to examine if trend is present in both XZ and YZ planes in different directions by looking at the shapes of both green and blue lines.For 0 -15 cm layer, trend in XZ direction was negligible, where as trend in YZ direction was linear (Figure 5).For 15 -30 cm soil layer slight trend was present in both XZ and YZ directions.Presence of trend in surface layer in YZ direction was attributed to 1% -3% slope in Y direction and no trend because of slope <0.5% in X direction.

Geostatistical Wizard for Spatial Structure Analysis
In Geostatistical wizard, firstly sampling points data layer (after removing local and global outliers) and attribute field i.e.HC of 0 -15 cm layer were selected.Then different of kriging (Ordinary, simple and universal), data transformations (none and logarithmic) and trends (none and linear) were taken and their cross validation statistics were compared (Figure 6).It was observed that for all types of kriging, with no transformation of data and without removal trend, predictions were unbiased as mean of prediction errors was nearer to zero (0.003 -0.097) and prediction of standard errors were appropriate as indicated by the closeness of the root-mean-square prediction error and average standard prediction error values (differences in their magnitude were 0.08 -0.3).But predictions were not close to measured values as indicated by appreciable magnitude of root-mean-square prediction error (nearly 17.4) and also best fit line was far away from 1:1 line and slope of best fit line was less (0.47).The results thus described these choices as not desirable.Again for all kriging types, on log transformation of data along with no trend removal option, although the slope of best fit line increased (0.61 -0.63) as compared to no transformation but the magnitude of other cross validation statistical parameters for deciding the biasness of prediction (higher mean of prediction errors of the order of 2.8 -2.9) and validity of estimated prediction errors were not satisfactory (large differences between root-mean-square prediction error and average standard prediction error of the order of 12 -40).Similarly irrespective of kriging type, on log transformation of data and linear trend removal (the choices suggested by ESDA), the slope of best fit line improved further but the magnitude of other cross validation prediction error parameters were not indicating these choices as the suitable ones.However, the combination of no data transformation and linear trend removal indicated higher slope of best fitted line (0.55 -0.77), lower mag- nitude of mean of prediction errors (0 -0.27), less differences (1 -4) between root-mean-square prediction error and average standard prediction error.The above analysis thus indicated combination of no data transformation and linear trend removal is the best choice as it resulted in more accurate and unbiased prediction of data.
The above results thus confirm the earlier findings [8,15] that for drawing prediction maps by kriging, data need not be normal, but trend if present should be removed in order to remove the cause of nonstationarity in data before making predictions.
On comparing the cross validation statistics of all three methods of kriging (with no data transformation and linear trend removal), ordinary kriging was found to be the most suited choice for interpolation.The results were in agreement with those earlier reported [8] where it was mentioned that universal is more appropriate when trend is of higher order.Similar results were obtained when K data was analysed for 15 -30 cm soil layer.
After selecting ordinary kriging with no data transformation and removing linear trend, cross validation statistics of different semivariogram models were also compared to choose the best option (Table 2).Spherical model for 0 -15 cm and tetra spherical model for 15 -30 cm were found to be the best.It was further observed that for drawing semivariogram, lag size of 23 m along with 12 number of lags were chosen by wizard as default choices (Table 3) which were in accordance to the thumb rule that product of lag size and no of lags (23 × 12 = 276 m) should be less than half of the maximum distance between farthest pair of points (nearly 1200 m).
Semivariogram showed anisotropic nature with major and ranges of 273 m and 410 m and minor range of 98 and 213, respectively, for 0 -15 and 15 -30 cm layer.The superiority of kriging over other interpolation techniques was obvious as the slopes of best fit line in 4 th power under inverse distance weighted interpolation (IDW) and global polynomial interpolation (GP) were 0.61 and 0.56 against 0.76 under ordinary kriging (Figure 7).Besides making more accurate predictions, kriging  also gave prediction error maps of the data.It was also learned that by increasing the sample size from 36 to 50 and 50 to 72, prediction improved as indicated by the increased value of slope of best fit line of scatter plot from 0.29 to 0.45 and from 0.45 to 0.76 (Figure 8).The lesser the number of samples in a given area means the separation interval between pairs increases and such pairs are representative of variance structure not the majority of the samples [8].Since the area covered for prediction was 72 ha, it could be stated that at least one sampling point is required per hectare.
Final prediction layers of K for both soil depths were presented as filled contours with 5 classes (Figure 9).The area on the prediction map with K value <12 cm•day -1 was delineated as compact area.It was further analyzed that 41% -42% of total area was under compaction for both layers.Hence instead of carrying out chiseling in the whole farm, it is required in less than half of the field, which could result in a saving of 50% of fuel and time.

Conclusions
Geostatistical Analyst has served as a bridge between geostatistics and GIS.In the other available kriging softwares, few geostatistical tools have been available, but only in geostatistical analyst, all tools are available and also integrated within GIS modeling environments.The most important feature of such Integration is that now it is possible to quantify the quality of prediction surface models by measuring the statistical error of predicted surfaces.In conclusion it could be stated that geostatistical analyst is a complete package for analysis of spatial variation of a given parameter on field scale and for assessing the various choices for drawing optimum and unbiased prediction surfaces.
The above study also suggests that Geostatistical analyst extension could be used as a tool for interpolation of data if it the sampling interval of the parameter is less than half of its major range.Thus its use is limited to field scale only.Study also indicated the superiority of

Figure 1 .
Figure 1.Location map of IARI farm with sample points.tool enables you to select points on the tail of the distribution.The selected points are displayed in the ArcMap data view.If the extreme values are isolated locations (i.e.surrounded by very different values), then they may require further investigation.If outliers are caused by errors during data entry that are clearly incorrect, they should either be corrected or removed before creating a surface.

Figure 2 . 10 Figure 3 .
Figure 2. Histogram showing normal distribution of no transformation and log transformation of hydraulic conductivity (K) for depth 0 -15 cm.Data 10

Figure 7 .
Figure 7.Comparison of cross validation statistics of different interpolation methods.

Figure 8 .
Figure 8.Comparison of cross validation statistics of predicted K(0 -15) data with different number of sampling points.

Figure 9 .
Figure 9. Prediction maps of K of 0-15 and 15-30 cm soil layers of main block of IARI farm.

Table 3 . Comparison of cross validation statistics of different models by Ordinary Kriging with no transformation and first order trend Model Mean Root-Mean-Square Average Standard Error Mean Standardized Root-Mean-Square Standardized Regression equation
* x + 8.529