Spatial prediction of soil infiltration using functional geostatistics

The infiltration of water into the soil is a necessary parameter for irrigation systems design. Characterizing its spatial behavior allows a site-specific management of water according to soil conditions and crop requirements. The aim of this study is to establish the spatial distribution of infiltration in an Andisol by means of two geostatistical approaches: on the one hand by means of functional kriging, taking as input infiltration curves (obtained after a smoothing stage), and on the other hand by using classical ordinary kriging on the parameters of the Kostiakov and Phillip models. The comparison between these methodologies is carried out taking as a criterion the sum of squared errors of a leave-one-out cross-validation analysis. The results show a high correlation between observations and predictions (R2 values around 99%), which indicates that the use of functional geostatistics in this context could be a good alternative. Moreover, from a descriptive point of view, we can point out that the contour maps of basic infiltration (BI), cumulative infiltration (Ci), saturated hydraulic conductivity (Ks), and sorptivity (S) obtained with the observed data, as well as the predictions by functional geostatistics, show a very similar behavior, which empirically validates the use of this methodology.


Introduction
Soil management does not usually consider the particularities of the terrain or its physical properties; thus it is important to perform an evaluation based on the concepts of sustainability (Martins et al. 2010). In efficient and sustainable agricultural production systems, it is important to understand the behavior of water within the soil in order to be able to use this resource reasonably.
Infiltration is a hydrodynamic attribute that concerns the movement of water and that is closely related to the capillarity processes and forces associated with the adhesion and cohesion of soil particles (Orjuela-Matta et al. 2010). Moreover, infiltration is a basic parameter that must be accounted for when designing and implementing irrigation systems (Chowdary et al. 2006;Machiwal et al. 2006). Other attributes associated with soil water movement include hydraulic conductivity and sorptivity (S). Hydraulic conductivity is associated with the resistance of soil pores relative to the soil itself, is used to solve drainage and soil conservation problems, and depends not only on soil properties but also on the soil water content (Jačka et al. 2016). Sorptivity explains the movement of water in the soil as an effect of its matric potential, since S is estimated as a measure of the ability of a porous medium to absorb or desorb fluid by capillarity (Moret-Fernández et al. 2017).
Understanding soil behavior and its rate of infiltration over time could help control soil erosion or other degradation processes, establish the availability of water for crops and the movement of substances in the soil, and be used to create strategies for watershed management (Orjuela-Matta et al. 2010).
In general, the study of soil infiltration begins with experimental trials in a finite group of sites within an area of interest. Next, the experimental data are fit to theoretical models, such as those of Kostiakov, Lewis, Horton, or Philip, among others, and the empirical parameters of the models are determined. This process concludes with a calculation of the univariate descriptive measures (localization and dispersion), obtaining distribution graphs (histograms and box plots), conducting multivariate analyses (correlation, classification, and principle components), and performing univariate geostatistical analyses (variogram estimation, kriging prediction, and creation of maps). The spatial behavior can be described (Camacho-Tamayo et al. 2013) and the management zones can be established for agricultural production by using these tools (Cucunubá-Melo et al. 2011).
Geostatistics (Isaaks, Srivastava 1989) is a field of statistics which is used in practice for predicting a variable for non-sampled locations of a region. Specifically, several variations of kriging and cokriging methods (Cressie 1993) are used to fulfill this task. These tools are considered when for each sampled site we have data for one or several properties (for example, hydraulic conductivity, or sorptivity). However, in many practical situations a large number of measurements are recorded continuously (depth, time, wavelength, etc.). In these cases, instead of univariate or multivariate data, we have (after a smoothing step) curves or functions (functional data). The extension of the classical univariate and multivariate geostatistical methods to the context of functional data has given rise a new field known as functional geostatistics (Giraldo et al. 2010). In particular, kriging and cokriging methods have been extended to in order to deal with functional data (Giraldo et al. 2009;Giraldo et al. 2010). These tools allow predicting curves rather than data of one or several variables. The application of functional geostatistics starts with the application of non-parametric smoothing techniques (e.g., kernel or B-spline regression) to convert the discrete data at each sampling site (i.e. infiltration) into continuous functions. After that, functional kriging (Giraldo et al. 2010) can be used for predicting curves (instead of scalar values) at sites for which information is lacking.
In this study, functional geostatistics is applied to infiltration curves in order to evaluate their predictive capacity. A cross-validation analysis is performed to estimate prediction errors and compare them with those obtained using standard univariate kriging methods that are used to analyze infiltration data (Giraldo et al. 2010). The infiltration curve of each site (obtained after smoothing the discrete data) is temporally removed and predicted by using functional kriging and classical ordinary kriging (in this case, the parameters of the Kostiakov and Philip models are predicted, and then they are used to construct the predicted curve). This paper is organized as follows: Section 2 gives an overview of the data analyzed, the Kostiakov and Phillip models, and the theory of functional geostatistics. Then in Section 3, a comparison between functional and classical univariate kriging methods (based on infiltration data) is carried out.

Description of study site
This study was performed at the Marengo Agricultural Center (Centro Agropecuiario Marengo, CAM by its initials in Spanish) located in the municipality of Mosquera (Cundinamarca, Colombia) at 4°42′ N and 74°12′ W and at an altitude of 2,543 m. The mean annual temperature in this zone is 13.1 °C, and the mean annual rainfall is 665 mm (weather records of the National University of Colombia). The area described, according to temperature and precipitation characteristics, is classified according to Holdrige as a life zone of low montane dry forest (bs-MB) and the climate as cold dry (FS).
The soils of the CAM have been formed from three types of parent materials, namely lacustrine clays, volcanic ash, and alluvial sediments. The type of relief corresponds to terraces and floodplains, specifically terrace plains and overflow plains (Bolívar, Ordóñez 2014).

In-situ infiltration measurement
An evenly spaced, rigid grid with 75 recording sites was created over four plots with mainly kikuyu grass (Pennisetum clandestinum). These plots covered an area of 16 ha and were located using a GPS Garmin Etrex 20 receiver in real time via satellite. At each site, a double-ring infiltrometer was used to measure water infiltration over a period of 150 minutes (at 1, 2, 3, 4, 5, 10, 15, 30, 45, 60, 90, 120, and 150 minutes) using variable loads and while ensuring that the difference between the maximum and minimum reading was never greater than 100 mm.

Determination of infiltration characteristics
The results obtained from the field were fit to two theoretical infiltration models, the Kostiakov (Equation 1) and the Philip (Equation 4) models. The parameters for these models were estimated by using the R software (R Development Core Team 2011) and then were used to calculate S, K s , B I , and C i and to carry out conventional statistical and geostatistical analyses.

Kostiakov model
where I(t) is the cumulative infiltration content (cm) at time t and a and b are empirically-derived fit coefficients.
with B I as the basic infiltration (cm h −1 ).

Philip model
where S depends on the initial soil moisture conditions (cm h −1/2 ) and K s is the saturated hydraulic conductivity of the soil (cm h −1 ).

Functional geostatistics
Functional geostatistics (Giraldo et al. 2010) offers methods for spatial prediction of curves. Usually, the discrete data (for example infiltration records) at each sampling site are previously recorded in continuous curves after smoothing them by using basis functions (Ramsay, Silverman 2005). Afterward, they are used for predicting a whole curve at unsampled sites. A cross-validation analysis was carried out. Each of the 75 curves was temporarily removed and predicted based on the remaining 74 by using functional kriging (Giraldo 2009). Thus observed and predicted curves (by using functional kriging) were obtained for each of the 75 sites. The comparison between them (cross-validation) was used for evaluating the quality of this approach and also to compare with the results obtained by means of classical geostatistics.
The functional kriging predictor is defined as follows (Giraldo 2009): where χ(s 0 ) is the predicted curve at site s 0 , χ(s i ) corresponds to the curve observed at site s i , and i = 1.2, … , n and λ i = 1, … , n, are the weighting parameters that indicate the contributions of each observed curve to the predicted curve.
The λ i parameters are estimated by solving the following system of equations (Giraldo et al. 2010): where the integrals correspond to the trace-variogram function (Giraldo et al. 2010) evaluated for both the distances between observation sites (left matrix) and the distances between the observation site and the prediction site (right vector). The trace-variogram function is estimated by the method of moments (Giraldo et al. 2010): ‖s i − s j ‖ = h} is the number of pairs of sites separated by a distance h. Once the trace-variogram function for a sequence of K values is estimated, a parametric semivariance model (spherical, Gaussian, exponential, etc.) is fit to the scatterplot in order to make an estimation at any possible distance between sites.
The results achieved with functional geostatistics were compared with those obtained by using conventional geostatistical methods (estimating the semivariogram and using classical ordinary kriging). For this process, the variables B I , C i , S, and K s from the theoretical infiltration models were considered. For each case (each variable), the semivariance function was estimated (Cressie 1993): where z(x i ) is the value of each variable at a site i, z(x i + h) is the value of the same variable at a point with a distance h away from the previous point, and N(h) is the number of pairs of data separated by the distance h. This empirical semivariogram is fit using one of the above-mentioned theoretical semivariance models. Various criteria are considered for selecting the best model, including the coefficient of determination (R 2 ), the least sum of squared errors (LSSE), and the cross-validation correlation (CVC). The shared parameters among the theoretical semivariance models include the nugget (C 0 ), which is a discontinuity in the semivariogram at the origin, the variance of the process (C), and the reach (r), which is the distance until there is a spatial correlation. The nugget-variance ratio, C/(C 0 + C), is also often used as a criterion for model selection. This parameter establishes the degree of spatial dependence (DSD) expressed by the studied attribute. Cambardella (1994) states that if the DSD is greater than 75% the dependence is strong, if the DSD is between 25% and 75% the dependence is moderate, and if the DSD is less than 25% the dependence is weak.

Results and discussion
In Figure 2, we can see the fit of the data to a decreasing potential curve, which describes the infiltration rate. The strong change in the first moments of the curve confirms that the soil was in a state of water deficit at the time of the test. Given the trend of the curves, the results suggest that at some points the soil did not reach a condition close to saturation, after being subjected to a constant application of water, for a period of 150 minutes, especially for those sites that they registered a high infiltration rate. This behavior was observed for all the infiltration tests performed and reported by different authors (Machiwal et al. 2006;Orjuela-Matta et al. 2010;Latorre et al. 2015).
To provide an example of a functional geostatistical analysis of collected data, a curve was predicted for a site located at coordinates 984475 (W) and 1009685 (N) (Figure 1), as shown in Figure 2. To perform this prediction, the trace-semivariogram function was estimated using Equation 7. By using this function and Equation 6, the λ i parameters for the functional kriging predictor, as defined in Equation 5, were estimated. Additionally, based on the predicted curve, the parameters for the theoretical Kostiakov and Philip models were estimated (Table 1).
The predicted curve is consistent with the behavior patterns of the observations. After 60 minutes, the water entry rate into the soil, or infiltration velocity, became constant, due to the increased matric potential of the soil. According to the observations   of Montenegro and Malagón (1990), the infiltration velocity at the unsampled site was high (12.7-25.4 cm h −1 ), potentially due to a greater quantity of micropores (Richmon, Rillo 2006) relative to the sites with lower infiltration, which resulted in a high K s value. The recorded cumulative infiltration curves and the predictions at each site are shown in Figure 3. As an indicator of goodness-of-fit, a simple linear regression (Figure 4) was estimated to compare observed and predicted values. This plot shows a high correlation between them. The R 2 (around 99%) confirms that the method used allows obtaining good predictions.
We compared observed and predicted data of C i , B I , K s , and S, respectively, by means of descriptive measures (Table 2), estimated parameters of the semivariogram models (Table 3), and maps of spatial distribution ( Figure 5).  Tab. 2 Descriptive measures for the Kostiakov and Philip parameters estimated from the observed (Obs) and predicted (Pre) data.

Model Parameter Mean Median CV (%) Min Max
Kostiakov In both cases (Kostiakov and Phillip models), the descriptive values in Table 2 are very similar, indicating a good performance of the functional kriging results.
Tab. 3 Parameters of the theoretical semivariogram model fit by observed (Obs) and predicted (Pre) basic infiltration values (BI), cumulative infiltration (Ci), sorptivity (S), and saturated hydraulic conductivity (Ks). The mean of parameter B I (Table 2) indicates a moderately rapid infiltration velocity (Montenegro, Malagón 1990). However, at the location of the study, B I varies from moderately slow to very fast. Some high infiltration velocities represent relatively dry soils due to the hydrological stress that they are subjected to. Additionally, high CV (greater than 60%) values indicate general areas or sites where infiltration behavior is very different from the other sites (Rodríguez-Vásquez et al. 2008;Martins et al. 2010).

Parameter Model
The estimations of the semivariogram parameters (C o , C o + C and Range, respectively) were similar (Table  3), which again indicates a good performance of the functional predictor used in this study. In both cases, spherical and exponential models were fit. In this sense, it is important to mention that some authors have reported problems for fitting semivariogram models to these properties (Rodríguez-Vásquez et al. 2008).
According to the results shown in Table 3, we can conclude that there is a moderate spatial dependence for C i and B I , because the values of DSD are greater than 25% and lower than 75% (Cambradella et al. 1974). This behavior is similar to that reported by Martins et al. (2010). In addition, the K s and S show C o values close to zero, which indicates a strong spatial dependence. A similar result was reported by Rodríguez-Vásquez et al. (2008) for the same type of soil.
The contour maps reveal high spatial variability in terms of the soil properties studied ( Figure 5). The maps generated through the estimations obtained by means of functional kriging have a pattern very similar to those obtained from the recorded data (only minimal differences can be identified in each case). We can see in these maps that there is a direct relationship between the parameters (zones with high values of B I have also high values of C i ). This is also evidenced with the parameters of the Phillip model. We also note in these maps that there is direct relationship between K s and B I and C i (zones with high values of K s also have high B I and C i values). High values of K s result from the passage of time and the saturation of soil pores with water, which cause the rate of infiltration to reach a constant value (B I ) that is similar to K s (Gil, 2002), and consequently the soil can drain greater amounts of water and achieve greater C i values.

Conclusions
The cross-validation analysis showed a high correlation between the results obtained with observed and predicted data (R 2 around 99%), which supported the use of functional geostatistics. Traditional statistical analyses demonstrated the reliability of applying FG to predict infiltration curves, given that the descriptive measures of the parameters for the theoretical infiltration model were similar for the observed data and predictions, even when the parameters displayed high spatial variability.
The estimations of the semivariogram parameters and the contours maps were also similar. The contour maps for B I , C i , K s , and S for the recorded and predicted data demonstrate the wide range of these parameters. However, the maps also showed the same behavioral patterns, which confirmed the congruence between the studied models and the efficacy of predicting these parameters via FG. This was established even when the parameters displayed high spatial variability. All the analyses indicate that functional geostatistics can be a useful tool in the study of the spatial variability of soil properties.