Robust soil mapping at the farm scale with vis–NIR spectroscopy

Sustainable agriculture practices are often hampered by the prohibitive costs associated with the generation of fine‐resolution soil maps. Recently, several papers have been published highlighting how visible and near infrared (vis–NIR) reflectance spectroscopy may offer an alternative to address this problem by increasing the density of soil sampling and by reducing the number of conventional laboratory analyses needed. However, for farm‐scale soil mapping, previous studies rarely focused on sample optimization for the calibration of vis–NIR models or on robust modelling of the spatial variation of soil properties predicted by vis–NIR spectroscopy. In the present study, we used soil vis–NIR spectroscopy models optimized in terms of both number of calibration samples and accuracy for high‐resolution robust farm‐scale soil mapping and addressed some of the most common pitfalls identified in previous research. We collected 910 samples from 458 locations at two depths (A, 0–0.20 m; B, 0.80–1.0 m) in the state of São Paulo, Brazil. All soil samples were analysed by conventional methods and scanned in the vis–NIR spectral range. With the vis–NIR spectra only, we inferred statistically the optimal set size and the best samples with which to calibrate vis–NIR models. The calibrated vis–NIR models were validated and used to predict soil properties for the rest of the samples. The prediction error of the spectroscopic model was propagated through the spatial analysis, in which robust block kriging was used to predict particle‐size fractions and exchangeable calcium content for each depth. The results indicated that statistical selection of the calibration samples based on vis–NIR spectra considerably decreased the need for conventional chemical analysis for a given level of mapping accuracy. The methods tested in this research were developed and implemented using open‐source software. All codes and data are provided for reproducible research purposes.


Introduction
Farmers need accurate information about the spatial distribution of soil properties to support sustainable agricultural production systems. Conventional methods of soil sampling and analysis are the number of samples collected, which impairs the accuracy of soil maps.
An alternative to overcome the time and budget constraints associated with adequate sampling densities for soil mapping is the use of visible and near infrared (vis-NIR) spectroscopy. This technique has garnered wide interest during the past two decades in soil assessment studies (Kopačková & Ben-Dor, 2016) and its benefits have been documented extensively. It enables the quantification of several important properties of soil samples from their vis-NIR spectral responses in a cheaper and faster way than by conventional laboratory methods. For this reason, vis-NIR spectroscopy enables soil surveyors to increase sampling densities without incurring substantial additional costs, and provides the potential for fine-resolution soil monitoring (Wetterlind et al., 2008;Ramirez-Lopez et al., 2014). For a detailed review on the relation between soil vis-NIR light absorbance and soil properties, refer to Stenberg et al. (2010).
The research efforts on soil vis-NIR spectroscopy have resulted in the development of many soil vis-NIR databases (i.e. spectral libraries) for the quantification of soil properties at farm (Guerrero et al., 2016), regional (Gogé et al., 2012) or global scales (Viscarra Rossel et al., 2016a). These studies have shown that the inaccuracy of soil vis-NIR models tends to increase when the geographical scale at which they are calibrated also increases . This inaccuracy is largely determined by how variable the soil is and the environmental conditions covered by the calibration samples .
One may want to derive digital soil maps based on the soil properties available at sampling locations (i.e. derived either from laboratory analysis or vis-NIR spectroscopy). At the continental scale, Viscarra Rossel et al. (2015) produced three-dimensional maps of soil properties for Australia by combining legacy soil data and vis-NIR information. At a smaller regional scale, Peng et al. (2015) combined laboratory and remotely sensed vis-NIR spectra to model topsoil organic carbon. Both authors showed that augmenting the laboratory measurements with vis-NIR-predicted data considerably improved the accuracy of soil maps, even though the predictions of soil properties with the latter remain less accurate than conventional laboratory analysis (Viscarra Rossel & Webster, 2012). For farm-scale soil mapping, vis-NIR spectroscopy has been successfully applied in situ and in the laboratory to obtain spatially dense soil data (Table 1). For example, Brodský et al. (2013) used soil vis-NIR spectra recorded in the laboratory to predict the topsoil organic carbon content of 579 samples over an area of 100 ha (Table 1).
Farm-scale soil mapping with vis-NIR-augmented data (i.e. laboratory data coupled with spectroscopic predictions) has been carried out by spatial interpolation, predominantly kriging (Table 1). Kriging requires an experimental variogram, and to compute this the underlying random variable should be approximately normally distributed. In practice, however, soil data are often skewed (Lark & Lapworth, 2012) and might also contain outliers. Robust variogram estimation can prevent outliers from having an adverse effect on the experimental or sample variogram and therefore the model parameters (see Nussbaum et al., 2014). Outliers are less likely to occur in the vis-NIR-predicted soil properties than in conventionally measured properties because vis-NIR predictions tend to smooth the variation. However, vis-NIR models can be inaccurate, for example if they are predicting a soil property using spectra from an unknown soil. Such predictions could be considered outliers. In addition, vis-NIR-augmented datasets might also contain outliers because they include laboratory measurements. In-situ vis-NIR measurements are also prone to produce predictions that contain outliers because of variation introduced by less stable conditions for data acquisition and lack of sample preparation prior to the spectral measurements (e.g. samples are not dried or sieved). Despite these effects, the use of robust geostatistics has been disregarded for farm-scale soil mapping with vis-NIR augmented datasets.
Recently, Somarathna et al. (2018) stressed the importance of accounting for the propagation of uncertainty with vis-NIR models through spatial models. Error propagation is frequently ignored in soil mapping with vis-NIR-augmented data, which might affect the assessment of mapping accuracy. In our review of the literature for farm-scale soil mapping with vis-NIR data (which spanned the last 10 years) only two of the papers we cite applied error propagation analysis of the vis-NIR model ( Table 1). The first, by Brodský et al. (2013), studied how the uncertainty of predictive vis-NIR models of soil organic carbon (SOC) calibrated with different sample sizes affected the accuracy of mapping the soil property. The second, by Viscarra Rossel et al. (2016b), used vis-NIR for mapping SOC by kriging with external drift and a novel approach to account for the uncertainty propagated from the vis-NIR models to the spatial modelling.
When sampling for calibrating vis-NIR models for farm-scale mapping, several methods are commonly used, such as Kennard-Stone, (Kennard & Stone, 1969), conditioned Latin hypercube (Minasny & McBratney, 2006) and k-means (Naes, 1987). However, choice of the size of vis-NIR calibration datasets is often arbitrary (Table 1). Vis-NIR models derived from calibration sets with insufficient samples to represent soil variation adequately in the field might lead to poor ability to generalize, which will reduce mapping accuracy. Therefore, optimization of sample size for vis-NIR calibrations should also be taken into account for farm-scale mapping in relation to both budget and mapping accuracy.
Given that some of the key aspects suggested above have been neglected previously in vis-NIR spectroscopy for farm-scale soil mapping, our study aimed to (i) optimize the use of vis-NIR spectroscopy at the farm scale to predict soil properties robustly, (ii) demonstrate the utility of vis-NIR data for soil mapping at the farm scale in a potential application scenario and (iii) compare maps based on conventional laboratory analysis with those using a vis-NIR-augmented dataset, while accounting for the propagated prediction error of the spectroscopic model. In the spirit of reproducible research, we make all the data and the computational code used for carrying out the analyses presented in this study publicly available.

Study area
The study area near the municipality of Barra Bonita in the state of São Paulo (Brazil) covers 473 ha at altitudes ranging from 550 to 710 m above sea level. There is part of a small catchment that runs from northeast to southwest in which sandstones dominate, but basaltic flows also occur to a lesser extent. The predominant soil types are classified as Typic Quartzipsamment (TQ), Typic Hapludox (TH), Typic Hapludalf (THa), Typic Hapludult (Thu) and Typic Eutrudept (TE).

Soil sampling and laboratory analysis
We collected samples at 458 sites on a regular grid of 100 m × 100 m at two depths (l = {A: 0-0 0.2 m with 458 samples; B: 0.8-1.0 m with 452 samples}), representing a total of 910 georeferenced samples ( Figure 1). The sampling design was not initially intended for geostatistical mapping; therefore, this grid might not account for spatial dependence over short distances. Our study focused on particle-size fractions (i.e. sand, silt and clay contents) and exchangeable calcium content (Ca ++ ). The determination of particle-size fractions was by the hydrometer method, in which sodium hydroxide (4 g l −1 ) and sodium hexametaphosphate (10 g l −1 ) were used as dispersing agents. Soil property Ca ++ was extracted by KCl 1 mol l −1 solution and quantified by atomic absorption spectrophotometry.

Acquisition and pre-processing of the vis-NIR spectra
For the spectral measurements, the samples were oven-dried for 24 hours at 45 ∘ C, ground and sieved (2 mm mesh). Following that, an infra-red intelligent spectroradiometer (IRIS) sensor (Geophysical and Environmental Research Corporation, 1996, New-York, United-States) was used to measure the vis-NIR reflectance spectrum of each sample. The data were obtained at a resolution of 2 nm between 350 and 1000 nm, and at a resolution of 4 nm between 1000 and 2500 nm. We used the R package prospectr (Stevens & Ramirez-Lopez, 2013) implemented in R 3.3.2 (R Core Team, 2013) for the preprocessing of the spectral data. The vis-NIR reflectance spectra were transformed to apparent absorbance (log 1/reflectance; the logarithm is natural throughout). To improve the signal-to-noise ratio we applied the Savitzky-Golay filter and resampled the spectra to a resolution of 6 nm. The spectral ranges with very small signal-to-noise ratio (corresponding to the ranges between 400-500 nm and 2340-2500 nm) were removed from the spectra. Multiplicative interference and light scattering effects were reduced by normalizing the spectra using the standard normal variate (SNV) algorithm (Barnes et al., 1989).

Selection of calibration and validation sets
Validation samples. A total of 115 sampling locations were randomly selected for validation. Observations from both depths (surface and subsurface, comprising 227 samples) were used for validation; subsurface samples were not available in three cases. All remaining samples (910-227 = 683 samples) were used for calibration. The locations of the validation samples are shown in Figure 1.
Calibration samples. The optimal number of samples to calibrate the vis-NIR models was based on the method proposed by Ramirez-Lopez et al. (2014) as follows.
1 The vis-NIR spectra were projected on to the principal components (PCs) space. The PCs were derived with the singular value decomposition algorithm using the R package resemble (Ramirez-Lopez & Stevens, 2016). 2 Subsets of different sizes were sampled within the set of possible calibration samples. We started from ten and went to 400 samples in steps of ten. Each subset was sampled using the conditioned Latin hypercube (cLH) algorithm (Minasny & McBratney, 2006) from the PCs of the vis-NIR data. For the cLH sampling we used the R package clhs (Roudier, 2011). 3 For each calibration subset we computed the mean squared Euclidean distance (msd) between estimates of the probability density functions (pdfs) of the whole set of samples and the pdfs of samples in the subset. The msd was computed for the PCs of the vis-NIR data as follows: where k is the number of PCs, KDE x j ∈cs represents a vector of kernel density estimates for the probability density function of the jth PC of the subset of samples cs and KDE x j represents a vector of kernel density estimates for the probability density function of the jth PC for the whole population. The estimates corresponding to both the KDE x j and the KDE x j ∈cs were computed for the same values within the ranges of values of the PC j and using the same bandwidth and Gaussian kernel. The optimal calibration set size was indicated by a substantial reduction in the msd, and when there was no further change by including more samples; that is, the KDE x j ∈cs (kernel density estimates of selected subset of samples) was similar to the KDE x j (kernel density estimates of the whole set of candidate samples for calibration). This was decided visually by plotting the size of the calibration sample set against the averaged msd.
To obtain reliable estimates of msd as a function of the sample set size, steps 1-2 were repeated ten times and the final msd considered was the average of those obtained during all iterations. Once the optimal set size was identified, we sampled the final calibration set using the cLH algorithm. For this, we sampled ten different sets with the suggested optimal size and selected the one with the minimum msd as the final calibration set.

Transformation of the particle-size data
Sand, silt and clay contents are reported as proportions that sum to 100%. However, models formulated for each of these fractions do not guarantee that their individual predictions sum to 100%. To avoid this compositional constraint, the particle-size data (V = {clay, silt, sand}) for both depths (l = {A : 0-0.2 m, B : 0.8-1.0 m}) were transformed using the additive log-ratio (alr) transformation (Odeh et al., 2003), applied as follows: where Y l, i is the resulting transformed variable, V l, i is the ith variable of the set of compositional variables (silt and clay contents) at depth l, V l, r designates a reference compositional variable at depth l and r is the total number of compositional variables. In this study, we used the sand content as reference (V l, r ). Back-transformations of the alr-transformed variables for the vis-NIR predictions and spatial predictions are explained in their respective sections.

Vis-NIR modelling and predictions
A prediction model based on vis-NIR spectra was fitted for each soil property using the selected optimal calibration subset. Each model was developed using a memory-based learning (MBL) algorithm. That is, for each new sample (requiring prediction of a given soil property), a local model is calibrated using only the most similar samples (nearest neighbours) in the calibration set. They were selected based on the similarity or dissimilarity among the spectra. Although MBL is commonly used to model large and complex vis-NIR datasets (e.g. Clairotte et al., 2016), it can also be used for modelling small vis-NIR datasets. In this case, rather than overcoming the typical complexity of large vis-NIR datasets, the MBL algorithm was used to optimize the prediction of each sample by removing calibration samples that were far from the prediction sample in the vis-NIR space. These distant samples can be considered as outliers, which might affect the accuracy of prediction. In other words, this modelling technique can be seen as a collection of local regressions with the removal of outliers. The basic aspects of the MBL algorithm for calibration and prediction with the removal of outliers are given below.
1 A similarity or dissimilarity metric was created to select the nearest neighbours (excluding possible outliers) for each sample we wished to predict. This consisted of projecting the spectra on to the partial least squares (PLS) space and then computing the Mahalanobis distance between samples. We used an optimized selection of components based on a nearest-neighbour approach (see Ramirez-Lopez et al., 2013, section 2.12, for more information). 2 We defined the number of neighbours to select (or the number of local outliers to remove). Different threshold distances (from 0.3 to 1.5 in steps of 0.05) were tested to remove possible outliers. We conditioned outlier removal to ensure a minimum of 120 samples (out of 180 samples) in the local calibration set. In fact, this number must be small compared with the total number of samples in the calibration set so that one would not expect outliers to be included in the analysis. From this point, we no longer used the dissimilarity information. 3 The nearest selected observations were used to fit local linear models by the weighted average PLS regression method. This is a case of ensemble modelling where the final predictions are a weighted average of the predictions generated by multiple (and consecutive) PLS components. The weight for each component (w j ) is calculated as follows: where s 1 : j is the root mean square of the spectral residuals for unknown samples when a total of j PLS components is used and g j is the root mean square of the regression coefficients corresponding to the jth PLS component. We averaged the predictions retrieved by 16 PLS models (ranging from 5 to 20 PLS components). The MBL modelling was implemented using the R package resemble (Ramirez-Lopez & Stevens, 2016). Nearest neighbour cross-validation (Ramirez-Lopez & Stevens, 2016) was used to optimize the threshold distance (for neighbour selection) by minimizing the standardized root mean squared error (sRMSE) of the cross-validation, which was computed as follows: whereŷ i is the predicted value and n is the number of validation samples.
Prediction using the vis-NIR models. After optimization, the MBL algorithm was applied to the validation set to predict the soil properties. To report these results for the particle-size distribution, we back-transformed the additive log-ratio transformed variables to the original clay, silt and sand content. These back-transformations were based on the following equations: The predicted back-transformed valuesV l,i ,V l,r and Ca ++ were compared with reference values (obtained from conventional wet-chemical analysis) using the root mean squared error (RMSE) and the coefficient of determination (R 2 ). The bias was assessed by the mean error (ME). Finally, MBL was used to predict the soil properties of the samples that were not included in the validation set (227 samples) or in the final calibration subset (180 samples). For each soil property and for each depth, we pooled the reference laboratory values (i.e. the values of the samples in the calibration set) and the vis-NIR predicted values (i.e. the values of the samples outside the calibration and validation sets). Hereafter, we refer to this resulting dataset as vis-NIR augmented dataset.
Spectroscopic model error. We note that our spectroscopic predictions are not error free (Somarathna et al., 2018). Viscarra Rossel et al. (2016b) indicated two possible sources of error. The first is the spectroscopic sampling error (i.e. the spectra are measured at only two discrete depths in the soil profile). The second is the uncertainty introduced by using the spectroscopic model. The former is accounted for in the spatial modelling step by the nugget variance of the variogram, whereas the latter needs to be quantified explicitly because it is propagated through spatial prediction. We followed the approach introduced by Viscarra Rossel et al. (2016b) and estimated the average error of spectroscopic prediction by taking the variance of the residuals between the reference value y i and the predicted valueŷ i for i = 1, … , n samples, and for each depth l. We further checked for spatial dependence of the prediction errors of the vis-NIR models. The variance of these predictions, denoted hereafter by var (̂y , was propagated to the spatial analysis in the next section.

Spatial modelling of the vis-NIR augmented data
Spatial modelling. Soil property Y: Ca ++ and alr-transformed particle-size fractions at a depth l = {A : 0 -0.2 m, B : 0.8 -1.0 m} at location s in the study area are defined as: where l is a (constant) mean and Z l (s) is a normally distributed, autocorrelated Gaussian random field with zero mean and unit variance. If the correlation function l between Z l (s i ) and Z l (s i ′ ) becomes smaller as the lag distance h between s i and s i ′ increases, then Z l (s) is spatially structured. Finally, l accounts for an uncorrelated random error with possibly a long-tailed distribution. Isotropic models can be fitted to the experimental variogram, such as exponential or spherical functions. Spherical function: where is the nugget variance, which represents the spatially uncorrelated variation at distances less than the sampling interval and measurement error, and is the range of spatial dependence or spatial autocorrelation. Exponential function: where (h) is the semivariance at lag h, is the a priori variance of the autocorrelated process and is a distance parameter for this function. The exponential model approaches its sill gradually and also asymptotically so that it does not have a finite range. In practice, an effective range is assigned as the distance at which the function has reached 95% of . The effective range, , is 3 . Note that Z l is correlated in space whereas Z l and Z l ′ are uncorrelated if l ≠ l ′ . The parameters of the autocorrelation model l = ( l , l , l ) , the mean l and the realizations of the Gaussian field Z T l = are unknown and must be estimated from the data. We propagated the prediction error of the spectroscopic model by adding the estimated variance var (̂y l ) to the diagonal of the covariance matrix l of Z l (as in Knotters et al., 1995) corresponding to the sampling locations where the soil property was derived by spectroscopy.
Spatial model parameter estimation. Model parameters were estimated using a robust version of the residual maximum likelihood (rREML) method first developed by Künsch et al. (2013). In rREML, variogram and spatial trend parameters are estimated simultaneously. This method particularly suits data that possibly contain outliers. The robustness of rREML parameter estimation is controlled by a tuning constant c, which accounts for the weights given to the outliers in the observations when estimating l and l . We set this value to two based on expert knowledge and similar previous research (e.g. Nussbaum et al., 2014), which is equivalent to providing a moderate robustness to parameter estimates.
Predictions and validation. Once the parameters had been estimated, it was possible to predict the soil properties at new, unobserved locations by kriging (Cressie, 2006): wherêl is the mean estimated by rREML, l,̂( s) and l,̂a re a vector with the covariances between Z l and Z l (s), and covariance matrix of Z l , respectively. They were estimated using the parameter vector̂l. Predictions were made by global block kriging on 10 m ×10 m blocks.
To back-transform the alr-predicted values to the original clay, silt and sand contents, we used the standard unbiased back-transformation for lognormal kriging, computed as follows (Cressie, 2006): and V l,r = 100 1 The RMSE, R 2 and ME were used to assess the accuracy and the bias of the prediction at validation locations for the back-transformed clay, silt, sand and the predicted Ca ++ .
Geostatistical models corresponding to each soil property were also derived using only the reference laboratory data, and validated using the 115 locations described above (227 samples). Geostatistical analyses were carried out using the georob package (Papritz, 2017) implemented in R 3.3.2. For comparison, we also calibrated a geostatistical model in which the estimate of REML parameters was not robust, denoted standard REML hereafter. Table 2 presents the summary statistics for the observations. All soil properties have a large standard deviation except for silt. Because the area is dominated by highly weathered soils developed from sandstones, most of the samples have large sand contents at both depths. This explains the small variation in silt content of this area (standard deviation ≤ 5.2).

Soil properties and vis-NIR spectra
The doublet absorption feature observed in the vis-NIR spectra near 2200 nm in Figure 2 indicates the presence of kaolinite in the sampled soils (Demattê et al., 2006). In addition, the spectra exhibit absorption features centred at 550 and 850 nm, characteristic  of iron-bearing minerals such as goethite and haematite . Samples corresponding to Typic Quartzipsamment have a higher albedo (i.e. overall reflectance) than soil samples of other classes because of their larger sand contents. Table 3 gives the Pearson correlation coefficients between soil properties and the average vis-NIR absorbance. Overall, there is a strong absolute correlation between soil properties and soil albedo (correlations always larger than 0.6). The large absolute correlation of mean absorbance with the clay and sand contents (around 0.8) suggests that soil particle-size fractions were an important factor affecting the overall absorbance intensity. Relatively large correlations between Ca ++ and clay and sand content indicate that this attribute was to some extent correlated with the particle-size fractions; therefore, the relation between Ca ++ content and spectral information might have depended partially on indirect correlation.

Optimal calibration set size
The first five PCs accounted for 98% of the spectral variation and were selected to infer the optimal calibration set size. Figure 3 shows the mean squared Euclidean distance (msd) values corresponding to comparisons between estimates of the probability density functions (pdfs) of the whole set and pdfs of the samples in the different calibration sets. The msd values decreased as the size of the sample set increased. Despite this, the differences between the calibration sets in terms of their msds were marginal beyond 180 samples. Therefore, we used this number of samples as the optimal size for calibration of the vis-NIR models of the target soil properties.

Performance of the MBL modelling
The vis-NIR models of the additive log-ratio (alr) transformations of particle-size fractions formulated with the final calibration set (180 samples) gave good predictive performance for the back-transformed sand, silt and clay contents (Table 4). Back-transformed predictions for sand and clay contents have a large R 2 of 0.88 and 0.87, respectively. This confirms that the spectral variation described in the previous section relates to the particle-size fractions. Accuracy was least for the model of Ca ++ (R 2 of about 0.6), indicating that predictions for this attribute might have depended on its indirect relation with the soil spectra. The ME values show that predictions by the vis-NIR models are slightly positively (sand and silt) or negatively (clay and Ca ++ ) biased.
During calibration, only one prediction sample (out of the 503) required removal of an outlier before creating the log(silt/sand) model. Therefore, a global vis-NIR model was almost appropriate for predicting this attribute considering the dataset used. Conversely, all the samples in the prediction set required local outlier removal for the estimation of log(clay/sand) and Ca ++ .

Spatial analysis
The fitted variogram models for both datasets (i.e. laboratory based and vis-NIR augmented) show strong spatial dependence Table 4 Results of the validation of the predictive vis-NIR models (with 180 samples) in the validation set (n = 227). The results for the particle-size fractions are presented after back-transformation of log(silt/sand) and log(clay/sand).  in all cases (Figures 4 and 5). However, the sill variances of both laboratory-based and vis-NIR-augmented fitted variograms indicate considerable variation in the soil properties. In most cases, the sill of the vis-NIR-augmented variograms is larger than that estimated for the laboratory-based variograms. There are clear differences between the variogram parameters estimated for depths A and B, for example in their (effective) ranges, which differ considerably in some cases (e.g. range of log(silt/sand) for depth A was 812 m, whereas for depth B it was 423 m). This suggests smoothing of the topsoil variation compared with that of the subsoil.

Soil property
The errors of the vis-NIR predictions were spatially independent, which was confirmed by plots of the sample variograms of the prediction errors of the spectroscopic model (not shown). For each soil depth, the variance of the propagated error from the vis-NIR prediction is presented in Table 5. The contribution of model error to the total error from spectroscopic modelling is moderate. When compared with the spectroscopic sampling error (nugget variance, given in Figures 4 and 5), the spectroscopic model error represents between 20% (log(clay/sand), depth A) and about 50% (log(clay/sand), depth B) of the total spectroscopic error.
The robust model calibration provided a weight < 0.8 for several observations (Table 6). This means that there were potential outliers in the observations used for the geostatistical model calibration. There were more potential outliers at depth B than A. With the exception of Ca ++ at depth B, the vis-NIR-augmented dataset was more prone to outliers than the laboratory-based one because of inherent uncertainty of the vis-NIR models (Table 6). This indicates the need for more robust geostatistical methods when vis-NIR augmented datasets are used.
The spatial predictions using only the results of laboratory analysis outperformed those using the vis-NIR-augmented dataset (Table 7). Particle-size fractions were, in general, well predicted. The R 2 for Ca ++ was relatively small compared with that for the rest of the predicted soil properties, and was predicted more accurately at depth B than A.
The ME values reported in Table 7 show that the spatial predictions were always slightly positively or negatively biased. The bias was more important in B than in A, and more pronounced in the vis-NIR-augmented predictions than in the laboratory-based ones. For the vis-NIR-augmented spatial prediction in B, sand content was considerably overestimated (ME = −4.75), whereas clay and Ca ++ contents were underestimated (ME of 5.05 and 2.41, respectively). However, when compared with the relatively larger RMSE values, the ME shows that inaccuracy of the spatial predictions was mainly a result of their imprecision rather than a result of their bias (Viscarra Rossel et al., 2016b).
Comparison of the prediction results when using a robust model with the standard REML parameter estimates showed that there was a small increase only in prediction accuracy when using the robust REML (Table 7). Figures 6-8 and 9 show the spatial predictions of sand, silt, clay and Ca ++ , respectively. In all cases, the maps produced by both methods were similar with relatively small absolute differences. The largest absolute differences between maps are in the southwest part of the study area, where the altitudes are lowest. This is mainly because of the different parent material (basaltic rocks) from which soils in this area have developed. Performance of both vis-NIR and conventional spatial models tends to fail in this area because of the lack of sample representation on soils derived from basaltic rocks. For example, the vis-NIR-augmented maps of sand show smaller contents than the laboratory-based map ( Figure 6). A different pattern is evident for silt, clay and Ca ++ (Figures 7, 8 and 9, respectively). Note that the maps predicted with the vis-NIR augmented dataset are smoother than those predicted with the laboratory-based dataset. This was expected and is explained in the discussion.

Discussion
By using the proposed methodology, we reduced the number of samples to be analysed by conventional methods by 74%. This percentage excluded the validation samples. Together with the reduction in number of calibration samples, there should be an equivalent reduction in chemical reagents needed for soil analysis, mitigating the environmental effects of soil survey. Also, one soil sample needs at least 3 days to pass through a complete conventional analysis, whereas spectroscopic measurements can be recorded in a few minutes and it is possible to obtain information from several soil features with a single reading. The above result indicates that the mapping costs could be substantially reduced with vis-NIR spectroscopy as a complementary technique to conventional soil analysis, as observed by , Debaene et al. (2014) and Ramirez-Lopez et al. (2014). Statistical selection of the optimal vis-NIR calibration set is not new. Our results confirm the conclusion drawn by Debaene Table 7 Results of the validation of the geostatistical predictions in the validation set for each depth. The results for the particle-size fractions are presented after back-transformation of log(silt/sand) and log(clay/sand). The results obtained using the robust REML parameter estimation are compared with those using the standard REML Soil property Laboratory based  et al. (2014). The authors used the k-means algorithm to show that the number of calibration samples can be reduced by a factor of five, without substantial loss of prediction accuracy. In this study we used the mean squared Euclidian distance between a subset of sample pdf and the whole calibration sample set pdf to choose the optimal calibration set size. This is similar to selecting a few representative spectra from the whole calibration dataset. Difference mmol c kg −1 Figure 9 Maps of Ca ++ content generated from the laboratory-based and vis-NIR-augmented datasets. The right-hand maps show the differences between them.
methodology would perform equally well for a different case study. Further work will show if this is effectively the case. Several authors have suggested that MBL requires large reference databases (e.g. Gogé et al., 2012;Guerrero et al., 2016) and there appears to be a consensus that this is one of the main constraints of this technique. Nevertheless, we have demonstrated that MBL could be applied successfully even when small (but representative) reference datasets are available. The goal of MBL in this study was to (i) select the most similar samples from a local spectral library to increase the accuracy of the prediction of each sample and (ii) enable the model to discard potential outliers contained in the spectral library. When no outliers were identified and the model for the target soil property was created with all the samples, the local model of the MBL algorithm became equivalent to a global model. Therefore, one may expect that accuracy of the MBL local model would be at least as good as that of the global model. This supports the findings of Lobsey et al. (2017), who used a small set of local samples together with selected samples from a large spectral library for local predictions.
For our case study, the relatively accurate maps derived with vis-NIR augmented models suggest that calibration benefits from the local calibration and outlier removal. In all vis-NIR models it was necessary to remove at least one observation. This shows that local spectral libraries can also contain outliers, which can negatively affect the models' prediction quality. Local spectral libraries are generally more effective than large-sized ones in providing accurate vis-NIR predictions at the local scale (Guerrero et al., 2016), but they can also contain outliers, which must be accounted for. Our method is based on the similarity or dissimilarity among spectra, which performed well at the farm scale.
We propagated the error of the spectroscopic model by spatial analysis. The variance of the averaged prediction errors of the spectroscopic model was relatively large compared to a similar study by Viscarra Rossel et al. (2016b) and to the spectroscopic sampling error. The outcome was the generation of soil property maps with a smooth pattern. This was expected because the prediction at one sampled location is not the observation itself. The smoothing increases with the size of the nugget variance, which averages the overall error. The accuracy of maps produced with the vis-NIRaugmented data was slightly poorer than that from the laboratory measurements (Table 7). However, this effect might be cancelled out by accounting for the standard error in laboratory measurements of soil properties; the latter was disregarded in our case study.
We also disregarded uncertainty of the variogram parameters, which might increase the final prediction uncertainty in our case. This is because we used a regular grid sampling design, which does not provide information on spatial dependence over short distances. The nugget variance could possibly be poorly estimated. A solution would be to include a second-phase survey where a subset, generally 10% of the total existing sample size, would be included as close pairs from the existing locations (Lark & Marchant, 2018). In our case, budget constraints made this solution impossible to implement.
We advocate that soil management techniques that require fine-resolution spatial data can benefit considerably from the use of local vis-NIR spectral libraries. In Brazil, users of precision agriculture for soil applications usually collect and analyse from one to five samples per hectare to develop suitable soil maps. It involves an investment that, in some cases, might be reduced considerably by using the present methodology.
Finally, based on this research we suggest a set of steps that could be useful for soil mapping at the farm scale in areas where prediction of soil properties based on vis-NIR data will be implemented for the first time. In a 'real' situation, the steps that should ideally be followed to implement some of the concepts presented in this study are as follows.
1 Collect a large number of soil samples, if possible, to represent soil spatial variation within the area to be mapped. A sampling density of at least one sample per hectare or more should be used in case no a priori information about the variation in the area is available. Determination of an ideal number of samples to collect is not straightforward and the sampling scheme, which will not be optimal, should include samples at variable distances so that spatial dependence over short distances can be determined during geostatistical analysis (Lark & Marchant, 2018). If environmental covariates are available for the area of interest, they can be used as ancillary information to aid in the selection of the sampling locations. 2 Measure the vis-NIR spectra of all the samples collected in step 1. 3 Select a representative optimal subset of samples to calibrate the soil vis-NIR models. The size of the subset must guarantee both small prediction errors and reasonable investment in conventional laboratory analyses. This optimal number of samples can be obtained from a dataset with a size that enables probability density functions (pdfs) of the whole dataset to be reproduced as proposed in Ramirez-Lopez et al. (2014) and also described and evaluated in the present study. 4 Send the selected calibration samples to the laboratory for conventional analysis to determine reference values for the soil properties to be mapped. 5 Use the calibration samples to create vis-NIR models capable of predicting the soil properties and quantify the uncertainty of prediction. 6 Use the vis-NIR models to predict the properties for samples that were not submitted for laboratory analysis. This step should be carried out for only vis-NIR models with acceptable prediction accuracy. 7 For each property, the predicted values of the samples not included in the calibration set and the values determined through conventional laboratory analysis for the calibration set must be pooled and the required analyses for mapping the soil property of interest can then be performed.

Conclusions
We investigated the use of vis-NIR spectroscopy for highresolution soil mapping at the farm scale. The main conclusions were as follows.
• The number of samples used for calibration of the vis-NIR model can be selected optimally based on their spectral variation. In our case study, only 26% of the soil samples would be sent to the laboratory for conventional soil analysis. Despite the small number of calibration samples, predictive performance of the vis-NIR models was satisfactory. • The vis-NIR models fitted to a dataset from a small-scale spectral library benefitted from a local modelling approach. In this case, the local models might also be used to exclude potential outliers.
• Prediction errors of the spectroscopic model can be easily propagated by spatial analysis using the kriging equations. • For our case study, the large prediction error of the spectroscopic model propagated by kriging resulted in smoothing of the predictions, without compromising on prediction accuracy. • Maps produced from the vis-NIR-augmented data were similar (albeit slightly worse than) to maps produced using only laboratory measurements. This is an important result because the costs incurred in producing the vis-NIR-augmented maps were considerably less than those for laboratory-based maps.