Improving soil organic carbon predictions from a Sentinel – 2 soil composite by assessing surface conditions and uncertainties

Soil organic carbon (SOC) prediction from remote sensing is often hindered by disturbing factors at the soil surface, such as photosynthetic active and non – photosynthetic active vegetation, variation in soil moisture or surface roughness. With the increasing amount of freely available satellite data, recent studies have focused on stabilizing the soil reflectance by building reflectance composites using time series of images. Although composite imagery has demonstrated its potential in SOC prediction, it is still not well established if the resulting composite spectra mirror the reflectance fingerprint of the optimal conditions to predict topsoil properties (i.e. a smooth, dry and bare soil). We have collected 303 photos of soil surfaces in the Belgian loam belt where five main classes of surface conditions were distinguished: smooth seeded soils, soil crusts, partial cover by a growing crop, moist soils and crop residue cover. Reflectance spectra were then extracted from the Sentinel – 2 images coinciding with the date of the photos. After the growing crop was removed by an NDVI < 0.25, the Normalized Burn Ratio (NBR2) was calculated to characterize the soil surface, and a threshold of NBR2 < 0.05 was found to be able to separate dry bare soils from soils in unfavorable conditions i.e. wet soils and soils covered by crop residues. Additionally, we found that normalizing the spectra (i.e. dividing the reflectance of each band by the mean reflectance of all spectral bands) allows for cancelling the albedo shift between soil crusts and smooth soils in seed – bed conditions. We then built the exposed soil composite from Sentinel – 2 imagery for southern Belgium and part of Noord-Holland and Flevoland in the Netherlands (covering the spring periods of 2016 – 2021). We used the mean spectra per pixel to predict SOC content by means of a Partial Least Squares Regression Model (PLSR) with 10 – fold cross – validation. The uncertainty of the models was assessed via the prediction interval ratio (PIR). The cross validation of the model gave satisfactory results (mean of 100 bootstraps: model efficiency coefficient (MEC) = 0.48 ± 0.07, RMSE = 3.5 ± 0.3 g C kg – 1 , RPD = 1.4 ± 0.1 and RPIQ = 1.9 ± 0.3). The resulting SOC prediction maps show that the uncertainty of prediction decreases when the number of scenes per pixel increases, and reaches a minimum when at least six scenes per pixel are used (mean PIR of all pixels is 12.4 g C kg – 1 , while mean SOC predicted is 14.1 g C kg – 1 ). The results of a validation against an independent data set showed a median difference of 0.5 g C kg – 1 ± 2.8 g C kg – 1 SOC between the measured (average SOC content 13.5 g C kg – 1 ) and predicted SOC contents at field scale. Overall, this compositing method shows both realistic within field and regional SOC patterns.


Introduction
Soils have become part of the global agenda for climate-change mitigation and adaptation initiatives (Amelung et al. 2020).The role of soils in these initiatives is two-fold.Firstly, carbon (C) sequestration in soils is part of negative emission options for climate change mitigation (Paustian et al. 2016).Secondly, increasing soil organic matter (SOM) and hence its main component soil organic carbon (SOC) is crucial for the adaptation of agricultural systems to climate change, through its effect on soil health via water and nutrient holding capacity and soil structure.Implementing effective soil-based climate-change mitigation strategies requires the capacity to measure and monitor SOC with acceptable accuracy at a low cost and over large areas (Paustian et al. 2016).Traditional soil mapping techniques have limitations related to E-mail address: klara.dvorakova@uclouvain.be (K.Dvorakova).
Contents lists available at ScienceDirect Geoderma journal homepage: www.elsevier.com/locate/geodermahttps://doi.org/10.1016/j.geoderma.2022.116128Received 8 March 2022; Received in revised form 19 August 2022; Accepted 21 August 2022 financial and time constraints, not being able to meet the spatiotemporal resolution required.Existing soil mapping products covering a large geographical extent are available either at coarse spatial resolution of 250 m to 1 km (de Brogniez et al. 2015;Jones et al. 2005), or are based on legacy data such as the harmonized World Soil Database (1:5 000 000; Nachtergaele et al. (2009)).This spatiotemporal resolution is insufficient, as changes in SOC are often related to agricultural management decisions taken at the field / farm-scale with a realistic return on investment in mind.
Digital soil mapping emerged as an alternative to meet the global demand for soil maps (McBratney et al. 2003).Earth observation serves as a source of co-variates for digital soil mapping, as satellite images allow monitoring over time and over large areas with a resolution that allows distinguishing within and between field patterns.Many studies have demonstrated the potential of remote sensing for SOC mapping either directly when bare soils are exposed or indirectly quantifying the biomass input into the soils (Angelopoulou et al. 2019).However, only a limited number of studies have focused on directly predicting soil properties for areas larger than 500 km 2 .Silvero et al. (2021) mapped 4 815 km 2 in southeastern Brazil, Diek et al. (2017) mapped 12 000 km 2 in the Swiss Plateau and Zepp et al. (2021) mapped 130 000 km 2 in the German federal state of Bavaria and adjacent areas.These studies all used exposed soil composites and therefore provided a nearly continuous cover of croplands.Indeed, the area that is likely to be predicted on a single remote sensing scene is limited as many pixels are covered by crops, and the periods during which exposed soils dominate are restricted to short time windows (Zepp et al. 2021).Creating an exposed soil composite, i.e. stacking several spatially overlapping images into one composite image based on an aggregation function, allows combining all exposed soils of all input scenes, thus increasing their extent.The soil information that can be derived from the composite image is then averaged over multiple years.Averaging over multiple years will go at the expense of temporal resolution.However, as SOC changes can usually only be observed over a period of 5-10 years, integration over five years is not the main constraint for the composite.On the one hand, averaging exposed soil spectra over several years produces new spectrally enhanced data, which are less sensitive to seasonal differences in soil surface conditions.On the other hand, it has been shown that SOC prediction models are strongly hindered by atmospheric disturbance (varying with season, clouds, sun azimuth and elevation), as well as by varying surface conditions due to roughness, moisture or crop residue cover (Castaldi et al. 2019a;Gholizadeh et al. 2018;Vaudour et al. 2019).Ultimately, a spectrum does not necessarily reflect the pure soil signal at least for an important fraction of the cropland area.Besides atmospheric correction and cloud masking, a common approach relies on the empirical definition of a spectral index threshold that is used to differentiate between soils in suitable and unsuitable conditions.Well-known indexes are the normalized difference vegetation index (NDVI) (Loiseau et al. 2019;Rogge et al. 2018), the bare soil index (BSI) (Diek et al. 2017), the normalized burn ratio (NBR2) (Demattê et al. 2018;Gallo et al. 2018), and Sentinel-1 derived volumetric soil moisture per pixel (Vaudour et al. 2021).So far, index thresholds are chosen on trial and error basis, where the threshold providing the best model performance is usually selected e.g.(Castaldi et al. 2018;Dvorakova et al. 2021;Vaudour et al. 2021).The extent to which these index thresholds distinguish between different soil surface conditions is rarely reported and may also be specific for pedoclimatic zones.
The NBR2 is computed as the normalized difference between two SWIR bands.It was initially defined by van Deventer et al. (1997) for the Landsat TM to detect crop residue cover for fields under conservation tillage.It has also shown to be able to detect moist soil (Demattê et al. 2018), because the SWIR bands are on the shoulders of the water absorption feature (Musick and Pelletier 1988).Given the high temporal and spatial resolution needed for SOC mapping with composite imagery, the multispectral instrument (MSI) aboard the Sentinel-2 (S-2) constellation was chosen.The S-2 constellation is composed of twin satellites S-2A and S-2B in the same orbit and phased at 180 • (Drusch et al. 2012).Together, they provide time series with high revisit frequency (five days at the equator).The S-2 MSI has 13 spectral bands of which four cover the visible spectral range (440 nm to 670 nm), six cover the red-edge and the near infra-red spectral range (700 nm to 1000 nm) and three bands cover the shortwave infra-red spectral range (1300 nm to 2200 nm).SOC shows a relationship with electromagnetic radiation in all these spectral regions (Ben-Dor et al. 1997).The S-2 mission has already shown to be promising for SOC mapping.Gholizadeh et al. (2018) obtained good results when predicting SOC in a 0.93 km 2 cropland field in Czech Republic (RPD = 1.92,RMSE = 0.8 g C kg -1 ).Castaldi et al. (2019a) predicted SOC for a 10 000 km 2 area in Northern Germany with good accuracy when NBR2 thresholding was applied (RPD = 2.3, RMSE = 13.4 g C kg -1 ).Vaudour et al. (2021) used an S-2 exposed soil composite to predict SOC in the Versailles Plain (221 km 2 ), France (RPD = 1.4,RMSE = 3.7 g C kg -1 , R 2 = 0.5).Dvorakova et al. (2021) obtained good results when predicting SOC on an S-2 exposed soil composite in central Belgium (3 630 km 2 ) using NBR2 thresholding (RPD = 1.6, RMSE = 3.6 g C kg -1 , R 2 = 0.56).McBratney et al. (2003) have stated that digital maps of soil properties should be accompanied by a spatial measure of their associated uncertainty.This uncertainty of prediction must be assessed to characterize the quality of the prediction maps and these limits allow the user to define whether the product is suitable for the intended use (Gomez et al. 2019;McBratney et al. 2003).Compared to the common model performance indicators such as MEC, RMSE, RPD or RPIQ which are calculated only from points for which the reference values are known, the uncertainty maps can be calculated for all new predictions.One of the rare examples of uncertainty quantification for soil prediction from imaging spectroscopy is provided by Gomez et al. (2015).They analyzed the uncertainties that affect clay content mapping from VNIR/SWIR airborne data.Brodský et al. (2013) have evaluated the spatial uncertainty of the SOC prediction maps acquired by means of field spectroscopy covering a single cropland field in the Czech Republic.
In response to the scarcity of SOC maps accompanied by the uncertainty of prediction, this study aims to predict SOC from an S-2 exposed composite and quantify the uncertainty for the croplands in parts of Belgium and The Netherlands.The first part of this study consists of gathering in-situ field observations of soil surface conditions of croplands (notably crop residues, soil moisture and soil crusts), and linking them to remote sensing information, more specifically the NBR2 index.Once a NBR2 threshold based on field observations is found that allows eliminating soils in unsuitable conditions for SOC prediction from remote sensing, the exposed soil composite is created using this threshold.The second part of this study focuses on uncertainty assessment for SOC prediction maps from an S-2 exposed soil composite.Here we use bootstrapping to vary the calibration data set and quantify the effects on SOC prediction.Given the large storage and high computing power needed for the creation of an exposed soil composite, the Google Earth Engine cloud computing platform (Gorelick et al. 2017) is employed for processing bare soil surface spectra using S-2 multi--temporal data.The composite is used to predict SOC over large areas: 16 900 km 2 in southern Belgium, and 6 350 km 2 in the Dutch polders.These zones were selected because of their large cropland extent (5 370 km 2 ).

Study site
The study was conducted in Wallonia, southern Belgium and in part of Noord-Holland and Flevoland in the Netherlands (Figure 1).The study area in Wallonia covers 16 900 km 2 .There is an increase in precipitation (from 800 to 1 200 mm) and a decrease of mean annual temperature (from 10 to 8 • C) from the northwest to the southeast (Chartin et al. 2017).Sandy loam soils are dominant in the north west (haplic Luvisols, WRB (2014)), silty soils in the center and shallow silt loam and stony soils in the south east (dystric Cambisols, WRB (2014)).
A shift from intensive arable agriculture to more extensive cattle breeding can be observed along the same gradient (Chartin et al. 2017).Winter cereals, sugar beet, maize and potatoes are the dominant crops, grown in a three-year rotation.Most soils are under conventional tillage using a moldboard plow.The croplands amount to 3 878 km 2 .
The study area in the Netherlands contains four polders, which were reclaimed from the IJsselmeer during the 20 th century (Figure 1B) (Waterman et al. 1998).The oldest polder Wieringermeer (drained in 1930) covers 200 km 2 and is part of the province Noord-Holland.The Noordoostpolder was drained in 1942 and spreads across 480 km 2 .The Flevoland polder was reclaimed in two parts: East Flevoland drained in 1957 (540 km 2 ) and South Flevoland drained in 1968 (430 km 2 ) (Waterman et al. 1998).The soils are calcareous and relatively young, with predominantly loamy soils with clayey intrusions (Fluvisols and Gleysols, WRB (2014)).The agriculture is intensive, with maize, winter wheat and potatoes being the dominant crops.As both the Dutch and Belgian study areas are dominated by conventional agriculture, soil properties are uniform in the top 0-20 cm due to repeated tillage (Meersmans et al. 2009).

Field data
A total of 303 observations of the soil surface were made in the Belgian loam belt region (red points on Figure 1C).The data were collected on nine occasions from October 2019 to April 2021 (Table 1).The locations were chosen at least 20 m from the field border in order to avoid mixed pixels.For each field observation, an RGB geo-referenced (7-8 m precision) photo was taken with a smartphone.For each observation we assigned a yes/no variable according to the state of the surface conditions (i.e.seedbed, crop residues, moist soils, growing crops and soil crust).The soil was categorized as moist if puddles were visible or if a color difference could be attributed to partial drying of the soil surface.Ultimately, observations with mixed soil surface conditions (e.g.presence of vegetation on wet soil, etc.) were discarded.At some of the locations, if the conditions allowed it (field was not freshly seeded or vegetation has not developed yet), soil samples were taken in March 2021, avoiding field edges.In total 51 pseudo-random (given site availability) surface soil samples (0-10cm) were collected (Figure 1C, referred to as the Belgium calibration dataset).Each sample consists of five sub-samples collected within a circle of 5 m radius centered on the geographical position of a sampling plot, which was recorded by a Garmin GPS with 3 m precision.The sub-samples were then thoroughly mixed and stored in a plastic bag.
Additionally, 73 geo-referenced soil samples (referred to as Dutch calibration dataset) were collected in June 2021 on the S-2 tile T31UFU (Figure 1B).These samples were selected so that (1) they were located in croplands and (2) they covered the principal component space computed from the spectra on a S-2 T31UFU image acquired on 05-04-2020.This image was selected because of the large extent of exposed soils, and the dry conditions preceding the acquisition date.The first two principal components characterizing 70% of the original S-2 image were retained to describe the spatial variability, and 100 locations were selected for sampling.Given the non-accessibility of some of the locations, 73 out of the 100 locations were sampled.In total 124 samples were available for calibration in Belgium and The Netherlands.The reason for joining these datasets was to increase the SOC variability in the calibration dataset.
The model predictions were validated on an external and independent dataset.The results from routine analyses for fertilizer advice are available in a database of the REQUASUD network since 2005 (https:// www.requasud.be/,accessed date: 30 November 2021).The soil data refer to composite samples covering an entire field (or a soil series within a field).A total of 34 385 cropland fields were sampled from 2017 until 2021 (referred to as the REQUASUD independent validation dataset).

Laboratory analyses
All samples (i.e. the Belgium and the Dutch calibration datasets) were air-dried, gently crushed and passed through a 2 mm sieve.SOC was analyzed by dry combustion, using a VarioMax CN Analyzer (Elementar Analysensysteme GmbH, Hanau, Germany), as detailed in Shi et al. (2020).For samples showing reaction with 10% HCl (1 out of the 51 samples from the Belgian calibration dataset and 62 out of the 73 samples from the Dutch calibration dataset), carbonate content was measured using a modified pressure-calcimeter method (Sherrod et al. 2002).Then, SOC was obtained by subtracting the inorganic carbon from the total carbon content.The SOC content of the composite samples of the REQUASUD independent validation dataset is measured via sulfochromic oxidation (ISO 14235) or dry combustion (ISO 10694) (Genot et al. 2012).
To evaluate the general accuracy of the prediction, the difference for all samples between predicted and measured SOC contents (i.e. the residues) of the external independent dataset was calculated (equa-tion1).

Remote sensing data
Remote sensing data were obtained using the Multi-Spectral Instrument (MSI) on board of the Sentinel 2-A (S-2A) and S-2B platforms, as the S-2 mission is a constellation with twin satellites.The MSI has 13 spectral bands, including four bands of 10 m resolution and six bands of 20 m resolution (see Drusch et al. (2012) for a detailed description of the MSI).S-2 imagery was obtained as Level-2A product; i.e. geometrically, radiometrically and atmospherically corrected using sen2cor, in the Google Earth Engine (GEE) environment (COPERNICUS/S2_SR).Seven S-2 tiles 31UFU, 31UES, 31UER, 31UFS, 31UFR, 31UGS and 31UGR were used.A cloud mask precomputed with the s2cloudless Python package, and available as GEE COPERNICUS/S2_CLOUD_PROBABILITY product, was used to remove pixels affected by clouds (pixels with 'probability'<30 were kept).We used nine bands (B2, B3, B4, B5, B6, B7, B8, B11 and B12) resampled to 20 m resolution.The images were masked using the Walloon cropland map extracted from the Land Parcel Information system for Wallonia in 2019 (http://geoportail.wallonie.be,accessed date: 28 January 2021) and the Dutch cropland map 2017 from the Ministry of Economic Affairs and Climate (https://data.overheid.nl,accessed date: 5 July 2021).

Sentinel-2 single date imagery
The S-2 spectrum was extracted for each of the 242 geo-referenced field observations to link these with S-2 imagery.For each field observation, the closest S-2 acquisition date was selected (Table 1).The time difference in ground campaign and image acquisition was in the majority of cases not more than two days (Table 1).However, at three occasions the S-2 image was acquired more than three days after the ground campaign.It was ensured that the weather conditions between the ground campaign and image acquisition remained similar.Fields that were in seedbed condition during the ground campaign and for which the image was acquired more than two days ahead of time (this is the case for the October field campaign) were not included in the sample to minimize the chance of false observations, as it is not possible to know how long ago the field was ploughed.
The normalized reflectance is derived to highlight the spectral shape and reduce the amplitude of variation of the spectra (equations ( 2) and (3); Rogge et al. ( 2018)): Table 1 The acquisition date of the geo-referenced field observations, versus the acquisition date of the Sentinel-2 (S-2) imagery.
The Normalized Burn Ratio 2 (NBR2) spectral index was calculated for the 242 S-2 spectra (equation (4); van Deventer et al. (1997)): where ρ is the surface reflectance (%) of the shortwave infrared (SWIR) spectral regions (i.e.SWIR1=B11 and SWIR2=B12 for the MSI on board of the S-2 constellation).The NBR2 index was then compared amongst the five soil surface categories, with the aim to obtain an NBR2 threshold value that separates moist soils and soils with crop residues from the other categories.

Sentinel-2 multi-temporal series
We used Sentinel-2 spectra acquired during the spring months from 2016 to 2021 (March, April and May) to build the exposed soil composite in the GEE environment by a pixel based multi-temporal analysis.The compositing refers to the process of combining spatially overlapping images into a single image based on an aggregation function, in this case the mean.The mean was chosen because it is the most common approach in soil compositing for the moment (Diek et al. 2016;Rogge et al. 2018;Vaudour et al. 2021).In total, 158 images per S-2 tile were used.We focused on the spring months because this is the period in temperate cropping systems when the maximum extent of bare soil occurs during preparation for and sowing of the summer crops (e.g.sugar beet, potato, maize).Moreover, the soil surface dries out quickly during sunny conditions that are required for a cloud free S2 image and the sun zenith angle is already high (avoiding the winter months when sun zenith angle drops below 70 • ).The low sun zenith angle hampers the correct estimation of atmospheric parameters used for converting Level-1C to Level-2A product (Vermote et al. 2016).Then the spectra were normalized (equations ( 2) and (3)), and the NBR2 threshold was applied to mask out moist soils and crop residues.Additionally, the NDVI (Rouse et al. 1974) was calculated (equation ( 5)): where ρ is the surface reflectance (%) of the red and near-infrared (NIR) spectral regions (i.e.Red=B4, NIR=B8 for the MSI on board of the S-2 constellation).NDVI values range between -1 and 1, where higher values of NDVI indicate extensive green vegetation coverage.Choosing a threshold NDVI value is required for masking green vegetation.The threshold was determined by (i) visually inspecting the S-2 RGB images and by (ii) minimizing the 'salt-and-pepper' patchiness of the resulting mask.For all images, pixels with NDVI values above 0.25 were considered as pixels containing green vegetation.Overall, only the pixels which comply with the following conditions were kept: no clouds, ∈ cropland, NDVI < 0.25, NBR2 < NBR2 threshold .The NBR2 threshold used was found following the methodology in section 2.4.1.Since an image collection from spring months 2016-2021 was used (in total 158 images per tile), it is very likely that several dates comply with the abovementioned conditions for a pixel, i.e. the number of scenes per pixel > 1.Therefore, we computed the mean reflectance value for each band per pixel.This mean value represents the final exposed soil composite, which was used for SOC prediction.Note that as cloud shadows have not been removed, they might affect the composite.However, we hypothesize that the effect of shadows strongly decreases when averaging multiple S-2 spectra into the final composite.For further analysis, the number of scenes contributing to the mean spectrum of each pixel was calculated and stored in a database.

SOC prediction models
Exposed soil composite reflectance values (Section 2.4.2) were regressed against topsoil SOC contents from the corresponding pixels in the calibration dataset (n=124) in order to calibrate a SOC model.Partial least squares regression (PLSR) was then chosen to construct the SOC prediction models.The PLSR approach uses the full spectrum to establish a linear regression model where the significant spectral information is contained in a few orthogonal factors, called latent variables (LV) (Nocita et al. 2014;Wold et al. 2001).Because a limited number of samples were available, they could not be partitioned into calibration-validation datasets.Therefore, a ten-fold-cross-validation procedure was adopted to estimate the prediction capability of the PLSR model for the training set.With this method, the dataset is divided randomly into ten parts.Nine of those parts were used for training, and one part was reserved for testing.This procedure was repeated ten times, each time with different partitioning of the original dataset.
The PLSR analyses were performed using the 'pls' package developed in an R environment.To avoid over-or under-fitting, the optimal number of LV was determined as the one producing a model having the minimal Root Mean Square Error (RMSE) of cross-validation, while the maximum number of LV's was set at five in order to avoid over-fitting the model.The Variance Importance Projection (VIP) index was calculated in order to detect the main spectral regions involved in the SOC prediction.Spectral bands with VIP values greater than one were considered important in the PLSR model.
The quality of model fit was assessed using the following parameters: Model Efficiency Coefficient (MEC) which is the equivalent of the Rsquare of predicted and observed values computed against the 1:1 line , Root Mean Square Error (RMSE), Ratio of Performance to Deviation (RPD) and Ratio of Performance to Interquartile Range (RPIQ) of ten-fold cross-validation (equations ( 6)-( 9)): where ŷ = predicted value, y= mean observed value, y = observed values, n = number of samples with i = 1, 2, …, n, and SD the standard deviation and IQ the interquartile range (quartiles 0.25 and 0.75) of the observed values.Thresholds for RPD can be found which classify the models into three categories: non reliable when RPD < 1.4, fair when 1.4 < RPD < 2 and excellent when RPD > 2 (Chang et al. 2001).Minasny (2013), however, considers these thresholds to be arbitrary.We will, therefore, not use the thresholds as model performance indicators, but provide these for comparison with the literature only.According to Bellon-Maurel and McBratney (2011), RPIQ index is a more realistic evaluation of a model, when working with non-normal data.
Additionally, each set of calibration samples was subject to bootstrapping to stabilize the prediction model performance, and to calculate the variance of prediction, which is an input in the measure of uncertainty (Malone et al. 2011).Bootstrapping consists of repeatedly calculating a given statistic from a series of subsamples obtained by randomly resampling with replacement an initial dataset (Efron and Tibshirani 1993).Hence, 100 PLSR models were created (100 bootstraps), each with a different calibration dataset of 124 samples, drawn with replacement from the original sample that has the same sample size.These models were applied to the exposed soil composite, resulting in 100 SOC prediction maps of the area.

SOC maps and their uncertainty
The 100 SOC prediction maps were averaged at each point in space in order to obtain one prediction value per pixel (Figure 2), while the variance of the 100 predictions as well as the mean squared error (MSE) estimated from the cross-validation were used to quantify uncertainty (Malone et al. 2016).An overall prediction variance at each pixel is the sum of the random error component (MSE) and the bootstrap prediction variance as follows (equation (10); Malone et al. (2016)): where VAR all is the overall variance i.e. the sum of the variance of the 100 bootstrap predictions and MSE vali the mean squared error estimated by validation.To derive the 90% prediction interval, the square root of the variance estimate is multiplied by the z value that corresponds to the 90% probability (z=1.64), to calculate the Standard Error (SE, equation ( 11)): The Upper and Lower Prediction Limits (UPL and LPL) are then calculated (equations ( 12) and ( 13)): Where MEAN boot is the mean bootstrap prediction.The Prediction Interval Range (PIR) is then the difference between the UPL and LPL (equation ( 14)): Here, we express the model output uncertainties as the PIR (Malone et al. 2011).To assess whether a link exists between the number of scenes per pixel and the uncertainty, 5000 points were randomly selected within the croplands of the loam belt (2500 points) and the croplands of the polders of Flevoland and Wieringermeer in the Netherlands (2500 points).For each of these points, new composites were created from the S-2 time series (Section 2.4.2),where the number of scenes per pixel included in the calculation of the composite varied from 1 to N. The composite spectra were calculated as follows: for each of the 5000 points of Dataset n , randomly extract n ∈ [1,N] spectra from the Sentinel-2 time series, and calculate the mean of the selected spectra if n > 1.Hence, n represents the number of scenes per point (Figure A1).For example, the points in Dataset 4 contain the mean of four spectra randomly selected from the Sentinel-2 time series for each of the 5000 points.The PLSR model with bootstrapping as described above was then applied to the 5000 points of each dataset, creating a database of SOC predictions with dimensions N x 5000 x bootstraps.Given the large computational load of this procedure, the number of bootstraps was set to 50 and N was set to 24.To guarantee that the datasets were balanced, it was ensured that each of the 5000 randomly extracted pixels appeared on at least 24 scenes.Finally, the uncertainty is calculated for each point of each Dataset n .Based on this, the minimal number of scenes per pixel F min which allows to minimize the uncertainty of SOC predictions is found.All pixels for which the number of scenes on the exposed soil composite < F min are then eliminated.Since SOC predictions per pixel might suffer from noise, due to for example mixed pixels, the SOC prediction information is aggregated per field.This also allows validating the results against the independent REQUASUD dataset which provides SOC content of composite samples collected in 34 385 fields.After eliminating pixels for which the number of scenes < F min , the 100 SOC prediction maps are each averaged per field, providing 100 mean values per field.Fields which contain less than 10 pixels were eliminated to avoid errors due to mixed pixels in such very small fields.

In-situ observations of soil conditions
Out of the 303 soil surface observations 242 presented only a single type of disturbance: (a) seedbed (n=76), (b) crop residues (n=38), (c) moist soils (n=13), (d) growing crops (n=10), and (e) soil crust (n=114).The 242 extracted S-2 spectra were classified into five classes based on the observations of the surface conditions (Figure 3), and the spectra were normalized (equations ( 2) and ( 3)) and averaged per category (Figure 4).The spectrum of a soil in seedbed condition is the desirable shape for soil property prediction from remote sensing.The spectrum of vegetation shows recognizable features in the visible and infra-red region, with high absorption in B4 due to photosynthetic activities and high reflectance in B8 due to leaf structure (spongy mesophyll; Figure 4) (Rouse et al. 1974).The NDVI was designed to quantify this increase in reflectance and spectra affected by vegetation can be differentiated from exposed soils with an NDVI threshold.The spectrum of a soil crust has an overall higher albedo than a spectrum of a soil in seedbed condition, but has a similar shape (Figure 5A).According to the literature, the effects of crusts on reflectance are mainly translated in a change of albedo, with absorption features varying slightly at 1450 nm and 1900 nm (Ben-Dor et al. 2003).Since these spectral regions are not corresponding to any of the S-2 MSI band, we hypothesize that structural crusts in the loam belt region in Belgium only affect the soil albedo, and the albedo shift can be corrected for by spectral normalization (Figure 5B).This hypothesis still has to be studied in other geographic conditions.For example, the colonization of physical soil crusts by algae or lichens in semi-arid and arid regions induces a large spectral variability at 550, 680 and 750 nm, due to chlorophyll absorption (Chamizo et al. 2012).This chlorophyll absorption feature might be observable on S-2 spectra, and a simple spectral normalization will not be enough to mask such biological soil crusts.Additionally, soil albedo is not only affected by soil crusts.In fact, Muller and Décamps (2001) observed strong impact of soil moisture on soil albedo, and Franceschini et al. (2015) observed that the albedo decreases when SOC and clay content increases.Therefore, by normalizing the spectra, the albedo effect caused by soil crust, soil moisture, but also by SOC are eliminated.This might lead to a reduction in the predictive power of the SOC models which use normalized spectra as input.
Finally, the spectra of moist soils and soils with crop residues (i.e.non-photosynthetically active vegetation) show a higher absorption in B12.For crop residues this absorption feature is due to the structural compounds of cellulose, hemicellulose and lignin found in dry residues at 2100 nm (Daughtry et al. 1996).The higher absorbance in B12 for moist soils is a result of the strong absorption band between 1750-2000 nm due to water, that also impacts the adjacent wavelength regions covered by B11 and B12.Hence, the NBR2 index, which quantifies the difference between B11 and B12 shows a stronger response to spectra affected by soil moisture and crop residues, rather than to spectra of soil crusts and of soils in seedbed conditions (Figure 6).Given the high  absorption of photosynthetically active vegetation in B12 due to high water content, the NBR2 index is also important for vegetation.ANOVA analysis of soils with crust and in seedbed conditions against moist soils, vegetation and crop residues showed a significant difference in NBR2 (at α=0.10 significance level).An NBR2 threshold of 0.05 was found to be able to differentiate soils with crust and in seedbed conditions from moist soils, vegetation, and crop residues (Figure 6).

Exposed soil composite comparison with in-situ soil observations
An exposed soil composite image for the areas of interest was built from the S-2 image collection (Figure 1).The pixels which complied with the following conditions were kept: no clouds, ∈ cropland, NDVI < 0.25, NBR2 < 0.05.For comparison with the soil surface categories, the composite spectra for 242 in-situ observation locations were averaged per class.The mean normalized spectrum showed a close resemblance to a spectrum of a soil in seedbed conditions and soil crusts (Figure 7).The widespread occurrence of soil crusts during the spring months is reflected in the spectra of the composite, although the difference to the spectra of the seedbeds is minimal.Overall, the combination of a NBR2 < 0.05 threshold and spectral normalization increases the likelihood of including only spectra in suitable conditions in the exposed soil composite.
An important step in the methodology was the definition of thresholds to discriminate soils from other targets (vegetation, wet soils and crop residues).These thresholds are site specific, and no rule of thumb exists to adjust them according to regional or local spectral patterns (Demattê et al. 2020).The common NDVI thresholds used in the literature for vegetation masking are 0.20 (Gholizadeh et al. 2018;Gomez et al. 2016;Lagacherie et al. 2019), 0.25 (Demattê et al. 2018;Dvorakova et al. 2021;Žížala et al. 2019Žížala et al. ), 0.30 ((Gomez et al. 2019;;Safanelli et al. 2020) and 0.35 (Castaldi 2021;Castaldi et al. 2019a).The choice of stricter thresholds to minimize the inclusion of pixels with disturbing effects is made at the expense of the extent of the area that can be mapped (Vaudour et al. 2021).Here, NDVI < 0.25 represents a good trade-off between mappable area and effectivity (evaluated visually) to mask vegetated surfaces.The NBR2 threshold to discriminate moist soils and crop residues was defined based on field observations, and was set to 0.05 (Figure 6).Vaudour et al. (2021) used NBR2 < 0.053 to detect bare soils.They defined the index value as the first quantile of the NBR2 distribution across S-2 images of the Versailles Plain, France.Demattê et al. (2018) stated that the use of 0.075 as NBR2 threshold provides the best association between laboratory and Landsat 5 spectra in a 14,614 km 2 area in southeastern Brazil.Castaldi et al. (2019a) tested different NBR2 thresholds and found that NBR2 < 0.05 improves SOC prediction, but reduces the area for which SOC can be mapped in Demmin, Northern Germany.Ultimately, the choice of thresholds used to mask unwanted pixels is bound to have an effect on the exposed soil composite, whether on its quality or its spatial extent.Given that both study areas (southern Belgium and the Dutch polders) have similar climate conditions and are both dominated by intensive arable cropping systems, we assumed similar thresholds for NDVI and NBR2.In total 83% of the total cropland area was included in the exposed soil composite when using NDVI < 0.25 and NBR2 < 0.05.The number of observations per pixel varied, with 18% of pixels having less than seven observations per pixel (Figure 8).

SOC prediction models
The general statistics of the calibration datasets and the two independent validation datasets are shown in Table 2 and Figure 9.The SOC content of the 124 calibration points was on average 14.2 g C kg -1 and was rather variable: variation coefficient (CV) of 29% (Table 2).The exposed soil composite obtained in section 3.2 provided a good basis for SOC mapping over large areas.The average prediction accuracy of the 100 ten-fold-cross validation runs yielded an MEC of 0.48 ± 0.07, RMSE of 3.5 ± 0.3 g C kg -1 , RPD of 1.4 ± 0.1 and RPIQ of 1.9 ± 0.3 (Figure 10A).In 71% of all cases, the best performance of the PLSR model was obtained when using four LVs.The accuracy is comparable to the one obtained by Vaudour et al. (2021), who applied an NBR2 threshold to S-2 composite imagery to create bare soil composite.They obtained R 2 = 0.53, RMSE = 3.2 g C kg -1 and RPD = 1.46 and covered 34.28 km 2 of croplands in the Versailles Plain, France.
The VIP pointed to the relative importance of S-2 bands B2, B11 and B12 in the model (i.e.VIP > 1; Figure 10B).This is in agreement with previous studies (Castaldi et al. 2016;Dvorakova et al. 2020).In particular B11 and 12 are also the ones that react to disturbing effects from residues and soil moisture (see section 3.2).This underlines the importance of the NBR2 threshold to remove pixels with residue cover or moist pixels in order to avoid using these for SOC prediction models.

SOC maps and their uncertainty
Three maps were constructed: i) mean SOC content map, ii) a map of the number of scenes per pixel of the exposed soil composite and iii) a map visualizing the PIR of the SOC prediction (Figure 11).According to Gomez et al. (2015) there are five sources of uncertainty in soil property mapping from remote sensing: i) uncertainty of the spectra linked to device repeatability and ii) spatial positioning; iii) uncertainty of the reference lab values; iv) uncertainty of the model building linked to the choice of calibration set and v) choice of model dimension.Here we used the bootstrapping method (100 bootstraps, section 2.4.2) to assess the uncertainty linked to the model building (iv).Given the spectral stability of the exposed soil composite and the radiometric stability and spatial resolution (20 m) of S-2 MSI (Drusch et al. 2012), the uncertainty related to the spectra (i) and its positioning (ii) was neglected.We also   neglected the uncertainty related to the reference lab values (iii), as it is not strictly related to spectral modelling.
Higher uncertainty values were generally observed for field borders (Figure 11B bottom), where the number of contributing scenes is low (Figure 11B top) and pixels can represent a mixture of the surface conditions in both of the adjacent fields.Additionally, mixed pixels were excluded from the calibration dataset and the PLSR model is therefore not trained to predict mixed pixels.Several techniques exist for spectral unmixing, i.e. the separation of a pixel spectrum into its pure component endmember spectra, and the estimation of the abundance value for each endmember (Plaza et al. 2009).These techniques are however computationally intensive and require a large dataset of training samples.
The southern part of the SOC prediction map of the Belgian loam belt shows spots with higher SOC content within the fields (Figure 11b).These are the remains of pre-industrial charcoal kilns.On bare cropland soil, the enrichment with charcoal appears as circular black spots up to 40 m in diameter.The charcoal has been mixed through the topsoil by repeated tillage over time (Hardy et al. 2019).The uncertainty map of SOC prediction displays patterns linked to the number of scenes per pixel.Large uncertainty (average PIR = 14 g C kg -1 ) is observed in the western part of Noord-Holland (Figure 11A bottom), compared to the neighboring Wieringermeer polder with smaller uncertainty (average PIR = 11 g C kg -1 ).This large uncertainty coincides with a small number of scenes contributing to the composite map (Figure 11A top).This small number of scenes is linked to the early season cropping leading to high NDVI values on the spring composite.Spring-blooming bulbs, such as   tulips, are widespread in the sandy region bordering the coast.Overall, it can be seen that uncertainty hardly decreases for pixels representing the mean spectrum of six or more images (Figure 12).Such a threshold is not commonly specified.Castaldi (2021) for example arbitrarily chose three scenes as a minimum for an exposed soil composite and Zepp et al. (2021) suggested considering the number of cloudless scenes per pixel as an internal quality measure.We observed that setting the threshold of minimum six scenes can lower the mean PIR of the prediction maps from 15.0 to 12.4 g C kg -1 (Figure 12).This has, however, an effect on the total surface of croplands covered by the SOC prediction map: 83% without threshold versus 65% if at least six scenes per pixel are required.
The pixels for which less than six scenes are available are generally field borders (mixed pixels, Figure 11B), soils with bad drainage (removed by the NBR2 threshold), and fields where farming practices do not allow full exposure during spring months (removed by the NDVI or NBR2 thresholds, Figure 11A).A map of the mean SOC content per field was constructed by averaging SOC predicted values per field entity, to allow an independent validation with the REQUASUD dataset.The coverage of the mean SOC map is lower than the one of the per pixel map.This is because for a number of fields the conditions for predicting the SOC were not met i.e. number of scenes per pixel > 5, minimal count of pixels per field unit, or both.The validation against an independent dataset with 34 385 samples (REQUASUD) showed a Gaussian distribution with a relatively small bias of 0.5 g C kg -1 ± 2.8 g C kg -1 (Figure 13).From the distribution it can be seen that 95% (1.96 × SD) of fields are correctly predicted within an interval of ±5.5 g C kg -1 and 68% (1 × SD) of samples are correctly predicted within th interval of ±2.8 g C kg -1 .
For the mean SOC content per field, 95% of samples have an error below 5.5 g C kg -1 (Table 3, Figure 14).The good performance of the SOC prediction for a large area (5,370 km 2 of croplands) is partially attributed to the similarity of soil types, and the low SOC contents (the measured SOC values range from 5 to 30 g C kg -1 ), making a single PLSR    model sufficient to capture the variability.For a larger area, splitting into several models based on ancillary variables (e.g.soil type or principal component analysis) should be considered (Castaldi et al. 2019b;Stevens et al. 2012).
Overall, a five-year S-2 spring composite image has shown to be appropriate for SOC prediction of loamy soils in Belgium and the Dutch polders with 5.5 g C kg -1 accuracy at the field level.SOC monitoring, however, relies on repeatability over time, with as main purpose change detection (Andries et al. 2021).This requires consistency of calibration and validation datasets, and suitable accuracy.As a matter of fact, comparison between 2009 and 2015 LUCAS surveys showed limited changes in SOC over the six years period (Fernandez Ugalde et al. 2020).The pixel SOC prediction map demonstrated large uncertainty: mean PIR = 12.4 g C kg -1 , while mean predicted SOC = 14.1 g C kg -1 (Table 3, Figure 13).Hence, PIR represents on average 88% of the predicted SOC value.While the mean SOC map might be a useful product to quantify average SOC at regional or national level, local soil monitoring of SOC is rendered obsolete with such high PIR values (Kempen et al. 2019).As suggested by Chen et al. (2022), it should be discussed whether PIR should be calculated at 90% prediction intervals, or should be narrower to be useful for decision-making.For some decision makers, 75% prediction interval might be adequate (Chen et al. 2022).Validating models by region is an alternative to improving the uncertainty, as regional MSE tends to be lower (Brungard et al. 2021).Lower MSE decreases the variance (equation ( 10)), resulting in an overall lower uncertainty.
The accuracy of SOC content prediction maps at field level (± 2.8 g C kg -1 ; Table 3) should in theory be small enough to detect the effects of management practices on SOC contents in the short to mid-term (5-10 years).In fact, van Wesemael et al. (2019) have shown that fields in the Belgian loam belt converted to conservation agriculture for at least 10 years had more than 4 g C kg -1 higher SOC contents than comparable fields under conventional agriculture.The average SOC contents per field can be used for decision making at the regional scale e.g. to characterize the SOC contents in croplands for a region with similar soils and climate or as a proxy for SOC stocks.Obviously, the bulk density, vertical gradient in SOC content and the stone content are additional variables that are not provided by the current spectroscopic approach.Garten Jr and Wullschleger (1999) were among the first to calculate a minimum detectable difference (MDD) for SOC inventories.MDD determines the level of SOC change that could theoretically be detected for each unit.The challenge associated with MDD is the large number of independent soil samples that is required (Smith et al. 2020).The UN Sustainable Development Goal (SDG) 15.3.1 defines areas of degraded land as the ones suffering from more than 10% average net reduction in SOC stocks (Sims et al. 2020).Here we provide a large inventory of SOC content data that could be used as a proxy for SOC stock in such MDD calculations.For example, the MDD of the SOC content prediction of 1000 randomly sampled fields (selected by conditioned Latin hypercube on spatial coordinates, Minasny and McBratney (2006)) in the loam belt is 0.35 g C kg -1 for an average SOC content of 12.5 g C kg -1 .This means that SOC content change of 2.8% can be detected, which is well below the 10% of the SDG 15.3.1.
It remains to be seen whether the detection limit of SOC prediction models from exposed soil composite is low enough to detect SOC stock changes in croplands, and what the appropriate period of time is to acquire a minimum of six valid scenes per pixels in the exposed soil composite.Furthermore, the methodology used for SOC modelling has to be consistent in every way, if a time series of SOC maps is to be compiled.As addressed by Zepp et al. (2021), larger and standardized validation and calibration datasets are needed, to produce large-scale SOC predictions (national to European-wide), in a consistent manner.Last but not least, de Gruijter et al. (2016) have shown that SOC prediction maps and associate error variances can be used as a source of univariate information for sampling stratification.Taking prediction errors into account in soil sampling strategy minimizes sampling variance and creates optimized stratification, leading to better performing models (de Gruijter et al. 2016).

Conclusion
Earth observation serves as a valuable base for digital soil mapping, as satellite images allow monitoring over time and over large areas with a resolution that allows distinguishing within and between field patterns.Creating an exposed soil composite, i.e. stacking several spatially overlapping images into one composite image based on an aggregation function, allows combining all exposed soils of all input scenes, thus increasing their extent.The resulting composite spectrum however does not necessarily reflect the pure soil signal at least for an important fraction of the cropland area.We have found that applying an NBR2 < 0.05 threshold and normalizing the S-2 spectra allow obtaining a reasonable soil reflectance signal, where crop residues and moist soils are excluded, and crusts are corrected for.Using these criteria in Google Earth engine, we built an exposed soil composite from Sentinel-2 imagery (covering the spring periods of 2016-2021), mapping 5,370 km 2 of croplands in Belgium and the Netherlands.The exposed soil composite was used to produce SOC prediction maps, accompanied by a measure of uncertainty.This compositing method showed both realistic within field and regional SOC patterns.An accuracy of 5.5 g C kg -1 was reached at the field level for 95% of samples.The uncertainty of prediction was assessed via the PIR.On average, PIR represented 88% of the predicted SOC value.Such high uncertainty values render local SOC monitoring obsolete, however given the high accuracy of the SOC map, it may be a useful product to quantify average SOC at regional and national level.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1 .
Figure 1.(A) Location of the croplands (in black) and the Sentinel 2 tiles (in red) in Belgium and the Netherlands.Zoom on the two study areas: (B) the polders in The Netherlands and (C) southern Belgium.In red are the 303 in-situ observations of soil surface conditions and in blue the 124 calibration samples.For each blue point in the Belgian loam belt, a soil surface observation was made.The 34 385 REQUASUD validation samples are not shown, as they are uniformly distributed throughout the Walloon region.

Figure 2 .
Figure 2. General flowchart for the SOC modelling from the exposed soil composite.

Figure 3 .
Figure 3. Examples of the five categories of soil surface conditions in Belgian croplands.

Figure 4 .
Figure 4. Mean Sentinel-2 (S-2) spectra for each category of the five soil surface conditions in the Belgian loam-belt.For the sake of clarity of the figure, the standard deviation of the spectra is only provided for the spectrum of a soil in seedbed conditions, i.e. the optimal condition for soil mapping from remote sensing imagery.The green vertical bands indicate the S-2 B4 and B8 used to calculate Normalized Difference Vegetation Index (NDVI) and the brown vertical bands are S-2 B11 and B12 used to calculate Normalized Burn Ratio 2 (NBR2).

Figure 5 .
Figure 5. (A) Sentinel-2 (S-2) spectra for the same location extracted at two dates: 26 March 2020 (orange line) with a soil crust, and 28 March 2020 (brown line) in seedbed condition.(B) The same S-2 spectra but normalized.

Figure 6 .
Figure 6.Normalized Burn Ratio (NBR2) for the 242 in-situ soil surface observations in the Belgian loam belt, categorized into five classes.An NBR2=0.05threshold can separate vegetation, crop residues and moist soils from soils with crusts and in seedbed condition.

Figure 7 .
Figure 7. Mean Sentinel-2 (S-2) spectra of the two soil surface conditions encountered in the Belgian loam belt which are suitable for soil mapping from remote sensing: seedbed and soil crust.The black spectrum is a mean S-2 exposed soil composite spectrum of spring months 2016-2021 which is used for soil organic carbon mapping.The green vertical bands indicate the location of S-2 B4 and B8 used to calculate Normalized Difference Vegetation Index (NDVI) and the brown vertical bands are S-2 B11 and B12 used to calculate Normalized Burn Ratio 2 (NBR2).

Figure 8 .
Figure 8. Percentage of mapped cropland surface by the exposed soil composite in southern Belgium and the Netherlands for groups of pixels with the same number of observations.

Figure 9 .
Figure 9. Frequency distribution of soil organic carbon (SOC) contents of the model calibration dataset (n=124) and the external validation dataset REQUASUD (n=34,385).

Figure 10 .
Figure 10.(A) Measured against predicted soil organic carbon content (SOC) of one (randomly chosen) of the 100 partial least squares regression (PLSR) model runs (10-fold cross validation).The model performance indicators refer to the average values of the 100 PLSR models.The blue line is the 1:1 line, the red line is the regression line.(B) Distribution of model performance indicators obtained from the 100 PLSR models.(C) Mean Variance Importance in Projection (VIP) scores for the 100 PLSR model runs.Predictors with VIP values greater than one (area above the horizontal line in the plot) were considered significant for the PLSR model.The width of the bars represents the width of the Sentinel-2 bands.

Figure 11 .
Figure11.Spatial patterns (from top to bottom) of the number of scenes per pixel, the predicted soil organic carbon (SOC) and the uncertainty (expressed by PIR; see equations 10-15) of the SOC predictions resulting from partial least squares regression model with 100-fold bootstrapping.(A) Noord-Holland in the Netherlands, (B) fields in the vicinity of Beuzet village in the Walloon region Belgium).Grid units are in meters from the origin of the projection.Projection: Transverse Mercator, Projected coordinate system: WGS_1984_UTM_Zone31N.

Figure 12 .
Figure 12.The uncertainty (expressed by Prediction Interval Ratio (PIR); equations 10-15) of 100 soil organic carbon (SOC) predictions resulting from a partial least squares regression model with 50-fold bootstrapping as function of the number of scenes per pixel in the exposed soil composite used for SOC prediction.

Figure 13 .
Figure 13.A comparison of the residues between the predicted and the measured SOC contents of the external validation dataset obtained from REQUASUD.The blue dashed line is 1 × SD, the red dashed line 1.96 × SD.

Figure 14 .
Figure 14.Distributions of the uncertainty (expressed by Prediction Interval Ratio (PIR), Equations (10)-(15) of 100 soil organic carbon (SOC) predictions resulting from a partial least squares regression model with 100-fold bootstrapping performed at the scale of a pixel in southern Belgium loam belt and the Dutch polders (for the extent of croplands see Figure1).

Table 2
General statistics of the soil organic carbon (SOC) content of the model calibration soil samples and the independent validation soil samples.Skewness is calculated via Fisher's moment coefficient.(SD = standard deviation, CV = coefficient of variation) *Expressed in gC kg -1

Table 3
Statistics of the soil organic carbon (SOC) content prediction and the uncertainty (%) for the pixel-based approach and for field-based approach.(SD = standard deviation)