Sentinel-2-based predictions of soil depth to inform water and nutrient retention strategies in dryland wheat

The thickness or depth of fine-textured soil ( z f ) dominates water storage capacity and exerts a control on nutrient leaching in semi-arid agroecosystems. At small pixel sizes ( < 1 m; ‘fine resolution ’ ), the normalized difference vegetation index (NDVI) of cereal crops during senescence (Zadoks Growth Stages [ZGS] 90 – 93) offers a promising alternative to destructive sampling of z f using soil pits. However, it is unclear whether correlations between z f and NDVI exist (a) at larger pixel sizes (1 – 10 m; ‘intermediate resolution ’ ) and (b) across field boundaries. The relationship of z f to NDVI of wheat ( Triticum aestivum L.) was tested using images from a combination of multispectral sensors and fields in central Montana. NDVI was derived for one field using sensors of fine and intermediate spatial resolution and for three fields using intermediate resolution sensors only. Among images acquired during crop senescence, z f was correlated with NDVI ( p < 0.05) independent of sensor ( p = 0.22) and field ( p = 0.94). The z f relationship to NDVI was highly dependent on acquisition day ( p < 0.05), but only when pre-senescence (ZGS ≤ 89) images were included in the analysis. Results indicate that cereal crop NDVI of intermediate resolution can be used to characterize z f across field boundaries if image acquisition occurs during crop senescence. Based on these findings, an empirical index was derived from multi-temporal Sentinel-2 imagery to estimate z f on fields in and beyond the study area.


Introduction
The ability of soil to store water is a function of texture, rock content, and depth, resulting in a storage capacity that is critical for crop production, particularly in non-irrigated, semi-arid systems.The thickness or depth of fine-textured material (z f ) in the upper profile retains water against deeper percolation and dominates soil-water storage capacity (Sigler et al., 2020).In this sense, z f impacts within-field heterogeneity of key agronomic variables by virtue of water storage.For example, z f affects wheat grain yield (Grylls et al., 1997;Bestwick, 2016;John et al., 2017;Krüger et al., 2020) and grain protein concentration (Bestwick, 2016;John et al., 2017), as well as related edaphic properties including precipitation storage efficiency of fallow (Vomocil and Ramig, 1984;Sigler et al., 2020;Carr et al., 2021), inorganic nitrogen (N) changes during the off-season (Jones et al., 2011;Fordyce, 2022), nitrate leaching (John et al., 2017;Sigler et al., 2018;Sigler et al., 2020), and denitrification (Sigler et al., 2022).
In some environments, z f is highly variable over short distances (Bestwick, 2016;Sigler et al., 2020).Site-specific management such as variable rate seeding and N fertilizer application has strong potential to improve fine-scale N-use efficiency in these environments (Sigler et al., 2020;Hegedus et al., 2022).The use of spatial z f information in variable rate fertilizer programs can improve net returns and mitigate environmental impacts of high application rates (Bullock and Bullock, 2000;Basso et al., 2011;Vazquez-Amabile et al., 2013).Spatial z f information can also assist in the interpretation of crop growth, yield, and other data generated by small-plot agronomy experiments initiated without regard for fine-scale heterogeneity in z f (e.g., Fig. 1).Government soil databases like those of the United States Natural Resources Conservation Service (USDA, 2020a) contain spatial information regarding the overall distribution of soil texture and horizon development within the soil profile, including depth to subsurface horizons that are dominated by coarse textures and, thus, have limited water storage capacity.However, these resources use discrete polygons and associations of multiple soil series to delineate soil units that do not capture detailed spatial variability of z f (Tesfa et al., 2009;Sigler et al., 2020).These resources lack intermediate-to fine-scale characterization of z f and other soil properties (e.g., texture, organic matter content, hydraulic conductivity) that impact crop production (Santanello et al., 2007;Brom et al., 2021), creating the need to characterize spatial heterogeneity by alternative means.
Soil depth can be determined by excavating soil pits, but the methodology is destructive, costly, and labor intensive (Dietrich et al., 1995;Tesfa et al., 2009).Bestwick (2016) used maximum depth of a soil coring tube and block kriging to estimate mean z f for inclusion as a covariate in assessing crop sequence effects on wheat grain yield and protein concentration in central Montana.Crop yield increased 410 kg ha − 1 and protein concentration decreased 16 g kg − 1 for each 10-cm increase in soil depth in the drier of two years in that study.A wheat yield response of this magnitude (41 kg ha − 1 cm − 1 ) was within the range (10-56 kg ha − 1 cm − 1 ) reported by Krüger et al. (2020), who detected correlations between yield and z f (p < 0.05) in six study years with below average growing season precipitation, but no correlation (p = 0.17) in an additional year when growing season precipitation was above-normal.Importantly, the methods used to characterize soil depth in these studies were time-and resource-intensive and the resulting maps were limited in both extent and resolution.Sigler et al. (2020) mapped z f by regressing depth to cobble observations from soil pits with normalized difference vegetation index (NDVI) of barley (Hordeum vulgare L.) derived from National Agricultural Imagery Program (NAIP) data acquired during crop senescence (ZGS 90-93;Zadoks et al., 1974).Their results supported the hypothesis that variation in crop greenness is related to water availability as a function of z f in regions where water limitation affects senescence timing.Images acquired during crop senescence are useful for z f mapping because thicker soils store more water in the root zone, thus delaying senescence where soils are deeper (Sigler et al., 2020).While NDVI images acquired near mid-senescence  are likely to correlate with z f , 'optimal' timing has not been determined.
In fertility assessments, soil depth is typically estimated as depth to refusal of a soil coring tube (e.g., Jones et al., 2011).Soil probing has been used to estimate z f directly (Williams et al., 2009;Jones et al., 2011;John et al., 2017), in combination with kriging efforts (Bestwick, 2016), and in model validation (Tesfa et al., 2009;Ziadat, 2010;Sigler et al., 2020).Relative to soil pits, the probing method is minimally destructive, providing a simple means of estimating z f while preserving spectral signatures for future assessments.Cereal crop NDVI of pixel size < 1 m has been shown to correlate positively with z f within field boundaries (Sigler et al., 2020), but it remains unclear whether correlations between z f and NDVI exist at larger pixel sizes (1-10 m) and across field boundaries.Our objective was to determine the impact of image spatial resolution on the strength of z f -NDVI correlations within and across multiple fields in a semi-arid, dryland agroecosystem.

Field selection
Multiple ancillary and reference datasets were used in field selection, including CropScape (USDA, 2020b), NAIP (USDA, 2020c), multispectral imagery from an unmanned aerial vehicle (UAV; senseFly eBee, Cheseaux-Lausanne, Switzerland), and a shale surface digital elevation model of the Moccasin Terrace derived from well logs (Sigler et al., 2018).Fields were selected that spanned a known gradient in depth to shale to capture the range of spectrally relevant geologic boundary conditions for water table elevation within the study area.Visual comparisons of the shale surface digital elevation model and growing season average NDVI derived from image reduction of Sentinel-2 (Copernicus, 2020) multispectral data corroborated depth to shale estimates (Fig. 2).Fields were selected where spring wheat was planted in 2020 and winter or spring wheat was planted in 2018 and 2019, no-tillage practices were used, and there was high spectral variability within cereal crop stands observed in 2017 and/or 2015 NAIP imagery.This methodology ensured that at least one viable NDVI map (i.e., a map with application for z f estimation) could be generated.Moreover, selection occurred so distance among fields was minimized (≤15 km) while the range of predicted shale depth values was maximized (1-17 m).Any potential orographic contributions to weather differences among fields were minimal in the study area given the relative absence of complex topography.Nonetheless, study fields were selected along a transect oriented in the direction of the prevailing wind (i.e., southwest) to minimize potential effects of weather differences among fields.

Subfield selection and soil depth measurements
A 0.4-ha subfield within each 8-80 ha field was selected for data collection by visually identifying areas of maximum spectral variability that were representative of larger fields, as determined by single-band histogram comparisons.Soil depth within each subfield was randomly sampled at 12 points at Fisherman and Sun Basin, and 42 points at Central.Initial approximations of z f were made based on the maximum depth penetrated by a tile probe (diam.= 19 mm; Fabian Machine & Welding Inc., Lewistown, MT, United States) at a downward pressure of > 100 MPa.Downward pressure on the probe was estimated by retrofitting a Giddings hydraulic soil probe (15-TS / Model GSRTS; Giddings Machine Company, Windsor, CO, United States) with a pressure gauge.
At Central, a subset of 12 points (of the original 42) were selected at random for z f sampling three additional times by tile probe (Table S1).The 12 resampled points were withheld, and ordinary kriging with leave-one-out cross validation was performed with the remaining 30 tile probe sampling points (Fig. S1a).To determine if edaphic conditions (e.

Table 1
Coordinates of soil depth estimates by tile probe with elevation and slope estimates for individual points, grouped by field.1).

Statistical analysis
Average tile probing depth values of the 12 withheld probing points at Central were regressed against kriged estimates of z f , as well as NDVI from UAV (10 August 2017), NAIP (16 August 2017), and Sentinel-2 (4 August 2019) sensors.Moran's I detected no autocorrelation in any model, and assumptions of normality and homoscedasticity were satisfied by Jarque-Bera and Breusch Pagan tests, respectively.Still, visual inspections of fitted values versus residuals raised concerns of heteroscedasticity, so nonlinear regression was performed using the nls function of the package nlme (Pinheiro et al., 2021) in R statistical software (R Core Team, 2021).Soil depth was designated as the predictor variable and NDVI as the response variable in a logistic model: where parameters a, b, and c represent the upper asymptote, maximum slope, and value of z f at the maximum slope, respectively.Parameter estimates and significance for select converged models are indicated in Table 2. Causality and precedence (Sigler et al., 2020) were considered when designating z f as the predictor variable and NDVI as the response, although it is recognized that these variables will be reversed in practical applications.
Linear regression was used to determine whether z f predicted NDVI independent of acquisition day, sensor, field, and acquisition year.However, assumptions of normality and homoscedasticity were often violated when using linear regression, and transformations showed inconsistencies among image groups in resolving these violations.Thus, when evaluating interactions, observations > 50 cm were discarded to capture only the linear portion of the curve, thereby eliminating severe violations of normality and homoscedasticity (p < 0.01) in linear models.Precedence (Sigler et al., 2020) and apparent bimodality in average tile probe depths with the approximate minimum frequency dividing the modes at 50 cm (Fig. S2) gave justification for this approach.For these assessments, analysis of variance was performed in base R (R Core Team, 2021) with z f designated as a fixed effect crossed with either date, sensor, field, or year in the interaction term.

Imagery based soil depth models
Tile probing depth from a single sampling event was designated as the response variable in multilocation models assessing the ability of Sentinel-2 data to predict soil depth across management boundaries.Two separate modelling approaches relying on image reduction

Table 2
Parameter estimates for non-linear regression models (Eq.1; n = 12; df = 9) where average tile probing depth was designated as a predictor and NDVI as a response variable, grouped by sensor, year, acquisition day of year (DOY), and field.Regressions were performed using NDVI of fields planted to spring wheat in central Montana.Highlighted in grey are regressions performed with single-event tile probing depths versus average probing depths.In the continuous approach, mean (NDVI MEAN ) and max NDVI (NDVI MAX ) for the period 16 June to 04 September 2020 were included Table 3 Summary statistics of linear models assessing effects of average (Central; n = 6; df = 4) or single event (All; n = 32; df = 30) tile probing depths (< 50 cm) on NDVI from multiple sensors acquired in different years, on different days of the year (DOY), and at different growth stages (Zadoks et al., 1974)  as independent variables in a logistic model: where parameters a and b are empirically derived constants set to 0.013 and − 22.3, respectively, based on an iterative parameterization that optimized the regression statistics of predicted ẑ f and observed z f (tile probing depth).Regression statistics of linear models designating categorical (class) or continuous (ẑ f ) estimates of z f as fixed effects were compared to those of the model designating the 'best' Sentinel-2 NDVI image (i.e., the image explaining the most variability in tile probing depth across fields) as a fixed effect.Initially, tile probing depth was designated as the response variable with class, ẑ f , or NDVI designated as fixed effects crossed with field (class × field, ẑ f × field, or NDVI × field).Because interactions (at p < 0.05) were not observed in any model, both field and the interaction term were dropped as fixed effects and the ability of z f proxies (class, ẑ f , NDVI) to explain variance in z f observations (tile probing depth) across fields was assessed with linear modelling in base R. To assess year-dependencies of z f proxies, mixed-effects modelling was performed where NDVI was treated as the response variable and acquisition year crossed with tile probing depth, ẑ f , or class was designated as a fixed effect.Field was designated as a random effect in these models, allowing comparisons of conditional coefficients of determination (conditional R 2 ), representing variability explained by the entire model, with marginal coefficients of determination (marginal R 2 ), representing variability explained by fixed effects only.

Kriged estimates of soil depth
Ordinary kriging based on a single tile probe sampling event (n = 30) with optimized variogram statistics (lag size; = 4.6; nugget = 0; range = 37; sill = 332; Fig. S1a) performed modestly well based on leave-one-out cross validation (Fig. S1b) and regression of predicted z f against average tile probing depths at the 12 withheld points (y = 0.82x + 9; RMSE = 12 cm; R 2 = 0.50; p = 0.006).The fitted spherical variogram in Bestwick (2016) did not describe the empirical variogram in the current study, but grid sampling of z f in the former study versus fully randomized sampling in the latter may help explain this difference, given that distances between sampling points were not consistent.Furthermore, parameters used to describe the variogram fit in the current study (nugget = 0; range = 37; sill = 332) were outside the range of parameters (nugget = 149-266; range = 31-34; sill = 111-187) reported by Bestwick (2016), although this is not surprising given that different variograms were selected (stable versus spherical).
Sensor resolution could not be assessed independently of acquisition day, but all sensors acquired images during crop senescence when acquisition day is least likely to affect the slope of z f -NDVI regression lines and most likely to affect the y-intercept (Fig. 4c).Therefore, differences in the slopes of z f -NDVI regression lines among sensors (Fig. 5a) more likely reflected differences in spatial resolution than differences in y-intercept.Since no statistical z f × sensor interaction was observed (p = 0.22), and since the weakest z f -NDVI correlation (NAIP; R 2 = 0.34; 0.6 m) did not correspond with the coarsest (Sentinel-2; 10 m) or finest (UAV; 0.1 m) resolution sensor, it was concluded that correlations between z f and NDVI do not depend on sensor, as long as the image resolution is ≤ 10 m and images are acquired during crop senescence.Similarly, because acquisition day was held constant in assessments of the impacts of field and year on z f -NDVI relationships, yet no z f interactions with field (p = 0.94; Fig. 6) or year (p = 0.68; Fig. 7) were detected, it was concluded that correlations between z f and NDVI depend less on field and acquisition year than on acquisition day.
Soil depth-NDVI relationships seem to depend more on seasonal (i.e., intra-annual) variation in remote sensing spectra than on inter-annual variability, field, and spatial resolutions ≤ 10 m.These results add to a growing body of evidence highlighting the value of multi-temporal remote sensing datasets in characterizing spatially variable soil properties (e.g., Maynard and Levi, 2017).Images acquired at different crop stages may be used to spatially characterize different soil properties.For example, Webb et al. (2021) found that early-season NDVI explained more variability in surface pH than late season NDVI in a field in southwest Montana, while the current study demonstrates that late-season NDVI explains more variability in z f than early season NDVI within fields.Future research should investigate temporal sensitivity of NDVI-z f relationships in different geographic regions and climates.

Multilocation soil depth estimates
Soil depth modelling across fields showed that the Sentinel-2 sensor's 'best' multilocation z f -NDVI relationship (5 August 2020; Table 4) outperformed z f proxies relying on image reduction techniques.Impacts of sensor on the slope (linear) and shape (non-linear) of probing depth-NDVI regression lines were minimal, while impacts to the magnitude of NDVI were likely associated with differences in acquisition date.Specifically, the empirical spectral index (ẑ f ; Eq. 2) and the CART classifier (class) explained 30% and 24% of the variability in tile probing depth, compared to 54% of the variability explained by well-timed Sentinel-2 NDVI.The continuous model outperformed the categorical model based on coefficients of determination (R 2 = 0.30 vs 0.24) and pvalues (p = 0.001 vs 0.003), but root mean square errors were equivalent (5 cm).Despite coefficients of determination for class and ẑ f predictions falling outside the range of previously published z f predictions (R 2 = 0.32-0.89),p-values and root mean square errors were quite small for both models.Furthermore, the ability of class to predict NDVI did not depend on acquisition year (p = 0.12), but the acquisition year interaction was stronger (p = 0.053-0.062)for both ẑ f and tile probing depth models (Table S3), suggesting predictions of the CART classifier (class) may be relatively tolerant to inter-annual variability in remote sensing spectra.
While the error matrix and accuracy statistics of the CART classification (Table S4) were not directly applicable to z f characterization (i.e., z f was not measured in all training/test areas) results indicate that spatial patterns of NDVI in well-timed NAIP images are reproducible with Sentinel-2 imagery on a landform where such patterns are known to correlate with z f .Visual agreement between classified Sentinel-2 data and NDVI from well-timed UAV imagery at Central suggests the categorical approach did a reasonable job of delineating z f features (Fig. S3).While previous research characterized soil depth at high spatial resolutions (< 10 m; Bestwick, 2016;Sigler et al., 2020), the current study improved the extent of z f prediction three-fold (145 vs 44 ha) by including multiple fields.Results support the use of multitemporal imagery of intermediate resolution for characterizing z f across fields.
A field effect on NDVI was detected in a well-timed Sentinel-2 NDVI image (p < 0.001; 5 August 2020; Table 4) in the current study, but a z f × field interaction was not (p = 0.94).Since fields were selected to capture a known gradient in depth to a subsurface shale layer (Fig. 2), and since the highest NDVI values corresponded with the shallowest shale contacts (Fig. 2; Fig. 6), it is suggested that fine-scale spatial patterns in NDVI were largely driven by z f but that geologic differences exerted a regional influence on these patterns.The similarity in slopes of z f -NDVI relationships with field suggests that edaphic (z f ) and geologic (shale depth) controls on water availability did not interact but had additive effects on NDVI of wheat.Thus, well-timed NDVI (Zadoks 90-93) derived for multiple fields in the study area might be best conceptualized as an effective z f index, reflecting differences in water supply related to variability in z f but also to other factors.It is theorized that crop senescence occurred earlier where shale contacts were deeper because plant roots were unable to access shallow groundwater in these locations, eliminating the 'sub-irrigation' effect observed at shallower shale contacts.This is consistent with previous research (Sigler et al., 2018) which demonstrated that depth to shale altered hydrologic partitioning to deep percolation and, by extension, plant uptake.
Results of the present study stress the importance of image timing for NDVI-based predictions of z f across fields, as a single well-timed NDVI image explained more variability in tile probing depth than z f predictions derived from image reduction and machine learning techniques.Results also highlight the inherent value of spaceborne sensors given their large geographical extent and frequent revisit times (Gholizadeh et al., 2018).Where satellite data are not available, well-timed UAV-based imaging is an alternative.
Remote observation of z f offers a promising means of assessing vulnerability to nutrient loss in agricultural systems, given that this soil property can impact N leaching and denitrification loss (John et al., 2017;Sigler et al., 2018Sigler et al., , 2020Sigler et al., , 2022)), as well as mineralization of soil organic N during fallow (John et al., 2017;Sigler et al., 2022) and in the offseason (Fordyce, 2022).These relationships have relevance for site-specific management strategies that could use spectral indices to optimize N applications and balance potential revenue to possible loss of inputs (Preza-Fontes et al., 2019;Sigler et al., 2020;Zhao et al., 2020;Argento et al., 2021;Hegedus et al., 2023).The analysis allows producers to direct N fertilizer to areas where it will have the greatest impact on crops and least impact on the environment, and together with site-specific assessment of fertilizer response, can inform more cost-effective management strategies.

Conclusion
Linear and non-linear regression of probing depth and NDVI from three sources indicated z f -NDVI relationships were not dependent on sensor resolution or field boundary when images were acquired during senescence.Fine (<1 m) and intermediate (≤ 10 m) resolution NDVI acquired during crop senescence predicted z f with equivalent or greater accuracy than the time-and resource-intensive sampling efforts involved in kriging.The research highlights the potential for satelliteand secondarily UAV-based imagery to characterize spatial

Fig. 2 .
Fig. 2. Moccasin Terrace study area indicating locations of the Fisherman, Central, and Sun Basin study fields within the Judith Basin, Montana.True color aerial image from 2019 (a), shale depth (b); adapted from Sigler et al., 2018), and average NDVI from Sentinel-2 for the 2019 growing season (c).
techniques were used to predict tile probing depth from categorical (class) and continuous (ẑ f ) estimates of z f .In the categorical approach, supervised classification (Classification and Regression Trees classifier; CART;Breiman et al., 1984) of 4-band (R, G, B, NIR) Sentinel-2 imagery (10 m resolution) was carried out in Google Earth Engine (Goerlick et al., 2017).Average seasonal (16 June through 4 September 2020) reflectance values of each band were used as training and validation data in the CART classification.NDVI from NAIP (0.6 m resolution) was used as ancillary data in the delineation of training and validation polygons, within which every Sentinel-2 pixel was treated as a training or validation point.Training polygons included a total of 124 and 126 pixels for 'deep' and 'shallow' classes, respectively, satisfying the 100-pixel guideline suggested by Campbell & Wynne (2011).

Fig. 3 .
Fig. 3. Kriged estimates of soil depth based on probing (a) and NDVI from different imagery sources (b-d).NDVI was acquired by an unmanned aerial vehicle (UAV; b), the National Agricultural Imagery Program (NAIP; c), and Sentinel-2 satellite systems (Sentinel 2; d) on 10 August 2017, 16 August 2017, and 4 August 2019, respectively.Black symbols are the 30 initial tile probing points and white symbols are the 12 points withheld for cross validation and NDVI regressions.

Fig. 4 .
Fig. 4. Impacts of image acquisition timing on linear (a, c) and non-linear (b,d) regression models for probing depth and NDVI from Sentinel-2.Probing depth observations > 50 cm were excluded in linear regressions (a, c).Among pre-(12 July 2019) and early-(22 July, 25 July 2019) senescence images (a, b), image date was important for determining the slope (linear) and shape (non-linear) of probing depth-NDVI regression lines.Impacts of acquisition date on probing depth-NDVI relationships were less pronounced among early-(22 July 2019), mid-(6 August 2019), and late-(21 August 2019) senescence images (c,d).

Fig. 5 .
Fig. 5. Impacts of sensor on linear (a) and non-linear (b) regression models for probing depth versus NDVI acquired by an unmanned aerial vehicle (UAV), the National Agricultural Imagery Program (NAIP), and Sentinel-2 satellite systems (Sentinel) on 10 August 2017, 16 August 2017, and 4 August 2019, respectively.Probing depth observations > 50 cm were excluded in linear regressions.Impacts of sensor on the slope (linear) and shape (non-linear) of probing depth-NDVI regression lines were minimal, while impacts to the magnitude of NDVI were likely associated with differences in acquisition date.

Fig. 6 .
Fig. 6.Impacts of field on linear (a) and non-linear (b) regression models for probing depth versus NDVI acquired by Sentinel-2 satellite systems on 5 August 2020.Probing depth observations > 50 cm were excluded in linear regressions.Impacts of field on the slope and y-intercept (linear) and the shape (non-linear) of probing depth-NDVI regression lines were minimal.

Fig. 7 .
Fig. 7. Impacts of acquisition year on linear (a) and non-linear (b) regression models for probing depth versus NDVI acquired by Sentinel-2 satellite systems on 6 August 2019 and 5 August 2020.Probing depth observations > 50 cm were excluded in linear regressions.Impacts of year on the slope (linear) and shape (non-linear) of probing depth-NDVI regression lines were minimal.
of z f -and more generally, soil water supply-across field boundaries and improves our understanding of the practical constraints on spectral and other z f proxies.There is strong potential for imagerybased z f indicators to provide site-specific decision support in N fertility management.Future work should use z f indices to develop specific practices which balance potential revenues and environmental loss.
S.I.Fordyce et al.heterogeneity