Evaluating the contribution of Sentinel-2 view and illumination geometry to the accuracy of retrieving essential crop parameters

ABSTRACT Wide field-of-view (FOV) sensors such as Sentinel-2 exhibit per-pixel view and illumination geometry variation that may influence the retrieval accuracy of essential crop biophysical and biochemical variables (BVs) for precision agriculture. However, this aspect is rarely studied in the existing literature. Hence, the current study aimed to evaluate the contribution of view and illumination geometries to the accuracy of retrieving Leaf Chlorophyll a and b (LCab), Canopy Chlorophyll Content (CCC), and Leaf Area Index (LAI) using the Random Forest (RF). The experiments were performed on various input variable scenarios where per-pixel geometric covariates, i.e. View and Sun Zenith Angles (VZA and SZA, respectively), and Relative Azimuth Angle (RAA), are excluded and included in spectral bands (SB) and spectral vegetation indices (SVIs), respectively, in two semi-arid areas. The results showed that spectral bands or vegetation indices combined with geometric covariates improved the R2 by 10–15% for LAI and 3–5% for CCC. In contrast, negligible improvements of 1–2% were achieved for LCab with cross-validation test data and independent held-out dataset, respectively. In agreement with previous studies, VZA and SZA were among the topmost influential variables in the RF models for estimating LAI, LCab, and CCC. Collectively, per-pixel geometric variables explained more than 30% of the variability in surface reflectance for all Sentinel-2 spectral bands (p < 2.2e-16). Overall, the results showed that incorporating geometric covariates improved the accuracy of retrieving BVs; thus, it provided additional information that improves the predictive power of SB and SVIs. The significant benefits of the geometric variables were mainly realized for canopy-level BVs (i.e. LAI and CCC) than for LCab. Therefore, it is recommended to incorporate per-pixel view and illumination geometry in estimating LAI and CCC, especially when using wide-view sensors such as Sentinel-2. However, further testing in different phenology, site and acquisition conditions is needed to confirm the contribution of the geometric covariates to facilitate reliable retrieval of BVs from remotely sensed data and aid better agronomic decisions.


Introduction
Improving the estimation of crop biophysical and biochemical variables (BVs) is critical to support agricultural management practices, aid the optimization of farm inputs, and monitor agricultural production at landscape to regional level (Alexandridis, Ovakoglou, and Clevers 2020).Plant parameters such as the Leaf Chlorophyll a and b (LC ab ), Leaf Area Index (LAI), and Canopy Chlorophyll Content (CCC) are often used in as input to models monitoring the physiological status of plants and their health (Lu et al. 2018).The LAI is considered an essential parameter for quantitatively analyzing a series of biophysical processes which describe vegetation dynamics (Alexandridis, Ovakoglou, and Clevers 2020;Liu et al. 2017).LC ab and CCC, on the other hand, are critical for characterizing the photosynthetic activity of crops.Because they are both related to Nitrogen (N) content (Clevers, Kooistra, and Van den Brande 2017), CCC and LC ab are critical agronomic parameters to monitor N content at canopy and leaf scales.Moreover, the photosynthetic activities of crops hinge on water availability.Thus, crop stress caused by declining soil moisture can be detected rapidly by assessing the changes in chlorophyll content.
LAI and LC ab are traditionally measured using destructive and lab-based methods (Liang et al. 2011), which entail prohibitive operational costs for vast areal and temporal coverages.Therefore, optical handheld LAI instruments, such as Accu-Par Ceptometer, digital hemispherical photography, SunScan, and LAI-2200C and chlorophyll meters such as MC-100 Chlorophyll Concentration Meter and SPAD-502, provide reasonably accurate measurements of effective LAI and LC ab in a rapid and noninvasive manner.When coupled with satellite-based remotely sensed data, the detailed retrieval of crop BVs can be realized over vast areas at frequent intervals, thus suitable for precision agriculture needs.
For years, spectral vegetation indices (SVIs) have shaped the scientific advancements in the retrieval of BVs due to their high correlation with vegetation traits such as leaf pigments, nitrogen, and LAI (Lee et al. 2008;Clevers and Gitelson 2013;Clevers, Kooistra, and Van den Brande 2017;Haboudane et al. 2008).Such SVIs were developed using visible and near-infrared (VNIR) bands from earlier missions such as AVHRR (Advanced Very High-Resolution Radiometer), MODIS (Moderate Resolution Imaging Spectroradiometer), and Landsat.Hence, SVIs such as the NDVI (ROUSE 1973) and its variants -designed to overcome background effects, atmospheric interference, and saturation -are popular in agronomic and other vegetation applications.However, VNIR indices are known to suffer from underestimations at various (i.e.low or high) biomass, chlorophyll content and LAI.Recently, red-edge SVIs based on quasi-hyperspectral sensors, such as Worldview-2 and Multi-Spectral Instrument (MSI) on board Sentinel-2, were shown to enhance the accuracy of BVs and overcome challenges of VIS/NIR indices (Delegido et al., 2013;Dimitrov et al., 2019;Fitzgerald et al., 2010;Sibanda et al., 2019).Other studies have demonstrated the potential of Radiative Transfer Models (RTMs) (Jacquemoud et al. 2009).However, such models are beset with errors resulting from poor parameterization and complexity, which may result in considerable uncertainty in the retrieved BVs (Xu et al. 2019;Durbha, King, and Younan 2007).
Alternatively, MLRAs (Machine Learning Regression Algorithms) were tailored to learn complex relationships between measured BVs and reflectance data from satellite imagery (Karimi et al. 2018;Mao et al. 2019).The application of MLRAs for the retrieval of important crop BVs is considered ideal due to their high accuracy and flexibility to process multiple features and datasets (Apolo-Apolo et al., 2020).It is, therefore, essential to examine whether such approaches could improve future product development, aid informed agricultural management decisions, and accelerate precision agriculture, especially in developing regions such as Africa where operational use of Earth observation for precision agriculture is still in its infancy.
Vegetation canopy reflectance is affected by the biophysical and biochemical traits, phenological and environmental changes, signal attenuation by the atmosphere, topographic effects, and acquisition conditions, among others (Ollinger 2011).In particular, satellite-based radiometric measurements exhibit non-negligible effects from view and illumination geometry or Bidirectional Reflectance Distribution Function (BRDF) that may introduce ambiguity in retrieving BVs.BRDF refers to the ratio of the solar incident irradiance E i from a given direction on a target surface to its contribution to the reflected radiance L r from the same surface in another direction.Mathematically, it is expressed as where θ and ϕ indicate a direction, i and r are the quantities associated with the incident and reflected radiant flux, respectively.E i , L r , and d denote the incident irradiance, reflected radiance, and differential quantity, respectively (Nicodemus et al. 1977).Essentially, the BRDF refers to the surface reflectance variation with the varying view and illumination geometry (Nagol et al. 2015).Due to the different acquisition angles between sensors and orbits and the sun-angle variations across different seasons, latitudes, and terrains, previous studies dedicated considerable effort to reducing BRDF effects (Roy et al. 2017b;Roy, Li, and Zhang 2017a).The reduction of BRDF effects is particularly essential for obtaining temporally consistent data across different sensors to enhance the accuracy and temporal consistency of land cover maps (Nagol et al. 2015), canopy structural parameters (Sharma, Kajiwara, and Honda 2013), and BVs (Pocewicz et al. 2007).
In that regard, techniques such as the c-factor (Roy et al. 2017b;Roy, Li, and Zhang 2017a) -based on a semiempirical Ross Thick/Li Sparse-Reciprocal BRDF model (Schaaf et al. 2002) -and integrated topographic and angular normalization (Yin et al. 2020)-based on the Cfactor and Path Length Correction (PLC)-have been proposed for Landsat and Sentinel-2.Other techniques, such as the MODIS BRDF/Albedo algorithm, have been used to produce Nadir BRDF-Adjusted surface Reflectance (NBAR) for decades (Schaaf et al. 2002).For high-resolution sensors, e.g.Sentinel-2, atmospheric correction tools such as Sen2Cor provide an option to perform empirical BRDF corrections (Louis et al. 2016).However, the BRDF correction option is often left deactivated in validation studies to be consistent with other methods (Sola et al. 2018).So far, the radiometric variation introduced by the sensor view and illumination geometry has been regarded as a nuisance.However, such geometry-induced spectral variation may provide additional structural information content due to the dependence of reflected radiance on the geometric configurations (Kneubühler et al. 2008).This is especially highlighted by the studies utilizing multi-angular data (Verrelst et al. 2008;Duveiller, Lopez-Lozano, and Cescatti 2015;Vierling, Deering, and Eck 1997), showing its utility for characterizing structural variation of various layers of vegetation (e.g.foliage clumping) that affects LAI.However, high-resolution multi-angular data are costly and can only be acquired by commercial satellites (Roosjen et al. 2018).Other studies showed that view and illumination geometry have a considerable effect on vegetation indices and phenology metrics (Ranson, Daughtry, and Biehl 1986;Middleton 1991;Pocewicz et al. 2007;Verrelst et al. 2008;Ma et al. 2020).
Meanwhile, studies (Nagol et al. 2015;Roy et al. 2017b) also show that popular multispectral sensors (e. g.Landsat and Sentinel-2) exhibit within-scene angular effects.In particular, Sentinel-2's wide Field-of-View (FOV), i.e. ~20.6° (~295 km), has been shown to exhibit considerable within-scene surface anisotropy due to varying per-pixel view and illumination geometry for each of its 12 detectors (Gascon et al. 2017;Roy et al. 2017b).Fortunately, Sentinel-2 data provides a pixelwise and band-wise view and illumination geometry, thus providing some prospects to address the questions such as (1) How much variability in crop canopy reflectance is explained by the view and illumination geometry variations?(2) What is the contribution of incorporating per-pixel view and illumination geometry to the accuracy of retrieving crop BVs using MLRAs?(3) Are geometric variables influential in improving modeling accuracy when coupled with commonly used radiometric input variables, i.e. spectral bands and SVIs?These will clarify whether acquisition conditions (i.e.sun-view geometry) provide additional information or noise to the estimation of crop BVs.The operational Sentinel-2 Land bio-physical processor (SL2P) (available in Sentinel Application Platform, SNAP), based on pretrained ANN, uses per-pixel geometric information as input to the biophysical retrieval process (Weiss and Baret 2020).However, such complex, "black-box" (or opaque) models do not provide model agnostics, which would otherwise reveal the influence of per-pixel geometric variables on the estimations.Interpretable MLRAs such as RF are essential for obtaining new causal links and novel insights between predictor and response variables.Moreover, they enable the detection and diagnosis of biases in the MLRA models and associated input data, hence essential for improved satellite-based valueadded products (Azodi et al., 2020).
In that context, this study evaluated the contribution of view and illumination geometries on crop biophysical and biochemical retrieval using the Random Forest (RF) regression algorithm and various combinations of Sentinel-2 covariates.The evaluated covariates consisted of spectral bands (SB), spectral vegetation indices (SVIs), SB combined with sensor view-illumination geometries (SB + Angles), and SVIs combined with sensor view and illumination geometries (SVIs + Angles), and all variables (Avar).The specific objectives were threefold: (1) To quantify the amount of spectral variability caused by the per-pixel view and illumination geometry in Sentinel-2 spectral bands; (2) To determine the contribution of per-pixel view and illumination geometry to the accuracy of retrieving LAI, LC ab and CCC with RF; and (3) To identify the most important (i.e.influential) variables contributing to the model accuracy, considering different combinations of covariates.The study area is comprised of two semi-arid African agricultural sites, located in South Africa.

Study area
Bothaville and Harrismith sites are located in the Free State province, South Africa (see Figure 1).The sites are characterized by mainly commercial farming of crops and livestock takes place.The summer growing season, which constitutes most cropping activities in both sites, has mean annual temperatures of about 18°C and 19.2°C and mean annual rainfall of about 584 and 115 mm, respectively (Kganyago et al. 2020(Kganyago et al. , 2020)).The crops in the sites are generally similar, i.e. predominantly maize and beans, while sunflower and groundnuts are dominant in Bothaville only.Cropping takes place on flat slopes with sandy-loamy and sandy soils in Bothaville and undulating slopes with clay-loamy soils in Harrismith.The summer cropping calendar at both sites is from December to May or June.

Materials and methods
The summary of the methods followed in this study is given in Figure 2.  40 m) were designed along transects chosen randomly within various crop fields.Six to eight random measurements were captured within each plot and averaged.Each plot's centroids were Geo-tagged with a GPS coordinate and a picture using TDC600 handheld Data Collector (Trimble® Inc., Westminster, CO, USA) with 1.5 m GNSS accuracy.Plant Canopy Analyzer 2200c (Li-Cor, Inc., Lincoln, NE, USA), equipped with an 180° field-of-view (FOV) view cap, was used to measure LAI.The view cap ensured that any possible uncertainty caused by the operator's influence and uneven sky conditions are avoided.On the other hand, MC-100 (Apogee Instruments, Inc., Logan, UT, USA) was used to acquire LC ab measurements on sun-exposed leaves.The CCC was obtained as a product of plot-averaged LC ab and LAI measurements (LC ab × LAI) (Jacquemoud et al. 2009).Because our aim was to develop generic models, we combined the in situ data from each site, i.e. beans, maize, and peanuts and beans and maize found in Bothaville and Harrismith, respectively.Our approach is aligned with NGUY-Robertson et al. ( 2012) who note that crop types with different physiological pathways (i.e.C3 and C4), plant structures and architectures are representative many other crop types' physiological and anatomical traits.Therefore, the developed models are likely widely applicable to multiple crops, critical for sub-Saharan Africa, where inter-cropping agricultural practices dominate.
Other land covers, i.e. other than croplands, were removed from the images by applying a crop mask generated from the National Crop Boundaries data (Cropestimatesconsortium 2017).To limit the analyses to the active croplands of 2021 summer growing season, we applied an NDVI-based vegetation mask, where non-vegetated pixels were considered as those with NDVI <0.2.

Multiple linear regression
As shown in Figure 3, the SZA, VZA, and RAA vary spatially within a single Sentinel-2 MSI tile.Gascon et al. (2017) indicate that Sentinel-2 acquisition geometries differ according to the 12 detectors for each spectral band due to the sensor's push-broom design.This spatial variability is clear in Figures 3 and 4, where it is evident that the SZA, VZA, and RAA vary by pixel across the scene.Therefore, it is also expected that pixel-wise surface reflectance over crops would vary by sensor view and illumination geometry (Verrelst et al. 2008;Jay et al. 2017;Galvao et al. 2011;Biriukova et al. 2020).
In this study, multiple linear regression (Equation 1) and coefficient of determination adjusted for the degree of freedom, adjusted R 2 , were used to evaluate how much of the reflectance variation is explained by sensor view and illumination per-pixel variation using mean SZA, VZA, and RAA as predictor variables (x), and each spectral band as the response variable (y).RAA was calculated as a difference between SAA and VAA.
where y is the response variable (i.e.surface reflectance of each spectral band), and x 1...n are predictor variables (SZA, VZA, RAA), m 1 represent regression coefficients, while b is the intercept.The regression coefficient, i.e. the partial derivative of the response variable for each predictor variable, measures the linear sensitivity of y to inputs x i...n (Mohanty and Codell 2002).Adjusted R 2 was used to assess the relationship between the response variable and predictor variables.In contrast, the F-statistic p-value was used to determine the significance of the relationships at α = 0.05.

Random Forest regression algorithm
The retrieval of BVs considered here, i.e.LAI, LC ab , and CCC, was performed using Random Forest (RF) (Breiman 2001) within the "randomForest" R-statistics package (Tierney 2012;Liaw and Wiener 2002).RF is a popular tree-based MLRA that utilizes bagging to repeatedly and independently build multiple decision trees using a subset of random training samples generated using resampling with replacement out of the original sample (Fawagreh, Gaber, and Elyan 2014;Breiman 2001).From all the training data (i.e.70%), about 64% are kept for model building (i.e.in-bag samples), while model performance evaluation and variable importance are performed with the remaining 36% (i.e.out-of-bag or OOB samples) (Gislason, Benediktsson, and Sveinsson 2006).For each predictor, variable importance is computed by assessing the %IncMSE (Percent Increase in Mean Squared Error) when permuting OOB samples while keeping all other variables constant.The predictive power of the variables is assessed with %IncMSE, i.e. also used as a ranking measure for the various variables.For example, a variable is deemed significant if its exclusion from the model causes a considerable reduction in prediction accuracy.Essentially, a predictor with a high importance score strongly correlates with the response variable or predictive power (Mutanga, Adam, and Cho 2012).The tuning parameters, i.e. ntree (number of trees) and mtry (the number of variables for each split), were determined using the Grid-search approach using values from 100 to 500 with an interval of 10, and from 1 to p (i. e. number of predictors) with a single interval, respectively.

Input variables
The input variables to the Random Forest (RF) regression algorithm were divided into five (5) experimental scenarios (Table 1).The per-pixel view and illumination geometries, i.e.SZA, mean of VZA and RAA (i.e.calculated as the difference between SAA and VAA), were obtained as raster layers from the Sentinel-2 MSI product.At the same time, the MSI spectral bands were used to compute carefully selected red-edge spectral vegetation indices (SVIs, see Table 2) based on their previous empirical relationship with the BVs considered in the current study (Viña et al. 2011;Schlemmer et al., 2013;Pasqualotto et al., 2019;Delegido et al., 2011;Nguy-Robertson et al. 2012).All SVIs were calculated in the Sentinel-2 Toolbox within Sentinel Application Platform (SNAP) v8.0 software (http://step.esa.int,accessed 10 November 2020).Since the variables had varying values, they were normalized using scale and center approaches in the "mlr" R-package.

Model calibration and testing
The in situ data from Harrismith data were split into 70% calibration (i.e.training) vs 30% testing (i.e.validation) using stratified-random sampling to represent  all crops at the study site proportionally.Then, the 70% calibration data were combined with Bothaville data to form the calibration data for the study, while 30% was held-out for independent testing of the models.Using the pooled calibration data from the two sites, nested k-fold cross-validation (cv), where two nested k-fold cv resampling, were used for hyperparameter tuning (k = 10) and model calibration and testing (k = 5), respectively.k-fold cross-validation (cv) randomly divides the dataset into equal number of sub-datasets according to k.Then, k-1 sub-datasets constitute a training dataset, while one kth sub-dataset is the validation dataset.Using one of the k subdatasets at each iteration, i.e. k times, the final estimation value is formed by aggregating all the validation steps.The nested k-fold cv resampling ensured that all pooled data from both study sites were used to calibrate and test the RF models (Shah et al. 2019;Verrelst et al. 2015).The cv resampling for model calibration and testing was performed with 100 iterations to reduce variability and increase the model's predictive ability on unseen data.

Prediction accuracy metrics
The prediction accuracies for each Random Forest (RF) model, constructed using various experimental scenarios, were assessed with the recommended metrics by Richter et al. (2012) ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 1 n where x i denotes the observed BV (e.g.CCC), and y i denotes the predicted BV (e.g.CCC), � x i and � y i are the means of the observed (or measured) and predicted BV, respectively.N in Equation 3 is the number of errors and n is the sample size.

Spectral variability explained by the view and illumination geometry
To determine the amount of spectral variability explained by the view and illumination geometry variables, we assessed the pixel-wise relationship of the geometries and each spectral band using multiple linear regression and coefficient of variation adjusted for the degree of freedom (adjusted R 2 ).The results (Table 3) show that there is a significant relationship between view and illumination geometric covariates and Sentinel-2 spectral bands (p-value <2.2e-16).The adjusted R 2 values were between 0.30 and 0.43; therefore, view and illumination geometry covariates (i.e.SZA, VZA, RAA) explained 30% to 43% of spectral variability.B4, B2, and B7 had the lowest adjusted R 2 values of 0.30 ≤ 0.34, while B3, B5, B6, B8, B8AB8A, and B12 had adjusted R 2 values of 0.35 < 0.4, and B11 had adjusted R 2 value of >0.4.The same is observed in Harrismith for VIS bands B2, B3, and B4 and NIR band B8, but the geometric variables explained <30% of surface reflectance variability in B5, B6, B7B7, and B8A, while they explained >40% of surface reflectance variability in B11 and B12.In Harrismith, the significant geometric variables at α < 0.05were RAA in B2, RAA and VZA in B3, and SZA and VZA in B7, while all the other bands were influenced significantly by all geometric variables, i.e.SZA VZA, and RAA.Similarly, in Bothaville, all the geometric variables were significant at α < 0.05.

Predictive performance of Sentinel-2 covariates
Five experimental scenarios, designed based on the different combinations of Sentinel-2 MSI co-variates, were assessed for their performance in retrieving LAI, LCab, and CCC using Random Forest (RF) regression algorithm.The results were confirmed by applying the calibrated models on two test datasets, i.e. one selected through five5-fold cross-validation (cv) using the pooled data from two study sites (hereafter, cross-validation test dataset) and an independent held-out (i.e.unseen) dataset from Harrismith (hereafter, independent held-out test dataset).Generally, the results (Table 4) indicate that, among the compared experimental scenarios, the SB + Angles and SVIs + Angles scenarios provided the most robust retrievals of all the BVs considered here, except for CCC, where SB + Angles and Avar provided better results than other scenarios.The LAI retrieval models using SB + Angles and SVIs + Angles yielded excellent predictive performance with RMSE of 0.50 and 0.43 m 2 m −2 .Moreover, they explained the most significant variability of LAI within the two test datasets, i.e. 69% and 67% for cross-validation test data and independent held-out (i.e.unseen) data, respectively.These performances were closer to Avar, which achieved RMSEs of 0.52 and 0.55 m 2 m −2 and R 2 of 0.66 and 0.67, respectively.In contract, the predictive performance of input scenarios where the per-pixel view and illumination geometry covariates were excluded, i.e.SB and SVIs, were relatively inferior, achieving RMSE >0.60 m 2 m −2 and R 2 <0.60) for both test datasets.Interestingly, the best predictive performance for LC ab was achieved with SVIs + Angles with RMSE of 6.57 µg cm −2 (R 2 : 0.76) and 6.16 µg cm −2 (R 2 : 0.61), respectively.The next better predictive performance was achieved by SVIs and Avar, while SB + Angles and SB were slightly worse.Similar to LAI models, the CCC retrieval models built with input scenarios that incorporated per-pixel view and illumination geometry covariates, i.e.SB + Angles, SVIs + Angles, and Avar, were better than those that used scenarios which did not contain any geometric variables, i.e.SB and SVIs.
Overall, the predictive performance of the BV models was consistent between the two test datasets, demonstrating the contribution of per-pixel view and illumination geometry covariates in improving the robustness of BV retrievals.However, contrary to LAI results, the SVIs-LC ab model outperformed the scenarios incorporating per-pixel view and illumination geometry covariates, i.e.SB + Angles and Avar.As demonstrated by SB + Angles,  5, all the experimental scenarios resulted in underestimations, to some extent, of all the BVs considered.The LAI seems to saturate at ~5 m 2 m −2 with a %Bias of up to 4% with the independent test data.The retrievals of LC ab saturate at ~50 µg cm −2 .Finally, using the cross-validation test dataset, the CCC saturates at ~250 µg cm −2 , while the same occurs at relatively lower values, i.e. ~200 µg cm −2 with the independent held-out test dataset.It should be noted, however, that the range was lower in the latter test dataset.As shown in Table 4, the %Bias was slightly worse in models which did not incorporate per-pixel view and illumination geometry covariates, i.e. up to 6%, across all BVs.Overall, the results from the two test datasets indicate that incorporating pixel-wise sensor view and illumination geometries reduced the %Bias for LAI, LC ab and CCC.
The spatial distribution maps of crop biophysical and biochemical variables are presented in Figure 6.The maps were derived with the best RF models, which incorporated geometric variables, i.e.SB + Angles and SVIs + Angles (see Table 4)

Variable importance
We evaluated the contribution of each input variable to the BV retrieval models with the Random Forest (RF) variable importance measure, i.e. the Percentage Increase in Mean Squared Error (%IncMSE).For brevity, we exclude the results for SB and SVIs which did not incorporate sensor view and illumination geometries as covariates because our interest was on the contribution of geometric covariates.Nevertheless, the results for SB were reported in our previous work (Kganyago, Mhangara, and Adjorlolo 2021).Generally, all evaluated input variable scenarios (i.e. that incorporated geometric covariates in the modeling of BVs) show the considerable importance (or influence) of the per-pixel illumination and viewing geometries on crop BV retrieval (Figure 7).The RF variable importance results for LAI (Figure 7a,d,g) show that VZA and SZA were the topmost important input variables, consistently showing a high %IncMSE of >20% when using the SB + Angles and SVIs + Angles, while they had an average %IncMSE of ≥10%<20% when using Avar.The other geometric covariate, i.e.RAA, had an average %IncMSE (i.e.≥10% <20%) in all the scenarios, along B3, B8A, and B5, and SWIR bands B11 and B12 when using SB + Angles, CI RE_740 , MTCI, NDI45, MCARI, IRECI, and RERI 705 when using SVIs + Angles, and CI green , CI RE_740 , B3, MTCI, B8A, and B12 when using Avar.The visible bands, B2 and B4, were among the highly influential variables in the LAI model using SB + Angles (i.e.%IncMSE >20%), while B2 was among the top three influential variables in the LAI model using Avar.
In Figure 7b,e,h, the contribution of geometric covariates is also evident, where when using SB + Angles as input variables, VZA was the only geometric variable featured in the top 10 most important variables.When using SVI + Angles and Avar, only SZA (among other geometric covariates) was the topmost influential variable.The radiometric variables, B5, B11, B8A, and B4, were relatively more influential in the SB + Angles-LC ab models, while MCARI, RERI 705 , NDI45, REIP, and S2REP were among the highly influential among the SVI + Angles variables.For Avar, B11, RERI 705 , MCARI, B5, NDI45, and REIP had the most influence on the LC ab model.Similar to LAI, the RF variable importance results (Figures 7c,f,i) show that VZA and SZA seem to a pivotal role in the CCC retrieval, where they were consistently featured among the most important variables in all experimental scenarios, SB + Angles, SVI + Angles and Avar.In addition to these geometric covariates, B5, B3, B11, and B12 were the topmost variables when using SB + Angles.On the other hand, MTCI, S2REP, RERI 705 , and REIP have ranked topmost in SVI + Angles, while B5, B3, S2REP, REIP, and MTCI were topmost in Avar.

Discussions
Understanding the contribution of view and illumination geometry to the accuracy of retrieving crop BVs is critical for reducing the retrieval uncertainty of these BVs accounting for geometric sources of variability in the reflectance data.As Jacquemoud et al. (2009) note, the view and illumination geometry profoundly impact the canopy reflectance.Therefore, incorporating geometric variables provides an opportunity to exploit additional structural information content or to minimize their effects where it causes noise (Roy et al. 2017b;Verrelst et al. 2008).Previous studies focused on the effects of varying illumination conditions due to topography (Martín-Ortega, García-Montero, and Sibelet 2020; Matsushita et al. 2007), correction of surface anisotropy effects for inter-sensor comparisons (Roy et al. 2017b;Roy, Li, and Zhang 2017a) and global effect of either view or illumination geometry on vegetation reflectance (Pocewicz et al. 2007;Verrelst et al. 2008;Ranson, Daughtry, and Biehl 1986), without considering the within-scene, per-pixel view, and illumination geometry contribution to the retrieval of BVs.
To address this gap, we evaluated various input variable scenarios (see Table 2) for crop BVs retrieval using the Random Forest (RF) regression algorithm.These experimental scenarios were designed to elucidate the contribution of view and illumination geometries to the crop BVs retrieval accuracy.Since Sentinel-2 has a relatively wide field of view (FOV), i. e. ~20.6° (i.e.~290 km), compared to relatively narrow FOV sensors such as Landsat-8 OLI, i.e. ~15° (~190 km), it is prone to within scene view and illumination geometry variations (see Figure 2), with potential to affect BVs retrieval.To highlight view and illumination geometry contribution to model performance practically, we assessed the predictive power of all input variables per experimental scenario using RF variable importance (%IncMSE).Furthermore, the adjusted R 2 of multiple linear regression was used to determine the amount of spectral reflectance variation explained by geometric covariates (see Table 4).

Contribution of geometric covariates to BVs retrieval accuracy
Generally, the results showed that the experimental scenarios incorporating per-pixel geometric covariates, i.e.SZA, VZA, and RAA, yielded robust retrievals of LAI, LC ab , and CCC.For example, adding geometric covariates to the standard input variables for retrieval of LAI using cross-validation test data, i.e. spectral bands (SB) and spectral vegetation indices (SVIs), showed improvements in R 2 of 10% and 15%, respectively.At the same time, RMSEs improved by 0.07 m 2 m −2 and 0.11 m 2 m −2 , respectively.These improvements also manifest in the derived spatial distribution maps, which show with-field and between-field spatial variations in LAI when geometric variables are excluded (i.e.SB and SVIs) and included (i.e.SB + Angles and SVIs + Angles) in the model (see Figure S1 and S2).The independent held-out test dataset showed improvements in R 2 by 17% and 9% and RMSEs by 0.1 and 0.09 m 2 m −2 .On the other hand, relatively lower improvements were realized for CCC, where SB + Angles achieved improvements of 3% in explained variability and RMSE improvements by only 1.41 µg cm −2 using a crossvalidation test dataset.In contrast, the independent held-out dataset showed no improvements (instead, a decline in RMSE by 0.15 µg cm −2 was observed).Using SVIs + Angles, 5% improvements in explained variability were achieved using both test datasets, respectively, while RMSEs improved by 2.15 and 1.88 µg cm −2 for the cross-validation and independent held-out test datasets, respectively.For LC ab , the improvements in R 2 and RMSE were marginal and possibly trivial.When the geometric covariates were added to SB (i.e.SB + Angles), only a 1% increase in R 2 and RMSE increase by 0.15 and 0.12 µg cm −2 were achieved with both test datasets, respectively.Similarly, SVIs + Angles led to a 2% improvement in explained variability and RMSE improvements by 0.18 and 0.15 µg cm −2 for the cross-validation and independent held-out test datasets, respectively.Generally, LC ab (measured at the leaf level) is difficult to estimate from canopy reflectance, which contains the influence of the background, plant structural and biochemical traits.That is why the variability introduced by geometric covariates was not crucial for LC ab estimation in this study.Consequently, spatial distribution maps derived with standard variables, i.e.SB and SVIs show no apparent differences (see Figures S1 and S2).
Indeed, the SVIs offer a simple and rapid assessment of crop health, with several studies showing their relationship with and accuracy in retrieving crop BVs (Jay et al. 2017;Main et al. 2011;Kooistra and Clevers 2016;Haboudane et al. 2008Haboudane et al. , 2004)).In retrieving the LAI, and canopy nitrogen content of sugar beet crops, Jay et al. (2017) found that the SVIs perform equivalent or slightly better than PROSAIL inversion.It is, therefore, not surprising that the SVIs such as MTCI, S2REIP, REIP, NDI45, MCARI, and CI RE_740 were consistently ranked among the top variables when using Avar across all BVs.Also, the SVIs scenario was superior to SB, SB + Angles, and Avar in retrieving LC ab .Vegetation indices such as MTCI, RVI and NDI45 were previously found to be highly correlated to LAI and chlorophyll content (Viña et al. 2011).In agreement, Shah et al. (2019) found that MTCI is one of the best-performing SVIs with an R 2 of 0.86 and an RMSE of 6.07 µg cm −2 in estimating Wheat LC ab due to its sensitivity to wide-ranging chlorophyll content.Similarly, CLEVERS and Gitelson (2013) observed that the MTCI is an accurate estimator of Canopy Chlorophyll Content (CCC) and nitrogen (N) content due to its linear relationship with these BVs.The results in this study are consistent with these previous studies and others (Main et al. 2011), using crossvalidation and independent held-out test datasets.Moreover, SVIs selected in this study yielded comparative results to SB, except in exceptional cases, i.e. the SVIs-LC ab model, where the former was superior.Their excellent performance can be attributed to their relatively low sensitivity to background soils, and residual atmospheric effects, which affect most VIS/NIR indices (Liu, Pattey, and Jégo 2012).However, their relatively low performance relative to SVIs + Angles signifies their sensitivity to per-pixel view and illumination geometry, as consistently reported previously (Verrelst et al. 2008;Jay et al. 2017).For example, Verrelst et al. (2008) found that broadband and narrowband greenness indices exhibit significant sensitivity to view geometry.In the current study, when per-pixel view and illumination geometric covariates were incorporated (i.e.SVIs + Angles), the variability explained improved from 52% (58%), 74% (59%) and 67% (67%) when using SVIs only to 67% (67%), 76% (61%), and 72% (72%) for LAI, LC ab , and CCC, respectively, using cross-validation test (independent heldout test) datasets.Moreover, when per-pixel view and illumination geometric covariates were incorporated into SB (i.e.SB + Angles), the variability explained improved from 59% (49%), 71% (49%), and 69% (69%) to 69% (66%), 72% (49%), 72% (69%) for LAI, LC ab , and CCC, respectively, using cross-validation test (independent held-out test) datasets.These results signify that the per-pixel view and illumination geometry affect the upwelling radiance from crop canopies as determined by the interaction of various leaf and canopy traits.Such effects were not accounted for during atmospheric correction and thus were propagated to surface reflectance.
Therefore, incorporating pixel-wise view and illumination geometry into the retrieval of BVs is reasonable since it is one of the factors influencing the canopy reflectance.In this study, its inclusion to SB and SVIs ensured that the variability in reflectance caused by the view and illumination geometry is accounted for, thus leading to improved predictive accuracy relative to using the SB or SVIs only.As shown by the multiple linear regression analysis (Table 3), more than 30% of the variability in all Sentinel-2 spectral bands was explained by the perpixel variation in view and illumination geometries at a 5% significant level (p < 2.2e-16).A previous study (Verrelst et al. 2008) found that topographic and illumination geometry accounted for about 13% of SVIs' variability.Similarly, Rahman et al. (1999) found that the canopy reflectance of rice and wheat varies significantly with view geometry, where the change in view geometry resulted in different spectral responses due to the exposed soils and vegetation for each geometry.Nagol et al. (2015) found that the effect of narrow VZA (i.e.±7°) in Landsat data resulted in non-negligible variations in reflectance, i.e. 20%.

The predictive power of the geometric covariates
The additional predictive power provided by the geometric covariates was also shown by the Random Forest (RF) variable importance results, where one or more view and illumination geometry covariates were among highly informative variables in all biophysical and biochemical models (see Figure 7).Among the geometric covariates, VZA and SZA were the most influential in retrieving LAI and CCC.In particular, all scenarios showed the high influence (ranked by their %IncMSE) of SZA and VZA for the retrieval of LAI and CCC.Other scenarios, i.e.LC ab retrieval with SB + Angles, showed that VZA was the only geometric variable featured in the top 10 important variables (see Figure 7).In contrast, the retrieval of LC ab using SVIs + Angles and Avar showed SZA as the only geometric variable with a strong influence the model.The findings in this study correspond to Verrelst et al. (2008) that the spectral response of plant canopies is significantly affected by the sensor view geometry, affecting the accuracy of retrieved BVs.
Moreover, the findings not only ascertain previous studies that demonstrated solar illumination effects on SVIs (Galvao et al. 2011) but also indicate that per-pixel SZA can contribute essential additional information that improves the predictive power of SVIs, thus reducing the uncertainty in BV retrieval.SZA contains per-pixel geometric information of the incident radiation.Critically, withinscene SZA changes with the crop growth stages (i.e.time of year); hence, ignoring its effects may lead to inconsistent BV retrieval and subsequent incorrect interpretations.For example, Galvao et al. (2011) found that intra-annual variability in enhanced vegetation index (EVI) from MODIS and nadir Hyperion was regulated by illumination effects than actual changes in LAI.In the context of precision agriculture and crop monitoring, such false alarms, i.e. caused by geometric variability, have profound implications.It may lead to incorrect crop condition diagnosis that can lead to wasteful use of scarce and expensive farm inputs (e.g.irrigation water, pesticides, and fertilizers) and poor yields as the stressed crops are not timely and adequately remedied.This can potentially lead to loss of trust in remotely sensed crop BVs for agronomic applications.
As one of the factors affecting reflectance variability (see Table 3), per-pixel geometric information has been ignored in retrieving BVs with optical satellite images.This may be a consequence of the absence of spatially explicit view and illumination information by heritage missions such as Landsat and AVHRR that largely drove BV retrieval techniques' development.As the results in this study have shown, the per-pixel geometric variables explain a reasonable amount of variability in satellitebased canopy reflectance; hence it should be incorporated into MLRA models to improve the accuracy and reliability of crop BVs.Therefore, it is reasonable to deduce that spectral bands or vegetation indices combined with geometric covariates offer the most significant prospects of reliable prescription maps for precision agriculture technologies, such as variable rate application (VRA) of fertilizer and pesticides, and to aid agronomic decisions.Although SNAP biophysical processor, based on PROSAIL LUT and ANN incorporate these geometric variables (Weiss and Baret 2020), validation studies (Kganyago et al. 2020;Xie et al. 2019;Djamai et al. 2019) have shown that the product is associated with significant uncertainties which render its site-specific agronomic application questionable.Therefore, based on the results here and in line with Combal et al. (2003), it can be assumed that locally parameterizing the PROSAIL LUT (using prior information) may reduce high uncertainty in BV retrieval and show the contribution of the geometric covariates.
Therefore, models with SB or red-edge SVIs and pixel-wise geometric covariates are more promising for LAI and CCC retrieval (see Figures S1 and S2).This is attractive for precision agriculture applications since these geometric covariates come standard with Sentinel-2 products and can be conveniently incorporated into BV retrieval models to account for per-pixel view and illumination geometry effects and achieve unprecedented accuracies and temporal consistency for relatively wide FOV sensors such as Sentinel-2.

Limitations of the study
Despite the promising results, the main limitation of this study was that the applicability of MLRA models constructed with experimental data is limited by the representativeness of such data (Weiss, Jacob, and Duveiller 2020); thus, the results obtained here may be crop-type, region-and phenology-specific.Therefore, further experiments in different geographic locations and periods are required to ascertain the contribution of perpixel geometric covariates.Moreover, simulated data using RTMs is recommended for future works to make the models widely applicable and assess the geometric variables' influence on BVs.Nonetheless, the multiple linear regression analysis showed a non-negligible influence on surface reflectance of the considered crop types, while RF variable importance indicated that they are influential to the model performance and accuracy of retrieving LAI, LC ab and CCC.Overall, this study found that per-pixel view and illumination geometric covariates improve MLRA crop BVs retrieval.Future studies should explore the optimization of Sentinel-2 feature space (including derived vegetation indices using feature selection to avoid potential collinearity of predictors and improve the practical prospects of combining geometric radiometric variables in retrieving crop BVs.

Conclusion
This study evaluated the influence of Sentinel-2 pixel-wise sensor view and solar illumination geometry on the accuracy of estimating crop biophysical and biochemical variables relevant for precision agriculture and crop monitoring, i.e.LAI, LC ab , and CCC.Generally, the results showed that the experimental scenarios incorporating per-pixel geometric covariates, i.e.SZA, VZA, and RAA, yielded robust retrievals of LAI and CCC, while there was a negligible improvement (i.e.R 2 : 1-2%) in LC ab retrieval accuracy.In particular, standard input variables, such as spectral bands and spectral vegetation indices, to biophysical retrieval models, combined with geometric covariates resulted in the best (i.e.lowest) RMSE for all considered crop BVs owing to high information content in the spectral bands and rededge indices, and residual variability explained by the pixel-wise view and illumination geometry.The results showed that this variability exceeded 30% in all Sentinel-2 bands and sites.Moreover, variable importance assessment showed that VZA and SZA were among the topmost influential variables in retrieving all BVs.The spatially explicit view and illumination information provided with Sentinel-2 can be used as additional information to account for variability resulting from sensor and sun geometries in BV retrieval with MLRAs.This study showed that ignoring the influence of sensor and sun geometry may lead to inconsistencies in BV retrieval as the SZA changes along the cropping season, and VZA varies across the scene.Moreover, their influence on reflectance is considerable, i.e. >30%; thus, estimated BVs can be incorrectly interpreted, as shown by Galvao et al. (2011).In precision agriculture, it may lead to an inaccurate crop health diagnosis, risking wasteful use of irrigation water, pesticides and fertilizers, poor yields, and loss of trust in remotely sensed BVs for agronomic applications.Based on this study's results, we recommend incorporating per-pixel view and illumination geometry in estimating canopy-level BVs such as LAI and CCC, especially when using wide-view sensors such as Sentinel-2.However, further tests are required to reaffirm the contribution of view and illumination geometry toward improving the crop BVs that support precision agriculture and aid agronomic decisions.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The work was supported by the Horizon 2020 Framework Programme

Figure 1 .
Figure 1.Land cover distributions in Bothaville and Harrismith (orange and red squares), Free State province (dark grey), South Africa (adopted from Kganyago 2022).The insert maps are projected to UTM (Universal Transverse Mercator) with WGS-84 (World geodetic System 1984) Datum.The land cover dataset was obtained from the Department of Forestry, Fisheries and the Environment (https:// egis.environment.gov.za/gis_data_downloads,accessed 20 March 2021).

Figure 2 .
Figure 2. The schematic representation of the methods used in this study.Five experimental scenarios, i.e.Spectral bands (SB), SB and Angles (SB + Angles), Spectral vegetation indices (SVIs), SVIs and Angles (SVIs + Angles), and All variables (Avar) were used for crop biophysical and biochemical variable (BV) retrieval using two test datasets, i.e. cross-validation test dataset and an independent heldout test (i.e.unseen) dataset.

Figure 4 .
Figure 4. Summary statistics Sentinel-2 illumination and view angles in Bothaville (a-c) and Harrismith (d-f).The red line indicates the mean value for each geometric variable over 50,000 randomly sampled pixels.

Figure 5 .
Figure 5. Scatterplots for the best Random Forest (RF) models for retrieving various BVs using various Sentinel-2 covariates, crossvalidation test dataset pooled from Bothaville and Harrismith (i.e.a-c) and an independent held-out (i.e.30%) dataset from Harrismith (i.e.d-f).SB + Angles achieved the best LAI and CCC predictive accuracy with cross-validation test datasets (i.e. a and c), while SVIs + Angles resulted in superior predictive accuracy for LC ab with both test datasets (i.e.b and e), and for LAI with the Independent held-out (i.e.30%) dataset (i.e.d).Avar achieved superior predictive accuracy for CCC with the independent held-out dataset (i.e.f).

Figure 6 .
Figure 6.Spatial distribution of crop biophysical and biochemical variables for Bothaville (a-c) and Harrismith (d-f).The LAI and CCC maps in (a) and (c) were achieved with SB + Angles, while LC ab in (b) and (e) SVIs + Angles.Avar was used to map CCC in (f).

Figure 7 .
Figure 7. Top 10 important variables (high to low) for retrieving LAI, LC ab , and CCC using each experimental scenario identified with Random Forest (RF) and fivefold cross-validation with 100 iterations.

Table 1 .
The various experimental scenarios were designed by excluding and including view and illumination geometry from commonly used radiometric inputs (i.e.spectral bands and vegetation indices).
(Richter et al. 2012nd µg cm −2 for LC ab and CCC.The RRMSE is used for comparing different variables or ranges.RRMSE values of ≤10% are indicative of Excellent predictions, and 10% < RRMSE ≤20% are Good predictions(Richter et al. 2012).
(Gara et al. 2019) of the response variable (e.g.CCC) variance explained by the predictors (e.g.spectral bands).The RMSE and MAE, on the other hand, measure the degree of error between predictions and observations in the units of the BVs, i. eLastly, percentage Bias (%Bias) measures the model's tendency to overestimate or underestimate a specific BV.A model is deemed accurate when the %Bias is close to 0%(Gara et al. 2019).Prediction accuracy assessment was assessed using R-statistics version 4.1.2(Tierney 2012) packages "Metrics" (https://cran.r-project.org/web/packages/Metrics/index.html,accessed 18 July 2021) and "Fgmutils" (https://rdrr.io/cran/Fgmutils/,accessed 18 July 2021).

Table 2 .
The red-edge indices from Sentinel-2 MSI data.

Table 3 .
The amount of spectral variability explained by the geometric variables based on multiple linear regression.
SVIs + Angles, and Avar results, incorporating pixel-wise geometric information led to robust LAI, LC ab , and CCC predictions using Random Forest (RF).It can also be deduced that carefully chosen red-edge vegetation indices have high information content considering SVIs' next better performance after SVIs + Angles in the retrieval of LC ab .In contrast, spectral reflectance bands were relatively superior in retrieving LAI and CCC.All input variable scenarios achieved RRMSE of <5%, thus within the GCOS limit of 10%(GCOS, 2009).As shown in Figure

Table 4 .
The performance of various Sentinel-2 input variables in retrieving various crop parameters using Random Forest (RF) with two test datasets, i.e. cross-validation data pooled from the two sites and independent test data from Harrismith (in brackets).The bold values indicate the best (i.e.lowest) RMSE achieved by the input variable scenario for each biophysical/biochemical variable (BV).