Evaluation of Satellite-Derived Surface Soil Moisture Products over Agricultural Regions of Canada

Soil moisture is a critical indicator for climate change and agricultural drought, but its measurement is challenging due to large variability with land cover, soil type, time, space and depth. Satellite estimates of soil moisture are highly desirable and have become more widely available over the past decade. This study investigates and compares the performance of four surface soil moisture satellite datasets over Canada, namely, Soil Moisture and Ocean Salinity Level 3 (SMOS L3), versions 3.3 and 4.2 of European Space Agency Climate Change Initiative (ESA CCI) soil moisture product and a recent product called SMOS-INRA-CESBIO (SMOS-IC) that contains corrections designed to reduce several known sources of uncertainty in SMOS L3. These datasets were evaluated against in situ networks located in mostly agricultural regions of Canada for the period 2012 to 2014. Two statistical comparison methods were used, namely, metrics for mean soil moisture and median of metrics. The results suggest that, while both methods show similar comparisons for regional networks, over large networks, the median of metrics method is more representative of the overall correlation and variability and is therefore a more appropriate method for evaluating the performance of satellite products. Overall, the SMOS products have higher daily temporal correlations, but larger biases, against in situ soil moisture than the ESA CCI products, with SMOS-IC having higher correlations and smaller variability than SMOS L3. The SMOS products capture daily wetting and drying events better than the ESA CCI products, with the SMOS products capturing at least 75% of observed drying as compared to 55% for the ESA CCI products. Overall, for periods during which there are sufficient observations, both SMOS products are more suitable for agricultural applications over Canada than the ESA CCI products, even though SMOS-IC is able to capture soil moisture variability more accurately than SMOS L3.


Introduction
Surface soil moisture (SM) is a key component of the Earth system that can affect weather through its influence on evaporation and surface energy fluxes [1], with a lack of SM associated with drought occurrence [2] and an excess related to flooding [3,4]. Given its importance for regional-to-global hydroclimate variability and change, SM is considered one of the 50 Essential Climate Variables (ECVs) in the Climate Change Initiative (CCI) project established by the European Space Agency (ESA) [5]. In a Canadian context, SM availability over agricultural regions has been identified as a limiting factor for agricultural yields [6]. Furthermore, the Prairie region, which accounts for more than 80% of agricultural land in Canada, is an important global food supplier [6,7], which highlights the need for SM measurement for agricultural applications in Canada.

In Situ Soil Moisture Data
The three in situ networks used to evaluate satellite-derived SM products over Canada are shown in Figure 1. The Ontario and Manitoba networks are run by Agriculture and Agri-Food Canada as part of the Real Time In Situ Monitoring for Agriculture (RISMA) network while the Alberta network, also called the Alberta Ground Drought Monitoring Network (AGDMN), is run by Alberta Agriculture and Rural Development. While all three networks are located in predominantly agricultural regions, they serve different purposes. The Ontario and Manitoba networks consist of multiple stations covering a relatively small area to capture SM variation within the radiometer footprint for validating modelled and satellite observed SM data while the Alberta network is a mesoscale network (mesonet) which captures SM variability over the province [25]. Even though the pixel-to-point comparison of SM is challenging due to the uncertainties associated with SM variability within a pixel footprint, previous studies have reported good agreement between coarse resolution satellite SM products and these networks [11,25]. Furthermore, while other physical measurement methods of SM have shown promise in addressing some of these challenges in some parts of the world [46], further research is required to assess their full potential for Canada [47]. The proportion of agricultural land, the cultivation practices, the landscape type, the soil and the climate also vary between the above network sites. A brief description of these networks is provided below.
Remote Sens. 2020, 12, x FOR PEER REVIEW
Figure 1. In situ networks used to evaluate the satellite-derived SM products over Canada. The land cover types, shown at 0.25°, are obtained from the most recent ESA CCI Land Cover map [48] and are grouped based on the IPCC land categories [48].
The stations in the RISMA network consist of three sets of replicate dielectric probes (Stevens Hydra Probe SDI-12) installed vertically at 0-5 cm and horizontally at 5 cm, 20 cm, 50 cm and, for some stations, at 100 cm. The Manitoba network consists of nine stations distributed over a largely agricultural region in the Prairie/Boreal Plain Ecozone in the Red River watershed and covers two main soil types, namely, heavy Red River clay soils to the northeast and sandier loam soils to the southwest. The Ontario network consists of five stations over largely agricultural land which covers an area of about 20 × 20 km surrounded by patches of forest and which consists mainly of cultivated corn and soybean. Stations located over clay soils were previously reported to have less SM variability and retain wetness longer than sandy soils [25] and to have representativeness issues in Manitoba [11]. Therefore, classification of soil textures derived from the RISMA Network Metadata document [49] for both networks is given in Table 1. The term representativeness above refers to the ability of in situ stations to represent SM conditions of the field in which they are located.
The Alberta network spans from 49° N to 56° N over the agricultural regions of the province. The northern region of the network is dominated by Boreal plain while the generally drier and warmer southeastern region is covered with mixed Prairie grasslands. Each station in the network consists of dielectric probes (Delta-T Theta) buried horizontally at 5 cm, 20 cm, 50 cm and 100 cm but none installed vertically. Data from 33 of the 38 stations are used in this work based on data availability for the duration of the study period. In addition, SM data for all networks are quality checked to remove unrealistically extreme values, and SM values below 0 m³ m⁻³ and exceeding 1 m³ m⁻³ are removed before computing the metrics for analysis. This process revealed that some stations over Alberta (such as Oyen and Oliver) recorded erroneous SM values exceeding 2 m³ m⁻³ for the year 2017 (not shown).
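The range check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing code: the function name `quality_check`, the sample readings, and the choice to flag rejected values as NaN (so that the daily time axis is preserved) rather than dropping them are all assumptions made here.

```python
import math


def quality_check(sm_values, lower=0.0, upper=1.0):
    """Flag physically implausible volumetric SM readings (m^3 m^-3) as NaN.

    Readings below `lower` or above `upper` are treated as missing.
    Existing NaN values are passed through unchanged.
    """
    cleaned = []
    for value in sm_values:
        if math.isnan(value) or value < lower or value > upper:
            cleaned.append(math.nan)
        else:
            cleaned.append(value)
    return cleaned


# Illustrative readings only: one negative value and one value above 2.
raw = [0.21, -0.05, 0.35, 2.10, 0.28]
cleaned = quality_check(raw)
```

Keeping the rejected days as NaN makes it straightforward to align the in situ series with the satellite series later on.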

SMOS Products
Global SM is monitored every 3 days by the SMOS satellite, with ascending and descending overpasses occurring at 6:00 and 18:00 local time, respectively [18,19]. The SMOS SM retrieval algorithm [50], which is used to produce the SMOS Level 2 (L2) SM product, retrieves SM by minimising the squared differences between the L-Band microwave emission of the biosphere (L-MEB) [51] forward simulations of multi-angular dual-polarisation brightness temperatures (TB) and the corresponding SMOS observations. Dynamic auxiliary data such as rainfall, snow, freeze/thaw state and temperature are obtained from the European Centre for Medium Range Weather Forecasts (ECMWF) forecasts. The SMOS Level 3 (L3) [31] product used here provides enhanced robustness and quality of SM retrievals as compared to L2 by using several orbits, rather than a single orbit, to account for angular sampling and radiometric accuracy at the border of the swath. However, the algorithms in both products employ the same bottom-up approach to obtain simulated TB values at the sensor resolution and are impacted by the uncertainties associated with the higher resolution auxiliary data, such as land cover maps, which are used to characterise pixel heterogeneity [32]. The daily L3 product available on a 25 km Equal-Area Scalable Earth Grid version 2.0 (EASE-Grid 2.0) is used in this study, and only data for the ascending orbit are used since data for both orbits were reported to be similar for the study region [25].
The SMOS-INRA-CESBIO (IC) product [32] differs from the L3 product in the following ways. Firstly, the IC algorithm does not take into account pixel land use and assumes the pixel to be homogeneous. Thus, the IC product is independent of the ECMWF SM auxiliary data used in the L2 and L3 algorithms to estimate TB in the sub-pixel fractions of heterogeneous pixels. Secondly, the L-MEB vegetation and soil parameters in the IC product were calibrated using in situ SM measurements, based on the more recent findings of Fernandez-Moran et al. [52] and Parrens et al. [53]; they differ from those used in the L2 and L3 products, which were defined from the literature before the launch of SMOS. Thirdly, while the L3 algorithm is based on SMOS Level 1 C TB data, which contain multi-incidence angle TB at the top of the atmosphere, the SMOS-L3 full-polarisation angle-binned TB product is used in the IC algorithm for its ease and convenience of use. Finally, in the cost function including observed and modelled TB, the SM a priori values are obtained from ECMWF reanalysis data and the SM standard deviation is set to 0.7 m³ m⁻³ in L3 [31], while both the a priori SM and SM standard deviation are set to 0.2 m³ m⁻³ in IC [32].
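To illustrate the role of the a priori SM and its standard deviation, a generic cost function of the kind described above can be written as follows. This is a schematic form only, not the exact expression used in [31] or [32]: the symbols J, TB^obs, TB^mod, σ_TB, SM^prior and σ_SM are notation assumed here, and the operational algorithms also retrieve and constrain vegetation parameters that are omitted for brevity.

```latex
% Schematic retrieval cost function (notation assumed for illustration).
% First term: misfit between observed and modelled brightness temperatures
% over all incidence angles and polarisations i.
% Second term: penalty for departing from the a priori soil moisture.
J(SM) = \sum_{i} \frac{\left( TB_i^{obs} - TB_i^{mod}(SM) \right)^2}{\sigma_{TB}^2}
      + \frac{\left( SM - SM^{prior} \right)^2}{\sigma_{SM}^2}
```

Under this form the prior term is weighted by 1/σ_SM², so reducing σ_SM from 0.7 to 0.2 m³ m⁻³ increases that weight by a factor of (0.7/0.2)² ≈ 12, pulling the retrieval more strongly towards the a priori value.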

ESA CCI Soil Moisture
The ESA CCI SM daily product is obtained by merging Level 2 SM datasets from available active and passive microwave sensors, for which the error characteristics are known [35]. Full documentation of the ESA CCI SM product is provided by Dorigo et al. [35], and information on the merging process can be found in the Algorithm Theoretical Baseline Document (ATBD) [44]. Briefly, passive microwave products produced with the Land Parameter Retrieval Model (LPRM) [54] and active microwave SM products generated with the TU Wien method [55,56] are included given their consistency between the different sensors [33,57,58]. The Level 2 SM retrievals from all sensors are first interpolated to a common daily time step (0:00 UTC ± 12 h) using a nearest neighbour search in time. The radiometer-based products are scaled based on the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) [54] SM using cumulative distribution function (CDF) matching [33] and merged into a radiometer-only product known as PASSIVE. The scatterometer-based products are scaled with Advanced Scatterometer (ASCAT) [59] SM and merged into a scatterometer-only product (ACTIVE). Finally, the ACTIVE and PASSIVE products are scaled using SM from GLDAS-Noah v1 [60] and then merged into a COMBINED product. The merging process employs a weighted average of measurements from all available sensors at each time step using triple collocation (TC) analysis [61]. For our study period of 2012 to 2014, ESA CCI contains information from two passive satellite products, namely SMOS and AMSR2 [62], and one active product, ASCAT-A/B. However, the SMOS product included here is an LPRM-based SMOS product, which ensures consistency with other passive products blended in ESA CCI [63].
The COMBINED products versions 3.3 (CCI3) and 4.2 (CCI4), provided at a 0.25° spatial resolution, are used in this study. The above description of the merging process is the one employed for CCI3. CCI4 differs from CCI3 in three important ways. First, in CCI4, GLDAS-Noah soil temperature and Snow Water Equivalent estimates are used to mask out unreliable retrievals before performing CDF matching and TC error estimation. Second, before merging into the COMBINED product, "frozen" flags of the passive products are used to mask out unreliable retrievals in the active products and vice versa. Third, CCI4 combines all active and passive L2 products directly after CDF matching against GLDAS-Noah, instead of merging the pre-merged ACTIVE and PASSIVE products as in CCI3. Since the first ESA CCI product (v0.1) was released in 2012, each release has extended the record towards the present and introduced new satellite sensors [35].
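The CDF matching step mentioned above can be sketched as an empirical quantile mapping. The code below is a simplified illustration under stated assumptions, not the implementation documented in the ATBD [44]: the function names `quantile` and `cdf_match`, the rank-based probability estimate and the sample values are all hypothetical.

```python
from bisect import bisect_right


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list (0 <= p <= 1)."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def cdf_match(source, reference):
    """Rescale `source` so that its empirical CDF matches that of `reference`.

    Each source value is replaced by the reference quantile at the same
    empirical probability. The temporal ordering of `source` is preserved.
    """
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n = len(src_sorted)
    matched = []
    for value in source:
        # Rank of the value within the source sample, scaled to [0, 1].
        rank = bisect_right(src_sorted, value) - 1
        p = rank / (n - 1) if n > 1 else 0.0
        matched.append(quantile(ref_sorted, p))
    return matched


# Illustrative values only (m^3 m^-3).
satellite_sm = [0.10, 0.30, 0.20, 0.40]
reference_sm = [0.15, 0.25, 0.35, 0.45, 0.55]
matched = cdf_match(satellite_sm, reference_sm)
```

After matching, the rescaled series spans the dynamic range of the reference while keeping the timing of its own peaks and troughs, which is why the dynamic range of the scaling reference is imposed on the merged product.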

Evaluation Strategy and Comparison Metrics
SM is evaluated for the months May to October when most land areas are snow-free over the study sites, and for the period 2012 to 2014 when ground data are consistently available from all networks. All statistical metrics are calculated for days when SM data are available for all products. In general, SMOS has more missing observations than ESA CCI, with 45% and 11% fewer days of data over the RISMA and Alberta networks, respectively. While there is virtually no difference in the number of days of data between CCI3 and CCI4 over all regions, IC has an average of 49% fewer days of data over Ontario and Manitoba and 27% fewer over Alberta as compared to L3. This difference between the SMOS products can be attributed to the use of different input TB products. The fixed-angle L3 TB used in IC includes many corrections, which eliminate TBs affected by anthropogenic radio-frequency interference (RFI) and spurious effects such as sun impact [31], unlike the multi-incidence angle Level 1 C TB used in L3 [32].
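Restricting the evaluation to days on which every product reports a value can be sketched as below. This is a minimal illustration rather than the authors' code: the function name `common_valid_days`, the use of NaN for missing days and the sample values are assumptions.

```python
import math


def common_valid_days(series_by_product):
    """Return the indices of days on which every product has a valid value.

    `series_by_product` maps a product name to a list of daily SM values
    covering the same days, with missing days stored as NaN.
    """
    lengths = {len(series) for series in series_by_product.values()}
    if len(lengths) != 1:
        raise ValueError("all series must cover the same days")
    n_days = lengths.pop()
    return [
        day
        for day in range(n_days)
        if all(not math.isnan(series[day]) for series in series_by_product.values())
    ]


# Illustrative values only (m^3 m^-3); NaN marks a missing observation.
nan = math.nan
series = {
    "in_situ": [0.21, 0.25, nan, 0.30, 0.28],
    "L3": [0.18, nan, 0.22, 0.27, 0.25],
    "IC": [0.17, nan, nan, 0.26, 0.24],
    "CCI3": [0.22, 0.24, 0.23, 0.29, nan],
    "CCI4": [0.22, 0.24, 0.23, 0.29, 0.27],
}
days = common_valid_days(series)
```

Because the product with the fewest observations determines the common sample, the sparser SMOS-IC record limits the number of days available for all other products.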
SM measurements at 5 cm depth available from all networks are used to evaluate the satellite SM products, which are all provided in volumetric units (m³ m⁻³). For each in situ station, satellite data are extracted from the pixel that includes the point location of the station and compared with in situ measurements using several commonly used statistical metrics, namely, the Pearson correlation coefficient (R; Equation (1)), the bias (Bias; Equation (2)), the relative bias (Relative Bias; Equation (3)), the root-mean-square difference (RMSD; Equation (4)), the unbiased root-mean-square difference (ubRMSD; Equation (5)) and the normalized standard deviation (SDV; Equation (6)). The above metrics are calculated as follows:
$$R = \frac{\sum_{i=1}^{N} \left( Sat_i - \overline{Sat} \right) \left( InSitu_i - \overline{InSitu} \right)}{\sqrt{\sum_{i=1}^{N} \left( Sat_i - \overline{Sat} \right)^2} \sqrt{\sum_{i=1}^{N} \left( InSitu_i - \overline{InSitu} \right)^2}} \tag{1}$$
$$\mathrm{Bias} = \frac{1}{N} \sum_{i=1}^{N} \left( Sat_i - InSitu_i \right) \tag{2}$$
$$\mathrm{Relative\ Bias} = \frac{\mathrm{Bias}}{\overline{InSitu}} \tag{3}$$
$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Sat_i - InSitu_i \right)^2} \tag{4}$$
$$\mathrm{ubRMSD} = \sqrt{\mathrm{RMSD}^2 - \mathrm{Bias}^2} \tag{5}$$
$$\mathrm{SDV} = \frac{\sigma_{Sat}}{\sigma_{InSitu}} \tag{6}$$
where Sat is SM measured by the satellite, InSitu is SM measured at an in situ station, an overbar denotes the mean over all daily observations, N is the number of daily observations and σ_Sat and σ_InSitu are the standard deviations (computed over all days of the evaluation period) of the satellite and in situ measurements, respectively.
We also assess the ability of the satellite products to capture wetting and drying by using standardized daily SM anomalies calculated with respect to the evaluation period [25,64]. We employ the nonparametric Kruskal-Wallis test [65] to identify the statistical significance of differences between the distributions of SM anomalies from the satellite and in situ products, and the Wilcoxon rank sum test [66] is used for post-hoc pairwise comparisons between the products.
Statistical comparison of SM from in situ networks and satellite products often involves arithmetically averaging SM from all stations within a network before calculating statistical metrics like SDV, R and ubRMSD (henceforth Method A) [11,25,29].
However, some studies have used an alternative approach for measurement networks that cover larger spatial domains, where the statistical metrics are calculated for each station separately and the median for each metric is selected: the so-called "median of metrics" method (henceforth Method B) [67,68]. In this study, since we compare satellite SM with in situ SM across both large and small measurement networks, both methods are used and compared. For the median of metrics method, the spatial standard deviation across all stations for each network and for each metric is also included.
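The difference between the two methods can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function names, the normalisation of the relative bias by the in situ mean, the use of population standard deviations, the averaging of the satellite pixel values alongside the station values in Method A, and the sample data are all choices made here.

```python
import math
from statistics import fmean, median, pstdev


def metrics(sat, insitu):
    """Compute comparison metrics for one pair of daily SM series (m^3 m^-3)."""
    diffs = [s - g for s, g in zip(sat, insitu)]
    bias = fmean(diffs)
    rmsd = math.sqrt(fmean([d * d for d in diffs]))
    # Guard against tiny negative values caused by floating-point rounding.
    ubrmsd = math.sqrt(max(rmsd ** 2 - bias ** 2, 0.0))
    sat_mean, insitu_mean = fmean(sat), fmean(insitu)
    sat_sd, insitu_sd = pstdev(sat), pstdev(insitu)
    covariance = fmean(
        [(s - sat_mean) * (g - insitu_mean) for s, g in zip(sat, insitu)]
    )
    return {
        "R": covariance / (sat_sd * insitu_sd),
        "Bias": bias,
        # Assumption: relative bias is normalised by the in situ mean.
        "Relative Bias": bias / insitu_mean,
        "RMSD": rmsd,
        "ubRMSD": ubrmsd,
        "SDV": sat_sd / insitu_sd,
    }


def method_a(stations):
    """Method A: average all stations day by day, then compute metrics once."""
    n_days = len(stations[0][0])
    sat_mean = [fmean([sat[d] for sat, _ in stations]) for d in range(n_days)]
    insitu_mean = [fmean([ins[d] for _, ins in stations]) for d in range(n_days)]
    return metrics(sat_mean, insitu_mean)


def method_b(stations):
    """Method B: metrics per station, then (median, spatial s.d.) per metric."""
    per_station = [metrics(sat, ins) for sat, ins in stations]
    summary = {}
    for name in per_station[0]:
        values = [m[name] for m in per_station]
        summary[name] = (median(values), pstdev(values))
    return summary


# Illustrative values only: two stations, four common days each,
# given as (satellite series, in situ series).
stations = [
    ([0.10, 0.20, 0.30, 0.40], [0.20, 0.25, 0.30, 0.35]),
    ([0.30, 0.27, 0.35, 0.32], [0.25, 0.22, 0.30, 0.27]),
]
result_a = method_a(stations)
result_b = method_b(stations)
```

Method A yields a single value per metric, whereas Method B also returns the spread across stations, which is the quantity reported as the spatial standard deviation.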

Comparison of the Satellite-Derived Products with In Situ Networks
We begin by examining the consistency in temporal variation of SM, as estimated from the absolute (m³ m⁻³) in situ measurements and the satellite products. Figure 2 shows time series of warm-season (May to October) SM for Ontario, Manitoba and Alberta from 2012 to 2014. The in situ measurements reveal a saw-tooth pattern of variability, comprising large periodic spikes associated with rainfall events, followed by multi-week periods of drying. All of the satellite products represent this observed variability fairly well, although the ESA CCI products show noticeably more daily-frequency variability, but less intraseasonal variability than in situ measurements. The SMOS products have a much larger range of intraseasonal variability than ESA CCI, leading to qualitatively worse agreement with the in situ measurements than for ESA CCI. Features such as the higher frequency of spikes associated with rainfall events in Ontario as compared to Manitoba [25], and a consistent drying trend of >0.
In Figure 3 we compare the temporal variability and correlation between in situ and satellite SM products. While the SMOS products have larger biases in mean and variability, we find that their correlations with in situ measurements are somewhat higher than for the ESA CCI products for all networks. The average correlations over Ontario for ESA CCI and SMOS are R = 0.6 and R = 0.8, respectively, and are typically lower over Manitoba for both ESA CCI (R = 0.5) and SMOS (R = 0.7). Figure 3c also shows higher correlations (R > 0.8) for the SMOS products as compared to the ESA CCI products for most stations over Alberta. Furthermore, CCI4 tends to show higher correlations than CCI3, and IC tends to show higher correlations than L3, at all networks. Figure 3 also confirms that SMOS has larger variability than in situ measurements and ESA CCI, with CCI4 generally having lower variability than CCI3 and IC having lower variability than L3, at all networks. While the variability for both ESA CCI products is typically close to that of in situ measurements, the variability for the SMOS products varies widely at the different observing networks. For instance, the variability for L3 is twice the variability of IC over Ontario, but they are much more similar over Manitoba.
Figure 3. Taylor diagrams for Ontario, Manitoba and Alberta. These diagrams show the relative position of the satellite soil moisture data to the reference in situ data using temporal linear correlation coefficient and normalized standard deviation. The closer a point is to the in situ measured value, the closer the satellite data are to the ground measured data. The Taylor diagrams for Ontario and Manitoba also include points corresponding to the mean of stations located over different soil types, that is, sand and clay.
We also perform this analysis separately for stations located in sandy and clay soils, to investigate whether soil type explains the lower correlations seen over Manitoba as compared to Ontario (Figure 3a,b). The correlations obtained for stations in Manitoba located in clay soil are systematically lower than those located in sand (Figure 3b). The correlations for Manitoba stations located in sand agree closely with the correlations for all stations over Ontario (R ~ 0.7-0.8 for SMOS and R ~ 0.6-0.7 for ESA CCI), where there is very little difference between sandy and clay locations (Figure 3a). This suggests that, for an in situ SM network containing both clay and sand stations, the in situ measurements for clay stations that have representativeness issues, such as those located over Manitoba [11], can reduce correlations of that network with satellite products. For Manitoba, we also observe that the ESA CCI products closely match in situ variability regardless of the soil type, while SMOS variability matches in situ variability more closely for clay stations, though not as well as ESA CCI, and it is at least twice as large for sand stations. In contrast, the differences in correlation and variability for different soil types are negligible for all satellite products over Ontario.
Finally, we examine whether the evaluation of satellite SM products is different for Method A versus Method B (see Section 2.2.3). Figure 4 shows that both methods produce highly similar results over the smaller Ontario and Manitoba networks: the SMOS products show higher correlations, RMSD and variability than the ESA CCI products. One advantage of Method B is that it allows quantification of spatial variability in the statistical metrics, and the right-hand panels of Figure 4 reveal considerably greater spatial variability for all metrics over Manitoba as compared to Ontario. Over the large, spatially distributed Alberta network, both methods produce similar relative biases, RMSD and ubRMSD. However, there are some notable differences in the correlation and variability between the different versions of the ESA CCI and SMOS products (Figure 4a,b,i,j). For example, Method A shows that L3 has larger correlation and smaller variability than IC, while Method B shows the reverse, which is in closer agreement with the overall performance over Alberta (Figure 3c). We find higher correlations in IC than L3 at 84% of the stations, and smaller variability at all stations (not shown). This suggests that Method B is more appropriate to assess the correlation and variability over a spatially distributed network like the one in Alberta.

Capturing Wetting and Drying Events over Agricultural Landscapes using Satellite-based SM
Since agricultural applications are mostly affected by periods of anomalously high/low SM, in this section we evaluate the ability of the different satellite products to capture the statistical distribution of standardized anomalies of daily SM (SM') binned by the sign of the in situ anomaly for each day. Figure 5a shows that the distribution of in situ anomalies for wetting (SM' > 0) is skewed toward extreme wet days, with 75% of values below 1 s.d. in Ontario and Manitoba and individual extreme wet days as large as 4 or 5 s.d. Alberta SM' tends to be the most variable for wetting, with only 50% of the days below 1 s.d. The in situ measurements for drying (SM' < 0) reveal almost the opposite behaviour, skewed toward dry extremes but with fewer large outlier days than for the wetting events, with Ontario and Manitoba slightly more variable than Alberta (Figure 5b). We suspect that this behaviour for Alberta is due to the larger apparent SM variability during the wetter first half of the warm season every year as compared to SM variability during the drier second half (Figure 2), while for Ontario and Manitoba, there seems to be larger variability in the local minima throughout the warm season during the study period.
Turning to the satellite products, the mean anomaly across all networks and all satellite products is closer to zero than the in situ mean for both wetting and drying, due to a substantial number of days where the sign of the anomaly is opposite to the in situ data. For wetting, the large number of negative values for all products (representing drying during periods when in situ is wet) indicates considerable disagreement in post-rainfall SM estimates, with only 57% of wetting events captured by SMOS over Manitoba (Table 2). For both wetting and drying, a nonparametric Kruskal-Wallis test indicates a highly significant (P < 0.05) difference between the medians of all satellite products and the in situ data. However, for drying events we note slightly better agreement with in situ data for the SMOS products (IC and L3) than the ESA CCI products (Figure 5b), with a higher proportion of anomalies of the correct sign (Table 2) and the median SM' closer to in situ (Table 3). A post-hoc Wilcoxon rank sum test reveals that while the satellite products are generally statistically significantly (P < 0.05) different from in situ for most cases, the distributions of anomalies from IC and in situ are not significantly different for drying over Ontario, and that at least one SMOS product is statistically significantly (P < 0.05) different from the ESA CCI products for most cases. This suggests that IC is the best-performing product for this metric. A robust feature of the distribution of SM' in the satellite products is far fewer extreme dry anomalies than extreme wet anomalies, and this is true even for the subset of events where SM' < 0 (Figure 5b). In other words, even though the mean anomaly of the satellite products is negative for days where in situ detects drying (SM' < 0), these products still capture many individual large amplitude wetting events.
We speculate that this is due to an overestimation of surface wetness in the satellite products shortly after rainfall events, when the satellite measurements may represent a thinner contributing layer of soil [25]. This behaviour is observed for all networks but seems particularly pronounced for Alberta, since for each condition about 6-7 times more anomaly values are available from the 33 Alberta stations than from the fewer Ontario and Manitoba stations.
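The sign-agreement and significance checks described above can be sketched as follows. This is a minimal illustration on synthetic anomalies, not the processing chain used in the study: the arrays, the noise levels, the product labels "A" and "B", and the helper name `fraction_correct_sign` are all assumptions made for the example, and SciPy's `kruskal` and `ranksums` stand in for whichever implementation was actually used.

```python
import numpy as np
from scipy import stats

def fraction_correct_sign(insitu_anom, sat_anom):
    """Fraction of days on which the satellite anomaly has the in situ sign."""
    insitu_anom = np.asarray(insitu_anom, dtype=float)
    sat_anom = np.asarray(sat_anom, dtype=float)
    return float(np.mean(np.sign(insitu_anom) == np.sign(sat_anom)))

# Synthetic stand-in data (assumption): drying days only, so in situ SM' < 0.
rng = np.random.default_rng(0)
insitu = -np.abs(rng.normal(0.03, 0.01, size=200))
sat_a = insitu + rng.normal(0.0, 0.02, size=200)  # hypothetical product A
sat_b = insitu + rng.normal(0.0, 0.05, size=200)  # hypothetical, noisier product B

frac_a = fraction_correct_sign(insitu, sat_a)
frac_b = fraction_correct_sign(insitu, sat_b)

# Omnibus test across in situ and both products.
kw = stats.kruskal(insitu, sat_a, sat_b)

# Post-hoc pairwise rank sum test of each product against in situ.
pairwise = {
    name: stats.ranksums(insitu, sat).pvalue
    for name, sat in {"A": sat_a, "B": sat_b}.items()
}

print(f"correct sign: A={frac_a:.2f}, B={frac_b:.2f}")
print(f"Kruskal-Wallis p={kw.pvalue:.3g}")
print({name: f"{p:.3g}" for name, p in pairwise.items()})
```

In this sketch the noisier product flips the sign of the anomaly more often, which is the kind of disagreement that the proportion of correctly signed anomalies is meant to expose. The pairwise P values are not adjusted for multiple comparisons here.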

Discussion and Conclusions
Two versions of the ESA CCI SM product, v3.3 (CCI3) and v4.2 (CCI4), SMOS-L3 (L3) and a more recent SMOS product known as SMOS-IC (IC), were evaluated against regional (Ontario and Manitoba) and provincial (Alberta) in situ daily SM monitoring networks over important agricultural regions of Canada. We found that SMOS products generally show higher temporal correlations with in situ measurements as compared to ESA CCI, irrespective of the soil texture or location. However, SMOS products tend to have larger biases, RMSD and SM variability than ESA CCI but are able to better capture anomalies, even though all products capture drying better than wetting.
Overall, ESA CCI could be more appropriate than SMOS for short-period studies spanning a few days or weeks, especially at regional scales, owing to the large difference in the number of observations between the two. However, given their ability to capture anomalies and SM variability at regional and provincial scales, for periods during which there are sufficient observations, the SMOS products are more suitable for agricultural applications over Canada than ESA CCI. We also found that, overall, CCI4 and IC outperformed their counterparts in all comparisons.
Several possible reasons have been reported for the dry bias in SMOS [26,41]. First, the sampling depth for SMOS at 0-3 cm is shallower than the in situ SM measurement at 5 cm [70]. Second, it is possible that the SM measured in situ could be overestimated due to soil compaction, especially during dry periods [71]. Third, Cui et al. [41] reported that an underestimation in surface temperature could be a factor causing the dry bias in SMOS. Finally, RFI can increase the TB recorded by SMOS and thus cause a dry bias in the retrieved SM [14]. Furthermore, the drier bias in IC as compared to L3 is consistent with previous literature [72] and could be due to the small constant initial value for SM in the TB cost function in IC [32]. As for ESA CCI, when RFI is detected in multi-frequency retrievals, retrievals at a higher frequency such as X-band are selected [35], which could explain the smaller bias as compared to SMOS, even though the bias in ESA CCI is also characteristic of the bias present in GLDAS-Noah, whose dynamic range is imposed on ESA CCI [73,74].
Our results also suggest that the "median of metrics" method is more appropriate than the more common "metrics of mean SM" method for evaluating the correlation and variability of satellite products with large in situ networks (Figure 4). While the small spatial variability for Ontario shows good representativeness of all stations, the large spatial variability over Manitoba due to high soil diversity shows weak representativeness of some stations over that network and agrees with previous literature [11,25]. However, while the low correlation values over clay soils did not have a large influence on the overall correlation for Manitoba, the observed lower SM variability of these soils can influence the comparison of trends. For instance, clay soils can retain SM longer than sandy soils due to having lower SM variability [25]. Since all satellite products showed considerable drying when the stations in Manitoba showed wetting, it could mean that the observed wetting for that network is an artefact resulting from low SM variability at the clay stations.
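The difference between the two evaluation approaches can be illustrated with a small sketch. It assumes one plausible reading of the method names: "metrics of mean SM" averages all stations into a single network series before computing a metric, whereas "median of metrics" computes the metric station by station and then takes the median. The function names, the toy values (arbitrary units), and the choice of Pearson correlation as the metric are assumptions made for this example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equally long 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])

def metrics_of_mean(insitu, sat):
    """Average all stations into one network series, then correlate once."""
    return pearson_r(insitu.mean(axis=0), sat.mean(axis=0))

def median_of_metrics(insitu, sat):
    """Correlate station by station, then take the median across stations."""
    per_station = [pearson_r(i, s) for i, s in zip(insitu, sat)]
    return float(np.median(per_station))

# Toy network (assumption): rows are stations, columns are days.
# Station 0 has a much larger dynamic range than stations 1 and 2.
insitu = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [4.0, 3.0, 2.0, 1.0],
    [4.0, 3.0, 2.0, 1.0],
])
# Hypothetical satellite series for the pixel matched to each station.
sat = np.tile([1.0, 2.0, 3.0, 4.0], (3, 1))

r_mean = metrics_of_mean(insitu, sat)
r_median = median_of_metrics(insitu, sat)
print(f"metrics of mean SM: r = {r_mean:.2f}")
print(f"median of metrics:  r = {r_median:.2f}")
```

In this deliberately extreme toy case the single high-variability station dominates the network mean, so the averaged series correlates positively with the satellite series even though two of the three stations are anti-correlated with it. Real networks will be far less extreme, but the same mechanism is why a per-station summary can be more representative when spatial variability is large.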
In Section 3.1 we showed that, of the SMOS products, L3 has much higher temporal variability than IC over Ontario, as compared to Manitoba and Alberta. We speculate that L3 could be influenced by the relatively higher spatial variability in land cover over Ontario, which is not present over Manitoba and Alberta; for example, the individual satellite pixels that include the Ontario stations contain large fractions (>50%) of forest, open water, and non-agricultural land [25]. The L3 retrieval algorithm takes into account pixel land use and heterogeneity while IC does not, so L3 could be affected by the uncertainties present in the auxiliary datasets used to characterise pixel heterogeneity [32]. We believe that other methodological differences in the IC product, such as the use of a more robust TB product and the updated L-MEB vegetation and soil parameters [32], are less likely to explain the observed differences in SM variability. Changing the TB data removes outlier retrievals that were present in L3, reducing the overall number of non-missing observations in IC; however, we compare products only for days when data are available from all products. Furthermore, the updated land cover parameters in IC differ from L3 only for areas of low vegetation, while for forested areas the parameters are the same in both products [32].
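Restricting the comparison to days on which every product has data can be sketched as below. This is only an illustration of the idea under stated assumptions: the helper name `common_day_mask`, the use of NaN to mark missing retrievals, and the daily values are hypothetical and do not come from the study.

```python
import numpy as np

def common_day_mask(*series):
    """True where every input series has a valid (non-NaN) value."""
    stacked = np.vstack([np.asarray(s, dtype=float) for s in series])
    return ~np.isnan(stacked).any(axis=0)

# Hypothetical daily SM values (m3/m3); NaN marks a missing retrieval.
l3 = np.array([0.20, np.nan, 0.25, 0.31, 0.18, 0.22])
ic = np.array([0.18, 0.22, np.nan, 0.27, 0.17, 0.21])
insitu = np.array([0.24, 0.26, 0.27, 0.30, np.nan, 0.25])

mask = common_day_mask(l3, ic, insitu)

# Temporal variability of each product, computed on the shared days only.
sd_l3 = float(np.std(l3[mask], ddof=1))
sd_ic = float(np.std(ic[mask], ddof=1))
print(int(mask.sum()), round(sd_l3, 3), round(sd_ic, 3))
```

Computing the variability on the shared days means that a difference between the two products cannot be attributed simply to one product having more retained retrievals than the other.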
This study attempted a pixel-to-point comparison, which presents well-known challenges due to inconsistencies in the spatial averaging scale of satellite products versus in situ probe measurements and the treatment of pixel heterogeneity in satellite products. Future work including spatially gridded comparisons between these satellite products and reanalysis products, such as the recently released fifth generation of ECMWF reanalysis, ERA5 [75], could reveal more information about the spatial characteristics of these products and the consistency among them. Finally, we note that a more recent L-band product from the Soil Moisture Active Passive (SMAP) mission [20], launched in 2015, has been shown to capture relative soil moisture trends well over Canada [76]. Future work comparing IC and SMAP for a more recent period should evaluate which of these products performs best for monitoring SM variability across Canadian agricultural regions.