Performance of SMOS Soil Moisture Products Over Core Validation Sites

The European Space Agency (ESA) launched the Soil Moisture and Ocean Salinity (SMOS) mission in 2009; currently, multiple global soil moisture (SM) products are based on the measurements of its L-band (1.4 GHz) radiometer. We compared four SMOS products with each other: Level 2, Level 3, IC (INRA-CESBIO), and near real-time products. The comparisons focused on core validation sites (CVS), whose low spatial representativeness errors allow the estimation of the SM product performance for bias-insensitive metrics [unbiased root-mean-square error (ubRMSE), correlation ($R$), and anomaly $R$] with negligible uncertainty and for bias-sensitive metrics [mean difference (MD) and root-mean-square difference (RMSD)] with acceptable uncertainty. When the products were compared with CVS independently, the results showed that the ubRMSE, $R$, and anomaly $R$ of the IC product were better than those of the other products, while the MD was larger. However, the differences between the performances were smaller when the products were assessed using only the data points for which each product had a valid retrieval. This indicates that the algorithms have similar performance and that data screening and quality flagging of the retrievals markedly affect the performance. The NASA Soil Moisture Active Passive (SMAP) mission produces an SM product similar to that of SMOS using an L-band radiometer. The closeness of the ubRMSE, $R$, and anomaly $R$ performance of the IC product and the SMAP product (0.039 versus 0.041 $\text{m}^{3}/\text{m}^{3}$, 0.80 versus 0.81, and 0.75 versus 0.75) demonstrates that the SMOS and SMAP radiometers can achieve similar SM sensitivity.

I. INTRODUCTION
One of the main objectives of both the SMOS mission and the SMAP mission [2] is retrieving global near-surface SM. L-band brightness temperature (TB) observations have been found to provide the best combination of sensitivity to SM with low sensitivity to atmospheric effects, vegetation, and surface roughness (see [3], [4]). Several SM data products have emerged from the SMOS and SMAP missions. Some of the products translate the TB observations directly to SM at the instrument footprint scale (see [5], [6], [7], [8], [9], [10]); some use other data sources to improve the spatial resolution, which comes with some compromise concerning the SM performance (see [11], [12]); and some assimilate the TB into land surface models (see [13]).
Notably, some products use both SMOS and SMAP observations in the retrieval of SM and vegetation optical depth (VOD) (see [14], [15]). Numerous studies have evaluated these products, both separately (see [5], [6], [7]) and side by side (see [16], [18], [19]). However, none of the studies have employed reference sites that have multiple in situ measurement stations within an area corresponding to the size of the radiometer footprint, such as the core validation sites (CVS) used by the SMAP mission in its product validation [17]. The value of the CVS is that they reduce the spatial representativeness errors for bias-sensitive metrics [i.e., mean difference (MD) and root-mean-square difference (RMSD)] and render them essentially negligible for bias-insensitive metrics [unbiased root-mean-square error (ubRMSE), correlation (R), and anomaly R] [20].
This investigation aimed to assess and compare the performance of four SM products based on the SMOS measurements using 14 different CVS across the globe (see Fig. 1). In addition, the performance of these products was compared to that of an SMAP SM product over the same sites.
II. DATA
A. SMOS L2 Soil Moisture Product
The SMOS L2 SM algorithm is based on the L-band Microwave Emission of the Biosphere (L-MEB) radiative transfer model [21]. The approach is to retrieve SM and VOD by minimizing the difference between radiative transfer estimates of the TB and actual satellite measurements. The approach relies heavily on the multiangular measurements of SMOS to separate the vegetation contribution from the surface contribution [5]. The data are provided on the ISEA-4H9 grid (icosahedral Snyder equal-area projection with aperture 4, resolution 9, and shape of cells as a hexagon), which provides a uniform intercell distance of 15 km [22]. This analysis used version V700 of the product (the latest available).
The data were filtered using the quality information contained in the product. To assess the potential degradation of the sample due to radio frequency interference (RFI), an indicator was computed as the sum of the N_RFI_X and N_RFI_Y fields divided by the M_AVA0 field [23]. The threshold of the RFI indicator affects how many data points are available for validation. Data points were discarded if the fifth bit of the confidence flag was set, the RFI indicator was over 0.1 [18], or the goodness-of-fit indicator ($\chi^{2}_{P}$) [24] was less than 0.05 [23].
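The screening rule above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name and argument names are hypothetical, the inputs are assumed to be arrays already read from the product, and "fifth bit" is assumed to mean bit index 4 counting from zero at the least significant bit, which should be checked against the product specification.

```python
import numpy as np

def smos_l2_valid_mask(n_rfi_x, n_rfi_y, m_ava0, confidence_flags, chi_2_p,
                       rfi_max=0.1, chi_2_p_min=0.05, flag_bit=4):
    """Return True where a retrieval passes the screening described in the text.

    flag_bit=4 is an ASSUMPTION about how "fifth bit" maps to a bit index.
    """
    n_rfi_x = np.asarray(n_rfi_x, dtype=float)
    n_rfi_y = np.asarray(n_rfi_y, dtype=float)
    m_ava0 = np.asarray(m_ava0, dtype=float)
    flags = np.asarray(confidence_flags, dtype=np.int64)
    chi_2_p = np.asarray(chi_2_p, dtype=float)

    # RFI indicator = (N_RFI_X + N_RFI_Y) / M_AVA0; left as NaN where M_AVA0 is 0.
    rfi = np.full(m_ava0.shape, np.nan)
    has_obs = m_ava0 > 0
    rfi[has_obs] = (n_rfi_x[has_obs] + n_rfi_y[has_obs]) / m_ava0[has_obs]

    bit_set = ((flags >> flag_bit) & 1) == 1
    rejected = (
        bit_set
        | ~np.isfinite(rfi)          # no usable RFI indicator: reject (a choice made here)
        | (rfi > rfi_max)
        | (chi_2_p < chi_2_p_min)
    )
    return ~rejected
```

Rejecting points whose indicator cannot be computed is a conservative choice made for this sketch and is not stated in the text.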

B. SMOS L3 Soil Moisture Product
The SMOS L3 SM product is produced by the Centre Aval de Traitement des Données SMOS (CATDS). The L3 SM dataset, like the L2 SM, is based on the L-MEB forward model, but three orbits within a one-week window are used to constrain the solution, assuming that the optical depth due to vegetation should be correlated in that period [6]. The product is provided on the 25-km Equal-Area Scalable Earth Grid version 2 (EASE-2) [25]. This analysis used version V330 of the product (the latest available).
The data were filtered using the quality information contained in the product. The RFI indicator defined for the L2 product was computed and applied with the same threshold (see Section II-A). Moreover, data points were discarded if $\chi^{2}_{P}$ was less than 0.05.

C. SMOS IC Soil Moisture Product
The SMOS IC product is also based on the original algorithm developed for SMOS [26], which is the foundation used for the L2 product [27]. IC (like L2 and L3) retrieves VOD and SM simultaneously through a two-parameter inversion of the L-MEB model using the multiangular and dual-polarized SMOS observations. In contrast to the L2 and L3 algorithms, SMOS-IC retrievals are only made if TB is available for at least a 10° incidence angle range within the [20°, 55°] interval. This reduces the effective swath width and the number of retrievals but increases quality, as retrievals made from an angular range narrower than 10° have a higher uncertainty, and incidence angles lower than 20° are only viewed in the aliased field of view (see [28]). Consequently, improved quality is obtained at the cost of degraded coverage and revisit time. IC also differs from L2 and L3 in other respects, the main one being that it assumes each pixel to be homogeneous: SM is retrieved over the whole pixel rather than over a fraction of it with a specific landcover/water fraction, as is done in L2 and L3. The data are provided on the 25-km EASE-2 grid. This analysis used version 2 of the product (the latest available).
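The angular-range criterion can be illustrated with a small check. This sketch assumes one reading of the rule, namely that the spread (maximum minus minimum) of the available incidence angles inside the [20°, 55°] interval must be at least 10°; the function name and this interpretation are assumptions, not the SMOS-IC implementation.

```python
import numpy as np

def has_sufficient_angular_range(incidence_deg, lo=20.0, hi=55.0, min_range=10.0):
    """True if the TB incidence angles inside [lo, hi] span at least min_range degrees.

    ASSUMPTION: "range" is interpreted as max - min of the angles within the interval.
    """
    angles = np.asarray(incidence_deg, dtype=float)
    angles = angles[(angles >= lo) & (angles <= hi)]
    if angles.size == 0:
        return False
    return bool(angles.max() - angles.min() >= min_range)
```

For example, observations at 5°, 22°, 31°, and 40° would qualify (18° spread inside the interval), whereas observations at 10°, 15°, 21°, and 25° would not (4° spread).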
The data were filtered using the quality information contained in the product. Data points were flagged if the TB root-mean-square error was more than 6 K or the Scene Flag was greater than one [27].

D. SMOS Near Real-Time Soil Moisture Product
The SMOS near real-time (NRT) SM product is based on a neural network trained with past SMOS L2 SM observations and uses the SMOS multiangular dual-polarized TB as inputs [7]. The product is designed to provide SM very soon after the SMOS observation (less than 3.5 h), which requires streamlining the algorithm and the input parameters. The product is provided on the same ISEA-4H9 grid as the L2 product. Unlike the other SMOS products, the NRT product is available only from January 2016. The analysis used version V100 of the product from 1 January 2016 until 8 August 2018, and V200 from 8 August 2018 until 31 December 2020. These are the versions available for those dates; while the versions may differ in some respects, they were assessed here as one continuous product because that represents what is available to users.
The data were filtered using the quality information contained in the product. Data points were flagged if SM uncertainty was over 0.07 or the RFI probability was over 0.2.

E. SMAP L2 Enhanced Soil Moisture Product
The SMAP L2 enhanced radiometer-based product [9], [29] was used for additional comparison with the SMOS-based SM products. The SM produced using the dual channel algorithm (DCA), which is the current baseline algorithm for the product, was used in the study [30]. The SMAP data are available from 31 March 2015. The data are provided on the 9-km EASE-2 grid. The resolution of the product, however, is defined by a 33 km × 33 km area based on the 3-km EASE-2 grid, matching the scale of the radiometer footprint [9]. The analysis used version V5 (R18290) of the product (the latest available).
The data were filtered using the quality information contained in the product. Data points were flagged if the retrieval quality flag indicated that the quality was not recommended [31].

F. Core Validation Site Data
The CVS were used as the ground reference in the study. They include multiple SM monitoring stations within each satellite resolution cell. Fig. 1 shows the locations of the CVS used in the analysis (Table S1 lists more information on the sites). The SM data were quality controlled as described in [32]. The entire period of the analysis was 2011-2020. For some sites, the data start later than 2011 ([33] shows the periods of availability for each site).

III. METHOD
In assessing the product performances over the CVS, an overriding priority was to make the comparisons equitable despite the varying alignment of the product grids with respect to the SM stations. The standard upscaling approach would compute the area-average SM based on the stations within the observed satellite footprint and how they are distributed within that area. However, computing these values for each grid type separately for each CVS would result in artificial differences. Therefore, the same CVS SM value was used for all products; in computing the upscaled SM value, the alignment of the grids was accounted for. This was facilitated by defining a so-called hybrid footprint for each CVS. Conceptually, the hybrid footprint approximates a footprint for all products without systematically favoring any of the products. The center of the hybrid footprint was defined as the average of the centers of the grid pixels, one per product, that match the CVS station distribution most accurately. The area of the footprint was defined as a circle with a 43 km diameter around the center point (see Fig. 2). The upscaled SM was computed using the stations within this hybrid footprint. The upscaling used the Voronoi diagram approach to avoid preferential weighting of stations within the footprint caused by a potentially uneven distribution of stations (i.e., clustering of stations) [32].
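The hybrid footprint and the Voronoi weighting can be sketched as follows. This is a minimal illustration, not the procedure of [32]: it assumes station and pixel-center positions are already expressed in a local planar coordinate system in kilometers, it approximates the Voronoi cell areas by assigning a dense grid of sample points inside the circle to the nearest station, and all function names are hypothetical.

```python
import numpy as np

def hybrid_center(pixel_centers_km):
    """Average of the selected grid-pixel centers (one per product), in local km."""
    return np.asarray(pixel_centers_km, dtype=float).mean(axis=0)

def voronoi_weights(stations_km, center_km, diameter_km=43.0, step_km=0.25):
    """Approximate Voronoi area weights of the stations inside the circular footprint.

    Discrete approximation (an ASSUMPTION of this sketch): each sample point in the
    circle is assigned to its nearest in-footprint station; weight = share of points.
    Stations outside the footprint receive zero weight.
    """
    stations = np.asarray(stations_km, dtype=float)
    center = np.asarray(center_km, dtype=float)
    radius = diameter_km / 2.0

    inside = np.hypot(*(stations - center).T) <= radius
    weights = np.zeros(len(stations))
    if not inside.any():
        return weights

    # Regular grid of sample points clipped to the circular footprint.
    axis = np.arange(-radius, radius + step_km, step_km)
    gx, gy = np.meshgrid(axis, axis)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    pts = pts[np.hypot(pts[:, 0], pts[:, 1]) <= radius] + center

    idx_inside = np.flatnonzero(inside)
    d2 = ((pts[:, None, :] - stations[idx_inside][None, :, :]) ** 2).sum(axis=2)
    nearest = idx_inside[d2.argmin(axis=1)]
    counts = np.bincount(nearest, minlength=len(stations))
    return counts / counts.sum()

def upscale(sm_values, weights):
    """Weighted mean of station SM; weights are renormalized over reporting stations."""
    sm = np.asarray(sm_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    valid = np.isfinite(sm) & (w > 0)
    if not valid.any():
        return float("nan")
    return float(np.sum(sm[valid] * w[valid]) / np.sum(w[valid]))
```

With this weighting, two stations that sit close together split the area they jointly cover instead of each counting as a full station, which is the clustering effect the Voronoi approach is meant to remove. Renormalizing the weights when a station is missing is a simplification made here.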
The upscaled SM was matched up to each satellite product using the data from the grid pixel closest to the hybrid footprint (the center of which was used in defining the hybrid footprint area) and the overpass time of the satellites. Each product was filtered based on the quality flagging approach described in Section II. The performance metrics were computed for each product using the matchup time series. The assessment between the SM products and the CVS was conducted using five metrics: RMSD, ubRMSE, MD, Pearson correlation (R), and anomaly R, which were computed as in [17]. The statistical confidence intervals were calculated for each metric following the approach given in [17]. The confidence interval does not represent the error in the CVS reference; it is only a statistical metric indicating the range within which the result is expected to fall based on the variability and number of the data points [34].
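For orientation, four of the five metrics can be computed from a matchup time series with the standard definitions, as in the sketch below. The function name is hypothetical, the inputs are assumed to be time-aligned arrays with NaN marking missing values, and the exact computation in [17] may differ in detail. Anomaly R is omitted because it additionally requires removing a seasonal climatology from both series before correlating them, and that climatology is defined in [17] rather than here.

```python
import numpy as np

def sm_metrics(product, reference):
    """MD, RMSD, ubRMSE, and Pearson R between a product and the upscaled reference."""
    p = np.asarray(product, dtype=float)
    r = np.asarray(reference, dtype=float)
    ok = np.isfinite(p) & np.isfinite(r)   # use only paired, valid samples
    p, r = p[ok], r[ok]

    diff = p - r
    md = diff.mean()                               # mean difference (bias)
    rmsd = np.sqrt(np.mean(diff ** 2))             # root-mean-square difference
    ubrmse = np.sqrt(np.mean((diff - md) ** 2))    # RMSD after removing the bias
    corr = np.corrcoef(p, r)[0, 1]                 # Pearson correlation
    return {"MD": md, "RMSD": rmsd, "ubRMSE": ubrmse, "R": corr, "N": int(p.size)}
```

With these definitions, RMSD² = ubRMSE² + MD² holds exactly, so a product with a lower ubRMSE but a larger MD can end up with an RMSD similar to that of a product with the opposite trade-off.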
Two different periods were used: 1) 2016-2020 for comparing the SMOS L2, L3, IC, and NRT products and the SMAP product, and for comparing the four SMOS products for overpasses when each product provided a valid value based on its quality flagging and 2) 2011-2020 for comparing the SMOS L2, L3, and IC products over a more extended period that better represents the entire lifetime of the SMOS mission. In both cases, only the morning overpasses (ascending part of the orbit for SMOS and descending for SMAP) were used.

IV. RESULTS AND DISCUSSION
Fig. 3 shows the result of the CVS matchup for the 2016-2020 period for all the products. The processing and flagging choices used in the algorithms and the analysis affect the conditions under which the retrievals were conducted and the number of data points available for the evaluation. The right of Fig. 3 shows the total number of retrievals used in the computation. The L2 and L3 products have numbers of retrievals similar to each other and to the SMAP product. The IC product has substantially fewer, and the NRT product has considerably more data points. As explained in Section II-C, the screening for the highest quality TB measurements reduces the numbers for the IC product. IC has the best ubRMSE, R, and anomaly R performance (the IC average values are outside the confidence intervals of the other products) compared to the other SMOS products. However, the MD performance is substantially poorer than that for the L2, L3, and NRT products (the average is outside of the confidence intervals of the other products); consequently, the RMSD performance is very similar for all products. The individual CVS results are generally distributed closer to the mean value for the IC product than for the other products, indicating greater consistency of the retrieval performance across the sites. However, the mean absolute bias (MAB) for IC is larger than that for the other SMOS products.
The ubRMSE, R, and anomaly R performance of the IC product are very similar to those of the SMAP product. However, the SMAP product has an average MD closer to 0, which resulted in a smaller RMSD value. Also, the MAB value is the smallest despite a notable spread in the individual MD values. The comparisons for each CVS are shown in the Supplemental Material.
When the SMOS products are compared for data points with a valid retrieval value for each product, the performance differences are smaller [33]. The IC metrics are virtually unchanged, and the L2, L3, and NRT products show somewhat improved ubRMSE performance, putting the IC average ubRMSE within their confidence intervals (but their mean values are still larger). The IC product benefits from using only the highest quality TB at the expense of less coverage; the other products gain part of that benefit when they are restricted to the same data points as IC. However, considering the susceptibility of SMOS to RFI, this also means that the filtering strategy used by IC is more effective because it estimates the concurrent RFI impact [8], rather than relying on the probability map of RFI occurrence used by L2 and L3.
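The restriction to points where every product has a valid retrieval amounts to intersecting the per-product validity masks. The sketch below assumes the matchup series of all products have already been aligned on a common time axis, with NaN marking missing or flagged retrievals; the function name and data layout are hypothetical.

```python
import numpy as np

def common_valid_mask(products):
    """True at time steps where every product has a finite (unflagged) retrieval.

    `products` maps a product name to a 1-D array on a shared time axis
    (ASSUMED to be pre-aligned), with NaN for missing or flagged values.
    """
    stacked = np.vstack([np.asarray(v, dtype=float) for v in products.values()])
    return np.isfinite(stacked).all(axis=0)
```

Applying the resulting mask to each product and to the reference before computing the metrics ensures that all products are evaluated over identical conditions, so the remaining differences reflect the retrievals rather than the screening.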
For the 2011-2020 period, the relative performances of L2, L3, and IC were very similar [33]. This indicates overall stability in the performance of the products despite some year-to-year changes shown by the time-series plots in Section IV of the Supplemental Material.
Based on [20], the uncertainties (not to be confused with the confidence intervals) for the bias-insensitive metrics are very low, but for the bias-sensitive metrics, some nonnegligible uncertainties remain depending on the site. The difference in average MD between IC and the L2, L3, and NRT products was 0.017-0.024 $\text{m}^{3}/\text{m}^{3}$, and the difference between SMAP and the L2, L3, and NRT products was 0.030-0.037 $\text{m}^{3}/\text{m}^{3}$ (Fig. 3). These differences are substantial even when considering the uncertainties and therefore likely reflect real differences between the products.
The similarity of the NRT performance with the L2 performance (Fig. 3) is a consequence of the fact that the NRT product neural network was trained with the L2 product. The approach used by the L2 and L3 algorithms to handle landcover heterogeneities within pixels may also contribute to the better ubRMSE performance of IC, as speculated in [27].
All metrics are combinations of both instrument and algorithm performance. The SMOS IC product achieved essentially the same SM sensitivity (ubRMSE, R, and anomaly R) as the SMAP product, which is evidence that the SMOS and SMAP radiometers can achieve similar SM sensitivity. The result shows that careful use of the SMOS TB angular data can compensate for the inherently better TB snapshot sensitivity and advanced RFI filtering of SMAP [35] for SM retrievals. This result is particularly important because of the low uncertainty of the CVS for determining bias-insensitive metrics. The difference in MD between the SMAP and SMOS products is therefore more likely attributable to the algorithm parameterization choices than to instrument performance.

V. CONCLUSION
The SMOS L2, L3, IC, and NRT products were compared over the CVS, allowing the estimation of bias-sensitive and bias-insensitive metrics. The results showed that the IC product had the best sensitivity (ubRMSE, R, and anomaly R) but the worst MD; consequently, the RMSD of all SMOS products was very similar. Five-year comparisons gave the same result as ten-year comparisons for the L2, L3, and IC products. The performance of the SMOS products became more similar when only those points for which every product had valid data were used, emphasizing the significance of filtering and flagging the data in the retrieval process. Moreover, the IC sensitivity was very close to that of the SMAP radiometer-based product, which indicates that the SMOS and SMAP radiometers can achieve similar sensitivity to SM. The discrepancy in the MD performance is likely a consequence of the algorithm parameterization approach rather than of performance differences between the instruments.