A novel global grid model for soil moisture retrieval considering geographical disparity in spaceborne GNSS-R

Spaceborne global navigation satellite system-reflectometry (GNSS-R) has become an effective technique for Soil Moisture (SM) retrieval. However, the complexity of the land surface limits the accuracy of global SM retrieval with a single model, and introducing redundant ancillary data may lead to over-reliance on those data. Therefore, we propose an SM retrieval method that considers geographical disparities, using Cyclone GNSS (CYGNSS) observations and the Soil Moisture Active and Passive (SMAP) product. Based on the CYGNSS effective reflectivity and SMAP ancillary datasets, we establish five models with different parameters for each grid cell to achieve global SM retrieval. The optimal model in each grid cell, determined by a performance indicator, is then used for SM retrieval. With SMAP SM as the reference, the improved method yields a root mean square error S_RMSE of 0.040 cm³/cm³, a 9.1% decrease compared with the single reflectivity-temperature-vegetation (R-T-V) method. With in-situ SM from the International Soil Moisture Network (ISMN) as the reference, the overall correlation coefficient R and S_RMSE of the improved method are 0.80 and 0.064 cm³/cm³, respectively. Across the chosen sites, the average R increases by 22.7% and the average S_RMSE decreases by 8.7%. These results indicate that the improved method retrieves SM better at both global and local scales without redundant auxiliary data.


Introduction
Soil Moisture (SM) is a crucial parameter in the water cycle because it links the atmosphere with the land. Accurate SM estimation is essential for advancing research on water cycle dynamics and crop growth (Holzman & Rivas, 2016; Schlüter et al., 2022). Traditional methods for estimating large-scale or global SM rely primarily on microwave remote sensing. However, retrieving SM at high spatial and temporal resolution remains a significant challenge for active and passive microwave remote sensing techniques (Kuenzer et al., 2013). Recently, Global Navigation Satellite System-Reflectometry (GNSS-R), in which GNSS signals reflected from the Earth's surface are used in a forward-scattering bistatic radar configuration, has emerged as an effective remote sensing technique for estimating geophysical parameters of the Earth's surface (Rodriguez-Alvarez et al., 2011, 2019; Wigneron et al., 2008). Because GNSS-R operates with L-band signals, it can effectively penetrate the atmosphere, vegetation, and rain (Alonso-Arroyo et al., 2016; Balasubramaniam & Ruf, 2020; Camps et al., 2020). Moreover, with more than one hundred GNSS satellites in orbit, GNSS-R benefits from many signal sources (Bu et al., 2020; Kim & Park, 2021), which enables accurate SM estimation with high spatial and temporal resolution.
Remote sensing of physical parameters of the Earth's surface using GNSS-reflected signals can be traced back to the late 1980s (Jin et al., 2024; Pan et al., 2020). Martin-Neira (1993) was the first to use GNSS-reflected signals to retrieve sea surface heights. Subsequently, GNSS-R demonstrated its capacity for surface geophysical parameter retrieval on ground-based and airborne platforms (Yu et al., 2014), including soil moisture (Larson et al., 2008; Wu et al., 2021; Yan et al., 2022), sea level (Liu et al., 2022; Rajabi et al., 2021; Wang et al., 2019), water level (Ichikawa et al., 2019; Wang et al., 2021), sea surface wind speed (Dong & Jin, 2019; Li et al., 2021), and snow depth (Jin et al., 2016; Yu et al., 2015; Zhou et al., 2019). Jin et al. (2024) summarized the progress of GNSS-R technology in various applications and discussed the current challenges and development prospects in multiple fields. GNSS-R platforms have since expanded from ground-based and airborne to spaceborne (Zavorotny et al., 2014). Because it can rapidly obtain surface physical information over large areas, spaceborne GNSS-R has broad application prospects, such as forest biomass retrieval (Carreno-Luengo et al., 2020; Chen et al., 2021) and flood monitoring (Chew & Small, 2020; Zhang et al., 2021).
Spaceborne GNSS-R has a great capacity for SM retrieval (Al-Khaldi and Johnson, 2021a; Nan et al., 2022). Chew et al. (2016) found a high correlation between SM and the observable derived from TechDemoSat-1. Camps et al. (2018a) also revealed a high correlation between SM and the peak power of the Delay Doppler Map (DDM). Building on these works, Chew and Small (2018) established a linear relationship between Cyclone GNSS (CYGNSS) effective reflectivity and the SM derived from Soil Moisture Active and Passive (SMAP). However, because of the complex surface environment, the CYGNSS effective reflectivity is influenced not only by SM but also by vegetation, surface roughness, soil surface temperature, and other factors (Dong et al., 2023; Izadgoshasb et al., 2024; Pierdicca et al., 2014). Clarizia et al. (2019) proposed a Reflectivity-Vegetation-Roughness (R-V-R) ternary linear regression algorithm that accounts for the influences of vegetation and roughness on the CYGNSS effective reflectivity. Eroglu et al. (2019) developed an SM retrieval method that uses an artificial neural network to learn this complex relationship. The soil surface temperature also influences the effective reflectivity (Wigneron et al., 2008); thus, Zhu et al. (2022) proposed a Reflectivity-Temperature-Vegetation (R-T-V) method that estimates SM by accounting for the impact of surface temperature on the CYGNSS effective reflectivity. Yan et al. (2020) proposed CYGNSS observables that can resolve the contributions of SM and surface roughness. Camps et al. (2018b) found that the relationship between the spaceborne GNSS-R effective reflectivity and other factors is affected by geographical disparities. Moreover, Jia et al. (2024) proposed an advanced SM retrieval method based on Geographically Weighted Regression (GWR), which incorporates various spatial weights; it preserves local spatial relationships and patterns while providing fine-resolution SM estimates. Therefore, geographical disparities need to be considered in global SM retrieval. In addition, introducing redundant parameters may result in over-reliance on heavily loaded ancillary data (Yan et al., 2020; Yang et al., 2024).
Building on these studies and their limitations, we developed an SM retrieval method for spaceborne GNSS-R that considers geographical disparities. This work aims to mitigate the impact of geographical disparities on SM estimates and to avoid redundant ancillary data. Here, the CYGNSS effective reflectivity is treated as being affected by SM, surface roughness, vegetation, and soil surface temperature. In addition, the relationship between the auxiliary parameters included in the models and geographical disparities is investigated. This work can help separate the effects of SM, vegetation, and other factors on the CYGNSS effective reflectivity. Compared with previous studies, this paper proposes a simple and effective method for SM retrieval. The remainder of this paper is organized as follows. The collocation of CYGNSS, SMAP, and International Soil Moisture Network (ISMN) data is described in Section "Datasets"; the development of the improved method is presented in Section "Methodology"; the results are presented in Section "Results and discussion"; and concluding remarks are given in Section "Conclusion".

Datasets
The data used in this study are derived from CYGNSS, SMAP, and ISMN. CYGNSS and SMAP observations from January 2019 to December 2019 are collocated on the 36 km × 36 km Equal-Area Scalable Earth (EASE) 2.0 grid. Note that the SMAP product provides no data from Day of Year (DOY) 171 to DOY 203. The SM data from SMAP and ISMN are used for validation.
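As a minimal sketch of the collocation step, the snippet below averages CYGNSS values that fall in the same grid cell on the same day and pairs them with the SMAP SM of that cell and day. It assumes that each observation has already been assigned a 36 km EASE-Grid 2.0 row/column index (the projection itself is not implemented here); the tuple layout `(doy, row, col, value)` and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def grid_daily_mean(observations):
    """Average all values that share (day_of_year, row, col).

    observations: iterable of (doy, row, col, value) tuples, where row/col are
    36 km EASE-Grid 2.0 indices assumed to be computed beforehand.
    """
    bins = defaultdict(list)
    for doy, row, col, value in observations:
        bins[(doy, row, col)].append(value)
    return {key: mean(values) for key, values in bins.items()}


def collocate(cygnss_obs, smap_sm):
    """Pair daily gridded CYGNSS values with SMAP SM for the same day and cell.

    smap_sm: dict keyed by (doy, row, col). Days or cells missing from SMAP
    (for example the DOY 171-203 gap in 2019) simply yield no pair.
    """
    gridded = grid_daily_mean(cygnss_obs)
    return {
        key: (value, smap_sm[key])
        for key, value in gridded.items()
        if key in smap_sm
    }
```

Keying by (day, cell) keeps the matching exact and makes SMAP data gaps drop out naturally instead of requiring special handling.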

CYGNSS
CYGNSS, a constellation of eight microsatellites launched by the National Aeronautics and Space Administration (NASA) in 2016, is designed to observe the Earth's surface within a latitude range of ±38° (Wu et al., 2020; Zavorotny & Voronovich, 2000). Each CYGNSS satellite simultaneously records four reflected GNSS signals, with mean and median revisit times of 7 and 3 h, respectively (Jia et al., 2021). In this study, the observations taken from the CYGNSS Level 1 (L1) version 3.0 data are the latitude and longitude of the specular reflection point, the distances from the specular point to the CYGNSS spacecraft and to the Global Positioning System (GPS) satellite, the effective isotropic radiated power (EIRP) of the GPS transmitter, the antenna gain, the DDM, and the incidence angle.

SMAP
SMAP, an Earth observation satellite launched by NASA in 2015, was originally designed to provide global SM and freeze-thaw state using an L-band radar together with an L-band radiometer (Chew & Small, 2018). Although the radar failed in July 2015, the microwave radiometer remains operational and provides important data for SM research and applications. By evaluating SMAP products, Colliander et al. (2019a, 2019b) demonstrated that the L-band radiometer data achieve the accuracy expected from the mission design. In this study, the SM dataset from the SMAP L3 Radiometer Global Daily 36 km EASE-Grid product (version 8) is used as the reference. This SM dataset (P_SM) consists of descending-orbit data. The other auxiliary datasets used in this study, shown in Fig. 1, are Surface Roughness (SR, P_SR), Vegetation Optical Depth (VOD, P_VOD), Soil surface Temperature (ST, P_ST), and Vegetation Water Content (VWC, P_VWC).

ISMN
The ISMN was established in 2009 to maintain a global in-situ SM database. It is a centralized data-hosting facility that supports the calibration and validation of global satellite products (Dorigo et al., 2011). With numerous operational and experimental SM networks worldwide, this database is a valuable resource for validating SM retrievals. In this study, the ISMN in-situ SM data are aggregated to daily values for field validation. Only measurements at a depth of 5 cm are used because of the limited penetration depth of L-band signals.
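The daily aggregation at 5 cm depth can be sketched as below. The flattened record layout `(timestamp, depth_m, sm)` and the function name are hypothetical; actual ISMN files are organized per station and sensor and carry their own quality flags, which this sketch ignores.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean


def ismn_daily_sm(records, depth_m=0.05):
    """Daily mean in-situ SM at a single sensor depth.

    records: iterable of (timestamp, depth_m, sm) tuples (hypothetical layout).
    Readings at other depths and NaN values are skipped.
    """
    daily = defaultdict(list)
    for timestamp, depth, sm in records:
        if math.isclose(depth, depth_m, abs_tol=1e-6) and not math.isnan(sm):
            daily[timestamp.date()].append(sm)
    return {day: mean(values) for day, values in sorted(daily.items())}


records = [
    (datetime(2019, 5, 1, 0), 0.05, 0.20),
    (datetime(2019, 5, 1, 12), 0.05, 0.24),
    (datetime(2019, 5, 1, 6), 0.10, 0.50),          # 10 cm sensor, ignored
    (datetime(2019, 5, 2, 3), 0.05, float("nan")),  # missing value, ignored
]
daily = ismn_daily_sm(records)
```

Here only 1 May 2019 survives, with the mean of the two 5 cm readings.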

Methodology
In this section, the improved SM retrieval method for CYGNSS is described. The method consists of five linear models, similar to the R-V-R method proposed by Clarizia et al. (2019).

CYGNSS effective reflectivity
Previous studies found a strong relationship between CYGNSS effective reflectivity and SM (Senyurek et al., 2020a, 2020b). Loria et al. (2023) demonstrated that land-surface DDMs exhibit scattering behaviors ranging from purely coherent reflection to purely incoherent scattering, as well as combinations of both. In regions with dense vegetation or large topographic variation and roughness, the coherent component of the reflected signal is weaker than the incoherent component (Jin et al., 2024; Ruf et al., 2018; Zavorotny et al., 2014). Al-Khaldi et al. (2019) found that CYGNSS land observations are primarily dominated by the coherent component, with the incoherent component having minimal impact on SM retrieval. Following previous literature, this study also assumes that coherent reflection dominates across the land surface. Thus, the CYGNSS effective reflectivity (Γ_coh^CYGNSS) can be calculated using the following formula (Al-Khaldi et al., 2021b):

$$\Gamma_{coh}^{CYGNSS} = \frac{(4\pi)^2 P_r \left(R_{ts} + R_{rs}\right)^2}{\lambda^2 P_t G_t G_r} \qquad (1)$$

where Γ_coh^CYGNSS is the CYGNSS effective reflectivity; P_t is the transmitted power of the GPS satellite; P_r is the peak value of the scattered-power DDM; G_t and G_r are the gains of the transmitting and receiving antennas, respectively; P_t G_t can be expressed by the Effective Isotropic Radiated Power (EIRP) of the GPS transmitter toward the specular reflection point; R_ts and R_rs are the distances from the GPS transmitter and from the CYGNSS receiver to the specular reflection point, respectively; and λ is the wavelength of the GPS L1 signal.

Fig. 1 The spatial maps of SMAP yearly mean P_SR, P_VOD, P_ST, and P_VWC, respectively
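Eq. (1) can be evaluated directly once the CYGNSS L1 quantities are expressed in linear units. The sketch below is a minimal illustration under that assumption: the function and argument names are hypothetical, gains given in dBi must first be converted with `db_to_linear`, and the GPS L1 wavelength is computed from the L1 carrier frequency (1575.42 MHz).

```python
import math

SPEED_OF_LIGHT = 299792458.0      # m/s
GPS_L1_FREQUENCY = 1575.42e6      # Hz
GPS_L1_WAVELENGTH = SPEED_OF_LIGHT / GPS_L1_FREQUENCY  # about 0.1903 m


def db_to_linear(value_db):
    """Convert a gain or ratio from dB to linear units."""
    return 10.0 ** (value_db / 10.0)


def to_db(value_linear):
    """Convert a linear ratio to dB."""
    return 10.0 * math.log10(value_linear)


def effective_reflectivity(p_r, eirp, g_r, r_ts, r_rs, wavelength=GPS_L1_WAVELENGTH):
    """Coherent effective reflectivity from Eq. (1), in linear units.

    p_r        : peak scattered power of the DDM (W)
    eirp       : P_t * G_t of the GPS transmitter (W)
    g_r        : receive antenna gain (linear)
    r_ts, r_rs : transmitter-to-specular and receiver-to-specular ranges (m)
    """
    return ((4.0 * math.pi) ** 2 * p_r * (r_ts + r_rs) ** 2
            / (wavelength ** 2 * eirp * g_r))
```

Using the EIRP for the product P_t G_t means the transmit power and transmit gain never need to be known separately; the result is often reported in dB via `to_db`.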

Data quality control
Data quality control in this study is performed as follows. CYGNSS observations with incidence angles exceeding 65° are excluded to reduce DDM noise. Observations with a Signal-to-Noise Ratio (SNR) below 2 dB, as well as those with an SNR equal to or greater than the receiver antenna gain plus 14 dB, are also excluded. Sampling points with poor accuracy are further removed according to the quality flags provided with the data. SMAP samples with SM values below 0.01 cm³/cm³ are removed to reduce errors associated with very low SM values, and samples with VWC values above 18 kg/m² are removed to limit the effect of dense vegetation (Camps et al., 2020).
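The screening rules above translate into simple predicates. In this sketch, `quality_ok` is a hypothetical boolean standing in for the CYGNSS L1 quality flags, whose bit definitions are not reproduced here; the class and function names are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class CygnssObs:
    incidence_deg: float
    snr_db: float
    rx_gain_dbi: float
    quality_ok: bool  # hypothetical stand-in for the L1 quality flags


def keep_cygnss(obs: CygnssObs) -> bool:
    """Apply the CYGNSS screening rules described in the text."""
    return (
        obs.quality_ok
        and obs.incidence_deg <= 65.0               # drop incidence > 65 deg
        and obs.snr_db >= 2.0                       # drop SNR < 2 dB
        and obs.snr_db < obs.rx_gain_dbi + 14.0     # drop SNR >= gain + 14 dB
    )


def keep_smap(sm_cm3_cm3: float, vwc_kg_m2: float) -> bool:
    """Drop SMAP samples with SM < 0.01 cm3/cm3 or VWC > 18 kg/m2."""
    return sm_cm3_cm3 >= 0.01 and vwc_kg_m2 <= 18.0
```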

Development of the improved method
Because of the complex surface environment, the influences of vegetation, water, SM, and other factors on spaceborne GNSS-R SM retrieval are difficult to define precisely (Camps et al., 2018b). Therefore, analyzing geographical disparities across regions is necessary for improving SM retrieval. Because the SMAP product includes the land surface types of the International Geosphere-Biosphere Programme (IGBP), each grid cell can be labeled with a land cover category to account for geographical disparities. Figure 2 shows the global distribution of the 16 land cover categories, from which the global geographical disparity is evident.
Here, the auxiliary parameters (P_SR, P_VOD, P_ST, and P_VWC) are used to compensate the CYGNSS effective reflectivity in spaceborne GNSS-R SM retrieval. Note that P_VOD corresponds to the 'tau' parameter normalized by the cosine of the incidence angle in the 'tau-omega' model, with the incidence angle fixed at 40°. To account for vegetation effects at different incidence angles, we recalculate P_VOD using the incidence angle at the specular reflection point. In addition, P_VWC, which does not depend on the incidence angle, is introduced to compensate the reflectivity. A significance test shows that P_VOD and P_VWC differ significantly at the 5% level.
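One possible reading of the P_VOD recalculation is sketched below, assuming that the SMAP value equals τ/cos θ evaluated at θ = 40°: the nadir τ is recovered and divided by the cosine of the specular-point incidence angle θ_sp. This is an illustrative assumption, not a formula confirmed by the SMAP documentation or by the authors, and the function name is hypothetical.

```python
import math

SMAP_INCIDENCE_DEG = 40.0


def vod_at_incidence(p_vod_smap, theta_sp_deg, theta_ref_deg=SMAP_INCIDENCE_DEG):
    """Re-express a slant VOD given at theta_ref as the slant VOD at theta_sp.

    Assumes p_vod_smap = tau / cos(theta_ref): recover the nadir tau, then
    divide by cos(theta_sp). Illustrative only.
    """
    tau = p_vod_smap * math.cos(math.radians(theta_ref_deg))
    return tau / math.cos(math.radians(theta_sp_deg))
```

Under this reading, larger incidence angles lengthen the path through the canopy and therefore increase the effective optical depth.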
The SM retrieval method is illustrated in Fig. 3. As mentioned previously, introducing redundant auxiliary parameters may result in over-reliance problems. Yan et al. (2024) performed a variable importance analysis by sequentially excluding input data and measuring the resulting decrease in the accuracy of each model's retrievals. Removing input variables can therefore serve both as a sensitivity analysis and as a way to address the coupling problem. A similar approach is used here: the auxiliary parameters are paired and combined with the quality-controlled CYGNSS effective reflectivity, so that each model contains two auxiliary parameters. Five ternary linear models are thus established, as shown in Table 1. In addition to the two existing models (R-S-V and R-T-V), three models are established (R-S-T, R-S-W, and R-T-W). To obtain more stable models, the data are randomly divided into a training set and a validation set.
The training set comprises 70% of the data, and the validation set accounts for the remaining 30%. The regression coefficients of each linear model are calculated during model training. The five ternary linear models are fitted simultaneously within each grid cell, each with its own regression coefficients. To maintain the generalization ability of the models, the training and validation sets are the same for every model. Furthermore, only results whose accuracies on the training set and on the validation set are close are retained.
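The following sketch shows one way to build the five pairwise models and fit them by ordinary least squares in a single grid cell with a shared 70/30 split. It assumes the model form F_i^SM = a_i + b_i R + c_i P_x + d_i P_y; the actual assignment of a_i to d_i in Table 1, the random seed, and the letter codes (S, T, V, W for P_SR, P_ST, P_VOD, P_VWC) are illustrative assumptions.

```python
import itertools

import numpy as np

# Letter codes: S = P_SR, T = P_ST, V = P_VOD, W = P_VWC
AUX_CODES = ("S", "T", "V", "W")
# Every pair except the two vegetation descriptors together -> five models
MODELS = [p for p in itertools.combinations(AUX_CODES, 2) if set(p) != {"V", "W"}]


def split_indices(n, train_frac=0.7, seed=0):
    """Random 70/30 split of sample indices (the seed is an arbitrary choice)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]


def fit_grid(refl, aux, sm, seed=0):
    """Fit SM = a + b*R + c*P_x + d*P_y for all five models in one grid cell.

    refl, sm : 1-D arrays of reflectivity and reference SM
    aux      : dict mapping "S", "T", "V", "W" to 1-D arrays
    Returns ({model_name: (coefficients, validation_predictions)}, valid_idx);
    all models share the same train/validation split.
    """
    train, valid = split_indices(len(sm), seed=seed)
    results = {}
    for x, y in MODELS:
        design = np.column_stack([np.ones_like(refl), refl, aux[x], aux[y]])
        coef, *_ = np.linalg.lstsq(design[train], sm[train], rcond=None)
        results[f"R-{x}-{y}"] = (coef, design[valid] @ coef)
    return results, valid
```

Excluding the V-W pair reproduces exactly the five model names used in the text (R-S-T, R-S-V, R-S-W, R-T-V, R-T-W), and sharing one split keeps the validation scores of the five models comparable.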
The root-mean-square error I_RMSE, correlation coefficient I_R, and coefficient of determination I_D of the grid fitting are used to assess the performance of the proposed models. These indexes are computed on the validation set for each linear model.

Table 1 (columns: Model number, Model name, Equation expression; e.g., Model 1: R-T-V)

Because I_RMSE, I_R, and I_D do not necessarily agree on which model performs best, it is difficult to identify the optimal model in each grid cell from any single index. Therefore, we propose a performance indicator I, given in Eq. (2). The model with the minimum I is selected as the optimal model for the grid cell.
Meanwhile, the auxiliary parameters used by the optimal model are considered the optimal parameters for the grid.
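Because the definition of the performance indicator in Eq. (2) is not reproduced in this text, the sketch below only computes I_RMSE, I_R, and I_D on the validation set and selects the model with the minimum value of a caller-supplied indicator. `placeholder_indicator` is purely hypothetical and demonstrates the selection mechanics only; taking I_D as 1 - SS_res/SS_tot is likewise an assumption.

```python
import numpy as np


def validation_metrics(pred, ref):
    """Return (I_RMSE, I_R, I_D) on the validation set.

    I_D is taken as 1 - SS_res / SS_tot (an assumption; the formula is not given).
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    rmse = float(np.sqrt(np.mean((pred - ref) ** 2)))
    r = float(np.corrcoef(pred, ref)[0, 1])
    d = float(1.0 - np.sum((ref - pred) ** 2) / np.sum((ref - ref.mean()) ** 2))
    return rmse, r, d


def select_optimal(predictions, ref, indicator):
    """Pick the model whose indicator value is smallest.

    predictions : {model_name: validation predictions}
    indicator   : callable (rmse, r, d) -> float, standing in for Eq. (2)
    """
    scores = {name: indicator(*validation_metrics(p, ref)) for name, p in predictions.items()}
    return min(scores, key=scores.get), scores


def placeholder_indicator(rmse, r, d):
    # HYPOTHETICAL stand-in for Eq. (2), for illustration only:
    # lower RMSE and higher R and D give a smaller value.
    return rmse * (2.0 - r - d)
```

Passing the indicator as a function keeps the selection step independent of the exact form of Eq. (2).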

Results and discussion
In this section, the relationship between the optimal model selected in each grid cell and the characteristic regions is investigated. The SM retrieval performance of the proposed method is then evaluated using SM from SMAP and ISMN.

Analysis of geographical disparities using the optimal model in a grid
Here, the optimal models, as well as the relationships between the CYGNSS effective reflectivity and the auxiliary parameters in different land cover categories and characteristic regions, are presented. The average correlation coefficient is used to assess the sensitivity of the CYGNSS effective reflectivity to each influencing factor. Because the reflectivity can be positively or negatively correlated with these factors, the absolute value of the correlation coefficient in each grid cell is used. Figure 4 illustrates these sensitivities in different land cover categories. P_ST, P_VOD, and P_VWC exhibit higher correlations with the reflectivity than P_SR. As previously mentioned, the sensitivities of P_VWC and P_VOD to the CYGNSS effective reflectivity differ.
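The per-category sensitivity analysis can be sketched as follows: compute the Pearson correlation between reflectivity and one influencing factor in each grid cell, take its absolute value, and average over the grid cells of each land cover category. The input dictionaries and the function name are hypothetical.

```python
from collections import defaultdict

import numpy as np


def mean_abs_corr_by_class(grid_series, grid_class):
    """Average |r| between reflectivity and one factor per land cover category.

    grid_series : {grid_id: (reflectivity_array, factor_array)}
    grid_class  : {grid_id: land cover category}
    """
    per_class = defaultdict(list)
    for gid, (refl, factor) in grid_series.items():
        r = np.corrcoef(refl, factor)[0, 1]
        if np.isfinite(r):  # skip cells where a series is constant
            per_class[grid_class[gid]].append(abs(r))
    return {cls: float(np.mean(vals)) for cls, vals in per_class.items()}
```

Taking |r| before averaging prevents positive and negative correlations in different grid cells from cancelling each other.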
Because of the large number of grid cells worldwide, the grid cells in characteristic regions (i.e., the Southeast China hills, Sahara Desert, Great Artesian Basin, Himalayas, Congo Basin, and Deccan Plateau) are additionally analyzed. As shown in Table 2, the average correlation coefficients of P_ST, P_VOD, and P_VWC are higher than that of P_SR across all land cover categories. A higher correlation between P_VWC and the CYGNSS effective reflectivity is observed in the Himalayas and the Deccan Plateau, where the averaged correlation coefficient of P_VWC is also higher than in the Sahara Desert. The sensitivities of P_VWC and P_VOD also differ: in the Sahara Desert, the average correlation coefficient of P_VOD reaches 0.101, whereas that of P_VWC is only 0.018. The distributions of the models in these regions are shown in Figs. 5, 6, and 7. Although model 1 (R-T-V) is widely distributed in the Sahara Desert, other models are also selected as optimal in certain grid cells. From these findings and Table 2, one can conclude that a high correlation does not guarantee that an auxiliary parameter accurately compensates the CYGNSS effective reflectivity. For instance, P_VWC exhibits a higher correlation with the CYGNSS effective reflectivity than P_VOD, yet model 1 (R-T-V) is still selected as the optimal model in the grid cells of the Deccan Plateau. These results demonstrate that the correlation between an auxiliary parameter and the CYGNSS effective reflectivity alone is insufficient to determine the most suitable auxiliary parameter for SM retrieval.
Figure 8 shows that model 4 (R-T-W) is the most frequently selected model globally. Model 1 (R-T-V) is selected predominantly in arid regions, such as the Arabian Peninsula and the Sahara Desert, which are characterized by small surface fluctuations and sparse vegetation. According to Table 3, the number of grid cells whose optimal model includes P_SR is one order of magnitude lower than the number whose optimal model does not. This result reflects the limitation of using the static variable P_SR to compensate the CYGNSS effective reflectivity for global SM retrieval in most regions. Therefore, the optimal model and compensation parameters in each grid cell can be determined by a comprehensive comparison of multiple linear models, which also avoids introducing heavily loaded auxiliary parameters.

Global SM retrieval results
The global distributions of the correlation coefficient R and the Root Mean Square Error (RMSE) S_RMSE for the improved method are shown in Figs. 9 and 10, respectively. Notable differences are observed among regions. R values exceed 0.6 in most land regions and exceed 0.8 in regions with small surface fluctuations and sparse vegetation. Furthermore, S_RMSE is generally below 0.06 cm³/cm³, with lower values in most regions, such as Africa. Note that although R values over the Indian Peninsula exceed 0.8, its S_RMSE is higher than in some regions with lower R values. Similarly, the SM retrieval performance in central Australia is better than in the eastern regions, which are surrounded by water and vegetation and exhibit lower accuracy.
From Fig. 11, the retrieved points are mostly distributed along the diagonal line, with R = 0.923 and S_RMSE = 0.040 cm³/cm³. The fit is better where SM values are below 0.15 cm³/cm³. The results indicate that CYGNSS tends to underestimate SMAP SM, especially in regions with high SM values.
The R-T-V model, which yields the best SM retrieval among the five linear models (see Table 4), is used as the reference. Compared with it, the improved method decreases S_RMSE and the Mean Absolute Error (MAE) S_MAE by 9.1% and 7.1%, respectively, and increases R and the coefficient of determination R² by 1.6% and 3.2%, respectively. As shown in Fig. 12, the improvement varies widely across regions. Comparing Figs. 8 and 12, the improvements are insignificant in some regions, such as northern Africa and the Arabian Peninsula. However, significant improvements are observed in some regions where the R-T-W model is selected, such as the Niger River Basin, where S_RMSE is decreased by 30%. Except for the grid cells in arid regions, the R-T-W model is optimal for most grid cells. These results demonstrate that, in these regions, compensating with P_ST and P_VWC is more effective than using the other auxiliary parameters.
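The accuracy statistics and relative changes quoted above can be computed as in the sketch below. R² is taken as the square of R, an assumption that is consistent with the reported figures (1.016² ≈ 1.032, matching the 1.6% and 3.2% increases); the function names are hypothetical.

```python
import numpy as np


def retrieval_metrics(est, ref):
    """RMSE, MAE, R, and R^2 of retrieved SM against a reference."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    err = est - ref
    r = float(np.corrcoef(est, ref)[0, 1])
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "R": r,
        "R2": r ** 2,  # assumption: R^2 taken as the squared correlation
    }


def relative_change_pct(baseline, improved):
    """Signed percentage change from the baseline (e.g. R-T-V) to the improved value."""
    return 100.0 * (improved - baseline) / baseline
```

For example, `relative_change_pct(0.044, 0.040)` gives about -9.09%, in line with the reported 9.1% S_RMSE decrease for the rounded global values; a negative sign denotes a decrease.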

Comparison between the retrievals in different land cover categories
To analyze SM retrieval performance in different land cover categories, the average values of R and S_RMSE are shown in Fig. 13. Note that there are no valid data in land cover categories 3 and 15. From Fig. 13, R and S_RMSE differ among land cover categories. Moreover, the improved method performs better than the R-T-V model, with the lowest S_RMSE of 0.024 cm³/cm³ observed in the barren or sparsely vegetated category. However, introducing vegetation parameters into the models yields only a small improvement in SM retrieval performance for land cover categories 1, 2, and 6, indicating that the R-T-V model already performs comparatively well in densely vegetated regions. These results demonstrate that the proposed method not only maintains good SM retrieval performance in regions with small surface fluctuations or sparse vegetation, but also enhances retrieval performance in regions with large surface fluctuations or dense vegetation. One can conclude that the proposed method reduces the errors caused by geographical disparities.
The main land cover categories provided by SMAP at the ISMN sites listed in Table 5 are 7, 8, 10, 12, 13, 14, and 16. Some land cover categories are not analyzed because ISMN stations are sparsely distributed globally and some data were removed.
From Table 5, across all sites the improved method performs better than the R-T-V model, with R increased by 21.0%, S_RMSE decreased by 6.9%, and the unbiased Root Mean Square Error (ubRMSE) S_ubRMSE decreased by 11.1%. Some sites, such as Eulo, Kemole_Gulch, and Bodega_6_WSW in Table 5, show identical accuracy metrics for both models because the optimal model at these sites is model 1 (R-T-V). These sites are therefore not included in the site-by-site comparisons.
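ubRMSE removes the mean bias before computing the RMSE, so that ubRMSE² = RMSE² - bias². A minimal sketch (the function name is hypothetical):

```python
import numpy as np


def ubrmse(est, ref):
    """Unbiased RMSE: RMSE of the anomalies after removing each series' mean."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    anomalies = (est - est.mean()) - (ref - ref.mean())
    return float(np.sqrt(np.mean(anomalies ** 2)))
```

A retrieval with a constant offset from the in-situ data therefore has a nonzero RMSE but a zero ubRMSE, which is why the two metrics can improve by different amounts.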
The data for each site are grouped by the land cover classification provided by SMAP. As presented in Fig. 14f, the in-situ data and the global SM retrieval results show good consistency, with R = 0.80 and S_RMSE = 0.064 cm³/cm³. In the scatter plots of each land cover category, the points of the improved method cluster more closely around the 1:1 line. The improvements at each site are illustrated in Figs. 15 and 16. Compared with the R-T-V model, the improved method performs better, with R increased by 2.9% to 92.0%, S_RMSE decreased by 1.0% to 25.0%, and S_ubRMSE decreased by 1.1% to 25.0%. Figure 17 shows that the improvements differ among vegetated regions. The average R is increased by 22.7%, and the average S_RMSE is decreased by 8.7%. Moreover, the R values of some sites, such as Yankee_Reservoir, Asheville_8_SSW, and Asheville_13_S, show larger increases. Examining the CYGNSS observation areas and their surroundings shows that the sites with low correlation are located close to water bodies or vegetation. The Pawhuska, Lovell_Summit, and Asheville_8_SSW sites are in densely vegetated areas, whereas the Kemole_Gulch and Batesville_8_WNW sites are on an island and in cropland, respectively. Notably, because the CYGNSS observation area of each site corresponds to a 36 km × 36 km EASE grid cell, it includes vegetation, water bodies, and other environmental features around the site. Therefore, the CYGNSS effective reflectivity at these sites is influenced by the surrounding environment. Sudden precipitation or paddy irrigation near a site can also result in low correlation. The S_RMSE at five sites (Watkinsville_#1, WTARS, Asheville_8_SSW, Asheville_13_S, and Batesville_8_WNW) is decreased by more than 8.7%, with the largest decrease, 25%, at the Watkinsville_#1 site. Compared with the R-T-V method for global SM retrieval, the improved method better retrieves SM in regions with complex surface conditions.

Discussion
In this paper, a gridded SM retrieval method that considers geographical disparities is proposed. The method compensates the attenuation of the CYGNSS effective reflectivity using auxiliary parameters provided by SMAP. However, its uncertainty is related to several factors. The first is the uncertainty and internal error of the auxiliary data. Because SM estimation that relies entirely on CYGNSS data has not yet been achieved, the use of auxiliary data can only be minimized, not eliminated, while maintaining accuracy; introducing more auxiliary data may decrease the robustness and stability of the model. Furthermore, the comparison of the combined models indicates that data of the same type (e.g., the vegetation descriptors P_VOD and P_VWC) have different compensation effects on the reflectivity.
The second is the influence of seasonal variation in SM, vegetation, and other factors.Seasonal variations can impact the surface reflectivity derived from

Table 5 Performance of the improved method and the R-T-V model at 19 ISMN sites. I-M denotes the improved method; each column under R-T-V and I-M gives the corresponding accuracy metric at the selected sites
also have an impact on SM retrieval (Colliander et al., 2019a, 2019b; Jin et al., 2024).
In addition, bias in SM retrieval can originate from differences in spatial scale among the data sources. The mismatch between the measurement depth of in-situ SM sensors and the penetration depth of the microwaves can also lead to biases.

Conclusion
To address the limitations arising from land surface complexity and over-reliance on ancillary data, a method for global SM retrieval that considers geographical disparities is developed. CYGNSS data and the auxiliary parameters P_SR, P_VOD, P_ST, and P_VWC provided by SMAP are used to develop the improved method, with SMAP and ISMN SM used as references. In addition, the sensitivities of the CYGNSS effective reflectivity to the introduced auxiliary parameters in different land cover categories and characteristic regions are presented.
The improved method consists of five linear models that account for the influence of geographical disparity on the CYGNSS effective reflectivity while avoiding redundant auxiliary data. The optimal model in each grid cell is determined from the performance indicator and then used for SM retrieval. The results show that the improved method performs well at both global and local scales. For global SM retrieval, the improved method outperforms the R-T-V method, with R increasing from 0.908 to 0.923 and S_RMSE decreasing from 0.044 to 0.040 cm³/cm³. The improved method also outperforms the R-T-V method in local regions, with the lowest S_RMSE of 0.024 cm³/cm³ in the barren or sparsely vegetated category. Furthermore, the results for different land cover categories reveal that performance is maintained in areas with small surface fluctuations and sparse vegetation and improved in areas with large surface fluctuations and dense vegetation; the largest reduction in S_RMSE, 30%, occurs in the Niger River Basin. In the ISMN field validation, the overall R and S_RMSE are 0.80 and 0.064 cm³/cm³, respectively, and the average S_RMSE at the chosen sites is decreased by 8.7%.
These results indicate that the improved method obtains better SM retrievals at both global and local scales without redundant auxiliary data. The findings of this paper offer a new way to account for geographical disparity in global and local SM retrieval. The coupled physical mechanisms of multiple factors need to be analyzed further in future studies.

Fig. 3 Diagram of data processing and flowchart of the improved method

Fig. 4 Correlations between the influencing factors and CYGNSS effective reflectivity in different land cover categories. P_SM, P_SR, P_ST, P_VOD, and P_VWC represent SM, SR, ST, VOD, and VWC, respectively

Fig. 5 Distribution and models in the characteristic regions of Southeast China hills and Himalayas

Fig. 6 Distribution and models in the characteristic regions of Sahara Desert and Congo Basin

Fig. 7 Distribution and models in the characteristic regions of Great Artesian Basin and Deccan Plateau

Fig. 8 Specific distribution of the models used in global SM retrieval

Fig. 9 Distribution of R for the improved method in global SM retrieval

Fig. 11 Density scatterplot, R, and S_RMSE of the soil moisture retrieval results using the improved method
In this section, 19 sites from five networks (ARM, OZNET, SCAN, TxSON, and USCRN) within ISMN are used for field validation. The SM estimated in the grid cell nearest to each site is used for the analysis. The performance of the improved method and the R-T-V model at the 19 ISMN sites is listed in Table 5.

Fig. 12 The improvement percentages of S_RMSE for the SM retrieval results compared with the R-T-V model
spaceborne GNSS-R, which limits the accuracy of SM retrieval. According to Fig. 18, apart from regions with low SM values (such as the Sahara Desert), CYGNSS SM differs markedly between seasons, especially in vegetated regions. The SM retrieval performance is closely related to SM variation, as depicted by the red boxes in Figs. 18 and 19: performance decreases gradually as SM increases with seasonal variation. Furthermore, seasonal variations in environmental factors such as vegetation and soil temperature may

Fig. 14 Scatter distribution in different land cover categories (7, 8, 10, 12, and 14) for the improved method and the R-T-V model.The I-M represents the improved method

Fig. 16 The S RMSE of the improved method and the R-T-V model at 16 sites.The I-M represents the improved method

Fig. 17 Improvements of R and S RMSE for the improved method compared with the R-T-V model

Table 1 Five models with corresponding equations in the improved method. R and i denote the CYGNSS effective reflectivity (Γ_coh^CYGNSS) and the model number, respectively; F_i^SM is the CYGNSS-retrieved SM; P_SR, P_ST, P_VOD, and P_VWC are the auxiliary parameters SR, ST, VOD, and VWC, respectively. The coefficients a_i to d_i of each model are determined in the training process

Table 2 Number of matching grid cells and the averaged correlation coefficient between CYGNSS effective reflectivity and the influencing factors in the characteristic regions

Table 4 Error statistics of the five models and the improved method