Comprehensive Error Analysis of CLDAS Soil Moisture Over Arid and Semiarid Regions

Soil moisture (SM) is an important parameter in all environments because it affects the relationship between the land surface and atmospheric processes. Therefore, finding products that can accurately measure SM is critical to improving drought management. The objective of this study was to investigate the accuracy of the SM product of the China Meteorological Administration Land Data Assimilation System (CLDAS), focusing on the Huaihe and Heihe River Basins in China, as both are prone to drought. To verify the accuracy of the daily SM data, measurements were obtained from 34 meteorological stations between January and December 2016. CLDAS SM data at depths of 10 cm and 40 cm were compared with observed soil moisture measurements (OBS SM). The results show that the agreement of CLDAS SM with OBS SM was R > 0.66 at 10 cm and R > 0.47 at 40 cm in the Huaihe River Basin, and R > 0.63 at 10 cm and R > 0.44 at 40 cm in the Heihe River Basin.

measures can help elucidate SM heterogeneity across spatial scales and improve drought management and environmental implications [15].
In recent years, China has faced severe droughts [16], [17], [18], [19]. This issue is attributable to climate change and anthropogenic activities [20], such as deforestation [21]. Therefore, there is a need for products that can help monitor atmospheric conditions and drought processes. Water supply is declining in many regions owing to warming climate and industrial activities, especially in arid regions [22], [23]. Research studies have confirmed that water supply challenges can threaten human well-being by limiting agricultural activities and reducing food production [24]. Therefore, it is critical to identify suitable measures and products for monitoring and managing droughts.
High-quality SM products are vital for improving drought monitoring and management strategies. Satellite observations or land surface simulations can meet this need by collecting continuous data and merging them to provide accurate conclusions regarding SM in different regions [25]. CLDAS is suitable for drought monitoring in China because it produces and assimilates multisatellite SM data. Previous investigations revealed that CLDAS is superior in quality to similar products implemented in China [26]. Satellite-based services, such as CLDAS, provide an improved model for analyzing and evaluating SM at regional scales. Moreover, CLDAS SM data are more precise than those of competing products [27]. CLDAS finished covering East Asian regions after 2010 [28], [29]. The precision of the CLDAS SM product is an important factor in its usefulness and reliability as a tool for monitoring and understanding SM [30]. However, the accuracy and reliability of the CLDAS product have not been extensively evaluated in the literature. We selected 2016 as the study period because the SM measurements we collected had fewer missing values after quality control.
When a regression model is fitted to data, its root mean square error (RMSE) is commonly calculated to quantify the fit, which helps in developing testable hypotheses. This research aims to evaluate the precision of the CLDAS SM product and its potential applications by comparing it with observed SM (OBS SM) from ground-based sensors. Thus, the fundamental purpose of this study was to confirm these observations and recommend the use of CLDAS in the Huaihe and Heihe River Basins.
This research focused on two study regions in China, one in the Huaihe River Basin and one in the Heihe River Basin, and examined the spatial and temporal variations in CLDAS SM data at various depths to confirm their usefulness and validity.
In this study, we aimed to 1) determine the association between the CLDAS SM and OBS SM at soil depths of 10 cm and 40 cm, 2) determine the correlation between CLDAS SM at soil depths of 10 cm and 40 cm, and 3) explain the differences between the 10-cm and 40-cm soil depths in the Huaihe River Basin and the Heihe River Basin.

A. Study Area
The study was conducted in the Huaihe River Basin and the Heihe River Basin, which are classified as semiarid and arid regions, respectively (see Fig. 1). The Huaihe River Basin is the fifth-largest river basin in China [31]. The basin is situated in a transitional climate zone, where the weather system changes drastically throughout the year [32]. The sites in the Huaihe River Basin were spatially distributed between 30°N-36°N and 113°E-122°E. The southern part of the basin has a subtropical climate, whereas the northern part has mild temperatures. The basin has a mean annual temperature of 11-16°C and an average annual precipitation of 884 mm, and the heavy rainy season occurs from May to September, which is governed by the monsoon system [33]. The central part of the region consists mostly of alluvial plains with a relatively flat landscape [24], [34].
In contrast, the sites in the Heihe River Basin were located between 37°N-43°N and 98°E-102°E. The Heihe River Basin is located at the center of the Hexi Corridor and is the second-largest river basin in the arid zone of northwest China [35], [36], [37]. The Heihe River Basin has a complex environment, with mountains in the south that begin in the Qilian Mountains [38]. Precipitation and snowmelt are the two most important factors affecting SM changes across arid regions [14]. Anthropogenic factors, such as flood control and cultivation, are the primary drivers of seasonal and monthly variations [39], [40]. The Heihe River Basin has the highest precipitation in the summer and autumn, particularly between May and August [41]. However, the spring in this region is dry, with snow and ice melting, whereas snow is abundant in the winter [42], [43].
Consequently, the two basins are vulnerable to drought disasters. Based on their descriptions, the geographical and climatic conditions of the two regions differ significantly. Thus, the conditions were used to demonstrate the effectiveness of CLDAS at different sites and compare its accuracy in these regions to determine the most suitable application.

B. Datasets
This study used SM data collected from 34 Chinese meteorological sites (24 in the Huaihe River Basin and 10 in the Heihe River Basin). The data were utilized to assess the precision of the CLDAS SM and make relevant comparisons (see Fig. 1). Table I shows the different depths at which CLDAS and OBS SM were measured [44].
OBS SM data were used to measure SM at sites in the two regions and to confirm the effectiveness of CLDAS. The daily (366 days) SM data were collected at 10-cm and 40-cm depths at 08:00 Beijing time (00:00 UTC).
This study used SM data at 10-cm and 40-cm soil depths from OBS SM and CLDAS listed in Table I. The information was provided by the National Meteorological Information Center of the CMA. The volumetric SM products were converted into relative (%) products to enhance the analysis and accuracy of the study.
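The conversion from volumetric to relative SM mentioned above is not specified in the text. A minimal sketch follows, assuming one common definition in which relative SM is the volumetric SM expressed as a percentage of the volumetric field capacity. The function name `relative_sm_percent` and the `field_capacity` parameter are hypothetical and do not come from the study.

```python
def relative_sm_percent(volumetric, field_capacity):
    """Convert volumetric SM (m3/m3) to relative SM (%).

    Assumption: relative SM = 100 * volumetric SM / field capacity,
    with both quantities in the same volumetric units. The study does
    not state which definition it used.
    """
    if field_capacity <= 0:
        raise ValueError("field_capacity must be positive")
    return 100.0 * volumetric / field_capacity


# Hypothetical example values, not taken from the study.
example = relative_sm_percent(0.15, 0.30)
```

Under this assumed definition, a site-specific field capacity would be needed for every station and depth.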
To process the data, Interactive Data Language (IDL) software and specific routines were employed. Furthermore, Excel and ArcGIS software were used for metrics calculation and map visualization, respectively.

C. Methodology
The methods utilized in the data analysis enabled a comprehensive investigation of the relationship between climatic conditions and SM in arid and semiarid regions. The processing and visualization of the data were facilitated by the use of IDL, Excel, and ArcGIS programs. It should be noted that appropriate measures were taken to ensure the originality and authenticity of the study's findings.
The RMSE and Pearson's correlation coefficient (R) were the two statistical indicators used in this investigation to assess the efficacy of CLDAS SM products [47], [48], [49]. The RMSE was used to measure the magnitude of the differences between the CLDAS and OBS SM values. It is used to evaluate the precision of SM products and is typically stated in units of SM content as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2} \tag{1}$$

The Pearson correlation coefficient (R) was employed to determine the linear fit of CLDAS and OBS SM data at 10 cm and 40 cm [50] as follows:

$$R = \frac{\sum_{i=1}^{N}\left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_i - \bar{X}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2}} \tag{2}$$

In both equations, $N$ is the total number of values in the time series, and $x_i$ and $y_i$ represent the two datasets being compared.
The average values of the relevant variables are given by $\bar{X}$ and $\bar{Y}$.
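The two indicators described above can be sketched in a few lines of Python under their standard definitions. This is an illustration only: the study computed its metrics in Excel, and the function names and sample values below are hypothetical.

```python
import math


def rmse(x, y):
    """Root mean square error between two equal-length series."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("series must be non-empty and equal in length")
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must be equal in length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R is undefined when a series is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily values for one station (not data from the study).
cldas_sm = [0.20, 0.25, 0.30, 0.35]
obs_sm = [0.22, 0.24, 0.33, 0.36]
station_rmse = rmse(cldas_sm, obs_sm)
station_r = pearson_r(cldas_sm, obs_sm)
```

Note that the RMSE carries the units of the input series, whereas R is dimensionless, so the two indicators capture error magnitude and linear agreement separately.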
Acquiring the relevant data was the initial phase of the analysis. This included OBS SM data and satellite-derived SM products; the latter were obtained from the CLDAS. The algorithms were then implemented in IDL to process the data [51]. IDL was used to extract the 10-cm and 40-cm SM data from the grid cell in which each observation station is located. Missing data and extreme values (≤0) were set to zero and excluded prior to processing [52]. R and RMSE were then computed to verify the precision of the SM data.
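The extraction and screening steps described above were carried out in IDL. The Python sketch below shows one way they might look, under several assumptions that are not stated in the study: a regular latitude-longitude grid described by cell-center coordinates, daily fields stored as nested lists indexed `[time][lat][lon]`, and invalid days dropped pairwise rather than zero-filled. All function and variable names are hypothetical.

```python
import math


def nearest_index(coords, target):
    """Index of the grid coordinate closest to the target value."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def extract_station_series(grid, lats, lons, station_lat, station_lon):
    """Daily series from the grid cell whose center is nearest the station.

    Assumption: grid[t][j][i] holds SM for day t at lats[j], lons[i].
    """
    j = nearest_index(lats, station_lat)
    i = nearest_index(lons, station_lon)
    return [field[j][i] for field in grid]


def valid_pairs(cldas, obs):
    """Keep only days on which both values are present and greater than 0."""
    pairs = []
    for c, o in zip(cldas, obs):
        if c is None or o is None:
            continue  # missing record
        if math.isnan(c) or math.isnan(o):
            continue  # missing value encoded as NaN
        if c <= 0 or o <= 0:
            continue  # non-physical value (<= 0)
        pairs.append((c, o))
    return pairs
```

Dropping a day from both series at once keeps the remaining CLDAS and OBS values aligned, which R and RMSE both require.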

III. RESULTS
This section presents the results of the SM measurements in the Huaihe and Heihe River Basins. Here, we compare the CLDAS SM product and OBS SM estimations to determine the accuracy of the readings. Moreover, we evaluate the results obtained at different soil depths to illustrate whether these products are effective for monitoring drought risk in arid and semiarid regions. The metric results are listed in Table II. The findings indicated that CLDAS SM data in both regions were more consistent with OBS SM at a soil depth of 10 cm than at 40 cm. The Huaihe River Basin had an RMSE between CLDAS and OBS SM of 0.10 at 10 cm and 0.12 at 40 cm, whereas the Heihe River Basin had an RMSE of 0.10 at 10 cm and 0.11 at 40 cm (see Fig. 3). This implies that CLDAS data in both regions had smaller errors at 10 cm than at 40 cm. CLDAS and OBS SM data were also positively correlated in both regions at 10-cm depth, with R values of 0.66 and 0.63 in the Huaihe and Heihe River Basins, respectively. Detailed linear regressions for both regions are shown in Fig. 3.

A. Comparison Between SM Estimates From CLDAS and OBS SM
The daily validation findings demonstrated that the CLDAS SM data agreed with the OBS SM data. Overall, the temporal variation in the errors between CLDAS and OBS SM was larger in shallow soil layers than in deeper layers.
1) Huaihe River Basin (Semiarid region): Mild and moderate droughts occur frequently in the Huaihe River Basin, but unusual and severe droughts are rare [53]. Research studies in this area have revealed that rainfall occurs in the summer months from June to September [52], [54], [55], [56], [57]. These weather patterns contribute to flooding and drought during this season because they affect SM. SM is highly sensitive to rainfall; therefore, the study period selected for this research was ideal because it ensured that CLDAS SM products could capture this phenomenon. In addition to rainfall, droughts and floods in the Huaihe River Basin are significantly impacted by climate variations, intense anthropogenic activities, and changes in land topography [58], [59], [60]. Precipitation variability can increase the risk of soil loss [31]. As precipitation frequency decreases, the likelihood of drought increases in the Huaihe River Basin [61], [62], [63]. Furthermore, the basin is one of China's most flood-prone areas. The maximum monthly mean temperature occurs in June, whereas the lowest occurs in January [64]. Varying R and RMSE values were recorded at different regions and soil depths in the Huaihe River Basin. These measurements are presented in Table III. According to the results, SM conditions varied in different sections of the Huaihe River Basin. For example, the northward side of the basin is relatively dry, whereas the southward side is considerably wet [65], [66]. A systematic analysis of data from this region between 2015 and 2017 similarly concluded that daily precipitation was higher in summer and lower in winter [67]. The analysis also revealed that there were low levels of rainfall in the mountainous areas of the southern and eastern parts of the region in the winter, particularly in January and February. These observations may explain the variations in SM as well as the accuracy of the CLDAS SM data. 
These observations emphasize that, although the sites lie in the same general region, they have markedly different weather and geographical conditions.

2) Heihe River Basin (Arid Region):
There is a desert-oasis river terrain in the lower and middle parts of the Heihe River Basin, where some of the study sites are located [68]. Thus, the SM content at the center of the basin is low [69], and freezing temperatures occur in December and January [70]. Between April and July, the stream flow in the lower reaches drops dramatically, occasionally causing the river to dry up [71]. These weather patterns and variations account for the region's high susceptibility to droughts. Research studies and investigations in the basin suggest that land degradation and anthropogenic activities are the primary drivers of the climate variations observed in this region [72], [73], [74]. These activities have resulted in consequences such as global warming and rapid deglaciation of the Qilian Mountains [75]. The Heihe River Basin also had varying R and RMSE values at different sites and soil depths. These measurements are presented in Table IV.

B. Spatial Distribution of Errors in SM
The errors in CLDAS SM data at 10 cm and 40 cm varied spatially in both the semiarid and arid regions. In the Huaihe River Basin, 22 of the 24 sites had an R higher than 0.5 at 10 cm, which indicates strong correlations, whereas 20 sites had similar values at 40 cm. At 10-cm depth, Guiji station showed R < 0.5 in all seasons, and Wudaogo station indicated R < 0.5 in winter. At the 40-cm soil depth, Huitanggou station reported R < 0.5 in spring, autumn, and winter, Yaoli station in spring, Yanglou station in summer, and Shuangdui station in winter. Moreover, no negative correlation was found for the 10-cm and 40-cm SM data in the Huaihe River Basin. In the Heihe River Basin, seven sites at 10-cm and 40-cm depths had an R greater than 0.5, which shows a higher correlation. At 10-cm soil depth, Hunhelin station indicated R < 0.5 in spring, summer, and autumn, whereas Yaogan and Shidaoqiao stations indicated R < 0.5 in spring and autumn. At 40-cm soil depth, both Huazhaizi and Yaogan stations indicated R < 0.5 in spring, summer, and autumn. In winter, no stations indicated R < 0.5 at either depth. No stations in the Heihe River Basin demonstrated a negative correlation either. Furthermore, a high degree of agreement was observed between the CLDAS and OBS SM in the semiarid and arid study areas. Most stations exhibited a significant correlation between CLDAS SM data at 10-cm and 40-cm depths (see Fig. 4). This further illustrates that CLDAS products were accurate at 10-cm and 40-cm depths in both regions.

IV. DISCUSSION
We evaluated our results from different aspects, including data availability, data processing, retrieval algorithms, and error analysis of SM using R and RMSE. Reliable RMSE and R values are also useful for predictions in climate, hydrology, ecology, and agriculture. This research also provides a comprehensive direction for the error analysis of the daily CLDAS product. SM is a vital component of all ecosystems due to its influence on the connection between the land surface and atmospheric processes.
Therefore, finding products that accurately detect SM is crucial for improving drought management. Based on the present findings, CLDAS SM data were regarded as appropriate for China's arid and semiarid zones. Previous studies have described the advantages and disadvantages of different reanalysis and model-based SM products for some regions of China [76], [77], [78], [79].
According to previous research [80], other products have constraints, such as a limited spatial area and a minimum depth, that are inappropriate for the evaluation of remotely sensed SM. Because of this, CLDAS SM is a more reliable source of SM data for China.
CLDAS SM and OBS SM at depths of 10 cm and 40 cm were used for validation in both the Huaihe and Heihe River Basins. According to Wang et al. [81], SM is vital to climate research and has a significant influence on agricultural development. In addition, this research indicates that CLDAS performs well in the Huaihe and Heihe River Basins. A few sites in the Huaihe River Basin recorded R values of less than 0.5, including the Guiji station, which reported R = 0.44. A previous study [82] determined that the R value was 0.42 and the RMSE was 0.18. Wang et al. [83] found a strong link between rainfall and temperature in the Huaihe River Basin, with more rainfall occurring as the temperature increased. Furthermore, rainfall was higher in the south and lower in the north of the Huaihe River Basin [24]. According to the findings of Zhao et al. [84], the accuracy of SM measurements in the Huaihe River Basin may be mainly related to the local terrain, plant cover, and rainfall. Another study [57] shows that the Huaihe River Basin's complexity is mostly caused by its complicated geography, wet and dry climate zones, and high population density.
Several stations in the Heihe River Basin recorded R values of less than 0.5. At 10-cm soil depth, Hunhelin station indicated R = 0.25 and RMSE = 0.06, Yaogan station had R = 0.48 and RMSE = 0.07, and Shidaoqiao station had R = 0.44 and RMSE = 0.14, whereas at 40-cm soil depth, Huazhaizi station indicated R = 0.34 and RMSE = 0.09 and Yaogan station indicated R = 0.40 and RMSE = 0.09. No stations revealed a negative correlation in the Heihe River Basin. Wang et al. [85] utilized SMAP, SMOS, and AMSR2 data for SM retrievals and found R values of 0.65, 0.56, and 0.45, respectively, in the Heihe River Basin. In addition, this research found that in the Heihe River Basin, the majority of precipitation occurs from May to October during the monsoon season. The strongest performance in the arid region was seen at the Ebao station, where R = 0.89 and RMSE = 0.06 at 10-cm depth and R = 0.92 and RMSE = 0.05 at 40-cm depth were recorded. Using the Real Thermal Inertia model, Ma et al. [86] determined that the correlation coefficient R was 0.60 and the RMSE was 0.07 in the Heihe River Basin. Another investigation revealed that precipitation and snowmelt are the key variables affecting SM variation in China's arid areas [87]. Our results demonstrate that the CLDAS accurately captured these variations, and these investigations support our findings. The findings also reveal that the in situ SM estimation methods have an acceptable degree of precision.
The comprehensive error analysis of CLDAS SM over arid and semiarid regions gives vital insights into the precision and dependability of the CLDAS SM in the Huaihe and Heihe River Basins. In addition, this study indicates that the CLDAS SM product might be enhanced by combining new data sources, including ground-based measurements, remote sensing data, and in situ observations. This study also suggests further research to understand the influence of atmospheric variability on the CLDAS SM products and the adoption of advanced data assimilation techniques to increase the accuracy and dependability of the CLDAS SM products across arid and semiarid regions. Researchers who use the CLDAS SM product in arid and semiarid environments should take the findings of this study into account.

V. CONCLUSION
This study evaluated the accuracy of SM data generated using the CLDAS SM product at soil depths of 10 cm and 40 cm. A comparison between the CLDAS SM and the OBS SM shows a close relationship between the two datasets. The results indicate that the CLDAS works well in arid and semiarid areas. This research also provides a comprehensive direction for evaluating the errors in SM data and the influence of SM on the connection between the land surface and atmospheric processes. The CLDAS SM data had relatively small errors in both study areas. The results confirm that negative correlations did not occur in the semiarid and arid areas. Similarly, the accuracy of the data was significant for both regions. Correlation results demonstrate that the association between the CLDAS SM and OBS SM at different depths is strong at the majority of stations in the Huaihe and Heihe River Basins. A significant positive correlation was found between CLDAS SM and OBS SM at the 10-cm and 40-cm soil depths; however, the data were more accurate at a soil depth of 10 cm than at 40 cm. No negative correlation was found between the 10-cm and 40-cm soil depths in either region. Overall, the analysis suggests that CLDAS SM can identify SM well in the semiarid and arid regions. Consistency between 10-cm and 40-cm SM was acceptable in both regions. These results also highlight the need to enhance the CLDAS SM product by focusing on retrieval correction and the data quality of SM. Future efforts should aim to reduce input errors for more accurate results. We conclude that satellite-retrieved SM and its evaluation process may introduce several errors. Additional studies are required for a more comprehensive analysis of CLDAS products and their applications in various geographical regions.