Gap Filling of the ESA CCI Soil Moisture Data Using a Spatiotemporal Attention-Based Residual Deep Network

As an essential climate variable, soil moisture (SM) exerts an indispensable influence on numerous disciplines. However, various degrees of data gaps exist in current microwave SM products. Therefore, this article proposed a spatiotemporal attention-based residual deep network (STARN) to reconstruct gaps of the daily SM data from the Climate Change Initiative program of the European Space Agency (ESA CCI) over the Qinghai–Tibet Plateau (QTP) during unfrozen seasons (May to September) from 2001 to 2021. The developed model is an end-to-end residual network embedded with three attention modules to comprehensively consider the potential relationship between SM and surface variables. Evaluation results revealed that the proposed model could well reconstruct SM gaps with an overall median R and unbiased RMSE (ubRMSE) values of 0.52 and 0.054 m3/m3, while the overall median R and ubRMSE values for the ESA CCI SM were 0.41 and 0.058 m3/m3. Besides, comparison with five baseline methods (e.g., the artificial neural network, convolutional neural network, extreme gradient boosting, long-short term memory, and DCT-PLS model) indicated that the STARN model had certain advantages over the five baseline models with higher correlation and more reasonable distribution patterns. The R/ubRMSE values for the five models were 0.38/0.057, 0.34/0.058, 0.40/0.058, 0.41/0.056, and 0.41/0.058, respectively. The pretraining using the ERA5-Land SM data could further improve the accuracy of generated seamless SM data since the ERA5-Land and ESA CCI SM complemented each other to a certain extent on the QTP. In summary, by leveraging the spatiotemporal information and attention modules, the STARN model showed great potentials in SM gap filling.


I. INTRODUCTION
S OIL moisture (SM) has been recognized as one of the 50 essential climate variables and plays a crucial role in the hydrothermal process between the land surface and atmosphere by controlling the evapotranspiration and partitioning the precipitation [1], [2]. Accurate characterization of SM would, therefore, greatly benefits numerous disciplines, such as climate forecast [3], drought monitoring [4], and water resource management [5].
Land surface SM can be obtained by means of in-situ measurements, land surface modeling, and remote sensing. Ground measurements provide the most accurate estimation of SM at different depths but are extremely sparse in spatial. Although land surface models can generate spatial continuous SM simulations, their accuracy suffers from the model parameterization, model structure, and quality of forcing data [6], [7]. On the other hand, satellite remote sensing, especially microwave remote sensing, has been deemed an effective tool for retrieving surface SM due to its strong penetrating capability and high sensitivity to SM [8], [9], [10]. Great efforts have been made in estimating surface SM using microwave remote sensing, including the Lband-based Soil Moisture and Ocean Salinity (SMOS) satellite, the C-band-based Advanced Scatterometer (ASCAT), and the X-band-based Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E). However, the time span of single-sensor-based SM product is too short to support long-term climate monitoring and forecasting which usually requires a time span over 30 years [6]. To bridge this gap, the Climate Change Initiative program of the European Space Agency (ESA CCI) released the first multidecadal, global SM product named ESA CCI SM in 2012 by combining various active and passive microwave SM datasets [11].
The Qinghai-Tibet Plateau (QTP), known as the Asia's water towers, is the headstream of seven major rivers in Asia. Due to its complex and unique geographical environment, the QTP is extremely sensitive to climate changes and exerts an essential impact on the Asian monsoon and the global water cycle [12]. Therefore, obtaining a comprehensive and long-term SM dataset over the QTP can promote our understanding of the water dynamics of land surface-atmosphere feedbacks [13]. Nevertheless, even after decadal development, there still exist serious data gaps in current SM products caused by satellite orbit changes, radio-frequency interference, complex topography, or snow and ice [6], [11], [14], especially in the QTP. Previous articles have shown that the percentage of data gaps over the QTP is more than 40% and even reaches 80% in the western regions [15]. Thus, it is of great urgency for the development of gap-filling technology for SM products.
Currently, several approaches have been proposed to reconstruct and generate seamless SM products. These methods can be categorized as statistical regression, geostatistics, and machine learning. The statistical regression methods mainly rely on the linear or nonlinear relationship between input variables and the target [7]. In addition, geostatistical approaches utilize the spatial autocorrelation of SM and have been widely adopted in the domain of gap filling. For example, Tong et al. [16] applied the ordinary kriging to fill the SMAP SM gaps on the QTP. Sun and Xu [6] also evaluated the feasibility of geostatistical methods for SM data imputation in China. Machine learning is another way to obtain gapless SM which learns the complex relationship based on data-driven methods. Previous articles have leveraged various machine-learning algorithms to fill gaps of SM data [3], [10], [15]. Cui et al. [17] used a general regression neural network to obtain spatiotemporal continuous SM dataset over the QTP. By exploiting a random forest model, Qu et al. [12] also rebuilt an SM product adopting AMSR-E/AMSR2 brightness temperature and SMAP SM datasets.
However, the abovementioned machine-learning methods usually construct models based on pixelwise extracted data, failing to take the spatial neighborhood information into consideration. Although approaches, such as geostatistics and the convolutional neural network (CNN) can effectively extract the spatial characteristics of SM to a certain extent, the temporal information is still seriously lacking, which is indispensable for SM data imputation since SM has the substantial spatiotemporal variability due to its complex interactions with topographic, pedologic, and meteorological factors [18]. In addition, even though the 3-D discrete cosine transform-based penalized least square regression (DCT-PLS) method has been applied for SM gap filling [19], the application of utilizing both the spatiotemporal information and deep learning for SM data gap filling needs to be further explored, especially on the QTP. Therefore, the main objective of this article was to propose a new deep-learning model named the spatiotemporal attentionbased residual deep network (STARN) for SM data gap filling and examine its performance by evaluating against in-situ measurements and comparing with five baseline methods. The ESA CCI SM product over the QTP was chosen due to its long temporal span. Specifically, we aimed to address the following specific questions: 1) How well does the STARN model perform in SM imputation? 2) How well does the STARN model behave compared with previous gap-filling methods? 3) Does the transfer learning have the feasibility and potential in SM data gap filling?
The rest of the article is organized as follows. Section II introduced the study area and datasets. The methodology was given in Section III. Sections IV and V showed the results and discussion, respectively. Finally, Section VI concludes this article.

A. Study Area
The QTP (Fig. 1), known as the "Roof of the world," is the largest and highest plateau in the world with an average elevation more than 4000 m and a total area of about 2.57 million km 2 [20]. Due to its complex geographical environment, the average annual temperature drops from 15.5°C in the southeast to −5.0°C in the northwest [21]. Influenced by the Southeast Asian monsoon and western winds, the precipitation on the QTP has strong seasonal variations with 75% concentrated in the unfrozen seasons (May to September) [15]. The QTP has diverse vegetation types including forests, shrubs, meadows, and grasslands. As the "water tower of Asia," the QTP shows a significant impact on the Asian monsoon and global atmospheric cycle [18].

1) ESA CCI Dataset:
The ESA released the first long-term, global satellite-observed SM product named ESA CCI in 2012 with the purpose of providing a consistent and continuous global SM dataset [11], [22]. Specifically, the ESA CCI version 07.1 product was utilized in this article. This product was generated by combining four active (AMI-WS and ASCAT MetOp-A, B, C) and 12 passive (SMMR, SSM/I, AMSR-E, TMI, WindSat, SMOS, AMSR2, SMAP, GPM, FY-3B, FY-3C, and FY-3D) microwave instruments. The ESA CCI SM v07.1 product consists of three SM datasets with a spatial resolution of 0.25°and a temporal resolution of 1 day: the active product, the passive product, and the combined product, and the latter was adopted in this article (https://esa-soilmoisture-cci.org/).
2) ERA5 Land Reanalysis Dataset: The ERA5-land reanalysis dataset that provides a consistent view of the evolution of the water and energy cycle over land at a spatial resolution of 9 km was also adopted in this article (https://cds.climate.copernicus. eu/cdsapp#!/dataset/reanalysis-era5-land?tab = overview). It is generated through reanalyzing the land component of the European Centre for Medium-Range Weather Forecasts land surface model and combining model outputs with observations around the world into a global consistent dataset [23]. The hourly ERA5-Land SM data have been fairly processed into the daily arithmetical averages before model training and evaluation.
3) Auxiliary Datasets: Fourteen environmental auxiliary variables were selected for the SM data imputation (Table I). The land surface temperature (LST) including both daytime LST and the difference between daytime and nighttime (LST_Diff) were derived from the MOD11A1 and MOD11A2 products. The normalized difference vegetation index (NDVI) and four widely used SM related indexes (e.g., the surface water content index (SWCI) [24], the visible and shortwave infrared drought index (VSDI) [25], the shortwave infrared water stress index (SIWSI) [26], and the normalized multi-band drought index (NMDI) [27]) were calculated from the daily MODIS surface reflectance products MOD09GA and MYD09GA. The soil texture dataset (clay/sand/silt) was obtained from the SoilGrid250m product. The daily precipitation data were from the Climate Hazards Group InfraRed Precipitation with Station data dataset with a spatial resolution of 0.05°. In addition, the void-filled digital elevation model at a 3 arc-seconds resolution was obtained from the World Wildfire Fund HydroSHEDS and two topographic factors (slope and topographic wetness index) were calculated.

4) In-Situ Soil Moisture Measurements:
The in-situ SM measurements from the International Soil Moisture Network (ISMN) were chosen as ground truth for evaluation. The ISMN provides quality-controlled and harmonized SM datasets from various operational networks and ground validation campaigns [28], [29], [30]. Three networks on the QTP namely the NGARI network (21 sites available), the NAQU network (11 sites available), and the MAQU network (27 sites available) were chosen for evaluation (https://ismn.earth/en/). Only the 0-5 cm depth data with good quality in unfrozen seasons (May to September) were retained for evaluation and were first averaged on a daily scale.

III. METHODOLOGY
The overall workflow was presented in Fig. 2. First, all the 14 ancillary variables were preprocessed and resampled to the projection of ESA CCI SM data. Then, an STARN model was pretrained using the ERA5-Land SM as target, and was fine-tuned using the ESA CCI SM. Finally, the spatiotemporal seamless SM data were generated and were evaluated using in-situ observations and compared with the results generated by five baseline gap-filling learning methods [e.g., the artificial neural network (ANN), the CNN, the extreme gradient boosting (XGBoost), the long-short term memory network (LSTM), and the DCT-PLS method] that have been widely utilized in SM imputation or estimation [7], [19], [31], [32], [33].

A. Data Preprocessing
The quality of auxiliary variables is important for the generated results and spatiotemporally continuous input variables with good quality should be obtained. Even though the original temperature and SM related indexes (e.g., NDVI, SWCI, VSDI, SIWSI, and NMDI) data were continuous, data gaps existed after quality control since only pixels with good quality were retained. Therefore, the widely utilized harmonic analysis of time series and Savitzky-Golay filter were applied to the quality-controlled data to reconstruct data gaps. Then, all the variables were resampled and reprojected to the projection of ESA CCI SM data. During model construction, the ERA5-Land SM in unfrozen seasons was divided into the training (data during 2001-2018) and validating sets (data during 2019-2021) since it was used to pretrain the model and no testing set was needed. The ESA CCI SM data in unfrozen seasons were split into the training (data during 2001-2016), validating (data during 2017-2019), and testing sets (data during 2020-2021).   is then applied to regulate the value to between 0 and 1, which is the reasonable interval of SM.

B. Spatiotemporal Attention-Based Residual Deep Network
The RDAB is composed of a dense block followed by three attention modules. For the dense block, six 3-D CNN layers with 64 3 × 3 × 3 filters are densely connected to the last 3-D CNN layer, which has 64 filters with a kernel size of 1 × 1 × 1. The output features are subsequently enhanced with the channel attention, the spatial attention, and the temporal attention modules (Fig. 4). Eventually, a skip connection is adopted which adds the inputs to the enhanced features.
For a given date T, the time series of ancillary variables during dates T − 4 and T + 4 are simultaneously imported as the 4-Dtensor inputs with a shape of (55125,914). Except for the RDAB, the kernel size of all 3-D filters is set to 3 × 3 × 3 and the number of filters before the last 3-D CNN layer is 64 and 1 for the last.
The number of filters for the 2-D CNN layer is 1 and the size is set to 2 × 2. The mean square error is chosen as the loss function and the initial learning rate is 0.001 (the source code of the model could be assessed through https://github.com/Mew-YL/STARN/tree/main).

C. Application of STARN Model for Soil Moisture Gap Filling
It is worth noting that the missing proportion of ESA CCI SM data on the QTP is relatively large (Fig. 5), resulting in very limited data available for training the model, and thus, the constructed model could not well learn the spatiotemporal features for SM imputation. Meanwhile, deep learning usually requires massive training samples, otherwise the models are prone to overfitting and could have a relatively poor performance [33]. To overcome such problems, a simple fine-tuning transfer learning strategy was adopted in this article. Specially, an STARN model was preliminarily trained taking the ERA5-Land SM during 2001-2021 as target and the corresponding ancillary variables as input. Hyperparameters of the model were tuned until an acceptable accuracy (R 2 > 0.95) was achieved. Then, this well pretrained model was further fine-tuned using the ESA CCI SM and corresponding auxiliary variables, thereby transferring the knowledge of ERA5-Land SM for imputing the ESA CCI SM data gaps. Finally, the seamless input variables during 2001-2021 were fed into the well-trained model and the gapless SM data were obtained. The feasibility and potential of leveraging ERA5-Land and transfer learning to improve SM prediction has been preliminary explored by Li et al. [33].

D. Evaluation Metrics
Several traditional statistical metrics [34], [35] including the correlation coefficient (R), root mean square error (RMSE), unbiased RMSE (ubRMSE) and bias were adopted in this article to evaluate the accuracy of reconstructed SM. The formulations are as follows: where x i and y i are the predicted SM and true SM values and N is the size of samples. Besides, the spatial scale mismatch between in-situ points and remote sensing/model pixels is inevitable and the errors in the RMSE and bias might be partly due to such mismatching issue [34], [36]. While the R and ubRMSE metrics are less influenced and therefore, more emphasis was given on these two metrics in the following sections. In addition, in order to obtain a fair evaluation, the site values that fail within one pixel were averaged before evaluation [37], [38]. 4%. However, it should be noted that due to the increasing number of utilized microwave instruments, the data gap gradually decreased over time. From the perspective of spatial distribution, owing to the complex topography and permafrost, snow, and glaciers [6], the northwestern of the QTP was confronted with extremely large gaps with the missing rate basically above 85%, while the eastern regions had relatively smaller missing rates ranging from 10% to 80%. In contract, the reconstructed SM data were spatiotemporally seamless.

B. Evaluation With In-Situ Measurements
The generated seamless SM, original ESA CCI SM, and ERA5-Land SM were evaluated against ground observations, and the median statistics are presented in Table II. Overall, all  In addition, their performances varied among different networks. Specifically, regarding the R metric, the predicted SM outperformed the ESA CCI and ERA5-Land SM products both in MAQU and NAQU networks, whereas had the lowest correlation in NGARI network. The median R values were 0.52, 0.41, and 0.54 for the MAQU, NAQU, and NGARI networks. Meanwhile, all the three SM products showed the highest correlation in NGARI network and presented relatively lower R value in MAQU or NAQU network. Especially for the ERA5-Land SM data, which had surprisingly low R value in NAQU network. Such divergent performances among networks indicated their better performances in semiarid/arid areas and inferior behaviors in vegetated regions. As for the ubRMSE value, the generated SM exhibited lower ubRMSE scores than ESA CCI and ERA5-Land SM in NGARI and NAQU networks. Whereas in MAQU network, its value was slightly higher than the ESA CCI SM data. It should be noted that even though the ERA5-Land SM showed highest correlation in NGARI network, it also had the largest ubRMSE score.
The temporal variations of the obtained gap-free SM, ESA CCI SM, ERA5-Land SM, and in-situ measurements for each network during 2008-2019 were presented in Fig. 6. Values   (Table III) to further examine whether the SM products could capture the dynamic evolution of areal SM [39]. Great agreement could be observed between the predicted seamless SM and ESA CCI SM with correlation ranging from 0.68 to 0.86, demonstrating the potential of STARN model for SM imputation. Meanwhile, compared with original ESA CCI SM, certain improvement was achieved for the predicted SM in NAQU and NGARI networks with R and ubRMSE scores enhanced by 0.02-0.15 and 0.003-0.018 m 3 /m 3 , respectively. However, its performance was relatively inferior than the ESA CCI SM in MAQU network. Furthermore, both ESA CCI SM and generated SM showed obviously overestimation in NAQU and NGARI networks, and certain degree of underestimation in MAQU network. In addition, the reconstructed SM exhibited the highest and lowest correlation with ground observations in NGARI and NAQU network, respectively, which might be associated with the transfer learning procedure since the ERA5-Land SM had similar performance in these two networks.

C. Comparison With Baseline Methods
Comparisons were carried out with five baseline gap-filling approaches to further justify the feasibility and potential of proposed model in SM imputation. Table IV gives the median evaluation metrics of generated SM by different methods against in-situ measurements.
The STARN model outperformed the other five baseline methods and the overall performance order was STARN > LSTM > XGB > DCT-PLS > ANN > CNN when ESA CCI SM was available. Specifically, the median R and  (Table II), indicating their poor gap-filling performances. While the DCT-PLS method achieved satisfying performance with median R value of 0.41. Furthermore, evaluation was conducted when ESA CCI SM was missing and similarly, the STARN model owned the best performance with median R and ubRMSE values of 0.41 and 0.056 m 3 /m 3 , followed by the CNN and XGB methods. The performances of DCT-PLS and LSTM methods were slightly weakened with R scores decreased by 0.02 and 0.05.
In addition, the spatial distributions of generated SM by the six methods and original ESA CCI SM on three consecutive days were presented in Fig. 7. All the reconstructed SM exhibited a similar spatial pattern with SM gradually drying from southeast to northwest. However, the DCT-PLS method tended to generate much smoother results when data gaps existed over large areas. Meanwhile, it should be noted that the SM generated by the ANN and LSTM model had obviously unreasonable spatial noise, i.e., the "salt-and-pepper" effects, since the spatial information was not considered. Such effect was mitigated for the XGBoost method to a certain extent and the obtained SM was also closer to original ESA CCI SM. Although the CNN could extract the spatial neighborhood information, it still failed to effectively impute SM gaps in the QTP and produced abnormally low values in the northwestern regions. In contrast, the proposed STARN model utilized the 3-D CNN to capture the temporal information and could generate more accurate SM. Based on above evaluation results, it could be concluded that the STARN model could well reconstructed data gaps of ESA CCI SM product by leveraging the spatiotemporal information and outperformed other baseline methods both in terms of accuracy and spatial patterns.

A. Improvements of Transfer Learning
Transfer learning is a commonly used deep-learning strategy for limited training samples, which usually transfers the similar knowledge from the source domains to target domains through pretraining and fine tuning, thereby alleviating the overfitting problem to a certain extent and further improving model performance [33]. It has been widely applied in image classification  [40], soil organic content prediction [41], and meteorological forecasting [42], [43]. The feasibility of applying transfer learning for SM predicting has also been demonstrated by Li et al. [33], and this article further examined the potential of utilizing transfer learning to impute SM gaps. The knowledge of ERA5-Land SM was transferred and validation results in Section IV-B indicated that using it as the source domain could effectively improve the accuracy of gap-filled SM data. The high accuracy of ERA5-Land SM in cold areas, such as the NGARI network, was completely retained in the generated SM, while the original ESA CCI SM behaved relatively poor in network scale (Table III). Besides, the low-correlation characteristic of ERA5-Land SM in NAQU network was eliminated during fine tuning and the obtained SM showed similar performance with ESA CCI SM in these regions (Table III). Meanwhile, the relatively higher correlation and lower ubRMSE score of STARN model with transfer learning than that without transfer learning in Table V also demonstrated the great potential of utilizing ERA5-Land and transfer learning for SM gap filling. The over median R and ubRMSE values for the STARN model with transfer learning were 0.52 and 0.054 m 3 /m 3 , respectively, and were 0.45 and 0.055 m 3 /m 3 , respectively, for the model without transferring when ESA CCI SM was available. The generated SM with transfer learning usually exhibited more reasonable and accurate patterns in some regions, such as the central and southwestern of the QTP (Fig. 8). While the model without transfer learning showed certain degrees of underestimation.

B. Advantages and Limitations of Proposed Model
Due to the strong spatiotemporal heterogeneity of SM [18], both the spatial and temporal features related with SM were adopted by the proposed model and were further enhanced through three attention modules, which contributed to the gapfilling performance improvements. Compared with the original ESA CCI SM, the relatively higher correlation and lower ubRMSE values of generated SM by the STARN model without transfer learning just demonstrated the superiority of the STARN model for SM imputing (Tables II and V). The transferring of ERA5-Land SM further enhanced the performance, since the ERA5-Land and ESA CCI SM complemented each other to a certain extent. The high correlation of EAR5-Land SM in NGARI network just compensated for the deficiency of ESA CCI SM in this region, while in NAQU network, situation was the opposite.
Besides, the proposed model also presented certain advantages to the baseline methods (e.g., the ANN, CNN, XGBoost, LSTM and DCT-PLS model) with higher correlation and more reasonable SM patterns. The simple ANN model failed to well reconstructed SM gaps since it was sensitive to uninformative features [44], which might count for the obvious spatial noise. Even though the LSTM has been demonstrated to well predict the SM time series [45], the gap-filled SM also presented certain degrees of spatial noise. Meanwhile, the performance was deteriorated to a certain extent when ESA CCI SM value was missing (Table IV). Although Guo et al. [46] has successfully utilized the simple 3-D DCT-PLS method for SM imputing, in this article, it generated overly smoothed results especially when large areas were missing, and the detailed information was completely lost. This might be related to the fact that the DCT-PLS only relied on the SM data and if the data gaps cover large areas and long temporal span, the gap filling performance would be very limited.
However, it should be noted that the definition of unfrozen seasons was simply based on the time period instead of using the temperature or snow cover as thresholds. This might exclude some valuable data or retain more bad data in prolonged frozen seasons, which could impact the results to a certain extent. Hence, future article should further investigate the influence of potential retained bad data/masked valuable data. Meanwhile, the ancillary variables used were mainly come from optical remote sensors, while the microwave data, such as the brightness  temperature data, at different frequencies (C, X, and L bands), and the radar backscattering coefficients have been proven to show strong direct relationship with SM [47]. Further articles should consider these factors to improve the gap-filling accuracy.

C. Spatiotemporal Variations of Soil Moisture on the QTP
The mean SM and the yearly change trends of SM, precipitation, and temperature of unfrozen seasons (May to September) during 2001-2021 were shown in Fig. 9 by calculating the Sen's slope [48]. The distribution of SM on the QTP showed a gradually drying pattern from southeast to northwest. While overall SM exhibited an increasing trend with a slope of 0.00015 m 3 /m 3 per year. The northeastern regions were experiencing a wetting trend, while the southeastern regions showed a "wet gets drier" pattern. Meanwhile, SM had a strong coupling relationship with precipitation and temperature with a correlation coefficient of 0.61 and −0.60, which was consistent with previous article [12]. The gradually wetness trend of SM might be mainly caused by the increasing precipitation, which had an increasing slope of 4.647 mm per year during 2001-2021. While the overall trend of temperature was 0.024°C per year. Spatially, for the northeastern regions, the wetting SM corresponded well with the increasing change of precipitation and decreasing trends of temperature. In the southern parts of the plateau, the drying SM might be associated with the rising temperature. Besides, in the southwestern regions, both precipitation and temperature exerted an impact on SM, leading to its wetness trend. However, the increasing temperature in the northwestern areas resulted in the dryness trend of SM despite the increasing precipitation in these regions.

VI. CONCLUSION
This article proposed an end-to-end residual deep network (STARN) that was embedded with three attention modules for SM data imputation. The potential and feasibility of the developed model for SM data gap filling were fully examined by evaluating against ground observations and comparing with five baseline gap-filling methods. Conclusions derived are as follows: 1) The ESA CCI SM data showed extremely large gaps over the QTP in unfrozen seasons (May to September) during 2001-2021 with an average missing rate of 47.24%. Due to permafrost, snow, and glaciers, the northwestern regions had missing percentage basically above 85%, while the eastern areas showed relatively smaller missing rates.
2) The performance of the STARN model was evaluated against ground measurements and compared with five baseline methods. Results showed that the reconstructed seamless SM had comparable performance with original ESA CCI SM with an overall median R and ubRMSE values of 0.52 and 0.054 m 3 /m 3 , respectively. Besides, the STARN model had certain advantages over the five baseline models with higher correlation and more reasonable SM distribution patterns.
3) The ESA CCI and ERA5-Land SM data presented overall comparable performance and complemented each other to a certain extent. The transferring of ERA5-Land SM could effectively improve the performance of gap-filled SM data. The high-accuracy characteristic of ERA5-Land SM in cold regions was retained in the reconstructed SM, while its low-correlation feature was eliminated during fine tuning. 4) The SM over the QTP had experienced a gradually wetness trend during 2001-2021 and exhibited strong relationships with precipitation and temperature. Spatial heterogeneity existed in the SM trend with the northeastern regions showing a wetness trend and the southern areas being increasingly dry. In conclusion, by leveraging the spatiotemporal information and attention modules, the STARN model showed great potential in SM gap filling and had certain advantages over baseline methods both in performance and spatial patterns.