Investigating the seasonal dynamics of surface water over the Qinghai–Tibet Plateau using Sentinel-1 imagery and a novel gated multiscale ConvNet

ABSTRACT The surface water in the Qinghai–Tibet Plateau (QTP) region has undergone dramatic changes in recent decades. To capture dynamic surface water information, many satellite imagery-based methods have been proposed. However, these methods are still limited in terms of automation and accuracy and thus prevent surface water dynamic studies in large-scale QTP regions. In this study, we developed a new fully automatic method for accurate surface water mapping by using Sentinel-1 synthetic aperture radar (SAR) imagery and convolutional networks (ConvNets). Specifically, we built a new multiscale ConvNet structure to improve the model capability in surface water body extraction. Moreover, a gating mechanism is introduced to promote the efficient use of multiscale information. According to the accuracy assessment, the proposed gated multiscale ConvNet (GMNet) achieved the highest overall accuracy of 98.07%. We applied our GMNet for monthly surface water mapping on the QTP; accordingly, we found that the QTP region experienced significant surface water fluctuations over one year. The surface water also showed distinct spatial heterogeneity on the QTP; that is, the surface water fraction of the Inner Tibetan Basin was significantly higher than that of the Mekong Basin in both the wet and dry seasons.


Introduction
The Qinghai–Tibet Plateau (QTP) is the world's largest and highest plateau and is the so-called 'Third Pole' of the Earth (Yao et al. 2012; Yao et al. 2013). Due to its rich water resources, the QTP acts as 'Asia's water tower', and the hydrological processes shaped by the lakes and rivers in this region supply water to >1.4 billion people in Asia (Immerzeel et al. 2010; Pritchard 2019; Yao et al. 2022). This region is sensitive to climate change and has experienced a sharp expansion of surface water due to the rapid melting of glaciers over the past decades (Wang et al. 2013; Moser et al. 2019; Zhang et al. 2019; Lhakpa et al. 2022). Since surface water dynamics exert an important influence on the hydrosphere, atmosphere, cryosphere, lithosphere, and biosphere (Lehner and Döll 2004; Mulch and Chamberlain 2006; Cheng and Wu 2007), they are not only indicators of climate change but also play a significant role in ecological environmental protection, biological diversity richness, and drinking water security (Huybers, Rupper, and Roe 2016; Laird et al. 2016; Wong et al. 2017; Qiu et al. 2022).
Investigating surface water dynamic features on the QTP is crucial for better understanding global and local climatic variability and is beneficial to ecosystem protection (Zhang et al. 2021). Currently, the increasing volume of satellite remote sensing data has been widely used for long-term monitoring of surface water across the QTP region. Previous studies have indicated a continued expansion of the surface water on the QTP since the 1970s, with a remarkable acceleration in the 2000s, particularly for lakes on the central QTP (Lei et al. 2013; Song et al. 2014). Some studies also revealed that the increased water storage was mainly concentrated in the central and northern QTP (Shao et al. 2008; Qiao, Zhu, and Yang 2019). Although many researchers have examined the surface water change features on a decadal time scale or annual scale on the QTP, to date, few have investigated surface water variations at a fine temporal scale (e.g. monthly or seasonal) and fine spatial scale (e.g. 10 m or 20 m) across the entire QTP region. Therefore, the seasonal surface water dynamics on the QTP remain poorly understood.
Satellite observations are particularly suitable for high-frequency and large-scale surface water monitoring. The widely used optical images are susceptible to cloud contamination and thus lead to a large amount of information loss in the target region. Previous optical image-based surface water studies experienced a trade-off between the temporal scale and spatial scale; that is, they were either carried out in a local region at a fine temporal scale, carried out over the entire QTP region at a coarse spatial resolution or carried out on portions of surface water bodies (e.g. lakes larger than 50 km²) at a fine temporal scale (Zhang et al. 2021; Lu et al. 2017; Qiao, Zhu, and Yang 2019; Lei et al. 2017; Wang et al. 2022).
SAR images have been an important data source for surface water investigations since they are capable of penetrating through clouds and capturing the Earth's surface information under all weather conditions (Strozzi et al. 2012; Li et al. 2018; Liang et al. 2022). However, SAR images contain speckle noise, which complicates surface water identification. Among the existing methods, thresholding methods are based on the backscattering intensity of the water surface, which is generally lower than that of the land surface in SAR images (Lingyan, Zhi, and Hong 2015); by selecting the optimal threshold based on the histogram statistics of the SAR image, the water surface and land surface can be separated. The widely used thresholding methods include the Otsu thresholding method (Otsu 1979; Tong et al. 2018; Zeng et al. 2017) and entropy threshold methods (Han and Wu 2018; Huo et al. 2018). Machine learning methods have also been widely used for SAR image-based surface water mapping. Classical machine learning algorithms, such as random forest (Zhang et al. 2021; Huang et al. 2018), support vector machines (Zhang et al. 2021; Insom et al. 2015) and artificial neural networks (Skakun 2010), can perform fast surface water mapping based on SAR images. Other methods, such as the active contour model (Li et al. 2014), fuzzy classification (Twele et al. 2016) and change detection (Giustarini et al. 2013), have also been introduced for SAR-based surface water extraction.
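The histogram-based thresholding idea described above can be illustrated with a minimal Otsu sketch. It is not taken from any of the cited studies: the function name `otsu_threshold`, the bin count, and the synthetic backscatter values (dark water near -22 dB, brighter land near -10 dB) are assumptions chosen only for illustration.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the between-class variance (Otsu 1979)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_low = np.cumsum(hist).astype(float)      # samples at or below each bin
    weight_high = weight_low[-1] - weight_low       # samples above each bin
    cum_sum = np.cumsum(hist * centers)
    mean_low = cum_sum / np.maximum(weight_low, 1e-12)
    mean_high = (cum_sum[-1] - cum_sum) / np.maximum(weight_high, 1e-12)
    between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
    return centers[np.argmax(between_var)]

# Synthetic backscatter in dB (hypothetical values): 30% water, 70% land.
rng = np.random.default_rng(0)
backscatter_db = np.concatenate([
    rng.normal(-22.0, 1.5, 3000),   # water: low backscatter
    rng.normal(-10.0, 2.0, 7000),   # land: higher backscatter
])
threshold = otsu_threshold(backscatter_db)
water_mask = backscatter_db < threshold  # water is darker than land
```

On real Sentinel-1 scenes the two modes overlap far more than in this toy example because of speckle, which is one reason a single global threshold tends to need postprocessing.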
Although various kinds of methods have been proposed for improving SAR-based surface water extraction, there are still defects in precision and efficiency when applying the proposed methods to different application cases. Some researchers have carried out manual postprocessing and auxiliary data assistance (Zeng et al. 2017; Zhang et al. 2021) to guarantee accurate surface water mapping. However, these processes limit the applicability and automation of the proposed methods. Generally, both automation and precision are crucial for large-scale research on the QTP. The automation level determines the efficiency of processing large volumes of satellite data, and the high precision guarantees that fine surface water variations can be captured from the generated surface water maps.
Deep learning methods have been widely explored for SAR-based surface water extraction due to their superiority in image processing (LeCun, Bengio, and Hinton 2015). Specifically, Li et al. (2018) proposed an active self-learning convolutional neural network for urban surface water extraction with TerraSAR-X data. Dong et al. (2021) compared the performance of deep learning methods with traditional methods and found that deep learning methods outperformed traditional methods. The existing research has demonstrated that deep learning methods are largely free of manual postprocessing and auxiliary data and show great potential in improving both automation and precision (Dong et al. 2021; Li, Martinis, and Wieland 2019). However, the current deep learning-based surface water mapping models mainly rely on 1) the direct transfer of state-of-the-art deep learning models (Isikdogan, Bovik, and Passalacqua 2017; Li, Martinis, and Wieland 2019) or 2) minor structural adjustments of state-of-the-art deep learning models (Jiang et al. 2018; Luo, Tong, and Hu 2021). Although these studies have achieved improvements in surface water mapping, SAR images differ greatly from computer vision pictures in terms of color channels, size, and resolution, and the surface water bodies in the QTP region are highly complex. The obtained deep learning models are therefore still limited in precision and intelligence because they are not sufficiently customized for SAR image-based surface water mapping.
In this study, we further explore the potential of deep learning in surface water mapping. Specifically, by considering the high complexity of the water bodies in the QTP region, we designed a multiscale ConvNet structure to improve the model performance in feature learning of surface water bodies of different spatial scales. Moreover, a gating mechanism for adaptively determining the significance of multiscale features is designed to promote the efficiency of feature learning in this study.
We make full use of accessible ascending and descending dual-polarization Sentinel-1 data and perform fully automatic surface water mapping in the QTP region by using the structured gated multiscale ConvNet (GMNet). Monthly surface water maps are generated, and spatiotemporal surface water features are captured and analyzed for the QTP region accordingly. The main works and contributions of this paper are described as follows: 1) we developed a new gated multiscale ConvNet model for automatic and accurate surface water mapping based on Sentinel-1 SAR images; 2) we applied the proposed method for month-by-month surface water mapping on the QTP, and surface water maps at 10-m spatial resolution were produced for 2020; and finally, 3) seasonal dynamics of surface water on the QTP were quantified and analyzed accordingly. The remainder of this paper is structured as follows. Section 2 introduces the study area and dataset. Section 3 describes the proposed deep learning model and the experimental setup. Section 4 provides the method evaluations, and Section 5 presents the spatiotemporal features of the surface water in the study area. Finally, we provide the conclusions of our study in Section 6.

Study area and data
The QTP region is a vast, elevated plateau in central Asia, and its mean elevation reaches 4,000 m above sea level (Zhang et al. 2013) (Figure 1). This region has large numbers of glaciers and lakes that serve as the headwaters for most of the streams and rivers in the surrounding regions, including the three longest rivers in Asia (Yellow, Yangtze, and Mekong Rivers). In this study, we use the QTP boundary provided by Zhang, Li, and Zheng (2002, 2017) to determine the range of the QTP region. The study area covers an area of approximately 2.6 × 10⁶ km² and is approximately 1,500 km in length from north to south and 2,900 km from east to west.

Time-series Sentinel-1 SAR images
Sentinel-1 is a space mission funded by the European Union and carried out by the European Space Agency (ESA) within the Copernicus Programme. The Sentinel-1A and Sentinel-1B satellites were launched on 3 April 2014 and 25 April 2016, respectively. The two satellites have a repeat observation period of 12 days, so the dual-satellite constellation can provide a repeat period of 6 days. Sentinel-1 collects C-band synthetic aperture radar (central frequency of 5.404 GHz) imagery with a variety of ground resolutions depending on the acquisition mode and processing level. Sentinel-1 Ground Range Detected (GRD) scenes in interferometric wide swath (IW) acquisition mode and 10-m spatial resolution are used for our study. The Sentinel-1 C-band SAR instruments support operation in single (HH or VV) and dual polarizations (HH + HV or VV + VH). The Sentinel-1 data follow the polarization scheme; that is, HH + HV or HH polarization data are mostly used for the monitoring of polar environments and sea ice zones, while VV + VH or VV polarization data are used for other observation zones. With the accessible data archive corresponding to our study region, VV + VH images in IW mode are collected for surface water mapping in this study. In total, we acquired 2,756 dual-polarization (VV + VH) images in 2020 (Figure 2). Even though the QTP region can be fully covered by the Sentinel-1 images in each month, there are still gaps in acquiring both ascending and descending images for a specific region and month. As demonstrated in Figure 2, the accessible ascending data are richer than the descending data for the QTP region. For most of the QTP region that is covered by both Sentinel-1 ascending and descending images, simple image stacking is performed on the acquired ascending and descending images, and the stacked Sentinel-1 image is applied for monthly surface water mapping accordingly.
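One plausible reading of the 'simple image stacking' step is a channel-wise concatenation of the two orbits. The sketch below assumes co-registered arrays of shape (H, W, 2) holding the VV and VH bands; the function name `stack_orbits` and the band order are hypothetical and not taken from the released code.

```python
import numpy as np

def stack_orbits(ascending, descending):
    """Stack two co-registered (H, W, 2) VV/VH arrays into one (H, W, 4) array."""
    if ascending.shape != descending.shape:
        raise ValueError("ascending and descending images must share one grid")
    # Assumed channel order: asc VV, asc VH, desc VV, desc VH.
    return np.concatenate([ascending, descending], axis=-1)

asc = np.full((4, 5, 2), -12.0, dtype=np.float32)   # toy ascending image (dB)
desc = np.full((4, 5, 2), -15.0, dtype=np.float32)  # toy descending image (dB)
stacked = stack_orbits(asc, desc)
```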

SAR-based surface water labeled dataset
We built an SAR-based surface water dataset for multiscale feature learning of surface water. The dataset contains 39 Sentinel-1 scenes paired with ground truths, which are collected at different sites across the QTP (Figure 1). Each Sentinel-1 scene is composed of ascending and descending images, and the paired images are mostly acquired within a short interval of 7 days. The surface water boundaries are delineated manually on the screen based on both ascending and descending images. To avoid false determination in unclear regions, the optical Sentinel-2 image with an acquisition time close to that of the Sentinel-1 scene is used for reference. The extremely difficult-to-recognize water bodies, which are mainly caused by the low signal-to-noise ratio and the limited spatial resolution of the Sentinel-1 image, are excluded from our dataset. The Sentinel-1 scenes have image sizes larger than 3000 × 3000 pixels; thus, they can be further cropped into multiscale patches with varying numbers and sizes. In this study, we set three scales as 2048 × 2048, 512 × 512, and 256 × 256 pixels. The samples of the paired multiscale Sentinel-1 scenes as well as the ground truths and the Sentinel-2 optical scenes are shown in Figure 3.
In the model testing stage, we divide the dataset into training and validation parts, which consist of 32 Sentinel-1 scenes and 7 Sentinel-1 scenes, respectively. To keep the training and validation data independent, which guarantees the reliability of the validation result, and to make the validation data representative of the whole QTP region, the validation scenes are selected by following the rule of randomness and geographical dispersion. Because a deep learning model usually performs better when it is trained on more labeled data, in the model deployment stage, the full labeled dataset is used for model training, and the fully trained GMNet is used for monthly surface water mapping in the QTP region.
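The cropping of one scene into the three patch scales can be sketched as follows. This is a minimal illustration that assumes the three patches share a common center and are shifted inward at scene borders; the paper does not spell out these details, and the helper names `crop_centered` and `multiscale_patches` are hypothetical.

```python
import numpy as np

def crop_centered(scene, center_row, center_col, size):
    """Crop a size x size window around a center, shifted to stay inside the scene."""
    height, width = scene.shape[:2]
    if size > height or size > width:
        raise ValueError("patch size exceeds scene size")
    top = min(max(center_row - size // 2, 0), height - size)
    left = min(max(center_col - size // 2, 0), width - size)
    return scene[top:top + size, left:left + size]

def multiscale_patches(scene, center_row, center_col, sizes=(2048, 512, 256)):
    """Return coarse, medium, and fine patches around the same location."""
    return [crop_centered(scene, center_row, center_col, s) for s in sizes]

scene = np.zeros((3000, 3000), dtype=np.uint8)  # placeholder for one Sentinel-1 band
patches = multiscale_patches(scene, 1500, 1500)
border_patches = multiscale_patches(scene, 10, 2990)  # center close to a corner
```

Sharing one center keeps the fine patch inside the spatial scope of the medium and coarse patches, which is what a later cropping-based scale transform would need.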

Methodology
We perform monthly surface water mapping on the QTP, and the spatiotemporal analysis is described as follows. In general, we conducted our study in 3 steps (Figure 4): 1) automatic data acquisition with the Google Earth Engine (GEE) platform, 2) surface water mapping using the deep learning method, and 3) spatiotemporal analysis of the surface water. To obtain highly accurate surface water maps, we design a novel gated multiscale ConvNet (GMNet) for water body identification in this study. The source code of our method is open access and available at https://github.com/xinluo2018/Tibet-Water-2020.git.

Framework of the gated multiscale ConvNet
Multiscale information from satellite images is usually beneficial for more accurate identification of the land cover category. In this study, the surface features corresponding to coarse, medium, and fine scales are captured through a structured convolutional module (named the FeaNet module). In general, features from different scales do not contribute equally to image recognition accuracy. Accordingly, we introduce a gating mechanism to control the multiscale feature flow. The idea is that only the multiscale features that improve surface water identification pass through the gate and are used for surface water mapping. Multiscale feature gating and surface water classification are performed through a structured convolutional module (the GateNet module) in this study, and the FeaNet and GateNet modules constitute the newly developed GMNet model.
Specifically, we use $X$ to refer to the input data and $w$ to refer to the weights for convolutional computing. Through convolutional computing, the input $X \in \mathbb{R}^{H \times W \times C}$ can be mapped to the feature map $f \in \mathbb{R}^{H \times W \times C}$ by

$$f = w * X \qquad (1)$$

where $*$ denotes convolution; $f_l = [f_{l1}, f_{l2}, \ldots, f_{ln}]^T$ represents multiscale features in the $l$-th layer; and $n$ is the number of scales of input data, which is set to 3 in our study. In the decoder part of the structured model, the multiscale features that have passed through gating modules are concatenated on the mainstream with skip connections. We use $s_l = [s_{l1}, s_{l2}, \ldots, s_{ln}]^T$ to represent the multiscale gate in the $l$-th layer. Then, the concatenated multiscale features can be obtained by

$$F_l = (s_{l1} \odot f_{l1}) \oplus (s_{l2} \odot f_{l2}) \oplus \cdots \oplus (s_{ln} \odot f_{ln}) \qquad (2)$$

where $F_l$ represents the combined multiscale feature in the $l$-th layer of the decoder part of the structured GMNet model, $\odot$ represents the elementwise product, and $\oplus$ represents feature concatenation. In the last layer of the GMNet model, a sigmoid function is used to generate the final surface water probability corresponding to each pixel. The overall structure of GMNet is shown in Figure 5.
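The gated concatenation described above can be sketched with plain arrays. This is only an illustration of the elementwise product followed by channel concatenation: the gates here come from random values passed through a sigmoid rather than from a trained GateNet, and the names `gated_concat` and `sigmoid` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_concat(features, gates):
    """Multiply each (H, W, C) feature map by its gate, then concatenate channels."""
    return np.concatenate([g * f for f, g in zip(features, gates)], axis=-1)

rng = np.random.default_rng(0)
f_fine, f_med, f_coar = (rng.random((8, 8, 2)) for _ in range(3))

gate_fine = np.ones((8, 8, 1))                       # fine-scale gate fixed to 1
gate_med = sigmoid(rng.normal(size=(8, 8, 1)))       # stand-in for a learned gate
gate_coar = sigmoid(rng.normal(size=(8, 8, 1)))      # stand-in for a learned gate

fused = gated_concat([f_fine, f_med, f_coar], [gate_fine, gate_med, gate_coar])
```

A gate value near 0 suppresses that scale in the concatenated feature, while a value near 1 lets it through unchanged.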

Multiscale feature gating mechanism
The fine-scale information possesses high ground resolution and is beneficial for fine land cover identification. In the multiscale feature learning of our study, we prioritize the scales from fine to coarse in image recognition; that is, the coarser-scale feature passes through the gate if the finer-scale feature is unable to accurately separate the surface water from the nonwater surface. We set the fine-scale gate value to a fixed value of 1, and the medium-scale and coarse-scale gate values are adaptively set between 0 and 1 through the sigmoid function in the GateNet module. We use $s$ to refer to the feature gate; $w$ and $w'$ are the weights of FeaNet and GateNet, respectively. The subscripts 'fine' and 'med' represent fine and medium scales, respectively. The gated medium-scale feature is then given by Equation (3), where $\varphi$ is a scale transform module designed to unify the spatial scope of multiscale features by simple cropping and interpolation. Here, $\varphi_{m-fi}$ represents the scale transform from medium to fine scale, and $\tau$ represents the sigmoid function.
Similarly, the coarse-scale feature gate can be derived from the medium-scale and fine-scale features obtained by FeaNet. We use $F_{coar}$ to refer to the gated coarse-scale feature, which is given by Equation (4), where the subscript 'coar' represents the coarse scale and $\varphi_{c-m}$ represents the scale transform from coarse scale to medium scale. The workflow to obtain the medium-scale and coarse-scale feature gates is shown in Figure 6.
Both the FeaNet and GateNet modules are structured with encoder-decoder architecture, and the downsampling (Dsample) and upsampling (Upsample) blocks constitute the basic modules of the network (Table 1). The Dsample block is designed as an inverted residual and linear bottleneck (Sandler et al. 2018); that is, each block consists of a narrow input layer, expanded middle layers, and a narrow output layer. The middle expansion layer consists of depthwise convolutions, which can improve the efficiency of feature learning with lightweight parameters (Howard et al. 2017). The components of Dsample and Upsample blocks are shown in Table 2.
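The parameter saving from depthwise convolutions can be made concrete with a short count. The channel numbers and kernel size below are illustrative assumptions and are not taken from Table 2; bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Weights of a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

standard = standard_conv_params(64, 64)          # 3 * 3 * 64 * 64 = 36864
separable = depthwise_separable_params(64, 64)   # 3 * 3 * 64 + 64 * 64 = 4672
ratio = standard / separable                     # roughly 7.9 times fewer weights
```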

Data augmentation
We perform min-max data normalization and conventional data augmentation during model training. A dynamic data augmentation strategy is used in our study; that is, the data are augmented before each batch of training data is fed into the model, and we set the augmentation probability for each batch of training data to 0.2. The data augmentation methods are described as follows:
5) Random region missing: We carry out a region-missing augmentation for the multiscale input patches, i.e. the input patch is masked in a random local region, and the height and width of the maximum masked region are less than 1/6 of the height and width of the patch size.
6) Random line missing: Sentinel-1 images sometimes contain invalid edges, which resemble missing line data; this phenomenon is also reported by Liang et al. (2021). To make the trained model applicable to any Sentinel-1 image, random line-missing augmentation is carried out in our study, and the width and length of the missing line are randomly set to 1 to 3 pixels and >50 pixels, respectively.
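The two missing-data augmentations can be sketched as below. Several details are assumptions made for illustration only: masked pixels are set to 0, the missing line is horizontal, both augmentations are applied together when the 0.2 probability is drawn, and all function names are hypothetical.

```python
import numpy as np

def random_region_missing(patch, rng, max_frac=1 / 6):
    """Zero out one random rectangle whose sides stay below max_frac of the patch."""
    out = patch.copy()
    height, width = out.shape[:2]
    mask_h = rng.integers(1, max(int(height * max_frac), 2))  # upper bound exclusive
    mask_w = rng.integers(1, max(int(width * max_frac), 2))
    top = rng.integers(0, height - mask_h + 1)
    left = rng.integers(0, width - mask_w + 1)
    out[top:top + mask_h, left:left + mask_w] = 0
    return out

def random_line_missing(patch, rng, min_length=51, max_width=3):
    """Zero out a horizontal strip 1 to 3 pixels wide and more than 50 pixels long."""
    out = patch.copy()
    height, width = out.shape[:2]
    line_w = rng.integers(1, max_width + 1)
    line_len = rng.integers(min(min_length, width), width + 1)
    top = rng.integers(0, height - line_w + 1)
    left = rng.integers(0, width - line_len + 1)
    out[top:top + line_w, left:left + line_len] = 0
    return out

def augment(patch, rng, prob=0.2):
    """Apply both augmentations with probability prob, otherwise return the input."""
    if rng.random() < prob:
        patch = random_line_missing(random_region_missing(patch, rng), rng)
    return patch

rng = np.random.default_rng(42)
patch = np.ones((256, 256, 4), dtype=np.float32)
region_aug = random_region_missing(patch, rng)
line_aug = random_line_missing(patch, rng)
```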

Training strategy
We train our model using the Adam optimizer with $\beta_1 = 0.9$ and $\beta_2 = 0.999$. We set the initial learning rate to 0.0002, and the learning rate is dynamically reduced by a factor of 0.6 when no improvement is seen in the training loss metric for 20 epochs. We use binary cross entropy to measure the training loss, and label smoothing (Szegedy et al. 2016; Müller, Kornblith, and Hinton 2019) with $\alpha = 0.05$ is applied as a regularization technique when calculating the training loss. Additionally, the number of training epochs and the batch size are set to 300 and 16, respectively.
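The loss and learning-rate schedule can be sketched in plain Python. Two points are assumptions rather than details given in the text: the binary label-smoothing form y(1 - α) + α/2, and the rule that the rate is reduced as soon as 20 consecutive epochs pass without improvement. The class and function names are hypothetical.

```python
import math

def smooth_labels(labels, alpha=0.05):
    """Assumed binary smoothing: 0 -> alpha / 2 and 1 -> 1 - alpha / 2."""
    return [y * (1.0 - alpha) + 0.5 * alpha for y in labels]

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and soft targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)

class PlateauScheduler:
    """Multiply the learning rate by factor after patience epochs without improvement."""

    def __init__(self, lr=2e-4, factor=0.6, patience=20):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.bad_epochs = loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

targets = smooth_labels([0, 1, 1])
loss = binary_cross_entropy([0.1, 0.8, 0.6], targets)

scheduler = PlateauScheduler()
scheduler.step(1.0)                 # first epoch sets the best loss
for _ in range(20):                 # 20 epochs without improvement
    lr = scheduler.step(1.0)
```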

Evaluation metrics
We evaluate the performance of the proposed method by using recall, precision, overall accuracy (OA), and intersection-over-union (IoU) metrics. The recall metric reflects the probability that a surface water pixel is correctly classified, and the precision metric reflects the probability that a pixel classified as surface water is correct. The IoU represents the area of overlap divided by the area of union between the prediction area and the actual area. The OA metric measures classification quality by combining both the surface water and surface land classification accuracies.
IoU is one of the most basic metrics for evaluating the performance of image semantic segmentation, and it can be calculated with

$$\mathrm{IoU} = \frac{P_{water} \cap T_{water}}{P_{water} \cup T_{water}} \qquad (5)$$

where $P_{water}$ and $T_{water}$ represent the prediction area and the ground truth area for the surface water body.
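The four metrics can be computed from a predicted and a reference binary mask as sketched below. The function name `water_metrics` is hypothetical, and the sketch assumes that both masks contain at least one water pixel so that no denominator is zero.

```python
import numpy as np

def water_metrics(pred, truth):
    """Recall, precision, OA, and IoU for binary water masks (1 = water)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)      # water predicted as water
    fp = np.sum(pred & ~truth)     # nonwater predicted as water
    fn = np.sum(~pred & truth)     # water predicted as nonwater
    tn = np.sum(~pred & ~truth)    # nonwater predicted as nonwater
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "oa": (tp + tn) / pred.size,
        "iou": tp / (tp + fp + fn),
    }

pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
metrics = water_metrics(pred, truth)  # tp = 2, fp = 1, fn = 1, tn = 2
```

In this toy case the IoU is 2 / 4 = 0.5 while the OA is 4 / 6, which shows how OA can stay high when nonwater pixels dominate even if water pixels are missed.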

Accuracy assessment for the proposed method
The proposed GMNet is evaluated based on the seven selected validation sites on the QTP (Figure 1). We selected two state-of-the-art SAR-based water extraction methods for comparative analysis. One is the MS-Deeplab method, which is proven to be better than the other mainstream methods, including PSPNet, U-Net, and the original DeepLabV3+ models (Wu et al. 2022), and the other is the HRNet method, which is proven to be better than the DenseNet121, SegNet, ResNet101, DeepLabV3+, OTSU, BTS, FCN, U-Net, and DeepUNet methods (Dong et al. 2021; Kim et al. 2021).
According to the accuracy assessment, the proposed GMNet achieved the highest overall accuracy (98.07%).
A detailed accuracy assessment for each validation site is performed to further illustrate the surface water mapping performance of the proposed GMNet method. As illustrated in Table 4, the proposed GMNet achieves high accuracies at the validation sites. Specifically, the mean values of the recall, precision, OA and IoU metrics are all above 88%. The accuracies vary with the specific conditions at the validation sites (e.g. landscape and topography). Among the validation sites, validation site 7 achieves the highest OA (> 99%) in surface water mapping. On the other hand, the model outputs also show serious misclassifications at validation site 4. Specifically, the low recall value indicates that there are many omission errors in surface water classification. Nonetheless, the model still achieves a high OA (97.10%) for the derived surface water map at validation site 4 due to the accurate classification of nonsurface water.

Ablation analysis of the proposed method
We evaluate the effectiveness of our method through ablation analysis. Generally, the novelty of our ConvNet includes the following: 1) we designed a multiscale ConvNet structure to integrate multiscale information for surface water mapping and 2) we introduced a gating mechanism to determine the feature importance of different scales. We perform an ablation analysis to evaluate the improvements that these novelties bring to surface water mapping; that is, a comparative analysis is performed among the gated multiscale ConvNet model, the multiscale ConvNet and the single-scale ConvNet to illustrate the effects of the multiscale ConvNet structure and the gating mechanism. We selected a popular DeepLabV3+ (Chen et al. 2018) …
The surface water bodies vary in type, size, and shape, and the background nonwater surface varies in land cover category and topography, thus making it challenging to accurately extract water bodies from Sentinel-1 images. For easily recognized scenes (e.g. Figure 8a, b, c, d, e), the single-scale, multiscale and gated multiscale models can correctly separate the surface water body from the background. For complex scenes, such as rugged terrain areas (Figure 8f, g, h, i, j), SAR images experience layover, foreshortening and shadow issues, which result in missing information on the Earth's surface. For this complex scene, the single-scale model performs poorly in surface water mapping, as shown in Figure 8h, and many omissions exist in the derived result. The multiscale model achieves improved results while still experiencing some omissions (Figure 8i). The gated multiscale model performs the best and obtains the fewest misclassifications in the produced surface water map (Figure 8j). Taking a large lake as an example (Figure 8k, l, m, n, o), the local scenes are fully occupied by a water body, and this situation leads to missing background information. Accordingly, it is challenging to determine the land cover category in the image. As shown in Figure 8m, since only the local-region patch is adopted for model training, the single-scale model performs poorly in surface water extraction. With the integration of the fine-, medium-, and coarse-scale information of the image, both the multiscale and gated multiscale models achieved high-precision surface water extraction.

Model performance evaluation with different image acquisitions
We applied both the ascending and descending Sentinel-1 images rather than only a single ascending or descending image for surface water mapping on the QTP region; therefore, the effectiveness of the combined images in surface water mapping was evaluated in this section. The combination of ascending and descending images provides more information to potentially improve the surface water mapping accuracy, while the data combination also brings redundant information and thus leads to uncertainty and computational burdens in data processing. To verify whether it is indeed effective in improving surface water mapping by simply combining ascending and descending Sentinel-1 images, we compared the performances of the proposed models trained on ascending images only, descending images only, and combined ascending and descending images. Since accuracy fluctuation exists in different training processes, we trained the model 10 times in correspondence to different image acquisition cases, and the comparison among different validation results was followed.
According to the average accuracy among the seven validation sites (Figure 9), the model trained on the combined ascending and descending images achieved the best performance, and the model trained on the ascending images only performed the worst. Nevertheless, the model trained on the combined ascending and descending images does not always outperform those trained on the ascending or descending images only. Among the seven sites, the model trained on descending images only outperformed the model trained on the combined ascending and descending images at sites 4 and 6.
We conduct qualitative analyses for sites 3 and 6, which correspond to the following two situations: 1) the model trained using combined ascending and descending images outperforms the model trained on the single-orbit image and 2) vice versa. As shown in Figure 10, site 3 has very rugged terrain, and the acquired image is covered with a large amount of shadow. Since the ascending and descending images provide information from different views and thus compensate for the missing information from each other in the shadow and layover regions, the model trained on the combined ascending and descending images is superior to the model trained on the single-orbit images only. Validation sites 4 and 6 are both relatively flat and thus are not greatly affected by shadows. We take site 6 as an example (Figure 10). The descending image shows high quality in separating the surface water from the nonsurface water background, while the ascending image is contaminated with noise, thus making surface water indistinguishable from the nonsurface water background. The obtained surface water maps follow human intuition and show that the descending image-based surface water map is comparable to the combined ascending and descending image-based surface water map, and both are superior to the ascending image-based surface water map. Since dense high mountains are distributed in the QTP region, in this study, the model trained on the combined images is regarded as the main model for surface water mapping. For the regions in which we could not acquire both ascending and descending images in a specific month, the model trained on single-orbit images was applied for surface water mapping.

A glimpse of the surface water in the QTP region
We train GMNet on all the Sentinel-1 scenes of the dataset, and the fully trained GMNet model is applied for month-by-month surface water mapping on the QTP. Accordingly, monthly surface water maps at a 10 m spatial resolution are generated. We calculated the surface water area in each month, and the seasonal variabilities in 2020 were obtained (Figure 11). The surface water area shows dramatic changes, with a maximum of 52,481.33 km² in August and a minimum of 24,585.62 km² in April. Specifically, from February to April, the surface water area on the QTP stays at a low level of less than 30,000 km². The surface water area increases in May and reaches its peak in August, and the surface water area remains stable above 45,000 km² from July to October. From the peak month of August, the surface water area continually decreases in the following months and returns to the lowest value in April. According to the surface water area trend obtained in this study, we divide one year into four periods corresponding to the surface water area on the QTP, that is, the low surface water period from February to April, the high surface water period from July to October, the positive surface water transition period from May to June, and the negative transition period from November to January.
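The conversion from classified pixels to area, and the four periods defined above, can be written down directly. The sketch assumes that every pixel covers exactly 10 m × 10 m on the ground, which ignores projection effects; the names `water_area_km2`, `PERIODS`, and `period_of` are hypothetical.

```python
def water_area_km2(n_water_pixels, pixel_size_m=10.0):
    """Convert a water pixel count to km^2, assuming square pixels of equal area."""
    return n_water_pixels * pixel_size_m ** 2 / 1e6

# Months (1 = January) grouped into the four periods described in the text.
PERIODS = {
    "low": (2, 3, 4),
    "positive transition": (5, 6),
    "high": (7, 8, 9, 10),
    "negative transition": (11, 12, 1),
}

def period_of(month):
    """Return the surface water period that a calendar month belongs to."""
    for name, months in PERIODS.items():
        if month in months:
            return name
    raise ValueError("month must be between 1 and 12")

area = water_area_km2(1_000_000)  # one million 10 m pixels cover 100 km^2
august = period_of(8)
```

At this pixel size, the August maximum of 52,481.33 km² corresponds to roughly 525 million water pixels.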
We visually analyze the spatial heterogeneity of surface water in the QTP region in this paper. To enhance the visualization effect, we aggregate surface water within 1° × 1° tiles. The monthly surface water distribution is illustrated in Figure 12. Generally, the surface water percentage stays within 30% in the different tiles, and the surface water is mainly distributed in the western and northern QTP (Figure 12a). A possible reason is that the northern and western regions are flat, and therefore, many lakes developed in these regions. We calculate the standard deviation of monthly surface water areas in each tile. As Figure 12b shows, tiles with more surface water show larger surface water fluctuations, and the maximum area change reaches 5% of one tile.
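The tile aggregation and the per-tile standard deviation can be sketched as follows. For simplicity, the sketch uses square blocks of pixels as a stand-in for 1° × 1° geographic tiles and the population standard deviation across months; both choices, and the name `tile_fraction`, are assumptions.

```python
import numpy as np

def tile_fraction(mask, tile):
    """Fraction of water pixels inside each tile x tile block of a binary mask."""
    rows, cols = mask.shape[0] // tile, mask.shape[1] // tile
    blocks = mask[:rows * tile, :cols * tile].reshape(rows, tile, cols, tile)
    return blocks.mean(axis=(1, 3))

# Two toy monthly masks (1 = water) on a 4 x 4 grid, aggregated to 2 x 2 tiles.
wet_month = np.array([[1, 1, 0, 0],
                      [1, 0, 0, 0],
                      [0, 0, 1, 1],
                      [0, 0, 1, 1]])
dry_month = np.zeros((4, 4), dtype=int)

fractions = np.stack([tile_fraction(m, 2) for m in (wet_month, dry_month)])
mean_fraction = fractions.mean(axis=0)  # average water fraction per tile
std_fraction = fractions.std(axis=0)    # month-to-month fluctuation per tile
```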
The QTP region mainly consists of 10 hydrological basins (Vörösmarty et al. 2010; Shean et al. 2020); accordingly, the surface water percentage and the standard deviation of the surface water areas for each hydrological basin are calculated. We use surface water maps in March and August to represent the surface water in the dry season and wet season, respectively. As shown in Figure 13, the Inner Tibetan Plateau basin has the highest surface water percentages in both the dry and wet seasons, and the Mekong basin has the lowest surface water percentages in both the dry and wet seasons. Specifically, the surface water percentages in the Inner Tibetan Plateau basin and Extended Inner Tibetan Plateau basin reach 2.44%, 1.45% and 0.59% in the dry season and 4.73%, 2.43% and 1.69% in the wet season, respectively. According to the standard deviation map in Figure 13, surface water fluctuates strongly in the Inner Tibetan Plateau basin, which shows the highest standard deviation of 0.89% of the surface water areas in one year. The surface water in the Mekong basin is stable, which is mainly due to the low surface water percentage in this region. We selected three local sites in different basins to visually analyze the landscape in different regions and periods. According to Figure 13, the Inner Tibetan Plateau basin site and Yellow basin site have high surface water coverage in the wet season, while in the dry season, many of these areas are frozen. The Yangtze Basin site region is mountainous and has little surface water coverage, and there are no significant differences in surface water coverage between the wet season and dry season.

Fine-scale analysis of surface water dynamics
For each pixel, we count the number of monthly surface water maps in which the pixel is classified as surface water; the surface water occurrence of the pixel is then obtained by dividing this count by the number of months in one year (Figure 14). To illustrate the surface water dynamics in the QTP region in detail, we select three local regions for the fine-scale analysis. Remote sensing images, including Sentinel-1 ascending and descending images and Sentinel-2 optical images acquired during the cold season and warm season, are also selected to visually analyze the fine-scale surface water variations in this section. We first perform a visual accuracy analysis of the obtained monthly surface water maps. According to visual inspection, the mapped surface water is highly consistent with the Sentinel-1 images, which demonstrates that the obtained monthly surface water maps are of high quality and are applicable for seasonal surface water dynamic analysis in the QTP region.
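The occurrence computation can be written in a few lines. The sketch below assumes the monthly maps are co-registered binary arrays of equal shape; the function name is hypothetical.

```python
import numpy as np

def water_occurrence(monthly_masks):
    """Per-pixel fraction of months in which the pixel is mapped as water."""
    stack = np.stack(monthly_masks).astype(float)  # shape: (months, H, W)
    return stack.sum(axis=0) / stack.shape[0]
```

A pixel mapped as water in all 12 months has an occurrence of 1, while a pixel that is frozen or dry for half of the year has an occurrence of 0.5.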
The selected region 1 is located in the southern QTP region and contains several lakes, such as Dung Tso, Pung Tso, and Sam Tso. As demonstrated in Figure 15, the lakes in this region show different temporal variations; that is, Pung Tso Lake and Dung Tso Lake are relatively stable throughout the year, while Drong Tso Lake and part of Sam Tso Lake show drastic changes within one year. According to the Sentinel-1 ascending images, Pung Tso Lake and Dung Tso Lake maintain similar landscapes in both the cold and warm seasons, while Drong Tso Lake and part of Sam Tso Lake show surface water landscapes in the warm season and nonsurface water landscapes in the cold season. Thus, Drong Tso Lake and part of Sam Tso Lake are frozen in the cold season. This region is highly representative of most regions of the QTP; that is, the freezing of surface water is the main reason for surface water fluctuations. Rivers constitute part of the surface water in the QTP region; therefore, river dynamics are also investigated in detail in this study. As shown in Figure 16, the selected region 3 contains two rivers: the Yellow River and the Peihe River. Similar to most seasonal surface water bodies in the QTP region, the rivers freeze in the cold season and then melt in the warm season.
However, there are also surface water changes caused by drought and evaporation in specific regions. As shown in region 2 (Figure 17), the surface water at the top of East Taiji Nai'er Lake is frozen in the cold season, the frozen water melts in the warm season, and the surface water increases accordingly from the cold season to the warm season. In contrast, the bottom of East Taiji Nai'er Lake is filled with surface water during the cold season, while the water dries up during the warm season, leading to a decrease in surface water from the cold season to the warm season. A similar finding was reported by Duan (2018), who mentioned that both West Taiji Nai'er Lake and East Taiji Nai'er Lake underwent drought over time. According to the fine-scale analysis above, the surface water dynamics were usually caused by water freezing in the cold season and ice melting in the warm season. However, the surface water dynamics also show abnormal trends due to drought and evaporation in the QTP region.

Conclusion
We proposed a novel deep learning model for surface water mapping based on SAR images. The novelty of the proposed method is that 1) we structured a new convolutional network for multiscale feature learning and 2) we introduced a gating mechanism for more efficient surface water identification by combining multiscale features. According to the validation experiments, the proposed deep learning model achieved high-accuracy surface water mapping; that is, the mean values of the recall, precision, IoU and OA metrics were all above 88%. In a comparative analysis of methods, the newly structured multiscale network and the gated learning mechanism showed significant improvements in surface water mapping. We evaluated the models trained on different Sentinel-1 data acquisitions and showed that the model trained on the combined ascending and descending images outperformed the model trained on the descending images only, which in turn outperformed the model trained on the ascending images only.
The proposed GMNet model achieved high-accuracy surface water mapping in the QTP region. With the generated monthly surface water maps, the spatiotemporal variations in surface water were explored. According to the dynamic trend in the surface water coverage, we divided one year into four surface water-related periods, that is, the high surface water period from July to October, the negative transition period from November to January, the low surface water period from February to April, and the positive transition period from April to July. Spatially, the surface water was mostly distributed in the western and northern Tibetan Plateau region, which is relatively flat and, therefore, has developed many lakes. The monthly surface water change showed high consistency with the surface water coverage; that is, the regions with high surface water coverage varied greatly over time.

Figure 1. Study area and the locations of data for model training and validation. The Shuttle Radar Topography Mission (SRTM) elevation data (Farr and Kobrick 2000) are used as the base map.

Figure 2. Space-time distribution of the acquired Sentinel-1 images. a. Temporal distribution of acquired Sentinel-1 images in ascending and descending tracks. Higher color transparency indicates that fewer images were collected on a specific day of the year. b. Spatial distribution of ascending observations. c. Spatial distribution of descending observations.

Figure 3. Demonstration of the Sentinel-1 SAR image-based surface water dataset.Sentinel-1 scene is visualized with a color composite of the VV band (R)-VH band (G)-VV band (B), and the reference Sentinel-2 scene is visualized with a false-color composite of NIR (R)-Red (G)-Green (B).
1) Random color jitter: We perform random color jitter by following f(x) = a · f(x) + b, and the contrast and brightness parameters a and b are randomly generated within 0∼0.05 during data augmentation.
2) Random rotation and flipping: As commonly used data augmentation methods, random horizontal and vertical flipping and random rotation at angles of [90°, 180°, 270°] are applied.
3) Random noise: We add Gaussian noise to the training data during the training process, and the random standard deviation of the Gaussian noise is within 0∼0.1.
4) Random band missing: We perform band missing augmentation for the Sentinel-1 images: the ascending bands or descending bands are randomly discarded in a randomly determined region.
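The four augmentation steps listed above might be combined as in the following sketch. It is an illustration under several assumptions that the text does not confirm: the image is a float array of shape (4, H, W) with the band order [ascending VV, ascending VH, descending VV, descending VH]; the 0∼0.05 range is read as a perturbation around the identity transform (a = 1 + δ), since a literal a in that range would nearly erase the image; discarded bands are filled with zeros; and each flip is applied with probability 0.5. The function name `augment` is hypothetical.

```python
import numpy as np

def augment(image, label, rng):
    """Apply the four augmentations to one (image, label) training pair.

    image: float array (4, H, W); label: array (H, W).
    rng:   a numpy.random.Generator.
    """
    # 1) Random color jitter, f(x) = a * f(x) + b.
    #    Assumption: a is sampled as 1 + delta with delta in [0, 0.05].
    a = 1.0 + rng.uniform(0.0, 0.05)
    b = rng.uniform(0.0, 0.05)
    image = a * image + b

    # 2) Random flipping and rotation, applied identically to the label.
    if rng.random() < 0.5:  # horizontal flip
        image, label = image[:, :, ::-1], label[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, label = image[:, ::-1, :], label[::-1, :]
    k = int(rng.integers(0, 4))  # 0 keeps the orientation; 1..3 = 90, 180, 270 degrees
    image = np.rot90(image, k, axes=(1, 2))
    label = np.rot90(label, k)

    # 3) Gaussian noise with a random standard deviation in [0, 0.1].
    sigma = rng.uniform(0.0, 0.1)
    image = image + sigma * rng.normal(0.0, 1.0, image.shape)

    # 4) Random band missing: zero out the ascending or the descending
    #    bands inside a random rectangle (zero fill is an assumption).
    h, w = label.shape
    y0, y1 = sorted(int(v) for v in rng.integers(0, h + 1, size=2))
    x0, x1 = sorted(int(v) for v in rng.integers(0, w + 1, size=2))
    bands = slice(0, 2) if rng.random() < 0.5 else slice(2, 4)
    image[bands, y0:y1, x0:x1] = 0.0

    return np.ascontiguousarray(image), np.ascontiguousarray(label)
```

Geometric transforms are applied to the image and the label together so that the water mask stays aligned with the SAR bands, whereas the radiometric steps (jitter, noise, band missing) touch only the image.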
model to represent the single-scale ConvNet. Because accuracy fluctuations exist in different model training processes, we train each model 10 times, and the statistical results of each metric are obtained. We use the IoU and model loss values to monitor the model performance during model training. As illustrated in Figure 7, the model performance becomes stable when the training epoch reaches 200, and the proposed gated multiscale model maintains the best performance throughout the subsequent training. More specifically, according to the validation accuracy and model loss metrics, the multiscale model outperforms the single-scale model, and the gated multiscale model outperforms the multiscale model, which demonstrates that both the multiscale model structure and the gating mechanism achieved significant improvements in SAR-based surface water mapping.
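For reference, the recall, precision, IoU and OA metrics reported for binary water maps can be computed from the confusion counts as sketched below. These are the standard definitions; this section does not restate the exact formulas used in the study, so the sketch should be read as an assumption about them. The function name is hypothetical.

```python
import numpy as np

def water_metrics(pred, truth):
    """Recall, precision, IoU and overall accuracy for binary water maps."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # water correctly mapped as water
    fp = np.sum(pred & ~truth)   # non-water mapped as water
    fn = np.sum(~pred & truth)   # water missed by the model
    tn = np.sum(~pred & ~truth)  # non-water correctly mapped
    # Note: recall, precision and IoU are undefined when a scene
    # contains no water in both maps (division by zero).
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "iou": tp / (tp + fp + fn),
        "oa": (tp + tn) / (tp + fp + fn + tn),
    }
```

When each model is trained 10 times, the per-run values of a metric can be summarized with `np.mean` and `np.std` to obtain the kind of statistical results mentioned above.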

Figure 7. Comparison of model performances during model training. (a) and (b) correspond to the validation accuracy and model loss value metrics, respectively.

Figure 8. Illustrations of surface water mapping for the easily recognized scene, complex scene, and large-size water body scene.Panels a, f and k are the Sentinel-1 images for the three scenes.The green and red boxes represent a larger-scale scene and the location of the visualized scene in the larger-scale scene.Panels b, g and l are the Sentinel-2 images used for references.Panels c, h and m are the results derived by the traditional single-scale deep learning model.Panels d, i and n are the results derived by the multiscale deep learning model.Panels e, j and o are the results derived by the gated multiscale deep learning model.

Figure 9. Comparison among models trained on ascending images only, descending images only, and combined ascending/descending images.

Figure 10. Comparisons of the surface water maps for validation site 3 that are derived from the models trained with different Sentinel-1 image acquisitions. a and f are Sentinel-1 ascending images; b and g are Sentinel-1 descending images; c and h are water maps derived from the model trained with ascending images only; d and i are water maps derived from the model trained with descending images only; e and j are water maps derived from the model trained with both ascending and descending images.

Figure 11. Monthly surface water area statistics for the QTP.

Figure 12.Spatial heterogeneity of the monthly surface water in the QTP region.a. the spatially resolved monthly surface water maps, and b. the surface water fluctuation map for the QTP region.The fluctuation is quantified by using the standard deviation metric.

Figure 13. The dynamic surface water features of hydrological basins in the QTP region. a and b represent the surface water percentages of the hydrological basins in the dry season and wet season, respectively. c shows the standard deviation of the surface water areas in each hydrological basin. d, e, and f correspond to local sites in the Inner Tibetan Plateau, Yellow, and Yangtze basins, respectively. L8 and S2 represent the Landsat 8 and Sentinel-2 optical images, respectively.

Figure 14.Surface water occurrence map in the QTP region.

Figure 15.Surface water occurrence map for the selected region 1.The green and orange circles represent mutable and stable regions, respectively.

Figure 16. Surface water occurrence map for the selected region 3. The green circle labels a mutable region.

Figure 17. Surface water occurrence map for the selected region 2. The green and orange circles represent mutable and stable regions, respectively.

Table 1. ConvNet structure in this study. 'In', 'Ex', and 'Out' represent the numbers of input, exchange, and output channels, respectively.

Table 2. Layers of the Dsample and Upsample modules.

As shown in Table 3, the proposed GMNet obtained a slightly lower precision of 97.59% compared with the highest precision of 97.72% obtained by MS-Deeplab. However, the proposed GMNet achieved the highest recall of 90.50%, IoU of 88.51% and OA of 98.07%; compared with the second-highest recall of 85.27%, IoU of 83.52%, and OA of 97.49% obtained by the other methods, the newly proposed GMNet method shows significant superiority over the existing methods.

Table 3. Accuracies of the surface water map derived by GMNet and the comparison methods.

Table 4. Accuracies of the surface water map derived by GMNet for the validation sites.