A novel fine-resolution snow depth retrieval model to reveal detailed spatiotemporal patterns of snow cover in Northeast China

ABSTRACT Seasonal snow cover is a key component of the global climate and hydrological system, it has drawn considerable attention under global warming conditions. Although several passive microwave (PMW) snow depth (SD) products have been developed since the 1970s, they inherit noticeable errors and uncertainties when representing spatial distributions and temporal changes of SD, especially in complex mountainous regions. In this paper, we developed a fine-resolution SD retrieval model (FSDM) using machine learning to improve SD estimation quality for Northeast China and produced a long-term, fine-resolution, daily SD dataset. The accuracies of the FSDM dataset were evaluated against in-situ SD data along with existing SD products. The results showed the FSDM dataset provided satisfactory inversion accuracy in spatiotemporal evaluation, with the root-mean-square error (RMSE), bias, and correlation coefficient (R) of 7.10 cm, −0.13 cm, and 0.60. Additionally, we analyzed the spatiotemporal variations of SD in Northeast China and found that snow cover was mainly distributed in the Greater Khingan Range, Lesser Khingan Mountains, and Changbai Mountain regions. The SD exhibited high-low distribution patterns with the increased latitude. The annual mean SD slightly increased at the rate of 0.029 cm/year during 1987-2018.


Introduction
Seasonal snow cover, a meteorological component of the cryosphere, has important implications for surface energy balance because of its high reflectivity and low heat conduction (Bhatti, Koike, and Shrestha 2016;Estilow, Young, and Robinson 2015).Many previous studies indicated that seasonal snow cover has many impacts on hydrology, agriculture, ecology, and human activities (Iwata et al. 2010;Qin et al. 2020;Zhang and Ma 2018).Additionally, snowmelt is a valuable freshwater resource feeding river and lake and has a significant influence on soil moisture, soil drought, and flooding in spring (Pulliainen et al. 2020;Yang et al. 2020b).Being one of the main snow-covered areas in China, Northeast China is strongly sensitive to climate change and is an important region for agricultural production (Li et al. 2014;Li et al. 2022b;Qi et al. 2020).Therefore, it is essential to understand the temporal variations and spatial distribution patterns of the snow cover in Northeast China.The snow depth (SD) indicates the amount of snow cover for a given region.A high-quality and long-term SD dataset is a key requirement for the monitoring of climate change and the analysis of snow dynamics (Hori et al. 2017;Luojus et al. 2021).
Various long-term SD datasets have been developed for climate change monitoring, hydrologic research, and flood monitoring, including ground weather station observations, land surface modeling, and remote sensing inversion (Matiu et al. 2021).However, ground weather observations impede the understanding of SD spatial changes in continuous areas because the required number of stations is lacking (Dai and Che 2014;Estilow, Young, and Robinson 2015).Although land surface modeling can provide long-term SD datasets on a regional scale, the coarse spatial resolution of the datasets and extensive input parameters of the models cause significant errors in SD assessments (Zhang et al. 2022).In the preparation of snow cover estimates employing satellite remote sensing, optical remote sensing satellites can be used to extract snow cover under clear weather conditions (Hao et al. 2021;Huang et al. 2016); however, SD retrieval is difficult using visible and infrared remote sensing (Dai et al. 2017;Derksen et al. 2010).Because passive microwaves (PMW) can penetrate cloud cover and interact with snow particles, PMW remote sensing has been widely used to estimate SD since the 1970s (Chang, Foster, and Hall 1987;Foster, Chang, and Hall 1997;Kelly et al. 2003).Chang et al. (Chang, Foster, and Hall 1987) were the first to propose an SD inversion algorithm by taking advantage of the difference between the horizontally polarized TB of the 19 and 37 GHz channels.Foster et al. (Foster, Chang, and Hall 1997) improved Chang's algorithm and applied the novel algorithm in North America and Eurasia.Kelly et al. (Kelly 2009) developed a dynamic SD retrieval algorithm based on the difference between the vertical and horizontal polarization information.An advanced assimilation method has been proposed to assimilate ground SD observations with PMW TBs and produce daily snow-water equivalent datasets in the Northern Hemisphere (Pulliainen et al. 2020).A more accurate SD dataset has been also developed by Che et al. (Che et al. 2008) in China, and the SD product is widely used for the evaluation and analysis of SDs in China (Barnett, Adam, and Lettenmaier 2005;Che et al. 2016;Dai and Che 2014;Wei et al. 2021a;Wei et al. 2021b).
Although several SD inversion algorithms based on PMWs have been developed since the 1970s, and many available PMW SD products have been produced, they are affected by multiple factors, such as coarse spatial resolutions, snowpack metamorphism, and vegetation cover.These interference factors cause notable biases and impede the explicit characterization of the spatiotemporal variations of the snow cover, especially in complex mountainous regions (Gu, Ren, and Li 2016;Li, Durand, and Margulis 2012;Slater et al. 2013;Wang et al. 2019).Many previous studies made considerable efforts to improve the existing PMW SD data.For example, Yan et al. (Yan, Ma, and Zhang 2021) developed a fine-resolution SD product for use in Tibetan Plateau by combining the existing PMW SD data with a snow cover probability product of a high spatial resolution, the accuracy of the fine-resolution SD product obtained depending on the original PMW SD data, which ignored the influence of snowpack metamorphism and surface environment on the data.To reduce the influence of complex environmental factors on SD inversion, Wei et al. (Wei et al. 2022b) redistributed the PMW SD dataset (AMSR2) and built a multifactor SD downscaling model by introducing several factors (e.g.geolocations, land cover fractions, and topographical features).However, this method can calculate only historical SD data and it cannot process real-time SD data.Dai et al. (Dai et al. 2018) proposed a dynamic, high spatial resolution SD inversion model to overcome the impact of coarse spatial resolution and snowpack metamorphism on SD assessments in the Tibetan Plateau.However, the application of the model in high vegetation coverage areas, such as Northeast China, is difficult.Machine learning models also have been used to retrieve SD in the past decade.Yang et al. (Yang et al. 2020a) investigated the potential capability of the machine learning method in SD estimation and reconstructed a new SD dataset over China.Xu et al. (Xu et al. 2022) present a spatiotemporally dynamic SD retrieval algorithm based on the BP-ANN model to account for the heterogeneity of snow cover in the global region.Hu et al. (Hu et al. 2022) developed a data fusion method based on machine learning models by combining multiple PMW SD datasets, but the coarse spatial resolution of the SD datasets has failed to improve.
The abovementioned studies have shown that seasonal snow cover has become an interesting research topic for many scholars.However, a few studies have focused on the spatiotemporal patterns of fine-resolution SD, and most studies have focused only on algorithm improvement.Affected by global warming recently, spring drought has occurred frequently in Northeast China (Li et al. 2022a;Li et al. 2022b), previous SD datasets lack practical significance to guide the development of crop growth, agricultural production, and water resource management because of the continuity of data and the low resolution.It is necessary to develop a high-quality SD dataset for monitoring and modeling hydrological processes.Therefore, the objective of this study was to develop an accurate SD retrieval model and produce a high-quality, long-term, fine-resolution, daily SD dataset.Compared with previous studies that focused on improving existing PMW SD products, our study was novel because it adopted a zoning modeling strategy to overcome the snow cover heterogeneity in different land types and produce a fine-resolution SD dataset.The spatial and temporal changes of the SD in Northeast China were also subsequently investigated during 1987-2018.We anticipate that the novel SD dataset will contribute to climate change monitoring and hydrological research in Northeast China.

Study area
The location of Northeast China (115°30 ′ E-135°02 ′ E and 38°43 ′ N-53°36 ′ N) is shown in Figure 1.Northeast China is the main snow cover region in China (Qi et al. 2020), the climate is controlled by the high-latitude East Asian monsoon, which includes warm temperate and cold temperate zones latitudinally, as well as semi-arid, semi-humid and humid zones from west to east.Winter is long and cold, with an annual average temperature range of 3-7°C.Besides, rainfall is generally concentrated from June to August (400-800 mm/year) (Li et al. 2022a).The terrain of Northeast China is complex and diverse, surrounded by mountains on three sides and high in elevation, with the Changbai Mountains in the east and the Lesser Khingan Mountains and Great Khingan Mountains in the north and west, respectively.The Sanjiang Plain, Songnen Plain and Liaohe Plain with flat terrain have fertile black soil and are important commodity grain bases in China.Furthermore, various types of land cover, such as farmland, grassland, and forests, are found in Northeast China.According to statistics, the forest cover in Northeast China exceeds 40% of its total area (Figure 1), which poses a serious challenge to SD inversion (Gu, Ren, and Li 2016;Guangrui et al. 2021).

Ground observation data
(1) Meteorological station observations (Dataset 1) The National Meteorological Information Centre, China Meteorology Administration (CMA, http://data.cma.cn/en)operates 91 ground meteorological stations in Northeast China.The spatial distribution of the stations is shown in Figure 1(a).The variables of the stations, including the name, geolocation (latitude and longitude), daily near-surface temperature (°C), daily SD (cm), and observation date, were collected.The SDs observed at daily near-surface temperatures below 0°C (273 K) were selected during the period 1987-2018.Accordingly, a total of 97,200 independent SD observations were collected during the study period during 2010-2018 and used to train the developed SD model.Moreover, a total of 201,696 independent SD observations made during the period 1987-2009 were used as 'true' data to estimate the accuracy of the developed model.
(2) Snow route observation data (Dataset 2) Dataset 2 relates to the SD samples used in the field snow survey experiment.Two snow survey routes (Figure 1(a)) were conducted between December 2017 and March 2018 in Northeast China under the Chinese snow survey project.A total of 102 snow samples were obtained from snow route 1 (Figure 1(b)), which were mainly distributed in farmland, grassland, and forest areas.Snow route 2 (Figure 1(c)), which included 60 snow samples, was located in the Lesser Khingan Mountains and Changbai Mountains.Dataset 2 was also used as the 'ground-truth' data for verification of the performance of the proposed model.

Enhanced resolution PMW data
The Calibrated Enhanced Resolution Brightness Temperature (CETB) data are available at https://nsidc.org/data/NSIDC-0630/versions/1 from 2018.They have been provided by the NASA Making Earth System Data Records for Use in Research Environments program (Brodzik, Long, and Hardman 2018).The original coarse-resolution TB data were gridded to obtain fine-resolution TB data using the scatterometer image reconstruction algorithm.The 19 and 22 GHz channels were reconstructed at a 6.25 km × 6.25 km spatial resolution, while the 37, 89, and 91 GHz channels were reconstructed at a 3.125 km × 3.125 km spatial resolution.The continuous CETB data from SSM/S and SSMIS radiometers could be downloaded from 1987.The CETB data at 19 and 37 GHz during the period from 1987 to 2018 were used in the study.

Other auxiliary data
It is important to understand spatiotemporal variations of snow cover characteristics (including snow density and snow grain size) in SD retrieving (Xu et al. 2022;Yang et al. 2021).In this study, the ERA5-land data with 10 km spatial resolution were employed to obtain the dynamic snow density.The air and soil temperatures obtained from ERA5-land data were used to indirectly characterize spatiotemporal variations of the snow grain size (see Section 3).The hourly ERA5-land data can be obtained at https://cds.climate.copernicus.eu.In addition, the land cover data with 500 m spatial resolution were required at https://search.earthdata.nasa.gov.The digital elevation model (DEM) data with 90 m spatial resolution were used in this paper, which could be obtained at http://srtm.csi.cgiar.org.

Main variables' preparation
The variables used in the study included ground in situ SD, CETB, snow cover characteristics, and environmental factors.We have listed these variables and their sources and contributions in Table 1.
Compared with the original PMW TB data at a 25 km resolution, the CETB data provide snowpack scattering information at a high spatial resolution (6.25 km × 6.25 km).Xiao et al. (Xiao et al. 2021) proved that the CETB data could provide a reliable data source for snow cover fraction assessment at a fine scale.We attempted to use these data to retrieve SD, improve the uncertainty caused by the coarse resolution, and restructure a fine-resolution SD dataset.Considering the influence of wet snow and the sensitivity of snow cover, only descending CETB data with horizontal (H) polarization (F08, F11, F13, and F17) were used in the study.Furthermore, to correspond to the spatial resolution at 19 GHz, the CETB at 37 GHz was aggregated to 6.25 km by averaging the surrounding four pixels.
The accuracy of SD assessment using PMW remote sensing is strongly related to snow characteristics, especially to snow grain size and density (Dai et al. 2022;Gao et al. 2022).However, temporally continuous and spatially distributed snow characteristics are usually unavailable at large spatial scales.According to ERA5-land data, daily snow density data can be employed as an input for the developed SD model.Besides, the temperature gradient (TG), expressed using Equation (1), can be used to indirectly characterize the spatiotemporal variations of the snow grain size.Previous studies have also shown that the TG is helpful a priori information for understanding the spatiotemporal distribution of snow grain size on continental scales (Xu et al. 2022).
where TG d , Tsoil d , and Tair d represent the daily temperature gradient in snow cover, soil temperature in the upper layer, and air temperature, respectively, they could be extracted from hourly ERA5-land data.The snow density and TG were resampled to the 6.25 km spatial resolution using the nearest-neighbor interpolation method.Vegetation-covered regions, especially in mountainous regions with complex terrain, are the main cause of errors found in SD retrieval (Che et al. 2016;Li and Kelly 2017).To overcome vegetation-related uncertainties, in our study, we calculated the forest fraction (FF) within each pixel of the CETB according to land cover type data.The FF maps from 2001 to 2016 were averaged into a single map to denote the average FF.In addition, the original DEM data were mosaicked and resampled to 6.25 km using nearest-neighbour interpolation.Subsequently, topographic factors, including elevation, slope, aspect, and roughness, were extracted using ArcGIS 10.4 to reduce the uncertainties of the complex terrain.

Random forest regression model
In our previous studies, we discussed the performance of different machine learning models, including support vector regression, XGboost, and RF regression, used in SD retrieval, and the results indicated that the RF regression model considerably outperformed the other machine learning regression models (Wei et al. 2022a;Wei et al. 2022b).The RF is an ensemble model based on decision trees.It can untangle the complex nonlinear relationship between the response variable and the prediction variables and was developed by Breiman (Breiman 2001).Before employing the RF regression model, the number of decision trees in the RF had to be typically defined.As stated by Yang et al. (Yang et al. 2020a), the number of decision trees was set to 1000 in this study.The other system parameters in the model were set to their default values.The RF regression model was obtained using the R package.

Development of the fine-resolution SD model
Because SD inversion is complex and affected by many factors, such as coarse spatial resolution, snow metamorphism, and environmental factors, we directly used the RF regression model to establish the nonlinear relationship between SD and relevant variables, including CETB, snow density, TG, FF, and topographical features (elevation, slope, aspect, and roughness).Previous studies have shown that the single SD retrieval model did not perform satisfactorily in Northeast China, especially in forest regions (Gu, Ren, and Li 2016;Wei et al. 2021b;Yang et al. 2020b).In this study, the developed SD retrieval model (called fine-resolution SD model, FSDM), was designed to be calibrated individually in forest regions to account for the spatiotemporal heterogeneity of the prediction variables.As shown in Equation (2) that the FSDM is divided into two parts based on the RF regression model: the non-forest region (FF ≤ 0.10) and the forest region (FF > 0.10).Dataset 1 of the period 2010-2018 was regarded as the response variable in the RF model for the training model, and the abovementioned variables were considered the input variables.The selection of the input variables will be explained in section 5.The main structure of the study is illustrated in Figure 2.

Bias
where SD 0 and SD x represent the observed SD and retrieved SD, respectively, N is the number of SD samples.To show the spatiotemporal patterns of SD variation in Northeast China between 1987 and 2018, Sen's slope estimator and the Mann-Kendall test (Gocic and Trajkovic 2013) were employed to analyze the SD variation trend.−0.13 cm, and 0.60, respectively.The RMSE, bias, and R of the WESTDC SD product were 8.08 cm, −0.62 cm, and 0.43, respectively (Figure 3(b)).Furthermore, the error bar in Figure 3 indicates that two SD products underestimated observed SD under deep snowpacks.To explore the performance of two SD products under deep snowpack conditions, we selected 25 cm as the threshold to assess the performances in shallow (≤25 cm) and deep (>25 cm) snowpacks.Table 2 displays the comparison between FSDM data and the WESTDC product in deep and shallow snow covers.In a shallow snowpack, the overall accuracy of the FSDM dataset was higher than that of the WESTDC product, the RMSE, bias, and R were 6.36, 0.51 cm, and 0.62, respectively.In the deep snowpack, the FSDM dataset and WESTDC product demonstrated significant underestimation with a bias of −12.42 cm and −13.60 cm, respectively.However, when we checked the SD observation samples, we found that shallow snowpack conditions account for approximately 95% of total SD samples, and deep snow conditions account for approximately only 5% of all samples in Northeast China.

Validation and comparison of the FSDM dataset
Figure 4 shows the spatial distribution of RMSEs of the FSDM dataset and WESTDC product from the validations against the observed SD data at 91 meteorological stations in Northeast China.The results suggested that the RMSEs of the FSDM dataset were lower than those of the WESTDC SD product in most stations, the FSDM dataset effectively improved the accuracy of SD inversion in mountainous areas (Figure 4(a)).However, we found that the FSDM dataset and WESTDC product still had higher RMSE in forest areas, and the RMSE of WESTDC products was higher than that of the FSDM dataset.We compared the SD retrieval accuracy of the FSDM dataset and WESTDC product in forest and non-forest areas, as shown in Table 3.The results have shown that the overall accuracy of the FSDM dataset and WESTDC product was lower in forest regions than in non-forest regions.However, the overall accuracy of the FSDM dataset was higher than that of the WESTDC product, with the RMSE, bias, and R of 6.56 cm, −0.41 cm and 0.69 in non-forest regions, with the RMSE, bias, and R of 7.78, 0.24 cm, and 0.47 in forest regions.
To investigate the stability of the FSDM dataset, we analyzed the agreement between the retrieved SD and observed SD based on snow survey routes.Figure 5 shows the results of the validation of the two SD products against Dataset 2, the results from Figure 5(a) and (b) show that the FSDM dataset had better stability compared with the WESTDC product because of considering the spatiotemporal variation of snowpack characteristics, with RMSE, bias, and R were 8.71 cm, −5.12 cm, and 0.57, respectively.The WESTDC product had large uncertainties, with RMSE, bias, and R of 12.19 cm, −6.07 cm, and 0.12, respectively.Furthermore, we analyzed the performance of the FSDM dataset and WESTDC product in time series.Figure 5(a') and (b') represent the scatter diagrams of the FSDM dataset and WESTDC product from the validations against snow routes in time series, the results indicated that the two products had better consistency with the observed SD during snow stable period than accumulation and ablation periods.This is mainly because the snowpack was dry in the stable period, the snow characteristics were relatively stable, and the surface radiation signal was less disturbed by the snowpack characteristics.

Spatial patterns of the SD from the FSDM datasets
The abovementioned results suggested that the FSDM dataset had satisfactory accuracy and stability.In this section, we analyze the spatial patterns of the FSDM dataset during the period 1987- 2018.Figure 6 presents the spatial distributions of the monthly SDs in Northeast China.The results show that the snow cover is mainly distributed in the Greater Khingan Range, Lesser Khingan Mountains, and Changbai Mountain regions, and the SD exhibited high-low distribution patterns with latitude.The snow cover appeared in the Greater Khingan Range in October and disappeared in April in most regions.The maximum SD value occurred in the Greater Khingan Range in February.In the whole snow season, the SD in Liaohe Plain was the smallest, because the air temperature was high in Liaohe Plain, resulting in late snowing and early melting.Figure 7 shows the spatial distribution of the multiyear (1987-2018) mean monthly SD change trends from October to April.From a visual perspective, the spatial distribution of the SD change trends had obvious heterogeneity in Northeast China, and the changing trend of SD in different months also had significant differences.For example, in October, January, February, and April, the changing trends of SD decreased in most parts of the Changbai Mountains, whereas the changing trends of SD increased in November, December, and March.Except for October, the changing trends of the mean monthly SD decreased in the north of Northeast China.However, Figure 8 shows that the change trends of the monthly SD were not significant in most parts of Northeast China.

Temporal variation of SD from the FSDM dataset
Figure 9 shows the interannual variations and linear trends of the monthly SDs during 1987-2018.
In the case of interannual variations, the fluctuation range of annual average SD tends to be stable, for example, the interannual mean SD fluctuated between 3 and 5 cm in October (Figure 9   average SD was significant in January (6-18 cm) and February (4-17 cm).In the case of change trends, except for February, the interannual mean SD in other months showed an increase trend, and the increase trend was highest in March with a rate of 0.036 cm/year (p > 0.05).The annual mean SD increased slightly with a rate of 0.007 in October.In February, the annual mean SD decreased slightly with a rate of −0.01 cm/year (p > 0.05).However, the variation trends of the mean monthly SD were not significant during 1987-2018.

The selection of the input variables in FSDM
In the FSDM, nine types of input variables, namely, TB at 37H and 19H, FF, TG, snow density, elevation, slope, aspect, and roughness are selected as the input variables for SD retrieval.Each type of input variable can provide a priori knowledge from specific perspectives, as shown in Table 2. Figure 10(a) and (b) demonstrate the importance of the input variables to FSDM performance in non-forest and forest regions, respectively.The results indicate that all input variables contribute to the improvement of the SD accuracy of the FSDM at varying degrees.However, to determine the optimal input features and make the training procedure simple without sacrificing prediction accuracy, the redundant predictor variables have to be filtered out.Accordingly, we applied the 10-fold CV strategies to the training dataset to demonstrate FSDM performance.First, only TB at 19 and 37 H without auxiliary data were used as the input variables to test the effectiveness of the FSDM.Thereafter, we added different types of variables to the FSDM and tested its performance.If during this process, the RMSE decreased and R increased, then the selected input variables are considered to be effective for SD assessment.Tables 4 and 5 show the performance of the FSDM based on different feature combinations in non-forest and forest regions, respectively.The RMSEs of the M1 are 6.30 and 8.21 cm in non-forest and forest regions, respectively, and R was less than 0.60.When the FF factor was included, the FSDM demonstrated better performance in forest regions than when the FF factor was excluded, as shown in Table 5 for M2.However, the FF factor does not effectively improve the FSDM performance in non-forest regions and may even play a negative role (M2 in Table 4).When we included the variables TG, FF, snow density, elevation, and slope in the FSDM individually, as shown for M6 in Tables 4 and 5, the SD estimation accuracy based on FSDM showed a significant improvement, with the RMSE decreasing by ∼ 2 cm in the non-forest and forest regions, respectively.The inclusion of the two variables, aspect and roughness, does not improve FSDM performance in non-forest  regions (M7 and M8 in Table 4), but the inclusion of roughness contributed to improving the SD estimates in forest regions (M8 in Table 5).Accordingly, as the input variables of FSDM, we selected TB, TG, snow density, elevation, and slope for the non-forest regions and TB, FF, TG, snow density, elevation, slope, and roughness for the forest regions, as shown in Equation (2).

Detailed advantages of the FSDM
In the studies mentioned, the accuracy and stability of the FSDM dataset were verified and the spatiotemporal changes in Northeast China were analyzed using the FSDM dataset.In this section, we compared the temporal and spatial changes of SD using FSDM and WESTDC datasets in Northeast China and discussed the differences between FSDM and WESTDC datasets for further illustrating the potential capability of FSDM in revealing spatiotemporal variations of snow cover at a fine scale.Figure 11    Meanwhile, Figure 12(c) shows the similarity and difference in variation trends between the FSDM dataset and the WESTDC product.Generally, the variation trends of SD were the same in most areas (more than 60% of the total region) between the FSDM dataset and the WESTDC product.However, we could see from Figure 12(c) that there were significant differences in SD variation trends between FSDM and WESTDC products.For example, in the northern part of the Lesser Khingan Mountains, the FSDM product recorded a decreasing trend in the SD (Figure 12 (a)), whereas the WESTDC product recorded an increasing trend in the SD (Figure 12 (b)).In the southern Changbai Mountains, an increasing trend of the SD was shown by the FSDM product, whereas a decreasing trend of the SD was shown by the WESTDC product.Based on the comparison of trends in Figure 12 and ground meteorological station observations in Figure 1, we selected 5 specific regions (the black circle in Figure 12 (c)) to check the variation trend of SD. Figure 13 shows the variation trends of SD based on the ground meteorological station observations (black line), FSDM dataset (orange line), and WESTDC product (blue line).The results show that the changing trends of the SD had good consistency between FSDM datasets and ground observation stations.However, the variation trends of the SD were different between the WESTDC product and station observations.

Potential errors in the FSDM dataset
The FSDM dataset has satisfactory accuracy and stability, and it can accurately reveal the spatiotemporal variations of SD at fine resolution, which will contribute to hydrological and climatological studies conducted in continuous watersheds.However, the FSDM dataset has several limitations.Figure 3 and Table 2 show that the FSDM does not effectively solve the problem of underestimation  in deep snowpack conditions (SD > 25 cm), the value of the bias was −12.42 cm for deep snowpack, which may be due to the TB reaching a saturation threshold and does not tend to monotonically decrease with an increase in the SD when SD in deep condition.However, because deep snow samples constitute only 5% of all the samples (Table 2), the FSDM still has extensive applications in Northeast China.
Figure 6 shows that the snow cover is mainly distributed in mountainous areas with a dense forest cover and complex terrain environments, including the Greater Khingan Range, the Lesser Khingan Mountains, and the Changbai Mountain regions.Although the FF and topographical features were included to reduce the impact of the forest attenuation and complex terrains and they achieved lower RMSEs in FSDM. Figure 4 and Table 3 indicate that the FSDM did not perform as well in complex mountainous areas as in the other regions (such as farmland), which may be due to the following reasons.First, forest cover attenuation with a high degree of complexity is related to not only the FF but also forest types, forest biomass, and tree canopy size (Gu, Ren, and Li 2016;Guangrui et al. 2021).Secondly, because the physical explanation that how the complex terrain in mountainous regions alters the radiative transfer of microwave emission is still unclear for machine learning models, it is still challenging to model the spatial variation and temporal evolution of SD in mountainous regions (Li, Durand, and Margulis 2012).Thirdly, an uncertain factor in SD inversion is the presence of liquid water.Yang et al. (Yang et al. 2020a) showed that large errors in SD assessment always occur in the late snow season.When the snow begins to melt, the snowpack tends to undergo frequent freeze-thaw cycles which leads to the evolution of the snow characteristics.Therefore, it is impossible to retrieve SD based on PMW under wet snow conditions (Che et al. 2014).

Conclusion
An accurate, reliable, high-resolution, and long-term SD dataset is essential for hydrological process monitoring and climate modeling applications.To overcome the uncertainties derived from coarse spatial resolution, snowpack metamorphism, complex terrain, and vegetation cover in SD inversion, this study developed an FSDM by combining the CETB data and other auxiliary information (e.g.snow cover characteristics and environmental factors), and produced a long-term, fine-resolution daily SD dataset for the period 1987-2018.Instead of complex physical models, the RF regression model was used to untangle the nonlinear complex relationship between SD and other auxiliary information, and a zoning modeling strategy was employed in the FSDM to reduce the bias caused by snow heterogeneity in forest regions.The validation against 91 ground meteorological stations and 162 SD observations in snow routes shows that the FSDM dataset achieves satisfactory accuracy and stability and that it significantly improves both spatial resolution and retrieval accuracy compared with traditional empirical algorithms.
Using the FSDM dataset, we found that the snow cover was mainly distributed in the Greater Khingan Range, Lesser Khingan Mountains, and Changbai Mountain regions and the SD showed a high-low distribution pattern with latitude.The snow cover appeared in the Greater Khingan Range in October and disappeared in April in most regions.The maximum SD value occurred in the Greater Khingan Range in February.In the whole snow season, the SD in Liaohe Plain was the smallest.The spatial distribution of the SD change trends had obvious heterogeneity, and the changing trends in different months also had significant differences.In October, January, February, and April, the SD decreased in most parts of the Changbai Mountains, whereas it increased in November, December, and March.Except for October, the changing trends of the mean monthly SD decreased in the north of Northeast China.Furthermore, the annual mean SD showed a slightly increasing trend (p > 0.05) at a rate of 0.029 cm/year during 1987-2018.However, the variation trends of SD are not significant in most parts of Northeast China.
Like the existing SD products, the FSDM dataset also has limitations, i.e. underestimation of deep snowpack.The FSDM did not perform as well in complex mountainous areas as in the other regions (such as non-forest regions) because of the high complexity of microwave radiation in forest regions.Future studies may incorporate the physical model to separate the contributions of complex terrains to TB.In this way, we can not only achieve accurate SD assessment but also gather the physical details of the radiative transfer of microwave emissions in mountainous regions.In addition, the presence of liquid water in the snowpack can cause more errors in SD assessment.Therefore, an accurate snow cover detection method should be developed to filter out the wet snow cover in future studies.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1 .
Figure 1.Study area (a) Base map showing the land cover types, the spatial distribution of ground meteorological stations, and the field snow routes.(b) and (c) represent the distribution of SD observations along two snow routes.
FSDM = RF(TB 19H , TB 19H , Snow density, TG, Elevation, Slope) FF ≤ 0.01 RF(TB 19H , TB 19H , Snow density, TG, FF, Elevation, Slope, Roughness) FF .0.01 (2) 3.4.Evaluation indicators The daily observed SD at 91 ground meteorological stations (Figure 1(a)) was used to evaluate the performances of the FSDM during the period 1987-2009.Additionally, Dataset 2, including 162 observations from the field snow survey experiment during 2017-2018, was also used as the truth SD observations to verify the stability of the FSDM in space.The RMSE, bias, and R were selected as the evaluation indicators to illustrate the inversion accuracy of the FSDM.The evaluation indicators were calculated using the following equations:

Figure 2 .
Figure 2. Schematic of the fine resolution SD retrieval model.

Figure 3 .
Figure 3. Density scatterplots of the retrieved SD and the observed SD during the period 1987-2009.(a) FSDM dataset and (b) WESTDC product.
Figure9shows the interannual variations and linear trends of the monthly SDs during 1987-2018.In the case of interannual variations, the fluctuation range of annual average SD tends to be stable, for example, the interannual mean SD fluctuated between 3 and 5 cm in October (Figure9(a)), and fluctuated between 3 and 7 cm in April (Figure9(g)).But the fluctuation range of the annual

Figure 4 .
Figure 4. Spatial patterns of the RMSE of the (a) FSDM dataset and (b) WESTDC product from the validations against the observed SD data from 91 meteorological stations in Northeast China.

Figure 5 .
Figure 5. Scatter plots of retrieved SD and observed SD based on Dataset 2. (a) and (b) represent the FSDM and WESTDC, respectively; (a') and (b') represent the comparison result along the observation time.

Figure 6 .
Figure 6.Spatial variations of the multiyear (1987-2018) mean monthly SD from October to April in Northeast China based on the FSDM dataset.

Figure 7 .
Figure 7. Spatial distribution of the multiyear (1987-2018) mean monthly SD change trends from October to April in Northeast China based on the FSDM dataset.
(a)  and (b) show the variation trends of the FSDM dataset and WESTDC product during the period 1987-2018.Generally, the two SD datasets exhibited similar variation trends.The annual SD increased slightly at a rate of 0.029 cm/year (p > 0.05) in the FSDM dataset and at a rate of 0.027 cm/year (p > 0.05) in the WESTDC product.The maximum SD values (exceeding 10 cm) were recorded in 2017, while the minimum SD values (below 5 cm) were recorded in 2007 for two datasets.We also show the spatial pattern of the annual SD variation.Figure12(a) and (b) represent the variations of the SD spatial pattern of the FSDM dataset and WESTDC product during 1987-2018, respectively.The results show that the variations of the annual SD were not significant in most parts of Northeast China.However, compared with the WESTDC product, the FSDM dataset exhibited a fine spatial pattern.

Figure 8 .
Figure 8. Spatial distribution of the multiyear (1987-2018) mean monthly SD significance test from October to April in Northeast China based on the FSDM dataset.The variation trend is regarded as statistically significant when p < 0.05.

Figure 9 .
Figure 9.The interannual variations and the linear trends of the mean monthly SD from October to April during 1987-2018.

Figure 10 .
Figure 10.Importance of the input variables for FSDM performance in (a) non-forest regions and (b) forest regions.

Figure 11 .
Figure 11.Mean annual SD variation of the (a) FSDM dataset and (b) WESTDC product during 1987-2018 in Northeast China.

Figure 12 .
Figure 12.Variation trends of the SD from 1987 to 2018.(a) and (b) represent the variations of the SD spatial pattern of the FSDM dataset and WESTDC product, respectively.(a) and (b) represent the spatial patterns of the trend changes from the FSDM dataset and the WESTDC product, respectively.(c) shows the similarity and differences between the FSDM dataset and the WESTDC product, and the pie chart at the bottom right represents the proportion of the similarity/difference areas in the total area.

Figure 13 .
Figure 13.Variation trend analysis of the SD in five regions based on station observations, FSDM dataset, and WESTDC product.

Table 1 .
Summary of the datasets used in the study.

Table 2 .
Comparison between the FSDM dataset and WESTDC SD product for shallow (≤25 cm) and deep (>25 cm) snow covers.

Table 3 .
RMSE, bias, and R of the FSDM dataset and WESTDC product in non-forest and forest regions.

Table 4 .
Detailed error statistics of the FSDM in non-forest regions based on different feature combinations.

Table 5 .
Detailed error statistics of the FSDM in forest regions based on different feature combinations.