A long-term daily gridded snow depth dataset for the Northern Hemisphere from 1980 to 2019 based on machine learning

ABSTRACT A high-quality snow depth product is very import for cryospheric science and its related disciplines. Current long time-series snow depth products covering the Northern Hemisphere can be divided into two categories: remote sensing snow depth products and reanalysis snow depth products. However, existing gridded snow depth products have some shortcomings. Remote sensing-derived snow depth products are temporally and spatially discontinuous and tend to underestimate snow depth, while reanalysis snow depth products have coarse spatial resolutions and great uncertainties. To overcome these problems, in our previous work we proposed a novel data fusion framework based on Random Forest Regression of snow products from Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E), Advanced Microwave Scanning Radiometer-2 (AMSR2), Global Snow Monitoring for Climate Research (GlobSnow), the Northern Hemisphere Snow Depth (NHSD), ERA-Interim, and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), incorporating geolocation (latitude and longitude), and topographic data (elevation), which were used as input independent variables. More than 30,000 ground observation sites were used as the dependent variable to train and validate the model in different time periods. This fusion framework resulted in a long time series of continuous daily snow depth product over the Northern Hemisphere with a spatial resolution of 0.25°. Here, we compared the fused snow depth and the original gridded snow depth products with 13,272 observation sites, showing an improved precision of our product. The evaluation indices of the fused (best original) dataset yielded a coefficient of determination R2 of 0.81 (0.23), Root Mean Squared Error (RMSE) of 7.69 (15.86) cm, and Mean Absolute Error (MAE) of 2.74 (6.14) cm. Most of the bias (88.31%) between the fused snow depth and in situ observations was in the range of −5 cm to 5 cm. The accuracy assessment of independent snow observation sites – Sodankylä (SOD), Old Aspen (OAS), Old Black Spruce (OBS), and Old Jack Pine (OJP) – showed that the fused snow depth dataset had high precision for snow depths of less than 100 cm with a relatively homogeneous surrounding environment. The results of random point selection and independent in situ site validation show that the accuracy of the fused snow depth product is not significantly improved in deep snow areas and areas with complex terrain. In the altitude range of 100 m to 2000 m, the fused snow depth had a higher precision, with R2 varying from 0.73 to 0.86. The fused snow depth had a decreasing trend based on the spatiotemporal analysis and Mann-Kendall trend test method. This fused snow depth product provides the basis for understanding the temporal and spatial characteristics of snow cover and their relation to climate change, hydrological and water cycle, water resource management, ecological environment, snow disaster and hazard prevention.


Introduction
Seasonal snow is a fundamental component of the global energy and water cycles (Barnett et al., 2005;Bormann et al., 2018).Snowpack is highly vulnerable to climate change, but although snow extent is monitored routinely, the fundamental properties of snow depth are measured remarkably poorly (Pritchard, 2021).The IPCC Special Report on Oceans and Cryosphere in a Changing Climate (SROCC) highlighted key strengths and weaknesses in our understanding of the cryosphere (IPCC, 2019).And spatio-temporal variability trends in snow depth over the Northern Hemisphere during the 20th century that have still not been accurately assessed.In contrast to snow cover, reliable quantitative knowledge on snow depth and its trend are lacking (Bormann et al., 2018;Pulliainen et al., 2020).Snow depth is more challenging to monitor than snow cover extent at both global (Zhu et al., 2021) and regional scales (Che et al., 2008) because of limited surface observations along with inadequate measurements and methods for retrieval from space (Bormann et al., 2018).Hence, confidence in snow depth trends is much lower than that of snow cover extent.
Currently, snow depth data can be obtained from in situ observations (Kinar & Pomeroy, 2015), passive microwave brightness temperature (Chang et al., 1987), synthetic aperture radars (Lievens et al., 2019), LiDAR (Currier et al., 2019;Painter et al., 2016), and reanalysis datasets (Parker, 2016).In situ observations consist of meteorological station data, snow survey data and automatic snow depth observation equipment set up in the field.These datasets, while accurate, are based on single-point observation.Snow surveys are time-consuming and laborious.Some automatic measurement networks provide snow depth information but are mostly located in nearly flat terrains and seldom cover the higher elevations (Dozier et al., 2016).Snow depth data based on passive microwave brightness can reveal the spatiotemporal pattern of large areas.Hereafter, these datasets are referred to as remote sensing snow depth products.At present, frequently used remote sensing snow depth products are from the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E), the Advanced Microwave Scanning Radiometer-2 (AMSR2), which succeeded AMSR-E (Kelly, 2009), a long time series of the Northern Hemisphere snow depth (NHSD) (Che et al., 2019) and the Global Snow Monitoring for Climate Research (GlobSnow) (Takala et al., 2011).The accuracy of AMSR-E and AMSR2 is higher in North America than in Eurasia (Xiao et al., 2020).The accuracy of the NHSD snow depth product is high in China but low in Europe, America and Canada, where snow tends to be deep (Hu et al., 2021).GlobSnow incorporated some ground observations and has been recognized as the most accurate hybrid snow depth product (Larue et al., 2017;Pulliainen et al., 2020).But the GlobSnow product excludes mountainous areas and only includes areas north of 35° N (Luojus et al., 2021).Remote sensing snow depth products tend to saturate when the value of snow depth exceeds 80 cm in deep snow areas (Xiao et al., 2020;Lievens et al., 2022).Similarly, reanalysis snow depth products are susceptible to various structural limitations, and some errors caused by uncertainties in the input climate variables mean state (Parker, 2016).These reanalysis snow depth products also include uncertainties caused by the biases in the forcing datasets (Broxton et al., 2016).Additionally, several satellitederived products do not cover the high latitudes and atmospheric reanalysis are often inconsistent across these regions (Thackeray et al., 2019).However, some reanalysis snow depth products systematically overestimate snow depth (Xiao et al., 2020;Mortimer et al., 2020).Instead of developing a single "best" snow depth product, an observational ensemble composed of multiple products can provide important information to reduce observational uncertainty.Despite the ongoing challenges, we must continue to assess, intercompare and combine multiple snow depth datasets from independent sources to mitigate uncertainties related to individual products.Mudryk et al. (2015) carried out cross-comparisons of snow water equivalent products both globally and regionally and found that they differed by more than 50% (Mudryk et al., 2015).The multiple snow depth or snow water equivalent products assessment and inter-comparisons have typically focused on identifying the best dataset (Xiao et al., 2020;Mortimer et al., 2020).Mortimer et al. (2020) evaluated nine gridded snow water equivalent products over the Northern Hemisphere by using snow course data in Russia, Finland, and Canada.The results indicated that stand-alone passive snow water equivalent products exhibit low spatial and temporal correlations to other products, and GlobSnow snow water equivalent product performed comparably to the reanalysis-based products.Previous assessment (Xiao et al., 2020) of the Northern Hemisphere snow depth products showed GlobSnow, ERA-Interim, and MERRA-2 snow depth products perform better in plains and forests regions.AMSR-E and AMSR2 showed satisfactory performance in shallow snow regions (0-10 cm), while MERRA-2 agreed better when snow depth exceeded 50 cm.The comprehensive comparison of snow depth products from reanalysis and remote sensing over the Northern Hemisphere has highlighted numerous uncertainties (Xiao et al., 2020).The assessment results showed no clear advantage to using a single type of snow depth product, and a multi-dataset mean exhibited a great improvement in precision (Mudryk et al., 2015).Only through combined and integrated improvements in remote sensing, modelling, and in situ observations will progress in snow depth product development be achieved and sustained (Mortimer et al., 2020;Snauffer et al., 2018).
Machine learning has emerged as a promising alternative means to solve complex nonlinear problems in the field of geography (Reichstein et al., 2019;Yuan et al., 2020).Machine learning also has incomparable advantages in processing geographical data in the era of big data (Zhang et al., 2019).Some machine learning algorithms have been used to retrieve snow depth (Tedesco et al., 2004) as the performance of Artificial Neural Networks (ANN) has been found to surpass that of traditional retrieval algorithms (Cao et al., 2008;Evora et al., 2008).
More advanced machine learning algorithms have been introduced in recent years to retrieve snow depth over large scales.Support Vector Machine (SVM), Random Forest Regression (RFR), and Convolutional Neural Networks (CNN) were employed to retrieve snow depth in China (Liang et al., 2015;Yang et al., 2020), Alaska (Wang et al., 2020), and the Northern Hemisphere (Xiao et al., 2018).These results showed that the accuracies achieved by machine learning methods were higher than those from the traditional Spectral Polarization Difference (SPD) algorithm (Chang et al., 1987) and improved retrieval algorithms (Che et al., 2008).Snauffer et al. (2018) directly incorporated the snow depth products and auxiliary information as input variables of their ANN model to obtain a more precise product.This ANN framework was found to have a reduced MAE of 47% compared to 60% of the original individual products, and the fused results indicated that the machine learning algorithm improved the fusion of multisource datasets.Machine learning methods have been introduced to fuse snow depth datasets, and the fused results indicated that machine learning methods could be used to rapidly produce a long times series snow depth product over large areas with high accuracy (Hu et al., 2021).These fused frameworks based on machine learning algorithms offer a new perspective for creating long time series of snow depth over large regions.Machine learning methods can not only integrate the advantages of these input snow depth products but also improve their accuracy by using a large number of in situ observations to train the sample data.In our previous work (Hu et al., 2021), we only considered special periods and areas north of 35° N latitude.The significant improvements compared with our previous work was the selection of input variables and the hyper-parameters of the random forest algorithm, and the fused snow depth product also covered the whole Northern Hemisphere.
In this study, we propose a data fusion framework based on the RFR algorithm (Hu et al., 2021) to derive a comprehensive snow depth product for the Northern Hemisphere from 1980 to 2019.In this framework, geolocation, topographical and as many gridded snow depth products as possible were selected as the input variables.In Section 2, we describe the various data in this study and illustrate data pre-processing, introduce the RFR algorithms, the experimental design, and the accuracy assessment indicators.Section 3 is the description of the fused snow depth dataset.Detailed comparisons between the fused snow depth dataset and in situ observations and validation works are given in Section 4, with subsequent analysis and explanation.Section 5 discusses the uncertainty and model generalization capacity and data availability.Conclusions are discussed in Section 6.

Gridded snow depth product
The snow depth products used in this study were chosen based on three main criteria: complete the Northern Hemisphere spatial coverage (with the exception of GlobSnow version 2.0, which covers the region 35° N-85° N), continuous availability throughout the satellite era (we used 1980-2019 as our fusion period) and most frequently used by the snow research community (Table 1).Three categories of gridded snow depth products were employed: (1) stand-alone passive microwave retrievals (NHSD, AMSR-E, and AMSR2), (2) passive microwave estimates incorporated with in situ snow depth measurements (GlobSnow), and (3) snow depth products using some form of reanalysis (ERA-Interim, and MERRA-2).

Stand-alone passive microwave snow depth products. AMSR-E and AMSR2
snow depth products are produced by the National Aeronautics and Space Administration (NASA) and Japan Aerospace Exploration Agency (JAXA), while NHSD snow depth was released by the Chinese Academy Sciences (CAS).AMSR-E, AMSR2, and NHSD snow depth products did not rely on any external in situ snow measurements, only on passive microwave brightness data.The retrieval algorithm used in producing the AMSR-E and AMSR2 snow depth products was derived by Kelly (2009), which was an improved Chang algorithm (Chang et al., 1987) that takes the fractional forest cover into consideration.The AMSR-E and AMSR2 brightness temperatures were calibrated based on the Special Sensor Microwave Imager/Sounder (SSMI/S) to ensure data consistency.Before the AMSR-E/AMSR2 snow depth retrieval, the snow cover area was identified by a multiple year average hemispheric snow cover dataset and land cover dataset.In these two snow depth products, if the snow was detected as shallow depth by the passive microwave brightness temperature threshold, the value of snow depth was set to 5 cm.For moderate or deep snow, an improved Chang algorithm was employed to retrieve snow depth.The accuracy of the AMSR-E and AMSR2 snow depth products were more accurate in North America, while the error was larger in Eurasia.However, the second version of the AMSR-E snow depth product, produced by NASA, used an ANN algorithm to improve the accuracy (Tedesco & Jeyaratnam, 2016).
The NHSD snow depth product was derived based on multiple-sensor inter-calibrated passive microwave brightness temperature data (Dai et al., 2015) using an improved Chang algorithm (Che et al., 2008).Through extensive in situ observations, a dynamic coefficient was applied to the Northern Hemisphere to produce a more reliable and timeconsistent snow depth product.The validation of in situ observation sites showed that the relative deviation of the NHSD snow depth product was less than 30% (Che et al., 2019).The NHSD snow depth product has higher accuracy in China than in Europe, America and Canada where the values of snow depth are deeper.Due to the saturation effect of the passive microwave brightness temperature, all remote sensing snow depth products tend to underestimate snow depth when the depth is deeper than 50 cm.

Passive microwave incorporated in situ observation snow depth product.
The GlobSnow snow depth product, released from the European Space Agency (ESA), provides the snow water equivalent value and snow density at the pixel scale over the Northern Hemisphere.The GlobSnow snow depth product includes a number of in situ observations (Takala et al., 2011) and is currently considered the most accurate hybrid snow depth product (Pulliainen et al., 2020).However, this product excludes mountainous areas and only includes areas north of 35° N. The algorithm of the GlobSnow snow depth dataset was used to best estimate snowpacks roughly 0.05 m-1.00 m deep (Luojus et al., 2021).The GlobSnow snow depth product shows high accuracy in Eurasia and low precision in regions of North America and the Tibetan Plateau.The global gridded GlobSnow depth product, also released by ESA, marks snow depth values in mountainous areas as missing values.A recent study attempted to apply ANN to improve the accuracy of the GlobSnow snow depth product (Venäläinen et al., 2021).

Reanalysis snow depth products.
The ERA-Interim snow depth product from the fourth generation of reanalysis (Dee et al., 2011) was generated and released by the  European Centre for Medium-Range Weather Forecasts (ECMWF).The latest ERA-Interim reanalysis dataset was updated until 31 August 2019, after which the dataset was replaced by ERA5.For this study, the snow water equivalent and snow density from ERA-Interim were downloaded at a six-hour time step, then processed as a daily snow depth dataset.
The MERRA-2 reanalysis snow depth dataset (Gelaro et al., 2017) was released by the Global Modelling and Assimilation Office of NASA (GMAO).In the process of the MERRA-2 reanalysis dataset, the Catchment land surface model (Gelaro et al., 2017) was employed to improve the quality of the data.In the study, the average snow depth of the snowcovered area in a pixel and the fractional snow cover in same pixel were used to calculate the average snow depth of the pixel.The original spatial resolution of the MERRA-2 reanalysis dataset was 0.5° × 0.625°, which was subsequently resampled to 0.25° × 0.25° spatial resolution by nearest interpolation.This ensured that the original gridded snow depth products used in this study had the same spatial resolution of 0.25° × 0.25° and daily temporal resolution from 1980 to 2019.
The original gridded snow depth datasets of NHSD and GlobSnow were produced by the passive microwave brightness temperature from SMMR (Scanning Multichannel Microwave Radiometer) in 1980 to 1987.Brightness temperatures form SMMR were collected every other day, and the scanning width is much narrower than SSM/I (Special Sensor Microwave/Imager); thus, large amounts of data are missing.Snow depth product of NHSD and GlobSnow also inevitably have a large number of gaps data, resulting in similar data gaps in the fused snow depth dataset.In addition, as daily detections of AMSR-E and AMSR2 do not fully cover the extent of the Northern Hemisphere, stripe gaps southward of 55°N can be found in the daily images.During June to August, snow cover existed in limited areas over the Northern Hemisphere, and the presence of liquid water content with snowpack can lead to large error of snow depth derived from passive microwave remote sensing (e.g.AMSR-E, AMSR2, NHSD and GlobSnow).In fact, the GlobSnow gridded snow depth products are severely missing in June to August.The data gaps also severely influence the accuracy of the fused snow depth, so this fused data did not cover the temporal phase of June to August.The areas of Greenland and Iceland were excluded in all gridded snow depth dataset, and Svalbard was excluded in GlobSnow snow depth product, so these three areas were also excluded in the fused snow depth dataset.

Ground-based snow depth measurement
In situ snow depth datasets included meteorological station observations from China and Russia, snow survey data from Russia and Global Historical Climatology Network (GHCN)daily snow depth.These four categories of snow depth datasets were used as the groundtruth values for training and validating the accuracy of the proposed fused framework.In this study, the time series of these in situ observations were obtained from 1980 to 2019.
Daily snow depth data of China were acquired from the national meteorological information centre of the Chinese Meteorological Administration, with 945 stations used in this study.This dataset offers several meteorological parameters, i.e. snow depth, snow pressure, location, and elevation of the station sites.Daily snow depth data and snow pressure per five days' data were manually observed at 8:00 a.m. with the same ruler.Daily snow depth data of China were calibrated, the extreme data and wrong value was filtered out before they were uploaded on the national meteorological data platform.Therefore, we do not make any data filtering on this dataset.
A daily snow depth dataset from Russia between 1980 and 2019 was collected from the Russian meteorological centre.This dataset also contained snow depth, location, and elevation of the station sites, and the quality checked field was also conducted.Some anomalous snow depth records were eliminated in this dataset during the data preprocessing phase.Doubtful records, which were flagged in the quality control field were removed.
Snow survey data of Russia were accessed through the Russian meteorological centre.Several snow depth variables are included in this dataset, i.e. maximum, average, and minimum snow depth, snow density every five days to ten days from September to May.This dataset was also conducted based on the extreme value of snow depth to exclude some abnormal values.Data was filtered based on the criteria that the average snow depth should be within the range of the deepest and shallowest snow depth.
The GHCN daily snow depth dataset was the most comprehensive for recording global snow depth observations.This dataset also contained snow depth and elevation of in situ observation sites.Through the inter-annual consistency and climatological outlier check, some errors or abnormal values were excluded.Finally, 41,261 in situ station sites were retained for the random forest fused training and validation.
In this study, seven long-term snow depth datasets from reference in situ sites were chosen to evaluate the machine learning fused snow depth dataset (Ménard et al., 2019).These datasets were derived from the Earth System Model-Snow Model Intercomparison Project (ESM-SnowMIP) (Krinner et al., 2018).These in situ observation sites offer observation periods for seven to twenty years.These in situ sites were not included in those operational weather stations which were used for model training.These in situ observations were vital for developing snow depth retrieved model and quantitative evaluation of the uncertainty of the model.The accuracy assessment of the fused snow depth dataset was validated by these sites which were conducted for scientific research.

Auxiliary data
Auxiliary data mainly included land use/cover data and elevation.Forest can cause snow depth to be underestimated in remote sensing-derived products, and bare land can impact wind-blown snow causing deeper snow depth in some areas.To improve the overall fusion accuracy by training and verifying the model under different vegetation types, the land of the Northern Hemisphere was divided into five categories (forest, grassland, shrub, bare-land, and unclassified) (Figure A1).We assumed that the major land cover type at a spatial resolution of 0.25° did not change over the 40-year period from 1980 to 2019.The snow depth data within one land cover type should be systematically more consistent and may show less variability than taking the snow depth data as a whole (Ntokas et al., 2021).
Global Multi-resolution Terrain Elevation data 2010 (GMTED2010) is an update of GTOPO30 released by the United States Geological Survey (USGS).Elevation data are available at three spatial resolutions (Danielson & Gesch, 2011).In this study, the elevation data with a spatial resolution of 30 arc-seconds was resampled into a new spatial resolution of 0.25° × 025°.Elevation data at a resolution of 0.25° were used as an input variable.

Methodology and the accuracy assessment indices
As described in our previous work (Hu et al., 2021), three machine learning methods, i.e.ANN, SVR, and RFR, were employed to generate fuse snow depth datasets of the Northern Hemisphere.This work showed that the random forest fusion framework has the best performance for generating a long time series snow depth dataset.Detailed information about the random forest algorithm can be found in Belgiu and Drăguţ (2016) and information on the random forest fusion framework can be found in Hu et al. (2021).Two important parameters are required to optimize the random forest algorithm (Du et al., 2020;Hu et al., 2021;Yang et al., 2020), i.e. the number of the trees (ntree) and the number of random variables at each node (mtry).In our previous work (Hu et al., 2021), the input variables and the number of ntree and mtry were fixed.In this study, there are three different periods.In every period, "leave-one-year-out" method is used to obtain the fused snow depth data: one year for data produce and the remaining data for model training and optimization.In the model training and optimizing phase, the Grid Search (a parameter adjustment method, in all candidate parameter selection, try every possibility through cyclic traversal, and the best parameter is the final result) used to determine the optimal parameters for random forest fused model.Every period has its optimal parameter for random forest after Grid Search.

Description of the fused data preparation process
In this study, a snow hydrology season corresponding to year Y was defined as starting on September 1st of year (Y-1) and ending on May 31st of year Y.The snow seasons were selected because the Northern Hemisphere snow season spans from September until June over the next year (Venäläinen et al., 2021).As the snowpack evolves, the snow depth tends to first increase and then decrease.A complete snow hydrology year includes three snow phases (snow accumulation, snow stabilization, and snow melting).To reduce the errors caused by seasonal variations, one snow hydrology year was divided into three snow seasons: autumn (September to November), winter (December to February) and spring (March to May).In this study, these three snow seasons were only used to split the snow depth dataset and were not used as input variables.Because the original gridded snow depth products had various time durations, the period 1980 to 2019 was divided into three parts.The first period was from 1980 to 2002, and 2012 to 2013, the input gridded snow depth products were NHSD, GlobSnow, ERA-Interim, and MERRA-2 products.The second period includes five input gridded snow depth products, i.e.AMSR-E, NHSD, GlobSnow, ERA-Interim, and MERRA-2, from 2003 to 2012.The last period also contained five input gridded snow depth products, i.e.AMSR2, NHSD, GlobSnow, ERA-Interim, and MERRA-2, from 2013 to 2019.According to the spatial coverage of the GlobSnow snow depth dataset, the terrestrial land of the Northern Hemisphere was divided into two parts with 35°N as the splitting line.In different periods, different selected datasets were extracted for the random forest fusion model training and prediction.To improve the accuracy of the RFR model in different snow seasons, different models were established.According to the three snow seasons and five land cover types (Figure A1), 15 models can be employed to train and verify the model.
According to an existing accuracy assessment of several snow depth products (Xiao et al., 2020) and the testing of the data fusion based on machine learning (Hu et al., 2021), the random forest fusion framework displayed a better performance on large scale snow depth combination.In every stage, a "leave-one-yearout" cross-validation was employed to generate the fused snow depth dataset.Lastly, by integrating the data from these three stages, the daily fused snow depth covered 1980 to 2019 was obtained (Figure 1).

Accuracy and uncertainty assessment
We evaluated the accuracy of the fused snow depth and the original gridded snow depth products against the in situ observations.Snow depth products as follows: NHSD, AMSR-E/AMSR2, GlobSnow, ERA-Interim and MERRA-2.The coefficients of determination (R 2 ), root mean square error (RMSE), mean absolute error (MAE) and the bias of in situ observations and snow depth products (bias) were calculated to assess these snow depth products.
where n is the number of sample pixels, Sirepresents the in situ observation snow depth, while b Sidenotes the original gridded snow depth or fused snow depth values of the i-th pixel, respectively.S represents the mean value of in situ observations of n pixels.Furthermore, to detect the trend of the fused snow depth dataset from 1980 to 2019, we used the Mann-Kendall trend analysis method.

Data records
The duration of the fused snow depth dataset is from 1980 to 2019, and temporal resolution is daily.This daily snow depth dataset contains January 1 to May 31 and September 1 to December 31 of 1981 to 2018 exclude June 1 to August 31, and the dataset for 1980 includes September 1 to December 31 while for 2019 contains January 1 to May 31.There are some missing data or swath gap at low-latitude areas in the AMSR-E, NHSD, and GlobSnow gridded snow depth datasets.Gaps of only one day were filled by linearly interpolating method between the prior and following day.After temporal interpolation, only a few stripe gaps in low-middle latitude areas remained on some days from 1980 to 1988.The fused snow depth dataset missed values owe to the limitations of the machine learning fused framework when the original gridded snow depth product missed values in some areas.The format of the daily fused snow depth was GeoTiff, which projection was set to EPSG:4326.The spatial resolution of this snow depth is 0.25° × 0.25°.This fused snow depth dataset also adopted centimetre as unit of the snow depth to match other snow depth products, and NoData_Value is represented by −9999.The name of every daily snow depth was "ML_NHSD_YYYY-MM-DD.tif", ML represents machine learning, NHSD denotes snow depth of the Northern Hemisphere, YYYY stands for the year 1980 to 2019, MM means the month from January to December excluding June to August, and DD represents date.The new fused snow depth dataset is freely available from the National Plateau Data Center (TPDC) and can be downloaded at https://dx.doi.org/10.11888/Snow.tpdc.271701(Che et al., 2021).This snow depth also can be downloaded at https:// zenodo.org/record/6336866#.Yjs0CMjjwzY.

Inter-comparison between the fused snow depth dataset and original snow depth products based on in situ observations
From 1980 to 2019, several original gridded snow depth products, the fused snow depth dataset and in situ observations were cross-verified at pixel scale.In the process of machine learning model training, one point every ten points was retained for independent validation by sparse sampling.Since these points did not participate in the model training phase, they can be considered as independent reference points.The results indicate that the correlations between the commonly used gridded snow depth products and the in situ observation values were not particularly good, while there was good consistency between the fused snow depth product and the in situ observations.The coefficient of determination increased from the best 0.23 (GlobSnow snow depth product) in the original gridded snow depth products to 0.81 (fused snow depth product) (Figure 2, Table A2).The accuracy of the fused data was strongly improved in terms of RMSE and MAE, which were both reduced by about two times.
As can be seen from the scatter density map (Figure 3), the goodness-of-fit between the fused snow depth dataset and the in situ observations was high.When the snow depth values were less than 50 cm, most of these values were distributed near the 1:1 line.It can also be seen from this figure that the fused snow depth dataset showed good accuracy whether the snow depth values were shallow.It also can be found that AMSR-E/ AMSR2, NHSD, and GlobSnow remote sensing snow depth products were largely underestimated when the snow depth value was greater than 50 cm.Although ERA-Interim has a good capture of the deep snow depth values, its overestimation and underestimation were obvious.MERRA-2 was not very accurate and there were many points of underestimating and overestimating disorderly distribution (Figure A2).
The relative bias between the fused snow depth dataset and in situ observation was −10% to 0 and 0 to 10%, which accounted for 21.33% and 33.28%, respectively.Figure 4 also indicated the percentage of negative relative bias was smaller than the percentage of positive relative bias.The regions with large negative relative percent bias were mainly located in Mexico, India, and Iran, where the in situ observations were sparse.The accuracy of the proposed random forest model framework is strongly dependent on the number and quality of training samples.To improve the accuracy of the fused snow depth product, the number of training samples should be as larger as possible.

Accuracy assessment of the fused snow depth dataset at several independent in situ snow depth observation sites
In Section 4.1, the accuracy assessment of the fused snow depth was conducted using ground station observations, which may reduce the credibility of this product since some station observations were already used in the model train.To increase the reliability of the fused snow depth dataset, some independent in situ observations were introduced to validate our snow depth product.In this study, seven reference snow observation sites, which were chosen for evaluating models participating in the ESM-SnowMIP (Krinner et al., 2018), were employed to test the accuracy of the fused snow depth dataset in these regions.Detailed information on these selected sites is shown in Table A1.The different time series of the fused snow depth dataset were extracted based on the time series of these in situ observations.From the accuracy validation of these seven sites, Sodankylä (SOD) had the highest assessment accuracy with the R2, RMSE, MAE, and bias of 0.89, 8.7 cm, 6.4 cm, and 3.9 cm, respectively (Table 2).
Although, the assessment accuracies of Old Aspen (OAS), Old Black Spruce (OBS), and Old Jack Pine (OJP) were not as high as those of Sodankylä, their accuracies were still relatively high compared with those of other gridded snow depth products.The fused snow depth product contains accurate estimates of deeper snow, i.e. deeper than 50 cm (Figure 5), capturing the maximum value and inflection point.Senator Beck Basin Study Area (SBBSA), Swamp Angel Study Plot (SASP), and Weissfluhjoch (WFJ), which had deeper snow depths in winter and spring, did not have a good performance.In these three in situ sites, the fused snow depth product showed a prominent underestimation, especially that from November to April.Because the period from November to May is an important period for snow accumulation, stabilization, and ablation, the fused snow depth dataset is not suitable for these regions.Compared with the original gridded snow depth in SBBSA, SASP, and WFJ, there was a greater underestimation across all products (Figure A3).When the fused snow depth shows a prominent underestimation, these original gridded snow depth products all have a distinct underestimation, indicating that the input variables of the random forest regression model are very important for the fused results.From these assessment sites, in a homogeneous environment and relative lower elevation of a 0.25° pixel, the fused snow depth showed a better performance.At sites in relatively complex environments, the fused snow depth product had a poor performance.Based on the KML file provided by Ménard et al. (2019), we found that SBBSA and SASP are located in the same basin.SASP is in a sheltered area and is surrounded by a sub-alpine forest.In this complex environment, the surrounding forest made it easier to store more snowpack.This site was better suited for measuring precipitation.In the original gridded snow depth products, remote sensing snow depths were smaller than in situ observations.The SBBSA site is located at the top of the mountain, with an elevation of more  than 3700 m.The land cover type of this site is short grass, while the land cover of a 0.25° pixel is complex, with grass, bare rock and forest.In a 0.25° pixel, this site only represents a small region; the elevation varies from 2700 m to 3900 m.These two sites can be used to observe the snowpack characteristics in a basin but they cannot represent the large area snow depth.The WFJ site is located on a hillside at an altitude of about 2540 m.The major land cover type is grassland in one pixel, but this site has a higher altitude.The elevation increased from 800 to 2600 m over one pixel, so this site also has a complex environment.These results indicate that the accuracy of the fused snow depth dataset was heavily dependent on the input gridded snow depth products.Additionally, the snow depths changed too rapidly to be accurately captured by these products.In other words, the fused snow depth dataset had higher accuracy at the snow depths of less than 100 cm.

Accuracy evaluation of the fused snow depth dataset at different snow depth values
We compared the accuracy of the fused snow depth product at different depths based on selected in situ snow depth values.Overall, Figure 6 shows that the error of the fused snow depth dataset was small.Fused snow depth values were divided into eight intervals: 0-5 cm, 5-10 cm, 10-15 cm, 15-20 cm, 20-30 cm, 30-40 cm, 40-50 cm, and above 50 cm.In the interval of 0-5 cm, the relative frequency of bias between −5 and 5 cm was more than 90% (Figure 6(a)), and it indicated that the fused snow depth had a slightly overestimated trend.In the range of 5-10 cm, the bias also showed slight overestimation, but the relative frequency of the bias between −5 and 5 cm was also more than 80% (Figure 6(b)).The distribution charts of relative frequency in the intervals of 10-15 (Figure 6(c)), 15-20 (Figure 6(d)), and 20-30 cm (Figure 6(e)) depths were similar.These frequency distribution charts had a near normal distribution, and the relative frequency of the bias between −5 and 5 cm were all higher than 70%.This result indicated that the errors of the fused snow depth were still relatively small.In the intervals 30-40 cm (Figure 6(F)) and 40-50 cm (Figure 6(g)), the trend distribution of the relative frequency was similar, the frequency of the bias between −10 and 10 cm was higher.This indicated that errors increase with snow depth.For snow depths larger than 50 cm (Figure 6(h)); the relative distribution plots showed an underestimation of the fused snow depth product.Although the estimates for large snow depths are underestimated, the relative frequency of bias near 0 was relatively high.

Accuracy assessment of the fused snow depth dataset at different elevations
We investigated the accuracy of the fused snow depth dataset at different elevation intervals (Table 3).The accuracy of the fused snow depth data had poor precision at altitudes less than 100 m or greater than 2000 m.In the elevation interval greater than 100 m and less than 1000 m, the accuracy of the fused snow depth was high and the consistency of the fused snow depth data was better.In the range of 1000 m to 1500 m, both snow depth and error of the fused dataset were greater.In the interval range of 1500 m to 2000 m, the accuracy of the fused snow depth dataset was highest, with R 2 reaching 0.86.In the range of the elevation greater than 2000 m, the fused accuracy was higher than at the range of elevation less than 100 m.

Spatial and temporal patterns of the snow depth between 1980 and 2019
We separated the Northern Hemisphere into North America and Eurasia to evaluate the temporal change in average snow depth from 1980 to 2019.There was an overall trend of decrease followed by a slow increase in both regions, and the minimum value of snow depth occurred in the 2012 snow hydrology season.From 2013 to 2019, the average snow depth in North America and Eurasia was relatively smooth.From 1980 to 2019, the decreasing trend of snow depth in North America was stronger than in Eurasia (Figure 7(a)).
The spatial distribution of the multi-year average fused daily snow depth dataset from 1980 to 2019 is shown in Figure 7(b).The regions with the deepest snow depth values are distributed in the western Siberian plain, Rockies, and Alps.The average snow depth values of the central Siberia Plateau were about 20-30 cm, which was less than in the eastern Siberian Mountain and western Siberia plain.The average snow depth distribution in Canada increased gradually from west to east, from 10 cm to 40 cm.From this spatial pattern, we also found that most low latitude regions (<40° N) had snow depth of less than 5.0 cm.The snow depth of the Tibetan Plateau was also less; only small parts near the eastern part of Qinghai-Tibet Plateau had snow depth of about 5-10 cm.There were fewer in situ observations to train the random forest data fusion framework, resulting in a lower snow depth accuracy in this region.
The spatial distribution of average snow depth in autumn, winter, and spring was roughly similar to the overall spatial pattern, with more pronounced distribution areas with high value (Figure 8).High average snow depths were found in western Siberian plain, eastern European plain, eastern Canada, Rockies, and European Alps.Seasonally, the mean snow depth in autumn was roughly below 5 cm, which was significantly lower than that in winter and spring, but all three times were relatively smooth, indicating that it was reasonable to divide the snow hydrological year into three different snow seasons, and to calculate the snow depth at different levels was more reasonable and precise.

Comparison of the changing trend of the fused snow depth based on the Mann-Kendall trend test
The changing trend in the annual average snow depth of the Northern Hemisphere from 1980 to 2019 was detected by the Mann-Kendall trend test method.From 1980 to 2019, the test value of fused snow depth product was −3.28, and showed a significantly decreasing trend at the significance test level of 0.05.The trend analysis result indicated that 71% of the area had a decreasing trend while 29% of the area showed an increasing trend.According to the Mann-Kendall trend monitoring algorithm, the areas of significant increase were mainly in the West Siberian plain, East Siberian, and Rockies (Figure 9).These areas also passed the 0.05 significance test.The regions with significant decreases were mainly distributed in northern and eastern Canada, Sweden, and Finland.Most parts of these regions also passed the 0.05 significance test.In the regions below 40° N, the change trend was not significant.

Limitations and uncertainties of the fused snow depth dataset
Previous assessments determined that the original gridded snow depth products contain some errors (Xiao et al., 2020).Based on our proposed data fusion framework (Hu et al., 2021), creating a seamless snow depth product over the Northern Hemisphere requires that all original gridded snow depth products have non-missing snow depth values.During preprocessing, the AMSR-E and NHSD snow depth products were temporally interpolated to remove striping.The GlobSnow snow depth product did not cover the whole Northern Hemisphere, it was only used to produce the fused snow depth dataset cover the north 35° N. Therefore, in different periods, the input variables of the random forest were different, which may influence the temporal consistency of the fused snow depth dataset.However, a large amount of temporal consistent in situ observations was used as the reference value in the training process of the mode could remove this difference.Furthermore, there was a good temporal consistency in different periods, the value of RMSE and MAE did not show big gaps.From the 2003 to 2011, the fused snow depth dataset has a highest precision.In this study, we assumed that land cover change at a spatial resolution of 25 km was negligible, so the land cover was kept constant.In the future, we will also consider snow climate class (Sturm & Liston, 2021) which were produced based on air temperature, precipitation and GlobCover landcover data, as priori information to train model, and compare the results from different schemes.Based on previous assessment work (Xiao et al., 2020), ERA-Interim reanalysis snow depth dataset is selected as a candidate variable of the random forest fused framework.According to the latest research, ERA5 and ERA5-Land reanalysis snow depth datasets have a better accuracy than ERA-Interim (Mortimer et al., 2020;Muñoz-Sabater et al., 2021).Also, the spatial resolution of ERA5-Land snow depth dataset has increased to 0.1°.The original spatial resolution of AMSR-E/AMSR2, NHSD, and GlobSnow was 0.25°.ERA-Interim was retained as an input candidate variable of the random forest model.To obtain a more continuous and accurate snow depth dataset with higher spatial resolution, we are assessing the ERA5 and ERA5-Land reanalysis snow depth datasets and expecting to introduce these two datasets for improving our fused snow depth dataset with a higher spatial resolution.Lastly, the accuracy of the fused framework depends heavily on the training samples.Most snow depths from the meteorological stations were small, and only a small number of deeper snow values were derived from snow field surveys.According to the validation result from seven independent in situ sites, the results indicate that the fused snow depth dataset also exhibit a poor performance in the areas with deeper snow depth, and regions with complex terrain.The quality and accuracy of the fused snow depths in regions with deeper snow were poor.To overcome this deficiency, more snow surveys data should be included as input variables.The validation results indicate that the fused snow depth dataset does not show good performance in areas with deeper snow depth values.Independent validation site only represents "point-scale" while the spatial resolution of fused snow depth dataset is about 25 km.It is difficult to observe snow depth in complex mountainous areas, and a small number of sites were used to train and optimize the random forest model.

Generalization ability of the proposed random forest regression model
Because most machine learning methods are currently black-box models, they have no physical mechanism to constrain their estimates (Banesh et al., 2021).In our fused framework, the input variable selection is based on experience and previous studies.Even so, the fused results indicate that the approach used in this study is rational.For example, in different snow seasons, the absolute snow depth differed considerably, suggesting that it was reasonable to fuse snow depth datasets in different snow seasons.Although current machine learning algorithms are not coupled with physical processes, some physical mechanisms are being depicted in the resulting products.In a future study, we will not only pursue integrated physical process constraints in machine learning algorithms (Karniadakis et al., 2021), but also use these machine learning results to explain physical mechanisms and significant trends.
To evaluate the stability of the random forest model and validation of the results, we used a combination of different spatial positions in the training samples (same time), different times of training samples and verification samples (same spatial positions), and different time and spatial positions of the training and verification samples.As expected, the highest accuracy was obtained when training and predicting using time samples in the same region and different years.Because of the generalization ability of machine learning, it was a more reasonable choice to use all the data in the Northern Hemisphere for model training.For example, to achieve the highest accuracy, we used pixels on the same position, snow hydrologic year data from 2002 to 2003 as the training data, and snow hydrologic year data from 2003 to 2004 as the verification sample.For pixels in different positions, the accuracy is not as high as when the model uses the same position and time, whether they are the same in time (different space) or different in time and space.It is also recognized that machine learning models are challenging to generalize in different spatial locations.The proposed RFR model structure is expected to be applicable to every area in the world.However, new training is advisable.The significance score increases even when only one variable is eliminated.Thus, more input variables help widen the spread of the ensemble (Ntokas et al., 2021).Therefore, adding more variables could further increase the model's reliability.In the future, an important direction is to improve the random forest fused model architectures so the obtained model can be transferred across different geographic areas.The fused snow depth product showed poor performance for deeper snow, mainly because the amount of available training data was low for these extreme values, as found in previous studies (Xiao et al., 2018;Yang et al., 2020).

Conclusions
We presented a new fused daily gridded database of snow depth for the Northern Hemisphere from 1980 to 2019 at a resolution of 0.25° (25 km).This dataset fused an unprecedented amount of in situ measurements and multisource gridded snow depth products, using a machine learning algorithm.Comparing AMSR-E/AMSR2, NHSD, GlobSnow, MERRA-2, ERA-Interim, and the fused snow depth datasets with in situ observation snow depths, the results show that the original gridded snow depth have weak correlations, while the fused snow depth dataset has a good agreement with in situ measurements.The fused (best original) dataset has a coefficient of determination R 2 of 0.81 (0.23), Root Mean Squared Error (RMSE) of 7.69 (15.86), and Mean Absolute Error (MAE) of 2.74 (6.14).Precision validation via several independent in situ observations indicates that fused snow depth data are likely more accurate than those of previous studies in area with shallow snow depth.The fused snow depth dataset shows a clear underestimation in deep snow areas and indicates a lower accuracy in areas with complex terrain.Therefore, this new fused snow depth dataset not only provides information about snow depth and its variation over the Northern Hemisphere but also presents potential value for hydrological and water cycle studies related to seasonal snowpack.More importantly, the fused snow depth dataset can be used for water resource management, environmental monitoring, and early warning of snowfall.
Although the proposed machine learning fused framework performs well in the snow depth product fusion over the Northern Hemisphere, some drawbacks and limitations still need to be overcome, especially on the input variables selection and machine learning algorithm optimisation.In our future work, deep learning algorithms will be introduced to fuse snow depth datasets in conjunction with exploring relevant physical information about the snowpack.We will update the fused snow depth dataset when new gridded snow depth products become available.For example, ERA5 snow depth product will be incorporated in our proposed framework.
represents the average snow depth in the snow-covered area of a 0.5° × 0.625° pixel.

Figure 1 .
Figure 1.Flowchart of the random forest fused framework to produce a comprehensive snow depth dataset from 1980 to 2019.

Figure 2 .
Figure2.Accuracies of the fused snow depth dataset for six gridded snow depth products compared with in situ observations.The raw gridded snow depth products are displayed in grey, while the fused snow depth dataset was shown in orange.

Figure 3 .
Figure 3. Scatterplots between average snow depth of fused snow dataset and average snow depth of in situ observations between 1980 and 2019.

Figure 4 .
Figure 4. Spatial distribution of the relative bias (i.e.bias (fused snow depth minus in situ observations)) over the Northern Hemisphere.The numbers outside the parentheses represent the intervals of relative percent bias while the numbers between parentheses denote the percentage of each interval in the legend.

Figure 5 .
Figure 5. Fused (black line) and in situ (red line) snow depths at seven independent observation sites.(a) Sodankyla observation site in Finland (Essery et al., 2016); (b) Old aspen black site in Canada (Bartlett et al., 2006); (c) Old black spruce site in Canada (Bartlett et al., 2006); (d) Old jack pine site in Canada (Bartlett et al., 2006); (e) Senator beck basin study area site in the United States of America (Landry et al., 2014); (f) Swamp angel study plot site in the United States of America (Landry et al., 2014) and (g) Weissfluhjoch site in Switzerland(Wever et al., 2015).

Figure 6 .
Figure 6.The frequency histogram of bias between the fused snow depth dataset and selected in situ observations.a, b, c, d, e, f, g, and h correspond to snow depths of the in situ observations of 0-5 cm, 5-10 cm, 10-15 cm, 15-20 cm, 20-30 cm, 30-40 cm, 40-50 cm, and above 50 cm, respectively.The blue bar represents the relative frequency of the fused snow depth in every intervals and the red dashed line represents the fitted line.

Figure 7 .
Figure 7. Spatial and temporal distribution of multi-year average snow depth over the Northern Hemisphere from 1980 to 2019: (a) time series of average snow depth over North America and Eurasia; (b) spatial distribution of multi-year average snow depth.

Figure 8 .
Figure 8. Spatial and temporal distribution of multi-year average snow depth in different snow seasons over the Northern Hemisphere from 1980 to 2019.(a) Spatial pattern of multi-year average snow depth in Autumn (September to November); (b) spatial pattern of multi-year average snow depth in Winter (December to February); (c) spatial pattern of multi-year average snow depth in Spring (March to May), and (d) time series of change of the average snow depth over the Northern Hemisphere by snow seasons.

Figure 9 .
Figure 9. Spatial distribution of snow depth changed rate over the past four decades by Mann-Kendall trend test.The black dots in the graph indicated these regions passed the significance test of 0.05.

Lin
Xiao received a B.Sc. degree from China West Normal University, Nanchong, China, in 2008, and a Ph.D. degree from the Chinese Academy of Sciences, Beijing, China, in 2019.She is currently a lecture with Sichuan agricultural university.Her current research interest includes the remote sensing in snow and forest.Jie Deng received a Ph.D. degree from the Northwest Institute of Eco-Environment and Resources (CAREERI), Chinese Academy of Sciences (CAS), Lanzhou, China, in 2021.She is currently a postdoctoral fellow at CAREERI, CAS.Her research interests include climate risk assessment of ski tourism sector and remote sensing of snow cover.Xin Li received a B.Sc. degree from Nanjing University, Nanjing, China, in 1992, and a Ph.D. degree from the Chinese Academy of Sciences, Beijing, China, in 1998.He has been a Professor with the Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences (CAS), Lanzhou, China, since 1999.He is currently the Director and a Professor with the National Tibetan Plateau Data Center, Institute of Tibetan Plateau Research, CAS.His current research interests include land data assimilation, the application of remote sensing and geography information systems in hydrology and cryosphere science, and integrated watershed modelling.

Table 1 .
Summary of the original six gridded snow depth products used in this study.

Table 2 .
Accuracy metrics at several in situ observation sites.R 2 : coefficient of determination, RMSE: root mean square error, MAE: mean absolute error, N: Number.-means that R 2 was not calculated because the errors were too large.

Table 3 .
Accuracy assessment between the fused snow depth dataset and in situ observations at different elevations.