Investigating Catchment‐Scale Daily Snow Depths of CMIP6 in Canada

Accurate modeling of snow depth (SD) processes is critical for understanding global energy balance changes, affecting climate change mitigation strategies. This study evaluates the Coupled Model Intercomparison Project Phase 6 (CMIP6) model performance in simulating daily SD across Canada. We assess CMIP6 outputs against observed data, focusing on daily SD averages, snow cover durations, and rates of accumulation and depletion, alongside annual SD peaks for 11 major Canadian catchments. Our findings reveal that CMIP6 simulations generally overestimate daily SD by 57.7% and extend snow cover duration by 30.5 days on average. While three models (CESM2, UKESM1‐0‐LL and MIROC6) notably align with observed annual SD peaks, simulation biases suggest the need for enhanced model parameterization to accurately capture snow physics, particularly in regions with permanent snow cover and complex terrains. This analysis underscores the necessity of refining CMIP6 simulations and incorporating detailed geographical data for better SD predictions.


Introduction
Accurate simulations of snowpack processes are crucial for understanding and predicting changes in the global energy balance, affecting nutrient cycles, surface and ground temperatures, water availability, water quality, and ecosystems (Hui et al., 2022;Qi et al., 2016).Further, reliable modeling of snow physical processes can improve future climate predictions, enabling more effective forecasting and mitigation of the impacts of climate change.Yet modeling snowpack processes poses significant challenges, especially in mountainous regions, where the risk of snow-related disasters like snow slides, avalanches, and high wind interactions is heightened.These complexities make both observations and simulations particularly demanding (Grünewald et al., 2013;Lehning et al., 2011).Snow depth (SD) plays a pivotal role in shaping snowpack dynamics.By enhancing the accuracy of SD simulations in Global Climate Models (GCMs), we can gain deeper insights into global climate variability and improve global climate projections.The latest generation of climate models, the Coupled Model Intercomparison Project Phase 6 (CMIP6), demonstrates considerable improvements in simulating snow processes compared to previous generations (e.g., Burke et al., 2020;Druel et al., 2017;Lee et al., 2021).However, parametrization schemes in GCMs often oversimplify sub-grid heterogeneity and neglect various drivers, leading to inaccurate simulations of SD (Clark et al., 2011).It is crucial to assess the outputs of GCMs to identify sources of biases and problematic regions, enabling enhancements in climate modeling.
Evaluating CMIP6 outputs requires employing reliable data sets for SD observations.Conventional methods for measuring SD include field snow surveys and automated snow sensors.Recent advances include techniques such as SD polarimetric phase differences (Leinss et al., 2014), ground-based laser scans (Deems et al., 2013), electromagnetic wave technology (McCreight et al., 2014), and GPS-based systems (Koch et al., 2019).More recently, remote sensing methods have been employed, utilizing tools like uncrewed aerial systems, motion photogrammetry (Miller et al., 2022), scanning lidar (Painter et al., 2016), and satellite imagery (Hui et al., 2022).The recent advancements are promising; however, providing SD in large domains, especially in mountainous regions, remains cost-ineffective, resulting in short records that lack comprehensive spatiotemporal coverage (Cluzet et al., 2022).To compromise these challenges, regional SD data sets have been developed utilizing probabilistic techniques to mitigate spatial irregularities and account for data gaps.For instance, the Canadian Meteorological Centre (CMC) daily SD data set is a reliable regional data set for the Northern Hemisphere (Brown & Brasnett, 2011).Global SD data sets which leverage advanced modelling and data assimilation techniques include MODIS and ERA5-Land (Hall et al., 2021;Muñoz-Sabater et al., 2021).
Limited studies have evaluated CMIP6 simulations of SD, mostly conducted at monthly scales, revealing spatiotemporal disparities and substantial biases in simulated SD across various regions (Chen et al., 2021;Zhang et al., 2022;Zhong et al., 2022).Snow-process biases are mostly attributed to errors in simulating temperature and precipitation, particularly in mountainous regions and the Northern Hemisphere (Kouki et al., 2021;Zhong et al., 2022).Snow water equivalent (SWE), typically associated with SD estimation in land surface models (LSMs) as noted by Sturm et al. (2010), has been assessed in numerous studies.Kouki et al. (2021) have highlighted that CMIP6 simulations overestimate SWE, with considerable model variability.However, the simulated area means of SWE align well with estimates from the ERA5-Land reanalysis data set (Räisänen, 2023).Another recent study has depicted a slight overestimation of the regional climate model of the Weather Research & Forecasting Model (WRF) compared to the ERA5-Land data set, with large biases identified when investigating seasonal extremes in Germany during 1987-2018 (Poschlod & Daloz, 2024).
Despite recent advances in snow process modelling within LSMs, no studies have investigated the performance of CMIP6 models in simulating daily SD (to the best of our knowledge).Here, we employ a unique, comprehensive observational data set of daily SD covering Canada to assess SD in CMIP6 model outputs.We investigate the performance of CMIP6 climate models in simulating daily SD in the Canadian catchments in two aspects.First, we evaluate the SD of CMIP6 against station data, CMC, and ERA5-Land data sets in terms of average daily SD, snow cover durations, accumulation and depletion rates, and annual peaks for 11 major catchments in Canada.Second, we attempt to understand the impacts of the adopted physical parametrization scheme of SD in climate models on their performance.

Study Area
Canada comprises 11 major catchments: Arctic, Great Slave Lake, Maritime Provinces, Mississippi River, Nelson, Northern Quebec and Labrador, Pacific, Southwestern Hudson Bay, St. Lawrence, Western and Northern Hudson Bay, and Yukon River (Figure 1 illustrates the catchments with color-coded abbreviations for reference).These catchments experience varying influences, such as precipitation patterns, oceanic fluctuations, topography, land use, and anthropogenic activities.For instance, highly populated watersheds like St. Lawrence and Great Slave Lake undergo changes in temperature, land use, and vegetation patterns (Watersheds Canada, 2018).In contrast, with a minimal population, the Arctic catchment presents challenges in simulating heat storage and snow melting uncertainties (e.g., Keen et al., 2021).Coastal catchments are affected by oceanic fluctuations and high natural variability.Additionally, catchments may exhibit mountainous terrain (e.g., Pacific catchment) or flat prairies (e.g., Nelson and Mississippi catchments).Different climate conditions can result in distinct SD variability (Matiu et al., 2021).Analyzing CMIP6 simulation biases, considering the distinct characteristics of Canadian catchments, can provide valuable insights into the sources of these biases.

Simulations
We utilize all available daily SD simulations, totaling 42 CMIP6 simulations from six models (Eyring et al., 2016).Each model exhibits varying spatial resolutions; thus, we applied the area conservative mapping to re-grid the simulations to 0.1°× 0.1° (Jones, 1999).Table S1 in Supporting Information S1 lists model names, original spatial resolution, and the number of study locations within each catchment for reference.

Observations
We employ 481 reference locations for SD observations provided by the Environment and Climate Change Canada (ECCC) to evaluate SD simulations.Variations exist in the starting and ending dates of available SD observations at different locations.For our analysis, we selected the period from 1998 to 2014, representing the most comprehensive period covered by the observations and overlapped with CMIP6 historical simulations.We also employ two gridded data sets: ERA5-Land at 0.1°× 0.1° (Muñoz-Sabater et al., 2021) and the CMC data set at 24 km (Brasnett, 1999).We regridded the CMC data set to 0.1°× 0.1°to match ERA5-Land and CMIP6 simulations.ERA5-Land relies on reanalysis that couples physical modeling with observations, while CMC primarily relies on in situ observations, including surface synoptic observations and meteorological aviation reports.CMC data set employs optimal interpolation techniques for spatial processing, utilizing an initial guess field derived from a basic snow accumulation and melt model.This model operates based on analyzed temperature data and predicted precipitation from the Canadian forecasting model (Brasnett, 1999).

Methods
We assess the performance of SD in CMIP6 simulations by comparing their gridded data with the corresponding reference stations located within the grid.Despite spatial structure mismatches, this approach is common in regional studies (e.g., Li et al., 2021;Rajulapati et al., 2022;Zamani et al., 2020).However, to effectively tackle this spatial structure mismatch and its impact on snow depth, our analysis includes evaluating simulations against gridded observational data sets such as ERA5-Land and CMC.We also perform a terrain comparison between station data and simulations, offering insights into model performance.
Our methodology involves calculating average daily nonzero SD, snow cover duration, and the slopes of SD accumulation and depletion for each study location.Accumulation refers to SD increase from zero to its peak within the annual cycle, while depletion tracks its decline back to zero. Figure S1 in Supporting Information S1 demonstrates that the accumulation and depletion of SD behave almost linearly (also refer to Abdelmoaty et al., 2024 for more insights into SD properties).Consequently, we compute the linear slopes of these processes to quantify the change over time for simulations and observations at the same locations.Snow cover duration and slopes are calculated each year, and the annual average value is determined for each study location.For evaluating SD extremes, we compare simulated and observed annual peaks, defined as the 97.5th percentile of each year's daily SD time series.This quantile is assumed to represent extremes without falsification for outliers or biased values.An additional analysis is presented in Supporting Information S1 Section 3 for evaluating extremes using L-moments as they offer robust representation for shape properties (i.e., tail) of distributions.
In presenting the results, we alternate between relative and absolute differences for clarity and precision because the effectiveness of each measure varies depending on the magnitude of the values.Hence, percentage differences are utilized to compare the average daily nonzero SD and annual peaks, whereas absolute differences are employed to assess the bias of CMIP6 simulations in replicating average annual snow cover durations, accumulation, and depletion slopes.
Finally, we attempt to relate each parameterization scheme the individual CMIP6 models adopted to their performance.SD biases primarily arise from simplifying complex snow physics processes through parameterization schemes used in GCMs.These snow processes are incorporated within the implemented LSM, which describes land surface processes using meteorological data such as precipitation, snowfall, and temperature (Abramowitz et al., 2008).Additionally, errors can accumulate in LSMs due to the interactions with the atmospheric modules within GCMs.Each GCM applies a distinct LSM and simulates specific processes believed to influence snow metrics, including SD.

Results and Discussions
Across 481 stations and 42 CMIP6 simulations, there's a 57.7% median overestimation in average daily snow depth (SD), translating to a 6.9 cm absolute difference (Figure 1).Variability within catchments is evident, with an interquartile range of the differences from 12.5% to 221.5%.Catchments C08 and C02 show minimal biases of 0.68% and 3.38%, respectively, attributed to their small size, inland location, and uniform land cover, with C02 being forested and C08 wetland-dominated (Natural Resources Canada, 2019).Catchments C03, C10, C06, and C09 exhibit low-range biases between 20.45% and 31.25%.In these catchments, large water bodies, such as the Atlantic Ocean, Labrador Sea, and Hudson Bay, complicate snowfall patterns, slightly impacting the accuracy of SD modeling (Burnett et al., 2003;Hartley, 1999).
C07 shows significant overestimation (271.84%),likely due to its complex terrain in this region, featuring mountains, ocean interactions with land, and persistent snow cover, presenting challenges in climate modeling (e. g., Allen & Zhao, 2022;Li et al., 2022;Masud et al., 2021;Zhang et al., 2022).These challenges are also common in the Yukon River catchment C11 (118.68%bias).Notably, these two catchments boast the highest observed records of SD during the study period (also recorded in Brasnett, 1999;Hong & Ye, 2014).The mountainous feature of these catchments contributes to this overestimation.Snow depth generally increases with altitude (Anderson et al., 2014;Grünewald et al., 2013), and climate model grids often represent higher elevations than actual observation stations, typically located in more accessible, lower-altitude areas.Hence, the simulated SD is higher than the observed data, affecting mountainous areas such as C07 and C11 (Figure S2 in Supporting Information S1).Further, due to limited stations in elevated regions, the difficulty of validating climate models persists.

10.1029/2024GL109664
Catchments C04 and C05, representing the Prairie region, also experience substantial median biases of 85.3% and 101.93%, respectively.LSMs mostly do not consider wind stress when simulating snow-land interactions, which is a major component of snow depth spatial distribution in the Prairies (Fang & Pomeroy, 2009).Finally, the Arctic region (C01) demonstrates a median bias of 51.93%.In this region, characterized by persisting annual snow cover, SD values depend on the insulating effect of snow cover, the high albedo of surfaces delaying the melting process, and regular ice deformations (Kwok et al., 2020;Lam et al., 2023).Physically simulating such interactions remains challenging (Chylek et al., 2022;Notz & SIMIP Community, 2020).For insights into the multi-model median performance of CMIP6 simulations at individual locations, refer to Figure 3.For the performance of each CMIP6 model simulation at each catchment, refer to Figure S3 in Supporting Information S1.
Comparing the average daily SD from the CMIP6 with the CMC data set reveals consistency with station data while employing ERA5-Land as a reference presents divergent outcomes (Figure S4 in Supporting Information S1).Specifically, the comparison of CMIP6 simulations with the CMC data set confirms the overestimation of the average daily SD with a median difference of 44.45% (2.3 cm) and an interquartile range of 30.5% ( 92.43 cm) to 194% (61.32 cm).This pattern of overestimation aligns with previous research; Zhong et al. (2022) found CMIP6 simulations overestimated monthly SD by 11% in high-altitude Northern Hemisphere regions, while Zhang et al. (2022) noted significant annual SD overestimation in China.Conversely, when comparing CMIP6 simulations to the ERA5-Land data set, the average daily snow depth is underestimated by a median of 15.14% ( 1.7 cm) and interquartile range of 47.01% ( 68.6 cm) to 83.48% (235.3 cm).Notably, ERA5-Land exhibits higher SD values compared to the observed ones (Figure S5 in Supporting Information S1).As a result, observations show the lowest SD values, followed by CMIP6 simulations, and then ERA5-Land.The higher values of both ERA5-Land and CMIP6 compared to station data are likely due to inadequate modeling of precipitation microphysics and land surface processes.A summary statistic of the comparison of simulations against station data, CMC data set, and ERA5-Land is presented in Table 1.
The accurate representation of snow cover duration is vital for climate models, as it significantly influences global temperatures by impacting global surface albedo.Additionally, the duration of snow cover can have implications for streamflow during the snowmelt season.Generally, CMIP6 simulations overestimate these durations, with a median difference of approximately 30.5 days for all simulations and grid points (Figure 2a).The individual performance of CMIP6 models is presented in Figure S6 of Supporting Information S1.Notably, catchment C07 exhibits the highest overestimation, around 75 days.The Yukon (C11) and Prairies (C04) catchments follow with the second-highest overestimations of median durations of 38 and 41 days, respectively.The remaining catchments overestimate snow cover durations by an average ranging from 18 to 33 days, while the Western and Northern Hudson Bay catchment (C10) displays the lowest overestimation bias of only 12 days.Note that four out of six models (CESM2, GISS-E2-2-G, IPSL-CM6A-LR, and IPSL-CM6A-LR-INCA) simulate permanent snow cover at some locations (5, 28, 6, and 6 locations, respectively) all year round in the Arctic and sub-Arctic regions (contrary to observations; see Figure S7 in Supporting Information S1).These models overestimate the snow cover duration (i.e., simulate longer snow cover than observed) by medians of 96, 73, 73, and 74 days, respectively (Figure S8 in Supporting Information S1).When analyzing the accumulation and depletion slopes of simulated SD compared to station data, the biases are minimal, with medians of 0.083 cm/day and 0.040 cm/ day, respectively (Figures 2b and 2c) for all simulations and grids.However, the individual model performance of these slopes is presented in Figures S9 and S10 of Supporting Information S1.Coastal and mountainous catchments (C11, C07, and C03) exhibit high bias variabilities.
The overestimation of snow cover duration may be attributed to the mismatch in spatial coverage, wherein one grid has more area coverage than its reference point of observation.However, this overestimation is confirmed when comparing simulations to the gridded CMC data set, highlighting that this bias is not fully explained by the spatial structure mismatch (Figure S11 in Supporting Information S1).Consequently, CMIP6 simulations appear to face challenges in accurately simulating SD characteristics in snow-dominant regions.While the CMC data set aligns closely with station data when comparing snow cover duration, it yields more bias in accumulation and depletion slope (Figure S11 in Supporting Information S1).This mismatch is probably caused due to the dependence of this data set on a simple snow model for spatial interpolation.In contrast, ERA5-Land captures the same simulation biases as station data for these slopes as it depends on more complex parametrization schemes for snow processes despite its mismatch in the average daily SD and snow cover duration (Figure S12 in Supporting Information S1).Therefore, caution is required when employing different observational data sets for SD, as different sources yield different uncertainties (Table 1).
To examine extreme events, we assess the performance of each CMIP6 model in every catchment by comparing the observed and simulated annual 97.5th percentile of SD at 481 locations of the stations across Canada (Figures 3b and 3d).The performance patterns of CMIP6 models align with the SD characteristics that were previously analyzed.The Pacific and Yukon catchments, C07 and C011, exhibit the highest bias, with medians of 119.40% and 67.81%, respectively.The Prairies catchments, C04 and C05, also demonstrate relatively high biases, with medians of 31.11% and 43.32%, respectively.In the Arctic catchment, C01, the bias is 24.01%as a median.The Western and Northern Hudson Bay catchment, C10, displays a more moderate bias of 8.23%.The remaining catchments show relatively low biases but with opposite signs (i.e., underestimation), with medians ranging from 9.67% to 1.76%.
The models show considerable variability in annual peak differences among simulations and catchments (Figure 4).Thus, there is no model consensus on the sign of the change with considerable variability within and among catchments.However, generally, GISS-E2-2G shows extreme positive and negative biases among catchments by medians of 148.14% and 73.57%, respectively.Besides, IPSL-CM6A-LR-INCA and IPSL-CM6A-LR models mostly overestimate annual peaks with medians of 49.88% and 48.30%, respectively, across all catchments.Despite having large biases with high variability in Arctic and coastal catchments, CESM2 manages to exhibit the lowest bias, with a median of 6.37%, followed by UKESM1-0-LL (15.47%) and MIROC6 ( 17.66%) models.The behavior of CMIP6 models shows considerable variation, which aligns with findings from previous studies on various snow process indicators (Kouki et al., 2021;Räisänen, 2023).When assessing simulation biases against different reference data sets, the comparative performance of individual models remains consistent (Figures S13 and S14 in Supporting Information S1).Using the CMC data set as a reference tends to align more closely with station data, showing an overestimation of SD annual peaks, whereas results derived from ERA5-Land indicate the opposite direction (Table 1).
The variability in CMIP6 model outputs partly stems from the use of different LSMs, with each model's parameterization schemes affecting SD estimates.MIROC6 employs MATSIRO, a comprehensive LSM that includes processes like snowfall and melt but assumes constant snow density, which might limit accuracy (Takata et al., 2003).UKESM uses JULES, considering sub-grid land types, but lacks vegetation-snow interactions, while CESM2's CLM5 focuses on sub-grid topography but is limited by being a single-layer model (Lee et al., 2021).Despite these limitations, MIROC6, UKESM, and CESM2 generally perform better in simulating peak SD, highlighting the importance of accounting for geomorphological factors (Clark et al., 2011).Challenges remain in accurately representing land heterogeneity, affecting SD variability.Models like GISS-E2-2G and IPSL-CM6A, which use simpler or older snow models, require additional improvements in daily SD simulation (Druel et al., 2017;Krinner et al., 2005;Lynch-Stieglitz, 1994;Wang et al., 2013).Considering the complexities of physically modeling SD, an effective alternative approach is stochastic modeling, which has demonstrated its capability to accurately reproduce observed properties such as autocorrelation, intermittency, and distributions (Abdelmoaty et al., 2024;Papalexiou, 2022;Papalexiou et al., 2023).

Conclusions
Precise modeling of daily SD is essential for Canada, as it directly influences water resource management, flood prediction, and climate change adaptation strategies across its diverse landscapes.Yet, to the best of our knowledge, no studies have investigated the performance of CMIP6 climate models in simulating daily SD.Here, we present a comprehensive analysis of CMIP6 simulations for daily SD in 11 major catchments across Canada, covering a wide range of climate types.We employ a dense network of daily SD gauges covering Canada and two gridded data sets (CMC and ERA5-Land), allowing for accurate assessment of CMIP6 outputs.This detailed evaluation allows climate modellers to improve their models and reduce biases by addressing specific issues related to the model performance of LSMs.It can also increase the applicability of CMIP6 models, knowing the strength and unsatisfactory performance of different SD metrics.The results indicate that the models perform best in small-sized catchments with homogenized land cover.However, the Pacific, Arctic, and Prairies catchments exhibit substantial variability and biases, necessitating further understanding and improved simulation of snowpack physical processes.In summary, the key findings of this study are.
• CMIP6 simulations generally overestimate average daily SD values, with a median of 57.7%.
• The simulated elevation in CMIP6 is higher than the actual station elevation, suggesting a partial reason for the overestimation of SD by CMIP6 models.• The simulations indicate a longer snow cover duration than observations, with a median of 30.5 days.
• The annual peaks of SD are accurately captured by three models: MIROC6 ( 17.66%), UKESM1-0-LL (15.47%), and CESM2 ( 6.37%).• The CMC data set closely aligns with station data, showing the same direction of bias across all SD metrics.However, it exhibits more simulation bias for SD accumulation and depletion slopes.• ERA5-Land generally presents the opposite sign of bias when used as a reference data set compared to station data.• Incorporating additional geomorphological factors improves the performance of climate models.Parameterization schemes require further enhancements to accurately represent snow physics in regions with heavy snowfall, complex terrains, and open environments like the Prairies.
These findings highlight the importance of refining CMIP6 simulations and incorporating detailed geographical data to improve snow depth predictions and our response to climate impacts in Canada.S1 in Supporting Information S1 for a brief description of the used CMIP6 models), and then download the NC files that appear as search outputs.An overview of these climate models is available in Eyring et al. (2016).The Canadian catchment shapefiles are extracted from the ECCC Historical Climate Data website (https://ftp.maps.canada.ca/pub/nrcan_rncan/vector/geobase_nhn_rhn/shp_en/) on 1 June 2023.The analysis in this study, along with the figures, were produced using R software, version 4.3.1 (R Core Team, 2021) and RStudio (Racine, 2012).This software is offered openly through https://cran.r-project.org//.The authors mostly use the following packages: "tidyverse", "sf", and "ggplot" (Bivand et al., 2021;Wickham, 2006Wickham, , 2017)).

Figure 1 .
Figure1.The major Canadian catchments, with black dots representing observations.The subplot shows the percentage difference between the average daily SD between simulations and observations.The filled regions inside boxplots show the interquartile range, while the horizontal line shows the median values.The whiskers extend 1.5 times the interquartile range from both ends.

Figure 2 .
Figure 2. Percentage difference between observed and simulated annual duration of snow cover, the accumulating slope, and the depletion slope of SD.When the percentage difference is negative, it indicates underestimated simulations.The shaded region and the error bar indicate the distributions of all stations and coupled model intercomparison project phase 6 simulations inside each catchment, while the point indicates the median value.

Figure 3 .
Figure 3.The observed and simulated (multi-model median) daily SD properties at the 481 study locations: (a) the observed and (b) the simulated average daily SD value; (c) the observed and (d) the simulated average annual SD peak represented by the 97.5th percentile.For these legends, approximately, the first class is the range between the minimum value and the 25% quantile; the second class is the interquartile range; the third class is the range between 75% quantile and 1 m; and the last class represents points with SD above 1 m.Penals (e) and (f) show percentage differences between observed (station data) and simulated (multi-model median) for the daily nonzero SD and annual SD daily peak, respectively.

Figure 4 .
Figure 4.The percentage difference between the observed and simulated annual peaks of SD for individual coupled model intercomparison project phase 6 (CMIP6) models in all major catchments in Canada.Each subplot presents the performance of the different CMIP6 models inside one catchment following the color code.In each subplot/catchment, the violin shapes show the distribution of all points from simulations of each model.

Table 1 A
Summary Statistic of the Comparison of Simulations Against Station Data, CMC Dataset, and ERA5-Land for Different SD Properties in All Catchments and CMIP6 Simulations

Table
).The CMC data set(Brasnett, 1999)is publicly available upon registering an account through the Canadian Meteorological Centre (CMC) website (https://daacdata.apps.nsidc.org/pub/DATASETS/nsidc0447_CMC_snow_depth_v01/). CMIP6 simulations provided by ESGF can be found by the open-source link: https:// esgf-node.llnl.gov/search/cmip6/.Users should select the variable as snd, which stands for snow depth or orog which stands for elevation, the frequency as daily, select Experiment ID as historical, and select CMIP6 models found in this study (See supplementary table: tab=form