Evaluation of CLASS Snow Simulation over Eastern Canada

The Canadian Land Surface Scheme (CLASS), version 3.6.1, was run offline for the period 1990–2011 over a domain centered on eastern Canada, driven by atmospheric forcing data dynamically downscaled from ERA-Interim using the Canadian Regional Climate Model. The precipitation inputs were adjusted to replicate the monthly average precipitation reported in the CRU observational database. The simulated fractional snow cover and the surface albedo were evaluated using NOAA Interactive Multisensor Snow and Ice Mapping System and MODIS data, and the snow water equivalent was evaluated using CMC, Global Snow Monitoring for Climate Research (GlobSnow), and Hydro-Québec products. The modeled fractional snow cover agreed well with the observational estimates. The albedo of snow-covered areas showed a bias of up to20.15 in boreal forest regions, owing to neglect of subgrid-scale lakes in the simulation. In June, conversely, there was a positive albedo bias in the remaining snow-covered areas, likely caused by neglect of impurities in the snow. The validation of the snow water equivalent was complicated by the fact that the three observationbased datasets differed widely. Also, the downward adjustment of the forcing precipitation clearly resulted in a low snowbias in some regions. However, where the density of the observations was high, the CLASS snow model was deemed to have performed well. Sensitivity tests confirmed the satisfactory behavior of the current parameterizations of snow thermal conductivity, snow albedo refreshment threshold, and limiting snow depth and underlined the importance of snow interception by vegetation. Overall, the study demonstrated the necessity of using a wide variety of observation-based datasets for model validation.


Introduction
The pivotal role of seasonal snow cover in the climate system has long been recognized. Numerous studies (e.g., Cohen and Rind 1991;Yang et al. 2001;Gong et al. 2004;Fletcher et al. 2009) have demonstrated the effect of snow albedo and temperature feedbacks on climate simulations. The relationship between the spring Tibetan snow cover and the strength of the East Asian monsoon is well established (e.g., Seol and Hong 2009). The impact of climate change on future snow cover is a topic of ongoing concern (e.g., Adam et al. 2009;Bavay et al. 2009). However, it is a recognized fact that current climate models vary quite widely in their treatment of snow and that this has led to substantial continuing divergence among climate simulations (e.g., Pedersen and Winther 2005;Qu and Hall 2007;Thackeray et al. 2015). For example, several studies have demonstrated the importance of properly taking into account the interactions between snow and vegetation (e.g., Essery 2013; Thackeray et al. 2014;Loranty et al. 2014). It is therefore important that the snow models used in climate studies be carefully validated.
This study describes a large-scale assessment of the snow simulation produced by the Canadian Land Surface Scheme (CLASS), which is used operationally in the Fourth Generation Canadian Atmospheric General Circulation Model (CanAM4;von Salzen et al. 2013). CLASS is also used operationally within the Canadian Regional Climate Model, versions 4 and 5 (CRCM4 and CRCM5, respectively;Caya and Laprise 1999;Zadra et al. 2008), which is employed as a research tool at l'Université du Québec à Montréal (UQAM; e.g., Garnaud et al. 2015;Diro et al. 2014;Ganji et al. 2015) and in the production of climate change simulations for North America and eastern Canada by the Ouranos Consortium in Montreal (e.g., Music and Caya 2007;Frigon et al. 2010); thus, this study provides a useful reference point for these groups as well. CLASS, like many other land surface schemes associated with atmospheric models, has participated in most of the Project for the Intercomparison of Land-Surface Parameterization Schemes (PILPS) experiments. Three of these included a snow simulation evaluation component: PILPS2d, which employed data collected at a boreal grassland watershed near Valdai, Russia (Schlosser et al. 2000;Slater et al. 2001); PILPS2e, which used data collected over the Torne-Kalix River basin in northern Sweden (Bowling et al. 2003;Nijssen et al. 2003); and PILPS2f, which undertook a set of aggregation experiments over the Rhône River basin in France (Boone et al. 2004). CLASS has also participated in the two international Snow Model Intercomparison Project (SnowMIP) experiments (Etchevers et al. 2004;Brown et al. 2006;Essery et al. 2009;Rutter et al. 2009). In addition, several other independent validation studies of the CLASS snow simulation have been carried out using Canadian field datasets (e.g., Pomeroy et al. 1998;Munro et al. 2000). These have all provided valuable insights into the behavior of the CLASS snow model and have often led to improvements in the model algorithms. Such studies have, for the most part, made use of data collected at a small number of field sites (except for the PILPS2e and PILPS2f experiments, and those did not undertake any in-depth evaluation of the snow simulation). Therefore, there remains a need for a validation study of the CLASS snow simulation on a regional scale, comparable to the scales typically used when running with atmospheric models.
It was decided to test CLASS in uncoupled mode, using forcing data derived from downscaled atmospheric reanalyses. Although this neglects the effects of feedbacks between the land and the atmosphere, for the purposes of model validation, it has certain advantages over running coupled to an atmospheric model (e.g., Parajka et al. 2010;Toure et al. 2016;Brun et al. 2013). It enables the simulations to be run more quickly than in coupled mode and facilitates the examination of model sensitivities. It also allows for mitigation of biases in the atmospheric forcing fields by adjusting the fields to more accurately reflect observations (as described in section 4 below). This more easily enables the identification of shortcomings in the simulation that are purely related to the formulation of the land surface scheme. In coupled runs, it is often difficult to distinguish whether it is the atmospheric model or the land surface scheme that is in need of improvement.
A key goal of this study is to document the performance of the snow model portion of the newly released version 3.6.1 of CLASS, which is to be used within the Canadian Earth System Model (CanESM) in the upcoming CMIP6 project. A previous study of the snow simulations generated by the models participating in the past CMIP5 experiment (Thackeray et al. 2015) found that the models did reasonably well at simulating the seasonal evolution of the snow-covered area, but showed substantial biases in the snow albedo, especially in boreal forest areas. The version of CLASS that was used in the Second Generation Canadian Earth System Model (CanESM2) was version 2.7, which was frozen for operational use in 1997. A number of improvements to the model, including the snow portion, have been introduced since then. This study also provides an opportunity to conduct certain sensitivity tests on selected model parameterizations.

CLASS description and snow model characteristics
The basic structure of CLASS is the same in version 2.7 and in version 3.6.1. Briefly summarized, CLASS models separate energy and water balances for the vegetation canopy, snow, and soil. As a first-order treatment of subgrid-scale heterogeneity, the vertical energy and water fluxes are modeled separately for four subareas in each grid cell: vegetated, bare soil, vegetated with snow cover, and bare soil with snow cover. The basic equations governing the transfer of energy and water between the atmosphere and the canopy, snow, and the three soil layers are described in Verseghy (1991) and Verseghy et al. (1993). Extensive testing and improvement of the model algorithms have been carried out over the years, involving the participation of numerous university and government colleagues [e.g., see summary in ].
Snow in both versions of CLASS is modeled as a single layer, thermally separate from the underlying soil. Some researchers have opted for multiple layers in their snow models, but the results of the SnowMIP experiment (Etchevers et al. 2004) demonstrated that singlelayer models were able to perform as well as multilayer models in their simulation of the seasonal snowpack evolution. CLASS in particular assigns a fitted quadratic temperature curve over the snow layer as a first-order way of accounting for the characteristic sharp nearsurface temperature gradient in snow packs. The snow albedo decreases and the snow density increases exponentially with time from fresh snow values according to empirically derived functions [based on data from Aguado (1985), Robinson and Kukla (1984), Dirmhirn and Eaton (1975), Longley (1960), and Gold (1958)]. The fresh snow density is determined as a function of the air temperature (Pomeroy and Gray 1995). The snow thermal conductivity is derived from the snow density [in CLASS, version 2.7, based on the work of Mellor (1977)]. Melting of the snow layer can occur either from above (in response to surface energy fluxes) or from below (in response to conduction from the underlying soil); if melting occurs at the top, percolation and refreezing of meltwater in the snowpack occurs until the snow is isothermal at 08C, after which meltwater can infiltrate into the soil. Interception of snowfall by vegetation is explicitly modeled, with the interception capacity depending on the vegetation leaf area index. The snow cover is assumed to be complete if the diagnosed snow depth is equal to or greater than a threshold value of 0.10 m; if it is less, the snow depth is set to this threshold value and the fractional snow cover is determined on the basis of conservation of snow mass.
The new version of CLASS used in this study, version 3.6.1 (Verseghy 2017), incorporates several improvements to the snow model since version 2.7. These include a revised formulation for vegetation interception of snow (Bartlett et al. 2006), a new parameterization for unloading of snow from vegetation , adjustments to the albedo of snow-covered canopies (Bartlett and Verseghy 2015), a revision of the limiting snow density as a function of depth (Tabler et al. 1990;Brown et al. 2006), and new algorithms for snow thermal conductivity (Sturm et al. 1997). Water retention in snow packs has also been incorporated (Bruce and Clark 1980), and the snow albedo refreshment threshold has been updated (see discussion in section 5f below).

Model setup, forcing, and background data
The modeling domain chosen for this study was the eastern portion of Canada, centered over the province of Quebec (see Fig. 1). This area has also been the focus of other recent studies using CLASS coupled to CRCM4 (e.g., Langlois et al. 2014;Wang et al. 2014). The vegetation cover over the domain ranges from tundra in the north, through boreal forest in the central region, to temperate mixed and deciduous forest and agricultural areas in the south. The central Quebec-Labrador area is characterized by the second-largest seasonal snow accumulation over the North American continent, after the western Rocky Mountains. The CLASS runs analyzed in this study were driven using data from the European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim; Dee et al. 2011). The reanalysis was dynamically downscaled over the model domain from 0.758 to 0.258 horizontal resolution (the CLASS target modeling grid) and from a 6-h to a 15-min time step using version 5.0.1 of the CRCM (Zadra et al. 2008). A modeling period from June 1990 to May 2011 was selected. The forcing fields needed to drive CLASS were saved at each time step: incoming shortwave radiation; incoming longwave radiation; temperature, humidity, and wind speed at the lowest model level; height of the lowest model level; rainfall and snowfall rates; fractional cloud cover; and surface pressure. The provision of separate rainfall and snowfall rates constitutes an important advantage over previous studies, since some groups (e.g., Langlois et al. 2014), which used an empirical partitioning between rainfall and snowfall based on the surface air temperature, found the snow simulation to be very sensitive to the parameterization used. The first simulation year, from June 1990 to May 1991, was treated as a spinup period and was not included in the subsequent analyses. Gridded maps of the land cover and soils over the model domain are required to run CLASS. The landcover classification used was the Global Land Cover 2000 (GLC2000) dataset (Bartholomé and Belward 2005), the same as that used in coupled mode with CanAM4. Soil profile data were obtained from the ''Soil Landscapes of Canada'' dataset produced by the Canadian Soil Information Service of Agriculture and Agri-Food Canada over Canada, and from USGS data (Miller and White 1998) over the United States. Initial soil temperatures for each grid cell were set to the average forcing air temperature over the modeling period, and initial soil moistures were set to field capacity.

Validation datasets
For the validation of the near-surface climate and the snow simulation, an effort was made to avoid the complication of model biases as much as possible by selecting datasets that were primarily observation based, that is, reliant on surface observations or remote sensing. These datasets are described below.

a. Precipitation and surface air temperature
The Canadian Gridded Temperature and Precipitation Anomalies (CANGRD) dataset (Milewska et al. 2005) is a product of the Climate Research Division of Environment Canada (now Environment and Climate Change Canada). It is based on weather and climate station observations from 1900 to the present of total precipitation, as well as minimum, maximum, and mean screen-level air temperature, corrected for systematic errors such as gauge undercatch and for the effects of observation system changes, station relocation, and spatial inhomogeneities. The locations of the observing stations over the modeling domain are shown in Fig. 2a. Gandin's method of optimal interpolation [described in Alaka and Elvander (1972)] was used to generate gridded monthly temperature and precipitation anomalies (departures from 1961-91 normals) based on these station data on a 50-km polar stereographic grid. The normals were interpolated to the grid using the ANUSPLIN model (based on thin plate smoothing splines, developed at the Australian National University), and the anomalies were combined with the gridded normals to produce the final, corrected observationbased datasets of temperature and precipitation. For the purpose of this study, the CANGRD data were interpolated to the 0.258 CLASS modeling grid using bilinear interpolation.
A second gridded dataset of precipitation and temperature was also used for validation, Climatic Research Unit Time Series, version 3.21 (CRU TS3.21), of the University of East Anglia (Harris et al. 2014). The database is derived from monthly observations at meteorological stations around the world, interpolated to a 0.58 grid, and covers the period from 1901 to 2012. Data for the modeled time period were extracted and interpolated to the model grid using the nearest-neighbor approach.

b. Snow cover and surface albedo
Snow cover and surface albedo data derived from two satellite retrieval sources were used for the evaluation of the snow simulation. Snow cover data were obtained from the NOAA daily hemispheric operational snow maps at 24-km resolution, which have been produced since 1997 on the basis of a variety of satellite data using the Interactive Multisensor Snow and Ice Mapping System (IMS; Ramsay 1998). This dataset reports a simple binary measure of the presence or absence of snow based on a 50% snow cover fraction threshold. The monthly snow cover fraction was computed from the number of snow-covered days. Snow cover and surface white-sky albedo were obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) products, available at 0.058 resolution since 2000 (Schaaf et al. 2002;Hall et al. 2002). The data were interpolated to the model grid using the nearest-neighbor approach for the NOAA IMS data and box averaging for the MODIS data, and averaged for each month. To minimize the problem of cloud cover contamination of the MODIS images, only high-quality (grade 2 or better) retrievals were included in the averaging. Regarding the two snow cover datasets, Brown et al. (2010) found that the NOAA IMS product reported consistently higher snow cover fractions than the MODIS product during the spring melt period. Wang et al. (2014) suggested that the NOAA IMS and the MODIS products may be interpreted as providing the upper and lower limits, respectively, of the ''true'' snow cover fraction.

c. Snow water equivalent
Three snow water equivalent (SWE) datasets were used in the evaluation of the snow simulation. The first is derived from the global snow depth analysis that has been produced on a daily basis by the Canadian Meteorological Centre (CMC) since 1998 and archived on the same 24-km grid as the NOAA IMS product described above (Brown and Brasnett 2010). The CMC global daily snow depth analysis (Brasnett 1999) is based on optimal interpolation of in situ daily snow depth observations with a first-guess field generated from a simple snow accumulation and melt An SWE product is derived from the snow depth product by means of a climatologically based density lookup table (Brown and Mote 2009). It must be noted that in datasparse areas (i.e., most of Quebec north of ;558N), the snow depth will depend almost entirely on the background first-guess field, which is derived from the GEM forcing precipitation. Compared with the CRU data, we have found that the wintertime GEM precipitation in these northern areas tends to be biased low, by around 0.1-0.4 mm day 21 . Also, where observations come from coastal locations or large open areas at airports, the observed snow depth itself will tend to be biased low (because of unrepresentatively temperate conditions at the former and erosion of the snowpack by wind at the latter; Brown and Brasnett 2011). The CMC SWE values for these areas must therefore be interpreted with caution. The second dataset is the European Space Agency (ESA) Global Snow Monitoring for Climate Research (GlobSnow), version 2, SWE dataset (Takala et al. 2011). This product makes use of data assimilation techniques to combine satellite passive microwave data with surface snow depth observations. It spans the period from 1979 to the present and provides Northern Hemisphere coverage (excluding mountains and ice sheets) at a spatial resolution of approximately 25 km. Monthly SWE data were extracted for the model time period and interpolated to the model grid using the nearest-neighbor approach. As with the CMC dataset described above, the GlobSnow dataset will tend to be biased low where the snow data come from coastal or airport locations. Another drawback is the fact that passive microwave SWE retrieval algorithms systematically begin to underestimate at SWE values greater than approximately 100-150 kg m 22 due to signal saturation (e.g., Dong et al. 2005).
The third dataset consists of historical snow course data for Quebec and adjacent regions, covering the period 1936-2006 and incorporating measurements taken by the Meteorological Service of Canada, provincial governments (Ontario, New Brunswick, and Quebec), and the hydroelectric power company Hydro-Québec and partners (Brown 2010). These measurements were made biweekly, around the first and the fifteenth day of the month, during the time of maximum snow accumulation and into the spring melt period, to provide information in support of hydroelectric power generation. This dataset has the advantage of containing a large number of direct, multiple measurements of SWE at given locations for certain regions over a long time period. It also has the advantage of avoiding low biases associated with data from meteorological stations located at coastal locations or at airports, which is a potential weakness of the CMC and GlobSnow datasets as noted above. On the other hand, utilizing the data presents difficulties since the seasonal number of the measurements is not always consistent, and the measurements are largely confined to selected river basins that contribute to hydroelectric production (see Fig. 2c). For this reason, no attempt was made to interpolate the data onto the model grid; the data points were simply allocated to the corresponding model grid cells using box averaging.

Results and discussion
a. Adjustments to precipitation in forcing data A reasonable simulation of the snow on the land surface is not possible if the precipitation data contain substantial biases. Therefore, before the simulations were carried out, the monthly average precipitation from the 21 years of forcing data were compared with the CANGRD and CRU observations. An overview of the seasonal differences is shown in Fig. 3, including average differences for the tundra (558-658N), taiga (508-558N), and southern zones (458-508N). It can be seen that, in general, the downscaled forcing data tend to show a positive precipitation bias. This is mostly small in the winter (averaging around 0.1 mm day 21 ), but is larger in the fall and spring, in some areas exceeding 1 mm day 21 . An adjustment to the precipitation forcing was therefore performed. For each month k of the simulation period, an adjustment factor F adj was calculated as the ratio of the CRU precipitation P CRU divided by the modeldownscaled precipitation P mod for that month: This factor was then used to scale the precipitation at each model time step within the month, so that for each month over the simulation, the monthly average modeled precipitation was exactly equal to the CRU value. (The CRU data were used rather than CANGRD, because the two are broadly similar as seen in Fig. 3, and the CANGRD data do not extend south of the Canadian border.)

b. Simulated temperatures
The simulated seasonal screen temperatures over the modeled 20-yr period (after the year of spinup) are compared with the CRU and CANGRD data in Fig. 4. It can be seen that the average fall anomalies over the tundra, taiga, and southern zones are relatively small (less than 18C), which would suggest that the snow accumulation period should be reasonably well simulated. The winter season on average shows quite a large warm anomaly in the south-central to west-central part of the domain, exceeding 28C in the southern zone, but FIG. 4. As in Fig. 3, but for seasonally averaged near-surface temperature T (8C).

1212
since the average temperatures associated with these regions are well below 08C during this period, it may be conjectured that this should not have a large effect on the snow simulation. The spring season in general shows a cold anomaly in the tundra zone and a warm anomaly in the southern zone. This suggests that we may expect a tendency for longer persistence of the snowpack in the north compared with observations and a quicker disappearance of the snow in the south.

c. Simulated snow cover
For the snow simulation, the analysis focused mainly on the spring ablation period, since that is the most challenging for a land surface scheme to model. The simulated fractional snow cover is compared with the NOAA IMS and the MODIS products for the 10-yr interval of 2001-10, when there is overlap between the two observational datasets and the modeled period. Figure 5 shows the average annual variation of the snow cover. It can be seen that, as noted in section 4b, the NOAA and MODIS values agree closely during the fall period but diverge in the spring period, when the NOAA values are consistently higher. Except for an underestimation in November of about 12%, the simulated snow cover tracks the MODIS average values closely. Figure 6 shows the modeled and observed monthly average snow cover for the months of March-June. By March the snow cover is fully developed and has reached its maximum extent. The southern extent of full snow cover is very well simulated, outlining the Laurentians in Quebec and the northern Appalachians. The Adirondacks in New York State and the Haliburton highlands in Ontario also show up clearly. In April the simulated southern extent of snow cover and the area of full snow cover match the MODIS outlines closely, with some apparent underestimation of the fractional snow cover south of 508. In May and June the differences between the NOAA IMS and MODIS datasets are quite marked, so it becomes difficult to assess the simulation. However, the simulated southern extent of snow cover again follows the two observational datasets closely in May, and in both months the simulated snow cover fraction lies between them, as shown in Fig. 5. Figure 7 (left, center) shows the variation of the simulated surface albedo through the months of March-June and the differences between the simulated monthly albedo and the MODIS values. Included on the plots are average differences for snow-covered areas in the tundra, taiga, and southern zones. Again, the 10-yr interval of 2001-10 is used. Grid cells reporting missing data (indicated by white areas) are those where one or more of the 10 years' MODIS monthly averages were unusable because of cloud contamination. Average differences in the albedo of snow-covered areas for the 4 months are shown in Table 1.

d. Simulated albedo
It can be seen that the spatial patterns of the simulated albedos and the differences between them and the MODIS data look very similar for the two months of March and April over the snow-covered areas of the domain north of 508. The largest anomalies in the simulated albedo, reaching an average of 20.15, occur in the taiga zone, which is characterized by high coverage of evergreen needleleaf forest as shown in Fig. 8a. It has been suggested in other studies that albedo errors in such regions may point to an erroneous parameterization of the leaf area index or of forest type (e.g., Thackeray et al. 2015;Wang et al. 2016). However, in our particular case, the most important factor likely lies in the treatment of subgrid-scale lakes. Over our eastern Canada modeling domain the fractional coverage of lakes is very high, reaching as much as 40% for some grid cells, as shown in Fig. 8b. Upon comparing the lake distribution with the albedo differences, a strong correlation can be seen. This is because although a subgrid lake model has recently been developed for CLASS (Mackay 2012), it is still undergoing testing and has not as yet been implemented in our modeling framework. Where lakes occur, the fractional vegetation cover is currently normalized by the total fraction of lake-free land, which has the effect of extrapolating the existing vegetation over the lake areas. This leads to an overestimation of the needleleaf forest cover compared with the original data by about 15%, as shown in Fig. 8c, and therefore to an underestimation of the surface albedo, since if lakes were modeled, they would be frozen over and snow covered in the winter, and consequently brighter than the surrounding forest-covered land.
To further investigate this hypothesis, we performed an experiment where the unextrapolated vegetation data were used, which means in effect treating the subgrid lakes like bare soil. For the months of March and April the albedo differences are greatly reduced (Fig. 7,  right), by around a factor of 2, as shown in the zonal averages and in Table 1. In the month of May the differences between the simulated and the MODIS albedos are small for both the original simulation and the one with unextrapolated vegetation. In June, the simulated albedos are overestimated compared to the MODIS data in a swath stretching from the Ungava Peninsula south to approximately 528N. In the southern part of this area (i.e., around 558N), where the snow cover has largely disappeared, the overestimation is slightly worse when the vegetation is unextrapolated. This region is characterized by very sandy soils (see Fig. 8d), which are assigned a high albedo in CLASS (0.45 for dry soil). Much of this area is rich in subgrid lakes, so if the vegetation is not extrapolated, that is, if dark, ice-free lakes  8. (a) Fractional coverage of evergreen needleleaf forest in the base run, (b) fractional coverage of subgridscale lakes in the source GLC2000 dataset, (c) fractional coverage of boreal forest in the run with unextrapolated vegetation cover, and (d) percent sand content of first soil layer (zero sand content represents areas with organic soil or continental ice sheets). Over the taiga area (508-558N), the fractional coverage of boreal forest decreases from 74% to 64% between (a) and (c). are treated like bright, sandy soil, the overestimation becomes exacerbated. In the northern part of the area of albedo overestimation, where snow is still present, the overestimation may be related to delayed melt onset, reflected by the spring cold bias (Fig. 4), or possibly to neglect of the effect of impurities in old snow in the CLASS snow albedo calculations. A parameterization of the effect of black carbon on the snow albedo has recently been developed for CLASS (Namazi et al. 2015), and this is used when CLASS is run coupled to CanAM4, when aerosol deposition on the surface is modeled as part of the global climate simulation. (Aerosol forcings were not available in the current study.)

e. Simulated SWE
The two gridded SWE datasets, GlobSnow and CMC, were compared with the simulation for the 12-yr period of 1998-2010, in which the three overlap. Figure 9 presents plots for the months of March-June for the GlobSnow and CMC average SWE values, and the differences between the modeled and CMC average SWE, including averages for the tundra, taiga, and southern zones. The first point to note is that the GlobSnow dataset appears as an outlier. Despite its relatively high resolution, the SWE fields seem greatly smoothed and lacking the SWE maxima in the coastal areas north of the Gulf of St. Lawrence and in eastern and northern Labrador. Similar patterns were reported in another study by Zhou et al. (2014) focusing on the whole circumpolar area. This is partly explained by the problem, noted above, of signal saturation at SWE values greater than around 150 mm, but the dataset also shows a considerably earlier ablation of the snow cover than either the CMC or the model data, despite the fact that signal saturation at this time should not pose a problem. We therefore conclude that the GlobSnow product is in need of refinement in this particular region of the world and hereafter focus on the CMC data alone. A similar conclusion was reached by Larue et al. (2015).
The second point to note is that adjusting the model forcing precipitation fields to match the monthly CRU values seems actually to have degraded the snow simulation in some areas. Comparing the difference during the fall season between the original forcing precipitation and the CRU data in Fig. 3 with the March plot of modeled minus CMC SWE in Fig. 9, it can be seen that much of the area with negative anomalies in modeled SWE corresponds with areas where the fall precipitation was scaled back, such as in southwestern Quebec, Newfoundland, and especially the north shore of the Gulf of St. Lawrence, where the SWE anomaly is very marked both in March and April. (The low SWE bias in the areas south of 508N in March and April is reflected in the warm springtime air temperature bias, as shown in Fig. 4.) Of course the CRU product, like the CANGRD product, relies on surface weather observations, which as shown in Fig. 2 become very sparse in remote and northerly areas, and are also biased toward coastal locations that may not be representative of the general area. Scaling the forcing precipitation to the CRU values has therefore clearly tended to introduce a low bias in some areas, which is in fact greater than would have been the case if the CANGRD precipitation had been used, since as noted in section 4a above the latter incorporate corrections for systematic biases such as gauge undercatch.
Keeping in mind the above considerations, the most notable feature of the SWE difference plots is the quite substantial positive bias in modeled SWE compared to the CMC data for March-May in Labrador and in the northern part of Quebec; the tundra zonal average SWE anomaly in April exceeds 65 kg m 22 . This is further illustrated by the annual average variation in SWE for the tundra, taiga, and southern zones, shown in Fig. 10. In the southern zone the modeled SWE tracks the CMC SWE very closely, showing a maximum bias of about 27% in February and March. In the taiga zone the modeled SWE differs from the CMC SWE by only a few percent until April, when it develops a positive bias of over 30%. In the tundra zone the modeled SWE reaches a positive bias of almost 60% in April. (The areally averaged SWE atypically increases from south to north over this domain because of the high SWE values over the Laurentians in the taiga zone, and because of the large maximum over the Torngat Mountains coupled with the relatively small land area in this zone.) It must be noted that since the CMC SWE product is not well supported by surface observations toward the northern part of the model domain, it is unclear whether the apparent model overestimation of SWE is real or whether it reflects a bias in the observations. As noted in section 4c above, we may in fact expect a low bias in the CMC SWE in the northern part of the domain where surface observations come almost exclusively from coastal or airport locations, and also because of the tendency toward a low bias in the GEM forcing precipitation compared with the CRU data. On the model side, for the tundra zone it is possible that the positive bias in the modeled SWE may be at least partly caused by the neglect of sublimation of blowing snow. Field studies, for example, Pomeroy and Li (2000), have found that for Arctic sites, blowing snow sublimation can exceed 20% of the snowfall, but the application of blowing snow models designed for point or catchment scales to global climate simulations presents substantial difficulties. At any rate, it is encouraging that the modeled SWE compares well with the CMC SWE in the southern zone, where surface observations are more plentiful.
We now turn to the historical snow course data described above for a second, independent assessment of the SWE simulation. In the case of this dataset, we have a different 12-yr period of overlap between the model run and the observations, 1992-2003. As noted in section 4c and as shown in Fig. 2c, the snow course observations are clustered in the southern Canadian part of the domain and in river basins associated with hydroelectric power production. Also, a unique challenge in utilizing this dataset lies in the fact that in any given area, the number of observations generally peaks at the time of maximum SWE and then falls off rapidly, so that the total number of observations varies from year to year. It should also be kept in mind that during the spring melt period, the snow course data are only reporting from areas that still have a snow cover, which creates a growing bias compared to the modeled areal average, and for logistical reasons snow course measurements tend to be made at forest edges or in clearings. Thus, in contrast to the CMC data, the historical SWE values will generally tend to be biased high.
For these reasons, we elected to compare the modeled and observed SWE by region and by date. We selected 12 regions, shown in Fig. 11, where the observations were relatively dense in space and relatively plentiful across the 12-yr period, paying particular attention to regions where the simulated SWE and the CMC SWE differed most substantially, but including some more southerly regions so as to provide a wider assessment of the snow ablation period. For the dates, we selected the first of the month from February to May. As noted above, the snow course measurements were taken around the first and the fifteenth of each month, but the measurement date at any given site sometimes varied by as much as several days (as illustrated in Fig. 12), so in order to capture as much data as possible, we used a window of 67 days around the chosen date. For each selected date, within each region, for each year we calculated the average observed SWE over the grid cells where snow course observations were available, and the average modeled SWE over the same grid cells, and then averaged these values over the 12-yr period. Regions 1-5 were selected to correspond to areas where large positive differences occurred between the modeled and CMC SWE values in the March and April plots, regions 6-9 were chosen to provide examples of snowpack evolution in southerly areas, and regions 10-12 were selected as examples of areas where the adjustment of the forcing precipitation had evidently caused an underestimation in the modeled SWE. The results for 1 February, 1 March, 1 April, and 1 May are shown in Tables 2-5. Also shown are the cumulative modeled precipitation, sublimation, melting, and freezing fluxes for the snowpack from the previous summer to the date in question. To enable a general comparison with the CMC data, Table 6 shows the average modeled and CMC SWE for the same 12 regions, for the months of because they represent monthly averages and not date snapshots, second because they are averaged over the whole regions and not just where snow course measurements were available on a given date, and third because they relate to a different 12-yr time period. However, it is possible to investigate in general terms the overall biases seen in the modeled SWE compared to the CMC SWE and ascertain whether they are in fact confirmed by this second comparison. It is striking to note that even though the modeled SWE has been shown on the whole to be biased high relative to the CMC SWE, without exception the historical SWE values for each region and each date are larger than the modeled values-in some cases markedly so. For regions 1-5, the largest mean bias error (MBE) for the modeled SWE anomalies compared with the CMC product, 161.7 mm, occurs in April. Yet the MBE associated with the modeled versus the historical SWE values is 243.3 mm on 1 April and 251.2 mm on 1 May. This can partly be attributed to the fact that, as noted above, the snow course measurements are increasingly biased high during the snowmelt period compared with areal averages, since they are only reporting from areas where there is still snow. However, the dichotomy between the CMC and the historical SWE values is evident even as far back as February, when the MBE associated with the CMC monthly average is 119.8 mm, yet the MBEs associated with the historical data from 1 February and 1 March are 219.8 and 223.9 mm, respectively. We may conclude that in view of the evident biases associated with both the observational datasets, we should not be perturbed by the seemingly high bias in the modeled SWE compared with  the total number of grid cells over the 12-yr period in each region containing snow survey data. Variables P, S, M, and F (mm) represent the cumulative modeled precipitation, sublimation, melt, and freezing fluxes for the date, respectively. The SWE MBE for regions 1-5 is 219.8 mm, for regions 6-9 is 25.9 mm, and for regions 10-12 is 241.4 mm. The r 2 value for the 12 regions is 0.79.   Table 2, but for 1 Mar. The SWE MBE for regions 1-5 is 223.9 mm, for regions 6-9 is 216.7 mm, and for regions 10-12 is 258.2 mm. The r 2 value for the 12 regions is 0.85.  Table 2, but for 1 Apr. The SWE MBE for regions 1-5 is 243.3 mm, for regions 6-9 is 223.7 mm, and for regions 10-12 is 287.5 mm. The r 2 value for the 12 regions is 0.89. That the downscaled precipitation is at least partly to blame (especially in the earlier months, associated with the maximum number of snow course data reports) is particularly evident in region 12, where the historical SWE is larger than the cumulative precipitation on the snowpack for all three dates of 1 February, 1 March, and 1 April.

f. Sensitivity tests
Previous studies have suggested that the snow simulation is sensitive to various parameterizations and threshold values in the model. Four tests were therefore carried out to explore the sensitivity of the snow simulation to some of these. The differences between the SWE values simulated in the base run and those simulated in the sensitivity tests for each of the 12 test regions listed above for 1 April are summarized in Table 7.
The first sensitivity test investigated the effect of replacing the Mellor (1977) formulation for the snow thermal conductivity, originally used in CLASS, version 2.7, with the Sturm et al. (1997) formulation, introduced in CLASS, version 3.6. The Mellor values are on average about twice as large as the Sturm et al. values, resulting in colder soils over the winter period. Offline CLASS simulations at three Quebec forest sites (R. Harvey et al. 2010, poster presentation) showed that simulations using the Sturm et al. formulation yielded more realistic snow and soil temperatures than with the Mellor formulation. In the present sensitivity test, during the winter months the simulated SWE is similar for the two runs. However, during the spring ablation period, snowmelt proceeds more slowly in the run with the Mellor formulation because of the colder underlying soil temperatures, as shown in Table 7, where on 1 April the SWE in the Mellor run is on average 14% larger than in the Sturm et al. run. By May and June the fractional snow cover in the Mellor run is close to or even slightly exceeds the snow cover extent of the NOAA IMS product (not shown), thus providing further evidence that the Sturm et al. formulation results in a more realistic snow simulation.
The second sensitivity test took the extreme step of neglecting completely snow interception by vegetation. This is deemed an instructive exercise, as some global atmospheric models still do not explicitly model this process (e.g., Dutra et al. 2010). The vegetation interception capacity was set to 0 for this test, and Table 7 shows that doing so causes an SWE increase of over 40% The third sensitivity test investigated the effect of varying the snow albedo refreshment threshold, that is, the depth of snowfall over a time step that is required to refresh the snow albedo to the fresh snow value. Previous studies (e.g., Wang et al. 2014) have demonstrated high sensitivity to this parameter. The value assigned to this threshold in CLASS, version 2.7, was 0.0013 m; a value of 0.005 m was adopted on an experimental basis in CLASS, version 3.1, but this was later revisited; and after extensive testing the value chosen for CLASS, version 3.6, was 0.0001 m. It was found that such a small value is needed in order to take account of the bias in the atmospheric model forcing of low-intensity, long-duration precipitation events-a known characteristic of large-scale atmospheric models (Dai 2006). As shown in Table 7, changing the snow albedo refreshment threshold back to 0.005 m results in an overall average decrease in the 1 April SWE by nearly one-quarter. The May and June snow cover fractional areas (not shown) are much smaller even than in the MODIS product, which as noted above is deemed to represent the lower limit of observation-derived snow cover. This therefore confirms that a threshold value of 0.005 m is substantially too high, supporting the earlier conclusion reached by Wang et al. (2014).
The final sensitivity test addressed the effect of the choice of the value assigned to the lower limit of snow depth, below which the snow cover in CLASS is modeled as discontinuous, as described in section 2 above. Snow simulations performed by other land surface schemes have sometimes shown substantial sensitivity to the method used to calculate fractional snow cover (e.g., Wang and Zeng 2010). In contrast, CLASS shows little sensitivity to this limiting depth, perhaps owing to the fact that, in other schemes, often this limiting snow depth is also used to take into account the effects of vegetation masking of the snow cover, whereas in CLASS this is treated explicitly. Increasing or decreasing the value of the limiting depth, currently set to 0.10 m, by 50% has little effect on the 1 April SWE simulation, as shown in Table 7. The differences in the May and June fractional snow cover fields are likewise very small (not shown). We therefore conclude that the current constant value of 0.10 m for the snow limiting depth seems to be a reasonable choice even over a wide range of landscape types.

Summary and conclusions
The Canadian Land Surface Scheme (CLASS), version 3.6.1, was run over a domain centered on eastern Canada for the period 1990-2011, driven by forcing derived from dynamically downscaled ERA-Interim. An adjustment was applied to the precipitation field to make it equal to the CRU monthly averages. The snow cover fraction was well simulated, tracking the MODIS and NOAA IMS estimates closely. Biases in the modeled albedo compared with MODIS estimates were only a few percent, except in boreal forest areas owing to the neglect of subgrid-scale lakes and in the tundra area late in the snow season possibly because of neglect of impurities in old snow. Validation of the simulated SWE was challenging because of known or suspected biases in the observation-based datasets (low for the CMC data and high for the historical snow course data) and scarcity of observations in the northern part of the domain. The modeled SWE agreed well with both datasets in the TABLE 7. The 1 Apr SWE values for the 12 test regions for the base run and the four sensitivity tests: using the Mellor formulation for snow thermal conductivity, neglecting vegetation interception of snow, changing the snow albedo refreshment threshold (SART) from 0.0001 to 0.005 m, and changing the limiting snow depth D lim below which the snow cover becomes patchy from 0.10 to 0.05 m and 0.15 m. southern part of the domain and lay between the two in the taiga zone. In the tundra zone the modeled SWE may have been biased high, possibly owing to neglect of blowing snow sublimation, but this hypothesis will require further study. Four tests runs were carried out to explore the sensitivity of the snow simulation to different parameterizations. The formulation for the snow thermal conductivity used previously in CLASS was found to lead to an unrealistically long persistence of the snow cover. The neglect of vegetation interception of snow caused an overall increase in SWE of over 40%, because of the large coverage of evergreen needleleaf forest over this domain. Setting the snow albedo refreshment threshold to the larger value previously used on an experimental basis resulted in a low SWE bias and an overly rapid disappearance of the snowpack. Finally, the simulation was found to be quite insensitive to the choice of the snowpack depth at which the snow cover was modeled as becoming discontinuous.
In summary, although the validation results are inconclusive in a few areas, in view of the evident uncertainty in the observation-based datasets and the substantial disagreement between some of them, the performance of CLASS is deemed satisfactory. Clearly, when undertaking a model validation exercise, a variety of observational datasets must be used if one is not to run the risk of misinterpreting the simulation results and possibly making an attempt to reparameterize the model to address a problem that in reality does not exist. We conclude that this study has not revealed any worrisome biases in the CLASS snow simulation, and that for future coupled runs such as those planned for CMIP6, we have found no grounds for anticipating any significant land surface model related biases in the snow simulation.