The Latest Improvements in SURFEX v8.0 of the Safran-Isba-Modcou Hydrometeorological Model over France

The Safran-Isba-Modcou (SIM) hydrometeorological system was developed and put into 15 operations at Meteo-France in the early 2000’s. SIM application is devoted to water resource monitoring, and therefore can help monitoring drought or forecasting flood risks over French territory. This complex system combines three models: SAFRAN analysis of near-surface meteorological variables, ISBA land surface model which aims at computing surface fluxes at the interface with the atmosphere and soil variables, and finally MODCOU a hydrogeological model which calculates river 20 discharges and groundwater level evolutions. SIM model was improved first by reducing the SAFRAN infrared radiation bias, then by using the more advanced multi-layer diffusion surface scheme of ISBA to have a more physical representation of the surface and soil processes. At the same time more accurate and up-to-date input databases for vegetation, soil texture and orography were used and finally in mountainous areas a subgrid representation of the orography using elevation bands was adopted as well 25 as the possibility to add a reservoir to represent the effect of aquifers in mountains. SIM model was run for a 60y period from 1958 to 2018. The present paper describes the changes and demonstrates that the SIM new version performs better than the previous one by comparing a set of selected simulations to measurements of daily streamflow and observations of snow depths.

surface runoff is collected by the surface river network whereas deep soil infiltration contributes to the aquifer recharge. Such systems have been used for decades to study the water resource and forecast its evolution. Moreover, they have been used from the global to the regional scale to assess climate models against independent variables, such as river discharges and more recently against groundwater level. More generally, as a complement to surface and soil variable measurements, the evaluation of surface 40 and soil parameterizations benefits also from independent variable comparisons such as river discharges simulated by hydrogeological models that can be compared to in situ measurements from gauging stations Decharme et al, 2013;Alkama et al., 2010;Barthel and Banzhaf, 2015;Decharme et al. 2019). Interfacing SVAT and hydrogeological models in water resource studies, is an appropriate tool to address scientific questions such as the importance of climate change on these 45 resources (Vidal et al., 2010;Dayon et al., 2018;Bonnet et al., 2018) or how human activity influences these resources Biancamaria et al., 2019). Recent initiatives aimed at studying the impact of irrigation on water availability, such as those supported by the Global Energy and Water Exchanges project (GEWEX), where the modelling contribution of SVAT models appeared to be prominent and showed that irrigation should be accounted for in models . 50 At Meteo-France, SIM has been designed first to study the water cycle for large French basins such as the Rhone basin (Etchevers et al., 2000), the Adour basin (Habets et al., 1998), the Garonne river (Voirin et al, 2002) and finally the whole France . This latter system has demonstrated its usefulness in the number of derived applications it is used for. For instance, since 2003, SIM is used operationally at Meteo-France for drought monitoring using hindcast simulations, but 55 also for near real time applications. These applications over France were based on a SVAT model using the Force-Restore approach (Noilhan and Planton, 1989;Noilhan and Mahfouf, 1996) for the transfer of heat and water in the soil. However, this method exhibits some limitations in the realism of the physical parameterizations, that are detailed in 2.2. These limitations concern the representation of the snow, the interactions between snow and soil freezing that are not always well captured, the root vertical profile 60 description in the soil, or the composite approach to represent vegetation, mixing different types of vegetation in a single one, with aggregated characteristics. Although this method has proven in the last decades to be suitable to tackle scientific questions related to water resource, a revision towards a more physical method, based on the diffusion of heat and moisture in the soil (Decharme et al., 2011), was undertaken to account for more sophisticated numerical schemes to enhance the system performances. 65 The goal of the current paper is to show how the development of new parameterizations has improved the system performances. The current study, based on a 60y long time series covering 1958-2018, shows how improvements in atmospheric forcing, land surface model physics and subgrid orography and hydrology impact and enhance these results. It also aims to describe how results are impacted by each set of modification and finally to demonstrate the new model configuration performs better than 70 the previous one in terms of discharge extremes, when comparing model results to independent observed data such as snow depth or river discharges. Section 2 describes the original SIM configuration and its recent updates. Section 3 presents climate data and evaluation datasets, and the offline experiments used to demonstrate the advantages of the new SIM system. In section 4 the results of the new system are presented and finally they are discussed in 75 the last section.

Overview of the 2008 version of the model
The SIM hydrometeorological model  combines the three SAFRAN, ISBA and MODCOU models. SAFRAN (Durand et al., 1993;Quintana Segui et al., 2008) performs a 6-hourly 80 analysis of near-surface meteorological variables such as 2-meter air temperature and relative humidity, 24 h precipitation, wind speed, cloudiness and a daily analysis of 24 h accumulated precipitation. The analysis is performed on geographic areas of a few hundreds of square-kilometres and the analysed fields are interpolated at an hourly timestep and direct and diffuse solar radiation as well as infrared radiation are computed from cloudiness, temperature and humidity profiles analysis using a radiation 85 model (Ritter and Geleyn, 1992). A spatial interpolation is then performed on a regular horizontal grid of 8km mesh to provide to the ISBA (Noilhan and Planton, 1989;Noilhan and Mahfouf 1996) land surface model the necessary climate information. The grid is composed of 9892 cells (Fig. 1, left panel) covering France and is extended across its boundaries to encompass the upstream of the water basins. ISBA model uses SAFRAN analysis as input and computes the surface energy and water budget over 90 vegetated areas. The water balance in ISBA ensures that soil moisture results of the balance between the supply due to incoming precipitation, and the release caused by surface evaporation, runoff, and soil infiltration. These two latter components are used to feed the hydrogeological model MODCOU (Ledoux et al., 1989;Habets et al., 1998) to compute the time evolutions of river discharges for a given set of selected gauging stations and groundwater head where aquifers are simulated i.e. on the Seine and 95 Rhone basins only (delineated by the yellow areas in Fig. 1, right panel). The original SIM system differs in many aspects from the version described in the present paper.

Land surface model improvements in SURFEX v8.0
In the original SIM system, heat and water transfers in the soil were based on the Force-Restore method (Noilhan and Planton, 1989;Mahfouf, 1996, Decharme et al., 2011) that have been widely 100 used in research for decades and is still used operationally in the French numerical weather prediction global model ARPEGE (Courtier and Geleyn, 1988) and the mesoscale model AROME (Seity et al., 2011). In the Force-Restore method, the soil is split into two layers for temperature and three layers for moisture (Boone et al., 1999). However, such a method has exhibited some limitations in the representation of surface and soil processes like for instance the interaction between snow/soil freezing 105 (Luo et al., 1998) due to the vertical discretization or the non-ability to properly represent the root vertical profile in the soil (Braud et al., 2005) and therefore the vertical transfers of moisture and heat. The alternative of using the Force-Restore method was developed by Boone et al. (1999) and revisited by Decharme et al. (2011) who proposed to use diffusive equations to solve both the heat and water transfer equations in the soil, based on Fourier's and Darcy's laws, respectively. Such a method 110 proposes a discretization of the soil into 14 layers, resulting in a total depth of 12 meters, with a fine description of subsurface layers to well capture the diurnal cycle. The vertical discretization (bottom depth of each layer in meters) is the following: 0.01, 0.04, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.5, 2, 3, 5, 8, 12, as described in Decharme et al. (2013). The heat transfer is solved over the total depth, whereas the moisture transfer is solved over the root depth only, which depends on the vegetation types, a few tens 115 https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License. of centimeters for crops and a few meters for forests. In such a model, soil temperatures and moistures are calculated at the same nodes, which is mandatory to correctly represent soil freezing for example. Another noticeable improvement concerns the snow modelling. The original 3-layer snow scheme developed by Boone and Etchevers (2001) aimed at representing as realistically as possible the physical processes in the snow and for that purpose, some processes were adapted from the Crocus snow model 120 (Vionnet et al., 2012) devoted to snow avalanche forecasting. The main new features recently developed and implemented in the ISBA snow model are described in details in Decharme et al. (2016) and concern (i) snow layering with an increase of the number of layers close to the surface in contact with the air, but also to the ground to better represent the diurnal cycle and the heat transfer at the snow/ground interface, respectively, (ii) snow compaction due to viscosity changes (Brun et al., 1989) 125 and densification caused by wind at the surface (Brun et al., 1997), and (iii) snow absorption of solar radiation depending on 3-band spectral albedo. Then, the representation of the vegetation in the model has also evolved from the original version, where the various land use types present within a cell were aggregated with averaged surface parameters (Noilhan and Lacarrere, 1995), whereas the new system uses 12 separated vegetation types 130 each having its own set of parameters (Masson et al. 2003;Faroux et al, 2013). The classification distinguishes three non-vegetated types: rocks, bare soil and permanent snow and ice, and nine vegetated types: temperate broadleaf deciduous, boreal needleleaf evergreen, tropical broadleaf evergreen, C3 culture type, C4 culture type, irrigated crops, grassland, tropical grassland, and finally peat bogs, parks and gardens. Although this approach is more expensive in terms of computational time 135 because the model has to be run for each vegetation type, the realism of ISBA simulations is enhanced because the parameters are adapted to the various vegetation types. Moreover, using explicitly 12 vegetation types is mandatory when using ISBA-A-gs the simplified photosynthesis module of ISBA (Calvet et al., 1998) aiming at representing a realistic photosynthesis of the different biomes. In the new version the avoiding or tolerant response to drought is adopted . 140 Hydrology processes are obviously important in a system aiming at computing water balance for natural surfaces and simulating river discharges. The former parameterization of drainage developed by  for the Force-Restore scheme, has been replaced by a diffusive method of water in the soil. Surface runoff in ISBA occurs over saturated areas (Dunne and Black, 1970). Habets et al. (1998) have proposed a subgrid parameterization to generate surface runoff on grid boxes of 145 several square kilometres before the whole grid box was saturated, in order to account for regional heterogeneities such as orography variability or spatial inhomogeneity of precipitation. In this approach, based on the VIC model (Liang et al., 1994;Dümenil and Todini, 1992), the fraction of saturated area varies as a function of the soil water content and a curvature term that needs to be calibrated and defined as = ' ( ) *+ ,-.
( ) *+ ,/0 , 10 *4 5, where 789 and 7:; are the minimum and maximum elevations in a 150 grid box and + the standard deviation. In the original system b equals 0.5, a very high value as compared to other studies at the scale of a watershed. Indeed, a more realistic value should be around 0.2 (Lohman et al., 1997;Ducharne et al., 1998). However, the Force-Restore scheme is known to be too dry (Decharme et al. 2011) and a high slope (so a rather large curvature term) in the grid cell is mandatory to generate enough runoff. Using the diffusion scheme has removed this constraint of a 155 high b coefficient and in the new SIM application a value of 0.25 is now used over areas without aquifer https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License.
where it is set to 0.01 corresponding to no subgrid runoff. Dümenil and Todini (1992)  , where w2 is the volumetric soil water content in the rooting zone and wsat its value at saturation, and for a loamy (w J:K = 0.45 m 3 /m 3 ) wet soil (w 4 = 0.4 m 3 /m 3 ), the presence of an aquifer ( = 0.01) is characterized 160 by a small saturated fraction area of about 2%.

Use of more accurate land surface parameters
In addition to the changes in the model physics described previously, the databases of soil texture, vegetation coverage and topography were upgraded to improve the realism of external parameters in the ISBA model. Only the hydrogeologic database representing aquifer and the routing network are 165 unchanged. Soil texture database in France was a soil map provided by the Institut National de Recherches Agronomiques (INRA -King et al., 1995) at a 1 km resolution. The update here consists in using the Harmonized World Soil Database (HWSD - Nachtergaele et al., 2012) which is a soil map at 1 km resolution that combines several datasets available worldwide. In particular over France the above-mentioned INRA soil map was integrated in the HWSD dataset (used in other applications using 170 SURFEX outside France), therefore, this change does not affect the SIM simulations. Figure 2 shows maps of clay and sand fractions, key parameters that control transfers of energy and moisture in the soil, aggregated at 8 km resolution from HWSD. In the same spirit, the Global 30 Arc-Second Elevation (GTOPO30, http://eros.usgs.gov/#/Find_Data/Products_and_Data_Available/gtopo30_info) at a 1 km resolution was 175 replaced by the Shuttle Radar Topography Mission (SRTM90, https://cgiarcsi.community/data/srtm-90m-digital-elevation-database-v4-1/) at a 90 m resolution to represent topography. The impact of using SRTM90 is also very limited because the target resolution of SIM applications over France is 8 km which implies that the orography is averaged at such a resolution whether GTOPO30 or SRTM90 is used. 180 The last input database modification is the vegetation map which provides the fraction of each ecosystem. The global ECOCLIMAP1 (Masson et al., 2002) at 1 km resolution was used originally in SIM application over France. Then a new classification algorithm was developed over Europe to build the ECOCLIMAP2 land cover map , in order to use more accurate and recent satellite information as inputs and for a longer time period. Among various differences, ECOCLIMAP1 185 was for instance using 1992-1993 AVHRR satellite data whereas ECOCLIMAP2 is using 1999-2005 SPOT/VEGETATION data. The impacts of changing the vegetation fraction as input to the ISBA model are manifold and won't be described here (see Faroux et al. (2013) for a detailed comparison), but using ECOCLIMAP2 makes the user more confident on the ability of such a database to capture the surface heterogeneities even if when performing numerical simulations with the ISBA model for long 190 time periods, the land cover varies in time and can obviously be different from the ones from 1999-2005 corresponding to the classification.

Changes in Infrared radiation
SAFRAN radiation was corrected to compensate a radiative deficit that was already identified in several studies (Le Moigne et al., 2002;Carrer et al., 2012, Decharme et al., 2013 although observations of this 195 variable are very scarce. Radiation in SAFRAN is simulated (Ritter and Geleyn, 1992) from a cloudiness analysis based on the analyses of air temperature and humidity profiles from the French global ARPEGE atmospheric model. Le Moigne (2002) and Carrer et al. (2012) have shown that the SAFRAN infrared radiation was biased low, and Decharme et al. (2013) have increased globally over France by 5% the infrared radiations in their offline simulations. For instance, Le Moigne (2002)  200 showed that both infrared and solar radiations were too low at Col de Porte site in the Alps and proposed a correction depending on cloudiness and altitude that was successfully applied to the Rhône basin during the Rhone-AGG intercomparison experiment (Boone et al., 2004). In this study, only the infrared correction is considered and applied over the whole France. The correction of infrared radiation, described in Appendix A, was established by comparing the SAFRAN analysis and the 205 infrared measurements of two meteorological stations Carpentras and Col de Porte which are reference stations for infrared measurements monitored by Meteo-France and located respectively in south-east France and in the Alps. Carpentras is located in plain whereas Col de Porte, the experimental measurement site of the Centre d'Etudes de la Neige, is located in the French Alps at a height of 1340 m ( Figure 3 shows the initial infrared radiation annual mean (left panel) and the amount of energy supply when the correction is applied (right panel).

Altitudinal subgrid variability in mountainous areas
In SAFRAN the analysis is performed on homogeneous areas of several hundreds of square kilometres 215 and an explicit vertical discretization is applied so that the analysis is done every 300 meters. For each grid box i, the analysed variables : ( ) are then interpolated on a horizontal 8 km grid, accounting for the averaged elevation of each grid box, and used as input to the ISBA model. At this resolution, the 9892 grid cells cover the whole extended France and some boundary areas for hydrological purposes. However, this resolution is still too coarse to accurately capture the subgrid variability of some 220 variables, in particular over mountains. Lafaysse et al (2011) demonstrated on the Durance basin that using elevation bands is an efficient method to better describe the spatial variability of the snowpack and its impacts on river discharges, at a much cheaper numerical cost than by increasing the horizontal resolution. Therefore, a similar approach was defined for the whole French territory and can be summarized as follows: for a given grid cell i, the SAFRAN analysis is performed every 300 m and the 225 : ( , ) are the k sets of analysed variables corresponding to each of the k elevation bands. For each i, if the vertical subgrid variability is large enough, a complementary set of k' bands is defined for different elevations to represent this variability. The vertical interpolation is then done on the atmospheric forcing at each k' band. ISBA simulates surface runoff and soil infiltration at each k' band that are used to compute the final surface runoff and soil infiltration for a grid cell i. Among the 9892 230 grid cells, 1044 are above 500 m and exhibit a large subgrid topographic variability. For each of these 1044 grid cells, between three and five layers are necessary to capture their total vertical extent, leading https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License.
to a total of 3878 grid cells involved in the computations over mountains. Figure 1 (right panel) shows the 1044 grid cells topography height where elevations bands method is applied. Moreover, to compensate the inability of the SIM system to simulate low flows where aquifers are not 235 explicitly accounted for, a subgrid parameterization of the drainage was used in the original SIM system. This subgrid drainage is controlled by a calibrated parameter both for plain areas and mountains, but such a calibration does not perform very well because the water used to sustain low flow is taken from the rooting zone and not from the aquifer. In the new system, this parametrisation is removed, and a parameterization was added to mimic the behaviour of a deep reservoir to sustain low 240 flows and to limit the flood peak due to the snow melt (Lafaysse et al., 2011). Retaining the water due to snowmelt and releasing it at the dry season has helped simulating the flood peak but still the summer low flows are underestimated.

245
In this study the SIM system is an offline application where the ISBA land surface model is driven by climate data and no feedback from the surface to the atmosphere exists. Various configurations of SIM have been designed to highlight the improvements that have been achieved, each simulation being spun up for two years. The first configuration refers to the old SIM system (SIM_REF hereafter, as described in 2.1), i.e. 250 before any of the changes previously described. An additional reference simulation (SIM_REF2 hereafter) is based on SIM_REF where subgrid drainage is removed. The first experiment (SIM_PHY hereafter) consists in changing the physics and the input databases. SIM_PHY uses the ISBA diffusion scheme with 14 layers in the soil, the updated snow scheme with 12 snow layers, a tile approach based on 12 vegetation types, and a runoff parameterization where the high constraint on the b coefficient 255 (b=0.5 in the runoff parameterization in SIM_REF) was lowered to 0.25. In SIM_PHY also, the updated databases are used for a better representation of soil texture, orography, and vegetation. The SAFRAN infrared radiation correction with cloudiness is then introduced in experiment SIM_FRC (based on SIM_PHY). Then, SIM_TOP (based on SIM_FRC) uses the subgrid orography representation in mountains, and finally SIM_NEW (based on SIM_TOP) accounts for a drainage reservoir in mountains. 260 Table 1 summarizes the main characteristics of the experiments. In the SIM system, climate data is provided by the SAFRAN analysis. In the present study, SAFRAN covers a period of 60 years from 1 st August 1958 to 31 st July 2018. In SAFRAN, the guess comes from ERA-40 or ERA-Interim (Dee et al., 2011). The analysed variables are then interpolated hourly into the SIM grid at 8 km resolution and this full set of near-surface variables is then used to drive offline 265 simulations. Figure 4 summarizes the near-surface SAFRAN averaged analysed fields for the whole period over France used as input to the offline experiments, whereas Fig. 5 highlights annual means of these quantities.

Validation data sets and tools
Various datasets were used to assess the SIM model performances along the validation process, to 270 ensure that an improvement in climate input data or physics was improving at the same time surface or soil variables and rivers discharges. The Meteo-France climatological database (BDCLIM) was used to retrieve soil temperatures at depths 10, 20, 50 and 100 centimetres over France from 1992 to 2015 for 104 weather stations (Fig. 6). These temperatures were compared to the temperatures simulated with the ISBA model at the same depth to assess the ability of the soil heat transfer process to accurately simulate temperature profiles. The river discharge observations come from the Banque Hydro (http://hydro.eaufrance.fr/), French 285 national database from 1958 to 2018. Daily and monthly stream flow data from 470 selected gauging stations were used to evaluate river discharges simulated by the hydrogeological model MODCOU. The Nash Sutcliff Efficiency (NSE, Nash and Sutcliff, 1970) was used to evaluate the model performance, and in addition the discharge ratio between the SIM simulations and the observations was calculated to assess the bias of the system. The complementary cumulative distribution function (CCDF hereafter) of 290 NSE that calculates the probability of the NSE to be greater than a threshold, averaged over the number of gauging stations in France, is also used as a metric to assess NSE. Snow depth is another independent dataset (i.e. not assimilated in the reanalysis process) used to evaluate the system. Measurements are provided by 185 stations in the Alps, the Pyrenees, and Corsica at elevations ranging between 600 and 3000 m a.s.l. They include 26 ultrasonic sensors mainly in high 295 elevation areas (Nivose network) and 161 stations operated by Meteo-France partners mainly in ski resorts and providing manual measurements on snow stakes. Daily total snow depths are used to compute bias and Root Mean Square Errors for SIM_REF and SIM_NEW simulations over the 1984-2016 period between October and June. Note that most stations do not provide comprehensive data over the whole period. 300 Figure 5 shows the annual means of the atmospheric forcing from 1958 to 2018. 2-meter air temperature (Fig. 5a) and specific humidity (Fig. 5b) exhibit a natural interannual variability and a tendency to increase over time by about 1.4 K and 0.6 g Kg -1 (linear regression of the annual means time series), 305 respectively (which implies that relative humidity tends to decrease). The abrupt change of temperature  (2015) is not that obvious. 10-meter wind speed (Fig. 5c) at the beginning and the end of the analysis period is of the same order with a magnitude of 2.8 m s -1 , but exhibiting a significant decrease of 0.5 m s -1 between 1983 and 1995 and then a regular increase until 2018. Interannual variability is larger for precipitation than for the other variables but does not show 310 any tendency on average. Incoming radiations also present a remarkable change around 1988 with approximately +15 W m -2 for the direct solar radiation and +5 W m -2 for the infrared radiation between the periods before and after 1988. In the meantime, the diffuse solar radiation decreases by 10 W m -2 from 1988 onwards. On average the total amount of solar and infrared energy received by the surface increases by about 10 W m -2 . This behaviour is consistent with the discussion of Brulebois et al. (2015)  315 and the analysis of Boé (2016) and can be caused by several reasons. It can be argued that the decrease of aerosols and the increase of greenhouse gases in the atmosphere have increased significantly the incoming radiations as climate studies have shown (Wild 2012). In addition to these physical reasons, a more technical reason is the change in the large-scale reanalysis, that evolved from ERA-40 to ERA-Interim, with a priori small changes in the analysed fields. However, in ERA-Interim, the data 320 assimilation system evolved substantially and changed to a 4D-variational method as compared to the 3D-variational method previously used in ERA-40. This new system appeared to be more accurate and the assimilation of a lot more satellite observations has significantly improved the analysis and the forecast. In particular the vertical profiles of temperature and relative humidity were improved, and they are used as input of the cloudiness analysis for the computation of the solar ad infrared radiation. Table  325 2 summarizes the comparison in terms of bias and root mean square error (RMSE) at the four weather stations measuring IR radiation. Except for Carpentras station where LSAF IR radiation is almost unbiased and the error the lowest as compared to SAFRAN, scores are better for stations in altitude with SAFRAN when the correction is applied. Due to their high elevation no correction was applied to Argentière nor Saint-Sorlin. At Argentière station, bias and RMSE are smaller with SAFRAN than 330 LSAF. At Saint-Sorlin however, the bias is higher with SAFRAN but the RMSE if of the same magnitude than LSAF. Figure 7 presents the comparison of each SIM simulation with the observed river flow for the 470 gages. The river discharge is slightly underestimated with SIM_REF2, and the underestimation is 335 corrected by calibrating the subgrid drainage term, as shown in SIM_REF simulation, where discharge ratio (simulated over observed) is centred around 1 and the daily efficiency range (NSE hereafter) characterized by its CCDF is larger for all stations. SIM_PHY does not account for any subgrid drainage parameterization and therefore should be compared to SIM_REF2 simulation. SIM_PHY tends to overestimate discharges as shown by the mean discharge ratio between simulations and observations. 340

Impact of new model configurations
However, the CCDF of NSE shows that SIM_PHY performs better than SIM_REF2 for NSE greater than 0.56, corresponding to half of the total number of stations, and highlighting the added value of the physics associated with better description of vegetation types and other more accurate databases. SIM_PHY exhibits slightly worse results for Nash ranging from 0.5 to 0. (about 40 % of the stations) but in this case the two models do not perform very well. 345 https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License. Figure 8 shows how scores are improved for the experiments with corrected infrared radiation (SIM_FRC), subgrid orography (SIM_TOP) and hydrology (SIM_NEW), both in terms of CCDF of NSE and discharge ratio. The bias in river discharge is significantly reduced when increasing infrared radiation due to a higher total evaporation leading to less water available in the rivers. However, a positive bias remains, which is expected, since SIM simulates natural runoff and discharges, without 350 abstraction nor derivation, whereas some basins are anthropized. SIM_FRC and SIM_TOP NSE scores are very close and better than SIM_PHY for all stations and SIM_REF for about 75 % of the stations with NSE greater than 0.4. Finally, SIM_NEW as SIM_TOP tends to overestimate the discharge, however their NSE are significantly improved as compared to SIM_REF for all NSE ranges. Figure 9 presents a map of the differences of mean annual NSE (for stations with positive NSE) 355 between the different configurations. Over the whole reanalysis period, Fig. 9a confirms first that SIM_PHY alone does not improve discharge simulations everywhere over France but only in the gages that were already reasonably represented (with NSE above 0.56). Then the new IR forcing improves scores almost everywhere except 2 isolated stations in the Seine basin (Fig. 9b). As expected, SIM_TOP has an impact only over mountains and more particularly over the Alps (Fig. 9c) and finally SIM_NEW 360 compared to SIM_TOP highlights the benefits of using a groundwater mountain reservoir for snow (Fig.  9d). Note that the number of stations is reduced in Fig. 9c and Fig. 9d because these experiments do not concern the whole territory. In Fig. 10 SIM_NEW is compared to SIM_REF and shows the benefits of all the changes. The SIM_NEW NSE map indicates that the model explains a large part of discharges variance over most stations (brown to green colours) but still some stations exhibit NSE values medium 365 (red) to low (blue). In particular, the gages in northern France are not well simulated but also the Alps region known to be highly anthropized.

Comparison to old SIM
To complement the previous results and demonstrate the successive improvements of the simulated discharges, seasonal scores were computed over the 60 years using Taylor diagrams, recognized as a 370 useful tool for graphically summarizing how a set of simulations matches observations (Taylor, 2001). A set of experiments can be analyzed in terms of correlations, centered root-mean-square difference (RMSD), and the amplitude of their variations represented by the normalized standard deviation. These scores are computed using all daily observations and simulations. Seasonal Taylor diagrams (DJF, MAM, JJA, SON for winter, spring, summer, and fall respectively), of the different experiments are 375 shown in Fig. 11. It results that whatever the season, SIM_NEW simulation exhibits the highest correlation and the smallest RMSD, except perhaps for JJA, the season with the highest normalized standard deviation. In DJF, scores are very good and tightened whereas in JJA, they are still tightened but the RMSD is larger. MAM and SON confirm the benefit of using a groundwater reservoir to keep water in mountains before releasing it in spring. 380

Extremes river flows
Previous results have shown how SIM_NEW was performing on average over the 60y time period. To assess the ability of the new system to properly simulate extreme river discharge and so to distinguish flood and low flow periods, daily river discharge deciles were computed and particular attention was https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License.
given to Q10 decile corresponding to low flow states and Q90 decile the threshold above which a flood 385 is considered as decennial.
As shown in Fig. 12 Q10 and Q90 indicate first that in very dry periods (discharges lower or equal to Q10) all simulations but SIM_NEW have a too low amplitude of variations. In addition, for SIM_NEW experiment, correlation, RMSD and normalized standard deviation are best. The variability in terms of normalized standard deviation is reversed when considering floods (Q90) as compared to dry periods 390 (Q10). Again SIM_NEW has the smallest RMSD and all simulation correlations are superior to 0.99. Figure 13 compares monthly observed to simulated discharges of the Garonne river at Lamagistere with SIM_NEW and confirms the ability of the model to rather accurately simulate low flows during summer seasons and its tendency to overestimate flood peaks.

395
To complement the previous results against discharges, a comparison of snow depth from SIM_REF and SIM_NEW was performed using 185 stations as described in Sect. 3.2. In Fig. 14 the spatial variability of scores is presented as a function of elevation with notched boxplots where the boxes represent the interquartile interval, the whiskers the 10 th and 90 th percentiles, and the notch represents the 90% confidence interval of the median estimated by a bootstrap sampling technique among the 400 available stations. The SIM_REF simulation exhibits a positive median bias at the lowest elevations and a negative median bias between 2000 and 2400 m while the SIM_NEW is not biased at any elevation. The variability of bias between stations is also reduced in the SIM_NEW simulation. Consistently, a significant reduction of RMSE is obtained at the lowest and highest elevations with SIM_NEW as well as a reduction of the 90 th percentile of RMSE at all elevations. These results are consistent with the 405 improved elevation resolution in mountainous areas which reduces the elevation discrepancies between simulated grid cells and observation stations. Slight improvements of SIM_NEW scores can also be obtained by linear interpolations at the stations elevations of the simulated snow depths at the two closest elevation bands. Note also that the improved snow scheme is also expected to explain a part of the scores improvement . 410

Soil temperature profile
A direct comparison of SIM_NEW to another independent variable, soil temperature profile, was also performed. Model output temperatures at depth 10, 20, 50 and 100 cm in the soil for baresoil were compared to observed temperatures between 1992 and 2015 for 104 weather stations (indicated in Fig.  6). The results summarized in Table 3 show bias and the centered RMSE calculated at the 104 stations. 415 As in the previous study of Decharme et al. (2013), a global cold bias (here of about -0.8 K) is observed at each depth, which can probably be attributed to a deficit in the incoming shortwave radiation at the surface and/or to the none representation of the deep earth geothermal heating. On the other hand, the RMSE remain rather small with higher values close to the upper soil layers as compared to the deepest ones. RMSE ranges from 2.1 K at 10 cm below the surface to 1 K at 100 cm. Such a comparison of 420 temperatures at different depths is possible because the soil vertical discretization allows it (as opposite to SIM_REF).

Changes on the simulated water and energy balance
Maps of the Bowen ratio and the ratio of the evaporation to precipitation are presented in Fig. 15. The areas where the Bowen ration is the largest are located in the mountains where snowfalls limit the 425 evaporation, along the Mediterranean coast where the annual precipitation is limited and the incoming radiation large, and in a large area covering the Garonne basin, and part of the Loire and Seine basins, characterized by high vegetation fractions. The ratio evaporation to precipitation is also the largest in plain areas where the Bowen ration is large. Over mountains, high precipitation and limited evaporation due to the presence of snow conduct to the smallest ratio evaporation to precipitation. These results are 430 comparable to those obtained by  for another time period, except that in SIM_NEW the Landes forest (Southwestern France at the Atlantic coast) exhibits a higher Bowen ratio. The first reason comes from the difference in the photosynthesis parameterization and more precisely the parameterization of the leaf conductance used in SIM_REF based on Jarvis (1976) and SIM_NEW based on Calvet (1998) ISBA-A-gs module that accounts for plant stress and reduces substantially the 435 evaporation over vegetated areas. Thus, the surface energy balance tends to increase the sensible heat flux. The second reason is linked to the increase in infrared incoming radiation that strengthens sensible heat flux and diminishes the latent heat flux, which usually occurs on dry soils with a low evaporative capacity. Interannual variability of the ratio of the evaporation to precipitation (E/P hereafter) and the Bowen ratio are shown in Fig. 16 and 17 respectively, for SIM_REF, SIM_PHY, and SIM_NEW to 440 characterize first the old system as compare to the new one and also to highlight the impact of changes from SIM_PHY to SIM_NEW on the energy budget. E/P is larger in SIM_REF than in the two other simulations every year, and E/P in SIM_NEW is closer to SIM_REF than in SIM_PHY. Total precipitation is very similar but slightly lower in SIM_REF and in SIM_PHY or SIM_NEW, due to the higher resolution orography and the subgrid orography representation in mountains. Therefore, a larger 445 E/P corresponds to a larger total evaporation. The Bowen ratio is the lowest for SIM_REF, it increases in SIM_PHY, and is the highest in SIM_NEW, which already tends to evaporate more than SIM_PHY. This result shows that the sensible heat flux in SIM_NEW is much larger than in SIM_PHY due mainly to the increase in the incoming infrared radiation.

Climate data
Coming back to climate data as shown in Fig. 5, heterogeneity exists in forcing data, especially in radiation. Two reasons can explain the breaking in time series, the first is due to the large-scale analysis used to reconstruct temperature, humidity and cloudiness profiles. As explained in Sect. 4.1, the computations of these profiles have varied in time following the improvements of the global data 455 assimilation systems that were used from ERA-40 to ERA-INTERIM. The second reason that can be invoked is the variation of the observation density network in time. Indeed from 1958 to nowadays substantial changes have been experienced in the deployment of new weather stations. The combination of these two changes makes that the SAFRAN reanalysis is not homogeneous in time and it appears important to understand how the OI results are influenced due to these changes when analysing the 460 https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License. results of the simulations. However, abrupt change can also be due to the dimming / brightening effect (Wild 2012, Brulebois et al., 2015, Boé 2016. As already mentioned, the uncertainty on the SAFRAN IR radiation is large. Taking advantage of having observations of IR in plain and mountains has allowed a fair comparison between LSAF products and SAFRAN without correction (SIM_PHY) and with correction (SIM_FRC). Because the 465 impact of this variable is very large, and particularly on snow (Quéno et al, 2017, Sauter andObleitner, 2015), an extension of the in situ observation network would help to better understand its spatial variability and to improve model simulations. Extending the correction to the whole France is debatable but the decision was guided by the positive bias in river discharges and also the willingness to have a more realistic energy supply in mid-mountain areas (i.e. below 1500 m) to better catch the evolution of 470 the snowpack.

Subgrid hydrology
This method has shown that hydrology in mountainous areas was improved because the analysed precipitation rate and phase were better represented for each elevation band than when averaging on the vertical, resulting, in the Durance case (Lafaysse et al., 2011), in a decrease of the overestimated spring 475 peak flow associated to a better phase between the monthly observed and simulated discharge. However, summer and winter low flows were still largely underestimated by the model. During long dry periods without precipitation or snowmelt river discharges are controlled by underground drainage.
In the frame of the Aqui-FR project (https://www.metis.upmc.fr/~aqui-fr/) aiming at developing a hydrogeologic model platform over France to simulate discharges and water table heights, aquifers are 480 explicitly simulated and the SURFEX (Masson et al., 2013) water fluxes used as input must not be impacted by any empirical aquifer representation. Moreover, in Aqui-FR some hydrogeologic applications have been calibrated using SURFEX runoff and infiltration water fluxes as input (Vergnes et al., 2019).

485
This study illustrates how the developments performed these last ten years improve the hydrometeorological system SIM. Several important changes have been done, in particular in the ISBA soil physics where the Force-restore method has been abandoned and replaced by the soil multi-layer diffusive method. In the same time the snow model was revised to get closer to Crocus model as described in Sect. 2. The model was run based on the tiling approach for vegetation, each of the 12 490 vegetation types being characterized by their own set of parameters, opposite to the one vegetation type approach where parameters are aggregated. Then more accurate databases of soil, orography, and land use were used. A more accurate infrared forcing has allowed to significantly improve the results as well as the use of a groundwater reservoir in mountains associated to a specific vertical discretization of massifs. The new configuration of the model, including all of the new or updated functionalities 495 mentioned above, has been shown to perform better than the old system and was therefore better suited for water resource studies. Comparisons to independent observations of daily total snow depth and river https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License.
discharges were conducted and confirmed that scores were improved. In addition, the new SIM system was better representing extremes in river discharges, both for low flow periods and floods.
There are a few perspectives that can be proposed to improve the SIM system. The first one concerns 500 the improvement of the climate description. It has been noticed that SAFRAN was performing well in most cases but some deficiencies were still remaining. A new near-surface reanalysis system is being developed at Meteo-France to replace SAFRAN. It includes a new surface analysis of 2-meter air temperature, relative humidity and daily precipitation, and uses high resolution model outputs as background for the analysis. In addition, in the framework of the Copernicus program, a 5.5 km high-505 resolution reanalysis will be produced over Europe and will constitute an interesting product to compare to SAFRAN over France. The second one concerns the improvement of the physics of the model and more precisely the use of the Multi-Energy Budget scheme (Boone et al., 2017;Napoly et al., 2017) to allow an explicit calculation of the canopy interactions with air and soil. The MEB model has demonstrated that the use of a litter in 510 forests was improving surface fluxes results. Then, accounting for anthropisation, especially irrigation and the presence of dams could benefit the SIM system to improve its realism and allow fairer comparisons to gauging stations in anthropized basins. Irrigation is currently being developed in the ISBA model and the integration of dams is a longer-term project. Finally, better representing groundwater and their characteristics in France is 515 another challenge that needs also to be addressed.
Code availability. The SURFEX v8.0 source code, including the ISBA code, used in this study is available in the supplement, as well as the SAFRAN code. The post-processing codes, including the scores package of the snowtools opensource project, are also available in the supplement. 520 Data availability. The outputs from all models discussed here and the R and Python programs to plot the results are available in the supplement.           https://doi.org/10.5194/gmd-2020-31 Preprint. Discussion started: 2 March 2020 c Author(s) 2020. CC BY 4.0 License.