General overestimation of ERA5 precipitation in flow simulations for High Mountain Asia basins

Precipitation is one of the most important inputs to hydrological models, yet obtaining sufficient precipitation observations and accurate precipitation estimates in High Mountain Asia (HMA) is challenging. ERA5 precipitation, from the latest generation of reanalysis datasets, is attracting considerable attention in various fields, but it has not been evaluated in hydrological simulations in HMA. To fill this gap, we first statistically evaluated ERA5 precipitation against observations from 584 gauges in HMA, and then investigated its potential for hydrological simulation in 11 HMA basins using the Variable Infiltration Capacity (VIC) hydrological model. ERA5 precipitation generally captures the seasonal variations of gauge observations and the broad spatial distributions of precipitation, in both magnitude and trends, in HMA. ERA5 precipitation yields a reasonable flow simulation (RB of 5%–10%) at the Besham hydrological station of the upper Indus (UI) basin when the contribution from glacier runoff is added to the simulated total runoff. In the other HMA basins, however, the ERA5-driven simulations overestimate the observations by 33%–106% even without considering glacier runoff, mostly because of overestimates in the ERA5 precipitation inputs. Therefore, a bias correction is needed before ERA5 precipitation is used for hydrological simulations in HMA basins.


Introduction
Accurate precipitation data are crucial for using hydrological models to understand hydrological responses to climate change in high mountain basins. However, gauges are sparse or nonexistent in many high mountain regions because of their complex environment. This is especially true for High Mountain Asia (HMA), which is the origin of major Asian rivers in the Tibetan Plateau (TP) (figure 1).
Many studies have attempted to evaluate precipitation data in HMA from gauge-based interpolation estimates (Tong et al 2014a, Tong et al 2014b, Li et al 2020), satellite-based estimates (Tang et al 2018b, Tan et al 2020, Zhang et al 2020b), reanalysis datasets (Zhang and Bao, 2013, Wang et al 2017, Bai et al 2020), and outputs of regional climate models (Gao et al 2015, Gao et al 2020, Li et al 2020, Sun et al 2021). These studies suggest that none of the current precipitation datasets performs equally well across all HMA basins because of the high variability in precipitation amounts and spatiotemporal patterns. The ERA5 (Hersbach et al 2020) precipitation dataset, the newly released fifth-generation reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF), has been used to describe large-scale spatiotemporal patterns of precipitation (Lai et al 2020), high-altitude melt (Bhattacharya et al 2021), and streamflow simulation (Dahri et al 2021a, Khanal et al 2021) in HMA. For instance, Hu and Yuan (2020) found that ERA5 precipitation could capture the general spatiotemporal features of the evolution of observations at the scale of rainfall events over the eastern periphery of the Tibetan Plateau. Another study compared the precipitation gradient characteristics obtained from ERA5 and gauge observations in the monsoon-dominated and westerlies-dominated HMA basins, suggesting that ERA5 precipitation can capture the pattern of the precipitation gradient in most basins. Dahri et al (2021b) evaluated 27 gridded precipitation products in the high-altitude Indus, and suggested that, among all the products, ERA5 exhibited the most acceptable performance for all sub-regions of the upper Indus. However, existing studies of ERA5 precipitation estimates mostly focus on spatiotemporal performance in the TP based on limited gauge observations, most of which are located in the eastern TP region. The performance of ERA5 precipitation in the western TP region remains unclear.
Khanal et al (2021) simulated streamflow directly forced by ERA5 precipitation for 15 HMA basins without considering the uncertainties in ERA5 precipitation, resulting in large differences in meltwater contribution between their simulation results and existing studies. Therefore, it is essential to evaluate the performance of ERA5 precipitation data before they are used in hydro-climatological applications.
However, the representation and hydrological utility of ERA5 precipitation in HMA river basins have not been systematically evaluated. In this study, more stations on the western TP are collected, which together with stations on the eastern TP constitute a unique observation basis for evaluating the performance of ERA5 precipitation. Two questions are addressed: (1) How well can ERA5 describe HMA precipitation in both magnitude and spatiotemporal distribution? (2) Can ERA5 precipitation meet the accuracy required for hydrological simulation in HMA basins? To address these questions, ERA5 precipitation estimates are evaluated against observations from 584 gauges in terms of the magnitude and spatiotemporal patterns of precipitation in HMA, and their potential utility in hydrological modeling is investigated in 11 HMA basins.

Data and methodology
In this study, hydrological evaluations of ERA5 precipitation estimates are focused on 11 upper basins in HMA.

Precipitation data

ERA5 precipitation estimates
The ERA5, which is the successor of ERA-Interim, provides the next generation of global precipitation estimates at a temporal resolution of one hour from 1950 to the present and a spatial resolution of about 25 km (Hersbach et al 2020), which can be downloaded from https://www.ecmwf.int/en/forecasts/datasets/reanalysisdatasets/era5. It uses one of the most recent versions of the Earth system model and data assimilation method applied at ECMWF, which enables it to use modern parameterizations of Earth processes (Hu and Yuan, 2020). In this study, hourly ERA5 precipitation estimates on single levels from 1950 to 2020 are used for evaluation.

Gauge observations
Observations from 293 meteorological stations and 291 rain gauges (figure 1) are used to evaluate the ERA5 precipitation estimates in HMA.
Daily observations from 150 meteorological stations are from the China Meteorological Administration (CMA, http://data.cma.cn/) for 1961-2016 in the monsoon-dominated eastern and southeastern HMA regions. In addition, monthly observations from 264 rain gauges for 2014-2016 in the monsoon-dominated Yarlung Zangbo basin are also used to evaluate the performance of ERA5 precipitation.
These gauge data have undergone quality control procedures to check for (and then validate, correct or remove) erroneous values (e.g. daily precipitation values less than 0 mm) and inhomogeneities in the data record associated with non-climatic influences such as changes in instrumentation, station environment, and observing practices that occur over time.

Methodology
The performance of ERA5 precipitation estimates in both magnitude and spatiotemporal distribution is first evaluated against observed precipitation from 584 gauges at point and regional scales for overlapping periods, using two statistical indexes: the monthly correlation coefficient (CC) and the relative bias (RB; %). Then, the ERA5 precipitation estimates are evaluated with the VIC hydrological model (Liang et al 1994, Liang et al 1996) in 11 HMA basins. VIC is a physically based, distributed hydrological model that parameterizes the water and energy exchanges among soil, vegetation, and atmosphere and has been widely used in simulations in HMA basins (Su et al 2016, Kan et al 2018, Meng et al 2019). The required VIC forcing data include daily precipitation, maximum and minimum temperature, and wind speed. The modeling frameworks at a three-hourly time step and 1/12° × 1/12° (around 10 km × 10 km) spatial resolution, the parameters, and the required forcing data for the monsoon-dominated UYA, UYE, UNJ, ULC, and YZ basins are adopted from previous studies, and those of the westerlies-dominated UI, UYK, UAMD, and USRD basins are adopted from Li (2019), Kan et al (2018) and Huang and Su (2019) without further calibration. To exclude the impact of glacier runoff on the precipitation evaluation, the off-line glacier scheme, which is included in previous applications of the VIC-Glacier model, is not used in this study.
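The two point-scale indexes can be illustrated with a minimal sketch. The text does not spell out the formulas, so this assumes the common definitions: CC as the Pearson correlation of paired monthly values, and RB as the summed difference between estimate and observation divided by the summed observation. The function names and the monthly values are hypothetical and are not taken from the study.

```python
import math


def correlation_coefficient(est, obs):
    """Pearson correlation between paired estimate and observation series."""
    n = len(obs)
    mean_est = sum(est) / n
    mean_obs = sum(obs) / n
    cov = sum((e - mean_est) * (o - mean_obs) for e, o in zip(est, obs))
    var_est = sum((e - mean_est) ** 2 for e in est)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov / math.sqrt(var_est * var_obs)


def relative_bias(est, obs):
    """Relative bias in percent (assumed definition): sum(est - obs) / sum(obs)."""
    return 100.0 * (sum(est) - sum(obs)) / sum(obs)


# Hypothetical monthly precipitation (mm) at one gauge and its ERA5 grid cell.
gauge = [5.0, 8.0, 15.0, 30.0, 60.0, 110.0, 140.0, 120.0, 70.0, 25.0, 8.0, 4.0]
era5 = [9.0, 14.0, 26.0, 48.0, 85.0, 150.0, 185.0, 160.0, 100.0, 40.0, 15.0, 8.0]

cc = correlation_coefficient(era5, gauge)
rb = relative_bias(era5, gauge)
print(f"CC = {cc:.2f}, RB = {rb:.1f}%")
```

A positive RB means the gridded estimate exceeds the gauge total over the overlapping period. A high CC only indicates that the timing of the monthly variations agrees, so the two indexes are read together.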
Available monthly streamflow observations from 13 hydrological stations (figure 1, table 1) are compared with simulations driven by ERA5 precipitation estimates in all the selected HMA basins for 1980-2010 (1980-1991 for UAMD and 2001-2010 for USRD). The statistical indexes of RB and Nash-Sutcliffe efficiency (NSE) are used to quantify the systematic deviation and the agreement between the simulations with ERA5 precipitation and the observed streamflow (figure 1).
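As a minimal sketch of the NSE index, the standard definition is assumed here: one minus the ratio of the squared simulation error to the variance of the observations about their mean. The function name and the flow values are hypothetical.

```python
def nash_sutcliffe_efficiency(sim, obs):
    """NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


# Hypothetical monthly mean streamflow (m^3/s) at one station.
observed = [120.0, 110.0, 150.0, 300.0, 700.0, 1500.0,
            2400.0, 2200.0, 1300.0, 500.0, 220.0, 150.0]
# A simulation with the right seasonality but a uniform 50% overestimate.
simulated = [1.5 * q for q in observed]

nse = nash_sutcliffe_efficiency(simulated, observed)
print(f"NSE = {nse:.2f}")
```

An NSE of 1 indicates a perfect match and an NSE of 0 means the simulation is no better than the observed mean. A simulation can therefore reproduce the seasonal pattern and still score a low NSE when it carries a large systematic bias, which is why RB is reported alongside it.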

Evaluation of the ERA5 precipitation estimates with gauge observations
When compared with observations from 584 gauges, ERA5 precipitation estimates show significant correlations with gauge observations in monthly variations (mostly with CCs > 0.35, p < 0.05) in HMA (figure 2(a)). However, ERA5 precipitation estimates perform better in the monsoon-dominated regions than in the westerlies-dominated regions in terms of CC. About two thirds of the precipitation gauges in the monsoon-dominated regions show high correlations (mostly with CCs of 0.61-0.97, p < 0.05) with the corresponding ERA5 grids in monthly variations, but two thirds of the gauges in the westerlies-dominated regions show low correspondence with the ERA5 precipitation estimates (mostly with CCs < 0.4). The ERA5 precipitation generally overestimates the gauge observations in annual means, with RBs of 30%-270% (figure 2(b)).
Precipitation estimates from gauge observations show consistent seasonal patterns among the monsoon basins, with 70%-85% of mean annual estimates occurring in June-September (figure 2(c)). However, diverse seasonal patterns are present among the westerlies-dominated basins (figures 2(d)-(f)). The UAMD and USRD basins (figures 2(d), (e)) display a strong westerlies signal with a winter-spring precipitation maximum. The UI, by contrast, exhibits a bimodal pattern that clearly reflects the effect of the westerlies and occasional intrusions of the monsoon (figure 2(e)). The Tarim basin shows a summer precipitation maximum (figure 2(f)) due to the orographic barrier of the Pamir-Tian Shan mountains (Chen et al 2020). The ERA5 precipitation successfully reproduces the seasonal pattern of the gauge observations in all selected basins (CC of 0.65-0.96, p < 0.05, figures 2(c)-(f)). It is worth noting that ERA5 captures the observed precipitation seasonality in the UYK of the Tarim basin, which is not reflected in the widely used satellite-based Global Precipitation Measurement (GPM) precipitation estimates or in outputs from regional climate models (Supplementary figure S1 (available online at stacks.iop.org/ERC/3/121003/mmedia)).
The well-recognized large-scale spatial precipitation pattern in HMA (Tong et al 2014b, Wang et al 2018, Tan et al 2020, Sun et al 2021) is also preserved in the ERA5 precipitation, with a decreasing trend from southeastern HMA (800-3000 mm) to the inner transition region (200-400 mm) as monsoon precipitation decays, and then an increasing trend towards western HMA, with mean annual precipitation reaching 400-1000 mm in the mountainous regions of the Amu and Syr Darya (figure 3(a)) as the influence of the westerlies strengthens. In addition, ERA5 precipitation estimates detect well the monsoon signal in June-September (figure 3(b)) and the westerlies signal in October-May in HMA (figure 3(c)).
The ERA5 annual precipitation generally shows a strong increasing trend in the central and mountainous regions of northwestern HMA (0.6-1.5 mm yr⁻¹, p < 0.05), while a decreasing trend is seen in the southeast (e.g., significantly decreasing trends of −16 to −8 mm yr⁻¹ in the downstream of the Yarlung Zangbo river) during 1950-2020 (figure 3(d)). This contrasting spatial pattern is intensified after 2000 (figure 3(f)), resulting in wetter conditions in central and western HMA and drier conditions in the southeast. This north-south dipole pattern of precipitation changes in HMA, which might be explained by the weakening Indian monsoon towards the interior and the strengthening westerlies towards the northwest of HMA (Turner and Annamalai, 2012, Yao et al 2012), is also detected in other precipitation datasets, such as gauge-based (Zhang et al 2020a), satellite-based, and reanalysis (Song et al 2016) precipitation estimates.
Systematic errors in gauge observations, resulting from their locations at low altitudes, precipitation undercatch (Yang et al 2005, Ma et al 2015), and the scale mismatch between observations and grids (Tang et al 2018a), may lead to uncertainties in the ERA5 precipitation evaluation at point scales. For instance, the large overestimation of ERA5 precipitation against the gauge observations in the UI (RB of 160%, figure 2(e)) may be due to unrepresentative gauges at low elevations. Land surface hydrological models provide an important tool for inversely evaluating gridded precipitation by comparing flow simulations against flow observations, especially for basins where precipitation gauges are lacking (Su et al 2008, Sun and.
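The seasonal concentration quoted above (the share of mean annual precipitation falling in June-September) can be computed from a 12-month climatology as in the following sketch. The function name and the climatology values are hypothetical and only illustrate the calculation.

```python
def seasonal_share(monthly_mm, months):
    """Percent of the annual total that falls in the given calendar months (1-12)."""
    annual_total = sum(monthly_mm)
    seasonal_total = sum(monthly_mm[m - 1] for m in months)
    return 100.0 * seasonal_total / annual_total


# Hypothetical mean monthly precipitation (mm), January to December,
# for a monsoon-dominated basin.
climatology = [5.0, 8.0, 15.0, 30.0, 60.0, 110.0, 140.0, 120.0, 70.0, 25.0, 8.0, 4.0]

jjas_share = seasonal_share(climatology, months=[6, 7, 8, 9])
print(f"June-September share: {jjas_share:.1f}%")
```

Passing the winter and spring months instead gives the corresponding share for a westerlies-dominated regime.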

Hydrological evaluation of the ERA5 precipitation estimates
Flow observations show that more than 60% of the annual total flow occurs in June-September in all the 11 selected basins (figure 4), although the underlying drivers differ. In the monsoon-dominated basins (figures 4(a)-(e)), the seasonal pattern of streamflow is mostly a direct response to that of precipitation (figure 2(c), figures S2(a)-(e)) due to the dominant role of monsoon precipitation in runoff generation over these basins. In the westerlies-dominated basins, melt water from seasonal snow and glaciers contributes about 45%-77% to total runoff (Lutz et al 2014, Kan et al 2018) in spring and summer. Therefore, the seasonal pattern of streamflow in the westerlies-dominated basins is not always consistent with that of precipitation because of the large influence of melt water, as in the UI, UAMD, and USRD (figures 4(f)-(h)), where precipitation mostly peaks in spring and winter (spring and summer in the UI, figure 2(d), figures S2(f)-(h)). Consistent seasonal patterns between streamflow and precipitation are present in the UYK, UAKS and UHT (figures 4(i)-(k), 2(f), S2(i)-(k)), mostly due to the co-occurrence of precipitation and melt water in June-September.
The simulated streamflow driven by ERA5 precipitation can generally reproduce the seasonal pattern of observations in all selected basins (figure 4). However, the flow simulation driven by the ERA5 precipitation tends to largely overestimate the observed streamflow in HMA basins even without including the water contribution from glaciers, with RBs of 45%-106% in the monsoon-dominated basins (figures 4(a)-(e)) and RBs of 33%-70% in the westerlies-dominated basins (figures 4(g)-(k)). The one exception is the UI, where the flow simulation driven by the ERA5 precipitation underestimates the observed streamflow by 16% (figure 4(f)).
The large overestimation in the ERA5-driven simulated streamflow is mostly due to the overestimates in the ERA5 precipitation (figure 2). For instance, the mean annual precipitation was 519 mm based on 16 CMA gauge stations in the monsoon-dominated UYE basin (figure 1, figure S3(a)) for 1980-2010, while the ERA5 precipitation was 773 mm (49% larger than the gauge-based estimates, figure S3(a)). The simulated streamflow with the ERA5 precipitation overestimates the observations by 87% in the UYE, while the flow simulations with the gauge-based precipitation match the observations well, with an NSE of 0.87 and an RB of −5% (figure S3(d)), inversely suggesting large overestimates in the ERA5 precipitation. Another example is the monsoon-dominated YZ basin (figures 1, S3(b)), where Sun and Su (2020) reconstructed a precipitation dataset with a basin-wide mean annual estimate of 729 mm for 1980-2010 (figure S3(b)). The mean annual ERA5 precipitation was 1266 mm (74% higher than the reconstructed precipitation) in the YZ for the same period, resulting in an overestimate of 61% in the simulated streamflow against observations. On the other hand, the reconstructed precipitation results in a high NSE of 0.91 and a small model RB of −7% (figure S3(e)), which may be compensated by glacier contributions (13.9%) to total runoff.
For the westerlies-dominated basins (figures 4(g)-(k)), the ERA5-driven simulated streamflow would overestimate the observations even further if glacier runoff were considered in the flow simulations. For example, Kan et al (2018) generated daily precipitation estimates in the UYK basin with a mean annual precipitation of 232 mm for 1980-2010, while the ERA5 precipitation was 447 mm (93% larger than the generated gridded estimates, figure S3(c)). The simulated streamflow driven by ERA5 precipitation overestimates the observations by 33% in the UYK (figure S3(f)) even without including glacier runoff. The simulated total runoff driven by the gauge-based precipitation (Kan et al 2018) would match the flow observations well (RB of 6%, figure S3(f)) when a 52% contribution from glacier runoff (Kan et al 2018) is added to the simulated total runoff.
In the westerlies-dominated UI basin, the contribution of glacier runoff to total flow is estimated to be about 21%-26% in the UI basin Khan, 2014, 2015) based on statistical and hydrologic analyses of the river discharge data, and snow and glacier cover estimations. Glaciers tend to melt from June to September, with the peak in July-August. Therefore, the total runoff also peaks in summer. If we take these numbers as the glacier runoff contribution, the ERA5 precipitation-driven simulations would be comparable (RBs of 5%-10%) with the observed streamflow at the Besham station, suggesting a reasonable precipitation magnitude of the ERA5 averaged over the UI basin (691 mm in 1980-2010).
In summary, the simulated flows with the ERA5 precipitation generally overestimate the observations by a large margin in HMA basins. Therefore, ERA5 precipitation should be systematically bias-corrected when it is used for hydrological simulations in HMA basins.
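The percentage differences between ERA5 and the reference precipitation quoted for the UYE, YZ and UYK basins follow directly from the mean annual totals given in the text. The short sketch below reproduces that arithmetic; the function and variable names are hypothetical, while the precipitation totals are the ones stated above.

```python
def percent_difference(estimate, reference):
    """Difference of an estimate from a reference, in percent of the reference."""
    return 100.0 * (estimate - reference) / reference


# Mean annual precipitation for 1980-2010 (mm): (ERA5, reference) as given in the text.
# Reference is gauge-based for UYE, reconstructed for YZ, gridded (Kan et al 2018) for UYK.
basin_totals = {
    "UYE": (773.0, 519.0),
    "YZ": (1266.0, 729.0),
    "UYK": (447.0, 232.0),
}

differences = {
    basin: round(percent_difference(era5_mm, reference_mm))
    for basin, (era5_mm, reference_mm) in basin_totals.items()
}
for basin, diff in differences.items():
    print(f"{basin}: ERA5 is {diff}% larger than the reference")
```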
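The step from a 16% underestimate to an RB of 5%-10% can be checked with a short calculation. This sketch assumes that the 21%-26% glacier contribution is expressed as a fraction of the observed total flow and is simply added to the simulated non-glacier runoff; the function name is hypothetical.

```python
def rb_with_glacier(rb_without_glacier_pct, glacier_share_pct):
    """Relative bias (%) after adding glacier runoff to the simulated total runoff.

    Assumes the glacier share is given in percent of the observed total flow.
    """
    simulated_over_observed = 1.0 + rb_without_glacier_pct / 100.0
    total_over_observed = simulated_over_observed + glacier_share_pct / 100.0
    return 100.0 * (total_over_observed - 1.0)


# UI basin at Besham: ERA5-driven simulation without glacier runoff has RB = -16%.
rb_low = rb_with_glacier(-16.0, 21.0)
rb_high = rb_with_glacier(-16.0, 26.0)
print(f"RB with glacier runoff: {rb_low:.0f}% to {rb_high:.0f}%")
```

Under that assumption, the simulated runoff of 0.84 times the observed flow plus a glacier share of 0.21-0.26 gives 1.05-1.10 times the observed flow, consistent with the RBs of 5%-10% reported for the UI.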