Evaluation of Upper Indus Near-Surface Climate Representation by WRF in the High Asia Reﬁned Analysis

Data paucity is a severe barrier to the characterization of Himalayan near-surface climates. Regional climate modeling can help to ﬁll this gap, but the resulting data products need critical evaluation before use. This study aims to extend the appraisal of one such dataset, the High Asia Reﬁned Analysis (HAR). Focusing on the upper Indus basin (UIB), the climatologies of variables needed for process-based hydrological and cryospheric modeling are evaluated, leading to three main conclusions. First, precipitation in the 10-km HAR product shows reasonable correspondence with most in situ measurements. It is also generally consistent with observed runoff, while additionally reproducing the UIB’s strong vertical precipitation gradients. Second, the HAR shows seasonally varying bias patterns. A cold bias in temperature peaks in spring but reduces in summer, at which time the high bias in relative humidity diminishes. These patterns are concurrent with summer overestimation (underestimation) of incoming shortwave (longwave) radiation. Finally, these seasonally varying biases are partly related to deﬁciencies in cloud, snow, and albedo representations. In particular, insufﬁcient cloud cover in summer leads to the overestimation of incoming shortwave radiation. This contributes to the reduced cold bias in summer by enhancing surface warming. A persistent high bias in albedo also plays a critical role, particularly by suppressing surface heating in spring. Improving representations of cloud, snow cover, and albedo, and thus their coupling with seasonal climate transitions, would therefore help build upon the considerable potential shown by the HAR to ﬁll a vital data gap in this im-mensely important basin.


Introduction
Runoff generated in the high mountains of the upper Indus basin (UIB) is a vital resource for vast populations. Combined with concerns about the impacts of climate change, this has led to substantial interest in the UIB's climate and hydrology over many decades (e.g., Young and Hewitt 1990;Archer et al. 2010). Yet here, like most mountain regions in the world, data paucity remains a persistent and major challenge for researchers and practitioners (Viviroli et al. 2011). Measurement networks are sparse and focused on lower-elevation valley locations, while data quality and continuity issues are inherent in such a harsh and remote environment. This hinders application of one of the key tools for research and management in the basin, namely processbased cryospheric and hydrological modeling.
One response to this problem harnesses modern computing facilities to simulate near-surface climate in the Himalaya using high-resolution numerical weather prediction (NWP) or regional climate models. For example, Collier and Immerzeel (2015) found that the Weather Research and Forecasting (WRF) Model (Skamarock et al. 2008), with a 1-km-resolution inner domain, reproduces the key features of variability in a detailed measurement network in the Langtang Denotes content that is immediately available upon publication as open access.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JHM-D-18-0030.s1. catchment, Nepal. Similarly, Karki et al. (2017) demonstrated the added value of convection-permitting simulations elsewhere in the Nepalese Himalaya. Good correspondence between WRF and observations was also found by Collier et al. (2013Collier et al. ( , 2015 in simulating glacier-atmosphere interactions in the Karakoram, and by Norris et al. (2017) in their study of the spatiotemporal distribution of precipitation across the Himalayan arc.
The relatively small number of WRF simulations conducted to date for this region therefore show clear potential to supplement local observations. While most of these experiments are for reasonably short (,1 year) simulation periods, Maussion et al. (2014) performed high-resolution WRF simulations of 14 years for the whole of the Tibetan Plateau and its adjoining mountain ranges. The resulting publicly available dataset is the High Asia Refined Analysis (HAR), which is in essence a dynamical downscaling of coarser global analysis using 30-and 10-km nested domains. For this region, the HAR thus represents a uniquely fine spatial discretization for a WRF simulation of this length over such a large area. As such, it has recently been used to improve our understanding of spatial and temporal patterns of precipitation variability and atmospheric water transport (Maussion et al. 2014;Curio et al. 2015;Curio and Scherer 2016). The HAR has also been employed to successfully provide offline meteorological forcing data for models of glacier mass balance and hydrology (Huintjes et al. 2015;Biskop et al. 2016;Tarasova et al. 2016), as well as to examine the connection between midlatitude westerlies and glacier mass balance in monsoonal parts of the region .
In their evaluation of HAR precipitation skill, Maussion et al. (2014) found that the dataset corresponds well with local observations, certain features of the Tropical Rainfall Measuring Mission (TRMM) product (Huffman et al. 2007), and known relationships between precipitation and topography. Our aim in this paper is to extend the evaluation of the HAR dataset in several ways. First, we focus in detail on the near-surface climate of the UIB, an area for which ground-based measurements were unavailable for Maussion et al.'s (2014) original assessment. Second, we go beyond precipitation to consider other climate variables related to surface mass and energy balances, including temperature, humidity, and incoming radiation. Finally, we explore some of the key factors affecting HAR performance. For this, we concentrate on cloud and albedo influences, based on the nature of bias patterns identified in nearsurface variables.
The primary focus in our evaluation is on HAR climatological representations and biases, as reasonable performance in these respects is vital for numerous applications. The findings from this analysis can therefore inform future hydrological and cryospheric modeling and other applications in this part of the Himalayan arc, as well as regional WRF modeling.

Study area
The UIB, shown in Fig. 1, exhibits pronounced hydroclimatic variation on a range of scales. The higher parts of the basin give rise to the glacially dominated hydrological regimes of the Hunza, Shigar, and Shyok subbasins (Sharif et al. 2013). Elsewhere, runoff mainly originates from seasonal snowmelt, derived from snowfall in the preceding winter and spring, or rainfall in the concurrent season (Archer 2003). The coherent spatial differentiation of these glacial, nival, and pluvial regimes stems in large part from variations in hypsometry and interactions of topography with westerly and monsoonal weather systems.
Much of the precipitation in the basin occurs in winter and spring, often originating from westerly disturbances (Filippi et al. 2014;Archer and Fowler 2004). These synoptic-scale low pressure systems are guided toward the UIB from the Atlantic and Mediterranean by the subtropical westerly jet. There are some relatively infrequent monsoon-related storms in summer, but westerly tracking depressions can also bring precipitation at this time of year, even though the subtropical westerly jet shifts to the north of the Karakoram (Hewitt 2014). Precipitation is strongly orographically forced, leading to a general increase with elevation to around 5500 m MSL but then most likely a decrease at higher elevations, due to exhaustion of moisture availability (Hewitt 2014). The interplay of precipitation sources and topographic barrier effects incurs substantial horizontal gradients in precipitation across the UIB (Young and Hewitt 1990;Reggiani and Rientjes 2015). Archer (2004) demonstrated that there is high spatial and vertical correlation of observed near-surface air temperatures in the UIB on monthly, seasonal and annual time scales. Temperature measured at a relatively small number of valley-based stations thus explains a substantial proportion of observed variability in melt season runoff in heavily glaciated, energylimited catchments (Archer 2003;Fowler and Archer 2005). In contrast to many parts of the world, observed summer temperatures in the UIB show a cooling trend (Fowler and Archer 2006;Forsythe et al. 2012b), which is concurrent with comparative glacier mass stability (Hewitt 2005;Brun et al. 2017). This appears to be related to interactions between the summer monsoon and the Karakoram/Western Tibetan Vortex system (Forsythe et al. 2017;Li et al. 2018). Studies of the full surface energy balance in the UIB are limited due to poor data availability, although the modulating influence of cloud cover has been examined by Forsythe et al. (2015).
Here we focus particularly on the northwest part of the UIB (hereafter NWUIB) shown in Fig. 1. This 52 473-km 2 domain is high-yielding, contributing ;50% of UIB runoff to Besham, upstream of Tarbela Reservoir, despite comprising only ;30% of the catchment area. The NWUIB contains a mixture of the pluvial, nival, and glacial hydrological regimes described above.

Data and methods
We build our evaluation of the HAR climatology and biases primarily around local hydroclimatic measurements. For reference, we also incorporate some limited comparisons with gridded climate data products commonly applied in this region, namely, TRMM and APHRODITE (Yatagai et al. 2012). However, these precipitation products exhibit important limitations in the UIB. Specifically, TRMM does not adequately detect snowfall, which comprises a substantial proportion of UIB precipitation (Forsythe et al. 2012a;Reggiani and Rientjes 2015;Maussion et al. 2014). APHRODITE is based on interpolation of the small number of in situ observations, which are biased toward drier, valley locations. This inevitably leads to underestimation of precipitation at higher elevations and at the catchment scale . Therefore, here we focus more on alternative datasets and approaches based on MODIS products, in order to supplement local observations and provide broader spatial coverages, as detailed in Table 1 and sections 3b and 3c. In addition, we do integrate some comparisons with global reanalysis products, drawing on ERA-Interim (Dee et al. 2011), JRA-55 (Kobayashi et al. 2015, and NASA MERRA-2 (Gelaro et al. 2017). However, our primary aim is to evaluate the high-resolution HAR with respect to the best available observations, rather than to compare it with datasets at generally coarser resolutions.

a. High Asia Refined Analysis
The initial and boundary conditions for the WRF simulations comprising the HAR were provided by the NCEP FNL (Final) Operational Global Analysis dataset (ds083.2), which uses the same model as the NCEP Global Forecast System (GFS; NOAA/NCEP 2000). The FNL dataset integrates surface and upper-air observation networks, as well as remote sensing products. The HAR used a daily reinitialization strategy to prevent drift from observed synoptic conditions over its total simulation period (from October 2000 to October 2014). Each day was simulated independently of all FIG. 1. Regional overview and study area maps. (a) The inner HAR domain (HAR10), orography, and the UIB study area and (b) the locations of climate stations and river gauges used in this analysis (details in Table 2 and  Table 3). The UIB boundary was delineated by Khan et al. (2014).
other days. For any given day, 36 h were simulated, beginning at 1200 UTC on the previous day. The first 12 h were then discarded as a spinup to decrease the influence of the initial conditions, which were interpolated from the FNL dataset to the model grid. The outputs of all of the individual daily simulations were ultimately concatenated to produce the overall time series. This approach to initialization and spinup was found to perform well relative to other computationally feasible strategies during extensive testing (Maussion et al. , 2014 and in other Himalayan modeling studies (e.g., Norris et al. 2015;Cannon et al. 2017). However, it does lead to a degree of discontinuity in the time series for key land surface states, such as snow water equivalent (SWE). This has implications for the land surface water balance, as discussed in sections 3b(2) and 4a(2). Two-way nesting was applied to run WRF with a 10-kmresolution child domain (HAR10) within a broader 30-kmresolution parent domain. The 30-km domain was then run separately, without the child domain, to remove any inconsistencies, thereby producing HAR30. Differences between HAR10 and HAR30 therefore provide insights into the effects of resolution and simulation strategy. Further details are given in the online supplemental material (section S1.1) and Maussion et al. (2014).
b. In situ observations and data processing

1) CLIMATE OBSERVATIONS
This study uses local climate observations from 13 stations covering a range of elevations, as detailed in Table 2 (see also Fig. 1b). These datasets have been subjected to quality control procedures and used in a number of previous studies (e.g., Archer and Fowler 2004;Fowler and Archer 2006;Collier et al. 2013;Soncini et al. 2015). We performed additional checks for inconsistencies, spurious values, and outliers, particularly for the most recent parts of the data. For a given station, these checks included comparisons of recently obtained data with climatologies derived from earlier parts of the time series, as well as inspections of interstation and intervariable consistency.
To compare the HAR with climate observations, we extracted hourly time series from the closest HAR grid point to each station location, similarly to Maussion et al. (2014). In the analyses, we included only days for which observations are available and months that are largely complete (less than 3 days missing data). For the evaluations of monthly temperature biases (section 4b, Fig. 6a) and incoming longwave radiation (section 4d, Fig. 10), we applied corrections for elevation differences between stations and HAR grid points. The corrections were based on local, monthly climatological gradients (i.e., lapse rates), which were calculated from the 9 HAR grid cells surrounding the station location using linear regression. The elevation correction procedure reduced mean annual temperature and longwave radiation biases by 4.68C and 12 W m 22 , respectively. Details of the regressions and sensitivity of the elevation correction to different time scales are discussed in section S1.2 of the supplemental material. We did not make similar elevation corrections in the other evaluations in section 4, due to the nature of the comparisons we employ, as well as the weaker, less consistent and more complicated dependence of the other variables on elevation.
Limitations here include the difference in scale between weather stations and the HAR grid resolution, as well as uncertainties inherited from numerical approximations at grid cells. Where applied, the elevation correction also omits the important influence of more local aspect and slope variation (Daly et al. 2008). This is difficult to account for given the low station density and the limitations of the HAR orography. With these issues in mind, we compare the datasets at multiple locations in relative as well as absolute terms, generally at monthly or longer time scales to reduce noise and identify robust patterns.

2) HYDROLOGICAL OBSERVATIONS
We consider the plausibility of HAR precipitation at the catchment scale based on river flow records for 11 gauged subbasins of the UIB (Table 3 and Fig. 1b). These data have been checked and applied in previous studies (Archer 2003;Sharif et al. 2013). We calculated HAR catchment means accounting for partial gridcell coverage, focusing on mean annual aggregations to diminish any confounding influences from interannual storage changes. We used overlapping periods of record as far as possible. In addition to comparing HAR precipitation with observed runoff directly, we also derived ''effective precipitation,'' defined here as precipitation minus evapotranspiration. This was calculated based on mean annual aggregations, with both terms taken from the HAR. Effective precipitation should be approximately equal to runoff in the mean annual case if storage changes are close to zero, an assumption which is consistent with the neutral glacier mass balances in much of the UIB (Hewitt 2005;Zhou et al. 2017;Bolch et al. 2017;Brun et al. 2017).
The effective precipitation approach was taken primarily because the runoff simulated by the Noah LSM in the HAR was found to be strongly affected by the limited accuracy of daily SWE reinitialization from the coarse FNL driving dataset. Limitations of our approach include notable uncertainty in HAR evapotranspiration, which means that this analysis should be considered indicative. These issues are discussed in section 4a(2) and the supplemental material, section S1.3. c. MODIS products and processing

1) SWE RECONSTRUCTION
Using the Collection 6 MOD10A1 500-m daily snow cover product (Hall and Riggs 2016), with cloud gaps infilled using the method of Gafurov and Bárdossy (2009), we employed a well-established procedure for reconstructing spatially distributed peak SWE (e.g., Raleigh and Lundquist 2012). In this approach, the timing of snow disappearance is identified from remote sensing, and snowmelt prior to this date is calculated using a model. In line with other studies, we used a simple temperature index (degree day) algorithm (e.g., Rice et al. 2011;Raleigh and Lundquist 2012). This was driven by observed daily temperature time series, adjusted for elevation with monthly climatological lapse rates estimated from linear regression using station data and elevations. The purpose of reconstructing peak SWE in this way was to gain a proxy for relative spatial variation in UIB mass input to evaluate the HAR's spatial and vertical precipitation gradients.
This method is predicated on the previous finding that air temperature provides a good index of energy available for ablation in this region (Archer 2004). However, its applicability is limited to areas where melting is mass constrained, that is, locations that are not glacierized or perennial snowpacks. Elsewhere, the primary assumption is that patterns of relative spatial variation in SWE are roughly consistent with those of precipitation at fairly broad scales. There are of course methodological limitations, including the simplification of temperature variability and its relationship with ablation, as well as the omission of snow redistribution and subgrid patterns. Nevertheless, our primary goal is only to indicate likely relative variation in mean annual mass input at much broader scales than the MODIS pixels.
2) CLOUD COVER The Collection 5.1 MOD06L2 daily 5 km cloud cover fraction (CCF) product (Platnick et al. 2003) is used to understand some of the patterns identified in the HAR. The applicability of this dataset in the UIB was demonstrated by Forsythe et al. (2015). We use MOD06L2 spatial means of daily average CCF for the NWUIB, as well as time series of the means of the nine pixels in MOD06L2 surrounding each climate station location. The equivalent HAR cloud cover variable was calculated by Maussion et al. (2014) as the maximum CCF in a 50-km horizontal view field for each cell, based on classifying all model layers using a threshold condensate mixing ratio, following Mölg and Kaser (2011). While there are differences in the provenance of MODIS and HAR CCF, we take these datasets as indicative of cloud cover patterns at generally coarse aggregations.

3) ALBEDO
We also use spatial means and station location time series from the Collection 5 MCD43A3 albedo product (shortwave band). The accuracy of this product has been established previously (e.g., Wang et al. 2012), although the challenges in albedo retrieval do increase in complex terrain (Wen et al. 2018). As such, we use the snow albedo from the MOD10A1 500 m daily snow cover product (Hall and Riggs 2016) for comparison, infilling no-snow pixels using MCD43A3 white sky albedo to provide an upper estimate of albedo.

1) POINT-SCALE COMPARISON
For a first indication of precipitation performance, we compare HAR mean annual values with observations (Fig. 2a). This shows that HAR10 is generally consistent with most observations, while HAR30 tends to overestimate relative to station records. The exceptions to good HAR10 agreement with measurements are three relatively high elevation stations-Khunjerab, Ziarat, and Ushkore-for which the HAR simulates high annual totals. This could be due to HAR limitations. For example, reductions in precipitation downwind of the major topographic barriers to the south and east of Khunjerab and Ziarat could be underestimated (see Fig. 1). However, the Khunjerab station is also located on an exposed, high mountain pass, such that the influences of strong local wind patterns unresolved by the HAR may be especially important. Ushkore and Ziarat both lie in relatively narrow valleys that are not captured in the HAR orography, such that local, valley-scale vertical precipitation gradients are not simulated in these cases. Figure 1b shows that most of the other stations recording precipitation sit in major valleys that are at least partly represented. These points highlight the challenge of scale differences for precipitation evaluation at station locations, while measurement limitations must also be acknowledged. As such, we augment our analysis with appraisal of precipitation gradients and subbasin water balances below.
To examine the intra-annual distribution of precipitation, we show the fractions of annual precipitation occurring in each month for the station ensemble in Fig. 2b. From this we can see that the HAR essentially agrees with the observed annual cycle of precipitation, with a maximum in winter/spring and a minimum in summer. However, the HAR generally overestimates the proportion of precipitation falling in winter and early spring, while underestimating the summer fraction. Figure 2b also shows that the HAR10 and HAR30 cycles are very similar, indicating little modification of precipitation seasonality by grid resolution. Notably, the observations exhibit more spatial variation in the shape of the annual precipitation cycle than exists in the HAR.
Interestingly, Reggiani et al. (2017) show that some of the coarser-resolution global reanalyses, including ERA-Interim and JRA-55, have a less stark annual cycle, which is more consistent with observations (see also supplemental material, section S2.1). This raises the possibility that HAR seasonality could be improved with alternative boundary conditions, although it is also possible that the limitations stem from aspects of the HAR setup, for example parameterized convection. Additional point-scale comparisons are given in the supplemental material (section S2), including time series, which support Maussion et al.'s (2014) finding that the HAR reproduces precipitation interannual variability.

2) CATCHMENT-SCALE ASSESSMENT
Moving to the catchment scale, we first compare spatially averaged mean annual precipitation from the HAR with observed runoff for gauged subbasins of the UIB (Fig. 3a). The results indicate that the HAR reproduces the intuitively positive relationship between mean annual precipitation and observed runoff. It is also apparent that the simulated precipitation exceeds observed runoff for all catchments, although the nival Astore and Gilgit subbasins do approach parity. Based on HAR10, we find the ratio of annual runoff to precipitation to be approximately 0.65. In addition, Table 4 shows that HAR annual precipitation for the UIB to Besham, upstream of Tarbela Reservoir, is similar to the ERA-Interim, JRA-55, and MERRA-2 global reanalyses, with all estimates falling in the range of 721-793 mm yr 21 for the period considered. In contrast, precipitation from the APHRODITE and TRMM products is substantially lower than observed runoff. This highlights the limits to their applicability in the UIB, as discussed in section 3.
We examine the HAR precipitation further by accounting for evapotranspiration, as simulated by the HAR [section 3b (2)]. The resulting quantity is termed effective precipitation here, which should be approximately equal to runoff at the mean annual scale considered, if storage changes are close to zero. Figure 3b shows that HAR mean annual effective precipitation and observed runoff do indeed generally converge, particularly for low-to moderate-yielding subbasins. For higher-yielding subbasins, larger errors are evident, although there does not appear to be systematic over or underestimation if all subbasins are considered together, at least for HAR10.
This suggests that the HAR may provide a reasonable approximation of catchment-scale precipitation on the whole, but the strength of this finding of course depends on the accuracy of HAR-simulated evapotranspiration and assumptions about storage changes. With glaciers being the major source of storage in the UIB, the storage change assumption is consistent with the approximately neutral glacier mass balances observed in the heavily glaciated parts of basin (Hewitt 2005;Zhou et al. 2017;Bolch et al. 2017;Brun et al. 2017). HAR mean annual evapotranspiration is approximately 275 mm for the UIB to its outlet at Besham, ranging from around 200 to 400 mm between subbasins.
These evapotranspiration values are within the range of estimates from reanalyses (see Table 4), remote sensing and hydrological modeling used in other recent studies Reggiani et al. 2017;Reggiani and Rientjes 2015). While consistency with other data sources is useful, clearly it does not provide complete validation of the HAR's absolute accuracy for evapotranspiration. Substantial uncertainties due to various errors and biases are of course associated with both HAR and alternative estimates. Although this means that the analysis should be considered indicative, it does at least suggest that the HAR shows notable potential as a source of catchmentscale precipitation for the data-sparse UIB, according to the best sources of information we have on other water balance terms at present (see also supplemental material, section S1.3). Further analyses through processbased hydrological modeling could explore this in more detail. Moreover, we can conclude with some confidence that HAR precipitation outperforms APHRODITE and TRMM at the catchment scale. How much information it adds relative to global reanalyses depends  (b) is calculated by subtracting HAR-simulated evapotranspiration from precipitation. Red outlines denote stations without data overlapping the HAR period, such that only the available record period was used. Numbers identifying gauges are given next to HAR10 points. Rientjes (2015) 675 6 100 200 6 100 ---in part on its vertical gradients and patterns of spatial variation, as explored in the next section.

VARIATION
We now examine whether the HAR simulates the vertical precipitation gradients required to accurately reproduce cryospheric accumulation and ablation patterns in the UIB, and therefore runoff amounts and timing (Hewitt 2014). The HAR hypsometry is provided in Fig. 4a, demonstrating the substantial benefit of a 10-km resolution, while Fig. 4b shows the precipitationelevation relationship for the NWUIB (based on HAR10 only). This relationship takes a log-linear form. The points in Fig. 4b, each representing a HAR10 grid cell, are colored according to the standardized residuals calculated after applying regression to the HAR10 precipitation-elevation relationship. Two observationbased profiles of precipitation are also shown on Fig. 4b (Hewitt 2014;Winiger et al. 2005), while additional profiles from model-based precipitation reconstructions are given in the supplemental material (section S2.5).
Importantly, Fig. 4b suggests that the central tendency of the HAR10 vertical profile agrees quite closely with Winiger et al.'s (2005) profile, the formulation of which has been the basis for several studies of UIB hydroclimatology (e.g., Bocchiola et al. 2011;Soncini et al. 2015;Reggiani et al. 2016). Some contrasts with the observation-based profiles are of course expected, based on the differing and restricted spatial extents of the measurement networks used, for example. In addition, field observations suggest that precipitation may begin to decrease at very high elevations (above around 5000-5500 m MSL) due to exhaustion of moisture availability, as noted in section 2 (e.g., Hewitt 2014). However, Fig. 4b indicates that HAR10 does not show a decrease in precipitation at its highest elevations. This may be related to topographic smoothing in HAR10, as its elevation peaks at around the same point at which the precipitation inversion is expected to start. This could mean that a decrease is simply not visible or is potentially inhibited by the effects of topographic smoothing on orographic precipitation dynamics and thermodynamics .
To understand the notable scatter in Fig. 4b, we map the cell-wise standardized residuals back to the NWUIB domain in Fig. 4c to reveal their patterns of spatial correlation. The red and blue areas can be interpreted as showing which parts of the NWUIB have relatively low or high precipitation, respectively, compared with what might be expected from elevation alone (i.e., black line in Fig. 4b). This allows us to see spatial variation at the subregional scale, but clearly scales finer than the HAR10 grid are unresolved. From Fig. 4c we can see relatively high precipitation zones along the west and southwest margins of the NWUIB, the areas of the domain first encountered by the prevailing westerly flows. This is consistent with the high yield of the nival Astore subbasin (Fig. 3). A second zone of relatively high precipitation trending from northwest to southeast (from the Hunza into the Shigar subbasin) can be seen to follow the terrain (see Fig. 1b). To the northwest of this ridge in the Hunza basin, precipitation is comparatively low for its elevation, suggesting that the known rain shadow effect here is captured by the HAR to some degree at least (Winiger et al. 2005). Other prominent zones of relatively low precipitation and vertical gradients are found in the central Gilgit catchment and the southeast corner of the NWUIB adjoining the Shigar subbasin (see also Fig. 1b).
For qualitative evaluation of these patterns, we compare the HAR with reconstructed SWE (section 3c). Similar to Fig. 4c, we show a spatial distribution of standardized residuals in Fig. 4d, but this time based on the regression of reconstructed SWE and elevation, rather than the regression of HAR precipitation and elevation that underpins Fig. 4c. Glaciers are also shown, as SWE reconstruction is not undertaken in these areas. Comparing Fig. 4c and Fig. 4d shows several similarities, such as the relatively high precipitation amounts in the Astore subbasin and around the southern margin of the Gilgit subbasin. It is clear that the major band of glacierized area, running from northwest to southeast from the Hunza into the Shigar subbasin in Fig. 4d, corresponds with a relatively high precipitation zone in the HAR. There is also agreement on the zones of comparatively low precipitation relative to their elevations, including the central part of the Gilgit subbasin, but particularly the southeast and northeast corners of the NWUIB.
There are of course limitations in this qualitative comparison. In particular, it presumes high correlation of precipitation, snowfall, and peak SWE, but this is expected to be generally the case at the scale of multiple HAR cells discussed here (supplemental material, section S2.5). Thus, despite the uncertainties, the key similarities between Fig. 4c and Fig. 4d suggest that the HAR patterns of relative spatial variation in precipitation are physically consistent with what we can infer from remote sensing. In conjunction with the generally reasonable agreement with both climate and hydrological observations, this analysis suggests that the HAR10 reproduces some critical features of UIB precipitation amounts and spatiotemporal distribution. The HAR10 may thus approach a resolution high enough to capture the most important effects of orographic precipitation in this area, particularly given the dominance of westerly winter/spring events Daly et al. 2017).

b. Temperature
The annual cycle of monthly mean near-surface (2 m) air temperature for the station ensemble is shown in Fig. 5a. Prior to calculating the ensemble summary statistics, temperatures for each station were expressed as differences relative to their annual means, in order to remove the effects of elevation differences between stations. This provides a way to examine overall differences in the shape of the annual cycles. We can therefore see from Fig. 5a that HAR10 and HAR30 have very similar annual cycles and both agree with observations in general. However, the amplitude of the cycle is overestimated by the HAR, mainly through a higher summer peak relative to the observed cycle. The rate of temperature increase in spring and early summer is also initially shallower than for observations, before becoming comparatively steeper. Figure 5b displays the annual cycle of monthly mean diurnal temperature range (DTR), an important variable that contains signals from cloud influences on radiation and other factors affecting near-surface heating and fluxes. Figure 5b demonstrates that observed DTR is typically higher than in the HAR. However, it does also show that the HAR captures the comparatively modest annual DTR cycle for most stations, with DTR highest in the summer months. On average, the HAR tends to be most biased in spring, early summer and autumn, underestimating DTR at these times, especially in HAR10. The largest underestimation of DTR occurs for lower-elevation stations, especially Gilgit and Skardu, which could reflect a cold pool effect not present in the HAR.
Using the HAR elevation-corrected temperatures to examine the bias in the monthly means [section 3b(1)] reveals that the HAR is generally colder than observations (Fig. 6a). The bias is largest in spring and smallest in summer, with HAR30 generally being slightly colder than HAR10. In the supplemental material (section S1.2), we show that this annual cycle of bias is robust to the alternative methods of elevation correction tested. Importantly, the peak cold bias in spring coincides with the largest underestimation in the temperature cycle shown in Fig. 5a, suggesting that the HAR does not warm quickly enough at this time of year. The smallest temperature bias in summer occurs when the HAR FIG. 6. HAR temperature bias and its hydrological implications. (a) Differences between elevation-adjusted HAR monthly mean temperatures and station observations. The median, interquartile range, and the range of differences across all stations are given. (b) The fraction of NWUIB area with temperatures above freezing according to linear regression, for both observations and the HAR (ranges show 95% confidence intervals). HAR temperatures were adjusted for elevation differences compared with stations, using local lapse rates in the HAR, identified through regression.

MARCH 2019 P R I T C H A R D E T A L .
shows a more rapid and pronounced increase in the annual temperature cycle than observations (Fig. 5a). Controls on rates of change in the annual temperature cycle are therefore likely to be critical to the seasonally varying magnitude of the cold bias in the HAR.
To assess the implications of this temperature bias for runoff, we use linear regression of temperature and elevation to examine the proportion of the NWUIB area that lies above the freezing isotherm each month. This provides a broad indication of the area of the catchment where snow and ice melt can take place, termed here ''melt area.'' Figure 6b shows that the HAR has a substantially compressed annual cycle of melt area compared with what we can infer from observations. As melt area is a key control on runoff sensitivity and variability in the UIB (Forsythe et al. 2012b), understanding the causes and implications of this bias is important for hydrological applications, as well as for minimizing biases in future WRF simulations. Additional analysis of temperature variability and lapse rates, both fairly well simulated, is given in the supplemental material (section S3). Figure 7a depicts the annual cycle of specific humidity variation as monthly differences relative to the annual mean, similarly to Fig. 6a. There is generally good agreement between the observed and HAR datasets, although the HAR appears to underestimate the amplitude of the annual cycle. In particular, it shows a higher winter minimum than observations, but a lower summer peak. Expressed as relative humidity, the HAR provides an overestimate in winter, spring, and autumn, before converging on observations in summer (Fig. 7b).

c. Humidity
The HAR thus displays a comparatively pronounced annual relative humidity cycle. The closer match in summer in relative humidity coincides with the dampened peak in the specific humidity cycle, as well as lower apparent temperature bias at this time of year (see section 4b). Conversely, overestimation of the specific humidity minimum in winter-and relative humidity in winter, spring, and autumn-occurs during periods of colder temperature bias.
To see whether the HAR captures spatial and vertical variation in humidity, we plot seasonal elevation profiles for winter [December-February (DJF)] and summer [June-August (JJA)] in Figs. 8a and 8b, respectively. The strong elevation dependence of specific humidity in winter is well simulated, although there may be some underestimation at lower elevations; more local observations would be required to demonstrate this. In summer, the HAR shows a substantial increase in spatial variation of specific humidity, which is also present in observations. This means the observations are likely to be less spatially representative, but in general the level of consistency is good. Equivalent vertical profiles for relative humidity are given in the supplemental material (section S4), which shows better consistency with observations is present in summer.
There are several hydrological implications from these humidity patterns. First, high relative humidity could lead to a low vapor pressure deficit, suppressing evapotranspiration calculated using the FAO Penman-Monteith approach for example (Allen et al. 1998). However, the relative humidity (and temperature) bias generally reduces in summer when evapotranspiration is most significant in the UIB (Reggiani et al. 2017), thereby diminishing the magnitude of this issue, although further simulations would be required to quantify this more precisely. Second, as specific humidity is reasonably well represented, it may provide useful direct inputs to snow and glacier surface energy balance models, where latent heat fluxes depend on near-surface gradients in specific humidity or vapor pressure.

d. Incoming radiation
In Fig. 9 we compare HAR incoming shortwave radiation time series (daily and 28-day moving average) with observations from the Askole EvK2CNR AWS. The daily time series demonstrates that the HAR and observed peaks match well throughout the year. This confirms that the HAR accurately simulates incoming shortwave radiation under clear-sky conditions. In contrast, the observed moving average series are clearly lower than the HAR, particularly in summer. This is also the case for the two other EvK2CNR AWSs, Urdukas and Concordia, as well as for other years where data are available (supplemental material, section S5). Given the good level of agreement for incoming shortwave peaks, that is, clear-sky conditions, the differences appear to be related to the underestimation of cloudiness or cloud reflection effects. This is particularly noticeable in the HAR daily time series in Fig. 9, where the magnitude and frequency of cloud effects are visibly underestimated in summer.
For incoming longwave radiation, we compare the HAR with observations from the Concordia AWS in Fig. 10. The daily time series in this figure shows that in general the HAR captures quite accurately the observed peaks in incoming longwave radiation for much of the year. However, the HAR displays an overall underestimation of longwave radiation, particularly in summer. The magnitude of daily variability relative to observations is also too large in spring, summer and autumn. During clear-sky conditions the HAR, especially HAR10 in fact, may therefore underestimate incoming longwave radiation, while during cloudy conditions it may induce more longwave enhancement than apparent in observations. However, longer time series of in situ observations from multiple stations would be required to confirm the generality of this finding. Therefore, in late spring, summer and early autumn, the underestimation of cloudiness suggested by the incoming shortwave radiation comparison may correspond to a low bias in incoming longwave radiation. The bias toward too high a frequency of clear-sky conditions peaks in summer, which would be expected to lead to a positive bias in total incoming radiation, due to the higher magnitude of the shortwave flux at this time of year.

1) CLIMATOLOGY
To see if the cloud bias implied by the incoming radiation comparison can be corroborated with independent data, we compare the HAR cloud cover climatology with MODIS remote sensing in Fig. 11a. This does indeed confirm that the HAR underestimates overall cloudiness in the NWUIB, most severely in summer. Too little cloud therefore induces the summer underestimation of shortwave reflection identified in section 4d, which dominates total incoming radiation variability at this time of year. Similar conclusions regarding low cloud bias influences on shortwave radiation were also reached by Ruiz-Arias et al. (2016), albeit for a different climatic context. Interestingly, comparing the HAR cloud cover performance with the analysis in Forsythe et al. (2015) demonstrates that greater agreement with MODIS is evident for some global reanalyses, particularly ERA-Interim. Unraveling the causes of the apparent low cloud cover in the HAR is beyond the scope of this paper, but the microphysics, planetary boundary layer (PBL) and cumulus schemes might all play a role (e.g., Otkin and Greenwald 2008;Thompson et al. 2016;Ruiz-Arias et al. 2016).
Due to the important coupling between cloud radiative effects, snow cover, and temperature (Betts et al. 2013(Betts et al. , 2014Forsythe et al. 2015), we also evaluate the HAR annual cycle of surface albedo, which is defined in general terms as the proportion of shortwave radiation received at the surface that is reflected. Figure 11b suggests that the HAR has notably higher albedo than MODIS in winter, spring and autumn. The largest overestimation occurs in spring, but there is potentially closer agreement in summer. The biases in albedo suggest that snow cover extent, the main control on the annual cycle of albedo variation in the UIB, does not decay sufficiently rapidly in spring and summer in the HAR. This issue is closely connected to the HAR's daily reinitialization strategy, especially in terms of the challenges of deriving higher-resolution snow cover (SWE) for each day from the much coarser FNL dataset. The snowpack process parameterizations in the Noah land surface model (LSM) may also play a role, as discussed in section 5. While there is some uncertainty in the MODIS reference datasets [section 3c(3)], the magnitude of differences in the mean, amplitude and shape of the annual cycles are large, which suggests that the HAR overestimation issues are substantive.

2) RELATIONSHIPS WITH TEMPERATURE
To examine the implications of the apparent biases in cloud and albedo, we focus now on correlations of monthly (ranked) anomalies of CCF and near-surface air temperature (Fig. 12a), as well as correlations of surface albedo and temperature (Fig. 12b). These correlations are performed separately for the HAR and FIG. 10. As in Fig. 9, but for incoming longwave radiation at the Concordia AWS.
observations (including MODIS) to see if the HAR adequately reproduces the observed dependencies between temperature and two of its controlling influences. From this we can infer whether there are process representation deficiencies in the HAR.
Considering observations first, Fig. 12a shows a characteristic annual cycle of observed cloud influences on temperature. In winter, the positive correlations reflect the net warming effect of cloud cover at this time of year, which occurs through the mechanism of longwave enhancement (Forsythe et al. 2015). The subsequent transition to negative correlations shows a change in dominant cloud radiative effects to cooling in spring, summer, and autumn. This cooling is induced primarily by cloud shortwave reflection, which increases in importance in line with the annual cycle of incoming shortwave radiation (Forsythe et al. 2015;Betts et al. 2013). At this time of year, Fig. 12b shows that correlations between surface albedo and temperature in observations drop to around zero on average, although FIG. 11. Comparison of HAR cloud cover fraction (CCF) and albedo with MODIS products. (a) The annual cycle of CCF for the NWUIB (spatial means) from MODIS (MOD06L2) compared with the HAR. Bold lines show the monthly means while ranges show the 10th-90th percentiles (i.e., interannual variability). The HAR CCF variable is only available for HAR10. (b) The annual cycle of spatial mean albedo for the NWUIB from MODIS and the HAR. MCD43b stands for MODIS MCD43A3 black sky albedo, MCD43w for MCD43A3 white sky albedo, and MOD10 for MOD10A1 snow albedo infilled using MCD43A3 for no-snow pixels. HAR ranges show the 10th-90th percentiles (i.e., interannual variability). variability among station locations is notable. This is consistent with snow cover retreat in the spring and summer, which allows cloud cover variations to dominate net shortwave radiation variability and thus surface heating. Figure 12a suggests that the warming effects of cloud cover are present in winter, but the transition to negative correlations (cloud cooling effects) does not develop as early, strongly, or consistently in the HAR. This is consistent with the concurrent low absolute cloud cover in the HAR, which suppresses the potential for interannual cloud variability to influence surface heating. For surface albedo, the HAR shows an annual cycle that is approximately the inverse of the pattern in observations. From spring into summer, with the exception of August, much stronger (negative) correlations between albedo and temperature exist in the HAR than in observations. The greater sensitivity to albedo variations likely reflects the influence of high biases in both the mean and variability of HAR albedo, but it is potentially compounded by the underestimation of cloud and its influences on temperature.
The implication of this is that the role of snow presence and its effects on albedo as a climate switch for cloud radiative effects is not entirely reproduced in the HAR (Betts et al. 2013(Betts et al. , 2014. However, we note that the station locations analyzed here are primarily in valleys, although Fig. 11a does suggest that the cloud underestimation problem affects much of the NWUIB in summer. The statistical significance of the correlations in Fig. 12 is summarized in the supplemental material (section S7.1). This supports the point that there is a significant and physically coherent annual cycle of correlations between cloud and temperature in the observations, which does not develop as strongly in the HAR. It also confirms that the frequently significant sensitivity to albedo variation in the HAR in summer is higher than observations would suggest. However, we note that spread in Fig. 12 can be large, which highlights the variability in these correlations and therefore the complexity of these relationships in reality.

Discussion
The results in section 4 show both strengths and weaknesses in the HAR climatology for different variables. One important finding is that there are relationships between seasonal variations in bias in different variables. These relationships are relevant for various applications of the HAR dataset, as well as further regional climate modeling, such that we focus our discussion here on them.
Starting with temperature, the HAR bias is largest (coldest) in spring and smallest in summer. This helps to induce the annual cycle of bias in relative humidity, whereby a general high bias peaks in spring but reduces in summer. Critically, the seasonal variation in temperature bias appears to be associated with annual cycles of bias in the surface radiative balance. In summer, incoming shortwave (longwave) radiation is generally overestimated (underestimated), largely due to insufficient cloud cover. This would generally lead to greater surface heating and so reduce the overall cold (low) bias in temperature (DTR), albeit through error compensation. The relatively low fraction of annual precipitation falling in summer months is also consistent with insufficient cloudiness and reduced temperature bias. Furthermore, in spring, the peak cold bias in temperature and slow warming rates in the HAR appear to be related to the high bias in surface albedo. This is because surface warming, associated with rising incoming shortwave radiation at this time of year, would be impeded by overestimation of reflection effects.
The importance of albedo and snow representations for near-surface radiative and temperature biases was also recently investigated by Tomasi et al. (2017). Modifications to snow initialization, snow cover fraction and albedo parameterizations in both the Noah and Noah-MP LSMs were needed to reduce winter cold biases in their WRF simulations of an Alpine valley. For the Tibetan Plateau, Meng et al. (2018) additionally found that inserting albedo derived from MODIS into WRF simulations substantially diminished the cold bias in winter, spring, and to a lesser degree autumn. However, García-Díez et al. (2015) noted that albedo may act mainly as a feedback amplifying poor representations of the snowpack or snow-atmosphere interactions in the Noah LSM, rather than a primary driver of temperature bias. Indeed, it may be that such a feedback underpins the large peak cold bias in spring in the HAR, possibly by reinforcing biases induced by other aspects of the WRF configuration. While snow process representation in Noah has improved over time (e.g., Livneh et al. 2010;Barlage et al. 2010;Wang et al. 2010), these and other studies (e.g., Saha et al. 2017) agree with our suggestion that some of the UIB cold bias in the HAR could potentially be reduced with revisions to the snow initialization approach, as well as snow process and subgrid variability parameterizations. García-Díez et al. (2015) also highlighted the sensitivity of summer precipitation and temperature biases to different cumulus parameterization schemes. Similar to our results, they demonstrated the potential for error compensation to reduce cold summer temperature biases through overestimation of incoming shortwave radiation. That summer humidity and temperature appear reasonably well simulated in the HAR, when cloud cover and precipitation are underestimated, confirms García-Díez et al.'s point that model evaluation needs to look at multiple climate variables to reveal possible error compensation and inconsistencies.
The other influences that may contribute substantially to biases in the HAR include PBL schemes, which parameterize unresolved turbulent fluxes of heat, momentum and moisture. Hu et al. (2010) showed that the differences in temperature when using various schemes relate to the degree of vertical mixing and entrainment of air overlying the PBL. Interestingly, Hu et al. found the Mellor-Yamada-Janjić (MYJ) PBL scheme used by the HAR to be the coldest of the schemes they tested, which could contribute to the general cold bias identified here. García-Díez et al. (2013) also found the MYJ scheme to be relatively cold, but they noted that the differences between PBL schemes were relatively consistent throughout the year, despite the fact that the sign of temperature bias changed between seasons. This led García-Díez et al. to suggest that further investigation of the surface radiative balance is required to understand seasonal variation in temperature bias, as opposed to mean annual bias, which is supported by the results of this study.
Furthermore, several studies have shown that large errors can result from soil moisture initialization using coarse-resolution datasets (e.g., Case et al. 2008;Massey et al. 2016). Bastin et al. (2018) found that a dry soil bias led to overestimated summer temperatures at their study site in France, with low cloud cover and consequent overestimation of incoming shortwave radiation acting to further dry the soil. The feedbacks between temperature, soil moisture, cloud cover and radiation biases in this example highlight the importance of interactions between multiple processes. Any local or regional soil moisture biases in the HAR could affect a range of surface and upper-level variables through numerous feedbacks (Massey et al. 2016), particularly when snow cover declines in spring and summer. The scarcity of soil moisture data in the region for evaluation or assimilation is thus a significant issue, which is compounded by the challenge of providing sufficient spinup with limited computational resources.
Several important avenues for future work arise from this discussion. One is that more sensitivity and performance tests with different physics parameterization schemes are needed in this region. These would ideally be conducted with a multiphysics approach to test different microphysics, cumulus, PBL, and LSM schemes (e.g., García-Díez et al. 2015). For the LSM scheme, the biases in the HAR suggest that testing needs to include careful evaluation of snowpack processes, snow-atmosphere interactions and subgrid variability, as well as glacier representations. As noted by Collier et al. (2013), limitations arise in the Noah LSM used in the HAR from assumptions on minimum snow depth and SWE in glaciated cells, which likely overestimate albedo by omitting the influences of bare ice and debris cover under thin or no snow cover. MODIS snow cover and albedo products may help to further constrain the relevant process parameterizations to some degree.
In addition, the potential for improving initialization strategies needs to be explored. This applies particularly to snow cover and surface albedo, which could again utilize MODIS remote sensing products (e.g., Meng et al. 2018). The NASA Soil Moisture Active Passive (SMAP) project may also ultimately provide a possibility to improve soil moisture initialization, while the implications of spinup period length should also be assessed. Moreover, it would be useful to test alternative data products for boundary conditions, such as global reanalyses, in order to delineate the relative influences of WRF configuration choices and forcing datasets. In these tests, the boundaries of the inner model domain could be placed farther from the UIB, which would help determine if there is any residual boundary influence on the UIB in HAR10. With increasing computing power it may also be possible to test the feasibility of moving to higher-resolution, convection-permitting climate simulations. This would be particularly useful for finescale characterization of near-surface climate, as well as hydrological and cryospheric modeling.

Conclusions
This evaluation of HAR performance for the UIB leads to three main conclusions. First, we find that the HAR provides a good representation of UIB precipitation at multiple scales. In particular, HAR10 is consistent with most in situ measurements, while also showing coherence with observed runoff for most gauged subbasins. In addition, vertical precipitation gradients and spatial variation fit with other studies and inferences from MODIS data products. The HAR10 thus appears to represent a valuable source of precipitation data to supplement local observations in the UIB, although further testing through hydrological modeling should be undertaken.
Second, our results suggest that HAR temperature, humidity, and incoming radiation in the UIB show reasonable climatologies overall, but also distinct patterns of seasonal variation in their biases. Temperatures exhibit a cold bias, which is largest in spring and smallest MARCH 2019 P R I T C H A R D E T A L .
in summer. DTR is slightly underestimated throughout the year, with bias a little larger in spring and autumn. The HAR also displays high relative humidity in winter, spring, and autumn, but less bias in summer. The available observations show that incoming shortwave radiation is overestimated in spring, summer (particularly), and autumn, during which time incoming longwave radiation is also underestimated. Finally, we conclude that seasonal variation in biases is at least partially related to deficiencies in HAR cloud, snow cover, and albedo representations. Comparisons with MODIS products confirm that the HAR underestimates cloudiness, particularly during summer, which helps explain the biases in incoming radiation. Correlation analyses further suggest that the HAR does not fully reproduce the observed pattern of cloud radiative effects on temperature. Observations show a cycle of cloud warming effects in winter, through longwave enhancement, giving way to cloud cooling in summer, through shortwave reflection. This cycle does not develop to the same extent in the HAR. As such, the lower absolute temperature bias in summer may stem at least partly from excess incoming shortwave radiation occurring in the HAR at a time when cloud reflection effects are strongly underrepresented and surface albedo is at its lowest, thereby heightening warming. The high bias in HAR surface albedo throughout much of the year is also likely to affect the seasonality of biases, particularly in spring by reflecting a high proportion of the rising incoming shortwave radiation, magnifying the general cold bias.
Overall, our evaluation demonstrates the strong potential of simulations like the HAR for supplementing the limited local climate observations available in the Himalayan region. However, the findings also suggest that correctly parameterizing cloud, snow cover, and albedo processes appears to be critical for improved simulations of regional climate. Combining local observations and remote sensing data with more sensitivity tests may help to improve model representation, but of course uncertainties will be substantial in complex and data-sparse regions like the UIB. Further research is also required to see whether alternative datasets can provide improved boundary conditions and initialization datasets for WRF.