Statistical and dynamical downscaling impact on projected hydrologic assessment in arid environment: A case study from Bill Williams River basin and Alamo Lake, Arizona

A study was conducted to assess the projected impact of future climate on Alamo Lake and the Bill Williams River basin. We analyzed simulations of three-selected Representative Concentration Pathways 8.5 Global Climate Models (GCM) (i.e. HadGEM2-ES, MPI-ESM-LR and GFDL-ESM2M). These GCMs which, were part of the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report, were selected as well performing GCMs that represent the historic climatology and prevailing precipitation bearing synoptic conditions in the southwest US. An analysis of both statistically and dynamically downscaled simulations projected increase in the frequency of dry winters during the mid-21st century (2020–2059) in two out of the three selected GCMs. For summer precipitation, the statistically downscaled simulations are inconclusive whereas, the dynamically downscaled simulations showed significant but contradicting future projections. In order to assess the impact of the projected climate on the hydrologic cycle at the Bill Williams River basins, we developed a modeling framework that includes the following components: 1) a weather generator that produces realizations of likely hourly precipitation events over the basin; 2) a hydrologic model that is based on the Colorado Basin River Forecast Center (CBRFC), National Weather Service modeling configuration that predicts flow at ten internal points and inflow into Alamo Lake; and 3) a lake model with the existing operation rules to simulates the lake outflow and levels. Using the above-described modeling framework, the impact of the projected mid 21-century climate on Alamo Lake was examined with respect to the total outflow from the dam, the frequency of large outflow events, and the frequency of high and low lake levels. The results show that dynamic downscaling provides a larger range of impacts than those provided by statistical downscaling. The results also indicate a wide range of impact scenarios with contradicting trends among the selected climate projections for mid-21st Century. These results imply increasing challenges in operating the Lake at its target level. This modeling framework can potentially be used to examine various future scenarios and to develop recommendations for a sustainable management scheme for the Alamo Lake. 2019 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
We conducted a hydrologic impact assessment using projected climate for the mid-21st century at the Bill Williams River (BWR) watershed and Alamo Lake, a tributary of the Colorado River, upstream of Lake Havasu. Several studies have explored the projected impact of future climate on the Colorado River flow (e.g. Christensen and Lettenmaier, 2007;Vano et al., 2014;Gautam and Mascaro, 2018). However, because of the large spatial climate variability within the Colorado River watershed, these basin-wide studies may not be representative of the BWR tributary. In the arid water-scarce BWR watershed, the precipitation high spatiotemporal variability triggers highly variable streamflow events in https://doi.org/10.1016/j.hydroa.2019.100019 2589-9155/Ó 2019 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). ephemeral streams. Therefore, even small projected climatic changes may be a cause of concern.
Our objective in this study is to evaluate the differences in the hydrologic assessment that result from the selection of GCMs or the selection of downscaling methodology. For hydrologic impact assessment studies, a downscaling of Global Climate Models (GCMs) future projections is often needed in order to bridge the scale gap between the coarse GCM simulations and the required finer resolution for regional and local watershed studies. Two fundamental downscaling approaches are often used: Statistical Downscaling (SD) and Dynamical Downscaling (DD).
Statistical downscaling (SD) procedures derive empirical relationships between atmospheric forcing data of the GCM and surface variable of interest in finer resolution to apply those relationships to future GCMs' projections. It assumes that the statistical relationship between the predictors (GCM) and predictand (surface variables) do not change over time and are therefore stationary (e.g., Carpenter and Georgakakos, 2001). The SD procedures main advantage is that they do not require special computational resources and therefore can be used to produce high-resolution simulations and can be applied for a large number of GCMs.
Dynamically downscale (DD) models use Regional Climate Models nested within the GCM to simulate local climate features. They respond in physically consistent ways to resolve regional atmospheric processes and simulate mesoscale variables of interest, such as: convective storms, extreme events, and snowfall versus rainfall. They simulate internally consistent multivariate quantities within an atmospheric column. The main limitation associated with DD is the requirement for intensive computational resources, therefore DD simulations are only available for a limited number of GCMs. General discussion of the pros-and-cons of selecting one downscaling approach over the other have been the subject of multiple manuscripts (e.g. Fowler et al., 2007;Maraun et al., 2010;Kotamarthi et al., 2016).
Local hydrologic impact assessment studies for various objectives often face the dilemma of having to choose between SD and DD approaches. SD simulations are easier to obtain and there are several readily available datasets that include simulations for many different GCMs. DD simulations on the other hand, require a high level of expertise to produce and are available for only a few GCMs. This study is geared to evaluate the differences in the hydrologic assessment that result from selecting one downscaling approach over the other. We compared between the state-of-theart DD and SD datasets that are available for the study area.
To assess the hydrologic impact of the projected climate a hydrologic modeling framework was developed for the Bill Williams River and Alamo Lake, which consists of a 1) precipitation weather generator (WG) that produces ensembles of equally likely to occur hourly precipitation sequences; 2) hydrologic model that simulates flow in internal points in the basin and inflow into Alamo Lake; and 3) lake model that simulates lake levels and outflow.
The objective of the WG is to transform the climate models simulations into precipitation ensembles that can be used as input to the hydrologic model. A WG ensemble with sufficient realizations represents the spatio-temporal variability of the observed record. In addition, the WG was further modified to reflect changes that were identified by comparing the historic and projected climate models simulations, to generate precipitation ensembles that reflect the projected changes. This approach provides a tool to translate a single climate model simulation into an ensemble that can be used as input for the hydrologic model and thus simulate a distribution of model outcomes, rather than a single simulation. This is an attractive approach for water resources planning in arid regions that have large inter-and intra annual climate variability and its hydrologic response is very sensitive to nuances in precipitation patterns.
In the next section (Section 2), we discuss the selected GCMs and their performance for the study area. The study area and its hydrologic characteristics are presented in Section 3. Section 4 is dedicated to the description of the hydrologic modeling framework that is implemented in this study. Following the results presented in Section 5, in Section 6 we provide a summary and conclusions.

Selected climate models
Three GCMs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) simulated with Representative Concentration Pathways 8.5 (RCP 8.5) were selected for this study. The selected GCMs are: 1) HadGEM2-ES (Global Environmental Model, Version 2 from the United Kingdom Meteorological Office, the Hadley Centre), 2) MPI-ESM-LR (Earth System Model) running on low resolution (LR) grid from the Max Planck Institute for Meteorology, and 3) GFDL-ESM2M (NOAA Geophysical Fluid Dynamic Laboratory -Earth System Model). These GCMs were selected to be dynamically downscaled because of their good performance over North America and because they represent the range of the North America climate sensitivity that is provided by all CMIP 5 GCMs (Sheffield et al., 2013 I and II).
DD precipitation simulations of the three GCMs are available for the historic period  and future projections with RCP 8.5  at 25 km horizontal grid resolution from National Center for Atmospheric Research (NCAR) and the University of Arizona, Department of Hydrology and Atmospheric Sciences. The simulations followed the specifications of the North-America Coordinated Regional Climate Downscaling Experiment (CORDEX) program (https://na-cordex.org/), an initiative sponsored by the World Climate Research Program to provide regional climate downscaling data for regional climate change adaptation and impacts assessment. The three GCMs were downscaled for the domain of the NA-CORDEX program using the Advanced Research version of the Weather Research and Forecasting (WRF) model (Version 3.1) as the Regional Climate Model. The configuration of the WRF model is described in Castro et al. (2017). The simulations are available at 3-hour intervals for the WRF-HadGEM2-ES, and 6-hour intervals for the WRF-MPI-ESM-LR and WRF-GFDL-ESM2M.
An equivalent WRF reanalysis 6-hour precipitation dataset  nested within ERA-Interim global reanalysis (Dee et al., 2011) is also available at a similar spatial resolution and model configuration as the DD simulations.
SD daily precipitation for 1950-2005 and 2006-2099 at 0.0625°( 6 km) horizontal grid spacing for the three GCMs is available for the western U.S from the state-of-the-art Localized Constructed Analogs (LOCA) methodology (Pierce et al., 2014). LOCA's leading downscaling assumption is that the forecast will evolve the same way as the best matching historical event. An observed gridded precipitation with similar spatiotemporal resolution for 1950-2015 is available from Livneh et al. (2013). This dataset which was used as the reference for the derivation of the LOCA dataset, was developed for the conterminous Mexico, and U.S. and regions in Canada south of 53°N.
In the Southwest U.S., a credible GCM to be used for hydrologic impact assessment should be one that represents the precipitation characteristics of both winter and summer. The prevailing winter storms (November-March) originate from large-scale lowpressure frontal systems approaching from the west and southwest. These storms that may last for a few days to drop persistent rain over large areas, often produce snowfall at higher elevations. However, in our study area, the falling snow commonly melts within a few days and is not a major runoff component into Alamo Lake.
The prevailing summer rainfall (June-September), is driven by the North American Monsson (NAM) climate system, which triggers isolated convective cells that often produce intense shortlived rainfall events. While winter rainfall events generate gradual rising streamflow events with low flow that persists following the rain event, summer rainfall events trigger quick and sudden rising flow followed by short-lived low flow.
A comprehensive evaluation analysis of GCMs for winter precipitation is available from the California Department of Water Resources (2015) who used 3-step model screening process to evaluate the historical performance of 31 CMIP 5 GCMs in three sequential spatial scales (i.e. global, Southwestern U.S, and California). Most winter storms to hit Arizona originate in the mid latitude Pacific Ocean and cross over California. In their analysis, the HadGEM2-ES was selected among the 10 top performing GCMs while the MPI-ESM-LR and GFDL-ESM2M were selected among the 15 top performing models. Dominated by regional (mesoscale) processes, representing the NAM in the relatively coarse GCM is a challenge (e.g. Castro et al., 2012Castro et al., , 2017Bukovsky et al., 2013, Geil et al., 2013. Evaluation of large-scale features of the NAM system by GCMs is an active research topic (e.g. Arritt et al., 2000;Liang et al., 2008;Lin et al., 2008;Geil et al., 2013;Pascale et al., 2016). The selected GCMs for this study were reported to well represent the NAM system (Geil et al., 2013).
Given our study objective to evaluate the selection impact of the downscaling methodology on hydrologic impact assessments, a case can be made that three GCMs and one RCM (i.e. WRF) as analyzed in this study may not be sufficient to reach conclusions that can be generalized for all SD and DD simulations. As mentioned above, the selection of the three GCMs is constrained by the limited availability of DD simulations. The selected datasets, as far as we know, are the only high-resolution datasets readily available and any future climate study conducted in this region would have likely contemplated the use of the DD and SD datasets presented herein. Therefore, it is likely that for the southwest U.S. region the selected datasets well represent the consequence of the selection of a downscaling procedure.

Study area
The Bill Williams River watershed (hereinafter BWR) (5393 mile 2 ; 13,968 km 2 ) in west-central Arizona, is a tributary of the Colorado River that drains into Lake Havasu just upstream of Parker Dam. Alamo Dam, 36 miles (58 km) upstream of the confluence with the Colorado River, was built in 1968 to form Alamo Lake as a multi-purpose facility with a primary objective to control flooding downstream of the dam on the Lower BWR. Secondary objectives of the lake include water supply and conservation, recreation, and fish and wildlife enhancement. At the site of the dam, the contributing drainage area is about 4770 mile 2 (12,354 km 2 ) from mainly three ephemeral tributaries, with a few perennial sections, which drain into Alamo Lake: Big Sandy River, Santa Maria River, and Burro Creek. No significant flood control structures exist in the drainage area of these tributaries. A map of the BWR watershed with the eleven sub basins that were used for the model development (further discussed in Section 4) is in Fig. 1.
The diverse physiography of the BWR watershed spans from high elevation forested mountains to rugged desert terrain in the lower elevations. Detailed hydrologic and geomorphic description of the watershed is provided in House et al. (1999).
Operated by the U.S Army Corp of Engineers (USACE), releases from Alamo dam have a direct impact on the operation of the Lower Colorado River by the Bureau of Reclamation. Downstream of Alamo Lake, the Lower BWR flows through a series of alternating narrow canyons and wider alluvial valley reaches. The peak discharge in the U.S Geological Survey (USGS) gauge just below Alamo Dam (USGS09426000) during the pre-dam era was estimated at 200,000 cfs (5660 cms) in February 1891 (Patterson and Somers, 1966), while post-dam maximum discharge, as of October 2017, has not exceeded 7000 cfs (200 cms), which is the maximum possible release rate from the dam.

Climate models evaluation
A comparison of the average monthly total precipitation among the SD, DD, raw GCM, WRF ERA-Interim reanalysis, and gridded observations for the historical period  is shown in Fig. 2 for the Had-GEM2-ES (left), MPI-ESM-LR (center), and GFDL-ESM2M (right). This Figure compares among the spatial averages over the entire BWR watershed and it represents well the results for each of the eleven sub-basins (Figures for the subbasins are not included). It is seen that the SD simulations closely follow the observed gridded dataset. This is expected because the SD procedure was designed to conform to the observed record. The WRF ERA-Interim reanalysis follows the seasonal patterns fairly well, which is indicative of the skill of WRF for producing monthly climatological averages when it is nested within optimal boundary conditions. The added value of WRF is also seen by comparing the raw GCMs to the DD simulations. Marked differences are seen for the summer, where raw GCMs underestimate the precipitation and DD overestimated by HAD and lagged by about 1month by the GFDL.

Observation datasets
Hourly Mean Areal Precipitation (MAP) (1 October 1980-30 September 2010) data from the Colorado Basin River Forecast Center (CBRFC), National Weather Service (NWS) (www.cbrfc.noaa.gov) are available for eleven sub-basins in the BWR watershed (Table 1). These historical MAP time series were derived by CBRFC for the purpose of calibration of their hydrologic models. Using elevation thresholds, each of the three tributaries that drain into Alamo Lake ( Fig. 1) was divided into three sub-basins (upper, middle, and lower). Two Additional downstream sub-basins (upper and lower) just upstream of the lake were configured by CBRFC. The MAP derivation procedure consists of quality control of the rain gauge records and spatial interpolation with the Mountain Mapper technique used by the NWS River Forecast System (Schaake et al., 2004).
The inflow into Alamo Lake was calculated from a mass balance equation, using lake's water level from the USACE , outflow from the lake as observed at the gauge just below the dam, and estimated monthly lake evaporation from CBRFC. For the analyses of future projections, we used the historic monthly evaporation values. It is likely that future climate may impact the lake evaporation. However, lake evaporation is a complex process and the projection of future evaporation requires analysis of the lake energy balance in addition to the energy and hydrodynamic atmospheric conditions over the lake (e.g. Shilo et al., 2015).

Hydrologic analysis
There are two wet seasons (winter and summer) in the BWR watershed. The winter (November-March) peaks in January-February and the summer extends from July to September. The spring (April-June) is relatively dry with mean areal precipitation of 21 mm and less than 1000 acre feet inflow to the lake. The fall (October) is generally very dry with infrequent very wet storms that have synoptic rainfall bearing conditions that are often different than the ones of the winter storms.    Table 1.
The difference between winter and summer MAP is seen in Fig. 3 for the average over the entire BWR watershed. Average winter (November-March) and summer (June-September) precipitation, as indicated by the dashed lines, are 180 and 130 mm/ season, respectively. Although the winter season is occasionally very wet, with a maximum of 430 mm, it is more often drier than the average, which implies a more skewed distribution (skewness coefficients are 0.95 and 0.52 for winter and summer, respectively). The higher variability of winter precipitation can also be shown by comparing the coefficient of variations (0.6 and 0.4 for winter and summer, respectively).
Compared with summer, winter streamflow events are longer with larger volume, higher daily maximum, and the duration between the streamflow events is shorter. The shorter duration between winter streamflow events may be attributed to the longer events that have long lasting baseflow . In Fig. 4 total seasonal (winter and summer) inflow into Alamo Lake are compared with total seasonal outflow from the Lake. Notice that although average summer precipitation is 75% of the average winter precipitation, the average summer inflow into Alamo Lake is only 10% of the average winter inflow. Even the largest releases recorded for the summer are relatively small compared with winter releases. It is also noted that many of the winters had very low inflows. This large seasonal difference between precipitation and streamflow ratio underscores the necessity to consider the properties of the precipitation events and their interaction with land surface processes that control the generation of streamflow.

Hydrologic modeling framework
In order to assess the impact of the projected climate on the hydrologic system in the BWR, a hydrologic modeling framework that comprises the following three models was configured: precipitation Weather Generator (WG), hydrologic model, and lake model. Future projections were applied by modifying the WG to generate hourly precipitation ensembles that reflect the projected changes as inferred from comparing between the simulations of the historic and future periods for each of the downscaled climate models. Thus, the only projected future change that is being considered in this study is the projected change in precipitation. The detailed development and evaluation of the hydrologic framework is described in Shamir et al. (2017). In this section, we provide a description of the modeling framework components.

Precipitation weather generator
The above analysis of the BWR watershed precipitation and lake inflow points to the importance of representing the temporal and spatial characteristics of the rainfall events. The purpose of the precipitation WG is to produce numerous equally likely (random) realistic scenarios of hourly precipitation sequences. An ensemble that includes a sufficient number of realizations of these scenarios should represent the spatiotemporal statistical characteristics of the observed record, the natural variability of events, and their likelihood to occur.
Based on analysis of the MAP from CBRFC (1980-2010), a WG was developed to simulate likely hourly MAPs for the 11 subbasins of the BWR watersheds. Since the WG was developed to describe the spatial and temporal characteristics of the MAPs time series that were used as input for the development and calibration of the SAC-SMA hydrologic model, the WG realizations can be used as precipitation input to the hydrologic model.
By modifying the WG to reflect the changes that were identified from comparing between the historic and projected simulations of the climate models, it is also possible to assess the impact of projected future precipitation changes on the lake inflow and lake level. The precipitation WG concept that is presented herein is following on previous work by Shamir et al. (2007aShamir et al. ( ,b, 2015, Shamir (2017), and Wang et al. (2007). A detailed description and evaluation of the WG for the BWR watershed is provided in Shamir et al. (2017) and a brief description is provided below.
The WG is comprised of two successive modules: 1) a point process module to derive hourly MAP that represents the entire BWR watershed (Fig. 5, left side) and; 2) a spatial disaggregation module to estimate hourly MAPs in the eleven sub basins (Fig. 5, right side).

Point process module
The point process model is developed for three wetness categories (wet, medium and dry) for both winter (November-March) and summer (June-September). The spring (April-May) and fall (October) were assumed to be dry. The highly skewed precipitation  distribution and the very large inter-annual variability in the arid Southwest U.S. calls for a WG with wetness categories that can be developed independently (Shamir et al., 2007a,b). The wetness categories represent the tercile statistics of the total observed seasonal (summer and winter) precipitation during 1980-2010. The selection of a wetness category in the WG for a given season is sampled from a uniform distribution (i.e. a distribution that has constant probability) and it is independent from the selected wetness category of the previous seasons.
Following the sampling of a wetness category, the duration of winter or summer is determined by sampling the season's precipitation onset and offset. The onsets were identified in the observed record as the duration since 1 June and 1 November for the summer and winter, respectively when the cumulative areal average precipitation over the entire basin exceeded 10 mm. The offset of the seasons were selected as the last precipitation event prior to end of September and the end of March for the summer and winter, respectively. The onset and offset of the rainy season are selected from a normal distribution with a mean and standard deviation calculated from the historical record for the winter and summer and for the different wetness categories.
Following the selection of the wetness category and the duration of the rainy period, a precipitation time series is being created for the season (winter or summer) by sequentially sampling from the following distributions: inter arrival time of storms, duration of storms, the chance for an hourly precipitation event to occur, and the magnitude of the hourly event. This sequential sampling repeats until a time series is generated for the duration from the season's onset to its offset, as seen in looping arrow in Fig. 5. Each of this distribution was developed independently for the three wetness categories (i.e. wet, medium, and dry).
The point process module is based on the assumption that precipitation storm events tend to arrive in clusters as a response to a transient synoptic scale atmospheric disturbance. Each synoptic event may produce multiple hourly precipitation pulses with possible multiple dry hours in between. The definition of a storm is a key component that is required for the derivation of a sample of observed storms to be used for the development of the WG parameters. Our storm definition procedure is based on the assumption that the distribution of storms inter-arrival time constitutes a Poisson stochastic process and therefore the distribution of storms inter arrival times conforms to an exponential distribution (Restrepo-Posada and Eagleson, 1982).
Thus, it is possible by selecting the minimum inter-arrival time between storms (i.e. the number of dry hours beyond which the occurrence of rainfall marks the beginning of a new event) to develop a statistical sample of storms with inter-arrival distribution that has a coefficient of variation of one, as in an exponential distribution. In this study the minimum inter arrival time were prescribed to 84 and 36 h for the winter and summer, respectively. A storm event is defined as one that follows the previous storm by a period that is longer than the minimum inter arrival time, starts, and ends with an occurrence of hourly precipitation events.

Precipitation WG spatial distribution
The BWR basin-wide hourly MAP realizations that were derived by the point process module are further disaggregated into MAP for the 11 sub-basins. The disaggregation consists of two steps. First, for each wet hour, a wetness category in each sub-basin is independently assigned. The wetness categories, which can also include an assignment of no-rain event, were sampled for winter and summer from a uniform random distribution with chances coefficients as calculated from the observed sub-basins MAP records. The chance coefficients indicate the chance of a precipitation event in a sub-basin to be in a particular tercile, independently of the other sub-basins, as a function of the wetness category of the average master time series.
Second, the magnitude of the hourly precipitation in the subbasins, if in the upper tercile, is assigned from a Generalized Pareto distribution with its threshold parameter being the observed 67th percentile in each of the sub-basins. Otherwise, the precipitation magnitude is selected from a log normal distribution. The parameters of these distributions were derived independently for each sub basin from the observed non-zero hourly events.
In order to better simulate large extreme events, we added a chance of one day per winter to sample an upper tercile precipitation event from Generalized Pareto distributions that were parameterized to represent the annual maxima series of the observed records for each sub-basin. The threshold parameters of the Generalized Pareto distributions are assigned as the lowest value of the annual maxima series in the sub-basins. Fig. 5 outlines the different WG components. The left column describes the sequential sampling process that is used to derive the entire basin areal average hourly time series and the right column describes the sequential sampling process that is used to disaggregate the areal average to areal averages in the sub-basins. Following the selection of a wetness category for a given season (winter or summer) the duration of the rainy season is selected as the time between the seasonal onset and offset. The looping arrow indicates a repetitive sampling sequence that continues until the precipitation simulation is completed for the identified rainy season duration.
In Shamir et al. (2017), in addition to detailed description of the WG development, the performance of the WG was comprehensively assessed for the winter and summer with respect to the distribution of hourly precipitation, frequency of occurrence of hourly events, distribution of seasonal totals and occurrence of extreme hourly events. These are perceived as precipitation features that control runoff generation in ephemeral streams. In Figs. 6 and 7, we provide an example for the WG performance in simulating the distribution of the winter hourly precipitation and frequency of occurrence of hourly precipitation, respectively. In these Figures, the cumulative distributions of 100 WG realizations, each 30 years The left column describes the point process sequential sampling to derive basin average hourly time series. The right column is the sequential sampling to disaggregate the synthetic time series to the sub-basins. The looping arrow indicates a repetitive sampling sequence that continues until the precipitation simulation is completed for the duration between the onset and offset of the winter or summer. of hourly MAP for the 11 sub-basins (similar to the dimension that available for the observed record) are shown as gray lines. The WG realizations are compared with the cumulative distribution of the observed MAPs from CBRFC, which are shown in red. While the cumulative distributions in Fig. 6 are derived for all non-zero precipitation events during the 30-years period, in Fig. 7, the distributions are of a seasonal index and therefore the ensemble comprised of 100 realizations, each includes 30 values. The WG simulations of the hourly precipitation show tight spread and overall match very well the distribution shape of the observed records in the subbasins (Fig. 6). Since the WG was designed to represent the interand intra-annual variability, tight spread of the realization and a good match of the shape of the observed distributions are considered favorable traits when comparing multi year time series. The lower right panel, which compares the distributions of the areal weighted averages over the entire Bill Williams watershed, offers an evaluation measure for the spatial module performance of the WG.
The inter-annual variability of the ensemble is presented by the spread between the realizations of the count of non-zero precipitation occurrence distributions (Fig. 7). The distributions of the ensemble's realizations encompass well the observed distributions in the sub-basins and slightly underestimate the frequency over the entire watershed (lower right panel).
To the authors' knowledge, there is no formal methodology for evaluating the performance of a WG ensemble. The spread of the ensemble realizations should represent the uncertainty that is associated with the observed record. The width of the ensemble spread should neither be too wide or too narrow. A narrow spread does not describe the uncertainty and variability in the data while a spread that is too wide may fail to capture the unique characteristics of the data.

Hydrologic model Sacramento soil moisture Accounting
In this hydrologic framework, we implemented the CBRFC hydrologic model configuration and parameters for the BWR watershed. The CBRFC uses the NWS River Forecast System, which includes the following components: 1) the Sacramento Soil Moisture Accounting model (SAC-SMA) as hydrologic model that continuously updates the soil moisture conditions and simulates runoff and streamflow in the channels (Burnash et al., 1973); the Snow17 model which keeps track of snow accumulation and ablation in the basins (Anderson, 1976) and; a unit hydrograph to route the channels' streamflow into the sub basins outlets. The CBRFC operational model is mainly used to simulate high flow events, and in order to accommodate water resources studies, the SAC-SMA parameters were further tuned.
The tuning of the parameters was conducted by comparing the daily streamflow simulations to the observed daily flows at the Big Sandy (USGS 09424450), Burro Creek (USGS 09424447) and Santa Maria (USGS 09424900) (Fig. 1). We used the MAP time series as input for the SAC-SMA model in order to simulate the daily streamflow that were used for the tuning of the parameters.
The simulated winter and summer cumulative distributions of the daily inflow into the lake are compared to the calculated inflow in Fig. 8. For the winter, the calculated and simulated cumulative distributions matched very well. On the other hand, for the summer the simulation overestimated the calculated record. The weak model performance during the summer period was also visually noticeable in the hydrographs. Calibrating to match well the summer events requires to compromise the winter model performance. The lack of performance during the summer is thought to be attributed to the inability to capture the characteristics of the short and local intense rainfall characteristics of the summer rainfall. The use of MAP over relatively large basins likely has a smoothing effect that misrepresents the convective characteristics of the summer rainfall events. Thus, a good simulation of summer events requires higher resolution model configuration with input data to represents the convective summer storm characteristics.

Alamo lake model
An hourly mass balance model that simulates lake level, storage, and releases from the dam as a function of inflow into the lake is described below. The lake operation rules are based on 'rain on the ground' strategy, which implies that the operation rules are reactive to observed inflow into the lake. The physical dimensions of the lake and the relationships between water level, storage and surface area are from Kirby andBurnham (1998), Bill Williams River Corridor Technical Committee (1994), and the U.S. Army Corps of Engineers (USACE) website (http://resreg.spl.usace.army. mil/pages/alamo.php). The operational rules that specify dam releases as a function of lake level and season, the specification of the dam's dimensions, and the monthly lake evaporation values are from the USACE Operational Manual (2003).
The operation of the lake is intended to maintain the lake water level at, or near, 1125 feet (342.9 m), for as long as possible. This level is considered optimal to satisfy all the objectives of the authorizing legislation and optimize downstream benefits. The actual operation of the dam often deviates from the recommended rules. During storm events that impact large areas of the Colorado River basin, releases from Alamo dam are coordinated in order to control the flow on the Colorado River. Other cases of deviation from the operational rules may be subjective operational considerations such as dam maintenance and specific downstream demands.
The simulations and observations of the lake's water level and outflow are shown in Fig. 9. It is seen that although the lake's actual operation had occasionally deviated from the rules, the simulated water level and outflow overall do well at representing the observed record.
It is important to note that the flood control storage compartment of the lake holds about 600,000 acre-feet (740 10 6 m 3 ), and at this water level the maximum release rate is 7000 cfs (200 cms). This is a relatively large storage volume that has been utilized to store water during flood events in the Colorado River basin. However, at the maximum release rate it takes more than 40 days to drain the entire volume of the flood control storage, which may be too slow in emergency situations.

DD vs. SD historical period
Cumulative distributions of the historical inter-annual daily precipitation characteristics (mean, standard deviation, and number of rainy days from top-to-bottom, respectively) for the DD, SD, reanalysis and observed MAP are compared in Fig. 10. The WRF reanalysis simulation matches well the observed MAP distribution, which implies that at least at the scale analyzed herein, the reanalysis simulation captures precipitation features that are important for hydrologic assessment. The SD simulations of the three models are, as expected, very similar to each other, since they were created to conform to the Livneh dataset. However, compared with the observed MAP, the SD simulations and therefore also the Livneh dataset, underestimate the mean and standard deviation and overestimate the number of daily events. This underestimation suggests that although the LOCA well represent the Livneh dataset, it does not do well representing the CBRFC MAP for the study area.
The DD simulations exhibit larger differences among the GCMs. The three DD simulations overestimate the mean and standard deviation, and the frequency of the daily precipitation is underestimated by the MPI and overestimated by both HAD and GFDL.
These differences seen in Fig. 10 between the downscaled simulations (DD and SD) and the observed MAP imply that the downscaled simulations are not adequate as rainfall input to the hydrologic model that was developed with the observed MAP time series. In order to use the downscaled simulations as rainfall input  to the hydrologic model, it is necessary to modify the simulations to have similar characteristics to the input that was used for the configuration of the hydrologic model. Bias correcting of the SD and DD simulations, a practice commonly used, may show a good fit in the upper panel of Fig. 10 but will likely not resolve the differences that are seen in the middle and lower panels, which are important precipitation features in arid environment.

Tercile analysis
The projected mid-21st century (2020-2059) precipitation changes analysis, looking at rainfall characteristics that were used for the development of the WG, is presented in this section. The distributions of the downscaled simulations of the different WG features, such as the duration of the storms, the time between storms, and the hourly magnitudes were compared between the historic and projected periods for the three wetness categories. This comparison showed no statistically significant differences between the historic and projected periods for these features, which implies that the historic wetness categories represent well the projected rainfall characteristics of the matching wetness category. The most notable difference between the historic and projected periods is the frequency in which the wetness categories are projected to occur in the future.
The projected frequencies of the seasonal wetness categories in mid-21st century as defined by the terciles of the historic period are listed in Table 2. In this table, chances that are significantly different than the historical period are indicated.
The significance test was determined from a Monte Carlo experiment. In this experiment, we randomly sampled from a continuous uniform distribution that is taking values from 0 to 1. We sampled an ensemble of 100,000 members, each member with a sample size of 40 to represent the 40-year the future projection duration of interest. The statistical distribution of the tercile values of the ensemble members can be used to estimate the chance of a randomly selected value to be in a given tercile.
In this Monte Carlo experiment we found that 5% [1%] of the terciles were below 0.22 [0.18] and above 0.46 [0.51]. Thus, it can deduced, for example, that a given projected wetness category with frequency that is either below 0.18 or above 0.51 has chance of less than 1% to be selected from a uniform distribution, and therefore we assume that it is significantly different than 33%.
For winter season, the MPI DD and SD simulations projected a significantly higher frequency of dry winters. The DD-MPI indicates decreased frequency in wet winters and the SD-MPI indicates decreased in normal winters. The DD-GFDL, similarly to the DD-MPI, shows increase and decrease frequency of dry and normal winters, respectively. No significant changes in winter projections are seen for the HAD GCM. The projection for higher frequency of dry winters is congruent with the Assessment of Climate Change in the Southwest U.S report (Garfin et al., 2013) that projected a decrease trend in winter precipitation.
For the summer season, the DD-HAD projected a lower frequency of dry and normal summers and a higher frequency of wet summers in mid-21st century. The only other significant result is shown for the DD-GFDL that projected decrease frequency of dry summers. An analysis of the GCMs that participated in the IPCC Fifth Assessment Report, indicated a delay in the onset of the monsoon season and increase rainfall in the late summer (Cook and Seager, 2013).

Hydrologic impact results
In order to assess the projected hydrologic impact, we modified the WG by applying the frequencies of winter and summer wetness categories as indicated in Table 2 to generate six additional ensembles (SD and DD for three GCMs) of MAPs for the eleven sub-basins that represent the mid-21st century projections. Each ensemble consists of 100 realizations of hourly precipitation, and each realization is 30-year long. These ensembles were used as input to the hydrologic model and the lake model to generate ensembles of lake inflow, outflow and water level.
In Fig. 11, the cumulative distributions from the 100 realizations of the 30-year mean annual precipitation over the entire  BWR watershed are shown. The right panel of this figure shows the ratio between the quantiles of the cumulative distributions of the historic and projected simulations. The average changes of all the ratios of the quantiles are indicated in the upper right. While the DD-MPI has a clear signal of drying, the DD-HAD and SD-HAD are showing a wetting trend. The smallest changes are found for the GFDL. For both the MPI and HAD the largest changes are seen for the DD simulations. This behavior is likely the result of the changes in the forcing of the regional climate model as manifested in Table 2. As in Fig. 11, in Fig. 12 we plot the 30-year mean annual inflow into the lake. As expected the direction of the trends for the different projections are similar to the ones shown in Fig. 11 for the precipitation. However, the magnitude of the projected change is much larger for the lake inflow. The two extreme examples are the DD-HAD with average projection to increase the precipitation by 16% and lake inflow by 42%, and the DD-MPI with average projection to decrease the precipitation by 9% and lake inflow by 23%. This underscores the high sensitivity of the BWR watershed to small changes in precipitation, which is attributed to the complex hydrologic response in ephemeral streams.
The results for the projected changes in the outflow from the lake (not shown) are very similar to the changes projected for the inflow (projected average changes are within 2% of the changes indicated in Fig. 12).
The projected climatic impact on the lake is shown in Fig. 13. In this Figure, the cumulative distribution of the percent of the time in the 30-year duration that the water level is projected to drop below 1070 feet, which is the optimal lake level for operations (see Fig. 8), is shown. While the HAD showed less events with lake level below this marker, the MPI showed a substantial increase in frequency of the lake levels dropping below this marker. It is seen that the MPI and HAD DD simulations produce the extreme wet and dry events, respectively.
The projected extreme high levels of the lake are explored in Fig. 14 for 1171.3 and 1235 feet water levels, which are the flood control threshold and the dam's crest (Fig. 9), respectively. We note that the projected encroachment into the flood control reservoir is not substantially different from the historic period, and the probabilities for this exceedance are fairly small. Moreover, with the current lake operational rules the chance to reach the crest    level is less than 0.1% in frequency of 95%. This is because of the large flood control storage that provides a buffer and requires multiple days of very large flow.

Summary and conclusions
The objective of this study is to assess the impact the downscaling methodology (statistical versus dynamical) of the projected mid-21st century on hydrologic assessment in an arid environment. Our study area, the Bill Williams River Watershed that drains into Alamo Lake, is comprised of three ephemeral rivers that are highly sensitive to temporal and spatial precipitation characteristics. Precipitation projections are available from three CMIP5 RCP8.5 GCMs that were found to well perform for Southwest U.S (i.e. HadGEM2-ES, MPI-ESM-LR, and GFDL-ESM2M). For these three GCMs, dynamical downscaled projections using WRF were compared to statistical downscaled projections available from the LOCA dataset. Hydrologic assessment was carried out using a modeling framework that is based on the operational hydrologic model configuration used by the NWS CBRFC. This hydrologic model consists of the SAC-SMA hydrologic model that runs with hourly mean areal precipitation to generate streamflow in ten internal locations and inflow into Alamo Lake. The Lake outflow was simulated using USACE dam operational rules and lake specifications.
A precipitation weather generator that represents the temporal and spatial statistical characteristics of the CBRFC MAP that forces the hydrologic model was developed. This WG is developed to produce ensembles of likely to occur hourly precipitation that are congruent with the hydrologic model input. The WG was further modified to generate ensembles that reflect the projected mid-21st century as found by the SD and DD simulations of the three selected GCMs. These ensembles were created by modifying the WG to reflect the projected precipitation changes that were analyzed by comparing the historic and future periods of the climate projections.
The main findings of this study are that the DD simulations of the MPI and the HAD showed larger range of projected changes than their corresponding SD simulations. These changes however are contradicting in their mid-21st century projection trends, as while the MPI projected a drying trend, the HAD projected a wetting trend. The projected changes in the frequency of occurrence of dry or wet season imply projected increase in climate variability. On the other hand, the contradicting trends between the DD-MPI and DD-HAD points to the increase uncertainty in the projected future precipitation.
These changes in wetting and drying are magnified during the progression from precipitation, to streamflow, to lake inflow, and changes to lake level. Changes in risk of lake flooding were not clearly identified in this study. However, the main finding points to an increasing challenge to operate the lake in its target water level.
We note that although the study did not yield a clear trend of drying or wetting, it provides uncertainty bounds that can be useful for future planning of the lake operations. These uncertainty bounds are much wider when considering the dynamically downscaled models of the MPI and HAD. Relying solely on statistically downscaled projections may underestimate the degree of change in projected precipitation. This information should guide an adaptive lake operation that can adjust and respond to these uncertain lake water level changes.
Finally, we stress that our study used a limited number of downscaled model simulations, which may not represent all other statistically and dynamically downscaling procedures. Moreover, the study was conducted in the Bill Williams River watershed, located in the arid Southwest US. Thus, the generalization of our conclusions to other downscaled datasets and other regions may not be appropriate.