Climate change and non-stationary flood risk for the upper Truckee River basin

Future flood frequency for the upper Truckee River basin (UTRB) is assessed using non-stationary extreme value models and design-life risk methodology. Historical floods are simulated at two UTRB gauge locations, Farad and Reno, using the Variable Infiltration Capacity (VIC) model and non-stationary Generalized Extreme Value (GEV) models. The non-stationary GEV models are fit to the cool season (November–April) monthly maximum flows using historical monthly precipitation totals and average temperature. Future cool season flood distributions are subsequently calculated using downscaled projections of precipitation and temperature from the Coupled Model Intercomparison Project Phase 5 (CMIP-5) archive. The resulting exceedance probabilities are combined to calculate the probability of a flood of a given magnitude occurring over a specific time period (referred to as flood risk) using recent developments in design-life risk methodologies. This paper provides the first end-to-end analysis using non-stationary GEV methods coupled with contemporary downscaled climate projections to demonstrate the evolution of a flood risk profile over typical design life periods of existing infrastructure that is vulnerable to flooding (e.g., dams, levees, bridges and sewers). Results show that flood risk increases significantly over the analysis period (from 1950 through 2099). This highlights the potential to underestimate flood risk using traditional methodologies that do not account for time-varying risk. Although model parameters for the non-stationary method are sensitive to small changes in input parameters, analysis shows that the changes in risk over time are robust. Overall, flood risk at both locations (Farad and Reno) is projected to increase 10–20 % between the historical period 1950 to 1999 and the future period 2000 to 2050, and 30–50 % between the same historical period and a future period of 2050 to 2099.


Introduction
"Stationarity is dead" (Milly et al., 2008), yet the standard practice for flood frequency analysis is predicated on this very assumption. This discrepancy has not gone unnoticed within the scientific community, and there is a growing body of research investigating (1) trends in observed floods (e.g., Franks, 2002; Vogel et al., 2011), (2) ways to incorporate non-stationarity into frequency distributions (e.g., Katz et al., 2002; Raff et al., 2009) and (3) methodologies to interpret risk and approach design within a non-stationary framework (e.g., Mailhot and Duchesne, 2010; Rootzén and Katz, 2013; Salas and Obeysekera, 2014). Both the frequency and intensity of extreme events are particularly susceptible to change because small shifts in the center of a distribution can potentially have much larger impacts on the tails (Meehl et al., 2000). Regardless of climate change, naturally occurring long-term climate oscillations, such as the El Niño–Southern Oscillation (ENSO), have been linked to low-frequency variability in flood frequency (e.g., Cayan et al., 1999; Jain and Lall, 2001). Anthropogenic climate change has the potential to amplify natural climatic variability throughout the interconnected climate and hydrologic systems.
Trends in many hydrologic variables have already been observed across the western United States (as well as around the world). For example, clear increases in temperature have been measured across the west (e.g., Cayan et al., 2001; Dettinger and Cayan, 1995). Precipitation trends are more variable. Regonda et al. (2005) found increased total winter precipitation (rain and snow) from 1950 to 1999 at many sites across the western United States, although springtime snow water equivalent (SWE) was shown to decline over the same period. Similarly, Mote et al. (2005) analyzed snowpack trends in western North America and reported widespread declines in springtime SWE over the period 1925–2000, especially since the middle of the 20th century. They attribute this decline predominantly to climatic factors such as ENSO, the Pacific Decadal Oscillation (PDO) and positive trends in regional temperature. Easterling et al. (2000) summarized previous studies on precipitation trends. They note that trends vary from region to region but, in general, increases in precipitation have occurred disproportionately in the extremes. Several subsequent studies have observed increasing trends in extreme precipitation events, although the changes are relatively small (Gutowski et al., 2008; Kunkel, 2003; Madsen and Figdor, 2007).
Research has also demonstrated increasing trends in flood frequency in some regions. Walter and Vogel (2010) and Vogel et al. (2011) observed increasing flood magnitudes across the United States using stream gauge records, and Franks (2002) showed statistically significant increases in flood frequency since the 1940s. Still, non-stationary flood behavior has historically been difficult to quantify, and there has been some debate about the significance of flood frequency trends. For example, Hirsch (2011) noted both increasing and decreasing trends in annual flood magnitudes in different regions of the US. Also, Douglas et al. (2000) found that, if one takes into account spatial correlation, many previous findings of flood trends are not statistically significant. Difficulty in diagnosing flood trends is not unique to the western US; a literature review of historical flood studies across Europe also found spatial variability in flood trends (Hall et al., 2014).
Even when significant trends are found, the complexity of flooding mechanisms, which depend on many variables that can vary regionally and seasonally, makes it difficult to attribute trends to specific causes. Illustrating the importance of seasonality, Small et al. (2006) showed that if a high-precipitation event occurs in the fall, as opposed to the spring, it will contribute to baseflow rather than inducing flooding. Also, urbanization can drastically increase the impervious area of a basin, thus amplifying floods by decreasing infiltration and speeding runoff. The largest flood magnitude increases observed by both Walter and Vogel (2010) and Vogel et al. (2011) were in basins with urban development. The influence of development trends on flood behavior can be difficult to separate from other variables. For example, Villarini et al. (2009) could not conclusively tie reduced stationarity (i.e., changes in mean and/or variance) in peak discharge records to climate change because of variability in the other factors that influence runoff. Merz et al. (2012) note that attributing changes in flood hazard is complicated by the complex array of drivers, which can include land cover change and infrastructure development as well as natural climate variability and change. Here we set aside the impacts of development and management practices and focus on the role of climate change. However, even with this simplification, future extremes can still be influenced by a number of interrelated variables such as changes in temperature, precipitation efficiency and vertical wind velocity (Muller et al., 2011; O'Gorman and Schneider, 2009). Analyzing global circulation model (GCM) outputs, Pierce et al. (2012) found total changes in precipitation to be small relative to the existing variability but noted larger seasonal changes in storm intensity and frequency. Despite uncertainty, many studies agree that warming will increase the potential for intense rainfall (Allan, 2011; Gutowski et al., 2008; Pall et al., 2011; Sun et al., 2007). Furthermore, Min et al. (2011) found that some GCM simulations may underestimate extreme precipitation events in the Northern Hemisphere, indicating that projections of extreme precipitation based on GCM outputs may be conservative.
Studies have also predicted increases in flood frequency and magnitude in a warmer climate, especially in snowmelt-dominated basins (e.g., Das et al., 2011). As with historical flooding trends, translating projected climate variables into flood frequency is a complex process, and several methodologies have been used. Downscaled GCM climate forcings can be used to drive hydrologic models and simulate future floods directly (e.g., Das et al., 2011; Vogel et al., 2011; Raff et al., 2009). With this approach, traditional stationary flood frequency distributions can be fit to the simulated floods to calculate return periods of interest (e.g., Raff et al., 2009; Vogel et al., 2011). This allows for return periods and flood magnitudes that change over time, as with the flood magnification and recurrence reduction factors calculated by Walter and Vogel (2010) and Vogel et al. (2011). While these approaches do capture temporal changes between analysis periods, they still assume that flood mechanisms are stationary within each period of analysis.
This limitation can be overcome using non-stationary generalized extreme value (GEV) distributions in which the model parameters, such as the mean (i.e., location) and spread (i.e., scale), are allowed to vary as a function of time (e.g., Gilroy and McCuen, 2012) or of relevant covariates (e.g., Griffis and Stedinger, 2007; Katz et al., 2002; Towler et al., 2010). This approach has been gaining popularity for flood frequency estimation. Using this technique, it is not necessary to simulate future floods directly by forcing a hydrologic model with projected hydroclimate fields (e.g., precipitation and temperature). Instead, the parameters of the GEV model change with time (i.e., are non-stationary) based on a linear combination of covariates such as precipitation and temperature. Historical relationships between extreme events and hydroclimate fields are used to identify the weighting of the covariates, and these weights are then used to estimate parameters for future time periods using precipitation and temperature outputs from hydroclimate projections. For example, Gilroy and McCuen (2012) used non-stationary GEV models of flood frequency that incorporated a linear trend in the location parameter. Similarly, Griffis and Stedinger (2007) and Towler et al. (2010) used climate variables as covariates for the distribution parameters.
While non-stationary flood forecasting methods provide flexibility in analyzing flood variability, they are also incongruent with many of the traditional metrics used in water resources planning. Historically, most infrastructure that is vulnerable to flooding (e.g., dams, levees, sewers and bridges) has been designed to withstand flooding of a specified return period (e.g., the 100-year flood). However, these calculations rely on a flood frequency distribution that is assumed to remain stationary with time, and hence the return period design metric is also assumed to be stationary. When non-stationary methods are used, the underlying flood frequency distributions, and the associated return periods, vary with time. Thus, under a non-stationary climate, the notion of a static return period flood event (e.g., the 100-year or 200-year flood) is no longer a valid concept.
To address this issue, Rootzén and Katz (2013) introduced the concept of the design life level to calculate the risk of a given flood magnitude occurring over a specified time period. Salas and Obeysekera (2014) further demonstrated the relevance of this technique to the hydrologic community using flood frequency examples. However, this methodology has yet to receive widespread attention within the hydrologic community. Here, we present a non-stationary flood frequency assessment for the upper Truckee River basin (UTRB) using contemporary downscaled climate projections and the non-stationary design life level technique introduced by Rootzén and Katz (2013) to quantify flood risk. Note that, following the convention of Rootzén and Katz (2013), we use the term flood risk to refer to the probability of an extreme event occurring and not as a quantification of expected losses. While the methodology used for this analysis is previously established, this paper provides the first end-to-end demonstration of non-stationary GEV analysis coupled with contemporary downscaled climate projections (specifically, downscaled climate projections from the Coupled Model Intercomparison Project Phase 5, or CMIP-5) to quantify how flood risk profiles may evolve in the upper Truckee River basin over the 21st century. The flood analysis presented here is part of a larger study on climate change impacts in the Truckee River basin (Reclamation, 2010). This project is supported by local water managers and conducted by the Bureau of Reclamation through the WaterSMART Basin Studies Program authorized under US Public Law 111-11, Subtitle F (SECURE Water Act). The intent of this work is (1) to investigate potential flood risk changes over time in the UTRB and (2) to demonstrate the applicability of non-stationary techniques in a regional flood analysis and thereby make these tools more accessible to the hydrologic community.
The paper is organized as follows. Section 2 provides background on the study area along with the data sets and models used. The methodologies for using non-stationary spatial GEV analysis in conjunction with climate projections and time-evolving risk assessment are described in Sect. 3. Results and discussion of findings are given in Sect. 4. Summary and conclusions from the analysis are presented in Sect. 5.

Upper Truckee River basin
The Truckee River originates in the northern Sierra Nevada in California (above Lake Tahoe) and flows northeast into Nevada, where it ends in Pyramid Lake (Fig. 1). The total basin area is roughly 7900 km²; however, the area upstream of Reno (2763 km²), henceforth referred to as the upper Truckee River, receives the majority of the basin's precipitation through snowpack. The focus of this analysis is on the Farad and Reno gauge locations shown in Fig. 1, henceforth referred to as Farad and Reno. The Farad gauge is located roughly 1.5 km downstream of the Farad hydropower plant and provides a cumulative measure of all of the upper basin tributaries (Stokes, 2002). Most of the available water supply is generated upstream of the Farad gauge (USACE, 2013a). The Reno gauge is located downstream of Farad in the heart of Reno and is a good reference point for analyzing urban flooding. The intervening area between the Farad and Reno gauges is small, roughly 350 km², and only two small tributaries enter the main stem between Farad and Reno.
Flooding in the upper Truckee generally takes one of three forms. Some of the most severe floods have resulted from heavy rain events covering most of the basin and lasting 1 to 6 days. These storms generally occur from November to April and may be linked to atmospheric rivers (Ralph and Dettinger, 2012). Snowmelt floods are also common from April to July. Although snowmelt floods transmit large volumes of water over longer durations, they generally do not cause damage because they are typically well predicted and can be regulated with upstream reservoirs. Finally, in late summer (July–August), local cloudbursts can generate high-intensity precipitation over small areas. These storms can cause local damage along tributaries but generally do not have a large impact on the main stem of the Truckee.
Nine major floods were recorded on the Truckee River in the 20th century, all of which occurred from November to April (USACE, 2013b). The flood of record occurred in January 1997 and was caused by warm rain falling on a large snowpack (∼ 180 % of normal) and melting nearly all of the snowpack below ∼ 2100 m (USACE, 2013b). The floods of 1950, 1955 and 1963 were among the most damaging due to the development of Reno along the river during this period (USACE, 2013b). Subsequent flood damages have been, at least partially, mitigated by the implementation of flood infrastructure starting in the 1960s.

Streamflow data and simulations
Streamflow has been measured at both the Farad and Reno USGS gauges. However, gauged flows are not readily applicable to flood frequency analysis due to upstream development of water supply and flood control structures. For example, upstream of Reno there are four dams with flood control capabilities (i.e., Martis Creek Dam, Prosser Creek Dam, Stampede Dam and Boca Dam); in addition, Tahoe, Donner and Independence lakes provide incidental flood regulation. Unregulated flow estimates were developed by the US Army Corps of Engineers (USACE) but are only available for historical flood events (USACE, 2013b). Therefore, we simulate unregulated flows from 1950 to 1999 using the three-layer Variable Infiltration Capacity (VIC) model and validate the results using the available unregulated flow estimates.
A brief summary of the VIC model is provided here; for additional technical specifications the reader is referred to Liang et al. (1994, 1996) and Nijssen et al. (1997). VIC is a gridded hydrologic model designed to simulate macroscale (spatial resolution greater than 1 km) water balances using parameterized sub-grid infiltration and vegetation processes. In the VIC model, surface water infiltrates to the subsurface based on soil properties, and soil moisture is distributed vertically through three model layers extending to about 2 m below the land surface. At the surface, potential evapotranspiration (PET) is simulated using the Penman–Monteith PET model (Maidment, 1993). Surface flows are determined in a two-step process. First, the water balance for each grid cell is calculated independently to determine surface runoff and baseflow; subsequently, runoff from each cell is routed to river channels and outlets using a predefined routing network. Here we drive VIC with daily weather forcings including precipitation, maximum and minimum temperature, and wind speed. Additional climate variables such as short- and long-wave radiation, relative humidity and vapor pressure are calculated within the model using established empirical relationships. The VIC model is well documented and has been used in a number of hydrologic and climate change studies (e.g., Christensen and Lettenmaier, 2007; Christensen et al., 2004; Gangopadhyay et al., 2011; Maurer et al., 2007; Payne et al., 2004; Reclamation, 2011; Van Rheenen et al., 2004). Recently, VIC has also been applied to real-time flood estimation (Wu et al., 2014).
The VIC model used for this analysis was developed as part of the Bureau of Reclamation (Reclamation) West-Wide Climate Risk Assessment (WWCRA) effort and is described in Reclamation (2011). The WWCRA VIC model encompasses the western US. Simulated and observed streamflows were compared at 152 locations, primarily from the USGS Hydro-Climatic Data Network (Slack et al., 1993), and 43 additional locations of importance to Reclamation's water management activities. Among the evaluated locations are several in the Truckee basin, including the Truckee River at Farad. For details on model calibration and development we refer the reader to Reclamation (2011) and Gangopadhyay et al. (2011). While we do not discuss model calibration further here, in the subsequent sections we provide additional model verification for flood simulation in the UTRB.

Climate data and models
As noted in the previous section, the VIC model requires daily climate inputs to drive the water balance simulations. We use the national 1/8° (roughly 12 km) gridded data set from Maurer et al. (2002) for historical (i.e., 1950–1999) climate observations. Additionally, monthly total precipitation and average temperature were aggregated over the upstream area of each gauge for every month of the flood season (i.e., November through April). These values are used as covariates for fitting the non-stationary GEV models discussed in Sect. 3.
Future gridded precipitation and temperature values from 2000 to 2099 were generated from GCM outputs. We analyzed 234 projections generated by 37 different climate models from the CMIP-5 archive (Taylor et al., 2012). In the absence of objective guidance in the contemporary climate literature on limiting the number of projections, we chose to include all of the available CMIP-5 projections of future climate in this study. However, it should be noted that other studies have demonstrated that a subset of projections can provide comparable results for specific study objectives (e.g., water supply) (Pierce et al., 2009; Harding et al., 2012). The projections span four representative concentration pathways (RCPs) for greenhouse gas emissions. Each GCM projection includes monthly gridded precipitation and temperature from 1950 to 2099 at a coarse grid resolution ranging between ∼ 65 and 250 km.
Reclamation, in collaboration with other federal and non-federal partners, has developed a monthly archive of downscaled CMIP-5 projections at the finer 1/8° resolution using the two-step bias correction and spatial disaggregation (BCSD) algorithm described by Wood et al. (2004). For this analysis we extended the existing hydrology archive to cover the UTRB domain for all 234 BCSD CMIP-5 climate projections following the steps detailed below. A subset of the CMIP-5 hydrology projections is publicly accessible through the archive of downscaled CMIP3 and CMIP5 climate and hydrology projections at http://gdo-dcp.ucllnl.org/downscaled_cmipprojections/. Additional documentation on the archive and the methodology is provided in Reclamation (2014).
The downscaled climate variables include monthly total precipitation, monthly maximum and minimum temperatures and monthly average temperature. Before applying the BCSD algorithm, all 234 climate projections were first regridded from their respective native GCM scales to a common grid of 1° latitude by 1° longitude. Similarly, the observed 1/8° gridded data set (Maurer et al., 2002) was aggregated to the coarser 1° latitude by 1° longitude grid. Next, for a given climate variable, GCM and location (1° latitude by 1° longitude grid cell), the bias correction (BC) step uses quantile mapping between the monthly cumulative distribution functions (CDFs) of historical simulated and historical observed values to identify biases over a common climatological period, in this case 1950–1999. The projected future climate variables from the same GCM at the same location are then bias-corrected using the identified bias. The result of bias correction is an adjusted GCM data set (20th century and 21st century, linked together) that is statistically consistent with the observed data during the bias correction overlap period (i.e., 1950–1999 in this application). Note that the BC step occurs on the coarse 1° latitude by 1° longitude grid. Next, multiplicative adjustment factors (the ratio of bias-corrected GCM to observed) for precipitation and offset adjustment factors (bias-corrected GCM minus observed) for temperature are calculated for each of the 1° latitude by 1° longitude grid cells (Reclamation, 2013). These adjustments are then spatially disaggregated (SD) to a 1/8° latitude by 1/8° longitude grid. Finally, the adjustments are applied (multiplicatively for precipitation, additively for temperature) to the finer-resolution 1/8° gridded observed precipitation and temperature fields (Maurer et al., 2002) to derive the 1/8° gridded BCSD climate projections.
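The quantile-mapping idea at the heart of the BC step can be illustrated with a minimal sketch (this is not Reclamation's production BCSD code; the function and variable names are ours):

```python
import numpy as np

def quantile_map(gcm_hist, obs_hist, gcm_value):
    """Bias-correct a single GCM value by quantile mapping.

    gcm_hist, obs_hist: historical monthly series over the common
    climatological period (e.g., 1950-1999) for one variable, month
    and coarse grid cell. gcm_value: the raw GCM output to correct.
    """
    # Non-exceedance probability of the value within the GCM's own
    # historical distribution
    q = np.searchsorted(np.sort(gcm_hist), gcm_value) / len(gcm_hist)
    q = float(np.clip(q, 0.0, 1.0))
    # The corrected value is the observed quantile at that probability
    return float(np.quantile(obs_hist, q))
```

In the full algorithm, the bias-corrected coarse values are then expressed as ratios (precipitation) or offsets (temperature) relative to the coarse observations and applied to the 1/8° observed fields in the SD step.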

Methodology
This section describes the methodology used for flood frequency analysis in the UTRB. The discussion is divided into two parts. First, we describe the process of extreme value modeling using non-stationary GEV distributions (Sect. 3.1). Second, the methodology for design-life level risk assessment is described (Sect. 3.2).

Extreme value modeling
Extreme value analysis (EVA) deals with the examination of the tail (i.e., extreme) values of a distribution (as opposed to standard approaches, which are generally more concerned with average system behavior). EVA methods are standard practice for flood frequency analysis because they are designed to capture the behavior of low-frequency, high-impact events. Furthermore, Katz (2010) points out that in climate change studies traditional approaches are not sufficient and extreme value statistics are needed. For this analysis, we use the GEV, which is commonly applied in flood frequency analysis to model block maxima from streamflow time series (e.g., Katz et al., 2002; Towler et al., 2010). The cumulative distribution function (CDF) of the GEV, F, is

F(z; θ) = exp{−[1 + ξ(z − µ)/σ]^(−1/ξ)},  (1)

defined for 1 + ξ(z − µ)/σ > 0, where z is the streamflow maximum of interest and θ is the parameter set (µ, σ, ξ) used to specify the distribution, such that the center is given by the location (µ), the spread by the scale (σ) and the behavior of the upper tail by the shape (ξ). Based on the shape parameter, the GEV can take one of three forms: Gumbel, or light-tailed, in the limit as ξ approaches zero (in which case F(z) = exp{−exp[−(z − µ)/σ]}); Fréchet, or heavy-tailed, when ξ is positive; and Weibull, or bounded, when ξ is negative. Following the methodology of Towler et al. (2010), the GEV parameters (µ, σ, ξ) are fitted using the maximum likelihood estimation (MLE) technique.
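The GEV CDF described above can be evaluated directly; a minimal sketch (the function name is ours, and real fitting is done with dedicated extreme value software):

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV CDF with location mu, scale sigma and shape xi."""
    if abs(xi) < 1e-12:
        # Gumbel (light-tailed) limit as the shape approaches zero
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (Frechet case)
        # or above the upper bound (Weibull case)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

For a flood threshold z, the monthly exceedance probability used later in the risk calculations is simply 1 − F(z; θ).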
In traditional stationary flood frequency analysis, it is assumed that observations are independent and identically distributed (IID), and therefore the model parameters (µ, σ, ξ) derived from the observed flood record are assumed to remain constant across the period of record and into the future. Here, we introduce non-stationarity into the distribution by allowing the location and scale parameters to change with relevant covariates, such that

µ = β0 + β1 x1 + β2 x2,  (2)
σ = β3 + β4 x1 + β5 x2,  (3)

where the β variables represent the coefficients and the x variables are the covariates. In keeping with previous studies, the shape parameter, which is the most difficult to estimate, is assumed constant (e.g., Obeysekera and Salas, 2014; Salas and Obeysekera, 2013; Towler et al., 2010). Some previous studies (e.g., Salas and Obeysekera, 2014; Stedinger and Griffis, 2011) have developed non-stationary location and scale parameters that are explicitly dependent on time. That approach requires first deriving a temporal flooding trend and then projecting this trend into the future. Here, instead, we derive the location and scale parameters from time-varying meteorological variables (i.e., temperature and precipitation). With this approach, temporal trends in flooding are introduced as a function of temporal variability in precipitation and temperature, but no explicit trend is specified a priori.
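The covariate dependence of the location and scale parameters can be sketched as follows (the coefficient ordering and values are illustrative only, not the fitted coefficients reported later in Table 3):

```python
def gev_params(precip, temp, beta_mu, beta_sigma, xi=-0.1):
    """Non-stationary GEV location and scale as linear functions of
    the monthly covariates; the shape parameter is held constant.

    beta_mu and beta_sigma are (intercept, precip coef, temp coef)
    tuples; all values here are hypothetical.
    """
    mu = beta_mu[0] + beta_mu[1] * precip + beta_mu[2] * temp
    sigma = beta_sigma[0] + beta_sigma[1] * precip + beta_sigma[2] * temp
    # The linear form does not itself guarantee a positive scale, so
    # fitted models must be checked over the covariate range of interest
    if sigma <= 0.0:
        raise ValueError("scale parameter must be positive")
    return mu, sigma, xi
```

Each projected month's precipitation and temperature thus yields its own GEV distribution, which is how the future flood distributions are constructed.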
To determine the optimal set of covariates for a non-stationary model, additional statistical methods must be employed. The Akaike information criterion (AIC; Akaike, 1974), given in Eq. (4), weighs the goodness of fit of a model against its level of complexity:

AIC = 2 NLLH + 2K.  (4)
Here, NLLH is the negative log-likelihood of a model fitted with K parameters. In this formulation, higher-ranked models have lower AIC scores. For this analysis, the best model is selected using pairwise comparisons of NLLH scores following the methods of Salas and Obeysekera (2014) and others. Models are compared using the deviance statistic (D), which is equal to twice the difference in NLLH scores. D is then tested for significance based on a chi-squared distribution with the degrees of freedom set equal to the difference in the number of parameters (K) between models. p values less than 0.05 indicate a statistically significant improvement in model performance at the 5 % significance level. Following the methodology described above, GEV distributions are fitted to time series of maximum monthly historical (1950–1999) 1-day simulated streamflows (detailed in Sect. 2) for the cool season (November to April). Although there are some unregulated historical flow estimates, the available data set covers only six storms. Therefore, to be consistent, we fit our model only to the simulated flows. The data set includes maximum daily streamflows for each month in the cool season, defined by the block of months November through April, as opposed to the more traditional single value per year. This technique was also used by Towler et al. (2010), who noted that expanding the data set helps avoid the problems associated with applying maximum likelihood estimation to small data sets. However, as noted by Towler et al. (2010), when multiple values are used per year the calculated probabilities must be adjusted appropriately to derive annual values. Floods during the cool season generally last between 1 and 4 days. Here we focus on the 1-day flood peak, as opposed to multi-day flood volumes, because this is a representative metric for flood damage. Additionally, using the 1-day flood maximum focuses the analysis on flood magnitude rather than duration.
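The model selection procedure described above can be sketched as follows (scipy stands in for the R tooling actually used, and the NLLH inputs below are illustrative numbers, not the scores reported in Table 2):

```python
from scipy.stats import chi2

def aic(nllh, k):
    """Akaike information criterion: lower scores rank higher."""
    return 2.0 * nllh + 2.0 * k

def deviance_test(nllh_simple, k_simple, nllh_complex, k_complex, alpha=0.05):
    """Likelihood-ratio (deviance) test between nested GEV models.

    D is twice the difference in negative log-likelihoods; its p value
    comes from a chi-squared distribution with degrees of freedom equal
    to the difference in parameter counts.
    """
    D = 2.0 * (nllh_simple - nllh_complex)
    df = k_complex - k_simple
    p = chi2.sf(D, df)
    return D, p, p < alpha
```

For example, a 5-parameter model that lowers NLLH from 105 to 100 relative to a 3-parameter model gives D = 10 with 2 degrees of freedom, a significant improvement at the 5 % level.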
Two covariates were considered: monthly total precipitation (P) and mean temperature (T), each averaged over the upstream area of each gauge. As discussed in Sect. 2, precipitation is a relevant covariate because many of the floods in this season are rain-on-snow or extreme rainfall events. Similarly, temperature drives snowmelt and is an important contributor to UTRB flood events (e.g., the January 1997 event). Both stationary and non-stationary GEV models were evaluated using the extRemes package (Gilleland and Katz, 2011) in the R statistical computing environment (R Core Team, 2012).
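While the fits themselves were performed with extRemes in R, the stationary MLE baseline can be sketched in Python with scipy (note that scipy's shape parameter c corresponds to −ξ in the convention used here; the data below are synthetic, not UTRB flows):

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic log-transformed block maxima standing in for the
# November-April monthly maxima (true parameters chosen arbitrarily)
rng = np.random.default_rng(0)
sample = genextreme.rvs(c=-0.1, loc=10.0, scale=1.5, size=1000,
                        random_state=rng)

# Stationary maximum likelihood fit of (shape, location, scale)
c_hat, loc_hat, scale_hat = genextreme.fit(sample)
```

With a reasonably long record, the MLE estimates recover the generating parameters; with short records (the motivation for pooling monthly maxima noted above), the shape estimate in particular becomes unstable.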

Time-varying risk assessment
Traditional flood planning relies on the concept of return periods, which are usually calculated as the inverse of the annual exceedance probability for a given flood magnitude, assuming a stationary distribution: for example, the log-Pearson Type III (LP3) distribution described in Interagency Advisory Committee on Water Data Bulletin 17B (IACWD, 1982). However, when non-stationary models are used, the distribution parameters, and hence the exceedance probabilities, vary with time. Table 1 compares various flood probability calculations between stationary and non-stationary approaches (Salas and Obeysekera, 2014). As shown there, when the flood distribution is stationary, the return period for a given flood magnitude is constant and relies only on the exceedance probability (Eq. 4a in Table 1). However, if the distribution parameters are non-stationary, then the return period will vary based on the period of interest (Eq. 4b in Table 1). This concept is easily extended to flood risk (here defined as the probability of a flood of a given magnitude occurring, not expected losses). In traditional analyses, the risk of a flood occurring in a given period depends only on the length of the period (Eq. 5a), while in a non-stationary analysis, risk depends on both the length of time considered and the time period itself (Eq. 5b). This is the concept of design life level proposed by Rootzén and Katz (2013). Here we adopt the design-life level risk framework given by Eq.
(5b) in Table 1 and calculate the risk of a flood for a range of future periods and design-life lengths.

Table 1 (after Salas and Obeysekera, 2014) lists, for the stationary and non-stationary cases, the probability of the first flood occurring in year x, the expected waiting time between flood occurrences (the return period) and the probability of a flood occurring within a given period. Notes: a flood is defined as a flow exceeding a predefined threshold; f(x) is the probability density function and F(x) the cumulative distribution function of X, a random variable denoting the waiting time for the first flood occurrence; x_max is the time when p_x equals 1; n is the length of the time period over which flood risk is calculated.
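The Eq. (5a) and (5b) risk calculations can be sketched as follows (function names are ours; the same product form is also how per-month exceedance probabilities within a cool season are aggregated to annual values):

```python
import numpy as np

def stationary_risk(p, n):
    """Eq. (5a): probability of at least one flood in n years when the
    annual exceedance probability p is constant."""
    return 1.0 - (1.0 - p) ** n

def nonstationary_risk(p_t):
    """Eq. (5b): probability of at least one flood over a design life
    given a sequence of per-period exceedance probabilities p_t."""
    p_t = np.asarray(p_t, dtype=float)
    return 1.0 - float(np.prod(1.0 - p_t))

# A flood with a constant 1 % annual exceedance over a 50-year design life
r_const = stationary_risk(0.01, 50)
# The same design life when the exceedance probability drifts upward,
# as under a non-stationary climate (illustrative values)
r_drift = nonstationary_risk(np.linspace(0.01, 0.03, 50))
```

When the exceedance probabilities are constant the two formulations coincide, and any upward drift in p_t raises the design-life risk above the stationary estimate, which is the underestimation hazard highlighted in the abstract.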

Results and discussion
Results are grouped into three sections. First, we present the development of the non-stationary GEV models (Sect. 4.1). Next, the models are verified by comparing simulated results to observations (Sect. 4.2). Finally, we present future projections of flood frequency (Sect. 4.3).

Extreme value model development
A suite of models was fit to the logarithms of the block cool season (November–April) maximum flows simulated by the calibrated VIC model, with different non-stationary parameter combinations. The model structures tested include stationary, non-stationary location, non-stationary scale, and non-stationary location and scale. For each model structure, model fit was tested using one or both covariates (i.e., precipitation and temperature). Models were also tested using the block maxima flows directly; however, performance improved considerably with the logarithmic transformation. Validation of the VIC-simulated flows as well as the GEV models is presented in the following section. Table 2 summarizes the NLLH and AIC scores for each model configuration. The deviance statistic D for pairwise comparisons of NLLH scores and the p values calculated for each D based on a chi-squared distribution are also provided. The bottom row of Table 2 gives the number of parameters in each model and the model number that was used for the pairwise comparisons. As shown there, the models with non-stationary location and scale relying on both precipitation and temperature as covariates have the best (i.e., lowest) NLLH scores for both stations and are a statistically significant improvement over the other models listed in Table 2. Figure 2 plots the stationary and the non-stationary location and scale models with histograms of observed flow for both gauges. Qualitatively, the stationary model fits the center of the distribution well but overestimates the tails. The non-stationary models overestimate the median values but provide a closer fit to the extreme values.
The coefficients for Eqs. (2) and (3) for the selected models are provided in Table 3. Using these coefficients, the location and scale parameters are calculated for every climate projection (i.e., 234) and flood season month (i.e., November to April, 1950 to 2099) based on the downscaled precipitation and temperature values detailed in Sect. 2 (note that the shape parameter remains fixed). Thus, for every future month there is a separate GEV distribution curve for each of the 234 climate projections.
To address uncertainty in the model parameters (namely the model coefficients, β, in Eqs. 2 and 3), models of the same form (i.e., non-stationary location and scale with precipitation and temperature as covariates) were also fit to the historical simulation period (1950-1999) using downscaled precipitation and temperature from all 234 climate projections. Because each climate projection seeks to reproduce similar behavior over the historical 1950-1999 period, the variability between projections in this time frame is a measure of uncertainty in model coefficients given the representation of the same physical system. This differs from the variability between climate projections in future periods (i.e., after 1999), which is a measure of uncertainty in future forcing conditions. Table 3 shows the interquartile range of model coefficients calculated from the 234 historical GCM simulations. Using these parameters, the return period of the design flood at Reno (37 600 cfs, 1065 cms) was calculated for every set of model parameters using observed historical precipitation and temperature. The observed model estimates a return period of 45 years, while the interquartile range (IQR) using the simulated model parameters (i.e., the model parameters estimated from each of the 234 historical GCMs) with observed precipitation and temperature varies from 28 to 247 years. Note that the return period of 45 years estimated from observed meteorology is within the IQR of 28 to 247 years. Although the IQR is large, it should be kept in mind that some of the uncertainty in this range is a result of the performance of individual GCMs in simulating historical climate and of the BCSD downscaling methodology. The monthly BCSD algorithm used for downscaling GCM climate only constrains the monthly precipitation and temperature statistics (total precipitation and mean monthly temperature).
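The return-period calculation above treats the cool season as a set of monthly GEV distributions: the probability that at least one monthly maximum exceeds the design flood in a season gives the annual exceedance probability, whose reciprocal is the return period. A sketch with hypothetical monthly parameters (not the fitted values of Table 3; scipy's shape `c` is the negative of the usual GEV ξ):

```python
import numpy as np
from scipy.stats import genextreme

def seasonal_return_period(q, mus, sigmas, c):
    """Return period (years) of a 1-day flow exceeding q (log space),
    given GEV location/scale for each cool-season month and a fixed shape."""
    # probability that no monthly maximum exceeds q within one season
    non_exceed = np.prod([genextreme.cdf(q, c, loc=m, scale=s)
                          for m, s in zip(mus, sigmas)])
    return 1.0 / (1.0 - non_exceed)

# Hypothetical November-April parameters (log-flow space), not Table 3 values;
# c = 0.25 in scipy's convention corresponds to a GEV shape xi of -0.25
mus = [5.1, 5.6, 6.0, 6.1, 5.8, 5.3]
sigmas = [0.9, 1.0, 1.0, 1.0, 0.95, 0.9]
T = seasonal_return_period(np.log(1065.0), mus, sigmas, c=0.25)
print(T > 1.0)  # → True
```

Repeating this calculation for each of the 234 parameter sets, with observed precipitation and temperature held fixed, yields the 28 to 247 year IQR discussed above.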

Hydrologic and GEV model validation
Since we used modeled VIC flows for flood analysis, there are two considerations for model validation. First, we compare VIC-simulated 1-day flood events to the observed unregulated flow estimates (i.e., validating that our calibrated VIC model is accurately simulating flood flows). Second, we compare the GEV-modeled floods to the VIC-simulated 1-day flood events and the observed unregulated flow estimates (i.e., validating that the GEV models we fit to the simulated data match both the observed unregulated flows and the VIC-simulated flows). Although unregulated flows are not available for the entire period of record, 1-day maximum unregulated flow estimates are available at Reno for six historical floods (USACE, 2013b). Figure 3 plots the observed flow (blue triangles) with the 1-day VIC flow that was simulated using historical observed forcings from Maurer et al. (2002) (red circles) and a box plot of the non-stationary GEV distribution for the same month generated using the same monthly historical precipitation and temperature (i.e., Maurer et al., 2002). Comparing first the 1-day maximum VIC-simulated flow with the observed flow, the maximum percent difference between the natural logarithm of simulated and observed flows is 12 %. There does appear to be a slight positive bias in the VIC simulations (i.e., VIC-simulated flows are greater than observed flood flows). Still, the simulated flood values (red circles) generally fall within the interquartile range of the GEV distributions. For two of the events, the VIC simulation matches very closely (percent differences in the natural logarithm of flows are 0.5 and 1.2 %, respectively) with the observed flow; however, the GEV model underestimates the events. This discrepancy is caused by the flood timing. In both cases the flood occurs at the very beginning of the month. In the GEV framework, the precipitation and temperature are used as covariates for the flow of the same month. However, for these storms, flooding is linked to precipitation and
temperature in the month of flooding and the preceding month. Therefore, the GEV model simulates the flood in the preceding month and/or underestimates the flood magnitude if the precipitation is split between 2 months. While this is a limitation for matching individual historical events (primarily timing), it should not be a major concern for future projections. This is because, for the purposes of risk calculations, it does not matter in what month the GEV model simulates the flood event as long as it realistically captures flood magnitude behavior. Comparing the GEV model distribution to the other observed floods (blue triangles), the distribution encompasses the observed flood magnitude (within the 5th and 95th percentiles) for all except two of the floods (1955 and 1963). For 1963, the VIC-simulated and observed floods are in close agreement (the difference between the natural logarithm of simulated and observed flows is the smallest of any event at 0.5 %), and the discrepancy with the GEV model is consistent with the flood timing described above. The 1955 flood resulted from 38 cm of melted snow combined with 33 cm of rainfall over a 3-day period (O'Hara et al., 2007). In the historical forcings used to drive the VIC model, December 1955 has 75 cm of precipitation, which is the highest December precipitation value in the historical period. In this instance, the VIC-simulated flow falls within the interquartile range of the GEV model, but the high monthly precipitation results in an overestimation of the flood magnitude. Again, this is a limitation of using monthly forcings because the total December precipitation is used as a covariate and not a storm-specific value, though in many cases the storm-specific values constitute the bulk of the monthly precipitation totals.
Figure 4 is a time series plot of VIC historical simulated flow along with the median and 5th to 95th percentile flow of the GEV model. As would be expected from the model fit demonstrated in Figs. 2 and 3, Fig. 4 shows that the VIC-simulated flows are generally close to the median GEV-modeled flow and nearly always fall within the 5th to 95th percentile range. Although there are differences in the simulation of individual events discussed above, the median simulated flood magnitudes are only greater than the maximum observed flood in two instances of the 300 historical months simulated.
In general, Figs. 3 and 4 show that the VIC-simulated flows match closely with the observed floods (based on percent difference in the natural logarithm of flows) and that the interquartile range of the GEV distributions encompasses the observed and simulated flows in most instances. Figure 3 does illustrate some of the complications in matching individual events. However, based on analysis of the driving forces behind each individual event, we are able to explain and document the sources of these discrepancies. Based on this analysis we conclude that the VIC model behavior reasonably matches the natural system.

Future flood risk
Future flood risk is calculated using Eq. (5b) from Table 1. For the first part of this analysis we define "flood" as 1-day flow exceeding 1065 cms (37 600 cfs). This is the maximum historical unregulated flow at Reno from the 2 January 1997 event and is considered to be the design flood for flood protection infrastructure design. For each simulation month (1950-2099, November-April) exceedance probabilities are calculated for every climate projection (234 in total) using the selected non-stationary GEV models from Table 3 (fit to the historical observations) and the projected monthly precipitation and temperature. As detailed in Sect. 3.2, when exceedance probabilities are time dependent, the flood risk (refer to Eq. 5b, Table 1) is a function of both the length of the design life and the period of operation. Figure 5 plots the risk of flood versus project life for three time periods: 1950 to 1999, 2000 to 2049 and 2050 to 2099. In other words, this is the risk of a flood exceeding 1065 cms in the next n years from 1950, 2000 or 2050. The median and interquartile ranges show the distribution of the 234 climate projections simulated. Here we use the interquartile range, as opposed to the 5th and 95th percentiles, to focus on the central tendencies of each time period. Note that the ranges presented here express the variability between climate projections. Uncertainty of future VIC model simulations is not investigated here. For a detailed analysis of uncertainty in VIC simulations the reader is referred to Elsner et al. (2014).
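The per-projection risk calculation can be sketched as follows: for each projection, monthly exceedance probabilities of the design flood are computed from that month's GEV parameters and combined via Eq. (5b), and the median and interquartile range are then taken across projections. The monthly location parameters below are randomly generated stand-ins for those derived from downscaled precipitation and temperature:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(7)
n_proj, n_months = 234, 50 * 6           # 50-year period, Nov-Apr each year

# Hypothetical monthly GEV locations per projection (log-flow space); a real
# application derives these from each projection's precipitation and temperature
mus = 3.4 + 0.2 * rng.standard_normal((n_proj, n_months))
sigma, c = 1.0, 0.25                      # scipy shape convention (xi = -0.25)
q = np.log(1065.0)                        # design flood threshold in log space

# monthly exceedance probabilities, then Eq. (5b) risk per projection
p_monthly = genextreme.sf(q, c, loc=mus, scale=sigma)
risk = 1.0 - np.prod(1.0 - p_monthly, axis=1)

# spread across the 234 projections: 25th, 50th and 75th percentiles
print(np.round(np.percentile(risk, [25, 50, 75]), 3))
```

The median and interquartile range of `risk` correspond to the solid lines and shading in Fig. 5, here for a single time period and design life.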
For both Farad and Reno there is a clear positive shift in flood risk between the three time periods. In all cases the median risk for each subsequent time period falls outside the interquartile range of the preceding time period, although the prediction spread for Reno is greater than that for Farad. It is important to note that the flood risk is actually higher at Farad than Reno in both the historical and future periods despite the fact that the observed flow distributions at the two stations are very similar (refer to Fig. 2). This shift between Farad and Reno is caused by the differences in the shape parameters (refer to Table 3). Farad has a relatively heavier-tailed distribution (i.e., the GEV shape parameter for Farad is greater than the shape parameter for Reno) and therefore flood risks are increased. The sensitivity of the model parameters (and the associated flood risk) to small differences in the flow and covariate distributions is further demonstrated by Fig. 6.
Figure 6 presents the project life risk from Fig. 5 for three project life periods (10, 20 and 30 years). Box plots show the non-stationary model results for the 234 climate projections with the different time periods compared side by side. Also, the risk calculated using a stationary GEV model and a stationary LP3 model (i.e., the distribution prescribed by Bulletin 17B and fitted using L-moments; IACWD, 1982) fit to the historical flow data are plotted for reference (blue and red dashed lines, respectively). Comparing these three approaches (non-stationary GEV, stationary GEV and stationary LP3) provides information on the sensitivity of results to the modeling approach and non-stationary parameters. For instance, both stationary models are fit to the same historical simulated flows (one using MLE and the other using L-moments), so differences between the stationary lines reflect the impact of model choice and fitting approach on estimated risk. Conversely, the stationary GEV model (blue line) and the historical non-stationary models (grey box plot) have the same model form and cover the same time period; the only difference is the addition of covariates to estimate model parameters. Thus differences between these two show the effect of model parameter changes from the non-stationary approach. Finally, variability between the box plots for a given design period demonstrates the evolution of risk over time (i.e., the impact of climate change on risk). The latter (i.e., changing risk over time) is the purpose of this analysis; however, before assessing change over time, we must first discuss the impact of model choice and parameters on risk estimates.
For both of the stationary methods, the risk increases with project life following Eq. (5a) from Table 1. The distinction between these lines and the non-stationary approaches is that, with the stationary approach, a single exceedance probability is calculated for the given flood magnitude, and this probability is assumed to remain constant throughout the design life. Also, for both stationary approaches the model is fit directly to the historical 1-day maximum flow distribution and no covariates are required (note that stationary models are not fit to the future time periods because this would require future simulated flows). Comparing the GEV (blue line) and the LP3 (red line) stationary models, there is a 10-20 % increase in risk between the two models. This difference is purely a function of model form and highlights the sensitivity of the risk calculations to model choice.
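The stationary comparison can be sketched as follows: a GEV is fit by MLE and an LP3 analogue by method of moments to the same (here synthetic) log flows, each yields a single exceedance probability for the design flood, and Eq. (5a) converts it to design-life risk. Bulletin 17B actually prescribes log10 flows with a weighted regional skew; the simple moment fit below is only illustrative:

```python
import numpy as np
from scipy.stats import genextreme, pearson3, skew

rng = np.random.default_rng(1)
# Synthetic stand-in for the historical simulated log flows
log_flows = genextreme.rvs(0.24, loc=6.6, scale=1.3, size=300, random_state=rng)

q = np.log(1065.0)                        # design flood in log space
n_years = 30                              # design life length

# Stationary GEV fitted by MLE
c_hat, loc_hat, scale_hat = genextreme.fit(log_flows)
p_gev = genextreme.sf(q, c_hat, loc=loc_hat, scale=scale_hat)

# LP3 analogue: Pearson III fitted to the log flows by method of moments
g = skew(log_flows)
p_lp3 = pearson3.sf(q, g, loc=log_flows.mean(), scale=log_flows.std(ddof=1))

# Stationary design-life risk, Eq. (5a): 1 - (1 - p)^n
risk_gev = 1.0 - (1.0 - p_gev) ** n_years
risk_lp3 = 1.0 - (1.0 - p_lp3) ** n_years
print(f"30-year risk -- GEV: {risk_gev:.3f}, LP3: {risk_lp3:.3f}")
```

Because both distributions are fit to the same sample, any gap between the two risk estimates isolates the effect of model form, which is the point made by the blue and red dashed lines in Fig. 6.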
Contrasting the difference between the stationary (blue line) and the non-stationary GEV for the historical time period (grey box plot) illustrates the effect of adding non-stationary parameters to a given model form. Recall that in both cases the GEV model is fit to the historical simulated flows. However, for the stationary approach, model fitting results in a single set of parameters (location, scale and shape), whereas with the non-stationary approach we derive the shape parameter and a set of coefficients for linear models to determine the location and scale parameters based on precipitation and temperature values. Thus, for the non-stationary approach, different location and scale parameters are calculated for every historical cool season month and GCM model (234).
Overall, there is close agreement between the stationary (S) and average non-stationary (NS) location parameters (6.55 S vs. 6.64 NS at Farad and 6.63 S vs. 6.78 NS at Reno). However, for both gauges the scale parameter is lower with the non-stationary approach (1.30 S vs. 0.94 NS at Farad and 1.28 S vs. 0.96 NS at Reno). At Reno the shape parameter is similar (−0.24 S vs. −0.27 NS), but at Farad the difference is somewhat larger (−0.24 S vs. −0.18 NS). Differences in model parameters are reflected in the distance between the stationary GEV model (blue line) and the median historical non-stationary GEV box plots (center of the grey box plots) in Fig. 6. For Reno, the stationary line is closer to the historical box plots. However, at Farad the non-stationary box plots are consistently higher than the stationary line. The larger differences between the stationary and non-stationary models for Farad result from changes in the shape parameter between the stationary and non-stationary model fits. This change demonstrates the sensitivity of model results to changes in model parameters.
As with Fig. 5, Fig. 6 shows significant increases in risk moving into the future and subsequently larger differences between the stationary and non-stationary approach. By the second future period the differences between the stationary and non-stationary models can be as much as 50 % or more. For both gauges, the difference in risk between the non-stationary and stationary approaches grows over time, indicating greater potential to underestimate the future risk if non-stationary parameters are not considered.
Results were also grouped by RCPs to analyze connections between greenhouse gas emission rates and changes in flood risk. As shown in Fig. 7, we observed no clear trend in flood risk based on the different RCPs. This indicates that, for this flood statistic in this basin, the variability between GCM model form and initial conditions likely overwhelms the influence of greenhouse gas emissions when comparing between scenarios. Although we caution that this is not a general finding, for this application we show that the variability between projections within any RCP scenario is larger than the difference between RCP scenarios. Harding et al. (2012) noted similar behavior in their study of the Colorado River basin.
Given the sensitivity of projected risk to model parameters, an obvious question is whether increases in risk over time are similarly sensitive. For the 1065 cms flood plotted in Fig. 6, the increased risk with added project life (i.e., 20 years vs. 10 years) is greater with the non-stationary models than the stationary ones at both stations. This is intuitive, given the increased flood risk with time demonstrated in Fig. 5 for the non-stationary models. Although Farad has higher overall risk, the relative increase in risk between time periods is similar between the two stations. For example, the median 10-year flood risk increases by 21 % for Farad and 29 % for Reno between the first (1950-1999) and second (2000-2049) time periods.
Next, the analysis is expanded to a range of flood magnitudes. Figure 8 plots the flood risk over a 10-year project life starting in 1950, 2000 and 2050 for flood values ranging from 283 to 1416 cms (10 000 to 50 000 cfs). As would be expected, the 10-year flood risk decreases with increasing flood magnitude. The shapes of the curves are slightly different between Farad and Reno; flood risk decreases more sharply with increased flow at Reno than at Farad. Again, this behavior is a function of the shape parameter of the respective GEV distributions. Despite these differences, both gauges display clear shifts between time periods similar to Fig. 5. Again, the median risk for each subsequent period consistently falls outside the interquartile range of the preceding period.
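The risk-versus-magnitude curves of Fig. 8 follow from repeating the Eq. (5b) calculation over a range of thresholds; risk necessarily decreases as the threshold rises. A sketch with hypothetical monthly GEV parameters for a single 10-year project life:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(3)
# Hypothetical monthly GEV locations for one 10-year project life
# (60 cool-season months, log-flow space)
mus = 3.5 + 0.2 * rng.standard_normal(60)
sigma, c = 1.0, 0.25                         # scipy shape convention (xi = -0.25)

thresholds = np.linspace(283.0, 1416.0, 25)  # cms, the range of Fig. 8
p = genextreme.sf(np.log(thresholds)[:, None], c, loc=mus, scale=sigma)
risk = 1.0 - np.prod(1.0 - p, axis=1)        # Eq. (5b) per threshold

print(bool(np.all(np.diff(risk) <= 0)))      # risk falls as magnitude rises
```

How sharply the curve falls with the threshold is governed by the GEV shape parameter, which is why the Farad and Reno curves differ in shape.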
Changes in the median flood risk (i.e., differences between the solid lines in Fig. 8) between each future period and the historical period are plotted in Fig. 9 for both gauges. As would be expected based on the qualitative differences in Fig. 8, the shapes of the Farad and Reno difference curves are slightly different. However, the salient point for this analysis is that the increased risk between periods is generally consistent between the two stations (within 10 %). Overall, the increased risk between the first future period (2000-2050) and the historical period is between 10 and 20 % for flows from 600 to 1200 cms. Similarly, the increased risk from the historical period to the second future period (2050-2099) is between 30 and 50 %. Differences for the highest and lowest flows are difficult to assess because the median is skewed for high and low flow values by the fact that the risk values are bounded between 0 and 100 %.

Summary and conclusions
The analysis presented is unique in its incorporation of non-stationary GEV analysis using CMIP-5 projections and design-life level risk assessment concepts. We present our findings as a relevant case study and an example application of recent developments in non-stationary flood assessment. Lacking sufficient unregulated flow data, we simulate historical floods using the VIC model. Subsequently, we use the simulated floods to fit non-stationary GEV models, with downscaled monthly precipitation and temperature as covariates. Although there are some discrepancies between individual simulated and observed flood events, we demonstrate that the VIC model adequately captures the range of flood magnitudes. Furthermore, we show that the GEV-modeled historical floods are in good agreement with both the VIC-simulated floods and the published flood events (USACE, 2013b).
Discrepancies between historical and simulated events often result from the monthly time step used for covariates. This can affect the ability to model floods that are generated by precipitation that occurs across 2 months. Also, because the climate variables are monthly aggregates, and not event based, large floods can be generated in months with high precipitation even if that precipitation does not occur in one concentrated event. Despite these differences, comparison with historical flood events demonstrates that the GEV model does reasonably well at simulating historical flood magnitudes, even if some individual historical events are not matched exactly.
Using the derived non-stationary GEV models, we generate flood distributions for 234 CMIP-5 climate projections from 1950 to 2099. For the historical 1-day design flood magnitude of 1065 cms, results show significant increases in the frequency of high-flow events in the future. From a water management standpoint, this finding translates directly to increased flood risk. For example, we calculate a 21 % (29 %) increase in the risk of a 1065 cms flood over a 10-year design life for Farad (Reno) from the historical time period to the first future period, and similar increases from the first future period to the second. Increased risk between time periods is also relatively consistent for longer design life periods, and similar shifts in flood risk are noted across a range of flood magnitudes. For both stations, the increased risk from the historical to the first future period is between 10 and 20 % and from the historical to the second future period is between 30 and 50 % for floods ranging from 600 to 1200 cms.
The significant increases in flood risk through time indicate the importance of non-stationary flood frequency analysis for future infrastructure planning, and the potential to underestimate risk when stationarity is assumed. For both stations the difference between the stationary and non-stationary approach increases over time. By the second future period, differences in risk calculations between the stationary and non-stationary models can be 50 % or larger. This finding is in keeping with a number of recent studies (e.g., Griffis and Stedinger, 2007; Katz et al., 2002; Towler et al., 2010) that have highlighted potential applications for non-stationary analysis of flood frequency.
An important consideration for this approach is the sensitivity of results to model parameters. In all cases, the flood risk is higher at Farad than Reno due to the relatively heavier-tailed distribution that was fit. Estimated model parameters differed by station despite the fact that the flow, precipitation and temperature distributions for both locations are very similar. While these changes affected the overall risk projections, the relative increase in risk over time remained consistent between stations. This indicates that the more robust metric from this analysis is the relative increase in flood risk and not the absolute values. This finding is further supported by the fact that absolute flood risk estimates could be impacted by model bias. By focusing on differences in risk, we specifically highlight the impact of non-stationarity on risk assessment as opposed to parameter sensitivity. Similarly, it is important to note that this analysis is based on natural flow estimates and does not include infrastructure development or operation. Therefore, results indicate the potential increase in the underlying natural flood risk in the UTRB over the 21st century.

Figure 1 .
Figure 1. Map of the model domain including the Farad and Reno gauges and their drainage areas.

Figure 3 .
Figure 3. "Observed" unregulated flow estimated from gauge records (blue triangles) compared to VIC-simulated flow (red circles) and the simulated GEV distribution. Boxes span the 25th to 75th percentile of the GEV distribution for a given month and the whiskers extend to the 5th and 95th percentiles.

Figure 4 .
Figure 4. VIC-simulated 1-day flood maxima for November through April, 1950 to 1999 (red lines), compared with the historical GEV distributions (blue line is the median and grey shading is the 5th to 95th percentile range) and the six observed flow rates.

Figure 5 .
Figure 5. Probability of a 1-day flood exceeding the historical maximum of 1065 cms (risk) at Farad and Reno. Solid lines represent the median risk of the 234 climate projections and shading covers the interquartile range (i.e., 25th to 75th percentile).

Figure 6 .
Figure 6. Box plots of the probability of a 1-day flood exceeding 1065 cms (risk) for three project life lengths (10, 20 and 30 years). Results are grouped by time period (1950-1999, 2000-2049 and 2050-2099). Blue dashed lines show the flood risk calculated from the stationary GEV model fit to the historical data.

Figure 7 .
Figure 7. Box plots of the probability of a 1-day flood exceeding 1065 cms (risk) in 10 years for three 50-year periods. Results are grouped by the representative concentration pathway (RCP) used to drive the GCM projection. RCP 8.5 has the largest increase in greenhouse gas concentrations and RCP 2.6 the smallest.

Figure 9 .
Figure 9. Increased probability of flood occurrence for a 10-year project life (risk) from the historical period (1950-1999) to each of the two future periods, 2000-2050 (black) and 2050-2099 (grey). Farad is plotted with a solid line and Reno with a dashed line.

Table 2 .
Negative log likelihood (NLLH) and Akaike information criterion (AIC) scores for each model, as well as the deviance statistics (D) of pairwise comparisons of different model configurations (P is precipitation only, T is temperature only and P & T is both) and the p values of each D score based on a chi-squared distribution. The number of parameters in each model and the models used for comparison are listed at the bottom of the table. The selected model for each station is in bold.
Figure 2. Probability density function of fitted stationary (solid black) and non-stationary (dashed) GEV models compared to historical VIC-simulated flow histogram.

Table 3 .
Summary of derived model coefficients for Eqs. (2) and (3) based on historical observations (historical observed) and using historical simulated data from the 234 CMIP-5 projections (historical simulated interquartile range, IQR).