Future Changes in Regional Tropical Cyclone Wind, Precipitation, and Flooding Using Event‐Based Downscaling

Understanding changes in the hazard component of climate risk is important to inform societal resilience planning in a changing climate. Here, we examine local changes in wind speed, rainfall, and flooding related to tropical cyclones (TCs) and compare them across statistical and dynamical modeling approaches. Our focus region is the Delaware River Basin, located in the northeastern United States. We pair event‐based downscaling with large ensemble climate model information to capture the details of extreme TC wind, rain, and flooding, and their likelihood, in a changing climate. We identify local TCs in the Community Earth System Model 2 Large Ensemble (CESM2‐LENS). We find fewer TCs in the future, but these future storms have higher wind speeds and are wetter. We also find that TCs produce heavier 3‐day precipitation distributions than all other summertime weather events, with TCs constituting a larger percentage of the upper tail of the full precipitation distribution. With this information, we identify a small collection of 200‐year return events and compare the resulting TC rain and wind across dynamical and statistical downscaling methods. We find that dynamical downscaling produces peak rain rates far higher than CESM or the statistical downscaling method. It can also produce quite different future changes in precipitation totals for the small set of events considered here. This leads to vastly different flood responses. Overall, our results highlight the need to interpret future changes of event‐based simulations in the context of downscaling method limitations.


Introduction
Using historical records to understand tropical cyclone (TC) hazards is limited by the short length of these records. Global climate model simulations can extend the record, produce events far beyond historical experience, and respond physically to changing climate forcing (e.g., Cobb & Done, 2017). Yet today's global climate model resolution is too coarse to represent the mesoscale and convective-scale processes critical to the details of wind, rain, and flood hazards associated with TCs (e.g., Cobb & Done, 2017; Davis, 2018; Diffenbaugh et al., 2005). Therefore, several methods have been developed to take coarse information down to the local scale. Such methods include using statistical relationships with observations (statistical downscaling), simulating physical processes at a finer resolution (dynamical downscaling), or applying a combination to specifically selected events (event-based downscaling).
Statistical downscaling methods are computationally efficient and may also correct biases in climate model fields such as precipitation (e.g., delta change or quantile mapping). Such methods are popular for regional climate analysis (Cannon et al., 2015; Fang et al., 2015; Hundecha et al., 2016). However, they are only as good as the large-scale climate information they rely on, and they do not necessarily capture the physical behavior of weather phenomena (Amengual et al., 2012).
Higher-resolution global climate models, for example, are reducing long-standing model errors and increasingly represent multiple scales of motion (Yeager et al., 2023). However, they still lack the resolution to resolve the most intense TCs. Global and regionally refined models that resolve the most intense TCs can only be run for short periods (Maraun et al., 2010; Stevens et al., 2019; Tapiador et al., 2020). Furthermore, it is necessary to examine a large sample of ensemble members to ensure a more robust statement on projected changes (Diffenbaugh et al., 2017). Ensemble boosting is an effective way to increase sample size (Fischer et al., 2023) but requires an additional step to generate higher-resolution information.
One approach developed to alleviate some of these limitations is event-based downscaling, as demonstrated by Huang et al. (2020). The authors used this approach to downscale selected extreme atmospheric river (AR) storms in California. To select events, they searched for ARs during historical and future time periods in the 40 ensemble members of the Community Earth System Model Large Ensemble (CESM1-LENS; Kay et al., 2015). They then ranked events within the study periods and selected the 20 most intense ARs for a set of subregions and time periods. The events were then dynamically downscaled, which allowed the authors to examine precipitation extremes. The approach combines large ensemble data sets with high-resolution weather modeling to understand extreme events. This event-based approach has since been used to understand megaflood risk in California (Huang & Swain, 2022). That study highlighted the translation of dynamically downscaled climate data to land surface processes and provided information on projected increases in runoff extremes. However, this event-based downscaling methodology has not been used to explore intense TCs.
Here, we pair event-based downscaling with large ensemble climate model information to capture the details of extreme TC wind, rain, and flooding, and quantify their likelihood, in a changing climate. Our focus region is the Delaware River Basin, located in the U.S. Northeast. We track TCs impacting the study area in all 100 ensemble members of the Community Earth System Model Large Ensemble Project 2 (CESM2-LENS; Rodgers et al., 2021) and examine how TC wind quantiles are projected to change by the end of the century. Then, we stratify events by TC and non-TC to examine how their precipitation distributions are projected to change. With this information, we extend the previously mentioned event-based downscaling technique by selecting extreme TC precipitation events and comparing the results of dynamical and statistical downscaling methods. Finally, we connect these downscaling methods with hydrologic modeling to examine the flood response to the dynamically versus statistically downscaled information. In doing so, we explore the utility of this expanded event-based downscaling for the case of TC hazards.

Study Site and Data Sets
For this study, we focus on the Delaware River Basin draining to Trenton, New Jersey at U.S. Geological Survey (USGS) gage 01463500. The basin has a drainage area of ∼17,000 km² and is shown in Figure 1. The main channel of the basin is undammed. The physiographic regions within the basin include the mountainous Appalachian Plateau, the Valley and Ridge, New England, and Piedmont provinces, with a majority of the area dominated by forested land cover (Kauffman et al., 2011). The basin is consistently impacted by TC wind and flood hazards (Barthel & Neumayer, 2012; Czajkowski et al., 2013; Mendelsohn et al., 2012) and hazards linked to extratropical transitioning cyclones (Evans et al., 2017; Hart & Evans, 2001; Lee et al., 2022), making it an ideal candidate to examine TC hazards.
The foundational data set for our analysis is CESM2-LENS, which consists of 100 members with a 1-degree spatial grid spacing and uses the Coupled Model Intercomparison Project Phase 6 (CMIP6) historical and Shared Socioeconomic Pathway 370 (SSP370) future radiative forcing scenarios from 1850 to 2100 (Rodgers et al., 2021). Only members 91-100 have stored the 6-hourly upper-air fields required for dynamical downscaling. However, sufficient information is available to identify TCs in all 100 members, allowing us to use the full ensemble to characterize TC climatology as described below.
For a reference data set to use for statistical downscaling and hydrologic model calibration, we utilize daily precipitation and temperature from the Parameter Elevation Regression on Independent Slopes Model (PRISM; Daly et al., 2002) on a 4-km spatial grid. For monthly evapotranspiration (ET) we utilize data from the ERA5 reanalysis data set (Hersbach et al., 2020). For daily soil temperature we utilize data from the ERA5-Land reanalysis data set (Muñoz Sabater, 2019). Soil temperature is only utilized for hydrologic model calibration. We utilize a reference time period from 1981 to 2020.
To examine extreme events impacting the Delaware River Basin from CESM2-LENS, we implement an event-based downscaling approach as previously mentioned. The approach consists of determining the empirical cumulative distribution of 3-day basin average precipitation for TC and non-TC events. We split the data into two 34-year periods, historical (1981-2014) and future (2067-2100), and only focus on events between June and October.

Identifying TCs in CESM2-LENS
Identifying TCs in global climate models typically requires upper-air information at 6-hourly temporal frequency (Prein et al., 2023; Ullrich et al., 2021). However, the full 100-member CESM2-LENS archive has only surface variables stored at 6-hourly frequency. We therefore develop a simple place-based TC detection algorithm that defines a TC as wind speed anywhere in our basin exceeding a threshold value.
To obtain TC wind speed ($V_{\max}$) in CESM2-LENS, we first obtain the 6-hourly sea-level pressure (PSL) time series and then compute $V_{\max}$ using the TC pressure-wind relationship described in Knaff and Zehr (2007) as follows:

$$V_{\max} = 4.4\,(1010 - \mathrm{PSL})^{0.76} \qquad (1)$$

We start with PSL since it is a less variable measure of storm intensity than wind speed and is therefore more likely to represent TC system-scale intensity. Once the 6-hourly $V_{\max}$ is computed, we need to define TC events. To do so, we decluster the $V_{\max}$ time series above a threshold value using a 2-day declustering timescale, which groups sequences of exceedances and is analogous to a single value of $V_{\max}$ per storm. The threshold is chosen such that the resulting TC frequency matches the observed mean frequency within a 500-km radius of the basin center for the period 1981-2020. If an identified 2-day cluster is above the threshold, it is considered a TC. We use observed data from the International Best Track Archive for Climate Stewardship (IBTrACS) Project (Knapp et al., 2010). This step is completed utilizing all 100 members of CESM2-LENS grouped together.
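The detection steps above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: it assumes PSL is given in hPa, clips pressures above 1010 hPa to a zero wind estimate, and treats the series as 6-hourly (four steps per day). The function names `vmax_from_psl` and `decluster` are hypothetical, and the calibration of the threshold against the observed TC frequency is not shown.

```python
from typing import List, Tuple


def vmax_from_psl(psl_hpa: float) -> float:
    """Equation 1: V_max = 4.4 * (1010 - PSL)^0.76 (PSL assumed to be in hPa)."""
    # Assumption: pressures above 1010 hPa give no wind estimate (clipped to zero).
    deficit = max(1010.0 - psl_hpa, 0.0)
    return 4.4 * deficit ** 0.76


def decluster(vmax: List[float], threshold: float,
              steps_per_day: int = 4, gap_days: float = 2.0) -> List[Tuple[int, float]]:
    """Group threshold exceedances closer together than gap_days into single events.

    Returns one (time index of peak, peak V_max) pair per event.
    """
    gap = int(gap_days * steps_per_day)
    events: List[Tuple[int, float]] = []
    last_exceedance = None
    for i, v in enumerate(vmax):
        if v <= threshold:
            continue
        if last_exceedance is not None and i - last_exceedance < gap:
            # Same cluster as the previous exceedance: keep the larger peak.
            if v > events[-1][1]:
                events[-1] = (i, v)
        else:
            # First exceedance, or far enough from the last one: new event.
            events.append((i, v))
        last_exceedance = i
    return events
```

In this sketch, lowering the threshold increases the number of detected events, which is the lever the text describes for matching the observed mean frequency.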
We use the TCs to subset CESM2-LENS 3-day precipitation accumulations into TC and non-TC categories for the period between June and October. Next, we construct the survival functions from the empirical cumulative distribution functions for each precipitation category. For the event-based downscaling, we then select a small set of 3-day precipitation amounts that match or are close to the 200-year return 3-day precipitation value from the previously described survival functions. We choose three TC events and three non-TC events in current and future climates, for a total of 12 events. These events are from the 10 ensemble members that have the 6-hourly upper-air data needed for dynamical downscaling.
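A minimal sketch of this selection logic is given below. It assumes, purely for illustration, that each pooled sample value can be treated as one year of record, so that an exceedance probability of 0.5% corresponds to the 200-year value; the paper's exact definition of exceedance probability may differ. The function names and the `(label, total)` event tuples are hypothetical.

```python
from typing import List, Sequence, Tuple


def survival_function(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Rank-based empirical survival function: (value, exceedance probability) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    # The i-th smallest value (0-based) is equalled or exceeded by n - i samples.
    return [(x, (n - i) / n) for i, x in enumerate(ordered)]


def return_level(values: Sequence[float], exceed_prob: float) -> float:
    """Smallest sample value whose empirical exceedance probability is <= exceed_prob."""
    for x, p in survival_function(values):
        if p <= exceed_prob:
            return x
    raise ValueError("sample too small to resolve this exceedance probability")


def nearest_events(events: Sequence[Tuple[str, float]], target: float,
                   k: int = 3) -> List[Tuple[str, float]]:
    """Pick the k events whose 3-day totals are closest to the target value."""
    return sorted(events, key=lambda event: abs(event[1] - target))[:k]
```

Note that at least 200 samples are needed before the 0.5% level can be resolved at all, which is one way to see why a large ensemble is useful for this kind of estimate.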

Statistical Downscaling
To statistically downscale and bias-correct precipitation, temperature, and ET, we implement empirical quantile mapping (EQM; Amengual et al., 2012) based on functions developed by the Santander Meteorology Group (2015). For precipitation and temperature this is conducted at the daily scale, while for ET it is conducted at the monthly scale. We utilize EQM as it has been shown to work well for the representation of wet days and their intensity in observations (Fang et al., 2015) and is widely utilized in hydrologic studies (Camici et al., 2014; Hundecha et al., 2016; Michalek et al., 2023, 2024; Teutschbein et al., 2011; Tofiq & Guven, 2014). We use the following procedure:
1. For a given day, i, we use the forcing of interest R (i.e., precipitation, temperature) for all the days that fall within a time window of ±15 days (i.e., a 31-day window) for all the years (n = 34 because we focus on the 1981-2014 period). Therefore, we obtain a vector R_i that has 1,054 values (i.e., 31 × 34) for the ith day (training data set). We also collect the ith day of the years in the historical and future periods (test data set). This is done for the reference data set (i.e., PRISM) and each CESM2-LENS ensemble member.
2. The training data set is used to derive statistical differences between the reference (i.e., PRISM) and each CESM2-LENS ensemble member based on the empirical distribution with the EQM method.
3. Next, the test data set is used to perform the respective bias correction and statistical downscaling to correct the statistical difference found in Step 2. This result is what is utilized for further analysis.
We apply the EQM method utilizing all data from 1981 to 2014 and 2067 to 2100 and not just for each event. This is completed for all 100 ensemble members. For ET, the same procedure is applied but using monthly data from ERA5.
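The procedure can be illustrated with a bare-bones quantile-mapping sketch. This is not the Santander Meteorology Group implementation and omits refinements such as wet-day handling and the treatment of values outside the training range; it only shows the core idea of locating a model value in the model's training distribution and reading off the same quantile from the reference distribution. The names `window_days` and `eqm_correct` are hypothetical, and a 365-day year is assumed.

```python
from bisect import bisect_right
from typing import List, Sequence


def window_days(day_of_year: int, half_width: int = 15,
                year_length: int = 365) -> List[int]:
    """Days of year in the +/- half_width window around a day, wrapping at year end."""
    return [(day_of_year - 1 + offset) % year_length + 1
            for offset in range(-half_width, half_width + 1)]


def eqm_correct(model_train: Sequence[float], ref_train: Sequence[float],
                model_value: float) -> float:
    """Map a model value to the reference value at the same empirical quantile.

    Both training samples need at least two values.
    """
    model_sorted = sorted(model_train)
    ref_sorted = sorted(ref_train)
    n = len(model_sorted)
    # Empirical non-exceedance probability of the value in the model sample,
    # clipped to [0, 1] so out-of-range values map to the reference extremes.
    rank = bisect_right(model_sorted, model_value)
    p = min(1.0, max(0.0, (rank - 1) / (n - 1)))
    # Linearly interpolated quantile of the reference sample at probability p.
    pos = p * (len(ref_sorted) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ref_sorted) - 1)
    frac = pos - lo
    return ref_sorted[lo] * (1.0 - frac) + ref_sorted[hi] * frac
```

Pooling the 31 days returned by `window_days` over 34 years gives the 1,054 training values per calendar day described in Step 1.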

Dynamical Downscaling
Next, for dynamical downscaling of the events, we utilize the Weather Research and Forecasting (WRF) model version 4.5 (Skamarock et al., 2019). The WRF model has a long history of successful TC simulations (Powers et al., 2017). We provide a table of the WRF physics options in Supporting Information S1 (Table S1). To perform downscaling, we use the outputs of CESM2-LENS to drive WRF. The CESM2-LENS output data are formatted following Bruyère et al. (2019), and the corresponding code is provided in the link within the availability statements. We implement two domains in WRF, referred to as domain 1 (d01) and domain 2 (d02) in Figure 1, with spatial grid spacings of 20 and 4 km, respectively. We select the domain sizes to step down in grid scale from CESM2-LENS (∼100 km) to the 4 km needed to resolve intense TCs (Davis, 2018). The greenhouse gas forcing in WRF is set to match SSP370. We run the event simulations for 7 days to allow time for WRF to spin up the intensity and fine-scale information of the events prior to landfall. All WRF-related analyses shown in Sections 3 and 4 are based on data from d02.

Hydrologic Modeling
For our hydrologic analysis we utilize the Hillslope-Link Model (HLM; Mantilla et al., 2022) to translate event-based precipitation events into flood events. Specifically, we utilize the TETIS version of the HLM (Quintero & Velasquez, 2022), which requires inputs of precipitation, temperature, soil temperature, and ET, and is set up to provide streamflow simulations for every channel in the basin. We utilize daily precipitation, daily temperature, daily soil temperature, and monthly ET, with all inputs spatially distributed across the study basin. Soil temperature is only utilized to determine whether the ground is frozen. For hillslopes and channel links we utilize the hydrography from HydroSHEDS (Lehner et al., 2008; Lehner & Grill, 2013), which produces a network with ∼4,000 channel links for our study basin. We calibrate the model with the Dynamical Dimension Search (DDS) algorithm (Tolson & Shoemaker, 2007) based on daily average discharge at the USGS gage previously mentioned. The optimization function is set up to maximize the Kling-Gupta Efficiency (KGE), defined as:

$$\mathrm{KGE} = 1 - \sqrt{(\rho - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}$$

where ρ, α, and β denote the correlation, the ratio of standard deviations, and the ratio of means between simulated and observed streamflow, respectively. KGE values close to 1 imply an optimal estimation. The model is calibrated from 1981 to 2000 and validated from 2001 to 2020, as daily average discharge observations are available across the entire period. The calibration and validation are completed utilizing the forcings described in Section 2.1. For daily precipitation, the total is distributed uniformly over a 24-hr period to get an hourly intensity, which is input to the HLM. The DDS algorithm is run for 1,000 iterations to optimize 10 global parameters. Additionally, we estimate the Nash-Sutcliffe Efficiency (NSE) for the calibration and validation of the best model selected, defined as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (S_i - O_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}$$

where $S_i$ is the simulated discharge for each time step i, $O_i$ is the observed value, $\bar{O}$ is the mean of the observed values, and N is the total number of values within the period of analysis. NSE values range from −∞ to 1, and values near 1 are optimal. We utilize the KGE over the NSE for the optimization function as it improves upon the NSE's deficiency in assuming the optimal reference is the mean. A link to the model can be found in the Data Availability Statement section.
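For readers who want to reproduce the two skill scores, a minimal sketch using only the Python standard library is shown below. It follows the standard definitions of KGE (one minus the Euclidean distance of the correlation, standard deviation ratio, and mean ratio from their ideal value of 1) and NSE; it is not the calibration code used with the HLM, and population standard deviations are an assumption of this sketch.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def kge(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Kling-Gupta Efficiency from correlation (rho), std ratio (alpha), mean ratio (beta)."""
    mean_sim, mean_obs = mean(sim), mean(obs)
    std_sim, std_obs = pstdev(sim), pstdev(obs)
    covariance = mean([(s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs)])
    rho = covariance / (std_sim * std_obs)   # undefined if either series is constant
    alpha = std_sim / std_obs
    beta = mean_sim / mean_obs
    return 1.0 - math.sqrt((rho - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def nse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 minus squared error relative to the observed-mean model."""
    mean_obs = mean(obs)
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    variance_term = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / variance_term
```

A simulation that reproduces the observations exactly scores 1 on both metrics, while a simulation equal to the observed mean at every step scores 0 on NSE.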
To hydrologically model each event, we drive our hydrologic model with precipitation, temperature, and ET from the native CESM2-LENS outputs, the statistically downscaled outputs, and the WRF simulations. For the WRF hydrologic simulations, we run two sets: the first uses the hourly precipitation directly from the downscaling, and the second aggregates the precipitation to the daily scale (i.e., the total for a day), similar to the statistically downscaled (EQM) and native CESM inputs. This is completed for all selected events. A 3-day buffer around the selected events is added so that the modeling time (7 days) can capture the hydrologic behavior. Initial conditions at the start of each 7-day simulation set are taken as the conditions from a summertime bankfull event from the PRISM simulations described previously. As previously mentioned, for daily precipitation, the total is distributed uniformly over a 24-hr period to get an hourly intensity, which is input to the HLM. Finally, the ground is assumed not to be frozen for each of the 12 events.
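The two temporal treatments of precipitation described above amount to simple bookkeeping, sketched here for clarity. The function names are hypothetical and the sketch assumes complete 24-hour days with totals in millimeters.

```python
from typing import List, Sequence


def daily_to_hourly(daily_totals_mm: Sequence[float]) -> List[float]:
    """Spread each daily total uniformly over 24 hours (mm per hour)."""
    hourly: List[float] = []
    for total in daily_totals_mm:
        hourly.extend([total / 24.0] * 24)
    return hourly


def hourly_to_daily(hourly_mm: Sequence[float]) -> List[float]:
    """Aggregate an hourly series into daily totals (assumes whole days)."""
    return [sum(hourly_mm[i:i + 24]) for i in range(0, len(hourly_mm), 24)]
```

Aggregating an hourly series to daily totals and redistributing it uniformly preserves the event volume but removes the sub-daily peaks, which is the difference between the two WRF-driven simulation sets.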

TC Frequency and Intensity
To begin exploring the projected changes in TC hazards impacting the Delaware River Basin, and consequently evaluate our methodology for determining TCs in CESM2-LENS, we present the number of TCs per year per ensemble member stratified by decade in Figure 2. The horizontal line represents the average number of observed TCs per year (2.2) for the period 1981-2020 for the study basin based on observed storm tracks from IBTrACS. To reiterate, the observed mean frequency (2.2) is used to set a threshold wind speed and decluster wind speed exceedances into single storm events in the CESM2-LENS data. We find the median number of TCs for decades within the reference period is centered around the observed average for the period and proceed to use this TC classification across the entire period of CESM2-LENS. From 1850 to 1900, our results highlight that the median number of TCs per year across all ensemble members is consistently near 2.8. However, after 1910 we find the number of TCs per year decreases to roughly 1.9 by the 2090s. Our results are in line with a state-of-the-science summary for Atlantic hurricanes and climate change by the National Oceanic and Atmospheric Administration (NOAA, 2023), whose survey of existing studies indicates that the total number of Atlantic TCs is projected to decrease by 15%. Therefore, we deem our TC tracking suitable for stratifying TC events for our study area.
Next, we analyze future changes in TC maximum wind speeds. Figure 3 presents the results of quantile regression to quantify temporal trends in the maximum wind speed from 1850 to 2100 based on the methodology from Elsner and Jagger (2013). The magnitude of the change increases with wind speed magnitude, in agreement with prior work (e.g., Elsner et al., 2008; Knutson et al., 2020). Furthermore, the wind speed quantiles above 0.5 show a statistically significant temporal trend (p < 0.05). Finally, it should be noted that the uncertainty bounds are relatively consistent across upper wind speed quantiles.

Precipitation
With the tracking and analysis of TCs in CESM2-LENS complete, we now shift our focus to the projected changes in basin precipitation. To begin, we stratify 3-day precipitation totals that occur between June and October by whether they are TC or non-TC. Figure 4 shows the survival functions for the historical (blues) and future (reds) non-TC and TC precipitation events utilizing all 100 members of CESM2-LENS. Additionally, we compute the same functions based on observations using PRISM and IBTrACS. We first notice that TC precipitation curves are higher than non-TC precipitation curves in both PRISM and CESM2-LENS. We also notice that CESM2-LENS precipitation has an overall slight low bias compared to PRISM in all but the most intense events. Non-TC 3-day precipitation events show minor change by the end of the century (Figure 4, light gray line). For the extreme exceedance probabilities, where differences are more pronounced, we find the CESM2-LENS values are within the observed confidence intervals as the intervals become relatively large. For TC 3-day precipitation, our results indicate the differences between historical and future survival functions are much more pronounced compared to non-TC ones. Furthermore, we find that the future TC curve shows a projected increase across all exceedance probabilities compared to the historical curve. Finally, it should be noted that we utilize all 100 ensemble members as it takes ∼50 ensemble members to produce a stable 200-year 3-day precipitation value (Figure S1 in Supporting Information S1).
Next, we examine the TC fraction of total precipitation for extreme exceedance probabilities shown in Figure 5. First, our results indicate that CESM2-LENS members capture the observed historical fraction well. Additionally, we find that the fraction of precipitation related to TC events is not projected to change substantially by the end of the century. For all the exceedance probabilities except 5%, our results indicate a slight decrease in the TC fraction for the future period compared to the historical one, but the difference is less than 2%. Connecting these results with the previous analysis, we ascertain that TCs are expected to become more intense, not just in terms of wind but also in precipitation. To explain, despite the decrease in the number of TCs, we find that their contribution to the total precipitation will remain about the same by the end of the century, pointing to an increase in rainfall per TC.

Event Downscaling
Based on the survival functions from Section 3.2, we determine the 200-year (0.5%) 3-day precipitation accumulations (Figure 4, dashed line) for the historical and future non-TC and TC distributions. Our approach therefore differs from the methodology originally proposed by Huang et al. (2020), who selected the top 20 ranked events. We calculate the associated 3-day precipitation 200-year value for non-TC historical and future periods as 79 and 81 mm, respectively. For TCs, we find values of 131 mm for the historical period and 167 mm for the future. We select three events that are near each of these values for downscaling. Details of these events are provided in Table S2 in Supporting Information S1. One important aspect of this selection to note is the difference in non-TC versus TC event selection. Since there are vastly more non-TC events, we select events that match the 200-year return value. However, for the historical and future TCs, the number of events is much smaller, so we select events that match or are close to the 200-year value while also maximizing the maximum wind speed. This is one of the primary limitations of this study, as only 10 ensemble members have upper-air information for dynamical downscaling. If a larger sample size were available, our TC events would be closer to the 200-year value. The tracks of the six selected TC events are shown in Figure 1.

Event Precipitation and Wind
For each of the 12 events, we examine the impact of the downscaling method on the precipitation for our study basin. To begin, we compare the 3-day basin average precipitation values for each method to the original CESM2-LENS value selected, as provided in Table S2 in Supporting Information S1. For non-TC events, we determine that the native CESM produces an average across events of 79 and 81 mm for the historical and future periods, respectively. For the EQM downscaling, we find that the historical and future event averages are 94 and 101 mm, respectively. For WRF, we calculate the historical non-TC event average as 122 mm, with a future average of 222 mm. Our findings highlight that the WRF simulations indicate a future increase on average in non-TC precipitation compared to the native CESM data, which is not shown in the survival function analysis. Additionally, for non-TC events our results highlight that WRF simulations produce higher precipitation amounts overall compared to native CESM and EQM downscaling. For TC events, we determine the native CESM historical and future event average values as 116 and 166 mm, respectively, or a 43% increase in the future. For the downscaling methods, we find that EQM produces a relative change of 24%, with a historical average of 159 mm and a future average of 198 mm, whereas WRF produces average values of 155 and 208 mm for the historical and future periods (i.e., a 34% increase). Furthermore, when examining the average accumulated precipitation, we find there is more agreement in the relative change of TC events compared to non-TC events. For non-TC events, WRF shows more than twice the projected increase of EQM. Overall, our results indicate a potential for an increase in precipitation across the basin for future 200-year events, which is not as apparent when examining the survival functions, and highlight how the choice of downscaling method can impact the magnitude of projected changes.
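The relative changes quoted above follow from simple arithmetic on the event averages, as the short sketch below illustrates. The dictionary simply restates the means given in the text; because these are rounded values, recomputed percentages can differ from the reported ones by a fraction of a percentage point.

```python
# (historical mean, future mean) of 3-day basin-average precipitation in mm,
# restated from the event averages quoted in the text.
EVENT_MEANS_MM = {
    ("non-TC", "CESM"): (79, 81),
    ("non-TC", "EQM"): (94, 101),
    ("non-TC", "WRF"): (122, 222),
    ("TC", "CESM"): (116, 166),
    ("TC", "EQM"): (159, 198),
    ("TC", "WRF"): (155, 208),
}


def percent_change(historical: float, future: float) -> float:
    """Relative change from the historical to the future mean, in percent."""
    return 100.0 * (future - historical) / historical


CHANGES = {key: percent_change(*pair) for key, pair in EVENT_MEANS_MM.items()}

for (event_type, method), change in CHANGES.items():
    print(f"{event_type:6s} {method:4s} {change:6.1f}%")
```

Laid out this way, the spread across methods is visibly smaller for TC events than for non-TC events, where the WRF change is far larger than the EQM change.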
Next, we focus on wind speed and the impact of dynamical downscaling on a dynamic parameter related to simulating TCs. Specifically, we compare the basin maximum daily average wind speed at 10 m above the surface for the selected TC events between native CESM and WRF simulations over the 7-day time period, as shown in Figure 6. Wind speed is an important mechanism driving rainfall distribution (Lu et al., 2018) as well as an important factor to capture for wind risk. Our results show that WRF downscaling produces larger daily average maximum wind speeds across the basin compared to native CESM, especially for the historical events (Figure 6, top row), where WRF increases wind speeds up to three times that of CESM. For two of the future events, the wind speed peaks in the time series are similar but still higher for WRF downscaling. The time series shape is similar in pattern between both sets, but WRF tends to produce higher wind speeds over the basin along with larger variances in the time series. The higher wind speeds in WRF are likely to be primarily a consequence of the smaller grid spacing, which can resolve sharper pressure gradients across TC eyewalls (Davis, 2018). Finally, our analysis reveals that the events in the historical and future periods have similar maximum values, so no statement on projected changes can be made.

Event Floods
Next, we focus on peak flood magnitudes for each of the selected events based on hydrologic simulations. To begin, we expand on our precipitation analysis and analyze the time series of precipitation based on the 7-day window utilized for hydrologic simulations. Figure 7 shows the basin average precipitation intensity for the simulation period. For this analysis, we aggregate the intensity for WRF simulations to the daily scale for a more appropriate comparison and also provide the WRF hourly results. Our results highlight how the timing of the hyetographs is similar between methods; however, the basin average intensity is greater with WRF compared to the EQM approach even when aggregating to the daily scale. We find that the WRF daily aggregated precipitation basin average intensity can be 1-3 times larger compared to the native climate model forcings or EQM downscaling. In the future events for both non-TC and TC, the WRF peaks at the daily scale tend to be much higher than those of EQM or native CESM, whereas the historical events produce more similar peaks. For certain events (non-TC: 1988-08-27, non-TC: 1995-10-12, and TC: 2092-10-22), the peak values with WRF are less than or equal to those of EQM. Additionally, these plots illustrate how the EQM statistical downscaling produces basin average precipitation intensity that is slightly larger than, but overall similar to, that of the native CESM2-LENS, in contrast to WRF. For the WRF hourly results, Figure 7 shows that the peak intensity is much higher compared to all the other methods; since the basin is large (∼17,000 km²), these differences in basin average intensity would lead to large differences in the volume of precipitation entering the hydrologic system. These differences highlight the impact of downscaling and temporal resolution on precipitation extremes driving flooding for non-TC and TC events.
The next step in our analysis is to understand how precipitation from selected events translates to floods. For these simulations we utilize the HLM as described in the methods, set up with the calibration shown in Figure S2 in Supporting Information S1. For our calibration period from 1981 to 2000, the KGE is 0.86, with a score of 0.85 for the validation period (2001-2020). The NSE values for the calibration and validation periods are 0.77 and 0.71, respectively. These values show strong model performance, as acceptable NSE values for daily scale simulations are above 0.5 (Moriasi et al., 2015). For KGE, acceptable criteria are not as clear, but the value equivalent to an NSE of 0 (mean performance) has been shown to be approximately −0.41 (Knoben et al., 2019). We exceed this value and are close to an ideal KGE of 1. As shown in Figure S2 in Supporting Information S1, our model setup captures most of the observed peaks at the daily scale. The average monthly peak discharge bias for the months of interest is 1.07. We therefore deem this model suitable for the simulation of events at the Delaware River Basin site near Trenton, New Jersey.
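As a brief check on the KGE benchmark, the score of a simulation equal to the observed mean at every time step can be worked out directly from the KGE definition. This sketch assumes the usual convention that the correlation of a constant series with the observations is taken as ρ = 0; the ratio of standard deviations is then α = 0 and the ratio of means is β = 1:

$$\mathrm{KGE}_{\text{mean}} = 1 - \sqrt{(0 - 1)^2 + (0 - 1)^2 + (1 - 1)^2} = 1 - \sqrt{2} \approx -0.41$$

Any KGE above this value therefore improves on the mean-flow benchmark that corresponds to NSE = 0.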
For the quantification of the future changes in extreme precipitation and flooding, our results show large sensitivity to the downscaling method. To explain, we examine the event maximum discharges for the 12 events of interest as shown in Figure 8. The results are stratified by event type and period. For WRF, we conduct simulations aggregating the precipitation to the daily scale as well as with the output hourly rates.
For non-TC events, our results indicate the native CESM produces historical and future flood peaks with similar magnitudes across events, which matches our findings from the previous precipitation analysis. Furthermore, we show that EQM simulations for non-TC events have similar magnitudes for the historical and future periods as well (Figure 8, black points). However, for the WRF flood peak simulations, our results in Figure 8 highlight that there is large variability in the historical and future non-TC peak magnitudes (blue points) for both hourly (square) and daily (circle) event simulations. The event magnitudes for the historical and future non-TC events are smaller when using daily scaled and uniformly distributed precipitation compared to the hourly rainfall obtained directly from WRF. Furthermore, we find the non-TC WRF flood peaks, for both sets, indicate a projected increase for the future compared to the historical simulations.
Regarding TC events, we show a projected increase in future flood peaks for CESM and EQM simulations. For flood peaks of the TC events based on WRF simulations (hourly and daily), we ascertain that no clear pattern of future change can be deduced due to large variability between event simulations. However, the WRF-Daily simulations produce results more similar in magnitude to those of the EQM simulations. Overall, our results highlight the need to interpret future changes in the context of the limitations of the chosen downscaling method and precipitation aggregation type.

Concluding Discussion
Our study is one of the first to couple large climate model ensemble analysis with event-based downscaling to study regional TC wind, rain, and flood hazards. In doing so, we explore the characteristics of the most intense TCs, while at the same time quantifying their likelihood of occurrence. To perform our analysis, we tracked TCs in CESM2-LENS ensemble members based on maximum wind speed and determined those that impacted our study basin within a 500-km radius. Next, we constructed survival functions for non-TC and TC precipitation in historical (1981-2014) and future (2067-2100) periods. This allowed us to examine changes in regional TC frequency, wind speed, and precipitation. From here, we selected 12 events that matched or were close to the 200-year return precipitation amount and downscaled them to study the regional character of these rare non-TC and TC events and associated flooding. We compared downscaled information produced using statistical (EQM) and dynamical (WRF) approaches.
Based on our tracking of TCs impacting the Delaware River Basin, we found that wind speed quantiles are projected to increase by the end of the century. This signal is detectable because of the large sample size (100 ensemble members) combined with the long time period (1850-2100), which helps to identify trend signals (Amorim & Villarini, 2024).
From an impacts perspective, the trends in wind speed quantiles are important to take into account, as studies such as Cui and Caracoglia (2016) have shown that structural damage and intervention costs for major cities on the U.S. East Coast are projected to increase as wind speed distributions shift toward larger median speeds and thicker upper tails. Finally, the primary limitation of our wind speed analysis is that we use the native climate model output from CESM2-LENS, and such models can have difficulty producing hurricane intensities above category 3 (Bender et al., 2010).
Next, we shift our focus to precipitation as simulated by CESM2-LENS. Consistent with prior work, precipitation amounts in CESM2-LENS are found to increase in the future (Diffenbaugh et al., 2005; Janssen et al., 2014, 2016) and for TCs specifically (Gori et al., 2022; Xi & Lin, 2022). Regionally, some studies (Swain et al., 2020; Wright et al., 2015) found that for the U.S. Northeast, near our study site, 1-day precipitation return periods are projected to increase by the end of the century. However, many such studies do not stratify events by event type (i.e., non-TC vs. TC). We find that TC precipitation increases more than non-TC precipitation. We also find that the fraction of the heaviest 3-day precipitation events associated with TCs remains similar in the future. These precipitation changes, taken together with the wind speed changes, indicate that, while TCs may be fewer in the future, they are likely to be potentially more damaging.
For the non-TC events, we did not stratify or track the type of events in CESM2-LENS. Based on work by Prein et al. (2023), non-TC extreme precipitation over the U.S. Northeast is most commonly associated with mesoscale convective systems and fronts associated with weather systems tied to upper-level jets. It is likely that these are the events dominating the non-TC fraction of precipitation for our study. However, future work should explore the tracking and selection of events from these weather phenomena.
Based on our downscaling results, dynamical simulations typically produced higher event flood peak values compared to native CESM or statistical downscaling. However, aggregating precipitation from dynamical downscaling to the daily scale and uniformly redistributing it over the day produced magnitudes similar to those of statistical downscaling. The native CESM produced the smallest flood peak values, likely due to the spatial resolution of the input data (Quintero et al., 2022). The higher flood peaks found from hourly dynamical downscaling are likely due to a combination of higher localized rain rates and differences in the other hydrologic model inputs (i.e., temperature and ET). Furthermore, analysis of the spatial pattern of WRF precipitation found that some locations received over 500 mm of precipitation per event. The daily aggregation and redistribution likely removed some of these extreme precipitation events, lowering the flood peak magnitudes to potentially more "realistic" values. However, this difference may be due to our model setup not being robust to different temporal distributions of precipitation and should be explored in the future. Finally, while statistical downscaling produces more "realistic" flood peaks compared to observations in a climatological sense, the future scenarios are constrained by scaling based on historical bias corrections and empirical distributions for a given cell.
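The daily aggregation and uniform redistribution discussed above can be sketched as follows. This is a minimal illustration for a single time series, assuming the series starts at a day boundary and contains whole days. The function name and the example values are hypothetical, and the handling of spatial fields in the actual workflow is not reproduced here.

```python
def redistribute_daily_uniform(hourly_mm):
    """Sum hourly precipitation to daily totals, then spread each
    daily total evenly across that day's 24 hours."""
    if len(hourly_mm) % 24 != 0:
        raise ValueError("expected a whole number of days of hourly values")
    uniform = []
    for start in range(0, len(hourly_mm), 24):
        daily_total = sum(hourly_mm[start:start + 24])
        uniform.extend([daily_total / 24.0] * 24)
    return uniform


# Hypothetical two-day event (mm per hour), NOT data from the study:
# a short, intense burst on day 1 and steady light rain on day 2.
hourly = [0.0] * 10 + [5.0, 40.0, 60.0, 15.0] + [0.0] * 10 + [2.0] * 24
uniform = redistribute_daily_uniform(hourly)

print("event total (hourly, uniform):", sum(hourly), sum(uniform))
print("peak rate (hourly, uniform):", max(hourly), max(uniform))
```

The event total is preserved while the peak rain rate is flattened, which is one plausible reason the daily-redistributed forcing yields lower simulated flood peaks than the hourly forcing.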
We find large sensitivity of TC precipitation and flooding to the downscaling method, and dynamical downscaling approaches can provide quite different views from statistical downscaling for these rarest of events. This highlights a need for studies to interpret results within the context of the limitations of the chosen method. Our small sample size of downscaled events precluded robust identification of future change signals in extreme TC precipitation and flooding. More members would help to determine the median behavior of events, and we encourage future studies to explore the optimal number of samples for this methodology. Finally, we also suggest that future work examine the impact of dynamical downscaling on climate projection uncertainty compared to the original climate models (e.g., Blanusa et al., 2023; Sørland et al., 2018), as well as determine how well the statistics of observed behavior (i.e., precipitation, temperature) are captured by downscaled simulations (Pierce et al., 2013), to guide the determination of the optimal event sample size.
We are only just beginning to understand the potential of such multi-scale modeling systems. Historically, new modeling approaches introduced into climate research have allowed for new ways to understand climate. In addition, connecting with other disciplines such as storm surge modeling, agriculture, engineering, or finance becomes easier when climate information is produced at the scales needed to assess impacts. In a changing climate, these systems are needed now more than ever.

Figure 1. The Delaware River Basin at Trenton, New Jersey (gray shaded area) with the Weather Research and Forecasting (WRF) domains d01 and d02 utilized (black rectangles) and TC tracks for the three historical (cool colors) and three future (warm colors) periods selected from CESM2-LENS (colored lines).

Figure 2. The distribution of the number of TCs per year, stratified by decade, incorporating 100 ensemble members of CESM2-LENS from 1850 to 2100 impacting a 500-km radius of the Delaware River Basin's centroid. The red line represents the average number of TCs per year (2.2) from observed storms within a 500-km radius taken from IBTrACS.

Figure 3. Quantile regression slopes versus quantile for TC wind speed stratified for CESM2-LENS events impacting the Delaware River Basin. The 95% confidence interval (gray) is based on a bootstrap method with 1,000 iterations.

Figure 4. Exceedance probability of 3-day basin average precipitation for the Delaware River Basin for TC and non-TC associated events. PRISM events, represented by gray and black lines for non-TC and TC, respectively, are shaded with 5th and 95th confidence intervals. Historical and future CESM2-LENS curves are represented in blue and red, respectively, containing information from all 100 ensemble members. The dotted line represents the 99.5% or 200-year probability.

Figure 5. The fraction of total precipitation associated with TCs versus basin average 3-day precipitation exceedance probability for the Delaware River Basin at Trenton, New Jersey. Observed precipitation is represented by PRISM (gray, 1981-2014), while analysis based on the 100 members of CESM2-LENS is represented for the historical (1981-2014) and future (2067-2100) periods with blue and red, respectively. The numbers represent the ensemble average count of TC events within each probability bin, where PRISM only consists of "one ensemble member."

Figure 6. Basin maximum daily average wind speed [m/s] at 10 m above the surface for the six selected TC events. The native CESM2-LENS is represented in red and WRF downscaling in blue. The event date is provided in the panel header.

Figure 7. Basin average precipitation intensity [mm/hr] for the 12 selected storms. The native CESM2-LENS is represented in red, EQM downscaling in black, WRF downscaling aggregated to the daily scale in blue, and WRF hourly in teal. The event date and tropical storm association are provided in the panel header.

Figure 8. Peak flow discharges from the 7-day hydrologic simulations for each of the 12 events stratified by event type and downscaling method.