A flood inundation forecast of Hurricane Harvey using a continental-scale 2D hydrodynamic model.

Forecasts of tropical cyclones have seen rapid improvements in recent years as expanding computational capacity permits more runs of finer resolution meteorological models with increasing representation of physical processes. However, the hydrodynamic component is often neglected in these forecast systems, meaning flood forecasts typically output point water levels that give little indication of the projected inundation extent on the ground. Here, we append this critical component to the forecast cascade by coupling Fathom-US, a continental-scale hydraulic model which employs the LISFLOOD-FP numerical scheme, to forecasts of streamflow, rainfall and coastal surge height from the National Oceanic and Atmospheric Administration (NOAA). Medium-term (2-15 days) flood inundation forecasts, as well as hindcasts driven by real-time observations, were executed for Hurricane Harvey by rapidly simulating pluvial and coastal flood hazard and extracting fluvial flood maps from an existing US-wide simulation library. The resultant ~30 m resolution depth grids were then validated against post-event observations collated by the US Geological Survey. Across the disaster zone, the hindcast (forecast) model captured, on average, 78% (75%) of the benchmark flood extent, obtained a Critical Success Index of 0.66 (0.57) and deviated from observed high water marks by ~1 m (~1.2 m). When compared to a simpler GIS-based approach, the hydraulic model exhibited much higher skill in replicating observations. This study shows that fully hydrodynamic approaches can be practicably employed in large-scale forecast frameworks at high resolution to produce skillful projections of inundation extent without significantly affecting the forecast lead time.


Introduction
Flood events are among the most costly and deadly natural disasters on the planet: since 1980, they have caused economic damages of over $1 trillion and 220,000 deaths worldwide (Munich Re, 2018a). Recent devastating events, particularly flooding (both freshwater and saltwater) arising from tropical cyclones, have sharply focused the issue in the minds of the public and policy makers alike. In 2017, Hurricanes Harvey, Irma and Maria collectively caused $220 billion of damage in the Caribbean and the Gulf of Mexico; Typhoon Haiyan claimed the lives of over 6000 people and caused $10 billion worth of damage in Southeast Asia in 2013; and in 2008, Cyclone Nargis caused 140,000 fatalities and $4 billion of economic damage in Myanmar (Munich Re, 2018b). A growing body of evidence suggests that, as a result of climate change, such tropical cyclones will become more intense (Kang and Elsner, 2016; Sobel et al., 2016; van Oldenborgh et al., 2017; Emanuel, 2017) and move more slowly once they make landfall (Kossin, 2018). With more precipitation falling over a longer duration, tropical cyclone driven flood impacts are likely to increase in the future. On top of the freshwater component of such events, the low atmospheric pressure and strong onshore winds arising from tropical cyclones result in coastal inundation. Regardless of potential changes to these storm characteristics under climate change, higher sea levels in a warming world are likely to exacerbate coastal flood impacts.
In light of this, there is a clear need for substantial risk reduction measures to mitigate present and future flood consequences. One facet of such measures is improved flood forecasting, which permits a short-term response to be mounted (e.g. erection of temporary defenses, evacuations, first-responder preparedness, reinsurance purchasing). A typical flood forecasting approach can be conceptualized as a source-pathway-receptor framework (e.g. Narayan et al., 2012). Direct meteorological and hydrological observations (e.g. of rainfall from gauges or radar, of wind speed from anemometers or radar, or of river flows from an upstream gauge) can form the source data of the forecast cascade if spatial coverage is sufficient, but such information is often unavailable or yields forecasts with short lead times (a few hours to perhaps 1-2 days at most in very large basins), limiting its usefulness. To increase lead times, medium-term forecasts (2-15 days) use numerical weather prediction (NWP) models as the primary input to the model cascade, and NWP systems have benefitted from the rapid advances in computational capacity of recent years. High performance computing (HPC) resources have had the dual effect of permitting NWP models to run at increasingly fine spatial and temporal resolutions, so that atmospheric dynamics can be more accurately represented (Buizza et al., 1999), and allowing multiple NWP simulations to be run as an ensemble to account for underlying model uncertainties (Cloke and Pappenberger, 2009).
Pathway models translate the source-generated meteorological variables (e.g. precipitation, wind speed) into water flows (e.g. streamflow, coastal surge height), except where the rainfall output from the source forms a direct input to a hydraulic model (i.e. for pluvial flood events). For riverine flood events, a hydrological model takes meteorological inputs and computes, with varying levels of physical complexity, water fluxes at the land surface based on soils, land cover and topography to generate streamflow. For example, the European Commission Joint Research Centre (JRC) LISFLOOD (van der Knijff et al., 2010) and the European Centre for Medium-Range Weather Forecasts (ECMWF) HTESSEL (Balsamo et al., 2011) models are used in the JRC-ECMWF forecast product GloFAS (Alfieri et al., 2013), while the National Oceanic and Atmospheric Administration (NOAA) National Water Model is driven by WRF-Hydro (Gochis et al., 2018) to forecast river discharge in the US. For coastal storm surges, forecast wind fields drive models of ocean fluid motion, which simulate surge height at given coastal locations (e.g. the NOAA SLOSH model; Jelesnianski et al., 1992).
The source and pathway components of the forecast framework have received much attention in the literature (Cloke and Pappenberger, 2009; Thielen et al., 2009; Pappenberger et al., 2010; Alfieri et al., 2012, 2013) and form the products of the world's leading forecast centers (NOAA and ECMWF), but a receptor model is an often-neglected component of the forecast cascade. A receptor model in the framework described here takes input water flows from pathway or source models (e.g. streamflow, rainfall or coastal surge height) and translates them into a 2D grid of flood depths using a hydraulic model. Where a receptor model is used at all (many forecasts provide point water levels or flows only), such products fall short in at least one respect: they are not operational and focus on a single peril (Pappenberger et al., 2005; Schumann et al., 2013); simulate over small spatial scales (Addor et al., 2011; Nguyen et al., 2016); require significant manual set-up and have demanding data requirements (Sanders et al., 2010; Bhola et al., 2018; Adams et al., 2018); or employ simplified representations of hydraulic processes to reduce computational costs (Paiva et al., 2013; Zheng et al., 2018a). This is predominantly because most of the available computational time is allocated to the NWP models, maximizing their resolution and producing probabilistic ensemble simulations (Cloke and Pappenberger, 2009), alongside the prevailing view that full-physics hydraulic models are too computationally expensive to be used in operational forecasts (Leskens et al., 2014; Bhola et al., 2018). Yet tropical cyclones demand forecasts of multiple flood drivers, and end-users would greatly benefit from detailed, local predictions of flood extent and depth to enable a more complete risk calculation.
Official riverine flood forecasts in the US are issued by the NOAA National Weather Service (NWS) through Weather Forecast Offices. These forecasts are generated by River Forecast Centers (RFCs) and provide accurate information to inform public alerts and warnings (for more information on operational practice, see Adams, 2016). RFC forecasts are produced at particular points: according to the NOAA NWS Advanced Hydrologic Prediction Service (AHPS; https://water.weather.gov/ahps/forecasts.php), forecast information is currently available at 3697 points across the contiguous US. Of these, only 155 have accompanying inundation maps (i.e. adopt the receptor component in Fig. 1). Adams et al. (2018) illustrate state-of-the-art practice at the Ohio RFC, where unsteady 1D HEC-RAS models rapidly translate forecast point discharge into inundation maps for 3200 km of continuous river reach, with reported errors in predicted stage of < 0.5 m. Mashriqui et al. (2014) apply a similar approach in the Middle Atlantic RFC, coupling 1D HEC-RAS to a tidal boundary at the mouth of the Potomac River, and report accuracies similar to those of Adams et al. (2018). While these 1D approaches provide rapid and accurate riverine forecasts, they require channel cross-section data, which are only sparsely available, and considerable manual set-up by skilled practitioners. Furthermore, because they focus only on large-river flooding, the significant pluvial hazard posed by tropical cyclones remains unmodelled.
Here we present a medium-term (2-15 days) tropical cyclone flood inundation forecasting product that is capable of being used in an operational system, and demonstrate that high-resolution hydraulic models can now be practicably employed at large scale in such frameworks, where accurate local forecasts are lacking. We forecast Hurricane Harvey by coupling streamflow, rainfall and storm surge predictions from NOAA to Fathom-US (Wing et al., 2017), a continental-scale hydraulic flood model of the US, and used this to produce daily flood depth footprints at ~30 m resolution. By updating the hydrologic inputs with observations as they became available, we also produced a model hindcast in real time. After the event, we tested the forecast and hindcast models against ground observations and derived flood extents from the US Geological Survey (USGS) (Watson et al., 2018). The results of this validation procedure are compared to those obtained when a simpler Height Above Nearest Drainage (HAND) model from the NOAA National Water Center (NWC) is used instead of the hydrodynamic model (NOAA NWC et al., 2018).

Hydrodynamic model (Fathom)
Hurricane Harvey made landfall on the Gulf coast of Texas in August 2017, where some areas experienced 8-day rainfall totals of over 1500 mm and dozens of USGS river gauges recorded flows exceeding the 1 in 100 year return period. Three important features of this tropical cyclone were forecast by NOAA: streamflow from the NOAA National Water Model (NWM), rainfall from the NOAA Weather Prediction Center (WPC) and storm surge height from the NOAA National Hurricane Center (NHC). The NOAA NWM (http://water.noaa.gov/about/nwm) has four configurations: an analysis and assimilation configuration, which provides a real-time view of current streamflow conditions; short-range streamflow forecasts up to 18 hours; medium-range streamflow forecasts up to 10 days; and long-range streamflow forecasts up to 30 days. The medium-range product used here (NWM v1.1) takes meteorological variables from the NOAA Global Forecast System (https://www.emc.ncep.noaa.gov/GFS/) as inputs to the WRF-Hydro hydrological model (Gochis et al., 2018), which uses the Noah-MP land surface model (Niu et al., 2011) and a Muskingum-Cunge channel routing scheme to forecast streamflow for every river reach in the US as defined by the USGS National Hydrography Dataset (NHD). Here, we extract the maximum simulated streamflow on each river in the domain from the NWM medium-range configuration over 3 days of forecast model time, i.e. the maximum of all forecast streamflows within a 72-hour forecast horizon from the same model run (http://thredds.hydroshare.org/thredds/catalog/nwm/medium_range/catalog.html). The NOAA National Water Model is independent of the official forecasts issued by the NOAA NWS based on RFC modelling, but is employed here because it forecasts streamflow for every US river. The NOAA WPC data used are the 3-day interactive forecast of 72-hour rainfall on 20 × 20 km grid cells, which is output by an NWP model but subject to manual adjustment by forecasters at the WPC (https://www.wpc.ncep.noaa.gov/qpf/day1-3.shtml). The NOAA NHC Probabilistic Tropical Storm Surge (P-Surge) model routes simulated meteorological variables through a SLOSH model to determine potential storm surge heights at the coast with a 3-day lead time (https://slosh.nws.noaa.gov/psurge2.0/).
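The per-reach peak extraction described above amounts to taking a maximum over a fixed window of one forecast run. The sketch below is a minimal illustration only, assuming each reach's medium-range forecast has already been read into a list of (valid time, discharge) pairs; the function name, the data layout and the reach identifiers are hypothetical and do not reflect the NWM file format or the authors' code.

```python
from datetime import datetime, timedelta


def peak_flow_within_horizon(run_time, series, horizon_hours=72):
    """Maximum forecast discharge per reach within a fixed forecast horizon.

    run_time: issue time of a single forecast run.
    series: hypothetical mapping of reach id -> list of (valid_time, flow_m3s)
            pairs, all taken from that same run.
    Returns a dict of reach id -> peak flow; reaches with no values inside
    the horizon are omitted.
    """
    cutoff = run_time + timedelta(hours=horizon_hours)
    peaks = {}
    for reach_id, values in series.items():
        # keep only valid times after issue and up to the horizon cut-off
        in_window = [flow for valid_time, flow in values if run_time < valid_time <= cutoff]
        if in_window:
            peaks[reach_id] = max(in_window)
    return peaks


if __name__ == "__main__":
    issued = datetime(2017, 8, 27, 18)  # illustrative issue time
    forecast = {  # illustrative values, not NWM output
        "reach_a": [(issued + timedelta(hours=h), q) for h, q in [(6, 120.0), (48, 640.0), (96, 900.0)]],
        "reach_b": [(issued + timedelta(hours=h), q) for h, q in [(12, 35.0), (72, 80.0)]],
    }
    print(peak_flow_within_horizon(issued, forecast))  # {'reach_a': 640.0, 'reach_b': 80.0}
```

In this example the 96-hour value on the hypothetical `reach_a` lies beyond the 72-hour window and is ignored, so a larger flow later in the run does not enter the forecast peak.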
For the fluvial inundation forecast, flood depths within a given river basin, as defined by HydroBASINS (Lehner and Grill, 2013), are extracted from an existing library of nationwide flood maps at ~30 m resolution (Wing et al., 2017). These fluvial flood maps are driven by discharges from a regional flood frequency analysis (RFFA) so that flood inundation is simulated on every US river, and maps corresponding to a range of annual exceedance probabilities (e.g. 20% to 0.1% AEP) have been generated for the whole country. For each river basin in the study area, the NWM streamflow forecast is assigned an AEP based on basin-specific data from a US variant of the global RFFA of Smith et al. (2015), built using USGS river gauges, and the relevant AEP flood map is extracted from the library. Fig. 2 visualizes this process. Because the forecast samples from pre-existing flood maps of the entire continental US, simulation quality (e.g. grid resolution, physical process representation) is not compromised by the need for a forecast with reasonable latency: the extraction process takes only seconds (Leedal et al., 2010).
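The basin-level lookup can be sketched as two steps: convert the forecast peak discharge into a return period using the basin's RFFA curve, then choose a library map. This is a minimal sketch under stated assumptions: the interpolation rule (linear in discharge against the logarithm of return period), the "nearest on a log scale" map-selection rule, the function names and all numbers are hypothetical, not the Fathom-US implementation. The 5- to 1000-year library range simply mirrors the 20% to 0.1% AEP span mentioned above.

```python
import math


def return_period_from_rffa(discharge, curve):
    """Estimate the return period (years) of a forecast discharge.

    curve: list of (return_period_years, discharge_m3s) pairs sorted so that
    both columns increase. Interpolates linearly in discharge against
    log(return period) and clamps to the ends of the curve.
    """
    if discharge <= curve[0][1]:
        return curve[0][0]
    if discharge >= curve[-1][1]:
        return curve[-1][0]
    for (t0, q0), (t1, q1) in zip(curve, curve[1:]):
        if q0 <= discharge <= q1:
            frac = (discharge - q0) / (q1 - q0)
            return math.exp(math.log(t0) + frac * (math.log(t1) - math.log(t0)))
    raise ValueError("curve must be sorted by increasing discharge")


def select_library_map(return_period, library_return_periods):
    """Pick the library return period closest on a log scale (hypothetical rule)."""
    return min(library_return_periods,
               key=lambda t: abs(math.log(t) - math.log(return_period)))


def aep_percent(return_period):
    """Annual exceedance probability (%) approximated as 100 / return period."""
    return 100.0 / return_period


if __name__ == "__main__":
    curve = [(5, 200.0), (10, 300.0), (100, 600.0), (1000, 900.0)]  # hypothetical basin
    library = [5, 10, 20, 50, 100, 200, 500, 1000]                   # hypothetical library
    t = return_period_from_rffa(500.0, curve)
    print(round(t, 1), select_library_map(t, library))  # 46.4 50
```

A real system might instead round up to the next more extreme map to stay conservative; the choice of rule is a design decision that the text above does not specify.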
For fluvial flooding, the response to extreme streamflow in a river basin is relatively well confined to that particular locality, meaning any plausible flood event will match a pre-computed, event-agnostic inundation simulation of that area, provided enough runs with different boundary conditions are executed. How rainfall and surge vary in time and space, however, is less related to the hydrologic conditioning of the ground surface. Simulating these perils ahead of time is therefore inadvisable, since suitable extraction zones are harder to specify a priori than for fluvial flooding. Instead, the NOAA rainfall and surge information is input to Fathom-US to generate new event-specific depth grids, which requires ~6-hour parallel simulations of coastal and pluvial flooding for a ~400,000 km² domain at ~30 m resolution on a single node of 20 cores (Intel Broadwell E5 Xeon). Including model building and post-processing, the final product (the maximum flood depth in each pixel from the fluvial, pluvial and surge models) was produced ~24 hours after the release of the NOAA data, meaning the total lead time was ~2 days (the 3-day NOAA forecast horizon less ~1 day of processing). The forecasts were then updated with observed boundary conditions to produce a hindcast, so that impacted areas could be delineated more accurately without delay. These boundary condition observations consisted of 24-hour rainfall totals from the NOAA NWS (https://water.weather.gov/precip/download.php), peak flows observed at USGS river gauging stations (https://waterdata.usgs.gov/nwis/rt) and coastal water level observations from the NOAA Tides and Currents service (https://tidesandcurrents.noaa.gov/gmap3/). During the Harvey event, the forecast and hindcast flood extent data produced by the system were made freely available to first responders and were used operationally by NASA and insurers to rapidly assess exposure.
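The final combination step and the lead-time arithmetic above can be illustrated with a short sketch. It assumes the fluvial, pluvial and surge depth grids are already co-registered on the same ~30 m grid and held as lists of rows; the function names and the small grids are hypothetical and purely illustrative.

```python
def combine_hazards(fluvial, pluvial, surge):
    """Cell-wise maximum of three co-registered depth grids (lists of rows, metres)."""
    return [
        [max(f, p, s) for f, p, s in zip(row_f, row_p, row_s)]
        for row_f, row_p, row_s in zip(fluvial, pluvial, surge)
    ]


def effective_lead_time_hours(forecast_horizon_hours, processing_hours):
    """Lead time remaining once the inundation product has been produced."""
    return forecast_horizon_hours - processing_hours


if __name__ == "__main__":
    fluvial = [[0.0, 1.2], [0.4, 0.0]]  # illustrative depths (m)
    pluvial = [[0.3, 0.5], [0.1, 0.2]]
    surge = [[0.0, 0.0], [0.9, 0.0]]
    print(combine_hazards(fluvial, pluvial, surge))  # [[0.3, 1.2], [0.9, 0.2]]
    print(effective_lead_time_hours(72, 24) / 24)     # 2.0 (days)
```

Taking the maximum rather than the sum avoids double counting where, for example, a river and local runoff flood the same pixel; this mirrors the "maximum flood depth in each pixel" definition given in the text.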
Fathom-US is a large-scale hydraulic modelling framework whose freshwater flooding component is set out in Wing et al. (2017). The US implementation is itself a variant of the global model first described by Sampson et al. (2015). It has been rigorously tested against thousands of bespoke flood inundation studies carried out by US government agencies, with the conclusion that the large-scale methodology is approaching the accuracy of traditional local models (Wing et al., 2017). The framework permits rapid construction and execution of model domains defined by the forecast footprint of US hurricane landfalls. Its computational hydraulic engine is a variant of the LISFLOOD-FP hydraulic model, which solves a local inertial form of the shallow water equations in two dimensions over a regular grid (1 arc second, ~30 m, resolution in this case) using a highly efficient numerical solution (de Almeida and Bates, 2013). This grid is populated with elevation values from the 1 arc second version of the USGS National Elevation Dataset (NED), with levees from the US Army Corps of Engineers National Levee Database explicitly represented. This Digital Elevation Model (DEM) has complete coverage of the conterminous US and is thus what allows the framework to simulate inundation for all potential hurricane landfall locations in the US. River hydrography is represented by HydroSHEDS (Lehner et al., 2008): rivers wider than the grid resolution are burnt directly into the DEM, while narrower streams are represented by the subgrid 1D model of Neal et al. (2012). Fluvial modelling is executed by inserting river discharge information (ultimately linked to an RFFA-derived AEP, an NWM streamflow forecast or a USGS river gauge observation) at the relevant inflow points in the stream network, while the pluvial model takes spatial rainfall data and drops it directly onto the land surface ("rain-on-grid"; Sampson et al., 2013).
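To make the "local inertial" idea more concrete, the sketch below advances a one-dimensional row of cells using the commonly published explicit form of the local inertial flux update with semi-implicit Manning friction, together with a CFL-type adaptive time step. It is a simplified teaching example under stated assumptions (1D only, closed boundaries, uniform Manning's n, no wetting/drying limiter, no subgrid channels). It is not the LISFLOOD-FP or Fathom-US source code, and the function names and parameter defaults (`alpha=0.7`, `n=0.03`) are hypothetical choices.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def stable_timestep(depths, dx, alpha=0.7):
    """Adaptive time step dt = alpha * dx / sqrt(g * h_max) (CFL-type condition)."""
    h_max = max(depths)
    if h_max <= 0.0:
        return float("inf")
    return alpha * dx / math.sqrt(G * h_max)


def step_1d(z, h, q, dx, dt, n=0.03):
    """Advance a 1D row of cells by one time step.

    z: bed elevations (m); h: water depths (m);
    q: unit-width discharges (m^2 s^-1) at the len(z) - 1 interior faces.
    Boundaries are closed (no flow). Returns updated (h, q).
    """
    q_new = []
    for i in range(len(z) - 1):
        wse_l, wse_r = z[i] + h[i], z[i + 1] + h[i + 1]
        # depth available for flow between the two cells
        h_flow = max(wse_l, wse_r) - max(z[i], z[i + 1])
        if h_flow <= 1e-6:
            q_new.append(0.0)
            continue
        slope = (wse_r - wse_l) / dx  # water surface slope across the face
        numerator = q[i] - G * h_flow * dt * slope
        # Manning friction treated semi-implicitly for stability
        denominator = 1.0 + G * dt * n ** 2 * abs(q[i]) / h_flow ** (7.0 / 3.0)
        q_new.append(numerator / denominator)
    # continuity: each cell changes depth by its net inflow across faces
    h_new = list(h)
    for i in range(len(z)):
        inflow = q_new[i - 1] if i > 0 else 0.0
        outflow = q_new[i] if i < len(z) - 1 else 0.0
        h_new[i] = h[i] + dt * (inflow - outflow) / dx
    return h_new, q_new


if __name__ == "__main__":
    z = [0.0] * 5                   # flat bed, 30 m cells (illustrative)
    h = [1.0, 0.0, 0.0, 0.0, 0.0]   # a 1 m column of water in the first cell
    q = [0.0] * 4
    for _ in range(3):
        dt = stable_timestep(h, 30.0)
        h, q = step_1d(z, h, q, 30.0, dt)
    print([round(d, 3) for d in h], round(sum(h), 6))
```

Dropping the convective acceleration term is what allows this explicit update to remain cheap enough for ~30 m continental-scale grids, at the cost of accuracy in supercritical flow.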
Assumptions relating to infiltration capacity are made based on soil information from the Harmonized World Soil Database in conjunction with a modified Hortonian infiltration equation (Morin and Benyamini, 1977). In urban areas, identified using satellite luminosity data (Elvidge et al., 2007), the infiltration capacity is defined using assumed urban drainage design standards. The storm surge model component was conceived by Bates et al. (2005), who adapted the 2D LISFLOOD-FP code, traditionally used in fluvial settings, for coastal flooding. Using LISFLOOD-FP for such an application has precedent: for example, Smith et al. (2012) evaluated its suitability for coastal inundation modelling in the UK, and Quinn et al. (2014) applied the code in a coastal flood risk assessment. In this component, the coastal boundary of the model domain is set within ocean cells just offshore of the coastal flood defenses. For each cell along the coastal boundary, the predicted peak surge height was extracted from the P-Surge output and used to scale the tidal time series at that location, creating a fractional surge height time series. Water enters the model domain along the boundary in accordance with this time series, and the hydraulic model simulates the dynamics of the surge as it interacts with the shoreline and moves inland. 2D (burnt into the DEM) and 1D (subgrid) channels are represented here too, so the ability of the storm surge to propagate inland via river channels, a crucial component of coastal flood models (e.g. Maskell et al., 2014), is properly captured.
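One possible reading of the boundary forcing described above is sketched below: each tidal level at a boundary cell is expressed as a fraction of the tidal maximum and multiplied by the P-Surge peak height for that cell. This interpretation, the function name and the numbers are hypothetical; the text does not give the exact scaling formula used in Fathom-US.

```python
def fractional_surge_series(tide, peak_surge):
    """Scale a tidal water-level series so that its maximum equals peak_surge.

    Hypothetical reading of the 'fractional surge height time series': each
    tidal level is expressed as a fraction of the tidal maximum and multiplied
    by the forecast peak surge height at that boundary cell.
    """
    tide_max = max(tide)
    if tide_max <= 0.0:
        raise ValueError("tidal series needs a positive maximum to be scaled")
    return [peak_surge * level / tide_max for level in tide]


if __name__ == "__main__":
    tide = [0.2, 0.5, 1.0, 0.4]  # illustrative tidal levels (m) at one boundary cell
    print([round(v, 2) for v in fractional_surge_series(tide, 3.0)])  # [0.6, 1.5, 3.0, 1.2]
```

The appeal of this kind of scaling is that the shape and timing of the boundary hydrograph follow the local tide, while its peak matches the surge forecast; the series is then applied as a water-level boundary along the offshore cells.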

NWM-HAND model (NOAA NWC)
The NOAA National Water Center has recently explored coupling the National Water Model to the Height Above Nearest Drainage method (Rodda, 2005; Rennó et al., 2008; Nobre et al., 2011, 2016). This effort forms part of the National Flood Interoperability Experiment (Maidment, 2016), which, like this paper, seeks to append the neglected receptor component to the flood forecasting cascade (Fig. 1). The HAND approach normalizes the DEM so that each pixel takes the value of its vertical distance above the stream it drains to. A rating curve is then used to translate NWM flow forecasts to stage for a given river cell, and any land cells that drain to this stream location and have a HAND value less than the stage become flooded. Planar approximations such as these, which do not consider flow physics, have been shown to be less skillful than models which represent the dynamics of flood inundation, both at the inception of raster-based hydraulic modelling (Bates and De Roo, 2000) and more recently (Afshari et al., 2018), owing to their omission of mass and momentum conservation laws. In isolated test cases, however, particularly on confined floodplains with steep valley sides and straight river reaches, a planar approach may offer satisfactory performance (Bates and De Roo, 2000). Furthermore, the reduction in model skill may be considered acceptable where rapid solutions are required to large computational problems (Afshari et al., 2018; Liu et al., 2018). Geographies where these methods are appropriate are difficult to specify a priori, though, since such methods have not undergone the same level of wide-area testing as hydrodynamic approaches (e.g. Wing et al., 2017). By evaluating the simpler HAND-based approach alongside the more complex hydraulic model presented here, the trade-off between including the physics of water flow in a forecast model and computational efficiency can be quantified.
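The planar HAND logic described above reduces to two operations: look up stage from a rating curve, then flood every cell in the reach's drainage area whose HAND is below that stage. The sketch below assumes a simple tabulated rating curve with linear interpolation; the table values, HAND values and function names are hypothetical and are not taken from the NOAA NWC implementation.

```python
def stage_from_rating_curve(flow, rating):
    """Stage (m) for a flow (m3 s-1) by linear interpolation in a rating table.

    rating: list of (flow, stage) pairs sorted by increasing flow; values
    outside the table are clamped to its end points.
    """
    if flow <= rating[0][0]:
        return rating[0][1]
    if flow >= rating[-1][0]:
        return rating[-1][1]
    for (q0, s0), (q1, s1) in zip(rating, rating[1:]):
        if q0 <= flow <= q1:
            return s0 + (flow - q0) / (q1 - q0) * (s1 - s0)
    raise ValueError("rating table must be sorted by increasing flow")


def hand_flood_depths(hand_values, stage):
    """Planar HAND inundation: depth = stage - HAND where positive, else 0.

    hand_values: HAND (m) of the cells that drain to the reach whose stage is
    given; cells draining to other reaches must not be passed in.
    """
    return [max(0.0, stage - hand) for hand in hand_values]


if __name__ == "__main__":
    rating = [(0.0, 0.0), (100.0, 1.0), (500.0, 3.0)]  # hypothetical rating table
    stage = stage_from_rating_curve(300.0, rating)     # 2.0 m
    depths = hand_flood_depths([0.5, 1.9, 2.0, 3.5], stage)
    print(stage, [round(d, 2) for d in depths])         # 2.0 [1.5, 0.1, 0.0, 0.0]
```

Nothing in this calculation moves water between cells or conserves volume, which is the omission of mass and momentum conservation referred to above.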
The NWM-HAND model was executed for Hurricane Harvey (NOAA NWC et al., 2018) using the 1/3 arc second USGS NED as the DEM. The model was run each day with NWM analysis and assimilation outputs, providing a snapshot of the flood extent caused by Harvey at that time. By taking the maximum extent from all of these HAND-based simulations, an analogous comparison can be made with the Fathom hindcast: both models are intended to simulate the maximum flood extent when driven with observations.

Fig. 2. … Creek, a tributary of the Trinity River ~100 km NE of Houston, and has a drainage area of ~500 km². Streamflow is forecast for each flowline from the USGS NHD by the NOAA NWM. The graph in (b) shows the regional flood frequency analysis at the outlet of basin 17944. The NWM 3-day forecast peak discharge was 845 m³ s⁻¹ from a model run executed on 27 August 2017. This corresponds to the 1 in 150 year streamflow in this basin, whose inundation has already been simulated in the Fathom-US library, as shown in (c). The depth grid is then extracted for this catchment. The final flood inundation map is shown in (d), where this process has been repeated for all river basins in the domain and integrated with the new pluvial simulations to represent headwater (rivers with drainage area < 50 km²) and surface water hazard.

Validation data
We evaluated the Hurricane Harvey forecast and hindcast footprints against ground observations of flood extent and depth made by the USGS (Watson et al., 2018). After the event, USGS field teams visited impacted basins and collected high water marks (HWMs) in accordance with the guidelines of Koenig et al. (2016). These are surveyed from the debris or stain lines left by receding water on buildings, trees, fences and other structures. Horizontal coordinates are obtained with a GPS, while vertical heights are referenced to the NAVD88 datum. The resulting 2123 HWMs, in combination with maximum water levels at USGS gauging stations, were interpolated (technique described in Musser et al., 2016) across 1.4-3 m resolution LiDAR-derived DEMs for fourteen sites. This provided the best currently available reconstruction of flood extents from the observed water level data. Though these observation data are used as a benchmark in this study, they are not error-free. Watson et al. (2018) listed uncertainties for specific data points in their study, ranging from < 0.01 m to 0.55 m (mean 0.07 m), though the method of quantification was not specified. For further details, see Watson et al. (2018).

Validation metrics
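Firstly, the gridded flood depth benchmark data, derived from observed water levels, were used to test the extent to which the models captured the overall spatial pattern of the flooding. For this, the same binary pattern measures were used as in Wing et al. (2017):

$$\mathrm{HR} = \frac{M_1B_1}{M_1B_1 + M_0B_1} \qquad (1)$$

$$\mathrm{FAR} = \frac{M_1B_0}{M_1B_1 + M_1B_0} \qquad (2)$$

$$\mathrm{CSI} = \frac{M_1B_1}{M_1B_1 + M_1B_0 + M_0B_1} \qquad (3)$$

$$\mathrm{EB} = \frac{M_1B_0}{M_0B_1} \qquad (4)$$

where M and B represent model and benchmark cells respectively, and the subscripts 1 and 0 indicate whether the cell considered is wet or dry respectively; for example, $M_1B_0$ is the number of cells that are wet in the model but dry in the benchmark.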
The 'hit rate' (HR; Eq. (1)) penalizes type II errors (misses) and is thus a measure of the model's tendency to underpredict the benchmark flood extent. It can be interpreted as the proportion of benchmark flooded areas that were replicated by the model. The 'false alarm ratio' (FAR; Eq. (2)) penalizes type I errors (false alarms) and so represents the tendency of the model to overpredict the benchmark flood extent. It can be interpreted as the proportion of modelled flooded areas that are dry in the benchmark. The 'critical success index' (CSI; Eq. (3)) penalizes both type I and type II errors, and thus accounts for both under- and overprediction. It ranges between 0 (no match between model and benchmark) and 1 (perfect match) and can be thought of as representing model performance over floodplain areas only, since it excludes cells that are dry in both the model and the benchmark. Finally, 'error bias' (EB; Eq. (4)) is the ratio of type I to type II errors. Values greater than 1 indicate the model tends to overpredict, while values less than 1 indicate a tendency to underpredict with respect to the benchmark.
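The four pattern measures can be computed directly from aligned wet/dry grids. The sketch below assumes the model and benchmark grids have already been resampled to a common grid and flattened into sequences of 0/1 values; the function names are hypothetical. Because EB is a ratio, averaging it arithmetically across sites exaggerates overprediction (0.5 and 2 average to 1.25), so the helper for site-averaged EB uses a geometric mean; this is our illustrative choice, not necessarily how the means in Tables 1 and 2 were computed.

```python
import math


def contingency_counts(model, benchmark):
    """Count M1B1, M1B0 and M0B1 cells from two aligned, flattened wet/dry grids."""
    m1b1 = m1b0 = m0b1 = 0
    for m, b in zip(model, benchmark):
        if m and b:
            m1b1 += 1  # wet in both (hit)
        elif m:
            m1b0 += 1  # wet in model only (type I error)
        elif b:
            m0b1 += 1  # wet in benchmark only (type II error)
    return m1b1, m1b0, m0b1


def pattern_scores(model, benchmark):
    """HR, FAR, CSI and EB as in Eqs. (1)-(4); assumes nonzero denominators."""
    m1b1, m1b0, m0b1 = contingency_counts(model, benchmark)
    return {
        "HR": m1b1 / (m1b1 + m0b1),
        "FAR": m1b0 / (m1b1 + m1b0),
        "CSI": m1b1 / (m1b1 + m1b0 + m0b1),
        "EB": m1b0 / m0b1,
    }


def geometric_mean_eb(values):
    """Average EB ratios on a log scale, so that 0.5 and 2 average to 1."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    model = [1, 1, 1, 0, 0, 1]      # illustrative flattened grids
    benchmark = [1, 1, 0, 1, 0, 0]
    print({k: round(v, 3) for k, v in pattern_scores(model, benchmark).items()})
    # {'HR': 0.667, 'FAR': 0.5, 'CSI': 0.4, 'EB': 2.0}
    print(geometric_mean_eb([0.5, 2.0]))  # 1.0
```

Note that cells dry in both grids never enter any of the four formulas, which is why CSI describes performance over the floodplain only.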
Secondly, we calculated the difference in water surface elevation (WSE) between the USGS HWMs and the models. These differences are analyzed in three ways:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(O_n - M_n\right)^2} \qquad (5)$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{n=1}^{N}\left|O_n - M_n\right| \qquad (6)$$

$$\mathrm{ME} = \frac{1}{N}\sum_{n=1}^{N}\left(O_n - M_n\right) \qquad (7)$$

where $O_n$ and $M_n$ are the WSE at a given observation point and at the corresponding model cell respectively, and N is the number of HWMs analyzed. The original dataset was trimmed to 1134 points so that the analysis was confined to high-quality HWMs that were not taken at the same location and that were referenced to the same geodetic datum.
Both Root Mean Squared Error (RMSE; Eq. (5)) and Mean Absolute Error (MAE; Eq. (6)) measure the average magnitude of the errors (where an error is $O_n - M_n$). RMSE is a quadratic scoring rule (meaning greater weight is given to larger errors), while MAE is linear (all errors have equal weight). Mean Error (ME; Eq. (7)) calculates the average error while retaining its sign: a negative ME indicates the model has a tendency to overpredict observed WSEs, while a positive one suggests underprediction.
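The three water surface elevation statistics follow directly from Eqs. (5)-(7). The sketch below assumes observed and modelled WSEs have already been paired point by point and share a datum; the function name and the three example points are hypothetical.

```python
import math


def wse_error_stats(observed, modelled):
    """RMSE, MAE and ME (Eqs. (5)-(7)) with errors defined as O_n - M_n."""
    errors = [o - m for o, m in zip(observed, modelled)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # quadratic: large errors weigh more
    mae = sum(abs(e) for e in errors) / n             # linear: all errors weigh equally
    me = sum(errors) / n                              # signed: negative means overprediction
    return rmse, mae, me


if __name__ == "__main__":
    observed = [10.0, 12.0, 8.0]   # illustrative HWM elevations (m)
    modelled = [11.0, 11.5, 8.0]   # illustrative model WSEs at the same points (m)
    print(tuple(round(v, 3) for v in wse_error_stats(observed, modelled)))  # (0.645, 0.5, -0.167)
```

In this example the single 1 m overprediction dominates: RMSE exceeds MAE, and the negative ME flags a net tendency to overpredict.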

Flood extent comparison
The results of the flood extent comparison between the Fathom hindcast model (driven with observed streamflow, rainfall and surge heights) and the USGS benchmarks for each of the fourteen study sites are shown in Table 1a and Fig. 3. Most of the USGS flood extent is captured by the model, with 78% of observed wet pixels correctly identified as such by the model on average across all sites. Many of the sites with high HRs also have relatively high FARs (e.g. Tres Palacios River (Fig. 3a), Lower San Bernard River (Fig. 3l) and Upper Brazos River (Fig. 3b)), driving a lower overall correspondence and an overpredictive bias, as indicated by their CSIs and EBs respectively. Conversely, many lower-HR sites have very low FARs (e.g. Upper and Lower Neches River (Figs. 3k and 3m) and Cow Bayou (Fig. 3f)), yielding comparable CSIs and EBs < 1 that indicate underprediction. Some high-HR sites have correspondingly high CSIs owing to low FARs (e.g. Lower Brazos River (Fig. 3e) and Middle San Bernard River (Fig. 3j)), suggesting a strong match between model and benchmark flood extents. CSIs, the most discriminatory metric, range from a poor 0.5 (the model is correct as often as it is incorrect) on the Upper San Bernard River (Fig. 3i) to an excellent 0.9 (a 90% match between model and benchmark) on the Lower Brazos River (Fig. 3e). Across the domain, an average CSI of 0.66 indicates that roughly 2 in every 3 model pixels in the functional floodplain match the benchmark. Further to the details in Section 2.3, the USGS benchmark flood inundation extent is itself uncertain. It is generated via interpolation between point HWMs (which themselves contain vertical error) rather than being a genuine 2D observation, so the accuracy of these maps depends heavily on the spatial density of the HWMs. Because the interpolation does not represent, as a hydraulic model would, the physics of the flow that generated the HWMs (e.g. timing and interaction of adjacent streamflows), it may produce unrealistic inundation extent boundaries where unconstrained by point observations. To put the pattern scores into context: the Wing et al. (2017) model attained average CSIs of ~0.75 against detailed local models, with a maximum of ~0.90; global flood models average CSIs of ~0.50, obtaining up to ~0.70 in isolated test cases, against a variety of local models and satellite-derived flood extents (Winsemius et al., 2016; Dottori et al., 2016; Ward et al., 2017); and local flood models that have been manually built and extensively calibrated generally achieve CSIs of 0.70-0.80 when compared to satellite observations of flood inundation (Aronica et al., 2002; Pappenberger et al., 2007; Wood et al., 2016) and up to 0.9 when benchmarked against very high quality data (e.g. Bates et al., 2006; Altenau et al., 2017). Fleischmann et al. (2019) propose that a hydrodynamic model provides locally relevant estimates of flood extent when CSI > 0.65. It should be noted, however, that the CSI metric is sensitive to the selected study area, favoring overpredictive models of larger floods on flat terrain over the reverse case (Stephens et al., 2014). The results shown here are towards the higher end of those in the literature (though many of these are for smaller floods, where high CSIs are difficult to obtain). The hindcast model performance nevertheless falls short of the very high CSIs exhibited by calibrated local model-observation comparisons. The results thus indicate that, across the model domain, the hindcast hydraulic model presented here has some skill in replicating benchmark patterns.
The results of the comparison between the NWM-HAND model and the USGS benchmark are shown in Table 1b. The NWM-HAND model evidently has lower predictive skill when benchmarked against the USGS flood extents. Mean HRs indicate that 46% of flooded areas are correctly captured, with mean CSIs suggesting that just over 4 in every 10 pixels in the functional floodplain are identified correctly. It should be noted that the NWM-HAND model structure can only represent fluvial flood hazard, meaning that neither hydrologically isolated flooding from intense local rainfall nor coastal surge is captured. Hurricane Harvey was predominantly a pluvial event: flooding arose in many areas from intense local rainfall on the land surface rather than from rivers flowing out of bank. Fig. 4 shows, for an area to the NE of Houston, how different the Fathom and NWM-HAND outputs look when this runoff component is represented. That HAND is a fluvial-only model may explain why coastal basins (e.g. two sites in Matagorda Bay), where storm surge may have played an important role, have such low scores. The lack of a rainfall component may explain poor performance on small streams: the Upper San Bernard River has a drainage area of ~500 km², and the San Jacinto River site contains headwater streams with drainage areas as small as ~20 km²; both exhibit extremely poor performance with NWM-HAND.

Table 1. A comparison of (a) the Fathom model hindcast and (b) the maximum flood extent from the NWM-HAND model hindcast of the NOAA NWC to the benchmark USGS flood extents. The metrics Hit Rate (HR), False Alarm Ratio (FAR), Critical Success Index (CSI) and Error Bias (EB) are explained by Eqs. (1)-(4) in Section 2.4. To aid interpretation, (c) indicates the color scale used to classify each measure: the darker the color, the higher the performance.
The flood hazard arising on these small streams is driven not by traditional fluvial flooding processes, where aggregation effects lead to a large, low-amplitude flood wave that propagates downstream, but by the rapid lateral surface flow of intense local rainfall generating a flash flood. This further contextualizes the scores obtained by the Fathom model at these sites, where, despite the model possessing a pluvial component, the difficulties in simulating this phenomenon are evident. That said, even for the relatively simple problem of modelling the Brazos River, which has a flat floodplain confined by steep valley sides, NWM-HAND correctly identifies less than half of the pixels. The hydrodynamic Fathom model, in contrast, correctly identifies between 74% and 90% of flooding from the Brazos. For reference, CSIs obtained by HAND models have been reported between 0.5 and 0.9 for a selection of watersheds in the US when comparing HAND approaches to observational data or hydrodynamic methods (Zheng et al., 2018a, 2018b; Zhang et al., 2018; Afshari et al., 2018). The results from the NWM-HAND model presented here do not appear to be as skillful as those presented in these smaller-scale studies. Further, despite this approach making use of higher resolution (~10 m) terrain data, it is outperformed by a coarser (~30 m) hydrodynamic model. This suggests that the benefit of higher grid resolution is only realized when physical processes are represented.
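To illustrate why a planar HAND approach cannot capture pluvial or surge flooding, the sketch below shows the basic HAND inundation rule as we understand it: a cell is flooded to depth stage minus HAND when that difference is positive, using the stage of the reach whose catchment the cell drains to. How the stage is derived from forecast discharge is outside this sketch, and `hand_depth` is a hypothetical helper, not the NOAA NWC implementation.

```python
import numpy as np


def hand_depth(hand, catchment_id, reach_stage):
    """Planar HAND inundation sketch (hypothetical helper, not the NWC code).

    hand:          2D grid of height above nearest drainage (m)
    catchment_id:  2D integer grid; each cell's index into reach_stage
    reach_stage:   1D array of forecast stage (m) for each reach catchment
    Returns water depth (m): stage minus HAND where positive, else 0.
    """
    hand = np.asarray(hand, dtype=float)
    stage = np.asarray(reach_stage, dtype=float)[np.asarray(catchment_id)]
    # No mass or momentum balance and no rain falling on the grid:
    # a cell floods only if it sits below its own reach's stage.
    return np.clip(stage - hand, 0.0, None)


hand = np.array([[0.0, 1.5, 4.0], [0.5, 2.5, 6.0]])
catchment = np.array([[0, 0, 1], [0, 1, 1]])
print(hand_depth(hand, catchment, [2.0, 5.0]))
# expected depths: [[2.0, 0.5, 1.0], [1.5, 2.5, 0.0]]
```

Because depth depends only on each cell's height above its drainage line and its reach's stage, a hydrologically isolated depression filled by local rainfall, or a coastal cell inundated by surge, stays dry unless the river stage reaches it.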
Further to testing the Fathom and NWM-HAND models driven with real-time observations, the Fathom forecast variant was also benchmarked against the USGS flood extents. It was driven with NWM peak flows from a 3-day forecast commencing at 1800 GMT-6 on 27th August 2017, as well as forecast rainfall and surge data from this time. Hurricane Harvey generated peak streamflows across the domain for over 5 days: beyond the horizon of the 3-day forecast model presented here. The forecast model output chosen for validation captures (temporally) the main hurricane impact in Texas, which was felt prior to 31st August 2017. The results of this comparison are shown in Table 2: the Fathom forecast model sees a performance drop with respect to its hindcast variant, with mean CSI dropping by 0.09 and mean HR by 0.03. While HRs actually increase for most sites relative to hindcast performance (as high as 99.9% on the Lower San Bernard River), this is against a backdrop of increased EBs that indicate a heavy bias towards overprediction (mean EB of 84.02, compared to a more modest 1.42 in the hindcast). It is worth noting, though, that EB is a ratio and is naturally read on a logarithmic scale. Since underprediction is indicated by values less than 1 and overprediction by values greater than 1, a value of 0.5 represents the same magnitude of bias as a value of 2, yet the arithmetic mean of these two values is 1.25 (not 1). Underrepresented by this mean EB, therefore, are the extremely high underpredictive biases evident on the Lower and Upper Neches River, and Cow and Pine Island Bayous. The use of forecast, rather than observed, model inputs therefore widens the tails of the bias distribution (EBs moving further from 1) and reduces overall performance (CSIs decreasing). That said, capturing, on average, three-quarters of benchmark inundated pixels (HR) while keeping false alarms at 23% (FAR) indicates fair performance in the forecast model.

Table 2
A comparison of the forecast variant of the Fathom model to the benchmark USGS flood extents. The color scale used is explained in Table 1c.
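The point about averaging EB values can be checked directly. The sketch below contrasts the arithmetic mean with the geometric (log-space) mean using Python's standard `statistics` module; the function name `summarize_eb` is our own, and the two input values are the illustrative 0.5 and 2 from the text, not site results.

```python
import statistics


def summarize_eb(eb_values):
    """Contrast the arithmetic mean of error-bias ratios with their log-space
    (geometric) mean; only the latter treats 0.5 and 2 as equal and opposite."""
    return {
        "arithmetic": statistics.fmean(eb_values),
        "geometric": statistics.geometric_mean(eb_values),
    }


print(summarize_eb([0.5, 2.0]))
# arithmetic = 1.25; geometric = 1.0 (up to floating-point rounding)
```

A geometric mean requires strictly positive, finite values, so sites with EB of 0 (no false alarms) or infinity (no misses) would need separate handling before averaging.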
Aside from structural errors in the hydraulic model and its components (e.g. the DEM), the hindcast model contends only with errors in the measured data it is driven with. Though these are not insignificant (25-40% error in measured flows is common and can be even higher when flows are extreme; Di Baldassarre and Montanari, 2009; Coxon et al., 2015; Westerberg et al., 2016), they are likely to be much lower than the uncertainties in the boundary conditions used in the forecast hydraulic model. Such uncertainties are typically explored using ensemble prediction systems (EPS) in meteorological (e.g. Buizza, 2005) and hydrological (e.g. Thielen et al., 2009) forecast models, where many realizations of projected weather or water levels are simulated for a single site (Cloke and Pappenberger, 2009). These probabilistic frameworks thus permit some measure of forecast predictability to be quantified. In the models presented here, there is only a single deterministic flood extent for each day's forecast. The reason for this is two-fold. Firstly, the NOAA NWM itself currently produces deterministic streamflow forecasts for its medium-range variant. NWM v1.1 was the version in existence during Hurricane Harvey, but current and future versions will improve on this functionality (e.g. NWM v2 will have a 7-member ensemble in the medium-range forecast). Were a probability distribution of projected river discharge available, the rapid extraction algorithm that samples from a pre-existing library of fluvial flood maps could feasibly produce 2D grids in which each cell holds a probability distribution of water depth. This process takes only seconds, so scaling it to a probabilistic framework would be trivial. Secondly, running new hydrodynamic pluvial and coastal flood models in an EPS would add a significant computational burden.
That said, running an ensemble of the hydraulic model presented here (a single deterministic run requires ~120 processor hours) would be a manageable task for the HPC facilities of leading forecast centers, so probabilistic depth grids for these flood drivers could be constructed too. Incorporating the hydrodynamic model in an EPS would be a relatively straightforward addition to the method proposed here, but is beyond the scope of the current paper.
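The idea of turning an ensemble of discharges into probabilistic depth grids by sampling a pre-computed map library can be sketched as follows. This is a minimal illustration under hypothetical assumptions: each reach has maps pre-computed at a sorted set of discharges, and each ensemble member is assigned the map for the smallest pre-computed discharge at or above its forecast value. The Fathom extraction algorithm is not described here and may work differently; `exceedance_probability` is our own name.

```python
import numpy as np


def exceedance_probability(library_q, library_depths, member_q, threshold=0.0):
    """Per-cell probability that depth exceeds `threshold` across ensemble members.

    library_q:      (K,) increasing discharges with pre-computed maps (hypothetical)
    library_depths: (K, ny, nx) depth maps (m) matching library_q
    member_q:       (M,) ensemble discharge forecasts for the reach
    """
    library_q = np.asarray(library_q, dtype=float)
    library_depths = np.asarray(library_depths, dtype=float)
    member_q = np.asarray(member_q, dtype=float)
    # Index of the smallest library discharge >= each member's discharge;
    # members above the largest library entry fall back to the largest map.
    idx = np.clip(np.searchsorted(library_q, member_q, side="left"), 0, len(library_q) - 1)
    wet = library_depths[idx] > threshold  # shape (M, ny, nx)
    return wet.mean(axis=0)


lib_q = [100.0, 200.0, 400.0]
lib_maps = [[[0.5, 0.0, 0.0]], [[1.0, 0.3, 0.0]], [[1.5, 0.8, 0.2]]]
members = [90.0, 150.0, 350.0, 500.0]
print(exceedance_probability(lib_q, lib_maps, members))
# expected: [[1.0, 0.75, 0.5]]
```

Rounding each member up to the next library map is a deliberately conservative choice; interpolating between neighbouring maps would be an alternative. Because only array indexing is involved, the cost stays in the seconds range regardless of ensemble size.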
Digging deeper into the forecast uncertainties, observations of peak streamflow from 63 USGS river gauges across the footprint domain are compared to the corresponding NWM streamflow forecast from 27th August 2017. All river gauges used experienced peak flow during this forecast's 3-day time horizon. By volume, the mean absolute error (as in Eq. (6), but for discharge rather than water surface elevation) comes to 2970 m³ s⁻¹. For reference, this error is equivalent to roughly 80% of the peak discharge experienced on the Lower Brazos River during Hurricane Harvey. Errors as a proportion of the observed discharge are shown in Fig. 5. The MAE is 290%, with a very high bias towards overprediction (mean error of −281%, negative values denoting forecasts above the observed flow). Forecast discharge on Buffalo Bayou near Addicks was an eighteenfold overestimate: an implausible 7129 m³ s⁻¹ when 390 m³ s⁻¹ was observed. In some cases, though, the NWM was very accurate. For instance, on the East Fork of the San Jacinto River near New Caney, 3492 m³ s⁻¹ was forecast while 3398 m³ s⁻¹ was observed. It should be noted that this is a single forecast from a single point in time, while the medium-range NWM variant is run four times a day for 80 time horizons. The testing presented here, therefore, will not be representative of NWM performance during Harvey and should not be viewed as an authoritative assessment of model skill. It may be ill fortune that the particular Fathom flood extent forecast selected for this study was driven by anomalously poor discharge forecasts, but benchmarking them against observed flows provides useful context for analyzing the skill of the 2D flood forecast. Furthermore, the NWM was in its infancy around the time of Harvey (v1.1). Improvements to this hydrological model, both to date and in the future (v2), following rigorous validation exercises will result in much closer replication of observed streamflows than those presented here.
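The discharge error measures above can be reproduced with a short helper. The sketch assumes error = observed − forecast, with relative errors taken against the observed flow; this convention is inferred from the negative mean error reported alongside an overprediction bias, and Eq. (6) itself is not reproduced here. The example uses only the two gauges quoted in the text, so its output is not the 63-gauge statistic.

```python
import numpy as np


def discharge_errors(observed, forecast):
    """Absolute and relative peak-discharge errors at gauges.

    Sign convention (assumed): error = observed - forecast, so a negative
    value means the forecast was too high (overprediction).
    """
    obs = np.asarray(observed, dtype=float)
    fc = np.asarray(forecast, dtype=float)
    err = obs - fc
    pct = 100.0 * err / obs
    return {
        "MAE_m3s": float(np.mean(np.abs(err))),  # mean absolute error, m3/s
        "MAE_pct": float(np.mean(np.abs(pct))),  # mean absolute relative error, %
        "ME_pct": float(np.mean(pct)),           # mean relative error (bias), %
    }


# Only the two gauges quoted in the text (Buffalo Bayou near Addicks;
# East Fork San Jacinto River near New Caney), not the full 63-gauge set.
print(discharge_errors([390.0, 3398.0], [7129.0, 3492.0]))
# expected: MAE_m3s = 3416.5, ME_pct approximately -865.36
```

Relative errors are dominated by small observed flows: the Buffalo Bayou gauge alone contributes about −1728%, which shows how a few large overestimates on small channels can push a domain-wide mean to several hundred percent.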
Official NOAA RFC forecasts are of much greater accuracy (Adams, 2016), but do not cover all US rivers. In Fig. 6, the discharge errors at USGS observation sites are plotted and polygons representing the fourteen sites are colored by their error bias (from Table 2). Many of the USGS gauging stations do not relate to sites where USGS benchmark flood extents were generated, perhaps explaining why the CSIs in Table 2 are not as low as the discharge errors in Fig. 5 might suggest. Stations are heavily concentrated in Houston, and only some of these are relevant to a single study site (San Jacinto River; Fig. 6g). Equally, many of the sites do not have a representative USGS gauging station; but for those that do, biases in the forecast discharge are often replicated in the inundation extent (where the site polygon and gauge point are of a similar color in Fig. 6). Forecast flood inundation in the east of the domain was driven by NWM discharges that are generally biased towards underprediction, which is reflected in the bias of the flood extent (Big Cow Creek (Fig. 6c), Upper (Fig. 6k) and Lower Neches River (Fig. 6m), Cow Bayou (Fig. 6f) and Pine Island Bayou (Fig. 6n)). Most of the remaining sites have an overpredictive flood extent bias, where nearby gauges indicate that the NWM forecast discharge was also overpredictive (Upper (Fig. 6b) and Lower Brazos River (Fig. 6e) and San Jacinto River (Fig. 6g)).
The conclusions drawn in this section generally support the finding in the wider literature that models based on some formulation of the shallow water equations produce a more accurate simulation of flood extent than simple planar approximations, though a performance differential of this magnitude has not been documented previously for fluvial flooding (in Bates et al. (2005) the planar method obtained a CSI of 0.11 when simulating coastal floods). As fully hydrodynamic approaches become increasingly computationally tractable, the advantages of using GIS-based methods will further diminish. As expected, the performance of the Fathom forecast model is lower than that of its hindcast counterpart, partly due to some significant errors in the NWM forecast streamflows and perhaps also in the forecast rainfall. Yet these results show that, despite these limitations, the forecast model has fair skill in replicating benchmark flood extents for Harvey.

High water mark comparison
The Fathom models were further tested against the raw HWMs that the USGS used to construct the benchmark flood extents in the preceding section. These observational data represent the maximum water surface elevation surveyed by USGS field teams at a given point following Hurricane Harvey. For the hindcast model, the mean absolute WSE error between the selected observation points and the modelled value at the same location is 1.03 m. A small number of large outlying errors means that the RMSE is considerably higher, at 1.71 m. Fig. 7a outlines the distribution of these WSE differences, which is approximately Gaussian with a central tendency close to 0 (mean error: 0.15 m). 50% of the errors lie between −0.43 m (Q25) and 0.90 m (Q75), while 90% lie between −2.32 m (Q5) and 2.12 m (Q95). Fig. 7b exhibits the expected flattening and widening of the error distribution for the Fathom forecast model: MAE rises to 1.22 m (RMSE = 1.88 m). However, this does not represent a drastic reduction in skill from hindcast to forecast.
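The summary statistics quoted for the WSE comparison can be computed as below. This is an illustrative sketch: the sign convention (modelled minus observed, so positive means the model is too high) is assumed, and the toy data are invented solely to show how a single outlier separates RMSE from MAE.

```python
import numpy as np


def wse_error_summary(modelled, observed):
    """Summary statistics of modelled-minus-observed water surface elevations (m).

    Positive errors mean the model is too high (assumed sign convention).
    """
    err = np.asarray(modelled, dtype=float) - np.asarray(observed, dtype=float)
    q5, q25, q75, q95 = np.percentile(err, [5, 25, 75, 95])
    return {
        "ME": float(err.mean()),
        "MAE": float(np.abs(err).mean()),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "Q5": float(q5), "Q25": float(q25), "Q75": float(q75), "Q95": float(q95),
    }


# Toy data: three perfect points and one 6 m outlier.
s = wse_error_summary([1.0, 2.0, 3.0, 10.0], [1.0, 2.0, 3.0, 4.0])
print(s["MAE"], s["RMSE"])  # 1.5 3.0
```

Squaring before averaging weights large errors heavily, so a single 6 m outlier doubles RMSE relative to MAE in this toy case, which mirrors the gap between the reported hindcast MAE (1.03 m) and RMSE (1.71 m).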
Point comparisons of surveyed vs. simulated WSEs are a particularly stringent test for a model of this scale. Indeed, such examinations are rarely carried out for analogous models, particularly uncalibrated, large-scale, high-resolution ones that include urban areas and smaller streams, which makes context-setting for these results quite difficult. The Wilson et al. (2007) model of the Amazon obtained an RMSE of 2.37 m (0.99 m at high water) when comparing simulated water levels to those derived from satellite altimetry data. Schumann et al. (2013) calibrated their 1 km resolution forecasting model of the Lower Zambezi River to within 0.27 m of ICESat-derived water levels. This, though, is the product of using a smooth, coarse-resolution DEM of a very flat and wide floodplain, meaning WSEs change very little as the flood extent grows. Obtaining low errors under such circumstances is much less challenging than in the test case presented here, especially when the Zambezi model has been calibrated to observations. Neal et al. (2009), in a high-resolution, reach-scale hydraulic model of an urban area built with local data, obtained a maximum RMSE of 0.28 m when calibrated to HWMs. As another example, the standard deviation of errors in the calibrated Mignot et al. (2006) model of an urban area in France was 0.53 m. It should be noted that calibrated hydraulic models are not necessarily more inherently skillful, meaning these local-scale model errors may not be analogous to those of the uncalibrated large-scale model presented here. As an example of an operational NOAA flood forecast, Adams et al. (2018) compared modelled hindcasts of their large-scale framework in the Midwest to USGS gauged stages and found errors of < 0.5 m. While this is more accurate than the model presented here, it required thousands of manual bathymetric surveys to build, meaning the Adams et al. (2018) model only simulates flooding on large rivers of known geometry.
In the context of Hurricane Harvey, where flooding from small streams and surface runoff accounted for a considerable portion of the hazard, this NOAA exemplar model may not have attained similar errors when benchmarked against high water marks on the floodplain and particularly where flooding was predominantly pluvial. Zheng (2018) tested the NWM-HAND model of maximum inundation outlined in the preceding section (NOAA NWC et al., 2018) against this same set of USGS HWMs and found the standard deviation of model error to be 4 m. Previous studies have also quantified the errors in surveyed HWMs that such models are calibrated to or validated against. Errors are generally 0.3-0.5 m (Schumann et al., 2007;Neal et al., 2009;Horritt et al., 2010;Fewtrell et al., 2011), owing to deposition during a hiatus in flood recession (i.e. not high water), wall seepage or debris line width. These are broadly consistent with the maximum errors reported for the HWMs used in this study, but average errors are generally < 0.1 m (Watson et al., 2018).
It is clear that the automatically generated forecast and hindcast models in this paper do not obtain WSE errors comparable to those of bespoke hydraulic models in the wider literature or NOAA NWS RFC forecasts, though without hydraulic model calibration, local data and manual operation they were never likely to. It is also important to point out the difference in purpose between this and other models. Rather than seeking to perfectly replicate a given flood event with few limitations on processing and computation time, this is a forecast product designed to quickly indicate areas susceptible to flooding from an incoming storm to enable an immediate response. Rapid exposure assessments and resource allocation by first responders do not demand a highly accurate map of water depths, but rather a broad indication of the spatial patterns in flooding that may occur (Pitt, 2008; Price et al., 2012). Despite some sacrifice of point precision (e.g. relative to the accuracy of the 155 discrete forecast inundation points in NOAA AHPS), the framework's total coverage, both in geography and in flood driver, serves to fill the information gaps left by sporadic yet accurate local forecasts. Besides, efforts to reduce WSE errors in the hydraulic model are of little value when considering the commensurate or larger errors in the meteorological and hydrological models that precede it in the forecast cascade. Cangialosi (2018) noted that intensity and track errors in the NOAA NHC 3-day forecasts during the 2017 hurricane season were ~7 m s⁻¹ and ~150 km respectively. The maximum total rainfall in the NOAA WPC 3-day forecast was ~1000 mm, yet a maximum of over 1500 mm was observed (Blake and Zelinsky, 2018). Hydrological models, used to translate rainfall to streamflow, may also be significantly uncertain, even if the meteorological inputs were error-free (Blöschl et al., 2013).
Generally accepted benchmarks for satisfactory performance in hydrological models are: (i) within 25% error in simulated discharge and (ii) a Nash-Sutcliffe efficiency greater than 0.5, meaning that the model's mean square discharge error is less than half of the observed variance (Moriasi et al., 2007; Refsgaard and Knudsen, 1996; Ritter and Muñoz-Carpena, 2013). With source and pathway uncertainties such as these propagating into the receptor component of the forecast cascade (Fig. 1), a hydraulic model deviating from observations by an average of ~1 m is unsurprising, yet such a model may still be useful for event early warning and first responder preparedness in the absence of local modelling strategies.
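For reference, the two hydrological benchmarks can be written as short functions. The sketch uses the standard Nash-Sutcliffe efficiency, NSE = 1 − Σ(Qobs − Qsim)² / Σ(Qobs − mean(Qobs))², and percent bias in the form PBIAS = 100 Σ(Qobs − Qsim) / ΣQobs, which is our reading of the Moriasi et al. (2007) formulation (negative values indicate overestimation). The discharge series are invented for illustration.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of observations."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def percent_bias(observed, simulated):
    """Percent bias; negative values indicate the model overestimates."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)


def satisfactory(observed, simulated):
    """Both benchmarks quoted in the text: NSE > 0.5 and |PBIAS| <= 25%."""
    return nse(observed, simulated) > 0.5 and abs(percent_bias(observed, simulated)) <= 25.0


obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
print(nse(obs, sim), percent_bias(obs, sim))  # NSE = 1 - 26/500 = 0.948, PBIAS = 0.0
```

An NSE of 0 means the model is no better than predicting the observed mean at every time step, which is why 0.5 is treated as a minimum for satisfactory skill.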
659 of the HWMs collected by the USGS also reported the height of the water above the ground surface, so a surveyed ground elevation can be calculated by subtracting this measurement of water depth from the WSE. Comparing these elevation values to those at the corresponding location in the DEM provides further context for the errors shown in Fig. 7a and b. Again, it should be noted that the observed data contain error in both the surveyed water surface elevation and its height above the ground: such errors could plausibly be half a meter or more (Watson et al., 2018). Fig. 7c shows the spread of these elevation errors: the DEM RMSE is 3.77 m, though this is heavily influenced by a handful of implausibly high errors (maximum negative error: −58.9 m; maximum positive error: 20.9 m) which are just as likely to be due to human error in the manual ground survey as to erroneous values in the DEM. Reducing the influence of these errors by considering MAE, the quantity stands at 1.19 m. With forecast and hindcast WSE MAEs of 1.21 and 1.03 m respectively, the water levels simulated by the model are similar to or more accurate than the DEM, implying two key findings. Firstly, much of the WSE error is attributable to the quality of the USGS NED (the source of the model DEM) in this domain. It stands to reason, then, that in domains with a greater proportion of LiDAR in the NED, ground, and thus water surface, elevation error would drop. Once again, the assertion made by Horritt and Bates (2002) holds: topography is the major control on flood inundation patterns. Secondly, although it seems counterintuitive that errors in WSE can be smaller than those of the DEM from which they are derived, this underlines the idea that relative, rather than absolute, DEM accuracy is much more important in hydraulic modelling.
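The ground-elevation check described above amounts to a subtraction followed by error statistics. The sketch below assumes error = DEM − surveyed ground (positive means the DEM is too high); the function name and the three sample points are illustrative, not USGS data.

```python
import numpy as np


def dem_check(hwm_wse, hwm_depth, dem_at_hwm):
    """Compare DEM elevations with ground elevations implied by HWMs.

    surveyed ground = surveyed WSE - surveyed water depth above ground;
    error = DEM - surveyed ground (positive: DEM too high, assumed convention).
    """
    ground = np.asarray(hwm_wse, dtype=float) - np.asarray(hwm_depth, dtype=float)
    err = np.asarray(dem_at_hwm, dtype=float) - ground
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "max_neg": float(err.min()),
        "max_pos": float(err.max()),
    }


# Illustrative values only (not USGS data).
r = dem_check(hwm_wse=[10.0, 12.0, 8.0], hwm_depth=[1.0, 2.0, 0.5], dem_at_hwm=[9.5, 9.0, 7.5])
print(r["MAE"])  # 0.5
```

Any survey error in either the WSE or the depth reading passes directly into the implied ground elevation. This is one reason the extreme values behind the 3.77 m RMSE are treated with caution and MAE is preferred.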
How the elevation varies between pixels in a locality controls the movement of water over a floodplain, not a pixel's elevation relative to a vertical datum (though this does not apply to simulations of coastal flooding). Typically, relative DEM errors are much lower than absolute ones: Gesch et al. (2014) quantified a ~20% reduction in the standard deviation of relative compared to absolute errors in the NED at 1/3 arc second resolution. Fig. 8 displays the spatial distribution of the WSE errors for the hindcast. Areas of high performance (in white; errors between −0.5 and 0.5 m) are evident across the domain, particularly in Matagorda and Brazoria Counties south of Houston (approx. 29.2°N 95.5°W; to the south of the top-left panel in Fig. 8). Larger errors are found in Houston and along the Calcasieu River in Lake Charles (30.2°N 93.2°W; to the east of the top-right panel in Fig. 8), signaling model overprediction of WSEs. Incorporating into this interpretation Fig. 9, which exhibits the spatial distribution of the DEM errors in Fig. 7c, one might expect areas of high WSE error also to be areas of high ground elevation error. Observing Figs. 8 and 9 in tandem suggests that this is not the case. Areas of low WSE error to the south of Houston are in fact dominated by relatively high ground elevation error (as colors move away from white towards red and black), while ground elevation errors in areas identified as having high WSE error (Houston and Lake Charles) are low or, if anything, of the opposite sign to the WSE errors. This reinforces the suggestion that relative, and not absolute, DEM accuracy is pre-eminent, and points to the well-documented challenges of hydraulic modelling in urban areas (Yu and Lane, 2006a; Mason et al., 2007; Hunter et al., 2008), especially at large scales (Wing et al., 2017). High WSE errors are generally confined to urban areas in this study, where horizontal, rather than vertical, elevation accuracy (i.e. grid resolution) controls flood model performance to a much greater extent than elsewhere. This is due to the increased prevalence of hydraulically important anthropogenic features (small-scale flow paths, building walls, levees, roads and much else) which are unresolved by the elevation data because they are smaller than the width of a grid cell. In the absence of the computational capacity needed to run large-scale models at very high resolution (e.g. Sampson et al., 2012), which is some distance from feasibility given that halving the grid cell size increases computation time by roughly an order of magnitude (Savage et al., 2016), the solution to this problem may lie with nested or variable-resolution grids in which high-risk cities are modelled at finer resolution (e.g. Sanders et al., 2010; Kim et al., 2014; Sanders and Schubert, 2019), or with innovative sub-grid scale solutions (e.g. Yu and Lane, 2006b; Sanders et al., 2008; Schubert and Sanders, 2012; Guinot et al., 2017), the bottleneck of the latter approach being the acquisition of parameterization data at large scale.
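The order-of-magnitude cost of refining the grid follows from a simple scaling argument for explicit 2D schemes. The sketch assumes a CFL-type stable timestep of the form Δt = α Δx / √(g h_max), as used by explicit inertial schemes such as LISFLOOD-FP, with α = 0.7 chosen here for illustration. Under that assumption, halving Δx quadruples the number of cells and halves the timestep, so cost grows as Δx⁻³, roughly 8 times per halving.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def inertial_timestep(dx, h_max, alpha=0.7):
    """CFL-style stable step dt = alpha * dx / sqrt(g * h_max) (s); alpha is assumed."""
    return alpha * dx / math.sqrt(G * h_max)


def relative_cost(dx_old, dx_new):
    """Cost ratio of refining a 2D grid from dx_old to dx_new.

    Cell count scales as dx^-2 and the number of time steps as dx^-1
    (via the timestep above), so total cell updates scale as dx^-3.
    """
    return (dx_old / dx_new) ** 3


print(relative_cost(30.0, 15.0))  # 8.0
print(inertial_timestep(15.0, 2.0) / inertial_timestep(30.0, 2.0))  # 0.5
```

The real ratio also depends on memory, I/O and the deepest water in the domain, so 8 is a lower-bound style estimate rather than a measured figure.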

Conclusions
This paper presents a flood forecasting product for hurricanes in the US that has the potential to be used operationally in the absence of accurate local forecasts, and comprehensively tests it against ground truth data collected by the USGS. The framework takes available hydrologic NOAA forecasts as inputs to an existing continental-scale model structure (Wing et al., 2017), accounting for all primary flood drivers, in order to rapidly simulate event water depth grids for a given domain anywhere in the US. Comparing model hindcasts of Hurricane Harvey to benchmark data of its maximum flood extent indicates that the model has skill in picking up the spatial patterns of inundation (mean CSI of 0.66). When benchmarked against surveyed water surface elevations, the model misestimates this quantity by roughly 1 m on average. This is against a backdrop of similar errors in the DEM and up to ~0.5 m of error in the measured HWMs. The model in forecast mode experiences only a moderate drop in performance relative to the hindcast. We conclude this drop to be commensurate with or smaller than likely uncertainties in the preceding components of the forecast cascade, and similar to errors in the underlying DEM. With expanded lidar coverage within the USGS NED, the model presented here may approach the accuracy of operational NOAA inundation forecasts (Adams et al., 2018; Mashriqui et al., 2014), but with total coverage and for a diverse range of flood drivers.
Leading forecast centers generally only produce point water levels and flows (e.g. river discharge, storm surge height), neglecting the crucial receptor component that translates this information to a 2D grid of flood depths in favor of focusing almost exclusively on improved meteorological modelling. Contemporary 1D approaches by NOAA RFCs are likely more accurate and computationally efficient in fluvial settings, but these are only available in limited areas with accurate local data. This paper shows that large-scale hydraulic modelling of fluvial, pluvial and coastal flooding can and should play a role in medium-term forecasts, outperforming simpler GIS-based approaches. With the hydrodynamics of pluvial and coastal flooding from Hurricane Harvey taking only ~6 hours to run on a single node of 20 cores, and extractions from a pre-computed library of fluvial flood maps taking only seconds, forecast centers can couple such a module to their existing medium-term forecast frameworks and provide benefits to a plethora of end-users, while sacrificing only a marginal portion of available computation time. The principle of sampling from a pre-computed inventory of flood maps could permit this framework to be applied in a probabilistic ensemble forecast, an important method for accounting for model and exogenous errors. This will be addressed in forthcoming research.
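The two compute figures quoted in this paper are consistent: ~6 wall-clock hours on a 20-core node is 6 × 20 = 120 core-hours, the per-run cost given for the deterministic model. The sketch below extends that arithmetic to an ensemble. It assumes one run per node, perfect scaling and no queueing or I/O overhead, and uses a 7-member example borrowed from the NWM v2 ensemble size mentioned earlier. `ensemble_budget` is a hypothetical helper, not a tool used in this study.

```python
import math


def ensemble_budget(wall_hours_per_run, cores_per_run, members, nodes):
    """Core-hours and wall-clock hours for an ensemble of independent runs,
    assuming one run per node, perfect scaling and no queueing or I/O overhead."""
    core_hours = wall_hours_per_run * cores_per_run * members
    batches = math.ceil(members / nodes)  # runs beyond the node count wait for a free node
    return core_hours, batches * wall_hours_per_run


print(ensemble_budget(6, 20, 1, 1))  # (120, 6): one deterministic run
print(ensemble_budget(6, 20, 7, 7))  # (840, 6)
print(ensemble_budget(6, 20, 7, 2))  # (840, 24)
```

Because ensemble members are independent, wall-clock time stays at a single run's duration whenever enough nodes are available. The core-hour total, not the lead time, is what grows with ensemble size.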
Employing a true hydrodynamic model that properly represents the physics of floodplain flow is shown to outstrip the performance of a simplified GIS-based approach. While these planar approximations have gained in popularity due to their "quick-and-dirty" solutions to computationally-intensive problems, we have demonstrated that they are not a substitute for 2D hydraulic modelling in this instance. Mass- and momentum-conserving hydraulic codes are shown to be suitably fast and much more accurate than zero-physics approaches, even over large scales where such simulations have historically been intractable.
Looking to the future, improved representation of terrestrial features through sub-grid parameterization (e.g. Guinot et al., 2017; Sanders and Schubert, 2019), more comprehensive inventories of defense structures (Scussolini et al., 2016) and large-scale acquisition of river channel information from imminent satellite launches (e.g. NASA's Surface Water and Ocean Topography satellite in 2021; Biancamaria et al., 2016), to name but a few projects on the horizon, will herald yet another revolutionary leap in large-scale hydraulic modelling.

Data availability
Streamflow forecasts from the NOAA National Water Model are available from https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/. NOAA Weather Prediction Center rainfall forecasts can be accessed at https://www.wpc.ncep.noaa.gov/qpf/day1-3.shtml. NOAA National Hurricane Center coastal surge height forecasts are accessible from https://slosh.nws.noaa.gov/psurge2.0/. USGS river gauge observations can be downloaded from https://waterdata.usgs.gov/nwis/rt. Rainfall observations from the NOAA National Weather Service are available at https://water.weather.gov/precip/download.php. Observed coastal surge heights from NOAA Tides and Currents can be accessed at https://tidesandcurrents.noaa.gov/gmap3/. The hydrodynamic model LISFLOOD-FP can be downloaded from http://www.bristol.ac.uk/geography/research/hydrology/models/lisflood/downloads/. Output from the NOAA National Water Center Hurricane Harvey HAND models is available at https://www.hydroshare.org/resource/fe85a680d0144e79b39e8c483dc1e5aa/. USGS observations and benchmark data can be downloaded from https://www.sciencebase.gov/catalog/item/5a85f30fe4b00f54eb36d3a9. The Fathom Harvey depth grids presented in this study are proprietary, but can be made available for academic research purposes (i.e. not commercial, policy or regulatory applications) by contacting Christopher Sampson at Fathom (c.sampson@fathom.global).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.