Predicting the Storm Surge Threat of Hurricane Sandy with the National Weather Service SLOSH Model

OPEN ACCESS. J. Mar. Sci. Eng. 2014, 2

Numerical simulations of the storm tide that flooded the US Atlantic coastline during Hurricane Sandy (2012) are carried out using the National Weather Service (NWS) Sea, Lake, and Overland Surges from Hurricanes (SLOSH) storm surge prediction model to quantify its ability to replicate the height, timing, evolution and extent of the water that was driven ashore by this large, destructive storm. Recent upgrades to the numerical model, including the incorporation of astronomical tides, are described and simulations with and without these upgrades are contrasted to assess their contributions to the increase in forecast accuracy. It is shown, through comprehensive verifications of SLOSH simulation results against peak water surface elevations measured at the National Oceanic and Atmospheric Administration (NOAA) tide gauge stations, by storm surge sensors deployed by the U.S. Geological Survey (USGS) and from hundreds of high water marks collected by the USGS, that the SLOSH-simulated water levels at 71% (89%) of the data measurement locations have less than 20% (30%) relative error. The RMS error between observed and modeled peak water levels is 0.47 m. In addition, the model’s extreme computational efficiency enables it to run large, automated ensembles of predictions in real-time to account for the high variability that can occur in tropical cyclone forecasts, thus furnishing a range of values for the predicted storm surge and inundation threat.

(a) GOES-13 natural color satellite image at 17:45 UTC on 28 October 2012 (courtesy of NASA Earth Observatory); and (b) surface weather chart at 21 UTC 29 October 2012, approximately two and a half hours before landfall (courtesy of the National Oceanic and Atmospheric Administration (NOAA)). Note the interaction of the hurricane with the approaching winter storm, the subsequent drop in mean sea level pressure to 940 mb, and the development of cold and warm fronts during the hybridization process off the coast of New Jersey.
One of the most dangerous aspects of Hurricane Sandy was its large size, approximately 1150 miles (1850 km) in diameter, based on the extent of the last closed isobar, with a wind field that created a significant storm tide threat to vast areas along the Atlantic coastline and inland. Hurricane Sandy retained its large wind field, large radius of maximum winds, and hybrid characteristics through landfall [2]. After Hurricane Sandy made landfall in NJ, its sustained winds increased as an effect of the winter storm approaching from the west. The combination of Hurricane Sandy and the winter storm, timed with the full-moon high tide on the night of 29 October, worsened the storm-tide flooding along the NJ, NY and CT coastlines and caused significant flooding far inland along the Delaware and Hudson Rivers [3]. Hurricane Sandy caused 147 direct deaths (286 total) and damage of $68 billion. It is the second-costliest Atlantic hurricane on record.
The storm surge above astronomical tide produced by Hurricane Sandy reached its highest observed levels of 3.86 m (12. According to a recent National Hurricane Center (NHC) technical memorandum [4], inundation is defined as the total water level that occurs on normally dry ground as a result of the storm tide. It is expressed in terms of height of water, in feet, above ground level (AGL). NHC's official forecasts provide storm surge-induced flooding information in terms of inundation (feet of water above ground level). The tidal datum MHHW (Mean Higher High Water) is considered the best possible approximation of the threshold at which inundation can begin to occur since at the coast, areas higher than MHHW are typically dry most of the time.
The highest recorded total water levels, which occurred within half an hour of high tide in the Staten Island and Manhattan areas, reached a record 4.28 m (14.06 ft) above Mean Lower Low Water (MLLW), 2.74 m (8.99 ft) above MHHW at The Battery, NY; a record 4.36 m (14.31 ft) above MLLW, 1.98 m (6.51 ft) above MHHW at Kings Point; and 4.44 m (14.58 ft) above MLLW, 2.76 m (9.06 ft) above MHHW at Bergen Point West Reach. At The Battery, the storm tide (the combination of storm surge and astronomical tide [4]) crested 1.39 m (4.55 ft) higher than the water level that occurred during Hurricane Irene (2011) [2]. Storm tide records were broken at Sandy Hook, NJ with 4.03 m (13.23 ft) MLLW, 2.44 m (8.01 ft) MHHW and at Philadelphia, PA with 3.24 m (10.62 ft) MLLW, 1.2 m (3.93 ft) MHHW 8 h after landfall. The tide gauge at Sandy Hook failed before the peak water levels were reached. Table 1 summarizes the maximum total, tide (referenced to various vertical datums) and surge water levels reached at three NOAA stations at the coast: The Battery, Bergen Point and Kings Point. At The Battery, total water levels crested at the same time as the surge, even though the highest tides arrived half an hour earlier. At Bergen Point, the maximum surge arrived half an hour after the highest total water level, while at Kings Point the maximum surge arrived two hours before the highest total water level.
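The paired readings above imply the local height of MHHW above MLLW at each station. The following is a minimal Python sketch that uses only the values quoted in this section; the dictionary layout and the function name are illustrative assumptions and do not correspond to any NOAA tool.

```python
# Peak storm tide at three NOAA stations, in metres, as quoted in the text.
# Each tuple is (height above MLLW, height above MHHW).
PEAKS_M = {
    "The Battery": (4.28, 2.74),
    "Kings Point": (4.36, 1.98),
    "Bergen Point West Reach": (4.44, 2.76),
}

def mhhw_above_mllw(station):
    """Return the implied height of MHHW above MLLW (m) at a station."""
    above_mllw, above_mhhw = PEAKS_M[station]
    # Both readings describe the same water surface, so their difference
    # is the offset between the two datums at that station.
    return round(above_mllw - above_mhhw, 2)

for name in PEAKS_M:
    print(f"{name}: MHHW is about {mhhw_above_mllw(name):.2f} m above MLLW")
```

The larger offset implied at Kings Point (about 2.38 m) is why its MHHW-referenced peak is the lowest of the three even though its MLLW-referenced peak is not.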
A buoy at the entrance of New York Harbor (Station 44065), 15 nm southeast of Breezy Point, NY, measured a record significant wave height (SWH, the highest one-third of all wave heights measured during a 20-min sampling period) of 9.86 m (32.5 ft) at 00:50 UTC on 30 October and an atmospheric pressure of 958 hPa, while buoys in Central (44039) and Western (44040) Long Island Sound recorded SWHs of 2.2 m and 2.1 m, respectively. Buoy (44009) at Delaware Bay, 48 km (26 nm) SE of Cape May, NJ, USA, reached a SWH of 7.38 m. At more than 300 km (190 miles) away from the point of landfall at Block Island, RI (44097), SWHs reached 9.48 m. Even as far away as 450 km (280 miles) at Buoy 44008, located 54 nm SE of Nantucket Shoals, a SWH of 10.97 m was registered.

Table 1. Maximum total, tide (referenced to various vertical datums) and surge water levels reached at three NOAA tide gauge stations at the coast: The Battery, Bergen Point and Kings Point, NY (see Figure 2 for station locations).

These various measurements depict the difficulty in assessing the storm surge threat because water level values might be referenced to different vertical datums or the quoted water surface elevations might represent only partial components of the total water level (e.g., tide or surge). It is easy to see how the public could become confused by this plethora of information and why it is crucial to communicate the storm surge threat clearly to the public to minimize the loss of life. Therefore, in addition to producing operational storm surge forecasts and issuing public advisories, the National Hurricane Center (NHC) has worked extensively with social scientists to craft graphics and text that convey the potential dangers of storm surge effectively [6].
Operational storm surge forecasts during the storm and post-storm hindcast simulations of Hurricane Sandy were run by forecasters in NHC's Storm Surge Unit using the NWS Sea, Lake, and Overland Surges from Hurricanes (SLOSH) model. This manuscript describes the operational forecasts of Hurricane Sandy run in the SLOSH ny3 basin (Figure 3), the improvements to the surge forecasting system implemented during 2013, and how the storm would have been predicted had the enhanced system been available in 2012.
Hindcast simulations of Hurricane Sandy were run for analysis and verification. Water levels observed at NOAA tide gauge stations, by USGS temporary storm surge sensors (SSS) and from high water marks (HWM) were compared with the numerically simulated water levels to assess model performance.

Hurricane Sandy track and the storm tide (m) simulated by the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) numerical storm surge prediction model in the ny3 basin.
SLOSH is an extremely computationally efficient, 2-D explicit, finite-difference model, formulated on a semi-staggered Arakawa B-grid [8]. The horizontal transport equations are solved through the application of the Navier-Stokes momentum equations for incompressible and turbulent flow. The SLOSH model transport equations were derived by Platzman [9], in which the dissipation is determined solely by an eddy viscosity coefficient. A bottom slip coefficient was included by Jelesnianski [10]. The governing equations are integrated over the entire depth of the water column. At every time step, the horizontal transports are solved from the pressure, Coriolis and frictional forces. These transports generate an updated level of surge at every model grid point. SLOSH includes a wetting-and-drying algorithm to predict inland inundation.
A simplified parametric wind model is embedded in the SLOSH model. The input parameters of the wind model consist of the storm track (latitude and longitude of the center's location), radius of maximum winds and the difference between the environmental and the central pressures (pressure drop) of the storm. The wind-driven forcing is incorporated into SLOSH as wind stress.
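To illustrate how a parametric wind model turns a handful of storm parameters into a wind field, the sketch below uses the radial profile V(r) = V_max · 2Rr / (R² + r²), a form commonly attributed to the SLOSH wind model in the literature. Treating this as the operational formulation is an assumption; the sketch also takes V_max as given rather than deriving it from the pressure drop, and it omits inflow angle and storm-motion effects. All function names, parameter names and storm values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two lat/lon points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def wind_speed(r_km, v_max, r_max_km):
    """Wind speed at radius r_km for a symmetric vortex.

    Profile: V(r) = v_max * 2 * R * r / (R**2 + r**2), which peaks at r = R.
    """
    if r_km < 0 or r_max_km <= 0:
        raise ValueError("r_km must be non-negative and r_max_km positive")
    return v_max * 2.0 * r_max_km * r_km / (r_max_km ** 2 + r_km ** 2)

# Illustrative values only (not Hurricane Sandy's actual parameters).
center = (39.0, -74.0)   # storm center latitude, longitude
point = (40.7, -74.0)    # a grid point of interest
r = distance_km(*center, *point)
print(f"{r:.0f} km from center: {wind_speed(r, 36.0, 150.0):.1f} m/s")
```

A profile of this kind decays slowly outside the radius of maximum winds, so a storm with a large radius produces strong winds over a wide area, which is the situation described for Sandy.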
SLOSH grids have different shapes (hyperbolic, elliptical or polar) that can be customized for specific coastline geometries, with higher resolution near the coast and grid cells that telescope outward concentrically to lower resolution offshore. There are 37 operational SLOSH basins that cover the east coast of the US, the Gulf of Mexico, the Bahamas, Puerto Rico and the Virgin Islands. The bathymetry and topography in the model grid cells are derived from National Elevation Dataset (NED) digital elevation models (DEMs) from the U.S. Geological Survey (USGS), the NOAA National Geophysical Data Center (NGDC) Tsunami inundation DEMs, and Light Detection and Ranging (LIDAR) data from the US Army Corps of Engineers (USACE) or from state and local sources, if available, and the bathymetry from NGDC 3 arc-second Coastal Relief Model. All the bathymetric/topographic data must be referenced to a single vertical datum and averaged to obtain the depth/elevation of each individual SLOSH cell. The land cover classifications are derived from the USGS 30 m spatial resolution National Land Cover Database (NLCD). SLOSH basins include subgrid-scale features that allow simulation of the flow through barriers, gaps, passes, overtopping of barriers, roads, and levees.
An automated, event-triggered, storm surge prediction system, AutoSurge [11], was developed at NHC in 2010 to accelerate forecaster workflows by eliminating labor-intensive tasks, computing storm parameters with greater accuracy and preventing human input error. The system runs the SLOSH model; the input is determined objectively and consistently for all operational simulations. AutoSurge automatically generates a vast array of products from the SLOSH model output to provide internal guidance to the Storm Surge Specialists.

Forecasts
As soon as a tropical disturbance with the potential of developing into a tropical cyclone in the subsequent 48 h is identified in the Atlantic Ocean, Caribbean Sea, or the Gulf of Mexico, AutoSurge begins generating storm surge forecast simulations using the SLOSH model. The system alerts the Storm Surge Specialists at NHC, sending guidance products via e-mail, and the results are available on an internal web site, both in tabular and graphical format. Forecasts are run using storm track information that includes the latitude and longitude of the storm's center, intensity (maximum sustained 1-min wind speed), pressure drop and radius of maximum winds from NHC's Best Track operational data and parameters from all of the model information available to the Hurricane Specialists at NHC. The SLOSH parametric wind model is used to ensure that the parameters in the SLOSH wind formulation are consistent with those in the model guidance, i.e., the resulting wind speed in the SLOSH wind model is in accordance with the NHC's Best Track and the model guidance intensity, in a manner similar to other storm surge forecast systems [12,13].
Graphics of the ensemble maximum envelope of water, model track spread, individual ensemble member maximum water levels, wind intensity, the radius of maximum winds, and forecast trends are generated to depict the expected range of the storm surge forecasts to account for variability in the atmospheric forcing.
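The ensemble maximum envelope of water can be thought of as a cell-by-cell maximum over the peak water levels of all ensemble members. The sketch below shows that reduction on plain Python lists. The data layout (one list of per-cell peaks per member, with `None` for cells that stay dry) and the function names are assumptions made for illustration and are not taken from AutoSurge.

```python
def ensemble_envelope(members):
    """Cell-by-cell maximum over ensemble members.

    members: list of equally sized lists of peak water levels (m);
             None marks a cell that stayed dry in that member.
    """
    if not members:
        raise ValueError("need at least one ensemble member")
    n_cells = len(members[0])
    if any(len(m) != n_cells for m in members):
        raise ValueError("all members must cover the same grid cells")
    envelope = []
    for cell_values in zip(*members):
        wet = [v for v in cell_values if v is not None]
        envelope.append(max(wet) if wet else None)
    return envelope

def ensemble_range(members):
    """Lowest and highest basin-wide peak among members with any wet cell."""
    peaks = []
    for m in members:
        peak = max((v for v in m if v is not None), default=None)
        if peak is not None:
            peaks.append(peak)
    return (min(peaks), max(peaks)) if peaks else (None, None)

# Two hypothetical members on a three-cell grid.
members = [[1.0, None, 2.5], [1.5, None, 2.0]]
print(ensemble_envelope(members))
print(ensemble_range(members))
```

The envelope is a worst case per cell rather than the output of any single simulation, while the range of member peaks gives a rough sense of the spread caused by uncertainty in the atmospheric forcing.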
AutoSurge was run in surge-only mode during the 2012 hurricane season. More than 1000 AutoSurge numerical simulations were run during Hurricane Sandy using the Best Track and the internal NHC Results for the ny3 basin will be described and the model output graphics will be shown in this manuscript. These ensemble simulations are run in conjunction with the probabilistic P-Surge modeling system [7] developed at NOAA/Meteorological Development Laboratory (MDL), which runs an ensemble of storm surge simulations using historical error statistics of the wind parameters to generate the forecast tracks.
Enhancements made to AutoSurge in 2013 include the new version of SLOSH + Tides (V. 2), which incorporates the tides dynamically at every time step and at every SLOSH model grid point [14]. The location-dependent amplitudes and phases of 37 tidal constituents (selected to be consistent with NOAA/NOS station data) are specified at all locations in the SLOSH grid [15] (see Table 2 and the glossary at [16]).
The harmonic constituents used in the SLOSH + Tides code had recently been extracted from the new, updated experimental EC2013 ADCIRC tidal database. This database employs high-resolution NOAA VDatum meshes (coastal resolution down to 14 m) along the Atlantic and Gulf Coasts of the United States, Puerto Rico and US Virgin Islands, an updated offshore bathymetry using the latest global sources, namely, Space Shuttle Radar Topography Mission SRTM30_PLUS V8.0 from the Scripps Institution of Oceanography and ETOPO1 global relief model from NOAA [17] and open boundary forcing with the latest global tidal models (TPXO 7.2 OSU Tidal Inversion Software, and later on from the FES 2004 Global Tidal Atlas and the newly released FES2012 model) [18]. AutoSurge incorporated V. 2 of SLOSH + Tides in the forecast system workflow for the 2013 hurricane season. AutoSurge used V. 2.1 of SLOSH + Tides for the ny3 basin, which has a tide-forcing threshold (bathymetric depth of influence) from the deep ocean up to a specified depth. Testing and analysis of various threshold depths for the ny3 basin determined that the optimum setting was 100 ft (30.48 m).
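The role of the constituent amplitudes and phases can be illustrated with the standard harmonic sum h(t) = Σ A_i cos(ω_i t − φ_i). The sketch below evaluates that sum for two well-known constituents, M2 and S2, at a single location. It is a simplified illustration, not the SLOSH + Tides implementation: nodal corrections and equilibrium arguments are omitted, and the amplitudes and phases are hypothetical rather than values from the EC2013 database.

```python
import math

# Angular speeds in degrees per hour for two well-known constituents.
SPEED_DEG_PER_HR = {"M2": 28.9841042, "S2": 30.0}

def tide_height(t_hours, constituents):
    """Harmonic tide height: h(t) = sum_i A_i * cos(w_i * t - phi_i).

    constituents maps a constituent name to (amplitude_m, phase_deg).
    """
    total = 0.0
    for name, (amp, phase_deg) in constituents.items():
        angle = math.radians(SPEED_DEG_PER_HR[name] * t_hours - phase_deg)
        total += amp * math.cos(angle)
    return total

# Hypothetical amplitudes (m) and phases (degrees) at one grid point.
local = {"M2": (0.65, 20.0), "S2": (0.13, 45.0)}
for hour in range(0, 13, 3):
    print(f"t = {hour:2d} h  tide = {tide_height(hour, local):+.2f} m")
```

In a model that carries the tide dynamically, a sum of this kind would be evaluated at every time step for each grid point that receives tidal forcing, using that point's own amplitudes and phases.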
Due to the limited amount of time available to complete the numerical forecasts, the model runtime has to be short to be able to construct the storm surge prediction ensembles. The runtime performance for a typical SLOSH model simulation run over the ny3 basin on a typical desktop PC or Linux workstation is shown in Table 3.

In the past two years, directed by research, testing and recommendations from social scientists [6], NHC's public advisories were modified to include values of inundation above ground level at the peak of high tide so the public would better understand the storm surge threat. An example, Public Advisory 26A, issued for 8:00 PM EDT (00:00 UTC) Sunday 28 October 2012, one day before Hurricane Sandy made landfall in New Jersey, is shown in Figure 4. Note that the inundation depths are given in feet "above ground" and are considered valid only if the peak of the storm surge coincides with the peak of the astronomical tide.

Surge Forecast Simulations
SLOSH surge-only simulations (without tides) were run operationally in 2012 for Hurricane Sandy, as described above. Figure 5 shows an example of the model tracks used by NHC's Hurricane Specialists as guidance to determine the OFCL track for Hurricane Sandy 48 h prior to landfall. It depicts a large spread in the model tracks with various intensities, sizes and storm center locations. This guidance is used to run the ensemble SLOSH simulations. Figure 6 displays the ensemble maximum envelope of water 48 h prior to landfall, with a maximum total water level of 4.94 m (16.2 ft) NAVD88. A summary plot of the ensemble results for the simulations, valid 48 h prior to landfall, is shown in Figure 7. A maximum wind speed of 51 m s⁻¹ (100 kt), which occurred when Sandy made landfall in Cuba on 25 October, appears in all the models. The winds at the closest point of approach (CPA; prior to or at landfall) vary from 8 to 37 m s⁻¹ (17 to 72 kt), which indicates the uncertainty in the wind forcing and, therefore, the variability in the storm surge potential. The top (purple) panel indicates the radius of maximum winds at CPA for each model/aid ensemble, which varies from 8 to 218 km (5 to 136 miles, 4 to 118 nm). This also contributes to the unpredictability of the storm surge hazard, even 48 h prior to actual landfall.

As the storm evolves in time, the AutoSurge forecast system calculates the trend of maximum water elevation above NAVD88 and the water height above ground level for all the ensemble members at each synoptic time, as shown in Figure 8a,b. The yellow box depicts the range of water levels issued by NHC in the forecast advisories. The maximum predicted water elevations converge to 3.8 m (12.4 ft) relative to the NAVD88 vertical datum or 2.9 m (9.5 ft) of inundation (AGL).

Surge-Plus-Tides Forecast Simulations
If Hurricane Sandy were to be forecast today with the enhancements described earlier, then the SLOSH model simulations would have tides included in the hydrodynamic equations and would depict the total water levels. A comparison of surge vs. surge-plus-tides simulation results, in the form of an ensemble summary plot, is shown in Figure 9a,b, respectively.
Depending on the timing of the tides, the water levels of each ensemble member vary accordingly, in some cases higher and other cases lower than the counterpart without tides. In the case of the surge-plus-tides simulations, the water levels AGL are lower since the cells (areas) that would be wetted by the tides alone at any time during the model simulation are not considered inundated in the results. The maximum water level simulated 48 h prior to landfall is 3.6 m (11.7 ft) AGL for surge-plus-tides, while it is 4.3 m (14.1 ft) for surge-only. The maximum water levels in NAVD88 are higher for the surge-plus-tides simulations, with a maximum of 5.46 m (17.9 ft) as opposed to 4.9 m (16.1 ft) for the surge-only simulations.
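The distinction between a water level referenced to a datum and inundation above ground level (AGL) can be sketched as follows. This is one reading of the paragraph above, not the SLOSH source code: depth AGL is the water level minus the ground elevation (both relative to the same datum), and any cell that a tide-only run would wet is reported as not inundated. The function name, argument names and sample values are hypothetical.

```python
def inundation_agl(water_level, ground_elev, tide_only_max):
    """Depth of water above ground (m) for each cell.

    water_level   : peak surge-plus-tides level per cell, relative to a datum
    ground_elev   : cell elevation relative to the same datum
    tide_only_max : highest level reached per cell in a tide-only run
    Cells that the tide alone would wet are reported as 0.0 (not inundated).
    """
    depths = []
    for wl, z, tide in zip(water_level, ground_elev, tide_only_max):
        wetted_by_tide = tide > z
        if wetted_by_tide or wl <= z:
            depths.append(0.0)
        else:
            depths.append(wl - z)
    return depths

# Three hypothetical cells: flooded upland, tidal flat, dry high ground.
levels = [3.0, 3.0, 1.0]
ground = [1.0, 0.5, 2.0]
tide_max = [0.8, 0.8, 0.8]
print(inundation_agl(levels, ground, tide_max))
```

Under this reading, adding tides can raise the maximum level relative to NAVD88 while lowering the reported maximum AGL, because the lowest-lying cells, where depths would be greatest, are excluded as normally wet.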
The ensemble maximum envelopes relative to the NAVD88 vertical datum for the predicted surge-only and surge-plus-tides simulations at 00 UTC on 28 October 2012 are shown in Figure 9c,d. Clearly, higher values are predicted by the surge-plus-tides ensemble than by the surge-only ensemble, as highlighted by the east-west gradient across the Long Island Sound.

The forecast trends of the surge-plus-tides simulations are shown in Figure 10. The water level values converge to 3.9 m (12.9 ft) relative to NAVD88, or 2.6 m (8.5 ft) AGL. The light yellow polygon delineates the range of water levels issued in real-time by NHC in its forecast advisories, which encompasses the maximum inundation actually recorded during this storm event of 2.71 m (8.9 ft) AGL.

Figure 10. Trend of (a) maximum water elevation relative to the NAVD88 vertical datum and (b) the water height above ground level (AGL), for all the ensemble members for the surge + tides simulations. The light yellow polygon delineates the range of water levels issued in real-time by NHC in its forecast advisories, which encompasses the maximum inundation actually recorded during this storm event.

Hindcasts
Post-storm hindcast surge (S) and surge-plus-tides (ST) simulations were run for the SLOSH ny3 basin to determine the accuracy of the results. The hindcast simulation that generated surge-only water levels was forced by wind parameters from the Hurricane Sandy Best Track to drive the SLOSH model. A second hindcast simulation was run with surge plus tides. First, tides were spun up for 720 h. After this 30-day spin-up period with tides alone, a 100-hour SLOSH hindcast simulation was run with both tides and Best Track wind forcing.
The results were then compared with the water surface elevations recorded at NOAA tide gauge stations, measurements from temporary USGS storm surge sensors (SSS) and high water mark (HWM) estimates made by the USGS.

NOAA Stations vs. SLOSH Water Levels
The tide and total water levels were extracted from 13 NOAA stations (Figure 2) located in New York (NY), New Jersey (NJ), Rhode Island (RI), Connecticut (CT), and Massachusetts (MA) within the ny3 basin area and compared to the SLOSH water levels from the surge-only and surge-plus-tide hindcast simulations.
The time evolution of the observed vs. modeled water levels is shown in Figure 11 for the surge-only (left panels) and surge-plus-tides (right panels) runs. The Cape May, NJ station is located near a SLOSH boundary, thus the phase is slightly accelerated (the simulated surge arrives too early) relative to the observations. Preliminary experiments, in which the boundary condition in the SLOSH grid was modified from deep to shallow water (since it is so close to the coast) at that model boundary, seem to improve the results for this station. It is anticipated that this adjustment will be included when a new higher-resolution SLOSH New York grid is built. The highest resolution in the current ny3 basin is 213 m. Considering only those stations away from the basin boundary, the correlations between the model-simulated and measured water surface elevations range from 0.83 to 0.94 for the surge-only, and 0.81 to 0.95 for the surge-plus-tides simulations.

Table 4 shows a summary of the NOAA stations and SLOSH surge (S) and surge-plus-tide (ST) simulation results. The observed peak of S arrived earlier than the observed peak of ST, except at Bergen Point, NY, Cape May, NJ, Chatham, MA and Nantucket, MA. The same timing was replicated in the SLOSH simulations, except at Bergen Point and Cape May where the peaks of S were simulated to arrive earlier than the peaks of ST. The RMS errors range from 0.15 to 0.41 m. The correlations range from 0.80 to 0.95.

Panels in Figure 12 display the maximum water levels for (a) surge and (b) surge-plus-tides and the time-of-arrival of the peaks for (c) surge and (d) surge-plus-tides, measured at NOAA stations vs. those simulated by SLOSH. Figure 12a,b shows the stations that fall within the 10% height error (dark orange) cone, 20% error (orange) cone and 30% error (yellow) cone.
In Figure 12a the simulated surge at station locations in NJ and at two station locations in NY shows errors between 10% (dark orange cone) and 20% (orange cone), while at station locations far from the point of landfall the modeled maximum surge is underestimated. The simulated surge-plus-tides water surface elevation errors at most station locations in Figure 12b are within the 10%-20% range. In Figure 12c,d the stations that fall in the ±3 h error range for the time-of-arrival of the peak are within the orange band and those in the ±6 h error range are within the yellow band. The simulated peak arrival times at most sensor locations are within 3 h of those observed, except at stations in RI and MA far from the landfall location in panel (c), and at Cape May (station 8536110) in panel (d) because, as mentioned above, the station is located too close to the model boundary.
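The verification statistics used in this section (RMS error, correlation, and the share of locations inside each relative-error cone) can be computed as in the sketch below. Relative error is taken here to be |modeled − observed| / observed, which is an assumption about how the cones are defined; the function names and the sample values are hypothetical and are not measurements from the study.

```python
import math

def rmse(observed, modeled):
    """Root-mean-square error between paired observed and modeled values."""
    n = len(observed)
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(observed, modeled)) / n)

def pearson_r(observed, modeled):
    """Pearson correlation coefficient of paired values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modeled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modeled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    return cov / math.sqrt(var_o * var_m)

def fraction_within(observed, modeled, tolerance):
    """Share of locations with |modeled - observed| / observed <= tolerance."""
    hits = sum(1 for o, m in zip(observed, modeled)
               if abs(m - o) / o <= tolerance)
    return hits / len(observed)

# Hypothetical peak water levels (m) at three locations.
obs = [2.0, 3.0, 4.0]
mod = [2.1, 3.75, 4.0]
print(f"RMSE = {rmse(obs, mod):.2f} m, r = {pearson_r(obs, mod):.2f}")
for tol in (0.10, 0.20, 0.30):
    print(f"within {tol:.0%}: {fraction_within(obs, mod, tol):.0%}")
```

For hydrographs, the same `rmse` and `pearson_r` functions would be applied to the paired time series at one station rather than to peaks across stations.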

USGS Storm Surge Sensors vs. SLOSH Water Levels
The USGS deployed a temporary network of water level and barometric pressure sensors at 224 locations along the Atlantic coast from Virginia (VA) to Maine (ME). This was the second-largest deployment of storm-tide sensors, exceeded only by the number distributed during Hurricane Irene (2011), which made landfall in the same area of the US [3]. A total of 145 water level and 9 wave-height sensors were deployed at 147 locations, while 8 rapid deployment gauges (RDGs) and 62 barometric pressure sensors were deployed at additional locations. The water level sensors recorded water levels at 30-second intervals, the wave sensors recorded data every 2 s, the RDG sensors recorded water levels and meteorological data every 15 min and the barometric pressure sensors recorded at 30-second intervals. The water levels were recorded in feet above NAVD88. Unfortunately, 7 water level sensors were lost or the structures to which they were attached were damaged, 4 water level sensors and 1 wave sensor did not record (the water did not rise high enough to be measured) and 2 RDGs were destroyed by flooding. This temporary monitoring network augmented the existing tide gauge networks and helped characterize the height, extent and timing of the storm tides. Table 5 shows the USGS storm surge sensors (SSS) deployed in each state that were used to compare water level measurements against results from the SLOSH surge-plus-tides simulation. Of the 154 sensors, only 81 were located in the ny3 basin. Nine sensors that recorded high-frequency wave heights could not be used for verification purposes because the coupled surge (SLOSH) plus wave (SWAN, Simulating WAves Nearshore) modeling system is still undergoing development and testing.
Twelve sensors were close to the SLOSH basin boundary or were sited in locations that were contaminated by local effects (some sensors were buried under the sand attached to an underground piling, others were surrounded by high marsh grass/weeds, some sensors were mounted on structures that block flow in most directions, other sensors were located in narrow alleys between buildings where extreme, unrepresentative channeling can occur, etc.). These sub-grid scale features and geomorphologies are not modeled or resolved by the SLOSH grid, so those sensors were not employed in the verification process. Therefore, 60 SSS sensors (Figure 13a) were compared with the model results (Figure 13b).

A comparison between the SSS sensor measurements and SLOSH-simulated water levels AGL, displayed in Figure 13b, shows the extent and degree of inundation and how well the model values agree with the observed water levels. The hydrographs at the SSS stations show excellent agreement in both amplitude and phase with the SLOSH model-simulated surge-plus-tides results. Figure 14a shows the SSS sensor measurements that fall within the 10% error (dark orange) cone, 20% error (orange) cone and 30% error (yellow) cone. The SLOSH-simulated surge-plus-tides values at most station locations are within the 10%-20% error range. Figure 14b shows the stations that fall in the ±3 h error range in the arrival time of the peak (orange) and ±6 h error (yellow). Most of the simulated peak arrival times are accurate within 3 h of the observed arrival times.

Table 6 compares the USGS storm surge sensor (SSS) vs. SLOSH maximum water surface elevations from the SLOSH surge-plus-tides simulation, the timing of the peak water levels, and calculations of the RMS errors and the correlations. Tables 7, 8 and 9 provide summary statistics for the data in Table 6. The RMSEs of the SSS vs. SLOSH-simulated water levels show that 80% of the values simulated at station locations are less than 0.5 m (1.6 ft) in error and have correlations greater than 0.60. The SLOSH-simulated relative errors are less than 0.30 at 92% of the SSS sensor locations.

Figure 14. USGS SSS sensor vs. SLOSH-simulated surge-plus-tides (a) maximum water levels (m) and (b) time-of-arrival (hours) of the peak water levels. In (a), the dark orange cone depicts the 10% error, the orange cone depicts 20% error and the yellow cone depicts the 30% error. The water surface elevation errors at most sensors are within the 10%-20% range. In (b) the stations that fall in the ±3 h error range for the timing of the peak are within the orange band and those in the ±6 h error range are within the yellow band. Most sensors' observed vs. modeled peak arrival times are within 3 h.
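The time-of-arrival comparison can be sketched by locating the peak in the observed and modeled hydrographs and placing the difference into the ±3 h and ±6 h bands used in the figures. The series layout (a list of timestamp and level pairs), the function names and the sample hydrographs are hypothetical.

```python
from datetime import datetime

def peak_time(series):
    """Timestamp of the highest water level in [(time, level), ...]."""
    return max(series, key=lambda pair: pair[1])[0]

def timing_band(observed, modeled):
    """Modeled-minus-observed peak arrival difference (h) and its error band.

    A negative difference means the modeled peak arrives early.
    """
    delta = peak_time(modeled) - peak_time(observed)
    diff_h = delta.total_seconds() / 3600.0
    if abs(diff_h) <= 3:
        band = "within 3 h"
    elif abs(diff_h) <= 6:
        band = "within 6 h"
    else:
        band = "outside 6 h"
    return diff_h, band

# Hypothetical two-hourly hydrographs at one sensor (levels in m).
observed = [(datetime(2012, 10, 29, 23), 2.0),
            (datetime(2012, 10, 30, 1), 3.1),
            (datetime(2012, 10, 30, 3), 2.4)]
modeled = [(datetime(2012, 10, 29, 23), 2.9),
           (datetime(2012, 10, 30, 1), 2.7),
           (datetime(2012, 10, 30, 3), 2.0)]
print(timing_band(observed, modeled))
```

With coarse sampling, the peak time is only resolved to the sampling interval, so sensors recording at 30-second intervals allow a much finer timing comparison than this two-hourly illustration.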

USGS High Water Marks vs. SLOSH Maximum Water Levels
The observational measurements for Hurricane Sandy were supplemented by an extensive dataset of post-flood high water marks (HWMs). The USGS flagged, surveyed and collected more than 950 HWMs. Of those 950 HWMs, 650 were classified as independent (greater than 1000 ft apart from each other), and 257 flagged in CT, RI and MA were not surveyed due to lack of funding. Vertical accuracy was 0.26 ft in all counties except in Union, Middlesex and Monmouth counties, NJ, where it was 0.47 ft [3]. A total of 559 HWMs were inside the SLOSH ny3 basin, and 312 had valid data, so excluding those close to the SLOSH boundaries, 284 HWMs were analyzed and 17 outliers (a HWM estimated from a streak on the wall of a steel shipping container, another identified by a mud line inside a small enclosed room under an air-conditioning unit, etc.) were removed. The remaining 268 HWMs distributed in different states (Table 10) were then compared to SLOSH-simulated inundation values AGL.
A comparison of the HWM estimates vs. SLOSH surge-plus-tides maximum water levels is shown in Figure 15. Of the simulated heights at HWM locations, 34% have relative errors less than or equal to 10% (dark orange cone), 72% have errors less than or equal to 20% (orange cone) and 89% have errors less than or equal to 30% (yellow cone).

Figure 15. USGS High Water Marks (HWM) vs. SLOSH model-simulated surge-plus-tides maximum height of inundation (m) AGL. The dark orange cone depicts the 10% error, the orange cone depicts 20% error and the yellow cone depicts 30% error. The water surface elevation errors at most stations are within the 10%-20% range.

Table 11 summarizes the relative error of the HWM vs. SLOSH maximum water levels. Almost 90% have errors less than or equal to 30%. Of the remaining HWM locations where the relative error exceeds 30%, there were 17 locations where the SLOSH-simulated maximum water levels were greater than HWM and 13 locations where the SLOSH-simulated maximum water levels were less than HWM, so there is no clear error bias.

Figure 16 shows the SLOSH-simulated surge-plus-tides maximum envelope of water (relative to NAVD88) for Hurricane Sandy. Observations at NOAA stations (squares), SSS (triangles) and HWM (circles) have been added with the same color range for comparison. For the most part, the observations are in good agreement with the model results. Some HWMs have higher water level values than those simulated (red circles), particularly in west Raritan Bay, NY. It seems the water in the East River is not flowing through the grid properly. There could be many reasons for this, including unsimulated features in the wind field, the formulations of the surface and bottom stresses, lack of coupling to a wave model, and/or the sophistication of the boundary conditions; however, of particular significance is a lack of resolution in that area and a non-optimal orientation angle of the grid lines with respect to the river. More detailed investigation needs to be conducted and a new New York basin might need to be built to remedy this retardation of the water flow.

The distribution of the relative error between the observed and modeled maximum heights is shown in Figure 17. Errors are less than 10% in the Long Island Sound and along the CT and RI coastlines, and less than 20% along the south shore of Long Island (Breezy Point, Atlantic Beach, Long Beach, Jones Beach, the Hamptons). Some isolated areas along the east NJ coastline (Surf City) exhibit higher relative errors.

Figure 17. Geographical distribution of the relative error between the observed and SLOSH-simulated maximum water levels.

Horizontal Distribution of Observations vs. SLOSH
The SLOSH model-simulated surge-plus-tides AGL results over land and maximum envelope of water over the ocean, as rendered by the interactive SLOSH Display Program [19], are compared to the Federal Emergency Management Agency (FEMA) Modeling Task Force (MOTF) field-verified, "ground-truth" Hurricane Sandy Impact Analysis graphic [20], which depicts the final high-resolution storm surge extent (grey) and very high-resolution extent in NYC (blue) in Figure 18 to provide a more detailed verification of the inundation area. The geographical patterns of inundation agree quite well, especially at Breezy Point, Rockaway, the low-lying areas surrounding JFK airport and further east along the shores of East Bay and South Oyster Bay. The SLOSH wetting-and-drying algorithm performs skillfully inland to the west, in the area extending from south to north along the west bank of the Hudson River from Hoboken to Union City, NJ and further west in the larger Jersey City, Secaucus and Ridgefield area. Flooding over the river banks is also accurately simulated to the south along the Raritan River, the Washington Canal and the South River. The inundation area calculated from the SLOSH Best Track hindcast simulation was 561 km² (216 sq mi).
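An inundation area of this kind can be obtained by summing the areas of the land cells whose simulated depth above ground is positive. The sketch below shows that calculation and the unit conversion to square miles. How the area was actually computed for the hindcast is not stated in the text, so the per-cell layout, the optional depth threshold and the function names are assumptions.

```python
KM2_PER_SQ_MI = 2.589988110336  # 1 mile = 1.609344 km, squared

def inundated_area_km2(depth_agl_m, cell_area_km2, min_depth_m=0.0):
    """Sum the areas of cells whose depth above ground exceeds min_depth_m."""
    return sum(area for depth, area in zip(depth_agl_m, cell_area_km2)
               if depth > min_depth_m)

def km2_to_sq_mi(area_km2):
    """Convert square kilometres to square miles."""
    return area_km2 / KM2_PER_SQ_MI

# Three hypothetical land cells: one dry, two flooded.
depths = [0.0, 0.5, 1.2]
areas = [0.05, 0.05, 0.10]
print(f"{inundated_area_km2(depths, areas):.2f} km^2 flooded")
print(f"561 km^2 is about {km2_to_sq_mi(561):.1f} sq mi")
```

Because SLOSH cells telescope from fine to coarse resolution, each cell needs its own area in the sum; a single uniform cell size would misstate the total.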

Conclusions
The verification analyses conducted in this study show that the NWS SLOSH storm surge prediction model is able to simulate the height, timing, evolution and extent of the water that was driven ashore by Hurricane Sandy (2012) with a high degree of fidelity. Upgrades to the numerical model in 2013, including the incorporation of astronomical tides with 37 harmonic constituents, have increased its hindcast accuracy and will enable forecasters to better predict the timing and extent of the total water level and inundation.
In addition, the model's extreme computational efficiency enables it to run large, automated ensembles of predictions in real-time to account for the high variability in atmospheric forcing that can occur in tropical cyclone forecasts, which makes the guidance designed to alert the public and prevent the loss of life more robust and reliable.
Quantitative comparisons (Figure 19, summary provided in Table 12) of SLOSH simulation results against peak water surface elevations measured at all 13 NOAA tide gauge stations, by 60 storm surge sensors deployed by the USGS prior to the storm, and from 268 HWMs collected by the USGS (a total of 341 observations) reveal that the SLOSH model-simulated water levels at more than one-third (34%) of the data measurement locations have less than 10% error (dark orange cone), while 71% (89%) have less than 20% (30%) error (orange and yellow cones, respectively). The RMS error between the observed and modeled peak water levels is 0.47 m (1.5 ft) (Table 13).
The arrival times of the peaks in the water elevation observations at NOAA and USGS SSS stations and their SLOSH-simulated counterparts are in good agreement, as demonstrated by the hydrographs and the statistical calculations (RMSE and correlation) from the time series.
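For reference, the two time-series statistics mentioned above (RMSE and correlation) have standard definitions. A minimal Python sketch follows, assuming that the Pearson coefficient is the correlation measure (the text does not specify which one was used) and that the observed and modeled series are sampled at matching times; the function names are hypothetical.

```python
import math


def rmse(observed, modeled):
    """Root-mean-square error between two equal-length series."""
    if len(observed) != len(modeled) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(m - o) ** 2 for o, m in zip(observed, modeled)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length and at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Applied to a station hydrograph, a low RMSE together with a correlation close to 1 would indicate that both the amplitude and the timing of the water level signal are reproduced.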
The SLOSH simulations underestimated the surge in some areas far from the point of landfall and far from the center of the SLOSH grid, where the resolution is coarser (CT, MA, RI), and in Raritan Bay, where the resolution across the East River (2 grid cells) might not allow the water to flow freely into the bay. Many factors may have contributed to the underestimation of water levels in these locations: grid resolution, basin size, boundary conditions, lack of waves in the simulations, the tidal method, wind field, surface stress, bottom stress, etc. In this case, the most likely reason for the error is the coarseness of the grid. Previous SLOSH studies [21] have shown that larger and higher resolution SLOSH grids and different parameterizations of the surface and bottom stresses can improve the accuracy of the storm surge results. Efforts are currently underway to test and validate a coupled SLOSH + SWAN modeling system [21] that includes surge, tides and waves.
The highly complex structure of Hurricane Sandy presented an operational challenge for the standard tropical version of SLOSH. Figure 20 shows a comparison between the winds produced by the SLOSH parametric wind model and the real-time multi-platform satellite surface wind analysis at 00 UTC on 30 October 2012 from the NOAA National Environmental Satellite, Data and Information Service (NESDIS), the Cooperative Institute for Research in the Atmosphere (CIRA) Regional and Mesoscale Meteorology Branch (RAMMB) at Colorado State University (CSU) [22] as Hurricane Sandy made landfall northeast of Atlantic City, NJ. The wind analysis combines information from five different data sources to create a mid-level wind analysis, which is then adjusted to the surface using empirical, radially varying coefficients obtained from reconnaissance aircraft and GPS dropwindsonde data. Despite the simplicity of the SLOSH parametric wind model, the simulated winds are remarkably realistic. There is strong wavenumber 1 asymmetry due to the storm's forward motion. The 50 kt (25.72 m s⁻¹) isotachs in panels (a) and (b) are similar in orientation, shape and extent. The SLOSH surface friction simulates a reduction in wind speed of about 10 kt (5.14 m s⁻¹) over Long Island Sound due to the downwind effects of the Long Island land cover. The wind directions in both panels also compare quite favorably. The purpose of this study was to establish a baseline skill level for SLOSH and to assess the improvement gained from its latest upgrade, the inclusion of tidal constituents. Implementing gridded wind fields, an improved parametric wind model [12], and a combination thereof are planned upgrades to SLOSH.
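The wavenumber 1 asymmetry noted above can be illustrated with a toy parametric vortex. This is a sketch under stated assumptions, not the operational SLOSH wind model: it uses the radial profile V(r) = V_max · 2 R r / (R² + r²), which is commonly cited for SLOSH, and adds a hypothetical fraction of the forward speed that is largest to the right of the track (Northern Hemisphere). All names and parameter values are illustrative.

```python
import math

KT_TO_MS = 0.514444  # one knot expressed in metres per second


def symmetric_speed(r_km, v_max, r_max_km):
    """Axisymmetric wind speed at radius r for a vortex that peaks at r_max."""
    if r_km < 0 or r_max_km <= 0:
        raise ValueError("radius must be non-negative and r_max positive")
    return v_max * 2.0 * r_max_km * r_km / (r_max_km ** 2 + r_km ** 2)


def asymmetric_speed(r_km, angle_from_right_deg, v_max, r_max_km,
                     forward_speed, motion_fraction=0.5):
    """Toy wavenumber 1 asymmetry (assumption, not the SLOSH formulation).

    angle_from_right_deg is measured from the direction 90 degrees to the
    right of the storm motion; motion_fraction is a hypothetical tuning value.
    """
    shape = symmetric_speed(r_km, 1.0, r_max_km)  # radial shape, 0 to 1
    boost = motion_fraction * forward_speed * math.cos(
        math.radians(angle_from_right_deg))
    return max(0.0, shape * (v_max + boost))


# Unit check against the values quoted in the text.
print(round(50 * KT_TO_MS, 2), round(10 * KT_TO_MS, 2))
```

In this sketch the speed on the right side of the track exceeds that on the left by twice the motion term at the radius of maximum winds, which is the kind of asymmetry visible in both panels of Figure 20.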
The ExtraTropical Storm Surge Model (ETSS), developed by the NOAA/NWS Meteorological Development Laboratory (MDL), is a variation of the NWS SLOSH that runs operationally on NCEP's central computing system four times daily. The model is forced by real-time output of winds and pressures from the NCEP Global Forecast System (GFS) and produces numerical storm surge guidance for extratropical systems in 6 grids that cover the US East Coast, Gulf of Mexico, West Coast, Gulf of Alaska, Bering Sea and Arctic. This modeling system does not currently include overland flooding or tides. Work is currently underway to combine the ETSS and the newer versions of SLOSH, which include tides and inundation, via nesting from the coarser ETSS grids down to the latest higher resolution SLOSH grids.
An improved version of the Mattocks and Forbes [12] asymmetric parametric wind model, GWAVA (Gradient Wind Asymmetric Vortex Algorithm), is currently being incorporated into SLOSH. Blending the near-field winds from this more advanced parametric wind model with gridded far-field winds from the GFS or other numerical weather prediction models will potentially improve storm surge prediction by providing more realistic multi-scale wind forcing at the ocean surface and, in turn, a more realistic hydrodynamic response.
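One simple way to picture such blending is a radial weight that favours the parametric vortex near the storm centre and the gridded model far from it. The sketch below is purely illustrative: the linear ramp, the default radii and the function names are assumptions, not the GWAVA or SLOSH implementation.

```python
def blend_weight(r_km, r_inner_km, r_outer_km):
    """Weight of the parametric wind: 1 inside r_inner, 0 beyond r_outer."""
    if not 0 <= r_inner_km < r_outer_km:
        raise ValueError("require 0 <= r_inner < r_outer")
    if r_km <= r_inner_km:
        return 1.0
    if r_km >= r_outer_km:
        return 0.0
    # Linear ramp between the two radii (hypothetical choice).
    return (r_outer_km - r_km) / (r_outer_km - r_inner_km)


def blend_wind(parametric_uv, gridded_uv, r_km,
               r_inner_km=200.0, r_outer_km=400.0):
    """Blend (u, v) wind components at distance r_km from the storm centre."""
    w = blend_weight(r_km, r_inner_km, r_outer_km)
    pu, pv = parametric_uv
    gu, gv = gridded_uv
    return (w * pu + (1.0 - w) * gu, w * pv + (1.0 - w) * gv)
```

Blending the vector components rather than the speeds keeps the wind direction continuous across the transition zone, which matters because the surface stress that drives the surge depends on direction as well as magnitude.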
The value of future upgrades to the SLOSH model and basin refinements can later be compared to this baseline study. This analysis will also be instrumental in evaluating other modeling systems and in assessing how they might contribute to operational forecasting as NHC moves toward a multi-model ensemble.

Author Contributions
Cristina Forbes developed AutoSurge, ran the operational forecast and hindcast simulations, generated graphics and did the analysis and validation of observations vs. SLOSH results; Jamie Rhome provided his expertise in operational storm surge forecasting and communication of the storm surge threat; Craig Mattocks contributed his knowledge on the atmospheric forcing of storm surge simulations, forecast ensembles, parametric wind models, optimization of numerical models and visualization of the results; Arthur Taylor provided his expertise on the SLOSH model and implementation of the various upgrades used in the operational and hindcast simulations.