Benchmarking of Numerical Models for Wave Overtopping at Dikes with Shallow Mildly Sloping Foreshores: Accuracy versus Speed

To accurately predict the consequences of nearshore waves, coastal engineers often employ numerical models. A variety of these models, broadly classified as either phase-resolving or phase-averaged, exist; each with strengths and limitations owing to the physical schematization of processes within them. Models which resolve the vertical flow structure or the full wave spectrum (i.e. sea-swell (SS) and infragravity (IG) waves) are considered more accurate, but also more computationally demanding than those with approximations. Here, we assess the speed-accuracy trade-off of six well-known wave models for overtopping (q), under shallow foreshore conditions. The results demonstrate that: i) q is underestimated by an order of magnitude when IG waves are neglected; ii) using more computationally-demanding models does not guarantee more accurate results; and iii) with empirical corrections to account for IG waves, phase-averaged models like SWAN can perform on par, if not better than, phase-resolving models but with far less computational effort.


Background
Coastal engineers often employ numerical modelling in the design, assessment and rehabilitation of coastal structures to accurately forecast nearshore waves and currents, sometimes including the consequences (Akbar and Aliabadi, 2013, Sierra, et al., 2010, Smith, et al., 2012, Suzuki, et al., 2017. Of particular interest is the extent to which waves reach and pass over the crest of a structure, referred to as wave overtopping. Extreme overtopping events are characterized by considerable flow velocities which impose serious hazards to both people and infrastructure; with flooding or coastal inundation as the most critical consequence. The integration of numerical modelling in estimating wave overtopping and the design of coastal structures is becoming increasingly more attractive given the progress in available computing power and the limitations of traditional empirical approaches which are typically limited to the number of simplified structure configurations and the range of environmental conditions applied in their derivation. Furthermore, as many of the empirical models (e.g. EurOtop, (2018)) require the incident significant wave height ( 0, , , ) and spectral wave period ( −1,0, , ) at the toe of the structure as input, numerical models are often needed to accurately capture the nonlinear effects associated with the shoaling and breaking of highfrequency sea-swell (SS) waves in shallow water (Altomare, et al., 2016, Mase, et al., 2013. Such effects include a rise in mean water level-known as wave-induced setup-and the growth of low-frequency infragravity (IG) waves ( Figure 1) which not only contribute to 0, , , but also result in higher values of −1,0, , (Hofland, et al., 2017).

Figure 1 Schematic representation of wave transformation over a shallow foreshore (from an XBeach model simulation), showing a) the growth of IG waves; b) the increase MWL at the dike toe; and c) the shift in the peak in energy density to lower frequencies from offshore (P1) to the dike toe (P2). Vertical line in panel 'c' indicates the separation between SS and IG frequencies.
A variety of numerical models, which may be broadly classified as phase-resolving or phaseaveraged, have been developed for such applications; each with strengths and limitations owing to the physical parameterization of processes and the numerical schemes incorporated within them (Cavaleri, et al., 2007, Vyzikas andGreaves, 2018). Models which attempt to resolve the vertical flow structure and those that consider the full frequency range of nearshore waves (i.e. both SS and IG waves) are considered not only more accurate, but also more computationally demanding than those which make use of approximations.
Within the phase-resolving class of wave models, those that resolve the vertical flow structure and solve the fully nonlinear, time-averaged Navier-Stokes (NS) equations-often referred to as Computational Fluid Dynamics (CFD) or depth-resolving models-have the least theoretical limitations and are generally considered the most accurate. CFD models, such as the meshbased Eulerian approach OpenFOAM (Jasak, et al., 2007) or mesh-less Lagrangian approach DualSPHysics , are able to simulate complex wave problems, such as: nonbreaking and breaking waves, wave-current interaction and wave-structure interaction from deep to shallow water conditions, including the overturning (Lowe, et al., 2019) and roller formation of breaking waves (Higuera, et al., 2013). However, these models require a significant amount of computational effort (unless a coupling method is applied (Altomare, et al., 2015, Verbrugghe, et al., 2018); thus, limiting their application so far to very local phenomena-for example, wave overtopping. As depth-resolved (fully 3D or 2DV) models are generally considered too computationally expensive for operational use, the problem may be further simplified by depth-averaging. These models, in which the vertical structure is not directly resolved but only modelled parametrically, are referred to as two-dimensional in the horizontal (2DH), or 1DH where only a cross-shore transect is simulated (Brocchini and Dodd, 2008). As a result of depth-averaging, processes such as wave overturning, air-entrainment and wave generated turbulence are not directly solved. Those that simulate the amplitude and phase variation of SS waves are often referred to as phase-resolving. Within this type of model, there are generally two main sets of governing equations: i) the Non-linear shallow water (NLSW) equations; and ii) the Boussinesq type.
While the Boussinesq-type models (e.g. FUNWAVE (Kirby, et al., 1998), MIKE21-BOUSS (Warren and Bach, 1992) and BOSZ (Roeber and Cheung, 2012)) directly account for the dispersive properties of waves in deeper water, the NLSW models assume that waves are nondispersive and are therefore limited to shallow-water applications (Brocchini andDodd, 2008, Zijlema andStelling, 2008). This limitation can be removed by taking a SS-wave averaged approach; however, at the cost of decreased accuracy (due to exclusion of SS-wave motions).
In order to use the NSLW equations for phase-resolving simulation of SS-wave motions, Stelling and Zijlema (2003) proposed another method to account for dispersion (a result of nonhydrostatic pressure) whereby the pressure is decomposed into non-hydrostatic and hydrostatic pressure components (e.g. SWASH (Zijlema, et al., 2011), NHWAVE (Ma, et al., 2012) and XBeach Non-hydrostatic (XB-NH) (Smit, et al., 2010) numerical models). This approach improves the dispersive properties without neglecting the higher-frequency motions; however, at the expense of more computational demand. The accuracy and range of applicability of the non-hydrostatic models may be further enhanced by coarsely dividing the model domain into a fixed number of vertical layers ( ≤ 3); thereby, improving the frequency dispersion (e.g. SWASH, NHWAVE or XB-NH in multi-layered mode (De Ridder, 2018)). By further increasing the number of vertical layers ( ≥ 10), models like SWASH may be extended to the depth-resolving class. This approach increases the computational demand but allows processes, such as undertow and the shoreward flow near the surface, to be resolved.
Given that phase-resolving models require a grid resolution high enough to resolve the individual SS-wave components, they are generally computationally feasible only for areas of limited size. For large-scale modelling of wave motion, a phase-averaged approach is most commonly used. This type of model is constructed on the assumption that a random sea-state is composed of a superposition of linear waves whose height is a function of their frequency and direction of propagation. For an individual wave train the rate of change of wave energy (or action) flux is balanced by the wave energy transfer among different wave components in different directions and different frequencies, as well as energy input and dissipation. With the phase information filtered out, these models can use much courser computational grids and therefore be applied to large areas. However, as individual waves are not resolved, these models must be combined with empirical formulae to estimate wave run-up and overtopping (Oosterlo, et al., 2018, Sierra, et al., 2010. Commonly used spectral models in nearshore applications include SWAN (Booij, et al., 1999) and STWAVE (Smith, et al., 2001). These models are generally able to accurately reproduce higher harmonics (SS waves); however, they do not account for the interactions that force IG-wave motions (Cavaleri, et al., 2007), which tend to dominate in shallow water.
With respect to previous model comparisons in shallow coastal environments, Buckley, et al. (2014) assessed the performance of SWASH, SWAN and XB-SB in predicting SS wave heights ( 0,SS ), IG wave heights ( 0,IG ) and setup ( ̅ ) across a steep laboratory fringing reef profile (varying from 1:5 to 1:18.8). Results showed that each model was capable of accurately predicting 0,SS ; however, SWAN failed to simulate the transformation of energy to lower frequencies and thus, failed to predict 0,IG . Likewise, SWAN showed considerably more error in its prediction of ̅ compared to SWASH and XB-SB. On the other hand, XB-SB performed comparably well to its phase-resolving counterparts in the prediction of nearshore wave heights; and surprisingly the extent of wave run-up, particularly when IG-waves dominated at the shoreline (Lashley, et al., 2018). From these previous studies, the points of discussion that naturally arise are: i) Can phase-averaged models like SWAN be accurately applied under very shallow conditions, where IG waves dominate and ̅ is significant?
ii) Given that IG waves dominate, are models of increasing complexity needed or is a shortwave averaged but IG-wave resolving approach all that is required? and iii) While attempts at model comparisons for wave overtopping have been made (St-Germain, et al., 2014, Vanneste, et al., 2014, no study to date has the full range of model complexity (from depth-resolving to phase-averaged) or successfully quantified the accuracy versus speed of these models under irregular wave forcing .

Objective
In the present study, it is our primary aim to quantify the accuracy versus speed of computation of six commonly-used nearshore wave models (Table 1) in their prediction of irregular wave overtopping of a dike with very shallow foreshore conditions-where IG waves and setup contribute significantly.

Outline
This report is organized as follows: Section 2 provides descriptions of the physical and numerical models applied, followed by descriptions of key parameters and empirical formulae used in the analysis. It ends with a description of the metrics used to quantify model accuracy.
In Section 3, the results of the model-data comparisons and the overall influence of IG waves on overtopping are presented and discussed. Section 4 concludes the report by summarising the findings, acknowledging the present study's limitations and identifying areas for future work.

Methods
This section begins with a description of the physical model tests under consideration. After which it describes the five numerical models under evaluation, including their governing equations and setup details. A description on the parameters and metrics used to assess model accuracy and computation speed is then provided. Finally, the additional numerical simulations for comparative analysis are described.

Description of the Physical Models
In the present study, we consider two specific test cases that were both performed at Flanders Hydraulics Research in a smooth, 1-m wide section of their 70-m long and 1.45-m deep wave flume (Altomare, et al., 2016) with different deep water wave heights ( 0, , ), peak periods ( ), foreshore slopes ( ), initial water depths at the toe (ℎ ), dike slopes ( ) and dike freeboards ( ) (Table 2). These cases were selected to cover a wide range of deep-water wave steepness ( 0 ), from very mild ( 0 = 0.007, typical of swell conditions) to very steep ( 0 = 0.047, typical of wind-sea conditions). With relative water depths ( ℎ 0, , ⁄ ) < 1, these conditions are considered very shallow (Hofland, et al., 2017). Both experiments simulated irregular spilling waves (with breaker parameter based on , 0, < 0.5) with a duration approximately equal to 500 waves to obtain accurate and comparable estimates of the mean overtopping discharge ( �) (Romano, et al., 2015).  For the mild swell-wave case, the variations of water-surface elevations were measured using 10 resistance-type gauges, all synchronously sampling at 50 Hz ( Figure 2a); while 6 gauges with a sample frequency of 20 Hz were used in the steep-wave case ( Figure 2b). In the analysis to follow, the term "offshore" is used to refer to gauges 1 to 7 and 1 to 3 of the mild swell and steep-wind wave cases, respectively; and the term "nearshore" to refer to gauges 8 to 10 and 4 to 6, respectively. In either case, the term "toe" refers to the last wave gauge (gauge 10 and gauge 6 of the mild swell and steep wind-wave cases, respectively).
In both cases, the instantaneous overtopping was measured using two Balluff "Micropulse" water sensors situated inside the overtopping box; and � was then obtained by dividing the total volume of water collected at the end during the test by the total test duration.

Description of Numerical Models
In this study, six widely-used open-source numerical wave models are considered for comparative analysis. Each model is forced at its boundary with still water levels and parametric spectra (JONSWAP) to match those observed at the most offshore wave gauge during each physical experiment. Likewise, the smooth flume bottom was represented as either a Manning coefficient ( ) of 0.01 s/m 1/3 or a Nikuradse geometrical roughness ( ) of 0.3 x 10 -3 m (in the case of SWAN). A general description of each model is provided in the sections that follow. As we investigate two extremes: very mild swell and very steep wind waves, it is reasonable that some calibration was required for the depth-averaged models (BOSZ, XB-NH and XB-SB).
Therefore, a description of the main calibration parameters, their optimum values and impact on model results is also provided. In general, calibration was aimed at reducing the error in � and 0, .

OpenFOAM
The software OpenFOAM is an Open Source object-oriented library, composed by solvers and utilities (Jasak, et al., 2007). The formers are designed to numerically solve continuum mechanics problems, while the latter perform tasks involving data manipulation.
For the present study, the library waves2Foam, a toolbox capable of generating and absorbing free surface water waves, has been adopted. Currently, the method applies the relaxation zone technique (active sponge layers) and supports a large range of wave theories (Jacobsen et al., 2012). The governing equations for the combined flow of air and water are given by the Reynolds Averaged Navier-Stokes equations (Equations 1 and 2): coupled with the continuity equation (2) for incompressible flow: where is the velocity field, * is the dynamic pressure component, is the density, g is the acceleration due to gravity and is the dynamic molecular viscosity. The Reynolds stress tensor is defined as: where is the dynamic eddy viscosity, is the strain rate tensor, is the turbulent kinetic energy per unit mass and is the identity matrix. The last term in Equation 1 is the effect of surface tension, where is the surface tension coefficient and is the surface curvature (Jacobsen, et al., 2012). The track of the free surface is performed by using the VOF method (Hirt and Nichols, 1981).
For the mild and the steep cases, two regular slightly different meshes have been generated, to account for the differences between the two wave conditions. The numerical domains of the

SWASH
SWASH is a time domain model for simulating non-hydrostatic, free-surface and rotational flow. It solves the NLSW equations with an added non-hydrostatic pressure correction term (Smit, et al., 2013): where is the free surface elevation; ( , , ) and ( , , ) are the horizontal and vertical velocities, respectively; ℎ is the water depth; is the density of water; ℎ and ℎ are the hydrostatic and non-hydrostatic pressures, respectively; and , , and are the turbulent stresses.
The model exhibits good linear dispersion up to ℎ ≈ 8 and ℎ ≈ 16 with two and three equidistant (sigma) vertical layers ( ), respectively; its frequency dispersion is further improved by increasing .
Here, the model was applied with = 20, which is sufficient for the phase velocity at the breaking wave front to be computed accurately. As such, no additional control is required to initiate or terminate wave breaking. The vertical pressure gradient was discretized by the standard central differencing scheme with the ILU pre-conditioner. The standard k-ε turbulence model is applied to take into account vertical mixing.
A cross-shore grid spacing (∆ ) of 0.04 m was specified for both the mild-and steep-wave cases. This resulted in approximately 200 and 110 grid cells per deep-water wavelength ( 0 ∆ ⁄ ) for the mild-and steep-wave cases, respectively. For phase-resolving models, 0 ∆ ⁄ is typically kept between 50 and 100 (by rule of thumb) to ensure that the wave components are accurately resolved; however, as waves propagate in very shallow water, the local wavelength becomes much shorter than 0 . Thus, in order to maintain a reasonable number of grid cells per local wave length, these higher-than-typical grid resolutions ( 0 ∆ ⁄ = 200 and 110) were specified. Rijnsdorp, et al. (2017) proposed a sub-grid approach to improve model efficiency, where vertical accelerations and non-hydrostatic pressures are resolved on a relative course grid while the horizontal velocities and turbulent stresses are resolved on a much finer sub-grid. This approach was attempted here, however, the simulations failed due to instabilities.

BOSZ
The BOSZ wave model-which is freely-available upon request from the developerscomputes hazardous free surface flow problems ranging from near-field tsunamis to extreme swell ranges generated by hurricanes. It solves the following re-formulated, depth-integrated Boussinesq equations of Nwogu (1993), in vector notation: where U is the horizontal flow velocity defined at a reference depth ̅ = -0.55502ℎ (Simarro, et al., 2013).
The governing equations exhibit good dispersion accuracy up to ℎ ≈ . Given the difficulty of Boussinesq equations in handling flow discontinuities (such as with breaking waves), the model deactivates the dispersion terms during wave breaking and makes use of the underlying NLSW equations where the breaking wave is then approximated as a bore or hydraulic jump. Wave breaking-and the deactivation of the dispersion terms-occurs in the model based on the momentum gradient: where is a calibration coefficient (by default = 0.5). Here, = 0.8 produced the best agreement between model and observations for both cases. This suggests that under these particularly shallow conditions, the wave face becomes very steep prior to breaking. For a detailed overview of the model's sensitivity to this parameter, the reader is pointed to (Roeber, et al., 2010). All other model parameters were kept at their default values.
The grid resolution ( 0 ⁄ ) was set as 200 for the mild swell-wave case but was reduced to 60 for the steep wind-wave case to ensure model stability. For the steep-wave case, higher grid resolutions and lower values led to instabilities in the form of strong oscillations in surface elevation in the breaking region. This phenomenon, explored extensively by Kazolea and Ricchiuto (2018), is due to the model's hybrid approach to handling wave breaking; that is, where the Boussinesq equations are reduced to the NLSW equations during wave breaking. It should be noted that Boussinesq wave models which take a different (eddy viscosity) approach to wave breaking reportedly show less sensitivity to the grid size (Kazolea and Ricchiuto, 2018); however, this was not evaluated here.

XBeach Non-hydrostatic
Like SWASH, XB-NH solves the NLSW equations with a non-hydrostatic pressure correction term (Equations 4 to 7). Here, XBeach version 1.235527 (also known as the "XBeachX" release) is applied in reduced (simplified) two-layer mode, where the non-hydrostatic pressure is assumed constant in the lower (first) layer (De Ridder, 2018). The water depth is divided into two layers with heights 1 = ℎ and 2 = (1 − )ℎ, where is the layer distribution. The resulting layer-averaged velocities ( 1 and 2 ) are transformed to a depth-averaged velocity ( ) and a velocity difference (Δu). Due to the simplified non-hydrostatic pressure in the lower layer, the vertical velocity between layers is neglected. Therefore, only the continuity relation for the upper (second) layer is required: To determine the water elevation, the global continuity equation is applied: In order to control the computed location and magnitude of depth-limited wave breaking, a hydrostatic front approximation is applied. With this, the pressure distribution under breaking waves is considered hydrostatic when the local surface steepness exceeds a maximum prescribed value ( = 0.5, by default): Here, = 0.9 and 0.7 produced the best agreement between the model and observations for the mild-and steep-wave cases, respectively. This further supports the statement that for very shallow foreshores, the waves become particularly steep before breaking. All other model parameters were kept at their default values. Additionally, the grid resolution ( 0 ∆ ⁄ ) was set to ~200 and ~180 for the two respective cases.

XBeach Surfbeat
XB-SB solves SS-wave motions using the wave-action equation with time-dependent forcing, similar to that of the HISWA model (Holthuijsen, et al., 1989). The model represents the SSwave frequency spectrum by a single frequency ( ) and the wave-action equation is applied at the timescale of the wave group: where is the wave action, is the wave energy density, is the intrinsic wave frequency, is the wave number, is a dissipation term to account for wave breaking and is the waveaction propagation speed in the cross-shore direction. To simulate wave breaking, XB-SB applies a dissipation model (Roelvink, 1993), by default, for use with SS-wave groups; and a roller model (Nairn, et al., 1991, Svendsen, 1984 to represent momentum stored in surface rollers which results in a shoreward delay in wave forcing. The radiation stress gradients that result from these variations in wave action exert forces on the water column and drive IG waves and unsteady currents which are solved by the NLSW equations (Equations 4 to 7). Therefore, the model directly simulates wave-driven currents and the run-up and overtopping of IG waves.
where � is the total (directionally-integrated) wave energy dissipation due to breaking, = 1⁄ is the representative wave period and is the fraction of breaking waves; the rootmean-square SS-wave height, = �8 ⁄ ; the maximum wave height, = ℎ; is the wave-group varying SS-wave energy; is a dissipation (by default = 1) and is the ratio of breaking waves to local water depth (by default = 0.55 but typically used for calibration).
Here, = 0.45 and 0.65 provided the best agreement between the model and observations for the mild swell and steep wind-wave cases, respectively.
XB-SB does not directly produce the SS-wave component of the energy density spectrum, instead it computes the change in SS-wave energy as a change in the bulk parameter, as described above. In order to produce a complete energy density ( ) spectrum at each gauge location, a JONSWAP distribution was assumed around the peak-frequency ( ), where�8 ∫ 2 /2 = . This SS-wave spectrum (Figure 3b) was then combined with the IG-wave spectrum (Figure 3a)-obtained directly from the computed surface elevation-to produce the complete spectrum (Figure 3c). For the mild swell-wave case, the grid resolution was varied such that it increased shoreward. This reduced computation time while ensuring that the steep dike slope was accurately capture.
As such, 0 ∆ ⁄ varied from ~25 (offshore) to ~160 (at the dike) in the mild-wave case; and from ~45 to ~90 in the steep-wave case.

SWAN
SWAN is a third-generation, phase-averaged wave model used to estimate the generation (by use of a single representative frequency, SWAN takes the frequency distribution of action density into account. To simulate wave breaking, SWAN uses the following parametric dissipation model (Battjes and Janssen, 1978): and is estimated as: where is the mean wave frequency, = ℎ and is the total wave-energy variance. Here, = 0.73 (default value) provided good agreement between the model and observations for both the mild swell and steep wind-wave cases. For both wave cases, a constant grid spacing of 0.25 m was applied. This corresponded to 0 ∆ ⁄ ≈ 30 for the mild-wave case and 0 ∆ ⁄ ≈ 20 for the steep-wave case.

Mean water level
The mean water level ( ̅ ) was calculated by taking the average of the surface elevation, ( ), at each gauge location, relative to the elevation of the dike toe. The wave-induced setup, < >, was then obtained as the difference between ̅ at each gauge location and ̅ at the most offshore gauge.

Separation of infragravity and sea-swell waves
The time series of ( ) were further analysed using the Welch's average periodogram method and a Hann filter with a 50% maximum overlap. The resulting one-dimensional spectra of wave energy density, ( )-with ~43 degrees of freedom and a frequency resolution of ~0.008 Hz-were then used to determine 0, , 0, and 0, , as follows: and 0, where half the peak frequency ( 2 ⁄ = 1/2 ) is taken as the cut-off to separate SS and IG motions (Roelvink and Stive, 1989). This choice of cut-off frequency is based on the tendency that, in deep water, the majority of SS-wave energy is found at frequencies > 2 ⁄ while the majority of IG-wave energy lies at frequencies < 2 ⁄ .

Spectral Wave Period
In addition to wave heights, the spectral wave period ( −1,0 ) at each gauge location was calculated as follows: where,

Empirical Estimate of the Incident Infragravity Waves
As SWAN neglects the contribution of IG waves to the total wave incident height at the dike toe ( 0, , , ) we apply an empirical correction, proposed by (Lashley, et al., Forthcoming).
As � represents the ratio of IG to SS waves,

Empirical Wave Overtopping
While the fully phase-averaged models like SWAN are-to some extent-able to estimate nearshore wave conditions, they cannot directly simulate wave overtopping, as this requires that the individual waves be resolved. In order to estimate wave overtopping, these models can be (and are often) combined with well-established empirical models that require wave parameters at the dike toe as input. In the present study, the EurOtop (2018) tanα = 1.5 0, , , + 2% �1.5 0, , , − ℎ � · + (ℎ + 2% ) · cot , 2% 0, , , where is the gravitational constant of acceleration, α is an equivalent slope (to account for waves breaking on the foreshore) and −1,0, , is the spectral wave period at the dike toe based on the incident waves (i.e. without the influence of waves reflected at the dike). It should be noted that −1,0 and 2% are obtained iteratively (until 2% converges), with a first estimate of 2% = 1.5 0, , , .

Error Metrics
In order to compare the performance of the numerical models, we assess the mean relative accuracy in an approach similar to that of Lynett, et al. (2017): where is a stand-in for the parameter under consideration ( ̅ , . While the focus of this study is primarily at the dike toe, it is important to assess the model performance offshore to ensure that: i) the boundary conditions are correctly modelled; and ii) that no (significant) numerical dissipation occurs in deep water, as a result of a coarse grid resolution for example.
Finally, the performance of each model for wave overtopping was also assessed by comparing the absolute relative error in the prediction of mean overtopping discharge:

Computation Speed
Two work stations (WS) were used to carry out this research (Table 3). Given the required computational effort, the OpenFOAM simulations were performed on WS-A, while the other models were run on WS-B. To assess computation speed, the duration of each simulation (in wall clock time) was recorded.

Results and Discussions
In this section, the results of the model-data comparisons are presented and discussed. As wave overtopping is the end result of wave propagation, the performance of each model for the prediction of mean water levels, wave heights and periods is first assessed. For the models where calibration was carried out (BOSZ, XB-NH and XB-SB), both default and calibrated results are presented. Note that no parameter tuning was done for the depth-resolving models (OpenFOAM and SWASH) as wave breaking is intrinsically resolved. Likewise, SWAN with default settings showed reasonable agreement and was therefore not calibrated. Lastly, it should be noted that the BOSZ simulation of the steep wind-wave case with default settings resulted in instabilities (see Section 2.2.3) and is therefore not included in the analysis.

Mean Water Level
Each model, excluding OpenFOAM, is able to accurately (within 15% error) and consistently reproduce � for both the mild swell ( Figure 4a) and steep wind-wave (Figure 4b) cases. This includes the increase in � nearshore, referred to as wave-induced setup (< >), highlighted in Figure 5 with the XB-NH results representing the general behaviour of the numerical models.
While OpenFOAM agrees well with the observations for the mild swell case, it overestimates < > offshore (gauges 2 and 3) and underestimates < > nearshore (gauges 4 to 6) for the steep wind-wave case ( Figure 5). This may be indicative of premature wave breaking in OpenFOAM.
The satisfactory performance of BOSZ and XB-NH observed here (Figure 4) is in contrast with previous studies (Lashley, et al., 2018, Zhang, et al., 2019, which found that depth-averaged models were unable to accurately estimate < > due to their lack of vertical resolution and exclusion of wave roller dynamics. However, the difference in model performance here is likely due to the spilling nature of the waves ( 0, = 0.23 and 0.13 for the mild swell and steep wind-wave cases, respectively) compared to the plunging waves and steep fore-reef slopes assessed by Lashley, et al. (2018)

Figure 5 Cross-shore profiles of modelled (XB-NH and OpenFOAM) and observed < > for both the mild swell and steep wind-wave cases.
There is also a notable difference in the observed maximum < > between the mild swell (< > = 0.004 m) and steep wind-wave (< > = 0.015 m) cases ( Figure 5). This substantial increase in < > as ℎ 0, , ⁄ decreases agrees with the findings of Gourlay (1996) on shallow reefs, and suggests that < >-which contributes to wave run-up (Stockdon, et al., 2006) and, by extension, overtopping-increases proportionally as foreshores become more shallow, or as deep water wave conditions become more energetic.

Significant Wave Height
SWASH, BOSZ, XB-NH and XB-SB are able to reproduce 0, , both offshore and nearshore, within 15% error for two cases ( Figure 6). On the other hand, OpenFOAM and SWAN both show notable differences; with SWAN consistently and considerably underestimating 0, nearshore.
While SWAN is able to accurately simulate the propagation of high-frequency waves ( 0, , Figure 7), it does not compute the low-frequency waves ( 0, , Figure 8) and therefore underestimates 0, nearshore, where the contribution of IG waves is significant ( Figure 6). The relatively high standard deviation associated with SWAN's nearshore 0, estimates is due to its exclusion of wave reflection. In the physical model, the superposition of the incident and reflected waves results in a nodal/anti-nodal pattern with a maximum at the dike (outsets in Figure 9a and Figure 10a). As SWAN excludes the reflected component, the model SWAN also predicts a higher and lower maxima in 0, (just before breaking) than XB-NH, for the mild swell ( Figure 9a) and steep wind-wave (Figure 10a) cases, respectively. This is likely due to the dissipation model employed by SWAN (Equation 19). Tuning -the parameter which controls the maximum wave height to water depth ratio in SWAN-would yield better agreement between the two models; however as there were no wave gauges in this region it is difficult to ascertain which model is correct here.
With respect to OpenFOAM, the model shows inconsistent results between the two cases.
Under the mild swell conditions, OpenFOAM underestimates 0, nearshore ( Figure 6a); however for the steep wind-wave case, the model overestimates 0, nearshore ( Figure 6b). In both cases, the model appears to be too dissipative, resulting in a reduction in 0, offshore. For the mild swell case, this dissipation is minor resulting in a consistent underprediction of 0, (Figure 7a and Figure 9a) and 0, (Figure 8a and Figure 9b). Under the steep wind-wave conditions, however, the dissipation is significant. This observation,  Figure 10a). XB-NH also shows some numerical dissipation offshore but this is negligible compared to that of OpenFOAM ( Figure 10a). As a reduction in grid size did not significantly improve the OpenFOAM model results, the observed dissipation is possibly due to an over-production of turbulence leading to premature wave decay (Larsen and Fuhrman, 2018).
Though SWASH is able to accurately predict  17)-the parameter that controls the magnitude of dissipation-is calibrated (Lashley, et al., 2018) would yield better results. However, as XB-SB predicts IG-wave overtopping only, the loss in accuracy for 0, to improve 0, predictions was considered acceptable.

Spectral Wave Period
SWASH, XB-NH and XB-SB show good agreement between modelled and observed −1,0 predictions; while SWAN, OpenFOAM and BOSZ show notable deviations. As the accurate prediction of −1,0 requires the models to correctly represent the distribution of wave energy by frequency (Equation 25), we assess the modelled versus observed wave spectra ( Figure 12). SWASH, BOSZ, XB-NH and XB-SB correctly capture the shift in peak energy density ( ) from the SS-wave (Figure 12a and b) to the IG-wave band (Figure 12c and d); however BOSZ overestimates the magnitude of the IG peak and shows it at slightly lower frequencies than observed. This, coupled with a minor underestimation of the SS-wave energy-most evident for the mild swell case (Figure 12c)-results in an overestimation of −1,0 .
The consistent underestimation of −1,0 by SWAN is expected due to its exclusion of at IG frequencies (Figure 12c and d). OpenFOAM, on the other hand, does show a shift in energy from offshore to the dike toe; however, it shows two distinct IG peaks (Figure 12c and d), not present in the observations. In the mild swell case, this misrepresentation of at IG frequencies couple with the underestimation of in the SS-wave band resulted in the significant overestimation of −1,0 nearshore ( Figure 11a). Under the steep wind-wave conditions, OpenFOAM also shows considerable in the SS-wave band (nearshore) while the observed spectra shows very little (Figure 12d). This further supports the argument that due to premature wave decay in the model, some unbroken SS waves are able to reach the dike. and neglect the contribution of 0, (Vuik, et al., 2016, Yang, et al., 2012. To further investigate the influence of the IG-waves, we compare the overtopping estimated using SWAN and EurOtop with and without the corrections to    (Table 4). On the other hand, including the IG waves resulted in a 2.5-fold increase in −1,0, , and magnitude 4-fold increase in the predicted �,  Considering the wider model comparison, each model-with the exception of BOSZ-fails to reproduce the overtopping for the mild swell case. This is particularly evident for SWASH and XB-SB which significantly underestimate � for both wave cases, with the calibrated XB-SB model producing zero overtopping. This suggests that while XB-SB may estimate wave run-up accurately in IG-wave dominant environments (Lashley, et al., 2018), it's exclusion of the SSwave component considerably limits its performance for wave overtopping.
The poor performance of SWASH here for wave overtopping is surprising, since it performed reasonably well in the prediction of � , 0 and −1,0 in both cases here and has been previously successful in one-layered mode (Suzuki, et al., 2017). However, Suzuki, et al. (2017) focused on obtaining good agreement at the toe (the last wave gauge only) and the resulting � in their tuning of SWASH; therefore � was not assessed unless wave heights and periods at the toe were within a certain accuracy range, regardless of the input (offshore) conditions. Whereas here, we assess the model's general performance for wave propagation (both offshore and nearshore), in addition to �. It should be noted that a finer grid resolution had little impact on SWASH predictions of � , 0 and −1,0 (~3%), it increased � by a factor of 7-though still significantly underestimated (not shown)-for the mild swell case, with significantly increased computational demand. The models do, however, perform considerably better for the steepwave case. This is consistent with the findings of Roelvink, et al. (2018) and Suzuki, et al. (2017) who showed that XB-NH and SWASH, respectively, were more accurate for higher overtopping rates, but suffered for rates below 0.08 -0.16 l/s per m (in model scale).
The improvement in SWAN with the corrections is most evident for the steep-wave case, with the estimated � now on par with that of BOSZ and outperforming the other more physicallycomplex models. Figure 14 shows the modelled relative overtopping discharge As most of the models performed reasonably well for wave propagation, the excellent agreement between BOSZ and the observed � is likely not dependent on underlying governing equations (Boussinesq versus NLSW) but more to do with how the shoreline and wave run-up are treated numerically. However, an in-depth analysis of the various numerical schemes implemented in each numerical model was beyond the scope of this study.

Accuracy versus Speed
In contrast with the general assumption that models of increasing physical complexity produce more accurate results, Figure 15 shows no clear relationship between computational demand (simulation time) and the absolute relative error in overtopping. Furthermore, the depthresolving models (SWASH and OpenFOAM), which have significantly higher simulation times show larger errors than the depth-averaged models (XB-NH and BOSZ). The phase-averaged models (XB-SB and SWAN (original)), despite their considerable speed advantage, significantly underestimated the overtopping discharge due to their exclusion of higher-and lower-frequency wave components, respectively. However, by including the IG-waves empirically, SWAN's performance improved significantly; now within acceptable limits and on par with those of XB-NH and BOSZ but at little to no computational cost ( Figure 15). It should be noted that the use of SWAN with Equation 36 is already the recommended approach in EurOtop (2018); the novelty here is the further improvement in results offered by Equation

Conclusion
In the present study we assess the ability of 6 widely used numerical models to simulate waves overtopping steep dikes with mildly-sloping shallow foreshores. However, with the exception of OpenFOAM and to some extent SWASH (multi-layered mode) the above (phase-resolving) models were originally developed to simulate wave evolution over mildly-sloping foreshores; and not specifically for wave run-up and overtopping of steep structure slopes. Since their development, the phase-resolving models have each been successfully applied to wave propagation over steep reefs and run-up of relatively steep beaches. Likewise, depth-resolving models like OpenFOAM and SWASH (multi-layered) were originally developed to simulate wave-structure interaction and not specifically for wave propagation. In the present study we test the ability of these models in both applications: i) wave evolution over a shallow mildlysloping foreshore; and ii) the resulting overtopping discharge.
Overall, BOSZ and XB-NH (under steep wind-waves) showed high skill in both applications with a reasonable computational demand; while OpenFOAM, with a much higher computational demand-showed difficulty in performing both functions. The broad implication of the present work is that higher-resolution, more computationally-demanding wave models may simply not be needed; specifically where the analysis is focused on bulk, time-averaged physical quantities ( 0 , −1,0 and �), as shown here. Should more detail be required-for example, estimates of the vertical velocity profile or turbulence-then a depth-resolving model such as SWASH (multi-layer) or OpenFOAM should be applied. Moreover, SWASH and OpenFOAM are likely to perform well if the computational domain begins at the dike toe and ends at the overtopping box; i.e., where simulating wave propagation over a large domain is not required.
In addition, our results showed that with simple empirical corrections, phase-averaged models like SWAN can perform on par-if not better than-phase-resolving models, with much less computational effort. Importantly, our work emphasizes the importance of including IG waves in the design and assessment of coastal dikes; as neglecting their contribution to 0, , and −1,0, can lead to under-predictions in � of up to two orders of magnitude.
Given the scope of the model comparison, including both phase-resolving and phase-averaged, a detailed wave-by-wave comparison of the higher-resolution models was not carried out.
Future work should address this and investigate the influence of the various numerical schemes implemented in the respective numerical models, as this was not within the scope of the present work. Additionally, Equations 31 and 36 were developed (in part) using the wider dataset from which these cases were taken; therefore their performance under different conditions is still to be confirmed. Despite these limitations, the findings here can aid practitioners in their decision making; specifically in deciding which numerical model should be applied based on the level of accuracy required.