Estimating the parameters of ocean wave spectra

Wind-generated waves are often treated as stochastic processes. There is particular interest in their spectral density functions, which are often expressed in some parametric form. Such spectral density functions are used as inputs when modelling structural response or other engineering concerns. Therefore, accurate and precise recovery of the parameters of such a form, from observed wave records, is important. Current techniques are known to struggle with recovering certain parameters, especially the peak enhancement factor and spectral tail decay. We introduce an approach from the statistical literature, known as the de-biased Whittle likelihood, and address some practical concerns regarding its implementation in the context of wind-generated waves. We demonstrate, through numerical simulation, that the de-biased Whittle likelihood outperforms current techniques, such as least squares fitting, both in terms of accuracy and precision of the recovered parameters. We also provide a method for estimating the uncertainty of parameter estimates. We perform an example analysis on a data-set recorded off the coast of New Zealand, to illustrate some of the extra practical concerns that arise when estimating the parameters of spectra from observed data.


Introduction
Due to the random nature of wind-generated gravity waves, it is common to treat them as stochastic processes. There is particular interest in the spectral density function of such wave processes. For this reason, it is important that we are able to construct good spectral density estimators. Using such an estimator, estimates of the spectral density function can be obtained from observed wave records. Broadly speaking, there are two approaches for obtaining such an estimator: non-parametric and parametric. The most basic non-parametric spectral density estimator is the periodogram, the Fourier transform of the sample autocovariance. However, the periodogram is a noisy estimator, and many less noisy estimators, such as that of Bartlett (1948), have been developed. The second approach is to use a parametric spectral density estimator. Here we assume that the spectral density function follows a parametric form, so the inference task becomes estimation of the parameters of this form. Parametric estimators are often preferable because they result in smoother estimates and more concise representations of the spectral density function, and the parameters themselves provide a physical interpretation of the nature of the wave process.
Many such parametric forms have been developed in the oceanography literature. Phillips (1958) gave theoretical arguments for the tail behaviour of the spectral density function for wind-generated wave processes. Based on this, Pierson and Moskowitz (1964) established a parametric form that characterised the spectral density function of a fully developed sea, describing both the spectral tail and peak behaviour. This was later extended by Hasselmann et al. (1973), so that the parametric form could encompass a wider variety of spectral density functions, including those associated with fetch limited wave processes. This widely used parametric form is usually known as the JONSWAP spectral form. It should be noted that we use JONSWAP to refer to the original formulation given by Hasselmann et al. (1973), with a tail decay of ω −5 (where ω denotes angular frequency).
Despite general acceptance of the JONSWAP spectral form amongst practitioners, there is debate concerning the values of the tail decay index and peak enhancement factor. Arguments for an ω −5 tail decay, made by Phillips (1958), were called into question by Toba (1973) and later by Phillips (1985), who argued that an ω −4 tail had a stronger theoretical basis. Experimental work such as Hasselmann et al. (1973) and Battjes et al. (1987) found evidence for both ω −4 and ω −5 tail decays, while Hwang et al. (2017) could not find evidence for either, further suggesting that the tail decay index should be treated as a free parameter. In addition, there is a large literature speculating on other tail behaviours, such as the occurrence of a transition frequency from ω −4 to ω −5 (Forristall, 1981; Ewans and Kibblewhite, 1986; Babanin, 2010, for example). It is also common to fix the peak enhancement factor to 3.3; however, there is little evidence for using precisely this value. In this work, we adopt a more general version of the JONSWAP spectral form, which treats both the tail decay index and peak enhancement factor as free parameters (though our methods apply to the special cases mentioned, in terms of estimating the remaining parameters of interest). Many authors (Rodríguez and Soares, 1999; Ewans and McConochie, 2018, for example) have found that both the tail decay index and peak enhancement factor are hard to estimate accurately using current techniques. However, both of these parameters are important for determining the properties of a given sea state. Our contention is that current techniques are not sufficiently accurate or precise to allow strong statements to be made concerning the true values of the tail decay index or peak enhancement factor, from typical data sets.
Indeed, in Section 5 we demonstrate with simulated half-hour records that estimates for the tail decay (using current estimation techniques) range from ω −3 to ω −6 , when the true tail decay is known to be ω −4 . Because there is so much variability in these estimates, it is impossible to determine from data of such lengths whether the true tail decay is ω −4 or ω −5 . However, in this work we present an alternative technique that recovers these parameters more accurately and precisely, and we show on simulated data that this technique can distinguish ω −4 and ω −5 tail decays, even from short records.
The standard approach for estimating model parameters is maximum likelihood inference. When an analytical form for the likelihood function is known, such parameters can be optimally estimated using maximum likelihood (Pawitan, 2001). However, in the case of wind-generated wave processes, the exact probability distribution is unknown. Though it is possible to make the simplifying assumption that the wave process is Gaussian, for many sea states this assumption will not be reasonable. For this reason, it has become common for oceanographers to use a non-parametric estimator of the spectral density function, and obtain parameters by fitting a parametric form in the least squares sense; however, such least squares estimators will in general be sub-optimal when compared to full maximum likelihood (Constable, 1988).
We therefore turn to frequency domain likelihoods, which are widely used in both time series analysis and spatial statistics (Nordman and Lahiri, 2006;Fuentes, 2007, for example). The canonical approach is to use an approximation to maximum likelihood known as the Whittle likelihood (Whittle, 1953).
The Whittle likelihood can be computed quickly using Fast Fourier Transforms, makes minimal distributional assumptions, and does not require Gaussianity (Dzhaparidze and Yaglom, 1983). However, the Whittle likelihood has been shown to produce biased estimates for small sample sizes (Dahlhaus, 1988; Velasco and Robinson, 2000). Sykulski et al. (2019) developed a de-biased version of the Whittle likelihood that corrects for this bias, without sacrificing computational speed or making extra distributional assumptions. In Section 3.4, we provide more technical detail on why we would expect the de-biased Whittle likelihood to perform better than least squares, both in terms of accuracy (bias) and precision (variance). Then, in Section 5, we evidence this claim using both numerical simulations and a real data study.
The contributions of this paper are as follows. Firstly, we introduce the de-biased Whittle likelihood estimator for use on wind-generated wave processes (Section 3). Secondly, we detail practical concerns regarding the implementation of the estimation procedure for wind-generated ocean wave processes (Section 4), with accompanying MATLAB code provided on GitHub (Grainger, 2020). This includes an important generalisation of the Sykulski et al. (2019) procedure to allow parameters to be fitted directly to the proposed spectral form without having to posit an analytical form for the time-domain theoretical autocovariance sequence-as required in Sykulski et al. (2019), but unavailable for ocean wave spectral forms. Thirdly, we present a novel reformulation of the variance of the de-biased Whittle likelihood estimator which can be used to quantify the uncertainty of parameter estimates (Section 6). Finally, we perform a detailed simulation and field data study comparing the performance of different parametric spectral density estimators for wind-generated wave processes (Section 5 and Section 7).

Background
So far we have used the word "wave" loosely to describe the shifting nature of the sea surface. In truth, we are interested in modelling the displacement of the sea surface from the resting surface. Of course, in reality this is a 3-dimensional phenomenon, but in this paper we consider the vertical displacement of the surface at a specific x, y location over time. We can think of the displacement at a given time as a random variable with some distribution, and can therefore describe the displacement over time by a stochastic process, an indexed family of random variables, which we denote X = {X_t}_{t∈R}. Note that this is a family of random variables indexed over continuous time, as the actual physical process is constantly changing. However, since we cannot record data continuously in time, we must instead settle for recording the process at discrete points in time. We assume that the data are sampled regularly, denote the sampling interval by ∆, and write X_∆ = {X_{∆t}}_{t∈Z} for the process that arises from sampling X every ∆ seconds. Assuming that X is zero-mean and second-order stationary with autocovariance c(τ) = cov(X_t, X_{t+τ}), the spectral density function of the sampled process is

    f_∆(ω) = (∆/2π) Σ_{τ=−∞}^{∞} c(∆τ) e^{−iω∆τ},    (1)

for ω ∈ [−π/∆, π/∆], where π/∆ is the Nyquist frequency, the highest observable frequency of the sampled process. Secondly, for the continuous-time process,

    f(ω) = (1/2π) ∫_{−∞}^{∞} c(τ) e^{−iωτ} dτ,    (2)

for almost every ω ∈ R (i.e. equal except on a set of measure zero). Similarly, the inverse relations are

    c(∆τ) = ∫_{−π/∆}^{π/∆} f_∆(ω) e^{iω∆τ} dω,    (3)

for τ ∈ Z, and

    c(τ) = ∫_{−∞}^{∞} f(ω) e^{iωτ} dω,    (4)

for τ ∈ R. The spectral density of the discrete-time process, f_∆(ω), can be thought of as an aliased version of the continuous-time spectral density function f(ω). More formally, we have the relation

    f_∆(ω) = Σ_{k=−∞}^{∞} f(ω + 2πk/∆),    (5)

for ω ∈ [−π/∆, π/∆] (Percival and Walden, 1993, chapter 4). In Section 5, we demonstrate that aliasing can cause bias in parameter estimation, which is why it is important to define both f(ω) and f_∆(ω) and understand their relationship.

Non-parametric spectral density estimators
Though our purpose is the analysis of parametric spectral density estimators, it is also pertinent to define some of the non-parametric spectral density estimators that are used throughout the paper. There are two important properties that should be considered when choosing an estimator. The first of these is bias, which is the expectation of the estimator minus the true value. Ideally we would choose an estimator that is unbiased, i.e. has a bias of zero. This is often not possible, but the weaker condition of being asymptotically unbiased is often achievable. An estimator is said to be asymptotically unbiased if, as the number of observations increases, the bias tends to zero. The second important property is consistency. For an estimator to be consistent it must converge in probability to the true parameter as the number of observations tends to infinity. More formally, denoting the true parameter θ₀ and an estimator θ̂_N, we say that θ̂_N is a consistent estimator if, for all ε > 0, P(|θ̂_N − θ₀| > ε) → 0 as N → ∞.
The most basic non-parametric estimator is the periodogram. Let X_{∆,N} = {X_0, . . . , X_{∆(N−1)}} be a series of N consecutive random variables from X_∆; then the periodogram is defined as

    I(ω) = (∆/2πN) | Σ_{t=0}^{N−1} X_{∆t} e^{−iω∆t} |²,

for ω ∈ R. However, the periodogram is typically only evaluated at the Fourier frequencies ω = 2πj/∆N using the FFT procedure, where j = −⌈N/2⌉+1, . . . , ⌊N/2⌋. For convenience, we shall write Ω_{N,∆} for the set of these frequencies. The periodogram can be shown to be an asymptotically unbiased estimator for f_∆(ω); however, the periodogram is not consistent. In fact, Brockwell and Davis (2006) show that as the number of observations tends to infinity, the periodogram, at frequencies other than the zero and Nyquist frequencies, becomes exponentially distributed with expectation f_∆(ω). This means that even with a record observed for an infinite amount of time, the periodogram will be a noisy estimate: at a given frequency ω, it converges not to a point, as we would hope, but to a distribution. It should also be noted that the periodogram is an estimator for the aliased spectral density, not the spectral density of the continuous-time process.
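The periodogram above can be sketched in a few lines. The paper's companion code is MATLAB; the following Python sketch (with our own variable names) uses the normalisation above, so that the ordinates estimate the two-sided aliased spectral density f_∆(ω).

```python
import numpy as np

def periodogram(x, delta):
    """Periodogram ordinates and angular Fourier frequencies for a record x."""
    n = len(x)
    # I(omega) = (delta / (2 pi n)) |sum_t x_t e^{-i omega delta t}|^2
    I = (delta / (2 * np.pi * n)) * np.abs(np.fft.fft(x)) ** 2
    omega = 2 * np.pi * np.fft.fftfreq(n, d=delta)  # angular frequencies
    return omega, I

# Sanity check (Parseval): summing the ordinates times the frequency
# spacing 2 pi / (n delta) recovers the sample variance of a zero-mean record.
rng = np.random.default_rng(1)
x = rng.standard_normal(512)
x -= x.mean()
omega, I = periodogram(x, delta=1.0)
print(np.isclose(I.sum() * 2 * np.pi / 512, x.var()))  # -> True
```

This normalisation check is a useful guard against the many FFT scaling conventions in circulation.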
For this reason, modified versions of the periodogram, which are consistent, are usually used instead. One such modification, suggested by Welch (1967), splits the series into windowed segments, calculates the periodogram of each, and then averages these periodograms at each frequency. In practice, Welch's method results in an estimate that is less noisy than the standard periodogram, but has lost resolution in frequency and may be biased. To use Welch's method, a window function and segment length must first be chosen. A special case of such methods is Bartlett's method (Bartlett, 1948), which uses non-overlapping segments with no window function. In other words, Bartlett's estimator is

    I_B(ω) = (1/P) Σ_{p=0}^{P−1} (∆/2πL) | Σ_{t=0}^{L−1} X_{∆(pL+t)} e^{−iω∆t} |²,

where P is the number of segments and L is the number of observations in each segment (with PL ≤ N).
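A minimal sketch of Bartlett's estimator, again in Python with our own names, illustrates the variance reduction (and the loss of frequency resolution, since only L ordinates remain):

```python
import numpy as np

def bartlett(x, delta, L):
    """Average the periodograms of P non-overlapping, untapered length-L segments."""
    P = len(x) // L                           # number of complete segments
    segs = x[:P * L].reshape(P, L)
    I = (delta / (2 * np.pi * L)) * np.abs(np.fft.fft(segs, axis=1)) ** 2
    omega = 2 * np.pi * np.fft.fftfreq(L, d=delta)
    return omega, I.mean(axis=0)              # average across segments

# For unit-variance white noise the true two-sided spectrum is flat at
# 1/(2 pi); averaging 16 segments visibly shrinks the spread of the
# ordinates relative to the raw periodogram.
rng = np.random.default_rng(2)
x = rng.standard_normal(4096)
I_raw = (1.0 / (2 * np.pi * 4096)) * np.abs(np.fft.fft(x)) ** 2
omega_B, I_B = bartlett(x, delta=1.0, L=256)
print(I_B.std() < I_raw.std())  # -> True
```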

Models for the sea surface
When describing the sea surface, models are often expressed in terms of the spectral density function. Many different spectral density functions have been developed for ocean waves, perhaps most notably the JONSWAP spectrum, developed by Hasselmann et al. (1973). We shall consider a more general model, which encompasses many of the other wave models that have been developed. Following Mackay (2016), we use the parametrisation

    f(ω | θ) = α ω^{−r} exp{ −(r/s) (ω/ω_p)^{−s} } γ^{δ(ω)},    (6)

for ω > 0, where

    δ(ω) = exp{ −(ω/ω_p − 1)² / (2σ(ω)²) },   with σ(ω) = σ₁ for ω ≤ ω_p and σ(ω) = σ₂ for ω > ω_p,    (7)

and where α, ω_p, s > 0, γ ≥ 1, r > 1/2, and θ denotes the vector of parameters. Typically, and for the remainder of this paper, σ₁, σ₂ and s are set to 0.07, 0.09 and 4 respectively. In this case, the parameter vector is θ = (α, ω_p, γ, r). Also let Θ denote the parameter space, the set of possible values that θ can take; for this general model,

    Θ = { (α, ω_p, γ, r) : α > 0, ω_p > 0, γ ≥ 1, r > 1/2 }.

Note that (6) is a one-sided spectral density, and is not defined at ω = 0. We shall work with the two-sided version, as this fits with the way we have defined the spectral density function, the way techniques are described in the statistical literature, and the way Fast Fourier Transforms are implemented on a computer. Therefore, we define

    f(ω | θ) = (1/2) α |ω|^{−r} exp{ −(r/s) (|ω|/ω_p)^{−s} } γ^{δ(|ω|)},    (8)

for ω ≠ 0. We shall refer to the function defined by (8) as the generalised JONSWAP spectral form.
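The parametrisation above is straightforward to evaluate numerically. The following Python sketch implements the one-sided form with θ = (α, ω_p, γ, r) and the fixed values σ₁ = 0.07, σ₂ = 0.09, s = 4; the function name and the illustrative parameter values are our own.

```python
import numpy as np

def gen_jonswap(omega, alpha, omega_p, gamma, r, s=4.0, sig1=0.07, sig2=0.09):
    """One-sided generalised JONSWAP spectral density, evaluated for omega > 0."""
    omega = np.asarray(omega, dtype=float)
    sigma = np.where(omega <= omega_p, sig1, sig2)       # peak-width switch at omega_p
    delta = np.exp(-((omega / omega_p - 1) ** 2) / (2 * sigma ** 2))
    return alpha * omega ** (-r) * np.exp(-(r / s) * (omega / omega_p) ** (-s)) * gamma ** delta

# The spectral peak sits at omega_p for any admissible r, since both the
# base form and the peak enhancement factor gamma^delta are maximised there.
om = np.linspace(0.1, 3.0, 500)
f = gen_jonswap(om, alpha=1.0, omega_p=0.7, gamma=3.3, r=4.0)
print(abs(om[np.argmax(f)] - 0.7) < 0.02)  # -> True
```

Setting r = 5 and γ = 3.3 recovers the usual JONSWAP special case.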

Fitting parametric spectral density functions
The process of fitting a parametric spectral density function to observations can be thought of as estimating the parameters of a statistical model, which we denote θ. We note that the techniques discussed in this section are applicable to a broad class of spectral density functions. As such, we consider the general case and write f(ω | θ) for the spectral density function given some choice of parameters θ. We also write f_∆(ω | θ) and c(τ | θ) for the corresponding aliased spectral density function and autocovariance function, and Σ_θ for the covariance matrix of the multivariate random variable corresponding to N consecutive random variables from X_∆. We now describe each of the fitting methods discussed in this paper.

Least squares
Current approaches to estimating the parameters of spectral density functions in the ocean waves literature, such as those described by Ewans and McConochie (2018), usually involve two key steps. Firstly, a non-parametric estimator of the spectral density function is constructed. Secondly, a curve fitting algorithm is used to choose parameters so that the corresponding parametric form is a good fit to the observed data. Typically this involves minimising the squared distance between the parametric form and the non-parametric spectral density estimator. As such, we shall refer to such approaches as least squares fitting techniques.
For the purpose of this section, we let Ī(ω) denote a general non-parametric spectral density estimator (this could be the periodogram, I(ω), the Bartlett estimator, I_B(ω), or some other non-parametric spectral density estimator). The second part of the fitting routine involves fitting the parametric form to the obtained non-parametric estimator. Typically, this is done by minimising the Euclidean distance between the non-parametric estimator and the parametric spectral density function; that is, we minimise the objective function

    ℓ_LS(θ | X_{∆,N}) = Σ_{ω∈Ω} ( Ī(ω) − f(ω | θ) )²,    (9)

where Ω ⊆ Ω_{N,∆} (the choice of Ω is discussed in Section 4.1). In other words, the least squares estimator for θ is defined as θ̂_LS = argmin_{θ∈Θ} ℓ_LS(θ | X_{∆,N}). This approach could be adapted to account for aliasing by replacing f(ω | θ) with f_∆(ω | θ), the aliased spectral density. Though there will usually not be an analytical form for the aliased spectral density, we can use the relation given in (5) to construct an approximation by truncating the summation.
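As a sketch of the least squares step, the following Python fragment minimises the squared distance between a stand-in non-parametric estimate Ī(ω) and the generalised JONSWAP form. Purely for a deterministic illustration, Ī is taken to be a noiseless generalised JONSWAP with r = 4 and only r is recovered, by a coarse grid search; in practice a numerical optimiser over the full parameter vector would be applied to a periodogram or Bartlett estimate. All names and values are our own.

```python
import numpy as np

def gen_jonswap(omega, alpha, omega_p, gamma, r, s=4.0, sig1=0.07, sig2=0.09):
    sigma = np.where(omega <= omega_p, sig1, sig2)
    delta = np.exp(-((omega / omega_p - 1) ** 2) / (2 * sigma ** 2))
    return alpha * omega ** (-r) * np.exp(-(r / s) * (omega / omega_p) ** (-s)) * gamma ** delta

# Frequencies Omega used in the fit (low frequencies removed, cf. Section 4.1)
om = np.linspace(0.2, 3.0, 400)
I_bar = gen_jonswap(om, 1.0, 0.7, 3.3, r=4.0)   # stand-in non-parametric estimate

def ls_objective(r):
    """Sum of squared distances between I_bar and the parametric form."""
    return np.sum((I_bar - gen_jonswap(om, 1.0, 0.7, 3.3, r)) ** 2)

grid = np.arange(3.0, 6.01, 0.25)
r_hat = grid[np.argmin([ls_objective(r) for r in grid])]
print(r_hat)  # -> 4.0
```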

Maximum likelihood
Maximum likelihood inference treats the sea surface data as observations of a random variable with a given distribution. The parameters of this distribution are chosen by maximising the probability of observing the data given those parameters. For the moment, let the sea surface observations be multivariate normal with expectation zero and an unknown covariance matrix Σ_θ. The log-likelihood function for observations of such a process is, up to an additive constant,

    ℓ_ML(θ | X_{∆,N}) = −(1/2) { log det(Σ_θ) + X_{∆,N}ᵀ Σ_θ⁻¹ X_{∆,N} }.    (10)

The maximum likelihood estimator is then obtained by maximising the log-likelihood function. More formally, the maximum likelihood estimator of θ is θ̂_ML = argmax_{θ∈Θ} ℓ_ML(θ | X_{∆,N}). Provided that the underlying random variable is actually multivariate normal, this technique provides asymptotically optimal estimates of θ, in the sense that they converge at an optimal rate (see Pawitan (2001), chapter 8.5, for more details).
However, this approach can be computationally expensive because evaluating the objective function given by (10) requires the inversion of an N × N matrix. Also, if we want to model a distribution that is not Gaussian, then a different log-likelihood function must be used. This may take significantly longer to compute, or may not even be tractable. As previously discussed, wave processes will not typically be precisely Gaussian. However, in Section 5 we shall compare fitting techniques on simulated Gaussian processes in the first instance. In this case, full maximum likelihood provides a useful benchmark to compare the performance of other estimators to the optimal choice of estimator.
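A Python sketch of the Gaussian log-likelihood in (10) makes the cost concrete: the covariance matrix is Toeplitz in the autocovariance sequence, and each evaluation requires an O(N³) determinant and solve. The AR(1)-style autocovariance used below is our own toy choice, not an ocean wave model.

```python
import numpy as np

def gaussian_loglik(x, acv):
    """Gaussian log-likelihood (constants dropped) for acv = (c(0), ..., c((n-1)Delta))."""
    n = len(x)
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    Sigma = acv[idx]                       # Sigma_theta, an n x n Toeplitz matrix
    _, logdet = np.linalg.slogdet(Sigma)
    quad = x @ np.linalg.solve(Sigma, x)   # x^T Sigma^{-1} x without explicit inversion
    return -0.5 * (logdet + quad)

# Simulate from c(tau) = phi^{|tau|} with phi = 0.6, then profile the
# likelihood over a coarse grid: the maximiser lands near the truth.
rng = np.random.default_rng(3)
n, phi = 400, 0.6
idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
x = np.linalg.cholesky(phi ** idx) @ rng.standard_normal(n)
grid = np.arange(0.1, 0.91, 0.05)
phi_hat = grid[np.argmax([gaussian_loglik(x, p ** np.arange(n)) for p in grid])]
print(phi_hat)
```

Even at n = 400 each likelihood evaluation involves a dense 400 × 400 factorisation, which is what motivates the frequency-domain approximations that follow.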

Spectral-likelihood
To avoid some of the problems associated with maximum likelihood estimation we can use approximations to the likelihood, known as pseudo- or quasi-likelihoods, to gain some of the accuracy and precision of maximum likelihood, while keeping computational costs (and distributional assumptions) low. One such quasi-likelihood is known as the Whittle likelihood (Whittle, 1953). The Whittle likelihood has been used in a wide range of applications due to its computational efficiency and fairly weak distributional assumptions (in particular, we no longer need to assume that the underlying process is Gaussian). In its discretised form, the Whittle likelihood is

    ℓ_W(θ | X_{∆,N}) = − Σ_{ω∈Ω} { log f(ω | θ) + I(ω)/f(ω | θ) },    (11)

where I(ω) denotes the periodogram ordinate at angular frequency ω. The corresponding estimator is again obtained by maximising this spectral-likelihood, which we shall denote by θ̂_W = argmax_{θ∈Θ} ℓ_W(θ | X_{∆,N}). This estimator does not account for aliasing; however, by replacing f(ω | θ) in (11) with the aliased spectral density f_∆(ω | θ), we obtain an estimator that does. We shall refer to this as the aliased Whittle likelihood, though it should be noted that some authors refer to it simply as the Whittle likelihood.
Though this aliased approach accounts for some of the bias in the Whittle likelihood, other forms of bias, introduced through phenomena such as blurring, are still present (Percival and Walden, 1993, chapter 6). Sykulski et al. (2019) introduced the de-biased Whittle likelihood to deal with both aliasing and blurring simultaneously. The de-biased Whittle likelihood is

    ℓ_DW(θ | X_{∆,N}) = − Σ_{ω∈Ω} { log f̄_N(ω | θ) + I(ω)/f̄_N(ω | θ) },    (12)

where f̄_N(ω | θ) = E[I(ω) | θ] is the expected periodogram. As noted by Sykulski et al. (2019), the expected periodogram can be calculated in O(N log N) time by using the relation

    f̄_N(ω | θ) = (∆/2π) Σ_{τ=−(N−1)}^{N−1} (1 − |τ|/N) c(∆τ | θ) e^{−iω∆τ},    (13)

which can be evaluated at the Fourier frequencies with a Fast Fourier Transform. The resulting estimator can then be expressed as θ̂_DW = argmax_{θ∈Θ} ℓ_DW(θ | X_{∆,N}). Despite being constructed from the periodogram, an inconsistent estimator of the spectral density function, the de-biased Whittle likelihood yields a consistent estimator of the parameters of the parametric model. It is able to address the deficiencies of the periodogram without introducing bias, by accounting for the finite-sample properties of the periodogram. Sykulski et al. (2019) also show that, under certain conditions, the de-biased Whittle estimator converges optimally. These conditions are discussed further in Appendix C.
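A Python sketch of the expected periodogram and the de-biased Whittle objective follows. For a real-valued process, (13) reduces to twice the real part of a single FFT of the triangle-weighted autocovariance, minus c(0); that rearrangement, and all names below, are our own.

```python
import numpy as np

def expected_periodogram(acv, delta):
    """E[I(omega)] at the Fourier frequencies, from acv = (c(0), ..., c((n-1)Delta)).

    Real-process reduction of (13):
    f_bar = (Delta / 2 pi) * (2 Re{FFT[(1 - tau/n) c(tau Delta)]} - c(0)).
    """
    n = len(acv)
    tri = (1 - np.arange(n) / n) * acv
    return (delta / (2 * np.pi)) * (2 * np.fft.fft(tri).real - acv[0])

def debiased_whittle(x, acv, delta):
    """De-biased Whittle objective (to be maximised over theta via acv)."""
    n = len(x)
    I = (delta / (2 * np.pi * n)) * np.abs(np.fft.fft(x)) ** 2
    fbar = expected_periodogram(acv, delta)
    keep = np.arange(1, n)  # drop omega = 0; Nyquist handling is discussed in Section 4.1
    return -np.sum(np.log(fbar[keep]) + I[keep] / fbar[keep])

# White noise is neither blurred nor aliased, so the expected periodogram
# equals the flat spectrum sigma^2 * Delta / (2 pi) at every Fourier frequency.
acv = np.zeros(64)
acv[0] = 2.0  # sigma^2 = 2
print(np.allclose(expected_periodogram(acv, delta=1.0), 2.0 / (2 * np.pi)))  # -> True
```

For a coloured process the expected periodogram differs from f_∆(ω), and it is precisely this difference that the de-biased Whittle likelihood exploits to remove finite-sample bias.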

Comparison
In Section 5, we perform a simulation study to compare each of the estimators which we have discussed. However, it is also possible to set up a theoretical framework to compare the assumptions that each technique makes about the recorded data. To achieve this we shall consider the conditions in which each technique would be equivalent to full maximum likelihood, then evaluate how likely it is that said assumptions are satisfied.
For least squares to be equivalent to full maximum likelihood, we would need the non-parametric spectral density estimator used in the fitting routine, Ī(ω), to satisfy the following four assumptions. Firstly, at each frequency, the non-parametric estimator must be Gaussian. In general, this is not true for non-parametric spectral density estimators, though it is true asymptotically for some of them (e.g. Bartlett's method). Secondly, the expectation of the non-parametric estimator must equal the spectral density function at each frequency. This is not actually true for non-parametric spectral density estimators, as these are constructed to estimate the aliased spectral density function, not the spectral density function of the continuous-time process. Though this aliasing could be accounted for by modifying the spectral form used in the fitting routine, many non-parametric spectral density estimators are biased, and such modification is not standard practice. Thirdly, the variance of the non-parametric spectral density estimator must be the same at each frequency. This is not the case for non-parametric spectral density estimators in general, as the variance at a given frequency depends on the spectral density function at that frequency (Brockwell and Davis, 2006). Though weighted least squares approaches, such as that proposed by Chiu (1988), do begin to address the third assumption, they are not widely used and still make the first and second assumptions. Fourthly, the non-parametric estimators at any two different frequencies must be uncorrelated. This assumption is discussed further in Section 3.5.
For the Whittle likelihood to be equivalent to maximum likelihood we would need the following three assumptions to hold. Firstly, we require the periodogram to be exponentially distributed at each frequency. Secondly, we require that the expectation of the periodogram is equal to the spectral density function at a given frequency (and consequently that the variance is the square of the spectral density function). Thirdly, the periodogram at any two different frequencies must be uncorrelated. The first assumption is true asymptotically (Brockwell and Davis, 2006). At first glance, this may seem equivalent to the asymptotic normality of the Bartlett modified periodograms that are often used in least squares. However, it should be noted that in the case of the periodogram, this asymptotic result is in terms of the number of observations, whereas for Bartlett modified periodograms it is in terms of the number of segments used, which is much smaller. When it comes to the second assumption, the periodogram is an asymptotically unbiased estimator of the aliased spectral density function; for this reason, the aliased version of the Whittle likelihood should be used over the standard version. Again, it may seem that this also holds for Bartlett modified periodograms, since Bartlett's method averages periodograms and each component periodogram is an asymptotically unbiased estimator of the aliased spectral density function, so that, were we to adjust for this aliasing, least squares would appear justified. However, each of these component periodograms is calculated from a small segment of the full record, so it is difficult to invoke asymptotic results. Indeed, this creates somewhat of a catch-22 for Bartlett least squares: to get asymptotic normality we must average many periodograms; however, this results in using shorter segments for each periodogram, introducing bias (and vice versa).
The de-biased Whittle likelihood (Sykulski et al., 2019) bypasses the second assumption (made by the Whittle likelihood) altogether, as it uses the theoretical expectation of the periodogram in place of the spectral density function. This means that even for small sample sizes the de-biased Whittle likelihood produces estimates with little to no bias. The final assumption, independence between frequencies, is required by both least squares and spectral-likelihoods; however, the Whittle likelihood is in a strong position when it comes to satisfying this assumption, because the periodogram is asymptotically uncorrelated at different frequencies and we are using the longest periodogram possible given the length of the data. Of course, least squares techniques could also be used on the periodogram, in which case the second and last assumptions are just as likely to be satisfied as when using spectral-likelihoods. However, the asymptotic normality required for least squares will not be satisfied (nor, in general, will the constant variance assumption).
When it comes to the final assumption, for both least squares and spectral-likelihood techniques there are some practical concerns that should be considered. In particular, when the aliased spectral density has high dynamic range, periodogram ordinates at different frequencies are often correlated. As we shall shortly show, in the case of wind-generated waves this issue does not present itself for 1Hz data; however, for higher sampling frequencies, such as 4Hz, the periodogram is often highly correlated. To solve this problem we can turn to differencing, a technique that is well established for reducing correlations in the periodogram (Velasco and Robinson, 2000).

Differencing
If the periodogram is highly correlated, spectral-likelihoods will perform poorly when compared to full maximum likelihood (Velasco and Robinson, 2000). Differencing can sometimes provide a convenient mechanism for removing such correlations. Define the differenced process as Y_t = X_{t+∆} − X_t. We briefly switch notation and let c_X(τ) and f_X(ω) denote the autocovariance and spectral density function of X at τ and ω respectively, and likewise c_Y(τ) and f_Y(ω) for the differenced process Y. First notice that

    c_Y(τ) = 2c_X(τ) − c_X(τ + ∆) − c_X(τ − ∆),    (19)

by stationarity. Then note that

    f_Y(ω) = |1 − e^{−iω∆}|² f_X(ω) = 4 sin²(ω∆/2) f_X(ω).    (20)

Therefore, differencing can easily be incorporated into the fitting techniques discussed in this paper, by simply replacing X with the differenced process Y and f_X with f_Y using the relation given by (20). Figure 1 shows a plot of the correlation matrix for the periodogram of a wind-generated wave process, estimated using the technique described in Appendix B. We can see that for data recorded at a 1Hz sampling rate there is little correlation in the periodogram; however, this is not the case for 4Hz data. We can also see that the periodogram of the differenced process is almost completely uncorrelated, even for the 4Hz data. From the signal processing perspective, differencing has reduced the dynamic range of the spectrum: we are multiplying the spectral density function by a factor that is close to zero for small angular frequencies but much larger near the Nyquist frequency, clearly down-weighting the peak far more than the tail.
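The two differencing relations above can be verified numerically. The following Python sketch (with ∆ = 1 and a toy autocovariance of our own choosing, with finite support so every sum is exact) checks that the autocovariance relation implies the spectral relation (20):

```python
import numpy as np

M = 20

def c_x(tau):
    """Toy autocovariance: tapered exponential decay, zero beyond lag M."""
    t = np.abs(tau)
    return np.where(t <= M, 0.8 ** t * (1 - t / (M + 1)), 0.0)

def spec(c, tau, omega):
    """f(omega) = (1 / 2 pi) sum_tau c(tau) e^{-i omega tau}, as a direct sum."""
    return (np.exp(-1j * np.outer(omega, tau)) @ c).real / (2 * np.pi)

tau_x = np.arange(-M, M + 1)
tau_y = np.arange(-(M + 1), M + 2)
# c_Y(tau) = 2 c_X(tau) - c_X(tau + 1) - c_X(tau - 1)
c_y = 2 * c_x(tau_y) - c_x(tau_y + 1) - c_x(tau_y - 1)

omega = np.linspace(-np.pi, np.pi, 201)
f_x = spec(c_x(tau_x), tau_x, omega)
f_y = spec(c_y, tau_y, omega)
# f_Y(omega) = 4 sin^2(omega / 2) f_X(omega)
print(np.allclose(f_y, 4 * np.sin(omega / 2) ** 2 * f_x))  # -> True
```

The factor 4 sin²(ω/2) vanishing at ω = 0 is exactly the down-weighting of the low-frequency peak described above.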

Practical concerns for implementation with the generalised JON-SWAP
In Section 3, we described some techniques that can be used to estimate model parameters. When implementing these techniques for ocean wave models, there are some practical concerns that must be addressed. Firstly, we need not use all of the Fourier frequencies when fitting the model; indeed, it may be preferable to remove frequencies that are contaminated by some other process or by observational noise. Secondly, there is no known analytical form for the autocovariance corresponding to many of the spectral density functions used when modelling ocean waves. Therefore, numerical techniques for estimating the autocovariance play an important role in many of the fitting procedures discussed in Section 3; in particular, such techniques are necessary for both the de-biased Whittle likelihood and full maximum likelihood.

Frequency selection
Many of the estimators defined in Section 3 involve minimising or maximising objective functions, which are expressed as sums over some set of frequencies Ω ⊆ Ω_{N,∆}. The simplest choice for this set is the full set of Fourier frequencies Ω_{N,∆}. At first glance, this would seem the most sensible choice (as omitting frequencies is essentially the same as throwing away data points). However, there are many circumstances in which it is preferable to remove some frequencies from the fit.
One practical reason for removing certain frequencies is that, for very low frequencies, the generalised JONSWAP spectral form is zero to machine precision. This often introduces numerical instabilities, especially for objective functions that involve dividing by the spectral density function (such as the Whittle likelihood). Another reason for removing certain frequencies from the fit is that it can help to remove noise processes that are present in a record. For example, wave records often contain a low-frequency swell component, but we are interested in the parameters of the wind sea component. By removing frequencies in which the swell is dominant, we are better able to model the wind sea component of a sea state. On top of this, there is an added technical concern when using the Whittle and de-biased Whittle likelihoods: the zero and Nyquist frequencies must be omitted (or a modified version of the summand must be used at those frequencies). This is because these methods are based on the asymptotic distribution of the periodogram, which is different at the zero and Nyquist frequencies than it is elsewhere.
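In code, the set Ω is conveniently represented as a boolean mask over the Fourier frequencies. The following Python sketch keeps the positive frequencies, drops the zero and Nyquist frequencies, and drops a low-frequency band; the cut-off of 0.3 rad/s is purely illustrative (e.g. a swell-dominated band), not a recommendation.

```python
import numpy as np

def fitting_frequencies(n, delta, omega_low):
    """Angular Fourier frequencies retained in the fitting set Omega."""
    omega = 2 * np.pi * np.fft.fftfreq(n, d=delta)
    # keep 0 < omega_low < omega < pi/delta: excludes zero, Nyquist,
    # negative frequencies, and the contaminated low-frequency band
    keep = (omega > omega_low) & (omega < np.pi / delta)
    return omega[keep], keep

omega_fit, keep = fitting_frequencies(n=256, delta=1.0, omega_low=0.3)
print(omega_fit.size)  # -> 115 (frequencies 2 pi j / 256 for j = 13, ..., 127)
```

Fixing this mask before optimisation begins ensures the number of degrees of freedom stays constant throughout the fitting routine, as discussed below.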
Fitting the model in this way can be thought of as fitting a semi-parametric model (as some of the frequencies are being modelled using a parametric model, and the remaining frequencies by some non-parametric model such as the periodogram). It is worth noting that this approach can actually be applied to full maximum likelihood as well. This can be achieved by transforming both the observations and autocovariance of the model into the frequency domain, applying a band pass filter, and then transforming back. While this is possible in theory, it is fiddly in practice and is no longer exact. This demonstrates another major advantage of spectral-likelihoods: it is far easier to filter out undesired frequencies from the model fit. However, the choice of frequencies to be used in the fit should be made prior to the objective function being optimised. Otherwise the number of degrees of freedom could be changing throughout the optimisation routine, which would likely cause problems with the optimisation, resulting in bias.

Numerical estimation of the autocovariance
To calculate both the maximum likelihood and de-biased Whittle likelihood we require the autocovariance of the process for a given parameter choice (in (10) for maximum likelihood and (13) for the de-biased Whittle likelihood). For the generalised JONSWAP spectral form, there is no analytical expression for this autocovariance, so it must be approximated numerically. Firstly, note that the autocovariance is the Fourier transform of the spectral density function, as defined in (4), and we wish to obtain the autocovariance at lags 0, ∆, . . . , (N − 1)∆. The first problem we encounter is that this integral is over the entire real line. Clearly, we cannot evaluate such an integral numerically, and must instead settle for integrating up to some finite frequency, chosen so that the spectral density function beyond that frequency is sufficiently small. In particular, it is convenient to choose a multiple of the Nyquist frequency: the integral will be approximated using a Fast Fourier Transform, so the desired lags can then be extracted by sub-sampling. Therefore, based on equation (4), we can construct an approximate autocovariance ĉ by truncating the integral at ±Lπ/∆, for L ∈ N = {1, 2, 3, . . .}.
Alternatively, we could consider the relation given in equation (3), between the autocovariance and the discrete-time spectral density function. In this case, we would first need to approximate the spectral density function of the discrete-time process. To do this, we use a truncated version of the relation given by equation (5), between the spectral densities of the continuous- and discrete-time processes: the aliased spectral density function is approximated by the truncated sum f̃, which keeps the aliases indexed by −K, . . . , K, for K ∈ N₀ = {0, 1, 2, . . .}. We then define the corresponding approximation to the autocovariance, c̃. From (27) we can see that, if L = 2K + 1, then ĉ(τ) and c̃(τ) are equivalent. In practice, these integrals must be approximated numerically. To do this, we use a Riemann approximation with bins of width 2π/M. By choosing M to be a multiple of N, we can obtain the desired lags by performing a Fast Fourier Transform and then sub-sampling appropriately.
We can now see that the approximation ĉ(τ) in (21) can equivalently be obtained via the aliased spectral density. For this reason, we use the latter approach when approximating the autocovariance: first approximating the aliased spectral density, then approximating the autocovariance. The choice of K (or equivalently L) depends on the tail behaviour of the spectral density function in question. In our implementation, we choose K so that for frequencies beyond (2K + 1)π/∆, the spectral density is below some threshold (typically 10⁻⁶ m² s rad⁻¹), though K should really be chosen so that it scales with N for the convergence results to still apply. The choice of M is based on the required accuracy of the integral approximation and should be tuned accordingly. We have found in simulation that high precision can be achieved at low computational cost with relatively small values of K and M; further details are given in the accompanying MATLAB code.
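As an illustrative sketch of this two-step approximation (in Python/NumPy rather than the paper's MATLAB, with our own function names, and assuming the convention c(τ) = ∫ f(ω) e^{iωτ} dω for equation (4)):

```python
import numpy as np

def autocovariance_from_spectrum(f, delta, N, K=3, oversample=8):
    """Approximate c(0), c(delta), ..., c((N-1)*delta) for a continuous
    spectral density f, assuming c(tau) = int f(w) exp(i*w*tau) dw.

    Step 1: truncated aliasing, f_alias(w) ~= sum_{k=-K..K} f(w + 2*pi*k/delta).
    Step 2: Riemann sum over [0, 2*pi/delta) evaluated with one inverse FFT.
    """
    M = oversample * N                              # number of frequency bins
    omega = 2 * np.pi * np.arange(M) / (M * delta)  # grid on [0, 2*pi/delta)
    f_alias = sum(f(omega + 2 * np.pi * k / delta) for k in range(-K, K + 1))
    # c(n*delta) ~= sum_m f_alias(w_m) * exp(+i*w_m*n*delta) * (2*pi/(M*delta))
    c = np.fft.ifft(f_alias) * M * (2 * np.pi / (M * delta))
    return c.real[:N]
```

The approximation improves as K (the number of aliases retained) and M (the number of frequency bins) increase, mirroring the roles of K and M in the text.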

Simulation Study
Though it is possible to make theoretical statements about the asymptotic behaviour of different estimators, from a practical perspective, their finite sample behaviour is of primary interest. To investigate this, we shall perform a simulation study to assess the performance of the estimators described in Section 3. In this simulation study, we compare six different fitting techniques based on these estimators. The first, which we call least squares, uses the curve fitting approach with the periodogram. The second approach is similar, but uses Bartlett's method to estimate the spectral density function, chosen so that we have a spectral resolution of 0.2π, i.e. the window size is 100/∆. For 1.28Hz data, this corresponds to a window size of 128. The third and fourth approaches are the Whittle and aliased Whittle likelihoods respectively. The final two approaches are the de-biased Whittle likelihood and full time domain maximum likelihood.
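For reference, Bartlett's method simply averages the periodograms of non-overlapping segments. A minimal Python/NumPy sketch (our own names and scaling, omitting the ∆/2π normalisation used for wave spectra) is:

```python
import numpy as np

def bartlett_estimate(x, window):
    """Bartlett's spectral estimate: split x into non-overlapping segments
    of length `window` and average their periodograms. Larger windows give
    finer spectral resolution but noisier estimates."""
    nseg = len(x) // window
    segments = x[: nseg * window].reshape(nseg, window)
    periodograms = np.abs(np.fft.fft(segments, axis=1)) ** 2 / window
    return periodograms.mean(axis=0)
```

Averaging over `nseg` segments reduces the variance of the periodogram by roughly a factor of `nseg`, at the cost of coarser frequency resolution.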

Method
To investigate the effectiveness of different fitting approaches we simulate a linear wave record with a known parametric spectral density function and then re-estimate the parameters from the simulated record. By repeatedly performing this process, we can assess the bias and variance of each of the estimators discussed in Section 3. For the purposes of the simulation study we let X ∆,N be a random variable with a multivariate normal distribution resulting from sub-sampling the continuous-time mean-zero stationary Gaussian process X, where X has spectral density function f G (ω | θ), defined by (8). We then simulate a realisation of X ∆,N using the circulant embedding method described by Davies and Harte (1987) (and for complex valued processes by Percival (2006)). We choose to use circulant embedding over the typical approaches for simulating Gaussian processes often used in the ocean waves literature, such as the method due to Tucker et al. (1984), as these methods only approximately simulate a Gaussian process with the given spectral density function, whereas circulant embedding is exact (up to the quality of the approximation of the autocovariance which is used). Furthermore, many techniques, such as the method proposed by Tucker et al. (1984), or the more recent modification due to Mérigaud and Ringwood (2018) do not account for aliasing when simulating the process, thus making the simulated process affected by the choice of the Nyquist sampling frequency which is undesirable. Since we are explicitly interested in the effect that aliasing has on recovered parameters, it is important that we simulate something that is as close as we can get to a Gaussian process with the desired aliased spectral density function, regardless of the sampling rate.
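Since circulant embedding may be less familiar than the usual wave-simulation recipes, here is a minimal Python/NumPy sketch of the Davies–Harte construction (function name is our own; it assumes the circulant embedding of the supplied autocovariance is non-negative definite):

```python
import numpy as np

def circulant_embedding(c, rng):
    """Exact simulation of a stationary Gaussian series with autocovariance
    c[0], ..., c[N-1], via the Davies-Harte method.

    Embeds c in a circulant matrix of size M = 2N - 2 whose eigenvalues are
    the FFT of its first row; those eigenvalues must be non-negative.
    """
    N = len(c)
    row = np.concatenate([c, c[-2:0:-1]])        # first row of the circulant
    lam = np.fft.fft(row).real                   # circulant eigenvalues
    if lam.min() < -1e-8 * lam.max():
        raise ValueError("circulant embedding is not non-negative definite")
    lam = np.clip(lam, 0.0, None)
    M = len(row)
    Z = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    Y = np.fft.fft(np.sqrt(lam) * Z) / np.sqrt(M)
    return Y.real[:N]                            # Y.imag[:N] is an independent copy
```

The simulation is exact (up to the quality of the autocovariance used), which is precisely the property that motivates its use over approximate wave-simulation methods.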
To perform the fitting we first choose one of the objective functions described in Section 3 and optimise this using the fminsearch function in MATLAB (with maximisation done by minimising the negative of the objective function). An initial guess for the fitting procedure needs to be provided for each of the parameters.
For ω_p, we use the frequency corresponding to the largest value of the periodogram. For r, we use a simple linear regression coefficient between the log periodogram and log frequency over the tail frequencies (where the tail is taken to be all frequencies closer to the Nyquist than to the peak). We initialise γ by setting it equal to 3, since choosing γ heuristically is not easy, and γ = 3 is close to the value commonly assumed by many oceanographers. In practice, the initial choice of γ does not seem to have a large impact on the final fitted values; however, the optimisation could also be run with multiple starting values of γ and the best estimate then selected. Once these parameters are initialised, α is set so that the area under the initial parametric spectral density function matches the area under the periodogram. In practice, we find that the inference is not sensitive to the initial guess (provided it is sensible). Once this initial model has been decided, the spectral density is evaluated at each of the required frequencies, and any frequency with spectral density below a certain threshold is removed from the analysis to ensure numerical stability. Note that in practice we are often fitting models to multiple consecutive sea states. In this case, it can be more efficient to use the parameter estimates from the previous sea state as initial values when optimising.
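These initialisation heuristics can be sketched as follows (Python/NumPy; names are ours, and the model-dependent matching of α to the periodogram area, e.g. via a trapezoidal rule, is omitted):

```python
import numpy as np

def initial_guess(omega, I, omega_nyquist):
    """Heuristic starting values from a periodogram I evaluated at omega:

    - peak frequency: argmax of the periodogram,
    - tail index r: minus the slope of log I against log omega over the
      frequencies closer to the Nyquist than to the peak,
    - gamma: fixed at 3 (close to the commonly assumed value).
    """
    wp = omega[np.argmax(I)]
    tail = omega > (wp + omega_nyquist) / 2.0
    slope, _ = np.polyfit(np.log(omega[tail]), np.log(I[tail]), 1)
    return wp, -slope, 3.0
```
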

A canonical sea state
We shall begin by considering how each of the estimators performs for one choice of true parameters, before showing that the results are robust to the choice of true parameters. In particular, we begin by considering a spectral density function of the form described in Section 2.2, with σ₁ = 0.07, σ₂ = 0.09 and s = 4 treated as known, and with α = 0.7, ω_p = 0.7, γ = 3.3 and r = 4 treated as unknown parameters to be estimated. These values were chosen because α = 0.7 roughly corresponds to the scaling obtained when using Phillips' constant in a JONSWAP spectrum, ω_p = 0.7 is a reasonable choice of peak frequency, γ = 3.3 is a commonly assumed value for the peak enhancement factor, and r = 4 is one of the suggested values for the tail decay index.
Half hour records sampled at 1.28Hz (a standard time interval and sampling frequency for wave records) were simulated and the parameters were estimated using each of the six estimation methods described above. The resulting estimates across 200 repeated simulations are summarised in Figure 2, alongside the time taken to perform the optimisation. For comparison, the true value of each parameter is given by a horizontal black dashed line.
Perhaps the most striking feature of Figure 2 is the difference in the variability of estimates of the tail parameter, r, when comparing least squares type techniques to likelihood based techniques. The least squares techniques recover estimates of r that vary widely around the true value, making it very difficult to make any statements about the true value of the tail decay. In contrast, statistical techniques such as the de-biased Whittle likelihood are able to recover the original tail parameter to within a few decimal places. Therefore, by using the de-biased Whittle likelihood, practitioners would be able to distinguish between ω⁻⁴ and ω⁻⁵ spectral tails in observed records. We can also see that the de-biased Whittle likelihood offers an improvement in estimates of γ, performing almost as well as full maximum likelihood.
It is also interesting to note that bias can be seen in both the Whittle and aliased Whittle likelihood estimates; however, this bias is not present in the de-biased Whittle likelihood estimates. This verifies that the de-biased Whittle likelihood is indeed accounting for some of the bias present in standard Whittle likelihood, and demonstrates why de-biased Whittle likelihood is necessary over the aliased Whittle likelihood, which can still be seen to be biased for some parameters. It is also clear from Figure 2 that full maximum likelihood provides the best estimates, both in terms of bias and variance. However, this comes at significant computational cost, whilst giving limited improvement in bias and variance when compared to the de-biased Whittle likelihood.
Often, during optimisation, parameters may trade off against one another. Therefore it is also important to look at the joint behaviour of parameter estimates. Figure 3 shows a scatter plot of the de-biased Whittle likelihood estimates from Figure 2. We can see that there is very strong correlation between the estimates of α and r, and some negative correlation between α and γ. This likely occurs because γ and r change the area under the spectral density function, so α is adjusted to compensate. Though it would be possible to reparameterise to try to avoid this, it does not seem to have a significant impact on the resulting estimates and is therefore not necessary.
In practice, longer sea states are often used to estimate model parameters. Therefore, we also compare some of the methods for 3 hour records. Figure 4 shows the comparison of least squares, Bartlett least squares and de-biased Whittle likelihood estimates for these 3 hour records. The variance in the first two estimators has indeed decreased when compared to the estimates from half hour records shown in Figure 2. However, by comparing the de-biased Whittle likelihood estimates in Figure 2 to the least squares estimates in Figure  4, we can see that the de-biased Whittle likelihood used on a half hour record yields better estimates than the least squares based estimates performed on 3 hour records. The longer record reduces the variance of the least squares and Bartlett least squares techniques enough to allow us to see another interesting feature, namely that there is significant bias present in the Bartlett least squares estimates that is not present in the standard least squares estimates. This demonstrates that non-parametric smoothing can have unexpected consequences when used to fit a parametric spectral density function.
In essence, by using the de-biased Whittle likelihood, we can obtain more accurate and precise parameter estimates, whilst simultaneously reducing the length of the record required to obtain the estimates. This improvement is especially noticeable (and important) for the peak enhancement factor, γ, and tail decay index, r. In practice, this has two important consequences. Firstly, we can reduce the amount of time for which the surface is assumed to be stationary. This means that we can fit stationary models to weather systems that evolve very quickly, such as tropical cyclones. Secondly, we can obtain parameter estimates at more frequent time intervals. This higher parameter resolution means that we gain a more detailed insight into how certain parameters evolve throughout the course of a meteorological event.

Robustness of results
In Section 5.2 we demonstrated that the de-biased Whittle likelihood can produce parameter estimates that are both more accurate and more precise than those produced by least squares techniques, without making huge sacrifices in terms of computational time. However, it is also important to check that these results extend to different choices of the true parameters. We have performed simulation studies similar to the one described above for many different choices of true parameters, with similar conclusions in each case. In particular, the ordering of the techniques in terms of accuracy and precision does not change, though for some choices of true values the estimates are more variable than for others. Because of this, and for the sake of brevity, we omit such plots from the paper (though they are available online, alongside the code to generate them).

Figure 4: Boxplots of parameter estimates using three different fitting routines on the same simulations, using the same true parameters as Figure 2 and the same sampling interval, but simulating 3 hour records.
For illustration, we discuss the results for the special case when γ = 1. The fitted parameters for half hour simulated records with true parameter values α = 0.7, ω_p = 0.7, γ = 1 and r = 5, using the least squares, Bartlett least squares and de-biased Whittle likelihood techniques, can be seen in Figure 5. This is an interesting case because the true parameter lies on the boundary of the parameter space; yet this value of γ corresponds to a Pierson-Moskowitz spectrum, for a fully developed sea, and so could well occur in nature. It is therefore important that we can model this case. The problem is that the theoretical guarantees for an approach such as the de-biased Whittle likelihood rely on the assumption that the true parameter does not lie on the boundary of the parameter space. Since this assumption fails for γ = 1, we must take care when dealing with records for which the true value of γ may indeed be 1.
In particular, we can see from Figure 5 that the de-biased Whittle likelihood estimates for γ are not normally distributed; normality is assumed when constructing the confidence intervals described later in Section 6. One way to deal with this problem is to fit both a model with γ = 1 fixed and one with γ as a free parameter, and then test whether there is evidence for γ > 1. Such a procedure allows us to circumvent potential issues caused by γ lying on the boundary of the parameter space. This could be performed by using the procedure developed by Sykulski et al. (2017), adapted to the 1D case.

Differencing for high sampling frequencies
As we discussed in Section 3.5, for wind-generated waves observed at a 4Hz sampling rate, the periodogram is highly correlated. As noted in Section 3.4, in this case we would expect spectral techniques to perform poorly compared to full maximum likelihood. Differencing can reduce the correlation in the periodogram, and therefore can be a powerful tool for removing bias from spectral methods. Figure 6 shows boxplots of estimated parameters for the least squares, Bartlett least squares and de-biased Whittle likelihood techniques, both with and without differencing, for both 1Hz and 4Hz data. We can see little benefit from differencing in the 1Hz estimates (Figure 6a). However, for 4Hz data, there is a major benefit to differencing, especially for the de-biased Whittle likelihood. Because there is little difference between the de-biased Whittle fits on the original and differenced 1Hz data, we recommend that differencing is used as standard, to protect against the issues seen in Figure 6b. At the very least, investigating the correlation matrix of the periodogram should be an important diagnostic when fitting spectral models.
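When fitting to a differenced record, the parametric spectral density must be multiplied by the squared gain of the differencing filter; a small sketch of this standard adjustment (Python/NumPy, our own names):

```python
import numpy as np

def differenced_density(f, omega, delta):
    """Spectral density of the first difference X(t) - X(t - delta).

    The differencing filter has transfer function 1 - exp(-i*w*delta), so
    the density is multiplied by |1 - exp(-i*w*delta)|^2 = 2(1 - cos(w*delta)).
    """
    return 2.0 * (1.0 - np.cos(omega * delta)) * f(omega)
```

The gain vanishes at ω = 0 and reaches its maximum of 4 at the Nyquist frequency, so the fit is then performed on the differenced series with this adjusted density.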

Quantifying estimation uncertainty
As with any statistical analysis, it is important to quantify the uncertainty in the parameter estimates obtained using the de-biased Whittle likelihood. Sykulski et al. (2019) developed an approach for quantifying the asymptotic variance of the estimator for a single time series. Say that we have p parameters written in a vector θ, and that we are interested in the de-biased Whittle likelihood estimator θ̂_DW of the true parameter vector θ. Sykulski et al. (2019) decompose the variance as

var(θ̂_DW) ≈ H(θ)⁻¹ var(∇ℓ_DW(θ)) H(θ)⁻¹,  (28)

where H(θ) denotes the matrix of second derivatives of the log-likelihood function ℓ_DW(θ), and ∇ denotes the vector of first derivatives of ℓ_DW(θ).
Figure 5: Boxplots of parameter estimates using two different fitting routines with true parameters α = 0.7, ω_p = 0.7, γ = 1 and r = 5, estimated from simulated half hour records sampled at 1.28Hz.

Note that for the de-biased Whittle likelihood, the first derivatives of the log-likelihood can be written in terms of the first derivatives of the expected periodogram, for j = 1, . . . , p. As will be seen in Appendix B, this is required to calculate var(∇ℓ_DW(θ)). Furthermore, the second derivatives can be written in terms of the same quantities, for j, k = 1, . . . , p. Therefore, we require only the first derivative of the expected periodogram in order to compute both parts of the variance decomposition given by (28).
Note further that the triangle function and Fourier basis are constant with respect to θ, so from (13), for j = 1, . . . , p, the derivative of the expected periodogram is obtained by differentiating the autocovariance inside the sum. In other words, to calculate the derivative of the expected periodogram, we may first calculate the derivative of the autocovariance, then calculate the expected periodogram of the process with that derivative as its autocovariance. In the case of the generalised JONSWAP spectral density function, when an analytical form for this autocovariance is unavailable, we may instead approximate the derivative of the autocovariance by first approximating the derivatives of the aliased spectral density function, and then of the autocovariance.
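Explicitly, with the triangular kernel arising from a length-N record and the ∆/2π normalisation (our reading of the conventions in (13)), this relation can be written as:

```latex
\[
\frac{\partial}{\partial \theta_j}\,
  \mathbb{E}\!\left[\,I(\omega)\mid\boldsymbol{\theta}\,\right]
= \frac{\Delta}{2\pi}
  \sum_{\tau=-(N-1)}^{N-1}
  \left(1-\frac{|\tau|}{N}\right)
  \frac{\partial\, c(\tau\Delta\mid\boldsymbol{\theta})}{\partial \theta_j}\,
  e^{-i\omega\tau\Delta},
\qquad j = 1,\dots,p.
\]
```

The right-hand side is simply the expected periodogram evaluated with ∂c/∂θ_j in place of c, as stated above.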
In Appendix A we show that, for the generalised JONSWAP form, partial derivatives of f_∆(ω | θ) can be calculated from the derivatives of the spectral density function, and that the resulting derivatives are continuous. Therefore, by Leibniz's rule, differentiation may be interchanged with the integration involved. As a result, we may approximate the derivative of the autocovariance from the derivative of the aliased spectral density function in the same way that we approximated the autocovariance from the aliased spectral density function in Section 4. Each partial derivative can therefore be computed in the same time it would take to do one function evaluation, which is faster than using numerical approximations to the derivative (e.g. finite differencing methods).
In Appendix B, we discuss a novel procedure for estimating var(∇ℓ_DW(θ)) in a computationally efficient manner. Combining this with the approach for obtaining derivatives, we can estimate the variance of the de-biased Whittle likelihood estimator using (28). This enables the computation of approximate confidence intervals for our estimates, using the asymptotic normality of the estimator and standard theory for confidence intervals.
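Once H(θ̂) and an estimate of var(∇ℓ_DW) are available, assembling the intervals is a standard sandwich-variance calculation; a sketch (Python/NumPy, our own names):

```python
import numpy as np

def sandwich_confidence_intervals(theta_hat, H, S, z=1.96):
    """Approximate confidence intervals from a decomposition of the form (28).

    H : Hessian of the log-likelihood at theta_hat (p x p),
    S : estimate of the variance of the score, var(grad) (p x p).
    The asymptotic variance is H^{-1} S H^{-1}; intervals then follow from
    the asymptotic normality of the estimator.
    """
    H_inv = np.linalg.inv(H)
    V = H_inv @ S @ H_inv
    se = np.sqrt(np.diag(V))
    return theta_hat - z * se, theta_hat + z * se
```

As noted in the example analysis below, such intervals should be interpreted cautiously when an estimate lies near the boundary of the parameter space.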

Maui data
Observations of fetch-limited seas made near the Maui-A platform off the coast of New Zealand between November 1986 and November 1987 have been studied previously by Ewans and Kibblewhite (1986) and Ewans (1998). We revisit these observations to demonstrate additional practical concerns that arise when applying the techniques discussed in this paper to observed wave records. Observations were taken using a Datawell WAVEC buoy at a location with a water depth of 110m. Data were recorded for 20 minutes at a sampling rate of 1.28Hz, followed by a gap of 10 minutes while the data were processed. Non-parametric spectral density estimates were recorded for each of these intervals, while the raw time series was only retained for the first 20 minutes of each 3 hour segment. In other words, a 20 minute record is available starting at 00:00, 03:00, 06:00 and so on, for each day.
For the sake of illustration, we shall analyse the behaviour of spectral parameters throughout a single storm event, from the 2nd to the 7th of November, 1986. As can be seen from Figure 7, throughout the first and second days the conditions become steadily more severe, before calming down over the following days.
Transforming to the frequency domain, we see the presence of persistent swell, which remains fairly constant throughout all of the records. Parameters of the wind-sea component of these bi-modal seas are estimated by first removing contaminated frequencies from the objective function, with optimisation proceeding as previously described. Ideally we would aim to fit a bi-modal model designed to also describe the swell component. Such an approach is beyond the scope of this paper, though we note that preliminary results for such a procedure are encouraging and are discussed further in Section 8. It was also observed that frequencies beyond 3 rad s⁻¹ exhibited a different behaviour from the rest of the spectral tail. We suspect this may be related to the physical response of the buoy to the waves, though we do not know exactly how such behaviour should be modelled. Therefore, we also chose to omit these frequencies when using each of the fitting procedures. After filtering out the aforementioned frequencies, a generalised JONSWAP spectral form was fitted to each of the sea states in turn, using least squares, Bartlett least squares, and the de-biased Whittle likelihood. These estimates are shown in Figure 7a.
From Figure 7a, we can see that, especially for the tail index, the de-biased Whittle estimates seem more stable than both the least squares and Bartlett least squares estimates. Figure 7b shows the de-biased Whittle likelihood estimates and approximate 95% point-wise confidence intervals, calculated using the technique described in Section 6. It is worth noting that these confidence intervals are based on the asymptotic distribution of the estimator; in particular, we assume asymptotic normality. This is especially problematic for values near to or on the boundary of the parameter space, as the confidence interval may include values outside the boundary. To represent this, in Figure 7b we only shade the regions of each confidence interval that lie within the parameter space. Such confidence intervals enable us to better understand and communicate the uncertainty surrounding parameter estimates. This uncertainty, we argue, should be considered when using these parameters as inputs to other related models.
From Figure 7b we can see that the parameters behave broadly as expected as we move through the storm. We see a steady evolution in peak frequency throughout the growth of the wind-sea component, alongside large values of γ as the storm is developing. After this point, the values of γ reside close to 1. We can also see that there is a greater level of uncertainty when H_s is low, because there is very little energy in the wind-sea component; as the storm evolves, this uncertainty quickly reduces. The uncertainty surrounding the peak enhancement factor remains fairly high through the second half of the first day, even when the uncertainty around other parameters has reduced. This is because, when the true value of γ is large, information about γ resides in relatively few frequencies, and as such there is a greater amount of uncertainty surrounding the estimates.
A useful diagnostic tool for spectral models can be constructed by noticing that I(ω)/E[I(ω) | θ] should be approximately exponentially distributed with mean 1 (Brockwell and Davis, 2006), where I(ω) denotes the periodogram (this result is specific to the periodogram and is not true in general for non-parametric spectral density estimators). Therefore, if we calculate this ratio for a given model at each Fourier frequency, we can compare the quantiles of these ratios to the quantiles of an exponential distribution with mean 1 using a QQ-plot. Figures 8a and 8b show the periodograms and fitted models on the standard scale and the decibel scale respectively, for 4 time periods during the early development of the storm. Figure 8c shows the corresponding QQ-plots, which demonstrate how well each of the models fits the data: the closer the points on the QQ-plot are to the reference line y = x, the better. We can see from Figure 8c that the de-biased Whittle likelihood consistently outperforms the other techniques. Plots for the remaining sea states are available alongside the MATLAB code on GitHub (Grainger, 2020).
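This diagnostic is straightforward to compute from the periodogram and the fitted expected periodogram; a sketch (Python/NumPy, our own names):

```python
import numpy as np

def exponential_qq(I, EI):
    """Return (theoretical, empirical) quantile pairs for checking that the
    ratios I(w) / E[I(w) | theta] are approximately Exp(1).

    Points close to the line y = x indicate a good spectral fit.
    """
    ratios = np.sort(np.asarray(I) / np.asarray(EI))
    n = len(ratios)
    p = (np.arange(1, n + 1) - 0.5) / n    # plotting positions
    theoretical = -np.log1p(-p)            # Exp(1) quantile function
    return theoretical, ratios
```

The returned pairs can be plotted directly to produce QQ-plots of the kind shown in Figure 8c.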

Discussion and Conclusion
The de-biased Whittle likelihood has been shown to yield major improvements in both the bias and variance of estimated parameters for wind-generated waves. In particular, the tail decay index can be estimated to much greater levels of accuracy and precision than when using least squares techniques. Such an improvement will enable reliable tracking of the behaviour of the tail decay index throughout the course of meteorological events, allowing oceanographers to gain fresh insights into the behaviour of wind-generated waves. We have also demonstrated some improvement in the estimation of the peak enhancement factor. The de-biased Whittle estimator recovers estimates that are of similar quality to full maximum likelihood, which can be thought of as optimal. Since information about the peak enhancement factor is contained in a small region around the peak frequency, it is not surprising that it is so hard to estimate. Because there is significant variability in estimates of the peak enhancement factor, it is essential that we can describe the uncertainty surrounding parameter estimates when performing an analysis. For this reason, the development of computationally efficient techniques for quantifying uncertainty surrounding the estimated parameters is important. We have shown that the method presented by Sykulski et al. (2019) for estimating the variance of the de-biased Whittle estimator can be modified so that it can be computed using 2D Fast Fourier Transforms. Combining this with an analytical approach for calculating derivatives, we are able to calculate the uncertainty in parameter estimates accurately and quickly. In addition, differencing can be used to cope with high sampling frequency data, which tend to be correlated in the frequency domain.

Figure 7: Plots of H_s and the estimated parameters over time, from the second to the seventh of November, 1986: (a) parameter estimates using least squares, Bartlett least squares and de-biased Whittle likelihood; (b) parameter estimates using the de-biased Whittle likelihood with approximate 95% confidence intervals. Estimates are from 20 minute records starting every 3 hours and are plotted at the start time of each record. Note that some of the plots in (a) have been truncated: the maximum of the least squares estimate for α was 9.37, and the maximum and minimum for r were 9.66 and 2.28.

Figure 8: Each column is from records recorded on the fourth of November, 1986, starting at 09:00, 12:00, 15:00 and 18:00 respectively. Colours and line styles denote different fitting techniques in the same manner as Figure 7a. In (c), reference lines are added for each of the fitting methods to aid comparison with the black line denoting y = x. The symbols "+", "×" and "•" denote quantiles from the de-biased Whittle likelihood, least squares estimates, and Bartlett least squares estimates, respectively. Note that the plots in (c) have been truncated in the y axis so that they have a 1:1 aspect ratio, so some quantiles are not shown. The untruncated figures can be found in Grainger (2020).
As discussed previously, when performing simulations we chose to use circulant embedding (Davies and Harte, 1987) to obtain realisations of Gaussian processes with the desired covariance matrix. This is different to the standard method used for simulating linear ocean waves, presented by Tucker et al. (1984) (or the adapted version due to Mérigaud and Ringwood (2018), which is preferable). The first difference is that the standard method is only approximate, and does not exactly simulate a Gaussian process with the desired covariance (equivalently, spectral density). The second is that these methods do not account for the aliasing that we would expect to be present in a record, essentially treating the spectral density as if it were zero beyond the Nyquist frequency. Though for many applications this will not matter, the methods discussed in this paper are sensitive to this difference. If the simulated record does not have the aliasing that should be present, then the parameters estimated using the de-biased Whittle likelihood will seem biased (and standard Whittle will often seem better). This is important because, if a method such as that of Tucker et al. (1984) were used to perform the simulation study described in this paper, the results would be different, as likelihood based estimation is sensitive to this problem (which can be thought of as model mis-specification: we would be trying to fit a model with non-zero density beyond the Nyquist to a process simulated with no density beyond the Nyquist). Least squares is somewhat invariant to this problem, as it mainly affects frequencies with low levels of energy, and least squares is not heavily influenced by such frequencies.
We have also developed a MATLAB toolbox implementing the methods discussed in this paper. This is available, alongside code to generate the figures in this paper and additional supplementary figures, on GitHub (Grainger, 2020). The toolbox contains code to perform each of the fitting techniques discussed in this paper on arbitrary processes, as well as a function implementing the generalised JONSWAP (including first and second derivatives), which can be used straight out of the box. Beyond this, the user may provide any spectral density function they wish and then use the toolbox to obtain parameter estimates from observations. An implementation of circulant embedding is also provided, enabling exact simulation of a desired Gaussian process.
Though the de-biased Whittle likelihood has been seen to perform well in the estimation of wind-seas, we have yet to fully explore its potential for describing multi-modal seas (e.g. including one or more swell components). In this paper, we were able to describe the wind-sea component of such sea states by first removing swell with a high pass filter. However, it would be preferable to develop and fit models that are themselves bi-modal, avoiding the need for partitioning schemes to determine which frequencies should be filtered. Indeed, such a procedure could be extended to describe seas with any number of components, using model selection to determine the number of components actually present. Such techniques could also allow for the development of model-based partitioning schemes that are able to separate overlapping wind-sea and swell components. In the development of such multi-modal models, it would be important to consider the interactions between different component weather systems. In particular, techniques such as higher-order spectral analysis could be used to determine whether any non-linear interactions are present and to help characterise them. Such interactions could then also be parameterised and fitted using similar techniques to those discussed in this paper.
Another important aspect of wind-generated waves is their directional characteristics. One approach to describing these is to assume some dispersion relation between wave-number and frequency, and then to work with frequency-direction spectra, which can be estimated, for example, from heave-pitch-roll buoys (Longuet-Higgins et al., 1963). However, it may be preferable to model the three recorded series directly, requiring a multivariate extension of the de-biased Whittle likelihood. These are all important avenues for further investigation.
In summary, we have demonstrated that, by using the de-biased Whittle likelihood, we are able to obtain more accurate and precise estimates of the parameters of the wind-sea component of a wind-generated wave process in O(N log N) time. Using differencing, we are able to obtain estimates of the same quality when the periodogram is correlated. Furthermore, we have described a procedure for obtaining the variance of such estimates in O(N^2 log N) time. Together these developments will improve the tools available to practitioners, both in terms of fitting models to data and describing the associated uncertainty.
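The O(N log N) fitting objective itself is short. The following is a minimal NumPy sketch, assuming the usual conventions (periodogram I(ω_j) = (Δ/2πN) |Σ_t X_t e^{-iω_j t Δ}|², and expected periodogram equal to the Fourier transform of the triangle-kernel-weighted autocovariance); it is not the toolbox implementation, and the model autocovariance is taken as given.

```python
import numpy as np

def debiased_whittle(acv, x, delta):
    """Negative de-biased Whittle log-likelihood of a record x, given the model
    autocovariance sequence acv[0..N-1] implied by a candidate parameter vector.
    Both the periodogram and the expected periodogram are computed with a
    single FFT, so each evaluation costs O(N log N)."""
    N = len(x)
    # Periodogram at the Fourier frequencies
    I = delta / (2 * np.pi * N) * np.abs(np.fft.fft(x)) ** 2
    # Expected periodogram: Fourier transform of the triangle-weighted acv,
    # using symmetry: sum over |tau| < N of (1 - |tau|/N) c(tau) e^{-i w tau d}
    tri = (1 - np.arange(N) / N) * np.asarray(acv, dtype=float)
    ebar = delta / (2 * np.pi) * (2 * np.fft.fft(tri).real - acv[0])
    return np.sum(np.log(ebar) + I / ebar)
```

Minimising this objective over the model parameters (each candidate parameter vector supplying its own `acv`) gives the de-biased Whittle estimate.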
Proof. The first part follows from Proposition A.1, the convergence and continuous differentiability of f_Δ, and Theorem 4.4.20 of Trench (2013). The continuity follows from the continuity of the derivatives of f_G and the uniform limit theorem.
B Computing the variance of the first derivative

Sykulski et al. (2019) decompose the variance of the first derivative of the de-biased Whittle likelihood as

[var(∇ℓ_DW(θ))]_il = Σ_j Σ_k a_ij(θ) a_lk(θ) cov(I(ω_j), I(ω_k)),

where the ω_j denote the Fourier frequencies and a_ij(θ) = f̄(ω_j; θ)^{-2} ∂f̄(ω_j; θ)/∂θ_i, with f̄ denoting the expected periodogram. To estimate the variance, they propose the plug-in estimator

Σ_j Σ_k a_ij(θ̂_DW) a_ik(θ̂_DW) ĉov(I(ω_j), I(ω_k)).
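Once the a_ij(θ̂_DW) terms and the periodogram covariances are in hand, the double sum is simply a quadratic form. A one-line NumPy sketch (names are ours, not the toolbox's):

```python
import numpy as np

def score_variance(a, cov_I):
    """Variance of the score: a is the P x N matrix whose (i, j) entry is
    a_ij(theta-hat), cov_I is the N x N matrix of periodogram covariances.
    Returns the P x P matrix with (i, l) entry sum_{j,k} a_ij a_lk cov_I[j, k]."""
    a = np.asarray(a, dtype=float)
    return a @ np.asarray(cov_I, dtype=float) @ a.T
```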
For ocean wave models, a_ij(θ̂_DW) can be computed easily using the results from Section A. So, following Sykulski et al. (2019), the remaining task is to compute ĉov(I(ω_j), I(ω_k)). For a Gaussian process this can be written, with normalising constant Δ/(2πN), in terms of double sums of the form

Σ_{r=0}^{N-1} Σ_{s=0}^{N-1} e^{isΔω_j} e^{irΔω_k} Q(r + s),

where Q is an integral of the model spectral density against a complex exponential, and where we have rearranged for later convenience. Letting t = r + s, we only require Q(t) for t = 0, ..., 2N - 2, and because Q(t) is itself a Fourier transform, estimates at all of the required t can be obtained from a single FFT of a sequence of length 2N - 1 (the details are discussed in Section B.1). This means that we only need to do one FFT, and can then substitute the result into the double sums above. Writing Q̃(r, s) = Q(r + s), the double sum over r and s is a two-dimensional Fourier transform, which can be computed efficiently for all j, k = 0, ..., N - 1. The overall cost is therefore O(N^2 log N), and, importantly, highly optimised libraries exist to compute such transforms very quickly.
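As a sketch of the O(N^2 log N) idea, one can compute the periodogram covariances for a Gaussian process directly from the autocovariance sequence, using the standard Gaussian identity cov(I_j, I_k) = |E[J_j conj(J_k)]|² + |E[J_j J_k]|² with J(ω_j) = (Δ/2πN)^{1/2} Σ_t X_t e^{-iω_j t Δ}. This starts from the autocovariance rather than the paper's Q(t), but the FFT-along-each-axis structure, and hence the complexity, is the same; the names and normalisations here are our assumptions.

```python
import numpy as np

def periodogram_covariance(acv, delta):
    """Covariance of the periodogram ordinates of a Gaussian process with
    autocovariance sequence acv[0..N-1] and sampling interval delta.
    Applies an FFT along each axis of the Toeplitz covariance matrix,
    so the cost is O(N^2 log N)."""
    N = len(acv)
    acv = np.asarray(acv, dtype=float)
    # Toeplitz covariance matrix C[t, s] = c(|t - s| * delta)
    C = acv[np.abs(np.subtract.outer(np.arange(N), np.arange(N)))]
    scale = delta / (2 * np.pi * N)
    inner = np.fft.fft(C, axis=0)               # sum over t with e^{-i w_j t d}
    A = scale * N * np.fft.ifft(inner, axis=1)  # E[J_j conj(J_k)]: e^{+i w_k s d}
    B = scale * np.fft.fft(inner, axis=1)       # E[J_j J_k]:       e^{-i w_k s d}
    return np.abs(A) ** 2 + np.abs(B) ** 2
```

For Gaussian white noise this reproduces the textbook result: var(I(ω_j)) = f(ω_j)² away from zero and the Nyquist frequency, and 2f(ω_j)² at those frequencies.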
B.1 Approximating Q(t)

To achieve this, we approximate the integral by a Riemann sum, Q̄(t), evaluated on a grid of M frequencies. A single FFT then produces the values of Q̄(t) for t = 0, 1, ..., M - 1. Therefore, provided M ≥ 2N - 1, we can obtain approximations to all of the desired integrals.
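The Riemann-sum-by-FFT trick can be sketched as follows; the choice of grid (ω_m = 2πm/(MΔ) over one period) and the function name are our illustrative assumptions.

```python
import numpy as np

def riemann_fft(g_vals, delta):
    """Approximate Q(t) = integral of g(w) e^{i t delta w} dw over one period
    [0, 2 pi / delta), for all t = 0, ..., M - 1, with a single FFT.
    g_vals holds g evaluated on the grid w_m = 2 pi m / (M delta)."""
    M = len(g_vals)
    dw = 2 * np.pi / (delta * M)           # grid spacing of the Riemann sum
    # ifft supplies the e^{+2 pi i t m / M} kernel (with a 1/M factor)
    return dw * M * np.fft.ifft(g_vals)
```

Choosing M ≥ 2N - 1 (e.g. by zero-padding or refining the grid) then yields all of Q̄(0), ..., Q̄(2N - 2) in one transform.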

C Assumptions for de-biased Whittle
Theorem 1 of Sykulski et al. (2019) gives assumptions for the optimal convergence of the de-biased Whittle likelihood estimator. The first of these assumptions is that the parameter space is compact with non-null interior and that the true value of the parameter lies in the interior of the parameter space. This is not strictly satisfied by the generalised JONSWAP spectral density; however, we may note that for physical reasons the parameter space can be restricted so that it is compact. The greater problem is that a value of γ = 1, which corresponds to the fully developed sea described by Pierson and Moskowitz (1964), may occur in nature and would violate this assumption. In Section 5.3 we demonstrate that, in this case, the de-biased Whittle likelihood estimator still performs well.
The second assumption is that the spectral density of the aliased process is bounded above, and bounded below by some positive real number. This is satisfied by the generalised JONSWAP spectral density: the aliased spectrum is bounded below by the non-aliased spectrum, and bounded above by the variance of X_t (which is finite). The only frequency remaining to consider is ω = 0, where the non-aliased spectral density is zero. However, the contributions from frequencies above the Nyquist frequency are positive, and so the aliased spectral density at zero is also positive.
Assumption three relates to parameter identifiability. Informally, this requires the aliased spectral density function to be different for different choices of θ. Intuitively, provided the sampling interval is sufficiently small (so that the peak frequency is below the Nyquist frequency), each of the parameters changes the shape of the generalised JONSWAP spectrum in a different way, so that the parameters will in general be identifiable for a sufficient sample size.
The fourth assumption states that the aliased spectral density function must be continuous in θ and Riemann integrable in ω. This is satisfied for the generalised JONSWAP spectral form as it is continuous in θ and ω.
Assumption five states that the expected periodogram has two continuous derivatives in θ, and that these derivatives are bounded uniformly for all N. Furthermore, the first derivatives are required to be non-zero at Θ(N) frequencies. Strictly speaking, the generalised JONSWAP is not twice differentiable (the second derivative does not exist at the peak, due to the step function σ(ω | θ)). However, a simple adaptation can be made by replacing σ(ω | θ) with

σ̃(ω | θ) = σ_1 + (σ_2 − σ_1) [1/2 + (1/π) arctan(C(ω − ω_p))],

where C ∈ (0, ∞) is chosen to be large. This has the advantage of having continuous second derivatives, and is essentially equivalent to the generalised JONSWAP, because C can be chosen such that it would be impossible to distinguish between the two models from observed data. Since this part of the generalised JONSWAP was developed empirically, there is no practical difference in using the step function over this reformulation. Indeed, in keeping with the general philosophy of statistical modelling ("all models are wrong, but some models are useful"), we suggest that this alternative formulation is just as appropriate as the generalised JONSWAP, but more useful here, as it allows us to show that this assumption is satisfied. By arguments similar to those presented in Appendix A and Appendix B, the aliased spectral density has continuous second derivatives. The autocovariance then has continuous second derivatives by an argument similar to that in Section 6: the second derivative of the aliased spectral density function is integrable, and so the second derivative of the autocovariance must be continuous (as the two can be shown to be Fourier pairs). Therefore, the expected periodogram also has continuous second derivatives, by linearity of differentiation and the fact that linear combinations of continuous functions are continuous.
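The smoothed width function is straightforward to implement. A small sketch, where σ_1 = 0.07 and σ_2 = 0.09 are the classical JONSWAP defaults and C is the (illustrative) smoothness constant from above:

```python
import numpy as np

def sigma_smooth(w, wp, s1=0.07, s2=0.09, C=1e6):
    """Arctan replacement for the JONSWAP step sigma(w): tends to s1 for
    w << wp and to s2 for w >> wp, with continuous second derivatives
    everywhere (including at the peak frequency wp)."""
    return s1 + (s2 - s1) * (0.5 + np.arctan(C * (w - wp)) / np.pi)
```

For large C the transition is so sharp that the two formulations are indistinguishable from observed data, as the text notes.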
These derivatives are also bounded uniformly for all N due to the compactness of the set of frequencies Ω and the parameter space Θ.
The final assumption states that the process in question is fourth-order stationary, with finite fourth-order moments and absolutely summable fourth-order cumulants. This is clearly true for a Gaussian process (since second-order stationarity implies strict stationarity for Gaussian processes), and it is also true for some non-linear processes, such as the class discussed by Sykulski et al. (2019).
Finally, we note that in our simulations the estimator based on the de-biased Whittle likelihood performs in broad agreement with the theory; for example, we observed desirable properties such as root-N convergence when exploring different values of N. We also did not encounter problems with local minima during optimisation for any of the record lengths considered, suggesting that the parameters of the generalised JONSWAP form are indeed identifiable in practice for sufficiently long records.