Stochastic analysis of ocean wave states with and without rogue waves

This work presents an analysis of ocean wave data including rogue waves. A stochastic approach based on the theory of Markov processes is applied. With this analysis we achieve a characterization of the scale dependent complexity of ocean waves by means of a Fokker-Planck equation, providing stochastic information of multi-scale processes. In particular we show evidence of Markov properties for increment processes, which means that a three point closure for the complexity of the wave structures seems to be valid. Furthermore we estimate the parameters of the Fokker-Planck equation by parameter-free data analysis. The resulting Fokker-Planck equations are verified by numerical reconstruction. This work presents a new approach where the coherent structure of rogue waves seems to be integrated into the fundamental statistics of complex wave states.


Introduction
Rogue or freak waves are exceptionally large waves which at present are the subjects of intensive studies in a number of different fields [1][2][3][4][5][6][7][8][9]. In the ocean, waves are today called rogue waves if their amplitude exceeds twice (some also state 2.2 or 2.5 times) that of the significant wave height, which is defined as the average height of the one-third largest waves. Most remarkably, rogue waves are often considered to appear seemingly from nowhere and to disappear without a trace [3,10], and thus pose a substantial threat to ships and offshore structures. Installations extracting energy from or on the ocean, like offshore wind farms, wave energy conversion devices or ocean tidal and ocean current turbines seem especially of concern, since they have to be able to withstand rogue waves and to maintain the continuity of the energy supply to remain economically viable.
At present it is a matter of intensive and controversial discussion whether the physics and the mathematical description of rogue wave phenomena in the different disciplines have generic or common grounds. In particular, recent work based on the nonlinear Schrödinger equation (NLSE), describing the weakly nonlinear dynamics of narrow-banded wave states, suggests that nonlinear breather states [11] might play a crucial role and might form the deterministic backbone of nonlinear focusing of wave energy in a background wave state. Opposed to this hypothesis, which is fundamentally nonlinear in nature, alternative linear mechanisms are also under study, and the picture is far from complete. Also in statistical physics, an interest in rogue waves has been arising recently in the context of the study of extreme events. Studies on extreme events are often focused on predicting probability density functions (pdfs) of event characteristics, and links to stochastic theory [12] and nonlinear random wave theory [13][14][15][16] do exist.
In the present study we make an attempt to characterize ocean wave states by applying and extending a technique for identifying stochastic differential equations from measured data. One of the difficulties in studies on rogue waves within a natural sea state of the ocean is caused by their rare occurrence in field measurements. In consequence, data-based information concerning rogue waves is comparatively poor and incomplete, and definitely more analysis of real ocean wave measurements is highly desirable. Today there are only a limited number of publications presenting analysis of extreme waves recorded in the North Sea [23][24][25], the Japan Sea [26], and the Black Sea [27]. The number of laboratory experiments in wave tanks, to study wave structures and wave formation, and to test different hypotheses on wave turbulence and rogue wave formation, is also relatively small [1]. Due to the often highly idealized setup of wave experiments, in the present study we will apply our techniques to ocean wave data only.
However, field measurements also have to be considered with great care. There are several in situ instrumentation techniques, based on different measurement principles, that can track the surface elevation of the water, including laser and radar altimeters, buoys, and subsurface instruments such as pressure gauges and acoustic devices. The performance of several different measurement systems was for example compared in the WACSIS experiment at the Dutch coast and at the Tern platform in the North Sea [28]. Data errors and inaccuracies are particularly serious when looking for exceptionally large waves. Spikes in the data due to measurement error may be mistaken for rogue waves and in addition many recording systems (e.g. the traditional wave buoys) employ mechanical and electronic filters that need to be compensated for when an accurate tracking of the actual surface elevation is required. In some exceptional cases, measurements are, however, supported by independent and observable physical effects. This was the case for the Draupner incident, in which damage was reported to equipment on a temporary deck below the main deck of the platform. In the present study we will apply the technique to a wave data set, measured in the Sea of Japan, that we consider very trustworthy. The data set also contains some waves that can be classified as rogue waves, and so the analysis presented will, we hope, also shed some further light on the question of how rogue waves are embedded into an irregular natural background sea state.
The approach that we apply was originally introduced by Friedrich and Peinke [17] and since then has been successfully applied in a large variety of fields, such as turbulence [17,51], economics [18][19][20], biology [21,22,38], and many more; see [40]. It is based on identifying and exploiting Markov properties for the evolution of probability density functions, and starts out with measured data, to identify coefficients of fundamental stochastic differential equations, e.g. Fokker-Planck equations. The properties of the resulting equations in turn characterize the complex data. An important new aspect is that stochastic processes not only evolving in time but also in relative scale variable can be grasped by this procedure. With scale dependent processes, fractal and multi-fractal structures and even more generally joint n-point statistics can be reduced by the Markov properties to particular three-point statistics; cf [39,40].
The paper is structured as follows. First, some fundamental statistical properties and spectral properties of the data set under study are presented. Then the present approach is described. Subsequently it is shown how a Markovian process for the pdf of the surface elevation can be derived from the data, how Kramers-Moyal coefficients can be estimated, and how a Fokker-Planck equation governing conditional probability densities can be derived that contains a dependence on the time scale of the process, by which the analysis allows insight into the temporal multi-scale nature of irregular ocean waves, i.e. a natural seaway.

Measurement data and probability distribution functions
The wave measurements used in this study were taken in the Sea of Japan, at a location 3 km off the Yura fishery harbor, where the water depth is about 43 meters (figure 1). The data set consists of × 1.08 10 5 samples at a sampling frequency of 1 Hz [31]. The measuring device is an ultrasonic wave gauge manufactured by Kaijo Sonic Corporation. Mori et al [26,[29][30][31] have published several studies with data from the location.
First we will compare statistical properties of the experimental surface elevation data with Gaussian and Rayleigh distributions. Figure 2(a) shows the surface elevation distribution of the time series and a best fitting Gaussian distribution. Figure 2(b) gives the peak value distribution, i.e. the distribution of the local surface elevation maxima, and a best fitting Rayleigh distribution. Wave height is given in units of σ ∞ , which is defined according to: It can be observed that the Gaussian and the Rayleigh distributions characterize the overall statistics quite well. However, in a semi-logarithmic presentation for the large events, remarkable deviations are found. The Gaussian and Rayleigh distributions underpredict the probabilities of such large events by very large factors.
To gain further insight into whether the observed 'fat tails' can indeed be related to the occurrence of rogue waves, another data set was analyzed, for which it is expected that no rogue waves are included: we picked a wave data recording from the North Sea-based Fino research platform, which had been sampled with a frequency of 2 Hz [53,54]. The corresponding peak value distributions for the Yura and the Fino data are shown in figure 3. The Fino data follow the Rayleigh distribution nicely, whereas the Yura data clearly deviate from it. The Yura data can be fitted quite well by a generalized gamma distribution    [32][33][34]. Although the data follow the gamma distribution quite well, one may also see a transition at h = 4 from a Rayleigh distribution to an exponential tail ( − e h ) for large events. As we will focus in this paper on large events, or rogue waves, we will restrict our study to the Yura data in the following. All data will be normalized to zero mean and a standard deviation of 1 (σ = 1 m).
3. Markov properties, multi-scale analysis, and the technique adopted to derive Fokker-Planck equations Complex systems typically have hierarchical and non-trivial structures on different scales. Such systems are often more successfully described as multi-scale processes, rather than processes in time or space [16,[35][36][37]. In the following we will employ these ideas with the data described above. The starting point is the surface elevation h(t). To resolve the scale dependent complexity, increments, defined as , are taken as measures of the scale dependent structure. Here τ is the increment time interval which is used to analyze different time scales.
Complete information about the complexity of the height structure h(t) would be provided by the general joint pdfs 0 of increments at different scales τ [40]. Here and in the following we use the simplified notation of = In terms of conditional pdfs, the joint pdfs can be written as Because of the scales involved, the dimension of the joint pdf is very high. Thus in general it is very hard to estimate it from time series data directly. However, the description and computation can be highly simplified if the conditional pdfs obey the condition n n n n n n n n 1 1 This property (3) is nothing else than the Markov property for a process of τ h evolving in τ, and reduces the multi-conditioned pdf to a simple conditioned one. Only the nearby data set, τ 1 , is relevant to the probability of finding the system at a particular state h n in scale τ n . Whereas the multi-conditioned pdf involves the information on the height structure at many instances, the simple conditioned pdf In this sense the Markov property is a three-point closure of the general n-scale pdf: It is well known that a given process fulfills the stochastic Markov property only for an infinite Markov-Einstein (ME) length scale [41], which is the minimum length scale over which the data can be considered as a Markov process [43]. For Markov processes, the conditional probability density fulfills a master equation which can be put into the form of a Kramers-Moyal expansion [42,43]    Also the probability density τ τ ( ) p h , has to obey the same equation: Next we analyze the above mentioned data, and show how to ensure Markov properties, how to derive Fokker-Planck equations, and how to interpret the results.

Markov properties
We first have to show for the conditional pdfs that the Markov property is fulfilled. This is in general difficult for higher conditional pdfs, but it is possible for n = 3. The relation should hold for any value of τ 2 in the interval τ τ τ < < 1 2 3 . In figure 4 we compare doubly and singly conditional pdfs for the present experimental data to check this requirement of equation (11).
Obviously the doubly and singly conditional pdfs deviate substantially from each other, so the necessary condition for a Markov process is not fulfilled, which makes the direct application of the approach described above impossible. To understand the situation better, and to possibly   identify the origin of the non-Markovian characteristics, we first had a look at the spectral characteristics of our data set. Figure 5 shows the Fourier spectrum of our data, and it is clearly visible that there is a sharp peak indicating a strong high frequency component in the data. Obviously this peak is related to the background wave state, and indicates that a comparatively small frequency band plays an important role in the process. Such a narrow frequency band plausibly suggests that in the signal there is a long range correlation in time and space, which in turn may be in conflict with the Markov properties needed for the present analysis. Here one should note that correlation and non-Markovian memory are not the same. Correlations caused by the drift term D ( ) 1 will have no memory, whereas correlated noise in the corresponding Langevin equation (within the diffusion term) causes memory effects [43]. To cope with these non-Markovian effects, we first tried a decomposition of our wave data in Fourier modes. After filtering out some of the modes responsible for the high frequency peak, we already saw a better fulfillment of the Markov conditioned (3). Best results were obtained by the application of empirical mode decomposition (EMD). Details of EMD can be found in [44][45][46][47][48][49][50]. EMD aims at breaking down the original signal into a hierarchy of intrinsic mode functions (IMF) that separate the different frequency components of the signal through counting maxima and zero crossings. Figure 6 shows the results of the EMD. By looking at the Fourier spectra of the IMFs, which are shown in figure 7, it turns out that the effect of the dominant frequency is largely confined to the first IMF, C1, only. We thus hypothesized that this decomposition method could provide a promising new filtering opportunity to eliminate the non-Markovian contribution, and to consequently make the stochastic approach possible.
We therefore removed component C1 from the signal and reconstructed the data by adding C2 up to C11. The resulting time series and frequency spectra are shown in figures 8(a) and (b), respectively. First, it turns out that the filtering approach has successfully removed the high frequency component from the signal. Remarkably, it seems that we still have all the extreme wave events, or rogue waves, in the reconstructed data, as can be seen from figure 8(a). At first sight this seems surprising, since the filtering appears to have removed the dominant background wave component. However, the result might in fact also be a confirmation of the hypothesis that rogue waves are indeed caused by nonlinear breather type dynamics of the wave field: breather states have the remarkable property of phase jumps. For example, the Peregrine breather is characterized by a phase jump of π 2 across its spatial coordinate. In terms of physical characteristics of the wave field in the neighborhood of a breather state, one would thus indeed expect significant local changes of wavelength and wave frequency. This is indeed confirmed for the large waves given in the present data set: from the EMD results it turns out that the wave dynamics in the phases of rogue events is mainly captured by higher IMFs, here say C3 to C8, i.e. IMFs corresponding to slightly shifted underlying numbers of waves.
For our original approach, the EMD was successful in removing the dominant frequency. To see whether the memory has indeed been removed, we now test for Markovian properties of the new data set. In figure 9(a) 2 have been superposed. The proximity of corresponding contour lines yields highly satisfying evidence for Markovian properties for the chosen set of scales. Additionally, two cuts through the conditional probability densities are provided for fixed values of h 2 , further supporting this result. For completeness, figure 9(b) shows the same analysis for the first IMF. It becomes highly plausible, comparing this to the previous analysis based on the full data, that the first IMF is mainly responsible for the non-Markovian properties of the original overall process.
To quantify the Markov properties we also performed a Wilcoxon test for the scales τ 1 , τ 2 and τ 3 , for details of the method see [51]. The normalized expectation value < > t of the number

of inversions of the conditional velocity increments h h 3 2 and h h h
3 , 2 1 was calculated. If Markovian properties exist, < > t has a value of π ≈ 2 0.8. The expectation value < > t is a function of the surface elevation increment h 1 and the length scales τ 1 , τ 2 and τ 3 . In order to reduce the number of parameters, we chose h 1 to be zero and the differences τ τ − 1 . In figure 10, < > t is plotted as a function of Δτ for scale τ 3 . The Wilcoxon test indicates that Markov properties exist, as long as Δτ is chosen to be larger than at least 14 s. Note, that Δτ = 14 s is here the Markov-Einstein length.

Kramers-Moyal coefficients and Fokker-Planck equations
As we have shown that the EMD filtered wave data show Markov properties in scale τ, we can proceed to calculate, according to equation (7), the conditional moments, τ Δτ using the empirically determined conditional probability density functions. The Kramers-Moyal coefficients can be obtained in the limit Δτ → 0 using equation (6). Figure 11 shows the resulting drift and diffusion coefficient D ( ) 1 and D ( ) 2 functions, respectively. The data analysis has been conducted for a number of different time increments, and the results indicate the multiscale nature of the process.
As mentioned, the magnitude of the fourth Kramers-Moyal coefficient, D ( ) 4 , is important.
If D ( ) 4 can be taken as zero, the whole scale-dependent complexity can be described by a comparatively simple Fokker-Planck equation, which means that Gaussian distributed and delta correlated noise is present in the stochastic scale process. Figure 12 shows the results of the corresponding analysis, and indeed the reduction to a Fokker-Planck equation seems highly plausible.
With these results, Fokker-Planck equations for the relevant pdfs (equation 9) can be formulated. D ( ) 1 and D ( ) 2 completely determine the equations, and by themselves give a characterization of the multi-scaled process. Although the analysis leading to this conclusion was already quite plausible, to gain further confidence, we compared solutions of the resulting Fokker-Planck equations and the resulting pdfs with the corresponding distributions obtained directly from the data. The solution of the Fokker-Planck equation for small Δτ is given by [43] In order to obtain pdfs for larger steps, we use the Chapman-Kolmogorov equation Equation (12) is a direct consequence of the Markov condition, too. By iterating this procedure, we finally obtain pdfs on different scales. Figure 13 compares the solution of Fokker-Planck equations for the pdf τ τ ( ) p h , with the empirically estimated pdfs from the data set. The comparison proves strikingly that the Fokker-Planck equation accurately describes the evolution of τ τ ( ) p h , in τ over the different scales. One should note that the large events, or our rogue wave events, are well within this stochastic description.

Conclusions and outlook
We have shown that for ocean wave data measured at a single point, a stochastic description can be achieved. It turns out that by means of the empirical mode decomposition and on eliminating the influence of the first intrinsic mode function, the remaining signal obeys Markov properties in its scale evolution. Remarkably, but in agreement with theoretical knowledge on rogue waves derived from nonlinear analysis, the filtered data set still contains all large wave, or rogue wave, events. The Markov properties in the scale variable τ indicate that even large scale coherent or correlated structures in the data can be expressed by three-point statistics. This is a drastic reduction in complexity. Here one has to keep in mind that the Markov properties were found in the scale variable and not in time. Thus even structures in the signal that correspond to special combinations of several height increments on different scales are grasped by this threepoint closure, or expressing this stochastically, the joint probability density function τ τ τ ( ) A further consequence of the Markov properties is that the scale evolution of the height increments statistics can be expressed by a Fokker-Planck equation, which finally allows one to set the complexity of wave structures in the context of non-equilibrium thermodynamics [52]. For the Fokker-Planck equation, drift and diffusion coefficients could be estimated from the data. A verification of this description based on this shows an excellent agreement between the empirical statistics and the statistics obtained from the Fokker-Planck equation. Most interestingly, the extreme wave structures also seem to be grasped quite well by this approach. This might be a very far reaching finding for a number of reasons. First, the result suggests that in the end the rogue wave statistics can also be captured by relatively simple Fokker-Planck equations, although still with a view to an underlying multi-scale approach being involved. Second, in extreme events there is an ongoing discussion as regards whether the very extreme outliers in the data could in any way still be integrated into the fundamental statistics of the complex wave process state. In our present study it seems tempting to conjecture that we might indeed have succeeded in finding such a general approach unifying the rogue and non-rogue waves. This would also open the possibility of a stochastic forecasting of extreme wave events like what was achieved for financial data [19]. However, although the present findings seem promising, a number of questions definitely require further study. First of all, the present analysis is based on a single data set only, and more data-based evidence is without doubt necessary in order to gain more and further confidence in the present approach. Second, from a theoretical perspective and with a view to the discussion of whether breather waves or modulation instability and nonlinear focusing could be the backbone structures of rogue waves, more and additional understanding is definitely needed. It seems plausible that the EMD-based filtering approach for the data employed in the present study focuses the data analysis towards processes that manifest some local non-trivial phase dynamics and local phase changes, as is also manifested by breather solutions, modulational dynamics and nonlinear focusing in general. So the present filtering approach might indeed be regarded as a nonlinear transformation of the original data that in the end allows a unified statistical description. Further understanding on why and how this surprising result has been achieved, and whether the same result will also apply for other wave states, will have to be gathered.
In consequence, further work applying the present single-point analysis to more wave data, including rogue waves, has to be conducted. Moreover, multi-point data will also have to be used to capture possibly underlying effects of multi-directional wave states and their role in the present context. From a conceptual perspective, in particular the role of EMD-based filtering will have to be clarified to better understand the complex interplay between the linear and nonlinear wave dynamics of the sea state with rogue and non-rogue waves, and the stochastic framework capturing it.