Extreme Value Analysis of Wave Heights

This paper discusses the most common problems associated with the determination of design wave heights. It analyzes two common methods used in fitting wave data and shows some of the stability or inconsistency problems associated with commonly used distributions. Some methods to obtain confidence intervals, detecting of outliers and treatment of missing data are given.


Introduction
As in many other Helds of engineering, the design of ocean or marine structures is governed by extreme values of wave heights.Several methods have been given in the past for the determination of design values.However, no method is widely accepted by the engineering community.
Traditionally, the analysis of yearly maxima has been considered as a good method for this purpose.However, recently, peak value methods arose as a promising alternative.
The aim of this paper is to compare these ^*o methods and illustrate some of the problems related to their use.

Two Standard Methods in the Determination of Wave Height Design Values
In this section we analyze the following two well known procedures for obtaining design wave heights: Wx&peak value method and X\\^ yearly maxima method.
The first one employs the peak wave heights of individual storms and thus composes a set of extreme wave data.The second one uses the yearly maxima.
Several authors have criticized the second method in that it discards large wave heights, when they occur in years with large storms, but includes relatively small wave heights which are maxima of calm years.

The Peak Value Method
This method consists of the following steps: 1, Fit the peak values of individual storms to a parametric family of distributions fo(x; Ao, So, /3o), (1) where Ao, 5o and )3u are the parameters.In some cases these three parameters can be reduced to' two or even to a single one.The fitting of the above family can be done either by using all data or only tail data (Peak over threshold (POT) method).It is worthwhile mentioning that this distribution corresponds to the wave height of a storm, that is, we assume that the cdf of the maximum wave height of storms is Eq.(1).
2. Use the following cdf for the maximum wave height in a period of duration D years: where k is the mean number of storms per year, or determine the wave height, XT, associated with a return period T, that is, solve, for jc, the equation Note that the cdf in Eq. ( 2) implies the assumption of independence of storms.

The Yearly Maxima Method
This method consists of the following steps: 1. Fit the yearly maxima to a parametric family of distributions where Ai, Si and 0i are the new parameters.This is equivalent to assuming that the yearly maxima follow a distribution which belongs to Eq. ( 4).
Use the following formula to extrapolate to the maximum of a period of D years: Fi(:r; A,,6,, 00", (5) or determine the wave height x associated with a return period 7", i.e., solve the equation for x: FI(J:;A,, 5,.Pi) = l-|;.
3. Some Problems Related to the Data Analysis In the analysis of data one has to deal with some problems.Among them we mention the following: • Selection of the families Fo(x; An, Su, Po) or -^-

The minimal Weibull family
The Gumbel, maximal Weibull and maximal Jenkinson's families are justified from a theoretical point of view, because they are the limit distributions for maxima (see Galambos [5] or Castillo [2]).It is interesting to note that the Jenkinson's family includes the other two, as particular cases, and the maximal Frechet family (for k>0).The Frechet distribution is not justified in this case because wave heights are physically limited, no matter we deal with shallow or deep waters (see Castillo and Sarabia [3] and [4]).The minimal Weibull distribution, though widely used, is not theoretically justified in the case of maxima.Its only justification is that its range can be made to be consistent with the positive character of wave heights.In addition, we remind the reader that it belongs to the maximal domain of attraction of the Gumbel type, i.e., it is asymtoticaiiy equivalent to a Gumbel distribution of the type Eq.(7).
However, due to the fact that this distribution is widely used in the analysis of wave heights, it seems convenient to make here some comments.
Initially we can say that this distribution has the following advantages: • For A =^0, its range is (0, <»), that is, it does not include negative values of the random variable.
• Assuming that the location parameter, due to physical reasons, is fixed to zero, it depends only on two parameters.This makes the estimation process much simpler.
• Its associated domain of attraction is Gumbel type.Thus, it could be used if this were the actual case.
Its main drawbacks are the following: • Its range is unbounded on the right.This contradicts the physical reality.
• It does not cover the Weibull domain of attraction that could be the real situation.
• It is an asymptotical minimum law.
It is not stable with respect to maximum operations.Thus, if the minimal Weibull law is satisfied for yearly maxima the maxima of periods of duration different from one year cannot satisfy this law.This problem can be solved by adding an extra parameter to this family, which leads to the extended minimal Weibull family.
Consequently, the minimal Weibull family could be used if and only if we were sure that the domain of attraction of wave heights is of a Gumbel type.
In order to determine the domain of attraction of a given distribution several methods are available, such as the Pickands' or the curvature methods (see Castillo [2] chapter 6 and Castillo, Galambos and Sarabia [3]).

Estimation Methods
Several methods have been used to estimate the parameters of the families Eqs. ( 7) to (10).The most important are: • The maximum likelihood method

• The method of moments
The least squares method • The probability paper method • The Goda's method • The percentile method 3j;.lThe Maximum Likelihood Method This method is based on maximizing the likelihood of data with respect to the parameters.The central idea consists of assuming that the sample comes from a population with parent distribution belonging to a parametric family and choosing the parameter values that maximize the probability of ocurrence of the sample data.This is the best known method in statistics and it is recognized as the most convenient, due to its statistical properties.It leads to the best estimators, which, in addition, are a.symptoticalty normal.This allows asymptotic confidence intervals for the parameters to be easily obtained.Using the 5-method, to be described later, the confidence interval of any regular function of the parameters can be obtained, too.In particular, confidence intervals of percentiles can be obtained in this manner.
In order to estimate an extreme value distribution with the purpose of extrapolation beyond the data range, only high order statistics must be used and the rest must be discarded.Thus, we recommend the method indicated by Castillo [2], in chapter 5.
In the case of the minimal and maximal Weibull families, the estimation process can lead to some problems, either because the likelihood function becomes unbounded ({3^ 1) or because some nonregularities, lor some values of the shape parameter (1 <)3 <2).However, it can be applied to values of the shape parameter larger than or equal to 2 without any problem.Thus, once the estimates are available, it is necessary to check that their values are consistent with the initial hypothesis.Here we give the following recommendations: • If the shape parameter takes a negative value, this means that the data indicate a Frechet type domain of attraction.This suggests the presence of at least one outlier that gives an erroneous curvature in the right tail.
If we get a value of ^^ 1, we can think on the presence of outliers.This value of the shape parameter indicates that the probability density function is increasing in the tail, which contradicts the physical reality.
• If we get 1 < ^ < 2 then the law is far from the Gumbei law (note that Gumbel corresponds to /3 = «).
• If the value of the A parameter is less than the maximum of the sample this indicates that there is an outlier.
3JJ The Method of Moments This method consists of equating the moments of the sample to the moments of the theoretical distribution.We use as many moments as there are parameters to be estimated and we get the same number of equations from which the parameters can be obtained.The asymptotic properties of the moment estimates are good but worse than those associated with the maximum likelihood estimates.
This method can also be applied to tail estimation, using the moments of the truncated distribution.

3.2J
The Least Squares Method This method consists of minimizing the sum of squares of the differences between the theoretical and the empirical values.There are many versions of this method.In some cases the random variable scale is used to measure the errors and in other cases the probability or the return period scales are used (see chapter 4 of Castillo [2]).
The main advantage of these methods is that they give an explicit solution and do not depend on convergence of any algorithm, as is the case with the maximum likelihood method.
Nevertheless, these methods are sensitive to the plotting position formulas used in the estimation method.

3.2.4
The Probability Paper Method By probability paper method we understand a visual method, in which the data is drawn on probability paper and a straight line is visually fitted to data.
The main drawback of this method is that it depends on the plotting position formula used in the graphic representation and the subjective criteria for selecting the optimal fit.[4] fits a minimal Weibull distribution, truncated at the threshold value JCO, to the right tail of data.By right tail are meant the wave heights above a second threshold value xi > >xa-3.2.6The Percentile Method One way of obtaining quick estimates of the parameters of a dis-tribution is by means of the percentile method.This method consists of equating as many percentiles in the sample and the theoretical distribution as the number of parameters to be estimated.

The Coda Method Goda
As an illustrative example we use this method for the estimation of the parameters of a three parameter maximal Weibull family.
The cdf of the maximal Weibull distribution is: GOi:) = exp H'rn (11)   Thus, the percentile or order p satisfies the equation

Xp,-Xp, (-l0g/Jj)"*-(-|0gp2)
. ( 16) which depends only on the parameter ^ and thus, it can be easily solved by an iterative method, as the bisection method for example, with a personal computer.Once /3 is known, the values of A and fi can be obtained from any two of the equations in Eq. ( 14), For example: )'"'-(-log/i.)"''' A =^"-l-5(-logp,)"^ (17) For the estimates to be consistent with the model we must have A >max{xi,X2,,.. ,x") (18) where {xi.Jfi,..., Jf,) is the sample.If the percentiles are arbitrarily chosen, this inconsistency can easily appear.Thus, it is good practice to choose as one of the percentiles the maximum of the sample x^.
In addition, if we are dealing with a tail estimation we must choose the adequate percentiles, that is, percentiles in it.
In order to improve the quality of the estimates we can use three groups of percentiles instead of three percentiles, that is, replace the system Eq.( 14) by the system 3.2.7 Plotting Position Formulas There is much controversy about the plotting position formulas to be used for representing data on probability paper and the posterior estimation by least squares methods.
The resulting estimates are sensitive to the plotting position formulas being used.This confirms the fact that the least squares method is not optimal.Note that maximum likelihood or moment methods do not depend on plotting positions.
The discussion of the appropriateness of various formulas is intended to avoid or reduce some of the errors involved (in this case authors recommend using formulas leading to unbiased estimators).
However, we mention here that all plotting posi* tion formulas are asymptotically equivalent.

S-Method
The ^-method (Bishop, Fienberg, and Holland [1]) allows us to obtain confidence intervals of certain regular functions of the parameters, as functions of the parameter estimates, and its variance-covariance matrix.Let where m,-, (/=1,2,3) are the numbers of percentiles included in each group.With this, equation Eq. ( 16) becomes Eq.(20).( The maximal Weibull distribution satisfies the necessary regularity conditions for the asymptotic normality if ^ 3= 2. Thus, if we have a sufficiently large sample coming from a population with maximal Weibul!parent, we can write: where where / = (fl.y) is the information matrix associated with the maximal Weibull family, that is.
where X is the variance-covariance matrix of (Ai, A2 Aj).

Estimation of Percentiles of the Maxima) Weibull Distribution
As a simple example of the 5-method we give the confidence interval of one percentile of the three parameter maximal Weibull distribution.We assume that the parameter p is larger than 2.

Point Estimate
The percentile Xj, of the maximal Weibull distribution is: Thus, according to the invariance principle, the maximum likelihood estimator of that percentile. is: 3.4.2Maximum Likelihood Estimators: Asymptotic Theory We assume here that the sample consists of those obser\'ed values above the threshold value / (type II censoring), that is, the probability density function is given by and / and F are the pdf and cdf of the maximal Weibull family.
from which the confidence interval for the percentile Xp at level a becomes: 3^ Outlier Detection In this section we give a method to detect the presence of outliers in the sample data.The method is based on the fact that if we make the following change of variable: where F{x) is the cdf of X, the resulting random variable, Y, is uniform t/(0,l)-In addition we know that the maximum of a random sample of size n coming from a standard uniform parent has cdf FY^(y)=y" = ?ioh[Y^^y] (42) We shall say that the sample maximum is one outlier if the probability of being exceeded is very small.Thus, the value >■ (! can be considered as critical for the maximum value of the sample if Prob[y™ >yo] = fv™(y.,)= 1 -y"u = a (43)

Treatment of Incomplete Series
If we know about the existence of r storms in a given series, but we ignore the peak intensities we can perform an estimate based on the known peaks and then make a correction for the unknown peaks.This means estimating the cdf with the known peaks and raise to the power (n+r)/n, where n and r are the number of known and unknown peaks, respectively.

Critical Anal^^sis
In this section we analyze the previous methods and discuss some of their inconsistencies.

Inconsistencies due to the Lack of Stability
With Respect to Maximum Operations When several design methods are recognized by the engineering community a certain consistency in the respective resuhs should be expected.We shall see that this is not the case for some of the previous methods, Let us assume that we try to fit the minima) WeibuU family F{x; A", So, ^o) = 1 -exp { (^^^ ) *} , (46) where Ac, do and fin are the parameters.Then, Eq.
(2) transforms to Foix; An, 5", /3(i) -F(x; A". 5".^u)" = and the wave height associated with a return period T becomes Eq. ( 3): with a very small (0.01, 0.05, etc.).Then, we get yo = (!-«)"" This critical value refers to the random variable y.Thus, we need to obtain X by means of the inverse of Eq. (41).As one example, for the maximal WeibuU distribution we get xo=-A-5t-log(l-a)""]"^ In relation to the second inconsistency, the model should be stable with respect to truncations.With the purpose of clarifying this idea, let us assume that we choose a family of candidate distributions H(x;y), where the second argument y is one parameter, which, without loss of generality, can bs assumed to he the threshold value.Then, if the wave height exceeding z has as cdf the function mx,z), then, the wave height exceeding y should have a cdf given by where the right hand term arises from the consistency condition that expresses that the family H(x;z) remains valid for any value of the threshold parameter, which in this case isy.Equation ( 60) is a functional equation.Its general solution can easily be obtained by making z = zo, that is.For H{x;y) to be a cdf, then G{x) must also be a cdf.
Equation (61) proves that any consistent family H{x;y) must come from another family G(x) by means of a truncation procedure.
The minimal Weibull family, used by Goda, does not satisfy this condition.Thus, it is inconsistent.
With the purpose of having a consistent family in the two previously given senses, one solution would consist of assuming G{x) to be extended minimal Weibull with null location parameter.This would imply that the sample data above the threshold value J:I should be fitted to the family where j8, 5 and rj are the parameters to be estimated.In the last case we are assuming that the cdf of the maximum wave height in an indeterminate period, to be estimated, is minimal Weibull.
Note that fitting the right tail means fitting a truncated model with basic distribution given by Eqs.(65), (66) or (67).
All these models are consistent in the previously mentioned sense.

4J Inconsistencies Associated With the Use of H. and T,
It is very common in the Ocean Engineering field to work with the significant wave height, H,, and period, Tt, as the basic variables for extreme value analysis of waves.However this is not correct because Ti is the mean zero up-crossing period and H, is defined as the mean of the 1/3 largest waves.These two random variables are convenient to justify normality assumptions in wave spectra, but cannot be accepted if an extreme value analysis of wave height, H, is to be performed.In fact, distributions in different domain of attraction types can lead to the same distribution for H, and/or T^, thus, obscuring the tail properties of single waves.

Volume 99 ,
Number 4, July-August 1994 Journal of Research of the National Institute of Standards and Technology

Volume 99 ,
Number 4. July-August 1994 Journal of Research of the National Institute of Standards and Technology

Volume 99 ,
Number 4, July-August 1994 Journal of Research of the National Institute of Standards and Technology
rnx,y,o,t})l-[F(y;6,^)]" ' ^ ■' where F(A:; 5. P) = l-exp [-(!)'],(64) Nevertheless, we remind the reader that this solution can be satisfactory only in the case of a parent distribution in the domain of attraction for maxima of a Gumbel type.Consequently, as a summary, we recommend to fit the sample data above the threshold to one of the following three families: If the domain of attraction is Weibull type, fit the right tail to the maximal Weibull family f<.(j:Ao,5u,^a) = exp(-(^^)*'} (65) if the maximal domain of attraction is Gumbel type, fit the right tail to the maximal Gumbel family Fo(x; Ao, 5(K) = exp -exp I ° j (66) or to the extended minimal Weibull family Fo(x;A,5,^)= {l-exp[-(|)^]}'; jf&O