Bayesian Hierarchical Modeling of Sea Level Extremes

Reliable estimates of occurrence probabilities of sea level extremes are required in coastal planning (e.g. design floods) and to mitigate risks related to flooding. Probabilities of specific extreme events have been traditionally estimated from the observed extremes independently at each tide gauge location. However, this approach has shortcomings. Firstly, sea level observations often cover a relatively short historical time period and thus contain only a small number of extreme cases (e.g. annual maxima). This causes substantial uncertainties when estimating the distribution parameters. Secondly, exact information on sea level extremes between the tide gauge locations and incorporation of dependencies of adjacent stations is often lacking in the analysis. A partial remedy to these issues is to exploit spatial dependencies exhibited by the sea level extremes. These dependencies emerge from the fact that sea level variations are often driven by the same physical and dynamical factors at the neighboring stations. Bayesian hierarchical modeling offers a way to model these dependencies. The model structure allows to share information on sea level extremes between the neighboring stations and also provides a natural way to represent modeling uncertainties. In this study, we use Bayesian hierarchical modeling to estimate return levels of annual sea level maximum in the Finnish coastal region, located along the north-east part of the Baltic Sea. As annual maxima are studied, we use the generalized extreme value (GEV) distribution as the basis of our model. To tailor the model specifically for the target region, spatial dependencies are modeled using physical covariates which reflect the distinct geometry of the Baltic Sea. We illustrate the added value of the hierarchical model in comparison to the traditional one using the available long-term tide gauge time series in Finland. Careful analysis of the sources of uncertainties is necessary when extrapolating the return level estimates into the future. This work is a part of project PREDICT (Predicting extreme weather and sea level for nuclear power plant safety) that supports nuclear power plant safety in Finland.


Photo credit: Jarmo Vehkakoski
Reliable estimation of occurrence frequency of high consequence sea level extremes is very important to support coastal planning and safety in all coastal regions of the world. The Baltic Sea is a semi-enclosed shallow sea connected to the Atlantic Ocean via narrow Danish straits. Finnish coastal region -the focus region of this study -is located in the northeastern Baltic Sea (Figure 1). Three factors affect the sea surface level on the Finnish coast: 1. short-term sea level variations (storm surges, internal oscillations in the Baltic Sea, tides, and meteotsunamis), 2. long-term mean sea level change (global mean sea level, the postglacial land uplift and the Baltic Sea water balance), and 3. wind-induced waves.
In this study, the focus is on short-lasting sea level variations. The main aims of this study are 1. to provide updated return level estimates for annual sea level maxima along the Finnish coastal region with Bayesian hierarchical modeling and 2. to better quantify uncertainties related to the return level estimates.
According to recent studies, it is foreseen that the flooding risks will increase during the 21st century at all places on the Finnish coast except in the Bothnian Bay .

Background
FMI has studied extreme weather, climate and sea level events potentially posing risks to Nuclear Power Plants since 2007 (Jylhä et al., 2018), and presently in the PREDICT project within the SAFIR2022 program.

GEV distribution
We follow the standard approach to use generalized extreme value (GEV) distribution (Coles, 2001) to model maximum annual sea-level Y as In the equation above, µ is the real valued location parameter, σ>0 is the scale parameter and ξ denotes the shape parameter. The tail behavior of GEV distribution is strongly controlled by the shape parameter. If ξ<0, y has a bounded upper tail, while in other cases the tail is unbounded and has either exponential (ξ=0) or polynomial (ξ>0) decay.
The quantile distribution for the annual maximum sea level is obtained by inverting the GEV equation: This equation gives the return level y associated with occurrence probability p or conversely, the return period 1/p. There are several ways to estimate the associated distribution parameters θ=(µ,σ,ξ). Here, we take Bayesian approach to make inferences on the posterior predictive distribution f(y|θ) for maximum annual sea-level. As we are particularly interested in reliably quantifying uncertainty in the predicted return levels, Bayesian approach provides a natural way to do this.
Individual GEV model fits to the gauge locations were used as the baseline. In addition to modeling Y separately for each tide gauge Y ∼ GEV(µ ,σ ,ξ ), we experimented with four different hierarchical model formulations, where the station specific parameter values are tied together with hyper distributions. The models used are: 1. The individual values for all three GEV parameters are coming from a joint Gaussian hyper distribution, whose parameters are estimated (common).
2. Individual location and scale parameters µ and σ depend linearly on the distance to the Bothnian Bay (linear).
3. Individual location and scale parameters are modeled with B-splines using distance as the covariate (spline).
4. Individual location and scale are modeled using Gaussian processes with squared exponential covariance function depending on the distance (GP).
The distribution parameters were estimated with Stan program and the rstan and splines R packages.

PARAMETER ESTIMATES
Stan program (Stan Development Team, 2020) implements Markov chain Monte Carlo (MCMC) simulation from the multi dimensional posterior distribution of the model parameter uncertainty. For the hierarchical models, it estimates both the individual, station-specific GEV parameters µ,σ,ξ, as well as the corresponding hyper-parameters, for example the between station mean of µ and the between station variability of µ.
For the Gaussian process model, the hyper parameters include the between stations correlation distance, and for the spline model, the coefficients for the spline basis functions. We are mostly interested in the individual station specific parameters as well as using the estimated model to predict future return levels. All the four selected hierarchical models provided quite similar results. This is illustrated in Figure 2, which shows the model estimates of parameter values for each tide gauge. Overall, it can be said that compared to the individual estimates without hierarchy, all the hierarchical models were able to borrow information from neighbouring stations and thus, provided more informative posterior distributions. Especially the shape parameter ξ seems to be quite constant over the study area, and it makes sense to combine this information. Interestingly, ξ seems to estimate nicely around -0.2(±0.08), which suggests a Weibull type tail behaviour for the water level maxima. This means that we could use an upper limit of µ − σ / ξ as an estimate (with attached uncertainty) of the maximum attainable sea-level height at any given location. An example distribution for the upper limit is given for the Kemi tide gauge in Figure 3, as this gauge has one of the highest observed maximum annual sea-level heights in Finland (see Figure 1).
Moreover, both the Gaussian process and the spline model can predict model parameters for between station locations, too, as they depend on the distance from Bothnia Bay. Fits with respect to the distance to Kemi are illustrated in Figures 4a-b and 5a-b. As an additional note, Figure 5a shows that the spline model does not capture the mean parameter as well as the Gaussian process model. This is due to the fact that the number of knots was fixed to a small number (5) in order to avoid overfitting. Penalised B-splines could probably better capture the mean parameter, but this approach was not tested here due to the good performance of the Gaussian process model.

PREDICTIVE INFERENCE
A big advantage of Bayesian statistical analysis based on Markov chain Monte Carlo (MCMC) simulation is the ease with which we can estimate uncertainties in model based predictions. Monte Carlo predictions by sampling parameter values from the posterior MCMC chain readily produce a sample from the corresponding posterior predictive distribution. This allows us to predict return levels within and even beyond the observed occurrences, provided the chosen GEV model is a realistic description of the maximum annual sea-levels. In the present case it does seem to provide a robust analysis of the available observations, so even the estimated return levels with very small probabilities should be well based, at least in the current climate, and they can be used as a part of risk analysis in planning coastal buildings etc. Figure 6 displays predictive analyses done with station specific separate models as well as with hierarchical models, which again shows the advantage from pooling the information.
More specific examples are given in Figures 7 and 8, which shown the 50-year and 1000-year return level estimates simultaneously for all used models. The most notable feature in these figures is the larger predictive uncertainty for separate models for most tide gauges.
OBSERVATIONS 4-hourly/hourly observations were collected from 12 tide gauges located along the Finnish coast ( Figure 1). The length of the available time series varies between 127 (Hanko) and 86 (Rauma) years and thus, this data provide a good basis for extreme value analysis.
The observations have been converted to the height system N2000 from fixed tide-gauge-specific reference systems (Saaranen et al., 2009). Long-term linear sea level trend was removed from each time series before calculating the annual maximum sea level heights, which were then used in statistical analysis based on three parameter generalized extreme value distribution (GEV).

CONCLUSIONS AND NEXT STEPS
We conclude with the following items: Pooling the information across stations by using hierarchical models reduces the uncertainties in individual stationspecific parameters. Shape ξ and the tail behaviour are more accurately estimated. Different models affect the return level predictions which can be seen in the predictive distribution plots.
The Gaussian process or spline hierarchy allows estimating water levels at non-gauged locations, too.
Next steps in our project involves: Study the temporal stationarity of the parameters.
Study the effect of North Atlantic Oscillation (NAO).
Take into account the changing climate.
Compare maximum sea-level heights to physics based analysis of the Baltic Sea.