Joint modelling of the body and tail of bivariate data



Motivation
When dealing with environmental phenomena such as high temperatures, wind speeds or air pollution, or with financial applications such as insurance losses, interest often lies in modelling the extreme observations, which are typically scarce. For such cases, a model with focus on the tail of the distribution is required, as common statistical models that may be used to fit the entire data set lead to poor estimates of the extremes. To overcome this issue, models based on extreme value theory (EVT) can be applied; these aim to quantify the behaviour of a process at extremely large (or small) values of a series. Typically, the generalised extreme value (GEV) distribution is fitted to block maxima, often annual maxima, or the generalised Pareto distribution (GPD) is fitted to data exceeding a high threshold. The former can be seen as a wasteful approach if more data on extremes are available, while the latter usually requires a subjective choice of threshold, which inevitably leads to uncertainty, with different choices leading to different results; see Coles (2001).
However, in some cases, interest lies not only in modelling the extreme observations accurately but also in fitting the non-extremes well, meaning a flexible model over the whole support of the distribution is required. For instance, the concentration of pollutants in the air may be so high that harmful levels actually lie in the body of the data set. Thus, from a public health perspective, we care not only about the probability of exceeding extreme, and potentially more dangerous, pollutant levels but also about the probability of exceeding harmful yet locally moderate levels. Fitting a model to both the bulk (i.e., the non-extreme observations) and tail (i.e., the extreme observations) of a data set has been dealt with in the univariate framework, but little work has been done on extending this to a multivariate setting. In this work, we outline an approach that offers dependence models for the bulk and tail, while ensuring a smooth transition between the two.

Background
In the univariate setting, several models have been proposed to join one distribution for the bulk to a GPD for the tail. Scarrott and MacDonald (2012) review several of these approaches, hereafter referred to as extreme value mixture models, or EVMMs. These models aim to account for the uncertainty in the choice of threshold by implicitly or explicitly estimating it. With EVMMs, care is needed so that the bulk and tail are not excessively influenced by each other, though they cannot be fully disjoint since they share information. Parametric EVMMs entail fitting a specified distribution to the bulk and a GPD to the tail, while semi-parametric models fit a GPD to the tail with a more flexible model in the bulk. Behrens et al. (2004) propose a parametric model, which exhibits discontinuity at the threshold; Carreau and Bengio (2009) avoid this by forcing continuity up to the first derivative of the density function. On the other hand, Frigessi et al. (2002) fit two distributions to the whole data set, giving more weight to the bulk at low ranges in the support and to the GPD in the upper tail by means of a dynamic weighting function p(x; θ) ∈ (0, 1]. The density of their model is defined as

h(x; θ, β, α) = {[1 − p(x; θ)] f(x; β) + p(x; θ) g(x; α)} / K(θ, β, α),

where g(x; α) is the density of the GPD with vector of parameters α, f(x; β) is a density with a lighter tail and vector of parameters β, K(θ, β, α) is a normalising constant and p(x; θ) is increasing in x for all θ. Because p(x; θ) depends on x, it favours the GPD in the upper tail whilst the lower tail is controlled by f(x; β). However, careful choice of the weighting function is needed since some functions, such as the unit step function, may lead to a discontinuity in the transition between the two distributions; see Frigessi et al. (2002) for details. More recently, methods introduced by Naveau et al. (2016) and Stein (2021) aim to model the lower and upper tails of the data with GPDs, while ensuring a smooth transition between the regions. The former achieve this by constructing a model relying on compositions of functions, where one is a cumulative distribution function (CDF) of a GPD, and the other is a CDF that satisfies certain constraints to ensure both tails follow a generalised Pareto-type distribution. The model proposed by Stein (2021) also assumes a composition of functions, where one is a monotone-increasing function that controls both the lower and upper tails, and the other is a Student t CDF. Finally, Krock et al. (2022) extend the latter approach to incorporate non-stationarity. The methods proposed by Frigessi et al. (2002), Naveau et al. (2016), Stein (2021) and Krock et al. (2022) avoid the choice of threshold.
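The dynamic mixture idea can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the component choices (a Weibull bulk f, a GPD tail g, and a Cauchy CDF as the increasing weight p, with location and scale collected in θ) are all assumptions made here purely for the example.

```python
import numpy as np
from scipy import stats, integrate

# Illustrative components (assumptions, not those of Frigessi et al. (2002)):
# a light-tailed Weibull bulk f, a GPD tail g, and a weight p(x; theta)
# taken as a Cauchy CDF, which is increasing in x as required.
bulk = stats.weibull_min(c=1.5, scale=1.0)
tail = stats.genpareto(c=0.2, scale=1.0)
theta = (2.0, 0.5)  # location and scale of the weight function

def h_unnorm(x):
    """Unnormalised dynamic mixture: [1 - p(x)] f(x) + p(x) g(x)."""
    p = stats.cauchy.cdf(x, loc=theta[0], scale=theta[1])
    return (1 - p) * bulk.pdf(x) + p * tail.pdf(x)

# Normalising constant K(theta, beta, alpha) by numerical integration
K, _ = integrate.quad(h_unnorm, 0, np.inf)

def h(x):
    """Normalised density of the dynamic mixture model."""
    return h_unnorm(x) / K
```

For small x the Cauchy weight is near 0, so h is driven by the Weibull bulk; for large x the weight is near 1 and the GPD controls the tail, with a smooth transition in between.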
In a semi-parametric framework, Cabras and Castellanos (2010) approximate the bulk distribution by an equi-spaced binning of the data followed by a Poisson log-link generalised linear model fit to the counts, with a polynomial smoother for the mean parameter. Nascimento et al. (2011) define the bulk distribution as a weighted mixture of gamma densities, extending the method proposed by Behrens et al. (2004), while Huang et al. (2019) estimate the log-density by first transforming the data and then applying a cubic spline to the histogram. Tencaliec et al. (2020) propose a method based on the extension of the GPD proposed by Naveau et al. (2016). Finally, Tancredi et al. (2006) and MacDonald et al. (2011) propose non-parametric fits to the data. In the former, the bulk model is fitted by a mixture of uniform distributions, whereas in the latter a kernel density estimator is used instead.
When we move to the multivariate setting, there is an extra difficulty; not only is it important to model the margins of the data correctly, but the dependence between the variables is also of interest, since the behaviour of one variable can influence the behaviour and the value of another. It is common practice to measure this relationship using correlation coefficients, such as Pearson's linear correlation or Kendall's concordance (Kendall, 1938). However, these only give information about the association between variables as a whole. An alternative is to use copulas, which fully capture the dependence between two or more variables. According to Sklar's Theorem (Sklar, 1959), a multivariate distribution F of the random vector (X_1, ..., X_d) can be written as the composition of a copula, C, and the marginal distributions of each variable:

F(x_1, ..., x_d) = C(F_{X_1}(x_1), ..., F_{X_d}(x_d)).

If the variables are continuous, then the copula C is unique. One advantage of copulas is that they are able to describe the dependence structure of two or more variables in a way that does not depend on the margins. Where it exists, the copula density c(F_{X_1}(x_1), ..., F_{X_d}(x_d)) can be obtained by taking the d-th order mixed partial derivative of C with respect to the variables F_{X_1}(x_1), ..., F_{X_d}(x_d).
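Sklar's decomposition can be seen directly in simulation. The following sketch (an illustrative choice of a Gaussian copula, not tied to any model in the text) transforms bivariate normal draws by their marginal CDFs: the margins become uniform while the dependence, as measured by Kendall's τ, is unchanged.

```python
import numpy as np
from scipy import stats

# Draw from a bivariate Gaussian, then apply the known marginal CDFs:
# by Sklar's theorem the transformed pairs follow the Gaussian copula,
# with Uniform(0, 1) margins but the dependence structure intact.
rng = np.random.default_rng(1)
rho = 0.6
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5000)
u, v = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])

# Margin-free dependence: Kendall's tau depends only on the copula;
# for the Gaussian copula, tau = (2 / pi) * arcsin(rho).
tau, _ = stats.kendalltau(u, v)
```

Because τ is invariant to monotone transformations of the margins, the same value (up to sampling error) is obtained whether it is computed from z or from (u, v).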
There is a large literature on dependence modelling for extremes, which usually involves defining a multivariate threshold above which an asymptotically-motivated copula is assumed to hold. However, models able to capture the behaviour of the extremes as well as the body of the data are scarce in the literature. Both defining and performing inference on such models can be challenging compared to univariate models. Aulbach et al. (2012a,b) suggest an extension to the multivariate setting of the model proposed by Behrens et al. (2004). They define a novel copula model by joining two d-dimensional (d ≥ 2) copulas, one for the upper tail and the other for the body, in a manner that produces a new copula. Specifically, the authors assume two independent random vectors, each of which follows an arbitrary copula. Then, by an appropriate choice of threshold vector t = (t_1, ..., t_d), they construct a random vector Q whose i-th element is taken from one of the two vectors according to its position relative to t_i. The authors prove that Q also follows a copula on [−1, 0]^d, which coincides with C_1 on the region (t_1, 0] × ... × (t_d, 0] and with C_2 on the complementary lower region. An exact representation of the method is presented in Aulbach et al. (2012b). However, the model not only requires a choice of cut-off values t_i, i = 1, ..., d, to define the regions in which to fit each copula, but the transition between the two copulas may also not be smooth. Figure 1 displays an example of a data set simulated according to equation (1); the discontinuity at the threshold is evident. Moreover, this method does not offer a convenient formulation of the likelihood, which results in difficulties for inference.

Figure 1: Data simulated according to equation (1), with a Gumbel copula with parameter α = 2 selected for the upper tail copula C_1 and a Gaussian copula with parameter ρ = 0.6 selected for the body copula C_2 of the model proposed by Aulbach et al. (2012a). For illustration purposes, the vector of thresholds was chosen to be t = (0.8, 0.5).
A different type of approach was taken by Hu and O'Hagan (2021), who consider averaging different copula families that have been fitted to the whole distribution, in order to obtain a more robust estimate of the tail dependence of the data set.However, the use of BIC in the calculation of the weights assigned to each copula places the focus on the body and not on the tail of the data.
In a spatial context, Krupskii et al. (2018) and Zhang et al. (2022) each propose models fitted to both the body and tail of a distribution. The former outline a copula model based on the assumption that there exists a common factor which affects the joint dependence of all the observations of the underlying process, and which is able to model both tail dependence and asymmetry. Numerical integration over this factor variable leads to a likelihood that can be fitted to all data. The latter propose using the generalised hyperbolic copula, which is flexible due to having a relatively large number of parameters. For both of these models, the authors show that there is reasonable flexibility for capturing both body and tail, yet a primary motivation for fitting to all data is the desire to avoid the computational difficulty involved in using censored likelihoods for extremes.

Extremal dependence properties
When the focus lies on extreme values, studying the extremal dependence between the variables is of interest. Two variables are said to be asymptotically dependent (AD) if joint extremes occur at a similar frequency to marginal extremes, or asymptotically independent (AI) otherwise. To quantify this dependence, Coles et al. (1999) propose the measure

χ = lim_{u→1} P(F_Y(Y) > u | F_X(X) > u) = lim_{u→1} [1 − 2u + C(u, u)] / (1 − u),

where C is the copula of (X, Y). The random variables X and Y are asymptotically independent if χ = 0, whereas if χ > 0 they are asymptotically dependent. For AI variables, the strength of dependence at sub-asymptotic levels can be summarised by the coefficient of tail dependence η ∈ (0, 1] of Ledford and Tawn (1996), defined through P(F_X(X) > u, F_Y(Y) > u) behaving as a slowly varying function times (1 − u)^{1/η} as u → 1.
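At finite levels, χ can be estimated by its empirical sub-asymptotic version χ(q) = P(U > q, V > q)/(1 − q). A minimal sketch, using a Gaussian copula sample as an assumed example of an AI pair, so that the estimate should decay towards 0 as q → 1:

```python
import numpy as np
from scipy import stats

def chi_hat(u, v, q):
    """Empirical chi(q) = P(U > q, V > q) / (1 - q): the conditional
    probability that one variable is large given the other is."""
    return np.mean((u > q) & (v > q)) / (1 - q)

# Gaussian dependence is asymptotically independent, so chi(q) should
# decay towards 0 as the threshold level q increases.
rng = np.random.default_rng(0)
rho = 0.6
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=20000)
u, v = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])
chi90, chi99 = chi_hat(u, v, 0.90), chi_hat(u, v, 0.99)
```

For an AD copula the same estimator would instead stabilise at a positive value as q → 1.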
This paper is organised as follows: in Section 2 we present our proposed model and its properties. Inference for the model is studied in Section 3, complemented by a simulation study to demonstrate performance in correctly specified and misspecified scenarios. We then apply our methodology to ozone and temperature data in the UK in Section 4 and conclude with a discussion in Section 5.

Model definition
Our interest lies in accurately modelling both the bulk and the tail of the whole distribution. From the existing literature in the dependence context, only Aulbach et al. (2012a,b) are concerned with representing both regions correctly. However, our model differs from theirs in that we aim for a smooth transition between the two regions. To do so, we propose a mixture model where we fit two copulas to the whole range of the support and blend them by means of a dynamic weighting function π; in this way, the data can be allowed to favour the "best" copula for each region, avoiding the subjective choice of thresholds often present in EVT applications. This approach can be seen as an extension to the multivariate framework of the model proposed by Frigessi et al. (2002) mentioned in Section 1.2.
Although our ideas could be applied in higher dimensions, we restrict ourselves to the bivariate setting for computational simplicity. Let c_t and c_b be copula densities representing the tail and the body, with vectors of parameters α and β, respectively. For (u*, v*) ∈ (0, 1)^2, we define the density

c*(u*, v*; γ) = {[1 − π(u*, v*; θ)] c_b(u*, v*; β) + π(u*, v*; θ) c_t(u*, v*; α)} / K(γ),     (5)

where γ = (θ, α, β) is the vector of model parameters and

K(γ) = ∫_0^1 ∫_0^1 {[1 − π(u, v; θ)] c_b(u, v; β) + π(u, v; θ) c_t(u, v; α)} du dv

is a normalising constant. The weighting function π depends on the data, and is specified such that, for small values of u* and v*, more weight is given to c_b and, for larger values, more weight is given to c_t. Thus, for a fixed value of the parameter θ, the function π : (0, 1)^2 → (0, 1) should be increasing in u* and v*.
A direct consequence of π(u*, v*; θ) depending on the data is that the margins of the density c* are non-uniform; this leads to complications for inference. That is, we cannot fit c* directly to the data as it is not a copula density. We overcome these issues by fitting the copula of the density in equation (5), which requires numerical integration to calculate. The first stage is to obtain the true margins of (U*, V*) ∼ c* as

F_{U*}(u*) = ∫_0^{u*} ∫_0^1 c*(u, v; γ) dv du,

and similarly for F_{V*}, and then the corresponding inverse functions, F_{U*}^{-1} and F_{V*}^{-1}, so that we can transform the margins to Uniform(0, 1). The resulting copula density is thus represented as

c(u, v; γ) = c*(F_{U*}^{-1}(u), F_{V*}^{-1}(v); γ) / [f_{U*}(F_{U*}^{-1}(u)) f_{V*}(F_{V*}^{-1}(v))],     (6)

where f_{U*} and f_{V*} are the marginal probability density functions of c* and γ = (θ, α, β) is the vector of model parameters, common to the density in equation (5). Note that each of f_{U*}, f_{V*}, F_{U*} and F_{V*} depends on γ, but this is suppressed in the notation for readability.
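The numerical ingredients can be sketched as follows. This is an illustrative implementation under assumed component choices (Gaussian body, Gumbel tail, weight π = (u v)^θ with θ = 1.5); it computes the normalising constant by quadrature and a marginal density of c*, which is not the uniform density, illustrating why c* is not itself a copula density.

```python
import numpy as np
from scipy import stats, integrate

def c_gauss(u, v, rho):
    """Gaussian copula density."""
    a, b = stats.norm.ppf(u), stats.norm.ppf(v)
    det = 1 - rho**2
    return np.exp(-(rho**2 * (a**2 + b**2) - 2 * rho * a * b)
                  / (2 * det)) / np.sqrt(det)

def c_gumbel(u, v, alpha):
    """Gumbel copula density."""
    x, y = -np.log(u), -np.log(v)
    s = x**alpha + y**alpha
    return (np.exp(-s**(1 / alpha)) * (x * y)**(alpha - 1) / (u * v)
            * s**(2 / alpha - 2) * (1 + (alpha - 1) * s**(-1 / alpha)))

def cstar_unnorm(u, v, theta=1.5, alpha=2.0, rho=0.6):
    """Numerator of c*: (1 - pi) c_b + pi c_t with pi = (u v)^theta."""
    pi = (u * v)**theta
    return (1 - pi) * c_gauss(u, v, rho) + pi * c_gumbel(u, v, alpha)

# Normalising constant K(gamma) and a marginal density of c*, both by
# numerical integration; the margin is not the uniform density 1.
K, _ = integrate.dblquad(cstar_unnorm, 0, 1, 0, 1)
f_U = lambda u: integrate.quad(lambda v: cstar_unnorm(u, v), 0, 1)[0] / K
```

In practice F_{U*} and its inverse would be tabulated from f_U on a grid, which is where the spline approximation discussed in Section 3 becomes useful.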

Simulation
It is important to be able to sample from the proposed model so that it can be validated.
To do so, we first note that we can rewrite the density (5) as a standard mixture of two densities,

c*(u*, v*; γ) = (K_t / K) f_t(u*, v*; θ, α) + (K_b / K) f_b(u*, v*; θ, β),

where K = K(γ) and

f_t(u*, v*; θ, α) = π(u*, v*; θ) c_t(u*, v*; α) / K_t,
f_b(u*, v*; θ, β) = [1 − π(u*, v*; θ)] c_b(u*, v*; β) / K_b,

with K_t = ∫_0^1 ∫_0^1 π(u, v; θ) c_t(u, v; α) du dv and K_b defined analogously with weight 1 − π and density c_b. Note that K = K_t + K_b. Thus, to simulate from c*(u*, v*; γ) we need to be able to sample from the two densities f_t(u*, v*; θ, α) and f_b(u*, v*; θ, β), which are non-standard as they depend on the weighting function π(u*, v*; θ) as well as the copula densities. However, as we can sample from the densities c_t(u*, v*; α) and c_b(u*, v*; β), we can use a rejection sampling scheme to simulate from the required densities f_t and f_b.
Note that, since the weighting function π(u*, v*; θ) is in (0, 1), it is the case that f_t(u*, v*; θ, α) / c_t(u*, v*; α) = π(u*, v*; θ) / K_t ≤ 1 / K_t. Similarly, the ratio f_b / c_b is bounded by 1 / K_b. The rejection algorithm for sampling from c* via f_t and f_b is then as follows: 1. Simulate n draws from c_t(u*, v*; α) and keep each with probability π(u*, v*; θ). The expected number of returned draws from f_t is nK_t.
2. Simulate n draws from c_b(u*, v*; β) and keep each with probability 1 − π(u*, v*; θ). The expected number of returned draws from f_b is nK_b.
The total expected number of draws from the two distributions together is n(K_t + K_b) = nK; these are in proportions K_t/K and K_b/K = 1 − K_t/K, and consequently we have a random sample from the density c*. To obtain a fixed sample size n′, we simply take n sufficiently large and keep n′ draws at random.
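The rejection scheme above can be sketched in a few lines. For the sake of a self-contained example, both c_t and c_b are stood in for by Gaussian copulas with different correlations (an assumption made only here; any copula samplers could be substituted), with weight π = (u v)^θ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def rgauss_copula(n, rho, rng):
    """Sample n pairs from a Gaussian copula with correlation rho."""
    z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    return stats.norm.cdf(z)

def pi_weight(u, v, theta):
    return (u * v)**theta  # increasing in both u and v

theta, n = 1.5, 20000
ct_draws = rgauss_copula(n, 0.9, rng)   # stand-in "tail" copula (assumption)
cb_draws = rgauss_copula(n, 0.3, rng)   # stand-in "body" copula (assumption)

# Step 1: keep draws from c_t with probability pi(u, v; theta).
keep_t = rng.random(n) < pi_weight(ct_draws[:, 0], ct_draws[:, 1], theta)
# Step 2: keep draws from c_b with probability 1 - pi(u, v; theta).
keep_b = rng.random(n) < 1 - pi_weight(cb_draws[:, 0], cb_draws[:, 1], theta)

# Pooled accepted draws form a sample from c*; expected counts are
# n*K_t and n*K_b respectively.
sample = np.vstack([ct_draws[keep_t], cb_draws[keep_b]])
```

Because π is increasing, the accepted tail draws concentrate in the upper-right corner and the accepted body draws in the lower-left, producing the smooth transition visible in Figure 2.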
Figure 2 illustrates two examples of random samples from our weighted copula model with different weighting functions. In each case we take a Gumbel copula with α = 2 as c_t and a Gaussian copula with ρ = 0.6 as c_b, which are the same components as in the example in Figure 1. See Appendix A for a directory of copula models and their parameterisations. Contrary to the Aulbach et al. (2012a) approach, we see that there is no cut-off between the two regions, with a smooth transition from data points mainly derived from c_b in the bottom left to those mainly derived from c_t in the top right. The influence of the choice of weighting function is also visible; for the same value of θ, a preference for c_t over c_b is shown in the right plot.
Figure 2: Example of data points from two weighted copula models simulated according to the sampling procedure detailed in Section 2.2. In both cases, a Gumbel copula with parameter α = 2 is taken as c_t and a Gaussian copula with parameter ρ = 0.6 as c_b. Two weighting functions are used, with θ = 1.5 in both. Points in blue originate from c_b and points in red originate from c_t.

Extremal dependence properties
We are interested in understanding the extremal dependence properties of the proposed model and, to do so, we compute the dependence measures χ and η mentioned in Section 1.3. However, since these are defined in terms of the joint survival function of (F_X(X), F_Y(Y)), which we do not have, and the integral of the density in equation (5) is intractable, χ and η are obtained numerically. For a set of bivariate copulas, Joe (2014) and Heffernan (2000) study these dependence measures; a selection of their results is summarised in Table 1.
Table 1: χ and η for a selection of copulas; ρ is the parameter of the Gaussian copula, and α the parameter of the Gumbel and Hüsler-Reiss copulas.
We consider mixtures of these four copulas to study the dependence properties of our model. In addition, we study the influence of the weighting function π(u*, v*; θ) and its parameter θ; we consider two weighting functions, with θ taking values up to 15. The dependence measures χ(r) and η(r) were computed for 10 different threshold values r from 0.7 to 1 − 1.49 × 10^{-8}, according to equations (2) and (4). For small θ, the weighting functions are closer to 1 at lower levels of u* and v*, meaning that the tail copula dominates over a larger region, and vice versa for large θ. In general, we expect that, in the limit r → 1 and with a weighting function that goes to 1 with u* and v*, the dependence properties of our model are those of the copula tailored to the tail, with more similarities to the body copula for large θ and smaller r. Table 2 shows the theoretical values of χ and η for each of the copulas used in the four weighted copula models, and Figure 3 shows the outcomes of our numerical investigations for Case 3. The remaining results are shown in the Supplementary Material. For use in Table 2 and beyond, we let η_t and χ_t represent η and χ for the tail copula, and similarly η_b and χ_b for the body copula.
Table 2: Theoretical values for χ and η for each of the copulas considered in the weighted copula models studied based on Table 1.AD = asymptotically dependent; AI = asymptotically independent.

We can see from Figure 3 that, in the limit r → 1, χ(r) and η(r) of the weighted copula model tend towards χ_t and η_t for both weighting functions. The results in the Supplementary Material suggest this holds true for each of the combinations we consider. However, the influence of the parameter θ differs between the two weighting functions, reflecting how quickly π(u*, v*; θ) = (u*v*)^θ approaches 1. We note that this investigation does not provide general conclusions about the tail dependence of models constructed in this way, but similar investigations can be carried out for any copulas and weighting functions of interest.
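Sub-asymptotic values of η can also be estimated directly from simulated samples. The sketch below uses a Gaussian copula example (an assumption for illustration) and a crude estimator based on P(U > q, V > q) ≈ (1 − q)^{1/η}, which ignores the slowly varying factor and is therefore biased away from the limiting value (1 + ρ)/2 = 0.8 at moderate q.

```python
import numpy as np
from scipy import stats

def eta_hat(u, v, q):
    """Crude estimate of eta(q) via P(U > q, V > q) ~ (1 - q)^(1/eta);
    the slowly varying factor is ignored, so this is biased at moderate q."""
    joint = np.mean((u > q) & (v > q))
    return np.log(1 - q) / np.log(joint)

# Gaussian copula with rho = 0.6: asymptotically independent, with
# eta = (1 + rho) / 2 = 0.8 in the limit q -> 1.
rng = np.random.default_rng(2)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=50000)
u, v = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])
eta95 = eta_hat(u, v, 0.95)
```

More refined estimators (e.g. Hill-type estimates on a structure variable) reduce this bias, at the cost of a choice of tail fraction.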

Parameter estimation
In order to estimate γ, we maximise the log-likelihood function of model (6), assuming n independent observations from the copula. Because F_{U*}^{-1} and F_{V*}^{-1} are computationally expensive to obtain by a root-finding algorithm, these are approximated using a smooth spline, following Zhang et al. (2021). We found the spline approximation produces results with a similar degree of precision to the root-finding algorithm, while reducing the computational time considerably.
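The idea of the spline approximation can be sketched as follows. This is not the authors' implementation: a Beta CDF is assumed here as a stand-in for the numerically integrated F_{U*}, and a monotone (shape-preserving) interpolant with axes swapped gives a cheap inverse that replaces repeated root finding.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import PchipInterpolator

# Toy stand-in: a Beta CDF plays the role of the numerically integrated
# F_{U*}; in practice each grid value would come from quadrature.
dist = stats.beta(2, 5)
grid = np.linspace(1e-6, 1 - 1e-6, 400)
cdf_vals = dist.cdf(grid)

# Monotone interpolation of the inverse: swap the axes so that the
# interpolant maps probabilities back to quantiles.
inv_cdf = PchipInterpolator(cdf_vals, grid)
```

For example, inv_cdf(0.3) can be compared against the exact quantile dist.ppf(0.3); once the grid of CDF values is tabulated, each inverse evaluation costs only a spline lookup rather than a root search over a double integral.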
We conduct a simulation study to verify that inference on the proposed model produces reasonable estimates for the vector of model parameters γ, and their inherent uncertainty. To do so, we consider two examples with different sample sizes: 500 and 1000 data points. Data are sampled from density (5) via the sampling procedure outlined in Section 2.2.
For the first case, we take c_b to be the Clayton copula density with α = 1, and c_t to be the Gumbel copula density with α = 2. For the second example, c_b is taken as the Joe copula density with α = 2 and c_t as the Gaussian copula density with ρ = 0.6. The parameter of the weighting function is set to θ = 0.8 in the first example and θ = 1 in the second. Each data set is simulated 100 times.
Figure 4 displays the results of the simulation study. For each parameter, the left boxplot shows the spread of estimates when n = 500, and the right boxplot displays this for n = 1000. We observe that estimation seems generally unbiased and that uncertainty reduces as the sample size increases. Because the copula density (6) relies on numerical integration to obtain F, f and F^{-1}, it is important to assess the computational effort required to perform inference. Figure 5 displays the time taken to optimise the likelihoods. We can see that, for each of the models, the time taken increases with the sample size, which is to be expected. It also varies with the chosen copulas; for example, to evaluate the likelihood with n = 500 data points, the first model took around 30 minutes while the second took around 50 minutes.

Model misspecification
In addition to checking whether inference on the model produces reasonable estimates for γ, we study the ability of the model to capture a misspecified dependence structure. We consider two situations: the case where the underlying data set comes from a single copula and we fit our model with this copula as one of the components; and the case where the fitted model does not contain the true copula. In the first case, we investigate whether the estimate of the parameter of the weighting function θ agrees with the true data. Since π(u*, v*; θ) is increasing in (0, 1), we expect θ to be large (small) when the true copula is tailored to the body (tail) of the distribution. In the second case, we investigate whether our model still produces reliable estimates of various dependence summaries even though the true dependence structure cannot be captured.
For the first case, we generate 1000 data points from a Joe copula with α = 2 and fit two weighted copula models: one with the true copula as c_t and a Gaussian copula as c_b, and the other with the true copula as c_b and a Clayton copula as c_t. As before, 100 simulations for each case were performed and the results are shown in the boxplots in Figure 6. We observe that, when the Joe copula is taken as c_t, the estimates for θ are all less than 1, and when it is taken as c_b, these are considerably larger (here we use the logarithm of θ for ease of visualisation). Looking at the estimates for the parameter of the true copula, although they show some bias, they are fairly close to the true values, represented by the red lines. Finally, the estimates for the parameters of the misspecified copula show larger variability, which is to be expected as most of the weight is on the true copula.
Figure 7 shows a comparison between the AIC of the true and weighted copula models. In the majority of cases (89% for the first and 92% for the second), the true model outperforms the weighted copula model in terms of AIC, as expected. For our second experiment, to evaluate the outcome of not being able to capture the true dependence structure, we simulate 1000 data points from a Gaussian copula with ρ = 0.65 and from a Galambos copula with α = 2. For both cases, we generate 50 repetitions of the data set and fit a variety of weighted copula models, selecting the best model based on the average AIC values. In order to assess whether the selected weighted copula model is flexible enough to capture the dependence of the true data sets, we compute three measures of dependence: Kendall's τ, and χ(r) and η(r) from equations (2) and (4), respectively, at several thresholds r ∈ (0, 1). We show how the model performs by comparing with the theoretical values of the underlying models; the results are shown in Figures 8 and 9.
Figure 8 displays the results for the weighted copula model where c_t is inverted Gumbel, c_b is Student t, and the true underlying structure is Gaussian. The results for the second model, where the true underlying structure is Galambos and the selected weighted copula model has Coles-Tawn as c_t and Frank as c_b, are shown in Figure 9. In both cases, we observe that the misspecified models capture the three dependence measures fairly well.

Case study: ozone and temperature data

Data and background
The relationship between ozone concentration and temperature has been analysed previously in the literature. For instance, Finch and Palmer (2020) show that the probability of exceeding regulated ozone thresholds increases when temperatures are high. More recently, Gouldsbrough et al. (2022) study how extreme levels of ozone concentration are influenced by temperature in the UK by applying a temperature-dependent univariate extreme value model. They show that, as temperatures have increased, the probability of exceeding a moderate regulated threshold of ozone concentration has risen over the last decade, to the point that such exceedances can no longer be considered extreme events. The analysis of Gouldsbrough et al. (2022) only considers the univariate distribution of ozone extremes conditional upon the value of temperature. Since both temperature and ozone concentration are measurements of random variables, we can apply our weighted copula model to learn about the relationship between these variables at all levels. Specifically, we study the dependence between temperature and ozone concentration at two UK sites: Blackpool (urban background) and Weybourne (rural background). Table 3 shows the regulated UK air-pollution index levels for ozone.

Level       O3 (µg/m³)
Low         [0, 100]
Moderate    [101, 160]
High        [161, 240]
Very High   > 240

We took the daily maxima of 8-hour running mean ozone concentrations available from the UK's Automatic Urban and Rural Network (AURN) (https://uk-air.defra.gov.uk) and obtained the corresponding daily maximum temperature data from the Centre for Environmental Data Analysis (CEDA) archive (https://archive.ceda.ac.uk). Since higher temperatures are expected during summer, and in order to overcome the non-stationarity often present in temperature data, we restrict our analysis to the summer months (June-August). Based on the available data, we consider the years from 2011 to 2019 for Blackpool and from 2010 to 2019 for Weybourne; this results in 827 and 892 observations, respectively. Figure 10a shows the scatterplot of the daily maxima of temperature and the daily maxima of ozone for the summers of 2011 to 2019 in Blackpool and the respective regulated UK thresholds, while Figure 10b shows the relationship between the variables when transformed to uniform margins using a semi-parametric approach with a GPD fit to the tail of both distributions. That is, we estimate the CDF of each marginal distribution via

F̂(x) = F̃(x) for x ≤ r,    F̂(x) = 1 − φ_r [1 + ξ(x − r)/σ]^{−1/ξ} for x > r,

where F̃(x) is the empirical distribution function, φ_r is the probability of exceeding a selected high threshold r, and ξ and σ are the GPD shape and scale parameters, respectively. The corresponding analysis for Weybourne is presented in the Supplementary Material; the results show similar conclusions to the analysis for Blackpool.
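The semi-parametric marginal transformation can be sketched as follows; the exponential sample is an assumption standing in for the ozone or temperature margins, with the GPD fitted to exceedances of the 90th percentile.

```python
import numpy as np
from scipy import stats

def semiparametric_cdf(x, data, r):
    """Empirical CDF below the threshold r; above r, a GPD fitted to the
    exceedances gives 1 - phi_r * (1 + xi*(x - r)/sigma)^(-1/xi)."""
    xi, _, sigma = stats.genpareto.fit(data[data > r] - r, floc=0)
    phi_r = np.mean(data > r)
    ecdf = np.searchsorted(np.sort(data), x, side="right") / len(data)
    # Clip the GPD argument at 0 so values beyond a finite upper endpoint
    # (possible when xi < 0) do not produce NaNs.
    tail = 1 - phi_r * np.maximum(1 + xi * (x - r) / sigma, 0)**(-1 / xi)
    return np.where(x <= r, ecdf, tail)

# Illustration on synthetic data (an exponential sample stands in for the
# ozone or temperature margins used in the paper).
rng = np.random.default_rng(4)
data = rng.exponential(size=5000)
r = np.quantile(data, 0.90)
u = semiparametric_cdf(data, data, r)
```

Applying this transformation to each margin yields approximately Uniform(0, 1) variables on which the copula models can then be fitted.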

Model fitting
We start by fitting a single copula model to the whole data set, for comparison with the weighted copula model. Looking at Figure 10b, the variables seem to exhibit positive dependence when they are both extreme, but negative dependence otherwise. We anticipate that the weighted copula model may be flexible enough to capture this, whereas a single copula is likely to be too rigid. Table 4 shows the MLEs obtained by fitting a range of copulas and the corresponding AIC values. Of the copulas considered, the only ones capable of capturing negative dependence are the Gaussian and Frank, when their parameters are negative, and the Student t (which also exhibits lower and upper tail dependence). However, all parameter estimates are positive. In terms of AIC, the best fit is the Joe, followed by the Galambos, Hüsler-Reiss, Gumbel and Coles-Tawn copulas; these are all known to be asymptotically dependent copulas, which appears to agree with the dependence in the upper tail shown in Figure 10b. As a further diagnostic, we compute the dependence measure η(r) from equation (4) for r ∈ (0, 1) empirically, as well as for the five best models in terms of AIC and for the Gaussian and Frank copulas; this is shown in Figure 11. The confidence intervals in Figure 11 were obtained via block bootstrapping the data with a block length of 14 days, to reflect temporal dependence in the extremes. It is evident that none of the copulas fits the data well across the whole support based on this measure. However, the Joe copula (in orange) appears to give the best fit in the tail, agreeing with the AIC.

Figure 11: Empirical η(r) (in black) and η(r) for seven copulas (in colour) for r ∈ (0, 1). The 95% confidence bands were obtained by block bootstrapping. Note that the η(r) curves for the Galambos, Hüsler-Reiss, Gumbel and Coles-Tawn copulas overlap.
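The block bootstrap used for the confidence bands can be sketched as follows; the toy random-walk series and the choice of statistic (the mean) are assumptions for illustration, with the 14-day block length taken from the text.

```python
import numpy as np

def block_bootstrap(x, block_len, rng):
    """Resample a series by pasting together randomly chosen contiguous
    blocks, preserving short-range temporal dependence within blocks."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:n]
    return x[idx]

rng = np.random.default_rng(3)
series = np.cumsum(rng.normal(size=800))        # toy autocorrelated series
stats_boot = np.array([block_bootstrap(series, 14, rng).mean()
                       for _ in range(200)])
lo, hi = np.percentile(stats_boot, [2.5, 97.5])  # 95% bootstrap interval
```

In the application, the statistic recomputed on each resample would be η(r) (or χ(r)) rather than the mean, and the percentiles of the bootstrap replicates give the plotted bands.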
We next fit the weighted copula model to the whole data set. We consider several copulas with different extremal dependence characteristics as candidates for both c_b and c_t; Table 5 shows the MLEs obtained by optimising the log-likelihood (7) and the corresponding AIC values for some of the models considered. According to AIC, there is a preference for models with the Gaussian and Frank as candidates for c_b, and AD copulas, such as the Galambos, Hüsler-Reiss, Joe and Coles-Tawn copulas, as c_t.
In contrast to the single copula fits, the parameter estimates for the Gaussian and Frank copulas are negative, mirroring the negative association visible in the body of Figure 10b. We next consider a different weighting function in the five models with the lowest AICs. The MLEs and the AIC values are shown in Table 6. In terms of AIC, these models all provide better fits to the data, while the negative correlation is still captured by c_b, and is now stronger. Because these models represent a better fit based on AIC, we focus on them for the rest of the analysis.
Table 6: MLEs for five weighted copula models and their AIC values when the weighting function

Diagnostics
To check the adequacy of the model fits, we compare a variety of empirical dependence measures to their model-based counterparts. These include Kendall's τ, the dependence measures χ(r) and η(r) for r ∈ (0, 1), and some probabilities of interest. Specifically, we look at the probability of ozone concentrations exceeding the so-called moderate threshold (i.e., 100 µg/m³) when the temperature is high or low, and the probability of O3 exceeding this and the higher threshold of 160 µg/m³, knowing that the temperature is in a specific range.
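Probabilities of this kind follow from the fitted copula by elementary identities such as P(U ≤ u0, V > v0) = u0 − C(u0, v0). A minimal sketch, using an assumed Gaussian copula and illustrative marginal probabilities rather than the fitted weighted copula model:

```python
import numpy as np
from scipy import stats

def gauss_copula_cdf(u, v, rho):
    """C(u, v) for a Gaussian copula via the bivariate normal CDF."""
    a, b = stats.norm.ppf(u), stats.norm.ppf(v)
    mvn = stats.multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
    return mvn.cdf([a, b])

# "Temperature low, ozone high": with u0 = F_T(16) and v0 = F_O3(100),
# P(T <= 16, O3 >= 100) = P(U <= u0, V > v0) = u0 - C(u0, v0).
u0, v0, rho = 0.2, 0.9, 0.6   # illustrative marginal probabilities
p = u0 - gauss_copula_cdf(u0, v0, rho)
```

With the weighted copula model, C would instead be evaluated by numerically integrating the fitted density (6) over the relevant rectangle; the identity itself is unchanged.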
Figure 12 displays χ(r) and η(r) for r ∈ (0, 1). A clear improvement over the single copula models shown in Figure 11 can be seen, as now all five models offer a reasonable fit throughout the whole support. In addition, model 5 (in light green) seems to provide slightly better χ(r) and η(r) estimates at median values of r and in the tail.
The average summer temperature in Blackpool is between 17°C and 20°C, and the observed 90th, 95th and 99th percentiles of temperature are approximately 22°C, 24°C and 28°C, respectively. Thus, we focus on probabilities based on these values of temperature; these are presented with Kendall's τ in Table 7. We can see that the five models give very similar probabilities and that they are all inside the 95% confidence interval of the empirical values, except for P[T ≤ 16, O3 ≥ 100] and one further case. The empirical probability and 95% confidence interval of the latter are explained by the low number of observations present in the data set; when there are no observations in a certain region, this will be true of each bootstrap sample as well. Gouldsbrough et al. (2022) obtained the mean probability of exceeding the high threshold 160 µg/m³ at the 99th percentile of temperature for urban and rural backgrounds across the UK. These were 0.0002 ([0, 0.0004]) for an urban background and 0.006 ([0.003, 0.009]) for a rural background. We obtained higher probabilities of exceeding this threshold given that the temperature is close to the observed 99th percentile (we refer readers to the Supplementary Material for the results for Weybourne). This might be due to our having considered only two sites within the UK, and potentially to some characteristics of the relationship between temperature and ozone being better captured by the weighted copula model than by the univariate conditional model.
An advantage of this modelling approach over the conditional univariate modelling of Gouldsbrough et al. (2022) is that we are able to extrapolate and consider probabilities of ozone exceeding certain thresholds at temperature values that have not been observed in the data set. In this way, we can consider such probabilities, one of which we estimate to be 0.6944 under Model 1, for example.
(a) Empirical χ(r) (in black) and χ(r) for the five models (in colour) for r ∈ (0, 1). The 95% confidence bands were obtained by block bootstrapping.

Conclusions and discussion
In this paper, we introduced a dependence model that is able to capture both the body and the tail of a bivariate data set. This is important when we aim to obtain an accurate representation of the data in both regions. The model has the advantage of not requiring a choice of threshold above which to fit the copula tailored to the extreme observations. Moreover, it offers a smooth transition between the two copulas. Through simulation studies, we have shown that the model behaves as expected when only a single dependence structure is present, and that it is sufficiently flexible to capture misspecified dependence structures. We applied the weighted copula model to study the relationship between temperature and concentrations of air pollution in the UK and showed that this model performs substantially better than fitting a single copula model to the data. In fact, in this particular application, we were able to capture the negative dependence exhibited by the bulk and the positive association present in the upper tail, which was not possible with a single copula.
A drawback of the weighted copula model is that it is computationally expensive due to the need for numerical integration and inversion. As shown in the simulation studies in Sections 3.1 and 3.2, for a sample size of 1000, optimising the log-likelihood takes more than one hour to compute (on an internal computing node running CentOS Linux, with an Intel CPU and 500 GB of RAM), although the run time also varies depending on the chosen copulas. Whilst in principle the weighted copula model could be extended to higher dimensions, doing so would exacerbate the computational issues.
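To illustrate where the integration and inversion costs arise, the sketch below builds a weighted mixture of a Clayton density and an independence (body) density, recovers the resulting marginal by numerical integration on a grid, and inverts it by interpolation. The weight (uv)^θ and all parameter values are purely illustrative assumptions; the paper's actual weight function π(u*, v*; θ) is not reproduced here.

```python
import numpy as np
from scipy.integrate import trapezoid, cumulative_trapezoid

def clayton_density(u, v, alpha=2.0):
    """Clayton copula density with parameter alpha > 0."""
    return ((1 + alpha) * (u * v) ** (-alpha - 1)
            * (u ** -alpha + v ** -alpha - 1) ** (-1 / alpha - 2))

def mixture_density(u, v, theta=3.0):
    """Weighted density: one copula weighted up in the joint upper corner."""
    pi = (u * v) ** theta                                 # illustrative weight only
    return pi * clayton_density(u, v) + (1 - pi) * 1.0    # body = independence density

# The mixture's margins are no longer uniform, so they must be computed numerically...
grid = np.linspace(1e-4, 1 - 1e-4, 400)
U, V = np.meshgrid(grid, grid, indexing="ij")
f = mixture_density(U, V)
f /= trapezoid(trapezoid(f, grid, axis=1), grid)          # normalise the weighted density
f_u = trapezoid(f, grid, axis=1)                          # marginal density of U
F_u = cumulative_trapezoid(f_u, grid, initial=0.0)        # marginal cdf of U

# ...and inverted (here by interpolation) to restore uniform margins
median_u = np.interp(0.5, F_u, grid)
```

Every likelihood evaluation during optimisation repeats this integrate-then-invert cycle, which is why run times grow quickly with sample size and dimension.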
For the temperature and ozone data, we have χ(r) > 0 and η(r) < 1 for the largest values of r, which does not allow us to draw conclusions about the extremal dependence class. This is a common situation in practice, but it results in complications if we wish to extrapolate to larger values than the ones observed. One possibility for overcoming this issue is to incorporate a more flexible copula as the tail component of the proposed model, such as the one proposed by Huser and Wadsworth (2019), which is able to capture both dependence classes, with the transition between them occurring at an interior point of the parameter space. However, because this copula is computationally expensive on its own, the computational time required when it is applied as the tail component in our model was not feasible.
It would be an advantage to have a copula model that could accommodate changes in the dependence structure due to covariates over the whole support of the distribution. Until now, we have assumed stationarity, which is rarely the case in real-world situations. Non-stationary multivariate extreme value methods naturally focus on capturing trends present in the extreme observations. However, data may be extreme in only one variable, and thus studying the trends present in the body of the data is of importance as well.
Incorporating covariates in the proposed model would also be an interesting avenue for future work.
Declarations of Interest: None.
and its density can be written as

A.4 Clayton copula
The Clayton copula with parameter α ∈ R⁺ is given by

C(u, v) = (u^{−α} + v^{−α} − 1)^{−1/α},

and its density can be written as

c(u, v) = (1 + α)(uv)^{−α−1}(u^{−α} + v^{−α} − 1)^{−1/α−2}.
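As a numerical sanity check on this parameterisation, one can simulate from the Clayton copula by inverting its conditional distribution C(v | u) and compare the sample Kendall's τ with the known relation τ = α/(α + 2). The sampler below is the standard conditional-inversion scheme, written for illustration; it is not code from the paper.

```python
import numpy as np
from scipy.stats import kendalltau

def rclayton(n, alpha, rng):
    """Sample the Clayton copula by inverting the conditional cdf C(v | u)."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    # Conditional quantile: solve C_{2|1}(v | u) = w for v
    v = (u ** -alpha * (w ** (-alpha / (alpha + 1)) - 1) + 1) ** (-1 / alpha)
    return u, v

rng = np.random.default_rng(7)
u, v = rclayton(20000, alpha=2.0, rng=rng)
tau_hat, _ = kendalltau(u, v)     # should be close to 2 / (2 + 2) = 0.5
```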

A.5 Joe copula
The Joe copula with parameter α > 1 is given by

C(u, v) = 1 − (x^α + y^α − x^α y^α)^{1/α},

and its density can be written as

c(u, v) = (x^α + y^α − x^α y^α)^{1/α − 2} x^{α−1} y^{α−1} (α − 1 + x^α + y^α − x^α y^α),

where x = 1 − u and y = 1 − v.
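Since a copula density is the mixed partial derivative c(u, v) = ∂²C(u, v)/∂u∂v, expressions of this kind can be verified numerically. The sketch below compares the closed-form Joe density with a central finite-difference approximation of the mixed partial derivative; the evaluation point and parameter value are arbitrary.

```python
import numpy as np

def joe_C(u, v, a=2.0):
    """Joe copula C(u, v) with parameter a > 1."""
    x, y = 1.0 - u, 1.0 - v
    return 1.0 - (x**a + y**a - x**a * y**a) ** (1.0 / a)

def joe_density(u, v, a=2.0):
    """Closed-form Joe copula density."""
    x, y = 1.0 - u, 1.0 - v
    B = x**a + y**a - x**a * y**a
    return B ** (1.0 / a - 2.0) * x ** (a - 1) * y ** (a - 1) * (a - 1.0 + B)

# Central finite-difference approximation of d^2 C / (du dv)
u0, v0, h = 0.7, 0.6, 1e-4
fd = (joe_C(u0 + h, v0 + h) - joe_C(u0 + h, v0 - h)
      - joe_C(u0 - h, v0 + h) + joe_C(u0 - h, v0 - h)) / (4.0 * h * h)
closed = joe_density(u0, v0)
```

The two values should agree to several decimal places, confirming that the density matches the stated copula.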

A.6 Gumbel copula
The Gumbel copula with parameter α > 1 is given by

C(u, v) = exp{−(x^α + y^α)^{1/α}},

where x = − log(u) and y = − log(v). The Gumbel copula density can be written as

c(u, v) = C(u, v)(uv)^{−1}(xy)^{α−1}(x^α + y^α)^{1/α−2}{(x^α + y^α)^{1/α} + α − 1}.

The Inverted Gumbel copula density is obtained if we substitute u and v by (1 − u) and (1 − v), respectively.

2 Ozone and temperature analysis for Weybourne, UK

Following the same structure as the case study in Section 4 of the main paper, the analysis for the summers of 2010 to 2019 at Weybourne, UK, is presented here. Figures 3a and 3b show the scatterplots of the daily maxima of temperature and the daily maxima of ozone on the original scale and on uniform margins, respectively.

Model fitting
Table 1 shows the MLEs obtained by fitting a range of single copulas and the corresponding AIC values, whereas Figure 4 illustrates the comparison between the empirical extremal dependence measure η(r) for r ∈ (0, 1) and the model-derived ones.
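The fits summarised in such a table can be reproduced in outline as maximum likelihood on pseudo-observations followed by AIC = 2k − 2ℓ̂, where k is the number of copula parameters. The sketch below fits a single Clayton copula to simulated Clayton data; the sample and parameter value are illustrative rather than the Weybourne series.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def clayton_loglik(alpha, u, v):
    """Clayton copula log-likelihood for pseudo-observations u, v."""
    if alpha <= 0:
        return -np.inf
    s = u ** -alpha + v ** -alpha - 1
    return np.sum(np.log(1 + alpha) + (-alpha - 1) * (np.log(u) + np.log(v))
                  + (-1 / alpha - 2) * np.log(s))

# Simulate Clayton(alpha = 2) data by conditional inversion
rng = np.random.default_rng(3)
n, alpha_true = 2000, 2.0
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = (u ** -alpha_true * (w ** (-alpha_true / (alpha_true + 1)) - 1) + 1) ** (-1 / alpha_true)

# Maximise the log-likelihood over alpha and compute the AIC
res = minimize_scalar(lambda a: -clayton_loglik(a, u, v), bounds=(0.01, 20), method="bounded")
alpha_hat = res.x
aic = 2 * 1 + 2 * res.fun     # AIC = 2k - 2*loglik; res.fun is the minimised negative loglik
```

Repeating this for each candidate copula and ranking by AIC gives a comparison of the kind reported in the table.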

Diagnostics
Figure 5 displays χ(r) and η(r) for r ∈ (0, 1) for the five models considered. A clear improvement from the single copula models shown in Figure 4 can be seen, as now all five models offer a reasonable fit throughout the whole support of the data. In summer, the average temperature in Weybourne is between 18
(a) Empirical χ(r) (in black) and χ(r) for the five models (in colour) for r ∈ (0, 1). The 95% confidence bands were obtained by block bootstrapping.

Figure 1 :
Figure 1: Example of Q simulated according to equation (1), with a Gumbel copula with parameter α = 2 selected for the upper tail copula C_1 and a Gaussian copula with parameter ρ = 0.6 selected for the body copula C_2 of the model proposed by Aulbach et al. (2012a). For illustration purposes, the vector of thresholds was chosen to be t = (0.8, 0.5).

Figure 3 :
Figure 3: χ(r) and η(r) for different thresholds r ∈ [0.7, 1) for the proposed model with both π(u*, v*; θ), when c_b is Gumbel (AD) and c_t is Gaussian (AI). The coloured lines represent the 10 different models depending on different values of θ; the thick black lines represent the single copula models: Gumbel (dashed) and Gaussian (solid). The theoretical values for the Gumbel and Gaussian copulas based on Table 2 are represented by the horizontal dashed lines.
Parameter estimates of c_t (left plot), c_b (middle plot), and the weighting parameter θ (right plot) for n = 500 and n = 1000. The true values for the parameters are shown in red.

Figure 4 :
Figure 4: Estimation variability obtained by simulating each case 100 times.
Case when c_t is Gumbel and c_b is Clayton for n = 500 (left) and n = 1000 (right). Case when c_t is Gaussian and c_b is Joe for n = 500 (left) and n = 1000 (right).
Estimates when c_t is taken as the true copula (left) and c_b is taken as the Gaussian copula (middle). The true value for the parameter is shown in red. Estimates of θ are shown in the right boxplot.
Estimates when c_b is taken as the true copula (left) and c_t is taken as the Clayton copula (middle). The true value for the parameter is shown in red. Estimates of log(θ) are shown in the right boxplot.

Figure 6 :
Figure 6: Estimation variability obtained by simulating each case 100 times.
Case with the true copula as c_t and a Gaussian copula as c_b. Case with the true copula as c_b and a Clayton copula as c_t.

Figure 7 :
Figure 7: Comparison between the AIC of the true model and the fitted model.
Daily maxima of temperature and ozone. The moderate, high and very high DAQI thresholds are represented by the yellow, orange and red lines, respectively. Daily maxima of temperature (u) and ozone (v) on uniform margins. The corresponding moderate, high and very high DAQI thresholds are represented by the yellow, orange and red lines, respectively.

Figure 10 :
Figure 10: Summer data from 2011 to 2019 for Blackpool, UK.
Daily maxima of temperature and ozone. The moderate, high and very high DAQI thresholds are represented by the yellow, orange and red lines, respectively. Daily maxima of temperature (u) and ozone (v) on uniform margins. The corresponding moderate, high and very high DAQI thresholds are represented by the yellow, orange and red lines, respectively.

Table 3 :
Daily Air Quality Index (DAQI) for ozone (O₃) concentrations in the UK.

Table 4 :
MLEs for ten copulas and their AIC values.

Table 1 :
MLEs for ten copulas and their AIC values.
°C and 22°C, and the observed 90th, 95th and 99th percentiles of the temperature are around 24°C, 26°C and 29°C, respectively. Table 4 shows Kendall's τ and some probabilities of interest.

Table 4 :
Diagnostics for the best five models according to their AIC values. The 95% confidence intervals for the empirical values were obtained by block bootstrapping. The empirical probability P[O₃ ≥ 160 | 29 ≤ T ≤ 30] and its 95% confidence interval are explained by the low number of observations present in the data set. Columns include P[T ≥ 26, O₃ ≥ 100], P[O₃ ≥ 100 | 24 ≤ T ≤ 25] and P[O₃ ≥ 160 | 29 ≤ T ≤ 30].