Improving estimation for asymptotically independent bivariate extremes via global estimators for the angular dependence function

Modelling the extremal dependence of bivariate variables is important in a wide variety of practical applications, including environmental planning, catastrophe modelling and hydrology. The majority of these approaches are based on the framework of bivariate regular variation, and a wide range of literature is available for estimating the dependence structure in this setting. However, such procedures are only applicable to variables exhibiting asymptotic dependence, even though asymptotic independence is often observed in practice. In this paper, we consider the so-called `angular dependence function'; this quantity summarises the extremal dependence structure for asymptotically independent variables. Until recently, only pointwise estimators of the angular dependence function have been available. We introduce a range of global estimators and compare them to another recently introduced technique for global estimation through a systematic simulation study, and a case study on river flow data from the north of England, UK.


Introduction
Bivariate extreme value theory is a branch of statistics that deals with the modelling of dependence between the extremes of two variables. This type of analysis is useful in a variety of fields, including finance (Castro-Camilo et al., 2018), engineering (Ross et al., 2020), and environmental science (Brunner et al., 2016), where understanding and predicting the behaviour of rare, high-impact events is important.
In certain applications, interest lies in understanding the risk of observing simultaneous extreme events at multiple locations; for example, in the context of flood risk modelling, widespread flooding can result in damaging consequences to properties, businesses, infrastructure, communications and the economy (Lamb et al., 2010;Keef et al., 2013b). To support resilience planning, it it imperative to identify locations at high risk of joint extremes.
Classical theory for bivariate extremes is based on the framework of regular variation. Given a random vector (X,Y ) with standard exponential margins, we say that (X,Y ) is bivariate regularly with R := e X + e Y , V := e X /R and H(∂ B) = 0, where ∂ B is the boundary of B (Resnick, 1987).
Note that bivariate regular variation is most naturally expressed on standard Pareto margins, and the mapping (X,Y ) → (e X , e Y ) performs this transformation. We refer to R and V as radial and angular components, respectively. Equation (1.1) implies that for the largest radial values, the radial and angular components are independent. Furthermore, the quantity H, which is known as the spectral measure, must satisfy the moment constraint 1 0 vdH(v) = 1/2.
The spectral measure summarises the extremal dependence of (X,Y ), and a wide range of approaches exist for its estimation (e.g., Einmahl and Segers, 2009;de Carvalho and Davison, 2014;Eastoe et al., 2014). Equivalently, one can consider Pickands' dependence function (PDF; Pickands, 1981), which has a direct relationship to H via This function again captures the extremal dependence of (X,Y ), and many approaches also exist for its estimation (e.g., Guillotte and Perron, 2016;Marcon et al., 2016;Vettori et al., 2018).
Moreover, estimation approaches for the spectral measure and PDF encompass a wide range of statistical methodologies, with parametric, semi-parametric, and non-parametric modelling techniques proposed in both Bayesian and frequentist settings.
However, methods based on bivariate regular variation are limited in the forms of extremal dependence they can capture. This dependence can be classified through the coefficient χ (Joe, 1997), defined as where this limit exists. If χ > 0, then X and Y are asymptotically dependent, and the most extreme values of either variable can occur simultaneously. If χ = 0, X and Y are asymptotically independent, and the most extreme values of either variable occur separately.
Under asymptotic independence, the spectral measure H places all mass on the points {0} and {1}; equivalently, A(t) = 1 for all t ∈ [0, 1]. Consequently, for this form of dependence, the framework given in equation (1.1) is degenerate and is unable to accurately extrapolate into the joint tail Tawn, 1996, 1997). Practically, an incorrect assumption of asymptotic dependence between two variables is likely to result in an overly conservative estimate of joint risk.
To overcome this limitation, several models have been proposed that can capture both classes of extremal dependence. The first was given by Ledford and Tawn (1996), in which they assume that as u → ∞, the joint tail can be represented as Pr(X > u,Y > u) = Pr(min(X,Y ) > u) = L(e u )e −u/η , (1.2) where L is a slowly varying function at infinity, i.e., lim u→∞ L(cu)/L(u) = 1 for c > 0, and η ∈ (0, 1]. The quantity η is termed the coefficient of tail dependence, with η = 1 and lim u→∞ L(u) > 0 corresponding to asymptotic dependence and either η < 1 or η = 1 and lim u→∞ L(u) = 0 corresponding to asymptotic independence. Many extensions to this approach exist (e.g., Ledford and Tawn, 1997;Resnick, 2002;Ramos and Ledford, 2009); however, all such approaches are only applicable in regions where both variables are large, limiting their use in many practical settings.
Since many extremal bivariate risk measures, such as environmental contours (Haselsteiner et al., 2021) and return curves ( Several copula-based models have been proposed that can capture both classes of extremal dependence, such as those given in Coles and Pauli (2002), Wadsworth et al. (2017) and Huser and Wadsworth (2019). Unlike equation (1.2), these can be used to evaluate joint tail behaviour in all regions where at least one variable is extreme. However, these techniques typically require strong assumptions about the parametric form of the bivariate distribution, thereby offering reduced flexibility. Heffernan and Tawn (2004) proposed a modelling approach, known as the conditional extremes model, which also overcomes the limitations of the framework described in equation (1.2).
This approach assumes the existence of normalising functions a : R + → R and b : for a non-degenerate distribution function D. Note that the choice of conditioning on X > u is arbitrary, and an equivalent formulation exists for normalised X given Y > u. This framework can capture both asymptotic dependence and asymptotic independence, with the former arising when a(x) = x and b(x) = 1, and can also be used to describe extremal behaviour in regions where only one variable is large.
Finally, Wadsworth and Tawn (2013) proposed a general extension of equation (1.2). As u → ∞, they assume that for each ray w ∈ [0, 1], where L(· ; w) is slowly varying for each w ∈ [0, 1] and λ is the angular dependence function (ADF), which generalises the coefficient η = 1/{2λ (0.5)}. This extension captures both extremal dependence regimes, with asymptotic dependence implying the lower bound, i.e., λ (w) = max(w, 1 − w) for all w ∈ [0, 1]. Evaluation of the ADF for rays w close to 0 and 1 corresponds to regions where one variable is larger than the other.
The ADF can be viewed as the counterpart of the PDF for asymptotically independent variables, and shares many of its theoretical properties (Wadsworth and Tawn, 2013). It can be used to differentiate between different forms of asymptotic independence, with both positive and negative associations captured, alongside complete independence, which implies λ (w) = 1 for all w ∈ [0, 1]. Figure 1 illustrates the ADFs for three copulas. We observe a variety in shapes, corresponding to differing degrees of positive extremal dependence in the underlying copulas. The weakest dependence is observed for the inverted logistic copula, while the ADF for the asymptotically dependent logistic copula is equal to the lower bound.
Despite these modelling advances, the majority of approaches for quantifying the risk of bivariate extreme events still require bivariate regular variation. Many of the approaches that do allow for asymptotic independence use the conditional extremes model of equation (1.3) despite some well known limitations of this approach (Liu and Tawn, 2014).  Figure 1: The true ADFs (given in red) for three example copulas. Left: bivariate Gaussian copula with coefficient ρ = 0.5. Centre: inverted logistic copula with dependence parameter r = 0.8. Right: logistic copula with dependence parameter r = 0.8. The lower bound for the ADF is denoted by the black dotted line.
One particular application of the model described in equation (1.4) is the estimation of socalled bivariate return curves, RC(p) := (x, y) ∈ R 2 : Pr(X > x,Y > y) = p , which requires knowledge of extremal dependence in regions where either variable is large; see Section 5.4.
? obtain estimates of return curves, finding that estimates derived using equation (1.4) were preferable to those from the conditional extremes model. Mhalla et al. (2019) and Murphy- Barltrop and Wadsworth (2022) also provide non-stationary extensions and inference methods for the ADF.
In this paper, we propose a global methodology for ADF estimation in order to improve extrapolation into the joint upper tail for bivariate random vectors exhibiting asymptotic independence. Until recently, the ADF has been estimated only in a pointwise manner using the Hill estimator (Hill, 1975) on the tail of min{X/w,Y /(1 − w)}, resulting in unrealistic rough functional estimates and, as we demonstrate in Section 4, high degrees of variability. Further, ? showed that pointwise ADF estimates result in non-smooth return curve estimates, which are again unrealistic.
The first smooth ADF estimator was proposed in Simpson and Tawn (2022) based on a theoretical link between a limit set derived from the shape of appropriately scaled sample clouds and the ADF (Nolde and Wadsworth, 2022). The authors introduce global estimation techniques for the limit set, from which smooth ADF estimates follow; see Section 2 for further details.
We introduce several novel smooth ADF estimators, and compare their performance with the pointwise Hill estimator, as well as the estimator given in Simpson and Tawn (2022). In Section 2, we review the literature on ADF estimation. In Section 3, we introduce a range of novel estimators, and select tuning parameters for each proposed estimation technique. In Section 4, we compare each of the available estimators through a systematic simulation study, finding certain estimators to be favourable over others. A subset of estimators are then applied to river flow data sets in Section 5 and used to obtain estimates of return curves for different combinations of river gauges. We conclude in Section 6 with a discussion.

Existing techniques for ADF estimation
In this section, we introduce existing estimators for the ADF, with (X,Y ) denoting a random vector with standard exponential margins throughout. To begin, for any ray w ∈ [0, 1], define the min-projection at w as T w := min{X/w,Y /(1 − w)}. Equation (1.4) implies that for any w ∈ [0, 1] and t > 0, as u → ∞, with t * := e t . Since the expression in equation (2.1) has a univariate regularly varying tail with positive index, Wadsworth and Tawn (2013) propose using the Hill estimator (Hill, 1975) to obtain a pointwise estimator of the ADF; we denote this 'base' estimatorλ H . A major drawback of this technique is that the estimator is pointwise, leading to rough and often unrealistic estimates of the ADF. Furthermore, this estimator need not satisfy the theoretical constraints on the ADF identified in Wadsworth and Tawn (2013), such as the endpoint conditions λ (0) = λ (1) = 1. Moreover, no information is shared across different rays, increasing the variability in the resulting estimates. Simpson and Tawn (2022) recently proposed a novel estimator for the ADF using a theoretical link with the limiting shape of scaled sample clouds. Let C n := {(X i ,Y i )/log n; i = 1, . . . , n} denote n scaled, independent copies of (X,Y ). Nolde and Wadsworth (2022) explain how, as n → ∞, the asymptotic shape of C n provides information on the underlying extremal dependence structure. In many situations, C n converges onto the compact limit setḠ = {(x, y) : g(x, y) ≤ 1} ⊆ [0, 1] 2 , where g is the gauge function ofḠ. A sufficient condition for this convergence to occur is that the joint density, f , of (X,Y ) exists, and that − log f (tx,ty) ∼ tg(x, y), x, y ≥ 0, t → ∞.
(2.2) Following Nolde (2014), we also define the unit-level set G = {(x, y) : g(x, y) = 1} ⊂ [0, 1] 2 . Given fixed margins, the shapes ofḠ and G are completely determined by the extremal dependence structure of (X,Y ). Furthermore, Nolde and Wadsworth (2022) show that the boundary set G is also directly linked to the modelling frameworks described in equations (1.2), (1.3) and (1.4), as well as the approach of Simpson et al. (2020). In particular, letting R w := (w/max(w, 1 − w), ∞] × ((1 − w)/max(w, 1 − w), ∞] for all w ∈ [0, 1], we have that The boundary sets G for each of the copulas in Figure 1 are given in Figure 2, alongside the coordinates (r w w/max(w, 1 − w), r w (1 − w)/max(w, 1 − w)) for all w ∈ [0, 1]; these coordinates represent the relationship between G and the ADF via equation (2.3). One can again observe the variety in shapes. For the asymptotically dependent logistic copula, we have that (1, 1) ∈ G; this is true for all asymptotically dependent bivariate random vectors.  Figure 2: The boundary set G (given in red) for three example copulas, with coordinate limits denoted by the black dotted lines and the blue lines representing the coordinates (r w w/max(w, 1− w), r w (1 − w)/max(w, 1 − w)) for all w ∈ [0, 1]. Left: bivariate Gaussian copula with coefficient ρ = 0.5. Centre: inverted logistic copula with dependence parameter r = 0.8. Right: logistic copula with dependence parameter r = 0.8.
In practice, both the limit set,Ḡ, and its boundary, G, are unknown. Simpson and Tawn (2022) propose an estimator for G, which is then used to derive an estimatorλ ST for the ADF via equation (2.3). The resulting estimatorλ ST was shown to outperformλ H in a wide range of scenarios (Simpson and Tawn, 2022).
Estimation of G uses an alternative radial-angular decomposition of (X,Y ), with R * := X +Y and V * := X/(X +Y ). Simpson and Tawn (2022) assume the tail of R * | V * = v * , v * ∈ [0, 1], follows a generalised Pareto distribution (GPD) (Davison and Smith, 1990) and then use generalised additive models to capture trends over angles in both the threshold and GPD scale parameter (Youngman, 2019). Next, high quantile estimates from the conditional distributions R * | V * = v * , v * ∈ [0, 1] are computed using the fitted GPDs. They are then transformed back to the original scale using X = R * V * and Y = R * (1−V * ) and finally scaled onto the set [0, 1] 2 to give an estimate of G; see Simpson and Tawn (2022) for further details.
Wadsworth and Campbell (2022) also provide methodology for estimation of G, though their focus is on estimation of tail probabilities more generally, including in dimensions greater than two. Furthermore, their approach requires prior selection of a parametric form for g. We therefore restrict our attention to the work of Simpson and Tawn (2022) as their main focus is non-parametric estimation for G in two dimensions.
When applying the estimatorsλ H andλ ST in Section 4 and 5, we use the tuning parameters suggested in the original approaches. In the case ofλ H , we set u to be the empirical 90% quantile of T w . The default tuning parameters forλ ST can be found in Simpson and Tawn (2022), and example estimates of the set G obtained using the suggested parameters are given in the Supplementary Material.

Novel estimators for the ADF
Motivated by the goal of global estimation, we propose a range of novel estimators for the ADF.
Since the ADF and PDF bear many theoretical similarities, we begin by reviewing estimation of the PDF. A smooth functional estimate for the ADF is desirable, so we restrict attention to approaches for the PDF which achieve this: spline-based techniques (Hall and Tajvidi, 2000;Cormier et al., 2014) and techniques that utilise the family of Bernstein-Bézier polynomials (Guillotte and Perron, 2016;Marcon et al., 2016. In this paper, we focus on to the latter category, since spline-based techniques typically result in more complex formulations and a larger number of tuning parameters. Moreover, approaches based on Bernstein-Bézier polynomials have been shown to improve estimator performance across a wide range of copula examples (Vettori et al., 2018). For functions on the interval [0, 1], the family of Bernstein-Bézier polynomials of degree k ∈ N is given by Many approaches assume that the PDF A ∈ B k and propose techniques for estimating the coefficient vector β β β , resulting in an estimatorβ β β . This automatically ensures A(t) ≤ 1 for all t ∈ [0, 1], thereby satisfying the theoretical upper bound of the PDF.
We make a similar assumption about the ADF, and use this to propose novel estimators.
However, unlike the PDF, the ADF is unbounded from above, meaning functions in B k cannot represent all forms of extremal dependence captured by equation (1.4). Moreover, the endpoint conditions λ (0) = λ (1) = 1 are not necessarily satisfied by functions in B k . We therefore propose an alternative family of polynomials: given k ∈ N, let (3.1) Functions in this family are unbounded from above, and f (0) = f (1) = 1 for all f ∈ B * k . Furthermore, the parameter vector β β β is constrained such that each f ∈ B * k satisfies the lower bound of the ADF.
For the remainder of this section, let λ (·; β β β ) ∈ B * k represent a form of the ADF from B * k .
Interest now lies in estimating the coefficient vector β β β , which requires choice of the degree k ∈ N.
This is a trade-off between flexibility and computational complexity; polynomials with small values of k may not be flexible enough to capture all extremal dependence structures, resulting in bias, while high values of k will increase computational burden and parameter variance.

Composite likelihood approach
One consequence of equation ( . . , n} denote n independent observations from the joint distribution of (X,Y ). For each w ∈ W , where W denotes some finite subset spanning the interval [0, 1], let . . , n} and take u w to be the empirical 100q% quantile of t w , with q close to 1 and fixed across w. Letting t * w := {t w − u w | t w ∈ t w ,t w > u w }, we have a set of realisations from the conditional variable T * w .
One approach to obtain an estimate of λ (w) while considering all w ∈ W simultaneously is to use a composite likelihood, in which multiple likelihood components are treated as independent whether or not they are independent. Provided each component is a valid density function, the resulting likelihood function provides unbiased parameter estimates; see Varin et al. (2011) for further details. For this model, the likelihood function is given by where |t * w | denotes the cardinality of the set t * w . This composite likelihood function has equal weights across all w ∈ W (the 'components'). An estimator of the ADF,λ CL , is given by To apply this method in practice, one must first select a set W and a value for the probability q; see Section 3.4 for further details. The former is akin to selecting the degree of smoothing, while the latter is analogous to selecting a threshold for the GPD Defined in Section 2 in the univariate setting.

Probability ratio approach
With W and t w defined as in Section 3.1, consider two probabilities q < p < 1, both close to one. Given any w ∈ W , let u w and v w denote the 100q% and 100p% empirical quantiles of t w , respectively. Assuming the distribution function of T w is strictly monotonic, equation (2.1) implies that Similarly to Murphy-Barltrop and Wadsworth (2022), we exploit equation (3.2) to obtain an estimator for the ADF. Firstly, we observe that this representation holds for all w ∈ W , hence To ensure equation (3.2) holds requires careful selection of q and p. This selection also represents a bias-variance trade off: probabilities too small (big) will induce bias (high variability).
Moreover, owing to the different rates of convergence to the limiting ADF (Wadsworth and Tawn, 2013), a single pair (q, p) is unlikely to be appropriate across all data structures. We instead consider a range of probability pairs simultaneously. Specifically, letting {(q j , p j ) | q j < p j < 1, 1 ≤ j ≤ m}, m ∈ N, be pairs of probabilities near one, consider the expression in which u w, j and v w, j denote 100q j % and 100p j % empirical quantiles of t w , respectively, for each j = 1, . . . , m. Since it is desirable for S(β β β ) = 0, we setβ β β PR = arg min β β β ∈[0,∞) k−1 S(β β β ) and denote byλ PR the estimator λ (·;β β β PR ). Similarly toλ CL , one must select the sets W and {(q j , p j ) | q j < p j < 1, 1 ≤ j ≤ m}, m ∈ N prior to applying this estimator; see Section 3.4.

Incorporating knowledge of conditional extremes parameters
As noted in Section 2, the set G links different representations of bivariate extremes. Recall the conditional extremes modelling framework described in equation (1.3). Let a y|x and a x|y be the normalising functions for conditioning on the events X > t and Y > t respectively, and let α y|x := lim t→∞ a y|x (t)/t and α x|y := lim t→∞ a x|y (t)/t, with α y|x , α x|y ∈ [0, 1]. From Nolde and Wadsworth (2022), we have that g(1, α y|x ) = 1 and g(α x|y , 1) = 1, with g defined as in equation (2.2). Assuming that the values of α y|x and α x|y are known, it follows from equation (2.3) that since the set R 1/(1+α) will always intersect the point (1, α y|x ) ∈ G for any α ∈ [0, α y|x ], This result is illustrated in Figure 3 for a Gaussian copula with ρ = 0.5. Here, ; these rays correspond to the blue lines in the figure. One can observe that for any region R w defined along either of the blue lines (such as the shaded regions illustrated for w = 0.1 and w = 0.9), we have that r w = 1, since these regions will intersect G at either the coordinates (0.25, 1) or (1, 0.25).  3). The boundary set G, given in red, is from the bivariate Gaussian copula with ρ = 0.5, with the points (1, α y|x ) and (α x|y , 1) given in green. The blue lines represent the rays w ∈ [0, α * x|y ] [α * y|x , 1], while the yellow and pink shaded regions represent the set R w for w = 0.1 and w = 0.9, respectively.
In practice, α * x|y and α * y|x are unknown; however, estimatesα y|x andα x|y are commonly obtained using a likelihood function based on a misspecified model for the distribution D in equation (1.3) (e.g. Jonathan et al., 2014). The resulting estimates, denotedα * x|y ,α * y|x , can be used to ap- . What now remains is to combine this with an A crude way to obtain an estimator via this framework would be to set λ (w) = max(w, 1 − w) , the techniques introduced in Sections 3.1 and 3.2 can be used to obtain estimates of the coefficient vectors, which we denoteβ β β CL2 andβ β β PR2 , respectively. The resulting estimators for λ are given byλ withλ PR2 defined analogously. We lastly define the discontinuous estimatorλ H2 aŝ This is obtained by combining the pointwise Hill estimator with the information provided by the estimatesα * x|y ,α * y|x . Illustrations of all the estimators discussed in this section, as well as in Section 2, can be found in the Supplementary Material.

Tuning parameter selection
Prior to using any of the ADF estimators introduced in this section, we are required to select at least one tuning parameter. For the probability values required by the estimators introduced in information in all cases. We tested a range of probabilities for both estimators and found that the ADF estimates were not massively sensitive to these across different extremal dependence structures. For example, forλ CL , a lower q resulted in mild improvements for asymptotically independent copulas, while simultaneously worsening the quality of ADF estimates for asymptotically dependent examples, while a higher q led to higher variance.
We also set W := {0, 0.001, 0.002, . . . , 0.999, 1}, i.e., a finite set of equally spaced rays spanning the interval [0, 1]. This set was sufficient to ensure a high degree of smoothness in the resulting ADF estimates without too high a computational burden.
For each of the novel estimators (exceptλ H2 ), we must also specify the degree k ∈ N for the polynomial families described by equations (3.1) and (3.4). In the case of the PDF, studies have found that higher values of k are preferable for very strong positive dependence, while the opposite is true for weak dependence Vettori et al., 2018). We prefer to select a single value of k that performs well across a range of dependence structures, while minimising the computational burden; this avoids the need to select this parameter when obtaining ADF estimates in practice.
To achieve this objective, we estimated the root mean integrated squared error (RMISE), as defined in Section 4.1, of the estimatorsλ CL andλ PR with k = 2, 3, . . . , 11 using 200 samples from two Gaussian copula examples, corresponding to strong (ρ = 0.9) and weak (ρ = 0. resulting in data structures exhibiting medium negative, weak positive, and medium positive de-pendence, respectively. Note that in the case of ρ = −0.6, the choice of exponential margins will hide the dependence structure (Keef et al., 2013a;Nolde and Wadsworth, 2022).
For the next two examples, we consider the bivariate extreme value copula with logistic and asymmetric logistic families (Gumbel, 1960;Tawn, 1988). In both cases, the dependence is characterised by the parameter r ∈ (0, 1]; we set r = 0.8, corresponding to weak positive extremal dependence. For the asymmetric logistic family, we also require two asymmetry parameters (k 1 , k 2 ) ∈ [0, 1] 2 , which we fix to be (k 1 , k 2 ) = (0.3, 0.7), resulting in a mixture structure.
We next consider the inverted bivariate extreme value copula (Ledford and Tawn, 1997) for the logistic and asymmetric logistic families, with the dependence again characterised by the parameters r and (r, k 1 , k 2 ), respectively. We set r = 0.4, corresponding to moderate positive dependence, and again fix (k 1 , k 2 ) = (0.3, 0.7). Note that for this copula, the model described in equation (1.4) is exact: see Wadsworth and Tawn (2013).
Illustrations of the true ADFs for each copula are given in Figure 4, showing a range of extremal dependence structures. For examples where the ADF equals the lower bound, the copula exhibits asymptotic dependence. While the fifth copula exhibits asymmetric dependence, the limiting ADF is symmetric; the same is not true for its inverted counterpart.
To evaluate estimator performance, we use the RMISE Therefore, the RMISE summarises the quality of an estimator in terms of both bias and variance, and can be used as a means to compare different estimators.

Results
For the copulas described in Section 4.1, data from each copula example was simulated on standard exponential margins with a sample size of n =10,000, and the integrated squared error (ISE) of each estimator was approximated for 1, 000 samples using the trapezoidal rule; see the Sup-  well, though this is in part due to the choice of exponential margins.
Overall, these results indicate that no one estimator is preferable across all extremal dependence structures. However, we suggest using the estimatorsλ CL2 andλ ST since, on average, these appeared to result in less bias and variance. The form of extremal dependence appears to affect the performance of both of these estimators; since this is often difficult to quantify a priori, we suggest using both estimators and evaluating relative performance via diagnostics, as we do in Section 5.

Overview
Understanding the probability of observing extreme river flow events (i.e., floods) at multiple sites simultaneously is important in a variety of sectors, including insurance (Keef et al., 2013b;Quinn et al., 2019;Rohrbeck and Cooley, 2021) and environmental management (Lamb et al., 2010;Gouldby et al., 2017). Valid risk assessments therefore require accurate evaluation of the extremal dependence between different sites.
In this section, we estimate the ADF of river flow data sets obtained from gauges in the north of England, UK, which can be subsequently used to construct bivariate return curves. Daily average flow values (m 3 /s) at six river gauge locations on different rivers were considered. The gauge sites are illustrated in Figure 5. For each location, data is available between May 1993 and September 2020; however, we only consider dates where a measurement is available for every location. To avoid seasonality, we consider the interval October-March only; from our analysis,  We fix the site on the river Lune to be our reference site and consider the extremal dependence between this and all other gauges. We first estimate the extremal dependence measure χ and the coefficient η using the upper 10% of the corresponding joint tails. Both χ and η are limiting values; however, in practice, we are unable to evaluate such limits without a closed form for the joint distribution. We therefore calculate these values empirically. Taking χ, for example, an estimate is given byχ q =Pr(X >x q ,Y >ŷ q )/Pr(X >x q ), wherePr(·) denotes an empirical probability estimate andx q andŷ q denote empirical 100q% quantiles estimates for the variables X and Y , respectively, and q is some value close to 1. Specifically, we take q = 0.9. In practice, we are unlikely to observe χ =χ q , even at the most extreme quantile levels, i.e., as q → 1. This can be problematic when trying to quantify the form of extremal dependence, sinceχ q > 0 may arise for asymptotically independent data sets (for example). Therefore, the estimated coefficients should act only as a rough guide for this quantification.
The dependence measure estimates and 95% confidence intervals are shown in Figure 6 as a function of distance from the reference site. Here and throughout, all confidence intervals are obtained via block bootstrapping with block size b = 40; this value appears appropriate to account for the varying degrees of temporal dependence observed across the six gauge sites.
These estimates suggest that asymptotic independence may exist for at least three of the site pairings; therefore, modelling techniques based on bivariate regular variation would likely fail to capture the observed extremal dependencies in this scenario.

ADF Estimation
We transform each marginal data set to exponential margins using the semi-parametric approach of Coles and Tawn (1991), whereby a GPD is fitted to the upper tail and the body is modelled empirically. The GPD thresholds are selected using the technique proposed in Varty et al. (2021).
Diagnostic plots found in the Supplementary Material indicate good model fits. Since our results from Section 4 suggest that the estimatorsλ CL2 andλ ST perform best overall, we used these, alongside the base estimatorλ H , to estimate the ADF for each combination of the reference gauge and the other five gauges. The resulting ADF estimates can be found in Figure 7. For the most part, there is decent agreement across the estimators. One can observe contrasting shapes across the different pairs of gauges, illustrating the variety in the observed extremal dependence structures. We note that the estimates obtained usingλ H demonstrate some of the drawbacks mentioned in Section 2, such as non-smoothness and not respecting theoretical results for the ADF. Moreover, these results illustrate that on the whole, the estimatorλ CL2 is very much a smoothed version ofλ H , owing to the form of likelihood function used.

Assessing goodness of fit for ADF estimates
Recall that, from equation (2.1), we have T * w ∼ Exp(λ (w)) as u w → ∞ for all w ∈ [0, 1]. We exploit this result to assess the goodness of fit for ADF estimates.
Letλ (w), w ∈ [0, 1], denote an estimated ADF obtained using the sample {(x i , y i ) : i = 1, . . . , n}. Given w ∈ [0, 1], let u w denote some high empirical quantile from the sample t w , and consider the sample t * w , with t w and t * w defined as in Section 3.2. If t * w is indeed a sample from an Exp(λ (w)) distribution, we would expect good agreement between the empirical and model quantiles. Letting n w = |t * w | and t * w( j) denote the j-th order statistic of t * w , we consider the set of

Estimating return curves
To quantify the risk of joint flooding events across sites, we follow ? and use the ADF to estimate a bivariate risk measure known as a return curve, as defined in Section 1. This measure is the direct bivariate extension of a return level, which is commonly used to quantify risk in the univariate setting (Coles, 2001). Taking probability values close to zero gives a summary of the joint extremal dependence, thus allowing for comparison across different data structures.
In the context of extreme river flows, return curves can be used to evaluate at which sites joint extremes (floods) are more/less likely to occur. For illustration, we fix p to correspond to a 5 year return period, i.e., p = 1/(5n y ), where n y is the average number of points observed in a given year (Brunner et al., 2016). Excluding missing observations, we have 28 years of data, hence the resulting curve should be within the range of data whilst simultaneously representing the joint tail. The resulting return curve estimates for each ADF estimator and pair of gauge sites can be found in Figure 9. There is generally good agreement among the estimated curves. The almost-square shapes of the estimates for the first two pairs of gauges indicate higher likelihoods of observing simultaneous flood events at the corresponding gauge sites; this is as expected given the close spatial proximity of these sites. In all cases, the curves derived viaλ H are quite rough and unrealistic, and are subsequently ignored. To assess the goodness of fit of the remaining return curve estimates, we consider the first and fifth examples and apply the diagnostic introduced in ?. Our results suggest good quality model fits for both of the estimates obtained usingλ CL2 andλ ST . Furthermore, we also obtain 95% return curve confidence intervals for these examples. The resulting plots illustrating the diagnostics and confidence intervals, along with a brief explanation of the diagnostic tool, are given in the Supplementary Material.

Discussion
We have introduced a range of novel global estimators for the ADF, as detailed in Section 3.
We compared these estimators to existing techniques through a systematic simulation study and found our novel estimators to be competitive in many cases. In particular, the estimators derived via the composite likelihood approach of Section 3.1, alongside the estimator of Simpson and Tawn (2022), appear to have lower bias and variance, on average, compared to alternative estimation techniques. We also applied ADF estimation techniques to a range of river flow data sets, and obtained estimates of return curves for each data set. The results suggest that our estimation procedures are able to accurately capture the range of extremal dependence structures exhibited in the data, allowing for a more robust risk analysis of joint flooding events.
From Section 4, one can observe that the 'combined' estimators proposed in Section 3.3 outperform their 'uncombined' counterparts in the majority of instances. This indicates that incorporating the knowledge obtained from the conditional extremes parameters leads to improvements in ADF estimates. Furthermore, in most cases, ADF estimates obtained via approximations of the set G appeared to have lower bias compared to alternative estimation techniques. More generally, these results suggest that inferential techniques that exploit the results of Nolde and Wadsworth (2022) are superior to techniques which do not. Estimation of G, and its impact on estimation of other extremal dependence properties, represents an important line of research.
As noted in Section 1, few applications of the modelling framework described in equation (1.4) exist, even though this model offers advantages over the widely used approach of Heffernan and Tawn (2004) when evaluating joint tail probabilities. Inference via the ADF ensures consistency in extremal dependence properties, and one can obtain accurate estimates of certain risk measures, such as return curves.
For each of the existing and novel estimators introduced in Sections 2 and 3, we were required to select quantile levels, which is equivalent to selecting thresholds of the min-projection. With the exception ofλ ST , similar quantile levels were considered for each estimator so as to provide some degree of comparability. However, due to variation in estimation procedures, we acknowledge that the selected quantile levels are not readily comparable since the quantity of joint tail data used for estimation varies across different estimators. Moreover, as noted in Section 3.4, trying to select 'optimal' quantile levels appears a fruitless exercise since the performance of each estimator does not appear to alter much across different quantile levels.
Finally, we acknowledge the lack of theoretical treatment for the proposed ADF estimators which is an important consideration for understanding properties of the methodology. However, theoretical results of this form typically require in-depth analyses and strict assumptions, which themselves may be hard to verify, whilst in practice one can only ever look at diagnostics obtained from the data. We have therefore opted for a more practical treatment of the proposed estimators.

A Example ADF estimates
Examples of ADF estimates obtained using each of the estimators discussed in the main article are given in Figure S1 for a bivariate Gaussian copula with ρ = 0.5. The different estimates are in good agreement, and one can observe the roughness in estimates obtained via the Hill estimator. Figure S2 illustrates estimates of the boundary set G obtained using the technique proposed in Simpson and Tawn (2022) for three copula examples.  Figure S1: ADF estimates from each of the estimators discussed in the main article. Red represents the true ADF, with the theoretical lower bound given by the dotted black lines.  Figure S2: The boundary set G (given in red) for three copula examples, with estimates from Simpson and Tawn (2022) given in green. Left: bivariate Gaussian copula with correlation coefficient ρ = 0.5. Centre: inverted logistic copula with dependence parameter r = 0.8. Right: logistic copula with dependence parameter r = 0.8. In each plot, the coordinate limits of G are denoted by the black dotted lines.

B Example boundary set estimates
whereλ j denotes the ADF estimate for sample j and W denotes the set of rays spanning the interval [0, 1], as defined in Section 3.4 the main article. An estimate of the RMISE is then given For the polynomial degree, we considered k ∈ {2, 3, . . . , 11}; higher values of k were not considered due to computational complexity. The left and right panels of Figures S3 and S4 correspond to Gaussian copulas exhibiting strong (ρ = 0.9) and weak (ρ = 0.1) positive dependence, respectively.
One can observe that, for the strongly dependent example, the RMISE estimates tend to decrease as the value of k increases. This is in agreement with findings for the PDF (e.g., Vettori et al., 2018); strongly dependent copulas require a higher degree of flexibility to capture the triangle-like shapes of the resulting dependence functions.
On the other hand, for the weakly dependent Gaussian copula, the value of k made little difference to the resulting RMISE estimates. There is even a slight increase in RMISE estimates for higher values of k; this suggests that having a higher polynomial degrees for such data structures may lead to over-fitting.
In practice, we set k = 7; this value is sufficient for obtaining low RMISE estimates under
both copula examples. Furthermore, for the strongly dependent Gaussian copula, the reduction in RMISE values above this value of k are very marginal. This polynomial degree therefore appears to offer sufficient flexibility without high computational burden and/or parameter variability.

D Additional simulation study results
The ISB and IV estimates for each estimator are given in Tables S1 and S2. One can observe that, while theλ ST appears to perform best in terms of ISB, the estimators derived using the composite likelihood function (λ CL andλ CL2 ) exhibit the least IV for eight out of the nine copula examples.
For the most part, one can observe similar ISB and IV values across the different estimators.  Figure S5 illustrates daily river flow time series for each of the six gauges in the north of England, UK. These series suggest a stationarity assumption is reasonable for the extremes of each data set. Figure S6 illustrates the QQ plots from the fitted GPDs at each of the six gauges. One can observe that, in each case, the majority of points lie close to the y = x line, indicating the fitted models capture the upper tails well.  Figures S9 and S10 illustrate the return curve diagnostic of ? for the estimatorsλ ST andλ CL2 , respectively. For this diagnostic, a subset of points are selected on a return curve estimate; these points correspond to a set of m = 150 equally spaced angles θ in the interval [0, π/2], i.e., given (x, y) ∈ RC(p), we have θ = tan −1 (y/x). Empirical estimates of the joint survival function are computed for each point and bootstrapping is used to evaluate uncertainty. Finally, the median empirical estimates, alongside 95% pointwise confidence intervals, are plotted against the angle index and compared to the true probability; see ? for further details. Simpson, E. S. and Tawn