Deep hedging of long-term financial derivatives

https://doi.org/10.1016/j.insmatheco.2021.03.017

Abstract

This study presents a deep reinforcement learning approach for global hedging of long-term financial derivatives. A setup similar to that of Coleman et al. (2007) is considered: the risk management of lookback options embedded in guarantees of variable annuities with ratchet features. The deep hedging algorithm of Buehler et al. (2019a) is applied to optimize neural networks representing global hedging policies with both quadratic and non-quadratic penalties. To the best of the author’s knowledge, this is the first paper that presents an extensive benchmarking of global policies for long-term contingent claims with the use of various hedging instruments (e.g. underlying and standard options) and with the presence of jump risk for equity. Monte Carlo experiments demonstrate the vast superiority of non-quadratic global hedging, as it results simultaneously in downside risk metrics two to three times smaller than those of the best benchmarks and in significant hedging gains. Analyses show that the neural networks are able to effectively adapt their hedging decisions to different penalties and stylized facts of risky asset dynamics only by experiencing simulations of the financial market exhibiting these features. Numerical results also indicate that non-quadratic global policies are significantly more geared towards being long equity risk, which entails earning the equity risk premium.

Introduction

Variable annuities (VAs), also known as segregated funds and equity-linked insurance, are financial products that enable investors to gain exposure to the market through cashflows that depend on equity performance. These products often include financial guarantees to protect investors against downside equity risk with benefits that can be expressed as the payoff of derivatives. For instance, a guaranteed minimum maturity benefit (GMMB) with a ratchet feature is analogous to a lookback put option by providing a minimum monetary amount at the maturity of the contract equal to the maximum account value on specific dates (e.g. anniversary dates of the policy). The valuation of VAs guarantees is typically done with classical option pricing theory by computing the expected risk-neutral discounted cashflows of embedded options under an appropriate equivalent martingale measure; see, for instance, Brennan and Schwartz (1976), Boyle and Schwartz (1977), Pelsser (2003), Bauer et al. (2008) and Ng and Li (2011). A comprehensive review of the literature on pricing segregated funds guarantees can be found in Gan (2013).
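The analogy between a ratchet GMMB and a lookback put can be illustrated with a minimal sketch. The function name, the indexing of ratchet dates, and the example path below are hypothetical and not taken from the paper; mortality, fees, and discounting are ignored.

```python
import numpy as np

def ratchet_gmmb_payoff(account_values, anniversary_idx):
    """Lookback-put-style payoff of a ratchet GMMB at maturity.

    account_values: account value path; the last entry is the value at maturity.
    anniversary_idx: indices of the ratchet dates (an assumed convention).
    """
    values = np.asarray(account_values, dtype=float)
    # Guaranteed amount: highest account value observed on the ratchet dates.
    guarantee = values[list(anniversary_idx)].max()
    # The guarantee pays the shortfall of the final account value, if any.
    return max(guarantee - values[-1], 0.0)

# Hypothetical path: the account peaks at 120 on a ratchet date and ends at 110.
path = [100.0, 120.0, 90.0, 110.0]
payoff = ratchet_gmmb_payoff(path, anniversary_idx=[0, 1, 2])
```

In this example the guaranteed amount is 120 and the final account value is 110, so the embedded option pays 10.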

During the subprime mortgage financial crisis, many insurers incurred large losses in segregated fund portfolios due in part to poor risk management, and some insurers even stopped writing VAs guarantees in certain markets (Zhang, 2010). Two categories of risk management approaches are typically used in practice: the actuarial method and the financial engineering method (Boyle and Hardy, 1997). The former consists in providing stochastic models for the risk factors and setting a reserve held in risk-free assets to cover liabilities associated with VAs guarantees with a certain probability (e.g. the Value-at-Risk at 99%). The second approach, commonly known as dynamic hedging, entails solving for a self-funded sequence of positions in securities to mitigate the risk exposure of embedded options. Dynamic hedging is a popular risk management approach among insurance companies and is studied in this paper; the reader is referred to Hardy (2003) for a detailed description of the actuarial method.

Financial markets are said to be complete if every contingent claim can be perfectly replicated with some dynamic hedging strategy. In practice, segregated funds embedded options are typically not attainable as a consequence of their many interrelated risks, such as equity risk, interest rate risk, mortality risk and basis risk, which are very complex to manage. For insurance companies selling VAs with guarantees, market incompleteness entails that some level of residual risk must be accepted as being intrinsic to the embedded options; the identification of optimal hedging policies in such a context is thus highly relevant. Nevertheless, the attention of the actuarial literature has predominantly been on the valuation of segregated funds, not on the design of optimal hedging policies. Indeed, the hedging strategies considered are most often suboptimal and not necessarily in line with the financial objectives of insurance companies. One popular hedging approach is the greek-based policy where asset positions depend on the sensitivities of the option value (i.e. the value of the guarantee) to different risk factors. Boyle and Hardy (1997) and Hardy (2000) delta-hedge GMMBs under market completeness for mortality risk and Augustyniak and Boudreault (2017) delta-rho hedge GMMBs and guaranteed minimum death benefits (GMDBs) in the presence of model uncertainty for both equity and interest rate. An important pitfall of greek-based policies in incomplete markets is their suboptimality by design: they are a by-product of the choice of pricing kernel (i.e. of the equivalent martingale measure) for option valuation, not of an optimization procedure over hedging decisions to minimize residual risk. Furthermore, as shown in the seminal work of Harrison and Pliska (1981), in incomplete markets, there exists an infinite set of equivalent martingale measures, each of which is consistent with arbitrage-free pricing and can thus be used to compute positions in hedging instruments (i.e. the greeks).
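To make the notion of a greek-based position concrete, the sketch below computes the Black-Scholes delta of a vanilla European put without dividends. This is only an illustration under the complete-market Black-Scholes model: the guarantees discussed here are lookback-type options whose sensitivities differ, and the function names and parameter values are hypothetical.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put_delta(spot, strike, rate, vol, maturity):
    """Black-Scholes delta of a European put (no dividends)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt(maturity))
    return norm_cdf(d1) - 1.0

# Hypothetical 10-year at-the-money put; a delta hedger of a short put
# would hold `delta` units of the underlying (a negative number, i.e. a short position).
delta = bs_put_delta(spot=100.0, strike=100.0, rate=0.02, vol=0.2, maturity=10.0)
```

The position is entirely determined by the pricing model and its parameters, which is the sense in which greek-based policies are a by-product of the valuation measure rather than of an optimization over hedging decisions.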

Another strand of literature optimizes hedging policies with local and global criteria. Local risk minimization (Föllmer and Schweizer, 1988, Schweizer, 1991) consists in choosing asset positions to minimize the periodic risk associated with the hedging portfolio. On the other hand, global risk minimization procedures jointly optimize all hedging decisions with the objective of minimizing the expected value of a loss function applied to the terminal hedging error. In spite of their myopic view of the hedging problem by not necessarily minimizing the risk associated with hedging shortfalls, local risk minimization procedures are attractive for the risk mitigation of VAs guarantees as they are simple to implement and they have outperformed greek-based hedging in several studies. Coleman et al. (2006) and Coleman et al. (2007) apply local risk minimization procedures for risk mitigation of GMDBs using standard options, with the former considering the presence of both interest rate and jump risk and the latter the presence of volatility and jump risk. Kélani and Quittard-Pinon (2017) extend the work of Coleman et al. (2007) in a general Lévy market with the inclusion of mortality risk and transaction costs, and Trottier et al. (2018a, 2018b) propose a local risk minimization scheme for guarantees in the presence of basis risk.
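The global criterion described above can be written compactly as follows. The notation is an assumption made for this illustration and is not the notation of the paper: $\Phi$ denotes the payoff of the contingent claim at maturity $T$, $\delta = (\delta_1, \ldots, \delta_T)$ a hedging policy, $V_T^{\delta}$ the terminal value of the corresponding self-financing hedging portfolio, and $\mathcal{L}$ the loss function.

```latex
\min_{\delta} \; \mathbb{E}\left[ \mathcal{L}\left( \Phi - V_T^{\delta} \right) \right]
```

All positions $\delta_1, \ldots, \delta_T$ are chosen jointly in this single problem, whereas a local procedure selects each position by minimizing a one-period risk measure.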

Within the realm of total risk minimization, global quadratic hedging pioneered by the seminal work of Schweizer (1995) aims at jointly optimizing all hedging decisions with a quadratic penalty for hedging shortfalls. The latter paper provides a theoretical solution to the optimal policy with a single risky asset (see Rémillard and Rubenthaler (2013) for the multidimensional asset case) and Bertsimas et al. (2001) develop a tractable solution to the optimal policy relying on stochastic dynamic programming. A major drawback of global quadratic hedging is that it penalizes gains and losses equally, which is naturally not in line with the financial objectives of insurance companies. Alternatively, non-quadratic global hedging applies an asymmetric treatment to hedging errors by overly (and most often strictly) penalizing hedging losses. In contrast to global quadratic hedging, there is usually no closed-form solution to the optimal policy, but numerical implementations have been proposed in the literature: François et al. (2014) develop a methodology with stochastic dynamic programming algorithms for global hedging with any desired penalty function, Godin (2016) adapts the latter numerical implementation under the Conditional Value-at-Risk measure in the presence of transaction costs and Dupuis et al. (2016) study global hedging procedures under the semi-mean-square error penalty in the context of short-term hedging for an electricity retailer. The aforementioned studies demonstrated the vast superiority of non-quadratic global hedging over popular alternative hedging schemes (e.g. greek-based policies, local risk minimization and global quadratic hedging).
Yet, to the best of the author’s knowledge, both quadratic and non-quadratic global hedging have seldom been applied for risk mitigation of segregated funds guarantees, or more generally, of long-term contingent claims.1 Furthermore, numerical schemes for global hedging are computationally intensive and often rely on solving Bellman’s equations, which is known to be prone to the curse of dimensionality (Powell, 2009). In the context of dynamically hedging segregated funds guarantees, the latter is a major drawback as it restricts the number of risk factors that can be considered for the financial market and prevents the use of multiple assets in the design of hedging policies. A feasible implementation of global hedging for the risk mitigation of VAs guarantees which is flexible to the choice of market features, to the hedging instruments and to the penalty for hedging errors would be desirable.
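The difference between the penalties mentioned above can be sketched with empirical estimators applied to a sample of terminal hedging errors. The sign convention (a positive error is a loss for the hedger), the function names, the simple empirical CVaR estimator, and the sample values are assumptions made for this illustration.

```python
import numpy as np

def mse(errors):
    """Quadratic penalty: gains and losses are penalized equally."""
    errors = np.asarray(errors, dtype=float)
    return np.mean(errors**2)

def semi_mse(errors):
    """Semi-mean-square error: only losses (positive errors) are penalized."""
    errors = np.asarray(errors, dtype=float)
    return np.mean(np.maximum(errors, 0.0)**2)

def cvar(errors, alpha=0.95):
    """Empirical CVaR: mean of the errors at or above the alpha-quantile."""
    errors = np.asarray(errors, dtype=float)
    var = np.quantile(errors, alpha)
    return errors[errors >= var].mean()

# Hypothetical hedging errors: two gains, one exact hedge, two losses.
errors = np.array([-3.0, -1.0, 0.0, 2.0, 4.0])
quadratic = mse(errors)           # 6.0: the gains -3 and -1 contribute to the penalty
semi_quadratic = semi_mse(errors) # 4.0: only the losses 2 and 4 contribute
tail_risk = cvar(errors, 0.5)     # 2.0: mean of the worst half (0, 2 and 4)
```

The quadratic penalty charges the hedger for the two gains, while the two asymmetric penalties ignore them, which is the asymmetry the text refers to.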

Recently, Buehler et al. (2019a) introduced a deep reinforcement learning (deep RL) algorithm called deep hedging to hedge a portfolio of over-the-counter derivatives in the presence of market frictions. The general framework of RL is for an agent to learn over many iterations of an environment how to select sequences of actions to optimize a cost function. RL has been applied successfully in many areas of quantitative finance such as algorithmic trading (e.g. Moody and Saffell (2001) and Deng et al. (2016)), portfolio optimization (e.g. Jiang et al. (2017) and Almahdi and Yang (2017)) and option pricing (e.g. Li et al. (2009), Becker et al. (2019) and Carbonneau and Godin (2021)). Hedging has also received substantial attention: Halperin (2020) and Kolm and Ritter (2019) propose TD-learning approaches to the hedging problem and Cao et al. (2020) and Carbonneau and Godin (2021) deep hedge European options under respectively the quadratic penalty and the Conditional Value-at-Risk measure. The deep hedging algorithm trains an agent to learn how to approximate optimal hedging decisions by neural networks through many simulations of a synthetic market. This approach is related to the deep learning method of Han and E (2016) by directly optimizing policies for stochastic control problems with Monte Carlo simulations. Arguably, the most important benefit of using neural networks to approximate optimal policies is to overcome the curse of dimensionality which arises when the state-space gets too large.
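The idea of representing a hedging policy by a neural network and evaluating it on simulated markets can be sketched as follows. This is not the author's implementation: the network size, the state variables, the geometric Brownian motion dynamics without jumps, the zero interest rate, the zero initial capital, and all parameter values are simplifying assumptions, and the training loop is omitted. In practice the parameters would be updated by gradient-based optimization of the penalty with an automatic differentiation library.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_ffnn(sizes):
    """Random initial weights for a feedforward network with layer widths `sizes`."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def ffnn(params, x):
    """Forward pass: tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def hedging_errors(params, n_paths=1000, n_steps=12, s0=100.0, mu=0.08, sigma=0.15):
    """Simulate price paths and return terminal hedging errors (payoff minus portfolio)."""
    dt = 1.0 / n_steps
    s = np.full(n_paths, s0)
    running_max = s.copy()
    portfolio = np.zeros(n_paths)  # assumed zero initial capital
    for t in range(n_steps):
        # Policy inputs: normalized price, normalized running maximum, elapsed time.
        state = np.stack([s / s0, running_max / s0, np.full(n_paths, t * dt)], axis=1)
        position = ffnn(params, state)[:, 0]  # units of the underlying held
        z = rng.standard_normal(n_paths)
        s_next = s * np.exp((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)
        portfolio += position * (s_next - s)  # trading gains under a zero interest rate
        s = s_next
        running_max = np.maximum(running_max, s)
    payoff = running_max - s  # lookback put payoff, non-negative by construction
    return payoff - portfolio

params = init_ffnn([3, 16, 16, 1])
errors = hedging_errors(params)
# Monte Carlo estimate of a semi-quadratic penalty; training would minimize it over params.
loss = np.mean(np.maximum(errors, 0.0)**2)
```

The policy never sees the model parameters `mu` and `sigma`; it only maps observed states to positions, so any feature of the simulator (jumps, stochastic volatility, additional hedging instruments) could be added without changing the structure of the procedure.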

The contribution of this paper is threefold. First, this study presents a deep reinforcement learning procedure for global hedging of long-term financial derivatives which are analogous, under assumptions made in this study, to embedded options of segregated funds. The methodological approach, which relies on the deep hedging algorithm, can be applied for the risk mitigation of any long-term European-type contingent claims (e.g. vanilla, path-dependent) with multiple hedging instruments (e.g. standard options and underlying) under any desired penalty (e.g. quadratic and non-quadratic) and in the presence of different stylized features of risky assets (e.g. jump, volatility and regime risk). The second contribution consists in conducting broad numerical experiments of hedging long-term contingent claims with the optimized global policies. A setup similar to that of Coleman et al. (2007) is considered with the risk mitigation of ratchet GMMBs strictly for financial risks in the presence of jumps for equity. To the best of the author’s knowledge, this is the first paper that presents such an extensive benchmarking of quadratic and non-quadratic global policies for long-term options with the use of various hedging instruments and by considering different risky asset dynamics. The use of neural networks to solve global hedging problems enables us to provide novel qualitative insights into long-term global hedging. Such benchmarking would hardly have been attainable when relying on more traditional optimization procedures for global hedging such as stochastic dynamic programming due to the high-dimensional continuous state and action spaces considered in this study. Numerical experiments demonstrate the vast superiority of non-quadratic global hedging, as it results simultaneously in downside risk metrics two to three times smaller than those of the best benchmarks and in significant hedging gains.
Our results clearly demonstrate that non-quadratic global hedging should be prioritized over other popular dynamic hedging procedures found in the literature as it is tailor-made to match the financial objectives of the hedger by always significantly reducing the downside risk as well as earning large expected positive returns. The third contribution is in providing important insights into specific characteristics of the optimized global policies. Monte Carlo experiments indicate that on average, non-quadratic global policies are significantly more bullish than their quadratic counterpart by holding a larger average equity risk exposure, which entails earning the equity risk premium. Conducting these experiments, and thus obtaining these novel qualitative insights into long-term global hedging policies, relies heavily on the neural-based hedging scheme considered in this paper. Key factors which contribute to this specific characteristic of non-quadratic global policies are identified. Furthermore, analyses of numerical results show that the training algorithm is able to effectively adapt hedging policies (i.e. neural network parameters) to different stylized features of risky asset dynamics only by experiencing simulations of the financial market exhibiting these features.

The paper is structured as follows. Section 2 introduces the notation and the optimal hedging problem. Section 3 describes the numerical scheme based on deep RL to optimize global hedging policies. Section 4 presents benchmarking of the risk mitigation of GMMBs under various market settings. Section 5 concludes.

Section snippets

Hedging long-term contingent claims

This section details the financial market setup and the hedging problem considered in this paper.

Methodology

This section describes the reinforcement learning procedure used to optimize global policies. The approach relies on the deep hedging algorithm of Buehler et al. (2019a) who showed that a feedforward neural network (FFNN) can be used to approximate arbitrarily well optimal hedging strategies in very general financial market conditions. At its core, a FFNN is a parameterized composite function which maps input to output vectors through the composition of a sequence of functions called hidden

Numerical study

This section presents an extensive numerical study of the neural-based global hedging scheme for the mitigation of the risk exposure associated with a short position in the long-term lookback option. Section 4.3 examines the hedging effectiveness of both quadratic and non-quadratic global hedging strategies as well as the local risk minimization scheme of Coleman et al. (2007) with different hedging instruments and different dynamics for the financial market. The conduction of such thorough

Conclusion

This paper studies global hedging strategies of long-term financial derivatives with a reinforcement learning approach. A similar financial market setup to the work of Coleman et al. (2007) is considered by studying the impact of equity jump risk on the hedging effectiveness of global procedures for segregated funds GMMBs. In the context of this paper, the latter guarantee is equivalent to holding a short position in a long-term lookback option of fixed maturity. The deep hedging algorithm of 

Acknowledgements

The author gratefully acknowledges financial support from the Fonds de recherche du Québec - Nature et technologies (FRQNT, grant number 205683).

References (54)

  • Pelsser, A. (2003). Pricing and hedging guaranteed annuity options via static option replication. Insurance Math. Econom.
  • Rockafellar, R.T. et al. (2002). Conditional Value-at-Risk for general loss distributions. J. Bank. Financ.
  • Schweizer, M. (1991). Option hedging for semimartingales. Stochastic Process. Appl.
  • Abadi, M. (2016). Tensorflow: Large-scale machine learning on heterogeneous distributed systems.
  • Augustyniak, M. et al. (2017). Mitigating interest rate risk in variable annuities: An analysis of hedging effectiveness under model risk. N. Am. Actuar. J.
  • Augustyniak, M. et al. (2017). Assessing the effectiveness of local and global quadratic hedging under GARCH models. Quant. Finance
  • Bauer, D. et al. (2008). A universal pricing framework for guaranteed minimum benefits in variable annuities. ASTIN Bull.: J. IAA
  • Becker, S. et al. (2019). Deep optimal stopping. J. Mach. Learn. Res.
  • Bengio, Y. et al. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw.
  • Bertsimas, D. et al. (2001). Hedging derivative securities and incomplete markets: an ϵ-arbitrage approach. Oper. Res.
  • Black, F. et al. (1973). The pricing of options and corporate liabilities. J. Political Econ.
  • Boyle, P.P. et al. (1977). Equilibrium prices of guarantees under equity-linked contracts. J. Risk Insurance
  • Buehler, H. et al. (2019). Deep hedging. Quant. Finance
  • Buehler, H. et al. (2019). Deep Hedging: Hedging Derivatives under Generic Market Frictions using Reinforcement Learning. Technical Report 19-80
  • Cao, H. et al. (2020). Discrete-time variance-optimal deep hedging in affine GARCH models.
  • Carbonneau, A. et al. (2021). Equal risk pricing of derivatives with deep hedging. Quant. Finance
  • Coleman, T. et al. (2007). Robustly hedging variable annuities with guarantees under jump and volatility risks. J. Risk Insurance

A GitHub repository with some code examples can be found at github.com/alexandrecarbonneau.
