Pricing and hedging American-style options with deep learning

This paper describes a deep learning method for pricing and hedging American-style options. It first computes a candidate optimal stopping policy. From there it derives a lower bound for the price. Then it calculates an upper bound, a point estimate and confidence intervals. Finally, it constructs an approximate dynamic hedging strategy. We test the approach on different specifications of a Bermudan max-call option. In all cases it produces highly accurate prices and dynamic hedging strategies yielding small hedging errors.


Introduction
Early exercise options are notoriously difficult to value. For up to three underlying risk factors, tree-based and classical PDE approximation methods usually yield good numerical results; see, e.g., [19,14,27] and the references therein. To treat higher-dimensional problems, various simulation-based methods have been developed; see, e.g., [31,4,11,2,26,32,15,3,24,13,9,8,21]. Shallow neural networks have already been used in [18,23] to estimate continuation values. More recently, in [30] optimal stopping problems in continuous time have been solved by approximating the solutions of the corresponding free boundary PDEs with deep neural networks. In [5,6] deep learning has been used to directly learn optimal stopping strategies. The main focus of these papers is to derive optimal stopping rules and accurate price estimates.
The goal of this article is to develop a deep learning method which learns the optimal exercise behavior, prices and hedging strategies from samples of the underlying risk factors. It first learns a candidate optimal stopping strategy by regressing continuation values on multilayer neural networks. Employing the learned stopping strategy on a new set of Monte Carlo samples gives a low-biased estimate of the price. Moreover, the candidate optimal stopping strategy can be used to construct an approximate solution to the dual martingale problem of [28] and [18], yielding a high-biased estimate and confidence intervals for the price. In the last step, our method learns a dynamic hedging strategy in the spirit of [17] and [10]. But here, the continuation value approximations learned during the construction of the optimal stopping strategy can be used to break the hedging problem down into a sequence of smaller problems that learn the hedging portfolio only from one possible exercise date to the next. Alternative ways of computing hedging strategies consist in calculating sensitivities of option prices (see e.g., [3,7,21]) or approximating a solution to the dual martingale problem (see [28,29]).
Our work is related to the preprints [25] and [12]. [25] also uses neural network regression to estimate continuation values. But the networks are slightly different. While [25] works with leaky ReLU activation functions, we use tanh activation. Moreover, [25] provides a proof that the pricing algorithm converges if the number of simulations and the sizes of the networks go to infinity, whereas we calculate a posteriori guarantees for the prices and use the estimated continuation value functions to implement an efficient hedging algorithm. [12] proposes an alternative way of calculating prices and hedging strategies for American-style options by solving BSDEs.
The rest of the paper is organized as follows. In Section 2 we describe our neural network version of the Longstaff-Schwartz algorithm to estimate continuation values and construct a candidate optimal stopping strategy. In Section 3 the latter is used to derive lower and upper bounds as well as confidence intervals for the price. Section 4 discusses two different ways of computing dynamic hedging strategies. In Section 5 the results of the paper are applied to price and hedge a call option on the maximum of different underlying assets. Section 6 concludes.

Calculating a candidate optimal stopping strategy
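We consider an American-style option that can be exercised at any one of finitely many times $0 = t_0 < t_1 < \dots < t_N = T$. If exercised at time $t_n$, it yields a discounted payoff given by a square-integrable random variable $G_n$ defined on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{G} = (\mathcal{G}_n)_{n=0}^N, \mathbb{P})$. We assume that $\mathcal{G}_n$ describes the information available at time $t_n$ and that $G_n$ is of the form $g(n, X_n)$ for a measurable function $g \colon \{0, 1, \dots, N\} \times \mathbb{R}^d \to \mathbb{R}_+$ and a $d$-dimensional $\mathbb{G}$-Markov process $(X_n)_{n=0}^N$. We assume $X_0$ to be deterministic and $\mathbb{P}$ to be an equivalent martingale measure, so that the value of the option at time 0 is given by

$$V = \sup_{\tau \in \mathcal{T}} E\, G_\tau, \tag{1}$$

where $\mathcal{T}$ is the set of all $\mathbb{G}$-stopping times $\tau \colon \Omega \to \{0, 1, \dots, N\}$. If the option has not been exercised before time $t_n$, its discounted value at that time is

$$V_{t_n} = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_n} E[G_\tau \mid \mathcal{G}_n],$$

where $\mathcal{T}_n$ is the set of all $\mathbb{G}$-stopping times satisfying $n \le \tau \le N$. Obviously, $\tau_N \equiv N$ is optimal for $V_T = G_N$. From there, one can recursively construct the stopping times

$$\tau_n := \begin{cases} n & \text{if } G_n \ge E[G_{\tau_{n+1}} \mid X_n], \\ \tau_{n+1} & \text{if } G_n < E[G_{\tau_{n+1}} \mid X_n]. \end{cases} \tag{2}$$

Clearly, $\tau_n$ belongs to $\mathcal{T}_n$, and it can be checked inductively that

$$V_{t_n} = E[G_{\tau_n} \mid X_n] = G_n \vee E[G_{\tau_{n+1}} \mid X_n] \quad \text{for all } n \le N - 1.$$

In particular, $\tau_0$ is an optimizer of (1).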
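The backward recursion behind (2) can be illustrated in a setting where the conditional expectations are available in closed form. The following is a minimal sketch, assuming a one-dimensional Cox-Ross-Rubinstein binomial tree and a put payoff; the tree model, the function name `bermudan_put_tree` and all parameter values are illustrative assumptions and are not part of the method described in this paper. Values are kept in time-$t_n$ units and discounted one step at a time, which is equivalent to working with discounted payoffs.

```python
import numpy as np

def bermudan_put_tree(s0=100.0, strike=100.0, r=0.05, sigma=0.2, T=1.0, N=50):
    """Time-0 value of a put that can be exercised at every tree date t_n = n*T/N."""
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))      # up factor (Cox-Ross-Rubinstein tree)
    d = 1.0 / u                          # down factor
    p = (np.exp(r * dt) - d) / (u - d)   # risk-neutral probability of an up move
    disc = np.exp(-r * dt)               # one-step discount factor

    # Terminal layer: node j has seen j up moves and N - j down moves.
    j = np.arange(N + 1)
    v = np.maximum(strike - s0 * u**j * d**(N - j), 0.0)   # value at T is the payoff

    # Step backwards: value = max(exercise now, expected value of continuing).
    for n in range(N - 1, -1, -1):
        j = np.arange(n + 1)
        s = s0 * u**j * d**(n - j)
        continuation = disc * (p * v[1:] + (1.0 - p) * v[:-1])
        exercise = np.maximum(strike - s, 0.0)
        v = np.maximum(exercise, continuation)
    return float(v[0])

if __name__ == "__main__":
    print(bermudan_put_tree())
```

In higher dimensions the conditional expectation in the loop is no longer available on a tree, which is why it has to be estimated by regression.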
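Recursion (2) is the theoretical basis of the Longstaff-Schwartz method [26]. Its main computational challenge is the approximation of the conditional expectations $E[G_{\tau_{n+1}} \mid X_n]$. It is well known that $E[G_{\tau_{n+1}} \mid X_n]$ minimizes the mean squared distance $E[(G_{\tau_{n+1}} - Z)^2]$ over all $\sigma(X_n)$-measurable random variables $Z \colon \Omega \to \mathbb{R}$; see, e.g., [1]. The Longstaff-Schwartz algorithm approximates $E[G_{\tau_{n+1}} \mid X_n]$ by projecting $G_{\tau_{n+1}}$ on the linear span of finitely many functions of $X_n$. But it is also possible to project on a different subset. If the subset is given by $c^\theta(X_n)$ for a function family $c^\theta \colon \mathbb{R}^d \to \mathbb{R}$ parametrized by $\theta$, one can apply the following variant⁴ of the Longstaff-Schwartz algorithm:

(i) Simulate paths $(x^k_n)_{n=0}^N$, $k = 1, \dots, K$, of the underlying process $(X_n)_{n=0}^N$.
(ii) Set $s^k_N \equiv N$ for all $k$.
(iii) For $1 \le n \le N - 1$, going backwards in time, approximate $E[G_{\tau_{n+1}} \mid X_n]$ with $c^{\theta_n}(X_n)$ by minimizing the sum
$$\sum_{k=1}^K \big(g(s^k_{n+1}, x^k_{s^k_{n+1}}) - c^\theta(x^k_n)\big)^2 \quad \text{over } \theta. \tag{4}$$
(iv) Approximate the continuation value function by $C^{\theta_n} := c^{\theta_n} \vee 0$ and set
$$s^k_n := \begin{cases} n & \text{if } g(n, x^k_n) \ge C^{\theta_n}(x^k_n), \\ s^k_{n+1} & \text{otherwise.} \end{cases}$$
(v) Define $\theta_0 := \frac{1}{K} \sum_{k=1}^K g(s^k_1, x^k_{s^k_1})$, and set $C^{\theta_0}$ constantly equal to $\theta_0$.

In this paper we specify $c^\theta$ as a feedforward neural network, which, in general, is of the form

$$a^\theta_I \circ \varphi_{q_{I-1}} \circ a^\theta_{I-1} \circ \dots \circ \varphi_{q_1} \circ a^\theta_1, \tag{6}$$

where
• $I \ge 1$ and $q_0, q_1, \dots, q_I$ denote the depth and numbers of nodes in the different layers,
• $a^\theta_1 \colon \mathbb{R}^{q_0} \to \mathbb{R}^{q_1}, \dots, a^\theta_I \colon \mathbb{R}^{q_{I-1}} \to \mathbb{R}^{q_I}$ are affine functions,
• for $j \in \mathbb{N}$, $\varphi_j \colon \mathbb{R}^j \to \mathbb{R}^j$ applies a given activation function $\varphi \colon \mathbb{R} \to \mathbb{R}$ to each component.

The components of the parameter $\theta$ consist of the entries of the matrices $A_1, \dots, A_I$ and vectors $b_1, \dots, b_I$ appearing in the representation of the affine functions $a^\theta_i x = A_i x + b_i$, $i = 1, \dots, I$. To minimize (4) we use a stochastic gradient descent method.

⁴ The main difference between this algorithm and the one of Longstaff and Schwartz (2001) is the use of neural networks instead of basis functions. In addition, the sum in (4) is over all simulated paths, whereas Longstaff and Schwartz (2001) focus on the in-the-money paths to save computational effort. Moreover, since we know that continuation values cannot be negative, we work with $C^{\theta_n} = c^{\theta_n} \vee 0$ instead of $c^{\theta_n}$.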
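The regression-based variant of the Longstaff-Schwartz algorithm described above can be sketched as follows. This is a hedged illustration, not the implementation used in the paper: to keep the example short and dependency-free, the neural network $c^\theta$ is replaced by a cubic polynomial fitted by least squares, and the underlying is a single geometric Brownian motion with a put payoff. The names `simulate_gbm`, `fit_continuation` and `regression_stopping` and all parameter values are assumptions made for this sketch.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def simulate_gbm(s0, r, sigma, T, N, K, seed=0):
    """Simulate K paths of a geometric Brownian motion at t_n = n*T/N."""
    rng = np.random.default_rng(seed)
    dt = T / N
    z = rng.standard_normal((K, N))
    log_incr = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate([np.zeros((K, 1)), np.cumsum(log_incr, axis=1)], axis=1)
    return s0 * np.exp(log_paths)

def fit_continuation(x, y, degree=3):
    """Least-squares polynomial fit of y on x (stand-in for the network c^theta)."""
    coef = P.polyfit(x, y, degree)
    return lambda z: P.polyval(z, coef)

def regression_stopping(paths, payoff, degree=3):
    """paths: (K, N+1) states; payoff(n, x): discounted payoff at exercise date n.

    Returns the constant time-0 continuation value and a dict {n: C_n} of the
    fitted, floored continuation value functions for n = 1, ..., N-1."""
    N = paths.shape[1] - 1
    cash = payoff(N, paths[:, N])        # payoff under the current rule (stop at N)
    cont_fns = {}
    for n in range(N - 1, 0, -1):
        c = fit_continuation(paths[:, n], cash, degree)   # regress on ALL paths
        C = lambda z, c=c: np.maximum(c(z), 0.0)          # floor at zero
        cont_fns[n] = C
        g_n = payoff(n, paths[:, n])
        stop = g_n >= C(paths[:, n])                      # exercise rule at date n
        cash = np.where(stop, g_n, cash)                  # payoff under updated rule
    theta0 = float(cash.mean())                           # X_0 is deterministic
    return theta0, cont_fns

if __name__ == "__main__":
    r, T, N = 0.05, 1.0, 10
    paths = simulate_gbm(s0=1.0, r=r, sigma=0.2, T=T, N=N, K=20000, seed=1)
    put = lambda n, x: np.exp(-r * n * T / N) * np.maximum(1.0 - x, 0.0)
    theta0, cont_fns = regression_stopping(paths, put)
    print(theta0)
```

Note that `theta0` is computed on the same paths that were used for the regressions, so it should not be read as a lower bound; an unbiased estimate requires applying the fitted rule to fresh paths.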

Lower bound
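Once $\theta_0, \theta_1, \dots, \theta_{N-1}$ have been determined, we set $\Theta = (\theta_0, \dots, \theta_{N-1})$ and define

$$\tau^\Theta := \min\{n \in \{0, 1, \dots, N-1\} : g(n, X_n) \ge C^{\theta_n}(X_n)\}, \quad \text{where } \min \emptyset \text{ is understood as } N.$$

This defines a valid $\mathbb{G}$-stopping time. Therefore, $L = E\, g(\tau^\Theta, X_{\tau^\Theta})$ is a lower bound for the optimal value $V$. But typically, it is not possible to calculate the expectation exactly. Therefore, we generate simulations $g^k$ of $g(\tau^\Theta, X_{\tau^\Theta})$ based on sample paths $(x^k_n)_{n=0}^N$, $k = K+1, \dots, K+K_L$, of $(X_n)_{n=0}^N$ generated independently of $(x^k_n)_{n=0}^N$, $k = 1, \dots, K$, and approximate $L$ with the Monte Carlo average

$$\hat L = \frac{1}{K_L} \sum_{k=K+1}^{K+K_L} g^k.$$

Denote by $z_{\alpha/2}$ the $1 - \alpha/2$ quantile of the standard normal distribution and consider the sample standard deviation

$$\hat\sigma_L = \sqrt{\frac{1}{K_L - 1} \sum_{k=K+1}^{K+K_L} \big(g^k - \hat L\big)^2}.$$

Then one obtains from the central limit theorem that

$$\Big[\hat L - z_{\alpha/2} \frac{\hat\sigma_L}{\sqrt{K_L}},\ \infty\Big) \tag{8}$$

is an asymptotically valid $1 - \alpha/2$ confidence interval for $L$.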
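The Monte Carlo average, the sample standard deviation and the left endpoint of the one-sided confidence interval described above can be computed as in the following sketch. It only uses the Python standard library; the function name `lower_bound_interval` and the toy sample values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def lower_bound_interval(samples, alpha=0.05):
    """Return (mean, sample standard deviation, left endpoint of the one-sided
    asymptotic confidence interval at level 1 - alpha/2)."""
    k = len(samples)
    l_hat = sum(samples) / k
    variance = sum((g - l_hat) ** 2 for g in samples) / (k - 1)
    sigma_hat = math.sqrt(variance)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # 1 - alpha/2 normal quantile
    return l_hat, sigma_hat, l_hat - z * sigma_hat / math.sqrt(k)

if __name__ == "__main__":
    # Toy payoff samples; in practice these come from fresh simulated paths.
    print(lower_bound_interval([1.0, 2.0, 3.0, 4.0, 5.0]))
```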

Upper bound, point estimate and confidence intervals
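From $\tau^\Theta$, we construct the $\mathbb{G}$-martingale $M^\Theta_0 = 0, M^\Theta_1, \dots, M^\Theta_N$ as in [5]. We know from Proposition 7 of [5] that if $(\varepsilon_n)_{n=0}^N$ is a sequence of integrable random variables satisfying $E[\varepsilon_n \mid \mathcal{G}_n] = 0$ for all $n = 0, 1, \dots, N$, then

$$U = E\Big[\max_{0 \le n \le N} \big(g(n, X_n) - M^\Theta_n - \varepsilon_n\big)\Big]$$

is an upper bound for $V$. As in [5], we use nested simulation⁵ to generate realizations $M^k_n$ of $M^\Theta_n + \varepsilon_n$ along simulated paths $(z^k_n)_{n=0}^N$, $k = 1, \dots, K_U$, of $(X_n)_{n=0}^N$ and estimate $U$ as

$$\hat U = \frac{1}{K_U} \sum_{k=1}^{K_U} \max_{0 \le n \le N} \big(g(n, z^k_n) - M^k_n\big).$$

The sample standard deviation of the estimator $\hat U$, given by

$$\hat\sigma_U = \sqrt{\frac{1}{K_U - 1} \sum_{k=1}^{K_U} \Big(\max_{0 \le n \le N} \big(g(n, z^k_n) - M^k_n\big) - \hat U\Big)^2},$$

can be used together with the one-sided confidence interval (8) to construct the asymptotically valid two-sided $1 - \alpha$ confidence interval

$$\Big[\hat L - z_{\alpha/2} \frac{\hat\sigma_L}{\sqrt{K_L}},\ \hat U + z_{\alpha/2} \frac{\hat\sigma_U}{\sqrt{K_U}}\Big] \tag{9}$$

for the true value $V$; see [5].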
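Given pathwise payoffs and noisy martingale realizations, the upper bound estimate and the two-sided interval can be assembled as in the sketch below. The nested simulation that produces the martingale realizations is not shown; the arrays passed in are assumed to be available, and the function names `upper_bound_estimate` and `two_sided_interval` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def upper_bound_estimate(payoffs, martingale):
    """payoffs, martingale: (K_U, N+1) arrays holding the discounted payoffs and
    the martingale realizations along each path. Returns (U_hat, sigma_U_hat)."""
    path_max = np.max(payoffs - martingale, axis=1)   # pathwise maximum over n
    return float(path_max.mean()), float(path_max.std(ddof=1))

def two_sided_interval(l_hat, sigma_l, k_l, u_hat, sigma_u, k_u, alpha=0.05):
    """Combine the two one-sided intervals into a 1 - alpha interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = l_hat - z * sigma_l / np.sqrt(k_l)
    upper = u_hat + z * sigma_u / np.sqrt(k_u)
    return float(lower), float(upper)

if __name__ == "__main__":
    payoffs = np.array([[0.0, 1.0, 2.0], [0.0, 3.0, 1.0]])      # toy numbers
    martingale = np.array([[0.0, 0.5, 0.5], [0.0, 1.0, 2.0]])   # toy numbers
    u_hat, sigma_u = upper_bound_estimate(payoffs, martingale)
    print(u_hat, sigma_u)
    print(two_sided_interval(1.5, 0.4, 100, u_hat, sigma_u, 2))
```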

Hedging
We now consider $e \in \mathbb{N}$ tradable financial instruments as hedging instruments and assume the candidate optimal exercise strategy $\tau^\Theta$ derived in Section 2 does not stop at time 0. We fix a positive integer $M$ and introduce a time grid $0 = u_0 < u_1 < \dots < u_{NM}$ such that $u_{nM} = t_n$ for all $n = 0, 1, \dots, N$. We suppose that the information available at time $u_m$ is described by $\mathcal{F}_m$, where $\mathbb{F} = (\mathcal{F}_m)_{m=0}^{NM}$ is a filtration satisfying $\mathcal{F}_{nM} = \mathcal{G}_n$ for all $n$. If the hedging instruments pay dividends, they are immediately reinvested. We assume that the resulting discounted value processes are of the form $P_{u_m} = p_m(Y_m)$ for measurable functions $p_m \colon \mathbb{R}^d \to \mathbb{R}^e$ and an $\mathbb{F}$-Markov process $(Y_m)_{m=0}^{NM}$ such that $Y_{nM} = X_n$ for all $n = 0, \dots, N$. A hedging strategy consists of a sequence $h = (h_m)_{m=0}^{NM-1}$ of functions $h_m \colon \mathbb{R}^d \to \mathbb{R}^e$ specifying the holdings in $P^1_{u_m}, \dots, P^e_{u_m}$ at time $u_m$. The resulting discounted gains at time $u_m$ are given by

$$(h \cdot P)_{u_m} := \sum_{j=0}^{m-1} h_j(Y_j) \cdot \big(p_{j+1}(Y_{j+1}) - p_j(Y_j)\big) = \sum_{j=0}^{m-1} \sum_{i=1}^{e} h^i_j(Y_j) \big(p^i_{j+1}(Y_{j+1}) - p^i_j(Y_j)\big).$$

⁵ The use of nested simulation ensures that $M^k_n$ are unbiased estimates of $M^\Theta_n$, which is crucial for the validity of the upper bound. We do not use the approximate continuation value functions $C^{\theta_n}$ trained in Section 2 to compute upper bounds.
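The discounted gains of a hedging strategy along a single path are a running sum of holdings times price increments. The following sketch computes them with NumPy; the function name `discounted_gains` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def discounted_gains(holdings, prices):
    """holdings: (M, e) positions chosen at times u_0, ..., u_{M-1};
    prices: (M+1, e) discounted instrument values at u_0, ..., u_M.
    Returns an array of length M+1 with the cumulative gains, starting at 0."""
    increments = np.diff(prices, axis=0)                  # price change per step
    step_gains = np.sum(holdings * increments, axis=1)    # holdings . increment
    return np.concatenate(([0.0], np.cumsum(step_gains)))

if __name__ == "__main__":
    holdings = np.array([[1.0, 0.0], [2.0, 1.0]])
    prices = np.array([[10.0, 5.0], [11.0, 5.0], [10.0, 7.0]])
    print(discounted_gains(holdings, prices))
```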

Hedging until the first possible exercise date
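For a typical Bermudan option, the time between two possible exercise dates $t_n - t_{n-1}$ might range between a week and several months. In case of an American option, we choose $t_n = n\Delta$ for a small amount of time $\Delta$ such as a day or half a day. If $\tau^\Theta$ does not stop at time 0, we only compute the hedge until time $t_1$. If the option is still alive at time $t_1$, the hedge can then be computed until time $t_2$ and so on. To construct a hedge from time 0 to $t_1$, we approximate the time-$t_1$ value of the option with $V^{\theta_1}(X_1)$, where $V^{\theta_1}(x) := g(1, x) \vee C^{\theta_1}(x)$ and $C^{\theta_1}$ is the time-$t_1$ continuation value function estimated in Section 2. We then look for hedging positions $h_m$, $m = 0, \dots, M-1$, that minimize the mean squared error $E\big[(\hat V + (h \cdot P)_{t_1} - V^{\theta_1}(X_1))^2\big]$, where $\hat V$ is our point estimate of the price. To do that we approximate the functions $h_m$ with neural networks $h^\lambda \colon \mathbb{R}^d \to \mathbb{R}^e$ of the form (6) and try to find parameters $\lambda_0, \dots, \lambda_{M-1}$ which minimize

$$\sum_{k=1}^{K_H} \Big(\hat V + \sum_{m=0}^{M-1} h^{\lambda_m}(y^k_m) \cdot \big(p_{m+1}(y^k_{m+1}) - p_m(y^k_m)\big) - V^{\theta_1}(y^k_M)\Big)^2$$

for simulations $(y^k_m)_{m=0}^M$, $k = 1, \dots, K_H$, of $(Y_m)_{m=0}^M$. We train the networks $h^{\lambda_0}, \dots, h^{\lambda_{M-1}}$ together, again using a stochastic gradient descent method.
Once $\lambda_0, \dots, \lambda_{M-1}$ have been determined, we assess the quality of the hedge by simulating new realizations $(y^k_m)_{m=0}^M$, $k = K_H + 1, \dots, K_H + K_E$, of $(Y_m)_{m=0}^M$ and calculating the empirical intermediate hedging shortfall

$$\frac{1}{K_E} \sum_{k=K_H+1}^{K_H+K_E} \Big(\hat V + \sum_{m=0}^{M-1} h^{\lambda_m}(y^k_m) \cdot \big(p_{m+1}(y^k_{m+1}) - p_m(y^k_m)\big) - V^{\theta_1}(y^k_M)\Big). \tag{10}$$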
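In the special case of a single rebalancing period ($M = 1$), the position chosen at time 0 depends only on the deterministic initial state and is therefore a constant vector, so the squared-error minimization reduces to ordinary least squares. The sketch below fits such a position on training samples and evaluates the average shortfall on the samples passed in. It is only meant to illustrate the objective; it is not the neural network training used in the paper. The names `fit_single_period_hedge` and `hedging_shortfall`, the synthetic data, and the sign convention (initial capital plus gains minus target) are assumptions.

```python
import numpy as np

def fit_single_period_hedge(v0, dP, target):
    """v0: initial capital; dP: (K, e) discounted price increments over the period;
    target: (K,) approximate option values at the end of the period.
    Returns the constant position h minimizing sum_k (v0 + dP_k . h - target_k)^2."""
    h, *_ = np.linalg.lstsq(dP, target - v0, rcond=None)
    return h

def hedging_shortfall(v0, h, dP, target):
    """Average of initial capital plus gains minus target over the given samples."""
    return float(np.mean(v0 + dP @ h - target))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dP = rng.standard_normal((1000, 3))          # synthetic increments
    h_true = np.array([0.5, -1.0, 2.0])
    target = 4.0 + dP @ h_true                   # perfectly hedgeable toy target
    h = fit_single_period_hedge(4.0, dP, target)
    print(h, hedging_shortfall(4.0, h, dP, target))
```

For $M > 1$ the positions depend on the state, and the least-squares step has to be replaced by a parametric family such as the networks described above.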

Hedging until the exercise time
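Alternatively, one can precompute the whole hedging strategy from time 0 to $T$ and then use it until the option is exercised. In order to do that we introduce the functions

$$V^{\theta_n}(x) := g(n, x) \vee C^{\theta_n}(x), \quad n = 1, \dots, N-1, \qquad V^{\theta_N}(x) := g(N, x),$$

and learn the hedging positions separately on each interval $[t_{n-1}, t_n]$, from one possible exercise date to the next. Once the hedging strategy has been trained, we simulate independent samples $(y^k_m)_{m=0}^{NM}$, $k = K_H + 1, \dots, K_H + K_E$, of $(Y_m)_{m=0}^{NM}$ and denote the realization of $\tau^\Theta$ along each sample path $(y^k_m)_{m=0}^{NM}$ by $\tau^k$. The corresponding empirical hedging shortfall is given by

$$\frac{1}{K_E} \sum_{k=K_H+1}^{K_H+K_E} \Big(\hat V + \sum_{m=0}^{\tau^k M - 1} h^{\lambda_m}(y^k_m) \cdot \big(p_{m+1}(y^k_{m+1}) - p_m(y^k_m)\big) - g(\tau^k, y^k_{\tau^k M})\Big). \tag{11}$$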
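When the hedge is run until the exercise time, the gains of each path have to be accumulated only up to that path's own stopping index. A vectorized way to do this is sketched below; the per-step gains and stopping indices are assumed to be given, and the function names `stopped_gains` and `total_shortfall` as well as the sign convention are assumptions of this example.

```python
import numpy as np

def stopped_gains(step_gains, tau, M):
    """step_gains: (K, N*M) gains of each rebalancing step along each path;
    tau: (K,) integer exercise indices in {0, ..., N}; M: steps per exercise period.
    Returns the gains accumulated over the steps m < tau * M of each path."""
    steps = np.arange(step_gains.shape[1])
    mask = steps[None, :] < (tau * M)[:, None]    # True for steps before exercise
    return np.sum(step_gains * mask, axis=1)

def total_shortfall(v0, step_gains, tau, M, payoff_at_tau):
    """Average of initial capital plus stopped gains minus the exercised payoff."""
    return float(np.mean(v0 + stopped_gains(step_gains, tau, M) - payoff_at_tau))

if __name__ == "__main__":
    step_gains = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]])
    tau = np.array([1, 2])
    print(stopped_gains(step_gains, tau, M=2))
    print(total_shortfall(1.0, step_gains, tau, 2, np.array([4.0, 5.0])))
```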

Example
In this section we study⁹ a Bermudan max-call option on $d$ financial assets with risk-neutral price dynamics

$$S^i_t = s^i_0 \exp\big([r - \delta_i - \sigma_i^2/2]\, t + \sigma_i W^i_t\big), \quad i = 1, 2, \dots, d,$$

for a risk-free interest rate $r \in \mathbb{R}$, initial values $s^i_0 \in (0, \infty)$, dividend yields $\delta_i \in [0, \infty)$, volatilities $\sigma_i \in (0, \infty)$ and a $d$-dimensional Brownian motion $W$ with constant instantaneous correlations¹⁰ $\rho_{ij} \in \mathbb{R}$ between different components $W^i$ and $W^j$. The option has payoff $\big(\max_{1 \le i \le d} S^i_t - K\big)^+$ for a strike price $K \in \mathbb{R}_+$ and can be exercised at one of finitely many times $0 = t_0 < t_1 < \dots < t_N = T$. For notational simplicity, we assume in the following that $t_n = nT/N$ for $n = 0, 1, \dots, N$, and all assets have the same¹¹ characteristics; that is, $s^i_0 = s_0$, $\delta_i = \delta$ and $\sigma_i = \sigma$ for all $i = 1, \dots, d$.

⁹ The computations were run on an NVIDIA GeForce GTX 1080 GPU with 1974 MHz core clock and 8 GB GDDR5X memory with 1809.5 MHz clock rate. As underlying system we used an Intel Core i7-6800K 3.4 GHz CPU with 64 GB DDR4-2133 memory running TensorFlow 1.11 on Ubuntu 16.04.
¹⁰ That is, $E[(W^i_t - W^i_s)(W^j_t - W^j_s)] = \rho_{ij}(t - s)$ for all $i \ne j$ and $s < t$.
¹¹ Simulation-based methods work for any price dynamics that can efficiently be simulated. Prices of max-call options on underlying assets with different price dynamics were calculated in [9] and [5].

Table 1: Price estimates for max-call options on 5 and 10 symmetric assets for parameter values of $r = 5\%$, $\delta = 10\%$, $\sigma = 20\%$, $\rho = 0$, $K = 100$, $T = 3$, $N = 9$. $t_L$ is the number of seconds it took to train $\tau^\Theta$ and compute $\hat L$. $t_U$ is the computation time for $\hat U$ in seconds. 95% CI is the 95% confidence interval (9). The last column lists the 95% confidence intervals computed in [5].
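The symmetric multi-asset dynamics and the discounted max-call payoff at the exercise dates can be simulated as in the following sketch. The correlated Brownian increments are generated with a Cholesky factor of the correlation matrix; the function names `simulate_assets` and `discounted_max_call`, the seed and the number of paths are assumptions made for illustration, while the parameter values in the usage lines follow the Table 1 caption.

```python
import numpy as np

def simulate_assets(d, s0, r, delta, sigma, rho, T, N, K, seed=0):
    """Return a (K, N+1, d) array of asset prices at the dates t_n = n*T/N."""
    rng = np.random.default_rng(seed)
    dt = T / N
    corr = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)   # ones on the diagonal
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((K, N, d)) @ chol.T             # correlated normals
    log_incr = (r - delta - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate([np.zeros((K, 1, d)), np.cumsum(log_incr, axis=1)], axis=1)
    return s0 * np.exp(log_paths)

def discounted_max_call(paths, strike, r, T):
    """Discounted payoff of the max-call at every exercise date, shape (K, N+1)."""
    n_dates = paths.shape[1]
    t = np.linspace(0.0, T, n_dates)
    return np.exp(-r * t)[None, :] * np.maximum(paths.max(axis=2) - strike, 0.0)

if __name__ == "__main__":
    paths = simulate_assets(d=5, s0=100.0, r=0.05, delta=0.10, sigma=0.2,
                            rho=0.0, T=3.0, N=9, K=20000)
    payoff = discounted_max_call(paths, strike=100.0, r=0.05, T=3.0)
    print(paths.shape, payoff.shape)
```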

Pricing results
Let us denote $X_n = S_{t_n}$, $n = 0, 1, \dots, N$. Then the price of the option is given by

$$\sup_\tau E\Big[e^{-r \tau T/N} \Big(\max_{1 \le i \le d} X^i_\tau - K\Big)^+\Big],$$

where the supremum is over all stopping times $\tau \colon \Omega \to \{0, 1, \dots, N\}$ with respect to the filtration generated by $(X_n)_{n=0}^N$. The option payoff does not carry additional information. But the training of the continuation values worked more efficiently when we used it as an additional feature. So instead of $X_n$ we simulated the extended state process $\hat X_n = (X^1_n, \dots, X^d_n, X^{d+1}_n)$ for

$$X^{d+1}_n = e^{-r n T/N} \Big(\max_{1 \le i \le d} X^i_n - K\Big)^+.$$

Our results for $\hat L$, $\hat U$, $\hat V$ and 95% confidence intervals for different specifications of the model parameters are reported in Table 1. It can be seen that to achieve a pricing accuracy comparable to the more direct methods of [5] and [6], the networks used in the construction of the candidate optimal stopping strategy have to be trained for a longer time.

We set $P^i_{u_m} = Y^i_m$. To learn the hedging strategy, we trained neural networks $h^{\lambda_m} \colon \mathbb{R}^d \to \mathbb{R}^d$, $m = 0, \dots, NM - 1$, of the form (6) with depth $I = 3$, $d + 50$ nodes in each of the hidden layers and activation function $\varphi = \tanh$. Again, we used mini-batch stochastic gradient descent with Xavier initialization, batch normalization and Adam updating. Table 2 reports the intermediate hedging shortfall (10) and the total hedging shortfall (11) for various numbers $M$ of rebalancing times between two different exercise dates $t_{n-1}$ and $t_n$.
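A network of the kind described above, with depth 3 (three affine maps, hence two hidden layers), $d + 50$ hidden nodes and tanh activation, can be written down in a few lines of NumPy. The sketch below covers only Xavier (Glorot) uniform initialization and the forward pass; batch normalization, Adam and the training loop are deliberately omitted, and the function names `init_network`, `forward` and `num_parameters` are hypothetical.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """Xavier (Glorot) uniform weights and zero biases for each affine map."""
    params = []
    for q_in, q_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        limit = np.sqrt(6.0 / (q_in + q_out))
        A = rng.uniform(-limit, limit, size=(q_out, q_in))
        b = np.zeros(q_out)
        params.append((A, b))
    return params

def forward(params, x):
    """x: (batch, q_0). Applies tanh after every affine map except the last one."""
    h = x
    for A, b in params[:-1]:
        h = np.tanh(h @ A.T + b)
    A, b = params[-1]
    return h @ A.T + b

def num_parameters(layer_sizes):
    """Number of weight and bias entries across all affine maps."""
    return sum(q_out * (q_in + 1) for q_in, q_out in zip(layer_sizes[:-1], layer_sizes[1:]))

if __name__ == "__main__":
    d = 5
    sizes = [d, d + 50, d + 50, d]          # input, two hidden layers, output
    params = init_network(sizes, np.random.default_rng(0))
    print(forward(params, np.ones((7, d))).shape, num_parameters(sizes))
```

For $d = 5$ this gives 3690 trainable parameters per network, and one such network is needed for every rebalancing time.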

Conclusion
In this article we used deep neural networks to price and hedge American-style options. In a first step our method uses a neural network version of the Longstaff-Schwartz algorithm to estimate continuation values and design a candidate optimal stopping rule. The learned stopping rule immediately gives a low-biased estimate of the price. In addition, it can be used to construct an approximate solution of the dual martingale problem of [28] and [18]. This gives a high-biased estimate and confidence intervals for the price. To achieve the same pricing accuracy as the more direct approaches of [5] and [6], we had to train the neural network approximations of the continuation values for a longer time. But computing approximate continuation values has the advantage that they can be used to break the hedging problem into a sequence of subproblems that compute the hedge only from one possible exercise date to the next.

Table 2: Empirical hedging shortfalls for 5 and 10 underlying assets and different numbers $M$ of rehedging times between consecutive possible exercise times $t_{n-1}$ and $t_n$. The values of the parameters $r$, $\delta$, $\sigma$, $\rho$, $K$, $T$ and $N$ were chosen as in Table 1. IHS is the intermediate hedging shortfall (10) and HS the total hedging shortfall (11). They were estimated using $K_E$ = 4,096,000 sample paths. $\hat V$ is our point estimate of the price from Table 1. CT1 is the computation time in seconds for training the hedging strategy from time 0 to $t_1 = T/N$. CT2 is the number of seconds it took to train the complete hedging strategy from time 0 to $T$.