A deep learning approach for computations of exposure profiles for high-dimensional Bermudan options

In this paper, we propose a neural network-based method for approximating expected exposures and potential future exposures of Bermudan options. In a first phase, the method relies on the Deep Optimal Stopping algorithm, which learns the optimal stopping rule from Monte-Carlo samples of the underlying risk factors. Cashflow paths are then created by applying the learned stopping strategy to a new set of realizations of the risk factors. Furthermore, in a second phase, the risk factors are regressed against the cashflow paths to obtain approximations of pathwise option values. The regression step is carried out by ordinary least squares as well as neural networks, and it is shown that the latter produces more accurate approximations. The expected exposure is formulated both in terms of the cashflow paths and in terms of the pathwise option values, and it is shown that a simple Monte-Carlo average yields accurate approximations in both cases. The potential future exposure is estimated by the empirical $\alpha$-percentile. Finally, it is shown that the expected exposures, as well as the potential future exposures, can be computed under either the risk-neutral measure or the real-world measure, without having to re-train the neural networks.


Introduction
The exposure of a financial contract is the maximum amount that an investor stands to lose if the counterparty is unable to fulfill its obligations, for instance, due to a default. This means that, in addition to the market risk, a so-called counterparty credit risk (CCR) needs to be accounted for. Furthermore, the liquidity risk, which is the risk arising from potential costs of unwinding a position, is also closely related to the financial exposure. Over-the-counter (OTC) derivatives, i.e., contracts written directly between counterparties instead of through a central clearing party (CCP), are today mainly subject to so-called valuation adjustments (XVAs). These valuation adjustments aim to adjust the value of an OTC derivative for certain risk factors, e.g., the credit valuation adjustment (CVA), which adjusts the price for CCR, or the funding valuation adjustment (FVA), which adjusts for the funding cost of an uncollateralized derivative. The financial exposure is central in the calculations of many of the XVAs. For an in-depth overview of XVAs, we refer to [2] and [3].
In this paper, we focus on financial exposures arising from options traded OTC. For a European-style option, the exposure is simply the option value at some future time, given all the financial information available today. If $t_0$ is the time today and $X_t$ is the $d$-dimensional vector of underlying risk factors of an option at some $t > t_0$, we define the exposure of the option, at time $t$, as the random variable
$$E_t = V_t(X_t),$$
where $V_t(\cdot)$ denotes the value function of the option. However, for derivatives with early-exercise features, we also need to take into account the possibility that the option has been exercised prior to $t$, sending the exposure to zero. The most well-known of such options is arguably the American option, which gives the holder the right to exercise the option at any time between $t_0$ and maturity $T$. In this paper, the focus is on the Bermudan option, which gives the holder the right to exercise at finitely many predefined exercise dates between $t_0$ and maturity $T$. For early-exercise options, the exposure needs to be adjusted for the possibility that the stopping time $\tau$, representing the instance of time at which the option is exercised, is smaller than or equal to $t$. This leads to the following definition of the exposure
$$E^{\mathrm{Ber}}_t = V_t(X_t)\, I_{\{\tau > t\}},$$
where $I_{\{\cdot\}}$ is the indicator function. Two of the most common ways to measure the exposure are the expected exposure (EE), which is the expected future value of the exposure, and the potential future exposure (PFE), which is some $\alpha$-percentile of the future exposure (usually $\alpha = 97.5\%$ for the upper, and $\alpha = 2.5\%$ for the lower, tail of the distribution). To accurately compute the EE and PFE of a Bermudan option, it is crucial to use an algorithm which is able to accurately approximate not only $V_t(X_t)$, but also $I_{\{\tau > t\}}$.
It is common to compute the EE and the PFE with simulation-based methods, usually from realizations of approximate exposures, from which the distribution of $E_t$ is approximated by its empirical counterpart. A generic scheme for approximation of exposure profiles is given below:
1. At each exercise date, find a representation, $v_t \colon \mathbb{R}^d \to \mathbb{R}$, of the true value function $V_t(\cdot)$;
2. Generate trajectories of the underlying risk factors and, at each exercise date, evaluate $v_t(X_t(\omega))$;
3. At the earliest exercise date where $v_t(X_t(\omega)) \leq g(X_t(\omega))$, set $\tau(\omega) = t$.

The distribution of $E^{\mathrm{Ber}}_t$ is then approximated by the empirical distribution created by many trajectories of the form $V_t(X_t(\omega))\, I_{\{\tau(\omega) > t\}}$. For instance, if the target statistic is the EE, the estimate is the sample mean of the exposure paths.
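To make the generic scheme concrete, the following minimal sketch simulates exposure paths for a toy one-dimensional option and extracts the empirical EE and PFE. Everything here is an assumption for illustration: the risk-factor model, the put-style pay-off, and in particular the crude deep-in-the-money threshold that stands in for the comparison $v_t \leq g$ in step 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all choices illustrative): M paths of a 1-d risk factor,
# a put-style pay-off g, and a crude threshold rule standing in for the
# learned exercise rule v_t(X_t) <= g(X_t).
M, N, dt, sigma, x0, K = 20_000, 10, 0.1, 0.2, 1.0, 1.0
z = rng.standard_normal((M, N))
x = np.hstack([np.full((M, 1), x0),
               x0 * np.exp(np.cumsum(-0.5 * sigma**2 * dt
                                     + sigma * np.sqrt(dt) * z, axis=1))])
g = np.maximum(K - x, 0.0)                 # immediate pay-off at each date
v = g                                      # placeholder for the learned v_t

# Step 3 stand-in: exercise the first time the option is deep in the money.
exercised = g >= 0.1 * K
tau_idx = np.where(exercised.any(axis=1), exercised.argmax(axis=1), N)

# Exposure paths v_t(X_t) * 1{tau > t}, then empirical EE and PFE.
alive = np.arange(N + 1)[None, :] < tau_idx[:, None]
exposure = v * alive
EE = exposure.mean(axis=0)                     # expected exposure profile
PFE = np.quantile(exposure, 0.975, axis=0)     # upper potential future exposure
```

Here `EE` and `PFE` are arrays with one value per monitoring date; with the learned value function and stopping rule of the later sections, only the two marked stand-ins change.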
Step 1 above corresponds to the option valuation problem, which can be tackled by several different methods, all with their own advantages and disadvantages. For instance, the value function can be approximated by solving an associated partial differential equation (PDE), which is done in e.g., [4], [5], [6], [7] and [8], or the value function can be approximated by a Fourier transform methodology, which is done in e.g., [9], [10] and [11]. Furthermore, classical tree-based methods such as [12], [13] and [14] can be used. These types of methods are, in general, highly accurate, but they suffer severely from the curse of dimensionality, meaning that they are computationally feasible only in low dimensions (say up to $d = 4$), see [15]. In higher dimensions, Monte-Carlo-based methods are often used, see e.g., [16], [17], [18], [19] and [20]. Monte-Carlo-based methods can generate highly accurate option values at $t_0$, i.e., $v_{t_0}(X_{t_0} = x_{t_0})$, given that all trajectories of the underlying risk factors are initiated at some deterministic $x_{t_0}$. However, at intermediate exercise dates, $v_t(X_t(\omega))$ might not be an equally good approximation, due to the need for cross-sectional regression. We show with numerical examples that, even though the approximation is good on average, the option value is underestimated in certain regions, which is compensated by overestimated values in other regions. For European options, this seems to have a minor effect on EE and PFE, but for Bermudan options the effect can be significant due to step 3 above. To provide an intuitive understanding of the problem, we give an illustrative example. Assume that $v_t$ underestimates the option value in some region A, which is close to the exercise boundary, and overestimates the option value in some region B, where it is clearly optimal to exercise the option. The effect on the exposure would be an underestimation in region A, since underestimated option values would lead to more exercised options.
In region B, the exposure would be zero in both cases, since all options in that region would be exercised immediately. In total, this would lead to an underestimated exposure. In numerical examples this is, of course, more involved, and we typically see several regions with different levels of under- and overestimated values. This makes the phenomenon difficult to analyze, since the effect may lead to underestimated exposure, unchanged exposure (from offsetting effects), or overestimated exposure. In addition to the classical regression methods with, e.g., polynomial basis functions, which are cited above, there are several papers in which a neural network takes the role of the basis functions, see e.g., [21], [22], [23] and [24].
In this paper we suggest the use of the DOS algorithm, proposed in [1], to approximate the optimal stopping strategy. The DOS algorithm approximates the optimal stopping strategy by expressing the stopping time in terms of binary decision functions, which are approximated by deep neural networks. The DOS algorithm is very suitable for exposure computations, since the stopping strategy is computed directly, not as a consequence of the approximate value function, as is the case with most other algorithms. This procedure leads to highly accurate approximations of the exercise boundaries. Furthermore, we propose a neural network-based regression method (NN-regression) to approximate v t , which relies on the stopping strategy, approximated by the DOS algorithm. Although NN-regression is needed in order to fully approximate the distributions of future exposures, it turns out that for some target statistics, it is in fact sufficient to approximate the optimal stopping strategy. This is, for instance, the case for some estimates of the EE.
Another advantage of the method is that the EE and the PFE can easily be computed under the risk-neutral measure as well as under the real-world measure. When the EE is used to compute CVA, the calculations should be done under the risk-neutral measure, since CVA is a tradable asset and using any other measure would create opportunities for arbitrage. For other applications, such as computations of KVA, the EE should be calculated under the real-world measure. The reason for this is that the KVA, among other things, depends on the required CCR capital, which is a function of the EE, which in this case ideally should be calculated under the real-world measure. For an explanation of KVA in general, see e.g., [25], and for a discussion about the effect of different measures in KVA computations, see [26]. This paper is structured as follows. In Section 2, the mathematical formulation of a Bermudan option and its exposure profiles are given. Furthermore, the Bermudan option, as well as the exposure profiles, are reformulated to fit the algorithms introduced in later sections. The DOS algorithm is introduced in Section 3, and we propose some adjustments to make the training procedure more efficient. In Section 4, we present a classical ordinary least squares regression-based method, as well as a neural network-based regression method, to approximate pathwise option values. In Section 5, the EE and PFE formulas are reformulated in a way such that they can easily be estimated by a combination of the algorithms presented in Sections 3 and 4 and a simple Monte-Carlo sampling. Finally, in Section 6, numerical results of the different algorithms are presented for Bermudan options following Black-Scholes dynamics as well as Heston model dynamics.

Problem formulation
For $d \in \mathbb{N}$ and $N \in \mathbb{N} \setminus \{0\}$, let $X = (X_{t_n})_{n=0}^N$ be an $\mathbb{R}^d$-valued discrete-time Markov process on a complete probability space $(\Omega, \mathcal{F}, \mathbb{A})$. The outcome set $\Omega$ is the set of all possible realizations of the stochastic economy, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and we define $\mathcal{F}_n$ as the sub-$\sigma$-algebra generated by $(X_{t_m})_{m=0}^n$. With little loss of generality, we restrict ourselves to the case when $X$ is constructed from time snapshots of a continuous-time Markov process at monitoring dates $\{t_0, t_1, \ldots, t_N\}$. The probability measure $\mathbb{A}$ is a generic notation, representing either the real-world measure or the risk-neutral measure, denoted $\mathbb{P}$ and $\mathbb{Q}$, respectively.
If not specifically stated otherwise, equalities and inequalities of random variables should be interpreted in an ω-wise sense.

Bermudan options, stopping decisions and exercise regions
A Bermudan option is an exotic derivative that gives the buyer (or holder) the opportunity to exercise the option at a finite number of exercise dates, typically one per month. We define the exercise dates as $\mathbb{T} = \{t_0, t_1, \ldots, t_N\}$, which for simplicity coincide with the monitoring dates. Furthermore, for $t_n \in \mathbb{T}$, we let the remaining exercise dates be defined as $\mathbb{T}_n = \{t_n, t_{n+1}, \ldots, t_N\}$. Let $\tau$ be an $X$-stopping time, i.e., a random variable defined on $(\Omega, \mathcal{F}, \mathbb{A})$, taking on values in $\mathbb{T}$, such that for all $t_n \in \mathbb{T}$ it holds that the event $\{\tau = t_n\} \in \mathcal{F}_n$. Assume a risk-free rate $r \in \mathbb{R}$ and let the riskless savings account process $M(t) = e^{r(t - t_0)}$ be our numéraire. For all $t \in \mathbb{T}$, let $g_t \colon \mathbb{R}^d \to \mathbb{R}$ be a measurable function which returns the immediate pay-off of the option, if exercised at market state $(t, X_t = x \in \mathbb{R}^d)$. The initial value of a Bermudan option, i.e., the value at market state $(t_0, X_{t_0} = x_{t_0} \in \mathbb{R}^d)$, is given by
$$V_{t_0}(x_{t_0}) = \sup_{\tau \in \mathcal{T}} \mathbb{E}^{\mathbb{Q}}\Big[ \frac{g_\tau(X_\tau)}{M(\tau)} \,\Big|\, X_{t_0} = x_{t_0} \Big], \qquad (1)$$
where $\mathcal{T}$ is the set of all $X$-stopping times. Assuming the option has not been exercised prior to some $t_n \in \mathbb{T}$, the option value at $t_n$ is given by
$$V_{t_n}(X_{t_n}) = \sup_{\tau_n \in \mathcal{T}_n} \mathbb{E}_n\Big[ \frac{M(t_n)}{M(\tau_n)}\, g_{\tau_n}(X_{\tau_n}) \Big], \qquad (2)$$
where $\mathcal{T}_n$ is the set of all stopping times satisfying $\tau_n \in \mathbb{T}_n$ and we define $\mathbb{E}_n[\,\cdot\,] = \mathbb{E}^{\mathbb{Q}}[\,\cdot \mid X_{t_n}]$. To guarantee that (1) and (2) are well-posed, we assume that for all $t \in \mathbb{T}$ it holds that $\mathbb{E}^{\mathbb{Q}}\big[ |g_t(X_t)| \big] < \infty$. To concretize (1) and (2), we view the problem from the holder's perspective. At each monitoring date $t_n \in \mathbb{T}$, the holder of the option is facing the decision whether or not to exercise the option and receive the immediate pay-off. Due to the Markovian nature of $X$, the decisions are of Markov-type (Markov decision process (MDP)), meaning that all the information needed to make an optimal decision is contained in the current market state, $(t_n, X_{t_n})$. With this in mind, we define for each $t_n \in \mathbb{T}$ the exercise region, $\mathcal{E}_n$, in which it is optimal to exercise, and the continuation region, $\mathcal{C}_n$, in which it is optimal to hold on, by
$$\mathcal{E}_n = \{x \in \mathbb{R}^d : g_{t_n}(x) \geq V_{t_n}(x)\}, \qquad \mathcal{C}_n = \{x \in \mathbb{R}^d : g_{t_n}(x) < V_{t_n}(x)\}.$$
Note that $\mathcal{E}_n \cup \mathcal{C}_n = \mathbb{R}^d$ and $\mathcal{E}_n \cap \mathcal{C}_n = \emptyset$.
Below, we give a short motivation for how these decisions can be expressed mathematically and how we can formulate a stopping time in terms of (stopping) decisions at each exercise date. For a more detailed motivation we refer to [1]. We denote by $\mathcal{D}(A; B)$ the set of measurable functions mapping $A$ to $B$. Furthermore, for $P \in \mathbb{N}$, let $\mathcal{D}(A; B)^P$ denote the $P$:th Cartesian power of the set $\mathcal{D}(A; B)$. Define the decision functions $f_0, f_1, \ldots, f_N \in \mathcal{D}(\mathbb{R}^d; \{0, 1\})$, with $f_N \equiv 1$, and denote $\mathbf{f}_n = (f_n, f_{n+1}, \ldots, f_N)$ with $\mathbf{f} = \mathbf{f}_0$. An $X$-stopping time can then be defined as
$$\tau_n(\mathbf{f}_n) = \sum_{m=n}^{N} t_m\, f_m(X_{t_m}) \prod_{k=n}^{m-1} \big(1 - f_k(X_{t_k})\big), \qquad (3)$$
where the empty product is defined as 1. We emphasize that $\tau_n(\mathbf{f}_n) = \tau_n\big(f_n(X_{t_n}), f_{n+1}(X_{t_{n+1}}), \ldots, f_N(X_{t_N})\big)$, but to make the notation cleaner, we do not specify this dependency explicitly. When it is not clear from the context which process the decision functions act on, we will denote the stopping time by $\tau_n(\mathbf{f}_n(X))$, where we recall that $X = (X_{t_m})_{m=n}^N$. As a candidate for optimal stopping decisions, we define
$$f^*_n(\cdot) = I_{\{\,\cdot\, \in \mathcal{E}_n\}}, \quad n \in \{0, 1, \ldots, N\}. \qquad (4)$$
We define the fair price of an option that follows the strategy constructed by combining (3) and (4) as
$$V^*_{t_n}(X_{t_n}) = \mathbb{E}_n\Big[ \frac{M(t_n)}{M(\tau_n(\mathbf{f}^*_n))}\, g_{\tau_n(\mathbf{f}^*_n)}\big(X_{\tau_n(\mathbf{f}^*_n)}\big) \Big]. \qquad (5)$$
The following theorem states that (5) in fact coincides with the fair price of the option as defined in (2).
Theorem 2.1. For all $t_n \in \mathbb{T}$, and $V$ and $V^*$ as defined in (2) and (5), respectively, it holds that $V_{t_n}(X_{t_n}) = V^*_{t_n}(X_{t_n})$.
Proof. The proof is carried out by induction, and we assume the induction hypothesis
$$V^*_{t_{n+1}}(X_{t_{n+1}}) = V_{t_{n+1}}(X_{t_{n+1}}). \qquad (6)$$
We can rewrite $V^*_{t_n}(X_{t_n})$ by separating the decision at $t_n$ from the decisions at later exercise dates. By the law of total expectation and the assumption (6), the resulting conditional expectation can be expressed in terms of $V_{t_{n+1}}$. Inserting this expression and noting that $I_{\{\,\cdot\, \in \mathcal{E}_n\}} \in \mathcal{D}(\mathbb{R}^d; \{0, 1\})$ implies that $V^*_{t_n}(X_{t_n}) \geq V_{t_n}(X_{t_n})$. Moreover, $V^*_{t_n}(X_{t_n}) \leq V_{t_n}(X_{t_n})$, since any fixed stopping strategy is dominated by the supremum in (2), and therefore we conclude that $V^*_{t_n}(X_{t_n}) = V_{t_n}(X_{t_n})$.
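The stopping time $\tau_n(\mathbf{f}_n)$, built from binary decision functions with the empty-product convention, reduces path by path to picking the first date at which the decision function equals one. A small sketch with hand-made decision matrices (purely illustrative, not learned) makes this explicit:

```python
import numpy as np

def stopping_index(decisions: np.ndarray) -> np.ndarray:
    """decisions: (M, N+1) array of 0/1 values with decisions[:, -1] == 1
    (the convention f_N = 1). Returns, for each path, the index realizing
    sum_m m * f_m * prod_{k<m} (1 - f_k), i.e. the first m with f_m = 1."""
    return decisions.argmax(axis=1)   # argmax of a 0/1 row finds the first 1

# Three toy paths, four dates; the last decision is always 1.
f = np.array([[0, 1, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 1]])
tau = stopping_index(f)   # -> array([1, 3, 0])
```

The discounted cashflow of path $m$ is then obtained by reading off the pay-off at date `tau[m]`, which is exactly how the stopping rule enters the price formula.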

Exposure profiles
For $t_n \in \mathbb{T}$ and $\alpha \in (0, 1)$, we define the expected exposure (EE) and the potential future exposure (PFE) under the generic probability measure $\mathbb{A}$ as
$$\mathrm{EE}(t_n) = \mathbb{E}^{\mathbb{A}}\big[ E^{\mathrm{Ber}}_{t_n} \,\big|\, X_{t_0} = x_{t_0} \big], \qquad (9)$$
$$\mathrm{PFE}_\alpha(t_n) = \inf\big\{ y \in \mathbb{R} : \mathbb{A}\big( E^{\mathrm{Ber}}_{t_n} \leq y \,\big|\, X_{t_0} = x_{t_0} \big) \geq \alpha \big\}. \qquad (10)$$
Note that the option value, given by (2), is a conditional expectation of future cashflows, which is always measured under $\mathbb{Q}$. The exposure profiles under the $\mathbb{P}$-measure, on the other hand, are statistics of the option value at some future time $t_n$, under the assumption that the conditional distribution of $X_{t_n}$ given $X_{t_0} = x_{t_0}$ is considered under the $\mathbb{P}$-measure.
In the theorem below, the expected exposures are reformulated in terms of discounted cashflows and the decision functions introduced in Subsection 2.1. It will become clear in later sections that this is a more tractable form for the algorithms used in this paper.
Theorem 2.2. Let $\mathbb{Q}$ and $\mathbb{P}$ be probability measures on the measurable space $(\Omega, \mathcal{F})$ and assume that the laws of $X$ under $\mathbb{Q}$ and $\mathbb{P}$ are absolutely continuous with respect to the Lebesgue measure. Then, the expected exposure (9) under the $\mathbb{Q}$- and $\mathbb{P}$-measures satisfies (11) and (12), respectively, where
$$l(y_n, y_{n-1}, \ldots, y_0) = \prod_{k=1}^{n} \frac{q_{X_{t_k}|X_{t_{k-1}}}(y_k \mid y_{k-1})}{p_{X_{t_k}|X_{t_{k-1}}}(y_k \mid y_{k-1})},$$
with $q_{X_{t_k}|X_{t_{k-1}}}(y_k \mid y_{k-1})$ and $p_{X_{t_k}|X_{t_{k-1}}}(y_k \mid y_{k-1})$ being the transition densities of $X$ under the measures $\mathbb{Q}$ and $\mathbb{P}$, respectively. Note that $l(y_n, y_{n-1}, \ldots, y_0)$ is the Radon-Nikodym derivative of $\mathbb{P}$ and $\mathbb{Q}$ evaluated at $(y_n, y_{n-1}, \ldots, y_0)$.
The same argumentation holds for $q_{X_{t_n}, \ldots, X_{t_1}}(y_n, \ldots, y_1)$. The proof is finalized by inserting the product of density functions (14) in (13), writing the expression as a $\mathbb{Q}$-expectation, and, again, using the law of total expectation. Theorem 2.2 opens up for approximation of the expected exposure directly from the discounted cashflows. We now consider the special case when $X$ is described by a diffusion-type stochastic differential equation (SDE).
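For a concrete instance of the likelihood ratio $l$ in Theorem 2.2, consider a one-dimensional geometric Brownian motion observed on the monitoring grid, with drift $r$ under $\mathbb{Q}$ and drift $\mu$ under $\mathbb{P}$ (all parameter values below are assumptions for illustration). Both transition densities are lognormal, so $l$ is a product of lognormal density ratios, and since $l$ acts as a Radon-Nikodym weight, its sample average over $\mathbb{P}$-paths should be close to one:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-d GBM observed at n+1 dates: the transition densities are lognormal
# under both measures, differing only in the drift (r under Q, mu under P).
r, mu, sigma, dt, n, M = 0.02, 0.07, 0.2, 0.1, 10, 50_000

def log_density(y_next, y_prev, drift):
    # log of the lognormal transition density over one step of size dt
    m = np.log(y_prev) + (drift - 0.5 * sigma**2) * dt
    s = sigma * np.sqrt(dt)
    z = (np.log(y_next) - m) / s
    return -0.5 * z**2 - np.log(y_next * s * np.sqrt(2.0 * np.pi))

# Simulate paths under P, then evaluate l(y_n, ..., y_0) path by path.
y = np.empty((M, n + 1))
y[:, 0] = 1.0
for k in range(n):
    z = rng.standard_normal(M)
    y[:, k + 1] = y[:, k] * np.exp((mu - 0.5 * sigma**2) * dt
                                   + sigma * np.sqrt(dt) * z)

log_l = sum(log_density(y[:, k], y[:, k - 1], r)
            - log_density(y[:, k], y[:, k - 1], mu)
            for k in range(1, n + 1))
l = np.exp(log_l)   # ratio of Q- to P-path densities; its P-average is 1
```

Multiplying discounted cashflows by these weights, as in the theorem, converts averages of $\mathbb{P}$-simulated paths into $\mathbb{Q}$-expectations (and vice versa).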
Let $\mu^{\mathbb{Q}} \colon [t_0, t_N] \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma \colon [t_0, t_N] \times \mathbb{R}^d \to \mathbb{R}^{d \times d}$ be the drift and diffusion coefficients, respectively, and let $(W^{\mathbb{Q}}_t)_{t \in [t_0, t_N]}$ be a $d$-dimensional standard Brownian motion under the measure $\mathbb{Q}$. We assume that $\mu^{\mathbb{Q}}$ and $\sigma$ satisfy the usual conditions (see e.g., [27]) for existence of a unique, strong solution $X$ to
$$\mathrm{d}X_t = \mu^{\mathbb{Q}}(t, X_t)\,\mathrm{d}t + \sigma(t, X_t)\,\mathrm{d}W^{\mathbb{Q}}_t, \quad t \in [t_0, t_N]. \qquad (15)$$
Let $\mu^{\mathbb{P}} \colon [t_0, t_N] \times \mathbb{R}^d \to \mathbb{R}^d$ and assume that $\sigma(t, X_t)$ is invertible and that $\|\sigma^{-1}(t, X_t)\|_{\max}$ is bounded almost surely. Then, $(W^{\mathbb{P}}_t)_{t \in [t_0, t_N]}$ given by
$$W^{\mathbb{P}}_t = W^{\mathbb{Q}}_t - \int_{t_0}^{t} \sigma^{-1}(s, X_s)\big( \mu^{\mathbb{P}}(s, X_s) - \mu^{\mathbb{Q}}(s, X_s) \big)\,\mathrm{d}s$$
is a Brownian motion under the measure $\mathbb{P}$. Furthermore, under the measure $\mathbb{P}$ it holds almost surely that $X$ is described by
$$\mathrm{d}X_t = \mu^{\mathbb{P}}(t, X_t)\,\mathrm{d}t + \sigma(t, X_t)\,\mathrm{d}W^{\mathbb{P}}_t, \quad t \in [t_0, t_N].$$
As a way to rewrite $\mathrm{EE}^{\mathbb{P}}(t)$ as a conditional expectation under the measure $\mathbb{Q}$, we define a process $U = (U_t)_{t \in [t_0, t_N]}$ as the solution to
$$\mathrm{d}U_t = \mu^{\mathbb{P}}(t, U_t)\,\mathrm{d}t + \sigma(t, U_t)\,\mathrm{d}W^{\mathbb{Q}}_t, \quad U_{t_0} = x_{t_0}. \qquad (16)$$
The reason for introducing this process is that $U$ has the same distribution under the measure $\mathbb{Q}$ as $X$ has under the measure $\mathbb{P}$. We can then express $\mathrm{EE}^{\mathbb{P}}(t_n)$ with only $\mathbb{Q}$-expectations in the following way
$$\mathrm{EE}^{\mathbb{P}}(t_n) = \mathbb{E}^{\mathbb{P}}\big[ V_{t_n}(X_{t_n})\, I_{\{\tau(\mathbf{f}^*(X)) > t_n\}} \,\big|\, X_{t_0} = x_{t_0} \big]$$
$$\phantom{\mathrm{EE}^{\mathbb{P}}(t_n)} = \mathbb{E}^{\mathbb{Q}}\big[ V_{t_n}(U_{t_n})\, I_{\{\tau(\mathbf{f}^*(U)) > t_n\}} \,\big|\, U_{t_0} = x_{t_0} \big]. \qquad (17)$$
Remark 2.1. Regarding the equality between the right-hand side of the first line and the second line, we want to emphasize that $X$ under the measure $\mathbb{P}$ and $U$ under the measure $\mathbb{Q}$ do not represent the same stochastic process (as they would have done after a change of measure). If they were, then the conditional expectation would change, and the equality would not hold. To enforce the equality to hold we could have corrected with the Radon-Nikodym derivative and obtained (12). However, to find a way to write $\mathrm{EE}^{\mathbb{P}}(t_n)$ without having to include the Radon-Nikodym derivative, we introduce another process, $U$, which is distributed in such a way that the equality holds, i.e., the conditional expectation under $\mathbb{P}$, when using $X$, should equal the conditional expectation under $\mathbb{Q}$, when using $U$. For this to hold, it is sufficient that the distribution of $X$ under the measure $\mathbb{P}$ equals the distribution of $U$ under the measure $\mathbb{Q}$, and this is true when $X$ follows (15) and $U$ follows (16).
Before we can get rid of the inner expectation in (17), we need to define a process following the dynamics of (16) on $[t_0, t_n]$, and the dynamics of (15) on $[t_n, t_N]$ with (stochastic) initial condition $X_{t_n} = U_{t_n}$. We denote such a process by $\widetilde{X}^{t_n} = (\widetilde{X}^{t_n}_t)_{t \in [t_0, t_N]}$, and conclude that $\widetilde{X}^{t_n}$ should satisfy the following SDE
$$\mathrm{d}\widetilde{X}^{t_n}_t = \mu^{\mathbb{P},\mathbb{Q},t_n}(t, \widetilde{X}^{t_n}_t)\,\mathrm{d}t + \sigma(t, \widetilde{X}^{t_n}_t)\,\mathrm{d}W^{\mathbb{Q}}_t, \quad \widetilde{X}^{t_n}_{t_0} = x_{t_0}, \qquad (18)$$
where $\mu^{\mathbb{P},\mathbb{Q},t_n}(t, \cdot) = \mu^{\mathbb{P}}(t, \cdot)\, I_{\{t \leq t_n\}} + \mu^{\mathbb{Q}}(t, \cdot)\, I_{\{t > t_n\}}$. Note that we have implicitly assumed that also $\mu^{\mathbb{P}}$ satisfies the usual conditions for existence of a unique strong solution, $U$, to (16). As a consequence of this assumption we are also guaranteed that there exists a unique strong solution, $\widetilde{X}^{t_n}$, to (18). We can then use the law of total expectation to obtain
$$\mathrm{EE}^{\mathbb{P}}(t_n) = M(t_n)\, \mathbb{E}^{\mathbb{Q}}\Big[ \frac{g_{\tau_n(\mathbf{f}^*_n)}\big(\widetilde{X}^{t_n}_{\tau_n(\mathbf{f}^*_n)}\big)}{M(\tau_n(\mathbf{f}^*_n))}\, I_{\{\tau(\mathbf{f}^*) > t_n\}} \,\Big|\, \widetilde{X}^{t_n}_{t_0} = x_{t_0} \Big], \qquad (19)$$
where we remind ourselves that $\tau(\mathbf{f}^*) = \tau(\mathbf{f}^*(\widetilde{X}^{t_n}))$ and $\tau_n(\mathbf{f}^*_n) = \tau_n(\mathbf{f}^*_n(\widetilde{X}^{t_n}))$. In the next sections we describe a method to approximate $\mathbf{f}^*(\cdot)$. It is then straightforward to estimate (11), (12) and (19) by Monte-Carlo sampling. Furthermore, in Section 4, we introduce a method to approximate the price function $V_{t_n}(\cdot)$, which makes it straightforward to also approximate the potential future exposure, (10).
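The drift-switching process can be simulated with a standard Euler-Maruyama scheme in which the drift coefficient changes from the real-world drift to the risk-neutral drift at the switch date. The sketch below uses assumed one-dimensional GBM-type coefficients purely for illustration; none of the parameter values are from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_hybrid(x0, t_grid, n_switch, mu_P, mu_Q, sigma, n_paths):
    """Euler-Maruyama scheme for the drift-switching SDE: real-world drift
    mu_P on steps starting before the switch date, risk-neutral drift mu_Q
    on steps starting at or after it."""
    x = np.full(n_paths, float(x0))
    path = [x.copy()]
    for i in range(len(t_grid) - 1):
        dt = t_grid[i + 1] - t_grid[i]
        mu = mu_P if t_grid[i] < t_grid[n_switch] else mu_Q
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + mu(t_grid[i], x) * dt + sigma(t_grid[i], x) * dw
        path.append(x.copy())
    return np.stack(path, axis=1)   # shape: (n_paths, len(t_grid))

# Illustrative GBM-type coefficients (assumed, not the paper's).
t_grid = np.linspace(0.0, 1.0, 11)
paths = simulate_hybrid(
    x0=1.0, t_grid=t_grid, n_switch=5,
    mu_P=lambda t, x: 0.08 * x,    # assumed real-world drift
    mu_Q=lambda t, x: 0.02 * x,    # assumed risk-neutral drift
    sigma=lambda t, x: 0.2 * x,
    n_paths=5_000,
)
```

One such set of paths serves a single exposure date $t_n$; estimating the full real-world exposure profile requires one drift-switching simulation per exercise date.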

Learning stopping decisions
In the first part of this section, we present the DOS algorithm, which was proposed in [1]. The idea is to use fully connected neural networks to approximate the decision functions introduced in the previous section. The neural networks are optimized backwards in time with the objective to maximize the expected discounted cashflow at each exercise date. In the second part of this section, we suggest some adjustments that can be done in order to make the optimization more efficient.

The Deep Optimal Stopping algorithm
As indicated above, the core of the algorithm is to approximate decision functions. To be more precise, for $n \in \{0, 1, 2, \ldots, N\}$, the decision function $f_n$ is approximated by a fully connected neural network of the form $f^{\theta_n}_n \colon \mathbb{R}^d \to \{0, 1\}$, where $\theta_n \in \mathbb{R}^{q_n}$ is a vector containing all the $q_n \in \mathbb{N}$ trainable parameters in network $n$. Since binary decision functions are discontinuous, and therefore unsuitable for gradient-type optimization algorithms, we use as an intermediate step the neural network $F^{\theta_n}_n \colon \mathbb{R}^d \to (0, 1)$. Instead of a binary decision, the output of the neural network $F^{\theta_n}_n$ can be viewed as the probability of exercise being optimal. This output is then mapped to 1 for values above (or equal to) 0.5, and to 0 otherwise, by defining
$$f^{\theta_n}_n(\cdot) = I_{\{F^{\theta_n}_n(\cdot) \geq 1/2\}}.$$
The objective is to find $\theta_n$ such that the expected discounted cashflow obtained by stopping according to $f^{\theta_n}_n$ at $t_n$, and according to the optimal decisions afterwards,
$$\mathbb{E}^{\mathbb{Q}}\Big[ f^{\theta_n}_n(X_{t_n}) \frac{g_{t_n}(X_{t_n})}{M(t_n)} + \big(1 - f^{\theta_n}_n(X_{t_n})\big) \frac{g_{\tau_{n+1}(\mathbf{f}^*_{n+1})}\big(X_{\tau_{n+1}(\mathbf{f}^*_{n+1})}\big)}{M(\tau_{n+1}(\mathbf{f}^*_{n+1}))} \Big], \qquad (20)$$
is as close as possible to $V_{t_n}(X_{t_n})$ (in mean squared sense), where $\mathbf{f}^*_{n+1}$ is the vector of optimal decision functions, defined in (4). Although (20) is an accurate representation of the optimization problem, it gives us some practical problems. In general, we have no access to either $\mathbf{f}^*_{n+1}$ or the distribution of $V_{t_n}(X_{t_n})$, and in most cases the expectation needs to be approximated. We notice, however, that at maturity, the option value is equal to its intrinsic value, i.e., $V_{t_N}(\cdot) \equiv g_{t_N}(\cdot)$, which implies that $f^*_N \equiv 1$ and $\tau_N(f^*_N) = t_N$. With this insight, we can write (20) with $n = N - 1$ in the form
$$\mathbb{E}^{\mathbb{Q}}\Big[ f^{\theta_{N-1}}_{N-1}(X_{t_{N-1}}) \frac{g_{t_{N-1}}(X_{t_{N-1}})}{M(t_{N-1})} + \big(1 - f^{\theta_{N-1}}_{N-1}(X_{t_{N-1}})\big) \frac{g_{t_N}(X_{t_N})}{M(t_N)} \Big], \qquad (21)$$
which can be approximated by Monte-Carlo sampling. Given $M \in \mathbb{N}$ samples, distributed as $X$, which for $m \in \{1, 2, \ldots, M\}$ are denoted by $x(m) = (x_{t_n}(m))_{n=0}^N$, we can approximate (21) by
$$\frac{1}{M} \sum_{m=1}^{M} \Big( f^{\theta_{N-1}}_{N-1}(x_{t_{N-1}}(m)) \frac{g_{t_{N-1}}(x_{t_{N-1}}(m))}{M(t_{N-1})} + \big(1 - f^{\theta_{N-1}}_{N-1}(x_{t_{N-1}}(m))\big) \frac{g_{t_N}(x_{t_N}(m))}{M(t_N)} \Big). \qquad (22)$$
Note that the only unknown entity in (22) is the parameter $\theta_{N-1}$ in the decision function $f^{\theta_{N-1}}_{N-1}$. Furthermore, we note that we want to find $\theta_{N-1}$ such that (22) is maximized, since it represents the average cashflow in $[t_{N-1}, t_N]$. Once $\theta_{N-1}$ is optimized, we use this parameter and find $\theta_{N-2}$ such that the average cashflow on $[t_{N-2}, t_N]$ is maximized.
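A minimal sketch of this optimization at the last decision date: a one-layer "network" $F_\theta(x) = \mathrm{sigmoid}(wx + b)$ is trained by plain gradient ascent on the Monte-Carlo average cashflow, then thresholded at 1/2 to obtain the hard decision. All model choices (one risk factor, a put pay-off, the single-layer network, plain gradient ascent instead of Adam) are simplifying assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy market: one risk factor at t_{N-1} and t_N, a put pay-off, discounting.
M, r, dt, K = 50_000, 0.05, 0.5, 1.0
x1 = 1.0 + 0.15 * rng.standard_normal(M)                  # X at t_{N-1}
x2 = x1 + 0.15 * np.sqrt(dt) * rng.standard_normal(M)     # X at t_N
g1, g2 = np.maximum(K - x1, 0.0), np.maximum(K - x2, 0.0)
d1, d2 = np.exp(-r * dt), np.exp(-r * 2 * dt)             # discount factors

# One-layer soft decision F_theta(x) = sigmoid(w*x + b); gradient ascent on
# the Monte-Carlo objective  mean(F * d1*g1 + (1 - F) * d2*g2).
w, b, lr = 0.0, 0.0, 1.0
for _ in range(500):
    F = 1.0 / (1.0 + np.exp(-(w * x1 + b)))
    dF = (d1 * g1 - d2 * g2) * F * (1.0 - F)   # d(objective)/d(w*x + b), per path
    w += lr * float(np.mean(dF * x1))
    b += lr * float(np.mean(dF))

# Hard decision f_{N-1} = 1{F >= 1/2} and the resulting average cashflow.
f1 = 1.0 / (1.0 + np.exp(-(w * x1 + b))) >= 0.5
value = float(np.mean(np.where(f1, d1 * g1, d2 * g2)))
```

Since any stopping rule is suboptimal, `value` is a biased-low estimate; up to Monte-Carlo and optimization noise, it should be at least as large as the two trivial strategies (always stop at $t_{N-1}$, never stop before $t_N$).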
In the next section, we explain the parameters θ n and present the structure for the neural networks used in this paper.

Specification of the neural networks used
For completeness, we introduce all the trainable parameters that are contained in each of the parameters θ 1 , θ 2 , . . . , θ N −1 , and present the structure of the networks.
• We denote the dimension of the input layers by $D_{\mathrm{input}} \in \mathbb{N}$, and we assume the same input dimension for all networks $n \in \{1, 2, \ldots, N-1\}$. The input is assumed to be the market state $x^{\mathrm{train}}_{t_n} \in \mathbb{R}^d$, and hence $D_{\mathrm{input}} = d$. However, we can add additional information to the input that is mathematically redundant but helps the training, e.g., the immediate pay-off, to obtain as input $\big(x^{\mathrm{train}}_{t_n}(m), g_{t_n}(x^{\mathrm{train}}_{t_n}(m))\big) \in \mathbb{R}^{d+1}$, which would give $D_{\mathrm{input}} = d + 1$;
• For network $n \in \{1, 2, \ldots, N-1\}$, we denote the number of layers 13 by $L_n \in \mathbb{N}$, and for layer $\ell \in \{1, 2, \ldots, L_n\}$, the number of nodes by $N_{\ell,n} \in \mathbb{N}$. Note that $N_{1,n} = D_{\mathrm{input}}$;
• For network $n \in \{1, 2, \ldots, N-1\}$ and layer $\ell \in \{2, 3, \ldots, L_n\}$, we denote the weight matrix, acting between layers $\ell - 1$ and $\ell$, by $w_{n,\ell} \in \mathbb{R}^{N_{\ell-1,n} \times N_{\ell,n}}$, and the bias vector by $b_{n,\ell} \in \mathbb{R}^{N_{\ell,n}}$;
• For network $n \in \{1, 2, \ldots, N-1\}$ and layer $\ell \in \{2, 3, \ldots, L_n\}$, we denote the (scalar) activation function by $a_{n,\ell} \colon \mathbb{R} \to \mathbb{R}$ and the vector activation function by $\mathbf{a}_{n,\ell} \colon \mathbb{R}^{N_{\ell,n}} \to \mathbb{R}^{N_{\ell,n}}$, which for $x = (x_1, x_2, \ldots, x_{N_{\ell,n}})$ is defined by $\mathbf{a}_{n,\ell}(x) = \big(a_{n,\ell}(x_1), a_{n,\ell}(x_2), \ldots, a_{n,\ell}(x_{N_{\ell,n}})\big)^{\top}$;
• The output of our network should belong to $(0, 1) \subset \mathbb{R}$, meaning that the output dimension of our neural network, denoted by $D_{\mathrm{output}}$, should equal 1. To enforce the output to only take on values in $(0, 1)$, we restrict ourselves to output activation functions of the form $a_{n,L_n} \colon \mathbb{R} \to (0, 1)$.
Furthermore, since we have $N - 1$ neural networks, we denote by $\boldsymbol{\theta}_n = (\theta_n, \theta_{n+1}, \ldots, \theta_{N-1})$ the trainable parameters in the neural networks at exercise dates $\mathbb{T}_n$, and set $\boldsymbol{\theta} = \boldsymbol{\theta}_1$.
13 Input and output layers included.

Training and testing the algorithm
The main idea of the training and testing procedure is to fit the parameters to some training data, and then use the fitted parameters to make informed decisions with respect to some unseen, so-called, test data. The training part of the algorithm is summarized below in pseudo code.
Training phase: Sample $M_{\mathrm{train}} \in \mathbb{N}$ independent realizations of $X$, which for $m \in \{1, 2, \ldots, M_{\mathrm{train}}\}$ are denoted $(x^{\mathrm{train}}_{t_n}(m))_{n=0}^N$. At maturity, define the discounted cashflow as $CF_N(m) = g_{t_N}(x^{\mathrm{train}}_{t_N}(m)) / M(t_N)$. For $n = N-1, N-2, \ldots, 1$, do the following:
1. Find $\hat{\theta}_n$ by minimizing the (empirical) loss function
$$\ell(\theta_n; x^{\mathrm{train}}) = -\frac{1}{M_{\mathrm{train}}} \sum_{m=1}^{M_{\mathrm{train}}} \Big( F^{\theta_n}_n(x^{\mathrm{train}}_{t_n}(m)) \frac{g_{t_n}(x^{\mathrm{train}}_{t_n}(m))}{M(t_n)} + \big(1 - F^{\theta_n}_n(x^{\mathrm{train}}_{t_n}(m))\big) CF_{n+1}(m) \Big),$$
where $x^{\mathrm{train}}$ refers to all the $M_{\mathrm{train}}$ samples of $(x^{\mathrm{train}}_{t_n})_{n=0}^N$. The minus sign in the loss function transforms the problem from a maximization to a minimization problem, which is the standard formulation in the machine learning community. Note the straightforward relationship between the loss function and the average cashflows in (22). In practice, the data is often divided into mini-batches, for which the loss function is minimized consecutively.
2. For $m = 1, 2, \ldots, M_{\mathrm{train}}$, update the discounted cashflow according to
$$CF_n(m) = f^{\hat{\theta}_n}_n(x^{\mathrm{train}}_{t_n}(m)) \frac{g_{t_n}(x^{\mathrm{train}}_{t_n}(m))}{M(t_n)} + \Big(1 - f^{\hat{\theta}_n}_n(x^{\mathrm{train}}_{t_n}(m))\Big) CF_{n+1}(m).$$
The performance of the algorithm is not sensitive to the specific choices of the number of hidden layers, number of nodes, optimization algorithm, etc. Below is a list of the most relevant parameters/structural choices:
• Initialization of the trainable parameters, where a typical procedure is to initialize the biases to 0 and sample the weights independently from a normal distribution;
• The activation functions $a_{n,\ell}$, which are used to add a non-linear structure to the neural networks. In our case we have the strict requirement that the activation function of the output layer maps $\mathbb{R}$ to $(0, 1)$. This could, however, be relaxed as long as the activation function is both upper and lower bounded, since we can always scale and shift such an output to take on values only in $(0, 1)$. For a discussion of different activation functions, see e.g., [28];
• The batch size, $B_n \in \{1, 2, \ldots, M_{\mathrm{train}}\}$, is the number of training samples used for each update of $\theta_n$, i.e., with $B_n = M_{\mathrm{train}}$, the loss function is of the form defined in step 1 above. Note that if we want all batches to be of equal size, we need to choose $B_n$ to be a divisor of $M_{\mathrm{train}}$;
• For each update of $\theta_n$, we use an optimization algorithm, for which a common choice is the Adam optimizer, proposed in [29]. Depending on the choice of optimization algorithm, there are different parameters related to the specific algorithm to be chosen. One example is the so-called learning rate, which decides how much the parameter $\theta_n$ is adjusted after each batch.
Testing phase: Sample $M_{\mathrm{test}} \in \mathbb{N}$ independent realizations of $X$, which for $m \in \{1, 2, \ldots, M_{\mathrm{test}}\}$ are denoted $x^{\mathrm{test}}(m) = (x^{\mathrm{test}}_{t_n}(m))_{n=0}^N$, and denote $\mathbf{f}^{\hat{\boldsymbol{\theta}}_n} = (f^{\hat{\theta}_n}_n, f^{\hat{\theta}_{n+1}}_{n+1}, \ldots, f_N)$ and $\mathbf{f}^{\hat{\boldsymbol{\theta}}} = \mathbf{f}^{\hat{\boldsymbol{\theta}}_0}$. We then obtain for sample $m$, i.e., $x^{\mathrm{test}}(m)$, the stopping rule $\tau(\mathbf{f}^{\hat{\boldsymbol{\theta}}}) = \tau_0\big(\mathbf{f}^{\hat{\boldsymbol{\theta}}}(x^{\mathrm{test}}(m))\big)$. The estimated option value at $t_0$ is then given by
$$\hat{V}_{t_0}(x_{t_0}) = \frac{1}{M_{\mathrm{test}}} \sum_{m=1}^{M_{\mathrm{test}}} \frac{g_{\tau(\mathbf{f}^{\hat{\boldsymbol{\theta}}})}\big(x^{\mathrm{test}}_{\tau(\mathbf{f}^{\hat{\boldsymbol{\theta}}})}(m)\big)}{M\big(\tau(\mathbf{f}^{\hat{\boldsymbol{\theta}}})\big)}, \qquad (24)$$
where we recall that $\tau(\mathbf{f}^{\hat{\boldsymbol{\theta}}}) = \tau_0\big(\mathbf{f}^{\hat{\boldsymbol{\theta}}}(x^{\mathrm{test}}(m))\big)$ depends on $m$. Note that, by construction, any stopping strategy is suboptimal, implying that the estimate (24) is biased low. It should be pointed out that it is possible to derive a biased-high estimate of (1) from a dual formulation of the optimal stopping problem, which is described in [1]. In addition, numerical results in [1] show a tight interval between the biased-low and biased-high estimates for a wide range of different problems.

Proposed adjustments to the algorithm
The presentation of the DOS algorithm in [1] is in a general form. In addition to the pricing of Bermudan options, the authors considered the non-Markovian problem of optimally stopping a fractional Brownian motion (this is done by including the historical states in the current state of the system). Since the aim of this paper is more specific (to approximate exposure profiles of Bermudan options), it is natural to use more of the known underlying structure of this specific problem. In this section we use some properties of the problem at hand and propose some adjustments to the DOS algorithm which make the training procedure more efficient.

Reuse of neural network parameters
The first proposed adjustment is to reuse parameters of neural networks that have already been optimized. We note that for a single Bermudan option (possibly with a high-dimensional underlying asset) the pay-off functions are identical at all exercise dates, i.e., $g_{t_n} = g_{t_m}$ for all $t_n, t_m \in \mathbb{T}$. In this case the stopping rules at adjacent exercise dates are similar, especially when $t_{n+1} - t_n$ is small. We therefore use the stopping decisions at $t_{n+1}$ as an "initial guess" of the stopping decisions at $t_n$. This is done by initializing the trainable parameters in network $n$ with the already optimized parameters in network $n+1$, i.e., at $t_n$, initialize $\theta_n$ by $\hat{\theta}_{n+1}$. This allows us to use smaller learning rates, leading to a more efficient algorithm.

Use simple stopping decisions when possible
The term simple stopping decisions is loosely defined as stopping decisions that follow directly, without any sophisticated optimization algorithm. The most obvious example is when the contract is out-of-the-money, in which case it is never optimal to exercise. For $t_n \in \mathbb{T}$, we define the sets of in-the-money points and out-of-the-money points as
$$\mathrm{ITM}_n = \{x \in \mathbb{R}^d : g_{t_n}(x) > 0\}, \qquad \mathrm{OTM}_n = \{x \in \mathbb{R}^d : g_{t_n}(x) \leq 0\},$$
respectively. Another, less obvious, insight is that, given a single Bermudan option with identical pay-off functions at all exercise dates, if it is optimal to exercise at $(t_n, x)$, then it is also optimal to exercise at $(t_{n+1}, x)$. In other words, the exercise region is non-decreasing with time. This statement is formulated as a theorem below.
Theorem 3.1. Define the set of exercise dates by $\{t_0, \ldots, t_n, t_{n+1}, \ldots, t_N, t_{N+1}\}$, with $t_{N+1} = t_N + \Delta$, and let $\Delta = t_{n+1} - t_n = t_{N+1} - t_N \geq 0$. Note that an equidistant time grid is sufficient, but not necessary, for the above to be satisfied. Moreover, let the maturities $t_N$ and $t_N + \Delta$ indicate otherwise identical contracts of the form (2), with $g = g_{t_n}$ for all exercise dates $t_n$. Then, for any $\bar{x} \in \mathcal{E}_n$, it holds that $\bar{x} \in \mathcal{E}_{n+1}$.
Theorem 3.1 shows that the exercise region is non-decreasing with time, but since the optimization of the neural network parameters is carried out backwards in time, we instead use the fact that the continuation region is non-increasing with time. In practice, this leads to the following three alternatives:
A1. Use all training data in the optimization algorithm (as the algorithm is described in Subsection 3.1);
A2. At $t_n \in \mathbb{T}$, use the subset of the training data satisfying $x^{\mathrm{train}}_{t_n}(m) \in \mathrm{ITM}_n$ in the optimization algorithm, and define the decision functions as $f^{\hat{\theta}_n}_n(\cdot) = I_{\{g_{t_n}(\cdot) > 0\}}\,\big(a \circ F^{\hat{\theta}_n}_n\big)(\cdot)$;
A3. At $t_n$, use the subset of the training data satisfying $x^{\mathrm{train}}_{t_n}(m) \in \mathcal{E}_{n+1}$ in the optimization algorithm, and define the decision functions as $f^{\hat{\theta}_n}_n(\cdot) = f^{\hat{\theta}_{n+1}}_{n+1}(\cdot)\,\big(a \circ F^{\hat{\theta}_n}_n\big)(\cdot)$.
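The three alternatives differ only in which training points enter the loss at date $t_n$. A small sketch with a toy one-dimensional put (the pay-off, the state distribution, and the stand-in for the already-trained decision at $t_{n+1}$ are all assumptions) shows the nested candidate sets:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy states and pay-off at a fixed date t_n; "prev_exercise" stands in for
# the already-trained decision f_{n+1} evaluated on the same paths.
x = rng.lognormal(mean=0.0, sigma=0.2, size=10_000)   # training states at t_n
g = np.maximum(1.0 - x, 0.0)                          # put-style pay-off
prev_exercise = x < 0.8                               # toy f_{n+1} = 1 region

mask_A1 = np.ones_like(x, dtype=bool)   # A1: all training data
mask_A2 = g > 0.0                       # A2: in-the-money points only
mask_A3 = prev_exercise                 # A3: points inside E_{n+1}

# Consistent with Theorem 3.1, the candidate sets are nested: every point
# of E_{n+1} is in the money, and every ITM point is in the full data set.
nested = (~mask_A3 | mask_A2).all() and (~mask_A2 | mask_A1).all()
```

In a full implementation, each mask simply selects the rows of the training batch that enter the loss at date $t_n$; the fraction of retained data shrinks from A1 to A3.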
In Figure 1 the three cases above are visualized for a two-dimensional max call option at one of the exercise dates. To the left we have the blue points belonging to E n+1 (used for optimization in case 3), the blue and red points belong to ITM n (used for optimization in case 2) and the blue, red and yellow points are all the available data (used for optimization in case 1). To the right, we see the fraction of the total data used in each case at each exercise date.

Learning pathwise option values
In Section 3, an algorithm to learn stopping decisions was described, and (24) gives an approximation of the option value at time $t_0$, given some deterministic initial state $X_{t_0} = x_{t_0}$. As described in Subsection 2.2, to compute exposure profiles we sometimes need additional information about the future distribution of the option values. In this section, we present two methods to approximate the pathwise option values at all exercise dates. The first method is the well-established ordinary least squares (OLS) regression and the second is a neural network-based least squares regression. Both methods rely on projections of conditional expectations onto a finite-dimensional function space.

Formulation of regression problem
Central to the regression algorithms presented in this section is the cashflow process Y = (Y tn ) N n=0 , defined as 14

Y tn = e −r(τ n (f * )−t n ) g τ n (f * ) (X τ n (f * ) ),     (25)

where τ n (·) and f * are defined in (3) and (4), respectively. We assume that for t n ∈ T it holds that which also implies that E Q [Y 2 tn ] < ∞. The following theorem states that the option value, at some t n ∈ T, is equivalent (in the L 2 sense) to the so-called regression function. Furthermore, we see that the regression function can be obtained by solving a minimization problem.
Theorem 4.1. Let Y tn be as defined in (25) and, for h ∈ D(R d ; R), define the so-called L 2 -risk as It then holds that Proof. For an arbitrary function v : R d → R, it holds that By the law of total expectation, the last term satisfies which is clearly minimized when v ≡ m. Also, notice that m(X tn ) = V * tn (X tn ), and by Theorem 2.1, which concludes the proof.
In practice, the distribution of (X, Y ) is usually unknown. On the other hand, we are often able to generate samples distributed as 15 (X, Y ). We consider some t n ∈ T, and generate M reg ∈ N independent realizations of the regression pair (X tn , Y tn ), which we denote by (x reg tn (m), y reg tn (m)) Mreg m=1 . Similarly, we define the empirical L 2 -risk by With a fixed sample of regression pairs it is of course possible to find a function h ∈ D(R d ; R) such that the empirical L 2 -risk is zero. However, such a function is not a consistent estimator in general. Therefore, we want to use a smaller class of more regular functions. When choosing the function class, which we denote by A M , we need to keep two aspects in mind:
P1. It should be "rich enough" to be able to approximate V tn (·) sufficiently accurately;
P2. It should not be "too rich", since that may cause the empirical L 2 -risk to be an inaccurate approximation of the L 2 -risk.
Since this problem is more severe for smaller M reg , it is reasonable to have the sample size in mind when choosing the function class, hence the subscript M on A M , where "reg" is dropped for notational convenience. A too rich function class may lead to what is known as overfitting in the machine learning community.
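The tension between (P1) and (P2) can be seen in a small experiment: a richer class always drives the empirical L 2 -risk down on the training sample, without necessarily lowering the true L 2 -risk on fresh samples. A hypothetical one-dimensional illustration, with polynomial classes standing in for A M :

```python
import numpy as np

def empirical_l2_risk(h, x, y):
    """Sample analogue of the L2-risk E[(h(X) - Y)^2]."""
    return np.mean((h(x) - y) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = x**2 + 0.1 * rng.standard_normal(200)         # noisy regression targets

mild = np.polynomial.Polynomial.fit(x, y, deg=2)  # smaller class
rich = np.polynomial.Polynomial.fit(x, y, deg=15) # "too rich" class

# The richer (nested) class fits the sample at least as well ...
in_sample_gap = empirical_l2_risk(mild, x, y) - empirical_l2_risk(rich, x, y)

# ... but on fresh samples the ordering can reverse (overfitting).
x_new = rng.uniform(0.0, 1.0, 10_000)
y_new = x_new**2 + 0.1 * rng.standard_normal(10_000)
out_mild = empirical_l2_risk(mild, x_new, y_new)
out_rich = empirical_l2_risk(rich, x_new, y_new)
```

Since the degree-2 class is contained in the degree-15 class, the in-sample gap is always non-negative; the out-of-sample risks carry no such guarantee.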
Given a sample and a function class A M , we define the empirical regression function as Since (26) holds for arbitrary v, we can write the L 2 -risk of the empirical regression function and the option value as This in turn can be written in terms of the so-called estimation error (first term) and the approximation error (second term), i.e., The approximation error measures how well the option value can be estimated by functions in A M , which corresponds to (P1) above. The estimation error is the difference between the L 2 -risk of the estimator m M and the optimal h in A M , which corresponds to (P2) above.
There is, however, one problem with the approximation error above; we have assumed that we can sample realizations of (X, Y ), while in practice we are only able to sample from (X, Ŷ ), with Ŷ = (Ŷ t ) t∈[t0,t N ] given by

Ŷ tn = e −r(τ n (fθ)−t n ) g τ n (fθ) (X τ n (fθ) ).
By also taking into account that the regression is carried out against an approximation of Y , (27) instead becomes The first two lines in (28) are, again, the estimation error and the approximation error, respectively. The third line represents the difference between how well a function in A M can approximate Ŷ tn and Y tn , and the final line is the L 2 -risk between our approximation of the discounted cashflow and the true discounted cashflow. Furthermore, note that the equality in (27) has changed to an inequality in (28).
In the next section, we introduce the two different types of function classes that are used in this paper.

Ordinary least squares regression
At t n ∈ T, we assume that V tn (X tn ) can be represented by a linear combination of a countable set of F n -measurable basis functions. We denote the basis functions by {φ b } ∞ b=0 and, given optimal parameters α, we obtain the representation (29). We now want to estimate (29) by projecting a sample of realizations of (X tn , Y tn ) onto the first B basis functions. This procedure is similar to the Least Squares Method (LSM), proposed in [17]. In the LSM, only ITM samples are used, which is motivated by the fact that it is never optimal to exercise an option that is OTM and the objective is to find the optimal exercise strategy. Furthermore, the authors claim that the number of basis functions needed to obtain an accurate approximation is significantly reduced, since the approximation region is reduced by only considering ITM paths. However, this is not possible in our case, since we need to approximate the option value everywhere 16 . On the other hand, the exercise region, E n , has already been approximated (as described in Subsection 3.1) and the option value in the exercise region is always known. This means that, similar to the LSM, the approximation region can be reduced (in many cases significantly) by only considering samples belonging to the continuation region, C n . Given a sample of regression pairs (x reg tn (m), y reg tn (m)), we seek parameters α tn such that the following L 2 -risk is minimized For notational convenience, we introduce the compact notation x tn = (x reg tn (1), . . . , x reg tn (M Cn reg )), y tn = (y reg tn (1), . . . , y reg tn (M Cn reg )) and It is a well-known fact (see e.g., [30]) that α̂ tn is given by (31), where we note that, if we choose linearly independent basis functions, the matrix inverse in (31) exists almost surely, since X tn has a density 17 . We define the estimator as in (32). Since the LSM is one of the most established algorithms for valuation of Bermudan options, its theoretical properties are extensively studied and many of the results can also be applied to the algorithm above.
However, we first need to make an assumption regarding M Cn reg . Assume that there exists c > 0 such that Q{X tn ∈ C n } ≥ c. It then holds for any C ∈ R that which implies that M Cn reg approaches infinity, almost surely, as M reg approaches infinity. Since the regression pairs are independently and identically distributed, it holds that v̂ B,M (X tn ) converges both in mean square and in probability to v B (X tn ) as M reg approaches infinity (see e.g., [31]). To make the comparison with the neural network approximator (to be defined in the next section) clearer, we use the notation v̂ OLS tn (·) = v̂ B,M (t n , ·), where we assume that B and M are chosen such that both accuracy and time complexity are taken into account. A nice property of OLS regression is that, given B and a sample of regression pairs, we have a closed-form expression for the optimal parameters and, in turn, also for the regression function (32). On the other hand, we may face memory or runtime issues for large B and M Cn reg due to (31). This is a problem especially when we want to approximate a complicated function surface over a large approximation region. For example, consider an option based on 50 correlated assets. If we want to use the first and second-order polynomials (including cross-terms) as basis functions, we have B = 50(50+3)/2 = 1325, which is often too large for practical purposes. We should also keep in mind that this corresponds to an approximation with polynomials of degree 2, which is usually not sufficient for complicated problems. There are, however, methods to get around this problem, see e.g., [20], in which the state space is divided into several bundles and regression is carried out locally in each bundle. Another suitable method to overcome these difficulties is neural network regression, which is presented in the next section.
16 By "everywhere" we mean the region in which the distribution of X tn has positive density. Of course, this is in many cases R d , so in practice by "everywhere" we mean the region in which the density is significantly positive.
17 In practice, we run into trouble if we choose B too large, since the approximation of the matrix inverse may become unstable.
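The basis-size growth and the least-squares fit itself can be sketched as follows; the function names are our own, and we use a numerically safer least-squares solve in place of the explicit matrix inverse in (31):

```python
import numpy as np

def poly2_basis(x):
    """Constant, first- and second-order monomials (incl. cross-terms)
    for samples x of shape (M, d)."""
    M, d = x.shape
    cols = [np.ones(M)]
    cols += [x[:, i] for i in range(d)]
    cols += [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)          # 1 + d + d(d+1)/2 columns

def ols_fit(x, y):
    """alpha_hat = argmin ||Phi(x) alpha - y||^2, returned as a
    regression function z -> Phi(z) alpha_hat."""
    phi = poly2_basis(x)
    alpha, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return lambda z: poly2_basis(z) @ alpha
```

For d = 50 the basis has 1 + 50 + 50·51/2 = 1326 columns, i.e. 1325 non-constant basis functions, matching the count in the text.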

Neural network regression
In this section we present a simple neural network approximation of V tn (·). The neural network is a mapping v ϕn : R d → R, parametrized by the p tn ∈ N trainable parameters ϕ n ∈ R p tn . The objective is to find ϕ n such that the empirical L 2 -risk is minimized. We denote by φ̂ n an optimized version of ϕ n and define our neural network approximator of the option price at t n by v̂ NN tn (·) = v φ̂n (·). To avoid repetition, the description of the neural networks in Subsection 3.1.1 is also valid for the neural network used here. However, one important difference is the output, which in this section is an approximation of the option value and should hence take on values in (0, ∞). A natural choice of activation function in the output layer is therefore ReLU(·) = max{0, ·}. Furthermore, by shifting the output by −g tn (·), i.e., designing the neural network to output v φ̂n (·) − g tn (·) and defining v̂ NN tn (·) = v φ̂n (·) + g tn (·), we can, for all x ∈ R d , enforce v̂ NN tn (x) ≥ g tn (x) by using ReLU as activation function in the output layer. However, in many cases it seems to be beneficial to use the identity as the activation function in the output layer.
Another difference, which has to do with the training phase, is that the optimization of the parameters ϕ n does not have to be carried out recursively. This opens up the possibility for parallelization of the code.
By comparing (34) to (30) we see that the optimization problems are similar. There are, however, some major differences. In Subsection 4.2 we have a closed-form expression for the optimal parameters resulting in the final regression function (32). This is not the case for the neural network regression and we therefore need to use an optimization algorithm to approximate the optimal parameters. On the other hand, as mentioned in Subsection 4.2 it is sometimes hard to find basis functions that are flexible enough. This problem can be overcome with neural networks, which are known to be good global approximators.
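A minimal, self-contained sketch of neural network least-squares regression: one hidden layer, identity output activation, and plain gradient descent on the empirical L 2 -risk. The architecture, the stand-in target function and the optimizer are illustrative assumptions only; the paper's networks follow Subsection 3.1.1 and are trained with a more elaborate optimizer:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-hidden-layer network v_phi: R^d -> R
d, H, M = 2, 16, 512
W1 = rng.standard_normal((d, H)) / np.sqrt(d); b1 = np.zeros(H)
W2 = rng.standard_normal((H, 1)) / np.sqrt(H); b2 = np.zeros(1)

x = rng.standard_normal((M, d))
y = np.sin(x[:, :1]) + 0.5 * x[:, 1:]       # stand-in regression target

loss0 = np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2)

lr = 0.1
for _ in range(2000):
    h = np.tanh(x @ W1 + b1)                # hidden layer
    v = h @ W2 + b2                         # identity output activation
    r = v - y                               # residuals
    loss = np.mean(r ** 2)                  # empirical L2-risk
    # backpropagation by hand
    gv = 2 * r / M
    gW2 = h.T @ gv; gb2 = gv.sum(0)
    gh = gv @ W2.T * (1 - h ** 2)
    gW1 = x.T @ gh; gb1 = gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

Since no closed-form minimizer exists, the quality of v̂ NN depends on the optimizer; the sketch only illustrates that the same empirical L 2 -risk as in the OLS case is being driven down.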

Approximation algorithms for exposure profiles
In this section, we introduce different ways to estimate (9) and (10) relying on Monte-Carlo sampling and the approximation algorithms described in Sections 3 and 4. Furthermore, a simple example is presented and visualized, which aims to provide an intuitive understanding of the different methods. Finally, the advantages and disadvantages of each method are presented in a table.
In this section, the neural network-based approximation of the value function of the option, introduced in Subsection 4.3 is used. However, it would have been possible to use the OLS-based approximation from Subsection 4.2, instead.
We use M ∈ N independent realizations of X, which for m ∈ {1, 2, . . . , M } are denoted by x(m) = (x tn (m)) N n=0 . When X is given by (18), realization m is denoted by x̄ tn (m) = (x̄ tn t (m)) t∈[t0,t N ] , where we recall that superscript t n refers to where in time the discontinuity of the drift coefficient is located. We introduce the following two approximations of the expected exposure under the risk neutral measure For the expected exposure under the real world measure, we have the following three approximations where l is the likelihood ratio function defined in Theorem 2.2. We note that (35) and (37) are the only approximations that require a calculation of the option value. On the other hand, we need X to be described by a diffusion-type SDE in (38), and we need to know the density functions to calculate (39). To define the approximations of the potential future exposure, we start by defining the permutation given by the vector (v̂ NN tn (x tn (m 1 ))I {τ(fθ)>tn} , . . . , v̂ NN tn (x tn (m M ))I {τ(fθ)>tn} ) satisfying v̂ NN tn (x tn (m i )) ≤ v̂ NN tn (x tn (m j )) whenever i ≤ j. Furthermore, we define i α = ⌈αM ⌉ for α ≥ 0.5 and i α = ⌊αM ⌋ for α < 0.5.
The approximations of the potential future exposure are then defined as In Table 1, some characteristics of the calculations needed for each approximation are given.
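The PFE estimators reduce to sorting the (zero-padded) exposure sample and reading off the i α -th order statistic. A sketch, where the rounding convention for i α is our reading of the formula above:

```python
import numpy as np

def pfe(values, alive, alpha):
    """Empirical alpha-percentile of the exposure at one date.

    values: (M,) approximate pathwise option values v_NN(x_{t_n}(m))
    alive:  (M,) boolean, tau(f_theta) > t_n (exercised paths count as 0)
    """
    e = np.sort(np.where(alive, values, 0.0))   # exposures, ascending
    M = len(e)
    # i_alpha = ceil(alpha*M) for alpha >= 0.5, floor(alpha*M) otherwise
    i = int(np.ceil(alpha * M)) if alpha >= 0.5 else int(np.floor(alpha * M))
    return e[min(max(i - 1, 0), M - 1)]         # 1-based order statistic
```

With α = 0.975 this picks an upper tail value, with α = 0.025 a lower tail value, as in the PFE 97.5 and PFE 2.5 profiles reported later.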
To explain the different approximations in a concrete setting we turn to a simple example.
Example 5.1. Consider a one-dimensional American put option, where we for simplicity assume that r = 0. We are interested in the expected exposure and the potential future exposure at time t n ∈ (t 0 , t N ), given that we have full knowledge of the market at time t 0 . We give a short explanation of the intuition behind the different methods by referring to Figure 2, where the problem is visualized. We start with ÊE 1 Q (t n ) and ÊE 1 P (t n ), for which we only use the figure to the left. For ÊE 1 Q (t n ) we follow the blue samples and note that samples 2 and 3 are not exercised prior to, or at, t n , which means that the indicator function in (35) equals 1. Sample 1, on the other hand, touches the exercise region prior to t n and has therefore already been exercised, which means that When focusing on the red samples instead, we see that no sample touches the exercise region prior to t n and we obtain ÊE 1 P (t n ) = 1/3 (v̂ NN (x̄ tn tn (1)) + v̂ NN (x̄ tn tn (2)) + v̂ NN (x̄ tn tn (3))).
Similarly, we can e.g., state that P̂FE 2.5 (t n ) = v̂ NN (x̄ tn tn (2)). Moving on to ÊE 2 P (t n ) and ÊE 3 P (t n ), we shift focus to the figure to the right. For ÊE 2 P (t n ) we want to use the red samples and notice that samples 2 and 3 end up out of the money. We therefore obtain an estimate to which only sample 1 contributes.

For ÊE 3 P (t n ), we instead consider the blue samples and see that sample 1 is exercised prior to t n and samples 2 and 3 end up in the money. However, we also need to adjust the estimate for using the wrong state process 18 . This is done by multiplying the corresponding terms with the likelihood ratios l(x(2)) and l(x(3)).
The last estimate, ÊE 2 Q (t n ), is obtained by removing the likelihood ratios from the estimate for ÊE 3 P (t n ).

Figure 2: Blue trajectories are distributed as X and red trajectories are distributed as X̄ tn . The boundary for immediate exercise is pointed out and should be interpreted as: as soon as a trajectory touches the boundary, the option is exercised and the holder receives the immediate pay-off. Recall that the exercise boundaries are calculated to be optimal under the Q-measure.
To conclude, we note that Figure 2 displays, to the left, the cases where functional-form approximations of the option values are used and, to the right, the cases where cashflow-paths are used (this can be read from Table 1). Furthermore, blue and red trajectories are distributed according to X and X̄ tn , respectively (this can also be read from Table 1).

Numerical results
This section is divided into two parts: in the first part we use the Black-Scholes dynamics for the underlying assets. The proposed algorithm is compared with two different Monte-Carlo-based algorithms for a two-dimensional case. We focus both on the accuracy of the computed exercise boundaries and on the exposure profiles. Furthermore, the exercise boundary is compared with the exercise boundary of the corresponding American option, computed by a finite element method from the PDE-formulation of the problem. Exposure profiles are then computed, both under the risk neutral and the real world measures, for problems in 2 and 30 dimensions. Finally, we compare the ordinary OLS-regression with the NN-regression.
In the second part we consider stochastic volatility and compute exposure profiles under the Heston model.

Black-Scholes dynamics
In the Black-Scholes setting, the only risk factors are the d ∈ N underlying assets, denoted by S. We assume a constant risk-free interest rate r ∈ R and, for each asset i ∈ {1, 2, . . . , d}, a constant volatility σ i ∈ (0, ∞) and a constant dividend rate q i ∈ (0, ∞). The state process X is then given by the asset process S = (S t ) t∈[t0,t N ] , i.e., X = S, with initial state (s t0 ) i ∈ (0, ∞), and where A is either the real world measure P or the risk neutral measure Q.
In the real world setting A i = µ i ∈ R and in the risk neutral setting A i = r. The process W A = (W A t ) t∈[t0,t N ] is a d-dimensional Brownian motion under the measure A. Furthermore, W A is correlated with correlation parameters ρ ij ∈ [−1, 1], i.e., for i, j ∈ {1, 2, . . . , d}, we have that (40) is given by We note that log S has Gaussian increments under both the P− and the Q−measure, which implies that we have access to closed-form expressions for the transition densities, and in turn also for the likelihood ratio in (39).
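Since the two measures differ only through the drift, the assets can be simulated exactly on the grid of exercise dates by one routine; a sketch, in which the function name and array layout are our own choices:

```python
import numpy as np

def simulate_gbm(s0, drift, sigma, corr, t_grid, M, rng):
    """Exact simulation of correlated geometric Brownian motions.

    Under Q set drift = r (per asset); under P set drift = mu.
    s0, drift, sigma: (d,) arrays; corr: (d, d) correlation matrix.
    Returns an array of shape (M, len(t_grid), d).
    """
    d = len(s0)
    L = np.linalg.cholesky(corr)                 # correlate the normals
    s = np.empty((M, len(t_grid), d))
    s[:, 0] = s0
    for n in range(1, len(t_grid)):
        dt = t_grid[n] - t_grid[n - 1]
        z = rng.standard_normal((M, d)) @ L.T
        s[:, n] = s[:, n - 1] * np.exp(
            (drift - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
        )
    return s
```

The same routine, seeded identically, can thus produce the P- and Q-samples used in the exposure estimators without re-training any network.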

Bermudan max-call option
At exercise, a Bermudan max-call option pays the positive part of the maximum of the underlying assets after subtraction of a fixed amount K ∈ R. This implies an identical pay-off function at all exercise dates, given by g(s) = (max{s 1 , s 2 , . . . , s d } − K) + , where s = (s 1 , s 2 , . . . , s d ) ∈ (0, ∞) d and, for c ∈ R, (c) + = max{c, 0}. We choose to focus on Bermudan max-call options for two reasons: first, there exist plenty of comparable results in the literature (see e.g., [19], [1], [20]); second, and more importantly, the exercise region is nontrivial with several interesting features. For example, max{s 1 , s 2 , . . . , s d } is not a sufficient statistic for making optimal exercise decisions, meaning that we cannot easily reduce the dimensionality (all dimensions are needed in order to price the option). This is not the case with, e.g., the geometric-average call option with pay-off function g(s) = ((∏ d i=1 s i ) 1/d − K) + , for which (∏ d i=1 s i ) 1/d is a sufficient statistic for optimal exercise decisions (only the geometric average is needed to price the option, meaning that the problem can be reduced to 1 dimension, see e.g., [32]).
Another example is the arithmetic-average call option with pay-off function g(s) = ((1/d) ∑ d i=1 s i − K) + . Similar to the max-call option, (1/d) ∑ d i=1 s i is not a sufficient statistic for optimal exercise decisions, but on the other hand the exercise region is convex 19,20 (see [33, Proposition 6.1]). Convexity of the exercise region does not hold for the max-call option, making it hard to capture the exercise region when global polynomials are used as basis functions in e.g., the LSM. Methods using local regression instead can, to some extent, overcome this problem, but it is difficult to decide how the localization (usually of the state space) should be done, especially in high dimensions.
In the numerical experiments we use the following parameters: r = 0.05 and, for i, j ∈ {1, 2, . . . , d}, q i = 0.1, σ i = 0.2, ρ ij = 0 for i ≠ j, N = 9, t 0 = 0, t N = 3, (s t0 ) i = 100 and K = 100. We want to emphasize that the choice of having no correlation between assets is due to the fact that this case has been studied thoroughly in the literature. To verify that the algorithm is also able to tackle problems with correlated assets, we replicated an experiment in [24], of pricing a put option on the arithmetic average of 5 correlated assets, and obtained the same price 21 .
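The max-call pay-off with the strike used in the experiments is a one-liner; a sketch:

```python
import numpy as np

K = 100.0  # strike from the experiment parameters above

def max_call_payoff(s):
    """g(s) = (max_i s_i - K)^+ for s of shape (..., d);
    returns pay-offs of shape (...,)."""
    return np.maximum(s.max(axis=-1) - K, 0.0)
```

Vectorizing over the leading axes lets the same function evaluate g on a whole (M, N+1, d) array of simulated paths at once.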

Approximation of the option value at initial time
The performance of the DOS algorithm is thoroughly explored for a wide range of examples in [1]. However, the convergence with respect to the amount of training data, M train , was not given. We therefore present, in Figure 3, an example of how the option price at t 0 converges to a reference value in terms of the amount of training data. In the considered example, we use the parameter values given at the end of Subsection 6.1.1 with d = 2, i.e., a two-dimensional Bermudan max-call option. The reference value (13.902) for V t0 (x t0 ) is computed by a binomial lattice model in [16]. The DOS approximation, denoted by V̂ t0 (x t0 ), is computed according to (24). To be more specific, one neural network is trained for each M train ∈ {2 12 , 2 14 , 2 16 , 2 18 , 2 20 , 2 22 , 2 25 }, as described in Subsection 3.1.2. For each M train , the option value V̂ t0 (x t0 ) is computed 20 times (with 20 independent realizations of X) with M test = 2 20 , and the average of these 20 values is shown in Figure 3. Furthermore, the figure to the right displays empirical 95%-confidence intervals of the sample mean, which are computed by adding and subtracting 1.96/√19 times the sample standard deviation.

Figure 3: Convergence of approximate option values in the amount of training data. Reference value 13.902, computed by a binomial lattice model in [16]. Left: With different levels of deviation from the reference value. Right: With empirical 95%-confidence intervals of the sample mean.

Comparison with Monte-Carlo-based algorithms
We start with a short introduction of the two Monte-Carlo-based algorithms with which we compare results for the two-dimensional max-call option.
The least squares method (LSM), proposed in [17], is one of the most used methods for pricing American/Bermudan options. The method approximates exercise boundaries and computes the option value from discounted cashflows. The regression is carried out globally, i.e., with the same regression function over the entire state space. However, if one is only interested in the option value at t 0 , it is beneficial to only consider ITM samples in the regression step. This is done when the exercise boundary and PFE 97.5 are approximated. For approximating EE and PFE 2.5 we need the entire distribution of future option values, forcing us to also include OTM samples in the regression. As basis functions we use the first 6 weighted Laguerre polynomials, L n (x) = e −x/2 (e x /n!) (d n /dx n )(x n e −x ), of (S tn ) 1 and (S tn ) 2 , and the first 4 of max{(S tn ) 1 , (S tn ) 2 }, for all t n ∈ T. Note that since we have no correlation between (S) 1 and (S) 2 , we do not include cross-terms in the basis.
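The weighted Laguerre basis can be evaluated with numpy's Laguerre module; a sketch, in which the e −x/2 weighting follows the formula above and the function name is our own:

```python
import numpy as np
from numpy.polynomial import laguerre

def weighted_laguerre_basis(x, n_basis):
    """Columns e^{-x/2} L_n(x), n = 0, ..., n_basis-1, for samples x."""
    cols = []
    for n in range(n_basis):
        c = np.zeros(n + 1)
        c[n] = 1.0                      # coefficient vector selecting L_n
        cols.append(np.exp(-x / 2.0) * laguerre.lagval(x, c))
    return np.column_stack(cols)
```

In practice the asset values would be rescaled (e.g., by the strike) before evaluation, since the weighting decays quickly for large arguments.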
The second algorithm is the stochastic grid bundling method (SGBM), proposed in [20], in which regression is carried out locally in so-called bundles. In the SGBM, the target function in the regression phase is the option value, which makes it suitable for approximations of exposure profiles. There are several papers on computations of exposure profiles with the SGBM, see e.g., [34]. At each exercise date t n ∈ T, we use as basis functions: a constant, the first four powers of (S tn ) 1 and (S tn ) 2 , and the first three powers of max{log((S tn ) 1 ), log((S tn ) 2 )}. Furthermore, the state space is divided into 32 equally-sized bundles based on max{(S tn ) 1 , (S tn ) 2 }.
Before presenting any results, we recall from equations (9) and (10) that the expected exposure and the potential future exposure are two statistics of where τ is an S−stopping time. This means that approximations of the exposure profiles are sensitive, not only to the option value itself, but also to the exercise strategy. The SGBM and the LSM only compute the exercise strategy implicitly, i.e., by comparing the approximate option value and the immediate pay-off. Therefore, small errors in the approximate option value close to the exercise boundary can lead to significant errors in the exercise strategy 22 . We therefore start by presenting a comparison of the exercise boundaries. As a reference, we use the exercise boundary of the corresponding American option, which is computed from the PDE-formulation with the finite element method used in [35]. We note that, since the PDE formulation refers to the American option, the exercise boundary differs slightly 23 from the exercise boundary of the Bermudan counterpart, which we are interested in.
In Figure 4, a comparison of the exercise boundaries at t 8 ≈ 2.67 for the different algorithms is presented. As we can see, the DOS algorithm captures the shape of the exercise regions, while both the SGBM and the LSM seem to struggle, especially with the part of the continuation region along the line (S tn ) 1 = (S tn ) 2 , in particular for high values of (S tn ) 1 and (S tn ) 2 . Moving on to the exposure profiles, we see in Figure 5 that, even though the DOS algorithm and the SGBM seem to agree on the exposure profile in general, there is a notable difference in the PFE 97.5 . This is a consequence of the slight bias towards classifying samples as belonging to the continuation region, which is shown in Figure 5, to the right.
The LSM performs worse, both in terms of accuracy of the exposure profiles and bias towards misclassification. This is, however, not a surprise, since the LSM is tailored to calculate the option value at t 0 .
Finally, it should be pointed out that for both the SGBM and LSM, it could very well be the case that another set of basis functions would better capture the shape of the exercise boundaries. In this two-dimensional example one could probably use geometric intuition to come up with a better set of basis functions, but in higher dimensions, and for more complicated pay-off functions, this becomes difficult.

Exposure profiles under different measures
In this section we compare exposure profiles under different measures for the max-call option, in 2 and 30 dimensions. In Case I we set d = 2, and P 1 and P 2 such that for i ∈ {1, 2}, we have drifts (µ 1 ) i = 15% and (µ 2 ) i = −5%. In Case II we set d = 30, and P 1 and P 2 such that for i ∈ {1, . . . , 30}, we have drifts (µ 1 ) i = 7.5% and (µ 2 ) i = 2.5%. Figure 6 shows exposure profiles in 2 and 30 dimensions on the left side. On the right, we see a comparison of the different ways to compute expected exposures, which all agree to high accuracy. Furthermore, Figure 7 displays that the fraction of exercised options over time is highly dependent on the choice of measure.

To overcome this problem, we also implement a slightly different version of the algorithm, where we instead carry out local regression in the continuation regions. To be able to differentiate between the local and global OLS-regressions, we denote the regression functions by v OLS loc and v OLS glob , respectively. The localization procedure is done similarly as in the SGBM, i.e., at each t ∈ T, the state space is divided into bundles of equal size, based on max{(S t ) 1 , (S t ) 2 }. With local OLS-regression we obtain almost identical exposure profiles as with NN-regression. In Figure 8, top left, the exposure profiles computed with the three different algorithms are displayed. Furthermore, from top right to bottom right, we compare the approximate risk premiums for holding instead of immediately exercising the option at t 2 ≈ 0.667 and some x ∈ R 2 , i.e., v Z t2 (x) − g t2 (x), with Z representing, in order, NN, OLS glob and OLS loc . We know that for all x ∈ R 2 , it holds that V t2 (x) − g t2 (x) ≥ 0. We see in Figure 8, top right, that this is captured by the NN-regression, since the values range from 0 to just above 12.
When we carefully evaluate the values of the risk premiums computed with local and global OLS-regression (Figure 8, bottom left and bottom right), we see that negative values exist in both plots. We see a similar phenomenon for high values, i.e., the range is stretched upwards in comparison to the values obtained with NN-regression. The reason for the negative values is that v OLS loc and v OLS glob underestimate the option values close to the boundary (which in this case coincides with the exercise boundary, since the regression is carried out only in the continuation region). To compensate for this, we see a tendency towards higher values in the center (not close to the exercise boundaries) of the continuation regions. As a final remark, we note that this behaviour is reduced by using local instead of global regression (the range of values in Figure 8 is tighter for local than for global regression). On the other hand, we also note discontinuities in the risk premium computed with the local OLS-regression (Figure 8, bottom left), stemming from the localized regression.
Even though the local OLS-regression is less accurate than the other two algorithms in this section when it comes to computations of exposure profiles, it is clear, by comparing Figures 5 and 8, that it outperforms the LSM. This is not a surprise, due to: 1. The accumulated error in the LSM (caused by the recursive dependency of the regression functions) is significantly reduced, since the regression functions are not sequentially dependent (the current state is regressed against discounted cashflows, not against regression functions at other exercise dates). The only error accumulation (over time) in the OLS-regression originates from the DOS algorithm, which computes the exercise boundaries highly accurately; 2. By recalling equation (42), we note that a less accurate stopping strategy may cause a less accurate exposure.

Figure 8: Risk premiums v Z t2 (·) − g t2 (·), with Z representing NN, OLS glob and OLS loc , respectively.

Heston model dynamics
In this section we assume a one-dimensional underlying asset following the Heston stochastic volatility model [36], which is considered only under the risk neutral measure. We therefore omit the explicit notation of the probability measure in this section. In this setting, the market is described not only by the underlying asset price process S = (S t ) t∈[t0,t N ] itself, but also by the instantaneous variance process ν = (ν t ) t∈[t0,t N ] . The state process is then the two-dimensional process X = (ν, S), which satisfies a system of SDEs with risk-free interest rate r ∈ R, dividend rate q ∈ (0, ∞), initial conditions s t0 , ν 0 ∈ (0, ∞), speed of mean reversion κ ∈ (0, ∞), long-term mean of the variance process θ ∈ (0, ∞), and volatility coefficient of the variance process ξ ∈ (0, ∞). Furthermore, (W S , W ν ) are correlated Brownian motions with correlation ρ ν,S . If the Feller condition, 2κθ ≥ ξ 2 , is satisfied, it holds that 0 is an unattainable boundary for ν. If the Feller condition is not satisfied, then 0 is an attainable, but strongly reflective 24 , boundary, see e.g., [39]. This leads to an accumulation of probability mass around zero, which makes it more challenging to approximate ν accurately. Unfortunately, the Feller condition is rarely satisfied for parameters calibrated to the market. In this paper, the QE-scheme, proposed in [40], is used to approximate 25 (ν, S). If necessary, we choose a finer time grid for the approximation of (S, ν) than the exercise dates, T.
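For reference, a sketch of one QE update of the variance process, following Andersen's scheme as we read [40]; the switching threshold ψ c = 1.5 is the standard choice, and the asset update and martingale correction are omitted:

```python
import numpy as np

def qe_step(v, dt, kappa, theta, xi, rng, psi_c=1.5):
    """One time step of the variance process with the QE scheme."""
    e = np.exp(-kappa * dt)
    m = theta + (v - theta) * e                        # conditional mean
    s2 = (v * xi**2 * e * (1 - e) / kappa
          + theta * xi**2 * (1 - e)**2 / (2 * kappa))  # conditional variance
    psi = s2 / m**2
    if psi <= psi_c:
        # quadratic branch: v' = a (b + Z)^2 with matched moments
        b2 = 2 / psi - 1 + np.sqrt(2 / psi) * np.sqrt(2 / psi - 1)
        a = m / (1 + b2)
        return a * (np.sqrt(b2) + rng.standard_normal()) ** 2
    # exponential branch: point mass at 0 plus an exponential tail
    p = (psi - 1) / (psi + 1)
    beta = (1 - p) / m
    u = rng.uniform()
    return 0.0 if u <= p else np.log((1 - p) / (1 - u)) / beta
```

By construction the update is non-negative, which is exactly what naive Euler discretizations of the variance process fail to guarantee when the Feller condition does not hold.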
We consider a standard Bermudan put option, i.e., with identical pay-off functions at all exercise dates, depending only on the underlying asset. The pay-off function is given by g(s) = (K − s) + , for s ∈ (0, ∞) and strike K ∈ R.

Comparison with Monte-Carlo-based algorithms
In this section, we again compare the DOS algorithm with the two Monte-Carlo-based algorithms, SGBM and LSM. We use the following set of model parameters: r = 0.04, q = 0, s t0 = 100, κ = 1.15, θ = 0.0348, ξ = 0.459, ν 0 = 0.0348 and ρ ν,S = −0.64, and the contract parameters: T = 0.25, N = 10, K = 100. The parameters coincide with Set B in [41], in which the valuation is carried out with the so-called 2D-COS method. The 2D-COS method is a Fourier-based method and is assumed to yield highly accurate valuation of the option.
For the LSM, we use as basis functions, for t n ∈ T, Laguerre polynomials of degree 2 of S tn , of degree 3 of ν tn , and the product ν tn S tn . For the SGBM, we use 8 equally-sized bundles based on S tn and, for t n ∈ T, we use as basis functions S tn and the first 3 powers of ν tn . These parameters are chosen such that the approximate option value at t 0 is as close as possible to the (almost) exact value of 3.198944, for a Bermudan option with 10 exercise dates, retrieved from [41]. The obtained option values (for each algorithm, the average value of 10 runs) at t 0 were 3.1792, 3.2085, and 3.1631 for the DOS algorithm, the SGBM and the LSM, respectively.
It should be stressed that the numerical results for the LSM and the SGBM in this section should not be seen as state-of-the-art performance of these algorithms. For example, in [34], a bundling scheme based on recursive bifurcation, combined with a rotation of the state space to match the correlation between S and ν, gave accurate results. Furthermore, they use as basis functions only monomials of log(S tn ), letting the stochastic variance ν tn enter only through conditional expectations of the form E[(log(S tn )) k | S tn−1 , ν tn−1 ]. These conditional expectations are computed from the characteristic function of S tn | S tn−1 , ν tn−1 , which was presented in [36]. Similar improvements could also be made for the LSM. The reason this comparison is still relevant is that it demonstrates the flexibility of DOS and the NN-regression, in which nothing has been changed (from the examples with Black-Scholes dynamics) except the dynamics of the stochastic process from which we generate training data.
In Figure 9, the exercise boundaries are presented in the state space. Worth noticing is that, for t n ∈ T, both the option value and the pay-off function are non-increasing in S tn and non-decreasing in ν tn . An immediate consequence of this is that we can have at most one exercise boundary along lines of constant S tn or constant ν tn . We can therefore conclude that there are inconsistencies in the exercise boundaries for both the LSM and the SGBM. In Figure 9, bottom right, the empirical probability density functions (pdf) of the exposures at the exercise dates are plotted. Figure 10 shows the exposure profiles and the exercise frequencies computed with the three algorithms. We note that the DOS and the SGBM seem to agree fairly well on the EE and the lower percentile of the PFE, but differ significantly on the upper PFE.