Adaptive neural network surrogate model for solving the implied volatility of time-dependent American option via Bayesian inference

: In this paper, we propose an adaptive neural network surrogate method to solve the implied volatility of American put options, respectively. For the forward problem, we give the linear complementarity problem of the American put option, which can be transformed into several standard American put option problems by variable substitution and discretization in the temporal direction. Thus, the price of the option can be solved by primal-dual active-set method using numerical transformation and finite element discretization in spatial direction. For the inverse problem, we give the framework of the general Bayesian inverse problem, and adopt the direct Metropolis-Hastings sampling method and adaptive neural network surrogate method, respectively. We perform some simulations of volatility in the forward model with one- and four-dimension to compare the point estimates and posterior density distributions of two sampling methods. The superiority of adaptive surrogate method in solving the implied volatility of time-dependent American options are verified.


Introduction
With the widespread application of inverse problem (IP) in many fields such as finance, physics, chemistry, and geophysics, researchers have become more and more interested in how to solve it efficiently and accurately. The core of IP is to estimate inputs or some unknown parameters from some observations in mathematical models, which usually consist of parameters that can be estimated from observations. Correspondingly, the model from parameters to the observations is called the forward problem. In this paper, we mainly study the IP in the financial field, in particular, the implied volatility of time-dependent American put option.
To the option investors, implied volatility is a very important indicator of the options, and reflects the investors' expectation of the volatility of the underlying assets in the future. This indicator can be obtained by taking the option price as a known condition and inverting it into the corresponding Black-Scholes (BS) model. Therefore, our forward problem is to solve the option price of American put option with time-dependent volatility, and the corresponding IP is to estimate the posterior distribution from the observations of the forward model.
As for our forward problem, the major difficulty we encounter is the time-dependent volatility in the BS equation, thus we cannot directly use the existing methods based on finite element technique to solve this American put option. In order to draw from the well known methods for solving various American options, the problem needs to be preprocessed. Firstly, we discretize the forward problem in the temporal direction, then the original model is transformed into multiple American put option pricing problems with fixed volatility. We apply the far-field technique to truncate the solution region to ensure the solution region is a finite field. After that, we can adopt some existing numerical methods, such as primal-dual active-set (PDAS) method [1][2][3] and perfectly matched layer method [4,5] after finite element method (FEM). Furthermore, we can also solve this BS equation by difference technique directly in both temporal and spatial direction, such as projected successive overrelaxation method [6].
For the IP, accurate and efficient estimation of the implied volatility is a current topic of great practical significance in finance. There are two major challenges of our problem: a). to estimate the implied volatility accurately and efficiently; b). to design the efficient algorithm in order to speed up the computation on the premise of ensuring a certain degree of calculation accuracy. As for the first issue, the estimation of the implied volatility is closely related to the forward model obtained by numerical approximation, the number of observations, and the errors introduced by the observations. Hence, it is very important to find an efficient approach to evaluate the uncertainty of the implied volatility in IP. Bayesian inference is a popular statistical method for IP that can not only obtain the corresponding estimation of the parameters to be solved, but also elaborate on the uncertainty of the parameters in the IP [7][8][9][10]. And this approach is a natural method by adding additional information, such as prior distribution of the parameter to supplement the observation model, in which the parameter is regarded as a random variable to highlight the characteristic of its uncertainty. Then we can obtain the posterior distribution via the Bayesian formula, from which much information about the parameters, such as the mean, variance, and distribution, can be obtained. Thus, for the BIP, solving the parameter is equivalent to solving its posterior distribution. The simplest and fastest method for BIP is the explicit calculus method, such as the conjugate prior method [11,12]. When the posterior distribution can not be characterized in the closed form because of the complexity of the forward model, various numerical approaches can be adopted, e.g., acceptance-rejection method [13], importance sampling method [14], and Markov Chain Monte Carlo (MCMC) method [15]. In particular, Metropolis-Hastings (MH) method in MCMC is applied in this paper. The goal of MCMC method is to generate samples from the posterior distribution of unknown parameters, where the posterior distribution is represented by the product of the prior distribution and the likelihood function. Nevertheless, the likelihood sampling can be time consuming while the forward problem is computationally expensive, when dealing with non-linear or high-dimensional models. Therefore, we would like to reduce the time cost in the sampling process of forward model, which can be called online time. In many recent works, replacing the original forward model with a cheap surrogate model of offline constructed is a popular approach [16,17]. Meanwhile, it can be theoretically proved that the surrogate model converges to the true forward one in the prior-weighted L 2 norm, and the approximated posterior distribution converges to the true posterior distribution at least two times faster [18].
As regards the second difficulty, the high-dimensional parameter space and the computationally expensive model pose a great challenge for the MCMC method and other sampling approaches to be adopted. To address the curse of dimensionality in the approximation of the solutions of different equations, neural network (NN) has received much attention in the past decades [19,20]. Therefore, we consider to choose the NN surrogate technique. Yet, a prior-based NN model may not be accurate for the online computation as the posterior distribution tends to focus on a small portion of what the prior support, and the developed posterior-based surrogate methods in some important regions of the posterior distribution. In order to improve the computational cost of estimating the posterior distribution, we develop an adaptive NN surrogate method (ANNSM) [21], of which the idea is that the sampling process begins with a low-precision and cheap computation surrogate model to speed up the online computation by MCMC method. In this approach, we correct the low-precision surrogate model adaptively via the true forward model, then can obtain a new high-precision surrogate one which is regarded as the low-precision one in a new iteration. The iterations will be performed repeatedly till the maximum number of iterations is reached. In our paper, the surrogate model is generated by NN.
The rest of this paper is organized as follows. In section 2, we give the detailed solution process about time-dependent American put option by PDAS method after variable substitutions and discretization in temporal and spatial direction, respectively. In section 3, interpolation is carried out using the obtained option prices on the fixed observation points in the fixed regions. Then ANNSM are proposed to solve the implied volatility parameters. The numerical experiment results using ANNSM against DMHSM for one-and multi-dimensional implied volatility are presented in section 4. And the computational superiority of ANNSM is verified. In section 5, the concluding remarks are given.

Setup of the forward problem
In this section, we first introduce the linear complementary problem (LCP) for American put option, where the volatility is a time-dependent function. With a conventional variable substitution, the backward pricing problem with variable coefficient is transformed into a forward one. Secondly, by applying the backward Euler method in the temporal direction, the forward problem can be decomposed into some American put option pricing problems each of which has constant volatility. Thirdly, we use another numeraire transformation to obtain a series of bounded LCPs after the far-field truncation technique. Based on finite element method (FEM) in spatial direction, the associated discrete form is presented. At the end of this section, the PDAS method is adopted to solve the option price.

Bounded linear complementary problem
In order to simplify the following problems, we introduce some notations first. Let r and S be the interest rate and the price of the underlying asset, respectively. The arbitrary time and the volatility function is denoted by t and σ(t), respectively. T and K stand for the maturity date and the strike price, respectively. We assume that the underlying asset pays no dividends in this paper. Then, the LCP of time-dependent American put option price V = V(S , t; σ) can be described as follows: It needs to note that σ = (σ 1 , . . . , σ d ) T is the parameter vector belonging to the parameter space Θ ⊂ R d . The volatility function is expressed as a linear combination of the following d parameters and the basis functions The original problem (2.1) is a backward partial difference equation (PDE). We apply a traditional variable transformation then the backward problem becomes a forward issue where the simplified operator LP(S , τ;σ) iãi (τ). The initial and the corresponding boundary conditions in problem (2.4) are given as P(S , 0) = f (S ), P(0, τ;σ) = f (0) and lim S →+∞ P(S , τ;σ) − f (S ) = 0. Since the coefficient of forward LCP (2.4) varies with time t, we would like to discretize the volatility in the temporal direction as soon as possible, so as to decompose this problem into several American put option pricing problems with constant volatility before applying the existing methods of solving the American option pricing problems. Furthermore, we need to emphasize that the options are traded more frequently near the maturity date T . Therefore, we ought to adopt a geometric grid partition in the temporal direction the coefficients of the first order derivative terms in problem (2.6) equal 0. In addition, for m = 1, . . . , M, these problems turn into the following formulations We use the far-field truncation technique to deal with these unbounded problems (2.8), so as to obtain bounded problems under the premise of ensuring the accuracy [22].
Let L represent the truncated length sufficiently large to guarantee the accuracy. Thus the bounded LCPs are as follows: and v m (±L;σ) = h m (±L). Therefore, we have converted the original LCP (2.1) on a unbounded domain into M BLCPs on a bounded domain, on which we can adopt known numerical algorithms to solve the option prices.

Design of numerical algorithms
Before applying FEM, we first convert the BLCPs (2.9) into the variational inequalities through a lemma. Let (2.10) For the above problem (2.10), we apply a uniform grid partition in the spatial direction Here, Λ n := (x n−1 , x n ) represent the spatial elements. We use piecewise linear finite element to formulate the discretization scheme of the problems (2.10). For any m = 1, . . . , M, we define the function set as S 1 where P 1 denotes the set of polynomials of degree less than or equal to 1. Therefore, the discretized approximation of the VIs (2.10) can be formulated as follows where v m h and u m h stand for the numerical solution and test function of mth layer, respectively. h M+1 Therefore, we obtain the finite element form of solutions and the test functions below (2.13) Our goal is to find the coefficients v m l , l = 2, . . . , N, m = 2, . . . , M, so that to obtain the option price. For convenience, we abbreviate v m h (x) as v m and u m h (x) as u m . By substituting the above equations (2.13) into the problem (2.12), it can be reformulated as (2.14) Here, the matrix Other unknown quantities in the above problem (2.14) are shown below To simplify the problem (2.14), let When h 2 △τ is small enough, the matrix D m is a positive definite matrix. Therefore, we can solve the forward problem (2.15) via the PDAS method. The complete algorithm of solving this problem using PDAS method is as follows: The whole algorithm of solving the option prices via PDAS method. x Given the quantity to be evaluated and the conditions: v m pre = v m cur . 18: 19: So far, we have obtained the time-dependent American put option price by PDAS approach. For the sake of simplicity in later sections, our discretized forward problem can be condensed into a mathematic model where σ = (σ 1 , . . . , σ d ) T is the parameter vector, and G : R d → R M+1,N+1 is the discretized operator by FEM and BEM, mapping from the parameters σ 1 , . . . , σ d ∈ R to the observable.

Bayesian inverse problem
In the previous section, we obtain the option price by some techniques. Now, we consider its inverse problem, that is, finding the implied volatility by Bayesian inference. We first give a brief introduction to the BIP. And we give the specific process of DMHSM to solve our IPs. Since DMHSM cause tremendous amount of computation when tackling with non-linear forward model and multidimensional volatility function, we develop ANNSM.

Framework of Bayesian inverse problems
With regards to the model discussed in the previous section, we now proceed to study its IP, that is, to solve the implied volatility via Bayesian inference. To apply the Bayesian technique, the numerical option price would be preprocessed. We only choose the bounded rectangular region near the optimal exercise boundary, so that most of the important information of the option is covered. After that, we discretize this region to get some fixed observation points. Through the linear interpolation, we obtain the option price at these fixed observation points. Meanwhile, the measurement noises come from the observations. Hence, the problem can be reformulated as follows where observation data is denoted by y d ∈ R D with sampling noise δ ∈ R D . Here, g : R d → R D is the discretized observation operator. The aim of the IP is to estimate the unknown parameters σ 1 , . . . , σ d , i.e., the parameter vector σ from noisy observations y d . In order to adapt our problem to the framework of the BIP, we combine the model (2.16) with the observation model (3.1) to redefine a forward model below where F is the forward problem operator defined by the parameter vector σ. The parameter vector σ should be characterized by a prior distribution π(σ) when using the Bayesian technique. Correspondingly, the posterior distribution π(σ; y d ), that is, the distribution of the parameter σ conditioned on the data y d , follows from the Bayes' rule: Here, L(σ; y d , F) stands for the likelihood function. Assume the density function of the noises δ ∼ π δ is given, and usually supposed to be Gaussian. Then, the specific form of L(σ; y d , F) can be shown as In Bayesian inference, the posterior distribution L(σ; y d , F) is the Bayesian solution of the inverse problem.

Bayesian inverse problem based on Metropolis-Hastings sampling
Since our forward problem is non-linear, the posterior distribution is very difficult to be characterized in the closed form. Therefore, we usually use numerical sampling methods, e.g., acceptancerejection sampling [13], importance sampling [14], and MCMC sampling [15]. The acceptancerejection sampling and importance sampling is usually suitable for one-or two-dimension simple problems. The former method is the basis of MCMC. Hence, we shall use a special kind of MCMC method: MH sampling method. MH approach is a sampling scheme for getting access to a sequence of random samples from the distribution, where direct sampling is hard. The obtained sequence can be used to approximate the posterior distribution π(σ; y d ), and then to calculate such things as the expectation of parameter vector σ, and so on. MH method generates a random walk using a proposal density q and an acceptance-rejection method for some proposed moves. The details of MH sampling is given by the following pseudo-code.
1: for j = 0 : 1 : n 1 − 1, do 2: Select candidate point σ * from the proposal density q(· ; σ j ), 3: Draw u from uniform distribution U[0, 1], 4: Evaluate the acceptance probability via the forward problem operator F 5: if u < α(σ j , σ * ) then Reject σ * by setting σ j+1 = σ j . 10: end if 11: end for We resort to the DMHSM to sample enough points. In general, choosing samples of at least the same order of magnitude as 10 4 will achieve enough accurate. Then we obtain the approximate posterior distribution π(σ; y d ), so as to solve the BIP directly.

Bayesian inverse problem based on surrogated method
As the forward problem operator F is a non-linear and can be a high-dimensional one, it is time consuming to calculate the posterior distribution π(σ; y d ) via DMHSM. Therefore, surrogate models have received much attention in recent works [24,25]. This method allows us to generate a few collection of model samplings, which includes the parameter vector σ and the forward model F(σ). Then, we can construct a surrogate forward model F by samplings. Based on this surrogate operator F, we can obtain a surrogate posterior distribution π(σ; y d ) ∝ L(σ; y d , F)π(σ). (

3.4)
It is advantageous that the surrogate operator F is cheap to be evaluated. Therefore, we can use the same samplings with less time and similar precision. To this end, methods such as polynomial chaos expansions is available [26]. We should point out that this surrogate is constructed on the prior distribution, not on the posterior distribution. However, our goal is to satisfy high precision in the posterior density field. Another thing worth mentioning is that to obtain a high accuracy posterior density estimate, we need enough samplings and huge amount of computation. The accuracy of the whole samplings of the surrogate model on the prior distribution can not be guaranteed. Therefore, we come up with an adaptive surrogate model, transitioning from a high precision prior distribution with sampling data to a high accuracy posterior distribution. For this surrogate approach, we reconstruct the surrogate model after some sampling points in several loops, and the details of this adaptive surrogate approach is given by the following algorithm 3.
In practice, it is possible to encounter high-dimensional parametric space and cases, where the forward model F is high-dimensional and nonlinear, and thus much computation is needed. Therefore, the number of samplings used to build the surrogate model will increase dramatically. In order to handle this curse of dimensionality problem, we introduce NN, which is a popular technique in many fields. Basically, NN can be described as an input-output map H : R D → R d , which has input or training data σ, the output d, and M hidden layers. We give a general formulation of NN as follows: where σ(·) represents the activate function, W (l) ∈ R and b (l) are the weight matrix and the biases vector of the lth layer in NN, respectively. θ := {W, b} are jointly called the unknown parameters of NN. There are some choices of activate function, e.g., rectified linear unit (ReLU), sigmoid, hyperbolic tangent (tanh). While the architecture of NN is given, we can resort to some optimization techniques to determine the unknown parameters θ := {W, b} by virtue of the training data. Furthermore, we can define the loss function as follows: where N, λ, and ∥θ∥ 2 denote the sampling number, the regularization constant, and the regularization function, respectively. Then the minimization problem can be described as There are various optimal algorithms for NN, for instance, gradient descent (GD) [27], stochastic gradient descent (SGD) [28], Adam [29], and so on. In this paper, our goal to use NN is twofold: the first one is to establish an initial surrogate forward model, and the second one is to establish an non-linear network for generating high precision surrogate model on the posterior distribution. As for the first issue, we would like to generate a NN, of which the inputs are some parameter samplings {σ i }, and the output is a low-precision surrogate forward model F 1 on the prior distribution. As regard the other one, we want to generate a NN so that the inputs and an approximation operator F as the initial surrogate model. via F using the Algorithm 2.

23:
end if 24: are the low-precision model F 1 and some parameter samplings {σ i }, and the corresponding output is a high-precision surrogate forward model on the posterior distribution. In this way, we give the concrete form of the ANNSM to improve the accuracy of the surrogate model based on the prior distribution to on the posterior distribution, and to obtain the high precision posterior density, so as to solve the BIP. This approach greatly eliminates the computational burden of MH algorithm when solving non-linear and high-dimensional BIP, or even IPs, so that we can overcome the obstacles that large scale problems cannot be calculated due to direct MH sampling.

Numerical simulations
In this section, we shall exhibit some simulations about the DMHSM and the ANNSM. For solving the implied volatility of option to illustrate the excellent performance of our proposed algorithm.
To setup the forward problem, we choose r = 0.03, T = 1, K = 1, and let the truncated length after the transformations L = 1.6, so that the truncated region is large enough. The spatial and temporal partitions for the discretized problem are M = 150 and N = 100, respectively. What we need to emphasize is that although the observation field is not the entire discretized region, it contains important information, such as the region containing the points near the optimal exercise boundary. We only observe data by a few fixed points in the field. We assume that the field is divided isometrically in the transformed coordinate scale, and the number of subintervals in the partitions in the spatial and temporal directions are both selected as 15, then the total number of observation points is 256. In addition, the observation field and observation points are completely fixed regardless of the change of the forward problem. However, as the form of volatility changes, the solutions of the forward problem also change accordingly, so we need to carry out linear interpolation using the result of the forward problem to get the solution value at the fixed observation point.
We solve the BIP by using DMHSM and ANNSM, both of which use 50, 000 sample points. We first set the prior distribution π(σ) ∼ N d (−2 × 1 d , 0.5 × I d ) or N d (−2 × 1 d , 0.1 × I d ) when different cases, and the proposal density q(· ; σ) ∼ N d (σ, 0.025 2 × I d ) or N d (σ, 0.005 2 × I d ), respectively. In addition, the sampling noise δ obeys N D (0 D , 0.01 2 × I D ). Meanwhile, we initialize the architecture of two structures of NN, one is the low-precision surrogate model, and the other is the high precision surrogate model. For the surrogate model with low precision, we choose the NN is structured with 4 hidden layers, each of which has 40 neurons, and the corresponding activation function is ReLU. For the high precision surrogate model, the NN is structured with 1 hidden layer, which has 150 neurons, and the activation function is tanh. Adam is selected as the optimization algorithm for training these two NNs.
Let the number of iterations I = 500, the sampling number of each iteration s = 100, and the sampling number of updating model Q = 10. The error threshold value and the radius of updating are set to be ϵ = 10 −3 and R = 0.1, respectively. In order to ensure the stability assumption and independence assumption between samples in the MH sampling method in all experiments, we take one sample point out of dozens to hundreds points in the last fifty percent of the complete sampling set. For each simulation, we first obtain the complete sets of two sampling methods, then draw the atuocorrelation function (ACF) images about DMHSM, so as to determine how many sample intervals can ensure the independence between samples, and then calculate the posterior means as the point estimates. We are going to examine the volatility of forward problem with d = 1, 4. The simulations are all preformed by MATLAB R2020b on a computer with Intel Core i7 CPU of 3.2 GHz.

Simulations for d=1
For the formulation of one-dimensional volatility are specifically given as (4.1) Here, for testing, the basis function in the (4.1) are selected as a(t) = 1, t + 1, and 0.5e t . Furthermore, the parameter σ of the volatility in the forward problem are given as 0.15, 0.25, and 0.4, respectively. Firstly, we can get the sets of sample points for DMHSM and ANNSM, and draw the corresponding posterior density distributions in Figures 1-3.    Accordingly, the corresponding posterior mean estimates by adopting the DMHSM and ANNSM can be obtained in Tables 1-3. Meanwhile, we record the computational times of 9 different cases.
In Tables 1-3, computational time is consist of offline part and online part. Offline time is the CPU time on constructing NN and initialing the low-precision NN. Meanwhile, the online time is time cost on sampling. From these tables, we can compare the infinite modulus error estimates between the point estimations of two methods are both at least on the order of 10 −4 . Moreover, we suppose the given volatility parameter σ is the true estimation of BIP, then the relative error is controlled within 0.5%, and the estimates of two sampling methods agree well. Moreover, we can conclude that the calculation speed of the ANNSM is 2 order of magnitude faster than that of DMHSM while ensuring the calculation accuracy.
In conculsion, the superiority of ANNSM is verified.

Simulations for d=4
Similar to the previous subsection, we now consider the multi-parameters case, i.e., the parameter is a vector. The specific expression of four-dimensional volatility is given as Here, we choose two cases of the basis functions in the (4.2) for simulations: Case 1 : a 1 (t) ≡ 1, a 2 (t) = t, a 3 (t) = t 2 , a 4 (t) = e t , Case 2 : a 1 (t) ≡ 1, a 2 (t) = t, a 3 (t) = 3 −t , a 4 (t) = e t .
In addition, the parameter vector σ = (σ 1 , σ 2 , σ 3 , σ 4 ) of the volatility in the forward problem are chosen as   At first, the sets of sample points for two sampling methods can be obtained, and we can draw the corresponding posterior density distributions in Figures 7 and 8.
From Figures 7 and 8, we can conclude that although the results for each component in mutli dimensions are not as good as those in one dimension, the posterior distributions obtained by ANNSM are in general agreement with those obtained by DMHSM to a large extent. We shall compare the posterior mean estimates of two methods further.
We draw the ACF figures about the samplings of DMHSM for two different basis functions and parameter vector choices of the volatility in Figures 9-12.       Tables 4 and 5, we can draw two conclusions: one is that the L ∞ error estimates between the results of DMHSM and ANNSM are at least on the order of 10 −3 . Furthermore, the relative error is controlled within 6.5%. The other one is that the calculation speed of ANNSM is 1 order of magnitude faster than that of DMHSM when the calculation accuracy is also guaranteed. Therefore, the advantages of high accuracy and less time consuming of ANNSM for four-dimensional BIP is proved.

Conclusions
In this paper, we have developed an ANNSM to solve the BIP of implied volatility of timedependent American option. Firstly, we give the linear complementarity problem of this American put option. Then, the original problem is transformed into several standard American put option problems through variable substitution and discretization in temporal direction. Furthermore, the price of these standard American put options can be solved by primal-dual active-set method after numerical transformation and finite element discretization in spatial direction. Secondly, we solve the inverse problem by Bayesian inference about implied volatility under the premise that the option price is known and the fixed observations can be carried out in a finite region by interpolation. The general background of BIP is introduced. We consider to use the surrogate model because of the nonlinearity and high-dimensionality of BIP, and further propose ANNSM combined with NN. Finally, we perform numerical simulations with one-and four-dimensional IPs to compare the accuracy and calculation speed between DMHSM and ANNSM, respectively. And from the point estimates and posterior distributions, the superiority of ANNSM in solving implied volatility of time-dependent American option is verified.