Calibration and simulation of Heston model

Abstract We calibrate the Heston stochastic volatility model to real market data using several optimization techniques. We compare both global and local optimizers for different weights, showing remarkable differences even for data (DAX options) from two consecutive days. We provide a novel calibration procedure that incorporates the usage of an approximation formula and significantly outperforms other existing calibration methods. We test and compare several simulation schemes using the parameters obtained by calibration to real market data. Next to the known schemes (log-Euler, Milstein, QE, Exact scheme, IJK), we also introduce a new method combining the Exact approach and the Milstein (E+M) scheme. The tests are carried out by pricing European call options by the Monte Carlo method. The presented comparisons give empirical evidence and recommendations as to which methods should and should not be used and why. We further improve the QE scheme by adapting the antithetic variates technique for variance reduction.


Introduction
Since the introduction of the Black-Scholes model in [1], a number of complex models have been proposed to reflect the behaviour of markets and their derivatives. In particular, the Black-Scholes model of option valuation relies on constant volatility, which no longer represents the dynamics of financial markets sufficiently. Relaxing this assumption led to the so-called stochastic volatility models.
Among the first publications about the class of stochastic volatility models were Hull and White [2], Scott [3], Stein and Stein [4] and Heston [5]. The Heston model was one of the first models that allowed a calibration to real market data using the semi-closed form solution for European call and put option prices. In the Heston model, one can also consider a correlation between the asset price and the volatility process, as opposed, for example, to Stein and Stein [4].
The Heston model is one of the most popular models for option pricing. It can be calibrated using vanilla option prices and then used to price exotic derivatives for which there is no closed form valuation formula. For this purpose, a method for simulating the evolution of the variable of interest is necessary. Although the Heston model was introduced already in 1993, research on discretization methods for its dynamics has appeared only recently. Different Euler scheme implementations were examined and various ways to deal with negative values of the variance process were suggested, for example by Higham and Mao [6] or Lord et al. [7]. An exact scheme, which simulates the Heston processes from their exact distributions, was developed by Broadie and Kaya [8]. A higher order Milstein scheme is also available, as well as the QE scheme introduced by Andersen [9], which is currently the most popular one.
Most papers testing Monte Carlo pricing omit certain schemes and also choose their parameters and discretizations rather conveniently when comparing the schemes or showing their performance. We aim to eliminate this phenomenon by calibrating the Heston model to real market data and obtaining a set of parameters that explains the prices of European call options on the real market. We then use these parameters in the Monte Carlo simulations and compare the Monte Carlo results with the prices implied by the Heston model.
The calibration of the model is a crucial process, and the price to pay for a more complex model is the increased complexity of the calibration process. The industry standard approach is to minimize the difference between the observed prices and the model prices. We can follow, for example, Mikhailov and Nögel [10], who used the method of adaptive simulated annealing with weights of their own fine choice. Unfortunately, it is quite common in the literature that only a few data points, mostly at-the-money options, are used to demonstrate the calibration process, which of course leads to small errors and makes the calibration problem easy to solve. Synthetic option prices generated with simple parameters are also often calculated and used for the calibration; see for example Kienitz and Wetterau [11], where synthetic option prices are calculated and the model parameters are recovered by the calibration process. A significant role is played by the optimizer, because local optimizers depend heavily on the initial guess, and thus the distance between the model prices and the market prices does not necessarily result from the model, but can be due to the calibration process.
We demonstrate the complexity of the calibration process and employ a variety of optimizers. To overcome the problem with the initial guess, we use global optimizers, in particular adaptive simulated annealing and a genetic algorithm. We use different optimizers with a specific approach to calibrate the model to data obtained from the real market, namely daily data for DAX options obtained using the Bloomberg Terminal. We present the results of the calibration for two consecutive days. A combination of a global and a local optimizer is compared to the results obtained by running a large number of local optimizers starting at different initial points of a deterministic grid. The idea of a deterministic grid was thoroughly tested in the Master's thesis [12] and, in terms of relative error, offers the best results. We show that the combination of global and local optimizers can provide reasonably good results in much shorter computational times.
Calibration of the Heston model, together with the recent approximative fractional stochastic volatility model, is briefly described and compared also in the author's papers [13,14], where only calibration results are presented. In this paper we focus on the Heston model only, for which we compare in depth not only calibration techniques, but also several simulation methods. We further improve the calibration process by also incorporating the approximation formula by Alòs, de Santiago and Vives [15], and the Monte Carlo simulation is then improved by considering antithetic variates in the QE scheme.
It is worth mentioning that although many papers and even books, e.g. the recent book by Rouah [16], have been published about the Heston model since the original Heston paper [5], one can hardly find practical rules of thumb for calibrating the model to real market data and for the performance of the simulation techniques for the parameters obtained by the calibration process. The lists of techniques for parameter estimation and schemes for simulation mentioned in Chapters 6 and 7 of the book [16] are far from satisfactory. This paper aims to fill this gap by presenting several comparisons, giving some recommendations for the optimization techniques and also showing which simulation methods should and should not be used and why.
The paper is organized as follows. In section 2, we outline the notation used in this paper together with the properties of the Heston model. Calibration of the model to real market data is presented and optimization techniques are compared in section 3. Special attention is given to the combination of global and local optimizers that uses the approximation of the Heston formula to obtain the initial guess of the price very quickly. In section 4 we present and compare the simulation schemes and also introduce a new method combining the exact and Milstein (E+M) schemes. We also show how antithetic variates can further improve the QE scheme. We conclude in section 5.
As for the notation used in this paper, it is worth mentioning that many formulas are presented in the notation used in the original cited papers. Although this may seem like a slight inconsistency of notation within this paper, we believe that it can help further investigation of the studied formulas and schemes within the cited materials.

Heston model
Following the original paper by Heston [5] and the recent book by Rouah [16], we consider the risk-neutral stock price model

dS(t) = rS(t) dt + √v(t) S(t) dW̃^S(t),
dv(t) = κ(θ − v(t)) dt + σ√v(t) dW̃^v(t),

with initial conditions S(0) = S₀ > 0 and v(0) = v₀ > 0, where S(t) represents the price of the asset at time t, v(t) is the instantaneous variance at time t, r is the risk-neutral rate of return, θ is the long-run average variance (as t tends to infinity, the expected value of v(t) tends to θ), κ is the rate at which v(t) reverts to θ and σ is the volatility of the variance. Processes W̃^S and W̃^v are two Wiener processes under the risk-neutral measure P̃ that are correlated with instantaneous correlation ρ.
Stochastic process v(t) is referred to as the variance process (it is also called a volatility process and the parameter σ then the volatility of volatility) and it is the CIR process, the square-root mean-reverting process introduced by Cox, Ingersoll and Ross [17]. It is always non-negative and cannot become zero if the Feller [18] condition 2κθ > σ² is satisfied.
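When implementing the model, the Feller condition is a cheap sanity check on any parameter set. A minimal Python sketch (the paper's own codes are in MATLAB; the parameter values below are illustrative, not calibrated):

```python
def feller_satisfied(kappa, theta, sigma):
    """Check the Feller condition 2*kappa*theta > sigma**2.

    If it holds, the CIR variance process stays strictly positive.
    """
    return 2.0 * kappa * theta > sigma ** 2

# Illustrative parameter sets (not calibrated values)
print(feller_satisfied(kappa=2.0, theta=0.04, sigma=0.3))  # 0.16 > 0.09 -> True
print(feller_satisfied(kappa=1.0, theta=0.02, sigma=0.5))  # 0.04 > 0.25 -> False
```

Note that calibrated market parameters frequently violate this condition, which is one reason the choice of simulation scheme matters.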
The main parameters of interest in the Heston model are v₀, κ, θ, σ and ρ. High values of κ essentially mean a higher speed of the reversion of the process toward its long-run mean θ. Obviously, an increase in θ increases the prices of options. The parameter ρ affects the skewness of the returns distribution and hence the skewness in the implied volatility smile [5]. Negative values of ρ induce a negative skewness in the returns distribution since lower returns will be accompanied by higher volatility, which will stretch the left tail of the distribution. The reverse is true for positive correlation. The parameter σ affects the kurtosis of the returns distribution and hence the steepness of the implied volatility smile. If κ is not too large, large values of σ cause more fluctuation in the volatility process and hence stretch the tails of the returns distribution in both directions.
A European call option maturing at time T with strike price K can be expressed as

c(S, v, t) = e^(−rτ) Ẽ[(S(T) − K)⁺] = e^x P₁(x, v, τ) − e^(−rτ) K P₂(x, v, τ), (6)

where x = ln S and τ = T − t is the time to expiration. The first term represents the present value of the underlying given an optimal exercise and the second term is the present value of the strike price payment. Moreover, P₁ is the delta of the European call option and P₂ is the conditional risk-neutral probability that the asset price will be greater than K at maturity. Both P_j, j = 1, 2, represent the conditional probability of the call option expiring in-the-money given the value x = x(t) of the log-stock price and the value v = v(t) of the volatility at time t. Heston [5] considers the characteristic functions of these probabilities in the form

f_j(x, v, τ; φ) = exp(C(τ; φ) + D(τ; φ)v + iφx), (7)

with C(τ; φ) and D(τ; φ) given by (8) and (9) for j = 1, 2, and shows that the terms P₁, P₂ are defined via the inverse Fourier transformation

P_j(x, v, τ) = 1/2 + (1/π) ∫₀^∞ Re[e^(−iφ ln K) f_j(x, v, τ; φ) / (iφ)] dφ, j = 1, 2. (12)

The equations (6), (7) and (12) give the solution for European call options as first introduced by Heston in [5]. The integrand in equation (12) is a smooth function that decays rapidly and presents no difficulties, as was pointed out also in [5].

The little Heston trap
The characteristic function solution (7) was later the subject of the paper called "The little Heston trap" by Albrecher et al. [19], who investigated in detail the properties of and relations between the original formulation given by Heston and another formulation presented for example in Gatheral [20], which is identical to the Heston formulation but causes fewer numerical problems in the implementation of the model. The change amounts to switching the sign of the term d given by (10), together with the corresponding reformulation of the terms C(τ; φ) given by (8) and D(τ; φ) given by (9). These changes produce an equivalent formulation to the original Heston formulation, but regarding the properties of both formulations, Albrecher et al. [19] provided proofs that the Heston formulation is unstable under certain conditions while the alternative formulation is stable for all considered parameter ranges. They also established a threshold maturity beyond which the Heston formulation suffers from instability. When the Feller condition is exactly satisfied, they encountered no problems in either version. The paper of Kahl and Jäckel [21] also concerns itself with this problem and uses the rotation count algorithm to overcome the discontinuities in the original Heston formulation.
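For illustration, the stable ("little trap") formulation of the characteristic function, together with the resulting semi-closed call price via the inversion (12), can be sketched in Python (the paper's implementations are in MATLAB; the function names are ours, and the integrals are evaluated here with a generic adaptive quadrature, not the safeguarded scheme discussed later):

```python
import numpy as np
from scipy.integrate import quad

def heston_cf(u, tau, S0, r, v0, kappa, theta, sigma, rho):
    """Characteristic function of ln S(T), stable (Albrecher et al.) formulation."""
    iu = 1j * u
    d = np.sqrt((rho * sigma * iu - kappa) ** 2 + sigma ** 2 * (iu + u ** 2))
    g = (kappa - rho * sigma * iu - d) / (kappa - rho * sigma * iu + d)
    C = r * iu * tau + kappa * theta / sigma ** 2 * (
        (kappa - rho * sigma * iu - d) * tau
        - 2.0 * np.log((1.0 - g * np.exp(-d * tau)) / (1.0 - g)))
    D = ((kappa - rho * sigma * iu - d) / sigma ** 2
         * (1.0 - np.exp(-d * tau)) / (1.0 - g * np.exp(-d * tau)))
    return np.exp(C + D * v0 + iu * np.log(S0))

def heston_call(S0, K, tau, r, v0, kappa, theta, sigma, rho):
    """European call via c = S0*P1 - K*exp(-r*tau)*P2 (Gil-Pelaez inversion)."""
    phi = lambda u: heston_cf(u, tau, S0, r, v0, kappa, theta, sigma, rho)
    def P(j):
        if j == 1:
            # P1 uses the measure change phi(u - i) / phi(-i)
            f = lambda u: (np.exp(-1j * u * np.log(K)) * phi(u - 1j)
                           / (1j * u * phi(-1j))).real
        else:
            f = lambda u: (np.exp(-1j * u * np.log(K)) * phi(u)
                           / (1j * u)).real
        return 0.5 + quad(f, 1e-8, 200.0)[0] / np.pi
    return S0 * P(1) - K * np.exp(-r * tau) * P(2)
```

A basic consistency check is that the characteristic function equals one at u = 0, and that for a very small volatility of variance the price approaches the Black-Scholes price with volatility √v₀.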

Lewis formula
In the book by Lewis [22], the author presents the so-called fundamental transform approach to option valuation. The pricing formula by Lewis is well behaved compared to the formula by Albrecher et al. [19] and it has a numerical advantage in the sense that only one numerical integral has to be calculated for each option price. The price of a European call option is expressed in (16) as a single integral of the fundamental transform, where X = ln(S/K) + rτ. To show that formulas (6) and (16) are equivalent, we refer to the recent paper by Baustian et al. [23], where the authors also extended Lewis's approach to models with jumps. Yet another formula was later obtained by Attari [24], who uses Euler's identity to represent separately the real and imaginary parts of the integrand, which in Attari's representation decays faster than in the original Heston's (or Albrecher's) case. Since Attari's version does not bring any extra advantage over the Lewis formula, and the Lewis approach is more general, we omit the Attari formula from our tests. All pricing formulas that are expressed as an inverse Fourier transform integral must be evaluated numerically with caution, as is pointed out by Daněk and Pospíšil in [25]. They showed that an inaccurately evaluated integrand causes serious problems to adaptive numerical quadratures, which consequently can lead to significant differences in the option price, even in the order of hundreds of dollars. To overcome the problems with inaccurately evaluated integrands, Daněk and Pospíšil suggest using high precision arithmetic in problematic cases. In MATLAB, it is possible to define variables and perform numerical calculations in high precision arithmetic using the vpa (variable precision arithmetic) functionality that is part of the Symbolic Math Toolbox. In [25] the authors provided a fast algorithm for switching between the standard double arithmetic and the high precision vpa arithmetic.
If not stated otherwise, we use this algorithm in the implementation of the Lewis formula (16).
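The effect that the regime-switching safeguard protects against can be reproduced in miniature. The following toy cancellation example is our own (not taken from [25]); in Python, the standard decimal module plays a role analogous to MATLAB's vpa:

```python
from decimal import Decimal, getcontext

# Catastrophic cancellation in double precision: the added 1 is lost entirely,
# because the spacing between representable doubles near 1e16 is 2
double_result = (1e16 + 1.0) - 1e16
print(double_result)  # 0.0, although the true answer is 1

# The same computation in 30-digit decimal arithmetic recovers the correct value
getcontext().prec = 30
high_prec = (Decimal(10) ** 16 + 1) - Decimal(10) ** 16
print(high_prec)  # 1
```

In the pricing integrand the cancellation is subtler, but the remedy is the same: switch to higher precision only for the problematic evaluations, since high precision arithmetic is orders of magnitude slower.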

Approximation formulas
Numerical evaluation of the inverse Fourier transform integral in (12) or (16) can be rather time consuming, which is most evident in the calibration process, which can require thousands of such evaluations. Finding a suitable fast approximation is therefore of interest. Instead of the characteristic function solution, one can use for example the so-called Edgeworth expansion of the European call option price (6), where the probabilities P₁ and P₂ are approximated using the so-called Sartorelli approximations, see Ribeiro and Poulsen [26]. This approach leads to a fast method for option pricing (according to Ribeiro and Poulsen, a seventh order expansion is about three times faster than the Heston integration and works reasonably well in the range of 80-120% moneyness). Another approach to approximation was used by Forde and Jacquier, who derived both a large-maturity approximation [27] and a small-maturity expansion formula [28]. The large-maturity approximation is derived using tools from large deviations theory and provides similar results to the volatility of volatility series expansion derived also by Lewis [22]. The small-maturity approximation formula is then derived using the so-called saddle point expansion, and although the approximation is reasonably good, its usage is rather impractical, because it requires a numerical solution of nonlinear equations that is rather time consuming (the speed-up factor is not significant compared to the numerical integration in the above mentioned formulas).
Among all approximation formulas for the Heston model, we would like to highlight the formula obtained by Alòs, de Santiago and Vives [15]. The approximation (17) of the Heston European call price takes the form of a Black-Scholes price plus a correction term, where BS(t, x, σ) denotes the Black-Scholes price at time t of a European call option with constant volatility σ, current log-stock price x, time to maturity τ = T − t, strike price K and interest rate r, X_t = ln(S_t/K), the factor e^(−κ(T−t)) enters the correction term, and v̄ denotes the future average volatility; we refer to [15] for the exact form of all terms. Alòs, de Santiago and Vives showed that the approximation is especially good for small values of the volatility of the variance σ, in particular that the error is of order O(σ²(|ρ| + σ)). The biggest advantage of this approximation is, however, its speed of calculation, which makes it suitable for the fast pre-calibration that we show later on. We would like to close this section by mentioning also the approximation by fast Fourier transform (FFT), as was suggested by Carr and Madan [29]. Although the FFT methods, including the fractional FFT modification [30,31], the cosine FFT method [32] and methods based on wavelets [33], are fast in calculating an approximation of the inverse Fourier transform integral at many discrete points at once, in the option pricing problem many values are calculated redundantly and, moreover, with a relatively low precision that should be considered unsatisfactory in modern financial applications. The Heston model in particular was studied from the FFT perspective by Zhylyevskyy [34,35].
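Since the approximation (17) is built around the Black-Scholes price BS(t, x, σ), the building block itself is cheap to evaluate. A minimal Python sketch of that building block (function names are ours; the correction term of [15] is deliberately not reproduced here):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, tau, r, vol):
    """Black-Scholes price of a European call with constant volatility vol."""
    d1 = (log(S / K) + (r + 0.5 * vol ** 2) * tau) / (vol * sqrt(tau))
    d2 = d1 - vol * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

# At-the-money example: S = K = 100, r = 0, vol = 20%, one year to maturity
print(round(bs_call(100.0, 100.0, 1.0, 0.0, 0.2), 4))  # ~7.9656
```

Because the formula involves only elementary functions and the error function, an evaluation costs a few microseconds, which is what makes an approximation of the form "Black-Scholes plus correction" attractive for pre-calibration.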

Heston model calibration
Market prices are used to calibrate an option pricing model, which can then be used to compute prices of more complex (exotic) options or to compute hedge ratios. Determining the model parameters so as to match the market prices of a set of options is what is commonly referred to as the model calibration problem; it is in fact the inverse problem to the option pricing problem. The price to pay for a more realistic model such as the Heston model is the increased complexity of model calibration. As pointed out by Jacquier and Jarrow [36], the estimation method becomes as crucial as the model itself.
In order to compute option prices using the Heston model, we need input parameters that are not observable from the market data; therefore we cannot use empirical estimates. Bakshi et al. [37] documented that the implied structural parameters differ significantly from their time-series estimated counterparts. For instance, the magnitudes of the time-series correlation coefficient of the asset return and its volatility estimated from daily prices were much lower than their option-implied counterparts.
One of the difficulties in solving the calibration problem is that the information observed from market data is insufficient to identify the parameters exactly. Several sets of parameters may perform well and produce model prices that are consistent with the prices observed on the market. This causes the ill-posedness of the problem.
In practice it is neither possible nor meaningful to match the observed prices exactly. Therefore the problem of calibrating the model is formulated as an optimization problem. Our aim is to minimize the pricing error between the model prices and the market prices for a set of traded options. A common approach to measuring this error is to use the squared difference between market and model prices; this approach leads to the nonlinear least squares problem

inf_Θ G(Θ), G(Θ) = Σ_{i=1}^N w_i (C_i^Θ(T_i, K_i) − C_i(T_i, K_i))², (18)

where N denotes the number of options used, w_i is a weight, C_i(T_i, K_i) is the market price of the i-th call option observed at time t, and C_i^Θ denotes the model price computed using the vector of model parameters Θ = (v₀, κ, θ, σ, ρ). Thinking of G as an objective function of Θ, the optimization problem (18) is not an easy one. The function G is not convex and is not of any particular structure. Moreover, it may have more than one global minimum, and even if it did not, it is not clear whether the unique minimum can be reached by a gradient-based algorithm. Local deterministic algorithms can be used, but there is always the possibility of ending up in a local minimum, and one has to choose an initial guess that is "good enough". Due to the high sensitivity to initial parameters, various stochastic algorithms have been suggested. Nevertheless, for example Mikhailov and Nögel [10] use the built-in Excel solver, which is a local optimizer using the generalized reduced gradient method.
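The structure of problem (18) can be illustrated with a toy model in place of the Heston pricer. The Python sketch below uses scipy's least-squares solver as an analogue of MATLAB's lsqnonlin; the exponential "model price" and all numbers are purely illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy stand-in for the model price C^Theta(K): a two-parameter curve
def model_price(K, a, b):
    return a * np.exp(-b * K)

strikes = np.linspace(80.0, 120.0, 9)
true_params = (50.0, 0.02)
market = model_price(strikes, *true_params)   # synthetic "market prices"
weights = np.ones_like(strikes)               # equal weights for this toy example

# least_squares minimizes the sum of squares of the residual vector,
# so passing sqrt(w_i) * (model - market) reproduces objective (18)
def residuals(params):
    return np.sqrt(weights) * (model_price(strikes, *params) - market)

fit = least_squares(residuals, x0=[40.0, 0.03])
print(fit.x)  # recovers approximately (50.0, 0.02)
```

For the real problem the residual function calls the Heston pricer, the weights come from market liquidity, and, unlike in this convex-looking toy, the landscape of G has multiple local minima.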
Another method used to calibrate the model is regularisation. This method adds to the objective function (18) a penalization term αf(Θ) such that the resulting functional is convex, hence a gradient-based optimizing procedure can be used. This yields another parameter α to be estimated, which is called the regularisation parameter. For further details, see Hamida and Cont [38].

Considered optimization techniques
Following the methodology and results obtained by the authors in [13,14], we describe in more detail the behaviour of all optimization techniques. We consider MATLAB's genetic algorithm (GA) and simulated annealing (SA) from the Global Optimization Toolbox, as well as MATLAB's lsqnonlin using the trust-region-reflective algorithm. In addition to the MATLAB functions we also test the performance of MS Excel's solver, adaptive simulated annealing (ASA) (available at ingber.com as a standalone C code and library with a MATLAB gateway) and the modified sequential quadratic programming (modSQP) method suggested by Kienitz and Wetterau [11]. Although ASA is not implemented in MATLAB directly and the available gateway has its limits in vector calculations, the re-annealing algorithm in ASA is faster than the standard annealing in SA, and based on the performance on a set of artificially generated prices we omit the SA results from the tables below. Although modSQP is not designed for nonlinear least squares minimization problems of the type (18), it surprisingly works well on small examples; however, it is not applicable to larger numbers of options, therefore we omit modSQP as well.

Considered errors
To evaluate the performance of all optimization methods, we measure the following errors: the average absolute relative error

AARE(Θ) = (1/N) Σ_{i=1}^N |C_i^Θ − C_i| / C_i,

and the maximum absolute relative error

MARE(Θ) = max_{i=1,…,N} |C_i^Θ − C_i| / C_i.
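Both errors are straightforward to compute once the model and market prices are aligned. A Python sketch (array values are illustrative):

```python
import numpy as np

def aare(model, market):
    """Average absolute relative error over all quotes."""
    return np.mean(np.abs(model - market) / market)

def mare(model, market):
    """Maximum absolute relative error over all quotes."""
    return np.max(np.abs(model - market) / market)

market = np.array([10.0, 20.0, 40.0])
model = np.array([11.0, 19.0, 40.0])
print(aare(model, market))  # (0.10 + 0.05 + 0.00) / 3 = 0.05
print(mare(model, market))  # 0.10
```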

Considered weights
As defined above, w_i stands for a particular weight. In practice, the weights are put where the most liquid quotes on the market are, which is usually around ATM. With our market data we use the bid-ask spreads: we aim to have the model prices as close to the mid price as possible, thus we put more weight on small spreads. According to U. Nögel, everybody does their own fine tuning, depending on the market and the products. We decided not to pick just one choice of the weight function, but rather to test several choices and see whether the particular choice has a significant influence on the results. For one particular option we denote the weights by capital letters A, B and C, where weight A denotes 1/|Spread|, weight B denotes 1/(Spread)² and weight C denotes 1/√Spread. The bigger the spread, the less weight is put on the particular difference between the model price and the market price during the calibration process. In [14] it was shown that weights calculated from the Black-Scholes vega greeks (denoted by weights D) can be the least suitable, hence we abandoned their usage; we also did not normalize the weights and omitted the equal weights E.
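The three weighting schemes translate directly into code (Python sketch; the spread values are illustrative):

```python
import numpy as np

def weights_from_spread(spread, scheme):
    """Weights computed from bid-ask spreads.

    A: 1/|spread|, B: 1/spread^2, C: 1/sqrt(spread)."""
    spread = np.asarray(spread, dtype=float)
    if scheme == "A":
        return 1.0 / np.abs(spread)
    if scheme == "B":
        return 1.0 / spread ** 2
    if scheme == "C":
        return 1.0 / np.sqrt(spread)
    raise ValueError("unknown scheme: " + scheme)

spreads = [0.25, 1.0, 4.0]
print(weights_from_spread(spreads, "A"))  # -> 4, 1, 0.25
print(weights_from_spread(spreads, "B"))  # -> 16, 1, 0.0625
print(weights_from_spread(spreads, "C"))  # -> 2, 1, 0.5
```

Note how scheme B punishes wide spreads much more aggressively than scheme C, which is the qualitative difference observed in the calibration results below.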

Considered real market data
In order to calibrate the Heston model to data obtained from the real market, we use Bloomberg's Option Monitor to obtain data for two consecutive days for ODAX call options. Table 1 sums up the data we use to calibrate the Heston model. As the risk-free interest rate r we use the corresponding 3-month EURIBOR rate; S₀ is the current DAX value at the time of obtaining the data.

Calibration results - March 18, 2013
In this section we present the results of the calibration of the Heston model to the first set of data from March 18, 2013. We used GA and ASA as global optimizers and MS Excel's Solver and MATLAB's lsqnonlin as local optimizers. Global optimizers can find a solution on their own even without any initial guess; local optimizers, on the other hand, need a suitable starting point. Hence, combinations of global and local optimizers follow as well. In all algorithms we set the function tolerance stopping criterion to TolFun=1e-9, i.e. the algorithm stops if the change in the best utility function value (i.e. the change to the minimal value found in previous iterations) is less than or equal to TolFun. In global optimizers this tolerance represents the average relative change of the best utility function value over the so-called stall iterations, with the maximum number of stall generations set to StallGenLimit=50.
In local optimizers, an additional termination tolerance TolX=1e-9 for change in the solution (point of the minima) is considered.

Genetic algorithm
We were able to obtain a negative correlation ρ between the underlying asset and its volatility using GA with different weights. Using weights A and B resulted in similar sets of parameters; the best result in terms of absolute relative errors was obtained using weights C. However, we can still observe higher absolute relative errors in Figure 2 for out-of-the-money options, especially those maturing on June 21, 2013, represented by the first row.

Adaptive simulated annealing
As a result of the calibration using ASA with different weights we obtained three quite different sets of parameters. All three contained a negative correlation parameter ρ between the underlying asset and its volatility, but otherwise the parameters varied considerably. Using weights B produced very low absolute relative errors; only deep out-of-the-money options suffered an average absolute relative error higher than 0.8%. This was the best result obtained so far, and the absolute relative errors can be seen in Figure 3.

Deterministic grid for lsqnonlin
When calibrating the model using lsqnonlin, the result can be extremely sensitive to the provided initial value. To overcome this, we divided the state space of possible parameter values in order to obtain a set of initial values. We used these as initial values and adaptively refined the results by repeating the process for regions where further improvement was possible. This method gave us the best results in terms of absolute relative errors, and the maximum relative error was also in line with the rest of the errors. We do not observe high values for the out-of-the-money options with short maturities in Figure 4; we used weights B to achieve this. Nonetheless, this method is rather time consuming, and therefore we adopted another approach, described in the next section, in order to match the magnitude of the errors for the parameters obtained by this method.
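The grid idea can be sketched as follows (Python, with scipy.optimize.minimize standing in for MATLAB's lsqnonlin and a toy multimodal objective in place of the calibration error G; the adaptive refinement step is omitted):

```python
import itertools
import numpy as np
from scipy.optimize import minimize

# Toy multimodal objective standing in for the calibration error G(Theta):
# two global minima at (x, y) = (+-1, 0.5) with value 0
def objective(p):
    x, y = p
    return (x ** 2 - 1.0) ** 2 + (y - 0.5) ** 2

# Deterministic grid of initial points covering the admissible parameter box
grid_x = np.linspace(-2.0, 2.0, 5)
grid_y = np.linspace(-1.0, 2.0, 4)

best = None
for x0 in itertools.product(grid_x, grid_y):
    res = minimize(objective, x0, method="Nelder-Mead")
    if best is None or res.fun < best.fun:
        best = res

print(best.fun)  # close to the global minimum value 0
```

The cost grows geometrically with the number of parameters (five in the Heston case) and grid points per dimension, which is exactly why this approach is a benchmark rather than a practical calibration method.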

Excel's solver
Having observed quite high absolute relative errors for out-of-the-money options with short maturities when using the global optimizers ASA and GA, we tried to eliminate these errors by applying the local optimizers.
At first we employed MS Excel's solver and provided it with the initial guesses from the global optimizers (GA, ASA). Except when using weights C, Excel was able to refine the values of the parameters when starting from initial values provided by GA. For the choice of weights B we were able to match the best result in terms of average absolute relative error; this can be partly observed in Figure 5, and we were also able to obtain a maximum absolute relative error under 3%. When starting from the point provided by ASA, although the average absolute error was already at the low value of 0.58%, we were able to refine the parameters using MS Excel's Solver. The lowest average absolute error was obtained using weights A, and the maximum absolute relative error was lowered to 3.53%. Similar results were obtained using weights C. Using weights B, Excel's solver did not refine the parameters significantly.
Although MS Excel's solver behaved surprisingly well, it still has its limitations in terms of both speed and precision. Standard Excel installations, for example, do not support high precision arithmetic, and one has to use third-party products such as xlPrecision. Their usage was beyond the scope of this paper, i.e. all implementations in Excel use standard double arithmetic only.

lsqnonlin
We applied the same approach for lsqnonlin as previously for MS Excel's solver, thus starting from the initial points provided by GA and ASA. lsqnonlin was able to refine the parameters using all the weights and starting from both initial points.
Using weights B and starting from the initial point provided by GA, we were able to calibrate the model with an average absolute relative error of 0.52%; we also obtained the lowest maximum absolute error of 2.33%, see Figure 6.
The best result when starting from the point provided by ASA was also obtained using weights B, in terms of both maximum absolute relative error and average absolute relative error. Nevertheless, it was not better than what we were able to obtain when using the initial guess from GA. Table 2 sums up the results comparing the global optimizers ASA and GA on the real data set. However, using lsqnonlin and starting from a predetermined number of initial points, we were able to calibrate the model with the errors shown in Figure 4.

Comparison
As mentioned before, this is not a practical way to calibrate the model, since it is very time consuming, but we can consider it as a benchmark value.
Using the global optimizers and then employing MS Excel's Solver, we were able to obtain sets of parameters resulting in errors of a similar magnitude. For example, in terms of average absolute relative error, we managed to calibrate the model using an initial point provided by GA with an error of 0.51%, whereas the maximum relative error was 2.79%. The average absolute relative error was of the same value as for lsqnonlin using the deterministic grid of initial points.
The lowest maximum absolute relative error, 2.33%, was obtained when using lsqnonlin with the initial point provided again by GA, with an average absolute relative error of 0.52%.
We can say that the best initial point for local optimization was provided by the GA algorithm, which is part of MATLAB's Global Optimization Toolbox and requires no additional installation, as opposed to ASA.
The choice of weights nevertheless plays an important role. We were not able to distinguish which weight is the most suitable one: for example, using weights C with Excel did not work for the initial point provided by GA, but for the initial point provided by ASA the solver gave us the best result as far as the maximum absolute relative error is concerned.
Based on the results in Table 2, it is very likely that weights B suit lsqnonlin best in terms of both average absolute relative error and maximum absolute relative error.

Speed-up using approximations
A tremendous speed-up of the calibration process can be achieved if the optimization is run against the approximation formula (17) rather than the exact formula. For example local optimization using lsqnonlin can give a solution for any initial guess in computational time less than a second (reference PC with 1x Intel i7-3520M 2.9 GHz CPU and 8 GB RAM). To get a more precise solution, we take this result as a starting point for another lsqnonlin run now with the exact formula. If we have no initial guess available at all and we need to run the global optimization first, we can actually use the approximation formula in the global optimization as well. Moreover, it is not necessary to wait for the global optimization to finish, but several iterations are sufficient to give a meaningful result that can serve as the initial guess for the first local optimization. Based on empirical observations we designed the following calibration algorithm: Step 1: Run 10 iterations (generations) of the GA optimization for utility (fitness) function (18) that uses approximation formula (17), GA is run with population size 50, let us denote the result ‚ GA .
Step 2: Run the first lsqnonlin optimization for utility function (18) that uses approximation formula (17), the optimization is run with the initial guess ‚ GA , let us denote the result as ‚ LSQ 1 .
Step 3: Run the second lsqnonlin optimization for the utility function (18) that uses the exact Lewis formula (16), with the initial guess Θ_LSQ1. The result Θ_LSQ2 is taken as the solution of the whole calibration process.
If run repeatedly, the average computational times for steps 1, 2 and 3 are 550 ms, 120 ms and 950 ms respectively, giving an average calibration time of 1.62 s. However, if we skip the second step, i.e. if we run only one lsqnonlin optimization for the utility function that uses the exact Lewis formula (16) with the initial guess Θ_GA, the computational times increase: the average time for this single step is 2.83 s. The most significant difference is, however, in the computational time of the first step. If it is run with the exact formula (16), the average time of this step is 4.15 s. It is worth mentioning that due to the random nature of GA, the variance of the computational times can be large. Computational times of the GA optimization can grow especially when the numerical evaluation of the integral in (16) requires switching to high precision arithmetic, as discussed in section 2.2 above. From this perspective, using the approximation formula in the global optimization is very efficient. We can conclude that incorporating the approximation formula into the calibration process brought a speed-up factor of about 4.3, i.e. the calibration is now 4.3 times faster than the procedure presented in [14]. In fact, the speed-up is even bigger, since in [14] the GA was run for more than 10 iterations and with population size 100; to provide a meaningful comparison, in both tests we fixed the number of iterations in the first step of the calibration to 10 and the population size to 50, which seems to be sufficient for problems with five variables.
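The three-step structure can be sketched generically. The Python sketch below uses toy stand-in objectives and stand-in optimizers (a random search in place of GA, a derivative-free pattern search in place of lsqnonlin); all names here are illustrative and not the actual MATLAB implementation.

```python
import random

def calibrate(approx_obj, exact_obj, global_search, local_refine):
    """Three-step calibration: cheap global search on the approximate
    objective, local refinement on the approximation, then a final
    local refinement against the exact objective."""
    theta_ga = global_search(approx_obj)               # Step 1 (few GA generations)
    theta_lsq1 = local_refine(approx_obj, theta_ga)    # Step 2 (approximate formula)
    theta_lsq2 = local_refine(exact_obj, theta_lsq1)   # Step 3 (exact formula)
    return theta_lsq2

# Toy stand-ins: the "exact" objective is a shifted quadratic and the
# "approximation" a slightly biased version of it.
exact = lambda x: (x - 2.0) ** 2
approx = lambda x: (x - 2.1) ** 2

def random_search(f, trials=200, lo=-10.0, hi=10.0):
    random.seed(0)
    return min((random.uniform(lo, hi) for _ in range(trials)), key=f)

def pattern_refine(f, x0, step=1.0, iters=60):
    # Crude derivative-free local refinement around x0.
    x = x0
    for _ in range(iters):
        for cand in (x - step, x + step):
            if f(cand) < f(x):
                x = cand
        step *= 0.7
    return x

theta = calibrate(approx, exact, random_search, pattern_refine)
print(theta)  # close to the optimum 2.0 of the exact objective
```

The point of the structure is that the expensive exact objective is evaluated only in the final refinement, already very close to the optimum.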

Calibration results - March 19, 2013
We found in the previous section that the global optimizers GA and ASA are comparable, although GA seemed to provide a better starting point for the local optimizers (we note that in standalone GA or ASA optimizations the maximum number of iterations is not limited; the algorithms terminate when the step size or the function tolerance stopping criterion is met). MS Excel's Solver competed very well against MATLAB's lsqnonlin. We repeated the process on the data from the following day, March 19, 2013. This time we had 10 more data points available, with one more maturity. Although it performed well on the previous day's data, this time Excel failed to significantly refine the initial values provided by the global optimizers. On the other hand, using MATLAB's lsqnonlin we were able to refine the initial set of parameters provided by GA and obtain an average absolute relative error in Figure 8 comparable to the one obtained by the method using a deterministic grid as the initial value for lsqnonlin in Figure 7.
When comparing the global optimizers, GA provided better results than ASA. A solution from GA was then used as the initial point for lsqnonlin, which improved the result even further, producing the lowest maximum absolute relative error of 2.22%, as can be observed in Figure 8.
Interestingly, as far as the weights are concerned, we cannot say which works best for GA and ASA: compared to the previous data set, different weights yielded the best results for both. In contrast, lsqnonlin again appears to favour weights B. All the results discussed above, and more, can be found in Table 3.

Heston model simulation
The Monte Carlo approach is a popular and flexible pricing alternative. As computational power continues to increase, Monte Carlo techniques may find even wider application in the near future, especially for derivatives that cannot be priced in closed form.

Euler scheme
Let us now apply the Euler scheme to the variance process (2). A naive Euler discretization of the variance process reads

v_{n+1} = v_n + κ(θ − v_n)Δt + σ √(v_n) √Δt Z_v,

where Z_v is a standard normal random variable and Δt denotes the step size. The main problem with the above scheme is that it does not preserve the positivity of the process: since Z_v is standard normal, the discrete process v_n can become negative with non-zero probability, namely

P(v_{n+1} < 0) = Φ( −(v_n + κ(θ − v_n)Δt) / (σ √(v_n Δt)) ) > 0,

where Φ denotes the standard normal cumulative distribution function. This makes the computation of √v_{n+1} impossible and the time-stepping scheme fails. To avoid this, practitioners adopted two fixes. The first approach is referred to as absorption: the process is set to zero whenever it attains a negative value (i.e. if v < 0 then set v := 0). The other approach is called reflection: the process is reflected at the origin (i.e. if v < 0 then set v := −v), see Gatheral [20].
The various schemes can be unified in the following framework:

v_{n+1} = f_1(v_n) + κ(θ − f_2(v_n))Δt + σ √(f_3(v_n)) √Δt Z_v,

where f_i(x) = x for x ≥ 0, i = 1, 2, 3, and f_3(x) ≥ 0 for all x ∈ ℝ. The particular choices for the "fixing" functions f_i(x) in the literature are the identity function (f(x) = x), for absorption f(x) = x⁺, where x⁺ = max(x, 0), and for reflection f(x) = |x|. In the paper by Higham and Mao [6] the scheme uses the configuration f_1(x) = f_2(x) := x and f_3(x) := |x|, whereas for example in the paper by Deelstra and Delbaen [39] f_3(x) := x⁺ is used. All schemes coincide and are bias-free as Δt → 0, so the choice of the fixing functions may seem almost indifferent, but the contrary is true. While some schemes are extremely biased for practical sizes of the time step, others turn out to be almost bias-free for not "too extreme parameter configurations", as pointed out in Van Haastrecht and Pelsser [40].
According to the paper by Lord et al. [7], a scheme called the full truncation scheme produces the smallest bias, closely followed by the moment-matching Euler scheme using the log-normal approximation by Andersen and Brotherton-Ratcliffe [41] and the partial truncation scheme by Deelstra and Delbaen [39]. The full truncation scheme reads

v_{n+1} = v_n + κ(θ − v_n⁺)Δt + σ √(v_n⁺) √Δt Z_v,

where x⁺ = max(x, 0). The simulation scheme of the corresponding asset price process (1) is then

S_{n+1} = S_n + r S_n Δt + √(v_n⁺) S_n √Δt Z_S,

where Z_S is a standard normal random variable with correlation ρ to Z_v. The pair of correlated standard normal random variables Z_v and Z_S, with correlation ρ between the driving Brownian motions, can be simulated using the Cholesky decomposition by setting

Z_v = Z_1,  Z_S = ρ Z_1 + √(1 − ρ²) Z_2,

where Z_1 and Z_2 are two independent draws from the standard normal distribution. Alternatively, the process for ln S(t) can be simulated and then exponentiated. If we discretize the log-stock price ln S rather than the stock price S, there are no higher order corrections to the Euler discretization; for details, see Gatheral [20]. The log-Euler scheme

ln(S_{n+1}) = ln(S_n) + (r − v_n⁺/2)Δt + √(v_n⁺) √Δt Z_S,

as noted by van Haastrecht and Pelsser [40], does not entail any discretization error in the stock price direction. On the other hand, the scheme usually shows biases coming from the Euler discretization of the variance process (and thus in the resulting stock prices).
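As an illustration, a minimal Python sketch of one path of the full truncation Euler variance scheme combined with the log-Euler asset scheme might look as follows; parameter values are illustrative only, and this is a sketch rather than the code used in our experiments.

```python
import math
import random

def heston_log_euler_ft(s0, v0, r, kappa, theta, sigma, rho, T, n_steps, seed=0):
    """One path of (S, v): full-truncation Euler for the variance and
    log-Euler for the asset.  Parameter names follow the usual Heston
    convention (kappa: mean reversion speed, theta: long-run variance,
    sigma: vol of vol, rho: correlation)."""
    rng = random.Random(seed)
    dt = T / n_steps
    x, v = math.log(s0), v0
    for _ in range(n_steps):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        zv = z1
        zs = rho * z1 + math.sqrt(1.0 - rho * rho) * z2   # Cholesky correlation
        vp = max(v, 0.0)                                  # full truncation: v+
        x += (r - 0.5 * vp) * dt + math.sqrt(vp * dt) * zs
        v += kappa * (theta - vp) * dt + sigma * math.sqrt(vp * dt) * zv
    return math.exp(x), v

s_T, v_T = heston_log_euler_ft(100.0, 0.04, 0.01, 1.5, 0.04, 0.3, -0.7, 1.0, 252)
print(s_T, v_T)
```

Note that the stored variance v is allowed to go negative; only its truncation v⁺ enters the drift and diffusion, which is exactly the full truncation rule.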

Milstein scheme
The Milstein scheme is a refinement of the Euler scheme obtained by adding an extra term derived using Itô's lemma. As pointed out by Gatheral [20], applying this scheme to the variance process reduces the frequency with which the process goes negative relative to the Euler scheme. Nevertheless, negative values still appear and we have to adopt fixes similar to (24). The Milstein scheme reads

v_{n+1} = v_n + κ(θ − v_n)Δt + σ √(v_n) √Δt Z_v + (σ²/4) Δt (Z_v² − 1),

where Z_v stands for a standard normal random variable. Similarly, we can apply the Milstein scheme to the process S(t), which produces

S_{n+1} = S_n + r S_n Δt + √(v_n) S_n √Δt Z_S + (v_n/2) S_n Δt (Z_S² − 1),

where Z_S is a standard normal random variable with correlation ρ to Z_v.
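A single Milstein step for the variance can be sketched as follows (Python, illustrative parameter values; the truncation at zero applied by the caller is one of the possible fixes discussed above).

```python
import math
import random

def milstein_variance_step(v, kappa, theta, sigma, dt, zv):
    """One Milstein step for the variance; the extra (sigma^2/4) dt (Z^2 - 1)
    term comes from Ito's lemma.  Negative values can still occur, so the
    caller applies a fix (full truncation at zero is used below)."""
    vp = max(v, 0.0)
    return (v + kappa * (theta - vp) * dt
            + sigma * math.sqrt(vp * dt) * zv
            + 0.25 * sigma * sigma * dt * (zv * zv - 1.0))

# Illustrative parameters; one year of daily steps.
random.seed(1)
v = 0.04
for _ in range(252):
    v = max(milstein_variance_step(v, 1.5, 0.04, 0.3, 1 / 252, random.gauss(0, 1)), 0.0)
print(v)
```

The deterministic part of the step is easy to check: with Z_v = 0 the correction term contributes exactly −(σ²/4)Δt.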

Kahl-Jäckel scheme
The IJK scheme, introduced in Kahl and Jäckel [42], was developed as a drop-in replacement for the standard log-Euler procedure, since it requires no additional random numbers per step nor any adaptive refinement. Kahl and Jäckel [42] do not claim the IJK scheme is superior to the log-Euler scheme in convergence order, but they praise the fact that it excels over other equally simple methods by showing a convergence behaviour that is faster by a multiplicative factor.

van Haastrecht and Pelsser revised
In [40], van Haastrecht and Pelsser commented on the Kahl-Jäckel scheme implemented in the following way: conditional on time s, a discretization of the log-stock price process ln S for t > s (with δt := t − s) is stated, in their original notation, which we keep here. This, however, does not match our implementation of the scheme. Our Heston log-price discretization comes from the formula by Kahl and Jäckel [42]. For the Heston model we have the parameter p = 1/2; note also that ΔW = Z_S √Δt and ΔZ = Z_v √Δt, to match the notation.
Clearly the sign in the term (Z_S + Z_v) is wrong and should be (Z_S − Z_v) instead. Another mistake is in the last term, where 1/2 appears instead of 1/4 when compared to our equation. In the original article by Kahl and Jäckel [42], b_n is √v_n for the Heston model; this implies b(s) = v(s)^{1/2}, which cancels out with v(s)^{−1/2} in the notation of van Haastrecht and Pelsser. The confusion probably stems from the fact that the IJK scheme in Kahl and Jäckel [42] was not stated explicitly for the Heston model. The misinterpreted scheme published in van Haastrecht and Pelsser [40] can be found in some other papers as well.

Exact scheme
Broadie and Kaya [8] devised an exact simulation algorithm for the Heston model. In numerical comparisons of their algorithm to an Euler discretization with the absorption fix, they found that for the pricing of European options in the Heston model and variations thereof, the exact algorithm compares favourably in terms of root-mean-squared (RMS) error. According to Broadie and Kaya [8], the variance at time t, given the value v_u for u < t, can be written as

v_t = v_u + κθ(t − u) − κ ∫_u^t v_s ds + σ ∫_u^t √(v_s) dW_s^v,

and similarly the stock price at time t, given the values of S_u and v_u for u < t, can be written as

S_t = S_u exp( r(t − u) − (1/2) ∫_u^t v_s ds + ρ ∫_u^t √(v_s) dW_s^v + √(1 − ρ²) ∫_u^t √(v_s) dW_s^⊥ ).

The exact simulation method samples exactly from the distribution of v_t. Given v_u for some u < t, this can be done by sampling from the noncentral chi-squared distribution. The next step, having v_u and the sample of v_t, is generating a sample from the distribution of the integral ∫_u^t v_s ds. This is done by inverting the characteristic function: Broadie and Kaya use this approach to obtain the distribution function of the random variable V(u, t), which has the same distribution as the integral ∫_u^t v_s ds conditional on v_u and v_t. They use the trapezoidal rule to compute the probability distribution. The value of the integral is then simulated by the inverse transform method, by first generating a standard uniform random variable U and then looking for the value of x for which P(V(u, t) ≤ x) equals U.
Knowing all the terms in (26), the remaining stochastic integral, conditional on the generated path of v_t, is normal with mean 0 and variance ∫_u^t v_s ds. This scheme gives an unbiased estimator of the derivative price. The algorithm is, however, very time-consuming because of the numerical inversion of a cumulative distribution function obtained from a characteristic function involving the modified Bessel function of the first kind. We also found that with finer discretizations the modified Bessel function of the first kind takes too large values (in particular, values not representable in double precision). The scheme is therefore not recommended for pricing options that require the value of the asset price at a larger number of time instants. Nonetheless, (28) is used in the next sections.
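The first step of the Broadie-Kaya algorithm, exact sampling of v_t from its noncentral chi-squared transition law, can be sketched in pure Python; the Poisson mixture representation ncx2(d, λ) = χ²_{d+2N} with N ~ Poisson(λ/2) avoids any special-function code. Parameter values below are illustrative.

```python
import math
import random

def poisson_sample(rng, lam):
    # Knuth's multiplication method; adequate for moderate lam.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def sample_cir_exact(rng, v_u, kappa, theta, sigma, dt):
    """Draw v_t exactly from the noncentral chi-squared transition law
    of the CIR variance (step 1 of Broadie-Kaya), via the mixture
    representation ncx2(d, lam) = chi2(d + 2N), N ~ Poisson(lam / 2)."""
    e = math.exp(-kappa * dt)
    c = sigma * sigma * (1.0 - e) / (4.0 * kappa)
    d = 4.0 * kappa * theta / (sigma * sigma)   # degrees of freedom
    lam = v_u * e / c                           # noncentrality parameter
    n = poisson_sample(rng, 0.5 * lam)
    chi2 = rng.gammavariate(0.5 * d + n, 2.0)   # chi-square with d + 2n dof
    return c * chi2

rng = random.Random(7)
draws = [sample_cir_exact(rng, 0.04, 1.5, 0.04, 0.3, 1.0) for _ in range(20000)]
print(sum(draws) / len(draws))  # should be near theta + (v_u - theta) e^{-kappa dt}
```

The sample mean can be checked against the known conditional mean θ + (v_u − θ)e^{−κδ}, which equals 0.04 for the values chosen here.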

QE scheme
The Quadratic-Exponential scheme, also known as the QE scheme, was introduced by Andersen [9]. It exploits the fact that the value v_{n+1} conditional on the value v_n follows a noncentral chi-square distribution, and it uses two different approximations to this distribution depending on the value of the variance.
The non-centrality parameter of the distribution of v_{n+1} is proportional to v_n, and for sufficiently large values of v_n the scheme uses a quadratic (power) function of a standard normal variable Z_v to approximate v_{n+1}. A critical level ψ_c ∈ [1, 2] is introduced and compared to ψ, given by ψ = s²/m², where m and s² are the mean and the variance of v_{n+1} conditional on v_n. The exponential branch of the algorithm uses U_v, a uniform random number.
As stated above, the value of ψ_c indicates which approximation to use. The first (quadratic) approximation can only be used for ψ ≤ 2; for higher values of ψ, which correspond to low values of v_n, the scheme would fail. The second (exponential) approximation can be used for ψ ≥ 1. Therefore ψ_c ∈ [1, 2]. Andersen [9] states that the exact choice of ψ_c appears to have a relatively small effect on the quality of the overall simulation scheme and proposes ψ_c = 1.5.
This scheme cannot be used with the Euler discretization of S(t), since it would result in a too feeble effective correlation and, consequently, paths of S with poor distribution tails. Instead, Andersen used the exact scheme of Broadie and Kaya (28) and tried to preserve the key driver of the correlation between S_{n+1} and v_{n+1}, which is v_{n+1} itself. Instead of sampling the integral ∫_u^t v_s ds, he used the approximation

∫_u^t v_s ds ≈ Δt [γ_1 v_n + γ_2 v_{n+1}],

where setting γ_1 = 1, γ_2 = 0 yields an Euler-like setting. For the last integral the same approach as in (27) is applied, i.e. it is replaced by √(Δt [γ_1 v_n + γ_2 v_{n+1}]) Z_S, where Z_S is a standard normal random variable. This leads to the following discretization scheme for S(t):

ln(S_{n+1}) = ln(S_n) + K_0 + K_1 v_n + K_2 v_{n+1} + √(K_3 v_n + K_4 v_{n+1}) Z_S,

where K_0, ..., K_4 are given by

K_0 = −(ρκθ/σ)Δt,  K_1 = γ_1 Δt (κρ/σ − 1/2) − ρ/σ,  K_2 = γ_2 Δt (κρ/σ − 1/2) + ρ/σ,
K_3 = γ_1 Δt (1 − ρ²),  K_4 = γ_2 Δt (1 − ρ²),

and γ_1 = γ_2 = 0.5 for a central discretization of the time integral of v in the exact scheme. Equation (29) is the exact proposal as can be found in Andersen [9]. However, we also include the drift, so the resulting scheme we use for S(t) adds the term r Δt.

The QE scheme appears to be the most popular one among the recent propositions. A couple of years later, high order discretization schemes for the variance process were presented in the paper by Alfonsi [43], but a rigorous error analysis for the Heston model was regarded as an open issue. Possible extensions of the QE scheme include the GammaQE scheme and the Double Gamma scheme introduced by Chan and Joshi [44]. These two methods use a long-time-stepping discretization for the variance process. The Double Gamma scheme is an exact simulation scheme and the GammaQE scheme is an almost exact simulation scheme for the variance process whose accuracy improves with the step length. Both methods are based on the combined compound Poisson and Gamma distribution representation of the variance process (cf. Cox et al. [17]).
The proposed schemes then use precalculated values of the inverse of the gamma distribution, and the compound Poisson random variables are approximated using a mixture of an exponential distribution and a quadratic Gaussian distribution. There are some similarities to the noncentral chi-squared inversion scheme presented by van Haastrecht and Pelsser [40]. None of these methods is, however, widely accepted by practitioners, and further numerical analysis of these schemes is ongoing research.
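The QE variance update itself is short. The following Python sketch implements the moment matching and the ψ-based switch, with ψ_c = 1.5 as proposed by Andersen; parameter values are illustrative.

```python
import math
import random

def qe_variance_step(rng, v, kappa, theta, sigma, dt, psi_c=1.5):
    """One QE step for the variance (Andersen): match the first two
    conditional moments m and s^2, then switch between a quadratic and
    an exponential approximation depending on psi = s^2 / m^2."""
    e = math.exp(-kappa * dt)
    m = theta + (v - theta) * e
    s2 = (v * sigma ** 2 * e * (1 - e) / kappa
          + theta * sigma ** 2 * (1 - e) ** 2 / (2 * kappa))
    psi = s2 / (m * m)
    if psi <= psi_c:                      # quadratic branch (larger v)
        b2 = 2 / psi - 1 + math.sqrt(2 / psi) * math.sqrt(2 / psi - 1)
        a = m / (1 + b2)
        z = rng.gauss(0, 1)
        return a * (math.sqrt(b2) + z) ** 2
    else:                                 # exponential branch (small v)
        p = (psi - 1) / (psi + 1)
        beta = (1 - p) / m
        u = rng.random()                  # the uniform U_v
        return 0.0 if u <= p else math.log((1 - p) / (1 - u)) / beta

rng = random.Random(3)
draws = [qe_variance_step(rng, 0.04, 1.5, 0.04, 0.3, 1 / 12) for _ in range(20000)]
print(sum(draws) / len(draws))  # should be near the conditional mean m
```

Both branches are constructed to reproduce the conditional mean m, which can be verified empirically from the sample average (here m = 0.04, since v = θ).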

E+M scheme
We also consider a combination of the exact scheme for the underlying asset process with the Milstein scheme for the volatility process. The motivation is that the frequency of negative values of the variance process is lower for the Milstein scheme (24) than for the Euler scheme, as pointed out for example in Gatheral [20]. The idea behind using the exact scheme of Broadie and Kaya (28) is that it can be applied to discretizations with a larger number of time instants when the integral is not sampled exactly but approximated instead. Thus, we expect this scheme to perform better than the Milstein and log-Euler schemes. The question is whether it can outperform the QE scheme, which approximates the noncentral chi-square distribution and therefore does not suffer from negative values of the variance process. However, the proposed scheme does not include switching between different approximations, which makes it easier to implement and quite possibly faster than the QE scheme.
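A sketch of one E+M step, under the assumption that the time integral of v is approximated by the trapezoidal rule as in the QE scheme, might look as follows; this is our illustrative reading of the combination, with hypothetical parameter values, not the exact experimental code.

```python
import math
import random

def em_step(rng, s, v, r, kappa, theta, sigma, rho, dt):
    """One E+M step: Milstein for the variance (truncated at zero),
    then the Broadie-Kaya log-price representation (28) for the asset,
    with the time integral of v approximated by the trapezoidal rule."""
    zv = rng.gauss(0, 1)
    vp = max(v, 0.0)
    v_next = max(v + kappa * (theta - vp) * dt
                 + sigma * math.sqrt(vp * dt) * zv
                 + 0.25 * sigma * sigma * dt * (zv * zv - 1.0), 0.0)
    int_v = 0.5 * (vp + v_next) * dt          # trapezoidal approximation
    zs = rng.gauss(0, 1)                      # independent of zv
    x = (math.log(s) + r * dt
         + (rho / sigma) * (v_next - v - kappa * theta * dt + kappa * int_v)
         - 0.5 * int_v
         + math.sqrt((1.0 - rho * rho) * int_v) * zs)
    return math.exp(x), v_next

# Quick sanity check: with r = 0 the mean of S_T should stay near S_0.
rng = random.Random(5)
total = 0.0
for _ in range(20000):
    s, v = 100.0, 0.04
    for _ in range(12):
        s, v = em_step(rng, s, v, 0.0, 1.5, 0.04, 0.3, -0.7, 1 / 12)
    total += s
print(total / 20000)
```

The correlation between the asset and the variance is carried entirely by the (ρ/σ)(v_{n+1} − v_n − κθΔt + κ ∫v ds) term, so the second normal draw can be taken independent of the first.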

MC simulations
Most of the tests done in the previous literature are quite easy to pass: they typically involve at-the-money options, and the model parameters are not similar to what we were able to obtain from the real market. We therefore aim to test the schemes using parameters obtained from the real market and for a range of strikes similar to the market range.
We use the set of parameters listed in Table 4. The model implied parameters were obtained by the genetic algorithm and then refined using the nonlinear least-squares method. Using the model with this set of parameters we were able to obtain prices with a mean absolute relative error of 0.65% and a maximum absolute relative error of 2.22%. However, as stated above, the choice of parameters can cause problems. Figure 9 shows the mean path of the underlying asset for the examined schemes. We used Δt = 2⁻⁶ and T = 1; the number of simulations was nPaths = 100 000. It is clear from Figure 9 that the paths simulated by the IJK scheme differ significantly. This behaviour is invariant to changes in the number of simulations and the step size, and it results in large errors when pricing the options by the Monte Carlo method. The IJK scheme clearly does not work for this set of parameters and thus we do not consider it a competing scheme; nonetheless, the results for the IJK scheme are presented for illustration.
It is arguable whether the IJK scheme is of any practical use, since even with a friendlier, so to speak, set of parameters, the IJK scheme seems to be outperformed by the other schemes presented here.

MC test
We designed the following test to measure the performance of option pricing using the previous schemes.
We have four maturities at times T = 0.25, 0.5, 0.75 and 1. For each maturity we price 28 options with strikes

(7100, 7200, ..., 7500, 7550, 7600, ..., 8350, 8400, 8500, ..., 9000).

With the parameters from Table 4 we can compute the price of the options at time 0, denoted C(0). Denoting the Monte Carlo estimate Ĉ(0), this leads to the error e = C(0) − Ĉ(0). We measure the errors for different step sizes and numbers of simulations in terms of the mean absolute relative error ε̂ over the N options, in our case N = 112. However, ε̂_j is a random variable taking a different value in each batch j = 1, ..., M. We estimate the variance σ²_ε of ε̂ by arranging the nPaths simulations into M batches and use it to construct a confidence interval for the absolute error ε. In particular, we estimate the mean of the batch averages and then use the sample variance of the batch averages to estimate σ²_ε. For the Student t-distribution with M − 1 degrees of freedom, a 100(1 − α)% confidence interval for ε has the form ε̂ ± t_{1−α/2, M−1} σ_ε / √M, where t_{1−α/2, M−1} is determined from the Student t-distribution with M − 1 degrees of freedom. Figure 10 shows the corresponding results; we do not report results for step sizes smaller than 2⁻⁸, since the scheme was no longer improving and simulating the paths this way also took too long. This is the factor mentioned earlier: the E+M scheme was indeed better in terms of speed, but as the figures show, it needed a finer discretization to reach the level of the mean absolute relative error of the QE scheme.
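The batch-means confidence interval can be sketched as follows. Since the Student t quantile is not available in the Python standard library, it is approximated here by a Cornish-Fisher style expansion around the normal quantile; the batch averages in the example are synthetic, for illustration only.

```python
import math
import random
import statistics
from statistics import NormalDist

def t_quantile(p, df):
    """Approximate Student-t quantile via the Cornish-Fisher style
    expansion around the normal quantile (good for df >= ~5)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384
    return z + g1 / df + g2 / df ** 2 + g3 / df ** 3

def batch_confidence_interval(batch_means, alpha=0.05):
    """100(1-alpha)% Student-t confidence interval for a mean estimated
    from M batch averages (here: batch averages of the mean absolute
    relative pricing error)."""
    M = len(batch_means)
    mean = statistics.fmean(batch_means)
    s = statistics.stdev(batch_means)      # sample std of the batch averages
    half = t_quantile(1 - alpha / 2, M - 1) * s / math.sqrt(M)
    return mean - half, mean + half

# Synthetic example: M = 20 hypothetical batch averages of the error.
random.seed(2)
batch_means = [0.01 + 0.002 * random.gauss(0, 1) for _ in range(20)]
lo, hi = batch_confidence_interval(batch_means)
print(lo, hi)
```

For M = 20 and α = 0.05 the approximation gives t ≈ 2.093, matching tabulated values of t_{0.975, 19} to three decimals.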

MC test results
Nevertheless, the combination of the exact approach with the variance process discretized using the Milstein scheme performed significantly better than the log-Euler scheme and the plain Milstein scheme, without much additional effort. This scheme can be used when greater precision is required for a larger number of time steps.
We also include the results of the IJK scheme, although it proved not to be useful for this set of parameters, as can also be seen in Table 5. The errors are high and do not depend on the time step.

Variance reduction
In Monte Carlo simulations, reducing the variance of the simulation results is of interest. One of the simplest variance reduction methods uses the so-called antithetic variates. Given one simulated path of the considered stochastic process, an antithetic path is a trajectory that is negatively correlated with the original path, and both paths together reduce the variance of the sample paths. There are at least two advantages of this technique. First, antithetic variates reduce the number of underlying random numbers needed to generate nPaths trajectories. Second, they reduce the variance of the sample paths and consequently improve the accuracy.
Let θ = E[h(X)] = E[Y], and let Y_1 and Y_2 be two generated samples of the random variable Y. Then an unbiased estimator is θ̂ = (Y_1 + Y_2)/2 and its variance is

Var(θ̂) = (Var(Y_1) + Var(Y_2) + 2 Cov(Y_1, Y_2)) / 4.

Obviously the variance is reduced if Cov(Y_1, Y_2) < 0. This principle can easily be adapted to pathwise simulations such as the Euler or Milstein schemes, which use standard normal random variables to simulate the underlying Wiener process: if Y_1 ~ N(μ, σ²) and Y_2 = 2μ − Y_1, then also Y_2 ~ N(μ, σ²). This fact can also be used in the QE scheme, which behaved best in our tests, so we can further improve our Monte Carlo simulations.
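The effect is easy to demonstrate on a toy expectation E[exp(Z)]. The illustrative Python sketch below (unrelated to the actual QE implementation) compares plain sampling with antithetic pairs (Z, −Z) using the same total number of function evaluations.

```python
import math
import random
import statistics

def mc_estimates(h, n_pairs, antithetic, seed=0):
    """Per-sample estimates of E[h(Z)], Z ~ N(0,1): either 2*n_pairs
    independent draws, or n_pairs antithetic pairs (Z, -Z) averaged as
    (h(Z) + h(-Z)) / 2, so both variants evaluate h the same number of
    times."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        z = rng.gauss(0, 1)
        if antithetic:
            out.append(0.5 * (h(z) + h(-z)))
        else:
            out.append(h(z))
            out.append(h(rng.gauss(0, 1)))
    return out

h = math.exp                          # E[exp(Z)] = exp(1/2)
plain = mc_estimates(h, 50000, antithetic=False)
anti = mc_estimates(h, 50000, antithetic=True, seed=1)
var_plain = statistics.variance(plain) / len(plain)   # variance of the mean
var_anti = statistics.variance(anti) / len(anti)
print(statistics.fmean(anti), 1.0 - var_anti / var_plain)
```

Because h is monotone, Cov(h(Z), h(−Z)) is negative and the variance of the mean drops; in this toy example the reduction comes out at roughly a third.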
Let Ĉ_1 denote the option price estimate (32). In our example of nPaths = 100 000 simulations of the process using the QE scheme with the same parameter values, T = 1, and antithetic variates implemented, we get a variance reduction of about 38.2%.

Conclusion
We presented several of the most popular and industry-standard methods to simulate the Heston model and gave reasons why we excluded the Exact scheme by Broadie and Kaya [8] from the testing. It was not only because of the computationally expensive numerical inversion, but also because of the problems with dividing two modified Bessel functions of the first kind; these caused difficulties because their rather large values cannot be represented for finer discretizations. Also the IJK scheme by Kahl and Jäckel [42], which is supposed to be a drop-in replacement for simple schemes, was tested with parameters obtained from the market, and it appears not to be useful for pricing options using Monte Carlo simulations. Based on the observed performance of the known schemes, we also performed experiments and proposed a scheme compromising between speed and accuracy, which appears to outperform the other simple schemes when pricing options by Monte Carlo simulation. The proposed scheme uses the Milstein scheme for the variance process, exploiting its lower frequency of negative values, combines it with the Exact scheme approach of Broadie and Kaya for the asset process, and could serve as a drop-in replacement for the log-Euler or Milstein scheme if higher precision is required. The best in terms of precision appears to be the QE scheme of Andersen [9], which is the most popular among practitioners these days. All conclusions were drawn from a test mimicking a real market situation; we thus avoided parameters or ranges of strikes and maturities that could purposely favour some of the schemes. We could further improve even the QE scheme by a variance reduction technique such as antithetic variates, but in our testing we avoided variance reduction methods in order to compare the cores of all simulation methods.
We did experience rather turbulent behaviour with the classical numerical schemes, probably due to the possible negative values of the variance process and the consequent fixes, and thus we did not provide proper results in terms of a convergence criterion. This appears to be an issue that deserves further exploration and proper analysis. Further study of the Exact scheme and possible simplifications also seems attractive, since practitioners prefer these schemes to the higher-order numerical schemes. Variance reduction using antithetic variates is an easy way to improve Monte Carlo simulations.
Calibration of the Heston model is not covered in the literature to an extent comparable to the coverage of the simulation schemes. We take real data from the market and calibrate the model to them. This is not an easy task, especially when one uses a rather high number of options. We approached it by combining local optimizers, which tend to be sensitive to the initial guess and thus, we believe, should not be used alone, with global optimizers. We suggested using the genetic algorithm from MATLAB's Global Optimization Toolbox, and we developed what we believe is a stable procedure combining the genetic algorithm, which provides an initial guess, with a subsequent refinement by the nonlinear least-squares method (lsqnonlin). Incorporating the approximation formula into the calibration process is especially useful in the global optimization part, and we showed that it speeds up the overall calibration more than four times. Having the opportunity to run long computations in a distributed environment, we were able to find benchmark values by creating a grid over the parameter state space and finding out what the model is capable of. We managed to match the magnitude of the errors with our procedure, but of course in a shorter computational time.
We also used adaptive simulated annealing and Excel's Solver and compared the results for two consecutive days. Another conclusion is that slight adjustments to the weights used during calibration can play a significant role, and one can improve the results by experimenting with them.
Many other stochastic volatility models have been introduced since the original Heston paper [5]. A first possible extension involves time-dependent parameters: a model with piecewise constant parameters is studied by Mikhailov and Nögel [10], a linearly time-dependent Heston model is studied by Elices [45], and a more general case is introduced by Benhamou et al. [46], who also calculate an approximation to the option price. However, Bayer, Friz and Gatheral [47] claim that the overall shape of the volatility surface does not change in time and that the parameters should therefore remain constant. Affine jump-diffusion processes keep the original parameters constant and add jumps to the underlying asset process, to the volatility, or to both; see e.g. the paper by Duffie, Pan and Singleton [48]. Special attention is nowadays paid to models with fractional volatility; we refer to the recent paper by Pospíšil and Sobotka [49], where a new approximative fractional stochastic volatility jump-diffusion model is presented and compared to the Heston model. In [50], the authors also study the robustness and sensitivity analysis of these two models.