Fast exact simulation of the first passage of a tempered stable subordinator across a non-increasing function

We construct a fast exact algorithm for the simulation of the first-passage time, jointly with the undershoot and overshoot, of a tempered stable subordinator over an arbitrary non-increasing absolutely continuous function. We prove that the running time of our algorithm has finite exponential moments and provide bounds on its expected running time with explicit dependence on the characteristics of the process and the initial value of the function. The expected running time grows at most cubically in the stability parameter (as it approaches either $0$ or $1$) and is linear in the tempering parameter and the initial value of the function. Numerical performance, based on the implementation in the dedicated GitHub repository, exhibits good agreement with our theoretical bounds. We provide numerical examples to illustrate the performance of our algorithm in Monte Carlo estimation.


Introduction
The study of the first-passage event across a barrier is a classical subject that has long been of interest for many stochastic processes, including Lévy processes. For instance, first-passage events describe the ruin event (i.e. the ruin time and penalty function) in risk theory [5, Ch. 12], are used to describe the law of the steady-state waiting time and workload in queuing theory [3] and arise in the probabilistic representation of the solutions to fractional partial differential equations (FPDEs) (see e.g. [16] and the references therein), as well as in financial mathematics (in payoffs of barrier-type derivative securities). In all such areas, simulation algorithms and Monte Carlo methods are of great interest and widely used (see [4, Ex. 5.15]). However, direct biased Monte Carlo methods, based on random-walk approximations of the crossing time, may lead to errors that are often difficult and computationally expensive to control, see e.g. Subsection 3.3.1 below. Fast exact simulation of the crossing time of a level by a subordinator is thus needed for a stable Monte Carlo algorithm in these contexts. For example, in [20], exact simulation of the first-passage time of a stable subordinator was used to solve numerically an FPDE with a Caputo fractional derivative [19]. Generalising [19] beyond Caputo fractional derivatives requires an exact simulation algorithm for more general subordinators, along with an a priori bound on its expected running time. The exact simulation algorithm in [8], applicable to a broad class of subordinators, typically turns out to have an infinite expected running time (see Subsection 2.4 below), making it hard to exploit the general algorithm of [8] in this context.
The present paper develops an exact simulation algorithm for the first-passage event of a tempered stable subordinator across a non-increasing function, such that its random running time has finite exponential moments with explicit (in terms of the model parameters) control over the expected running time. While our algorithm is widely applicable to the problems in applied probability discussed above when the underlying model is a finite-variation tempered stable process (cf. [18] for examples in financial mathematics), our main interest in this algorithm stems from our aim to generalise [20] to tempered fractional derivatives and, possibly in future work, to more general non-local operators using ideas from [8, 9].

The complexity of the main exact simulation algorithm
For any θ > 0, α ∈ (0, 1) and q ∈ [0, ∞), consider a driftless (possibly tempered) stable subordinator S with Lévy measure ν(dx) = θ(α/Γ(1 − α))e^{−qx} x^{−α−1} dx. The first-passage event of S across the function b is described by the triplet (τ_b, S_{τ_b−}, S_{τ_b}) of the crossing time, the undershoot and the overshoot; denote by T the running time of our main simulation algorithm. Here and throughout we assume that elementary operations, such as the evaluation of sums, products and elementary functions sin, cos, exp, all have constant computational cost. Further, we assume all operations use floating-point precision and denote by N ∈ N the desired number of precision bits of the output. Then T has finite exponential moments (up to some order) and its mean is bounded above by an explicit expression in which the constant κ_2 depends on neither α ∈ (0, 1), q ∈ [0, ∞) nor b (see Corollary 2.2 below). Moreover, the constant κ_2 can be made explicit in terms of the costs of elementary operations (such as addition, multiplication, exponentiation and trigonometric functions).
Since the law of the vector (τ_b, S_{τ_b−}, S_{τ_b}) degenerates in any of the limiting regimes α ↑ 1, α ↓ 0, q → ∞ or b(0) → ∞, the deterioration in the performance of the algorithm in these regimes is expected. Indeed, the subordinator S converges weakly to a linear drift as α ↑ 1, making the law of the undershoot S_{τ_b−} increasingly harder to simulate from. If instead α ↓ 0 or q → ∞, then the subordinator S converges weakly to the trivial process 0, for which the first-passage time τ_b is infinite. Finally, if b ≡ b(0) increases, then the crossing time τ_b increases, extending the running time of several loops in the algorithm. The main point of the complexity bound above is that it provides explicit insight into the deterioration of the performance of the algorithm in any of the limiting regimes. There is a good match between this theoretical upper bound on the performance and what we observe through the implementation found in the repository [13]; see Subsection 3.1 below for details.
The key step in the exact simulation of the first-passage triplet (τ_b, S_{τ_b−}, S_{τ_b}) for a general tempering parameter q ∈ [0, ∞) (Algorithm 2 below) is the stable case q = 0, given in Algorithm 3 below. The general case q ∈ (0, ∞) in Algorithm 2 requires only an additional accept-reject step based on the Esscher transform described below. In turn, the simulation of the first-passage event of a stable process is essentially reduced to the simulation of the undershoot S_{τ_b−} on the event {S_{τ_b−} < b(τ_b)}, when the first passage did not occur continuously (i.e. by creeping 1 ). Sampling the undershoot efficiently, a deceptively hard task (see Subsection 2.3 below), is the main technical contribution of this paper. Although the density of the undershoot S_{τ_b−} admits a semi-analytic explicit form (see (4.5) below), it is unbounded and does not lend itself well to direct accept-reject methods. Our main strategy to circumvent this issue is to break up the domain into regions, depending on the values of the (possibly random) parameters, and use a mix of direct accept-reject methods and Devroye's algorithm for log-concave densities [11]; see Algorithm 11 in Appendix C.2 below. The algorithm in [11] plays a key role precisely in the regions where the density of S_{τ_b−} decreases sharply to super-exponentially small values, making standard accept-reject methods infeasible.
Most algorithms developed and used in this paper are based on a combination of two components: (I) a numerical search step in the form of binary search (also known as the bisection method), possibly followed by an application of the Newton-Raphson method (rigorous care is taken to verify the assumptions that guarantee quadratic convergence of the Newton-Raphson algorithm); and (II) rejection (or accept-reject) sampling. For both components, we carry out a careful complexity analysis accounting for all computational operations. This is crucial because the parameter values themselves are often random, requiring control over their distribution as well as over the impact on the computational complexity when these random parameters take extreme values. Note that the infinite expected running time of the algorithm in [8] arises essentially from an accept-reject step whose random acceptance probability has large mass close to zero; see Subsection 2.4 below for details.

Comparison with the literature
Exact simulation of the first-passage event of a tempered stable subordinator is a central topic of the two articles [8, 9]. Chi [8] was the first to develop a simulation algorithm for the first-passage event for a wide class of subordinators over a general non-increasing absolutely continuous boundary function b. The case of the tempered stable subordinator plays a prominent role in [8], because the entire general class of subordinators considered in [8] can be viewed as a modification of the tempered stable case. Indeed, the main algorithm in [8] relies on this structure. As discussed above, the algorithm for the tempered stable first-passage event in [8] typically has infinite expected running time (see Subsection 2.4), making it hard to use in Monte Carlo estimation.
Dassios, Lim and Qu [9] consider a tempered stable subordinator and a constant function b. They develop an exact simulation algorithm [9, Alg. 4.1] for the pair (τ_b, S_{τ_b} − b(τ_b)). In this case, the problem is first reduced to the stable case via the Esscher transform. Then the constant barrier makes the density of (τ_b, S_{τ_b} − b) explicit, with a stable scaling property. The scaling property in turn allows for a change of variables (in the form of the variable r = y^{−α/(1−α)} t^{1/(1−α)} in [9, l. 11, §4.2]) that makes it easy to sample from the resulting density. This technique, which depends on the explicit density of the random vector (τ_b, S_{τ_b} − b), does not generalise easily to a non-constant function b, including a linear function b(t) = a_0 − a_1 t with a_1 > 0. Part of the difficulty in generalising such methodology is that a new possibility arises: the subordinator may creep (i.e. cross continuously [7, 8]) through the curve b, in which case the undershoot and the overshoot both coincide with the value b(τ_b). Moreover, the probability of creeping is not uniform in time even for linear functions (see Proposition 4.1(b) below), making it hard to compute even the mass of the absolutely continuous component of the joint law.

Organisation of the paper
The main algorithms and the results on their respective computational complexities are provided in Section 2. In Section 3 we present numerical evidence for the theoretical bounds on the computational complexities of our algorithms. We also present two numerical applications of our simulation algorithms, using Monte Carlo estimation, to solve fractional partial differential equations and to price barrier options. The GitHub repository [13] contains the implementation (in Julia) of the algorithms in this paper, used in Section 3. Finally, the proofs of all our results on the validity and computational complexities of our algorithms are given in Section 4. See [14] for a short YouTube presentation on the Prob-AM channel discussing the algorithm.

Sampling the first passage of a tempered stable subordinator
For any stability index α ∈ (0, 1), intensity θ ∈ (0, ∞) and tempering parameter q ∈ R_+, let S = (S_t)_{t∈R_+} be, under P_q (with expectation operator E_q), a driftless tempered stable subordinator started at 0 with Laplace transform E_q[exp(−λS_t)] = exp(−tθ((λ + q)^α − q^α)), λ ≥ 0, and Lévy measure ν_q(dx) = θ(α/Γ(1 − α))e^{−qx} x^{−α−1} dx. Note that, under P := P_0, S is a stable subordinator with Lévy measure ν := ν_0.
Before presenting the algorithm, we give a brief intuitive account. The measures P_q are known to be equivalent to each other (see Appendix B). In fact, the simulation of a path of (S_t : t ∈ [0, T]) (for some T > 0) under P_q is equivalent to proposing the same path under P_0 and then accepting it with probability exp(−qS_T), resulting in the overall acceptance rate exp(−q^α θT). Thus, it is natural to expect that the simulation of the first-passage triplet (τ_b, S_{τ_b−}, S_{τ_b}) under P_q can essentially be reduced to the case q = 0 on the stochastic interval [0, τ_b]. However, since τ_b is possibly not bounded above, this could make the acceptance probability arbitrarily small. Algorithm 1 makes use of the change-of-measure idea by introducing a time grid and sampling S on the grid until the compact interval in which τ_b lies is identified.
Algorithm 1 Simulation of the triplet (τ_b, S_{τ_b−}, S_{τ_b}) under P_q
Require: Parameters α ∈ (0, 1), θ > 0, q ≥ 0 and function b
Sample s from the law of S_{t*} under P_q via Algorithm 10
Sample w from the law of S_{t*−t} under P via Algorithm 9 and E ∼ Exp(1)
Algorithms 9 and 10 are well-known, simple and fast procedures for sampling stable and tempered stable marginals, respectively (see Appendix C.1 below). Algorithm 4 and its analysis constitute the main technical contribution of this paper. The complexity of Algorithm 1 is given in the following theorem. Proofs of all the results in this paper are in Section 4 below.
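The marginal samplers just referenced rest on standard constructions. The following Python snippet is our own minimal sketch (the paper's implementation [13] is in Julia, and the function names here are ours): the stable marginal uses Kanter's representation of the positive stable law together with the scaling S_t = (θt)^{1/α}·S, and the tempered marginal uses the standard tempering-by-rejection step with acceptance probability exp(−qs).

```python
import math
import random

def stable_marginal(alpha, theta, t, rng=random):
    """Sample S_t for the driftless stable subordinator with Laplace transform
    E[exp(-lam*S_t)] = exp(-t*theta*lam**alpha), via Kanter's representation
    of the standard positive stable law (Laplace transform exp(-lam**alpha))."""
    u, e = rng.random(), rng.expovariate(1.0)
    a = (math.sin(alpha * math.pi * u) ** (alpha / (1.0 - alpha))
         * math.sin((1.0 - alpha) * math.pi * u)
         / math.sin(math.pi * u) ** (1.0 / (1.0 - alpha)))
    return (theta * t) ** (1.0 / alpha) * (a / e) ** ((1.0 - alpha) / alpha)

def tempered_stable_marginal(alpha, theta, q, t, rng=random):
    """Sample S_t under P_q by proposing stable samples under P and accepting
    with probability exp(-q*s); the acceptance rate is exp(-theta*t*q**alpha)."""
    while True:
        s = stable_marginal(alpha, theta, t, rng)
        if rng.random() <= math.exp(-q * s):
            return s
```

Note that the expected number of proposals in the tempering step is exp(θtq^α), which is exactly the exponential dependence on q and t that Algorithm 1 controls by working on a time grid.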
Theorem 2.1. Algorithm 1 samples from the law of the triplet (τ_b, S_{τ_b−}, S_{τ_b}) of a tempered stable subordinator S under P_q. Moreover, the running time of Algorithm 1 has exponential moments, with mean bounded by an explicit expression in which the constant κ_1 does not depend on the parameters α ∈ (0, 1), θ ∈ (0, ∞) or q ∈ [0, ∞), and N specifies the required number of precision bits of the output.
The upper bound on the running time of Algorithm 1 has a term of order O(exp(2qb(0)/(2^α − 1))), which may be quite large if either b(0), q or 1/α is large. However, the exponential dependence of the complexity of Algorithm 1 on these parameters can easily be turned into a linear dependence (see Corollary 2.2 below).
Algorithm 2 Simulation of the triplet (τ_b, S_{τ_b−}, S_{τ_b}) under P_q
Require: Parameters α ∈ (0, 1), θ > 0, q ≥ 0 and function b
Corollary 2.2. Algorithm 2 samples from the law of (τ_b, S_{τ_b−}, S_{τ_b}) under P_q. Moreover, the running time of Algorithm 2 has exponential moments, with mean bounded by an explicit expression in which κ_2 depends on neither α ∈ (0, 1), θ ∈ (0, ∞) nor q ∈ [0, ∞), and N is as in Theorem 2.1.

Sampling the first passage of a stable subordinator
The crucial ingredient of Algorithm 4, which samples (τ_b, S_{τ_b−}, S_{τ_b}) conditional on {τ_b ≤ t*} under the stable law P, is Algorithm 3 for the sampling of the unconditional triplet (τ_b, S_{τ_b−}, S_{τ_b}) under P. The basis of Algorithm 3 is Proposition 4.1 below, describing the law of (τ_b, S(τ_b−), S(τ_b)) under P. Before presenting it, we give a brief account of its contents. First, Algorithm 3 samples the crossing time τ_b of the stable subordinator. Then, the algorithm determines whether the crossing occurred continuously (i.e. by creeping, with a 'jump' of size ∆_S(τ_b) = 0) or via a jump of size ∆_S(τ_b) > 0. In the former case, the algorithm simply returns the crossing triplet and, in the latter, the algorithm samples the undershoot and the overshoot.
The proof of Theorem 2.3 is in Section 4.4 below. Theorem 2.3 implies that the expected running time of Algorithm 3 may increase as fast as (1 − α)^{−3} as α ↑ 1. A deterioration in the limit as α ↑ 1 is expected since, in this limit, the stable subordinator S converges to the pure-drift subordinator t ↦ θt, for which the distribution of the triplet degenerates to a delta distribution. Similarly, the expected running time of Algorithm 3 may increase as fast as |log α| as α ↓ 0. This is expected as well, since the stable subordinator S converges to the trivial process 0 as α ↓ 0. Numerical evidence for these increases in complexity and the theoretical bounds in Theorem 2.3 can be found in Section 3.1 below. We stress that existing algorithms for the simulation of the triplet (τ_b, S_{τ_b−}, S_{τ_b}) have infinite expected running time for any α ∈ (0, 1), see Subsection 2.3 below for more details.
A minor variation of Algorithm 3 yields samples conditional on the event {τ_b ≤ t*}.
This is the only step different from Algorithm 3: Generate U_2 ∼ U(0, 1)
Algorithm 4 samples from the law of (τ_b, S_{τ_b−}, S_{τ_b}) conditional on {τ_b ≤ t*}. Moreover, the running time of Algorithm 4 has exponential moments and its mean is bounded by an explicit expression, where the constant κ_3 is as in Theorem 2.3 and N specifies the number of precision bits of the output.
As we said before, Algorithms 3 and 4 have three main steps. First, we generate the first-passage time T in line 1. This requires evaluating, up to N bits of precision, the inverse B^{−1} of the function B : t ↦ t^{−1/α} b(t), which is strictly decreasing on (0, T_b), where T_b := inf{t > 0 : b(t) = 0} ∈ (0, ∞]. This step may require a numerical inversion of B (or, equivalently, numerically finding the root of t ↦ B(t) − V_1). Such algorithms are standard in the literature (see Appendix C.3), making this a simple step. Second, the algorithm determines whether the first passage occurred continuously (i.e. by creeping) in line 3. If the process crept, the triplet equals (T, b(T), b(T)); if it did not, then we must sample the undershoot and the overshoot of the process conditional on the crossing time. The third main step is precisely the simulation of the undershoot and overshoot in lines 7 and 8. The law of the undershoot, although expressible in terms of the stable law, is complicated and hard to generate samples from. For this reason, the focus of the remainder of this section is the construction of Algorithm 5.
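The first of the steps above, inverting B, can be sketched with the bisection-plus-Newton-Raphson combination described in the introduction. The following Python illustration is ours (it is not the paper's Algorithm 12) and uses a hypothetical linear function b(t) = b0 − a1·t, for which B is strictly decreasing on (0, T_b) with T_b = b0/a1:

```python
def invert_B(v, alpha, b0, a1, newton_steps=20):
    """Find the root of t -> B(t) - v on (0, T_b), where B(t) = t**(-1/alpha)*b(t)
    and b(t) = b0 - a1*t (hypothetical linear barrier). Bisection brackets the
    root of the strictly decreasing B; Newton-Raphson then refines it."""
    T_b = b0 / a1
    B = lambda t: t ** (-1.0 / alpha) * (b0 - a1 * t)
    dB = lambda t: ((-1.0 / alpha) * t ** (-1.0 / alpha - 1.0) * (b0 - a1 * t)
                    - a1 * t ** (-1.0 / alpha))
    lo, hi = 1e-12 * T_b, (1.0 - 1e-12) * T_b
    for _ in range(40):                   # bisection: maintain B(lo) > v > B(hi)
        mid = 0.5 * (lo + hi)
        if B(mid) > v:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    for _ in range(newton_steps):         # Newton-Raphson refinement
        t_next = t - (B(t) - v) / dB(t)
        if not (0.0 < t_next < T_b):      # safeguard: stay inside the domain
            break
        converged = abs(t_next - t) < 1e-14 * max(t, 1.0)
        t = t_next
        if converged:
            break
    return t
```

In the paper's setting the bisection phase serves to reach the basin of quadratic convergence of Newton-Raphson, whose assumptions are verified rigorously; the sketch above simply hard-codes a bracket-narrowing phase before refining.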

Simulation of the stable undershoot via domination by a mixture of densities
Our main objective now is to present a simulation algorithm for the undershoot law US_α(t, w), t, w > 0, that is, the law of S_{τ_b−} under P, conditional on {τ_b = t, b(τ_b) = w, ∆_S(τ_b) > 0}. Our main tool is the general rejection-sampling principle. Unlike before, the tools and algorithms used here have little probabilistic interpretation and are mostly based on analytical properties and inequalities. The derivation of these inequalities, which validate Algorithm 5, will be developed and explained in Section 4 below. For now, we only present the algorithms with a brief description of how they work. The algorithm has several nontrivial components, which we introduce and explain gradually. The reason for this is that the task of simulating from the undershoot law US_α(t, w) comes with multiple pitfalls, several tempting approaches having severe problems; see Subsection 2.3 for an account of these pitfalls.
First, the density f_{t,w} of US_α(t, w) involves the density of the stable law, which can be written as a non-elementary integral of elementary functions (see Proposition 4.1). Thus, by extending the state space and applying an elementary transformation, the problem is converted into producing samples from a density ψ_s proportional to (x, y) ↦ (s − x^{−1/r})^{−α} σ_α(y)e^{−σ_α(y)x} on (s^{−r}, ∞) × (0, 1), where r := α/(1 − α) and σ_α is the convex function defined in terms of trigonometric functions in (4.2). The density ψ_s is difficult to sample from directly, so we first bound it by a mixture of the two densities ψ^{(1)}_s and ψ^{(2)}_s on (s^{−r}, ∞) × (0, 1), proportional to, respectively, (x, y) ↦ σ_α(y)e^{−σ_α(y)x} and (x, y) ↦ σ_α(y)e^{−σ_α(y)x}(x − s^{−r})^{−α} (see details in Subsection 4.3 below). The simulation algorithms corresponding to these simpler densities will be discussed later.
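The domination-by-mixture idea can be illustrated in miniature. The following Python sketch is our own toy example, not the density ψ_s itself (which requires σ_α from (4.2)): the hypothetical target on (0, ∞), proportional to e^{−x}(x^{−1/2} + sin²x), is dominated by the unnormalised mixture Γ(1/2)·Gamma(1/2, 1) + 1·Exp(1), since sin²x ≤ 1.

```python
import math
import random

def sample_mixture_rejection(rng=random):
    """Rejection sampling with a mixture envelope: the (hypothetical) target
    f(x) proportional to exp(-x)*(x**-0.5 + sin(x)**2) on (0, inf) satisfies
    f <= exp(-x)*(x**-0.5 + 1), an unnormalised mixture of a Gamma(1/2, 1)
    component (mass Gamma(1/2)) and an Exp(1) component (mass 1)."""
    w1 = math.gamma(0.5)   # mass of exp(-x)*x**-0.5 on (0, inf)
    w2 = 1.0               # mass of exp(-x) on (0, inf)
    while True:
        # choose a mixture component proportionally to its mass
        if rng.random() < w1 / (w1 + w2):
            x = 0.5 * rng.gauss(0.0, 1.0) ** 2   # Gamma(1/2, 1) via chi-square
        else:
            x = rng.expovariate(1.0)             # Exp(1)
        target = math.exp(-x) * (x ** -0.5 + math.sin(x) ** 2)
        envelope = math.exp(-x) * (x ** -0.5 + 1.0)
        if rng.random() * envelope <= target:    # accept with prob target/envelope
            return x
```

The structure mirrors the strategy above: propose from whichever mixture component was drawn, then accept against the ratio of the target to the full envelope.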
Remark 2.1. The mixture proportion p (and hence 1 − p) can be efficiently evaluated to arbitrary precision using Gauss-Legendre quadrature. In fact, the numerator in the formula in (4.9) equals the distribution function of a stable law. The computation time for p is thus assumed to be constant for any s = (θt)^{−1/α} w; see fast algorithms in e.g. [21, 1]. Our simulation times agree with this assumption; see Section 3.1 below.
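For concreteness, a Gauss-Legendre rule of the kind invoked in Remark 2.1 takes only a few lines; this generic NumPy sketch is ours, not the paper's implementation:

```python
import numpy as np

def gauss_legendre_integral(f, a, b, n=20):
    """Approximate the integral of f over [a, b] with an n-point Gauss-Legendre
    rule; the nodes and weights live on [-1, 1] and are mapped affinely to [a, b]."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(weights * f(x))
```

For smooth integrands such as those arising from stable densities, the error decays faster than any polynomial rate in n, which is why the cost of this step can be treated as constant.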
The proof of Proposition 2.5 is given in Subsection 4.3.4 below. In the following two subsections we describe how to sample from the laws given by the densities ψ^{(1)}_s and ψ^{(2)}_s.

Simulation from the density ψ^{(1)}_s
The bivariate density ψ^{(1)}_s can be written as the product of a marginal log-concave density and an exponential density with a deterministic shift and a random scale. The latter is easy to simulate and the former can also be simulated, via the general Algorithm 11, by virtue of being log-concave.
Proposition 2.6. Algorithm 6 samples from the law given by the density ψ^{(1)}_s. Moreover, the running time of Algorithm 6 is bounded by a multiple of G + α(1 − α)^{−1} log^+(s^{−1}), where G is a geometric random variable with parameter 1/5. In particular, the running time has exponential moments, with mean bounded by an explicit expression in which the constant κ_6 > 0 depends neither on s ∈ (0, ∞) nor on α ∈ (0, 1).
The proof of Proposition 2.6 is given in Subsection 4.3.1 below. Algorithm 6 does not rely on numerical inversion (cf. Algorithm 12). Its expected running time therefore does not depend on the number of precision bits N, specified a priori. In fact, the computational complexity of Algorithm 6 comes solely from the pre-processing step (line 1 of Algorithm 11), which depends on the values of the parameters in Algorithm 6.
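The factorisation behind Algorithm 6 can be sketched in Python. We substitute a hypothetical convex stand-in for the function σ_α of (4.2), so that the snippet is self-contained: integrating x out of the density proportional to (x, y) ↦ σ(y)e^{−σ(y)x} on (c, ∞) × (0, 1), with c = s^{−r}, shows that the marginal of y is proportional to e^{−cσ(y)} (log-concave, since σ is convex), while, given y, the variable x − c is exponential with rate σ(y). For simplicity the sketch samples the marginal by plain rejection from a uniform proposal rather than by Devroye's log-concave sampler (Algorithm 11).

```python
import math
import random

def sigma(y):
    """Hypothetical convex stand-in for sigma_alpha of (4.2): convex on (0, 1)
    and blowing up at both endpoints, like the real sigma_alpha."""
    return 1.0 / (y * (1.0 - y))

def sample_psi1_sketch(c, rng=random):
    """Sample from the density proportional to sigma(y)*exp(-sigma(y)*x) on
    (c, inf) x (0, 1): first the log-concave marginal of y, proportional to
    exp(-c*sigma(y)), then the shifted exponential conditional law of x."""
    sigma_min = sigma(0.5)   # minimiser of the convex sigma on (0, 1)
    while True:              # rejection sampling for the log-concave marginal
        y = rng.random()
        if rng.random() <= math.exp(-c * (sigma(y) - sigma_min)):
            break
    x = c + rng.expovariate(sigma(y))   # given y, x - c is Exp(sigma(y))
    return x, y
```

A useful check, used in the test below, is that σ(Y)(X − c) is exactly Exp(1) under this joint law, whatever the convex function σ.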

Simulation from the density ψ^{(2)}_s
Simulating from the density ψ^{(2)}_s is a delicate matter, particularly because of the sensitivity of the density to the value of α ∈ (0, 1). Algorithm 7 below samples from the density ψ^{(2)}_s, splitting into cases depending on the parameter α and using rejection sampling with respect to various dominating (proposal) densities. Algorithm 7 needs to numerically invert several twice-differentiable functions f : [0, 1] → R to produce samples from certain dominating (proposal) densities. This inversion is achieved via the Newton-Raphson root-finding method given in Algorithm 12. Each call of Algorithm 12 requires the input of a specific target function f together with an auxiliary function.

For the same reason we mentioned in Remark 2.1, the integral ∫_a^b f(y) dy can be computed efficiently to arbitrary precision via Gauss-Legendre quadrature. The computation time for Step 3 in Algorithm 5 is thus assumed to be constant. Proposition 2.7. Algorithm 7 samples from the law given by the density ψ^{(2)}_s. Moreover, the running time of Algorithm 7 is bounded above by a constant multiple of an explicit expression, where N specifies the number of precision bits of the output and G is a geometric random variable with probability of success P(G = 1) bounded below by a multiple of (1 − α)^2. In particular, the running time of Algorithm 7 has exponential moments and its mean is bounded by an explicit expression in which the constant κ_5 depends neither on s ∈ (0, ∞) nor on α ∈ (0, 1).
The proof of Proposition 2.7 is given in Subsection 4.3.3 below. The computational cost in Algorithm 7 comes from various sources: (I) applications of the inversion Algorithm 12, which may depend on the parameters α ∈ (0, 1) and s > 0 (recall that the parameter s is in fact random in Algorithm 5, which calls Algorithm 7); (II) in the cases D = 0 and D = 1, we also perform an accept-reject step whose acceptance probability depends on the parameters α ∈ (0, 1) and s > 0; (III) in the case D = 2, the pre-processing step (line 1 in Algorithm 11) also adds to the final computational complexity, since the function to which Algorithm 11 is applied in line 5 of Algorithm 7 depends on both α ∈ (0, 1) and s > 0. We note that the accept-reject step in the case D = 2 is embedded in Algorithm 11 and has uniformly bounded (in the parameter values) expected running time.

How not to sample the stable undershoot
At a glance, there are several apparently feasible ways to produce a sample from the undershoot law US_α(t, w) of S_{τ_b−} under P, conditional on {τ_b = t, b(τ_b) = w, ∆_S(τ_b) > 0}. However, most of these simulation methods are either infeasible altogether or their computational complexity grows rapidly as either α ↓ 0 or α ↑ 1. In this section, we briefly discuss some of these "approaches".
First recall that sampling from the undershoot law is, by Proposition 4.1(c), essentially reduced to producing samples from the density proportional to x ↦ ϕ_α(x)(s − x)^{−α} 1_{x<s}, where s = (θt)^{−1/α} w = (θτ_b)^{−1/α} b(τ_b) > 0 and ϕ_α is the density of S_{1/θ} given in (4.3). A first guess would be to employ rejection sampling on this density. Indeed, since the function x ↦ (s − x)^{−α} is unbounded, we cannot propose from the density ϕ_α. However, the density ϕ_α is bounded and hence we may propose from the density proportional to x ↦ (s − x)^{−α} and accept the sample with probability proportional to ϕ_α(x). For a fixed s, this algorithm has a running time with finite mean; however, we require an algorithm that performs well for the random value of s, which is stable distributed and thus heavy tailed. Since the proposal density proportional to x ↦ (s − x)^{−α} puts most of its mass close to s, the acceptance probability of such an algorithm is asymptotically equivalent to ϕ_α(s)/sup_{u∈(0,∞)} ϕ_α(u) ∼ Cs^{−α−1} as s → ∞ for some C > 0. In turn, this means that the running time is bounded below by C max{1, S_1^{α+1}} for some C > 0, which does not have a finite mean (in fact, it has a finite moment of order p > 0 if and only if p < α/(α + 1)).
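The proposal step in this first approach is at least explicit: the density proportional to x ↦ (s − x)^{−α} on (0, s) has distribution function F(x) = 1 − (1 − x/s)^{1−α}, which can be inverted in closed form. A minimal Python sketch of the proposal (ours, for illustration; the accept-reject step against ϕ_α is omitted, since evaluating ϕ_α requires numerical quadrature):

```python
import random

def propose_near_barrier(s, alpha, rng=random):
    """Inverse-transform sampler for the density proportional to
    x -> (s - x)**(-alpha) on (0, s): the CDF is F(x) = 1 - (1 - x/s)**(1 - alpha),
    so F^{-1}(u) = s*(1 - (1 - u)**(1/(1 - alpha)))."""
    u = rng.random()
    return s * (1.0 - (1.0 - u) ** (1.0 / (1.0 - alpha)))
```

Most of the proposed mass lies near the barrier s, which is precisely why the acceptance probability against ϕ_α degrades as s → ∞, as explained above.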
Another possibility would be to perform a numerical inversion of the distribution function F_α of the target density. Since ϕ_α can be computed to arbitrary precision using numerical quadrature, one may expect the Newton-Raphson method to perform well in inverting F_α numerically. It is, however, computationally intensive. Every step of the Newton-Raphson method, and of the binary search required prior to the quadratic convergence of the Newton-Raphson method, evaluates the double integral in F_α (recall that ϕ_α is itself given in terms of an integral). If the numerical quadrature for computing ϕ_α accurately requires n nodes, the corresponding quadrature for F_α needs approximately n^2 nodes. Since we typically have n ∈ [10^4, 10^5] for double-precision floating-point numbers, each numerical evaluation of F_α requires at least 10^8 evaluations of elementary functions (sums, products, exponentials and trigonometric functions). This makes the approach infeasible despite the good properties of the Newton-Raphson method and the boundedness and smoothness of the function ϕ_α. A third option is to maintain parts of our current methodology and modify the parts of the algorithm that sample from a density of the form u ↦ σ_α(u)^p exp(−σ_α(u)s) for some p ∈ (0, ∞) \ {1}. For instance, one may attempt a rejection-sampling algorithm via elementary bounds on σ_α, such as those found in Lemma 4.9 below. Such algorithms are feasible and, in fact, we use some of them for certain parameter combinations. The drawback of those approaches is that the acceptance probability becomes tiny as α → 1, which is why we introduce the auxiliary parameter z* and other simulation algorithms for certain parameter combinations. Indeed, the acceptance probability is often bounded above by c^r for some c ∈ (0, 1), where we recall that r = α/(1 − α). This probability is incredibly small even for moderate values of α. For instance, the quotient between the lower and upper bounds on σ_α in Lemma 4.9 is proportional to (π/4)^{r+1}(1 − α)^r as α → 1, which is approximately 4.28 × 10^{−6} (resp. 8.93 × 10^{−11}) when α = 0.85 (resp. α = 0.9).
Our Algorithm 5 (and its subalgorithms) are in fact informed by the pitfalls described above. In particular, the choices made in Algorithm 7 may appear arbitrary but are, in fact, designed to control the computational complexity. It remains an open problem to design an exact simulation algorithm with uniformly bounded expected running time on the entire parameter space α ∈ (0, 1). Figure 3.1b below appears to suggest that the complexity of our algorithm grows as (1 − α)^{−0.2} as α ↑ 1, which is much smaller than the growth (1 − α)^{−3} suggested by our theoretical bounds.

Infinite expected running time of the algorithm in [8, Sec. 7]
In [8, Sec. 7], an algorithm that produces a sample of the first-passage triplet of a tempered stable process is given. Such an algorithm has an almost surely finite running time. However, as can be seen as follows, its expected running time is infinite.
Suppose for simplicity that θ = 1 and b is bounded by the variable denoted by r in the algorithm in [8, Sec. 7]. Consider Step 3 of the algorithm in [8, Sec. 7], which uses rejection sampling to simulate a certain pair of random variables by accepting a proposal with probability h(βS_1, U)/M_α, where M_α is a global bound on h. Thus, the expected running time of this step equals E[C], where C := 1/p_d = M_α/h(βS_1, U) and the variables U, β and S_1 are independent. Hence, recalling that r = α/(1 − α), the expectation E[C] is bounded below by a constant multiple of E[S_1^r]. However, E[S_1^η] < ∞ if and only if η < α by (4.26) below. Since r = α/(1 − α) ≥ α for all α ∈ (0, 1), the expected running time of the algorithm in [8, Sec. 7] is indeed infinite.
We note that, similarly, Step 5 of the algorithm in [8, Sec. 7] also has infinite expected running time. Since Step 5 is more complex than Step 3, we made the phenomenon above explicit only for Step 3.

Can an implemented simulation algorithm be exact?
The answer to the question in the title of the present subsection depends on the definition of exact simulation. The simplest, and in some sense most natural, definition requires the output of the algorithm to have the same law as the variable it is simulating. In this case, the answer is clearly no: working with N ∈ N bits of precision makes the output lie on a grid, implying that the total variation (TV) distance between a simulated output and any target law with a density equals one. 2 A way of dealing with this issue might be to understand and minimise a natural distance between the output of an implemented algorithm and a draw from the target law. Two natural ways of measuring the distance between the algorithm's output and the actual law are as follows. (I) L^∞-Wasserstein distance. Let A (resp. I) denote the output of an implemented algorithm (resp. an "ideal" sample from the target law). For any coupling λ of the laws of A and I, the L^∞-distance is given by the essential supremum λ-ess sup |A − I|, and the L^∞-Wasserstein distance equals the infimum of the L^∞-distances over all couplings λ. It appears that even a simple accept-reject method, implemented on a computer, cannot reduce this distance, regardless of the precision used (i.e. for any N ∈ N). Indeed, consider an implementation of Algorithm 11 (in Appendix C.2 below), together with the natural coupling, obtained by running the algorithm on a computer and on an ideal computer (IC), which carries out all evaluations without error. Even if the constants a_0, a_1 and f(a_1) could be represented exactly on the grid, the event detection occurring in lines 6 and 8 will have a chance of being misdetected by the computer, simply because the variable V_2 is not uniform on (0, 1) but instead lives on the grid. Therefore, there is a (small, but non-zero) chance that the value of I originated from line 7, while the value of A came from line 9.
On this event of positive probability, we have |A − I| = a_1 > 0, regardless of the precision we are using. The same phenomenon occurs with positive probability with lines 9 and 11, making the L^∞-distance for the natural coupling in fact infinite. Of course, the L^∞-Wasserstein distance may be smaller, but there appears to be no easy way of finding a good bound on it other than through the natural coupling. (II) Total variation distance on a grid. Another natural way of measuring the distance between the variables A and I would be to project I onto the state space (i.e. the grid in the computer) of A and consider the total variation distance (TV-distance) between the two discrete variables. However, rounding errors can make the TV-distance equal one for any level N ∈ N of precision bits used on a computer: consider, for example, the deterministic random variable 0 and assume the simulation algorithm first samples the deterministic variable π and then computes sin(π). This algorithm on an IC returns zero, which is already on the grid (so the projection does nothing). However, regardless of the number of precision bits N, our computer may, due to rounding errors, return a non-zero value on the grid, see Figure 2.1. In particular, in this case A and the projection of I onto the grid are at TV-distance one for all N ∈ N. Note in passing that the L^∞-distance in this case decreases with increasing precision N, while the TV-distance does not. The discussion in (I) and (II) suggests that it is not evident how one can quantify a deviation from the strong definition of exact simulation stated above. In this paper we use the following weaker definition: an algorithm implemented on a computer is exact if, given exact values of the iid uniforms, and assuming all the elementary operations carried out by the algorithm are exact, the returned value has the correct law (i.e. with zero TV-distance from the target law). In this definition, elementary operations include evaluations of elementary functions and their integrals, as well as inversions.
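The sin(π) phenomenon can be reproduced on any machine with IEEE-754 double-precision arithmetic; the following tiny Python check is ours, purely for illustration:

```python
import math

# An ideal computer returns sin(pi) = 0 exactly; in double precision the
# stored value of pi differs from the real number pi, so the result is a
# tiny but non-zero grid point.
residual = math.sin(math.pi)
print(residual != 0.0)          # True: the output is a non-zero grid point
print(abs(residual) < 1e-15)    # True: yet it is within one ulp of pi times sin'(pi)
```

This makes concrete the observation above: the TV-distance between the grid projection of the ideal output (exactly 0) and the computed output stays one for every precision level, while the L^∞ error shrinks as the precision grows.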
Our main aim is to implement a fast exact algorithm and give theoretical guarantees for its running time, so that the algorithm can be used within a Monte Carlo estimation procedure. In exact simulation algorithms, function inversion is often implicitly assumed to produce exact values and take a bounded number of steps, uniformly in the input (see e.g. [8, Table 1, line 2]). In the case of our main simulation method (Algorithm 2 above), we instead carefully analyse the cost of each Newton-Raphson call needed to perform an inversion within an error margin smaller than what can be represented on the computer. All other elementary operations, including the evaluation of the definite integrals in Algorithms 5 (Step 1) and 7 (Step 3), are assumed to have constant complexity (the integrands are well behaved and close to those arising in the representation of a stable density, making the numerical procedures converge fast; see the remarks following the algorithms).
Finally, we note that the errors between simulated variables on a grid and their idealised counterparts (e.g. for the increments of Brownian motion) are typically not taken into account when analysing the convergence properties of Monte Carlo estimates. This further motivates our definition of exact simulation for an implemented algorithm.

Applications
The main objective of this section is to present some applications of our algorithms. First, we present numerical evidence for the theoretical bounds on the expected running time of Algorithms 2 and 3.
We then show how Algorithm 2 can be used to sample the first-passage event of a two-sided tempered stable process, which in turn can be used to price barrier options via Monte Carlo. Next, we use the probabilistic representation of the solution of a fractional partial differential equation (FPDE) [16], along with Algorithm 2, to produce Monte Carlo estimators of the solution of such an FPDE.

Implementation
In this section, we showcase an implementation of our algorithms in the Julia programming language. Our first goal is to show that such an implementation makes the simulation of the first-passage event feasible and fast in practice. Our second goal is to show that, in practice, our bounds on the expected running time of Algorithms 2 and 3 overestimate the true complexity. Observe that the expected running time appears to be smaller for α = 0.999 (close to 1) than for α = 0.001 (close to 0). Although this is surprising, since our upper bounds are of order O((1 − α)^{−3}) for large α and of order O(|log α|) for small α, it is important to consider a few things. First, our key Algorithm 7, used to simulate the stable undershoot, uses a very different simulation method for α ≤ 1/2 and for α > 1/2. In particular, the multiplicative constants in the O notation can be significantly different. Second, the bounds in Theorem 2.3 are only bounds and need not be sharp. In particular, this does not necessarily mean that the asymptotic cost as α ↑ 1 is truly smaller than the asymptotic cost as α ↓ 0. Our bounds also predict E[T] = O(q) as q → ∞. This is supported by the picture: the quotient log(E[T])/log(e + q) appears to converge to 1 as q → ∞.

Extensions to Lévy processes of bounded variation
We start by recalling the algorithm in [8, Sec. 5.1]. Let Z be a Lévy process with non-positive drift, such that its Lévy measure ν satisfies the bounded-variation condition ∫ min{1, |x|} ν(dx) < ∞. Decompose Z as Z⁺ − Z⁻, where Z⁺ and Z⁻ are independent subordinators with Lévy measures ν⁺ and ν⁻, respectively, and where Z⁺ is driftless. Fix some c > 0 and let τ^Z_c := inf{t > 0 : Z_t > c}. The following algorithm provides a method to simulate τ^Z_c under the assumption that the first-passage triplets (τ⁺_x, Z_{τ⁺_x −}, Z_{τ⁺_x}), where τ⁺_x := inf{t > 0 : Z⁺_t > x}, as well as the increments Z⁺_t and Z⁻_t, can be sampled for any x > 0 and t > 0. Further consider some T ∈ (0, ∞]. According to [8, Proposition 5.1], if either lim sup_{t→∞} Z_t = ∞ a.s. (which can be determined, e.g., via Rogozin's criterion [15, Thm 2.7]) or T < ∞, then Algorithm 8 below stops a.s. and samples the triplet (min{τ^Z_c, T}, Z_{min{τ^Z_c,T}−}, Z_{min{τ^Z_c,T}}). Algorithm 8: First-passage event of a Lévy process of bounded variation such that either lim sup_{t→∞} Z_t = ∞ a.s. or T < ∞.
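To convey the structure of such a first-passage sampler, the following minimal Python sketch simulates the triplet (τ^Z_c, Z_{τ−}, Z_τ) exactly in the special case Z_t = Z⁺_t − δt, where Z⁺ is a compound Poisson subordinator (jump rate `lam`, Exp(`mu`) jump sizes) and Z⁻ is the pure drift δt. In this case the path can be simulated jump by jump, since Z can cross c only at a jump of Z⁺; the general Algorithm 8 instead relies on exact sampling of the first-passage triplet of Z⁺. All parameter names are ours and illustrative.

```python
import random

def first_passage_bv(c, lam=1.0, mu=1.0, drift=0.5, rng=random):
    """First passage over level c of Z_t = Z+_t - drift*t, where Z+ is a
    compound Poisson subordinator with jump rate lam and Exp(mu) jump sizes.
    Exact in this special case, since Z can cross c only at a jump of Z+.
    Returns (tau, Z_{tau-}, Z_tau). Requires lam/mu > drift so that the
    crossing happens a.s."""
    t, z = 0.0, 0.0
    while True:
        dt = rng.expovariate(lam)   # waiting time until the next jump of Z+
        t += dt
        z -= drift * dt             # Z drifts down between jumps of Z+
        jump = rng.expovariate(mu)  # Exp(mu) jump of Z+
        if z + jump > c:
            return t, z, z + jump   # crossing time, undershoot, position at crossing
        z += jump
```

Note that the returned undershoot Z_{τ−} may be negative, since Z is a Lévy process of bounded variation rather than a subordinator.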

Pricing barrier options
Let Z be a Lévy process of bounded variation started from 0 and let R_t := R_0 e^{Z_t}, t ≥ 0, model the risky asset for some initial value R_0 > 0. The simulation of the first-passage event can be used to obtain Monte Carlo estimators of the price of a barrier option. Indeed, fix K < M and consider the up-and-out barrier call option, whose payoff equals (R_T − K)⁺ on the event that R stays below the barrier M up to the maturity T, and zero otherwise; the price of the option is then given by the discounted expectation of this payoff. Let (τ^i_*, Z^i_{τ_*}), i ∈ N, be iid samples with the law of (τ_*, Z_{τ_*}) generated via Algorithm 8. Then, for every n, we consider the corresponding Monte Carlo estimator of the price, see Figure 3.3.
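The estimator can be sketched as follows (a hedged illustration: the function and argument names are ours, and the pairs (min{τ_c, T}, Z at that time) would in practice be produced by Algorithm 8 with c = log(M/R_0)):

```python
import math

def barrier_call_estimate(samples, R0, K, M, T, r=0.0):
    """Monte Carlo price of an up-and-out call from iid samples (tau_i, z_i)
    of (min{tau_c, T}, Z_{min{tau_c, T}}), with barrier level c = log(M/R0).
    The interface is illustrative; r is a constant discount rate."""
    c = math.log(M / R0)
    total = 0.0
    for tau, z in samples:
        knocked_out = tau < T or z > c      # barrier breached before maturity
        if not knocked_out:
            total += max(R0 * math.exp(z) - K, 0.0)  # payoff (R_T - K)^+
    return math.exp(-r * T) * total / len(samples)
```

For instance, with R_0 = 100, K = 90, M = 120, T = 1 and zero rate, a sample that never hits the barrier and ends at Z_T = 0 contributes the payoff 10.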

Solution of fractional partial differential equations
Let g : R₊ × R^d → R be a continuous function that vanishes at infinity and φ : R^d → R. For a < b, consider the FPDE in (3.1), where A_x is the generator of a Feller semigroup on C_∞(R^d) acting on the space variable x, φ ∈ Dom(A_x), and the operator −_tD_a is a generalised differential operator of Caputo type of order less than 1, acting on the time variable t ∈ [a, b].
The solution u of the FPDE problem in (3.1) exists and its stochastic representation is given in [16], where {X^x_s}_{s≥0} is the stochastic process generated by A_x and T_t = inf{s > 0 : Y^{a,t}_s < a}, with {Y^{a,t}_s}_{s≥0} the decreasing [a, b]-valued process started at t ∈ [a, b] generated by −_tD_a. Note that T_t is the first-passage time of the subordinator (t − Y^{a,t}_s)_{s∈R₊} over the constant level t − a. In the special case g ≡ 0, φ(x) = x², A_x = (x²/2)∆ (where ∆ is the Laplacian on R) and the subordinator is tempered stable, we can simulate the Monte Carlo estimator of the solution u of the FPDE in (3.1). In particular, the variables T^k_t are iid samples of T_t, obtained by Algorithm 2, and X^{x,k} = x exp(√(T^k_t) N_k − T^k_t/2), where N_k are iid standard normal random variables. Since X^x is a geometric Brownian motion and Algorithm 2 is exact, the variables φ(X^{x,k}) in the estimator in (3.2) have the same law as φ(X^x_{T_t}). A plot of u_n(t, x), based on n = 10⁴ samples, is presented in Figure 3.4. In this case, the dependence of the estimator u_n(t, x) for fixed t (as a function of x) is explicit. The section of u_n(t, x) as a function of t (for fixed x) is obtained by sampling consecutive crossing times for increasing barriers. This is a well-known procedure in Monte Carlo estimation that retains the accuracy of each estimate u_n(t, x) but reduces random fluctuations between the values of u_n at consecutive time points and a given level x by introducing correlation [12, Sec. 4.2]. The spacing between values of t (resp. x) where u_n(t, x) is evaluated in Figure 3.4 is 1/20 (resp. 1/100).
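For the special case just described, the estimator u_n(t, x) in (3.2) can be sketched as follows (a minimal illustration assuming, as in Figure 3.4, φ(x) = x² and X a geometric Brownian motion; the first-passage times would be produced by Algorithm 2 and are an input here):

```python
import math, random

def fpde_estimator(x, passage_times, rng=random):
    """Estimator u_n(t, x) of (3.2) for g = 0 and phi(x) = x**2, with X the
    geometric Brownian motion X_s = x*exp(B_s - s/2) (as in Figure 3.4).
    passage_times: iid samples of the first-passage time T_t (Algorithm 2)."""
    total = 0.0
    for T in passage_times:
        n = rng.gauss(0.0, 1.0)                        # B_T given T equals sqrt(T)*N(0,1) in law
        X = x * math.exp(math.sqrt(T) * n - T / 2.0)   # GBM evaluated at the random time T
        total += X * X                                 # phi(X) = X**2
    return total / len(passage_times)
```

The conditioning step B_T | T ~ √T · N(0,1) is what makes the exactness of the T_t samples matter: any bias in T_t propagates directly into the law of φ(X^{x,k}).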

Comparison with a biased approximation
We stress that being able to simulate exactly from the law of T_t is essential for the stability of the estimator u_n(t, x). It is tempting to simulate a random walk approximation (i.e. the skeleton) of the subordinator (t − Y^{a,t}_s)_{s∈R₊} on a time grid with mesh h > 0 and approximate the law of T_t with the first time T^{(h)}_t the random walk crosses the level t − a. Unlike u_n(t, x), the resulting estimator u^{(h)}_n(t, x) is biased. Thus, in order for the L²-error to be at most ε > 0, its bias and variance need to be of order ε and ε², respectively. In the specific case considered here, it is easy to see that, for fixed (t, x), the bias of u^{(h)}_n(t, x) is proportional to h. In particular, the computational complexity of evaluating φ(X^{x,k}) is proportional to 1/h, making the total complexity of u^{(h)}_n(t, x) proportional to n/h ∝ ε^{−3}. In contrast, the computational complexity of evaluating u_n(t, x) based on our exact simulation algorithm is proportional to n ∝ ε^{−2}. This yields an improvement in computational complexity proportional to 1/h ∝ ε^{−1}.
The analysis outlined in the previous paragraph can be made precise for all Lipschitz functions φ. We note that, if φ is only locally Lipschitz (as is the case in the example above), the error of the naive random walk algorithm may deteriorate significantly as a function of (t, x). For instance, in the example above, the bias is proportional to h x² E_q[exp(T_t)], where E_q[exp(T_t)] ≤ E[exp(T_t)] (because stable paths dominate tempered stable paths) and, by Proposition 4.1(a) and (4.26), the latter expectation is of order e^{t^α}. Thus, the bias of the naive approximation u^{(h)}_n(t, x) is proportional to h x² e^{t^α}. This would require h to be proportional to ε x^{−2} e^{−t^α} if the L²-error is to be smaller than ε. Thus, the improvement in computational complexity of u_n(t, x) over u^{(h)}_n(t, x) is proportional to 1/h ∝ x² e^{t^α}/ε and grows with both x and t.
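The biased skeleton approximation T^{(h)}_t discussed above can be sketched as follows (purely illustrative: a gamma subordinator stands in for the tempered stable one, since its increments are trivial to sample; the structure of the approximation and its O(h) bias are the same):

```python
import random

def first_passage_walk(level, h, shape=1.0, rng=random):
    """Biased approximation T^(h): first grid time in {h, 2h, ...} at which a
    random-walk skeleton of a subordinator exceeds `level`. A gamma
    subordinator (increments Gamma(shape*h, scale 1)) stands in for the
    tempered stable one. The returned time overestimates the true passage
    time, with bias of order h."""
    t, s = 0.0, 0.0
    while s <= level:
        t += h
        s += rng.gammavariate(shape * h, 1.0)  # independent increment over [t-h, t]
    return t
```

By construction the output is a multiple of h and never smaller than the true crossing time, which is the source of the systematic bias that the exact algorithm avoids.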

Distribution of the first-passage event of a stable subordinator
The paths of S are assumed to be right-continuous with left limits, where for t > 0 we define S_{t−} := lim_{s↑t} S_s. The jump at time t is given by ∆_S(t) := S_t − S_{t−}. The first-passage event for S over a non-increasing, absolutely continuous function is described by Theorem A.1, stated and proved in Appendix A.1 below for completeness but originally established in [8] (see also [7]); it gives the law of the triplet (τ_b, S_{τ_b−}, ∆_S(τ_b)) (and hence of (4.4), since S_{τ_b} = S_{τ_b−} + ∆_S(τ_b)) for a general class of subordinators. In particular, as a consequence of (A.1a), the stable subordinator S does not jump onto or out of the function b at τ_b. Note that, since b is monotone and absolutely continuous, it is differentiable Lebesgue-a.e. on (0, ∞) and its derivative is a version of its density. We denote its derivative by b′ and, on the Lebesgue-null set where it is not defined, we set it equal to −1 (an arbitrary choice, without ramifications). The next proposition is key for the sampling of (τ_b, S_{τ_b−}, S_{τ_b}).
(c) Conditional on S not creeping at the crossing time τ_b = t ∈ (0, T_b), we have ∫_{B(t)}^{∞} g_1(s) ds, which is absolutely continuous since b (and hence B) is absolutely continuous. Differentiating this identity in t and applying the scaling property to the density g_t(s) = t^{−1/α} g_1(t^{−1/α} s) yields (c). By (A.1a) in Theorem A.1, for t ∈ (0, T_b) and u ∈ (0, b(t)), we have the stated identity. To evaluate the denominator, recall that K(t) is the density of τ_b; together with (A.1b), (4.5) follows. Also from (A.1a), (4.6) follows.

Simulation of the first-passage event of a stable process
The task is to justify the validity of Algorithm 3, which samples the triplet (τ_b, S_{τ_b−}, S_{τ_b}) of the stable subordinator S under P. The analysis of its expected computational complexity has some technical prerequisites and will thus be deferred to Subsection 4. By the previous paragraph, for Algorithm 3 to be valid, it suffices to justify that Algorithm 5 produces samples from the law US_α(t, w) of S_{τ_b−} | {τ_b = t, b(τ_b) = w, ∆_S(τ_b) > 0}, which is described analytically in (4.5) of Proposition 4.1(c). Note that sampling from the law US_α(t, w) via Algorithm 5 is the only step in Algorithm 3 with random running time. In fact, the design of Algorithm 5 is one of the main contributions of this paper and is discussed in detail in the following subsection.

Simulation from the law of the stable undershoot
We now describe and justify Algorithm 5 and analyse its computational complexity. Recall from Subsection 4.1 above that the function b : [0, ∞) → [0, ∞) is absolutely continuous and non-increasing, hence almost everywhere differentiable, with derivative b′ almost everywhere equal to a version of its density and set equal to −1 at the points of non-differentiability of b. Further recall that r = α/(1 − α), w_α = Γ(1 − α)/α > 0 and the definition of T_b > 0, as well as the functions σ_α and ϕ_α defined in (4.2) and (4.3), respectively. By (4.5) in Proposition 4.1(c) and the representation of the stable density g_t(x) in (4.3), the density f_{t,w} of US_α(t, w) is as displayed, while the density of ξ ∼ US_α(1/θ, (θt)^{−1/α} w) equals the corresponding rescaled expression. Clearly, (θt)^{−1/α} f_{1/θ,(θt)^{−1/α}w}((θt)^{−1/α} x) = f_{t,w}(x) for x ∈ (0, w). Thus, (θt)^{1/α} ξ ∼ US_α(t, w). It therefore suffices to sample ξ from the law US_α(1/θ, s) with density x ↦ ϕ_α(x)(s − x)^{−α}/(w_α s ϕ_α(s)), x ∈ (0, s), where s = (θt)^{−1/α} w. Sampling efficiently from a density on (0, s) proportional to x ↦ ϕ_α(x)(s − x)^{−α} is hard to do directly, see Section 2.3. We use the integral representation of ϕ_α(x) to obtain a random vector (ζ_s, Y_s) whose first component, suitably rescaled, has law US_α(t, w); the law of (ζ_s, Y_s) is given by the density ψ_s. Fast exact simulation from the law given by the density ψ_s is non-trivial but possible, see Algorithm 5. The first step, given in Proposition 4.2, is to dominate the density ψ_s by a mixture of densities, each of which can be simulated using rejection sampling.

Simulation from the density ψ_s^{(1)}: proof of Proposition 2.6
The proof of Proposition 2.6 requires the following two elementary lemmas about certain properties of the function σ_α defined in (4.2).
As explained in Section C.2 below, the computational cost of Algorithm 11 has two parts: (I) the cost of finding the value of a_1 in line 1 of Algorithm 11 and (II) the expected cost of the accept-reject step in Algorithm 11. By [11, Sec. 4], the expected cost of (II) is bounded above by 5.
Note that the cost (I) of finding a_1 within Algorithm 6 is not constant in the variable s, which is itself random in Algorithm 3. Thus we need to analyse the cost of finding a_1 in Algorithm 11 as a function of s. By Lemma 4.4, for y ∈ (0, 1/2), we have ξ(y) ≤ exp(−(π − 2e^{−1})(1 − α)y s^{−r}). Thus we set y := (π − 2e^{−1})^{−1}(log 4)(1 − α)^{−1} s^r and note that the inequality ξ(y) ≤ 1/4 holds. Therefore the number of steps in the binary search in Step 1 of Algorithm 11 is bounded above by the displayed quantity, where the constant in κ_6 depends neither on s ∈ (0, ∞) nor on α ∈ (0, 1), implying the proposition.
Before proving Proposition 2.7, we first clarify in the next subsection the time cost of applying the Newton–Raphson method in lines 1, 8, 15 and 21 of Algorithm 7. As we shall see, the costs depend on α and may, but need not, depend on s.
Proof. Since ς(x) is convex, if our initial estimate x_0 of the root is larger than the true root, then the Newton–Raphson sequence decreases and converges quadratically to the root x_*. Using the power series of the cotangent function (as in the proof of Lemma 4.3) and the inequality sin(πx) ≥ πx(1 − x), for (x_0, k) satisfying x_* ≥ x_0 − 2^{1−k}, we obtain the stated bound; by the stopping condition of the binary search in line 7 of Algorithm 12, this is all we need.
Proposition 4.6. Define the auxiliary function as displayed. The total cost of the numerical inversion of the function u ↦ u σ_α(u)^α is bounded as stated, where N specifies the number of precision bits.
Proof. On [0, z_*], the function is convex and increasing with derivative σ_α(u)^α + u α σ_α′(u) σ_α(u)^{α−1}. Thus, given y ∈ (0, σ(z_*)), if our initial estimate x_0 of the root x_* of the function u ↦ u σ_α(u)^α − y is larger than x_* and sufficiently close to it, then the Newton–Raphson sequence is decreasing and converges quadratically to the root x_*. Note that (log σ_α)″ = σ_α″/σ_α − (σ_α′/σ_α)² and (log σ_α)′ = σ_α′/σ_α are increasing, and thus admit a uniform upper bound for α ∈ (0, 1). By the stopping condition of the binary search in line 7 of Algorithm 12, we only need the final expression in the previous display to be sufficiently small. For z ∈ (1/2, 1), define the function F as displayed. Proposition 4.7. Define the auxiliary function as displayed. For α ∈ (0, 1/2), the total cost of the numerical inversion of the function F is bounded as stated, where N specifies the number of precision bits.
Proof. Note the displayed identity. Given y ∈ (F(z), F(1/2)), if our initial estimate x_0 of the root x_* of the function F(x) − y is sufficiently close to x_*, then the Newton–Raphson sequence converges quadratically to the root x_*. By the stopping condition of the binary search in line 7 of Algorithm 12, we only need the final expression in the previous display to be sufficiently small.

Simulation from the density ψ_s^{(2)}: proof of Proposition 2.7
Before proving Proposition 2.7, we need the following lemmas, describing the properties of the functions ρ and σ_α.
For 3 ≤ i < m, the inequality q_i(x) > q_m(x) for x ∈ (0, 1/2) is equivalent to the displayed inequality, which is in turn equivalent to the one that follows. It is easy to see that the positive and negative coefficients of the powers of c are the same (with different corresponding powers of c). An elementary but tedious calculation shows that A_{i,m}(c) can be rewritten as displayed. Since c > 1 and the binomial coefficients satisfy the displayed inequalities for j ≤ i − 1 and for i ≤ j ≤ i + (m − i)/2, we deduce that A_{i,m}(c) > 0, completing the proof.
Lemma 4.9. For x ∈ (0, 1), we have the displayed bound. In particular, if z ∈ (0, 1) satisfies σ_α(z) = α s^r for some s > 0 (recall r = α/(1 − α)), then 1/(1 − z) ≤ 4 α^{1−α} s^α / sin(απ).
Lemma 4.10. For x ∈ (0, 1), we have the displayed bound on σ_α″(x). Proof. Note that σ_α′(x) = σ_α(x)(log σ_α)′(x). By Lemma 4.9, we only need to show the displayed inequality involving r + 1. The power series expansion of cot shows that the left-hand side inequality is equivalent to the displayed one, where ζ is the Riemann zeta function. Note that 1 − α^{2n+1} − (1 − α)^{2n+1} attains its maximum value at α = 1/2, so it suffices to establish the following inequality. In fact, letting t = cot(πx/2), we obtain the required bound. Now we can complete the proof of Proposition 2.7.
To sample from f, we first toss (at most two) coins to determine on which of the following (possibly degenerate) intervals the sample lies. Again, the computational cost of Algorithm 11 has two parts: (I) the cost of finding the value of a_1 and (II) the expected cost of the accept-reject step. By [11, Sec. 4], the expected cost of (II) is bounded above by 5.
(c) The case D = 1 (where z ∈ (1/2, 1) and the sample lies in [1/2, z]) is split into two subcases. (i) Case α ≤ 1/2. We propose from the density proportional to the displayed expression. When α = 1/2, the density is proportional to u ↦ 1_{1/2 < u ≤ z}(1 − u)^{−r}; we can sample from this directly via inversion, at constant time cost. When α ∈ (0, 1/2), we can sample from the density via inversion using the Newton–Raphson method: the CDF of our target density can be expressed through F defined in (4.16). So, to get a sample from the target density, we only need to solve F(x) = U, where U ∼ U(F(1/2), F(z)). By Proposition 4.7 and Lemma 4.9, the time cost of the inversion is bounded as displayed. Then we accept the sample U with probability h_{c,1}(U), where h_{c,1} is given by the displayed formula.
All in all, the expected running time is bounded above by the displayed quantity, where the constant κ_7 depends neither on s ∈ (0, ∞) nor on α ∈ (0, 1).
Lemma 4.12. With s defined in line 1 of Algorithm 5, we have the following bounds.
Proof of Lemma 4.12.
) as α ↑ 1 by Lemma 4.11. Thus, by substituting x = (y + A^r)^{1/r}, we obtain the first bound. (b) Note that the function y ↦ ye^{−y} attains its maximum at y = 1, implying ye^{−y} ≤ 1/e for all y ∈ R₊. The expectation in part (b) of the lemma can now be bounded as follows: substituting x = (y + A^r)^{1/r} yields the stated bound. We can now give a proof of Theorem 2.3 on the expected running time of Algorithm 3.
Proof of Theorem 2.3. By Proposition 2.5, to find an upper bound on the average running time, we only need to take the expectation in s of the average running time of Algorithm 5. By (4.26) (with η equal to either α/2 or −α/2), the inequality α log(x) ≤ 2x^{α/2} for all x > 0, and the fact that inf_{x∈(0,2)} Γ(x) > 1/2, we obtain the displayed bound. Together with Lemma 4.12, the average running time of Algorithm 3 is bounded above by the stated quantity, where the constant κ_0 does not depend on α.
To show that the running time of Algorithm 3 has exponential moments, we only need to verify that the running times of Algorithms 6 and 7, with random parameter s =d S_{1/θ}, have exponential moments. By Propositions 2.6 and 2.7, it suffices to show that log₊(s^{−1}), log₊(s) and the remaining logarithmic terms have exponential moments for any η ∈ (0, 1), where we recall that ∥ϕ_α∥_∞ = sup_{x∈R₊} ϕ_α(x). Thus, these steps indeed have running times with exponential moments, completing the proof. Suppose λ^α θt_* > λ b(t_*); then the elementary inequality 1 − e^{−x} ≥ xe^{−x} for x > 0 gives the displayed lower bound. Since xe^{−x} attains its maximum e^{−1} at x = 1, it is optimal to let λ be the 'solution' to λ^α θt_* − λ b(t_*) = 1. This solution, however, need not exist if θt_*/b(t_*)^α is small. Note that λ ↦ λ^α θt_* − λ b(t_*) is increasing and continuous on [0, (αθt_*/b(t_*))^{r+1}] and decreasing thereafter, with maximal value α^r (1 − α)(θt_*)^{r+1} b(t_*)^{−r}. Thus, we may take λ to be the solution of the displayed equation. This leads to the lower bound implying the claim and completing the proof.

We begin by introducing a construction of the probability measures P_q, given the base probability measure P, via the Esscher transform. Fix any θ > 0 and α ∈ (0, 1) and let S be a driftless stable subordinator under the probability measure P with Lévy measure ν. Denote by (F_t)_{t≥0} the right-continuous filtration generated by S. Given q ≥ 0, the probability measure P_q can be constructed via its Radon–Nikodym derivative dP_q/dP|_{F_t} := M_q(t) := exp(−qS_t + q^α θt), t ≥ 0.

Proof of Theorem 2.1. For a fixed time t_*, Algorithm 1 first samples the event {τ_b ≤ t_*} under P. Indeed, this is done by sampling s ∼ L(S_{t_*}) under P_q and noting that the event {s ≤ b(t_*)} is equivalent to the corresponding tempered stable process, of which s is the final value, not crossing the boundary on the interval [0, t_*]. Recall that E_q[e^{−uS_1}] = exp((q^α − (u + q)^α)θ), u ≥ 0.
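The change of measure above underlies a standard way (not necessarily the one used in Algorithm 1) to sample the increment S_{t_*} under P_q: propose a stable increment under P via Kanter's representation and accept it with probability e^{−qs}; by the Laplace transform above, the acceptance probability equals e^{−q^α θ t_*}. A hedged Python sketch:

```python
import math, random

def stable_increment(alpha, theta, t, rng=random):
    """Kanter's representation: returns S_t with
    E[exp(-u*S_t)] = exp(-theta*t*u**alpha), alpha in (0, 1)."""
    u = math.pi * rng.random()   # uniform angle on (0, pi)
    w = rng.expovariate(1.0)     # standard exponential
    a = (math.sin(alpha * u) ** (alpha / (1.0 - alpha))
         * math.sin((1.0 - alpha) * u)
         / math.sin(u) ** (1.0 / (1.0 - alpha)))
    return (theta * t) ** (1.0 / alpha) * (a / w) ** ((1.0 - alpha) / alpha)

def tempered_stable_increment(alpha, theta, q, t, rng=random):
    """Sample S_t under P_q by rejection: propose under P, accept with
    probability exp(-q*s). Expected number of proposals is exp(q**alpha*theta*t)."""
    while True:
        s = stable_increment(alpha, theta, t, rng=rng)
        if rng.random() <= math.exp(-q * s):
            return s
```

Note that the expected number of proposals grows like e^{q^α θ t_*}, which is exactly why this naive rejection step is only practical for moderate q^α θ t_*.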

A.2 Moments of a crossing time of a level for infinite activity subordinators
A first-passage time over a level of an infinite-activity Lévy process has exponential moments of all orders.
Proof of Lemma A.
In practice, for a specific function f, M(x_0, k) may, but need not, depend on either x_0 or k (see e.g. Propositions 4.5, 4.6 and 4.7). The main purpose of M is to guarantee that the starting point x_0 of the Newton–Raphson method is sufficiently close to the root x_* so that the convergence is quadratic. Constructing M for specific functions f requires a priori estimates on f. Thus, for the Newton–Raphson method to attain quadratic convergence, it suffices to find x_0 such that M|ε_0| ≤ 1/2, as this implies |ε_n| ≤ (1/2)^{2^n − 1}, making N bits of precision attainable in O(log N) iterations. The computational cost of Algorithm 12 has two parts: (I) the cost of finding the value of x_0 and (II) the cost of computing the Newton–Raphson sequence x_{n+1} = x_n − f(x_n)/f′(x_n) until the error reaches the desired precision. Part (I) depends on 2^{1−k} M(x_0, k) and is in fact bounded above by N. Part (II) is bounded by O(log N), as explained above.
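The two-phase procedure just described, a binary search to reach the quadratic-convergence basin followed by Newton–Raphson, can be sketched as follows (a generic illustration of the structure of Algorithm 12, not the paper's implementation; here M plays the role of the constant M(x_0, k)):

```python
import math

def invert(f, fprime, lo, hi, N, M):
    """Find the root of an increasing convex f on [lo, hi] to N bits of precision.
    Phase (I): bisect until the bracket width is at most 1/(2M), so that the
    initial error e_0 satisfies M*|e_0| <= 1/2 (quadratic-convergence basin).
    Phase (II): Newton-Raphson from the right endpoint; for convex increasing f
    the sequence decreases to the root with |e_n| <= (1/2)**(2**n - 1)."""
    while hi - lo > 1.0 / (2.0 * M):          # (I) binary search
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    x = hi                                    # start above the root
    for _ in range(math.ceil(math.log2(N + 1)) + 1):  # (II) ~log N Newton steps
        x -= f(x) / fprime(x)
    return x
```

For instance, inverting x ↦ x² − 2 on [1, 2] with N = 40 bits requires a single bisection followed by a handful of Newton steps.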
random element (τ_b, S_{τ_b−}, S_{τ_b}), where τ_b := inf{t > 0 : S_t > b(t)} is the first crossing time of S over a non-increasing absolutely continuous function b : [0, ∞) → [0, ∞) with b(0) ∈ (0, ∞). The variable S_{τ_b−}, referred to as the undershoot in this paper, gives the position of the subordinator S just before it crosses the function b. The position of the subordinator S at the first crossing time τ_b over b is given by S_{τ_b}. Algorithm 2 below, the main simulation algorithm of this paper, samples from the law of the vector (τ_b, S_{τ_b−}, S_{τ_b}). Denote by T the running time of Algorithm 2.

Algorithm 2
(2.2) by picking R > 0 and applying Algorithm 1 successively with the boundary function t ↦ min{b(t), R}.

Fast simulation of the triplet (τ_b, S_{τ_b−}, S_{τ_b}) under P_q

Theorem 2.3.
Algorithm 3 samples from the law of the triplet (τ_b, S_{τ_b−}, S_{τ_b}) under P. The running time of Algorithm 3 has exponential moments and its mean is bounded by the displayed expression.

Figure 3.1: The pictures show an implementation of Algorithm 3. The average time E[T] is measured in seconds per 10⁴ samples.

Figure 3.3 shows such an implementation for the case where Z is the difference of two tempered stable subordinators. To obtain smoother graphs, we apply Algorithm 8 to successive values of the barrier level log(M/R_0) (i.e., by decreasing values of R_0 ∈ {98, 98.031, ..., 101.999}), obtaining strongly correlated samples. This is a well-known procedure in Monte Carlo estimation that retains the accuracy of each estimate as if we had used independent samples for each value of R_0, but reduces the random fluctuations between any two estimated values, see [12, Sec. 4.2].
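The device of reusing one sampled path across successive barrier levels can be sketched as follows (illustrative only: a compound Poisson subordinator minus a drift stands in for the difference of tempered stable subordinators, and the crossing times for an increasing sequence of levels are read off a single path, producing the strongly correlated samples described above):

```python
import random

def crossing_times_multilevel(levels, lam=1.0, mu=1.0, drift=0.5, rng=random):
    """Crossing times of ONE path of Z_t = Z+_t - drift*t (compound Poisson
    stand-in: jump rate lam, Exp(mu) jumps) over an increasing sequence of
    levels. Successive levels reuse the same path, so the resulting samples
    are strongly correlated, which smooths the estimated curve."""
    times, t, z, i = [], 0.0, 0.0, 0
    while i < len(levels):
        dt = rng.expovariate(lam)
        t += dt
        z -= drift * dt              # downward drift between jumps
        z += rng.expovariate(mu)     # upward jump; crossings happen only here
        while i < len(levels) and z > levels[i]:
            times.append(t)          # one jump may clear several levels at once
            i += 1
    return times
```

By construction the returned crossing times are non-decreasing in the level, which is exactly the monotone coupling exploited in the smoothing procedure.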

Figure 3.4: Estimated values u_n(t, x) of u(t, x) using n = 10⁴ samples. Here φ(x) = x², T_t has the law of the first-passage time of a tempered stable subordinator with parameters α = 0.4, θ = 1, q = 1 crossing the level b(t) ≡ 1, and X = (X_t)_{t∈R₊} is the geometric Brownian motion X_t = x exp(B_t − t/2) with X_0 = x > 0.
is described by the random vector (τ_b, S_{τ_b−}, S_{τ_b}), where τ_b := inf{t > 0 : S_t > b(t)} (4.4) is the crossing time of S over the function b and S_{τ_b−} = lim_{s↑τ_b} S_s is the left limit of S at the crossing time of b. The size ∆_S(τ_b) := S_{τ_b} − S_{τ_b−} of the jump of S at the crossing time τ_b may be zero with positive probability (i.e. S need not jump at time τ_b) if b is not constant. In this case we say that the subordinator S creeps over b.

Remark 4.1. (i) The representation of the law of τ_b in Proposition 4.1(a) is based on the scaling property of the stable subordinator S. In particular, Proposition 4.1(a) implies that, at the crossing time τ_b of a stable subordinator over a boundary function b, the law of the variable τ_b^{−1/α} b(τ_b) is stable for any non-increasing absolutely continuous function b. (ii) The expression for the probability of creeping, conditional on τ_b = t, in Proposition 4.1(b) does not depend on the density of S_t. This makes the event of creeping easy to sample, see Algorithm 3 below. (iii) Conditional on not creeping, the jump ∆_S(τ_b) is easy to sample by inverting its distribution function in (4.6). In contrast, sampling the undershoot from the law in (4.5) in finite expected running time is delicate, see Algorithm 5 below. Proof of Proposition 4.1. (a) By the scaling property S_t =d t^{1/α} S_1, note that P[τ_b ≤ t] = P[S_t ≥ b(t)] = P[S_1 ≥ B(t)] = P[B^{−1}(S_1) ≤ t] for any t > 0. (b) We claim that τ_b is absolutely continuous and its density is given by a function K : (0, T_b) → (0, ∞), K(t) = g_t(b(t))(α^{−1} t^{−1} b(t) − b′(t)) for all t ∈ (0, T_b). (4.7) In fact, P[τ_b ≤ t] = P[B^{−1}(S_1) ≤ t] = P[S_1 ≥ B(t)].

Proof of Corollary 2.4.
Algorithm 4 samples χ^b_S under P conditional on τ_b ≤ t_*, and its line 1 is the only step different from Algorithm 3: we keep sampling τ_b until τ_b ≤ t_*. This is equivalent to V being larger than or equal to B(t_*), where B is specified in Proposition 4.1, and has acceptance probability equal to P[τ_b ≤ t_*]. The other steps are the same as in Algorithm 3. The running time of Algorithm 4 consists of two parts: (I) the naive sampling of V until V ≥ B(t_*) in line 1 of Algorithm 4 and (II) the running time of Algorithm 5 with random input t = τ_b in line 7 of Algorithm 4, where τ_b is conditioned to be smaller than or equal to t_*. Let T_α(t) be the expected running time of Algorithm 5 with parameter t. Then the running time of Algorithm 4 is bounded by the sum of the expected number of samples in (I), equal to 1/P[τ_b ≤ t_*], and the conditional expectation of T_α(τ_b) given τ_b ≤ t_* in (II).