Two algorithms for the discrete time approximation of Markovian backward stochastic differential equations under local conditions

Two discretizations of a class of locally Lipschitz Markovian backward stochastic differential equations (BSDEs) are studied. The first is the classical Euler scheme, which approximates a projection of the process Z, and the second is a novel scheme based on Malliavin weights, which approximates the marginals of the process Z directly. Extending the representation theorem of Ma and Zhang leads to advanced a priori estimates and stability results for this class of BSDEs. These estimates are then used to obtain competitive convergence rates for both schemes with respect to the number of points in the time-grid. The class of BSDEs considered includes Lipschitz BSDEs with fractionally smooth terminal condition as well as quadratic BSDEs with bounded, Hölder continuous terminal condition (for bounded, differentiable volatility), and BSDEs related to proxy methods in numerical analysis.


Introduction
Framework. Backward stochastic differential equations play an important role in the theory of mathematical finance, stochastic optimal control, and partial differential equations. In this paper, we study two discrete-time approximations of the so-called locally Lipschitz Markovian backward stochastic differential equation (BSDE). The purpose is to determine the error induced by these approximations under suitable norms. The first is the well-established Euler scheme for BSDEs, and the second is a novel scheme we call the Malliavin weights scheme for BSDEs. Let $T > 0$ be a fixed terminal time and $(\Omega, \mathcal{F}_T, \{\mathcal{F}_t\}, \mathbb{P})$ a filtered probability space, where $\{\mathcal{F}_t : 0 \le t \le T\}$ is the filtration generated by a $q$-dimensional ($q \ge 1$) Brownian motion $W$ and satisfying the usual conditions of right-continuity and completeness. We look to approximate the $\mathbb{R} \times (\mathbb{R}^q)^\top$-valued, predictable process $(Y, Z)$ solving the BSDE (1.1). Here, $(\mathbb{R}^q)^\top$ is the space of $q$-dimensional, real valued row vectors; $X$ is an $\mathbb{R}^d$-valued ($1 \le d \le q$) diffusion; and $\Phi : \mathbb{R}^d \to \mathbb{R}$ and $f : [0, T) \times \mathbb{R}^d \times \mathbb{R} \times (\mathbb{R}^q)^\top \to \mathbb{R}$ are deterministic functions that are termed the terminal condition and driver, respectively. We focus on the setting in which the terminal condition $\Phi$ is in the space of fractionally smooth functions $L_{2,\alpha}$ for parameter $\alpha \in (0, 1]$ (see $(A_\Phi)$ in Section 1.2 for details) and the driver is locally Lipschitz continuous in $(x, y, z)$ and locally bounded at 0, in the sense that there exist exponents $\theta_L, \theta_X, \theta_c \in (0, 1]$ and finite constants $L_f, L_X, C_f \ge 0$ such that the bounds (1.2) hold for all $t \in [0, T)$ and $(x, y, z), (x', y', z') \in \mathbb{R}^d \times \mathbb{R} \times (\mathbb{R}^q)^\top$. Furthermore, $X$ solves a time-inhomogeneous stochastic differential equation (SDE) with suitable coefficients; see $(A_{b,\sigma})$ in Section 1.2. The existence and uniqueness of solutions for this class of BSDEs, given in Section 2.3, follows from [FJ12, Theorem 3.2].
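For orientation, a Markovian BSDE with terminal condition $\Phi(X_T)$ and driver $f$ is conventionally written in the integral form sketched below. This is the standard formulation for the objects named above (row-vector valued $Z$, $q$-dimensional $W$); it is offered as an assumption about the shape of (1.1), not as a quotation of it.

```latex
Y_t = \Phi(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,\mathrm{d}s - \int_t^T Z_s\,\mathrm{d}W_s,
\qquad 0 \le t \le T.
```

Since $Z_s$ is a row vector and $W$ is column-valued, the stochastic integrand $Z_s\,\mathrm{d}W_s$ is scalar, which matches the real-valued process $Y$.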
Below, we show that this class of BSDEs includes a subclass of the important quadratic BSDEs, and also BSDEs related to so-called proxy schemes used for numerical methods, so it is of interest to find good discrete-time approximations for such BSDEs. We note that fully implementable algorithms based on the Euler and Malliavin weights schemes, admitting the full generality of the assumptions considered in this paper, have been studied in detail in [GT13b] and [GT13a], respectively; to the best of our knowledge, however, this is the first paper considering the discretization error under the full generality of the local conditions. Summary of results. In the spirit of [GM10], we make use of non-uniform time-grids π [GM10], the use of these time-grids appears to substantially reduce the error due to discretization.
Now $\theta_\Phi + \beta + 2\gamma \ge 1$ is sufficient to obtain the optimal convergence rate $O(N^{-1})$. Although the complex relationship between $\theta_\Phi$, $\alpha$ and $\gamma$ makes it difficult to compare the two results in full generality, the latter result relaxes the constraint $\alpha + \theta_L \ge 1$ in order to obtain the optimal error bound $O(N^{-1})$ if $\theta_c \ge 1/2$; see (1.2) to recall the definition of $\theta_c$. The second approximation, studied in Section 5, is the so-called Malliavin weights scheme. Rather than approximating the projections of the process $Z$, this algorithm is used to approximate the version of $Z$ determined by the Malliavin integration-by-parts formula of Theorem 2.16 directly at the points of the time grid: for each $N \ge 1$, set (1.5) for $i \in \{0, \dots, N-1\}$, where $(H^i_j)_{i,j}$ are suitable random variables. Due to the connection between BSDEs and quasilinear partial differential equations (PDEs), see [Ric12], [CD12] and references therein, it may be of interest to approximate the marginals of the process $Z$ rather than the projections. Other schemes that make use of Malliavin calculus are available [BL13], [HNS11], but this is, to the best of our knowledge, the first scheme which makes use of the Malliavin integration-by-parts formula (Theorem 2.16). Convergence results are given in Theorem 5.5, for weaker norms than those used in $E(N)$ for the Euler scheme. Although one is able to prove results under stronger norms than for the Euler scheme, the Malliavin weights scheme has several disadvantages compared with the Euler scheme, regardless of the norm used to measure the error. Our results are proven under stronger conditions than for the Euler scheme because the use of stronger a priori estimates (Proposition 4.2) is essential in the proof: one requires that either the terminal condition has exponential moments or that it is Hölder continuous. We have not yet been able to weaken the conditions on these a priori estimates.
One also requires a more restrictive constraint $\beta \le \gamma \wedge \theta_L \wedge \alpha$ (where $\gamma := (\frac{\alpha}{2} \wedge \theta_c + \frac{\theta_L}{2}) \wedge \theta_c$) on the time-grid than for the Euler scheme. The rate of convergence again depends on the parameters $(\alpha, \theta_L, \theta_c, \beta)$. In the more general setting of exponential moments on the terminal condition, $\beta + 2\gamma \ge 1$ is required for the optimal error bounds $O(N^{-1})$, whereas in the setting of a $\theta_\Phi$-Hölder continuous terminal condition, $\beta + \theta_\Phi + 2\gamma \ge 1$ is sufficient. One may ask why, given the additional constraints, it is of interest to study the Malliavin weights scheme rather than the Euler scheme. The reason has to do with the approximation of the conditional expectation. It is shown in [GT13a] that, using Monte Carlo least-squares regression to approximate the conditional expectation, one can theoretically gain an order one improvement with respect to the number of time-steps $N$ on the algorithm complexity using the Malliavin weights scheme compared to the multi-step forward implementation of the Euler scheme [GT13b]. Such a complexity reduction is substantial, given that $N$ may be very large.
In order to obtain the results on discretization, we extend some basic tools from the literature on BSDEs. These results are interesting in their own right. Firstly, we extend stability estimates for Lipschitz BSDEs to the class of BSDEs satisfying the local Lipschitz continuity and boundedness conditions (1.2). This enables us to make estimates on the basis of constructing approximating sequences, a key technique used throughout the paper. A priori estimates, which we also frequently require, are a natural consequence of stability estimates. These results are contained in Section 2.4. Secondly, we obtain dynamical representations of the process $Z_t$ in the form of the product $U_t \sigma(t, X_t)$, where $(U, V)$ is the solution of a linear BSDE. Such representations are very valuable for making estimates on the increments $\mathbb{E}[|Z_t - Z_s|^2]$, because one can make use of a priori estimates on the linear BSDE and the process $X$. In fact, it is not possible to obtain the results for $Z$ directly, but only for a suitable sequence $\{Z^{(\varepsilon)}_t : \varepsilon > 0\}$ of approximating BSDEs. A priori estimates for the approximation are computed and play an important role in the overall convergence rate of the numerical schemes. To obtain this result, we extend the method and results of [GM10, Section 2], who consider the setting (1.2) with $\theta_L = \theta_c = 1$ only, to our more general setting. The key results are contained in Lemma 2.9. Thirdly, we extend the classical representation theorem of Ma and Zhang [MZ02, Theorem 4.2] for the $Z$ process to our class of BSDEs. This theorem is proved in Section 2.5 and is a key result in this paper. On the one hand, it is the basis for the Malliavin weights scheme. On the other hand, we use the representation theorem to obtain stability estimates directly on the marginals of the process $Z$ (see Proposition 2.12), which are key to the analysis. These stability estimates lead in turn to a priori estimates of the form for all $t \in [0, T)$ almost surely.
Such estimates are, to the best of our knowledge, novel and allow us to study the impact of the regularity of the terminal condition on a priori estimates; see Proposition 2.13. Finally, in Proposition 4.2, we obtain a priori estimates for the process $V^{(\varepsilon)}_t$, where $(U^{(\varepsilon)}, V^{(\varepsilon)})$ is the solution to the linear BSDE such that the approximating BSDE solution satisfies $Z^{(\varepsilon)}_t = U^{(\varepsilon)}_t \sigma(t, X_t)$, under additional regularity conditions on the terminal condition. These estimates are essential to analyse the error due to the Malliavin weights scheme. Rather than considering a second Malliavin derivative of the process $Y_t$, as is done for example in [CD12], we make use of a functional representation that comes from the Markov property of $X$ and determine regularity properties of this functional representation. A consequence of this is the Lipschitz continuity of the functional representation of the process $Z_t$ under suitable conditions; see Corollary 4.3. To our knowledge, this result is novel. Since regularity properties are very useful for the calibration of numerical schemes (see for example [GT13b, Section 4.4]), this result may have some impact on reducing the cost of fully implementable algorithms.
Contributions to quadratic BSDEs and proxy methods. We consider the setting where Φ is a bounded, θ Φ -Hölder continuous function. To make the contributions of the numerical results in this paper clearer, we consider two important examples. Note that these examples have also been given some attention in [GT13b, Section 2]. We emphasize that the forward process X is a diffusion with bounded, twice continuously differentiable coefficients, whose partial derivatives are bounded and Hölder continuous; this assumption stands throughout this paper -see (A b,σ ).
Quadratic BSDEs have powerful applications in financial mathematics, for example to solve utility optimization problems in incomplete markets [REK00], [HIM05]. Let $q = d$ and the measur- It is known [DG06] that the solution $(Y, Z)$ of the BSDE with terminal condition $\Phi$ and driver $F(t, x, y, z)$ exists and is unique, and that there is a constant $\theta \in (0, 1]$ and a finite $C_u > 0$ such that $|Z_t| \le C_u (T - t)^{(\theta - 1)/2}$ for all $t \in [0, T)$ almost surely. This implies that $(Y, Z)$ also solves the BSDE under local conditions with terminal condition $\Phi$ and driver f (t, x, y, z) := , and $\theta_L = \theta$. The terminal condition is fractionally smooth with parameter $\alpha$ at least as large as $\theta_\Phi$; see Remark 1.3. It is shown in Corollary 2.13 that $|Z_t| \le C(T - t)^{(\theta_\Phi - 1)/2}$, so $\theta_L$ is at least as large as $\theta_\Phi$. Therefore, the error $E(N)$ of the Euler scheme is bounded above by In [Ric11], the Euler scheme for bounded, Hölder continuous terminal conditions is also considered, but with a different non-uniform time-grid and a transformation of the terminal condition; there is a further modelling difference in that the author requires no uniform ellipticity condition, but sacrifices state-dependence in the volatility matrix. The author obtains a rate of convergence $C_\eta N^{\eta - \theta_\Phi}$ for any $\eta > 0$, so we have obtained an improvement in this work. This improvement is likely due to the use of the time-grids $\pi^{(\beta)}_N$ in our scheme; indeed, [GM10] show a rate of convergence $O(N^{-\alpha})$ in the uniformly Lipschitz continuous driver setting if only a uniform time-grid is used. It is important to remark that this work is a complement to the recent papers [Ric12], [CR14], in which the authors consider weaker assumptions on the drift and the volatility of the SDE (only Lipschitz continuity and linear growth are required); however, stronger assumptions are required on the terminal function $\Phi$, which must be locally Lipschitz continuous.
Next we consider a particular instance of the proxy method. Let F (t, x, y, z) satisfy (1.2) with exponents θ L,F ≤ 1, θ X,F = 1 and θ c,F = 1, and constants L F , L F,X and C F . Let (Y, Z) satisfy the BSDE with terminal condition Φ and driver F (t, x, y, z). Let the functionF (t, x, y, z) satisfies (1.2) with exponents θ L,F = θ X,F = θ c,F = 1, and constants LF , LF ,X and CF , andΦ(x) is θ Φ -Hölder continuous and suppose that the parabolic PDE has a unique strong solution v, and, for every t ∈ [0, T ), the k-th order (k ≤ 3) partial derivatives in x of v are bounded by C u (T − t) (θΦ−k)/2 . We assume also that the parabolic operatorL t,x satisfies the property that, for any i ∈ {1, . . . , d}, x is the parabolic operator given by this is stronger than the previous assumption on the third order partial derivatives of v(t, ·), which asks for the upper bound .
The idea is that it may be numerically advantageous to simulate the BSDE (Y, Z) as opposed to the original BSDE (Y, Z). A simple example of a proxy is given by Φ(x) ≡ Φ(x), F ≡ 0, and L t,x u(t, x) = L t,x u(t, x); see Lemma 2.8 for the gradient bounds. We show in Corollary 4.3 that the process (Y, Z) brought about by this proxy may lead to some regularity improvements for the process Z compared with the original process Z. This may lead to an improvement of the numerical complexity for fully implementable algorithms that approximate the conditional expectation, where regularity is extremely important; moreover, [GT13b] and [GT13a] both demonstrate that there will be an improvement in the constants for the error estimates when using Monte Carlo least-squares regression on this proxy compared to the same algorithm on the original BSDE (Y, Z).
Remarks on extensions. In this paper, we work with one of the simplest time-inhomogeneous SDE models with stochastic volatility, which, in particular, allows us to make use of results from the theory of parabolic PDEs [Fri64] -see Lemma 2.8. The representation theorem for Z in Theorem 2.16 also makes use of the uniform ellipticity condition. Our application to quadratic BSDEs requires these conditions, and additionally that Φ is Hölder continuous and bounded, because we make use of the results of [DG06] to introduce local Lipschitz continuity. There are already several directions that may help us to avoid the uniformly elliptic condition. The results of [Kus03][CD12] [Nee11], offer suitable PDE results under UFG conditions. Also, a representation theorem beyond the uniformly elliptic setting has been found by [Zha05] and [GM + 05] (although only for the zero driver case in the second reference). Another interesting aspect of our general results is that we require neither BMO results nor (local)-Lipschitz continuity of Φ. Combined with the connection to quadratic BSDEs already discussed here, this suggests the results of this paper may be an important stepping-stone to obtain novel representation theorems, a priori estimates, existence and uniqueness results for (super-)quadratic BSDEs with possibly unbounded and discontinuous terminal conditions. It would also be interesting to combine the results of this paper with those of [Ric12] to handle the setting of unbounded, state-dependent σ with non-Lipschitz continuous terminal condition. Unfortunately, all of these extensions are beyond the scope of this paper.

Notation and conventions
Time-grids. Since each result is given for a fixed number of time-points $N$, we denote the points $\{t^{(N)}_i\}$ of the time-grid simply by $\{t_i\}$. Let $\Delta_i := t_{i+1} - t_i$ and $\Delta W_i := W_{t_{i+1}} - W_{t_i}$. We also suppress the superscript $(N)$ in the Euler and Malliavin weights schemes.
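The grid notation can be illustrated numerically. The sketch below assumes that the non-uniform grid takes the form $t_i = T - T(1 - i/N)^{1/\beta}$, which is our reading of the grids used in [GM10] and is not stated in this section; the function names `time_grid` and `brownian_increments` are hypothetical.

```python
import numpy as np

def time_grid(T, N, beta):
    """Grid points t_i = T - T * (1 - i/N)**(1/beta), i = 0..N (assumed form)."""
    i = np.arange(N + 1)
    return T - T * (1.0 - i / N) ** (1.0 / beta)

def brownian_increments(grid, q, rng):
    """Sample Delta W_i ~ N(0, Delta_i * I_q) for every step of the grid."""
    delta = np.diff(grid)                      # Delta_i = t_{i+1} - t_i
    return np.sqrt(delta)[:, None] * rng.standard_normal((len(delta), q))

grid = time_grid(T=1.0, N=8, beta=0.5)         # beta < 1 refines the grid near T
delta = np.diff(grid)
dW = brownian_increments(grid, q=2, rng=np.random.default_rng(0))
```

With `beta=1.0` the same function returns the uniform grid, so the parameter only controls how strongly the points accumulate near the terminal time.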
Conditional expectations. The conditional expectation E[·|F t ] is denoted by E t [·], and E ti [·] is denoted E i [·]. We make use of a conditional version of Fubini's theorem, stated in Lemma A.1. We slightly abuse notation by writing s))ds for any measurable function g) where F t is the process defined in Lemma A.1, because we believe this notation to be somewhat clearer -in particular, this formal definition indicates more clearly that the inner integral comes from a conditional expectation than strictly mathematically correct version using the process F t (·, s).
Lebesgue measure. For any Euclidean space $E$, $\mathcal{B}(E)$ denotes the Borel measurable sets in $E$, and the Lebesgue measure on the measurable space $(E, \mathcal{B}(E))$ is denoted by $m$.
Processes and spaces. For two processes $X$ and $Y$ in $L_0([0, T] \times \Omega; \mathbb{R}^k)$, $Y$ is said to be a version of $X$ if $X = Y$ $m \times \mathbb{P}$-a.e. The predictable σ-algebra $\mathcal{P} \subset \mathcal{B}([0, T]) \otimes \mathcal{F}_T$ is generated by the continuous, adapted processes, and $\mathcal{H}^2$ is the subspace of $L_2([0, T] \times \Omega)$ containing only predictable processes. For $p \ge 2$, $\mathcal{S}^p$ is the subspace of $\mathcal{H}^2$ of continuous processes $Y$ such that $\|Y\|_{\mathcal{S}^p} := (\mathbb{E}[\sup_{0 \le s \le T} |Y_s|^p])^{1/p}$ is finite; $\|\cdot\|_{\mathcal{S}^p}$ is a norm for this space.
Linear algebra. We identify the space of $k \times n$ dimensional, real valued matrices with $\mathbb{R}^{k \times n}$. $x^\top$ denotes the transpose of the vector $x$. $I_n$ denotes the identity matrix in $\mathbb{R}^{n \times n}$. For any $A \in \mathbb{R}^{k \times n}$, let $A_j$ denote the $j$-th column vector of $A$. For any vector $x \in \mathbb{R}^n$, $|x|$ is the vector 2-norm, defined by $(\sum_{i=1}^n |x_i|^2)^{1/2}$, and for any matrix $A$, $|A|$ is the matrix 2-norm, defined by $\max_{|x|=1} |Ax|$, where $|Ax|$ is the vector 2-norm of the vector $Ax$.
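The two norms can be checked on a small example. The following sketch uses NumPy; the vector `x` and matrix `A` are arbitrary illustrative choices, not objects from the paper.

```python
import numpy as np

x = np.array([3.0, 4.0])
A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])            # a matrix in R^{3x2}

# Vector 2-norm: (sum_i |x_i|^2)^(1/2)
vec_norm = np.sqrt(np.sum(np.abs(x) ** 2))

# Matrix 2-norm: max over |x| = 1 of |Ax|, equal to the largest singular value
mat_norm = np.linalg.norm(A, 2)

# Crude check of the max-characterisation on sampled unit vectors of R^2
angles = np.linspace(0.0, 2.0 * np.pi, 721)
units = np.stack([np.cos(angles), np.sin(angles)])        # shape (2, 721)
sampled_max = np.max(np.linalg.norm(A @ units, axis=0))
```

The sampled maximum agrees with `mat_norm` here because the maximising unit vector $(1, 0)^\top$ is among the samples.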
Functions and regularity. Let $\gamma \in (0, 1]$ and $A(\cdot)$ be a function with domain $[0, T) \times \mathbb{R}^l$ taking values in $\mathbb{R}^{k \times n}$ (resp. $\mathbb{R}^k$). We say that $A(t, \cdot)$ is $\gamma$-Hölder continuous uniformly in $t$ with Hölder constant $L_A$ if, for all $(x, y) \in (\mathbb{R}^l)^2$ and $t \in [0, T)$, $|A(t, x) - A(t, y)| \le L_A |x - y|^\gamma$; in the case that $\gamma = 1$, we say that $A(t, \cdot)$ is Lipschitz continuous uniformly in $t$ with Lipschitz constant $L_A$. Likewise, we say that $A(\cdot, x)$ is $\gamma$-Hölder continuous uniformly in $x$ with Hölder constant $L_A$ if, for every $(t_1, t_2) \in [0, T)^2$ and $x \in \mathbb{R}^l$, $|A(t_1, x) - A(t_2, x)| \le L_A |t_1 - t_2|^\gamma$. For a given multi-index $\alpha = (i_1, \dots, i_{|\alpha|})$ with no zero entries, we define by $\partial^\alpha_x A(t, \cdot)$ the multiple derivative $\partial_{x_{i_1}} \dots \partial_{x_{i_{|\alpha|}}} A(t, \cdot)$. If $A(t, \cdot)$ takes values in $\mathbb{R}^k$ and is differentiable, we define by $\nabla_x A(t, \cdot)$ the
Mollifiers. The following definitions will come in handy.
Definition 1.1. Let $n$ be a non-zero integer. A mollifier is a smooth function φ : An example of a mollifier is $\varphi(x) = e^{-1/(1-|x|^2)} \mathbf{1}_{|x|<1} / \int_{|y|<1} e^{-1/(1-|y|^2)}\,dy$. The following lemma, which is standard, shows how a mollifier can be used to generate a smooth function from a continuous one.
Lemma 1.2. Let $F : \mathbb{R}^n \to \mathbb{R}$ be continuous, and define the function $F_R(x) := \int_{\mathbb{R}^n} F(x - y)\varphi_R(y)\,dy$. Then the function $F_R$ is smooth and $\lim_{R \to \infty} F_R(x) = F(x)$ for all $x \in \mathbb{R}^n$.
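The convergence in Lemma 1.2 can be observed numerically in one dimension. The sketch below makes two assumptions that are not taken from the text: the scaling $\varphi_R(y) = R\,\varphi(Ry)$, and the standard bump $e^{-1/(1-x^2)}$ on $|x| < 1$ as the underlying mollifier; the integral is approximated by a normalised Riemann sum, and the function names are hypothetical.

```python
import numpy as np

def bump(x):
    """Unnormalised bump exp(-1/(1 - x^2)) on |x| < 1, zero elsewhere."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def mollify(F, x, R, m=4001):
    """Approximate F_R(x) = int F(x - y) phi_R(y) dy, phi_R(y) = R * phi(R y)."""
    y = np.linspace(-1.0 / R, 1.0 / R, m)      # support of phi_R
    w = bump(R * y)
    w /= w.sum()                               # normalise the weights to sum to one
    return float(np.sum(F(x - y) * w))

F = np.abs                                     # continuous, not differentiable at 0
errors = [abs(mollify(F, 0.0, R) - F(0.0)) for R in (1.0, 10.0, 100.0)]
```

For this choice of $F$ the error at $x = 0$ scales like $1/R$, so the entries of `errors` decrease as $R$ grows.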

Assumptions
The following assumptions will hold throughout this paper.
(A b,σ ) X is a solution to the stochastic differential equation (SDE) is R d -valued, measurable and uniformly bounded. Moreover, b(t, ·) is twice continuously differentiable with uniformly bounded derivatives and Hölder continuous second derivative, and b(·, x) is 1/2-Hölder continuous uniformly in x.
(A Φ ) The terminal condition Φ : R d → R is a measurable function and there exists a constant α ∈ (0, 1] such that K α (Φ) < ∞, where We say that Φ is fractionally smooth, and that it belongs to the space L 2,α . We refer to [GM10] for further discussion of and references for the space L 2,α .
The following condition will be required for both the Euler scheme and the Malliavin weights scheme convergence results; this is a standard assumption for BSDE approximation schemes in order to obtain a convergence rate bounded from above by $O(N^{-1})$.
$(A_{f_t})$ The driver $f(t, x, y, z)$ is $\frac{1}{2}$-Hölder continuous in $t$ uniformly in $(x, y, z)$ with Hölder constant $L_f$.
Our convergence results for the Malliavin weights scheme require stronger conditions than those of the Euler scheme; one of the following assumptions will be necessary to obtain the main result, Theorem 5.5, of Section 5.
$(A_{\exp\Phi})$ The terminal condition has exponential bounds in the sense that there is a finite
$(A_{h\Phi})$ The function $\Phi$ is Hölder continuous: there exist a finite constant $K_\Phi$ and an exponent $\theta_\Phi \in (0, 1]$ such that |Φ(
The following assumptions will be needed for partial results only. They will hold only when specifically stated.
$(A_{\partial f})$ The driver $(t, x, y, z) \mapsto f(t, x, y, z)$ is continuously differentiable with respect to $(x, y, z)$ for all $t \in [0, T)$. The partial derivatives in $(y, z)$ are bounded by $L_f (T - t)^{(\theta_L - 1)/2}$ and the partial derivatives in $x$ are bounded above by L X (T − t) 1−θ X /2 .
$(A_{b\Phi})$ The function $\Phi$ is uniformly bounded: $\|\Phi\|_\infty < \infty$.
In the proofs below, it will be necessary to compute a right-inverse to the matrix $\sigma(\cdot)$, i.e., for every $(t, x) \in [0, T) \times \mathbb{R}^d$, it will be necessary to find a $(q, d)$-dimensional matrix $\sigma^{-1}(t, x)$ such that $\sigma(t, x)\sigma^{-1}(t, x) = I_d$. In the case where the dimensions $d$ and $q$ are equal, this is uniquely defined by the usual matrix inverse of $\sigma(t, x)$, whose existence is guaranteed by the uniform ellipticity condition $(A_{u.e.})$. If the dimensions $d$ and $q$ are not equal, $\sigma^{-1}(t, x)$ is defined by the pseudoinverse $\sigma(t, x)^\top (\sigma(t, x)\sigma(t, x)^\top)^{-1}$; this is well defined because the uniform ellipticity condition $(A_{u.e.})$ guarantees the existence of the inverse of $\sigma\sigma^\top$.
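The right-inverse construction can be checked on a small example. The sketch below uses a fixed, illustrative matrix `sigma` with $d = 2$ and $q = 3$ (an assumption chosen so that $\sigma\sigma^\top$ is invertible) and compares the formula with NumPy's Moore-Penrose pseudoinverse.

```python
import numpy as np

# Illustrative d x q matrix with d = 2, q = 3 and full row rank
sigma = np.array([[1.0, 0.0, 1.0],
                  [0.0, 2.0, 1.0]])
d, q = sigma.shape

# Right-inverse sigma^T (sigma sigma^T)^{-1}, a q x d matrix
sigma_right_inv = sigma.T @ np.linalg.inv(sigma @ sigma.T)

# sigma @ sigma_right_inv should be the identity I_d
product = sigma @ sigma_right_inv
```

For a matrix of full row rank this formula coincides with `np.linalg.pinv(sigma)`; note that the product in the reverse order, `sigma_right_inv @ sigma`, is in general not the identity on $\mathbb{R}^q$.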
2 Key preliminary results

Malliavin calculus
We recall briefly some properties and definitions of Malliavin calculus. For details, we refer the reader to [Nua06].
For any $m \ge 1$, define $C^\infty_p(\mathbb{R}^m)$ to be the space of functions taking values in $\mathbb{R}$ which are infinitely differentiable such that all partial derivatives have at most polynomial growth, and denote by $W(h) := \int_0^T h_t\,dW_t$ the Itô integral of the $(\mathbb{R}^q)^\top$-valued, deterministic function $h \in L_2([0, T); (\mathbb{R}^q)^\top)$. Let $\mathcal{R} \subset L_2(\mathcal{F}_T)$ be the subspace containing all random variables $F$ of the form $f(W(h_1), \dots, W(h_m))$ for $h_i \in L_2([0, T); (\mathbb{R}^q)^\top)$ and any finite $m$. Define the derivative operator D : Denote by $\mathbb{D}^{1,2}(\mathbb{R}^k)$ (resp. $\mathbb{D}^{1,2}((\mathbb{R}^k)^\top)$) the space of random variables $F = (F_1, \dots, F_k)^\top$ (resp. $F = (F_1, \dots, F_k)$) such that $F_i \in \mathbb{D}^{1,2}$ for each $i \in \{1, \dots, k\}$. The Malliavin derivative $DF$ denotes the $\mathbb{R}^{k \times q}$- (resp. $\mathbb{R}^{q \times k}$-) valued process whose $i$-th row (resp. column) is $DF_i$ (resp. $(DF_i)^\top$).
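For smooth random variables, the derivative operator is usually defined as in [Nua06]. The display below recalls that standard definition in the present notation as a sketch; the symbol $\partial_i f$, denoting the partial derivative of $f$ in its $i$-th argument, is introduced here for illustration only.

```latex
D_t F = \sum_{i=1}^{m} \partial_i f\bigl(W(h_1), \dots, W(h_m)\bigr)\, h_i(t),
\qquad F = f\bigl(W(h_1), \dots, W(h_m)\bigr), \quad t \in [0, T).
```

For example, taking $m = 1$ and $f(x) = x$ gives $D_t W(h) = h(t)$.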
The following lemma, termed the chain rule of Malliavin calculus, is proved in [Nua06, Proposition 1.2.3].
Lemma 2.1 (Chain rule). Let (F 1 , . . . , F m ) ∈ (D 1,2 ) m . For any continuously differentiable function f : R m → R with bounded partial derivatives, and Remark. In the case that  Lemma 2.2 (Integration-by-parts). Suppose that u ∈ dom(δ) and F ∈ D 1,2 are such that E[F 2 T 0 |u s | 2 ds] < ∞. Then, the integration by parts formula holds: The integration by parts formula, Lemma 2.2, is applied column-wise in the case of matrix valued u. where D s F u s is understood as a matrix-matrix multiplication, and the Skorohod integrals are defined in the multidimensional sense of equation (2.1).

SDEs and Malliavin calculus
Fix t ∈ [0, T ) and x ∈ R d . We recall some standard properties on the Malliavin calculus applied to SDEs X (t,x) of the form Observe that the SDE X defined in (1.6) is equal to X (0,x0) . First, we recall the flow ∇X (t,x) and its inverse ∇X (t,x,−1) , which are respectively defined as the solutions to the SDEs where σ j is the j-th column of σ. These processes are linear SDEs, and we list some standard properties used throughout this paper in the following Lemma. Moreover, T ] almost surely, and, for any r < u < s, The Malliavin derivative of the marginals of X (t,x) is strongly related to the flow and its inverse, as shown in the following Lemma. The proof of the estimates follows directly from Lemma 2.4.

Existence, uniqueness, approximation and decomposition of the BSDE
Since the class of BSDEs under local conditions has, to the best of our knowledge, not been studied in full generality, we now include a proof of the existence and uniqueness of solutions. We remark that the existence and uniqueness also follows from [FJ12, Theorem 3.2]. The proof below is different and simpler, since a simpler class of BSDEs is considered, so we include it for the interest of the reader.
Theorem 2.6. There exists a unique pair of processes $(Y, Z)$ in $\mathcal{S}^2 \times \mathcal{H}^2$ solving the BSDE (1.1) with terminal condition $\Phi(X_T) \in L_2(\mathcal{F}_T)$ and driver $f$ satisfying the local Lipschitz continuity and boundedness conditions of (1.2).
As in the proof of [EKPQ97, and δψ = ψ 1 − ψ 2 . It then follows from Hölder's inequality that where η r = η(r ∧ t 0 ). This is sufficient to prove that Ξ is a contraction.
We now introduce an approximation procedure that will be used repeatedly in this paper: we introduce intermediate BSDEs by "cutting" the tail of the driver close to the time horizon $T$, prove our results for these BSDEs, and then extend the results to the BSDE of interest by a limiting procedure. This technique was used extensively in [GM10], and we shall frequently take advantage of it throughout this work.
We first treat the linear BSDE (y, z). The following Lemma relates the linear BSDE (y, z) to the PDE in (2.5) and gives some boundedness properties for the function u and its derivatives; these bounds will be used throughout this paper.
Lemma 2.8. Let $(A_{b\Phi})$ be in force and consider the PDE (2.5) is a classical solution of the PDE (2.5) (the so-called Feynman-Kac representation). The derivatives $\partial^\alpha_x u$ ($|\alpha| \le 3$), $\partial_t u$, $\partial_t \nabla_x u$ exist and are continuous. There is a constant $C$ depending only on the bound on $b$ and its derivatives, the bound on $\sigma$ and its derivatives, and β such that
Proof. The Feynman-Kac representation of the solution is well known; see [GM+05] among others. To obtain the gradient bounds, recall that $X$ is a Markov process and denote its transition density by $p(t, x; s, \xi)$. For some finite $C_1$ and β, the following gradient bounds hold on $p(t, x; s, \xi)$: We obtained these bounds from [GL10, Appendix A], who provide references for proofs.
The bounds on the derivatives of u(t, ·) then follow from Lebesgue's differentiation theorem (differentiation with respect to t and x) applied to for multiindices α 0 and α; we apply the gradient bounds on the transition density above and the boundedness of Φ to obtain the result on |∂ α0 t ∂ α x u(t, x)|. To show the bound on |∇ x u(r,X r )|, let us recall first that the result in the case α = 1 and β = 0 is given in [GM10, Lemma 1.1]. The authors use the tools of [GM + 05, Lemma 2.9] to show that, for every r ∈ [0, T ) and This result follows largely from the integration-by-parts formula of Malliavin calculus -Lemma 2.2 -and martingale arguments; see the proof of [GM + 05, Lemma 2.9] for details. H r,x satisfies The result for (α, β) = (1, 0) then follows by the Cauchy-Schwarz inequality. (Note that we in fact don't need (A bΦ ) to obtain this result.) One can follow the proof method of [GM + 05, Lemma 2.9], using additionally the linearity of the Malliavin derivative, to show that whereH r := αH r,x1 + βH r,x2 , whence the result follows. The proof for the bound on |∇ 2 x u(r,X r )| is similar.
We move onto the non-linear BSDE (y (ε) , z (ε) ). The following representations and a priori estimates will be critical throughout this paper.
where the gradients ∇ ξ f (Θ r ) is given by ∇ ξ f (r, x, y, z)| (r,x,y,z)=Θr for ∇ ξ f (r, x, y, z) defined as in Section 1.1, and U (r, x) is defined by Then there a finite constant C depending only on T , d, K α (Φ), the bounds on b and σ and their derivatives, L f , and θ L such that r . There is a (possibly different) constant C such that, for any 0 ≤ t < T and ε > 0, (2.10) Let us consider (∇y (ε) , ∇z (ε) ) solving the BSDE (2.11) The processes z (ε) and ∇z (ε) satisfy the representations Proof. In what follows, C may change from line to line.
This is the second inequality in (2.10). Additionally, for all $t \in [0, T)$, $|b_{j,t}| \le C(T - t)^{(\theta_L - 1)/2}$ almost surely. The first inequality in (2.10) follows. Let $(\varphi, \psi)$ be an $(\mathbb{R}^d)^\top \times \mathbb{R}^{d \times q}$-valued process in $\mathcal{H}^2$, and define the random function The function $g$ is progressively measurable and satisfies assumptions (H1)-(H5) of [BDH+03, Section 4]. Since f takes no argument in (y, z), it is only necessary to validate (H1): using the triangle inequality, Jensen's inequality, the Cauchy-Schwarz inequality, and assumptions $(A_{\partial f})$ and $(A_{b,\sigma})$, it follows that Thanks to [BDH+03, Theorem 4.2], there exists a unique solution $(u, v)$ to the BSDE The remainder of the proof of existence and uniqueness follows exactly as the proof of Theorem 2.6. To prove the first inequality in (2.10), observe that the driver $g(r)$ satisfies (A.1) from Proposition A.2 with $f_r = |a^{(\varepsilon)}_r|$ and $\lambda_r = \mu_r = C(T - r)^{(\theta - 1)/2}$. The proofs of (2.12) and (2.13) are given in [GM10, Theorem 2.1]. The inclusion of the local Lipschitz continuity assumptions (1.2) makes no difference, because the driver

A priori estimates
For 0 ≤ s < r ≤ T , we define the Malliavin weights by where D s X t is the Malliavin derivative of X t at s defined in Section 2.2. It was shown in Lemma The following constant appears throughout this paper The following result is used in the proof of [GM10, Lemma 1.1]; we include it here for completeness.
Observe, using Lemma 2.5 and the fact that One then applies the conditional Fubini's lemma, Lemma A.1, and the uniform bound on E s [|D s X t | 2 ] from Lemma 2.5 to complete the proof. The bound on H s r p is proved using the Burkholder-Davis-Gundy inequality on the continuous local martingale (t − s)H s t .
The Malliavin weight is a critical element of this work. We use it to obtain a priori estimates in this section, to obtain the representation theorem in Section 2.5, and for the Malliavin weights scheme of Section 5. The following elementary corollary indicates an important technique in which we make use of the Cauchy-Schwarz inequality in conditional form in order to obtain upper bounds: We leave the implementation of the conditional Fubini theorem, Lemma A.1, in its full form in the above lemma, without using the notation given in Section 1.1. We do this to be absolutely clear about how the conditional Fubini theorem is used in this paper, before returning to the -in our opinion -much more clear, if slightly abusive, notation Proof. The first inequality follows from application of the conditional Cauchy-Schwarz inequal- as required.
We now state and prove a priori results on the solutions of BSDEs with drivers satisfying (1.2). These estimates are in the spirit of [EKPQ97, Proposition 2.1] with two extensions: firstly, we allow the drivers of the BSDEs to satisfy a local Lipschitz continuity condition like $(A_f)$; secondly, we prove point-wise (in time) a priori estimates on the $Z$ processes assuming the existence of a representation formula. The latter estimates will be extremely useful, as we shall prove this representation formula for our BSDEs in Section 2.5 and use the proposition below extensively in subsequent sections.
and f i (ω, t, 0, 0) ∈ H 2 for i ∈ {1, 2}. Let (Y i , Z i ) be a solution to the FBSDE with terminal condition Φ i and driver f i (t, y, z) (i = 1, 2 respectively). Define Then there is a finite constant C ≥ 0 depending only on T , L f2 and θ 2,L such that, for all s < t < T , 1, 2). Then there is a (possibly different) finite constant C ≥ 0 depending only on T , C M , L f2 , and θ 2,L such that, for all t ∈ [0, T ) almost surely.
Proof. In what follows, C may change from line to line. We start by proving the result for s = 0; the general case is proved analogously, the only difference is that one must use the conditional version of the Minkowski, Cauchy-Schwarz (Corollary 2.11), and Hölder inequalities in the place of the usual version of these with the regular expectation. Using the definition of the BSDE (1.1), Using (1.2) and Hölder's inequality,  and the proof of (2.16) is complete by substituting the bounds on ∆Y t0 2 2 from above.
The estimates (2.17) allow us to determine a priori estimates on the conditional second moments of the solution of the BSDE (Y, Z).
for all t ∈ [0, T ) almost surely. Then there is a constant C depending only on L f , θ L , C f , θ c , K α (Φ) and T such that, for all t ∈ [0, T ) and s ∈ [0, t], we have Proof. In what follows, C may change from line to line. As in Proposition 2.12, we only prove the result for s = 0; the general case is proved using the conditional version of the Minkowski, Cauchy-Schwarz (Corollary 2.11), and Hölder inequalities in the place of the usual version of these with the regular expectation. Recalling V t,T (Φ) from (A Φ ), apply (2.17) from Proposition 2.12 with (Y 1 , Z 1 ) := (0, 0) and (Y 2 , Z 2 ) := (Y, Z) to obtain (for all t ∈ [0, T )) Combining the local Lipschitz continuity and boundedness of f in (1.2) leads to the required bound on the conditional second moments of Z t . The estimate on the conditional moments of Y t is obtained similarly starting from (2.16). The remaining bounds are obtained by taking into account (1.2) and the regularity of the terminal condition ((A Φ ) or (A hΦ )).
To end this section, we present a mollification procedure that will be used frequently to allow us to extend results under the assumptions (A ∂f ) and (A bΦ ) to the same results without these assumptions. The following corollary is a trivial consequence of Proposition 2.12 and the properties of mollifiers.

Representation theorem
In this section, we prove that BSDEs satisfying the local Lipschitz continuity and local boundedness conditions (A f ) also satisfy a representation theorem in the spirit of [MZ02, Theorem 3.1]. Following on from Section 2.4, we see that this representation is very valuable, as it gives us additional access to a priori results. We use these a priori results in the sections that follow, so it is essential that we also establish the representation result. Unlike in the proof of [MZ02, Theorem 3.1], we do not prove the representation result on Z directly. The strategy is rather to take the approximative BSDE (Y (ε) , Z (ε) ), for which we already know that Z (ε) satisfies the representation from [MZ02, Theorem 3.1], then to prove that it converges in H 2 to the process that we claim is a version of Z as ε converges to 0 by classical (ε, δ)-arguments, and finally to conclude using the fact that Z (ε) also converges to Z in H 2 and that Z is unique.
Theorem 2.16. Recall L 2,α from (A Φ ), suppose that Φ ∈ L 2,α and (t, x, y, z) → f (t, x, y, z) satisfies (A f ). Then, there is a predictable version Z of Z which satisfies where H t s are the Malliavin weights given in (2.14). Proof. In the following, C is a constant whose value may change from line to line.
To start with, let (A ∂f ) and (A bΦ ) be in force. We prove the representation theorem first under these conditions, and then extend to the general result by means of mollification. Recall the BSDEs (Y (ε) , Z (ε) ), (y, z) and (y (ε) , z (ε) ) from Section 2.3, and the decomposition (Y (ε) , Z (ε) ) = (y + y (ε) , z + z (ε) ). We first prove that there is a predictable version of Z (ε) equalling In fact, this is an application of [MZ02, Theorem 4.2]; this is not immediately clear, so we make the calculations explicit for the benefit of the reader. Definition 2.7 and Lemma 2.8 give us that (y (ε) , z (ε) ) solves the BSDE with terminal condition 0 and driver on the time interval [0, T − ε]. Due to the bounds on u and its derivatives given in Lemma 2.8, the Lipschitz constant of (x, y, z) → F (t, x, y, z) is bounded from above (for all t ∈ [0, T − ε]) by

Using this Lipschitz constant, we also show that
Therefore, the driver F is uniformly Lipschitz continuous in (x, y, z) and uniformly bounded at (y, z) = (0, 0), i.e. it satisfies (A f ) with θ L,F ≡ 1, θ C,F ≡ 1, and constants L F and C F (given above). On the other hand, z (ε) t and F (t, x, y, z) are 0 for all t ∈ (T − ε, T ] almost surely, so the representation holds trivially in the interval (T − ε, T ], whence it follows that there is a version of z (ε) equalling t,T σ(t, X t ) in their notation -that there is a predictable version of (z t ) t∈[0,T ) equalling for all t ∈ [0, T ) almost surely, and this implies the version of Z (ε) given by (2.27) thanks to the decomposition (Y (ε) , Z (ε) ) = (y + y (ε) , z + z (ε) ). Denote by Z the predictable projection [JS03, Theorem 2.28] of the process (X t : In what follows, we show that Z (ε) t − Z t 2 → 0 as ε → 0 for almost all t ∈ [0, T ). This implies, by the dominated convergence theorem, that Z (ε) → Z in H 2 . Since Z (ε) → Z in H 2 was determined in Corollary 2.14, this implies that Z t = Z t m × P − a.e., which completes the proof under the assumptions (A ∂f ) and (A bΦ ).
We first need some intermediate upper bounds. Analogously to Corollary 2.13, we have that (2.28) Fix t ∈ [0, T ) and η > 0. Using the representation formula (2.27), it follows from Minkowski's inequality, the conditional Cauchy-Schwarz inequality (Corollary 2.11), and Lemma 2.10 that (2.29) Taking ε < (T − t)/2 and using (2.28), it follows that Taking ε < η 1/γ (T − t) 1/(2γ) /C, where C is the last constant in the inequality above, is sufficient to bound the above term by η. On the other hand, letting δ < (T − t)/2, To bound the first integral term on the right hand side above, we apply Hölder's inequality and the Lipschitz continuity of f (t, ·) to obtain Using that (Y (ε) , Z (ε) ) → (Y, Z) in S × H 2 as ε → 0 (Corollary 2.14), set ε sufficiently small so that the above is bounded above by √ δη. To bound the second integral term on the right hand side of (2.30), we use (2.22) and (2.28) combined with the triangle inequality to show that and set δ sufficiently small so that the above is bounded above by η. Therefore, we have shown that for almost every t ∈ [0, T ) and every η > 0, there is a sufficiently small ε such that Z for all t ∈ [0, T ) almost surely. Thanks to the point-wise convergence of f M to f and Φ M to Φ, and the convergence of (Y M , Z M ) to (Y, Z) in S 2 × H 2 from Corollary 2.15, we can use analogous limit arguments as above to complete the proof.

Convergence rate of the Euler scheme for BSDEs
Throughout this section, the assumption (A ft ) is in force. Let us now recall the Euler scheme for BSDEs: We determine estimates on the error of the Euler scheme, which is given by The following proposition serves as the starting point of our analysis; it allows us to estimate the error E(N ) using estimates for the so-called L 2 -regularity, which we bound subsequently. : N ≥ 1}, there is a constant C depending only on L f , L X , θ L , θ X , β, and T , but not on N , such that, for all N ≥ 1, The proof is analogous to the proof of [GL06, Theorem 1]; one need only use the result $\Delta_k/(T - t_k)^{1-\theta_L} \le T^{\theta_L}(\beta N)^{-1}$ for β ≤ θ L (see Lemma B.1) in order to compensate for the local Lipschitz constant of the driver.
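The step-size inequality quoted from Lemma B.1 can be checked numerically. The sketch below is only an illustration: the grid formula t_k = T − T(1 − k/N)^{1/β} is an assumption, since the grid is not defined in this section, and the function names are hypothetical.

```python
def beta_grid(T, N, beta):
    """Grid t_k = T - T * (1 - k/N)**(1/beta), k = 0..N (assumed form of the time-grid)."""
    return [T - T * (1.0 - k / N) ** (1.0 / beta) for k in range(N + 1)]


def check_step_bound(T, N, beta, theta_L, tol=1e-12):
    """True if Delta_k / (T - t_k)**(1 - theta_L) <= T**theta_L / (beta * N) for every k < N."""
    t = beta_grid(T, N, beta)
    bound = T ** theta_L / (beta * N)
    for k in range(N):
        delta_k = t[k + 1] - t[k]                      # step size Delta_k
        ratio = delta_k / (T - t[k]) ** (1.0 - theta_L)
        if ratio > bound + tol:                         # small tolerance for rounding
            return False
    return True


if __name__ == "__main__":
    print(check_step_bound(1.0, 100, 0.5, 0.75))  # beta <= theta_L
    print(check_step_bound(1.0, 100, 1.0, 0.5))   # uniform grid, beta > theta_L
```

With these parameters the check passes for β ≤ θ_L and fails on the uniform grid (β = 1) with θ_L = 1/2, which suggests why the grid must concentrate points near T when the local Lipschitz constant of the driver blows up at the terminal time.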
The sum dt is called the L 2 -regularity; its study was initiated by [Zha04]. Since (Z ti := 1 ∆i E i ti+1 ti Z t dt ) i is the projection of Z onto the space of adapted discrete processes with nodes on π under the scalar product $(u, v) = \mathbb{E}\int_0^T (u_s \cdot v_s)\,ds$, it follows that To bound E(N ), it follows from Proposition 3.1 that it is sufficient to bound the term on the right-hand side of (3.1). However, as in the proof of the Representation Theorem in Section 2.5, it is not possible to do so directly for the BSDE (Y, Z), so we use an approximation procedure via the BSDE (Y (ε) , Z (ε) ), which we recall from Definition 2.7 in Section 2.3. Throughout the remainder of this section, we work with the version of Z and Z (ε) given by Theorem 2.16, i.e. This version provides us with the additional a priori estimates developed in Section 2.4; we use these estimates frequently in the analysis of this section.
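The projection property invoked above can be made explicit by a short computation. The following is a sketch only: the symbol $\bar Z_{t_i}$ for the conditional cell average and the generic $\mathcal{F}_{t_i}$-measurable, square-integrable random variable $\zeta$ are notation introduced here for illustration.

```latex
% Sketch: the conditional cell average minimises the L^2 distance on [t_i, t_{i+1}].
% \bar Z_{t_i} := \Delta_i^{-1} \mathbb{E}_i \int_{t_i}^{t_{i+1}} Z_t \, dt  (illustrative notation)
\begin{align*}
\mathbb{E}\int_{t_i}^{t_{i+1}} |Z_t - \zeta|^2 \, dt
 &= \mathbb{E}\int_{t_i}^{t_{i+1}} |Z_t - \bar Z_{t_i}|^2 \, dt
    + \Delta_i \, \mathbb{E}|\bar Z_{t_i} - \zeta|^2
    + 2\,\mathbb{E}\Big[ (\bar Z_{t_i} - \zeta) \cdot
        \mathbb{E}_i \int_{t_i}^{t_{i+1}} (Z_t - \bar Z_{t_i}) \, dt \Big] \\
 &= \mathbb{E}\int_{t_i}^{t_{i+1}} |Z_t - \bar Z_{t_i}|^2 \, dt
    + \Delta_i \, \mathbb{E}|\bar Z_{t_i} - \zeta|^2
 \;\ge\; \mathbb{E}\int_{t_i}^{t_{i+1}} |Z_t - \bar Z_{t_i}|^2 \, dt .
\end{align*}
% The cross term vanishes because
% \mathbb{E}_i \int_{t_i}^{t_{i+1}} Z_t \, dt = \Delta_i \bar Z_{t_i} by definition.
```

In particular, replacing the cell average by any other square-integrable value known at time $t_i$ can only increase the error, which is why upper bounds on the L 2 -regularity may be obtained from any convenient adapted piecewise-constant approximation.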
The following lemma decomposes the L 2 -regularity of Z -the left hand side of equation (3.1) -into the L 2 -regularity of Z (ε) and a small correction term controlled by ε.
Lemma 3.2. Let β ∈ (0, 1]. Then there is a constant C depending only on L f , C M , θ L , θ c , β, C f , K α (Φ), and T , such that for all N ≥ 1 In what follows, C may change in value from line to line. Using the Cauchy inequality and the orthogonality of the projections, 1 ti 2 2 ds. Recall from Corollary 2.14 that Moreover, using Jensen's inequality, and this completes the proof.
We now come to our first and most general estimate on E(N ). Later, in Theorem 4.5, we augment this result with stronger assumptions.

A priori estimates under (A bΦ ) and (A hΦ )
At the end of this section, we give a complementary result to Theorem 3.3 under the stronger conditions (A bΦ ) and (A hΦ ) on the terminal condition, i.e. where the function Φ is bounded (and/)or Hölder continuous, respectively. This is achieved using additional a priori estimates on V (ε) t 2 , given in Proposition 4.2 below. Moreover, these a priori estimates will be critical in Section 5, where one requires more structure than in Section 3. The result is proved, roughly speaking, by using a functional representation of the intermediate process z (ε) and showing Lipschitz continuity of this functional representation. This adds an additional layer of interest under (A hΦ ) for the parameters θ Φ + θ L ≥ 1, where we can demonstrate that the limit of the process z (ε) s in H 2 , i.e. the process Z s − ∇ x u(s, X s )σ(s, X s ), has a functional representation and that this function is Lipschitz continuous; see Corollary 4.3. Regularity results are important for numerical schemes as they allow one to build algorithms with lower numerical complexity (see for example [GT13a, Section 3.5]), and this regularity result has such implications for the proxy scheme described in the introduction of this paper.
First, we state the result that x → σ −1 (t, x) is uniformly Lipschitz continuous, and t → σ −1 (t, x) is uniformly 1/2-Hölder continuous. This elementary result will also be useful in Section 5 below. The proof is to be found in Appendix D.
We now state the main result of this section, the a priori estimates on the process V (ε) .
is in force, there exists a version of V (ε) and a finite constant C depending only on L f , the bounds on b and σ and their partial derivatives, β, C M , θ L , θ c , C f , and T such that for any ε ∈ (0, T ] Remark. The integrals in (4.1) and (4.2) exist and are bounded by Proof. In what follows, C may change from line to line.
Step 3. Proving that ∇X . We make use of Malliavin calculus -see Section 2.1. By taking the Malliavin derivative on both the BSDE solution (y (ε) , z (ε) ) and on the functional representation z (ε) (s, X ) s≤τ ≤T , the Malliavin derivatives of the processes (y (ε) , z (ε) ), solving the BSDE (4.6) We multiply (4.6) on the right by σ −1 (s, X (t,x) s )∇X (t,x) s and apply Lemma 2.5 to obtain comparing the BSDE (4.7) to (4.5) term by term, it is clear that a version of the solution to (4.7), is a version of (∇y Functional arguments. We start by assuming that z (ε) (t, ·) is smooth (or by taking a mollification). The chain rule of Malliavin calculus -Lemma 2.1 -yields D s z (ε) (τ, X , and, applying Lemma 2.5, σ −1 (s, X ). The result follows for z (ε) (τ, ·) only Lipschitz continuous by standard limiting arguments. Since (z (ε) (τ, X (t,x) τ )) 0≤τ ≤T is a version of (z (ε,t,x) τ ) 0≤τ ≤T , it follows that (D s z (ε) (τ, X (t,x) τ )) s≤τ ≤T is a version of (D s z (ε,t,x) τ ) s≤τ ≤T , and therefore that We now combine the BSDE arguments and the functional arguments from above. Thanks to the intermediate version ( Step 4. Proving z (ε) (t, ·) is Lipschitz continuous. Fix s ∈ [t, T ). Using the representation (4.4) of z (ε,t,x) , it follows that We start with an estimate for A 2 . Using the Cauchy-Schwarz inequality, it follows that Using the same techniques as in the proof of Lemma 2.10, one shows that where C 4 is the constant coming from the BDG inequality. Thanks to [RY99, Theorem IX.2.4], we have that Corollary 4.3. Let (A hΦ ) and (A ∂f ) be in force, and let θ L +θ Φ ≥ 1. Then there exists a function z : [0, T ) × R d → (R q ) such that ∇ x u(s, X s )σ(s, X s ) + z(s, X s ) is a version of Z s . 
Moreover, recalling the function φ(t, ε, θ L , θ Φ ) from (4.2), for all t ∈ [0, T ), x → z(t, x) is Lipschitz continuous with Lipschitz constant equal to for some finite constant C depending only on K Φ , L f , the bounds on b and σ and their partial derivatives, β, C M , θ L , θ c , C f , and T .
Proof. Let (Y (t,x) , Z (t,x) ) be the solution of and set x) by mimicking the proof of Theorem 2.16. Since Z is the limit of Z (ε) as ε → 0 in H 2 , and z (ε) (s, X s ) is a version of Z (ε) s −∇ x u(s, X s )σ(s, X s ), it follows that z(s, X s ) is a version of Z s −∇ x u(s, X s )σ(s, X s ), as required. Finally, to prove the Lipschitz continuity of z(t, ·), we observe that, for θ L + θ Φ ≥ 1, thanks to Lemma C.2, and proceed as in Step 4 of the proof of Proposition 4.2 (with z(t, ·) in the place of z (ε) (t, ·)); the upper bound on the limit lim ε→0 φ(t, ε, θ L , θ Φ ) comes from Lemma C.2.
In order to make use of Proposition 4.2, it is necessary to approximate Z by an intermediate process Z M which satisfies the hypotheses of Proposition 4.2.
It follows from Markov's exponential inequality and (A expΦ ) that (4.15) The last inequality is obtained by substituting the value of M . On the other hand, the basic properties of the mollifier in Definition 1.1 yield . Substituting (4.15) and (4.16) into (4.14) Lemma C.2 then yields The sum on the right hand side above is bounded by $1 + \int_0^{t_{N-1}} (T - t)^{-1}\,dt = 1 + C\ln(N)$, whence the proof is complete.
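The logarithmic bound on the sum can be illustrated numerically. The sketch below rests on two assumptions that are not stated in this section: that the sum in question has the form Σ_k Δ_k/(T − t_k), and that the grid is t_k = T − T(1 − k/N)^{1/β}. Under these assumptions the last term of the sum equals 1, the remaining terms form a left-endpoint Riemann sum of the increasing function t → (T − t)^{-1}, and the integral evaluates to ln(N)/β. The function names are hypothetical.

```python
import math


def beta_grid(T, N, beta):
    """Grid t_k = T - T * (1 - k/N)**(1/beta), k = 0..N (assumed form of the time-grid)."""
    return [T - T * (1.0 - k / N) ** (1.0 / beta) for k in range(N + 1)]


def weighted_step_sum(T, N, beta):
    """Sum over k < N of Delta_k / (T - t_k) (assumed form of the sum being bounded)."""
    t = beta_grid(T, N, beta)
    return sum((t[k + 1] - t[k]) / (T - t[k]) for k in range(N))


def log_bound(N, beta):
    """Value of 1 + int_0^{t_{N-1}} (T - t)^{-1} dt = 1 + ln(N) / beta on the assumed grid."""
    return 1.0 + math.log(N) / beta


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, weighted_step_sum(1.0, n, 0.5), log_bound(n, 0.5))
```

On this grid the constant C in 1 + C ln(N) would be 1/β, so the bound degrades as the grid becomes more concentrated near T.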
We now provide an extension to Theorem 3.3 under (A hΦ ) with the aid of Proposition 4.2.

Convergence rate of the Malliavin weights scheme
In this section, we treat the Malliavin weights scheme. Recall the Malliavin derivative of the marginals of the process X in Section 2.2. In the definition of the Malliavin weights scheme (1.5), we use the following discrete-time approximation of the Malliavin weights (2.14): ; the latter property is proved exactly like Lemma 2.10. If the marginals of X and D ti X are not known explicitly, one can use an SDE scheme to provide approximations, but this is beyond the scope of this work; some work has been done on this in the zero driver case (f ≡ 0), in particular we refer the reader to Section 3 (and the sequel) of [GM+05]. In what follows, we use the version of Z given by Theorem 2.16, in other words We start with some preliminary results.
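To give a feel for how a Malliavin weight produces Z directly, consider the simplest special case, which is chosen here purely for illustration and is not the general setting of the paper: X = W in dimension one and zero driver f ≡ 0. Gaussian integration by parts then gives Z_t = E_t[Φ(W_T)(W_T − W_t)/(T − t)]. The sketch below estimates this conditional expectation by plain Monte Carlo given W_t = w; the function name and parameters are hypothetical.

```python
import math
import random


def z_via_weight(phi, w, t, T, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[phi(w + G) * G / (T - t)], with G ~ N(0, T - t).

    Illustrative special case only: X = W (one-dimensional), driver f = 0,
    so the weight reduces to the Brownian increment divided by T - t.
    """
    rng = random.Random(seed)
    tau = T - t
    sd = math.sqrt(tau)
    total = 0.0
    for _ in range(n_samples):
        g = rng.gauss(0.0, sd)            # Brownian increment W_T - W_t
        total += phi(w + g) * g / tau     # terminal value times the weight
    return total / n_samples


if __name__ == "__main__":
    # For phi(x) = x**2 the exact value is Z_t = 2 * w.
    print(z_via_weight(lambda x: x * x, 0.5, 0.0, 1.0))
```

For Φ(x) = x² the exact answer is 2w, because Y_t = W_t² + (T − t) in this case. No derivative of Φ is needed, which is the feature that makes weights attractive for terminal conditions of low regularity; the price is a variance that grows as t approaches T.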
Lemma 5.1. There is a constant C depending only on the bound on b and its derivatives, the bound on σ and its derivatives, β, L f , θ L , C f , θ c , K α (Φ) and T such that, for any 0 ≤ i < j ≤ N , Using the decomposition it follows from the boundedness and Lipschitz continuity of σ and σ −1 (Lemma 4.1) that for any j > i and t ∈ [t j , t j+1 ], It now follows from Lemma 2.5, the usual bound $\mathbb{E}_i[|X_t - X_{t_j}|^2] \le C(t - t_j)$, and Lemma B.1 that The upper bound follows from the conditional Cauchy-Schwarz inequality (Corollary 2.11). Therefore, (5.2) and ] 2 follows from the Cauchy-Schwarz inequality (Corollary 2.11), i.e.
from here, one applies the estimate (5.2) and the fact that, similarly to (2.22 Moreover, Proof. First let (A ∂f ) be in force and recall, as argued in the proof of Theorem 2.16, that the BSDE solved by (y (ε) , z (ε) ) in Definition 2.7 satisfies the conditions of [MZ02, Theorem 4.2]. A key element of the proof of that theorem is to show that, for almost all v ∈ [0, r), where U (r, x) is defined in (2.7); see the equality just above equation (4.19) in [MZ02]. Integrating with respect to v over v ∈ [t i , t j ), on the one hand, and between v ∈ [t i , r), on the other, which yields One then follows the proof of [MZ02, Theorem 4.2], which essentially uses integration-by-parts for Malliavin calculus -Lemma 2.2 -to show that r )H ti tj ]. One extends to the general case (5.3) by convergence arguments as in the proof of Theorem 2.16. The relation (5.4) is now straightforward to obtain from (5.3).
Lemma 5.3. There is a finite constant C depending only on the bound on b and its derivatives, the bound on σ and its derivatives, L f , θ L , C f , θ c , and T such that, for all i ∈ {0, . . . , N − 1}, Proof. In what follows, C may change from line to line. Using the conditional Cauchy-Schwarz inequality (Corollary 2.11), then, Minkowski's inequality and the moment bound (2.22) of Corollary 2.13 imply that Using the Lipschitz continuity of f , Minkowski's inequality, and Lemma 2.10, For (5.7), the t-Hölder continuity of f in (A ft ), the Cauchy-Schwarz inequality (Corollary 2.11), Minkowski's inequality, and Hölder's inequality are needed: The usual upper bound $\|X_r - X_{t_j}\|_2 \le C\sqrt{r - t_j}$ implies that $\int_{t_j}^{t_{j+1}} \|X_r - X_{t_j}\|_2\,dr \le C\int_{t_j}^{t_{j+1}} \sqrt{r - t_j}\,dr$. Now, we obtain the upper bound $\int_{t_j}^{t_{j+1}} \sqrt{r - t_j}\,dr = \frac{2}{3}\Delta_j^{3/2} \le CN^{-1/2}\Delta_j$ from Lemma B.1, and substitute it into the estimates already obtained to obtain Applying Lemma C.2 to bound the sums without the integrals is then sufficient to complete the proof.
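The elementary integral bound used in the proof above can be spelled out as follows. This is a sketch under the assumption that Lemma B.1 (not reproduced in this section) provides a bound of the form $\max_j \Delta_j \le C_B/N$; the constant name $C_B$ is hypothetical.

```latex
% Sketch: the integral of the square root over one grid cell.
% Assumption: \max_j \Delta_j \le C_B / N (hypothetical constant C_B, taken from Lemma B.1).
\begin{align*}
\int_{t_j}^{t_{j+1}} \sqrt{r - t_j}\, dr
  = \Big[ \tfrac{2}{3} (r - t_j)^{3/2} \Big]_{r = t_j}^{r = t_{j+1}}
  = \tfrac{2}{3}\, \Delta_j^{3/2}
  = \tfrac{2}{3}\, \Delta_j^{1/2}\, \Delta_j
  \le \tfrac{2}{3}\, \sqrt{C_B}\; N^{-1/2}\, \Delta_j .
\end{align*}
```

Keeping one factor of $\Delta_j$ explicit is what allows the resulting sums over $j$ to be compared with integrals over $[0, T)$ afterwards.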
In the following proposition, we obtain a bound for the error terms on the right hand side of (5.7); these error terms are intrinsically related to the discretization error of the Malliavin weights scheme. Proposition 4.2 will be essential in the proof of this result.
Proof. We will prove the bounds for The bounds for the N −1 j=0 Ψj (T −tj ) (1−θ L )/2 are obtained analogously. Moreover, we will only prove the result for the terms in Z. The bounds for the terms in Y are obtained analogously as well. In what follows, C may change from line to line. We first prove the result under (A ∂f ) and (A bΦ ), and then obtain the general result by means of mollification. Fix ε ≤ ∆ N −1 and recall the BSDE (Y (ε) , Z (ε) ) from Definition 2.7 in Section 2.3. We use the version of Z (ε) provided by Theorem 2.16. First, apply the triangle inequality to the integrand in order to obtain To bound the terms in Z − Z (ε) , recall the bound (2.24) from Corollary 2.14. For j ≤ N − 2, the bound on Then, use (t N −1 − t i ) −1/2 ≤ 2(T − t i ) −1/2 on the denominator on the right hand side. For the outstanding term, j = N − 1, we implement Lemma C.2 to show that Combining (5.8) and (5.9), it follows that where we have used that ∆ 2γ+θ L N −1 = T N (2γ+θ L )/β and β < (2γ) ∧ θ L . Analogously, we can also show that Recalling the BSDEs (y, z) and (y (ε) , z (ε) ) from Definition 2.7 and that Z (ε) = z + z (ε) , the triangle inequality yields Z ti 2 . In the proof of [GM10, Theorem 1.1], in bounding the terms E 1 and E 2 , it is shown that, for all t ∈ [0, T ], x u(r, X r ) 2 2 dr.
We come to the main result of this section, namely the error estimation for the Malliavin weights scheme.

D Regularity results for inverse matrices
Lemma D.1. Let ξ > 0 be finite and A : R n → R l×l be symmetric and such that η ⊤ A(x)η ≥ ξ|η| 2 for all x ∈ R n and η ∈ R l . Then, for every x ∈ R n , the matrix A(x) is invertible and |A −1 (x)| ≤ 1/ξ. Moreover, if x → A(x) is γ-Hölder continuous, then its inverse x → A −1 (x) is also γ-Hölder continuous.
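A standard route to a statement of this kind is sketched below; it is not necessarily the proof given in the paper. The sketch assumes that $|\cdot|$ denotes the Euclidean norm on vectors and the induced operator norm on matrices, and writes $L_A$ for a hypothetical Hölder constant of $A$.

```latex
% Sketch of a standard argument (assumptions: operator norm; Hölder constant L_A is hypothetical).
\begin{align*}
\xi |\eta|^2 \le \eta^\top A(x)\, \eta \le |\eta|\, |A(x)\eta|
  \quad &\Longrightarrow \quad |A(x)\eta| \ge \xi |\eta| \quad \text{for all } \eta \in \mathbb{R}^l, \\
  &\Longrightarrow \quad A(x) \text{ is invertible and } |A^{-1}(x)| \le 1/\xi, \\
A^{-1}(x) - A^{-1}(y) &= A^{-1}(x)\, \big( A(y) - A(x) \big)\, A^{-1}(y), \\
|A^{-1}(x) - A^{-1}(y)| &\le \xi^{-2}\, |A(x) - A(y)| \le \xi^{-2} L_A\, |x - y|^{\gamma} .
\end{align*}
```

The second line uses that an injective linear map on $\mathbb{R}^l$ is invertible, and the bound on the inverse follows by applying the first line to $\eta = A^{-1}(x)\zeta$. With a different matrix norm the same argument goes through up to dimension-dependent constants.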