An iterative technique for bounding derivatives of solutions of Stein equations

We introduce a simple iterative technique for bounding derivatives of solutions of Stein equations $Lf=h-\mathbb{E}h(Z)$, where $L$ is a linear differential operator and $Z$ is the limit random variable. Given bounds on just the solutions or certain lower order derivatives of the solution, the technique allows one to deduce bounds for derivatives of any order, in terms of supremum norms of derivatives of the test function $h$. This approach can be readily applied to many Stein equations from the literature. We consider a number of applications; in particular, we derive bounds for derivatives of any order of the solution of the general variance-gamma Stein equation.


Introduction
Stein's method is a powerful technique for assessing the distributional distance between a random variable of interest W and a limit random variable Z. Stein [50] originally developed the method for normal approximation, and the method was adapted to Poisson approximation by Chen [6].

Outline of Stein's method
For a general target random variable Z, Stein's method involves three steps; see [47].In the first step, a suitable characterisation of the target distribution is obtained; namely, a Stein operator L such that the random variable X is equal in law to Z if and only if for all functions f belonging to some measure determining class.For continuous random variables, L is a differential operator; for discrete random variables, L is a difference operator.Such characterisations have often been derived via Stein's density approach [51], [52] or the generator approach of Barbour and Götze [2], [30].The scope of the density approach has recently been extended by [36], and other techniques for obtaining Stein characterisations are discussed in that work.The characterisation (1.1) leads to the so-called Stein equation: where the test function h is real-valued.The second step of Stein's method, which will be the focus of this paper, concerns the problem of obtaining a solution f to the Stein equation (1.2) and then establishing estimates for f and some of its lower order derivatives (for continuous distributions).
Evaluating (1.2) at a random variable W and taking expectations gives and thus the problem of bounding the quantity Eh(W ) − Eh(Z) reduces to solving (1.2) and bounding the right-hand side of (1.3).The third step of Stein's method concerns the problem of bounding the expectation E[Lf (W )].For continuous limit distributions, such bounds are usually obtained via Taylor expansions and coupling techniques.For many classical distributions, the problem of obtaining the first two necessary ingredients is relatively tractable.As a result, over the years, Stein's method has been adapted to many standard distributions, including the beta [10], [28], gamma [25], [37], exponential [4], [18], [43], Laplace [46] and, more generally, the class of variance-gamma distributions [22].The method has also been adapted to distributions arising from specific problems, such as preferential attachment graphs [44], the Curie-Weiss model [5] and statistical mechanics [12], [13].For a comprehensive overview of the literature, see [36].

Estimates for solutions of Stein equations
The Stein equations of many classical distributions, such as the normal, beta and gamma, are linear first order ODEs with simple coefficients.As a result, the problem of solving the Stein equation and bounding the derivatives of the solution is reasonably tractable.
In fact, for Stein characterisations arising from the generator approach, the Stein operator L associated with the distribution µ can be viewed as the generator of a Markov process with stationary distribution µ.For example, the classical standard normal Stein operator Lf (x) = f ′ (x) − xf (x) is the generator of an Ornstein-Uhlenbeck process (if we take f = g ′ ).In such cases, by standard theroy of Markov processes ( [16], Proposition 1.5), the solution to the Stein equation is f = g ′ , where if the integral exists.Here, P t , defined by P t f (x) = E x f (X t ), is the transition semigroup operator.If it is permissible to differentiate under the integral sign, then In cases where a formula is available for (P t h) (k) (x), it is often possible to easily deduce uniform bounds for the derivatives of f in terms of supremum norms of derivatives of h.This approach has yielded uniform bounds for derivatives of any order of the solutions of the multivariate normal [2], [29] and gamma [37] Stein equations.However, for many Stein equations in the literature, this approach cannot be applied or is not tractable.This can either be because the Stein operator cannot be viewed as the generator of a Markov diffusion process (for example, the Stein operator for the product of three of more independent standard normals [21]) or because a simple formula is not available for (P t h) (k) (x) (for example, the variance-gamma Stein operator; see Section 3.3.2 of [19]).In such cases, estimates for the solution and its derivatives are usually obtained via analytic methods, by which formulas for the solution and its derivatives are derived, and these quantities are then bounded through technical calculations.This is often a tedious and lengthy process, which usually only yields bounds on lower order derivatives.
In recent years, Stein's method has been adapted to a number of distributions for which the Stein operator is a higher order linear differential operator.Second order operators have been obtained for the Laplace [46], variance-gamma [22] and PRR distributions [44], whilst operators of order greater than two have been for the distribution of mixed products of independent normal, beta and gamma random variables [21], [24].
These higher order Stein operators have simple coefficients, so bounding the quantity E[Lf (W )] via various coupling techniques is often quite tractable.However, solving the Stein equation and obtaining estimates for the derivatives of the solution is often a challenging problem.Indeed, to date, there do not exist uniform bounds for the derivatives of the solution of the general variance-gamma Stein equation.
This paper is to some extent motivated by this challenge.In this paper, we introduce a simple iterative technique for bounding derivatives of solutions of Stein equations.We introduce the technique in Section 2 and consider applications of it in Section 3. Given a bound on just the solution or some of its lower order derivatives, the iterative procedure allows one to deduce bounds for derivatives of any order in terms of supremum norms of the derivatives of h.The technique allows us to obtain bounds for derivatives of any order of the solutions of many of the Stein equations from the literature.Such bounds have previously only been obtained for the solutions of the Stein equations for the normal and multivariate normal [7], [23], [29], [35], gamma [25], [37] and beta distributions [10].
The power of the technique is particularly well demonstrated if we consider the variance-gamma Stein equation.For this Stein equation, our iterative procedure reduces the problem of bounding derivatives of any order of the solution to just bounding the solution and its first derivative.This is a tractable problem, and we obtain uniform bounds for these quantities in Section 3. On the other hand, bounding even the second and third derivatives via analytic calculations may require very involved calculations; see Section 3 and Appendix D of [19].
When the Stein operator L is a second second order (possibly degenerate) elliptic differential operator, then the Stein equation coincides with the Poisson equation.It was in the context of the Poisson equation that we first recognised the importance of the iterative technique to Stein's method.Indeed, in Section 4, we consider the connection between Stein equations and Poisson equations, and consider an application of the iterative procedure to Poisson equations in Section 4.2

Outline of the paper
In Section 2, we introduce a general iterative technique for bounding derivatives of any order of solutions of Stein equations.In Lemmas 2.3-2.5, we present general bounds for solutions of certain classes of Stein equations.These lemmas cover the normal, gamma, beta, Student's t, inverse-gamma, PRR and variance-gamma Stein equations.In Section 3, we apply these lemmas to obtain estimates for n-th order derivatives of the solutions to these Stein equations.Many of these bounds are new.The technique also applies equally well to many Stein equations not covered by Lemmas 2.3-2.5.We demonstrate this by considering the application of the technique to the Stein equations for the multivariate normal distribution, as well as to a distribution with density proportional to e −x 4 /12 .In Section 4, we consider the connection between Stein equations and Poisson equations.We also consider the iterative technique in the context of Poisson equations in Section 4.2.

An iterative approach to bounding solutions of Stein equations
Consider the ordinary differential equation where L is a linear differential operator of the form In Stein's method, one frequently encounters linear ODEs of the type (2.5).Indeed, usually one is interested in some distribution µ, which is characterised by L, meaning that E[Lf (X)] = 0, for all sufficiently smooth functions f , if and only if X has distribution µ.One is then interested in solving (2.5) for f in terms of h, which will be supposed to be centred with respect to µ, and in bounding f and derivatives of f in terms of h and its derivatives.In this paper, we address the latter problem within the following general set-up: Suppose that we are given a finite or infinite sequence (µ k , L k ) of distributions µ k and linear differential operators L k of the form such that L k is characterizing for µ k in the above mentioned sense.Also, assume that the operators L k are linked by the relation where T k is some other linear differential operator.We will assume that all operators L k have the same order m and, then, T k will have the form for some coefficients b k,j (x).Thus, if h k is such that then, writing f k for a particular solution to we find that Often, given h k with (2.9), there is only one solution f k,h k to (2.10) which has nice boundedness properties.This is due to the fact that every other solution is obtained by adding to f k,h k a solution to the corresponding homogeneous equation whose non-zero solutions (or one of their lower-order derivatives) often explode in a certain sense.In this general framework, we will always assume that there is one such distinguished solution f k,h k to (2.10).The curious fact is that then, if the operators L m and T m are well-chosen, usually, the right hand side of (2.11) is centred with respect to the measure µ k+1 and that f ′ k,h k = f k+1,h k+1 .Given this, we can apply bounds holding for the solution f k+1,h k+1 in order to bound f ′ k,h k .Iterating this observation already gives the theoretical essence of our method.For ease of reference, we now state the technical conditions.Assumption 2.1.Assume that, for some N ∈ N ∪ {∞} and m ∈ N, we are given a sequence (µ k , L k , T k ), 0 ≤ k < N + 1, of probability measures µ k on (R, B), linear differential operators L k and T k of the form (2.6) and (2.8), respectively, such that L k is characterizing for µ k and (2.7) holds.Also, suppose that for each 0 ≤ k < N + 1 and each function h k such that µ k (h k ) = 0, there is a distinguished solution f k,h k to the Stein equation (2.10) and that the function  [19], Lemma 3.14).This then becomes the distinguished solution.
Now, we present three general lemmas, which provide general useful bounds under different additional conditions, if the operators T k have a special and simple form.As we shall see in Section 3, these lemmas cover many of the Stein equations from the literature.Lemma 2.3.Suppose that Assumption 2.1 holds and that there are constants α j , 0 ≤ j < N + 1, such that T j f = α j f holds for all 0 ≤ j < N + 1.Also, assume that for some 0 ≤ n < N + 1 a function h ∈ C n b (R) with µ 0 (h) = 0 is given and write f for f 0,h .Then, writing a j := |α j |, 0 ≤ j < N + 1: and, for n = 2k, Proof.Note that, by induction, we obtain that holds for each 0 ≤ j < N.
(i) The result follows from a simple induction on n.Suppose the result is true for all l ≤ n.Then and so the result has been proved by induction on n.
(ii) We prove the result by induction by considering the cases of odd and even n separately.Firstly, suppose the result holds for all odd l ≤ 2k + 1.Then as required.Now, we suppose the result holds for all even l ≤ 2k.Then which completes the proof.
The proof of the following lemma is analogous to that of Lemma 2.3 and is omitted.
If there exist constants K j and D j , 0 ≤ j < N + 1, such that f j,h j ≤ K j h j and f ′ j,h j ≤ D j h j hold for each 0 ≤ j < N + 1 and h k with µ k (h k ) = 0, then, with a j := |α j | and b j := |β j |, for m = 0, 1, . . ., n + 1 we have that Here, for j ∈ {0, 1, . . .}, we let and for l ∈ I j we define the constants and S (m) j,l is the collection of those size l subsets M of {m − j, m − j + 1, . . ., m} such that m, m − j ∈ M.
Proof.By induction one can see that holds for all 0 ≤ j < N. We fix 0 ≤ m ≤ n + 1 and also occasionally suppress it from the notation.Note that for each fixed 1 ≤ k ≤ n we have From this, for each 0 ≤ k ≤ m − 1, we can obtain a formula of the form (2.12) 3 , j = 1, . . ., n, which satisfy the following recursive relations: (2.13)

and
(2.14) (2.15) The initial values are given by 12), we obtain where 3 .Thus, it suffices to find explicit expressions for the sequences C , j = 0, 1, . . ., m − 1.We claim that these are given by This will be shown by induction on j = 0, 1, . . ., m − 1.If j = 0, then the claim is true by virtue of (2.16).Suppose that it holds for some 0 ≤ j ≤ m − 2.Then, from (2.13) and the induction hypothesis for C as claimed.Similaraly, we obtain the claim for C (m−j−1) 3 . Finally, from (2.14) and the induction hypothesis for (2.17) Suppose, first, that j is even.Then, I j+1 = I j ∪ {j + 2} and In a similar way, one can check that for all j/2 + 2 ≤ l ≤ j + 1 we have Thus, from (2.17) we can conclude that for even j we indeed have that as claimed.The proof for odd values of j can be concluded by similar arguments and is therefore omitted.

Estimates for solutions of Stein equations
Here, we consider the application of the iterative method to a number of Stein equations from the literature.For each of the distributions considered, we state the Stein equation and its solution, as well as the best existing bounds for the solution and its derivatives.For each distribution, the iterative technique can be applied to bound derivatives of any order given bounds on just the solution or some of its lower order derivatives.Some of the bounds we obtain are new (Student's t, inverse-gamma, PRR, variance-gamma and the distribution with density proportional to e −x 4 /12 ).The results are stated without further comment; proofs and further discussions are given in Section 3.2.

Notation. The double factorial of a positive integer n is given by
and we define (−1)!! = 0!! = 1 ([1], p. 547).The Pochhammer symbol is defined by We set the empty product −1 i=0 a i to be equal to 1.We say that the function h : 1) exists and is absolutely continuous, with derivatives up to n-th order bounded.It shall also be convenient to let h(x) = h(x) − Eh(Z) and h (0) ≡ h.
2. The Stein equation is (see [50], [51]) and, for k ≥ 1, We can therefore bound the derivatives of the solution via Lemma 2.3.
3. The solution to (3.18) is 4. The following bounds for the solution were obtained by [51]: These bounds have since been extended to higher order derivatives.For h ∈ C n b (R), n ≥ 0, the solution satisfies (see [29], [23], [7] respectively) 2. The Stein equation is (see [8]) and, for k ≥ 1, We can therefore bound the derivatives of the solution via Lemma 2.3.

Consider the Exp(λ)
For the case r = 1, some of the estimates for the solution (3.24) can be improved.For a continuously differentiable test function h with a Lipschitz derivative h ′ , [10] obtained the bounds which have better constants than (3.27) and (3.28).

Beta distribution
2. The Stein equation is (see [49]) (3.30) and, for k ≥ 1, We can therefore bound the derivatives of the solution via Lemma 2.3.
3. The solution to (3.30) is 4. It was shown by [10] that, if h is bounded, where m is the median of the B(α, β) distribution.Also, for Lipschitz h with Lipschitz constant h ′ , where A bound of the form f ′ ≤ K(α, β) h ′ was also obtained by [28], in which the constant K(α, β), for various parameter values, is sometimes better and sometimes worse than C(α, β).An iterative approach with base case f ′ ≤ C(α, β) h ′ was used by [10] to obtain the bound, for h ∈ C n b ((0, 1)), n ≥ 1, 5. For the special case of the B(1/2, 1/2) distribution, also known as the arcsine distribution, [9] showed that For δ > 0 and d > 0, the non-standardised Student's t-distribution has p.d.f.
In the case d = ν, δ = √ ν the density (3.31) is that of Student's t-distribution with ν degrees of freedom.
2. The Stein equation is (see [19], Section 7) and, for k ≥ 1, We can therefore bound the derivatives of the solution via Lemma 2.3.
3. The solution to (3.32) is 4. The density p satisfies (s(x)p(x) x, and therefore from Lemmas 1 and 3 of [48] we have the bounds Suppose that h ∈ C n b (R) and d − 2n > 0. Then where and, for n = 2k,

Inverse-gamma distribution
1. Consider an inverse-gamma random variable Z with parameters α, β > 0 and density 2. The Stein equation is (see [33]) and, for k ≥ 1, We can therefore bound the derivatives of the solution via Lemma 2.3.

PRR distribution
1. Considered the PRR distribution (introduced by [44]) with density where U(a, b, x) denotes the confluent hypergeometric function of the second kind ( [41], Chapter 13).
2. The Stein equation is (see [44]) and, for k ≥ 1, We can therefore bound the derivatives of the solution via Lemma 2.4.
5. We can apply Lemma 2.4 with f ′ ≤ √ 2π h and f ′′ ≤ 4 h as base cases to bound higher order derivatives of the solution.
where x ∈ R and the modified Bessel function of the second kind is given, for x > 0, by K ν (x) = ∞ 0 e −x cosh(t) cosh(νt) dt (see [41]).The support of the variance-gamma distributions is R when σ > 0, but in the limit σ → 0 the support is the region (µ, ∞) if θ > 0, and is (−∞, µ) if θ < 0. The class of variance-gamma distributions contains many classical distributions as special cases (see [22] for a list of these cases); in particular, where Laplace(µ, σ) denotes a Laplace distribution with density 1 2σ exp ) is the distribution of the product of the independent random variables X ∼ N(0, σ 2 X ) and Y ∼ N(0, σ 2 Y ). 2. For simplicity, consider the case µ = 0.The Stein equation is (see [22]) (3.40) Note that the VG(r, θ, σ, 0) Stein equation reduces to the standard normal (3.18) and gamma (3.24) Stein equations in the appropriate limits.
For k ≥ 1, we have When θ = 0 we can bound the derivatives of solution using Lemma 2.3.Otherwise, we can obtain estimates via Lemma 2.5.

The solution to
where ν = r−1 2 and ν+2k is a modified Bessel function of the first kind (see [41]).
5. An application of Lemma 2.3 with (3.42) and (3.43) as base cases allows us to obtain bounds for higher order derivatives. Suppose and, for n = 2k, h .
6. Now, consider the case of general θ.In Section 3.2.4,we obtain the estimates where 7. We can use Lemma 2.5 with (3.44) and (3.45) as base cases to obtain bounds for higher order derivatives. where and the sets I j and S (n) j,l are defined as in Lemma 2.5.In particular, applying the iterative technique with (3.44) and (3.45) as base cases yields the following simple bounds: 3.1.8Density proportional to e −x 4 /12 1. Consider a probability distribution with density proportional to e −x 4 /12 .We quote from [5] that the exact form of the density p of this distribution is The limiting distribution of the properly scaled total magnetization in the Curie-Weiss model at the critical inverse temperature β = 1 was shown to have this density in [15].In [12] and [5] a version of Stein's method of exchangeable pairs was used to prove that also a Berry-Esséen rate of order n −1/2 holds for this limit theorem.As in neither of the works [12] and [5] a constant for this rate is aimed at, no numerical bounds on the solution to the Stein equation have been stated, yet.However, in the appendix section of [12], actually some numerical bounds are proved.
We will firstly use these and the general bounds from [10] to provide concrete bounds on the solution to the Stein equation and, then, use these as starting points for our iterative method to derive bounds for derivatives of all orders.
2. The Stein equation is (see [12] and [5]) We have and, for k ≥ 3, We are unable to apply either of Lemmas 2.3-2.5 to bound the derivatives of the solution.However, we can still apply the iterative technique to bound the derivatives; the details are given in Section 3.2.5.
3. The solution to (3.46) is 4. If h is bounded and measurable, then it holds that These bounds are not stated explicitly, but are actually proved in the appendix section of [12].If h is Lipschitz-continuous, then we can prove that More precisely, we can even show that for each x ∈ R we have We would like to stress that the numerical bounds for Lipschitz test functions h in (3.50) are original and the first ones obtained for this distribution.The bound on f ′′ in (3.50) will be proved by standard, but pretty involved calculations.It should thus not be left unmentioned that, using the fairly easy second bound in (3.49) and our iterative procedure, we can easily obtain the bound which has a slightly worse constant.This is completely analogous to the case of the normal distribution (see Section 3.2.1).
For bounds on derivatives of general order let h ∈ C n b (R).We obtain the following generalization of the first bound in (3.49): Here, the nonnegative real numbers a (n) j , n ∈ N 0 , 0 ≤ j ≤ n are given as follows: For each n we have that (3.54) whenever these quantities are defined, and, for 3 where we write for the normalizing constant.Since we can also show that from the second bound in (3.49) we obtain the following generalised version of this bound: (3.57) 3.1.9Multivariate normal distribution 2. The Stein equation is (see [29]) where σ ij = (Σ) ij , and, for k ≥ 1, We are unable to apply either of Lemmas 2.3-2.5 to bound the derivatives of the solution.However, we can still apply the iterative technique to bound the derivatives; the details are given in Section 3.2.6.

The solution to
4. It was shown by [29] that, if Σ is nonnegative-definite and h ∈ C n b (R d ), then the solution satisfies Suppose now that Σ is positive-definite matrix and that h is bounded.Then, the following estimate was obtained by [23]: Analogous bounds to (3.59) and (3.61) were obtained for the n-th derivatives of the solution f , viewed as n-linear forms, by [39] and [23] respectively.

Further comments and proofs
In this section, we provide additional comments to complement the results of Section 3.1, and we also provide insights into the application of the iterative technique in practice.

Normal distribution
Using elementary calculations, [51] obtained the bounds (3.20) for the solution (3.19) of the standard normal Stein equation.However, this approach quickly becomes tedious for bounding higher order derivatives.The first progress with regard to bounding higher order derivatives came from the observation by [2] that the standard normal Stein operator is the generator of an Ornstein-Uhlenbeck process (if we take f = g ′ ).Applying standard theory of Markov processes leads to a solution of the form (1.4): , by the dominated convergence theorem, and an application of part (ii) of Lemma 2.3 with base case f ′ ≤ 2 h leads to a bound on f (n) involving one fewer derivative of h.These bounds are much worse than the best bounds from the literature, as given in Section 3.1.1.However, the iterative technique does allow us to deduce bounds for derivatives of any order of the solution given just the easily derived bounds f ≤ π 2 h and f ′ ≤ 2 h .An interesting application of the iterative method is that it allows us to easily deduce a bound of the form f (n+1) ≤ C h (n) , where C is a universal constant, given just the bounds f ′ ≤ 2 h and f (n−1) ≤ 1 n h (n) .Recall that, for k ≥ 1, Then, an application of the iterative procedure with these bounds yields The constant 4 is worse than the constant 2 in the bound (3.23) obtained by [7], but our derivation is considerably simpler.This example also illustrates how the iterative technique can be used to deduce bounds of the form f (n+1) ≤ C h (n) from bounds of the form f (n) ≤ K h (n) .This is of course useful in practice, because bounds involving fewer derivatives of the test function h are often desirable.

Gamma distribution
As is the case with the normal distribution, bounding derivatives of the solution of the gamma Stein equation using the representation (3.25) of the solution is tedious.The bound (3.29) for the solution can be obtained by this approach, but bounds for derivatives of the solution have not been established through this representation of the solution.However, [37] used a generator approach, similar to that described for the normal distribution, to obtain a formula for the n-th derivative of the solution of the Stein equation, from which the bound (3.26) follows.Through a technical calculation, [19] obtained (3.27) (this improved an earlier bound of [45]).The bound (3.28) was obtained by an iterative approach similar to that described in this paper.
As was the case for the normal distribution, a direct application of part (i) of Lemma 2.3 with (3.29) as the base case would yield bounds on the higher order derivatives of the solution that are much worse than the best bounds from the literature.However, the difficulty of bounding higher order derivatives is again greatly reduced, because one only requires the bound (3.29) to achieve such bounds.
For the gamma distribution, we can again use the iterative technique to obtain bounds of the form f (n) ≤ C h (n) from bounds of the form f (n) ≤ K h (n+1) .Using an iterative approach with inequalities (3.29) and (3.26) gives Thus, the iterative method has yielded a bound of similar form to (3.27), but with a much simpler proof.Using Stirling's approximation, we see that (3.64) is of order (r + n) −1/2 as r + n → ∞, which is the same order as (3.27).However, the constant is better in (3.27) than (3.64) for all n ≥ 1, whilst the constant in (3.64) is better than bound (3.27) in the case n = 0.

Beta, Student's t, inverse-gamma and the PRR distribution
Unlike the normal and gamma distributions, there are currently no simple formulas in the literature for general n-th order derivatives of the solutions of the beta, Student's t, inverse-gamma and PRR Stein equations.However, for each of these distributions uniform bounds are available for the solution and some of its lower order derivatives (beta and PRR) or it is straightforward to derive such bounds (Student's t and inverse-gamma).Therefore, the iterative method of this paper is particularly useful for these distributions.
From a straightforward application of Lemmas 2.3 and 2.4 we arrive at our bounds for n-th order derivatives of the solutions of the non-standardised Student's t, inversegamma and PRR Stein equations respectively.We could also apply Lemma 2.3 to bound the higher order derivatives of the solution of the beta Stein equation; however, this was already done by [10].The bound of [10] is slightly different from the one that would arise from an application of Lemma 2.3, as in that work the base case was f ′ ≤ C(α, β) h ′ , rather than a base case of the form f ≤ K(α, β) h .
The non-standardised Student's t example illustrates a pertinent point regarding the application of the iterative method.Suppose that we are interested in bounding the derivatives of solution of the classical Student's t Stein equation (d = ν, δ = √ ν).By (3.33), we have that, for k ≥ 1, the solution satisfies It is therefore clear that in order to use the iterative approach to bound higher order derivatives of the solution of the Student's t Stein equation, we must work with the Stein equation for the more general non-standardised Student's t distribution.It was also observed by [10] that in order to apply the iterative method to the exponential Stein equation, one must must work with the more general gamma Stein equation.

Variance-gamma distribution
For the case θ = 0, we were able to bound n-th order derivatives of the solution of the variance-gamma Stein equation via Lemma 2.3 with inequalities (3.42) and (3.43) as base cases.Previously, only uniform bounds existed for the solution and its first four derivatives (see [22]).Our bounds for the n-th order derivatives decay at a rate of order (r + n) −1 as r + n → ∞; better than the order r −1/2 rate obtained by [22], and our bounds also have smaller constants.Consequently, our bounds allows for an improvement of the O(r 1/2 (m −1 + n −1 )) bound of Theorem 4.9 of [22] to a O(m −1 + n −1 ) bound.
For the case of general θ, only modest progress had been made on the problem of bounding the derivatives of the solution of the Stein equation.It was shown by [19] and [22] that the solution and its first derivative are bounded if the test function h is bounded, although explicit constants were not given.The complicated form of the solution (3.41) means that bounding higher order derivatives directly is a challenging problem; see [19], Section 3 and Appendix D.
This application clearly demonstrates the power of our method: bounding just the solution and its first derivative, and then applying Lemma 2.5 yields bounds on derivatives of the solution of any order.This makes the problem much more tractable: the problem is essentially reduced to bounding the solution and its first derivative.We end this section by obtaining such bounds; the bounds of Section 3.1.6then follow immediately from Lemma 2.5 with inequalities (3.44) and (3.45) as base cases.
By an application of Stirling's inequality, we see that our bounds for n-th order derivatives of the solution of VG(r, θ, σ, 0) Stein equation are of order (r + n) −1/2 as r + n → ∞.This is slower than the O(r + n) −1 rate we obtained for the θ = 0 case.It is an open problem to improve the bounds for the θ = 0 case to this rate.
Finally, we note an application of our estimates for the solution of the VG(r, θ, σ, 0) Stein equation to the work of [14] on variance-gamma approximation on Wiener space.In that work, it is shown that a sequence of distributions of random variables in the second Wiener chaos converge to a variance-gamma distribution if and only if their second to sixth order moments converge to those of a variance-gamma random variable.Bounds on the rate of convergence were also derived using the variance-gamma Stein equation.These bounds were given in terms of an absolute constant, because explicit bounds were not available for derivatives of the solution of the Stein equation.With our explicit bounds on the derivatives, it is possible to give explicit constants in the bounds of [14].
Proof of inequalities (3.44) and (3.45).In order to simplify the calculations, we make the following change of parameters With these parameters, the solution (3.41) can be written as The equality between (3.66) and (3.67) means that in order to obtain uniform bounds for all x ∈ R it suffices to bound the solution and its derivative in the region x ≥ 0 as long as we consider the case of both negative and positive β.That is, the uniform bound we derive for x ≥ 0 also holds for x ≤ 0, and thus all x ∈ R. We now note the following bounds, which follow almost immediately from Theorems 3.1 and 3.2 of [20].Fix ν > − 1 2 and 0 < |β| < α.Then, for all x > 0, e −βx K ν (αx) x ν x 0 e βy y ν I ν (αy) dy < 2 α(2ν + 1) , e −βx I ν (αx) x 0 e βy y ν I ν (αy) dy < 2(γ + 1) 2ν + 1 , where γ = β/α and Applying these inequalities to bound the above expressions for |f (x)| and |f ′ (x)|, and recalling that it suffices to bound the solution for x ≥ 0 leads to the following bounds, which hold for all x ∈ R, Using (3.65) to return to the original parameters yields (3.44) and (3.45), as required.We first prove the bounds (3.50) and (3.51).Note that the density p and its Stein equation (3.46) satisfy the general Conditions 3.1 and 3.2 from the paper [10].Hence, by Proposition 3.13 (a) of that paper, we conclude that holds for each x ∈ R. Writing Φ for the standard normal distribution function and ϕ = Φ ′ for its density, it is an easy matter of partial integration to check that where we denote by c 1 := √ 2 3 1/4 Γ(1/4) the normalizing constant of p. Hence, from (3.68) we obtain the bound The Gaussian Mill's ratio inequality states that, for each t > 0, as claimed.In order to prove the bound on f ′′ , we rely on the following representation of the second derivative of the Stein equation within the so-called density approach of Stein's method: where F is the distribution function corresponding to p and Identity (3.74) is fairly standard in Stein's method.A proof can, for instance, be found in the arXiv version of the paper [11].With the help of the Mill's ratio type inequalities (5.51)-(5.54)for the distribution corresponding to p from [12], one can easily see that both U and V are non-positive functions.Using the fact that where we used (3.69) and (3.71), again, to obtain the final inequality.Using the second bound in (3.49) for the iterated Stein equation (3.47) as well as (3.72), we easily obtain which is a very simple derivation as compared to that of the previous bound.
Next, we prove (3.53), (3.54) and (3.55) by induction on n.We omit the cases n = 0, 1, 2 and assume that n ≥ 3 and that the claim holds for all 0 ≤ k < m.It is easily checked that Hence, from (3.49) applied to f n , from the definition of h n and the induction hypothesis we conclude that Now, collecting terms and using the induction hypothesis on (3.54) we obtain Note that, in passing, we have also proved (3.56) and, now, (3.49) yields (3.57).

Multivariate normal distribution
iterative method, which has been presented mostly for linear ODEs in this paper, applies equally to linear PDEs.Indeed, we can obtain bounds for n-th order partial derivatives of the solution of the multivariate normal Stein equation if we are given just the bound (3.60).For simplicity, we consider the standard multivariate normal distribution with covariance matrix Σ = I d , the d × d identity matrix.Recall that, for k ≥ 1, Then applying the iterative technique with the bound (3.60) gives Repeating this procedure allows us to bound partial derivatives of f of any order.However, the resulting bound would be much worse than the best bounds from the literature.There is a large literature on existence and regularity results for elliptic differential equations, see [17] and [27] as well as references thereof and therein.Most of these results focus on bounded domains D R d .However, [42] establishes regularity results for D = R d .We concentrate on presenting the formal relation before summarising and exploiting the results of [42] in Section 4.1.

Connection between Stein equations and Poisson equations
These regularity results are achieved by studying the SDE dX where B t is d 1 -dimensional Brownian motion, that has as its infinitesimal generator.In the Stein equation literature this is known as generator approach which was introduced by Barbour and Götze [2], [30], but instead of looking at a case by case basis we establish a general connection that has to best of our knowledge not been exploited in the literature.However, this connection has been hinted at in Section 6.2 of [38] without making the connection with regularity results of [42].The iteration of the regularity results of [42] similar to results presented below has been used in [53] to analyse MCMC algorithms with subsampling.Recently, in [31] and in [3], the generator approach of Stein's method for certain classes of diffusion processes has been used for applications related to queuing theory.In fact, [31] seems to re-develop this technique, being unaware of Barbour's and Götze's previous insights.Stein's method can be viewed as a quantitive version of the Echeverría-Weiss theorem (more details about this theorem can be found in Sections 4.1-4.2 in [34] and in [16]).This theorem illustrates the relation between invariant/stationary of the associated stochastic process and the Stein differential operator.Moreover, sufficiently fast convergence of the stochastic process to its invariant distribution guarantees the existence of a solution to the Poisson equation (4.76).

Solution of the Poisson equation
Theorem 1 and 2 in [42] characterise the smoothness and growth of the solution to the Poisson equation associated with equation (4.77).We now state a simplified version of these theorems, which we make use of in Section 4.2.Before doing so, we state some assumptions.We suppose that b is a locally bounded Borel vector function of dimension with M 0 ≥ 0 and r > 0. This condition prevents the solution of the SDE (4.77) from blowing up, so that the process {X t } is well-defined for all t > 0.
[42] Suppose that the above assumptions hold, and that there also exist C > 0 and β ≥ 0 such that

.79)
Then there exists a solution f , belonging to the Sobolev class W 2 p,loc for any p > 1, to the Poisson equation (4.76).
Recall that P t f (x) = E x f (X t ) is the semigroup associated with L, that can also be expressed by f (y)µ x t (dy) using the distribution µ x t of X t started at x. On that basis we see that f where B x,r is a ball of radius r around x; see p. 1070 of [42].The second term on the RHS is just the RHS of the Poisson equation which is bounded by assumption.The problem lies in the term |f | L ∞ (Bx,2) .It is known that for typical classes of test functions h, like Lipschitz functions, the solution f is not uniformly bounded on R d .However, such uniform bounds on f would would be necessary in order to apply the iterative procedure.Nevertheless, in the next section we obtain bounds of the form where we used multi-index notation for derivatives and |β i | = i and β 0 < • • • < β |α| .This discussion highlights that a bound on |∇f | involving |f | L ∞ (Bx,2) is too weak to obtain results like the classical plug-in theorems which are familiar from Stein's method for univariate distributions.However, even non-uniform bounds on f with a polynomial order of growth can sometimes be of use, if one has control of certain moments of the random variable under consideration.This has been exemplified recently in [31], [3] and in [26].
1. a priori using a direct calculation following page 1068 of [42] Using equation (4.80), bounds can be obtained as follows For most processes of interest the convergence rate of |µ x t − µ| T V depends on x.In case of the Langevin equation, we study below, the bounds would be of the form |P t − µ| T V ≤ CV (x) exp (−λt) ; see Theorem 6.1 of [40].Typically, 1 ≤ V (x) grows at least linearly in |x|.This is not a deficiency of the method because similar bounds are strict for the Ornstein-Uhlenbeck process.While the convergence to equilibrium thus plays a role in bounding |f | it is not so clear how the convergence relates to bounds on derivatives of f .

a posteriori using a representation formula
The result in Theorem 4.2 is proved using the representation formula in combination with Itô's formula and (4.79).For more details see p. 1070 of [42].

The over-damped Langevin SDE
We consider the Langevin SDE in R d , we see that µ is the invariant distribution by verifying that it solves the corresponding Fokker-Planck equation (L ⋆ µ = 0, where L ⋆ is the adjoint operator to L). Matching the coefficients to equation (4.77), we see that b(x) = 1 2 ∇ log µ(x) and σ = I.
In order for Stein's method to be applicable, we need bounds on higher derivatives as well.For this reason, we will iterate Theorem 4.2.In order to do so, we note that the derivatives of f can be expressed as solutions to Poisson equations with different RHSs: ) We will denote by β f,i numbers that satisfy where we used multi-index notation for derivatives.We use a similar notation for the derivatives of f , that is β f,i and assume that these bounds are a priori given.Using Theorem 4.2 we can obtain β f,i to satisfy (4.86) in terms of the β's, which we formulate as the following proposition.

If L in equation ( 1 . 2 )
takes the form of a second order (possibly degenerate) elliptic differential operator, then the Stein equation coincides with the Poisson equation Lf = h − µ(h).(4.76)
Note that we do not state in general, what the attribute distinguished in this context precisely means.Indeed, for the general theory, it only has to be guaranteed that the choice of f k,h k is unambiguous.In all our examples in the sequel except for the variance-gamma Stein equation when 0 < r < 1 (see Section 3.1.7for the p.d.f.), the distinguished solution will be the unique bounded solution.In case of the variance-gamma Stein equation when 0 < r < 1, there are infinitely many bounded solutions to the Stein equation, for general bounded test function h.However, if h is bounded, then for all parameter values there is a unique solution f to the variance-gamma Stein equation such that f and f ′ are bounded (see