Variance-Gamma approximation via Stein's method

Variance-Gamma distributions are widely used in financial modelling and contain as special cases the normal, Gamma and Laplace distributions. In this paper we extend Stein's method to this class of distributions. In particular, we obtain a Stein equation and smoothness estimates for its solution. This Stein equation has the attractive property of reducing to the known normal and Gamma Stein equations for certain parameter values. We apply these results and local couplings to bound the distance between sums of the form $\sum_{i,j,k=1}^{m,n,r}X_{ik}Y_{jk}$, where the $X_{ik}$ and $Y_{jk}$ are independent and identically distributed random variables with zero mean, and their limiting Variance-Gamma distribution. Through the use of novel symmetry arguments, we obtain a bound on the distance that is of order $m^{-1}+n^{-1}$ for smooth test functions. We end with a simple application to binary sequence comparison.


Introduction
In 1972, Stein [41] introduced a powerful method for deriving bounds for normal approximation. Since then, this method has been extended to many other distributions, such as the Poisson [10], Gamma [27], [29], Exponential [9], [31] and Laplace [13], [34]. Through the use of differential or difference equations, and various coupling techniques, Stein's method enables many types of dependence structures to be treated, and also gives explicit bounds for distances between distributions.
At the heart of Stein's method lies a characterisation of the target distribution and a corresponding characterising differential or difference equation. For example, Stein's method for normal approximation rests on the following characterisation of the normal distribution, which can be found in Stein [42]: Z ∼ N(µ, σ²) if and only if

E[σ²f′(Z) − (Z − µ)f(Z)] = 0 (1.1)

for all sufficiently smooth f. This gives rise to the following inhomogeneous differential equation, known as the Stein equation:

σ²f′(x) − (x − µ)f(x) = h(x) − Eh(Z), (1.2)

where Z ∼ N(µ, σ²), and the test function h is a real-valued function. For any bounded test function, a solution f to (1.2) exists (see Lemma 2.4 of Chen et al. [11]).

There are a number of techniques for obtaining Stein equations, such as the density approach of Stein et al. [43], the scope of which has recently been extended by Ley and Swan [24]. Another commonly used technique is the generator approach, introduced by Barbour [3]. This approach involves recognising the target distribution as the stationary distribution of a Markov process and then using the theory of generators of stochastic processes to arrive at a Stein equation; for a detailed overview of this method see Reinert [36]. Luk [27] used this approach to obtain the following Stein equation for the Γ(r, λ) distribution:

xf″(x) + (r − λx)f′(x) = h(x) − Eh(X), (1.3)

where X ∼ Γ(r, λ).

The next essential ingredient of Stein's method is smoothness estimates for the solution of the Stein equation. These can often be obtained by solving the Stein equation using standard solution methods for differential equations and then bounding the required derivatives of the solution by direct calculation (Stein [42] used this approach to bound the first two derivatives of the solution of the normal Stein equation (1.2)). The generator approach is also often used to obtain smoothness estimates; the use of probabilistic arguments to bound the derivatives of the solution often makes it easier to arrive at smoothness estimates than analytical techniques do.
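The normal characterisation can be illustrated numerically. The sketch below is an illustrative check, assuming the characterising identity takes the standard form E[σ²f′(Z) − (Z − µ)f(Z)] = 0; it estimates this expectation by Monte Carlo for f = sin, and the estimate should be close to zero.

```python
import math
import random

def stein_normal_residual(mu, sigma, f, fprime, n=200_000, seed=1):
    """Monte Carlo estimate of E[sigma^2 f'(Z) - (Z - mu) f(Z)] for Z ~ N(mu, sigma^2).

    By Stein's characterisation of the normal distribution, this expectation
    is 0 for any sufficiently smooth f, so the estimate should be near 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu, sigma)
        total += sigma ** 2 * fprime(z) - (z - mu) * f(z)
    return total / n

residual = stein_normal_residual(mu=1.0, sigma=2.0, f=math.sin, fprime=math.cos)
print(abs(residual))  # small, and vanishing as n grows
```

The same function can be reused with any smooth test function f and its derivative.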
Luk [27] and Pickett [33] used the generator approach to bound the k-th order derivatives of the solution of the Γ(r, λ) Stein equation (1.3). Pickett's bounds are given in (1.4), where ‖f‖ = ‖f‖∞ = sup_{x∈ℝ} |f(x)| and h^(0) ≡ h.
In this paper we obtain the key ingredients required to extend Stein's method to the class of Variance-Gamma distributions. The Variance-Gamma distributions are defined as follows (this parametrisation is similar to that given in Finlay and Seneta [17]).

Definition 1.1 (Variance-Gamma distribution, first parametrisation). The random variable X is said to have a Variance-Gamma distribution with parameters r > 0, θ ∈ ℝ, σ > 0, µ ∈ ℝ if its probability density function is given by (1.5), where x ∈ ℝ and K_ν(x) is a modified Bessel function of the second kind; see Appendix B for a definition. If (1.5) holds then we write X ∼ VG₁(r, θ, σ, µ).
The density (1.5) may at first appear to be undefined in the limit σ → 0, but this limit does in fact exist, as can easily be verified from the asymptotic properties of the modified Bessel function K_ν(x) (see formula (B.4) of Appendix B). As we shall see in Proposition 1.2 below, taking the limit σ → 0 and setting µ = 0 gives the family of Gamma distributions. It is also worth noting that the support of the Variance-Gamma distributions is ℝ when σ > 0, but in the limit σ → 0 the support is (µ, ∞) if θ > 0, and is (−∞, µ) if θ < 0.
The Variance-Gamma distributions were introduced to the financial literature by Madan and Seneta [28]. For certain parameter values the Variance-Gamma distributions have semi-heavy tails that decay more slowly than the tails of the normal distribution, and they are therefore often appropriate for financial modelling.
The class of Variance-Gamma distributions includes the Laplace distribution as a special case and in the appropriate limits reduces to the normal and Gamma distributions. This family also contains many other distributions of interest, which we list in the following proposition (the proof is given in Appendix A). As far as the author is aware, this is the first such list of characterisations of the Variance-Gamma distributions to appear in the literature.

Proposition 1.2. (i) Let σ > 0 and µ ∈ ℝ and suppose that Z_r has the VG₁(r, 0, σ/√r, µ) distribution. Then Z_r converges in distribution to a N(µ, σ²) random variable in the limit r → ∞.
(vi) Suppose that (X, Y) follows a bivariate Gamma distribution with correlation ρ and marginals X ∼ Γ(r, λ₁) and Y ∼ Γ(r, λ₂). Then the random variable X − Y has a Variance-Gamma distribution.

The representations of the Variance-Gamma distributions given in Proposition 1.2 enable us to determine a number of statistics that may have asymptotic Variance-Gamma distributions.
One of the main results of this paper (see Lemma 3.1) is the Stein equation (1.7) for the Variance-Gamma distributions, where VG^{r,θ}_{σ,µ}h denotes the quantity Eh(X) for X ∼ VG₁(r, θ, σ, µ). We also obtain uniform bounds for the first four derivatives of the solution of the Stein equation for the case θ = 0.
In Section 3, we analyse the Stein equation (1.7). In particular, we show that the normal Stein equation (1.2) and Gamma Stein equation (1.3) are special cases. As a Stein equation for a given distribution is not unique (see Barbour [2]), the fact that in the appropriate limits the Variance-Gamma Stein equation (1.7) reduces to the known normal and Gamma Stein equations is an attractive feature.
Stein's method has also recently been extended to the Laplace distribution (see Pike and Ren [34] and Döbler [13]), although the Laplace Stein equation obtained by [34] differs from the Laplace Stein equation that arises as a special case of (1.7); see Section 3.1.1 for a more detailed discussion. Another special case of the Stein equation (1.7) is a Stein equation for the product of two independent central normal random variables, which is in agreement with the Stein equation for products of independent central normal random variables that was recently obtained by Gaunt [20]. Therefore, the results from this paper allow the existing literature on Stein's method for normal, Gamma, Laplace and product normal approximation to be considered in a more general framework.
More importantly, our development of Stein's method for the Variance-Gamma distributions allows a number of new situations to be treated by Stein's method. In Section 4, we illustrate our method by obtaining a bound for the distance between the statistic (1.8), where the X_ik and Y_jk are independent and identically distributed with zero mean, and its asymptotic distribution, which, by the central limit theorem and part (iv) of Proposition 1.2, is the VG₁(r, 0, 1, 0) distribution. By using the VG₁(r, 0, 1, 0) Stein equation, local approach couplings, and symmetry arguments that were introduced by Pickett [33], we obtain a O(m^{−1} + n^{−1}) bound for smooth test functions. A similar phenomenon was observed in chi-square approximation by Pickett, and also by Goldstein and Reinert [21], who obtained O(n^{−1}) convergence rates in normal approximation, for smooth test functions, under the assumption of vanishing third moments. For non-smooth test functions we would, however, expect a O(m^{−1/2} + n^{−1/2}) convergence rate (cf. the Berry-Esséen theorem; Berry [6] and Esséen [16]) to hold; see Remark 4.11.

The rest of this paper is organised as follows. In Section 2, we introduce the Variance-Gamma distributions and state some of their standard properties. In Section 3, we obtain a characterising lemma for the Variance-Gamma distributions and a corresponding Stein equation. We also obtain the unique bounded solution of the Stein equation, and present uniform bounds for the first four derivatives of the solution for the case θ = 0. In Section 4, we use Stein's method for Variance-Gamma approximation to bound the distance between the statistic (1.8) and its limiting Variance-Gamma distribution. We then apply this bound to an application of binary sequence comparison, which is a simple special case of the more general problem of word sequence comparison. In Appendix A, we include the proofs of some technical lemmas that are required in this paper.
Appendix B provides a list of some elementary properties of modified Bessel functions that we make use of in this paper.

The class of Variance-Gamma distributions
In this section we present the Variance-Gamma distributions and some of their standard properties. Throughout this paper we make use of two different parametrisations of the Variance-Gamma distributions: the first parametrisation was given in Section 1, and the change of variables (2.1) leads to another useful parametrisation, which can be found in Eberlein and Hammerstein [15].
The first parametrisation leads to simple characterisations of the Variance-Gamma distributions in terms of normal and Gamma distributions, and therefore in many cases it allows us to recognise statistics that will have an asymptotic Variance-Gamma distribution. For this reason, we state our main results in terms of this parametrisation. However, the second parametrisation proves to be very useful in simplifying the calculations of Section 3, as the solution of the Variance-Gamma Stein equation has a simpler representation for this parametrisation. We can then state the results in terms of the first parametrisation by using (2.1).
The Variance-Gamma distributions have moments of arbitrary order (see Eberlein and Hammerstein [15]); in particular, the mean and variance (for both parametrisations) of a random variable X with a Variance-Gamma distribution are given by (2.3). The following proposition, which can be found in Bibby and Sørensen [7], shows that the class of Variance-Gamma distributions is closed under convolution, provided that the random variables have common values of θ and σ (or, equivalently, common values of α and β in the second parametrisation).

Proposition 2.3. Let X₁ and X₂ be independent random variables such that X_i ∼ VG₁(r_i, θ, σ, µ_i), i = 1, 2. Then X₁ + X₂ ∼ VG₁(r₁ + r₂, θ, σ, µ₁ + µ₂).

Variance-Gamma random variables can be characterised in terms of independent normal and Gamma random variables. This characterisation is given in the following proposition, which can be found in Barndorff-Nielsen et al. [5].
Proposition 2.4. Let r > 0, θ ∈ ℝ, σ > 0 and µ ∈ ℝ. Suppose that U ∼ N(0, 1) and V ∼ Γ(r/2, 1/2) are independent random variables, and let Z ∼ VG₁(r, θ, σ, µ). Then Z has the same distribution as µ + θV + σ√V U.

Using Proposition 2.4 we can establish the following useful representation of the Variance-Gamma distributions, which appears to be a new result. Indeed, the representation allows us to see that the statistic (1.8) has an asymptotic Variance-Gamma distribution.
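The normal–Gamma mixture representation lends itself to a simple simulation check. The sketch below assumes the representation Z = µ + θV + σ√V U with U ∼ N(0, 1) and V ∼ Γ(r/2, 1/2) (shape–rate convention) independent, and compares the sample mean against the mean formula EZ = µ + rθ from (2.3).

```python
import math
import random

def sample_vg1(r, theta, sigma, mu, rng):
    """One draw from VG_1(r, theta, sigma, mu) via the assumed mixture
    Z = mu + theta*V + sigma*sqrt(V)*U, with U ~ N(0,1) and
    V ~ Gamma(shape r/2, rate 1/2)."""
    # random.gammavariate takes (shape, scale); rate 1/2 corresponds to scale 2
    v = rng.gammavariate(r / 2.0, 2.0)
    u = rng.gauss(0.0, 1.0)
    return mu + theta * v + sigma * math.sqrt(v) * u

rng = random.Random(42)
r, theta, sigma, mu = 3.0, 0.5, 1.0, -1.0
n = 200_000
mean = sum(sample_vg1(r, theta, sigma, mu, rng) for _ in range(n)) / n
print(mean)  # should be close to mu + r*theta = 0.5
```

Note that EV = r under this parametrisation, which is what produces the mean µ + rθ.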

A Stein equation for the Variance-Gamma distributions
The following lemma, which characterises the Variance-Gamma distributions, will lead to a Stein equation for the Variance-Gamma distributions. Before stating the lemma, we note that an application of the asymptotic formula (B.4) to the density function (2.2) allows us to deduce the tail behaviour (3.1) of the VG₂(ν, α, β, µ) distribution. Note that the tails are in general not symmetric.
Proof. To simplify the calculations, we prove the result for the special case µ = 0 and α = 1; the general case can then be deduced by applying a simple linear transformation.
Necessity. Suppose that W ∼ VG₂(ν, 1, β, 0). We split the range of integration to obtain the integrals I₁ and I₂, in which p(x) = κ_{ν,β} x^ν e^{βx} K_ν(x), with κ_{ν,β} the normalising constant, denotes the density of W. The integrals I₁ and I₂ exist because f is piecewise twice continuously differentiable and satisfies the conditions (3.3), which, on recalling the tail behaviour of p(x) given in (3.1), ensures that, for k = 0, 1, 2, we have xp(x)f^(k)(x) = o(|x|^{−1}) as |x| → ∞.
We now note that p(x) = O(x^{ν−1/2}e^{−(1−β)x}) as x → ∞ (see (3.1)), and by differentiating p(x) and using the asymptotic formula (B.4) for K_ν(x) we can see that p′(x) is of the same order, so that the boundary terms are equal to 0 in the limit x → ∞. The terms xp(x)f′(x) and xp(x)f(x) are also equal to 0 at the origin, because f and f′ are continuous and thus bounded at the origin. Hence, on using formula (B.7) to differentiate K_ν(x), it remains to calculate a limit, which we now do. We first consider the case ν > 0. Applying the asymptotic formula (B.2) gives the value of the limit, since νΓ(ν) = Γ(ν + 1). Now consider the case ν = 0. We use the fact that K₁(x) = K₋₁(x), together with Γ(1) = Γ(2), to obtain the same value. Finally, we consider the case −1/2 < ν < 0, in which we use the fact that K₋λ(x) = K_λ(x). A similar argument (with the difference being that here p(x) = κ_{ν,β}(−x)^ν e^{βx} K_ν(−x)) gives the corresponding expression for I₂. As f is continuous, it follows that I₁ = −I₂ (and so I₁ + I₂ = 0), which completes the proof of necessity.

Sufficiency. For fixed z ∈ ℝ, let f(x) := f_z(x) be a bounded solution to the differential equation (3.5). This solution and its first derivative are bounded (see Lemma 3.3), and it is piecewise twice differentiable. As f_z and f′_z are bounded, they satisfy the condition (3.3) (with α = 1), and f″_z must also satisfy the condition because, from (3.5), it can be expressed in terms of f_z and f′_z. Therefore W has the VG₂(ν, 1, β, 0) distribution.

Lemma 3.1 suggests the Stein equation (3.6) for the VG₂(ν, α, β, µ) distribution. In order to simplify the calculations of Section 3.2, we will make use of the Stein equation for the VG₂(ν, 1, β, 0) distribution, where −1 < β < 1; results for the full parametrisation can then be recovered by making a simple linear transformation. For the VG₂(ν, 1, β, 0) distribution, the Stein equation (3.6) reduces to (3.7). Changing parametrisation in (3.6) via (2.1) and multiplying through by σ² gives the VG₁(r, θ, σ, µ) Stein equation (1.7), which we presented in the introduction.
Remark 3.2. The VG₁(r, θ, σ, µ) Stein equation has the interesting property of being a (true) second order linear differential equation. Such Stein equations are uncommon in the literature, although Peköz et al. [32], Gaunt [20] and Pike and Ren [34] have obtained similar operators for the Kummer densities, the product of two mean zero normals, and the Laplace distribution, respectively. Gaunt [20] and Pike and Ren [34] used the method of variation of parameters (see Collins [12] for an account of the method) to solve their equations, whereas Peköz et al. used a substitution to turn their second order operator into a first order operator, which leads to a double integral solution. We attempted to follow this approach, but the double integral solution we obtained was rather complicated. However, solving using variation of parameters led to a representation of the solution (see Lemma 3.3) that enabled us to obtain uniform bounds for the solution and its first four derivatives (see Lemma 3.5 and Theorem 3.6).
We could have obtained a first order Stein operator for the VG 1 (r, θ, σ, µ) distributions using the density approach of Stein et al. [43]. However, this approach would lead to an operator involving the modified Bessel function K ν (x). Using such a Stein equation to prove approximation results with standard coupling techniques would be difficult. In contrast, our VG 1 (r, θ, σ, µ) Stein equation is much more amenable to the use of couplings, as we shall see in Section 4. Peköz et al. [32] encountered a similar situation (the density approach would lead to an operator involving the Kummer function) and proceeded as we did by instead considering a second order operator with simple coefficients.

Special cases of the Variance-Gamma Stein equation
Here we note a number of interesting special cases of the VG₁(r, θ, σ, µ) Stein equation. Whilst the Gamma distribution is not covered by Lemma 3.1, we note that setting r = 2s, θ = (2λ)^{−1}, µ = 0 and taking the limit σ → 0 in (1.7) gives a Stein equation which, recalling (1.3), we recognise as the Γ(s, λ) Stein equation of Luk [27] (up to a multiplicative factor).
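The Gamma case can be illustrated numerically. The sketch below assumes Luk's characterising identity in the form E[Xg″(X) + (s − λX)g′(X)] = 0 for X ∼ Γ(s, λ), which matches the operator in (1.3), and estimates the expectation by Monte Carlo.

```python
import math
import random

def gamma_stein_residual(s, lam, gp, gpp, n=200_000, seed=7):
    """Monte Carlo estimate of E[X g''(X) + (s - lam*X) g'(X)] for
    X ~ Gamma(shape s, rate lam); the characterisation says this is 0
    for sufficiently smooth g."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # random.gammavariate takes (shape, scale); rate lam -> scale 1/lam
        x = rng.gammavariate(s, 1.0 / lam)
        total += x * gpp(x) + (s - lam * x) * gp(x)
    return total / n

res = gamma_stein_residual(2.5, 1.5, math.cos, lambda x: -math.sin(x))
print(abs(res))  # close to 0
```

Here `gp` and `gpp` are the first two derivatives of the test function (g = sin above); the residual should shrink as n grows.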
We also note the Stein equation for the VG₁(r, 0, σ/√r, µ) distribution. Specialising the parameters gives the following Stein equation for the distribution of the product of independent N(0, σ²_X) and N(0, σ²_Y) random variables (see part (iii) of Proposition 1.2). This Stein equation is in agreement with the Stein equation for the product of two independent, zero mean normal random variables that was obtained by Gaunt [20].

Finally, we deduce a Stein equation for the Laplace distribution. Recalling part (ii) of Proposition 1.2, we have that Laplace(0, σ) = VG₁(2, 0, σ, 0). Thus, we deduce the Laplace Stein equation (3.8), where X ∼ Laplace(0, σ). Pike and Ren [34] have obtained an alternative Stein characterisation of the Laplace distribution, which leads to the initial value problem (3.9). They have also solved (3.9) and obtained uniform bounds for the solution and its first three derivatives. Their characterisation was obtained by a repeated application of the density method, and is similar to the characterisation of the Exponential distribution that results from the density method (see Stein et al. [43], Example 1.6), which leads to the Stein equation (3.10), where Y ∼ Exp(λ).

Since Exp(λ) = Γ(1, λ), equation (3.10) and the Gamma Stein equation (1.3) (with r = 1) give a choice of Stein equations for applications involving the Exponential distribution. Both equations have been shown to be effective in the study of Exponential approximation, but in certain situations one equation may prove more useful than the other; see, for example, Pickett [33] for a utilisation of (1.3), and Peköz and Röllin [31] for an application involving (3.10). We would expect a similar situation to occur with the Laplace Stein equations (3.8) and (3.9), although we do not investigate the use of these Stein equations in Laplace approximation further.

Applications of Lemma 3.1
The main application of Lemma 3.1 considered in this paper is the use of the resulting Stein equation in the proofs of the limit theorems of Section 4. There are, however, other interesting results that follow from Lemma 3.1, and we consider a couple here.

Suppose W ∼ VG₂(ν, α, β, 0). Taking f(x) = e^{tx}, where |t + β| < α (which ensures that condition (3.3) is satisfied), in the characterising equation (3.2) and setting M(t) = E[e^{tW}], we deduce that M(t) satisfies a first order differential equation. Solving this equation subject to the condition M(0) = 1 then gives that the moment generating function of the Variance-Gamma distribution with µ = 0 is

M(t) = ((α² − β²)/(α² − (β + t)²))^{ν+1/2}.

Similarly, taking f(x) = x^k and setting M_k = EW^k leads to a recurrence equation for the moments of the Variance-Gamma distributions with µ = 0, which can also be written in terms of the first parametrisation. We have that M₀ = 1 and M₁ = (2ν + 1)β/(α² − β²) = rθ (see (2.3)), and thus we can solve these recurrence equations by forward substitution to obtain the moments of the Variance-Gamma distributions. As far as the author is aware, these recurrence equations are new, although Scott et al. [40] have already established a formula for the moments of general order of the Variance-Gamma distributions.
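The moment generating function can be sanity-checked in a special case. Assuming the mgf takes the form M(t) = ((α² − β²)/(α² − (β + t)²))^{ν+1/2}, then for ν = 0, α = 1, β = 0, which corresponds to the product of two independent standard normals (parts (iii)–(iv) of Proposition 1.2), it reduces to (1 − t²)^{−1/2}. The sketch below compares this with a Monte Carlo estimate of E[e^{tUV}].

```python
import math
import random

def mc_mgf_product_normals(t, n=200_000, seed=3):
    """Monte Carlo estimate of E[exp(t*U*V)] for independent U, V ~ N(0,1).

    The product U*V has the VG_2(0, 1, 0, 0) distribution, so the estimate
    should match the claimed mgf (1 - t^2)^(-1/2) for |t| < 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += math.exp(t * rng.gauss(0, 1) * rng.gauss(0, 1))
    return total / n

t = 0.3  # keep |t| well inside (-1, 1) so the estimator has finite variance
est = mc_mgf_product_normals(t)
exact = 1.0 / math.sqrt(1.0 - t * t)
print(est, exact)  # the two values should agree to about two decimal places
```

The restriction on t reflects the condition |t + β| < α above: the mgf only exists on a bounded interval.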

Smoothness estimates for the solution of the Stein equation
We now turn our attention to solving the VG₂(ν, 1, β, 0) Stein equation (3.7). Working with this restricted set of parameters simplifies the calculations, and the solution of the VG₁(r, θ, σ, µ) Stein equation can then be written down after a straightforward change of variables.
Since the homogeneous version of the VG₂(ν, 1, β, 0) Stein equation has a simple fundamental system of solutions (see the proof of Lemma 3.3 in Appendix A), we consider variation of parameters to be an appropriate method of solution. We carry out these calculations in Appendix A and present the solution in Lemma 3.3. We could also have solved the Stein equation by using generator theory: multiplying both sides of (3.7) by 1/x, we recognise the left-hand side of the equation as the generator of a Bessel process with drift, with killing (for an account of the Bessel process with drift see Linetsky [25]). The Stein equation can then be solved using generator theory (see Durrett [14], p. 249). For a more detailed account of the application of the generator approach to Stein's method for Variance-Gamma distributions see Gaunt [18].
In the following lemma we give the solution to the Stein equation; the proof is given in Appendix A.

Lemma 3.3. Let h : ℝ → ℝ be a measurable function with E|h(X)| < ∞, where X ∼ VG₂(ν, 1, β, 0), ν > −1/2 and −1 < β < 1. Then a solution f : ℝ → ℝ to the Variance-Gamma Stein equation (3.7) is given by (3.11), where the modified Bessel functions I_ν(x) and K_ν(x) are defined in Appendix B. Suppose further that h is bounded; then f(x) and f′(x) are bounded for all x ∈ ℝ. Moreover, this is the unique bounded solution when ν ≥ 0 and −1 < β < 1.
This equality is very useful when it comes to obtaining smoothness estimates for the solution of the Stein equation: it ensures that we can restrict our attention to bounding the derivatives in the region x ≥ 0, provided we obtain these bounds for both positive and negative β.
By direct calculation it is possible to bound the derivatives of the solution of the VG₂(ν, 1, β, 0) Stein equation (3.7). Gaunt [18] carried out these (rather lengthy) calculations for the case β = 0, to obtain uniform bounds on the solution of the Stein equation (3.7) and its first four derivatives. By a change of variables it is then possible to establish smoothness estimates for the solution of the VG₁(r, 0, σ, µ) Stein equation (1.7).
Bounds on the first four derivatives of the solution of the VG₁(r, 0, σ, µ) Stein equation are sufficient for the limit theorems of Section 4. However, it would be desirable to extend these bounds to the general case of the VG₁(r, θ, σ, µ) Stein equation. Another open problem is to obtain uniform bounds for the derivatives of all orders for the solution of the VG₁(r, 0, σ, µ) Stein equation. This has been achieved for the normal and Gamma Stein equations (see the bounds of Goldstein and Rinott [22] and Luk [27]) using the generator approach. Gaunt [18] made some progress towards the goal of achieving such bounds via the generator approach, but the problem remains unsolved.
Gaunt [18] remarked that the rogue 1/ sin(πν) term appeared to be an artefact of the analysis that was used to obtain the bounds. It is also worth noting that the bounds of Lemma 3.5 break down as ν → −1/2. This is to be expected, because in this limit the VG 2 (ν, 1, 0, 0) distribution approaches a point mass at the origin.
The bounds simplify in the case that ν ∈ {0, 1/2, 1, 3/2, . . .}, and from these bounds we use a simple change of variables to obtain uniform bounds on the first four derivatives of the solution of the VG₁(r, 0, σ, µ) Stein equation (1.7) for the case that r is a positive integer. These bounds are of order r^{−1/2} as r → ∞, which is the same order as Pickett's [33] bounds (1.4) for the solution of the Γ(r, λ) Stein equation (1.3).
Theorem 3.6. Let r be a positive integer and let σ > 0. Suppose that h ∈ C³_b(ℝ). Then the solution of the VG₁(r, 0, σ, µ) Stein equation (1.7) and its derivatives up to fourth order satisfy the following bounds.

Proof. Let g_h(x) denote the solution (3.11) to the VG₂(ν, 1, 0, 0) Stein equation (3.7). That this yields the solution of the VG₁(r, 0, σ, µ) Stein equation is verified by a direct calculation, in which we make the change of variables u = (x − µ)/σ. Writing h̃(u) = h(σu + µ), we have that ‖h̃ − VG^{ν,1}_{0,0}h̃‖ = ‖h − VG^{r,0}_{σ,µ}h‖ and ‖h̃^(k)‖ = σ^k‖h^(k)‖ for k ≥ 1, and the result now follows from the bounds of Lemma 3.5.

Limit theorems for Symmetric Variance-Gamma distributions
We now consider the Symmetric Variance-Gamma (θ = 0) limit theorem that we discussed in the introduction. Let X be an m × r matrix of independent and identically distributed random variables X_ik with zero mean and unit variance. Similarly, let Y be an n × r matrix of independent and identically distributed random variables Y_jk with zero mean and unit variance, where the Y_jk are independent of the X_ik. Then the statistic

W_r = (mn)^{−1/2} Σ_{k=1}^{r} Σ_{i=1}^{m} Σ_{j=1}^{n} X_ik Y_jk

is asymptotically VG₁(r, 0, 1, 0) distributed, which can be seen by applying the central limit theorem and part (iv) of Proposition 1.2. Pickett [33] showed that a related statistic, built from independent and identically distributed random variables with zero mean, unit variance and bounded eighth moment, converges to a χ²(d) random variable at a rate of order m^{−1} for smooth test functions. We now exhibit a proof that bounds the rate of convergence of the statistic W_r to its limiting VG₁(r, 0, 1, 0) distribution, under additional moment assumptions and using similar symmetry arguments; the bound is shown to be of order m^{−1} + n^{−1} for smooth test functions.
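The limiting behaviour can be seen in a small simulation. The sketch below assumes the normalisation W_r = (mn)^{−1/2} Σ_{i,j,k} X_ik Y_jk and uses Rademacher (±1) entries; the limiting VG₁(r, 0, 1, 0) law, a sum of r products of independent standard normals, has mean 0 and variance r, and indeed Var(W_r) = r exactly for any m and n here.

```python
import random

def sample_w(m, n, r, rng):
    """One draw of W_r = (mn)^(-1/2) * sum_{i,j,k} X_ik Y_jk,
    computed as a sum over k of products of standardised row sums."""
    w = 0.0
    for _ in range(r):
        s = sum(rng.choice((-1.0, 1.0)) for _ in range(m)) / m ** 0.5
        t = sum(rng.choice((-1.0, 1.0)) for _ in range(n)) / n ** 0.5
        w += s * t
    return w

rng = random.Random(0)
m, n, r, reps = 30, 40, 2, 20_000
draws = [sample_w(m, n, r, rng) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((w - mean) ** 2 for w in draws) / reps
print(mean, var)  # mean near 0 and variance near r = 2
```

Matching the first two moments is of course far weaker than the distributional bound proved below; it is only a consistency check on the normalisation.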

Local approach bounds for Symmetric Variance-Gamma distributions in the case r = 1
We first consider the case r = 1; the general r case follows easily, as W_r is a sum of independent copies of W₁. For ease of reading, in the statement of the following theorem and in its proof we shall set X_i ≡ X_i1, Y_j ≡ Y_j1 and W ≡ W₁. Then we have the following theorem.

The statistic W = (mn)^{−1/2} Σ_{i,j=1}^{m,n} X_iY_j is symmetric in m and n and in the random variables X_i and Y_j, and yet the bound (4.1) of Theorem 4.1 is not symmetric in m and n and the moments of X and Y. This asymmetry is a consequence of the local couplings that we used to obtain the bound.
In practice, when applying Theorem 4.1, we would compute γ^k_{m,n}(X, Y) and γ^k_{n,m}(Y, X) for k = 1, 2, 3, which yields two bounds for the quantity |Eh(W) − VG^{1,0}_{1,0}h|. We would then take the minimum of these two bounds. We proceed in this manner when applying the bound (4.1) to prove Theorem 4.12.
Before proving Theorem 4.1, we introduce some notation and preliminary lemmas. We define the standardised sums S and T by S = m^{−1/2} Σ_{i=1}^{m} X_i and T = n^{−1/2} Σ_{j=1}^{n} Y_j, so that W = ST. In our proof we shall make use of the leave-one-out sums S_i = S − m^{−1/2}X_i and T_j = T − n^{−1/2}Y_j, which are independent of X_i and Y_j, respectively. In the proof of Theorem 4.1 we use the following lemma, which can be found in Pickett [33], Lemma 4.3.
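The leave-one-out construction rests on an exact algebraic identity. The sketch below assumes the definitions S_i = S − X_i/√m and W = ST, and verifies numerically that W = S_iT + (X_i/√m)T for every i, which is the decomposition the local coupling exploits.

```python
import random

rng = random.Random(5)
m, n = 7, 9
X = [rng.gauss(0, 1) for _ in range(m)]
Y = [rng.gauss(0, 1) for _ in range(n)]

S = sum(X) / m ** 0.5
T = sum(Y) / n ** 0.5
W = S * T

# S_i excludes X_i, so S_i is independent of X_i; the coupling uses the
# exact identity W = S_i*T + (X_i/sqrt(m))*T for every i.
for i in range(m):
    S_i = S - X[i] / m ** 0.5
    assert abs(W - (S_i * T + X[i] * T / m ** 0.5)) < 1e-12
print("leave-one-out identity holds for all i")
```

The identity holds deterministically, not just in expectation; independence of S_i from X_i is what makes the Taylor expansions about S_iT tractable.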
We will also use the following lemma.
Proof. Applying Jensen's inequality gives the result, as required.
Using the VG₁(1, 0, 1, 0) Stein equation (1.9) (with r = 1), we require a bound on the expression E[Wf″(W) + f′(W) − Wf(W)]. We split the proof into two parts. In the first part of the proof, we use local couplings and Taylor expansions to bound this expression by the remainder terms that result from our Taylor expansions. Most of these terms are shown to be of the desired order O(m^{−1} + n^{−1}), but the bounding of some of the terms is more involved. The second part of the proof is devoted to bounding these terms to the required order.
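The target expression E[Wf″(W) + f′(W) − Wf(W)] vanishes exactly when W has the VG₁(1, 0, 1, 0) distribution, i.e. when W is the product of two independent standard normals (Proposition 1.2). A Monte Carlo sketch of this fact, for the test choice f = sin:

```python
import math
import random

def vg_stein_residual(f, fp, fpp, n=200_000, seed=11):
    """Monte Carlo estimate of E[W f''(W) + f'(W) - W f(W)], where
    W = U*V is the product of independent standard normals and hence
    has the VG_1(1, 0, 1, 0) distribution; the expectation should vanish."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0, 1) * rng.gauss(0, 1)
        total += w * fpp(w) + fp(w) - w * f(w)
    return total / n

res = vg_stein_residual(math.sin, math.cos, lambda x: -math.sin(x))
print(abs(res))  # close to 0
```

The proof below bounds exactly this quantity when W is instead the standardised double sum, so the residual measures the distance to the limit.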

Proof Part I: Expansions and Bounding
Due to the independence of the X_i and Y_j variables, we are in the realm of local approach couplings. We Taylor expand f(W) about S_iT, writing S_i^[1] = S_i + θ₁(S − S_i) for some θ₁ ∈ (0, 1); later in the proof we shall similarly write T_j^[q] = T_j + θ_q(T − T_j), where θ_q ∈ (0, 1). Using independence and the fact that EX_i = 0, we obtain a decomposition into remainder terms. We begin by bounding R₁ and R₂. Taylor expanding f″(S_iT) about W and using (4.2) gives a bound in which we used that the random variables X, X₁, . . ., X_m are identically distributed. In obtaining the last inequality we used that ET⁴ ≤ 3 + EY⁴/n and that E|X_i| ≤ (EX_i²)^{1/2} = 1. Bounding the term (1/√m)|ET³f″(W)| to the desired order of O(m^{−1} + n^{−1}) is somewhat involved and is deferred until part II of the proof.
The bound for R₂ is immediate. We now consider N₁. We use independence and that EX_i² = 1, and then Taylor expand,
where we used independence and that the X i have zero mean to obtain the final inequality.
Putting this together, we obtain a further remainder decomposition. We first consider R₄: Taylor expanding f′(W) about ST_j, where we used independence and that EY_j = 0 and EY_j² = 1 to obtain the final equality, and then Taylor expanding f″(ST_j) about W, gives a remainder involving EY_jS²f^(3)(ST_j^[6]).
Putting this together, we have the following bound for R₄. As was the case with the term (1/√m)|ET³f″(W)|, bounding the quantity (1/√n)|ESf″(W)| to the desired order of O(m^{−1} + n^{−1}) is somewhat involved and is deferred until part II of the proof.
We now consider N₂. Taylor expanding f′(W) about ST_j, and then using independence and that EY_j = 0 and EY_j² = 1, gives the next bound. We then use independence and that the Y_j have zero mean, and Taylor expand again. To bound R₇ we Taylor expand f″(W) about ST_j and use independence and that the Y_j have zero mean.

To summarise, at this stage we have shown that |Eh(W) − VG^{1,0}_{1,0}h| ≤ Σ_{k=1}^{7} |R_k|. We have also bounded all terms to order m^{−1} + n^{−1}, except for the terms (1/√n)ESf″(W), (1/√n)EST²f″(W) and (1/√m)ET³f″(W). In part II of the proof we shall use symmetry arguments to bound these terms to the required order. But before doing so, we obtain a useful bound for ES²Tf^(3)(W) that will ensure that our bound for |Eh(W) − VG^{1,0}_{1,0}h| only involves bounds on the first four derivatives of the solution of the VG₁(1, 0, 1, 0) Stein equation (1.7), and hence only involves the supremum norms of the first three derivatives of the test function h. The bound is given in the following lemma, which is proved in Appendix A.

Lemma 4.5. Let f : ℝ → ℝ be four times differentiable. Then

Proof Part II: Symmetry Argument for Optimal Rate
We now obtain bounds for (1/√n)ESf″(W), (1/√n)EST²f″(W) and (1/√m)ET³f″(W). To obtain the desired rate of convergence we shall use symmetry arguments similar to those used in Section 4.1.2 of Pickett [33] to achieve the optimal rate of convergence for chi-square limit theorems.
and therefore Eg(Z₁, Z₂) = 0. We now apply Lemma 4.6 and then perform Taylor expansions to bound the expectations Eg_k(S, T). Provided that a solution ψ_k exists for the test function g_k, we obtain the required expansions. Before we bound the remainder terms, we need bounds for the third order partial derivatives of the solution ψ_k in terms of the derivatives of f. We achieve this task by using the following lemma, the proof of which is given in Appendix A. Before stating the lemma, we define the double factorial function. The double factorial of a positive integer n is given by

n!! = 1 · 3 · 5 ⋯ (n − 2) · n, for n > 0 odd,
n!! = 2 · 4 · 6 ⋯ (n − 2) · n, for n > 0 even, (4.4)

and we define (−1)!! = 0!! = 1 (Arfken [1], p. 547).
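The definition (4.4), with the conventions (−1)!! = 0!! = 1, translates directly into code:

```python
def double_factorial(n):
    """n!! as in (4.4): the product of every other integer down from n,
    with the conventions (-1)!! = 0!! = 1."""
    if n in (-1, 0):
        return 1
    result = 1
    while n > 0:
        result *= n
        n -= 2
    return result

print(double_factorial(5), double_factorial(6))  # 15 48
```

For odd n this descends through the odd integers (5 · 3 · 1 = 15) and for even n through the even ones (6 · 4 · 2 = 48), matching the two cases of (4.4).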
With these bounds it is straightforward to bound the remainder terms. The following lemma allows us to easily deduce bounds for the remainder terms R_8^k, R_9^k, R_10^k and R_11^k, k = 1, 2, 3.
Lemma 4.8. Suppose that $f:\mathbb{R}^2\to\mathbb{R}$ is four times differentiable, and let $g(s,t) = s^at^bf''(st)$. The bound for $R_{10}^k$ is similar to the bound for $R_8^k$, but with $\mathbb{E}X^p$ and $\mathbb{E}|X^p|$ replaced by $\mathbb{E}X^{p-2}$ and $\mathbb{E}|X^{p-2}|$, respectively. The bound for $R_{11}^k$ is similar to the bound for $R_9^k$, but with $\mathbb{E}Y^p$ and $\mathbb{E}|Y^p|$ replaced by $\mathbb{E}Y^{p-2}$ and $\mathbb{E}|Y^{p-2}|$, respectively.
Proof. We prove that the bound for $R_8^k$ holds; the bound for $R_9^k$ then follows by symmetry. We begin by defining $S_i^* = S_i + \phi\frac{1}{\sqrt{m}}X_i$, and we note a simple bound for $|S_i^*|^p$, for $p \ge 1$. Using our bound (4.5) for the third-order partial derivative of $\psi$ with respect to $s$, we obtain an estimate in which (4.7) was used for the final inequality. Applying the triangle inequality, the independence of $X_i$ and $S_i$, and that, by Lemma 4.4, $\mathbb{E}|S_i|^p \le \mathbb{E}|S|^p$, gives the desired bound. The final statement of the lemma is clear.
We can bound $R_8^k$, $R_9^k$, $R_{10}^k$ and $R_{11}^k$ by using the bounds in Lemma 4.8. We illustrate the argument by bounding $R_{11}^1$. In this case we have $g_1(s,t) = sf''(st)$; that is, $a = 1$ and $b = 0$. Here we used that $0!! = 1!! = 1$ and that $\mathbb{E}|T| \le \sqrt{\mathbb{E}T^2} = 1$ to obtain the second equality. Continuing in this manner gives the bounds for the remaining terms.
We have therefore been able to bound the terms $\frac{1}{\sqrt{n}}\mathbb{E}Sf''(W)$, $\frac{1}{\sqrt{n}}\mathbb{E}ST^2f''(W)$ and $\frac{1}{\sqrt{m}}\mathbb{E}T^3f''(W)$ to order $m^{-1/2}+n^{-1/2}$. It therefore follows that the remainder terms $R_1,\ldots,R_7$ are of order $m^{-1}+n^{-1}$. We showed in part I of the proof that $|\mathbb{E}h(W) - \mathrm{VG}^{1,0}_{1,0}h| \le \sum_{k=1}^{7}|R_k|$, and so we have achieved the desired $O(m^{-1}+n^{-1})$ bound. We can now sum the remainder terms to obtain the resulting bound. To complete the proof of Theorem 4.1 we simplify this bound by using that $m, n \ge 1$, and that for $a \ge b \ge 2$ we have $\mathbb{E}X^a \ge \mathbb{E}X^b \ge 1$, and then round all numbers up to the nearest integer. Doing so leads to bound (4.1), as required.

Extension to the case r > 1
For the case $r > 1$, we have the following generalisation of Theorem 4.1.

Theorem 4.9. Suppose the $X_{ik}$ and $Y_{jk}$ are defined as before, each with bounded sixth moment. Let $W_r = \frac{1}{\sqrt{mn}}\sum_{i,j,k=1}^{m,n,r}X_{ik}Y_{jk}$. Then, for any positive integer $r$ and $h \in C_b^3(\mathbb{R})$, we have bound (4.8), where the $M_{r,1}^i(h)$ are defined as in Theorem 3.6, $\mathrm{VG}^{r,0}_{1,0}h$ denotes the expectation of $h(Z)$ for $Z \sim \mathrm{VG}_1(r,0,1,0)$, and the $\gamma_i$ are as in Theorem 4.1.
Proof. Define $W^{(k)} = \frac{1}{\sqrt{mn}}\sum_{i,j=1}^{m,n}X_{ik}Y_{jk}$, so that $W_r = \sum_{k=1}^{r}W^{(k)}$. Using the $\mathrm{VG}_1(r,0,1,0)$ Stein equation (1.9), we obtain the decomposition above. Since, for any constant $c$, the derivatives of $x \mapsto g(x+c)$ obey the same supremum-norm bounds as the derivatives of $g$, we may use bound (4.1) from Theorem 4.1 and the bounds of Theorem 3.6 for the derivatives of the solution of the $\mathrm{VG}_1(r,0,1,0)$ Stein equation to bound the above expression, which yields (4.8).
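To illustrate the statistic $W_r$ being approximated, the following Monte Carlo sketch (all names and the concrete choice of Rademacher summands are ours) simulates $W_r$ and checks that its first two moments are close to the limiting values $\mathbb{E}W_r = 0$ and $\operatorname{Var}W_r = r$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, N = 50, 50, 2, 2000

def sample_W_r() -> float:
    # X_ik and Y_jk are i.i.d. with zero mean and unit variance;
    # we take Rademacher (+/-1) variables as a concrete choice.
    X = rng.choice([-1.0, 1.0], size=(m, r))
    Y = rng.choice([-1.0, 1.0], size=(n, r))
    # W_r = (1/sqrt(mn)) * sum_k (sum_i X_ik)(sum_j Y_jk)
    return float(np.dot(X.sum(axis=0), Y.sum(axis=0)) / np.sqrt(m * n))

samples = np.array([sample_W_r() for _ in range(N)])
print(samples.mean(), samples.var())  # approximately 0 and r = 2
```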

Application: Binary Sequence Comparison
We now consider a straightforward application of Theorem 4.1 to binary sequence comparison. This example is a simple special case of the more general problem of word sequence comparison, which is of particular importance for biological sequence comparison. One way of comparing sequences uses $k$-tuples (sequences of letters of length $k$). If two sequences are closely related, we would expect the $k$-tuple content of both sequences to be very similar. A statistic for sequence comparison based on $k$-tuple content, known as the $D_2$ statistic, was suggested by Blaisdell [8] (for other statistics based on $k$-tuple content see Reinert et al. [39]). Letting $\mathcal{A}$ denote an alphabet of size $d$, and $X_w$ and $Y_w$ the number of occurrences of the word $w \in \mathcal{A}^k$ in the first and second sequences, respectively, the $D_2$ statistic is defined by $D_2 = \sum_{w\in\mathcal{A}^k}X_wY_w$. Due to the complicated dependence structure at both the local and global level (for a detailed account of the dependence structure see Reinert et al. [37]), approximating the asymptotic distribution of $D_2$ is a difficult problem. However, for certain parameter regimes $D_2$ has been shown to be asymptotically normal or Poisson; see Lippert et al. [26] for a detailed account of the asymptotic distributions of $D_2$ for different parameter values.
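As a concrete illustration of the $k$-tuple content comparison described above, a minimal sketch (the function name is ours) that counts overlapping $k$-tuples in two sequences and forms $D_2 = \sum_w X_wY_w$:

```python
from collections import Counter

def d2_statistic(seq1: str, seq2: str, k: int) -> int:
    """D2 statistic: sum over k-tuples w of X_w * Y_w, where X_w and
    Y_w are the (overlapping) occurrence counts of w in each sequence."""
    counts1 = Counter(seq1[i:i + k] for i in range(len(seq1) - k + 1))
    counts2 = Counter(seq2[i:i + k] for i in range(len(seq2) - k + 1))
    return sum(x * counts2[w] for w, x in counts1.items())

print(d2_statistic("0110", "0101", 1))  # X_0*Y_0 + X_1*Y_1 = 2*2 + 2*2 = 8
```

A `Counter` returns 0 for a missing word, so $k$-tuples appearing in only one sequence contribute nothing, exactly as in the definition.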
We now consider the case of an alphabet of size 2, with comparison based on 1-tuple content. We suppose that the sequences have lengths $m$ and $n$. We take the alphabet to be $\{0,1\}$, with $\mathbb{P}(0 \text{ appears}) = \mathbb{P}(1 \text{ appears}) = \frac{1}{2}$. Denoting the numbers of occurrences of 0 in the two sequences by $X$ and $Y$, respectively, we have $D_2 = XY + (m-X)(n-Y)$. Clearly, $X$ and $Y$ are independent binomial random variables with expectations $\frac{m}{2}$ and $\frac{n}{2}$, respectively. Since $\mathbb{E}X^2 = \frac{m(m+1)}{4}$, it is easy to compute the mean and variance of $D_2$, which are given by $\mathbb{E}D_2 = \frac{mn}{2}$ and $\operatorname{Var}D_2 = \frac{mn}{4}$.
Proof. We can write the number of occurrences of the letter 0 in the first sequence as $X = \sum_{i=1}^{m}I_i$, where $I_i$ is the indicator that the letter 0 appears at position $i$ in the first sequence. Similarly, the number of occurrences of the letter 0 in the second sequence is given by $Y = \sum_{j=1}^{n}J_j$, where $J_j$ is the indicator that the letter 0 appears at position $j$ in the second sequence. The standardised $D_2$ statistic $W$ may therefore be written as $W = \frac{1}{\sqrt{mn}}\sum_{i=1}^{m}\sum_{j=1}^{n}X_iY_j$, where $X_i = 2I_i - 1$ and $Y_j = 2J_j - 1$. The $X_i$ and $Y_j$ are all independent and have zero mean and unit variance. We may therefore apply Theorem 4.1 with $\mathbb{E}X_i^3 = \mathbb{E}Y_j^3 = 0$ and $\mathbb{E}X_i^4 = \mathbb{E}Y_j^4 = 1$ to obtain bound (4.10).
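The moment computation above can be checked numerically. The following sketch (names ours) evaluates $\mathbb{E}D_2$ and $\operatorname{Var}D_2$ exactly by summing over the binomial distributions of $X$ and $Y$:

```python
from math import comb

def d2_moments(m: int, n: int):
    """Exact mean and variance of D2 = XY + (m-X)(n-Y), where
    X ~ Bin(m, 1/2) and Y ~ Bin(n, 1/2) are independent."""
    px = [comb(m, x) / 2**m for x in range(m + 1)]
    py = [comb(n, y) / 2**n for y in range(n + 1)]
    mean = second = 0.0
    for x, p in enumerate(px):
        for y, q in enumerate(py):
            d2 = x * y + (m - x) * (n - y)
            mean += p * q * d2
            second += p * q * d2 ** 2
    return mean, second - mean ** 2

print(d2_moments(5, 7))  # expect (mn/2, mn/4) = (17.5, 8.75)
```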

A.2 Proof of Lemma 3.3
We begin by proving that there is at most one bounded solution to the Variance-Gamma Stein equation (3.7) when $\nu \ge 0$. Suppose that $u$ and $v$ are bounded solutions to the Stein equation; then $w = u - v$ is a solution to the corresponding homogeneous differential equation, which has general solution $w(x) = Ae^{-\beta x}x^{-\nu}K_\nu(x) + Be^{-\beta x}x^{-\nu}I_\nu(x)$. From the asymptotic formula (B.3) for $I_\nu(x)$, it follows that to have a bounded solution we must take $B = 0$. From the asymptotic formula (B.2) for $K_\nu(x)$, we see that $w(x)$ has a singularity at the origin if $\nu \ge 0$. Therefore, if $\nu \ge 0$, then for $w(x)$ to be bounded we must take $A = 0$, whence $w = 0$ and so $u = v$. We now use variation of parameters (see Collins [12]) to solve the Stein equation (3.7). Suppose $v_1(x)$ and $v_2(x)$ are linearly independent solutions of the homogeneous equation. Then the general solution to the inhomogeneous equation is given by the variation-of-parameters formula, where $a$ and $b$ are arbitrary limits of integration and $W(x) = v_1(x)v_2'(x) - v_1'(x)v_2(x)$ is the Wronskian. It is easy to verify that a pair of linearly independent solutions to the homogeneous equation is $e^{-\beta x}x^{-\nu}K_\nu(x)$ and $e^{-\beta x}x^{-\nu}I_\nu(x)$. However, we take $f_1(x) = e^{-\beta x}|x|^{-\nu}K_\nu(|x|)$ and $f_2(x) = e^{-\beta x}|x|^{-\nu}I_\nu(|x|)$ as our linearly independent solutions to the homogeneous equation; it will become clear later why this is a more suitable basis of solutions. We now show that $f_1$ and $f_2$ are indeed linearly independent solutions to the homogeneous equation. From (B.1), and since formula (B.6) states that $K_\nu(-x) = (-1)^\nu K_\nu(x) - \pi iI_\nu(x)$, it follows that $e^{-\beta x}|x|^{-\nu}K_\nu(|x|)\chi_{(-\infty,0]}(x)$ is a solution to the homogeneous equation.
Since $e^{-\beta x}x^{-\nu}I_\nu(x)$ is a solution to the homogeneous equation that is linearly independent of $e^{-\beta x}x^{-\nu}K_\nu(x)$, it follows that $e^{-\beta x}|x|^{-\nu}K_\nu(|x|)$ is a solution to the homogeneous equation. From (B.9) and (B.8) we may compute the derivatives of $f_1$ and $f_2$, and therefore
$$W(x) = \frac{e^{-2\beta x}\big(I_\nu(|x|)K_{\nu+1}(|x|) + K_\nu(|x|)I_{\nu+1}(|x|)\big)}{x|x|^{2\nu-1}} = \frac{e^{-2\beta x}}{x|x|^{2\nu}},$$
where we used (B.5) to obtain the second equality in the above display. This determines the general solution to the inhomogeneous equation, which is clearly bounded everywhere except possibly at $x = 0$ or in the limits $x \to \pm\infty$. We therefore choose $a$ and $b$ so that our solution is bounded at these points, and thus for all real $x$. To ensure that the solution is bounded at the origin we must take $a = 0$. We choose $b$ so that the solution is bounded in the limits $x \to \pm\infty$: taking $b = \infty$ yields solution (3.11), and taking $b = -\infty$ would lead to the same solution (see Remark 3.4).
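The identity (B.5) used above, $I_\nu(x)K_{\nu+1}(x) + K_\nu(x)I_{\nu+1}(x) = 1/x$, can be checked numerically with SciPy's modified Bessel functions (a sketch, assuming SciPy is available; the function name is ours):

```python
from scipy.special import iv, kv

def wronskian_identity(nu: float, x: float) -> float:
    # Left-hand side of (B.5): I_nu(x) K_{nu+1}(x) + K_nu(x) I_{nu+1}(x)
    return iv(nu, x) * kv(nu + 1, x) + kv(nu, x) * iv(nu + 1, x)

for nu in (0.0, 0.5, 2.0):
    for x in (0.5, 1.0, 3.0):
        assert abs(wronskian_identity(nu, x) - 1.0 / x) < 1e-10
print("identity (B.5) verified")
```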
Solution (3.11) is a candidate bounded solution, and we now verify that this solution and its first derivative are indeed bounded for all $x \in \mathbb{R}$. Straightforward calculations show that, for $x \ge 0$,
$$|f(x)| \le \|h\|\bigg(\frac{e^{-\beta x}K_\nu(x)}{x^\nu}\int_0^x e^{\beta y}y^\nu I_\nu(y)\,\mathrm{d}y + \frac{e^{-\beta x}I_\nu(x)}{x^\nu}\int_x^\infty e^{\beta y}y^\nu K_\nu(y)\,\mathrm{d}y\bigg).$$

Here we used that $\mathbb{E}Y_j = 0$ and $\mathbb{E}Y_j^2 = 1$ to simplify $R_1$. To bound $R_1$ we Taylor expand $f''(ST_j)$ about $W$. We now consider the bound for $N_1$. Using independence and that $\mathbb{E}Y_j = 0$ and $\mathbb{E}Y_j^2 = 1$, and then Taylor expanding $f^{(3)}(ST_j)$ about $W$, allows us to write $N_1$ in the required form. Putting this together, rearranging, applying the triangle inequality, and summing up the remainder terms completes the proof.

A similar inequality holds for $z_t$. With these inequalities we have the following bound:
$$\bigg|\frac{\partial^2 g}{\partial s^2}\bigg| \le e^{-2u}\Big\{2^{a+b}\|f^{(4)}\|(|s|^a+|x|^a)(|t|^{b+2}+|y|^{b+2}) + a2^{a+b-2}\|f^{(3)}\|(|s|^{a-1}+|x|^{a-1})(|t|^{b+1}+|y|^{b+1}) + a(a-1)2^{a+b-4}\|f''\|(|s|^{a-2}+|x|^{a-2})(|t|^{b}+|y|^{b})\Big\}.$$

B Elementary properties of modified Bessel functions
Here we list standard properties of modified Bessel functions that are used throughout this paper. All these formulas can be found in Olver et al. [30], except for the inequalities, which are given in Gaunt [19].