Upper bounds for Stein-type operators

We present sharp bounds on the supremum norm of D^j Sh for j ≥ 2, where D is the differential operator and S the Stein operator for the standard normal distribution. The same method is used to give analogous bounds for the exponential, Poisson and geometric distributions, with D replaced by the forward difference operator in the discrete case. We also discuss applications of these bounds to the central limit theorem, simple random sampling, Poisson-Charlier approximation and geometric approximation using stochastic orderings.


Introduction and main results
Stein's method was first developed for normal approximation by Stein (1972). See Stein (1986) and Chen and Shao (2005) for more recent developments. These powerful techniques were modified by Chen (1975) for the Poisson distribution and have since been applied to many other cases. See, for example, Peköz (1996), Brown and Xia (2001), Erhardsson (2005), Reinert (2005) and Xia (2005).
We consider approximation by a random variable Y and write Φh := E[h(Y)]. Following Stein's method, we assume we have a linear operator A such that E[Ag(Y)] = 0 for all g in some suitable class of functions F. From this we construct the so-called Stein equation Af(x) = h(x) − Φh, whose solution we denote f := Sh. We call S the Stein operator.

If Y ∼ N(0, 1), Stein (1986) shows that we may choose Ag(x) = g′(x) − xg(x) and

Sh(x) = (1/ϕ(x)) ∫_{−∞}^{x} [h(t) − Φh] ϕ(t) dt ,    (1.1)

where ϕ is the density of the standard normal random variable. It is this Stein operator we employ when considering approximation by the standard normal distribution. We also consider approximation by the exponential distribution, the Poisson distribution and the geometric distribution (starting at zero). In the case of the exponential distribution with mean λ^{−1} we use the Stein operator (1.2), defined for x ≥ 0. When Y has the Poisson distribution with mean λ we let Sh be given by (1.3) for k ≥ 1; see, for example, Erhardsson (2005). We discuss possible choices of Sh(0) following the proof of Theorem 1.3. For the geometric distribution with parameter p = 1 − q we define Sh(0) = 0 and let Sh be given by (1.4) for k ≥ 1.

Essential ingredients of Stein's method are the so-called Stein factors, which give bounds on the derivatives (or forward differences) of the solutions of our Stein equation. Theorems 1.1–1.4 present a selection of such bounds. Throughout, we let D and ∆ denote the differential and forward difference operators, respectively. As usual, the supremum norm of a real-valued function g is given by ‖g‖∞ := sup_x |g(x)|.
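As a quick numerical sanity check (ours, not part of the original argument), the characterising property of the normal Stein operator, E[g′(Y) − Y g(Y)] = 0 for Y ∼ N(0, 1), can be verified exactly for polynomial test functions using the standard normal moments E[Y^m] = (m − 1)!! for even m and 0 for odd m:

```python
# Check E[g'(Z) - Z g(Z)] = 0 for Z ~ N(0,1) and polynomial g.
# Polynomials are represented as coefficient lists, index = power of x.

def normal_moment(m):
    # E[Z^m]: 0 for odd m, (m-1)!! for even m
    if m % 2 == 1:
        return 0
    r = 1
    for i in range(1, m, 2):
        r *= i
    return r

def expect_poly(p):
    # E[p(Z)] for a coefficient list p
    return sum(c * normal_moment(i) for i, c in enumerate(p))

def stein_expectation(g):
    dg = [i * c for i, c in enumerate(g)][1:]      # coefficients of g'
    xg = [0] + g                                   # coefficients of x*g(x)
    return expect_poly(dg) - expect_poly(xg)

for g in ([1, 2, 3], [0, 0, 0, 1], [5, -1, 0, 2, 7]):
    assert stein_expectation(g) == 0
print("E[g'(Z) - Z g(Z)] = 0 for polynomial g")
```

Since polynomial moments are computed exactly, the check is exact rather than Monte Carlo.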
Theorem 1.2. Let k ≥ 0 and S be the Stein operator given by (1.2). For h : R_+ → R (k+1)-times differentiable with D^k h absolutely continuous,

Theorem 1.3. Let k ≥ 0 and S be the Stein operator given by (1.3). For h :

Theorem 1.4. Let k ≥ 0 and S be the Stein operator given by (1.4). For h :

Our work is motivated by that of Stein (1986, Lemma II.3). Stein proves that, for h : R → R bounded and absolutely continuous and S the Stein operator for the standard normal distribution,

Our work extends this result, though it relies on the differentiability of h. Some of Stein's bounds above may be applied when h is not differentiable. Barbour (1986, Lemma 5) shows that for j ≥ 0

Analogous bounds appear in Barbour (1990) and Götze (1991), again in the case of the standard normal distribution: for a fixed j ≥ 1 and h : R → R with j bounded derivatives,

It is straightforward to verify that this bound is also sharp: we take h = H_j, the jth Hermite polynomial, and get that Sh = −H_{j−1}.
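The sharpness example can be checked symbolically. Assuming the probabilists' normalisation He_j (the one compatible with the operator g′(x) − xg(x)), f = −He_{j−1} satisfies the Stein equation f′(x) − x f(x) = He_j(x), using E[He_j(Y)] = 0 for j ≥ 1 and the recurrence He_j(x) = x He_{j−1}(x) − (j−1) He_{j−2}(x). A small sketch with polynomials as coefficient lists:

```python
# Verify that f = -He_{j-1} solves f'(x) - x f(x) = He_j(x), i.e. Sh = -He_{j-1}
# for h = He_j (probabilists' Hermite polynomials).

def poly_mul_x(p):          # multiply a polynomial by x
    return [0.0] + p

def poly_diff(p):           # differentiate a polynomial
    return [i * c for i, c in enumerate(p)][1:] or [0.0]

def poly_sub(p, q):
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p)); q = q + [0.0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def hermite(j):             # He_j via He_{m+1} = x He_m - m He_{m-1}
    he = [[1.0], [0.0, 1.0]]
    for m in range(1, j):
        he.append(poly_sub(poly_mul_x(he[m]), [m * c for c in he[m - 1]]))
    return he[j]

def stein_apply(f):         # A f = f' - x f
    return poly_sub(poly_diff(f), poly_mul_x(f))

for j in range(1, 8):
    lhs = stein_apply([-c for c in hermite(j - 1)])   # A(-He_{j-1})
    rhs = hermite(j)
    n = max(len(lhs), len(rhs))
    lhs += [0.0] * (n - len(lhs)); rhs += [0.0] * (n - len(rhs))
    assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs)), j
print("Sh = -He_{j-1} verified for j = 1..7")
```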
Bounds of a similar type to ours have also been established for Stein-type operators relating to the chi-squared distribution and the weak law of large numbers. See Reinert (1995, Lemma 2.5) and Reinert (2005, Lemmas 3.1 and 4.1).

The central limit theorem and simple random sampling
Motivated by bounds in the central limit theorem proved, for example, by Ho and Chen (1978), Bolthausen (1984) and Barbour (1986), we consider applications of our Theorem 1.1.
Using the bound (1.6), Goldstein and Reinert (1997) give a proof of the central limit theorem; we could instead use our Corollary 1.5. We can further follow Goldstein and Reinert (1997) in applying our Corollary 1.5 to the case of simple random sampling, obtaining the following analogue of their Theorem 4.1.
Corollary 1.6. Let X_1, …, X_n be a simple random sample from a set A of N (not necessarily distinct) real numbers satisfying ∑_{a∈A} a = ∑_{a∈A} a^3 = 0.
Set W := X_1 + ⋯ + X_n and suppose Var(W) = 1. Let h : R → R be twice differentiable, with h and Dh absolutely continuous. Then

where C := C_1(N, n, A) is given by Goldstein and Reinert (1997, page 950).
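To make the hypotheses of Corollary 1.6 concrete, here is a small numerical illustration (the population values are our own toy example): a set A symmetric about zero, so that the odd-power sum conditions hold, rescaled so that Var(W) = 1, with the variance checked exactly by enumerating all samples of size n.

```python
import itertools, math

# Toy instance of the simple-random-sampling setting of Corollary 1.6.
raw = [-2.0, -1.0, 1.0, 2.0, -0.5, 0.5]   # symmetric: odd power sums vanish
N, n = len(raw), 3
assert abs(sum(raw)) < 1e-12 and abs(sum(a**3 for a in raw)) < 1e-12

# For SRS without replacement, Var(W) = n * sigma^2 * (N - n)/(N - 1),
# where sigma^2 = (1/N) * sum a^2.  Rescale A so that this equals 1.
sigma2 = sum(a * a for a in raw) / N
scale = 1.0 / math.sqrt(n * sigma2 * (N - n) / (N - 1))
A = [a * scale for a in raw]

# Check Var(W) = 1 exactly by enumerating all C(N, n) equally likely samples.
samples = list(itertools.combinations(A, n))
mean_W = sum(map(sum, samples)) / len(samples)
var_W = sum((sum(s) - mean_W) ** 2 for s in samples) / len(samples)
assert abs(mean_W) < 1e-12 and abs(var_W - 1.0) < 1e-9
print("Var(W) = 1 after scaling; symmetry conditions hold")
```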
Applications of our Theorem 1.1 to other CLT-type results come from combining our work with Proposition 4.2 of Lefèvre and Utev (2003). This gives us Corollary 1.7.
Corollary 1.7. Let X_1, …, X_n be independent random variables, each with zero mean, such that Var(X_1) + ⋯ + Var(X_n) = 1. Suppose further that, for a fixed k ≥ 0,

for all i = 1, …, k + 2 and j = 1, …, n. Write W = X_1 + ⋯ + X_n. If h : R → R is (k+1)-times differentiable with D^k h absolutely continuous, then

We need show only the value of the universal constant p_3 to establish Corollary 1.7; this proof is given in Section 2. The remainder of the result follows immediately from Lefèvre and Utev's (2003) work.
As noted by the referee, the bounds in our Theorem 1.1 may also be applied to Edgeworth expansions for suitable random variables X_1, …, X_n; see, for example, Barbour (1986), Rinott and Rotar (2003) and Rotar (2005). In the large deviation case we may employ our Corollary 1.7 to give an improvement on a constant of Lefèvre and Utev (2003, page 364). We consider a sequence X, X_1, X_2, … of iid random variables with zero mean and unit variance, and write W := X_1 + ⋯ + X_n. We let Y ∼ N(0, 1). Define t ∈ R and the random variable U, with expected value α, as in Lefèvre and Utev (2003, page 364). Following an analogous argument, we apply our Corollary 1.7 to the function h(x) := x + e^{−rx}, where r := t nVar(U), to obtain

By modifying the representation in Lefèvre and Utev (2003, page 361) used in deriving Corollary 1.7 we may also prove the following bound.
We postpone the proof of Corollary 1.8 until Section 2.

Poisson-Charlier approximation
We suppose X_1, …, X_n are independent random variables with P(X_i = 1) = 1 − P(X_i = 0) = p_i, write W := X_1 + ⋯ + X_n and λ := p_1 + ⋯ + p_n, and let κ_j be the jth factorial cumulant of W. We follow Barbour (1987) and Barbour et al. (1992) in considering the approximation of W by the Poisson-Charlier signed measures {Q_l : l ≥ 1} on Z_+, where ∑_{[s]} denotes the sum over all s-tuples (r_1, …, r_s) ∈ (Z_+)^s with ∑_{j=1}^{s} j r_j = s, and we let R := ∑_{j=1}^{s} r_j. Barbour (1987) shows that, for h : Z_+ → R bounded, (1.7) holds. We let T be the operator such that Th(k) = Sh(k + 1), with S the Stein operator for the Poisson distribution with mean λ, and let ∑_{(l)} denote the sum over the corresponding index set. Now, we use our Theorem 1.3 and a result of Barbour (1987). Hence, using (1.7), we obtain a bound valid for l ≥ 2, which provides an alternative to that established in Barbour's work. Barbour (1987, page 756) further considers the case in which the corresponding condition holds for all m ≥ 0. We can prove a bound analogous to (1.8) in this situation, with λ and λ_{l+1} modified accordingly.
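Underlying the Poisson Stein operator here is the standard characterisation Ag(k) = λg(k+1) − kg(k) with E[Ag(Y)] = 0 for Y ∼ Pois(λ) (the form going back to Chen (1975); stated here as an assumption, since the displayed operator is not reproduced above). A quick numerical check:

```python
import math

# Check E[lam*g(X+1) - X*g(X)] = 0 for X ~ Pois(lam) and bounded-growth g.
def poisson_stein_expectation(g, lam, K=120):
    total, pk = 0.0, math.exp(-lam)    # pk = P(X = k), built up iteratively
    for k in range(K):
        total += pk * (lam * g(k + 1) - k * g(k))
        pk *= lam / (k + 1)            # Poisson pmf recursion
    return total

for lam in (0.5, 1.0, 4.0):
    assert abs(poisson_stein_expectation(lambda k: 1.0 / (1 + k), lam)) < 1e-9
    assert abs(poisson_stein_expectation(math.sqrt, lam)) < 1e-9
print("E[lam*g(X+1) - X*g(X)] = 0 for X ~ Pois(lam)")
```

The pmf is built by the recursion p_{k+1} = p_k λ/(k+1) to avoid overflow in λ^k/k!.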

Geometric approximation using stochastic orderings
We present here an application of our results based on unpublished work of Utev. Suppose Y is a random variable on Z_+ with characterising linear operator A. For k ≥ 0 and X another random variable on Z_+ define the quantities m_k^{(l)}. Rearranging this sum gives an identity valid for any l ≥ 1. We can take, for example, g = Sh, where S is the Stein operator for the random variable Y, so that E[ASh(X)] = E[h(X)] − Φh. Supposing that Sh(0) = 0, we obtain the following.
If m_{k+l}^{(l)} has the same sign for each k ≥ 0, we have

Consider the case l = 2 and P(Y = k) = pq^k for k ≥ 0, so that Y has the geometric distribution starting at zero with parameter p = 1 − q. It is well known that in this case we can choose Ag(k) = qg(k+1) − g(k); see, for example, Reinert (2005, Example 5.3). With this choice we may also define Sh(0) = 0. It is straightforward to verify, by the linearity of A, the corresponding expression for m_k^{(2)}. So, combining the above with Theorem 1.4, we assume that X is a random variable on Z_+ with E[X] = q/p, such that E[A(X − k + 1)_+] has the same sign for each k ≥ 2. Then, for all h : Z_+ → R bounded, we obtain (1.11). We consider an example from Phillips and Weinberg (2000). Suppose m balls are placed randomly in d compartments, with all assignments equally likely, and let X be the number of balls in the first compartment. Then X has a Pólya distribution. It can easily be checked that m_k^{(2)} ≥ 0 for all k ≥ 2, so that our bound (1.11) applies. In many cases this performs better than the analogous bound found by Phillips and Weinberg (2000, page 311).
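The geometric characterisation quoted above, Ag(k) = qg(k+1) − g(k), satisfies E[Ag(Y)] = 0 for Y ∼ Geom(p) started at zero precisely when g(0) = 0, which is one reason the convention Sh(0) = 0 is convenient. A numerical sketch (test functions are our own choices):

```python
# Check E[q*g(Y+1) - g(Y)] = 0 for Y ~ Geom(p) started at zero,
# P(Y = k) = p*q^k with q = 1 - p, whenever g(0) = 0.
def geom_stein_expectation(g, p, K=2000):
    q = 1.0 - p
    total, pk = 0.0, p                 # pk = P(Y = k)
    for k in range(K):
        total += pk * (q * g(k + 1) - g(k))
        pk *= q
    return total

for p in (0.2, 0.5, 0.8):
    assert abs(geom_stein_expectation(lambda k: k / (k + 1.0), p)) < 1e-9
    assert abs(geom_stein_expectation(float, p)) < 1e-9
print("E[q*g(Y+1) - g(Y)] = 0 whenever g(0) = 0")
```

The truncated sum telescopes to p·q^K·g(K), which is negligible for the K used here.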
We consider a further application of (1.11). Suppose X = X(ξ) ∼ Geom(ξ) for some random variable ξ taking values in (0, 1). Using (1.10) we obtain the corresponding bound. For example, suppose ξ ∼ Beta(α, β) for some α > 2, β > 0. It can easily be checked that the criterion (1.13) is satisfied for all k ≥ 2, and the correct choice of p follows from (1.12).

Proofs
Proof of Theorem 1.1. In order to prove our theorem we introduce a variant of Mills' ratio and exploit several of its properties. We define the function Z : R → R by

where Φ and ϕ are the standard normal distribution and density functions, respectively. The function Z has previously been used in a similar context by Lefèvre and Utev (2003). Note that Lemma 5.1 of Lefèvre and Utev (2003) gives us that D^j Z(x) > 0 for each j ≥ 0 and x ∈ R.
The properties of our function which we require are given by Lemma 2.1 below. Several of these have inductive proofs, in which the following easily verifiable expressions will be useful for establishing the base cases. Note that throughout we will take D^j Z(−x) to mean the function D^j Z evaluated at −x. We have

DZ(x) = 1 + xZ(x),    (2.1)

Proof. For (iv) we note firstly that, by Lemma 5.1 of Lefèvre and Utev (2003), the required integrals exist. Now, for the case k = 0, the result holds by (2.1). For the inductive step, let k ≥ 1 and assume α_{k−1} has the required form. Integrating by parts, and then integrating by parts again using the inductive hypothesis, we get the required expression. Rearranging and applying (i) gives us the result. □

We are now in a position to establish the key representation of D^{k+2}Sh, with S given by (1.1).
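The base-case identity (2.1) is easy to confirm numerically. Here we take Z(x) = Φ(x)/ϕ(x), the variant of Mills' ratio consistent with (2.1) (an assumption on our part, since the defining display is not reproduced above), and check DZ(x) = 1 + xZ(x) by central differences:

```python
import math

# Check DZ(x) = 1 + x*Z(x) for Z(x) = Phi(x)/phi(x), by finite differences.
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def Z(x):
    return Phi(x) / phi(x)

for x in (-2.0, -0.5, 0.0, 1.0, 2.5):
    h = 1e-5
    dz = (Z(x + h) - Z(x - h)) / (2 * h)   # central-difference derivative
    assert abs(dz - (1.0 + x * Z(x))) < 1e-6
    assert Z(x) > 0.0                      # D^j Z > 0 includes j = 0
print("DZ = 1 + xZ verified numerically")
```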
Lemma 2.2. Let k ≥ 0 and let h : R → R be (k+1)-times differentiable with D^k h absolutely continuous. Then, for all x ∈ R,

Proof. Again we proceed by induction on k. The case k = 0 was established by Stein (1986, (58) on page 27); the required form can be seen using (2.2) and (2.5). Now let k ≥ 1. We assume firstly that h satisfies the additional restriction D^k h(0) = 0. Using this and the absolute continuity of D^k h we may write, for t ∈ R, D^k h(t) = ∫_0^t D^{k+1} h(y) dy. We firstly consider the case x ≥ 0. Using the above and interchanging the order of integration, we get an expression to which we apply Lemma 2.1(iv), obtaining (2.6). In a similar way we get (2.7). In the case x < 0 a similar argument also yields (2.6) and (2.7). Now, by the inductive hypothesis, we assume the required form for D^{k+1}Sh. Differentiating this, and using (2.6) and (2.7), we obtain the desired representation, along with an additional term which is zero by Lemma 2.1(iii).
The proof is completed by removing the condition that D^k h(0) = 0. We do this by applying our result to g(x) := h(x) − p_k(x), where p_k is a polynomial of degree k chosen so that D^k g(0) = 0. Finally, it is easily verified that Sp_k is a polynomial of degree k − 1 and thus D^{k+2} Sp_k ≡ 0. Hence D^{k+2} Sg = D^{k+2} Sh. □

We now use the representation established in Lemma 2.2 to prove Theorem 1.1. Fix k ≥ 0 and let h be as in Lemma 2.2, with the additional assumption that D^{k+1} h ≥ 0. Since ϕ(x), D^j Z(x) > 0 for each j ≥ 0 and x ∈ R, we get (2.8) for all x, where

Applying Lemma 2.1(iv) and (ii) we get that ρ_k(x) = k! for all x. Combining this with (2.8) we obtain the bound ‖D^{k+2} Sh‖∞ ≤ ‖D^{k+1} h‖∞ for all such h.
We now remove the assumption that D^{k+1} h ≥ 0, using a method analogous to that in the last part of the proof of Lemma 2.2. Consider the function H : R → R given by H(

which gives Theorem 1.1. The proofs of Theorems 1.2, 1.3 and 1.4 below are analogous to our proof of Theorem 1.1.

Remark. We note that the bound we have established in Theorem 1.1 is sharp. Let a > 0 and suppose h_a is the function with D^{k+1} h_a continuous such that D^{k+1} h_a(x) = 1 for |x| > a and D^{k+1} h_a(0) = −1. Using Lemma 2.2 we obtain a representation in which each of the integrands is bounded, so letting a → 0 gives us the result.

Proof of Theorem 1.2. We suppose now that Y ∼ Exp(λ), the exponential distribution with mean λ^{−1}. It can easily be checked that a Stein equation for Y is given by (2.10).

Proof. We proceed by induction on k. For k = 0 we follow the argument of Stein (1986, Lemma II.3) used to establish (1.5). From (2.10) we have (2.11) for all x ≥ 0. Now, we write the expectation as an integral against f, the density function of Y. Using the absolute continuity of h to write, for example, h(x) − h(t) = ∫_t^x Dh(y) dy, and interchanging the order of integration, we obtain (2.12), where F is the distribution function of Y. We substitute (2.12) into (1.2), interchange the order of integration and rearrange. Combining this with (2.11) and (2.12) establishes our lemma for k = 0.

Now let k ≥ 1 and assume for now that h also satisfies D^k h(0) = 0. We proceed as in the proof of Lemma 2.2. We use the absolute continuity of D^k h to write D^k h(t) = ∫_0^t D^{k+1} h(y) dy and hence show the required identity. Using this with our inductive hypothesis we obtain the required representation. The restriction that D^k h(0) = 0 is removed as in the proof of Lemma 2.2, noting that S applied to a polynomial of degree k returns a polynomial of degree k in this case. □

Suppose h is as in the statement of Theorem 1.2, with the additional condition that D^{k+1} h ≥ 0. Noting that λe^{λx} ∫_x^∞ e^{−λt} dt = 1, we use Lemma 2.3 to obtain the bound for k ≥ 0.
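The k = 0 step rests on integration by parts against the exponential density, which yields the standard identity E[Dh(X)] = λ(E[h(X)] − h(0)) for X ∼ Exp(λ). The sketch below checks this by midpoint-rule quadrature; the test function is our own choice:

```python
import math

# Midpoint-rule check of E[Dh(X)] = lambda*(E[h(X)] - h(0)), X ~ Exp(lambda).
def exp_expectation(f, lam, T=40.0, n=200000):
    # integrate f(x) * lam * exp(-lam*x) over [0, T/lam] by the midpoint rule
    dx = T / lam / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += f(x) * lam * math.exp(-lam * x) * dx
    return total

for lam in (0.5, 1.0, 3.0):
    # h(x) = sin(x), so Dh = cos and h(0) = 0
    lhs = exp_expectation(math.cos, lam)
    rhs = lam * exp_expectation(math.sin, lam)
    assert abs(lhs - rhs) < 1e-5
print("E[Dh(X)] = lambda*(E[h(X)] - h(0)) verified")
```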
The restriction that D k+1 h ≥ 0 is lifted and the proof completed as in the proof of Theorem 1.1.

Proof of Theorems 1.3 and 1.4
Suppose we have a birth-death process on Z_+ with constant birth rate λ and death rates µ_k, where µ_0 = 0. Let this process have equilibrium distribution π with

P(π = k) = π_k = π_0 λ^k / (µ_1 ⋯ µ_k),  and  F(k) := ∑_{i=0}^{k} π_i.

It is well known that in this case a Stein equation is given by (2.13), with the associated Stein operator (2.14) for k ≥ 1. See, for example, Brown and Xia (2001) or Holmes (2004). Sh(0) is not defined by (2.13), and we leave it undefined for now; we consider particular choices later. With appropriate choices of λ and µ_k our Stein operator (2.14) gives us (1.3) and (1.4). We define Z*_1 and Z*_2 analogously to our function Z in the proof of Theorem 1.1. Let

for k ≥ 1. We note that for any functions f and g on Z_+ we have

The following easily verifiable identities will prove useful.
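The two specialisations of this birth-death set-up used below can be checked numerically from the detailed-balance recursion λπ_k = µ_{k+1}π_{k+1} implied by the stated form of π_k: taking µ_k = k recovers Pois(λ), and taking µ_k = 1 with birth rate q recovers the geometric distribution started at zero.

```python
import math

# Build the equilibrium distribution from lambda*pi_k = mu_{k+1}*pi_{k+1}.
def equilibrium(lam, mu, K):
    w = [1.0]                          # unnormalised weights, w_0 = 1
    for k in range(K):
        w.append(w[-1] * lam / mu(k + 1))
    s = sum(w)
    return [x / s for x in w]

lam = 2.0
pois = equilibrium(lam, lambda k: float(k), 60)     # mu_k = k  ->  Pois(lam)
for k in range(10):
    target = math.exp(-lam) * lam**k / math.factorial(k)
    assert abs(pois[k] - target) < 1e-9

q = 0.3
geom = equilibrium(q, lambda k: 1.0, 200)           # mu_k = 1  ->  Geom(p)
for k in range(10):
    assert abs(geom[k] - (1 - q) * q**k) < 1e-9
print("birth-death equilibria match Pois(lambda) and Geom(p)")
```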
for k ≥ 1. Now, we follow the proof of Lemma 2.3 and use (2.13) to get a representation valid for k ≥ 0. We obtain the discrete analogue of (2.12): for h : Z_+ → R bounded and k ≥ 0,

and combining this with (2.14) we get (2.23) for k ≥ 1. To prove Theorems 1.3 and 1.4 we first generalise (2.23) in both the geometric and Poisson cases.

The geometric case
Suppose µ k = 1 for all k ≥ 1 and λ = q = 1 − p. Then π k = pq k for k ≥ 0 and π ∼ Geom(p), the geometric distribution starting at zero. We now define Sh(0) = 0. In this case we have the following representation.
Proof. We proceed by induction on j. It is easily verified that the case j = 0 is given by (2.23) for k ≥ 1; for k = 0 we can combine (2.20) and (2.21) to get our result. Now let j ≥ 1, and assume additionally that ∆^j h(0) = 0. Then, writing ∆^j h(i) = ∑_{l=0}^{i−1} ∆^{j+1} h(l), it can be shown that the required identity holds. Using (2.15) together with (2.24) and our representation of ∆^{j+1} Sh, we can show the desired result. Finally, we remove the condition that ∆^j h(0) = 0 as in the final part of the proof of Lemma 2.2, noting that S applied to a polynomial of degree j gives a polynomial of degree j in this case. □

We now complete the proof of Theorem 1.4. Let h : Z_+ → R be bounded with ∆^{j+1} h ≥ 0. Then the required bound holds for all j ≥ 0. The restriction that ∆^{j+1} h ≥ 0 is removed and the proof completed as in the final part of the proof of Theorem 1.1.

The Poisson case
We turn our attention now to the case where µ_k = k, so that π ∼ Pois(λ). We begin with some properties of our functions Z*_1 and Z*_2 in this case. Lemma 2.6 gives us an analogue of Lemma 2.1 for the Poisson distribution.
Proof. We note that (d/dλ) P(π ≤ k) = −π_k, and hence we can write
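The derivative identity at the start of this proof is easily confirmed numerically: for π ∼ Pois(λ), d/dλ P(π ≤ k) = −π_k, since the sum ∑_{i≤k} d/dλ [e^{−λ}λ^i/i!] telescopes. A finite-difference sketch:

```python
import math

# Check d/dlam P(pi <= k) = -pi_k for pi ~ Pois(lam), pi_k = e^{-lam} lam^k/k!.
def pois_cdf(lam, k):
    pk, total = math.exp(-lam), 0.0
    for i in range(k + 1):
        total += pk
        pk *= lam / (i + 1)
    return total

for lam in (0.5, 2.0, 7.0):
    for k in (0, 1, 5, 12):
        h = 1e-6
        deriv = (pois_cdf(lam + h, k) - pois_cdf(lam - h, k)) / (2 * h)
        pik = math.exp(-lam) * lam**k / math.factorial(k)
        assert abs(deriv + pik) < 1e-6
print("d/dlam P(pi <= k) = -pi_k verified")
```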
Proof. To prove (iv) we again use induction on j. For the case j = 0, we check the result directly for k = 1. For k ≥ 2, note that ∑_{i=0}^{k−1} F(i) = kF(k−1) − λF(k−2) and use (2.16). For the inductive step we will use that, for all functions f and g on Z_+,    (2.25)

Applying (2.25) once more and using our representation for α*_{j−1}, we obtain the required expression. Rearranging and applying (i) completes the proof. (v) is proved analogously to (iv). □

We may now prove our key representation in the Poisson case.
for all k ≥ 1.
Finally, the restriction that ∆^j h(0) = 0 is lifted as in the proof of Lemma 2.2. It can easily be checked that, in the Poisson case, if h(k) is a polynomial of degree j then Sh(k) is a polynomial of degree j − 1 for k ≥ 1. □
We now complete the proof of Theorem 1.3. Let h : Z_+ → R be bounded with ∆^{j+1} h ≥ 0. Then we can use Lemmas 2.5 and 2.7 to write

Applying Lemma 2.6(ii), (iv) and (v) we get that ρ*_j(k) = j!/λ^j for all k ≥ 1, and so

We remove our condition that ∆^{j+1} h ≥ 0 and complete the proof as in the standard normal case.
Remark. The value of Sh(0) is not defined by the Stein equation (2.13), and so may be chosen arbitrarily. In the geometric case it was convenient to choose Sh(0) = 0, so that the representation established in Lemma 2.4 holds at k = 0. We now consider possible choices of Sh(0) in the Poisson case. Common choices are Sh(0) = 0, as in Barbour (1987) and Barbour et al. (1992), and Sh(0) = Sh(1), as in Barbour and Xia (2006). However, with neither of these choices can we use the above methods to obtain a representation directly analogous to our Lemma 2.7 for k = 0 and all bounded h. Our proof relies on the fact that if h(k) = k^j for a fixed j ≥ 1 and all k ≥ 0, then Sh is a polynomial of degree j − 1 for k ≥ 1. Taking h(k) = k^2, for example, shows that with neither of the choices of Sh(0) outlined above is Sh a polynomial for all k ≥ 0.
Despite these limitations, there are some cases where useful bounds can be obtained. Suppose we choose Sh(0) = Sh(1), so that ∆^2 Sh(0) = ∆Sh(1). Using (2.13), (2.21) and (2.22) we can write

Assuming ∆h ≥ 0, we can then proceed as in the proof of Theorem 1.3 and use Lemma 2.6(v) to get (2.29). With our choice of Sh(0) here we are able to remove the condition that ∆h ≥ 0. Setting H(k) = h(k) − g(k) for k ≥ 0, where g(k) := Ck and C := inf_{i≥0} ∆h(i), we have ∆H(k) = ∆h(k) − C ≥ 0. It can easily be checked that Sg(k) = −C for all k ≥ 0, and so ∆^2 SH = ∆^2 Sh. Applying (2.29) to H and combining with Theorem 1.3, we obtain a result analogous to that of Barbour and Xia (2006, Theorem 1.1). We note that this argument is heavily dependent on our choice of Sh(0).
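The claim Sg(k) = −C for g(k) := Ck can be verified directly, assuming the Poisson Stein equation takes the usual birth-death form λf(k+1) − kf(k) = g(k) − Φg (consistent with (2.13) and (2.14)); since Φg = E[g(π)] = Cλ, the constant function f ≡ −C satisfies it:

```python
# Verify that f(k) = -C solves lam*f(k+1) - k*f(k) = g(k) - Phi g
# for g(k) = C*k and Phi g = C*lam (the Poisson mean of g).
def check_Sg_is_constant(C, lam, K=50):
    f = lambda k: -C
    g = lambda k: C * k
    phi_g = C * lam
    return all(abs(lam * f(k + 1) - k * f(k) - (g(k) - phi_g)) < 1e-12
               for k in range(K))

assert check_Sg_is_constant(2.5, 1.7)
assert check_Sg_is_constant(-4.0, 0.3)
print("f = -C solves the Stein equation for g(k) = C*k")
```

The check is exact: λ(−C) − k(−C) = Ck − Cλ for every k, so the tolerance only absorbs rounding.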
Of course, if we wish to estimate ‖∆^{j+2} Sh‖∞ for a single, fixed j ≥ 0 when Sh(0) may be chosen arbitrarily, we can always choose it such that ∆^{j+2} Sh(0) = 0 and apply Theorem 1.3.
Proof of Corollaries 1.7 and 1.8. We let Y ∼ N(0, 1) and Φh := E[h(Y)]. We begin by establishing the bound in Corollary 1.8. Using (4.1) of Lefèvre and Utev (2003) gives us that

where P_k(X, t) := X^{k+1} t^{k−1}/(k−1)! − σ_j^2 X^{k−1} t^{k−2}/(k−2)! and S is given by (1.1). Now, integrating by parts, we get (2.31), which involves the factor 1 − σ_j^2 x. We apply (2.30) to G_j(T_{n,j}) for each j and combine this with (2.31) to obtain

Now, U(X) is continuous and weakly convex; that is, for all λ ∈ [0, 1], U(λX + (1 − λ)Y) ≤ max{U(X), U(Y)}. Thus, by a result of Hoeffding (1955), we need only consider random variables with at most two points in their support. We suppose that X takes the value a with probability p ∈ [0, 1] and the value b with probability 1 − p, where 0 ≤ a ≤ 1 ≤ b < ∞. We use our condition E[X^2] = 1 to obtain p = (b^2 − 1)/(b^2 − a^2). This allows us to express U(X) in terms of a and b. Using elementary techniques we maximise the resulting expression to give p_3 = 1/3.