Multivariate Stein Factors for a Class of Strongly Log-concave Distributions

We establish uniform bounds on the low-order derivatives of Stein equation solutions for a broad class of multivariate, strongly log-concave target distributions. These "Stein factor" bounds deliver control over Wasserstein and related smooth-function distances and are well suited to analyzing the computable Stein discrepancy measures of Gorham and Mackey. Our proofs are probabilistic and feature the synchronous coupling of multiple overdamped Langevin diffusions.


Introduction
In 1972, Stein [22] introduced a powerful method for bounding the maximum expected discrepancy,
\[
d_{\mathcal{H}}(Q, P) := \sup_{h \in \mathcal{H}} \big| \mathbb{E}_Q[h(X)] - \mathbb{E}_P[h(Z)] \big|,
\]
between a target distribution $P$ and an approximating distribution $Q$. Stein's method classically proceeds in three steps:

1. First, one identifies a linear operator $\mathcal{A}$ that generates mean-zero functions under the target distribution. A common choice for a continuous target on $\mathbb{R}^d$ is the infinitesimal generator of the overdamped Langevin diffusion (also known as the Smoluchowski dynamics) [19, Secs. 6.5 and 4.5] with stationary distribution $P$:
\[
(\mathcal{A}u)(x) = \tfrac{1}{2}\langle \nabla u(x), \nabla \log p(x) \rangle + \tfrac{1}{2}\langle \nabla, \nabla u(x) \rangle. \tag{1.1}
\]
Here, $p$ represents the density of $P$ with respect to Lebesgue measure.
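As a quick numerical illustration of the mean-zero property (not part of the original argument), one can check via Monte Carlo that the generator (1.1) produces mean-zero functions under a standard normal target, for which $\nabla \log p(x) = -x$; the test function $u$ below is an arbitrary smooth choice:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
Z = rng.standard_normal((200_000, d))   # exact draws from P = N(0, I_d)

# Test function u(x) = sum_j sin(x_j): grad u(x) = cos(x), Laplacian = -sum_j sin(x_j)
grad_u = np.cos(Z)
lap_u = -np.sin(Z).sum(axis=1)

# Generator (1.1) with grad log p(x) = -x for the standard normal target
Au = 0.5 * (grad_u * (-Z)).sum(axis=1) + 0.5 * lap_u

print(Au.mean())   # Monte Carlo estimate of E_P[(Au)(Z)]; close to 0
```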
2. Next, one shows that, for every test function $h$ in a convergence-determining class $\mathcal{H}$, the Stein equation
\[
(\mathcal{A}u_h)(x) = h(x) - \mathbb{E}_P[h(Z)] \tag{1.2}
\]
admits a solution $u_h$ with controlled low-order derivatives.

Our next result shows that control over the smooth-function distance also grants control over the Wasserstein and bounded Lipschitz distances.

Proof. The first inequality follows directly from the inclusions $\mathcal{BL} \subset \mathcal{W}$ and $\mathcal{M} \subset \mathcal{W}$.
To establish the second, we fix $h \in \mathcal{W}$ and $t > 0$ and define the smoothed function
\[
h_t(x) := \mathbb{E}[h(x + tG)] = \int_{\mathbb{R}^d} h(x + tz)\,\varphi(z)\,dz,
\]
where $\varphi$ is the density of a vector $G$ of $d$ independent standard normal variables. We first show that $h_t$ is a close approximation to $h$ when $t$ is small. Specifically, if $X \in \mathbb{R}^d$ is an integrable random vector, independent of $G$, then, by the Lipschitz assumption on $h$,
\[
\big| \mathbb{E}[h_t(X)] - \mathbb{E}[h(X)] \big| \le t\,\mathbb{E}\|G\|_2.
\]
We next show that the derivatives of $h_t$ are bounded. Fix any $x \in \mathbb{R}^d$. Since $h$ is Lipschitz, it admits a weak gradient, $\nabla h$, bounded uniformly by 1 in $\|\cdot\|_2$. We alternate differentiation and integration by parts to develop representations of the higher derivatives of $h_t$; in the final equality we use the fact that $\langle v, Z\rangle$ and $\langle w, Z\rangle$ are jointly normal with zero mean and covariance determined by $v$ and $w$. We can now develop a bound for $d_{\mathcal{W}}$ using our smoothed functions: we let the maximum derivative bound of $h_t$ play the role of the smoothness constant, and we select $X \sim \mu$ and $Z \sim \nu$, on a common probability space, to satisfy $\mathbb{E}\|X - Z\|_2 \le d_{\mathcal{W}}(\mu, \nu) + \delta$ for arbitrary $\delta > 0$.
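The smoothing construction can be checked numerically; the sketch below (with an arbitrary choice of 1-Lipschitz test function, smoothing level $t$, and evaluation point) verifies the pointwise approximation bound $|h_t(x) - h(x)| \le t\,\mathbb{E}\|G\|_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, t = 5, 0.1
h = lambda x: np.linalg.norm(x, axis=-1)        # a 1-Lipschitz test function

G = rng.standard_normal((100_000, d))           # samples of G ~ N(0, I_d)
x = rng.standard_normal(d)                      # an arbitrary evaluation point

h_t = h(x + t * G).mean()                       # Monte Carlo estimate of h_t(x)
bound = t * np.linalg.norm(G, axis=1).mean()    # estimate of t * E||G||_2

print(abs(h_t - h(x)), "<=", bound)
```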

Example application to Bayesian logistic regression
Before turning to the proof of Theorem 2.1, we illustrate a practical application to measuring the quality of Monte Carlo or cubature sample points in Bayesian inference. Consider the Bayesian logistic regression posterior density [see, e.g., 11]
\[
p(\beta) \propto \exp\!\Big(-\tfrac{\|\beta\|_2^2}{2\sigma^2}\Big)
\prod_{l=1}^{L} \Big(\tfrac{1}{1 + e^{-\langle v_l, \beta\rangle}}\Big)^{y_l}
\Big(\tfrac{1}{1 + e^{\langle v_l, \beta\rangle}}\Big)^{1 - y_l},
\]
the product of a multivariate Gaussian prior and a logistic regression likelihood based on $L$ observed datapoints $(v_l, y_l)$ and a known prior hyperparameter $\sigma^2 > 0$. In this standard model of binary classification, $\beta \in \mathbb{R}^d$ represents our inferential target, an unknown parameter vector with a multivariate Gaussian prior; $y_l \in \{0, 1\}$ is the class label of the $l$-th observed datapoint; and $v_l \in \mathbb{R}^d$ is an associated vector of covariates.
Since the normalizing constant of $p$ is unknown, it is common practice to approximate expectations $\int h(\beta)\,p(\beta)\,d\beta$ under $p$ with sample estimates, $\frac{1}{n}\sum_{i=1}^n h(\beta_i)$, based on sample points $\beta_i \in \mathbb{R}^d$ drawn from a Markov chain or a cubature rule [11]. Theorem 2.1 furnishes a way to uniformly bound the error of this approximation, $\big|\frac{1}{n}\sum_{i=1}^n h(\beta_i) - \int h(\beta)\,p(\beta)\,d\beta\big|$, for all sufficiently smooth functions $h$.
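For concreteness, here is a minimal sketch of the posterior's unnormalized log-density and its score function, the ingredient a computable Stein discrepancy consumes, since the unknown normalizing constant cancels in $\nabla \log p$. The function names and the synthetic data are illustrative, not from the paper:

```python
import numpy as np

def log_posterior(beta, V, y, sigma2):
    """Unnormalized log p(beta | data): Gaussian prior + logistic likelihood."""
    logits = V @ beta                                   # (L,)
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -np.dot(beta, beta) / (2.0 * sigma2)
    return log_lik + log_prior

def score(beta, V, y, sigma2):
    """grad_beta log p(beta | data); the normalizing constant has cancelled."""
    probs = 1.0 / (1.0 + np.exp(-(V @ beta)))           # sigmoid(<v_l, beta>)
    return V.T @ (y - probs) - beta / sigma2

# Tiny synthetic instance (illustrative data only)
rng = np.random.default_rng(2)
V = rng.standard_normal((20, 3))                        # L = 20 covariate vectors
y = (rng.random(20) < 0.5).astype(float)                # binary labels
beta = np.zeros(3)
print(score(beta, V, y, sigma2=1.0))
```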
Concretely, for all unit vectors $u_1, u_2, u_3, u_4 \in \mathbb{R}^d$, the low-order directional derivatives of $\log p$ admit explicit uniform bounds. We may now plug the associated Stein factors into the non-uniform graph Stein discrepancy of [12] to obtain a computable upper bound on $d_{\mathcal{M}}(Q, P)$ or $d_{\mathcal{W}}(Q, P)$ for any discrete probability measure $Q = \frac{1}{n}\sum_{i=1}^n \delta_{\beta_i}$.

Proof of Theorem 2.1
Before tackling the main proof, we establish a series of useful lemmas. We will make regular use of the following well-known Lipschitz property: if $h \in C^r(\mathbb{R}^d)$ has a bounded $r$-th derivative, then its $(r-1)$-th derivative is Lipschitz, in the sense that
\[
\|\nabla^{r-1} h(x) - \nabla^{r-1} h(y)\|_{\mathrm{op}} \le M_r(h)\,\|x - y\|_2
\quad \text{for all } x, y \in \mathbb{R}^d, \tag{3.1}
\]
where $M_r(h)$ denotes the supremum over $\mathbb{R}^d$ of the operator norm of $\nabla^r h$.
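With $M_r(h)$ denoting the supremum over $\mathbb{R}^d$ of the operator norm of $\nabla^r h$, one standard derivation of such a Lipschitz property runs through the fundamental theorem of calculus:

```latex
\|\nabla^{r-1} h(x) - \nabla^{r-1} h(y)\|_{\mathrm{op}}
  = \Big\| \int_0^1 \nabla^{r} h\big(y + s(x - y)\big)[x - y]\, ds \Big\|_{\mathrm{op}}
  \le M_r(h)\, \|x - y\|_2 .
```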

Properties of overdamped Langevin diffusions
Our first lemma enumerates several properties of the overdamped Langevin diffusion that will prove useful in the proofs to follow.
Lemma 3.1 (Properties of the overdamped Langevin diffusion). For each $x \in \mathbb{R}^d$, the overdamped Langevin diffusion $(Z_{t,x})_{t \ge 0}$ with generator (1.1) and $Z_{0,x} = x$ is well-defined for all time, has stationary distribution $P$, and satisfies strong continuity of its semigroup.

Proof. Consider the Lyapunov function $V(x) = \|x\|_2^2 + 1$. The strong log-concavity of $p$, the Cauchy–Schwarz inequality, and the arithmetic–geometric mean inequality imply that $\mathcal{A}V$ is bounded above by $-cV + b$ for constants $b, c > 0$ depending on $p$, so that $V$ is a valid Lyapunov function and standard arguments yield non-explosion and the remaining properties.
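Although the proof requires only the existence of the diffusion, it is straightforward to simulate; the following Euler–Maruyama sketch (all parameters illustrative) uses the $k$-strongly log-concave Gaussian target $N(0, 1/k)$, for which $\nabla \log p(x) = -kx$, and exhibits convergence to the stationary distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
k, dt, n_steps, n_chains = 2.0, 0.01, 2_000, 5_000
Z = np.full((n_chains,), 5.0)   # 1-d chains, started far from equilibrium

# dZ_t = (1/2) grad log p(Z_t) dt + dW_t, discretized with step dt
for _ in range(n_steps):
    Z = Z + 0.5 * (-k * Z) * dt + np.sqrt(dt) * rng.standard_normal(n_chains)

print(Z.mean(), Z.var())   # stationary law is N(0, 1/k): mean near 0, var near 0.5
```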

High-order weighted difference bounds
A second, technical lemma bounds the growth of weighted smooth-function differences in terms of the proximity of the function arguments. The result will be used to characterize the smoothness of $Z_{t,x}$ as a function of the starting point $x$ (Lemma 3.3) and, ultimately, to establish the smoothness of $u_h$ (Theorem 2.1).

Lemma 3.2 (High-order weighted difference bounds). Fix any weights $\lambda, \lambda' > 0$ and any vectors $x, y, z, w, x', y', z', w' \in \mathbb{R}^d$.

Proof. To establish the second-order difference bound (3.2), we first apply Taylor's theorem with mean-value remainder to $h$, obtaining expansions with intermediate points $\zeta, \zeta' \in \mathbb{R}^d$. The Cauchy–Schwarz inequality, the definition of the operator norm, and the Lipschitz gradient relation (3.1) now yield the advertised conclusion (3.2).
To derive the third-order difference bound (3.3), we apply Taylor's theorem with mean-value remainder, for some intermediate points $\zeta, \zeta', \zeta'', \zeta''' \in \mathbb{R}^d$, and bound each line of the resulting expression in turn. First we see, by Cauchy–Schwarz and the Lipschitz property (3.1), that the leading terms are controlled. Next, we invoke our second-order difference bound (3.2) on the $C^2(\mathbb{R}^d)$ function $x \mapsto \langle \nabla h(x), y - x\rangle$, apply the Cauchy–Schwarz inequality, and use the definition of the operator norm. To bound the subsequent line, we note that Cauchy–Schwarz, the definition of the operator norm, and the Lipschitz property (3.1) give the needed estimate; a symmetric argument handles the following line. Finally, Cauchy–Schwarz and the definition of the operator norm control the remaining term.
Bounding the third-order difference (3.4) in terms of these four estimates yields (3.3).
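For reference, the mean-value forms of Taylor's theorem invoked throughout this proof read as follows, where in each case the intermediate point lies on the segment joining $x$ and $y$:

```latex
% Second-order (mean-value) expansion, h in C^2(R^d):
h(y) = h(x) + \langle \nabla h(x),\, y - x \rangle
     + \tfrac{1}{2}\langle y - x,\, \nabla^2 h(\zeta)(y - x)\rangle,
% Third-order (mean-value) expansion, h in C^3(R^d):
h(y) = h(x) + \langle \nabla h(x),\, y - x \rangle
     + \tfrac{1}{2}\langle y - x,\, \nabla^2 h(x)(y - x)\rangle
     + \tfrac{1}{6}\nabla^3 h(\zeta')[y - x,\, y - x,\, y - x].
```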

Synchronous coupling lemma
Our proof of Theorem 2.1 additionally rests upon a series of coupling inequalities which serve to characterize the smoothness of $Z_{t,x}$ as a function of $x$. The couplings espoused in the lemma to follow are termed synchronous, because the same Brownian motion is used to drive each process.

Lemma 3.3 (Synchronous coupling bounds). Let a collection of overdamped Langevin diffusions, each solving (3.6) from its own starting point, be driven by a common $d$-dimensional Wiener process, and define the differenced processes from the coupled diffusions. These coupled processes almost surely satisfy the synchronous coupling bounds (3.7)–(3.9), the first of which reads
\[
e^{kt/2}\,\|Z_{t,x+\varepsilon v} - Z_{t,x}\|_2 \le \varepsilon\,\|v\|_2, \tag{3.7}
\]
together with the second-order differenced function bound (3.10) and the third-order differenced function bound (3.11), for each $t \ge 0$, $h_2 \in C^2(\mathbb{R}^d)$, and $h_3 \in C^3(\mathbb{R}^d)$.
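The contraction driven by synchronous coupling is easy to observe numerically. In the sketch below (illustrative parameters, Gaussian target $N(0, I/k)$ with linear score $-kx$), two discretized diffusions share the same Brownian increments, and their gap contracts at the advertised $e^{-kt/2}$ rate:

```python
import numpy as np

rng = np.random.default_rng(4)
k, dt, n_steps = 2.0, 0.001, 1_000
x, eps = np.array([1.0, -1.0]), 0.5
Zx = x.copy()
Zy = x + eps * np.array([1.0, 0.0])   # perturbed start x + eps * v, v a unit vector

for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal(2)     # shared Brownian increment
    Zx = Zx + 0.5 * (-k * Zx) * dt + dW
    Zy = Zy + 0.5 * (-k * Zy) * dt + dW

t = n_steps * dt
gap = np.linalg.norm(Zy - Zx)
print(gap, eps * np.exp(-k * t / 2))   # gap tracks eps * e^{-kt/2}
```

For this linear drift the shared noise cancels exactly, so the gap recursion is deterministic and matches the continuous-time rate up to discretization error.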

Second-order bounds
To establish the second conclusion (3.8), we consider the Itô process of second-order differences and apply Itô's lemma to the mapping $(t, w) \mapsto e^{kt/2}\|w\|_2$. This yields a differential inequality in which, to achieve the second inequality, we use the $k$-strong log-concavity of $p$. Integrating then delivers the second-order synchronous coupling bound (3.8). Applying the synchronous coupling bound (3.8) to the estimate (3.13) finally delivers the second-order differenced function bound (3.10).
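For orientation, the first-order analogue of this Itô calculation, which underlies the coupling bound (3.7), can be sketched as follows: write $\Delta_t := Z_{t,x+\varepsilon v} - Z_{t,x}$; synchronous coupling cancels the Brownian terms, so

```latex
d\Delta_t = \tfrac{1}{2}\big(\nabla \log p(Z_{t,x+\varepsilon v})
            - \nabla \log p(Z_{t,x})\big)\, dt,
\qquad
d\|\Delta_t\|_2^2
  = \big\langle \Delta_t,\ \nabla \log p(Z_{t,x+\varepsilon v})
      - \nabla \log p(Z_{t,x}) \big\rangle\, dt
  \le -k\, \|\Delta_t\|_2^2\, dt,
```

by the $k$-strong log-concavity of $p$; Grönwall's inequality then gives $\|\Delta_t\|_2 \le e^{-kt/2}\|\Delta_0\|_2$.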

Third-order bounds
To establish the third conclusion (3.9), we consider the Itô process of third-order differences. Fix a value $s \in [0, t]$, and introduce the shorthand $c_1$ and $c_2$ for the values of $f_1$ and $f_2$ at the arguments appearing in the lemma statement. For any $h_3 \in C^3(\mathbb{R}^d)$, the Lemma 3.2 third-order difference inequality applies, and the triangle inequality then yields (3.14). We apply the bound (3.14) to the thrice continuously differentiable function $h_3(z) = \langle U_s, \nabla \log p(z)\rangle$, using the $k$-strong log-concavity of $p$ in the final line. Our efforts now yield (3.9). The third-order differenced function bound (3.11) then follows by applying the third-order synchronous coupling bound (3.9) to the estimate (3.15).

Proof of Theorem 2.1
By Lemma 3.1, for each $x \in \mathbb{R}^d$, the overdamped Langevin diffusion $(Z_{t,x})_{t \ge 0}$ is well-defined with stationary distribution $P$. Moreover, for each $x \in \mathbb{R}^d$, the diffusion $(Z_{t,x})_{t \ge 0}$, by definition, satisfies
\[
dZ_{t,x} = \tfrac{1}{2}\nabla \log p(Z_{t,x})\,dt + dW_t, \qquad Z_{0,x} = x.
\]
In what follows, when considering the joint distribution of a finite collection of overdamped Langevin diffusions, we will assume that the diffusions are coupled in the manner of Lemma 3.3, so that each diffusion is driven by a shared $d$-dimensional Wiener process $(W_t)_{t \ge 0}$.
Fix any $x \in \mathbb{R}^d$ and any $h \in C^3(\mathbb{R}^d)$ with bounded first, second, and third derivatives. We divide the remainder of our proof into five components, establishing that $u_h$ exists, $u_h$ is Lipschitz, $u_h$ has a Lipschitz gradient, $u_h$ has a Lipschitz Hessian, and $u_h$ solves the Stein equation (1.2).
Existence of $u_h$. To see that the integral representation of $u_h(x)$ is well-defined, we bound the integrand: the first relation uses the stationarity of $P$, the second is an application of the Lipschitz relation (3.1), and the third applies the first-order coupling inequality (3.7) of Lemma 3.3.
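A closed-form special case makes the integral representation concrete (a worked example, not drawn from the paper): for the one-dimensional target $p = N(0, 1/k)$ and test function $h(x) = x$, the diffusion is an Ornstein–Uhlenbeck process with $\mathbb{E}[h(Z_{t,x})] = x e^{-kt/2}$ and $\mathbb{E}_P[h] = 0$, so $u_h(x) = -\int_0^\infty x e^{-kt/2}\,dt = -2x/k$, which indeed solves the Stein equation:

```python
import numpy as np

k = 2.0
u = lambda x: -2.0 * x / k          # candidate Stein solution u_h(x) = -2x/k
du = lambda x: -2.0 / k             # u_h'
d2u = lambda x: 0.0                 # u_h''
grad_log_p = lambda x: -k * x       # score of p = N(0, 1/k)

# Generator (1.1) in one dimension
A_u = lambda x: 0.5 * du(x) * grad_log_p(x) + 0.5 * d2u(x)

xs = np.linspace(-3.0, 3.0, 7)
print([A_u(x) - x for x in xs])     # residual (A u_h)(x) - (h(x) - E_P[h]); all 0.0
```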
Lipschitz continuity of $\nabla u_h$. To demonstrate that $u_h$ is differentiable with Lipschitz gradient, we first establish a weighted second-order difference inequality for $u_h$.

Lemma 3.4 (Weighted second-order difference inequality). For any vectors $x, x', v \in \mathbb{R}^d$ with $\|v\|_2 = 1$ and weights $\varepsilon, \varepsilon' > 0$, the bound (3.17) holds.

Proof. We apply the Lemma 3.3 second-order function coupling inequality (3.10) to bound the integrand; the desired bound follows by integrating the final expression.

Now, fix any $x, v \in \mathbb{R}^d$ with $\|v\|_2 = 1$. As a first application of the Lemma 3.4 second-order difference inequality (3.17), we will demonstrate the existence of the directional derivative (3.18). Lemma 3.4 guarantees that the associated difference quotients form a Cauchy sequence; hence, the directional derivative (3.18) exists. To see that the directional derivative (3.18) is also Lipschitz, fix any $v \in \mathbb{R}^d$, and consider the bound (3.19).

Lipschitz continuity of $\nabla^2 u_h$. To demonstrate that $u_h$ is twice differentiable with Lipschitz Hessian, we next establish a weighted third-order difference inequality for $u_h$.

Lemma 3.5 (Weighted third-order difference inequality). For any vectors $x, x', v, v' \in \mathbb{R}^d$ with $\|v\|_2 = \|v'\|_2 = 1$ and weights $\varepsilon, \varepsilon' > 0$, the bound (3.20) holds.

Proof. Introduce the shorthand $c_1$ and $c_2$ for the values of $f_1$ and $f_2$ at the arguments appearing in the lemma statement. We apply the Lemma 3.3 third-order function coupling inequality (3.11) to the thrice continuously differentiable function $h$ to bound the integrand; integrating this final expression yields the advertised bound.

Now, fix any $x, v, v' \in \mathbb{R}^d$ with $\|v\|_2 = \|v'\|_2 = 1$. As a first application of the Lemma 3.5 third-order difference inequality (3.20), we will demonstrate the existence of the second-order directional derivative (3.21). Lemma 3.5 guarantees that, for any integers $m, m' > 0$, the associated difference quotients are close; hence, the sequence is Cauchy, and the directional derivative (3.21) exists.
To see that the directional derivative (3.21) is also Lipschitz, fix any $v \in \mathbb{R}^d$, and consider the bound in which the final inequality follows from Lemma 3.5. Since each second-order directional derivative is Lipschitz continuous, we conclude that $u_h \in C^2(\mathbb{R}^d)$ with Lipschitz continuous Hessian $\nabla^2 u_h$. Our Lipschitz gradient result (3.19)