An algebra of Stein operators

We build upon recent advances on the distributional aspect of Stein's method to propose a novel and flexible technique for computing Stein operators for random variables that can be written as products of independent random variables. We show that our results are valid for a wide class of distributions including normal, beta, variance-gamma, generalized gamma and many more. Our operators are $k$th degree differential operators with polynomial coefficients; they are straightforward to obtain even when the target density bears no explicit handle. As an application, we derive a new formula for the density of the product of $k$ independent symmetric variance-gamma distributed random variables.


Stein operators
Stein operators are, in short, linear operators that are of use within the context of Stein's method of distributional approximation. We refer to the survey paper [30] as well as to the monographs [24,6] for an overview of some of the fruits of the seminal insights of Charles Stein (1920-2016), first proposed in [35].
In the sequel we adopt the following lax definition.

Definition 1.1. A linear differential operator A acting on a class F of functions is a Stein operator for X if E[Af(X)] = 0 for all f ∈ F. We call the operator polynomial if it can be written as a finite sum A = Σ_{i,j} a_{ij} M^i D^j for real coefficients a_{ij} ∈ IR, with M(f) = (x → xf(x)) and D(f) = (x → f′(x)).
The first key to setting up Stein's method for a target X is to identify the operator A_X. A general canonical theory is available in [20], and many other general theories have been proposed in recent years. These are relatively easy to set up under specific assumptions on the target density; see https://sites.google.com/site/steinsmethod/ for an overview of the quite large literature on this topic. In the case where the target has a density with respect to the Lebesgue measure, an assumption which we impose from here onwards, ad hoc duality arguments are easy to apply for targets X whose densities satisfy explicit differential equations. For instance, the p.d.f. γ(x) = (2π)^{−1/2} e^{−x²/2} of the standard normal distribution satisfies the first order ODE γ′(x) + xγ(x) = 0, leading, by integration by parts, to the well-known operator Af(x) = f′(x) − xf(x). By a similar reasoning, natural first order operators are easy to devise for target distributions which belong to the Pearson family [32] or which satisfy a diffusive assumption [8,18].
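As a quick numerical illustration (our own, not part of the original argument), the standard normal Stein identity E[f′(Z) − Zf(Z)] = 0 can be checked by quadrature against the density γ; the test functions below are arbitrary choices.

```python
import math

# Sketch: numerically verify the standard normal Stein identity
# E[f'(Z) - Z f(Z)] = 0 by trapezoidal quadrature against the density
# gamma(x) = (2*pi)^(-1/2) * exp(-x^2 / 2).

def normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def stein_expectation(f, fprime, lo=-10.0, hi=10.0, n=200_000):
    """Trapezoidal approximation of E[f'(Z) - Z f(Z)], Z ~ N(0,1)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * (fprime(x) - x * f(x)) * normal_pdf(x)
    return total * h

err = stein_expectation(lambda x: x**3, lambda x: 3 * x**2)
print(abs(err))  # close to 0
```

The identity holds for any sufficiently integrable f; here f(x) = x³ reduces to the moment identity 3E[Z²] = E[Z⁴].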
There is a priori no reason for the characterising operator to be of first order; a very classical example is the above-mentioned standard normal operator, which is often viewed as Af(x) = f″(x) − xf′(x), the generator of an Ornstein-Uhlenbeck process (see, for example, [3], [16]). Higher order operators whose order cannot be trivially reduced are also available: [11] obtains a second order operator for the entire family of variance-gamma distributions (see also [10] and [12]), [29] obtain a second order Stein operator for the Laplace distribution, and [27] obtain a second order operator for the PRR distribution, which has a density that can be expressed in terms of the Kummer U function. More generally, if the p.d.f. of X is defined in terms of special functions (Kummer U, Meijer G, Bessel, etc.) which are themselves defined as solutions to explicit dth order differential equations, then the duality approach yields a tractable differential operator with explicit coefficients.
In many cases the target distribution is not defined analytically in terms of its density but rather probabilistically, as a statistic (sum, product, quotient) of independent contributions. Applying a direct duality argument, or requiring in any way explicit knowledge of the density, in order to obtain Stein operators for such objects is generally not tractable, and new approaches must be devised. In [1], a Fourier-based approach is developed for identifying appropriate operators for arbitrary combinations of independent chi-square distributed random variables. In [14,13], an iterative conditioning argument is provided for obtaining operators for (mixed) products of independent beta, gamma and mean-zero normal random variables.

An operator algebra
An important open research problem on this topic is the following: given independent random variables X_1, . . . , X_d with polynomial operators A_1, . . . , A_d, respectively, can one deduce a tractable polynomial Stein operator for statistics of the form X = F(X_1, . . . , X_d)? In this paper we provide an answer for functionals of the form F(x_1, . . . , x_d) = x_1^{α_1} · · · x_d^{α_d} with α_i ∈ IR and X_i's with polynomial Stein operators satisfying a specific commutativity assumption (Assumption 1 below).
Our starting point is the following extension of one of the main results of [14].

Proposition 1.2. Let F = C_0^∞(IR), the set of smooth functions with compact support, and τ_a(f) = (x → f(ax)). Assume X, Y are random variables with respective Stein operators A_X = L_X − M^p K_X and A_Y = L_Y − M^p K_Y, where p ∈ IN and where the operators L_X, K_X, L_Y, K_Y commute with each other and with every τ_a, a ∈ IR\{0}. Then, if X and Y are independent,

A_{XY} = L_X L_Y − M^p K_X K_Y

is a Stein operator for XY.
Proof. Let f ∈ F. Using a conditioning argument together with the commutativity between the different operators (in particular their commutation with the scalings τ_a), one checks that E[A_{XY} f(XY)] = 0, which completes the proof.
The assumption that the operators commute with the scalings τ_a is crucial for the proof of Proposition 1.2; it will also reveal itself to be the linchpin of our "operator algebra". Restricting to first order derivatives, we deduce that the fundamental operators need to be of the form

T_r := MD + rI, r ∈ IR,

with I(f) = f the identity operator, as these are the only first order polynomial Stein operators which commute with every scaling τ_a (as long as the class F is large enough). A fundamental subalgebra of linear operators, which will have a prominent role in this work, is the algebra T spanned by the T_r, r ∈ IR. Note also that T is the set of operators that are polynomials in the operator MD. Since each T_r commutes with each τ_a, so does any element of T. These considerations naturally lead to the following assumption, which will underpin the entire theory we develop.

Assumption 1: There exist linear operators L, K ∈ T and k ∈ IN such that X admits a Stein operator (in the sense of Definition 1.1) of the form A = L − M^k K.
In most situations, however, the operators K and L will be products of T_r operators.

Remark 1.3. We will also need to consider the limit of the operator T_r as r → ∞. Although such a limit is ill-defined, we note that lim_{r→∞} r^{−1} T_r = I (pointwise for any f ∈ F). Identities (5) also hold, after the appropriate rescaling, as r → ∞.
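The commutation property singling out the T_r operators can be checked symbolically. The following sketch (our own illustration) represents polynomials by coefficient lists, implements M, D, T_r = MD + rI and the scaling τ_a, and verifies that T_r commutes with τ_a while M alone does not.

```python
# Sketch: the operators M, D, T_r = M D + r I and the scaling tau_a acting
# on polynomials (coefficient lists, index = degree).  Exact arithmetic via
# fractions so the equality checks are rigorous on the chosen examples.
from fractions import Fraction as Fr

def D(p):                      # differentiation
    return [Fr(k) * p[k] for k in range(1, len(p))] or [Fr(0)]

def M(p):                      # multiplication by x
    return [Fr(0)] + list(p)

def T(r, p):                   # T_r = M D + r I
    q = M(D(p))
    n = max(len(p), len(q))
    q = q + [Fr(0)] * (n - len(q))
    pp = list(p) + [Fr(0)] * (n - len(p))
    return [q[k] + Fr(r) * pp[k] for k in range(n)]

def tau(a, p):                 # scaling (tau_a f)(x) = f(a x)
    return [Fr(a) ** k * p[k] for k in range(len(p))]

p = [Fr(2), Fr(-1), Fr(0), Fr(5)]      # 2 - x + 5 x^3
print(T(Fr(7, 2), tau(3, p)) == tau(3, T(Fr(7, 2), p)))  # True
print(M(tau(2, p)) == tau(2, M(p)))                      # False
```

On the monomial x^k both T_r τ_a and τ_a T_r produce (k + r) a^k x^k, which is why the check succeeds coefficientwise.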

Purpose and overview of the paper
As we shall see in Section 2, the operator algebra developed in this paper allows us to easily obtain many of the Stein operators from the existing literature. This is in part due to the fact that classical probability distributions (including the PRR and variance-gamma distributions) can be expressed as products and powers of simpler probability distributions, such as the normal, beta and gamma, whose classical Stein operators satisfy Assumption 1. Moreover, our operator algebra and a neat new technique (see the proof of Proposition 4.2) allow us to obtain new Stein operators that would be beyond the scope of existing methods. This paper is mostly devoted to the problem of obtaining Stein operators, which allows for a focused treatment of their theory. We acknowledge, though, that further work is required before the Stein operators obtained in this paper can be used to prove approximation theorems via Stein's method. However, we stress that the theory of Stein operators to which this paper contributes is also of importance in its own right. Indeed, there is now a wide variety of techniques which allow one to obtain useful bounds on solutions to the resulting Stein equations (see, for example, [19,9]) and which can be adapted to the operators that we derive. Also, as has now been demonstrated in several papers such as [1,2,25], Stein operators can be used for the comparison of probability distributions directly, without the need to solve Stein equations; this area is also the object of much interest. Stein operators are also of use in applications beyond proving approximation theorems. Indeed, in Section 3, we propose a novel general technique for obtaining formulas for densities of distributions that may be intractable through other existing methods. Finally, the characteristics of the operators open new and deep questions on the very nature of the objects that we are working with; see Conjecture 4.5.
Many classical continuous probability distributions on the real line have an operator of the form (4), and a trove of examples will be provided in Section 2.2. In Sections 2.3 and 2.4 we will build upon Proposition 1.2 to provide abstract results for products and powers of a wide variety of probability distributions satisfying the assumptions, and detailed examples will be worked out in Section 2.5. Section 3 will be devoted to an application of our operators. A general theory of "Stein operator algebra" for arbitrary functionals of arbitrary targets is, however, out of reach of the current state-of-the-art on Stein's method. In fact, even in the case of products of normal random variables there are great difficulties and important questions remain unsolved. We will further elaborate on this (and also provide several examples illustrating the difficulties) in Section 4. In Appendix A, we derive a Stein operator for the product of two independent non-centered normal random variables. Appendix B contains a list of classical Stein operators for continuous distributions, written in terms of the T_r operators, as well as some examples of Stein operators for product distributions. In Appendix C, we collect some basic properties of the Meijer G-function that are used in this paper.

Stein operators for product distributions

About the class of functions on which the operators are defined

Let us first briefly discuss the class of functions F on which the Stein operators are defined. It is not the purpose of this paper to seek the largest possible set F for which we have IE[Af(X)] = 0 for all f ∈ F, since this can only be done on a case-by-case basis. Note that even for the classical normal distribution, there is no simple characterisation of this largest set (and, in fact, there is little interest in finding one in the context of Stein's method).
A common 'weak' condition found in the literature is that both IE|f′(Z)| and IE|Zf(Z)| (Z ∼ N(0, 1)) be finite; this is sufficient but of course not necessary. In our context, we explain here why our operators act on a 'large' class F, in the sense that, throughout the paper except in Sections 2.4 and 4.5, F will always contain the set C_0^∞(IR) of smooth functions with compact support.
Remark 2.1. In some cases (for instance when X has exponential moments), one can easily extend this class to smooth functions f such that f (k) has at most polynomial growth for all k ≥ 0 (and in particular, to polynomials).
The distributions of the underlying random variables X considered in this paper will always admit a smooth density p defined on some (possibly unbounded) interval J ⊂ IR, with p nonvanishing on J. In this case, we call 'canonical' the Stein operator A_c f = f′ + (p′/p) f [20]. We have IE[A_c f(X)] = 0 at least for those functions f such that f(x)p(x) → 0 on the border of J.
If J = (−∞, +∞), it is clear that any f ∈ C_0^∞(IR) satisfies f(x)p(x) → 0 on the border of J. On the other hand, if J has a finite border, say J = (a, +∞) with a ∈ IR, then f ∈ C_0^∞(IR) does not necessarily imply that fp vanishes on the border (see [5] for the case of exponential approximation). In this case we apply the operator to (x − a)f instead of f, which leads to the new operator A = A_c(M − aI). This operator satisfies IE[Af(X)] = 0 for all f ∈ C_0^∞(IR). Of course this has to be done on both borders, if both are finite. For instance, this procedure applied to the Beta(a, b) distribution yields the operator Af(x) = x(1 − x)f′(x) + (a − (a + b)x)f(x) (which is actually the one commonly used in the Stein literature; see [8], [15]).

The building blocks
The purpose of this section is to give a simple criterion for identifying operators A that satisfy Assumption 1.
Let us first give some basic results. Using the fact that DM = MD + I, as well as the fact that, for all a ∈ IR, τ_a M = a M τ_a and D τ_a = a τ_a D, we deduce

∀r ∈ IR, ∀n ∈ IN: T_r M^n = M^n T_{r+n} and T_r D^n = D^n T_{r−n}. (5)
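These commutation relations are easily checked on monomials, on which each of M, D and T_r acts by shifting the degree and/or rescaling the coefficient. The following sketch (our own check) applies a word in the symbols M, D, T_r to x^k, reading right to left, and confirms both relations in (5).

```python
# Sketch: check T_r M^n = M^n T_{r+n} and T_r D^n = D^n T_{r-n} on monomials,
# where (T_r f)(x) = x f'(x) + r f(x).  On x^k: D x^k = k x^{k-1},
# M x^k = x^{k+1}, T_r x^k = (k + r) x^k, so both sides of each relation
# reduce to a scalar multiple of a single monomial.

def apply_word(word, k, coef=1.0):
    """Apply a word in 'M', 'D', ('T', r) to coef * x^k, right to left;
    returns the resulting (coefficient, degree)."""
    for sym in reversed(word):
        if sym == 'M':
            k += 1
        elif sym == 'D':
            coef *= k
            k -= 1
        else:                       # sym == ('T', r)
            coef *= k + sym[1]
    return coef, k

def relation_holds(r, n, k):
    mm = (apply_word([('T', r)] + ['M'] * n, k) ==
          apply_word(['M'] * n + [('T', r + n)], k))
    dd = (apply_word([('T', r)] + ['D'] * n, k) ==
          apply_word(['D'] * n + [('T', r - n)], k))
    return mm and dd

print(all(relation_holds(r, n, k)
          for r in (-1.5, 0.0, 2.0) for n in (1, 2, 3) for k in range(4, 8)))
```

The chosen r values are dyadic rationals, so all floating-point products here are exact and the equality tests are rigorous for these examples.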
Note also that if we define, for an operator L, LT := {LA : A ∈ T}, and similarly T L, then a direct consequence of (5) is that, for any n ∈ Z_+, M^n T = T M^n and D^n T = T D^n. (6)
We have the following criterion for X to satisfy Assumption 1 when its Stein operator is written in expanded form.

Lemma 2.2. X satisfies Assumption 1 if, and only if, it admits a polynomial Stein operator A = Σ_{i,j} a_{ij} M^j D^i for which the set {j − i : a_{ij} ≠ 0} contains at most two elements.
Proof. Let us first give a preliminary result. Consider an operator L = M^{k_1} D^{l_1} · · · M^{k_n} D^{l_n} and set q = Σ_i (k_i − l_i). Then, if q ≥ 0, L ∈ M^q T. Indeed, note that MD = T_0 and DM = T_1. Hence, in the product M^{k_1} D^{l_1} · · · M^{k_n} D^{l_n}, one can pair any product MD (or DM), replace it by T_0 (or T_1), and flush it right using (5). Hence the result.

Now we prove the lemma. If X satisfies Assumption 1, then a Stein operator for X is A = P(MD) − M^k Q(MD), where P and Q are polynomials. Using repeatedly the fact that DM = MD + I, one sees that, for any integer n ∈ IN, (MD)^n can be expanded into a sum of terms of the type a_l M^l D^l, l ∈ IN, a_l ∈ IR. The same holds for Q(MD).
Let us prove the converse. We only treat the case #{j − i : a_{ij} ≠ 0} = 2, the others being similar. Assume A = Σ_{i,j} a_{ij} M^j D^i, with {j − i : a_{ij} ≠ 0} = {k_1, k_2} and k_1 > k_2, is a Stein operator for X. Assume first that k_2 ≥ 0. Then, from the above, each M^j D^i belongs to M^{j−i} T, the latter set being either M^{k_1} T or M^{k_2} T depending on the value of j − i. The conclusion then follows from (6) after simplifying by M^{k_2} on the right (i.e., applying A to x → f(x) x^{−k_2} instead of f).
If k_2 < 0, one can multiply A by M^{−k_2} on the right and proceed in the same manner.
Remark 2.3. Assume X admits a smooth density p which solves an explicit differential equation. Then, by duality (i.e. integration by parts; see Section 3 for further detail), a Stein operator for X is available, and, in a similar manner as in the previous lemma, one can prove that X satisfies Assumption 1 under an analogous condition on the coefficients. In other words, the condition given in the previous lemma for X to satisfy Assumption 1 can be equivalently checked on the Stein operator A or on the differential operator B which cancels out the density of X.
One can specialize the result of Lemma 2.2 when the score of the distribution of X is a rational function, which covers a wide class of classical distributions.
Corollary 2.4. Assume X admits a score function of the form (7), with k, l ∈ Z_+. Then X satisfies Assumption 1. Conversely, if the score ρ of X is a rational function and X satisfies Assumption 1, then ρ is of the form (7).
Starting from the canonical Stein operator f → f′ + (p′/p) f, and applying it to (Σ_{j=0}^m b_j x^j) f(x), we have that, for some a, ℓ, δ_0, δ_1, δ_2 and x in the resulting support. If δ_0 = 0 then (9), and if δ_0 ≠ 0 then (10), is a Stein operator for X.
Proof. If X is Pearson with log-derivative (8) then, denoting this ratio by −P_num/P_denom, we see that the resulting operator is integrable with respect to p (with integral 0) for all f ∈ F. Conclusion (9) follows immediately, while (10) is obtained after replacing f with xf and using T_r M = M T_{r+1}.
Example 2.6. Here we provide a non-exhaustive list of examples; others include the Pareto, inverse-gamma and F -distribution.
Assumption 1 is satisfied.

Products of independent random variables
We first note that Proposition 1.2 is easily generalised, by induction, to the product of n random variables. More precisely, if (X_i)_{1≤i≤n} are independent random variables with respective Stein operators A_i = L_i − M^p K_i, where all the operators L_i, K_i commute with each other and with the τ_a, a ∈ IR, then a Stein operator for the product X_1 · · · X_n is L_1 · · · L_n − M^p K_1 · · · K_n.

The main drawback of Proposition 1.2 is that we assume that the same power M^p appears in both operators. As such, the proposition cannot be applied, for instance, to the product of a gamma (for which p = 1, see Example 2.6) and a centered normal (for which p = 2, see Example 2.6). In the following lemma and proposition, we show how to bypass this difficulty: one can build another Stein operator for X with the power p multiplied by an arbitrary integer k (even though by doing so, one increases the order of the operator). Here we restrict ourselves to the case where the L_i and K_i operators are products of operators T_α, and we make use of the relations (5).

Lemma 2.7. Assume X has a Stein operator of the form

A = Π_{i=1}^u T_{α_i} − M^p Π_{j=1}^v T_{β_j}.

Then, for every k ≥ 1, a Stein operator for X is given by

A_k = Π_{l=0}^{k−1} Π_{i=1}^u T_{α_i + lp} − M^{kp} Π_{l=0}^{k−1} Π_{j=1}^v T_{β_j + lp}.

Proof. We prove the result by induction on k. By assumption, it is true for k = 1. The induction step follows from the recurrence hypothesis and (5), which proves our claim.
Lemma 2.7 leads to the following rule of thumb for the problem of finding a Stein operator for a product of independent random variables X and Y with Stein operators A_X = L_X − M^p K_X and A_Y = L_Y − M^{p′} K_Y: apply Lemma 2.7 to X with k = p′ and to Y with k = p to get Stein operators for X and Y of the form of Proposition 1.2, but with p replaced by pp′. Then apply the proposition.
As an illustration, one can prove the following.
Proposition 2.8. Assume X, Y are random variables with respective Stein operators A_X = T_{α_1} − M^p T_{α_2} and A_Y = T_{β_1} − M^q T_{β_2}, where p, q ∈ IN and α_1, α_2, β_1, β_2 ∈ IR ∪ {∞} (and Remark 1.3 applies in case any of the α_i or β_i, i = 1, 2, is set to +∞). Let m be the least common multiple of p and q and write m = k_1 p = k_2 q. Then, if X and Y are independent,

A = Π_{l=0}^{k_1−1} T_{α_1+lp} Π_{l=0}^{k_2−1} T_{β_1+lq} − M^m Π_{l=0}^{k_1−1} T_{α_2+lp} Π_{l=0}^{k_2−1} T_{β_2+lq}

is a Stein operator for XY.
Proof. Apply Lemma 2.7 with k_1 and k_2 to get Stein operators for X and Y with common power M^m; the result then follows from an application of Proposition 1.2.
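As a concrete sanity check (our own, not taken from the text), combining the gamma Stein operator T_r − λM with Proposition 1.2 (here p = q = 1, so no iteration is needed) gives the product-gamma operator T_{r_1} T_{r_2} − λ_1 λ_2 M, and the resulting identity can be verified exactly on monomials using gamma moments.

```python
import math

# Sketch (our own check): for independent X1 ~ Gamma(r1, l1), X2 ~ Gamma(r2, l2),
# Proposition 1.2 with L_i = T_{r_i}, K_i = l_i I, p = 1 gives the product
# Stein operator A = T_{r1} T_{r2} - l1 l2 M.  On f(x) = x^k this reads
#     (k + r1)(k + r2) E[(X1 X2)^k] = l1 l2 E[(X1 X2)^{k+1}],
# which we verify via the exact moments E[X^k] = Gamma(r + k) / (Gamma(r) l^k).

def gamma_moment(r, lam, k):
    return math.gamma(r + k) / (math.gamma(r) * lam ** k)

def stein_gap(r1, l1, r2, l2, k):
    """Relative discrepancy in E[A f(X1 X2)] for f(x) = x^k."""
    m_k  = gamma_moment(r1, l1, k) * gamma_moment(r2, l2, k)
    m_k1 = gamma_moment(r1, l1, k + 1) * gamma_moment(r2, l2, k + 1)
    lhs = (k + r1) * (k + r2) * m_k      # E[T_{r1} T_{r2} f(X1 X2)]
    rhs = l1 * l2 * m_k1                 # E[l1 l2 M f(X1 X2)]
    return (lhs - rhs) / max(lhs, rhs)

print(max(abs(stein_gap(2.5, 1.0, 4.0, 3.0, k)) for k in range(6)))  # ~0
```

Since monomials span the polynomials, this confirms the identity on all polynomial test functions with sufficiently integrable growth.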
Remark 2.9. Let us give an example where one of the T operators is the identity: if X, Y are random variables with respective Stein operators of the above form in which T_{α_2} and T_{β_2} are replaced by the identity, then, with the same notation, a Stein operator for XY follows in the same way.

Remark 2.10. As seen in Example 2.6, and as will again be shown in Section 2.5, many classical distributions have Stein operators of the form (15) and (16), including the mean-zero normal, gamma, beta, Student's t and F-distributions, as well as powers of such random variables, which include inverse distributions such as the inverse-gamma distribution. See Appendix B for a list of these Stein operators. Note, in particular, that the standard normal Stein operator given in Appendix B is T_1 − M^2, rather than the classical standard normal Stein operator D − M. Further comments on this matter are given in Section 2.5.

Powers and inverse distributions
In this section, we assume that, a.s., X takes values in IR\{0}, and that test functions f are defined on this open set. We then extend the definition of M^a to a ∈ Z by M^a f(x) = x^a f(x), x ≠ 0. In the particular case where X takes values in (0, ∞) (and thus test functions are defined on (0, ∞)), this definition also makes sense for a ∈ IR.
Let us first note a result concerning powers. Let P_a be defined by P_a f(x) = f(x^a). For a ≠ 0, we have that T_r P_a = a P_a T_{r/a}, since x (d/dx) f(x^a) + r f(x^a) = a (x^a f′(x^a) + (r/a) f(x^a)). This result allows us to easily obtain Stein operators for powers of random variables and inverse distributions. Suppose X has a Stein operator of the form (18). We can write down a Stein operator for X^γ immediately (if X takes negative values, we restrict to positive or integer-valued γ): applying P_{1/γ} on the left of (18) gives the Stein operator (19) for the random variable X^γ, as P_{1/γ} P_γ = I. From (19) we immediately obtain, for example, the classical χ²(1) Stein operator T_{1/2} − (1/2) M from the standard normal Stein operator T_1 − M². However, in certain situations, a more convenient form of the Stein operator may be desired. To illustrate this, we consider the important special case of inverse distributions. Here γ = −1, which yields a Stein operator for 1/X; to remove the singularity, we multiply on the right by M^{−1}, and cancelling constants gives the Stein operator (20).
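The χ²(1) example above can be checked exactly (our own verification): for V = Z², Z ∼ N(0, 1), the moments of V are explicit, and the operator T_{1/2} − (1/2)M reduces on monomials to a moment recursion.

```python
import math

# Sketch (our own check): the text derives the chi-squared(1) Stein operator
# T_{1/2} - (1/2) M from the standard normal operator T_1 - M^2 via the power
# rule.  For V = Z^2, Z ~ N(0,1), the moments are
#     E[V^k] = 2^k * Gamma(k + 1/2) / Gamma(1/2),
# and on f(x) = x^k the identity E[(T_{1/2} - M/2) f(V)] = 0 reads
#     (k + 1/2) E[V^k] = (1/2) E[V^{k+1}].

def chi2_1_moment(k):
    return 2.0 ** k * math.gamma(k + 0.5) / math.gamma(0.5)

def stein_gap(k):
    lhs = (k + 0.5) * chi2_1_moment(k)       # E[T_{1/2} f(V)]
    rhs = 0.5 * chi2_1_moment(k + 1)         # E[(1/2) M f(V)]
    return (lhs - rhs) / rhs

print(max(abs(stein_gap(k)) for k in range(8)))  # ~0
```

The recursion is equivalent to Γ(k + 3/2) = (k + 1/2) Γ(k + 1/2), which is why the gap vanishes up to rounding.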

Examples
Starting from the classical Stein operators for the centered normal, gamma, beta, Student's t, inverse-gamma, F-, PRR, variance-gamma (with θ = 0 and µ = 0), generalized gamma and K-distributions, we use the results of Section 2.3 to derive new operators for (possibly mixed) products of these distributions. The operators of the aforementioned distributions are summarised in Appendix B. Stein operators for any mixed product of independent copies of such random variables are attainable through a direct application of Proposition 2.8. We give some examples below.

Mixed products of centered normal and gamma random variables
Stein operators for (mixed) products of independent central normal, beta and gamma random variables were obtained by [14,13]. Here we demonstrate how these Stein operators can be easily derived by an application of our theory (we omit the beta distribution for reasons of brevity). Let (X_i)_{1≤i≤n} and (Y_j)_{1≤j≤m} be independent random variables, and assume X_i ∼ N(0, σ_i²) and Y_j ∼ Γ(r_j, λ_j). The random variables X_i and Y_j admit the Stein operators σ_i² T_1 − M² and T_{r_j} − λ_j M, respectively. A repeated application of Proposition 2.8 now gives the Stein operators (23), (24) and (25). The product gamma Stein operator (24) is in exact agreement with the one obtained by [14]. The Stein operators (23) and (25) differ slightly from those of [13,14], but one can write T_1 = DM and then simplify by M on the right to retrieve the result of [13,14] (see also the discussion in Section 2.1). The same is true of the mixed product operator (25), which is the mixed normal-gamma Stein operator of [14] multiplied on the right by M. We refer to Appendix B, where this idea is expounded.
Finally, we note that whilst the operators (23) and (24) are of orders n and m, respectively, the mixed product operator (25) is of order n + 2m, rather than the order n + m which one might at first expect. This is a consequence of the fact that the powers of M in the Stein operators (21) and (22) differ by a factor of 2.
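The simplest mixed case n = m = 1 can be verified exactly (our own check): doubling the gamma operator via Lemma 2.7 and combining with the normal operator via Proposition 1.2 yields a third order operator whose action on monomials is a checkable moment identity.

```python
import math

# Sketch (our own check): combining the N(0, s^2) operator s^2 T_1 - M^2
# (p = 2) with the doubled Gamma(r, l) operator T_r T_{r+1} - l^2 M^2
# (Lemma 2.7 with k = 2), Proposition 1.2 gives, for the product W = X Y,
#     A = s^2 T_1 T_r T_{r+1} - l^2 M^2.
# On f(x) = x^{2k} this reads
#     s^2 (2k+1)(2k+r)(2k+r+1) E[W^{2k}] = l^2 E[W^{2k+2}],
# which we verify from E[X^{2k}] = s^{2k} (2k-1)!! and the gamma moments.
# (Odd monomials give 0 = 0 by the symmetry of X.)

def double_factorial_odd(k):          # (2k-1)!! = (2k)! / (2^k k!)
    return math.factorial(2 * k) / (2.0 ** k * math.factorial(k))

def product_moment(s, r, l, n):       # E[(XY)^n] for even n
    return s ** n * double_factorial_odd(n // 2) \
        * math.gamma(r + n) / (math.gamma(r) * l ** n)

def stein_gap(s, r, l, k):
    lhs = s ** 2 * (2 * k + 1) * (2 * k + r) * (2 * k + r + 1) \
        * product_moment(s, r, l, 2 * k)
    rhs = l ** 2 * product_moment(s, r, l, 2 * k + 2)
    return (lhs - rhs) / rhs

print(max(abs(stein_gap(1.3, 2.5, 0.7, k)) for k in range(5)))  # ~0
```

The operator is of order 1 + 2·1 = 3, illustrating the n + 2m order count noted above.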

Mixed product of Student and variance-gamma random variables
Let (X_i)_{1≤i≤n} and (Y_j)_{1≤j≤m} be independent random variables, and assume X_i ∼ T(ν_i) and Y_j ∼ VG(r_j, 0, σ_j, 0); the p.d.f.s of these distributions are given in Appendix B. X_i and Y_j admit Stein operators of the form given there. Note that one cannot apply Proposition 1.2 to the VG(r, θ, σ, 0) Stein operator σ² T_1 T_r + 2θ M T_{r/2} − M² (for θ ≠ 0), although we do obtain a Stein operator for the product of two such distributions in Section 4.4.3.
Applying Proposition 1.2 recursively, we obtain the desired Stein operators. As an aside, note that (26) can be obtained by applying Proposition 1.2 to the Stein operators A_X and A_Y, where X and Y are independent. We can identify A_X as the Stein operator of a N(0, σ²) random variable, and A_Y as the Stein operator of the random variable Y = √V, where V ∼ Γ(r/2, 1/2). Since the variance-gamma Stein operator is characterising (see [11], Lemma 3.1), it follows that Z ∼ VG(r, 0, σ, 0) is equal in distribution to X√V. This representation of the VG(r, 0, σ, 0) distribution can be found in [4]. This example demonstrates that, by characterising probability distributions, Stein operators can be used to derive useful properties of probability distributions; for a further discussion of this general matter see Section 3.

PRR distribution
A Stein operator for the PRR distribution is given by (29); see Appendix B.
We now exhibit a neat derivation of this Stein operator by an application of Section 2.3. Let X and Y be independent random variables with suitable distributions; then it is known that √(2s) XY ∼ K_s (see [27], Proposition 2.3). For s > 1 we have Stein operators for X and Y of one form, and, for 1/2 < s ≤ 1, of another. Using Proposition 2.8, we obtain a Stein operator valid for all s > 1/2.

From (19) we obtain the Stein operator
which on rescaling by a factor of √(2s) yields the operator (29).

Inverse and quotient distributions
From (20) we can write down Stein operators for the inverses of many standard distributions. First, suppose X ∼ Beta(a, b). Then (20) yields a Stein operator for 1/X, which is a second order differential operator. Let us next consider the inverse-gamma distribution. Let X ∼ Γ(r, λ); from (20) applied to the gamma Stein operator, we obtain a Stein operator for 1/X (an inverse-gamma random variable). If X_1 ∼ Γ(r_1, λ_1) and X_2 ∼ Γ(r_2, λ_2), we have, from the above operator and Proposition 2.8, the Stein operator (32) for Z = X_1/X_2, which is a first order differential operator. As a special case, we can obtain a Stein operator for the F-distribution with parameters d_1 > 0 and d_2 > 0, because an F-distributed random variable can be written as a ratio of independent scaled chi-squared random variables. Applying (32) and rescaling to take into account the factor d_1/d_2 gives a Stein operator for Z. One can also easily derive the generalized gamma Stein operator from the gamma Stein operator. The Stein operator for the GG(r, λ, q) distribution is given by T_r − qλ^q M^q. Using the relationship X =_d (λ^{1−q} Y)^{1/q} for X ∼ GG(r, λ, q) and Y ∼ Γ(r/q, λ) (see [28]), together with (19) and a rescaling, we readily recover the generalized gamma Stein operator from the usual gamma Stein operator.
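The inverse-gamma case admits an exact moment check (our own sketch): carrying out the recipe (20) on the gamma operator T_r − λM leads, in expanded form, to the operator below, which we verify against the inverse-gamma moments.

```python
import math

# Sketch (our own check): for X ~ Gamma(r, l), the inverse-distribution
# recipe applied to T_r - l M leads, in expanded form, to the inverse-gamma
# Stein operator
#     A f(x) = x^2 f'(x) + (1 - r) x f(x) + l f(x),
# i.e. A = M T_{1-r} + l I.  On f(x) = x^k and Y = 1/X this reads
#     (k + 1 - r) E[Y^{k+1}] + l E[Y^k] = 0,
# which we verify from the moments E[Y^k] = l^k Gamma(r - k) / Gamma(r),
# valid for k < r.

def inv_gamma_moment(r, l, k):
    return l ** k * math.gamma(r - k) / math.gamma(r)

def stein_gap(r, l, k):
    lhs = (k + 1 - r) * inv_gamma_moment(r, l, k + 1)
    rhs = -l * inv_gamma_moment(r, l, k)
    return (lhs - rhs) / rhs

print(max(abs(stein_gap(10.5, 2.0, k)) for k in range(5)))  # ~0
```

The identity reduces to Γ(r − k) = (r − k − 1) Γ(r − k − 1), confirming the operator on polynomial test functions of low degree.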
As a final example, we note that we can use Proposition 2.8 to obtain a Stein operator for the ratio of two independent standard normal random variables. A Stein operator for the standard normal random variable X_1 is T_1 − M², and we can apply (20) to obtain a Stein operator for the random variable 1/X_1.

Hence a Stein operator for the ratio of two independent standard normals is
which is the Stein operator for the Cauchy distribution (a special case of the Student's t Stein operator of [32]), as one would expect.

Duals of Stein operators and densities of product distributions
Fundamental methods, based on the Mellin integral transform, for deriving formulas for densities of product distributions were developed in [33,34]. In [34], formulas involving the Meijer G-function were obtained for products of independent centered normals, and for mixed products of beta and gamma random variables. However, for other product distributions, applying the Mellin inversion formula can lead to intractable calculations.
In this section, we present a novel method for deriving formulas for densities of product distributions, based on the duality between Stein operators and ODEs satisfied by densities. Our approach builds on that of [14], in which a duality argument was used to derive a new formula for the density of a mixed product of mutually independent centered normal, beta and gamma random variables (deriving such a formula using the Mellin inversion formula would have required some very involved calculations). We apply this method to derive a new formula for the p.d.f. of the product of n independent VG(r, 0, σ, 0) random variables, and to recover a formula for the product of n independent Student's t-distributed random variables that was given in [23]. We begin with a duality lemma whose proof is a straightforward generalisation of Section 3.2 of [14].
Lemma 3.1. Suppose X has density p supported on [a, b] and admits a Stein operator as above, and suppose that f satisfies appropriate boundary conditions (we denote this class of functions by C_p). Then p satisfies the differential equation (36).

Remark 3.2. The class of functions C_p consists of all f ∈ C^k([a, b]), where k = max{m, n}, that satisfy particular boundary conditions at a and b. Note that when (a, b) = IR the class includes the set of all functions on IR with compact support that are k times differentiable. The class C_p suffices for the purpose of deriving the differential equation (36), although we expect that for particular densities (such as the beta distribution) the conditions on f could be weakened.
Proof. We begin by writing the expectation (35) as an integral against p, which exists if f ∈ C_p. In arriving at the differential equation (40), we shall apply integration by parts repeatedly. To this end, it is useful to note the following integration by parts formula. Let γ ∈ IR, and suppose that φ and ψ are differentiable.
provided the integrals exist. We now return to equation (37) and use the integration by parts and formula (38) to obtain a differential equation that is satisfied by p.
where we used condition (i) to obtain the last equality. By a repeated application of integration by parts, using formula (38) and condition (i), we arrive at a first identity; by a similar argument, this time using formula (38) and condition (ii), we obtain a second. Putting these together we have, for all f ∈ C_p, the identity (39). Since (39) holds for all f ∈ C_p, we deduce (from an argument analogous to that used to prove the fundamental lemma of the calculus of variations) that p satisfies the differential equation (40). This completes the proof.
We now show how the duality Lemma 3.1 can be exploited to derive formulas for densities of distributions. By duality, p satisfies the differential equation (40). Making the change of variables y = b q^{n−m} x^q yields the differential equation (41). We recognise (41) as an instance of the Meijer G-function differential equation (78). There are max{m, n} linearly independent solutions to (41) that can be written in terms of the Meijer G-function (see [26], Chapter 16, Section 21). Using a change of variables, we can thus obtain a fundamental system of solutions to (40) given as Meijer G-functions. One can then arrive at a formula for the density by imposing the conditions that the solution must be non-negative and integrate to 1 over the support of the distribution. Due to the difficulty of handling the Meijer G-function, this final analysis is in general not straightforward. However, one can "guess" a formula for the density based on the fundamental system of solutions, and then verify that this is indeed the density by an application of the Mellin transform (note that in this verification step there is no need to use the Mellin inversion formula). An interesting direction for future research would be to develop techniques for identifying formulas for densities of distributions based solely on an analysis of the differential equation (40). However, even as it stands, we have a technique for obtaining formulas for densities that may be intractable through standard methods.
As has been noted, the method described above has already been used in [14] to derive formulas for the density of a mixed product of independent central normal, beta and gamma random variables. We now apply the method to obtain formulas for densities of products of independent variance-gamma and Student's t-distributed random variables.

Products of Student's t-distributed random variables. Recall the Stein operator (27) for the product of n independent Student's t-distributed random variables with ν_1, . . . , ν_n degrees of freedom, respectively:
By Lemma 3.1, we know that the density p of the product Student's t-distribution satisfies the differential equation (43). Making the change of variables y = ((−1)^n/(ν_1 · · · ν_n)) x² yields the differential equation (44). From (78), we can apply (77) to choose C such that p integrates to 1 across its support, where we used that Γ(1/2) = √π. The formula (45) represents a candidate density for the product of n independent Student's t-distributed random variables, which we could verify using Mellin transforms. However, we omit this analysis, because the density of this distribution has already been worked out in [23]: p(x) = (1/π^{n/2}) |x| · · ·

Products of VG(r, 0, σ, 0) random variables. Let (Z_i)_{1≤i≤n}, Z_i ∼ VG(r_i, 0, σ_i, 0), be independent, and set Z = Π_{i=1}^n Z_i. Recall the Stein operator (28) for the product of VG(r_i, 0, σ_i, 0) distributed random variables, where σ² = σ_1² · · · σ_n². By Lemma 3.1, it follows that the density p satisfies the differential equation (47). Arguing as we did in the Student's t example, we guess the following formula for the density p:

p(x) = (1/(2^n π^{n/2} σ)) Π_{j=1}^n (1/Γ(r_j/2)) G^{2n,0}_{0,2n}(· · ·).

It is straightforward to verify that (48) solves (47) using (78), and the normalizing constant was obtained using (77). Unlike the product Student's t-distribution formula of the previous example, the formula (48) is new, so we must prove that it is indeed the density of Z. We verify this using Mellin transforms; note that this verification is much more straightforward than an application of the Mellin inversion formula. Let us define the Mellin transform and state some properties that will be useful to us. The Mellin transform of a non-negative random variable U with density p is given by M_U(s) = E[U^{s−1}], for all s such that the expectation exists.
If the random variable U has a density p that is symmetric about the origin, then we can define the Mellin transform of U by The Mellin transform is useful in determining the distribution of products of independent random variables due to the property that if U and V are independent then M_{UV}(s) = M_U(s) M_V(s).
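The multiplicativity property is easy to check numerically. The following sketch (our own illustration, not part of the text; the shape parameters and the argument s are arbitrary choices) compares a Monte Carlo estimate of IE[(UV)^{s−1}] with the product of the exact gamma Mellin transforms M_U(s) = Γ(r + s − 1)/Γ(r) for U ∼ Γ(r, 1):

```python
import math
import random

random.seed(42)
r1, r2, s = 2.0, 3.0, 2.5          # illustrative shape parameters and argument
n = 400_000

def mellin_gamma(r, s):
    # Exact Mellin transform of Gamma(shape r, rate 1): E[U^(s-1)]
    return math.gamma(r + s - 1) / math.gamma(r)

# Monte Carlo estimate of E[(UV)^(s-1)] for independent U, V
mc = sum((random.gammavariate(r1, 1.0) * random.gammavariate(r2, 1.0)) ** (s - 1)
         for _ in range(n)) / n

exact = mellin_gamma(r1, s) * mellin_gamma(r2, s)
assert abs(mc - exact) / exact < 0.02   # M_{UV}(s) = M_U(s) M_V(s)
```

The same check works for any pair of non-negative random variables whose Mellin transforms are known in closed form.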

To obtain the Mellin transform of Z, note that
and Y_i ∼ Γ(r/2, 1/2) are independent. Using the formulas for the Mellin transforms of the normal and gamma distributions (see [34]), we have that and therefore (49). Now, let W denote a random variable with density (48). Then, using (77) gives that which is equal to (49). Since the Mellin transform of W is equal to that of Z, it follows that W is equal in law to Z. Therefore (48) is indeed the p.d.f. of the random variable Z.
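For the normal Mellin transform used above, one common normalisation gives IE|X|^{s−1} = 2^{(s−1)/2} Γ(s/2)/√π for X standard normal. As a sanity check (our own sketch, not part of the original argument, and tied to this particular normalisation), the following verifies the formula and its multiplicativity for the product of two independent standard normals:

```python
import math
import random

random.seed(0)
n, p = 400_000, 1.5                      # p = s - 1 with s = 2.5 (illustrative)

# One common normalisation: E|X|^(s-1) = 2^((s-1)/2) Gamma(s/2) / sqrt(pi)
exact = 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]

mc_one  = sum(abs(x) ** p for x in xs) / n                  # E|X|^p
mc_prod = sum(abs(x * y) ** p for x, y in zip(xs, ys)) / n  # E|XY|^p

assert abs(mc_one - exact) < 0.01
assert abs(mc_prod - exact ** 2) < 0.02  # multiplicativity for the product
```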

Extensions, outlook and open problems

Sums of independent random variables
The proofs of Proposition 1.2 and Lemma 2.7 rely heavily on the fact that L_b τ_a = τ_a L_b for all a, b ∈ IR. Since T_b τ_a = τ_a T_b for all a, b ∈ IR, we can apply a conditioning argument to derive a Stein operator for the product XY from the Stein operators for X and Y. Using such a conditioning argument to derive a Stein operator for the sum X + Y from the Stein operators for X and Y, however, works only in quite special cases. This is because, for sums, the operator analogous to τ_a is the shift operator S_a, defined by S_a(f) = (x → f(x + a)), which for non-zero a does not commute with the operator T_b. To see this: There is, however, a class of Stein operators for which a conditioning argument can easily be used to obtain a Stein operator for a sum of independent random variables. Suppose X, X_1, . . . , X_n are i.i.d., with Stein operator Let W = X_1 + · · · + X_n. Then, by conditioning,

Thus, a Stein operator for W is given by
This approach can be used, for example, to obtain a Stein operator for the χ^2 distribution.
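As a numerical illustration of this conditioning approach (a sketch of our own, not taken from the text), the sum W of n i.i.d. χ²_1 variables is χ²_n, whose gamma-type Stein operator A f(x) = 2x f′(x) + (n − x) f(x) can be checked by Monte Carlo:

```python
import math
import random

random.seed(1)
n_dof, n_samples = 3, 300_000

def chi2_sample(k):
    # chi-squared with k degrees of freedom as a sum of k squared standard normals
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

# Test function f(x) = e^{-x}; the chi^2_n Stein operator is
# A f(x) = 2x f'(x) + (n - x) f(x)
def Af(x):
    return 2 * x * (-math.exp(-x)) + (n_dof - x) * math.exp(-x)

est = sum(Af(chi2_sample(n_dof)) for _ in range(n_samples)) / n_samples
assert abs(est) < 0.01       # E[A f(W)] = 0 up to Monte Carlo error
```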

Operators for functionals
Formally, the following easy-to-prove result (see, for instance, [25]) provides an answer to all our queries on the topic of Stein operators for random variables which can be written as functionals of independent contributions.
Proposition 4.1. Let X be a random variable with Stein operator A_X acting on F. Let T : IR^2 → IR be non-constant in its first coordinate, and let Y be a random variable independent of X. Then This means that if we know the operator for one of the contributions, then we can, in principle, deduce an operator for the statistic (and Proposition 4.1 is easy to generalise to statistics of an arbitrary number of independent contributions). For example, if X, Y are independent standard normal, then A_X g(x) = g′(x) − xg(x) and, choosing T(x, y) = x + y, we immediately obtain, by independence and the equality in distribution of X and Y, that which is none other than the operator for Z ∼ N(0, 2), a centered normal random variable with variance 2, as expected. Such a simple argument breaks down if T(X, Y) = XY, because then (51) becomes (still under the assumption that X, Y are i.i.d. standard normal) This first order operator is difficult to handle, and the more appropriate operator is known from [13] to be a second order operator. The passage from the former to the latter yields a tractable operator at the cost of a higher degree. In the sequel we will see that such changes in the order of the operator are far from anecdotal; rather, they reflect deeply the inherent randomness of the target distribution. For instance, if X and Y are independent normal with variance 1 and mean 1, then we will prove in Section 4.4.1 that a polynomial Stein operator (i.e. a Stein operator with polynomial coefficients) is now a third order operator. Similarly, if X and Y do not have the same mean, then the resulting operator has yet another order, depending on whether or not one of them is centered.
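Both operators in this discussion can be checked numerically. The sketch below (our own illustration) verifies, with f(x) = sin x, that IE[2f′(Z) − Zf(Z)] = 0 for Z ∼ N(0, 2), and that the second order product-normal operator of [13], A f(x) = x f″(x) + f′(x) − x f(x), annihilates test functions for Z = XY with X, Y i.i.d. standard normal:

```python
import math
import random

random.seed(7)
n = 400_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]

# Sum: Z = X + Y ~ N(0, 2) with operator A f(x) = 2 f'(x) - x f(x); f = sin
sum_est = sum(2 * math.cos(z) - z * math.sin(z)
              for z in (x + y for x, y in zip(xs, ys))) / n

# Product: Z = XY with the second order operator of [13],
# A f(x) = x f''(x) + f'(x) - x f(x); with f = sin this is cos(z) - 2 z sin(z)
prod_est = sum(math.cos(z) - 2 * z * math.sin(z)
               for z in (x * y for x, y in zip(xs, ys))) / n

assert abs(sum_est) < 0.05
assert abs(prod_est) < 0.1
```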

An extension when Assumption 1 fails
A natural question is whether one can extend the result of Proposition 1.2 to the case where the underlying random variable X does not satisfy Assumption 1 (as is the case, for instance, for the non-centered normal distribution). The proof of Proposition 1.2 is not easily adapted to this case.
We are able, however, to give a general result for the product of two i.i.d. random variables whose Stein operator has a particular form. Note that the operator in the following proposition, written in the expanded form Σ_{i,j} a_ij M^i D^j of Lemma 2.2, satisfies #{j − i | a_ij ≠ 0} = 3: we have added only one level of complexity to the operator. Nevertheless, as we will see later on, this form is sufficient to include the classical cases of the non-centered normal and non-centered gamma, and a more general sub-family of the variance-gamma. Then, a weak Stein operator for Z = XY is Proof. Let Z = XY and f ∈ F. We have which yields, since (X, Y) is exchangeable, and Then, from the above, However Thus, from (55), Now, by applying equations (55) and (56) to L_1 f and L_2 f respectively, for suitable operators L_1 and L_2, we can make the terms in X disappear. More precisely, if we define then we have Indeed, Thus, using (55) and (56), we get IE[(KL_2 − αT_a LL_2 − βKL_1)f(Z)] = 0, and a straightforward calculation leads to (52). Note that the key step in the proof was to apply (55) and (56) to L_1 f and L_2 f for suitable operators L_1 and L_2 in order to make the terms in X disappear. As far as we are aware, this is the first time that this technique has been used to obtain Stein operators, and we consider it a useful addition to the Stein operator toolkit. In Appendix A, we use the technique to derive a Stein operator for the product of independent non-centered normals with different means; this more specific setting provides a useful illustration of the technique.
The product operator (52) is in general a seventh order differential operator. For particular cases, however, such as the product of two i.i.d. non-centered normals, the operator reduces to one of lower order; see Section 4.4.1. Whilst we strongly believe that this reduced operator is a polynomial operator of minimal order, we have no proof of this claim (nor do we have much intuition as to whether the seventh order operator (52) is of minimal order in general). We believe this question of minimality to be of importance and state it as a conjecture.
Conjecture 4.5. There exists no second order Stein operator (acting on smooth functions with compact support) with polynomial coefficients for the product of two independent non-centered normal random variables.
We have not been able to devise a proof strategy to establish Conjecture 4.5. However, one can use a brute force approach to verify the conjecture for polynomial coefficients of fixed degree (provided the conjecture is true). Such results would be worthwhile in practice, because a third order Stein operator with linear coefficients may be easier to work with in applications than a second order Stein operator with polynomial coefficients of very high degree.
We used Mathematica to compute that the determinant of the matrix corresponding to this system of equations is 783360 ≠ 0. Therefore, the system has a unique solution, which is clearly a = b = · · · = f = 0. Thus, there does not exist a second order Stein operator with linear coefficients for Z.
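This brute force computation can be reproduced in a few lines. The sketch below is our own reconstruction, taking µ = 1 for concreteness; the determinant value will in general differ from the 783360 quoted above, which depends on the precise system solved. It applies a putative second order operator with linear coefficients, (aZ+b)f″ + (cZ+d)f′ + (eZ+f₀)f, to the monomials x^k, k = 0, . . . , 5, and checks that the resulting homogeneous system for the six coefficients has nonzero determinant, so only the trivial solution exists:

```python
from math import comb

def gauss_moment(k):
    # E[G^k] for G ~ N(0,1): 0 for odd k, (k-1)!! for even k
    if k % 2:
        return 0
    out = 1
    for j in range(1, k, 2):
        out *= j
    return out

def normal_moment(k, mu=1):
    # E[(mu + G)^k] via the binomial theorem
    return sum(comb(k, i) * mu ** (k - i) * gauss_moment(i) for i in range(k + 1))

# Moments of Z = XY with X, Y i.i.d. N(1,1): E[Z^j] = (E[X^j])^2
m = [normal_moment(j) ** 2 for j in range(8)]

def row(k):
    # Coefficients of (a, b, c, d, e, f0) in
    # E[(aZ+b) f''(Z) + (cZ+d) f'(Z) + (eZ+f0) f(Z)] for f(x) = x^k
    return [k * (k - 1) * m[k - 1] if k >= 1 else 0,
            k * (k - 1) * m[k - 2] if k >= 2 else 0,
            k * m[k],
            k * m[k - 1] if k >= 1 else 0,
            m[k + 1],
            m[k]]

M = [row(k) for k in range(6)]

def det(mat):
    # Exact integer determinant by cofactor expansion along the first row
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det([r[:j] + r[j + 1:] for r in mat[1:]])
               for j in range(len(mat)))

assert det(M) != 0   # only a = b = ... = f0 = 0, so no such operator exists
```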

Product of non-centered normals
Assume X and Y each have a normal distribution with mean µ and variance 1. Their common Stein operator is thus D − M + µI. Apply Proposition 4.2 with α = µ, β = 1 and a = b = ∞ to get the following Stein operator for XY: which, in expanded form, is Note that when µ = 0, the above operator becomes which we recognise as the product normal Stein operator obtained by [13]. Suppose now that X ∼ N(µ_X, 1) and Y ∼ N(µ_Y, 1) are independent, with µ_X and µ_Y not necessarily equal. Then one can use an argument similar to that used to prove Proposition 4.2 to obtain a Stein operator for the product XY: The derivation is given in Appendix A and serves as a useful concrete illustration of the proof technique of Proposition 4.2. It is interesting to note that (61) is a fourth order differential operator: one order higher than the third order operator (59), and two higher than the Stein operator for the product of two central normals.

Product of non-centered gammas
Assume X and Y are Γ(r, 1) distributed, and let µ ∈ IR. A Stein operator for X + µ (or Y + µ) is A_X = T_{r+µ} − µD − M. Proposition 4.2 applied with α = 1, β = −µ, a = r + µ, b = ∞ yields the following fourth order weak Stein operator for Z = (X + µ)(Y + µ): Note also that when µ = 0, this operator reduces to (M − T_r^2) T_{r−1}, which is the operator found in Section 2.5.1 applied to T_{r−1} f instead of f.

Reduced order operators
Here we consider a general setting in which reduced order operators can be obtained. Suppose that the random variable Z has Stein operator which may have arisen naturally from a repeated application of Proposition 2.8. For general parameter values, this is a differential operator of order max{m, n}. For particular parameter values, however, we can obtain an operator of lower order. Consider the sets R = {a_1, . . . , a_m} and S = {r_1, . . . , r_n}.
If |R ∩ S| = t, then we can obtain a Stein operator for Z of order max{m, n} − t. To see this, suppose, without loss of generality, that r_j = a_j for j = 1, . . . , t. Then we can write (recalling that the operators T_α and T_β commute) Taking f(x) = T_{r_1} · · · T_{r_t} g(x) now gives the following reduced order operator: Specific examples of reduced order operators for mixed products of centered normal, beta and gamma random variables are given in [14]. The fact that the order of the operator reduces to max{m, n} − t when |R ∩ S| = t is related to the fact that the density of the random variable Z can be written as a Meijer G-function. By duality, the density p of Z satisfies the differential equation Using (78), we have that solutions to this differential equation are of the form where C is an arbitrary constant and k, l ∈ {0, . . . , max{m, n}} are integers that we are free to choose (k = n, l = 0 for the density of the product normal distribution (see [34]), but k = n, l = n for the density of the Student's t-distribution). It is interesting to note that the order of the G-function (64) reduces to max{m, n} − t precisely when |R ∩ S| = t (see Section C.2). The duality between Stein operators and differential equations satisfied by densities therefore suggests that Stein operators for product distributions arising from an application of Proposition 2.8 have minimal order amongst all Stein operators with polynomial coefficients for the given distribution.
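The commutativity of the T operators invoked above is easy to verify on polynomials: since T_a maps the coefficient c_k of x^k to (k + a)c_k, the composition T_a T_b multiplies c_k by (k + a)(k + b), which is symmetric in a and b. A minimal check (our own sketch):

```python
# f = sum_k c_k x^k represented by its coefficient list; T_a f = x f' + a f
# acts diagonally: the coefficient c_k of x^k is mapped to (k + a) c_k.
def T(a, coeffs):
    return [(k + a) * c for k, c in enumerate(coeffs)]

f = [3, -1, 4, 1, -5]            # an arbitrary test polynomial
a, b = 2, 5
assert T(a, T(b, f)) == T(b, T(a, f))   # T_a T_b = T_b T_a
```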
We expect this minimality to hold unless the sets R and S share at least one element, in which case we can obtain a lower order operator by arguing as we did in obtaining (63).

On the characterisation properties of polynomial Stein operators

It is clear that the notion of characterisation depends heavily on the class F of functions, which has to contain enough functions to ensure that A is characterising. For instance, we have the following.
Proposition 4.7. Assume that the distribution of X is not uniquely determined by its moments, and let Z be a random variable with a different distribution but the same moments as X. Let A_X be a polynomial Stein operator for X, and let n be its degree. Define the weighted Sobolev norm ||·||_{Z,n} by on the space of functions that admit weak derivatives up to order n and such that these expectations are finite. Let F be the completion of the set of polynomials for this norm. Then i.e. A_X is not characterising on F.
Proof. Since X and Z have the same moments, we deduce at once that, for any polynomial P, The proof follows from a denseness argument: let P_m → f in the norm defined above. Then, developing A_X, it suffices to show that, for k, l ≥ 0, IE[Z^k (f^{(l)}(Z) − P_m^{(l)}(Z))] → 0 as m → ∞. But by Cauchy-Schwarz, which tends to zero as m → ∞.
It is now a purely analytical problem to determine how large the set F defined in the last Proposition is. For instance, does it contain smooth functions with polynomial growth? Or, at least, smooth functions with compact support? This is a difficult question, which is only partly answered in the literature. For instance, [31] proves that the answer is yes in the particular case n = 1 when Z has exponential moments; but this case does not fall within the scope of the last Proposition, since we know that if Z has exponential moments, then it is characterised by its (polynomial) moments. The question is all the more difficult since, even when the distribution of X is well known, little is known about the distribution of Z, and thus about the topological properties of the weighted Sobolev norm (65).
We also have the following partial converse to Proposition 4.7.
Proposition 4.8. Suppose that the law of the random variable X, supported on I ⊂ IR, is determined by its moments. Let the operator A_X = Σ_{i=1}^n Σ_{j=1}^p a_{i,j} M^j D^i, where a_{i,j} ∈ IR, act on a class of functions F which contains all polynomial functions. Suppose A_X is a Stein operator for X: that is, for all f ∈ F, where the maxima and minima are taken over all i, j such that a_{i,j} ≠ 0. Suppose that the first m moments of Y are equal to those of X and that for all f ∈ F. Then Y has the same law as X.
Proof. We prove that all moments of Y are equal to those of X. As the moments of X determine its law, this proves the Proposition. The monomials {x^k : k ≥ 1} are contained in the class F, so applying f(x) = x^k, k ≥ m, in (67) yields the recurrence relation where C_k = k(k − 1) · · · (k − i + 1) if k − i + 1 > 0 and C_k = 0 otherwise. We have IE[Y^0] = 1, and we are given that IE[Y^k] = IE[X^k] for k = 1, . . . , m. We can then use forward substitution in (68) to (uniquely) obtain all moments of Y. Due to (66), IE[A_X f(X)] = 0 for all f ∈ F, and so it follows by the above reasoning that But this is the same recurrence relation as (68) and, since IE[Y^k] = IE[X^k] for k = 1, . . . , m, it follows that IE[Y^k] = IE[X^k] for all k ≥ m as well.
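As a minimal illustration of this forward substitution (our own example), the standard normal operator A f(x) = f′(x) − x f(x) applied to f(x) = x^k gives IE[kX^{k−1} − X^{k+1}] = 0, i.e. the recurrence m_{k+1} = k m_{k−1}, which, starting from m_0 = 1 and m_1 = 0, reproduces the Gaussian moments:

```python
# Forward substitution in the moment recurrence m_{k+1} = k * m_{k-1},
# obtained by applying the standard normal Stein operator to f(x) = x^k.
m = [1, 0]                       # m_0 = 1, m_1 = 0 (mean zero)
for k in range(1, 9):
    m.append(k * m[k - 1])

# Odd moments vanish; even moments are the double factorials (2j - 1)!!
assert m[3] == m[5] == m[7] == 0
assert (m[2], m[4], m[6], m[8]) == (1, 3, 15, 105)
```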
If we have obtained a Stein operator A_X for a random variable X, then Proposition 4.8 tells us that the operator characterises the law of X whenever X is determined by its moments. This characterisation is weaker than those typically found in the Stein's method literature, as it involves moment conditions on the random variable Y. This is perhaps not surprising, because the characterisations given in the literature have mostly been found on a case-by-case basis, whereas ours applies to a wide class of distributions.
Let us first give an example that illustrates both Propositions 4.7 and 4.8, and shows the importance of the role of the class of functions F in the characterising nature of a Stein operator. Assume X has a Student's t-distribution with ν degrees of freedom; then X admits the Stein operator A_X f(x) = (ν + x^2)f′(x) − (ν − 1)xf(x) (see [32]; note that the Student's t operator in Appendix B acts on functions g(x) = xf(x)). We know, however, that X is not characterised by its moments (see [37]); thus, if F_1 is defined as in Proposition 4.7, then A_X does not characterise the distribution of X on F_1. However, we know from [32], Theorem 1 and [12], Proposition 3.1 that, when applied to a wider class of functions, say F_2, the operator A_X characterises X on F_2.
We now give some simple examples of the application of Proposition 4.8. For a detailed account of the moment determinacy of powers and products of random variables, see [37].
• The beta distribution is supported on (0, 1), so its distribution is determined by its moments. Therefore, the polynomial Stein operator for the product of independent beta random variables obtained by [14] is characterising.
• The distribution of the product X_n of n independent standard normal random variables is determined by its moments if and only if n = 1, 2 (see [37], Corollary 2). Thus, polynomial Stein operators for X_n are characterising if n = 1, 2, and the case n ≥ 3 remains an open problem. However, the moments of |X_n|^{1/n} determine its distribution (see [37], Corollary 1), and so polynomial Stein operators for |X_n|^{1/n} are characterising for all n ≥ 1.
A more general open question of interest is: given that a random variable X admits a polynomial Stein operator acting on a 'large' class of functions (containing, say, smooth functions with compact support), can we always extend this class so that the operator is characterising on the larger class?

Some open problems
We conclude with a list of open problems:
• Problem 1: Find bounds on solutions to Stein equations. This is an important technical challenge that must be overcome before higher order Stein operators, as obtained in this paper, can be used to prove approximation theorems.
• Problem 3: Obtain an algebra of Stein operators for other functionals of random variables.
Of particular interest would be sums of non-identically distributed random variables, although we saw in Section 4.1 that this seems more challenging than an algebra for products.
• Problem 4: Given that a random variable X admits a polynomial Stein operator acting on a 'large' class of functions (containing, say, smooth functions with compact support), can we always extend this class so that the operator is characterising on the larger class?
A A Stein operator for the product of non-central independent normals

In Section 4.4.1, we stated that a Stein operator for the product of the independent normals N(µ_X, 1) and N(µ_Y, 1) is given by Here we provide a derivation of this Stein operator, which serves as a concrete illustration of the proof technique of Proposition 4.2. Let X and Y be independent standard normal random variables and define Z = (X + µ_X)(Y + µ_Y). Let us now show that (69) is a Stein operator for Z. We will use repeatedly the fact that, for W standard normal, IE[W g(W)] = IE[g′(W)]. We shall also repeatedly apply conditioning arguments, and we let IE_W[·] stand for the expectation conditioned on W. Let f : IR → IR be four times differentiable and such that IE|Zf^{(i)}(Z)| < ∞ for i = 0, 1, . . . , 4 and IE|f^{(i)}(Z)| < ∞ for i = 0, 1, 2, 3, where f^{(0)} ≡ f. Then By again applying a conditioning argument we obtain Hence From the last equations of (70) and (71), we have that and

B List of Stein operators for continuous distributions
Recall that Mf(x) = xf(x), Df(x) = f′(x), I is the identity and T_a f(x) = xf′(x) + af(x). We also recall the definitions of some standard functions. The beta function is defined by B(a, b) = Γ(a)Γ(b)/Γ(a + b). U(a, b, x) denotes the confluent hypergeometric function of the second kind ([26], Chapter 13). The modified Bessel function of the second kind is given, for x > 0, by K_ν(x) = ∫_0^∞ e^{−x cosh(t)} cosh(νt) dt (see [26]). We give a list of Stein operators for several classical probability distributions, in terms of the above operators. References for these Stein operators are as follows: normal [35], gamma [7,21], beta [8,15,32], Student's t [32], inverse-gamma [17], F-distribution (new to this paper), PRR [27], variance-gamma [11], generalized gamma [14], and two versions of the K-distribution (both of which can be deduced from [14], as they are both powers of products of independent gammas).
The usual Stein operators (as defined in the above references) for the normal, PRR and variance-gamma distributions, are not in the form required in Section 2. In these cases, we multiply the operators by M on the right (which is equivalent to applying them to xf (x) instead of f (x)); see discussion in Section 2.1.
We note that the support of the variance-gamma distributions is IR when σ > 0, but in the limit σ → 0 the support is the region (µ, ∞) if θ > 0, and is (−∞, µ) if θ < 0.

C The Meijer G-function
Here we define the Meijer G-function and present some of its basic properties that are relevant to this paper. For further properties of this function see [22,26].