Generalized bounds for active subspaces

In this article, we consider scenarios in which traditional estimates for the active subspace method based on probabilistic Poincaré inequalities are not valid due to unbounded Poincaré constants. Consequently, we propose a framework that allows us to derive generalized estimates in the sense that it enables control of the trade-off between the size of the Poincaré constant and a weaker order of the final error bound. In particular, we investigate independently exponentially distributed random variables in dimension two or larger and give explicit expressions for corresponding Poincaré constants, showing their dependence on the dimension of the problem. Finally, we suggest possibilities for future work that aim at extending the class of distributions applicable to the active subspace method, as we regard this as an opportunity to broaden its usability.


Introduction
Many modern computational problems, having a large number of input variables or parameters, suffer from the "curse of dimensionality" in that their solution becomes computationally expensive or even intractable as the dimension of the problem grows. The active subspace method (ASM), or, for short, active subspaces [17,18], is a set of tools for dimension reduction that mitigate the effects caused by the curse of dimensionality. ASM splits a Euclidean input space into a so-called active and an inactive subspace based on average sensitivities of a real-valued function of interest. The sensitivities are found by an eigendecomposition of the averaged outer product of the function's gradient with itself. That is, eigenvalues indicate average sensitivities of the function of interest in the direction of the corresponding eigenvectors. Eigenvalues and eigenvectors belonging to the active subspace are then considered as dominant for the global behavior of the function of interest, whereas the inactive subspace is regarded as negligible.
The usefulness of ASM has already been demonstrated for several real case studies in various applied disciplines; see, e. g., [22,31,36,37,39]. It has also motivated other methodological advances, e. g., in the solution of Bayesian inverse problems [35] by an accelerated Markov chain Monte Carlo algorithm [20], in uncertainty quantification and propagation [15,40], and in the theory of ridge approximation; see, e. g., [19,24,25].
However, ASM is only one dimension reduction technique among others. For example, likelihood-informed dimension reduction for the solution of Bayesian inverse problems [21] is based on a similar idea. This approach, however, analyzes the Hessian matrix of the function of interest instead of the gradient. An extension to vector-valued functions in gradient-based dimension reduction is given by [45]. Dimension reduction for nonlinear Bayesian inverse problems based on the Kullback-Leibler (KL) divergence of approximate posteriors and (subspace) logarithmic Sobolev inequalities, including a comprehensive comparison of several other techniques, was provided by the authors of [46]. Furthermore, Active Manifolds [11], as a nonlinear analogue to ASM, and PTU [13], as an extension of Isomap [38], a framework for nonlinear dimension reduction, have both shown considerable promise.
A main result in ASM theory is an upper bound on the mean squared error between the original function of interest and its low-dimensional approximation on the active subspace. The corresponding proof is based on an inequality of Poincaré type which is probabilistic in nature since ASM involves a probability distribution that weights sensitivities of the function of interest at different locations in the input space. The upper bound consists of the product of a Poincaré type constant and the sum of eigenvalues corresponding to the inactive subspace, called the inactive trace in the following. The constant derived in [18] is claimed to depend only on the original distribution, which is generally incorrect. Also, to the knowledge of the authors, existing theory for dimension reduction techniques based on Poincaré or logarithmic Sobolev inequalities is subject to quite restrictive assumptions on the involved probability distribution. These assumptions comprise either the distribution having compact support or its density ρ being of uniformly log-concave form, i.e., ρ(x) = exp(−V(x)), where V is such that its Hessian matrix satisfies ∇²V(x) ⪰ αI for each x and some α > 0. By the famous Bakry-Émery criterion, the latter assumption implies a logarithmic Sobolev inequality and a Poincaré inequality with universal Poincaré constant 1/α; see, e. g., [3,41]. Note that the case α = 0, i.e., V being only convex, is not covered. However, Bobkov [8] showed that a Poincaré inequality is still satisfied in this case and gave lower and upper bounds on the corresponding Poincaré constant. Distributions with heavier tails, i.e., for α = 0, as, e. g., exponential or Laplace distributions, do not satisfy the assumptions above but are of practical relevance.
In ASM theory, it is not the original distribution that must satisfy a Poincaré inequality, but a conditional distribution on the inactive subspace, which depends on a variable defined on the active subspace. Both assumptions on the original distribution from above are in fact passed on to the conditional distribution. However, the case α = 0 is cumbersome. We shall give an example for this case regarding a distribution that itself satisfies a Poincaré inequality, but might not be applicable at all, or only with care, due to an arbitrarily large Poincaré constant in the final bound for the mentioned mean squared error. Our arguments are based on the bounds for corresponding Poincaré constants given by Bobkov in [8]. We also describe a way to still get upper bounds in this situation, however with a weaker, reduced order in the inactive trace. This order reduction is controllable in the sense that the practitioner can decide on the actual trade-off between the order of the inactive trace and the size of the corresponding Poincaré constant. The mentioned general problem and its solution are exemplified on independently exponentially distributed random variables in dimension two and larger. Also, it is shown that the final constant depends strongly on the dimension of the problem. However, since this example is rather special, we eventually propose opportunities for future work that aim at extending the class of distributions for which the bounds and the involved constants are explicitly available in order to expand the applicability of ASM to more scenarios of practical interest. In particular, the class of multivariate generalized hyperbolic distributions is a rich class that is, in our opinion, worth investigating. Details on arising difficulties with this class are also provided.
The outline of the manuscript is as follows. Section 2 gives an introduction to ASM and its formal context. In Section 3, we recall results involving compactly supported and normal distributions. The main results consisting of a motivation and discussion of the mentioned problems, with independently exponentially distributed random variables as an extreme example, are presented in Section 4. In Section 5, we propose possibilities for future work. Finally, a summary is given in Section 6.

Active subspaces
The active subspace method is a set of tools for gradient-based dimension reduction [17,18]. Its aim is to find directions in the domain of a function f along which the function changes dominantly, on average. For illustration, consider a function of the form f(x) = g(A^T x) with a so-called profile function g and a matrix A ∈ R^{n×k}, 1 ≤ k ≤ n, n ≥ 2. Functions of this type are called ridge functions [34]. Note that f is constant along the null space of A^T. Indeed, for x ∈ dom(f) ⊆ R^n and v ∈ N(A^T) such that x + v ∈ dom(f), it holds that

f(x + v) = g(A^T(x + v)) = g(A^T x) = f(x).

That is, f is intrinsically at most k-dimensional. For arbitrary f, the general task is to find a suitable dimension k, a function g : dom(g) → R, dom(g) ⊆ R^k, and a matrix A ∈ R^{n×k} such that f(x) ≈ g(A^T x). For this, the active subspace method assumes that the function of interest f : X → R is continuously differentiable with partial derivatives that are square-integrable w.r.t. a probability density function ρ_X. We define X := dom(f) ⊆ R^n to be the support of ρ_X, i. e., the closure of the set X⁺ := {x ∈ R^n | ρ_X(x) > 0}. We assume that X is a continuity set, that is, its boundary is a Lebesgue null set. The central object of interest is a matrix constructed by outer products of the gradient of f, ∇f = ∇_x f, with itself weighted by ρ_X,

C := ∫_X ∇f(x) ∇f(x)^T ρ_X(x) dx = E[∇f(X) ∇f(X)^T].    (2.2)

Since C is real symmetric, there exists an eigendecomposition C = W Λ W^T with an orthogonal matrix W ∈ R^{n×n} and a diagonal matrix Λ ∈ R^{n×n} with descending eigenvalues λ_1, . . . , λ_n on its diagonal. The positive semidefiniteness of C additionally ensures that λ_1 ≥ · · · ≥ λ_n ≥ 0. Note that the matrices C, W, and Λ all depend on f and ρ_X. The behavior of the function f and the eigendecomposition of C have an interesting, exploitable relation, i. e.,

λ_i = w_i^T C w_i = ∫_X (∇f(x)^T w_i)² ρ_X(x) dx,  i = 1, . . . , n.

If, for example, λ_i = 0 for some i, then we can conclude that f does not change in the direction of the corresponding eigenvector w_i. That is, if eigenvalues λ_i, i = k + 1, . . . , n, are sufficiently small for a suitable k ≤ n − 1, or even zero as in the case of ridge functions, then f can be approximated by a lower-dimensional function. Formally, this corresponds to a split of Λ and W, i. e.,

Λ = [Λ_1 0; 0 Λ_2],  W = [W_1 W_2],  W_1 ∈ R^{n×k},  W_2 ∈ R^{n×(n−k)}.

The split of W suggests a new coordinate system (y, z) for the active variable y := W_1^T x ∈ R^k and the inactive variable z := W_2^T x ∈ R^{n−k}. The range of W_1, R(W_1) := {W_1 y | y ∈ R^k} ⊆ R^n, is called the active subspace of f. Note that the new variable y is aligned to directions on which f changes much more, on average, than on directions the variable z is aligned to. For the remainder, we define

Y := {W_1^T x | x ∈ X} ⊆ R^k  and  Z := {W_2^T x | x ∈ X} ⊆ R^{n−k}.

Also, for y ∈ Y and z ∈ Z, let

⟨y, z⟩ := ⟨y, z⟩_W := W_1 y + W_2 z ∈ R^n

to concisely denote changes of the coordinate system. Variables x, y, and z can also be regarded as random variables X, Y, and Z, respectively, that are defined on a common probability space (Ω, F, P). The orthogonal variable transformation x → (y, z) induces probability density functions for the random variables Y and Z. That is, the joint density of (Y, Z) is

ρ_{Y,Z}(y, z) = ρ_X(⟨y, z⟩)

for y ∈ Y and z ∈ Z. Corresponding marginal and conditional densities are defined as usual. Additionally, set

Y⁺ := {y ∈ Y | ρ_Y(y) > 0}

to denote the set of all values for the active variable y with a strictly positive density value. We frequently use that for a ρ_X-integrable function h : X → R, it holds that

∫_X h(x) ρ_X(x) dx = ∫_Y ∫_Z h(⟨y, z⟩) ρ_{Y,Z}(y, z) dz dy.    (2.10)

Given the eigenvectors in W, we still need to define a lower-dimensional function g approximating f. For y ∈ Y⁺, a natural way is to define g(y) as the conditional expectation of f given y, i. e., as an integral over the inactive subspace weighted with the conditional density ρ_{Z|Y}(·|y). Recall that this approximation is the best in an L² sense [28, Corollary 8.17]. Hence, we set

g(y) := E[f(X) | Y = y] = ∫_Z f(⟨y, z⟩) ρ_{Z|Y}(z|y) dz.    (2.11)

Additionally, we define

f_g(x) := g(W_1^T x)

as the low-dimensional approximation of f.

Remark. In practice, both the matrix C from (2.2) and the low-dimensional function g from (2.11) are often not exactly available.
Our results can, however, also be adapted to a corresponding perturbation analysis, provided in [18], which we do not perform since it would require additional notation and complexity but not contribute to the central aspects of this manuscript.
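To make the construction above concrete, the following Python sketch (our own illustration, not from [17,18]) estimates C by Monte Carlo for an assumed ridge-type test function and splits its eigendecomposition into active and inactive parts; the Gaussian weight ρ_X and all parameter choices are assumptions for this example.

```python
import numpy as np

# Illustrative sketch: Monte Carlo estimate of C = E[grad f(X) grad f(X)^T]
# and its split into active/inactive parts.  f(x) = sin(a^T x) is a ridge
# function, so f is intrinsically 1-dimensional (k = 1).
rng = np.random.default_rng(0)
n, k = 5, 1
a = np.ones(n) / np.sqrt(n)

def grad_f(x):
    # gradient of f(x) = sin(a^T x): grad f(x) = cos(a^T x) * a
    return np.cos(x @ a)[:, None] * a[None, :]

X = rng.standard_normal((20000, n))      # weight rho_X = N(0, I) (assumption)
G = grad_f(X)
C = G.T @ G / len(X)                     # Monte Carlo estimate of C

eigvals, W = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
eigvals, W = eigvals[::-1], W[:, ::-1]   # reorder to descending
W1, W2 = W[:, :k], W[:, k:]              # active / inactive eigenvectors

print(eigvals)  # only the first eigenvalue is non-negligible for a ridge f
```

Since every sampled gradient is a multiple of a, the estimated C has numerical rank one, mirroring the statement that a ridge function has a zero inactive trace.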
One of the main results in ASM theory is a theorem that gives an upper bound on the mean squared error of f_g approximating f. The upper bound is the product of a Poincaré constant C_{P,W} > 0 and the sum of the n − k eigenvalues corresponding to the inactive subspace, called the inactive trace. That is, if the inactive trace is small, then the mean squared error of f_g approximating f is also small. Mathematically, for a given probability density function ρ_X, the theorem states that [18, Theorem 3.1]

E[(f(X) − f_g(X))²] ≤ C_{P,W} (λ_{k+1} + · · · + λ_n)    (2.13)

for a Poincaré constant C_{P,W} = C_{P,W}(W, ρ_X) > 0. Note that C_{P,W} depends on W = W(f) and thus also indirectly on f. If desired, we could remove this dependence by considering the supremum of C_{P,W} over all orthogonal matrices, i. e.,

C_P := sup_{W orthogonal} C_{P,W},    (2.14)

and get

E[(f(X) − f_g(X))²] ≤ C_P (λ_{k+1} + · · · + λ_n),    (2.15)

provided the constant C_P = C_P(ρ_X) exists. Deriving such an upper bound for a certain class of distributions would allow us to choose ρ_X independently of f. Note that [45,46] also control the Poincaré constant for any orthogonal matrix W. The derivation of (2.13) starts with

E[(f(X) − f_g(X))²] = ∫_{Y⁺} ∫_Z (f(⟨y, z⟩) − g(y))² ρ_{Z|Y}(z|y) dz ρ_Y(y) dy    (2.16)
≤ ∫_{Y⁺} C_y ∫_Z ‖∇_z f(⟨y, z⟩)‖² ρ_{Z|Y}(z|y) dz ρ_Y(y) dy,    (2.17)

where we used a probabilistic Poincaré inequality w.r.t. ρ_{Z|Y}(·|y) for a given y ∈ Y⁺. Note that the Poincaré constant C_y of ρ_{Z|Y}(·|y) depends on y. In [18, Theorem 3.1], it was indirectly assumed that this constant does not depend on y. Under the assumption that C_{P,W} := ess sup C_Y < ∞, i. e., the random variable C_Y is essentially bounded, we can continue with

(2.17) ≤ C_{P,W} ∫_{Y⁺} ∫_Z ‖∇_z f(⟨y, z⟩)‖² ρ_{Z|Y}(z|y) dz ρ_Y(y) dy.    (2.18)

However, as we see in Subsection 4.3, this assumption on C_Y is not always fulfilled. The continuation of (2.18) follows [18, Lemma 2.2 and Theorem 3.1]. We repeat the steps here for the sake of completeness. So, first, note that ∇_z f(⟨y, z⟩) = W_2^T ∇f(⟨y, z⟩), which, together with (2.10), yields

∫_{Y⁺} ∫_Z ‖∇_z f(⟨y, z⟩)‖² ρ_{Z|Y}(z|y) dz ρ_Y(y) dy = trace(W_2^T C W_2) = λ_{k+1} + · · · + λ_n.    (2.19)

The next section gives two examples for types of densities ρ_X that are well-known to imply a probabilistic Poincaré inequality for ρ_{Z|Y}(·|y) and allow for a bound on its constant C_y that is uniform in y and W. Again, we emphasize that it is not ρ_X that should satisfy a probabilistic Poincaré inequality but ρ_{Z|Y}(·|y).
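The bound (2.13) can be checked numerically. The sketch below (our own; the test function is an assumption) uses X ~ N(0, I) in two dimensions, for which C_P = 1 (see Section 3) and Z|Y is again standard normal, so f_g can itself be estimated by Monte Carlo.

```python
import numpy as np

# Numerical check of E[(f(X) - f_g(X))^2] <= C_P * (inactive trace) with
# C_P = 1 for X ~ N(0, I_2).  The test function f is an assumed example.
rng = np.random.default_rng(1)
n, k = 2, 1

def f(x):
    return np.sin(x[..., 0]) + 0.1 * x[..., 1]

def grad_f(x):
    return np.stack([np.cos(x[..., 0]), 0.1 * np.ones(x.shape[:-1])], axis=-1)

X = rng.standard_normal((50000, n))
G = grad_f(X)
C = G.T @ G / len(X)
eigvals, W = np.linalg.eigh(C)
eigvals, W = eigvals[::-1], W[:, ::-1]
W1, W2 = W[:, :k], W[:, k:]

# For the standard normal, Y = W1^T X and Z = W2^T X are independent standard
# normals, so f_g(x) = E_z[f(W1 y + W2 z)] is estimated by averaging over z.
Z = rng.standard_normal((200, n - k))
def f_g(x):
    y = x @ W1
    pts = y[:, None, :] @ W1.T + Z[None, :, :] @ W2.T
    return f(pts).mean(axis=1)

mse = np.mean((f(X[:2000]) - f_g(X[:2000])) ** 2)
inactive_trace = eigvals[k:].sum()
print(mse, inactive_trace)   # mse stays below the inactive trace (C_P = 1)
```

Up to Monte Carlo noise, the estimated mean squared error indeed stays below the inactive trace, in line with (2.13).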

Compactly supported and normal distributions
The uniform distribution, as a canonical example of a distribution with compact support X, is well-known to satisfy a probabilistic Poincaré inequality on its own and to imply the same for the densities ρ_{Z|Y}(·|y), which are also uniform. Note that a probabilistic Poincaré inequality involving a uniform distribution is actually equivalent to a regular Poincaré inequality w.r.t. the Lebesgue measure. The following theorem is a slightly more general result. We add a convexity assumption on X° since it makes Poincaré constants explicit. Recall that the Poincaré constant for a convex domain with diameter d > 0 is (d/π)²; see, e. g., [7].
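As a quick numerical illustration (our own sketch, assuming the uniform density on [0, 1], a convex domain with diameter d = 1), the constant (d/π)² in Var(f(X)) ≤ (d/π)² E[f'(X)²] is attained by f(x) = cos(πx), while f(x) = x stays strictly below it.

```python
import numpy as np

# Check of the Poincare constant (d/pi)^2 for the uniform density on [0, 1]
# (convex, diameter d = 1).  f(x) = cos(pi*x) attains the constant; f(x) = x
# does not.  Dense-grid averages approximate the integrals.
x = np.linspace(0.0, 1.0, 200001)

def poincare_ratio(f, df):
    fx, dfx = f(x), df(x)
    var = np.mean(fx ** 2) - np.mean(fx) ** 2   # Var(f(X)), X ~ U(0, 1)
    energy = np.mean(dfx ** 2)                  # E[f'(X)^2]
    return var / energy

r_cos = poincare_ratio(lambda t: np.cos(np.pi * t),
                       lambda t: -np.pi * np.sin(np.pi * t))
r_lin = poincare_ratio(lambda t: t, lambda t: np.ones_like(t))
print(r_cos, 1 / np.pi ** 2)   # equal: the extremal case
print(r_lin)                   # 1/12, strictly below (1/pi)^2
```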
for y ∈ Y⁺ and z ∈ Z°_y. This justifies the corresponding computation for y ∈ Y⁺. Combining (2.19) with (3.10) then yields the result in (3.1).
Also, it is well-known that the Poincaré constant is one for the multivariate standard normal distribution N(0, I) [14]. Since its density is rotationally symmetric, the random variables Y and Z are independent and each again follows a standard normal distribution. Hence, it holds that C_P = 1. For general multivariate normal distributions N(m, Σ) with mean m and non-degenerate covariance matrix Σ, shifting and scaling arguments give that C_P = λ_max(Σ).
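The claim C_P = λ_max(Σ) can be illustrated by the linear function f(x) = v^T x with v the leading unit eigenvector of Σ, for which the Poincaré inequality holds with equality; Σ and m below are arbitrary example choices of our own.

```python
import numpy as np

# For X ~ N(m, Sigma) the Poincare constant is lambda_max(Sigma); equality
# holds for f(x) = v^T x with v the leading unit eigenvector of Sigma.
rng = np.random.default_rng(2)
m = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # example covariance (assumption)

lam, vecs = np.linalg.eigh(Sigma)
lam_max, v = lam[-1], vecs[:, -1]

X = rng.multivariate_normal(m, Sigma, size=200000)
fX = X @ v                   # f(x) = v^T x, so grad f = v with ||grad f|| = 1
var_f = fX.var()
print(var_f, lam_max)        # both close to lambda_max(Sigma)
```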
Remark. Note that the constant C P in the previous two examples is independent of W .

Main results
This section contains the main contribution of the manuscript, which lies in an investigation of general log-concave probability measures w.r.t. their applicability for ASM. Log-concave distributions have Lebesgue densities of the form ρ_X(x) = exp(−V(x)) for a convex function V : R^n → (−∞, +∞]. Note that +∞ is included in the codomain of V. The conditional density ρ_{Z|Y}(·|y) for a given y ∈ Y⁺ is then given by

ρ_{Z|Y}(z|y) = exp(−Ṽ_y(z)),

where Ṽ_y(z) := V(⟨y, z⟩) + log(ρ_Y(y)). Note that Ṽ_y inherits convexity (in z) from V. Bobkov [8] shows that general log-concave densities satisfy a Poincaré inequality and gives lower and upper bounds on the corresponding Poincaré constant. First, we discuss the special case of α-uniformly convex functions V, for which the corresponding density ρ_X is known to satisfy a Poincaré inequality with universal Poincaré constant 1/α. However, the assumption of the density ρ_X being of uniformly log-concave type is somewhat restrictive since it excludes distributions with heavier tails such as, for example, exponential or Laplace distributions. For this reason, we secondly investigate general log-concave densities and show that problems can arise with this class of probability distributions due to arbitrarily large Poincaré constants C_Y. In particular, the problems and their proposed solution are exemplified on an extreme case example involving independently exponentially distributed random variables in n ≥ 2 dimensions.

α-uniformly convex functions V
Definition 4.1 (α-uniformly convex function). A function V ∈ C² is said to be α-uniformly convex if there is an α > 0 such that for all x ∈ R^n it holds that

u^T ∇²V(x) u ≥ α ‖u‖₂²  for all u ∈ R^n,

where ∇²V denotes the Hessian matrix of V.
In [41, p. 43-44], it was shown that there is a dimension-free Poincaré constant 1/α for α-uniformly log-concave ρ_X. Note that this says nothing about the special case α = 0. The existence of a dimension-free Poincaré constant for this special case is actually a consequence of the famous Kannan-Lovász-Simonovits conjecture; see, e. g., [1,30]. However, since we need a Poincaré inequality for ρ_{Z|Y}(·|y), y ∈ Y⁺, we have to prove the following lemma, similar to [46, Subsection 7.2]: if V is α-uniformly convex, then so is Ṽ_y for every y ∈ Y⁺. Indeed, choose w ∈ R^{n−k} arbitrarily. Then, for every z ∈ R^{n−k}, it holds that

w^T ∇²Ṽ_y(z) w = (W_2 w)^T ∇²V(⟨y, z⟩) (W_2 w) ≥ α ‖W_2 w‖₂² = α ‖w‖₂².

Since ρ_{Z|Y}(·|y) inherits the universal Poincaré constant 1/α from ρ_X, the result in (2.15) also holds for α-uniformly log-concave densities with C_P = 1/α (independent of W), which is similar to [46, Corollary 2].
For example, α-uniformly log-concave densities comprise multivariate normal distributions N(m, Σ) with mean m and covariance matrix Σ (α = 1/λ_max(Σ)). However, distributions that satisfy the assumption only for α = 0, e. g., Weibull distributions with shape parameter ≥ 1 (with the exponential distribution as a special case) or Gamma distributions with shape parameter β ≥ 1, only belong to the class of general log-concave distributions.
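A small sketch of our own (with an arbitrary example covariance) makes the Gaussian case explicit: the Hessian of V is Σ⁻¹, the largest α with ∇²V ⪰ αI is λ_min(Σ⁻¹) = 1/λ_max(Σ), and the universal constant 1/α recovers C_P = λ_max(Σ) from Section 3.

```python
import numpy as np

# For N(m, Sigma), V(x) = (x - m)^T Sigma^{-1} (x - m)/2 + const has constant
# Hessian Sigma^{-1}, so V is alpha-uniformly convex with
# alpha = lambda_min(Sigma^{-1}) = 1/lambda_max(Sigma).
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # example covariance (assumption)
hessian = np.linalg.inv(Sigma)

alpha = np.linalg.eigvalsh(hessian).min()    # largest alpha with Hess >= alpha*I
poincare_const = 1.0 / alpha                 # universal constant 1/alpha
print(alpha, poincare_const, np.linalg.eigvalsh(Sigma).max())
```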

General convex functions V
Since we cannot make use of a universal dimension-free Poincaré constant for general convex functions V : R^n → (−∞, +∞], we look at them more closely in this subsection. Recall that ρ_{Z|Y}(z|y) = exp(−Ṽ_y(z)), y ∈ Y⁺, for a convex function Ṽ_y. We have to deal with the fact that the essential supremum of the random Poincaré constant C_Y of ρ_{Z|Y}(·|Y) possibly does not exist. A corresponding example is given in Subsection 4.3.1. In the step from (2.17) to (2.18), we applied Hölder's inequality with Hölder conjugates (p, q) = (+∞, 1). Since this is not possible for unbounded random variables C_Y, we can only show a weaker result.
Remark. The previous lemma requires the gradient of f to be uniformly bounded, an assumption that is not needed in [18] and [46]. However, first, applying ASM, in the sense that the matrix C from (2.2) is estimated by a finite Monte Carlo sum, requires the same assumption to prove results on corresponding approximations of eigenvalues λ i and eigenvectors w i ; see [16] and [17,Section 3.3].
Secondly, this assumption can be weakened by applying another Hölder inequality analogous to (4.9). Indeed, for ε ∈ (0, 1), we would then only require ∇_x f(X) to be integrable. What we would have to accept in this case, however, is the resulting weaker order ε/(1 + ε) in the inactive trace.
The L- and ρ_X-dependence of C_{P,ε,W} is notationally suppressed in the following. If possible, we can choose a suitable ε > 0 to get E[C_Y^{(1+ε)/ε}] < ∞ and thus a finite constant C_{P,ε,W}. Note that we lose the first order in the eigenvalues from the inactive subspace, but instead have order 1/(1 + ε) < 1. Of course, the constant C_{P,ε,W} could get arbitrarily large as ε → 0, but this strongly depends on W and the moments of C_Y; see the example given in Subsection 4.3.1.
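The underlying Hölder step with conjugates ((1 + ε)/ε, 1 + ε) can be sanity-checked on samples; the heavy-tailed C and bounded V below are arbitrary illustrative choices of our own.

```python
import numpy as np

# Hoelder's inequality with conjugates ((1+eps)/eps, 1+eps) applied to
# E[C_Y * V(Y)]: useful when C_Y is unbounded but has finite moments.
rng = np.random.default_rng(3)
eps = 0.5
p, q = (1 + eps) / eps, 1 + eps          # Hoelder conjugates: 1/p + 1/q = 1

C = rng.exponential(scale=2.0, size=100000) ** 2   # heavy-tailed, unbounded
V = rng.uniform(0.0, 1.0, size=100000)

lhs = np.mean(C * V)
rhs = np.mean(C ** p) ** (1 / p) * np.mean(V ** q) ** (1 / q)
print(lhs, rhs)    # lhs <= rhs
```

The inequality holds exactly for the empirical measure, so the check succeeds for any sample; the point is that only a finite (1 + ε)/ε-th moment of C is needed, not boundedness.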
It is known by Bobkov [8, Eqs. (1.3), (1.8) and p. 1906] that a general log-concave density ρ_{Z|Y}(·|y) satisfies a Poincaré inequality with a (dimension-dependent) constant C_y that is bounded from below by the largest eigenvalue of the conditional covariance matrix and from above by a universal multiple of its trace, i. e.,

λ_max(Cov(Z|Y = y)) ≤ C_y ≤ c · trace(Cov(Z|Y = y))    (4.20)

for a universal constant c > 0. To the authors' knowledge, these bounds on C_y are the best available. We provide a scenario in Subsection 4.3.1 ("Rotation by θ = π/4") in which the lower bound, viewed as a random variable, has no finite essential supremum, implying the same for C_Y.
However, to make use of Lemma 4.3, we need to investigate the involved constant C P,ε,W (ε, n, k).

Lemma 4.4. With

C_{Var,W}(ε, n, k) := E[(Σ_{i=1}^{n−k} Var(Z_i | Y))^{(1+ε)/ε}],

it holds that

C_{Var,W}(ε, n, k) ≤ (n − k)^{1/ε} Σ_{i=1}^{n−k} E[Var(Z_i | Y)^{(1+ε)/ε}].    (4.22)

Proof. Using Jensen's inequality for weighted sums, it follows that

(Σ_{i=1}^{n−k} Var(Z_i | Y))^{(1+ε)/ε} ≤ (n − k)^{1/ε} Σ_{i=1}^{n−k} Var(Z_i | Y)^{(1+ε)/ε}.

The result follows by taking expectations.
Eventually, we define

C_{P,ε} := sup_{W orthogonal} C_{P,ε,W}    (4.26)

and get

E[(f(X) − f_g(X))²] ≤ C_{P,ε} (λ_{k+1} + · · · + λ_n)^{1/(1+ε)},    (4.28)

provided the constant C_{P,ε} = C_{P,ε}(ε, n, k, L, ρ_X) exists. For

C_Var := sup_{W orthogonal} C_{Var,W}(ε, n, k),    (4.27)

we argue that it is actually enough to take the supremum only over the set of rotation matrices. Indeed, any orthogonal matrix W is either a proper (det W = 1) or an improper (det W = −1) rotation, the latter being the combination of a proper rotation and an inversion of the axes; see, e. g., [27,33]. However, since the constant C_{Var,W} from (4.22) is invariant to inversions of the axes, it holds that

sup_{W orthogonal} C_{Var,W} = sup_{W rotation} C_{Var,W}.    (4.29)
This equality is exploited in the next subsection.

Independently exponentially distributed random variables as an extreme case
In this subsection, we take a closer look at independently exponentially distributed random variables in n ≥ 2 dimensions as an example for a general log-concave distribution.
In particular, we use the lower bound of Bobkov from (4.20) in Subsection 4.3.1 to show that there exists a scenario in which the random Poincaré constant C_Y does not have an essential supremum, implying that C_P from (2.14) does not exist. Therefore, the quantity C_Var from (4.27) is investigated in Subsections 4.3.1 and 4.3.2 to derive a (finite) upper bound for C_{P,ε} from (4.26) in this special case. We regard a random vector X = (X_1, . . . , X_n) whose components are independently exponentially distributed with unit rates ν_i = 1, i = 1, . . . , n, and will see that investigations with unit rates are sufficient to derive statements also involving other rates. The distribution of X has the density

ρ_X(x) = 1_{R^n_{≥0}}(x) exp(−Σ_{i=1}^n x_i).    (4.30)

That is, in this case X = R^n_{≥0} and

V(x) = Σ_{i=1}^n x_i  for x ∈ X.

Note that V is convex. Since we are interested in C_Var as a supremum over all orthogonal matrices, we assume that, in this subsection, W = [W_1 W_2] is an arbitrary orthogonal matrix not depending on f and ρ_X. Indeed, as the equality in (4.29) motivates, we can further assume that W is a rotation matrix.

2 dimensions
The joint density of two independently exponentially distributed random variables X_1 and X_2, both with unit rate, is

ρ_X(x_1, x_2) = 1_{R²_{≥0}}(x_1, x_2) exp(−x_1 − x_2).

First, let us regard a rotation of the two-dimensional Cartesian coordinate system by a general angle θ ∈ [−π, π) to a coordinate system for (y, z), i. e., for a rotation matrix

W = R_θ := [cos θ, −sin θ; sin θ, cos θ].    (4.34)

Subsequently, we look at the special case θ = π/4 as an example for an unbounded Poincaré constant C_y of ρ_{Z|Y}(·|y). Variables are written in thin letters in this subsection since they denote real values and not multidimensional vectors.

Rotation by θ = π/4
A rotation of 45°, i. e., θ = π/4 and W = R_{π/4}, is a limit case since a_{−π/4} from (4.40) becomes zero. The joint density for Y and Z is then

ρ_{Y,Z}(y, z) = 1_{[0,∞)}(y) 1_{[−y,y]}(z) exp(−√2 y).

A graphical illustration of this case is given in Fig. 4. Consequently, the marginal distribution of Y is

ρ_Y(y) = 2y exp(−√2 y) 1_{[0,∞)}(y),

and the conditional density ρ_{Z|Y}(·|y) computes to

ρ_{Z|Y}(z|y) = 1_{[−y,y]}(z) / (2y)    (4.55)

for y > 0. Note that ρ_{Z|Y}(·|y) is the density of a uniform distribution on the interval [−y, y]. For Y > 0, it follows that

Var(Z|Y) = (2Y)²/12 = Y²/3,

which is the expression that the variances of Z|Y for other angles θ* approach as θ* → π/4 (see Fig. 3a). Note that the lower bound from (4.20) for C_Y in this case becomes Y²/3 and, hence, its distribution is not compactly supported, implying the same for the distribution of C_Y. Therefore, we have found a scenario in which the constants C_{P,W} and C_P indeed do not exist. However, there is still a chance that the constants C_{P,ε,W} and C_{P,ε} from (4.7) and, respectively, (4.26) exist. It holds that C_Var(ε, 2, 1) = C_{Var,R_{π/4}}(ε, 2, 1) < ∞ for every ε > 0, implying that the constant C_{P,ε}(ε, 2, 1) can be bounded from above.
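The computations above can be checked empirically. The following sketch of our own (using slab-conditioning as an approximation to exact conditioning on Y = y₀) estimates Var(Z | Y ≈ y₀) and compares it with y₀²/3, confirming that the conditional variance, and hence the lower bound for C_Y, grows without bound in y.

```python
import numpy as np

# Empirical check: with X1, X2 ~ Exp(1) independent and theta = pi/4,
# Y = (X1 + X2)/sqrt(2), Z = (X2 - X1)/sqrt(2), the law of Z | Y = y is
# uniform on [-y, y], hence Var(Z | Y = y) = y^2/3 is unbounded in y.
rng = np.random.default_rng(4)
x = rng.exponential(size=(500000, 2))
y = (x[:, 0] + x[:, 1]) / np.sqrt(2)
z = (x[:, 1] - x[:, 0]) / np.sqrt(2)

cond_var = {}
for y0 in (1.0, 2.0, 4.0):
    sel = np.abs(y - y0) < 0.05           # approximate conditioning on Y = y0
    cond_var[y0] = z[sel].var()
    print(y0, cond_var[y0], y0 ** 2 / 3)  # empirical vs. theoretical y0^2/3
```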

n dimensions
This subsection aims to generalize the results of the previous subsection, i. e., we investigate the constant C P,ε from (4.26) for n independently exponentially distributed random variables.
Figure 5: Exponential distribution in 3D with a rotated coordinate system.
Motivated by the two-dimensional case, we regard the rotation of the coordinate system by a matrix W = R * that rotates the vector (1, 0, . . . , 0) ∈ R n to (1/ √ n, . . . , 1/ √ n) ∈ R n . Note that in the two-dimensional case, a rotation by θ = π/4 corresponds to a matrix rotating (1, 0) to (1/ √ 2, 1/ √ 2) . This is the worst case in the sense that Z i |Y is uniformly distributed for each component Z i in Z = (Z 1 , . . . , Z n−k ) and hence, similar to the two-dimensional case, the conditional variance of Z i |Y has no finite essential supremum. In the context from above, it holds that C Var (ε, n, k) = C Var,R * (ε, n, k). (4.61) The following theorem studies this case and investigates the dimensional dependence of the involved constant.
Theorem 4.5. For ρ_X as in (4.30), it holds that

E[(f(X) − f_g(X))²] ≤ C_n^exp (λ_{k+1} + · · · + λ_n)^{1/(1+ε)}    (4.62)

for a constant

C_n^exp = C_n^exp(ε, n, k, L, ρ_X) ≥ C_{P,ε}.    (4.63)

Proof. In the support of ρ_X, i. e., in X = R^n_{≥0}, ρ_X is greater than zero and constant on the intersection of R^n_{≥0} and the hyperplanes

P_a := {x ∈ R^n | x_1 + · · · + x_n = a},  a ≥ 0,

i. e., on hypersurfaces T_a := P_a ∩ R^n_{≥0}. The situation is illustrated by Fig. 5 for n = 3 dimensions.
We can rewrite T(y_1) in the rotated coordinate system, which motivates viewing it as an (n − 1)-dimensional set there; we denote this set by Ť(y_1). We observe that the conditioned random variable (Y, Z)|Y_1 is uniformly distributed on the regular (n − 1)-simplex Ť(y_1). The basic idea to get a bound for E[Var(Z_i|Y)^{(1+ε)/ε}] is based on the fact that z_i, moving as the (k + i − 1)-th coordinate inside Ť(y_1), takes values in [0, h_i(y_1)], where h_i(y_1) is the height of a regular (k + i − 1)-simplex with side length √(2n) y_1 and is thus bounded. In general, the height of a regular n-simplex is the distance of a vertex to the circumcentre of its opposite regular (n − 1)-simplex. By [12, p. 367], the height of a regular n-simplex with side length s is

h = s √((n + 1)/(2n)).

The marginal density of Z_i|Y and the resulting moment bounds follow by integrating over Ť(y_1); using Jensen's inequality in a first step, the computation can be continued accordingly. Note that an intermediate step of the calculation uses the fact that the volume of the regular (n − 1)-simplex Ť(y_1) with side length √(2n) y_1 is (see [12, p. 367])

∫ 1_{Ť(y_1)}(y, z) dz dy = (√(n^n) / (n − 1)!) y_1^{n−1}.
The result follows by Lemma 4.3. Fig. 6 depicts the quantity C ε (n, k = 1) from (4.90) as a function of ε > 0 for some n ∈ N (left plot) and as a function of n ≥ 2 for several ε > 0 (right plot). We set k = 1 since this gives the maximum value for C ε over all k ≥ 1. As expected, the curves increase quickly as ε approaches zero or, respectively, n becomes large.
Remark. In the previous theorem, the exponentially distributed random variables are assumed to have unit rates. The computations can also be made for arbitrary rates ν_i, i = 1, . . . , n. However, some modifications are necessary. Let ν = (ν_1, . . . , ν_n) denote the vector of rates. To again get the worst case scenario as in the previous subsection (uniform distribution on a simplex structure), the coordinate system has to be rotated in such a way that the vector (1, 0, . . . , 0) rotates to ν/‖ν‖₂. The structure of a regular simplex that is used in the estimates above is not present in this more general case. Instead, we get a general simplex whose heights are not as easy to compute as in the regular case. However, rough estimates can be achieved by enclosing the general simplex with a larger regular one.

Future work with MGH distributions
The generalized bound from Lemma 4.3 and the study of corresponding Poincaré type constants C_{P,ε,W} and C_{P,ε} for independently exponentially distributed random variables in Subsection 4.3 motivate further similar investigations of more general distributions. From a statistical perspective, a study of the class of multivariate generalized hyperbolic (MGH) distributions (see, e. g., [4]) can be considered as a next step since it allows for distributions with both non-zero skewness and heavier tails. An MGH distribution is the distribution of a random vector

X = μ + Aβ + √A M^{1/2} V    (5.1)

with location parameter μ ∈ R^n, skewness parameter β ∈ R^n, and a symmetric positive definite matrix M ∈ R^{n×n}. The scalar random variable A ≥ 0, called the mixing variable, follows a generalized inverse Gaussian distribution (GIG) [26], and V ∼ N(0, I) is independent of A. As a particular example, for X to be Laplace distributed, we set β = 0 and let A be exponentially distributed [29]. Note, however, that the example from Subsection 4.3, assuming independently exponentially distributed random variables, is not an MGH distribution. In order to include this case, we would need to introduce a mixing random matrix as scaling for V. Nevertheless, MGH is a large class containing classical distributions like the normal-inverse Gaussian, generalized Laplace, and Student's t-distribution. In particular, these distributions are interesting since they have been used in areas like, for instance, economics and financial markets [5,6,23], spatial statistics and geostatistics [9,10,42], and linear mixed-effects models [2,32,47], which are used, e. g., for linear non-Gaussian time series models in medical longitudinal studies [2].
We mention that, under an assumption on a parameter, MGH distributions are log-concave [44], i. e., we can use the estimates on Poincaré constants C Y of Bobkov from (4.20).
In our opinion, it is preferable to start the investigation with the subclass of symmetric MGH distributions, i. e., β = 0 in (5.1). The following lines demonstrate particular difficulties that we already encounter in this smaller subclass. Let us choose µ = β = 0 and M = I in (5.1) such that with V ∼ N (0, I). A common first step is to study X conditioned on A, i. e., X|A ∼ N (0, AI), and to use the tower property of conditional expectations. That is, analogously to (2.2), we define The computation starts, similar to (2.16), with = A trace (Λ A,2 ) . (5.10) In (5.9), we use the fact that the Poincaré constant of a normal distribution N (0, AI) is λ max (AI) = A; see Section 3. The last step to (5.10) is equal to (2.19). This yields (5.12) where the random variable A · trace (Λ A,2 ) is assumed to have finite first moment. At this point, as long as A is not compactly supported, we can only continue by applying another Hölder's inequality similar to the proof of Lemma 4.3. However, in any case, we have to face the problem that E[trace (Λ A,2 )] is, in general, not equal to trace (Λ 2 ) which denotes the inactive trace of C that we actually aim for. Nevertheless, we know that E[trace (Λ A )] = trace (Λ) , (5.13) but it is unclear whether, and how, this equality can be exploited for our purposes.

Summary
This manuscript discusses bounds for the mean squared error between a given function of interest and a low-dimensional approximation of it which is found by the active subspace method. These bounds, consisting of the product of a Poincaré constant and a sum of eigenvalues belonging to a non-dominant subspace, are based on a probabilistic Poincaré inequality. The existing literature applies this Poincaré inequality with indirect, non-explicit assumptions that, as a consequence, limit the class of distributions applicable for the active subspace method. For example, these assumptions exclude distributions with exponential tails as, e. g., exponential distributions. In this respect, the main results of this manuscript give details on the problem that arises when applying the active subspace method with log-concave distributions (which include exponential distributions). We provide a scenario, involving independently exponentially distributed random variables, in which the usual estimates are not achievable due to an unbounded Poincaré constant. However, using Hölder's inequality with conjugates (p, q), p, q ∈ (1, ∞), instead of (∞, 1), we show that it is possible to derive a generalized result that enables us to balance the size of the Poincaré constant and the remaining order of the error. We exemplify this trade-off on the mentioned scenario and show that the size of the involved constant depends strongly on the dimension of the problem. Finally, we propose directions for future work related to the applicability of active subspaces to the large class of multivariate generalized hyperbolic distributions. Also, details are provided on particular difficulties that already arise with a smaller subclass of these.

Source code
Wolfram Mathematica notebooks and code for generating the plots in this manuscript are available in a repository at https://bitbucket.org/m-parente/asm-poincare-pub/.