A multidimensional version of noise stability

We give a multivariate generalization of Borell's noise stability theorem for Gaussian vectors. As a consequence we recover two inequalities, also due to Borell, for exit times of the Ornstein-Uhlenbeck process.


Introduction
There has been a recent flurry of activity in probability [12,15] and computer science [9,10,16] around a certain paper of Borell [3] on inequalities satisfied by the Ornstein-Uhlenbeck process. In his paper, Borell proved a theorem, which is somewhat complicated to state, showing that certain quantities only decrease under Ehrhard symmetrization [7]. He then derived two simpler corollaries, about hitting times for the Ornstein-Uhlenbeck process, from this general result.
We recall that the Ornstein-Uhlenbeck process on R^n is the Gaussian process {X_t : t ∈ R} with mean zero and covariance E X_s X_t^T = e^{−|t−s|} I_n. This is a Markov process, as may be seen from the construction X_t = e^{−t} B_{e^{2t}} for a Brownian motion B_t, and the stationary measure of X_t is the standard Gaussian measure, γ_n. For a set A ⊂ R^n, we denote its exit time under X_t by e_A = inf{t ≥ 0 : X_t ∉ A}. Although they were originally written in terms of hitting times instead of exit times, Borell's two corollaries of his general inequality may be written as follows, in which "half-space" means a set of the form {x ∈ R^n : x·a ≤ b}, and half-spaces are parallel if they have the same normal vector.

Theorem 1.1 (Borell). If B ⊂ R^n is a half-space with γ_n(B) = γ_n(A) then e_B stochastically dominates e_A; i.e., for every t ≥ 0,

Pr(e_A ≥ t) ≤ Pr(e_B ≥ t).

Theorem 1.2 (Borell). If B_1 and B_2 are parallel half-spaces with γ_n(B_i) = γ_n(A_i) then for every t ≥ 0,

E ∫_0^{t ∧ e_{A_1}} 1_{A_2}(X_s) ds ≤ E ∫_0^{t ∧ e_{B_1}} 1_{B_2}(X_s) ds.

There is a third corollary of Borell's general inequality that did not appear in his original paper [3], but has nevertheless become widely applied in theoretical computer science, particularly in the study of hardness of approximation.

Theorem 1.3 (Borell). If B_1 and B_2 are parallel half-spaces with γ_n(B_i) = γ_n(A_i) then for any t > 0,

Pr(X_0 ∈ A_1, X_t ∈ A_2) ≤ Pr(X_0 ∈ B_1, X_t ∈ B_2).

In the special case A_1 = A_2, this inequality is sometimes interpreted as showing that half-spaces are the most "noise stable" sets. Here, we think of X_t as being a noisy version of X_0, and so a set A is noise stable if the event {X_0 ∈ A} tends to agree with the event {X_t ∈ A}. Theorem 1.3 implies that this overlap is maximized, over all sets with a fixed Gaussian measure, by half-spaces. Using an invariance principle, Mossel et al. [15] deduced from Theorem 1.3 a similar inequality on the discrete cube (although the statement on the cube is necessarily more complicated, because the direction of a half-space's normal vector becomes important); that work then laid the foundation for many applications in theoretical computer science (for a few examples, see [9,10,16]).
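To make the "noise stability" interpretation concrete, here is a small Monte Carlo sketch (ours, not from the paper) comparing Pr(X_0 ∈ A, X_t ∈ A) for a half-space and for a Euclidean ball of the same Gaussian measure; the dimension, time, and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, N = 2, 0.5, 10**6
rho = np.exp(-t)

# (X_0, X_t) for the Ornstein-Uhlenbeck process: correlation e^{-t} per coordinate
X0 = rng.standard_normal((N, n))
Xt = rho * X0 + np.sqrt(1 - rho**2) * rng.standard_normal((N, n))

half = lambda x: x[:, 0] <= 0.0              # half-space of gamma_2-measure 1/2
r = np.sqrt(2 * np.log(2))                   # gamma_2(ball of radius r) = 1/2
ball = lambda x: (x**2).sum(axis=1) <= r**2

for name, A in [("half-space", half), ("ball", ball)]:
    print(name, np.mean(A(X0) & A(Xt)))      # the half-space value is larger
```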
Note that in Theorem 1.3, the joint distribution of (X_0, X_t) has mean zero and covariance

( 1  ρ )
( ρ  1 ) ⊗ I_n,    where ρ = e^{−t}.

Our main result is a multivariate generalization of Theorem 1.3, which allows for more than two Gaussian vectors and a more general covariance structure than that endowed by the Ornstein-Uhlenbeck process. Our generalization is also strong enough to recover Theorems 1.1 and 1.2; we are thankful to Michel Ledoux for pointing this out.

Theorem 1.4. Let M = (m_ij) be a k × k positive semidefinite matrix with m_ij ≥ 0, and let X = (X^{(1)}, …, X^{(k)}) be a kn-dimensional Gaussian vector with covariance M ⊗ I_n. For any measurable A_1, …, A_k ⊂ R^n,

Pr(X^{(i)} ∈ A_i for all i) ≤ Pr(X^{(i)} ∈ B_i for all i)   (1)

whenever (B_1, …, B_k) is a collection of parallel half-spaces with γ_n(B_i) = γ_n(A_i).
Moreover, if equality is attained in (1) then there exists a collection (B_1, …, B_k) of parallel half-spaces such that A_i = B_i up to sets of measure zero.
By setting k = 2, Theorem 1.4 recovers Theorem 1.3. We should remark that a generalization along these lines, but nevertheless different from Theorem 1.4, has already been discovered: Isaksson and Mossel [8] showed that the inequality (1) also holds under the hypothesis that the off-diagonal elements of M^{−1} are non-positive. In other words, our hypothesis is that every pair X^{(i)}, X^{(j)} is positively correlated, while [8] assumed that every pair X^{(i)}, X^{(j)} is conditionally positively correlated given all the other variables. Neither of these conditions is strictly stronger than the other, and in fact both would suffice for recovering Theorems 1.1 and 1.2. However, [8] did not characterize the equality cases of (1) under their hypothesis.
Next, we will show how Theorem 1.4 may be used to recover Theorem 1.1. This reduction is quite similar to one by Burchard and Schmuckenschläger [4], who were studying exit times of Brownian motion on manifolds. (In that case, the study of exit times has a fairly long history; see [4] for references.) As we will see, though, our approach to Theorem 1.4 is quite different from that of Burchard and Schmuckenschläger, who studied two-point symmetrizations.
Let X_t be the Ornstein-Uhlenbeck process, and consider the finite-dimensional marginal (X_{t_1}, …, X_{t_k}) for a sequence of times t_1 < ⋯ < t_k. This is a mean-zero Gaussian vector with covariance M ⊗ I_n, where m_ij = e^{−|t_i − t_j|}. Clearly, then, M satisfies the hypothesis of Theorem 1.4 and in particular, we have

Pr(X_{t_i} ∈ A_i for all i ≤ k) ≤ Pr(X_{t_i} ∈ B_i for all i ≤ k)   (2)

whenever the B_i are parallel half-spaces with γ_n(B_i) = γ_n(A_i). Taking every A_i to be the same set A, this is essentially a discrete version of Theorem 1.1, since when B is a half-space with γ_n(A) = γ_n(B) the event {X_{t_i} ∈ A for all i} approximates the event {e_A ≥ t_k}. To complete the proof, we need to show that one can take limits. Setting t_i = iτ/k for i = 0, …, k − 1, the event {e_A ≥ τ} is contained in the event {X_{t_i} ∈ A for all i}, and so

Pr(e_A ≥ τ) ≤ Pr(X_{t_i} ∈ A for all i) ≤ Pr(X_{t_i} ∈ B for all i),

where the last inequality follows from (2). Next, we send k → ∞. Recall that X_t is uniformly continuous on [0, τ] with probability 1. Hence for any ε > 0, writing B_ε = {x : dist(x, B) ≤ ε} for the ε-enlargement of B (which is again a half-space), we have

lim sup_{k→∞} Pr(X_{iτ/k} ∈ B for all i) ≤ Pr(X_s ∈ B_ε for all s < τ) = Pr(e_{B_ε} ≥ τ).

We have shown, therefore, that for any ε > 0, Pr(e_A ≥ τ) ≤ Pr(e_{B_ε} ≥ τ). It only remains to show, then, that Pr(e_{B_ε} ≥ τ) converges to Pr(e_B ≥ τ) as ε → 0. Consider instead the equivalent statement that Pr(e_{B_ε} < τ) converges to Pr(e_B < τ). Since B is closed and X_t has continuous paths, e_B < τ implies that there is some ε > 0 with e_{B_ε} < τ. That is, the function 1_{e_{B_ε} < τ} converges pointwise (and upwards) to 1_{e_B < τ} as ε → 0. By the monotone convergence theorem, it follows that Pr(e_{B_ε} < τ) converges to Pr(e_B < τ) as ε → 0. Hence

Pr(e_A ≥ τ) ≤ Pr(e_B ≥ τ),

and so we have recovered Theorem 1.1.
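The discretization in this argument is easy to simulate. The following sketch (ours, with arbitrary parameters) estimates Pr(X_{iτ/k} ∈ A for all i), the discrete proxy for Pr(e_A ≥ τ), for a half-space and for a ball of the same Gaussian measure, using the exact one-step transition of the Ornstein-Uhlenbeck process.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau, k, N = 2, 1.0, 100, 200_000
a, b = np.exp(-tau / k), np.sqrt(1 - np.exp(-2 * tau / k))

r = np.sqrt(2 * np.log(2))  # gamma_2(ball of radius r) = 1/2, same as {x_1 <= 0}
sets = {"half-space": lambda x: x[:, 0] <= 0.0,
        "ball": lambda x: (x**2).sum(axis=1) <= r**2}

for name, A in sets.items():
    X = rng.standard_normal((N, n))   # X_0 distributed as gamma_2 (stationary start)
    inside = A(X)                     # paths with X_{i tau/k} in A so far
    for _ in range(k):
        X = a * X + b * rng.standard_normal((N, n))  # exact OU transition
        inside &= A(X)
    print(name, inside.mean())        # the half-space probability is larger
```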
We have mentioned already that it is also possible to recover Theorem 1.1 from the result in [8]. Indeed, the matrix M with entries m_ij = e^{−|t_i − t_j|} does satisfy the hypothesis in [8] (namely, that the off-diagonal entries of its inverse are non-positive), although this is certainly less obvious than the fact that M satisfies the conditions of Theorem 1.4.
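Both hypotheses are easy to check numerically for this M. The sketch below (ours) draws random times and verifies positive semidefiniteness, the non-negativity of the entries (the hypothesis of Theorem 1.4), and the sign condition on the inverse (the hypothesis of [8]).

```python
import numpy as np

t = np.sort(np.random.default_rng(3).uniform(0, 2, size=6))
M = np.exp(-np.abs(t[:, None] - t[None, :]))  # m_ij = e^{-|t_i - t_j|}

print(np.linalg.eigvalsh(M).min() >= -1e-12)  # positive semidefinite
print((M >= 0).all())                         # hypothesis of Theorem 1.4
Minv = np.linalg.inv(M)
off = Minv - np.diag(np.diag(Minv))
print((off <= 1e-9).all())                    # hypothesis of [8]
```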
Let us also indicate how Theorem 1.2 is recovered. We want to show that

E ∫_0^{t ∧ e_{A_1}} 1_{A_2}(X_s) ds   (3)

is only increased when A is replaced by B (recall that B_1 and B_2 are parallel half-spaces satisfying γ_n(B_i) = γ_n(A_i)). We may suppose that A_2 ⊂ A_1, since if not then (3) may be trivially made larger by moving some of A_2's mass inside A_1 (mass of A_2 outside A_1 does not contribute, because X_s ∈ A_1 for all s < e_{A_1}); if this is impossible because γ_n(A_2) > γ_n(A_1) then (3) is trivially bounded by E[t ∧ e_{A_1}]; by Theorem 1.1, t ∧ e_{A_1} is stochastically dominated by t ∧ e_{B_1}, and E[t ∧ e_{B_1}] in turn is equal to the right-hand side of (3) with B replacing A, since γ_n(B_2) > γ_n(B_1) forces B_1 ⊂ B_2. Now that we have reduced to the case A_2 ⊂ A_1, set t_i = it/k for i = 1, …, k and note that (3) is the limit, as k → ∞, of

(t/k) ∑_{i=1}^k Pr(X_{t_j} ∈ A_1 for all j < i and X_{t_i} ∈ A_2).
By (2), applied with the sets A_1, …, A_1, A_2, each of these terms is only increased when A is replaced by B. To recover Theorem 1.2 from here, it suffices to take a limit in much the same manner as before; we omit the details.

The Ornstein-Uhlenbeck semigroup
We will prove Theorem 1.4 by differentiating a particular functional under the Ornstein-Uhlenbeck semigroup. This proof method has a long history, beginning with Varopoulos' work [17] connecting the heat semigroup with Sobolev inequalities. More recently, and more apropos of this work, Bakry and Ledoux [1] proved the Gaussian isoperimetric inequality by differentiating Bobkov's functional [2] under the Ornstein-Uhlenbeck semigroup. We will follow quite a similar approach here, using a generalization of a functional that was introduced by Mossel and the author [14] to prove Theorem 1.4 in the case k = 2.
We define the Ornstein-Uhlenbeck semigroup {P_t : t ≥ 0}, which acts on functions f : R^n → R, by

(P_t f)(x) = ∫_{R^n} f(e^{−t} x + √(1 − e^{−2t}) y) dγ_n(y),

where γ_n is the standard Gaussian measure on R^n. Equivalently,

(P_t f)(x) = E[f(X_t) | X_0 = x],

where X_t is the Ornstein-Uhlenbeck process from the previous section. From either definition, one can easily see that P_0 is the identity operator, while P_t f → E f as t → ∞.
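For intuition, the first definition is easy to implement numerically in one dimension. The following sketch (ours) evaluates P_t f by Gauss-Hermite quadrature and illustrates that P_0 is the identity while P_t f tends to E f.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for the standard Gaussian measure gamma_1
nodes, weights = hermegauss(80)
weights = weights / weights.sum()

def P(t, f, x):  # (P_t f)(x) = E f(e^{-t} x + sqrt(1 - e^{-2t}) Y), Y ~ gamma_1
    return sum(w * f(np.exp(-t) * x + np.sqrt(1 - np.exp(-2 * t)) * y)
               for y, w in zip(nodes, weights))

f = lambda x: x**3 + np.sin(x)
print(P(1e-9, f, 0.7), f(0.7))  # P_0 is (numerically) the identity
print(P(20.0, f, 0.7))          # P_t f -> E f = 0 as t grows, since f is odd
```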
One remarkable property of the Ornstein-Uhlenbeck semigroup is that it has very nice formulas for its commutation with smooth functions. In particular, for any F = (f_1, …, f_k) : R^n → R^k and for any smooth Ψ : R^k → R, there is the formula (see, e.g., [11])

(d/ds) P_s(Ψ(P_{t−s} F)) = P_s( ∑_{i,j=1}^k (∂²Ψ/∂x_i ∂x_j)(P_{t−s} F) ⟨∇P_{t−s} f_i, ∇P_{t−s} f_j⟩ ).   (4)
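To indicate where (4) comes from, here is a sketch of the scalar case k = 1, writing L = Δ − x·∇ for the generator of P_t (so that (d/ds) P_s g = P_s L g) and using the diffusion identity L Ψ(g) = Ψ′(g) L g + Ψ″(g)|∇g|²:

```latex
\begin{align*}
\frac{d}{ds}\, P_s \Psi(P_{t-s} f)
  &= P_s\Bigl( L\,\Psi(P_{t-s} f) - \Psi'(P_{t-s} f)\, L P_{t-s} f \Bigr) \\
  &= P_s\Bigl( \Psi''(P_{t-s} f)\, \lvert \nabla P_{t-s} f \rvert^{2} \Bigr).
\end{align*}
```

The general case is identical, with the Hessian of Ψ replacing Ψ″.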
We begin with an observation that comes, essentially, from applying (4) to the composition of Ψ with a linear map. In the following, ⊙ denotes the Hadamard (elementwise) product between two matrices and H_Ψ denotes the Hessian matrix of Ψ.

Proposition 2.1. Let M = (m_ij) be a k × k positive semidefinite matrix with m_ii = 1 for all i, and let X = (X^{(1)}, …, X^{(k)}) be a kn-dimensional Gaussian vector with mean zero and covariance M ⊗ I_n. If Ψ : R^k → R is smooth and M ⊙ H_Ψ ≤ 0 everywhere, then for any measurable f_1, …, f_k : R^n → [0, 1],

E Ψ(F(X)) ≤ Ψ(E F(X)),

where F(X) denotes the vector (f_1(X^{(1)}), …, f_k(X^{(k)})).

We remark that the assumption m_ii = 1 in Proposition 2.1 is not necessary, but it makes our notation simpler, since otherwise we would need to consider Ornstein-Uhlenbeck semigroups with different stationary measures.
Before proving Proposition 2.1, we introduce some notation that will be useful in what follows: for any f : R^n → R and any n × m matrix A, we denote the function f ∘ A : R^m → R by f_A.
Proof. Let Q = (q_ij) be the positive definite square root of M ⊗ I_n, and for i = 1, …, k, let Q_i be the n × kn matrix consisting of rows (i − 1)n + 1 through in of Q. Let Z be a standard Gaussian vector in R^{kn}, and note that QZ = (Q_1 Z, …, Q_k Z) is a kn-dimensional Gaussian vector with mean 0 and covariance M ⊗ I_n (i.e., QZ has the same distribution as X). We consider the quantity

F(s, t, z) = (P_s Ψ((P_{t−s} f_1)_{Q_1}, …, (P_{t−s} f_k)_{Q_k}))(z)

for 0 ≤ s ≤ t and z ∈ R^{kn}. First, let us check how P_t commutes with linear transformations. Since Q is the square root of M ⊗ I_n, we have Q_i Q_j^T = m_ij I_n; in particular, Q_i Q_i^T = I_n (recall m_ii = 1), and so

P_t(f_{Q_i}) = (P_t f)_{Q_i}.   (5)

Of course, the gradient commutes with linear transformations as ∇(f_A) = A^T (∇f)_A. Combining this with (5),

∇ P_t(f_{Q_i}) = Q_i^T (∇ P_t f)_{Q_i}.   (6)

In particular,

⟨∇P_t(f_{Q_i}), ∇P_t(g_{Q_j})⟩ = ((∇P_t f)_{Q_i})^T Q_i Q_j^T (∇P_t g)_{Q_j} = m_ij ((∇P_t f)_{Q_i})^T (∇P_t g)_{Q_j}.

For brevity, let v_i = (∇P_{t−s} f_i)_{Q_i}. Then, by (4) and (6),

∂F(s, t, z)/∂s = P_s( ∑_{i,j} m_ij Ψ_ij((P_{t−s} f_1)_{Q_1}, …, (P_{t−s} f_k)_{Q_k}) v_i^T v_j ),   (7)

where Ψ_ij denotes ∂²Ψ/∂x_i ∂x_j. Note that if v^T = (v_1^T ⋯ v_k^T) then the last line may be rewritten as

P_s( v^T ((M ⊙ H_Ψ) ⊗ I_n) v ).   (8)

In particular, if M ⊙ H_Ψ ≤ 0 then ∂F(s, t, z)/∂s ≤ 0 for every s, t, and z. Hence F(t, t, Z) ≤ F(0, t, Z). But since (Q_1 Z, …, Q_k Z) has the same distribution as (X^{(1)}, …, X^{(k)}), E F(t, t, Z) converges to E Ψ(F(X)) and E F(0, t, Z) converges to Ψ(E F(X)) as t → ∞, and the proposition follows.
With hardly any extra effort, the proof of Proposition 2.1 also allows us to characterize its equality cases. Indeed, if E Ψ(F(X)) = Ψ(E F(X)) then we must have ∂F(s, t, z)/∂s = 0 for every s, t, and z. Going back to (7) and (8), we see that P_s(v^T ((M ⊙ H_Ψ) ⊗ I_n) v) must be identically zero, and hence ((M ⊙ H_Ψ) ⊗ I_n) v = 0. In other words, we have the following corollary:

Corollary 2.2. If M ⊙ H_Ψ ≤ 0 everywhere and E Ψ(F(X)) = Ψ(E F(X)), then for every 0 ≤ s ≤ t and every z ∈ R^{kn},

((M ⊙ H_Ψ) ⊗ I_n) v = 0,

where H_Ψ is evaluated at ((P_{t−s} f_1)_{Q_1}(z), …, (P_{t−s} f_k)_{Q_k}(z)) and v_i = (∇P_{t−s} f_i)_{Q_i}(z).

Proof of Theorem 1.4
Before proving Theorem 1.4, note that by translating X, A_i, and B_i, it suffices to consider the case in which X has mean zero. Moreover, by scaling each X^{(i)}, A_i, and B_i, we may assume that m_ii = 1 for each i (here and in the previous sentence we are using the fact that a collection of parallel half-spaces remains one under translation and scaling). We may also assume that M is strictly positive definite, since if not then the distribution of X is supported on a subspace, onto which we may project. Consider the function

J(x_1, …, x_k; M) = Pr(Y_i ≤ Φ^{−1}(x_i) for all i),   (9)

where Y = (Y_1, …, Y_k) is a mean-zero Gaussian vector with covariance M and Φ denotes the standard Gaussian distribution function. Note that if B_1, …, B_k are parallel half-spaces with common unit normal a, say B_i = {x ∈ R^n : x·a ≤ b_i}, then (X^{(1)}·a, …, X^{(k)}·a) has covariance M and γ_n(B_i) = Φ(b_i), so that

Pr(X^{(i)} ∈ B_i for all i) = J(γ_n(B_1), …, γ_n(B_k); M).

Since every such set of parallel half-spaces may be obtained by applying a fixed rotation to each B_i, and since the distribution of X is invariant under applying a fixed rotation to each X^{(i)}, Theorem 1.4 is equivalent to the statement

Pr(X^{(i)} ∈ A_i for all i) ≤ J(γ_n(A_1), …, γ_n(A_k); M).
Next, note that if x_1, …, x_k ∈ {0, 1} then J(x_1, …, x_k) is 1 if all the x_i are 1, and 0 otherwise. In particular,

Pr(X^{(i)} ∈ A_i for all i) = E J(1_{A_1}(X^{(1)}), …, 1_{A_k}(X^{(k)})),

and hence (1) is equivalent to the statement

E J(f_1(X^{(1)}), …, f_k(X^{(k)})) ≤ J(E f_1, …, E f_k)   (10)

in the special case f_i = 1_{A_i}. In fact, we will prove (10) for general measurable functions f_i : R^n → [0, 1]. Unsurprisingly, the proof of (10) goes through Proposition 2.1. The main task left, therefore, is to compute the Hessian of J and check that it satisfies the hypothesis of Proposition 2.1.

A well-known formula for conditional distributions of Gaussian vectors [6] states that if Y has mean zero and covariance M satisfying m_ii = 1, then conditioned on Y_i = y_i, Y_î has mean y_i M_{iî} (where M_i denotes the ith row of M, so M_{iî} is the ith row of M with its ith element removed, and Y_î denotes Y with its ith element removed) and covariance M_{îî} − M_{îi} M_{iî}. To compute the first derivatives of J, let

K(y_1, …, y_k) = Pr(Y_i ≤ y_i for all i),

and note that J(x_1, …, x_k) = K(Φ^{−1}(x_1), …, Φ^{−1}(x_k)). Now, for any i we may write K as

K(y) = ∫_{−∞}^{y_i} φ(s) Pr(Y_î ≤ y_î | Y_i = s) ds,

from which we see that

∂K/∂y_i = φ(y_i) Pr(Y_î ≤ y_î | Y_i = y_i).

Now, given that Y_i = y_i, Y_î has mean y_i M_{iî} and covariance M_{îî} − M_{îi} M_{iî}; hence, writing W for a mean-zero Gaussian vector with this covariance,

Pr(Y_î ≤ y_î | Y_i = y_i) = Pr(W ≤ y_î − y_i M_{iî}),

and so we have the formula

∂K/∂y_i = φ(y_i) Pr(W ≤ y_î − y_i M_{iî})

(bear in mind that this formula is only valid under the assumption m_ii = 1; if not then m_ii makes an appearance in the formula also). Applying the chain rule and the identity dΦ^{−1}(x)/dx = 1/φ(Φ^{−1}(x)), we obtain

∂J/∂x_i = Pr(Y_î ≤ Φ^{−1}(x_î) | Y_i = Φ^{−1}(x_i))   (11)

(where by Φ^{−1}(x_î), we mean the vector obtained by applying Φ^{−1} to x_î element-wise). Now let I(x) = φ(Φ^{−1}(x)) and define, for j ≠ i,

J_ij = I(x_i) · (∂/∂y_j) Pr(Y_î ≤ y_î | Y_i = y_i) |_{y = Φ^{−1}(x)};

by the chain rule applied to (11), we have

∂²J/∂x_i ∂x_j = J_ij / (I(x_i) I(x_j)).   (12)

It is worth mentioning that this last equation shows that in fact J_ij = J_ji. This is not obvious from the definition of J_ij, although it may also be checked by the tedious process of calculating the derivative in that definition.
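As a numerical sanity check on (11) and (12) (ours, for k = 2, in which case J_12 is just the bivariate Gaussian density at (Φ^{−1}(x_1), Φ^{−1}(x_2))), one can compare finite differences of J against the closed forms; the correlation and evaluation point below are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

rho = 0.4
c = np.sqrt(1 - rho**2)

def K(y1, y2):  # K(y) = Pr(Y_1 <= y_1, Y_2 <= y_2) for correlation rho
    return quad(lambda s: norm.pdf(s) * norm.cdf((y2 - rho * s) / c),
                -np.inf, y1)[0]

def J(x1, x2):  # J = K composed with Phi^{-1}, coordinate-wise
    return K(norm.ppf(x1), norm.ppf(x2))

x1, x2, h = 0.3, 0.6, 1e-3
y1, y2 = norm.ppf(x1), norm.ppf(x2)

# (11): dJ/dx_1 should equal Pr(Y_2 <= y_2 | Y_1 = y_1)
print((J(x1 + h, x2) - J(x1 - h, x2)) / (2 * h),
      norm.cdf((y2 - rho * y1) / c))

# (12): I(x_1) I(x_2) d^2J/dx_1 dx_2 should equal J_12, the joint density
d2J = (J(x1 + h, x2 + h) - J(x1 + h, x2 - h)
       - J(x1 - h, x2 + h) + J(x1 - h, x2 - h)) / (4 * h * h)
print(d2J * norm.pdf(y1) * norm.pdf(y2),
      multivariate_normal(cov=[[1, rho], [rho, 1]]).pdf([y1, y2]))
```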
To compute the repeated second derivatives of J, we use (11) and the chain rule to write

∂²J/∂x_i² = −(1/I(x_i)²) ∑_{j≠i} m_ij J_ij.   (13)

Now let I(x) denote the k × k diagonal matrix with 1/I(x_i) as the ith diagonal entry. Note that by (12), the ij entry of M ⊙ H_J (for i ≠ j) is given by

m_ij J_ij / (I(x_i) I(x_j)),

while the ii entry of M ⊙ H_J is just given by (13) (since m_ii = 1). Hence we may write

M ⊙ H_J = I(x) A I(x),   (14)

where a_ij = m_ij J_ij for i ≠ j and a_ii = −∑_{j≠i} a_ij.
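Concretely, for k = 2 (with ρ = m_12 and I_i = I(x_i)), equations (12) and (13) assemble into (14) as follows:

```latex
M \odot H_J
= \begin{pmatrix}
    -\rho J_{12}/I_1^{2} & \rho J_{12}/(I_1 I_2) \\
    \rho J_{12}/(I_1 I_2) & -\rho J_{12}/I_2^{2}
  \end{pmatrix}
= \begin{pmatrix} I_1^{-1} & 0 \\ 0 & I_2^{-1} \end{pmatrix}
  \begin{pmatrix} -\rho J_{12} & \rho J_{12} \\ \rho J_{12} & -\rho J_{12} \end{pmatrix}
  \begin{pmatrix} I_1^{-1} & 0 \\ 0 & I_2^{-1} \end{pmatrix}.
```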

Lemma 3.2. If A is a matrix such that a_ij ≥ 0 for i ≠ j and a_ii = −∑_{j≠i} a_ij, then A ≤ 0.
Proof. In fact, the lemma follows immediately from some well-known facts in linear algebra, such as the fact that −A is diagonally dominant. However, we may also give a simple proof by noting that the quadratic form of A is nothing but

v^T A v = −(1/2) ∑_{i≠j} a_ij (v_i − v_j)².

This formula also leads us to the observation that if all of the a_ij are strictly positive (which is the case for the matrix A in (14)) then the kernel of A is the span of the all-ones vector. This observation is irrelevant to Lemma 3.2, but we will use it later.
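A quick numerical illustration (ours): any matrix built this way has non-positive eigenvalues, with the all-ones vector in its kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.random((5, 5))
A = (B + B.T) / 2                      # symmetric, non-negative off-diagonal
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))    # a_ii = -sum_{j != i} a_ij

print(np.linalg.eigvalsh(A))           # all eigenvalues <= 0 (up to rounding)
print(A @ np.ones(5))                  # the all-ones vector lies in the kernel
```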
To complete the proof of Theorem 1.4, it remains to characterize the equality cases; for this, we will use Corollary 2.2: if E J(F) = J(E F) then for every t > 0,

0 = ((M ⊙ H_J) ⊗ I_n) v = ((I A I) ⊗ I_n) v,   (15)

where the entries of H_J, A, and I are evaluated at ((P_t f_1)_{Q_1}(z), …, (P_t f_k)_{Q_k}(z)), v_i = (∇P_t f_i)_{Q_i}(z), and the second equality follows from (14). Since I is always non-singular, we may drop the first instance of it from (15). Defining w_i = Φ^{−1} ∘ P_t f_i, we have ∇w_i = ∇P_t f_i / I(P_t f_i). By multiplying out the last two terms of (15), we have

∑_j a_ij (∇w_j)_{Q_j} = 0   for every i.

Now, recall from the proof of Lemma 3.2 that the kernel of A is the span of (1 ⋯ 1)^T. It follows then that, whenever (15) holds, the (∇w_i)_{Q_i} are all equal. Since this holds pointwise, and since the distribution of Z is fully supported on R^{kn}, we see that the ∇w_i must all be almost surely equal to the same constant, a say. Since each w_i is a smooth function, we have w_i(x) = a·x + b_i for some b_i. Recalling the definition of w_i, we have (P_t f_i)(x) = Φ(a·x + b_i). Carlen and Kerce [5] observed (and this observation was subsequently used in [13] and [14]) that under this condition, and if f_i = 1_{A_i}, then A_i is a half-space and a is normal to it. Since we have the same a for every A_i, it follows that A_1, …, A_k is a family of parallel half-spaces, which completes the proof of Theorem 1.4.

In order to be more self-contained, let us sketch a proof (from [14]) of why (P_t 1_A)(x) = Φ(a·x + b) implies that A is a half-space. First, one checks that if A is a half-space and ν its outward unit normal, then for some b ∈ R,

(P_t 1_A)(x) = Φ(b − k_t x·ν),

where k_t = (e^{2t} − 1)^{−1/2}. Moreover, as A ranges over all half-spaces with normal ν, b ranges over R. After checking that P_t is one-to-one, this implies that if (P_t 1_A)(x) = Φ(a·x + b) with |a| = k_t, then A is a half-space normal to a. It remains to see what happens when |a| ≠ k_t. First of all, Bakry and Ledoux showed that |∇(Φ^{−1} ∘ P_t f)| ≤ k_t for any f : R^n → [0, 1]; hence |a| ≤ k_t. But if |a| < k_t then there is some s > 0 with |a| = k_{t+s}. It follows from the previous argument, then, that there is a half-space B with (P_{t+s} 1_B)(x) = Φ(a·x + b) = (P_t 1_A)(x). Since P_{t+s} 1_B = P_t(P_s 1_B), we then have P_s 1_B = 1_A, which is a contradiction since P_s 1_B is always a smooth function.
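For completeness, the half-space formula quoted above is a one-line computation from the definition of P_t: for A = {x : x·ν ≤ c} and Y distributed according to γ_n,

```latex
(P_t 1_A)(x)
 = \Pr\bigl(e^{-t}\,x\cdot\nu + \sqrt{1-e^{-2t}}\;Y\cdot\nu \le c\bigr)
 = \Phi\!\left(\frac{c - e^{-t}\,x\cdot\nu}{\sqrt{1-e^{-2t}}}\right)
 = \Phi\bigl(b - k_t\,x\cdot\nu\bigr),
\qquad b = \frac{c}{\sqrt{1-e^{-2t}}},
```

since Y·ν is a standard Gaussian and e^{−t}/√(1 − e^{−2t}) = (e^{2t} − 1)^{−1/2} = k_t; as c ranges over R, so does b.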