Noise Stability and Correlation with Half Spaces

Benjamini, Kalai and Schramm showed that a monotone function $f : \{-1,1\}^n \to \{-1,1\}$ is noise stable if and only if it is correlated with a half-space (a set of the form $\{x: \langle x, a\rangle \le b\}$). We study noise stability in terms of correlation with half-spaces for general (not necessarily monotone) functions. We show that a function $f: \{-1, 1\}^n \to \{-1, 1\}$ is noise stable if and only if it becomes correlated with a half-space when we modify $f$ by randomly restricting a constant fraction of its coordinates. Looking at random restrictions is necessary: we construct noise stable functions whose correlation with any half-space is $o(1)$. Moreover, our examples have the property that different restrictions are correlated with different half-spaces: for any fixed half-space, the probability that a random restriction is correlated with it goes to zero. We also provide quantitative versions of the above statements, and versions that apply to the Gaussian measure on $\mathbb{R}^n$ instead of the discrete cube. Our work is motivated by questions in learning theory and a recent question of Khot and Moshkovitz.


Introduction
In a seminal paper, Benjamini, Kalai and Schramm [2] related noise stability to correlation with half-spaces by showing that a monotone boolean function is noise stable if and only if it is correlated with a half-space. Our interest in this paper is in relating noise stability to correlation with half-spaces for general boolean functions. Our results are motivated by recent work of Khot and Moshkovitz, whose goal is to construct a Lasserre integrality gap for the Unique Games problem, as well as by natural problems in learning theory.
In the following subsections we introduce the setup and results in the boolean and Gaussian cases and discuss the motivation for our work.
Our main theorem, in its qualitative form (its quantitative analogues are Theorems 3.1 and 3.10), says that a set is noise stable if and only if we can make it correlated with a half-space by randomly restricting a constant fraction of its coordinates. The proof of Theorem 1.1 is not very complicated. In one direction, it is well known (Theorem 3.11) that every half-space is noise stable, and it then follows that if a set is correlated with a half-space then it is noise stable (Proposition 3.12). Finally, if a random restriction of a set is noise stable then the original set must also be noise stable. This proves that if a random restriction is correlated with a half-space then the set is noise stable.
For the other direction, the main idea is to involve the "first-level Fourier weight" of a set, defined by $w_1(B) = \sum_{i=1}^n \widehat{1_B}(i)^2$, where the $\widehat{1_B}(i)$ are the level-one Fourier coefficients of $1_B$. We then proceed in two steps: first (in Proposition 3.2), we prove that if $w_1(B)$ is large then $B$ is correlated with a half-space. For this step, note that if $w_1(B)$ is large then $B$ is correlated with a linear function, and we can use that linear function to find a correlated half-space. For the second step (in Proposition 3.3), we prove that for a noise stable function, random restrictions have large first-level Fourier weight.
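As an illustration of these definitions (our own toy computation, not part of the paper's argument), $w_1$ can be computed by brute force for a small function, and thresholding the degree-one part yields a correlated half-space; for majority on three bits the two even coincide:

```python
from itertools import product

def fourier_degree1(f, n):
    """Degree-1 Fourier coefficients: hat f(i) = E[x_i f(x)] under uniform x."""
    pts = list(product([-1, 1], repeat=n))
    return [sum(x[i] * f(x) for x in pts) / len(pts) for i in range(n)]

def w1(f, n):
    """First-level Fourier weight: sum of squared degree-1 coefficients."""
    return sum(c * c for c in fourier_degree1(f, n))

# Majority on three bits: a simple function with large first-level weight.
maj3 = lambda x: 1 if sum(x) > 0 else -1

# The half-space suggested by the linear part: sign of sum_i hat f(i) x_i.
coeffs = fourier_degree1(maj3, 3)
pts = list(product([-1, 1], repeat=3))
halfspace = lambda x: 1 if sum(c * xi for c, xi in zip(coeffs, x)) > 0 else -1
corr = sum(maj3(x) * halfspace(x) for x in pts) / len(pts)

print(w1(maj3, 3))   # 3 * (1/2)^2 = 0.75
print(corr)          # majority equals its own linear-part sign here: 1.0
```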
Since the notion of taking restrictions may seem artificial, it is natural to ask whether taking restrictions in Theorem 1.1 is really necessary. That is, could it be that the noise stability of a sequence of sets $B^{(i)}$ already implies that $M(B^{(i)})$ is bounded away from zero? In fact, this is not the case. As an example, take $n_m = m^2$ and consider the sets $B^{(m)} \subset \{-1,1\}^{n_m}$ defined by $\sum_{i=1}^{m} \bigl(\sum_{j \in J_i} x_j\bigr)^2 \le m^2$, where $J_1, \dots, J_m$ partition the coordinates into blocks of size $m$.

The Gaussian setting
The preceding results also make sense in a Gaussian setting: Let $\gamma_n$ denote the standard Gaussian measure on $\mathbb{R}^n$ and write $P_t$ for the Ornstein–Uhlenbeck semigroup.
EJP 23 (2018), paper 16.
(Here and elsewhere we will reuse symbols that we also used in the boolean setting; however, the meaning should always be clear from the context.) For a probabilistic definition of the Gaussian noise stability, suppose that $X \sim \gamma_n$ and $Y \sim \gamma_n$ are jointly Gaussian with correlation $e^{-t}$. Then $\mathrm{NS}_t(A) = \Pr(X \in A \text{ and } Y \in A)$. As in the boolean case, we have $\mathrm{NS}_t(A) - \gamma_n(A)^2 = \mathrm{Var}(P_{t/2} 1_A)$; we say that a sequence $A_i$ of sets is noise sensitive if $\mathrm{Var}(P_t 1_{A_i}) \to 0$ for all $t > 0$, and we say that $A_i$ is noise stable otherwise. A half-space is a set of the form $\{x \in \mathbb{R}^n : \langle x, a \rangle \le b\}$; write $\mathcal{H}_n$ for the set of all half-spaces in $\mathbb{R}^n$, and define $M(A)$ to be the maximal correlation between $A$ and a half-space in $\mathcal{H}_n$. In the setting above, we prove that a sequence of sets is noise stable if and only if, by scaling and randomly shifting the sets, we can make them correlated with half-spaces; this is the content of Theorem 1.3. The proof of Theorem 1.3 follows the same general outline as the proof of Theorem 1.1. In fact, Theorem 1.3 is a little nicer to prove, because the Gaussian measure is particularly easy to work with; therefore, we will prove Theorem 1.3 first.
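For readers who like to experiment, the probabilistic definition can be checked numerically. The following sketch (ours; the half-space $\{x : x_1 \le 0\}$ and the parameter values are arbitrary choices) estimates $\mathrm{NS}_t(A)$ by sampling the correlated Gaussian pair and compares it with the classical bivariate-normal orthant formula:

```python
import math
import random

random.seed(0)
t = 0.3
rho = math.exp(-t)   # correlation of the jointly Gaussian pair (X, Y)

# Monte Carlo estimate of NS_t(A) = Pr(X in A and Y in A) for the
# half-space A = {x : x_1 <= 0}; only the first coordinate matters.
N = 200_000
hits = 0
for _ in range(N):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    x = g1
    y = rho * g1 + math.sqrt(1 - rho * rho) * g2   # corr(x, y) = rho
    if x <= 0 and y <= 0:
        hits += 1
estimate = hits / N

# Exact value via the bivariate-normal orthant formula:
# Pr(X_1 <= 0, Y_1 <= 0) = 1/4 + arcsin(rho) / (2*pi).
exact = 0.25 + math.asin(rho) / (2 * math.pi)
print(estimate, exact)
```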
As in the boolean case, one can find examples showing that Theorem 1.3 would be false if we didn't introduce the scaling and random shifting. In this case, the example is very easy: let $B^{(n)} \subset \mathbb{R}^n$ be the Euclidean ball of radius $\sqrt{n}$.
One can learn a little more from this example. First, note that any restriction of $B^{(n)}$ is again a Euclidean ball. In the Gaussian setting, therefore, unlike in the boolean one, noise stability does not imply that random restrictions are correlated with half-spaces.
Another observation (since $B^{(n)}$ is rotationally invariant) is that noise stable sets do not necessarily "encode" directions. We make this more precise in Proposition 2.4, which says that even though random shifts and scalings of $B^{(n)}$ are correlated with half-spaces, the directions in which those half-spaces point are unpredictable.

Motivation
Our work is motivated by extending the results of [2] to non-monotone functions, as well as by the following motivations: • In a recent work, Khot and Moshkovitz [4] proposed a Lasserre integrality gap for the Unique Games problem. The proposed construction is based on the assumption that in a certain family of functions, the most stable functions are half-spaces. More specifically, [4] considers functions $f : \mathbb{R}^n \to \{-1, +1\}$ which satisfy $f(x + e_i) = -f(x)$ for all $x$ and for the standard basis vectors $e_i$; they asked whether the most stable functions in this family are of the form $\mathrm{sgn}(\sum_i \sigma_i x_i)$ where $\sigma \in \{-1, 1\}^n$, and also whether every function that is almost as noise stable as possible must be correlated with a function of this form.
In this context, it is natural to ask whether every noise stable function is correlated with a half-space. This is the question we address in this paper. However, since our functions are not required to satisfy $f(x + e_i) = -f(x)$, our results and examples do not have direct implications for the proposed Lasserre integrality gap instances.
• It is well known that the class of functions having a constant fraction (resp. most) of their Fourier mass on low-degree coefficients can be weakly (resp. strongly) learned under the uniform distribution [7, 5]. In particular, noise stable functions can be weakly learned. On the other hand, some of the most classical learning algorithms involve learning half-spaces. Thus it is natural to ask if there is a more direct relation between the weak learnability of noise stable functions and the learnability of half-spaces. Our examples seem to provide a negative answer to this question.

Remark 1.5.
It is natural to ask whether the theorem of [2] can be recovered from our results. For example, by combining Theorem 1.1 with [2] it follows that a monotone set is correlated with a half-space if and only if its random restrictions are correlated with half-spaces. But is this fact obvious from first principles? If it were, it would combine with Theorem 1.1 to give a different proof of [2].

The Gaussian case
For this section, let $X \sim \gamma_n$. Recall that the Ornstein–Uhlenbeck semigroup is defined by $(P_t f)(x) = \mathbb{E}\, f\bigl(e^{-t} x + \sqrt{1 - e^{-2t}}\, X\bigr)$. For $t \in \mathbb{R}$ and $y \in \mathbb{R}^n$, define $f_{t,y}$ by $f_{t,y}(x) = f\bigl(\sqrt{1 - e^{-2t}}\, x + e^{-t} y\bigr)$.

Theorem 2.1.
There is a universal constant $c > 0$ such that for any measurable $f : \mathbb{R}^n \to [0, 1]$ and any $t > 0$, where the expectation is with respect to $Y \sim \gamma_n$.

An example
As discussed in the introduction, a simple example shows that $f$ itself may not be correlated with a half-space: let $B_n \subset \mathbb{R}^n$ be the Euclidean ball of radius $\sqrt{n}$. First, we note that for sufficiently small $t$, $\mathrm{Var}(P_t 1_{B_n})$ is bounded away from zero as $n \to \infty$. (This is already well known [3], since $B_n$ is obtained by thresholding a quadratic function, but the computation in our special case is quite easy.)

Proposition 2.2. For any $n$ and any $t > 0$, In particular $B_n$ is noise stable.
Proof. For a set $B$ with smooth boundary, we may define the Gaussian perimeter of $B$ as $\int_{\partial B} \frac{d\gamma_n}{d\lambda}\, d\mathcal{H}^{n-1}$, where $\mathcal{H}^{n-1}$ denotes the $(n-1)$-dimensional Hausdorff measure and $\frac{d\gamma_n}{d\lambda}$ denotes the Gaussian density with respect to the Lebesgue measure. Since the Gaussian density restricted to $\partial B_n$ takes the constant value $(2\pi e)^{-n/2}$ and the Euclidean surface area of $B_n$ is $\sqrt{n}^{\,n-1} \cdot 2\pi^{n/2} / \Gamma(n/2)$, it follows that the Gaussian perimeter of $B_n$ is $(2\pi e)^{-n/2} \cdot \sqrt{n}^{\,n-1} \cdot 2\pi^{n/2} / \Gamma(n/2) \approx \pi^{-1/2}$, where the approximation follows from Stirling's formula.
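As a numerical sanity check on this perimeter computation (our own, evaluated in log-space to avoid overflow), the product of the surface area and the density indeed approaches a positive constant, which appears to be $1/\sqrt{\pi}$:

```python
import math

def gaussian_perimeter_ball(n):
    """Gaussian perimeter of the Euclidean ball of radius sqrt(n) in R^n:
    surface area 2*pi^(n/2)*sqrt(n)^(n-1)/Gamma(n/2) times the (constant)
    Gaussian density (2*pi*e)^(-n/2) on the boundary sphere."""
    log_p = (math.log(2) + (n / 2) * math.log(math.pi)
             + ((n - 1) / 2) * math.log(n)
             - math.lgamma(n / 2)
             - (n / 2) * (math.log(2 * math.pi) + 1))
    return math.exp(log_p)

for n in [10, 100, 10_000, 1_000_000]:
    print(n, gaussian_perimeter_ball(n))
# The values approach 1/sqrt(pi) ~ 0.5642, a positive constant, consistent
# with B_n having non-vanishing Gaussian perimeter as n grows.
```

For $n = 2$ the formula can be checked by hand: the circle of radius $\sqrt 2$ has circumference $2\pi\sqrt 2$ and density $(2\pi e)^{-1}$, giving perimeter $\sqrt 2/e$.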
On the other hand, Ledoux [6] proved that if $P$ is the Gaussian perimeter of $B$ then Plugging in our asymptotics for the Gaussian perimeter of $B_n$, we have Since $P_t = P_{t/2} P_{t/2}$ and $P_{t/2}$ is self-adjoint, this may be rearranged into Since $\Pr(B_n) = \frac{1}{2} + o_n(1)$, this proves the claim.

Next, we observe that $B_n$ is not correlated with any half-space: In particular, Propositions 2.2 and 2.3 together imply that Theorem 2.1 would no longer be true if $f_{t,y}$ were replaced by $f$.
Proof. Since $B_n$ is rotationally invariant, it suffices to consider half-spaces of the form $\{x : x_1 \le b\}$. Then the $f_i$ are orthogonal and satisfy $\|f_i\|_2 \le 1$. Hence,

A very similar argument shows that even though shifts of $B_n$ may be correlated with half-spaces, the half-spaces point in unpredictable directions.
In particular, Chebyshev's inequality implies that for any u > 0, with probability at least As in the proof of the previous proposition, for any Y and t, Taking the expectation over Y completes the proof.

Proof of Theorem 2.1
Since the degree-one Hermite coefficients of $f$ are $\mathbb{E}[\partial_i f]$, we may also write $w_1(f) = |\mathbb{E} \nabla f|^2$. The proof of Theorem 2.1 goes in two steps: first, we show that if $w_1(f)$ is non-negligible then there exists a half-space correlated with $f$. Then, we show that for a random $Y \sim \gamma_n$, For the first step, we will make use of the following simple identity:

Lemma 2.5. Let $\nu$ be a probability measure on $\mathbb{R}$ that has a finite mean and is symmetric,

Proof. Note that the convergence of the integral follows from the finiteness of the mean.
It follows that ψ is continuous and Lebesgue-a.e. differentiable, and ψ ( , and note that where | · | denotes the Euclidean norm. Choose v to maximize the right hand side above.
Since the distribution of $\langle v, X \rangle$ is symmetric, Lemma 2.5 implies that Next, we will show that the tails of the above integral decay rapidly, from which the claim will follow. Now, $\langle v, X \rangle$ has a standard Gaussian distribution, and so at least one of $\Pr(\langle v, X \rangle \ge t)$ and $\Pr(\langle v, X \rangle < t)$ is bounded by $\exp(-t^2/2)$. Hence,
For any $s \ge 1$, it follows that where the second inequality follows from bounding $\exp(-t^2/4)$ by $t\exp(-t^2/4)$ and then integrating by parts. Going back to (2.1), we have (for any $s \ge 1$) Plugging in our value for $s$ proves that which (after some rearrangement) implies the claim.
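For concreteness, the tail estimate used here can be carried out as follows (a routine computation with our own choice of constants, sketching the bound described above):

```latex
% For s >= 1, bound exp(-t^2/4) <= (t/s) exp(-t^2/4) on [s, infinity):
\int_s^\infty e^{-t^2/4}\,dt
 \;\le\; \frac{1}{s}\int_s^\infty t\,e^{-t^2/4}\,dt
 \;=\; \frac{2}{s}\,e^{-s^2/4}
 \;\le\; 2\,e^{-s^2/4}.
```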

Remark 2.7.
We remark that the proof of Proposition 2.6 does not use the Gaussian setting in a particularly strong way. In particular, the proof is valid whenever $X$ has a symmetric sub-Gaussian distribution, where "sub-Gaussian" means that $\Pr(\langle v, X \rangle \ge t) \le C e^{-ct^2}$ for any unit vector $v$ and any $t \ge 0$.
The second step in the proof of Theorem 2.1 is to show that if a function $f$ is noise stable then it has some shifts $f_{t,y}$ with non-negligible $w_1(f_{t,y})$. In order to do this, recall the Gaussian Poincaré inequality (see, e.g., [1]), which states that $\mathrm{Var}(f) \le \mathbb{E}|\nabla f|^2$ for any $f$ with continuous derivatives.
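The Poincaré inequality is easy to test numerically; the following sketch (ours, with the arbitrary test function $f(x) = \sin x$ in one dimension) compares both sides by Monte Carlo:

```python
import math
import random

random.seed(1)

# Monte Carlo check of the Gaussian Poincare inequality Var(f) <= E|f'|^2
# for the test function f(x) = sin(x), f'(x) = cos(x), with X ~ N(0, 1).
N = 200_000
xs = [random.gauss(0, 1) for _ in range(N)]
fx = [math.sin(x) for x in xs]
mean = sum(fx) / N
var_f = sum((v - mean) ** 2 for v in fx) / N
grad_sq = sum(math.cos(x) ** 2 for x in xs) / N

# Exact values for comparison: Var(sin X) = (1 - e^-2)/2 ~ 0.432 and
# E[cos^2 X] = (1 + e^-2)/2 ~ 0.568, so the inequality holds with room.
print(var_f, grad_sq)
```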
Proof. Since smooth functions are dense in $L^2(\gamma_n)$, and since both $w_1(f)$ and $\mathrm{Var}(P_t f)$ are preserved under $L^2(\gamma_n)$ convergence, we may assume that $f$ is smooth. Then $\nabla f_{t,y} = \sqrt{1 - e^{-2t}}\, (\nabla f)_{t,y}$. Hence, Now set $Y$ to be a standard Gaussian vector in $\mathbb{R}^n$, independent of $X$. Then where the last line follows because $P_t \nabla f = e^t \nabla P_t f$. Finally, the Poincaré inequality applied to $P_t f$ yields

Proof of Theorem 2.1. By Proposition 2.8, there exists some $y \in \mathbb{R}^n$ such that $w_1(f_{t,y}) \ge \mathrm{Var}(P_t f)$. Applying Proposition 2.6 to $f_{t,y}$ completes the proof.
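The chain of identities in the proof of Proposition 2.8 can be sketched as follows (our reconstruction, consistent with the relations quoted above; the paper's precise constants may differ):

```latex
% Since E_X f_{t,y}(X) = (P_t f)(y), differentiating in y gives
\mathbb{E}_X (\nabla f)_{t,y}(X) = (P_t \nabla f)(y) = e^{t}\,\nabla (P_t f)(y),
% so, using \nabla f_{t,y} = \sqrt{1-e^{-2t}}\,(\nabla f)_{t,y} and
% averaging w_1 over an independent Y ~ \gamma_n:
\mathbb{E}_Y\, w_1(f_{t,Y})
 = (1-e^{-2t})\,e^{2t}\,\mathbb{E}_Y\,\bigl|\nabla P_t f(Y)\bigr|^2
 = (e^{2t}-1)\,\mathbb{E}\,|\nabla P_t f|^2
 \ge (e^{2t}-1)\,\mathrm{Var}(P_t f),
% where the last step is the Poincare inequality applied to P_t f.
```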

The converse of Theorem 2.1
The following result is a (qualitative) converse of Theorem 2.1.
Next, we show that any set which is correlated with a half-space must be noise stable (indeed, almost as noise stable as the half-space itself).

Proposition 2.11. Suppose that $A \subset \mathbb{R}^n$ is a half-space. Then for any $f : \mathbb{R}^n \to [0, 1]$ and any $t > 0$, for a universal constant $C$.
Proof. Let $g = 1_A - \gamma_n(A)$ and $h = f - \mathbb{E}f$, so that $g$ and $h$ both have mean zero and Since $P_{2t} g - g = P_{2t} 1_A - 1_A$, Lemma 2.10 implies that
Recalling that $c = \mathrm{Cov}(A, f) / \mathrm{Var}(1_A)$, this proves the first claimed inequality.
For the second inequality, note that Lemma 2.10 implies that Combining this with the first claimed inequality, In order to relate the noise stability of $f$ to half-spaces correlated with $f_{t,y}$, note that To prove Theorem 2.9, note that if we fix $r$ and $s$ and solve for $t$, we obtain $e^{-2t} = 1 - \frac{1 - e^{-2r}}{1 - e^{-2s}}$. For small $t$, this gives $t = \Theta\bigl(\frac{1 - e^{-2r}}{1 - e^{-2s}}\bigr)$ (while for large $t$ the theorem is vacuous anyway).

Boolean functions
For this section, $P_t$ denotes the Bonami–Beckner semigroup defined in Section 1.1. Recall also the definition of $f_z$ for $z \in \{-1, 0, 1\}^n$ from that section. Let $\mu_s$ be the probability distribution $e^{-s} \delta_0 + \frac{1 - e^{-s}}{2}(\delta_{-1} + \delta_1)$ on $\{-1, 0, 1\}$, and let $Z_s \in \{-1, 0, 1\}^n$ have i.i.d. coordinates distributed according to $\mu_s$. Then we have the following relationship between $P_s$ and $Z_s$: $(P_s f)(x) = \mathbb{E} f_{Z_s}(x)$.
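This identity is easy to verify by exact enumeration on a small example. The following sketch (ours) assumes the standard Bonami–Beckner convention $\rho = e^{-s}$ and that $z_i = 0$ marks a coordinate left free by the restriction; it checks the identity for majority on three bits:

```python
import math
from itertools import combinations, product

n = 3
pts = list(product([-1, 1], repeat=n))
f = lambda x: 1 if sum(x) > 0 else -1   # majority on three bits

def hat(S):
    """Fourier coefficient hat f(S) = E[f(x) * prod_{i in S} x_i]."""
    return sum(f(x) * math.prod(x[i] for i in S) for x in pts) / len(pts)

def noise_op(x, rho):
    """(P_s f)(x) via the Fourier formula sum_S rho^{|S|} hat f(S) chi_S(x)."""
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            total += rho ** r * hat(S) * math.prod(x[i] for i in S)
    return total

def restriction_average(x, s):
    """E f_{Z_s}(x): coordinate i is left free (z_i = 0) with probability
    e^{-s}, and fixed to a uniformly random sign z_i otherwise."""
    p0 = math.exp(-s)
    total = 0.0
    for z in product([-1, 0, 1], repeat=n):
        w = math.prod(p0 if zi == 0 else (1 - p0) / 2 for zi in z)
        y = tuple(x[i] if z[i] == 0 else z[i] for i in range(n))
        total += w * f(y)
    return total

s = 0.7
x = (1, -1, 1)
print(noise_op(x, math.exp(-s)), restriction_average(x, s))  # equal
```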
Then $f$ is noise stable, but if $z_1 = -1$ then $f_z$ is noise sensitive and uncorrelated with any half-space. In other words, $f_{Z_t}$ has probability $\frac{1}{2} e^{-t}$ of failing to be correlated with any half-space.

Proof of Theorem 3.1
Recall that $f : \{-1,1\}^n \to \mathbb{R}$ has the Fourier–Walsh expansion $f = \sum_{S \subseteq [n]} \hat f(S) \chi_S$, where $\chi_S(x) = \prod_{i \in S} x_i$. Also, we abbreviate $\hat f(\{i\})$ by $\hat f(i)$, and we define $w_1(f) = \sum_{i=1}^n \hat f(i)^2$. Since $\hat f(i) = \mathbb{E}[X_i f(X)]$, we may also write $w_1(f) = |\mathbb{E}[X f(X)]|^2$, as in the Gaussian case.
We will prove Theorem 3.1 in two steps. First, we will show that if $w_1(f)$ is non-negligible then there is a half-space correlated with $f$. Then we will show that $\mathbb{E} w_1(f_{Z_t})$ is non-negligible. Actually, the first step is already done, thanks to Remark 2.7. Indeed, Hoeffding's inequality implies that the uniform measure on $\{-1,1\}^n$ is sub-Gaussian in the sense of Remark 2.7, and so the proof of Proposition 2.6 applies with no changes to the boolean setting:
Proof. Fix $s$ and set $Z = Z_s$. Recalling the definition of $w_1$, we have Summing over $i$ proves the first claim; the second follows from the fact that Finally, $e^{2t} - e^t = e^t(e^t - 1) \ge \frac{1}{2}(e^t + 1)(e^t - 1) = \frac{1}{2}(e^{2t} - 1)$.

An example
Let $n = m^2$, and let $J_i = \{(i-1)m, \dots, im-1\}$. Let $B_n \subset \{-1,1\}^n$ be the set $\bigl\{x : \sum_{i=1}^{m} \bigl(\sum_{j \in J_i} x_j\bigr)^2 \le m^2\bigr\}$. From the central limit theorem, one sees immediately that $B_n$ is noise stable, with the same estimate as its Gaussian analogue in Section 2.1.
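A quick simulation (ours; it takes the threshold in the definition of $B_n$ to be $m^2$, the mean of $\sum_k (\sum_{j \in J_k} x_j)^2$) suggests that $B_n$ is indeed roughly balanced, in line with the Berry–Esseen estimate $\Pr(X \in B_n) = \frac{1}{2} + O(m^{-1/2})$ used below:

```python
import random

random.seed(2)
m = 20
n = m * m

def in_Bn(x):
    """Membership in the candidate set: sum over blocks of (block sum)^2 <= m^2."""
    total = 0
    for k in range(m):
        s = sum(x[k * m:(k + 1) * m])
        total += s * s
    return total <= m * m

N = 10_000
count = sum(in_Bn([random.choice([-1, 1]) for _ in range(n)]) for _ in range(N))
print(count / N)   # roughly 1/2 (up to the O(m^{-1/2}) correction)
```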

Proposition 3.4.
For any $n$ and any $t > 0$, In particular $B_n$ is noise stable.
Finally, we show that B n is not correlated with any half-space. This essentially follows from the invariance principle, which says that nice boolean functions have almost the same distribution when their arguments are replaced by Gaussian variables.
For the rest of this section, fix $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, and suppose that $A = \{x \in \{-1,1\}^n : \langle a, x \rangle \le b\}$. Let $J^* \subseteq \{1, \dots, n\}$ be the set containing the indices of the $m^{1/3}$ largest $|a_i|$. Define $a^+$ by $a^+_i = 1_{\{i \in J^*\}} a_i$ and set $a^- = a - a^+$.
We split our proof of Proposition 3.5 into two parts, depending on the decay properties of $a$. If $a^-$ is unbalanced, it follows that $a^+$ must contain only large coordinates. We apply the Littlewood–Offord theorem to argue that $a^-$ is essentially irrelevant and $A$ depends only on a few coordinates. Since $B_n$ doesn't depend on any small set of coordinates, this implies that $A$ and $B_n$ are uncorrelated. If $a^-$ is fairly balanced then we condition on $\{X_i : i \in J^*\}$ and apply an invariance principle to $\{X_i : i \notin J^*\}$, replacing boolean variables with Gaussian variables and applying Proposition 2.3. First, we recall the Littlewood–Offord inequality:

Proof. By Theorem 3.6 and since $|a_i| \ge \|a^-\|_\infty$ for all $i \in J^*$, On the other hand, Chebyshev's inequality implies that Putting these two inequalities together, we see that with probability at least $1 - Cm^{-1/12}$ over $\{X_j : j \in J^*\}$ we have On the other hand, conditioning on $\{X_j : j \in J^*\}$ has little effect on the event $B_n$: each random variable $Z_i := \bigl(\sum_{j \in J_i} X_j\bigr)^2$ has conditional expectation $m \pm O(|J_i \cap J^*|^2)$ and Recalling (from the Berry–Esseen theorem) that $\Pr(X \in B_n) = \frac{1}{2} + O(m^{-1/2})$, our goal is We will achieve this by conditioning on $\Omega_z$: for an arbitrary $z$, we claim that Going back to the definitions of $p_z$ and $q_z$, this is equivalent to We divide the proof of (3.3) into several steps: for any $\epsilon > 0$, If the range of $f$ is $\{-1,1\}$ then $\mathrm{Inf}_i(f)$ is just the probability that negating $X_i$ will change the value of $f(X)$. For (3.4), note that the Berry–Esseen theorem applied to $S_k := \sum_{j \in J_k} X_j$ implies that with probability at least $1 - Cm^{-1/6}$, $p(X)$ falls outside the interval $[1 - 6m^{-2/3}, 1 + 6m^{-2/3}]$. Hence, in order to change the value of $h_1(p(X))$, one would need to change the value of $\sum_k S_k^2$ by at least $6m^{4/3}$. On the other hand, Hoeffding's inequality implies that with probability at least $1 - Cm^{-1/6}$, $\max_k |S_k| \le 2m^{2/3}$.
On this event, in order to change the value of $\sum_k S_k^2$ by $6m^{4/3}$, one would need to change at least $2m^{1/3}$ of the $X_j$. Since $p_z(X)$ is obtained from $p(X)$ by changing at most $m^{1/3}$ of the $X_j$, we see that $h_1(p(X)) = h_1(p_z(X))$ unless one of the two events above fails. This proves (3.4).
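The Littlewood–Offord inequality recalled above, in Erdős's form (for $|a_i| \ge 1$, every atom of $\langle a, X \rangle$ has probability at most $\binom{n}{\lfloor n/2 \rfloor} 2^{-n}$), can be illustrated by brute force (our own toy example):

```python
from itertools import product
from math import comb

def max_atom(a):
    """Largest point mass of sum(a_i * x_i) over uniform x in {-1,1}^n."""
    counts = {}
    for x in product([-1, 1], repeat=len(a)):
        s = sum(ai * xi for ai, xi in zip(a, x))
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values()) / 2 ** len(a)

n = 16
bound = comb(n, n // 2) / 2 ** n   # Erdos's Littlewood-Offord bound, |a_i| >= 1

print(max_atom([1] * n), bound)            # equal: the all-ones vector is extremal
print(max_atom(list(range(10, 26))))       # distinct weights: a much smaller atom
```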
Recognizing that $h_1(p(Y)) = 1_{B_n}(Y)$ and that $h_b(q_z(Y))$ is the indicator function of some half-space, the following lemma proves (3.10).
Note that $A$ and $B$ are the push-forwards of $\tilde A_n$ and $\tilde B$ under a map that preserves the standard Gaussian measure: if $\Pi_m : \mathbb{R}^n \to \mathbb{R}^m$ is defined by $\Pi_m x = (x_{J_1}, \dots, x_{J_m})$ and $\Pi$ is defined by Finally, (3.7) follows from the following multivariate invariance principle that was proved by the first author in [8]:

Theorem 3.9. Suppose $p(x)$ and $q(x)$ are polynomials of degree at most $d$ such that $\mathrm{Inf}_i(p) \le \tau$ and $\mathrm{Inf}_i(q) \le \tau$ for all $i$. For any $\Psi : \mathbb{R}^2 \to \mathbb{R}$ with third partial derivatives uniformly bounded by $B$, where $Y \sim \gamma_n$, $X$ is uniform on $\{-1,1\}^n$, and $C$ is a universal constant.

The converse of Theorem 3.1
Here, we state and prove the boolean analogue of Theorem 2.9 (or, the qualitative converse of Theorem 3.1). That is, we show that if $M(f_{Z_s})$ is non-negligible with constant probability then $f$ is noise stable, where $Z_s \sim \mu_s$ and $C$ is a universal constant.
The proof of Theorem 3.10 is very much like the proof of Theorem 2.9, so we give only a sketch. As in the proof of Theorem 2.9, the first step is a bound on the noise stability of half-spaces. However, the bound that we used to prove Lemma 2.10 is not known for boolean functions (it would be equivalent to a weak version of the "majority is least stable" conjecture). Instead, we use a weaker (by a constant factor) bound due to Peres [10]:

Theorem 3.11. For any half-space $A$ and any $t > 0$, $\mathbb{E}[(1_A - P_t 1_A)^2] \le C\sqrt{t}$, where $C$ is a universal constant.
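Peres's bound can be illustrated numerically: since $\mathbb{E}[(1_A - P_t 1_A)^2] \le \Pr(1_A(X) \ne 1_A(Y))$ by Jensen's inequality, it suffices to see that the disagreement probability for a half-space is of order $\sqrt{t}$. The following Monte Carlo sketch (ours, with majority on $101$ bits and an arbitrary $t$) does this:

```python
import math
import random

random.seed(3)

def maj(x):
    return 1 if sum(x) > 0 else -1

# Estimate Pr(f(X) != f(Y)) for majority under noise: each bit of Y
# independently equals the corresponding bit of X with probability e^{-t}
# and is rerandomized otherwise (so that E[X_i Y_i] = e^{-t}).
t = 0.1
keep = math.exp(-t)
n, N = 101, 20_000
disagree = 0
for _ in range(N):
    x = [random.choice([-1, 1]) for _ in range(n)]
    y = [xi if random.random() < keep else random.choice([-1, 1]) for xi in x]
    if maj(x) != maj(y):
        disagree += 1
est = disagree / N
print(est)   # roughly arccos(e^{-t})/pi ~ 0.14, well below sqrt(t) ~ 0.316
```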
Next, we show that any set which is correlated with a half-space must be noise stable (indeed, almost as noise stable as the half-space itself). The proof of Proposition 3.12 is essentially identical to the proof of Proposition 2.11, so we omit it. The only difference is that we use Theorem 3.11 instead of Lemma 2.10.
Finally, the argument to go from Proposition 3.12 to Theorem 3.10 is also essentially identical to the Gaussian case: the only property of Gaussians that we used in that argument was the Poincaré inequality, which takes the same form in the boolean case.

Acknowledgement
We thank Dana Moshkovitz, Gil Kalai and Irit Dinur for encouragement to complete this work. E.M. acknowledges the support of NSF grant CCF 1320105, DOD ONR grant N00014-14-1-0823, and grant 328025 from the Simons Foundation.
We also thank the anonymous reviewers for many comments that helped us to improve the presentation, most notably a substantial improvement to the proofs of Theorems 2.1 and 3.1.