Sparse juntas on the biased hypercube

We give a structure theorem for Boolean functions on the $p$-biased hypercube which are $\epsilon$-close to degree $d$ in $L_2$, showing that they are close to sparse juntas. Our structure theorem implies that such functions are $O(\epsilon^{C_d} + p)$-close to constant functions. We pinpoint the exact value of the constant $C_d$. We also give an analogous result for monotone Boolean functions on the biased hypercube which are $\epsilon$-close to degree $d$ in $L_2$, showing that they are close to sparse DNFs. Our structure theorems are optimal in the following sense: for every $d,\epsilon,p$, we identify a class $\mathcal{F}_{d,\epsilon,p}$ of degree $d$ sparse juntas which are $O(\epsilon)$-close to Boolean (in the monotone case, width $d$ sparse DNFs) such that a Boolean function on the $p$-biased hypercube is $O(\epsilon)$-close to degree $d$ in $L_2$ iff it is $O(\epsilon)$-close to a function in $\mathcal{F}_{d,\epsilon,p}$.


Introduction
Let $f \colon \{0,1\}^n \to \{0,1\}$ be a Boolean function on the hypercube, where $\{0,1\}^n$ is endowed with the uniform measure. If $f$ has degree 1 then it is a dictator, and if it has degree $d$ then it is a junta [13]. Friedgut, Kalai, and Naor (FKN) [7] showed that if $f$ is $\epsilon$-close to degree 1 (that is, $\|f^{>1}\|^2 \le \epsilon$) then $f$ is $O(\epsilon)$-close to a dictator (that is, $\Pr[f \ne g] = O(\epsilon)$ for some dictator $g$). Concretely, the monomial coefficients appearing in the functions in $\mathcal{G}_{d,\epsilon,p}$ depend only on $d$, a property which fails for the Fourier expansion.
This implies that the monomial coefficients of $g$ are bounded integers (implying that $g$ is integer-valued, as a function on $\{0,1\}^n$). As an example, when $d = 1$, the constant coefficient is either 0 or 1; in the former case, all other coefficients are 0 or 1; and in the latter case, all other coefficients are 0 or $-1$.
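To see the $d = 1$ pattern concretely, the following Python sketch (our illustration, not from the paper) enumerates all Boolean functions on three coordinates, extracts their monomial coefficients by Möbius inversion, and checks the stated sign pattern for those of degree at most 1.

```python
from itertools import product

def monomial_coeffs(f, n):
    """Möbius inversion: the coefficient of x_S is
    c(S) = sum over T ⊆ S of (-1)^{|S|-|T|} f(1_T),
    where sets are encoded as bitmasks and f[T] is the value at input 1_T."""
    coeffs = {}
    for S in range(1 << n):
        c, T = 0, S
        while True:
            c += (-1) ** (bin(S).count("1") - bin(T).count("1")) * f[T]
            if T == 0:
                break
            T = (T - 1) & S  # next subset of S
        coeffs[S] = c
    return coeffs

low_degree_count = 0
for values in product([0, 1], repeat=8):  # all Boolean functions on {0,1}^3
    c = monomial_coeffs(values, 3)
    if any(v != 0 for S, v in c.items() if bin(S).count("1") > 1):
        continue  # degree exceeds 1
    low_degree_count += 1
    others = [c[1 << i] for i in range(3)]
    if c[0] == 0:      # constant coefficient 0: other coefficients are 0 or 1
        assert all(v in (0, 1) for v in others)
    else:              # constant coefficient 1: other coefficients are 0 or -1
        assert c[0] == 1 and all(v in (0, -1) for v in others)
```

The only functions passing the degree filter are the two constants, the three dictators, and the three anti-dictators.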
Choosing $T = \emptyset$, the property states that the expected number of monomials in $\operatorname{supp}(g)$ which evaluate to 1 under a random input is $O(1)$. Considering other $T$, the same holds even under the condition $x_T = 1$. For these reasons, we call a polynomial satisfying this property a sparse junta.
This item fails for the Fourier expansion. For example, the Fourier support of the function $\sum_{i=1}^{1/p^2} x_{2i-1} x_{2i}$ contains $2/p^2$ degree-1 monomials rather than just $O(1/p)$. Item (iii) concerns minimal non-Boolean inputs. An input $x \in \{0,1\}^n$ is a minimal non-Boolean input of $g$ if $g(x) \notin \{0,1\}$ but $g(y) \in \{0,1\}$ for all inputs $y$ strictly below $x$, that is, obtained from $x$ by switching one or more 1s to 0s. Equivalently, if $x = 1_S$ then the restriction $g|_S$ obtained by zeroing out all coordinates outside of $S$ is Boolean except at the input $1_S$. Due to property Item (i), every minimal non-Boolean input occurring in $g$ has (Hamming) weight at least $d + 1$.
We can think of a minimal non-Boolean input as a forbidden configuration, which we can describe by specifying the function $g|_S$. When $d = 1$, there are two minimal non-Boolean inputs, corresponding to the functions $x_1 + x_2$ and $1 - x_1 - x_2$.
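These configurations can be found mechanically. The following Python sketch (ours, not part of the paper) enumerates the minimal non-Boolean inputs of a polynomial given by its monomial coefficients; for $g = x_1 + x_2 + x_3$ it finds exactly the three weight-2 inputs, each a copy of the $x_1 + x_2$ configuration.

```python
from itertools import product

def evaluate(coeffs, x):
    """Evaluate a multilinear polynomial given as {frozenset: coefficient}."""
    return sum(c for S, c in coeffs.items() if all(x[i] for i in S))

def minimal_non_boolean(coeffs, n):
    """Inputs x with g(x) not in {0,1} such that every y obtained from x by
    switching one or more 1s to 0s satisfies g(y) in {0,1}."""
    result = []
    for x in product([0, 1], repeat=n):
        if evaluate(coeffs, x) in (0, 1):
            continue
        below = product(*[(0, 1) if v else (0,) for v in x])  # all y <= x
        if all(evaluate(coeffs, y) in (0, 1) for y in below if y != x):
            result.append(x)
    return result

g = {frozenset([0]): 1, frozenset([1]): 1, frozenset([2]): 1}  # x1 + x2 + x3
mins = minimal_non_boolean(g, 3)
```

The weight-3 input $1_{\{1,2,3\}}$ is non-Boolean but not minimal, since weight-2 inputs below it are already non-Boolean.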
If $\hat{g}(\emptyset) = 0$ then only the first one, $x_1 + x_2$, is possible. Defining $m = |\operatorname{supp}(g)|$, the number of minimal non-Boolean inputs is at most $m^2$, and so Item (iii) states that $g$ has $O(\epsilon/p^e)$ minimal non-Boolean inputs of weight $e$, for all $e$.

Monotone version
When the function $f$ is monotone, we can guarantee that the approximating function $g$ is monotone as well, and in fact a monotone width $d$ DNF, which is a disjunction of minterms $x_{i_1} \wedge \cdots \wedge x_{i_w}$ with $w \le d$.
Theorem 1.2. Suppose that $f \colon (\{0,1\}^n, \mu_p) \to \{0,1\}$ is a monotone function which is $\epsilon$-close to degree $d$, where $p \le 1/2$. Then $f$ is $O(\epsilon)$-close to a monotone width $d$ DNF $g$, where $g$ satisfies the following properties, for some constant $C$ depending only on $d$: (i) For every $T$ and $e$, the DNF $g$ contains at most $C/p^e$ minterms $S \supseteq T$ of size $|T| + e$.
(ii) For all $e$, the number of sets $S$ of size $e$ such that $\deg(g|_S) > d$ but $\deg(g|_R) \le d$ for all $R \subsetneq S$ is at most $C\epsilon/p^e$.
Conversely, if $g$ is a monotone width $d$ DNF satisfying these properties then it is $O(\epsilon)$-close to degree $d$.
We list all forbidden configurations for $d = 2$ in Section 2.2, where we also show that there are finitely many of them for every $d$.

Junta approximation
The biased FKN theorem of Filmus [5] implies that if $f \colon (\{0,1\}^n, \mu_p) \to \{0,1\}$ is $\epsilon$-close to degree 1 then it is $O(\sqrt{\epsilon})$-close to a dictator and $O(\sqrt{\epsilon} + p)$-close to a constant. Theorem 1.1 implies a similar result for arbitrary $d$, where $\sqrt{\epsilon}$ is replaced by an appropriate power of $\epsilon$.
Using a Ramsey-theoretic argument, we show that this kind of construction is the best possible for every $d$.
As an example, when $d = 2$ we get the exponent $4$, corresponding to a specific polynomial. It is an intriguing open question to find the optimal value of this exponent for general $d$. Another interesting exponent is the best $\alpha$ such that $f$ is $O(\epsilon^{1/\alpha})$-close to some Boolean junta or to some degree $d$ junta (both exponents are the same).

Proof sketch
We prove Theorem 1.1 by reduction to the Kindler-Safra theorem, in the following form.
Applying the Kindler–Safra theorem to every restriction $f|_V$, we obtain a Boolean degree $d$ function $g_V$. We would like to construct the function $g$ by pasting together the various functions $g_V$, and for that we need to know that the $g_V$ agree with each other, on average. This sort of pasting is achieved using agreement theorems, and in this case we use the following theorem, with $j$ being the maximal size of the monomial support of a Boolean degree $d$ function, $q = 2p$, and $\alpha = \sqrt{1/2}$.

Theorem 1.6 (Junta agreement theorem). Fix the following parameters: integer $j$ (junta size), $q \in (0,1)$ (bias), and $\alpha \in (0,1)$ (fractional intersection size).
For each $V \subseteq [n]$, let $g_V$ be a degree $d$ polynomial whose monomial support contains at most $j$ monomials.
Suppose that the following agreement condition holds, where the parameters are chosen so that the marginal distribution of $V_1 := V \cup W_1$ and $V_2 := V \cup W_2$ is $\mu_q$. Then there exists a degree $d$ polynomial $g$ agreeing with the $g_V$ on average, where the error bound depends on $j$ and $q$ but is independent of $n$. Moreover, for each $S$, the monomial coefficient $\hat{g}(S)$ is chosen by majority decoding: it is a value $c$ maximizing the probability that $\hat{g}_V(S) = c$.

The junta agreement theorem follows from the more general agreement theorem in our earlier work [2], which does not require a bound on the size of the monomial support. However, since the proof of the junta agreement theorem is much easier than the proof of the general agreement theorem, we include it in this paper.
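The majority-decoding rule can be pictured with a toy sketch (ours; the names and the corruption scenario are invented for illustration): each local function votes for a coefficient value, and the global coefficient is the most common vote.

```python
from collections import Counter

def majority_decode(local_coeffs, S):
    """Pick the coefficient value of x_S most common among the local
    functions (ties broken arbitrarily by Counter ordering)."""
    votes = [c.get(S, 0) for c in local_coeffs]
    return Counter(votes).most_common(1)[0][0]

S = frozenset([0, 1])
# three local views agree that the coefficient of x_{0,1} is 1; one is corrupted
local_views = [{S: 1}, {S: 1}, {S: 1}, {S: -2}]
decoded = majority_decode(local_views, S)
```

A single corrupted local view cannot change the decoded value, which is the point of the decoding rule.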
In order to apply Theorem 1.6, we need to show that the condition in the theorem holds.
A short argument now shows that $\Pr[f \ne g] = O(\epsilon)$.
Since $g$ is obtained by majority decoding and each $g_V$ is a junta, we can show that $g$ is a sparse junta, proving Item (ii). This allows us to show that $\mathbb{E}_{\mu_p}[(f - g)^2] = O(\epsilon)$, a step which we highlight below.
Next, let us show why Item (iii) holds. Let $1_S$ be a minimal non-Boolean input of $g$. Since $g$ is a sparse junta, the probability that $x_S = 1$ and $x_T = 0$ for all other minimal non-Boolean inputs $1_T$ is $\Omega(p^{|S|})$. These events are disjoint, and so $\sum p^{|S|} = O(\epsilon)$, where the sum goes over all minimal non-Boolean inputs $1_S$ of $g$. This implies Item (iii).
Finally, let us address Item (i). The original function $g$ doesn't necessarily satisfy it.
However, an argument along the lines of the preceding paragraph shows that the number of offending inputs is small, allowing us to slightly perturb $g$ so that it satisfies Item (i).
From $L_0$ to $L_2$ using the reverse union bound

In the proof of Theorem 1.1, we need to convert an $L_0$ closeness guarantee into an $L_2$ guarantee for a quantized sparse junta $h$.
Since the monomial coefficients of $h^2$ are quantized, it suffices to show that the monomial support of $h^2$ is sparse, in the sense that it contains $O(\epsilon/p^e)$ sets of size $e$ for all $e \le 2d$. Since $h$ is a sparse junta, in order to show that the monomial support of $h^2$ is sparse, it suffices to show that the monomial support of $h$ itself is sparse.
The union bound shows that $\Pr[h \ne 0] \le \sum_{S \in \operatorname{supp}(h)} p^{|S|}$. If the inequality were in the other direction then it would follow that the monomial support of $h$ is sparse, since $\Pr[h \ne 0] = O(\epsilon)$. In general, we cannot reverse the inequality (consider for example $h = \sum_{i=1}^{n} x_i$, where the left-hand side tends to 1 while the right-hand side tends to infinity for constant $p$). However, when $h$ is a sparse junta, the reverse union bound states that we can indeed reverse the direction of the inequality, up to a constant factor. The proof of the reverse union bound is not difficult. It uses the Harris–FKG inequality.

Differences from conference version
This paper originates in an extended abstract [3], which covers both the material eventually published in [2] and part of the material in the present work. The full version of [3] is available on arXiv (1711.09428) and ECCC (TR17-180).
The full version of [3] contains weaker versions of Theorems 1.1 and 1.2, which describe only some properties of the approximating function $g$, and in particular do not constitute a full characterization of all Boolean functions which are close to degree $d$. The current versions of these theorems are inspired by [4]. In addition, Theorem 1.4 is completely new. It is inspired by [6].
In contrast, the main theorems in the full version of [3] apply in the more general multivalued setting, which we describe briefly in Section 7, as well as in the setting of the slice. We omit these generalizations in this version to make it more concise, and since these generalizations require only a few new ideas, the most interesting of which is that whereas the monomial expansion of functions on the slice is not unique, the sparse monomial expansion of a junta is unique.

Paper organization
We start with various preliminaries in Section 2. We formally define sparse juntas and describe some of their properties in Section 3. We prove Theorem 1.6, the junta agreement theorem, in Section 4. We prove the main structure theorem, Theorem 1.1, in Section 5, and its monotone version, Theorem 1.2, in Section 6. We prove the junta approximation theorems, Theorems 1.3 and 1.4, in Section 7.
We use $1_S$ for the characteristic vector of the set $S$ or for the indicator of the event $S$. We also use the notation $1[S]$ for the latter to improve legibility.
The weight of $x \in \{0,1\}^n$ is the number of coordinates equal to 1.
A function on $\{0,1\}^n$ is a $j$-junta if it depends on at most $j$ coordinates.

Monotone functions
Equivalently, if we identify an element of $\{0,1\}^n$ with the corresponding subset, then a subset $S \sim \mu_p$ is sampled by including each element independently with probability $p$.

Function restriction
We denote by $f|_S$ the restriction of a function $f$ on $\{0,1\}^n$ to the coordinates in a subset $S$, obtained by substituting zero for all coordinates outside of $S$.

Function substitution

Given a function $f \colon \{0,1\}^n \to \mathbb{R}$ and a subset $S \subseteq [n]$, the substitution $f|_{x_S \leftarrow 1}$ is obtained by substituting one for all coordinates in $S$. If $S = \{i\}$ then we use the notation $f|_{x_i \leftarrow 1}$.
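In terms of the monomial expansion, restriction to $S$ keeps exactly the monomials contained in $S$, while the substitution $x_i \leftarrow 1$ collapses each monomial $T$ to $T \setminus \{i\}$. A small Python sketch (ours) of both operations on coefficient dictionaries:

```python
def restrict(coeffs, S):
    """f|_S: substitute 0 outside S, i.e. keep only monomials T contained in S."""
    return {T: c for T, c in coeffs.items() if T <= S}

def substitute_one(coeffs, i):
    """f|_{x_i <- 1}: each monomial T collapses to T \\ {i}; like terms merge."""
    out = {}
    for T, c in coeffs.items():
        key = T - {i}
        out[key] = out.get(key, 0) + c
    return {T: c for T, c in out.items() if c != 0}

# f = x1 + x1*x2 (coordinates 0-indexed)
f = {frozenset([0]): 1, frozenset([0, 1]): 1}
r = restrict(f, frozenset([0]))   # keeps only the monomial x1
s = substitute_one(f, 1)          # x1 + x1 = 2*x1
```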

Unbiased structure theorems

Granularity
The monomial coefficients of Boolean functions are bounded integers.

Lemma 2.1.
If $f$ is Boolean and $S \ne \emptyset$ then $\hat{f}(S)$ is an integer whose magnitude is at most $2^{|S|-1}$.
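The bound can be checked exhaustively for small $n$; the following Python sketch (ours) verifies it for all $2^8$ Boolean functions on three coordinates, and confirms that parity attains $\pm 4$ on the top coefficient.

```python
from itertools import product

def monomial_coeffs(f, n):
    """c(S) = sum over T ⊆ S of (-1)^{|S|-|T|} f(1_T), sets as bitmasks."""
    coeffs = {}
    for S in range(1 << n):
        c, T = 0, S
        while True:
            c += (-1) ** (bin(S).count("1") - bin(T).count("1")) * f[T]
            if T == 0:
                break
            T = (T - 1) & S
        coeffs[S] = c
    return coeffs

n = 3
max_seen = {S: 0 for S in range(1, 1 << n)}
for values in product([0, 1], repeat=1 << n):  # all Boolean functions
    for S, c in monomial_coeffs(values, n).items():
        if S != 0:
            assert abs(c) <= 2 ** (bin(S).count("1") - 1)  # the lemma's bound
            max_seen[S] = max(max_seen[S], abs(c))
top = max_seen[(1 << n) - 1]  # largest magnitude of the coefficient of x1*x2*x3
```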

Proof.
Let $g$ be the function obtained from $f$ by substituting zero in all coordinates outside $S$. Then $\hat{f}(S) = \hat{g}(S)$. Considering the coefficient of $x_S$, we see that $\hat{g}(S) = \sum_{T \subseteq S} (-1)^{|S| - |T|} g(1_T)$. The claim follows since $2^{|S|-1}$ of the summands are $+g(1_T)$ and $2^{|S|-1}$ are $-g(1_T)$. ■

Nisan–Szegedy

Nisan and Szegedy [13] proved the following theorem.
The number of coordinates in the junta was improved to $O(2^d)$ in [1, 15].

Kindler-Safra
In unpublished work, Kindler and Safra [12, 11] proved the following result, which we state in its formulation due to Keller and Klein [9].

Theorem 2.3 (Kindler–Safra).
There exists a universal constant $K > 0$ such that the following holds for all $d \in \mathbb{N}$.
Boolean degree $d$ function.
This formulation follows by combining Theorem 1.4 and Proposition 5.6 in [9]. (Keller and Klein also prove a stronger version in which the closeness is improved to the optimal expression, $\epsilon + \tilde{O}(\epsilon^2)$.) For our purposes, we need a version of the Kindler–Safra theorem which holds for all $\epsilon$.

Theorem 2.4.
There exists a universal constant $K > 0$ such that the following holds for all $\epsilon$. When $K\epsilon < 1$, the statement follows directly from Theorem 2.3. Otherwise, $K\epsilon \ge 1$, and so $f$ is trivially $K\epsilon$-close to the constant zero function.

Monotone Kindler-Safra
When the function $f$ is monotone, we can guarantee that the approximating function in Theorem 2.4 is monotone as well.
We start with a warm-up.
Suppose without loss of generality that $g$ depends on the first $j$ coordinates. We separate accordingly the inputs to $f$ and $g$ into two parts: a point $y \in \{0,1\}^j$, and a point $z \in \{0,1\}^{n-j}$.
Since there are $2^{n-j}$ choices for $z$, the bound follows by averaging. Using a similar argument, we can prove a monotone version of the Kindler–Safra theorem.
We are indebted to an anonymous reviewer for suggesting the following proof, which improves the bound on the constant from doubly exponential to singly exponential. If $g$ is monotone then we are done. Otherwise, there is an index $i$ and two inputs $x < y$ differing only in the $i$'th coordinate such that $g(x) > g(y)$. We will show that there are in fact many such pairs. To this end, we consider the derivative of $g$ with respect to $x_i$ and the indicator of a monotonicity violation. Here $x|_{x_i \leftarrow b}$ is obtained from $x$ by setting $x_i = b$.
The derivative of $g$ with respect to $x_i$ attains the values $\{-1, 0, 1\}$, where the value $-1$ at $x$ means that $(x|_{x_i \leftarrow 0}, x|_{x_i \leftarrow 1})$ is a pair of inputs violating monotonicity. The second function is the indicator of this event.
By construction, the indicator has degree $2d - 2$, and so its expectation is an integer multiple of $2^{-(2d-2)}$. (One way to see this is to observe that its monomial coefficients are integers, and that the expectation of a degree $k$ monomial is $2^{-k}$.) By construction, the expectation is positive, and so it is at least $2^{-(2d-2)}$. Since $f$ is monotone, this yields a lower bound on the probability of disagreement between $f$ and $g$; on the other hand, this probability is at most $\epsilon_d$, and a short calculation shows that $f$ is $2^{2d-1}\epsilon_d$-close to the zero function. ■

Curiously, the proof of the biased version of Theorem 2.6 only uses Lemma 2.5.

Forbidden configurations
Our work involves three different types of forbidden configurations. These are configurations of coefficients which cannot occur in Boolean functions, but do arise in our context, if only sparingly. For example, the biased FKN theorem [5] involves functions of the form $f = c_0 + \sum_i c_i x_i$.
If $f$ is Boolean then we cannot have $c_i = c_j = 1$, but if $f$ is $\epsilon$-close to Boolean then there could be up to $O(\epsilon/p^2)$ copies of this configuration.

Minimal non-Boolean inputs
Suppose that $g$ has degree 1, say $g = c_0 + \sum_i c_i x_i$. If $c_0 \notin \{0,1\}$ then $0$ is the only minimal non-Boolean input. Otherwise, suppose without loss of generality that $c_0 = 0$. If $c_i \notin \{0,1\}$ then $1_{\{i\}}$ is a minimal non-Boolean input. If $c_i = c_j = 1$ then $1_{\{i,j\}}$ is a minimal non-Boolean input. There are no other types of minimal non-Boolean inputs.
In addition, if $x$ is a minimal non-Boolean input then this is due to some non-zero monomial coefficients: if $1_{\{i\}}$ is a minimal non-Boolean input then $c_i \ne 0$, and if $1_{\{i,j\}}$ is a minimal non-Boolean input of $g$ then $c_i, c_j \ne 0$. In both cases, the minimal non-Boolean input $x = 1_S$ is such that $S$ is the union of sets in $\operatorname{supp}(g)$: $\{i\}$ in the former case, and $\{i\}, \{j\}$ in the latter case.
The following lemma shows that a similar picture holds for all $d$. We thank an anonymous reviewer for suggesting the following proof, which improves the bound on the constant from exponential to linear.
In general, the minimal non-Boolean functions come in pairs $g$, $1 - g$, and it suffices to list those which satisfy $g(0) = 0$.
Here are the minimal non-Boolean functions satisfying $g(0) = 0$ for $d = 2$:

As in the case of minimal non-Boolean inputs, we will be interested in minimal non-monotone inputs of degree $d$ functions whose weight is larger than $d$, which we can describe using minimal non-monotone functions. When $d = 1$, there are no minimal non-monotone functions. When $d = 2$, the unique minimal non-monotone function is

Minimal high-degree inputs
As in the case of minimal non-Boolean inputs and minimal non-monotone inputs, we can describe minimal high-degree inputs via minimal high-degree monotone DNFs. When $d = 1$, the unique minimal high-degree monotone DNF is $x_1 \vee x_2$, and when $d = 2$, the minimal high-degree monotone DNFs are

Sparsity
The approximating function $g$ in Theorem 1.1 has sparse monomial support, and its monotone counterpart in Theorem 1.2 has sparse minterm support. This crucial property drives much of the proof of both theorems.
Here is the notion of sparsity that comes up in both theorems.

Definition 3.1 (Sparse).
A set system over a set $U$ is a collection of subsets of $U$.
A set system $\mathcal{F}$ is $(d, c, \nu)$-sparse, where $d \ge 0$ is an integer, $c \ge 0$, and $\nu > 0$, if the following three properties hold: (i) All sets in $\mathcal{F}$ have size at most $d$.
(ii) For all $T \subseteq [n]$ and integers $e \ge 0$, the set system $\mathcal{F}$ contains at most $c^e$ sets $S \supseteq T$ of size $|T| + e$.
(iii) For all integers $e \ge 0$, the set system $\mathcal{F}$ contains at most $\nu c^e$ sets $S$ of size $e$.
A set system is $(d, c)$-sparse if it is $(d, c, 1)$-sparse; this corresponds to omitting the final property.
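Definition 3.1 can be checked mechanically for small systems. The following Python sketch (ours) is a brute-force checker; as a sanity check, the system of all singletons of a 5-element ground set is $(1, 5)$-sparse but not $(1, 4)$-sparse.

```python
from itertools import combinations

def is_sparse(F, d, c, nu=1):
    """Check (d, c, nu)-sparsity of a collection F of frozensets directly
    from the definition (brute force over subsets T of the ground set)."""
    if any(len(S) > d for S in F):                       # property (i)
        return False
    ground = set().union(*F) if F else set()
    subsets = [frozenset(T) for k in range(len(ground) + 1)
               for T in combinations(sorted(ground), k)]
    for T in subsets:                                    # property (ii)
        for e in range(d + 1):
            count = sum(1 for S in F if T <= S and len(S) == len(T) + e)
            if count > c ** e:
                return False
    for e in range(d + 1):                               # property (iii)
        if sum(1 for S in F if len(S) == e) > nu * c ** e:
            return False
    return True

singletons = [frozenset([i]) for i in range(5)]
ok = is_sparse(singletons, d=1, c=5)
bad = is_sparse(singletons, d=1, c=4)
```

The failing case is property (ii) with $T = \emptyset$ and $e = 1$: the system contains 5 singletons but $c^1 = 4$.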
We describe some elementary properties of sparse set systems in Section 3.1. Sparse set systems satisfy a crucial property we call the reverse union bound, which we explain in Section 3.2. Finally, we discuss quantized sparse juntas, which are sparse juntas in which the monomial coefficients belong to a fixed finite set. We show in Section 3.3 that if a quantized sparse junta vanishes outside an event of probability $O(\epsilon)$ then it is $O(\epsilon)$-close to zero in $L_2$.

Elementary properties
We discuss the effect of three natural operations on the sparsity of set systems: union, sum and substitution.
The union of two set systems is their set-theoretic union, and it corresponds to the sum of functions: $\operatorname{supp}(f + g) \subseteq \operatorname{supp}(f) \cup \operatorname{supp}(g)$, where the inclusion is tight generically.

Definition 3.3 (Sum of set systems).
If $\mathcal{F}, \mathcal{G}$ are set systems then their sum is $\mathcal{F} \sqcup \mathcal{G} = \{S \cup T : S \in \mathcal{F},\, T \in \mathcal{G}\}$. The $t$'th iterated sum of a set system $\mathcal{F}$ is the sum of $t$ copies of $\mathcal{F}$: $\mathcal{F}^{\sqcup t} = \{S_1 \cup \cdots \cup S_t : S_1, \ldots, S_t \in \mathcal{F}\}$. Since $S_1, \ldots, S_t$ are not necessarily distinct, $\mathcal{F}^{\sqcup t}$ includes $\mathcal{F}^{\sqcup \ell}$ for all non-zero $\ell < t$.
The sum of two set systems corresponds to the product of functions: $\operatorname{supp}(fg) \subseteq \operatorname{supp}(f) \sqcup \operatorname{supp}(g)$, where the inclusion is tight generically. Union, sum and contraction all preserve sparsity, after adapting the parameters.
Every set system satisfies the definition of $(d, c)$-sparse for $e = 0$, and so it suffices to prove it for $e \ge 1$.

Item (a).
Let $\mathcal{F}, \mathcal{G}$ be $(d, c)$-sparse set systems. Given $T$ and $e \ge 1$, there are at most $c^e$ sets in each of $\mathcal{F}, \mathcal{G}$ of size $|T| + e$ which contain $T$. Therefore there are at most $2c^e \le (2c)^e$ sets in $\mathcal{F} \cup \mathcal{G}$ of size $|T| + e$ which contain $T$.
Since there are at most $(2d + 1)^3$ many options for $e, e_1, e_2$, this will complete the proof.
There are

Item (c).
This item follows from Item (b) by induction on $t$.

Item (d).
Let $\mathcal{F}$ be a $(d, c)$-sparse set system, and let $R$ be a set. Given $T$ and $e$, we need to bound the number of sets $S \in \mathcal{F}$ such that $S \setminus R \supseteq T$ and $|S \setminus R| = |T| + e$. We can assume that $T$ is disjoint from $R$.
We will bound the number of such sets $S$ given $D = S \cap R$.
Given $e \ge 1$, there are at most $\nu c^e$ sets of size $e$ in each of $\mathcal{F}, \mathcal{G}$. Therefore there are at most $2\nu c^e \le (2\nu) c^e$ sets of size $e$ in $\mathcal{F} \cup \mathcal{G}$.
The case $e = 0$ requires special treatment. If $\nu < 1$ then $\mathcal{F}, \mathcal{G}$ do not contain $\emptyset$, and so $\mathcal{F} \cup \mathcal{G}$ also doesn't contain $\emptyset$. If $\nu \ge 1$ then clearly $\mathcal{F} \cup \mathcal{G}$ contains at most $2\nu$ sets of size 0.
Given $e$, we need to bound the number of sets $U = S \cup T$ of size $e$, where $S \in \mathcal{F}$ and $T \in \mathcal{G}$.
For this, we show that for each $e_1, e_2 \le e$, there are at most $\nu^2 c^e$ such sets $U$ with $|S| = e_1$ and $|T| = e_2$. Since there are at most $(e + 1)^2$ many options for $e_1, e_2$, this shows that the number of sets $U \in \mathcal{F} \sqcup \mathcal{G}$ of size $e \ge 1$ is at most $(e + 1)^2 \nu^2 c^e \le ((d + 1)^2 \nu^2 c)^e$, completing the proof except for the case $e = 0$.
In order to handle the case $e = 0$, we again consider whether $\nu < 1$ or not. If $\nu < 1$ then $\emptyset \notin \mathcal{F}$ and so $\emptyset \notin \mathcal{F} \sqcup \mathcal{G}$. If $\nu \ge 1$ then $\mathcal{F} \sqcup \mathcal{G}$ trivially contains at most $\nu^2$ sets of size 0.

Item (c).
This item follows from Item (b) by induction on $t$. ■

Reverse union bound
Let $\mathcal{F}$ be a set system. The union bound shows that $\Pr[\exists S \in \mathcal{F} \colon x_S = 1] \le \sum_{S \in \mathcal{F}} p^{|S|}$. In general, the union bound is not tight. For example, suppose that $\mathcal{F} = \{\{1\}, \ldots, \{n\}\}$. As $n \to \infty$, the left-hand side tends to 1 while the right-hand side tends to infinity. More generally, if $\mathcal{F} = \{[k] \cup \{k+1\}, \ldots, [k] \cup \{n\}\}$ then the left-hand side tends to $p^k$ while the right-hand side tends to infinity.
In this section, we show that the union bound is tight up to a constant factor provided that F is sparse.
For $S \in \mathcal{F}$, let $\mathcal{E}_S$ denote the event that $x_S = 1$ and $x_T = 0$ for all $T \in \mathcal{G}$ such that $T \nsubseteq S$.
Let $\mathcal{E}$ denote the union of the events $\mathcal{E}_S$ over all $S \in \mathcal{F}$. If $\mathcal{E}$ holds then $x_S = 1$ for some $S \in \mathcal{F}$. In the other direction, observe first that the events $\mathcal{E}_S$ are disjoint. Indeed, suppose that $S, T \in \mathcal{F}$ are distinct. If $S \subseteq T$ and $T \subseteq S$ then $S = T$, so assume without loss of generality that $T \nsubseteq S$. If $\mathcal{E}_T$ holds then $x_T = 1$, and so $\mathcal{E}_S$ cannot hold. We now bound each of the summands, where the key step follows from the Harris–FKG inequality, since the events $x_T = 0$ are anti-monotone.
Lemma 3.5 shows that $\mathcal{G}|_{S \leftarrow 1}$ is $(d, C/p)$-sparse, where $C$ depends only on $c, d$. When $\mathcal{F} = \mathcal{G}$, we recover the statement that the union bound is tight up to a constant factor for sparse set systems. Indeed, if $\mathcal{E}$ happens then in particular $x \supseteq S$ for some $S \in \mathcal{F}$.

Quantized sparse juntas
The sparse juntas that we consider later on will arise from applying the junta agreement theorem, Theorem 1.6.Such functions are quantized in the following sense.

Definition 3.8 (Quantized functions).
A function $f \colon \{0,1\}^n \to \mathbb{R}$ is $A$-quantized, where $A$ is a finite set, if all monomial coefficients of $f$ belong to $A$.
Quantization is preserved under basic operations.

Junta agreement theorem
We prove our main structure theorems, Theorems 1.1 and 1.2, by reduction to the Kindler–Safra theorem on $(\{0,1\}^n, \mu_{1/2})$. As we explain in Section 1.2, this involves applying the Kindler–Safra theorem to $f|_V$ for $V \sim \mu_{2p}$, obtaining approximating juntas $g_V$, and pasting the juntas together to a global function $g$.
In order to show that the juntas $g_V$ can be pasted together, we need to assume that they agree with each other, in the sense that the restrictions of $g_{V_1}$ and $g_{V_2}$ to $V$ typically coincide, for an appropriate distribution supported on triplets $(V_1, V_2, V)$ such that $V_1, V_2 \supseteq V$.
Since we are interested in the behavior of $f|_V$ for $V \sim \mu_{2p}$, we need the marginals of $V_1$ and $V_2$ to be $\mu_{2p}$. Moreover, we want $V_1$ and $V_2$ to be independent given $V$. This naturally leads to the product distribution $\nu_{q,\alpha}$ (where $q = 2p$ and $\alpha \le 1$), which is defined as follows: 1. $V$ includes each $i \in [n]$ with probability $\alpha q$.
We can now state the junta agreement theorem, which is a formal statement of Theorem 1.6.
Suppose that for each $V \subseteq [n]$ we are given a degree $d$ function $g_V \colon \{0,1\}^V \to \{0,1\}$ whose monomial support contains at most $j$ monomials (the junta assumption). Assume that the functions $g_V$ satisfy the following agreement condition:

Define a degree $d$ function $g \colon \{0,1\}^n \to \{0,1\}$ as follows: for each $|S| \le d$, let $\hat{g}(S)$ be any value $c$ maximizing the probability that $\hat{g}_V(S) = c$. The function $g$ is a $(d, c_{j,d}, O(1/q))$-sparse junta close to the $g_V$ on average.

In the statement of the theorem and below, we use the convention that in $\Pr$ and $\mathbb{E}$, we take the distribution on the first line of the subscript conditioned on the constraints in the following lines. For example, in the second display, $V$ is chosen by including all elements of $S$ with probability 1 and all elements outside of $S$ with probability $q$.
Our previous work [2] proves a stronger version of Theorem 4.1, which does not require the   to be juntas but has the same conclusion (without the guarantee that  is a sparse junta, which follows from the   being juntas).The proof of this stronger version is somewhat nonintuitive, using the notion of good and excellent sets which is borrowed from [8].It turns out that the proof simplifies dramatically when the   are juntas, and this is the proof that we present here.
The proof of Theorem 4.1 relies on the following expansion lemma. In order to use the junta assumption, we move the sum over $S$ inside; in the resulting expression, disagreements counts the number of differing monomial coefficients.
At this point, we invoke Fourier analysis on $(\{0,1\}^{n'}, \mu_q)$. The Fourier basis for this domain is given by the functions $\omega_S = \prod_{i \in S} \frac{x_i - q}{\sqrt{q(1-q)}}$.
This basis is orthonormal with respect to $\mu_q$, and the corresponding Fourier expansion of $h$ is $h(x) = \sum_S \hat{h}(S)\, \omega_S(x)$, where $\hat{h}(\emptyset) = \mathbb{E}[h]$. Since $h$ is Boolean, we furthermore have $\sum_S \hat{h}(S)^2 = \mathbb{E}[h^2] = \mathbb{E}[h]$. This allows us to express the left-hand side of Lemma 4.3 in terms of Fourier coefficients. In order to express the right-hand side in a similar form, it suffices to compute the correlation of the characters under the coupled distribution. Since the marginal distributions of the two coupled inputs are $\mu_q$ and the coordinates are independent, the expectation on the right-hand side of Lemma 4.3 can be computed explicitly, implying that we can take the constant to be $1/(1 - \alpha)$.
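As a sanity check on the basis used above, the following Python sketch (ours) verifies orthonormality of the $q$-biased characters by exact enumeration on a small domain.

```python
from itertools import product, combinations
from math import sqrt

q, n = 0.3, 3

def omega(S, x):
    """q-biased Fourier character: product over i in S of (x_i - q)/sqrt(q(1-q))."""
    out = 1.0
    for i in S:
        out *= (x[i] - q) / sqrt(q * (1 - q))
    return out

def mu(x):
    """mu_q weight of the point x."""
    w = 1.0
    for v in x:
        w *= q if v else (1 - q)
    return w

subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
inner = {}
for S in subsets:
    for T in subsets:
        inner[(S, T)] = sum(mu(x) * omega(S, x) * omega(T, x)
                            for x in product([0, 1], repeat=n))
ortho = all(abs(inner[(S, T)] - (1.0 if S == T else 0.0)) < 1e-9
            for S in subsets for T in subsets)
```

Each coordinate factor has mean 0 and variance 1 under $\mu_q$, and independence of coordinates turns products of factors into products of expectations.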

Structure theorem
In this section we prove our main structure theorem, Theorem 1.1, which we reformulate using terminology defined in Section 2.
Applying the Kindler–Safra theorem (Theorem 2.4), there exist Boolean degree $d$ functions $g_V$ such that $\mathbb{E}_{\mu_{1/2}}[(f|_V - g_V)^2] = O(\epsilon_V)$, where $\epsilon_V$ is the local error, and so the average error over $V$ is $O(\epsilon)$. We would like to paste the functions $g_V$ into a global function $h$ using the junta agreement theorem (Theorem 4.1), which we will apply with $j = j_d$ (the value from Theorem 2.2), $\alpha = \sqrt{1/2}$, and $q = 2p$. Theorem 2.2 shows that each $g_V$ is a $j_d$-junta. In order to apply the junta agreement theorem, we need to verify the agreement condition. We can choose $(V_1, V)$ in the following way: choose $V_1 \sim \mu_{2p}$, and choose $V \sim \mu_{\sqrt{1/2}}(V_1)$. If we then choose $W \sim \mu_{\sqrt{1/2}}(V)$ then it is equivalent to directly sampling $W \sim \mu_{1/2}(V_1)$. This shows that the agreement condition is satisfied.
To conclude this step of the proof, we need to show that $f$ and $h$ are $O(\epsilon)$-close. (The term $1[f|_V \ne h|_V]$ appearing here is an indicator.) In order to convert this $L_0$ guarantee to an $L_2$ guarantee, we use Lemma 3.11. Concluding, in this step we have constructed a function $h$ satisfying the following properties: (i) $h$ is a $(d, C/p)$-sparse junta.
In the final two properties, the underlying distribution is $\mu_p$, which we assume henceforth.

Step 2: Fixing coefficients
In this step we modify $h$ into a function $g$ which satisfies all properties listed in Theorem 5.1. We do this by constructing a sequence of functions $h = h_{-1}, h_0, \ldots, h_d = g$, where $h_e$ satisfies the following properties: (i) $h_e$ is a $(d, C_e/p)$-sparse junta, where $C_e$ depends only on $d$, $e$.
(ii) $h_e$ is $A_e$-quantized for some finite set $A_e$ depending only on $d$, $e$.
Here $\mathcal{E}$ is the event that for some $S \in \mathcal{F}$, it holds that $x_S = 1$ and $x_T = 0$ for all $T \in \operatorname{supp}(g)$ such that $T \nsubseteq S$.
If $\mathcal{E}$ happens for some $S$, an event we denote by $\mathcal{E}_S$, then $h_{e-1}|_S$ is a non-monotone Boolean function of degree $d$, which is a $j_d$-junta by Theorem 2.2. Therefore Lemma 2.5 yields a lower bound on the probability of $\mathcal{E}_S$. Since the left-hand side is $O(\epsilon)$, we conclude the required bound. Item (iv) follows from Item (iii) as in Section 5.2. Item (v) follows by construction.

Concluding the proof
Taking $g = h_d$, the function $g$ satisfies all properties stated in Theorem 6.1 except for Items (iii) and (iv).
Item (iii) follows as in Section 5.2. In order to prove Item (iv), let $\mathcal{F}_g$ denote the set of all minimal non-monotone Boolean inputs of $g$. The proof of Item (iii) above shows that $\sum_{1_S \in \mathcal{F}_g} p^{|S|} = O(\epsilon)$, from which Item (iv) of the theorem immediately follows.

Converse part
We conclude the proof of Theorem 6.1 by proving the converse direction. Suppose that $g$ satisfies the properties in the statement of the theorem: (i) $g(x) \in \{0,1\}$ whenever $x \in \{0,1\}^n$ has weight at most $d$.
(iii) $g$ has $O(\epsilon/p^e)$ minimal non-Boolean inputs of weight $e$, for all $e$.
(iv) $g$ has $O(\epsilon/p^e)$ minimal non-monotone Boolean inputs of weight $e$, for all $e$.
We need to show that $g$ is $O(\epsilon)$-close to some monotone Boolean function $\tilde{g}$. Here $\mathcal{E}$ is the event that for some $S \in \mathcal{F}$ we have $x_S = 1$ and $x_T = 0$ for all $T \in \mathcal{G}$ such that $T \nsubseteq S$. Suppose that this event happens for $S$. Denoting $x = 1_R$, we have $g|_R = g|_S$, and therefore there are at most $K_d$ colors, for some constant $K_d$ depending only on $d$. The hypergraph Ramsey theorem now implies that for an appropriate value of $N$ depending only on $d$, there is a subset $M' \subseteq \{m_1, \ldots, m_N\}$ of size $d$ such that all tuples of $\ell \le d$ elements from $M'$ have the same color $\chi_\ell$.
Let $\chi_0, \ldots, \chi_d$ be the resulting colors. If we choose $S_1, \ldots, S_d \in M'$ at random, then with probability at least $\delta_{d,p}$, all of them contain $T$.

A Low degree functions
When the domain $\{0,1\}^n$ is not clear from context, we write it explicitly. Given a function $f \colon \{0,1\}^n \to \mathbb{R}$, we often think of the domain as endowed with the measure $\mu_p$ for some $p$. In this case, we write $f \colon (\{0,1\}^n, \mu_p) \to \mathbb{R}$. If we then write $\mathbb{E}[f]$ or $\Pr[f \in \{0,1\}]$, the underlying measure is $\mu_p$.

Closeness

Our basic notion of closeness is $L_2$. Two functions $f, g$ are $\epsilon$-close with respect to $\mu_p$ if $\mathbb{E}_{\mu_p}[(f - g)^2] \le \epsilon$. If $\mu_p$ is clear from context then we omit its mention.

Every function $f \colon \{0,1\}^n \to \mathbb{R}$ can be written uniquely as a multilinear polynomial $f = \sum_{S \subseteq [n]} \hat{f}(S)\, x_S$, where $x_S = \prod_{i \in S} x_i$ and $x_1, \ldots, x_n \in \{0,1\}$. We call this representation the monomial expansion. We stress that it differs from the Fourier expansion. The functions $x_S$ are called monomials, and the coefficients $\hat{f}(S)$ are called monomial coefficients. The monomial support of $f$ is $\operatorname{supp}(f) = \{S \subseteq [n] : \hat{f}(S) \ne 0\}$. The degree of $f$, denoted $\deg f$, is the maximal size of a set in $\operatorname{supp}(f)$ (if $\operatorname{supp}(f) = \emptyset$ then $\deg f = 0$). This notion of degree coincides with the Fourier-theoretic notion of degree. A function $f \colon \{0,1\}^n \to \mathbb{R}$ has degree $d$ if $\deg f \le d$. A function is $\epsilon$-close to degree $d$ (with respect to $\mu_p$) if $\mathbb{E}_{\mu_p}[(f - g)^2] \le \epsilon$ for some degree $d$ function $g$.
We can also define these concepts via the $p$-biased Fourier expansion. Let $f = \sum_S \hat{f}(S)\, \omega_S$, where $\{\omega_S\}$ is the $p$-biased Fourier basis, described in Section 4.1. Let $f^{\le d} = \sum_{|S| \le d} \hat{f}(S)\, \omega_S$ and $f^{>d} = \sum_{|S| > d} \hat{f}(S)\, \omega_S$. The function $f$ has degree $d$ if $f^{>d} = 0$. The degree $d$ function closest to $f$ (with respect to $\mu_p$) is $f^{\le d}$, and so $f$ is $\epsilon$-close to degree $d$ if and only if $\|f^{>d}\|^2 \le \epsilon$.

A function is Boolean if it is $\{0,1\}$-valued. If $f, g$ are Boolean then they are $\epsilon$-close if and only if $\Pr[f \ne g] \le \epsilon$. In other words, for Boolean functions the $L_2$ and $L_0$ notions of distance coincide. If $f \colon \{0,1\}^n \to \mathbb{R}$ then $\operatorname{round}(f, \{0,1\})$ is the Boolean function obtained by rounding each $f(x)$ to the nearest value among $\{0,1\}$. The original function $f$ is $\epsilon$-close to Boolean if it is $\epsilon$-close to $\operatorname{round}(f, \{0,1\})$.

Theorem 2.6. For every $d \in \mathbb{N}$ there is a constant $C_d > 0$ such that the following holds. If $f \colon (\{0,1\}^n, \mu_{1/2}) \to \{0,1\}$ is monotone and $\epsilon$-close to degree $d$ then $f$ is $C_d\epsilon$-close to a monotone Boolean degree $d$ function.

Proof. Suppose that $f \colon (\{0,1\}^n, \mu_{1/2}) \to \{0,1\}$ is monotone and $\epsilon$-close to degree $d$. Applying Theorem 2.4, $f$ is $O(\epsilon)$-close to a Boolean degree $d$ function $g$.
If $1_S$ is a minimal non-monotone input of $f$ then $f|_S$ depends on all inputs, and so $|S| \le j_d$ by Theorem 2.2. Moreover, $S$ is the union of all sets in $\operatorname{supp}(f|_S)$, and so it is the union of at most $|S|$ sets in $\operatorname{supp}(f)$. ■

We extend the definition of minimal non-monotone inputs to non-Boolean functions as follows. If $f \colon \{0,1\}^n \to \mathbb{R}$ then an input $x = 1_S$ is minimal non-monotone Boolean if $f|_S$ is Boolean and $x$ is a minimal non-monotone input of $f|_S$. Lemma 2.8 holds also for minimal non-monotone Boolean inputs of a degree $d$ function $f \colon \{0,1\}^n \to \mathbb{R}$, since if $1_S$ is a minimal non-monotone Boolean input of $f$ then it is also a minimal non-monotone input of the Boolean function $f|_S$.
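Returning to the notions of distance from this appendix: for Boolean functions the $L_2$ and $L_0$ distances coincide because $(f - g)^2 = 1[f \ne g]$ pointwise. A tiny Python sketch (ours):

```python
from itertools import product

n = 3
f = {x: (x[0] | x[1]) for x in product([0, 1], repeat=n)}  # OR of two coordinates
g = {x: x[0] for x in product([0, 1], repeat=n)}           # a dictator

l2 = sum((f[x] - g[x]) ** 2 for x in f) / 2 ** n  # E[(f-g)^2], uniform measure
l0 = sum(f[x] != g[x] for x in f) / 2 ** n        # Pr[f != g]
```

Here the functions disagree exactly on the two points with $x_1 = 0$, $x_2 = 1$, so both distances equal $1/4$.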