Discrete Analysis

We outline the proof of the OSSS inequality. As an application, we give a lower bound on randomized decision tree complexity. Moreover, we present an inductive proof [4] of the main inequality.

This inequality motivates asking for a lower bound in terms of the complexity of boolean functions, in particular decision tree complexity.
This inequality improves (1.1) and can be generalized to broader contexts. An easy computation then shows that when ∆(f) ≤ d, or when D(f) ≤ d and f is near-balanced, we obtain Inf_max(f) ≥ Ω(1/d), which improves (1.2) whenever ∆(f) = o(n/log n). The inequality (1.3) appears to be the first quantitatively strong influence lower bound in the literature that takes the computational complexity of f into account. It is easy to check that R(f) ≤ D(f), but a reverse inequality is quite subtle. It is well known that R(f) ≥ Ω(√D(f)) [2], and the largest known separation is D(f) = n and R(f) ≤ n^β with β ≈ 0.753, attained by a specific monotone transitive function. We say that a boolean function f is monotone if f(x) ≤ f(y) whenever x ≤ y in the componentwise partial order. On the other hand, we say f is transitive if for each pair i, j ∈ [n] there exists a permutation σ of [n] with σ(i) = j satisfying f(x_1, ..., x_n) = f(x_{σ(1)}, ..., x_{σ(n)}) for every x.

Randomized decision tree complexity lower bounds
An interesting subclass of transitive boolean functions is that of graph properties: a property of v-vertex graphs is a set of graphs on a vertex set V = {1, ..., v} that is invariant under vertex relabelings. Each graph G can then be identified with a vector x_G ∈ {−1, 1}^(v choose 2), and each property P with a function f_P such that f_P(x_G) = 1 if and only if G satisfies P.
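To make this encoding concrete, here is a small Python sketch (the function names and the connectivity example are our illustrative choices, not from the text) mapping a graph to x_G ∈ {−1, 1}^(v choose 2) and evaluating the monotone property "G is connected".

```python
# Hypothetical illustration: a graph on {0,...,v-1} as a point of {-1,1}^C(v,2),
# and connectivity as a monotone boolean function f_P on that cube.
from itertools import combinations

def graph_to_vector(v, edges):
    """+1 on coordinates that are edges, -1 elsewhere, pairs in lex order."""
    edge_set = {frozenset(e) for e in edges}
    return [1 if frozenset(p) in edge_set else -1
            for p in combinations(range(v), 2)]

def f_connected(v, x):
    """f_P(x_G) = 1 iff the graph encoded by x is connected."""
    pairs = list(combinations(range(v), 2))
    adj = {i: set() for i in range(v)}
    for coord, (a, b) in zip(x, pairs):
        if coord == 1:
            adj[a].add(b); adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w); stack.append(w)
    return 1 if len(seen) == v else -1

# A path on 4 vertices is connected; deleting an edge disconnects it.
path = [(0, 1), (1, 2), (2, 3)]
assert f_connected(4, graph_to_vector(4, path)) == 1
assert f_connected(4, graph_to_vector(4, path[:-1])) == -1
```

Adding edges can only flip f_connected from −1 to +1, which is exactly the monotonicity used below.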
Several lower bounds for the RDT complexity of monotone graph properties have been obtained over the last four decades; the following lower bound relies on purely probabilistic results, improves them under some assumptions, and, for monotone transitive functions, is essentially as good as the best known unconditional bound. Suppose f corresponds to a v-vertex graph property. Since E_p[f] is a strictly increasing continuous function of p, there exists p ∈ (0, 1) such that E_p[f] = 0. For that p, and since f is transitive, Theorem 1.1 yields 1 ≤ (Inf(f)/n) ∆(f), where Inf(f) := Σ_{i=1}^n Inf_i(f). Moreover, using that for any p ∈ (0, 1) and any monotone f : {−1, 1}^n_(p) → {−1, 1} it holds that Inf(f) ≤ 2√(p(1 − p) ∆(f)), we obtain (1.4). The identity n = (v choose 2) yields (1.5).

The main inequality
Theorem 1.1 can be formulated and proved in a more general context than the p-biased boolean cube. Indeed, let (Ω, µ) = (Ω_1 × ··· × Ω_n, µ_1 × ··· × µ_n) be an n-wise product probability space, and let (Z, d) be a metric space. We will consider functions f : Ω → Z. A DDT (deterministic decision tree) for f is then a rooted directed tree that satisfies:
• each internal node v is labeled by a coordinate i_v ∈ [n];
• each leaf is labeled by an element of Z;
• the edges emanating from an internal node v are in one-to-one correspondence with Ω_{i_v};
• the coordinates labeling the nodes along every root-leaf path are distinct.
When Ω = {−1, 1}^n_(p) and Z = {−1, 1} is equipped with the distance d(z, z′) = |z − z′| = 2·1_{z≠z′}, this recovers the boolean case. Let x and y be random inputs chosen independently from Ω. Let s be the number of coordinates queried by T on x, and let i_1, ..., i_s, i_{s+1}, ..., i_n be the sequence of those coordinates, with i_t = ∅ whenever t > s.
Linearity of expectation and the identity 1_{t≤s} = Σ_{i=1}^n 1_{i_t = i} imply the previous identity. Now, consider the sequence of values (x_{i_1}, x_{i_2}, ..., x_{i_{min{t−1,s}}}) read by T up to time t−1 on input x. This sequence determines i_t. It is then easy to show that y and (x_j : j ≠ i_1, ..., i_{min{t−1,s}}) are independent conditionally on (x_{i_1}, ..., x_{i_{min{t−1,s}}}), and that their conditional distributions coincide with the unconditional ones. Therefore, for any i, t ∈ [n], one may compare two inputs that differ only in their i_t-th coordinate, which equals x_{i_t} in one and y_{i_t} in the other. Taking expectations and summing over t, and then summing over i, yields the desired result.
It turns out that inequality (1.6) is tight, since equality holds for separated trees. Moreover, a two-function version of Theorem 1.3 can be proved and used to obtain an estimate for the complexity of approximations of a given function g. Finally, the inequality admits a version for semimetric spaces Z, in which the constant on the right-hand side of (1.6) is greater than one.

An inductive proof of OSSS inequality
Another proof can be obtained through an inductive argument [4]. Indeed, a two-function version can be proved: let f, g : {−1, 1}^n → {−1, 1} and let T be a DDT for f. The proof is based on martingale differences: for i ∈ [n], one defines suitable difference terms and checks that they control Inf_n(f) and Inf_n(g). The base case n = 1 is a simple verification. For the inductive step, let T be a DDT for f whose root is labeled x_n, and let T_{−1} and T_1 be its left and right subtrees.

Introduction
Let f : {−1, 1}^n → {−1, 1} be a function on the n-dimensional hypercube. We define the sensitivity of f at x, written s(f, x), as the number of inputs y ∈ {−1, 1}^n that differ from x in exactly one coordinate and satisfy f(x) ≠ f(y). We define the sensitivity of f, written s(f), as the maximum of s(f, x) over all inputs x.
This is a notion of complexity, or lack of smoothness for functions on the hypercube. Multiple other notions of complexity for boolean functions have been studied and related to each other through the years. Amongst those we can highlight the block sensitivity and the degree.
Block Sensitivity bs(f). Given a subset I of [n] := {1, ..., n} and a string x = (x_1, ..., x_n) ∈ {−1, 1}^n, we define T_I x as the string obtained from x by flipping the coordinates in I. For a boolean function f : {−1, 1}^n → {−1, 1} and a point x ∈ {−1, 1}^n, the quantity bs(f, x) (block sensitivity at x) is the maximum number of pairwise disjoint subsets I_1, I_2, ..., I_{bs(f,x)} of [n] such that f(x) ≠ f(T_{I_k} x) for every k. In particular, bs(f, x) ≥ s(f, x), since s(f, x) adds the further constraint |I_k| = 1.
We define the block sensitivity of f, written bs(f), as the maximum of bs(f, x) over all inputs x.
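The two notions can be checked by brute force on small examples. The following Python sketch (our own illustration; the exhaustive search over disjoint blocks is exponential and only feasible for tiny n) computes s(f, x) and bs(f, x) for the 3-bit majority function.

```python
# Brute-force s(f,x) and bs(f,x); only for very small n (illustrative sketch).
from itertools import combinations

def flip(x, block):
    """T_I x: flip the coordinates of x listed in `block`."""
    return tuple(-x[i] if i in block else x[i] for i in range(len(x)))

def sensitivity(f, x):
    return sum(f(x) != f(flip(x, {i})) for i in range(len(x)))

def block_sensitivity(f, x):
    n = len(x)
    # All sensitive blocks: f changes value when the block is flipped.
    sensitive = [set(b) for r in range(1, n + 1)
                 for b in combinations(range(n), r)
                 if f(x) != f(flip(x, set(b)))]
    best = 0
    def search(blocks, used, count):
        nonlocal best
        best = max(best, count)
        for k in range(len(blocks)):
            if not (blocks[k] & used):       # keep blocks pairwise disjoint
                search(blocks[k + 1:], used | blocks[k], count + 1)
    search(sensitive, set(), 0)
    return best

maj3 = lambda x: 1 if sum(x) > 0 else -1
x0 = (1, 1, -1)
assert sensitivity(maj3, x0) == 2            # flipping either +1 flips MAJ_3
assert block_sensitivity(maj3, x0) >= sensitivity(maj3, x0)
```

For majority the two quantities coincide; functions with bs(f) much larger than s(f) require more contrived constructions.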
The degree deg(f). A function f : {−1, 1}^n → {−1, 1} can be thought of as the restriction of a polynomial P_f : R^n → R. Since x^k = x^{k−2} for x ∈ {−1, 1}, one can restrict P_f to the class of multilinear polynomials, that is, polynomials linear in each of their n variables. These polynomials form a basis for the set of functions f : {−1, 1}^n → R, and in particular P_f is determined uniquely by f.
The degree deg(f ) of P f is another measure of complexity for boolean functions.
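The unique multilinear expansion (and hence deg(f)) can be computed by brute force from the orthonormality of the monomials over {−1, 1}^n. A short Python sketch (our own illustration, not from the notes):

```python
# Compute the multilinear (Fourier) coefficients of f over {-1,1}^n by brute
# force and read off deg(f). Feasible only for small n (illustrative sketch).
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """{S: E_x[f(x) * prod_{i in S} x_i]} over all S subset of [n]."""
    points = list(product((-1, 1), repeat=n))
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for r in range(n + 1) for S in combinations(range(n), r)}

def degree(f, n):
    return max((len(S) for S, c in fourier_coefficients(f, n).items()
                if abs(c) > 1e-9), default=0)

maj3 = lambda x: 1 if sum(x) > 0 else -1
# MAJ_3 = (x1 + x2 + x3)/2 - x1 x2 x3 / 2, so its degree is 3.
assert degree(maj3, 3) == 3
assert degree(lambda x: x[0], 2) == 1        # a dictator has degree 1
```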
These last two quantities are closely related. Nisan and Szegedy [2] showed that bs(f) ≤ 2 deg(f)^2 (which is Topic 12 in this school). This was later improved to bs(f) ≤ deg(f)^2 by Tal [3]. Our goal in the rest of these notes is to show that deg(f) ≤ s(f)^2, which implies that the three quantities are polynomially related.

The induced subgraph problem
Gotsman and Linial [4] reduced the problem of relating the sensitivity and the degree to that of understanding the degree of certain induced subgraphs of the hypercube graph Q_n, which has as vertices the elements of {−1, 1}^n, with edges joining vertices that differ in exactly one coordinate (bit).
Given a graph G, we denote by ∆(G) the maximum of the degrees of its vertices. Gotsman and Linial showed that lower-bounding the sensitivity in terms of the degree is equivalent to lower-bounding the maximum degree of large induced subgraphs of Q_n; this is the content of Theorem 2.1 below. Huang then showed the following: if H is an induced subgraph of Q_n with strictly more than 2^{n−1} vertices, then ∆(H) ≥ √n. Therefore, for any boolean function f, s(f) ≥ √(deg(f)). Combining this with the inequality bs(f) ≤ deg(f)^2, one obtains a polynomial (fourth-power) relation between bs(f) and s(f). This is not expected to be sharp, as the best known examples only exhibit a quadratic separation.
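Huang's statement can be sanity-checked exhaustively for tiny n. The Python sketch below (our own check, feasible only for n = 3, where the bound √3 forces maximum degree at least 2) enumerates all induced subgraphs of Q_3 on more than 2^{3−1} = 4 vertices.

```python
# Exhaustive check of Huang's theorem for n = 3: every induced subgraph of Q_3
# on more than 4 vertices has maximum degree at least sqrt(3), i.e. >= 2.
from itertools import combinations

n = 3
V = range(2 ** n)            # vertices of Q_n as n-bit integers

def max_deg(S):
    """Maximum degree of the subgraph of Q_n induced by the vertex set S."""
    S = set(S)
    return max(sum((u ^ (1 << i)) in S for i in range(n)) for u in S)

ok = all(max_deg(S) ** 2 >= n
         for size in range(2 ** (n - 1) + 1, 2 ** n + 1)
         for S in combinations(V, size))
assert ok
```

The search is over all 93 qualifying vertex sets; already for n = 5 such enumeration is hopeless, which is why the spectral proof below is needed.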

Proof of Theorem 2.2
The proof is a sleek modification of the so-called "spectral method" for graphs. We will first present the method in a general setting, and then adapt it to our scenario. The goal is to bound the maximum degree of a graph from below using the following two tools. Lemma 2.3. Let G be an undirected graph with adjacency matrix A. Let B be a symmetric matrix such that |B_ij| ≤ A_ij. Then the maximum degree of G is at least the largest eigenvalue (in absolute value) of B.
Proof. The largest eigenvalue (in absolute value) is the ℓ^2 → ℓ^2 operator norm of B, and the maximum degree of G bounds the ℓ^∞ → ℓ^∞ (and, by symmetry, the ℓ^1 → ℓ^1) norm of B. In particular, the lemma follows from Schur's test.
For a more direct proof, let v be an eigenvector associated to the largest eigenvalue λ of B, and let v_k be the largest component (in magnitude) of v. Then |λ| |v_k| = |Σ_j B_{kj} v_j| ≤ Σ_{j : A_{kj} = 1} |B_{kj}| |v_j| ≤ ∆(G) |v_k|, and the inequality follows by dividing both sides by |v_k|.
For the second tool, we define a principal submatrix of B as one that is obtained by removing the same set of rows and columns from B.
Lemma 2.4 (Cauchy's Interlace Theorem). Let B be a symmetric n × n real matrix with eigenvalues β_1 ≥ β_2 ≥ ··· ≥ β_n. Let B̂ be an m × m principal submatrix of B, with eigenvalues β̂_1 ≥ ··· ≥ β̂_m. Then β_j ≥ β̂_j ≥ β_{j+n−m} for all 1 ≤ j ≤ m. This follows essentially from the Courant-Fischer characterization of the eigenvalues of a symmetric matrix. The term Cauchy's Interlace Theorem is sometimes reserved for the case m = n − 1; the other cases can be deduced from this one by induction, removing one row/column after the other.
Combining these two pieces of information we get immediate bounds on the maximum degree ∆(H) of a subgraph, as follows. Corollary 2.5. Let G be a graph with n vertices and adjacency matrix A. If B is an n × n symmetric matrix with |B_ij| ≤ A_ij and eigenvalues λ_1 ≥ ··· ≥ λ_n, then any induced subgraph H of G on more than k vertices has maximum degree ∆(H) at least λ_{n−k+1}.

Application of the spectral method to the hypercube

If the vertices of the hypercube graph Q_n are sorted in lexicographic order, the adjacency matrices A_n of Q_n satisfy the recurrence A_1 = (0 1; 1 0), A_n = (A_{n−1} I_{2^{n−1}}; I_{2^{n−1}} A_{n−1}); in other words, the cube Q_n is formed by taking two copies of Q_{n−1} (corresponding to the sub-matrices A_{n−1}) and joining the two copies of each vertex (giving rise to the I_{2^{n−1}} blocks).
We will build the matrices B_n (in the spirit of Lemma 2.3) similarly, by the recurrence B_1 = (0 1; 1 0), B_n = (B_{n−1} I_{2^{n−1}}; I_{2^{n−1}} −B_{n−1}). This construction already guarantees that |(B_n)_ij| ≤ (A_n)_ij, one of the conditions needed to apply Lemma 2.3. Moreover, it has particularly nice spectral properties. Proposition 2.6. The matrices B_n have the following properties: 1. B_n^2 = n · I_{2^n}; 2. tr(B_n) = 0; 3. exactly half of the eigenvalues of B_n are √n, and the other half are −√n.
Proof sketch. The first equality is proven by induction on n, using the recursive definition of B n . It already implies that all the eigenvalues are ± √ n. The second equality follows by direct inspection. Since the trace is the sum of the eigenvalues, half of them must be √ n and half − √ n.
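Proposition 2.6 is easy to confirm numerically. A small numpy sketch (our own check) builds B_n by the recursion B_1 = (0 1; 1 0), B_n = (B_{n−1} I; I −B_{n−1}) and verifies all three properties:

```python
# Numerical check of Proposition 2.6 for Huang's signed matrices B_n.
import numpy as np

def B(n):
    if n == 1:
        return np.array([[0., 1.], [1., 0.]])
    Bp = B(n - 1)
    I = np.eye(2 ** (n - 1))
    return np.block([[Bp, I], [I, -Bp]])

for n in (1, 2, 3, 4):
    Bn = B(n)
    assert np.allclose(Bn @ Bn, n * np.eye(2 ** n))       # 1. B_n^2 = n I
    assert abs(np.trace(Bn)) < 1e-9                       # 2. tr B_n = 0
    eig = np.sort(np.linalg.eigvalsh(Bn))
    assert np.allclose(eig[: 2 ** (n - 1)], -np.sqrt(n))  # 3. half are -sqrt(n)
    assert np.allclose(eig[2 ** (n - 1):], np.sqrt(n))    #    half are +sqrt(n)
```

The induction in the proof is visible in the block structure: squaring the block matrix gives (B_{n−1}^2 + I) on the diagonal and cancelling off-diagonal blocks.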
These are all the tools we need to prove Theorem 2.2 (assuming Theorem 2.1, which will be proven in the next section). Proof of Theorem 2.2. We apply Corollary 2.5 to the matrices B_n (using that |(B_n)_ij| ≤ (A_n)_ij). If j > 2^{n−1}, then 2^n − j + 1 ≤ 2^{n−1}, and the first 2^{n−1} eigenvalues of B_n are √n. Therefore the maximum degree of any vertex-induced subgraph on j > 2^{n−1} vertices is at least the (2^n − j + 1)-th largest eigenvalue of B_n, that is, √n.

The subgraph problem and the degree bounds
The goal of this section is to give a proof of Theorem 2.1, following the original proof in [4]. The first step is to reduce the statements (GL1) and (GL2) of the theorem to simpler, but equivalent, statements. On one hand, (GL1) can be transformed into a statement about sensitivity by studying the indicator function of the vertices of H. On the other hand, (GL2) can be reduced to the case where the degree of f is maximal. This makes Theorem 2.1 equivalent to the following proposition. Proposition 2.7. The following are equivalent for any monotone function h : N → R. (GL1′) For any boolean function g with nonzero mean there is x with s(g, x) ≤ n − h(n).
(GL2′) For any n ≥ 0 and any boolean function f : {−1, 1}^n → {−1, 1} of full degree n, we have s(f) ≥ h(n). Proof. We relate the functions in the two statements by g(x) = f(x)p(x), where p(x) = x_1 x_2 ··· x_n; this also implies f(x) = g(x)p(x). There are two key relations between f and g: (A) The function g has nonzero mean if and only if f has degree n. This is because, for any multi-index I ⊂ [n], multiplication by p sends x_I to its complement: x_I p(x) = x_{[n]\I}. Hence g has a nonzero constant coefficient if and only if f has a nonzero degree-n coefficient (multilinear monomials have degree at most 1 in each separate x_i).
(B) It holds that s(g, x) = n − s(f, x). Indeed, p(T_i x) = −p(x), and therefore f(x) = f(T_i x) if and only if g(x) ≠ g(T_i x) (and vice versa); the sensitive coordinates of g at x are exactly the insensitive coordinates of f at x.
(1′) ⟹ (2′): If deg(f) = n, then g does not have mean zero by (A). In particular, s(g, x) ≤ n − h(n) at some point x. By (B), this implies s(f, x) ≥ h(n), and hence s(f) ≥ h(n).
(2′) ⟹ (1′): If s(g, x) > n − h(n) for all x, then s(f) < h(n) by (B). By (2′), therefore, deg(f) < n, and by (A) that shows that g has mean zero.

Abstract. Quantum mechanics can speed up a range of search applications over unsorted data. For example, there is a quantum algorithm that can find one specific phone number in a directory of N names arranged randomly in only O(√N) accesses to the database, compared to the at least 0.5N accesses needed on average by any classical algorithm.

This Letter presents a quantum mechanical algorithm for the following search problem that is polynomially faster than any classical algorithm. Search problem: Suppose there is an unsorted database containing N items and we want to find one of them that satisfies a given condition; we can check whether an item satisfies the condition in one step. The most efficient classical algorithm examines the items one by one, requiring an average of 0.5N items to be examined before finding the desired one.
However, quantum mechanical systems can be in superpositions of states and can examine multiple items simultaneously, assigning some probability amplitude to examining the desired object. This Letter shows that, using the same amount of hardware as in the classical case, but with the input and output in superpositions of states, we can find the object in O(√N) quantum mechanical steps instead of O(N) classical steps.

Quantum mechanical algorithms
In a quantum computer the logic circuitry and time steps are essentially classical; the biggest difference lies in the memory bits that hold the variables. A classical bit has state 0 or 1, while a quantum bit (qubit) can be in the state |0⟩ or |1⟩ (the computational basis vectors), but it can also be in a superposition state (a linear combination of states): |ψ⟩ = α|0⟩ + β|1⟩, where α, β ∈ C and |α|^2 + |β|^2 = 1. We say that α and β are the amplitudes of the states |0⟩ and |1⟩, respectively.
The quantum mechanical operations that can be performed are unitary operations, i.e. unitary matrices, acting on a small number of bits, i.e. vectors, in each step. The quantum search algorithm of this Letter is a sequence of three kinds of unitary operations on a pure state, followed by a measurement operation. The Walsh-Hadamard operation performed on a single bit is represented by the matrix M = (1/√2)(1 1; 1 −1); that is, a bit in the state |0⟩ = (1, 0)^T is transformed into the superposition (|0⟩ + |1⟩)/√2, and a bit in the state |1⟩ = (0, 1)^T into (|0⟩ − |1⟩)/√2. In both cases the magnitude of the amplitude in each state is 1/√2, but in the second case the phase of the amplitude (a constant multiplier of the form e^{iθ}) in the state |1⟩ is inverted. Now, for the first operation we need, consider a system with N := 2^n possible states, described by n bits. We can perform the transformation M on each bit independently in sequence, thus changing the state of the system. If we start the system with all n bits in the state |0⟩, i.e. |0⟩ ⊗ |0⟩ ⊗ ··· ⊗ |0⟩, we get the configuration |ψ⟩ = 2^{−n/2} Σ_{x=0}^{2^n − 1} |x⟩, where x ranges over the n-bit binary strings. This way we can create a superposition in which the amplitude of the system being in any of the 2^n basis states is equal.
Next consider the case when the starting state is another one of the 2^n possible basis states, say |x⟩ with x ≠ 0···0. Performing the transformation M on each bit, we get a superposition of states described by all possible n-bit binary strings, the amplitude of each state having the same magnitude 2^{−n/2} and sign either + or −. The sign of the amplitude of each state |y⟩ is determined by the parity of the bitwise dot product of x and y, i.e. it equals (−1)^{x·y}. This describes the Walsh-Hadamard transformation on n bits.
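The sign pattern (−1)^{x·y} is easy to confirm numerically. A small numpy sketch (our own check) builds W = M^{⊗n} as an iterated Kronecker product and verifies each entry, as well as the uniform superposition produced from |0···0⟩:

```python
# Check that the n-bit Walsh-Hadamard matrix W = M^{tensor n} has entries
# W[x, y] = 2^{-n/2} (-1)^{x.y}, x.y being the bitwise dot product mod 2.
import numpy as np

M = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)

def W(n):
    out = np.array([[1.]])
    for _ in range(n):
        out = np.kron(out, M)
    return out

n = 3
Wn = W(n)
for x in range(2 ** n):
    for y in range(2 ** n):
        parity = bin(x & y).count("1") % 2        # parity of bitwise dot product
        assert np.isclose(Wn[x, y], (-1) ** parity * 2 ** (-n / 2))
# Column y = 0 (start state |0...0>) gives the uniform superposition:
assert np.allclose(Wn[:, 0], 2 ** (-n / 2))
```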
The third transformation that we will need is the selective rotation of the phase of the amplitude in certain states. For a 2-state system it is described by a matrix of the form diag(e^{iφ_1}, e^{iφ_2}), which leaves the magnitudes of the amplitudes unchanged and rotates the phase of each basis state independently.

The abstracted problem
Let a system have N = 2^n states, labelled S_1, S_2, ..., S_N and represented as n-bit strings. Let there be a unique state, say S_ν, that satisfies the condition C(S_ν) = 1, whereas C(S) = 0 for all other states S. Assuming that for each S the condition C(S) can be evaluated in unit time, the problem is to identify the state S_ν. The two operations used inside the loop of the algorithm are: (a) for the system in any state S, if C(S) = 1, rotate the phase by π radians, and if C(S) = 0, leave the system unaltered; (b) apply the diffusion transform D, defined by the matrix with entries D_ij = 2/N for i ≠ j and D_ii = −1 + 2/N.

Algorithm
(i) Initialize the system to the uniform superposition (1/√N, 1/√N, ..., 1/√N) over all N states. (ii) Repeat the following O(√N) times: apply the selective phase rotation (a), then the diffusion transform (b). (iii) Measure the resulting state. This will be the state S_ν with a probability of at least 0.5.
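A direct numpy simulation of the loop, with the uniform initial superposition, the phase flip (a) on the marked state, and the diffusion transform (b), illustrates the claimed success probability. The parameters (N = 256, the marked index 137) are our illustrative choices:

```python
# End-to-end simulation of the search algorithm for N = 2^8 = 256.
import numpy as np

n = 8
N = 2 ** n
nu = 137                                   # marked state S_nu (arbitrary choice)

psi = np.full(N, N ** -0.5)                # step (i): uniform superposition
D = -np.eye(N) + 2.0 / N                   # diffusion transform D = -I + 2P

iters = int(np.pi / 4 * np.sqrt(N))        # ~ (pi/4) sqrt(N) repetitions
for _ in range(iters):                     # step (ii)
    psi[nu] = -psi[nu]                     #   (a) selective phase rotation
    psi = D @ psi                          #   (b) inversion about average

assert abs(psi[nu]) ** 2 > 0.5             # step (iii): success prob exceeds 1/2
```

With these parameters the measured success probability is in fact close to 1; the Letter only needs the weaker bound 1/2.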

Convergence
The loop in step (ii) above is the heart of the algorithm. Each iteration of this loop increases the amplitude in the desired state by O(1/√N); after O(√N) repetitions of the loop, the amplitude, and hence the probability, of the desired state reaches O(1). In order to see that the amplitude increases by O(1/√N) in each repetition, we first show that the diffusion transform D is equivalent to the inversion about average operation, which is a unitary operation.
Let A denote the average amplitude over all states S_i, i.e. if α_i is the amplitude in the i-th state, then A = (1/N) Σ_{i=1}^N α_i. Now, observe that the diffusion transform D defined in (b) can be represented in the form D = −I + 2P, where I is the identity matrix and P is the projection matrix with P_ij = 1/N for all i, j. Notice also that P^2 = P and that P acting on any vector v gives the vector each of whose components equals the average of all components of v. Thus, when D acts on an arbitrary vector v we get Dv = −v + 2Pv, whose i-th component is (3.1) (Dv)_i = 2A − v_i = A + (A − v_i), where A is the average of the components of v; this is precisely the inversion about average. Next consider what happens when we apply this operation to a vector with every component, except one, equal to C/√N, the remaining component being −C/√N.

Then the average A of all components is approximately equal to C/√N, so each of the N − 1 components that are approximately equal to the average does not change significantly, while the component that was negative becomes positive and its magnitude increases by about 2C/√N. Now, in the algorithm of subsection 3.3, in the loop of step (ii), first the amplitude in the selected state S_ν is inverted; then the inversion about average operation is carried out, giving an increase in the amplitude of S_ν of about 2C/√N in each iteration. Therefore, as long as the magnitude of the amplitude in S_ν is less than 1/√2, the remaining states carry probability more than 1/2, so their (roughly equal) amplitudes are about 1/√(2N) each, and the increase in the magnitude of the amplitude of S_ν per iteration is greater than 1/√(2N). Thus, there exists an M ≤ √N such that after M repetitions of the loop in step (ii), the magnitude of the amplitude in S_ν will exceed 1/√2. Measuring the state of the system then yields S_ν with probability ≥ 1/2.
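The identity (Dv)_i = 2A − v_i, and the unitarity of D, can be verified directly in numpy (our own check, on an arbitrary test vector):

```python
# Check that D = -I + 2P is unitary and acts as inversion about average.
import numpy as np

N = 8
P = np.full((N, N), 1.0 / N)              # projection onto the all-ones direction
D = -np.eye(N) + 2 * P

assert np.allclose(P @ P, P)              # P is a projection
assert np.allclose(D @ D.T, np.eye(N))    # D is unitary (real orthogonal)

v = np.array([3., -1., 4., 1., -5., 9., 2., 6.])
A = v.mean()
assert np.allclose(D @ v, 2 * A - v)      # (Dv)_i = 2A - v_i
```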

Implementation
As mentioned in subsection 3.1, the quantum mechanical operations that can be carried out in a controlled way are unitary operations that act on a small number of bits in each step, such as the Walsh-Hadamard transformation, with matrix W, and the phase rotation, with matrix R. We show that the diffusion transform D = −I + 2P, where P_ij = 1/N for all i, j, can be implemented as a product of three such unitary transformations, namely D = W R W, where R is the conditional phase shift with R_11 = 1, R_ii = −1 for i ≠ 1, and R_ij = 0 for i ≠ j, and W = M^{⊗n} with entries W_ij = 2^{−n/2}(−1)^{i·j}, as discussed in subsection 3.1 (M is the matrix defined there). Writing R = R_1 + R_2 with R_1 = −I and R_2 the matrix whose only nonzero entry is (R_2)_11 = 2, we have W W = I and hence D_1 := W R_1 W = −I. Next, we evaluate D_2 := W R_2 W by standard matrix multiplication and get (D_2)_ij = 2 W_{i1} W_{1j} = 2/N, so that D = D_1 + D_2 = −I + 2P. Thus, the only operations required for this quantum search algorithm are the Walsh-Hadamard transform and the conditional phase shift, and this makes the algorithm rather simple compared to many other known algorithms.
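The factorization D = W R W can be confirmed numerically (our own check, for n = 4):

```python
# Check the factorization D = W R W of the diffusion transform.
import numpy as np

n = 4
N = 2 ** n
M = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
W = np.array([[1.]])
for _ in range(n):
    W = np.kron(W, M)                      # W = M^{tensor n}

R = -np.eye(N)
R[0, 0] = 1.0                              # phase shift conditioned on |0...0>

D = -np.eye(N) + 2.0 / N                   # D_ij = 2/N - delta_ij
assert np.allclose(W @ W, np.eye(N))       # W is its own inverse
assert np.allclose(W @ R @ W, D)
```

Writing R = −I + 2E_11 (E_11 the matrix with a single 1 in the corner), the identity follows because W E_11 W is the rank-one matrix with all entries 1/N.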
The author wishes to acknowledge Peter Shor, Ethan Bernstein, Gilles Brassard, Norm Margolus, and John Preskill for helpful comments.

At the end we discuss the sharpness of the result by examining the majority function.

Introduction
We are studying the Fourier transform of Boolean functions, that is, functions which, in one dimension, take values in the two-element set {−1, 1}, and which in higher dimensions are maps f : {−1, 1}^N → {−1, 1}. N is often called the arity of the function [2]. The study of Boolean functions was popularized by their contributions to areas like complexity theory and computer science, where major results were discovered by studying their Fourier transforms. Bourgain references the work of Friedgut [3], who discovered a sharp bound for indicators of monotone subsets of {−1, 1}^N, as well as the work of Kahn, Kalai, and Linial, who studied how variables can influence these functions. Here, again, the analysis of the Fourier transform is crucial to obtaining the desired results.
Heuristically, the idea is that the higher the complexity of the property that f defines, the more spread out the support supp f̂ of the Fourier transform has to be. An application of this idea is the topic of Bourgain's paper [1], where we see in particular that the tail distribution of the Fourier transform of a function f that is not essentially determined by a few variables is bounded below, as described in Theorem 4.1.
Johan Håstad, in his work with Boolean functions in [4] and [5], was the first to raise the question about these tail distributions; however, his original estimate was of the order C^{−k}.
Throughout this summary we use 1_A to denote the usual indicator function of a subset A of the real numbers R or the integers Z. We also use [1, N] = {1, 2, ..., N} for the interval of integers, and any logarithms that appear are base 2. Finally, we use f̂ to denote the Fourier transform of a Boolean function f. Specifically, for a real function f : {−1, 1}^N → R we have the expansion f(x) = Σ_{S ⊆ [1,N]} f̂(S) Π_{i∈S} x_i, with f̂(S) = E[f(x) Π_{i∈S} x_i].
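The expansion and the associated Parseval identity Σ_S f̂(S)^2 = E[f^2] = 1 can be verified by brute force on small examples. The following Python sketch (our own illustration, using the 3-bit majority function) also computes the tail mass Σ_{|S|>k} f̂(S)^2 that the theorem concerns:

```python
# Brute-force Fourier expansion of MAJ_3 on {-1,1}^3; check Parseval and the
# Fourier tail above level k = 1 (illustrative sketch, not from the paper).
from itertools import combinations, product
from math import prod

n = 3
points = list(product((-1, 1), repeat=n))
f = {x: (1 if sum(x) > 0 else -1) for x in points}

hat = {S: sum(f[x] * prod(x[i] for i in S) for x in points) / 2 ** n
       for r in range(n + 1) for S in combinations(range(n), r)}

# Pointwise reconstruction f(x) = sum_S hat f(S) prod_{i in S} x_i:
for x in points:
    assert abs(f[x] - sum(c * prod(x[i] for i in S) for S, c in hat.items())) < 1e-9
# Parseval: total Fourier weight is E[f^2] = 1.
assert abs(sum(c * c for c in hat.values()) - 1) < 1e-9
# Tail weight above level 1: only the cubic term -x1 x2 x3 / 2 contributes.
tail = sum(c * c for S, c in hat.items() if len(S) > 1)
assert abs(tail - 0.25) < 1e-9
```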

The main result
Bourgain's main result is Proposition 1 of [1], which is presented below.
The implied constant C ε in (4.2) depends on ε but is independent of k.
We shall devote the rest of the section to a brief sketch of the proof. The first step is to define a subinterval I_0 of [1, N] containing those integers i for which the quantity Σ_{|S|>k} |f̂(S)|^2, restricted to sets S containing i, is "large". One can then bound the size of I_0 and use that bound to show that a suitable smallness estimate holds on I_0, provided that (4.1) holds. We then focus on the complement I_0^c = [1, N] \ I_0 and use the aforementioned estimate to control the restriction to I_0^c. The next step is to consider certain dyadic sums and to show that for arbitrary 0 ≤ t_0 ≤ log k and 1 < p < 2 the estimate (4.3) holds. Proving (4.3) is the biggest part of the proof, and involves carefully decomposing x = (x_1, x_2) so that x_1 ∈ {−1, 1}^{I_1}, for a subset I_1 of I_0^c satisfying certain growth criteria, as well as bounds for the expectations of the sizes of various intersections with S. Now, considering f as a function of two variables, we study its Fourier transform with respect to each of them and obtain bounds for each. Working on the Fourier side, it is possible to ultimately obtain (4.3).
The last step is to consider two cases, Σ_{t ≤ log k} 2^t ρ_t < √k and Σ_{t ≤ log k} 2^t ρ_t ≥ √k, and to show that in either case the right-hand side of (4.3) satisfies the desired lower bound of (4.2), which completes the proof.

The majority function as an example of sharpness
Bourgain presents the following corollary as an immediate consequence of Theorem 4.1, for a positive integer K satisfying a suitable assumption. We now discuss how Bourgain established that the lower bound presented in (4.4) is a sharp one.
It is shown in [7] that the majority function satisfies the matching upper bound, and therefore one can easily see that the lower bound (4.4) cannot be improved.

Abstract. It is shown in [Y04] that in a threshold activation control system with a linear weighted majority function, boolean noise with probability ε produces a difference in decision with probability O(√ε).

Main Results
Noise sensitivity of boolean functions has attracted a lot of attention in the last decades. In [Y04], Y. Peres studies the noise sensitivity of weighted majority functions f(x) = sgn(Σ_{i=1}^N w_i x_i − t), where w_i ∈ R are given weights, t ∈ R is a given threshold, and sgn(y) = y/|y| for y ≠ 0, with the convention sgn(0) = 0. The main result of the paper [Y04] is the following: if X is uniform on {−1, 1}^N and each σ_i independently equals −1 with probability ε and +1 otherwise, then P[f(X) ≠ f(σX)] ≤ 1.92 √ε, where σX = (σ_1 X_1, ..., σ_N X_N). In fact we have a stronger estimate in terms of m = ⌈ε^{−1}⌉ and B_m ≡ Bin(m, 1/2). Earlier work had an exponent 1/4 and asked whether it can be improved; indeed, the exponent 1/2 in Theorem 5.1 is optimal, since for the classical majority function (w_i = 1 for all i) the limit as N → ∞ of the noise sensitivity is known [G66]. Hence we cannot replace the constant 1.92 by anything smaller than √(2/π) = 0.797....
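The bound P[f(X) ≠ f(σX)] ≤ 1.92 √ε can be checked exactly for small N by exhausting all inputs and all flip patterns. The Python sketch below (our own check, with N = 9, ε = 0.04, and the constant 1.92 quoted in the text) does this for the classical majority:

```python
# Exact noise sensitivity of MAJ_9 under epsilon-noise, by full enumeration
# of the 2^9 inputs and 2^9 flip patterns (illustrative sketch).
from itertools import product

N, eps = 9, 0.04
maj = lambda x: 1 if sum(x) > 0 else -1

p_diff = 0.0
for x in product((-1, 1), repeat=N):
    fx = maj(x)
    for flips in product((0, 1), repeat=N):
        k = sum(flips)
        y = tuple(-xi if fi else xi for xi, fi in zip(x, flips))
        if maj(y) != fx:
            # weight: uniform over x, each bit flipped independently w.p. eps
            p_diff += (eps ** k) * ((1 - eps) ** (N - k)) / 2 ** N

assert 0 < p_diff <= 1.92 * eps ** 0.5     # the theorem's bound holds
```

For this small N the computed probability is far below the bound; the constant only becomes tight in the large-N limit discussed above.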

Proof of the Main Result
First note that we can assume t = 0 and w_i > 0. The main idea is to write P[f(X) ≠ f(σX)] as an expectation. Let m = ⌈ε^{−1}⌉, consider a partition of [N] into m + 1 sets A_j for j = 0, ..., m, and define the corresponding partial sums f_{A_j}, with the convention that f_∅ = 0. We then select the sets A_j randomly. Peres then shows a key identity expressing P[f(X) ≠ f(σX)] as the expectation of a quantity Q. If f(X) = 0, then Q is bounded directly; and if sgn f(X) = ±1, then Q is controlled using a standard estimate for the expectation E[|Bin(m, 1/2) − m/2|].

A summary written by Dylan Langharst

Abstract. We analyze a black-box model and determine the number of queries a quantum algorithm in this model requires to compute Boolean functions on {0, 1}^N. We show that the exponential speed-up obtained by certain algorithms for partial functions cannot be achieved for any total function. In the exact, zero-error and bounded-error settings, asymptotic estimates for the number of queries T are given. These results rest on a quantum extension of the polynomial method.

Introduction and Definitions
A boolean variable is a variable that takes on the values 0, 1; an N-tuple of Boolean variables shall be denoted X = (x_0, x_1, ..., x_{N−1}). A black-box model is a form of computation where, given the input i, the black-box outputs the bit x_i. Accessing a bit x_i through the black-box is called a query. A function f : {0, 1}^N → {0, 1} is a Boolean function, and is called a property of X. The goal of this paper is to compute such properties using as few queries as possible. Quantum mechanics allows a drastic increase in the efficiency of algorithms designed to accomplish this task. For example, computing OR_N(X), i.e. determining whether any bit x_i of X equals 1, classically (i.e. deterministically or probabilistically) requires Θ(N) queries. However, by using the concept of superposition, Grover [2] was able to construct a quantum algorithm using only O(√N) queries: different values of i can be in superposition, so a query can access different input bits x_i, each with some probability amplitude, simultaneously.
A promise is a model with some constraint. For example, consider the black-box model with N = n 2^n, and view the black-box as a function X̂ : {0, 1}^n → {0, 1}^n. Suppose we have the constraint, or promise, that there exists an s ∈ {0, 1}^n such that X̂(i) = X̂(j) if, and only if, i = j ⊕ s (bitwise XOR). Simon's problem is, given this scenario, to compute whether s is the n-tuple of 0's. The quantum algorithm for such a task requires O(n) applications of X̂, while classically Ω(√(2^n)) queries are required. The fact that there is a promise means that Simon's problem is partial, as the associated f : {0, 1}^N → {0, 1} is not defined on all X ∈ {0, 1}^N, but only on those that satisfy the promise.
The goal of this paper is to establish upper and lower bounds for the black-box complexity of several functions and classes of functions in the quantum computing setting. In particular, it will be shown that the exponential speed-up, discussed in the example of Simon's problem, cannot be obtained by a quantum algorithm for an arbitrary total function; as in the OR_N(X) example, only a polynomial speed-up is possible in general. The main step is the translation of quantum algorithms that make T queries into multilinear polynomials of degree at most 2T over N variables; this is a quantum extension of the polynomial method. Three different settings for computing f on {0, 1}^N in the black-box model will be discussed: the exact setting, where an algorithm must return f(X) with certainty for every X; the zero-error setting, where, for every X, the result "inconclusive" can have probability at most 1/2, and any other returned result must be correct; and the two-sided bounded-error setting, or Monte Carlo algorithms, where, for every X, the algorithm must return the correct answer with probability at least 2/3.
Throughout, X will be an N-tuple, with N an arbitrary positive integer unless specified. The Hamming weight of X, denoted |X|, is the number of 1's in X. We say f is symmetric if f(X) depends only on |X|. We will be interested in symmetric functions, non-symmetric functions, and the functions AND, OR, PARITY, and MAJORITY, defined as follows: OR_N(X) = 1 iff |X| > 0, AND_N(X) = 1 iff |X| = N, PARITY_N(X) = 1 iff |X| ≡ 1 (mod 2), and MAJ_N(X) = 1 iff |X| > N/2.
A multilinear N-variate polynomial p : R^N → R represents a function f if p(X) = f(X) for all X ∈ {0, 1}^N. If such a p exists then it is unique and has degree ≤ N; its degree is denoted deg(f). If |p(X) − f(X)| ≤ 1/3 for all X ∈ {0, 1}^N, then we say p approximates f, and the approximate degree d̃eg(f) is the degree of a minimum-degree polynomial p that approximates f. If S_N is the symmetric group on {0, 1, ..., N − 1} and π is any permutation, then π(X) = (x_{π(0)}, ..., x_{π(N−1)}) and the symmetrization of a polynomial p is given by (6.1) p_sym(X) = (1/N!) Σ_{π ∈ S_N} p(π(X)).
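The symmetrization (6.1) is used later via the fact that p_sym depends only on the Hamming weight |X|. A minimal Python check of this (our own example: p(X) = x_0 x_1 over N = 3 variables, for which p_sym(X) = q(|X|) with q(t) = t(t−1)/6 of the same degree 2):

```python
# Check that the symmetrization of p(X) = x0*x1 over N = 3 variables depends
# only on |X|, matching q(t) = t(t-1)/6 (illustrative sketch).
from itertools import permutations, product

N = 3
p = lambda x: x[0] * x[1]

def p_sym(x):
    """(1/N!) sum over permutations pi of p(x_{pi(0)}, ..., x_{pi(N-1)})."""
    perms = list(permutations(range(N)))
    return sum(p(tuple(x[pi[i]] for i in range(N))) for pi in perms) / len(perms)

for x in product((0, 1), repeat=N):
    t = sum(x)                               # Hamming weight |X|
    assert abs(p_sym(x) - t * (t - 1) / 6) < 1e-9
```

Averaging x_{π(0)} x_{π(1)} over all permutations yields (x_0 x_1 + x_0 x_2 + x_1 x_2)/3, which on 0/1 inputs equals C(|X|, 2)/3, as asserted.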

Quantum Networks
Throughout we will assume f is a Boolean function on N-tuples X, and a black-box query on i returns the bit x_i of X. A classical algorithm that computes f using black-box queries is a decision tree; its cost is the number of queries made on the worst-case input X. A quantum network with T queries is a sequence of unitary operations on m qubits interleaved with T queries. Writing K = {0, 1, ..., 2^m − 1}, a superposition state |φ⟩ is given by |φ⟩ = Σ_{k∈K} α_k |k⟩, with α_k ∈ C and Σ_{k∈K} |α_k|^2 = 1; the probability of measuring |k⟩ is |α_k|^2. The initial state will always be taken to be |0⟩. Queries act in the following way: let ⊕ denote addition mod 2 (exclusive-or); then, if i is log N bits, b is one bit and z is the remaining workspace, a query maps the basis state |i, b, z⟩ to |i, b ⊕ x_i, z⟩. The right-most qubit of the final state of a network is the output bit. If this output equals f(X) with certainty for every X, then the network computes f exactly. If the output equals f(X) with probability at least 2/3, then the bounded error probability is said to be at most 1/3. For the zero-error setting, the two rightmost qubits are observed: if the first of them is 0, the network outputs "inconclusive"; otherwise, the second qubit should contain f(X) with certainty. The minimum number of queries required by a quantum network to compute f will be denoted Q_E(f), Q_0(f) and Q_2(f) for the exact, zero-error and bounded-error settings, respectively.

Peremptory Lemmas
Lemma 6.1. Let N be a quantum network that makes T queries to a black-box X. Then, there exist complex-valued N-variate multilinear polynomials p_0, . . . , p_{2^m−1}, each of degree at most T, such that the final state of the network is the superposition Σ_{k∈K} p_k(X)|k⟩ for any black-box X.
The main argument is that, if the amplitude of |i, 0, z⟩ is α and the amplitude of |i, 1, z⟩ is β before a query, then after the query the amplitudes are (1 − x_i)α + x_i β and x_i α + (1 − x_i)β respectively (which are polynomials of degree 1); one then continues counting after each query. Furthermore, by splitting each polynomial p_k into real and imaginary parts, one obtains the following lemma:

Lemma 6.2. Let N be a quantum network that makes T queries to a black-box X, and let B be a set of basis states. Then, there exists a real-valued multilinear polynomial P(X) of degree at most 2T which equals the probability that observing the final state of the network with black-box X yields a state from B.

The exact and zero-error settings
Theorem 6.3. For every Boolean function f, Q_E(f) ≥ deg(f)/2.

Proof. Suppose a quantum network computes f exactly using T = Q_E(f) queries. Its acceptance polynomial represents f, so its degree is at least deg(f). But from Lemma 6.2, this degree is bounded above by 2T, and the conclusion follows.
In [5], Nisan and Szegedy showed that, if f is a Boolean function that depends on N variables, then deg(f) ≥ log N − O(log log N). Combining this and Theorem 6.3, we obtain the following: Q_E(f) ≥ (1/2) log N − O(log log N) for any f depending on all N variables. Suppose p : R^N → R is a multilinear polynomial. Then, it was shown in [3] that there exists a polynomial q : R → R of degree at most deg(p) such that p^{sym}(X) = q(|X|) for all X ∈ {0, 1}^N. Letting T = Q_0(f) and using this fact and Lemma 6.2 on the set of basis states that have 11 as the rightmost bits, one obtains the following.
Theorem 6.5. If f is non-constant and symmetric, then Q_0(f) ≥ (N + 1)/4. We conclude this section by remarking that the above shows that non-constant symmetric functions like OR_N, AND_N, etc. require at least (N + 1)/4 queries to be computed exactly or with zero error on a quantum network. Thus, since N queries always suffice (even classically), one has Q_E(f), Q_0(f) ∈ Θ(N) for all non-constant symmetric f.

Lower Bounds for Bounded-Error Quantum Computation
From the definition of bounded error and Lemma 6.2, one immediately obtains the following.

Theorem 6.6. For every Boolean function f, Q_2(f) ≥ deg~(f)/2.
In the case of symmetric f, we can do better. Let f be symmetric, and denote f_k = f(X) for |X| = k. Define Γ(f) = min{|2k − N + 1| : f_k ≠ f_{k+1}, 0 ≤ k ≤ N − 1}.

Theorem 6.7. If f is non-constant and symmetric, then Q_2(f) ∈ Θ(√(N(N − Γ(f)))).

Lower Bounds from Block Sensitivity
In Section 6.3, we saw that the minimum number of queries for a quantum network in the various settings is bounded below in terms of polynomial degree. Besides polynomials, block sensitivity provides another method for bounding these quantities.
Through a series of counting arguments, one can show the following.
Theorem 6.8. If f is a Boolean function, then Q_2(f) ∈ Ω(√(bs(f))).

Polynomial Relation for Classical and Quantum Complexity and Specific Functions
Let D(f) be the decision tree complexity of f, that is, the cost of the best decision tree that (classically) computes f. Similarly, let R(f) be the worst-case number of queries for randomized algorithms that (classically) compute f(X) with error probability ≤ 1/3 for all X. We will state, without proof, various relations between D(f), R(f), Q_E(f), Q_0(f), and Q_2(f).
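For small N, D(f) can be computed exactly by a minimax recursion over restrictions; the following brute-force sketch (ours, exponential in N) directly implements the definition of the best decision tree:

```python
from functools import lru_cache
from itertools import product

def D(f, n):
    """Deterministic decision tree complexity of f on {0,1}^n, by the
    minimax recursion over restrictions (exponential; small n only).
    A restriction is a tuple with entries 0, 1, or None (unqueried)."""

    def completions(rho):
        free = [i for i, b in enumerate(rho) if b is None]
        for bits in product((0, 1), repeat=len(free)):
            x = list(rho)
            for i, b in zip(free, bits):
                x[i] = b
            yield tuple(x)

    @lru_cache(maxsize=None)
    def cost(rho):
        if len({f(x) for x in completions(rho)}) == 1:
            return 0  # f is already determined: no more queries needed
        return min(1 + max(cost(rho[:i] + (0,) + rho[i + 1:]),
                           cost(rho[:i] + (1,) + rho[i + 1:]))
                   for i, b in enumerate(rho) if b is None)

    return cost((None,) * n)
```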
Theorem 6.9. Let f : {0, 1}^N → {0, 1} be a Boolean function and X ∈ {0, 1}^N. Then, the following hold (some of which were already known): We remark that Item Four in Theorem 6.9 implies that if a quantum algorithm computes f with bounded error using T queries, then a classical algorithm needs at most O(T^6) queries. Item Five states that if f is monotonically increasing (decreasing), that is, changing any input bit from 0 to 1 never decreases (increases) the value of f, then one only needs O(T^4) queries. Furthermore, if f is symmetric, then Theorem 6.7 yields Q_2(f) ∈ Ω(√N), and so a classical algorithm needs only O(T^2) queries. We conclude by stating the following calculations for specific functions.

Complexity Measures of Boolean Functions
Further relations between these measures were proved in [2]. A long-standing conjecture asked whether block sensitivity can be bounded by a polynomial in sensitivity. The monomial X_S of an index set S ⊆ {1, . . . , n} is defined as the product of variables X_S = Π_{i∈S} x_i. If a function p : R^n → C can be written p(x) = Σ_S c_S X_S for c_S ∈ C, then we call p a multilinear polynomial with degree deg(p) = max{|S| : c_S ≠ 0}. A polynomial p : R^n → R represents f if p(x) = f(x) on all Boolean inputs x. Every Boolean function can be represented by a unique multilinear polynomial p : R^n → R, and we define the degree deg(f) as the degree deg(p) of this representing polynomial. It was proved in [3] that deg(f) ≥ log n − O(log log n) if f depends on all n variables. The approximate degree deg~(f) of f is defined as the minimum degree of any multilinear polynomial p : R^n → R such that |p(x) − f(x)| ≤ 1/3 for any Boolean input x ∈ {0, 1}^n. Ambainis ([4]) proved that almost all functions f have high approximate degree. In 1994, Nisan and Szegedy ([3]) proved that bs(f) ≤ 6 deg~(f)^2 (7.6) for any Boolean function f. They also put forth the conjecture that sensitivity is polynomially related to the other complexity measures, which has been resolved recently in [5].
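Sensitivity and block sensitivity can be computed by brute force on small examples; the following sketch (ours) follows the definitions directly:

```python
from itertools import combinations, product

def flip(x, block):
    """Flip the coordinates of x indexed by the given block."""
    return tuple((1 - b if i in block else b) for i, b in enumerate(x))

def sensitivity(f, n):
    """s(f): max over inputs of the number of sensitive bits."""
    return max(sum(f(x) != f(flip(x, {i})) for i in range(n))
               for x in product((0, 1), repeat=n))

def block_sensitivity(f, n):
    """bs(f): max over inputs x of the max number of *disjoint*
    blocks B with f(x^B) != f(x) (brute force; small n only)."""
    best = 0
    blocks = [set(S) for d in range(1, n + 1)
              for S in combinations(range(n), d)]
    for x in product((0, 1), repeat=n):
        sens = [B for B in blocks if f(flip(x, B)) != f(x)]
        def most_disjoint(used, start):
            m = 0
            for j in range(start, len(sens)):
                if sens[j].isdisjoint(used):
                    m = max(m, 1 + most_disjoint(used | sens[j], j + 1))
            return m
        best = max(best, most_disjoint(set(), 0))
    return best
```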

Decision Trees
A deterministic decision tree for a Boolean function f : {0, 1}^n → {0, 1} is a rooted ordered binary tree in which each internal node is assigned an input bit x_i and each leaf is assigned either 0 or 1. The computation proceeds by querying the input bit assigned to the root and descending to the left (right) subtree if the returned value is 0 (1, respectively); we repeat the procedure recursively until we reach a leaf. A decision tree is said to compute f if the outputs of the tree coincide with the outputs of f for all Boolean inputs. The decision tree complexity D(f) of f is the minimal depth of a tree that computes f. We can add randomness to the decision tree by including coin-flip nodes with bias p ∈ (0, 1): we descend to the left (right) subtree if the outcome of the coin flip is heads (tails). Such a tree is said to compute f with bounded error if the output of the tree equals f(x) with probability at least 2/3 for all Boolean inputs x. The corresponding tree complexity R_2(f) is the minimal depth of a tree that computes f with bounded error. The quantum decision tree is usually referred to as a quantum query algorithm or quantum black-box algorithm in the literature; here we work with qubits instead of classical binary bits. A T-query quantum decision tree is defined by an initial state |0⟩ and a sequence of unitary transformations U_0, O, U_1, O, . . . , O, U_T, where O is the query unitary transformation. Here the U_i are independent of the choice of the Boolean input, and the output of the quantum decision tree depends on the input only through the T queries made via O. The quantum decision tree is said to compute the function f exactly if the output of the quantum algorithm coincides with f(x) for any Boolean input x. It is said to compute f with bounded error if the output of the quantum algorithm equals f(x) with probability at least 2/3 for any Boolean input x.
Let Q_E(f) and Q_2(f) be the minimal number of queries of a quantum decision tree that computes f exactly and with bounded error, respectively. Every depth-T deterministic decision tree can be simulated by a T-query quantum decision tree without error, and every depth-T randomized decision tree can be simulated by a T-query quantum decision tree with bounded error. Thus we have Q_E(f) ≤ D(f) and Q_2(f) ≤ R_2(f) for any Boolean function f.

Applications to Decision Tree Complexity
A natural question is how the complexity measures C(f), s(f), bs(f), deg(f), and deg~(f) of Boolean functions are related to the decision tree complexities D(f), R_2(f), Q_E(f), and Q_2(f). It has been proved that these quantities are all polynomially related. We first summarize the relationship between the deterministic decision tree complexity D(f) and the complexity measures of Boolean functions f; for any Boolean function f, polynomial bounds hold. It still remains unknown whether D(f) can be bounded by block sensitivity quadratically. If f : {0, 1}^{k^2} → {0, 1} is the AND of k ORs of k variables each, then D(f) = bs(f)^2 = n. Thus the best one could hope for is the bound D(f) ≤ bs(f)^2 for all Boolean functions f.
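For k = 2 (so n = 4), the claim D(f) = bs(f)^2 = n can be checked mechanically; the following self-contained sketch (ours) recomputes both quantities for f = (x1 OR x2) AND (x3 OR x4):

```python
from functools import lru_cache
from itertools import combinations, product

n = 4  # k = 2 ORs of k = 2 variables each: n = k^2

def f(x):
    """AND of 2 ORs of 2 variables each."""
    return int((x[0] or x[1]) and (x[2] or x[3]))

def completions(rho):
    free = [i for i, b in enumerate(rho) if b is None]
    for bits in product((0, 1), repeat=len(free)):
        x = list(rho)
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)

@lru_cache(maxsize=None)
def cost(rho):  # minimax recursion for optimal decision tree depth
    if len({f(x) for x in completions(rho)}) == 1:
        return 0
    return min(1 + max(cost(rho[:i] + (0,) + rho[i + 1:]),
                       cost(rho[:i] + (1,) + rho[i + 1:]))
               for i, b in enumerate(rho) if b is None)

D = cost((None,) * n)

def flip(x, B):
    return tuple(1 - v if i in B else v for i, v in enumerate(x))

blocks = [set(S) for d in range(1, n + 1)
          for S in combinations(range(n), d)]

def bs_at(x):  # max number of disjoint sensitive blocks at input x
    sens = [B for B in blocks if f(flip(x, B)) != f(x)]
    def rec(used, start):
        m = 0
        for j in range(start, len(sens)):
            if sens[j].isdisjoint(used):
                m = max(m, 1 + rec(used | sens[j], j + 1))
        return m
    return rec(set(), 0)

bs = max(bs_at(x) for x in product((0, 1), repeat=n))
```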
This gap is not optimal, and the biggest gap between D(f) and R_2(f) remains conjectural. R_2(f) is bounded below by approximate degree and block sensitivity ([2]): bs(f) ≤ 3 R_2(f). (7.11) Q_E(f) and Q_2(f) are the quantum analogues of D(f) and R_2(f), respectively, and the biggest gap between Q_E(f) and Q_2(f) remains unknown as well.
[2] Nisan, N., CREW PRAMs and decision trees. SIAM Journal on Computing, 20(6), 999-1007 (1991).

Vector-valued Talagrand influence inequalities, after D. Cordero-Erausquin and A. Eskenazis [2]
A summary written by Sang Woo Ryoo
Abstract. Talagrand's influence inequality is an enhancement of the discrete Poincaré inequality for real-valued functions on the discrete hypercube. We state and prove Talagrand-type inequalities for functions on the discrete hypercube taking values in Banach spaces of Rademacher or martingale type 2. The proof builds upon the work of Ivanisvili, van Handel, and Volberg (2020), who proved the discrete Poincaré inequality for functions taking values in Banach spaces of Rademacher type 2, and uses Bonami's hypercontractive inequality and a vector-valued Littlewood-Paley-Stein inequality due to Xu (2020).

Introduction
Let C_n = {−1, 1}^n be the discrete hypercube, and let σ_n be the uniform probability measure on C_n. If (E, ‖·‖_E) is a Banach space and p ≥ 1, then we denote the L_p(σ_n; E) norm of a function f : C_n → E by ‖f‖_{L_p(σ_n;E)} = (E_{ε∼σ_n} ‖f(ε)‖_E^p)^{1/p}. We define the i-th partial discrete derivative of f by ∂_i f(ε) = (f(ε) − f(ε_1, . . . , −ε_i, . . . , ε_n))/2. When E = C, the discrete Poincaré inequality tells us that for f : C_n → C,

(8.1) ‖f − E f‖²_{L_2(σ_n)} ≤ Σ_{i=1}^n ‖∂_i f‖²_{L_2(σ_n)}.

Talagrand's influence inequality [5] provides an asymptotic improvement over the discrete Poincaré inequality: there exists C > 0 such that for all f : C_n → C,

(8.2) ‖f − E f‖²_{L_2(σ_n)} ≤ C Σ_{i=1}^n ‖∂_i f‖²_{L_2(σ_n)} / (1 + log(‖∂_i f‖_{L_2(σ_n)}/‖∂_i f‖_{L_1(σ_n)})).

One may inquire whether analogous phenomena happen for general Banach spaces E. At the very least, we should require (8.1) to be true (up to constant factors) for linear functions f(ε) = Σ_{i=1}^n ε_i x_i, x_i ∈ E: there should exist T > 0 such that

(8.3) E‖Σ_{i=1}^n ε_i x_i‖²_E ≤ T² Σ_{i=1}^n ‖x_i‖²_E.

We say that E has Rademacher type 2 with constant T if (8.3) holds. The recent breakthrough of Ivanisvili, van Handel, and Volberg [3] asserts that then the discrete Poincaré inequality holds:

Theorem 8.1 ([3]). There is a universal constant C > 0 such that the following is true. Let (E, ‖·‖_E) be a Banach space having Rademacher type 2 with constant T. Then for any n ∈ N and f : C_n → E,

‖f − E f‖²_{L_2(σ_n;E)} ≤ C T² Σ_{i=1}^n ‖∂_i f‖²_{L_2(σ_n;E)}.

Cordero-Erausquin and Eskenazis [2] enhance the approach of [3] to prove a near-optimal analogue of Talagrand's influence inequality.
It is unknown whether the proper Talagrand influence inequality (8.2) holds for all Banach spaces with Rademacher type 2. It does hold, however, under the stronger assumption that E has martingale type 2, i.e., there exists M > 0 such that for every n ∈ N, probability space (Ω, F, µ), and filtration (F_k)_{k≥0}, every E-valued martingale (M_k)_{k≥0} satisfies E‖M_n − M_0‖²_E ≤ M² Σ_{k=1}^n E‖M_k − M_{k−1}‖²_E.

Theorem 8.3 ([2], Theorem 2). Let E be a Banach space with martingale type 2. Then, there exists C(E) ∈ (0, ∞) such that for every n ∈ N and f : C_n → E,

‖f − E f‖²_{L_2(σ_n;E)} ≤ C(E) Σ_{i=1}^n ‖∂_i f‖²_{L_2(σ_n;E)} / (1 + log(‖∂_i f‖_{L_2(σ_n;E)}/‖∂_i f‖_{L_1(σ_n;E)})).

We will prove Theorem 8.2 in Section 8.2 and Theorem 8.3 in Section 8.3.

Proof of Theorem 8.2
We will first sketch the proof of Theorem 8.1 given by [3], and then describe the modifications made by [2] which lead to Theorem 8.2. We consider the heat flow on C_n relative to the discrete Laplacian ∆. The heat kernel at time t is given by the random vector ξ(t) = (ξ_1(t), · · · , ξ_n(t)) ∈ C_n, P{ξ_i(t) = ±1} = (1 ± e^{−t})/2, whose coordinates are independent, so that the time-t evolute of f : C_n → E is P_t f(ε) = E_{ξ(t)} f(εξ(t)). We also denote the centered normalization δ(t) = (δ_1(t), · · · , δ_n(t)) of ξ(t), given by δ_i(t) = (ξ_i(t) − e^{−t})/√(1 − e^{−2t}). The key idea of [3] is as follows. First, a straightforward computation gives an identity expressing the discrete derivatives of f in terms of δ(t) and P_t, and so by convexity one obtains a pointwise bound, where we used that (ε, ξ(t)) and (ε, εξ(t)) are equal in distribution. Due to a result of Ledoux and Talagrand [4, Proposition 9.11], since the δ_i are centered and normalized, we may use the type condition on the expectation with a slightly worse constant, and so we arrive at the inequality (8.6). The idea of [2] is to replace f by P_t f in (8.6) (we replace ∂/∂t by ∆ to avoid confusion, and we use the fact that P_t and ∂_t commute), and then to apply Bonami's hypercontractive inequality [1],

‖P_t g‖_{L_2(σ_n;E)} ≤ ‖g‖_{L_{1+e^{−2t}}(σ_n;E)}, for all g : C_n → E,

to obtain (8.7). One can show by calculus that, for any g : C_n → E, the L_{1+e^{−2t}}(σ_n; E) norm interpolates between ‖g‖_{L_1(σ_n;E)} and ‖g‖_{L_2(σ_n;E)}.
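The action of P_t can be checked directly on small cubes: on a monomial ε_S = Π_{i∈S} ε_i it acts by multiplication by e^{−t|S|}, since E ξ_i(t) = e^{−t}. A small numerical sketch (ours, for scalar-valued f):

```python
import math
from itertools import product

def Pt(f, t, n):
    """Exact heat semigroup on {-1,1}^n: (P_t f)(eps) = E f(eps * xi(t)),
    where the xi_i(t) are independent with P{xi_i = ±1} = (1 ± e^{-t})/2."""
    def g(eps):
        total = 0.0
        for xi in product((-1, 1), repeat=n):
            prob = 1.0
            for x in xi:
                prob *= (1 + x * math.exp(-t)) / 2
            total += prob * f(tuple(e * x for e, x in zip(eps, xi)))
        return total
    return g

# Check the eigenfunction property on the monomial eps_1 * eps_2 (|S| = 2):
n, t = 3, 0.7
chi = lambda eps: eps[0] * eps[1]
Ptchi = Pt(chi, t, n)
for eps in product((-1, 1), repeat=n):
    assert abs(Ptchi(eps) - math.exp(-2 * t) * chi(eps)) < 1e-12
```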

Proof of Theorem 8.3
The starting point of the proof is the following vector-valued Littlewood-Paley-Stein inequality due to Xu [6]:

Theorem 8.4 ([6], Theorem 2). Let (E, ‖·‖_E) be a Banach space with martingale type 2. Then there exists C(E) > 0 such that for a symmetric diffusion semigroup {T_t}_{t≥0} on a probability space (Ω, µ), the associated square function is bounded with constant C(E).

We now proceed with (8.7). One can show by calculus that, for any g : C_n → E, the resulting quantity gains a factor of 1 + log(‖g‖_{L_2(σ_n;E)}/‖g‖_{L_1(σ_n;E)}).

Abstract. We outline a paper by Michel Talagrand in which he proves the existence of a 'threshold' effect for the measures of sufficiently nice subsets of the discrete cube as the mass of the cube becomes more concentrated towards a single vertex. Some of the more informative proofs are explained in detail while others are more tersely summarized.
Given p ∈ [0, 1], consider the product measure µ_p on the discrete cube {0, 1}^n in which 0 is given weight 1 − p and 1 is given weight p; i.e., writing x = (x_1, ..., x_n) ∈ {0, 1}^n and |x| = Σ_i x_i, we have µ_p({x}) = (1 − p)^{n−|x|} p^{|x|}. In his paper "On Russo's Approximate Zero-One Law," Michel Talagrand investigates a so-called "threshold effect" in this measure, namely that for specific types of subsets A of the discrete cube, the measure µ_p(A) increases from near 0 to near 1 as p varies within a very small subinterval of [0, 1]. In all cases, we assume A to be a monotone subset of the discrete cube, that is, for any point x ∈ A, any other point y ∈ {0, 1}^n whose coordinates pointwise dominate those of x must also be in A. On a purely intuitive level, this threshold effect has been demonstrated to exist for subsets that are essentially determined by very few coordinates. The author expands on a result by Russo in which the threshold effect exists as soon as A depends little on any given coordinate, although he adapts Russo's definition as follows. Given x = (x_1, ..., x_n) ∈ {0, 1}^n, let U_i(x) = (x_1, ..., x_{i−1}, 1 − x_i, x_{i+1}, ..., x_n) and set A_i = {x ∈ {0, 1}^n : x ∈ A, U_i(x) ∉ A}. Since by definition a monotone subset A of the discrete cube must contain U_i(x) whenever x_i = 0 and x ∈ A, the set A_i consists of the points of A whose membership is not forced by the presence of the point "directly underneath x" in the i-th coordinate direction; this encodes the idea of "points in A that depend on the i-th coordinate." This brings us to the primary result Talagrand presents in the paper.

Theorem. There exists a universal constant K such that, for any p and any monotone subset A of {0, 1}^n, we have

(9.1) µ_p(A)(1 − µ_p(A)) ≤ K log(2/(p(1 − p))) Σ_{i≤n} µ_p(A_i) / log(1/((1 − p)µ_p(A_i))),

which gives rise to the following corollaries:

Corollary. Let ε = sup_{0≤p≤1} sup_i µ_p(A_i). Then, for p_1 < p_2, we have a quantitative comparison between µ_{p_1}(A) and µ_{p_2}(A), where K is universal.
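The threshold effect itself is easy to observe numerically for a concrete monotone set; the sketch below (ours) uses the majority set A = {x : |x| ≥ 50} on n = 99 coordinates and watches µ_p(A) jump near p = 1/2:

```python
from math import comb

def mu_p_majority(n, k, p):
    """mu_p of the monotone set A = { x in {0,1}^n : |x| >= k }."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

n, k = 99, 50  # majority on 99 bits
ps = [i / 100 for i in range(1, 100)]
vals = [mu_p_majority(n, k, p) for p in ps]

# mu_p(A) is increasing in p, with a sharp jump around p = 1/2:
assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))
assert mu_p_majority(n, k, 0.4) < 0.05
assert mu_p_majority(n, k, 0.6) > 0.95
```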
Corollary. We have a lower bound on the dependence quantities µ_p(A_i) in terms of U, where K′ is universal and U = µ_p(A)(1 − µ_p(A))/(n log(2/(p(1 − p)))).

In essence all these corollaries illustrate that the presence of a threshold effect (encoded as restrictions on the quantity µ_p(A)(1 − µ_p(A))) is more quantifiable the less A depends on any given coordinate (the dependence on the i-th coordinate being encoded by the magnitudes µ_p(A_i)). In the case where p = 1/2 = µ_p(A), one can prove Corollary 3 using harmonic analysis. Talagrand adapts these ideas in order to prove an analogous result in the more general setting where the techniques of harmonic analysis are unavailable. Moreover, since Theorem 1 doesn't concern itself too much with the specifics of the set A, it can be derived from the following more general result concerning functions on {0, 1}^n:

Theorem. For some numerical constant K, every function f : {0, 1}^n → R with ∫ f dµ_p = 0 satisfies an analogous influence inequality.
Here ‖f‖_q denotes the L_q(µ_p) norm. Theorem 1 is an immediate consequence of Theorem 2, as soon as one observes how, for the centered indicator of A, the quantities appearing in Theorem 2 are expressed in terms of the µ_p(A_i). In addition, Talagrand proves another estimate that improves upon the result in Theorem 2, although Theorem 2 is still included for its ease of understanding and its sufficiency in deducing Theorem 1. For this, let ϕ(x) = x²/log(e + x) for x ≥ 1. For a function f we will consider the associated Orlicz norm ‖f‖_ϕ, which is used in the following result:

Theorem. There is a universal constant K such that each f satisfies an Orlicz-norm refinement of Theorem 2.

The introduction concludes with a proof of the following claim:

Claim. For each p, the estimate in Theorem 1 is sharp.
Proof. Case 1: p < 1/2. Let k ≥ 1 and assume that r = p^{−k} is an integer. For n = kr, consider points in {0, 1}^n as r k-tuples of coordinates. Let A denote the set of points in {0, 1}^n such that at least one k-tuple of coordinates consists of 1's only. We can compute that µ_p(A) = (1 − p^k)^r, which approximates e^{−1} closely for sufficiently large r. This tells us that the left-hand side of (9.1) is of constant order. Furthermore, we have that for each i, µ_p(A_i) = p^k (1 − p^k)^{r−1}, which by the same logic approximates p^k/e, so that nµ_p(A_i) is of order k. Furthermore, log(1/((1 − p)µ_p(A_i))) ≈ k log(1/p), which gives us that the RHS of (9.1) is also of order 1.
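These approximations can be checked numerically; the following sketch (ours) takes p = 1/3 and k = 7:

```python
import math

p, k = 1/3, 7
r = round(p ** -k)           # r = p^{-k} = 3^7 = 2187, an integer
n = k * r
mu_A = (1 - p**k) ** r                  # mu_p(A)
mu_Ai = p**k * (1 - p**k) ** (r - 1)    # mu_p(A_i)

# mu_p(A) is approximately 1/e:
assert abs(mu_A - math.exp(-1)) < 1e-3
# n * mu_p(A_i) is of order k (approximately k/e):
assert abs(n * mu_Ai - k / math.e) < 1e-2
# log(1/((1-p) mu_p(A_i))) is comparable to k log(1/p):
ratio = math.log(1 / ((1 - p) * mu_Ai)) / (k * math.log(1 / p))
assert 0.7 < ratio < 1.3
```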
As a tool in some of the proofs, Talagrand introduces the following collection of functions in L_2(µ_p). Given a subset S ⊆ {1, ..., n}, write r_S = Π_{i∈S} r_i, where r_i(x) = (x_i − p)/√(p(1 − p)). Noting that r_∅ ≡ 1, we have that {r_S}_{S⊆{1,...,n}} forms an orthogonal basis for L_2(µ_p). Given g = Σ_S a_S r_S such that a_∅ = ∫ g dµ_p = 0, define M(g)² = Σ_S a_S²/|S|. The quantity M(g) is important to the results presented in the paper (specifically Theorem 2) insofar as, for f on {0, 1}^n with ∫ f dµ_p = 0, we can write f = Σ_S b_S r_S, b_∅ = 0. We note that ∆_i has been defined in such a way that ∆_i(r_S) = 0 if i ∉ S and ∆_i(r_S) = r_S if i ∈ S. Thus, a series of computations allows us to deduce that Σ_{i≤n} M(∆_i f)² = Σ_S b_S² = ‖f‖_2². Talagrand then proceeds to prove the following important property of the basis {r_S}_{S⊆{1,...,n}}.
Lemma. Let q ≥ 2 and let θ be the associated constant. Then for any k and numbers {a_S}_{|S|=k}, we have ‖Σ_{|S|=k} a_S r_S‖_q ≤ θ^k (Σ_{|S|=k} a_S²)^{1/2}.

Proof.
Step 1: We have that the w_S form an orthonormal basis for L_2(λ), and moreover, by results from Fourier analysis, the operator in (9.5) is bounded.

Step 2: Equip the space H = {0, 1}^n × {0, 1}^n × {−1, 1}^n with the measure ν = µ_p ⊗ µ_p ⊗ λ and consider the function built from the coefficients a_S. By some simple computations, its norms over H can be compared with the quantities in the lemma. Furthermore, applying the results from Step 1 to the functions g_S(x, y) = Π_{i∈S}(r_i(x) − r_i(y)) with coefficients a_S, noting that |r_i(x) − r_i(y)| ≤ θ, some further computations yield the desired comparison.

Step 3: The result follows by combining the estimates obtained in Step 2 with one further elementary observation.
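The orthonormality of the system {r_S} used throughout can be verified directly on a small cube; a sketch (ours), with the normalized characters r_i(x) = (x_i − p)/√(p(1 − p)):

```python
import math
from itertools import combinations, product

p = 0.3
q = 1 - p
n = 3

def r(i, x):
    """Normalized coordinate character for mu_p: mean 0, variance 1."""
    return (x[i] - p) / math.sqrt(p * q)

def r_S(S, x):
    out = 1.0
    for i in S:
        out *= r(i, x)
    return out

def mu(x):
    """mu_p({x}) = p^{|x|} (1-p)^{n-|x|}."""
    out = 1.0
    for b in x:
        out *= p if b == 1 else q
    return out

subsets = [S for d in range(n + 1) for S in combinations(range(n), d)]
for S in subsets:
    for T in subsets:
        ip = sum(mu(x) * r_S(S, x) * r_S(T, x)
                 for x in product((0, 1), repeat=n))
        # <r_S, r_T> = 1 if S == T, and 0 otherwise
        assert abs(ip - (1.0 if S == T else 0.0)) < 1e-12
```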
Using duality via Hölder's inequality, one can obtain the following result from the previous lemma:

Proposition. Let g : {0, 1}^n → R and set a_S = ∫ r_S g dµ_p. Then (Σ_{|S|=k} a_S²)^{1/2} ≤ θ^k ‖g‖_{q′}, where q′ is the conjugate exponent of q.
Equation (3) combined with the next statement suffices to prove Theorem 2 (and by extension Theorem 1):

Proposition. For some universal constant K, if ∫ g dµ_p = 0, we have M(g)² ≤ K (log(2θ²)/log(e‖g‖_2/‖g‖_1)) ‖g‖²_2.

Proof. The proof of this proposition involves applying Proposition 1 with q = 3, q′ = 3/2. We also observe that for the sequence x_k = (2θ²)^k/k and any integer m, Σ_{k≤m} x_k ≤ 2x_m by previous observations. Thus, combining the results from our application of Proposition 1 and this observation, we obtain a bound on M(g)². One can then cleverly choose m as the largest integer such that (2θ²)^m ‖g‖²_{3/2} ≤ ‖g‖²_2. This results in the observation that (2θ²)^{m+1} ‖g‖²_{3/2} ≥ ‖g‖²_2, i.e., m + 1 ≥ 2 log(‖g‖_2/‖g‖_{3/2})/log(2θ²). Using both this and our initial constraint on m, we can plug this into what we have so far for M(g)² to get

M(g)² ≤ K (log(2θ²)/log(e‖g‖_2/‖g‖_{3/2})) ‖g‖²_2.

We finally obtain the desired result by noting that ‖g‖_{3/2}^{3/2} ≤ ‖g‖_1^{1/2} ‖g‖_2, which is in itself a simple consequence of the Cauchy-Schwarz inequality.
The remainder of the section is dedicated to the proof of Theorem 3, which itself involves exploiting a few key properties of the Orlicz norm ‖·‖_ϕ, recorded in the following:

Lemma. For a function f : {0, 1}^n → R, the Orlicz norm ‖f‖_ϕ enjoys the stated comparison properties.

This allows us to improve upon Proposition 2 as follows:

Proposition. For a universal constant K, we have the analogue of Proposition 2 with the Orlicz norm ‖·‖_ϕ.
The proof of this proposition involves manipulating the following family of seminorms: given h = Σ_S h_S r_S with h_∅ = 0, one defines a seminorm for each level of the expansion of h. The remaining proof proceeds similarly to the derivation of Proposition 2 from Proposition 1. Theorem 3 then follows immediately via an application of Proposition 3 to Equation (3). The remainder of the paper concerns deriving the corollaries of Theorem 1.
Proof of Corollary 1. The only ingredient required beyond Theorem 1 is what is commonly referred to as "Russo's formula," which expresses the derivative (d/dp) µ_p(A) in terms of the quantities µ_p(A_i); from it the remaining derivations are straightforward.
The computation deriving Corollary 2 follows from an application of Corollary 1 to the expression (d/dp) g(µ_p(A)), where g(x) = log(x/(1 − x)). Finally, Corollary 3 is a consequence of Theorem 1, the observation that x ↦ x log(1/x) is increasing for small x, and the fact that for x ≤ 1/2, y log(1/y) ≥ x implies y ≥ x/(K log(1/x)).

Abstract. This is a terse summary. The main theorem is stated along with related theorems. The proof of the main theorem is outlined. We use ‖f‖_p := (E|f|^p)^{1/p}, 1 ≤ p < ∞, to denote the associated L_p-norms.

Main Theorem and Related Theorems
For each S ⊆ [n], let |S| be the cardinality of S. Then f : {±1}^n → R is of degree d if f̂(S) = 0 for all |S| > d, and f is d-homogeneous if f̂(S) = 0 whenever |S| ≠ d. We will need the following consequence of hypercontractivity [1]. This result is closely related to many other topics such as Sidon sets, Boolean radii, and the Aaronson-Ambainis conjecture [3]. To compare, we also have Bohnenblust-Hille type inequalities for real polynomials on n-dimensional cubes [−1, 1]^n, with best constants BH^{≤d}_{[−1,1]}. Any element i of I(d, n) can be uniquely decomposed into the direct sum of some i_1 ∈ I(S, n) and some i_2 ∈ I(S̄, n).
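Fourier coefficients over the uniform measure on {±1}^n can be computed by direct averaging; the sketch below (ours) computes f̂(S) = E[f(x) Π_{i∈S} x_i] and checks degree and d-homogeneity on small examples:

```python
import math
from itertools import combinations, product

def fourier(f, n):
    """hat f(S) = E[f(x) * prod_{i in S} x_i] over uniform {±1}^n."""
    coeffs = {}
    for d in range(n + 1):
        for S in combinations(range(n), d):
            coeffs[S] = sum(f(x) * math.prod(x[i] for i in S)
                            for x in product((-1, 1), repeat=n)) / 2 ** n
    return coeffs

def is_homogeneous(f, n, d):
    """d-homogeneous: hat f(S) = 0 whenever |S| != d."""
    return all(abs(c) < 1e-12
               for S, c in fourier(f, n).items() if len(S) != d)

# MAJ_3 = (x1 + x2 + x3)/2 - x1*x2*x3/2 has degree 3 and is not homogeneous:
maj3 = lambda x: 1 if sum(x) > 0 else -1
F = fourier(maj3, 3)
```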

Proof of the d-homogeneous case
The following inequality is crucial and will also be used in the proof of the degree-d case.
Lemma ([2]). Let n ∈ N and 1 ≤ k ≤ d be integers. Then any scalar array (a_i)_{i∈I(d,n)} satisfies the mixed-norm inequality (11.7). The proof of (11.6) for BH^{=d}_{{±1}} is based on an inductive inequality (11.8), valid for some constant C(k, d) > 0 and any 1 ≤ k ≤ d. The desired bound (11.6) for BH^{=d}_{{±1}} then follows by applying (11.8) repeatedly to special values of k. Now we use a simple example to illustrate the proof of (11.8). Let (d, k) = (2, 1). Then for any 2-homogeneous function f(x) = Σ_{i<j} a_{ij} x_i x_j on {±1}^n we need to show

(Σ_{i<j} |a_{ij}|^{4/3})^{3/4} ≤ C ‖f‖_{{±1}^n}.

Here and in what follows, C > 0 is some constant that may differ from line to line. The proof consists of four steps.
Step 1: Put a ii := 0, i ∈ [n] and a ji := a ij for i < j.
Step 2: Apply the inequality (11.7) to (a_{ij})_{i,j∈[n]}.

Step 3: The hypercontractivity result (11.4) (with p = 1) controls the resulting quantity; by the definition of BH^{=1}_{{±1}}, the last term is bounded from above by BH^{=1}_{{±1}} sup_{x,y∈{±1}^n} Σ_{i≠j} a_{ij} x_i y_j.
Step 4: By a polarization result that we will discuss later, the right-hand side is bounded from above by C BH^{=1}_{{±1}} ‖f‖_{{±1}^n}. This finishes the proof. Now let us recall the polarization result used in the last step, which establishes

(11.9) sup_{x,y∈{±1}^n} Σ_{i≠j} a_{ij} x_i y_j ≤ C ‖f‖_{{±1}^n}.
Any degree-d function f : {±1}^n → R is the restriction of a unique polynomial (the tetrahedral polynomial) P = P_f : R^n → R that is affine in each variable. Moreover, ‖f‖_{{±1}^n} = ‖P_f‖_{[−1,1]^n}. This polynomial P is associated to a unique symmetric d-affine form L : (R^n)^d → R such that P(x) = L(x, . . . , x). When f is d-homogeneous, the form L is d-linear. For the above f(x) = Σ_{i<j} a_{ij} x_i x_j, L(x, y) = (1/2) Σ_{i≠j} a_{ij} x_i y_j, with the conventions of Step 1.
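The polarization identity P(x) = L(x, x) for this bilinear form is easy to verify numerically; a sketch (ours), with random coefficients and the symmetric extension a_ji = a_ij, zero diagonal, handled implicitly:

```python
import random
from itertools import product

n = 4
random.seed(0)
a = {(i, j): random.uniform(-1, 1)
     for i in range(n) for j in range(n) if i < j}

def P(x):
    """The 2-homogeneous polynomial f(x) = sum_{i<j} a_ij x_i x_j."""
    return sum(c * x[i] * x[j] for (i, j), c in a.items())

def L(x, y):
    """Symmetric bilinear form with L(x, x) = P(x)."""
    return 0.5 * sum(c * (x[i] * y[j] + x[j] * y[i])
                     for (i, j), c in a.items())

for x in product((-1, 1), repeat=n):
    assert abs(L(x, x) - P(x)) < 1e-12          # polarization identity
    for y in product((-1, 1), repeat=n):
        assert abs(L(x, y) - L(y, x)) < 1e-12   # symmetry
```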