Is submodularity testable?

We initiate the study of property testing of submodularity on the boolean hypercube. Submodular functions come up in a variety of applications in combinatorial optimization. For a vast range of algorithms, the existence of an oracle to a submodular function is assumed. But how does one check if this oracle indeed represents a submodular function? Consider a function $f:\{0,1\}^n \rightarrow \mathbb{R}$. The distance to submodularity is the minimum fraction of values of $f$ that need to be modified to make $f$ submodular. If this distance is more than $\epsilon > 0$, then we say that $f$ is $\epsilon$-far from being submodular. The aim is to have an efficient procedure that, given input $f$ that is $\epsilon$-far from being submodular, certifies that $f$ is not submodular. We analyze a very natural tester for this problem, and prove that it runs in subexponential time. This gives the first non-trivial tester for submodularity. On the other hand, we prove an interesting lower bound (that is, unfortunately, quite far from the upper bound) suggesting that this tester cannot be very efficient in terms of $\epsilon$. This involves non-trivial examples of functions which are far from submodular and yet do not exhibit too many local violations. We also provide some constructions indicating the difficulty in designing a tester for submodularity. We construct a partial function defined on exponentially many points that cannot be extended to a submodular function, but any strict subset of these values can be extended to a submodular function.


Introduction
Submodular functions have been studied in great depth in combinatorial optimization [Edm70,NWF78,FNW78,Lov83,Fra97,Sch00,FFI01]. A set function $f: 2^U \to \mathbb{R}$ is submodular if for all $S, T \subseteq U$, $f(S \cup T) + f(S \cap T) \le f(S) + f(T)$. An alternative and equivalent view of submodularity is the monotonicity of marginal values: for all $S \subset T$ and elements $i \notin T$, a submodular function satisfies $f(S \cup \{i\}) - f(S) \ge f(T \cup \{i\}) - f(T)$. Identifying each set with its characteristic vector, we will think of f as a function $f: \{0,1\}^n \to \mathbb{R}$.
These functions are often used in many algorithmic applications and very naturally show up when modeling utilities. It is quite common to assume that algorithms have oracle access to some submodular function: given a set S, we have access to f(S). Observe that, in general, the description of the submodular function f has size that is exponential in n, whereas most algorithms that use f run in polynomial time. This means that these algorithms look at a very tiny fraction of f, yet their behavior depends on a very global property of f. This leads to a very natural question: what if the function f provided to the algorithm is not submodular? Could the algorithm detect this, or would it get fooled? Obviously, if f is constructed by taking a submodular function and making very few changes to the values, then there is no reason to expect the algorithms to be affected. On the other hand, if f is "significantly different" from a submodular function, the behavior of these algorithms could be very different.
Let us formally explain the notion of being different from a submodular function. Since polynomial time algorithms are sublinear with respect to the size of f, it is natural to use some property testing terminology. A function f is ε-far from being submodular if f needs to be changed at an ε-fraction of values to make it submodular. In polynomial time, can we detect that such a function is not submodular? If this is not possible, then this raises some very fundamental questions about submodularity. If the plethora of algorithms used cannot tell whether their input f is submodular or not, then in what sense are they actually using the submodularity of f? This would suggest that the algorithms exploit a property more general than submodularity. It would be strange to expect input functions f to have a property (submodularity) when we cannot even check whether these functions deviate significantly from it.
The main question here is whether submodularity is testable, i.e., is there a polynomial time procedure that distinguishes submodular functions from those that are ε-far? (This question was first posed as an open problem in [PRR03], in the context of submodularity testing over grids. Their results focused on testing over large low-dimensional grids rather than the high-dimensional hypercube $\{0,1\}^n$.) More concretely, what kinds of structural properties of submodularity do we need to address? Property testing algorithms, especially those for functions on the hypercube, usually check for some local property. These algorithms check if the desired property holds in a small local neighborhood, for some randomly chosen neighborhoods. If no deviation is detected, then property testers conclude that the input function is close to the property. Do similar statements hold for submodularity? We show non-trivial upper and lower bounds connecting local submodularity violations to the distance.
Property testing proofs often show that a function is close to a property by explicitly modifying the function to make it have the property. Usually, there is some procedural method to perform this conversion. This raises a very interesting question about partial submodular functions: suppose one is given a partial function over the hypercube. This means that some set of values is defined, but the remaining are left undefined. Under what circumstances can this be completed into a submodular function? If this cannot be completed, can we provide a small certificate of this? For a vast majority of natural testable properties (over functions on the hypercube, e.g. monotonicity) such small certificates do exist. Unfortunately, this is no longer true for submodularity. We present an example showing that a minimal certificate of non-extendability can be exponentially large.

Our results
Before we state our main theorems, we first set some notation.
Definition 1.1 Denote by $e_i \in \{0,1\}^n$ the canonical basis vector which has 1 in the i-th coordinate and 0 everywhere else.
For a function $f: \{0,1\}^n \to \mathbb{R}$, $i \in [n]$ and $x \in \{0,1\}^n$ such that $x_i = 0$, we define the marginal value of i (or discrete derivative) at x as $\partial_i f(x) = f(x + e_i) - f(x)$.
A function f is submodular if for any $i \in [n]$ and $x, y \in \{0,1\}^n$ such that $x_i = y_i = 0$ and $x \le y$ coordinate-wise, $\partial_i f(x) \ge \partial_i f(y)$. The distance $d(f, g)$ between two functions f and g is the fraction of points x where $f(x) \neq g(x)$. Let $\mathcal{S}$ be the set of all submodular functions. The distance of f to submodularity is $\min_{g \in \mathcal{S}} d(f, g)$. We say f is ε-far from being submodular if the distance of f to submodularity is more than ε.
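For small n, the definitions above can be checked by brute force. The following Python sketch is our own illustration rather than part of the paper; the names `marginal`, `is_submodular` and `distance` are hypothetical. It tabulates a function as a dict from 0/1 tuples to values and tests the marginal-value definition directly, in time $O(4^n n)$.

```python
from itertools import product


def marginal(f, x, i):
    """Discrete derivative: f(x + e_i) - f(x); requires x[i] == 0."""
    assert x[i] == 0
    return f[x[:i] + (1,) + x[i + 1:]] - f[x]


def is_submodular(f, n):
    """Check marginal(f, x, i) >= marginal(f, y, i) for all x <= y with x[i] == y[i] == 0."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            if not all(a <= b for a, b in zip(x, y)):
                continue  # only comparable pairs x <= y matter
            for i in range(n):
                if x[i] == y[i] == 0 and marginal(f, x, i) < marginal(f, y, i):
                    return False
    return True


def distance(f, g):
    """Fraction of points of the cube on which f and g differ."""
    return sum(f[x] != g[x] for x in f) / len(f)


n = 3
cube = list(product((0, 1), repeat=n))
f = {x: min(sum(x), 1) for x in cube}   # concave in |x|: submodular
h = {x: sum(x) ** 2 for x in cube}      # convex in |x|: not submodular
print(is_submodular(f, n), is_submodular(h, n), distance(f, h))  # True False 0.5
```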
Definition 1.2 A property tester for submodularity is an algorithm with the following properties.
• If f is submodular, then the algorithm answers YES with probability 1.
• If f is ε-far from submodular, then the algorithm answers NO with probability at least 2/3.
• The number of queries made to f is sublinear in the domain size, which is $2^n$. (Ideally, the number of queries is polynomial in n and 1/ε.)
Submodularity vs. monotonicity. Our first observation is that testing submodularity is at least as hard as testing monotonicity. More formally, the problem of testing monotonicity for a function $f: \{0,1\}^n \to \mathbb{R}$ can be reduced to the problem of testing submodularity for a function $f': \{0,1\}^{n+1} \to \mathbb{R}$. We present this reduction in Section 5. A consequence of this is that known lower bounds for monotonicity testing apply also to submodularity testing. For example, it is known that a non-adaptive monotonicity tester requires at least $\Omega(\sqrt{n})$ queries [FLN+02]. We remark that the best known monotonicity tester on $\{0,1\}^n$ takes $O(n^2/\epsilon)$ queries [DGL+99] and is non-adaptive. Submodularity can be naturally viewed as "second-degree monotonicity", i.e. monotonicity of the discrete partial derivatives $\partial_i f$. So a very natural test for submodularity is to simply run a monotonicity tester on the functions $\partial_i f$. In one direction, it is clear that for a submodular function, such a tester would always accept. However, it is not clear whether this tester would recognize functions that are far from being submodular and label them as such.
Monotonicity testers search randomly for pairs $x, x + e_i$ such that $f(x) > f(x + e_i)$. Such a pair of points can be naturally called a "violated pair". It is known that if f is ε-far from being monotone, then the fraction of violated pairs is at least $\epsilon/n^{O(1)}$ [GGL+00, DGL+99]. If we want to test submodularity by reducing to a monotonicity tester in each direction, this means that we are looking for violations of the following type: $f(x + e_i) + f(x + e_j) < f(x) + f(x + e_i + e_j)$, i.e. $\partial_i f(x) < \partial_i f(x + e_j)$. We call such violations violated squares.
The density of violated squares is the number of violated squares divided by $\binom{n}{2} 2^{n-2}$, the total number of squares.
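The number of violated squares and their density can likewise be computed by brute force for small n. In this hypothetical Python sketch (helper names are ours), a square is encoded as a triple `(x, i, j)` with `i < j` and `x[i] == x[j] == 0`, so there are exactly $\binom{n}{2}2^{n-2}$ of them.

```python
from itertools import combinations, product
from math import comb


def set_bit(x, k):
    """Return x + e_k (assumes x[k] == 0)."""
    return x[:k] + (1,) + x[k + 1:]


def squares(n):
    """All squares {x, x+e_i, x+e_j, x+e_i+e_j}, encoded as (x, i, j)."""
    for i, j in combinations(range(n), 2):
        for x in product((0, 1), repeat=n):
            if x[i] == x[j] == 0:
                yield x, i, j


def is_violated(f, x, i, j):
    """Violated iff f(x+e_i) + f(x+e_j) < f(x) + f(x+e_i+e_j)."""
    xi, xj = set_bit(x, i), set_bit(x, j)
    return f[xi] + f[xj] < f[x] + f[set_bit(xi, j)]


def violated_density(f, n):
    bad = sum(is_violated(f, x, i, j) for x, i, j in squares(n))
    return bad / (comb(n, 2) * 2 ** (n - 2))


n = 3
cube = list(product((0, 1), repeat=n))
bump = {x: 1 if sum(x) == 0 else 0 for x in cube}  # raise f(0) of the zero function
print(violated_density(bump, n))                   # 0.5: the 3 squares at x = 0
```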
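Our main combinatorial result consists of two bounds on the relationship between the distance from submodularity and the density of violated squares. Theorem 1.4 Let n be a sufficiently large integer.
• If $f: \{0,1\}^n \to \mathbb{R}$ is ε-far from being submodular, then its density of violated squares is at least $\epsilon^{O(\sqrt{n}\log n)}$.
• For any $\epsilon \ge 2^{-n/10}$, there is a function $f: \{0,1\}^n \to \mathbb{R}$ which is ε-far from being submodular and whose density of violated squares is less than $\epsilon^{4.8}$.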
The first part of the theorem is proven through relatively basic observations. The second part is quite technical and requires a much deeper understanding of submodularity.
Theorem 1.4 provides evidence that testing submodularity is very different from testing monotonicity. An intuition one might get from monotonicity testing is that if a natural extension to submodularity exists, its dependence on ε should be relatively mild, perhaps linear or quadratic. We show that this is not the case: in particular, if the dependence is a polynomial in 1/ε, the degree of the polynomial would have to be at least 5. This holds even in the range of exponentially small $\epsilon = 2^{-\Theta(n)}$, which means that $\mathrm{poly}(n)/\epsilon^{4.8}$ queries for any polynomial in n are not enough. This might even be taken as evidence that the dependence on 1/ε is not polynomial at all. However, we cannot currently push this construction any further.
The first part of Theorem 1.4 implies immediately that a submodularity tester that checks $q = (1/\epsilon)^{O(\sqrt{n}\log n)}$ random squares succeeds with high probability. Note that this is a non-adaptive tester, because the queries do not depend on the function values. To our knowledge, this is the first testing result asymptotically better than the trivial tester checking $2^{\Theta(n)}$ squares.
Corollary 1.5 There is a subexponential time non-adaptive tester for submodularity. This procedure samples $(1/\epsilon)^{O(\sqrt{n}\log n)}$ squares at random and checks if any are violated. If the input f is ε-far from being submodular, this procedure rejects with high probability.

Extending partial functions.
A partial function f is one that is defined on only some subset of the hypercube. Such a function is extendable if the remaining values can be filled in to get a submodular function. Although the question of extending partial functions is interesting in itself, it is also relevant to the question of testing submodularity. Any proof of a property tester must show that if a function f passes the tester (with high probability), then f must be ε-close to submodularity. This is usually done by arguing that if f has a sufficiently low density of local violations, one can modify an ε-fraction of values and remove all "obstructions" to submodularity. Since an f that passes the tester must have a low density of local violations, f is ε-close. An understanding of these obstructions to submodularity is often helpful for designing testers. An obstruction is just a set of values that cannot occur together in any submodular function.
Given a partial function f that is not extendable, we would ideally like to find a small certificate for this property. Unfortunately, we will show that such certificates can be exponentially large. We give a partial function with a surprising property: it is defined on an exponentially large set and is not extendable, but if any single value is removed, then the new function is extendable.
Definition 1.6 For a partial function f, let def(f) be the set of domain points where f is defined. Let $A \subseteq \{0,1\}^n$. The restriction of f to A, $f|_A$, is the partial function that agrees with f on A and is undefined everywhere else. The partial function f is minimally non-extendable if f is not extendable but $f|_A$ is extendable for all $A \subset \mathrm{def}(f)$.
Theorem 1.7 For sufficiently large n, there is a minimally non-extendable partial function f on $\{0,1\}^n$ such that |def(f)| is exponential in n.

The difficulty in testing submodularity
The values of f can interact in non-trivial ways to create obstructions to submodularity. Contrast this with monotonicity. A partial function f (on the hypercube) cannot be extended to a non-decreasing monotone function iff there is a pair of sets S ⊂ T such that f(S) > f(T). There is always a certificate of size 2 that a partial function cannot be extended. This completely characterizes the obstructions to monotonicity, and is indeed one of the reasons why monotonicity testers work. Our work implies that such a simple characterization does not exist for submodularity. Indeed, as Theorem 1.7 claims, obstructions to submodularity can have an extremely complicated structure.
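The size-2 certificate for monotonicity is easy to make concrete. In this hypothetical sketch, a partial function is a dict from 0/1 tuples to values; `monotone_certificate` searches for a pair S ≤ T with f(S) > f(T), and when none exists, `extend_monotone` fills in the cube with $f(x) = \max\{f(S) : S \le x,\ S \in \mathrm{def}(f)\}$ (falling back to the smallest defined value), which is one standard non-decreasing extension. Nothing this simple is available for submodularity.

```python
from itertools import product


def leq(s, t):
    """Coordinate-wise order on {0,1}^n."""
    return all(a <= b for a, b in zip(s, t))


def monotone_certificate(partial):
    """Return a pair (S, T) with S <= T and f(S) > f(T), or None if there is none."""
    for s, fs in partial.items():
        for t, ft in partial.items():
            if s != t and leq(s, t) and fs > ft:
                return s, t
    return None


def extend_monotone(partial, n):
    """Non-decreasing extension; valid whenever monotone_certificate(partial) is None."""
    floor = min(partial.values())
    return {
        x: max((v for s, v in partial.items() if leq(s, x)), default=floor)
        for x in product((0, 1), repeat=n)
    }


bad = {(0, 0): 1, (1, 1): 0}
good = {(0, 0): 0, (1, 0): 2, (1, 1): 3}
print(monotone_certificate(bad))   # ((0, 0), (1, 1))
print(extend_monotone(good, 2))    # {(0, 0): 0, (0, 1): 0, (1, 0): 2, (1, 1): 3}
```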
Functions that are far from being submodular can "hide" their bad behavior. In Theorem 3.3, we show the existence of a function f with exactly one violated square, but making f submodular requires changing $2^{n/2}$ values. Somehow, even though the function is (in a weak sense) "far" from submodular, the only local violation that manifests itself is a single square. The functions described by the second part of Theorem 1.4 are constructed through generalizations of this example.
As mentioned earlier, the problem of testing submodularity was first raised by [PRR03]. They considered submodularity over general grid structures (of which the hypercube is a special case). Their focus was on testing submodularity over 2-dimensional grids. Specifically, [PRR03] gave strong results for testing Monge matrices. Monge matrices are essentially submodular functions over the n × m integer grid. Here, the dimension is 2, but the domain in each component is large. In contrast, we are studying submodular functions over high-dimensional domains, where each component is binary. Hence, our problem is quite orthogonal to testing Mongeness, and we need a different set of techniques.
Another related set of results is recent work on learning and approximating submodular functions [GHIM09,BH09]. Here, we want to examine a value oracle through polynomially many queries (which is similar to our setting) and learn sufficient information so that we are able to answer queries about the function. The difference is that in this model, we care about multiplicative-factor approximation to the original function. An even more essential difference is that the input function is guaranteed to be submodular, rather than possibly being corrupted. For example, [GHIM09] shows that we can "learn" a monotone submodular function using polynomially many queries so that afterwards we can answer value queries within a multiplicative $O(\sqrt{n})$ factor, and this is optimal up to logarithmic factors. In contrast, the input function in our model might be masquerading as a submodular function but in truth be very far from being submodular.

Organization
The rest of the paper is organized as follows. In Section 2, we present our basic submodularity tester and prove the first part of Theorem 1.4. In Section 3, we present our construction of submodular functions from lattices and prove the second part of Theorem 1.4. In Section 4, we discuss extendability of submodular functions and prove Theorem 1.7. In Section 5, we present the reduction from monotonicity testing to submodularity testing. In Section 6, we discuss future directions.

A subexponential submodularity tester
The violated-square tester.
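• For a parameter $q \in \mathbb{Z}$, repeat the following q times: pick a uniformly random square $\{x, x + e_i, x + e_j, x + e_i + e_j\}$ (with $i \neq j$ and $x_i = x_j = 0$) and check whether it is violated; if it is, return NO.
• If none of the tested squares is violated, then return YES.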
Clearly, if the input function is submodular, the tester answers YES. We would like to understand how well this tester performs in case the input function is ǫ-far from being submodular. The following observation is standard and reduces this question to a combinatorial problem about violated squares.
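The violated-square tester itself is only a few lines. The sketch below is a hypothetical Python rendering: `oracle` stands for query access to f, and a uniformly random square is drawn by picking an unordered pair i ≠ j and then the remaining n − 2 coordinates of x uniformly at random. Because the samples never depend on the answers, the tester is non-adaptive.

```python
import random


def set_bit(x, k):
    """Return x + e_k for a 0/1 tuple x with x[k] == 0."""
    return x[:k] + (1,) + x[k + 1:]


def violated_square_tester(oracle, n, q, rng=random):
    """Return "NO" if one of q random squares is violated, else "YES"."""
    for _ in range(q):
        i, j = rng.sample(range(n), 2)           # uniform pair of distinct directions
        x = [rng.randint(0, 1) for _ in range(n)]
        x[i] = x[j] = 0                           # bottom point of the square
        x = tuple(x)
        xi, xj = set_bit(x, i), set_bit(x, j)
        if oracle(xi) + oracle(xj) < oracle(x) + oracle(set_bit(xi, j)):
            return "NO"                           # witnessed a violated square
    return "YES"


rng = random.Random(0)
print(violated_square_tester(lambda x: min(sum(x), 1), 10, 100, rng))  # YES
print(violated_square_tester(lambda x: sum(x) ** 2, 10, 1, rng))       # NO
```

A submodular input can never be rejected, matching one-sided error; the second call rejects immediately because every square of $\|x\|_1^2$ is violated.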
Lemma 2.1 The following two statements are equivalent:
• The violated-square tester using q(n, ε) queries detects every function that is ε-far from submodular with constant probability.
• For every function which is ε-far from submodular, the density of violated squares is $\Omega(1/q(n, \epsilon))$.
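The quantitative content of Lemma 2.1 is the usual sampling calculation: if a δ-fraction of squares is violated, q independent samples all miss them with probability $(1-\delta)^q \le e^{-q\delta}$, so $q = \lceil \ln 3/\delta \rceil$ samples give rejection probability at least 2/3. A minimal sketch, with hypothetical helper names:

```python
import math


def samples_needed(density):
    """Smallest q with exp(-q * density) <= 1/3, so the tester rejects w.p. >= 2/3."""
    return math.ceil(math.log(3) / density)


def rejection_probability(density, q):
    """Probability that at least one of q uniform squares is violated."""
    return 1 - (1 - density) ** q


q = samples_needed(0.01)
print(q, rejection_probability(0.01, q) >= 2 / 3)  # 110 True
```

Conversely, if the density is much smaller than 1/q, then q samples miss every violated square with probability close to 1, which is the other direction of the equivalence.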
Therefore, to understand this tester we need to understand the relationship between the distance from submodularity and the density of violated squares. In the rest of this section, our main goal is to prove the first part of Theorem 1.4, i.e. the claim that for a function ε-far from submodular, the density of violated squares must be at least $\epsilon^{O(\sqrt{n}\log n)}$. Using Lemma 2.1, this implies Corollary 1.5. First, we prove the following lemma.
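Lemma 2.2 Let $f: \{0,1\}^n \to \mathbb{R}$ and suppose that the square $\{x, x + e_i, x + e_j, x + e_i + e_j\}$ is violated. Then it is possible to decrease all the values either in $\{y : y \le x\}$ or in $\{y : y \ge x + e_i + e_j\}$ by a constant such that the square $\{x, x + e_i, x + e_j, x + e_i + e_j\}$ is no longer violated and no new violated square is created.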
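Proof: Let $d = f(x) + f(x + e_i + e_j) - f(x + e_i) - f(x + e_j) > 0$ be the "deficit" of the violated square. One way to fix this square is to decrease the value of f(x) by d; however, this might create new violated squares. Instead, we decrease the value of f(y) for every y ≤ x; i.e., we define a new function $\tilde{f}(y) = f(y) - d$ for y ≤ x, and $\tilde{f}(y) = f(y)$ otherwise. (Alternatively, we can define $\tilde{f}(y) = f(y) - d$ for $y \ge x + e_i + e_j$, and $\tilde{f}(y) = f(y)$ otherwise; the analysis is symmetric and we omit this case.) The square itself is no longer violated, since $\tilde{f}(x) + \tilde{f}(x + e_i + e_j) = f(x + e_i) + f(x + e_j) = \tilde{f}(x + e_i) + \tilde{f}(x + e_j)$. Consider any other square $\{x', x' + e_{i'}, x' + e_{j'}, x' + e_{i'} + e_{j'}\}$ that was previously not violated, i.e. $f(x' + e_{i'}) + f(x' + e_{j'}) \ge f(x') + f(x' + e_{i'} + e_{j'})$. We consider four cases:
• If $x' \le x$ and both $x_{i'} = 0$ and $x_{j'} = 0$, then the only value we modify in the square is $f(x')$, which is decreased by d. This cannot create a submodularity violation.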
• If $x' \le x$ and exactly one of the coordinates $x_{i'}, x_{j'}$ is 1, then we modify two values in the square; for example $f(x')$ and $f(x' + e_{i'})$. Since we decrease both by the same amount, this again cannot create a submodularity violation.
• If $x' \le x$ and $x_{i'} = x_{j'} = 1$, then we decrease all four values in the square by the same amount. Again, this cannot create a submodularity violation.
• If $x' \not\le x$, then no point of the square lies below x, so none of its values is modified. □
This means we can fix violated squares one by one, and the number of violated squares decreases by at least one every time. The cost we pay for each fix is the number of points in the cube above or below the respective square. Recall that we count the number of modified values overall, and hence what counts is the union of all the cubes modified in the process. Intuitively, it is more frugal to choose up-closed cubes for violated squares that are above the middle layer of the hypercube, and down-closed cubes for squares that are below the middle. A counting argument gives the following.
Lemma 2.3 Let $\epsilon \in (0, e^{-5})$ and let f have at most $\epsilon^{\sqrt{n}\log n} 2^n$ violated squares. Then these violated squares can be fixed by modifying at most $\epsilon 2^n$ values.
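Proof: Denote by B the set of bottom points of the violated squares which are below the middle layer; i.e. we have $\|x\|_1 \le n/2$ for each x ∈ B. (The squares above the middle layer can be handled symmetrically.) We choose to modify the down-closed cube $C_x = \{y \in \{0,1\}^n : y \le x\}$ for each x ∈ B. We can fix the violated squares one by one, by modifying values in the cubes $C_x$. The total number of modified values is $|\bigcup_{x \in B} C_x|$. We estimate the cardinality of this union by combining two simple bounds across levels of the hypercube. Denote $L_j = \{x \in \{0,1\}^n : \|x\|_1 = j\}$. We have
$$\Big|\bigcup_{x \in B} C_x\Big| = \sum_{j=0}^{n/2} \Big|L_j \cap \bigcup_{x \in B} C_x\Big|.$$
First, by the union bound, we have
$$\Big|L_j \cap \bigcup_{x \in B} C_x\Big| \le \sum_{x \in B} |L_j \cap C_x| = \sum_{x \in B} \binom{\|x\|_1}{j} \le |B| \binom{n/2}{j}.$$
Secondly, we have (trivially)
$$\Big|L_j \cap \bigcup_{x \in B} C_x\Big| \le |L_j| = \binom{n}{j}.$$
We choose the better of the two bounds depending on j. In particular, for $j \le n/2 - a\sqrt{n}$, we get
$$\sum_{j \le n/2 - a\sqrt{n}} \binom{n}{j} = 2^n \Pr[X \le n/2 - a\sqrt{n}] \le 2^n e^{-2a^2},$$
where X is a binomial Bi(n, 1/2) random variable and the last inequality is a standard Chernoff bound. For $j > n/2 - a\sqrt{n}$, we use
$$\sum_{n/2 - a\sqrt{n} < j \le n/2} |B| \binom{n/2}{j} = |B| \sum_{0 \le k < a\sqrt{n}} \binom{n/2}{k} \le |B|\, n^{a\sqrt{n}}.$$
Let $a = \frac{1}{2}\ln(1/\epsilon)$; we also assume that $|B| \le 2^n \epsilon^{\sqrt{n}\ln n}$. For $\epsilon \in (0, e^{-5})$ and n sufficiently large, this implies
$$\Big|\bigcup_{x \in B} C_x\Big| \le 2^n \epsilon^{\frac{1}{2}\ln(1/\epsilon)} + 2^n \epsilon^{\sqrt{n}\ln n} \epsilon^{-\frac{1}{2}\sqrt{n}\ln n} \le 2^n \left(\epsilon^{5/2} + \epsilon^{\frac{1}{2}\sqrt{n}\ln n}\right) \le \frac{\epsilon}{2} 2^n.$$
Together with the symmetric bound for the squares above the middle layer, at most $\epsilon 2^n$ values are modified. □
This lemma immediately implies the first part of Theorem 1.4. Assuming that f is ε-far from being submodular, we get that the number of violated squares is at least $\epsilon^{\sqrt{n}\log n} 2^n$ for $\epsilon \in (0, e^{-5})$, i.e. the density of violated squares is at least $\epsilon^{\sqrt{n}\log n} 2^n / \big(\binom{n}{2} 2^{n-2}\big) = \epsilon^{O(\sqrt{n}\log n)}$.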
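The repair step of Lemma 2.2, which the counting argument above applies once per violated square, can be simulated on small cubes. This hypothetical sketch uses integer values to avoid rounding, subtracts the deficit d on the down-closed cube {y : y ≤ x}, and lets one confirm that the chosen square is fixed while no previously non-violated square becomes violated.

```python
from itertools import combinations, product


def set_bit(x, k):
    """Return x + e_k (assumes x[k] == 0)."""
    return x[:k] + (1,) + x[k + 1:]


def violated_squares(f, n):
    """Set of violated squares (x, i, j) of a function tabulated on {0,1}^n."""
    out = set()
    for i, j in combinations(range(n), 2):
        for x in product((0, 1), repeat=n):
            if x[i] == x[j] == 0:
                xi, xj = set_bit(x, i), set_bit(x, j)
                if f[xi] + f[xj] < f[x] + f[set_bit(xi, j)]:
                    out.add((x, i, j))
    return out


def fix_downward(f, x, i, j):
    """Lemma 2.2 (down-closed variant): subtract the deficit on {y : y <= x}."""
    xi, xj = set_bit(x, i), set_bit(x, j)
    d = f[x] + f[set_bit(xi, j)] - f[xi] - f[xj]   # deficit, > 0 if violated
    return {y: v - d if all(a <= b for a, b in zip(y, x)) else v
            for y, v in f.items()}


n = 4
f = {x: sum(x) ** 2 for x in product((0, 1), repeat=n)}  # every square violated
before = violated_squares(f, n)
square = ((0, 0, 1, 1), 0, 1)
after = violated_squares(fix_downward(f, *square), n)
print(square in after, after < before)  # False True
```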

Few violated squares, yet large distance
We now give a construction of functions that have large distance from submodularity but a relatively small fraction of violated squares. As we mentioned earlier, these bounds are nowhere near our positive results. Nonetheless, we are able to show a significant difference from monotonicity. Our first tool to construct these functions is an interesting family of submodular functions. It is known that the set of minimizers of a submodular function always forms a lattice [Edm70]. We prove that conversely, for any lattice $L \subseteq \{0,1\}^n$ there is a submodular function whose set of minimizers is exactly L. We will then piece together these submodular functions to construct a non-submodular function with the desired properties.

Submodular functions from lattices
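Lemma 3.1 For any lattice $L \subseteq \{0,1\}^n$, the function $d_L(x) = \min_{y \in L} \|x - y\|_1$ (the Hamming distance from x to L) is submodular, and its set of minimizers is exactly L.
These two symmetric differences can be bounded as follows. Adding up the two bounds and merging terms such as $|S \cap \bar{U} \cap \bar{V}| + |S \cap \bar{U} \cap V| = |S \cap \bar{U}|$, we obtain the submodularity inequality for $d_L$. □
Considering the known fact that the minimizers of any submodular function form a lattice, we get the following characterization.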
Corollary 3.2 Let $S \subseteq \{0,1\}^N$. Then the following statements are equivalent:
1. S is a lattice.
2. S is the set of minimizers of some submodular function.
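Corollary 3.2 can be sanity-checked by brute force in low dimension. The sketch below is hypothetical: `is_lattice` tests closure under coordinate-wise meet and join, and `lattice_distance` tabulates $d_L(x) = \min_{y \in L}\|x - y\|_1$, which we take to be the distance function behind Lemma 3.1 (an assumption on our part). Submodularity is checked through violated squares, using the standard fact that a function on the cube is submodular if and only if it has no violated square.

```python
from itertools import combinations, product


def is_lattice(S):
    """Closed under coordinate-wise min (meet) and max (join)."""
    S = set(S)
    return all(tuple(map(min, a, b)) in S and tuple(map(max, a, b)) in S
               for a in S for b in S)


def lattice_distance(L, n):
    """d_L(x): Hamming distance from x to the nearest point of L."""
    return {x: min(sum(u != v for u, v in zip(x, y)) for y in L)
            for x in product((0, 1), repeat=n)}


def has_violated_square(f, n):
    for i, j in combinations(range(n), 2):
        for x in product((0, 1), repeat=n):
            if x[i] == x[j] == 0:
                xi = x[:i] + (1,) + x[i + 1:]
                xj = x[:j] + (1,) + x[j + 1:]
                xij = xi[:j] + (1,) + xi[j + 1:]
                if f[xi] + f[xj] < f[x] + f[xij]:
                    return True
    return False


chain = {(0, 0), (1, 0), (1, 1)}   # a lattice
anti = {(1, 0), (0, 1)}            # not closed under meet/join
for S in (chain, anti):
    d = lattice_distance(S, 2)
    print(is_lattice(S), not has_violated_square(d, 2))
# True True
# False False
```

For the non-lattice `anti`, its own distance function fails to be submodular; the corollary says more strongly that no submodular function has `anti` as its set of minimizers.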

Functions with one violated square
We start with the following counter-intuitive result.
Theorem 3.3 For any n, there is a function $f: \{0,1\}^n \to \mathbb{R}$ which has exactly one violated square but $2^{n/2}$ values must be modified to make it submodular.
We remark that this statement is tight in the sense that for any function with exactly one violated square, it is sufficient to modify $2^{n/2}$ values (we leave the proof as an exercise, using Lemma 2.2). To prove Theorem 3.3, we use Lemma 3.1, which says that any lattice in $\{0,1\}^n$ yields a natural submodular function. This function does not have any violated squares. However, we will add two additional dimensions and extend the function in such a way that each point of the lattice will produce exactly one violated square. Moreover, due to the nature of the distance function, the function we construct will be a linear function in a large neighborhood of each violated square. This will imply that we cannot simply change one value in each violated square if we want to make the function submodular; such changes would propagate and force many other values to be changed as well. We make this argument precise later. The construction is as follows.
Lemma 3.4 The violated squares of f are exactly the squares $\{(a, b, x) : a, b \in \{0,1\}\}$ for x ∈ L; in particular, f has exactly |L| violated squares.
Proof: Observe that for any fixed a, b ∈ {0, 1}, f(a, b, x) is a submodular function of x. Therefore, there is no violated square $\{z, z + e_i, z + e_j, z + e_i + e_j\}$ unless at least one of i, j is a special bit.
If exactly one of i, j is a special bit, we can assume that it is the first special bit. First assume the other special bit is 0; then we are looking at a square with values $f(0, 0, x), f(1, 0, x), f(0, 0, x + e_i), f(1, 0, x + e_i)$. By construction, $f(1, 0, x + e_i) - f(1, 0, x) \le f(0, 0, x + e_i) - f(0, 0, x)$, and therefore the square cannot be violated. Similarly, if the other special bit is 1, we are looking at a square with values $f(0, 1, x), f(1, 1, x), f(0, 1, x + e_i), f(1, 1, x + e_i)$, and by construction $f(1, 1, x + e_i) - f(1, 1, x) \le f(0, 1, x + e_i) - f(0, 1, x)$.
So again, the square cannot be violated.
Finally, consider a square where i, j are exactly the special bits. The square has values $f(0, 0, x), f(0, 1, x), f(1, 0, x), f(1, 1, x)$. Observe that $f(0, 0, x) + f(1, 1, x) = 1$ and $f(0, 1, x) + f(1, 0, x) = 2d_L(x)$. The square is violated if and only if $2d_L(x) < 1$, i.e. when x ∈ L. This means that we have a one-to-one correspondence between violated squares and the points of the lattice. Thus we can generate functions with a prescribed number of violated squares, depending on our initial lattice L. The simplest example is generated by L = {x} being a 1-point lattice. In this case, it is easy to verify directly that the function $d_L(x)$ is submodular, and hence our construction produces exactly one violated square.
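The one-to-one correspondence between violated squares and lattice points can be checked numerically. The defining formulas of f are not reproduced in this section, so the sketch below uses a hypothetical choice that is consistent with the identities used above: $f(0,0,x) = \|x\|_1$, $f(1,1,x) = 1 - \|x\|_1$ and $f(0,1,x) = f(1,0,x) = d_L(x)$. Both slices with a = b are then linear, and the mixed slices equal $d_L$. Only the count of violated squares is compared; the paper's exact construction may differ.

```python
from itertools import combinations, product


def set_bit(x, k):
    """Return x + e_k (assumes x[k] == 0)."""
    return x[:k] + (1,) + x[k + 1:]


def count_violated(f, m):
    """Number of violated squares of f tabulated on {0,1}^m."""
    count = 0
    for i, j in combinations(range(m), 2):
        for x in product((0, 1), repeat=m):
            if x[i] == x[j] == 0:
                xi, xj = set_bit(x, i), set_bit(x, j)
                count += f[xi] + f[xj] < f[x] + f[set_bit(xi, j)]
    return count


def special_bit_function(L, n):
    """Hypothetical f(a, b, x) on {0,1}^(n+2); the special bits a, b come first."""
    f = {}
    for x in product((0, 1), repeat=n):
        d = min(sum(u != v for u, v in zip(x, y)) for y in L)  # d_L(x)
        w = sum(x)
        f[(0, 0) + x] = w
        f[(1, 1) + x] = 1 - w
        f[(0, 1) + x] = f[(1, 0) + x] = d
    return f


for L, n in [({(1, 0, 1)}, 3), ({(0, 0), (1, 0), (1, 1)}, 2)]:
    print(len(L), count_violated(special_bit_function(L, n), n + 2))
# 1 1
# 3 3
```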
The second part of our argument, however, should be that such a function is not very close to submodular. In particular, consider L = {x} where $\|x\|_1 = n/2$. Suppose that we want to modify some values so that the function f becomes submodular. We certainly have to modify at least one value in the violated square {(a, b, x) : a, b ∈ {0, 1}}. However, for each fixed choice of a, b ∈ {0, 1}, the function f(a, b, x) is linear. The last point in our argument is that it is impossible to modify a small number of values "in the middle" of a linear function (with many values both above and below) so that the resulting function is submodular. First, we prove the following.
Lemma 3.5 Let $f: \{0,1\}^n \to \mathbb{R}$ be submodular with f(0) > 0. Then $f(x) \neq 0$ for at least $2^{n-1}$ points $x \in \{0,1\}^n$.
Note that this is tight, for example by taking $f(x) = 1 - x_1$.
Proof: We prove the statement by induction on n. Obviously it is true for n = 1. For n > 1, we partition $\{0,1\}^n \setminus \{0\}$ as follows: let $Q_i = \{x \in \{0,1\}^n : x_1 = \dots = x_{i-1} = 0,\ x_i = 1\}$. In other words, $Q_i$ is the set of points such that the first nonzero coordinate is i. If there is a coordinate i such that $f(e_i) \le 0$, then the discrete derivative $\partial_i f(0)$ is negative. By submodularity, $\partial_i f$ must be negative everywhere. Hence, for any point x such that $x_i = 0$, we have $f(x + e_i) < f(x)$, so at least one of $f(x), f(x + e_i)$ is nonzero; since these $2^{n-1}$ pairs are disjoint, at least $2^{n-1}$ values are nonzero. The other case is that $f(e_i) > 0$ for all i ∈ [n]. Then we apply the inductive hypothesis to $Q_i$, a subcube with bottom point $e_i$, which implies that at least $\frac{1}{2}|Q_i|$ values in $Q_i$ are nonzero. By adding up the contributions from $Q_1, \dots, Q_n$ and the nonzero value f(0), we conclude that at least half of all the values in $\{0,1\}^n$ are nonzero. □
To rephrase the lemma, we can start with a zero function on $\{0,1\}^n$, increase the value of f(0) to a positive value, and ask: how many other values do we have to modify to make the function submodular? The lemma says that at least $2^{n-1}$ values must be modified. In fact, the condition of submodularity does not change under the addition of a linear function, so the zero function can be replaced by any linear function. Thus the lemma says that it is impossible to increase the value of a linear function at the lowest point of a cube without changing a lot of other values in the cube.
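Lemma 3.5 and its tightness example are easy to probe numerically. In this hypothetical sketch, `tight` is $f(x) = 1 - x_1$ (written `x[0]` in 0-indexed Python), which is linear, hence submodular, with f(0) > 0 and exactly $2^{n-1}$ nonzero values; `bump` is the zero function with f(0) raised to 1, which by the lemma cannot be submodular for n ≥ 2. Submodularity is again checked via the standard equivalence with having no violated square.

```python
from itertools import combinations, product


def is_submodular(f, n):
    """No violated square, i.e. f(x+e_i) + f(x+e_j) >= f(x) + f(x+e_i+e_j)."""
    for i, j in combinations(range(n), 2):
        for x in product((0, 1), repeat=n):
            if x[i] == x[j] == 0:
                xi = x[:i] + (1,) + x[i + 1:]
                xj = x[:j] + (1,) + x[j + 1:]
                xij = xi[:j] + (1,) + xi[j + 1:]
                if f[xi] + f[xj] < f[x] + f[xij]:
                    return False
    return True


n = 4
cube = list(product((0, 1), repeat=n))
tight = {x: 1 - x[0] for x in cube}
bump = {x: 1 if sum(x) == 0 else 0 for x in cube}
nonzero = sum(v != 0 for v in tight.values())
print(is_submodular(tight, n), nonzero == 2 ** (n - 1), is_submodular(bump, n))
# True True False
```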
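Note that it is possible to decrease the value of a linear function at the lowest point of a cube, and this does not create any violation of submodularity. What is impossible is to decrease the value "in the middle" of a linear function without changing a lot of other values. This is the content of the next lemma.
Lemma 3.6 Let $f: \{0,1\}^n \to \mathbb{R}$ be submodular and let $x \in \{0,1\}^n$ with $\|x\|_1 = n/2$ and f(x) < 0. Then $f(y) \neq 0$ for at least $2^{n/2}$ points $y \in \{0,1\}^n$.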
Proof: Consider $Q = \{y \in \{0,1\}^n : y \le x\}$; this is a cube of dimension n/2, hence $|Q| = 2^{n/2}$. If $f(y) \neq 0$ for all y ∈ Q, we are done. Therefore, assume that there is a point y ∈ Q such that f(y) = 0. Then consider a monotone path from y to x; since f(x) < 0, there must be an edge $(y', y' + e_i)$ of negative marginal value. By submodularity, all edges $(z', z' + e_i)$ for $z' \ge y'$ must have negative marginal value. There are at least $2^{n/2}$ such edges, since all the n/2 zero bits in x are also zero in y′ and can be increased arbitrarily to obtain a point $z' \ge y'$. Each of these (disjoint) edges $(z', z' + e_i)$ contains a point of nonzero value, and hence there are at least $2^{n/2}$ such points. □
Proof of Theorem 3.3: Let f′ be any submodular function. Since f has the violated square {(0, 0, x), (0, 1, x), (1, 0, x), (1, 1, x)}, f′ must differ from f on at least one of these values. Fix a, b ∈ {0, 1} such that $f'(a, b, x) \neq f(a, b, x)$ and consider the function $f'(a, b, \cdot) - f(a, b, \cdot)$ as a function of x. Since $f(a, b, \cdot)$ is linear, $f' - f$ is again submodular as a function of x. If $(f' - f)(a, b, x) > 0$, we apply Lemma 3.5 to the cube {y : y ≥ x}; if $(f' - f)(a, b, x) < 0$, we apply Lemma 3.6. In both cases, we conclude that there are at least $2^{n/2}$ values $y \in \{0,1\}^n$ such that $f'(a, b, y) \neq f(a, b, y)$. Therefore, f is $2^{-n/2}$-far from submodular. □

Boosting the example to increase distance
Observe that in Theorem 3.3, the relationship between relative distance and density of violated squares is quadratic: we have relative distance $\epsilon = 2^{-n/2}$ and density of violated squares $\simeq \epsilon^2 = 2^{-n}$. In order to prove the second part of Theorem 1.4, we need to consider a denser lattice. Since the regions of linearity will be more complicated here, we need a more general statement to argue about the number of values that must be fixed to make a function submodular. Proof: Suppose f(y) = 0 for some y ∈ D. Then let x ≤ y be minimal such that f(x) ≤ 0. Since x is minimal (and cannot be 0 because f(0) > 0), for any i with $x_i = 1$ we have $f(x - e_i) > 0$. Hence $f(x) - f(x - e_i) < 0$, and by submodularity $f(y) - f(y - e_i) < 0$. Since f(y) = 0, this implies that $f(y - e_i) > 0$. In this case we call $y - e_i$ a witness for y. To summarize, for every y ∈ D we have either $f(y) \neq 0$ or $f(y - e_i) \neq 0$ for some witness $y - e_i$ of y. Since every point can serve as a witness for at most n other points, the number of nonzero values must be at least |D|/(n + 1). □ Now we are ready to prove the second part of Theorem 1.4.
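Let n be divisible by 4 and consider $L = \{x \in \{0,1\}^n : x_{2k-1} = x_{2k} \text{ for } k = 1, \dots, n/2\}$, i.e. the coordinates are grouped into n/2 pairs and each pair is either (0, 0) or (1, 1). Obviously, this is a lattice; in fact it is isomorphic to a cube of dimension n/2. The function $f: \{0,1\}^{n+2} \to \mathbb{R}$ based on this lattice has exactly $2^{n/2}$ violated squares, due to Lemma 3.4. It remains to estimate the distance of f from being submodular.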
To that end, focus on the "middle layer" of the lattice, $M = \{x \in L : \|x\|_1 = n/2\}$. Such points have exactly half of their pairs equal to (0, 0) and half equal to (1, 1). For each such point x, consider points y ≥ x such that y still has the same number of pairs equal to (1, 1) as x. Formally, let $Q_x = \{y \ge x : (y_{2k-1}, y_{2k}) \neq (1, 1) \text{ whenever } (x_{2k-1}, x_{2k}) = (0, 0)\}$. The reason for this definition is that any point y ∈ $Q_x$ can be traced back to x (by zeroing out all the pairs which are not equal to (1, 1), we obtain x). Hence the sets $Q_x$ are disjoint. The path from y to x is also the shortest possible path to any point of the lattice (because it is necessary to modify all pairs which are equal to (1, 0) or (0, 1)). In other words, $d_L(y) = \|x - y\|_1$ for any y ∈ $Q_x$. This implies that the function f(a, b, y) for any fixed a, b is linear as a function of y ∈ $Q_x$. Our final argument is that in order to make f submodular, we would have to fix many values in each set $Q_x$. Let us assume that f′ is submodular. Since f has a violated square {(0, 0, x), (0, 1, x), (1, 0, x), (1, 1, x)} for each x ∈ L, f′ must differ from f in at least one point in each such square. More specifically, f′ must be larger than f for one of the points (0, 1, x), (1, 0, x), or f′ must be smaller than f for one of the points (0, 0, x), (1, 1, x).
In the other case, a = b, we have $(f' - f)(a, b, x) < 0$. Note that in this case $f(a, b, \cdot)$ is actually linear on all of $\{0,1\}^n$ and $f' - f$ is submodular everywhere. Then we use arguments similar to Lemma 3.6. Let $Q^-_x$ be the set of points y ≤ x such that the set of (0, 0) pairs is the same in y and x. Again, each y ∈ $Q^-_x$ can be traced back to x, and so these sets are disjoint. From the proof of Lemma 3.6, we obtain that either $(f' - f)(y) \neq 0$ for all y ∈ $Q^-_x$, or else there is an edge $(x - e_i, x)$ of negative marginal value. This implies that all edges above this edge have negative marginal value, i.e., at least half of the points in $Q_x \cup (Q_x - e_i)$ must have nonzero value. Now let us count the size of $Q_x$. We have n/4 pairs of value (0, 0) which can be modified, and we have 3 choices for each (we avoid (1, 1) for such pairs). Therefore, $|Q_x| = 3^{n/4}$. The same holds for $Q^-_x$. This holds for every lattice point in the middle layer M. Therefore, each lattice point x ∈ M contributes $\Omega(3^{n/4}/n)$ nonzero points in $f' - f$. There are $\binom{n/2}{n/4} = \Omega(2^{n/2}/n)$ points in M. We have to be careful about the last case, where the nonzero points are guaranteed to be in $Q_x \cup (Q_x - e_i)$ rather than $Q_x$. Such points could potentially be overcounted n times, but a 1/2-fraction of them is nonzero, so we still get $\Omega(3^{n/4}/n)$ nonzero points from each point in M. Overall, we get $\Omega(2^{n/2} 3^{n/4}/n^2)$ nonzero points in $f' - f$. This means that the distance of f from being submodular is $\epsilon = \Omega(2^{-n/2} 3^{n/4}/n^2)$. A calculation reveals that this is $\epsilon = 2^{-(0.1037\ldots + o(1))n}$, roughly $2^{-0.104n}$, while the density of violated squares is at most $2^{-n/2} < \epsilon^{4.8}$ for n sufficiently large.
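The exponent in the last step can be checked directly. Ignoring factors polynomial in n, $\epsilon \approx 2^{-cn}$ with $c = 1/2 - \log_2(3)/4$, while the density of violated squares is at most $2^{-n/2}$; the inequality $2^{-n/2} < \epsilon^{4.8}$ then reduces to 4.8c < 1/2. A short illustrative check in Python:

```python
import math

c = 0.5 - math.log2(3) / 4      # epsilon ~ 2^(-c n), up to poly(n) factors
print(round(c, 4))               # 0.1038
print(4.8 * c < 0.5)             # True: 2^(-n/2) < epsilon^4.8 for large n
print(round(0.5 / c, 2))         # 4.82: no exponent above this is reachable here
```

The margin is thin (4.8c ≈ 0.498), which is why the polynomial factors only disappear for n sufficiently large.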
Finally, it is easy to boost this example to larger values of $\epsilon$. Suppose we want to construct an example for a given $n$ and $\epsilon = 2^{-0.104n'}$, $n' < n$ ($n'$ can even be a constant). Assume for simplicity that $n = an'$ with $a$ an integer. Then we start from an example on $n'$ coordinates where the distance is $\epsilon = 2^{-0.104n'}$ and the density of violated squares is $2^{-n'/2}$. We extend $f$ to dimension $n = an'$ so that it does not depend on the new coordinates. There are no violated squares involving the new coordinates, and hence the density of violated squares as well as the relative distance remain unchanged. □
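The padding step can be checked on a toy example. The sketch below is an illustration under stated assumptions: points are 0/1 tuples, a square $(x, i, j)$ with $x_i = x_j = 0$ counts as violated when $f(x+e_i) + f(x+e_j) < f(x) + f(x+e_i+e_j)$, and the helper names are hypothetical. It pads a function so that it ignores the new coordinates and confirms that every violated square of the padded function uses only original coordinates.

```python
from itertools import product

def flip(x, k):
    """Return x with coordinate k flipped."""
    return x[:k] + (1 - x[k],) + x[k + 1:]

def violated_squares(f, n):
    """All squares (x, i, j), i < j, x_i = x_j = 0, that violate submodularity."""
    out = []
    for x in product((0, 1), repeat=n):
        zeros = [k for k in range(n) if x[k] == 0]
        for a, i in enumerate(zeros):
            for j in zeros[a + 1:]:
                if f(flip(x, i)) + f(flip(x, j)) < f(x) + f(flip(flip(x, i), j)):
                    out.append((x, i, j))
    return out

def pad(f, n_small):
    """Extend f to more coordinates; the result ignores every coordinate >= n_small."""
    return lambda x: f(x[:n_small])

# Toy non-submodular function on 2 coordinates (the product x0*x1 is supermodular).
f_small = lambda x: x[0] * x[1]
f_big = pad(f_small, 2)
```

Here `violated_squares(f_small, 2)` has one entry, and `violated_squares(f_big, 4)` has $2^2 = 4$ entries, one for each setting of the two new coordinates, all with $\{i, j\} = \{0, 1\}$.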

Path certificates for submodular extension
Given a partial function $f$, can we get a precise characterization of when $f$ is submodular-extendable? Using LP duality, we can give a combinatorial condition that captures this. In this subsection, $f$ will be some fixed partial function. We set $D = \mathrm{def}(f)$ and $U = B \setminus D$. Let us associate a variable $x_S$ with every set $S$. If $S \in D$, then $x_S$ has value $f(S)$ (so this is not really a variable, but it will be convenient to keep this notation). For a set $S$, $A^+(S)$ is the set of edges $(S, S+i)$ going up from $S$, and $A^-(S)$ is the set of edges $(S-i, S)$ coming into $S$ from below. If $f$ is extendable, then the following LP has a feasible solution.
$\forall e,\ e' \in \Gamma^+(e)$,

Using Farkas' lemma, if this LP is infeasible, then we can derive a contradiction from these constraints. So we have dual variables $y_{e,e'}$ and $y_e$ associated with the constraints, and the following LP is feasible.
$$\forall e:\quad y_e + \sum_{e' \in \Gamma^+(e)} y_{e,e'} = \sum_{e' \in \Gamma^-(e)} y_{e',e}$$
$$\forall S \in U:\quad \sum_{e \in A^+(S)} y_e = \sum_{e \in A^-(S)} y_e$$
$$\forall e,\ e' \in \Gamma^+(e):\quad y_{e,e'} \ge 0$$

Definition 4.1 Consider a set $\mathcal{P}$ of directed paths consisting of cycles or paths with endpoints in $D$. An edge is upward if it is directed from the smaller set to the larger, and downward otherwise. Let $\mathcal{U}$ be the multiset of upward edges of $\mathcal{P}$ and $\mathcal{D}$ the multiset of downward edges (so we keep as many copies of an edge $e$ as it has occurrences in $\mathcal{P}$). Let $G$ be a bipartite graph on $\mathcal{U}$ and $\mathcal{D}$ (with links, instead of edges). An edge $e \in \mathcal{U}$ is linked to $e' \in \mathcal{D}$ if $e \preceq e'$. The set of paths $\mathcal{P}$ is matched if there is a perfect matching in $G$.
The value of a directed path $P$ that starts at $S \in D$ and ends at $S' \in D$ is $\mathrm{val}(P) = f(S') - f(S)$. Cycles have value 0. The value of $\mathcal{P}$ is the sum of the values of the paths in $\mathcal{P}$. If a matched set $\mathcal{P}$ has negative value, then $\mathcal{P}$ is referred to as a path certificate.
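Definition 4.1 and the value of a set of paths can be made concrete with a small checker. The sketch below is illustrative only: it encodes sets as bitmasks, reads $e \preceq e'$ as "parallel, with the lower end of $e$ contained in the lower end of $e'$" (an assumption consistent with the proof of Lemma 4.2), finds a perfect matching in the bipartite link graph with a simple augmenting-path (Kuhn) search, and represents a partial function as a dictionary on $\mathrm{def}(f)$. All function names are hypothetical.

```python
def split_edges(paths):
    """Multisets of upward and downward edges; an edge is stored as (lower, upper)."""
    up, down = [], []
    for path in paths:
        for u, v in zip(path, path[1:]):
            lo, hi = min(u, v), max(u, v)
            assert (lo | hi) == hi and bin(lo ^ hi).count("1") == 1, "not a hypercube edge"
            (up if v == hi else down).append((lo, hi))
    return up, down

def preceq(e, e2):
    """e <= e2: same direction, and the lower end of e is a subset of the lower end of e2."""
    (lo1, hi1), (lo2, hi2) = e, e2
    return (lo1 ^ hi1) == (lo2 ^ hi2) and (lo1 & ~lo2) == 0

def is_matched(paths):
    """True iff every upward edge can be linked to a distinct downward edge above it."""
    up, down = split_edges(paths)
    if len(up) != len(down):
        return False
    owner = [None] * len(down)          # owner[d] = index of the upward edge matched to d

    def augment(u, visited):
        for d in range(len(down)):
            if d not in visited and preceq(up[u], down[d]):
                visited.add(d)
                if owner[d] is None or augment(owner[d], visited):
                    owner[d] = u
                    return True
        return False

    return all(augment(u, set()) for u in range(len(up)))

def value(paths, f):
    """Sum of f(end) - f(start) over the paths; cycles contribute 0."""
    return sum(f[p[-1]] - f[p[0]] for p in paths if p[0] != p[-1])

def is_path_certificate(paths, f):
    if any(p[0] != p[-1] and (p[0] not in f or p[-1] not in f) for p in paths):
        return False                    # non-cycle paths must have endpoints in def(f)
    return is_matched(paths) and value(paths, f) < 0

# A violated square on {0,1}^2 (bit 0 = element 1, bit 1 = element 2).
f_bad = {0b00: 0, 0b01: 0, 0b10: 0, 0b11: 1}
square = [[0b00, 0b01], [0b11, 0b10]]   # up along e = (00, 01), down along e' = (10, 11)
```

With these definitions, `is_path_certificate(square, f_bad)` is `True` (value $-1$), which is exactly the violation $f(\{1\}) + f(\{2\}) < f(\emptyset) + f(\{1,2\})$. With a submodular table such as `{0: 0, 1: 1, 2: 1, 3: 1}`, the same paths have value $1$ and are not a certificate.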

Lemma 4.2
The partial function f is not submodular-extendable iff f contains a path certificate.
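Proof: Suppose $\mathcal{P}$ is a path certificate, but $f$ can be extended to a submodular function $f'$. Let $\mathcal{U}$ be the multiset of upward edges in $\mathcal{P}$ and $\mathcal{D}$ the multiset of downward edges. We have a perfect matching between $\mathcal{U}$ and $\mathcal{D}$. Consider a matched pair $(e, e')$. We have $e \preceq e'$, so by the submodularity of $f'$, the marginal value of $f'$ on $e$ is at least its marginal value on $e'$. For a directed edge, let $f'(e)$ denote the value of $f'$ at its head minus its value at its tail. Considering $e$ (upward) and $e'$ (downward) as directed edges, we get $f'(e) + f'(e') \ge 0$. Summing over all matched pairs, $\sum_{e \in \mathcal{P}} f'(e) \ge 0$. Consider a path $P \in \mathcal{P}$. Note that $\mathrm{val}(P)$ is the same in $f$ and $f'$, since $f'$ extends $f$ and the endpoints of $P$ lie in $D$. Considering $P$ as a multiset of directed edges, the sum telescopes and $\mathrm{val}(P) = \sum_{e \in P} f'(e)$. We get $\sum_{P \in \mathcal{P}} \mathrm{val}(P) \ge 0$, contradicting the fact that $\mathcal{P}$ has negative value.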
Now suppose $f$ cannot be extended to a submodular function. By Farkas' lemma, the second LP is feasible. Consider the directed hypercube (abusing notation, call this graph $B$). The second equality is a flow conservation constraint for all vertices in $U$. Hence, we can think of the $y_e$'s as giving a flow in $B$, where the terminals are the points of $D$. Precisely, $y_e$ is the flow in $e$ from the lower end to the higher end. The first constraint is a little stranger. Consider the graph $G$ whose vertices are the edges of the hypercube, with a directed link from $e$ to every member of $\Gamma^+(e)$. This actually gives $n$ disconnected graphs, each of which is a hypercube in $n - 1$ dimensions. Think of $y_{e,e'}$ as a flow in $G$; note that it is always nonnegative. We do not really have a flow conservation condition, because of the extra $y_e$. Add an extra terminal for every $e$, attached to the vertex $e \in G$; this is called the terminal $e \in G$. Think of $y_e$ units of flow being removed at $e$ through this terminal (if $y_e \ge 0$), or $|y_e|$ units being injected into $e$ (if $y_e < 0$). Then the $y_{e,e'}$'s represent a legitimate flow in $G$.
Since the $y$ values are rational, we can scale them and assume that they are integral. We will construct a path certificate through a flow decomposition process. At an intermediate stage, we maintain a set $\mathcal{P}$ of directed paths in $B$ and a list of matched pairs in $\mathcal{P}$. For each matched pair, we have a directed path in $G$ from the smaller edge to the larger (call this set of paths $\mathcal{Q}$). All these paths start and end at terminals in their respective graphs. We maintain the following invariants. Through every path in $\mathcal{P} \cup \mathcal{Q}$, a single unit of flow can be simultaneously routed in the flow given by the $y$ values. Furthermore, a directed edge $e$ in $\mathcal{P}$ is upward iff $y_e > 0$. Flow in any directed edge of $\mathcal{Q}$ is always positive. Suppose the current set of paths $\mathcal{P}$ is not completely matched. We will describe a procedure that either increases the number of matched pairs, or adds a new path to both $\mathcal{P}$ and $\mathcal{Q}$; in the latter case, the total flow routed through $\mathcal{P}$ (and $\mathcal{Q}$) increases by one. Since the flow is finite, this process must terminate and return a set of matched paths.
Suppose there is an unmatched edge $e \in \mathcal{P}$ (without loss of generality, we can take it to be upward). This means that $y_e$ is positive. Note that because $\mathcal{P}$ can be considered as a multiset of edges, there could be many copies of the upward edge $e$ in $\mathcal{P}$. Suppose there are $t$ copies, which means that $t$ paths in $\mathcal{P}$ pass through $e$. Since we can route one unit of flow in each of these paths simultaneously, $y_e \ge t$. Let us look at the situation in $G$. At most $t - 1$ copies of $e$ are matched, so at most $t - 1$ paths in $\mathcal{Q}$ end at the terminal $e \in G$ (since $y_e > 0$, there is a net influx at terminal $e \in G$). Let us route a single unit of flow through all paths in $\mathcal{Q}$ (and remove this flow). This must still leave at least one unit of flow going into $e$. So, we can route one unit of flow from some $e'$ to $e$ along a path $Q$. Note that because the flow is always positive in $G$, $e' \succ e$.
Note that $y_{e'} < 0$, because in $G$, the terminal $e'$ has a net outflow. Suppose there is an unmatched copy of $e'$ in $\mathcal{P}$ (it must be downward). Then we can match $e$ to this copy of $e'$, and we are done. Suppose this is not the case. Let $s$ be the number of copies of the downward edge $e'$ in $\mathcal{P}$ (all of these are matched). We argue that $s < |y_{e'}|$. Suppose, for the sake of contradiction, that $|y_{e'}| = s$. Then there are $s$ paths in $\mathcal{Q}$ that start at the terminal $e' \in G$. If we remove all the flow paths corresponding to $\mathcal{Q}$, then there is no flow going out of $e'$. But we were able to route one unit of flow from $e'$ to $e$ along $Q$ after removing the flow corresponding to $\mathcal{Q}$. Contradiction. Hence $|y_{e'}| > s$. This means that after removing all the flow corresponding to $\mathcal{P}$ (in $B$), there is still at least one unit of (downward) flow left on $e'$. So, after the removal, we can still route one unit of flow through $e'$, giving us a path or cycle $P$. We add $P$ to $\mathcal{P}$ and $Q$ to $\mathcal{Q}$, observing that the invariants are maintained. This ends the procedure.
Finally, we end up with a set of matched paths $\mathcal{P}$. If this has negative value, we have found our certificate. Suppose it has nonnegative value. We argue that we can then find a new (integral) solution for the dual with a smaller flow. This is done by removing one unit of flow along all paths in the final $\mathcal{P}$ and $\mathcal{Q}$. Consider some upward edge $e$ in $\mathcal{P}$. Since $\mathcal{P}$ is completely matched, the number of copies of $e$ in $\mathcal{P}$ is exactly the number of paths in $\mathcal{Q}$ ending at terminal $e$ in $G$. Hence, the $y$ values, after the decrease, maintain the flow conservation conditions. The original value of the solution is negative, and we removed a set of matched paths of nonnegative value. So, the value of the remaining solution is still negative. This gives us the new solution for the dual, and since the flow strictly decreases, repeating the argument eventually yields a certificate. A path in $\mathcal{P}$ is called a singleton if it consists of only a single edge. We will prove some "clean-up" claims that provide us with nice path certificates.
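The graph $G$ on hypercube edges used in this proof can be built explicitly for small $n$. The sketch below assumes (as the proof suggests, though the definition is not restated here) that for $e = (S, S+i)$, $\Gamma^+(e)$ consists of the parallel edges $(S+j, S+i+j)$ with $j \notin S + i$. It uses union-find to confirm the claim that $G$ splits into $n$ components, one per direction, each of size $2^{n-1}$, with $n(n-1)2^{n-2}$ links in total, which is the edge count of $n$ copies of the $(n-1)$-dimensional hypercube. The function names are hypothetical.

```python
def link_graph(n):
    """Build G: vertices are hypercube edges (S, i) with i not in S; links e -> Gamma^+(e)."""
    vertices = [(S, i) for S in range(1 << n) for i in range(n) if not (S >> i) & 1]
    links = [((S, i), (S | (1 << j), i))
             for (S, i) in vertices
             for j in range(n) if j != i and not (S >> j) & 1]
    return vertices, links

def component_sizes(vertices, links):
    """Sizes of the connected components of G (ignoring link direction), via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in links:
        parent[find(a)] = find(b)
    sizes = {}
    for v in vertices:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values())
```

For $n = 4$, this gives 32 vertices, 48 links, and four components of size 8, i.e., four copies of the 3-dimensional cube.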
Claim 4.3 Let $f$ be a partial function. Let $f$ contain a set of matched paths $\mathcal{P}$, and let $e$ be an upward edge in $\mathcal{P}$ that is matched to a downward copy of itself. There is an operation that converts $\mathcal{P}$ to $\mathcal{P}'$ such that $\mathcal{P}'$ contains the same multiset of edges as $\mathcal{P}$ except for an upward and a downward copy of $e$. The matching of $\mathcal{P}'$ is identical to that of $\mathcal{P}$ (except for the matched pair of $e$), and $\mathrm{val}(\mathcal{P}) = \mathrm{val}(\mathcal{P}')$.
Proof: Let $e = (S, S+i)$. Suppose path $P_u$ contains edge $e$ upward, and $P_d$ contains it downward. We can split $P_u$ into portions $P_{1,u}$ and $P_{2,u}$ such that the former is the part before $e$ and the latter is the part after $e$. Similarly, we get $P_{1,d}$ and $P_{2,d}$. Note that $P_{1,u}$ ends at $S$ and $P_{2,d}$ starts at $S$. Similarly, $P_{2,u}$ starts at $S+i$ and $P_{1,d}$ ends at $S+i$. We can combine $P_{1,u}$ and $P_{2,d}$ to get a path $P'_1$. Similarly, combining $P_{1,d}$ and $P_{2,u}$, we get $P'_2$. We replace $P_u$ and $P_d$ by $P'_1$ and $P'_2$. Note that the sum of values does not change, since $P'_1$ and $P'_2$ together have the same start and end points as $P_u$ and $P_d$. Also, the only edges removed are the upward and downward copies of $e$, and the matching on the remaining edges stays the same. □

Claim 4.4 Let $f$ be a partial function such that for any square of $B$, at most 2 points are present in $\mathrm{def}(f)$. Let $f$ contain a path certificate $\mathcal{P}$ such that no edge occurs both upward and downward in $\mathcal{P}$. There exists a path certificate $\mathcal{Q}$ that contains no singleton. Furthermore, no edge in $\mathcal{Q}$ appears both upward and downward.
Proof: We will show how to remove any singleton in $\mathcal{P}$ and obtain an "equivalent" certificate $\mathcal{Q}$ of the same value. Suppose there is a singleton path consisting of the upward edge $e$. The downward edge $e' \succeq e$ matched to $e$ must occur in some path $P \in \mathcal{P}$. If $e = e'$, then this edge occurs both upward and downward, which cannot happen. So $e' \succ e$. Let $e = (S, S+i)$ and $e' = (T+i, T)$, for some $S \subset T$. We will split $P$ into two paths. Let $P_1$ be the portion of $P$ before $e'$ and $P_2$ the portion after $e'$. Note that $P_1$ ends at $T+i$ and $P_2$ starts at $T$. Consider a downward path $Q_1$ from $T+i$ to $S+i$ and a parallel upward path $Q_2$ from $S$ to $T$. Observe that there is a perfect matching between the edges of $Q_1$ and those of $Q_2$.
Consider the path $Q'_1$ formed by joining $P_1$ to $Q_1$, and the path $Q'_2$ formed by joining $Q_2$ to $P_2$. Note that $Q'_1$ ends at $S+i$ and $Q'_2$ starts at $S$. To get $\mathcal{Q}$, we remove the singleton $e$ from $\mathcal{P}$ and replace $P$ by $Q'_1$ and $Q'_2$. The set $\mathcal{Q}$ is completely matched, since the removed edges $e$ and $e'$ were matched to each other and the edges of $Q_1$ and $Q_2$ are matched to each other. The edges in $Q_1$ and $Q_2$ are disjoint. Hence, no edge in $\mathcal{Q}$ appears both upward and downward. The singleton edge $e$ starts at $S$ and ends at $S+i$, so $\mathrm{val}(Q'_1) + \mathrm{val}(Q'_2) = \mathrm{val}(e) + \mathrm{val}(P)$, and therefore $\mathrm{val}(\mathcal{Q}) = \mathrm{val}(\mathcal{P})$. If $|Q_1| > 1$, then neither $Q'_1$ nor $Q'_2$ is a singleton. Suppose $Q_1$ is a single edge. Then $e$ and $e'$ form a square. Both endpoints of the singleton $e$ are in $\mathrm{def}(f)$, and a square contains at most 2 points of $\mathrm{def}(f)$, so neither endpoint of $e'$ can be in $\mathrm{def}(f)$. This means that the paths $P_1$ and $P_2$ have length at least 1, so $Q'_1$ and $Q'_2$ have length at least 2. In both cases, the total number of singletons decreases by 1. We can repeatedly apply this procedure and remove all singletons. □
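The two clean-up operations above are simple path surgeries, which the sketch below illustrates on bitmask-encoded paths (lists of vertices). It is an illustration under assumptions, not the paper's code: `splice` performs the re-wiring of Claim 4.3 for one upward and one downward copy of $e$ lying on two distinct paths, and `parallel_paths` builds the paths $Q_1$ (from $T+i$ down to $S+i$) and $Q_2$ (from $S$ up to $T$) of Claim 4.4, adding the elements of $T \setminus S$ in increasing order (an arbitrary choice). The helper names are hypothetical.

```python
from collections import Counter

def directed_edges(path):
    """Consecutive (tail, head) pairs of a vertex list."""
    return list(zip(path, path[1:]))

def splice(p_up, p_down, lo, hi):
    """Claim 4.3: cut e = (lo, hi) out of p_up (traversed lo->hi) and p_down (hi->lo)."""
    k = directed_edges(p_up).index((lo, hi))
    m = directed_edges(p_down).index((hi, lo))
    p1_up, p2_up = p_up[:k + 1], p_up[k + 1:]          # ends at lo / starts at hi
    p1_down, p2_down = p_down[:m + 1], p_down[m + 1:]  # ends at hi / starts at lo
    return p1_up + p2_down[1:], p1_down + p2_up[1:]    # join at lo, join at hi

def preceq(e, e2):
    """e <= e2: same direction, lower end of e contained in lower end of e2."""
    (lo1, hi1), (lo2, hi2) = e, e2
    return (lo1 ^ hi1) == (lo2 ^ hi2) and (lo1 & ~lo2) == 0

def parallel_paths(S, T, i):
    """Claim 4.4: Q2 goes up from S to T; Q1 goes down from T+i to S+i (S subset T, i not in T)."""
    assert (S & ~T) == 0 and not (T >> i) & 1
    q2 = [S]
    for j in range(T.bit_length()):
        if (T >> j) & 1 and not (S >> j) & 1:
            q2.append(q2[-1] | (1 << j))
    q1 = [v | (1 << i) for v in reversed(q2)]
    return q1, q2
```

For instance, `splice([0, 1, 3, 7], [2, 3, 1, 5], 1, 3)` returns `([0, 1, 5], [2, 3, 7])`: the edge $(1, 3)$ and its reverse disappear, and the two new paths have the same combined endpoints, hence the same total value. `parallel_paths(0b0001, 0b0111, 3)` returns `([15, 11, 9], [1, 3, 7])`, whose edges pair up in parallel.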

Large minimal certificates
This will require many steps. We will start by giving a construction of a long cycle in $B$ with some special properties. This cycle will be a sort of "frame" on which we can define $f$. For this $f$, we will find a set of matched paths of negative value, showing that $f$ is not extendable.
The simple cycle will be obtained by performing a series of moves in $B$. An upward (resp. downward) step is one where some coordinate is incremented (resp. decremented). We will assume that $n = 2m + 4$. The cycle will only involve points in levels $m+1, m+2, m+3, m+4$ of $B$. We will call these levels the 1, 2, 3, 4 levels. Any point is represented as $(b_1, b_2, b_3, b_4, S, T)$, where the $b_i$'s are bits, and $S$ and $T$ are sets on $m$ elements. The starting (and hence ending) point of the cycle is $(0, 0, 1, 0, \emptyset, [m])$, where $[m]$ denotes the complete set on $m$ elements. The cycle $C$ has the following properties:
• The cycle is simple, i.e., it does not intersect itself.
• The cycle can be divided into a sequence of contiguous chunks of three steps each. Every odd (resp. even) chunk has three upward (resp. downward) steps. There is an even number of chunks.
• The cycle has $M \ge 2^m$ chunks.
• Let the $i$th chunk be denoted by $K_i$. The second edge $e$ of $K_i$ is parallel to the first edge $e'$ of $K_{i+1 \pmod{M}}$. If $i$ is odd, then $K_i$ has upward steps, and hence $e' \succ e$. Similarly, if $i$ is even, $e' \prec e$.
A crucial combinatorial property of the hypercube that we use is the existence of Hamiltonian circuits. We set $H$ to be a (directed) Hamiltonian circuit on the $m$-dimensional hypercube. For any set $R \in H$, $s(R)$ denotes the successor of $R$ in $H$. The complement path $\overline{H}$ is the Hamiltonian circuit obtained by taking the set-complement of every point in $H$.
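One standard choice of $H$ (an assumption; the construction only needs some Hamiltonian circuit) is the reflected binary Gray code, which lists all $2^m$ subsets of $[m]$ so that consecutive sets, including the last and the first, differ in one element. The sketch below encodes sets as bitmasks and builds $H$, its complement $\overline{H}$, and the successor map $s$. It also checks the property used in Lemma 4.5: the element changed from $R$ to $s(R)$ is the same one changed from $\overline{R}$ to $\overline{s(R)}$, in the opposite direction. This $H$ starts at $\emptyset$, matching the starting point $(0, 0, 1, 0, \emptyset, [m])$. The function names are hypothetical.

```python
def gray_circuit(m):
    """Reflected binary Gray code: a Hamiltonian circuit H on the m-cube (sets as bitmasks)."""
    return [k ^ (k >> 1) for k in range(1 << m)]

def complement_circuit(H, m):
    """H-bar: take the set complement of every point of H."""
    full = (1 << m) - 1
    return [full ^ R for R in H]

def successor_map(H):
    """s(R): the set following R along H, wrapping around at the end."""
    return {R: H[(t + 1) % len(H)] for t, R in enumerate(H)}

def is_hamiltonian_circuit(H, m):
    """Every subset appears once, and cyclically consecutive sets differ in one element."""
    one_bit = lambda a, b: bin(a ^ b).count("1") == 1
    return (sorted(H) == list(range(1 << m))
            and all(one_bit(H[t], H[(t + 1) % len(H)]) for t in range(len(H))))
```

Because consecutive sets differ in a single element, a move $R \to s(R)$ that adds an element corresponds to the move $\overline{R} \to \overline{s(R)}$ that removes the same element, which is why one upward and one downward step suffice.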
Lemma 4.5 There exists a cycle C with the properties above.
Proof: Starting from a point $(0, 0, 1, 0, R, \overline{R})$, we will give a sequence of 4 chunks that ends at $(0, 0, 1, 0, s(s(R)), \overline{s(s(R))})$. Since $H$ is a Hamiltonian circuit, we get a cycle. The reason we keep $R$ and $\overline{R}$ is that from $(\cdots, R, \overline{R})$, we can perform a single upward and then a single downward step to reach $(\cdots, s(R), \overline{s(R)})$. We will assume that the moves to both $s(R)$ and $s(s(R))$ are upward. Whenever this is not the case, we can just reverse the roles of $R$ (or $s(R)$) and $\overline{R}$ (or $\overline{s(R)}$).
We describe the sequence of chunks. In the arrows below, the label above each arrow represents the coordinate being changed. The numbers 1, 2, 3, 4 represent the first four coordinates. If the label is a set, then that set is being changed by moving along (appropriately) either $H$ or $\overline{H}$. These labels help verify the matching property. The first and third chunks only have upward steps, and the remaining ones have only downward steps. For convenience, $S = s(R)$ and $T = s(S)$. It is easy to see that no point can occur in two different chunks, because the sets on $H$ or $\overline{H}$ are different. So, the cycle is simple. The number of chunks is at least the number of points in the $m$-dimensional hypercube. The matching property should be clear. □

We now define the function $f$. Let $P_i$ be the directed path consisting of the first two edges of chunk $K_i$. Note that $P_{2i}$ is downward and $P_{2i+1}$ is upward. We describe the function $f$ and state several properties of $\mathrm{def}(f)$. It will be convenient to define the following sequences of 4 bits: $B_1 = (0, 0, 1, 0)$, $B_2 = (1, 0, 0, 0)$, $C_1 = (1, 1, 1, 0)$, and $C_2 = (1, 0, 1, 1)$. We use $A$ to denote any one of these.
• The function $f$ will be defined on all the endpoints of the $P_i$'s.
• For $P_1$, the small endpoint has value $v$ (the exact choice of $v$ is immaterial), and the larger endpoint has value $v + 1$. For $P_{2i+1}$ ($i > 0$), the small end has value $v$ and the large end has value $v + 2$. For $P_{2i}$ (for all $i$), the large end has value $v + 2$ and the small end has value $v$.
• Fix any $R$. One and only one point of the form $(B_j, R, \overline{R})$ is present in $\mathrm{def}(f)$. Similarly, one and only one of the points $(C_j, R, \overline{R})$ is present in $\mathrm{def}(f)$. We also have $(B_j, R, \overline{R}) \in \mathrm{def}(f)$ iff $(C_j, R, \overline{R}) \in \mathrm{def}(f)$. No other point is present in levels 1 and 3.
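• Fix any $R$. Suppose $s(R) \supset R$. One and only one of the points $(B_j, s(R), \overline{R})$ is present in $\mathrm{def}(f)$. Similarly, one and only one of the points $(C_j, s(R), \overline{R})$ is present in $\mathrm{def}(f)$. We also have $(B_j, s(R), \overline{R}) \in \mathrm{def}(f)$ iff $(C_j, s(R), \overline{R}) \in \mathrm{def}(f)$. No other point is present in levels 2 and 4.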
Suppose $s(R) \subset R$. Then these points are of the form $(A, R, \overline{s(R)})$.
• For any point of def(f ), there is at most one neighbor present in def(f ). Hence, any square of B contains at most 2 points of def(f ).
• Consider some point $(B_j, R, \overline{R})$ in level 1. The only point of $\mathrm{def}(f)$ in level 3 at Hamming distance 2 from this point is $(C_j, R, \overline{R})$. A similar statement holds for points in level 2.
Claim 4.6 The function f is not submodular-extendable.
Proof: By Lemma 4.2, it suffices to exhibit a path certificate. As the astute reader might have guessed, the $P_i$'s together form such a set. A matching exists because of the fourth property of the cycle $C$. The value of $P_1$ is 1. The value of any other $P_{2i+1}$ is 2. Every $P_{2i}$ has value $-2$. Since the total number of chunks is even, the value of this set of paths is $-1$. □

We will now show that $f|_S$ is extendable for any $S \subsetneq \mathrm{def}(f)$. It will be easiest to show this by proving that any path certificate for $f$ must essentially be the set of $P_i$'s.
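The value computation can be written out explicitly. With $M$ even, there are $M/2$ odd-indexed paths (one of which is $P_1$) and $M/2$ even-indexed paths:

```latex
\mathrm{val}(\{P_i\}) = \underbrace{1}_{P_1}
  + \underbrace{2\left(\tfrac{M}{2} - 1\right)}_{P_3, P_5, \ldots, P_{M-1}}
  - \underbrace{2 \cdot \tfrac{M}{2}}_{P_2, P_4, \ldots, P_M}
  = 1 - 2 = -1.
```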
Claim 4.7 Suppose $f$ contains a set of matched paths $\mathcal{P}$ with no singletons. Then $\mathcal{P}$ must be the set of all $P_i$'s.
Proof: Consider a point $X$ in $\mathcal{P}$ that lies in the lowest level (the number of 1s in its representation is minimized). We argue that this point only has upward edges incident to it. If there is a downward edge $e$ incident to it, then $\mathcal{P}$ must contain an upward edge $e'$ that is matched to $e$. Therefore, $e' \prec e$, and the lower end of $e'$ must lie in a lower level than $X$. This contradicts the choice of $X$. Hence, $X$ only has upward edges incident to it. This means that it can never be in the interior of a path, and must be a terminal. Therefore, $X \in \mathrm{def}(f)$. Similarly, points in $\mathcal{P}$ that lie in the highest level only have downward edges incident to them, and are also in $\mathrm{def}(f)$.
The points of def(f ) lie in levels m + 1, m + 2, m + 3, m + 4, called the 1, 2, 3, 4 levels. Edges between the 1 and 2 levels are called low edges, those between the 2 and 3 levels are middle edges, and those between the 3 and 4 levels are high edges. All edges of P fall into one of these three sets. Low edges are always upward and high edges are always downward. Middle edges are matched to either low or high edges. Therefore, the number of middle edges is exactly the same as the total Hence h is submodular.
Assume that $f$ is monotone. Then, for any $x$, $f(x) \le f(\emptyset) = M$. Since $f(x + e_i) + f(x + e_j) - f(x + e_i + e_j) - f(x) \le 2M$, $f + h$ is also submodular.
Suppose $g$ is not submodular. Then there exists a violated square in $g$. Suppose this square does not involve $e^*$. This square is contained in a copy of $\{0,1\}^n$ where the function is equal to $h$ or $f + h$. But this would imply that either $h$ or $f + h$ is not submodular. So, this square must involve $e^*$. Then we have the following:
$$0 < g(0, x) + g(1, x + e_i) - g(0, x + e_i) - g(1, x) = f(x + e_i) - f(x).$$
This violates the non-increasing property of f . Hence, we conclude that g is submodular.
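The identity used in this proof is easy to check mechanically. The sketch below assumes, as the proof indicates, that $g(0, x) = h(x)$ and $g(1, x) = f(x) + h(x)$; since $h$ is defined outside this section, it uses an arbitrary integer table in its place, which suffices because the identity does not depend on $h$. It verifies, on every square involving $e^*$, that $g(0,x) + g(1,x+e_i) - g(0,x+e_i) - g(1,x) = f(x+e_i) - f(x)$, and that $f$ is recovered as $g(1,x) - g(0,x)$. The helper names are hypothetical.

```python
import random
from itertools import product

def flip(x, k):
    """Return x with coordinate k flipped."""
    return x[:k] + (1 - x[k],) + x[k + 1:]

def build_g(f, h):
    """g(b, x): h on the b = 0 copy and f + h on the b = 1 copy of {0,1}^n."""
    return lambda b, x: h[x] + (f[x] if b == 1 else 0)

def check_reduction(f, h, n):
    g = build_g(f, h)
    for x in product((0, 1), repeat=n):
        assert g(1, x) - g(0, x) == f[x]                 # f = g(1, .) - g(0, .)
        for i in range(n):
            if x[i] == 0:
                xi = flip(x, i)
                lhs = g(0, x) + g(1, xi) - g(0, xi) - g(1, x)
                assert lhs == f[xi] - f[x]               # the square through e* and e_i
    return True

n = 3
rng = random.Random(0)
points = list(product((0, 1), repeat=n))
f_tab = {x: rng.randint(-5, 5) for x in points}
h_tab = {x: rng.randint(-5, 5) for x in points}
```

So a square through $e^*$ is violated exactly when $f(x+e_i) > f(x)$; for a non-increasing $f$ such as $f(x) = -\|x\|_1$, none of these squares is violated.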
Now, suppose that $f$ is $\epsilon$-far from being monotone. Furthermore, suppose we can modify fewer than $\epsilon 2^n$ values of $g$ to get a submodular function $g'$. Consider the function $f'(x) = g'(1, x) - g'(0, x)$. Since $g'$ is submodular, $f'$ must be monotone. Since $g'$ differs from $g$ in fewer than $\epsilon 2^n$ values, the monotone function $f'$ differs from $f$ in fewer than $\epsilon 2^n$ values. This is a contradiction. So, since $g$ is defined on $2^{n+1}$ points, $g$ must be $\epsilon/2$-far from being submodular. □ By the results in [FLN+02], there is an $\Omega(\sqrt{n})$ lower bound for non-adaptive and an $\Omega(\log n)$ lower bound for adaptive 1-sided monotonicity testers. We get the following corollary.

4. Testing matroid independence oracles: Any matroid can be represented as a collection of independent sets. Suppose we have a function that tells us whether a set is independent (for some purported matroid). Can we efficiently test whether this function is indeed a valid independence oracle? This seems like a rather fundamental question about matroids.
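To make the question concrete: with unlimited queries, one can simply verify the independence axioms exhaustively, as in the sketch below (hypothetical helper names; sets are `frozenset`s and the oracle is any boolean function on them). It checks that $\emptyset$ is independent, that independence is closed under removing an element, and the exchange property. This reads all $2^{|E|}$ oracle values and compares all pairs of independent sets, which is exactly the cost an efficient tester would have to avoid.

```python
from itertools import combinations

def is_independence_oracle(ground, indep):
    """Exhaustively check the matroid independence axioms for the oracle indep."""
    ground = list(ground)
    family = {frozenset(c) for r in range(len(ground) + 1)
              for c in combinations(ground, r) if indep(frozenset(c))}
    if frozenset() not in family:                        # (I1) empty set is independent
        return False
    for A in family:                                     # (I2) hereditary; single removals
        if any(A - {x} not in family for x in A):        #      suffice by induction
            return False
    for A in family:                                     # (I3) exchange property
        for B in family:
            if len(A) < len(B) and not any(A | {x} in family for x in B - A):
                return False
    return True
```

For instance, the uniform-matroid oracle `lambda A: len(A) <= 2` on $\{1, 2, 3\}$ passes, while a family containing $\{1\}$ and $\{2, 3\}$ but neither $\{1, 2\}$ nor $\{1, 3\}$ fails the exchange check.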