A Refined Analysis of Submodular Greedy

Many algorithms for maximizing a monotone submodular function subject to a knapsack constraint rely on the natural greedy heuristic. We present a novel refined analysis of this greedy heuristic which enables us to: $(1)$ reduce the enumeration in the tight $(1-e^{-1})$-approximation of [Sviridenko 04] from subsets of size three to two; $(2)$ present an improved upper bound of $0.42945$ on the approximation ratio of the classic algorithm which returns the better of a single element and the output of the greedy heuristic.


Introduction
Submodularity is a fundamental mathematical notion that captures the concept of economy of scale and is prevalent in many areas of science and technology. Given a ground set $E$, a set function $f : 2^E \to \mathbb{R}$ over $E$ is called submodular if it has the diminishing returns property: $f(A \cup \{e\}) - f(A) \ge f(B \cup \{e\}) - f(B)$ for every $A \subseteq B \subseteq E$ and $e \in E \setminus B$. Submodular functions naturally arise in different areas such as combinatorics, graph theory, probability, game theory, and economics. Some well known examples include coverage functions, cuts in graphs and hypergraphs, matroid rank functions, entropy, and budget additive functions.
A submodular function $f$ is monotone if $f(S) \le f(T)$ for every $S \subseteq T \subseteq E$. In this note we consider the problem of maximizing a monotone submodular function subject to a knapsack constraint (MSK). An instance of the problem is a tuple $(E, f, w, W)$ where $E$ is a set of $n$ elements, $f : 2^E \to \mathbb{R}_{\ge 0}$ is a non-negative, monotone and submodular set function given by a value oracle, $w : E \to \mathbb{N}_+$ is a weight function over the elements, and $W \in \mathbb{N}$ is the knapsack capacity. A subset $S \subseteq E$ is feasible if $\sum_{e \in S} w(e) \le W$, i.e., the total weight of elements in $S$ does not exceed the capacity $W$; the value of $S \subseteq E$ is $f(S)$. The objective is to find a feasible subset $S \subseteq E$ of maximal value.
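To make the model concrete, the following minimal sketch (ours; the elements, weights, and capacity are illustrative and not from the paper) encodes an MSK instance in which $f$ is a coverage function accessed through a value oracle, together with the feasibility test.

```python
# A small illustrative MSK instance (not from the paper): f is a coverage
# function, i.e., f(S) counts the points covered by the elements of S.
points_of = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}   # hypothetical elements
weights = {"a": 2, "b": 1, "c": 1}
W = 3  # knapsack capacity

def f(S):
    """Value oracle; coverage functions are non-negative, monotone and submodular."""
    return len(set().union(*(points_of[e] for e in S))) if S else 0

def is_feasible(S):
    """S is feasible if the total weight of its elements does not exceed W."""
    return sum(weights[e] for e in S) <= W

assert f({"a", "b"}) == 4 and is_feasible({"a", "b"})
```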
MSK arises in many applications. Some examples include sensor placement [9], document summarization [10], and network optimization [15]. The problem is a generalization of monotone submodular maximization subject to a cardinality constraint (i.e., $w(e) = 1$ for all $e \in E$), for which a simple greedy algorithm yields a $(1-e^{-1})$-approximation [13]. This is the best ratio which can be obtained in polynomial time in the oracle model [12]. The approximation ratio of $(1-e^{-1})$ is also optimal for the special case of coverage functions unless P = NP [6].
Many algorithms for MSK rely on a natural greedy heuristic. Greedy maintains a feasible subset $A \subseteq E$. In each step it adds to $A$ an element $e \in \{e' \in E \mid w(A \cup \{e'\}) \le W\}$ which maximizes the marginal gain per unit of weight, $\frac{f(A \cup \{e'\}) - f(A)}{w(e')}$. While the greedy heuristic does not guarantee any constant approximation ratio on its own, it is commonly utilized as a procedure within approximation algorithms.
The first $(1-e^{-1})$-approximation for MSK was given by Sviridenko [16] as an adaptation of an algorithm of Khuller, Moss and Naor [8] proposed for the special case of coverage functions. The algorithm of Sviridenko exhaustively enumerates (iterates) over all subsets $G \subseteq E$ of at most three elements and extends each set $G$ using the greedy heuristic. The algorithm uses $O(n^5)$ oracle calls and arithmetic operations.
Several works were dedicated to the development of simple, fast, greedy-based algorithms for MSK, with approximation ratios strictly smaller than $(1-e^{-1})$. Special attention was given to the algorithm which returns the better of the single element of highest value and the result of the greedy heuristic, to which we refer as Greedy+Singleton (see the pseudocode in Section 4).
The algorithm was first suggested in [8] for coverage functions, and adapted to monotone submodular functions in [10]. Both works stated an approximation guarantee of $(1-e^{-0.5})$, though the proofs in both works were flawed. A correct proof of a $(1-e^{-0.5})$-approximation was given by Tang et al. [17], improving upon an earlier approximation guarantee of $\frac{e-1}{2e-1} \approx 0.387$ by Cohen and Katzir [2]. Recently, Feldman, Nutov and Shoham [7] showed that the approximation ratio of the algorithm lies within $[0.427, 0.462]$.
A recent work by Yaroslavtsev, Zhou and Avdiukhin [18] gives an $O(n^2)$ greedy-based algorithm with an approximation guarantee of $\frac{1}{2}$. A variant of the algorithm of Yaroslavtsev et al. was used by Feldman et al. [7] to derive a greedy-based $0.9767 \cdot (1-e^{-1})$-approximation using $O(n^3)$ oracle calls.
Taking a more theoretical point of view, Ene and Nguyen [4] presented a $(1-e^{-1}-\varepsilon)$-approximation for MSK in time $O(n \cdot \log^2 n)$ for any fixed $\varepsilon > 0$, improving upon an earlier $O(n^2 \cdot \mathrm{polylog}(n))$ algorithm with the same approximation ratio due to Badanidiyuru and Vondrák [1]. We note, however, that the dependence of the running times of these algorithms on $\varepsilon$ renders them purely theoretical.
Our main technical contribution is a tighter analysis of the greedy heuristic, presented in Section 2. We show two applications of the analysis. In the first application we consider a variant of the algorithm in [16] which only enumerates over subsets of size at most two (as opposed to three in [16]), and show that it retains the tight approximation ratio of $(1-e^{-1})$ for MSK.

Theorem 1.1. There is a $(1-e^{-1})$-approximation for MSK using $O(n^4)$ value oracle calls and arithmetic operations which works as follows: enumerate over all subsets of size at most two and extend each using the greedy heuristic.
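For concreteness, here is a short sketch (ours; function and variable names are our own) of the algorithm in Theorem 1.1: every subset $G$ of size at most two is extended by running the greedy heuristic on the residual instance $(E, f_G, w, W - w(G))$.

```python
from itertools import combinations

def greedy(E, f, w, W):
    """The greedy heuristic (Algorithm 1 in outline): repeatedly consider the
    remaining element of maximum marginal value per unit of weight; select it
    if it fits the remaining capacity, discard it otherwise."""
    A, Ep = set(), set(E)
    while Ep:
        e = max(Ep, key=lambda x: (f(A | {x}) - f(A)) / w[x])
        if sum(w[y] for y in A) + w[e] <= W:
            A.add(e)
        Ep.remove(e)
    return A

def enum_greedy_2(E, f, w, W):
    """Sketch of the algorithm of Theorem 1.1: enumerate all feasible subsets
    G with |G| <= 2, extend each by running greedy on the residual instance
    (E, f_G, w, W - w(G)), and return the best extended set found."""
    best = set()
    for size in range(3):
        for G in map(set, combinations(E, size)):
            wG = sum(w[e] for e in G)
            if wG > W:
                continue
            f_G = lambda S, G=G: f(G | S) - f(G)   # residual oracle f_G
            A = greedy(set(E) - G, f_G, w, W - wG)
            if f(G | A) > f(best):
                best = G | A
    return best
```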
Let us now briefly elaborate on the insight we use in order to improve the analysis of [16]. Intuitively, the analysis in [16] bounds the value of the solution generated by the greedy phase assuming a worst case submodular function f , and then bounds the value loss due to a discarded element (the element is discarded by the analysis, not by the algorithm) assuming a worst case submodular function g. The main insight for our improved result is that g = f ; that is, there is no function which attains simultaneously the worst cases assumed in [16] for the outcome of greedy and for the value loss due to the discarded element. This insight is well captured by the refined analysis. The proof of Theorem 1.1 is given in Section 3.
In the second application we utilize the refined analysis to improve the best known upper bound on the approximation ratio of Greedy+Singleton.
Theorem 1.2. The approximation ratio of Greedy+Singleton is no greater than β = 0.42945.
The proof of the above theorem is given in Section 4. The result was obtained by using numerical optimization to generate an instance for which the guarantee of the refined analysis is poor. Combined with the result of [7], the theorem limits the approximation ratio of Greedy+Singleton to the narrow interval $[0.427, 0.4295]$.
We note that our observation that it suffices to enumerate over all subsets of size at most two was also obtained independently and in parallel by Feldman, Nutov and Shoham [7]. However, our proof and the proof of [7] differ significantly. Our approach for proving the main observation is useful for constructing counterexamples for the approximation of the greedy heuristic, as demonstrated in Theorem 1.2.

The Greedy Procedure
We start with some definitions and notation. Given a monotone submodular function $f : 2^E \to \mathbb{R}_{\ge 0}$ and $A \subseteq E$, we define the function $f_A : 2^E \to \mathbb{R}_{\ge 0}$ by $f_A(S) = f(A \cup S) - f(A)$ for any $S \subseteq E$. It is well known that $f_A$ is also monotone, submodular and non-negative (see, e.g., Claim 13 in [5]). We also use $f(e) = f(\{e\})$ for $e \in E$.
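As a quick illustration (ours), $f_A$ can be implemented as a closure over the value oracle:

```python
def restrict(f, A):
    """Return the marginal function f_A(S) = f(A | S) - f(A); f_A is again
    non-negative, monotone and submodular whenever f is."""
    fA = f(A)
    return lambda S: f(A | S) - fA

# Toy check with a coverage oracle (elements are frozensets of points):
f = lambda S: len(set().union(*S)) if S else 0
f_A = restrict(f, {frozenset({1, 2})})
assert f_A({frozenset({2, 3})}) == 1   # only point 3 is new given A
```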
The greedy procedure is given in Algorithm 1. While the procedure is useful for deriving efficient approximation algorithms, as a stand-alone algorithm it does not guarantee any constant approximation ratio. We say that the element $e \in E$ found in Step 3 is considered in the specific iteration of the loop in Step 2. Furthermore, if the element was also added to $A$ in Step 5, we say it was selected in this iteration. For any MSK instance $(E, f, w, W)$, we define a value function $V$. Let $\{a_1, \ldots, a_\ell\}$ be the output of Greedy$(E, f, w, W)$, in the order by which the elements are added to $A$ in Step 5 of Algorithm 1, and let $A_i = \{a_1, \ldots, a_i\}$ for $i \in [\ell]$ and $A_0 = \emptyset$. The value function $V : [0, w(A_\ell)] \to \mathbb{R}$ is defined by
$$V(u) = f(A_{i-1}) + \left(u - w(A_{i-1})\right) \cdot \frac{f_{A_{i-1}}(a_i)}{w(a_i)} \qquad \text{for } w(A_{i-1}) \le u \le w(A_i).$$
That is, the value function $V$ is piecewise linear and continuous. By definition we have that $V(0) = f(\emptyset)$ and $V(w(A_\ell)) = f(A_\ell)$. Intuitively, $V(u)$ can be viewed as the value attained by Algorithm 1 while using capacity $u$. We use $V'$ to denote the first derivative of $V$. We note that $V'(u) = \frac{f_{A_{i-1}}(a_i)}{w(a_i)}$ for every $i \in [\ell]$ and $u \in (w(A_{i-1}), w(A_i))$. Similar to [16], our analysis is based on lower bounds on $V'$ (in [16] the analysis used a discretization of $V$, thus omitting the differentiation).
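The following sketch (ours; the helper names are our own) records the greedy trace $a_1, \ldots, a_\ell$ and evaluates $V$ by linear interpolation between the breakpoints $(w(A_i), f(A_i))$, matching the definition above.

```python
def greedy_trace(E, f, w, W):
    """Run the greedy heuristic and return the selected elements
    a_1, ..., a_l in the order in which they were added to A."""
    A, Ep, order = set(), set(E), []
    while Ep:
        # Step 3: the considered element, of maximum marginal value density.
        e = max(Ep, key=lambda x: (f(A | {x}) - f(A)) / w[x])
        if sum(w[y] for y in A) + w[e] <= W:
            A.add(e)             # Step 5: e is selected
            order.append(e)
        Ep.remove(e)             # a considered element is never revisited
    return order

def value_function(order, f, w):
    """Return V, the linear interpolation between the breakpoints
    (w(A_i), f(A_i)) of the greedy trace, as in the definition above."""
    A, pts = set(), [(0.0, float(f(set())))]
    for a in order:
        A.add(a)
        pts.append((pts[-1][0] + w[a], float(f(A))))
    def V(u):
        for (u0, v0), (u1, v1) in zip(pts, pts[1:]):
            if u <= u1:
                return v0 + (u - u0) * (v1 - v0) / (u1 - u0)
        return pts[-1][1]
    return V
```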
The next lemma gives a lower bound on $V'$.

Lemma 2.2. Let $i \in [\ell]$, let $E'$ be the set from Algorithm 1 at the beginning of the iteration in which $a_i$ is selected, and let $Y = \{y_1, \ldots, y_m\} \subseteq A_{i-1} \cup E'$. Then for every $u \in (w(A_{i-1}), w(A_i))$ it holds that $V'(u) \ge \frac{f(Y) - V(u)}{w(Y)}$.

Proof. Consider some $j \in [m]$. If $y_j \in A_{i-1}$ then $f_{A_{i-1}}(y_j) = 0 \le w(y_j) \cdot \frac{f_{A_{i-1}}(a_i)}{w(a_i)}$. Otherwise, by the assumption of the lemma, $y_j \in E'$ and therefore $\frac{f_{A_{i-1}}(y_j)}{w(y_j)} \le \frac{f_{A_{i-1}}(a_i)}{w(a_i)}$, since $a_i$ was selected in Step 5 of Algorithm 1 when the value of the variable $A$ was $A_{i-1}$. Thus, $f_{A_{i-1}}(y_j) \le w(y_j) \cdot \frac{f_{A_{i-1}}(a_i)}{w(a_i)}$ for every $j \in [m]$. By the last inequality, and since $f$ is monotone and submodular, we have
$$f(Y) - V(u) \le f(Y \cup A_{i-1}) - f(A_{i-1}) \le \sum_{j=1}^{m} f_{A_{i-1}}(y_j) \le \sum_{j=1}^{m} w(y_j) \cdot \frac{f_{A_{i-1}}(a_i)}{w(a_i)} = w(Y) \cdot V'(u),$$
where the first inequality also uses $V(u) \ge f(A_{i-1})$, as $V$ is non-decreasing. By rearranging the terms, we have $V'(u) \ge \frac{f(Y) - V(u)}{w(Y)}$, as desired.
To lower bound $V$, we use Lemma 2.2 with several different sets as $Y$. Let $X$ be a feasible solution for the instance, partitioned into $X_1, \ldots, X_k$. It makes sense to first use Lemma 2.2 with $Y = X_1$ and utilize the resulting differential inequality to lower bound $V$ on an interval $[0, D_1]$; subsequently, the differential inequality resulting from $Y = X_1 \cup X_2$ is used to bound $V$ on $[D_1, D_2]$, and so on. When repeated $k$ times, the process results in the bounding function $h$, formally given in Definition 2.3. Lemma 2.4 shows that indeed $h$ lower bounds $V$.

Definition 2.3. Let $I = (E, f, w, W)$ be an MSK instance, and consider $X \subseteq E$ such that $w(X) \le W$, and a partition $X_1, \ldots, X_k$ of $X$ (that is, $\bigcup_{j=1}^{k} X_j = X$ and the sets are pairwise disjoint and non-empty). Define $S_0 = \emptyset$ and $S_j = X_1 \cup \ldots \cup X_j$, as well as $r_j = \frac{f_{S_{j-1}}(X_j)}{w(X_j)}$, for $j \in [k]$. Furthermore, define $D_j = \sum_{j'=1}^{j} w(S_{j'}) \cdot \ln\left(\frac{r_{j'}}{r_{j'+1}}\right)$ for $0 \le j \le k-1$, and $D_k = \infty$. The bounding function of $X_1, \ldots, X_k$ and $I$ is $h : [0, \infty) \to \mathbb{R}$, given by
$$h(u) = f(S_j) - r_j \cdot w(S_j) \cdot e^{-\frac{u - D_{j-1}}{w(S_j)}} \qquad \text{for } D_{j-1} \le u < D_j \text{ and } j \in [k]. \quad (1)$$

It can be easily verified that $h$ is well defined when $r_1 \ge r_2 \ge \ldots \ge r_k$ (so that $D_0 \le D_1 \le \ldots \le D_k$). Moreover, for $j \in [k-1]$, letting $u \to D_j$ in the $j$-th case of (1) yields $f(S_j) - r_j \cdot w(S_j) \cdot e^{-\frac{D_j - D_{j-1}}{w(S_j)}} = f(S_j) - r_{j+1} \cdot w(S_j)$, where the equality follows from the definition of $D_j$; evaluating the $(j+1)$-st case of (1) at $u = D_j$ yields $f(S_{j+1}) - r_{j+1} \cdot w(S_{j+1}) = f(S_j) + r_{j+1} \cdot w(X_{j+1}) - r_{j+1} \cdot w(S_{j+1}) = f(S_j) - r_{j+1} \cdot w(S_j)$, where the first equality holds since $f(S_{j+1}) = f(S_j) + f_{S_j}(X_{j+1}) = f(S_j) + r_{j+1} \cdot w(X_{j+1})$. Thus, the bounding function $h$ is continuous.
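A small numerical sketch (ours) of Definition 2.3: given the prefix values $f(S_j)$, prefix weights $w(S_j)$ and densities $r_j$, it computes the breakpoints $D_j$ and evaluates $h$; the toy numbers are illustrative.

```python
import math

def bounding_function(fS, wS, r):
    """Evaluate the bounding function h of Definition 2.3. fS[j], wS[j] are
    f(S_{j+1}) and w(S_{j+1}); r[j] is the density r_{j+1}; the densities
    are assumed non-increasing. On [D_{j-1}, D_j),
    h(u) = f(S_j) - r_j * w(S_j) * exp(-(u - D_{j-1}) / w(S_j))."""
    k = len(fS)
    D = [0.0]
    for j in range(k - 1):   # D_j = D_{j-1} + w(S_j) * ln(r_j / r_{j+1})
        D.append(D[-1] + wS[j] * math.log(r[j] / r[j + 1]))
    D.append(math.inf)       # D_k = infinity
    def h(u):
        j = next(i for i in range(k) if u < D[i + 1])
        return fS[j] - r[j] * wS[j] * math.exp(-(u - D[j]) / wS[j])
    return h

# Toy numbers: f(S_1)=1, w(S_1)=1, r_1=1; f(S_2)=1.5, w(S_2)=2, r_2=0.5.
h = bounding_function(fS=[1.0, 1.5], wS=[1.0, 2.0], r=[1.0, 0.5])
assert abs(h(0.0)) < 1e-12                # h(0) = 0
assert abs(h(math.log(2)) - 0.5) < 1e-12  # continuity at D_1 = ln 2
```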
Let $S^*$ be an optimal solution for an MSK instance $I = (E, f, w, W)$. Then the bounding function of $I$ and $S^*$ (i.e., the trivial partition with $k = 1$ and $X_1 = S^*$) is $h(u) = f(S^*) \cdot \left(1 - e^{-\frac{u}{w(S^*)}}\right)$. This is exactly the lower bound on the value of the greedy solution derived in the analysis of [16], where the restriction to integer values can be easily relaxed. Thus, the following lemma can be viewed as a generalization of the analysis of [16].

Lemma 2.4. Let $I = (E, f, w, W)$ be an MSK instance, $V$ its value function, and $A$ the output of Algorithm 1 for the instance $I$. Consider a subset of elements $X \subseteq E$ where $w(X) \le W$, and a partition $X_1, \ldots, X_k$ of $X$ such that $r_i \ge r_{i+1}$ for any $i \in [k-1]$. Let $h$ be the bounding function of $I$ and $X_1, \ldots, X_k$, and let $W_{\max} = \min\left\{W - \max_{e \in X} w(e),\; w(A)\right\}$. Then, for any $u \in [0, W_{\max}]$, it holds that $V(u) \ge h(u)$.
The proof of Lemma 2.4 uses a differential comparison argument. We say a function $\varphi : Z \to \mathbb{R}$, $Z \subseteq \mathbb{R}^2$, is positively linear in the second dimension if there is $K > 0$ such that for any $u, t_1, t_2$ where $(u, t_1), (u, t_2) \in Z$ it holds that $\varphi(u, t_1) - \varphi(u, t_2) = K \cdot (t_2 - t_1)$. The following is a simple variant of standard differential comparison theorems (see, e.g., [11]).

Lemma 2.5. Let $\varphi : (c, d) \times \mathbb{R} \to \mathbb{R}$ be continuous and positively linear in the second dimension, and let $g_1, g_2 : [c, d] \to \mathbb{R}$ be continuous functions which are differentiable on $(c, d)$, such that $g_1(c) \ge g_2(c)$, $g_1'(u) \ge \varphi(u, g_1(u))$ and $g_2'(u) = \varphi(u, g_2(u))$ for every $u \in (c, d)$. Then $g_1(u) \ge g_2(u)$ for every $u \in [c, d]$.
The lemma follows from standard arguments in the theory of differential equations; a formal proof is given in Appendix A.

Proof of Lemma 2.4. Let $(D_j)_{j=0}^{k}$, $(S_j)_{j=0}^{k}$ and $(r_j)_{j=1}^{k}$ be as in Definition 2.3. Define
$$\varphi(u, v) = \frac{f(S_j) - v}{w(S_j)} \qquad \text{for } j \in [k],\; D_{j-1} \le u < D_j \text{ and } v \in \mathbb{R}.$$
Let $A = \{a_1, \ldots, a_\ell\}$, where $a_1, \ldots, a_\ell$ is the order by which the elements were added to $A$ in Step 5 of Algorithm 1. As before, we use $A_i = \{a_1, \ldots, a_i\}$ for $i \in [\ell]$ and $A_0 = \emptyset$. Let $C = (0, W_{\max}) \setminus \{D_1, \ldots, D_{k-1}\} \setminus \{w(A_1), \ldots, w(A_\ell)\}$. For any $u \in C$ there is $j \in [k]$ such that $D_{j-1} < u < D_j$. Hence,
$$h'(u) = \frac{f(S_j) - h(u)}{w(S_j)} = \varphi(u, h(u)), \quad (3)$$
where $h'$ is the first derivative of $h$; the first equality follows by differentiating (1). As $u \in C$, there is also $i \in [\ell]$ such that $w(A_{i-1}) < u < w(A_i)$. Hence,
$$V'(u) \ge \frac{f(S_j) - V(u)}{w(S_j)} = \varphi(u, V(u)). \quad (4)$$
The inequality is an application of Lemma 2.2 with $Y = S_j$. To verify the conditions of the lemma, we note that $X \subseteq A_{i-1} \cup E'$, where $E'$ is the set at the beginning of the iteration in which $a_i$ was selected. Indeed, otherwise $X$ contains an element $e$ that was considered by the algorithm at some iteration $i' < i$ but not selected since $w(A_{i'} \cup \{e\}) > W$; this would imply $u > w(A_{i-1}) \ge w(A_{i'}) > W - w(e) \ge W - \max_{e' \in X} w(e') \ge W_{\max}$, a contradiction to $u < W_{\max}$. Thus, as $S_j \subseteq X$, we have the conditions of Lemma 2.2. By (3) and (4) we have
$$\forall u \in C : \quad h'(u) = \varphi(u, h(u)) \quad \text{and} \quad V'(u) \ge \varphi(u, V(u)). \quad (5)$$
We can write $C = \bigcup_{r=1}^{s} (c_r, c_{r+1})$ where $0 = c_1 \le c_2 \le \ldots \le c_{s+1} = W_{\max}$. For any $r \in [s]$ let $\varphi_r$ be the restriction of $\varphi$ to $(c_r, c_{r+1}) \times \mathbb{R}$. It can be easily verified that $\varphi_r$ is continuous and positively linear in the second dimension. Furthermore, it holds that $V'$ and $h'$ are continuous on $(c_r, c_{r+1})$ for any $r \in [s]$. Thus, applying Lemma 2.5 consecutively on the intervals $(c_r, c_{r+1})$ for $r = 1, \ldots, s$, using (5), the continuity of $V$ and $h$, and $V(0) = f(\emptyset) \ge 0 = h(0)$, it holds that $V(u) \ge h(u)$ for any $u \in [0, W_{\max}]$.
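As a sanity check of Lemma 2.4, the following self-contained sketch (ours, on an illustrative coverage instance) runs the greedy heuristic, builds the value function $V$ and the bounding function $h$ for a two-part partition with non-increasing densities, and verifies $V(u) \ge h(u)$ on a grid over $[0, W_{\max}]$.

```python
import math

# Illustrative coverage instance: element -> covered points, plus weights.
cover = {"e1": {1, 2, 3, 4}, "e2": {5, 6, 9}, "e3": {4, 5, 7}, "e4": {8}}
w = {"e1": 2, "e2": 1, "e3": 2, "e4": 1}
W = 4
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0

# Greedy trace (Algorithm 1 in outline) and its value function V.
A, Ep, order = set(), set(cover), []
while Ep:
    e = max(Ep, key=lambda x: (f(A | {x}) - f(A)) / w[x])
    if sum(w[y] for y in A) + w[e] <= W:
        A.add(e)
        order.append(e)
    Ep.remove(e)
pts, B = [(0.0, 0.0)], set()
for a in order:
    B.add(a)
    pts.append((pts[-1][0] + w[a], float(f(B))))
def V(u):
    for (u0, v0), (u1, v1) in zip(pts, pts[1:]):
        if u <= u1:
            return v0 + (u - u0) * (v1 - v0) / (u1 - u0)
    return pts[-1][1]

# Bounding function for X_1 = {e2}, X_2 = {e1} (here r_1 = 3 >= r_2 = 2).
X1, X2 = {"e2"}, {"e1"}
fS = [float(f(X1)), float(f(X1 | X2))]                    # f(S_1), f(S_2)
wS = [sum(w[e] for e in X1), sum(w[e] for e in X1 | X2)]  # w(S_1), w(S_2)
r = [fS[0] / wS[0], (fS[1] - fS[0]) / (wS[1] - wS[0])]
D = [0.0, wS[0] * math.log(r[0] / r[1]), math.inf]
def h(u):
    j = 0 if u < D[1] else 1
    return fS[j] - r[j] * wS[j] * math.exp(-(u - D[j]) / wS[j])

# Lemma 2.4: V(u) >= h(u) for all u in [0, W_max].
W_max = min(W - max(w[e] for e in X1 | X2), sum(w[a] for a in order))
assert all(V(W_max * t / 200) >= h(W_max * t / 200) - 1e-9 for t in range(201))
```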
Proof of Theorem 1.1

To prove Theorem 1.1 we use EnumGreedy$_2$, i.e., we take Algorithm 2 with $\kappa = 2$. We note that EnumGreedy$_3$ is the $(1-e^{-1})$-approximation algorithm of [16].

Proof. It can be easily verified that the algorithm always returns a feasible solution for the input instance. Let $(E, f, w, W)$ be an MSK instance and $Y \subseteq E$ an optimal solution for the instance. Let $Y = \{y_1, \ldots, y_{|Y|}\}$, and assume the elements are ordered by their marginal values; that is, $y_i \in \arg\max_{y \in Y \setminus \{y_1, \ldots, y_{i-1}\}} f_{\{y_1, \ldots, y_{i-1}\}}(y)$ for every $i$. If $|Y| \le 2$ then $G = Y$ is one of the sets enumerated by the algorithm, and an optimal solution is returned. If $|Y| = 3$ then, as the marginal values $f_{\{y_1, \ldots, y_{i-1}\}}(y_i)$ are non-increasing in $i$ (by the ordering and submodularity), we have $f(\{y_1, y_2\}) \ge \frac{2}{3} \cdot f(Y) \ge (1 - e^{-1}) \cdot f(Y)$. Therefore, in this special case the algorithm returns an approximate solution as required. Hence, we may assume that $|Y| > 3$.
Our analysis focuses on the iteration of the loop of Step 2 in which $G = \{y_1, y_2\}$. Let $A$ be the output of Greedy$(E, f_G, w, W - w(G))$ in Step 3 of this iteration. If $Y \setminus G \subseteq A$ then $f(A \cup G) \ge f(Y)$; thus, following this iteration it holds that $f(S^*) \ge f(Y)$, and the algorithm returns an optimal solution. Therefore, we may assume that $Y \setminus G \not\subseteq A$.
Let $e^* \in Y \setminus G$ such that $w(e^*) = \max_{e \in Y \setminus G} w(e)$, and denote $R = Y \setminus G \setminus \{e^*\}$. Define two sets $X_1, X_2$ such that $\{X_1, X_2\} = \{\{e^*\}, R\}$ and $\frac{f_G(X_1)}{w(X_1)} \ge \frac{f_G(X_2)}{w(X_2)}$. Let $h$ be the bounding function of $(E, f_G, w, W - w(G))$ and $X_1, X_2$; also, let $r_1$, $r_2$, and $D_1$ be the values from Definition 2.3, and let $V$ be the value function of this execution of the greedy procedure. As $f$ is submodular, it follows that $r_2 = \frac{f_{G \cup X_1}(X_2)}{w(X_2)} \le \frac{f_G(X_2)}{w(X_2)} \le \frac{f_G(X_1)}{w(X_1)} = r_1$; thus $X_1, X_2$ satisfy the condition of Lemma 2.4. By Step 5 of Algorithm 1, as $Y \setminus G \not\subseteq A$, it follows that $w(A) \ge W - w(G) - w(e^*)$ (an element of $Y \setminus G$ was considered but not selected, hence it did not fit). Thus, by Lemma 2.4, it holds that $f_G(A) \ge V(W - w(G) - w(e^*)) \ge h(W - w(G) - w(e^*))$; note that $W - w(G) - w(e^*) \le W_{\max}$, as $\max_{e \in X_1 \cup X_2} w(e) \le w(e^*)$. We consider the following cases.
Case 1: $W - w(G) - w(e^*) \ge D_1$. Denote $S_2 = X_1 \cup X_2$. In this case it holds that
$$h(W - w(G) - w(e^*)) = f_G(S_2) - r_2 \cdot w(S_2) \cdot e^{-\frac{W - w(G) - w(e^*) - D_1}{w(S_2)}} = f_G(S_2) - w(S_2) \cdot r_1^{\frac{w(X_1)}{w(S_2)}} \cdot r_2^{\frac{w(X_2)}{w(S_2)}} \cdot e^{-\frac{W - w(G) - w(e^*)}{w(S_2)}} \ge f_G(S_2) - w(S_2) \cdot r_1^{\frac{w(X_1)}{w(S_2)}} \cdot r_2^{\frac{w(X_2)}{w(S_2)}} \cdot e^{-1 + \frac{w(e^*)}{w(S_2)}}.$$
The first and second equalities follow from the definitions of $h$ and $D_1$ (Definition 2.3). The last inequality follows from $w(X_1 \cup X_2) + w(G) \le W$. Define two sets $H_{e^*}, H_R$ as follows. If $X_1 = \{e^*\}$ then $H_{e^*} = \emptyset$ and $H_R = \{e^*\}$. If $X_1 = R$ then $H_{e^*} = R$ and $H_R = \emptyset$. In both cases $f_G(S_2) = f_G(X_1) + f_{G \cup X_1}(X_2) = f_{G \cup H_{e^*}}(e^*) + f_{G \cup H_R}(R)$, and by the weighted AM–GM inequality it follows that
$$w(S_2) \cdot r_1^{\frac{w(X_1)}{w(S_2)}} \cdot r_2^{\frac{w(X_2)}{w(S_2)}} \cdot e^{\frac{w(e^*)}{w(S_2)}} \le e \cdot f_{G \cup H_{e^*}}(e^*) + f_{G \cup H_R}(R) \quad (7)$$
(for $X_1 = \{e^*\}$ the left-hand side is $w(S_2) \cdot (e \cdot r_1)^{\frac{w(e^*)}{w(S_2)}} \cdot r_2^{\frac{w(R)}{w(S_2)}} \le w(e^*) \cdot e \cdot r_1 + w(R) \cdot r_2$, and similarly for $X_1 = R$). As the elements $y_1, \ldots, y_{|Y|}$ are ordered according to their marginal values, we have that $f(\{y_1\}) \ge f_{\{y_1\}}(y_2) \ge f_{G \cup H_{e^*}}(e^*)$. Therefore, $f(G) \ge 2 \cdot f_{G \cup H_{e^*}}(e^*)$, and we have that
$$f(G \cup A) = f(G) + f_G(A) \ge f(G) + h(W - w(G) - w(e^*)). \quad (8)$$
By combining (7) and (8), and since $f(G) + f_G(S_2) = f(Y)$, we obtain the following:
$$f(G \cup A) \ge f(Y) - f_{G \cup H_{e^*}}(e^*) - e^{-1} \cdot f_{G \cup H_R}(R) = (1 - e^{-1}) \cdot f(Y) + e^{-1} \cdot f(G) - (1 - e^{-1}) \cdot f_{G \cup H_{e^*}}(e^*) \ge (1 - e^{-1}) \cdot f(Y),$$
where the last inequality holds since $e^{-1} \cdot f(G) \ge 2e^{-1} \cdot f_{G \cup H_{e^*}}(e^*) \ge (1 - e^{-1}) \cdot f_{G \cup H_{e^*}}(e^*)$, using $3e^{-1} \ge 1$.
Case 2: $W - w(G) - w(e^*) < D_1$ and $X_1 = \{e^*\}$ (thus $X_2 = R$). We can use the assumption in this case to lower bound $\frac{f_G(X_1)}{f_{G \cup X_1}(X_2)}$ as follows:
$$\frac{f_G(X_1)}{f_{G \cup X_1}(X_2)} = \frac{w(X_1) \cdot r_1}{w(X_2) \cdot r_2} > \frac{w(X_1)}{w(X_2)} \cdot e^{\frac{W - w(G) - w(e^*)}{w(X_1)}},$$
where the inequality holds since $W - w(G) - w(e^*) < D_1 = w(X_1) \cdot \ln\left(\frac{r_1}{r_2}\right)$. By rearranging the terms we have
$$f_{G \cup X_1}(X_2) < f_G(X_1) \cdot \frac{w(X_2)}{w(X_1)} \cdot e^{-\frac{W - w(G) - w(e^*)}{w(X_1)}}. \quad (10)$$
Thus, denoting $x = \frac{w(X_2)}{w(X_1)}$,
$$f_{G \cup X_1}(X_2) < f_G(X_1) \cdot x \cdot e^{-x},$$
and the last inequality follows from $w(X_1) + w(X_2) + w(G) \le W$, which gives $W - w(G) - w(e^*) \ge w(X_2)$.

We use (10) to lower bound $f(G \cup A)$. By Lemma 2.4 and the case assumption, $f_G(A) \ge h(W - w(G) - w(e^*)) \ge f_G(X_1) \cdot (1 - e^{-x})$, and as in Case 1 (here $H_{e^*} = \emptyset$) we have $f(G) \ge 2 \cdot f_G(X_1)$. Since $f(Y) = f(G) + f_G(X_1) + f_{G \cup X_1}(X_2)$, we obtain
$$f(G \cup A) - (1 - e^{-1}) \cdot f(Y) \ge e^{-1} \cdot f(G) + f_G(X_1) \cdot \left(1 - e^{-x} - (1 - e^{-1}) \cdot (1 + x e^{-x})\right) \ge f_G(X_1) \cdot \left(3e^{-1} - e^{-x} \cdot \left(1 + (1 - e^{-1}) \cdot x\right)\right) \ge 0,$$
where the last inequality holds since $e^{-x} \cdot (1 + (1 - e^{-1}) \cdot x) \le 1 \le 3e^{-1}$ for all $x \ge 0$.

Case 3: $W - w(G) - w(e^*) < D_1$ and $X_1 = R$ (thus $X_2 = \{e^*\}$). By Lemma 2.4, as $W - w(G) - w(e^*) \ge w(X_1) = w(R)$, we have $f_G(A) \ge h(W - w(G) - w(e^*)) \ge f_G(R) \cdot (1 - e^{-1})$. As in Case 1 (here $H_{e^*} = R$), $f(G) \ge 2 \cdot f_{G \cup R}(e^*)$. Therefore,
$$f(G \cup A) \ge f(G) + (1 - e^{-1}) \cdot f_G(R) \ge (1 - e^{-1}) \cdot \left(f(G) + f_G(R) + f_{G \cup R}(e^*)\right) = (1 - e^{-1}) \cdot f(Y),$$
where the second inequality holds since $e^{-1} \cdot f(G) \ge 2e^{-1} \cdot f_{G \cup R}(e^*) \ge (1 - e^{-1}) \cdot f_{G \cup R}(e^*)$. In all cases $f(G \cup A) \ge (1 - e^{-1}) \cdot f(Y)$, which completes the proof of Theorem 1.1.

Upper Bound on Greedy+Singleton
In this section we consider the common heuristic Greedy+Singleton for MSK (see Algorithm 3). We construct an MSK instance for which the approximation ratio of Greedy+Singleton is strictly smaller than 0.42945, thus tightening the upper bound and almost matching the lower bound of [7]. The input instance was generated by numerical optimization using an alternative formulation of Lemma 2.4. The numerical optimization process was based on a grid search combined with quasiconvex optimization. As the grid search does not guarantee an optimal solution, applying the same approach with an improved numerical optimization may lead to a tighter bound.
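For reference, a minimal sketch (ours) of Greedy+Singleton as described; the `greedy` argument is the heuristic of Algorithm 1 (see the sketch in Section 2).

```python
def greedy_plus_singleton(E, f, w, W, greedy):
    """Sketch of Greedy+Singleton (Algorithm 3 in outline): return the better
    of the greedy output and the best feasible singleton; `greedy` is the
    heuristic of Algorithm 1."""
    A = greedy(E, f, w, W)                      # Step 1: greedy heuristic
    singletons = [{e} for e in E if w[e] <= W]  # feasible single elements
    S1 = max(singletons, key=f, default=set())
    return A if f(A) >= f(S1) else S1
```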
Proof of Theorem 1.2. Let $\mu$ be the Lebesgue measure on $\mathbb{R}$. That is, given a union $I$ of intervals on the real line, $\mu(I)$ is the "length" of $I$; in particular, for $I = [a, b]$, $\mu(I) = b - a$. We define an MSK instance $(E, f, w, W)$ in which $E \subseteq 2^{\mathbb{R}}$, all the sets in $E$ are measurable, and for any $S \subseteq E$, $f(S) = \mu\left(\bigcup_{e \in S} e\right)$. It is easy to verify that $f$ is submodular and monotone. Define sets $X, Y, Z \in E$ such that the optimal solution for the instance is $\{X, Y, Z\}$, with $f(\{X, Y, Z\}) = 1$ and $w(\{X, Y, Z\}) = 1 = W$. In the following we add elements to the instance such that the greedy algorithm (invoked in Step 1 of Algorithm 3) selects these other elements rather than $\{X, Y\}$. The greedy algorithm first selects elements which are subsets of $Z$, then elements which are subsets of $X \cup Y \cup Z$, and finally it selects $Z$.
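The following sketch (ours) illustrates only the measure-based oracle $f(S) = \mu(\bigcup_{e \in S} e)$ for elements that are finite unions of intervals; the concrete intervals, weights, and auxiliary elements of the actual construction are not reproduced here.

```python
def measure(intervals):
    """Lebesgue measure of a finite union of intervals [a, b]."""
    total, last_end = 0.0, float("-inf")
    for a, b in sorted(intervals):
        a = max(a, last_end)
        if b > a:
            total += b - a
            last_end = b
    return total

def f(S):
    """f(S) = mu(union of the interval-sets in S); monotone and submodular."""
    return measure([iv for e in S for iv in e])

# Purely illustrative elements (not the intervals of the actual construction):
X, Y, Z = ((0.0, 0.4),), ((0.4, 0.8),), ((0.8, 1.0),)
assert abs(f({X, Y, Z}) - 1.0) < 1e-12
```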
Also, $f(\{e\}) < \beta$ for every $e \in E$. Hence, Algorithm 3 returns a set $S$ such that $f(S) < \beta$. As the optimal value is $1$, it follows that the approximation ratio of Greedy+Singleton is at most $\beta$.