Generating clause sequences of a CNF formula

Given a CNF formula $\Phi$ with clauses $C_1,\ldots,C_m$ and variables $V=\{x_1,\ldots,x_n\}$, a truth assignment $a:V\rightarrow\{0,1\}$ of $\Phi$ leads to a clause sequence $\sigma_\Phi(a)=(C_1(a),\ldots,C_m(a))\in\{0,1\}^m$ where $C_i(a) = 1$ if clause $C_i$ evaluates to $1$ under assignment $a$, otherwise $C_i(a) = 0$. The set of all possible clause sequences carries a lot of information on the formula, e.g. SAT, MAX-SAT and MIN-SAT can be encoded in terms of finding a clause sequence with extremal properties. We consider a problem posed at Dagstuhl Seminar 19211 "Enumeration in Data Management" (2019) about the generation of all possible clause sequences of a given CNF with bounded dimension. We prove that the problem can be solved in incremental polynomial time. We further give an algorithm with polynomial delay for the class of tractable CNF formulas. We also consider the generation of maximal and minimal clause sequences, and show that generating maximal clause sequences is NP-hard, while minimal clause sequences can be generated with polynomial delay.


Introduction
The concept of well-designed pattern trees was introduced by Letelier et al. [9] as a convenient graphical representation of conjunctive queries extended by the optional operator. The nodes of such a tree correspond to the queries, while the tree itself represents the optional extensions. Well-designed pattern trees have been studied from a complexity point of view in several aspects. One of the most interesting problems in the context of query languages is the generation problem, that is, generating the solutions one after the other without repetition.
Previous work. The generation problem was studied for First-Order and Conjunctive Queries [3,5,7,12] and for well-designed pattern trees [9]. Recently, Kröll et al. [8] initiated a systematic study of the complexity of the generation problem of well-designed pattern trees. They identified several tractable and intractable cases of the problem both from a classical and from a parameterized complexity point of view. One class of pattern trees, however, remained unclassified. For a class $\mathcal{C}$ of conjunctive queries, a well-designed pattern tree $T$ is globally in $\mathcal{C}$ if for every subtree $T'$ of $T$ the corresponding conjunctive query is also in $\mathcal{C}$. The treewidth of a conjunctive query is the treewidth of its Gaifman graph [6]. In [8], the complexity of the generation problem for the class of well-designed pattern trees falling globally in the class of queries of treewidth at most $k$ and having $c$-semi-bounded interface was left open (see [8, Table 1 on page 16]).
At the Dagstuhl Seminar 19211 "Enumeration in Data Management", Kröll proposed an open problem on the generation of clause sequences of CNF formulas [2, Problem 4.7]. The problem is motivated by the fact that it can be reduced to the above-mentioned unsolved case of pattern trees, thus any bound on the generation complexity would be helpful in understanding the general problem. A generation algorithm outputs the objects in question one by one without repetition. We call it a polynomial delay procedure if the computing time between any two consecutive outputs is bounded by a polynomial of the input size. We call it incrementally polynomial if for any $k$ the first $k$ objects can be generated in time polynomial in the input size and $k$. Finally, it is called total polynomial if all $N$ objects are generated in time polynomial in the input size and $N$.
The problem studied in this paper can be formalized as follows. Let $V=\{x_1,\ldots,x_n\}$ be a set of $n$ Boolean variables and $\Phi=C_1\wedge\cdots\wedge C_m$ be a CNF in these variables with clauses $C_1,\ldots,C_m$. For an assignment $a:V\rightarrow\{0,1\}$, the corresponding binary sequence $\sigma_\Phi(a)=(C_1(a),\ldots,C_m(a))$ is called a signature of $\Phi$, that is, $C_i(a)=1$ if clause $C_i$ evaluates to $1$ under assignment $a$, and $C_i(a)=0$ otherwise. In particular, this means that $\Phi$ is satisfiable if and only if there exists some assignment $a$ with $\sigma_\Phi(a)=(1,\ldots,1)$. Moreover, MAX-SAT and MIN-SAT can be encoded by asking for the signature with the largest and smallest sum of elements, respectively.
Given a CNF $\Phi=C_1\wedge\cdots\wedge C_m$, we define $\dim(\Phi)=\max_{i=1,\ldots,m}|C_i|$, and call $\Phi$ a $d$-CNF if $\dim(\Phi)\le d$. The number of clauses and the number of literals appearing in $\Phi$ are denoted by $|\Phi|$ and $\|\Phi\|$, respectively. Vectors are written using bold fonts throughout, e.g. $\mathbf{x}$.
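To make the definition concrete, the following Python sketch evaluates $\sigma_\Phi(a)$ and enumerates all signatures of a small formula by brute force. The DIMACS-style clause encoding and the names `signature` and `all_signatures` are assumptions introduced for illustration only; the enumeration is exponential in $n$ and is not one of the algorithms of this paper.

```python
from itertools import product

def signature(cnf, assignment):
    """Return sigma_Phi(a) as a tuple of 0/1 values, one per clause.

    cnf: list of clauses, each a list of nonzero ints (assumed encoding:
         i stands for x_i, -i for the negation of x_i).
    assignment: dict mapping variable index -> 0 or 1.
    """
    def literal_value(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else 1 - value
    return tuple(int(any(literal_value(l) for l in clause)) for clause in cnf)

def all_signatures(cnf):
    """Collect the signatures of all 2^n assignments (illustration only)."""
    variables = sorted({abs(l) for clause in cnf for l in clause})
    found = set()
    for bits in product((0, 1), repeat=len(variables)):
        found.add(signature(cnf, dict(zip(variables, bits))))
    return found

# Phi = (x1 or x2) and (not x1 or x2) and (not x2)
phi = [[1, 2], [-1, 2], [-2]]
sigs = all_signatures(phi)
print(sorted(sigs))  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

For this formula the all-ones vector is not among the signatures, so the formula is unsatisfiable, and the largest sum of a signature, 2, is its MAX-SAT value.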
The problem asked in [2] is for d-CNF formulas where d is a fixed positive integer, but we also consider the same problem for general CNFs.
Motivated by MAX-SAT and MIN-SAT, we also consider maximal and minimal signatures. A signature of a CNF Φ is called maximal (resp. minimal) if an inclusionwise maximal (resp. minimal) subset of the clauses takes value 1.

Generation of maximal signatures
Input: A CNF Φ. Output: All possible maximal signatures of Φ.

Generation of minimal signatures
Input: A CNF Φ. Output: All possible minimal signatures of Φ.
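As a small illustration of the two definitions, the sketch below filters an already computed set of signatures by componentwise comparison. The function names are hypothetical, and the input set is the signature set of $(x_1\vee x_2)\wedge\bar{x}_1$ written out by hand; this is a direct reading of the definitions, not an efficient generation procedure.

```python
def dominates(s, t):
    """True if every clause that has value 1 in t also has value 1 in s."""
    return all(a >= b for a, b in zip(s, t))

def maximal_signatures(signatures):
    """Signatures whose set of ones is inclusionwise maximal."""
    return {s for s in signatures
            if not any(t != s and dominates(t, s) for t in signatures)}

def minimal_signatures(signatures):
    """Signatures whose set of ones is inclusionwise minimal."""
    return {s for s in signatures
            if not any(t != s and dominates(s, t) for t in signatures)}

# Signatures of (x1 or x2) and (not x1): assignments 00, 01, 10, 11
# give (0,1), (1,1), (1,0), (1,0) respectively.
sigs = {(0, 1), (1, 1), (1, 0)}
max_sigs = maximal_signatures(sigs)
min_sigs = minimal_signatures(sigs)
print(max_sigs, min_sigs)
```

Here the only maximal signature is the all-ones vector, which reflects that the formula is satisfiable, while there are two incomparable minimal signatures.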
Our results. We show that the signature generation problem $GS(\Phi)$ can be solved in incremental polynomial time for formulas with a bounded dimension, thus answering the open problem posed by Kröll, and with polynomial delay for the class of tractable CNF formulas. For the class of formulas with bounded dimension and co-occurrence, we derive a faster incremental polynomial algorithm. We also show that generating maximal signatures is NP-hard, while minimal signatures can be generated with polynomial delay.
Organization. Our algorithm with polynomial delay for the class of tractable CNF formulas is given in Section 2. Section 3 discusses CNFs with bounded dimension: an incremental polynomial algorithm is presented in Section 3.1 for CNFs with bounded dimension and co-occurrence, while our main result answering the question of Kröll is presented in Section 3.2. The generation of maximal and minimal clause sequences is considered in Section 4. Finally, we conclude the paper in Section 5, where a 'reversed' variant of the problem is proposed as an open question.

Tractable CNFs
We call a family of CNFs tractable if for any CNF $\Phi$ in this family the satisfiability of any sub-CNF of $\Phi$ can be decided in polynomial time even after fixing any subset of the variables at arbitrary values. For example, the classes of 2-CNFs or Horn CNFs are tractable.
Theorem 1. If $\Phi$ belongs to a tractable family and has $m$ clauses, then its signatures can be generated with a delay of $O(m)$ SAT-calls.
Proof. The idea is to apply the so-called 'flashlight' approach in the signature space, using SAT as a 'flashlight' [1]. Let $\Phi=\bigwedge_{i=1}^m C_i$. We build a binary tree in which the paths from the root to the vertices of the tree correspond to binary values of initial segments of the sequence of clauses, that is, $C_1,\ldots,C_k$ for some $1\le k\le m$. There exists a signature with a given prefix if and only if the CNF formed by the clauses set to value one in this prefix is satisfiable even after all the forced fixings of variables that appear in clauses whose value is zero (note that a clause has value $0$ if and only if all the literals in it are $0$). If such a CNF is not satisfiable, we backtrack and do not explore the subtree rooted at this vertex, as there exists no signature with this prefix. If the CNF is satisfiable, we continue building the corresponding subtree, which in this case is guaranteed to contain at least one signature. The algorithm will not backtrack above this vertex before outputting all (at least one) signatures in this subtree. It is not difficult to verify that after at most $2m$ calls to SAT we can output a new signature not generated before. After outputting the last signature, the procedure terminates after at most $m$ SAT calls.
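The search described in the proof can be sketched as a depth-first traversal of feasible prefixes. In the Python sketch below, `brute_force_sat` is an exponential stand-in for the SAT oracle and is an assumption made for illustration; for a tractable family it would be replaced by a polynomial-time solver. The clause encoding (signed integers) and all function names are likewise hypothetical.

```python
from itertools import product

def brute_force_sat(clauses, fixed):
    """Stand-in oracle: can `clauses` be satisfied while keeping `fixed` values?"""
    free = sorted({abs(l) for c in clauses for l in c} - set(fixed))
    for bits in product((0, 1), repeat=len(free)):
        a = dict(fixed)
        a.update(zip(free, bits))
        if all(any(a[abs(l)] == (1 if l > 0 else 0) for l in c) for c in clauses):
            return True
    return False

def prefix_is_feasible(cnf, prefix, sat=brute_force_sat):
    """Is there a signature starting with `prefix` (values of C_1..C_k)?"""
    fixed = {}
    for clause, value in zip(cnf, prefix):
        if value == 0:
            for lit in clause:
                forced = 0 if lit > 0 else 1   # value that makes `lit` false
                if fixed.setdefault(abs(lit), forced) != forced:
                    return False               # two zero clauses contradict
    ones = [c for c, v in zip(cnf, prefix) if v == 1]
    return sat(ones, fixed)

def flashlight_signatures(cnf, sat=brute_force_sat):
    """Yield every signature exactly once; at most two oracle calls per level."""
    def extend(prefix):
        if len(prefix) == len(cnf):
            yield tuple(prefix)
            return
        for value in (0, 1):
            candidate = prefix + [value]
            if prefix_is_feasible(cnf, candidate, sat):
                yield from extend(candidate)
    yield from extend([])

# Phi = (x1 or x2) and (not x1 or x2) and (not x2)
phi = [[1, 2], [-1, 2], [-2]]
result = list(flashlight_signatures(phi))
print(result)  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Because a subtree is entered only after the oracle confirms that it contains a signature, the traversal never descends into a dead end, which is what keeps the number of oracle calls between two outputs linear in $m$.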
Remark 2. The family of monotone CNFs is tractable, but for this case there is a more efficient polynomial delay generation of the signatures. Indeed, in this case we can view a clause as a subset of the variables. Consequently, the set of zeros in a signature corresponds to a union of clauses. We claim that all such unions can be generated with $O(nm)$ delay, where $m=|\Phi|$ is the number of clauses, implying that all signatures of $\Phi$ can be generated with polynomial delay.
To see this claim, we represent unions as leaves of a binary tree of depth $n$ (nodes correspond to variables), where we construct only the vertices that are on paths to the leaves. Besides the binary tree, we keep the leaves in a last-in-first-out queue. Initially, leaves correspond to individual clauses of $\Phi$. Each time before outputting the first union $U$ in the queue, we check for all clauses $C\in\Phi$ if $C\cup U$ is a new union or not by using our binary tree. This takes $O(n)$ time for one clause, and $O(nm)$ time for all the clauses of $\Phi$. Whenever a new union is found, it is added to the tree and the queue as a last element. After this, we output $U$ and remove it from the queue. It is not difficult to verify that this gives us an $O(nm)$ delay generation of all unions. Note that in this case Theorem 1 guarantees only an $O(\|\Phi\|m)$ delay, because every SAT call requires $O(\|\Phi\|)$ time.
CNFs with bounded dimension
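The union generation can be sketched in a few lines of Python. The sketch deviates from the description above in one labeled respect: a hash set (`seen`) replaces the binary tree as the membership structure, which is an implementation shortcut rather than the data structure of the remark. Clauses are given as sets of variable indices, and the function names are hypothetical.

```python
from collections import deque

def clause_unions(clauses):
    """Yield every distinct union of a nonempty subfamily of `clauses` once."""
    seen = set()          # stands in for the binary tree of the remark
    queue = deque()
    for c in clauses:
        c = frozenset(c)
        if c not in seen:
            seen.add(c)
            queue.append(c)
    while queue:
        u = queue[0]
        # Before outputting u, try to extend it by every clause.
        for c in clauses:
            w = u | frozenset(c)
            if w not in seen:
                seen.add(w)
                queue.append(w)
        queue.popleft()
        yield u

def signature_from_union(clauses, union):
    """A clause has value 0 exactly when all of its variables lie in the union."""
    return tuple(0 if frozenset(c) <= union else 1 for c in clauses)

# Monotone CNF (x1 or x2) and (x2 or x3) and (x3), clauses as variable sets.
clauses = [{1, 2}, {2, 3}, {3}]
unions = list(clause_unions(clauses))
sigs = {signature_from_union(clauses, u) for u in unions}
print(len(unions), sorted(sigs))
```

The empty union, which corresponds to the all-ones signature, is not produced by this loop and has to be added separately if it is wanted.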

Bounded co-occurrence
Given a CNF $\Phi$, we denote by $H_\Phi=(\Phi,E)$ the conflict graph of $\Phi$. The vertices of $H_\Phi$ are the clauses of $\Phi$ and edges are exactly the conflicting pairs of clauses, i.e., pairs $(C_i,C_j)$ for which there exists a literal $u\in C_i$ such that $\bar{u}\in C_j$.
Let $S\subseteq\Phi$ be a maximal independent set of $H_\Phi$, and let $L(S)=\bigcup_{C_i\in S}C_i$ denote the set of literals appearing in the clauses of $S$. We define a partial assignment $a_S:L(S)\rightarrow\{0,1\}$ by setting all literals of $L(S)$ to zero (and hence the complementary literals are set to $1$). The signature associated to $S$ is then defined as $\sigma_\Phi(S):=\sigma_\Phi(a_S)=(y_1,\ldots,y_m)\in\{0,1\}^m$. The coordinates of $\sigma_\Phi(S)$ are well-defined as $y_i=0$ if and only if $C_i\in S$ for $i=1,\ldots,m$. We will drop the subscript $\Phi$ whenever the CNF in question is clear from the context. Note that for different maximal independent sets $S\ne S'$ of $H_\Phi$ we have $\sigma(S)\ne\sigma(S')$. It is worth mentioning that all maximal independent sets of $H_\Phi$ can be generated with polynomial delay [10,13], which is hence a good start for CNF signature generation.
Assume that $\Phi$ has bounded dimension, i.e., for a constant $d$ we have $|C_i|\le d$ for all $i=1,\ldots,m$. Let us define $X_j=\{C_i\in\Phi \mid x_j\in C_i \text{ or } \bar{x}_j\in C_i\}$. We say that $\Phi$ is of $\omega$-bounded co-occurrence if $|X_j|\le\omega$ for $j=1,\ldots,n$ and $\omega$ is a fixed constant.
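The construction of $H_\Phi$ and of the signature $\sigma(S)$ can be sketched as follows. The signed-integer clause encoding and the function names are assumptions for illustration, and the sketch presumes that no clause contains a literal together with its complement.

```python
def conflict_graph(cnf):
    """Edges (i, j), i < j, for clause pairs containing complementary literals."""
    edges = set()
    for i, ci in enumerate(cnf):
        for j in range(i + 1, len(cnf)):
            if any(-lit in cnf[j] for lit in ci):
                edges.add((i, j))
    return edges

def signature_of_independent_set(cnf, s):
    """sigma(S): falsify every literal of the clauses in S, then evaluate.

    A literal is known to be true exactly when its complement was falsified;
    for a maximal independent set S this decides every clause.
    """
    false_literals = {lit for i in s for lit in cnf[i]}
    return tuple(int(any(-lit in false_literals for lit in clause))
                 for clause in cnf)

# Phi = (x1 or x2) and (not x1) and (x3); only C_1 and C_2 conflict.
phi = [[1, 2], [-1], [3]]
edges = conflict_graph(phi)
sig_a = signature_of_independent_set(phi, {0, 2})  # S = {C_1, C_3}
sig_b = signature_of_independent_set(phi, {1, 2})  # S = {C_2, C_3}
print(edges, sig_a, sig_b)  # {(0, 1)} (0, 1, 0) (1, 0, 0)
```

The two maximal independent sets of this example yield two different signatures, with zeros exactly at the positions of the clauses in $S$.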
Theorem 3. If Φ has bounded dimension and co-occurrence, then its signatures can be generated in incremental polynomial time.
Proof. Let us construct greedily a maximal induced matching $M\subseteq E$ in $H_\Phi$. Note that $H_\Phi$ has at least $2^{|M|}$ maximal independent sets (and hence at least this many signatures can be generated with polynomial delay, as explained above). We denote by $W\subseteq\Phi$ the set of clauses that have edges in $H_\Phi$ connecting them to some of the clauses covered by $M$, and set $U=\Phi\setminus W$. Note that $U$ is an independent set in $H_\Phi$.
Assume that $\mu=|M|$, $|C_i|\le d$ for all $i=1,\ldots,m$, and $|X_j|\le\omega$ for all $j=1,\ldots,n$. According to our assumptions, $d$ and $\omega$ are fixed constants. Observe that with these notations we have $|W|\le 2\mu d\omega$. We denote by $n'$ the number of variables involved in clauses of $W$. Note that we have $n'\le d|W|$.
We denote by $L'$ the (possibly empty) set of variables that are monotone in $\Phi$ and appear only in clauses of $U$ (some variables appear only positively while some others appear only negatively). Let us first set all literals in $L'$ to $0$, and consider the resulting CNF $\Phi'$ in $n'$ variables. We generate with polynomial delay the maximal independent sets $S_\ell$, $\ell=1,\ldots,k$ of $H_{\Phi'}$, and the corresponding signatures $\sigma(S_\ell)$, $\ell=1,\ldots,k$. Now we have $k\ge 2^\mu\ge(2^{n'})^{1/(2d^2\omega)}$, and thus we can try all binary assignments to the $n'$ variables in $O(mnk^{2d^2\omega})$ time, and see if we get some more signatures.
Assume we get $k'\ge k$ distinct signatures. By switching the literals in $L'$, we may get new signatures, resulting from changing some of the zeros in a signature to one. For any partial assignment to the $n'$ variables, this is a set-union generation problem that can be solved with polynomial delay, see Remark 2. We may get in this way the same signature multiple times, but no more than $k'$ times, and thus at this stage the additional signatures are also generated in incremental polynomial time.
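The bound used in this step can be spelled out as a short chain of inequalities. The display below is a reading of the estimates stated above (the bound on $|W|$, the bound on the number $n'$ of variables in clauses of $W$, and $k\ge 2^\mu$), not an additional result:

```latex
n' \;\le\; d\,|W| \;\le\; d\cdot 2\mu d\omega \;=\; 2\mu d^{2}\omega
\quad\Longrightarrow\quad
2^{n'} \;\le\; \bigl(2^{\mu}\bigr)^{2d^{2}\omega} \;\le\; k^{2d^{2}\omega}.
```

Hence the number of assignments to the $n'$ variables is polynomial in the number $k$ of signatures already produced, since $d$ and $\omega$ are constants, and evaluating the $m$ clauses for each assignment gives the stated running time.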

Unbounded co-occurrence
In the previous section, we considered CNFs with bounded dimension and co-occurrence. The running time of the algorithm provided by Theorem 3 depends exponentially on ω, hence it is not suitable for handling the general case. In the present section, a more general procedure is given based on a different approach.
For a CNF $\Phi$, we denote by $G_\Phi=(\Phi,E)$ the so-called dual graph of $\Phi$ [11]. The vertices of $G_\Phi$ are the clauses of $\Phi$ and edges are exactly the pairs of clauses $(C_i,C_j)$ for which there exists a variable that occurs in both $C_i$ and $C_j$ (complemented or not). If $S\subseteq\Phi$ is an independent set of $G_\Phi$, then the clauses of $S$ have pairwise disjoint sets of variables involved.
Proof. We prove the claim by induction on $d$. For $d\le 2$ the claim follows by Theorem 1.
Assume now that we already proved the claim for all $d'<d$, and let us consider a CNF $\Phi=C_1\wedge C_2\wedge\cdots\wedge C_m$ with $\dim(\Phi)=d$. Let us associate to $\Phi$ its dual graph $G_\Phi$ as defined above. Let $S\subseteq V(G_\Phi)$ be a maximal independent set of $G_\Phi$. Such a set can be obtained by a simple greedy procedure in polynomial time in the size of $\Phi$. Note that clauses in $S$ involve pairwise disjoint sets of variables, due to the fact that $S$ is an independent set of $G_\Phi$. Thus, we can choose a literal $u_C\in C$ for each clause $C\in S$, set all other literals in $C$ to zero, set all other variables not occurring in clauses of $S$ to zero, and make all possible truth assignments to the literals $u_C$, $C\in S$. This way we obtain $k_0=2^{|S|}$ different binary signatures of $\Phi$. Note that we can output these $k_0$ signatures with polynomial delay.
The total number of variables involved in clauses of $S$ is $n'\le d|S|$. Hence we can assign in all possible ways values to these variables, and produce $2^{n'}$ subproblems $\Phi_j$, $j=1,\ldots,2^{n'}$ in the remaining variables in $O(mn2^{n'})=O(mnk_0^d)$ time, which is polynomial in the input size and $k_0$, since $d$ is a fixed constant. Each of these residual problems is of dimension at most $d-1$. Indeed, each of the clauses not in $S$ shares at least one variable with the clauses of $S$, since $S$ is a maximal independent set of $G_\Phi$, and now that shared variable is fixed at a binary value.
We apply algorithm $A$ to each of the residual sub-CNFs $\Phi_j$, $j=1,\ldots,2^{n'}$, one by one. This way we produce signatures that extend the pattern on $S$ defined by $\mathbf{x}_j\in\{0,1\}^{n'}$, for all $j=1,\ldots,2^{n'}$ one by one. We may produce the same signature in this way again and again, but no more than $2^{n'}$ times. Since $2^{n'}=O(k_0^d)$, we can show that this procedure works in total polynomial time.
To see this let us introduce some additional notation. We denote by $X_j\subseteq Y=\{0,1\}^{n'}$, $j=1,\ldots,2^{|S|}$, the nonempty sets of (partial) assignments that produce the same signature on the clauses of $S$. For $\mathbf{x}\in Y$, let us denote by $\Phi(\mathbf{x})$ the residual CNF, and by $k(\mathbf{x})$ the number of signatures of $\Phi(\mathbf{x})$. We denote by $g(\Psi)$ the running time of the above described recursive algorithm on CNF $\Psi$ and let $G(m,n,d,k)$ be the maximum of $g(\Psi)$ over all CNFs with at most $m$ clauses on $n$ variables having $\dim(\Psi)\le d$ and having at most $k$ signatures.
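The first phase of this construction, a greedy maximal independent set of $G_\Phi$ followed by the $k_0=2^{|S|}$ initial signatures, can be sketched as below. Taking $u_C$ to be the first literal of each chosen clause is an arbitrary choice made here for illustration; the sketch also assumes nonempty clauses in which no variable occurs twice, and all names are hypothetical.

```python
from itertools import product

def greedy_independent_clauses(cnf):
    """Greedy maximal independent set of the dual graph (indices of clauses)."""
    used_vars, chosen = set(), []
    for i, clause in enumerate(cnf):
        vars_i = {abs(l) for l in clause}
        if not (vars_i & used_vars):      # shares no variable with chosen ones
            chosen.append(i)
            used_vars |= vars_i
    return chosen

def initial_signatures(cnf):
    """Yield the 2^|S| signatures obtained by toggling one literal u_C per C in S."""
    s = greedy_independent_clauses(cnf)
    variables = {abs(l) for c in cnf for l in c}
    base = {v: 0 for v in variables}      # variables outside S are set to 0
    for i in s:
        for lit in cnf[i][1:]:            # every literal except u_C is falsified
            base[abs(lit)] = 0 if lit > 0 else 1
    for bits in product((0, 1), repeat=len(s)):
        a = dict(base)
        for i, bit in zip(s, bits):
            u = cnf[i][0]                 # assumed choice of u_C
            a[abs(u)] = bit if u > 0 else 1 - bit
        yield tuple(int(any(a[abs(l)] == (l > 0) for l in c)) for c in cnf)

# (x1 or x2 or x3) and (not x1 or x4) and (x4 or x5) and (not x2 or not x5)
phi = [[1, 2, 3], [-1, 4], [4, 5], [-2, -5]]
s = greedy_independent_clauses(phi)
first = list(initial_signatures(phi))
print(s, first)
```

The signatures differ pairwise already on the coordinates of $S$, because the value of each chosen clause equals the value given to its literal $u_C$.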
The total computational time in the first phase of the above procedure, which ends with producing a list of $2^{n'}$ residual CNFs, each of dimension at most $d-1$, is bounded by a sum of three terms, for a suitable constant $K$ that does not depend on $m$, $n$, and $k_0$. The first term on the left hand side is the time to build $G_\Phi$ and to find a maximal independent set $S$. The second term is the time we need to generate the $k_0$ initial signatures. The third term is the time to generate the $2^{n'}\le k_0^d$ subproblems. For $\mathbf{x}\in X_j$ and $\mathbf{x}'\in X_{j'}$ with $j\ne j'$ the CNFs $\Phi(\mathbf{x})$ and $\Phi(\mathbf{x}')$ cannot share signatures, since those must already differ on $S$ by the definition of the sets $X_j$ for $j=1,\ldots,k_0$. However, for $\mathbf{x},\mathbf{x}'\in X_j$ the CNFs $\Phi(\mathbf{x})$ and $\Phi(\mathbf{x}')$ may share (many) signatures. Discounting the one signature we already produced with a given trace on $S$, we can still expect $k_j\ge\max_{\mathbf{x}\in X_j}k(\mathbf{x})-1$ different signatures produced by algorithm $A$ when we use it for CNFs $\Phi(\mathbf{x})$, $\mathbf{x}\in X_j$. Thus, in total we get $k=k_0+k_1+\cdots+k_{2^{|S|}}$ different signatures for $\Phi$. The total running time on CNFs $\Phi(\mathbf{x})$, $\mathbf{x}\in X_j$ can be bounded by $\sum_{\mathbf{x}\in X_j}g(\Phi(\mathbf{x}))\le|X_j|\,G(m,n,d-1,k_j)$.
Thus, for the total running time of algorithm $A$ on $\Phi$ we get $g(\Phi)\le G(m,n,d,k)\le Km^2nk^d$ where for the last inequality we used $k_j\le k$ for all $j=1,\ldots,k_0$, implying $G(m,n,d-1,k_j)\le G(m,n,d-1,k)$, which allows this quantity to be factored out of the sum, which can then be upper bounded by $\sum_{j=1}^{k_0}|X_j|=2^{n'}\le k_0^d$. Using this we can show by induction on $d$ that $G(m,n,d,k)\le Ldm^2nk^{\binom{d}{2}}$ for some constant $L$ (we will choose $L\ge K$), which will complete the proof of our claim.
Corollary 5. The algorithm $A$ constructed in the above proof in fact works in incremental polynomial time.
Proof. Using the above theorem, we can prove this claim by induction on the dimension d.
When $d=1$, the claim is trivially true. Consider now the general case, as in the proof of the above theorem. As we remarked there, producing the first $k_0=2^{|S|}$ signatures in fact can be done with polynomial delay. After this we start processing the CNFs $\Phi(\mathbf{x})$ for $\mathbf{x}\in X_j$, $j=1,\ldots,k_0$. Note that the signatures produced from $\Phi(\mathbf{x})$, $\mathbf{x}\in X_j$ and $\Phi(\mathbf{x}')$, $\mathbf{x}'\in X_{j'}$ are all different if $j\ne j'$. Note also that $\dim(\Phi(\mathbf{x}))\le d-1$ for all $\mathbf{x}\in X_j$, $j=1,\ldots,k_0$, and thus we can assume by induction that their signatures can be produced in incremental polynomial time in the size of $\Phi(\mathbf{x})$, which is bounded by the size of $\Phi$. Thus, if $X_j=\{\mathbf{x}_1,\ldots,\mathbf{x}_\ell\}$, then we can produce $k(\mathbf{x}_1)$ new signatures in incremental polynomial time, in fact regardless of how many we produced previously (including the $k_0$ we have from the first phase). Let us denote by $q(m,n,k(\mathbf{x}_1))$ the polynomial bounding the total time of processing $\Phi(\mathbf{x}_1)$. If $k(\mathbf{x}_2)>k(\mathbf{x}_1)$, then maybe the first $k(\mathbf{x}_1)$ signatures produced from $\Phi(\mathbf{x}_2)$ coincide with the ones we already generated from $\Phi(\mathbf{x}_1)$, but still after at most $q(m,n,k(\mathbf{x}_1))$ time we get a new signature. In the worst case, we have $k_j=k(\mathbf{x}_1)\ge k(\mathbf{x}_i)$ for all $\mathbf{x}_i\in X_j$, $i\ne 1$, in which case processing $\Phi(\mathbf{x}_i)$, $i=2,\ldots,\ell$ may not produce any new signatures. Since $\ell\le k_0^d$, this means that the largest gap between the output of the last signature of $\Phi(\mathbf{x}_1)$ and the next new signature is not more than $k_0^d\,q(m,n,k(\mathbf{x}_1))$, at a moment when we have already produced $k'\ge k_0+k(\mathbf{x}_1)$ signatures. Thus this largest time gap between two outputs is still bounded by a polynomial of the input size $O(mn)$ and the number of signatures $k'\ge k_0+k(\mathbf{x}_1)$ produced so far.

Generating maximal and minimal signatures
Generation of maximal signatures is difficult as it includes SAT as a special case.
Theorem 6. Generating all maximal signatures is NP-hard.
Proof. Let us consider a CNF Φ, and observe that its unique maximal signature is the all-one vector if and only if Φ is satisfiable. Hence any total polynomial time algorithm generating the maximal signatures would detect satisfiability of Φ. As SAT is difficult in general [4], the theorem follows.
It turns out that minimal signatures can be generated efficiently.
Theorem 7. Minimal signatures can be generated with polynomial delay.
Proof. We claim that there is a one-to-one correspondence between minimal signatures of a CNF $\Phi$ and maximal independent sets of its conflict graph $H_\Phi$. Since $H_\Phi$ can be built in polynomial time from $\Phi$ and maximal independent sets of a graph can be generated with polynomial delay [10,13], this would prove the theorem.
To see the above claim, assume first that $\sigma=\{\sigma_C \mid C\in\Phi\}$ is a minimal signature of $\Phi$. Note that the set $S=\{C\in\Phi \mid \sigma_C=0\}$ is an independent set in $H_\Phi$. For any $C\in\Phi$ with $\sigma_C=1$ there must exist a conflict between $C$ and some $C'\in S$, since otherwise we could set $\sigma_C$ to zero without forcing any of the clauses in $S$ to change their values, contradicting the minimality of $\sigma$. Thus $S$ must be a maximal independent set.
The other direction follows from the fact that if $S$ is a maximal independent set of $H_\Phi$ and we set all the clauses in $S$ to zero, then all other clauses of $\Phi$ are forced to take value one due to the conflicts between $S$ and other vertices of $H_\Phi$.
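The correspondence can be checked on a small instance with the Python sketch below. Both enumerations are brute force and exponential, used here only as a sanity check; a polynomial delay implementation would instead rely on the maximal independent set generators of [10,13]. The clause encoding and function names are assumptions for illustration.

```python
from itertools import combinations, product

def conflicts(c1, c2):
    """Two clauses conflict if one contains the complement of a literal of the other."""
    return any(-lit in c2 for lit in c1)

def maximal_independent_sets(cnf):
    """Brute-force maximal independent sets of the conflict graph."""
    m = len(cnf)
    independent = [frozenset(s)
                   for r in range(m + 1)
                   for s in combinations(range(m), r)
                   if not any(conflicts(cnf[i], cnf[j])
                              for i, j in combinations(s, 2))]
    return [s for s in independent if not any(s < t for t in independent)]

def minimal_signatures_via_conflict_graph(cnf):
    """Zeros of the signature are exactly the clauses of the independent set."""
    return {tuple(0 if i in s else 1 for i in range(len(cnf)))
            for s in maximal_independent_sets(cnf)}

def minimal_signatures_brute_force(cnf):
    """Enumerate all signatures, keep the inclusionwise minimal ones."""
    variables = sorted({abs(l) for c in cnf for l in c})
    sigs = set()
    for bits in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, bits))
        sigs.add(tuple(int(any(a[abs(l)] == (l > 0) for l in c)) for c in cnf))
    return {s for s in sigs
            if not any(t != s and all(x <= y for x, y in zip(t, s)) for t in sigs)}

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and (x3)
phi = [[1, 2], [-1, 3], [-2, -3], [3]]
via_graph = minimal_signatures_via_conflict_graph(phi)
via_brute = minimal_signatures_brute_force(phi)
print(sorted(via_graph))  # [(0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 0, 1)]
```

On this instance both computations return the same three signatures, one for each maximal independent set of the conflict graph.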

Conclusions
In this paper we show that all signatures of a given CNF with a bounded dimension can be generated in incremental polynomial time, answering an open problem posed by Kröll [2,Problem 4.7]. A faster incremental polynomial algorithm is provided for the class of formulas where both the dimension and the co-occurrence are bounded. Moreover, it is also shown that the same task can be done with polynomial delay if the input CNF is from a tractable class (in this case no bound on dimension or co-occurrence is necessary). Finally, it is proved that generating maximal signatures is NP-hard, while minimal signatures can be generated with polynomial delay.
In this context it is interesting to note that given a 3-CNF $\Phi$ with $m$ clauses and the vector $\mathbf{y}=(1,1,\ldots,1)\in\{0,1\}^m$ it is NP-hard to test whether $\mathbf{y}$ is a signature of $\Phi$ or not ($\mathbf{y}$ is a signature if and only if $\Phi$ is satisfiable). On the other hand, our results show that generating all signatures of $\Phi$ can be done in incremental polynomial time. This is a rather unusual behavior for a generation problem. Typically, if all solutions of a given problem can be generated in incremental polynomial time, checking if a given candidate is a solution or not is computationally easy. An additional problem connected to CNF signatures was stated at the Dagstuhl Seminar 19211 by Gy. Turán. Given a set $S\subseteq\{0,1\}^m$, does there exist a CNF with $m$ clauses such that $S$ is exactly its set of all signatures? If yes, can such a CNF be computed efficiently? This 'reverse' problem (get the signatures, output clauses) to the problem presented in this paper (get the clauses, output signatures) is, to the best of our knowledge, completely open.