Unifying Splitting

AVATAR is an elegant and effective way to split clauses in a saturation prover using a SAT solver. But is it refutationally complete? And how does it relate to other splitting architectures? To answer these questions, we present a unifying framework that extends a saturation calculus (e.g., superposition) with splitting and that embeds the result in a prover guided by a SAT solver. The framework also allows us to study locking, a subsumption-like mechanism based on the current propositional model. Various architectures are instances of the framework, including AVATAR, labeled splitting, and SMT with quantifiers.

[…] superposition calculus because it combines superposition's strong equality reasoning with the SAT solver's strong clausal reasoning. It is also appealing theoretically, because it gracefully generalizes traditional saturation provers and yet degenerates to a SAT solver if the problem is propositional.
To illustrate the approach, we follow the key steps of an AVATAR-enabled resolution prover on the initial clause set containing ¬p(a), ¬q(z, z), and p(x) ∨ q(y, b). The disjunction can be split into p(x) ← {[p(x)]} and q(y, b) ← {[q(y, b)]}, where C ← {[C]} indicates that the clause C is enabled only in models in which the associated propositional variable [C] is true. A SAT solver is then run to choose a model J of […]
What about refutational completeness? Far from being a purely theoretical concern, establishing completeness, or finding counterexamples, could yield insights into splitting and perhaps lead to an even stronger AVATAR. Before we can answer this open question, we must mathematize splitting. Our starting point is the saturation framework by Waldmann, Tourret, Robillard, and Blanchette [29], based on the work of Bachmair and Ganzinger [2]. It covers a wide array of techniques, but "the main missing piece of the framework is a generic treatment of clause splitting" [29, p. 332]. We provide that missing piece, in the form of a splitting framework, and use it to show the completeness of an AVATAR-like architecture. The framework is currently a pen-and-paper creature; a formalization using Isabelle/HOL [21] is underway.
Our framework has five layers, linked by refinement. The first layer consists of a base calculus, such as resolution or superposition. It must be presentable as an inference system and a redundancy criterion, as required by the saturation framework, and it must be refutationally complete.
From a base calculus, our framework can be used to derive the second layer, which we call the splitting calculus (Sect. 3). This extends the base calculus with splitting and inherits the base's completeness. It works on A-clauses or A-formulas of the form C ← A, where C is a base clause or formula and A is a set of propositional literals, called assertions (Sect. 2).
Using the saturation framework, we can prove the dynamic completeness of an abstract prover, formulated as a transition system, that implements the splitting calculus. However, this ignores a major component of AVATAR: the SAT solver. AVATAR considers only inferences involving A-formulas whose assertions are true in the current propositional model. The role of the third layer is to reflect this behavior. A model-guided prover operates on states of the form (J, 𝒩), where J is a propositional model and 𝒩 is a set of A-formulas (Sect. 4). This layer is also dynamically complete.
The fourth layer introduces AVATAR's locking mechanism (Sect. 5). With locking, an A-formula D ← B can be temporarily disabled by another A-formula C ← A if C subsumes D, even if A ⊈ B. Here we make a first discovery: AVATAR-style locking compromises completeness and must be curtailed.
Finally, the fifth layer is an AVATAR-based prover (Sect. 6). This refines the locking model-guided prover of the fourth layer with the given clause procedure, which saturates an A-formula set by distinguishing between active and passive A-formulas. Here we make another discovery: Selecting A-formulas fairly is not enough to guarantee completeness. We need a stronger criterion.
There are also implications for other architectures. In a hypothetical tête-à-tête with the designers of labeled splitting, they might gently point out that by pioneering the use of a propositional model, including locking, they almost invented AVATAR themselves. Likewise, developers of satisfiability modulo theories (SMT) solvers might be tempted to claim that Voronkov merely reinvented SMT. To investigate such questions, we apply our framework to splitting without backtracking, labeled splitting, and SMT with quantifiers (Sect. 7). This gives us a solid basis for comparison as well as some new theoretical results.
A shorter version of this article was presented at CADE-28 [14]. This article extends the conference paper with more explanations, examples, counterexamples, and proofs. We strengthened the definition of consequence relation to require compactness, which allowed us to simplify property (D4). The property (D4) from the conference paper is proved as Lemma 5. The definition of strongly finitary was also changed to include a stronger condition on the introduced assertions, which is needed for the proof of Lemma 72.

Preliminaries
Our framework is parameterized by abstract notions of formulas, consequence relations, inferences, and redundancy. We largely follow the conventions of Waldmann et al. [29]. A-formulas generalize Voronkov's A-clauses [28].

Formulas
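A set F of formulas, ranged over by C, D ∈ F, is a set that contains a distinguished element ⊥ denoting falsehood. A consequence relation ⊨ over F is a relation ⊨ ⊆ (P(F))² with the following properties for all sets M, M′, N, N′ ⊆ F and formulas C ∈ F:
(D1) {⊥} ⊨ ∅;
(D2) {C} ⊨ {C};
(D3) if M ⊆ M′ and N ⊆ N′, then M ⊨ N implies M′ ⊨ N′;
(D4) if M ⊨ N ∪ {C} and M′ ∪ {C} ⊨ N′, then M ∪ M′ ⊨ N ∪ N′;
(D5) if M ⊨ N, then there exist finite sets M′ ⊆ M and N′ ⊆ N such that M′ ⊨ N′.
The intended interpretation of M ⊨ N is conjunctive on the left but disjunctive on the right: "⋀M → ⋁N." The disjunctive interpretation of N will be useful to define splittability abstractly in Sect. 3.1. Property (D4) is called the cut rule, and (D5) is called compactness.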
For their saturation framework, Waldmann et al. instead consider a fully conjunctive version of the consequence relation, with different properties. The incompatibility can easily be repaired: Given a consequence relation ⊨, we can obtain a consequence relation ⊨′ in their sense by defining M ⊨′ N if and only if M ⊨ {C} for every C ∈ N. The two versions differ only when the right-hand side is not a singleton. The conjunctive version ⊨′ can then be used when interacting with the saturation framework.
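The difference between the disjunctive reading of M ⊨ N and the conjunctive variant can be checked mechanically in a toy setting. The following Python sketch is a hypothetical illustration and is not part of the framework. Formulas are predicates over truth assignments to two atoms p and q, and entailment is decided by enumerating all assignments. The names `entails` and `entails_conj` are ours.

```python
from itertools import product

ATOMS = ("p", "q")  # hypothetical: a fixed, finite set of propositional atoms


def assignments():
    """All total truth assignments to ATOMS."""
    for bits in product((False, True), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))


def entails(M, N):
    """Disjunctive reading: every assignment satisfying all of M satisfies some member of N."""
    return all(any(c(v) for c in N) for v in assignments() if all(c(v) for c in M))


def entails_conj(M, N):
    """Conjunctive variant: M entails each member of N separately."""
    return all(entails(M, [c]) for c in N)


def p(v):
    return v["p"]


def q(v):
    return v["q"]


def p_or_q(v):
    return v["p"] or v["q"]


def bot(v):
    return False


print(entails([p_or_q], [p, q]))       # True: p ∨ q entails the disjunction of p and q
print(entails_conj([p_or_q], [p, q]))  # False: p ∨ q entails neither p nor q alone
print(entails([bot], []))              # True: {⊥} entails the empty disjunction
print(entails([], []))                 # False: the empty set does not entail the empty disjunction
```

The two functions agree whenever N is a singleton. This matches the remark that the two versions differ only for non-singleton right-hand sides.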
The ⊨ notation can be extended to allow negation on either side. Let F∼ be defined as […]
Proof By (R2), it suffices to show ⊥ ∉ […]
An A-formula over a set F of base formulas and an assertion set 𝔸 is a pair 𝒞 = (C, A) ∈ AF = F × P_fin(𝔸), written C ← A, where C is a formula and A is a finite set of assertions {a_1, …, a_n} understood as an implication a_1 ∧ ⋯ ∧ a_n → C. We identify C ← ∅ with C and define the projections ⌊C ← A⌋ = C and ⌊(C_n ← A_n, …, C_0 ← A_0)⌋ = (C_n, …, C_0). Moreover, 𝒩_⊥ is the set consisting of all A-formulas of the form ⊥ ← A ∈ 𝒩, where A ∈ P_fin(𝔸). Since ⊥ ← {a_1, …, a_n} can be read as ¬a_1 ∨ ⋯ ∨ ¬a_n, we call such A-formulas propositional clauses. (In contrast, we call a variable-free base formula such as p ∨ q a ground clause when F is first-order logic.) The set 𝒩_⊥ represents the clauses considered by the SAT solver in the original AVATAR [28]. Note the use of calligraphic letters (e.g., 𝒞, 𝒩) to range over A-formulas and sets of A-formulas.
Model-guided provers only consider A-formulas whose assertions are true in the current interpretation. Thus we say that an A-formula C ← A ∈ AF is enabled in a propositional interpretation J if A ⊆ J. A set of A-formulas is enabled in J if all of its members are enabled in J. Given an A-formula set 𝒩 ⊆ AF, the enabled projection 𝒩_J ⊆ ⌊𝒩⌋ consists of the projections ⌊𝒞⌋ of all A-formulas 𝒞 ∈ 𝒩 enabled in J. Analogously, the enabled projection Inf_J ⊆ ⌊Inf⌋ of a set Inf of AF-inferences consists of the projections ⌊ι⌋ of all inferences ι ∈ Inf whose premises are all enabled in J.
A propositional interpretation J is a propositional model of 𝒩_⊥, written J ⊨ 𝒩_⊥, if ⊥ ∉ (𝒩_⊥)_J (i.e., (𝒩_⊥)_J = ∅). Moreover, we write J |≈ 𝒩_⊥ if ⊥ ∉ (𝒩_⊥)_J or fml(J) |≈ {⊥}. A set 𝒩_⊥ is propositionally satisfiable if there exists an interpretation J such that J ⊨ 𝒩_⊥. In contrast to consequence relations, propositional modelhood ⊨ interprets the set 𝒩_⊥ conjunctively: J ⊨ 𝒩_⊥ is informally understood as J ⊨ ⋀𝒩_⊥. Given consequence relations ⊨ and |≈, we lift them from P(F) to P(AF): ℳ ⊨ 𝒩 if and only if ℳ_J ⊨ ⌊𝒩⌋ for every J in which 𝒩 is enabled, and ℳ |≈ 𝒩 if and only if fml(J) ∪ ℳ_J |≈ ⌊𝒩⌋ for every J in which 𝒩 is enabled. The consequence relation ⊨ is used for the completeness of the splitting prover and only captures what inferences such a prover must perform. In contrast, |≈ captures a stronger semantics: For example, thanks to fml(J) among the premises for |≈, the A-formula fml(a) ← {a} is always a |≈-tautology. Also note that, assuming ∅ ⊭ ∅, we have ⊨ ⊆ |≈ on sets that contain exclusively propositional clauses. When needed, we use |≈_F to denote |≈ on P(F) and analogously for |≈_AF, as well as ⊨_F and ⊨_AF.
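Enabled projection and propositional modelhood are simple set operations. The sketch below uses a hypothetical encoding:

- Assertions are strings such as "v0" and "-v0" (for ¬v0).
- A total interpretation J is a frozenset containing exactly one of v and -v for each variable.
- Base formulas are opaque strings, and "⊥" stands for falsehood.

Here v0 and v1 play the roles of [p(x)] and [q(y, b)] from the introduction. The A-formula ⊥ ← {¬v0, ¬v1} states that at least one of the two cases holds.

```python
BOT = "⊥"


def enabled(aformula, J):
    """An A-formula C ← A is enabled in J iff A ⊆ J."""
    _, A = aformula
    return A <= J


def enabled_projection(N, J):
    """N_J: the base formulas of the A-formulas in N that are enabled in J."""
    return {af[0] for af in N if enabled(af, J)}


def is_prop_model(J, N):
    """J ⊨ N_⊥ iff ⊥ ∉ (N_⊥)_J, i.e., no propositional clause ⊥ ← A has A ⊆ J."""
    N_bot = {af for af in N if af[0] == BOT}
    return BOT not in enabled_projection(N_bot, J)


N = {
    ("p(x)", frozenset({"v0"})),
    ("q(y, b)", frozenset({"v1"})),
    (BOT, frozenset({"-v0", "-v1"})),  # the split is exhaustive
}
J = frozenset({"v0", "-v1"})
print(enabled_projection(N, J))                     # {'p(x)'}
print(is_prop_model(J, N))                          # True
print(is_prop_model(frozenset({"-v0", "-v1"}), N))  # False: ⊥ ← {¬v0, ¬v1} is enabled
```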
Finally, we show the compactness (D5) of |≈_AF, using the compactness of propositional logic. First we consider the case where 𝒩 is never enabled. Then the assertion sets of the members of 𝒩, taken together as one conjunction of propositional literals, are unsatisfiable. By compactness, there exists a finite subset of these assertion sets that is also unsatisfiable, i.e., there is a finite subset 𝒩′ of 𝒩 that is also never enabled. Thus for any finite subset ℳ′ of ℳ, ℳ′ |≈ 𝒩′, as wanted.
Otherwise, there is at least one J enabling 𝒩. By abuse of notation, we write 𝒩_A even if A ⊆ 𝔸 is not an interpretation. For every interpretation J in which 𝒩 is enabled, there exist by compactness of |≈_F finite sets J′ ⊆ J, ℳ′_J ⊆ ℳ, and 𝒩′_J ⊆ 𝒩 such that […]
Note that J ⊨ E if and only if 𝒩 is enabled in J. This observation implies that the sets of propositional clauses E and {⊥ ← J′ | J interpretation where 𝒩 is enabled} ∪ E are, respectively, propositionally satisfiable and propositionally unsatisfiable. By compactness, there exists a finite unsatisfiable subset {⊥ ← J′_1, …, ⊥ ← J′_n} ∪ E′ of the latter set.
[…] where J_i is any of the interpretations enabling 𝒩 that is at the origin of the existence of this J′_i, and 𝒩′ is a finite subset of 𝒩 such that all assertions in E′ also occur negated in 𝒩′. Note that both ℳ′ and 𝒩′ are finite sets. It now suffices to show ℳ′ |≈ 𝒩′. Thus let J be an interpretation in which 𝒩′ is enabled. Then J ⊨ E′ because all assertions in E′ also appear negated in 𝒩′ ⊆ 𝒩. Thus, […]
Given sets M, N ⊆ F, the expression M ⊨ N can refer to either the base consequence relation on P(F) or the lifted consequence relation on P(AF) (since F ⊆ AF). Fortunately, there is no ambiguity. First, let us show a preparatory lemma: […]
Aside from resolving ambiguity, Lemma 6 justifies the use of splitting in provers without compromising soundness or completeness: When we prove a completeness theorem that claims that a given prover derives ⊥ from any initial ⊨_AF-unsatisfiable set ℳ ⊆ AF, Lemma 6 allows us to conclude that it also derives ⊥ when starting from any initial ⊨_F-unsatisfiable set M ⊆ F.
Given a formula C ∈ F∼, let asn(C) denote the set of assertions a ∈ 𝔸 such that {fml(a)} |≈| {C}. Normally, we would make sure that asn(C) is nonempty for every formula C. Given a ∈ asn(C), observe that if a ∈ asn(D), then {C} |≈| {D}, and if ¬a ∈ asn(D), then {C} |≈| {∼D}.

Remark 7
Our propositional interpretations are always total. We could also consider partial interpretations, that is, J ⊆ 𝔸 such that at most one of v ∈ J and ¬v ∈ J holds for every v ∈ V. But this is not necessary, because partial interpretations can be simulated by total ones: For every variable v in the partial interpretation, we can use two variables v⁺ and v⁻ in the total interpretation and interpret v⁺ as true if v is true and v⁻ as true if v is false. By adding the propositional clause ⊥ ← {v⁻, v⁺}, every total model of the translated A-formulas corresponds to a partial model of the original A-formulas.
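The translation in Remark 7 can be tested by enumeration. In this sketch (the function names are hypothetical), each variable v is doubled into "v+" and "v-". Total interpretations that violate a guard clause ⊥ ← {v⁻, v⁺} are discarded, and each remaining total interpretation is read back as a partial one.

```python
from itertools import product


def guarded_total_models(variables):
    """Total interpretations (dicts) over the doubled variables v+ and v- that
    satisfy every guard clause ⊥ ← {v-, v+}, i.e., v+ and v- are not both true."""
    doubled = [v + s for v in variables for s in "+-"]
    for bits in product((False, True), repeat=len(doubled)):
        J = dict(zip(doubled, bits))
        if not any(J[v + "+"] and J[v + "-"] for v in variables):
            yield J


def to_partial(J, variables):
    """Read back a partial interpretation: v is true if v+ holds, false if v- holds,
    and unassigned if neither holds."""
    partial = {}
    for v in variables:
        if J[v + "+"]:
            partial[v] = True
        elif J[v + "-"]:
            partial[v] = False
    return partial


partials = [to_partial(J, ["v"]) for J in guarded_total_models(["v"])]
print(partials)  # [{}, {'v': False}, {'v': True}]
```

For one variable, the three surviving total interpretations correspond exactly to the three partial ones: v unassigned, v false, and v true.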

Example 8
In the original description of AVATAR [28], the connection between first-order clauses and assertions takes the form of a function [ ] : F → 𝔸. The encoding is such that [¬C] = ¬[C] for every ground unit clause C, and [C] = [D] if and only if C is syntactically equal to D up to variable renaming. This can be supported in our framework by letting […]
A different encoding is used to exploit the theories of an SMT solver [4]. With a notion of |≈-entailment that gives a suitable meaning to Skolem symbols, we can go further and have [¬C(sk_{¬C(x)})] = ¬[C(x)]. Even if the superposition prover considers sk_{¬C(x)} an uninterpreted symbol (according to ⊨), the SAT or SMT solver can safely prune the search space by assuming that C(x) and ¬C(sk_{¬C(x)}) are exhaustive (according to |≈).
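One possible realization of the [ ] mapping is sketched below. It is an illustration under stated assumptions, not Vampire's implementation:

- Literals are triples (positive, predicate, args) with flat argument symbols.
- The symbols x, y, and z are assumed to be variables.
- The variant check tries all literal orders. This is exponential and suitable only for tiny clauses.

Assertions are returned as strings "v<i>" and "-v<i>".

```python
from itertools import permutations

# Hypothetical representation: a literal is (positive, predicate, args), where
# args is a tuple of flat symbols; a clause is a tuple of literals.
VARIABLES = {"x", "y", "z"}  # assumption: these symbols are variables, all others constants


def is_var(t):
    return t in VARIABLES


def same_up_to_renaming(lits_c, lits_d):
    """Do two literal sequences coincide under one bijective variable renaming?"""
    fwd, bwd = {}, {}
    for (s1, p1, a1), (s2, p2, a2) in zip(lits_c, lits_d):
        if s1 != s2 or p1 != p2 or len(a1) != len(a2):
            return False
        for t1, t2 in zip(a1, a2):
            if is_var(t1) != is_var(t2):
                return False
            if is_var(t1):
                if fwd.setdefault(t1, t2) != t2 or bwd.setdefault(t2, t1) != t1:
                    return False
            elif t1 != t2:
                return False
    return True


def is_variant(c, d):
    """C and D are equal up to variable renaming (literal order is irrelevant)."""
    return len(c) == len(d) and any(same_up_to_renaming(c, p) for p in permutations(d))


class AssertionMap:
    """Sketch of [ ] with [C] = [D] iff C and D are variants, and [¬C] = ¬[C]
    for ground unit clauses C."""

    def __init__(self):
        self.reps = []  # reps[i] is the representative clause of variable v<i>

    def __call__(self, clause):
        if len(clause) == 1 and not any(map(is_var, clause[0][2])):
            positive, pred, args = clause[0]
            if not positive:  # ground negative unit: negate its complement's variable
                return "-" + self(((True, pred, args),))
        for i, rep in enumerate(self.reps):
            if is_variant(rep, clause):
                return f"v{i}"
        self.reps.append(clause)
        return f"v{len(self.reps) - 1}"


m = AssertionMap()
print(m(((True, "p", ("x",)),)))       # v0
print(m(((True, "p", ("y",)),)))       # v0: p(y) is a variant of p(x)
print(m(((True, "q", ("y", "b")),)))   # v1
print(m(((True, "q", ("z", "z")),)))   # v2: not a variant of q(y, b)
print(m(((False, "q", ("a", "a")),)))  # -v3: [¬q(a, a)] = ¬[q(a, a)]
```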

Splitting Calculi
Let F be a set of base formulas equipped with ⊥, ⊨, and |≈. The consequence relation |≈ is assumed to be nontrivial: (D6) ∅ |≈ ∅ does not hold. Let 𝔸 be a set of assertions over V, and let AF be the set of A-formulas over F and 𝔸. Let (FInf, FRed) be a base calculus for F-formulas, where FRed is a redundancy criterion that additionally satisfies […]
These requirements can easily be met by a well-designed redundancy criterion. Requirement (R5) is called reducedness by Waldmann et al. [30, Sect. 2.3]. Requirement (R6) must hold of any complete calculus (Lemma 2), and (R7) can be made without loss of generality (Remark 3). Bachmair and Ganzinger's redundancy criterion for superposition [1, Sect. 4.3] meets (R1)–(R7).
From a base calculus, we will define an induced splitting calculus (SInf, SRed). We will show that the splitting calculus is sound w.r.t. |≈ and that it is statically and dynamically complete w.r.t. ⊨. Furthermore, we will show two stronger results that take into account the switching of propositional models that characterizes most splitting architectures: strong static completeness and strong dynamic completeness.

The Inference Rules
We start with the mandatory inference rules.

Definition 9
The splitting inference system SInf consists of all instances of the following two rules:
Base: from the premises C_n ← A_n, …, C_1 ← A_1, derive D ← A_1 ∪ ⋯ ∪ A_n.
Unsat: from the premises ⊥ ← A_1, …, ⊥ ← A_n, derive ⊥.
For Base, the side condition is (C_n, …, C_1, D) ∈ FInf. For Unsat, the side condition is that {⊥ ← A_1, …, ⊥ ← A_n} is propositionally unsatisfiable.
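The two mandatory rules can be sketched directly. This uses a hypothetical encoding:

- An A-formula is a pair (formula, frozenset of assertions).
- Assertions are strings, with "-" marking negation.
- The Unsat side condition is decided by brute-force enumeration of interpretations. A real prover would delegate this check to a SAT solver.

In the usage lines, v0 and v1 stand for [p(x)] and [q(y, b)] from the introduction.

```python
from itertools import product


def prop_unsat(clauses):
    """Unsat side condition: are the propositional clauses ⊥ ← A_1, ..., ⊥ ← A_n
    jointly unsatisfiable? Each clause is given by its assertion set A_i."""
    variables = sorted({a.lstrip("-") for A in clauses for a in A})
    for bits in product((False, True), repeat=len(variables)):
        J = {v if b else "-" + v for v, b in zip(variables, bits)}
        if not any(A <= J for A in clauses):  # no ⊥ ← A_i is enabled: J is a model
            return False
    return True


def base_conclusion(premises, concl):
    """Base: lift a base inference with conclusion D to A-formulas; D inherits
    the union of the premises' assertion sets."""
    return (concl, frozenset().union(*(A for _, A in premises)))


# Resolving p(x) ← {v0} against ¬p(a) yields ⊥ ← {v0}.
print(base_conclusion([("p(x)", frozenset({"v0"})), ("¬p(a)", frozenset())], "⊥"))
# ⊥ ← {v1} comes from q(y, b) ← {v1} and ¬q(z, z); ⊥ ← {¬v0, ¬v1} from the split.
clauses = [frozenset({"v0"}), frozenset({"v1"}), frozenset({"-v0", "-v1"})]
print(prop_unsat(clauses))       # True: Unsat derives ⊥
print(prop_unsat(clauses[::2]))  # False: without ⊥ ← {v1}, J = {¬v0, v1} is a model
```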
In addition, the following optional inference rules can be used if desired; the completeness proof does not depend on their application. Rules identified by double bars, such as Split, are simplifications; they replace their premises with their conclusions in the current A-formula set. The premises' removal is justified by SRed_F, defined in Sect. 3.2.
In the Split rule, we require that C ≠ ⊥ is splittable into C_1, …, C_n and that a_i ∈ asn(C_i) for every i. Split performs an n-way case analysis on C. Each case C_i is approximated by an assertion a_i. The first conclusion expresses that the cases are exhaustive. The n other conclusions assume C_i if its approximation a_i is true.
In a clausal prover, typically C = C_1 ∨ ⋯ ∨ C_n, where the subclauses C_i have mutually disjoint sets of variables and form a maximal split. For example, the clause p(x) ∨ q(x) is not splittable because of the shared variable x, whereas p(x) ∨ q(y) can be split into {p(x), q(y)}.
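A maximal split of a clause can be computed with a union-find structure over its literals, merging literals that share a variable. The sketch below assumes that each literal comes with its set of variables. Ground literals have no variables, so each ends up in a group of its own. The function name is hypothetical.

```python
def maximal_split(literals):
    """Partition literals, given as (text, variable set) pairs, into the finest
    groups C_1, ..., C_n such that no two groups share a variable."""
    parent = list(range(len(literals)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner = {}  # variable -> index of the first literal containing it
    for i, (_, variables) in enumerate(literals):
        for v in variables:
            if v in first_owner:
                parent[find(i)] = find(first_owner[v])  # shared variable: merge groups
            else:
                first_owner[v] = i
    groups = {}
    for i, (text, _) in enumerate(literals):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())


print(maximal_split([("p(x)", {"x"}), ("q(x)", {"x"})]))     # [['p(x)', 'q(x)']]
print(maximal_split([("p(x)", {"x"}), ("q(y, b)", {"y"})]))  # [['p(x)'], ['q(y, b)']]
```

A clause is splittable in this sense exactly when the result has at least two groups.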
Collect removes A-formulas whose assertions cannot be satisfied by any model of the propositional clauses; this is a form of garbage collection. Similarly, Trim removes assertions that are entailed by existing propositional clauses.
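The formal Collect and Trim rules are not reproduced in this section. Following only their informal descriptions, both can be approximated by enumerating the models of the propositional clauses. In this brute-force sketch, assertions are strings with "-" marking negation, and propositional clauses are given by their assertion sets. It is meant to show the logic, not to be an efficient or authoritative implementation.

```python
from itertools import product


def neg(a):
    return a[1:] if a.startswith("-") else "-" + a


def prop_models(clauses, variables):
    """Total interpretations over `variables` enabling no ⊥ ← A in `clauses`."""
    for bits in product((False, True), repeat=len(variables)):
        J = {v if b else neg(v) for v, b in zip(variables, bits)}
        if not any(A <= J for A in clauses):
            yield J


def collect_and_trim(aformulas, clauses):
    """Drop A-formulas that no model enables (Collect) and strip assertions
    that hold in every model (Trim)."""
    variables = sorted({a.lstrip("-") for A in clauses for a in A}
                       | {a.lstrip("-") for _, A in aformulas for a in A})
    models = list(prop_models(clauses, variables))
    result = []
    for C, A in aformulas:
        if not any(A <= J for J in models):
            continue  # Collect: C ← A can never be enabled
        kept = frozenset(a for a in A if not all(a in J for J in models))
        result.append((C, kept))  # Trim: entailed assertions removed
    return result


clauses = [frozenset({"-v0"})]  # ⊥ ← {¬v0}: every model makes v0 true
aformulas = [("p(x)", frozenset({"v0", "v1"})), ("q(y)", frozenset({"-v0"}))]
print(collect_and_trim(aformulas, clauses))  # [('p(x)', frozenset({'v1'}))]
```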
StrongUnsat is a variant of Unsat that uses |≈ instead of ⊨. A splitting prover may choose to apply StrongUnsat if desired, but only Unsat is necessary for completeness. In practice, |≈-entailment can be much more expensive to decide, or even be undecidable. A splitting prover could invoke an SMT solver [4] (|≈) with a time limit, falling back on a SAT solver (⊨) if necessary.
Approx can be used to make any derived A-formula visible to |≈. It is similar to a one-way split. Tauto, which asserts a |≈-tautology, allows communication in the other direction, from the SMT or SAT solver to the calculus.

Example 11
Consider a splitting calculus obeying the AVATAR conventions of Example 8. When splitting on C(x) ∨ D(y), after closing the C(x) case, we can assume that C(x) does not hold when considering the D(y) case. This can be achieved by adding the A-clause ¬C(sk_{¬C(x)}) ← {¬[C(x)]} using Tauto. If we use an SMT solver that is strong enough to determine that ¬C(sk_{¬C(x)}) and D(y) are inconsistent, we can then apply StrongUnsat immediately, skipping the D(y) branch altogether. This would be the case if we took C(x) := f(x) > 0 and D(y) := f(y) > 3 with a solver that supports linear arithmetic and quantifiers. We are not aware of any prover that implements this idea, although a similar idea is described for ground C(x) in the context of labeled splitting [15, Sect. 2].

Example 12
Consider a splitting calculus whose propositional solver is an SMT solver supporting linear arithmetic. Suppose that we are given the inconsistent clause set {c > 0, c < 0}. Two applications of Approx make these clauses visible to the SMT solver, as the propositional clause set {⊥ ← {¬(c > 0)}, ⊥ ← {¬(c < 0)}}. Then the SMT solver, modeled by StrongUnsat, detects the unsatisfiability.
The splitting inference system commutes nicely with the enabled projection: […]
Proof The condition ⊥ ∉ 𝒩_J rules out the Unsat inferences. It remains to show that the enabled projection of a Base inference is an FInf-inference from enabled premises, and vice versa.
[…] The first case is trivial. In the other case, J |≈ 𝒩_J ∪ {C} and thus J |≈ {C}, as required.
Case Approx: The proof is as for the left conclusion of Split.
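Example 12 shows the gap between Unsat (⊨) and StrongUnsat (|≈). The sketch below replays it with a deliberately tiny, hypothetical theory check for strict and non-strict bounds on a single real constant c; actual SMT solvers use far more sophisticated procedures.

- Atoms are pairs (op, k) standing for "c op k".
- A propositional clause ⊥ ← A is a set of (atom, polarity) pairs.

```python
from itertools import product

INF = float("inf")


def feasible(literals):
    """Toy theory check: literals are (op, k, value), meaning the atom "c op k"
    (op is ">" or "<") is true if value, false otherwise. Return True iff some
    real number c satisfies all of them."""
    lo, lo_strict, hi, hi_strict = -INF, False, INF, False
    for op, k, value in literals:
        if (op == ">") == value:          # c > k, or not(c < k), i.e., c >= k
            if k > lo or (k == lo and value):
                lo, lo_strict = k, value  # strict only for c > k
        else:                             # c < k, or not(c > k), i.e., c <= k
            if k < hi or (k == hi and value):
                hi, hi_strict = k, value  # strict only for c < k
    return lo < hi or (lo == hi and not lo_strict and not hi_strict)


def models(clauses, atoms, use_theory):
    """Total assignments to `atoms` enabling no ⊥ ← A in `clauses`; with
    use_theory, they must additionally be consistent in the toy theory."""
    for bits in product((False, True), repeat=len(atoms)):
        J = dict(zip(atoms, bits))
        if any(all(J[a] == pol for a, pol in A) for A in clauses):
            continue  # some ⊥ ← A is enabled in J
        if use_theory and not feasible([(op, k, J[(op, k)]) for op, k in atoms]):
            continue  # propositionally fine, but not a model of the theory
        yield J


atoms = [(">", 0), ("<", 0)]
# Example 12 after two Approx steps: ⊥ ← {¬(c > 0)} and ⊥ ← {¬(c < 0)}
clauses = [{((">", 0), False)}, {(("<", 0), False)}]
print(next(models(clauses, atoms, use_theory=False), None) is not None)  # True: Unsat does not apply
print(next(models(clauses, atoms, use_theory=True), None) is not None)   # False: StrongUnsat applies
```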

The Redundancy Criterion
Next, we lift the base redundancy criterion.

Definition 15
The splitting redundancy criterion SRed = (SRed_I, SRed_F) is specified as follows. An A-formula C ← A ∈ AF is redundant w.r.t. 𝒩, written C ← A ∈ SRed_F(𝒩), if either of these conditions is met: […]
An inference ι ∈ SInf is redundant w.r.t. 𝒩, written ι ∈ SRed_I(𝒩), if either of these conditions is met: (3) ι is a Base inference and {ι}_J ⊆ FRed_I(𝒩_J) for every J; or (4) ι is an Unsat inference and ⊥ ∈ 𝒩.
Condition (1) lifts FRed_F to A-formulas. It is used both as such and to justify the Split and Collect rules, as we will see below. Condition (2) is used to justify Trim. We will use SRed_F to justify global A-formula deletion, but also FRed_F for local A-formula deletion in the locking prover. Note that SRed is not reduced. Inference redundancy partly commutes with the enabled projection: […]
Proof Since ⊥ ∉ 𝒩, condition (4) of the definition of SRed_I cannot apply. The inclusion then follows directly from condition (3) applied to the interpretation J.
[…]
Proof By Lemma 2, condition (1) of the definition of SRed_F cannot apply. Nor can condition (2).

Lemma 18 SRed is a redundancy criterion.
Proof We will first show that the restriction ARed of SRed to Base inferences is a redundancy criterion. Then we will consider Unsat inferences. We start by showing that ARed is a special case of the redundancy criterion FRed^∩G of Waldmann et al. [29, Sect. 3], namely the intersection of lifted redundancy criteria with tiebreaker orders. Then we can simply invoke Theorem 37 and Lemma 19 from their technical report [30].
To strengthen the redundancy criterion, we define a tiebreaker order ⊏ such that C ← A ⊏ D ← B if and only if C = D and A ⊂ B. In this way, C ← B is redundant w.r.t. C ← A if A ⊂ B, even though the base clause is the same. The only requirement on ⊐ is that it must be well founded, which is the case since the assertion sets of A-formulas are finite. We also define a family of grounding functions G_J indexed by a propositional model J. Here, "grounding" will mean enabled projection. For A-formulas 𝒞, we set […]
We must show that G_J satisfies the following characteristic properties of grounding functions: (G1) G_J(⊥) = {⊥}; (G2) for every 𝒞 ∈ AF, if ⊥ ∈ G_J(𝒞), then ⌊𝒞⌋ = ⊥; and (G3) for every ι ∈ SInf, G_J(ι) ⊆ FRed_I(G_J(concl(ι))).
[…] if and only if for every propositional interpretation J and every D ∈ G_J(𝒞), either D ∈ FRed_F(G_J(𝒩)) or there exists 𝒞′ ∈ 𝒩 such that 𝒞 ⊐ 𝒞′ and D ∈ G_J(𝒞′).
We also need to check that the consequence relation ⊨ used in SRed coincides with the consequence relation ⊨^∩G, which is defined as ℳ ⊨^∩G {𝒞} if and only if for every J and […]
After expanding G_J, this is exactly the definition we used for lifting ⊨ to AF.
To extend the above result to SRed, we must show the second half of conditions (R2) and (R3) as well as (R4) for Unsat inferences.
(R4) Given an Unsat inference ι, we must show that if ⊥ ∈ 𝒩, then ι ∈ SRed_I(𝒩). This follows from the definition of SRed_I.
SRed is highly versatile. It can justify the deletion of A-formulas that are propositionally tautological, such as C ← {v, ¬v}. It lifts the base redundancy criterion gracefully: […]. It also allows other simplifications, as long as the assertions on A-formulas used to simplify a given C ← A are contained in A. If the base criterion FRed_F supports subsumption (e.g., following the lines of Waldmann et al. [29]), this also extends to A-formulas: D ← B ∈ SRed_F({C ← A}) if D is strictly subsumed by C and B ⊇ A, or if C = D and B ⊃ A. Finally, it is strong enough to justify case splits and the other simplification rules presented in Sect. 3.1.
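The subsumption-based redundancy conditions just stated reduce to a short test. In this sketch, clauses are ground and represented as frozensets of literals, so strict subsumption is taken to be proper set inclusion. Non-ground subsumption would require matching. The function names are hypothetical.

```python
def strictly_subsumes(c, d):
    """Ground clauses as frozensets of literals: C strictly subsumes D iff C ⊂ D."""
    return c < d


def sred_subsumed(d_af, c_af):
    """D ← B ∈ SRed_F({C ← A}) if D is strictly subsumed by C and B ⊇ A,
    or if C = D and B ⊃ A."""
    (d, b), (c, a) = d_af, c_af
    return (strictly_subsumes(c, d) and b >= a) or (c == d and b > a)


P, PQ = frozenset({"p"}), frozenset({"p", "q"})
print(sred_subsumed((PQ, frozenset({"v0", "v1"})), (P, frozenset({"v0"}))))  # True
print(sred_subsumed((PQ, frozenset({"v1"})), (P, frozenset({"v0"}))))        # False: B ⊉ A
print(sred_subsumed((P, frozenset({"v0", "v1"})), (P, frozenset({"v0"}))))   # True: same C, B ⊃ A
print(sred_subsumed((P, frozenset({"v0"})), (P, frozenset({"v0"}))))         # False
```

The second call shows a case that SRed does not cover: C subsumes D, but A ⊈ B. This is the situation that AVATAR-style locking exploits.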

Theorem 19 (Simplification)
For every Split, Collect, or Trim inference, the conclusions collectively make the premises redundant according to SRed F .
[…] If a_i ∈ J for some i, this follows from Split's side condition C ∈ FRed_F({C_i}). Otherwise, this follows from (R7), the requirement that C ∈ FRed_F({⊥}), since C ≠ ⊥.
[…] This follows directly from condition (2) of the definition of SRed_F.
Annoyingly, the redundancy criterion SRed does not mesh well with α-equivalence. We would expect the A-formula p(x) ← {a} to be subsumed by p(y) ← ∅, where x, y are variables, but this is not covered by condition (2) of SRed_F because p(x) ≠ p(y). The simplest solution is to take F to be the quotient of some set of raw formulas by α-equivalence. An alternative is to generalize the theory so that the projection operator G_J generates entire α-equivalence classes (e.g., […]).

Standard Saturation
We will now prove that the splitting calculus is statically complete and therefore dynamically complete. Unfortunately, derivations produced by most practical splitting architectures violate the fairness condition associated with dynamic completeness. Nevertheless, the standard completeness notions are useful stepping stones, so we start with them.

Lemma 20
Let 𝒩 ⊆ AF be an A-formula set, and let J be a propositional interpretation. If 𝒩 is saturated w.r.t. SInf and SRed_I, then 𝒩_J is saturated w.r.t. FInf and FRed_I.

Proof
Assuming ι ∈ FInf(𝒩_J), we must show ι ∈ FRed_I(𝒩_J). The argument follows that of the "folklore" Lemma 26 in the technical report of Waldmann et al. [30]. First note that any inference in FInf is lifted, via Base, in SInf, so that we have ι ∈ (SInf(𝒩))_J. This means that there exists a Base inference ι_0 ∈ SInf(𝒩) whose premises are enabled in J and whose projection is ι. By saturation of 𝒩, we have ι_0 ∈ SRed_I(𝒩). […]
First, we show ⊥ ∈ 𝒩_J for every J. From 𝒩 ⊨ {⊥}, by the definition of ⊨ on A-formulas, it follows that 𝒩_J ⊨ {⊥}. Moreover, by Lemma 20, 𝒩_J is saturated w.r.t. FInf and FRed_I. By static completeness of (FInf, FRed), we get ⊥ ∈ 𝒩_J.
Hence 𝒩_⊥ is propositionally unsatisfiable. By compactness of propositional logic, there exists a finite subset ℳ ⊆ 𝒩_⊥ that is propositionally unsatisfiable. By saturation w.r.t. Unsat, we obtain ⊥ ∈ 𝒩, as required.
Thanks to the requirements on the redundancy criterion, we obtain dynamic completeness as a corollary: […]
Proof This immediately follows from Theorem 21 by Lemma 6 in the technical report of Waldmann et al. [30].

Local Saturation
The above completeness result, about SRed_F-derivations, can be extended to prover designs based on the given clause procedure, such as the Otter, DISCOUNT, and Zipperposition loops, as explained by Waldmann et al. [29, Sect. 4]. But it fails to capture a crucial aspect of most splitting architectures. Since SRed_F-derivations have no notion of current split branch or propositional model, they place no restrictions on which inferences may be performed when.
To fully capture splitting, we need to start with a weaker notion of saturation. If an A-formula set is consistent, it should suffice to saturate w.r.t. a single propositional model. In other words, if no A-formula ⊥ ← A such that A ⊆ J is derivable for some model J ⊨ 𝒩_⊥, the prover will never be able to apply the Unsat rule to derive ⊥. It should then be allowed to deliver a verdict of "consistent." We will call such model-specific saturations local and standard saturations global.
Local saturation works in tandem with strong static completeness:

Theorem 24 (Strong static completeness) Assume (FInf, FRed) is statically complete. Given a set 𝒩 ⊆ AF that is locally saturated w.r.t. SInf and SRed_I and such that […]
Proof We show ⊥ ∈ 𝒩 by case analysis on the condition by which 𝒩 is locally saturated. The first case is vacuous. […]
By the definition of local saturation and static completeness of (FInf, FRed), we get ⊥ ∈ 𝒩_J, contradicting J ⊨ 𝒩_⊥.

Example 25
Consider the following A-clause set expressed using AVATAR conventions: […]
It is not globally saturated for resolution, because the conclusion ⊥ ← {[q(y)]} of resolving the last two A-clauses is missing, but it is locally saturated with J ⊇ {[p(x)], ¬[q(y)]} as the witness in Definition 23.
We also need a notion of local fairness that works in tandem with local saturation.

Proof
The proof is by case analysis on the condition by which […] Lemma 17, and 𝒩_∞ is therefore locally saturated. In the remaining case, we […] by Lemma 4 in the technical report of Waldmann et al. [30], and therefore […]
Local fairness works in tandem with strong dynamic completeness.

Theorem 28 (Strong dynamic completeness) Assume (FInf, FRed) is statically complete. Given an SRed_F-derivation (𝒩_i)_i that is locally fair w.r.t. SInf and SRed_I and such that […]
Proof We connect the dynamic and static points of view along the lines of the proof of Lemma 6 in the technical report of Waldmann et al. [30]. First, we show that the limit inferior is inconsistent: […]
An alternative proof based on dynamic completeness follows:
Proof We show ⊥ ∈ 𝒩_i for some i by case analysis on the condition by which (𝒩_i)_i is locally fair. The first case is vacuous.
By the definition of local fairness and Theorem 22, we get ⊥ ∈ (𝒩_i)_J for some i. By Lemma 2 and the definition of […]
In Sects. 4 to 6, we will review three transition systems of increasing complexity, culminating with an idealized specification of AVATAR. They will be linked by a chain of stepwise refinements, like pearls on a string. All derivations using these systems will correspond to SRed_F-derivations, and their fairness criteria will imply local fairness. Consequently, by Theorem 28, they will all be complete.

Model-Guided Provers
The transition system ⇒_{SRed_F} provides a very abstract notion of splitting prover. AVATAR and other splitting architectures maintain a model of the propositional clauses, which represents the split tree's current branch. We can capture this abstractly by refining SRed_F-derivations to incorporate a propositional model.

The Transition Rules
The states are now pairs (J, 𝒩), where J is a propositional interpretation and 𝒩 ⊆ AF. Initial states have the form (J, N), where N ⊆ F. The model-guided prover MG is defined by the following transition rules: […]
The Derive rule can add new A-formulas (ℳ′) and delete redundant A-formulas (ℳ). In practice, Derive will perform only sound or consistency-preserving inferences, but we impose no such restriction. If soundness of a prover is desired, it can be derived easily from the soundness of the individual inferences. Similarly, ℳ and ℳ′ will usually be enabled in J, but we do not require this.
The interpretation J should be a model of 𝒩_⊥ most of the time; when it is not, Switch can be used to switch interpretation or StrongUnsat to finish the refutation. Although the condition J_i ⊨ (𝒩_i)_⊥ might be violated for some i, to make progress we must periodically check it and apply Switch as needed. Much of the work that is performed while the condition is violated will likely be wasted. To avoid this waste, Vampire invokes the SAT solver whenever it selects a clause as part of the given clause procedure.
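The interplay of Derive and Switch can be sketched as a loop. This is a toy under strong assumptions, and it is neither the MG rules themselves nor Vampire's implementation:

- The base calculus is ground binary resolution on literals such as "p" and "-p", standing in for FInf.
- Models are found by brute force instead of by a SAT solver.
- Derive only adds conclusions of inferences between A-formulas enabled in J; nothing is deleted.

The input is a ground analogue of the introduction's example after splitting p ∨ q with assertions v0 and v1. A "saturated" verdict means that J is a model of 𝒩_⊥ and that 𝒩_J is closed under the toy calculus. This is in the spirit of local saturation.

```python
from itertools import product


def neg(lit):
    """Complement of a literal or assertion written as "v" / "-v"."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def find_model(prop_clauses, variables):
    """Brute-force stand-in for the SAT solver: some total J enabling no ⊥ ← A."""
    for bits in product((True, False), repeat=len(variables)):
        J = frozenset(v if b else neg(v) for v, b in zip(variables, bits))
        if not any(A <= J for A in prop_clauses):
            return J
    return None


def resolvents(c1, c2):
    """All ground binary resolvents of two clauses given as frozensets of literals."""
    return {(c1 - {lit}) | (c2 - {neg(lit)}) for lit in c1 if neg(lit) in c2}


def model_guided(N, variables):
    """Toy model-guided loop on A-formulas (clause, assertions); the empty clause is ⊥."""
    N = set(N)
    J = find_model([], variables)  # arbitrary initial interpretation
    while True:
        prop = [A for C, A in N if not C]  # assertion sets of N_⊥
        if any(A <= J for A in prop):      # J is not a model of N_⊥
            J = find_model(prop, variables)  # Switch
            if J is None:
                return "unsat"  # N_⊥ is propositionally unsatisfiable: Unsat yields ⊥
            continue
        active = [(C, A) for C, A in N if A <= J]  # A-formulas enabled in J
        new = {(R, A1 | A2) for C1, A1 in active for C2, A2 in active
               for R in resolvents(C1, C2)} - N    # Derive (additions only)
        if not new:
            return ("saturated", J)
        N |= new


f = frozenset
N0 = {
    (f({"-p"}), f()),           # ¬p
    (f({"-q"}), f()),           # ¬q
    (f({"p"}), f({"v0"})),      # p ← {v0}
    (f({"q"}), f({"v1"})),      # q ← {v1}
    (f(), f({"-v0", "-v1"})),   # ⊥ ← {¬v0, ¬v1}
}
print(model_guided(N0, ["v0", "v1"]))  # unsat
status, J = model_guided(N0 - {(f({"-q"}), f())}, ["v0", "v1"])
print(status, sorted(J))  # saturated ['-v0', 'v1']
```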
[…]
Proof The only rule that deletes A-formulas, Derive, exclusively takes out A-formulas that are redundant w.r.t. the next state, as mandated by SRed_F.
To develop our intuitions, we will study several examples of ⇒_MG-derivations. In all the examples in this section, the base calculus is first-order resolution, and ⊨ is entailment for first-order logic with equality.

Example 30 Let us revisit Example 10. Initially, the propositional interpretation is […]
Finally, we detect that the propositional clauses are unsatisfiable and generate ⊥. This corresponds to the transitions below, where arrows are annotated by transition names and light gray boxes identify enabled A-clauses: […]

Fairness
We need a fairness criterion for MG that implies local fairness of the underlying SRed_F-derivation. The latter requires a witness J but gives us no hint as to where to look for one. This is where basic topology comes into play.
Intuitively, a limit point is a propositional interpretation that is the limit of a family of interpretations that we revisit infinitely often. We will see that there always exists a limit point. To achieve fairness, we will focus on saturating a limit point.
Fig. 1. The direct path from the root to a node labeled J_i specifies the assertions that are true in J_i. The limit point J corresponds to the only infinite branch of the tree.
We can nearly as easily supply an elementary proof:
Proof We construct a subsequence (J′_j)_j converging to a limit point J in such a way that J′_j gets the first j variables right, i.e., such that J′_j ⊨ v_k if and only if J ⊨ v_k for every k ≤ j. Moreover, we maintain the invariant that there are infinitely many elements in the sequence (J_i)_i that agree with this finite prefix. Assume that we have already defined J′_0, …, J′_j. Among the infinitely many elements J_i that agree with J′_0, …, J′_j on v_1, …, v_j, there must be infinitely many with J_i ⊨ v_{j+1} or infinitely many with J_i ⊨ ¬v_{j+1}. In the first case, set J′_{j+1} = J_i for one such index i larger than those chosen so far, and analogously in the second case.
Lemma 35 tells us that every sequence has a limit point. No matter how erratically the prover switches branches, it will systematically explore at least one branch in a limit point. It then suffices to perform the base FInf-inferences fairly in that branch: Until ⊥ is derived, it is impossible in a fair ⇒_MG-derivation to delay Switch forever (by the first half of (2)) or to starve Derive by performing only Switch transitions (by the second half of (2)). Also note that we make no assumptions about the order in which propositional models are enumerated; the propositional solver is given carte blanche.
We might at first expect that a realistic prover would ensure the inclusion for all limit points J. However, a prover like Vampire, based on the given clause procedure with an age-based heuristic, might saturate only one of the limit points, as we will see in Sect. 6.2.
Fairness of ⇒_MG-derivations is deliberately defined in terms of FRed_I instead of SRed_I. This results in a more suitable notion of fairness, since it allows the prover to ignore formulas and inferences that are locally redundant at the limit point but not redundant w.r.t. (SInf, SRed). For example, the inference (t ≈ s, p(t), p(s)) is locally redundant in J ⊇ {v_0} if the A-clause p(s) ← {v_0} has already been derived, but it is not redundant w.r.t. (SInf, SRed).
In the spirit of refinement, we have that fairness of an $\Rightarrow_{MG}$-derivation implies local fairness of the underlying $SRed_F$-derivation: … We take this limit point as the witness for $J$ in Definition 26. It remains to show that $J \models (N_\infty)_\bot$. This follows from Lemma 38.
Proof By Theorem 28.
A well-behaved propositional solver, as in labeled splitting, enumerates potential models in a systematic way and always gives rise to a single limit point $J_\infty$, which can be taken for $J$ in the definition of fairness (Definition 36). To achieve this kind of fairness, a splitting prover would perform all inferences from persistently enabled A-formulas, that is, A-formulas that eventually become enabled and remain enabled forever. In a prover based on the given clause procedure, this can be implemented in the standard way, using an age-based selection heuristic [27, Sect. 4]. However, such a strategy is not sufficient if the prover exploits local redundancy, as we will see in Sects. 5 and 7.2, even if the propositional solver is well behaved.
By contrast, an unconstrained solver, as supported by AVATAR, can produce multiple limit points; in particular, the restart feature of SAT solvers [20] could produce this kind of behavior. Then it is more challenging to ensure fairness, as we will see in Sect. 6.

Example 41
Suppose that we leave out $\neg q(z, z)$ from the initial clause set of Example 10. Then we can still derive $\bot \leftarrow \{v_0\}$, as in Example 30, but not $\bot \leftarrow \{v_1\}$. By static completeness of the splitting calculus, we conclude that the A-clause set is consistent.
Example 42 Consider the initial clause set consisting of $p(x) \lor q(a)$ and $\neg q(y) \lor q(f(y))$. Without splitting, and without selection [2, Sect. 3], a resolution prover would diverge, attempting to generate infinitely many clauses of the form $p(x) \lor q(f^i(a))$.
By contrast, in a splitting prover, we might split the first clause, yielding the A-clauses … If we then choose the model $\{v_1\}$ and commit to it, we will also diverge, although somewhat faster since we do not need to carry around the literal $p(x)$. On the other hand, if we at any point switch to $\{v_0\}$, we notice that $\{p(x)\}$ is saturated and terminate. This illustrates the benefits of employing an unconstrained SAT solver.

Example 43
It is crucial to invoke the SAT solver often enough (in other words, to take Switch and StrongUnsat transitions periodically). Suppose that the inconsistent initial clause set of Example 10 is supplemented by the prolific but unhelpful clauses $r(a)$ and $\neg r(x) \lor r(f(x))$. We can perform the same split as before, but if we ignore the fairness condition that $J_i \models (N_i)_\bot$ must hold infinitely often, we can stick to the interpretation $\{\neg v_1, \neg v_2\}$ and derive useless consequences of the form $r(f^i(a))$ forever, thereby failing to generate $\bot$. Similarly, the SAT solver must be invoked eventually after deriving a propositional clause $\bot \leftarrow A$ that conflicts with the current interpretation.

Example 44
Consider the consistent set consisting of $\neg p(x)$, $p(a) \lor q(a)$, and $\neg q(y) \lor p(f(y)) \lor q(f(y))$. Splitting the second clause into $p(a)$ and $q(a)$ and resolving $q(a)$ with the third clause yields $p(f(a)) \lor q(f(a))$. This process can be iterated to yield arbitrarily many applications of $f$. Now suppose that $v_{2i}$ and $v_{2i+1}$ are associated with $p(f^i(a))$ and $q(f^i(a))$, respectively. If we split every emerging clause $p(f^i(a)) \lor q(f^i(a))$ and the SAT solver always makes $v_{2i}$ true before $v_{2i+1}$, we end up with the situation of Example 32 and Fig. 1. For the limit point $J$, all FInf-inferences are performed. Thus, the derivation is fair.

Example 45
We build a clause set from two copies of Example 44, where each clause $C$ from each copy $i \in \{1, 2\}$ is extended to $\neg r_i \lor C$. We add the clause $r_1 \lor r_2$ and split it as our first move. From there, each branch imitates Example 44. A SAT solver might jump back and forth between them, as in Example 33 and Fig. 2. Even if the A-clauses get disabled and re-enabled infinitely often, we must select them eventually and perform all nonredundant inferences in at least one of the two limit points ($J$ or $J'$).

Locking Provers
With both AVATAR and labeled splitting, an enabled A-clause can be redundant locally, w.r.t. the current interpretation, and yet nonredundant globally. Both architectures provide mechanisms to temporarily lock away such A-clauses and unlock them when coming back to an interpretation where they are no longer locally redundant. In AVATAR, conditionally deleted A-clauses are stored in the locked set; in labeled splitting, they are stored in the split stack. We will refine the model-guided prover into a locking prover that captures these mechanisms.

The Transition Rules
The states of a locking derivation are triples $(J, N, L)$, where $J$ is a propositional interpretation, $N \subseteq AF$ is a set of A-formulas, and $L \subseteq \mathcal{P}_{\mathrm{fin}}(\mathbb{A}) \times AF$ is a set of pairs of finite assertion sets and A-formulas. Intuitively, $(B, C \leftarrow A) \in L$ means that $C \leftarrow A$ is "locally redundant" in all interpretations $J \supseteq B$. An auxiliary function erases the locks, keeping only the locked A-formulas: … The locking prover is defined by two transition rules: … The Lift rule performs an $\Rightarrow_{MG}$-transition and unlocks any A-formulas that are no longer locally redundant. The Lock rule can be used to lock A-formulas that are locally redundant.
Proof Every Lift transition clearly corresponds to an $\Rightarrow_{MG}$-transition. Every Lock transition corresponds to a Derive transition with $M = M' = \emptyset$.
Example 47 Let $J_0 = \{\neg v_0\}$ and $J_1 = \{v_0\}$. The following derivation based on first-order resolution illustrates the locking and unlocking of an A-clause: … Gray boxes indicate enabled unlocked clauses. We first put a lock on $p(a)$, because it is "locally subsumed" by $p(x) \leftarrow \{\neg v_0\}$ in $J_0$. Once we switch to $J_1$, the lock is released, and we can use $p(a)$ to conclude the refutation.
There are three things to note. First, if we had simply thrown away the clause p(a) instead of locking it, we would have lost refutability. Second, it would have been advantageous not to lock p(a) at all and to use it immediately to derive ⊥; however, it is not difficult to come up with examples where locking actually helps, which is why AVATAR includes this mechanism.
Third, although the derivation shows only "local subsumption," it could easily be changed to perform "local simplification"-e.g., demodulation from an equation s ≈ t ← A.
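To make the bookkeeping of the locking prover concrete, here is a minimal Python sketch of a state $(J, N, L)$ with a Lock operation and the unlocking that happens when Lift switches models. It rests on assumptions: the string encoding of assertions and the class and method names are hypothetical, the local-redundancy check is left to the caller, and only the model-switching part of Lift is modeled, not the underlying $\Rightarrow_{MG}$ inference steps.

```python
from dataclasses import dataclass, field

# Hypothetical encoding: assertion literals are strings such as "v0" or "~v0";
# an interpretation is a frozenset of such literals; an A-formula is a pair
# (clause_text, frozenset_of_assertions).

@dataclass
class LockingState:
    J: frozenset                            # current propositional interpretation
    N: set = field(default_factory=set)     # A-formulas that are not locked
    L: set = field(default_factory=set)     # locks: pairs (B, A-formula)

    def lock(self, B, af):
        """Lock af under the finite assertion set B.

        The caller must check that af is locally redundant in every
        interpretation containing B; here we only check that B holds in J.
        """
        assert B <= self.J and af in self.N
        self.N.discard(af)
        self.L.add((B, af))

    def switch(self, new_J):
        """Model-switching part of Lift: move to new_J, then release every
        lock whose assertion set B no longer holds."""
        self.J = frozenset(new_J)
        released = {(B, af) for (B, af) in self.L if not B <= self.J}
        self.L -= released
        self.N |= {af for (_, af) in released}
```

Replaying Example 47: $p(a)$ is locked under $\{\neg v_0\}$ while $J_0$ is current and released as soon as the prover switches to $J_1 = \{v_0\}$.

```python
p_a = ("p(a)", frozenset())
p_x = ("p(x)", frozenset({"~v0"}))
s = LockingState(J=frozenset({"~v0"}), N={p_a, p_x})
s.lock(frozenset({"~v0"}), p_a)   # p(a) is locally subsumed by p(x) <- {~v0}
s.switch({"v0"})                  # the lock no longer applies in J_1
```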

Counterexamples
Locking can cause incompleteness, because an A-formula can be locally redundant at every point in the derivation and yet not be so at any limit point, thereby breaking local saturation. For example, if we have derived $p(x) \leftarrow \{\neg v_k\}$ for every $k$, then $p(c)$ is locally redundant in any interpretation $J$ that contains $\neg v_k$ for some $k$. If the sequence of interpretations is given by …, the clause $p(c)$ would always be locally redundant and never be considered for inferences. Yet $p(c)$ might not be locally redundant at the unique limit point $J = V$.
Example 48 Consider the inconsistent initial clause set … and ordered resolution with selection as the calculus. Assume that the selection function always chooses the maximal negative literals w…, and let $v_i$ and $w_i$ be false in the initial model for all $i$. Following an age-based selection heuristic and maximal splitting, the second clause the prover derives is $\neg s(a) \lor \neg r(a, y) \lor q(y)$, which it splits into $\neg s(a) \leftarrow \{w_0\}$ and $\neg r(a, y) \lor q(y) \leftarrow \{v_0\}$. The $s$ predicate's role is purely to ensure that this clause is split and that the assertion $v_0$ is introduced. The prover later also derives $q(x) \lor p(x)$ and $q(y) \leftarrow \{v_0\}$ and switches to a model in which $v_i$ is true if and only if $i = 0$. The first of the two clauses is clearly locally redundant, so Lock applies, and $(\{v_0\}, q(x) \lor p(x))$ is added to $L$.
Next, $q(y) \leftarrow \{v_1\}$ is derived, before $q(y) \leftarrow \{v_0\}$ is selected for inferences. Eventually, that latter clause can be used together with $\neg q(c)$ to derive $\bot \leftarrow \{v_0\}$. The prover then switches to a new model in which $v_i$ is true if and only if $i = 1$. The clause $q(x) \lor p(x)$ can immediately be relocked. This process can be repeated indefinitely. The clause $q(x) \lor p(x)$, which is necessary for a refutation (together with $\neg p(c)$ and $\neg q(c)$), is ignored because it is always locally redundant: it is locked whenever the prover selects an A-clause for inferences, each time because of a different A-clause. But it is not locally redundant at the limit point $J = \neg V$.
In the derivation in Example 48, locking is not applied exhaustively: the A-clause $\neg r(f^i(a), y) \lor q(y) \leftarrow \{v_i\}$ is not locked, even though $q(y) \leftarrow \{v_j\}$ has already been derived. This situation is unrealistic and would not happen in Vampire. We could hope that forbidding such anomalous scenarios is enough for completeness. However, this is not the case, as we can see from a more complicated example:

Example 49
The calculus is ordered resolution with selection using the precedence $p \prec q_1 \prec q_2 \prec r_1 \prec r_2 \prec s \prec t_1 \prec t_2 \prec u$, selecting nothing if the clause is of the form $\neg u(\ldots) \lor u(\ldots)$ and otherwise selecting the maximal negative literals.
The initial clauses are as follows. First, we have a splittable clause $q_1(x) \lor q_2(y)$. Then we have the clauses $u(a, y)$ and $\neg u(x, y) \lor u(f(x), y)$. We will use the predicate symbol $u$ to delay the selection of a clause in an age-based selection heuristic, by adding a literal $\neg u(f^j(x), y)$ to the clause. Moreover, we have the clause $s(x, y)$. We can prevent splitting by adding the literal $\neg s(x, y)$ to a clause. Finally, we add the following clauses: …

The initial clause set is clearly inconsistent. Yet we will sketch an infinite derivation that corresponds to an age-based selection heuristic and that does not derive $\bot$. First, we split $q_1(x) \lor q_2(y)$ into $q_1(x) \leftarrow \{x_1\}$ and $q_2(x) \leftarrow \{x_2\}$, where the assertion denotations are as follows: … The derivation uses the following sequence of interpretations $J_i$: … The derivation thus alternates between two families of interpretations, with even and odd indices, giving rise to two limit points.

After the clause $q_1(x) \lor q_2(y)$ is split, the prover is in the model $J_0$ and derives the clauses $\neg u(y, x) \lor r_1(x) \lor \neg q_1(x)$, $\neg u(y, x) \lor r_2(x) \lor \neg q_2(x)$, and $\neg s(a, y) \lor r_1(y) \lor \neg p(a)$. The last clause is split into $\neg s(a, y) \lor r_1(y) \leftarrow \{w_0\}$, $\neg p(a) \leftarrow \{v_0\}$, and $\bot \leftarrow \{\neg v_0, \neg w_0\}$. Then an analogous split happens with $r_2$ instead of $r_1$. After a few more inferences, we derive $r_1(x) \lor \neg q_1(x)$ and then $r_1(y) \leftarrow \{w_0\}$, which makes $r_1(x) \lor \neg q_1(x)$ locally redundant (and analogously for $r_2$ in place of $r_1$). Once $r_1(y) \leftarrow \{w_0\}$ is picked as the given clause, the prover derives $\bot \leftarrow \{w_0\}$ and switches to the next model $J_1$.
In the first family, $J_{2i}$, the clause $\neg q_1(x) \lor r_1(x)$ is always locally redundant due to $r_1(x) \leftarrow \{w_{2i}\}$ and is locked with the assertion $w_{2i}$. Similarly, $\neg q_2(x) \lor r_2(x)$ is locally redundant in the second family, $J_{2i+1}$, with the assertion $w_{2i+1}$. In both cases, we can already lock each of the clauses while the prover is still in the previous model ($J_{2i-1}$ and $J_{2i}$, respectively).
The clause $\neg q_2(x) \lor r_2(x)$ is thus only ever unlocked in interpretations $J_{2i}$. In those interpretations, $q_2(x) \leftarrow \{x_2\}$ is disabled, and hence no inferences can be performed with $\neg q_2(x) \lor r_2(x)$. The same holds mutatis mutandis for $\neg q_1(x) \lor r_1(x)$, which is unlocked only when no inferences can be performed with it. As a result, the derivation never performs inferences with $q_1(x) \leftarrow \{x_1\}$ or $q_2(x) \leftarrow \{x_2\}$. Removing these A-clauses makes the set satisfiable; thus, by soundness, the derivation cannot contain $\bot$.
Given the right sequence of propositional interpretations returned by the SAT solver, the derivation in Example 49 could potentially happen in a prover such as Vampire. It is difficult to rule out that the SAT solver used by Vampire produces this sequence, or more generally to characterize the sequence of models produced by SAT solvers in such a concrete way. This derivation is also strongly fair: every inference that is possible infinitely often, perhaps intermittently, is eventually made redundant. Thus, strong fairness is not a sufficient criterion for completeness either.

Fairness
Our solution to the issues encountered above is as follows. Let … To achieve fairness, we now consider $N_\infty$, the A-formulas persistent in the subsequence $(N_j)_j$. By contrast, with no A-formulas locked away, fairness of $\Rightarrow_{MG}$-derivations could use $N_\infty$.
… for infinitely many indices $i$, and there exists a subsequence $(J_j)_j$ converging to a limit point $J$ such that FInf(…

The condition on the sets $L_j$ ensures that inferences from A-formulas that are locked infinitely often, but not infinitely often with the same lock, are redundant at the limit point. In particular, if we know that each A-formula is locked at most finitely often, then $\limsup_{j\to\infty} L_j = L_\infty$ and the inclusion in the definition above simplifies to …

Fairness of an $\Rightarrow_L$-derivation implies fairness of the corresponding $\Rightarrow_{MG}$-derivation.

Proof We already showed that $(J_i, N_i, L_i)_i$ is an $\Rightarrow_{MG}$-derivation in Lemma 46. It remains to show fairness. If the $\Rightarrow_L$-derivation is fair by case (1) of Definition 50, we apply case (1) of Definition 36 to show that the $\Rightarrow_{MG}$-derivation is fair. Otherwise, from case (2) of Definition 50, we retrieve a limit point $J$. We will show, for that limit point, case (2) of Definition 36: … By the definition of fairness of $\Rightarrow_L$-derivations, if all of $\iota$'s premises belong to $(N_\infty)_J \cup ((\limsup_{j\to\infty} L_j)_J \setminus (L_\infty)_J)$, then $\iota$ is redundant. Otherwise, we have that (b) one of $\iota$'s premises, $C$, is not in that set; that is, $C \notin (N_\infty)_J$ and either $C \notin$ … , $C \leftarrow A$ cannot be persistent in $(N_j)_j$ and hence must occur infinitely often in the sequence $(L_j)_j$. Thus $C \in (\limsup_{j\to\infty} L_j)_J$ and therefore $C \in (L_\infty)_J$ by (b).
Hence $(B, C \leftarrow A') \in L_\infty$ for some $A' \subseteq J$ and $B$. If $(B, C \leftarrow A') \in L_j$ for some $j$, then necessarily $B \subseteq J_j$ due to the side conditions of the $\Rightarrow_L$-transitions. Since this is true for infinitely many indices $j$, we also have $B \subseteq J_\infty = J$, and thus $C \in FRed((N_i)_J)$ for some $i$ by the side condition of the Lock transition. Therefore, by reducedness of FRed, the inference $\iota$ is redundant.
Proof By Theorems 28 and 39.
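The fairness conditions above contrast A-formulas that are persistent ($N_\infty$, $L_\infty$) with those that occur infinitely often ($\limsup$). For sequences of sets whose tail repeats with a known period, both limits are easy to compute, as in the Python sketch below. The periodicity is our simplifying assumption; derivations in general need not be eventually periodic.

```python
from functools import reduce

def liminf_limsup(period):
    """lim inf and lim sup of a sequence of sets whose tail repeats `period`.

    Any finite prefix before the periodic tail is irrelevant: an element is
    persistent (in the lim inf) iff it occurs in every set of the period, and
    it occurs infinitely often (in the lim sup) iff it occurs in at least one
    set of the period.
    """
    if not period:
        raise ValueError("period must be nonempty")
    sets = [set(s) for s in period]
    return reduce(set.intersection, sets), reduce(set.union, sets)
```

For instance, if an A-formula `"C"` is locked at every step but alternately under two different (hypothetical) locks `"B1"` and `"B2"`, then no lock pair is persistent, whereas `"C"` itself is persistent once the locks are erased:

```python
locks = [{("B1", "C")}, {("B2", "C")}]
print(liminf_limsup(locks))                                 # no persistent lock pair
print(liminf_limsup([{af for _, af in s} for s in locks]))  # "C" persists
```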

AVATAR-Based Provers
AVATAR was unveiled in 2014 by Voronkov [28], although it was reportedly present in Vampire as early as 2012. Since then, he and his colleagues have studied many options and extensions [4,22]. At least two reimplementations exist: in Ebner's super tactic for Lean [13] and in the Drodi prover by Oscar Contreras. Here we attempt to capture AVATAR's essence.
We will define an abstract AVATAR-based prover that extends the locking prover $\Rightarrow_L$ with a given clause procedure [19, Sect. 2.3]. A-formulas are moved in turn from the passive to the active set, where inferences are performed. The heuristic for choosing the next given A-formula to move is guided by timestamps indicating when the A-formulas were derived, to ensure fairness.

The Transition Rules
Let $TAF = AF \times \mathbb{N}$ be the set of timestamped A-formulas. (We will often omit the adjective "timestamped.") Note that we use a new set of calligraphic letters (e.g., $\mathcal{C}$, $\mathcal{N}$) to range over timestamped A-formulas and timestamped A-formula sets. Given a subset $\mathcal{N} \subseteq TAF$, erasing its timestamps yields $\{C \mid (C, t) \in \mathcal{N}$ for some $t\}$, and we overload existing notations to erase timestamps as necessary; for instance, $\mathcal{N}_\bot$ and $\mathcal{N}_J$ are computed on this erasure. We say that $\mathcal{N}$ is enabled in $J$ if and only if its erasure is enabled in $J$. Likewise, for TAF-inferences $\iota = (\mathcal{C}_1, \ldots, \mathcal{C}_n, \mathcal{D})$, we erase the timestamps of all premises and of the conclusion.
Using the saturation framework [29, Sect. 3], we lift a calculus (SInf, SRed) on AF to a calculus (TSInf, TSRed) on TAF with the tiebreaker order $<$ on timestamps. The tiebreaker is used to strengthen redundancy: if the same A-formula appears with two different timestamps, the more recent version is considered redundant. Note that TSRed is in general not reduced. Traditionally, provers use the active or passive status as the tiebreaker: an active clause may subsume a passive copy of itself, but not the other way around. Timestamps are more fine-grained.
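The effect of the timestamp tiebreaker can be seen on its simplest instance: among copies of the same A-formula, every copy except the oldest is redundant. The Python sketch below keeps only the oldest copy; A-formulas are hypothetical strings, and the sketch covers only this duplicate case, not the full redundancy criterion TSRed.

```python
def keep_oldest(tafs):
    """Drop later copies of the same A-formula.

    tafs: iterable of (a_formula, timestamp) pairs, where A-formulas are
    hypothetical hashable values (e.g., strings). For each A-formula, only
    the copy with the smallest timestamp is kept; the later copies are
    redundant by the timestamp tiebreaker.
    """
    oldest = {}
    for af, t in tafs:
        if af not in oldest or t < oldest[af]:
            oldest[af] = t
    return set(oldest.items())
```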
Lemma 53 Let $N \subseteq AF$, $C \in AF$, and $t, k \in \mathbb{N}$. Then: …

Proof This follows directly from the definition in Waldmann et al. [29].
A state is a tuple $(J, A, P, Q, L)$ consisting of a propositional interpretation $J$, a set $A \subseteq TAF$ of enabled nonpropositional active A-formulas, a set $P \subseteq TAF$ of enabled nonpropositional passive A-formulas, a set $Q \subseteq TAF$ of A-formulas that are either disabled in $J$ or propositional clauses, and a set $L \subseteq \mathcal{P}_{\mathrm{fin}}(\mathbb{A}) \times TAF$ of locked A-formulas, such that (1) $A_\bot = P_\bot = \emptyset$; (2) $A \cup P$ is enabled in $J$; … Whenever we write a tuple $(J, A, P, Q, L)$, we assume that it satisfies all these invariants.
The input formulas are first put in the passive set P. Once an A-formula is selected for inferences and all inferences with it and the active A-formulas have been made redundant, it is moved to the active set A. Inferences such as Split produce disabled and propositional clauses, which are put into Q. When switching to a new model, the prover moves the newly enabled A-formulas from Q to P and the newly disabled A-formulas from A and P to Q to preserve the state invariant.
The division of nonactive A-formulas into the sets $P$ and $Q$ is done for notational convenience; for example, $P$ is a separate set because fairness will be stated in terms of $A$ and $P$. In a practical implementation, this division would likely be different. The set $Q$ would typically be distributed over two data structures: the propositional clauses in $Q_\bot$ are directly passed to the SAT solver and need not be stored by the prover itself. The remaining A-formulas $Q \setminus Q_\bot$ are those that need to be moved back into $P$ when the prover switches to an interpretation that enables them. These might be stored in the same data structure as the set of locked A-formulas $L$, which also need to be reactivated depending on the interpretation. This is what Vampire does. Alternatively, they could be stored in the same data structure as $P$, with the prover checking on every access whether an A-formula is enabled in the current interpretation.
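The redistribution of A-formulas when the prover switches models can be sketched in Python as follows. This is an illustration under assumptions: the function name is hypothetical, the predicates `enabled` and `is_propositional` are supplied by the caller, and locked A-formulas as well as the SAT solver call are left out.

```python
def switch_model(J_new, A, P, Q, enabled, is_propositional):
    """Redistribute A-formulas among A, P, and Q for the interpretation J_new.

    enabled(af, J) and is_propositional(af) are caller-supplied predicates
    (hypothetical helpers). Returns the new triple (A, P, Q).
    """
    # Active and passive A-formulas that J_new disables move to Q.
    A_kept = {af for af in A if enabled(af, J_new)}
    P_kept = {af for af in P if enabled(af, J_new)}
    newly_disabled = (A - A_kept) | (P - P_kept)
    # Nonpropositional A-formulas in Q that J_new enables become passive;
    # propositional clauses stay in Q (they are handed to the SAT solver).
    woken = {af for af in Q
             if not is_propositional(af) and enabled(af, J_new)}
    return A_kept, P_kept | woken, (Q - woken) | newly_disabled
```

In the usage below, A-formulas are (clause, assertions) pairs that are enabled when their assertions hold in the interpretation. Switching from a model with $v_0$ to one with $v_1$ moves the active $p(x) \leftarrow \{v_0\}$ to Q and makes $q(a) \leftarrow \{v_1\}$ passive, while the propositional clause stays in Q.

```python
enabled = lambda af, J: af[1] <= J
is_prop = lambda af: af[0] == "bot"
px = ("p(x)", frozenset({"v0"}))
qa = ("q(a)", frozenset({"v1"}))
split = ("bot", frozenset({"~v0", "~v1"}))
A, P, Q = switch_model(frozenset({"~v0", "v1"}), {px}, set(), {qa, split},
                       enabled, is_prop)
```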
The AVATAR-based prover AV is defined by the following transitions: … There is also a LockP rule that is identical to LockA except that it starts in the state $(J, A, P \uplus \{(C \leftarrow A, t)\}, Q, L)$. An AV-derivation is well timestamped if every A-formula introduced by a rule is assigned a unique timestamp. In practice, a prover would ensure well-timestampedness by assigning timestamps monotonically, but this is not necessary for the fairness and completeness proofs.
Proof The transitions map directly to the corresponding transitions in $\Rightarrow_L$; both Infer and Process map to a Lift of Derive.

Proof The invariant is preserved by each transition.
Example 56 Let us redo the $\Rightarrow_{MG}$-derivation of Example 30 using $\Rightarrow_{AV}$. For readability, we emphasize in gray the A-clauses that appear or move between state components, and we omit all timestamps. One possible derivation is …

Counterexamples
In contrast with nonsplitting provers, for AV, fairness w.r.t. formulas does not imply fairness w.r.t. inferences. To ensure fairness in a nonsplitting prover, it suffices to select the oldest formula for inferences infinitely often; for example, provers can alternate between choosing the oldest and choosing the heuristically best formula. In splitting provers, such a strategy is incomplete, and we need an even stronger fairness criterion.
A problematic scenario involves two premises C, D of a binary inference ι and four transitions repeated forever, with other steps interleaved: Infer makes C active; Switch disables it; Infer makes D active; Switch disables it. Even though C and D are selected in a strongly fair fashion, ι is never performed.
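The scenario can be replayed with a small Python simulation. As a simplification of the Infer rule, premises are strings and the binary inference counts as performed only when both premises are active at the same time. Both premises are selected in every round, yet the inference never takes place.

```python
def run(rounds):
    """Replay the scenario: C and D take turns in the active set.

    Each round: Infer makes C active, Switch disables it, Infer makes D
    active, Switch disables it. Returns how often each premise was selected
    and whether the binary inference was ever performed.
    """
    active = set()
    selected = {"C": 0, "D": 0}
    performed = False
    for _ in range(rounds):
        for premise in ("C", "D"):
            active.add(premise)           # Infer: the premise becomes active
            selected[premise] += 1
            if {"C", "D"} <= active:      # both premises active: do the inference
                performed = True
            active.discard(premise)       # Switch: the new model disables it
    return selected, performed
```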
Example 58 More concretely, make two copies of the clause set … Let the models alternate between the $x_1$ and $x_2$ branches, making as few variables true as possible. Because the models alternate between the two branches, $\neg p(x) \lor p(f(x)) \lor q(x) \leftarrow \{x_i\}$ will always be the oldest passive A-clause after switching to a new model. Assume that the prover chooses this clause for inferences based on age. If we are allowed to interleave age-based selection with heuristic selection, we can cause the prover to switch models after selecting at most two additional A-clauses for inferences: if an A-clause $q(f^j(a)) \leftarrow \{w_j\}$ is enabled, we heuristically select both that A-clause and $\neg q(x) \leftarrow \{x_i\}$. Otherwise, an A-clause of the form $p(f^j(a)) \leftarrow \{v_j\}$ is enabled. Assume that $j$ is maximal among such clauses, and thus that $v_{j+1}$ is false in the model. We heuristically select that clause for inferences, deriving … With this strategy, the prover will never select $\neg p(x) \lor q(f(x)) \leftarrow \{x_i\}$ for inferences, since there is always an older clause to choose first. Consequently, it will never derive $\bot$.
In Example 58, the prover did not derive $\bot$ because it never performed an inference between $p(a) \leftarrow \{x_1\}$ and $\neg p(x) \lor q(f(x)) \leftarrow \{x_1\}$ (and analogously for $x_2$), even though both A-clauses are enabled infinitely often. Forbidding this situation does not guarantee completeness either: as Example 49 showed, there exist strongly fair derivations that do not derive $\bot$ from an inconsistent initial set.
We believe that this counterexample cannot arise with Vampire, because Vampire alternates between age-based and weight-based selection using a fixed ratio (the "age-weight ratio" or "pick-given ratio"). In contrast, our example requires a highly unrestricted heuristic selection, where we choose young, large A-clauses such as $p(f^n(a)) \leftarrow \{v_n\}$ even though smaller, older ones are enabled.
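An age-weight ratio of this kind can be sketched with two priority queues. The following Python code is a hypothetical illustration, not Vampire's implementation: every (ratio + 1)-th selection takes the oldest passive formula and the others take the lightest one, and entries already selected through the other queue are skipped lazily. As discussed above, selecting the oldest formula infinitely often does not by itself make a splitting prover fair; the sketch only shows that old, heavy formulas are not postponed indefinitely.

```python
import heapq
import itertools

class GivenClauseQueue:
    """Passive set with an age queue and a weight queue (hypothetical sketch)."""

    def __init__(self, ratio):
        self.ratio = ratio            # weight-based picks per age-based pick
        self.by_age = []              # entries (timestamp, formula)
        self.by_weight = []           # entries (weight, timestamp, formula)
        self.removed = set()          # timestamps already selected
        self.clock = itertools.count()
        self.picks = 0

    def add(self, formula, weight):
        t = next(self.clock)          # unique, increasing timestamp
        heapq.heappush(self.by_age, (t, formula))
        heapq.heappush(self.by_weight, (weight, t, formula))

    def _pop(self, heap):
        # Lazy deletion: skip entries already selected via the other queue.
        while heap:
            entry = heapq.heappop(heap)
            t = entry[-2]
            if t not in self.removed:
                self.removed.add(t)
                return entry[-1]
        return None

    def select(self):
        use_age = self.picks % (self.ratio + 1) == 0
        self.picks += 1
        return self._pop(self.by_age if use_age else self.by_weight)
```

With a ratio of 2, a heavy formula added early is selected on the next age-based pick, ahead of the lighter but younger formula `"d"`:

```python
q = GivenClauseQueue(ratio=2)
for name, weight in [("a", 5), ("heavy", 100), ("b", 1), ("c", 2), ("d", 3)]:
    q.add(name, weight)
order = [q.select() for _ in range(5)]   # "a", "b", "c", "heavy", "d"
```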
Unrelated to completeness, we might expect that under a reasonable strategy an $\Rightarrow_{AV}$-derivation saturates every limit point. This is, however, not the case either, even with strict age-based selection:

Example 59 Take the following consistent A-clause set: … Assume ordered resolution as the base calculus with the precedence $q \prec p$. Thus, the prover will not resolve $\neg q(x) \leftarrow \{x\}$ with $\neg p(x) \lor p(f(x)) \lor q(f(x)) \leftarrow \{x\}$. We will sketch a derivation with two limit points, $J \models x$ and $J' \models \neg x$, where $J'$ will not be locally saturated. Let $\mathit{fml}(v_i) = p(f^i(a))$ and $\mathit{fml}(w_i) = q(f^i(a))$. We define the sequence of models $(J_i)_i$ such that … The prover now starts in the model $J_0$ and processes the formulas in the order we listed them at the beginning of the example. The first new formula it derives is $p(f(a)) \lor q(f(a)) \leftarrow \{x\}$, which it splits into $p(f(a)) \leftarrow \{v_1\}$, $q(f(a)) \leftarrow \{w_1\}$, and $\bot \leftarrow \{x, \neg v_1, \neg w_1\}$. The last propositional clause is not satisfied in $J_0$, so the prover switches to a new interpretation.
After switching to the next model $J_1 = J_{2 \cdot 0 + 1}$, the two formulas $p(f(a)) \leftarrow \{v_1\}$ and $q(f(a)) \leftarrow \{w_1\}$ remain in the active set. The prover then chooses the oldest enabled passive formula, $\neg q(x) \leftarrow \{\neg x\}$, for inferences, thereby deriving the propositional clause $\bot \leftarrow \{\neg x, w_1\}$, which is not satisfied in $J_1$.
This process is then repeated infinitely often. In the model $J_{2i}$, the prover derives the three new formulas … The subsequence $(J_{2i+1})_i$ converges to a limit point; call it $J'$. The formulas enabled at $J'$ are not saturated: $p(a)$ and $\neg p(x)$ are enabled, but $\bot$ is not.

Fairness
… for infinitely many indices $i$, and there exists a subsequence $(J_j)_j$ converging to a limit point $J$ such that (3) $\liminf_{j\to\infty} TSInf(A_j, P_j) = \emptyset$ and (4) …

Condition (3) ensures that all inferences involving passive A-formulas are redundant at the limit point. It would not suffice to simply require $P_\infty = \emptyset$ because A-formulas can move back and forth between the sets $A$, $P$, and $Q$, as we saw in Example 58. Condition (4) is similar to the condition on locks in Definition 50.
Proof We trivially have $L_0 = \emptyset$. Furthermore, if $\bot \in \bigcup_i Q_i$, we clearly have $\Rightarrow_L$-fairness. It remains to show subcase (B)(2) of Definition 50, using the corresponding subsequence as used for $\Rightarrow_{AV}$-fairness. So let ♣_L = (lim sup_{j→∞} L_j)_J \ lim sup_{j→∞} L_j J and ♣_AV = (lim sup_{j→∞} L_j)_J \ L_∞ J be the terms from the corresponding fairness conditions.
First we show …, that is, every enabled formula in the subsequence is either persistent or redundant on the base level. So let $(C \leftarrow A, t) \in P_j$. We prove the statement by induction on $(A, t)$ w.r.t. the lexicographic order. If C ∈ ♣_AV or C ∈ lim sup_{j→∞} L_j J, we are done. Otherwise, C ∉ L_∞ J (because L_∞ J ⊆ lim sup_{j→∞} L_j J) and hence C ∉ (lim sup_{j→∞} L_j)_J (because C ∉ ♣_AV). So since $(C \leftarrow A, t)$ is not locked infinitely often, there exists an index after which it is never locked again, which means that it is either persistent in $(A_j \cup P_j)_j$ (and we are done) or deleted in Process and thus $(C \leftarrow A, t)$ … In case (a), we are done. In cases (b) and (c), we apply the induction hypothesis.
Assuming the restriction on locking we already required for $\Rightarrow_L$-derivations, the $\Rightarrow_{AV}$ relation is concrete enough to allow us to show that typical clause selection strategies are fair and avoid the counterexamples from Sects. 5.2 and 6.2. Many selection strategies are combinations of basic strategies, such as choosing the smallest formula by weight or the oldest by age. We capture such strategies using selection orders $\prec$. Intuitively, $C \prec D$ if the prover will select $C$ before $D$ whenever both are present. That is, the prover always chooses one of the $\prec$-minimal A-formulas. We use two selection orders: $\prec_{TAF}$, on timestamped A-formulas, must be followed infinitely often; $\prec_F$, on base formulas, must be followed otherwise.
Definition 63 Let $X$ be a set. A selection order on $X$ is an irreflexive and transitive relation $\prec$ such that $\{y \mid x \not\prec y\}$ is finite for every $x \in X$.
Example 64 Let $X \subseteq TAF$ be such that $X$ contains only finitely many A-formulas with the same timestamp. Define $\prec_{age}$ on $X$ so that $(C, t) \prec_{age} (C', t')$ if and only if $t < t'$. Then $\prec_{age}$ is a selection order corresponding to age-based selection.

Remark 65
Every selection order is a well-founded relation, but not every well-founded relation is a selection order. A well order is a selection order if and only if its order type is less than $\omega + 1$. The ordinal $\omega + 1 = \{0 < 1 < 2 < \cdots < \omega\}$ is not a selection order since $\{y \mid \omega \not< y\}$, which is all of $\omega + 1$, is infinite. Even well-founded relations of low rank need not be selection orders: the empty relation $\emptyset \subseteq \mathbb{N} \times \mathbb{N}$ is irreflexive, transitive, and well founded (with rank zero) but not a selection order.
Selection orders on TAF also generalize the mechanism, outlined by Bachmair and Ganzinger in a footnote [2, Sect. 4.3] and elaborated by Schlichtkrull et al. [27, Sect. 4], of using an $\mathbb{N}$-valued weight function that is strictly monotone in the timestamp.
Example 66 Let F be the set of first-order clauses in a fixed signature. Define the selection order $\prec_{nv}$ on F by $C \prec_{nv} C'$ if and only if $|C| < |C'|$, where $|C|$ denotes the total number of nonvariable positions in $C$. Then $\prec_{nv}$ is a selection order because there are only finitely many first-order clauses with at most $n$ nonvariable positions for any $n$. This selection order corresponds to a simple weight-based selection scheme.
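The weight $|C|$ of Example 66 can be computed with a short recursion. In this Python sketch, terms and atoms are hypothetical tuples whose first component is the symbol, variables are plain strings, and literals are (polarity, atom) pairs; counting only positions inside atoms, not negation signs, is our assumption.

```python
def nonvar_positions(term):
    """Count the nonvariable positions of a term or atom.

    Terms are hypothetical tuples (symbol, arg1, ..., argn); variables are
    plain strings; constants are 1-tuples such as ("a",).
    """
    if isinstance(term, str):      # a variable contributes no nonvariable position
        return 0
    return 1 + sum(nonvar_positions(arg) for arg in term[1:])

def clause_size(clause):
    """|C|: sum over the atoms of the literals; a literal is (is_positive, atom)."""
    return sum(nonvar_positions(atom) for _, atom in clause)

def nv_less(c1, c2):
    """The selection order of Example 66: c1 comes before c2 iff |c1| < |c2|."""
    return clause_size(c1) < clause_size(c2)
```

For example, $p(f(x), a) \lor \neg q(y)$ has weight $3 + 1 = 4$, and $p(x)$ has weight 1, so $p(x)$ is selected first.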
The intersection of two orders $\prec_1$ and $\prec_2$ corresponds to the nondeterministic alternation between them: the prover may choose either a $\prec_1$-minimal or a $\prec_2$-minimal A-formula, at its discretion.
Lemma 67 Let $\prec_1$ and $\prec_2$ be selection orders on $X$. Then $\prec_1 \cap \prec_2$ is a selection order as well.
Proof Irreflexivity and transitivity are preserved by intersections, and $\{y \mid \text{not } (x \prec_1 y \text{ and } x \prec_2 y)\} = \{y \mid x \not\prec_1 y\} \cup \{y \mid x \not\prec_2 y\}$ is finite as a union of two finite sets.
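The freedom granted by intersecting two selection orders can be checked on a finite candidate set. In the Python sketch below, passive formulas are hypothetical (name, weight, timestamp) triples. Every $\prec_{age}$-minimal and every $\prec_{weight}$-minimal element is also minimal for the intersection, which is why the prover may follow either order.

```python
def minimal(order, xs):
    """Elements of xs that no other element of xs precedes under `order`."""
    return [x for x in xs if not any(order(y, x) for y in xs if y != x)]

# Hypothetical passive formulas: (name, weight, timestamp).
def by_age(a, b):
    return a[2] < b[2]

def by_weight(a, b):
    return a[1] < b[1]

def by_both(a, b):
    """Intersection of the two orders: a precedes b in both."""
    return by_age(a, b) and by_weight(a, b)
```

For the candidates below, the intersection admits the oldest formula, the lightest formula, and also `"mid"`, which is neither.

```python
xs = [("old_heavy", 9, 0), ("new_light", 1, 5), ("mid", 5, 3)]
minimal(by_both, xs)   # all three candidates
```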

Lemma 68 Let $\prec$ be a selection order on an infinite set $X$. Then for all elements $x$ and $y$, there exists an element $z$ such that $x \prec z$ and $y \prec z$.
Proof The set $X \setminus \{z \mid x \prec z \text{ and } y \prec z\} = \{z \mid x \not\prec z\} \cup \{z \mid y \not\prec z\}$ is finite, and therefore $\{z \mid x \prec z \text{ and } y \prec z\}$ is infinite and in particular nonempty.
To ensure completeness of the given clause procedure, we must restrict the inferences that the prover may perform; otherwise, it could derive infinitely many A-formulas with different assertions, causing it to switch between two branches of the split tree without making progress, as in Example 58. Given $N \subseteq AF$, we also consider the set $\{A \mid C \leftarrow A \in N$ for some $C\}$ of assertion sets occurring in $N$. For the completeness of the given clause procedure (Lemma 77), we will fix a strongly finitary function $I$ restricting the inferences: the prover may perform an inference only if its conclusion is in $I(A_i)$, where $A_i$ is the active clause set after the Infer transition. This restriction will allow us to rule out the case where $\bigcup_i A_i$ is finite and the prover switches models without making progress. Condition (1) in Definition 69 then says that the prover infers only finitely many F-formulas; this will in turn ensure that splitting creates only finitely many new assertions. Condition (2) says that the inferred A-formulas contain only finitely many new assertions. Taken together, only finitely many assertions are added in the case where $\bigcup_i A_i$ is finite, which means that the prover can switch models only finitely often, a contradiction.
Simplification rules used by the prover must be restricted even more to ensure completeness, because they can lead to new splits and assertions, and hence to switching to new models. For example, simplifying $p(x * 0) \lor p(x)$ to $p(0) \lor p(x)$ transforms a nonsplittable clause into a splittable one. Even for the standard orders on first-order clauses, there can be infinitely many clauses that are smaller than a given clause. For example, with the lexicographic path order, the set $\{u' \mid u' \prec u\}$ is typically infinite for a term $u$. If simplifications were to produce infinitely many new splittable clauses, the prover might split clauses and switch propositional interpretations forever without making progress.

Example 70
Even if $\prec$ is a well-founded order on F, and $I$ is a set of binary inferences such that $C_2 \prec C_1$ and $D \prec C_1$ for every inference $(C_2, C_1, D) \in I$, simplification with $I$ can still produce infinitely many base formulas. This is because in an AV prover, the same base formula may be rederived infinitely often (for example, due to switching between two families of interpretations).
As a slightly abstract example, consider $F = \mathbb{N} \cup \{\infty\}$ with $0 \prec 1 \prec 2 \prec \cdots \prec \infty$, and let $I = \{(n, \infty, n + 1) \mid n \in \mathbb{N}\}$. If the prover then rederives $\infty$ infinitely often, it might simplify $\infty$ using $I$ in a different way each time: the first time to 1, then to 2, and so on. We hence need to ensure that, in the entire derivation, each formula is simplified in at most a finite number of ways.
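Example 70 can be simulated directly. In the Python sketch below, we count rederivations of $\infty$ and record the result $k + 1$ of simplifying the $k$th copy with the inference $(k, \infty, k + 1)$. Without a bound, the set of results grows with the number of rederivations; restricting the results to a fixed finite set, in the spirit of a simplification bound (the parameter `bound` is our illustration), keeps it finite.

```python
def simplify_rederived(times, bound=None):
    """Simulate Example 70: infinity is rederived `times` times.

    The k-th copy (k = 0, 1, ...) is simplified to k + 1 via the inference
    (k, infinity, k + 1) from I. If `bound` is given, it stands for a finite
    set of permitted simplification results, and any other simplification
    is not performed. Returns the set of distinct results.
    """
    produced = set()
    for k in range(times):
        result = k + 1            # conclusion of (k, infinity, k + 1)
        if bound is None or result in bound:
            produced.add(result)
    return produced
```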

Definition 71
Let $\prec$ be a transitive well-founded relation on F, and let $\preceq$ be its reflexive closure. A function $S : AF \to \mathcal{P}(AF)$ is a strongly finitary simplification bound for $\prec$ if $N \mapsto \bigcup_{C \in N} S(C)$ is strongly finitary and $C' \preceq C$ for every $C' \in S(C)$.
The prover may simplify an A-formula $C$ to $C'$ only if $C' \in S(C)$. It may also delete $C$. Strongly finitary simplification bounds are closed under unions, allowing the combination of simplification techniques based on $\prec$. For superposition, a natural choice for $\prec$ is the clause order. Analogously to strongly finitary functions, we define the extension of strongly finitary simplification bounds to sets of formulas as $S_F(N) = S(N \times \mathcal{P}_{\mathrm{fin}}(\mathbb{A}))$. The key property of strongly finitary simplification bounds is that if we saturate a finite set of A-formulas w.r.t. simplifications, the saturation is also finite. This is crucial to bound the number of A-formulas derived by the prover and thus the number of possible model switches: if the prover selects only a finite set of A-formulas for inferences, then simplification will derive only finitely many A-formulas as well, no matter how often an A-formula is derived again:

Lemma 72 Let $S$ be a strongly finitary simplification bound. For every $C \in AF$, let $S^*(C) = \bigcup_{i \in \mathbb{N}} S^i(C)$, where $S^i$ denotes the $i$th iterate of $S$. Then $S^*$ is also a strongly finitary simplification bound.
Proof Clearly, $C' \preceq C$ for every $C' \in S^*(C)$. Let $N \subseteq F$ be finite. Next we show that $S^*_F(N)$ is finite as well. Define a sequence of finite sets $M_i \subseteq F$ by $M_0 = N$ and … Because $S$ is a strongly finitary simplification bound, $M_{i+1} \setminus M_i$ is always finite, and decreasing in the multiset extension of $\prec$. It is even strictly decreasing as long as … By assumption, we have such a function $B$ for $S$. Then set $B'(C) = B(S^*_F(C))$ (which is finite for all $C$), and we have $S^{i+1}$ …

Example 73 Let F be the set of first-order clauses and $S(C \leftarrow A) = \{C' \leftarrow A' \mid C'$ is a subclause of $C$ and $A \subseteq A'\}$. Then $S$ is a strongly finitary simplification bound, because (1) $C' \preceq C$ if $C'$ is a subclause of $C$ and (2) each clause has only finitely many subclauses. This $S$ covers many simplification techniques, including elimination of duplicate literals (simplifying $C \lor L \lor L$ to $C \lor L$), deletion of resolved literals (simplifying $C \lor u \not\approx u$ to $C$), and subsumption resolution (simplifying $C\sigma \lor D\sigma \lor L\sigma$ to $C\sigma \lor D\sigma$ given the side premise $C \lor \neg L$). Removing redundant clauses is possible with every $S$.
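For the subclause bound of Example 73, the finiteness argument behind Lemma 72 can be mirrored by a worklist saturation. The Python sketch below works on base clauses only (assertions are ignored), represents clauses as sorted tuples of hypothetical literal strings, and computes the closure under taking subclauses; it terminates because each step produces finitely many clauses that are no larger.

```python
from itertools import combinations

def subclauses(clause):
    """All subclauses (sub-multisets) of a clause, as sorted tuples.

    A clause is a sorted tuple of literal strings; duplicates are kept,
    e.g., ("p", "p") stands for the clause p ∨ p.
    """
    return {sub for k in range(len(clause) + 1)
            for sub in combinations(clause, k)}

def closure(step, start):
    """Saturate a finite set under `step` with a worklist (a finite S*)."""
    seen = set(start)
    todo = list(start)
    while todo:
        clause = todo.pop()
        for simpler in step(clause):
            if simpler not in seen:
                seen.add(simpler)
                todo.append(simpler)
    return seen
```

A clause with three distinct literals has eight subclauses (including the empty clause), and the closure of $p \lor p$ contains $p$, which corresponds to duplicate literal elimination.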

Example 74
If the Knuth–Bendix order [18] is used as the term order and all weights are positive, then S(C ← A) = {C′ ← A′ | C′ ≺ C and A′ ⊆ A} is a strongly finitary simplification bound. This can be used to cover demodulation.
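The finiteness behind Example 74 can be illustrated on ground terms. The Knuth–Bendix order compares weights first, so a ground term can only be greater than terms of smaller or equal weight; if all weights are positive and, as we assume here for illustration, the signature is finite, there are only finitely many ground terms below any weight bound. The sketch below counts them for the hypothetical signature {a/0, f/1, g/2} with unit weights; nonground terms and the variable condition of the order are ignored.

```python
from functools import lru_cache

# Hypothetical finite signature: symbol -> (arity, weight); all weights positive.
SIGNATURE = {"a": (0, 1), "f": (1, 1), "g": (2, 1)}


@lru_cache(maxsize=None)
def terms_of_weight(w: int) -> tuple:
    """All ground terms over SIGNATURE whose total weight is exactly w."""
    result = []
    for sym, (arity, weight) in SIGNATURE.items():
        rest = w - weight
        if rest < 0 or (arity == 0 and rest != 0):
            continue
        for args in _arg_lists(arity, rest):
            result.append((sym,) + args)
    return tuple(result)


def _arg_lists(arity: int, w: int):
    """Yield tuples of `arity` ground terms whose weights sum to exactly w."""
    if arity == 0:
        if w == 0:
            yield ()
        return
    # Every argument has weight >= 1 because all symbol weights are positive.
    for first_w in range(1, w + 1):
        for first in terms_of_weight(first_w):
            for rest in _arg_lists(arity - 1, w - first_w):
                yield (first,) + rest


def count_terms_up_to(bound: int) -> int:
    """Number of ground terms of weight <= bound; finite for every bound."""
    return sum(len(terms_of_weight(w)) for w in range(1, bound + 1))


print([count_terms_up_to(n) for n in range(1, 5)])  # [1, 2, 4, 8]
```

With a weight-0 unary symbol, the same enumeration would no longer terminate for a fixed weight, which is why positive weights matter.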

Example 75
The simplification rules Collect, Trim, and StrongUnsat from Sect. 3.1 are all strongly finitary simplification bounds. In a practical implementation, Split will deterministically split C ∈ F into C 1 , . . . , C n and use the same assertions a i ∈ asn(C i ) every time. Under these conditions, Split is also a strongly finitary simplification bound.
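For other term orders, the S in Example 74 need not be strongly finitary: With the lexicographic path order and a precedence f ≻ g, for instance, f(a) is greater than each of the infinitely many terms g(a), g(g(a)), g(g(g(a))), and so on. Proving that demodulation is a strongly finitary simplification bound is then much more involved. In this case, the necessary strongly finitary simplification bound even depends on the derivation.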

Example 76
If unit equations are only removed by demodulation, reflexivity deletion, or subsumption, the one-step demodulations possible at any point in the derivation are a strongly finitary simplification bound. By Lemma 72, this implies that many-step demodulation is also a strongly finitary simplification bound.
Assume that demodulation is performed in a postorder traversal (i.e., subterms first), always rewriting using the oldest available equation. Also assume that if l ≈ r is an existing (ordered) equation and the prover derives l ≈ r′, then l ≈ r′ is not used for demodulation (but is, for example, instead simplified to r ≈ r′).
We will show that for every term t, there exist only finitely many terms t′ that are simplified from t in one step. The term t′ will typically be different over the course of the derivation. However, we can assign a decreasing well-founded measure to the rewrite step, ensuring finiteness. Consider a demodulation step transforming C[lσ] to C[rσ] using an equation l ≈ r, with rσ ≺ lσ. Let i be the index of lσ in C[lσ] in a postorder traversal, let |l| be the number of nonvariable positions in l, and let u = 1 if the equation is unorientable (l ⊁ r) and u = 0 otherwise. Then the tuple (i, |l|, u, rσ) decreases or stays the same in the left-to-right lexicographic order as we move along the derivation.
If the prover derives a new ordered equation l′ ≈ r′, it is possible that it applies at an earlier position in C, thus decreasing i. Otherwise, it applies at the same position as l ≈ r previously, and the prover rewrites using the older l ≈ r first, keeping the tuple unchanged. If the equation l ≈ r is subsumed by l′ ≈ r′ and deleted, then |l′| < |l|. (Note that if l ≈ r is subsumed by l′ ≈ r′, then r and r′ are identical because all variables that occur in r, r′ also occur in l.) If l ≈ r is simplified to t ≈ r by l′ ≈ r′ and l′ ≠ l, then |l′| < |l|. If l ≈ r is simplified to t ≈ r by l′ ≈ r′ and l′ ≈ r′ is unorientable, then |l′| = |l| and u decreases. If l ≈ r is simplified to l ≈ t by l′ ≈ r′, then |l| stays the same, u might decrease, and rσ ≻ tσ.
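The demodulation strategy assumed in Example 76 (postorder traversal, oldest equation first) can be sketched as follows. To keep the sketch short, we assume ground terms and ground, already oriented unit equations, so that matching degenerates to syntactic equality; the tuple representation of terms and the name `demodulate` are our own. A real prover would match nonground left-hand sides and use term indexing.

```python
def demodulate(t: tuple, equations: list) -> tuple:
    """Normalize the ground term t with ground oriented equations (lhs, rhs).

    Terms are tuples (symbol, arg1, ..., argn); a constant a is ("a",).
    `equations` is ordered from oldest to newest, and every equation is
    assumed to be oriented (lhs greater than rhs in a reduction order),
    which guarantees termination.
    """
    # Postorder: normalize the arguments (subterms) first.
    t = (t[0],) + tuple(demodulate(arg, equations) for arg in t[1:])
    # Then rewrite at the root with the oldest applicable equation, if any,
    # and normalize the result again.
    for lhs, rhs in equations:
        if t == lhs:
            return demodulate(rhs, equations)
    return t


a, b, c = ("a",), ("b",), ("c",)
equations = [(("f", a), b), (b, c)]  # f(a) ≈ b is older than b ≈ c
print(demodulate(("g", ("f", a), b), equations))  # ('g', ('c',), ('c',))
```

If two equations share a left-hand side, the loop picks the older one, mirroring the assumption that a newly derived l ≈ r′ is not used while l ≈ r exists.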
Based on the above definitions, we introduce a fairness criterion that is more concrete and easier to apply than the definition of fairness of ⇒ AV -derivations.

Lemma 77 A ⇒AV-derivation (J_i, A_i, P_i, Q_i, L_i)_i is fair if all of the following conditions hold:
Proof If ⊥ ∈ ⋃_i Q_i, the derivation is trivially fair. Otherwise, the StrongUnsat transition never occurs, and therefore Switch is eventually applied if the propositional clauses are not satisfied by the interpretation. Hence J_i ⊨ Q_i for infinitely many i, thus satisfying condition (C)(2) of Definition 36. Conditions (A) and (B) are satisfied due to condition (2) of this lemma, and (C)(4) due to (9). It remains to construct a subsequence (J_j, A_j, P_j, Q_j, L_j)_j such that (J_j)_j converges to a limit point and lim inf_{j→∞} TSInf(A_j, P_j) = ∅, as required for (C)(3).

Case 1: The set of ⊑_F-minimal A-formulas selected in an Infer transition for some state j is unbounded in ⊑_F. That is, for every C ∈ F, there is an Infer transition from some state j such that the selected A-formula S_j is ⊑_F-minimal in P_j and C ⊑_F S_j. These Infer transitions clearly form an infinite subsequence. By Lemma 35, we can further refine it into a subsequence (J_j, A_j, P_j, Q_j, L_j)_j, where (J_j)_j converges to a limit point. Assume towards a contradiction that ι ∈ lim inf_{j→∞} TSInf(A_j, P_j). By Lemma 68, for every C ∈ prems(ι) there exists an index j such that C ⊑_F S_j. Therefore prems(ι) ⊆ A_j by the ⊑_F-minimality requirement on the Infer transition, a contradiction.
Case 2: The set of ⊑_TAF-minimal A-formulas selected in an Infer transition for some state j is unbounded in ⊑_TAF. This case is analogous to case 1.
Case 3: Neither case 1 nor case 2 applies. Then the set of ⊑_F-minimal formulas selected in Infer transitions is bounded and hence finite, since ⊑_F is a selection order. Similarly, the set of ⊑_TAF-minimal TAF-formulas selected in Infer transitions is finite as well. Let T be the set of A-formulas selected in an Infer transition. So T and therefore ⋃_i A_i are both finite. The set S* is then finite, and therefore ⋃_i (A_i ∪ P_i ∪ Q_i ∪ L_i) is finite as well. Since both S* and I are strongly finitary, only a finite number of new assertions are introduced, and ⋃_i (A_i ∪ P_i ∪ Q_i ∪ L_i) is also finite. Thus (⋃_i Q_i)_⊥ is also finite, and only a finite number of Switch transitions can occur. Hence there exists an index N such that no Switch transitions occur at states i > N.
Now take the whole derivation as the subsequence. We have P_∞ = ∅ because there are infinitely many states j with an Infer transition such that a ⊑_TAF-minimal A-formula S_j is selected. After the initial N steps, every A-formula is selected only once (that is, S_i ≠ S_j if i ≠ j), because once it has been selected, it can only be removed from the active set if it becomes redundant or locked. (There are no Switch transitions.) In either case, the A-formula is kept out of the passive set for the rest of the derivation. Newly derived A-formulas have a different timestamp due to the well-timestampedness requirement. Therefore, once an A-formula is in the active set, it will not come back into the passive set, and we have lim inf_{i→∞} TSInf(A_i, P_i) = ∅.
Recall the abstract counterexample from Sect. 6.2 in which the A-formulas C and D were selected and disabled in turn. Intuitively, selection orders, together with the restrictions on the inferences, ensure that the prover will follow roughly the same steps whenever it is in a model that enables C and D. Since there are only finitely many formulas that it can select for inferences before C or D, the prover will eventually repeat itself and thus make progress.
We could refine AV further and use Lemma 77 to show the completeness of an imperative procedure such as Voronkov's extended Otter loop [28, Fig. 3], thus showing that AVATAR as implemented in Vampire is complete if locking is sufficiently restricted. A slight complication is that in Vampire's AVATAR, A-clauses C ← {[C]} are generated on a per-need basis when switching models. This is not a serious issue, because we can imagine that the A-clauses were there all along in the Q set.
Even the concrete criterion offered by Lemma 77 refers, in its condition 9, to limit superiors and limit points. Some architectures will satisfy it by their very design. For AVATAR, an easy way to meet the condition is to bound the number of times each A-formula can be locked. Once that number has been reached, the A-formula can no longer be locked. An alternative, suggested by a reviewer, is to disable all splitting after the prover has run for a specified time.
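The bounded-locking restriction just described can be realized with a per-formula counter. The class below is a hypothetical sketch of our own, not code from Vampire or any other prover: `may_lock` tells whether an A-formula may still be locked, and once `max_locks` locks have been recorded, the A-formula can never be locked again, so each A-formula is locked only finitely often.

```python
from collections import defaultdict


class LockBudget:
    """Bounds how many times each A-formula may be locked (hypothetical helper)."""

    def __init__(self, max_locks: int) -> None:
        self.max_locks = max_locks
        # A-formula identifier -> number of times it has been locked so far.
        self.times_locked = defaultdict(int)

    def may_lock(self, formula_id) -> bool:
        return self.times_locked[formula_id] < self.max_locks

    def record_lock(self, formula_id) -> None:
        if not self.may_lock(formula_id):
            raise ValueError(f"lock budget of {formula_id!r} is exhausted")
        self.times_locked[formula_id] += 1


budget = LockBudget(max_locks=2)
for _ in range(3):
    if budget.may_lock("C"):
        budget.record_lock("C")
print(budget.times_locked["C"], budget.may_lock("C"))  # 2 False
```

Where the prover would otherwise lock an exhausted A-formula, it simply leaves it unlocked, which only loses some efficiency.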

Application to Other Architectures
AVATAR may be the most natural application of our framework, but it is not the only one. We will complete the picture below by studying splitting without backtracking, labeled splitting, and SMT with quantifiers.

Splitting Without Backtracking
Before the invention of AVATAR, Riazanov and Voronkov [25] had already experimented with splitting in Vampire in a lighter variant without backtracking. They based their work on ordered resolution O with selection [2], but the same ideas work with superposition. Weidenbach [31, end of Sect. 4.5] independently outlined the same technique at about the same time.
The basic idea of splitting without backtracking is to extend the signature Σ with a countable set P of nullary predicate symbols disjoint from Σ and to augment the base calculus with a binary splitting rule that replaces a clause C ∨ D with C ∨ p and D ∨ ¬p, where C and D share no variables and p ∈ P. Riazanov and Voronkov require that the precedence ≺ makes all P-literals smaller than the Σ-literals. Binary splitting then qualifies as a simplification rule. They show that their rule and a few variants are consistency-preserving. They do not show refutational completeness, but this is obvious since the rule is a simplification.
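The side condition that C and D share no variables suggests a simple way to find a split: group the literals into the connected components of the relation "shares a variable with", then put one component on one side and the rest on the other, introducing a fresh p ∈ P. The sketch below assumes our own representation (a literal is a pair of its text and its variables) and a hypothetical naming scheme `sp0`, `sp1`, ... for the fresh nullary predicates; it does not model the precedence requirement on P-literals.

```python
from itertools import count

_fresh = count()  # source of fresh nullary predicate names from P (hypothetical naming)


def components(clause):
    """Partition a clause into maximal subclauses that pairwise share no variables.

    A literal is a pair (text, variables); ground literals form singleton parts.
    """
    parts = []  # list of (literals, variable set); pairwise variable-disjoint
    for lit in clause:
        lits, vs = [lit], set(lit[1])
        rest = []
        for part_lits, part_vars in parts:
            if part_vars & vs:  # shares a variable: merge into the new part
                lits = part_lits + lits
                vs |= part_vars
            else:
                rest.append((part_lits, part_vars))
        parts = rest + [(lits, vs)]
    return [lits for lits, _ in parts]


def binary_split(clause):
    """Replace C ∨ D by C ∨ p and D ∨ ¬p for a fresh p, or return None."""
    parts = components(clause)
    if len(parts) < 2:
        return None  # not splittable: all literals are linked through variables
    c_part = parts[0]
    d_part = [lit for part in parts[1:] for lit in part]
    p = f"sp{next(_fresh)}"
    return c_part + [(p, ())], d_part + [("¬" + p, ())]


# p(x) ∨ q(y, b) ∨ r(x, z) splits into q(y, b) ∨ sp0 and p(x) ∨ r(x, z) ∨ ¬sp0.
example = [("p(x)", ("x",)), ("q(y, b)", ("y",)), ("r(x, z)", ("x", "z"))]
left, right = binary_split(example)
```

Splitting off a single component per step, as done here, can be iterated on the D side to obtain a finer decomposition.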
Riazanov and Voronkov also extend the selection function of the base calculus to support P-literals. They present two such extensions: The blocking function allows for the selection of P-literals in clauses that contain Σ-literals, whereas the parallel function selects only maximal P-literals in pure P-clauses and otherwise imitates the original selection function. Parallel selection cleanly separates the P- and the Σ-literals. Bachmair and Ganzinger proved O statically complete, and this obviously extends to ordered resolution with this extension, which we denote by O_P, since it is an instance of the same calculus.
The calculus O_P is closely related to an instance of our framework. Let F be the set of Σ-clauses, with the empty clause as ⊥. Let O = (FInf, FRed), where FInf is the set of ordered resolution inferences on F with some selection function and FRed is the standard redundancy criterion [2, Sect. 4.2.2], and similarly O_P = (FPInf, FPRed). We use the notion of entailment from Example 1 for the base relations ⊨ and ⊨≈ for both calculi. We take V = P for defining AF. The properties (D1)–(D6) and (R1)–(R7) are verified for ⊨ and FRed, respectively. This gives us a splitting calculus LA = (SInf, SRed), whose name stands for lightweight AVATAR. Lightweight AVATAR amounts to the splitting architecture Cruanes implemented in Zipperposition and confusingly called "AVATAR" [9, Sect. 2.5]. Binary splitting can be realized in LA as the following simplification rule: with the side conditions that a ∈ asn(C) and C ∨ D is splittable into C, D. By Theorem 21, LA is complete.
Like splitting without backtracking but unlike the real AVATAR, Cruanes's architecture is not guided by a propositional model. It is essentially an instance of LA, except that it is based on superposition instead of ordered resolution. It performs branch-specific simplifications (a special case of subsumption demodulation [17]), which is supported by our locking mechanism. A SAT solver is used to detect propositional unsatisfiability (corresponding to our Unsat rule) and to eliminate assertions that are implied at the SAT solver's top level (corresponding to our Trim rule).
The calculi O_P and LA are very similar but not identical. O_P has a slightly stronger notion of inference redundancy, because its order ≺ can access not only the Σ-literals but also the P-literals, whereas with LA the P-literals are invisible to the base calculus. To see this, consider the set consisting of the Σ_P-clauses q, b ∨ p ∨ ¬q, ¬a ∨ p ∨ ¬q, and a ∨ p ∨ ¬q, where P = {a, b}. Given the precedence a ≺ b ≺ p ≺ q, an ordered resolution inference is possible between the first two clauses, with b ∨ p as its conclusion. This inference is redundant according to FPRed_I, because the conclusion is entailed by the first, third, and fourth clauses taken together, all of which are ≺-smaller than the main premise b ∨ p ∨ ¬q. However, the corresponding AF-inference is not redundant according to SRed_I, because the assertions are simply truncated by the projection operator (·)_J and not compared. Without the assertions, the third and fourth clauses are equal to, but not smaller than, the main premise, and the inference is not redundant. Note that the set is not saturated: Inferences are possible to derive ¬a ∨ p and a ∨ p, which make b ∨ p redundant. Another dissimilarity is that LA can detect unsatisfiability immediately using a SAT solver, whereas splitting without backtracking generally needs many propositional resolution steps to achieve the same. Correspondingly, on satisfiable problems, LA allows smaller saturated sets. For example, while the A-clause set {⊥ ← {a, ¬b}, ⊥ ← {b}} is saturated, its O_P counterpart is subject to an inference between ¬a ∨ b and ¬b.
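The entailment underlying the redundancy claim can be checked mechanically. Because all symbols in this example are nullary, the clauses are propositional, and a brute-force check over all interpretations of a, b, p, q suffices; the clause encoding as sets of (atom, polarity) pairs is our own.

```python
from itertools import product

# Clauses are sets of (atom, polarity) literals over the nullary symbols a, b, p, q.
ATOMS = ("a", "b", "p", "q")


def holds(clause, model):
    """A clause is true in a model if some literal agrees with the model."""
    return any(model[atom] == positive for atom, positive in clause)


def entails(premises, conclusion):
    """Propositional entailment, checked over all 2**len(ATOMS) interpretations."""
    for values in product((False, True), repeat=len(ATOMS)):
        model = dict(zip(ATOMS, values))
        if all(holds(c, model) for c in premises) and not holds(conclusion, model):
            return False
    return True


q = {("q", True)}
a_p_notq = {("a", True), ("p", True), ("q", False)}
nota_p_notq = {("a", False), ("p", True), ("q", False)}
b_p = {("b", True), ("p", True)}

print(entails([q, nota_p_notq, a_p_notq], b_p))  # True
print(entails([nota_p_notq, a_p_notq], b_p))  # False
```

The second call shows that the first clause q is genuinely needed: without it, the interpretation making a, b, p, q all false satisfies the remaining premises but not b ∨ p.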
As positive results, we will show that O_P and LA share the same notion of entailment and that O_P's redundancy criterion is stronger than LA's, yet saturation w.r.t. LA guarantees saturation w.r.t. O_P, up to the natural correspondence between A-clauses and Σ_P-clauses. More precisely, a Σ_P-clause can be written as C ∨ L_1 ∨ ··· ∨ L_n, where C is a Σ-clause and the L_i's are P-literals. Let α be a bijective mapping such that α(C ∨ L_1 ∨ ··· ∨ L_n) = C ← {¬L_1, ..., ¬L_n} is the corresponding A-clause. We overload the operator ⌊·⌋ to erase the P-literals: ⌊C ∨ L_1 ∨ ··· ∨ L_n⌋ = ⌊C ← {¬L_1, ..., ¬L_n}⌋ = C. Moreover, let G denote the function that returns all ground instances of a clause, clause set, or inference according to Σ, which is assumed to contain at least one constant. For the forward direction, we must show that (α(M))_J ⊨ ⌊N⌋ for some J in which α(N) is enabled. Let K be a Σ-model of (α(M))_J. We will show that at least one clause in ⌊N⌋ is true in K. We start by showing that K ∪ J is a Σ_P-model of M. Let C ∈ M. If α(C) is enabled in J, then ⌊C⌋ ∈ (α(M))_J. Thus K ⊨ {⌊C⌋} and finally K ∪ J ⊨ {C}. Otherwise, α(C) contains an assertion that is false in J, which means that C contains the complementary P-literal, which is true in J, and we have K ∪ J ⊨ {C}. Either way, K ∪ J ⊨ M and hence, since M ⊨ N, one of the clauses in N is true in K ∪ J. Since α(N) is enabled in J, all P-literals occurring in N are false in K ∪ J. Therefore, that clause must contain a Σ-literal that is true in K, which means that the corresponding clause in ⌊N⌋ is also true in K.
For the backward direction, we must show that M ⊨ N. Let K ∪ J be a Σ_P-model of M, where K is a Σ-interpretation and J is a P-interpretation. We will show that a clause in N is true in K ∪ J. If α(N) is disabled in J, there exists a P-literal in some clause from N that is true in K ∪ J, which suffices to make the entire clause true. Otherwise, α(N) is enabled in J, and then (α(M))_J ⊨ ⌊N⌋. Since K ∪ J ⊨ M, we have K ⊨ (α(M))_J. Hence, one of the clauses in ⌊N⌋ is true in K, and its counterpart in N is also true in K ∪ J.
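The correspondence α, its inverse, the erasure operator ⌊·⌋, and the enabledness test used in the proof above are easy to state executably. In the following sketch, literals are (atom, polarity) pairs, the P-atoms are given as a set, and a P-interpretation J is a dict from P-atoms to truth values; these representations and the function names are our own illustrative choices.

```python
def alpha(clause, p_atoms):
    """Map a Σ_P-clause C ∨ L1 ∨ ... ∨ Ln to the A-clause C ← {¬L1, ..., ¬Ln}."""
    sigma_part = frozenset(lit for lit in clause if lit[0] not in p_atoms)
    assertions = frozenset(
        (atom, not positive) for atom, positive in clause if atom in p_atoms
    )
    return sigma_part, assertions


def alpha_inverse(a_clause):
    """Recover the Σ_P-clause by turning each assertion back into a P-literal."""
    sigma_part, assertions = a_clause
    return sigma_part | {(atom, not positive) for atom, positive in assertions}


def erase(clause, p_atoms):
    """The operator ⌊·⌋: drop the P-literals of a Σ_P-clause."""
    return alpha(clause, p_atoms)[0]


def enabled(a_clause, j):
    """An A-clause is enabled in J if all of its assertions are true in J."""
    _, assertions = a_clause
    return all(j[atom] == positive for atom, positive in assertions)


P_ATOMS = {"a", "b"}
clause = {("b", True), ("p", True), ("q", False)}  # b ∨ p ∨ ¬q
print(alpha(clause, P_ATOMS))  # p ∨ ¬q ← {¬b}
```

As in the proof, the A-clause for b ∨ p ∨ ¬q is enabled exactly in the P-interpretations that make b false, which are the ones where the Σ_P-clause reduces to its Σ-part.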
Proof Let ι = (C_n, ..., C_1, C_0) and assume α(ι) is a Base inference. By the definition of ordered resolution, none of the Σ-clauses ⌊C_i⌋, for i ∈ {1, ..., n}, can be ⊥. Thus, the selected literals in the premises coincide with those chosen by the parallel selection function on the Σ_P-clauses C_i, and so ι ∈ FPInf.
Case Base: We need to show that {ι} J ⊆ FRed I ((α (G (N))) J ) for every propositional interpretations J. The case where ι is disabled in J is trivial. Otherwise, let θ be a substitution such that ιθ ∈ G (ι). We must show that { C n θ , . . . , Because the premises' assertions are contained in the conclusion's, this is equivalent to showing that {α(C n θ), . . . , α(C 2 θ)}∪E | {α(C 0 θ)}. By Lemma 79, there exists an inference (C n , . . . , C 1 , C 0 ) ∈ FPInf. Since N is saturated, the inference is redundant-i.e., {C n θ, . . . , (N) and C ≺ C 1 θ }. If α(F ) ⊆ E , we can invoke Lemma 78 to conclude. However, in the general case, we have only that α(F \ F eq ) ⊆ E , where F eq = {C ∈ F | C = C 1 θ }, and thus there might be models of E that are models of F \ F eq but not of F eq . Fortunately, we can show that {C n θ, . . . , C 2 θ } ∪ (F \F eq ) | {C 0 θ }. We proceed by removing from F each clause D ∈ F eq in turn and by showing that the entailment is preserved by each step. Finally, we invoke Lemma 78. A slight complication is that F eq may be infinite. However, by compactness, only a finite subset F eq ⊆ F eq is needed to have the desired entailment.
Let D ∈ N be a clause that generalizes the ground, ≺-largest clause in F_eq. Then there exists an inference (C_n, ..., C_2, D, D_0) ∈ FPInf such that C_0θ ∈ G(D_0) and the P-literals of D_0 are the union of those of C_n, ..., C_2, D. By renaming the variables in D and D_0, we can ensure that Dθ = C_1θ and D_0θ = C_0θ. Now, to prove the desired entailment, assume that J is a model of {C_nθ, . . . , Since we are proceeding from the largest to the smallest clause, we have {C ∈ G(N) | C ≺ Dθ} ⊆ F \ {Dθ}, even if some clauses have been removed from F already. Thus, in both cases, J ⊨ {D_0θ}. If J makes a Σ-literal of D_0θ true, J makes the same literal in C_0θ true. Otherwise, either J makes one of the P-literals of C_nθ, ..., C_2θ true, satisfying C_0θ for the same reason, or it makes one of the P-literals of D true, and then J ⊨ {C_nθ, ..., C_2θ} ∪ F, which as noted above implies J ⊨ {C_0θ} by the saturation of N. In both cases, J ⊨ {C_0θ}.
Case Unsat: The inference derives ⊥ from a set of P-clauses (α( is saturated, we have ⊥ ∈ N and hence ⊥ ∈ α(N). Therefore, ι is redundant also in this case.

Labeled Splitting
Labeled splitting, as originally described by Fietzke and Weidenbach [15] and implemented in SPASS, is a first-order resolution-based calculus with binary splitting that traverses the split tree in a depth-first way, using an elaborate backtracking mechanism inspired by CDCL [20]. It works on pairs (Ψ, N), where Ψ is a stack storing the current state of the split tree and N is a set of labeled clauses, that is, clauses annotated with finite sets of natural numbers.
We model labeled splitting as an instance of the locking prover L based on the splitting calculus LS = (SInf, SRed) induced by the resolution calculus R = (FInf, FRed), where ⊨ and ⊨≈ are as in Example 1 and V = ⋃_{i∈ℕ} {l_i, r_i, s_i}. A-clauses are essentially the same as labeled clauses.
Splits are identified by unique split levels. Given a split on C ∨ D with level k, the propositional variables l k ∈ asn(C) and r k ∈ asn(D) represent the left and right branches, respectively. In practice, the prover would dynamically extend fml to ensure that fml(l k ) = C and fml(r k ) = D.
When splitting, if we simply added the propositional clause ⊥ ← {¬l_k, ¬r_k}, we would always need to consider either C ← {l_k} or D ← {r_k}, depending on the interpretation. However, labeled splitting can undo splits when backtracking, so that neither branch needs to be considered; with the clause above, fairness would still require us to perform inferences with either C or D, which Fietzke and Weidenbach avoid. We solve this as follows. Let ⊤ = ∼⊥. We introduce the propositional variable s_k ∈ asn(⊤) so that we can enable or disable the split as we wish. The StrongUnsat rule then knows that s_k is true and that the cases are exhaustive, but we can still switch to propositional models that disable both C and D. A-clauses are then split using the following binary variant of Split: where C and D share no variables and k is the next split level. Unlike AVATAR, labeled splitting keeps the premise and might split it again with another level. We rely on locking to ensure that the premise is not split within either branch.
To emulate the original, the locking prover based on the LS calculus must repeatedly apply the following three steps in any order until saturation: (1) If a left branch is closed, the model must be updated so as to disable the splits that were not used to close this branch and to enable the right branch. (2) If a right branch is closed, the split must be disabled, and the model must switch to the right branch of the closest enabled split above it with an enabled left branch. (3) If a right branch is closed but there is no split above with an enabled left branch, the entire tree has been visited. Then, a propositional clause ⊥ ← A with A ⊆ ⋃_i {s_i} is ⊨-entailed by the A-clause set, and StrongUnsat can finish the refutation by exploiting fml(s_i) = ⊤.
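The three backtracking steps can be sketched on an explicit stack of enabled splits. This is a simplified, hypothetical model of our own: a state is a list of (level, branch) pairs, outermost first, and a closed branch is described by the set of split levels whose variables occur in the derived conflict ⊥ ← A. It ignores the A-clauses themselves, locking, and the finer points of the original backtracking mechanism.

```python
def backtrack(stack, conflict_levels):
    """One backtracking step of the strategy above (simplified, hypothetical sketch).

    stack: list of (level, branch) pairs for the currently enabled splits,
    outermost first, where branch is "l" or "r".
    conflict_levels: levels whose branch variables occur in the derived
    propositional conflict ⊥ ← A.
    Returns the new stack, or None if the whole split tree has been visited.
    """
    k = max(conflict_levels)  # innermost split involved in the conflict
    if dict(stack)[k] == "l":
        # Step (1): disable the splits that were not used to close the branch
        # and move split k to its right branch.
        return [(lvl, "r" if lvl == k else br)
                for lvl, br in stack if lvl in conflict_levels]
    # Step (2): disable split k (and everything inside it), then switch the
    # closest enclosing split whose left branch is enabled to its right branch.
    idx = [lvl for lvl, _ in stack].index(k)
    for i in range(idx - 1, -1, -1):
        lvl, br = stack[i]
        if br == "l":
            return stack[:i] + [(lvl, "r")]
    # Step (3): no such split exists, so the entire tree has been visited.
    return None


# Assuming splits 0, 1, 2 are all on their left branches, as in the example below.
after_first = backtrack([(0, "l"), (1, "l"), (2, "l")], {0, 2})  # [(0, 'l'), (2, 'r')]
after_second = backtrack(after_first, {2})  # [(0, 'r')]
```

Under that assumption, the two calls mirror the two backtracking steps of Example 82: the first disables s_1 and moves from l_2 to r_2, and the second closes split 2 and moves to r_0.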
We illustrate the strategy on an example.

Example 82 Let
where the last three clauses listed in N are disabled and thus currently unusable for inferences.
The first backtracking step happens after a Base inference produces ⊥ ← {l 0 , l 2 } from p(x) ← {l 0 } and ¬p(x) ← {l 2 }. The Switch disables s 1 , because this split was not useful in closing the branch, and it moves from branch l 2 to r 2 . The new model disables ¬p(x) ← {l 2 }, enables q(y) ← {r 2 }, and unlocks r(x) ∨ s(y).
The second backtracking step happens after ⊥ ← {r 2 } is derived from ¬q(x) and q(y) ← {r 2 }. Since both branches of the split s 2 have now been closed, the Switch rule is invoked, producing the model (J 0 \{¬r 0 , ¬s 0 }) ∪ {r 0 , s 0 }. This unlocks ¬p(x) ∨ q(y), and now only q(y) ← {r 0 } is enabled in addition to the unlocked input clauses.
By following the strategy presented above, LS closely simulates the original calculus, in the sense that it is possible to add and remove (or at least disable) exactly the same elements to the A-clause set as is done in the original, and in the same order. A subtle, inconsequential difference lies in the backtracking: Labeled splitting can move to a branch where ⊥ is enabled, whereas our Switch rule requires that all propositional clauses are satisfied.
What about fairness? The above strategy helps achieve fairness by ensuring that there exists at most one limit point. It also uses locks in a well-behaved way. This means we can considerably simplify the notion of fairness for ⇒ L -derivations and obtain a criterion that is almost identical to, but slightly more liberal than, Fietzke and Weidenbach's, thereby re-proving the completeness of labeled splitting.
For terminating derivations, their fairness criterion coincides with ours: Both require that the final A-clause set is locally saturated and all propositional clauses are satisfied by the interpretation. For diverging derivations, Fietzke and Weidenbach construct a limit subsequence (Ψ_j, N_j)_j of the derivation (Ψ_i, N_i)_i and demand that every persistent inference in it be made redundant, exactly as we do for ⇒L-derivations. The subsequence consists of all states that lie on the split tree's unique infinite branch. Therefore, this subsequence converges to a limit point of the full derivation. Locks are well behaved, with lim sup_{j→∞} L_j = L_∞, because with the strategy above, once an A-clause is enabled on the rightmost branch, it remains enabled forever. Our definition of fairness allows more subsequences, although this is difficult to exploit without bringing in all the theoretical complexity of AVATAR.
Example 83 Alternating age-based and unrestricted heuristic selection is incomplete for labeled splitting just as it is for AVATAR (Example 58). To see why, start with the clause set {p(a, y), ¬p(x, y) ∨ p(s(x), y), q(a), r(x) ∨ q(y) ∨ ¬p(x, y), s(x) ∨ ¬q(x), ¬s(x)} and always select the negative literal if there is one. The prover begins by deriving p(s(a), y) and r(a) ∨ q(y) using the age-based heuristic. Then it heuristically selects r(a) ∨ q(y) and splits it. In the left branch, where q(y) is enabled, q(a) is locally redundant and locked. Before age-based selection allows the prover to derive ⊥ from the clauses s(x) ∨ ¬q(x), q(y), and ¬s(x), it will also have derived p(s(s(a)), y) and r(s(a)) ∨ q(y). When the prover switches back to the right branch, it can heuristically select the newly derived disjunction and split it.
This process can be repeated to give rise to infinitely many splittable clauses of the form r(s i (a)) ∨ q(y). In this way, no inferences are ever performed in the rightmost branch, only splits. The clause q(a), which is necessary for a refutation, is never selected for inferences; most of the time, it is even locally redundant.

SMT with Quantifiers
SMT solvers based on DPLL(T ) [20] combine a SAT solver with theory solvers, each responsible for reasoning about a specific quantifier-free theory (e.g., equality, linear integer arithmetic). In the classical setup, the theories are decidable, and the overall solver is a decision procedure for the union of the theories. Some SMT solvers, including cvc5 [3], veriT [8], and Z3 [10], also support quantified formulas via instantiation at the expense of decidability.
Complete instantiation strategies have been developed for various fragments of first-order logic [16,23,24]. In particular, enumerative quantifier instantiation [24] is complete under some conditions. An SMT solver following such a strategy ought to be refutationally complete, but this has never been proved. Although SMT is quite different from the architectures we have studied so far, we can instantiate our framework to show the completeness of an abstract SMT solver. The model-guided prover MG will provide a suitable starting point, since we will need neither L's locking mechanism nor AV's given clause procedure.
Let F be the set of first-order Σ-formulas with a distinguished falsehood ⊥. We represent the SMT solver's underlying SAT solver by the Unsat rule and complement it with an inference system FInf that clausifies formulas, detects inconsistencies up to theories excluding quantifiers, and instantiates quantifiers. For FRed, we take an arbitrary instance of the standard redundancy criterion [2, Sect. 4.2.2]. It can be used to split disjunctions destructively and to simplify formulas. We define the "theories with quantifiers" calculus TQ = (FInf, FRed). For the consequence relations ⊨ and ⊨≈, we use entailment in the supported theories including quantifiers.
Some theories, such as linear integer arithmetic, are not compact and thus cannot directly be used for the consequence relation. Instead, we define M ⊨_LIA N to be true if and only if there exist finite sets M′ ⊆ M and N′ ⊆ N such that ⋀M′ → ⋁N′ is valid modulo linear integer arithmetic. For finite sets, this relation coincides with noncompact entailment: If M is finite, then M ⊨_LIA {⊥} if and only if M is inconsistent modulo linear integer arithmetic. Both completeness and soundness of a concrete prover are statements about the finite set of input formulas, so using a compactified version of the consequence relation is purely a technicality and poses no restriction.
The clausification rules work on logical symbols outside quantifiers; among others, they derive C and D from a premise C ∧ D. The theory rules can derive ⊥ from some finite formula set N if N ⊨ {⊥}, ignoring quantifiers; this triggers a model switch. Finally, the instantiation rules derive formulas p(t) from premises ∀x. p(x), where t is some ground term; the instantiation strategy determines which ground terms must be tried and in which order. Much of the complexity hidden in FInf, such as purification and theory-specific data structures and algorithms, is treated as a black box.
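The instantiation rules leave the choice of ground terms to the instantiation strategy. As a minimal sketch of one naive strategy, much simpler than the enumerative instantiation cited above, the code below generates ground terms by increasing nesting depth, so that every ground term is eventually tried; the signature, the string representation of formulas, and the function names are hypothetical.

```python
from itertools import product


def ground_terms(signature, max_depth):
    """Ground terms over `signature` (symbol -> arity) up to nesting depth max_depth.

    Terms are produced depth by depth, which is a fair order: every ground
    term appears for a large enough max_depth. At least one constant is needed.
    """
    terms = [(c,) for c, arity in signature.items() if arity == 0]
    for _ in range(max_depth):
        new = [(f,) + args
               for f, arity in signature.items() if arity > 0
               for args in product(terms, repeat=arity)]
        terms = terms + [t for t in new if t not in terms]
    return terms


def show(t):
    """Render a term tuple such as ("f", ("a",)) as the string f(a)."""
    return t[0] if len(t) == 1 else f"{t[0]}({', '.join(show(a) for a in t[1:])})"


def instantiate(body, terms):
    """Instances of ∀x. body(x) for the given ground terms (body maps text to text)."""
    return [body(show(t)) for t in terms]


sig = {"a": 0, "f": 1}
print(instantiate(lambda t: f"p({t})", ground_terms(sig, 2)))
# ['p(a)', 'p(f(a))', 'p(f(f(a)))']
```

A practical solver would interleave these instances with clausification and theory reasoning instead of generating them up front.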
As with AVATAR, the initial problem is expressed using Σ-formulas. We use the same approximation function as in AVATAR to represent formulas as assertions (Example 8). Abusing terminology slightly, let us call an A-formula C ← A a subunit if C is not a disjunction. Whenever a (ground) disjunction C ∨ D ← A emerges, we immediately apply Split. This delegates clausal reasoning to the SAT solver. It then suffices to assume that TQ is complete for subunits.
Theorem 84 (Dynamic completeness) Assume TQ is statically complete for subunit sets. Let (J i , N i ) i be a fair ⇒ MG -derivation based on TQ. If N 0 | {⊥} and N ∞ contains only subunits, then ⊥ ∈ N j for some j.
Proof The proof is analogous to that of Theorem 28. Because we only have conditional static completeness of (FInf, FRed), we need the assumption that N ∞ contains only subunits.
Care must be taken to design a practical fair strategy. Like AVATAR-based provers, SMT solvers will typically not perform all SInf-inferences, not even up to SRed_I. Given a ≈ b ← {v_0}, b ≈ c ← {v_1}, a ≈ d ← {v_2}, c ≈ d ← {v_3}, and a ≉ c ← {v_4}, an SMT solver will find only one of the conflicts ⊥ ← {v_0, v_1, v_4} or ⊥ ← {v_2, v_3, v_4} but not both. This leaves us in a similar predicament as with locking: A theory conflict might be nonredundant at the limit point, even though it is redundant at every point of the derivation. The SMT solver just happened to choose the wrong conflict every time.
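Reading the last assumption as the disequality a ≉ c, both conflicts can be exhibited by brute force. The sketch below is our own illustration, not how SMT solvers search for conflicts: it decides ground equalities and disequalities between constants with a union-find structure and enumerates the subset-minimal inconsistent sets of assumptions. It finds exactly the two conflicts {v0, v1, v4} and {v2, v3, v4}, whereas a solver that stops at the first conflict reports only one of them.

```python
from itertools import combinations


def consistent(literals):
    """Check ground (dis)equations (lhs, rhs, is_equation) between constants."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for lhs, rhs, is_eq in literals:
        if is_eq:
            parent[find(lhs)] = find(rhs)  # merge the two equivalence classes
    return all(find(lhs) != find(rhs) for lhs, rhs, is_eq in literals if not is_eq)


def minimal_conflicts(assumptions):
    """All subset-minimal inconsistent sets of assumption names (brute force)."""
    found = []
    names = sorted(assumptions)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if any(set(c) <= set(subset) for c in found):
                continue  # already contains a smaller conflict
            if not consistent([assumptions[v] for v in subset]):
                found.append(subset)
    return [frozenset(c) for c in found]


assumptions = {
    "v0": ("a", "b", True), "v1": ("b", "c", True), "v2": ("a", "d", True),
    "v3": ("c", "d", True), "v4": ("a", "c", False),
}
print(minimal_conflicts(assumptions))
```

This exhaustive search is exponential in the number of assumptions; it only serves to show that the example has two independent conflicts.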

Example 85 Consider the initial clause set
{∀x (x ≤ 0 ∨ a > x), a ≈ 0, a + 3 < 2}. Eagerly applying quantifier instantiation, we get the instances Iterating this process, we see that all conflicts are of the form [a > i] for some i. However, at the limit point, where [a > i] is false for every i, none of these conflicts is enabled. The only conflict that exists at the limit point is between a ≈ 0 and a + 3 < 2, and the solver never finds it because it detects a different conflict first.
For decidable theories, a practical fair strategy is to first clausify and detect theory conflicts and to instantiate quantifiers only if no other rules are applicable. A case analysis similar to the one in the proof of Lemma 77 establishes fairness for this strategy.
First consider the case where quantifier instantiation is invoked infinitely often. Then there exists an infinite subsequence (J_j, N_j)_j of states such that (1) (J_j)_j converges to a limit point, and (2) no N_j has a theory conflict. To prove the ⇒MG-derivation fair, we need to show that ι ∈ FInf((N_∞)_J) implies ι ∈ FRed_I((N_i)_J) for every ι. If ι is a theory conflict or clausification inference, then its finitely many premises are in N_j for some j, contradicting the strategy. Otherwise, ι is a quantifier instantiation. Here, it suffices to ensure that A-formulas that are enabled infinitely often at a quantifier instantiation step are also fully instantiated. (Just as with AV provers, it is possible that not all limit points are saturated.) Now consider the case where quantifier instantiation is only invoked finitely often, either because every encountered model had a theory conflict or because there was nothing to instantiate. Here, it suffices to assume that clausification is a strongly finitary simplification bound (which means that a formula can only be clausified in a finite number of ways). Under this assumption, only finitely many base formulas will be derived, which implies that only a finite number of models will be considered. The last model will then be saturated due to the strategy.
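The rule priority of the strategy above can be sketched as a small scheduler. In this hypothetical sketch, each rule is a zero-argument function that either performs one step and returns a description of it or returns None if it is not applicable; the scheduler always tries clausification first, then theory conflict detection, and instantiates quantifiers only when neither applies. The toy rules in the usage part are our own and merely record strings.

```python
def run(clausify, find_conflict, instantiate, max_steps):
    """Apply rules by priority: clausification, then theory conflicts, and
    quantifier instantiation only when no other rule is applicable."""
    trace = []
    for _ in range(max_steps):
        for rule in (clausify, find_conflict, instantiate):
            step = rule()
            if step is not None:
                trace.append(step)
                break
        else:
            break  # no rule applies: the current state is saturated
    return trace


# Toy usage: each new instance must be clausified before the next instantiation.
pending = ["clausify A ∧ B"]
terms = ["a", "b"]


def clausify():
    return pending.pop(0) if pending else None


def find_conflict():
    return None  # no theory conflicts in this toy run


def instantiate():
    if not terms:
        return None
    t = terms.pop(0)
    pending.append(f"clausify q({t}) ∨ r({t})")  # the new instance needs clausification
    return f"instantiate ∀x. q(x) ∨ r(x) with {t}"


trace = run(clausify, find_conflict, instantiate, max_steps=10)
print(trace)
```

The resulting trace alternates between one instantiation and the clausification it triggers, which is the interleaving the fairness argument relies on.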
There is also the question of model soundness. If the SMT solver starts with the Σ-formula set N_0 and ends in a state (J_i, N_i) with J_i ⊨ (N_i)_⊥, we would like the solver to generate a model of (N_i)_{J_i}, from which a model of N_0 can be derived. This is possible if the solver performs only sound inferences and applies Approx systematically. Then (N_i)_{J_i} is fully exposed to the propositional level, and fml(J_i) is a theory model of (N_i)_{J_i} and therefore of N_0.
Our mathematization of AVATAR and SMT with quantifiers exposes their dissimilarities. With SMT, splitting is mandatory, and there is no subsumption or simplification, locking, or active and passive sets. And of course, theory inferences are n-ary and quantifier instantiation is unary, whereas superposition is binary. Nevertheless, their completeness follows from the same principles.

Conclusion
Our framework captures splitting calculi and provers in a general way, independently of the base calculus. Users can conveniently derive a dynamic refutational completeness result for a splitting prover based on a given statically refutationally complete calculus. As we developed the framework, we faced some tension between constraining the SAT solver's behavior and the saturation prover's. It seemed preferable to constrain the prover, because the prover is typically easier to modify than an off-the-shelf SAT solver. To our surprise, we discovered counterexamples related to locking, formula selection, and simplification, which may affect Vampire's AVATAR implementation, depending on the SAT solver and prover heuristics used. We proposed some restrictions, but alternatives could be investigated.
We found that labeled splitting can be seen as a variant of AVATAR where the SAT solver follows a strict strategy and propositional variables are not reused across branches. A benefit of the strict strategy is that locking preserves completeness. As for the relationship between AVATAR and SMT, there are some glaring differences, including that splitting is necessary to support disjunctions in SMT but fully optional in AVATAR. For future work, we could try to complete the picture by considering other related architectures [5-7, 11, 12].