Unprovability results for clause set cycles

The notion of clause set cycle abstracts a family of methods for automated inductive theorem proving based on the detection of cyclic dependencies between clause sets. By discerning the underlying logical features of clause set cycles, we are able to characterize clause set cycles by a logical theory. We make use of this characterization to provide practically relevant unprovability results for clause set cycles that exploit different logical features.


Introduction
Automated inductive theorem proving (AITP) is a subfield of automated theorem proving that aims at automating the process of finding proofs that involve mathematical induction. The most prominent application of automated inductive theorem proving is the formal verification of hardware and software. Another field of application is the formalization of mathematical statements, where AITP systems assist humans by discharging lemmas automatically, suggesting inductions [Nag19], or exploring the theory [JRSC14,VJ15].
Finding a proof by mathematical induction essentially amounts to finding suitable induction formulas [HW18]. This is a challenging task because induction formulas in general have a higher syntactic complexity than the formula one wants to prove. This phenomenon is commonly known as the non-analyticity of induction formulas and can, for example, manifest itself in the number of free variables as well as the number of quantifier alternations of the induction formula. Indeed, in the language of primitive recursive arithmetic there is a sequence of quantifier-free formulas whose proofs require induction formulas of unbounded quantifier complexity. We refer to [HW18] for a precise exposition of the non-analyticity phenomenon.
A large variety of methods for automating mathematical induction has been developed. Methods usually differ in the type of induction formulas they generate, the calculus they are integrated in, and other more technical parameters such as the degree of automation, the input encoding, semantics of datatypes, and so on. For example, there are methods based on term rewriting [Red90], theory exploration [CJRS13], and integration into saturation-based provers [KP13,Ker14], [Cru15,Cru17], [EP20], [RV19,HHK+20].
The current methodology in automated inductive theorem proving focuses on empirical evaluations of its methods. A given method is usually evaluated on a set of benchmark problems such as [CJRS15]. Such an evaluation provides evidence about the strengths and weaknesses of a method but does not result in a systematic understanding of the underlying principles. In particular, it is difficult to compare the methods with each other in terms of their logical strength and to provide explanations of the failures of a given method.
The work in this article is part of a research program that addresses this problem by formally analyzing AITP systems in order to discern their underlying logical principles. The analysis of an AITP system typically begins by developing a suitable abstraction. After that, the abstraction is simulated by a logical theory, whose properties can be investigated by applying powerful results and techniques from mathematical logic. Analyzing families of methods in the uniform formalism of logic allows us not only to understand the strength of individual methods but also to compare the methods with each other. Furthermore, approximating an AITP system by a logical theory is a prerequisite to providing concrete and practically meaningful unprovability results. These results are especially valuable because they allow us to determine the logical features that a given method lacks. Thus, negative results drive the development of new and more powerful methods.
In [HV20] the authors of this article introduced the notion of clause set cycle as an abstraction of the n-clause calculus [KP13,Ker14], an extension of the superposition calculus by a cycle detection mechanism. In particular, we have shown an upper bound on the strength of clause set cycles in terms of induction for ∃_1 formulas and, moreover, that this bound is optimal with respect to the quantifier complexity of induction formulas.
In this article we continue this analysis of clause set cycles. By discerning the logical features underlying the formalism of clause set cycles more precisely, we are able to provide an exact characterization of refutation by a clause set cycle in terms of a logical theory. After that, we make use of the characterization of clause set cycles to provide practically meaningful clause sets that are not refutable by clause set cycles, but that are refutable by induction on quantifier-free formulas. Hence, the results in this article settle in particular Conjecture 4.7 of [HV20]. We provide unrefutability results that exploit different logical features of clause set cycles. This allows us to recognize features that are particularly restrictive.
In Section 2 we will first introduce general notions and results about the logical setting that we use in this article. In Section 3 we carry out the analysis of clause set cycles which culminates in Section 3.3 with two unprovability results for clause set cycles. The two unprovability results exploit different logical features of clause set cycles. The proof of the first unprovability result is straightforward, whereas the second unprovability result relies on a more involved independence result in the setting of linear arithmetic whose proof is carried out in Section 4.

Preliminary definitions
In this section we introduce some definitions that we will use throughout the article. In Section 2.1 we will briefly describe the logical formalism. In Section 2.2 we describe the setting of formal linear arithmetic, in which we will formulate in Section 3.3 a family of clause sets that are refutable by open induction but that are not refutable by clause set cycles.

Formulas, theories, and clauses
We work in a setting of classical logic with equality, that is, the logic provides besides the usual logical symbols a binary infix predicate symbol = representing equality. A first-order language L is a set of predicate symbols and function symbols together with their respective arities. Let S be a predicate or function symbol, then we write S/n to indicate that S has arity n. Terms, atoms, and formulas are constructed as usual from function symbols, variable symbols, the logical connectives ¬, ∧, ∨, →, ↔, and the quantifiers ∃ and ∀. A ground term is a term that does not contain variables. The set of all L formulas is denoted by F(L). A sentence is a formula that does not contain free variables. Let x_1, ..., x_n be variables, t_1, ..., t_n terms, and ϕ a formula, then ϕ[x_1/t_1, ..., x_n/t_n] denotes the simultaneous substitution of x_i by t_i for i = 1, ..., n in ϕ.
In this article we are more interested in the axioms of a theory than in the deductive closure of these axioms. Hence, we define a theory as a set of axioms and manipulate the deductive closure by means of the first-order provability relation.
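To make the simultaneous character of the substitution concrete, the following worked example may help. The formula ϕ and the use of the symbols 0, s, and + are chosen purely for illustration and are not taken from the article.

```latex
% Requires amsmath. The formula \varphi is a hypothetical example.
\begin{align*}
\varphi &:= \bigl(x_1 + x_2 = s(x_1)\bigr) \\
% simultaneous substitution: both replacements act on the original formula
\varphi[x_1/x_2,\, x_2/0] &= \bigl(x_2 + 0 = s(x_2)\bigr) \\
% sequential substitution yields a different formula
(\varphi[x_1/x_2])[x_2/0] &= \bigl(0 + 0 = s(0)\bigr)
\end{align*}
```

The second and third lines differ because, in the simultaneous case, the occurrence of x_2 introduced by replacing x_1 is not substituted again.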
Definition 1 (Theories and provability). A theory T is a set of sentences called the axioms of T . Let T, U be theories, then by T + U we denote the theory axiomatized by T ∪ U . Let ϕ be a formula, then we write T ⊢ ϕ if ϕ is provable in first-order logic from the axioms T . Let Γ be a set of formulas, then we write T ⊢ Γ if T ⊢ γ for each γ ∈ Γ. Furthermore, we write T ≡ U , if T ⊢ U and U ⊢ T .
Let T be a theory and ϕ a formula, then we write T + ϕ to denote the theory axiomatized by the axioms of T and the universal closure of ϕ. In this article we will be particularly interested in formulas with a restricted number of quantifier alternations.
Definition 2. We say that a formula ϕ (possibly containing free variables) is ∀_0 or ∃_0 if ϕ is quantifier-free. Moreover, we say that a formula ϕ(z) is ∀_{k+1} (∃_{k+1}) if it is of the form (∀x)ψ(x, z) ((∃x)ψ(x, z)), where x and z are tuples of variables and ψ is ∃_k (∀_k). Let L be a first-order language, then by Open(L), ∃_k(L), and ∀_k(L) we denote the quantifier-free formulas, the ∃_k formulas, and the ∀_k formulas of the language L. A theory is said to be ∃_k (∀_k) if all of its axioms are ∃_k (∀_k) sentences.
Clause sets are an alternative representation of ∀_1 formulas that is preferred by automated theorem provers because of its uniformity.
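A few sample classifications may help in reading the definition. The formulas are illustrative choices over a language with 0, s, and +.

```latex
% Requires amsmath. Example formulas chosen for illustration only.
\begin{align*}
& x = y + y && \text{quantifier-free, hence both } \forall_0 \text{ and } \exists_0 \\
& (\exists y)(x = y + y) && \exists_1 \\
& (\forall x)(\exists y)(x = y + y \lor x = s(y + y)) && \forall_2
\end{align*}
```

In the last line the matrix (∃y)(x = y + y ∨ x = s(y + y)) is ∃_1, so prefixing a universal quantifier yields a ∀_2 formula.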
Definition 3 (Literals, clauses, clause sets). Let L be a first-order language. By an L literal we understand an L atom or the negation of an L atom. An L clause is a finite set of L literals. An L clause set is a set of L clauses. Whenever the language L is clear from the context, we simply speak of atoms, literals, clauses and clause sets.
We will now recall some basic model-theoretic concepts.
Definition 4. Let L be a first-order language, then L structures and the first-order satisfaction relation |= are defined as usual. Let L ′ ⊆ L be a first-order language and M an L structure, then by M | L ′ we denote the L ′ reduct of M . Let M be an L structure, then we write b ∈ M to express that b is an element of the domain of M . Formulas and clauses are interpreted as usual. In particular, a clause is interpreted as the universal closure of the disjunction of its literals. Let ∆ be a set of L formulas and L clauses, then M |= ∆ if M |= δ for each δ ∈ ∆.
Let us conclude this section by introducing some notation to manipulate clauses and clause sets.
Definition 5. By cls we denote a fixed function that assigns to every ∀_1 sentence ϕ a clause set cls(ϕ) over the language of ϕ such that ϕ and cls(ϕ) are logically equivalent. Let Γ be a set of ∀_1 sentences, then we define cls(Γ) := ⋃_{γ∈Γ} cls(γ). Furthermore, by cls^{-1} we denote a fixed function that assigns to every clause set C a ∀_1 sentence cls^{-1}(C) over the language of C such that C and cls^{-1}(C) are logically equivalent.
Lemma 6. Let 𝒞 be a finite set of clause sets, then there exists a clause set C′ such that M ⊨ C′ if and only if there exists C ∈ 𝒞 such that M ⊨ C.
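As a sketch of what cls may return, consider the following ∀_1 sentence. The sentence is a hypothetical example, and the clause set shown is only one admissible value. Any logically equivalent clause set over the same language would serve equally well.

```latex
% Requires amsmath. \varphi is an illustrative sentence; the value of cls is one possible choice.
\begin{align*}
\varphi &:= (\forall x)(\forall y)\bigl(x + s(y) = s(x + y) \land (x = 0 \lor x = s(p(x)))\bigr) \\
\mathit{cls}(\varphi) &= \bigl\{\, \{x + s(y) = s(x + y)\},\ \{x = 0,\ x = s(p(x))\} \,\bigr\}
\end{align*}
```

Each clause is read as the universal closure of the disjunction of its literals, so the conjunction of the two clauses is equivalent to ϕ.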

Induction and formal arithmetic
In this section we introduce some basic notions about induction and formal arithmetic. In particular we introduce the setting of linear arithmetic in which we formulate an unrefutability result for clause set cycles in Section 3.3.
Inductive theorem provers customarily work in a many-sorted setting with a notion of inductive datatypes encompassing at least the natural numbers, lists, trees, and sometimes even more complicated types such as mutually recursive datatypes. However, working in such a general setting is notationally tedious. Moreover, all the phenomena we are interested in can already be observed over the natural numbers. Hence, we restrict ourselves in this article to a one-sorted setting over the natural numbers. By 0/0 and s/1 we denote function symbols that represent the number zero and the successor function on natural numbers, respectively. We fix some abbreviations. Let n be a natural number and t a term, then s^n(t) denotes the term s(···s(t)···) with n occurrences of s, and n̄ denotes the term s^n(0). Furthermore, let + be a binary infix function symbol representing addition of natural numbers, then the notation n · t for the multiplication of the term t by the constant n is defined inductively by 0 · t = 0 and (i + 1) · t = t + (i · t).
Definition 7. Let ϕ(x, z) be a formula, then I_x ϕ denotes the formula (ϕ(0, z) ∧ (∀x)(ϕ(x, z) → ϕ(s(x), z))) → (∀x)ϕ(x, z). In the definition above, we call ϕ the induction formula, x the induction variable, and z the induction parameters. Let Γ be a set of formulas, then the theory Γ-IND is axiomatized by the universal closure of the formulas I_x γ with γ ∈ Γ.
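The abbreviations can be unfolded mechanically. The following lines spell out three small cases and use the overline for numerals.

```latex
% Requires amsmath. Unfolding of the abbreviations for small values of n.
\begin{align*}
s^3(x) &= s(s(s(x))) \\
\overline{2} &= s^2(0) = s(s(0)) \\
2 \cdot t &= t + (1 \cdot t) = t + \bigl(t + (0 \cdot t)\bigr) = t + (t + 0)
\end{align*}
```

Note that 2 · t is literally the term t + (t + 0) rather than t + t, because the recursion bottoms out at 0 · t = 0.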
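A concrete instance may clarify the roles of the induction variable and the induction parameters. The sketch below assumes the usual shape of the induction axiom, with the base case at 0 and the step from x to s(x). The formula ϕ is a hypothetical example.

```latex
% Requires amsmath. Assumed schema: (base case and step case) imply the universal statement.
\begin{align*}
\varphi(x, z) &:= \bigl(z + x = x + z\bigr) \\
I_x\varphi &= \Bigl(z + 0 = 0 + z \;\land\; (\forall x)\bigl(z + x = x + z \rightarrow z + s(x) = s(x) + z\bigr)\Bigr) \\
&\qquad \rightarrow (\forall x)(z + x = x + z)
\end{align*}
```

Here x is the induction variable and z is the only induction parameter. The corresponding axiom of Γ-IND is the universal closure (∀z) I_x ϕ.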
If induction is carried out on formulas without induction parameters, we speak of parameter-free induction. A notion related to the induction scheme is that of inductivity in a theory.
Whenever the induction variable x is clear from the context we simply say that ϕ is inductive in T.
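As a reading aid, inductivity in a theory is assumed here to mean that the theory proves both the base case and the step case of the induction formula. Under this assumption, and with a hypothetical two-axiom theory T_0 introduced only for this illustration, a formula can be checked for inductivity as follows.

```latex
% Requires amsmath. T_0 and \varphi are hypothetical; the reading of "inductive" is an assumption.
\begin{align*}
T_0 &:= \{\, (\forall x)(x + 0 = x),\ (\forall x)(\forall y)(x + s(y) = s(x + y)) \,\} \\
\varphi(x) &:= \bigl(0 + x = x\bigr) \\
T_0 &\vdash \varphi(0) && \text{since } 0 + 0 = 0 \\
T_0 &\vdash \varphi(x) \rightarrow \varphi(s(x)) && \text{since } 0 + s(x) = s(0 + x) = s(x) \text{ if } 0 + x = x
\end{align*}
```

So ϕ would be T_0-inductive in x under this reading, although T_0 itself need not prove (∀x)ϕ(x).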
Let us now introduce the setting of linear arithmetic. This setting has the advantage of being sufficiently complex to provide interesting independence results while still having straightforward model theoretic properties.
Definition 9 (Language of linear arithmetic). The function symbol p/1 represents the predecessor function on natural numbers and the infix function symbol +/2 represents the addition of natural numbers. The language L LA of linear arithmetic is {0, s, p, +}.
By N we denote the set of natural numbers as well as the L_LA structure whose domain is the set of natural numbers and that interprets the symbols 0, s, + naturally and interprets the symbol p by p^N(0) = 0 and p^N(n + 1) = n for all n ∈ N. Analogously, we denote by Z the set of integers and the L_LA structure whose domain consists of the integers and that interprets all symbols naturally. In particular, Z interprets the symbol p as the function x ↦ x − 1. All the theories of linear arithmetic that we will work with are extensions of the following base theory.
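The two structures differ only in how the predecessor behaves at zero. The following evaluations, computed directly from the interpretations just given, make the difference explicit.

```latex
% Requires amsmath and amssymb. Values follow from the stated interpretations of p in N and Z.
\begin{align*}
p^{\mathbb{N}}(0) &= 0, & p^{\mathbb{N}}(3) &= 2, \\
p^{\mathbb{Z}}(0) &= -1, & p^{\mathbb{Z}}(3) &= 2, \\
\mathbb{N} &\models p(0) = 0, & \mathbb{Z} &\not\models p(0) = 0, \\
\mathbb{N} &\models (\forall x)(p(s(x)) = x), & \mathbb{Z} &\models (\forall x)(p(s(x)) = x).
\end{align*}
```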
Definition 10. The L LA theory B is axiomatized by the universal closure of the formulas x + s(y) = s(x + y).
In the following we will recall some basic properties of the theory B and its extension by induction for quantifier-free formulas. Clearly, we have N ⊨ B and Z ⊨ A1 + A3 + A4 + A5, but Z ⊭ A2, since Z ⊨ p(0) = −1.
Lemma 13. Let t be an L_LA ground term, then there exists k ∈ N such that B ⊢ t = k̄.
Proof. Proceed by induction on the structure of the term.
Lemma 14. Let ϕ be a quantifier-free L LA sentence, then either B ⊢ ϕ or B ⊢ ¬ϕ.
Proof. By a straightforward induction on the structure of the sentence ϕ. The only interesting case is the case where ϕ is an atom t_1 = t_2. By Lemma 13 there exist k_1, k_2 ∈ N such that B ⊢ t_1 = t_2 ↔ k̄_1 = k̄_2. If k_1 = k_2 we apply reflexivity. Otherwise, we apply Lemma 11.(ii) repeatedly and finally we use Lemma 11.(i).

Analysis of clause set cycles
In this section we carry out an analysis of the formalism of refutation by a clause set cycle. In Section 3.1 we define clause set cycles and recall some basic properties as well as some results from [HV20]. After that, we will provide in Section 3.2 a characterization of clause set cycles in terms of a logical theory with induction. Finally, in Section 3.3 we will use this characterization and an independence result, that will be proved in Section 4, to obtain concrete and practically meaningful unrefutability results for clause set cycles.

Clause set cycles
Refutation by a clause set cycle is a formalism introduced in [HV20] by the authors of this article to describe abstractly the inductive arguments that take place in the n-clause calculus [KP13,Ker14]. The n-clause calculus is an extension of the superposition calculus by a mechanism that detects cyclic dependencies between the derived clauses. These cyclic dependencies correspond to arguments by infinite descent and thus establish the inductive unsatisfiability of a set of clauses. The notion of refutation by a clause set cycle abstracts the underlying superposition calculus and the detection of the cycle in that proof system and therefore extracts the essence of the arguments by infinite descent that may appear in refutations by the n-clause calculus. Since all the variables occurring in clauses are implicitly universally quantified, a clause set does not have a free variable on which we can carry out an argument by induction. Instead we will rely on a special constant symbol η, on which arguments by infinite descent will take place. This is in analogy to the special constant n that is used by the n-clause calculus for the same purpose, see [KP13]. The constant η can be thought of as a Skolem constant, that is selected before a refutation is attempted. In particular, clauses may of course contain other Skolem symbols besides η.
Carrying out arguments by infinite descent (or induction) only on positions of constants is, unsurprisingly, very restrictive (see Corollary 49). Since clause set cycles are used as an abstraction of the inductive cycles of the n-clause calculus, we did not extend the formalism to allow arguments to take place in more varied positions. The logical characterization that we give in Section 3.2 makes considering such extensions easier. In particular, the main unprovability result of this article, Corollary 55, does not rely on this restriction. A method that lifts this restriction has been proposed in [EP20].
Remark 19. In the literature [KP13, Ker14, HV20] a constant such as η is usually called a parameter. In order to avoid confusion with induction parameters in the sense of Definition 7 we will not use this designation.
Let C be a clause set possibly containing η, then we write C(η) to indicate all the occurrences of η in C. Let furthermore t be a term, then C(t) denotes the clause set obtained by replacing all the occurrences of η in C by t.
Hence, we obtain an infinite strictly descending sequence of natural numbers m such that M[η ↦ m] ⊨ C(η). This is impossible, hence M ⊭ C(η).
Remark 21. In the literature cycles on clause sets are usually equipped with parameters that control the offset and the descent step size and thus permit a more flexible usage of the cycles (see for example [KP13]). In Definition 24 we shall consider clause set cycles with parameters inspired by the parameters found in the cycles of the n-clause calculus. After that, we show in Proposition 26 that such parameters do not make the system more powerful. In particular, [HV20] uses a slightly different notation. A refutation by a clause set cycle in [HV20] corresponds to a refutation by a (1, 0)-clause set cycle with external offset i ∈ N in the sense of Definition 24. Hence, by Proposition 26 the notion of refutation by clause set cycle used in [HV20] is exactly as powerful as the more elegant notion of refutation by a clause set cycle used in this article.
Clause set cycles could be integrated into a saturation-based prover by carrying out the saturation process as usual and by detecting a clause set cycle among the clauses derived so far, thus satisfying Condition C3 with respect to the set of generated clauses. The detection of a clause set cycle could, for example, make use of the derivation relation generated by the prover in order to detect Conditions C1 and C2. The detection of a clause set cycle then provides the inductive unsatisfiability of the clauses generated and therefore ends the refutation. This is essentially how the n-clause calculus described in [KP13,Ker14] operates.
Let us now consider an example of a refutation by a clause set cycle.
Example 23. Intuitively, the clause set C(η) asserts the existence of an element η which is neither even nor odd. We will now show that C(η) is a clause set cycle. We start by showing that C(η) satisfies Condition (C2). Suppose that C(0) has a model M, then we have in particular M ⊨ 0 ≠ 0 + 0 and M ⊨ 0 + 0 = 0. This is a contradiction, and therefore C(0) ⊨ ⊥.
For Condition (C1), let M be a model of C(s(η)). Clearly, we have M ⊨ cls(B + B2), hence we only have to show that M ⊨ η ≠ x + x and M ⊨ η ≠ s(x + x).
Hence, C(η) is a clause set cycle and therefore refutes itself.
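For orientation, the conditions invoked in the example can be summarized as follows. This is a restatement based on how Conditions C1, C2, and C3 are used in the surrounding text, not a quotation of the definition. The descent step additionally assumes that M interprets 0 and s in the standard way on the natural numbers.

```latex
% Requires amsmath. Reading aid: conditions restated from their usage in the text.
\begin{align*}
\text{(C1)}\quad & C(s(\eta)) \models C(\eta) \\
\text{(C2)}\quad & C(0) \models \bot \\
\text{(C3)}\quad & D(\eta) \models C(\eta) && \text{($D$ is refuted by the cycle $C$)}
\end{align*}
% Descent step (assumption: 0 and s are interpreted as usual on natural numbers):
\begin{align*}
M[\eta \mapsto m + 1] \models C(\eta)
  \;&\Longrightarrow\; M[\eta \mapsto m] \models C(s(\eta)) \\
  \;&\Longrightarrow\; M[\eta \mapsto m] \models C(\eta) && \text{by (C1)}
\end{align*}
```

Iterating the descent step would eventually give M[η ↦ 0] ⊨ C(η), that is, a model of C(0), which (C2) rules out.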
The induction argument contained in a refutation by a clause set cycle is peculiar in the sense that it does not take place in an explicit background theory. Instead of a background theory, clause set cycles may contain clauses free of η that act as a background theory. In the example above the clause set cycle contains the clauses cls(B + B2), which correspond to the background theory.
The cycles detected by practical methods such as the n-clause calculus differ from clause set cycles in that they can be controlled by three parameters: an external offset, an internal offset, and the step size of the descent. In the following we will show that these parameters do not increase the overall strength of the system.
Definition 24. Let L be a first-order language and j, k ∈ N with j ≥ 1. A finite L ∪ {η} clause set C(η) is called an L (j, k)-clause set cycle if
We call the parameter j the descent step size and k the internal offset. Let i ∈ N and D(η) an L ∪ {η} clause set, then D(η) is refuted by the (j, k)-clause set cycle
Clearly, clause set cycles in the sense of Definition 20 are exactly the (1, 0)-clause set cycles and a refutation by a clause set cycle in the sense of Definition 20 is a refutation by a (1, 0)-clause set cycle with external offset 0.
We start by showing that (j, k)-clause set cycles with j, k ∈ N and j ≥ 1 can be simulated by clause set cycles.
Lemma 25. Let L be a first-order language, j, k ∈ N with j ≥ 1, and C(η) an L (j, k)-clause set cycle. Then there exists a clause set cycle C ′ (η) such that C(s k (η)) |= C ′ (η).
Proof. We start by eliminating the internal offset of the (j, k)-clause set cycle, by letting C′(η) := C(s^k(η)). It is clear that C′ is a (j, 0)-clause set cycle. Moreover, by the definition of C′ we have C(s^k(η)) ⊨ C′(η). Let C′′(η) be the clause set obtained by applying Lemma 6 to the set 𝒞 : . Otherwise we have m + 1 = j and therefore by C1' we obtain M ⊨ C′(η) and since C′(η) ∈ 𝒞, we have C′(η) ⊨ C′′(η).
Now we can show that a refutation by a (j, k)-clause set cycle with external offset i, where i, j, k ∈ N with j ≥ 1, can be reduced to a refutation by a clause set cycle.
Proposition 26. Let L be a first-order language, D(η) an L ∪ {η} clause set, and i, j, k ∈ N with j ≥ 1 such that D is refuted by an L (j, k)-clause set cycle with external offset i. Then D(η) is refuted by a clause set cycle.
Proof. Let C(η) be an L (j, k)-clause set cycle such that D(η) is refuted by C with external offset i. By Lemma 25 there exists a clause set cycle C′(η) such that C(s^k(η)) ⊨ C′(η). Hence D is refuted by a (1, 0)-clause set cycle with external offset i. In the next step we will eliminate the external offset. Let 𝒞 := {D(s^m(η)) | m = 0, ..., i − 1} ∪ {C′(η)} and apply Lemma 6 in order to obtain a clause set C′′(η) corresponding to the disjunction of the clause sets in 𝒞. We will now show that The first case is impossible because of Condition C3" and the second case is impossible because C′(η) is a clause set cycle and therefore we need to consider two cases. If m + 1 < i, then we have D(s^{m+1}(η)) ∈ 𝒞 and therefore M ⊨ C′′(η). Otherwise we have m + 1 = i, and therefore we obtain M ⊨ C′(η) by Condition C3'. Again we obtain M ⊨ C′′(η). Hence C′′(η) is a clause set cycle. We complete the proof by observing that D(η) ⊨ C′′(η), since D(η) ∈ 𝒞. Hence, D(η) is refuted by the clause set cycle C′′(η).
As already mentioned earlier, the notion of refutation by a clause set cycle is a useful intermediary abstraction of the induction mechanism of a family of AITP systems including in particular the n-clause calculus [KP13,Ker14]. Since our goal is to develop a uniform logical representation of methods for AITP, we thus use the notion of refutation by a clause set cycle as a starting point to provide logical abstractions of AITP systems such as the n-clause calculus. In particular, we want, for a fixed language L, to provide a logical L ∪ {η} theory T that simulates refutation by a clause set cycle in the following sense: Let D(η) be an L ∪ {η} clause set that is refuted by an L clause set cycle, then T + D(η) is inconsistent. The authors of this article have shown in [HV20] that refutation by L clause set cycles can be simulated by the theory ∃ 1 (L)-IND (see Theorem 42) and moreover that Open(L)-IND does not simulate refutations by a clause set cycle.
In the following section, we will give a proof of Theorem 27 that is simpler, shorter, and more elegant than the proof given in [HV20].
Definition 28. Let n, m be natural numbers, then by n ∸ m we denote the truncated subtraction of m from n, given by n ∸ m = n − m if n ≥ m and n ∸ m = 0 otherwise.
Proof. By Theorem 18 it suffices to show that B′ ⊬ (∃y)(x = y + y ∨ x = s(y + y)).
Consider the L_LA structure M whose domain consists of the pairs of the form (m, n) ∈ N × Z such that m = 0 implies n ∈ N and that interprets the non-logical symbols as follows:
Proof of Theorem 27. Consider the clause set C(η). In Example 23 we have shown that C(η) is refuted by an L_LA clause set cycle. We will now show that Open(L_LA)-IND + C(η) is consistent. We proceed indirectly and assume that Open(L_LA)-IND + C(η) is inconsistent. Hence B + B2 + Open(L_LA)-IND ⊢ (∃y)(η = y + y) ∨ (∃y)(η = s(y + y)). Thus, B + Open(L_LA)-IND ⊢ (∃y)(x = y + y ∨ x = s(y + y)), which contradicts Lemma 29.
However, empirical evidence suggests that clause set cycles are not strictly stronger than open induction. This has given rise to the following conjecture.
In the following section we will give a characterization of refutation by a clause set cycle in terms of a logical theory. In Section 3.3 we will make use of this characterization to give a positive answer to Conjecture 30.

Logical characterization
In the previous section we have introduced the notion of refutation by a clause set cycle and we have shown that certain practically motivated generalizations of refutation by a clause set cycle do not result in stronger systems. In this section we will give a characterization of refutation by a clause set cycle in terms of a logical theory.
We start by converting clause set cycles into formulas.
Let C be a clause set cycle, then the formula ¬cls −1 (C)[η/x] is the formula that corresponds to the induction argument contained in a refutation by a clause set cycle. Clearly, this formula is logically equivalent to an ∃ 1 formula. In the following we will make three further important observations about this argument by induction.
The first observation is that the formula ¬cls^{-1}(C)[η/x] has only one free variable, namely the variable on which the argument by induction takes place. Hence the induction captured by clause set cycles is essentially parameter-free induction. In this article we use a notation for parameter-free induction that is inspired by the notation used in the literature from mathematical logic on parameter-free induction.
Definition 32. Let Γ be a set of formulas, then Γ-IND^- is axiomatized by the universal closure of the formulas I_x ϕ for ϕ(x) ∈ Γ.
When the set of induction formulas is unrestricted, induction without parameters is just as powerful as induction with parameters.
Lemma 33. Let L be a first-order language, then we have F(L)-IND^- ≡ F(L)-IND.
Proof. We only show F(L)-IND^- ⊢ F(L)-IND, the other direction is trivial. Let ϕ(x, z) be an L formula, x a variable, and z a vector of variables. We let the formula ψ(x) be given by

By a straightforward quantifier shift we obtain
However, when we are dealing with restricted induction schemes such as ∃ k (L)-IND, then its parameter-free counterpart ∃ k (L)-IND − may be a weaker theory [KPD88].
Another remarkable property of the formula ¬cls −1 (C)[η/x] is its ∅-inductivity. In a refutation by a clause set cycle, there is no explicit induction axiom. Instead, whenever a clause set C(η) is shown to be a clause set cycle, it can be used in a refutation. This is reminiscent of a Hilbert-style induction rule that allows us to deduce The idea of Hilbert-style inference rules and in particular of induction rules is made explicit in the following two definitions.
Definition 34. An inference rule R is a set of tuples of the form Γ/γ 0 called the instances of R, where Γ = {γ 1 , . . . , γ n } is a finite set of sentences and γ 0 is a sentence. Let T be a theory, then the theory of unnested applications [T, R] of the inference rule R over the theory T is axiomatized by Let R be an inference rule and Γ/γ 0 ∈ R, then the intended meaning of the rule instance Γ/γ 0 is that whenever all the sentences in Γ are derived, then we can derive γ 0 . The instance Γ/γ 0 will also be written as Definition 35. Let Γ be a set of formulas, then the rule Γ-IND R consists of the instances of the form with γ ∈ Γ and where the variable x is called the induction variable and the variables z are called the induction parameters. The induction rule Γ-IND R− consists of these instances of Γ-IND R where the induction variable is the only free variable of the induction formula.
Let T be a theory and Γ a set of formulas, then we can make use of Definition 8 to reformulate the theory [T, Γ-IND^R] as follows
In other words, the theory [T, Γ-IND^R] provides induction only for T-inductive formulas from Γ, whereas T + Γ-IND provides induction for all formulas in Γ. The theory [T, Γ-IND^R] is in general not as strong as T + Γ-IND, see [Par72]. For further literature on induction rules, see for example [Sho58,She63,Par72,Bek97a,Jeř20].
We will now make a last observation about the argument by induction contained in a refutation by a clause set cycle. The previous observations show that clause set cycles are simulated by unnested applications of the parameter-free ∃ 1 induction rule over the theory ∅. A sentence derived by an induction rule is the universal closure of an inductive formula. Hence, once a formula is derived by an induction rule it can be instantiated freely. Similarly, a clause set cycle C(η) acts, roughly speaking, as the lemma ¬cls −1 (C)[η/x] of which, however, only the instance ¬cls −1 (C) is used. In other words, a clause set cycle allows us to derive properties of η only. We will informally refer to this restriction as the instance restriction. We can capture this restriction in the following restricted induction rule.
Definition 36. Let Γ be a set of formulas, then the rule Γ-IND R η consists of the instances of the form The rule Γ-IND R− η consists of those instances of Γ-IND R η where the induction variable is the only free variable of the induction formula.
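The difference between the unrestricted induction rule and its η-restricted variant can be pictured as follows. This is a sketch of the assumed instance shapes, inferred from the discussion of the instance restriction, with premises and conclusions understood as universally closed sentences.

```latex
% Requires amsmath. Assumed rule shapes; \vec{z} stands for the induction parameters.
\begin{align*}
\Gamma\text{-IND}^{R}: \quad &
  \frac{\gamma(0, \vec{z}) \qquad \gamma(x, \vec{z}) \rightarrow \gamma(s(x), \vec{z})}
       {\gamma(x, \vec{z})} \\[1ex]
\Gamma\text{-IND}^{R}_{\eta}: \quad &
  \frac{\gamma(0, \vec{z}) \qquad \gamma(x, \vec{z}) \rightarrow \gamma(s(x), \vec{z})}
       {\gamma(\eta, \vec{z})}
\end{align*}
```

In the second rule the conclusion speaks about η only, so the derived statement cannot be instantiated at other terms. This is the instance restriction described before the definition.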
By combining the above observations we obtain the following proposition, that allows us to simulate clause set cycles in a logical theory.
Proof. Since D is refuted by a clause set cycle, there exists an L clause set cycle is clearly logically equivalent to D(η). By the soundness of first-order logic it thus suffices to show that Let ψ(x) be an ∃ 1 formula that is logically equivalent to ¬cls −1 (C)[η/x]. Then, by applying the completeness theorem and the deduction theorem to (*), we obtain By Lemma 31 we know that ψ(x) is ∅-inductive, and therefore we have Hence, by considering the contrapositive of ( †) we clearly obtain [∅, We will now show that we even have the converse and thus obtain a characterization of refutation by a clause set cycle by a logical theory. We start by observing that finitely many inductive formulas can be fused into a single inductive formula.
Lemma 38. Let T be a theory and let
This simple result is particularly interesting because fusing inductive formulas does not introduce additional induction parameters and, when fusing ∃_k induction formulas, the fused induction formula is again logically equivalent to an ∃_k formula. Similar techniques exist for fusing a finite number of induction axioms into a single induction axiom [HW18,Gen54]. However, these either introduce a new induction parameter or increase the quantifier complexity of the resulting induction formula.
We thus obtain a characterization of refutation by a clause set cycle in terms of induction rules.
Proof. An immediate consequence of Propositions 37 and 39.
Remark 41. In a refutation by a clause set cycle the constant η plays essentially two roles: On the one hand, it can be thought of as a Skolem symbol and, on the other hand, it plays the role of an induction variable. The characterization of Theorem 40 clarifies this situation by allowing us to distinguish between induction variables and the Skolem symbol η.
As a corollary we obtain Theorem 2.10 of [HV20].
Theorem 42 ([HV20, Theorem 2.10]). Let L be a first-order language and D(η) an L ∪ {η} clause set. If D(η) is refuted by an L clause set cycle, then ∃_1(L)-IND + D(η) is inconsistent.
In the following section we will make use of the characterization of Theorem 40 to construct clause sets that are refutable by open induction but which are not refutable by clause set cycles. In particular, the unrefutability results that we provide exploit different logical features of clause set cycles.

Unprovability by clause set cycles
In the previous sections we have introduced the notion of refutation by a clause set cycle for which we have shown a characterization in terms of a logical theory. We have shown this characterization by discerning four main logical features of refutation by a clause set cycle: the quantifier-complexity, the absence of induction parameters, the similarity with induction rules, and the restriction on instances of derived formulas. In this section we will make use of this characterization in order to provide practically relevant clause sets that are not refutable by clause set cycles, but that are refutable by induction on quantifier-free formulas. The unrefutability results in this section will exploit different logical features of clause set cycles. In particular we will show that restricting the instances of the conclusion of the induction rule can be very drastic.
Let us now briefly discuss the practical applicability of the unprovability results given in this section. The unprovability results apply to any sound (for first-order logic) saturation prover that detects clause set cycles over the language of the initial clause set. Hence, our unprovability results apply, in particular, to all sound saturation provers that do not extend the language of the initial clause set and detect cycles among the derived clauses such as for example the n-clause calculus (see [KP13,Ker14]). On the other hand systems that extend the language are also of practical importance, since such extensions can be used to organize the refutation process, see for example [Vor14]. In particular, the extension of the language by definitions can be expected to have interesting effects. However, investigating the interaction between clause set cycles and various language extending mechanisms would go beyond the scope of this article and should be investigated separately. Observe, furthermore, that our setting does not rule out the presence of Skolem symbols other than η in clause set cycles.
We start by slightly reformulating Theorem 40 so that we can work with formulas and theories instead of clause sets.
In Section 3.1 we have informally observed that clause set cycles do not take place in some explicit background theory but instead clause set cycles contain the clauses corresponding to the background theory. In the following we will make this informal observation more precise.
Clearly, τ is logically equivalent to a ∀ 1 sentence, hence τ → γ(x) is logically equivalent to an ∃ 1 formula γ ′ (x). Hence, analogously, with the exception that in the last part of the argument we have to shift the universal quantifier in (∀x)(τ → γ(x)) inwards.
Lemma 44 allows us to move ∀ 1 axioms in and out of the induction rule and thus to consider the η-free clauses of a clause set cycle as the background theory. As an immediate consequence of Corollary 43 and Lemma 44 we now obtain a general pattern to reduce unrefutability problems for clause set cycles to independence problems.
Proposition 45. Let L be a first-order language, T a ∀ 1 L theory, and let ϕ(x, y) be a quantifier-free L formula. Then [T, y) if and only if the clause set cls(T + (∀ y)¬ϕ(η, y)) is refuted by an L clause set cycle.
We can now consider some theories and formulas that will yield clause sets that are unrefutable by clause set cycles. By the characterization of clause set cycles by a logical theory we have discerned several restrictions of the induction principle that corresponds to clause set cycles. In the following two subsections we will formulate unprovability results that attack different restrictions of the induction principle that is contained in the notion of refutation by a clause set cycle.

Instance restriction
In Section 3.2 we have observed that a refutation by a clause set cycle only permits a single instance of a clause set cycle to appear in a refutation. In this section we will formulate an unprovability result for clause set cycles that exploits this restriction. In particular, we will base this unprovability result on a stronger independence result that shows how drastic the instance restriction is.
Definition 46. Let f /1 be a function symbol and P/1 be a predicate symbol. The theory P is axiomatized by the universal closure of the following formulas
Definition 47. Let ϕ(x, z) be a formula, then I η x ϕ denotes the formula
Let Γ be a set of formulas, then the theory Γ-IND η is axiomatized by the universal closure of the formulas I η x γ with γ ∈ Γ.
We have the following independence result.
The above independence result is remarkable in the sense that it imposes no restriction whatsoever on the induction formulas; only the conclusion of the induction axioms is restricted. Hence the result shows that this restriction is extremely strong. As a corollary we obtain the following unrefutability result for clause set cycles.
Proof. The formula P (x) is inductive in P.
Proposition 48, Corollary 49, and Lemma 50 together show that the η-restriction as encountered in the n-clause calculus is drastic and can result in pathological unrefutability phenomena. On the one hand, without the η-restriction a very simple argument by induction suffices to prove P (f (η)); on the other hand, in the presence of the η-restriction even induction for all {0, s, P, f } formulas does not allow us to prove the formula P (f (η)). However, for this very reason the unrefutability result of Corollary 49 does not tell us anything about the other restrictions of the induction principle contained in refutations by a clause set cycle.
Hence, it would be interesting to have a similar result for linear arithmetic. In particular we conjecture the following.

Induction rule and absence of parameters
In the following we will consider another unprovability result for clause set cycles that does not make use of the instance restriction, but instead exploits the absence of induction parameters and the induction rule. This time we work in the setting of linear arithmetic described in Section 2.2. The unprovability result developed in this section is based on the following weak cancellation property of the addition of natural numbers.
Definition 52. Let k, n, m ∈ N with 0 < n < m, then we define
The formula E k,n,m is a generalization of
Most of the upcoming Section 4 is devoted to proving the following independence result.
Theorem 53. Let n, m, k ∈ N with 0 < n < m, then
By making use of the above independence result and the characterization of refutation by a clause set cycle in Proposition 45, we straightforwardly obtain an unrefutability result.
Corollary 55. Let k, n, m ∈ N with 0 < n < m, then the clause set E k,n,m (η) is not refuted by an L LA clause set cycle.
Let us now discuss this unprovability result. The clause sets E k,n,m (η) with k, n, m ∈ N and 0 < n < m are refuted by open induction.
Hence, Corollary 55 together with Proposition 56 gives a positive answer to Conjecture 30. We conclude this section with some remarks on this result and possible improvements.
The formula E 0,1,2 (x) is particularly interesting, because it can be proven by a comparatively straightforward induction.
Proof. Clearly it suffices to show that the formula ϕ(x, y) := x + 0 = y + x → y = 0 is B-inductive in x. It is obvious that B ⊢ ϕ(0, y). Now work in B and assume ϕ(x, y) and s(x) + 0 = y + s(x). By (A4) and (A5) we obtain s(x + 0) = s(x) = s(x) + 0 = y + s(x) = s(y + x). By Lemma 11 we obtain x + 0 = y + x, hence by the assumption we obtain y = 0.
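As a quick sanity check of the formula used in this proof, the following minimal Python sketch evaluates ϕ(x, y) := x + 0 = y + x → y = 0 and its instance ϕ(x, x) in the standard model N over a finite range. The function names and the search bound are illustrative assumptions; a finite search is of course no substitute for the inductive argument above.

```python
from itertools import product


def phi(x: int, y: int) -> bool:
    """phi(x, y) := x + 0 = y + x -> y = 0, read in the standard model N."""
    return (x + 0 != y + x) or (y == 0)


def phi_instance(x: int) -> bool:
    """The instance phi(x, x), i.e. x + 0 = x + x -> x = 0."""
    return phi(x, x)


def holds_up_to(bound: int) -> bool:
    """Check phi on all pairs (x, y) of naturals with x, y < bound."""
    return all(phi(x, y) for x, y in product(range(bound), repeat=2))


if __name__ == "__main__":
    print(holds_up_to(50))
    print(all(phi_instance(x) for x in range(50)))
```

The parameter y is what makes the induction on x go through; the instance ϕ(x, x) no longer carries it.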
This demonstrates that clause set cycles are a very weak induction mechanism in the sense that they are unable to deal even with simple generalizations and therefore fail to refute relatively simple clause sets. The unprovability results in Corollaries 49 and 55 were constructed so that only one Skolem constant η appears in the language of the considered clause sets. Consider now the clause set C given by where ν is a Skolem constant distinct from η. It is straightforward to check that C(η) is an L LA ∪ {ν} clause set cycle. Hence, if clause set cycles are detected on the languages obtained by Skolemization of the given property and its background theory, then clause set cycles allow us to prove the property x + 0 = y + x → y = 0 from B but fail to prove the weaker property x + 0 = x + x → x = 0 from B.
Thus, clause set cycles are sensitive to the syntactic material present in a given set of clauses. In particular, Skolem constants other than η may act similarly to induction parameters. The independence result of Theorem 53 also shows that the unrefutability result of Corollary 55 relies neither on the η-restriction nor on the absence of nesting in clause set cycles. Moreover, in the light of Lemma 57 we conjecture that the unrefutability of Corollary 55 is entirely due to the absence of induction parameters from the induction captured by clause set cycles.
Conjecture 58. Let k, n, m ∈ N with 0 < n < m, then
Furthermore, we believe that an independence similar to the one in Conjecture 58 also holds for the atomic formula x + (x + x) = (x + x) + x, which is a well-known challenging formula for inductive theorem provers [BIS92, Bee06, HHK + 20].

Nesting of the induction rule
In this section we briefly consider the role of the depth of the nesting of applications of the induction rule. The idea underlying the results developed in this section was brought to our attention by one of the anonymous reviewers. We will show that a formalism that extends clause set cycles to achieve a fixed finite depth of the nesting of the corresponding induction rule will have an unprovable clause set that becomes provable when the nesting depth is increased by one. Moreover, the result remains valid in extensions of clause set cycles that allow for induction parameters. However, the unprovability results in this section are more abstract than in the previous sections in the sense that we work over a much stronger background theory. We expect that providing more elementary unprovability results is not difficult; this is left as future work.
In the remainder of the section we will show the following result.
Theorem 60. Let k ∈ N, then there is a language L and an L ∪ {η} clause
The language of Peano arithmetic L PA consists of the function symbols 0/0, s/1, the infix function symbols +/2, */2, and the infix predicate symbol ≤/2. Let x and y be distinct variables, then we write (∃x ≤ y)ϕ as an abbreviation for (∃x)(x ≤ y ∧ ϕ) and similarly we write (∀x ≤ y)ϕ as an abbreviation for the formula (∀x)(x ≤ y → ϕ). An L PA formula is said to be bounded if all the quantifiers occurring in it are bounded as above. The Σ 0 , Π 0 and ∆ 0 formulas are the bounded formulas. The Σ n+1 (Π n+1 ) formulas are the formulas of the form (∃ x)ϕ ((∀ x)ϕ) where x is a possibly empty finite sequence of variables and ϕ is a Π n (Σ n ) formula.
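To illustrate why bounded formulas are computationally tame, the following Python sketch evaluates the abbreviations (∃x ≤ y)ϕ and (∀x ≤ y)ϕ in the standard model N by finite search. The helper names and the two example formulas (divisibility and primality) are illustrative assumptions and do not occur in the text.

```python
from typing import Callable

Predicate = Callable[[int], bool]


def bounded_exists(y: int, phi: Predicate) -> bool:
    """(exists x <= y) phi, i.e. (exists x)(x <= y and phi(x)), over N."""
    return any(phi(x) for x in range(y + 1))


def bounded_forall(y: int, phi: Predicate) -> bool:
    """(forall x <= y) phi, i.e. (forall x)(x <= y -> phi(x)), over N."""
    return all(phi(x) for x in range(y + 1))


def divides(d: int, n: int) -> bool:
    """Hypothetical bounded formula: (exists q <= n) d * q = n."""
    return bounded_exists(n, lambda q: d * q == n)


def is_prime(n: int) -> bool:
    """Hypothetical bounded formula: 2 <= n and (forall d <= n)(d | n -> d = 1 or d = n)."""
    return 2 <= n and bounded_forall(
        n, lambda d: (not divides(d, n)) or d == 1 or d == n
    )


if __name__ == "__main__":
    print([n for n in range(20) if is_prime(n)])
```

Every quantifier ranges over an initial segment of N, so the truth value of a bounded formula at given arguments is decided by a terminating search.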
We will prove the theorem above by providing a sequence of theories T 0 , T 1 , . . . such that the provably total recursive functions of T i are exactly those of the level 3 + i of the Grzegorczyk hierarchy, for i ∈ N, and over T 0 the Σ 1 formulas are exactly the ∃ 1 (L(T 0 )) formulas. Since the Grzegorczyk hierarchy is a strict hierarchy (see for example [Ros84]), we obtain for each level i ∈ N a quantifier-free L(T 0 ) formula ϕ(x, y), such that
For a definition of the Grzegorczyk hierarchy we refer the reader to [Ros84].
Definition 61. Let n ∈ N, then we denote by E n the n-th level of the Grzegorczyk hierarchy.

Definition 63. Let n ∈ N, then the theory Q + Σ n -IND is called IΣ n . The theory IΣ 0 is also called I∆ 0 .
There is a ∆ 0 definition of the exponential function such that the theory I∆ 0 proves the inductive properties of the definition of the exponential function, but I∆ 0 does not prove the totality of such a definition.
Proof. See [HP93, Section V.3].
In the following we will mainly work in a theory that extends I∆ 0 by a statement asserting the totality of the exponential function.
The theory I∆ 0 + EXP is also called elementary arithmetic and has various equivalent formulations, see [Bek05, Section 1.1]. In the following we will develop a particular formulation with a ∀ 1 axiomatization and in which the ∃ 1 formulas of the extended language are exactly the Σ 1 formulas.
Proof. Drop axiom Q3, replace axiom Q8 by the universal closure of the formulas x ≤ y → (∃z ≤ y)z + x = y and z + x = y → x ≤ y, and replace the induction
It is routine to check that the resulting theory is equivalent to I∆ 0 .
Now we will show that I∆ 0 has ∆ 0 definitions of Skolem functions of all ∆ 0 formulas. Later on we will introduce the corresponding Skolem functions in order to get rid of bounded quantifiers.
Definition 67 (Least number principle). Let ϕ(x, z) be a formula, then the least number principle for ϕ is given by
Definition 69. Let ϕ( x, y, z) be a ∆ 0 formula, then the formula D (∃z≤y)ϕ ( x, y, z) is given by
Lemma 70. Let ϕ( x, y, z) be a ∆ 0 formula, then I∆ 0 proves
We will now define the way in which we Skolemize ∆ 0 formulas.
We can now obtain a suitable formulation of I∆ 0 + EXP.
Lemma 73. There exists a ∀ 1 axiomatized conservative extension T of I∆ 0 +EXP such that every ∃ 1 (L(T )) formula is equivalent over T to a Σ 1 formula and every Σ 1 formula is equivalent over T to an ∃ 1 (L(T )) formula.
Proof. We consider a Π 1 formulation U of I∆ 0 . For each axiom (∀ x)ϕ of U where ϕ is ∆ 0 , T contains the axiom (∀ x)ϕ ∃ . Furthermore, T contains the axiom Exp ∃ [z/e(x, y)]. Finally, for each ∆ 0 formula ϕ( x, y, z), T contains the axiom (D (∃z≤y)ϕ ) ∃ [z/F (∃z≤y)ϕ ( x, y)]. Now obtain a ∀ 1 axiomatization by moving the remaining quantifiers outwards. By a model-theoretic argument it is straightforward to see that the resulting theory is conservative over I∆ 0 + EXP. It is straightforward to check that every ∆ 0 formula ϕ is equivalent in T to a quantifier-free L(T ) formula. Let ψ be a Σ 1 formula, then ψ = (∃ x)ϕ where ϕ is ∆ 0 . Hence, ψ is equivalent over T to the formula (∃ x)ϕ ′ where ϕ ′ is a quantifier-free formula that is equivalent over T to ϕ. Now let ψ be an ∃ 1 (L(T )) formula, then by [Hod97, there exists an equivalent unnested ∃ 1 (L(T )) formula of the form (∃ x)ϕ where ϕ is quantifier-free. Now we simply replace atoms of the form f ( u) = y, where f is either a Skolem symbol of a ∆ 0 formula or e, by the corresponding defining ∆ 0 formula. Hence, the resulting formula is a Σ 1 formula.
In the following we fix one such extension of I∆ 0 + EXP and call it EA.
Theorem 75 ( [Sie91]). The provably total recursive functions of the theory EA k are precisely those of the class E 3+k of the Grzegorczyk hierarchy.
Proof. See also the proof of Corollary 7.5 of [Bek97a].
We can reformulate the theories EA k as follows.
Proof. We proceed by induction on k and show If k = 0, then the claim follows trivially. Now assume the claim for k, then EA k is Π 2 axiomatized, hence by [Bek97a, Corollary 7.4] Furthermore, by [Bek05, Lemma 4.6] we have Since over EA the Σ 1 formulas are exactly the ∃ 1 (L(EA)) formulas, we obtain By the induction hypothesis we readily obtain
Since E k ⊊ E k+1 for all k ∈ N, we can now provide a proof of Theorem 60.
Proof of Theorem 60. Let k ∈ N, then there exists a function f : N → N such that f ∈ E k+4 \ E k+3 . Hence, there exists a Σ 1 formula ϕ(x, y) such that f (n) = m if and only if N |= ϕ(n, m) and Thus, by the construction of EA, there exists a quantifier-free L(EA) formula Hence, C := cls(EA + (∀y)(∀ z)¬ϕ ′ (η, y, z)) is a suitable clause set.
This result tells us that a mechanism that extends refutation by a clause set cycle so as to allow at most k-fold nested applications of the ∃ 1 parameter-free induction rule is strictly weaker than a mechanism that allows (k + 1)-fold nested applications of the ∃ 1 parameter-free induction rule. This naturally gives rise to the question whether we can separate a system that provides arbitrary nestings of the parameter-free ∃ 1 induction rule from a system that provides the parameter-free ∃ 1 induction schema. The following lemma shows that we need a different approach to resolve this question.
Hence the theory EA + ∃ 1 (L(EA))-IND is also Π 2 conservative over EA + L(EA)-IND R− . Thus the technique used above does not provide us with a clause set that separates both systems.
Nevertheless, we conjecture that the parameter-free ∃ 1 induction schema is in general stronger than the parameter-free ∃ 1 induction rule.
Conjecture 78. There exists a language L and an L ∪ {η} clause set D(η) such that ∃ 1 (L)-
The results in this section are less elementary than the results of Sections 3.3.1 and 3.3.2 in the sense that we work over the comparatively strong theory EA and the separation involves clause sets that express totality assertions. However, totality assertions are an important class of problems for AITP systems. In this sense the connection with the Grzegorczyk hierarchy is remarkable.

Idempotents in linear arithmetic
In the previous section we have introduced clause set cycles and we have given a characterization of refutation by a clause set cycle in terms of a logical theory. Moreover, we have shown two unrefutability results for clause set cycles. We have shown the second unrefutability result by anticipating the independence result of Theorem 53 for which a proof will be provided in this section. In Section 4.1 we introduce some preliminary notions and we carry out some syntactic simplifications on ∃ 1 formulas. In Section 4.2 we consider some properties of ∃ 1 formulas in the structures N and Z. Finally, in Section 4.3 we carry out the model theoretic construction.
We work in the setting of linear arithmetic, hence, unless stated otherwise, whenever we speak of a formula (sentence) we mean an L LA formula (sentence).

Preliminaries
In this section we mainly carry out some syntactic transformations that allow us to eliminate the function symbols p and 0 from ∃ 1 formulas. The absence of these symbols allows us to carry out certain embeddings of structures in Sections 4.2 and 4.3.
Definition 79. The theory V is axiomatized by the universal closure of the formulas where k ∈ N.
We will carry out these transformations in the very weak theory B + B1 + V. In a first step we will show that we can eliminate the symbol p from ∃ 1 formulas without increasing the quantifier complexity of ∃ 1 formulas. After that, we show that we can moreover eliminate to a certain extent the symbol 0 from ∃ 1 formulas, again without increasing the quantifier complexity.
In order to eliminate the symbol p from ∃ 1 formulas we proceed by replacing all the occurrences of the symbol p by the following definition of the predecessor function.
We can now factor the symbol p into the axiom B1 by replacing all the occurrences of p by the definition of the predecessor function.
Lemma 83. Let ϕ( x) be an ∃ 1 (L LA ) formula, then there exists a p-free ∃ 1 formula
Proof. Let ϕ be an ∃ 1 (L LA ) formula, then there exists an unnested ∃ 1 (L LA ) formula ψ such that ⊢ ϕ ↔ ψ, see for example [Hod97,. In particular, the symbol p occurs in ψ only in atoms of the form p(x) = y. Hence, we obtain the desired formula by replacing in ψ the atomic formulas of the form p(x) = y by D(x, y).
In the following we will eliminate the symbol 0 to a certain extent from ∃ 1 formulas in one variable. In order to simplify the arguments we will introduce some additional assumptions. Since we work in the context of the theory B we can by Lemma 13 assume without loss of generality that ground terms are numerals. Moreover, since equality is symmetric we will assume without loss of generality that atoms are oriented in such a way that whenever the atom contains a variable, then the left hand side of the atom contains a variable.
Let us start by introducing the notion of components, a class of ∃ 1 formulas that is particularly suitable to carry out the elimination of the symbol 0. Moreover, components will also be of use for the arguments in Section 4.2.
Definition 84 (Components). A component χ( x) is a formula of the form ∃ yC χ ( x, y), where C χ is a conjunction of literals.
We will distinguish between three types of literals: Those where both sides contain variables, those where only one side of the equation contains a variable and those where none of the sides contain a variable.
Definition 86. Let l be a literal of the form u ⊲⊳ v with ⊲⊳ ∈ {=, ≠}, then l is: ↑↑ if both u and v contain a variable, ↑↓ if u contains a variable and v is ground, and ↓↓ if both u and v are ground. We will combine this notation with a superscript + to indicate that the literal is positive and a superscript − to indicate that the literal is negative. We say that a ↑↓ literal is simple if it is of the form z ⊲⊳ k where z is a variable and k ∈ N, and complex otherwise.
Lemma 87. Let t be a term with Var (t) ≠ ∅, then there exists a 0-free term t ′ such that B + V ⊢ t = t ′ .
Proof. We proceed by induction on the structure of the term t. If t is a variable, then we are done by letting t ′ = t. If t is of the form s(u), then Var (u) ≠ ∅. Hence, we can apply the induction hypothesis to u in order to obtain a 0-free term u ′ such that B + V ⊢ u = u ′ . Thus, B + V ⊢ t = s(u ′ ) and we let t ′ = s(u ′ ). If t is of the form p(u), then we proceed analogously. If t is of the form u + v, then we need to consider several cases. If Var (u) = ∅, then Var (v) ≠ ∅ and we have B ⊢ u = k for some k ∈ N and therefore B + Hence, we can apply the induction hypothesis to u in order to obtain a 0-free term u ′ such that B + V ⊢ u = u ′ . Since Var (v) = ∅, there exists k ∈ N such that B ⊢ v = k. By multiple applications of (A5) followed by an application of (A4) we obtain B + V ⊢ t = u ′ + k = s k (u ′ ) + 0 = s k (u ′ ). Hence, t ′ := s k (u ′ ) is the desired 0-free term. If u and v contain variables, then by the induction hypothesis we obtain 0-free terms u ′ and v ′ such that B
By Lemma 14 and Lemma 87, it is straightforward to eliminate the symbol 0 from ↑↑ and ↓↓ literals. However, eliminating the symbol 0 from ↑↓ literals needs some more work. Let us start by observing that complex ↑↓ atoms can be split into several simple ones.
Lemma 88. Let u( z) be a p-free term with z = (z 1 , . . . , z l ) and k ∈ N, then
Proof. Work in B + B1. The "if" direction is obvious. For the "only if" direction assume u( z) = k and proceed by k-fold case analysis on the variables z. If z i = m i with 0 ≤ m i ≤ k for i = 1, . . . , l, then we have two cases. If u(m 1 , . . . , m l ) ≠ k, then the claim follows trivially, since this contradicts the assumption. Otherwise, if u(m 1 , . . . , m l ) = k, then we are done as well since z 1 = m 1 ∧ · · · ∧ z l = m l is a disjunct of the right side. Otherwise, there exists an i ∈ {1, . . . , l} and z ′ i such that z i = s k+1 z ′ i . Then let j be the index of the variable z j with the rightmost occurrence such that z j = s k+1 z ′ j for some z ′ j . Then we have u( z) = s k+1 (u ′ (z 1 , . . . , z j−1 , z ′ j , z j+1 , . . . , z l )) for some term u ′ . Hence, by Lemma 11.(i) we have u( z) ≠ k, which contradicts the assumption.
Furthermore, we can eliminate simple ↑↓ − literals at the expense of introducing several positive literals and an existential quantifier.
Proof. The "if" direction is obvious. For the "only if" direction assume z ≠ k and proceed by k-fold case analysis on z. If z = i with i < k, then we are done. The case where z = k contradicts the assumption and therefore we are done. If z = s k+1 z ′ for some z ′ , then we are done as well.
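The case analysis in this proof suggests an equivalence of the shape z ≠ k ↔ z = 0 ∨ · · · ∨ z = k − 1 ∨ (∃z ′ ) z = s k+1 (z ′ ). Assuming that shape (an assumption read off from the proof, not a quotation of the lemma), the following Python sketch checks the equivalence in the standard model N on a finite range. All names are illustrative.

```python
def negated_literal(z: int, k: int) -> bool:
    """Left-hand side: the simple negative literal z != k."""
    return z != k


def positive_expansion(z: int, k: int) -> bool:
    """Assumed right-hand side: z = 0 or ... or z = k-1 or (exists z') z = s^{k+1}(z')."""
    below = any(z == i for i in range(k))
    # In N a witness z' can never exceed z, so searching 0..z is exhaustive.
    above = any(z == witness + (k + 1) for witness in range(z + 1))
    return below or above


def agree_up_to(bound: int) -> bool:
    """Compare both sides for all naturals z, k < bound."""
    return all(
        negated_literal(z, k) == positive_expansion(z, k)
        for z in range(bound)
        for k in range(bound)
    )


if __name__ == "__main__":
    print(agree_up_to(40))
```

The right-hand side uses only positive literals and one existential quantifier, which is the trade described before the lemma.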
The elimination of the ↑↓ literals from a component χ(x 1 , . . . , x m ) consists of two major steps. In a first step we deal with all the ↑↓ literals except the simple ↑↓ literals of the form x i = k with k ∈ N and i ∈ {1, . . . , m}. In the second step we will deal with the remaining ↑↓ literals by making use of the observation that the truth value of a literal of the form x = k with k ∈ N becomes fixed when x is large enough.
Let us start by defining some measures that will be used to control the first step of the elimination procedure.
We will now provide some intermediate lemmas that allow us to eliminate a single literal.
Lemma 91 (Elimination of ↑↓ − literals). Let χ( x) be a p-free component containing a ↑↓ − literal, then there exist p-free components χ ′ 1 , . . . , χ ′ n such that
Proof. We first apply Lemma 88 in order to split the atom of the ↑↓ − literal. After that, we move the negations inwards and apply Lemma 89 to all the newly introduced literals of the form z ≠ k with k ∈ N. Now we move the newly introduced existential quantifiers outwards and possibly rename some bound variables. Finally, we distribute conjunctions over disjunctions exhaustively. Let χ ′ 1 , . . . , χ ′ n be the resulting components. Since we have introduced only existential quantifiers and positive literals, we have # − (χ ′ i ) < # − (χ).
The following lemma combines the previous lemmas in order to accomplish the first step of the elimination of the ↑↓ literals.
Lemma 94. Over B + B1 + V every ∃ 1 (L LA ) formula ϕ(x 1 , . . . , x n ) is equivalent to a disjunction of formulas of the form ⋀ i∈I x i = k i ∧ (∃ y)C( x, y), where I ⊆ [n] = {1, . . . , n} and C is a p-free 0-free conjunction of literals that contains only those variables x i such that i ∉ I.
Proof. Let χ( x) be a p-free component, then we proceed by induction on the lexicographic order ≺ on N 4 induced by ≤ and show that over B + B1 the component χ is equivalent to a disjunction of formulas of the form ⋀ i∈I x i = k i ∧ (∃ y)C( x, y), where I ⊆ [n] and C is a p-free conjunction of ↑↑ and ↓↓ literals that contains only those variables x i such that i ∉ I. Let #(χ) = (# − (χ), # ∃ (χ), # + complex (χ), # FV (χ)). If χ contains a ↑↓ − literal, then we apply Lemma 91 in order to obtain p-free and therefore we can apply the induction hypothesis to each of χ ′ 1 , . . . , χ ′ n in order to obtain the desired components. If χ contains a complex ↑↓ + literal, then we apply Lemma 92 in order to obtain p-free components χ ′ 1 , . . . , . . , n and therefore we can apply the induction hypothesis to χ ′ 1 , . . . , χ ′ n in order to obtain the desired components. Let χ(x) = (∃y 1 ). . . (∃y l )C χ . If χ contains a ↑↓ literal of the form x i = k i with i ∈ {1, . . . , n}, then let χ = (∃ y)C χ ( x, y) and Hence, we may apply the induction hypothesis to the component χ ′ . If χ contains a simple ↑↓ + literal y i = k, then we apply Lemma 93 in order to obtain a p-free component χ ′ (x) such that B + B1 ⊢ χ ↔ χ ′ , # − (χ ′ ) = # − (χ), and # ∃ (χ ′ ) < # ∃ (χ). Hence we have #(χ ′ ) ≺ #(χ) and therefore we can apply the induction hypothesis in order to obtain the desired components.
Now let ϕ(x 1 , . . . , x n ) be an ∃ 1 (L LA ) formula. By Lemma 85 the formula ϕ is equivalent over B + B1 to a disjunction of p-free components. Therefore, by the procedure above the formula ϕ is equivalent over B + B1 to a disjunction of formulas of the form ⋀ i∈I x i = k i ∧ (∃ y)C( x, y), where I ⊆ [n] and C is a p-free conjunction of ↑↑ and ↓↓ literals containing only those variables x i such that i ∉ I. Now we apply Lemma 14 to eliminate the ↓↓ literals from C and Lemma 87 to eliminate 0 from the ↑↑ literals of C.
In the next step we eliminate the remaining literals of the form x = k. This step relies on the observation that the truth value of these literals is fixed when x is large enough.
Proof. By Lemma 94 the formula ϕ is equivalent over B + B1 + V to a disjunction of the form ⋁ j=1,...,m ( ⋀ i∈I j x i = k i,j ∧ (∃ y)C j ( x, y) ), where for j = 1, . . . , m, I j ⊆ [n] and C j is a p-free 0-free conjunction of literals containing only those variables x i such that i ∉ I j . Let N = 1 + max{k i,j | j = 1, . . . , m, i ∈ I j }, then ϕ(s N (x 1 ), . . . , s N (x n )) is equivalent over B + B1 + V to the formula ⋁ j=1,...,m; I j =∅ (∃ y)C j ( x, y).
Finally, we obtain the desired formula by moving the ∃ quantifiers outwards over the disjunction.

Components in N and Z
In this section we will investigate some basic model-theoretic properties of ∃ 1 formulas in the structures N and Z.
Definition 96. Let M be an L LA structure and ϕ(x) a formula. We say that d ∈ M is a solution of ϕ in M if M |= ϕ(d).
Let θ(x 1 , . . . , x k ) be an atom, then it is obvious that θ is equivalent in Z to a linear equation of the form ∑ i=1,...,k a i x i = b with integers a 1 , . . . , a k , b. Hence a conjunction of atoms θ 1 (x 1 , . . . , x k ), . . . , θ m (x 1 , . . . , x k ) is equivalent over Z to an inhomogeneous system of linear equations of the form
A x = b, (I)
where A ∈ Z m×k and b ∈ Z m×1 . Now consider the corresponding homogeneous system
A x = 0. (H)
The solutions of the system (H) form a submonoid H of Z k with pointwise addition. Furthermore, assume that (I) has a particular solution i (p) , then the set of solutions of (I) is given by i (p) + H = {i (p) + h | h ∈ H}.
Lemma 97. Let χ(x) be a component with two solutions in Z, then for all n ∈ N there exists n ′ ∈ N with n ′ ≥ n such that Z |= χ(−n ′ ).
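The passage from atoms to linear equations can be made concrete with a small Python sketch. It assumes p-free terms built from 0, variables, s and +, encoded as nested tuples; this encoding and the function names are hypothetical choices made only for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical term encoding: ("zero",), ("var", name), ("s", t), ("add", t1, t2)
Term = tuple


def linearize(t: Term) -> Tuple[Counter, int]:
    """Return (variable coefficients, constant) of the linear polynomial denoted by t."""
    tag = t[0]
    if tag == "zero":
        return Counter(), 0
    if tag == "var":
        return Counter({t[1]: 1}), 0
    if tag == "s":
        coeffs, const = linearize(t[1])
        return coeffs, const + 1
    if tag == "add":
        c1, k1 = linearize(t[1])
        c2, k2 = linearize(t[2])
        return c1 + c2, k1 + k2
    raise ValueError(f"unknown term constructor: {tag!r}")


def atom_to_equation(u: Term, v: Term) -> Tuple[Dict[str, int], int]:
    """Rewrite the atom u = v as sum(a_i * x_i) = b with integers a_i and b."""
    cu, ku = linearize(u)
    cv, kv = linearize(v)
    names = sorted(set(cu) | set(cv))
    return {x: cu[x] - cv[x] for x in names}, kv - ku


if __name__ == "__main__":
    x, y, zero = ("var", "x"), ("var", "y"), ("zero",)
    # The atom s(x + y) = x + s(s(0))
    print(atom_to_equation(("s", ("add", x, y)), ("add", x, ("s", ("s", zero)))))
```

Collecting one such row per atom gives the matrix A and the vector b of the system (I).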
with n T (i) = (n i,0 , . . . , n i,k ) and i ∈ {1, 2} such that n 1,0 < n 2,0 . We start by considering the positive literals of χ. By the above the positive literals of χ are equivalent in Z to an inhomogeneous linear system with A ∈ Z l×(k+1) and b ∈ Z l×1 , where l is the number of positive literals of χ. Let us denote by I the set of solutions of (I) and by H the set of solutions of the homogeneous system. Then n (1) , n (2) ∈ I and therefore h 0 := n (1) − n (2) ∈ H. Hence m · h 0 + n (1) ∈ I for all m ∈ N. Now consider a negative literal p(x 0 , . . . , x k ) ≠ 0 of χ, where p is a linear polynomial in the variables x 0 , . . . , x k with coefficients in Z. Let q(m) := p((m · h 0 + n (1) ) T ), then q is a linear polynomial in one variable and moreover by the assumptions we have q(0) = p(n T (1) ) ≠ 0. Hence, there clearly is at most one k ∈ Z such that q(k) = 0.
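The argument can be traced on a toy instance. The Python sketch below uses a hypothetical component with one positive literal x0 + x1 = 3 and one negative literal x0 + 4 ≠ 0, together with two integer solutions n1 and n2; these data are assumptions chosen for illustration only. It checks that every point m · h0 + n1 solves the linear system, that q vanishes for at most one m, and that the first coordinates descend.

```python
from typing import List

Vector = List[int]

# Hypothetical data: positive literal x0 + x1 = 3, negative literal x0 + 4 != 0,
# and two solutions n1, n2 with n1[0] < n2[0].
A: List[Vector] = [[1, 1]]
b: Vector = [3]
n1: Vector = [0, 3]
n2: Vector = [1, 2]


def dot(u: Vector, v: Vector) -> int:
    return sum(a * c for a, c in zip(u, v))


def solves_system(x: Vector) -> bool:
    """Check A x = b, i.e. all positive literals."""
    return all(dot(row, x) == rhs for row, rhs in zip(A, b))


def p(x: Vector) -> int:
    """Linear polynomial of the negative literal p(x) != 0."""
    return x[0] + 4


def point(m: int) -> Vector:
    """The point m * h0 + n1 with h0 = n1 - n2 a solution of the homogeneous system."""
    h0 = [a - c for a, c in zip(n1, n2)]
    return [m * h + a for h, a in zip(h0, n1)]


def q(m: int) -> int:
    """q(m) = p(m * h0 + n1), a linear polynomial in m with q(0) != 0."""
    return p(point(m))


if __name__ == "__main__":
    zeros = [m for m in range(100) if q(m) == 0]
    solutions = [point(m) for m in range(100) if q(m) != 0]
    print(zeros, solutions[:6])
```

Since q is linear and q(0) ≠ 0, excluding its single possible zero still leaves infinitely many points whose first coordinate decreases without bound.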
We summarize the results of this section in the following proposition.
Proposition 98. Let ϕ(x) be a p-free ∃ 1 formula. There exists an n ∈ N such that if ϕ has n solutions in N, then there exists an infinite strictly descending sequence of integers (k i ) i∈N with Z |= ϕ(k i ) for all i ∈ N.
Proof. Let χ 1 , . . . , χ k be p-free components such that ⊢ ϕ ↔ ⋁ i=1,...,k χ i . Let n = k + 1 and assume that ϕ has n solutions in N. Then by the pigeonhole principle there is a component χ i 0 with two solutions in N and therefore χ i 0 has two solutions in Z. Finally, we apply Lemma 97 to χ i 0 .

A non-standard model
In this section we construct a family of non-standard structures for the language L LA and we make use of the results from Sections 4.1 and 4.2 in order to show that these structures are models of the theory
Let us start by introducing some terminology about the models of this theory. Since already the theory [B, Open(L LA )-IND R− ] proves B1 and V, the models of (B + B2 + B3) + ∃ 1 (L LA )-IND R− are composed of a copy of the natural numbers (the standard elements) and copies of the integers, which we call the non-standard elements. The elements of the models we construct below are pairs of the form n [m] = (m, n) ∈ N × Z such that m = 0 implies n ∈ N. If m = 0, then the element is a standard element, otherwise it is non-standard and belongs to the m-th copy of the integers. We call m the type of the element and n the value of the element. We start by defining an operation that will allow us to relate the types of the elements.
The structure M 0 is isomorphic to the standard model N. Since we are interested in constructing non-standard structures, we will consider mainly the structures M I with I ≥ 1.
Proof. Let ϕ(x) be an ∃ 1 formula and assume that ϕ is T -inductive. Since T is sound, we have N |= ϕ(x). Now consider an element n [m] ∈ M . If m = 0, then n ∈ N and by the observation above N |= ϕ(n). By the ∃ 1 -completeness of B we have B ⊢ ϕ(n) and therefore M I |= ϕ(n). Since M I |= n = n [0] we obtain M I |= ϕ(n [m] ). Now assume that m > 0. By Proposition 95 there exists a 0-free p-free formula ϕ ′ and an N ∈ N such that B + B1 + V ⊢ ϕ(s N (x)) ↔ ϕ ′ (x). Hence we have N |= ϕ ′ and therefore by Proposition 98 there is an infinite strictly descending sequence of integers (k i ) i∈N such that Z |= ϕ ′ (k i ) for i ∈ N. By Lemma 102 we In particular, we have M I |= ϕ(n [m] ).
Proof. An immediate consequence of Lemma 105 and Corollary 104.
Theorem 53 can now finally be obtained as an immediate consequence of Corollary 106.
Proof of Theorem 53. By Corollary 106 we can work with M 1 . Now observe that n·k [...]
Let us now consider whether some straightforward modifications of the background theory in Theorem 53 will improve the result. The following lemma shows that we do not strengthen the result of Theorem 53 by adding any ∀ 1 consequence of (B + B2 + B3) + ∃ 1 (L LA )-IND R− to the background theory.
Proof. The first part is obtained by a straightforward induction on n and by applying Lemma 44. The second part is an immediate consequence of the first part.
The next natural question to ask is whether removing the formulas B2 and B3 in Theorem 53 would weaken the result. The following result shows that removing the axiom B2 would indeed weaken the result.

Conclusion
Clause set cycles are a formalism introduced by the authors of this article in [HV20] for the purpose of giving an upper bound on the strength of a family of AITP systems based on the extension of a saturation theorem prover by a cycle detection mechanism, such as the n-clause calculus [KP13,Ker14]. In this article we have extended the analysis of clause set cycles that was begun in [HV20] by providing a logical characterization of refutation by a clause set cycle and by exhibiting concrete clause sets that are not refutable by a clause set cycle but that are refuted by induction for quantifier-free formulas.
In Section 3 we have identified several logical features of clause set cycles. Identifying these features has enabled us to give a characterization of the notion of refutation by a clause set cycle in terms of a logical theory. The characterization allows us to think of clause set cycles essentially as unnested applications of the parameter-free ∃ 1 η induction rule. In the light of this logical characterization we were able to reduce the task of finding clause sets that are not refuted by a clause set cycle to an independence problem.
Based on this characterization we have shown two unprovability results for clause set cycles. The first result (Corollary 49) exploits the fact that refutations by a clause set cycle only make use of η-instances of the inductive lemmas. In particular, we have shown that even the full induction schema subject to the η-restriction does not prove some atoms that can already be obtained by an unnested application of the open parameter-free induction rule. This shows that the η-restriction is very limiting. However, our second unprovability result (Corollary 55) does not rely on the η-restriction and thus shows that AITP systems based on clause set cycles have further limitations. In Section 4 we have developed the underlying independence result (Theorem 53). This independence result shows that the unprovability persists even when the induction rule is nested. We conjecture that this unprovability phenomenon is due to the absence of induction parameters and therefore also persists when the induction rule is replaced by the induction schema. This second unprovability result shows that clause set cycles fail to capture induction arguments that involve very simple generalizations.
The results in this article together with the results in [HV20] explain much about the situation of AITP systems based on clause set cycles in the logical landscape. We have summarized the current results as well as some conjectures in Figure 1. The figure depicts the refutational strength of various induction systems. The set of clause sets refuted by a system is described by an arc. The name of the system is inscribed near the top of the corresponding arc. The systems range over all first-order languages. The system CSC denotes refutation by a clause set cycle and NCC denotes the n-clause calculus as described in [HV20]. The points {• i | i = 1, 2, 3, 4, 5} represent clause sets whose positions are confirmed by the results in this article and [HV20]. In particular, • 1 corresponds to the clause set that witnesses [HV20, Corollary 5.8], • 2 corresponds to the clause set constructed in Section 3.3.1, and • 3 corresponds to the clause sets constructed in Section 3.3.2. The points • 4 , • 5 correspond to some of the clause sets mentioned in Theorem 60. The inclusion of Open-IND − in [∅, ∃ 1 -IND R− ] is shown by Lemma 111.
The dashed arc corresponding to the system ∃ 1 -IND − is positioned according to Conjecture 58. The points { * i | i = 1, 2} of Figure 1 represent clause sets that we conjecture to be at the respective positions. In particular, the point * 1 corresponds to the clause set mentioned in Conjecture 78. We would like to clarify the status of the point * 1 and the dashed arc corresponding to the system ∃ 1 -IND − , as this would contribute to the understanding of the role of induction parameters and induction rules in automated inductive theorem proving. Owing to recent advances in saturation-based theorem proving, research on automated inductive theorem proving has increasingly focused on the integration of induction into saturation-based theorem provers [Cru15, Cru17, KP13, Ker14, Wan17, EP20, RV19, HHK + 20]. We plan to carry out similar investigations for all these methods in order to develop a more global and unified view of induction in saturation-based theorem proving. In particular, these investigations will give rise to the analysis of the interaction of the induction principle with various mechanisms of saturation-based provers such as Skolemization, splitting, term orderings, and redundancy criteria.
The point * 2 in Figure 1 gives rise to a more general topic that is worth mentioning separately. On the one hand, it is computationally expensive for AITP systems to carry out even a small number of inductions, and on the other hand, the space of all possible induction formulas is very large. Hence AITP systems rely on heuristics to find induction formulas, such as restricting the overall shape of the considered induction formulas and drawing syntactical material for induction from the formulas generated during the proof search. For example, the n-clause calculus as described in [KP13,Ker14] only makes use of clause set cycles that appear as a subset of the clauses that are generated by the underlying saturation-based system. Such heuristics will not succeed in cases where a sufficiently non-analytic induction is required. Our technique for analyzing AITP systems as logical theories can deal with such heuristics only to a limited extent. For example, the notion of refutation by a clause set cycle completely ignores the fact that the n-clause calculus draws clause set cycles only from the generated clauses. Once the logical strength of most inductive theorem provers is known precisely enough, it will likely be necessary to investigate the fine-grained analyticity properties of the provers in order to get a better understanding of the consequences of restricting the degree of analyticity.