UvA-DARE (Digital Academic Repository) Herbrand's theorem as higher order recursion

This article examines the computational content of the classical Gentzen sequent calculus. There are a number of well-known methods that extract computational content from first-order logic but applying these to the sequent calculus involves first translating proofs into other formalisms, Hilbert calculi or Natural Deduction for example. A direct approach which mirrors the symmetry inherent in sequent calculus has potential merits in relation to proof-theoretic considerations such as the (non-)confluence of cut elimination, the problem of cut introduction, proof compression and proof equivalence. Motivated by such applications, we provide a representation of sequent calculus proofs as higher order recursion schemes. Our approach associates to an LK proof π of ⇒ ∃v F, where F is quantifier free, an acyclic higher order recursion scheme H with a finite language yielding a Herbrand disjunction for ∃v F. More generally, we show that the language of H contains all Herbrand disjunctions computable from π via a broad range of cut elimination strategies. © 2020 The Authors. Published by Elsevier B.V. This is an open access


Introduction
The property of being a valid first-order formula is intimately tied to the consideration of the ground, i.e., variable-free, instances of that formula. This connection is apparent in most, if not all, proofs of the completeness theorem which, in one way or another, rely on the construction of a term model. The phenomenon is plainly visible in Herbrand's theorem which states that a formula is valid if, and only if, there is a finite expansion (of existential quantifiers to disjunctions and universal quantifiers to conjunctions of instances) which is tautological. This feature of classical first-order logic is in contrast to both classical second-order logic, whose standard semantics goes beyond the ground instances of a countable language, and intuitionistic first-order logic, which exhibits a more complicated interaction between quantifiers and propositional connectives.
Proof-theoretically, the use of instances of a formula naturally leads to analytic, cut-free, proofs. Gentzen's mid-sequent theorem makes the close connection between Herbrand expansions and cut-free proofs apparent. Taking this perspective on the cut-elimination theorem, and thereby keeping the well-known complexity bounds in mind, shows that, in essence, cut-elimination consists of the computation of a Herbrand expansion. One may ask, however, whether it is possible to compute Herbrand expansions in a more direct way, circumventing the cumbersome process of cut-elimination. There are a number of formalisms that do just that, the historically first being Hilbert's ε-calculus [25] (see [30] for a contemporary exposition of the ε-theorems in English). In [17], Gerhardy and Kohlenbach adapt Shoenfield's variant [38] of Gödel's functional interpretation [18,6] to a system of pure predicate logic. A recent adaptation of the functional interpretation, utilising star types to interpret contraction, is given by Ferreira and Ferreira [14]. Related to proof nets is the work of Heijltjes [19] and McKinley [28], and a similar approach, in the formalism of expansion trees [29], can be found in [5]. A different method with similar aims is cut-elimination by resolution [9].
The present work is motivated by two follow-up questions: Which Herbrand expansions are implicit in (i.e., can be computed from) a sequent calculus proof with cut? What is a minimal amount of information needed to describe these expansions? These questions are closely related to the issue of (non-)confluence of cut elimination. For instance, the number of distinct Herbrand expansions computed by Gentzen-style cut-elimination can be non-elementary in the size of the starting proof [7]. In other words, the choice of reduction strategy affects which Herbrand expansion is computed by cut elimination. In light of this the motivating questions become a matter of representation, namely how to express the Herbrand expansions embedded in a proof with cut while abstracting away the propositional structure of the original proof and avoiding direct computation of cut-free proofs.
In this article, we provide a representation of Herbrand's theorem as languages of higher order recursion schemes. Higher order recursion schemes are a generalisation of regular tree grammars (which correspond to order-0 recursion schemes) to finite types. They have their origin in Park's program schemes [34] and are widely used for verification of higher-order functional programs [33].
More specifically, sequents in a given proof with cut are interpreted as non-terminals whose production rules follow the local instantiation structure of quantifiers. The type of a given non-terminal is completely determined by the quantifier complexity of formulae in the corresponding sequent in such a way that a sequent comprising Σ_n ∪ Π_n formulae is represented by a non-terminal of order n. These types correspond closely to the types arising in Shoenfield's version of Gödel's functional interpretation. Concerning production rules, cut corresponds to composition of non-terminals and contraction gives rise to non-determinism.
Our representation of Herbrand's theorem is specifically tailored for the classical sequent calculus in the sense that it remains faithful to the non-deterministic process of computing Herbrand expansions via (reductive) cut elimination. In this respect, we believe the present work marks the first method of Herbrand extraction that operates directly on sequent calculus proofs. The framework of higher order recursion schemes opens the door to applying techniques and results from formal language theory directly to structural proof theory. An example of the latter is an upper bound on the size (and, therefore, number) of Herbrand expansions which can be obtained via a broad array of cut elimination strategies. On the other hand, our adaptation of Beckmann's theorem on the length of β-reductions for the simply-typed λ-calculus [11] to language bounds on acyclic higher order recursion schemes may be of independent interest in formal language theory.
The main result of this article can be summarised as follows and was announced in [3].
Theorem 1.1. Let F be a quantifier-free formula and π a first-order proof of ∃v F in which cut-formulae are prenex Π_n or Σ_n. There exists an acyclic order n recursion scheme H with language L(H) such that: n+2 where |π| is the number of inference rules in π; iii) L(H) subsumes the Herbrand set extracted from any cut-free proof that can be obtained from π via a sequence of Gentzen-style cut reductions that always reduces to the weak (quantifier) side of a cut before the strong side.

Outline
We begin by rehearsing the sequent calculus and reductive cut elimination. Higher order recursion schemes are introduced in Section 3. Theorem 3.16 establishes the upper bound on the size of languages for acyclic recursion schemes by generalising Beckmann's theorem on the length of reduction sequences for simply typed λ-calculus [11]. Section 4 concerns our representation of sequent calculus proofs as higher order recursion schemes. Connections with the functional interpretation of classical logic are discussed and a correspondence between the two type hierarchies established (Theorem 4.4). The upper bound on language size stated in Theorem 1.1 is proved in Theorem 4.15. Section 5 provides a case study in which we analyse the Herbrand scheme for a proof of the pigeonhole principle. In Section 6 we establish the technical machinery necessary to relate reductive cut elimination to derivations in Herbrand schemes. The analysis of languages of Herbrand schemes in relation to reductive cut elimination is undertaken in Section 7, from which Theorem 1.1 follows. The article concludes with a discussion of the results and potential extensions.

Sequent calculus for classical first-order logic
Terms and formulae of first-order logic are defined as usual using the connectives ∧, ∨ and quantifiers ∀, ∃, as well as a selection of predicate and function symbols. We assume two sets of variable symbols, free variables, denoted α, β, etc., and bound variables, v, w, etc. Upper-case Roman letters, A, B, etc. denote formulae and upper-case Greek letters Γ, Δ, etc. range over sequents, namely finite sequences of formulae. We abbreviate by Γ, Δ the concatenation of Γ and Δ; and Γ, A is shorthand for Γ, {A}. The length of a sequent Γ is denoted |Γ|. As the order of the formulae in a sequent is often (though not always) unimportant, we will frequently identify sequents with (finite) multisets. We write Ā to denote the dual of the formula A obtained by de Morgan laws. Given a sequence of variable symbols v = (v_0, …, v_{k−1}) of length k, we write ∀v A and ∃v A as shorthand for ∀v_0 ⋯ ∀v_{k−1} A (resp. ∃v_0 ⋯ ∃v_{k−1} A). If t = (t_0, …, t_{k−1}) is a sequence of terms of the same length, A(t/v) is the formula obtained from A by replacing each v_i by the corresponding term t_i, where bound variables in A are renamed as necessary to avoid variable capture.
The following abbreviations will be used in later sections. For a formula A, we write A^qf to indicate that A is quantifier-free, and u(A) (resp. e(A)) for the number of consecutive universal (existential) quantifiers in A before encountering an existential (universal) quantifier.
For notational simplicity, we work in one-sided sequent calculus with explicit structural rules for weakening (w), contraction (c) and permutation (p), though the results presented apply equally to two-sided (so-called Gentzen-style) sequent calculi and either form of calculus without explicit structural rules. The axioms and rules of the calculus are laid out in Fig. 1. The quantifier introduction rules ∀_α and ∃_r introduce a sequence of quantifiers in one application. Applications of ∀_α are subject to an eigenvariable condition that if α = (α_0, …, α_{k−1}) then α_i does not occur in the sequent Γ, A for any i < k. In each inference rule, the formulae which are explicitly mentioned in the premise(s) (usually the right-most formula in the sequent) are said to be active in the rule applied. For example, A and B are active in the ∧ rule, both copies of A are active in contraction, and there are no active formulae in the weakening rule. Active formulae of cut where in each case p* denotes a sequence of permutation inferences p, a notation we also extend to the other structural rules.
Definition 2.1. A proof is a finite tree labelled by sequents obtained from the axioms and rules of the calculus with the restriction that cuts apply to prenex formulae only. Without loss of generality, we assume all proofs are regular, by which we mean:
1. every eigenvariable in the proof appears in exactly one ∀_α inference in the proof and does not occur in any sequent outside the sub-proof of this inference,
2. if A appears as the active formula of a quantifier inference ∀ (∃) then u(A) = 0 (resp. e(A) = 0).
A proof that does not contain the rule cut is cut-free. A proof in which every cut formula is quantifier-free is called quasi cut-free.
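The abbreviations u(A) and e(A) used in the regularity condition can be illustrated by a minimal sketch. The encoding of formulae as nested tuples and the names `leading`, `u` and `e` are assumptions made for illustration only and are not part of the formal development.

```python
# Hypothetical encoding: ("forall", var, body) or ("exists", var, body);
# anything else (here: a string) stands for a formula with no outermost quantifier.

def leading(formula, quantifier):
    """Count the consecutive outermost occurrences of the given quantifier."""
    count = 0
    while isinstance(formula, tuple) and formula[0] == quantifier:
        count += 1
        formula = formula[2]
    return count

def u(formula):
    """Number of leading universal quantifiers."""
    return leading(formula, "forall")

def e(formula):
    """Number of leading existential quantifiers."""
    return leading(formula, "exists")

# A = forall x forall y exists z P(x, y, z)
A = ("forall", "x", ("forall", "y", ("exists", "z", "P(x,y,z)")))
```

Under this encoding, condition 2 of Definition 2.1 requires that the active formula of a ∀ inference does not itself begin with a universal quantifier, i.e. that `u` returns 0 on it.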

We write π ⊢ Γ to express that π is a regular proof with Γ being the sequent appearing at the root of π. EV(π) denotes the set of eigenvariables in a proof π, and for sequences α = (α_0, …, α_{k−1}) and t = (t_0, …, t_{k−1}) of variable symbols and terms, π(t/α) is the result of replacing throughout the proof π each occurrence of the variable symbol α_i by the term t_i.

Cut reduction and normal forms
The standard cut reduction and cut permutation steps are given in Figs. 2 and 3. For the sake of a concise presentation, the axioms and rules are stated with implicit permutation in place. We assume all the proofs drawn in Figs. 2 and 3 are regular. Hence, in the case of contraction reduction where the sub-proof π_1 is duplicated it is assumed that the eigenvariables are renamed in the copy, which is emphasised by annotating the sub-proof with an asterisk, i.e. π*_1. In the two reductions of Fig. 3, r represents an arbitrary unary or binary inference rule; the binary inference permutation rule applies, for example, with r = cut.
Fig. 2. One-step cut reduction rules.
For proofs π and π′ we write π ⇝ π′ to express that π′ is obtained from π by application of a reduction or permutation rule to a sub-proof of π, and let ⇝* denote the reflexive transitive closure of ⇝. If π ⇝ π′ then the reduced cut either no longer exists, is replaced by cuts on formulae with either lower logical complexity or fewer applied contractions, or is permuted to a sub-proof. In any given proof there may, however, be many cuts and eliminating one can (through duplicating a sub-proof) result in introducing several copies of other cuts. To obtain a cut-free proof, it is necessary to provide a (terminating) cut elimination strategy, i.e. a procedure that given any proof π ⊢ Γ induces a sequence of cut reduction and permutation steps π ⇝* π′ such that π′ ⊢ Γ and the rule cut is not used in π′.

Theorem 2.2 (Gentzen's Hauptsatz). There is a cut elimination strategy that transforms any proof in first-order logic to a cut-free proof.
There are many cut elimination strategies such as the top-most reduction strategy or the elimination of the cut with highest logical complexity. Different strategies provide different cut-free proofs, commonly also referred to as normal forms. In fact, there exist proofs with infinitely many normal forms (see e.g. [41, Example 2.1.3]). We now turn to the relationship between cut elimination and Herbrand's theorem.

Herbrand's theorem and cut elimination
Herbrand's theorem is considered a classic result in proof theory. It can be thought of as reducing validity in first-order logic to validity in propositional logic. From the modern perspective it can also be seen as extracting computational content from first-order proofs. A simple case of the theorem is the following.

Theorem 2.3 (Herbrand's theorem). A formula ∃v A^qf is valid if and only if there exists a finite set of sequences of terms t_0, …, t_k such that the disjunction A(t_0/v) ∨ ⋯ ∨ A(t_k/v) is valid.
If a formula ∃v A^qf is valid then any set of terms {t_0, …, t_k} that validate the disjunction is called a Herbrand set, and the disjunction itself a Herbrand disjunction for the formula.
Herbrand's theorem pre-dates Gentzen's Hauptsatz but the latter readily provides an instructive proof of the theorem: Suppose ∃v A^qf is valid and fix a quasi cut-free proof π ⊢ ∃v A. It is possible to permute the rules applied in π so that no quantifier inference occurs above a purely propositional rule (Gentzen's mid-sequent theorem [15]). Once the proof is partitioned into a propositional part and a quantifier part, the terms that validate the formula can be directly read off from the mid-sequent, the sequent separating the two parts.
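As an illustration of Herbrand sets, consider the valid formula ∃v (¬P(v) ∨ P(f(v))); this formula, the tuple encoding of ground formulae and all function names below are assumptions chosen for the example rather than notions from the text. The sketch checks by brute force over truth assignments that the set {c, f(c)} yields a valid Herbrand disjunction, whereas {c} alone does not.

```python
from itertools import product

# Ground formulae in negation normal form, as nested tuples:
#   ("atom", P, t), ("natom", P, t), ("or", A, B), ("and", A, B)
# Ground terms are plain strings such as "c" or "f(c)".

def atoms(formula):
    """Collect the ground atoms (predicate, term) occurring in a formula."""
    if formula[0] in ("atom", "natom"):
        return {(formula[1], formula[2])}
    return atoms(formula[1]) | atoms(formula[2])

def holds(formula, valuation):
    """Evaluate a ground formula under a truth assignment to its atoms."""
    tag = formula[0]
    if tag == "atom":
        return valuation[(formula[1], formula[2])]
    if tag == "natom":
        return not valuation[(formula[1], formula[2])]
    left, right = holds(formula[1], valuation), holds(formula[2], valuation)
    return (left or right) if tag == "or" else (left and right)

def is_tautology(formula):
    """Check propositional validity by enumerating all truth assignments."""
    keys = sorted(atoms(formula))
    return all(holds(formula, dict(zip(keys, bits)))
               for bits in product([False, True], repeat=len(keys)))

def instance(t):
    """A(t/v) for the example matrix A(v) = not P(v) or P(f(v))."""
    return ("or", ("natom", "P", t), ("atom", "P", f"f({t})"))

def herbrand_disjunction(terms):
    """The disjunction A(t_0/v) or ... or A(t_k/v) for a non-empty list of terms."""
    disjunction = instance(terms[0])
    for t in terms[1:]:
        disjunction = ("or", disjunction, instance(t))
    return disjunction
```

With these definitions, `is_tautology(herbrand_disjunction(["c", "f(c)"]))` holds because the two instances share the complementary literals P(f(c)) and ¬P(f(c)), while the single instance for `["c"]` is falsified by making P(c) true and P(f(c)) false.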
Herbrand's original statement is much more general than that stated above and applies to any formula of first-order logic thanks to Herbrandisation, the dual notion of Skolemization. Given an arbitrary formula A, by introducing suitable constant and function symbols it is possible to remove universal quantifiers in A and obtain a Σ_1 prenex formula which is equi-valid to A. Herbrandisation can also be applied to a proof of a sequent Γ, transforming it to a proof of the Herbrandisation of Γ.
If a Herbrand set (disjunction) is obtained via cut elimination it is customary to refer to it as a Herbrand set (disjunction) of the proof. Note that these are not unique: different reduction strategies can lead to non-elementarily many pairwise distinct Herbrand disjunctions [7]. For both computing and representing Herbrand disjunctions it is therefore desirable to bypass cut elimination. There have been a number of successful approaches such as via Herbrand nets [28], proof forests [19], expansion trees with cut [5] and functional interpretation [17]. In the next section we introduce a fresh approach based on higher order recursion schemes. The aim of this approach is twofold. On the one hand, we wish to provide a representation of Herbrand's theorem specifically tailored for the classical sequent calculus which is faithful to the non-deterministic process of computing Herbrand expansions via (reductive) cut elimination. On the other hand, the framework of higher order recursion schemes opens the door to applying techniques and results from formal language theory directly to structural proof theory.

Recursion schemes
We begin this section by introducing the type system and terms that will be used throughout the paper. In Sections 3.2 and 3.3, higher order recursion schemes for this type hierarchy are introduced. The upper bounds on language size we establish in Theorem 3.16 allow us to deduce the numerical bound claimed in Theorem 1.1. The association of proofs with recursion schemes is given in Section 4.

Types and terms
The type system we utilise extends the hierarchy of simple types (over a type of individuals ι) by pair types and two additional type constants.These are the unit type, denoted , and a type ς of (stacks of) substitutions, elements of which are finite sequences of pairs (α, r) where α and r are elements of some (and the same) type.We are interested specifically in the case that α is a constant symbol (of simple type) from a particular ranked alphabet Σ, and refer to the type ς as the type of substitution stacks (over Σ), or simply Σ-substitutions.
The informal reading behind the type ς is that of an accumulator for a sequence of substitutions that are generated by reading a particular thread through a formal proof: when a witness to an existential quantifier is encountered along such a thread, the witness is outputted accompanied by the current stack of substitutions.The substitutions are not evaluated at the formal level but recorded as an element of ς.
We begin with a formal definition of the types and conventions for their representation, followed by ranked alphabets and the recursive definition of (typed) terms including the precise form of inhabitants of the type of substitution stacks.
Definition 3.1. The types are defined in the following way.
• ι is a type, called the type of individuals.
• is a type, called the unit type.
• ς is a type, called the type of substitution stacks.
A type formed without reference to ς is called basic, and one formed only out of ι and → is simple. The type ι and the unit type are referred to collectively as ground types and any type that is not a function type is called prime. The sequence types are the types of the form ι^n for any n, where ι^0 is the unit type and ι^{n+1} = ι × ι^n. The set of all types is denoted Type.
We follow the convention that the two type forming operations × and → associate to the right, and that → binds more strongly than ×. Every type ρ can thus be written in the form ρ = ρ_1 → ⋯ → ρ_k → co(ρ), where co(ρ) is a prime type. Given such a decomposition of ρ we refer to co(ρ) as the co-domain of ρ, to k as the arity of ρ, and to ρ_i (1 ≤ i ≤ k) as the i-th domain of ρ. Note that the co-domain of a ground type or pair type is the type itself.
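The decomposition of a type into its domains and co-domain can be sketched as follows. The class names `Base`, `Pair` and `Arrow`, and the string tags for the type constants, are hypothetical choices for this illustration; the sketch only covers the notions of arity, co-domain and sequence type, not the order of a type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Base:
    name: str              # assumed tags: "iota", "unit" or "stack"

@dataclass(frozen=True)
class Pair:
    left: object
    right: object

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

IOTA, UNIT = Base("iota"), Base("unit")

def decompose(rho):
    """Return ([rho_1, ..., rho_k], co(rho)), where co(rho) is not a function type."""
    domains = []
    while isinstance(rho, Arrow):
        domains.append(rho.dom)
        rho = rho.cod
    return domains, rho

def arity(rho):
    """Number of domains in the decomposition of rho."""
    return len(decompose(rho)[0])

def sequence_type(n):
    """iota^n: the unit type for n = 0 and iota x iota^(n-1) otherwise."""
    return UNIT if n == 0 else Pair(IOTA, sequence_type(n - 1))

# iota -> (iota -> iota) -> iota^1, with -> associating to the right
example = Arrow(IOTA, Arrow(Arrow(IOTA, IOTA), sequence_type(1)))
```

For `example` the decomposition has arity 2, domains ι and ι → ι, and co-domain the pair type ι^1; ground and pair types are returned unchanged as their own co-domain, in line with the remark above.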
The order of a type ρ generalises the usual definition of order for simple types. Motivated by later technicalities, however, it is convenient to assume that any function type whose co-domain is the unit type, which in our Herbrand schemes will always be observationally equivalent to a single constant function, has order 0.
Definition 3.2 (Order). The order of a type ρ, ord(ρ), is defined as follows.
Given an alphabet A = ⟨S, λ⟩, we write α^ρ ∈ A if α ∈ S and λ(α) = ρ, and hence frequently identify A with the set {α^{λ(α)} | α ∈ S} of symbols with type annotations. For alphabets A = ⟨S, λ⟩ and B = ⟨S′, λ′⟩, we write A ⊂ B if S ⊆ S′ and λ = λ′ ↾ S. In case A and B are disjoint, A ∪ B denotes the alphabet formed by the union of A and B, namely ⟨S ∪ S′, λ ∪ λ′⟩. The empty alphabet is denoted ∅.
Definition 3.4 (Terms and substitutions). Fix alphabets Σ ⊂ A where Σ is simple. The A-terms over Σ (henceforth A-terms) and the types they inhabit are defined inductively as follows, where r : ρ expresses that r is an A-term of type ρ.

1. The unit term is an A-term of unit type.
2. If α^ρ ∈ A then α is an A-term of type ρ.
3. If r : ρ and s : σ then ⟨r, s⟩ is an A-term of type ρ × σ.
4. If r : σ → τ and s : σ then rs is an A-term of type τ.
5. ⊥ is an A-term of type ς.
6. If a : ς and r : ρ, and α^ρ ∈ Σ then [α ← r]a is an A-term of type ς.
7. If r : ρ, a : ς and ρ is basic then r • a is an A-term of type ρ.
Note that λ-abstraction is not present in the term calculus, so the existence of terms of function type depends on the symbols of A.
In addition to the notation r : ρ used above, we may also write r^ρ to express that r is an A-term of type ρ. We drop mention of Σ and A if they can be inferred from the context or are not important to the given setting, in which case A-terms are referred to simply as terms. Terms arising from clauses 1, 2 and 5 are called constants; terms arising from 3 and 4 are called pairs and applications respectively; terms of type ς are called substitution stacks; and terms of the form in 7 are called explicit substitutions (or simply substitutions if there is no cause for confusion). A basic term is any term constructed via the rules 1 to 4 only, i.e. a B-term for some basic alphabet B. A term of a sequence type is called a sequence. Application is assumed to associate to the left, and pairing and the formation rule for substitution stacks both associate to the right.
The sub-term relation is defined as usual over the basic terms, and is extended to terms containing substitutions by defining the sub-terms of ⊥ to be {⊥}, the sub-terms of a = [α ← r]b to be a and any sub-term of r or b, and the sub-terms of r = s • a to be r and any sub-term of s or a. Thus the basic terms are precisely those terms that do not have a substitution stack as a sub-term.
Given a finite sequence of terms (r_i : ρ_i)_{i≤k}, let ⟨r_0, r_1, …, r_k⟩ be the term r_0 if k = 0 and, otherwise, the pair ⟨r_0, ⟨r_1, …, r_k⟩⟩.
The order of a term is the order of its type.
Proposition 3.5. If Σ ⊂ Σ′ are simple alphabets and A is an alphabet extending Σ′, then every A-term over Σ is an A-term over Σ′.
In addition to the term-level explicit substitutions, there is of course the usual operation of substituting given symbols by terms of corresponding type which we refer to as implicit substitution. Explicit substitutions can be interpreted as implicit substitutions by reading terms r • a as the image of r under the (implicit) substitution described by a, a process we call evaluation. The following definitions explicate these two operations. Fix an alphabet A.
Definition 3.6 (Implicit substitution). For A-terms r : ρ, t_0 : τ_0, …, t_k : τ_k and distinct symbols α_0^{τ_0}, …, α_k^{τ_k} ∈ A, the term r(t/α) is the A-term given by simultaneously replacing every occurrence of α_i (for i ≤ k) in r by t_i, defined recursively on the structure of r. If the choice of α can be inferred from context, we write r(t) in place of r(t/α).
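Simultaneous substitution on basic terms can be sketched as below. The representation of terms (symbols as strings, applications and pairs as tagged triples) and the name `substitute` are assumptions for illustration; terms containing substitution stacks are deliberately left out of this sketch.

```python
# Hypothetical encoding of basic terms:
#   a symbol is a string; ("app", r, s) is the application rs; ("pair", r, s) is the pair of r and s.

def substitute(term, mapping):
    """Simultaneously replace each symbol in `mapping` by its image.

    The images are inserted as they are and are not themselves rewritten,
    which is what distinguishes simultaneous from iterated substitution.
    """
    if isinstance(term, str):
        return mapping.get(term, term)
    tag, left, right = term
    return (tag, substitute(left, mapping), substitute(right, mapping))

# r = f alpha <beta, alpha>
r = ("app", ("app", "f", "alpha"), ("pair", "beta", "alpha"))
result = substitute(r, {"alpha": "beta", "beta": "c"})
```

Here `result` is f β ⟨c, β⟩: the occurrences of α become β, and those new occurrences of β are not replaced by c.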
Definition 3.7 (Evaluating substitutions). Given an A-term r and a substitution stack a : ς over some simple alphabet Σ ⊂ A, the evaluation of r relative to a is the A-term over Σ given by The evaluation of r is the term r^• given by recursively evaluating relative to each substitution in r, namely evaluation leaves basic terms unchanged, commutes with application and pairing, is defined by substitution stacks, and by (r Note that the evaluation of a substitution stack on a term is well-defined due to the typing constraints on their formation.
An alphabet generally specifies a set of symbols which are associated with certain re-write rules in a recursion scheme. In this context, an explicit substitution acts as a delayed substitution which is not evaluated until no further re-writes to sub-terms are possible. For instance, over the alphabet The final term evaluates to e • e. Attempting to read the explicit substitution implicitly leads also to the derivation
Lemma 3.8. If r : ρ is a Σ-term for some simple alphabet Σ and ρ is a basic type then r^• is a basic Σ-term of type ρ.
Proof. All substitution stacks that may occur in a term of basic type built from an alphabet of simply-typed symbols must be within the context of an explicit substitution. As evaluation replaces every explicit substitution by an implicit one, the result is a basic term of the same type.
Lemma 3.9. If r and a = [α ← s]b : ς are A-terms such that α does not occur in r, then r^a = r^b.
Definition 3.10 (Σ-length). Given alphabets Σ ⊂ A and an A-term s, the Σ-length of s, written |s|_Σ, is the number of occurrences of symbols in s that are not Σ-terms. In particular, the Σ-length of a Σ-term is 0 and if Σ is the empty alphabet then the Σ-length of any term is the number of leaves in the tree representation of the term not labelled by the unit term.
Notational conventions. Symbols ρ, σ and τ (also with indices) range over types. We commonly notate alphabets by upper-case Roman symbols in calligraphic typeface: A, B, etc., though Greek symbols Σ and Σ′ will be used for simple alphabets. Sans-serif typeface (f, F, s, S, etc.) and lowercase Greek symbols α, β, etc. range over elements of ranked alphabets, with the latter particularly used for constants of simple type. In the Herbrand schemes we introduce in Section 4, the constant symbols of simple type will be precisely the eigenvariables of sequent calculus proofs, hence our use of the same symbols. Italicised letters r, s, t, R, S, etc. range over terms and a, b over substitution stacks, i.e. terms of type ς.

Higher order recursion schemes
Higher order recursion schemes (HORS) are a generalisation of regular and context-free grammars to the simple type hierarchy. Their origin lies in Park's program schemes [34] from the late 1960s. More recently, HORS have found notable applications in the verification of higher-order functional programs [31,26] (see, e.g., [33] for an overview).
In this section we recall the notion of higher order recursion schemes, which we generalise to the type system introduced above. We establish bounds on the size of languages of acyclic HORS (Corollary 3.17), which will be utilised later to deduce upper bounds on the length of Herbrand disjunctions.
Definition 3.11 (Higher order recursion scheme). A (non-deterministic) higher order recursion scheme, or simply recursion scheme, is a tuple R = ⟨Σ, N, S, P⟩ where Σ is a simple alphabet, N is an alphabet of non-terminals disjoint from Σ, S ⊆ N is a designated finite set of starting symbols of sequence type, and P is a set of pairs (F^ρ, t), called production rules, such that F^ρ ∈ N and t : co(ρ) is a (Σ ∪ N ∪ {x_1^{ρ_1}, …, x_k^{ρ_k}})-term over Σ where each x_i is a fresh symbol not in N and ρ_i is the i-th domain of ρ. A production rule (F^ρ, t) where the arity of ρ is k is written as F x_1 ⋯ x_k →_R t. A non-terminal F ∈ N of R is determined if there is a unique production rule (F^ρ, t) in P. By an R-term we mean a (Σ ∪ N)-term over Σ. The order of R is the supremum over orders of the types of non-terminals of R.
Notice that we do not require that R contains only finitely many non-terminals, nor that the set of start symbols is non-empty.This is for technical convenience as it allows us to consider the recursion schemes of the next section as finitely generated 'sub-schemes' of a single infinite recursion scheme.Moreover, higher order recursion schemes are traditionally presented in the context of simple types, wherein start symbols are all of type ι (and indeed a single start symbol suffices) and production rules have the form F x → t with t : ι.The above definition is a direct extension of recursion schemes that accommodates non-trivial prime types.
A given non-terminal may be assigned multiple production rules, leading to non-determinism. To simplify presentation of production rules in this case we adopt the convention of writing
Definition 3.12 (Derivations and language). Let R = ⟨Σ, N, S, P⟩ be a higher order recursion scheme. We extend the relation →_R to a relation on R-terms defined by setting r →_R s if either
• r = F r_1 ⋯ r_k for some F^ρ ∈ N with arity k and there exists a production rule there exists a derivation of s from r, and s is derivable in R if s is derivable from some S ∈ S. The language of R, written L(R), is the set of pairs (S, t) such that S ∈ S, t is a basic Σ-term and S →*_R t.
Definition 3.13. Let R = ⟨Σ, N, S, P⟩ be a higher order recursion scheme. R is finite if N and P are both finite sets, and is acyclic if there exists a transitive, irreflexive relation < on N such that for every production rule F x →_R t and every non-terminal G occurring in t, G < F.

Lemma 3.14. A finite acyclic recursion scheme induces a finite language.
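To make the notions of derivation, non-determinism and acyclicity concrete, the following minimal sketch enumerates the terminal terms derivable in a small acyclic scheme. Everything in it is an assumption made for illustration: the tuple encoding of terms, the helper names, the example scheme with non-terminals S, Twice and D, and the simplifications that terminals take only ground-type arguments and that non-terminals are rewritten only when fully applied. It also returns plain terms rather than pairs (S, t), and ignores pair types and substitution stacks.

```python
# Terms: a symbol is a string; ("app", r, s) is the application rs.

def spine(term):
    """Split an application h a1 ... an into its head h and [a1, ..., an]."""
    args = []
    while isinstance(term, tuple):
        _, term, arg = term
        args.append(arg)
    return term, args[::-1]

def build(head, args):
    """Apply head to the arguments from left to right."""
    for arg in args:
        head = ("app", head, arg)
    return head

def app(*terms):
    return build(terms[0], terms[1:])

def substitute(term, env):
    if isinstance(term, str):
        return env.get(term, term)
    return ("app", substitute(term[1], env), substitute(term[2], env))

def language(term, rules):
    """All terminal-only terms derivable from `term` (terminates for acyclic schemes)."""
    head, args = spine(term)
    if head in rules:
        results = set()
        for params, body in rules[head]:          # one branch per production rule
            assert len(args) >= len(params), "non-terminal must be fully applied"
            env = dict(zip(params, args))
            rest = args[len(params):]
            results |= language(build(substitute(body, env), rest), rules)
        return results
    results = {head}                               # terminal head: rewrite each argument
    for arg in args:
        results = {("app", f, x) for f in results for x in language(arg, rules)}
    return results

# S -> Twice D a | Twice D b,   Twice f x -> f (f x),   D x -> d x x
RULES = {
    "S": [((), app("Twice", "D", "a")), ((), app("Twice", "D", "b"))],
    "Twice": [(("f", "x"), app("f", app("f", "x")))],
    "D": [(("x",), app("d", "x", "x"))],
}
```

The ordering D < Twice < S witnesses acyclicity of this example, and `language("S", RULES)` contains exactly the two terms d (d a a) (d a a) and d (d b b) (d b b), one for each production rule of the start symbol.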
An upper bound on the size of the language of acyclic recursion schemes can be obtained by reducing the problem to the length of reduction sequences for the simply-typed λ-calculus. Bounds on normalisation in the simply-typed λ-calculus have been given by Schwichtenberg [36] and improved to exact bounds by Beckmann [11]. In the following we use Beckmann's result to obtain concrete bounds for acyclic recursion schemes. Let 2_0^n = n and 2_{k+1}^n = 2^{2_k^n}, and extend the length function of the previous section to include λ-abstractions by setting |λx s|_Σ = |s|_Σ + 1.
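The iterated exponential just defined grows very quickly, which a short sketch makes tangible. The function name `tower` and its argument order are assumptions for illustration.

```python
def tower(k, n):
    """Compute 2_k^n, where 2_0^n = n and 2_{k+1}^n = 2 ** (2_k^n)."""
    value = n
    for _ in range(k):
        value = 2 ** value
    return value
```

For instance, `tower(0, 5)` is 5, `tower(1, 3)` is 8 and `tower(2, 2)` is 16; already `tower(3, 3)` is 2^256.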
Theorem 3.15 (Beckmann [11]). Let t be a term in the simply-typed λ-calculus over a simple alphabet Σ. The length of any β-reduction sequence starting from t is bounded by where d(t) denotes the maximum order among sub-terms of t.
Beckmann's bound still applies if t is an arbitrary λ-term over the calculus of Σ-terms given in Definition 3.4 subject to the restriction that the unit term is the only Σ-term of unit type (a necessary restriction due to our requirement that function types whose co-domain is the unit type have order 0). Non-deterministic reductions can also be incorporated via a fresh operator | and permitting β-reductions of the form (λx.
In this case the length and the function d are given by |s|t|_Σ = max{|s|_Σ, |t|_Σ} and d(s|t) = max{d(s), d(t)}. Finally, we wish to allow for so-called η-long reductions, i.e., reductions (λx where s is not an abstraction. Provided that only η-long reductions are permitted and each counts as one step in a β-reduction sequence, Beckmann's bound holds with the analogous change to the length function: From these observations we may deduce the following result. We restrict ourselves to recursion schemes over basic types (i.e. without substitution stacks) as this will suffice for later use.
Theorem 3.16. Let R = ⟨Σ, N, S, P⟩ be a finite acyclic order n recursion scheme such that every non-terminal has basic type, and for every production rule
Proof. Let R = ⟨Σ, N, S, P⟩ be an order n recursion scheme fulfilling the requirements in the statement. Without loss of generality we may assume that S is a singleton, that every non-terminal is associated with at least one production rule, and that X is the alphabet of variable symbols disjoint from both Σ and N such that every term occurring in a production rule in R is a (Σ ∪ N ∪ X)-term.
Fix an enumeration F_1^{ρ_1}, …, F_N^{ρ_N} of the non-terminals of R according to a total ordering (<) witnessing acyclicity of R. We may assume that F_1 is the start symbol. Let Y = {y_1^{ρ_1}, y_2^{ρ_2}, …, y_N^{ρ_N}} be a set of fresh variable symbols of marked type. We define by recursion a sequence s_1, …, s_N of well-typed λ-terms all of type ρ_1 such that s_i contains only the variables y_{i+1}, …, y_N free and the length of every derivation from F_1 which only re-writes non-terminals F_j for j ≤ i is bounded by the length of the longest β-reduction sequence starting from s_i. Suppose {F_i x → T_j : j ≤ m} is the set of production rules associated to F_i in R. For each j ≤ m, let t_j = T_j(y_{i+1}, …, y_N / F_{i+1}, …, F_N) be the (Σ ∪ X ∪ Y)-term resulting from T_j by substituting the non-terminals F_{i+1}, …, F_N by variables y_{i+1}, …, y_N respectively. It follows that ) and the maximal order among sub-terms of s_N is no greater than n + 1. Every R-derivation from F_1 can be replicated as a sequence of one-step η-long β-reductions starting from s_N, the length of which, by Beckmann's bound, is no greater than 2
As a corollary we obtain bounds on the size of languages.
Corollary 3.17.Let R and k be as in the previous theorem and suppose every non-terminal in R is associated at most two production rules.Then the size of L(R) is bounded by Proof.Given a recursion scheme R all terms in L(R) can be derived via the leftmost reduction strategy.By the previous theorem, the length of these derivations is bounded by 2
The bound given in Corollary 3.17 is optimal in the parameter n as the next lemma demonstrates.
Lemma 3.18. Let Σ be the ranked alphabet {a ι , b ι , d ι→ι→ι→ι }. There exists a sequence of acyclic higher order recursion schemes
Proof. It suffices to translate Beckmann's lower bounds from [11] to the context of recursion schemes. Define τ 0 = ι and τ i+1 = τ i → τ i for each i < ω. So τ i has order and arity i for each i. Fix n > 0. The recursion scheme R n comprises a single start symbol S n : ι and a non-terminal F i : τ i for each i ≤ n. The production rules are Requirements 1-3 are clearly satisfied. To deduce 4, observe that applying deterministic production rules only, F 0 , where X (k) denotes the k-fold iteration of X. Thus we see that L(R n ) is the set of complete binary trees of height 2 1  n + 1 with each leaf and inner node labelled by either a or b, i.e.
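The claim in the proof of Lemma 3.18 that τ i has order and arity i can be checked mechanically. The following is a minimal sketch, assuming the standard definition of order (a base type has order 0 and σ → τ has order max(order(σ) + 1, order(τ))), which is not restated in this section; the tuple encoding of types and the names `order`, `arity` and `tau` are illustrative only.

```python
IOTA = "iota"  # the base type of individuals


def arrow(src, dst):
    """Encode the function type src -> dst as a tagged tuple (illustrative encoding)."""
    return ("->", src, dst)


def order(t):
    """Order of a simple type, assuming the standard definition."""
    if t == IOTA:
        return 0
    _, src, dst = t
    return max(order(src) + 1, order(dst))


def arity(t):
    """Number of arguments a term of type t accepts before reaching the base type."""
    count = 0
    while t != IOTA:
        t = t[2]  # move to the co-domain
        count += 1
    return count


def tau(i):
    """tau_0 = iota and tau_{i+1} = tau_i -> tau_i, as in the proof of Lemma 3.18."""
    t = IOTA
    for _ in range(i):
        t = arrow(t, t)
    return t
```

For instance, `tau(2)` encodes (ι → ι) → ι → ι, which takes two arguments and has an argument of order 1.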

Recursion schemes with pattern-matching
To control the space of derivations we will utilise recursion schemes equipped with pattern-matching, introduced in [32]. In their full generality pattern-matching recursion schemes form a Turing complete model of computation [32], though we will require only the decidable subclass in which pattern-matching is restricted to decomposing sequences. The following definition presents the particular schemes we utilise.

Definition 3.19 (Pattern-matching recursion scheme).
A pattern-matching recursion scheme is a tuple R = Σ, N , S, P where Σ, N and S are as in Definition 3.11 and P may include type-preserving production rules of the form The associated reduction relation r → R s is defined by the two conditions in Definition 3.12 and an additional clause: . ., r k+l for some F ∈ N of arity k and terms r = (r i ) i≤k+l , and there is a The definition of a derivation and language for pattern-matching recursion schemes are analogous.
Pattern-matching recursion schemes can be simulated by higher order recursion schemes using constants representing projection functions for pairs in place of pattern-matching. In particular, the upper bounds given by Theorem 3.16 and Corollary 3.17 apply to pattern-matching recursion schemes without change. There is, however, a subtle difference between the two in the presence of non-determinism and this will be exploited heavily in the next section. In the remainder of this paper, recursion scheme refers to a pattern-matching recursion scheme unless otherwise stated.
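The difference under non-determinism can be illustrated with a small model. The sketch below is not the formal reduction relation: it assumes a non-deterministic term is represented simply by the set of pairs it may reduce to, and the symbols `g`, `r0`, `s0`, `r1`, `s1` are placeholders chosen for illustration.

```python
from itertools import product

# A non-deterministic term, modelled as the set of explicit pairs it may reduce to.
# Hypothetical example: s reduces to either <r0, s0> or <r1, s1>.
s = {("r0", "s0"), ("r1", "s1")}


def by_pattern_matching(arg):
    """Rule of the shape F <x, y> -> g x y.

    The argument must be reduced to an explicit pair before the rule fires,
    so both components always come from the same pair.
    """
    return {("g", x, y) for (x, y) in arg}


def by_projections(arg):
    """Rule of the shape F z -> g (p0 z) (p1 z).

    The argument z is duplicated, and each copy may be reduced to a
    different pair before its projection is taken.
    """
    return {("g", left[0], right[1]) for left, right in product(arg, repeat=2)}
```

In this model the projection-based rule yields the mixed outputs `g r0 s1` and `g r1 s0` in addition to the two outputs of the pattern-matching rule, which is the kind of extra derivation that pattern-matching rules out.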

Herbrand schemes
We now turn to the task of associating to each sequent calculus proof π with Σ 1 end-sequent a nondeterministic higher order recursion scheme H π .The recursion scheme H π , which we term the Herbrand scheme of π, contains a non-terminal N i π for each sub-proof π A 0 , . . ., A k and each i ≤ k.The interpretation of N i π is of a function which returns a witness (possibly involving explicit substitutions) for each weak quantifier in A i given input for every strong quantifier in the sequent.The arity of N i π is k + 2, namely one greater than the length of the sequent: the first argument is a substitution stack and the (j + 1)-th argument is the 'input' for the formula A j .The type of N i π depends only on the quantifier structure of the formulae in the sequent.In particular, the types of N i π and N j π differ only in their co-domain.Reduction rules governing the non-terminal N i π are determined by the final inference in π and value of i, and re-write the non-terminal to a term built from non-terminals for the immediate sub-proofs of π.Hence, they are independent of the particular starting proof.This property implies that the typing and re-write rules for a non-terminal N i π are invariant across all Herbrand schemes for proofs that feature π as a sub-proof, whence we may consider two Herbrand schemes as comprising identical sets of non-terminals and production rules and differing only in the selection of start symbols.
For proofs with only Π 2 ∪ Σ 2 cuts, Herbrand schemes closely resemble the context-free grammars introduced in [4]. The 'generic' case of the representation, enabling the interpretation of cuts of arbitrary quantifier complexity, is when the cut formula on both sides of a cut features weak/strong quantifier alternations, i.e. Π n ∪ Σ n for n ≥ 3.
We start by introducing the types that occur most prominently in Herbrand schemes.To each prenex formula F we assign two types, the output type, τ F , and the input type, τF , representing the 'existential' and 'universal' structure of F respectively.These types are determined by the quantifiers in F and are defined as follows.For quantifier-free F , τ F = τF = ; otherwise, We compute the input and output types for prenex Π 2 and Σ 2 formulae.Let u and v be sequences of variables of non-zero length | u| and | v| respectively and C qf any quantifier-free formula.Then Let F be a prenex formula and v a sequence of variables of length k.Then Proof.By definition.
Lemma 4.3.Fix a prenex formula A. The order of τ A , τA are as presented in Table 1 where .− denotes subtraction truncated at 0, i.e. n .

Proof. By induction on complexity of A. If A is quantifier free then τ
Table 1 Order of types τ A and τA .
We have been describing the types τ F and τF as representing, respectively, the existential and universal structure of F .Beyond the basic case of Π 1 ∪ Σ 1 formulae this view may not be obvious from the definition and requires explanation.Consider first the case of a Π 2 formula F = ∀x∀y∃zG qf (x, y, z).(We assume two universal quantifiers to provide contrast between the 'type' of the universal quantifiers, ∀x∀y, corresponding to the pair ι × ι, and the existential, ∃z, whose type is ι.)As was seen in the example above, τF = ι × ι and τ F = ι represent the 'type' of the two quantifier kinds in F .However, thinking of the 'computational content' of a Π 2 formula such as F suggests an alternate explanation, namely a function f : τF → τ F such that ∀x∀yG qf (x, y, f x, y ).The type of this function is precisely the type we associate to the universal structure of the dual of F .That is, for a formula H = ∃x∃y∀zI qf (x, y, z) we have τH = ι × ι → ι which, by the above, is simply the type of the universal quantifier in the Skolemised form ∀g∃x∃yI qf (x, y, g x, y ).The existential structure of H is defined as τ H = ι × ι × (ι → ).Morally, this is no different from the type of the existential quantifiers ι × ι (as is the unit type, any function a : σ → is observationally equivalent to the constant function λx σ ).The additional structure arises because, for the purposes of defining its existential 'content', we treat the Σ 2 formula H as a Σ 3 formula with a vacuous inner existential block, H = ∃x ι ∃y ι ∀z ι ∃w I qf (x, y, z, w).Skolemising the inner quantifier yields ∃x ι ∃y ι ∃f ι→ ∀z ι I qf (x, y, z, fz) which has the desired type.
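The reading of a Π 2 formula as a function from its universal input to its existential output can be made concrete. The following sketch assumes a hypothetical quantifier-free matrix G(x, y, z), namely "z is at least x and at least y", and checks over a finite range that a candidate f : ι × ι → ι satisfies ∀x∀y G(x, y, f⟨x, y⟩). The names `G`, `f` and `realises` are illustrative and not taken from the paper.

```python
def G(x, y, z):
    """Hypothetical quantifier-free matrix: z bounds both x and y from above."""
    return z >= x and z >= y


def f(pair):
    """Candidate function of type iota x iota -> iota, taking the pair <x, y> as one argument."""
    x, y = pair
    return max(x, y)


def realises(candidate, matrix, domain):
    """Check matrix(x, y, candidate(<x, y>)) for all x, y in a finite test domain."""
    return all(matrix(x, y, candidate((x, y))) for x in domain for y in domain)
```

A finite check of this kind only tests the candidate on the chosen range; it is meant to illustrate the shape of the types, not to establish validity.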
The astute reader will have observed the similarity between the types and formulae above and those arising in Gödel's functional interpretation.Indeed, the pattern extends to the whole prenex quantifier hierarchy.
Theorem 4.4.Shoenfield's variant of the functional interpretation [38] translates a prenex formula A in the language of (classical) first-order logic to a prenex Π 2 formula which is provably equivalent to a formula ∀x τA ∃y τ A F qf over a system of first-order logic in finite types extended with pairing and unit.
Proof.Shoenfield's functional interpretation [38] associates to each formula A in the language of arithmetic a prenex Π 2 formula A S = ∀x∃yA S (x, y) of HA ω such that if PA A then for some term t, HA ω ∀xA S (x, tx).In [17], Gerhardy and Kohlenbach apply the interpretation to pure predicate logic yielding Π 2 formulae in the language of "extensional predicate logic in all finite types", denoted E-PL ω .For the present lemma we utilise the expansion of E-PL ω to the hierarchy of basic types (including new constant symbols).Utilising pair types to collapse blocks of like quantifiers, we may assume A S always has the form ∀x ρ A ∃y σ A A S where A S is quantifier free and ρ A , σ A are basic types.If A is quantifier-free we may take ρ A = σ A = , in which case A S is already of the desired form.If A = ∀vB and B S = ∀x ρ B ∃y σ B B S then by construction A S = ∀x ι×ρ B ∃y σ B A S .Since τA = ι × τB and τ A = τ B , the induction hypothesis shows A S is equivalent to a formula ∀x τA ∃y τ A A * qf .For the existential case, let A = ∃vB with u(B) > 0 and suppose ) and τA = τ A → τB , the induction hypothesis shows A S is logically equivalent to a formula ∀x τA ∃y τ A A * qf for appropriate A * qf .The case that A leads with a block of existential quantifiers is analogous.
We now present the definition of the Herbrand scheme associated to LK proofs.Definition 4.5 (Herbrand scheme).Fix a proof π A 0 , . . ., A k with Σ 1 end-sequent and let Σ π be the simple alphabet consisting of a constant symbol c of type ι and the function symbols and eigenvariables occurring in π (typed accordingly).The Herbrand scheme for π is the higher order recursion scheme H π = Σ π , N π , S π , P π with the following non-terminals and production rules.
1. A non-terminal c ρ : ρ for each basic type ρ ∉ {ι, } that occurs as a sub-type of a type τ B or τB for a formula B occurring in π, with production rules with c ι and c defined to be the constants c and respectively.
2. A non-terminal N i π for each sub-proof π B 0 , . . ., B l of π and for each i ≤ l, with type and production rule(s) as given in Table 2, determined in each case by the final inference of π .
3. A start symbol S π,i : τ A i for each i ≤ k with associated production rules
The language of π is the set
It remains to check that the production rules of Herbrand schemes are well-typed. This task will be taken up later in Lemma 4.13. For now we take for granted the fact that Herbrand schemes are well-defined and continue with some basic properties of them (Lemmas 4.7 to 4.10) followed by the intended interpretation of the schemes as generating Herbrand disjunctions (Definition 4.11) and the observation that this interpretation coincides with the Herbrand set for quasi cut-free proofs (Lemma 4.12). We start, however, with a brief explanation of some of the production rules from Table 2.
Remark 4.6.We comment on some of the rules from Table 2.
• Axiom.We are restricting axioms to quantifier-free formulae only, which motivates the simple production rule given in the table.One may wish to permit axioms π A 0 , A 1 where A 1 = Ā0 has arbitrary (prenex) complexity.These can be accommodated by the production rules , otherwise.
which the interested reader can check are well-typed.This definition mimics the behaviour of the Herbrand scheme for the natural proof of A 0 , A 1 that uses only quantifier-free instances of axioms and alternate applications of ∃ and ∀ inferences.Our reason for favouring quantifier-free axioms is that, as a consequence, production rules never return their arguments as output, a fact that simplifies some technical aspects of the later analysis (specifically Lemma 6.9).• ∧ and ∨.As proofs involve prenex formulae only, conjunctions and disjunctions are necessarily quantifierfree with associated type , and therefore possess no computational content relevant to the construction of a Herbrand disjunction.When focusing on such formulae, the production rule in each case returns the empty sequence.• ∃ r .The production rule in this case depends on both i and the quantifier form of the active formula.
Consider the instance of ∃ r given in Table 2. As π is assumed regular, the active formula (A in the table) is either quantifier-free or universally quantified. If i marks the active formula (i.e. i = m) then the production rule for N i π directly outputs the witness terms provided by the proof and the current substitution (the sequence (r 0 • a, . . ., r p • a)) as the first p + 1 components of a nested pair. The final component is either trivial (in case A is quantifier-free) or, if A is universally quantified, the continuation of the trace to the immediate sub-proof in the form of a function. If i ≠ m, the production rule instead passes the above term to the corresponding argument.
• ∀ α . This is the only case that involves pattern-matching in Herbrand schemes. Although it can be simulated by a recursion scheme without pattern-matching using projection functions for pair types, doing so introduces a duplication of arguments that is avoided in the chosen formulation. For instance, the production rule for ∀ α where α consists of the single eigenvariable α and the sequent Γ is empty yields the production rule which may be simulated by the rule where p 0 and p 1 are constants representing the two projection functions for pair types. If s is a term such that s → * H r 0 , s 0 | r 1 , s 1 and the four sub-terms are pairwise distinct then the reduction in (1) permits the derivation which is forbidden in the Herbrand scheme due to pattern-matching. In this sense pattern-matching plays a role analogous to the rigidity conditions utilised in [21,1,2] for representing first-order proofs with Π 1 /Π 2 cut complexity.
• cut.For each choice of i, the rule provides exactly one reduction for the non-terminal N i π : for i < m this is Note that, in the case e(A) = 0 the type τA (which marks the final argument to N i π 0 ) is prime, and is otherwise the function type τ Ā → τ Ā.Moreover, the case distinction above is independent of i.For instance, if A = ∀vB exactly the following production rules arise from the cut.
In the following let H = Σ, N , S, P be the Herbrand scheme for a regular proof π with prenex Σ 1 end-sequent.
Lemma 4.7. H is an acyclic recursion scheme. Hence, L(π) is finite.
Proof. Let < be the transitive relation on non-terminals in H generated by the equations: c ρ < c σ if ρ is a proper sub-type of σ; c ρ < N i π 0 for every ρ, sub-proof π 0 of π and i; N i π 0 < N j π 1 if either π 0 is a proper sub-proof of π 1 or π 0 = π 1 and j < i; and N i π < S π,i for any i. Clearly < is acyclic and irreflexive. Moreover, for every production rule F x → H t and any non-terminal G occurring in t we have G < F.
Lemma 4.9. If r : σ → τ is a H -term then τ is a basic type and σ is either basic or the type of substitution stacks. In the latter case, r = N i π or Ni π for some π and i.
Proof.By inspection of the types of non-terminals and terms.
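The step from acyclicity to finiteness in Lemma 4.7 can be illustrated on a much simpler object. The sketch below assumes a scheme whose non-terminals take no arguments (so it is a plain grammar rather than a higher order scheme), represents terms as flattened tuples of terminal symbols, and uses the hypothetical helper names `is_acyclic` and `language`.

```python
def is_acyclic(rules):
    """Return True if no non-terminal can reach itself through right-hand sides."""
    state = {}

    def visit(nt):
        if state.get(nt) == "done":
            return True
        if state.get(nt) == "open":
            return False  # nt was reached again while still being explored
        state[nt] = "open"
        for rhs in rules[nt]:
            for sym in rhs:
                if sym in rules and not visit(sym):
                    return False
        state[nt] = "done"
        return True

    return all(visit(nt) for nt in rules)


def language(rules, start):
    """All terminal words derivable from start; the recursion terminates for acyclic rules."""

    def words(sym):
        if sym not in rules:
            return {(sym,)}  # a terminal symbol derives only itself
        out = set()
        for rhs in rules[sym]:
            partial = {()}
            for s in rhs:
                partial = {p + w for p in partial for w in words(s)}
            out |= partial
        return out

    return words(start)


# Toy scheme: S -> f A B, A -> a | b, B -> A | c.
toy = {"S": [("f", "A", "B")], "A": [("a",), ("b",)], "B": [("A",), ("c",)]}
```

On the toy scheme every non-terminal only refers to non-terminals that come later in a fixed ordering, mirroring the role of the relation < in the proof, and `language(toy, "S")` is a finite set of six words.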
Lemma 4.10.Suppose r is an H -term of type containing no explicit substitutions (i.e.having no sub-term of the form t • a).If r → * H s for some basic term s then s = .
Proof.By induction on the proof generating H , on the composition of r and the length of the derivation r → * s.
We now describe how Herbrand schemes can be interpreted as ascribing existential content to first-order proofs.

Definition 4.11 (Herbrand expansion). Let π
Γ be a proof with Γ = ∃ v 0 A 0 , . . ., ∃ v k A k where A i is quantifier-free for each i ≤ k.Let k i be the length of v i .The Herbrand expansion of π is the quantifier free sequent Γ π given by Lemma 4.12.If π Γ is a quasi cut-free proof of a Σ 1 end-sequent then the Herbrand expansion of π is a valid sequent and Γ π is a Herbrand disjunction in the sense of Theorem 2.3.
Proof.Observe that in every production rule associated to a quantifier-free cut, the term r • A s becomes .Derivations in π are therefore in 1-1 correspondence with traces following the breakdown of formulae in the end-sequent.As a result we observe that L(π) simply outputs all literal witnesses to the existential quantifiers in the end-sequent.
The idea behind Herbrand schemes is to provide a generalisation of the above lemma to proofs containing quantified cuts.The analysis necessary for the result is carried out in Section 7. In the remainder of this section we prove the production rules of Herbrand schemes are well-typed and derive upper bounds on the size of Herbrand expansions.Proof.Fix a proof π with prenex end-sequent A 0 , . . ., A m and i ≤ m.We establish type-preservation of the production rules for the non-terminals N 0 π , . . ., N m π via a case distinction on the final inference rule in π.Suppose π A 0 , . . ., A m−1 , ∃ vA is obtained from proof π 0 by ∃ r .Thus π 0 Γ, A( r/ v) for some sequence r = (r j ) j≤k of simple Σ-terms of type ι.By regularity, e(A) = 0, i.e. either A is quantifier-free or u(A) > 0. Let Γ = A 0 , . . ., A m−1 and fix a term z : τ∃ vA and a sequence of terms x of length m such that N i π xz is well-typed.By definition N i π 0 has type To check type preservation there are two cases to consider: 4), we are done.
2. i = m.In this case it is necessary to check that τ∃ vA = τ ∃ vA → τA .But this follows directly from the definition and the fact that τ ∀ v Ā = τ Ā = τA as e(A) = 0.
Suppose π is derived from π 0 via the inference ∀ α and A m = ∀ vA with u(A) = 0 and α = (α j ) j<k .Let i ≤ m and fix terms x, z = (z 0 , . . ., z k ) such that N i π a x z 0 , . . ., z k is well-typed.Lemma 4.2 implies that z j : ι for each j < k, and z k : τA .Thus The remaining cases are straightforward and omitted.
Lemma 4.14.For a proof π A 0 , . . ., A k and i < k, the order of the non-terminal and the order of N i π is 0 by definition.Otherwise, by Lemma 4.3, the order of N i π is one greater than the maximum among the orders of τA 0 , . . ., τA k which is the smallest n such that every A j is Π n+1 .
It is now possible to strengthen Lemma 4.7 to a concrete bound on the number of terms derivable from a Herbrand scheme.The idea is to eliminate occurrences of pattern-matching in a Herbrand scheme H in a way that does not decrease the length of derivations so that Theorem 3.16 and Corollary 3.17 can be applied.
Theorem 4.15.If π Γ is a proof of a single prenex Σ 1 formula in which all cut formulae are contained in Π n ∪ Σ n then the size of the Herbrand expansion Γ π is no greater than 2 4|π| 3 n+2 where |π| is the number of inference rules in π.
Proof. The case n = 0 is covered by Lemma 4.12 so suppose n > 0. Let H be the Herbrand scheme of π. Since the cut rank of π is bounded by n, Lemma 4.14 implies that the order of H is no greater than n. To obtain the desired bounds we apply Theorem 3.16. However, this requires first eliminating the explicit substitutions introduced by the ∀ inferences. Let H denote the higher order recursion scheme with non-terminals of basic type obtained from H by removing all substitution terms and types from non-terminals and production rules. In particular, the productions originating from ∀ α and ∃ r inferences are replaced by the following in H : The n+1 .

A Herbrand disjunction for the pigeonhole principle
We consider a formal proof of the pigeonhole principle for two boxes via the infinite pigeonhole principle. The question of the computational content of this proof is attributed to G. Stolzenberg [13]. A variety of analytic methods have since been applied to this proof [20,10,41,8,1] and its generalisations [37,35]. The version we present here is a formal proof with a single Π 3 cut based on the proof with two Π 2 cuts given in [1,41].
Let f : N → {0, 1} be a total Boolean function, let I i (for i = 0, 1) express that there are infinitely many m ∈ N for which f (m) = i and T express that there exists m < n such that f (m) = f (n).A consequence of the law of excluded middle is ∃wI w .Moreover, I i implies T for each i ∈ {0, 1}: assuming I i there exists m ≥ 0 and n ≥ m + 1 for which f (m) = f (n) = i.Combining these observations we conclude T .
The following formalises the above argument into a proof with a single Π 3 cut.The formal language, Σ, comprises two unary function symbols f, s, one binary function symbol m, a constant symbol 0 and a binary relation ≤.We make the following definitions and abbreviations: The intended interpretation of the symbols is: f represents the (arbitrary) function f , s the successor function on N, ≤ the standard ordering and m the binary max function.
A formal proof of the pigeonhole principle (namely Γ, Δ T ) is given in Fig. 4 which we name π ∞ . The proof is displayed in two-sided sequent calculus as this simplifies the presentation and following discussion. The intended interpretation of the two-sided sequent A 1 , . . ., A k B 1 , . . ., B l is the sequent Ā1 , . . ., Āk , B 1 , . . ., B l . For brevity, only eigenvariables and witnesses of the quantifiers and instances of the existential formula T are displayed in π ∞ . The proof fully fleshed out uses about 50 applications of the axioms and rules of the calculus but the only cut in π ∞ is the one displayed in the figure. Two normal forms of the proof of size ∼200 have been computed in a case study [41] from which one can read off the Herbrand sets for the formula T (also for formulae in Γ ∪ Δ but these are less interesting). Up to interpretation of the logical symbols by their intended semantics, the two Herbrand sets combined provide the witnesses { 0, 1 , 1, 2 , 2, 3 , 0, 2 , 1, 3 } to the existential quantifiers in T . 1 The Herbrand scheme H π ∞ associated to the proof π ∞ computes the same Herbrand set, a fact we demonstrate in the following.

Types and terms
The Herbrand scheme for π ∞ comprises a non-terminal for each sub-proof of π ∞ and each formula in the end-sequent of that sub-proof.Recall, for each sub-proof p : Π Λ of π ∞ and each i < |Π| + |Λ| there is a non-terminal N i p in H π ∞ representing the existential content of the i-th formula in the sequent at position p.In the following, in place of N i p we will write N A p where A is the i-th formula in the sequent assuming this is unique.In case A occurs more than once in the sequent Π Λ (such as at positions b and 2) the non-terminal N A p refers to the first occurrence of A and we use the notation N A + p for the second occurrence.Concerning the type of N A p , we recall the types τ F and τF for each formula • The remaining formulae that occur in π ∞ are quantifier-free and are assigned type in all cases.
According to the definition, the type of where ς is the type of substitution stacks, F and G are the two formulae in Γ and C and D are the formulae in Δ.As the formulae in Γ ∪ Δ are Σ 1 , their input type carries no computational content (cf.Lemma 7.3), and we can ignore these formulae and identify the type above with ς → τT → τ T , and the term Likewise, the type of N I 1 c is assumed to be ς → τI → τI 1 → τ I 1 and the type of N Other abbreviations and simplifications we utilise are: • r for either the sequence r, or r, c ι 1 → , depending on type, and r, s as a term of type ι2 represents r, s, .• 0 = m00, 1 = s 0, 1 = m01, 2 = s(m10) and 2 = m02.
• For each non-terminal N A p where A is the i-th formula at position p, an additional non-terminal NA p with the same arity as N A p and associated production rule is included in the Herbrand scheme H π ∞ .These non-terminals ease the computation in derivation steps involving permutation.• The Herbrand scheme also includes explicit non-terminals for non-determinism at each type, which are represented via set notation: for terms s 0 , . . ., s k : ρ of the same type, the set S = {s i | i ≤ k} is a term of type ρ with reduction S → s i for each i ≤ k. • An equivalence relation on terms of identical type defined as inducing the same language within all contexts.Formally, we set r s iff r s r where r s holds just if r, s : ρ and for every H π ∞ ∪ {x ρ }-term t of basic type (where x is a fresh symbol of type ρ), whenever t(r/x) → * u for a Σ-term u, then t(s/x) → * v for some Σ-term v such that u For instance, if r → s via an application of a deterministic production rule then r s, and if S = S are two representations of the same set of terms then S S .In general, r(S/x) {r(s/x) | s ∈ S} as shown by considering r = Fx with reduction Fx → * mxx. 2 However, suppose Finally, we remark that, generalising Lemma 4.10, for every type ρ with co-domain and every term r : ρ, we have r c ρ .
Language of π ∞ We now compute the language of H π ∞ focusing on the formula T , i.e. set of terms (after evaluation) derivable from the term N T π ∞ ⊥c ˆ .The first, and only, production rule applicable to this term is given by the cut rule at the root of the proof: Analysing derivations directly from this term is complicated.As the right sub-proof at 0 culminates in a ∀ γ inference, the external non-terminal N Ī 0 cannot be reduced until its second argument (the term N I a ⊥( NĪ 0 ⊥c ˆ )) is reduced to an explicit pair.But the inference at a in the left sub-proof is a contraction, so this immediately introduces non-determinism and duplication of arguments.After resolving the non-determinism and reducing the two continuations of N I a to pairs (say in terms of N I 0 d /N I 1 d ), the external non-terminal can be reduced.The argument NĪ 0 ⊥c ˆ comes into play at this point: the productions for N I + b and N I c increase the nesting of non-terminals which must also be evaluated as pairs in order to proceed beyond N I 0 d /N I 1 d .In the following, we compute the language via a top-down approach, analysing derivations starting from relatively simple terms, and building these together to compute the language of more complex interactions between non-terminals.We begin with the most simple derivations available.Recall that τI 0 = τI 1 = ι 1 .Concerning non-terminals from the left sub-proof we have the following derivation starting from N I 0 d /N I 1 d .
(Note, as |Γ| = 2 the (2 + i)-th formula at positions e and f is the ancestor of I i from d.) If r happens to be such that r • [α ← s]a r • a, then since the derivation above follows deterministic reductions only, we deduce Examining the non-terminals from lower in the left sub-proof affords us The derivation from N I 1 c ars can be continued provided that the two arguments of N I 1 d , namely r 0, NI 0 d as and s, are reducible to pairs.Thus if r 0, NI 0 d a s → * r 0 , r 0 and r 0 • [α ← s]a r 0 • a then Because the reductions governing N I 1 c are all deterministic, when phrased in terms of equivalences, this becomes Property (4) will be useful later.
Returning briefly to the derivations from non-terminals N I b and N I + b started earlier, each of these derivations is also deterministic, so therefore which provides the first step in the continuation of the derivation from N T π ∞ .Before extending (2) however we consider some simple derivations arising from the right sub-proof.On this side, the alternation of universal and existential inference rules means that few non-terminals can be adequately analysed in isolation as we did above.Most straightforward are non-terminals N Īγ 4 and N Īγ 2 , for which we have This gives rise to, for example, The equivalences for N Īγ 2 and N Ī+ γ 2 combine to yield, given the same r, In particular, choosing r = NI 0 d ⊥ s , this implies which will be needed later.In addition to (7), it is necessary to analyse the complex term N Īγ 1 a(N However, here we can use ( 6) again.If δ : ι is a fresh symbol then, applying ( 7) and ( 4) (using r = N Ī 0 ⊥c ˆ ), we get whence (6) implies We have still not examined derivations starting from the non-terminals N T 0 , N T 1 , and N T i for i ≥ 2, which will arise in the computation of L(π ∞ ).The first three non-terminals behave according to The remaining behave similarly to the N A i non-terminals analysed earlier, except that it is N T 6 that provides the only 'outputs' in the derivation.In particular, Let δ : ι be a fresh symbol.Combining the two sets of equations above, if s : ι → ι 1 is such that s δ { s i | i ≤ k} and s i contains neither β or γ for each i, it follows that We can now proceed with calculating the language of N T π ∞ ⊥c ˆ .Let w = NĪ 0 ⊥c ˆ .Following on from (2) and ( 5) we have Thus, we need only compute and apply (11) (assuming that the terms obtained will be free of β and γ).The latter was already established in (9): For the former, we have w 1, (10) implies Hence, by ( 11) and ( 12), we deduce Under the standard interpretation of the symbols 0, s and m (as zero, successor and binary 'max') L(π ∞ ) ascribes to T the 
set {(0, 1), (0, 2), (1, 2), (2, 3), (1, 3)} .
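That this set of pairs is semantically adequate can be confirmed by brute force. The sketch below assumes the standard interpretation of the terms as natural numbers and checks that, for every assignment of values in {0, 1} to f(0), ..., f(3), at least one listed pair (m, n) satisfies m < n and f(m) = f(n). The function names are illustrative, and the check concerns only the semantic validity of the disjunction, not how the scheme derives it.

```python
from itertools import product

# Witness pairs ascribed to T, under the standard interpretation of 0, s and m.
WITNESSES = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3)]


def has_witness(f, pairs):
    """True if some pair (m, n) with m < n has f(m) == f(n); f is a tuple of values."""
    return any(m < n and f[m] == f[n] for m, n in pairs)


def is_herbrand_disjunction(pairs):
    """Check every Boolean valuation of f on the arguments mentioned in pairs."""
    size = max(max(p) for p in pairs) + 1
    return all(has_witness(f, pairs) for f in product((0, 1), repeat=size))
```

By contrast, the two pairs (0, 1) and (2, 3) alone do not suffice, since the alternating function with values 0, 1, 0, 1 has no witness among them.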

Substitution, subsumption and normality
In the previous section we introduced a preorder on terms, r s, specifying that the language induced by s extends the language induced by r.Formally, r s holds if r and s are of the same type ρ and for every H ∪ {x ρ }-term t of basic type, if t(r/x) → * r 0 for a Σ-term r 0 then there exists a Σ-term s 0 such that t(s/x) → * s 0 and r • 0 = s • 0 .This relation can be extended to proofs in a natural way, by defining π π if π and π have the same end-sequent and N i π N i π for each i.For many one-step cut reductions π π indeed π π (and even π π ), from which it immediately follows that L(π ) ⊆ L(π) (resp.L(π ) = L(π)).However, there exist reductions π π for which L(π ) ⊆ L(π) but π ⊀ π.These scenarios all arise in reductions which interact with quantifiers and alter the contexts in which explicit substitutions occur in derivations.In order to prove language preservation, i.e. that π π implies L(π ) ⊆ L(π), for these reductions, we will replace the preorder with a coarser relation which we call subsumption, that quantifies not over all possible contexts (the 't' in the definition of r s above) but only over contexts of a particular syntactic shape.Such terms we name normal terms and will be defined (along with our relation of term subsumption) in Section 6.1 below.
Before embarking on these definitions, it will be convenient to abstract the notion of Herbrand scheme slightly and observe that we can specify a single, universal, recursion scheme in which every Herbrand scheme for a regular proof can be viewed as a natural finite sub-scheme.Definition 6.1 (Universal Herbrand scheme).Let Σ be the signature of first-order logic.We let H denote the infinite recursion scheme comprising: We refer to H as the universal Herbrand scheme.
Henceforth, a term is an H -term and we write → in place of → H . Finite sets of H -terms will represent applications of the non-deterministic non-terminals D ρ .Specifically, the set {s ρ 0 , . . ., s ρ k } represents any term formed by combining all the terms s 0 , . . ., s k (possibly with repetitions) via the non-terminal D ρ .If S is a finite set of terms of the same type, it follows that S → * s for each s ∈ S.
Notice that there are no start symbols in H . In this regard we may consider the individual Herbrand scheme H π as obtained from H by specifying an appropriate set of start symbols. The new 'hat' non-terminals do not play a role in viewing H as a universal Herbrand scheme. Rather, they become useful in 'transferring' non-terminals lacking their final argument through applications of permutation. For example, the following partial proof (where we assume u(A) > 0) gives rise to the production rules on the right: The derivation cannot be extended as it stands because N 1 π 0 lacks an argument, meaning that it is not formally possible to express the term N 0 π axz by reference to the proof π 1 only without instantiating x and z by concrete terms. However, N 1 π 0 ax is extensionally equal to the term N0 π 1 ax, allowing us to equate N 0 π axz with the term N 1 π 1 a z r • a, N0 π 1 ax x for any choice of a, x and z. Equations such as these are useful in the close examination of the cut elimination process carried out in the sections below.
In the previous section a natural subsumption and equivalence relation on terms was introduced, given by equating terms that induce the same language in all contexts. In the context of the universal Herbrand scheme H, this subsumption is given by r ⊑ s, which holds just if r, s : ρ for some ρ and, for every H ∪ {x^ρ}-term t of basic type, whenever t(r/x) →* r_0 for a Σ-term r_0, then t(s/x) →* s_0 for some Σ-term s_0 such that r_0^• = s_0^•. The corresponding equivalence relation is defined by r ≈ s iff r ⊑ s ⊑ r. The following properties of the relations ⊑ and ≈ were remarked in the last section.
Lemma 6.2. Let r, s : ρ be H-terms of the same type and S a finite set of terms of pair type σ = σ_0 × ⋯ × σ_l.
1. If r → s then s ⊑ r. If, in addition, the reduction follows from a production rule for a deterministic non-terminal then r ≈ s.
2. If r and s are representations of the same finite set of H-terms then r ≈ s.

If the co-domain of ρ is then r c ρ .
Proof. Properties 1-3 are straightforward, though for 3 we note that only deterministic non-terminals have production rules that invoke pattern-matching. 4 generalises Lemma 4.10 and is proved by induction on ρ and r, noting that • a for any substitution a.

Normal terms and subsumption
In order to focus on the impact of substitutions in H-terms it is necessary to introduce a notion of free and bound occurrences of Σ-symbols in these terms where, recall, Σ is the signature of first-order logic. The free symbols of a basic Σ-term are simply the Σ-symbols that occur in the term; Σ-terms have no bound symbols. For a basic H-term t, the free symbols of t are the Σ-symbols occurring in t combined with the Σ-symbols occurring in any proof π for which a non-terminal N^i_π or N̂^i_π appears in t; the bound symbols of t are the eigenvariables of the proofs which occur leftmost in t. For non-basic terms, substitutions and substitution stacks are interpreted as contributing to the set of bound symbols, and limiting the set of free symbols in the natural way. Explicitly, for a substitution stack a : ς and H-term r : ρ we define the sets Bd(·) of bound symbols and Fr(·) of free symbols, where EV(π) denotes the set of eigenvariables in the proof π, and Fr(π) the set of all non-eigenvariable Σ-symbols occurring in π. Notice that Bd(N^i_π) and Fr(N^j_π) are disjoint sets by definition.

Definition 6.3 (Normal terms).
A normal term is an H-term r satisfying:
1. if a is a substitution stack which is a sub-term of r then Bd(a) ∩ Fr(a) = ∅,
2. if st is an application which is a sub-term of r then Bd(s) ∩ Fr(t) = ∅,
3. if s • a is a substitution occurring as a sub-term of r then s is of simple type.
As mentioned at the beginning of this section, the aim of the above definition is to provide a class of terms for which we can examine a more refined subsumption relation on H-terms that captures both language inclusion and equality for a wide range of cut reduction rules. The subsumption relation that achieves this is essentially the restriction of ⊑ that only quantifies over normal contexts.
Definition 6.4 (Subsumption). Given normal H-terms r, s : ρ of the same type, s subsumes r, in symbols r ≲ s, just if, for every H ∪ {x^ρ}-term t of basic type such that t(r/x) and t(s/x) are both normal, whenever t(r/x) →* u for a Σ-term u then t(s/x) →* v for some Σ-term v satisfying u^• = v^•. Define r ∼ s if r ≲ s and s ≲ r.
Clearly, for normal terms r and s, r ⊑ s implies r ≲ s, and r ≈ s implies r ∼ s. Hence, if π, π′ are two regular proofs of a Σ_1 sequent Γ and S_{π′,i} ≲ S_{π,i} for every i < |Γ| then L(π′) ⊆ L(π). However, what we require is the more general property that if for every H-derivation S_{π′,i} →* u of a Σ-term there exist H-terms r, s and t such that S_{π′,i} →* t(r/x) →* u, S_{π,i} →* t(s/x) and r ≲ s, then we may conclude L(π′) ⊆ L(π). This result holds trivially for ⊑ in place of ≲. For it to work for subsumption, the terms r, s and t must all be normal, i.e., we require
Lemma 6.5. If r → s and r is normal then s is normal. In particular, if S_{π,i} →* s then s is a normal term.
Lemma 6.5 is not difficult to establish but requires some technical observations concerning the preservation of free and bound symbols through H-derivations. The proof is given in Section 6.2 below.
To accompany Lemma 6.5 it is necessary to know that every maximal derivation from a start symbol terminates in a Σ-term. By the previous lemma, it would suffice to show that an arbitrary normal term of basic type is either reducible or a Σ-term, but this claim is easily seen to be false. In place of the general statement we have the next two lemmas.
Lemma 6.6 (Finite basis lemma). For every normal H-term r of pair type there exist terms s_0, t_0, ..., s_l, t_l such that r ∼ {⟨s_j, t_j⟩ | j ≤ l}.
Lemma 6.7. Let A be a Σ_1 formula. If r : τ_A is a normal term and not a Σ-term then r → s for some term s.
There are two scenarios in which explicit substitutions can block derivations. In the first, there is a term r of pair type which cannot be reduced to an explicit pair because it has the form, say, s • a. Even if r itself is a Σ-term, i.e., does not contain any non-terminals, it may appear as the argument of a non-terminal whose reduction depends on pattern-matching r against an explicit pair. The second scenario is if there exists a sub-term of the form F r_1 ⋯ r_k • a where F is a non-terminal of arity greater than k. Normality rules out both scenarios. This is the main idea behind Lemmas 6.6 and 6.7. We now prove the three lemmas stated above. The arguments rely on a number of technical details concerning derivations of normal terms. Following that, we examine the interaction of subsumption and explicit substitutions, which will prove important for establishing language preservation for the case of quantifier inferences. At this point the reader may wish to proceed directly to Section 7 and refer back to the technical results as needed.
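The first blocking scenario can be pictured with a toy model. Everything below is an illustrative assumption: the classes `Pair` and `Subst`, and the rule `step_N`, which stands for a hypothetical non-terminal whose production rule pattern-matches its argument against an explicit pair. The sketch does not reproduce any actual production rule of H.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pair:
    """An explicit pair <left, right>."""
    left: object
    right: object

@dataclass(frozen=True)
class Subst:
    """An explicit substitution: body . stack (not evaluated)."""
    body: object
    stack: tuple

def step_N(arg):
    """Hypothetical rule N <x, y> -> y; returns None when no rule applies."""
    if isinstance(arg, Pair):
        return arg.right
    return None  # the argument is not syntactically a pair: the derivation is stuck

explicit = Pair("c", "d")
hidden = Subst(Pair("c", "d"), (("alpha", "t"),))  # a pair hidden under a substitution

print(step_N(explicit))  # d
print(step_N(hidden))    # None
```

In this toy reading, requiring the body of an explicit substitution to be of simple type (condition 3 of Definition 6.3) excludes arguments such as `hidden`, so the pattern-matching rule is never blocked in this way.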

Proofs of
We begin with Lemma 6.5 which requires two technical observations on free and bound symbols in normal terms. Their effect is to reduce the problem of proving the lemma to checking that each production rule of the universal Herbrand scheme preserves normality.
Lemma 6.8. If r(s/x) is a normal term, t is a normal term of the same type as s, Fr(t) ⊆ Fr(s) and Bd(t) ⊆ Bd(s) then r(t/x) is normal.
is a production rule of H, and r_0, ..., r_{k+l} are such that s = F r_0 ⋯ r_{k−1} ⟨r_k, ..., r_{k+l}⟩ is normal, then Fr(t(r⃗/x⃗)) ⊆ Fr(s) and Bd(t(r⃗/x⃗)) ⊆ Bd(s).
Proof. We examine two particular cases, namely the quantifier rules, and leave the remaining cases for the reader to check. Consider an instance of the production rule for ∀_α for a single eigenvariable: where r_0, ..., r_k and s are terms of suitable type and number, and a : ς is a substitution stack. Let m and n abbreviate the left- and right-hand term in the above rule respectively. Assume m is a normal term.
Concerning free symbols, we have For production rules resulting from the inference rule ∃_s, suppose for suitable terms r_0, ..., r_k and a. Recall that s is the Σ-term instantiating the existential quantifier in the active formula of π_0. By our regularity condition on proofs, Fr(s) ⊆ Fr(π), so, letting m and n denote the left and right side of the reduction in (13), we have We can now prove Lemma 6.5.
Proof of Lemma 6.5. By the previous two lemmas it suffices to show that every production rule of H locally preserves normality. As in the proof of Lemma 6.9, we offer the argument for the important cases of the two quantifier rules and leave the remaining cases to the reader. Let the derivation arise from an inference ∀_α. Let m and n denote respectively the left- and right-hand term of the above equation. Assume m is normal. In particular, We first show that for every application s′t′ occurring in n, Bd(s′) ∩ Fr(t′) = ∅. This is evident if s′t′ is a sub-term of a, r_1, ..., r_k, s or t. Moreover, it holds for the case s′ = N^i_{π_0} and t′ = [α ← s]a because The other requirement to check for normality is that the sets Bd([α ← s]a) and Fr([α ← s]a) are disjoint, but this follows from (14). The second production rule is the one arising from the inference ∃_s: as all other cases follow immediately from normality of and, as Bd(a) is disjoint from Fr(a) and EV(π) is disjoint from Fr(π) ∪ Fr(a), we are done.
We now turn to the task of proving Lemma 6.7 which follows from the next two lemmas. The first characterises the syntactic form of normal H-terms and will be useful in the subsequent analysis of derivations in Herbrand schemes.
Lemma 6.10. If r : ρ is a normal H-term and ρ is a basic type whose co-domain is a pair σ × τ in which σ is simple and τ is not simple, then either r = ⟨s, t⟩ for a Σ-term s and H-term t, or r = F r_1 ⋯ r_k for some non-terminal F and terms r_1, ..., r_k.
term satisfying the hypothesis of the lemma. Since ρ is not a simple type, r is not a Σ-symbol nor of the form s • a for a substitution stack a (by definition of normal terms). This leaves three cases: i) l = 0 and r = ⟨s, t⟩ for s : σ and t : τ; ii) r = st for s : τ → ρ and t : τ; or iii) r is a non-terminal of H. If (i), as σ is simple, s is a Σ-term by Lemma 4.8 and we are done. In case (ii), suppose r = st is an application and s : σ = τ → ρ and t : τ. If σ is not basic then Lemma 4.9 implies s = F for some π and i, whence r = Ft. On the other hand, if σ is basic the induction hypothesis applies and s = F r_1 ⋯ r_k for terms r_1, ..., r_k, and so similarly for r. So we are done.
Lemma 6.11. If r : ι × ρ is a normal H-term of pair type but not a pair then r → s for some H-term s.
Proof. Assume to the contrary that r : ι × ρ is an H-term which is not a pair and that there is no s such that r → s. Without loss of generality assume r is minimal in length. By Lemma 6.10, r = F r_1 ⋯ r_k for some non-terminal F and terms r_1, ..., r_k. It follows that F ≠ c_σ for any σ as otherwise r → ⟨c, c_ρ⟩. Also F ≠ D_{ι×ρ} (as then r → r_1) and F ≠ N̂^i_π for any π and i. So F = N^i_π for some π and i. The fact that r is not reducible means that the production rule for N^i_π requires pattern-matching on the final argument. But then r_k : ι × σ for some σ, is not a pair and is not reducible, contradicting minimality of r.
We can now prove the two remaining lemmas.
Proof of Lemma 6.6. Let r : ρ be an H-term where ρ = σ × τ. Without loss of generality, we may assume r ≠ D_ρ r_0 r_1 for any r_0 and r_1. If r has the form ⟨s, t⟩ then trivially r ∼ {⟨s, t⟩} and if r = c_ρ then r ∼ {⟨c_σ, c_τ⟩}. Otherwise, Lemma 6.10 implies that r ∼ N^i_π a r_0 ⋯ r_k for some π, i, a, r_0, ..., r_k. An induction on π determines terms s_0, ..., s_l and t_0, ..., t_l such that r ∼ {⟨s_j, t_j⟩ | j ≤ l}. Note that Lemma 6.11 implies there is no issue with pattern-matching stopping derivations from fully writing out.
Proof of Lemma 6.7. Let A be a Σ_1 formula and suppose r : τ_A is an irreducible normal term that is not a Σ-term. Without loss of generality, we assume every sub-term of r satisfies the statement of the lemma. Considering the types of non-terminals that form the H-terms we deduce ρ is a pair type. By Lemma 6.6 we may assume r is a pair, say r = ⟨s, t⟩. As s is of simple type, it is a Σ-term. Hence t is an irreducible normal term which is not a Σ-term and, by the assumption on A, is of type τ_B for some B ∈ Σ_1.

Substitution and normality
Here we present some results concerning the interaction of subsumption with explicit substitutions which are needed for analysing the cut reduction and permutation rules for quantifiers.
Lemma 6.12. If t(r/x) and t(s/x) are normal terms and r ≲ s then t(r/x) ≲ t(s/x).
Proof. Direct consequence of the definition.
Lemma 6.13. If ru ≲ su for every term u then r ≲ s.
Proof. For every derivation t(r/x) →* r_0 of a Σ-term, there are terms t′, u_0, ..., u_k such that t(r/x) →* t′((r u_i)_{i≤k}/x⃗) →* r_0 and t(s/x) →* t′((s u_i)_{i≤k}/x⃗). Since normality is preserved through derivations, we are done.
Lemma 6.14. Let r be a basic H-term and a be a substitution stack over Σ. Then,
Proof. 1 is proved via induction on the basic term s. That r and a are Σ-terms is necessary for showing s(r/α) • a ∼ s • ([α ← r]a). 2 follows from 1 by induction on r. Regarding 3, Lemma 6.14(1, 2) imply Fr(a^•) ⊆ Fr(a) and Bd(a^•) ⊆ Bd(a), so α ∉ Fr(a^•) by normality. Hence, if t is a basic term then The first and last equivalence are applications of 2; the second and fourth equivalence are consequences of Lemma 6.14(4); and the third equivalence uses Lemma 6.14(3) and the fact that Fr(a^•) ∩ Bd(a^•) = ∅. Via 2 the above holds for t an arbitrary Σ-term, and from there generalises to deduce 4 is derived by induction on π. By 2 we may assume r is a basic term, i.e., r = r^•. In the base case, where π is an axiom, the equivalence is trivial as ) a r_1 r_2 for any choice of r_1 and r_2 of appropriate type. The induction step is straightforward except in the case of quantifier rules. If π ends in the inference where s⃗ = (s_j)_{j≤k} then we have, if i = |Γ|, b = [α ← r]a and r_1, ..., r_{|Γ|} and t are suitable normal terms, where the second equivalence is due to the induction hypothesis for N^{|Γ|}_{π_0}([α ← r]a). The case i < |Γ| is similar. For applications of the ∀ inferences, we consider the inference and the assumption that Fr(r) ∩ EV(π) = ∅, it follows that β_j ∉ Fr(a) ∪ Fr(r) for each j. Let r_1, ..., r_{|Γ|}, s_0, ..., s_k and t be such that n := N^i_π([α ← r]a) r_1 ⋯ r_{|Γ|} ⟨s_0, ..., s_k, t⟩ is well-typed and normal. In particular, α, β_0, ..., β_k ∉ Fr(⟨s_0, ..., s_k, t⟩). Moreover, as s_j has simple type for each j ≤ k, Lemma 4.8 implies that s⃗ is a sequence of Σ-terms. Then assuming α ∉ {β_j | j ≤ k}, and writing [ The third equivalence holds since α ∉ Fr(⟨s_0, ..., s_k, t⟩) and β_j ∉ Fr(r) for any j. If α = β_j then π(r/α) = π and, using again that α ∉ Fr(⟨s_0, ..., s_k⟩), we have Note, 5 is a special case of 4.

Language preservation for Gentzen-style cut elimination
Recall the relation π ⇝ π′ which expresses that π′ is obtained from π by the application of a reduction rule in Figs. 2 and 3 to a sub-proof of π. In the present section we determine in which cases ⇝ supports: (i) language inclusion: π ⇝ π′ implies L(π′) ⊆ L(π); and (ii) language equality: π ⇝ π′ implies L(π′) = L(π). Establishing language inclusion for the cut reduction steps will suffice to derive the main theorem; language equality offers a finer study of the Herbrand content of proofs since if π_0 and π_1 can be connected by a sequence of forward and backward language preserving reductions then L(π_0) = L(π_1).
Let π and π′ be regular proofs of some sequent Γ. We say that π subsumes π′, in symbols π′ ≲ π, if N^i_{π′} ≲ N^i_π for every i < |Γ|. If π and π′ each subsumes the other then π and π′ are equivalent, in symbols π ∼ π′. Proof. By definition.
Herbrand schemes have the property that their languages are invariant under many basic proof transformations. The first example we give concerns the operation of substitution in proofs:
Lemma 7.2. Suppose π and π′ are proofs with the same end-sequent such that π′ is the result of replacing a sub-proof π_0 of π by π′_0. If π_0 ≲ π′_0 then π ≲ π′.
Proof. Let π, π_0, π′ and π′_0 be as in the statement. We assume π and π′ have the same end-sequent, say Γ. Given a subproof π″ of π which is not a proper subproof of π_0, let π‴ denote the corresponding subproof of π′. Observe that if π″ is a subproof of π but not a proper subproof of π_0 then the non-terminals N^j_{π″} and N^j_{π‴} are of the same type for each j. Fix i < |Γ| and a normal term t_0 = t(N^i_π/x). Suppose t_0 → t_1 → ⋯ → t_k = r is a derivation in H of a Σ-term r. By Lemma 6.5, t_i is normal for every i ≤ k and, without loss of generality, we may assume t does not feature any non-terminals labelled by proofs with π_0 as a sub-proof. Throughout this derivation, recursively replace each occurrence of a non-terminal N^j_{π″} for which π″ is not a proper subproof of π_0 by the non-terminal N^j_{π‴}. Arguing by induction on k, using π_0 ≲ π′_0, we deduce t(N^i_{π′}/x) →* s for some Σ-term s with s^• = r^•.
We begin our analysis of cut elimination by observing three common scenarios in which language computations can be simplified.
We now turn our attention to the analysis of the subsumption relation with respect to the cut reduction and permutation steps of Figs. 2 and 3. Only the most interesting cases will be covered in detail: the cut and quantifier permutation, and contraction and quantifier reduction. As before, we leave instances of the permutation inference implicit and make use of Lemma 7.3(3) without reference. Recall the characterisation of the cut inference from Remark 4.6: where i < |Γ| and x, y and a are terms of suitable type.

Cut permutation
Suppose π ⇝ π′ are the two proofs Due to the asymmetry in the production rules for cut, it is necessary to split the analysis of this reduction into two cases, depending on whether or not A and B are both universally quantified. Provided at least one of the two formulae is existentially quantified or quantifier-free, the two proofs above are equivalent and their languages are equal. This is proved in Lemma 7.4. If both A and B are universally quantified we do not expect equivalence to hold in general. However, if there are no contractions to the formula Ā in π_1 or the formula B̄ in π_2, the proofs π and π′ are equivalent. This is relevant to the cut reduction strategies employed in Theorem 1.1 and is treated in Lemma 7.7.
Lemma 7.4. For π ⇝ π′ as above, if at least one of u(A) and u(B) is zero then π ∼ π′.
Proof. If one of A or B is quantifier-free the argument is straightforward following the production rules for cut. This leaves the following three cases to consider: u(Ā), u(B) > 0; u(A), u(B̄) > 0; and u(Ā), u(B̄) > 0.
We consider only the first case as the second is symmetric and the third follows a simpler argument. Thus assume e(A), u(B) > 0. Let r⃗, s⃗, t⃗ be sequences of terms of length m = |Γ|, n = |Δ| and o = |Λ| respectively, and let a : ς be an arbitrary substitution stack. By the production rules for cut we have, for each i ≤ m, j < n and k < o, and each term w, w′ of suitable type, and so, for i ≤ m, We prove N^i_π a r⃗ s⃗ t⃗ ∼ N^i_{π′} a r⃗ s⃗ t⃗ for every i < m + n + o, from which Lemma 6.13 implies N^i_π ∼ N^i_{π′}. For i < m we have by applying (16). For j < n, again applying (16). For k < o, using (15): As noted above, in the case u(A) and u(B) are both positive, language equality holds only in particular circumstances. A sufficient condition for this is given by the next lemma.
Proof. Recall that R and S have type ρ and σ respectively. Then for i < m, The other cases, namely m ≤ i < n + o, follow similar reasoning.
Proof. Assume Ā and B̄ are both Σ_1 formulae (if not, apply Lemma 7.4). Let R and S be as in Lemma 7.5.
We have, by Lemma 7.3, The previous lemma then implies π ∼ π′.
Lemma 7.7. For the same π and π′, if there are no contractions to either the formula Ā in the sub-proof π_1 or the formula B̄ in the sub-proof π_2 then π ∼ π′.
Proof. Suppose there are no contractions to B̄ in π_2 and let R and S be as above. By Lemma 7.3, N^k_{π_2} a t⃗ u ∼ N^k_{π_2} a t⃗ v for any two terms u, v : τ̄_B. Hence, in particular, which suffices, by the proof of Lemma 7.5, to show π ∼ π′.

Contraction reduction
Consider the two proofs where π*_1 denotes a copy of π_1 with fresh eigenvariables. Observe that π_1 ∼ π*_1. Although the reduction above does not in general induce language inclusion, for the two scenarios required in Theorem 1.1, namely either u(A) = 0 or no contractions are applied to the formula Ā in the sub-proof π_1, we have π′ ≲ π. The following two lemmas deal with these two cases.

Quantifier permutation
Concerning permuting quantifier rules with cut, consider the following two proofs.
Let α⃗ = (α_j)_{j≤p} and v⃗ = (v_j)_{j≤p}. Regularity ensures that u(A) = 0. In the following, if u⃗ = (u_j)_{j≤p} is a sequence of terms of type ι and u_{p+1} : τ̄_A, we write u⃗ u_{p+1} to abbreviate the sequence term ⟨u_0, ..., u_{p+1}⟩ : τ̄_{∀v⃗A}.
As in the case of permuting cuts, an application of the quantifier permutation reduction does not preserve equivalence of proofs in all cases. For the main theorem it suffices to prove only π′ ≲ π. This is taken up in Lemma 7.11 below. First, however, we show that if B is not universally quantified then indeed π′ ∼ π.
Lemma 7.10. For π and π′ above, if u(B) = 0 then π′ ∼ π.
Proof. Suppose u(B) = 0 and B is not quantifier-free. The other cases involve similar arguments. Fix r⃗ and s⃗ sequences of normal terms of length m = |Γ| and n = |Δ| respectively, and normal terms t and u⃗ u of type τ̄_{∀v⃗A}. By regularity of π and Lemma 6.15, for each j ≤ n. Concerning π the following equivalences therefore appear for i ≤ m, j < n and k ≤ m + 1, So, if t is a normal term and t ∼ {u⃗_0 u_0, ..., u⃗_l u_l} is given by Lemma 6.6 then for i ≤ m and j < n, Examining π′, we observe Hence N^i_{π′} ∼ N^i_π for every i ≤ m + n and so π′ ∼ π.
Proof. Fix r⃗ and s⃗ sequences of terms of length m = |Γ| and n = |Δ| respectively. Let u⃗ u : τ̄_{∀v⃗A}. Suppose u(B) > 0 and i ≤ m. The other cases have been considered earlier or involve similar but simpler arguments.
As was observed earlier, for each j ≤ n. With respect to π the following equivalences therefore appear.
whereas the rules for π yield, for arbitrary t : τ̄_{∀v⃗A}, If t is a normal term and t ∼ {u⃗_0 u_0, ..., u⃗_l u_l} is given by Lemma 6.6 then for each i ≤ m, whereas, due to pattern-matching in the production rule for π′, The contrast between equations (18) and (19) demonstrates why π ≲ π′ need not hold in general.

Quantifier reduction
Consider the reduction Note that regularity of π implies u(A) = 0. This leaves two cases to consider: A is quantifier-free or e(A) > 0. Suppose the latter, so the cut in π′ remains a quantified cut (the case where A is quantifier-free follows an analogous argument). The following equivalences arise, where i < m and j < n.
The penultimate equivalence in each column is given by Lemma 6.15.

Remaining reductions
The remaining rules are all straightforward to analyse and all induce language equality with the exception of weakening reduction for which only language inclusion holds in general.

Proof of main theorem
We can now prove Theorem 1.1. Let π ⊢ ∃v⃗ F_qf be a regular proof and π = π_0 ⇝ π_1 ⇝ ⋯ ⇝ π_n be a reduction of π to a quasi cut-free proof π_n such that for each i < n, the reduction π_i ⇝ π_{i+1} applies a cut reduction or permutation rule from Figs. 2 or 3 to a sub-proof of π_i with the restriction that a rule reducing the strong quantifier side of a cut is applied only if no other reduction of this cut is possible. By Lemma 7.2 and the analysis in the previous section, L(π_{i+1}) ⊆ L(π_i) for each i < n. This together with Lemma 4.12 establishes part (iii) of the theorem. The existence of a reduction of the form above is well-known (see, e.g., [40]), hence (i). Acyclicity of H_π is shown in Lemma 4.7, the bound on the order of H_π is given by Lemma 4.14, and the language bound in (ii) follows from Theorem 4.15.
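Spelled out, the step-wise inclusions compose by transitivity of ⊆ into a single chain. The display below only restates the argument above, writing π_0, ..., π_n for the reduction sequence:

```latex
\[
  L(\pi_n) \;\subseteq\; L(\pi_{n-1}) \;\subseteq\; \cdots \;\subseteq\; L(\pi_1) \;\subseteq\; L(\pi_0) \;=\; L(\pi),
\]
```

so every term in the language of the quasi cut-free proof π_n already belongs to the language of the original proof π.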

Discussion
This work contributes to the structural analysis of first-order proofs with respect to their Herbrand content. To a first-order classical proof π ⊢ F of a Σ_1 formula we associate a recursion scheme H with a finite language that constitutes a Herbrand set for F. More generally, the language of H covers the Herbrand set implicit in any quasi cut-free proof obtained from π by a sequence of reductions fulfilling the following two restrictions.
1. A contraction on a universally quantified formula is reduced only when no other reduction rule is applicable to this cut;
2. If two cuts are permuted in the following form then either there are no contractions on the formula B in the relevant subproof, or one of A and B is not universally quantified.
where φ is a proof in Shoenfield's calculus [38], t is the realiser extracted from φ, and |t| is the number of symbols in t. The degree dg(φ) is the maximal ¬-depth of a cut formula in φ. The ¬-depth of a formula is defined precisely in the discussion on pp. 17-25 of [16] as the maximal number of nested negations over quantifier-free subformulae (that may contain an arbitrary number of negations). This is sufficient for describing the height of the tower of exponentials since, in Shoenfield's system, ∃x is considered an abbreviation of ¬∀x¬. Thus (the translation of) a Π_n ∪ Σ_n formula has ¬-depth at most n. Presumably it is possible to give a polynomial translation from the sequent calculus into Shoenfield's system which preserves the maximal ¬-depth of cut formulae (but, to the knowledge of the authors, this has not been done in the literature) and, moreover, to bound |t| polynomially in terms of the number of inferences of φ. Under these assumptions, the bound of Gerhardy and Kohlenbach would yield the upper bound 2_{n+1}^{p(|π|)} on the cardinality of a Herbrand expansion for some polynomial p and any sequent calculus proof π with Π_n ∪ Σ_n-cuts. This would be one exponent less than our own.
Closely related is the bound obtained by Buss in [12]. The proof of Theorem 9 of [12] shows, given a proof π where all cut formulae are contained in Π_n ∪ Σ_n, that there is a cut-free proof whose number of inferences is no greater than 2_{n+2}^{|π|}. As an immediate corollary this also yields the upper bound of 2_{n+2}^{|π|} on the cardinality of the Herbrand expansion. If one is interested in the cardinality of the Herbrand expansion, Buss's bound and our Theorem 1.1 give the same number of iterations of the exponential function, but Gerhardy and Kohlenbach's would give one less. If one is interested in the number of inferences in the cut-free proof, Buss's bound is one exponential better than ours but has the same number of exponentials as the one that could be obtained from Gerhardy and Kohlenbach's since the number of inferences is at most exponential in the cardinality of the Herbrand expansion (considering the symbolic complexity of the end-sequent is constant). That being said, the bounds we obtain apply to any cut-free proof (and Herbrand expansion) that can be reached by the class of reductions pertaining to 1 and 2 above. In particular, it places no restriction on which cut is to be reduced at any given step, and therefore accommodates a variety of strategies, including top-most and maximal cut-complexity. Whether this freedom of strategies necessitates the larger bound is not entirely clear, and requires further investigation.
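To give a feeling for what one additional exponential means, the following sketch evaluates the iterated exponential for small arguments. It assumes the common convention 2_0^x = x and 2_{k+1}^x = 2^(2_k^x); the function name `tower` is hypothetical, and the convention fixed elsewhere in the article may differ by an index shift.

```python
def tower(height: int, x: int) -> int:
    """Iterated exponential: tower(0, x) = x and tower(k + 1, x) = 2 ** tower(k, x)."""
    value = x
    for _ in range(height):
        value = 2 ** value
    return value

# Each extra level of the tower exponentiates the previous value.
for h in range(4):
    print(h, tower(h, 2))
# 0 2
# 1 4
# 2 16
# 3 65536
```

Already for the argument 2, one more level takes 16 to 65536, which indicates how coarse a difference of a single exponential between two bounds is.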
Below we highlight finer features of our representation of Herbrand's theorem and some potential applications.

Sequent versus trace grammars
In this paper, the grammar associated to a proof is 'sequent based' in the following sense. Consider an inference of the form The production rules corresponding to r can be seen as transforming a sequence of inputs (x_0, ..., x_m) for the formulae F_0, ..., F_m to a sequence of terms (t_0, ..., t_n) which are used as inputs for G_0, ..., G_n in the inference rule immediately above r. The production rules affect the whole sequence of inputs regardless of which formula is active. This is in contrast with the 'trace'-based grammars of, e.g., [21,1,2] where an inference r with premise Γ, G and conclusion Γ, F is associated a production rule that updates an input for F to an input for G, entirely ignoring the presence of formulae in Γ. In the latter type of grammars the derivations can be viewed as traces that climb up and also down the proof tree mimicking the traces revealed through Gentzen-style cut-elimination. These grammars are generally cyclic and it is necessary to place equality constraints (the 'rigidity' conditions of [21,1]) on derivations to ensure finite languages. For proofs that contain cuts with complexity greater than Π_2/Σ_2 the trace-based analysis quickly becomes infeasible. In contrast, the sequent-based approach generates an acyclic term grammar that not only ensures a finite language but allows one to obtain upper bounds on language size by standard language-theoretic arguments.
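Why acyclicity yields a finite language can be seen on a first-order toy example. The sketch below is an illustrative assumption throughout: `GRAMMAR` is a made-up acyclic regular tree grammar (not a higher order recursion scheme and not derived from any proof), and `language` enumerates its terms by recursion on the dependency order of the non-terminals, which terminates precisely because no non-terminal depends on itself.

```python
from itertools import product

# Hypothetical acyclic grammar: each non-terminal maps to its right-hand sides;
# a right-hand side is a tuple (symbol, child, ...) whose children are
# non-terminal names or terminal strings.
GRAMMAR = {
    "S": [("P", "A", "B")],
    "A": [("c",), ("f", "B")],
    "B": [("c",), ("d",)],
}

def language(symbol):
    """All terminal terms derivable from `symbol`, rendered as strings."""
    if symbol not in GRAMMAR:
        return {symbol}  # a terminal stands for itself
    terms = set()
    for head, *children in GRAMMAR[symbol]:
        if not children:
            terms.add(head)
            continue
        # Every combination of sub-derivations for the children.
        for args in product(*(sorted(language(c)) for c in children)):
            terms.add(f"{head}({', '.join(args)})")
    return terms

print(sorted(language("S")))
```

Here the language of S has |L(A)| × |L(B)| = 3 × 2 = 6 elements, a count read off directly from the grammar; this is the kind of language-theoretic size argument that the acyclic, sequent-based grammars make available.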

Providing a minimal grammar
Part of the motivation behind this study is to ultimately invert the cut-elimination procedure and find an algorithmic method for introducing cuts into cut-free proofs. The idea has been successfully carried out for Π_1/Σ_1-cut introduction and more recently for the introduction of a single Π_2/Σ_2-cut (see [24,23,22,27]). The general method proceeds as follows. Given a cut-free proof π, one computes a concise representation of π as a term grammar (such as a regular tree grammar whose language contains the Herbrand set induced by π). The grammar is then viewed as a proof with cut, in which the cut-formulae are yet to be determined. Finding the cut-formulae involves solving a unification problem induced by the grammar. Key to successfully carrying out this procedure is identifying natural classes of formal grammars that describe the instantiation structure of a proof with cut. Higher order recursion schemes are a promising candidate to lift the method of cut-introduction above the Π_2 level.

First-order logic in finite types
A natural extension to consider is first-order logic in finite types, namely many-sorted predicate logic with a sort of individuals for each simple type and well-typed application as a term forming operation. On the sequent calculus side, we add new quantifier inferences for each type: At the level of types, the definition of τ_A and τ̄_A is extended to incorporate higher-type quantification: for example, τ_{∀v^σ F} = τ_F and τ_{∃v^σ F} = σ × τ_F. The production rules corresponding to the new quantifier inferences will be as before, though the move to higher type means substitution stacks may contain symbols of non-ground types. With appropriate modifications to the notion of normal terms, we expect the analogous language preservation lemmas to hold.

Lifting the prenex restriction
Our representation of first-order proofs as recursion schemes forces an asymmetric interpretation of formulae to types that does not easily generalise to non-prenex cuts. Specifically, the type of an existentially quantified formula is, except in the case of Σ_1, an order higher than the dual universally quantified formula. This disparity is due to the production rules for cut which treat the cut formula from one premise as a function which receives as its input 'witnesses' for the dual (cut) formula in the other premise. If the same representation is to be applied to non-prenex cuts we should expect the types associated to, say, a conjunction to be an order higher than those assigned to the dual disjunction. Motivated by the existing connection to the functional interpretation (Theorem 4.4), the duality of the logical connectives in accounts of Shoenfield's functional interpretation (such as is examined in [39]) may well be relevant. Nevertheless, the primary desideratum is that the associated production rules respect the local operations of reductive cut elimination, so the types of any additional connectives should respect the 'proof-theoretic' semantics imbued by cut-elimination over other computational interpretations.

Functional interpretation for sequent calculus
Shoenfield's functional interpretation [38] maps every first-order formula A to a Π_2 formula in finite types A^S = ∀x⃗ ∃y⃗ A_S(x⃗, y⃗) where A_S is quantifier-free. Gerhardy and Kohlenbach [17] show how a sequence of terms t⃗ can be extracted from a proof of A such that A_S(x⃗, t⃗) is derivable in a quantifier-free predicate logic for simple types. As we saw in Theorem 4.4, modulo a calculus for basic types we may assume A^S has the specific form ∀x^{τ̄_A} ∃y^{τ_A} A_S(x, y), whence the approach of [17] extracts, from a proof π ⊢ A, a realiser t_π : τ̄_A → τ_A and a proof of A_S(x, t_π x). The Herbrand schemes of this article operate similarly. Given a proof π ⊢ A with singleton end-sequent, the term N^0_π ⊥ has the type of realisers for A, namely τ̄_A → τ_A. Moreover, it is easily shown that the induced quantifier-free formula A_S(c_{τ̄_A}, N^0_π ⊥ c_{τ̄_A}) can be derived in a quantifier-free first-order logic extended by an equational calculus for the production rules of the Herbrand scheme H_π (recall c_{τ̄_A} is a canonical constant inhabiting the type τ̄_A).
At this stage it is unclear how close our approach is to the functional interpretation.For instance, an obvious question is whether there exists a class of classical sequent calculus proofs for which Herbrand schemes behave, in a suitable sense, identically to the functional interpretation.Although we do not know the answer to this question, even the superficial connection between the two formalisms that has come to light points to an unexpected correlation between witness extraction and classical cut elimination.

Lemma 4.8.
Every H-term of simple type is a Σ-term, and every H-term of substitution stack type has the form either ⊥ or [α ← s]b for some α ∈ Σ, Σ-term s and b : ς. Proof. The non-terminals of H all have type one of three forms: , pair type, or function type with non-simple co-domain. It therefore follows that the only H-terms of simple type are the Σ-terms. Likewise, ⊥ and [α ← s]b are the only kind of H-terms of type ς. Given a substitution stack [α ← s]b however, as α ∈ Σ the first part of the lemma implies that s is a Σ-term.

Lemma 4.13.
The production rules of Herbrand schemes are type preserving.
N i π 0 b xz k is well-typed and has type τ A = τ A m . Suppose π is derived via cut from sub-proofs π 0 of Γ, A and π 1 of Δ, Ā. Let m = |Γ| and n = |Δ| and fix x and y suitably typed. Without loss of generality we may assume i < m, in which case we are required to show (N n π 1 a y) • A (N m π 0 a x) : τA which reduces (via Remark 4.6) to proving that e(A) > 0 implies τA = τ Ā → τ Ā, u(A) > 0 implies τA = τ Ā and τ Ā = τA → τ A , both of which follow directly from Lemma 4.2.
1. a non-deterministic non-terminal D ρ : ρ → ρ → ρ for each basic type ρ with production rules D ρ rs → r and D ρ rs → s,
2. all non-terminals N i π , S π,i and c ρ from Definition 4.5 with their associated production rules formulated deterministically in terms of the D ρ non-terminals above,
3. for each non-terminal N i π : ς → τ 0 → ⋯ → τ m → τ from the above with τ prime, a non-terminal Ni π with type and associated production rule
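The two production rules D ρ rs → r and D ρ rs → s make D ρ a binary choice operator. As a rough illustration only, the sketch below enumerates every choice-free term reachable from a first-order term containing such choices. It assumes a simplified encoding that is not the article's: terms are strings (constants) or tuples `(head, arg1, ..., argn)`, the head `"D"` stands for any D ρ, and the higher-order non-terminals of the scheme are ignored. The function name `language` is hypothetical.

```python
from itertools import product


def language(term):
    """Return the set of D-free terms derivable from `term`."""
    if isinstance(term, str):
        # A constant symbol rewrites to nothing further.
        return {term}
    head, *args = term
    if head == "D":
        # The two rules D r s -> r and D r s -> s.
        left, right = args
        return language(left) | language(right)
    # An ordinary function symbol: choices in each argument are independent.
    options = [language(a) for a in args]
    return {(head, *combo) for combo in product(*options)}
```

For example, `language(("D", "a", ("D", "b", "c")))` evaluates to the set containing `"a"`, `"b"` and `"c"`, and a function symbol applied to two independent binary choices yields four terms.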

Lemma 7.8.
If u(A) = 0 then π π.
Proof. If A is quantifier-free then π ∼ π is easily established by following the reduction rules for cut. So assume u(A) = 0 < u( Ā). Let m = |Γ| and n = |Δ|, and fix i < m and j < n. Let r = N n π 1 a y and r * = N n π * 1 a y. Unravelling the production rules for the two proofs yields

Table 2
Production rules for Herbrand schemes. x and y are sequences of distinct variable symbols of length m := |Γ| and n := |Δ| respectively.
second part of Lemma 4.8 implies that derivations in H from the start symbol are in 1-1 correspondence with derivations in H . Repeating the argument of Corollary 3.17, the size of L(π) is therefore bounded by 2 K where K is the length of the longest derivation in H from the single start symbol. The order of H is no greater than n, the number of non-terminals is bounded by |π| 2 , and for each production rule F x → t in H , |t| Σ < 3 × |π| where Σ is the ranked alphabet of function symbols and constants occurring in π. Theorem 3.16 then implies K ≤ 2

Lemma 7.3.
Let π be a proof of A_1, ..., A_m, B, C_1, ..., C_n and let a : ς, r_i : τA_i , s : τB and t_j : τC_j be terms for 1 ≤ i ≤ m and 1 ≤ j ≤ n such that N m π a rs t is normal. Let ρ = τB .
1. If B is prenex Σ 1 then N m π a rs t ∼ N m π a rc ρ t.
2. If e(B) > 0 and there are no applications of contraction to B in π then N m π a rs t ∼ N m π a rc ρ t.
3. If the final inference in π is an application of p with immediate sub-proof π of A_1, ..., A_m, C_1, B, C_2, ..., C_n then for each j ∈ [0, m) ∪ [m + 2, m + n],