UvA-DARE (Digital Academic Repository) Exact bounds for acyclic higher-order recursion schemes

Beckmann [1] derives bounds on the length of reduction chains of classes of simply typed λ-calculus terms which are exact up to a constant factor in their highest exponent. Afshari et al. [2] obtain similar bounds on acyclic higher-order recursion schemes (HORS) by embedding them in the simply typed λ-calculus and applying Beckmann's result. In this article, we apply Beckmann's proof strategy directly to acyclic HORS, proving exactness of the bounds on reduction chain length and obtaining exact bounds on the size of languages generated by acyclic HORS. © 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

as a corollary of Beckmann's result [1] and extend them to language bounds. However, they do not prove their reduction chain length bounds exact in the sense of Beckmann. In this article, we derive bounds similar to those in [2] by applying Beckmann's proof strategy directly to acyclic HORS and avoiding the detour through simply typed λ-calculus. This allows us to obtain bounds both on reduction chain length and language size, which we prove exact in the sense of Beckmann. Furthermore, we derive a result which relates the reduction length bounds to the arity of HORS terms, thereby accounting for the differences in the reduction behavior of HORS and the λ-calculus. Our analysis is partly motivated by a desire to illustrate the methodology of [1] by applying it to other rewriting systems.
Outline We begin this article by sketching the overall proof strategy for obtaining the bounds and proving their exactness in Section 2. We continue in Section 3 by formally defining acyclic higher-order recursion schemes and related concepts. The article's technical work is carried out in Sections 4 and 5. In Sections 4.1 and 4.2, respectively, we derive upper and lower bounds on the length of reduction chains of acyclic HORS. We use these to deduce similarly exact language size bounds in Section 5 and close with a discussion of the wider applicability of Beckmann's method in Section 6.

Sketching the proofs
While the details of obtaining reduction length and language size bounds are technical, the general approach, employed by Beckmann [1] and Afshari et al. [2], respectively, is strikingly elegant. We thus begin by sketching the results and proof strategies to obtain them.

Reduction chain length bounds
We derive uniform bounds on the lengths of reduction chains for certain classes of terms of acyclic HORS. These classes are characterized by the degree deg(t) (the highest order among the recursive subterms of t), the recursive size s(t), or the recursive height h(t) of the syntax tree of these terms. The proof of these results is split into two steps.
Deriving the upper bound Following Beckmann [1], the bound is derived using expanded head reduction trees, trees which "account for" certain possible future reductions along a term's reduction chains. More concretely, these trees are formalized as a derivation system ⇓^α_ρ t, describing a tree of height α which accounts for all future reduction steps of non-terminals of an order at least ρ which can occur along any sequence t →* t'. The upper bound is then derived by combining three results about such trees: (i) Every well-typed term t admits a derivation ⇓^α_ρ t with α bounded by the height (or size) of t and ρ bounded by its degree. (ii) We may decrement the index ρ of a derivation at the cost of increasing the height α exponentially. In other words, ⇓^α_{ρ+1} t implies ⇓^{2^α}_ρ t. (iii) Cut-free derivations ⇓^α_0 t bound the length of every reduction chain starting at t.
By repeatedly applying (ii) to trees obtained via (i) until we may derive bounds based on (iii), we arrive at the desired upper bounds. Furthermore, when restricting our attention to those terms which only feature terminals and non-terminals of arity at least k, we can sharpen the estimate to |t| ≤ 2^α_{k+1} and thus subsequently sharpen the reduction length bounds as well. This is a result unique to higher-order recursion schemes which accounts for the fact that higher-arity non-terminals are unfolded in only one reduction step.
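All of these bounds are stated in terms of iterated exponentials. The following sketch (in Python; `tower` is our own name for the function written 2_n(x) in the text) makes the growth rate concrete: each application of step (ii) above costs one further level of exponentiation.

```python
def tower(n: int, x: int) -> int:
    """Iterated exponential: tower(0, x) = x, tower(n+1, x) = 2 ** tower(n, x)."""
    for _ in range(n):
        x = 2 ** x
    return x

# Each decrement of the Cut-level rho adds one exponential level:
print(tower(0, 3), tower(1, 3), tower(2, 3))  # 3 8 256
```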
Deriving the lower bound We bound the length of reduction chains from below by giving terms of suitable degree, height and size which have sufficiently long reduction chains. For example, for the size-based bound, we construct terms D^s_n(N). Alongside analogous terms D^h_n(N), they witness that dh_n(N) = 2_{n+1}(Θ(N)) and ds_n(N) = 2_n(Θ(N)).

Language size bounds
We derive uniform bounds on the size of the language generated by an acyclic HORS term. For this, we first define size- and height-based measures. These depend on one additional parameter: the branching factor bf(t) of a given HORS term t, the maximal number of distinct expressions the non-terminals occurring within t may be unfolded to.
These bounds are exact in the sense that, for any fixed k, the measures ls_k and lh_k can only be improved by a constant factor in their highest exponent. This result is again obtained in two separate steps.
Deriving the upper bound The proof follows the approximation given by Afshari et al. [2]. Essentially, as any normal form of an acyclic HORS can be derived via the leftmost-reduction strategy, there can only be at most "bf(t) different meaningful reduction steps" at any point along the reduction chain. This means that l(t) ≤ bf(t)^{d(t)}, and the language size bounds therefore follow from the previous results on reduction chain length.

Deriving the lower bound Our approach directly parallels that for reduction chain length bounds. For the size-based bound, we build AHORS terms L^s_n(N). For the height-based bound, analogous terms L^h_{n,k}(N) are constructed. These terms then witness the lower bounds lh_n(N) = 2_{n+2}(Θ(N)) and ls_n(N) = 2_{n+1}(Θ(N)).
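The estimate l(t) ≤ bf(t)^{d(t)} can be checked on a toy example. The sketch below (Python; the term representation and helper names are our own, not from the article) counts the normal forms of a term containing several copies of a non-terminal that unfolds non-deterministically into one of two letters.

```python
from itertools import product

def language(copies: int, choices=("a", "b")) -> set:
    """Normal forms of a term holding `copies` occurrences of a
    non-terminal c := a | b, each copy unfolded independently."""
    return {"".join(p) for p in product(choices, repeat=copies)}

def size_bound(bf: int, d: int) -> int:
    return bf ** d  # l(t) <= bf(t)^d(t)

# Three copies, branching factor 2, chain length 3: the bound is tight here.
assert len(language(3)) == size_bound(2, 3) == 8
```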

Acyclic higher-order recursion schemes
The rewriting systems we examine are acyclic higher-order recursion schemes (AHORS). The syntax of (possibly cyclic) HORS expressions and their types is given below.

s, t ∈ Tm ::= x | a | s t
σ, τ ∈ Ty ::= 0 | σ → τ

In addition to variables and applications, HORS expressions can contain letters a, b, c, . . . which are separated into terminals and non-terminals. While terminals, similarly to variables, cannot be reduced further, non-terminals mirror defined procedures common to programming languages in that they act as shorthands which unfold into more complex HORS expressions of ground type 0. In a departure from the behavior of procedures in programming languages, this shorthand relation need not be functional, that is, a non-terminal a may be defined as a shorthand for multiple different expressions t_1, . . . , t_n. In such cases, an application of a may, non-deterministically, unfold a to any expression t_i. We write "a : τ := x_1, . . . , x_n → t^a_1 | . . . | t^a_k" to denote that a is such a shorthand of type τ for the expressions t^a_1, . . . , t^a_k which may refer to the parameters x_1, . . . , x_n. Clearly, some choices of shorthands, such as a : 0 := a, allow for expressions whose reduction sequences diverge, trivializing the question of reduction chain bounds. We thus restrict our attention to acyclic HORS, meaning those whose call-graph (i.e. the graph tracking which shorthands refer to which other shorthands) is acyclic.
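To illustrate the non-deterministic unfolding, the following Python sketch implements one-step unfolding for the special case of ground-type non-terminals (arity 0), so that parameters and substitution can be ignored; the term representation (strings for letters, pairs for applications) is our own.

```python
# Non-terminal context: name -> list of alternative bodies (arity 0 only).
Rules = dict

def unfold_once(term, rules: Rules) -> list:
    """All terms reachable from `term` in one unfolding step."""
    if isinstance(term, str):                     # a letter
        return rules.get(term, [])                # terminals take no steps
    f, x = term                                   # an application (f x)
    return ([(f2, x) for f2 in unfold_once(f, rules)]
            + [(f, x2) for x2 in unfold_once(x, rules)])

rules = {"c": ["a", "b"]}                         # c : 0 := a | b
assert unfold_once(("c", "c"), rules) == [("a", "c"), ("b", "c"),
                                          ("c", "a"), ("c", "b")]
```

Each occurrence of c can be unfolded independently, which is exactly the source of the non-determinism discussed above.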
We begin by defining the inductively defined judgment N ; T ; s : τ which characterizes HORS which are both acyclic and well-typed. Here, N is a context of typed non-terminals of the shape a : τ := x̄ → t^a_1 | . . . | t^a_k as explained above. To simplify arguments involving variable binding and substitution, we require that the choices of parameter variables for each non-terminal are fully distinct from each other. For example, if some non-terminal b with definition b : σ := ȳ → t^b appears in the body t^a of a non-terminal a : τ := x̄ → t^a, this ensures that x̄ ∉ FV(t^b). T and the variable context list typed terminals and variables with the syntax a : τ and x : τ, respectively. We denote the arity of a type τ by aty(τ) as defined below. We sometimes speak of the arity of an expression, in which case we mean the arity of its type.
There are multiple things worth noting about these definitions: First of all, observe that N ; T ; s : τ also asserts the acyclicity of s. Whenever the Def-rule is applied, all shorthand definitions before and including the one currently being unfolded are removed from N . This ensures that the order in which non-terminals appear in N is a strict linear order < such that b < a for any non-terminal b occurring (recursively) in any t a i . It is well-known that a graph, in this case the call-graph, can be linearized in such a fashion if and only if it is acyclic.
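The linearization argument can be made concrete: a depth-first search produces an order with b < a whenever b occurs in a body of a, and detects a cycle otherwise. The Python sketch below is a standard topological sort over an adjacency-map representation of the call-graph (the representation and helper names are our own, not from the article).

```python
def linearize(calls: dict) -> list:
    """Return a linear order in which b precedes a whenever b occurs in a
    body of a, or raise ValueError if the call-graph is cyclic."""
    order, seen, on_path = [], set(), set()

    def visit(a):
        if a in on_path:                 # back-edge: a recursive shorthand
            raise ValueError("cyclic call-graph")
        if a in seen:
            return
        on_path.add(a)
        for b in calls.get(a, ()):       # non-terminals referenced by a's bodies
            visit(b)
        on_path.discard(a)
        seen.add(a)
        order.append(a)

    for a in calls:
        visit(a)
    return order

# a's bodies mention b, b's bodies mention nothing: acyclic, so b < a.
assert linearize({"a": {"b"}, "b": set()}) == ["b", "a"]
```

A definition such as a : 0 := a produces a self-loop and is rejected, matching the restriction to acyclic HORS.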
Secondly, note that whenever the Def-rule is applied, the variable context is replaced with the argument variables of the non-terminal which is being unfolded. This is because an expression associated to a non-terminal must not refer to variables outside of its arguments.
Lastly, observe that N ; T ; s : τ only ensures the well-typedness of all non-terminals occurring recursively in s. It is possible that there is some (a : τ := x̄ → t^a_1 | . . . | t^a_n) ∈ N for which N ; T ; args(x̄, τ) t^a_i : 0 is not derivable but which simply does not occur recursively in s and is thus never "checked" by N ; T ; s : τ. For our purposes, this does not spell any trouble as only those non-terminals which may occur along the reduction sequence of s are of relevance to the question at hand, which are precisely those occurring recursively in s.
The recursive size s(t) and height h(t) of a term t are defined on the derivation of N ; T ; t : τ. This is well-defined as long as each of the terminals and non-terminals is distinct and only one shorthand definition for each non-terminal is contained in N, which we will assume going forward. Under these circumstances, the derivation of N ; T ; t : τ is unique. If t : τ is clear from the context, we sometimes write ord(t) for ord(τ). We write deg(s) for the degree of s, which is the maximal order ord(σ) of the types σ occurring in the derivation of N ; T ; s : τ.
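As a toy illustration of such measures, the following Python sketch computes size and height for plain syntax trees (letters as strings, applications as pairs). These are simplified stand-ins for the recursive measures of the article, which are defined on typing derivations and also unfold non-terminals; the sketch only illustrates the relation s(t) ≤ 2^{h(t)} that is used later.

```python
def size(t) -> int:
    """Number of nodes of the syntax tree (a simplified s(t))."""
    return 1 if isinstance(t, str) else 1 + size(t[0]) + size(t[1])

def height(t) -> int:
    """Height of the syntax tree, counting a leaf as 1 (a simplified h(t))."""
    return 1 if isinstance(t, str) else 1 + max(height(t[0]), height(t[1]))

t = (("b", "x"), ("b", "y"))            # the application (b x) (b y)
assert size(t) == 7 and height(t) == 3  # a binary tree of height h has < 2^h nodes
assert size(t) <= 2 ** height(t)
```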
We close this section by defining the reduction steps s → s′ of the AHORS expressions, relative to a context N of non-terminals. Observe that N is needed to determine which terms t^a_i any given non-terminal a may be unfolded to. We usually omit N, simply writing s → s′, if it is clear from the context. Furthermore, note that a non-terminal may only be unfolded when all of its arguments are present.

Deriving an upper bound
The key tool for establishing the upper bounds on reduction chains is the notion of expanded head reduction trees. These are derivations of judgments of the form N ; T ; ⇓^α_ρ t : τ according to the rules given in Fig. 1. Intuitively, the judgment expresses that the term t of type τ is strongly normalizing.1 The parameter α is an upper bound on the depth of the tree deriving ⇓^α_ρ t and ρ restricts which instances of the Cut-rule may be used. Crucially, ρ = 0 indicates a judgment in the Cut-free fragment of the derivation system.
As the judgments N ; T ; ⇓^α_ρ t : τ are syntactically complex, we omit parts of them which are clear from the context wherever possible. Often, we simply write ⇓^α_ρ t. Most of this section is concerned with proving various properties of these derivations, often the validity of various transformation procedures on them. The choices made in designing these derivation rules are justified by the various lemmas we prove in this section. We thus elaborate on these as we go along.
We begin by illustrating the ultimate purpose of the expanded head reduction tree in this proof method: Observe that for any well-typed term t, there is one unique derivation of ⇓^α_0 t (up to the choice of α, which does not influence the "shape" of the derivation). Derivations ⇓^α_0 t "account for" every possible future reduction step t can take. To prove this, one first defines a size measure |t| on such derivations. Note that the notation |t| is appropriate because of the unique "shape" of the derivation ⇓^α_0 t.
The following lemma should be read as formalizing that ⇓^α_0 s "accounts for" all future reductions of s.

Lemma 2.
If ⇓^α_0 s and s →+ t then ⇓^β_0 t for some β with |s| > |t|.

1 Our choice of notation differs from most of the literature, in which the judgment is usually denoted by ⊢^α_ρ t. We opt for this break in convention to avoid confusion with the typing judgment N ; T ; t : τ. Our notation ⇓^α_ρ t is inspired by the notation for big-step semantics in the programming language design community, who denote by s ⇓ t that a term s normalizes to a term t. We believe the notation ⇓^α_ρ t is appropriate as ⇓^α_ρ t entails strong normalization of t.
Proof. Observe that the claim can be extended to the case s →* t by weakening the inequality to |s| ≥ |t|. We prove the claim per induction on ⇓^α_0 s.
Case Var: Then s = x ū. Any reduction x ū →+ t must take place within the arguments, i.e. t = x ū′ with u_i →* u′_i for all i and u_j →+ u′_j for at least one j. Applying the inductive hypothesis to the arguments then yields the desired inequality. Case Term: Completely analogous to the Var case.
Case δ: Then s = a ū. If |ū| < aty(a) then a cannot be unfolded and the case is analogous to Var. Thus suppose aty(a) = |ū|. There are two possibilities for how a ū →+ t can arise.
1. Possibly, a u_1 . . . u_n →+ a u′_1 . . . u′_n with u_j →+ u′_j for at least one j and u_k →* u′_k for all other k ≠ j. Then we obtain the desired derivation by simply replaying u_k →* u′_k and u_j →+ u′_j at each substituted instance of u_k and u_j, respectively. This then yields ⇓^β_0 t with |s| > |t|.

Given any reduction chain s_0 → s_1 → · · · with ⇓^α_0 s_0, we know that |s_i| > |s_{i+1}| by Lemma 2. That means any chain can be at most of length |s_0|.
With Lemma 2 in mind, we can motivate the choice of premises for the derivation rules of ⇓^α_ρ s. For illustrative purposes, we give a simplified variant of the derivation system, in which details irrelevant to this discussion have been omitted, below.
The proof of Lemma 2 relies on the fact that whenever a term ⇓^α_0 s takes a reduction step s → t, this reduction step is "accounted for" somewhere within the derivation of ⇓^α_0 s. Thus each of the (non-Cut) rules has as its premises the various "places" a reduction step could lead to or take place in. Simplest of all, the Var- and Term-rules only have to account for their argument terms, as variables and terminals cannot take reduction steps on their own, so any reduction involving them will have taken place in one of their arguments. The more interesting case is that of δ: clearly, the premises must account for the reductions of the arguments as well as for the possible unfoldings of the non-terminal itself. Case Var: Then s = y and we may conclude ⇓^α_ρ y as well.
Case Cut: Then s = t t′ and ord(t) ≤ ρ. But this means that ord(t t′) ≤ ord(t) ≤ ρ as well. We can thus arrive at ⇓^{α+1}_ρ t t′ via another Cut.
Note that the proof of Lemma 6 motivates why the δ-rule allows for partial applications of a non-terminal a (those a t̄ : τ with τ ≠ 0) as this accommodation eases the proof in the δ-case. If y ∉ x̄ then y* = y and we may thus conclude that ⇓^{α+n+β}_ρ y ū*. If y ∈ x̄ then y* = t_j. We know that ord(t_j) ≤ ρ, meaning we may derive ⇓^{α+n+β}_ρ t_j ū* by n applications of the Cut rule.
Case δ: Then s = a ū with (a : τ := ȳ → t^a) ∈ N. We know by inductive hypothesis that ⇓^{α+β}_ρ u*_i for each of the arguments.

By replacing each Cut-application of order ρ + 1 with an application of the Cut-admissibility lemma, one can lower the Cut-level to ρ at the cost of an exponential blowup of the height-bound α. We next show that any well-typed acyclic HORS N ; T ; t : τ can be embedded into ⇓^α_ρ t. There are two different methods of achieving this. The "naïve" method simply replaces each typing rule with its corresponding ⇓-rule. Notably, this means App is replaced with Cut, resulting in a height-bound of h(t) and a Cut-level of deg(t). If one instead opts to replace each App of order deg(t) with an application of Lemma 9, one lowers the Cut-level of the resulting derivation to deg(t) − 1 while increasing the height-bound to s(t). As s(t) ≤ 2^{h(t)}, the bound obtained via this embedding will never be worse because one Cut-reduction step and the resulting exponential increase of α is prevented.

Lemma 10 (Embedding). Let N ; T ; t : τ. Then ⇓^{h(t)}_{deg(t)} t and ⇓^{s(t)−1}_{deg(t)−1} t.

Proof. We prove ⇓^{s(t)−1}_{deg(t)−1} t and ⇓^{h(t)}_{deg(t)} t per induction on N ; T ; t : τ. We combine the handling of both claims when possible.
The upper bounds on the lengths of reduction chains can then be obtained by combining all previous results of this section.
Lastly, because of Corollary 3 we know that the length of any reduction chain starting at t is bounded by the size |t| of the resulting cut-free derivation. As this can be done for any N ; T ; t : τ with deg(t) = n > 0, this yields the bounds as desired.
For the case of n = 0, observe that we must have N ; T ; t : 0 where t is a variable, terminal or non-terminal of type 0, such non-terminals in turn unfolding to analogous terms of degree 0. For such terms, each step t → t′ guarantees s(t) > s(t′) and h(t) > h(t′) as no substitutions take place. Thus we obtain the corresponding bounds for n = 0 as well.
Corollary 13 (Strong Normalization). All reduction chains starting at a term N ; T ; s : τ are finite.
Theorem 12 is completely analogous to the result for the simply typed λ-calculus obtained by Beckmann [1]. As we prove in Section 4.2, the bounds are also exact. Nonetheless, the bounds can be improved when considering another parameter: the arity of non-terminals. This result is motivated by the difference in reduction behavior between the λ-calculus and HORS. If a λ-abstraction over multiple variables is applied to multiple arguments, the arguments are substituted one-at-a-time (as pictured below). On the other hand, non-terminals in HORS (such as a := x, y → s below) receive all of their arguments in one reduction step.

(λx.λy.s) u v → (λy.s[x/u]) v → s[x/u, y/v]
a u v → s[x/u, y/v]
This means that high-arity HORS expressions should reduce "faster" than the equivalent λ-terms in some way. The result below makes this intuition concrete.
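The step counts pictured above can be summarized in a deliberately trivial sketch (Python; a toy model that only counts steps, with function names of our own choosing):

```python
def beta_steps(arity: int) -> int:
    """(λx1. ... λxk. s) u1 ... uk: one β-step per argument."""
    return arity

def unfold_steps(arity: int) -> int:
    """a u1 ... uk -> s[x1/u1, ..., xk/uk]: a single unfolding step."""
    return 1

# A binary abstraction costs two β-steps; the HORS non-terminal costs one.
assert beta_steps(2) == 2 and unfold_steps(2) == 1
```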

Lemma 14 (Estimation II). Let s be an AHORS expression such that all non-terminals (recursively) occurring in s are of arity at least k. Then ⇓^α_0 s implies |s| ≤ 2^α_{k+1}.

Proof. The only interesting case is that of aty(a) = |t̄| = n. The claim then follows because n ≥ k and thus k − n ≤ 0.
Corollary 15. When restricting our attention to AHORS expressions which contain only non-terminals of arity at least k, we obtain the following upper bounds.

Deriving a lower bound
In this section, we show that the upper bounds from Section 4.1 are exact in the sense of Beckmann [1]. More concretely, we prove lower bounds matching the upper bounds up to a constant factor in their highest exponent, from which we can conclude that the bounds from Section 4.1 can only be improved to bounds of the shape 2_{n+1}(r_h · N + c_h) and 2_n(r_s · N + c_s) for constants 0 < r_s, r_h < 1 and c_s, c_h ∈ R.
To derive this result, we follow Beckmann in defining terms D^s_n(N) and D^h_n(N) of suitable degrees, sizes and heights which have reduction chains of length at least 2_n(N) and 2_{n+1}(N), respectively. For this, fix a terminal B : 0 → 0 → 0. For terms t : 0 we define b-trees of height K with t at each leaf as follows: T_0(t) := t and T_{K+1}(t) := B T_K(t) T_K(t). Such trees are of interest as they allow for exponential replication of reduction steps, as proven in the following lemma:

Lemma 16. For any t, s : 0 with t →^n s we have T_K(t) →^{n·2^K} T_K(s).
Proof. Observe that T_K(t) has 2^K copies of t at its leaves. Each copy can take n steps, leading to n·2^K steps from T_K(t) to T_K(s) overall.

Now, with a non-terminal b : 0 → 0 := x → B x x and writing [f]^n(v) for the n-fold application of f to v as defined below, we can derive the following result. The majority of this section is concerned with "compressing" these trees into terms S^N_n(t) and H^N_n(t) of sufficiently small size and height. Beckmann [1] achieves this with the help of Church encodings of natural numbers, an approach that we parallel in HORS. Unfortunately, the reduction behavior of HORS Church encodings of different orders applied to each other is somewhat erratic, as often only reduction steps in the outer-most application may be taken. We thus first develop a theory of n-repeaters, which can be seen as a semantic characterization of Church encodings and which helps us control this behavior in our proofs. Writing τ_i for the types given below, we define n-repeaters of order i. This definition allows us to state and prove the following crucial lemma. It is needed for proving both the well-definedness of our HORS Church encodings and the correctness of S^N_n(t) and H^N_n(t).
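The replication behind Lemma 16 is easy to check on a concrete representation of the b-trees. The Python sketch below (our own representation, following T_0(t) = t and T_{K+1}(t) = B T_K(t) T_K(t)) confirms that T_K(t) carries 2^K copies of t, so that n steps inside t give rise to n·2^K steps inside T_K(t).

```python
def T(K: int, t):
    """b-tree of height K with t at each leaf:
    T_0(t) = t, T_{K+1}(t) = B T_K(t) T_K(t)."""
    return t if K == 0 else ("B", T(K - 1, t), T(K - 1, t))

def leaves(tree) -> int:
    """Number of leaf copies of t inside a b-tree."""
    if isinstance(tree, tuple) and tree[0] == "B":
        return leaves(tree[1]) + leaves(tree[2])
    return 1

K, n = 4, 3
assert leaves(T(K, "t")) == 2 ** K        # 16 copies of t
# n steps per copy replicate to n * 2^K steps overall (Lemma 16).
assert n * leaves(T(K, "t")) == 48
```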
Lemma 19. For a family (f^{n_j}_j)_{j≤i} of n_j-repeaters of order j, N > 0 and any m : τ_1 and v : tower(n_0, . . . , n_i, N).

Proof. We proceed by induction on the maximum order i; within each case, we argue by a further induction on N.
Lemma 20. The term e^n_i : τ_{i+2} is an n-repeater of order i.

Language size bounds
To be able to state the language size bounds as a closed expression, one needs 2 to separate terms into classes according to one further parameter.
Definition 26. For any sequence N of non-terminal assignments, we define its branching factor bf(N) as the least n such that all non-terminal mappings (a := x̄ → t^a_1 | . . . | t^a_m) ∈ N have m ≤ n.
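As a quick sketch (Python, with the non-terminal context modeled as a map from names to their lists of alternative bodies; the representation is our own), bf(N) is simply the maximal number of alternatives across all assignments:

```python
def branching_factor(N: dict) -> int:
    """bf(N): least n such that every non-terminal in N has at most n
    alternative bodies t_a_1 | ... | t_a_m, i.e. m <= n."""
    return max((len(alts) for alts in N.values()), default=0)

# a := t1 | t2 | t3 and b := s yields branching factor 3.
assert branching_factor({"a": ["t1", "t2", "t3"], "b": ["s"]}) == 3
```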
Overall, we obtain the bounds ls_k and lh_k, where k is the branching factor of the terms under consideration. The strategy for obtaining the upper bound on language size follows that employed by Afshari et al. [2] to derive such upper bounds from their reduction chain length bounds.
Corollary 28. We obtain the bounds ls_k and lh_k as stated above.

The strategy for obtaining the lower bounds is very similar to that in Beckmann [1] and Section 4.2: define terms which "blow up" to a sufficient degree into many copies of a non-deterministic non-terminal, which in turn generate a large language. The proof also requires a result relating binary tree expressions with their resulting language size, similar to Lemma 16 for the reduction chain length bounds.
Proof. Per induction on K. The case K = 0 is trivial. For the K + 1 case, the claim follows by applying the inductive hypothesis to both immediate subtrees.

Discussion
We have obtained bounds on the lengths of reduction chains and the sizes of the languages generated by classes of acyclic HORS and have proven their exactness in the sense of Beckmann [1]. Key to these results was adapting Beckmann's method of expanded head reduction trees to acyclic HORS. A similar undertaking can be found in the work of Clairambault [8,9] who adapts Beckmann's method to obtain bounds on the length of plays in game semantics for higher-order programming languages.
A drawback of Beckmann's method, as he presents it in [1], and of our adaptation, is that it does not easily apply to basic extensions of the systems presented. In cut-free derivations, the next applicable rule is always determined by the left-most syntactic element of the term. In contrast, consider the following rewriting rule for the projections on pairs, added to either the simply typed λ-calculus or HORS:

π_i (s_1, s_2) → s_i

With this rule, an expression π_i s, with s not being of the form (s_1, s_2), cannot take the π-reduction step. This means there is no expanded head reduction tree derivation rule which applies to all expressions of the form π_i s t̄, at least not in the style employed by Beckmann in [1]. This problem was addressed by Beckmann and Weiermann in [13], where they give bounds for terms of Gödel's T, including the recursion combinator, which raises the same problems as the projections π_i. Instead of applying rules based on the left-most syntactic element, the applicability of these rules is determined by the term's "head redex", which may occur deep inside the term under consideration.
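The difficulty can be seen in a small Python sketch (our own toy representation of pairs and projections, not code from [1] or [13]): in π_1 (π_1 ((a, b), c)), the reducible position is the inner projection, so a rule dispatching only on the left-most symbol cannot handle all terms of the form π_i s.

```python
def step(t):
    """One projection step pi_i (s1, s2) -> s_i at the head redex,
    searching beneath nested projections when the argument is not a pair."""
    if isinstance(t, tuple) and t[0] == "pi":
        _, i, s = t
        if isinstance(s, tuple) and s[0] == "pair":
            return s[i]                      # pi_i (s1, s2) -> s_i
        inner = step(s)                      # the head redex sits deeper
        return None if inner is None else ("pi", i, inner)
    return None                              # letters and pairs are stuck

# pi_1 (pi_1 ((a, b), c)): the redex is the *inner* projection.
t = ("pi", 1, ("pi", 1, ("pair", ("pair", "a", "b"), "c")))
assert step(t) == ("pi", 1, ("pair", "a", "b"))
assert step(step(t)) == "a"
```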
In [2], Afshari et al. translate derivations in the classical first-order sequent calculus with Cut into AHORS whose generated languages are the derivation's Herbrand expansions. To estimate the number of Herbrand expansions which may be generated in this manner, they derive reduction chain length and language size bounds for AHORS by embedding AHORS into the λ-calculus and applying Beckmann's result [1]. To express their bound, they employ a notion of expression size which does not recurse on non-terminals. That is, for an expression N ; T ; t : τ their size function s′(t) is defined as below: For a given expression 3 N ; T ; t : 0, they then derive a reduction chain length bound exponential in (|N| + 1)(k + 1) which, at face value, is exponentially larger than our bound of 2^{s(t)}_{deg(t)}. However, this difference can be attributed to the fact that our notion of size can be exponentially larger than theirs. To illustrate this, fix terminals b : 0 → 0 → 0 and I : 0 as well as the non-terminals below and consider t := a_n I.
a_0 : 0 → 0 := x → b x x
a_{n+1} : 0 → 0 := x → b (a_n x) (a_n x)

For an N which contains the definitions for a_0, . . . , a_n we have |N| = n + 1 and k = s′(b (a_m x) (a_m x)) = 4, meaning (|N| + 1)(k + 1) = 5(n + 2). On the other hand, s(t) = 2^{n+3} − 2. Thus, in the worst-case scenario of "exponentially growing" AHORS, both bounds can be seen to agree, as the exponential "saved" in our bound is reintroduced by our notion of size. It should be noted that for the application of the bounds in [2], our bounds are tighter than those derived by Afshari et al. because the size of the AHORS under consideration is linear in the size of the sequent calculus derivation for both notions of AHORS size, meaning the exponential worst-case illustrated above does not occur.