Conjugacy problem in groups with quadratic Dehn function

We construct a finitely presented group with quadratic Dehn function and undecidable conjugacy problem. This solves E. Rips' problem formulated in 1992. v2: misprints corrected.


Introduction
We assume that the reader is familiar with basic facts and definitions about van Kampen (disk) and annular (Schupp) diagrams over group presentations. We recall some of them in Section 5.2 below; see also the books [11,13,25].
The Dehn function of a finitely presented group $G = \langle X \mid R\rangle$ is the smallest function $f(n)$ such that for every word $w$ of length at most $n$ in the alphabet $X \cup X^{-1}$ which is equal to 1 in $G$, there exists a van Kampen diagram over the presentation of $G$ with boundary label $w$ and area at most $f(n)$. It is well known [6,7] that the Dehn functions of different finite presentations of the same group are equivalent, where we call two functions $f(n)$, $g(n)$ equivalent if for some constants $A, B, C, D \ge 1$ we have $\frac{1}{A} f\left(\frac{n}{B}\right) - Cn - D < g(n) < A f(Bn) + Cn + D$.
As usual, we do not distinguish equivalent functions. The Dehn function of a group is an important asymptotic invariant. From the algorithmic point of view, a smaller Dehn function means a more tractable word problem (see, for example, the Introduction of [26] for details). Moreover, as was shown in [1], a finitely generated (not necessarily finitely presented) group has word problem in NP if and only if it is a subgroup of a finitely presented group with polynomial Dehn function (a similar result holds for other computational complexity classes [1]). From the geometric point of view, the Dehn function measures the "curvature" of the group: linear Dehn functions correspond to negative curvature, quadratic Dehn functions correspond to zero curvature, etc.
More precisely, a finitely presented group is hyperbolic if and only if it has a subquadratic (hence linear) Dehn function [6,2,14]. In particular, the conjugacy problem in such groups is decidable [6].
It is also known that groups with quadratic Dehn functions exhibit certain "nongeneric" non-positive curvature behavior as far as geometric and algorithmic properties are concerned. For example, their asymptotic cones are simply connected [23]. For large classes of groups with quadratic Dehn functions, the conjugacy problem is decidable. In fact this is true for all known examples of groups with quadratic Dehn functions, such as bi-automatic groups [5], $SL_n(\mathbb{Z})$, $n \ge 5$ [27,8], groups acting geometrically on CAT(0) spaces [4], the R. Thompson group $F$ [9,10], free-by-cyclic groups [3,20], etc. The decidability of the conjugacy problem was proved in a completely different way in each of these cases, and it is natural to ask whether every group with quadratic Dehn function has decidable conjugacy problem and whether there is a uniform proof of that fact. That question was first formulated by Rips in the early 90s (some of the important results mentioned above had not appeared yet at that time).
Problem 1.1 (Rips). Does every finitely presented group with quadratic Dehn function have decidable conjugacy problem?
In fact Rips had a "quasi-proof" showing that the answer should be positive. That "quasi-proof" first appeared in [20]. Basically the idea is the following (see details in [20]). If the conjugacy problem in a group $G = \langle X \mid R\rangle$ is undecidable, then for arbitrary $n \in \mathbb{N}$, for some pairs of words $(u, v)$ in the alphabet $X$ of length $\le n$, there exists a minimal area annular diagram $\Delta$ with boundaries labeled by $u$, $v$ and no path of length smaller than $f(n)$ connecting the two boundaries, where $f$ is any given recursive function. Let $q$ be a simple path connecting the boundaries of $\Delta$, $t = |q|$. Then there are simple closed paths $p_1, \dots, p_m$ of $\Delta$ surrounding the hole such that $p_i$, $p_j$ do not intersect if $i \ne j$, and $m > c_1 t$ for some constant $c_1$. The area of $\Delta$ is at least a constant times $\sum_i |p_i|$. If "many" lengths $|p_i|$ are less than $c \log t$, where $c = \frac{1}{2|X|}$, then two of the paths $p_i$, $p_j$ ($i \ne j$) have the same labels. That allows us to identify $p_i$, $p_j$ and remove the annular subdiagram of $\Delta$ bounded by $p_i$, $p_j$, decreasing the area of $\Delta$, a contradiction. Therefore "many" lengths $|p_i|$ are at least $c_2 \log t$ for some constant $c_2$. Hence the area of $\Delta$ is at least $c_3 t \log t$ for some constant $c_3$. If we cut $\Delta$ along the path $q$, we obtain a disk van Kampen diagram $\Delta'$ with boundary path subdivided into four parts $q_1 p_1 q_2^{-1} p_2^{-1}$, where $|p_1|, |p_2| \le n$ and the labels of $q_1$ and $q_2$ coincide with the label of $q$. The area of $\Delta'$ is at least $c_3 t \log t$. Since the labels of $q_1$, $q_2$ are the same, we can glue $t/n$ copies of $\Delta'$ together to obtain a van Kampen diagram $\Delta''$ with perimeter bounded from above by a linear function in $t$ and area bounded below by $c_3 t^2 \log t / n$. Since $t$ is bounded below by any given recursive function in $n$, $n$ is insignificant compared to $t$. The diagram $\Delta''$ can be assumed reduced. So we found a reduced van Kampen diagram of perimeter $\sim t$ and area $\sim t^2 \log t$. Hence the Dehn function cannot be smaller than $n^2 \log n$.
The incorrectness of this "quasi-proof" is in the last phrase. Indeed, there may be a smaller area van Kampen diagram with the same boundary label as $\Delta''$. Still there is a lot of flexibility in choosing $\Delta$ and the path $q$ in it. It looks like it would require an infinite number of relations to ensure that all the boundary paths of various diagrams $\Delta''$ have fillings with much fewer cells than $\Delta''$. In particular, if $G$ satisfies some mild form of asphericity, the proof should work. Rips conjectured that this should be true for all finitely presented groups. In [20] we confirmed this conjecture for a wide class of multiple HNN extensions of free groups. We also constructed in [20] a multiple HNN extension of a free group with undecidable conjugacy problem and the minimal possible Dehn function $n^2 \log n$.
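The counting in the gluing step of the "quasi-proof" can be written out explicitly. The following is only a sketch of that arithmetic under the assumptions stated in the text; the names $D$ (the disk diagram obtained by cutting the annular diagram along $q$) and $D_k$ (the diagram glued from $k$ copies of $D$) are introduced here for illustration and are not notation from the paper.

```latex
\begin{align*}
k &= t/n && \text{(number of glued copies of } D\text{)},\\
|\partial D_k| &\le |q_1| + |q_2| + k\,(|p_1| + |p_2|)
   \le 2t + \frac{t}{n}\cdot 2n = 4t,\\
\mathrm{Area}(D_k) &= k \cdot \mathrm{Area}(D)
   \ge \frac{t}{n}\cdot c_3\, t \log t = \frac{c_3}{n}\, t^2 \log t .
\end{align*}
```

So the perimeter of $D_k$ is linear in $t$ while its area is of order $t^2 \log t$ up to the factor $1/n$. As the text explains, this bounds the area of one particular diagram only, not the minimal area of a diagram with the same boundary label.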
Nevertheless, in this paper, we give a negative answer to Rips' question (and hence disprove Rips' conjecture as well):
Theorem 1.2. There exists a finitely presented group with undecidable conjugacy problem and quadratic Dehn function.
As in several of our previous papers ([26,1,20,17]), the construction is based on an S-machine (we call it $M_5$) which can be viewed as a computing device with undecidable halting problem or as a group which is a multiple HNN extension of a free group. S-machines were first introduced by Sapir in [26] (see Section 2.1 below for the definition used here and [24] for various other definitions).
In order to describe some ideas of our proof in more detail, let us start with a simple example of an S-machine $S$. (That S-machine first appeared in [21]. The corresponding group was the first example of a group with polynomial Dehn function, linear isodiametric function and non-simply connected asymptotic cones, answering a question of C. Druţu.) It is a rewriting system [25] with alphabet $\{a, q, a^{-1}, q^{-1}\}$ and two "same" rules $\theta_i : q \to aq$ and their inverses $\theta_i^{-1} : q \to a^{-1}q$, $i = 1, 2$. The rewriting system works with group words in $\{a, q\}$, and applying a rule $\theta_i^{\pm 1}$ means replacing every letter $q^{\epsilon}$ (where $\epsilon = \pm 1$) by $(a^{\pm 1}q)^{\epsilon}$ and then reducing the word. The S-machine $S$ can also be viewed as a multiple HNN extension of the free group $\langle a, q\rangle$: $\langle a, q, \theta_1, \theta_2 \mid q^{\theta_i} = aq,\ a^{\theta_i} = a,\ i = 1, 2\rangle$.
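The action of the rules of $S$ on group words can be sketched in a few lines of Python. This is an illustration only: the encoding of a word as a list of (letter, exponent) pairs, the encoding of a history as a list of (rule index, sign) pairs, and the function names `free_reduce`, `apply_theta` and `run` are assumptions made for this sketch, not notation from the paper.

```python
def free_reduce(word):
    """Cancel adjacent mutually inverse letters.

    A word is a list of (letter, exponent) pairs with exponent +1 or -1.
    """
    out = []
    for letter, exp in word:
        if out and out[-1] == (letter, -exp):
            out.pop()
        else:
            out.append((letter, exp))
    return out


def apply_theta(word, sign):
    """Apply theta_i (sign=+1) or its inverse (sign=-1).

    Every q^e is replaced by (a^sign q)^e, then the word is reduced.
    The two rules theta_1, theta_2 act identically, so no index is needed.
    """
    result = []
    for letter, exp in word:
        if letter == "q" and exp == 1:
            result += [("a", sign), ("q", 1)]
        elif letter == "q" and exp == -1:
            result += [("q", -1), ("a", -sign)]
        else:
            result.append((letter, exp))
    return free_reduce(result)


def run(word, history):
    """Return the list of words of the computation with the given history.

    history is a list of (rule_index, sign) pairs; the index only records
    which of the two identical rules is used.
    """
    words = [word]
    for _index, sign in history:
        words.append(apply_theta(words[-1], sign))
    return words
```

For instance, starting from the one-letter word $q$, the history $\theta_1\theta_2\theta_1^{-1}\theta_2^{-1}$, encoded as `[(1, 1), (2, 1), (1, -1), (2, -1)]`, passes through $aq$, $a^2q$, $aq$ and returns to $q$. The history is a reduced word even though the sequence of words repeats, which is why both rules $\theta_1$, $\theta_2$ are kept.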
(Note that this is far from the only way to interpret S-machines as groups. We are using a different interpretation in this paper, and the most complicated one so far was used in [19]. But the main principle is still the same.) As the name S-machine suggests, we can also consider $S$ as a kind of Turing machine with tape letter $a$, state letter $q$ and commands $\theta_1$, $\theta_2$ (and their inverses). Then we can consider computations. Say, (1.1) is a reduced computation of $S$. At the same time, if we consider $S$ as an HNN extension of the free group, then this computation corresponds to the van Kampen diagram in Figure 1. This diagram is called the trapezium corresponding to the computation (1.1). Three things need to be noticed from this diagram.
1. The trapezium looks like a rectangle with the first word and the last word of the computation on the bottom and top side. All other words of the computation are on the horizontal paths of the trapezium, and θ's conjugate each of these words to the next one.
2. The vertical sides of the trapezium are labeled by the same words: the history of the computation (in the case of (1.1) it is $\theta_1\theta_2\theta_1^{-1}\theta_2^{-1}$).
3. The trapezium has three types of bands (also called corridors in the literature), i.e., sequences of cells where each two consecutive cells share an edge with a prescribed letter: horizontal $\theta_i^{\pm 1}$-bands, vertical $q$-bands and $a$-bands. The median lines of these bands serve as "walls" in van Kampen diagrams over S-machines, provide necessary rigidity and are crucial for all applications of S-machines.
Now let us continue our description of the construction and proof of Theorem 1.2.
As any S-machine viewed as a group, M 5 contains Y -letters, q-letters and θ-letters. Some of the words containing only q-and Y -letters are called input words. Among them, there is one word W 0 which does not contain the Y -letters. All other input words are obtained from W 0 by inserting a power of a single letter a into the input sector of W 0 .
If we view M 5 as multiple HNN-extension M 5 of a free group, it easily follows from undecidability of the halting problem by M 5 that the group M 5 has undecidable conjugacy problem (the existence of such S-machines was proved in [26]). (A construction of groups with undecidable conjugacy problem was suggested by C. Miller [12]. In fact although C. Miller did not use the term S-machine, the first non-trivial S-machine was constructed by him. We call it the Miller machine in [19].) By [20], the Dehn function of M 5 is at least n 2 log n.
But we prove here that most of the area in van Kampen diagrams of large area over M 5 is concentrated in a few standard trapezia which we call big trapezia.
The phenomenon that large area of a van Kampen diagram is concentrated in a few large standard subdiagrams is interesting and seems to be very common. For example, we proved similar facts for van Kampen diagrams over presentations satisfying the small cancellation condition $C(p)$-$T(q)$ in the CAT(0) case $\frac{1}{p} + \frac{1}{q} = \frac{1}{2}$ in [22]. In that case the geometric meaning of the existence of large standard subdiagrams is very close to a popular topic in CAT(0) geometry: "every quasi-flat in the universal cover of the presentation complex is close to a flat" (see a discussion in [22]). In the case of S-machines, we proved similar facts in [20] and [17]; in both cases, as in the present paper, these were crucial steps in the proofs.
The big trapezia over M 5 must correspond to "very long" computations of M 5 . The S-machine M 5 is constructed in such a way that long computations C are "normal", that is the start word w of C can be reached by the S-machine M 5 from some (determined uniquely by the computation) input word u of M 5 . That is another crucial property of M 5 .
Then we lower the Dehn function from $\ge n^2 \log n$ to $O(n^2)$. For this goal we embed $M_5$ in a larger group $G$. We extend the S-machine $M_5$ to $M = M_6$ and add hub relations to obtain the group $G$. The hub is the product of L 1 copies of the accept word of $M$; for convenience, we use two hubs in this paper, but the second hub relation follows from the other relations of $G$. (Note that hub relations are usually used in constructions of groups with undecidable word problem, but the word problem in $G$ is decidable.) The hubs and the disks (that is, hubs surrounded by $\theta$-annuli) make the areas of words that are trivial over $M$ quadratic with respect to the presentation of $G$ (another important idea). Therefore the presentation of $G$ is highly non-aspherical: the boundaries of the large normal trapezia can be filled both by diagrams with $\sim n^2 \log n$ cells and by diagrams with at most $\sim n^2$ cells.
The new S-machine $M$ is obtained by augmenting $M_5$ with a simple S-machine $M_{12}$ (the union of Steps 1 and 2 in the definition of $M$ given in Subsection 4.1) which starts with a specific input word $W_0$ with no tape letters, and produces (nondeterministically) an arbitrary input word $u$ of $M_5$ by inserting a power of $a$ into the input sector (by a command similar to $\theta_1$ of $S$). This augmentation provides us with the property that an arbitrary configuration of a "long computation" of $M$ can be reached with linear time and space either from $W_0$ or from the stop configuration of $M$. Afterwards this linearity guarantees quadratic estimates of the areas of both disks and big trapezia over the presentation of $G$. The linearity is achieved by, in particular, adding many so-called history sectors where the history of a computation is non-deterministically written before the actual computation executing that history starts.
In order to connect $M_{12}$ with the S-machine $M_5$ and obtain the main S-machine $M$, we need one rule, called $\theta(23)$, which changes the state letters to the start state letters of $M_5$. However, the standard interpretation of $M$ as a group would make the conjugacy problem decidable in the group $M$. So the rule $\theta(23)$ is interpreted in $G$ as turning all copies of the input word into identical words (by erasing extra indices). This new "irregular" interpretation requires a study of some non-reduced (eligible) computations, i.e., the history of an "eligible" computation may contain (many) subwords $\theta(23)\theta(23)^{-1}$.
The proof that G has quadratic Dehn function is much harder than the proof of undecidability of the conjugacy problem. We use several tools developed in [26,20,16,17] and more. As in all our papers where estimates of the Dehn function are produced, we need to consider diagrams with and without hubs separately. This is done in Sections 6 and 7 respectively. In both cases, one of the main ideas is to assign to the boundary of every van Kampen diagram ∆ over the presentation of G a certain numeric invariant µ(∆) (the mixture from [17]) which is bounded from above by a quadratic function in terms of the perimeter. We had a somewhat similar numeric invariant called dispersion in [20] but that invariant does not work well for diagrams with hubs.
To obtain a quadratic estimate for diagrams $\Delta$ over $M$, we have to consider artificial G-areas instead of areas, and only at the end of this paper do we replace the diagrams of quadratic G-area over $M$ with diagrams with hubs having quadratic (usual) areas over $G$. The quadratic upper bound for the G-area is obtained by induction on the (modified) perimeter $n$ of $\Delta$. We perform surgeries on the diagram, so that each surgery makes the diagram look more "standard" and smaller. Our inductive argument estimates the G-area in terms of some linear combination of $n^2$ and the mixture $\mu(\Delta)$. Although we are not able to choose just one of these two summands for induction, the final upper bound of the G-area is $O(n^2)$, because of the aforementioned quadratic estimate of the mixture in terms of $n$.
In the case of diagrams with hubs, we estimate a similar linear combination, but the inductive parameter is not the (modified) perimeter $n$ but the sum $\Sigma = n + \sigma(\Delta)$. The invariant $\sigma(\Delta) = \sigma_\lambda(\Delta)$ was invented in [17]. It is defined by the design formed by maximal bands of two types in $\Delta$. The important and non-trivial feature of the $\sigma$-invariant is the linear inequality $\sigma_\lambda(\Delta) = O(n)$, and so the quadratic upper bound of the form $O(\Sigma^2)$ is also quadratic in terms of the perimeter $n$.
In fact in both cases (over M or over G), the proof proceeds by taking a minimal counterexample diagram ∆ and then by performing surgeries trying to find a smaller counterexample. This provides more and more useful information about ∆, until finally one of the surgeries succeeds and we show that ∆ could not have been a minimal counterexample.
For instance, in Section 7 where diagrams with hubs are considered, we need to remove one of the disks from the diagram. As in our previous papers (starting with [26] and [15]), we use hyperbolicity of certain graph associated with hubs (hubs are vertices, q-bands connecting hubs are edges), and find a hub connected to the boundary of the whole diagram by almost all bands starting on the hub. This gives a subdiagram of ∆ consisting of a subdiagram called a clove and a disk. We would like to remove that subdiagram from ∆ producing a smaller counterexample.
A similar task was solved in [26]. It is one of the most non-trivial parts of [26]. Using it, we decomposed a diagram in [26] into a few disks of small total perimeter and a diagram without hubs; this was called the snowman decomposition. But that task is now much harder than in [26]. The reason is that in [26], after removing the clove and the disk, we needed to show that the perimeter of the diagram decreases and the perimeter of the removed disk (only the disk) is linearly bounded by the difference of the perimeters of the old and new diagrams. For the quadratic upper bound this is not enough. We need to get a linear lower bound of the difference in terms of the whole piece that we cut off (the clove and the disk). This cannot always be achieved. If not, we get new information about the disk and the clove and remove the disk together with a certain sub-clove. The mixture and the $\sigma_\lambda$-invariant help achieve this in the end.
Some estimates used in this paper are very similar to the estimates in [17]. More precisely, for every function $f(n)$ satisfying certain conditions, a finitely presented group $G_f$ with Dehn function $n^s f(n)^3$ (where $s \ge 2$) is constructed in [17]. In particular, if $s = 2$ and $f(n)$ is a constant, then $G_f$ has quadratic Dehn function. Although the group $G_f$ in [17] is very different from the group $G$ in this paper, the underlying S-machines have similar enough properties, so that we could use identical and almost identical proofs of several lemmas (which indicates that there is a general theory of S-machines for which this paper and [17] are applications). For the sake of completeness, we include these lemmas here.

S-machines as rewriting systems
There are several equivalent definitions of S-machines (see [24]). We are going to use the following definition, which is easily seen to be equivalent to the original definition from [26] (essentially the same definition was used in [20]): Here and below $\sqcup$ denotes the disjoint union of sets.
We always set $Y_n = Y_0 = \emptyset$ and if $Q_n = Q_0$ (i.e., the indices of $Q_i$ are counted mod $n$), then we say that $S$ is a circular S-machine. The elements from $Q$ are called state letters, the elements from $Y$ are tape letters. The sets $Q_i$ (resp. $Y_i$) are called parts of $Q$ (resp. $Y$).
The language of admissible words consists of reduced words $W$ of the form
$$q_1 u_1 q_2 u_2 \dots q_s, \qquad (2.2)$$
where every $q_i$ is a state letter from some part $Q_{j(i)}^{\pm 1}$, $u_i$ are reduced group words in the alphabet of tape letters of the part $Y_{k(i)}$ and for every $i = 1, \dots, s$ one of the following holds:
Every subword $q_i u_i q_{i+1}$ of an admissible word (2.2) will be called the $Q_{j(i)}^{\pm 1} Q_{j(i+1)}^{\pm 1}$-sector of that word. An admissible word may contain many $Q_{j(i)}^{\pm 1} Q_{j(i+1)}^{\pm 1}$-sectors. For every word $W$, if we delete all non-$Y^{\pm 1}$ letters from $W$ we get the $Y$-projection of the word $W$. The length of the $Y$-projection of $W$ is called the $Y$-length and is denoted by $|W|_Y$. Usually parts of the set $Q$ of state letters are denoted by capital letters. For example, a part $P$ would consist of letters $p$ with various indices.
If an admissible word $W$ has the form (2.2), $W = q_1 u_1 q_2 u_2 \dots q_s$, and $q_i \in Q_{j(i)}^{\pm 1}$, $i = 1, \dots, s$, $u_i$ are group words in tape letters, then we shall say that the base of $W$ is the word $Q_{j(1)}^{\pm 1} Q_{j(2)}^{\pm 1} \dots Q_{j(s)}^{\pm 1}$. Here $Q_i$ are just symbols which denote the corresponding parts of the set of state letters. Note that, by the definition of admissible words, the base is not necessarily a reduced word.
Instead of saying that the parts of the set of state letters of $S$ are $Q_0, Q_1, \dots, Q_n$ we will write that the standard base of the S-machine is $Q_0 \dots Q_n$.
The software of an S-machine with the standard base Q 0 ...Q n is a set of rules Θ.
Each component q i → a i q i b i is called a part of the rule. In most cases the sets Y j (θ) will be equal to either Y j or ∅. By default Y j (θ) = Y j .
To apply a rule θ = [q 0 → a 0 q 0 b 0 , ..., q n → a n q n b n ] as above to an admissible word . . , s − 1); and if this property holds, • if the resulting word is not reduced or starts (ends) with Y -letters, then reduce the word and trim the first and last Y -letters to obtain an admissible word again.
For example, applying the rule [q 1 → a −1 q 1 b, q 2 → cq 2 d] to the admissible word then after trimming and reducing we obtain If a rule θ is applicable to an admissible word W (i.e., W belongs to the domain of θ) then we denote the result of application of θ to W by W · θ. Hence each rule defines an invertible partial map from the set of configurations to itself, and one can consider an S-machine as an inverse semigroup of partial bijections of the set of admissible words.
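The mechanics of applying a rule (substitute, reduce, trim) and of inverting a rule can be sketched in Python as follows. This is a simplified illustration under explicit assumptions: only positive state letters are handled, the domain restrictions given by the sets $Y_j(\theta)$ are ignored, and the token encoding, the helper names `Q`, `Y`, `apply_rule`, `inverse_rule` and the sample word are hypothetical choices made for this sketch, not taken from the paper.

```python
def Q(name):
    """State-letter token (hypothetical encoding)."""
    return ("q", name)


def Y(name, exp=1):
    """Tape-letter token with exponent +1 or -1 (hypothetical encoding)."""
    return ("y", name, exp)


def is_state(tok):
    return tok[0] == "q"


def inverse_word(word):
    """Inverse of a tape word: reverse the order and negate the exponents."""
    return [("y", name, -exp) for (_, name, exp) in reversed(word)]


def reduce_and_trim(word):
    """Freely reduce tape letters, then trim tape letters at both ends."""
    out = []
    for tok in word:
        if (out and not is_state(tok) and not is_state(out[-1])
                and out[-1][1] == tok[1] and out[-1][2] == -tok[2]):
            out.pop()  # adjacent mutually inverse tape letters cancel
        else:
            out.append(tok)
    while out and not is_state(out[0]):
        out.pop(0)
    while out and not is_state(out[-1]):
        out.pop()
    return out


def apply_rule(rule, word):
    """Apply a rule given as {state: (a, new_state, b)}; None if not applicable."""
    result = []
    for tok in word:
        if is_state(tok):
            if tok[1] not in rule:
                return None  # a state letter is not covered by the rule
            a, new_state, b = rule[tok[1]]
            result += a + [Q(new_state)] + b
        else:
            result.append(tok)
    return reduce_and_trim(result)


def inverse_rule(rule):
    """The inverse rule: each part q -> a q' b becomes q' -> a^{-1} q b^{-1}."""
    return {new: (inverse_word(a), old, inverse_word(b))
            for old, (a, new, b) in rule.items()}
```

With the rule $[q_1 \to a^{-1} q_1' b,\ q_2 \to c q_2' d]$ encoded as `{"q1": ([Y("a", -1)], "q1'", [Y("b")]), "q2": ([Y("c")], "q2'", [Y("d")])}` and the sample word $q_1 b^{-1} a q_2$, substitution gives $a^{-1} q_1' b b^{-1} a c q_2' d$, and reducing and trimming gives $q_1' a c q_2'$. Applying the inverse rule returns the sample word, in line with the statement that each rule defines an invertible partial map.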
We call an admissible word with the standard base a configuration of an S-machine. We usually assume that every part $Q_i$ of the set of state letters contains a start state letter and an end state letter. Then a configuration is called a start (end) configuration if all state letters in it are start (end) letters. Like Turing machines, some S-machines recognize a language. In that case we choose an input sector, usually the $Q_0Q_1$-sector, of every configuration. The $Y$-projection of that sector is called the input of the configuration. In that case, the end configuration with empty $Y$-projection is called the accept configuration. If the S-machine (viewed as a semigroup of transformations as above) can take an input configuration with input $u$ to the accept configuration, we say that $u$ is accepted by the S-machine. We define accepted configurations (not necessarily start configurations) similarly.
A computation of length $t \ge 0$ is a sequence of admissible words $W_0 \to \dots \to W_t$ such that for every $i = 0, \dots, t-1$ the S-machine passes from $W_i$ to $W_{i+1}$ by applying one of the rules $\theta_{i+1}$ from $\Theta$. The word $H = \theta_1 \dots \theta_t$ is called the history of the computation. Since $W_t$ is determined by $W_0$ and the history $H$, we use the notation $W_t = W_0 \cdot H$.
A computation is called reduced if its history is a reduced word. Clearly, every computation can be made reduced (without changing the start or end configurations of the computation) by removing consecutive mutually inverse rules.
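Reducing a computation amounts to freely reducing its history. A minimal Python sketch of that step, assuming a history is encoded as a list of (rule name, exponent) pairs (an encoding chosen here for illustration, as is the function name `reduce_history`):

```python
def reduce_history(history):
    """Remove consecutive mutually inverse rules from a history.

    history is a list of (rule_name, exponent) pairs with exponent +1 or -1.
    A single left-to-right pass with a stack gives the freely reduced word.
    """
    out = []
    for rule, exp in history:
        if out and out[-1] == (rule, -exp):
            out.pop()  # theta followed by its inverse cancels
        else:
            out.append((rule, exp))
    return out
```

Since each rule is an invertible partial map, deleting a pair $\theta\theta^{-1}$ from the history removes two steps of the computation and leaves its first and last words unchanged.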
Note, though, that in this paper, unlike the previous ones, we consider non-reduced computations too because these may correspond to reduced van Kampen diagrams under our present interpretation of S-machines in groups.
If for some rule $\theta = [q_0 \to a_0 q_0' b_0, \dots, q_n \to a_n q_n' b_n] \in \Theta$ of an S-machine $S$ the set $Y_{i+1}(\theta)$ is empty (hence in every admissible word in the domain of $\theta$ every $Q_iQ_{i+1}$-sector has no $Y$-letters) then we say that $\theta$ locks the $Q_iQ_{i+1}$-sector. In that case we always assume that $b_i$, $a_{i+1}$ are empty and we denote the $i$-th part of the rule $q_i \to a_i q_i'$. If the $Q_iQ_{i+1}$-sector is locked by $\theta$ then we also assume that $a_{i+1}$ is empty.
Thus for the sake of brevity we will allow parts of rules of the form q i ...q j → aq i ...q j b. If the rule locks the Q s Q s+1 -sector where Q s is the part of state letters containing q j , q j , then we write q i ...q j → aq i ...q j b (in that case b is empty).
The above definition of S-machines resembles the definition of multi-tape Turing machines (see [26]). The main differences are that every state letter of an S-machine is blind: it does not "see" tape letters next to it (two state letters can see each other if they stay next to each other). Also, S-machines are symmetric (every rule has an inverse), and can work with words containing negative letters and with words with "non-standard" order of state letters.
It is important that S-machines can simulate the work of Turing machines. This nontrivial fact, especially if one tries to get a polynomial time simulation, was first proved in [26], but we do not need a restriction on time, and it is more convenient for us to use an easier S-machine from [20].
Let M 0 be a deterministic Turing machine accepting a non-recursive language L of words in the one-letter alphabet {α}.
There is a recognizing S-machine M 1 whose language of accepted input words is L. In every input configuration of M 1 there is exactly one input sector, the first sector of the word, and all other sectors are empty of Y -letters.
We say that two recognizing S-machines are equivalent if they have the same language of accepted configurations.
We can simplify rules of any S-machine in the obvious way.
Lemma 2.3. Every S-machine $S$ is equivalent to an S-machine $S'$, where (*) every part $q_i \to a q_i' b$ of an S-rule of $S'$ has $\|a\| \le 1$, $\|b\| \le 1$, i.e., both words $a$ and $b$ are just letters from $Y^{\pm 1}$ or empty words; (**) moreover, $S'$ can be constructed so that for every rule $\theta = [q_0 \to a_0 q_0' b_0, \dots, q_n \to a_n q_n' b_n]$ of $S'$, we have $\sum_i (\|a_i\| + \|b_i\|) \le 1$.
For example, a rule $[q \to a q' b]$ is equivalent to the set of two rules $[q \to a q'']$, $[q'' \to q' b]$, where $q''$ is a new state letter added to the part containing $q$ and $q'$.
Thus, applying Lemma 2.3, we will assume that the S-machine $M_1$ satisfies Property (**).

Some elementary properties of S-machines
The base of an admissible word is not always a reduced word. However the following is an immediate corollary of the definition of admissible word.
In this paper we often use copies of words. If $A$ is an alphabet and $W$ is a word involving no letters from $A^{\pm 1}$, then to obtain a copy of $W$ in the alphabet $A$ we substitute letters from $A$ for letters in $W$ so that different letters from $A$ substitute for different letters. Note that if $U'$ and $V'$ are copies of $U$ and $V$ respectively corresponding to the same substitution, and $U' \equiv V'$, then $U \equiv V$, where '$\equiv$' means letter-by-letter equality of words. We also use copies of S-machines (defined in the same way).
The following two lemmas also immediately follow from the definitions (see details in [17, Lemmas 2.6, 2.7]).
Lemma 2.5. Suppose that the base of an admissible word $W$ is $Q_iQ_{i+1}$. Suppose that each rule of a reduced computation starting with $W \equiv q_i u q_{i+1}$ and ending with $W' \equiv q_i' u' q_{i+1}'$ multiplies the $Q_iQ_{i+1}$-sector by a letter on the left (resp. right). And suppose that different rules multiply that sector by different letters. Then (a) the history of the computation is a copy of the reduced form of the word $u'u^{-1}$ read from right to left (resp. of the word $u^{-1}u'$ read from left to right). In particular, if $u \equiv u'$, then the computation is empty; (b) the length of the history $H$ of the computation does not exceed $\|u\| + \|u'\|$; (c) for every configuration $q_i'' u'' q_{i+1}''$ of the computation, we have $\|u''\| \le \max(\|u\|, \|u'\|)$.
Suppose that each rule $\theta$ of a reduced computation starting with $W \equiv q_i u q_i^{-1}$ (resp., $q_i^{-1} u q_i$), where $u \ne 1$, and ending with $W' \equiv q_i' u' (q_i')^{-1}$ (resp., $W' \equiv (q_i')^{-1} u' q_i'$) has a part $q_i \to a_\theta q_i' b_\theta$, where $b_\theta$ (resp., $a_\theta$) is a letter, and for different $\theta$-s the $b_\theta$-s (resp., $a_\theta$-s) are different. Then the history of the computation has the form $H_1 H_2^k H_3$, where $k \ge 0$, $\|H_2\| \le \min(\|u\|, \|u'\|)$, $\|H_1\| \le \|u\|/2$, and $\|H_3\| \le \|u'\|/2$.
Lemma 2.8. Suppose that a reduced computation $W_0 \to W_1 \to \dots \to W_t$ of an S-machine $S$ satisfying (*) has a 2-letter base and the history of the form $H \equiv H_1 H_2^k H_3$ ($k \ge 0$). Then for the $Y$-projection $w_i$ of $W_i$ ($i = 0, 1, \dots, t$), we have the inequality
Proof. By (*) we have that the absolute value of $\|w_i\| - \|w_{i-1}\|$ is at most 2 for every $i$. Denote the words $w_i$ with $i = \|H_1\| + j\|H_2\|$ by $u_j$, $j = 0, 1, \dots, k$, and the corresponding words $W_i$ by $U_j$. Then there exist two $Y$-words $v_l$, $v_r$ depending on $H_2$ such that for every $s$ from 1 to $k$, $u_s = v_l u_{s-1} v_r$ in a free group. Hence $u_j = v_l^j u_0 v_r^j$, where both $v_l$ and $v_r$ have length at most $\|H_2\|$ by (*). By [19, Lemma 8.1], the length of an arbitrary word $U_j$ then is not greater than $\|v_l\| + \|v_r\| + \|U_0\| + \|U_k\|$ provided $0 \le j \le k$.

The highest parameter principle
In this paper, we estimate length and space of computations of S-machines, and also areas and other numerical invariants of van Kampen diagrams. The following constants will be used in the estimates throughout this paper.
where $\ll$ means "much smaller". For each inequality in this paper involving several of these constants, let $D$ be the biggest constant appearing there. The inequality can then always be rewritten in the form $D \ge$ some expression involving smaller constants.
This highest parameter principle [13] makes the system of inequalities used in this paper consistent.
Auxiliary S-machines and constructions

Running state letters
For every alphabet $Y$ we define a "running state letters" S-machine $LR(Y)$. We will omit $Y$ if it is obvious or irrelevant. The standard base of $LR(Y)$ is $Q^{(1)} P Q^{(2)}$, where $Q^{(1)} = \{q^{(1)}\}$, $P = \{p^{(1)}, p^{(2)}\}$, $Q^{(2)} = \{q^{(2)}\}$. The state letter $p$ with indices runs from the state letter $q^{(2)}$ to the state letter $q^{(1)}$ and back. The S-machine $LR$ will be used to check the "structure" of a configuration (whether the state letters of a configuration are in the appropriate order), and to recognize a computation by its history.
The alphabet of tape letters (1) . The positive rules of LR are defined as follows.
Comment. The state letter p (2) moves right towards q (2) replacing letters a from Y (2) by their copies a from Y (1) .
Remark 3.1. Note that each of the rules $(\zeta^{(j)}(a))^{\pm 1}$ ($j = 1, 2$) either moves the state letter $p$ left, or moves it right, or deletes one letter from the left and one letter from the right, or inserts letters on both sides of itself. In the latter case, the next rule of a computation must again be $(\zeta^{(j)}(b))^{\pm 1}$ for some $b$, and if the computation is reduced, it must again increase the length of the configuration by two. This observation implies
Remark 3.2. Note that no rule of $LR$ changes the projection of a configuration onto the free group with basis $Y^{(1)}$ if the state letters are mapped to 1 and the letters from $Y^{(2)}$ are mapped to their copies from $Y^{(1)}$. This will be later referred to as the projection argument.
(2) and $q^{(1)} v p^{(2)} q^{(2)}$, then the history $H$ of $C$ is a copy of the word $\bar{u}\,\zeta^{(12)}\,(\bar{u}')^{-1}$, where $\bar{u}$ is the mirror image of $u$ and $\bar{u}'$ is a copy of $\bar{u}$. Thus $W_0$, $W_t$, $H$ uniquely determine each other in that case.
Suppose that |W i−1 | Y < |W i | Y for some i. That means that the i-th rule in the computation is of the form (ζ (k) (a)) ±1 . This rule multiplies u i−1 by a letter a ±1 on the right, and multiplies v i−1 by a copy of the inverse of that letter on the left, and these letters do not cancel in u i , v i . In particular both u i and v i are not empty. Hence ζ (12) does not apply to W i . Thus the rule in W i → W i+1 is (ζ (j) (b)) ±1 (with the same j) and it multiples u i = u i−1 a by b ±1 on the right and multiples v i by a copy of the inverse of that letter on the left. Since the computation is reduced Continuing in this manner, we establish (1).
To establish (2), we can choose the shortest word W j in the computation and apply (1) to the computation W j → · · · → W t and the inverse computation W j → · · · → W 0 .
Suppose that the assumptions of (3) hold. Then $u \equiv v$ by the projection argument. Since $\zeta^{(12)}$ locks the $Q^{(1)}P$-sector, the $p$-letter must reach $q^{(1)}$, always moving left, to change $p^{(1)}$ to $p^{(2)}$, and so $W_k \equiv q^{(1)} p^{(1)} \dots$. If the next rule, of the form $\zeta^{(1)}(a)^{\pm 1}$, could increase the length of the configuration, we would obtain a contradiction with Property (1). Since the computation is reduced, the next rule is $\zeta^{(12)}$, and arguing in this way, one uniquely reconstructs the whole computation in case (3) for given $W_0$ or $W_t$, and vice versa, the history $H$ determines both $u$ and $v$. Property (4) holds for the same reasons.
The projection argument also immediately gives: (2) and for some words u, v, then |W j | Y ≥ |W 0 | Y for every j = 0, . . . , t.
Remark 3.5. We will also use the right analog RL of LR. The base of RL is $Q_1RQ_2$. The state letter $r$ first moves right from $q^{(1)}$ to $q^{(2)}$ and then left. The "left-right dual" analogs of Lemmas 3.3 and 3.4 and of Remark 3.2 hold for RL as well.
Remark 3.6. For every $m \ge 1$, we will also need the S-machine $LR_m$, which repeats the work of LR $m$ times. That is, the S-machine $LR_m$ runs the state letter $p$ back and forth between $q^{(2)}$ and $q^{(1)}$ $m$ times. Every time $p$ meets $q^{(1)}$ or $q^{(2)}$, the upper index of $p$ increases by 1 after the application of the rule $\zeta^{(i,i+1)}$ ($i = 1, \dots, 2m-1$), so the highest upper index of $p$ is $(2m)$. A precise definition of $LR_m$ is obvious and is left to the reader. (Recall that $m$ is one of the parameters used in this paper; see Section 2.3.)

Remark 3.7. The analog of Lemma 3.3 holds for $LR_m$. In particular, if (2) in the formulation of (3), then $t = 2mk + 2m - 1$ (the proof is essentially the same and is left to the reader).
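The count $t = 2mk + 2m - 1$ in Remark 3.7 can be checked on a toy model. The sketch below is only an illustration under stated assumptions: $k$ is taken to be the number of tape letters between $q^{(1)}$ and $q^{(2)}$, the state letter crosses one tape letter per rule, and the tape length never changes (the situation of part (3) of Lemma 3.3). The function name `lr_m_rule_count` is hypothetical and is not part of the paper.

```python
def lr_m_rule_count(k: int, m: int) -> int:
    """Count the rules applied in an idealized run of LR_m.

    Assumptions (not taken from the paper's formal definition):
    the tape word has k letters, p starts next to q^(2), and every
    move rule shifts p across exactly one tape letter.
    """
    if k < 0 or m < 1:
        raise ValueError("need k >= 0 and m >= 1")
    steps = 0
    position = k      # number of tape letters to the left of p
    index = 1         # upper index of p
    direction = -1    # the first pass goes left, toward q^(1)
    while True:
        # One pass: p crosses all k tape letters, one rule per letter.
        for _ in range(k):
            position += direction
            steps += 1
        if index == 2 * m:
            break     # the last pass is finished
        # Transition rule zeta^(index, index+1) raises the upper index.
        index += 1
        steps += 1
        direction = -direction
    return steps


if __name__ == "__main__":
    for m in (1, 2, 3):
        for k in (0, 1, 5):
            print(m, k, lr_m_rule_count(k, m), 2 * m * k + 2 * m - 1)
```

In this model there are $2m$ passes of $k$ rules each and $2m - 1$ transition rules between consecutive passes, which is where the two summands come from.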

Adding history sectors
We will add new (history) sectors to our S-machine $M_1$. If we ignore the new sectors, we get the hardware and the software of the S-machine $M_1$. The new S-machine $M_2$ will start with a configuration where a copy of the history $H$ of a computation of $M_1$ is written in every history sector. Then it will execute $H$ on the other (working) sectors, simulating the work of $M_1$, while in the history sector a state letter scans the history, one symbol at a time. Thus if a computation with the standard base starts with a configuration $W$ and ends with a configuration $W'$, then the length of the computation does not exceed $\|W\| + \|W'\|$.
Here is a precise definition of $M_2$. Recall that the S-machine $M_1$ satisfies the condition (**) of Lemma 2.3 and has hardware $(Q, Y)$, where $Q = \sqcup_{i=0}^{n} Q_i$, and the set of rules $\Theta$. The new S-machine $M_2$ has hardware

$$Q_{0,r} \sqcup Q_{1,\ell} \sqcup Q_{1,r} \sqcup Q_{2,\ell} \sqcup Q_{2,r} \sqcup \dots \sqcup Q_{n,\ell}, \qquad Y_h = Y_1 \sqcup X_1 \sqcup Y_2 \sqcup \dots \sqcup X_{n-1} \sqcup Y_n,$$

where $Q_{i,\ell}$ and $Q_{i,r}$ are (left and right) copies of $Q_i$, and $X_i$ is a disjoint union of two copies of $\Theta^+$, namely $X_{i,\ell}$ and $X_{i,r}$. (The sets $Q_{0,\ell}$, $Q_{n,r}$ are empty.) Every letter $q$ from $Q_i$ has two copies $q^{(\ell)} \in Q_{i,\ell}$ and $q^{(r)} \in Q_{i,r}$. By definition, the start (resp. end) state letters of $M_2$ are copies of the corresponding start (end) state letters of $M_1$. The $Q_{0,r}Q_{1,\ell}$-sectors are the input sectors of configurations of $M_2$.
The positive rules $\theta_h$ of $M_2$ are in one-to-one correspondence with the positive rules $\theta$ of $M_1$. If $\theta = [q_0 \to a_0 q_0 b_0, \dots, q_n \to a_n q_n b_n]$ is a positive rule of $M_1$, then each part where $h_{\theta,i}$ (resp., $h'_{\theta,i}$) is a copy of $\theta$ in the alphabet $X_{i,\ell}$ (in $X_{i,r}$, respectively). If $\theta$ is the start (resp. end) rule of $M_1$, then for any word in the domain of $\theta_h$ (resp. $\theta_h^{-1}$) all $Y$-letters in history sectors are from $\sqcup_i X_{i,\ell}$ (resp. $\sqcup_i X_{i,r}$).

Thus for every rule $\theta$ of $M_1$, the rule $\theta_h$ of $M_2$ acts in the $Q_{i,r}Q_{i+1,\ell}$-sector in the same way as $\theta$ acts in the $Q_iQ_{i+1}$-sector. In particular, the $Y$-letters which can appear in the $Q_{i,r}Q_{i+1,\ell}$-sector of an admissible word in the domain of $\theta_h$ are the same as the $Y$-letters that can appear in the $Q_iQ_{i+1}$-sector of an admissible word in the domain of $\theta$. Hence if $\theta$ locks $Q_iQ_{i+1}$-sectors, then $\theta_h$ locks $Q_{i,r}Q_{i+1,\ell}$-sectors.

Remark 3.9. Every computation of the S-machine $M_2$ with history $H$ and the standard base coincides with a computation of $M_1$ whose history is a copy of $H$, if one observes it only in the working sectors $Q_{i,r}Q_{i+1,\ell}$. In the standard base of $M_2$ the working sectors $Q_{i,r}Q_{i+1,\ell}$ alternate with the history sectors $Q_{i,\ell}Q_{i,r}$. Every positive rule $\theta_h$ multiplies the content of the history $Q_{i,\ell}Q_{i,r}$-sector by the corresponding letter $h'_{\theta,i}$ from the right and by the letter $h_{\theta,i}^{-1}$ from the left. Thus if the S-machine $M_2$ executes the history written in the history sectors, then the history word $H$ in letters from $X_{i,\ell}$ gets rewritten into the copy of $H$ in letters from $X_{i,r}$. Say, if the copy of the history $H$ was written in a history sector as $h_1h_2h_3$, then during the computation with history $H$ it will transform as follows:

$$h_1h_2h_3 \to h_2h_3h_1' \to h_3h_1'h_2' \to h_1'h_2'h_3'.$$

Let $I_1(\alpha^k)$ be a start configuration of $M_1$ (i.e., a configuration in the domain of the start rule of $M_1$) with $\alpha^k$ written in the input sector (all other sectors do not contain $Y$-letters).
Then the corresponding start configuration $I_2(\alpha^k, H)$ of $M_2$ is obtained by first replacing each state letter $q$ by the product of the two corresponding letters $q^{(\ell)}q^{(r)}$, and then inserting a copy of $H$ in the left alphabet $X_{i,\ell}$ in every history $Q_{i,\ell}Q_{i,r}$-sector. End configurations $A_2(H)$ of $M_2$ are defined similarly, only the $Y$-letters in the history sectors must be from the right alphabet $X_{i,r}$.

Lemma 3.10. (1) If a word $\alpha^k$ is accepted by the Turing machine $M_0$, then for some word $H$, there is a reduced computation $I_2(\alpha^k, H) \to \cdots \to A_2(H)$ of the S-machine $M_2$.
(2) If there is a computation $I_2(\alpha^k, H) \to \cdots \to A_2(H')$ of $M_2$, then the word $\alpha^k$ is accepted by $M_0$ and $H' \equiv H$.
Proof. (1) The word $\alpha^k$ is accepted by the S-machine $M_1$ by Lemma 2.2. If $H$ is the history of the accepting computation of $M_1$, then the computation of $M_2$ with history $H$ starting with $I_2(\alpha^k, H)$ ends with $A_2(H)$, since $M_2$ works as $M_1$ in the working sectors and replaces the letters from the left alphabets by the corresponding letters from the right alphabets in the history sectors.
(2) If $I_2(\alpha^k, H)\cdot H'' = A_2(H')$ for some history $H''$ of $M_2$, then the word $\alpha^k$ is accepted by $M_0$ by Lemma 2.2 and the fact that $M_2$ works as $M_1$ in the working sectors. Note that both $H$ and $H'$ must be copies of $H''$, because the word $I_2(\alpha^k, H)$ has no letters from right alphabets, $A_2(H')$ has no letters from left alphabets, and every rule multiplies the $Y$-projection of every history sector by a letter from $X_{i,\ell}^{-1}$ on the left and by a letter from $X_{i,r}$ on the right.
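The rewriting of a history sector described in Remark 3.9 and used in the proof above can be imitated by free reduction. The following is a minimal sketch, assuming a toy encoding that is not from the paper: a tape letter is a pair `((side, rule), exponent)` with `side` equal to `"L"` or `"R"` for the left and right alphabets, and the helper names `free_reduce`, `apply_rule` and `run_history` are hypothetical.

```python
def free_reduce(word):
    """Freely reduce a word given as a list of (name, exponent) letters."""
    reduced = []
    for name, exp in word:
        if reduced and reduced[-1] == (name, -exp):
            reduced.pop()           # cancel x x^{-1} or x^{-1} x
        else:
            reduced.append((name, exp))
    return reduced


def apply_rule(sector, rule):
    """History-sector part of a positive rule: multiply by the inverse
    left-alphabet copy on the left and by the right-alphabet copy on the right."""
    left_inverse = (("L", rule), -1)
    right_copy = (("R", rule), 1)
    return free_reduce([left_inverse] + sector + [right_copy])


def run_history(history):
    """Start with a copy of `history` in the left alphabet and execute it."""
    sector = [(("L", rule), 1) for rule in history]
    trace = [sector]
    for rule in history:
        sector = apply_rule(sector, rule)
        trace.append(sector)
    return trace


if __name__ == "__main__":
    for word in run_history(["h1", "h2", "h3"]):
        print(word)
```

Each step cancels the first left-alphabet letter and appends its right-alphabet copy, so the length of the sector stays the same while the word moves from the left alphabet to the right one.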
The sectors of the form $Q_{i,\ell}Q_{i,\ell}^{-1}$ and $Q_{i,r}^{-1}Q_{i,r}$ (in a non-standard base) are also called history sectors. History sectors help to obtain a linear estimate of the space of every computation $W_0 \to \cdots \to W_t$ in terms of $\|W_0\| + \|W_t\|$.
Lemma 3.11. Let $W_0 \to \cdots \to W_t$ be a reduced computation of $M_2$ with base $Q_{i,\ell}Q_{i,r}$ and history $H$. Assume that all the $Y$-letters of $W_0$ belong to only one of the alphabets .., $t$, and assume that $v_0$ has no letters from $X_{i,r}$. Then $v_t = uv_0u'$, where $u$ is a copy of $H^{-1}$ in the alphabet $X_{i,\ell}$ and $u'$ is a copy of $H$ in $X_{i,r}$. So no letter of $u'$ is cancelled in the product $uv_0u'$. Therefore $|W_t|_Y \ge \|u'\| = \|H\|$ and

Proof. Let $Q_{i_1}^{\pm 1} \dots Q_{i_k}^{\pm 1}$ be the base of the computation. We can divide the base into several subwords of length 3 or 4, each containing one history sector. Thus we can assume that $k$ is equal to 3 or 4 and that the base contains one history sector. Without loss of generality, that history sector is either a $Q_{i,\ell}Q_{i,r}$-sector or a $Q_{i,\ell}Q_{i,\ell}^{-1}$-sector or a $Q_{i,r}^{-1}Q_{i,r}$-sector. Consider two cases.

1. The history sector has the form $Q_{i,\ell}Q_{i,r}$. By Lemma 2.6, we have $\|H\| \le$ 1

The history sector is either a $Q_{i,\ell}Q_{i,\ell}^{-1}$-sector or a $Q_{i,r}^{-1}Q_{i,r}$-sector. Then one can apply Lemma 2.7 to the history sector and obtain the factorization $H \equiv H_1H_2^cH_3$, with $c \ge 0$, $\|H_2\| \le \min(\|u_0\|, \|u_t\|)$, $\|H_1\| \le \|u_0\|/2$, and $\|H_3\| \le \|u_t\|/2$, where $u_0$ and $u_t$ are the $Y$-projections of the history sectors of $W_0$ and $W_t$, respectively. Since every $W_i$ has at most three sectors, applying Lemma 2.8 to each of them, we obtain:

Lemma 3.13. Suppose that a reduced computation $W_0 \to \cdots \to W_t$ of the S-machine $M_2$ starts with an admissible word $W_0$ having no letters from the alphabets $X_{i,\ell}$ (resp., from the alphabets $X_{i,r}$). Assume that the length of its base $B$ is bounded from above by a constant $N_0$, and $B$ has a history subword

Proof. Let $V_0 \to \cdots \to V_t$ be the restriction of the computation to the $Q_{i,\ell}Q_{i,r}$-sector.

Adding running state letters
Our next S-machine will be a composition of $M_2$ with LR and RL. The running state letters will control the work of $M_3$. First we replace every part $Q_i$ of the state letters in the standard base of $M_2$ by three parts $P_iQ_iR_i$, where $P_i$, $R_i$ contain the running state letters. Thus if $Q_0 \dots Q_s$ is the standard base of $M_2$, then the standard base of the modified $M_2$ is

$$P_0Q_0R_0\,P_1Q_1R_1 \dots P_sQ_sR_s,$$

where $P_i$ (resp., $R_i$) contains copies of the running $P$-letters (resp. $R$-letters) of LR (resp. RL), $i = 0, \dots, s$.
where $p^{(i)} \in P_i$, $r^{(i)} \in R_i$) do not depend on $\theta$.
Comment. Thus the sectors $P_iQ_i$ and $Q_iR_i$ are always locked. Of course, such a modification is useless for solo work of $M_2$. But it will be helpful when one constructs a composition of $M_2$ with LR and RL, which will be turned on after certain rules of $M_2$ are applied. If ) of admissible words with nonstandard bases will be called history sectors of $M_2$ too. (Alternatively, history sectors of admissible words of $M_2$ are those sectors which can contain letters from left or right alphabets.) The $R_0P_1$-sectors of admissible words are the input sectors. The $R_0R_0^{-1}$- and $P_1^{-1}P_1$-sectors are also input sectors of admissible words of $M_2$. If $B$ is the base of some computation $\mathcal{C}$ of $M_2$, and $UV$ is a 2-letter subword of $B$ such that the $UV$-sectors of admissible words in $\mathcal{C}$ are history (resp. working, input) sectors, then we will call $UV$ a history (resp. working, input) subword of $B$.

M 3
The next S-machine $M_3$ is the composition of the S-machine $M_2$ with LR and RL. The S-machine $M_3$ has the input, working and history sectors, i.e. the same base as $M_2$, although the parts of this base have more state letters than the corresponding parts of $M_2$. It works as follows. Suppose that $M_3$ starts with a start configuration of $M_2$: a word $\alpha^k$ in the input $R_0P_1$-sector, copies of a history word $H$ in the alphabets $X_{i,\ell}$ in the history sectors, and all other sectors empty of $Y$-letters. Then $M_3$ first executes RL in all history sectors (moves the running state letter from $R_i$ in the history sectors right and left), then it executes the history $H$ of $M_2$. After that the $Y$-letters in the history sectors are in $X_{i,r}$, and $M_3$ executes copies of LR in the history sectors (moves the running state letters left, then right). After that $M_3$ executes a copy of $H$ backwards, getting to a copy of the same start configuration of $M_2$, runs RL, executes a copy of the history $H$ of $M_2$, runs a copy of LR, etc. It stops after running RL, $M_2$, LR, $M_2^{-1}$ $m$ times and then running RL one more time.
Thus the S-machine $M_3$ is a concatenation of $4m + 1$ S-machines $M_{3,1}, \dots, M_{3,4m+1}$. After one of these S-machines terminates, a transition rule changes its end state letters to the start state letters of the next S-machine. All these S-machines have the same standard bases as $M_2$.
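The order of the $4m + 1$ component machines and of the transition rules between them can be listed mechanically. The sketch below only illustrates the bookkeeping; the function names `m3_phases` and `chi_rules` and the string labels are hypothetical, not notation from the paper.

```python
def m3_phases(m: int) -> list:
    """Labels of the components M_{3,1}, ..., M_{3,4m+1} in order."""
    if m < 1:
        raise ValueError("m must be at least 1")
    cycle = ["RL", "M2", "LR", "M2^-1"]
    # m full cycles, then RL one more time.
    return cycle * m + ["RL"]


def chi_rules(m: int) -> list:
    """Transition rules chi(i, i+1) joining consecutive components."""
    return [f"chi({i},{i + 1})" for i in range(1, 4 * m + 1)]


if __name__ == "__main__":
    m = 2
    phases = m3_phases(m)
    rules = chi_rules(m)
    for i, phase in enumerate(phases, start=1):
        print(f"M_3,{i}: copy of {phase}")
        if i <= len(rules):
            print(f"   then {rules[i - 1]}")
```

For every $m$ the list has $4m + 1$ entries joined by $4m$ transition rules, and both the first and the last component are copies of RL.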
The configuration $I_3(\alpha^k, H)$ of $M_3$ is obtained from $I_2(\alpha^k, H)$ by adding the control state letters r

Set $M_{3,1}$ is a copy of the set of rules of the S-machine RL, with parallel work in all history sectors, i.e., every subword $Q_{i-1}R_{i-1}P_i$ of the standard base, where $Q_{i-1}Q_i$ is a history sector of $M_2$, is treated as the base of a copy of RL; that is, $R_{i-1}$ contains the running state letters which run between state letters from $Q_{i-1}$ and $P_i$. Each rule of Set $M_{3,1}$ executes the corresponding rule of RL simultaneously in each history sector of $M_2$. The partition of the set of tape letters of these copies of RL in each history sector is $X_{i,\ell} \sqcup X_{i,r}$ for some $i$ (that is, state letters from $R_{i-1}$ first run right, replacing letters from $X_{i,\ell}$ by the corresponding letters of $X_{i,r}$, and then run left, replacing letters from $X_{i,r}$ by the corresponding letters of $X_{i,\ell}$).

The transition rule $\chi(1,2)$ changes the state letters to the state letters of start configurations of $M_2$. The admissible words in the domain of $\chi(1,2)^{\pm 1}$ have all $Y$-letters from the left alphabets $X_{i,\ell}$. The rule $\chi(1,2)$ locks all sectors except the history sectors $R_{i-1}P_i$ and the input sector. It does not apply to admissible words containing $Y$-letters from right alphabets.
Set $M_{3,2}$ is a copy of the set of rules of the S-machine $M_2$. The transition rule $\chi(2,3)$ changes the state letters of the stop configuration of $M_2$ to their copies in a different alphabet. The admissible words in the domain of $\chi(2,3)^{\pm 1}$ have no $Y$-letters from the left alphabets $X_{i,\ell}$. The rule $\chi(2,3)$ locks all sectors except for the history sectors $R_{i-1}P_i$. It does not apply to admissible words containing $Y$-letters from left alphabets.
Set $M_{3,3}$ is a copy of the set of rules of the S-machine LR, with parallel work in the same sectors as Set $M_{3,1}$ (and the partition of $Y$-letters in each history sector is $X_{i,r} \sqcup X_{i,\ell}$).

The transition rule $\chi(4,5)$ changes the state letters of the start configuration of $M_2$ to their copies in a different alphabet. The admissible words in the domain of $\chi(4,5)^{\pm 1}$ have no $Y$-letters from the right alphabets $X_{i,r}$. The rule $\chi(4,5)$ locks all non-history and non-input sectors.

. . .

Sets $M_{3,4m-3}, \dots, M_{3,4m}$ consist of copies of the steps $M_{3,1}, \dots, M_{3,4}$, respectively. Set $M_{3,4m+1}$ is a copy of Set $M_{3,1}$. The end configuration for Set $M_{3,4m+1}$, $A_3(H)$, is obtained from a copy of $A_2(H)$ by inserting the control letters according to (3.4).
The transition rules $\chi(i, i+1)$ are called $\chi$-rules. We say that a configuration $W$ of the S-machine $M_3$ is tame if every $P$- or $R$-letter is next to some $Q$-letter in $W$.
Lemma 3.14. Let $\mathcal{C} : W_0 \to \cdots \to W_t$ be a reduced computation of $M_3$ consisting of rules of one of the copies of LR or RL with standard base. Then

Proof. (a) Let $W_r$ be a shortest word of the computation $\mathcal{C}$.
It follows that the number of sectors increasing their lengths by two at the transition W s → W s+1 is greater than the number of the sectors decreasing the lengths by 2. Now it follows from Lemma 3.3 (1) that the lengths of the Y -projections will keep increasing: If the word W 0 is tame, then it is the shortest configuration by the projection argument.
(b) If the rules do not change the lengths of configurations, then every control letter runs right and left only one time by Lemma 3.3 (4), and the inequality follows. If ||W r || < ||W r+1 || for some r, then every next transition keeps increasing the length by Lemma 3.3 (1), and so the inequality holds as well.
Lemma 3.15. Let $\mathcal{C} : W_0 \to \cdots \to W_t$ be a reduced computation of $M_3$. Then for every $i$, there is at most one occurrence of the rules $\chi(i, i+1)^{\pm 1}$ in the history $H$ of $\mathcal{C}$, provided the base of $\mathcal{C}$ has a history $(R_{j-1}P_j)^{\pm 1}$-sector.
Proof. Arguing by contradiction, we can assume that where $H$ is a copy of the history of a computation of either LR or RL or $M_2$. The first two cases contradict Lemma 3.3 (4). The latter case is also impossible. Indeed, consider any history subword $(R_{j-1}P_j)^{\pm 1}$ of the base of the computation. Then the $Y$-projection of the $(R_{j-1}P_j)^{\pm 1}$-sector of $W_1$ must be a word either in $X_{j,\ell}$ or in $X_{j,r}$ (depending on the parity of $i$). Without loss of generality assume that it is $X_{j,\ell}$. Then the computation $W_1 \to \cdots \to W_{t-1}$ multiplies the $Y$-projection of the $(R_{j-1}P_j)^{\pm 1}$-sector of $W_1$ by a word in $X_{j,\ell}$ and a reduced word in $X_{j,r}$. Hence the $(R_{j-1}P_j)^{\pm 1}$-sector of $W_{t-1}$ contains letters from a right alphabet, hence $W_{t-1}$ cannot be in the domain of $\chi(i, i+1)^{\pm 1}$, a contradiction.
Lemma 3.16. Let $\mathcal{C} : W_0 \to \cdots \to W_t$ be a reduced computation of $M_3$. Suppose also that the base of $\mathcal{C}$ is standard. Then (a) if the history of $\mathcal{C}$ has the form $\chi(i, i+1)H'\chi(i+4, i+5)$, then the word $W_0$ is a copy of $W_t$; (b) two subcomputations $\mathcal{C}_1$ and $\mathcal{C}_2$ of $\mathcal{C}$ with histories $\chi(i, i+1)H'\chi(i+4, i+5)$ and $\chi(j, j+1)H''\chi(j+4, j+5)$ have equal lengths; moreover, a cyclic permutation of $\mathcal{C}_2$ is a copy of

Proof. (a) Without loss of generality we assume that $i = 1$. Consider the projection $H_\chi$ of the history $H$ of $\mathcal{C}$ onto the alphabet of $\chi$-rules of $M_3$. By the definition of $M_3$, if $\chi = \chi(j, j+1)^{\pm 1}$ is a letter in $H_\chi$, then the next letter in $H_\chi$ is either $\chi^{-1}$ or $\chi(j-1, j)^{\pm 1}$ or $\chi(j+1, j+2)$. By Lemma 3.15, for every letter $\chi$, the word $H_\chi$ contains at most one occurrence of $\chi^{\pm 1}$. This implies that $H_\chi \equiv \chi(1,2)\chi(2,3)\chi(3,4)\chi(4,5)\chi(5,6)$.
Therefore the history of $\mathcal{C}$ has the form

Let $UV$ be a history 2-letter subword in the base $B$ of the computation $\mathcal{C}$. The $Y$-projection $u$ of the $UV$-sector of $W_1$ is a word in a left alphabet, while the $Y$-projection of the $UV$-sector of $W_1 \cdot H_1$ is a word in the corresponding right alphabet. Each rule $\theta$ of $H_1$ multiplies the $Y$-projection of the $UV$-sector by a letter from the left alphabet on the left and by a letter from the right alphabet on the right. The two letters correspond to the rule $\theta$. Therefore $u$ must be a copy of $H_1$. In particular, this implies that the $Y$-projections of all history sectors of $W_1$ and $W_1 \cdot H_1$ are copies of $H_1$.
Applying Lemma 3.3 (3) to the subcomputation $W_1 \cdot H_1\chi(2,3) \to \dots \to W_1 \cdot H_1\chi(2,3)H_2$ and considering the history $UV$-sector again, we deduce that $H_2$ is a copy of

Similar arguments work for the rest of the computation $\mathcal{C}$: $H_3$ is a copy of $H_1^{-1}$ and $H_4$ is a copy of $H_1\zeta^{(12)}H_1$. This implies (a).
(b) follows from the same argument as (a). (c) If the history H of C does not have χ-rules, then C is a computation of a copy of one of the S-machines M 2 , LR, RL and we can apply Lemmas 3.14 (b) and 3.13.
Suppose that $H$ contains a $\chi$-rule. Then $H = H_1H_2H_3$, where $H_1$, $H_3$ do not contain $\chi$-rules, but $H_2$ starts and ends with $\chi$-rules (it is possible that

Then $W_k$ is tame, being in the domain of a $\chi$-rule. Hence by Lemmas 3.14 (b) and 3.13, for every $i$ between 0 and $k$, $|W_i|_Y$ does not exceed $c|W_0|_Y$ for some constant $c$. The same argument shows that for $i$ between $s$ and $t$, $|W_i|_Y$ does not exceed $c|W_t|_Y$. The proof of part (a) describes the subcomputation $W_k \to \cdots \to W_s$ in detail. This description and Lemma 3.13 imply that for $i$ between $k$ and $s$, $|W_i|_Y$ does not exceed a constant times the maximum of $|W_k|_Y$ and $|W_s|_Y$. This implies (c).

(1) If a word $\alpha^k$ is accepted by the Turing machine $M_0$, then for some word $H$, there is a reduced computation $I_3(\alpha^k, H) \to \cdots \to A_3(H)$ of the S-machine $M_3$.
(2) The word $I_3(\alpha^k, H)$ is in the domain of a rule from $M_{3,1}$, while $A_3(H')$ is in the domain of a rule from $M_{3,4m+1}$. For different $i$, $j$, the domains of rules from $M_{3,i}$ and $M_{3,j}$ are disjoint, and if rules of the sets $M_{3,i}$ and $M_{3,i+1}$ appear in a computation, the computation must also contain the $\chi$-rule $\chi(i, i+1)$. Therefore the projection of the history of $\mathcal{C}$ onto the alphabet of $\chi$-rules must contain a subword $\chi(1,2)\chi(2,3)$. Hence $\mathcal{C}$ must contain a subcomputation $\mathcal{D}$ with history of the form $\chi(1,2)H_1\chi(2,3)$, where $H_1$ is the history of a computation of a copy of $M_2$ of the form $I_2(\alpha^{\ell}, H) \to \cdots \to A_2(H')$ for some $\ell$, $H$, and the rules in $\mathcal{C}$ applied before this $\chi(1,2)$ are from $M_{3,1}$. Since rules of $M_{3,1}$ do not modify the input sector, $k = \ell$. Therefore $\alpha^k$ is accepted by $M_2$. By Lemma 3.10, $\alpha^k$ is then accepted by $M_0$. The fact that $H' \equiv H$ is proved in the same way as in Lemma 3.10 (2).

M 4 and M 5
Let $B_3$ be the standard base of $M_3$ and $B_3'$ be its disjoint copy. By $M_4$ we denote the S-machine with standard base $B_3(B_3')^{-1}$ and rules $\theta(M_4) = [\theta, \theta]$, where $\theta \in \Theta$ and $\Theta$ is the set of rules of $M_3$. So the rules of $\Theta(M_4)$ are the same for the $M_3$-part of $M_4$ and for the mirror copy of $M_3$. Therefore we will denote $\Theta(M_4)$ by $\Theta$ as well. The sector between the last state letter of $B_3$ and the first state letter of $(B_3')^{-1}$ is locked by any rule from $\Theta$.
The 'mirror' symmetry of the base will be used in Lemma 7.40.
The S-machine $M_5$ is a circular analog of $M_4$. We add one more base letter $\check{t}$ to the hardware of

(2) If there is a computation $\mathcal{C}$ :

Lemma 3.20. There is a constant $C = C(M_5)$ such that for every reduced computation $\mathcal{C} : W_0 \to \cdots \to W_t$ of $M_5$ with a faulty base and every $j = 0, 1, \dots, t$,

for every $0 < r < t$, since otherwise it suffices to prove the statement for two shorter computations $W_0 \to \cdots \to W_r$ and $W_r \to \cdots \to W_t$. Since $\chi$-rules do not change the length of configurations, the history $H$ of $\mathcal{C}$ cannot start or end with a $\chi$-rule.
Step 2. If the history H of C has no χ-rules, then the statement with C ≥ 18 follows from Lemmas 3.14 (a), 3.4 and 3.12.
Step 3. If there is only one $\chi$-rule $\chi$ in $H$, then $H = H'\chi^{\pm 1}H''$, where $H'$ is a copy of the history of a computation of a copy of LR or RL and $H''$ is the history of a computation of a copy of $M_2$ (or vice versa). For the computation $W_r \to \cdots \to W_0$ with history $(H')^{-1}$, we have $|W_r|_Y \le |W_0|_Y$ by Lemmas 3.14 (a) and 3.4. This contradicts the assumption of Step 1, and so one may assume further that $H$ has at least two $\chi$-rules.
Step 4. The base $B$ of the computation $\mathcal{C}$ has no history sectors $PP^{-1}$-, $R^{-1}R$-, $QQ^{-1}$-, or $Q^{-1}Q$-sectors, since every $\chi$-rule locks the $PQ$- and $QR$-sectors of the standard base.
The same statement is true for the mirror copies of the above-mentioned sectors, and this stipulation works throughout the remaining part of the proof.
Step 5. Assume that the history $H^{\pm 1}$ is of the form

where $H_2$ is the history of a computation of a copy of $M_2$. Since $B$ is not reduced, there is a 2-letter subword of the base of the form $U^{\pm 1}U^{\mp 1}$ (for some part $U$ of the set of state letters). By Lemma 2.4, this subword must be a history subword of the form $P^{-1}P$ or $RR^{-1}$, since every sector of the standard base of $M_3$, except for history $RP$-sectors, is locked either by $\chi(i-1, i)$ or by $\chi(i, i+1)$.
Let us consider the case of $P^{-1}P$, since the second case is similar. Depending on the parity of $i$, either a prefix $H_3'$ of $H_3$ is the history of a computation of a copy of LR, or a suffix $H_1'$ of $H_1$ is the history of a computation of a copy of LR. These two cases are similar, so we consider only the first one.
Then between the $P$-letter of the $P^{-1}P$-sector of an admissible word in the subcomputation of $\mathcal{C}$ with the history $H_3'$ and the corresponding $R$-letter in that admissible word, there is always a $Q$-letter or a $P^{-1}$-letter. Hence the $P$-letter never meets the corresponding $R$-letter during that subcomputation, and no transition rules can apply to any of the admissible words of that subcomputation. Therefore $H_3' = H_3$ and for the subcomputation $\mathcal{C}$ :

(1) and 3.4. This contradicts Step 1, and so the assumption made at the beginning of Step 5 was false.
Step 6. Assume that there is a history of a subcomputation of $\mathcal{C}$ of the form $H_1\chi H_2\chi^{-1}H_3$, where $\chi$ is a $\chi$-rule and $H_2$ is the history of a computation of a copy of $M_2$. Then we claim that the base of $\mathcal{C}$ has no history $P^{-1}P$- or $RR^{-1}$-sectors. To prove this, we consider only the former case, since the latter one is similar.
If the subcomputation $\mathcal{C}'$ of $\mathcal{C}$ with history $H_3$ starts with an admissible word $W'$ having in the $P^{-1}P$-sector all $Y$-letters from the right alphabets, then, as in Step 5, $H_3$ corresponds to the work of LR, which gives a contradiction as in Step 5.
If the $P^{-1}P$-sector of $W'$ has all $Y$-letters from the left alphabet, then the subcomputation of $\mathcal{C}^{-1}$ with history $\chi H_2^{-1}$ will conjugate the $Y$-projection of that sector by a non-empty reduced word from the right alphabet. Therefore in the last admissible word of that subcomputation, there will still be letters from both left and right alphabets, and so it cannot be in the domain of any $\chi$-rule or its inverse, a contradiction.
Together with Step 4, this implies that the base of C has no mutually inverse letters from history sectors staying next to each other.
Since the base is faulty, it must contain an input $P_1^{-1}P_1$- or $R_0R_0^{-1}$-sector. This implies that the base does not contain input $(R_0P_1)^{\pm 1}$-sectors, since the first and the last letters of the base are equal (say, positive) and the base has no proper subwords with this property. In both cases the configuration $W_r$ corresponding to the transition $\chi : W_{r-1} \to W_r$ is the shortest one in $\mathcal{C}$, since the $Y$-projection of that word is of the form $\alpha^k$, each rule from $\mathcal{C}$ conjugates the $Y$-projection from the input sector, and $\alpha^k$ cannot be shortened by any conjugation. This contradicts Step 1.
Step 7. It follows from Steps 2, 3, 5 and 6 that $H = H_1\chi H_2\chi' H_3$ for two $\chi$-rules $\chi$, $\chi'$ (or their inverses). Moreover, $H_2$ is the history of a computation $\mathcal{C}_2$ of a copy of LR or of RL, and $H_1$, $H_3$ are histories of computations $\mathcal{C}_1$, $\mathcal{C}_3$ of copies of $M_2$, i.e., $H$ has exactly two $\chi$-rules (otherwise $H$ has a subword which is ruled out in the previous steps of the proof).
Step 8. We claim that we can assume that the admissible words in the computation $\mathcal{C}$ do not have history $(PR)^{\pm 1}$-sectors. Indeed, if such a sector exists, then for the subcomputation $\mathcal{C}_1 : W_0 \to \cdots \to W_r$ with history $H_1\chi$, we have $|W_r|_Y \le c|W_0|_Y$ by Lemma 3.13. A similar estimate is true for the subcomputation with history $\chi' H_3$ starting with some $W_s$. So in order to prove the inequality from the lemma, it suffices to apply Step 2 to the three subcomputations $\mathcal{C}_1$, $\mathcal{C}_2$, $\mathcal{C}_3$.
Step 9. Suppose that the base of $\mathcal{C}$ contains a history subword of the form $P^{-1}P$. If the admissible word from $\mathcal{C}$ in the domain of $\chi$ has no letters from the left alphabets, then $H_2$ is the history of a computation of a copy of LR, and the state $P$-letter will never meet the corresponding state $R$- or $Q$-letter during the computation $\mathcal{C}_2$, so an application of $\chi'$ is not possible after $\mathcal{C}_2$ ends, a contradiction.
Thus we can assume that if the base of $\mathcal{C}$ contains a history subword of the form $P^{-1}P$, then the last admissible word of $\mathcal{C}_2$ (which is in the domain of $\chi'$) contains letters from the left alphabet.
Similarly, if the base of $\mathcal{C}$ contains a history subword of the form $RR^{-1}$, then the last admissible word in $\mathcal{C}_2$ contains letters from the right alphabet. This implies, in particular, that the base of $\mathcal{C}$ cannot contain both a history subword of the form $P^{-1}P$ and a history subword $R'(R')^{-1}$. Without loss of generality, we will assume that there are no subwords $R'(R')^{-1}$.
Step 10. It follows from Steps 4, 8 and 9 that there are no history sectors of the base unlocked by $\chi$ except for $P^{-1}P$-sectors, and if there is such a sector $UV$, then $\mathcal{C}_2$ is a computation of a copy of RL. Therefore $UV$ may contain tape letters from a left alphabet, while every rule $\theta$ of $\mathcal{C}_1^{-1}$ multiplies this sector from both sides by letters from a right alphabet. So $\theta$ increases the length of every history sector by 2. The rule $\chi$ locks working sectors (except for the input one), and so by Lemma 2.3 (**), $\theta$ can decrease the length of every working sector by at most one. Since working sectors alternate with history ones in any base, we have $\|W_r\| \le \|W_0\|$, contrary to Step 1.
Step 11. To complete the proof of the lemma, it remains to assume that there are no history sectors in the base of $\mathcal{C}$. Then the faulty base of $\mathcal{C}$ must contain input subwords of the form $R_0R_0^{-1}$ only, because every $\chi$-rule locks all sectors of the standard base except for the input and history sectors. Then any admissible word of $\mathcal{C}$ from the domain of a $\chi$-rule in $H$ is the shortest admissible word in $\mathcal{C}$, since (as in Step 6) every rule of the computation conjugates $R_0R_0^{-1}$-sectors and a word $\alpha^k$ cannot be shortened by any conjugation. The lemma is proved, since we can refer to Step 1 again.

The main S-machine M

The definition of M
We use the S-machine $M_5$ from Section 3.5, $LR_m$ from Section 3.1 and three more easy S-machines to compose the main circular S-machine $M$ needed for this paper. The standard base of $M$ is the same as the standard base of $M_5$, i.e., $\{\check{t}\}B_3(B_3')^{-1}$, where $B_3$ has the form (3.4). However, we will use $\check{Q}_0$ instead of $Q_0$, $\check{R}_1$ instead of $R_1$ and so on to denote parts of the set of state letters, since $M$ has more state letters in every part of its hardware.
The rules of M will be partitioned into five sets (S-machines) Θ i (i = 1, . . . , 5) with transition rules θ(i, i + 1) connecting i-th and i + 1-st sets. The state letters are also disjoint for different sets Θ i . It will be clear thatQ 0 is the disjoint union of 5 disjoint sets including Q 0 ,Ř 1 is the disjoint union of five disjoint sets including R 1 , etc.
By default, every transition rule θ(i, i + 1) of M locks a sector if this sector is locked by all rules from Θ i or if it is locked by all rules from Θ i+1 . It also changes the end state letters of Θ i to the start state letters of Θ i+1 .
Set $\Theta_1$ inserts input words in the input sectors. The set contains only one positive rule, inserting the letter $\alpha$ in the input sector next to the left of a letter $p$ from $\check{P}_1$. It also inserts a copy of $\alpha^{-1}$ next to the right of the corresponding letter $(p')^{-1}$ (a similar mirror symmetry is assumed in the definition of all other rules). So the positive rule of $\Theta_1$ has the form

The rules of $\Theta_1$ do not change state letters, so it has one state letter in each part of its hardware.
The connecting rule $\theta(12)$ changes the state letters of $\Theta_1$ to their copies in a disjoint alphabet. It locks all sectors except for the input sector $\check{R}_0\check{P}_1$ and the mirror copy of this sector.
Set $\Theta_2$ is a copy of the S-machine $LR_m$ working in the input sector and its mirror image in parallel, i.e., we identify the standard base of $LR_m$ with $\check{R}_0\check{P}_1\check{Q}_1$. The connecting rule $\theta(23)$ locks all sectors except for the input sector $\check{R}_0\check{P}_1$ and its mirror image.
Set $\Theta_3$ inserts history in the history sectors. This set of rules is a copy of each of the left alphabets $X_{i,\ell}$ of the S-machine $M_2$. Every positive rule of $\Theta_3$ inserts a copy of the corresponding positive letter in every history sector $\check{R}_i\check{P}_{i+1}$ next to the right of a state letter from $\check{R}_i$.
Again, Θ 3 does not change the state letters, so each part of its hardware contains one letter.
The transition rule θ(34) changes the state letters by their copies from Set M 5,1 of M 5 . It locks all sectors except for the input sectors and the history sectors. The history sectors in admissible words from the domain of θ(34) have Y -letters from the left alphabets X i,l of the S-machine M 5 .
Set Θ 4 is a copy of the S-machine M 5 . The transition rule θ(45) locks all sectors except for history ones. The admissible words in the domain of θ(45) have no letters from right alphabets.
Set $\Theta_5$. The positive rules from $\Theta_5$ simultaneously erase the letters of the history sectors from the right of the state letter from $\check{R}_i$. That is, parts of the rules are of the form $r \to ra^{-1}$, where $r$ is a state letter from $\check{R}_i$ and $a$ is a letter from the left alphabet of the history sector.
Finally the accept rule θ 0 from M can be applied when all the sectors are empty, so it locks all the sectors and changes the end state letters of M 5 to the corresponding end state letters of M. Thus the main S-machine M has unique accept configuration which we will denote by W ac .
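The definition above can be summarized as a small lookup table. The sketch below is only a reading aid: the descriptions paraphrase the text, and the names `THETA_SETS`, `transition_rules` and `listing_order` are hypothetical. The list returned by `listing_order` is the order in which the sets and rules are introduced in the definition; it is not a claim about the histories of actual computations.

```python
# Paraphrased summary of the five rule sets of the main S-machine M.
THETA_SETS = {
    1: "insert input letters in the input sector and its mirror copy",
    2: "copy of LR_m working in the input sector and its mirror image",
    3: "insert a history word in the history sectors",
    4: "copy of the S-machine M_5",
    5: "erase the letters of the history sectors",
}


def transition_rules() -> list:
    """Transition rules theta(i, i+1) connecting consecutive sets."""
    return [f"theta({i}{i + 1})" for i in range(1, 5)]


def listing_order() -> list:
    """Sets and rules in the order in which the definition introduces them."""
    order = []
    for i in sorted(THETA_SETS):
        order.append(f"Theta_{i}")
        if i < 5:
            order.append(f"theta({i}{i + 1})")
    order.append("theta_0")  # the accept rule
    return order


if __name__ == "__main__":
    for name in listing_order():
        print(name)
```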
For every i = 1, 2, 3, 4, we will sometimes denote θ(i, i + 1) −1 by θ(i + 1, i). Considering eligible computations instead of just reduced computations is necessary for our interpretation of M in a group.

Standard computations of M
The history $H$ of an eligible computation of $M$ can be factorized so that every factor is either a transition rule $\theta(i, i+1)^{\pm 1}$ or a maximal non-empty product of rules of one of the sets $\Theta_1, \dots, \Theta_5$. If, for example, $H = H'H''H'''$, where $H'$ is a product of rules from $\Theta_2$, $H''$ has only one rule $\theta(23)$ and $H'''$ is a product of rules from $\Theta_3$, then we say that the step history of the computation is $(2)(23)(3)$. Here $(21)$ is used for the rule $\theta(12)^{-1}$, and so on. For brevity, we can omit some transition symbols, e.g. we may use $(2)(3)$ instead of $(2)(23)(3)$, since the only rule connecting Steps 2 and 3 is $\theta(23)$.
If the step history of a computation consists of only one letter (i), i = 1, . . . , 5, then we call it a one step computation. The computations with step histories (i) are also considered as one step computations. Any eligible one step computation is always reduced by definition.
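The step history can be computed from a history by collapsing maximal runs of rules from the same set. Here is a minimal sketch under an assumed encoding that is not from the paper: a rule from $\Theta_i$ is written as the integer `i`, and a transition rule is written as a pair, with `(2, 3)` for $\theta(23)$ and `(2, 1)` for $\theta(12)^{-1}$. The function name `step_history` is hypothetical.

```python
def step_history(rules) -> str:
    """Return the step history of a history given as a list of tags.

    An int i stands for a rule of Theta_i; a pair (i, j) stands for a
    transition rule and always contributes its own letter (ij).
    """
    letters = []
    for tag in rules:
        if isinstance(tag, tuple):
            i, j = tag
            letters.append(f"({i}{j})")
        else:
            letter = f"({tag})"
            # Consecutive rules of the same set form one maximal factor.
            if not letters or letters[-1] != letter:
                letters.append(letter)
    return "".join(letters)


if __name__ == "__main__":
    # Three rules of Theta_2, then theta(23), then two rules of Theta_3.
    print(step_history([2, 2, 2, (2, 3), 3, 3]))
```

With this encoding the example from the text, a product of rules from $\Theta_2$ followed by $\theta(23)$ and a product of rules from $\Theta_3$, yields the word (2)(23)(3).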
The step history of any computation cannot contain certain subwords. For example, $(1)(3)$ is not a subword of any step history, because the domains of rules from $\Theta_1$ and $\Theta_3$ are disjoint. In this subsection, we eliminate some less obvious subwords in step histories of eligible computations.

If $H$ has no $\chi$-letters, then it is a history of RL, and we obtain a contradiction with Lemma 3.3 (4) (and Remark 3.7).
(2) Suppose the step history of C is (23)(3)(32). Since the history sectors are locked by θ(23) ±1 , the history subwords in the base of C must have the form (R i−1 P i ) ±1 for some i. Every rule of Θ 3 inserts a letter next to the left of every P i -letter, different rules insert different letters, same letter for the same rule. Since at the beginning and at the end of the subcomputation with step history (3) all history sectors are empty of Y -letters, the word inserted during the subcomputation must be freely trivial. That contradicts the assumption that this subcomputation is reduced.
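The free-triviality argument in (2) can be illustrated with a small sketch. The encoding is an assumption made for the example only: the rule numbered r inserts the letter r, its inverse is written −r, and a history is a list of such integers.

```python
def free_reduce(word):
    """Freely reduce a word given as a list of nonzero ints (x and -x cancel)."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()          # cancel an adjacent pair x x^-1
        else:
            stack.append(letter)
    return stack

def is_reduced(word):
    """A word is reduced if free reduction leaves it unchanged."""
    return free_reduce(word) == list(word)

# A non-reduced history can insert a freely trivial word ...
trivial = free_reduce([1, 2, -2, -1])
# ... but a non-empty reduced history inserts a non-empty reduced word.
nontrivial = free_reduce([1, 2, -1])
```

Since distinct rules insert distinct letters, a reduced non-empty history yields a reduced non-empty inserted word, which cannot be freely trivial.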
By definition, the rule θ(23) locks all history sectors of the standard base of M except for the input sectorŘ 0P1 and its mirror copy. Hence every admissible word in the domain of θ(23) −1 has the form W (k, k ) ≡ w 1 α k w 2 (a ) −k w 3 , where k and k are integers and w 1 , w 2 , w 3 are fixed words in state letters; w 1 starts withť. Recall that W ac is the accept word of M. Proof. Consider only the step history (12)(2)(12). Thus the history H of the computation is θ(12)H θ(21) −1 and H is a computation of a copy of LR m working in the input sectors of admissible words of M. Then applying Lemma 3.3 (4) and Remark 3.3 we can conclude that H is empty, a contradiction.  (4) ,it works as LR m with step history (2)).
Then the base of the computation C is a reduced word, and all configurations of C are uniquely defined by the history H and the base of Moreover, ||H|| = 2s+3 (resp., ||H|| = s+2), where s is the Y -length of every history sector (input sector) in case (a) (in case (b), resp.), and H is the copy of the maximal Y -word contained in arbitrary history (resp., input) sector of W 0 .
Proof. (a) Every history sector of the standard base is locked either by one of the rules , or by a rule of H . Every non-history sector of the standard base is also locked either by χ(i − 1, i) or by χ(i, i + 1). It follows from Lemma 2.4 that the base of C is a reduced word. By Lemma 3.3 (3), the histories of the primitive S-machines subsequently restore the tape words in all history sectors. Since one of the rules χ(i − 1, i), χ(i, i + 1) locks all non-history sectors, Lemma 3.14 applied to C implies equalities |W 0 | Y = |W 1 | Y = · · · = |W t | Y , and gives the other statements.
Lemma 4.6. (1) If the word α k is accepted by the Turing machine M 0 , then there is a reduced computation of M, W (k, k) → · · · → W ac whose history has no rules of Θ 1 and Θ 2 .
(2) If the history of a computation C : W (k, k) → · · · → W ac of M has no rules of Θ 1 and Θ 2 , then the word α k is accepted by M 0 .
Proof. (1) By Lemma 3.18, there is a computation I 5 (a k , H) → · · · → A 5 (H) of the S-machine M 5 for some H. So we have the corresponding computation of Θ 4 : D : I 6 (a k , H) → · · · → A 6 (H). Now the computation of Θ 3 inserting letters in history sectors and a computation of Θ 5 erasing these letters extend D and provide us with a computation W (k, k) → · · · → I 6 (a k , H) → · · · → A 6 (H) → · · · → W ac .
(2) By Lemma 4.2 (1), the step history of C begins with (3)(4)(5), and so there is a subcomputation of Set 4 of the form I 5 (α ℓ , H) → · · · → A 5 (H) for some ℓ and H, where according to Lemma 3.18 (2), the word α ℓ is accepted by M 0 . Since the computation of Set M 3,3 does not change the input sector, we have ℓ = k.

The first estimates of computations of M
Proof. (a) If C is a one-step computation and its step history is (1), (3), or (5), then Statement (a) follows from Lemma 2.5 (c). For step history (2) (resp., (4)) it follows from Lemma 3.14 (a) (resp., Lemma 3.16 (c)). If there is a transition rule of M in the history H of C, then H can be decomposed into at most three factors H = H 1 H 2 H 3 , where H 2 is a one-step computation of step history (1), (3) or (5), or H 2 = (23)(32) and H 1 , H 3 , if non-empty, are of step history (2) or (4). Respectively, the computation C is a composition of at most three subcomputations applying either Lemma 3.14 (a) (for step history (2)) or Lemma 3.16 (c) (for step history (4)) to C 1 and C 3 . The same lemmas applied to subcomputations C 1 , C 2 and C 3 complete the proof since we can assume that c 2 ≫ c (see Section 2.3). If W 0 is a tame configuration, then the same lemmas linearly bound ||W 0 || in terms of ||W t ||, and the required estimates follow.
(b) It suffices to bound the lengths of at most three one step subcomputations (1). For step history (1), (3) or (5), the history lengths are bounded by Lemma 2.5 (b). For (2), we refer to Lemma 3.14 (b). The computation with step history (4) has at most 4m χ-rules in the history as follows from Lemma 3.15. So it has at most 4m + 1 maximal subcomputations of the form W l → · · · → W s , corresponding to one of the 4m+1 subsets (1) of the lemma. Hence we have the same upper bound for s − l by Lemmas 3.3 (3) (if it is a computation of LR) and 3.11 (if it is a computation of M 2 ). This completes the proof of the first inequality since we have c 2 ≫ m (Section 2.3). The tame case is treated in the same way (the proof is even shorter).

Computations of M with faulty bases
Step 1. As in Step 1 of the proof of Lemma 3.20, one may assume that |W j | Y > max(|W 0 | Y , |W t | Y ) if 1 < j < t and so the history H of C neither starts nor ends with a transition rule θ(i, i + 1) ±1 .
Step 2. If C is a one step computation and (i) is its step history, then the statement follows from Lemma 2.5 (c) for i = 1, 3, 5 (since c 1 ≥ 2), Lemma 3.14 (a) for i = 2 (since c 1 ≥ 2) and Lemma 3.20 for i = 4 (since c 1 ≥ C). Hence one may assume further that H contains a transition rule of M or its inverse.
Step 3. Assume that C (or the inverse computation) has a transition rule θ(23), W j+1 = W j · θ(23). Recall that the only sectors not locked by θ(23) are the inputŘ 0P1 -sector and its mirror copy. So by Lemma 2.4, we should have an input subwordŘ 0Ř in the faulty base. Moreover, we must have exactly two such input subwords in the base and no subwords (Ř 0P1 ) ±1 since the first and the last letters of the base are equal (e.g., positive) and the base has no proper subwords with this property (see Definition 3.19).
The input sectors of both W j and W j+1 have Y -projections of the form α k , and they are not longer than the corresponding Y -words in the input sectors of any other W i since α k cannot be shortened by conjugation. It follows that Step 1. Thus, one may assume further that H has no letters θ(23) ±1 . In particular, C is a reduced computation.
The same argument eliminates letters θ(12) ±1 from H, and so the letter (1) from the step history of C. Hence one can assume that the step history contains neither (1) nor (2).
Step 4. Suppose H (or H −1 ) contains a subhistory H'θ(45), where H' is a maximal subword of H which is a word in Θ 4 (which is a copy of the S-machine M 5 ). By Lemma 2.4, the faulty base of the computation C contains one of the history subwordsŘ i−1Ř j−1 -sector will then never meet a letter from eitherQ j−1 orP j . Therefore H' cannot contain the transition rule χ(4m, 4m + 1) ±1 or θ(45) −1 . Thus H' is a prefix of H, is a computation of a copy of RL, and by Lemma 3.14 (a) applied to the subcomputation of C −1 with history (H') −1 , we get a contradiction with Step 1 because admissible words in the domain of θ(45) −1 are tame.
Suppose the base of C contains a subword (Ř i−1Pi ) ±1 . Then H has no subword θ(45) −1 H θ(45) by Lemma 4.2 (1). If H has neither transition rules nor χ-rules, then we have a contradiction by Lemma 3.14 (a). Hence H has a subword χ(4m, 4m + 1)H θ(45), but then by Lemma 3.3 (3), H has a rule locking all the sectorsŘ i−1Pi of the standard base, and we get a contradiction with Lemma 2.4.
Finally suppose all history subwords in the base of C have the formP −1 iP i . Then the rules of a copy of RL from H do not change the history sectors of admissible words in the corresponding subcomputation C of C, hence the lengths of all admissible words in C stay the same. Moreover since the state letters in the history sectors do not change during the subcomputation C, none of the admissible words in that subcomputation is in the domain of χ(4m, 4m + 1) ±1 . Therefore the rules of H do not change the lengths of admissible words, and either H is a prefix of H and we get a contradiction with Step 1 or we have the subhistory θ(45) −1 H θ(45).
In the latter case, we consider the maximal subhistory H of type 5 following after the rule θ(45) (or before θ(45) −1 ). All the admissible words of the corresponding subcomputation C have equal lengths since the base has no lettersŘ i . Arguing in this way we see that the history of C has Steps 4 and 5 only, and all the admissible words in C have equal length, which proves the inequality of the lemma.
We can conclude that H does not contain θ(45) ±1 . By Step 2, (5) is not in the step history of C and the only possible transition rules of M in H are θ(34) ±1 .
Step 5. Assume that there is a subhistory of H of the form Then the base of C has no history sectors of the form R iŘ −1 i (since, as before, the machine RL starting with θ(34) would never end with χ (12)). If there is a history subwordŘ i−1Pi in the faulty base, then H 2 cannot follow by the transition rule θ(34) −1 , by Lemma 3.15 if H 2 contains χ-rules and by Lemma 3.3 (4) otherwise, a contradiction.
Thus the base of C has noŘ-letters from history sectors. It also has noP 1 -letters from input sectors, because otherwise the base would contain the letterŘ 1 of the history sector next to the input sector since the sectorsP 1Q1 andQ 1Ř1 are locked by θ(34).
Thus, all history sectors have the formP −1 iP i in the faulty base of C, and so H cannot have the rule χ(1, 2) ±1 (for the same reason the rule χ(4m, 4m+1) was eliminated in Step 4). But without χ(1, 2) ±1 , one cannot get a rule in H changing history sectorsP −1 iP i since the rules of Θ 3 leave such sectors unchanged. The input sectorsŘ 0Ř −1 0 of the base of C (if any) cannot be shortened by a subcomputation since no conjugation shortens a power of one letter in a free group. Therefore the rules θ(34) ±1 are applied to the shortest admissible word of C, contrary to Step 1.
So our assumption was wrong.
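The fact used in Step 5, that no conjugation shortens a power of one letter in a free group, can be checked by brute force on small cases. The sketch below is an illustration only: letters are encoded as nonzero integers (an assumption of the example), and the helper names are ours.

```python
from itertools import product

def free_reduce(word):
    """Freely reduce a word given as a list of nonzero ints (x and -x cancel)."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return stack

def conjugate(u, w):
    """Freely reduced form of u * w * u^-1."""
    u_inv = [-x for x in reversed(u)]
    return free_reduce(list(u) + list(w) + u_inv)

def min_conjugate_length(w, letters, max_len):
    """Smallest reduced length of u w u^-1 over all words u with |u| <= max_len."""
    alphabet = [s * x for x in letters for s in (1, -1)]
    best = len(free_reduce(w))
    for n in range(1, max_len + 1):
        for u in product(alphabet, repeat=n):
            best = min(best, len(conjugate(u, w)))
    return best

# a^3 in the free group on a = 1, b = 2: no conjugator of length <= 3 shortens it
power_case = min_conjugate_length([1, 1, 1], letters=(1, 2), max_len=3)
# by contrast, b a b^-1 is shortened to a by conjugating with b^-1
other_case = min_conjugate_length([2, 1, -2], letters=(1, 2), max_len=1)
```

A finite search of this kind is of course not a proof; the general statement follows because a power of a letter is cyclically reduced.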
Step 6. If there is only one transition rule θ(34) in H ±1 , then H ±1 = H θ(34)H , where H is the history of M 5 . If H is the history of a copy of RL, starting with an admissible word W r , then |W r | Y ≤ |W t | Y by Lemmas 3.14 (a) and 3.4, contrary to Step 1. Otherwise we have a subhistory θ(34)H 0 χ(1, 2), and by Lemma 3.3 (3), there are no history subsectors of the formŘ iŘ If there is a history sectorŘ i−1Pi , then one can linearly bound |W r | Y in terms of |W t | Y applying Lemmas 3.14 (b) and 3.13 several times, namely at most 4m + 1 times by Lemma 3.15. Since c 1 ≫ C, c 1 ≫ m (see Section 2.3), one can divide C into two subcomputations W 0 → · · · → W r and W r → · · · → W t and reduce the proof to Step 2.
Thus, one may assume that the base of C has no lettersP andŘ from history sectors. This also eliminates the letterP 1 of the input sector and gives the inequality contrary to Step 1. Therefore the assumption of Step 6 was wrong.
Step 7. It remains to consider the case when H ±1 is of the form where H 2 is the history of Θ 3 , H 1 and H 3 are histories of Θ 4 , and it suffices to repeat the argument of Step 6 with decomposition of C in the product of three subcomputations, because we did not use there that the subword H 1 θ(34) was absent. The lemma is proved.

Space and length of M-computations with standard base
Let us call a configuration W of M accessible if there is a W -accessible computation, i.e., either an accepting computation starting with W or a computation s 1 (M) → · · · → W , where s 1 (M) is the start configuration of M (i.e., the configuration where all state letters are start state letters of Θ 1 and the Y -projection is empty). Proof. Assume that a W -accessible computation C has (4) in its step history and its history H has a rule χ(i, i + 1) with 1 < i < 4m. Since C is accessible, we have by Lemma 3.16 (b), a subcomputation W l → · · · → W r with history of the form (a) where H is a history of a canonical computation of M 5 . By Lemma 4.4 we also conclude that every history sector of W l and of W r is a copy of H . This makes it possible to accept W r using erasing rules of Set 5 in case (a) or to construct a computation of type (1)(2)(3) starting with s 1 (M) and ending with W l in case (b).
It follows now from Lemma 3.16 that one can choose an accessible computation C having no subhistories of type (34)(4)(45) or (45)(4)(34), and so Set 4 can occur only in the beginning or at the end of H. In the first case H has to have type (4)(5), and the required inequalities follow from Lemma 4.7 since c 3 ≫ c 2 . In the second case, the step history ends with (3)(4), and the connection θ(34) : W k−1 → W k provides us with copies in all history sectors and in all input sectors since W k is accessible. Hence one may assume that the step history has the form (1)(2)(3)(4). Here |W k | Y ≤ c 1 |W | Y by Lemma 3.16 (c). The canonical computation with step history (1)(2)(3) does not decrease the lengths of configurations. Now the required estimates follow from Lemma 4.7 for four one-step subcomputations since we chose c 3 after c 2 .
One obtains even better estimates than the required inequalities if C has no Set 4 in the step history since it has one of the types (5) or (1)(2)(3), or (1)(2) or (1).
For any accessible word W we choose an accessible computation C(W ) according to Lemma 4.9. Proof. One may assume that t > c 4 max(||W 0 ||, ||W t ||), because otherwise Property (a) holds for sufficiently large c 5 since an application of every rule can increase the length of a configuration by a constant depending on M. Hence by Lemma 4.9, ||H 0 || + ||H t || ≤ 2c 3 max(||W 0 ||, ||W t ||) ≤ t/500.
The computation C is not a B-computation by Lemma 4.7 since c 2 < c 4 . Therefore it is a computation satisfying Property (A) of Lemma 4.5, and one has a maximal subcomputation C : W r → · · · → W s starting and ending with subcomputations with step histories 2 or 4, which were listed in part (A) of that lemma. We have C = C C C , and applying Lemma 4.7 to (C ) −1 and C , we have max(||W r ||, ||W s ||) ≤ c 2 max(||W 0 ||, ||W t ||) and the lengths l and l of C and C do not exceed c 4 max(||W 0 ||, ||W t ||)/1000 since c 4 > 1000c 2 c 3 . Therefore l + l ≤ t/500. Lemma 4.5 implies that the subcomputation C is a product C 1 D 1 . . . C k−1 D k−1 C k , where k ≥ 1, every C i has one of the four step histories from item (a) of that lemma, and every D i is a subcomputation having type 1 or 3, or 5, or just empty if the history H(i) of C i ends with θ(23) and H(i + 1) starts with θ(23) −1 . Let K(i) be the history of D i . Below we will prove that ||K(i)|| ≤ (||H(i)|| + ||H(i + 1)||)/1000. In turn, this will imply that Σ i ||H(i)|| ≥ 500 Σ i ||K(i)||, and the last claim of the lemma will follow since l + l ≤ t/500.
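The two counting steps in this paragraph can be written out explicitly. In the sketch below, ℓ_1 and ℓ_2 are our names for the lengths of the initial and final subcomputations, M abbreviates max(||W 0 ||, ||W t ||), and the bound on ||K(i)|| is taken as an assumption, as in the text.

```latex
% Assumptions: t > c_4 M, and \ell_1, \ell_2 \le c_4 M / 1000  (names \ell_1, \ell_2, M are ours)
\[
\ell_1 + \ell_2 \;\le\; 2\cdot\frac{c_4 M}{1000} \;<\; \frac{2t}{1000} \;=\; \frac{t}{500}.
\]
% Assuming \|K(i)\| \le (\|H(i)\| + \|H(i+1)\|)/1000 for 1 \le i \le k-1,
% and using that every \|H(i)\| occurs at most twice in the middle sum:
\[
\sum_{i=1}^{k-1}\|K(i)\|
\;\le\; \frac{1}{1000}\sum_{i=1}^{k-1}\bigl(\|H(i)\|+\|H(i+1)\|\bigr)
\;\le\; \frac{2}{1000}\sum_{i=1}^{k}\|H(i)\|
\;=\; \frac{1}{500}\sum_{i=1}^{k}\|H(i)\|.
\]
```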
We call a base B of an eligible computation (and the computation itself) revolving if B ≡ xvx for some letter x and a word v, and B has no proper subword of this form.
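The definition of a revolving base is purely combinatorial and can be tested mechanically. The sketch below assumes a base is given as a list of letter names (a letter and its inverse count as different names); the functions `is_revolving` and `rotate` are illustrative and not part of the paper. The second function rotates a revolving word x v 1 z v 2 x about an inner letter z, giving z v 2 x v 1 z.

```python
def is_revolving(base):
    """Check that base = x v x and no proper subword has equal first and last letters."""
    if len(base) < 2 or base[0] != base[-1]:
        return False
    body = base[:-1]
    # a proper subword y...y exists exactly when some letter repeats in x v
    return len(set(body)) == len(body)

def rotate(base, k):
    """Cyclic permutation of a revolving base that starts at its k-th letter."""
    body = list(base[:-1])            # the cyclic word x v
    rotated = body[k:] + body[:k]     # read it starting from position k
    return rotated + [rotated[0]]     # close it up with the new first letter

base = list("xazbx")                  # x v1 z v2 x with v1 = a, v2 = b
rotated = rotate(base, 2)             # z v2 x v1 z
```

With this encoding `rotated` spells z b x a z, and it is again revolving.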
If v ≡ v 1 zv 2 for some letter z, then the word zv 2 xv 1 z is also revolving. One can cyclically permute the sectors of revolving computation with base xvx and obtain a uniquely defined computation with the base zv 2 xv 1 z, which is called a cyclic permutation of the original computation. The history and lengths of configurations do not change when one cyclically permutes a computation. Proof. If the computation is faulty, then Property (1) is given by Lemma 4.8 since c 4 > c 1 .
If it is non-faulty, then we have all sectors of the base in the same order as in the standard base (or its inverse), and we obtain Property (2a). Therefore we may assume now that the base xv is standard and Property (1) does not hold. If C is a B-computation, we obtain a contradiction with Lemma 4.7 since c 4 > c 2 . Therefore we assume further that C is an A-computation. So it (or the inverse one) contains a subcomputation with step history (12)(2)(23) or (34)(4)(45). In case of (34)(4)(45), we consider the transition θ(45) : W j → W j+1 . By Lemma 4.4, the words in the history sectorsŘ i−1Pi are copies of each other. Therefore they can be simultaneously erased by the rules of Set 5, and so W j+1 and all other configurations are accepted. Similarly one applies Lemma 4.4 in case (12)(2)(23) and concludes that Property (2b) holds.
Now the second part of (2c) and (d) follow from Lemma 4.10.

Two more properties of standard computations
Here we prove two lemmas needed for the estimates in Subsection 7.2. The first one says (due to Lemma 4.3 (2)) that if a standard computation C is very long in comparison with the lengths of the first and the last configuration, then it can be completely restored if one knows the history of C, and the same is true for the long subcomputations of C. This makes the auxiliary parameter σ λ (∆) useful for some estimates of areas of diagrams ∆.
Lemma 4.13. Let a reduced computation C : W 0 → · · · → W t start with an accessible word W 0 and have step history of length 1. Assume that for some index j, we have |W j | Y > 3|W 0 |. Then there is a sector QQ' such that a state letter from Q or from Q' inserts a Y -letter increasing the length of this sector after any transition of the subcomputation W j → · · · → W t .
Proof. First of all we observe that the Y -words in all history sectors (in all input sectors) of any configuration W i are copies of each other, because W 0 is accessible. Also inducting on t one can assume that |W 1 | Y > |W 0 | Y .
If we have one of the Sets 1, 3, 5, then . . since the second rule cannot be inverse for the first one, and so on, i.e., we obtain the desired property of any input sector for Set 1 or of any history sector for Sets 3 or 5.
If we have Set 2, then the statement for any input sector follows from Lemma 3.3 (1).
Let the step history be (4). Recall that the rules of Set 4 are subdivided in several sets, where each set copies the work of either LR or M 3 . If a PM-rule of the subcomputation D : W 0 → · · · → W j increases the length of a history sector, then we refer to Lemma 3.3 (1) as above. So one may assume that no PM-rules of D increase the length of history sectors.
Assume now that D has an M 3 -rule increasing the length of history sectors. It has to insert a letter from X i,ℓ from the left and a letter from X i,r from the right. Since the obtained word is not a word over one of these alphabets, the work of M 3 is not over, and the next rule has to increase the length of the sector again in the same manner since the computation is reduced. This procedure will repeat until one gets W t . This proves the statement for any history sector.
It remains to assume that there are no transitions in D increasing the lengths of history sectors and the first transition W 0 → W 1 is provided by a rule θ of M 3 . It cannot shorten history sectors (by 2). Indeed, θ can change the length of the neighbor working sectors by at most 1 (see Lemma 2.3 (3)), which implies |W 0 | Y ≥ |W 1 | Y , a contradiction. It follows that no further rules of M 3 can shorten history sectors. Then  2)). Hence if its length in W 0 is ℓ, its length in W s is at most ℓ + h. It follows that |W s | Y ≤ 3|W 0 | Y , because the working sectors of M 2 and its history sectors alternate in the standard base; and the same inequality |W r | Y ≤ 3|W 0 | Y holds for any configuration W r of E. Hence s = j and the subcomputation E follows in D by a subcomputation F of PM, which does not change the length of configurations by Lemma 3.14.
So F has to follow in D by a maximal subcomputation G of M 3 again. Since we have the canonical work of M 3 in history sectors, a prefix of the history of G −1 is a copy of the entire H(E) −1 , where H(E) is the history of E. (G cannot be shorter than E since otherwise the configuration W j would have a copy in E, whence |W j | Y ≤ 3|W 0 | Y , a contradiction.) It follows that a configuration W l of G is a copy of W 0 , and so |W l | Y = |W 0 | Y . Since the subcomputation W l → · · · → W j → · · · → W t is shorter than C, we complete the proof of the lemma inducting on t.

The groups
Every S-machine can be simulated by a finitely presented group (see [26], [19], [20], etc.). Here we apply a modified construction from [26] to the S-machine M. To simplify formulas, it is convenient to change the notation. From now on we shall denote by N the length of the standard base of M.
Thus the set of state letters is For every θ ∈ Θ + we have N generators θ 0 , . . . , θ N in M (here θ N ≡ θ 0 ) if θ is a rule of Set 3 (excluding θ(23)) or Set 4, or Set 5. For θ of Set 1 or 2 (including θ(23)), we introduce LN generators θ are obtained from U j and V j by adding the superscript i to every letter.
For θ = θ(23), we introduce relations i.e., the superscripts are erased in the words U (i) j and in the Y -letters after an application of (5.7).
Finally, the required group G is given by the generators and relations of the group M and by two more additional relations, namely the hub-relations where the word W (i) st is a copy with superscript (i) of the start word W st (of length N ) of the S-machine M and W ac is the accept word of M.
Remark 5.1. The main difference of the construction of M and the groups based on S-machines with hubs from our previous papers [26,19,20,17] and others, is that relations (5.6) are defined differently for different rules of the S-machine. We also use two hub relations instead of just one, although it is easy to see that one hub relation follows from the other (and other relations).
Note also that, as usual, M is a multiple HNN extension of the free group generated by all Y - and q-letters.

Van Kampen diagrams
Recall that a van Kampen diagram ∆ over a presentation P = ⟨A | R⟩ (or just over the group P ) is a finite oriented connected and simply-connected planar 2-complex endowed with a labeling function Lab : E(∆) → A ±1 , where E(∆) denotes the set of oriented edges of ∆, such that Lab(e −1 ) ≡ Lab(e) −1 . Given a cell (that is a 2-cell) Π of ∆, we denote by ∂Π the boundary of Π; similarly, ∂∆ denotes the boundary of ∆. The labels of ∂Π and ∂∆ are defined up to cyclic permutations. An additional requirement is that the label of any cell Π of ∆ is equal to (a cyclic permutation of) a word R ±1 , where R ∈ R. The label and the combinatorial length ||p|| of a path p are defined as for Cayley graphs.
The van Kampen Lemma [11,13,25] states that a word W over the alphabet A ±1 represents the identity in the group P if and only if there exists a diagram ∆ over P such that Lab(∂∆) ≡ W, in particular, the combinatorial perimeter ||∂∆|| of ∆ equals ||W ||.
( [11], Ch. 5, Theorem 1.1; our formulation is closer to Lemma 11.1 of [13], see also [25,Section 5.1]). The word W representing 1 in P is freely equal to a product of conjugates to the words from R ±1 . The minimal number of factors in such products is called the area of the word W. The area of a diagram ∆ is the number of cells in it. The proof of the van Kampen Lemma [13,25] shows that Area(W ) is equal to the area of a van Kampen diagram having the smallest number of cells among all van Kampen diagrams with boundary label Lab(∂∆) ≡ W.
We will study diagrams over the group presentations of M and G. The edges labeled by state letters ( = q-letters) will be called q-edges, the edges labeled by tape letters (= Y -letters) will be called Y -edges, and the edges labeled by θ-letters are θ-edges.
We denote by |p| Y (by |p| θ , by |p| q ) the Y -length (resp., the θ-length, the q-length) of a path/word p, i.e., the number of Y -edges/letters (the number of θ-edges/letters, the number of q-edges/letters) in p.
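As a small illustration of these three length functions, the sketch below counts letters by type. It assumes, purely for the example, that each letter of a word is stored as a pair (kind, name) with kind one of "Y", "theta" or "q"; exponents do not affect the counts and are omitted.

```python
from collections import Counter

def typed_lengths(word):
    """Return the Y-length, theta-length and q-length of a word.

    Hypothetical encoding: a letter is a pair (kind, name),
    where kind is "Y", "theta" or "q".
    """
    counts = Counter(kind for kind, _name in word)
    return {kind: counts.get(kind, 0) for kind in ("Y", "theta", "q")}

word = [("q", "q1"), ("Y", "a"), ("Y", "a"), ("theta", "t"), ("q", "q2")]
lengths = typed_lengths(word)
```

For this word the Y-length is 2, the θ-length is 1 and the q-length is 2, so the combinatorial length 5 is their sum.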
The cells corresponding to relations (5.9) are called hubs, the cells corresponding to (θ, q)-relations are called (θ, q)-cells, and the cells are called (θ, a)-cells if they correspond to (θ, a)-relations.
A van Kampen diagram is reduced if it does not contain two cells (= closed 2-cells) that have a common edge e such that the boundary labels of these two cells are equal if one reads them starting with e (if such pairs of cells exist, they can be removed to obtain a diagram of smaller area and with the same boundary label).

5.2.1 The superscript shift of a van Kampen diagram over M or G
Remark 5.2. If one changes all superscripts of the generators of M or G by adding the same integer k: (i) → (i + k) (modulo L) in all letters having a superscript, then one obtains the relations again, as is clear from formulae (5.6 - 5.9). Therefore a similar change ∆ → ∆ (+k) of the edge labels transforms a (reduced) diagram ∆ to a (reduced) diagram ∆ (+k) . Let us call such a transformation a superscript shift (or k-shift) of ∆.
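On the level of words, the k-shift can be sketched as follows. The encoding is an assumption of the example: a letter is a pair (name, superscript), where the superscript is an integer in 1, ..., L or None for letters without a superscript.

```python
def shift_superscripts(word, k, L):
    """Apply the k-shift (i) -> (i + k) modulo L to all letters having a superscript.

    Hypothetical encoding: letters are pairs (name, sup) with sup in 1..L,
    or sup = None when the letter carries no superscript.
    """
    shifted = []
    for name, sup in word:
        if sup is not None:
            sup = (sup + k - 1) % L + 1   # keep the representative in 1..L
        shifted.append((name, sup))
    return shifted

word = [("q", 4), ("a", None), ("theta", 1)]
once = shift_superscripts(word, 1, L=4)
back = shift_superscripts(once, -1, L=4)
```

Shifting by k and then by −k, or shifting by L, returns the original word, which reflects that the shifts form a cyclic group of order L.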

Bands
To study (van Kampen) diagrams over the group G we shall use their simpler subdiagrams such as bands and trapezia, as in [15], [26], [1], etc. Here we repeat one more necessary definition.
Definition 5.3. Let Z be a subset of the set of letters in the set of generators of the group M . A Z-band B is a sequence of cells π 1 , ..., π n in a reduced van Kampen diagram ∆ such that • Every two consecutive cells π i and π i+1 in this sequence have a common boundary edge e i labeled by a letter from Z ±1 .
• Each cell π i , i = 1, ..., n has exactly two Z-edges in the boundary ∂π i , e −1 i−1 and e i (i.e., edges labeled by a letter from Z ±1 ) with the requirement that either both Lab(e i−1 ) and Lab(e i ) are positive letters or both are negative ones.
• If n = 0, then B is just a Z-edge.
The counter-clockwise boundary of the subdiagram formed by the cells π 1 , ..., π n of B has the factorization e −1 q 1 fq −1 2 where e = e 0 is a Z-edge of π 1 and f = e n is a Z-edge of π n . We call q 1 the bottom of B and q 2 the top of B, denoted bot(B) and top(B). The trimmed top/bottom labels are the maximal subwords of the top/bottom labels starting and ending with q-letters.
Top/bottom paths and their inverses are also called the sides of the band. The Z-edges e and f are called the start and end edges of the band. If n ≥ 1 but e = f , then the Z-band is called a Z-annulus.
If B is a Z-band with Z-edges e 1 , ..., e n (in that order), then we can form a broken line connecting midpoints of the consecutive edges e 1 , ..., e n and lying inside the union of the cells from B which will be called the median of B.
We will consider q-bands, where Z is one of the sets Q i of state letters for the S-machine M, θ-bands for every θ ∈ Θ, and Y -bands, where Z = {a, a (1) , . . . , a (L) } ⊆ Y . The convention is that Y -bands do not contain (θ, q)-cells, and so they consist of (θ, a)-cells only. (1) If the start and the end edges e and f have different labels, then B has (θ, q)-cells.
(2) For every (θ, q)-cell π i of B, one of its boundary q-edges belongs in q 1 and another one belongs in q 2 .
Proof. (1) If every cell π i of B is a (θ, a)-cell, then both θ-edges of the boundary ∂π i have equal labels, as it follows from the definition of (θ, a)-relations. Then the definition of band implies that Lab(e) = Lab(f ), a contradiction.
(2) Proving by contradiction, we assume that π i and π j (i ≠ j) share a boundary q-edge g. We may assume that the difference j − i > 0 is minimal, and so the subband formed by π i+1 , . . . , π j−1 has no (θ, q)-cells. It follows from (1) that π i and π j have the same boundary labels if one reads them starting with Lab(g), contrary to the assumption that the diagram is reduced.
Remark 5.5. To construct the top (or bottom) path of a band B, at the beginning one can just form a product x 1 . . . x n of the top paths x i -s of the cells π 1 , . . . , π n (where each π i is a Z-band of length 1). No θ-letter is being canceled in the word W ≡ Lab(x 1 ) . . . Lab(x n ) if B is a q- or Y -band since otherwise two neighbor cells of the band would make the diagram non-reduced. By Lemma 5.4 (2), there are no cancellations of q-letters of W if B is a θ-band.
If B is a θ-band then a few cancellations of Y -letters (but not q-letters) are possible in W. (This can happen if one of π i , π i+1 is a (θ, q)-cell and another one is a (θ, a)-cell.) We will always assume that the top/bottom label of a θ-band is a reduced form of the word W . This property is easy to achieve: by folding edges with the same labels having the same initial vertex, one can make the boundary label of a subdiagram in a van Kampen diagram reduced (e.g., see [13] or [26]).
We shall call a Z-band maximal if it is not contained in any other Z-band. Counting the number of maximal Z-bands in a diagram we will not distinguish the bands with boundaries e −1 q 1 fq −1 2 and fq −1 2 e −1 q 1 , and so every Z-edge belongs to a unique maximal Z-band.
We say that a Z 1 -band and a Z 2 -band cross if they have a common cell and Z 1 ∩Z 2 = ∅.
Sometimes we specify the types of bands as follows. A q-band corresponding to one of the letters Q of the base is called a Q-band. For example, we will consider ť-bands corresponding to the part {ť}.
Our previous papers (see [26], [1], etc.) contain the proof of the next lemma in a more general setting. The difference caused by different simulation of the S-machine M by defining relations of M does not affect the validity of the proof since the proof uses the properties mentioned in Lemma 5.4 and Remark 5.5. To convince the reader, below we recall the proof of one of the following claims.
Lemma 5.6. A reduced van Kampen diagram ∆ over M has no q-annuli, no θ-annuli, and no Y -annuli. Every θ-band of ∆ shares at most one cell with any q-band and with any Y -band.
Proof. We will prove only the property that a θ-band T and a q-band Q cannot cross each other two times. Taking a minimal counter-example, one assumes that these bands have exactly two common cells π and π', and ∆ has no cells outside the region bounded by T and Q. Then Q has exactly two cells since otherwise a maximal θ-band starting with a cell π'' of Q, where π'' ∉ {π, π'}, has to end on Q, bounding with a part of T a smaller counter-example.
Figure 2: A Q-band intersects a θ-band twice.
Thus, the boundaries of π and π' share a q-edge.
For a similar reason, T has no (θ, q)-cells except for π and π', and by Lemma 5.4 (1), these cells have the same pairs of θ-edges in the boundaries. This makes the diagram non-reduced, a contradiction.
If W ≡ x 1 ...x n is a word in an alphabet X, X' is another alphabet, and φ : X → X' ∪ {1} (where 1 is the empty word) is a map, then φ(W ) ≡ φ(x 1 )...φ(x n ) is called the projection of W onto X'. We shall consider the projections of words in the generators of M onto Θ (all θ-letters map to the corresponding element of Θ, all other letters map to 1), and the projection onto the alphabet {Q 0 , . . . , Q N −1 } (every q-letter maps to the corresponding Q i , all other letters map to 1).
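A projection in this sense is a letter-by-letter map in which some letters are sent to the empty word. The sketch below illustrates it; the encoding of letters as pairs (kind, name) and the helper `history_map` are assumptions made for the example, and `None` plays the role of the empty word 1.

```python
def project(word, phi):
    """Apply phi to every letter of the word and drop letters sent to the empty word."""
    image = (phi(x) for x in word)
    return [y for y in image if y is not None]

def history_map(letter):
    """Hypothetical map onto Theta: keep theta-letters, erase everything else."""
    kind, name = letter
    return name if kind == "theta" else None

side = [("theta", "t1"), ("Y", "a"), ("theta", "t2"), ("q", "q0"), ("theta", "t1")]
history = project(side, history_map)
```

Here `history` is the list of θ-names in their original order; no free reduction is performed, in agreement with the definition.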
Definition 5.7. The projection of the label of a side of a q-band onto the alphabet Θ is called the history of the band. The step history of this projection is the step history of the q-band. The projection of the label of a side of a θ-band onto the alphabet {Q 0 , ..., Q N −1 } is called the base of the band, i.e., the base of a θ-band is equal to the base of the label of its top or bottom. As in the case of words, we will use representatives of Q j -s in base words. If W is a word in the generators of M , then by W ∅ we denote the projection of this word onto the alphabet of the S-machine M; we obtain this projection after deleting all superscripts in the letters of W . In particular, W ∅ ≡ W , if there are no superscripts in the letters of W .
We call a word W in q-generators and Y -generators permissible if the word W ∅ is admissible, and the letters of any 2-letter subword of W have equal superscripts (if any), except for the subwords (qť) ±1 , where the letter q has some superscript (i) and q ∅ ∈ Q N −1 ; in this case the superscript of the letterť must be (i + 1) (modulo L). (1) The trimmed bottom and top labels W 1 and W 2 of any reduced θ-band T containing at least one (θ, q)-cell are permissible and W ∅ 2 ≡ W ∅ 1 · θ. (2) If W is a θ-admissible word, then for a permissible word W 1 such that W ∅ 1 ≡ W (given by Remark 5.8) one can construct a reduced θ-band with the trimmed bottom label W 1 and the trimmed top label Proof.
(1) By Lemma 5.4 (2), we have are the labels of q-edges of some cells π(j) and π(j + 1) such that the subband connecting these cells has no (θ, q)-cells. Therefore by Lemma 5.4 (1), all the θ-edges between π(j) and π(j + 1) have the same labels. It follows from the list of (θ, a)-relations that all Y -letters of the word u j have to belong to the same subalphabet. In particular, if we have the subword q j u j q j+1 , then the projection of this subword is a subword of W ∅ 1 satisfying the first condition from the definition of admissible word. Similarly one obtains the other conditions if q j and/or q j+1 occur in W 1 with exponent −1. Hence the words W ∅ 1 and W ∅ 2 are admissible, and the words W 1 , W 2 are permissible since again the condition on 2-letter subwords follows from Lemma 5.4 and the relations (5.6)-(5.8).
If x = x 1 . . . x n ( y = y 1 . . . y n ) is the product of the top paths x i -s (bottom paths y i -s) of all the cells π 1 , . . . , π n of T , as in Remark 5.5, then the transition from the trimmed label of x to the trimmed label of y with erased superscripts is the application of θ, as follows from relations (5.6)-(5.8). Since by definition, the application of θ automatically implies possible cancellations, we have W ∅ 2 ≡ W ∅ 1 · θ for the reduced words W 1 and W 2 , as required.
Since W is θ-admissible, there is an equality W ≡ W · θ. Therefore we can simulate the application of θ to every letter of W as follows. We draw a path p = e 1 . . . e n labeled by W 1 and attach a cell π i corresponding to one of the defining relations of M to every edge e i of p from the left. Since the word W 1 is permissible, the θ-edges starting at the common vertex of π i and π i+1 must have equal labels, and so these two edges can be identified. Finally, we obtain a required θ-band. It is a reduced diagram since the permissible word W 1 is reduced.

Trapezia
Definition 5.10. Let ∆ be a reduced diagram over M , which has boundary path of the form p −1 1 q 1 p 2 q −1 2 , where p 1 and p 2 are sides of q-bands, and q 1 , q 2 are maximal parts of the sides of θ-bands such that Lab(q 1 ), Lab(q 2 ) start and end with q-letters.
Then ∆ is called a trapezium. The path q 1 is called the bottom, the path q 2 is called the top of the trapezium, the paths p 1 and p 2 are called the left and right sides of the trapezium. The history (step history) of the q-band whose side is p 2 is called the history (resp., step history) of the trapezium; the length of the history is called the height of the trapezium. The base of Lab(q 1 ) is called the base of the trapezium.
Remark 5.11. Notice that the top (bottom) side of a θ-band T does not necessarily coincide with the top (bottom) side q 2 (side q 1 ) of the corresponding trapezium of height 1, and q 2 (q 1 ) is obtained from top(T ) (resp. bot(T )) by trimming the first and the last Y -edges if these paths start and/or end with Y -edges. We shall denote the trimmed top
(1) Let ∆ be a trapezium with history H ≡ θ(1) . . . θ(d) (d ≥ 1). Assume that ∆ has consecutive maximal θ-bands T 1 , . . . T d , and the words U j and V j are the trimmed bottom and the trimmed top labels of T j , (j = 1, . . . , d). Then H is an eligible word, U j , V j are permissible words, Furthermore, if the first and the last q-letters of the word U j or of the word V j have some superscripts (i) and (i′), then the difference i′ − i (modulo L) does not depend on the choice of U j or V j .
(2) For every eligible computation U → · · · → U · H ≡ V of M with ||H|| = d ≥ 1 there exists a trapezium ∆ with bottom label U 1 (given by Remark 5.8) such that U ∅ 1 ≡ U , top label V d such that V ∅ d ≡ V , and with history H.
Proof. (1) The trimmed top side of one of the bands T j is the same as trimmed bottom side of T j+1 (j = 1, . . . , d − 1), and the equalities U 2 ≡ V 1 , . . . , U d ≡ V d−1 follow. The equalities V ∅ j ≡ U j (j = 1, . . . d) are given by Lemma 5.9 (1). By the same lemma the words U j and V j are permissible.
Assume that there is a cancellation: θ(i + 1) ≡ θ(i) −1 . Since ∆ is a reduced diagram, any pair of (θ, q)-cells π ∈ T i and π′ ∈ T i+1 with a common q-edge e are not cancellable. Hence the relations given by these cells are not uniquely defined by the q-letter Lab(e) and the history letter θ(i). It follows from the list of defining relations (5.6)-(5.8) that Lab(e) has no superscripts while other labels of the boundary edges of these two cells do have superscripts. Thus, these relations are in the list (5.7) and θ(i) ≡ θ(23), which proves that the history H is eligible.
Since by Lemma 5.6 every maximal q-band of ∆ connects the top and the bottom of ∆, it suffices to prove the last claim under the assumption that the base of ∆ is a word Q ±1 (Q′) ±1 of length 2. Then by the definition of permissible word, i′ − i = 0, except for the base Q N −1 Q N (or the inverse one) with i′ − i = 1 modulo L (resp., i′ − i = −1 modulo L). Since all the words U j and V j have equal bases, the last statement of (1) is proved.
(2) We can obtain the θ(1)-band T 1 by Lemma 5.9 (2). By induction, there is a trapezium ∆′ of height d − 1 with bottom label U 2 ≡ U 1 and top label V d such that U ∅ 2 ≡ U ∅ 1 · θ(1) and V ∅ d ≡ V , such that the union ∆ of T 1 and ∆′ has history H. If ∆ is not reduced then we have a pair of cancellable cells π ∈ T 1 and π′ ∈ T 2 . Then as in item (1) we conclude that θ(1) ≡ θ(23), and so the top q of T 1 has no superscript in the boundary label. Therefore one can replace ∆′ with its superscript shift (∆′) (+1) in ∆. After such a modification, ∆ becomes a reduced diagram since for any pair of cells π and π′ with a common boundary edge from q, the other edges now have different superscripts in their labels. Since V ∅ d does not change under the superscript shift, the lemma is proved.

Big and standard trapezia
Using Lemma 5.12, one can immediately derive properties of trapezia from the properties of computations obtained earlier.
If H′ ≡ θ(i) . . . θ(j) is a subword of the history H from Lemma 5.12 (1), then the bands T i , . . . , T j form a subtrapezium ∆′ of the trapezium ∆. This subtrapezium is uniquely defined by the subword H′ (more precisely, by the occurrence of H′ in the word θ(1) . . . θ(d)), and ∆′ is called the H′-part of ∆.
Definition 5.13. We say that a trapezium ∆ is standard if the base of ∆ is the standard base B of M or B −1 , and the history of ∆ (or the inverse one) contains one of the words (a) χ(i − 1, i)H χ(i, i + 1) (i.e., the S-machine works as Θ 4 ) or (b) ζ i−1,i H ζ i,i+1 (i.e., it works as Θ 2 ).
Definition 5.14. We say that a trapezium Γ is big if (1) the base of Γ or the inverse word has the form xvx, where xv is a cyclic permutation of the L-th power of the standard base; (2) the diagram Γ contains a standard trapezium.
Proof. The diagram ∆ is covered by L subtrapezia Γ i (i = 1, . . . , L) with bases xv i x.
Assume that the step history of ∆ (or the inverse step history) contains one of the subwords . Then by Lemma 4.4 (and 5.12), the base of ∆ has the form (xu) L x, where xu is a cyclic permutation of the standard base (or the inverse one). Since ∆ contains a standard subtrapezium, it is big. Now, under the assumption that the step history has no subwords mentioned in the previous paragraph, it suffices to bound the length of a side of every θ-band of arbitrary Γ i by ≤ c 4 (||V | Y + ||V ||), where V and V are the labels of the top and the bottom of Γ i .
Assume that the word xv i x has a proper subword yuy, where u has no letters y, and any other letter occurs in u at most once. Then the word yuy is faulty since v i has no letters x. By Lemma 4.8, we have |U j | Y ≤ c 1 max(|U 0 | Y , |U t | Y ) for every configuration U j of the computation given by Lemma 5.12 (1) restricted to the base yuy. Since c 4 > c 1 , it suffices to obtain the desired estimate for the computation whose base is obtained by deleting the subword yu from xv i x. Hence inducting on the length of the base of Γ i , one may assume that it has no proper subwords yuy, and so the base of Γ i is revolving. Now the required upper estimate for Γ i follows from Lemma 4.11 (see (1) and (2c) there).

A modified length function
Let us modify the length function on the group words in q-, Y - and θ-letters, and paths. The standard length of a word (a path) will be called its combinatorial length. From now on we use the word 'length' for the modified length.
Definition 6.1. We set the length of every q-letter equal to 1, and the length of every Y -letter equal to a small enough number δ so that
Jδ < 1. (6.10)
We also set to 1 the length of every word of length ≤ 2 which contains exactly one θ-letter and no q-letters (such words are called (θ, Y )-syllables). The length of a decomposition of an arbitrary word into a product of letters and (θ, Y )-syllables is the sum of the lengths of the factors.
The length |w| of a word w is the smallest length of such decompositions. The length |p| of a path in a diagram is the length of its label. The perimeter |∂∆| of a van Kampen diagram over G is similarly defined by cyclic decompositions of the boundary ∂∆.
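The minimum over decompositions in Definition 6.1 can be computed by a short dynamic program. The following is a sketch under assumed conventions: a word is given only by the kinds of its letters ('q', 'Y' or 'theta'), δ is passed as an exact fraction, and the function name `modified_length` is hypothetical.

```python
from fractions import Fraction


def modified_length(kinds, delta):
    """Smallest total length over all decompositions of a word into single
    letters and (theta, Y)-syllables, following Definition 6.1.

    kinds -- sequence of letter kinds, each 'q', 'Y' or 'theta'
    delta -- length of a single Y-letter (a Fraction)
    """
    letter_cost = {"q": Fraction(1), "Y": delta, "theta": Fraction(1)}
    # best[i] is the length of the prefix consisting of the first i letters.
    best = [Fraction(0)]
    for i in range(1, len(kinds) + 1):
        # Last factor is a single letter (a lone theta-letter is a syllable).
        candidate = best[i - 1] + letter_cost[kinds[i - 1]]
        # Last factor is a 2-letter syllable: one theta-letter and one Y-letter.
        if i >= 2 and {kinds[i - 2], kinds[i - 1]} == {"theta", "Y"}:
            candidate = min(candidate, best[i - 2] + 1)
        best.append(candidate)
    return best[-1]


delta = Fraction(1, 10)
example = modified_length(["Y", "theta", "Y"], delta)
```

For the word with letter kinds Y, θ, Y one Y-letter is absorbed into a syllable of length 1 and the other costs δ, so the length is 1 + δ.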
The next statement follows from the property of (θ, q)-relations and their cyclic permutations: the subword between two q-letters in arbitrary (θ, q)-relation is a syllable. This, in turn, follows from Property (*) of the S-machine M 2 (see Remark 3.8).

Rim bands
Let e −1 q 1 fq −1 2 be the standard factorization of the boundary of a θ-band. If the path (e −1 q 1 f ) ±1 or the path (fq −1 2 e −1 ) ±1 is a subpath of the boundary path of ∆ then the band is called a rim band of ∆.
From now on we shall fix a constant K such that
K > 2K 0 = 4LN. (6.12)
The following basic facts will allow us to remove short enough rim bands from van Kampen diagrams (see Lemma 6.17 below).
Lemma 6.3. Let ∆ be a van Kampen diagram whose rim θ-band T has base with at most K letters. Denote by ∆′ the subdiagram ∆\T . Then |∂∆| − |∂∆′| > 1.
Proof. Let s = top(T ) and s ⊂ ∂∆. Note that the difference between the number of Y -edges in s′ = bot(T ) and the number of Y -edges in s cannot be greater than 2K, because every (θ, q)-relator has at most two Y -letters by Property (*) and the commutativity relations do not increase the number of Y -letters. Hence |s′| − |s| ≤ 4LN δ. However, ∆′ is obtained by cutting off T along s′, and its boundary contains two θ-edges fewer than ∆. Hence we have |s 0 | − |s′ 0 | ≥ 2 − 2δ for the complements s 0 and s′ 0 of s and s′, respectively, in the boundaries ∂∆ and ∂∆′. Finally, 3), (6.11) and the highest parameter principle .
Proof. The hub base includes every base letter L times. Hence every word in this group alphabet of length ≥ K 0 + 1 includes one of the letters L + 1 times.
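The counting step in the last proof is a pigeonhole argument. As a sketch, assume that the group alphabet of base letters consists of the N letters Q 0 , . . . , Q N −1 and their inverses (2N letters in total, an assumption about the alphabet meant here), and that K 0 = 2LN as in (6.12). Then, for a word w in this alphabet:

```latex
\[
  \text{if every letter occurs at most } L \text{ times in } w,
  \text{ then } \|w\| \le 2N \cdot L = K_0 ,
\]
\[
  \text{hence } \|w\| \ge K_0 + 1 \ \Longrightarrow\
  \text{some letter occurs at least } L + 1 \text{ times in } w .
\]
```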

Combs
Definition 6.6. We say that a reduced diagram Γ is a comb if it has a maximal q-band Q (the handle of the comb), such that (C 1 ) bot(Q) is a part of ∂Γ, and every maximal θ-band of Γ ends at a cell in Q.
If in addition the following properties hold: (C 2 ) one of the maximal θ-bands T in Γ has a tight base (if one reads the base towards the handle) and
Notice that every trapezium is a comb.
Figure 4). We can assume that bot(T 1 ) and top(T l ) are contained in ∂Γ. Denote by ν a = |∂Γ| Y the number of Y -edges in the boundary of Γ, and by ν′ a the number of Y -edges on bot(T 1 ). Then ν a + 2lb ≥ 2ν′ a , and the area of Γ does not exceed c 0 bl 2 + 2ν a l for some constant c 0 = c 0 (M). (Recall that c 0 is one of the parameters from Section 2.3.)
Figure 4: A comb
Definition 6.8. We say that a subdiagram Γ of a diagram ∆ is a subcomb of ∆ if Γ is a comb, the handle of Γ divides ∆ in two parts, and Γ is one of these parts.
Lemma 6.9. [Compare with Lemma 4.9 of [20]] Let ∆ be a reduced diagram over G with non-zero area, where every rim θ-band has base of length at least K. Assume that (1) ∆ is a diagram over the group M or (2) ∆ has a subcomb of basic width at least K 0 . Then there exists a maximal q-band Q dividing ∆ in two parts, where one of the parts is a tight subcomb with handle Q.
Proof. Let T 0 be a rim band of ∆ ( fig.5). Its base w is of length at least K, and therefore w has disjoint prefix and suffix of lengths K 0 since K > 2K 0 by (6.12). The prefix of this base word must have its own tight subprefix w 1 , by Lemma 6.5 and the definition of tight words. A q-edge of T 0 corresponding to the last q-letter of w 1 is the start edge of a maximal q-band Q which bounds a subdiagram Γ containing a band T (a subband of T 0 ) satisfying property (C 2 ). It is useful to note that a minimal suffix w 2 of w, such that w −1 2 is tight, allows us to construct another band Q and a subdiagram Γ which satisfies (C 2 ) and has no cells in common with Γ .
Thus, there are Q and Γ satisfying (C 2 ). Let us choose such a pair with minimal Area (Γ). Assume that there is a θ-band in Γ which does not cross Q. Then there must exist a rim θ-band T 1 which does not cross Q in Γ. Hence one can apply the construction from the previous paragraph to T 1 and construct two bands Q 1 and Q 2 and two disjoint subdiagrams Γ 1 and Γ 2 satisfying the requirement (C 2 ) for Γ. Since Γ 1 and Γ 2 are disjoint, one of them, say Γ 1 , is inside Γ. But the area of Γ 1 is smaller than the area of Γ, and we come to a contradiction. Hence Γ is a comb and condition (C 1 ) is satisfied.
Assume that the base of a maximal θ-band T of Γ has a tight proper prefix (we may assume that T terminates on Q), and again one obtains a q-band Q in Γ, which provides us with a smaller subdiagram Γ of ∆, satisfying (C 2 ), a contradiction. Hence Γ satisfies property (C 3 ) as well.
(2) The proof is shorter since a comb is given in the very beginning.
We will also need the definition of a derivative subcomb from [16]. Definition 6.10. If Γ is a comb with handle C and B is another maximal q-band in Γ, then B cuts up Γ in two parts, where the part that does not contain C is a comb Γ 0 with handle B. It follows from the definition of comb, that every maximal θ-band of Γ crossing B connects B with C. If B and C can be connected by a θ-band containing no (θ; q)-cells, then Γ 0 is called the derivative subcomb of Γ. Note that no maximal θ-band of Γ can cross the handles of two derivative subcombs.

The mixture
We will need a numerical parameter associated with van Kampen diagrams, introduced in [16], where it was called the mixture.
Let O be a circle with two-colored (black and white) finite set of points (or vertices) on it. We call O a necklace with black and white beads on it.
Assume that there are n white beads and n′ black ones on O. We define sets P j of ordered pairs of distinct white beads as follows. A pair (o 1 , o 2 ) (o 1 ≠ o 2 ) belongs to the set P j if the simple arc of O drawn from o 1 to o 2 in the clockwise direction has at least j black beads. We denote by µ J (O) the sum ∑_{j=1}^{J} card(P j ) (the J-mixture of O). Below similar sets for another necklace O′ are denoted by P′ j . In this subsection, J ≥ 1, but later on it will be a fixed large enough number J from the list (2.3).
The Dehn function of the group M is super-quadratic (in fact by [20] it is at least n 2 log n because M is a multiple HNN extension of a free group and has undecidable conjugacy problem). However we are going to obtain a quadratic Dehn function of G, and first we want to bound the areas of the words vanishing in M with respect to the presentation of G. For this goal we artificially introduce the concept of G-area, as in [17]. The G-area of a big trapezium can be much smaller than its real area in M . This concept will be justified at the end of this paper, where some big trapezia are replaced by diagrams with hubs whose areas do not exceed the G-area of the trapezia.
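The J-mixture of a necklace, as defined earlier in this subsection, can be computed directly from the definition. In the sketch below the necklace is assumed to be given as a list of bead colours in clockwise order ('w' for white, 'b' for black), and the function name `mixture` is hypothetical. It uses the observation that an ordered pair of white beads with k black beads on its clockwise arc belongs to exactly min(J, k) of the sets P 1 , . . . , P J .

```python
def mixture(beads, J):
    """J-mixture of a necklace given as a clockwise list of 'w'/'b' beads.

    An ordered pair of distinct white beads whose clockwise arc carries k
    black beads lies in P_1, ..., P_min(J, k), so it contributes min(J, k).
    """
    n = len(beads)
    total = 0
    for start in range(n):
        if beads[start] != "w":
            continue
        blacks = 0
        position = (start + 1) % n
        # Walk once around the circle clockwise, starting after `start`.
        while position != start:
            if beads[position] == "b":
                blacks += 1
            else:
                total += min(J, blacks)
            position = (position + 1) % n
    return total


alternating = mixture(["w", "b", "w", "b"], 2)
clustered = mixture(["w", "w", "b", "b"], 2)
```

In the alternating necklace each of the two ordered pairs has one black bead on its arc; in the clustered one, one ordered pair has no black beads on its arc and the other has two.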
Definition 6.13. The G-area Area G (Γ) of a big trapezium Γ is, by definition, the minimum of the half of its area (i.e., the number of cells) and the product where h is the height of Γ and c 5 is one of the parameters from (2.3).
To define the G-area of a diagram ∆ over M , we consider a family S of big subtrapezia (i.e., subdiagrams, which are big trapezia) and single cells of ∆ such that every cell of ∆ belongs to a member Σ of this family, and if a cell Π belongs to different Σ 1 and Σ 2 from S, then both Σ 1 and Σ 2 are big subtrapezia of ∆ with bases xv 1 x, xv 2 x, and Π is a (θ, x)-cell. (In the latter case, the intersection Σ 1 ∩ Σ 2 must be an x-band.) There is such a family 'covering' ∆, e.g., just the family of all cells of ∆.
The G-area of S is the sum of G-areas of all big trapezia from S plus the number of single cells from S (i.e., the G-area of a cell Π is Area G (Π) = 1). Finally, the G-area Area G (∆) is the minimum of the G-areas of all "coverings" S as above.
It follows from Definition 6.13 that Area G (∆) ≤ Area(∆) since the G-area of a big trapezium does not exceed a half of its area and no cell belongs to three big trapezia of a covering.
Lemma 6.14. Let ∆ be a reduced diagram, and every cell π of ∆ belongs to one of subdiagrams ∆ 1 , . . . , ∆ m , where any intersection ∆ i ∩ ∆ j either has no cells or it is a q-band. Then
Proof. Consider the families S 1 , . . . , S m given by the definition of G-areas for the diagrams ∆ 1 , . . . , ∆ m . Then the family S = S 1 ∪ · · · ∪ S m 'covers' the entire ∆ according to the above definition. This implies the required inequality for G-areas,

Combs of a potential counterexample
In this section we show that for some constants N 1 , N 2 the G-area of any reduced diagram ∆ over M with perimeter n does not exceed N 2 n 2 + N 1 µ(∆).
Using the quadratic upper bound for µ(∆) from Lemma 6.11 (a), one then deduces that the G-area is bounded by N n 2 for some constant N .
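The deduction in the previous sentence is a one-line computation. As a sketch, write the quadratic bound for the mixture as µ(∆) ≤ c n 2 , where c is a hypothetical name for the constant provided by Lemma 6.11 (a); it is not notation from the text.

```latex
\[
  \mathrm{Area}_G(\Delta) \;\le\; N_2 n^2 + N_1 \mu(\Delta)
  \;\le\; N_2 n^2 + N_1 c\, n^2 \;=\; (N_2 + c N_1)\, n^2 ,
\]
```

so under this assumption one may take N = N 2 + cN 1 .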
Roughly speaking, we are doing the following. We use induction on the perimeter of the diagram. First we remove rim θ-bands (those with one side and both ends on the boundary of the diagram) with short bases. This operation decreases the perimeter and preserves the sign of so we can assume that the diagram does not have rim θ-bands. Then we use Lemma 6.9 and find a tight comb inside the diagram with a handle C. We also find a long enough q-band C′ that is close to C. We use a surgery which amounts to removing a part of the diagram between C′ and C and then gluing the two remaining parts of ∆ together. The main difficulty is to show that, as a result of this surgery, the perimeter decreases and the mixture changes in such a way that the expression does not change its sign.
In the proof, we need to consider several cases depending on the shape of the subdiagram between C′ and C. Note that neither N 2 n 2 nor N 1 µ(∆) nor Area G (∆) alone behaves in the appropriate way as a result of the surgery, but the expression behaves as needed.
Arguing by contradiction in the remaining part of this section, we consider a counterexample ∆ with minimal perimeter n, so that
Of course, the G-area of ∆ is positive, and, by Lemma 5.6, we have at least 2 θ-edges on the boundary ∂∆, so n ≥ 2.
Lemma 6.15. (1) The diagram ∆ has no two disjoint subcombs Γ 1 and Γ 2 of basic widths at most K with handles B 1 and B 2 such that some ends of these handles are connected by a subpath x of the boundary path of ∆ with |x| q ≤ N .
(2) The boundary of every subcomb Γ with basic width s ≤ K has 2s q-edges.
Proof. We will prove Statements (1) and (2) simultaneously. We use induction on A = Area(Γ 1 ) + Area(Γ 2 ) for Statement (1) and induction on A = Area(Γ) for Statement (2). Suppose that our diagram ∆ is also a counterexample for Statement (1) or (2) with minimal possible A. Suppose that ∆ is a counterexample to (1). Since the area of Γ i (i = 1, 2) is smaller than A, we may use Statement (2) for Γ i , and so we have at most 2K q-edges in ∂Γ i .
Let h 1 and h 2 be the lengths of the handles B 1 and B 2 of Γ 1 and Γ 2 , resp. Without loss of generality, we assume that h 1 ≤ h 2 . Denote by y i z i the boundaries of Γ i (i = 1, 2), where z i is the part of ∂∆ and y i is the side of the handle of Γ i (so y 1 xy 2 is the part of the boundary path of ∆, see Figure 6 (1)). Then each of the θ-edges e of y 1 is separated in ∂∆ from every θ-edge f of y 2 by less than 4K + N < J q-edges. Hence every such pair (e, f ) (or the pair of white beads on these edges) makes a contribution to µ(∆).
Let ∆′ be the diagram obtained by deleting the subdiagram Γ 1 from ∆. When passing from ∂∆ to ∂∆′, one replaces the θ-edges (black beads) from z 1 by the θ-edge of y 1 (black bead) belonging to the same maximal θ-band. The same is true for white beads.
But each of the h 1 h 2 pairs in the corresponding set P′ of white beads is separated in ∂∆′ by a smaller number of black beads than for the pair defined by ∆. Indeed, since the handle of Γ 1 is removed when one replaces ∂∆ by ∂∆′, two black beads at the ends of this handle are removed, and therefore µ(∆) − µ(∆′) ≥ h 1 h 2 (6.14) by Lemma 6.11 (d).
Let ν a be the number of Y -edges in ∂Γ 1 . It follows from Lemma 6.7 that the area, and so the G-area of Γ 1 , does not exceed J(h 1 ) 2 + 2ν a h 1 since J > c 0 K.
(2) If there are at least two derivative subcombs of Γ, then one can find two of them satisfying the assumptions of Statement (1).
Indeed, the derivative subcombs of Γ are ordered linearly in a natural way (as they are connected with the handle of Γ by θ-bands). Consider two neighboring derivative subcombs Γ 1 , Γ 2 . The handles of Γ i are intersected by two collections of θ-bands C 1 , C 2 which connect these handles with the handle of Γ (by Definition 6.10). The maximal θ-bands that intersect the handle of Γ and are between the two collections C 1 , C 2 do not intersect any derivative combs, hence they do not intersect q-bands except for the handle of Γ. Therefore the handles of Γ 1 and Γ 2 are connected by a subpath x of ∂∆ with no q-edges, so |x| q = 0 < N .
We deduce that Area(Γ 1 ) + Area(Γ 2 ) < Area(Γ) = A, a contradiction. Therefore there is at most one derivative subcomb Γ′ in Γ (Figure 6 (2)). In turn, Γ′ has at most one derivative subcomb Γ″, and so on. It follows that there are no maximal q-bands in Γ except for the handles of Γ′, Γ″, . . . . Since the basic width of Γ is s, we have s maximal q-bands in Γ, and the lemma is proved.
Lemma 6.16. There is no pair of subcombs Γ and Γ′ in ∆ with handles X and X′ of lengths l and l′ such that Γ′ is a subcomb of Γ, the basic width of Γ does not exceed K 0 and l′ ≤ l/2.
Proof. Proving by contradiction, one can choose Γ so that is minimal for all subcombs in Γ and so Γ has no proper subcombs, i.e. its basic width is 1 (fig. 7). It follows from Lemma 6.7 that for ν Y = |Γ | Y , we have Let ∆ be the diagram obtained after removing the subdiagram Γ from ∆. The following inequality is the analog of (6.15) (where h 1 is replaced by ) |∂∆| − |∂∆ | ≥ γ = max(2, δ(ν a − 2l )) (6.21) The q-band X contains a subband C of length l . Moreover one can choose C so that all maximal θ-bands of Γ crossing the handle X of Γ , start from C. These θ-bands form a comb Γ contained in Γ, and in turn, Γ contains Γ . The two parts of the complement X \C are the handles of two subcombs E 1 and E 2 formed by maximal θ-bands of Γ, which do not cross X . Let the lengths of these two handles be l 1 and l 2 , respectively, and so we have l 1 + l 2 = l − l > l . (E 1 or E 2 can be empty; then l 1 or l 2 equals 0.) It will be convenient to assume that Γ is drawn from the left of the vertical handle X . Denote by yz the boundary path of Γ, where y is the right side of the band X . Thus, there are l 1 (resp., l 2 ) θ-edges on the common subpath x 1 (subpath x 2 ) of z and ∂E 1 (and ∂E 2 ). By Lemma 6.15 (2), the path z contains at most 2K 0 q-edges, because the basic width of Γ is at most K 0 .
Consider the factorization z = x 2 xx 1 , where x is a subpath of ∂Γ . It follows that between every white bead on x 1 (i.e. the middle point of the θ-edges on x 1 ) and a white bead on x we have at most 2K 0 black beads (i.e. the middle points of the qedges of the path x). Since J is greater than 2K 0 , every pair of white beads, where one bead belongs in x and another one belongs in x 1 (or, similarly, in x 2 ) contributes 1 to µ(∆). Let P denote the set of such pairs. By the definition of E 1 and E 2 , we have card(P ) = l (l 1 + l 2 ) = l (l − l ) > (l ) 2 .
When passing from ∂∆ to ∂∆ , one replaces the left-most θ-edges of every maximal θ-band from Γ with the right-most θ-edges lying on the right side of X . The same is true for white beads. But each of the l (l − l ) pairs in the corresponding set P of white beads is separated in ∂∆ by a smaller number of black beads since the q-band X is removed. Therefore every pair from P contributes 1 less to the mixture, as follows from the definition of mixture. Hence µ(∆) − µ(∆ ) ≥ l (l − l ) ≥ (l ) 2 . This inequality and inequality (6.21) imply that because the perimeter of ∆ is less than the perimeter of the minimal counter-example ∆. Adding the estimate of G-area of Γ (6.20) we see that This will contradict the fact that ∆ is a counterexample of (6.13) when we prove that − N 2 γn − N 1 (l ) 2 + c 0 (l ) 2 + 2ν a l < 0, (6.22) Consider two cases. (a) Let ν a ≤ 4l . Then inequality (6.22) follows from the inequalities γ ≥ 2 and N 1 ≥ c 0 + 8.
Lemma 6.17. ∆ has no rim θ-band whose base has s ≤ K letters.
Since top(T ) lies on ∂∆, we have from the definition of the length that the number of Y -edges in top(T ) is less than δ −1 (n − s). By Lemma 6.2, the length of T is at most 3s + δ −1 (n − s) < δ −1 n. Thus, by applying the inductive hypothesis to ∆ , we have that G-area of ∆ is not greater than N 2 (n − 1) 2 + N 1 µ(∆) + δ −1 n because µ(∆ ) ≤ µ(∆) by Lemma 6.11 (b). But the first term of this sum does not exceed N 2 n 2 − N 2 n and so the entire sum is bounded by N 2 n 2 + N 1 µ(∆) provided
This contradicts the choice of ∆, and the lemma is proved.
Figure 8: Rim θ-band
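The arithmetic behind the last estimate in the proof of Lemma 6.17 can be spelled out as follows. This is only a sketch: the inequality N 2 ≥ δ −1 used in the second line is a sufficient condition stated here as an assumption, and it is not claimed to be the condition on the parameters used in the original argument.

```latex
\[
  N_2 (n-1)^2 \;=\; N_2 n^2 - 2 N_2 n + N_2 \;\le\; N_2 n^2 - N_2 n
  \qquad (n \ge 1),
\]
\[
  N_2 (n-1)^2 + N_1 \mu(\Delta) + \delta^{-1} n
  \;\le\; N_2 n^2 + N_1 \mu(\Delta) - \bigl(N_2 - \delta^{-1}\bigr) n
  \;\le\; N_2 n^2 + N_1 \mu(\Delta)
  \qquad \text{if } N_2 \ge \delta^{-1}.
\]
```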

The quadratic estimate
The next lemma is one of the main ingredients in this section.
Proof. We continue studying the hypothetical counter-example ∆ of minimal possible perimeter. By Lemma 6.17, now we can apply Lemma 6.9 (1). By that lemma, there exists a tight subcomb Γ ⊂ ∆. Let T be a θ-band of Γ with a tight base.
The basic width of Γ is less than K 0 by Lemma 6.5. Since the base of Γ is tight, it is equal to uxvx for some x, where the last occurrence of x corresponds to the handle Q of Γ, the word u does not contain x, and v has exactly L − 1 occurrences of x. Let Q′ be the maximal x-band of Γ crossing T at the cell corresponding to the first occurrence of x in uxvx ( fig. 9 (a)).
We consider the smallest subdiagram Γ′ of ∆ containing all the θ-bands of Γ crossing the x-band Q′. It is a comb with handle Q 2 ⊂ Q. The comb Γ′ is covered by a trapezium Γ 2 placed between Q′ and Q, and a comb Γ 1 with handle Q′. The band Q′ belongs to both Γ 1 and Γ 2 . The remaining part of Γ is a disjoint union of two combs Γ 3 and Γ 4 whose handles Q 3 and Q 4 contain the cells of Q that do not belong to the trapezium Γ 2 . The handle of Γ is the composition of handles Q 3 , Q 2 , Q 4 of Γ 3 , Γ′ and Γ 4 in that order.

3) give inequalities
where A i is the G-area of Γ i . (We take into account that G-area cannot exceed area.) Let p 3 , p 4 be the top and the bottom of the trapezium Γ 2 . Here p −1 3 (resp. p −1 4 ) shares some initial edges with ∂Γ 3 (with ∂Γ 4 ), the rest of these paths belong to the boundary of ∆. We denote by d 3 the number of Y -edges of p 3 and by d′ 3 the number of the Y -edges of p 3 which do not belong to Γ 3 . Similarly, we introduce d 4 and d′ 4 .
Let A 2 be the G-area of Γ 2 . Then by Lemma 5.15 and the definition of the G-area for big trapezia (if Γ 2 is big), we have because the basic width of Γ 2 is less than K and J > 2Kc 5 by (2.3).
Recall that the first and the last base letters of the base of the trapezium Γ 2 are equal to x. So for every maximal θ-band T , the first and the last (θ, q)-cells have equal boundary labels up to some superscript shift +k (if there are superscripts in these labels). However k does not depend on the choice of T by the last statement of Lemma 5.12 (1). Therefore the whole (Q′) (+k) is a copy of Q 2 , and so there is a superscript shift Γ (+k) 1 of the entire comb Γ 1 such that the handle (Q′) (+k) of Γ (+k) 1 is a copy of Q 2 . This makes the following surgery possible. The diagram ∆ is covered by two subdiagrams: Γ and another subdiagram ∆ 1 , having only the band Q 2 in common. We construct a new auxiliary diagram by attaching Γ (+k) 1 to ∆ 1 ∪ Q with identification of the band (Q′) (+k) of Γ (+k) 1 and the band Q 2 . We denote the constructed diagram by ∆ 0 .
Note that Area G (Γ (+k) 1 ) = Area G (Γ 1 ) and ∆ 0 is a reduced diagram because every pair of its cells having a common edge has a copy either in Γ 1 or in ∆ 1 ∪ Q. Now we need the following claim.
Lemma 6.19. The G-area A 0 of ∆ 0 is at least the sum of the G-areas of Γ 1 and ∆ 1 minus l .
Proof. Consider a minimal covering S of ∆ 0 from Definition 6.13 of G-area, and assume that there is a big trapezium E ∈ S, such that neither Γ (+k) 1 nor ∆ 1 contains it. Then E has a base ywy, where (yw) ±1 is a cyclic permutation of the L-th power of the standard base, and the first y-band of E is in Γ (+k) 1 , but it is not a subband of Q′. Since the history H of the big trapezium E is a subhistory of the history of Γ 2 , and H uniquely determines the base starting with a given letter by Lemma 4.4, we conclude that Γ 2 is a big trapezium itself, and therefore (xv) ±1 is an L-th power of the standard base. Since the first y occurs in uxvx before the first x, it follows that we have the (L + 1)-th occurrence of y before the last occurrence of x in the word uxvx. But this contradicts the definition of the tight comb Γ.
Hence every big trapezium from S entirely belongs either to Γ (+k) 1 or to ∆ 1 . Therefore one can obtain 'coverings' S′ and S″ of these two diagrams if (1) every Σ from S is assigned either to S′ or to S″ and then (2) one adds at most l single cells since the common band Q in ∆ 0 should be covered twice in disjoint diagrams Γ (+k) 1 and ∆ 1 . This construction completes the proof of the lemma.
Let us continue the proof of Lemma 6.18. By Lemma 6.14, the G-area of ∆ does not exceed the sum of G-areas of the five subdiagrams Γ 1 , Γ 2 , Γ 3 , Γ 4 and ∆ 1 . But the direct estimate of each of these values is not efficient. Therefore we will use Lemma 6.19 to bound the G-area of the auxiliary diagram ∆ 0 built of two pieces Γ 1 and ∆ 1 .
It follows from our constructions and lemmas 6.14, 6.19, that Let p 3 be the segment of the boundary ∂Γ 3 that joins Q and Γ 2 along the boundary of ∆ ( fig. 9 (b)). It follows from the definition of d 3 , d 3 , l 3 and ν 3 , that the number of Y -edges lying on p 3 is at least Let u 3 be the part of ∂∆ that contains p 3 and connects Q with Q . It has l 3 θ-edges. Hence we have, by Lemma 6.2, that Since u 3 includes a subpath of length d 3 having no θ-edges, we also have by Lemma 6.2 (c) that |u 3 | ≥ l 3 + δ(d 3 − 1).
One can similarly define p 4 and u 4 for Γ 4 . When passing from ∂∆ to ∂∆ 0 we replace the end edges of Q , u 3 and u 4 by two subpaths of ∂Q having lengths l 3 and l 4 . Let n 0 = |∂∆ 0 |. Then it follows from the previous paragraph that In particular, n 0 ≤ n − 2. By the inductive hypothesis, We note that the mixture µ(∆ 0 ) of ∆ 0 is not greater than µ(∆) − l (l − l ) . Indeed, by Lemma 6.16 (2), one can use the same trick as in Lemma 6.16 as follows. For every pair of white beads, where one bead corresponds to a θ-band of Γ 2 and another one to a θ-band of Γ 3 or Γ 4 , the contribution of this pair to µ(∆ 0 ) is less than the contribution to ∆. It remains to count the number of such pairs: l (l 3 + l 4 ) = l(l − l ).
Minimal diagrams over G

Diagrams with hubs
Given a reduced diagram ∆ over the group G, the maximal q-bands start and end either on the boundary ∂∆ or on the boundaries of hubs. Therefore one can construct a planar graph whose vertices are the hubs of this diagram plus one improper vertex outside ∆, and the edges are the maximal ť-bands of ∆.

Eliminating pairs of hubs connected by two ť-bands
Let us consider two hubs Π 1 and Π 2 in a reduced diagram, connected by two neighboring ť-bands C and C', where there are no other hubs between these ť-bands. By Lemma 5.6, these bands, together with parts of ∂Π 1 and ∂Π 2 , bound either a subdiagram having no cells or a trapezium Ψ of height ≥ 1 (fig. 10). The former case is impossible. Indeed, in this case the hubs have to correspond to the same hub relation, since the relations (5.9) have no common letters. Hence the diagram is not reduced, since a cyclic permutation of a hub relation starting with a fixed copy of the letter ť is unique.
We want to show that the latter case is not possible either if the diagram ∆ is chosen with minimal number of hubs among the diagrams with the same boundary label.
Indeed, by Lemma 5.9 (1), the ť-band C' is a k-shift of C. In fact, k = ±1 since the superscripts of the letters in W L st change by one after every ť-letter. One may assume that k = 1. So if we construct a 1-shift Ψ 2 of Ψ 1 = Ψ, then the first maximal ť-band of Ψ 2 is a copy of C' (the second ť-band in Ψ 1 ). Similarly one can construct Ψ 3 , . . . , Ψ L = Ψ (+L) 1 . Let us separately construct an auxiliary diagram ∆ 1 by consecutively attaching the bottoms of Ψ 1 , Ψ 2 , . . . , Ψ L to Π 1 and identifying the second ť-band of Ψ i with the first ť-band of Ψ i+1 (indices modulo L). This is possible since the L-shift of any diagram is equal to itself. Now we can attach Π 2 to the tops of the Ψ i in ∆ 1 and obtain a spherical diagram ∆ 2 . The diagram ∆ 2 contains a copy of the subdiagram Γ of ∆ formed by Π 1 , Π 2 and Ψ. Hence the boundary label of Γ is equal to the boundary label of the complement Γ' of (the copy of) the subdiagram Γ in ∆ 2 . Thus, one can replace Γ with Γ' in ∆, decreasing the number of hubs.
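The bookkeeping behind this replacement can be written out as follows. This is only a sketch of the count implied by the construction above; the symbol h(·) for the number of hubs and the name Γ' for the complement of the copy of Γ in the spherical diagram ∆ 2 are notation introduced here for illustration, and the set-theoretic operations are meant informally.

```latex
% h(X) denotes the number of hubs in a diagram X (ad hoc notation).
% The trapezia \Psi_i contain no hubs.
\begin{align*}
  \Delta_2 &= \Pi_1 \cup \Psi_1 \cup \dots \cup \Psi_L \cup \Pi_2,
      & h(\Delta_2) &= 2,\\
  \Gamma   &= \Pi_1 \cup \Psi_1 \cup \Pi_2,
      & h(\Gamma) &= 2,\\
  \Gamma'  &= \Delta_2 \setminus \Gamma = \Psi_2 \cup \dots \cup \Psi_L,
      & h(\Gamma') &= 0 .
\end{align*}
% Replacing \Gamma by \Gamma' inside \Delta therefore removes two hubs:
\[
  h\bigl((\Delta \setminus \Gamma) \cup \Gamma'\bigr) = h(\Delta) - 2 < h(\Delta).
\]
```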

Disks
Definition 7.1. A permissible word V is called a disk word if V ∅ ≡ W L for some accessible word W . The cyclic permutations of W and W −1 are also disk words by definition.
Lemma 7.2. Every disk word V is equal to 1 in the group G.
Proof. Assume there is an eligible computation W st → · · · → W , where V ∅ ≡ W L . Then the computation W L st → · · · → W L with the same history is eligible too. By Lemma 5.12 (2), one can construct a trapezium ∆ with bottom label W st . Since V is the boundary label of the obtained disk diagram, we have V = 1 in G, and so V = 1, as required. If there is an eligible computation W → · · · → W ac , then the proof is similar with bottom label of ∆ equal to W L ac .
Remark 7.3. In fact, for the disk word W , we have built a van Kampen diagram using one hub and L trapezia corresponding to an accessible computation for W .
We will enlarge the set of relations of G by adding the (infinite) set of disk relations V for every disk word V . So we will consider diagrams with disks, where every disk cell (or just disk) is labeled by such a word V . (In particular, a hub is a disk.) If two disks are connected by two ť-bands and there are no other disks between these ť-bands, then one can reduce the number of disks in the diagram. To this end, it suffices to apply the trick exploited for a pair of hubs in Subsection 7.1.1.
Definition 7.4. We will call a reduced diagram ∆ minimal if (1) the number of disks is minimal for the diagrams with the same boundary label as ∆ and (2) ∆ has minimal number of (θ, t)-cells among the diagrams with the same boundary label and with minimal number of disks.
Clearly, a subdiagram of a minimal diagram is minimal itself.
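One way to read Definition 7.4 is as a lexicographic minimization. The following sketch uses the ad hoc notation d(∆) for the number of disks and t(∆) for the number of (θ, t)-cells, neither of which appears in the text, and indicates why minimality passes to subdiagrams. It is an informal outline, not the authors' argument.

```latex
% d(.) = number of disks, t(.) = number of (\theta,t)-cells (ad hoc notation).
% A reduced diagram \Delta is minimal iff
\[
  \bigl(d(\Delta), t(\Delta)\bigr) =
  \min\nolimits_{\mathrm{lex}}
  \bigl\{ \bigl(d(\Delta'), t(\Delta')\bigr) :
      \mathrm{Lab}(\partial\Delta') \equiv \mathrm{Lab}(\partial\Delta) \bigr\}.
\]
% If a subdiagram \Gamma of \Delta could be replaced by \Gamma' with the same
% boundary label and a lexicographically smaller pair, then, since both
% counts are additive over the pieces of \Delta,
\[
  \bigl(d(\Delta), t(\Delta)\bigr)
  - \bigl(d(\Gamma), t(\Gamma)\bigr)
  + \bigl(d(\Gamma'), t(\Gamma')\bigr)
  <_{\mathrm{lex}} \bigl(d(\Delta), t(\Delta)\bigr),
\]
% contradicting the minimality of \Delta.
```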
Thus, no two disks of a minimal diagram are connected by two ť-bands such that the subdiagram bounded by them contains no other disks. This property makes the disk graph of a reduced diagram hyperbolic, in a sense, if the degree L of every proper vertex (= disk) is high (L ≫ 1). Below we give a more precise formulation (proved for diagrams with such a disk graph, in particular, in [26], Lemma 11.4 and in [15], Lemma 3.2).
A maximal q-band starting on a disk of a diagram is called a spoke.

The band moving transformation
Recall the following band moving transformation for diagrams with disks, exploited earlier in [15], [26]. Assume there is a disk Π and a θ-band T subsequently crossing some spokes B 1 , . . . , B k which start (say, counter-clockwise) from Π. Assume that k ≥ 2 and there are no other cells between Π and the bottom of T , and so there is a subdiagram Γ formed by Π and T .
We describe the band moving transformation (see, e.g., [26]) as follows. By Lemma 5.9 (1), for some s, we have a word )( t s ) +(k−1)) ) written on the top of the subband T of T , that starts on B 1 and ends on B k . (There are no superscripts in V if V is θ-admissible word for a rule θ of Steps 3 -5.) The bottom q 2 of T is the subpath of the boundary path q 2 q 3 of Π ( fig. 12), its label is a part of a disk word, and so is V by Lemma 5.9.
Therefore one can construct a new disk Π' with boundary label and boundary s 1 s 2 , where Lab(s 1 ) ≡ V . Also one constructs an auxiliary band T'' with top label and attaches it to s −1 2 , which has the same label. Finally we replace the subband T' by T'' (and make cancellations in the new θ-band if any appear). The new diagram Γ' formed by Π' and the new θ-band has the same boundary label as Γ. In particular, if k > L − k, then the number of (θ, t)-cells in Γ' is less than the number of (θ, t)-cells in Γ. This observation implies
Lemma 7.7. Let ∆ be a minimal diagram.
(1) Assume that a θ-band T 0 crosses k ť-spokes B 1 , . . . , B k starting on a disk Π, and there are no disks in the subdiagram ∆ 0 , bounded by these spokes, by T 0 and by Π. Then k ≤ L/2.
(2) Assume that there are two disjoint θ-bands T and S whose bottoms are parts of the boundary of a disk Π and these bands correspond to the same rule θ (if their histories are read towards the disk) and θ ≠ θ(23). Suppose T crosses k ≥ 2 ť-spokes starting on ∂Π and S crosses ℓ ≥ 2 ť-spokes starting on ∂Π. Then k + ℓ ≤ L/2.
(3) ∆ contains no θ-annuli.
(4) A θ-band cannot cross a maximal q-band (in particular, a spoke) twice.
Proof. (1) Since every cell, except for disks, belongs to a maximal θ-band, it follows from Lemma 5.6 that there is a θ-band T such that T crosses all B 1 , . . . , B k and ∆ 0 has no cells between T and Π. If k > L/2, then by Remark 7.6, the band moving T around Π would decrease the number of (θ, t)-cells in ∆, a contradiction, since ∆ is a minimal diagram.
(2) As above, let us move the band T around Π. This operation removes k (θ, t)-cells but adds L − k new (θ, t)-cells in the new band T'. However, ℓ (θ, t)-cells of S and ℓ (θ, t)-cells of T' will form mirror pairs, because for θ ≠ θ(23), the boundary label of a (θ, q)-cell π, considered as a θ-band, is uniquely determined by the history θ and the label of the top q-edge of π. So after cancellations one will have at most L − k − 2ℓ new (θ, t)-cells. This number is less than k if k + ℓ > L/2, contrary to the minimality of the original diagram. Therefore k + ℓ ≤ L/2.
(3) Proving by contradiction, consider the subdiagram ∆ bounded by a θ-annulus. It has to contain disks by Lemma 5.6. Hence it must contain spokes B 1 , . . . , B L−3 introduced in Lemma 7.5. But this contradicts item (1) of the lemma since L − 3 > L/2.
(4) The argument of item (3) works if there is a subdiagram ∆ of ∆ bounded by a q-band and a θ-band.
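The counting in items (1) to (3) reduces to elementary inequalities. In the sketch below, ℓ denotes the number of ť-spokes crossed by S in item (2), and the last line records the condition on L that item (3) implicitly uses; both are our reading of the argument rather than statements quoted from the text.

```latex
% (1) Band moving removes k cells and creates L - k new ones:
\[
  L - k < k \iff k > L/2 .
\]
% (2) After cancelling \ell mirror pairs, at most L - k - 2\ell new cells remain:
\[
  L - k - 2\ell < k \iff k + \ell > L/2 .
\]
% (3) The L - 3 spokes of Lemma 7.5 contradict item (1) as soon as
\[
  L - 3 > L/2 \iff L > 6 .
\]
```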
The band moving will be used for removing disks from quasi-trapezia.

Quasi-trapezia
Definition 7.8. A quasi-trapezium is the same as a trapezium (Definition 5.10), but it may contain disks. (So a quasi-trapezium without disks is a trapezium.)
Lemma 7.9. Let a minimal diagram Γ be a quasi-trapezium with standard factorization of the boundary as p −1 1 q 1 p 2 q −1 2 . Then there is a diagram Γ' such that
(1) the boundary of Γ' is (p' 1 ) −1 q' 1 p' 2 (q' 2 ) −1 and Lab(q' j ) ≡ Lab(q j ) for j = 1, 2;
(2) the numbers of hubs and (θ, q)-cells in Γ' are the same as in Γ;
(3) the vertices (p' 1 ) − and (p' 2 ) − (the vertices (p' 1 ) + and (p' 2 ) + ) are connected by a simple path s 1 (by s 2 , resp.) such that we have three subdiagrams Γ 1 , Γ 2 , Γ 3 of Γ', where Γ 2 is a trapezium with standard factorization of the boundary (p' 1 ) −1 s 1 p' 2 s −1 2 and all cells of the subdiagrams Γ 1 and Γ 3 with boundaries q' 1 s −1 1 and s 2 (q' 2 ) −1 are disks;
(4) all maximal θ-bands of Γ and all maximal θ-bands of Γ 2 have the same number of (θ, t)-cells (equal for Γ and Γ 2 ).
Proof. Every maximal θ-band of Γ must connect an edge of p 1 with an edge of p 2 ; this follows from Lemma 7.7 (3). Hence we can enumerate these bands from bottom to top: T 1 , . . . , T h .
If Γ has a disk, then by Lemma 7.5, there is a disk Π such that at least L − 3 ť-spokes of it end on q 1 and q 2 , and there are no disks between the spokes ending on q 1 (and on q 2 ). By Lemma 7.7 (2), at least L − 3 − L/2 ≥ 2 of these spokes must end on q 1 (resp., on q 2 ).
If Π lies between T j and T j+1 , then the number of its ť-spokes crossing T j (crossing T j+1 ) is at least 2. So one can move each of the two θ-bands around Π. So we can move the disk toward q 1 (or toward q 2 ) until the disk is removed from the quasi-trapezium. (We use the property that if k ť-spokes B 1 , . . . , B k of Π end on q 1 , then after band moving toward q 1 , we again have k ť-spokes B 1 , . . . , B k of Π ending on q 1 ; see the notation of Remark 7.6.) No pair T j and T j+1 corresponds to two mutually inverse letters θθ −1 of the history if θ ≠ θ(23). This follows from Lemma 5.12 (1) if there are no disks between these θ-bands. If there is a disk, then this is impossible too by Lemma 7.7 (2), since one could choose a disk Π as in the previous paragraph. So the projection of the label of p 1 on the history is eligible.
Let us choose i such that the number m of (θ, t)-cells in T i is minimal. It follows that Γ has at least hm (θ, t)-cells.
If the disk Π lies above T i , we will move it upwards using the band moving transformation. So after a number of iterations all such (modified) disks will be placed above the θ-band number h and form the subdiagram Γ 1 . Similarly we can form Γ 3 moving other disks downwards.
In the resulting diagram Γ 2 lying between Γ 1 and Γ 3 , every θ-band is reduced by the definition of band moving. Neighboring maximal θ-bands of Γ 2 cannot be mirror copies of each other, since the labels of p 1 and p' 1 are equal and Lab(p 1 ) is a reduced word by Remark 5.5. It follows that the diagram Γ 2 (without disks) is a reduced diagram, and so it is a trapezium of height h.
The θ-band T i did not participate in the series of band moving transformations above. Therefore it is a maximal θ-band of Γ 2 . Hence the trapezium Γ 2 contains exactly mh (θ, t)-cells, which does not exceed the number of (θ, t)-cells in Γ. In fact these two numbers are equal since Γ is a minimal diagram. So every maximal θ-band of Γ and every maximal θ-band of Γ 2 has m (θ, t)-cells.
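The last step is a pigeonhole-type count. Writing m j for the number of (θ, t)-cells in the j-th maximal θ-band of Γ (notation introduced here, with m the minimum of the m j as in the proof), the argument can be summarized as follows; the sketch takes for granted the claim of the proof that Γ 2 has exactly mh such cells.

```latex
% m_j = number of (\theta,t)-cells in the j-th maximal \theta-band of \Gamma.
\begin{align*}
  \#\{(\theta,t)\text{-cells of } \Gamma\} &= \sum_{j=1}^{h} m_j \;\ge\; h m,\\
  \#\{(\theta,t)\text{-cells of } \Gamma_2\} &= h m .
\end{align*}
% Minimality of \Gamma rules out a strict inequality, so
\[
  \sum_{j=1}^{h} (m_j - m) = 0
  \quad\Longrightarrow\quad m_j = m \ \text{ for all } j = 1,\dots,h,
\]
% because every summand m_j - m is non-negative.
```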

Shafts
We say that a history word H is standard if there is a standard trapezium with history H.
Definition 7.10. Suppose we have a disk Π with boundary label V , V ∅ ≡ (ťW ) L , and let B be a ť-spoke starting on Π. Suppose there is a subband C of B, which also starts on Π and has a standard history H, for which the word ťW is H-admissible. Then we call the ť-band C a shaft.
For a constant λ ∈ [0; 1/2) we also define a stronger concept of λ-shaft at Π as follows. A shaft C with history H is a λ-shaft if for every factorization of the history H ≡ H 1 H 2 H 3 , where ||H 1 || + ||H 3 || < λ||H||, the middle part H 2 is still a standard history. (So a shaft is a 0-shaft).
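A direct consequence of this definition, recorded here as a sketch and not stated in the text, is that the λ-shaft condition gets stronger as λ grows. The parameter μ below is introduced only for this remark.

```latex
% Claim (our remark): if 0 \le \mu \le \lambda < 1/2, every \lambda-shaft is a \mu-shaft.
\[
  \|H_1\| + \|H_3\| < \mu\|H\|
  \;\Longrightarrow\;
  \|H_1\| + \|H_3\| < \lambda\|H\| ,
\]
% so every factorization H \equiv H_1 H_2 H_3 tested for a \mu-shaft is also
% tested for a \lambda-shaft, and its middle part H_2 is standard.
% For \mu = 0 no factorization satisfies the strict inequality, so the extra
% condition is vacuous and a 0-shaft is just a shaft.
```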
Lemma 7.11. Let Π be a disk in a minimal diagram ∆ and C be a λ-shaft at Π with history H. Then C has no factorizations C = C 1 C 2 C 3 such that (a) the sum of lengths of C 1 and C 3 does not exceed λ||H|| and (b) ∆ has a quasi-trapezium Γ such that the top (or bottom) label of Γ has L + 1 occurrences of ť-letters and C 2 starts on the bottom and ends on the top of Γ.
Proof. Proving by contradiction, we first replace Γ by a trapezium Γ' according to Lemma 7.9. The transpositions used for this goal affect neither Π nor C, since C crosses all the maximal θ-bands of Γ. Also one can replace Γ' by a trapezium with shorter base, and so we assume that its base starts and ends with the letter ť.
To begin with, we assume that C is a shaft (i.e., λ = 0). Then it follows from the definition of a shaft and Lemma 4.4 that bot(Γ') is labeled by a word Vť such that V ∅ ≡ (ťW ) L , where the word ťW has standard base. Now it follows from Remark 5.8 and Lemma 5.12 that V is the boundary label of Π. One can remove the last maximal ť-band from Γ' and obtain a subtrapezium Γ'' whose bottom label coincides with the label of ∂Π (up to cyclic permutation), and ∂Γ'' shares a ť-edge with ∂Π (fig. 13 with λ = 0). It follows that the subdiagram ∆' = Π ∪ Γ'' has boundary label freely equal to Lab(top(Γ'')). However Lab(top(Γ'')) ≡ V', where (V') ∅ = V ∅ · H by Lemma 5.12, and so there is a disk Π' with boundary label V'. Therefore the subdiagram ∆' can be replaced by a single disk. So we decrease the number of (θ, t)-cells, contrary to the minimality of ∆.
Now we consider the general case, where C = C 1 C 2 C 3 . As above, we replace Γ by a trapezium Γ' and obtain a trapezium Γ'' after removing one ť-band in Γ'. To obtain a contradiction, it suffices to consider the diagram ∆' = Π ∪ C 1 C 2 ∪ Γ'' (forgetting the complement of ∆' in ∆) and find another diagram ∆'' with one disk and fewer (θ, t)-cells such that Lab(∂∆'') = Lab(∂∆') in the free group.
Since both histories H and H 2 (and so H 1 H 2 ) are standard, one can enlarge Γ'' and construct a trapezium Γ''' with history H 1 H 2 . (The added parts E 1 and E 2 are dashed in figure 13 with λ > 0.) Note that we add at most λ||H||L new (θ, t)-cells, since every maximal θ-band of Γ''' has L such cells. As in the case λ = 0, this trapezium Γ''' and the disk Π can be replaced by one disk Π'. However, to obtain the boundary label equal to Lab(∂∆'), we should attach the mirror copies ∃ 1 and ∃ 2 of E 1 and E 2 to Π'. The obtained diagram ∆'' has at most λ||H||L (θ, t)-cells, while ∆' has at least ||H 2 ||L ≥ (1 − λ)||H||L (θ, t)-cells. Since λ < 1 − λ, we have the desired contradiction.
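The final comparison uses only λ < 1/2. A sketch of the count follows, with the bound on ||H 1 || + ||H 3 || taken from condition (a) of Lemma 7.11, with t(·) denoting the number of (θ, t)-cells (our notation), and with "new" and "old" referring to the diagram after and before the replacement. The bound on the new diagram is the one asserted in the proof and is not rederived here.

```latex
\begin{align*}
  \|H_1\| + \|H_3\| &\le \lambda\,\|H\|
    \;\Longrightarrow\;
    \|H_2\| = \|H\| - \bigl(\|H_1\| + \|H_3\|\bigr) \ge (1-\lambda)\,\|H\|,\\
  t(\text{new}) &\le \lambda\,\|H\|\,L,\\
  t(\text{old}) &\ge \|H_2\|\,L \ge (1-\lambda)\,\|H\|\,L .
\end{align*}
% Since \lambda < 1/2 gives \lambda < 1 - \lambda, and \|H\| > 0,
\[
  t(\text{new}) \le \lambda\,\|H\|\,L < (1-\lambda)\,\|H\|\,L \le t(\text{old}).
\]
```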

Designs
As in [17], we are going to use designs.
Let D be the Euclidean unit disk and T be a finite set of disjoint chords (plain lines in fig. 14) and Q a finite set of disjoint simple curves in D (dotted lines in fig. 14). We assume that a curve is a non-oriented broken line, i.e., it is built from finitely many finite line segments. To distinguish the elements from T and Q, we will say that the elements of Q are arcs.
We shall assume that the arcs belong to the open disk D o , an arc may cross a chord transversally at most once, and the intersection point cannot coincide with one of the two ends of an arc.
Under these assumptions, we shall say that the pair (T, Q) is a design. The number of elements in T and Q are denoted by #T and #Q.
By definition, the length |C| of an arc C is the number of the chords crossing C. The term subarc will be used in the natural way. Obviously one has |D| ≤ |C| if D is a subarc of an arc C.
We say that an arc C 1 is parallel to an arc C 2 and write C 1 ∥ C 2 if every chord (from T) crossing C 1 also crosses C 2 . So the relation ∥ is transitive (it is not necessarily symmetric). For example, the arc of length 2 is parallel to the arc of length 5 in fig. 14.
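The definitions of length and parallelism can be phrased in terms of the set of chords crossing an arc. In the sketch below, T(C) is our notation for that set, and the formula for |C| uses the assumption that an arc crosses each chord at most once. The small example is hypothetical and is not the configuration of fig. 14.

```latex
% \mathbf{T}(C) = set of chords crossing the arc C (notation introduced here).
\[
  |C| = \#\mathbf{T}(C), \qquad
  C_1 \parallel C_2 \iff \mathbf{T}(C_1) \subseteq \mathbf{T}(C_2).
\]
% Transitivity is transitivity of inclusion:
\[
  \mathbf{T}(C_1) \subseteq \mathbf{T}(C_2) \subseteq \mathbf{T}(C_3)
  \;\Longrightarrow\; C_1 \parallel C_3 .
\]
% Hypothetical example (chords \tau_1, \tau_2, \tau_3) showing non-symmetry:
\[
  \mathbf{T}(C_1) = \{\tau_1, \tau_2\},\quad
  \mathbf{T}(C_2) = \{\tau_1, \tau_2, \tau_3\}
  \;\Longrightarrow\; C_1 \parallel C_2 \text{ but } C_2 \not\parallel C_1 .
\]
```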
Definition 7.12. Given λ ∈ (0; 1) and an integer n ≥ 1, the property P (λ, n) of a design says that for any n different arcs C 1 , . . . , C n , there exist no subarcs D 1 , . . . , D n , respectively, such that |D i | > (1 − λ)|C i | for every i = 1, . . . , n and D 1 ∥ D 2 ∥ · · · ∥ D n . The number of chords will be denoted by #T. Here is the main statement about designs from [17].

Designs and the σ λ invariant
Let λ ∈ [0, 1/2). For every ť-spoke B of a minimal diagram ∆, we choose the λ-shaft of maximal length in it (if a λ-shaft exists). If B connects two disks Π 1 and Π 2 , then there can be two maximal λ-shafts: at Π 1 and at Π 2 . We denote by σ λ (∆) the sum of lengths of all λ-shafts in this family.
Proof. Let us associate the following design with ∆. We say that the median lines of the maximal θ-bands are the chords and the median lines of the maximal λ-shafts are the arcs. Here we use two disjoint median lines if two maximal λ-shafts share a (θ,ť)-cell. By Lemma 7.7 (3), (4), we indeed obtain a design.
Observe that the length |C| of an arc is the number of cells in the λ-shaft and #T ≤ |∂∆|/2 since every maximal θ-band has two θ-edges on ∂∆.
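The bound on #T is a double count of θ-edges on the boundary. The sketch assumes, as the length estimates earlier in the paper suggest, that every θ-edge contributes 1 to the length |∂∆|; the symbol n θ for the number of θ-edges on ∂∆ is our notation.

```latex
% n_\theta = number of \theta-edges on \partial\Delta (ad hoc notation).
% Each maximal \theta-band starts and ends on \partial\Delta (no \theta-annuli),
% and distinct maximal bands use distinct boundary edges:
\[
  2\,\#\mathbf{T} \;\le\; n_\theta \;\le\; |\partial\Delta|
  \quad\Longrightarrow\quad
  \#\mathbf{T} \;\le\; |\partial\Delta| / 2 .
\]
```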
Thus, by Theorem 7.13, it suffices to show that the constructed design satisfies the condition P (λ, n), where n does not depend on ∆.
Let n = 2L + 1. If the property P (λ, n) does not hold, then we have n maximal λ-shafts C 1 , . . . , C n and a subband D of C 1 , such that |D| > (1 − λ)|C 1 |, and every maximal θ-band crossing D must cross each of C 2 , . . . , C n . (Here |B| is the length of a ť-band B.) It follows that each of these θ-bands crosses at least L + 1 maximal ť-bands. (See Lemma 7.7 (3,4). Here we take into account that the same ť-spoke can generate two arcs in the design.) Hence using the λ-shaft C 1 one can construct a quasi-trapezium of height |D|, which contradicts Lemma 7.11.

7.2 Upper bound for G-areas of diagrams over the group G

The area of a disk is quadratic
By definition, the G-area of a disk Π is just the minimum of areas of diagrams over the presentation (5.6,5.9) of G having the same boundary label as Π.
Lemma 7.15. There is a constant c 6 such that both the area and the G-area of any disk Π do not exceed c 6 |∂Π| 2 .
Proof. By Remark 7.3, a disk with boundary label V can be built of one hub and L trapezia corresponding to an accessible computation C for W , where W L ≡ V ∅ . By Lemma 4.9, the length of C can be bounded by c 2 ||W || and the length of every configuration of C does not exceed c 1 ||W ||. Hence by Lemma 6.2, the area and the G-area of the disk are bounded by c 6 |∂Π| 2 , since the constant c 6 can be chosen after c 1 , c 2 and δ.
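The quadratic bound can be traced through the construction of Remark 7.3. The sketch below treats only the area (not the G-area) and assumes two facts not spelled out here: that a θ-band over a configuration of length m has at most c m cells for some constant c, and that Lemma 6.2 yields |∂Π| ≥ δL||W||. Both are our assumptions and should be checked against the actual statements.

```latex
% One hub plus L trapezia, each of height at most c_2\|W\| whose
% \theta-bands have at most c\,c_1\|W\| cells (c is an assumed constant):
\[
  \mathrm{Area}(\Pi) \;\le\; 1 + L \cdot c_2\|W\| \cdot c\,c_1\|W\|
  \;=\; 1 + c\,c_1 c_2\, L\, \|W\|^2 .
\]
% Under the assumed estimate |\partial\Pi| \ge \delta L \|W\| this gives
\[
  \mathrm{Area}(\Pi) \;\le\; 1 + \frac{c\,c_1 c_2}{\delta^2 L}\,|\partial\Pi|^2
  \;\le\; c_6\,|\partial\Pi|^2
\]
% for a constant c_6 chosen after c_1, c_2 and \delta (using |\partial\Pi| \ge 1).
```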
By definition, the G-area of a minimal diagram ∆ over G is the sum of G-areas of its disks plus the G-area of the complement Γ. For the complement, as in Subsection 6.3, we consider a family S of big subtrapezia and single cells of Γ such that every cell of Γ belongs to a member Σ of this family, and if a cell Π belongs to different Σ 1 and Σ 2 from S, then both Σ 1 and Σ 2 are big subtrapezia of Γ with bases xv 1 x, xv 2 x, and Π is a (θ, x)-cell. Hence the statement of Lemma 6.14 holds for minimal diagrams over G as well.

Weakly minimal diagrams.
We want to prove that for big enough constant N , Area G (∆) ≤ N n 2 for every minimal diagram ∆, which will imply in Subsection 8.1 that the boundary label of ∆ has quadratic area with respect to the finite presentation of G. However to prove this property by induction, below we have to consider a large class of diagrams, called weakly minimal.
Let C be a cutting q-band of a reduced diagram ∆ with disks, i.e. it starts and ends on ∂∆ and cuts up the diagram. We call C a stem band if it is either a rim band of ∆ or both components of ∆\C contain disks. The (unique) maximal subdiagram of ∆, where every cutting q-band is a stem band, is called the stem ∆ * of ∆. It is obtained by removing all crown cells from ∆, where a cell π is called a crown cell if it belongs to the component Γ defined by a cutting q-band B, where Γ contains no disks and π is not in B. In particular, all the disks and q-spokes of ∆ belong to the stem ∆ * . The stem of a diagram without disks is empty.
(c) There is a constant c = c(λ) such that σ λ (∆ * ) ≤ c|∂∆| for every weakly minimal diagram ∆ over the group G;
(d) If a diagram ∆ has a cutting q-band C and two components ∆ 1 and ∆ 2 of the complement of C such that ∆ 1 ∪ C is a reduced diagram without disks and C ∪ ∆ 2 is a weakly minimal diagram, then ∆ is weakly minimal itself;
(e) a weakly minimal diagram ∆ contains no θ-annuli, and a θ-band cannot cross a q-band of ∆ twice.
Proof. (a) Every crown cell π of ∆ belonging to ∆ 1 is a crown cell of ∆ 1 , since the cutting q-band B separating π from all the disks of ∆ separates (itself or the subbands of B in the intersection of B and ∆ 1 ) π from ∆ * 1 . Therefore we have ∆ * 1 ⊂ ∆ * , and so ∆ * 1 is minimal, being a subdiagram of a minimal diagram.
(d) The diagram ∆ is reduced since both ∆ 1 ∪ C and ∆ 2 ∪ C are reduced subdiagrams sharing the cutting band C. Since ∆ 1 has no disks, we have ∆ * = (∆ 2 ∪ C) * by the definition of stem. Therefore the stem ∆ * is a minimal diagram and ∆ is weakly minimal.
(e) The statement follows from Lemma 7.7 (3,4) if the bands belong in the stem ∆ * . By the same reason, a θ-band cannot cross a rim q-band of ∆ * twice. It remains to assume that the bands belong to the crown of ∆, and in this case, the statement follows from Lemma 5.6 since the crown is a union of disjoint reduced subdiagrams over the group M .
Remark 7.18. The statement (d) of Lemma 7.17 fails if one replaces the words "weakly minimal" with "minimal".
We will prove that for large enough parameters N 3 and N 4 , Area G (∆) ≤ N 4 (n + σ λ (∆ * )) 2 + N 3 µ(∆) for every weakly minimal diagram ∆ with perimeter n. For this goal, we will argue by contradiction in this section and study a weakly minimal counterexample ∆ giving opposite inequality with minimal possible sum n + σ λ (∆ * ).

Getting rid of rim bands with short base
Lemma 7.19. The diagram ∆ has no rim θ-bands with base of length at most K.

The cloves
By Lemma 6.18, ∆ has at least one disk. Taking into account that all disks and their spokes belong to the stem ∆ * , we can apply Lemma 7.5 to the minimal diagram ∆ * and fix a disk Π in ∆ such that L − 3 consecutive maximal ť-bands B 1 , . . . , B L−3 start on ∂Π, end on the boundary ∂∆, and for any i ∈ [1, L − 4], there are no disks in the subdiagram bounded by B i , B i+1 , ∂Π, and ∂∆. (See fig. 11.) We denote by Ψ = cl(Π, B 1 , B L−3 ) the subdiagram without disks bounded by the spokes B 1 , B L−3 (and including them) and by subpaths of the boundaries of ∆ and Π, and call this subdiagram a clove. Similarly one can define the cloves Ψ ij = cl(Π, B i , B j ).
Proof. Proving by contradiction, we may assume that there is a tight subcomb Γ by Lemma 6.9 (2). Then a contradiction for the counter-example ∆, which is a weakly minimal diagram over the group G, appears as in the proof of Lemma 6.18 with the following modifications of a few constants and references.
We should replace N 2 and N 1 with N 4 and N 3 , replace n with n + σ λ (∆ * ), and notice that the value of σ λ does not increase when passing from ∆ to a subdiagram by Lemma 7.17 (b). We should use Lemma 7.17 (e) instead of Lemma 5.6 used in the proofs of Lemmas 6.15 -6.19. The diagram ∆ 0 is weakly minimal because it is constructed from the reduced diagram Γ
(1) The counter-example ∆ has no two disjoint subcombs Γ 1 and Γ 2 in Ψ with basic widths at most K and handles C 1 and C 2 such that some ends of these handles are connected by a subpath x of the boundary path of ∆ with |x| q ≤ N .
(2) The boundary of every subcomb Γ of ∆ with basic width s ≤ K has 2s q-edges provided Γ ⊂ Ψ.
(2) There exists r, L/2 − 3 ≤ r ≤ L/2, such that the θ-bands of Ψ crossing B L−3 do not cross B r , and the θ-bands of Ψ crossing B 1 do not cross B r+1 .
Proof. (1) If the claim were wrong, then one could find a rim θ-band T in Ψ, which crosses neither B 1 nor B L−3 . By Lemma 7.19, the basic width of T is greater than K. Since (1) a disk has LN spokes, (2) no q-band of Ψ intersects T twice by Lemma 5.6, (3) T has at least K q-cells, and (4) K > 2K 0 + LN , there exists a maximal q-band C' such that a subdiagram Γ' separated from Ψ by C' contains no edges of the spokes of Π and the part of T belonging to Γ' has at least K 0 q-cells (fig. 15). If Γ' is not a comb, and so a maximal θ-band of it does not cross C', then Γ' must contain another rim band T' having at least K q-cells. This makes it possible to find a subdiagram Γ'' of Γ' such that a part of T' is a rim band of Γ'' containing at least K 0 q-cells, and Γ'' does not contain C'. Since Area(Γ') > Area(Γ'') > . . . , such a procedure must stop. Hence, for some i, we obtain a subcomb Γ (i) of basic width ≥ K 0 , contrary to Lemma 7.20.
(2) Assume there is a maximal θ-band T of Ψ crossing the spoke B 1 . Then assume that T is the closest to the disk Π, i.e. the intersection of T and B 1 is the first cell of the spoke B 1 . If B 1 , . . . , B r are all the spokes crossed by T , then r ≤ L/2 by Lemma 6.5, which is applicable here since all the spokes belong to the stem ∆ * , which is a minimal diagram. Since the band T does not cross the spoke B r+1 , no other θ-band of Ψ crossing B 1 can cross B r+1 , and no θ-band crossing the spoke B L−3 can cross B r . The same argument shows that r + 1 ≥ L/2 − 2 if there is a θ-band of Ψ crossing the spoke B L−3 .
For the clove Ψ = cl(Π, B 1 , B L−3 ) in ∆, we denote by p(Ψ) the common subpath of ∂Ψ and ∂∆ starting with the ť-edge of B 1 and ending with the ť-edge of B L−3 . Similarly we define the (outer) path p ij = p(Ψ ij ) for every smaller clove Ψ ij .
Proof. Let a maximal q-band C of Ψ start on p i,i+1 and not end on Π. Then it has to end on p i,i+1 too. If Γ is the subdiagram (without disks) separated by C, then every maximal θ-band T of Γ has to cross the q-band C, since the extension of T in Ψ must cross either B 1 or B L−3 by Lemma 7.22. Therefore Γ is a comb with handle C.
Consider the q-bands of this kind defining maximal subcombs Γ 1 , Γ 2 , . . . , Γ k in Ψ i,i+1 . The basic width of each of them is less than K 0 by Lemma 7.20. Therefore k ≤ 1, since otherwise one can get two subcombs contradicting Lemma 7.21 (1), because there are at most N + 1 maximal q-bands starting on ∂Π in Ψ i,i+1 . By Lemma 7.21 (2), such a subcomb has at most 2K 0 q-edges in the boundary. Hence there are at most 2K 0 + N < 3K 0 q-edges in the path p i,i+1 .
We denote by ∆ the subdiagram formed by Π and Ψ, and denote by p the path top(B 1 ) u −1 bot(B L−3 ) −1 , where u is a subpath of ∂Π, such that p separates ∆ from the remaining subdiagram Ψ of ∆ (fig. 16).

Figure 16: Boundaries of Ψ and Ψ
Similarly we define subdiagrams ∆ ij , paths where u ij is a subpath of ∂Π, and the subdiagrams Ψ ij .
Recall that by Definition 7.1 the boundary label of ∂Π is a disk word V , where V ∅ ≡ W L and W is an accessible word.
Lemma 7.24. We have the following inequalities and, if i ≤ r and j ≥ r + 1, then
Proof. The first inequality follows from Lemma 6.2 (b) since the path u ij has L − j + i − 1 t-edges. To prove the second inequality, we observe that the path p ij has (j − i)N + 1 q-edges and it has h i + h j θ-edges by Lemma 7.22.
Proof. (1) Assume there is a q-band Q of Ψ 0 ij starting and ending on q ij . Then j = i + 1 and q i,i+1 = uevfw, where Q starts with the q-edge e and ends with the q-edge f . Suppose that Q has length ℓ. Then |v| ≥ ℓ, since every maximal θ-band of Ψ 0 i,i+1 crossing Q has to end on the subpath v. So one has |evf | ≥ ℓ + 2, and replacing the subpath evf by a side of Q of length ℓ one replaces the path q i,i+1 with a shorter homotopic path by Lemma 6.2. This contradicts the choice of q i,i+1 , and so statement (1) is proved.
(2) Assume there is a θ-band T of Ψ 0 i,i+1 starting and ending on q i,i+1 . Then q i,i+1 = uevfw, where T starts with the θ-edge e and ends with the θ-edge f . Moreover, one can choose T such that v is a side of this θ-band. By Statement (1) the band T has less than N (θ, q)-cells. Therefore if v' is the other side of T , we have |v' | Y − |v| Y ≤ 2N . It follows from the definition of length in Subsection 6.1 that |evf | − |v' | ≥ 2 − 2δN > 1 + 2δ. Therefore, by Lemma 6.2 (c), replacing the subpath evf with v' we decrease the length of q i,i+1 at least by 1, a contradiction.
It follows from Lemma 7.22 that between the spokes B j and B j+1 (1 ≤ j ≤ r − 1), there is a trapezium Γ j of height h j+1 with side ť-bands. Similarly, we have trapezia Γ j for r + 1 ≤ j ≤ L − 4. By Lemma 7.28 (2), every trapezium Γ j is contained in both Ψ j,j+1 and Ψ 0 j,j+1 . The bottoms y j of all trapezia Γ j belong to ∂Π and have the same label Wť. We will use z j for the tops of these trapezia. Since Γ j and Γ j−1 (2 ≤ j ≤ r − 1) have the same bottom labels and the history H j is a prefix of H j−1 , by Lemma 5.12, h j different θ-bands of Γ j−1 form the copy Γ' j of the trapezium Γ j (more precisely, a copy of a superscript shift Γ (+(±1)) j ) with top and bottom paths z' j and y' j = y j−1 . We denote by E j (by E 0 j ) the comb formed by the maximal θ-bands of Ψ j,j+1 (of Ψ 0 j,j+1 , resp.) crossing the ť-spoke B j but not crossing B j+1 (1 ≤ j ≤ r − 1, see fig. 17). Its handle C j of height h j − h j+1 is contained in B j . The boundary ∂E j (resp., ∂E 0 j ) consists of the side of this handle, the path z j and the path p j,j+1 (the path q j,j+1 , respectively).
Assume that a maximal Y -band A of E 0 j (2 ≤ j ≤ r − 1) starts on the path z j and ends on a side Y -edge of a maximal q-band C of E 0 j . Then A, a part of C and a part z of z j bound a comb ∇.
Lemma 7.29. There is a copy of the comb ∇ in the trapezium Γ = Γ j−1 \Γ j . It is a superscript shift of ∇.
Proof. The subpath z of z j starts with a Y -edge e and ends with a q-edge f . There is a copy z' of z in the copy of z j lying in Γ j−1 , starting with e' and ending with f'. Note that the θ-cells π and π' attached to f and to f' in ∇ and in Γ are copies of each other up to superscript shift, since they correspond to the same letter of the history. Now moving from f to e, we see that the whole maximal θ-band T 1 of ∇ containing π has a copy in Γ. Similarly we obtain a copy of the next maximal θ-band T 2 of ∇, and so on.

Bounding the number of Y -bands in a sector of a clove
Lemma 7.30. At most N Y -bands starting on the path y j can end on the (θ, q)-cells of the same θ-band. This property holds for the Y -bands starting on z j too.
Proof. We will prove the second claim only since the proof of the first one is similar. Assume that the Y -bands A 1 , . . . , A s start from z j and end on some (θ, q)-cells of a θ-band T . Let T 0 be the minimal subband of T , where the Y -bands A 2 , . . . , A s−1 end and z̄ j be the minimal subpath of z j , where they start. Then by Lemma 5.6, every maximal q-band starting on z̄ j has to cross the band T 0 and vice versa. Hence the base of T 0 is a subbase of the standard base (or of its inverse). Since every rule of M can change at most N − 2 Y -letters in a word with standard base, all (θ, q)-cells of T 0 have at most N − 2 Y -edges, and the statement of the lemma follows.
Without loss of generality, we assume that (7.57) (Recall that L 0 is one of the parameters used in the paper, a number between c 5 and L, Section 2.3.)

Estimating the sizes of trapezia
, is less than L/5.
Proof. Consider Γ j as in the assumption of the lemma with j ∈ [L 0 + 1, r − 1]. The subcomb E 0 j has at most N maximal q-bands by Lemma 7.28. So there are at most N maximal Y -bands starting on z j and ending on each of the θ-bands of E 0 j . If g j is the length of the handle of E 0 j for an index j from the set S = [L 0 + 1, r − 1] ∪ [r + 1, L − L 0 − 5], then Σ j∈S g j ≤ 2h. Hence at most 2hN maximal Y -bands starting on all the paths z j , j ∈ S (denote this set of Y -bands by A), end on some (θ, q)-cells.
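The bound 2hN is a sum over the handles. In the sketch, g j is the handle length from the text, and we use that the comb E 0 j with a handle of length g j has exactly g j maximal θ-bands, which is our reading of the definition of a comb rather than a statement quoted from the paper.

```latex
% At most N of the Y-bands starting on z_j end on one \theta-band of E^0_j,
% and E^0_j has g_j maximal \theta-bands, hence
\[
  \#\{A \in \mathcal{A} : A \text{ ends on a } (\theta,q)\text{-cell}\}
  \;\le\; \sum_{j \in S} N g_j
  \;=\; N \sum_{j \in S} g_j
  \;\le\; 2hN .
\]
```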
Proving by contradiction, we have at least L|W | Y /5c 5 N Y -bands in A. Hence at least L|W | Y /5c 5 N − 2hN bands from A end on the subpaths q j,j+1 for j ∈ S. The path q j,j+1 has at most 2h θ-edges by Lemma 7.28. Therefore by Lemma 6.2, at least L|W | Y /5c 5 N − 2hN − 2h Y -edges contribute δ to the length of this path. It follows from Lemma 7.27 that since 2hN + 2h ≤ 3L 2 0 N |W | Y by the assumption of the lemma, which is less than L 3 0 |W | Y /10c 5 N ≤ L|W | Y /10c 5 N by the choice of L 0 and L. Also by Lemma 7.24, we have because by the choice of L, we have 3L 0 < L/20c 5 N . The inequalities (7.58, 7.59) give us which implies, together with (7.60), that Finally, for the right-hand side, we have δ/40c 5 N > ε = N
Proof. Let T and S be the maximal θ-bands of Ψ crossing B 1 and B L−3 , respectively, and the closest to the disk Π. Let them cross k and ℓ spokes of Π, respectively. By Lemma 7.31, k + ℓ > L − L/5 − 3L 0 > 2L/3, and also k, ℓ ≥ 2 since L/2 − 3 ≤ r ≤ L/2. It follows from Lemma 7.7 (2) (applied to ∆ * ) that the first letters of H 1 and H L−3 are different.
Proof. Assume that $|W|_Y\le LN/(4L_0)$. By Lemma 7.24 for $i=L_0+1$ and $j=L-L_0-3$, because $L\gg L_0$. It follows from inequalities (7.51) and (7.57) that $h_i+h_j\le 2h$. Hence

Proof. Proving by contradiction, we have inequality (7.62) from Lemma 7.33. By Lemma 7.31, there are at least $L-L/5-3L_0>0.7L$ trapezia $\Gamma_j$ with $|z_j|_Y<|W|_Y/(c_5N)$, and so one can choose two such trapezia $\Gamma_k$ and $\Gamma_\ell$ such that $k<r$, $\ell\ge r+1$ and $\ell-k>0.6L$. Since $H_{k+1}$ (resp. $H_\ell$) is a prefix of $H_1$ (resp. of $H_{L-3}$), it follows from Lemma 7.32 that the first letters of $H_{k+1}$ and $H_\ell$ are different unless they are equal to $\theta(23)^{-1}$.
Since the bottoms of $\Gamma_k$ and $\Gamma_\ell$ (which lie on $\partial\Delta$) have the same label, up to a superscript shift, one can construct an auxiliary trapezium $E$ by identifying the bottom of a copy of $\Gamma_k$ with the bottom of a mirror copy of $\Gamma_\ell$. The history of $E$ is $H_\ell^{-1}H_{k+1}$, which is an eligible word if the first letters of $H_{k+1}$ and $H_\ell$ are different.
If both first letters are $\theta(23)^{-1}$, then the word $H_\ell^{-1}H_{k+1}$ is also eligible by definition. If the bottom $\theta$-bands of $\Gamma_k$ and $\Gamma_\ell$ are just copies of each other, then the diagram $E$ constructed above is not reduced. However, one can modify the construction by replacing $\Gamma_k$ with an auxiliary superscript shift of $\Gamma_k$. The top $W_0$ and the bottom $W_t$ of $E$ have $Y$-lengths less than $|W|_Y/(c_5N)$. Without loss of generality, one may assume that $h_{k+1}\ge h_\ell$, and so $h_{k+1}\ge t/2$, where $t$ is the height of $E$.
The path $p_{j,j+1}$ has at most $h_j-h_{j+1}\le$

Proof. Assume that $|z_j|_Y\ge 2Nh_j$. By Lemma 7.30, at most $Nh_j$ maximal $Y$-bands of $E_j$ starting on $z_j$ can end on the $(\theta,q)$-cells of $E_j$. Hence at least $|z_j|_Y-Nh_j\ge Nh_j$ of them have to end on the path $p_{j,j+1}$. The path $p_{j,j+1}$ has at most $h_j$ $\theta$-edges. Hence by Lemma 6.2,

(*) The history $H'H''H'''$ has only one step, and for the subcomputation $D$ with this history, there is a $Q'Q$-sector such that a state letter from $Q'$ or from $Q$ inserts a letter increasing the length of this sector after every transition of $D$.
Proof. Recall that the standard base of $M$ is built of the standard base $B$ of $M_4$ and its inverse copy $(B')^{-1}$ (plus the letter $t$). Due to this mirror symmetry of the standard base, we have mirror symmetry for any accessible computation, in particular, for $C$ and $D$. Therefore, proving by contradiction, we may assume that the $Y$-letters are inserted from the left of $Q$.
Let $\mathcal{Q}$ be the maximal $q$-spoke of the subdiagram $E^0_i\subset\Gamma_i$ corresponding to the base letter $Q$. If $\mathcal{Q}'$ is the neighboring $q$-spoke to the left of $\mathcal{Q}$ (the spokes are directed from the disk $\Pi$), then the subpath $x$ of $z_i$ between these two $q$-spokes has at least $h_{i+1}-h_{i+2}$ $Y$-letters, where $h_{i+1}-h_{i+2}$ is the length of the corresponding part of the history $H'H''H'''$. Indeed, $\Gamma_i$ contains a copy $\Gamma'_{i+1}$ of $\Gamma_{i+1}$, the bottom of the trapezium $\Gamma_i\setminus\Gamma'_{i+1}$ is the copy $z'_{i+1}$ of $z_{i+1}$ and its top is $z_i$, and so the subcomputation with that part of the history has already increased the length of the $Q'Q$-sector. Thus, by Lemmas 7.38 and 7.34 and the choice of $L_0>100c_5N$, we have

Note that a $Y$-band $A$ starting on $x$ cannot end on a $(\theta,q)$-cell from $\mathcal{Q}$. Indeed, otherwise by Lemma 7.29, there is a copy of this configuration in the diagram $\Gamma_{i-1}$, i.e. the copy of $A$ ends on the copy of $\mathcal{Q}$, contrary to the assumption that the rules of the computation with history $H'H''H'''$ do not delete $Y$-letters.
Let us consider the comb bounded by $\mathcal{Q}$, $\mathcal{Q}'$, $x$ and the boundary path of $\Delta_0$ (without the cells from $\mathcal{Q}'$). If the lengths of the parts of $\mathcal{Q}$ and $\mathcal{Q}'$ bounding this comb are $s$ and $s'$, respectively, then there are $|x|+s$ maximal $Y$-bands starting on $x$ and $\mathcal{Q}$ and ending either on $\mathcal{Q}'$ or on $\partial\Delta_0$, since the comb has no maximal $q$-bands by Lemma 7.28. At most $s'<s$ of these $Y$-bands can end on $\mathcal{Q}'$. Therefore at least $|x|+s-s'$ of them end on the segment of the boundary path of $\Delta_0$ lying between the ends of $\mathcal{Q}'$ and $\mathcal{Q}$.
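The counting step above is a pigeonhole estimate, and it may help to see it as a single display. This is only a restatement under the notation assumed here: $s$ and $s'$ are the lengths of the parts of the $q$-spoke for $Q$ and of its left neighbor bounding the comb, and $|x|$ counts the $Y$-edges of $x$.

```latex
\[
\#\{\text{maximal $Y$-bands of the comb ending on } \partial\Delta_0\}
\;\ge\; (|x| + s) - s' \;>\; |x|,
\qquad \text{since } s' < s .
\]
```

In particular, the boundary segment receives strictly more $Y$-bands than start on $x$ alone.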
Proof of Theorem 1.2. Since the Turing machine $M_0$ accepts a non-recursive language, the conjugacy problem is undecidable for the group $G$ by Lemma 8.3. The Dehn function of $G$ is at most quadratic by Lemma 8.2. To obtain a quadratic lower estimate, it suffices to see that if a $\theta$-letter $\theta$ and a $Y$-letter $a$ commute, then by Lemmas 7.5 and 5.6, the area of the word $a^n\theta^na^{-n}\theta^{-n}$ is equal to $n^2$ (alternatively, one can use [2]: every non-hyperbolic finitely presented group has at least quadratic Dehn function). The theorem is proved.
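The quadratic lower bound can be illustrated in the simplest model where two letters commute. The sketch below is not a computation in $G$: it assumes the toy presentation $\mathbb{Z}^2=\langle a,t\mid at=ta\rangle$, with the hypothetical letter `t` standing in for $\theta$ and capital letters for inverses. It computes the signed area enclosed by the lattice loop traced by $a^nt^na^{-n}t^{-n}$. In this model each relator cell changes the signed area by $\pm1$, so the absolute value of the signed area is a lower bound for the area of the word.

```python
# Toy model only: Z^2 = <a, t | a t = t a>, with "t" standing in for theta.
# Capital letters denote inverses. This is NOT the group G of the paper.
STEPS = {"a": (1, 0), "A": (-1, 0), "t": (0, 1), "T": (0, -1)}


def commutator_word(n):
    """Return the word a^n t^n a^-n t^-n as a string."""
    return "a" * n + "t" * n + "A" * n + "T" * n


def signed_area(word):
    """Signed area enclosed by the closed lattice path traced by `word`.

    Shoelace formula: a step from (x0, y0) to (x1, y1) contributes
    (x0*y1 - x1*y0) to twice the signed area.
    """
    x = y = 0
    twice_area = 0
    for letter in word:
        dx, dy = STEPS[letter]
        twice_area += x * (y + dy) - (x + dx) * y
        x, y = x + dx, y + dy
    if (x, y) != (0, 0):
        raise ValueError("the word is not trivial in Z^2")
    return twice_area // 2


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        w = commutator_word(n)
        print(n, len(w), signed_area(w))
```

The word has length $4n$ and encloses signed area $n^2$, so in this model the area grows like the square of the length; for $G$ itself the equality is the one supplied by Lemmas 7.5 and 5.6.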