1 Introduction

For a group G and a set S of generators of G, we write Γ(G,S) for the Cayley graph of G with connection set S, that is, the graph with vertex set G and with edge set \(\{\{g,sg\}\mid g\in G,\ s\in S\}\). The diameter \(\operatorname {diam}(\varGamma)\) of a graph Γ is the maximum distance between vertices of Γ, and, in the case of a Cayley graph Γ(G,S), it is the maximum (over the group elements g∈G) of the length of the shortest expression \(g=s_{1}^{i_{1}}\cdots s_{m}^{i_{m}}\), with \(s_k\in S\) and \(i_k\in\{-1,1\}\). We define the diameter of a group G as

$$\operatorname {diam}(G):= \max \bigl\{ \operatorname {diam}\bigl(\varGamma(G,S) \bigr) \mid S \ \mbox{generates}\ G \bigr\}. $$

A first investigation of the diameter of Cayley graphs for general groups was undertaken by Erdős and Rényi [6]. Later Babai and Seress [4] obtained asymptotic estimates on \(\operatorname {diam}(G)\) depending heavily on the group structure of G. In particular, the results in [4] highlight the discrepancy between the diameter of Cayley graphs of groups close to being abelian and the diameter of Cayley graphs of non-abelian simple groups. Moreover, [4] contains the following conjecture of Babai.

Conjecture 1.1

[4, Conjecture 1.7]

There exists c>0 such that, for all non-abelian simple groups G, \(\operatorname {diam}(G) \leq (\log |G|)^{c}\).

The conjecture remains open, although significant progress has been made. In particular, starting with the work of Helfgott on the groups PSL(2,p) and PSL(3,p) [9, 10] and based thereon, there has been a series of results [5, 7, 17] proving the conjecture for finite simple groups of Lie type of bounded rank. The best statement known at the time of writing is by Pyber and Szabó [17] and says that there exists a polynomial c such that, for a finite simple group G of Lie type of Lie rank r, we have \(\operatorname {diam}(G)\leq (\log|G|)^{c(r)}\). For the sake of comparison, Conjecture 1.1 asserts that c should be a constant rather than a polynomial.

The proofs of these theorems make use of new results in additive combinatorics, specifically on growth in simple groups. We note that the difficulties in generalizing these results from groups of bounded rank to those of unbounded rank seem closely related to difficulties in proving Conjecture 1.1 for the alternating groups \(\operatorname {Alt}(n)\). In both cases (that is, classical groups of unbounded rank and alternating groups) there are known counterexamples to general “growth results” for sets (see, for example, [16–18]), which were central to the approach used to prove Conjecture 1.1 for groups of Lie type of bounded rank. What is more, these two classes of counterexamples are, in some sense, related.

In this paper we focus on the case where \(G=\operatorname {Alt}(n)\) or \(\operatorname {Sym}(n)\). Let Ω be a set of size n. For \(g\in \operatorname {Sym}(\varOmega)\), define the support of g by \(\operatorname {supp}(g)=\{\gamma\in\varOmega \mid \gamma^{g}\neq \gamma\}\). Observe that \(\operatorname {supp}(g)\) is equal to the complement in Ω of the fixed set, \(\operatorname {fix}(g)\), of g. Babai, Beals, and Seress [2] proved the following result.

Theorem 1.2

[2]

For every ε<1/3, there exists \(c_\varepsilon>0\) such that, if \(G=\operatorname {Sym}(n)\) or \(\operatorname {Alt}(n)\) and S is a set of generators of G containing an element g with \(|\operatorname {supp}(g)| \le \varepsilon n\), then

$$\operatorname {diam}\bigl(\varGamma(G,S)\bigr) \leq c_\varepsilon n^8. $$

In this paper we provide a variant of the argument in [2] to prove the following stronger theorem.

Theorem 1.3

Let C=0.63. There exists c>0 such that, if \(G=\operatorname {Sym}(n)\) or \(\operatorname {Alt}(n)\) and S is a set of generators of G containing an element g with \(|\operatorname {supp}(g)| \le Cn\), then

$$\operatorname {diam}\bigl(\varGamma(G,S)\bigr) \le O\bigl(n^{c}\bigr). $$

We do not try to minimize the exponent c in the theorem. Our arguments give c≤78, but with some more work the bound on \(\operatorname {diam}(\varGamma(G,S)) \) can be improved at least to \(O(n^{66})\).

Theorem 1.3 also extends to directed graphs. Given \(G=\langle S\rangle\), the directed Cayley graph \(\vec{\varGamma}(G,S)\) is the graph with vertex set G and edge set \(\{(g,sg) : g\in G,\ s\in S\}\). Analogously to the undirected case, the diameter of \(\vec{\varGamma}(G,S)\) is defined as the maximum (taken over g∈G) of the length of the shortest expression \(g=s_1\cdots s_m\), with each \(s_k\in S\). By a theorem of Babai [1, Corollary 2.3], \(\operatorname {diam}(\vec{\varGamma}(G,S)) = O (\operatorname {diam}(\varGamma(G,S) ) \cdot (\log |G|)^{2} )\) for all groups G and sets S of generators, so we immediately obtain the following corollary.

Corollary 1.4

Let C=0.63. There exists d>0 such that, if \(G=\operatorname {Sym}(n)\) or \(\operatorname {Alt}(n)\) and S is a set of generators of G containing an element g with \(|\operatorname {supp}(g)| \le Cn\), then

$$\operatorname {diam}\bigl(\vec{\varGamma}(G,S) \bigr) \leq O\bigl(n^{d}\bigr). $$

We note that for arbitrary sets of generators, the best known bound is quasipolynomial, by a recent result of Helfgott and Seress:

Theorem 1.5

[11]

For \(G=\operatorname {Alt}(n)\) and \(\operatorname {Sym}(n)\), \(\operatorname {diam}(G) = \exp(O( (\log n)^{4} \log\log n))\).

The machinery developed in this paper turns out to have applications to other questions within permutation group theory. Indeed, it is possible to use variants of the results given in Sect. 3 to recover, and strengthen, classical results concerning multiply transitive groups due to Manning [13–15] and Wielandt [21]. This will be the subject of a forthcoming paper [8].

1.1 The main ideas

It is well known and easy to see that if a set A of generators of \(G=\operatorname {Alt}(n)\) or \(\operatorname {Sym}(n)\) contains a 3-cycle t, then every element of G can be written as a word of length less than \(n^4\) in A. Indeed, repeatedly conjugating t by A gives all 3-cycles as words of length less than \(n^3\), and each element of \(\operatorname {Alt}(n)\) is a product of at most ⌊n/2⌋ 3-cycles. Finally, if \(G=\operatorname {Sym}(n)\), then one more multiplication by A gives words for all elements of G. Thus, given any set S of generators of G, in order to prove Conjecture 1.1, it is enough to construct a 3-cycle as a word in S of polynomial length.
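As a small sanity check of the conjugation step, the following Python sketch (an illustration, not taken from [2] or from the proofs below) runs a breadth-first search over the conjugates \(t^{g_1\cdots g_m}\) of a 3-cycle t. Assumptions: points are 0,…,n−1, a permutation p is a tuple with p[i] the image of i, products use right actions (\(i^{pq}=(i^p)^q\)), and the generating set A is the hypothetical pair consisting of the transposition (0 1) and the n-cycle (0 1 ⋯ n−1). Since \(t^{g_1\cdots g_m}\) is a word of length 2m+1 in \(A\cup A^{-1}\), the search also reports the longest such word it needs.

```python
from collections import deque


def compose(p, q):
    # Right actions: i^(pq) = (i^p)^q, i.e. apply p first, then q.
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugate(t, g):
    # t^g = g^{-1} t g
    return compose(compose(inverse(g), t), g)


def cycle_to_perm(cycle, n):
    p = list(range(n))
    for i, x in enumerate(cycle):
        p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def conjugation_depths(t, gens):
    # depth[c] = least m such that c = t^(g_1 ... g_m) with every g_i in gens.
    depth = {t: 0}
    queue = deque([t])
    while queue:
        c = queue.popleft()
        for g in gens:
            d = conjugate(c, g)
            if d not in depth:
                depth[d] = depth[c] + 1
                queue.append(d)
    return depth


n = 6
gens = [cycle_to_perm((0, 1), n), cycle_to_perm(tuple(range(n)), n)]
t = cycle_to_perm((0, 1, 2), n)
depth = conjugation_depths(t, gens)
num_3_cycles = n * (n - 1) * (n - 2) // 3
# t^(g_1 ... g_m) = g_m^{-1} ... g_1^{-1} t g_1 ... g_m has length 2m + 1.
longest_word = max(2 * m + 1 for m in depth.values())
print(len(depth), num_3_cycles, longest_word, n ** 3)
```

For n=6 the search reaches all 40 three-cycles of Sym(6).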

We may try to reach a 3-cycle in stages by constructing elements of smaller and smaller support. Until very recently, the only subexponential method to obtain an element of support less than cn, for some constant c<1, from arbitrary generating sets was in [3]. In that paper, iteration of the support reduction was utilized to prove \(\operatorname {diam}(G) =\exp( (1+o(1))\sqrt{n \log n} )\), the only subexponential bound until Theorem 1.5.

Theorem 1.2 may be interpreted as a reduction of Conjecture 1.1 for alternating groups, to the problem of constructing an element g of moderately small support as a short word in an arbitrary set S of generators. The proof of Theorem 1.2 in [2] is based on the following observations.

(BBS1):

If \(|\operatorname {supp}(a)|<\varepsilon n\) for some ε<1/3, where a∈G and \(G=\operatorname {Alt}(n)\) or \(\operatorname {Sym}(n)\), and r is a random element of G, then, for \(b=a^r\), the commutator \([a,b]=a^{-1}b^{-1}ab\) has, with positive probability, support smaller than that of a.

(BBS2):

In (BBS1), it is not necessary that r is a uniformly distributed random element of G. It is enough that, for some constant ℓ, r maps each sequence of ℓ distinct elements of the permutation domain nearly uniformly to all other such sequences. Furthermore, random words r of length \(n^{O(\ell)}\) in any set S of generators of G satisfy this property.

In [2], the value ℓ=3 was chosen, and a 3-cycle was constructed in O(log log n) applications of (BBS1). A major conceptual novelty of [2] is that besides the natural action of G on n points and the action of G on itself as in Γ(G,S), it is beneficial to work with other actions of G. This principle is more clearly formulated in [11]. In [2] and in the present paper, the action of G on sequences of length ℓ from the natural permutation domain is used, while in [11] other actions are utilized as well.

Chronologically, we made three improvements to the argument in [2].

(NEW1):

The conclusion of (BBS1) holds for ε<1/2, implying a version of Theorem 1.3 with C<0.5.

(NEW2):

With positive probability, the commutator [a,b] has many fixed points and also contains a significant number of 3-cycles. Thus, if \(|\operatorname {supp}(a)|<\varepsilon n\) for some ε<0.585, then \([a,b]^3\) has support smaller than that of a. This implies Theorem 1.3 with C=0.585.

(NEW3):

With positive probability, the permutation \([a,b^{-1}][a,b]\) has many fixed points and 2-, 3-, 4-, and 5-cycles. So, for ε≤0.63, \(([a,b^{-1}][a,b])^{60}\) has support smaller than that of a.
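The exponent 60 in (NEW3) is lcm(2,3,4,5): a point lies in the support of \(c^{60}\) exactly when its c-cycle has length not dividing 60, so the fixed points and the 2-, 3-, 4- and 5-cycles of \(c=[a,b^{-1}][a,b]\) all drop out of the support. The Python sketch below checks this bookkeeping on a hypothetical random instance with \(b=a^r\); it assumes permutations are tuples with p[i] the image of i and right actions, the seed and size are arbitrary, and it does not attempt to reproduce the probabilistic estimate behind the constant 0.63.

```python
import random
from math import lcm  # several arguments allowed from Python 3.9 on


def compose(p, q):
    # Right actions: i^(pq) = (i^p)^q.
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def commutator(x, y):
    # [x, y] = x^{-1} y^{-1} x y
    return compose(compose(compose(inverse(x), inverse(y)), x), y)


def power(p, e):
    result = tuple(range(len(p)))
    for _ in range(e):
        result = compose(result, p)
    return result


def cycle_length_of(p):
    # Map every point to the length of the cycle of p through it.
    length = {}
    for start in range(len(p)):
        if start not in length:
            cycle, x = [start], p[start]
            while x != start:
                cycle.append(x)
                x = p[x]
            for y in cycle:
                length[y] = len(cycle)
    return length


def support(p):
    return {i for i in range(len(p)) if p[i] != i}


rng = random.Random(2024)
n = 40
a = tuple(rng.sample(range(n), n))       # hypothetical a
r = tuple(rng.sample(range(n), n))
b = compose(compose(inverse(r), a), r)   # b = a^r
c = compose(commutator(a, inverse(b)), commutator(a, b))
c60 = power(c, lcm(2, 3, 4, 5))
lengths = cycle_length_of(c)
print(len(support(c)), len(support(c60)))
```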

In this paper, we only prove the strongest version, based on (NEW3). We have to overcome several technical difficulties: (i) the analysis of the local behaviour (i.e., finding how a and b should interact on small subsets Δ of the natural domain such that \([a,b^{-1}][a,b]\) forms a short cycle on some points of Δ); (ii) ensuring that \(([a,b^{-1}][a,b])^{60}\) is not the identity of G; and (iii) handling the special case where the originally given generator a has order \(2^x3^y\) for some x,y≥0. We shall apply the argument of (BBS2) with ℓ=26.

The structure of this paper is as follows. In Sect. 2, we collect basic concepts regarding groups and graphs, and introduce the central notion of αβ-trees. These are the objects describing the possible local interactions of a and b. We also introduce our probabilistic method. In Sect. 3 we give a graph-theoretic technique for estimating the number of fixed points of w(a,b), where w(α,β) is a reduced word in α and β, and a and b are particular conjugate permutations of \(\operatorname {Sym}(\varOmega)\). In Sect. 4, we apply the results of Sect. 3 to the word \([\alpha,\beta^{-1}][\alpha,\beta]=\alpha^{-1}\beta\alpha\beta^{-1}\alpha^{-1}\beta^{-1}\alpha\beta\) and prove Theorem 1.3. In Sect. 5, we discuss some possible extensions of Theorem 1.3.
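The expansion of \([\alpha,\beta^{-1}][\alpha,\beta]\) can be checked mechanically. In this Python sketch the letters a, b, A, B encode α, β, \(\alpha^{-1}\), \(\beta^{-1}\) (our own, hypothetical encoding), words are multiplied left to right with right actions, and the identity is tested on random pairs of permutations; the sketch also checks that the eight-letter word is reduced.

```python
import random


def compose(p, q):
    # Right actions: i^(pq) = (i^p)^q.
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def evaluate(word, a, b):
    # Hypothetical encoding: a, b stand for α, β and A, B for their inverses.
    letters = {"a": a, "A": inverse(a), "b": b, "B": inverse(b)}
    result = tuple(range(len(a)))
    for letter in word:
        result = compose(result, letters[letter])
    return result


def commutator(x, y):
    return evaluate("ABab", x, y)   # [x, y] = x^{-1} y^{-1} x y


def is_reduced(word):
    # No letter is immediately followed by its own inverse.
    return all(u != v.swapcase() for u, v in zip(word, word[1:]))


W = "AbaBABab"   # α^{-1} β α β^{-1} α^{-1} β^{-1} α β
rng = random.Random(0)
n = 10
all_match = True
for _ in range(200):
    a = tuple(rng.sample(range(n), n))
    b = tuple(rng.sample(range(n), n))
    lhs = compose(commutator(a, inverse(b)), commutator(a, b))
    all_match = all_match and lhs == evaluate(W, a, b)
print(is_reduced(W), all_match)
```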

2 Basic concepts

In this section, we collect the definitions and basic results that will be needed in the proof of Theorem 1.3.

2.1 Permutation groups

Let Ω={1,2,…,n}. We use \(\operatorname {Sym}(n)\) and \(\operatorname {Sym}(\varOmega)\) interchangeably; more exactly, we use \(\operatorname {Sym}(\varOmega)\) when we emphasise the action of \(\operatorname {Sym}(n)\) on Ω. Let \(S \subseteq \operatorname {Sym}(n)\) and \(k, \ell\in\mathbb{Z}^{+}\). Define

$$ S^\ell=\{s_1\cdots s_\ell \mid s_1, \dots, s_\ell\in S\}; \qquad S^{-1}=\bigl\{ s^{-1} \mid s \in S \bigr\}. $$

For ω∈Ω and \(a,g \in \operatorname {Sym}(n)\), we write \(\omega^g\) for the image of ω under g, and \(a^g\) for \(g^{-1}ag\). We denote by \(\Omega^{(k)}\) the set of k-tuples of distinct elements of Ω, and we write \(n_{(k)}=|\Omega^{(k)}|=n(n-1)\cdots(n-k+1)\).
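The notational conventions can be made concrete as follows. This Python sketch (0-indexed points, permutations stored as tuples with p[i] equal to \(i^p\); these encodings are our assumption) enumerates \(\Omega^{(k)}\), compares its size with \(n_{(k)}\), and checks that \(a^g=g^{-1}ag\) maps \(\omega^g\) to \((\omega^a)^g\).

```python
import random
from itertools import permutations
from math import perm   # perm(n, k) = n! / (n - k)!, Python 3.8+


def compose(p, q):
    # Right actions: ω^(pq) = (ω^p)^q.
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugate(a, g):
    # a^g = g^{-1} a g
    return compose(compose(inverse(g), a), g)


def falling_factorial(n, k):
    # n_(k) = n (n - 1) ... (n - k + 1)
    result = 1
    for i in range(k):
        result *= n - i
    return result


n, k = 7, 3
omega_k = list(permutations(range(n), k))   # Ω^(k): k-tuples of distinct points
rng = random.Random(7)
a = tuple(rng.sample(range(n), n))
g = tuple(rng.sample(range(n), n))
ag = conjugate(a, g)
# a^g sends ω^g to (ω^a)^g for every point ω.
print(len(omega_k), falling_factorial(n, k), perm(n, k),
      all(ag[g[w]] == g[a[w]] for w in range(n)))
```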

We shall use the following result of Whiston [20].

Lemma 2.1

[20]

Any set S of generators for \(G=\operatorname {Sym}(n)\) or \(\operatorname {Alt}(n)\) contains a subset A of cardinality less than or equal to n−1 that also generates G.

If A and B are two sets of generators for G with A⊆B, then clearly \(\operatorname {diam}(\varGamma(G,B)) \le \operatorname {diam}(\varGamma(G,A))\). Therefore, Lemma 2.1 implies that it is enough to prove Theorem 1.3 for sets S of generators of size at most n that contain an element g of small support: given S and g, we may replace S by A∪{g}, where A⊆S is as in Lemma 2.1.

2.2 Graphs

In this paper, a graph X is a finite connected directed graph. Moreover, we allow loops on the vertices of X and multiple edges. We do not assume that X is strongly connected, i.e., it is possible that there is no directed path between some vertices x and y of X. We write V(X) for the set of vertices of X and E(X) for the set of edges of X. An edge e running from vertex i to vertex j will be written (i,j), but we warn that this notation is ambiguous as there may be more than one such edge.

We define an αβ-graph, say T, to be a graph together with a label, α or β, attached to every edge. We require that for each vertex v∈V(T) and for each γ∈{α,β}, v is incident with at least one edge labelled by γ. We also require that at most one edge starting at v is labelled by γ and at most one edge ending at v is labelled by γ. Here a loop at v counts as one incoming and one outgoing edge for v. Notice that all vertices of an αβ-graph have in-degree at most 2 and out-degree at most 2. For γ∈{α,β}, we define \(T_\gamma\) to be the subgraph of T with vertex set V(T) and edge set the set of edges of T labelled by γ.

We say that a cycle C of T is monochromatic if all of its edges are labelled α (resp. β), that is, C is a sequence of vertices \((v_1,\ldots,v_r,v_{r+1})\) with \(v_{r+1}=v_1\) and r≥2, and where for each i∈{1,…,r}, the ordered pair \((v_i,v_{i+1})\) is an edge of T labelled α (resp. β).

Given permutations \(a,b\in \operatorname {Sym}(\varOmega)\) and an injective map ι:V(T)→Ω, we say that \(T_\alpha\) is hosted by (ι,a) if \((x\iota)^a=y\iota\) for each edge or loop \((x,y)\in E(T_\alpha)\). Similarly, \(T_\beta\) is hosted by (ι,b) if \((x\iota)^b=y\iota\) for each edge or loop \((x,y)\in E(T_\beta)\). Finally, T is hosted by (ι,a,b) if \(T_\alpha\) is hosted by (ι,a) and \(T_\beta\) is hosted by (ι,b).
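Hosting is a finite check over the edges of T. In the Python sketch below, which is illustrative only, \(T_\alpha\) and \(T_\beta\) are lists of directed edges (x, y) with loops written as (x, x), a map ι is a dictionary from vertices to points, and permutations are tuples with p[i] the image of i. The hypothetical tree has vertices {0,1}, one α-edge (0,1) and β-loops at 0 and 1; it is hosted by (ι,a,b) exactly when \(1\iota=(0\iota)^a\neq 0\iota\) and b fixes both images.

```python
def cycles_to_perm(cycles, n):
    p = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def hosts(edges, iota, perm):
    # T_γ is hosted by (ι, perm) iff (xι)^perm = yι for every edge or loop (x, y).
    return all(perm[iota[x]] == iota[y] for x, y in edges)


def hosts_ab(alpha_edges, beta_edges, iota, a, b):
    injective = len(set(iota.values())) == len(iota)
    return injective and hosts(alpha_edges, iota, a) and hosts(beta_edges, iota, b)


# Hypothetical tree: V(T) = {0, 1}, α-edge (0, 1), β-loops at 0 and 1.
alpha_edges = [(0, 1)]
beta_edges = [(0, 0), (1, 1)]
n = 6
a = cycles_to_perm([(2, 3, 4)], n)
b = cycles_to_perm([(0, 1)], n)
good = hosts_ab(alpha_edges, beta_edges, {0: 2, 1: 3}, a, b)  # 2^a = 3; b fixes 2, 3
bad = hosts_ab(alpha_edges, beta_edges, {0: 0, 1: 1}, a, b)   # a fixes 0, so 0^a != 1
total = sum(hosts_ab(alpha_edges, beta_edges, {0: u, 1: v}, a, b)
            for u in range(n) for v in range(n))
print(good, bad, total)
```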

Let T be an αβ-graph. Observe that for γ∈{α,β}, the connected components of \(T_\gamma\) are of three types:

  (1) γ-loops: isolated vertices, namely the vertices v of T having a loop at v labelled with γ;

  (2) γ-cycles: monochromatic cycles all of whose edges are labelled γ;

  (3) γ-paths: maximal directed paths such that all edges are labelled with γ.

We denote by \(l_\gamma(T)\) the number of γ-loops and by \(p_\gamma(T)\) the number of γ-paths and γ-cycles.

We say that an αβ-graph is an αβ-tree if all undirected cycles are monochromatic (and so necessarily they are also directed cycles). Note that an αβ-tree may not be a tree in the usual graph-theoretic sense. The following result will be crucial.

Lemma 2.2

Let T be an αβ-graph. Then \(p_\alpha(T)+p_\beta(T)+l_\alpha(T)+l_\beta(T)\le|V(T)|+1\). Moreover, equality holds if and only if T is an αβ-tree.

Proof

Let B be the graph with vertex set the set of α-loops, α-paths, α-cycles, β-loops, β-paths and β-cycles of T. We declare two distinct vertices x and y of B adjacent if there exists a vertex v of T such that v is incident with both x and y. By construction, B is bipartite (with the α-objects comprising one class of the bipartition and the β-objects the other) and \(|V(B)|=p_\alpha(T)+p_\beta(T)+l_\alpha(T)+l_\beta(T)\). By the definition of αβ-graphs, each vertex v of T is incident with exactly one component of \(T_\alpha\) and with exactly one component of \(T_\beta\). Hence, v defines exactly one edge in B, and so |E(B)|=|V(T)|. Since T is connected, the graph B is connected, and so |V(B)|≤|E(B)|+1=|V(T)|+1, proving the first claim.

Observe that if T contains a non-monochromatic cycle, then B contains a cycle. Conversely, if B contains a cycle, then T must contain a non-monochromatic cycle. We conclude that B is a tree if and only if T has no non-monochromatic cycles, and the second claim follows from the standard fact that B is a tree if and only if |V(B)|=|E(B)|+1. □
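The quantities in Lemma 2.2 are easy to compute mechanically. In the Python sketch below (an illustration, not from the paper), an αβ-graph is given by its vertex list and two lists of directed edges (x, y), loops included, one list per label; the components of \(T_\gamma\) that are not loops are found with a union-find structure, which counts γ-paths and γ-cycles alike. The hypothetical examples are a two-vertex αβ-tree (one α-edge, β-loops at both vertices), where equality holds, and the graph whose α- and β-edges both form the directed triangle 0→1→2→0, which has non-monochromatic cycles and gives a strict inequality.

```python
def loop_and_path_counts(vertices, edges):
    """Return (l, p) for T_γ: number of γ-loops, and number of γ-paths plus γ-cycles."""
    loops = {x for x, y in edges if x == y}
    parent = {v: v for v in vertices if v not in loops}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for x, y in edges:
        if x != y:
            parent[find(x)] = find(y)
    return len(loops), len({find(v) for v in list(parent)})


def lemma_2_2(vertices, alpha_edges, beta_edges):
    # Returns (p_α + p_β + l_α + l_β, |V(T)| + 1).
    l_a, p_a = loop_and_path_counts(vertices, alpha_edges)
    l_b, p_b = loop_and_path_counts(vertices, beta_edges)
    return p_a + p_b + l_a + l_b, len(vertices) + 1


tree = ([0, 1], [(0, 1)], [(0, 0), (1, 1)])
triangle = [(0, 1), (1, 2), (2, 0)]
triangles = ([0, 1, 2], triangle, triangle)
print(lemma_2_2(*tree), lemma_2_2(*triangles))
```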

We end this subsection with some more definitions. For an αβ-graph T and for 0<δ<1, we define \(\delta_{T} := (1-\delta)^{l_{\alpha}(T)+l_{\beta}(T)}\delta^{p_{\alpha}(T)+p_{\beta}(T)}\). An isomorphism between αβ-graphs \(T_1\) and \(T_2\) is defined to be a bijection \(\varphi:V(T_1)\to V(T_2)\) that preserves edges and edge labels. If, for some \(x,y\in V(T_1)\), there are two directed edges from x to y in \(T_1\) (with necessarily different labels), then in \(T_2\) there are also two directed edges from \(x\varphi\) to \(y\varphi\). We define an automorphism of T to be an isomorphism between T and itself. The set of all such permutations of V(T) is the automorphism group \(\operatorname {Aut}(T)\).
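For small αβ-graphs, \(\operatorname{Aut}(T)\) and \(\delta_T\) can be computed by brute force. The Python sketch below (illustrative; the edge-list encoding is our assumption) tries every bijection of V(T) and keeps those that map the multiset of labelled edges onto itself. For the hypothetical two-vertex tree with one α-edge (0,1) and β-loops at 0 and 1 the group is trivial, while for the graph whose α- and β-edges both form the directed triangle 0→1→2→0 it consists of the three rotations.

```python
from collections import Counter
from itertools import permutations


def automorphisms(vertices, alpha_edges, beta_edges):
    """All label-preserving bijections φ: V(T) -> V(T), found by brute force."""
    edges = Counter([("α", x, y) for x, y in alpha_edges] +
                    [("β", x, y) for x, y in beta_edges])
    result = []
    for images in permutations(vertices):
        phi = dict(zip(vertices, images))
        mapped = Counter({(lab, phi[x], phi[y]): m for (lab, x, y), m in edges.items()})
        if mapped == edges:
            result.append(phi)
    return result


def delta_T(l_total, p_total, delta):
    # δ_T = (1 - δ)^(l_α + l_β) · δ^(p_α + p_β)
    return (1 - delta) ** l_total * delta ** p_total


tree_auts = automorphisms([0, 1], [(0, 1)], [(0, 0), (1, 1)])
triangle = [(0, 1), (1, 2), (2, 0)]
triangle_auts = automorphisms([0, 1, 2], triangle, triangle)
# The tree has l_α + l_β = 2 (two β-loops) and p_α + p_β = 1 (one α-path).
print(len(tree_auts), len(triangle_auts), delta_T(2, 1, 0.5))
```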

2.3 Words and graphs

Let T be an αβ-graph, and let \(w=w_1w_2\cdots w_k\) be a reduced word with \(w_1,\ldots,w_k\in\{\alpha,\alpha^{-1},\beta,\beta^{-1}\}\). We say that T admits w if there exists a vertex x of T such that by starting at x and by tracing the edges and the loops of T with labels \((w_1,w_2,\ldots,w_k)\), we visit all vertices and edges of T, and we return to the vertex x (here, by abuse of notation, we interpret the label \(\alpha^{-1}\) as the label α with the edge pointing in the opposite direction, and a similar convention holds for β). The vertex x∈V(T) is called a fixed vertex for (T,w); note that there may be more than one such vertex in T.
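Tracing a word through an αβ-graph is a deterministic walk, because each vertex has at most one outgoing and at most one incoming edge of each label. The Python sketch below is illustrative only; encoding α, β, \(\alpha^{-1}\), \(\beta^{-1}\) as the characters a, b, A, B is our own assumption. It lists the fixed vertices of (T,w) for \(w=[\alpha,\beta^{-1}][\alpha,\beta]\). For the hypothetical two-vertex tree with one α-edge (0,1) and β-loops at 0 and 1, only vertex 1 is a fixed vertex, since the first letter \(\alpha^{-1}\) cannot be followed from 0.

```python
W = "AbaBABab"   # α^{-1} β α β^{-1} α^{-1} β^{-1} α β, hypothetical letter encoding


def step(v, letter, alpha_edges, beta_edges):
    """Follow one letter from v; capital letters traverse an edge backwards."""
    label = letter.lower()
    edges = alpha_edges if label == "a" else beta_edges
    for x, y in edges:
        if letter.islower() and x == v:
            return y, (label, x, y)
        if letter.isupper() and y == v:
            return x, (label, x, y)
    return None   # no edge with this label leaves (or enters) v


def admits_from(x, word, vertices, alpha_edges, beta_edges):
    v, seen_vertices, seen_edges = x, {x}, set()
    for letter in word:
        moved = step(v, letter, alpha_edges, beta_edges)
        if moved is None:
            return False
        v, edge = moved
        seen_vertices.add(v)
        seen_edges.add(edge)
    all_edges = ({("a", s, t) for s, t in alpha_edges}
                 | {("b", s, t) for s, t in beta_edges})
    # Must return to x having visited every vertex and every edge.
    return v == x and seen_vertices == set(vertices) and seen_edges == all_edges


def fixed_vertices(word, vertices, alpha_edges, beta_edges):
    return [x for x in vertices if admits_from(x, word, vertices, alpha_edges, beta_edges)]


print(fixed_vertices(W, [0, 1], [(0, 1)], [(0, 0), (1, 1)]))
```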

Table 1 contains pairwise non-isomorphic αβ-trees that admit the word w=[α,β −1][α,β]. Here dashed (red) lines are labelled with α, solid (blue) lines are labelled with β, and for simplicity of drawing, all loops are omitted; thus, any vertex which is not incident to a red line (or blue line, respectively) is in fact incident to an α-loop (or β-loop, respectively). The fixed vertices are written as red stars, and under each graph T we have written the value for δ T .

Table 1: αβ-trees admitting \(w=[\alpha,\beta^{-1}][\alpha,\beta]=\alpha^{-1}\beta\alpha\beta^{-1}\alpha^{-1}\beta^{-1}\alpha\beta\)

Next, we explain how αβ-trees can be used to estimate the number of fixed points in certain permutations. Let T be an αβ-tree admitting the reduced word \(w=w_1w_2\cdots w_k\), and let x be a fixed vertex of (T,w). Starting at x and tracing w, we obtain a sequence \(U=(x=x_0,x_1,\ldots,x_k=x)\) of vertices of T such that, by definition, all v∈V(T) occur in U.

Let \(a,b \in \operatorname {Sym}(\varOmega)\). If T is hosted by (ι,a,b) for some injective map ι:V(T)→Ω, then, starting at xι and tracing the word w by using a and b for the labels α and β, respectively, we obtain the sequence \(U\iota=(x_0\iota,x_1\iota,\ldots,x_k\iota)\). In particular, xι and w uniquely determine the entire map ι, and xι is a fixed point of the permutation w(a,b).
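Tracing with permutations is equally direct: the last entry of the traced sequence is the image of the starting point under w(a,b). In the Python sketch below, the letters a, b, A, B stand for α, β, \(\alpha^{-1}\), \(\beta^{-1}\) (a hypothetical encoding) and permutations are tuples with p[i] the image of i. Here a is the 5-cycle (0 1 2 3 4) and b=(3 4) fixes both 1 and \(1^{a^{-1}}=0\), so the two-vertex tree with α-edge (0,1) and β-loops at 0 and 1 is hosted with 0ι=0 and 1ι=1; tracing from 1ι returns to 1 and visits exactly the image of ι.

```python
def cycles_to_perm(cycles, n):
    p = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def trace(point, word, a, b):
    """Points visited when tracing word from point, using a for α and b for β."""
    step = {"a": a, "A": inverse(a), "b": b, "B": inverse(b)}
    sequence = [point]
    for letter in word:
        sequence.append(step[letter][sequence[-1]])
    return sequence   # sequence[-1] is point^{w(a,b)}


W = "AbaBABab"   # α^{-1} β α β^{-1} α^{-1} β^{-1} α β
n = 5
a = cycles_to_perm([(0, 1, 2, 3, 4)], n)
b = cycles_to_perm([(3, 4)], n)
U_iota = trace(1, W, a, b)   # start at 1ι = 1; note 0ι = 0 = 1^(a^{-1})
print(U_iota)
```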

For fixed a, b, and w as above, let \(T_1,\ldots,T_m\) be pairwise non-isomorphic αβ-trees admitting w. For 1≤j≤m, we denote the number of fixed vertices in \(T_j\) (with respect to w) by \(\mathsf{fixed}(T_j)\). Also, let \(I_j\) be an index set such that for each \(z\in I_j\), there exists \(\iota_z:V(T_j)\to\Omega\) with \(T_j\) hosted by \((\iota_z,a,b)\).

Lemma 2.3

Let a,b,w,m,T j ,I j be as in the previous paragraph.

  (i) If x is a fixed vertex of \((T_{j_{1}},w)\), y is a fixed vertex of \((T_{j_{2}},w)\), and \(x\iota_{z_{1}}=y\iota_{z_{2}}\) for some \(z_{1} \in I_{j_{1}}\) and \(z_{2} \in I_{j_{2}}\), then \(j_1=j_2\), and x, y are in the same orbit of \(\operatorname {Aut}(T_{j_{1}})\).

  (ii) The number of fixed points of w(a,b) is at least

    $$ \bigl|\operatorname {fix}\bigl(w(a,b) \bigr)\bigr| \ge \sum_{j=1}^m \frac{|I_j| \cdot \mathsf {fixed}(T_j)}{|\operatorname {Aut}(T_j)|}. $$

Proof

(i) Since \(x\iota_{z_{1}}=y\iota_{z_{2}}\) and w determines \(T_{j_{1}}\iota_{z_{1}} = T_{j_{2}}\iota_{z_{2}}\), the map \(\iota_{z_{1}}\iota_{z_{2}}^{-1}\) is a label-preserving isomorphism between \(T_{j_{1}}\) and \(T_{j_{2}}\). Therefore, \(j_1=j_2\) and \(\iota_{z_{1}}\iota_{z_{2}}^{-1} \in \operatorname {Aut}(T_{j_{1}})\). Moreover, since \(x\iota_{z_{1}}\iota_{z_{2}}^{-1}=y\), x and y are in the same orbit of \(\operatorname {Aut}(T_{j_{1}})\).

(ii) For 1≤j≤m, let \(F_j\) be the set of fixed points of w(a,b) of the form \(x\iota_z\), for some fixed vertex \(x\in V(T_j)\) and \(z\in I_j\). By (i), the sets \(F_j\) are pairwise disjoint. Moreover, for \(z\in I_j\), \(\iota_z\) contributes \(\mathsf{fixed}(T_j)\) fixed points to \(F_j\), giving a total count of \(|I_j|\cdot\mathsf{fixed}(T_j)\). Part (i) also implies that any element of \(F_j\) occurs in this count at most \(|\operatorname {Aut}(T_j)|\) times. Hence \(|F_j|\ge|I_j|\cdot\mathsf{fixed}(T_j)/|\operatorname {Aut}(T_j)|\), and summing over j gives the claim. □
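The inequality in Lemma 2.3(ii) can be spot-checked by keeping only one tree on the right-hand side, which only makes the bound weaker. The Python sketch below uses the hypothetical two-vertex tree T (α-edge (0,1), β-loops at 0 and 1), for which only vertex 1 is a fixed vertex for \(w=[\alpha,\beta^{-1}][\alpha,\beta]\) and \(\operatorname{Aut}(T)\) is trivial, so the bound is simply the number of maps ι hosting T. Here a is an 8-cycle in Sym(16) and \(b=a^r\) for random r, mirroring the shape \(w(a,a^r)\) used in Sect. 3; the seeds are arbitrary and the letters a, b, A, B encode α, β, \(\alpha^{-1}\), \(\beta^{-1}\).

```python
import random


def compose(p, q):
    # Right actions: i^(pq) = (i^p)^q.
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def cycles_to_perm(cycles, n):
    p = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def trace_end(point, word, a, b):
    step = {"a": a, "A": inverse(a), "b": b, "B": inverse(b)}
    for letter in word:
        point = step[letter][point]
    return point   # = point^{w(a,b)}


def hosting_maps(a, b):
    # Hypothetical tree T: V = {0, 1}, α-edge (0, 1), β-loops at 0 and 1.
    # ι hosts T iff 1ι = (0ι)^a differs from 0ι and b fixes both points.
    return [(u, a[u]) for u in range(len(a))
            if a[u] != u and b[u] == u and b[a[u]] == a[u]]


W = "AbaBABab"
n = 16
a = cycles_to_perm([tuple(range(8))], n)   # an 8-cycle fixing 8, ..., 15
results = []
for seed in range(20):
    rng = random.Random(seed)
    r = tuple(rng.sample(range(n), n))
    b = compose(compose(inverse(r), a), r)   # b = a^r
    fix_count = sum(1 for p in range(n) if trace_end(p, W, a, b) == p)
    maps = hosting_maps(a, b)
    bound = len(maps)   # |I_T| · fixed(T) / |Aut(T)| with fixed(T) = |Aut(T)| = 1
    images_fixed = all(trace_end(v, W, a, b) == v for _, v in maps)
    results.append((fix_count, bound, images_fixed))
print(results[:5])
```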

2.4 Walks on graphs

In this subsection, we consider graphs X that are symmetric and regular in the following sense. Symmetric means that for any two vertices x,y∈V(X), the number of edges from x to y is the same as the number of edges from y to x. Regular of valency d means that each x∈V(X) has in-degree and out-degree d. For x∈V(X), we denote by Δ(x) the d-element multiset {y∣(x,y)∈E(X)}.

Definition 2.4

A lazy random walk on X is a discrete stochastic process where a particle moves from vertex to vertex in X. If, after k steps, the particle is at x∈V(X) and \(\Delta(x)=\{y_1,\ldots,y_d\}\), then at step k+1 the particle

  (i) stays at x with probability 1/2;

  (ii) moves to vertex \(y_i\) with probability 1/(2d) for all i=1,…,d.

The asymptotic rate of convergence for the probability distribution of a particle in a lazy random walk on X is an important and well-studied problem in combinatorics and computer science (see [12]). For x,y∈V(X), we write \(p_k(x,y)\) for the probability that the particle is at vertex y after k steps of a lazy random walk starting at x. For a fixed ε>0, the mixing time for ε is the minimum value of k such that

$$\frac{1}{|V(X)|} (1-\varepsilon) \le p_k (x, y) \le \frac{1}{|V(X)|} (1+\varepsilon) $$

for all x,yV(X). The following estimate is well known; for a proof, see e.g. [11, Sect. 4].

Lemma 2.5

Let X be a connected, symmetric, and regular directed graph of valency d and with N vertices, and let ε>0. Then the mixing time for ε is at most \(N^2d\log(N/\varepsilon)\).
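Lemma 2.5 can be compared with an exact computation on a tiny example. The Python sketch below (illustrative; natural logarithm assumed in the bound) propagates the distributions \(p_k(x,\cdot)\) of the lazy walk on the points {0,…,4}, moving \(x\mapsto x^g\) with g uniform in the hypothetical multiset \(S\cup S^{-1}\) for S={(0 1),(0 1 2 3 4)}, and returns the first k for which every \(p_k(x,y)\) lies in \([(1-\varepsilon)/N,(1+\varepsilon)/N]\). It then compares that k with \(N^2d\log(N/\varepsilon)\).

```python
from math import log


def cycles_to_perm(cycles, n):
    p = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def lazy_step(dist, gens):
    # Stay put with probability 1/2; otherwise move x -> x^g with g uniform in gens.
    d = len(gens)
    new = [0.5 * px for px in dist]
    for x, px in enumerate(dist):
        for g in gens:
            new[g[x]] += px / (2 * d)
    return new


def mixing_time(gens, N, eps, max_steps=100_000):
    # dists[x][y] = p_k(x, y); return the least k with every entry in [(1-eps)/N, (1+eps)/N].
    dists = [[1.0 if y == x else 0.0 for y in range(N)] for x in range(N)]
    for k in range(max_steps + 1):
        if all((1 - eps) / N <= p <= (1 + eps) / N for row in dists for p in row):
            return k
        dists = [lazy_step(row, gens) for row in dists]
    return None


n = 5
c = cycles_to_perm([tuple(range(n))], n)
gens = [cycles_to_perm([(0, 1)], n), c, inverse(c)]   # S ∪ S^{-1} for S = {(0 1), c}
eps = 0.1
t_mix = mixing_time(gens, n, eps)
bound = n ** 2 * len(gens) * log(n / eps)
print(t_mix, round(bound, 1))
```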

For \(G=\operatorname {Sym}(\varOmega)\) or \(\operatorname {Alt}(\varOmega)\) and \(G=\langle S\rangle\), we are interested in the following symmetric and regular directed graphs \(X_k\) for positive integers k. Let \(V(X_k):=\Omega^{(k)}\) and \(E(X_k):=\{(x,x^g)\mid x\in\Omega^{(k)},\ g\in S\cup S^{-1}\}\). Clearly, \(X_k\) has \(n_{(k)}\) vertices, is connected, is symmetric, and is regular of valency \(|S\cup S^{-1}|\).
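The properties of \(X_k\) claimed above can be verified directly for small parameters. This Python sketch (illustrative; permutations are tuples with p[i] the image of i, and g acts on a k-tuple coordinatewise) builds \(X_2\) for n=5 and the hypothetical generating set S={(0 1),(0 1 2 3 4)} of Sym(5), then checks that it has \(n_{(2)}=20\) vertices and is symmetric, regular of valency \(|S\cup S^{-1}|=3\), and connected.

```python
from collections import Counter, deque
from itertools import permutations


def cycles_to_perm(cycles, n):
    p = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def build_X(n, k, gens):
    """Vertices Ω^(k) and edges (x, x^g) for g in gens, g acting coordinatewise."""
    vertices = list(permutations(range(n), k))
    edges = [(x, tuple(g[w] for w in x)) for x in vertices for g in gens]
    return vertices, edges


n, k = 5, 2
S = [cycles_to_perm([(0, 1)], n), cycles_to_perm([tuple(range(n))], n)]
gens = sorted(set(S) | {inverse(s) for s in S})   # S ∪ S^{-1} without repetition
vertices, edges = build_X(n, k, gens)
edge_count = Counter(edges)
out_deg = Counter(x for x, _ in edges)
in_deg = Counter(y for _, y in edges)
symmetric = all(edge_count[(x, y)] == edge_count[(y, x)] for (x, y) in edge_count)
regular = all(out_deg[v] == in_deg[v] == len(gens) for v in vertices)
# Connectivity: breadth-first search along directed edges from one vertex.
adjacency = {v: [] for v in vertices}
for x, y in edges:
    adjacency[x].append(y)
seen, queue = {vertices[0]}, deque([vertices[0]])
while queue:
    x = queue.popleft()
    for y in adjacency[x]:
        if y not in seen:
            seen.add(y)
            queue.append(y)
print(len(vertices), symmetric, regular, len(seen) == len(vertices))
```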

It is useful to induce random walks on the graphs \(X_k\) for different k at the same time. This is done as follows. To realise a lazy random walk of length ℓ, we first choose a subset \(J\subseteq\{1,2,\ldots,\ell\}\), where J is the set of steps when the particle moves to a neighbour of the current position as in Definition 2.4(ii). The size j:=|J| is chosen from the binomial distribution B(ℓ,1/2), and then J itself is chosen from the uniform distribution on the j-element subsets of {1,2,…,ℓ}. Finally, for i∈J, we choose \(g_i\in S\cup S^{-1}\) uniformly and use \(g_i\) in the ith step to define the edge on which the particle moves. The overall effect, that is, the trajectory of a lazy random walk with initial position \(x\in V(X_k)\), is the same as computing the image of x under the permutation \(r=\prod_{i\in J}g_i\); we say that the permutation r is realised by the lazy random walk. The construction of r uses only the number ℓ and \(S\cup S^{-1}\), so the permutation r can be considered as realised by lazy random walks in more than one graph \(X_k\). Of course, these lazy random walks are not independent.
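The sampling procedure translates directly into code. In this Python sketch (illustrative; permutations are tuples with p[i] the image of i, with right actions, so the factors of r are applied in increasing order of i), J is drawn exactly as described and r is the ordered product of the chosen \(g_i\). The generating set is a hypothetical choice for Sym(5), and the empirical distribution of \(0^r\) over 2000 samples with ℓ=100 should be close to uniform.

```python
import random
from collections import Counter


def compose(p, q):
    # Right actions: i^(pq) = (i^p)^q.
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def cycles_to_perm(cycles, n):
    p = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def lazy_walk_permutation(gens, ell, rng):
    """Sample r = prod_{i in J} g_i for a lazy walk of length ell."""
    n = len(gens[0])
    j = sum(rng.random() < 0.5 for _ in range(ell))   # j ~ B(ell, 1/2)
    J = set(rng.sample(range(1, ell + 1), j))         # uniform j-subset of {1, ..., ell}
    r = tuple(range(n))
    for step in range(1, ell + 1):
        if step in J:                                 # the particle moves at this step
            r = compose(r, rng.choice(gens))          # g_i uniform in S ∪ S^{-1}
        # otherwise the particle stays put (lazy step)
    return r


n, ell = 5, 100
c = cycles_to_perm([tuple(range(n))], n)
gens = [cycles_to_perm([(0, 1)], n), c, inverse(c)]
rng = random.Random(11)
samples = [lazy_walk_permutation(gens, ell, rng) for _ in range(2000)]
counts = Counter(r[0] for r in samples)   # empirical law of 0^r
print(sorted(counts.items()))
```

Drawing j from B(ℓ,1/2) and then a uniform j-subset gives every subset of {1,…,ℓ} probability \(2^{-\ell}\), so it is the same as letting each step move independently with probability 1/2.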

Lemma 2.5 will be useful to us in the following form.

Lemma 2.6

Let S be a set of generators of \(\operatorname {Sym}(\varOmega)\) or \(\operatorname {Alt}(\varOmega)\) of cardinality at most n, and let k be a positive integer. Fix 0<ε<1 and set \(\ell\ge 2n^{2k+1}\log(n^k/\varepsilon)\). Then, if \(r \in \operatorname {Sym}(\varOmega)\) is realised by a lazy random walk of length ℓ on \(X_k\), and \(x,y\in\Omega^{(k)}\), then

$$(1-\varepsilon)\frac{1}{n_{(k)}} \leq \mathbb {P}\bigl(x^r=y \bigr) \leq(1+\varepsilon)\frac{1}{n_{(k)}}. $$

Proof

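Recall that \(n_{(k)}=|V(X_k)|\) by definition, and \(n_{(k)}\le n^k\). Furthermore, \(|S\cup S^{-1}|\le 2n\), so with \(N=n_{(k)}\) and \(d=|S\cup S^{-1}|\) we have \(N^2d\log(N/\varepsilon)\le 2n^{2k+1}\log(n^k/\varepsilon)\le\ell\), and the proof follows from Lemma 2.5. □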

Note that if \(r\in \operatorname {Sym}(\varOmega)\) is realised by a lazy random walk of length ℓ on \(X_k\), then \(r\in(S\cup S^{-1}\cup\{1\})^{\ell}\).

3 Primary machinery

The results of this section provide the primary machinery for a proof of Theorem 1.3. The main step of the proof is that given a set S of generators for \(G=\operatorname {Sym}(n)\) or \(\operatorname {Alt}(n)\) and a∈S of support size \(|\operatorname {supp}(a)|=\delta n\), we would like to construct a permutation as a short word in S with support size less than δn. The permutations we consider are of the form \(w(a,a^r)\) for an appropriately chosen reduced word w in the symbols \(\{\alpha,\alpha^{-1},\beta,\beta^{-1}\}\). In this section, we assume that w is given and describe how to choose r∈G such that \(w(a,a^r)\) has many fixed points. We obtain r as a permutation realised by a lazy random walk.

Let \(w=w_1w_2\cdots w_k\), and let \({\mathcal{T}}=\{ T_{1},\ldots,T_{m} \}\) be a set of pairwise non-isomorphic αβ-trees admitting w. In view of Lemma 2.3(ii), we would like to choose r so that each \(T \in {\mathcal{T}}\) is hosted by \((\iota,a,a^r)\) for many maps ι:V(T)→Ω. As a is fixed, it is beneficial first to examine embeddings of \(T_\alpha\) and \(T_\beta\) separately.

We prove results for two kinds of permutations a. In the “generic” case, all non-trivial cycles of a are long, compared to the α- and β-paths occurring in trees T, and in the “special” case, \(\operatorname {supp}(a)\) consists of short cycles of equal length. We fix a small set \(\varLambda\subseteq\Omega\) (in the application in Sect. 4, |Λ|≤10) and require that r fixes Λ setwise and acts on Λ in some prescribed way. The purpose of prescribing the action of r on a small set is to ensure that \(w(a,a^r)\) is not trivial. (This trick has already been used in [2].) As the points in Λ play a special role, we are only interested in injections \(\iota:V(T)\to\Omega\setminus\varLambda\). First, we handle the “generic” case.

Lemma 3.1

Let \(0<\delta_0<1/2\), and let κ,λ,N be positive integers. Suppose that \(a \in \operatorname {Sym}(\varOmega)\) has no non-trivial cycles of length less than N and that \(|\operatorname {supp}(a)|=\delta n\) for some \(\delta\in(\delta_0,1-\delta_0)\). Let γ∈{α,β}, and let T be an αβ-tree such that |V(T)|≤κ, T has no γ-cycles, and every γ-path in T has at most N vertices. Let \(\varLambda\subseteq\Omega\), |Λ|≤λ, and let

$$\mathcal{S}_{\gamma} (T):= \bigl\{ \iota: V(T) \to \varOmega \setminus \varLambda \mid T_\gamma \ \textit{is hosted by} \ (\iota,a) \bigr\} . $$

Then

$$\bigl|\mathcal{S}_{\gamma} (T)\bigr| \ge C(\delta_0,\kappa,\lambda,N,n) (1-\delta)^{l_\gamma(T)} \delta^{p_\gamma(T)} n^{l_\gamma(T)+p_\gamma(T)}, $$

where \(C(\delta_0,\kappa,\lambda,N,n)\) is a function with \(\lim_{n\to\infty}C(\delta_0,\kappa,\lambda,N,n)=1\).

By \(\lim_{n\to\infty}C(\delta_0,\kappa,\lambda,N,n)\) we mean that the variables \(\delta_0,\kappa,\lambda\) and N are fixed and n goes to ∞.

Proof

Let \(v_{1},\ldots,v_{l_{\gamma}(T)}\) be the loops of \(T_\gamma\), and let \(P_{1},\ldots,P_{p_{\gamma}(T)}\) be the directed paths of \(T_\gamma\). We embed the components of \(T_\gamma\) into \(\Omega\setminus\varLambda\) one by one and estimate \(|\mathcal{S}_{\gamma} (T)|\) by the product of the number of possible embeddings at each step.

The vertices \(v_i\), \(1\le i\le l_\gamma(T)\), have to be mapped to fixed points of a. As a has at least \(n-\delta n-\lambda\) fixed points outside Λ, \(v_{1}\iota,\ldots, v_{l_{\gamma}(T)}\iota\) can be chosen in at least \((n-\delta n -\lambda)(n-\delta n-\lambda -1) \cdots (n-\delta n - \lambda - l_{\gamma}(T)+1) \ge (n-\delta n - \lambda - \kappa)^{l_{\gamma}(T)}\) distinct ways.

Next, we consider the directed paths in \(T_\gamma\). For \(1\le i\le p_\gamma(T)\), we write \(P_{i}=(w_{i,0},\ldots,w_{i,c_{i}})\). Now, once the image of \(w_{i,0}\) under ι is chosen, in order to guarantee that ι hosts the path \(P_i\), we require that \(w_{i,j}\iota=(w_{i,0}\iota)^{a^{j}}\) for each \(j=0,\ldots,c_i\). Hence, for each \(i\in\{1,\ldots,p_\gamma(T)\}\), the image of \(P_i\) under ι is uniquely determined by \(w_{i,0}\iota\). Let \(\Delta_i\) be the union of the sets \(P_z\iota\) for z<i. Since by hypothesis a has no non-trivial cycles of length less than N, and since no γ-path of T has more than N vertices, the only requirements for choosing the image of \(w_{i,0}\) under ι are that \(w_{i,0}\iota\in\operatorname{supp}(a)\) and that

$$w_{i,0}\iota \notin \bigcup_{j=0}^{c_i}(\Delta_i \cup \varLambda)^{a^{-j}} $$

(since we have to avoid Λ and the ι-images of previously mapped \(P_z\)). A gross overestimate for the size of this union is \((c_i+1)(\lambda+|V(T)|)\le\kappa\lambda+\kappa^2\), and so \(w_{i,0}\iota\) can be chosen in at least \(\delta n-\kappa\lambda-\kappa^2\) ways. Summarizing, we obtain

$$\bigl|\mathcal{S}_{\gamma} (T)\bigr| \ge (n-\delta n - \lambda - \kappa)^{l_\gamma(T)} \bigl(\delta n - \kappa\lambda - \kappa^2\bigr)^{p_\gamma(T)}. $$

By factoring out δ, (1−δ) and n we get

$$ \bigl|\mathcal{S}_\gamma (T)\bigr| \ge C(\delta_0,\kappa, \lambda,N,n) (1-\delta)^{l_\gamma(T)} \delta^{p_\gamma(T)} n^{l_\gamma(T)+p_\gamma(T)}, $$

where

$$C(\delta_0,\kappa,\lambda,N,n)= \biggl( 1 - \frac{\kappa+\lambda}{\delta_0 n} \biggr) ^\kappa \biggl( 1-\frac{\kappa\lambda+\kappa^2}{\delta_0 n} \biggr) ^\kappa. $$

Clearly, \(\lim_{n\to\infty}C(\delta_0,\kappa,\lambda,N,n)=1\). □
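The counting in this proof can be compared with a brute-force count. The Python sketch below (illustrative; all parameters are hypothetical) takes n=30, a equal to the product of three 5-cycles on {0,…,14} (so δ=1/2 and N=5), Λ={0}, κ=3, λ=1, and the γ-part of a three-vertex T consisting of one γ-loop and one γ-path with two vertices, so \(l_\gamma(T)=p_\gamma(T)=1\). It enumerates all injective maps into \(\Omega\setminus\varLambda\) and compares \(|\mathcal{S}_\gamma(T)|\) with the bound \((n-\delta n-\lambda-\kappa)^{l_\gamma(T)}(\delta n-\kappa\lambda-\kappa^2)^{p_\gamma(T)}\) from the proof and with the main term \((1-\delta)^{l_\gamma(T)}\delta^{p_\gamma(T)}n^{l_\gamma(T)+p_\gamma(T)}\).

```python
from itertools import permutations


def cycles_to_perm(cycles, n):
    p = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return tuple(p)


def count_hosting_maps(vertices, gamma_edges, a, Lam):
    """|S_γ(T)|: injective ι from V(T) to Ω minus Λ with (xι)^a = yι on every γ-edge."""
    domain = [w for w in range(len(a)) if w not in Lam]
    count = 0
    for images in permutations(domain, len(vertices)):
        iota = dict(zip(vertices, images))
        if all(a[iota[x]] == iota[y] for x, y in gamma_edges):
            count += 1
    return count


n = 30
a = cycles_to_perm([tuple(range(5 * i, 5 * i + 5)) for i in range(3)], n)  # three 5-cycles
delta = 15 / n                     # |supp(a)| = δn = 15
vertices = [0, 1, 2]
gamma_edges = [(0, 0), (1, 2)]     # one γ-loop (l = 1) and one γ-path with 2 vertices (p = 1)
Lam = {0}
kappa, lam, l, p = 3, 1, 1, 1
count = count_hosting_maps(vertices, gamma_edges, a, Lam)
proof_bound = (n - delta * n - lam - kappa) ** l * (delta * n - kappa * lam - kappa ** 2) ** p
main_term = (1 - delta) ** l * delta ** p * n ** (l + p)
print(count, proof_bound, main_term)
```

Here the loop vertex has 15 possible images (the fixed points of a) and the start of the path has 13 (the points of supp(a) other than 0 and \(0^{a^{-1}}=4\)), so the count is 195, compared with the bound 33 from the proof and the main term 225.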

The case of “special” permutations a is very similar.

Lemma 3.2

Let \(0<\delta_0<1/2\), and let κ,λ,N be positive integers. Suppose that every cycle of \(a \in \operatorname {Sym}(\varOmega)\) has length 1 or N and \(| \operatorname {supp}(a)|=\delta n\) for some \(\delta\in(\delta_0,1-\delta_0)\). Let γ∈{α,β}, and let T be an αβ-tree such that |V(T)|≤κ, every γ-cycle in T has N vertices, and every γ-path in T has at most N vertices. Let \(\varLambda\subseteq\Omega\), |Λ|≤λ, and let

$$\mathcal{S}_{\gamma} (T):= \bigl\{ \iota: V(T) \to \varOmega \setminus \varLambda \mid T_\gamma \ {\textit{is hosted by}} \ (\iota,a) \bigr\} . $$

Then

$$\bigl|\mathcal{S}_{\gamma} (T)\bigr| \ge C(\delta_0,\kappa,\lambda,N,n) (1-\delta)^{l_\gamma(T)} \delta^{p_\gamma(T)} n^{l_\gamma(T)+p_\gamma(T)}, $$

where \(C(\delta_0,\kappa,\lambda,N,n)\) is a function with \(\lim_{n\to\infty}C(\delta_0,\kappa,\lambda,N,n)=1\).

Proof

We may follow almost verbatim the proof of Lemma 3.1. The only difference is that the list \(P_{1},\ldots,P_{p_{\gamma}(T)}\) may also contain γ-cycles, so we have to change slightly the definition of the vertices \(w_{i,0}\). If \(P_i\) is a γ-path, then \(w_{i,0}\) is the starting vertex of the path as before, while if \(P_i\) is a γ-cycle, then \(w_{i,0}\) can be chosen as an arbitrary vertex of \(P_i\). Since the γ-cycles of \(T_\gamma\) have the same length as the cycles in \(\operatorname {supp}(a)\), the rest of the proof goes through without any modification. □

Now we are ready to prove the main result of this section. Let \(0<\delta_0<1/2\) and \(\kappa,\lambda,N \in \mathbb {Z}^{+}\) be fixed, and let \(w=w_1\cdots w_k\) be a reduced word in \(\{\alpha,\alpha^{-1},\beta,\beta^{-1}\}\). Suppose further that \({\mathcal{T}}=\{ T_{1},\ldots,T_{m} \}\) is a set of αβ-trees admitting w and |V(T)|≤κ for all \(T \in {\mathcal{T}}\). Let \(G=\operatorname {Sym}(n)\) or \(\operatorname {Alt}(n)\) be generated by a set S of cardinality at most n. We do not have to distinguish the two cases (“generic” and “special”) for a anymore, so let \(a \in \operatorname {Sym}(\varOmega)\) with \(|\operatorname {supp}(a)|=\delta n\) for some \(\delta\in(\delta_0,1-\delta_0)\) and suppose that either

  • all non-trivial cycles in a have length at least N, none of the αβ-trees \(T \in {\mathcal{T}}\) have any cycles, and every α- and β-path in T has at most N vertices; or

  • every cycle of a has length 1 or N, for all \(T\in {\mathcal{T}}\), every α- and β-cycle has N vertices, and every α- and β-path has at most N vertices.

Let \(\varLambda\subseteq\Omega\), |Λ|≤λ, let \(g \in \operatorname {Sym}(\varLambda)\), and let \(\mathcal{S}_\gamma(T)\) be as in Lemmas 3.1 and 3.2. Finally, let an error bound ε>0 be given. We “collect” errors in estimates from different sources, so we choose ε′<ε such that

$$\frac{(1-\varepsilon')^3}{1+\varepsilon'}=1-\varepsilon . $$

Moreover, we may assume that n is larger than a bound \(n_0(\delta_0,\kappa,\lambda,N,\varepsilon)\) depending only on \(\delta_0,\kappa,\lambda,N\) and ε such that:

  (i) For all γ∈{α,β} and for all \(T \in {\mathcal{T}}\),

    $$ \bigl|{\mathcal{S}}_\gamma(T)\bigr| > \bigl( 1- \varepsilon'\bigr) (1-\delta)^{l_\gamma(T)} \delta^{p_\gamma(T)} n^{l_\gamma(T)+p_\gamma(T)}. \tag{3.2.1} $$

    (Note that this inequality is satisfied for large enough n, by Lemmas 3.1 and 3.2.)

  (ii) \(n^{2(\kappa+\lambda+1)}>2n^{2(\kappa+\lambda)+1}\log(n^{\kappa+\lambda}/\varepsilon')\).
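Both parameter choices above are elementary to make explicit. Since \((1-\varepsilon')^3/(1+\varepsilon')\) is continuous and strictly decreasing on [0,1], from 1 down to 0, the equation defining ε′ has a unique solution for each 0<ε<1, and it satisfies ε′<ε because \((1-\varepsilon)^3/(1+\varepsilon)<1-\varepsilon\). Condition (ii), after dividing by \(n^{2(\kappa+\lambda)+1}\), is equivalent to \(n>2\log(n^{\kappa+\lambda}/\varepsilon')=2(\kappa+\lambda)\log n+2\log(1/\varepsilon')\), and once n>2(κ+λ) the difference of the two sides is increasing in n. The Python sketch below computes ε′ by bisection (our choice of method) and the least n satisfying (ii), for the hypothetical values ε=0.1, κ=20, λ=10, assuming natural logarithms.

```python
from math import log


def f(e):
    # (1 - ε')^3 / (1 + ε') decreases strictly from 1 at ε' = 0 to 0 at ε' = 1.
    return (1 - e) ** 3 / (1 + e)


def solve_eps_prime(eps, tol=1e-12):
    # Bisection for the unique ε' in (0, 1) with f(ε') = 1 - ε.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 1 - eps:
            lo = mid        # f(mid) still too large: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def condition_ii(n, kappa, lam, eps_prime):
    # n^{2(κ+λ+1)} > 2 n^{2(κ+λ)+1} log(n^{κ+λ}/ε')  <=>  n > 2 log(n^{κ+λ}/ε')
    return n > 2 * ((kappa + lam) * log(n) + log(1 / eps_prime))


def least_n_for_ii(kappa, lam, eps_prime):
    n = 2
    while not condition_ii(n, kappa, lam, eps_prime):
        n += 1
    return n


eps = 0.1
eps_prime = solve_eps_prime(eps)
kappa, lam = 20, 10                      # hypothetical values of κ and λ
n_ii = least_n_for_ii(kappa, lam, eps_prime)
print(eps_prime, n_ii)
```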

Recall that \(\delta_{T}=(1-\delta)^{l_{\alpha}(T)+l_{\beta}(T)}\delta^{p_{\alpha}(T)+p_{\beta}(T)}\) and that fixed(T) denotes the number of fixed vertices of (T,w).

Theorem 3.3

With the notation of the previous paragraph, there exists \(r \in \operatorname {Sym}(\varOmega)\) realised by a lazy random walk of length \(n^{2(\kappa+\lambda+1)}\) such that \(r|_\varLambda=g\) and

$$\bigl|\operatorname {fix}\bigl(w \bigl(a,a^r \bigr) \bigr)\bigr| > (1-\varepsilon) n \sum_{j=1}^m \frac{\delta_{T_j} \cdot \mathsf {fixed}(T_j)}{|\operatorname {Aut}(T_j)|}. $$

By \(r|_\varLambda\) we mean the restriction of the permutation r (considered as a function r:Ω→Ω) to Λ.

Proof

Let r be realised by a lazy random walk of length \(\ell:=n^{2(\kappa+\lambda+1)}\). Our main goal is to give an estimate for the conditional expectation \(\mathbb {E}(|\operatorname {fix}(w(a,a^{r}))| \mid r|_{\varLambda}= g)\).

Let \(T \in {\mathcal{T}}\) and \(\iota \in \mathcal{S}_{\alpha}(T)\) be arbitrary but fixed. First, we give an estimate for the probability that T is hosted by \((\iota,a,a^r)\). By Lemma 2.6, for any \(x,y\in(\Omega\setminus\varLambda)^{(|V(T)|)}\),

$$\operatorname {Prob}\bigl(x^r=y \wedge r|_\varLambda=g\bigr) \ge \bigl(1- \varepsilon'\bigr)\frac{1}{n_{(|\varLambda|+|V(T)|)}\!} \ \mbox{and}\ \bigl(1+\varepsilon'\bigr)\frac{1}{n_{(|\varLambda|)}\!} \ge \operatorname {Prob}(r|_\varLambda=g). $$

Hence, for the conditional probability \(\operatorname {Prob}(x^{r}=y \mid r|_{\varLambda}=g)\), we have

$$ \operatorname {Prob}\bigl(x^r=y \mid r|_\varLambda=g \bigr) \ge \frac{1-\varepsilon'}{1+\varepsilon'} \frac{1}{(n-|\varLambda|)_{(|V(T)|)}} > \frac{1-\varepsilon'}{1+\varepsilon'} \frac{1}{n^{|V(T)|}}. $$
(3.3.1)

Since by definition \(T_\alpha\) is hosted by \((\iota,a)\), T is hosted by \((\iota,a,a^r)\) if and only if \(T_\beta\) is hosted by \((\iota,a^r)\); this is equivalent to

$$\begin{aligned} & (x\iota)^{a^r}=y\iota \quad \mbox{for all}\ (x,y)\in E(T_\beta) \\ &\quad \Longleftrightarrow\quad (x\iota)^{r^{-1}ar} = y\iota \quad \mbox{for all}\ (x,y) \in E(T_\beta) \\ &\quad \Longleftrightarrow\quad (x\iota)^{r^{-1}a} = (y\iota)^{r^{-1}} \quad \mbox{for all}\ (x,y)\in E(T_\beta) \\ &\quad \Longleftrightarrow\quad \mbox{the function}\ V(T)\to \varOmega, x\mapsto (x \iota)^{r^{-1}} \ \mbox{is in}\ \mathcal{S}_\beta(T) \\ &\quad \Longleftrightarrow\quad \bigl(V(T)\iota_z \bigr)^r=V(T) \iota \quad \mbox{for some} \ \iota_z \in \mathcal{S}_\beta(T). \end{aligned}$$

Lemma 2.3 implies that for different \(\iota_{z_{1}}, \iota_{z_{2}} \in \mathcal{S}_{\beta}(T)\), the events \((V(T) \iota_{z_{i}})^{r}=V(T)\iota\) are disjoint. Therefore, also using (3.2.1) and (3.3.1),

$$\begin{aligned} &\operatorname {Prob}\bigl(T \ \mbox{is hosted by}\ \bigl(\iota,a,a^r \bigr) \mid r|_\varLambda=g \bigr) \\ &\quad =\sum_{\iota_z \in \mathcal{S}_\beta(T)} \operatorname {Prob}\bigl( \bigl(V(T)\iota_{z} \bigr)^r=V(T) \iota \mid r|_\varLambda=g \bigr) \\ &\quad > \frac{(1-\varepsilon')^2}{1+\varepsilon'} \frac{(1-\delta)^{l_\beta(T)} \delta^{p_\beta(T)} n^{l_\beta(T)+p_\beta(T)}}{n^{|V(T)|}}. \end{aligned}$$
(3.3.2)

Next, by Lemma 2.3(ii) and (3.2.1), (3.3.2),

$$\begin{aligned} &\mathbb {E}\bigl(\bigl|\operatorname {fix}\bigl(w \bigl(a,a^r \bigr) \bigr)\bigr| \bigm\vert r|_\varLambda=g \bigr) \\ &\quad \geq \sum_{j=1}^m \sum_{\iota \in \mathcal{S}_\alpha(T_j)} \operatorname {Prob}\bigl(T_j \ \mbox{is hosted by} \ \bigl(\iota,a,a^r \bigr) \bigm\vert r|_\varLambda=g \bigr) \frac{\mathsf {fixed}(T_j)}{|\operatorname {Aut}(T_j)|} \\ &\quad > \sum_{j=1}^m \frac{(1-\varepsilon')^3}{1+\varepsilon'} \frac{(1-\delta)^{l_\alpha(T_j)+l_\beta(T_j)} \delta^{p_\alpha(T_j)+p_\beta(T_j)} n^{l_\alpha(T_j)+l_\beta(T_j) +p_\alpha(T_j)+p_\beta(T_j)}}{n^{|V(T_j)|}} \\ & \qquad {}\times \frac{\mathsf {fixed}(T_j)}{|\operatorname {Aut}(T_j)|}. \end{aligned}$$

Finally, by Lemma 2.2, \(l_\alpha(T_j)+l_\beta(T_j)+p_\alpha(T_j)+p_\beta(T_j)=|V(T_j)|+1\); combining this with the choice of ε′ yields

$$\mathbb {E}\bigl( \bigl|\operatorname {fix}\bigl(w\bigl(a,a^r \bigr) \bigr)\bigr| \bigm\vert r|_\varLambda=g \bigr) > (1-\varepsilon) n \sum_{j=1}^m \frac{\delta_{T_j} \cdot \mathsf {fixed}(T_j)}{|\operatorname {Aut}(T_j)|}. $$
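For a single summand, the simplification behind the last display reads as follows. It combines only the constant fixed by the choice of ε′ at the start of this subsection, the definition of \(\delta_{T_j}\), and the exponent count from Lemma 2.2:

$$\frac{(1-\varepsilon')^3}{1+\varepsilon'}\,(1-\delta)^{l_\alpha(T_j)+l_\beta(T_j)}\,\delta^{p_\alpha(T_j)+p_\beta(T_j)}\,\frac{n^{|V(T_j)|+1}}{n^{|V(T_j)|}} = (1-\varepsilon)\,\delta_{T_j}\,n. $$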

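To finish the proof of the theorem, we take an r, realised by a lazy random walk of length ℓ with \(r|_\varLambda=g\), for which \(|\operatorname{fix}(w(a,a^r))|\) is at least this conditional expectation. Such an r exists because a random variable takes a value at least its expectation with positive probability. □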

4 Proof of Theorem 1.3

In this section, we apply Theorem 3.3 to prove Theorem 1.3. We start with two technical lemmas.

Lemma 4.1

Let m be an integer and take \(g,h\in \operatorname {Sym}(m)\). In each of the following cases, \([h,(h^g)^{-1}][h,h^g]\) contains a 7-cycle:

  (1)

    m≥7, h=(1,2,3,…,m), g contains the cycle (1,3,m) and fixes the points 2,4,5,6,m−3,m−2,m−1;

  (2)

    m=7, h=(1,2,3,4,5)(6)(7), g=(1,6)(3,7)(2)(4)(5);

  (3)

    m=7, h=(1,2,3)(4,5,6)(7), g=(1,7,2,4)(3)(5)(6);

  (4)

    m=7, h=(1,2)(3,4)(5,6)(7), g=(1,5,7,2,3)(4)(6).

Proof

To show (1), one checks directly that (1,m,5,3,m−1,4,2) is a 7-cycle of \([h,(h^g)^{-1}][h,h^g]\). Parts (2), (3) and (4) follow by similar direct computations. □
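The following Python sketch checks Lemma 4.1 by direct computation. It assumes three conventions: permutations act on the right (so \(x^{gh}=(x^g)^h\)), \(h^g=g^{-1}hg\), and \([x,y]=x^{-1}y^{-1}xy\). Under these conventions the cycle of 1 in case (1) is exactly (1,m,5,3,m−1,4,2). In case (1) the sketch takes g to be the 3-cycle (1,3,m) itself, which is one admissible choice. All function names are ours.

```python
def from_cycles(m, *cycles):
    """Permutation of {1, ..., m} as a dict, built from disjoint cycles."""
    perm = {x: x for x in range(1, m + 1)}
    for cyc in cycles:
        for i, x in enumerate(cyc):
            perm[x] = cyc[(i + 1) % len(cyc)]
    return perm


def compose(*perms):
    """Right-action product: x^(p1 p2 ... pk) applies p1 first, then p2, and so on."""
    result = {}
    for x in perms[0]:
        y = x
        for p in perms:
            y = p[y]
        result[x] = y
    return result


def inverse(p):
    return {y: x for x, y in p.items()}


def lemma_41_word(h, g):
    """[h, (h^g)^-1][h, h^g] with h^g = g^-1 h g and [x, y] = x^-1 y^-1 x y."""
    k = compose(inverse(g), h, g)
    hi, ki = inverse(h), inverse(k)
    # [h, k^-1] = h^-1 k h k^-1 and [h, k] = h^-1 k^-1 h k
    return compose(hi, k, h, ki, hi, ki, h, k)


def cycle_through(p, x):
    """The cycle of p containing x, listed starting from x."""
    cyc, y = [x], p[x]
    while y != x:
        cyc.append(y)
        y = p[y]
    return tuple(cyc)


# Case (1), with g equal to the 3-cycle (1, 3, m).
for m in range(7, 13):
    h = from_cycles(m, tuple(range(1, m + 1)))
    g = from_cycles(m, (1, 3, m))
    print(m, cycle_through(lemma_41_word(h, g), 1))

# Cases (2), (3) and (4), all with m = 7.
cases = [
    (from_cycles(7, (1, 2, 3, 4, 5)), from_cycles(7, (1, 6), (3, 7))),
    (from_cycles(7, (1, 2, 3), (4, 5, 6)), from_cycles(7, (1, 7, 2, 4))),
    (from_cycles(7, (1, 2), (3, 4), (5, 6)), from_cycles(7, (1, 5, 7, 2, 3))),
]
for h, g in cases:
    print(cycle_through(lemma_41_word(h, g), 1))
```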

Note that in each case in Lemma 4.1 we prescribe the action of g on at most 10 points.

For 0<δ<1, define

$$\begin{aligned} f(\delta) := &(1 - \delta)^2 + (1 -\delta)^2 \delta + (1 - \delta)^3 \delta + 4 (1 -\delta)^4 \delta^2 + 2 (1 - \delta)^2\delta^3 \\ &{}+ 3 (1 - \delta)^5 \delta^3 + 10 (1 -\delta)^9 \delta^4 + 26 (1 - \delta)^7\delta^5 + 20 (1 - \delta)^8 \delta^5 \\ &{}+ 6 (1 - \delta)^5 \delta^6 + 16 (1 -\delta)^6 \delta^6 + 40 (1 - \delta)^8\delta^6 + 3 (1 - \delta)^4 \delta^7 \\ &{}+ 8 (1 - \delta)^6 \delta^7 + 20 (1 -\delta)^7 \delta^7 + 10 (1 - \delta)^8\delta^9 + 20 (1 - \delta)^7 \delta^{10} \\ &{}+ 10 (1 - \delta)^6 \delta^{11} + 15 (1 -\delta)^7 \delta^{11}. \end{aligned}$$
(4.1.1)

Lemma 4.2

  (1)

    The function δ↦1−0.999f(δ) is monotone increasing on the interval (0,1).

  (2)

    The equation 0.999f(δ)=1−δ has a unique solution in (0,1). Up to six significant digits, the solution is δ=0.632599.

  (3)

    Starting with δ=0.63, nine iterations of the function δ↦1−0.999f(δ) reach a value less than 0.326.

Proof

All three results can be established using an algebra package such as Sage [19]. □
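Lemma 4.2 can be spot-checked in floating point, although this is no substitute for an exact computation in Sage. The sketch below encodes the 19 terms of (4.1.1) and brackets the root of 0.999f(δ)=1−δ in [0.5,0.7], where a sign change is easy to verify. It then runs nine iterations of δ↦1−0.999f(δ) starting from δ=0.63. Passing a `Fraction` evaluates f exactly. Names such as `F_TERMS` and `phi` are ours.

```python
from fractions import Fraction

# Terms of f in (4.1.1): (coefficient, exponent of 1 - delta, exponent of delta).
F_TERMS = [
    (1, 2, 0), (1, 2, 1), (1, 3, 1), (4, 4, 2), (2, 2, 3),
    (3, 5, 3), (10, 9, 4), (26, 7, 5), (20, 8, 5),
    (6, 5, 6), (16, 6, 6), (40, 8, 6), (3, 4, 7),
    (8, 6, 7), (20, 7, 7), (10, 8, 9), (20, 7, 10),
    (10, 6, 11), (15, 7, 11),
]


def f(delta):
    return sum(c * (1 - delta) ** i * delta ** j for c, i, j in F_TERMS)


def phi(delta):
    """The map delta -> 1 - 0.999 f(delta) from Lemma 4.2."""
    return 1 - 0.999 * f(delta)


def fixed_point(lo=0.5, hi=0.7, steps=60):
    """Bisection for phi(delta) = delta, i.e. 0.999 f(delta) = 1 - delta, on [lo, hi].

    phi(delta) < delta at delta = 0.5 and phi(delta) > delta at delta = 0.7.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if phi(mid) < mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def iterate_phi(delta0=0.63, rounds=9):
    seq = [delta0]
    for _ in range(rounds):
        seq.append(phi(seq[-1]))
    return seq


print(f(Fraction(1, 2)))   # exact value of f at delta = 1/2
print(fixed_point())       # compare with Lemma 4.2(2)
print(iterate_phi()[-1])   # compare with Lemma 4.2(3)
```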

We are now ready to prove Theorem 1.3.

Proof of Theorem 1.3

Let a be an element of S with \(|\operatorname {supp}(a)|<Cn=0.63n\). We shall apply the results of Sect. 3 to the word \(w=w_{0}^{60}\), where \(w_0=w_0(\alpha,\beta)=[\alpha,\beta^{-1}][\alpha,\beta]\).

The proof splits into a number of cases, according to the order of a. Let \(|a|=2^{e_{1}}3^{e_{2}}e_{3}\), with \(e_3\) coprime to 6. Note that \(2^{e_{1}} \le n\) and \(3^{e_{2}} \le n\). Suppose first that \(e_3>1\). Then \(a^{2^{e_{1}}3^{e_{2}}}\) is non-trivial, and we may replace a with \(a^{2^{e_{1}}3^{e_{2}}}\) and assume that the order of a is coprime to both 2 and 3. Note that \(a^{2^{e_{1}}3^{e_{2}}}\in (S\cup S^{-1} \cup \{1\})^{n^{2}}\), and that the support of a power of a is contained in the support of a.

In Tables 1, 2, 3, 4, and 5, we present a family \({\mathcal{T}}\) of αβ-trees admitting \(w_{0}^{60}\). These αβ-trees were obtained with the help of a computer; our conventions for representing them are outlined in Sect. 2.3. Note too that if an αβ-graph admits \(w_{0}^{e}\), then it also admits \(w_{0}^{ek}\) for any positive integer k. In particular, since 60 is divisible by each of 1, 2, 3, 4 and 5, an αβ-graph admitting \(w_0^e\) for some e≤5 admits \(w_0^{60}\). The αβ-trees in these tables are pairwise non-isomorphic, have no non-identity automorphisms, contain at most 16 vertices, and have all α-paths and β-paths of length at most 4. For this family,

$$\sum_{T \in {\mathcal{T}}} \frac{\delta_{T} \cdot \mathsf {fixed}(T)}{|\operatorname {Aut}(T)|}=f(\delta), $$

where f(δ) is as in (4.1.1). If a has a cycle of length m≥7, then relabel the letters of Ω so that this cycle is equal to (1,…,m), define Λ={1,2,3,4,5,6,m−3,m−2,m−1,m}, and let \(g=(1,3,m)(2)(4)(5)(6)(m-3)(m-2)(m-1)\in \operatorname {Sym}(\varLambda)\). Otherwise every cycle of a has length at most 6, so \(e_3\), being greater than 1 and coprime to 6, must equal 5. In this case, relabel the letters of Ω so that a contains the cycle (1,2,3,4,5) and a fixes both 6 and 7. Then define Λ={1,…,7} and let g=(1,6)(3,7)(2)(4)(5).

Table 2 αβ-trees admitting \(w^{2}\) but not \(w\)
Table 3 αβ-trees admitting \(w^{3}\) but not \(w\)
Table 4 αβ-trees admitting \(w^{4}\) but not \(w^{2}\)
Table 5 αβ-trees admitting \(w^{5}\) but not \(w\)

By Lemma 2.1 we may suppose that |S|≤n. We apply Theorem 3.3 with \(\delta_0=0.3\), \(w=w_{0}^{60}\), \({\mathcal{T}}\), N=4, λ=10, g, κ=16 and ε=0.001. Since 2(κ+λ+1)=54, we obtain \(r \in (S \cup S^{-1}\cup\{1\})^{n^{54}}\) such that \(|\operatorname {fix}(w(a,a^{r}))| \ge 0.999 f(\delta)n\), where \(\delta n=|\operatorname{supp}(a)|\). Note that \(w(a,a^{r})\in (S \cup S^{-1}\cup\{1\})^{O(n^{54}+n^{2})}\) and, of course, \(n^{54}+n^{2}=O(n^{54})\). By Lemma 4.1, (1) and (2), \(w(a,a^r)\) is non-trivial because it contains a 7-cycle (a 7-cycle of \(w_0(a,a^r)\) remains a 7-cycle in its 60th power, as 7 is coprime to 60). Replacing a by \(w(a,a^r)\) and repeating the procedure above, Lemma 4.2(3) implies that in at most nine iterations we obtain a permutation a′ with support size less than 0.326n. Each iteration increases the word length by a factor O(n^2), as we may have to raise the input permutation to a suitable power to eliminate 2 and 3 from the cycle lengths, while conjugating by (a new) r and substituting into the word w contributes only constant multipliers to the word length. Hence, a′ is a word in S of length \(O(n^{70})\), and, by Theorem 1.2, \(\operatorname {diam}(\varGamma(G,S))=O(n^{78})\).
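The word-length count in the preceding paragraph can be summarised as follows, writing \(\ell_i\) (notation used only here) for the length, as a word in S, of the element obtained after i iterations. The first iteration costs a power of length at most \(n^2\) and a conjugating element of length \(n^{54}\). Each later iteration multiplies the current length by at most \(n^2\) and adds \(O(n^{54})\):

$$\ell_1=O\bigl(n^{2}+n^{54}\bigr)=O\bigl(n^{54}\bigr),\qquad \ell_{i+1}=O\bigl(n^{2}\ell_i+n^{54}\bigr)\ (1\le i\le 8),\qquad \ell_9=O\bigl(n^{54+2\cdot 8}\bigr)=O\bigl(n^{70}\bigr). $$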

Suppose next that \(e_3=1\), that is, a has order \(2^{e_{1}}3^{e_{2}}\). If \(e_1>0\), let \(k=2^{e_{1}-1}3^{e_{2}}\); otherwise let \(k=3^{e_{2}-1}\). Then \(k<n^2\), \(a^k\) has order 2 or 3, and \(a^{k}\in (S\cup S^{-1}\cup\{1\})^{n^{2}}\). We now replace a by \(a^k\), whose support is contained in that of a.
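The two reductions described above depend only on the cycle type of a. When \(e_3>1\), a is raised to the power \(2^{e_1}3^{e_2}\); when \(e_3=1\), it is raised to the power k. The Python sketch below computes this exponent from a list of cycle lengths and, as a check, the order of the resulting power. Representing a by its cycle lengths is our assumption, as are all the function names.

```python
from functools import reduce
from math import gcd


def lcm(x, y):
    return x * y // gcd(x, y)


def order_of(cycle_lengths):
    """Order of a permutation with the given cycle lengths."""
    return reduce(lcm, cycle_lengths, 1)


def split_order(order):
    """Write order = 2**e1 * 3**e2 * e3 with e3 coprime to 6."""
    e1 = e2 = 0
    while order % 2 == 0:
        order //= 2
        e1 += 1
    while order % 3 == 0:
        order //= 3
        e2 += 1
    return e1, e2, order


def reduction_exponent(cycle_lengths):
    """Exponent used to replace a by a power of a, following the two cases of the proof."""
    e1, e2, e3 = split_order(order_of(cycle_lengths))
    if e3 > 1:
        return 2 ** e1 * 3 ** e2          # the power has order e3
    if e1 > 0:
        return 2 ** (e1 - 1) * 3 ** e2    # the power has order 2
    if e2 == 0:
        raise ValueError("a must be non-trivial")
    return 3 ** (e2 - 1)                  # the power has order 3


def power_cycle_lengths(cycle_lengths, k):
    """Cycle lengths of a^k: a cycle of length L splits into gcd(L, k) cycles of length L // gcd(L, k)."""
    out = []
    for length in cycle_lengths:
        d = gcd(length, k)
        out += [length // d] * d
    return out


for lengths in ([4, 9, 5], [4, 3], [9]):
    k = reduction_exponent(lengths)
    print(lengths, k, order_of(power_cycle_lengths(lengths, k)))
```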

In Tables 6 and 7, we present two families of αβ-trees admitting \(w=w_{0}^{60}\), represented as before. The αβ-trees in Table 6 (resp. Table 7) are pairwise non-isomorphic, contain at most 10 vertices, have all α-cycles and β-cycles of length 2 (resp. 3), and have all α-paths and β-paths of length at most 1 (resp. 2).

Table 6 αβ-trees for an involution
Table 7 αβ-graphs for an element of order 3

In each cell of Tables 6 and 7, we have written \(\operatorname {Aut}=k\) to mean that the automorphism group of the corresponding αβ-tree T has size k. We define \({\mathcal{T}}\) to be the set of αβ-trees in Table 6 (resp. Table 7) when a has order 2 (resp. has order 3). For these families,

$$\sum_{T \in {\mathcal{T}}} \frac{\delta_{T} \cdot \mathsf {fixed}(T)}{|\operatorname {Aut}(T)|}=h(\delta), $$

where

$$ h(\delta)= \begin{cases} (1-\delta)^2(1+2\delta+3\delta^2+4\delta^3+5\delta^4+6\delta^5) & \mbox{if}\ a\ \mbox{has order 2};\\ (1-\delta)^2 + \delta(1-\delta)^2 + \delta(1-\delta)^3 + 2\delta^3(1-\delta)^2 + 4\delta^2(1-\delta)^4 & \\ \quad {}+ 3\delta^3(1-\delta)^5 + 12\delta^{7}(1-\delta)^4 + 6\delta^6(1-\delta)^4 + \delta^4(1-\delta)^6 & \mbox{if}\ a\ \mbox{has order 3}. \end{cases} $$

Define Λ={1,…,7}. If a has order 2, label the elements of Ω so that \(a|_\varLambda=(1,2)(3,4)(5,6)(7)\), and define g=(1,5,7,2,3)(4)(6). If a has order 3, label the elements of Ω so that \(a|_\varLambda=(1,2,3)(4,5,6)(7)\), and define g=(1,7,2,4)(3)(5)(6).

By Lemma 2.1 we may suppose that |S|≤n. We define N to equal 1 (resp. 2) when a has order 2 (resp. has order 3). We apply Theorem 3.3 with \(\delta_0=0.3\), \(w=w_{0}^{60}\), \({\mathcal{T}}\), N, λ=7, g, κ=10 and ε=0.001. Since 2(κ+λ+1)=36, we obtain \(r \in (S \cup S^{-1}\cup\{1\})^{n^{36}}\) such that \(|\operatorname {fix}(w(a,a^{r}))| \ge 0.999 h(\delta)n\).

Using Sage [19], it is easy to check that the function δ↦1−0.999h(δ) is increasing on the interval (0,1). Furthermore, for δ≤0.63 (which holds here, since the support of a power of the original element is contained in its support), we have 0.999h(δ)>0.374, and so \(|\operatorname {supp}(w(a,a^{r}))|<0.626n\). Note that \(w(a,a^{r})\in (S\cup S^{-1})^{O(n^{36})}\) and, by Lemma 4.1, (3) and (4), \(w(a,a^r)\) contains a 7-cycle.
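The numerical claim at δ=0.63 can be checked with the short sketch below, which evaluates 0.999h(0.63) in both cases; the function names are ours. Passing from δ=0.63 to all δ≤0.63 relies on the monotonicity stated above. For an involution this is easy to see directly: \(\sum_{j=0}^{5}(j+1)\delta^j=(1-7\delta^6+6\delta^7)/(1-\delta)^2\), so \(h(\delta)=1-7\delta^6+6\delta^7\), whose derivative \(-42\delta^5(1-\delta)\) is negative on (0,1).

```python
def h_order2(d):
    return (1 - d) ** 2 * (1 + 2 * d + 3 * d ** 2 + 4 * d ** 3 + 5 * d ** 4 + 6 * d ** 5)


def h_order3(d):
    u = 1 - d
    return (u ** 2 + d * u ** 2 + d * u ** 3 + 2 * d ** 3 * u ** 2 + 4 * d ** 2 * u ** 4
            + 3 * d ** 3 * u ** 5 + 12 * d ** 7 * u ** 4 + 6 * d ** 6 * u ** 4
            + d ** 4 * u ** 6)


for name, h in (("order 2", h_order2), ("order 3", h_order3)):
    bound = 0.999 * h(0.63)
    # bound is a lower bound on the fixed-point fraction; 1 - bound bounds the support fraction
    print(name, bound, 1 - bound)
```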

We now run the first part of the argument using the element \(w(a,a^r)\) instead of a as our initial element of small support. After one iteration we obtain an element \(a'\in (S\cup S^{-1}\cup \{1\})^{n^{2}O(n^{36})+n^{54}}\) with support of size less than \((1-0.999f(\delta))n\), where now δn denotes the support size of \(w(a,a^r)\), so that δ<0.626. Iterating as before, we obtain \(\operatorname {diam}(\varGamma(G,S))=O(n^{78})\). □

5 Improving Theorem 1.3

It should be clear to the reader that Theorem 1.3 is not optimal. In particular, we prove Theorem 1.3 with respect to a particular word, \(w=w_{0}^{60}\), where \(w_0=[\alpha,\beta^{-1}][\alpha,\beta]\); it is this word that yields the value C=0.63. How might one go about improving this value?

The most obvious way of improving Theorem 1.3 is via an appeal to higher powers. Consider \(w_{k}=w_{0}^{k}\), where k is any multiple of 60. All of the trees in Tables 1, 2, 3, 4 and 5 admit \(w_k\), and, for suitable choices of k, there will be yet more trees to consider. This will inevitably result in an increase in the value of C.

There is a limit to the improvement that such a strategy can yield, and we briefly explain why. Fix a word \(v_0\), let k be a positive integer, and define the word \(v_{k}=v_{0}^{k}\). Let \(\mathcal{T}=\{T_{1}, \dots, T_{m}\}\) be a set of αβ-trees admitting \(v_k\) and consider the sum

$$ g_k(\delta) = \sum _{T \in {\mathcal{T}}} \frac{\delta_{T} \cdot \mathsf {fixed}(T)}{|\operatorname {Aut}(T)|}. $$
(5.0.1)

The main result of Sect. 3, Theorem 3.3, gives a lower bound for \(|\operatorname {fix}(w(a,a^{r}))|\) in terms of \(g_k(\delta)\), ε and n; here a is an element of support δn, and r is some short word. In particular, if \(g_k(\delta)\) exceeds 1−δ (by a sufficient margin in terms of ε), then we obtain an element of smaller support than a.

The advantage of considering higher powers of the word \(v_0\) is visible in Theorem 3.3: if, say, k doubles, then new αβ-trees may be added to \(\mathcal{T}\), thereby increasing the value of \(g_k(\delta)\) for δ∈(0,1).

Using methods different from those in this paper, the authors have developed a method to show that, in the “generic” case (see Sect. 3 for an explanation of what we mean by this), there is a number \(\delta_0\in(0,1)\) such that

$$\limsup_{k\to\infty} g_k(\delta)<1-\delta $$

whenever \(\delta>\delta_0\). It is important to note that the number \(\delta_0\) depends only on the word \(v_0\). Furthermore, \(\delta_0\) corresponds to the unique all-positive solution of a certain system of polynomial equations with rational coefficients, and this system depends only on the word \(v_0\).

In the case \(v_0=w_0=[\alpha,\beta^{-1}][\alpha,\beta]\), our method shows that \(\delta_0\approx 0.64242\). One can see, then, that the value for C given in Theorem 1.3 is close to optimal when it comes to powers of the word \(w_0\).

In another direction, one might hope to improve Theorem 1.3 using an entirely different choice of word \(w_0\). With this in mind, we undertook an exhaustive search of words of length at most 20 that were balanced (i.e. \(\alpha\), \(\alpha^{-1}\), \(\beta\) and \(\beta^{-1}\) all occur the same number of times) and in which \(\alpha^{\pm1}\) and \(\beta^{\pm1}\) occur alternately. For each such word \(w_1\), we performed the following computer experiment. For a variety of values of n in the range \(10^4\le n\le 10^5\) and δ in the range 0.55≤δ≤0.70, we took a random permutation \(a\in S_n\) with \(|\operatorname {supp}(a)|=\delta n\), constructed b as a random conjugate of a, and counted how many points lie in cycles of length at most 6 in the permutation \(w_1(a,b)\). The highest counts occurred for \(w_0=[\alpha,\beta^{-1}][\alpha,\beta]\) and for related words (such as the cyclic permutations of \(w_0\) or \(w_{0}^{2}\)), and this is the reason for our use of the word \(w_0\) in the preceding proof.
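To make the experiment concrete, here is a pure-Python sketch of one trial with \(w_1=w_0\). It is not the authors' code. The sampling method (a uniformly random δn-subset followed by a rejection-sampled derangement of it), the right-action convention, the seed and the parameters are all our assumptions. The sketch returns a single count and makes no claim about the values the authors observed.

```python
import random


def random_perm_with_support(n, s, rng):
    """Uniform random permutation of range(n) moving exactly s points (s != 1)."""
    if s == 1:
        raise ValueError("no permutation moves exactly one point")
    pts = rng.sample(range(n), s)
    while True:  # rejection sampling of a derangement of pts
        img = pts[:]
        rng.shuffle(img)
        if all(x != y for x, y in zip(pts, img)):
            break
    perm = list(range(n))
    for x, y in zip(pts, img):
        perm[x] = y
    return perm


def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return inv


def apply_sequence(perms, n):
    """Right-action product p1 p2 ... pk of permutations of range(n)."""
    out = list(range(n))
    for p in perms:
        out = [p[y] for y in out]
    return out


def w0(a, b):
    """w_0(a, b) = [a, b^-1][a, b] = a^-1 b a b^-1 a^-1 b^-1 a b."""
    ai, bi = inverse(a), inverse(b)
    return apply_sequence([ai, b, a, bi, ai, bi, a, b], len(a))


def points_in_short_cycles(perm, max_len=6):
    """Number of points lying in cycles of length at most max_len (fixed points included)."""
    seen = [False] * len(perm)
    total = 0
    for x in range(len(perm)):
        if seen[x]:
            continue
        length, y = 0, x
        while not seen[y]:
            seen[y] = True
            y = perm[y]
            length += 1
        if length <= max_len:
            total += length
    return total


def trial(n, delta, rng):
    """One run: a with support round(delta * n), b a uniformly random conjugate of a."""
    a = random_perm_with_support(n, round(delta * n), rng)
    r = list(range(n))
    rng.shuffle(r)
    b = apply_sequence([inverse(r), a, r], n)  # b = r^-1 a r
    return points_in_short_cycles(w0(a, b))


rng = random.Random(0)
print(trial(10_000, 0.63, rng))
```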

We also carried out a non-exhaustive investigation into words that were either non-balanced or non-alternating. In every case, for a word \(w_1\) of this kind and for a and b as above, the computer tests suggested that permutations of the form \(w_1(a,b)\) tended to have fewer points in short cycles than \(w_0(a,b)\).