Transport-entropy inequalities on locally acting groups of permutations

Following Talagrand's concentration results for permutations picked uniformly at random from a symmetric group [Tal95], Luczak and McDiarmid generalized them to more general groups G of permutations which act suitably 'locally'. Here we extend their results by establishing transport-entropy inequalities on these permutation groups. The Talagrand and Luczak-McDiarmid concentration properties are consequences of these inequalities. The results are also generalised to a larger class of measures including Ewens distributions of arbitrary parameter $\theta$ on the symmetric group. By projection, we derive transport-entropy inequalities for the uniform law on the slices of the discrete hypercube and, more generally, for the multinomial law. These results are new examples, in the discrete setting, of the weak transport-entropy inequalities introduced in [GRST15], and they contribute to a better understanding of the concentration properties of measures on permutation groups. One typical application is deviation bounds for so-called configuration functions, such as the number of cycles of given length in the cycle decomposition of a random permutation.


Introduction
Let S_n denote the symmetric group of permutations acting on a set Ω of cardinality n, and µ_o denote the uniform law on S_n, µ_o(σ) := 1/n!, σ ∈ S_n. A seminal concentration result on S_n obtained by Maurey is the following.
Then for any subset A ⊂ S_n such that µ_o(A) ≥ 1/2, and for all t ≥ 0, one has a Gaussian deviation bound for the t-enlargement of A. Milman and Schechtman [MS86] generalized this result to some groups whose distance is invariant by translation. Such bounds classically follow by applying the Tchebychev inequality with usual optimization arguments. Talagrand's result was first extended to the uniform probability measure on products of symmetric groups by McDiarmid [McD02], and then further by Luczak and McDiarmid to cover more general permutation groups which act suitably "locally" [LM03].
For any finite subset A, let #A denote the cardinality of A. For any σ ∈ S_n, the support of σ, denoted by supp(σ), is the set {i ∈ Ω, σ(i) ≠ i}, and the degree of σ, denoted by deg(σ), is the cardinality of supp(σ), deg(σ) := # supp(σ).
The orbit of an element j ∈ Ω, denoted by orb(j), is the set of elements of Ω connected to j by a permutation of G, orb(j) := {σ(j), σ ∈ G}.
The set of orbits provides a partition of Ω.
As explained in [LM03], any 2-local group is a direct product of symmetric groups on its orbits, the alternating group (consisting of even permutations) is 3-local, and any 3-local group is a direct product of symmetric or alternating groups on its orbits.
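To make these definitions concrete, here is a small illustrative sketch (not from the paper; the example group is ours) computing supports, degrees and orbits for a permutation group given by the list of its elements. Note how the orbits partition the ground set Ω, as stated above.

```python
from itertools import permutations

def support(sigma):
    """supp(sigma): the points i with sigma(i) != i (permutations are 0-indexed tuples)."""
    return {i for i, v in enumerate(sigma) if v != i}

def degree(sigma):
    """deg(sigma) := # supp(sigma)."""
    return len(support(sigma))

def orbits(group, n):
    """Orbits orb(j) = {sigma(j), sigma in group} of {0,...,n-1}.
    Since the group is closed under composition, these sets partition Omega."""
    remaining, parts = set(range(n)), []
    while remaining:
        j = min(remaining)
        orb = {sigma[j] for sigma in group}
        parts.append(orb)
        remaining -= orb
    return parts

# Example: the direct product of S_{{0,1,2}} and S_{{3,4}} acting on 5 points,
# a 2-local group (every transposition it contains has degree 2).
group = [tuple(p) + tuple(q)
         for p in permutations(range(3)) for q in permutations(range(3, 5))]
```

Running `orbits(group, 5)` returns the two orbits {0, 1, 2} and {3, 4}, matching the statement that a 2-local group is a direct product of symmetric groups on its orbits.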
In the present paper, the concentration results of Luczak-McDiarmid and Talagrand are obtained as consequences of a weak transport-entropy inequality satisfied by the uniform law µ_o on G. We also prove weaker types of transport-entropy inequalities. Moreover, we extend the results to a larger class of probability measures on G, denoted by M.
where by definition U#ν̂(C) := ν̂(U⁻¹(C)) for any subset C of S_n.
The uniform measure µ_o on S_n belongs to the set M since µ_o = U#µ̂ with µ̂ = µ̂_2 ⊗ ··· ⊗ µ̂_n, where for each i, µ̂_i denotes the uniform law on [i].
Let us observe that the uniform distribution µ_o corresponds to the Ewens distribution with parameter 1, µ_1.
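For reference, the Ewens distribution µ_θ, θ > 0, on S_n (presumably the expression referred to as (2) later in the text) has the standard form

```latex
\mu_\theta(\sigma) \;=\; \frac{\theta^{\,|\sigma|}}{\theta(\theta+1)\cdots(\theta+n-1)}\,,
\qquad \sigma \in S_n,
```

where |σ| denotes the number of cycles in the cycle decomposition of σ; for θ = 1 the denominator equals n!, which gives back the uniform law µ_o.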
Let us now construct the class of measures M for any group G of permutations. To clarify the notations, the elements of Ω are labelled with integers, Ω = [n]. Let G_n := G and, for any j ∈ [n−1], let G_j denote the subgroup of G defined by G_j := {σ ∈ G, σ(j+1) = j+1, ..., σ(n) = n}. We denote by O_j the orbit of j in G_j.
Definition 1.1. Let G be a group of permutations. A family T = (t_{i_j,j}) of permutations of G, indexed by j ∈ {2, ..., n} and i_j ∈ O_j, is called an "ℓ-local base of G" if for every j ∈ {2, ..., n}, t_{j,j} := id, and for every i_j ≠ j, t_{i_j,j} ∈ G_j, t_{i_j,j}(i_j) = j, and deg(t_{i_j,j}) ≤ ℓ.
Lemma 1.1. For any ℓ-local base T of G, the map U_T defined by (37) is one to one.
Lemma 1.2. Any ℓ-local group of permutations admits an "ℓ-local base".
For completeness, a proof of these two lemmas is given in the Appendix. As a consequence of these lemmas, if G is an ℓ-local group, then there exists an ℓ-local base T such that the uniform probability measure µ_o satisfies µ_o = U_T#µ̂, with µ̂ = µ̂_2 ⊗ ··· ⊗ µ̂_n, where for each j, µ̂_j is the uniform law on O_j.
As for the symmetric group, given an ℓ-local base T of a group G, the class of measures M = M_T on G is made up of all probability measures on G which are push-forwards of product probability measures on O_2 × O_3 × ··· × O_n by the map U_T defined by (37). As explained above, if G is an ℓ-local group, the class M_T contains the uniform law µ_o on G for a well-chosen ℓ-local base T.
In this paper, the concentration results are derived from weak transport-entropy inequalities, involving the relative entropy H(ν|µ) between two probability measures µ, ν on G, given by H(ν|µ) := ∫ log(dν/dµ) dν if ν is absolutely continuous with respect to µ, and H(ν|µ) := +∞ otherwise.
The terminology "weak transport-entropy", introduced in [GRST15], encompasses many kinds of transport-entropy inequalities, from the well-known Talagrand transport inequality satisfied by the standard Gaussian measure on R^n [Tal96] to the usual Csiszár-Kullback-Pinsker inequality [Pin64, Csi67, Kul67] that holds for any (reference) probability measure µ on a Polish metric space X, namely ‖µ − ν‖_TV ≤ √(H(ν|µ)/2), where ‖µ − ν‖_TV denotes the total variation distance between µ and ν, ‖µ − ν‖_TV := sup_A |µ(A) − ν(A)|. Above, the supremum runs over all measurable subsets A of X. We refer to the surveys [Sam16, Sam17] for other examples of weak transport-entropy inequalities and their connections with the concentration of measure principle.
The next theorem is one of the main results of this paper. It presents new weak transport inequalities for the uniform measure on G, or any measure in the class M_T, that recover the concentration results of Theorems 1.1 and 1.2.
By definition, a subgroup G of S_n is normal if for any t ∈ S_n, t⁻¹Gt = G.
In the next theorem, the constant K_n is the cardinality of the set {j ∈ {2, ..., n}, O_j ≠ {j}}. It follows that 0 ≤ K_n ≤ n − 1, and K_n = 0 if and only if G = {id}.
Theorem 1.3. Let G be a group of permutations with ℓ-local base T. Let µ ∈ P(G) be a measure of the set M_T defined by (4).
The proofs of these results, given in the next section, are inspired by Talagrand's seminal work on S_n [Tal95] and the Luczak-McDiarmid extension to ℓ-local groups [LM03].
Comments: • If G = S_n and the class of measures M is given by (1), the Ewens distribution µ_θ introduced before is an interesting example of a measure in M satisfying condition (10). This simply follows from its expression given by (2), since for any σ, t ∈ S_n, |σ⁻¹| = |σ| and |t⁻¹σt| = |σ|. An open question is to generalize the above transport-entropy inequalities to the generalized Ewens distribution (see the definition in [MNZ12, HNNZ13]). This measure no longer belongs to the class of measures M. In other words, no Chinese restaurant process is known for simulating the generalized Ewens distribution.
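To illustrate the Chinese restaurant process mentioned above (an illustrative sketch under our conventions; function names are ours): element j is inserted into an existing cycle just after one of the j − 1 previously placed elements, each with probability 1/(θ + j − 1), or opens a new cycle with probability θ/(θ + j − 1). Exact enumeration of all insertion histories confirms that the resulting law is µ_θ(σ) = θ^|σ| / (θ(θ+1)···(θ+n−1)).

```python
from fractions import Fraction

def ewens_via_crp(n, theta):
    """Exact law of the Chinese-restaurant construction on permutations of
    {0,...,n-1}: element j joins the cycle just after a previous element i
    (probability 1/(theta+j) each, 0-indexed) or opens a new cycle
    (probability theta/(theta+j)).  Returns {permutation tuple: probability}."""
    theta = Fraction(theta)
    dist = {(0,): Fraction(1)}  # the unique permutation of {0}
    for j in range(1, n):
        new = {}
        for sigma, p in dist.items():
            sigma = list(sigma)
            # j opens a new cycle: it becomes a fixed point
            key = tuple(sigma + [j])
            new[key] = new.get(key, Fraction(0)) + p * theta / (theta + j)
            # j is inserted just after element i in i's cycle
            for i in range(j):
                tau = sigma + [sigma[i]]
                tau[i] = j
                key = tuple(tau)
                new[key] = new.get(key, Fraction(0)) + p / (theta + j)
        dist = new
    return dist

def cycles(sigma):
    """|sigma|: number of cycles in the cycle decomposition."""
    seen, c = set(), 0
    for i in range(len(sigma)):
        if i not in seen:
            c += 1
            while i not in seen:
                seen.add(i)
                i = sigma[i]
    return c
```

For n = 3 and θ = 2, each permutation σ receives exactly the mass 2^|σ| / (2·3·4), and θ = 1 returns the uniform law, in line with the identification µ_1 = µ_o.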
• From the triangle inequality satisfied by the Wasserstein distance W_1, the transport-entropy inequality (7) is clearly equivalent to the following transport-entropy inequality, for all probability measures ν on G. Here is a popular dual formulation of this transport-entropy inequality, valid for all 1-Lipschitz functions ϕ : G → R (with respect to the distance d). For the uniform measure on S_n, K_n = n − 1, and this property is widely commented in [BHT06]; it is also a consequence of Hoeffding's inequalities for bounded martingales (see page 18 of [Hoe63]). The concentration results derived from item (a) are of the same nature as the ones obtained by the "bounded differences approach" in [Mau79, McD89, McD02, LM03, BDR15].
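To spell out the "Tchebychev inequality with usual optimization arguments" step: if the dual formulation for 1-Lipschitz functions takes the classical Hoeffding form ∫ e^{λ(ϕ−µ(ϕ))} dµ ≤ e^{K_n c(ℓ)² λ²/8} for all λ ≥ 0 (our reading of the elided display, consistent with the exponent K_n c(ℓ)² λ²/8 appearing in Section 2), then the exponential Tchebychev inequality gives a Gaussian tail:

```latex
\mu\big(\varphi \ge \mu(\varphi) + t\big)
 \;\le\; e^{-\lambda t}\int e^{\lambda(\varphi-\mu(\varphi))}\,d\mu
 \;\le\; \exp\!\Big(-\lambda t + \tfrac{K_n\,c(\ell)^2\,\lambda^2}{8}\Big)
 \;=\; \exp\!\Big(-\tfrac{2t^2}{K_n\,c(\ell)^2}\Big),
```

the last equality coming from the optimal choice λ = 4t/(K_n c(ℓ)²).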
• Similarly, by Proposition 4.5 and Theorem 2.7 of [GRST15], and using the identity above, we may easily show that the weak transport-entropy inequality (8) is equivalent to the following dual property: for any real function ϕ on G and for any 0 < α < 1,
(∫ e^{α Q̃_{K_n} ϕ} dµ)^{1/α} (∫ e^{−(1−α)ϕ} dµ)^{1/(1−α)} ≤ 1,
where the infimum-convolution operator Q̃_t ϕ, t ≥ 0, is defined with the averaged quadratic transport cost. Moreover, following our proof of (12) in the next section, for each α ∈ (0, 1) the inequality (12) can be improved by replacing the square cost function by the convex cost c_α(u) ≥ u²/2, u ≥ 0, given in Lemma 2.2. More precisely, (12) holds with Q̃_{K_n} ϕ replaced by Q̃^α_{K_n} ϕ, defined analogously for any σ ∈ G, t > 0.
• Proposition 4.5 and Theorem 9.5 of [GRST15] also provide a dual formulation of the weak transport-entropy inequality (9): for any real function ϕ on G and for any 0 < α < 1,
(∫ e^{α Q̂ϕ} dµ)^{1/α} (∫ e^{−(1−α)ϕ} dµ)^{1/(1−α)} ≤ 1,
where the infimum-convolution operator Q̂ϕ is defined below. As explained at the end of this section, property (13) directly provides the following version of Talagrand's concentration result for any measure on G of the set M_T.
Corollary 1.1. Let G be a group of permutations with ℓ-local base T. Let µ ∈ P(G) be a measure of the set M_T defined by (4). Assume that µ and G satisfy the conditions of (b) in Theorem 1.3. Then, for all A ⊂ G and all α ∈ (0, 1), one has the corresponding enlargement inequality, with the same definition for c(ℓ)² as in part (b) of Theorem 1.3. As a consequence, by the Tchebychev inequality, a deviation bound follows for any α ∈ (0, 1) and all t ≥ 0. For α = 1/2 and µ = µ_o the uniform law on an ℓ-local group G, this result is exactly Theorem 2.1 of Luczak-McDiarmid [LM03], which generalizes Theorem 1.2 on S_n (since S_n is a 2-local group).
By projection arguments, Theorem 1.3 applied with the uniform law µ_o on the symmetric group S_n also provides transport-entropy inequalities for the uniform law on the slices of the discrete cube {0, 1}^n. Namely, for n ≥ 1, let us denote by X_{k,n−k}, k ∈ {0, ..., n}, the slices of the discrete cube defined by X_{k,n−k} := {x ∈ {0,1}^n, x_1 + ··· + x_n = k}. The uniform law on X_{k,n−k}, denoted by µ_{k,n−k}, is the push-forward of µ_o by the projection map P : S_n → X_{k,n−k}, σ ↦ 1_{σ([k])}.
Theorem 1.4. Let µ_{k,n−k} be the uniform law on X_{k,n−k}, a slice of the discrete cube.
(a) For all probability measures ν_1 and ν_2 on X_{k,n−k}, the transport-entropy inequalities hold, where W_1 is the Wasserstein distance associated to d_h, T_2 is the weak optimal transport cost defined by (6) with d = d_h, and C_{k,n−k} = min(k, n − k).
(b) For all probability measures ν_1 and ν_2 on X_{k,n−k}, the inequality (14) holds, where T_2(ν_2|ν_1) is defined as an infimum over the couplings π(x, y) = ν_1(x) p_x(y), x, y ∈ X_{k,n−k}.
Up to constants, the weak transport inequality (14) is the stronger one, by comparison of the transport costs for all ν_1, ν_2 ∈ P(X_{k,n−k}). The proof of Theorem 1.4 is given in Section 3. The transport-entropy inequality (14) is derived by projection from the transport-entropy inequality (9) for the uniform measure µ_o on S_n. The same projection argument could be used to reach the results of (a) from the transport-entropy inequality of (a) in Theorem 1.3, but it provides worse constants. The constant C_{k,n−k} is obtained by working directly on X_{k,n−k}, following similar arguments as in the proof of Theorem 1.3.

Remark:
The results of Theorem 1.4 also extend to the multinomial law. Let E = {e_1, ..., e_m} be a set of cardinality m and let k_1, ..., k_m be a collection of non-zero integers satisfying k_1 + ··· + k_m = n. The multinomial law µ_{k_1,...,k_m} is by definition the uniform law on the set X_{k_1,...,k_m} := {x ∈ E^n, #{i, x_i = e_t} = k_t for all t ∈ [m]}. For any x ∈ X_{k_1,...,k_m}, one has µ_{k_1,...,k_m}(x) = k_1!···k_m!/n!. As a result, the weak transport-entropy inequality (14) holds on X_{k_1,...,k_m}, replacing the measure µ_{k,n−k} by the measure µ_{k_1,...,k_m}. The proof of this result is a simple generalization of the one on X_{k,n−k}, using the projection map P : S_n → X_{k_1,...,k_m} defined by P(σ) = x if and only if σ maps, for every t ∈ [m], the t-th block of [n] (of size k_t) onto the set {i, x_i = e_t}. The details of this proof are left to the reader.
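As a sanity check of the projection argument (illustrative; this is our encoding of the projection map, with consecutive blocks of [n] of sizes k_1, ..., k_m playing the role of the letters e_1, ..., e_m), pushing the uniform law on S_n forward indeed yields the multinomial law with µ_{k_1,...,k_m}(x) = k_1!···k_m!/n!:

```python
from itertools import permutations
from math import factorial
from collections import Counter

def project(sigma, sizes):
    """P(sigma) = x, where x_i records which block of [n] (split into
    consecutive blocks of the given sizes) is mapped onto position i."""
    block = []
    for t, k in enumerate(sizes):
        block += [t] * k
    x = [None] * len(sigma)
    for j, i in enumerate(sigma):
        x[i] = block[j]          # position i receives the label of j's block
    return tuple(x)

def pushforward_uniform(n, sizes):
    """Push the uniform law on S_n forward by the projection map P."""
    counts = Counter(project(sigma, sizes) for sigma in permutations(range(n)))
    return {x: c / factorial(n) for x, c in counts.items()}
```

For n = 4 and sizes (2, 2), the push-forward charges the C(4,2) = 6 points of the slice with mass 2!·2!/4! = 1/6 each; for sizes (2, 1, 1) it charges 12 points with mass 1/12 each, matching the multinomial formula.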
A straightforward application of transport-entropy inequalities is deviation bounds for different classes of functions. As an illustration, we present below deviation bounds that can be reached from Theorem 1.3 for any measure in M_T. A similar corollary can be derived from Theorem 1.4 on the slices of the discrete cube.
For any h : G → R, the mean of h is denoted by µ(h) := ∫ h dµ.

Corollary 1.2. Let G be a group of permutations with ℓ-local base T, G ≠ {id}.
Let µ ∈ P(G) be a measure of the set M T defined by (4). Let g be a real function on G.
(a) Assume that there exists a function β : G → R_+ controlling the increments of g for all τ, σ ∈ G. Then the corresponding deviation bounds hold, where the constants c(ℓ) and K_n are defined as in part (a) of Theorem 1.3.
(b) Assume that µ and G satisfy the conditions of (b) in Theorem 1.3. Let g be a so-called configuration function; this means that there exist functions α_k : G → R_+ such that for all σ, τ ∈ G, g(τ) − g(σ) ≤ Σ_{k=1}^n α_k(τ) 1_{τ(k) ≠ σ(k)}. Then, for all v ≥ 0, λ ≥ 0, and for all u ≥ 0, the corresponding deviation bounds hold.
Comments and examples: • The above deviation bounds of g around its mean µ(g) are directly derived from the dual representations (11), (12), (13) of the transport-entropy inequalities of Theorem 1.3, when α goes to 0 or α goes to 1. By classical arguments (see [Led01]), Corollary 1.2 also implies deviation bounds around a median M(g) of g, but we lose in the constants with this procedure. However, starting directly from Corollary 1.1, we get the following bound under the assumption of (b): for all u ≥ 0, the deviation bound above the median directly follows from Corollary 1.1 by optimizing over all α ∈ (0, 1). With identical arguments, the same bound can be reached for µ(g ≤ M(g) − u).
• In (a), the bound above the mean is a simple consequence of (11). As stated in (a), this bound also holds for the deviations under the mean, and it can be slightly improved by replacing sup_{σ∈G} β(σ)² by 4µ(β²). This small improvement is a consequence of the weak transport inequality with the stronger cost T_2. The same kind of improvement could be reached for the deviations above the mean under additional Lipschitz regularity conditions on the function β.
By applying the results of (b) (or even (15)) to the particular function g_x(σ) = ϕ(x_σ), σ ∈ G, we recover, and extend to any group G with ℓ-local base T and to any measure in M_T satisfying (10), the deviation inequality of Adamczak, Chafaï and Wolff [ACW14] (Theorem 3.1) obtained from Theorem 1.2 by Talagrand, since the increments of g_x are suitably controlled for any σ, τ ∈ G. This concentration property on S_n (with ℓ = 2) plays a key role in the approach of Adamczak et al. [ACW14] to study the convergence of the empirical spectral measure of random matrices with exchangeable entries as the size of the matrices increases.
• As a second example, for any t in a finite set F, let (a^t_{i,j})_{1≤i,j≤n} be a collection of non-negative real numbers and consider the function g(σ) := sup_{t∈F} Σ_{k=1}^n a^t_{k,σ(k)}. This function satisfies, for any σ, τ ∈ G, a configuration-type increment bound, where t(τ) ∈ F is chosen so that the supremum defining g(τ) is reached at t(τ). Let us also consider the function h(σ) := sup_{t∈F} Σ_{k=1}^n (a^t_{k,σ(k)})². The mean of h, µ(h), can be interpreted as a variance term as regards g.
Observing that g satisfies the condition of (b) with α_k(τ) = a^{t(τ)}_{k,τ(k)}, and that |α|_2² ≤ h, Corollary 1.2 provides Bernstein deviation bounds for all u ≥ 0. If the real numbers a^t_{i,j} are bounded by M, then |α|_2² ≤ Mg, and therefore Corollary 1.2 also provides further bounds for all u ≥ 0. If we want to bound the deviation above the mean in terms of the variance term µ(h), it suffices to observe that the last inequality provides deviation bounds for the function h, replacing g by h and M by M². Then, as a consequence of all the above deviation results, it follows that for all λ, v, γ ≥ 0, and choosing v = u/2, we get a Bernstein deviation inequality for the deviation of g above its mean. All the previous deviation inequalities extend to countable sets F by monotone convergence.
When F is reduced to a singleton, these deviation results simply imply Bernstein deviation bounds for g(σ) = Σ_{k=1}^n a_{k,σ(k)} when −M ≤ a_{i,j} ≤ M for all 1 ≤ i, j ≤ n, by following for example the procedure presented in [BDR15, Section 4.2]. Thus, we extend the deviation results of [BDR15] to probability measures in M_T.
• As a last example, let g(σ) = |σ|_l denote the number of cycles of length l in the cycle decomposition of a permutation σ. Let us show that g is a configuration function. Let C_l(τ) denote the set of cycles of length l in the cycle decomposition of a permutation τ. One has g(τ) − g(σ) ≤ #(C_l(τ) \ C_l(σ)). If c ∈ C_l(τ) and c ∉ C_l(σ), then there exists k in the support of c such that τ(k) ≠ σ(k). As a consequence, one has g(τ) − g(σ) ≤ Σ_{k=1}^n α_k(τ) 1_{τ(k) ≠ σ(k)}, where α_k(τ) = 1 if k is in the support of a cycle of length l of the cycle decomposition of τ, and α_k(τ) = 0 otherwise. Thus, we get that the function g satisfies the condition of (b): g is a configuration function. Finally, observing that |α|_2² = lg, Corollary 1.2 provides, for any measure µ ∈ M_T satisfying (10) and all u ≥ 0, the corresponding Bernstein bound.
• The aim of this paper is to clarify the links between Talagrand-type concentration results on the symmetric group and functional inequalities derived from transport-entropy inequalities. For brevity's sake, applications of these functional inequalities are not fully developed in the present paper. However, let us briefly mention some other applications using concentration results on the symmetric group: the stochastic travelling salesman problem for sampling without replacement (see the Appendix of [Pau14]), and graph coloring problems (see [McD02]). We also refer to the surveys and books [DP09, MR02] for numerous other examples of applications of the concentration of measure principle in randomized algorithms.
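For the cycle-counting example, here is a small exact computation (illustrative) of the configuration function g(σ) = |σ|_l, recovering the classical fact that the expected number of l-cycles under the uniform law µ_o is 1/l for l ≤ n:

```python
from itertools import permutations
from math import factorial
from fractions import Fraction

def cycle_count(sigma, l):
    """g(sigma) = |sigma|_l: the number of cycles of length l
    in the cycle decomposition of sigma (a tuple on {0,...,n-1})."""
    seen, count = set(), 0
    for i in range(len(sigma)):
        if i in seen:
            continue
        length, j = 0, i
        while j not in seen:       # walk the cycle containing i
            seen.add(j)
            j = sigma[j]
            length += 1
        if length == l:
            count += 1
    return count

def mean_l_cycles(n, l):
    """E[|sigma|_l] under the uniform law on S_n, by exact enumeration."""
    total = sum(cycle_count(s, l) for s in permutations(range(n)))
    return Fraction(total, factorial(n))
```

For instance `mean_l_cycles(5, 2)` returns exactly 1/2, and `mean_l_cycles(5, 1)` (the expected number of fixed points) returns 1.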
Proof of Corollary 1.2. We start with the proof of (b). From the assumption on the function g, we get a bound valid for any p ∈ P(G). As α goes to 1, (13) applied to the function λg yields the first estimates, and a further bound when |α|_2² ≤ Mg. As α goes to 0, (13) yields the remaining estimate. The deviation bounds of (b) follow from (16), (19), (17), (18) by the Tchebychev inequality, optimizing over all λ ≥ 0.
2. Proof of Theorem 1.3
Let T_n = (t_{i_j,j}, j ∈ {2, ..., n}, i_j ∈ O_j) be an ℓ-local base of G. Let µ be a probability measure of the set M_{T_n} given by (4). Then there exists a product probability measure ν̂ = ν̂_1 ⊗ ··· ⊗ ν̂_n such that µ = U_{T_n}#ν̂, where the map U_{T_n} is given by (37).
Each transport-entropy inequality of Theorem 1.3 is obtained by induction over n, using the partition (H_i)_{i∈orb(n)} of the group G defined for any i ∈ orb(n) = O_n. According to our notations, H_n = G_{n−1} is a subgroup of G, and we may easily check that T_{n−1} is an ℓ-local base of this subgroup. We also observe that if G is a normal subgroup of S_n, then G_{n−1} is a normal subgroup of S_{n−1}.
These properties are needed in the induction step of the proofs.
When G is an ℓ-local group, let us note that if i and l are elements of O_n = orb(n), then from the ℓ-local property there exists t_{i,l} ∈ G such that t_{i,l}(i) = l and deg(t_{i,l}) ≤ ℓ. We also have H_l = H_i t_{i,l}. If moreover µ = µ_o is the uniform law on G, then we will use in the proofs the following property: for any σ ∈ H_n, one has σ t_{i,n} ∈ H_i and σ t_{i,n} t_{i,l}⁻¹ ∈ H_l. The measure µ_n is the uniform measure on the ℓ-local subgroup H_n = G_{n−1}.
Proof of (a) in Theorem 1.3. As already mentioned, since W_1 satisfies a triangle inequality, the transport-entropy inequality (7) is equivalent to the following one: for all ν ∈ P(G),
(2/c(ℓ)²) W_1²(ν, µ) ≤ K_n H(ν|µ).
A dual formulation of this property, given by Theorem 2.7 in [GRST15] and Proposition 3.1 in [Sam17], is the following: for all functions ϕ on G and all λ ≥ 0,
∫ e^{λQϕ} dµ ≤ e^{λ∫ϕ dµ + K_n c(ℓ)² λ²/8},
with Qϕ(σ) := inf_{τ∈G} {ϕ(τ) + d(σ, τ)}, σ ∈ G.
We will prove the inequality (23) by induction on n. Assume that n = 2. If G = {id}, then K_n = 0 and the inequality (23) is obvious. If G ≠ {id}, then G is the two-point space, G = S_2, ℓ = 2. In that case, (23) exactly corresponds to the following dual form of the Csiszár-Kullback-Pinsker inequality (5) (see Proposition 3.1 in [Sam17]): for any probability measure ν on a Polish space X and any measurable function f : X → R,
∫ e^{R_c f} dν ≤ e^{∫ f dν + c²/8},
with R_c f(x) := inf_{p∈P(X)} { ∫ f dp + c ∫ 1_{x≠y} dp(y) }, x ∈ X.
The induction step will also be a consequence of (24). Let (H_i)_{i∈O_n} be the partition of G defined by (20). Any p ∈ P(G) admits a unique decomposition p = Σ_{i∈O_n} p̄(i) p_i, with p_i ∈ P(H_i). This decomposition defines a probability measure p̄ on O_n. In particular, according to the definition of the measure µ ∈ M_{T_n}, and since ν̂_n(i) = µ(H_i), one has µ = Σ_{i∈O_n} ν̂_n(i) µ_i.

It follows that
where the last equality is a consequence of property (21). Now, we will bound the right-hand side of this equality by using the induction hypotheses.
For any function g : G → R and any t ∈ G, let g t : G → R denote the function defined by g t (σ) := g(σt).
For any function f : H_n → R and any σ ∈ H_n, let us note the corresponding averaged quantities. The next step of the proof relies on the following lemma, which is obtained using the decomposition (25) of the measures p ∈ P(G) on the H_j's. Let σ ∈ H_n. By the triangle inequality and using the invariance by translation of the distance d, one has
∫ d(σ t_{i,n}, τ) dp(τ) = Σ_{l∈O_n} ∫_{H_l} d(σ t_{i,n}, τ) dp_l(τ) p̄(l),
and therefore the first inequality of Lemma 2.1 follows. The proof of the second inequality of Lemma 2.1 is similar, starting from the following triangle inequality:
∫ d(σ t_{i,n}, τ) dp(τ) = Σ_{l∈O_n} ∫_{H_l} d(σ t_{i,n}, τ) dp_l(τ) p̄(l) ≤ Σ_{l∈O_n} ∫ d(σ t_{i,n}, τ t_{i,l}) dp_l(τ) p̄(l) + Σ_{l∈O_n} ∫_{H_l} d(τ t_{i,l}, τ) dp_l(τ) p̄(l).
The induction step of the proof of (23) continues by applying consecutively Lemma 2.1 (1), the Hölder inequality, and the induction hypothesis applied to the measure µ_n on the subgroup H_n = G_{n−1} with ℓ-local base T_{n−1}.
If O_n = {n}, then K_n = K_{n−1} and
∫ e^{λQϕ} dµ = ∫ e^{λQϕ(σ)} dµ_n(σ) ≤ e^{λ∫ϕ dµ_n + K_{n−1} c(ℓ)² λ²/8} = e^{λ∫ϕ dµ + K_n c(ℓ)² λ²/8}.
If O_n ≠ {n}, then K_n = K_{n−1} + 1 and for any i ∈ O_n, where, by using property (21), φ̄(l) := ∫ϕ dµ_l = ∫ϕ_{t_{l,n}} dµ_n. Let us consider again the above infimum-convolution R_c φ̄ defined on the space X = O_n, with c = c(ℓ). By applying (24) with the probability measure ν = ν̂_n on O_n, the previous inequality gives the result. This ends the proof of (23) for any µ ∈ M_{T_n}. The scheme of the induction proof of (23), with a better constant c(ℓ) when µ = µ_o is the uniform measure on an ℓ-local group G, is identical, starting from the second result of Lemma 2.1 and using property (22). This is left to the reader.
We now turn to the induction proof of the dual formulation (12) of the weak transport-entropy inequality (8). The scheme of the proof is identical to that of (23).
For the initial step n = 2, one has G = S_2 and ℓ = 2, and in that case the result follows from the following infimum-convolution property.
Lemma 2.2. For any probability measure ν on a Polish metric space X, for all α ∈ (0, 1) and all measurable functions f : X → R bounded from below,
(∫ e^{α R_α f} dν)^{1/α} (∫ e^{−(1−α) f} dν)^{1/(1−α)} ≤ 1,
where R_α f is defined for all x ∈ X with the cost c_α, an explicit convex function satisfying c_α(u) ≥ u²/2 for all u ∈ [0, 1]. Observing this lower bound on c_α, the above inequality also holds replacing R_α f by the operator defined with the quadratic cost u²/2, x ∈ X.
The proof of this lemma can be found in [Sam07] (inequality (4)). For the sake of completeness, we give in the Appendix a new proof of this result on finite spaces X, using a localization argument (Lemma 4.1).
Let us now present the key lemma for the induction step of the proof. For any function f : H_n → R and any σ ∈ H_n, we define the operator Q^{H_n}_t f. Here, writing Q^{H_n}_t f, we omit the dependence on c(ℓ) to simplify the notations. The proof relies on the following lemma.

and t_{i,l} denotes an element of G with deg(t_{i,l}) ≤ ℓ and such that t_{i,l}(i) = l.
The proof of this lemma is similar to the one of Lemma 2.1. By (26) and the inequality (a + b)² ≤ a²/s + b²/(1 − s), valid for any s ∈ (0, 1), we get an estimate of (∫ d(σ t_{l,n}, τ) dp(τ))². It follows that the claimed bound holds for any σ ∈ H_n, where the last equality follows by choosing s = K_{n−1}/K_n, which ends the proof of the first inequality of Lemma 2.3. The second inequality of Lemma 2.3 is obtained identically, starting from (27). We now turn to the induction step of the proof. By the decomposition of the measure µ on the H_i's, we want to bound the left-hand side of (12), where the last equality is a consequence of property (21).
If O_n = {n}, then the result simply follows from the induction hypothesis applied to the measure µ_n.
If O_n ≠ {n}, then applying successively Lemma 2.3 (1), the Hölder inequality, and the induction hypothesis, we get a bound where, by property (21), we set the quantities accordingly. According to the definition of the infimum convolution Rφ̄ on the space X = O_n given in Lemma 2.2, the last inequality can be rewritten in that form, and therefore Lemma 2.2, applied with the measure ν = ν̂_n, provides the result. The proof of (12) is completed for any measure µ ∈ M. To improve the constant when µ = µ_o is the uniform law on an ℓ-local group G, the proof is similar, using the second inequality of Lemma 2.3 together with property (22).
Proof of (b) in Theorem 1.3. We prove the dual equivalent property (13) as a consequence of the following stronger result: for any real function ϕ on G and for any j ∈ {1, ..., n},
(∫ e^{α Q_j ϕ} dµ)^{1/α} (∫ e^{−(1−α) ϕ} dµ)^{1/(1−α)} ≤ 1,
where the infimum-convolution operator Q_j ϕ is defined by (30), for σ ∈ G. The proof of (29) relies on Lemma 2.2 and the following ones. For any σ ∈ G, we define the associated quantities, and similarly for j ∈ [n−1].
Lemma 2.4. Let j ∈ [n]. For any σ ∈ G, the stated identity holds.
This result follows from the change of variables σ(k) = l in the definition (30) of Q_j ϕ(σ), where for the last equality we use the fact that the map that associates to any measure p ∈ P(G) the image measure q := R#p, with R : σ ∈ G ↦ σ⁻¹ ∈ G, is one to one from P(G) to P(G).
Here is the key lemma for the induction step of the proof of (29).
(2) For any ℓ ≥ 2, let c(ℓ)² := 8(ℓ − 1)² + 2. Assume that O_n ≠ {n} and let i, j ∈ O_n, i ≠ j. We note D_i := supp(t_{j,n}⁻¹ t_{i,n}) \ {i} and d := #D_i. Then for any σ ∈ H_n and any θ ∈ [0, 1], the stated estimate holds.
(3) For any ℓ ≥ 2, let c(ℓ)² := 2(ℓ − 1)² + 2. Assume that O_n ≠ {n} and let i, j ∈ O_n, i ≠ j. Let t_{i,j} ∈ G be such that t_{i,j}(i) = j and deg(t_{i,j}) ≤ ℓ. We note D_i := supp(t_{i,j}) \ {i} and d := #D_i. Then for any σ ∈ H_n and any θ ∈ [0, 1], the analogous estimate holds.
Proof. The first part of this lemma follows from the inclusion P(H_j) ⊂ P(G) and the fact that ∫ 1_{σ t_{j,n}(j) ≠ y(j)} dp(y) = 0 for σ ∈ H_n and p ∈ P(H_j). Therefore, according to the definition of Q_j ϕ, one has the claimed identity for σ ∈ H_j. For the proof of the second part of Lemma 2.5, let us consider p_i^l, l ∈ D_i, a collection of measures in P(H_i), and p_j ∈ P(H_j) (j ≠ i). For θ ∈ [0, 1], the corresponding mixture is a probability measure on G. Therefore, according to the definition of Q_i ϕ, a bound holds for any σ ∈ H_n. Since σ ∈ H_n and p_i^l ∈ P(H_i), one has ∫ 1_{σ t_{i,n}(i) ≠ y(i)} dp_i^l(y) = 0 and ∫ 1_{σ t_{i,n}(i) ≠ y(i)} dp_j(y) = 1. For any k ∈ [n] and l ∈ D_i, let us note U_i(k, l) := ∫ 1_{σ t_{i,n}(k) ≠ y(k)} dp_i^l(y), and U_j(k) := ∫ 1_{σ t_{i,n}(k) ≠ y(k)} dp_j(y).

By the Cauchy-Schwarz inequality, one has
We also have the corresponding estimates, and all the above estimates together provide the desired bound. Observe that d ≤ 2(ℓ − 1). Therefore, according to the definition of c(ℓ), one has 2d² + 2 ≤ c(ℓ)². As a consequence, we get the announced inequality from all the estimates above, by optimizing over all p_i^l ∈ P(H_i) and all p_j ∈ P(H_j), where we used successively the arguments listed below. This ends the proof of part (2) of Lemma 2.5.
For part (3), the only minor change is the last step, where the corresponding arguments are used successively. The proof of Lemma 2.5 is completed.
We will now prove (29) by induction over n. For n = 2, G is the two-point space S_2, which is 2-local. For i ∈ {1, 2} and any p ∈ P(G), the cost term (1/c(2)²) (∫ 1_{σ(i) ≠ y(i)} dp(y))² can be compared with the cost of Lemma 2.2. As a consequence, we get the expected result from Lemma 2.2 applied with X = G.
We now present the induction step. We assume that (29) holds at rank n − 1 for all j ∈ {1, ..., n − 1}.
Let us first explain why it suffices to prove (29) for j = n. For any t ∈ S_n, let G^(t) := t⁻¹Gt. The isomorphism c_t : G → G^(t), σ ↦ t⁻¹σt, pushes forward the measure µ to the measure µ^(t) := c_t#µ ∈ P(G^(t)), and conversely µ = c_{t⁻¹}#µ^(t). Let j ∈ [n]. For any σ ∈ G^(t) and any real function ϕ on G, one has the corresponding change-of-variables identity. From this observation, choosing t⁻¹ = t_{j,n} and setting ψ := ϕ ∘ c_{t⁻¹}, one may rewrite ∫_G e^{α Q_j ϕ} dµ accordingly. If we assume that G is a normal subgroup of S_n and that µ satisfies the second property of (10), then G^(t) = G and µ^(t) = µ. Therefore the above expression is bounded by 1 as soon as (29) holds for j = n. If we assume that G is an ℓ-local group and µ = µ_o is the uniform law on G, then G^(t) is also an ℓ-local group and µ^(t) is exactly the uniform law on G^(t). Therefore the last expression is bounded by 1 as soon as (29) holds with j = n for any uniform law on an ℓ-local group. In conclusion, it remains to prove inequality (29) for j = n.
We may assume that O_n ≠ {n}, otherwise the induction step is obvious. We first apply Lemma 2.4, using the first property of (10) satisfied by µ. We choose j ∈ O_n such that min_{k∈O_n} ĝ(k) = ĝ(j).
By the induction hypothesis applied to the measure µ_n on the subgroup H_n = G_{n−1}, it follows that (32) holds. Let us now consider i ≠ j, i ∈ O_n. When G is a normal subgroup of S_n, property (21), the second part of Lemma 2.5 and Jensen's inequality yield, for any θ ∈ [0, 1], a bound in terms of
θ log ∫ e^{α Q^{H_n,l} g_{t_{i,n}}} dµ_n + (1 − θ) log ∫ e^{α Q^{H_n} g_{t_{j,n}}} dµ_n.
By the induction hypothesis applied with the measure µ_n on the normal subgroup G_{n−1} = H_n of S_{n−1}, and from property (21), it follows that (33) holds. We get the same inequality when G is an ℓ-local group and µ = µ_o is the uniform law on G, by using property (22), the third part of Lemma 2.5 and the induction hypothesis applied to the uniform measure µ_n on the ℓ-local subgroup G_{n−1} = H_n.
According to the definition (28) of the infimum-convolution operator Rĝ defined on the space X = O_n, we may easily check the corresponding comparison for every i ∈ O_n. Therefore, optimizing over all θ ∈ [0, 1], we get from (32) and (33), for all i ∈ O_n: ∫ e^{α Q_i g} dµ_i ≤ e^{α Rĝ(i)}.
Finally, from Lemma 2.2 applied with the measure ν = ν̂_n on O_n, the equality (31) gives the result.
The proof of (29) is completed.
3. Transport-entropy inequalities on the slice of the cube.
Proof of (a) in Theorem 1.4. We adapt to the space X_{k,n−k} the proof of (a) in Theorem 1.3. In order to avoid redundancy, we only present the main steps of the proof. By duality, it suffices to prove that (34) holds for all functions ϕ on X_{k,n−k} and all λ ≥ 0, with the operator defined for x ∈ X_{k,n−k}, and that (35) holds for any 0 < α < 1, with the operator defined for t > 0, x ∈ X_{k,n−k}.
The proof is by induction over n and 0 ≤ k ≤ n. For any n ≥ 1, if k = n or k = 0, the set X_{k,n−k} is reduced to a singleton and the inequalities (34) and (35) are obvious.
For n = 2 and k = 1, X_{k,n−k} is a two-point set, and (34) and (35) directly follow from property (24) and Lemma 2.2 on X = X_{1,1}.
For the induction step, we consider the collection of subsets Ω_{i,j}, with i, j ∈ {1, ..., n}, i ≠ j. Since, for any x ∈ X_{k,n−k}, any probability measure p on X_{k,n−k} admits a unique decomposition along these subsets, by Theorem 9.5 of [GRST15] the weak transport-entropy inequality (14) is equivalent to the following property, which we want to establish: for any real function f on X_{k,n−k} and for any 0 < α < 1, (36) holds, where
Q̂f(x) := inf_{p∈P(X_{k,n−k})} { ∫ f dp + (1/8) Σ_{i=1}^n ( ∫ 1_{x_i ≠ y_i} dp(y) )² }, x ∈ X_{k,n−k}.
Let us apply property (13) to the function f ∘ P : S_n → R. Since µ_{k,n−k} = P#µ_o, we get a corresponding inequality, from which (36) is an easy consequence of the following result.
It remains to prove this lemma. By definition of Q̂, one argues as follows. Let p ∈ P(S_n) be such that P#p = q.
For y ∈ X_{k,n−k}, let us note Y := {i ∈ [n], y_i = 1}. Then P(τ) = y if and only if τ([k]) = Y. Assume now that j ∉ [k]; if τ([k]) = Y and σ(j) ∈ Y, then we also have τ(j) ≠ σ(j). From these observations, we get a bound on Σ_{j=1}^n ( ∫ 1_{σ(j) ≠ τ(j)} dp(τ) )². This inequality provides the result. The proof of Lemma 3.2 and of (b) in Theorem 1.4 is completed.

Appendix
Proof of Lemma 1.1. Let T = (t_{i_j,j}) be an ℓ-local base of a group of permutations G = G_n. In order to prove that the map U_T is one to one, it suffices to construct its inverse.
This ends the proof of Lemma 1.1.
Proof of Lemma 1.2. Let G = G_n be an ℓ-local group. From the definition of the ℓ-local property, it is clear that each subgroup G_j, j ∈ {2, ..., n}, is ℓ-local. As a consequence, for any i_j ∈ O_j, i_j ≠ j, there exists t_{i_j,j} ∈ G_j such that t_{i_j,j}(i_j) = j and deg(t_{i_j,j}) ≤ ℓ.
This completes the proof of Lemma 1.2.
Proof of Lemma 2.2. Let α ∈ (0, 1) and f be a real function on the finite set X. We want to show that for any probability measure ν on X,
(∫ e^{α R_α f} dν)^{1/α} (∫ e^{−(1−α) f} dν)^{1/(1−α)} ≤ 1.
We will apply the following lemma, whose proof is given at the end of this section.
Lemma 4.1. Let F be a real function on X and K ∈ R. Let us consider the set C of probability measures on X satisfying the linear constraint given by F and K. If C is not empty, then the extremal points of this convex set are Dirac measures or convex combinations of two Dirac measures on X.
The left-hand side of this inequality is invariant by translation of the function f by a constant. Therefore, by symmetry, we may assume that 0 = f(y) ≤ f(x). It follows that R_α f(y) = 0. Therefore, we want to check that the claimed inequality holds for any non-negative function f on {x, y} and any λ ∈ [0, 1], which follows since ψ is a convex function on [0, 1].