Interacting spin-2 fields in the Stueckelberg picture

We revisit and extend the `Effective field theory for massive gravitons' constructed by Arkani-Hamed, Georgi and Schwartz in the light of recent progress in constructing ghost-free theories with multiple interacting spin-2 fields. We show that there exist several dual ways of restoring gauge invariance in such multi-gravity theories, find a generalised Fierz-Pauli tuning condition relevant in this context and highlight subtleties in demixing tensor and scalar modes. The generic multi-gravity feature of scalar mixing and its consequences for higher order interactions are discussed. In particular we show how the decoupling limit is qualitatively changed in theories of interacting spin-2 fields. We relate this to dRGT (de Rham, Gabadadze, Tolley) massive gravity, Hassan-Rosen bigravity and the multi-gravity constructions by Hinterbichler and Rosen. As an additional application we show that EBI (Eddington-Born-Infeld) bigravity and higher order generalisations thereof possess ghost-like instabilities.


Introduction
Massive gravity has recently experienced a remarkable renaissance. While at the linearised level the Fierz-Pauli action [1] has long been known to consistently describe a single massive spin-2 field, any concrete fully non-linear extension/deformation of general relativity (GR) designed to give the graviton a mass appeared to inevitably run into problems [2][3][4][5][6][7][8]. Generating a mass by adding an 'interaction term' (i.e. a potential) for the metric changes the constraint structure of the theory and generically leads to the appearance of a new ghost-like degree of freedom (dof ): the so-called Boulware-Deser ghost [3]. However, recently it was shown that a consistent and ghost-free theory of a massive spin-2 field can be constructed despite of these obstacles: dRGT (de Rham, Gabadadze, Tolley) gravity [9][10][11]. For some related recent progress see . The particular ghost-free potential that lies at the heart of dRGT gravity is constructed using a fixed Minkowski reference metric η in addition to a dynamical metric g, so the theory is intrinsically bimetric 1 . As a result the dRGT interaction terms have been extended to the case of two fully dynamical metrics, culminating in the construction of a consistent and ghost-free theory of bigravity: Hassan-Rosen bigravity [36][37][38]. This theory describes one massless and one massive spin-2 field, consistent with the findings of [39] that any consistent theory of interacting spin-2 fields can contain at most one massless spin-2 field. For related cosmological and phenomenological studies see [40][41][42][43][44][45][46][47][48][49][50][51][52][53][54][55][56][57][58] and [59][60][61][62][63][64][65][66][67][68][69][70] for dRGT/massive gravity and Hassan-Rosen bigravity theories respectively. Having successfully constructed consistent theories for a single and two dynamical spin-2 fields, a particularly intriguing question arises: What is the picture for N interacting spin-2 fields? Can we construct consistent, ghost-free models of an arbitrary number of spin-2 fields? What is their structure and what are the physical features and associated phenomenology of such theories? In [71] the dRGT/Hassan-Rosen interaction terms were generalised by Hinterbichler and Rosen to produce a particular set of consistent and ghost-free interactions for N interacting spin-2 fields, opening the doors for investigating such 'multi-gravity' models with a concrete ghost-free proposal. For related multi-gravity work also see [72][73][74][75][76].
An essential stepping stone in the process leading up to the construction of consistent theories of massive gravity, and their extensions to multiple spin-2 fields, was the systematic development of an 'Effective field theory for massive gravitons' 2 [77] -also see [78][79][80]. This uses the so-called Stückelberg trick to project out all relevant dof in the theory and make the nature of their interaction terms explicit (in this context also see [81]). In particular this approach establishes the hierarchy of energy scales in the effective theory, its cutoff scale and how this scale can be altered/raised. Consequently it makes transparent which interactions are most relevant at low energies. Especially interesting is also the relation of these studies to higher dimensional theories of gravity [33, 78-80, 82, 83]. In the effective field theory context, the particular models investigated by [77] were two site models with Fierz-Pauli interactions as well as some specific generalisations to theories of N spin-2 fields, again with bimetric 1 Note that this 'bimetricity' is a generic feature of massive gravity proposals, as a tensor other than g is typically needed in order to generate any non-trivial (and non-GR) interaction terms. In dRGT this role is played by a non-dynamical Minkowski metric. However, note another interesting recent proposal for a massive gravity theory: The non-local massive gravity scenario proposed by [34] -also see [35]. The proposed fully non-linear action considered there does not rely on an external reference metric at the expense of introducing non-local interactions. 2 This is an effective field theory in the sense of constructing a general framework for arbitrary potentials, i.e. non derivative interaction terms between the different spin-2 fields. Derivative interactions other than self-interactions provided by the Ricci scalar are not considered here.

Fierz-Pauli interactions.
In this paper we endeavour to complete this effective field theory picture by constructing what we dub an 'Effective field theory for interacting spin-2 fields'. Specifically this means we extend and generalise the 'Effective field theory for massive gravitons' by [77] to deal with generic (i.e. non Fierz-Pauli) N-metric interaction terms -especially relevant in the light of the afore-mentioned progress in massive/multi-gravity -and general combinations of interaction terms as discussed by [71] rather than restricting ourselves to purely nearest-neighbour interactions as in [77]. This allows us to identify several qualitatively new features of generic multi-spin-2 field scenarios. The overarching aim here is to systematically understand the dof and energy scales involved in such theories, leading to a better understanding of N spin-2 field theories constructed with known ghost-free interactions of the dRGT/Hassan-Rosen/Hinterbichler-Rosen type as well as potentially paving the way for the discovery of new additional consistent interactions. In the process of constructing an effective field theory view in the Stückelberg picture, we discuss some of the new features and subtleties that appear when the approach of [77] is generalised to generic multi-spin-2 field scenarios. In particular we also extend the approach to a detailed analysis of higher order interactions in the so-called decoupling limit as well as uncovering 'mixing phenomena' at quadratic order. Throughout we will work in the metric picture -for a discussion of how this relates to interacting spin-2 field theories constructed in the vielbein picture (as in [71]) we refer to [84]. 3 The outline for this paper is as follows. In section 2 we introduce theories of interacting spin-2 fields, discuss how such theories can be constructed in general and what the dynamical dof are. In section 3 we review the Stückelberg trick in the context of massive gravity and discuss how it can be extended to restore gauge invariance in theories with multiple spin-2 fields. Various subtleties involving this extension are discussed in section 4, where we also show that there are several dual approaches to introduce Stückelberg fields in interacting N spin-2 field theories. We construct a particularly economical approach and also discuss some features of 'loops' in theory graphs. In section 5 we then discuss how the Stückelberg scalars -Goldstone bosons of the N − 1 broken diffeomorphism invariances in a theory with N spin-2 fields and corresponding to the longitudinal scalar components of the 'massive gravitons' -can be demixed from tensor perturbations and how they acquire a kinetic term in the process. We furthermore derive a generalised Fierz-Pauli tuning condition, which theories of interacting spin-2 fields need to satisfy and discuss gauge-fixing issues relevant to eliminating scalar-tensor mixing at lowest order. In section 6 we show that interacting spin-2 fields generically display scalar mixing, i.e. the Stückelberg scalars mix kinetically and the true propagating dof are in fact linear combinations of the Stückelberg scalars. This is a feature absent in bigravity (or single spin-2 field) theories, but generic in the presence of more than two fields. A discussion of higher order interactions and the decoupling limit follows in section 7, where we show that the decoupling limit in multi-gravity theories is qualitatively different from that in bi-and massive gravity. Finally, in sections 8,9 and 10, we use the machinery developed throughout the paper to analyse three example theories: Eddington-Born-Infeld (EBI) bigravity, Hassan-Rosen bigravity and an EBI bigravity extension to four interacting spin-2 fields respectively. The EBI bigravity case is especially noteworthy, since we show that this theory has ghost-like instabilities. In section 11 we conclude and summarise our findings. Some further useful results are collected in appendices A, B and C.

Interacting spin-fields
A multi metric theory: We begin with the following schematic action for N-spin-2 fields g (i) µν S = S site + S int + S matter . (2.1) The first term S site encompasses all derivative self-interaction terms for the g (i) µν , while S int contains (non-derivative) interactions between all g (i) µν and S matter describes the coupling of the spin-2 fields to other matter in the universe. As such we may write S matter = S matter Φ M , g (1) µν , . . . , g (N ) µν , (2.4) where Φ M labels all other matter fields (M being the labelling index). Note that we have made two key simplifying assumptions: Firstly, no derivative interactions between different spin-2 fields are considered. Secondly, the derivative self-interactions of each spin-2 field are taken to be the corresponding Ricci scalar (for some recent progress in understanding different 'kinetic terms' for spin-2 fields see [87][88][89]) and the associated mass scale is assumed to be universal, i.e. M 2 P l(i) = M 2 P l . To clarify our notation: Throughout this paper we use bracketed (Latin) indices to label different fields -they are also 'site indices' as we will explain shortly. Whether these indices are upper or lower indices has no meaning. Greek indices are space-time indices, which are raised and lowered with the flat Minkowski metric η.
The coupling to matter: By construction S site provides consistent and ghost-free self-interactions for all the spin-2 fields 4 . In what follows we will therefore focus on S int , devising a systematic way to understand the physics coming out of the dof arising as a result of the symmetries broken by S int in theories of multiple interacting spin-2 fields. While we will not have much to say about the matter sector S matter , it is worth making a few observations before exclusively focusing on S int throughout the rest of this paper. Firstly note that we can always choose to minimally couple matter to just one g (i) µν S matter = S matter Φ M , g (i) µν , (2.5) This ensures that a theory with healthy and consistent S site and S int remains so after coupling to matter -the constraint structure remains unchanged by the minimal coupling to just one spin-2 field [37]. A particular bi-metric coupling has recently been investigated by [66]. Note, however, that multi-metric couplings generically fall into two categories. Either the (weak)  equivalence principle is fully respected and consequently matter couples to a single effective metricg (matter) µν (minimally in the associated Jordan frame).
(2.6) g (matter) µν will in general be a function of all N spin-2 fields g (i) µν in the theory. We now rewrite the action in terms ofg (i) µν , i.e. we 'rotate' our spin-2 field space from g µν =g (matter) µν for some n. Expressed in this basis finding a consistent and ghost-free theory once again becomes equivalent to establishing the correct structure forS site andS int in terms of theg (i) µν . 5 Alternatively, if the matter coupling is 'truly bimetric' (as in [66]) different matter species Φ M will couple to different effective metrics, leading to weak equivalence principle violations (which it may indeed be interesting to constrain).
A single interaction term: The kinetic terms in S site manifestly respect N general coordinate invariances -one for each spin-2 field. Following the convention of [77,79,80] we denote these by GC (i) , where the label i indicates with which site/spin-2 field a given symmetry group is to be associated. As a result of the presence of N GC (i) there are N diffeomorphism invariances where d (i) is a diffeomorphism. However, S int will generically break several GC (i) . Let us begin by considering individual interaction terms. Following in the footsteps of [71,77,79,80], it is helpful to represent these terms graphically with the ultimate aim of painting a theory graph representing a full multi-metric theory. Figure 1 shows the possible interaction terms ordered by the number of participating spin-2 fields (we choose the same 'theory graph conventions' as [71]). Every site/node corresponds to a spin-2 field, a line connecting two sites denotes a bimetric interaction term and n sites connected via lines to a single small black node denotes an n-metric interaction vertex. Here we picture a system of 13 interacting spin-2 fields with interaction terms involving two or three fields. Three graphs internally connected by interactions, but disconnected from all the other fields, remain. As a result we have three surviving copies of GC and 13 − 3 = 10 propagating modes/Goldstone bosons for this setup. Each line for bimetric interactions breaks one copy of GC (i) and hence contributes one Goldstone boson π (i) , whereas N -metric interactions break N − 1 copies of GC (i) contributing N − 1 Goldstone bosons π (i) .
Consider one such vertex for n fields in isolation as in figure 1 (c). Before the interaction term is introduced there are n non-interacting spin-2 fields, so n unbroken copies of GC (i) which result in 2 degrees of freedom (dof ) on each site corresponding to those of a massless graviton. Consequently there are 2n degrees of freedom in total. The interaction term results in the breaking of n − 1 copies of this gauge invariance, leaving only one copy intact (this copy being the diagonal subgroup of all n previously unbroken GC (i) ). As a consequence of the broken symmetry we have n − 1 (propagating) Goldstone modes which get eaten by n − 1 spin-2 fields. These fields become massive as a result. The full theory therefore contains one massless spin-2 field (2 dof ) and n − 1 massive spin-2 fields (with 5 dof each). These take up a total of 2 + 5(n − 1) dof . However, a priori each spin-2 field contributes 6 dof , resulting in 6n dof for the case we are considering here. The surviving copy of GC (i) removes 4 dof and, as we just worked out, the single massless and n − 1 massive fields take up another 2 + 5(n − 1) dof . This leaves us with 6n − 4 − 2 − 5(n − 1) = n − 1 dof . These are n − 1 potential Boulware-Deser ghosts and only if S int has the right structure to enforce constraints, eliminating the dof associated to these would-be ghosts, is the theory healthy. In the following sections we will discuss the Stückelberg trick, which will allow us to explicitly understand the physics associated to all these dof , in particular the nature of their interactions.
Multiple interaction terms: But before moving on to the Stückelberg trick, let us briefly think about how to put the above reasoning for a single interaction vertex together in order to produce a theory graph for an arbitrary S int . We can write a general interaction term with n participating spin-2 fields where the c i 1 ...in are constant coefficients. Note that the n = 1 case is not fundamentally different from the other n, but since the only Lorentz scalar we can form with just one spin-2 field is a constant, the n = 1 term is simply a cosmological constant. Now consider a generic multi-metric theory with interaction terms as depicted in figure 2 6 . If the theory of interest is described by a single fully connected theory graph, then S int breaks all but one copy of GC, reducing the N general coordinate invariances GC (i) down to a single diagonal subgroup. As pointed out in [71] the situation is somewhat different for more general S int as depicted in figure 2. The number of unbroken copies of GC (i) corresponds to the number of disconnected 'islands' in the theory graph. As a result, in figure 2, 3 copies of GC (i) remain intact, so that the theory contains 3 massless and 10 massive spin-2 fields, which have eaten the 10 Goldstone bosons arising from the broken copies of GC (i) . Establishing whether any of the potential 10 Boulware-Deser ghosts are present requires knowledge of the exact nature of the interaction terms. A variety of interaction terms have been considered in the context of multiple interacting spin-2 fields: nearest neighbour Fierz-Pauli interactions as well as more general Fierz-Pauli terms, e.g. for truncated Kaluza-Klein models [80], are still plagued by ghosts. dRGT massive gravity [9][10][11] was the first successful (ghost-free) model of a single interacting massive graviton -its action is that of a bimetric theory with one non-dynamical metric (the Minkowski η µν ) and interaction terms formed out of elementary symmetric polynomials of √ g µν η να . The same type of interaction term has been generalised to the case of a fully dynamical bimetric theory and shown to be ghost-free by Hassan and Rosen [36][37][38] (see example II in section 9) and subsequently for N -metric interactions by Hinterbichler and Rosen [71]. Whether other types of ghost-free interaction terms exist for such theories remains to be seen. Consequently our philosophy throughout this paper (until we consider some concrete examples at the end, at least) will be to remain agnostic about the particular form of any interaction term and analyse N spin-2 field theories without imposing any particular interaction structure.

Link fields and the Stückelberg trick
The Stückelberg field: A generic multi-metric action such as is effectively a gauge-fixed action 7 , as indicated by the fact that N − 1 copies of GC (i) are broken by S int . We can restore the broken gauge invariances by employing the so-called Stückelberg trick. Consider a site i in our theory graph with associated spin-2 field g (i) µν and general coordinate invariance GC (i) (parametrised by diffeomorphisms d µ (i) (x)), which has been broken by a single interaction term. S int is consequently not invariant under the gauge transformation We now introduce the Stückelberg field Y µ (i) and use it to replace all occurrences of g µν in the interaction term via the following substitution, mimicking the diffeomorphism invariance symmetry αβ (Y (x)). (3.3) 6 Note we have restricted ourselves to bimetric and trimetric interaction terms in figure 2 to provide a simple example, but there is of course no fundamental reason why more spin-2 fields cannot participate. 7 This is partially gauge-fixed in the sense that the gauge-invariance for the surviving single copy of GC (i) is still fully intact.
Interacting spin-2 fields Figure 3. The Stückelberg trick: We start with a set of sites/spin-2 fields connected by (in the case here bimetric) interaction terms. Stückelberg/link fields are introduced, mapping interaction terms onto individual sites. Note that it is an arbitrary choice to which site (that participates in the interaction) the term is mapped and that there is no directedness to the lines before introducing a link field. We then expand the Stückelberg fields around their new site basis, projecting out a Goldstone boson associated with the symmetry broken by the original interaction term. Doing this to all the interaction terms we end up with a theory in which all the gauge-invariances broken by interaction terms in the original theory are restored at the expense of introducing the A (i) as gauge degrees of freedom. This way of applying the Stückelberg trick corresponds to approach I as shown in equation (4.1). Dual approaches are discussed below. Note that each link leads to interaction terms solely depending on one Goldstone boson A (i) , i.e. there are no cross-terms between different Goldstone bosons in the approach shown here (there is scalar-tensor mixing, however, which we will later see reintroduces Goldstone mixing).
Note that, as far as the R (i) term is concerned, this is a gauge transformation with gauge func- µν is now also invariant under gauge transformation parametrised by d µ Having replaced g µν , S int is therefore now invariant under gauge transformations generated by d µ (i) (x). 8 We will see shortly that the Stückelberg replacement corresponds to introducing extra gauge degrees of freedom into the action.
Functional composition: Taking a lead from [77], it will be useful to introduce some notation at this point, writing gauge transformations in terms of functional composition •. Consider three objects φ, a µ , g µν transforming as a scalar, one-form and metric under GC (i) respectively. d µ (i) (x) is the gauge function generating these transformations. Establishing a dictionary for writing these transformations we have Clearly gauge transformations are much simpler in the functional composition language as we no longer need to worry about the valence of the object in question. For an arbitrary tensor p (i) , which transforms under GC (i) , but remains invariant under other copies of general coordinate invariance GC (j) (where i = j) we therefore have The Stückelberg transformation that makes an action, which is built out of p (i) and other fields not transforming under GC (i) , invariant under GC (i) is therefore where p and Y transform as Link fields: So far we have only thought about making an interaction term, which is initially not invariant under a particular GC (i) , invariant under the same group of transformations. In a setting such as dRGT massive gravity, where only one metric is dynamical, this is all that is required. However, in the context of N dynamical spin-2 fields, the Stückelberg trick can be made much more powerful than this. By specifying a more complex transformation law for the Stückelberg field Y µ , this field can be viewed as a 'link field' mapping an object living on site i (i.e. transforming under GC (i) and invariant under GC (j) , where i = j) into an object living on site j (i.e. transforming under GC (j) and invariant under GC (k) , where k = j, but e.g. k = i is allowed). This means that Y µ has to transform under Let us check that g (i) • Y (ij) now has the correct transformation properties under both GC (i) and GC (j) It is instructive to think of the link field Y (ij) as generating the pull-back of g (i) from site i onto site j. The dRGT case, with only one dynamical spin-2 field, in this language corresponds to mapping onto the site of a non-dynamical metric, such as η µν , which does not transform under any GC (j) . Note that, alternatively, one could also map the non-dynamical metric, which does not transform under any GC (j) , to the site of the dynamical metric. What about more general interactions? For a bimetric interaction between two spin-2 fields g (i) and g (j) , as already considered in [77], we can use the generalised Stückelberg field Y µ (ij) to map the whole interaction onto one site, say j (3.13) The resulting S int is invariant under GC (i) by construction (nothing transforms under GC (i) ) and as long as f (. . .) describes a Lorentz scalar, it will also be invariant under GC (j) . Note that the √ g (j) term in the initial S int could be absorbed into f , but we have kept it here to indicate that the form of the interaction term quite often suggests a natural site to which to map the term. Extending this formalism to an n spin-2 field interaction vertex is now straightforward. Consider a single such vertex (3.14) which transforms under GC (i 1 ) . . . GC (in) and hence breaks n − 1 gauge symmetries. We choose an arbitrary site i k where k ∈ {1, n} and introduce n − 1 link fields Y µ (i j i k ) for each j ∈ {1, n} = k (we are of course free to also define Y µ (i k i k ) , which is the identity). Note we have not projected out the determinant of any of the participating spin-2 fields to stay fully general here. Applying the Stückelberg fields as before and defining j 1 , . . . j n−1 to label all sites except for i k we have This action is now invariant under all GC (j 1 ) , . . . , GC (j n−1 ) by construction, with all spin-2 fields being mapped onto the i k site, i.e. transforming under GC (i k ) . In this way one can restore the full n gauge invariances for an n-vertex interaction term. Throughout the rest of this paper we will call the act of introducing Stückelberg fields and restoring gauge invariances in the process "to Stückelberg " -for ease of terminology we will somewhat loosely use this as a verb. It is also worth emphasizing that there are important subtleties when applying this iteratively to several interactions in a theory graph, which we will discuss below in section 4.
Projecting out Goldstone modes: In understanding the nature of the Stückelberg /link fields Y µ (ij) it is useful to expand them about the identity Written in this way we can see that the Stückelberg trick fundamentally is nothing more than a coordinate transformation, which takes coordinate . To see this explicitly, consider an interaction vertex and notice that for a Stückelberged spin-2 field we can write where we have dropped explicit indices i, j. The A µ field then is the Goldstone boson of the broken GC (i) invariance associated with g µν and encodes all the effects of symmetry breaking. The gauge-fixed nature of S int as written in equation (3.1) now becomes apparent. The fully gauge-invariant S int with expanded Stückelberg fields is given by combining equations (3.15), (3.16) and (3.17). This is a theory with N massless spin-2 fields and N − 1 gauge bosons. But we can gauge-fix this action, removing all the gauge bosons A µ (ij) by moving to unitary gauge in which Y µ (ij) = x µ (i) . In this gauge we recover (3.1) and all gauge bosons have been eaten to produce N − 1 massive spin-2 fields (a single massless spin-2 field remains due to the single surviving copy of GC).
Having projected out the Goldstone boson of a broken GC (i) , we can further decompose A µ as i.e. into a 'vector'/spin one and a 'scalar' mode, which correspond in the decoupling limit to the longitudinal scalar and vector components of the 'massive graviton' 9 associated with g (i) µν in the gauge-fixed action (3.1). Note that following [77,80] we have introduced a fake U(1) symmetry, with π as its Goldstone boson. π is of particular interest, since the longitudinal scalar mode of the 'graviton' is the one that typically threatens to become the Boulware-Deser ghost. We also expect it to yield the most dangerous interaction terms (i.e. the ones growing most quickly with energy [79], or equivalently, the ones suppressed by the smallest scale, which means they will remain relevant in the decoupling limit, as we shall see below in section 7). Finally note that introducing the fake U(1) symmetry should be supplemented by adding appropriate gauge-fixing terms. However, their exact form doesn't matter here [77,90].
Summary so far: Until now we have reviewed the construction of [77] to introduce and use Stückelberg fields in order to restore gauge invariances broken by bimetric interaction terms. This effectively projects out the n − 1 Goldstone modes of the n − 1 GC (i) symmetries broken by the interaction terms and which got eaten to produce n − 1 massive 'gravitons' in the initial gauge-fixed action. We found it straightforward to extend this formalism to the case of a single n spin-2 field interaction vertex. We will now move on to consider more complicated theories of N interacting spin-2 fields and find that various subtleties not present in the simple setups considered so far appear and of which care must be taken.

Stückelberging interacting spin-2 fields
Different ways to Stückelberg : In the previous section we discussed how, for an n spin-2 field interaction term, one can introduce n − 1 Stückelberg fields Y µ to map all the participating nodes/spin-2 fields to the same site. How does this prescription generalise to the case of a full theory graph, i.e. when one has multiple interaction terms as in figure 2? Below we describe three approaches to this. 9 We insist on inverted commas for the term 'graviton' in this context, since the statement about A µ of course applies to all of the spin-2 fields which are subjected to the Stückelberg treatment in an interacting theory with N such fields. Prior to coupling this to matter via Smatter, which will decide what other matter in the universe sees as gravity, there is consequently no way of discriminating one such field as 'the graviton' and we resist calling the excitations of all spin-2 fields 'gravitons', since as far as other matter in the universe is concerned, these extra fields have very little to do with the geometry perceived by matter. Figure 4. Different ways of Stückelberging: Here we show graphs corresponding to approach II analogous to the case shown in equation (4.2) (left) and approach III analogous to the case shown in equation (4.4) (right). Note that, on the second line in approach II (left), even though the right-most node is only mapped via Y (c) , the interaction term connecting it to its neighbouring node is mapped via Y (c) and Y (b) , which is why we label the 'mapping' in the second line with both of these link fields. As a result there will be an interaction term for Goldstone bosons associated to the b and c index in the final Stückelberged action, whereas the other links give contributions depending on Goldstone bosons associated to indices a and b only, i.e. they do not introduce extra cross-interaction terms. We denote this by A (a,b,bc) in the final line. In approach III link fields are introduced for all sites, so that link produces an interaction term involving two Goldstone bosons in the final action. Hence we have A (ab,bc,cd) .
Approach I: We treat each interaction term in isolation and apply the Stückelberg trick as in equation 3.15. This guarantees that every Stückelberg field we introduce corresponds to a propagating mode (with the caveat discussed in the 'too many links' paragraph below). However, this also means that the same metric g µν , if participating in multiple interaction terms, may be mapped to different sites when Stückelberging different interaction terms. For example consider a three-site model with two bimetric interaction terms. Then we may choose to Stückelberg as follows This is consistent -after all nothing forces us to map the same spin-2 field to the same site throughout the action -but by definition means we don't have a simple global prescription for mapping a particular spin-2 field onto a given site. This is also illustrated in figure 3.
Approach II: Consider a theory graph as in figure 2. We know each separate 'island' corresponds to one surviving copy of GC (i) . To make this explicit it may be desirable to map all the fields participating in interaction terms in that island to the same site. Consider the three-site model from above again 10 This can clearly become somewhat cumbersome, since e.g. a spin-2 field which is n 'steps' removed from the site to which it will be mapped, will have to be Stückelberged as but does have the advantage of making the surviving copy of GC (i) explicit. 11 A similar, slightly more complex, example is shown in the left graph in figure 4. The number of Stückelberg fields introduced still corresponds to the propagating dof . However, the major drawback of this approach is that the interaction terms become much more complex since multiple Stückelberg fields interact at each vertex now, whereas before only n − 1 fields interacted for an n-field vertex. As a concrete example compare equations (4.1) and (4.2). Whereas in the first approach both vertices only has a single Goldstone boson A µ not directly interacting with any other bosons, in the second approach the second vertex will have interactions between A µ (ij) and A µ (ik) . This significantly complicates interaction terms and can make working with this approach very unwieldy.
Approach III: Finally one may hope to restore gauge-invariance with an algorithmic prescription that does not rely on inspection of the interaction terms. As such we might want to Stückelberg all spin-2 fields in the theory. There are of course different ways of doing this, but a natural choice is to combine this with the advantage of approach II and send all fields to some site l. This site can be one of the existing ones (on the same island or not), in which case the approach is equivalent to approach II in the case of a single connected island, or an extra 'imaginary' site.

(4.4)
A similar example is shown in the right graph in figure 4. While affording us with a simple global prescription for performing the Stückelberg trick and making the surviving copy of GC (i) explicit, this approach has an obvious drawback. We have introduced too many wouldbe Goldstone bosons. Namely we have introduced N Y (il) , but we know there are only N − m actual Goldstone bosons, where m is the number of disconnected islands in the theory. As a result there are m non-linear combinations of the would-be Goldstone bosons that actually do not propagate. In other words, when transforming into the right basis these combinations 10 Note that we are, in an abuse of notation, denoting the mapping of a metric determinant g (i) from site i to site j by gi • Y (ij) . 11 The term Y (n,n−1) • Y (n−1,n−2) • . . . • Y (1,0) is what [77] refer to as a plaquette.

Approach I Approach II
Approach III Figure 5. Another example comparing approaches I-III for a simple three spin-2 field system with two bimetric interaction terms. Y (ij) , W (ij) , V (ij) are the Stückelberg fields used in approaches I-III respectively. Note that the choice of the direction of each Y ij is arbitrary (we can always reverse link fields by inverting , since Y −1 (ij) = Y (ji) ). Also the choice of target node in approach II is arbitrary. Again note that, while the right-most node in approach II is mapped with W (31) only, the interaction term connecting it to the central node picks up a dependence on both W (31) and W (21) , which is why we label the 'mapping' with both of these fields. The choices made here are intended to bring out the differences between all approaches for the simplest possible system. We discuss these differences and demonstrate the equivalence of all approaches below. have no kinetic terms and act as auxiliary fields that can be integrated out, leaving us with the correct number of propagating degrees of freedom.

Gauge-breaking interactions
An important insight from considering the three different approaches, is that there is no unique way of applying the Stückelberg trick to N interacting spin-2 fields and that both the interaction terms between different Goldstone bosons and the apparent number of degrees of freedom depends on the choice of approach. An analogous conclusion was reached in [80] in the context of a particular 'circle Lagrangian' (n spin-2 fields interacting bimetrically with a theory graph that forms a single closed loop -see the 'too many links' section below). In what follows we will follow approach I, since it generates the correct number of degrees of freedom by default and produces the most economical interaction terms (i.e. the ones with the fewest interacting Goldstone bosons).
The equivalence of approaches I-III: The existence of the different possible approaches discussed above is due to the fact that there is no unique prescription for introducing gauge degrees of freedom in the process of Stückelberging. Here we wish to show that all approaches presented so far are ultimately dual to one another (as they should be -after all they describe the same physics). Using one approach over another may be far more economical in any given case, but a well-defined mapping between different ways of introducing Stückelberg fields always exists. This can most easily be seen with a concrete example. Consider three spin-2 fields connected by two bimetric interactions as depicted in figure 5. The interaction terms post-Stückelberging in approaches I-III 12 are given by (23) , g (3) (4.5) It is now trivial to relate the Stückelberg (or link) fields used to one another. Y (ij) 's and W (ij) 's are related via Y (21) = W (21) , whereas Y (ij) 's and V (ij) 's satisfy That all three V 's can be expressed in terms of the two Y 's is a direct manifestation of the fact that there are really only two Stückelberg field degrees of freedom here (i.e. in approach III we introduced one too many would-be Goldstone bosons V ). In this sense equation (4.9) can be seen as providing the constraint eliminating one of the V 's. We can now explicitly relate interaction terms (4.5)-(4.7) to each other. Consider (4.6) In the second line we have used (4.8) to substitute for the W 's. In the third and fourth lines we have acted on all fields in the interaction terms with Y −1 (21) and Y (23) respectively. Note that both f 1 (. . .) and f 2 (. . .) are gauge-invariant under diffeomorphisms in the action post-Stückelberging (just as R (1) and R (2) already were before). This e.g. means that allowing us to explicitly apply a 'diffeomorphism' Y (ij) to f 2 (. . .) in (4.10) while choosing to leave f 1 (. . .) unchanged. Similar manipulations, showing the equivalence of L III to L II and L I , can also be carried out. The gauge-invariant structure of interaction terms post-Stückelberging is therefore what allows us to treat each interaction term in isolation in approach I instead of having a global prescription always sending each g (i) to the same site as in approaches II-III.
A background metric: Finally we have to address the important issue of which metric we choose to raise and lower indices in the Stückelberged theory. A natural choice 1 2 n − 1 n would seem to be the metric of the site onto which we are mapping spin-2 fields. However, this is not an optimal choice, because different interaction terms will map onto different sites with the Stückelberg convention we have chosen here (approach I). So when combining parts of the scalar field action for all the π (i) it would be desirable to have indices (especially on partial derivatives) throughout raised with the same metric (at lowest order). In other words we would like to expand all metrics about the same background metric. For the sake of simplicity we will choose the non-dynamical (flat) Minkowski metric here, but in principle any other background metric (that allows a perturbative treatment of δg (i) µν ) could be chosen too (see appendix C for more on what changes). As such we expand Again we iterate that this is a particular choice of background metric where we have chosen g 0 = η. Indices will be raised and lowered with the background metric and we denote partial derivatives acting on the scalar mode π by indices too, i.e.
Since we have chosen a flat metric to raise and lower indices, all indices on π always commute. It is perhaps worth emphasizing at this point that, even though all interaction terms 'live' on individual sites following the Stückelberg trick, information about the interactions of different sites in the gauge-fixed theory is not lost. It is still encoded in the Goldstone bosons A µ (ij) and its scalar mode π (ij) (in an abuse of notation we have only labelled these with he site they are on so far, but in cases where it is not obvious which scalar mode is considered, we will label them explicitly) as well as in the metric perturbations h 'Too many' links: So far we have only considered theories with the 'right' number of interaction vertices, so that each Stückelberg field corresponds to a propagating mode. In other words it is a true Goldstone boson (compare our discussion between approaches I-III above). Specifically for N interacting spin-2 fields, which form m 'islands', i.e. where the interaction term preserves m copies of GC (i) , the number of interaction vertices satisfied where n i counts the number of interaction vertices with i participating spin-2 fields. What happens if we violate this constraint? In terms of the theory graph this corresponds to introducing loops. Figure 6 shows examples of such theories: the graph from figure 2 with added interactions that violate (4.14) and introduce loops and another graph corresponding to the 'circle Lagrangian' (i.e. one big loop) as considered by [77][78][79][80]. Loops in theory graphs (in the metric picture used throughout this paper) modify the constraint structure of the theory [91] 13 and have been conjectured to introduce ghosts into the theory [73,91] -also see related discussions in [71,72]. 14 While we won't attempt to prove or disprove this conjecture here, we can observe that this adds a Goldstone interaction term to the Stückelberged theory that is qualitatively different from the others. Consider a single loop made up of bimetric interactions only (such as the 'circle Lagrangian' in figure 6) with n sites and, since it is a loop, n links. We now Stückelberg n − 1 links. It doesn't matter which, but for concreteness suppose we only leave out the node n to node 1 link, denoting its interaction by Now the degrees of freedom projected out from n − 1 are sufficient to describe all Goldstone bosons arising from symmetry breaking. What about the final link? We know it does not break any more symmetries, yet if we Stückelberg it as all the other links we seemingly add an extra would be Goldstone boson. However, note that the link fields introduced already are sufficient to make the interaction term corresponding to the final link gauge-invariant. We denote the Stückelberg field connecting node j and j − 1 with Y (j−i,j) (if the original link was chosen the other way, note that we can always use Y −1 (j,j−1) = Y (j−i,j) to find a link field mapping from site j − 1 to j). We can now make S loop gauge-invariant via This now restores gauge-invariance to the full action and introduces the correct number of Goldstone bosons. Closing the loop has had the effect of adding an extra interaction term, S loop , which couples all the Stückelberg fields together. We investigate whether this can be done in a ghost-free way in [92], but can already note here that the pattern shown in (4.16) is general. Whenever we encounter a loop, this means that the fully gauge-invariant action resulting from applying the Stückelberg trick will contain an interaction term coupling all the Goldstone bosons from the loop together.

From Goldstone modes to demixed scalar-tensor theories
The pure scalar action at quadratic order: Consider the interaction term post-Stückelberging. We project out the Goldstone bosons A µ (ij) and the associated longitudinal 13 We thank Angnis Schmidt-May for bringing this and reference [91] to our attention. 14 Note that theories with loops are non-equivalent in the metric and vielbein pictures. In fact theories with loops in the vielbein picture are known to be ghost-free for certain types of interaction terms [71]. scalar modes π (ij) as in equation (3.17). Finally metric perturbations h (i) µν are expanded around a flat background as in equation (4.12). This will result in an interaction term mixing scalar, vector and tensor modes. Let us briefly focus on its pure scalar part, effectively dropping all occurrences of A µ (ij) and setting all spin-2 fields to be the Minkowski η. At linear order all terms must be total derivatives, since at the level of the pure-scalar action here every π (ij) is acted on by two derivatives. Things become interesting at quadratic order.
In a theory with only bimetric interactions the different π (ij) don't interact and at quadratic order for each Goldstone boson we have In the last step we have performed two integration by parts. If the interaction term is not chosen so that a + b = 0, this term leads to a fourth order (in derivatives) equation of motion for π, signalling the presence of an Ostrogradski ghost. Setting a + b = 0 is the equivalent of the Fierz-Pauli tuning for bimetric interactions. Even though the full effective scalar action will pick up extra contributions from mixing with metric perturbations, we will see that these cannot cure an Ostrogradski instability already present at the level of the pure scalar action pre-mixing. Also notice that, pre-mixing, the scalar action has no kinetic term for π. In a theory with higher order spin-2 field interactions (trimetric and above), interaction terms mix different Goldstone bosons together. Since we are only considering terms up to quadratic order in the fields, this will show up via cross-terms between different π µ µ,(ij) . As such the quadratic interaction terms look like where k labels the site the interaction term is mapped to and i, j run over all other sites participating in the interaction. All terms π (ik) µν π µν (jk) introduce ghosts (both for i = j and i = j) and we therefore find the N spin-2 field interaction generalisation of Fierz-Pauli tuning: the requirement that for all i, j. Once again these instabilities will not be removed by mixing with the tensor modes, so it is imperative to check whether there are ghosts at the quadratic level in the pure scalar action pre-mixing to ensure stability. Also note that the presence of a generalised Fierz-Pauli coupling essentially amounts to a vanishing of the quadratic pure scalar self-interaction term up to total derivatives -if this also takes place at higher orders this is directly connected to the raising of the effective cutoff scale of the theory and potential ghost-freedom of the theory [9][10][11]90]. We will discuss this further in sections 7 and 9. As a final comment, it is worth mentioning that requiring the generalised Fierz-Pauli condition (5.3) to hold is equivalent to the following: Firstly we diagonalise the interactions of tensor-perturbations h prior to performing the Stückelberg trick and then -this is the point equivalent to enforcing (5.3) -require that the resulting massive eigen-fieldsh have Fierz-Pauli mass terms at quadratic order in the fields. Stückelberging these eigen-fields then automatically leads to an action for the Stückelberg scalars that separately satisfy the Fierz-Pauli condition (there is no mixing at quadratic order anymore, since one works with the eigen-fieldsh).
Conformal transformation(s) : As mentioned above, the scalar Goldstone bosons π ij have no proper kinetic terms before mixing with tensor modes. Upon considering the full scalar tensor theory the fields π (ij) and h (i) µν become kinetically mixed. We can already see this from Stückelberging one spin-2 field, projecting out its Goldstone bosons A µ and π and expanding around the non-dynamical background metric

This introduces kinetic mixing between h (i)
αβ and π (ik) , which we can remove at the linear level by a Weyl rescaling of the spin-2 fields, i.e. a (linearised) conformal transformation of the metric.
In a theory with only one dynamical metric, like in dRGT, it is sufficient to transform the dynamical metric like h (i) as This already introduces a proper kinetic term for π. However, even in cases where there are only bimetric interactions, several dynamical spin-2 fields require a more complex transformation. This is because fluctuations h (k) of the target site metric around the common background η will also pick up an interaction with Goldstone bosons π (ik) 15 . This mixing is not removed by the above transformation.
For a general theory with N interacting spin-2 fields and higher order interactions (trimetric and above) there can be mixings between tensor perturbations from each spin-2 field and all of the Goldstone bosons. As an example consider a tetra-metric vertex upon Stückelberging and expanding around some common background metric. Ignoring vector modes we have which will mix all participating h (i) and π (jk) . The general linearised conformal transformation demixing a theory with arbitrary interaction vertices at the linear level will therefore be This can simplify greatly depending on the nature of the interaction terms -for each index i it is sufficient to let the index j only run over all sites that h i interacts with in the original gauge-fixed theory (i.e. the theory pre-Stückelberging). For a 'line theory' (as depicted in figure 6 without the 'closing link' connecting the initial and final site) with only bimetric interactions between neighbouring sites and no loops, for example, (5.7) reduces to Note that generically, in order to explicitly eliminate any mixing in the action at linear order, one needs to fix residual gauge symmetries associated with each h via the addition of appropriate gauge fixing terms. We discuss this below and provide an explicit example in 8. Finally it is worth pointing out that mixing will generically persist at higher orders, see e.g. [90].
Scalar-tensor mixing: What form do the terms mixing scalar and tensor modes take? And does the conformal transformation proposed above allow us to remove this mixing at the lowest order? Let us begin somewhat backward by looking at the kinetic term, i.e. the Ricci scalar, for a given spin-2 field where the so-called massless kinetic operatorÊ µναβ iŝ where we are symmetrising with unit weight. Now consider the conformal transformation we use to demix scalar and tensor modes. We restrict ourselves to the simple case h (i) →h (i) + c (i) π (i) η for now and find that the Einstein-Hilbert term −g (i) R (i) will, upon conformally transforming, generate a mixing of the form where X µν (n) (π 1 , . . . , π n ) = δ µµ 1 ...µn [νν 1 ...νn] π ν 1 µ 1 ,(1) . . . π νn µn,(n) (5.13) and the antisymmetric tensor δ µµ 1 ...µn [νν 1 ...νn] is defined in (A.2) and X µν (n) thus satisfies ∂ µ X µν (n) = 0. Before the conformal transformation a number of mixing terms may be present, depending on the nature of the interaction term S int . In addition terms quadratic in the tensor perturbations h (i) will introduce mixing terms via the conformal transformation. Overall, at lowest order in the mixing, i.e. at second order in the fields, we can list all possible contributing terms L lin,(i) (5.14) Again we emphasize that, apart from L lin EH , which of these is present is determined by the form of S int . Using equation (5.11) for linearised Einstein-Hilbert terms and computing the corresponding behaviour under conformal transformations for the other terms, we find that the 'mixing terms' (5.14) are mapped to where → denotes a conformal transformation for all h (i) as discussed in the previous section and we have assumed a simple form for the conformal transformation If a more complicated form of the conformal transformation is appropriate, i.e. several π partake in the transformation for h (i) , additional cross-terms will appear. We discuss this below and in section 6. Equations (5.15)-(5.19) now allow us to draw a number of conclusions for generic interaction terms S int . 16 can manifestly be removed by cancelling off contributions from (5.15) and (5.16). In other words, by tuning the c (i) these mixing terms can be removed. Note that, at higher orders in the mixing, no local field redefinition which can removing scalar-tensor mixings is guaranteed to exist (although non-local redefinitions may achieve this [9,90]).
2. In general S int will generate mixing terms of a similar form, but also involving cross couplings h (i) µν X µν (1) (π (j) ) with i = j. This happens when a site i has several interaction terms linking to it and as a result the conformal transformation for h (i) required to demix scalar and tensor modes picks up a a dependence on several π. Suppose the original theory has some such cross-coupling h µν . Then the appropriate conformal transformation in order to achieve scalar-tensor demixing will be h (i) →h (i) + c 1 πη + c 2 φη, resulting in (5.15) also giving rise to h terms. These can be cancelled off against each other by choosing c 1 and c 2 appropriately and we will give an explicit example in section 6.
3. Suppose that, in addition to potential contributions like h (1) (π (i) ), S int also generates 'mixing terms' (5.17) and (5.19). Now, even after the conformal transformation, mixing terms involvingh µ,(i) µ will persist. We can remove these by appropriately gaugefixing. We will discuss this in more detail below, but for now it is sufficient to stress that there is always enough gauge-freedom to move to a traceless gauge whereh , these have two interesting effects. Upon conformally transforming these firstly lead to 16 In general the dRGT-like mixing terms are of the form j h (i) µν X µν (j) (π (i) ) -the conformal transformation will only remove the lowest order mixing h mixing terms, which can be removed by working in a traceless gauge. Secondly, they will generically give rise to π (i) π (j) cross-terms for the pure-scalar action and consequently also to π 2 (i) mass terms for the Goldstone bosons π (i) .
To summarise the last two sections: The conformal transformations proposed generate kinetic terms for all 'Goldstone bosons' π (i) (which is essential in order to proceed with canonically normalising all fields) and also allow us to eliminate scalar-tensor mixing at quadratic order in the fields. For further discussion of the nature of scalar-tensor mixing terms see appendix B. However, as we shall see in section 6, the different π (i) generically remain kinetically mixed, so we cannot at this stage identify them as separate propagating dof . In section 6 we discuss how this scalar mixing can be demixed and the appropriate propagating dof can be identified.

Gauge transformations:
Consider a single interaction vertex in a multi-metric theory and the effect of the Stückelberg trick on it. If everything is brought to site j, the (infinitesimal) gauge transformation properties of the fields are where the ellipsis denotes terms containing higher powers of A (i) , and the Lie derivative is Taking the decoupling limit the gauge transformations are reduced to where ξ, and Λ have been suitably rescaled. The gauge freedom associated with ξ should now be fixed by imposing suitable gauge conditions (we will discuss the Lorentz gauge below). This is particularly relevant when eliminating scalar-tensor mixing at quadratic level in fields. As we saw above, quadratic scalar-tensor mixings involving the trace h µ µ,(i) generically remain after the conformal transformation designed to demix the action (cf. equations (5.16) and (5.19)). Here we will show that this mixing is eliminated once appropriate gauge fixing terms are added to the action.
Let us begin by recalling the Lorentz and transverse traceless gauges commonly used in standard General Relativity (m 2 = 0), where we generalise to N non-interacting spin-2 fields labelled by an index i. Now the Lorentz gauge condition corresponds to ∂ µ h µ ν,(i) = 0 and fixes the gauge freedom up to residual gauge transformations ξ (i) µ = 0. One possible gauge choice to fix the remaining gauge freedom is the transverse, traceless gauge (essentially the gravitational equivalent of the Coulomb gauge), which results in the complete set of gauge conditions The requirement h µ µ,(i) = 0 is one of the gauge fixing conditions, i.e. it does not fix all the gauge degrees of freedom, but only a subset of the conditions associated with the transverse traceless gauge. In fact it fixes only a single real space degree of freedom [90]. In order to eliminate the scalar-tensor mixing at quadratic level in the interacting spin-2 field theories considered above h µ µ,(i) = 0 is a sufficient requirement, so we do not need to fully specify the gauge fixing conditions in order to ensure demixing.
So let us see how this can be implemented in the massive gravity/interacting spin-2 field case. The analogous gauge fixing conditions in the Lorentz gauge are This gauge still leaves a residual gauge freedom ( − m 2 )ξ (i) µ = 0. And this residual gauge freedom is enough so that we can always consistently set the trace of h (i) to zero and consequently eliminate scalar-tensor mixing at the quadratic level.
Finally we may wonder what the precise form of ξ (i) µ enforcing tracelessness is. Consider the decoupling limit gauge transformations for tensor modes We now take the trace and require h µ µ to vanish. In other words we want to find out whether we can always consistently set the trace to zero with some gauge transformation, regardless of the initial form of h µ µ . The result is where we recall that are we are raising and lowering indices with the flat Minkowski metric η µν . This is the condition on ξ µ (i) in order to eliminate h µ,(i) µ .

Scalar mixing and propagating modes
Having Weyl transformed all the relevant spin-2 fields for each interaction term in S int , all the π (i) modes have proper kinetic terms now and the scalar-tensor mixing has been eliminated at the lowest order. However, in a generic theory the Weyl transformation will have induced kinetic mixing between the different π modes. In addition, tensor-tensor terms such as h (i) µν h µν (j) will generically generate 'mass terms' π (i) π (j) via the conformal transformation. In other words, we still have not completely diagonalised the action (again, up to quadratic order in the fields) and scalar mixing (both kinetic-and mass-mixing generically persists).
Here we show why this mixing arises and how to demix the scalar action allowing us to identify what the actual propagating modes are. Note that, for the purposes of this section, Latin indices are site labels (unless otherwise stated and even when not bracketed) and are not summed over, whereas Greek indices are space-time indices and the Einstein summation convention applies as usual.  No scalar mixing in bimetric theories: In a bimetric theory no scalar mixing arises as there is only a single would-be Goldstone boson π. Let us show this explicitly. The interaction term is of the form where f is some function of the two spin-2 fields g (0) and g (1) . This corresponds to graph (a) in figure 7. Performing the Stückelberg trick, expanding around a background and dropping the Stückelberg vector A µ (01) this results in 17 To go further we must know something about the nature of f . Here we choose a symmetric interaction term, i.e. we consider terms f g (0) , g (1) , which do not distinguish between g (0) and g (1) . For large parts of our argument the only important feature of this choice will be its symmetric nature. However, for concreteness we take it to be of the dRGT/Hassan-Rosen bigravity form [36][37][38] as discussed in more detail in section 9 below. The scalar-tensor mixing terms pre-conformal-transformation now are In order to remove the scalar-tensor mixing we perform the conformal transformation 17 Note that we have defined the perturbations h µν (0) relative to the inverse metric g −1 (0) , i.e. g µν (0) = η µν + h µν (0) . If instead we define a perturbationh µν (0) with respect to the metric via g µν , then a relative sign change takes place (to lowest order), i.e.h µν . For comparison see (6.9) where this becomes relevant explicitly, since g (i) and g −1 (i) both enter. (1) (π) + 3c 1 π π . (6.5)

Under this transformation equation (6.3) is mapped to
As we saw above, additional mixing terms are generated from L lin EH , namely we have In order to eliminate the scalar-tensor mixing we require m 2 = −2c 1 . This means the pure scalar action that remains once equations (6.5) and (6.6) are combined is In other words, π is a propagating Goldstone mode and no further demixing needs to be performed after the scalar-tensor mixing has been eliminated.

Scalar mixing -an example:
The situation is different if more than two spin-2 fields interact with bimetric interactions. We start with a trimetric theory with two bimetric interaction terms.
This corresponds to graph (b) in figure 7. Performing the Stückelberg trick and expanding around the background this now results in where π (1) and π (2) are the Stückelberg scalars from the first and second bimetric interaction term respectively. Moving to the scalar-tensor mixing terms for the 'central' i node preconformal-transformation we now have The ± sign and the parameters α 2 are of some importance here. The sign of a h (i) µν X µν (1) (π (j) ) mixing term depends on how we define and introduce the Stückelberg fields Y ij as well as h µν and π (ij) and whether we use Y ji or Y ij for two particular neighbouring sites i and j, 'incoming' links Y (ji) will generate a positive +h (1) (π (ji) ) coupling, whereas 'outgoing' links Y (ij) will generate a negative −h (i) µν X µν (1) (π (ij) ) coupling or vice versa. What is important here is not the convention that is chosen, but that incoming and outgoing links generate mixing terms of opposite signs. Depending on how we Stückelberg the interaction terms there can therefore be a relative sign difference in the couplings in equation (6.10) encoded by the ± sign. Secondly, α 2 encodes the difference between coupling strengths m I and m II for the two bimetric interactions. In other words we have m 2 I = α 2 1 m 2 and m 2 II = α 2 2 m 2 , where the scale m 2 , which is factored out, is arbitrary.
Let us now return to (6.10) and demix the fields. For now we solely focus on the quadratic interaction terms (both self-interaction as well as mixing terms) from the h Under this transformation, dropping all h µν -dependent terms and choosing m 2 such that α 2 1 = 1, from equation (6.10) we have (2) ).
Now additional mixing terms are generated from L lin EH under the conformal transformation and we have In order to eliminate the scalar-tensor mixing we require ±α 2 2 m 2 = −2c 2 = ∓2α 2 2 c 1 . This means the pure scalar action that remains once equations (6.12) and (6.13) are combined is − 3 4 m 4 π (1) π (1) + 2α 2 2 π (2) π (1) + α 4 2 π (2) π (2) = − 3 4 m 4 (π (1) + α 2 2 π (2) ) (π (1) + α 2 2 π (2) ), (6.14) where we have worked with (X µν (1) (π (1) ) + α 2 2 X µν (1) (π (2) )), i.e. a positive relative sign, in order to avoid clutter from now on -it is straightforward to modify (6.14) and the below in the case of a negative relative sign. Now the h (i−1) and h (i+1) mixings will generate additional kinetic terms ) . (6.15) This form now makes it obvious that we can eliminate the kinetic scalar mixing by moving to i.e. we essentially map {π (1) , π (2) } to the normalised eigenvectors φ (1) , φ (2) of the matrix encoding the kinetic mixing between {π (1) , π (2) }. We can now finally write the full kinetic scalar interaction term as Note that we have so far focussed on diagonalising the kinetic terms of our trimetric theory here. That is of course consistent, but will identify dof which are no longer kinetically mixed, yet in principle still interact through potential terms such as φ (1) φ (2) . If one wants identify the proper eigenmodes of the theory, i.e. independently evolving dof (at least up to quadratic order in the fields), one should first canonically normalise the fields φ and then also diagonalise the mass terms for the Stückelberg scalars. These will come from tensor-tensor terms such as h µν h µν after the conformal transformation has come into effect. This would result in identifying the independently propagating and canonically normalised dofχ. We describe how this procedure works in detail below. Already at this point, however, we emphasize that neither the kinetic eigenmodes φ nor theχ we will describe below are the original Stückelberg scalars π (1) and π (2) , but instead are linear combinations thereof. Also we stress that we have not canonically normalised the φ fields at this point yet.
The general demixing procedure and mass-mixing: What procedure should one follow if we aim to fully eliminate the kinetic-and mass-mixing between the Stückelberg scalars and establish the independently propagating scalar dof of an interacting spin-2 field theory up to quadratic order in the fields? Having established the independently propagating modes will be of importance once we consider higher order interactions and their scale in the next section. Taking into account both kinetic and mass mixing, the pure scalar quadratic part of the Lagrangian is where we have arranged the Stückelberg fields into a column vector π and encoded kineticand mass-mixing in the matrices K and M respectively. De-mixing at the quadratic level then amounts to finding a basis in which the matrices K and M are diagonal; if and only if they commute then they can be simultaneously diagonalised by a unitary transformation acting on π. A good example for this case is the 'loop' theory considered in [33]. 18 There each of K and M is a circulant matrix, which can be diagonalised by a discrete Fourier transform. However for a more general theory (a good specific example being 'line' theories, which we will discuss in some detail below) this is not the case: K and M do not commute and hence cannot be simultaneously diagonalised by one unitary transformation. Instead one must first diagonalise K and then perform a rescaling of the new fields (a non-unitary transformation), which essentially amounts to their canonical normalisation. Let us call the vector of fields this results in φ andφ (before and after canonical normalisation, i.e. applying N , respectively). As a result the kinetic matrix forφ is proportional to the identity and the mass matrix forφ is someM , which is real and symmetric -so it is diagonalizable by an orthogonal matrix U M . SinceM and I commute, the kinetic and mass terms forφ can be diagonalised simultaneously -in fact by U M . This is because U M T U M = U M U M T = I, so when the mass matrixM is diagonalised by applying U M toφ, resulting in a new field vectorχ, the resulting kinetic term forχ is still diagonal and the fields are still canonically normalised. As a result we have successfully diagonalised both the mass and kinetic matrices. More formally one has 19 where µ 2 (i) are the diagonalised mass-terms and where U K is the matrix used to diagonalise K (e.g. the matrix of eigenvectors of K) and N is the diagonal matrix used to canonically normalise the fields. This means that where λ i are the eigenvalues of K. As discussed above U M is the orthogonal matrix of eigenvectors ofM = N T U K T M U K N . The fieldsχ are now completely de-mixed up to quadratic order in the fields and are consequently the independently propagating scalar dof of the theory. As we will see, working in terms of theχ makes analysing the suppression scale of higher order terms straightforward.
Scalar mixing with generic bimetric interactions -a 'line theory': Let us now perform the first step of the general procedure -de-mixing the kinetic scalar terms -for a more complicated theory. Namely for a 'line theory' with N sites as depicted in graph (b) of figure 7 Suppose we choose to Stückelberg this interaction term by introducing N − 1 Stückelberg fields Y (i,i+1) , i.e. we always map from site i to i + 1, never the reverse. Except for the initial and final site, each site therefore has one incoming and one outgoing link, resulting in a sign difference between h . Each Y (i,i+1) will give rise to a Stückelberg scalar π (i,i+1) . In terms of these Stückelberg scalars the kinetic scalar interactions post-conformal-transformation will schematically be − 3 4 m 4 α 4 (2) π (1,2) π (1,2) + α 4 (N ) π (N −1,N ) π (n−1,n) where m 2 i α 2 (i) = m 2 . As discussed above equation (6.22) can be re-written in matrix format as − 3 4 m 4 π T K π, (6.23) where K here is a tri-diagonal matrix with 2 on the diagonal and −1 on the super-and sub-diagonals. K is real and symmetric, so we can identify its eigenvectors and eigenvalues to construct the orthogonal matrices U K that diagonalise the kinetic terms by mapping the π (i,i+1) to linear combinations as identified by the eigenvectors. 19 We here normalise scalar kinetic terms to − 1 2φ φ and hence − 1 2χ χ. Note the choice of sign (especially since we have not specified the signature of the spin-2 fields involved anywhere yet).

Scalar mixing with generic bimetric interactions -branching vertices:
Having considered a single bimetric interaction and its generalisation to a 'line theory', we are only missing one piece to build a generic theory only containing bimetric interaction terms. To build such a theory we also need to know how to handle branching vertices like Such a branching vertex is depicted in graph (c) in figure 7. By combining such branching vertices (with a 'central' node and several 'outer' ones) with the line-type interactions discussed above, we can build a generic theory with only bimetric interaction terms. For concreteness we will only discuss a 4 spin-2 field branching vertex here, but generalisation to higher order branching vertices is straightforward. Returning to (6.24), we again Stückelberg interaction terms, expand around the background and focus on the mixing terms between h (i) and π (j,0) (though from now on the second index will be omitted) The first term corresponds to the scalar-tensor interactions at the 'central' node g (0) , whereas the second term sums up contributions from all 'outer' nodes. Note that we have chosen Stückelberg fields Y such that all relative signs for the 'central' interaction term are positive. The conformal transformations for h (i) will now be with, as above, c (i) = − 1 2 m 2 α 2 (i) . This results in a mixed kinetic scalar action where again the first term corresponds to contributions from the 'central' node g (0) and mimics the structure of 'central' nodes in the line theory example discussed above. Demixing the kinetic scalar action (again we focus on the first step in the demixing procedure and ignore any scalar mass terms for the time being) we obtain where Σ n = n i=1 α 2 (i) π (i) . The modes φ (i) that diagonalise the kinetic scalar action are (note we have not normalised these yet) Once again the demixed, propagating scalar modes are linear combinations of the original Stückelberg scalars π (ij) . Note, however, that at this stage the φ (i) derived here are interacting propagating modes, since without having diagonalised the mass terms there will still be mass mixing terms φ (i) φ (j) at this stage.
Scalar mixing for higher order interaction vertices: Finally we wish to consider interaction vertices which couple together more than two spin-2 fields 20 , cf. graph (d) in figure 7. As discussed above, an N -metric interaction will break N − 1 copies of diffeomorphism invariance and hence we introduce N − 1 Stückelberg fields. We remind ourselves that this means such an interaction term is mapped to (6.31) (Again, for brevity the second index on the Y 's has been omitted.) Note that, as discussed before, cross-terms and scalar self-interactions in the quadratic pure-scalar action pre-conformal transformation have to be eliminated to satisfy the generalised Fierz-Pauli condition (5.3).
Turning to the mixing between h (i) and π (j) , at quadratic order this can be written where k (ij) and k (ij) are constant coefficients. The conformal transformation employed to eliminate the scalar-tensor mixing is consequently where any survivingh α α terms are removed via gauge-fixing. With this conformal transformation the demixed kinetic scalar part of (6.32) (demixed in terms of having eliminated scalar-tensor mixing) becomes where we are again ignoring any mass terms for the time being. The different π (i) are mixed here, which is a general feature of N spin-2 field interactions. 21 This can be demixed to (in 20 Ghost-free higher order vertices have been constructed in the vielbein language in [71]. These vielbein terms cannot always be mapped to the metric picture used throughout this paper [71,85,86]. We provide the mapping between our approach in the metric and vielbein picture in [84], but here we focus on the generic behaviour of any higher order interaction vertex in the metric picture. 21 In other words, such interactions kinetically mix all the participating π (i) , in contrast to the line theory considered above, which only kinetically mixed 'neighbouring' π (i) . principle) N − 1 different propagating modes, corresponding to the N − 1 broken copies of GC (i) and which are linear combinations of the different π (i) as expected. For example, for a dRGT-like scalar-tensor interaction wherek (ij) = 0, demixing the scalar dof corresponds to diagonalising the (in general up to rank N − 1) matrix i k (ij) k (ik) .
If one imposes additional symmetry requirements on the nature of the interactions, one can, however, also reduce the effective number of propagating modes (equivalently: the extra symmetry can lead to some of the N − 1 modes becoming identical). We now illustrate this by showing that (6.34) simplifies remarkably upon imposing some prima facie rather modest assumptions, highlighting two particular cases: 1. Consider the case when the interactions of a given scalar π (j) to two metrics are totally proportional, that is the 'interaction strengths' factorise as k (ij) = k (i) γ (j) ,k (ij) = k (i)γ(j) . In this case (6.32) can be written i.e. everything depending on the i site index factors out. The demixed scalar action (6.34) therefore becomes In the third line we have de-mixed the scalar action (at quadratic order in the fields) and introduced Σ γ = Σ γ + 4 3Σγ in the process, so that the de-mixed modes are φ (1) = Σ γ and φ (2) =Σγ. We emphasize that, as a result of our 'factorisation assumption', there are only two distinct propagating modes left over now.
There is an interesting observation that can be made for this case. Namely that thẽ Σγ field is ghost-like (it has the wrong sign kinetic term). There are at least two ways in which the introduction of such a ghost can be avoided. Firstly one of the hallmarks of dRGT-type bimetric interactions is thatγ = 0. If this is also true for a given higher order interaction here, this eliminates this particular ghost, leaving us with a single, healthy propagating mode. Secondly, ifγ i ∝ γ i , and henceΣγ is not independent of Σ γ , then only a single Σ γ field remains again and whether or not this field is ghost-like depends on the exact proportionality constant betweenγ (i) and γ (i) .
2. Now consider the case wherek (ij) instead factorises ask (ij) =k (i) γ (j) . This corresponds to the interactions of a given tensor h (i) to two different scalars being totally proportional. Now (6.32) becomes which is similar to the expression in case 1 above, except that the 'flavour' index on the differential operator is now shared with the tensor and not the scalar. This has important consequences, as (6.34) now turns into and we immediately see that there is only a single propagating mode φ (1) , highly reminiscent of the way in which the 'central' node in our bimetric interaction branching vertex above only contributed to the π (i) mode. It is worth emphasizing that, for example by adding different bimetric interactions connecting the outer nodes of this particular N spin-2 field interaction to other nodes, the symmetry of the interaction term can be broken and the full N − 1 degrees of freedom contributed by the N spin-2 field interaction vertex can be brought out again.
Canonical normalisation: By now we have eliminated both scalar-tensor as well as kinetic scalar-scalar mixing at quadratic order.
Step 2 in the procedure summarised in equation (6.20) above is to canonically normalise the eigenmodes of the kinetic matrix K: the φ fields. In fact we can canonically normalise all the propagating dof of the theory at this point -they are all h µν (i) , A µ (ik) , φ (l) , where the φ (l) may still interact via their mass terms. Note that, since the scalar modes φ generically combine contributions from different vertices, l here is no longer a site label but just an ordering index for the φ fields as determined via the eigenvectors of K. Canonical normalisation is straightforward now, especially since we have related all φ (l) to the same mass scale m 2 by absorbing appropriate powers of α 2 (ik) into the definition of each φ (l) . Note that we have labelled the different masses m and corresponding α's with only one index (whenever it is unambiguous to do so) so far, but just to be explicit we here label them with the full two indices. Canonical normalisation 22 then instructs us to normalise as follows ignoring numerical pre-factors, where a hat denotes a canonically normalised field and where λ (l) is the eigenvalue of the eigenmode φ (l) of the kinetic matrix K. Vector perturbations A µ (ik) are consequently normalised by the scale m (ik) associated with the interaction vertex for which A µ (ik) is a Stückelberg vector. Since the φ (l) are linear combinations of the Stückelberg scalars π (ik) , they all get normalised by the same mass scale m 2 , which we recall is related to the strength of interaction terms by m 2 α 2 (ik) = m 2 (ik) . Again we emphasize that this means 22 We recall that having a universal normalising scale M P l for all h (i) is an assumption we have imposed. If different M the dependence on all other mass scales has been absorbed into our definition of the φ (l) via their dependence on the α 2 (ik) . Re-phrasing equation (6.39) in the language of (6.20), for the scalar modes the canonical normalisation amounts to again up to numerical pre-factors and where the index j is not summed over here.

Higher order interactions and the decoupling limit
Having canonically normalised all fields and de-mixed at quadratic order, we can now consider higher order interaction terms and indulge ourselves in power counting to identify the scales suppressing these terms. In particular we will be interested in the decoupling limit, i.e. the limit where one zooms in on the interaction(s) suppressed by the smallest scale and hence most relevant at low energies. As we shall see, having multiple interacting spin-2 fields can qualitatively change this limit when compared to the analogous massive/bi-gravity cases [9,10,[29][30][31]. Throughout this section the explicit examples we provide for multi-spin-2 field models will be of the 'line theory' type discussed above. Analogous examples are readily constructed for branching vertices or n-field interaction terms, but the simpler line theory examples suffice to illustrate generic features of interacting spin-2 field theories. Note that we also do not consider cases with potentially ghost-inducing loops here -we discuss these in [92].
Bimetric theories: We begin by quickly reviewing higher order interactions and the decoupling limit in massive/bi-gravity, which will serve as a reference point later on. In theories with two spin-2 fields as described by (6.1), the nature of the Stückelberg trick (cf. (3.17)) leads to interaction terms where m 2 M 2 P l is the coupling constant of the interaction term, the arrow signifies the effect of canonical normalisation, which is trivial in theories with a single bimetric interaction. n h , n A , n π denote the powers of h µν , A µ and π respectively. Note that in dRGT massive gravity there is only one h, A and π, since the second spin-2 field is non-dynamical. In a full bigravity model, however, there are two h, which will generically both have interaction terms of the type shown here. Since the Stückelberg expansion of the metric also introduces terms of the form A µ ∂ µ g αβ (again cf. (3.17)), it is not always true that h enters with no derivative, A enters with one and π with two (at least prior to a re-summing procedure -see appendix B), so not enforcing any derivative structure we denote the canonically normalised interaction terms by Here we have introduced the ordering parameters Λ and λ, setting the scale of the interaction and satisfying ≤ ∞. Note that, since all the terms shown here can in principle contribute at roughly the same scale, we here choose to show a 'generalised' decoupling limit with all these terms still present. {0, 2, 1} terms lead to contributions at the Λ The lowest and largest such scale will have analogous behaviour to that discussed in the trimetic case (c). In particular there will be a lowest scale 'sum mode', which, if all coupling constants m (i) are identical, will take its minimal value Λ for large N . In fact, as discussed below, it is the second and third term in (7.31) which yield interactions contributing to this scale. Also note that this scale is identical to the one discussed in the specific context of a circle theory in [33].
A larger λ then corresponds to a smaller scale of suppression. For the different possible cubic interaction terms we have Note that higher order interaction terms can have comparably low suppression scales, e.g. the smallest quartic scale is set by {0, 0, 4} and leads to λ = 4 and the highest order pure-scalar interactions {0, 0, n π → ∞} have λ = 3.
We can see that the pure-scalar cubic interaction term suppressed by Λ 5 generically is the least suppressed term in the theory. Phrasing it explicitly in terms of m, its scale is set by (M P l m 4 ) 1/5 . With such a term the scattering amplitude ππ → ππ obtained by patching together two cubic interactions scales like (E/Λ 5 ) 10 [77]. The theory becomes strongly coupled at Λ 5 , which is also the cutoff scale of the theory (taking into account quantum corrections may postpone unitarity violation to scales higher than this cutoff [77]) . dRGT massive gravity tunes interaction terms such that all pure-scalar interaction terms become total derivatives and hence vanish. This is essentially the higher order generalisation of Fierz-Pauli tuning at quadratic order (5.3). Eliminating the pure scalar interaction in this way also automatically turns terms with λ = 4 such as {0, 1, 2} into total derivatives [90], leaving us with a theory where the least suppressed interaction terms have a scale Λ 3 (and a corresponding Λ 3 cutoff) [9,10,90]. Again phrasing this explicitly in terms of m, at cubic order its scale is set by (M P l m 2 ) 1/3 .
The conformal transformation used to eliminate scalar-tensor mixing will turn {n h , n A , n π } terms into {n h − n, n A , n π + n} ones. However, it is worth stressing that the factors in front of the scalar fields in the conformal transformations will be ∼ m 2 , and the canonical normalisation of the fields involves a factor of m −2 for the scalar (as compared to tensor) modes. As a result the new interaction terms introduced by the conformal transformations retain their original scaling. That is, the conformal transformations will give terms, after canonical normalisation, {n h − n, n A , n π + n}. (7.5) Note that the bigravity statement, that interaction terms introduced by the conformal transformations retain their original scaling, will not strictly remain true in the N spin-2 field case discussed below, since the m 2 (i) and m −2 (j) that cancel out in the bigravity case (i = j here) no longer cancel there (since i = j for all interactions in theories with N > 2). More precisely, the scaling will stay of the same functional form, but will yield contributions where the 'original' m 2 (i) is replaced by a different m 2 (j) -more on this in the following sections.
The decoupling limit: Having understood the scale of all interaction terms one can proceed to the decoupling limit. Heuristically this amounts to zooming in on the interaction suppressed by the lowest scale, while setting all other interactions to zero. This limit is particularly interesting because the physics at low energies is dominated by the decoupling limit interaction term(s). Typically this is consequently the relevant limit for computing the cosmological evolution as well as e.g. spherically symmetric solutions around massive bodies. More formally we can define the decoupling limit by identifying the lowest suppression scale Λ dec 23 and taking the limit Note that the Galilean shift symmetry imposed on π in the pure-scalar action prior to demixing (a consequence of the nature of the Stückelberg trick and the introduction of a U (1) sym- 23 As discussed above this will generically be Λ5, but can be tuned to a higher scale such as Λ3.
metry for π via A µ →Ã µ + ∂ µ π) results in decoupling limit interactions for π of the Galileon type [93] 24 . This is shown explicitly in the massive and bigravity cases in [9,23,31,90]. What changes when we consider theories with more than two spin-2 fields? Generically there are now multiple mass scales (or equivalently: several interaction terms with different coupling constants) 25 . As a result there are now multiple ways of sending all mass scales to zero and all Planck scales to infinity, while keeping a number of distinct 'decoupling' scales Λ dec fixed. As a result there is no longer a unique decoupling limit 26 and instead we have a number of 'decoupling limits' DL (k) labelled by the index k In what follows we will pay special attention to the limit corresponding to the lowest such scale, i.e. the strong-coupling scale of the theory. We will call this scale Λ (+) λ . However, in order to understand this limit properly, we will keep a full hierarchy of interaction terms, which could all in principle contribute to this lowest scale. As we will show, which terms contribute and what their precise hierarchy is depends on the nature of the interaction terms and the coupling constants of different interactions. As a final comment, it is worth pointing out that the strong-coupling scale inferred by zooming in on the lowest scale of the theory is not necessarily the cut-off of the theory, but instead one may hope to have discovered a phenomenologically interesting (and strongly coupled) Vainshtein regime -cf. related discussions in [33].
Trimetric Λ 5 theories: We now move on to consider interactions for N spin-2 fields at cubic order and higher. The analogue of (7.1) in this case is In other words, this corresponds to the interactions generated by a term with coupling constant m 2 (c) M 2 P l ; h, A, π are shorthand for h (i) , A (jk) , π (mn) , where one of {j, k} and one of {m, n} has to be identified with i. All possible cross-terms allowed by this constraint will in principle be generated, depending on the structure of S int . Note that there is noπ here, since canonical normalisation applies to the propagating modes χ (l) only. Consequently we have to first express interaction terms in terms of the χ (l) before identifying the scale of all interactions. The fact that the χ (l) are linear combinations of π's makes this straightforward. It is also worth pointing out that for this reason we roughly expect the hierarchy (7.4) to stay intact. π (i) interactions will combine to yield χ (l) ones, so that {n h , n A , n π } combine to {n h , n A , n χ } ones. As we shall see the scale suppressing these terms changes, however.
As an initial concrete example consider a trimetric line theory as in (6.8). Generically the cubic interaction terms suppressed by the smallest scale (corresponding to Λ 5 in the bigravity case) will come from where π (1) , π (2) are the Stückelberg scalars associated with interaction terms with coupling constants m 2 (1) , m 2 (2) respectively. Both interaction terms are of the same functional form 24 Note the earlier work of [94,95]. This already contains Galileon-type interactions, which were independently re-discovered by [93]. 25 In addition there are also the in principle different Planck masses M P l for each of the spin-2 fields. 26 We thank Kurt Hinterbichler for pointing this out to us.
(only the coupling constants differ) -note that the relative sign between the two terms depends on how the Stückelberg fields Y are introduced, see section 3 for details. We now wish to re-express these interactions in terms of the kinetic eigen-modes φ (1) and φ (2) , which we recall are related to π (1) , π (2) via π (1) = 1 √ 2 (φ (1) + φ (2) ) and π (2) = 1 (2) ) . (7.10) We can now canonically normalise all the fields involved, again as discussed in the previous section, and the interaction terms become where a hat denotes canonical normalisation. We now still have to diagonalise the mass matrix forφ in order to express cubic order interactions in terms of the modesχ (1) ,χ (2) , which independently propagate at quadratic order. Note that, if m (2) = m (1) ,φ (i) =χ (i) and (7.11) already shows the cubic action for the independently propagating modes. However, in general we have to diagonalise the mass matrix where we are ignoring overall numerical factors, since we do not care about the numerical coefficients in front of mass terms forχ (1) ,χ (2) (these will vanish in the decoupling limit anyway). We now find the normalised eigenvectors of this matrix and use these to diagonalise (7.12) and in the process express cubic interactions in terms ofχ (1) ,χ (2) . We re-emphasize that, since we are using an orthogonal matrix to diagonalise the mass term, the resulting kinetic matrix forχ (1) ,χ (2) will also be canonically normalised and diagonal (since the one forφ already was). As a result of completing the full diagonalisation procedure, we obtain the following expression for cubic interaction p+q=3 p,q=0,1,2,3 where c i,p,q and d i,p,q are constant coefficients. Note that it is also possible to write this in closed form as (1) ) 3 + C 2 (∂ 2χ (1) ) 2 (∂ 2χ (2) ) + C 3 (∂ 2χ (1) )(∂ 2χ (2) ) 2 + C 4 (∂ 2χ (2) ) 3 (7.14) up to numerical coefficients not depending on any mass scale and where we have defined The first obvious observation to make is that, even though we have demixed the scalar action at quadratic order, at cubic order the mixing persists and interaction terms forχ (1) andχ (2) should not be looked at in isolation. There are four distinct interaction scales for the cubic action now, set by the C's in (7.14). What does this mean for the decoupling limit? It is interesting to look at two asymptotic limits.
1. m (1) = m (2) = m 2 : This is the case where α = 1 and here (7.11) already represents the cubic interaction for the independently propagating dof and we see that (up to numerical coefficients) the scale of interactions is set by where the positive sign applies to the (∂ 2χ (1) ) 3 , (∂ 2χ (1) )(∂ 2χ (2) ) 2 interactions, whereas the negative sign comes in for (∂ 2χ (2) ) 3 , (∂ 2χ (1) ) 2 (∂ 2χ (2) ), i.e. these modes are infinitely suppressed here. The suppression scales associated with the two surviving interaction terms are Λ is lower than the corresponding Λ 5 in a bigravity theory. See figure 8 for an illustration. We therefore expect such theories to become strongly coupled at a scale mildly lower than the analogous massive/bi-gravity theories. (1) m (2) : This is the case where α ∼ 0, i.e. one of the masses is much heavier than the other. In this case only the m (2) modes in (7.11) remain relevant, and comparing with (7.14) we have i.e. the physics of the theory is completely dominated by the lighter link, as expected.

m
In a general trimetric Λ 5 theory with arbitrary m (1) , m (2) the least suppressed mode will generically have a suppression scale Λ (+) 5 smaller than or equal to Λ (2) 5 (where we recall we have chosen m (2) ≤ m (1) ) -cf. Figure 8. Λ 5 theories with N spin-2 fields: The argument shown above can be generalised to more complex models, i.e. the generic bimetric interaction and n-point interaction theories discussed in the previous section. There will always be (several) 'sum' and 'difference' modes, where the sum/difference term in the denominator of the effective masses mimics the way the χ (l) are composed in terms of the π (i) . Some effective sum mode m 4 ef f,(+) will always be the lightest one and Λ (+) λ consequently always the decoupling limit scale. Note, however, that, just as in the trimetric example discussed above, (some of) the other effective masses associated with 'difference modes' may only be marginally heavier, so that generically one should keep all modes with similar suppression scales when investigating the phenomenology of some 'generalised decoupling limit'.
Let us now see how the above argument for trimetric Λ 5 theories generalises for an N spin-2 field theory. Especially interesting will be how suppression scales scale with N , the number of spin-2 fields. In a generic theory a Λ 5 term can be written 1 Λ 5 5 C µνρσαβ ijk π i µν π j ρσ π k αβ , (7.24) and when written in terms of the propagating modesχ (using (6.20)) the coefficient matrix becomesC since the normalisation matrix is diagonal. In a 'line' theory the kinetic terms will only mix scalars from adjacent links, and hence the kinetic term matrix will be tri-diagonal, and in the specific case in which the quadratic interactions take on the form found in the dRGT theory (when extended to bigravity), the smallest eigenvalues of this matrix scale as N −2 for large N . In such a theory (and in fact for any theory with purely bimetric interactions) the terms in equation (7.25) involve only fields of identical (original) flavour, i.e C abc ∝ δ ab δ bc (note that we do not sum over the b index until substituting into (7.25) and that, to avoid clutter, the space-time tensor structure is suppressed). Writing we can re-express equation (7.25) using (7.26) and C abc ∝ δ ab δ bc to obtaiñ where we have ignored the tensor structure relating to the Greek space-time indices, which is of no importance here. The most quickly growing entries in the coefficient matrix, associated with the smallest eigenvalues λ (i) of the kinetic scalar matrix, scale asC ∼ N 5 2 , corresponding to a suppression scale We can understand this scaling behaviour by noting that the canonical normalisation matrix N ab scales as N for the smallest eigenvalues (the corresponding eigenmodes will be those with the most strongly N -suppressed scale). Furthermore, the elements of the eigenvectors ofŨ will generically scale as O(N − 1 2 ). In other words, this second factor comes in because of the orthogonal nature of U K and U M . Overall this results in a scaling behaviour for the least suppressed modes That is, the effective suppression scale is Λ = Λ 5 / √ N as we found above. Note that care must be taken with this brute-force way of obtaining the scaling. While we have confirmed that it gives the correct result here, as we shall see below, if the flavour structure of C ijk is more complicated, then it is possible for cancellations to occur which lead to a slower scaling with N than would otherwise be expected from this argument.
An interesting observation for the Λ 5 line theory is that the same scaling can be obtained in a simplified way here. If U K is taken to be the orthogonal matrix of normalised eigenvectors of K, then the scaling ofC with N happens to be unaffected by the (unitary) diagonalisation of the mass terms. It will be interesting to investigate whether there is some deeper, modelindependent, reason why this is the case. However, taking the observation at face value, we here haveC where λ are the eigenvalues of the (pre-mass demixing) kinetic term matrix, which we already found scale as N −2 . The same reasoning as for equation (7.29) then establishes the scaling Λ = Λ 5 / √ N .
Trimetric Λ 3 theories: As seen in dRGT and Hassan-Rosen bigravity, higher order interactions can be tuned such that the lowest relevant scale is Λ 3 . Suppose we now build an interacting spin-2 field theory with N fields by linking sites with such ghost-free dRGT-like interaction terms and their generalisations to interacting spin-2 fields [9, 10, 36-38, 71] 28 . Again we will first give an explicit example for the trimetric case in order to illustrate the types of interaction terms one obtains and then consider the scaling with N for theories with a larger number of spin-2 fields. Focusing on the cubic terms, we expect interactions relevant in the decoupling limit to come from terms carrying the Λ 3 scale in Hassan-Rosen-like models ∼ h(∂ 2 π) 2 + (∂π)(∂h)∂ 2 π + (∂π) 2 ∂ 2 h + (∂A) 2 ∂ 2 π. (7.31) The first three terms are all of the {1, 0, 2} type, where the second and third term involving derivatives of h come from higher order terms in the expansion of the metric -cf. equation (5.4) and also appendix B. Note that it is consistent to ignore the final ({0, 2, 1}) term, since vector perturbations can classically be set to zero [90,96]. Consequently we will first focus on the {1, 0, 2} piece in isolation, before considering the final term later on. Again consider the trimetric line example (6.8). We focus on the relevant scalar-tensor interactions involving the 'central node'. This term will be affected by the conformal transformation we use to eliminate scalar-tensor mixing at quadratic order in the fields. Here this transformation will now introduce a scalar mixing in the cubic and higher order interactions even before re-expressing π (1) , π (2) in terms of the kinetic eigen-modes φ (1) , φ (2) . In particular, for the {1, 0, 2} terms in equation (7.31), this results in interaction terms of the type ∼ π (i) (∂ 2 π (j) ) 2 + (∂π (i) )(∂π (j) )∂ 2 π (i) + (∂π (i) ) 2 ∂ 2 π (j) . (7.33) We now want to show how such terms combine in terms of the kinetic eigen-modes φ (l) . In this section we will show this explicitly only for the cubic h(∂ 2 π (i) ) 2 term -it is straightforward to perform the analogous calculation for the second and third term and we will show combined results in a more compact notation in the following subsection. For the time being, however, it is useful to see the explicit way in which interaction terms combine in terms of the propagating modes. So for cubic h(∂ 2 π (i) ) 2 interactions we schematically have where we only show the pure scalar interactions post-conformal transformation and we recall that h (0) corresponds to the 'central' node in our trimetric theory. Re-expressing the interaction terms in terms of φ (1) and φ (2) , we recall π (1) = 1 √ 2 (φ (1) + φ (2) ) and π (2) = 1 √ up to overall numerical coefficients and where we have not canonically normalised the φ (i) fields yet 29 . The factors coming in as multipliers at the end of each line come from the pre-factor of the interaction term, the dimension of c (i) and the α factor in π (2) = 1 √ 2 (φ (1) − φ (2) )α −2 respectively. Canonically normalising and combining terms and mass scales, this set of cubic interactions reduces to (2) ).
(7.36) 29 Here one can also directly see that the conformal transformation generates interaction terms which do not retain their original scaling, since each c (i) only knows about one mass scale m (j) and cross-terms i = j consequently change the mass-scale in front of interaction terms.
where we see that the surviving interaction terms for the kinetic eigenmodes have effective mass scales We can also compute the contribution from the 'outer nodes' giving rise to h (1) and h (2) . There we have Again we re-express the interaction terms in terms of φ (1) and φ (2) and find 39) up to relative signs between interaction terms in the final expression, as these will depend on the relative size of m 2 (1) and m 2 (2) , now that we have expressed everything in terms of m 2 ef f,(±) . When combining contributions from (7.36) and (7.39), we find that all interaction terms only depend on these two mass scales.
Again we now have to diagonalise the mass matrix forφ in order to express cubic order interactions in terms of the modesχ (1) ,χ (2) , which independently propagate at quadratic order. Note that, if m (2) = m (1) ,φ (i) =χ (i) and (7.39) already shows the cubic action for the independently propagating modes. Just as in (7.12) in the Λ 5 case above we have to diagonalise the mass matrix Everything proceeds as in the Λ 5 case and we obtain the following expression for cubic interactions p+q=3;r+s=1 p,q=0,1,2,3;r,s=0,1 where c i,p,q,r,s and d i,p,q,r,s are again numerical coefficients. Closed form expressions as in (7.14) can also be derived, but the relevant scales can already be read off from (7.41). We re-emphasise that we have only been considering the h(∂ 2 π) 2 contribution to the {1, 0, 2} terms in (7.31) here. In general additional different interaction scales will arise from the other two {1, 0, 2} contributions in (7.31).
As a final step we should now again re-express (7.43) in terms of the modesχ (1) ,χ (2) . As a result we obtain wherec andd are numerical coeffcients not identical to the c, d coefficients of (7.41). Combining our results for h(∂ 2 π) 2 and (∂A) 2 (∂ 2 π) terms, it becomes clear that the decoupling limit has qualitatively changed from the bimetric Λ 3 case. Instead of just having two suppression scales Λ 3 associated with m 2 (1) and m 2 (2) respectively, we now have several 30 scales associated with {1, 0, 2} interactions and four scales associated with {0, 2, 1} terms. In particular we observe that interacting spin-2 field theories generically break the degeneracy between the scale of (∂A) 2 (∂ 2 π) and h(∂ 2 π) 2 interactions. This is illustrated in figure 8. Again it is useful to concentrate on two asymptotic cases in order to see how this happens explicitly. corresponding to the 'sum mode' arising from the {1, 0, 2} terms (the difference mode is the one with infinite mass and hence infinitely suppressed) and the 1/m 2 suppression carried by (∂A) 2 (∂ 2 π) respectively. This is an explicit example of how interacting spin-2 field theories break the degeneracy between the scale of {1, 0, 2} and {0, 2, 1} interactions. Again the smallest suppression scale is Λ (+) 3 and strictly speaking only interactions carrying this scale survive the decoupling limit. As before, however, one should be cautious that Λ (+) 3 and Λ (2) 3 may lie very close together, depending on the theory under consideration. As we will see in the next subsection, however, for large N theories the N -dependence of interaction terms will become the dominant factor unambiguously establishing which suppression scales are lowest.
2. m (1) m (2) : Again we also consider the case where one of the masses is much heavier than the other. The m (1) dependence of interaction terms then drops out and we find that {1, 0, 2} terms are of the form p,q=0,1,2,3;r,s=0,1 whereas {0, 2, 1} terms become The strong coupling scale is therefore now set by (up to numerical coefficients) Not surprisingly this is the same scale as for bigravity with m = m (2) -the much more massive m (1) modes are irrelevant to all intents and purposes here. It is worth pointing out that the extreme case considered here corresponds to the largest possible value for Λ (+) 3 -typically this scale will always lie lower than any individual Λ (n) 3 corresponding to a bigravity theory formed out of one of the theory's links. Λ 3 theories with N spin-2 fields: As discussed in the Λ 5 example above, one can generalise the trimetric Λ 3 argument to more general N spin-2 field models of the Λ 3 type. These will also give rise to several 'sum' and 'difference modes', with an effective sum mode Λ (+) 3 providing the lowest decoupling scale. We emphasize that these modes will always come from linear combinations of {1, 0, 2} interactions, re-iterating the qualitative change in the decoupling limit for multiple interacting spin-2 fields.
Let us again see what happens when we consider line theories with a large number of fields. More specifically we will be particularly interested in finding out how the suppression scale varies with the number of spin-2 fields. As explained above, in the Λ 3 case there will be two qualitatively different types of contributions to the pure scalar cubic terms. They are of the form π(∂ 2 π) 2 and (∂π) 2 (∂ 2 π) -cf. equation (7.33). We can compactly write these interaction terms as A,ijk π i π j µν π k αβ , and 1 Λ 3 3 C µν B,ijk π i µν π j,α π k α . (7.49) Specialising again to the case of a 'line' theory in which the quadratic interactions are of the dRGT/Hassan-Rosen type, the flavour index structures of C A and C B are: where β L and β R are constants, there is no summation over repeated indices here, and the tensor structure leads to terms π i X µ (2),µ (π j , π k ) and ( π i )π α j π k α respectively. We emphasize that lower case Latin indices are 'flavour' indices, whereas Greek indices are space-time indices. Expressing the interactions (7.49) using equations (7.25)-(7.27), we find We now again want to find out how the effective scale of interactions scales with N . Compared to the Λ 5 case, we now have a more complicated flavour structure, so care must be taken to track the cancellations of terms this implies. A brute-force scaling argument analogous to (7.29) would be blind to this subtlety. More specifically we find that the tensor structure imposes certain symmetries on the flavour structure. So for example the A-type terms are totally symmetric in the three fields, and the B-type terms are symmetric in the last two fields. With this in mind, one finds that the most quickly growing entries in the coefficient matrix scale asC A ∼ N We note that the B-type terms scale fastest, and hence will come to dominate. This also means the decoupling limit will be insensitive to te coefficients β L,R , which only affect A-type terms -cf. (7.50). In analogy to (7.28), the strong coupling scale of the theory will therefore be set by the B-type terms and we have Note that this scaling is the same as was found for the strong coupling scale in dimensional deconstruction models [33]. Those models deal with 'loop', rather than 'line', theories, but we see that in the large-N limit they scale in the same way. In this context it is perhaps also worth mentioning the mass eigenvalue prescription used by [33,79,80,82,83] in order to deal with what we have dubbed 'scalar mixing'. In the specific context of a circle theory this Fourier based method is a highly effective way of not only obtaining the propagating modes, but also the scaling of the interaction terms with N . Finally, it is again interesting to note that, at least for the line theories under consideration, the N -dependence for the 'decoupling' interaction terms is not affected by the mass-diagonalisation, just as we found in the Λ 5 case. Ignoring the mass-diagonalisation (essentially setting U M ab = δ ab ), the analogous expression to (7.30) is now: (a) EBI bigravity (b) Hassan-Rosen bigravity (c) 4 spin-2 "EBI"

Example I: EBI (Eddington-Born-Infeld) Bigravity
Throughout the previous sections we have provided a number of examples to illustrate the features of interacting spin-2 field models. In this and the following two sections we wish to complete this array of examples by explicitly showing how the Stückelberg analysis can be carried out for three such theories and e.g. how it can be used to demonstrate the presence or absence of ghosts (at least in a given limit/up to a given order in interaction terms). We begin by considering the EBI bigravity theory presented in [97] in its explicit bigravity formulation [98] Note that different versions of EBI theories exist [99] based on work by [100][101][102], where effectively one of the two metrics becomes an auxiliary field, so we are left with a single dynamical spin-2 field. Also see [99,[103][104][105][106][107] for related phenomenological studies in this version of the theory, e.g. motivated by the fact that in EBI-type theories cosmological solutions sourced by a perfect fluid can avoid the initial big bang singularity and naturally produce a period of exponential expansion akin to inflation. Though see [108] for possible problems with the description of compact objects. In what follows we will only consider the bigravity version (8.1). 31 The generalised Fierz-Pauli condition: In its bimetric formulation the Eddington-Born-Infeld interaction term can be written as As discussed in sections 3 and 5, we restore the gauge-invariance of the theory by introducing a single Stückelberg field Y µ = x µ +Ã µ → x µ +A µ +∂ µ π for the f µν metric, pulling it onto the 31 Note that the EBI bigravity interaction term, tr[g −1 f ], is reminiscent of one of the known ghost free interaction terms from the dRGT model and Hassan-Rosen bigravity we discuss in the next section, , only lacking the first trace of the square root.
'site' of the g µν . We will ignore the Stückelberg vector field A, and focus on the Stückelberg scalar π. We also expand each metric about a common, flat background, Up to terms quadratic in the fields, the result is 32 The pure-scalar part of this action (still prior to demixing) is L EBI, pure-scalar int, pre-demixing = 4 + 2π α α + π µν π µν (8.5) This action needs to satisfy the generalised Fierz-Pauli condition (5.3) to avoid ghost-like degrees of freedom. But in fact comparison with (5.2) shows that a = 1 and b = 0 here, so we have a + b = 1 = 0 and the generalised Fierz-Pauli condition is violated. Since the dangerous quadratic self-interactions for π consequently do not vanish up to total derivatives, a ghost-like degree of freedom enters the theory. This is also shown in table 1.
The full Stückelberg analysis: Returning to (8.4), we now wish to demix scalar and tensor modes, in the process introducing a standard kinetic term for π. As described in section 5, this is achieved by the conformal transformations h µν →h µν + c g πη µν , l µν →l µν + c f πη µν .
The new terms generated when these transformations are applied to (8.4) are while the linearised kinetic terms give (cf. (5.15)) L lin,g EH + L lin,f EH →L lin,g EH +L lin,f EH + 3(c 2 g + c 2 f )ππ µ µ + 2(c g h µν + c f l µν )(η µν π ρ ρ − π µν ). (8.8) Thus post conformal transformation the EBI Lagrangian is, to quadratic order and neglecting the Stückelberg vector A µ , (n,m) EBI scalar terms Galilean invariants Total derivative combinations Table 1. EBI bigravity scalar interaction terms from (8.11) up to numerical prefactors and grouped by (n, m), where n denotes the number of derivatives ∂ and m denotes the order in the field π for the given term. We list all EBI bigravity terms in the second column and go up to order (n, m) = (4, 2) -sufficient to see the appearance of ghosts that result from a violation of the generalised Fierz-Pauli tuning (5.3). Note that π has no potential terms {0, 1}. Gray (light shaded) contributions vanish up to total derivatives, while white contributions give Galilean invariants, i.e. ghost-free terms. Red (dark shaded) contributions denote terms which result in the equation of motion for π being higher than second order and hence give rise to Ostrogradski ghosts.
The scalar-tensor kinetic mixing can be partially removed by choosing c g = m 2 2 = c f , giving As discussed in section 5 the scalar-tensor mixing has only partially been removed by the conformal transformations and there are still residual mixing terms involvingh α α andl α α . These are removed by adding appropriate gauge-fixing terms enforcing the traceless nature ofh andl, leaving us with Extending the argument of section 5, it is now straightforward to see why a failure of the generalised Fierz-Pauli tuning (5.3) pre-demixing cannot be cured by taking into account scalar-tensor mixing terms. We denote the order of a given scalar interaction term by (n, m), where n counts the number of derivatives ∂ and m the order in the field π. Importantly different (n, m) cannot be related by integration by parts, so they cannot cancel up to total derivatives. Any (n, m) terms that produce dangerous ghost-like interactions must therefore vanish by themselves. Now consider the π α α π β β and π αβ π αβ terms relevant for the generalised Fierz-Pauli condition (5.3). These are {4, 2} terms, as expected, since these interactions are generated via the Stückelberg trick prior to any demixing procedure. Mixing terms h(∂ 2 π) n will always generate {2n, n + 1} scalar terms, so can never cancel potentially dangerous {2n, n} scalar interactions already present prior to demixing. This is why the dangerous π µν π µν interaction present in (8.11) was already visible in (8.5).
It is interesting to note that the ordinary kinetic term for π in (8.11) already has the wrong sign. In addition, the higher-derivative terms in (8.11) are non-degenerate, so given that they are not of the Galilean invariant form (cf. table 1) it is straightforward to appeal to Ostrogradski's theorem to infer the presence of ghosts. To see this explicitly and also in order to determine the mass scale of the ghost-like dof , we consider the non-total-derivative scalar piece from (8.11) and write where we have canonically normalised the scalar field π → m −2 M −1 Plπ and introduced an auxiliary field σ in the first line. In the second line we then perform a field re-definition π →π − σ to demix the fields. Thus we see that there is in fact just one ghost in the theory which is massless and the theory already displays ghost-like behaviour before any cubic or higher order interactions are taken into account. One further comment may be of interest. If the π field did have a mass term, then the field re-definition used above would generate a ∼ m 2 πσ coupling, i.e. a mixing in the potential V (π, σ). This is a quirk of the theory in D = 4 with the Planck masses of the two metrics equal.

Example II: Ghost-free Bigravity
Ghost-free massive gravity theories have recently been constructed, first for a single dynamical spin-2 field [9][10][11] and then an extension for two dynamical and interacting spin-2 fields [36][37][38]. The particular form of the interaction term V (g −1 f ) is what ensures ghost-freedom in these theories. Focusing on the second case, Hassan-Rosen bigravity, the action for in D space-time dimensions 33 is   33 Note that contrary to the rest of this paper, in this section we generalise both away from D = 4 (cf. [71]) and to non-equal Planck masses for the spin-2 fields. The reader will note that this does not change the procedure in any fundamental way. There are D free parameters β n , two of which correspond to cosmological constants (β 0 and β D ) and there is degeneracy between scaling the β n and m. Here we now wish to apply the Stückelberg machinery to this theory as a check for the 'analysis recipe' developed throughout this paper to be compared with the results of [9][10][11]31].
The Stückelberg analysis: We now restore gauge invariance for (9.1), where in the language of section 4 we take approach I. In other words, we only introduce a single Stückelberg field Y mapping f to the site of g or vice versa. Note that what seems like a trivial and arbitrary choice in this context actually lies beneath the duality presented in [32]. The two different choices of link field Y f g or Y gf will lead to two different galileon theories for the Stückelberg theories in the decoupling limit, which are nevertheless physically equivalent, i.e. dual to one another. With an interaction e m ( g −1 f ), we here choose to map f to the site of g, i.e. work with the Stückelberg fields Y f g 34 . Each metric is expanded about Minkowski as g µν = η µν + h µν , f µν = η µν + l µν , (9.5) where, contrary to the previous example, we have defined perturbations for g and not g −1 .
We have chosen this here since if one is building theory graphs with more links, such as in section 6, then it can be helpful to treat all fields in the same way. Since the action is linear in the e i we can consider each separately and confirm that, to a given order, no ghost is present.
1. The pure scalar part prior to demixing: For an e m interaction it can be shown that the pure scalar part is where L TD n (π) indicates the n th order total derivative combination of second derivatives of π -for more details see appendix A. Thus the Lagrangian is purely total derivative and vanishes (up to boundary terms). Note that this is not only true at quadratic order in the field π, but at all higher orders too. Phrased in terms of {n h , n A , n π } as defined in section 7 this means that all pure scalar interactions from {0, 0, 3} to {0, 0, n π → ∞} associated with suppression scales Λ 5 to Λ 3 are eliminated up to total derivatives. Since this tuning also results in A-dependent Λ 4 suppressed terms vanishing, this raises the cutoff of the theory to Λ 3 . 2. Scalar-tensor mixing: The mixing with h takes the form (ignoring terms with more than one h since these will have higher suppression scales and hence vanish in the decoupling limit) where X (n) (π) are the conserved tensors formed from second derivatives of π as defined in (5.13). If the f metric is fixed to be flat as in dRGT massive gravity, there are no further scalar-tensor mixing terms. However, as f is dynamical there will be further terms involving the mixing of l with π. Such terms take the form At lowest order in the scalar-tensor mixing we can simplify these re-summed expressions to an even simpler form. The mixing (9.8) then becomes where → denotes the effect of the conformal transformation l →l + c l π, and c is given by (9.11). Thel α α term actually vanishes by itself (but otherwise would be removed by gauge-fixing),l µν X µν (n) (π) cancels against terms generated from the Ricci scalar R[l] via the conformal transformation (see section 5) and πX µ µ,(1) (π) gives a contribution to a kinetic term for π.
3. Demixing and ghost-free scalar interactions: From the above we see that transformations will eliminate scalar-tensor mixing at quadratic order in the fields, where we have defined the coefficient c to be This means we obtain a scalar kinetic term (9.16) 9.12) and the structure of the interaction term ensures there are no further dangerous interactions at quadratic order in π, i.e. Fierz-Pauli tuning is upheld. The fact that this tuning generalises to eliminate all pure-scalar higher order interactions ensures that the decoupling scale of the theory is Λ 3 . One could now write out higher-order interaction terms in the decoupling limit to some given order in the π and show they are ghost-free up to this given order along the lines sketched out in [9]. However, the infinite derivative expansion of l µν (Y ) means that for this purpose it is more economical to analyse interactions in a re-summed format, cf. [31].
4. Re-summing interactions: The infinite derivative expansion of l µν (Y ) in the scalartensor mixing involving l creates a difficulty in analysing the full decoupling limit of the theory, rather than the same limit to some finite order in π. Following [31] it is therefore advisable to re-sum l µν (Y ) After a little algebra and using the techniques discussed in appendix B, the mixing terms (9.8) become In particular, in terms of the field φ related to π through x + ∂φ = (x + ∂π) −1 , we find (9.14) π and φ are essentially the Stückelberg scalars corresponding to choosing a Stückelberg field Y mapping from the f -site to the g-site and vice versa respectively. In other words, by Stückelberging the scalar-tensor mixing involving l with Y (gf ) and that involving h with Y (gf ) , we are essentially employing a hybrid approach, since we use two Stückelberg fields to Stückelberg a single interaction term, although they are of course not independent, but related via Y (gf ) = Y −1 (f g) . Having expressed the scalar-tensor mixing in this hybrid format, the conformal transformations demixing at quadratic order in the fields are now where c is defined as in (9.11). This will remove the scalar tensor mixing (at lowest order) and give scalar kinetic terms Note that we can express φ in terms of an infinite series in π (see appendix B) to find that, again at lowest order, φ = −π. This explains why both conformal transformations (9.10) and (9.15) successfully demix scalar and tensor terms at lowest order.
Having employed the above re-summation procedure, investigating higher order mixing terms becomes more straightforward. The canonical transformations (9.15) will introduce terms of the form where πX µ (n)µ (π) = π∂ µ 1 (δ µµ 1 ...µn [µν 1 ...νn] π ν 1 π ν 2 µ 2 . . . π νn µn ) = (D − n)π∂ µ 1 (δ µ 1 ...µn [ν 1 ...νn] π ν 1 π ν 2 µ 2 . . . π νn µn ), (9.18) and analogously for φX µ µ (φ). As discussed in appendix A, upon integration by parts this becomes π µ π ν X µν (n−1) (π), i.e. the (n+1)-th Galileon term, and so these will not introduce any new ghosts. For a full treatment of the Hassan-Rosen bigravity decoupling limit we refer to [31]. As far as this paper is concerned, there is an important point of principle here, however. The effective field theory description outlined here is very powerful in understanding what the propagating degrees of freedom of a general N interacting spin-2 field theory are, how to identify the associated energy and cutoff scales etc. However, if we want to make statements about interaction terms to all orders, the inherent nonlocality introduced by the Stückelberg expansion of a spin-2 field h(x + A) means it becomes necessary to perform a re-summation as outlined here and discussed further in appendix B.

Example III: A quartic EBI-like interaction
The previous two examples are explicitly bimetric, so we would like to finish the example section(s) of this paper with a higher order example. By using the arguments presented throughout this paper we here wish to show that N -metric generalisations of the bimetric EBI models considered in section 8 also always contain ghost-like instabilities at quadratic order in the Goldstone fields π. As such consider an EBI-like theory with four dynamical spin-2 fields, each equipped with its own Ricci scalar, and an interaction term βµ . (10.1) In order to restore gauge-invariance we introduce three Stückelberg fields along the lines discussed in sections 3 and 5, which we label π (j) . We can demix tensor and scalar contributions via where h (i) = g (i) − η. Note that, when we use the shorthand π (3) , we really mean π (3,0) , but in this specific example the extra index is redundant, since we have mapped all spin-2 fields to the 0-site. Now we recall that, even before demixing and as shown in equation (5.2), the interaction term needs to satisfy a generalised Fierz-Pauli condition at quadratic order to avoid ghost-like degrees of freedom. The pure-scalar action prior to demixing is L pure-scalar int, pre-mixing = η µν δ β µ + π β,(1) At quadratic order in the fields this reduces to L pure-scalar int, pre-mixing = 4 + 2 i π α,(i) This maximally violates the generalised Fierz-Pauli condition at quadratic order, i.e. none of the dangerous self-interaction and cross-terms vanish up to total derivatives (cf. equation Consequently ghost-like degrees of freedom enter at quadratic order in the fields and it is straightforward to see how this generalises to an arbitrary N spin-2 field generalisation of EBI bigravity. The pattern of equation (10.3) will be reproduced and all possible dangerous terms will be present, since there is no means for these to cancel given the interaction term.

Summary and conclusions
Throughout this paper we have generalised and extended the 'Effective field theory for massive gravitons' developed by [77] to an 'Effective field theory for interacting spin-2 fields' in light of recent progress in massive gravity -in particular the constructions of dRGT gravity [9,10], Hassan-Rosen bigravity [36][37][38] and Hinterbichler-Rosen multi-gravity [71]. The key results of this paper are as follows • We discussed how the Stückelberg trick can be used to explicitly restore gauge invariance in interacting spin-2 field theories. While in bigravity there are two different ways to do so, which already lead to very interesting duality theorems [32], in multi-gravity there is a multitude of possible approaches. We investigated these and showed that they are also ultimately all dual to one another. In particular we established a minimal approach that maximally simplifies interaction terms and makes the propagating degrees of freedom explicit.
• We investigated generalised scalar-tensor mixing in multi-spin-2 field theories and how to remove this, bringing out various subtleties not present in massive and bi-gravity. Especially intriguing is a generalised 'Fierz-Pauli' condition one can derive for the purescalar interactions at quadratic order, even before any demixing has taken place. Gauge fixing issues relevant in eliminating scalar-tensor mixing at lowest order in this context are also discussed.
• Another phenomenon absent in massive and bi-gravity, but generic in multi-gravity, is that of scalar mixing at quadratic order in the fields. We investigated this mixing and showed how it changes some of the propagating degrees of freedom of the theory into linear combinations of the Stückelberg scalars.
• Scalar mixing at quadratic order has important consequences for the structure of higher order interactions, in particular the decoupling limit, as well. We presented results showing that this limit is qualitatively changed in interacting spin-2 field theories. More specifically they generically possess a strong coupling scale that is lower than that in analogous bigravity models (cf. related results in [33]). Also only some of the interactions surviving in the decoupling limit in massive and bi-gravity contribute to this new lowest scale in multi-gravity theories (i.e. multi-gravity models break the degeneracy between different terms contributing to the same scale 35 ), changing the physics of the decoupling limit.
• Finally we applied some of the machinery developed here to a few concrete model examples. Of particular interest is our proof that EBI (Eddington-Born-Infeld) bigravity has a ghost-like instability, as well as generalisations of this theory to higher-order interactions. We also showed how Hassan-Rosen bigravity fits into our formalism.
Theories of interacting spin-2 fields stand to promise several new and intriguing features absent in theories with up to two spin-2 fields. Various tantalizing prospects are on the horizon: Higher-dimensional theories of gravity where the extra dimension(s) are discretised are intrinsically linked to particular interacting spin-2 field theories where the theory graph forms one big loop with nearest site interactions [33,[77][78][79][80]. We will investigate this and the general phenomenology of loops further in [92]. In this context one may also wonder whether the large N limit of multi-gravity theories possesses interesting generic features on top of the ones discovered so far. Perhaps new ghost-free interaction terms appear in the context of several fields, that are absent whenever just one dynamical spin-2 field is present. Another interesting topic for further investigation will be to establish whether consistent multi-gravity theories permit the existence of superluminalities -cf. discussions in [109][110][111][112]. Finally one should stress that both the 'Effective field theory of massive gravitons' as well as its extension presented here exclusively deal with constructing and analysing 'potential' interaction terms, i.e. terms at zeroth order in derivatives. Investigating whether derivative interaction terms other than Ricci scalars are admissible along the lines of [87][88][89] may lead to exciting new avenues in constructing theories of one-to-N spin-2 fields. We hope that the toolbox developed throughout this paper will prove useful in tackling some of these outstanding issues in the future.
So again this mixing leads to ghost-free scalar interaction terms at all orders n. It is straightforward to generalise this to conformal transformation involving more than one scalar dof in interacting theories with mixing cross-terms involving X µν (n) (π 1 , . . . , π n ), where not all the π (i) are identical. Also note that we can express X µν (n) in terms of L TD (n) via [10,90] X µν (n) (π) = 1 n + 1 δ δπ µν L TD (n+1) (π). (A.13)

B Appendix II: Non-local scalar-tensor interactions and demixing
We discussed scalar tensor mixing terms at quadratic order in the fields in section 5 and we discussed the structure of higher order mixing terms in section 7. One interesting issue that appears for higher order interaction terms is that of non-locality. Generic scalar-tensor interactions generated via Stückelberging interaction terms will be of the form p(π) µν (h µν + π α ∂ α h µν + 1 2 π α π β ∂ αβ h µν + . . .), (B.1) where p(π) µν is some function of π µν . This appears to be non-local due to the infinite number of derivatives acting on the tensor h. Naïvely one may think that this automatically leads to higher order equations of motion, and hence ghosts via Ostrogradski's theorem. However, the non-local interaction Lagrangian generated in this fashion is degenerate, since there is no highest order derivative term, and one consequently cannot straightforwardly apply Ostrogradski's theorem here. It is therefore worthwhile to re-sum terms in the expansion in order to produce a local theory to which Ostrogradski's theorem can be applied (cf. [31,114,115] and our discussion of this point in section 9). The non-locality stems from the expansion of the original tensor field once the Stückelberg fields have been introduced. We recall that the introduction of Stückelberg fields essentially amounted to a co-ordinate transformation, where Y µ = x µ + A µ and h µν (Y ) = h µν (x + ∂π) = h µν + π α ∂ α h µν + 1 2 π α π β ∂ αβ h µν + . . . , (B.2) where we have dropped contributions from the Stückelberg vector A µ . Thus, if we re-express the rest of the scalar-tensor action in terms of coordinates Y µ (x), instead of x µ (x), we can remove the non-locality. Let us first analyse the way in which this works for a general N -metric coupling, in which Stückelberg fields are introduced to bring everything to site 1: (1) , g (2) , . . . , g (N ) ) → d D x f (g (1) , g (2) • Y 2 , . . . , g (N ) • Y N ).

(B.4)
37 Of course to get to the decoupling limit one would have to unpack Y into vector and scalar fields as described above, however after doing that, we can always repackage them back into Y .
The first term is straightforward to deal with since all the fields explicitly depend on x, so let us focus on one term of the remainder, which can be rewritten where in the new interaction term we have Now if the functional form of f i is such that the determinental prefactor on the RHS of B.6 can eliminate the ∂Y i ∂x 's appearing within f i , then the explicit dependence on x is eliminated and we have F µν i = F µν i (Y i ). This means we can then rename the integration variable on the RHS of B.5 Y i → x and add it back to the rest of the action.
We do not here analyse what forms of f i have the above property, but demonstrate the precise way in which this works in a specific context relevant to the ghost-free bigravity theory discussed in section 9. Similar to the X (n) tensors mentioned above, we may define Working in D dimensions we can re-express this as where we have defined X Y −1 µ (n)ν in the process. Thus a bimetric interaction with f µ ν = ∂ ν Y α X Y µ (n)α has F µ ν ∝ X Y −1 µ (D−n−1)ν , which is a function purely of Y . Upon renaming Y → x, one has ∂x ∂Y → ∂Y −1 ∂x . Neglecting the Stückelberg vector A µ , Y satisfies ∂ [µ Y ν] = 0, and thus Y −1 will also satisfy this relation; since Y depends on just one scalar d.o.f., Y −1 will also. These two facts imply that we may write ∂ µ Y −1 ν = η µν + ∂ µν φ, where φ is related to π through some non-local transformation, which we specify below. Then we have and hence we can express the scalar-tensor mixing in terms of φ as which is precisely of the form discussed in section 5 that can straightforwardly be removed at linear order via a linearised conformal transformation. For completeness we present below the form of the above-mentioned transformation between φ and π, which is derived using the fact that Y −1 (Y (x)) = x. Defining the differential operators = −π + 1 2 π µ π µ − 1 2 π µ π ν π µν + 1 2 π µ π ν π ν λ π λ µ + 1 6 π µ π ν π λ π µνλ + . . . . (B.12) C Appendix III: No kinetic terms and a general curved background In section 5 we discussed that the Stückelberg scalars π only acquire a kinetic term via mixing with tensor modes. As pointed out by [77], while this is true around some flat background, this is not the case in a general curved background. Focussing on the self-interactions for each scalar for the time being (effectively setting a (ij) = b (ij) = 0, whenever i = j), the quadratic order action for a Goldstone scalar π will be of the following form (C.1) as long as the equivalent of Fierz-Pauli tuning takes place (a (ii) +b (ii) = 0) and where we have suppressed all site-label indices (ik). In a curved background the field therefore already has a kinetic term generated by curvature, even before mixing with tensor modes. Mixing will still generate a contribution to the effective kinetic term, but is no longer the sole contributor. If one now also allows cross-interactions between different Goldstone bosons, as long as the generalised Fierz-Pauli tuning is satisfied (a (ij) + b (ij) = 0) the curvature generated terms will be of the form 38 d 4 x √ gπ (ik) µ π (jk) ν R µν .

(C.2)
Note that the derivatives on the Goldstone π in action (C.1) come from the Stückelberg co-ordinate transformation and from introducing the fake U (1). As far as the fake U (1) is concerned, the background is irrelevant because the derivative acts on a scalar. The second derivative comes from the Stückelberg coordinate transformation We can understand how this is 'covariantised' by going to the full expansion of a Stückelberged metric around a curved background ∂ µ Y α ∂ ν Y β g αβ (Y (x)) = (δ α µ + ∂ µ A α )(δ β ν + ∂ ν A β )(g αβ + A σ ∂ σ g αβ + . . .) = g µν + ∂ µ A α g αν + ∂ ν A β g µβ + A σ ∂ σ g µν + . . . = g µν + L A g µν + . . . = g µν + ∇ µ A ν + ∇ ν A µ + . . . , (C.4) where L indicates the Lie derivative, and the ellipsis indicates terms containing higher powers of A. 38 If one was to require the absence of cross-terms for i = j, this would be equivalent to setting a (ij) = b (ij) = 0.