Recent developments in bimetric theory

This review is dedicated to recent progress in the field of classical, interacting, massive spin-2 theories, with a focus on ghost-free bimetric theory. We will outline its history and its development as a nontrivial extension and generalisation of nonlinear massive gravity. We present a detailed discussion of the consistency proofs of both theories, before we review Einstein solutions to the bimetric equations of motion in vacuum as well as the resulting mass spectrum. We introduce couplings to matter and then discuss the general relativity and massive gravity limits of bimetric theory, which correspond to decoupling the massive or the massless spin-2 field from the matter sector, respectively. More general classical solutions are reviewed and the present status of bimetric cosmology is summarised. An interesting corner in the bimetric parameter space which could potentially give rise to a nonlinear theory for partially massless spin-2 fields is also discussed. Relations to higher-curvature theories of gravity are explained and finally we give an overview of possible extensions of the theory and review its formulation in terms of vielbeins.


Motivation
The Standard Model of particle physics contains massive and massless fields with spin 0, 1/2 and 1. Gravitational interactions are attributed to a spin-2 field which, in the standard framework of General Relativity (GR), is massless and possesses nonlinear self-interactions. Even though the Standard Model and GR are experimentally and observationally well-tested, several phenomena still remain unexplained and motivate the study of theories beyond the standard picture. In particular, two of the biggest unresolved problems concern the presently unknown nature of dark matter and dark energy and, in order to account for these constituents in a satisfactory way, introducing new physics becomes unavoidable. Additional degrees of freedom could be of the same type as the fields already present in the standard scenarios or, more interestingly, they could arise from heretofore unknown field theories. While the field theories for spin 0, 1/2 and 1 are well-understood, the treatment of higher spins turns out to be much more difficult. One of the simplest, or at least most natural, new ingredients that could be added to the known models is a massive spin-2 field whose presence is expected to mostly affect the gravitational sector. This may be desirable since modifying gravity is motivated by the fact that the Standard Model of particle physics is based on the very solid framework of a renormalisable quantum field theory, while a quantum theory of gravity does not yet exist and hence GR is not expected to be complete. Moreover, both the dark energy and the dark matter problems are intimately related to gravity but cannot be solved in the context of GR without raising additional questions.
Interactions for massive spin-2 fields have long been thought to inevitably give rise to ghost instabilities and only recently a ghost-free theory for nonlinear interactions between massive and massless spin-2 fields has been found. Since the construction of nonlinear massive gravity and its extension to bimetric theory there has been significant progress in the field, both on the theoretical and the phenomenological side. Two long review articles on the subject have already been written by Hinterbichler [1] and de Rham [2]. While these references focus on massive gravity, its gauge invariant formulation in terms of Stückelberg fields and its cosmology, this review is mostly dedicated to the manifestly covariant and dynamical bimetric theory. We focus on theoretical aspects and the structure of the theory (which is enforced by consistency) and mention its application to cosmology only as an aside. Moreover, all our considerations in this article will be at the classical level, although quantum corrections have been under preliminary investigations and are discussed in more detail in de Rham's review [2], mostly in the context of massive gravity.
The following subsection summarises the interesting historical developments that led to the construction of consistent spin-2 interactions. It is no prerequisite for the subsequent discussions and the reader more interested in the details of massive spin-2 theories may directly proceed to section 2.

Historical background
Since massive gravity and bimetric theory have a rather long history starting with the work by , it is impossible to give sufficient credit to all the groups that have contributed to the field over the last 75 years. We try here to collect the most important inputs towards the modern construction and pay attention to historical accuracy to the best of our knowledge.

Early attempts
The program of investigating massive spin-2 fields was initiated by Fierz and Pauli, who derived the unique classically consistent linearised theory of a free massive spin-2 field h µν propagating in Minkowski space-time [3,4]. They demonstrated that the corresponding Lagrangian is of the form, where h = η µν h µν and m FP is the mass parameter. The first line here corresponds exactly to the kinetic operator obtained by linearising the Einstein-Hilbert action of GR around flat space, while the second line encodes the quadratic non-derivative selfinteractions of h µν that render it massive. The equations of motion derived from the above Lagrangian are equivalent to the system of equations, The first of these is a massive wave equation, while the latter two are constraints on components of h µν . In particular, Fierz and Pauli showed that if the relative coefficient (−1) of the two parts in the mass term of (1.1) is changed in any way, the on-shell condition of tracelessness is lost and a ghost-like scalar mode inside h µν becomes propagating. At the classical level a ghost is a field with negative kinetic energy which gives rise to an unbounded Hamiltonian and thus causes fatal instabilities; at the quantum level ghosts must be avoided in order to ensure unitarity. It is therefore crucial to work with the above Lagrangian with correct relative coefficient in the mass term. In four space-time dimensions, it describes the on-shell propagation of a traceless, transverse and symmetric tensor field h µν with five massive degrees of freedom. This allows us to identify h µν with a massive spin-2 field with helicities ±2, ±1, 0. In all that follows, when using the terms mass and spin, we refer to their correspondence to degrees of freedom of relativistic field equations; we will rarely speak about the quantum nature of these concepts. In a similar fashion, we will frequently employ the particle perspective when dealing conceptually with nonlinear generalisations of the Fierz-Pauli Lagrangian, such as bimetric theory.
During the following years much of the efforts in field theory was directed elsewhere, developing a firm understanding of GR and exploring the booming realm of particle physics. Not much consideration was paid to Fierz and Pauli's theory of massive spin-2 until the early 1970's. We note however that related questions concerning bimetric theories were raised early on by Rosen [5,6] and later by Aragone & Deser in [7] and by Isham, Salaam & Strathdee in [8]. A major development came about when van Dam & Veltman [9] and Zakharov [10] (see also [11]) independently investigated particular consequences of the Fierz-Pauli Lagrangian interpreted as a theory of a massive graviton. They realised that, in the presence of matter sources, the zero-mass limit of the theory is discontinuous, a property which is now referred to as the vDVZ discontinuity. More precisely, this limit does not result in a theory for a single massless spin-2 field like linearised GR, but also contains a propagating scalar field which couples to the trace of the stress-energy source. The observational consequence would be an inferred difference in the bending of light around massive sources which was so severe that the theory would have been ruled out already at this time. Shortly thereafter, however, Vainshtein recognised a loophole in this reasoning [12]. He argued that, due to the presence of more scales in the theory when coupled to a source, a scalar mode becomes strongly coupled below some distance r V (the Vainshtein radius). Hence the linearised analysis breaks down in this regime and a nonlinear completion of the theory is necessary in order to address any questions at distances within r V in a consistent manner. In particular these findings made it possible again to recover GR at short distances in the zero-mass limit. That this recovery can indeed be realised at the nonlinear level was demonstrated in [13] for the case of ghost-free bimetric theory (for a recent review on the Vainshtein mechanism see [14]).
As an immediate response to Vainshtein's idea, Boulware and Deser studied the consistency of a wide class of possible nonlinear extensions of the massive Fierz-Pauli theory. They concluded that it was inevitable to introduce an extra propagating ghost-like scalar mode in any nonlinear extension of the theory [15]. In fact, this scalar mode, the Boulware-Deser ghost, is the same mode that at the linear level was removed by the trace constraint on h µν . The implication of this analysis was that no consistent nonlinear theory of massive spin-2 fields could exist. As we now know this strong conclusion was incorrect for two reasons: (1) Boulware and Deser did not consider the most general nonlinear extensions possible in their analysis. They considered only nonderivative self-interactions of h µν that were given through a general analytic function f (h 2 − h 2 µν ), which naturally Taylor-expands to the Fierz-Pauli mass term at lowest order. As we will see later, in the consistent theory the self-interactions are contained in very specific scalar functions (the elementary symmetric polynomials) constructed out of the matrix argument [ 1 + η −1 h] µ ν . The correct field dependence of the interactions is therefore not of the form assumed in the proof by Boulware and Deser. (2) In their Hamiltonian analysis, they expected one specific equation (the equation of motion for the lapse variable of the metric) to provide the constraint that removes the ghost. The analysis of the consistent theory reveals that, in fact, it is a different equation (a rather contrived combination of lapse and shift equations) that gives the constraint.
The conclusions of Boulware and Deser's analysis were so widely accepted that no further progress was made in the field for another 30 years and it would be almost 40 years until a consistent theory was fully developed. In hindsight this is unfortunate since, in the meantime, some interesting ideas did not receive the deserved attention due to the strong no-go theorem. For example, the correct structure of interactions for massive spin-2 fields was in fact partly suggested very early on by Zumino and also Chamseddine [16,17], but without addressing the ghost problem (in fact it can be noted that [16] actually predated the no-go theorem). Similarly, the correct structure in the vielbein formulation was partly written down in [18], again failing to address the ghost issue. It has also recently been pointed out in [19] that some attempts to construct a theory of nonlinear massive spin-2 fields were made by Maheshwari [20] using ideas from the works of Ogievetsky and Polubarinov [21,22]. These constructions however went largely unnoticed and did not contribute towards the modern ideas and understanding of nonlinear massive gravity.
Another approach that has become popular in recent years and deserves mentioning is the attempt to construct consistent spin-2 mass terms that break Lorentz invariance [23][24][25]. Based on the experience gained from the recent progress made in constructing the Lorentz invariant theories, similar progress has also been made in constructing Lorentz-breaking theories of massive gravity [26][27][28]. In this review we will however restrict ourselves to the class of theories that respects Lorentz invariance.

A renewed interest
After the very precise confirmation of the accelerated expansion of the universe in 1998 [29,30], a huge effort was devoted towards understanding better the underlying physics of this discovery. Within the context of standard GR this observed accelerated expansion of spacetime itself requires the addition of an additional source term with rather strange behaviour as compared to standard matter sources. The main-stream philosophy at this time was to hold firm in GR and not worry too much about the origin of such a source but simply collectively call this mysterious source "dark energy". The simplest explanation for the acceleration is the presence of a constant source term in Einstein's equations. Adopting this view led to the celebrated ΛCDM ("Λ Cold Dark Matter") concordance model of cosmology, which is in excellent agreement with observational data thus far [31]. On the other hand, a growing community of theoretical physicists with a particle physics oriented mind was now realising the pressing nature of the cosmological constant problem [32]: the small value of the observed acceleration does not fit in with expectations from a particle physics perspective, where a cosmological constant is naturally associated with vacuum energy. Unless additional symmetries (such as supersymmetry) are at work, a natural value for the vacuum energy scale is the mass of the heaviest field in the theory, which in any scenario is many orders of magnitude higher than the observed value for the cosmological constant. In addition to the poorly understood nature of dark energy, cosmologists seek to explain the presence of an unidentified matter component, commonly referred to as "dark matter", which, in the context of GR, is required to account for the observational data at distances ranging from galactic to cosmological scales [31,33,34].
Since quantum field theory is such a rigorous framework and both the nature of dark matter and in particular dark energy seems so deeply connected with the large scale behaviour of gravity, which was really only tested within the solar system, many theorists started to look for an answer by modifying the gravitational sector of field theory. In the beginning much attention was given to extra dimensional setups which were mainly inspired by the additional dimensions arising in string theory and by the braneworld scenarios geared towards addressing the Higgs hierarchy problem of the Standard Model as well as supersymmetry breaking through anomaly mediation [35][36][37]. With respect to the cosmological constant problem, a particular interest was paid to models of brane induced gravity [38][39][40] and similar constructions (e.g. cascading gravity [41,42]). Brane induced gravity models, and in particular the codimension-one DGP model [38,43] (for related investigations see [44]), were historically very important for a renewed interest in studying massive gravitons, since a generic feature of these models was the appearance of massive spin-2 resonance states on the brane that makes up our Universe in these models (see e.g. [45]). Due to the generic Yukawa suppression of the gravitational potential mediated by a massive field, a massive graviton also became interesting in itself for addressing the cosmological constant problem. The hope was that this exponential suppression could sufficiently weaken gravity at large distance scales to screen out a large vacuum energy coming from the matter sector and thus lead to a small effective cosmological constant. Even though this picture is correct in linear Fierz-Pauli theory, it eventually turned out that it would not work in the nonlinear theory without fine-tuning (as discussed in e.g. [46]).
In 2002 Arkani-Hamed, Georgi and Schwartz proposed a new perspective on studying effective theories with explicitly broken general covariance, in close analogy to symmetry breaking in spin-1 theories and the associated emergence of Goldstone modes [47]. These ideas were quite general but in particular provided a new language for analysing the internal consistency of massive gravity. More explicitly, the setup was based on intuition from the Goldstone boson equivalence theorem [48,49], which relates the physics of longitudinal modes of spin-1 gauge bosons to the physics of Goldstone modes at high energies. The authors of [47] suggested a similar correspondence in gravitational theories and that, in certain energy regimes, the complex problem of selfinteracting spin-2 fields could be simplified to studying only their scalar longitudinal components.
In 2005 Creminelli, Nicolis, Papucci and Trincherini followed up on the ideas of Arkani-Hamed, Georgi and Schwartz (further inspired by the results of [23,24,50]) and attempted to explicitly construct a consistent theory of massive gravity using a bottomup approach [51]. 1 As a bitter irony of history (and humbling lesson in importance of rigor), despite a very beautiful analysis they reached the erroneous conclusion that the ghost problem of nonlinear massive gravity could not be resolved and found the Boulware-Deser ghost reappearing again. This result was based on an unfortunate sign mistake which arose from copying a basic equation of [47] (for which the sign was not important). It is worth mentioning that without this sign mistake the discovery of a consistent theory of massive gravity could have been made already in 2005.

Massive gravity rediscovered: dRGT theory
In 2009 the issue of massive gravitons was further pursued by Gabadadze who modified GR by introducing an auxiliary extra dimension [53]. This work was shortly followed up by (and with) de Rham in [54,55]. Their approach was based on an interesting field theoretical tool to generate a mass term for a vector (or scalar) field by imposing boundary conditions in an auxiliary extra dimension. Even though introducing an extra unphysical dimension and fixing boundary conditions by hand seems rather ad hoc, the procedure itself straightforwardly extends to the spin-2 case. The massive spin-2 model obtained in this way was demonstrated to be ghost-free to cubic order in a "decoupling limit" analysis [55]. Although the same setup was subsequently shown to be inconsistent in a fully nonlinear analysis by Hassan and Rosen [56], it provided important inspiration that pushed developments further. The belief that the auxiliarydimension model was consistent motivated de Rham and Gabadadze to revisit the ghost analysis of nonlinear massive gravity by Creminelli et al.
In 2010 de Rham and Gabadadze studied generic extensions of the Fierz-Pauli Lagrangian (1.1) by higher-order interactions of the massive spin-2 fluctuation h µν [57]. Their analysis went to quintic order in the longitudinal component of the massive spin-2 field and demonstrated that its interactions could in fact be made ghost-free in a decoupling limit, correcting the conclusions of [51]. The decoupling limit analysis relies heavily on the aforementioned Goldstone boson analogy suggested by Arkani-Hamed, Georgi and Schwartz and requires taking a double scaling limit in order to study the dynamics of the longitudinal mode separately. As a follow up to [57], de Rham, Gabadadze and Tolley (henceforth dRGT) presented a nonlinear theory of massive gravity in whose decoupling limit they proved the absence of ghost for all nonlinear self-interactions of the longitudinal component [58]. The dRGT action is of the form, √ g R(g) + m 2 2 4 n=2 n!(4 − n)!α n e n (K) , (1.3) where the first term is the ordinary Einstein-Hilbert term of GR with Planck mass m g and the second term is the interaction potential for the graviton whose mass is set by the scale m. Furthermore, α 2 = 1 while the two remaining α n are arbitrary interaction parameters. The e n (K) are the elementary symmetric polynomials (see appendix A) constructed out of the matrix K µ ν = δ µ ν − [ g −1 η] µ ν . 2 Expanding the action in terms of h µν = m g (g µν − η µν ) indeed results in a nonlinear extension of the Fierz-Pauli theory (1.1).
The authors of [57,58] also made the important observation of a loophole in the Hamiltonian analysis by Boulware and Deser [15] which had long been believed to forbid any consistent theory of nonlinear massive spin-2 interactions. 3 By avoiding the ghost in the interactions of the longitudinal component, the dRGT model satisfied one of the necessary requirements on the complete theory to be consistent and consequently became the most promising candidate for a consistent theory of massive gravity. However, as noted, the proof of absence of the Boulware-Deser ghost in [58] was not made for the full theory since the analysis focused on self-interactions of one scalar component only. One reason for the difficulty of conclusively proving the consistency was the complexity of the nonlinear spin-2 interactions whose structure required better understanding before a full Hamiltonian analysis could be performed.

Important generalisations
The dRGT formulation of massive gravity was constructed with a perturbative expansion around a fixed flat background in mind. In 2011 Hassan and Rosen presented a reformulation of the dRGT action, which clarified the non-perturbative structure of the theory and identified consistent generalisations to more general backgrounds that were not apparent in the original formulation. Using basic properties of the elementary symmetric polynomials, they showed that the dRGT massive gravity action (4.7) can be reformulated and then generalised to [46], where β n are five arbitrary parameters whose role will be explained in detail later. Most notably, the new formulation involves an arbitrary (but fixed) reference metric f µν , which in dRGT is strictly taken to be η µν . It is worth to stress that even though the structures of (  4) is straightforward) and thus the new formulation truly represents an important generalisation of the original massive gravity theory. In particular, the new formulation is indispensable for addressing any question about massive spin-2 interactions on a curved background. Even more importantly, as we will see, it suggests how to arrive at a fully dynamical theory of interacting spin-2 fields.
Going beyond the decoupling limit analysis of [58], Hassan and Rosen quickly utilised the formulation (1.4) to give a fully nonlinear consistency proof for dRGT massive gravity with flat reference metric [59]: In a Hamiltonian analysis based on the ADM formalism [60], they showed that the complete nonlinear theory gives rise to a constraint that removes the Boulware-Deser ghost. Shortly after this, the proof was generalised to the case of an arbitrary reference metric f µν in [61]. These consistency proofs (see also [62,63] and [27,[64][65][66][67][68][69] for subsequent confirmations) for the massive gravity action (1.4) essentially completed the program of finding a Lorentz-invariant theory for a massive spin-2 field initiated by Fierz and Pauli in 1939.

Hassan-Rosen bimetric theory
A very important outcome of the generalised investigations of Hassan and Rosen laid in obtaining an extension of the massive gravity theory in which the reference metric f µν receives its own dynamics. As a consequence, the two metrics g µν and f µν are treated on the same footing in this bimetric theory and all fields in the action are determined dynamically. Bimetric theories of gravity have been subject of earlier investigations in, for instance, [6-8, 17, 70-75] but, just as nonlinear massive gravity, they generically suffer from the Boulware-Deser ghost instability.
Shortly after the nonlinear theory for massive gravity had been developed, the unique ghost-free bimetric theory was presented by Hassan and Rosen in [76]. Its form is reminiscent of the massive gravity action, but it includes an Einstein-Hilbert term of f µν in which m f is the "Planck mass" for the second metric. Most importantly, bimetric theory provides a covariant formulation for massive spin-2 fields, in which now both metrics are dynamical and the structure of the action is in fact symmetric with respect to the interchange of g µν and f µν . The consistency of bimetric theory was first demonstrated in [62,76] and, as we shall see later, this established the first theory describing consistent nonlinear interactions of massive spin-2 fields with massless ones.

Outline of the review
We have chosen the following structure for this review: Our starting point will be the linear theories for massless and massive spin-2 fields in flat background in section 2. The possibility to generalise the massless theory to arbitrary backgrounds by constructing the nonlinear theory of general relativity motivates us to look for a nonlinear completion of the mass term. In section 3, after reviewing the ghost problem, we pursue this goal by introducing a suitable set of ADM variables and provide a constructive proof for the consistent spin-2 interaction potential. Section 4 summarises the main features of ghostfree nonlinear massive gravity with general and flat reference metric, in both gauge fixed and in gauge invariant form. We then move on to the fully dynamical bimetric theory in section 5 where we write down its mass spectrum and also introduce couplings to matter. Classical solutions to the bimetric equations are discussed in section 6, first in general and then with a focus on black hole and cosmological solutions. In section 7 we review the phenomenon of partially massless spin-2 fields on de Sitter space and present the idea of realising partial masslessness at the nonlinear level. In this context, we also reveal a connection between bimetric theory and certain higher-derivative theories of gravity. Possible extensions of bimetric theory, including new kinetic terms in higher dimensions as well as multiple spin-2 interactions, are discussed in section 8. Finally, we conclude this review article with a list of several open questions in section 9.

Spin-Fields on Fixed Backgrounds
Having outlined the history of massive gravity in the introduction, we now turn to the detailed description of spin-2 field theories. In this section we first discuss the linear theory for a massless spin-2 field in a flat background and see how the correct number of physical degrees of freedom emerges due to gauge invariance. The nonlinear completion of the corresponding Lagrangian is the Einstein-Hilbert action of GR from which one can derive the linear theory around general curved backgrounds. Thereafter we review the linear theory of massive spin-2 in flat space, where the presence of constraints lead to the correct number of propagating modes. The quest for a generalisation to curved backgrounds again requires a nonlinear completion of the spin-2 mass term and paves the way towards nonlinear massive gravity.

Flat space
The Lagrangian for a massless spin-2 field h µν to quadratic order in the field in flat space with Minkowski metric η µν reads, where the structure of the kinetic terms is captured by the two-derivative operator, The corresponding equations of motion are, As can easily be verified, the Lagrangian and equations for the massless spin-2 particle are invariant under the following gauge transformations, As should be familiar from the spin-1 case (Maxwell's theory), we can use the symmetry transformation (2.4) to pick a convenient gauge. The de Donder gauge condition for the spin-2 field is ∂ µ h µν − 1 2 ∂ ν h = 0. This is the analogue of the Lorenz gauge for massless vectors and constrains four of the ten components in h µν . In this gauge the equations of motion assume the simple form, After fixing the de Donder gauge there still exist residual transformations, namely those with ξ µ = 0, that leave the de Donder gauge intact. Since the four residual gauge parameters satisfy the same equation as the field, they can directly be invoked to remove further redundant degrees of freedom. In total, gauge invariance therefore removes eight of the ten components in the symmetric field h µν , and the only propagating modes are the two degrees of freedom of a massless spin-2 particle.
It is also possible to couple the massless spin-2 field to other fields by introducing an external source T µν into the Lagrangian, where κ is a coupling constant of mass dimension minus one (here h µν is taken to have mass dimension one). The linearised Bianchi identity now implies ∂ µ T µν = 0, i.e. conservation of the source.

Curved space
The graviton in GR is a massless spin-2 particle and should hence be described by the Lagrangian (2.6). On the other hand, the equivalence principle tells us that the field should couple to all kinds of energy in the same manner, including its own stress-energy tensor. Implementing this requirement in (2.6) iteratively introduces nonlinearities in h µν and eventually leads to the Einstein-Hilbert action for GR (see e.g. [77,78]), The corresponding equations of motion are Einstein's equations, where R µν is the Ricci tensor with trace R and the stress-energy tensor is derived from the matter Lagrangian L m as, The Einstein-Hilbert action is the gauge invariant nonlinear extension of (2.6). It can be defined uniquely, as the field theory that describes nonlinear self-interactions of a massless spin-2 particle. The gauge transformations (2.4) are linearised general coordinate transformations (GCTs), under which the metric at the nonlinear level transforms as, The presence of a gauge symmetry at the nonlinear level ensures that the full theory propagates the same number of degrees of freedom as its linear version.
The Einstein equations (2.8) allow for flat space solutions, g µν = η µν , in the case of vanishing matter source. Linearising the action around this background in the perturbation h µν = κ −1 (g µν − η µν ) results in the linear theory for a massless spin-2 field (2.6).
On the other hand, a non-vanishing background value of the source will give rise to a curved background metricḡ µν and around this solution the action can be linearised as, where now the linearised Einstein operator takes the covariant form, where∇ µ andR are defined with respect toḡ µν . At this stage let us make an important remark on covariantisation of a theory whose form is known in flat space. Suppose we were given only the flat space Lagrangian (2.6) and asked to derive its covariant generalisation. Naïvely, one would replace all partial derivatives ∂ µ by covariant ones ∇ µ and all occurrences of the Minkowski metric η µν by the more general background g µν . This procedure works for lower spin-particles coupled to gravity and results in wellknown formulations of spin-0, spin-1/2 and spin-1 theories in curved space. For the spin-2 field, however, the procedure fails because naïve covariantisation of (2.6) does not result in the consistent form (2.12) obtained from linearising GR. In particular, ambiguities arise because the covariant derivatives do not commute and since there is no obvious guideline telling us which curvature terms to include in the linearised Einstein operator. On the other hand, knowledge of the nonlinear theory (2.7) allows us to straightforwardly arrive at the correct Lagrangian for a massless spin-2 field in curved background.

Massive spin-2 field in flat space
As has been known since the work of Fierz and Pauli from 1939 [3], the quadratic Lagrangian for a massive spin-2 excitation in flat space has the form, where the kinetic operator is the same as in the massless case, given by (2.2), and we have included a source term T µν . The equations of motion obtained from the Lagrangian for the massive particle are 14) The mass term breaks the gauge invariance of the massless theory but, as a consequence of the linearised Bianchi identity, the divergence and trace of these equations give rise to five constraints, The source is not necessarily conserved but, for simplicity, we shall anyway assume ∂ µ T µν = 0. This assumption certainly holds for any source that is derived from a diffeomorphism invariant matter coupling. For vanishing sources, the constraint equations imply that the massive spin-2 field is transverse and traceless.
Using (2.15) in the equations of motion, we can rewrite them as, 16) and see that the massive spin-2 field satisfies a sourced Klein-Gordon equation. The constraints (2.15) remove five of the ten components in h µν , leaving us with the five propagating degrees of freedom of a massive spin-2 particle.
An important observation made by Fierz and Pauli is that modifying the structure of the mass term, i.e. changing the numerical factor in front of h 2 in (2.13), introduces an additional degree of freedom into the theory. This happens because the trace constraint (2.15b) is lost and h satisfies a dynamical equation of motion instead. It can furthermore be shown that the propagator corresponding to the extra degree of freedom comes with a residue of the wrong sign and therefore gives rise to a ghost instability [15]. 4 This unwanted dynamical field is referred to as the Boulware-Deser ghost. It is this ghost mode that complicates the construction of a nonlinear interaction potential for a spin-2 field because, even when banned from the linear theory, the ghost notoriously reappears through the higher-order interactions.
On the other hand, tuning the linear mass potential to the Fierz-Pauli structure comes with its own problems that threaten the phenomenological viability of the theory as a modification of GR. Namely, as we already mentioned in the introduction, it was shown by van Dam, Veltman and Zakharov (vDVZ) [9,10] that the m FP → 0 limit of linear massive gravity does not continuously approach linearised GR. Nonlinear selfinteractions for the spin-2 field may be able to cure this problem if they exhibit the Vainshtein mechanism [12]. Historically, this was one of the main motivations for the construction of a nonlinear theory of massive gravity. Another reason to search for a completion of the Fierz-Pauli mass term is the existence of the nonlinear closed form (2.7) in the massless theory.
Since consistency of linearised massive gravity in flat space requires tuning a coefficient in the mass potential, one can expect that a consistent (i.e. ghost-free) nonlinear potential cannot contain arbitrary interaction terms, but that the coefficients of certain terms will be related to each other by demanding the absence of the extra degree of freedom. If the fully nonlinear theory for massive spin-2 was known, it could be linearised around general backgrounds to give the covariantised version of (2.13). However, as in the massless case, it would be a very difficult task to derive the linear theory on arbitrary backgrounds by covariantising the flat-space Lagrangian. As we will see later in section 4.3, the most general linear theory for massive spin-2 can also be derived from a nonlinear action and has a rather complicated form.

Towards Nonlinear Spin-2 Interactions
In the previous chapter we considered massless and massive theories for a linear fluctuation h µν in a fixed Minkowski background η µν . In the nonlinear massless theory given by the Einstein-Hilbert action of GR, it was possible to combine the background and the fluctuation into a single nonlinear field g µν . We would now like to do something similar in the massive case and construct a nonlinear self-interaction term without derivatives for the metric.

General structure
Historically it was often assumed that any nonlinear theory for massive gravity must give rise to flat space solutions and a Fierz-Pauli mass term in the linear theory. On the other hand, it is well known that a consistent mass term may also be written down on maximally symmetric backgrounds 5 [80][81][82][83][84][85][86], or even more generally on homogeneous and isotropic backgrounds [87][88][89]. This possibility motivates us to consider a general nonlinear theory without reference to flat Minkowski solutions and only demand that the correct number of degrees of freedom propagate at the nonlinear level (or, equivalently, around any background). Of course, when restricting to certain backgrounds the linearised version of the nonlinear theory must also reduce to the correct known structure.
What we call a mass term for a rank-2 tensor g µν must be a scalar density, i.e. it has to be a nontrivial scalar function V (g) multiplied by the scalar density √ g. Obviously, the scalar function V (g) cannot have any loose covariant indices and, by definition, it should not contain any derivatives. But then, the only object at hand to contract the indices of the metric tensor g µν is the metric itself which, since g ρµ g µν = δ ρ ν , leads to a trivial cosmological constant contribution in the action. We conclude that there is no possibility to construct a covariant nonlinear interaction term for a spin-2 field using only one tensor field. 6 Hence, we are forced to introduce another field in order to build nonlinear contractions with g µν . In principle, this could be any object with sufficient amount of indices, but the minimal choice is to work with a second rank-2 tensor which we shall call f µν . The interaction potential will then be given by √ g multiplying a scalar function of g ρµ f µν . Note that, due to the existence of an additional tensor f µν , we could in principle consider densitising by using for example √ f or g 1/4 f 1/4 . Since it is always possible to factor out √ g (for instance, by writing √ f = √ g det g −1 f and absorbing the second factor into the potential) we may use that as the scalar density without any loss of generality.
In summary, we expect the nonlinear massive gravity action to be of the form, where m is an arbitrary energy scale that sets the mass of the spin-2 field. In this setup, the second "metric" f µν is a fixed background field that needs to be put into the theory by hand. From the viewpoint of field theory this is somewhat unusual because f µν is not determined by an equation of motion. In fact, there is no need to worry about this, since we will resolve this slightly confusing point later, when we introduce the fully dynamical bimetric theory that treats g µν and f µν on the same footing. For now, let us accept the possibility to work with a fixed reference metric f µν and investigate the consistency of this class of theories. A simple example for a possible interaction term would be V (g −1 f ) = Tr(g −1 f ). Interestingly, the corresponding action is closely related to another modification of gravity that goes under the name Eddington-inspired Born-Infeld theory [90]. Unfortunately, having only this term in the action gives rise to the Boulware-Deser ghost and is thus not a viable choice.
A necessary requirement on the interaction potential in (3.1) is that it reduces to the Fierz-Pauli mass term for parameter choices that permit a linearisation around flat space (i.e. when the reference metric f µν is flat and when flat background solutions for g µν exist), otherwise it will certainly propagate the Boulware-Deser ghost. But this requirement alone is not sufficient for consistency: A generic nonlinear interaction potential, even if it incorporates the Fierz-Pauli structure around flat space, will reintroduce the extra degree of freedom which leads to instabilities at the nonlinear level. This is anticipated from a simple degree-of-freedom counting in the full theory: The introduction of the interaction potential breaks the diffeomorphism invariance of GR. Therefore the four gauge symmetries are lost and the theory generically will propagate four additional degrees of freedom. Since 2 + 4 = 6, this does not give the correct number for a massive spin-2 particle, but there is an extra degree of freedom in the theory. This is the nonlinear Boulware-Deser ghost and a constraint is needed to remove it from the spectrum of propagating modes. Tuning the linear mass term to the Fierz-Pauli combination ensures the presence of this constraint only in the linear theory.
We will see that, in order to obtain an additional constraint and thereby ensure the absence of the ghost beyond the linear level, it is necessary to impose strong restrictions on the structure of interactions. In fact, demanding the presence of a constraint will fix all but three coefficients of all possible interaction terms.

The Boulware-Deser ghost
In the construction of the consistent theory, we will follow the approach of Boulware and Deser who studied massive gravity in the Hamiltonian formulation [15,91]. They claimed that, even though it is possible to remove the ghost from the spectrum of propagating modes at the linear level, it will return for any nonlinear interaction terms that are added to the Lagrangian (2.13). In order to show that this result is in fact incorrect, we first need to familiarise ourselves with variables suitable for the Hamiltonian formulation of GR.

ADM variables for general relativity
The Hamiltonian formulation of GR traces back to the work by Arnowitt, Deser, and Misner (ADM) from 1962 [60]. The authors decomposed the metric g µν into a scalar N (lapse), a three-dimensional metric γ ij , and a three-component vector N i (shift) as follows: where γ ij denotes the inverse of γ ij . This parametrisation essentially splits the metric into its time (0µ) and spatial (ij) components. From (3.2) we can also compute the inverse of the metric, Here and in the following we raise the indices on the shift vector N i using the inverse spatial metric γ ij . It turns out that, in any theory with kinetic structure given by the Einstein-Hilbert term, the lapse N and shift N i are non-dynamical gauge degrees of freedom because the Ricci scalar of g µν contains no derivatives on those fields. Hence, all of the propagating modes are contained in the spatial metric γ ij which has six independent components. The gauge invariance of GR further reduces the number of propagating degrees of freedom to two, along with the same number of corresponding canonical momenta.
More explicitly, in terms of the ADM variables the Einstein-Hilbert action (2.7) in vacuum becomes (up to a boundary term which, although crucial for certain applications, is not important for our considerations) [60] where, in terms of the curvature scalar R (3) of the metric γ ij , The conjugate momenta π ij of the six metric components γ ij are computed from the GR Lagrangian in the standard way which leads to expressions in terms of derivatives of the four-dimensional metric.
It is now evident that the action (3.4) does not contain dynamical terms for the scalar N nor for the vector N i . On top of that, these variables appear only linearly and therefore act as Lagrange multipliers whose equations of motion do not contain N and N i themselves. This implies that these equations in fact correspond to four constraints R µ = 0, with R µ = (R 0 , R i ), on the remaining twelve variables γ ij and π ij .
According to the theory of constrained Hamiltonian systems, 7 the presence of a gauge symmetry can now be seen in the Poisson algebra of the constraints, {R µ , R ν }. Here, the Poisson bracket for functions A, B is defined as The result for the constraint algebra of GR reads [93,94], Since these brackets are proportional to the constraints themselves, they vanish on the constraint surface, as they should in the presence of a gauge symmetry. In the language of Dirac, non-trivial gauge symmetries generate first class constraints. All 7 For a review of this subject see e.g. [92].
four primary constraints will serve to put conditions on the remaining variables γ ij and π ij , while the Lagrange multipliers N and N i remain undetermined at this stage.
Since the Hamiltonian H is itself a linear combination of R and R i , all constraints are automatically preserved in time since the time-evolution of any quantity is determined by d dt A(x) = {A(x), H}. After imposing the four conditions on γ ij and π ij , we can still use gauge transformations to remove another four degrees of freedom whose equations of motion will eventually determine N and N i . In total we therefore end up with four dynamical variables, corresponding to the two helicity states (±2) of the massless graviton and their canonical momenta. This shows that, even nonlinearly, GR propagates the correct number of degrees of freedom for describing a massless spin-2 particle.
Note that throughout the above discussion we never actually wrote down the Hamiltonian, but remained in the Lagrangian formulation. In order to investigate the positivity of the energy 8 this is naturally insufficient, but since here we were only interested in counting degrees of freedom there was no need to work directly with the Hamiltonian, which can trivially be obtained from (3.4).

The no-go theorem
The ADM variables turn out to be very useful for investigating the consistency of massive gravity. We saw that at the linearised level the only consistent mass term is the one proposed by Fierz and Pauli given in (2.13). In order to demonstrate the presence of the ghost instability in the ADM formalism, in their paper [15] from 1972, Boulware and Deser studied the more general "mass term" with arbitrary coefficient a, The trace constraint h = 0 which removes the ghost mode does not exist for a = 1. Another way of seeing the additional mode is by noticing that, precisely for the Fierz-Pauli choice a = 1, the mass term is linear in the component h 00 , which furthermore appears without time derivatives in the linearised Einstein operator. Thus, for a = 1, the equation of motion for h 00 is a constraint which removes one dynamical variable.
For other values of a, the equation depends on h 00 itself in which case it does not constrain other components. As a direct consequence, a sixth degree of freedom is propagating.
We can make this more explicit using the variables introduced for the Hamiltonian analysis of GR in the previous subsection. From the ADM decomposition (3.2) we read off the decomposition of the fluctuation, where h ij = γ ij − δ ij . Inserting this into the mass terms one finds, The last term gives rise to nonlinearities in h 00 = 1 − N 2 + N i γ ij N j and vanishes only for a = 1. Moreover, when h µν = g µν − η µν is viewed as a small perturbation of η µν , its ADM variables can be written as small fluctuations as well, and we can study their appearance to quadratic order in the mass term. For a = 1, the expression in (3.10) turns out to be linear in δN at the quadratic level, such that the equation of motion of δN gives rise to a constraint. On the other hand, we also see that the shift vector δN i does no longer appear only linearly which is of course consistent with the breaking of diffeomorphism symmetry by the mass term. Together with its associated secondary constraint, the δN equation removes two degrees of freedom, one field plus its canonical momentum. 9 We end up with 12 − 2 = 10 degrees of freedom, describing the five polarisation states and corresponding conjugate momenta of the massive graviton. Contrarily, for a = 1, there is a term involving δN 2 at the quadratic level. The constraint arising from the equation of motion of δN is lost in that case because the equation now determines δN itself instead of constraining other variables. There are thus 12 degrees of freedom, describing six propagating modes, one of which is the Boulware-Deser ghost. The same situation occurs if we do not consider h µν as a small perturbation, i.e. look at more general backgrounds than η µν . In that case, the ADM variables of h µν are no longer small fluctuations and we have to consider the theory beyond quadratic order. Boulware and Deser studied a class of corrections to the Fierz-Pauli mass term and found that those higher-order terms in h µν could never result in an expression that is linear in the lapse N . From this they concluded that the constraint that removes the ghost in the linear theory is destroyed and hence a theory of nonlinear spin-2 interactions can never be consistent [15,91].
As pointed out in [57,58] and as will be discussed in the next subsection, there exists a loophole in this no-go theorem for nonlinear massive gravity and it turns out that a consistent theory can be constructed. 10 In the following, we will deviate from the historical path and construct the consistent interaction potential directly in the redefined ADM variables that were used in the consistency proof of [59,61] instead of presenting the derivation in the decoupling limit performed in [57,58]. From our point of view this construction is the most efficient way to arrive at the action for nonlinear massive gravity and it has the further advantages that it (a) automatically ensures the absence of ghost in the full theory (i.e. away from the decoupling limit) and (b) immediately results in the generalised form of the action with arbitrary reference metric f µν .

ADM variables for massive gravity
Our aim is to arrive at a nonlinear theory for massive spin-2 fields of the form (3.1) and, to this end, we shall discuss interactions in terms of ADM variables. For definiteness we will work in four dimensions but all our considerations and conclusions generalise straightforwardly to any dimension. Before we start, let us briefly recapitulate the situation.
Since the kinetic term for the metric g µν in (3.1) is the same as in GR, the lapse and shift functions N and N i will still appear without derivatives. However, the interaction potential will in general no longer be linear in these functions. Therefore, their equations of motion, instead of imposing constraints on the remaining variables, will now determine N and N i themselves. The four gauge constraints are lost and, as explained in the previous subsection, the number of propagating modes will now generically be six, plus their corresponding canonical momenta. These are two phase-space degrees of freedom too many for the theory of a massive spin-2 field. Moreover, the Hamiltonian of a generic theory will not be positive definite [15,91], signalling that the extra propagating mode is a ghost. A necessary requirement on any consistent interaction potential is therefore the presence of an additional constraint that removes the nonlinear Boulware-Deser ghost.
Before we set out to construct the potential featuring the constraint it should be noted that there are two other conditions which need to be fulfilled by a fully consistent theory. Firstly, the preservation of this constraint in time must itself provide a constraint in order to remove also the conjugate momentum and hence the full phase-space pair associated with the pathological degree of freedom. Secondly, the resulting Hamiltonian must be positive definite so that none of the surviving five spin-2 modes gives rise to an instability. It should be emphasised that positivity of the Hamiltonian of the nonlinear theories that we are about to discuss has never been proven in general and in fact seems not to be true without additional physical assumptions, see e.g. [95]. For instance, ghosts different from the Boulware-Deser mode may still propagate around certain backgrounds, an example being the Higuchi instability of the helicity-zero mode in de Sitter space [81]. In the literature, whenever massive gravity and bimetric theory are labelled as "consistent" or "ghost-free", one is usually referring only to the complete removal of the Boulware-Deser mode and its conjugate momentum. Throughout this review we frequently adhere to this conventional abuse of terminology.

The loophole in Boulware & Deser's argument
In order to investigate whether a particular structure in the potential V (g −1 f ) can give rise to a constraint and thus satisfy the first necessary condition on any consistent theory, we first decompose the second rank-2 tensor f µν into its own ADM variables, 11 Here, φ ij denotes the inverse of the three-dimensional metric φ ij , L i is the shift-vector, and L is the lapse of f µν . We furthermore express the measure factor √ g in terms of the lapse N and the determinant γ of γ ij , With these expressions at hand, we can write a generic interaction potential in terms of ADM variables as, 12 (3.14) 11 In fact, it is not automatically guaranteed that a simultaneous ADM split for g µν and f µν exists or, equivalently, that N 2 and L 2 are both positive definite. The assumption of simultaneous ADM decompositions, which shall be made here and in the following, is related to the existence of intersecting light cones for the two metrics. The details of this will be discussed in [98]. 12 We will often make use of the notations N i ≡ γ ij N j and L i ≡ φ ij L j .
According to the argument by Boulware and Deser, the right-hand side of this equation needs to be linear in the lapse N in order to provide a constraint that removes the ghost mode. However, this is not entirely correct and this is exactly where the loophole lies: In fact, the necessary condition on the potential is slightly weaker because the constraint may be obtained after combining several of the equations for N and N i . In other words, if there exists a particular combination of these equations that is independent of N i and N , then this equation will constrain the remaining variables. At the level of the action, this means that we should allow for a field-dependent redefinition of the shift components, N i → n i , that renders the Lagrangian linear in the lapse N . Let us explain this in a bit more detail. Suppose that a linear combination of the equations for N and N i is independent of the lapse and thus corresponds to the constraint, where the C i are some functions of the ADM variables. We can now make a redefinition of variables, N i (N, n j , . . .) ≡ C i k (N, n j , . . .)n k , such that the variation of the action with respect to the lapse becomes, Here | N i means that the function N i is kept fixed when the functional derivative is taken. From this we see that if we choose the redefined shift components n k such that they satisfy δC i k (N,...) δN n k = C i then the variation of the action with respect to the lapse gives precisely the constraint (3.15) which by assumption does not involve N . We shall thus look for a redefinition of the shift vector that renders the action linear in the lapse N .
Certainly, the redefinition, i.e. the matrix C i k (N, . . .), must be linear in N itself since the N i appear linearly in the kinetic term. Furthermore, since the redefined shift components n i are expected to appear in the constraint, they must be fully determined by their own equation of motion which therefore must not depend on N . To summarise, in order to fulfil the first necessary condition of obtaining a constraint we make two requirements on the potential in terms of the redefined shift n i : For now, we focus on the first requirement and we will see later that the potential that we construct by demanding only (i) automatically satisfies (ii).

Redefinition of ADM variables
We now observe that the full interaction potential density, √ g V (g −1 f ) in (3.1), already has a factor of N in front coming from the measure factor (3.13). Hence, in order to satisfy requirement (i) listed above, the potential V (g −1 f ) written in redefined ADM variables must be of the form, where V 1 and V 2 are functions only of (γ ij , n i ) (apart from the non-dynamical components of f µν which we choose to omit here), i.e. they are independent of N .
Recalling the ADM decomposition (3.2) of g µν , we notice that g µν is quadratic in N and in order to obtain inverse powers of N we need to consider the inverse metric (3.3). The latter is quadratic in 1/N and the best we can achieve by a field redefinition which is linear in N is to complete the dependence on N into a perfect square such that taking a square-root can result in an expression linear in 1/N . In other words, the only quantity that has a chance of giving something linear in 1/N after a linear redefinition of the N i is an object whose square is proportional to the inverse metric (3.3). We are thus led to consider a potential V that is a function of the matrix S ≡ g −1 f , defined via S 2 = g −1 f . This square-root matrix has a very nontrivial ADM decomposition and is certainly highly nonlinear in 1/N before any redefinition. However, we will now make use of the allowed redefinition of N i and demand that in terms of the new shift-vectors n i the square-root matrix S is of the form [59,61], where A and B are matrix-valued functions of (γ ij , n i ). The redefinition that leads to (3.18) as well as the explicit expressions for A and B can be obtained straightforwardly by the following method: Square the right-hand side of the ansatz (3.18) and equate it with the ADM expression for g −1 f obtained from (3.2) and (3.12), using the most general ansatz for the redefinition, N i = c i + N d i . Then compare the expressions on both sides order by order in 1/N to determine the vectors c i and d i in the shift redefinition as well as the matrices A and B. This derivation was given in [61] and we discuss it in more detail in appendix B; here we simply state the result. The redefinition that renders the square-root matrix S linear in 1/N takes a rather simple form [59,61], The 3 × 3 matrix D on the right-hand side is a function of the variables (γ ij , n i ) and the non-dynamical spatial metric φ ij . Explicitly, it can be written in matrix notation as, where we have defined another matrix Q through, Note that the definition (3.20) of the matrix D that enters the redefinition involves a 3 × 3 square-root matrix. By introducing the shift vectors n i we have therefore reduced the dimension of the square-root matrix that appears in S by one and, most importantly, simplified the dependence on the lapse N which no longer appears under any square-root in (3.18). One may worry that a real solution for the 3 × 3 square-root in (3.20) does not always exist. However, we will show now that the variables can be further redefined to demonstrate the existence of real solutions for D and, in fact, to remove the square-root matrix entirely.

On the existence of the redefinition
The form of the redefinition (3.19) is not entirely unique. In fact, the original papers [59,61] on massive gravity mainly worked with a set of variables that slightly differs from the one presented here, whereas the choice of variables we made above is more suitable for application to bimetric theory [76]. Moreover, as was shown later in [99], it is possible to arrive at simpler expressions which are also more symmetric between the two metrics g µν and f µν . In order to see this, let us decompose the two spatial metrics into "spatial dreibeins", These expressions are invariant under rotations of the dreibeins, which means that ϕ = Rϕ with R T = R −1 is an equivalent dreibein of φ and the rotations are a local symmetry of the theory. This freedom can be used to resolve the square-root appearing in D (c.f. (3.20)). Next, we redefine the shift vectors n i according to, where the new shifts v a carry a spatial Lorentz index andφ = Rϕ is a gauge-fixed dreibein. The rotation matrix is chosen such that it satisfies, whereÎ ab = δ ab . It is straightforward to obtain a solution for R from this equation and this solution always exists, which follows directly from the polar decomposition theorem stating that any matrix can be symmetrised by a rotation. Inserting the new variables into (3.20) and using (3.24) to evaluate the square-root matrix, we find after some algebraic manipulations that D takes the much simpler form, where we are using matrix notation and1 a b = δ a b . Introducing the dreibeins and the new shift vectors has thus enabled us to get rid of any matrix square-root in the equations. The only square-root left is the scalar This square-root has real solutions provided that the metric components satisfy the bound v a δ ab v b < 1. Interestingly, this bound has an interpretation connected with Lorentz transformations and it turns out that the redefined shift vector v a can be interpreted as a Lorentz velocity. We will come back to this point at the end of section 8.2.2. For now, let us assume that the bound is not violated and therefore the redefinition of shift vectors exists. In terms of the new variables, it takes the following simple and symmetric form [99], Even though the structure of this expression is a little less complicated than (3.19), in order to stay closer to the conventions in the literature, we choose to continue using the original redefinition.

Construction of the ghost-free potential
As we show in appendix B, the matrices A and B in g −1 f = 1 N A + B depend on the redefined ADM variables in the following way, in which x = 1 − n i φ ij n j as before and the index on the shift vector L i is raised using the inverse spatial metric φ ij . Although the above expressions seem rather complicated, what is essential for the construction of the ghost-free potential is the structure of the matrices A and B. Most importantly, A is a matrix of rank one and it can be written as the outer product of two vectors, This will be the key property entering our constructive proof in the following. Having obtained the ADM expression with the desired dependence on N for the square-root we are ready to construct the interactions which give rise to an additional constraint that removes the Boulware-Deser ghost.
As explained above, the mass potential has to be a function of the square-root matrix We assume this function to be, at least formally, expandable as a Taylor series in S. This commonly defines what one means by a matrix valued function anyway so is not really a serious restriction. Generically, this will give an expression that is nonlinear in the lapse N . The only way to ensure linearity in N is to demand the absence of higher powers of 1 N A in the expansion. Obviously, the simplest possible term is √ g Tr(S) = √ γ Tr(A + N B). At first sight it seems that all higher powers of S = 1 N A + B will involve higher powers of 1/N . However, due to the special structure of the matrix A in (3.28) this is not quite correct and specific terms of higher order in S can still be linear in 1/N . Since A has rank one, it is a projection operator on a one-dimensional subspace. Owing to this property, there is a unique way of building polynomials of 1 N A + B that are linear in 1 N A. Namely, only an antisymmetric product of A's will automatically contain only one power of A. Let us see how this works in detail by considering with arbitrary coefficients b n and totally antisymmetric tensors µνρσ . Since the simultaneous exchange of µ i , µ j and ν i , ν j only changes the sign of both -tensors, it leaves the whole term invariant. Therefore, at nth order, the product of S's under the sum can be written in the form, We insert this expression into (3.29) and obtain Here we have used (3.28) in the second equality. Since all indices of the symmetric products of either u ν i or w µ i are contracted with the totally antisymmetric indices of the corresponding -tensors, we find that in V n (A, B) only terms with at most one A contribute. This implies that the sum over l in (3.31) actually terminates at l = 1 and hence V (S) is linear in 1/N , which is precisely the property that we were looking for. We therefore conclude that the most general form of the complete potential density which is linear in N after the redefinition (3.19) is and thus satisfies criterion (i) that we wrote down in section 3.3.1. In order to give rise to a constraint it needs to also meet criterion (ii), that the n i equations following from the action with the above potential need to be independent of the lapse N . As one can verify in a lengthy but straightforward computation, this second requirement does not impose further restrictions on the form of the potential but is automatically satisfied by (3.33). More explicitly, the n i equations are of the form [59,61], where E j does not involve N . Since the matrix multiplying E j is exactly equal to the Jacobian δN j δn i of the redefinition, it is invertible by assumption (because otherwise the redefinition would not be well-defined). We can thus multiply the equations by its inverse to arrive at the equivalent equations E j = 0 which do not depend on N .
We have thus derived the unique form of the interaction potential which gives rise to an additional constraint. Note in particular that there is only a finite number of terms giving a potential which is linear in N . Due to the antisymmetric structure of the interactions, the possible terms are limited by the dimension of spacetime. Before discussing their structure and properties in more detail in the next section, let us make some final remarks on the existence of the associated secondary constraint which arises from demanding the primary constraint to be preserved in time.
The secondary constraint: The above requirements that we used to construct the consistent interaction potential were necessary but not sufficient: A secondary constraint is crucial for the absence of the Boulware-Deser ghost because two constraints are needed to remove both the ghost mode and its canonical momentum from the set of dynamical variables.
It was already motivated in [59,61] that there is in fact a secondary constraint arising from demanding the primary constraint to be constant in time. In order to compute the time evolution of the constraint one uses the Poisson bracket (3.6) and the Hamiltonian, which for the massive gravity action (3.1) with potential (3.33) is of the form Here H 0 is independent of N and C is the Hamiltonian constraint obtained from varying the action with respect to N . The time evolution of C then reads As in the case of the primary constraint, this equation should be independent of N , because otherwise it would determine N instead of constraining γ ij and π ij . Inserting (3.36) into (3.35) gives Since C and H 0 are independent of N , we need the Poisson bracket among the constraints {C(x), C(y)} to vanish. Before this was actually demonstrated to be the case, there had been objections against the theory claiming that {C(x), C(y)} could be nonzero [100]. The issue was resolved when the secondary constraint was eventually shown to exist in a detailed calculation in [62]. We will not repeat this analysis here but instead refer the interested reader to the original reference as well as other subsequent independent confirmations of these results [27,63,[67][68][69].

Ghost-free Nonlinear Massive Gravity
In the previous section we derived the nonlinear interaction potential for massive gravity equipped with an additional constraint that removes the Boulware-Deser ghost. Here we will study its properties in more detail and discuss the subclass of dRGT models as well as the gauge invariant Stückelberg formulation.

Most general action
For notational purposes, a more convenient way of writing the consistent potential in (3.33) is to introduce the elementary symmetric polynomials e n (S) of the matrix S. These can be defined in terms of totally antisymmetric tensors (with unit weight), with e 0 (S) ≡ 1. More properties of the elementary symmetric polynomials as well as precise definitions of the -tensors as anti-symmetrisation operators are summarised in appendix A. In terms of these the complete action for ghost-free massive gravity with general reference metric takes the form, with S = g −1 f , Planck mass m g and spin-2 mass scale m. Furthermore, we have introduced the rescaled coefficients β n = b n n!(4 − n)!/2 for n = 0, . . . 4. Out of these five parameters, only three are truly measuring interaction strengths: Since e 0 (S) = 1, the β 0 -term is simply a cosmological constant for the dynamical metric g µν . Moreover, the last term in the sum that is proportional to β 4 is just a cosmological constant term for f µν because e 4 (S) = det S and hence √ g e 4 (S) = √ f . This term is therefore independent of g µν and does not contribute to the equations of motion. Nevertheless, we choose to include it in the action because it will become relevant when we give dynamics to f µν later on.
The above action is a nontrivial generalisation of the de-Rham-Gabadadze-Tolley (dRGT) model [57,58] which we shall discuss in the next subsection. Its above form (with general reference metric f µν , finite sum over n and in terms of the elementary symmetric polynomials) was first presented in [46].
The equations of motion for g µν obtained from (4.2) are, where the first two terms are the usual Einstein tensor while the contribution from the interaction potential is, where we have defined the matrix functions, There is no equation of motion for f µν whose form therefore needs to be put in by hand. 13 Taking the divergence of the above equations and using the Bianchi identity ∇ µ G µν = 0 satisfied by the Einstein tensor G µν , we arrive at a set of Bianchi constraints, These remove four degrees of freedom while the remaining extra scalar (the Boulware-Deser ghost) is eliminated by the additional constraint present in the special structure of (4.2). A covariant expression for this scalar constraint is difficult, if not impossible, to obtain in general. Its explicit form for certain regions in the parameter space is provided in [101,102] which make use of the vierbein formulation that we shall discuss in section 8.2. For the same restricted parameter choices, it is also possible to identify a covariant constraint in the linear theory around arbitrary backgrounds, see section 4.3.

dRGT theory
The theory first derived by de Rham, Gabadadze and Tolley (dRGT) in [57,58] was defined for flat reference metric f µν = η µν . The original construction uses Stückelberg fields (c.f. section 4.4) and its "ghost proof" is valid only in the scalar sector of a decoupling limit that strongly relies on the flat reference metric. 14 In contrast, the construction that we have presented here is based on the results of [59,61] and is valid for all reference metrics. It demonstrates the consistency of the full nonlinear theory away from any limiting approximation.
It is worth discussing the dRGT model and its relation to the Hassan-Rosen formulation in a bit more detail. Using the observation that the sum in the interaction potential terminates and can be given in terms of elementary symmetric polynomials [46], the dRGT action can be written in the form [57,58] (using the conventions of [2]), where We can arrive at this action starting from (4.2) by setting f µν = η µν and taking, This follows from the identity (A.4) satisfied by the elementary symmetric polynomials. Hence the consistency of massive gravity with general f µν implies the absence of ghost in the dRGT action. In contrast, there is no obvious way of getting to (4.2) from (4.7) and results obtained in the latter do not generalise automatically to the former.
Let us briefly explain why the lowest-order α n parameters in the dRGT action are fixed while they remain arbitrary in the Hassan-Rosen formulation, since the exact reason for this is sometimes obscured. For g µν = η µν to be a solution to the equations of motion following from (4.7), the cosmological constant for g µν must vanish. It is straightforward to verify that this requires 15 4α 0 + α 1 = 0, since this combination is proportional to the effective cosmological constant for g µν . Next, in order to remove terms linear in the perturbation h µν = g µν − η µν in the quadratic action (so-called "tadpoles"), one must enforce α 1 = 0. 16 Finally, the interaction parameters contain a redundant overall scale that can be absorbed into m 2 . One way to get rid of this 15 It should be noted that this first requirement can actually be avoided by demanding only that g µν = c 2 η µν is a solution. This is still flat of course and fixes c instead of a parameter of the action. 16 From the bimetric perspective, this second requirement would mean that the effective cosmological constant for f µν vanishes since the latter is proportional to α 1 ∼ β 1 +3β 2 +3β 3 +β 4 , c.f. equation (5.11). The first requirement of vanishing cosmological constant for g µν can also be written as 4α 0 + α 1 ∼ β 0 + 3β 1 + 3β 2 + β 3 = 0. redundancy is to demand that m 2 corresponds to the squared Fierz-Pauli mass in the quadratic action around flat space. This requirement finally fixes α 2 = 1.
The above action (4.7) thus gives rise to Minkowski solutions for g µν and linearising the theory around these backgrounds gives precisely the Fierz-Pauli Lagrangian (2.13) with m 2 FP = m 2 . In this sense the dRGT theory can be viewed as the consistent nonlinear completion of the Lagrangian for a massive spin-2 field in flat space. In the following, we will turn to the linear theory around arbitrary background solutions of the more general theory (4.2).

Linear theory on arbitrary background
As we pointed out in section 2.2, naïvely covariantising the linear theory for a massive spin-2 field in flat background does not result in a consistent action. Attempts to find the correct equations describing five helicity states around any background were made already before the ghost-free nonlinear theory was known. For instance, the authors of [103] were able to write down the consistent linearised theory to first order in a small-curvature expansion. On the other hand, the knowledge of the full nonlinear action (4.2) that avoids the ghost mode makes it possible to derive the linear theory around any background solution. 17 This computation was first carried out in [104,105] (see also [106][107][108]). The analysis is complicated by the presence of the square-root matrix g −1 f whose perturbation around general backgrounds is rather nontrivial due to the matrices g µν and f µν in general being non-commuting. The full expression can however be obtained using the Cayley-Hamilton theorem for matrices. An alternative way of arriving at the linearised theory is to redefine the dynamical fluctuation variables in order to simplify the squareroot variation [108].
The general structure of the quadratic Lagrangian for perturbations δg µν around arbitrary backgroundsḡ µν is, where E µνρσ is a function of the background metricḡ µν and quadratic in its associated covariant derivative∇ µ , whereas V µνρσ depends on the background metric, its curvatures and the reference metric f µν . The corresponding equations of motion for the fluctuations read, Using the background equations (4.3), it is always possible to convert curvatures ofḡ µν into functions of the background and reference metric. Eliminating the reference metric through the background equations is more difficult since the equations are nonlinear in f µν . In general it is not possible to obtain a closed form for the solution. However, quite remarkably, it turns out that if β 2 = β 3 = 0 a closed form solution exists and the equations can be used to get rid of all appearances of the reference metric. Then (4.11) describes five massive spin-2 degrees of freedom propagating in an arbitrary background given solely byḡ µν and its curvatureR µν . In the more general case it is still possible to solve the background equations perturbatively in curvatures and reproduce the results of [103].
The absence of the Boulware-Deser ghost in the linearised equations follows from the existence of the additional constraint in the nonlinear theory from which (4.10) was obtained. In order to bypass the ADM analysis and show explicitly that the above equations contain the same number of propagating degrees of freedom as in the Fierz-Pauli case (2.14) in flat space, one needs to identify the analogues of the constraints (2.15). The vector constraints, i.e. the generalisation of ∂ µ h µν − ∂ ν h = 0, are easily obtained by taking the (covariant) divergence of the equations and using the linearised version of the Bianchi identity. The additional scalar constraint which removes the ghost is more difficult to identify. For models with β 3 = 0, it has been found in [104,105,108]. It corresponds to the following combination of equations, 1 in which all terms containing two derivatives on δg µν cancel out. This equation is the analogue of h = 0, which it reduces to when the background is an Einstein spacetime. When β 3 = 0, it seems to be impossible to obtain the constraint in a covariant way [102,108] and the reason for that is not yet fully understood.

Gauge invariant massive gravity
A common perspective is to take the fixed reference metric f µν to be fully specified in a given coordinate frame which implies that the invariance under general coordinate transformations (GCTs) of GR is broken by the mass term for the graviton. This is because the metric g µν itself transforms as a covariant rank-two tensor under GCTs while the fixed background metric f µν does not if it is taken to be fixed in a given coordinate system. Therefore, in this view, objects like g ρµ f µν do not transform as tensors under diffeomorphisms. Of course covariance is not a true symmetry and it is always possible to simply view f µν as a fixed reference geometry but still allow it to transform as a tensor under GCTs, e.g. to be flat but not necessarily on the cartesian Minkowski form. A constructive way of restoring any gauge symmetry in the action is the so called Stückelberg trick: By introducing new gauge degrees of freedom, socalled Stückelberg fields, that transform in a certain way under the gauge group, one can rewrite the theory in a manifestly gauge invariant way whilst keeping track of the thereby introduced redundant degrees of freedom. In turn, fixing the so-called physical (or unitary) gauge will give back the original action.

The Stückelberg trick
One possibility to reintroduce diffeomorphism invariance in massive gravity is to perform a gauge transformation on the dynamical metric g µν and treat the gauge parameters as new fields in the action. Under a GCT, x α → x α (x), the metric transforms as After performing this GCT, one interprets the x α as dynamical fields. Another equivalent way, that turns out to be more convenient for studying massive gravity, is to mimic this kind of transformation on the reference metric f µν , which is then taken to not transform under GCTs. But since a simultaneous coordinate transformation of g µν and f µν would be a symmetry of the massive gravity action, instead of transforming g µν we can choose to perform the transformation on f µν . That is to say, we replace f µν by its covariantised form, Here,f αβ is a new fixed background metric that does not transform under the gauge group. The "coordinates" ϕ α are promoted to dynamical fields that transform as scalars under GCTs. Introducing these Stückelberg fields ensures that objects such as now are manifestly diffeomorphism invariant. In addition to the equations of motion for the metric g µν , we now also consider the ϕ α equations and, as we will see below, this system of equations is equivalent to the original g µν equation without Stückelberg fields. This method of restoring gauge invariance, in contrast to treating f µν as a fixed geometry tensor, is particularly useful for considering so called decoupling limits of a theory: Limits/regimes where certain operators become dominant and physical modes decouple.

The decoupling limit
Let us takef αβ = η αβ and expand the Stückelberg fields around the identity transformation, which corresponds to considering infinitesimal GCTs, Furthermore, let us decompose the perturbations into a transverse and a longitudinal mode, with ∂ απ α = 0. The ghost-free dRGT potential (4.7) was originally constructed using the gauge invariant formulation and performing a scaling limit to separate the interactions for the longitudinal component π.
The power of the Stückelberg formalism lies in a conjecture made in [47], stating that the ghost instability of nonlinear massive gravity can be traced back to higher-derivative interactions of the π fields in flat space, wheref αβ = η αβ . This assumption relies on an analogy to the Goldstone equivalence theorem for spin-1 fields [49,109]. In that case, the longitudinal modes of the gauge fields have been shown to carry all information needed for computing scattering processes at high energies. Although the theorem has not been proven for the spin-2 case, it is reasonable to start with the assumption of its validity in order to investigate the stability of massive gravity in a certain parameter limit defined through, The metric is decomposed into flat background η µν and fluctuations h µν according to g µν = η µν + hµν mg . The fields in the Stückelberg decomposition (4.17) already have the correct normalisations and, in a first approach, the vector modes are set to zero. Then the decomposition of the tensor g −1 f that appears in the interaction potential into flat background and perturbation reads, in matrix notation, The conjecture of [47] suggests now to construct the interaction potential in such a way that there appear no higher-order interactions of the longitudinal modes. Since each of these modes always comes with two derivatives, their interactions generically give rise to higher-derivative terms. These can be shown to lead to ghost instabilities due to the famous Ostrogradsky theorem [110] and must therefore be avoided in a consistent theory (see [111] for a recent discussion of this theorem). De Rham, Gabadadze and Tolley managed to tune the coefficients of the ∂ µ ∂ ν π interaction terms in such a way that the higher-derivative terms combined into total derivatives that could be dropped from the action [57,58]. Interestingly, the resulting action for the longitudinal modes (after the kinetic terms have been diagonalised) resembles precisely the Galileon interactions [112]. In fact, from (4.19) it is a trivial observation to see that by taking the square-root the expression becomes linear in Π µν and then any antisymmetric products of this square-root will at most carry two derivatives on any single π.
A few more comments are in order. It is sometimes argued (see e.g. [2,113,114]) that any massive gravity theory can be described as a theory of GR plus four scalar fields.
In other words, this would imply that it is always possible to reduce the number of degrees of freedom of the massive fluctuation h µν to two (corresponding to a massless spin-2 mode) and let the remaining three reside in the Stückelberg fields. Let us briefly clarify why this picture is not quite correct. Firstly, an equivalent statement would be that it is always possible to render the background solution for g µν flat by performing a coordinate transformation. As already pointed out in [115], this is obviously not correct since the flat space action and equations are not equivalent to the ones in curved space. Secondly, the counting of degrees of freedom in GR is an on-shell statement and only holds for the massless equations of motion, e.g. h µν = 0 on a flat background, which do not arise in a massive gravity theory. Therefore, the number of degrees of freedom in h µν cannot be reduced to two by simply performing a gauge transformation. Thirdly, from the transformation properties of the π α fields in (4.16) under linearised coordinate transformations (δ ξ π α = −ξ α ), it is clear that these fields can only mix with the four gauge modes of g µν . In other words, they are gauge trivial by construction. Physical degrees of freedom can therefore not be fully transferred to the gauge modes because otherwise they would not contribute to interactions between conserved sources (for which ∇ µ T µν = 0).
Although these caveats cast some serious doubts on the conclusiveness of the consistency analysis in the decoupling limit (other objections were raised in e.g. [116]), its historical importance for the discovery of ghost-free massive gravity should of course not be underestimated. The complete expression for the dRGT theory in the decoupling limit, including the transverse vector modes in (4.17), was later worked out in [117]. The limit is furthermore useful for understanding the validity regime of the effective field theory and the identification of the strong-coupling scale of massive gravity. We refer the interested reader to the review [2] where these subjects are discussed in more detail.

Absence of ghost in the Stückelberg formulation
While the proofs of absence of the Boulware-Deser ghost given in [59,61] were formulated in the gauge fixed version of Massive Gravity, i.e. without introducing the Stückelberg fields, it is expected that restoring the gauge invariance does not alter the dynamics of the theory. In fact, since gauge symmetry is merely a redundancy in the description of the underlying physics, the two formulations are completely equivalent.
In particular, absence of ghost in the gauge fixed version of massive gravity implies that the Stückelberg formulation cannot exhibit the instability either. Nevertheless, in [118] it was claimed that the ghost generically reappears in the Stückelberg sector and the author of [119,120] could not find a constraint to remove the ghost. First attempts to disprove these statements were made in [114,121], before the absence of ghost was conclusively shown in [63].
The fact that the dynamics remain unaltered in the gauge invariant theory can be demonstrated by considering the equations of motions for the Stückelberg fields. After making the replacement (4.14) for f µν , the mass potential V (g, f ) becomes a function of the scalar fields and their equations of motion read However, this equation will not give rise to any new dynamics because it is, in fact, already implied by the Bianchi constraint, where, In order to see the equivalence, consider a gauge transformation δx µ = ξ µ of the action (4.2) involving Stückelberg fields, where G µν is the Einstein tensor and the variations of the fields under the gauge transformation are δg µν = 2∇ (µ ξ ν) and δϕ α = −ξ µ ∂ µ ϕ α . The action is now invariant under GCTs, δS = 0. Thus integrating by parts and using the Bianchi identity, ∇ µ G µν = 0, we can derive the following identity, Since the Stückelberg fields were introduced to mimic nonsingular coordinate transformations, the matrix ∂ ν ϕ A is invertible and hence the Bianchi constraint ∇ µ V µν = 0 is equivalent to the ϕ α equations of motion (4.20). As expected, the redundant gauge degrees of freedom ϕ α therefore do not introduce new dynamics into the theory.
From these general arguments it is obvious that the consistency proof of [59,61] has to be valid for both the gauge fixed and the gauge invariant version of massive gravity. This was eventually confirmed in a Hamiltonian analysis where the constraint was shown to exist in the massive gravity action (with β 2 = β 3 = 0) including Stückelberg fields [63].

Potential shortcomings of massive gravity
In order to conclude the presentation of nonlinear massive gravity, we list a few drawbacks of the theory in its present form. These can be viewed as motivations to look for possibilities of going beyond the setup with fixed reference metric.
• The reference metric f µν in the general massive gravity action (4.2), or η µν in the dRGT theory (4.7), is not dynamically determined. It is put into the theory by hand and there is no obvious fundamental principle determining its form. From a field theoretical point of view, this and the related fact that the theory (without additional fields) breaks diffeomorphism invariance are rather undesirable features and it is not entirely clear how to interpret the presence of the fixed reference metric. The notion of a pre-geometric structure clearly goes against the main spirit of GR.
• Possibly related to the above point is the occurrence of superluminal and in particular acausal propagation in massive gravity [102,[122][123][124][125][126]. 18 This raises serious questions for the physical viability of the theory, but see [2] for a discussion of some counter-arguments.
• The equations of motion of massive gravity (with flat or general reference metric) cannot give rise to homogeneous and isotropic solutions that lead to a viable cosmology [127][128][129][130][131][132][133]. Lacking this feature, the model cannot serve as a serious alternative to GR.
• More generally, in the parameter space of massive gravity there is no good limit which brings the equations and their solutions close to those of GR, which are welltested. In principle, this issue could be resolved by the Vainshtein mechanism, but this causes serious tension with the cosmological Higuchi bounds [132,134]. The underlying reason is that the metric that couples to matter contains additional degrees of freedom with respect to GR. This generically changes the physics significantly, as becomes apparent through the vDVZ discontinuity already at the linearised level.
From our point of view, the above shortcomings strongly motivate the extension of the ghost-free massive gravity action to a fully dynamical bimetric theory, which we shall focus on in the remainder of this article.

Ghost-Free Bimetric Theory
A natural question that arises within the formulation of massive gravity with general reference metric f µν is whether this second metric could be dynamical on its own without spoiling the consistency of the theory. The fact that one can extend the theory by a kinetic term and an equation of motion for f µν without reintroducing the Boulware-Deser ghost is not immediately evident from the consistency proofs of [59,61,63]. Nevertheless, Hassan and Rosen were able to show that one can indeed augment the action by an Einstein-Hilbert term for the second metric and let it be determined dynamically [76]. It has subsequently been confirmed that this also seems to be the unique kinetic term which can be added to the nonlinear massive gravity action in order to give dynamics for f µν [135][136][137]. 19 In this section we shall present the resulting ghostfree bimetric theory, review its consistency proof and discuss some of its most important features.

Action and equations of motion
The ghost-free action for Hassan-Rosen bimetric theory is given through [76] S HR = m 2 where m f is the "Planck mass" for f µν . It corresponds to the massive gravity action (4.2), extended by an Einstein-Hilbert term for the reference metric f µν which is promoted to a dynamical field on the same footing as g µν . Consistency of the action now of course requires the inverse f µν to exist which, in analogy with GR, we shall always assume except perhaps at isolated physical singularities. The corresponding equations of motion for the two metrics read, in which the contributions from the interaction potential in terms of S = g −1 f are of the form, The matrix functions Y (n) (S) have already been defined in (4.5). Due to the overall covariance of the interaction potential there is an identity between the divergences of these interaction contributions [70] (note that this identity follows from covariance and is otherwise independent of the form of the bimetric interactions), where∇ is the covariant derivative compatible with f µν . Due to this identity there is only one set of independent Bianchi constraints just as in massive gravity. Two important remarks are in order: Firstly, note that the β 4 -term √ g e 4 (S) = √ f is no longer non-dynamical but now contributes to the f µν equations of motion. Secondly, due to the more general symmetry property of the elementary symmetric polynomials (which is just a rewriting of (A.8)), the structure of the above action is symmetric in the two metrics. At the level of equations, this symmetry manifests itself through the identity, The metrics are therefore treated on the same footing and in section 5.4 we will see how a "physical metric" is selected only by the choice of matter couplings. Various further aspects of the Hassan-Rosen action will be discussed in more detail in the remaining parts of this review.

Absence of ghost
The interaction potential of bimetric theory breaks the two diffeomorphism invariances of g µν and f µν down to their diagonal subgroup. In other words, the bimetric action is not invariant under independent coordinate transformations of the two metrics but only under those δx µ = ξ µ that transform both metrics simultaneously in the same way: δ ξ g µν = −2g ρ(µ ∇ ν) ξ ρ and δ ξ f µν = −2f ρ(µ∇ν) ξ ρ , where∇ is again the covariant derivative compatible with f µν . For a general interaction potential, the degree of freedom counting therefore goes as follows: There are 2 × 10 = 20 components to start with; 2 × 4 = 8 of these get removed by gauge constraints and gauge fixing. Just as in the massive gravity case there is one set of Bianchi constraints, which can be taken to be either ∇ µ V g µν = 0 or∇ µ V f µν = 0, since these are related by the identity (5.5). These vector constraints thus removes four additional degrees of freedom, leaving us with a total number of eight propagating modes. These correspond to the two degrees of freedom of a massless spin-2 field, five of a massive spin-2 and one additional scalar which gives rise to the Boulware-Deser ghost instability. In a consistent bimetric theory we therefore also need an additional constraint that eliminates the ghost mode.
In the ADM language, this means that we need the action to be linear in both lapses N and L of the two metrics as well as one set of three-dimensional vectors. In total there will then be five non-dynamical variables whose equations of motion become constraints: four corresponding to the gauge constraints associated to the overall diffeomorphism invariance and one extra constraint that removes the Boulware-Deser ghost.
Using the same variables as for massive gravity (c.f. section 3.4) resulting from the redefinition (3.19), Hassan and Rosen were able to show that the bimetric action (5.1) indeed assumes the form [76], in which the constraints C i ,C and C are independent of the lapses N and L and the shifts L i . As in the massive gravity case, the action is nonlinear in the redefined shift vectors n i , which is a consequence of the breaking of one set of general coordinate transformations. The scalar constraintsC and C contain terms coming from the interaction potential, whereas the vector constraints C i entirely originate from the Einstein-Hilbert terms. Note that due to the redefinition of the form N i = Ln i + L i + N D i k n k , all constraints receive contributions from the Einstein-Hilbert term for g µν which originally contains a term N i R i , c.f. (3.4). These results have subsequently been confirmed by explicit calculations and independently verified in various other approaches (see e.g. [63,[67][68][69]) which we shall not review here. Instead we note that the key observation which motivated Hassan and Rosen to study the fully dynamical extension of massive gravity was that due to the symmetry property (5.6), linearity of the interaction potential in the lapse N implies that it must also be linear in L. 20 The corresponding secondary constraint which removes the canonical momentum of the ghost mode was also shown to exist [62]. Bimetric theory therefore gives rise to the correct amount of constraints in order to propagate the 5 + 2 = 7 degrees of freedom, corresponding to a massive and a massless spin-2 field.
At the nonlinear level there is no unique split of degrees of freedom into "massive" and "massless" ones. As discussed in the following, the mass eigenstates can only be properly defined in the linearised theory around particular backgrounds.

Mass spectrum
The notion of mass is intimately related to the isometries of space-time. Its definition is most concise in Minkowski space where mass arises as a Casimir invariant of the Poincaré isometry group and it is possible to generalise that concept to space-times with the same amount of symmetries, i.e. Anti-de Sitter and de Sitter isometries. For less symmetric spaces it becomes more difficult to obtain a clear identification of mass, but one option is to classify a field as massless or massive depending on its number of propagating degrees of freedom. More precisely, if there is a parameter which when taken to zero increases the amount of gauge redundancy and thus reduces the number of propagating degrees of freedom to that of a massless theory then that parameter can loosely be identified with mass. This notion is implicit when we use the term "nonlinear massive gravity" or when we speak of "massless" and "massive" degrees of freedom in bimetric theory. Nevertheless, around their maximally symmetric background solutions, we expect the nonlinear theories to give rise to a well-defined mass spectrum. In bimetric theory, such backgrounds are most easily obtained by making a proportional ansatz for the metrics. 21

Proportional background solutions
Probably the simplest and yet remarkably important class of solutions to the bimetric equations of motion in vacuum is obtained by making an ansatz that conformally relates the two metrics, Having plugged this ansatz into the equations of motion, we first note that the Bianchi constraint, ∇ µ V g µν = 0, immediately enforces c(x) = const. This is simply because this equation reduces to a polynomial in c with constant coefficients that multiplies ∂ ν c. This restricts our ansatz to proportional metrics, Using this in the bimetric equation (5.2) we find that they simply reduce to two copies of Einstein's equations, In this we have defined the cosmological constant contributions arising from the interaction potential, From (5.11) it is clear that this equation is a polynomial in c with coefficients depending on the β n parameters and α = m f /m g . In general, it serves to determine the proportionality constant c of our ansatz in terms of the parameters of the theory and thereby specifies the solution completely. An important exception to this generic situation is the partially massless case, which we discuss in section 7.
Apart from being able to capture all solutions of GR, the proportional background solutions are of particular interest because they allow for a definite mass spectrum of fluctuations around them. In general, the linear perturbation equations have a rather complicated structure because in order to derive them one needs to vary the square-root matrix S = g −1 f . As we already discussed in section 4.3 for the massive gravity setup, this is always possible but, for backgrounds giving rise to matricesḡ −1f that do not commute with the fluctuations, the resulting expressions are lengthy. In particular, the equations will not contain a mass term with Fierz-Pauli structure which makes it difficult, if not impossible, to uniquely identify the massive field. In contrast, for the proportional backgrounds we haveḡ −1f = c 2 1 which does commute with any other matrix and hence drastically simplifies the perturbation equations which now will exhibit the Fierz-Pauli structure.

Spectrum of linear mass eigenstates
We shall now consider small perturbations around the proportional backgrounds for both of the metrics, The variation of the square-root matrix is easily obtained as, Plugging these into the bimetric equations (5.2), keeping only terms linear in the fluctuations and using the background equations, we obtain, Here N depends on α, c and the β n parameters and the explicit dependence can be read off from the Fierz-Pauli mass below. Here, we have made use of the fact that the background metric can be conveniently rescaled by a constant without changing the structure of the linearised equations and expressed the equations with respect to a redefined background metric,Ḡ We have also redefined the cosmological constant with respect to this background, and expressed the kinetic operatorĒ ρσ µν with respect to theḠ µν background, 18) in which∇ is the covariant derivative compatible withḠ µν .
The above equations can now easily be diagonalised into an equation for a massless and a massive perturbation. To this end, consider the following combinations of metric fluctuations, Now, taking the appropriate linear combinations of the original fluctuation equations for δg µν and δf µν decouples the massless from the massive mode. The resulting equations read [115,143] where δG =Ḡ µν δG µν and δM =Ḡ µν δM µν . The Fierz-Pauli mass in these equations is given by,m We remind that in all of the above expressions c is to be regarded as a function of the Planck masses and the β n parameters, determined by the background equation Λ g = Λ f .
As advertised, (5.20a) and (5.20b), respectively, describe a massless and a massive spin-2 fluctuation in (Anti-) de Sitter background. At the linear level around proportional backgrounds, one can therefore assign two degrees of freedom to a massless fluctuation δG µν and the remaining five to a massive fluctuation δM µν . The linearised action in terms of the mass eigenstates is [143], The main reason for choosing the new backgroundḠ µν was to render the final action and equations as simple as possible. Alternatively, we could have written all of the above expressions with respect to the original backgroundḡ µν , in terms of Λ g in (5.11) and a properly rescaled Fierz-Pauli mass, m 2 FP ≡ 1 + α 2 c 2 m 2 FP .

Couplings to Matter
So far we have been dealing with theories involving only spin-2 degrees of freedom. In order to be accessible to any type of observations or experiments, the fields need to interact with ordinary matter. It may not come as a surprise that the set of allowed matter couplings which do not reintroduce the Boulware-Deser ghost is very restricted.

Ghost-free matter couplings
The only known couplings which can be added to the bimetric action (5.1) without exciting the ghost are, where L m andL m are standard minimally coupled matter Lagrangians of the same form as in GR. Φ g and Φ f schematically stand for sets of matter fields of any kind. Importantly, it was shown in [144,145] that it is not possible to couple the same (dynamical) matter field to both metrics using minimal couplings, and hence Φ g and Φ f must be entirely independent. That coupling a field to both g µν and f µν reintroduces the Boulware-Deser ghost can be understood as follows: In GR with only one metric, the matter action becomes linear in the lapse N when written in the canonical variables for the metric and the matter fields. A simple calculation in an example with a free scalar field ϕ shows that this happens because the variation of the action with respect toφ and hence also the canonical momentum of the scalar depend on N . This Ndependence is such that the action in terms of canonical variables is linear in N . In the bimetric case, however, when the free scalar is coupled to both g µν and f µν , its canonical momentum will depend on N and L in a more complicated way and the action will not become linear in both of the lapses. As a consequence, the constraint which removes the Boulware-Deser ghost is lost. Note that this argument does not exclude coupling pure interaction terms (without appearance of time derivatives) to both of the metrics. This possibility seems quite contrived, however, and we do not discuss it any further but instead focus on couplings to two entirely independent matter sectors.
In the equations of motion the matter couplings enter in the form of stress-energy tensors, The bimetric equations (5.2) in the presence of matter sources thus become We will comment on other matter couplings that have been studied in the literature below but first we shall discuss two important limits in the parameter space of bimetric theory which take it close to either general relativity or nonlinear massive gravity.

General relativity limit
As we pointed out in section 4.5, the parameter space of massive gravity with a fixed reference metric does not include any region which is obviously close to GR. In the following we will see that, in contrast, a well-defined GR limit does exist for the bimetric theory. Since the structure of the bimetric action is completely symmetric in g µν and f µν , either of the two metrics can play the role of the physical metric whose solutions will become similar to those of GR. We choose this metric to be g µν here but note that everything can analogously be derived with the roles of the metrics interchanged.
It is straightforward to verify that the contributions from the interaction potential V to the bimetric equations of motion (5.25) satisfy the following identity [146,147], 22 Making use of this observation, the equations can be combined to give, This particular set of equations has interesting implications on classical solutions in bimetric theory which we shall come back to in section 6. For our purposes here it suffices to consider the dependence on the parameter α ≡ m f /m g in (5.27). In particular we note that in the limit α → 0, the equations reduce to [146,147], When T f µν = 0, i.e. when there is no matter sector for the metric f µν , we can use the covariant derivative compatible with g µν to take the divergence of the equations which for a covariantly conserved source then gives V = constant on-shell. In this case, the equations thus reduce to Einstein's equations for the physical metric g µν with Planck mass m g and cosmological constant m 4 Hence, the GR limit of bimetric theory is defined by, in which case the solutions for the physical metric g µν coincide with those of GR. 23 Interestingly, a large value for the physical Planck mass m g automatically implies that bimetric theory is close to its GR limit, provided that m f is of reasonable size compared to other relevant scales (such as the electroweak scale, for instance).
The effect of taking the above limit on the f µν equation ( We further observe that, in the GR limit, the fluctuation of the physical metric g µν becomes massless as expected. This can be seen from (5.19), which for α → 0 gives δG µν → δg µν . It is an important feature because, at least to our present knowledge, only the couplings of the original metrics to matter are ghost-free, whereas on the other hand the gravitational metric needs to behave like a massless spin-2 field for phenomenological reasons. In particular, since the massive mode decouples from the source, bimetric theory in the GR limit does not suffer from the vDVZ discontinuity and hence does not need to rely on the Vainshtein mechanism, a requirement which usually challenges the phenomenological viability of massive gravity models.
To summarise, in the limit of small α, bimetric theory can be viewed as a smooth deformation of GR because the massive mode decouples in the limit. In particular, the gravitational metric satisfies Einstein's equations modified by a small correction and its fluctuations are dominated by the massless spin-2 mode. The dynamics of the massive field essentially decouple from the observable matter sector and its presence manifests itself only through the constant term in (5.29) which for a sufficiently small spin-2 mass could give rise to cosmic self-acceleration [148]. On the other hand, one can tune the β n parameters such that large contributions to the effective cosmological constant cancel (at the cost of giving up technical naturalnessà la 't Hooft) and thus, even for a very large spin-2 mass, its observable effects could remain small.
Note also that, in this setup, the massive spin-2 field could potentially be a suitable dark matter candidate: It interacts with the Standard Model fields only very weakly, but couples non-minimally to the gravitational metric. A remarkable feature of this scenario would be that the closeness of the theory to GR goes hand in hand with the decoupling of the dark matter field, and both get related to the largeness of the Planck scale m g . For related approaches in the context of bimetric theory, where not the spin-2 field itself is considered as the dark matter candidate but additional fields are invoked, see [149][150][151][152].

Massive gravity limit
Let us now consider the limit opposite to the one above, namely α → ∞. In this limit, bimetric theory reduces to massive gravity with a GR reference metric, i.e. f µν solves the standard sourced Einstein equations [153][154][155][156]. To see this, consider once more the bimetric equations of motion (5.25), this time along with the following scalings, Note that we introduced a new mass scale M and stress-energyT here because now we also require a scaling of the matter fields in the f µν sector in order to be able to treat solutions for f µν that solve Einstein's equations in the presence of matter (for more on this see e.g. discussions in [115,146,154]). Similarly, the scaling of β 4 is required to keep a cosmological constant term for f µν . The remaining β n are not allowed to scale since this would destroy interactions in the g µν equations (recall that β 4 only appear in the f µν equations). The g µν equations (5.25a) are then unaffected but in the f µν equations (5.25b) all bimetric interaction terms drop out. Thus, f µν is now determined by, which is an Einstein equation with cosmological constant m 4 m 2 g β 4 and Planck mass M . The exact limit thus decouples the dynamics of f µν which is now determined by equations that do not involve g µν . We can obtain a solution to these equations and use it to replace f µν in the g µν equations. Effectively, the g µν equations are therefore the same as the ones obtained from varying the massive gravity action (4.2), in which f µν is taken to be a fixed reference metric that solves Einstein's equations (5.33). This picture explains the emergence of the fixed reference metric in massive gravity since the latter can be viewed as a particular point in the parameter space of the fully dynamical bimetric theory.
From the existence of the above limit we can infer that the solution space of bimetric theory is richer than that of massive gravity. Solutions (g, f ) to the bimetric equations of motion can be of the form (g + O(α −1 ), f + O(α −1 )), with (g , f ) taken to be αindependent. Only for such solutions is the limit α → ∞ well-defined and result in the massive gravity configurations (g , f ). Other bimetric solutions, however, do not possess a well-defined massive gravity limit, i.e. they become singular for α → ∞. Such configurations for the metrics are known to exist and have no massive gravity counterpart, but rather constitute a distinct feature of bimetric theory. Simple examples of proportional solutions that become singular in the limit have been found in [147]. Moreover, see [157] for a study of Hawking-Moss instanton solutions in bimetric theory which do not seem to allow for a well-defined massive gravity limit.
To summarise, all solutions of massive gravity can be viewed as arisen from bimetric theory, whereas it is quite easy to find solutions of the bimetric theory which does not have any massive gravity counterpart. Therefore, since the limit is singular, care has to be taken when arguing that results which hold in the massive gravity limit also hold in the full theory.
It has occasionally been argued that the above limiting procedure is somehow inferior and that the limit should instead be taken in the action. But, in fact, the two procedures of taking the limit in the equations or in the action are fully compatible when treated correctly. To see this let us briefly review the limiting procedure at the level of the action, as discussed, for instance, in [2]. In this case one starts by expanding the bimetric action (5.1) around, e.g., a constant curvature solution, 24 and then considers the limit m f → ∞. Since f E µν is assumed to be a constant curvature metric the kinetic term for f µν simply reduces to the quadratic action for canonically normalised fluctuations on this background, i.e.
whereĒ µνρσ is the operator defined in (2.12), written in terms of the metric f E µν and its curvatures. In the interaction potential on the other hand, all the δf µν fluctuations decouple in the limit and, assuming an appropriate scaling of β 4 , only a cosmological term for the fluctuations remains. One then ends up with a non-covariant action in terms of a decoupled linear spin-2 field δf µν and a nonlinear spin-2 field g µν whose interaction potential contains the fixed reference metric f E µν . Now varying with respect to the dynamical fields g µν and δf µν results in the massive gravity equations for g µν (containing g µν and f E µν ), supplemented with a completely decoupled linear equation for δf µν . Of course, this procedure is only self-consistent provided that f E µν is really a finite constant curvature background solution in the limit. Ensuring this leads exactly to the limiting procedure of the equations we have discussed above. It is also straightforward to see that if we express our equations (5.25a) and (5.33) in the limit in terms of g µν , δf µν and f E µν , then we end up with exactly the same result as that obtained from the action (the reason being that, if δf µν /(αm g ) is assumed subdominant to δg µν /m g in the action, the same will of course hold in the equations). Therefore taking the limit in the equations is equivalent to doing it in the action, but the former procedure deals more directly with solutions and the requirements for these to exist.

Other matter couplings
Matter couplings differing from the consistent ones in (5.23) have also been studied in the literature. Particular attention has been paid to the "doubly coupled" theory for which the matter sectors of g µν and f µν contain the same fields [158,159] but, as explained above, these couplings reintroduce the Boulware-Deser ghost [144,145]. On the other hand, it turns out that a certain combination of the two metrics can be coupled to matter without exciting the ghost in the low-energy effective field theory [145]. This "effective metric" contains two arbitrary parameters a and b and is of the form, whose "uniqueness" 25 has been discussed in [161,162]. The reason this metric is special is because it can be written, in matrix notation, G eff = a 2 g (1 + (b/a)S) 2 , and hence det(G eff ) = a 4 √ g det(1 + (b/a)S). The identity (A.5) then reveals that any vacuum contribution generated by matter coupled to such a G eff will not alter the ghost-free form of the bimetric interaction potential. Interestingly, this metric can be written as a Finsler metric [159] and therefore provides a situation where the geometric interpretation is shifted from the standard (pseudo) Riemannian description to an effective Finsler geometry. 26 Since our main interest here lies in working with the full bimetric action and not in the effective field theory picture, we will not discuss phenomenological implications of such a matter coupling in this review. The interested reader is referred to the large variety of references [99,151,152,[165][166][167][168][169][170][171][172][173][174] in which the effective coupling has been studied further, mostly in the context of cosmology. 25 More precisely, the metric (5.36) is unique up to multiplication of the right-hand side by an arbitrary matrix of unit determinant. When it is expressed in terms of vierbeins more ambiguities arise [160]. 26 A Finsler geometry departs from the pseudo Riemannian geometry in that it characterises a manifold with a norm but it is not necessarily infinitesimally Minkowski [163,164].

Classical Solutions
Despite the fact that the consistent theories have only been known for a few years, there already exists an extensive literature on classical solutions in ghost-free massive gravity and bimetric theory. Here we focus on bimetric theory whose solution spectrum is richer than that of massive gravity (c.f. section 5.4.3). Perhaps most interesting is the class of spherically symmetric solutions, which can potentially be used to study stars, galaxies, black holes and cosmology. An immediate problem that arises in this context is that the bimetric theory has no known analogue of Birkhoff's theorem 27 and therefore many of the valuable uniqueness theorems of GR do not apply. This means that in many situations several solutions may exist for the same problem and one is left with the task of sorting out the most relevant one. A strong physically motivated guide here is to explore the stability of solutions under perturbations. Another complication is the analytical complexity of the nonlinear equations of motion, and in many cases only numerical solutions are known as of yet. In this review we have mainly kept our attention towards analytically tractable problems and will continue this practice. We will therefore restrict our discussion mainly to black hole and cosmological solutions, discuss features of these which can clearly be discerned analytically and only comment briefly on some phenomenological issues. For the spherically symmetric solutions and in particular their applications to black hole studies there already exist a few good reviews on the current status [176][177][178]. We refer the interested reader to these for additional details. Before discussing particular features of the spherically symmetric solutions, we make some general remarks.

General properties
Recall the form of the bimetric equations of motion in the presence of matter sources, with the contributions from the interaction potential given in (5.3). It is clear that due to the presence of these additional interaction terms, in general, the solutions to the bimetric equations will significantly differ from those obtained in GR. From a phenomenological point of view, such large new effects are not desirable since Einstein's theory is well-tested over a wide range of distance scales. In order not to be ruled out immediately, any modification of gravity needs to give rise to solutions that do not deviate too much from those of GR. As we already saw in section 5.4.2, in the limit of small α ≡ m f /m g and a vanishing source for f µν , all equations for the physical metric g µν will smoothly approach the form of the GR equations and so will their linear perturbations. This is already good news for the viability of bimetric theory. On the other hand, it may also render the theory less interesting because if all solutions are (almost) indistinguishable from those of GR in the weak-field limit then there is little room for predicting new signatures that could be observed. It is therefore interesting to see if the above equations, away from the GR limit, can still give rise to solutions that behave similarly to those of Einstein's theory.
In section 5.3.1 we already encountered the proportional backgrounds as an example of GR solutions in bimetric theory. However, in the presence of matter, these backgrounds only exist if the stress-energy tensors satisfy the rather strict constraint T f µν = α 2 T g µν . As we will see below, the bimetric equations can also reduce to Einstein equations after inserting ansätze for the metrics which possess particular symmetry properties (e.g. spherical symmetry). In these cases, although the background solutions reproduce exactly the predictions of GR, differences will generically occur at the level of perturbations and the theory can in principle make testable predictions.
In this context, it is important to notice one more feature of bimetric theory: If either g µν or f µν is assumed to be an exact solution to the Einstein equations, then the bimetric equations will force the second metric to also solve (a different set of) Einstein's equations [147]. 28 In vacuum this can be seen as follows: Recall that using the identity (5.26), we were able to combine the bimetric equations into, If in this equation we assume that, for instance, f µν is a GR solution with cosmological constantΛ, then (6.2) takes the form, Taking the covariant divergence of this equation implies that the terms proportional to δ µ ν are constant and hence the equation reduces to an Einstein equation for g µν , This proof can straightforwardly be generalised to the equations including matter sources. However, in this case GR solutions do not exist unless the sources for g µν and f µν are related in a particular way [147]. Solutions of this type (with stress-energy tensors assumed to resemble perfect fluids) have been found in [179].
Another interesting general feature of the bimetric solutions was noticed in [180]. There it was found that the effective stress-energy tensors that the bimetric interactions generate, i.e. the negative of V g µν and V f µν , never satisfy the null-energy condition simultaneously unless the two metrics are proportional. In more detail, Ref. [180] found that if k µ is a null-vector with respect to g µν , i.e. g µν k µ k ν = 0, then one can always definek µ = (S −1 ) µ ρ k ρ which turns out to be a null-vector with respect to f µν , i.e. f µνk µkν = g µν k µ k ν = 0. This follows since f µν = g µρ S ρ σ S σ ν = g σρ S ρ µ S σ ν and can be used to demonstrate that if e.g. V g µν satisfy the null-energy condition V g µν k µ k ν ≤ 0 (where the direction of the inequality follows from our sign convention in defining V g µν ) then V f µν will satisfy an opposite inequality V f µνk µkν ≥ 0. Furthermore the inequalities only saturate for proportional solutions, when the interactions reduce to pure cosmological constant contributions (c.f. section 5.3.1). Of course the null-energy condition is usually applied to the matter sector within the context of GR so it is not completely obvious what a violation actually means physically in this case. For example, if we consider the case where only g µν couples to a matter source T g µν and take it that V f µν does not (does) satisfy the null-energy condition. Then generically V g µν will (will not) satisfy it which implies that the standard physical interpretation of the null-energy condition on the source term T g µν may change. This interesting observation is something which deserves further study. One immediate consequence however was the prediction of worm-hole solutions in the theory, which were subsequently found and analysed in [181].

Black hole solutions
Spherically symmetric solutions were first discussed in [182,183] and have since then received considerable attention [184][185][186][187][188][189][190][191][192][193]. As already mentioned in the introduction of this section, there is no known analogue of Birkhoff's theorem in bimetric theory and therefore many of the uniqueness theorems of GR fail straight away. In fact, for spherically symmetric ansätze, it is fairly straightforward to make an initial separation into two classes of general solutions, the so-called bidiagonal and non-bidiagonal solutions. These labels are not that imaginative but they do keep to the point: The bidiagonal solutions are solutions for which both metrics can be brought to a simultaneously diagonal form while the non-bidiagonal solutions are those solutions which cannot. Another important class of solutions are the proportional solutions discussed in section 5.3.1. Now, we can always use the isometries to choose coordinates such that at least one of the metrics is diagonal and therefore the proportional solutions fall into the broader class of bidiagonal solutions.
The most general spherically symmetric ansätze can be written in the form, where U, V, A, B, C, D are all functions of the radial (r) and temporal (t) coordinates.
We have made use of the spherical isometry to fix the angular and radial coordinates such that the angular measure dΩ 2 = dθ 2 + sin 2 θdϕ 2 is in the standard form and comes with the standard factor of r 2 for g µν , which also puts g µν in diagonal form. This form has been heavily used in the literature and serves to simplify parts of our discussions, but it should be noted that in some situations, concerning in particular black hole solutions, it is preferable to work with coordinates that are regular at the horizon, such as the advanced Eddington-Finkelstein coordinates (see e.g. [178,190]). Since g µν is diagonal it follows from the equations of motion, that g ρµ V g µν must be diagonal on the solution. The only off-diagonal terms in g ρµ V g µν that do not vanish identically are, This can vanish either if C = 0, which characterises the bidiagonal solutions, or if D = ωr with ω being a solution of, Notice that the second option, which characterises the non-bidiagonal solutions, does not exist if two of the β n parameters with n = 1, 2, 3 are zero. Furthermore, the condition forces ω to be a constant. We will now discuss the two classes of solutions in some more detail.

Bidiagonal black hole solutions
We start by discussing the bidiagonal solutions, which are obtained from (6.6) with C = 0. As a simple but interesting and illustrative warmup we recall the proportional background solutions of section 5.3.1, with f µν = c 2 g µν . If either metric is diagonal, then these are obviously bidiagonal. It should also be clear that any vacuum solutions of the standard GR equations will in fact also be solutions of the bimetric equations for the proportional backgrounds (with appropriate restrictions on the constant of proportionality). In particular we may now consider bidiagonal black hole metrics. Since these are proportional vacuum solutions the linearisation around these solutions follows the analysis of section 5.3.2 and the fluctuations obey the massless equations, and the massive equations, 11) where, in terms of the original fluctuations δg µν and δf µν , the massless field is given by δG µν = δg µν + α 2 δf µν , and the massive field is obtained from 2c δM µν = δf µν − c 2 δg µν . We also recall that (with ∇ µ associated to the background field g µν ), The massless equations (6.10) are precisely those of linearised GR and a standard treatment shows that they propagate the two (±2) helicity states of a massless spin-2 field. Let us therefore focus on the massive field equations (6.11). Due to the Bianchi identities obeyed by the Einstein tensor (including the Λ terms) a covariant divergence of these yields, where the last equality follows on-shell and provides four non-dynamical constraint equations for δM µν . A second covariant divergence of these equations yields, Tracing instead the field equations (6.11) we get, It then follows that the particular linear combination, constitutes another on-shell scalar constraint, 29 g µν δM µν = 0, which then also implies ∇ µ δM µν = 0. Together these equations correspond to five constraints that can be used to remove five degrees of freedom from the original 10 components of the symmetric tensor perturbations δM µν . Enforcing these constraints, the equations of motion for the massive spin-2 fluctuations can therefore be reduced to the following system, This brief discussion provides a generalisation of the constraint analysis of section 2.2, when augmented to the case of constant curvature backgrounds, and also complements the analysis of section 5.3.2. Moreover, as first recognised in [195], the dynamical equations for δM µν in the above form are actually identical to the equations of a 5D black string when projected down onto its 4D sub-manifold, namely, where k 2 denotes the wave number of the transverse Fourier mode. This equation is identical to (6.17) with the replacement m FP → k. It has immediate and interesting implications since it is well known that the solution of (6.18) exhibits the so called Gregory-Laflamme instability [196] (for a broad review on this subject see [197]). Namely, the solution is unstable in the parameter region 0 < k < k c , where k c is of the order of the inverse Schwarzschild radius. In the case of the 5D black string this instability concerns the mode longitudinal to the 5th direction along the black string and the end point of the instability is known to result in a sort of "pinching" of the black string into an open necklace with beads of 4D Schwarzschild black holes strung on it. In the case of the bidiagonal bimetric solution when the metrics are proportional 29 An exceptional situation occurs when the parameters satisfy 2Λ = 3m 2 FP , saturating the so called Higuchi bound [81]. In this case the left-hand side of (6.16) vanishes identically, such that a new linear scalar gauge symmetry emerges and the theory propagates a partially massless spin-2 field with only four degrees of freedom [85,194]. We will get back to this in section 7.
Schwarzschild (or Schwarzschild-de Sitter) metrics it follows that it too will have a similar instability. 30 But since there is no 5th dimension, no such interpretation is available and the status of the end point is presently unknown (one possibility is mentioned below). It is clear however that the instability only manifests itself over a characteristic time-scale comparable to the Hubble time H −1 0 for a small enough mass parameter m FP ∼ H 0 and it would therefore be hard to detect any signature connected with this. A similar study of the bidiagonal Kerr solutions have revealed that these additionally suffer from a super-radiant instability [187]. For more details and discussions on this we refer the interested reader to the reviews [176][177][178] and references therein.
Another simple and illustrative example is to consider the metric ansätze (6.6) as a static perturbation of proportional Minkowski backgrounds, i.e.f µν = c 2ḡ µν = c 2 η µν . 31 Thus we consider the functions in (6.6), with C = 0 and no temporal dependence, as given by The equations of motion for the perturbations can then be solved perturbatively to give (see e.g. [182,184]), 20) where M 1 and M 2 are integration constants and we recall that the Fierz-Pauli mass m FP is given by, The above expressions in (6.20) clearly have the form of Yukawa suppressed GR solutions for spherically symmetric perturbations of flat space-time. They exhibit the vDVZ discontinuity since, in the small mass limit m FP → 0, the combination, does not vanish as it would in GR. On the other hand, we also see that both δB and δD diverge for m FP → 0, so this limit is in fact not well-defined on the solution. These divergences in the small-mass limit are remedied via the Vainshtein mechanism when going to higher orders in perturbation theory. Although the full nonlinear forms of these solutions are not known analytically their perturbative and numerical existence provides some initial hope of such a completion. The perturbative form, going up to second order, has been used for initial studies of e.g. strong lensing [184] and in confirming the Vainshtein mechanism in bimetric theory [13,183,192]. Making a full perturbative ansatz has also allowed to find numerical solutions to the nonlinear equations which are asymptotically flat and have massive hair. It has been conjectured that these solutions are the end point of the linear bidiagonal Schwarzschild instability discussed above [186]. Matching these solutions however requires the black hole to be of cosmological size and they are therefore unlikely to be of astrophysical interest.
In more general setups, away from proportional backgrounds, all known analytical solutions correspond to both metrics being of standard GR form and do not have massive hair. On the other hand, it is known that the bidiagonal class allows also for numerical solutions which are of non-GR form and do contain massive hair, typically with anti-de Sitter asymptotics. So far these more exotic solutions have only been studied numerically and we refer the reader to [177,178] for more discussions of their explicit behaviour.

Non-bidiagonal black hole solutions
The non-bidiagonal solutions have C = 0 and D = ωr in (6.6), with ω being a constant solution of (6.9). Furthermore, the Bianchi constraint ∇ µ V g µν = 0 implies that (see e.g. [177]), where S ρ ν are components of the square-root matrix S = g −1 f . This condition together with (6.9) suggests that the non-bidiagonal solutions are a very non-generic class in the sense that the metric coefficients are forced to have a quite peculiar dependence on the β n parameters of the action. If we assume that ω is a solution of (6.9) and that (6.23) is satisfied, we find that the equations of motion decouple into two separate Einstein equations, where the cosmological terms are given by, Therefore, this peculiar way of fixing the metric functions in terms of the β n parameters is an alternative and more complicated way of tuning the interactions to be purely cosmological, as opposed to the more obvious proportional solutions. In this context, recall the results of section 6.1, showing that whenever one metric is a solution to Einstein's equations, the other one must be an Einstein solution as well. That such solutions (both Einstein, but with different cosmological constants) exist could perhaps have been anticipated by the complicated matrix structure of the interaction terms. All black hole solutions in the non-bidiagonal class are therefore of standard GR type. An initial treatment of perturbations around these solutions can be found in [191]. For radial modes the analysis simplifies drastically and it is possible to get analytical expressions for the perturbations. It turns out that, while these are regular at the horizon, they are not regular at infinity. This implies that unstable radial modes are not allowed on physical grounds and that the corresponding black holes may in fact be stable, at least in the perturbative sense. For a detailed discussion on the current status of perturbations of various known black hole solutions we refer the reader to [178].

Cosmological solutions
On small scales GR makes very good predictions and, from a phenomenological point of view, there is no need to look for a modification of the theory. On cosmological scales, however, it seems that either quantum field theory or gravity (or both) have to be modified in order to explain the observed value of the cosmic acceleration in a satisfactory way. In order for this review to be self-contained and to set up some notation we have provided a condensed summary of standard GR cosmology in appendix C.
Due to the conceptual problems of explaining the observed cosmic acceleration within GR, the implications of a consistent theory of modified gravity for cosmology are of immediate interest. Unfortunately, the original idea that a large vacuum energy contribution from the matter sector could be screened out simply by a non-zero graviton mass that weakens gravity at large scales turned out not to be realisable without fine tuning, as discussed in [46]. Nevertheless, one can take a less ambitious approach and assume that another, as of yet unknown, mechanism (such as supersymmetry, if it was realised at low energies) is at work and removes all vacuum energy. In this case, ordinary GR would predict a universe without cosmic acceleration, while in modified gravity theories, "self-accelerating" solutions certainly exist in various forms. Massive gravity is a particularly interesting candidate for this purpose since its interaction potential breaks diffeomorphism invariance and therefore the spin-2 mass scale is expected to be protected from receiving large quantum corrections. The hope would therefore be that the interactions of the graviton could give rise to a small rate of cosmic acceleration, which, unlike the pure cosmological constant, can be regarded as being "technically natural" in the sense of 't Hooft [198] (see e.g. [199] for a recent review on naturalness in the context of cosmology). Of course, it should be pointed out that although the guide of naturalness is philosophically appealing, in the end it may turn out that nature does not follow that principle. Nevertheless, like Occam's razor, it is a useful tool in discriminating between alternative hypotheses.
Shortly after the construction of the ghost-free potential, it was discovered that the dRGT model of massive gravity (with flat or even with general reference metric) does not possess stable homogeneous and isotropic solutions [127]. For more work on cosmology in massive gravity with fixed f µν , see [128][129][130][131][132][133]200]. On the other hand, in bimetric theory with dynamical reference metric the desired class of solutions does exist [153][154][155].
The simplest examples are the proportional backgrounds considered in section 5.3.1 which, however, only solve the equations with proportional sources. In this case, the effective cosmological constant in (5.11) receives contributions not only through vacuum energy from the matter sector (as captured by β 0 ) but also from all terms in the interaction potential. Even in the absence of vacuum energy (i.e., for β 0 = 0), cosmological backgrounds can be accelerating and the scale of acceleration is proportional to the technically natural mass scale m (see e.g. [201]).
In the following we provide a brief summary of results in bimetric cosmology. For a more detailed discussion, we refer the reader to [202].

Homogeneous & isotropic backgrounds in bimetric theory
We will outline the derivation of cosmological solutions in bimetric theory, following mainly [154]. For simplicity, the source T f µν of the f µν sector is set to zero, which allows us to interpret g µν as the physical metric in the usual way, provided that it has standard matter couplings. In other analyses, the second source term is kept nonzero [158,203,204], which in principle can serve to mimic a dark matter component [149][150][151][152].
To keep the comparison of the bimetric solutions to those of GR as simple as possible, it is convenient to make the usual Friedmann-Lemaître-Robertson-Walker (FLRW) ansatz for the metric g µν which is coupled to matter, 32 It is now important to notice that, in order to arrive at this form, we have used time reparametrisations to remove one time dependent function in the most general homogeneous and isotropic ansatz. We have therefore fixed the entire diffeomorphism invariance and there is no gauge symmetry left to do the same to f µν . As a result, the most general homogeneous and isotropic, bi-diagonal ansatz for the second metric reads, involving two time dependent functions X(t) and Y (t). Note that we have assumed the curvatures k of the two metrics to be the same. This does not constitute a loss of generality because, starting with two different values for k, one can show that the Bianchi constraint forces them to be equal [205].
Furthermore, as in GR, we use the perfect fluid form (T g ) µ ν = diag(−ρ, p, p, p) for the stress-energy tensor T g µν of matter coupled to g µν . In GR any source is automatically covariantly conserved as a consequence of the Bianchi identity whereas in bimetric theory the sources for g µν and f µν are not necessarily conserved. We therefore need to make the additional assumption of diffeomorphism invariance of the matter coupling which gives ∇ µ g T g µν = 0. This in turn implies that the continuity equation for matter (see equation (C.4)) is also valid in the model under consideration.
Next, we consider the bimetric equations of motion in (6.1), with T f µν = 0 and with one index raised by the respective inverse metrics. In what follows we shall refer to their 00-components simply as the g µν and f µν equation. Plugging the ansätze of the previous section into (6.1a) we can make use of known GR results to straightforwardly obtain the g µν equation, More general non-bidiagonal ansätze for the metrics have been studied in [153]. As we discussed in section 6.2.2 these are identical to GR backgrounds.
where µ 2 = m 4 /m 2 g . The first two terms are the same as in the ordinary Friedmann equation of GR, c.f. (C.5). In order to determine how the source ρ influences the cosmological evolution of the scale factor a(t), we need to determine the function Y (t) from the other equations. We start with the Bianchi constraint ∇ µ V g µν = 0, which, when evaluated on the homogeneous and isotropic ansatz, reduces to, In addition to these, we have to consider the f µν background equation whose form is slightly more complicated than (6.28) due to the presence of the additional function X(t) in the ansatz for f µν . Instead of presenting its complete form here, we first replace X(t) =Ẏ /ȧ, which is the dynamical solution to the Bianchi constraint above. 33 Then the f µν equation becomes, where, as usual, α ≡ m f /m g Multiplying this equation by Y 2 /a 2 and subtracting it from the g µν equation (6.28) yields an algebraic equation, which we have expressed in terms of the dimensionless quantities, From the quartic polynomial equation (6.31) one can obtain a solution for Υ which can be inserted into the g µν equation (6.28). After that, the latter will only contain a(t), a(t) and ρ(t), and thus becomes a modified Friedmann equation.

Classification of solutions
In general, the cosmological evolution equations derived above allow for several branches of solutions which have been categorised and studied in detail [148, 153-155, 201, 206-211]. Here we summarise the main results.
To start with, there are two options to satisfy the Bianchi constraint equation (6.29); either the first or the second bracket must vanish.
(I) The non-dynamical (algebraic) branch: If at least two out of β 1 , β 2 and β 3 are non-vanishing, then a constant b can be chosen such that β 1 + 2bβ 2 + b 2 β 3 = 0. In this case, an algebraic solution to the Bianchi constraint is Y (t) = b a(t). Even though in this case the 00-components of the two metrics are not necessarily proportional, the contribution from the interaction potential in (6.28) reduces to a cosmological constant term. The g µν equation hence simply becomes the ordinary Friedmann equation (C.5) with a cosmological constant Λ = m 2 (β 0 + 3bβ 1 +3b 2 β 2 +b 3 β 3 ) and thus the background solutions for g µν are degenerate with those of GR. At the level of linear perturbations, several degrees of freedom appear without kinetic terms and are thus expected to be strongly coupled [129,203,212]. On top of that, this branch seems to give rise to a non-perturbative ghost [131] and a late-time instability for the tensor modes [107]. Due to these pathologies, most of the literature focusses on branch II solutions. Note also that branch I does not exist if two out of β 1 , β 2 and β 3 are equal to zero.
(II) The dynamical branch: The alternative solution to the Bianchi constraint is X(t) =Ẏ /ȧ. In contrast to branch I, this is an evolution equation and allows for much more general solutions that can be very different from GR. In particular, Υ is not constant on this branch but a function of the matter density ρ, determined by (6.31).
The second branch further splits up into several subbranches, corresponding to different solutions of the quartic polynomial equation (6.31). These can be classified according to the evolution of the ratio of scale factors, Υ = Y /a [207].
(IIa) Infinite branch: This branch is characterised by an Υ that becomes infinitely large at early times and whose value decreases when moving forward in time.
(IIb) Finite branch: The solution is such that the ratio of scale factors Υ evolves towards a finite asymptotic value in the infinite past. This branch cannot be obtained in models with β 1 = 0 (assuming β 0 = 0) [207] .
(IIc) Exotic branches: These are all other branches which generically lead to bouncing cosmologies or static universes in the asymptotic past or future [210]. Even though it is not impossible to make such scenarios compatible with observations, these branches correspond to rather unconventional cosmologies and have not been studied in great detail so far.
Using phenomenologically inspired requirements, one can immediately rule out a set of bimetric models based on their cosmological background evolutions. For instance, out of the models with only one non-vanishing β n , only the β 0 -and the β 1 -model can give rise to a viable cosmology at the background level. Allowing for two non-vanishing β n , one can generically discard models with β 0 = β 1 = 0. More details on the viability conditions and exclusion of bimetric models based on background cosmology can be found in [201,207].

Self-acceleration & technical naturalness of the Hubble scale
In view of the dark energy problem, it is of particular interest to investigate whether bimetric theory can give rise to an expansion history compatible with observations even when there is no vacuum energy arising from the matter sector. If a background solution can mimic the behaviour of vacuum energy, it is referred to as "self-accelerating". Clearly, it would be even more interesting if it was possible to include a large vacuum energy contribution and still obtain a small acceleration rate. This can indeed be achieved in bimetric theory by (fine-)tuning the β n parameters to cancel terms from the interaction potential against the vacuum contribution. However, there is no mechanism (or symmetry) protecting the tiny difference of these two contributions and thus a configuration with small effective cosmological constant cannot be regarded as technically natural in this setup. At this stage bimetric theory is therefore in no better position than GR to solve the old cosmological constant problem.
Nevertheless, it is still possible to view the spin-2 interaction potential as the source of dynamical dark energy, much in the same spirit as in quintessence models [213].
To this end one assumes that all vacuum contributions in the source T g µν vanish and furthermore sets the bare cosmological constant term to zero by fixing β 0 = 0. A thorough comparison to observational data of the whole bimetric parameter space but with vanishing β 0 was performed in [201]. Self-accelerating solutions are found for many different combinations of nonzero β n parameters, and the fits to data are just as good as for the ordinary ΛCDM model of GR. As could have been anticipated, the best fit value for the mass of the massive spin-2 field is on the order of the Hubble scale H 0 . This tiny mass scale creates the hierarchy problem in GR with an ordinary cosmological constant which is not protected against large quantum corrections by any symmetry. The situation is different in bimetric theory where in the zero-mass limit the full diffeomorphism symmetry that transforms the two metrics separately is restored. This symmetry protects the spin-2 mass scale against large corrections and renders the small Hubble scale technically natural. For explicit calculations confirming this last statement, we refer the reader to [170,214].

Perturbations
In order to further study the viability of bimetric cosmology, one derives the linear equations for perturbations around the above cosmological backgrounds. In contrast to what we encountered for the proportional backgrounds in section 5.3.1, it is not possible to diagonalise the fluctuations into spin-2 mass eigenstates around more general solutions. For the homogeneous and isotropic backgrounds, the best one can do is to perform a decomposition into tensor, vector and scalar modes and try to decouple their dynamics as much as possible. Rather general analyses of the spectrum were already performed in [212,215] (and later in [216]), but since the resulting differential equations are not always easy to handle, developing a complete understanding of cosmological perturbation theory is still an ongoing process.
As already mentioned in section 6.3.2, branch I solutions are plagued by pathologies such as strong coupling behaviour and a non-perturbative ghost instability. Let us therefore focus on linear perturbations in the dynamical subbranches.
Infinite branch IIa: While specific infinite branch solutions for models with β 2 = β 3 = 0 are free of scalar instabilities [208], in general the perturbations around this class of FLRW backgrounds suffer from ghosts in both the scalar and tensor sectors [209,210,217]. Due to these pathologies, branch IIa solutions do not correspond to consistent backgrounds and must be discarded on theoretical grounds. Nevertheless, for an interesting study of the integrated Sachs-Wolfe-effect in this branch which appeared before its problems were established, see [218].
Finite branch IIb: Around this branch linear perturbations are generically ghostfree and well-behaved, except for a scalar gradient instability which occurs at early times [208,212,217,219]. While, from a theoretical point of view, this growing scalar mode is not a consistency problem, it does invalidate the use of linear perturbation theory. For generic parameters, the instability sets in at recent times and, as a consequence, bimetric theory cannot be invoked to predict phenomena in the early universe, at least not with standard perturbative techniques. However, it was demonstrated in [148] that, in the GR limit (c.f. section 5.4.2), the scalar instability is pushed backwards to arbitrarily early times. Hence, for small enough α = m f /m g , linear perturbation theory around the finite branches remains valid and the predictions for cosmology automatically resemble those made by GR. Another suggestion is that nonlinear effects related to the Vainshtein mechanism may render the instability irrelevant [220].
Exotic branches IIc: As already mentioned, the background evolution on these branches is rather unusual and hence this type of solutions has not received much attention. The perturbations around these backgrounds generically seem to have pathologies [210].
To summarise, the only well-studied models that give rise to viable cosmological backgrounds and perturbations lie on the finite branch. In order to maintain the predictivity of cosmological perturbation theory, it is furthermore necessary to bring bimetric theory close to GR, either by requiring a sufficiently small value of α = m f /m g or by invoking the Vainshtein mechanism. The advantage of the bimetric setup is the occurrence of a technically natural dark energy scale set by the spin-2 mass. However, eventually all of this will be useful only if one finds an explanation for the absence of the large vacuum energy contribution coming from the matter sector.

Partial Masslessness
A massive spin-2 field in de Sitter (dS) background exhibits several interesting features that are shared neither by lower-spin particles nor by spin-2 excitations in flat space. For a special value of the Fierz-Pauli mass in units of the background curvature, the linear theory possesses an additional gauge symmetry which removes one of the spin-2 helicity components. In this case, where the field loses one propagating degree of freedom, it is referred to as partially massless (PM). The particular mass value for which this situation occurs is called Higuchi bound and it divides the parameter region into unitary and non-unitary sub-sectors [81]. Below the Higuchi bound the helicityzero mode of the spin-2 particle develops a ghost instability, whereas above the bound all helicity states are well-behaved.
The existence of a nonlinear theory that involves a massive spin-2 field enables us to address the question whether the concept of partial masslessness is restricted to the linear theory around de Sitter backgrounds or whether it may be extended to the nonlinear level. By definition, the demand on a nonlinear PM theory is the presence of an extra gauge symmetry, even away from de Sitter backgrounds.

Partially massless spin-2 field on de Sitter background
Let us begin by reviewing the linear theory of a PM spin-2 particle in a de Sitter background, as first discussed in a sequence of papers by Deser et.al. [84-86, 194, 221-223]. Consider a de Sitter background metricḠ µν whose curvature satisfies R µν (Ḡ) = ΛḠ µν with positive cosmological constant Λ. The linearised action for a massive spin-2 fluctuation propagating on this background is, where the linearised Einstein operatorĒ ρσ µν is the same as in (5.18). The corresponding equations of motion for h µν reads, For general values of the Fierz-Pauli mass, this equation possesses no gauge symmetries. As we showed in section 6.2.1, it gives rise to five constraint equations that reduce the number of propagating degrees of freedom to five, as appropriate for a massive spin-2 particle.
A remarkable feature of (7.2) without any analogue in flat space is that if the Fierz-Pauli mass saturates the Higuchi bound, m 2 FP = 2 3 Λ, the equation is invariant under a new gauge symmetry. The corresponding linear gauge transformation reads, in which ξ(x) is a local gauge parameter.
Besides computing the variation of (7.2) under the gauge transformation explicitly, there is another straightforward way to see the existence of a gauge symmetry in the equation for the massive spin-2 field with mass at the Higuchi bound. First take the double divergence of (7.2) to arrive at, The kinetic terms have dropped out after using the linearised Bianchi identity. Furthermore, the trace of (7.2) is given by, If the mass is at the Higuchi bound, the terms without derivatives in this equation vanish identically, while the derivative terms are identical to those in (7.4). Hence we find that for m 2 FP = 2 3 Λ the traced equations of motion are identical to their double divergence or, in other words, the sum of the double divergence and the (correctly normalised) trace is identically zero. A gauge identity of this type even implies that the linearised action (7.1) is invariant under the corresponding gauge transformation. This can be seen by noticing that invariance of the action under (7.3) requires, Since δS δhµν is nothing else than the equation of motion for h µν , after integrating by parts, we find that ∆S indeed vanishes due to the gauge identity.
The conclusion is that, owing to the gauge redundancy, the spin-2 field described by (7.2) with mass at the Higuchi bound has only four dynamical degrees of freedom, one less than is the case for an ordinary massive spin-2 mode. The existence of the additional gauge symmetry around de Sitter background is in agreement with the representation theory of the de Sitter group SO(1, 4), which allows for "short" representations of higher-spin fields. These representations contain less degrees of freedom than the massive ones and have been dubbed partially massless (PM). For a discussion of this topic, consult e.g. [224].
An interesting feature of PM fields is that the enhanced symmetry could potentially improve the quantum behaviour of spin-2 theories. For instance, the gauge invariance protects the difference of cosmological constant and spin-2 mass against receiving large quantum corrections. Since the spin-2 mass scale itself is protected by diffeomorphism invariance of the massless theory, this could render a small value for the cosmological constant technically natural. Unfortunately, matter couplings of h µν destroy the gauge symmetry already at the linear level, unless the matter source is conformally invariant. An idea how to avoid this problem is presented in [225]. It is also worth mentioning that the linear PM theory possesses interesting properties similar to the electromagnetic duality in Maxwell's theory [226,227].

The search for a nonlinear PM theory
After the linear PM theory for spin-2 fields on de Sitter background had been discovered, it was soon attempted to construct higher-order interactions for the perturbation h µν that would leave the gauge symmetry intact. At cubic order in h µν around dS backgrounds such a construction turned out to be possible in d = 4 space-time dimensions [228][229][230]. On the other hand, in d > 4, it was found that gauge invariant cubic vertices can only exist if one also includes higher-derivative terms into the theory [228]. The construction of a fully nonlinear action with a PM gauge symmetry remains an open task. Many recent findings point towards the fact that such a theory cannot exist, in particular not as a theory involving nothing but a partially massless spin-2 field [134,[231][232][233][234][235][236][237]. We will summarise these arguments in section 7.4.
It is natural to expect the nonlinear PM theory (if it exists) to be found within the family of ghost-free nonlinear theories for massive spin-2 fields. A first investigation of this possibility was carried out in [238] where the authors aimed to identify a PM theory in nonlinear massive gravity with the reference metric taken to be a fixed de Sitter background. Their construction used Stückelberg fields in a generalised decoupling limit and showed that, for a certain choice of the β n parameters in the potential, a scalar degree of freedom is not propagating. Here we will follow the slightly simpler strategy of [239] which derived the parameters for the PM candidate theory in bimetric theory. 34 As it turns out, this particular bimetric model possesses several rather unexpected properties which could be of interest even if the no-go results against a nonlinear PM theory cannot be evaded. It is also possible that the nonlinear PM theory requires the input of additional degrees of freedom (e.g. higher-spin fields) and the framework of bimetric theory allows us to study merely the remnants of the enhanced symmetry present in the unknown extended setup.
All our considerations here will be in four space-time dimensions. It is possible to generalise some of the results to higher dimensions if the bimetric action is augmented by the Lanczos-Lovelock terms, which we shall come back to in section 8.1.

Identifying PM candidates
Our first observation is that, in bimetric theory, the equation of motion (5.20b) for the massive fluctuation δM µν around proportional backgrounds is of the same form as (7.2) with Λ =Λ g . As a consequence, if the cosmological constant is assumed to be positive and if the Fierz-Pauli mass in (5.21) is on the Higuchi bound, a gauge symmetry of the form (7.3) is present in the linear theory around proportional backgrounds. In bimetric notation, the corresponding infinitesimal gauge transformation of the massive fluctuation reads On the other hand, the massless fluctuation transforms under the PM symmetry at most by a term that resembles a coordinate transformation, ∆(δG µν ) ∼∇ µ ∂ ν ξ(x). This follows from the fact that its equation of motion (5.20a) does not have any extra gauge symmetry besides the usual linearised diffeomorphism invariance. Hence, we can always undo the PM transformation of δG µν by a coordinate transformation. Modulo GCTs, we can therefore demand ∆(δG µν ) = 0.
The question is now under what conditions the above symmetry transformations for the fluctuations around dS background have a chance to be extendable to the nonlinear level. Since the nonlinear theory is formulated in terms of the variables g µν and f µν , we first need to translate the transformations of mass eigenstates at the linear level into transformations of the fluctuations of the metrics. Assuming that we simultaneously perform a GCT to achieve ∆(δG µν ) = 0, this translation can be obtained uniquely using the expressions for the massive and massless fluctuations given in (5.19). The result is The crucial observation now is that, in a theory with a gauge symmetry at the nonlinear level, it must be possible to shift these transformations of the fluctuations δg µν and δf µν to the backgroundsḡ µν andf µν . This follows simply from the fact that the split into background and fluctuations is not unique and one can always take out part of the fluctuation to redefine the background. Therefore we demand that the symmetry transformations (7.8) should also leave the background equations invariant. But since we are dealing with the proportional solutions, on which the background equations are reduced to an Einstein equation forḡ µν along with Λ g = Λ f , c.f. (5.12), we restrict ourselves to constant gauge transformations∆ with ξ(x) = ξ 0 = const., such that ∆(δM µν ) = ξ 0Λg 3Ḡ µν . In this way we ensure that the transformation does not take us away from the proportional backgrounds and avoid unnecessary complication. The restriction to constant gauge transformations is a strong simplification, but as we will see now, demanding the background equation to be invariant under these is constraining enough to identify the PM candidate theory. Namely, it should be evident from (7.8) that shifting the constant transformations to the backgroundsḡ µν andf µν = c 2ḡ µν results in a shift in c 2 , In general this cannot lead to an integrable 35 symmetry of the background equations because c is determined by Λ g = Λ f and can therefore not be shifted. If this is the case, then a nonlinear PM symmetry cannot exist. The only possibility to avoid this immediate no-go statement is to demand that c is not fixed by the background equation. Then, the equation Λ g = Λ f that is automatically satisfied in this case (i.e. without fixing c) can be thought of as the gauge identity evaluated on the proportional backgrounds.
Since the proportional ansatz for the two metrics partly fixes the gauge, only constant scalings are left as residual transformations.

PM candidate theory
In four space-time dimensions, the explicit expressions (5.11) for the cosmological constants in terms of the β n parameters, the ratio of Planck masses α and the proportionality constant c can be used to write the background condition Λ g = Λ f as a polynomial equation for c, (7.10) 35 Integrability of the symmetry transformation means that it can be performed more than once and still leave the equations invariant. This is a natural requirement on any gauge symmetry. Without the integrability condition it is sufficient to set the mass to the Higuchi bound in order to have an invariant background.
This equation clearly fixes c unless all coefficients of different powers of c vanish separately. A proportionality constant c that is undetermined by the background equation therefore requires the following parameter choices in the bimetric interaction potential, We will frequently refer to these values as the PM parameters. Note that our requirement on the β n parameters fixes all but one of them uniquely in terms of the others and α. The one remaining parameter is of course degenerate with the scale m 4 in the interaction potential and sets the scale for the Fierz-Pauli mass and the cosmological constant. Moreover, it is easy to check using the expressions (5.21) and (5.17) form 2 FP andΛ g , that the parameter choice (7.11) automatically puts the mass on the Higuchi bound. Therefore the theory of linear perturbations around the backgrounds at hand exhibits the usual PM gauge symmetry. It is worth emphasising that we did not demand this in any way; it followed from an independent requirement on the background equations. 36 Starting from the proportional backgrounds the finite form of the scaling symmetry is given, for any constant a, by [239], By restricting to constant gauge transformations, clearly, we are not dealing with the full PM symmetry at the nonlinear level. Note, however, that since the set of constant transformations is a subset of the full gauge group, the theory we obtain by requiring invariance under this subset must contain the theory with the full gauge group (if it exists). Moreover, since in four dimensions the restriction to constant gauge transformations is sufficient to uniquely determine the β n parameters, we conclude that the resulting theory is already the unique candidate for having the full PM symmetry.
The scale invariance of the equations of motion for proportional backgrounds is not the only interesting property of the theory specified by the PM parameters. The PM candidate exhibits additional astonishing features that further support the existence of a gauge symmetry at the nonlinear level. For instance, consider again the homogeneous and isotropic solutions which we presented in section 6.3.1. We saw that, after solving the Bianchi constraint, it was possible to arrive at (6.31), an algebraic equation for the 36 On the other hand, note that if we assume that there is a unique PM theory, then requirinḡ m 2 FP = 2 3Λ g is sufficient to determine its parameters. Namely, in this case, the symmetry (5.6) of the interaction potential enforces α 4−n β n = α n β 4−n because otherwise the theory obtained from replacing α 4−n β n → α n β 4−n would also be PM, contradicting the uniqueness requirement. It is easy to see that this constraint on the β n parameters together with the Higuchi bound condition already implies (7.11). ratio of the two scale factors, Y (t)/a(t). Inserting the PM parameters (7.11) into this equation in vacuum, we find that it becomes an identity. Clearly, it is the analogue of the equation Λ g = Λ f for proportional backgrounds. But instead of a constant, the equations evaluated on the cosmological ansatz now leave a time-dependent function undetermined. This shows that the cosmological evolution equations of PM bimetric theory are invariant under a symmetry that is local in time.
Since, in four dimensions, the proportional backgrounds correspond to maximally symmetric spacetimes with ten independent Killing vectors and the homogeneous and isotropic solutions still possess six isometries, one might speculate that the presence of a symmetry on these solutions could somehow be related to the amount of symmetry of the underlying geometry. To obtain more general results, it is therefore important to investigate the structure of our PM candidate theory beyond the proportional and cosmological backgrounds. We shall come back to this point in the next subsection where we show that the equations in the PM theory are Weyl invariant to lowest order in a derivative expansion.
Massive gravity limit. Let us comment briefly on the massive gravity limit. The PM parameters (7.11) in four space-time dimensions provide an example for a theory in which the coefficients in the interaction potential depend on the ratio α = m f /m g that is taken to infinity in the massive gravity limit which we discussed in section 5.4.3. In this case, we have to be more careful when taking the limit because the β n parameters will scale as well. The conditions (7.11) fix the relative scale among them but their absolute scale may still be chosen freely. Suppose that β 2 does not scale when the limit α = m f /m g → ∞ is taken. Then β 0 → 0 and β 4 → ∞, whereas Λ = β 4 m 4 /m 2 f = const. In the massive gravity limit, the equations of motion for the PM theory thus reduce to These are exactly the equations singled out in [233,235] which investigated the possibility of realising nonlinear partial masslessness in massive gravity with fixed reference metric. The methods invoked in those references are completely different from ours and could provide independent support for the existence of a nonlinear PM symmetry. On the other hand, as we already mentioned previously, references [233,235] also provided evidence for the absence of an additional gauge symmetry present in (7.13). These results may extend to the bimetric case but such a generalisation is not obvious due to the singular nature of the massive gravity limit [147].

Connection to conformal gravity
As shown in [240], the PM candidate model is closely related to the well-known theory of conformal gravity whose action is invariant under Weyl transformations of the metric. In the following we will briefly review conformal gravity and then summarise the steps that establish its connection to the PM bimetric model.

Review of conformal gravity
In four space-time dimensions, there exists a particular higher-derivative action with an additional gauge symmetry [241], with dimensionless coefficient σ. This action is invariant under Weyl transformations of the metric, where ξ(x) is a local gauge parameter. The cosmological constant and the Einstein-Hilbert term are not invariant under the transformation (7.15); hence they do not appear in the above conformal gravity action.
The equations of motions that follow from variation of the conformal gravity action (7.14) imply the vanishing of the Bach tensor for g µν [242], Here, we have given the definition of B µν in terms of the Schouten tensor, as well as the Weyl tensor, 37 The conformal gravity action is closely related to the linear theory for a partially massless spin-2 field discussed in section 7.1. In order to see this, one can introduce an auxiliary field ϕ µν and a parameter Λ to write down an equivalent action [243], Linearisation around a constant curvature background and diagonalisation into mass eigenstates shows that, for σ > 0, the theory propagates a healthy massless and a ghostlike massive spin-2 particle. Moreover, the Fierz-Pauli mass of the massive fluctuation has the value corresponding to the Higuchi bound, m 2 FP = 2 3 Λ, which implies the presence of the PM gauge symmetry (7.3) for ϕ µν . The massless fluctuation does not transform under this symmetry. In fact, it is a simple exercise to check that, in addition to the obvious diffeomorphism invariance, the full nonlinear auxiliary action (7.19) is invariant under the following linear gauge transformations [231], Note that the nonlinear field g µν transforms under the conformal part of the PM symmetry because its fluctuation does not correspond to the massless mode but is a linear superposition of mass eigenstates. Its transformation is of course nothing but the linear version of the Weyl transformation (7.15).
Conformal gravity, or its equivalent form (7.19), therefore describes only six propagating degrees of freedom instead of seven that would correspond to a massless and a massive spin-2 field. For more detailed discussions of its spectrum we refer the reader to [244][245][246][247]. Like any other Weyl invariant theory, S CG does not contain any dimensionful couplings and therefore avoids the non-renormalisibility problem of GR. Being a renormalisable field theory, conformal gravity has been suggested as a quantum theory for gravity. It has also been shown to allow for viable cosmological solutions and even fit galaxy rotation curves, providing a possible solution for at least part of the dark matter problem [248].
Unfortunately, all of its features remain irrelevant unless a cure for the ghost problem in conformal gravity is found. Suggestions for altering the theory in order to make it healthy include, for instance, a modification of quantum mechanics [249] and the addition of specific boundary conditions [247], but none of these have been sufficiently convincing for the theory to be accepted as a consistent alternative to GR. In order to avoid the inconsistencies that plague theories with a finite number of higher derivatives, another possible solution is to complete the equations of motion (7.16) of conformal gravity with an infinite number of derivatives for which Ostrodgradski's theorem does not hold. Higher-derivative terms will, of course, break the conformal symmetry because they necessarily enter with a suppressing mass scale. On the other hand, one could imagine that the symmetry transformation needs to be completed with an infinite number of higher-derivative terms as well, in order to be a gauge symmetry of the full theory. In that case, both the equations of motion and the symmetry transformation could be thought of as a perturbative expansion in derivatives. Order by order, the equations would be invariant under the gauge symmetry, starting with the Bach tensor and its Weyl invariance at lowest order. Although this idea sounds promising, without any further input it is difficult to guess the form of the higher-derivative corrections to the Bach equation that could give rise to such a gauge symmetry. It would therefore be helpful to have a guideline telling us how to construct these terms. This is where the PM candidate of bimetric theory comes into play.

Perturbative solution to the g µν equation
We review here the results of [240]. Our aim is to combine the bimetric equations of motion to eliminate one of the metrics and derive an effective equation involving only the other. To this end, we will solve the g µν equation algebraically for the squareroot matrix S µ ν as a perturbation series in curvatures of g µν . From this we deduce a perturbative solution for f µν that can be used to eliminate f µν from its own equation of motion, resulting in a perturbative equation for g µν . We will derive the lowest orders of this equation for general β n parameters and then see that it exhibits remarkable features when we restrict to the PM parameters (7.11) in the subsequent subsection. Of course, we could switch the roles of the metrics and in a similar manner derive an effective equation for f µν .
It will prove convenient to first rewrite the Einstein tensor in terms of the Schouten tensor defined in (7.17) and raise one index with g µν . This gives, 38 (7.21) 38 Mainly for the sake of notational simplicity, we have set the source for g µν to zero. This simplification constitutes no loss of generality because in the final results of our computation it can always be reinstated by making the replacement, Then we make the following perturbative ansatz for the square-root matrix S = g −1 f , with arbitrary complex coefficients a, b i , c i , . . . Allowing the parameters to assume complex values is reasonable as long as the coefficients that will finally appear in the effective equation for g µν remain real.
In the model where only β 0 , β 1 and β 4 are non-vanishing, the solution takes the very simple closed form, For more general parameters, the expansion does not terminate and we can only determine the coefficients in the ansatz order by order in curvatures. As a side-remark: The existence of this exact solution in the β 1 model is rather remarkable. For example, linearising the equations of motion in this model we can use the above relation to fully remove any occurrence of f µν in the linearised equations. In particular, in the massive gravity limit this means that it is possible to obtain equations for a massive spin-2 field propagating on any background without any reference to a second metric [104,105].
In order to simplify the expressions in the following, we introduce a new set of linear combinations of the β n parameters, Note that on proportional backgrounds, f µν = c 2 g µν and s 0 is proportional to the cosmological constant Λ g defined in (5.11) if in that expression one replaces c by a.
The lowest orders in the solution for S are determined to be of the following form, We can then use f µν = g µρ (S 2 ) ρ ν to arrive at, This is the most general covariant solution for f µν obtained from the g µν equation. An immediate consequence of equation (7.26) is that if the solution for g µν has constant curvature, P µν ∝ g µν , then f µν ∝ g µν , i.e. the two metrics are proportional to each other. But, as we already mentioned in section 6.1, there exist solutions for which both metrics have constant curvature while not being proportional to each other. This class of (non-covariant) solutions is therefore not captured by our ansatz for S in (7.22).
Next we use the solution (7.26) for f µν to eliminate it in its own equation of motion. This will lead to a set of effective equations for g µν containing an infinite number of derivatives, due to the presence of the inverse f µν in these equations. Inserting (7.26) into the Einstein tensor for f µν we find, The contributions from the interaction potential to the f µν equation evaluated on (7.26) become, in which the expansion coefficients are given by, where we have defined, Ω = aβ 1 + 3a 2 β 2 + 3a 3 β 3 + a 4 β 4 . (7.30) Note that this would be proportional to Λ f in (5.11) if f µν = a 2 g µν . Combining the kinetic and potential terms, we can write the entire f µν equation of motion as a higherderivative equation for g µν , Ω a 2 α 2 g µν + 1 where we have collected some of the terms with four derivatives into the Bach tensor defined in (7.16). We have thus arrived at an effective equation involving only g µν in which the terms with more than four derivatives can be computed order by order following the same procedure as outlined above. The terms in the first line of (7.31) are the cosmological constant, the Einstein tensor and a correction proportional to the Schouten tensor. The equation hence reduces to GR in the small curvature limit only if (Ω/a 2 α 2 )(1/s 1 µ 2 ) → 0. Note that the first of these brackets directly sets the size of the observed cosmological constant. The phenomenological relevance of this equation for weak gravitational fields has not been studied so far, but doing so would require adding a source in accordance with our remark in footnote 38.
Before coming to the partially massless case, let us make one more remark: Plugging the perturbative solution (7.26) back into the bimetric action results in an effective action of the form, where c Λ , c R and c RR are functions of the bimetric parameters. We recognise the conformal gravity combination at fourth order in derivatives and notice that the above action, at quadratic order in curvatures, represents the generalisation of three-dimensional New Massive Gravity [250] to four dimensions which is also known to propagate seven degrees of freedom [241,251]. However, the action (7.32) is not equivalent to the original bimetric action since we have used the equation for g µν instead of the one for f µν to obtain a solution for f µν . The equations derived from the above action differ from the bimetric equations by the product of a differential operator. Only by restricting to solutions where the zero modes of this operator are absent will the equations give the same solutions. The additional terms that would arise using the correct procedure, i.e. integrating out f µν by its own equations of motion, are nonlocal because a derivative operator needs to be inverted in order to solve the f µν equation for f µν . This is discussed in detail in [240,252]. In the following, we will not work with any effective action but restrict ourselves to the equations of motion where this ambiguity does not arise.

Partially massless case
Let us determine the set of parameter values for which the higher-derivative equation (7.31) obtained from bimetric theory reduces to the Bach equation (7.16) at lowest order in derivatives. This means that we require, In addition to this, satisfying the g µν equation at lowest order in curvatures requires that the combination s 0 defined in (7.24) vanishes. Using a 2 = −α −2 in the expressions for Ω and s 0 , it is easy to see that these requirements combine into two complex equations, which need to be solved for the β n parameters. The real and imaginary parts must vanish separately and the unique solution is, Remarkably, these values precisely corresponds to the PM parameter choice in (7.11). We conclude that the PM candidate is the unique member of the family of bimetric theories with the feature that to lowest order in a derivative expansion the effective equation for g µν is identical to the Weyl invariant equation of conformal gravity, Moreover, it is straightforward to show that the method that we used here to arrive at an effective equation for g µν can yield a similar result for f µν : The effective equation for f µν , obtained from solving the f µν equation for g µν and plugging the solution into the g µν equation, also sets the Bach tensor for f µν to zero at lowest order in derivatives. This can directly be deduced from the symmetry property (5.7) of the equations of motion and the invariance of the PM parameter choice under α 4−n β n → α n β 4−n .
In this way, the nonlinear PM bimetric theory proposes a ghost-free completion of conformal gravity. The presence of a gauge symmetry to lowest order in a derivative expansion can be regarded as further support of the existence of an additional gauge symmetry in the theory. In particular, note that the above analysis is not based on de Sitter (nor FLRW) background.
If existent, the PM symmetry could be viewed as the generalisation of the gauge invariance of conformal gravity to higher orders in derivatives. In principle, it is possible to explicitly demonstrate the invariance of the equations under symmetry transformations that are extended to higher orders in derivatives. Using the perturbative solution to the g µν equation together with the analogous expression obtained from the f µν equation, the gauge invariance has been shown to exist up to sixth order in derivatives [253].

Arguments against a nonlinear PM theory
As mentioned before, the literature contains a large variety of no-go theorems that forbid the existence of a nonlinear theory for partially massless spin-2 fields. Most of these counter arguments refer to an action involving no other degrees of freedom besides the PM field but some of them may eventually also apply to bimetric theory. Let us summarise them briefly: • In the massive gravity limit of the PM bimetric model, first obstructions were already encountered by the authors of [235] and it was shown in [232,233] that the equations of motion do not satisfy the nonlinear Bianchi identity which is expected in a PM theory. More explicitly, the authors showed that the nonlinear covariant constraint which removes the Boulware-Deser ghost does not identically vanish. This result was argued to extend to bimetric theory [234] where, however, the situation is less clear since the nonlinear version of the constraint is not known. In fact, a covariant constraint does not even seem to exist for general backgrounds [108].
• The linear spectrum of conformal gravity always contains a ghost [231]. Although the full nonlinear bimetric theory is ghost-free, it is possible that the appearance of the Weyl invariant Bach tensor (7.36) in our perturbative approach is intimately related to the fact that we are expanding in small curvatures, i.e. around flat solutions which suffer from the same pathology as conformal gravity. Flat backgrounds in the PM theory require choosing c 2 = −1, which renders the kinetic term of the massive perturbation ghost-like [253]. Hence, the Weyl invariance at lowest order in the equations of motion (7.36) seems to be obtainable only at the cost of giving up unitarity.
• The authors of [134] argued that, in order to give rise to a nonlinear PM symmetry, the scalar mode of the massive field needs to disappear entirely from a "decoupling limit" of bimetric theory. It was then demonstrated that, even though the pure scalar interactions are indeed absent, the mode reappears in interactions with the vectors. 39 The fact that the vectors vanish on maximally symmetric backgrounds explains why one sees a PM gauge symmetry to linear order around dS space but not beyond. 39 From our point of view, it is not obvious if this analysis is actually able to rule out the gauge symmetry because, firstly, the backgrounds that [134] assumed for the metrics do not solve the equations of motion and, secondly, the constraint that removes the ghost is not imposed. As a consequence of the latter, the vector-scalar interactions involve higher-derivative terms of the scalar mode that are known to disappear after accounting for the constraint.
• Imposing a closure condition on the PM transformations, the authors of [237] showed that no nonlinear Lagrangian involving at most two derivatives on the fields can realise the required symmetry for a single spin-2 field in a gravitational background. The setup is more general than ghost-free massive gravity but it still does not include the bimetric case.
• More group theoretical evidence against a nonlinear PM symmetry for spin-2 fields coupled to gravity was provided in [236] where it was shown that no unitary theory exists and the unique non-unitary example is conformal gravity. However, for technical reasons which are discussed in [253], these arguments do not directly carry over to bimetric theory.
• In [254] the authors provide arguments against the existence of a partially massless theory with a non-abelian Yang-Mills like extension. Again, these arguments cannot directly be applied to the bimetric setup.
If the above results eventually turn out to also be extendable to bimetric theory, this would imply that the PM symmetry can indeed not be realised as a nonlinear theory involving only spin-2 fields. But even in this case, the bimetric candidate model seems to possess interesting properties that could pave the way towards understanding partial masslessness from a background-independent point of view. A promising future direction could be the combination of bimetric theory with higher-spin degrees of freedom. Finally, let us point out that finding an additional scalar symmetry is not merely an interesting exercise but could indeed prove to be very useful from several perspectives. First of all, it would guarantee that the helicity-zero mode is absent even nonlinearly. This will have an enormous effect on the phenomenology since the scalar mode is usually responsible for various instability issues and causes the most tension with observations. Secondly it would provide an argument for why a small cosmological constant is technically natural since its value is tied to the mass of the spin-2 field via a symmetry, whilst a small mass is itself technically natural since its vanishing restores full diffeomorphism invariance. Thirdly, the absence of the scalar mode may lead to improved quantum behaviour of the theory. Indeed, since the symmetry if it exists is given by Weyl scalings to lowest order in a derivative expansion, the theory may have much better renormalisation properties than GR.

Extensions of Bimetric Theory
Extensions of massive gravity have been proposed mainly by introducing new scalar degrees of freedom and in the context of cosmology. For instance, a scenario in which the spin-2 mass is obtained through a scalar condensate has been suggested in [127] and a quasi-dilaton extension of the massive gravity action was constructed in [255,256]. Similarly f (R) extensions have been studied in e.g. [138][139][140][141][142] and other scalar fields with non-minimal coupling terms have also been added to the bimetric action, as in [257].
In the following we shall focus on more direct generalisations which preserves the structure of the bimetric theory itself without adding additional degrees of freedom, except the most natural generalisation to include interactions of multiple massive spin-2 fields. 40 First, we discuss its generalisation to higher-dimensions, where new kinetic terms satisfy the requirement of classical consistency. Then we shall review interactions of multiple spin-2 fields and, in the same context, also present the vierbein formulation of bimetric theory.

Higher derivative Lanczos-Lovelock extension
According to the constructive consistency proof in section 3.4, the interaction potential of the Hassan-Rosen bimetric theory includes all ghost-free non-derivative terms. Therefore, the only option to obtain an extended version of the theory is to add more derivative terms to the action. The literature contains several no-go theorems on generalising the kinetic structure of the spin-2 fields in four dimensions to anything beyond the Einstein-Hilbert term [135,136,259]. 41 On the other hand, in dimensions greater than four, the Lanczos-Lovelock (LL) invariants, which either vanish or are topological in d = 4, are expected not to reintroduce the Boulware-Deser ghost [263][264][265][266][267]. These are defined as totally antisymmetric contractions of Riemann tensors R αβ µν , The definition of the antisymmetric product and relations to analogue expressions in terms of Levi-Civita symbols is given in appendix A. In spite of being higher-derivative operators, the LL invariants are believed to avoid inconsistencies that are usually introduced by such terms. This is due to the antisymmetric structure in (8.1) which ensures 40 Interactions between multiple massless spin-2 fields are forbidden on quite general grounds [258]. 41 See, however, [260][261][262]. Moreover, note that the structure of the kinetic terms is of course only determined up to field redefinitions. the absence of more than two time derivatives acting on one field in the action. It is interesting that the same structure appears in the interaction potential of ghost-free bimetric theory, but of course this is not a complete coincidence: In the decoupling limit of nonlinear massive gravity it can be seen that the antisymmetric structure removes higher time derivatives from the longitudinal modes of the Stückelberg fields [57,58].

Extended action and equations of motion
In order to extend ghost-free bimetric theory to d dimensions, we first write the Hassan-Rosen bimetric action in the more general form, where the elementary symmetric polynomials are still defined through the same recursion formula (A.2). The consistency proof of [76], as outlined in sections 3.4 and 5.2, generalises straightforwardly to the higher-dimensional case. It has also been shown that the mass spectrum is very similar to the four-dimensional case and contains a massless and massive excitation [143].
Next, guided by the above considerations, we extend the Hassan-Rosen action by the following terms, as done in [143,268], where we have introduced two sets of couplings, l g n and l f n , of mass dimension 2(1 − n). It is well known that the LL invariant L (n) is topological in d = 2n and vanishes for 2n > d which is why the sums in (8.3) terminate at integer part of d/2.
The equations of motion for g µν and f µν that follow from the LL extended bimetric action are Here, G µν are the Lovelock tensors that follow from variation of (8.3) and V g µν and V f µν are the contributions from the interaction potential which were defined in (5.3) and in which the summations now run from 0 to d. We will not need the explicit expressions for the Lovelock tensors, but let us note the important property, which will be useful later on.

Proportional backgrounds
As in pure bimetric theory in d = 4 we are interested in finding the proportional background solutions to (8.4) and study perturbations around those. Again, these backgrounds correspond to maximally symmetric spacetimes, for which the curvatures satisfy with cosmological constant λ. On such backgrounds with constant curvature, the n-th order LL invariant is proportional to λ n , Next, we trace the equations of motion (8.4) in order to obtain two scalar equations and insert (8.6) for the curvatures. Furthermore, we make use of the identity (8.5) and plug in the proportional ansatz,f µν = c 2ḡ µν . Finally, in the equation for f µν , we use L (n) (c 2ḡ ) = c −2n L (n) (ḡ). The resulting two equations read Here, Λ g and Λ f are the contributions from the potential evaluated on the proportional backgrounds which were defined earlier in (5.11). Note, however, that these no longer correspond to the physical cosmological constants and are not necessarily equal.
For general parameters β n , l g n , and l f n , the two algebraic equations in (8.8) serve to determine the background curvature λ as well as the proportionality constant c 2 . Note also that in the absence of the LL contributions, i.e. for l g n = l f n = 0, we re-arrive at the bimetric background equations, R(ḡ) = 2d d−2 Λ g and Λ g = Λ f .

Mass spectrum
Computing the linearised equations of motion around general background solutions to (8.4) is quite involved, even when the potential contributions vanish. However, as has been observed in [267], around constant curvature backgrounds with cosmological constant λ, the fluctuation equations of GR augmented by the LL terms with couplings l g n assume the same form as the ones obtained from Einstein's equations. The only difference is that, in the linearised equations of the LL extended theory, the Planck mass m g needs to be replaced by an effective mass parameterm g defined through It is a remarkable feature of the Lovelock invariants that, in spite of being higherderivative operators in the nonlinear theory, they give rise to linearised equations that are only second order in derivatives.
For the bimetric case the above result implies that the linear equations for the theory with LL terms are the same as in pure bimetric theory if one replaces m g bym g as above and m f bym f given bỹ (8.10) Hence, using the results of [267], without any further computation we can conclude that the linear equations in a maximally symmetric background with cosmological constant λ will again diagonalise into a massless and massive equation, where in the latter the Fierz-Pauli mass (5.21) is now given in terms ofm g andm f .

Partial masslessness in d > 4
To conclude the discussion of the Lanczos-Lovelock extension of bimetric theory, let us comment on the possibility to extend the linear symmetry of partially massless (PM) fields to the nonlinear level within this generalised setup. Since the proportional backgrounds and their perturbations possess the same structure as in pure bimetric theory in d = 4, the linear equations are again invariant under a PM symmetry if the mass is on the Higuchi bound which in d dimensions corresponds to the value m 2 FP = 2 d−1 λ. The arguments given in section 7.2.1 leading to a PM candidate theory can straightforwardly be applied to the higher-dimensional case. The criterion of leaving the proportionality constant c undetermined by the background equations (8.8) singles out unique PM candidate theories which have been identified in [143] up to d = 8. All the interaction parameters are fixed with respect to each other in these models and, in particular, the Lanczos-Lovelock coefficients l g n and l f n are nonzero. This result confirms earlier findings [228] which revealed that the linear PM symmetry cannot be extended in d > 4 unless higher-derivative terms are included into the action.

Multiple interacting spin-2 fields
Ghost-free bimetric theory contains the correct number of degrees of freedom for a massless and a massive spin-2 mode. While no-go theorems forbid consistent interactions among more than one massless spin-2 field [258], it is possible to extend bimetric theory by additional massive spin-2 degrees of freedom. Hinterbichler and Rosen first wrote down these ghost-free interactions using vierbeins instead of metrics [269]. 42 In the following we will derive the consistent multi-spin-2 interactions in the metric formulation and then review their formulation in terms of vierbeins. More work on the vierbein formulation of massive gravity, bimetric and multimetric theory can be found in [68,101,102,136,[270][271][272][273][274][275][276][277][278][279][280][281][282].

Multiple bimetric couplings
When constructing a theory for more than two spin-2 fields, of course, the main requirement on its structure remains the absence of the Boulware-Deser ghosts. The interaction potential for N fields therefore has to be chosen such that the action in Hamiltonian formulation gives rise to (4 + N − 1) constraints. Out of these, four 42 The formulation of the full ghost-free bimetric potential in terms of vierbeins first appeared in [153].
should be first class constraints, generating the group of general coordinate transformations, while the remaining (N − 1) are expected to be second class and serve to remove all Boulware-Deser ghosts from the spectrum.
An obvious way to satisfy these requirements is to simply add up several copies of the Hassan-Rosen action (5.1) for fields f µν and g µν (I) with I = 1, . . . , N − 1, to obtain a theory in which one of the metrics is coupled to all the others, This will be consistent because in every interaction potential one can redefine the ADM variables for g µν (I) to make it linear in the lapse and shift of f µν as well as the lapse of g µν (I), such that one obtains the desired (4 + N − 1) constraints.
The left panel of Figure 1 shows how such a coupling can be visualised in a graph [269]. Each solid dot represents a different spin-2 field and the lines stand for standard Hassan-Rosen bimetric couplings between them. Fields corresponding to dots that are not connected by a line do not directly interact. Due to the corresponding picture with one metric in the centre, we shall refer to interactions of the form (8.11) as "centre couplings".
Another option to obtain consistent multiple bimetric interactions is to build a "chain" of N coupled spin-2 fields g µν (I) where I = 1, . . . , N . That is to say, we take an action of the form, S HR g(I), g(I + 1) , (8.12) for which the corresponding graph is depicted in the right panel of Figure 1. This construction works because in the first term one can redefine the ADM variables for g (1) such that it becomes linear in the lapse and shift of g(2) as well as the lapse of g(1). In the second term one performs a similar redefinition for the shift of g(2) which, since the redefinition is linear in the lapses, does not destroy the linearity in the lapse of g(2) in the first term. Note however that the second term will no longer be linear in the shift of g (2). Continuing with this procedure in all the terms results in an action that is linear in the lapse and shift of g(N ) and in all the other lapses.
In fact, it is also possible to combine the two above constructions in the following manner: First introduce a chain coupling for fields f (K), K = 1, . . . , K, and then to each f (K) attach I K fields g (K) (I K ), I K = 1, . . . I K , in a centre coupling. This gives rise to the most general ghost-free multimetric action, In this theory the total number of fields is N = K + K K=1 I k . An example for a graph is shown in the left panel of Figure 2. Note that the action S multi still consists purely of multiple copies of the Hassan-Rosen bimetric action; we have not introduced any type of new interaction term. Of course, in the above construction, it is also possible to add further chain or centre couplings to each of the g (K) (I K ). However, it is important to realise that not all possible bimetric couplings are allowed. In particular, one may not couple the metrics in a "loop", which means adding terms such as S HR f (K), f (K +2) or S HR f (K + 1), g (K) (I K ) to (8.13). To see why this fails, consider a simple loop coupling among three metrics, whose graph is depicted in the right panel of Figure 2. Now imagine that in the first term one redefines the shift of g to render it linear in the lapse and shift of f and the lapse of g. Then, in the second term redefine the shift of f such that the first two terms afterwards are linear in the lapse and shift of h and the lapses of g and f . In the third term, one would now have to redefine the shift of h, but from the particular form (3.19) of the redefinition it is clear that afterwards, the action will again contain the original shift variable for g which cannot be expressed in terms of the redefined variables only. Therefore there is at least no obvious consistent redefinition of ADM variables that renders (8.14) linear in all the lapses and we conclude that the loop couplings most likely give rise to Boulware-Deser ghosts. This is in agreement with results of a similar analysis performed in [283].

Vierbein formulation
The multimetric action (8.13) consists of several copies of the Hassan-Rosen action (5.1). It is possible to rewrite the latter in terms of different variables which avoid the appearance of the square-root matrix. 43 To this end, let us introduce the tetrads or vierbeins defined for each metric, In terms of the respective vierbeins, the square-root matrix g(1) −1 g(2) built out of two metrics g(1) and g(2) becomes, (2) . (8.16) It is now convenient to fix the Lorentz gauge for one of the vierbeins such that E(2)E(1) −1 η −1 is a symmetric matrix. We thus impose the so-called "symmetric" gauge (or Deser-van Nieuwenhuizen gauge [285]), In this particular Lorentz frame, we can easily evaluate the square-root as, 43 For suggestions to remove the square-root matrix by introducing auxiliary fields, see [64,284].
Using the identity, we can then write the bimetric action in the form where S EH (I) are the standard Einstein-Hilbert actions for E(I) and where b n = 2βn n!(4−n)! are the same parameters as in (3.33). Since we have only fixed one of the two Lorentz gauges, this interaction potential still has one overall local Lorentz invariance under which both vierbeins transform in the same way.
It is important to note that the vierbeins in (8.21) must satisfy the symmetrisation condition (8.17), otherwise the vierbein formulation is not equivalent to bimetric theory for general values of b n . More precisely, if we started from (8.21) without imposing the condition by hand, then the dynamics of the vierbein theory would allow for configurations that do not give back the ghost-free bimetric formulation. As discussed in [269], for each vielbein there exist six combinations of its equations of motion that do not contain any derivatives. These read, For models with b 2 = b 3 = 0 or b 1 = b 2 = 0, these equations imply the symmetrisation condition (8.17) and the equivalence to bimetric theory is dynamically guaranteed. For more general parameters, the symmetrised vierbeins still solve the equations but other solutions also exist [286,287]. These disconnected branches give rise to different theories which contain the Boulware-Deser ghost instability [288].
Another important aspect, first pointed out in [286], is that one cannot always arrive at the symmetrised form (8.17) by acting with a Lorentz transformation on general vierbeins. For configurations that do not allow this, the square-root cannot be evaluated as in (8.18). In fact, the condition on the combination of vierbeins to be symmetrisable by a Lorentz transformation is exactly the same as the one we derived in section 3.3.3 on the metric components by requiring that the square-root exists [99]: Ensuring that the Lorentz boost velocity v a satisfies v a δ ab v b < 1 is equivalent to requiring the existence of a real solution to the scalar square-root √ x that appears in the ADM decomposition of g −1 f in (3.27). This condition in turn ensures that one can always impose the "symmetric" gauge by a Lorentz transformation.

More general interactions?
So far we have used the earlier results of the bimetric ADM analysis in order to couple several spin-2 fields to each other using only bimetric interactions. The next question is whether there exist more general couplings than the bimetric vertex. We introduce graphs for two examples of such n-point vertices in Figure 3. A priori, such couplings could be very different from the bimetric ones and the ADM analysis has to be redone from the start.
In [269] it was proposed that the interactions of vierbeins could be generalised to, with U I 1 ...I 4 = µ 1 µ 2 µ 3 µ 4 a 1 a 2 a 3 a 4 E a 1 µ 1 (I 1 )E a 2 µ 2 (I 2 )E a 3 µ 3 (I 3 )E a 4 µ 4 (I 4 ) , (8.23) where T I 1 I 2 I 3 I 4 are coupling constants symmetric in all the indices I n ∈ {I} which run from 1 to N . The total multivielbein action would then be given by Note that the antisymmetric structure of (8.23) ensures that there cannot be more than four vierbeins (or in d dimensions, not more than d vielbeins) interacting in one vertex.
However, the results obtained in [288] severely constrain the structure of the "tensor" of coupling constants T I 1 ...I 4 and only diagrams of the type displayed in Figure 1 are allowed. The more general couplings displayed in Figure 3 which would arise from an unconstrained T I 1 ...I 4 and give rise to diagrams with at most 4 lines ending in one vertex are excluded. The same holds for the loop couplings in the right panel of Figure 2. Although these forbidden terms have been shown to possess a ghost-free decoupling limit, they do spoil the consistency of the full theory [136,288] and the reason can be traced back to the fact that the corresponding equations of motion are incompatible with the symmetrisation condition (8.17). In summary, the only consistent vierbein couplings are those that give rise to a metric formulation of the type (8.13).

Discussion and Outlook
In this article we summarised the recent developments of nonlinear theories involving massive spin-2 fields. It was shown how to construct the unique ghost-free interaction potential of nonlinear massive gravity with fixed reference metric, which we afterwards generalised to the fully dynamical bimetric theory. The latter describes the interactions of a massless spin-2 field with a massive one and, when coupled to matter, captures the behaviour of gravity in the presence of an additional tensor field. We have seen that there exists a limit in the parameter space of bimetric theory which takes the theory arbitrarily close to general relativity. In this limit, viable cosmological solutions can be found where, in the absence of vacuum energy, the dark energy scale is set by the spin-2 mass which is protected against receiving large quantum corrections. We have discussed the idea of nonlinear theories for partially massless spin-2 fields and in the process established a connection between bimetric theory and conformal gravity. Finally, we have reviewed the extension of bimetric theory to dimensions greater than four and to couplings between more than two spin-2 fields.
On the phenomenological side, one of the most interesting open questions is what possible implications the existence of a massive spin-2 field in nature could have on the dark matter problem. Several scenarios where dark matter components are attributed to the matter sector of the second metric f µν have already been considered in [149][150][151][152]. From our point of view, however, the more interesting option is to regard the second metric as the dark matter field. Since the spin of the dark matter particle is unknown, such a scenario is not immediately excluded and, as we already discussed in section 5.4.2, the largeness of the physical Planck mass automatically weakens the interactions of the massive spin-2 field with all Standard Model particles.
On the theoretical side, it would be of great importance to understand the origin of the mass term for the spin-2 field at a fundamental level. For spin-1 fields, it is wellknown that the consistent picture at the quantum level is to generate their mass via spontaneous breaking of the gauge symmetry in the massless theory. To this end, it is necessary to invoke additional (Higgs) fields whose vacuum expectation values break gauge invariance and set the spin-1 mass scale. Since aspects of field theories generically do not become simpler with increasing spin, it is natural to expect an underlying mechanism that generates the mass term also for spin-2 fields. Such a "spin-2 Higgs mechanism" should break the two independent copies of diffeomorphism invariance in the g µν and f µν sectors down to their diagonal subgroup, which is the residual gauge symmetry of bimetric theory. The precise nature of the fields that are needed to trigger this symmetry breaking spontaneously is still unknown; a realisation on anti-de-Sitter space has been suggested in [289,290]. For a more detailed discussion of the topic, we refer the reader to the recent article [291]. Related to the origin of the mass term is the behaviour of bimetric theory at the quantum level. Quantum corrections at one-loop order in the effective field theory picture have been computed in massive gravity [214] and bimetric theory [170] and the results confirmed the stability of the spin-2 mass scale. On the other hand, without knowing the underlying mechanism that generates the interaction potential for the two metrics, it is difficult to tell to what extend these results can be trusted.
Theories with extended symmetries are known to generically exhibit an improved quantum behaviour. An example for the spin-2 case is the action for conformal gravity with its Weyl symmetry. Unfortunately, this theory suffers from a ghost but we have seen that bimetric theory contains a ghost-free model which seems to be closely related to the Weyl invariant action. Understanding its properties further and investigating possibilities to realise the gauge symmetry for partially massless at the nonlinear level (most likely by invoking additional degrees of freedom) could give us important new insights on the nature of quantum gravity.
Since consistent theories involving spin-2 fields are so rare, we do not expect to encounter a large variety of possibilities to extend them further. As it stands now, it does not seem to be possible to extend the kinetic sector in four dimensions [135,136,259] and the interaction potential is the only consistent one by construction. Up to field redefinitions, the structure of the ghost-free bimetric action is therefore unique. On the other hand, it is of course worth studying couplings to fields with different and, in particular, higher spins. The spectrum of presently known higher-spin theories contains only massless spin-2 fields (see, for instance, [292][293][294]). It would be interesting to see whether bimetric theory can give hints on the form of more general interactions including massive spin-2 degrees of freedom. At this stage, we can only hope that one day we will be able to write down the most general interactions for massless and massive spin-2 fields.

A Technical details of interaction potential
In the text, for notational simplicity, we frequently write the interactions in terms of elementary symmetric polynomials e n (S). These can be defined in various equivalent ways, for example via the following commonly used equalities which hold for any square d × d matrix S, e n (S) = S µ 1 [µ 1 · · · S µn µn] = 1 n! δ µ 1 ···µn ν 1 ···νn S ν 1 µ 1 · · · S νn µn = 1 n!(d − n)! µ 1 µ 2 ...µnλ n+1 ...λ d ν 1 ν 2 ...νnλ n+1 ...λ d S ν 1 µ 1 · · · S νn µn . (A.1) Here we have expressed them in terms of antisymmetrisation, the generalised Kronecker delta and the Levi-Civita symbol respectively, all normalised by unit weight. Obviously the last equality only makes sense when d is the dimension spanned by the indices. They can also be defined through a dimension-independent recursion formula, where square brackets denote a matrix trace. Moreover, the e n obey the following useful identity for any matrix S and parameter λ, This means that the consistent interaction potential can be regarded as a deformed determinant [46]. In d dimensions, they satisfy Y (n) = 0 for n ≥ d. In particular, Y (d) = 0 is simply the statement of the Cayley-Hamilton theorem, that any square matrix satisfies its own characteristic (or secular) equation.
Finally we note that all of the above expressions can easily be rewritten in terms of the inverse S −1 by using the identity,

B Derivation of redefined shift vector
Here we derive the expressions for the redefinition (3.19) of the ADM shift vector N i = γ ij N j and the matrices A and B in (3.27), following [61]. We start from the most general ansatz for the redefinition, where c i and d i are functions of γ ij , the new shift vector n i and the ADM components of f µν . They will be determined in what follows. Recall that the redefinition has to be linear in the lapse N in order not to introduce nonlinearities into the Einstein-Hilbert term (3.4). Next we turn to equation (3.18) and take the square of both sides to arrive at, 44 See e.g. [105] for an explicit derivation.
The left-hand side of this equation can be expressed in terms of ADM variables using (3.3) and (3.12). We get, .

(B.3)
In this expression we replace N i in terms of n i using our ansatz (B.1). Next, we collect the pieces according to their order in 1/N , which gives, where, in terms of a 0 ≡ L 2 − L l φ lk L k + c l L l and a i ≡ −L i + c l φ li , we have defined the matrices, Comparing this to the right-hand side of (B.2) and equating the coefficients in front of equal powers of 1/N , we see that, Since E 0 is a projector on a one-dimensional subspace, its square-root can easily be evaluated, resulting in, The special structure of E 2 also allows us to evaluate its square-root which gives, Using these expressions for A and B in the third equation of (B.6), we obtain, where we have also made use of the symmetry property φ ik D k j = φ jk D k i which follows directly from the definition of D in (B.8). This equation determines a particular combination of the unknown functions d i and c i in the redefinition. The redefinition is thus not uniquely determined. One simple option to fix the ambiguity is to choose the new shift vectors as n k = 1 L (c k − φ kl L l ). Then the above condition reduces to d i = D i k n k and the redefinition becomes, where L i ≡ φ ij L j . The definition of the matrix D in (B.8) depends on d i and therefore gives rise to the following matrix equation that needs to be solved for D, √ x D = (γ −1 − Dn(Dn) T )φ , x ≡ 1 − n l φ lk n k . (B.11) As shown in [61], the solution to this equation is given by (3.20).

C Short summary of standard GR cosmology
Here we very briefly recapitulate the derivation of the cosmological evolution equations in GR. Consider Einstein's equations of motions for g µν , where T µν is the stress-energy tensor obtained from variation of the matter Lagrangian and we have defined the constant energy density ρ Λ ≡ ΛM 2 Pl . When looking for homogeneous and isotropic solutions in General Relativity, one makes a Friedmann-Robertson-Walker ansatz for the metric, g µν dx µ dx ν = −dt 2 + a(t) 2 1 1 − kr 2 dr 2 + r 2 dΩ , (C. 2) in which k = 0, −1, +1 parameterises the curvature of the universe (flat, open, closed), a(t) is the scale factor, and dΩ = dθ 2 + sin 2 θ dϕ 2 . In accordance with homogeneity and isotropy, the stress-energy tensor is taken to be that of a perfect fluid, (T g ) µ ν = diag(−ρ, p, p, p) , where ρ(t) and p(t) are the time dependent energy density and pressure of the fluid, respectively. As a consequence of the Bianchi identity, the source is automatically covariantly conserved which implies the continuity equation, Moreover, from the acceleration equation (C.6) we infer that, in the absence of a bare cosmological constant, the expansion of the universe accelerates for w < −1/3 and decelerates for w > −1/3. Three cases are of particular interest: w = 0 characterises non-relativistic matter, w = 1/3 corresponds to relativistic matter ("radiation"), and w = −1 describes a cosmological constant. The energy density for non-relativistic matter therefore scales as a −3 , i.e. inversely proportional to the volume. Non-relativistic matter experiences an additional redshift and its energy density goes as a −4 . None of these can explain an accelerated expansion of the universe. In contrast, a cosmological constant which of course has constant energy density leads to an exponentially accelerated expansion.
It is common to measure the contributions of different fluid components in terms of their density parameters, where i stands for either radiation, non-relativistic matter or the cosmological constant and H 0 denotes the Hubble parameter at the present time. One also defines a curvature contribution, Ω k = − k a 2 H 2 0 , and in terms of these the Friedmann equation becomes Ω rad + Ω mat + Ω Λ + Ω k = H 2 H 2 0 .
(C. 12) In particular, at the present time the density parameters add up to one. Latest observational data [31] suggest the following (approximate) values for the cosmological parameters at the present time, These values imply that our universe is flat and dominated by a cosmological constant component. As already discussed above, the energy density corresponding to this cosmological constant is extremely small compared to energy scales of the Standard Model of Particle Physics. Since so far we lack an explanation for this small value, the curious energy component ρ Λ is often referred to as dark energy. On top of that, observations also show that the matter component Ω m is not dominated by familiar baryonic matter, but rather mainly consists of an unknown dark matter component.
Consequently, under the assumption that GR is indeed the theory of gravity, we must accept that 95% of the universe's energy content is not at all understood. Nevertheless, when this obscurity is ignored, the so-called ΛCDM model (GR with a cosmological constant and cold dark matter) fits the observational data very well.