Dark Energy in light of Multi-Messenger Gravitational-Wave astronomy

Gravitational waves (GWs) provide a new tool to probe the nature of dark energy (DE) and the fundamental properties of gravity. We review the different ways in which GWs can be used to test gravity and models for late-time cosmic acceleration. Lagrangian-based gravitational theories beyond general relativity (GR) are classified into those breaking fundamental assumptions, containing additional fields and massive graviton(s). In addition to Lagrangian based theories we present the effective theory of DE and the $\mu$-$\Sigma$ parametrization as general descriptions of cosmological gravity. Multi-messenger GW detections can be used to measure the cosmological expansion (standard sirens), providing an independent test of the DE equation of state and measuring the Hubble parameter. Several key tests of gravity involve the cosmological propagation of GWs, including anomalous GW speed, massive graviton excitations, Lorentz violating dispersion relation, modified GW luminosity distance and additional polarizations, which may also induce GW oscillations. We summarize present constraints and their impact on DE models, including those arising from the binary neutron star merger GW170817. Upgrades of LIGO-Virgo detectors to design sensitivity and the next generation facilities such as LISA or Einstein Telescope will significantly improve these constraints in the next two decades.

The Standard Model of Cosmology (or ΛCDM) stands as a robust description of our universe. It is based on the theory of General Relativity (GR), which dictates the long-range gravitational interactions, together with the Cosmological Principle, which describes the geometry as homogeneous and isotropic on large scales. Standard matter ( other major component is dark matter (DM), an undetected constituent that seeds cosmic structures. The last piece of the SM of Cosmology are the initial conditions, which are thought to be set by an early period of quasiexponential expansion known as inflation. Despite the observational success of this model [1], it remains as a puzzle the fundamental origin of each piece, which could be associated to new physics (see Fig. 1 for a summary of the different ingredients).
In the SM of Cosmology, the current accelerated expansion is explained by a constant energy density acting as a perfect fluid with negative pressure. Such a cosmological constant (CC) term is perfectly consistent with present observations but notoriously disagrees with theoretical expectations for the vacuum energy [2,3]. If this energy density is let to evolve in time, one naturally arrives to a dynamical description of DE sourced by a cosmological scalar field [4]. If this field is now allowed to interact (non-minimally) with gravity, the possibilities to describe the cosmic expansion escalate [5]. Alternatives to ΛCDM offer the possibility to alleviate some of its tensions. For instance, DE models with an effective equation of state more negative than the cosmological constant could ease the tension between the local measurement of the Hubble constant and the inferred value from the cosmic microwave background (CMB). Exploring the largest scales with galaxy surveys like Euclid or LSST will help us understanding the expansion history of the universe and will provide new insights about gravity. Gravity can be tested at different scales and regimes. Classical tests of gravity range from laboratory experiments to Solar System distances, and cover gravity in its weak field regime [6]. Astrophysical observations provide new avenues to improve these tests [7]. Pulsars in particular can be specially constraining, for instance with the recent observations of a triple stellar system [8]. Tests in a much stronger regime has been performed tracking stellar orbits around the galactic center [9]. Alto-gether, these observations severely constrain modifications of GR. Theories beyond Einstein's theory should thus resemble GR at small scales, e.g. hiding fifth forces with screening mechanisms [10,11]. At large scales, however, present constraints are considerably weaker. Combining different probes could be crucial to set an observational program to test gravity from cosmology [12].
GWs could be critical in resolving the open problems of the SM of Cosmology. For instance, (non) observations of cosmological backgrounds of primordial GWs test inflation. BBHs events teach us about the population of BHs, which constrains their possible contribution to DM and their possible primordial origin [22]. Moreover, if DM is described by ultra-light bosons or axions, it could resonate with pulsars [23] or form clouds around BHs observable with GWs [24]. Finally, BNS with an associated counterpart such as GW170817 [25,26] become standard sirens [27] and allow to probe DE. In this review we will focus on this last case, exploring the possibilities of multimessenger GW astronomy to probe the nature of DE and the fundamental properties of gravity (see a schematic timeline of present and future facilities in Fig. 2).

A. Summary for the busy reader
Dark energy is the major component of the universe and yet its nature escapes our present understanding. Beyond the cosmological constant paradigm, a plethora of alternative theories of gravity has been proposed to explain the current cosmic acceleration (see Fig. 3 for FIG. 2: Schematic multi-messenger GW astronomy timeline. The binary neutron star (BNS) rate, the localization area in the sky, and the number of BNS detections are given for past and future observation runs. Second generation (2G) ground-based detection runs organize in five runs (O1-O5) with different number of detectors online (from 2 to 5) [28]. The nomenclature used is H=Hanford, L=Livingston, V=Virgo, K=KAGRA, I=IndIGO. Third generation (3G) detectors projected are Einstein Telescope (ET) [29] or Cosmic Explorer (CE) [30]. The localization in 3G depends on the network of detectors which is still uncertain [31]. For reference, we include the timeline space-based detector LISA [32]. The reader should note that this numbers correspond to present expectations. For more details we refer to Sec. III D.
a roadmap of possible modifications of gravity). We present an overview of the landscape of theories in Sec. II, as well as a summary of the different approaches to cosmological gravity (see Fig. 4 for a schematic diagram). Gravitational wave astronomy opens new possibilities to probe gravity and DE. For readers unfamiliar with the basics of GWs, we provide a short introduction in Sec. III. For the purpose of cosmology, the most promising GW events are those that can be observed by other messengers (either EM waves or neutrinos). There are four main tests one can do with multi-messenger GW events: • Standard sirens (Sec. IV): the amplitude of GWs is inversely proportional to its luminosity distance. If a counterpart of the GW is observed, a redshift measurement of the source is possible and the cosmic expansion history can be constrained. For close by sources, only the Hubble constant is measured. Future standard sirens measurements could help resolving the present tension in H 0 (see Fig. 8).
• GW speed (Sec. V): the propagation speed of GWs follows from the dispersion relation. Once the location of a GW event is known, it is possible to compare the speed of GWs with respect to the speed of light. Many alternative gravity theories predict that GWs propagate at a different speed either by modifying the effective metric in which GWs propagate, by inducing a mass for the graviton or by introducing higher order terms in the dispersion relation.
• GW damping (Sec. VI): modified gravity interactions can also alter the amplitude of GWs. In addition to the cosmic expansion, effective friction terms can damp GWs. This introduces an inequality between the GW and the EM luminosity distance that can be tested.
• Additional polarizations (Sec. VII): in alternative theories of gravity, there could be additional modes propagating. These extra polarizations could be directly tested if the source is localized and there is a network of detectors online. Moreover, these modes could mix with the tensor perturbations leading, for instance, to GW oscillations.
In this review we aim at summarizing current bounds on gravity theories and dark energy models from the first multi-messenger GW detection, GW170817. Up to date, the most constraining test is the GW speed. We also survey the prospects of different multi-messenger tests with future detectors. Significant improvements can be achieved in probing the GW Hubble diagram with an increasing number of events. A schematic timeline of multi-messenger GW astronomy is presented in Fig. 2 (the reader should be aware that expectations far in the future are very preliminary). The theoretical implications of present and future observations are discussed in Sec. VIII. We close the work in Sec. IX with an outlook of prospects and challenges of multi-messenger GW tests of gravity and DE.

II. THEORIES OF GRAVITY AND DARK ENERGY
The quest to test gravity and find alternatives to the cosmological constant has produced many theories beyond Einstein's General Relativity (GR) and other descriptions of gravity on cosmological scales. We will classify the different means to modify Einstein's theory and review their status as descriptions of cosmic acceleration. Then we will review other general approaches to describe gravity on cosmological scales, namely through the effective theory of dark energy and phenomenological parameterizations of the gravitational potentials. The landscape of alternative theories is summarized in Fig. 3 and the approaches to cosmological gravity are schematically described in Fig. 4.

A. Theories of Gravity
The action-based approach to modify gravity is based on generalizing the Einstein-Hilbert action where G is Newton's constant and S m denotes the action of matter, universally and minimally coupled to the metric g µν . Variation of the action (1) with respect to the metric leads to Einstein's field equations where R µν is the Ricci tensor, R ≡ g µν R µν the Ricci scalar and T µν = −2 √ −g δSm δg µν is the matter energymomentum tensor. Einstein's equations can be used to obtain solutions for the space-time (g µν ) given the matter content (T µν ) in any physical situation, including cosmological solutions relevant to study dark energy.
The structure of gravitational theories is severely restricted and several results can be used to prove the uniqueness of General Relativity under quite broad assumptions. Weinberg's theorems restrict possible infrared (low energy) interactions of massless, Lorentz invariant particles, which for spin-2 lead unavoidably to the equivalence principle [33] and the derivation of Einstein's equations [34]. 1 At the classical level, the results of Lovelock imply that the Einstein-Hilbert action is unique in 4D [36,37].
According to the above results, alternative theories of gravity can be classified into those that • Break the fundamental assumptions. 1 In addition to GR, there is another theory for massless, spin-2 fields in 4D, Unimodular Gravity, which is invariant under diffeomorphisms preserving the 4D volume element [35].
• Include additional fields.
• Make the graviton massive.
Note that those descriptions are not exclusive, and many theories fall within several categories. For instance: bimetric gravity has an additional field (tensor) and contains a massive graviton, Einstein-Aether is both Lorentz-violating and includes a vector field, TeVeS has a scalar in addition to a vector, and many extradimensional models can be described in terms of additional fields in certain limits. Also, when referring to massive gravitons, we will be considering only classical spin-2 fields.

Breaking fundamental assumptions
The theorems that fix the structure of General Relativity assume a four dimensional pseudo-Riemannian manifold and local interactions satisfying Lorentz invariance. Any departure from these principles offers a way to construct modified theories of gravity. 2 a. Extra dimensions: Additional spatial dimensions allow the inclusion of new operators constructed only from the metric tensor. The canonical example are Lovelock invariants [36], such as the Gauss-Bonnet term (a topological term in 4 dimensions which does not contribute to the equations of motion). The lack of observation of extra dimensions requires some mechanism to hide them. One example is compactification, when extra dimensions are sufficiently small that they are not accessible to experimental tests [38,39]. A radically opposite view consist on Braneworld constructions, in which the standard model fields live in a 3+1 dimensional brane, embedded in the higher dimensional space [40][41][42]. The Dvali-Gabadadze-Porrati (DGP) model [43,44] is one such construction in which self-accelerating solutions 3 can be obtained. However, this branch of solutions is plagued by a ghost instability. The 4D effective theory can avoid this problem and was the origin of Galileon gravity [45].
b. Lorentz Invariance Violation Gravity can be extended by breaking Lorentz invariance. In many of these alternatives a preferred time direction emerges spontaneously breaking Lorentz symmetry (see [46] for a review). Hořava gravity [47] implements Lorentz violation through a preferred foliation of space-time, with the attractive property that Lorentz symmetry can be recovered at low energies (see [48][49][50][51][52] for extensions/variants) and leading to a power-counting renormalizable theory 2 A class of GR extensions include additional geometric elements like torsion or non-metricity. These elements can be viewed as either breaking the fundamental assumptions or including additional fields. 3 Self-accelerating solutions are those in which there is a late time acceleration without a cosmological constant (Λ = 0).

General Relativity
Unique theory of massless g µν

Massive
Gravity Additional Field  of gravity. Another class of Lorentz-violating theories is Einstein-Aether, in which a vector field with constant norm introduces a preferred direction [53]. The special case of Einstein-Aether theories in which the vector field is the gradient of a scalar is known as Khronometric [54]. Khronometric theories describe the low-energy limit of some extension of Hořava-gravity, linking the two frameworks [55]. These ideas have been studied as cosmological scenarios [56,57].
c. Non-local theories Non-local theories include inverse powers of the Laplacian operator. These models can involve general functions (e.g. R · f ( −1 R)) [58,59] or be linear (e.g. R m 2 2 R) [60]. The latter class of models lead to phantom dark energy [61,62] and are compatible with cosmological observations [63] (see [62] for a review). However, their viability on the solar system is disputed due to the time evolution of the effective degrees of freedom and the lack of a screening mechanism [64]. Non-local interactions have been also proposed as a means to improve the ultra-violet behavior of gravity [65,66]. Non-local models are constructed using the Ricci scalar, since non-local terms involving contractions of the Ricci tensor give rise to cosmological instabilities [67,68].

Additional fields
Gravity can be extended by the inclusion of additional fields that interact directly with the metric. These theories will vary by the type of field (scalar, vector, tensor) and the interaction with gravity it has. Theories with additional tensors (bigravity and multigravity) are extensions of massive gravity and will be described in Sec. II A 3. We will assume a minimal universal coupling of matter to the metric. For a very complete review of gravity theories containing additional fields, see Ref. [69].
a. Scalar field A scalar is the simplest field by which gravity can be extended. Scalars do not have a preferred orientation and thus a macroscopic, classical state can exist in the universe without affecting the isotropy of the space-time if it depends only on time. Moreover, a potential term can mimic a cosmological constant very closely in the limit in which the field is varying very slowly (e.g. if the potential is very flat), which is the foundation of the simplest single-field inflation and dark energy models (quintessence). Scalar fields may also arise in effective descriptions of fundamental theories belonging to other categories, such as braneworld constructions [70][71][72]. These properties had lead to a proliferation of scalar-based models to describe accelerating cosmologies, both in the context of inflation and dark energy.
Recent efforts to study scalar-tensor theories have lead to a classification based on the highest-order derivatives of the additional field present in the action and the equations of motion, with three generations of theories 1. Old-school scalar tensor theories: 1 st order derivatives in the action, 2 nd order in equations.
2. Horndeski theories [73]: 2 nd order derivatives in the action and 2 nd order in equations.
3. Beyond Horndeski: 2 nd order derivatives in the action and higher order in equations.
The classification is motivated by Ostrogradski's theorem, which states that theories with second and higher (time) derivatives in the action generically introduce unstable degrees of freedom [74,75]. While most physical theories belong to the first class, known loopholes to Ostrogradski's theorem exits, for instance in effective or non-local theories (in which the ghost degrees of freedom are removed) [76] or when the theory is degenerate (that is, the inversion to canonical variables is not possible). The degeneracy condition is automatically satisfied if the equations of motion are second order, but that is not strictly necessary (different conditions appear when there are additional degrees of freedom [77]). 4 Known viable beyond Horndeski theories are known as Degenerate Higher Order Scalar Tensor (DHOST) [80], which have second derivatives in the action (higher derivatives in the equations), but recently toy models with higher derivatives in the action have been proposed [81]. Old-school scalar-tensor theories contain at most first derivatives of the scalar in the action. They can be seen as a generalization of the Jordan-Brans-Dicke theory of gravity [82] where X ≡ −∇ ν φ∇ ν φ/2 is the canonical kinetic term of the scalar field. This theory includes GR (ω = 1, K = Λ), quintessence (ω = 1, K = X − V ) [83,84], Brans-Dicke 4 Scalar-tensor theories can be reformulated in terms of differential forms in which the second order equations follow naturally from the antisymmetry of this language [78]. This approach can be generalized to gravity theories with additional vector and tensor fields as well [79]. models [82] (ω = φ, K = ωBD φ X − V (φ)), k-essence [85,86] (ω = 1, K = K(φ, X)). Archetypal modified-gravity models such as f (R) [87][88][89] are equivalent to instances of these theories [90]. Chameleons [91] and symmetrons [92] also belong to this class of theories (see [93] for a review). Certain freedom exists in writing the theory due to the possibility of rescaling the metric g µν →ḡ µν = C(φ)g µν and redefining the scalar field, i.e. the Jordan frame in which the metric is minimally coupled (3) and the Einstein frame in which ω is constant but matter is explicitly coupled to the scalar [94]. Current cosmological observations constrain the Brans-Dicke parameter ω > 692 (99%) [95].
Horndeski's theory contains the best understood examples of scalar-tensor theories. The Horndeski action encompasses all local, 4D Lorentz invariant actions whose metric and field variation leads to second order equations of motion [73] (Horndeski's theory is also known in the literature as Generalized Galileons [96,97]). Horndeski's action reads where we have assumed minimal and universal coupling to matter in S m . The sum is over the four Lagrangians where K and G A are functions of φ and X ≡ −∇ ν φ∇ ν φ/2, and the subscripts X and φ denote partial derivatives. Horndeski theories include all the generalized Jordan-Brans-Dicke type, plus new additions that involve second derivatives of the scalar at the level of the action. These include kinetic gravity braiding (KGB) (K(X), G 3 (X)) [98][99][100], covariant galileons (K, G 3 ∝ X, G 4 , G 5 ∝ X 2 ) [45,101], disformal [102] and Dirac-Born-Infeld gravity (G i ∝ 1 + X/Λ 4 i ) [70,103], Gauss-Bonnet couplings [78] and models self-tuning the cosmological constant [104,105]. Just as Brans-Dicke is invariant under rescalings of the metric, Horndeski theories are invariant under field-dependent disformal transformations g µν →ḡ µν = C(φ)g µν + D(φ)φ ,µ φ ,ν , which amount to a redefinition of the Horndeski functions G i (and the introduction of an explicit coupling to matter) [106].
Theories beyond Horndeski have higher order equations of motion without including additional degrees of freedom. The first examples of these theories [107] were related to GR by a metric redefinition involving derivatives of the scalar field [108], applied to the gravity sector. The simplest such beyond Horndeski theory emerged from the metric rescaling with derivative dependence C = Ω 2 (X, φ), D = 0, and was dubbed kinetic conformal gravity [107] where S φ is an additional scalar field Lagrangian. One of the premises in constructing this type of theories was the existence of an inverse for the relation (9), which can be studied through the Jacobian of the mapping [107]. If this assumption is broken the resulting theory is mimetic gravity [109], a gravitational alternative to dark matter. Interestingly, the conformal relation between kinetic conformal gravity (10) and GR ensures that this is one of the theories in which the speed of GWs is nontrivially equivalent to the speed of light [110,111]. The best known beyond Horndeski theory is given by the Gleyzes-Langlois-Piazza-Vernizzi (GLPV) action [112], which consists of Horndeski plus the additional Lagrangian terms: Horndeski and GLPV Lagrangians of the same order, i.e. L 4 + L 4b (7+11) or L 5 + L 5b (8+12), can be mapped to Horndeski via g µν →ĝ µν = C(φ)g µν + D(X, φ)φ µ φ ν showing the viability of these combinations [112,113]. For generic combinations of Horndeski and GLPV, viability arguments were first based on a special gauge (unitary gauge) that assumed that the scalar field derivative φ µ is timelike. Subsequent analyses eventually lead to covariant techniques to study the degeneracy conditions [80] (see [114] for earlier criticism). These techniques later showed that not all Horndeski and GLPV combinations met the degeneracy condition on a covariant level [115]. The study of degeneracy conditions for scalar-tensor theories ultimately lead to the degenerate higher-order scalar-tensor (DHOST) [80] paradigm classification of theories with the right number of degrees of freedom (also known as Extended Scalar-Tensor or EST) [116]. DHOST theories include cases beyond conformal kinetic gravity (10) and GLPV theories (11,12). DHOST theories are invariant under general disformal transformations (9), which can in turn be used to classify them [117] (see also [118]). DHOST theories have been fully identified including terms with up to cubic second-field derivatives in the action, e.g. ∼ ( φ) 3 [119]. Demanding the existence of a Poisson-like equation for the gravitational potential restricts the space of DHOST theories to those that are related to Horndeski via disformal transformations (9) [120].
b. Vector field Theories with vector fields have been proposed as modifications to GR and in the context of dark energy. A background vector field does not satisfy the isotropy requirements of the cosmological background, unless it points in the time direction and only depends on time A µ = (A 0 (t), 0, 0, 0). Isotropy can also happen on average, if a vector with a space-like projection oscillates much faster than the Hubble time [121]. In that case the background is isotropic on average but the perturbations (including gravitational waves) inherit a residual anisotropy [122]. Finally, theories with multiple vectors can satisfy isotropy, for instance, if they are in a triad configuration A a µ = A(t)δ a µ [123]. 5 A large number of vectors can also lead to statistical isotropy (e.g. if the orientations are random) [125]. The kinetic term for a vector field, F µν F µν , is defined by the gauge invariant field strength F µν = ∂ µ A ν − ∂ ν A µ and the addition of a mass term m 2 A 2 µ is known as Proca theory [126]. Proca theories have been generalized to include explicit gravitational interactions of a massive vector field [127][128][129][130]. The vector field Lagrangian is built so that precisely one extra (longitudinal) scalar mode propagates in addition to the two usual Maxwell-like transverse polarizations. Its full generalization contains terms with direct couplings between the vector and space-time curvature, whose structure closely resembles those of Horndeski's theory (7,8). In analogy to beyond Horndeski, there are also beyond generalized Proca interactions [131,132]. Further extensions to multiple vector fields known as generalized multi-Proca/Yang-Mills theories are able to incorporate new couplings [133] and configurations [134], e.g. the extended triad A a µ = φ a δ 0 µ + A(t)δ a µ , as do theories with a vector and a scalar (Scalar-Vector-Tensor) [135]. For more details about these theories we recommend Ref. [69].
An iconic theory containing a vector is the Tensor-Vector-Scalar (TeVeS) theory by Bekenstein [136]. TeVeS emerged as a relativistic theory able to describe Modified Newtonian Dynamics (MOND), and thus as an alternative to dark matter. For an overview of field-theoretical aspects of TeVeS and related theories, including other relativistic MOND candidates, see Ref. [137]. TeVeS theory introduces several non-minimal ingredients. In addition to the gravitational metricg µν matter is minimally coupled to an effective metric which generalizes the scalar disformal relation (9), incorporating the vector. Hereg µν is the gravitational metric, φ is the scalar. The vector A µ is enforced to be time-like and normalized with respect to the gravitational metric g µν A µ A ν = −1. TeVeS has a very rich phenomenology, including effects in GW propagation [138]. At the level of cosmology it is partially able to mimic DM, although the oscillations of the fields make it hard for the theory to reproduce the peaks in the CMB [139][140][141].

Massive gravity and tensor fields
Giving a mass to the graviton is another means to extend GR, with gravity mediated by a particle with mass m g , spin s = 2 and 2s + 1 = 5 polarization states (see [142] for bounds on the graviton mass). Weinberg theorem on the structure of GR relies on the infrared properties of the interactions: a mass term changes this structure. Despite this clear loophole, constructing a selfconsistent theory of massive gravity, free of pathologies and with the right number of degrees of freedom proved an extremely hard endeavor that took nearly 70 years to complete. The linear theory of massive gravity was formulated in 1939 by Fierz & Pauli [143] as linearized GR plus a mass term It was later found that Fierz-Pauli theory was discontinuous and gave different results from GR in the limit m g → 0 (vDVZ discontinuity) [144,145]. The discrepancy is due to the longitudinal polarization of the graviton (the helicity-zero mode) not decoupling in that limit. Considering non-linear interactions solved the apparent discontinuity by hiding the helicity-zero mode, which is strongly coupled in regions surrounding massive bodies and effectively decouples, recovering the GR predictions when m g → 0 [146]. Despite this progress, massive gravity had another important flaw: all theories seemed to have an additional mode (known as Boulware-Deser (BD) ghost) that renders the theory unstable [147,148]. a. Ghost Free Massive Gravity The apparent difficulties were overcome in de Rham-Gabadadze-Tolley theory (dRGT) [149], also known as ghost-free massive gravity (for current reviews on the theory see [150,151]). dRGT is a ghost free theory propagating the 5 polarizations corresponding to a spin-2 massive particle, universally coupled to the energy-momentum tensor of matter (cf. Fig. 6). The ghost-free property was initially shown in the decoupling limit (in which the helicity-0 mode decouples from the other polarizations) and then in the full theory [152,153] . The phenomenological deviations induced by massive gravity are primarily due to the helicity-0 mode. On small enough scales the Vainshtein mechanism [146] (see [154] for a review) effectively suppresses these interactions, leading to predictions very similar to GR on Solar System scales (however, new classes of solutions for black holes do exist, in addition to the usual ones [155]).
Massive gravity may offer a solution to the accelerating universe. A heuristic argument is that the force mediated by the massive graviton has a finite range V ∼ 1 r exp(−r/λ g ), weakening over distances larger than the Compton wavelength of the graviton r λ g = /(m g c 2 ). Hence, if the mass of the graviton is m g ∼ H 0 then gravity weakens at late times and on cosmological scales, causing an acceleration of the cosmic expansion relative to the GR prediction. The program to apply massive gravity as a dark energy model has hit important barriers, as flat FLRW solutions do not exist in this theory [156]. Accelerating solutions without a cosmological constant (CC) do exist with open spatial hypersurfaces [157], but they are unstable [158]. Proposed solutions include the graviton mass being generated by the vacuum expectation value of a scalar [156] or deformations of the theory in which the BD ghost is introduced, which provides dynamical and accelerating but meta-stable solutions [159]. Other ways to make massive gravity dynamical include the addition of a new field, such as a scalar field, e.g. quasi-dilaton [160], or one (or several) dynamical tensors in bigravity (and multigravity).
b. Bigravity and Multigravity In order to write a mass term for the metric, dRGT incorporates an additional, non-dynamical tensor, akin to the occurrence of η µν in eq. (14). Massive gravity can be extended by including a kinetic term to the auxiliary metric, which becomes fully dynamical. This leads to the theory of bigravity (or bimetric gravity) [161], which contains two spin-2 particles: one massive and one massless. The same procedure can be extended to more than two interacting metrics, leading to multigravity theories [162]. In these constructions there is always one massless excitation of the metric (a combination of the different tensor fields), with all other excitations being massive.
Bigravity solves the problem of cosmological evolution, at least at the background level. Flat FLRW solutions do exist, and many viable expansion histories have been found to be compatible with data [163]. However, it was later found that these models had instabilities that affected the growth of linear perturbations [164], which were found to be quite generic across different branches of solutions [165]. In some cases the instabilities affect only scales sufficiently small for non-linear effects to be important (i.e. the Vainshtein mechanism) which might render the theory stable [166]. Another solution is to choose the parameters of the theory so instabilities occur at early times, when characteristic energies are high and bigravity is not a valid effective field theory. This happens by choosing a large hierarchy between the two Planck masses: the so-obtained theory is practically indistinguishable from GR plus a (technically natural) CC [167].

B. Descriptions of cosmological gravity
The immense variety of alternative theories has motivated the search for effective descriptions able to capture the phenomenology of generic dark energy models. The covariant actions approach reviewed in Sec. II A offers several advantages, including 1) full predictivity, as (classical) solutions can be found from microscopic scales, to strong gravity and all the way to cosmology, 2) selfconsistency, as different regimes can be computed for the same theory, leading to tighter constraints when the data is combined. For instance, following this approach, we discuss the cosmology of covariant Galileons in Sec. IV A.

3) Gravitational "constants"
• Fully general Nonetheless, a great downside of this approach is that the predictions for every model/theory have to be obtained from scratch, which makes the exploration of the theory space a daunting task. An alternative route is to constrain deviations from GR, without reference to any fundamental theory. The tradeoff is to keep the theory of gravity as general as possible at the expense of dealing with a very simple spacetime. The simplest situation is where the background space-time is flat and maximally symmetric (Minkowski), a setup useful to model gravity in the Solar System. In this simple case one can define a series of quantities, known as Parameterized Post-Newtonian (PPN) coefficients, that describe general modifications of gravity over Minkowski space (see Ref. [6] for details, including constraints and additional assumptions). These PPN parameters that can be constrained by experiments (such as the deflection of light by massive bodies) and computed for any theory, and thus provide a very efficient phenomenological dictionary.
In cosmology we are interested in describing gravity over a slightly less symmetric background: a spatially homogeneous and isotropic, but time evolving, Friedmann-Lemaitre-Robertson-Walker (FLRW) metric: where metric perturbations are in Newtonian gauge with the sign conventions of Ma & Bertschinger [168]. The tensor perturbation is symmetric, transverse and traceless (∂ i h ij , δ ij h ij = 0) and we have ignored vector perturbations. The time-evolution of the cosmological background makes an extension of PPN approach to cosmol-ogy a difficult task, as instead of constant coefficients one needs to deal with functions of time due to the evolution of the universe.
The most important example of an effective description in cosmology is the parameterization of the cosmological background, often done in terms of the equation of state w ≡ p/ρ [169,170]. Instead of computing the modifications to the Friedmann equations and the pressure and energy density contributed by the additional fields, a general approach to cosmological expansion is to specify w(z) so that This is sufficient to describe any cosmological expansion history and in any theory (as long as matter is minimally coupled) just by using the Friedmann equation (16) as a definition for ρ DE .
Describing the perturbations requires more functional freedom. Here we will review two common procedures, namely the effective theory of dark energy and the modified gravitational "constants". The different approaches (including the covariant theory approach), their features and connections are outlined in Fig. 4. Consistency checks between the background and perturbations can also be used to test the underlying gravity theory [171,172].

Effective theory of Dark Energy
The effective (field) theory of dark energy (EFT-DE) [173][174][175] can be used to systematically describe general theories of gravity over a cosmological background (see Ref. [176] for a review). The original formulation applies to theories with a scalar field φ and uses the unitary "gauge": a redefinition of the time coordinate as the constant φ hypersurfaces (this is always possible if φ ,µ is time-like and non-degenerate, as in perturbed cosmological backgrounds, but not in general). One then constructs all the operators compatible with the symmetries of the background (recalling that the time translation invariance is broken by the cosmological evolution).
A very convenient basis for the EFT functions was proposed by Bellini & Sawicki [177], when restricted to Horndeski's theory. In their approach the EFT functions are defined by the kinetic term of the propagating degrees of freedom in the equations of motion. The dynamical equation for tensor perturbations introduces two dimensionless functions • tensor speed excess α T describes the modification in the GW propagation speed c 2 g = (1 + α T ). This modification is frequency independent (see Sec. V).
• Planck-mass run rate α M enters as a friction term.
It is related to the cosmological strength of gravity M 2 * (the kinetic term of tensor perturbations) by The equations in the scalar sector (eqs. (3.20), (3.21) of [177]) can be used to define the remaining functions. If we look only at the second time derivatives (that is, the kinetic terms) (20) (note the ellipsis denote terms without second time derivatives) one can define • braiding, or kinetic gravity brading α B quantifies mixing between the second derivatives of the metric in the field equation (and vice versa). This is a generic property of modified gravity [98,178].
• kineticity α K modulates the "stiffness" of the scalar field (how hard it is to excite perturbations in φ).
The kineticity is intimately related to the propagation speed of scalar perturbations, which satisfies c 2 s ∝ α −1 K : higher kineticity values lead to slower scalar waves and vice versa.
These functions can be computed from the Lagrangian functions in (4), and for a given theory will depend on the value of the scalar field and its time derivative. Constraints on the α-functions can also be used to reconstruct the terms in a fundamental theory, as shown in Tab. I. Systematic reconstructions of the Lagrangian from the α functions have been also explored [179,180]. Increasingly complex theories of gravity lead to a larger number of EFT functions. In beyond Horndeski theories of the GLPV type, e.g. (11,12), a new function α H is introduced [113] which phenomenologically produces a weakening on gravity on small but linear cosmological scales [181]. In DHOST theories including (10) the situation is more involved, as the new EFT functions (α L , β 1 , β 2 , β 3 ) need to be related to each other and α T , α H by the degeneracy conditions that prevent the introduction of additional degrees of freedom [120]: this leads to two classes of theories with one free function, which is either α L or one among β i . New EFT functions appear beyond scalar-tensor theories, as has been explicitly derived for vector-tensor [182] and bimetric [183] theories (including bimetric gravity), with a unified treatment of theories with different degrees of freedom [184].
Different versions of the linear EFT-DE approach has been implemented in numerical codes able to obtain predictions based on linear perturbation theory. Publicly available implementations exist in EFTCAMB [185], hi class [186] and COOP [187], with the first two based on the CAMB and CLASS Boltzmann codes [188,189]. In addition, the CLASS-Gal code (integrated into CLASS) can be used to compute relativistic corrections to cosmological observables [190]. These and other codes have been tested against a large class of models at a level of precision sufficient for current and nextgeneration cosmological experiments [191].
The EFT framework has been tested using linear ob-servables. Horndeski theories were tested against current experiments, leading to O(0.1 − 1) constraints on the α-functions varying over α B , α M , α T [192], with α M = −α B [193] and setting α T = 0 to reflect the strong bounds on the GW speed [194] (α K is very weakly constrained by current data). Future experiments have great potential to improve on these bounds, and are expected to improve the sensitivity to O(0.01 − 0.1) [195][196][197][198][199].
EFT-based modifications of gravity might be observable through relativistic effects on ultra-large scales [197,200] (see also the discussion in Sec. II B 2): these techniques might improve significantly our ability to constrain α K , although it will remain the hardest to measure [196]. Those works used simple functional dependence of the EFT functions. It has been nonetheless shown that simple parameterizations are indistinguishable from more complex models in most cases, even for next-generation cosmology experiments [201].
The EFT approach has been generalized beyond linear perturbations for Horndeski theories. Including nonlinear cosmological perturbations in general introduces new functions at every order in perturbation theory (e.g. to compute the bispectrum [202]). However, a restriction to cubic and quartic operators (in the unitary gauge) leads to only 3 new operators on quasi-static scales [203]. Some applications of non-linear EFT-DE include corrections to the power spectrum (e.g. [204]), the use of higher-order correlations as a test of gravity, such as the bispectrum of mattter [202], galaxies [205] and CMB lensing [206] or the the non-linear shift of the BAO scale [207].

Modified Gravitational "constants"
A very commonly used approach employs general modifications of the equations relating the gravitational potentials to the matter density contrast (note that different conventions exist in the literature).
Here δ is the density contrast in Newtonian gauge (15) and the functions µ, Σ parameterize the evolution of the gravitational potentials as a function of time a and scale k. The functions µ, Σ are often referred to as G matter , G light because gradients of Ψ determines the force felt by non-relativistic particles and those of Ψ+Φ the geodesics of massless particles (and thus the lensing potential). The ratio of the gravitational potentials, is of particular interest, since GR predicts that it is exactly one in the absence of radiation and any sizable deviation could be an indication of modified gravity.
This approach has numerous advantages as a test of gravity against data. It is completely theory agnostic, not requiring any information on the ingredients or laws of the theories being tested. Most importantly, it is completely general for universally coupled theories: given any solution ∆, Ψ, Φ(a, k) it is possible to obtain µ, Σ through (21,22). In this sense, any finding of µ, Σ = 1 might point towards deviations from GR and warrant further investigation.
The main shortcoming of this approach is its great generality: any practical attempt to implement (21,22) requires a discretization of the functional space, introducing 2 · N k · N z free parameters for a homogeneous binning. In contrast, the EFT approach for Horndeski theories (18,19) requires only 4 · N z parameters, making it a more economic parameterization for all but the simplest scale-dependencies (N k = 1, 2). Capturing the full scale dependence of µ, Σ requires either a large parameter space or assumptions about the k-dependence.
A common practice to overcome this limitation is to choose a functional form for µ, Σ as a function of scale. For Horndeski theories the functional form is a ratio of quadratic polynomials in k [208,209] for functions h i that depend on redshift through the theory (4) and the scalar field evolution. The mapping is exact on small scales in which the field dynamics can be neglected, below scalar sound horizon [210]. A kdependence as the ratio of polynomials is generic in local theories at quasi-static scales [211], with higher order polynomials possible in Lorentz-violating [212], multifield [213] theories. Studies with current data have tested rather simple parameterizations of µ, Σ: for instance the Planck survey tested the case of k-independent µ, η in addition to the theory-motivated (24) [193]. Future surveys will improve the resolution on the scale-dependence: 3 k-bins are the minimum to constraint all the parameters in eq. (24), with 6 bins in z [214,215]. A limited handle on scale-dependence on ultra-large scales might be achievable [216,217] (see also [218][219][220] for related parameterizations). Another main shortcoming of the completely general approach is that there is no information from other regimes. The major setback with respect to EFT is the lack of information from gravitational wave observables, while in EFT the tensor and scalar sectors are modified accordingly i.e. GW data restrict the modifications available to scalar perturbations, for instance, theories with η = 1 require either α M or α T to be non-zero [221]. Attempts to explore the connections between µ, Σ and the EFT approach in Horndeski-like theories have used very general parameterizations: connecting theoretical viability conditions of the theory with the behavior of µ, η [222], including the case with α T = 0 to address the impact of the GW speed measurement [223]. General properties of Horndeski theories could be inferred from detailed measurements of µ, Σ [224]. Similarly to the EFT approach, the background evolution is unknown and the equation of state (17) is in principle arbitrary. However, theoretical priors on w(z) can be obtained for broad classes of Lagrangians (e.g. quintessence [225]) or from stability conditions in general realizations of the EFT functions [226].

III. BASICS OF GRAVITATIONAL WAVES
Gravity is a universal, long-range force. This, in field theory language, implies that it must be described by a metric field g µν in order to manifestly preserve locality and Lorentz invariance. At low energies, the leading derivative interactions are second order. Therefore, gravity theories generically predict the existence of propagating perturbations or, in other words, the existence of GWs. One can define a metric perturbation h µν as a small difference between the metric field g µν and the background metric g B where |h µν | 1. However, in curved space it is nontrivial to distinguish the perturbation from the background unless the latter posses some degree of symmetry, e.g. flat space or FLRW. A way out is to define GWs via geometric optics [227]. In this context, the key element to distinguish the GW from the background is the size of the fluctuations λ gw with respect to the typical size of the background variation L B . One could associate the typical variation scale in the background with the minimum value of the components of the background Riemann tensor For astrophysical sources, we will see later that the wavelength of the GW λ gw is orders of magnitude smaller than the typical variations of L B for cosmological setups. The fact that λ gw L B implies that there is a clear hierarchy between background and perturbations, allowing to solve the problem using an adiabatic (or WKB) expansion.
In the following, we describe the basics of GWs. We begin by introducing GWs in GR. Then, we explore the propagation in cosmological backgrounds. Subsequently, we describe how this picture is changed beyond GR. Finally, we discuss the status of present and future GW detectors. We recommend the reader Ref. [227][228][229][230][231] for more details.

A. GWs in General Relativity
General Relativity is a universal, infinite-range force. As we have seen in the previous section, this implies that it is described by a massless, spin-2 field. The dynamics is described by Einstein's equations (2). Importantly, not all the components of the Einstein tensor G µν contain second order time derivatives of the metric g µν . This implies that not all of the 10 components of g µν will propagate. In particular, the G 0µ equations act as 4 constraint equations. This, together with the 4 unphysical modes reduced by the gauge choice, leaves only 2 propagating degrees of freedom. This is precisely what one would expect for a massless spin-2 particle.
In order to study GWs, the next step is to study the linearized Einstein's equations. To diagonalize the equations for the tensor perturbations, one has to introduce the trace-reversed perturbation whose name comes from the fact thath = −h where h = g µν B h µν andh = g µν Bh µν are the traces of h µν and h µν respectively. Fixing the Lorenz gauge for this new variable ∇ µh µν = 0, the linearized Einstein equations in curved space-time read where covariant derivatives are built with the background metric g B µν . Here, we have introduced the perturbed energy-momentum tensor δT µν as the difference of the total energy momentum tensor T µν with respect to the background solution 8πGT B µν = R B µν − 1 2 g B µν R B . One should note that, in vacuum, all the Ricci tensors vanish in the second line. Moreover, for short-wave GWs λ gw L B , the Riemann tensor in the first line has a subdominant contribution.
To deal with the two GW polarizations, it is convenient to work in the transverse-traceless (TT) gauge, which is defined by Note that in the TT gauge, the trace-reversed perturbationh µν is equal to the original perturbation h µν . If the GW is propagating in the z-direction, the spatial components become with h + and h × being the two polarizations of GR.

Generation
A first question to address is how GWs are produced. Let us consider a GW source in vacuum within the short-wave approximation. Then, the general propagation equation (28) reduces to This wave equation can be solved in analogy to electromagnetism using a Green's function. In terms of the retarded time t r = t − | x − y|, the solution is For an isolated, far away, non-relativistic source, this solution can be simplified. In fact, one can make a multipole expansion. The zeroth moment corresponds to the mass-energy of the source M = T 00 (t, y)d 3 y. However, conservation of energy for an isolated source tells us that M cannot vary in time. Next, the mass dipole moment M i (t) = y i T 00 (t, y)d 3 y is associated to the motion of the center of mass. Nevertheless, its time derivate is the momentum of the source that also has to be conserved 6 . Consequently, the leading contribution is the mass quadrupole moment M ij (t) = y i y j T 00 (t, y)d 3 y, which generates GWs through its second time derivatives For a binary system of masses m 1 and m 2 , the quadrupole radiation is where F is a function of the orientation of the binary that depends on the polarization + or × (recall (30)), Φ(t) is the phase and we have introduced the chirp mass As the masses orbit one around the other, they will lose energy with the emission of GWs. They will begin getting closer and orbiting faster until they eventually merge. Thus, the frequency of GWs will increase with a characteristic chirp signal followinġ Note that to consider the energy loss due to GWs emission one has to go to second order in perturbation theory. An example of the typical GW strain and frequency of a compact binary coalescence is presented in Fig. 5. Typical binary compact objects emitting detectable GWs are binary neutron stars (BNS) and binary blackholes (BBH). The order of magnitude of the frequency of the GWs of these systems is 6 Similar arguments apply for the spin angular momentum in case the source exhibit some internal motion. The GW strain (above) and the GW frequency (below) are plotted as function of the time before merging. This waveform is a template of the first event detected GW150914 [232].
where M is equal to one solar mass. This implies that higher masses lead to lower frequencies. In terms of the wavelength one finds (39) Contrary to EM waves, GW detectors are directly sensitive to the amplitude of the wave, which falls like 1/r and not as the luminosity 1/r 2 . This means that even if the amplitudes are very small, GW detectors are more sensitive to distant sources.

Propagation
Once the GW is generated, it will propagate in vacuum following A general solution of this wave equation can be written as the sum of plane waves where Re denotes the real part. By plugging this expression in the wave equation and expanding in powers of k, one finds at leading order that Therefore, GWs propagate in null geodesics determined by the background metric. This means that the GW-cone is the same as the light-cone and both waves propagate at the same speed. Moreover, the wave is transverse to the propagation direction similarly to electromagnetic waves. Finally, by defining the scalar amplitude A = 1 2 A * µν A µν 1/2 one realizes that which can be interpreted as the conservation of gravitons. One should note that R B µανβ in the wave equation only modifies the amplitude at second order. Consequently, at first order in geometric-optics, the wave equation h µν = 0 can be rewritten as This expression could be used as a gauge invariant, coordinate independent definition of the propagation of GWs in vacuum.

Detection
To see the effect of a GW passing by, one has to study the deviation of nearby geodesics. Given two particles with four-velocity U µ separated by S µ , their separation evolves as where τ is the proper time. At leading order, the four velocity is just the unit vector U µ = (1, 0, 0, 0) + O(h), and we only have to compute the Riemann tensor in the TT gauge. The result is where we have also used that at leading order the proper time τ and the coordinate time t coincide. Accordingly, only the components of the separation vector S µ transverse to the propagation vector will feel the effect of the GW. In these directions, the separation between the test particles will oscillate as the GW travels perpendicular to them. In Fig. 6, we plot the effect of the different GW polarizations crossing a circle of test masses. GW detectors precisely rely on this principle that GWs can alter the separation between test masses. Modern detectors are interferometers. In brief, they are constituted by two perpendicular arms of the same length with two mirrors in free fall at their ends (acting as test particles). A laser beam is split in the two arms so that the beams reflect in each mirror and come back to the splitting point. In the absence of a GW, both laser beams returning will interfere destructively and no signal would arrive to the detector. However, if a GW crosses the interferometer, it will change the length of the arms differently. This means that the laser beams will take different times to travel the arms, arriving at the splitting point with different phases. Then, the destructive interference is lost and some signal gets to the detector.
Note that the typical distance variation δL of two test masses separated by L is approximately δL ∼ h · L. For compact binaries, we have seen that the strain amplitude is h ∼ 10 −21 . Therefore, LIGO-type detector with arms of the order of kilometers have to measure distance variations a thousand times smaller than the nucleus of an atom. To achieve that, each arm has a resonant cavity in which the laser beams bounce back and forth about 300 times. This effectively makes ground-based interferometer arms to be 1200km long (since the variation time of the GW is much longer the travel time of the laser in the cavity). Accordingly, LIGO is sensitive to frequencies of f LIGO ∼ 10 2 Hz. For the future space-based interferometer LISA, the working principle will be the same but with longer arms L ∼ 10 6 km, being thus sensitive to much smaller frequencies, f LISA ∼ 10 −2 Hz.

B. GWs in cosmology
At large scales, the universe is homogeneous and isotropic to very high accuracy. The background geometry is then described by a (flat) Friedmann-Lemaitre-Robertson-Walker (FLRW) metric where a(η) is the scale factor and we are timing in conformal time dη = dt/a(t). The propagation equation (40) becomes in Fourier space where H = a /a is the Hubble parameter and primes denote derivatives with respect to conformal time. This is nothing but a wave equation with a friction term due to the cosmic expansion. This Hubble friction will produce a redshift of the frequencies f emit = (1 + z)f obs and a rescaling of the GW amplitude h ∼ 1/(a·r). The previous formulae for a compact binary (34)(35)(36) written in terms of the observed frequency f obs are thus valid if we replace the chirp mass M c by the redshifted chirp mass and the physical distance a · r by the GW luminosity distance where c is the speed of light and z the redshift. In this way, all the (1 + z) terms cancel each other. Note that there is an intrinsic degeneracy between the redshift and the Hubble parameter H(z) in the GW luminosity distance. Therefore, the expansion history can only be obtained from the GW amplitude if the redshift is known. For near by sources z 1, the Hubble constant H 0 can be obtained showing the power of GW astronomy to do cosmology. We will review this topic in more detail in Sec. IV. Finally, let us mention that we have only focused on GWs from binary sources in the late universe. However, there could be other sources of GWs in the early universe leading to stochastic, cosmological backgrounds. For a nice review in the subject one can follow [233]. One may wonder if there could be an effect in the GW propagation when traveling through the cold dark matter. This question has been addressed recently and the answer is that the effect is too small [234,235].

C. GWs beyond GR
As we have emphasized at the beginning of this section, the existence of wave solutions for metric perturbations is generic of second order gravity theories. However, the behavior of these GWs can be very different depending on the gravity theory. The differences can arise either at the production or the propagation.

Additional polarizations
During the generation of GWs, the main differences in theories beyond GR is that there could be another polarizations excited. We have seen that in GR only the 2 tensor polarizations propagate (recall (30)). Nevertheless, modifications of gravity might introduce new degrees of freedom. For instance, in scalar-tensor theories there will be an additional scalar mode. Or in Massive Gravity, where there will be in addition 2 vector modes and a scalar one. For a GW propagating in the z-direction, one could decompose the amplitude A ij in the different polarizations where A + and A × are the two tensor modes, A V 1,2 the two vector polarizations, A T the transverse (breathing) scalar and A L the longitudinal scalar mode. One should note that these other types of polarizations will also leave an imprint in the detectors. Each polarization will have a different effect as we exemplify in Fig. 6. In principle, with a set of 6 detectors one could distinguish all possible polarizations. Before continuing, it is important to remark that if a source is emitting additional polarizations, it will lose energy more rapidly. For binary pulsar, if additional modes were emitted, the orbit would shrink faster due to the higher energy loss. For PSR B1913+16 (better known as Hulse-Taylor pulsar) [236], the orbit has been tracked for more than four decades now, showing an impressive agreement with GR [237]. Binary pulsars have been intensively used to constrain alternative theories of gravity, placing severe bound on dipolar radiation as reviewed in [238,239]. An example of this are Einstein-Aether propagating waves [240], which have been constrained from pulsars due to dipolar GW emission [241,242]. Another would be the constraints on Brans-Dicke from a pulsarwhite dwarf binary [243].
Due to these constraints on the emission of additional polarizations, it is usually invoked a screening mechanism around the source to evade them. If this is the case, deviations of GR could only be measured in the propagation of GWs. We will discuss more about the emission of extra modes and screening mechanisms in Sec. VIII B.

Modified propagation
The propagation of GWs in gravity theories beyond GR can be very complicated. The additional fields might modify the background over which GWs propagate and their perturbations could even mix with the metric ones. For simplicity, we will restrict here to cosmological backgrounds. In that case, due to the symmetries of FLRW, tensor perturbations can only mix with other tensor perturbations. Possible deviations from the cosmological wave equation in GR (50) can be parametrized by [244] h ij + (2 + ν)Hh ij + (c 2 where ν is an additional friction term, c g accounts for an anomalous propagation speed, m is an effective mass and Π ij is a source term originated by the additional fields. For instance, the scalar-tensor analogue of this equation is (18). It is interesting that the modified GW propagation can also be understood in analogy with optics as GWs propagating in a diagravitational medium [245].

Gravitational Wave Polarizations
Tensor × Focusing in the case without sources, Π ij = 0, the original GR wave-form h GR , given by (34) for instance, will be modified by where we have introduced α T = c 2 g − 1. Mainly, the additional friction will modify the amplitude while the anomalous speed and the effective mass change the phase. The modified luminosity distance is then We will discuss how to test the GW phase in Sec. V and the damping of the strain in Sec. VI. For GWs propagating in FLRW backgrounds, a source is present Π ij = 0 when there are additional tensor modes propagating. A paradigmatic example of this is bigravity, where there are two dynamical metrics. In that case, we have to track the evolution of both metric perturbations [246][247][248] where for shortness we have absorbed the Hubble friction in the definition of the perturbation and we do not show the spatial indices. Here m g is the effective mass (one of the tensor fields is massive) and θ is the mixing angle. Since there are interactions between h ij and t ij , this means that the mass eigenstates are not the same as the propagation eigenstates. In analogy with the propagation of neutrinos, there can be GW oscillations. In Sec. VII A we will see how GW oscillations can be tested. One should note that the possibility of having GW oscillations is not restricted to bigravity. Any gravity theory in which the additional degrees of freedom can arrange to form a tensor perturbation over FLRW background could display the same phenomenology. In particular, this is what happens with gauge fields in a SU(2) group [124,249]. , advanced Virgo (aVirgo) and KAGRA, with curves given at design sensitivity [28]. Third generation (3G) detectors projected are Einstein Telescope (ET) [29] and Cosmic Explorer (CE) [30]. A space-based detector planned is LISA [32]. For illustration, we include the strain amplitude of GW150914 [232] and the expected background for massive binary black-holes (BBH) and galactic white-dwarf (WD) binaries [250].

D. Present and future GW detectors
Before presenting the different tests of gravity with multi-messenger GW astronomy, let us outline briefly the status of present and future GW detectors. We summarize the different sensitivities of each detector and the typical sources in Fig. 7. The capabilities of multimessenger GW astronomy depend mainly on two aspects: • Number of detections: this is most sensitive to the size of the volume of the Universe covered by the GW detector. However, there is a large uncertainty in the actual population of the sources, e.g. BNS.
• Sky localization: this is most sensitive to the number of detectors that allow for a better triangulation of the source. A better localization of the GW events simplifies the search for a counterpart.
We draft a summary of present expectations for the range of detection and localization angle of different GW detectors in Fig. 2. The reader should be aware that these expectations, specially the ones far in the future, might be subject to important modifications.
At present, we are in the second generation (2G) of ground-based detectors. There have been already two operation runs. In the first one, only the two aLIGO detectors were online with a detection range for BNS of the order of 80 Mpc. In the second one, aVirgo joined. Although its sensitivity was still lower, aVirgo helped to reduce the localization area an order of magnitude, from 100 − 1000 deg 2 to 10 − 100 deg 2 . For illustration, we plot in Fig. 7 the strain of the first event GW150914 [232].
However, neither aLIGO nor aVirgo has reached their designed sensitivity yet. Moreover, other two 2G detectors are on the way. KAGRA [251] in Japan is under construction and it is expected to start operating in 2020. On the other hand IndIGO [252], a replica of LIGO located in India has been approved. This means that in the coming years two main improvements are expected: a larger event rate and a more precise localization [28]. The range of detection is expected to improve by a factor of 3 implying a factor 27 in the detection rate. The localization is expected to reduce to areas of 5 − 20 deg 2 with KAGRA and to a few deg 2 with IndIGO. Note that this is a key point in order to associate any counterpart with a GW event.
A third generation (3G) of ground-based detectors is being planned. The European 3G proposal is the Einstein telescope (ET) [29], an underground, three 10kmarms detector. Its current design aims at improving by a factor of 10 present sensitivity. The US 3G proposal, Cosmic Explorer (CE) [30], is more ambitious with two 40km arms further improving the sensitivity of ET. In any case, 3G detectors imply a substantial change in GW astronomy. While 2G detectors will only be able to reach up to z ∼ 0.05 for BNS and z ∼ 0.5 for BBHs, 3G detectors might reach z ∼ 2 for BNS and z ∼ 15 for BBHs. In terms of multi-messenger events, this corresponds to thousands or tens of thousands standard sirens. The sky localization of events in 3G will vary depending on the available network of detectors [31]. In this sense, there are already plans to upgrade advanced LIGO detectors. This envisioned upgrade is known as LIGO Voyager [253]. Voyager could reach sensitivities between 2G and 3G. The localization will thus vary depending on the redshift of the source since the sensitivity of the network will not be homogeneous. A network of three Voyager detectors plus ET would localize 20% of the events within 10 deg 2 , while a setup with three ET detectors would localize 60% of the events within 10 deg 2 [31].
Moreover, space-based GW detectors have been also projected. The European space agency has approved LISA [254]. Being in space and with million kilometer arms, the frequency band and targets of LISA are very different from ground-based detectors (see Fig. 7). Expected sources include supermassive BHs, extreme mass ratio inspirals (EMRI) and some already identified white dwarf binaries (known as verification binaries). It is presumed that these sources could be observed with counterparts, enlarging the reach of multi-messenger GW astronomy. For reference, we have included in Fig. 7 the expected background of massive BBH (M BH ∼ 10 4−7 M ) and unresolved galactic white-dwarf binaries [250] (see more details about the different sources in Fig. 1 of [254]).
Finally, there are other proposals to detect GWs at even lower frequencies, in the band of 1-100 nHz. Sources in this regime could be binary SMBH in early inspiral or stochastic, cosmological backgrounds. These GWs could be observed using a network of millisecond pulsar, in which the pulsation is extremely well-known, for instance with PPTA [255]. Other proposals are to use astrometry with GAIA, which is capable of tracking the motion of a billion stars [256], or to use radio galaxy surveys [257].

IV. STANDARD SIRENS
GWs coming from distant sources can feel the cosmic expansion in the same way as EM radiation does. In fact, we have seen in Sec. III B that the amplitude of the GWs is inversely proportional to the GW luminosity distance d gw L . In GR the GW luminosity distance is equal to EM luminosity distance, with the standard formula given by (52). However, this is not a universal relation in theories beyond GR as we will discuss in Sec. VI. For the moment,  we will restrict to Einstein's theory only.
In order to measure distances in cosmology one needs both a time scale and a proper ruler. The inverse dependence of the strain with d gw L makes GWs natural cosmic rulers. Introducing the full cosmological dependence 7 , the GW luminosity distance is given by where sinn(x) = sin(x), x , sinh(x) for a positive, zero and negative spatial curvature respectively. Assuming a ΛCDM cosmology, the Hubble parameter is a function of the matter content Ω m , the curvature Ω K and the amount of DE Ω Λ (radiation at present time is negligible) On the contrary, GWs alone do not provide information about the source redshift. This is because gravity cannot distinguish a massive source at large distances with a light source at short distances. Nevertheless, when GWs events are complemented with other signals that allow a redshift identification, these events become standard sirens [265]. Standard sirens are complementary to already well-established standard candles, SN events in which the intrinsic luminosity can be calibrated allowing for a measurement of the EM luminosity distance. There are also standard rulers, such as the one determined by baryon acoustic oscillations (BAO) which provides the angular diameter distance. For binary black-holes (BBH) it is not expected to observe any counterpart unless there is matter around the BHs [266]. Fortunately, binary neutron stars (BNS) and black-hole neutron star systems (BHNS) are expected to emit short gamma-ray bursts (sGRB) and other EM counterparts, becoming clear standard siren targets. The first ingredient for a standard siren is the measurement of the GW luminosity distance. However, d gw L is degenerate with the inclination of the binary. More precisely, showing the explicit angular dependence of the waveform (34) one finds that the two polarizations scale as where ι is the inclination angle. This distance-inclination degeneracy is the main source of uncertainty of present measurements of d gw L [267]. One possibility to break this degeneracy is to have an identification of both polarizations. This requires at least a three detector network and a good sky localization. Another possibility to break this degeneracy is when the binary has a precessing spin. Then, there is a characteristic modulation of the amplitude that can disentangle the inclination angle. Orbital precession is more significant for large effective spin χ eff 8 and/or small mass ratios q = m 2 /m 1 ≤ 1 since there is also an effective spin-mass ratio degeneracy. Possibly good candidates for this would be BHNS binaries since BNS typically have a mass ratio close to 1.
The other ingredient for a standard siren is the identification of the redshift. This can be achieved by different means. The simplest consists in finding an EM counterpart of the GWs from the binary coalescence [265]. Then, the redshift could be extracted from the EM counterpart or from the host galaxy depending on the case. BNS will produce a sGRB after the merger. This sGRB is characterized by a beaming angle θ j , which is typically expected to be θ j ≤ 30 • . This means that depending on the orientation of the source we will be able to detect both signals only in a small fraction of the events. Observing a bright afterglow or kilonovae [268] might increase the changes of detecting a counterpart. BNS will be the primary source for LIGO [269], although BHNS could also play an important role [270]. SMBHs might be good standard sirens for LISA as well [271]. Several multi-messenger observations will lead to a precise measurement of the cosmic expansion either for second generation detectors [262,272] or for third generation [273]. 8 The effective spin is the mass weighted projection of the two spins of the binaries into the orbital angular momentum. There are alternative proposals to identify the redshift without observing a counterpart. Based on statistical methods, one could associate every GW event with all the galaxies within the error in the localization and compute the cosmology [265,274]. For a large number of events, the true cosmology will statistically prevail. Conveniently, this method applies to any type of source, including BBH which is the most common observation. Moreover, for very loud (golden) events there might be only few galaxies in the localization box [275]. On the con side, this method relies on a complete galaxy catalogue.
For events involving a NS there are other possibilities. If the EoS of the NS is known, one could compute the tidal effects in the GW phase, which breaks the degeneracy between the source masses and the redshift [276]. A good sensitivity could be achieved with Einstein Telescope [277]. Since this method relies on the knowledge of the EoS, which most probably will be uncover through GW observations also, an iterative approach could be performed. In addition, one could benefit from the narrow mass distribution of NS to statistically infer the redshift [278]. Finally, numerical simulations suggests that in BNS a short burst of GWs with a characteristic frequency will be emitted after the merger. If this burst was observed, a redshift measurement could be obtained [279]. The main challenge of this method is possibly the low SNR of the GW burst.
GW170817 has become the first standard siren detected. The redshift, z = 0.008 +0.002 −0.003 , was obtained identifying the host galaxy NGC4993 through the different EM counterparts [26]. For such a close event, only the leading term in the cosmic expansion H 0 could be obtained following (53). The precise value obtained was [27], This result has the relevance of being the first indepen- A phantom-like equation of state w < −1 helps to solve the tension between Planck and the direct measurement (the non-zero neutrino mass partly compensates the effect of w, cf. Fig. 9). Figures reproduced with permission from the authors of [280] and [281] respectively. dent measurement of H 0 using GWs. Still, since it is only one event, the relative error is large, of the order of 14%. From this error budget, 11% arises from the uncertainty in the measurement of the distance due to present detector sensitivity and the previously mentioned degeneracy with the inclination angle. The rest of the error comes from the uncertainty in the estimation of the peculiar velocity of the host galaxy. Observations of the afterglow in different frequencies can help in reducing the inclination uncertainty [282,283]. One could also use the statistical method to obtain H 0 without information of the counterpart, although the error is significantly larger H 0 = 76 +48 −23 km s −1 Mpc −1 [284]. Recent studies have shown that with order ∼ 50 BNS standard sirens events H 0 could be measured at the level of ∼ 2% [285,286]. Depending on the actual population of BNS this might be achieved with second generation detectors. LISA will detect mergers of SMBHs (with EM counterparts), providing measurements of cosmic expansion up to z ∼ 8 and potentially measuring H 0 with 0.5% precision [287].

A. The Hubble rate tension
Standard siren observations of the cosmic expansion can also explore the tension on the Hubble parameter: where a distance ladder measurement gives a value H 0 = (73.52 ± 1.62)km s −1 Mpc −1 [260] higher than the model-dependent inference from the CMB H 0 = (67.4 ± 0.5)km s −1 Mpc −1 [1] (see in Fig. 8). The tension now reaches the level of 3.6σ. Reanalysis of the local distance ladder with more sophisticated statistical techniques tend to agree on the high value, although with somewhat larger error bars [288,289]. Other low red-shift determinations confirm this trend, for instance time delays from multiply-imaged quasar systems [290] give Measurements of H 0 can also be obtained combining BAO and primordial deuterium abundances [291] (see more details in the review [292] and a compilation of the values of H 0 in [293]).
If the tension is not due to systematic errors in either of the surveys, it would indicate a mismatch between the low and high redshift distance ladders [294], which might be the first hint of the need to revise the standard cosmological model. Several partial solutions to the H 0 tension have been proposed, although no satisfactory solution exists. Extensions to ΛCDM have been studied, but no simple model seems to work: for instance, increasing the effective number of relativistic species by ∆N eff ≈ 0.4 eases the tension but enters in conflict with small scale Planck polarization [295], which has been confirmed in the latest Planck results. The role of dark energy (through w(z)) has also been investigated in connection with the H 0 tension: no equation of state evolution w(z) can reconcile all datasets, as long as GR holds (although the tension could be eased if BAO or SNe data are not included) [296]. Interacting DE eases the tension, particularly for a phantom-like equation of state with w ∼ −1.2 [297]. Some dark energy models beyond GR and with massive neutrinos have been proposed to ease the tension. Galileon gravity leads to a phantom-like equation of state (EoS) w < −1 [298,299]: adding massive neutrinos with total mass m ν ≈ 0.6eV yielded a good fit to both Planck and the direct H 0 measurement [300]. One should note that although the EoS of Galileons w Gal deviates significantly from w Λ = −1, massive neutrinos compensate part of the effect so that the total EoS w tot = p tot /ρ tot is more similar to ΛCDM (see bottom panel of Fig. 9). Still, this difference is enough to shift the present value of the Hubble parameter H 0 ≡ H(z = 0) to higher values (see upper panel of Fig. 9). A latter analysis, shown in Fig. 10, reproduced the result, but found a slight tension with the most recent BAO data [280]. Most importantly, the cosmologically viable Galileons were ruled out by GW speed [110] and weak lensing [301]. Note however that those data employed BAO reconstruction and Galileons are known to affect the non-linear BAO evolution [207], making it more conservative to use the unreconstructed data, for which no tension exists. Non-local gravity has similar features (cf. Fig. 10) but its less negative equation of state (compensated with m ν ≈ 0.3) leads to a reduced tension rather than close agreement [281].

V. GRAVITATIONAL WAVE SPEED
The speed of GWs is a fundamental property of any gravity theory. GR predicts that GWs propagate at the speed of light. However, alternative theories generically change this prediction. In contrast to (42), GWs in modified gravity do not have to travel on null geodesics of the background metric. One can parametrize the generalized propagation by A α1···αn k α1 · · · k αn = 0 .
Here, G µν is the effective metric over which GWs propagate, m g is the effective mass of the graviton and the tensors A α1···αn encode higher-order, wave-vector corrections. When time and space can be decomposed, the above expression leads to a generalized dispersion relation where k is the spatial modulus of the wave-vector and A n are the coefficients of the higher order corrections. Accordingly, we can see that the effective metric determines the propagation speed c g [302] while the higher order wave-vector corrections control Lorentz-violating modifications of the dispersion relation [303]. The mass term m g also modifies the dispersion relation [304]. In the following, we discuss the origin of and the constraints on these three different contributions. We will focus on constraints from late time GW sources. A modified dispersion relation for primordial GWs could be tested with the B-mode polarization of the CMB, as it has been studied for the case of the speed c g [305][306][307], and the mass m g [308][309][310].

A. Anomalous GW speed
In order to obtain the frequency independent propagation speed c g , one has to focus on the leading derivative terms for the second order action for the tensor perturbations h. At small scales and for arbitrary backgrounds, the action is determined by the effective metric G µν over which GW propagate [302] The effective metric can be further decomposed in a piece proportional to the original metric C and another not proportional D. Then, whenever the (non-conformal) second term is present, the GW-cone will be different from the light-cone and both signals will travel at different speeds (see Fig. 11). 9 In scalar-tensor gravity, two conditions have to be fulfilled to induce an anomalous propagation speed: i) there is a non-trivial scalar field configuration (if we want to explain DE, we typically demandφ ∼ H 0 ) and ii) there is a derivative coupling to the curvature. This highlights the presence of a modified gravity coupling that will lead D αβ ∼ ∂ α φ ∂ β φ. Whenever these two conditions are satisfied, c g = c and there would be a delay between the GW and the EM counterpart. For instance, differences of 1%, c g /c ∼ 0.01, for sources at 100Mpc induce delays of ∆t ∼ 10 7 years, clearly beyond human timescales.
Similar arguments can be applied to other gravity theories with additional degrees of freedom. Massive gravity and bigravity have a canonical kinetic term for the gravitons (due to the Einstein-Hilbert term) and thus GWs propagate at the speed of light. In vector-tensor theories there could be couplings to the curvature leading to an anomalous propagation speed, for instance R µν v µ v ν in vector DE [312]. Interestingly, in more complex vector theories, it is possible to have derivative couplings to the curvature through the field strength F µν but not induce an anomalous speed over cosmological backgrounds [124]. This is because in these theories it is possible to have cosmic acceleration while the background of F µν vanishes, thus violating condition i). One should notice that, when violating some of the initial assumptions, the propagation speed of GWs might not be subject to the background value of any additional field and just to the parameters of the theories. This is the case for instance of Hořava gravity [46].
Alternatively, a much more common strategy followed in the literature is to compute the speed of GWs directly in a given background, usually FLRW. For Horndeski theory this was done in [97,177]. The implications of an anomalous GW speed have been discussed for instance for purely kinetic coupled gravity [313], covariant Galileons [314] and models with self-acceleration [315,316]. The implications for cosmology were discussed in [221,317]. In vector-tensor theories, cosmological tensor perturbations have been computed for instance in [318,319].
Prior to the direct detection of GWs, there were indirect constraints on the speed of GWs. High energy cosmic rays from galactic origin set a stringent lower bound −2 · 10 −15 ≤ c g /c − 1 [320], due to the absence of gravitational Cherenkov radiation [321]. The reason is that if gravitons propagate slower than the speed of light, cosmic rays could decay into them and their signal would be lost. This lower bound affects Horndeski theory [313]. However, note that we are talking about very energetic gravitons, different from the low energy GW emission of an astrophysical compact binary. Moreover, the GW speed was indirectly constrained at the level of |c g /c − 1| ≤ 0.01 with the orbits of binary pulsar in the absence of screening of the cosmological solution [322].
With the detections of GWs from BBHs, the first direct constraints on the speed of GWs were placed [323,324]. The constraints were still not very strong, −0.45 ≤ c g /c− 1 ≤ 0.42, due to the uncertainties in the localization of the source and the low number of detections (3 at the time of the analysis). Detecting a GW with an EM counterpart changes completely the situation, leading to very precise measurements [304,325,326] Such a multi-messenger GW event was detected on August 17, 2017 with the BNS GW170817 [18]. The GW signal was followed by a short gamma ray burst (sGRB) only ∆t = 1.74 ± 0.05s after [25]. The source was localized at a distance of d L = 40 +8 −14 Mpc. In order to set the constraints, the LIGO-Virgo collaboration conservatively considered the source at the lowest distance of 26Mpc. For the upper bound, it was assumed that both the GW and the sGRB were emitted at the same time and that all the delay is caused by the faster propagation of the GW. For the lower bound, they assumed that the sGRB was generated 10s after the GW, order of magnitude expected in standard astrophysical models, and that the delay was reduced to 1.74s due to the slower propagation of the GW. In total, this led to the impressive constraint This result has profound implications for many gravity theories and dark energy models. In scalar-tensor gravity at least one of the conditions for an anomalous GW speed has to be broken. If we want the scalar field to keep playing a role in the cosmic expansion history, it cannot have a trivial scalar field configuration. Therefore, the only possibility to satisfy GW170817 is to break the second condition an eliminate derivative couplings to the curvature. For Horndeski theory (5)(6)(7)(8) this implies [110,111,327,328] Translating this result, only the simplest models such as quintessence, Brans-Dicke or Kinetic Gravity Braiding survive. On the contrary, models like Covariant Galileons, Fab Four, Gauss-Bonnet or some sectors of beyond Horndeski are ruled out. The fact that the pa-t r )) }∆t g µν q µ q ν = 0 G µν k µ k ν = 0 FIG. 11: Anomalous GW speed. Gravitational waves propagate on an effective metric G µν (blue) with a different causal structure than the physical metric g µν (red) [302]. The speed is derived as cg( k) = ω( k)/| k| where k µ = (ω, k) is the solution to G µν kµkν = 0. Note that the speed can depend on the propagation direction. It may also depend on the frequency (e.g. massive graviton or Lorentz violation), cf. (64).
rameter space has been drastically reduced has implications for cosmological constraints [194,223,329] and for large scale structure [330]. For vector-tensor theories the situation is very similar. In order to describe DE and to pass the GW test some couplings of the theory have to be eliminated [110,327], in particular G 4,Y ≈ 0 and G 5,Y ≈ 0 (see full action in Eq. (299) of [69]) The same happens for Hořava gravity where one has to impose ξ ≈ 1 or β kh ≈ 0 [331], which correspond to the conditions for the low-energy version of the theory or its Einstein-aether analogue respectively. The implications of GW170817 for other gravity theories have been extensively explored, for instance for doublycoupled bigravity [332], f (T ) gravity [333] or Born-Infeld models [334].

B. Mass term
A graviton mass, either effective or fundamental, modifies the propagation speed of GWs. However, contrary to the anomalous speed term c g , it does it in a frequency dependent way. This means that it can be constrained with GW observation alone, analyzing how the phase of the wave changes in time. The present bound from the LIGO-Virgo collaboration is [15] m g ≤ 7.7 · 10 −23 eV/c 2 .
Note that this bound is still far away from the cosmologically "motivated" m g ∼ H 0 10 −33 eV/c 2 . Since a graviton mass would also change gravity in other regimes, we can compare the GW bound with other tests. In particular, a massive graviton introduce a Yukawa potential that can be constrained with Solar System observations. This issue has been recently revisited [335], showing that the best bound comes from the perihelion advance of Mars, leading to m g < (4 − 8) · 10 −24 eV/c 2 , which is an order of magnitude better than present GW constraints.
For LISA, the GW bound could improve significantly, due to the lower frequencies and higher distances, possibly reaching m g < 10 −26 eV/c 2 [336]. In addition, there are proposals to bound m g measuring the phase lag of continuos sources of GWs and EM radiation with LISA binaries [337][338][339]. 10 For more details in other types of constraints, we recommend the recent review [142].

C. Modified dispersion relation
Similarly to a graviton mass, Lorentz violating terms modify the dispersion relation in a frequency dependent way. Different wavelengths thus travel at different speeds, modifying the time evolution of GW phase. The effects of the new terms A i in the dispersion relation (64) can be systematically parametrized in modifications of the waveform [303]. A typical example of a Lorentzviolating theory would be high-energy Hořava gravity [47] in which where κ h and µ h are parameters of the theory [340]. From the first two events, GW150914 [13] and GW151226 [14], one can already constraints several theories as detailed in Ref. [341]. For Hořava gravity, one can constrain the combination of parameters κ 4 h µ 2 h , which were not bounded previously. GW170104 [15] and GW170817 [25] have also been used by LVC to constrain the different A n .

D. Equivalence principle
The fact that GWs and EM radiation from GW170817 arrived almost simultaneously at Earth after approximately 100 million light years of travel tells us that both signals follow very similar geodesics. This statement can be made precise in terms of the Shapiro delay [342]. The Shapiro delay measures the difference on arrival time of a massless particle in flat and curved space-time. This can be computed parametrizing the integral of the gravitational potential U (r) over the line of sight [343] where r e and r o are the positions at emission and observation. The prediction of GR is that γ = 1 for any massless particle. This has been tested to very good precision for photons, γ em − 1 ≤ (2.1 ± 2.3) · 10 −5 , using the Cassini space-craft [344]. This is one of the most stringent Solar System test of gravity and implies that in these scales the gravitational potential should be very similar to GR as discussed in detail in the review [6]. Now, the multi-messenger observation of GW170817 allow us to test if GWs and EM radiation feel the same gravitational potential. In other words, this is testing the equivalence principle. In order to get a bound on the relative difference of γ gw and γ em one needs to know the gravitational potential between the BNS and the detectors. A conservative bound can be placed introducing only the effect of the Milky Way to arrive at [25] −2.7 · 10 −7 ≤ γ gw − γ em ≤ 1.2 · 10 −6 .
This constraint has implications for instance for theories in which the dark matter arises from a non-minimal matter coupling to gravity, the so-called dark matter emulators [345]. If both types of waves propagate in the same effective metric, no relative difference is present as it has been argued for the case MOG gravity [346].

VI. GRAVITATIONAL WAVE DAMPING
Apart from the speed of GWs, the other main observable from the modified propagation is the luminosity distance of GWs d gw L . For theories in which c g = c, the GW luminosity distance (57) is related to the EM luminosity distance d em L by d gw where ν is the additional friction term from modifying gravity, cf. (55). Therefore, one can probe the damping of GWs using standard sirens, since for those multimessenger observations both d gw L (z) and d em L (z) are measured [347]. Moreover, even without an EM counterpart, any additional friction for the GWs could be probed using GW source counts [348].
A paradigmatic example of a modification of gravity in which the GW luminosity distance differs from the EM one is adding extra dimensions [347]. In extra dimension theories, for instance DGP, there can be a large distance leakage of the gravitons into the additional dimensions. This means that, as a net effect, an observer will receive less gravitons or, in other words, the gravitational signal will be dimmer. By dimensional analysis, the GW luminosity distance scales in these theories as where D refers to the number of space-time dimensions in which the graviton can propagate. For D = 4, one recovers the GR result d gw L = d em L . In cases in which the graviton can only travel in the extra dimensions above a certain screening scale R c , the previous relation generalizes to where n measures the transition steepness.
In scalar-tensor gravity it is also known how the GW luminosity distance will evolve. The additional friction is equal to the effective Planck mass run rate α M where M * is the effective Planck mass, i.e. the normalization of the kinetic term of the tensor perturbations. Then, recalling the redshift definition 1 + z = a 0 /a, one arrives at where M * (0) and M * (z) are the effective Planck masses at the time of observation and emission respectively. Assuming that there is no screening and taking α M constant, one could rewrite this expression as [244] d gw For this case, the implications of measuring α M for Horndeski cosmology have been discussed in [221,315]. The prospects of constraining the time variation of the Planck mass has been investigated for aLIGO in [244] and for LISA in [349]. For illustration, we plot in Fig. 12 how the ratio d gw L (z)/d em L (z) would vary in Brans-Dicke depending on the coupling ω BD , cf. Eq. (3).
Another theory in which the GW luminosity distance has been investigated is the non-local, RR-model. For this model, one finds [350] d gw where the effective Newton constant is related to the parameters of the theory withV (z) being the background evolution of the auxiliary field and γ = m 2 /(9H 2 0 ) linked to the mass of the conformal mode m (see details in the review [281]). Thus, the growth of structure is directly related to the GW propagation. This behavior is also reproduced in some Horndeski models [351]. Differently to the scalar-tensor case, there is no screening for these non-local models. One should note also that the strength of the modification of the GW luminosity relation is sensitive to the initial conditions of the auxiliary fieldV (z), which depends on the unknown early universe physics.
With the detection of the multi-messenger event GW170817 it was possible to test the gravitational Hubble diagram for the first time. The observation was consistent with GR although being just one event the constraining power is still moderate. For theories with extra dimensions, it was found that the number of space-time dimensions in which the gravitons propagate is limited to [352] for SN or CMB prior in H 0 (see Fig. 8 and Sec. IV A). Moreover, the additional friction ν can only be loosely constrained [329] In order to connect this result with the previously discussed theories recall that for scalar-tensor gravity ν = α M and for the non-local, RR-model ν = −2δ [350]. An important remark when evaluating the GW luminosity distance in modified gravity is that it will not only be altered with respect to GR due to the modified propagation of GWs but also because the cosmological expansion history is different. In other words, in alternative theories of gravity both the EM luminosity distance d em L and its relation with the GW luminosity distance can be modified, due to a different H(z) and to an additional friction ν respectively. In fact, the contribution of the the modified propagation can dominate over the modified cosmic expansion history. Introducing a phenomenological parametrization of the GW luminosity distance [353] d gw L (z) d em L (z) together with the usual (w 0 , w a ) parametrization of H(z), it was shown that the largest contributions are Ξ 0 and w 0 . The prospects of measuring Ξ 0 with the Einstein telescope were also considered in [353].

VII. ADDITIONAL POLARIZATIONS
Apart from the modified GW propagation, the other main GW effect of theories beyond GR would be the emission of additional polarizations. We have seen that observing the orbits of pulsars already severely constrains the gravitational energy loss to follow GR. Now, GW astronomy enables to directly probe these extra modes. For this test, the basic role of multi-messenger events is improving the localization and breaking degeneracies with the orientation. 11 With direct GW observations, the emission of additional polarizations can be constrained from the modifications of the waveforms. For instance, with the first two events it was possible to limit the presence of scalar hair [341]. However, there are still degeneracies between the modified GW phase and the spin and mass parameters that weaken the constraints. This is the case of Einstein-dilaton-Gauss-Bonnet [354] and dynamical Chern-Simons gravity [355], archetypical examples of theories studied in numerical relativity [356,357].
Moreover, there are also searches for direct signals of non-tensorial polarizations, analyzing the GW geometry through the projection of the different polarizations A ij (54) onto the detector's network. Since the two LIGO interferometers Hanford and Livingston are basically coaligned, they maximize the SNR of the detection but are insensitive to polarizations. This situation changes with the incorporation of Virgo. From the observation of GW170814, a three detector BBH signal, pure tensor polarization were favored over pure vector or pure scalar modes [17,358]. However, this was just a simplified analysis and the LIGO-Virgo collaboration is performing a more intensive study including mixed-polarization, which would be a more realistic setup. In the future, these constraints will improve with the switch on of the Japanese detector KAGRA and aLIGO India (see Fig. 2). Nevertheless, one should note that quadrupolar detectors like aLIGO and aVirgo cannot distinguish between the breathing and longitudinal scalar modes (see Fig. 6).
In addition, it will be possible to test additional polarizations with continuous GW sources such as pulsars [359]. No signal has so far been detected [360,361], although only the first run has been analyzed because of the costly computational analysis.
Finally, observing the stochastic backgrounds of GWs can probe as well non-GR polarizations. Such background is composed of individually unresolved sources. Since the signal is received from different points in the sky in a continuous manner, it allows a direct measurement of the polarization from the spectral shape of the background [362]. No stochastic GW background has been detected yet, placing limits on the stochastic background from tensor, vector and scalar polarizations [363].

A. Gravitational wave oscillations
Interestingly, these extra modes might mix with the GR polarizations h +,× . Over cosmological backgrounds, tensor polarizations can only mix with other tensor modes by symmetry arguments. The simplest example of a theory with two metric perturbations is bigravity. In analogy with neutrino oscillations, the difference between the mass and propagation eigenstates in bigravity leads to GW oscillations [246][247][248]. Assuming that GWs are emitted as in GR, these oscillations during the propagation introduce a modulation of the GW amplitude. Thus, depending on the mixing angle and the mass of the graviton, there will be oscillations in the GW luminosity distance as a function of redshift. We plot different examples in Fig. 13. Present ground-based detectors are sensitive to masses m g ∼ 10 −22 eV. The mixing is maximized at an angle θ = π/4 (recall (58)). In principle, for several multi-messenger events at different redshifts one could trace these oscillations [247]. Moreover, with space-based detectors like LISA one could reach a thousand times smaller masses. Interestingly, since both perturbations travel at different speeds due to the mass term of one of them, it is possible that they decohere ending traveling independently and arriving at different times. GW detectors will then see an echo signal. This allows to further constraint the parameter space of bigravity [248].
Finally, we should stress that GW oscillations are not a unique property of bigravity. For instance, gravity theories with gauge fields in an SU(2) group have effectively two metric perturbations as well, leading to the same phenomena [249,364]. This can happen in different classes of vector-tensor DE models too [124].

VIII. THEORETICAL IMPLICATIONS
Present GW observations place severe limits on deviations from GR. Among the different constraints, the most stringent ones are the propagation speed equal to the speed of light and the absence of emission of additional polarizations. The key question is then within the set of theories passing present tests, what interesting phenomenology is still possible?
Of course, we do not have a complete answer to this question. In the following, we survey different proposals of viable theories and highlight some lessons we have learned in light of current bounds.
A. cg = c Before considering which theories are compatible with present constraint on the speed of GWs, it is important to discuss how far reaching this new measurement is. The first thing to note is that due to the closeness of the BNS, the constraint only applies basically to present time. Therefore, one could envision a situation in which the speed of GWs was different from the speed of light at early times but due to the cosmological evolution at present time c g (z = 0) = c. However, one should be careful about this argument for several reasons. First, the level of precision of c g /c − 1 requires the cosmological evolution to be tuned at the level of 1 part in 10 15 . One way around this argument is to have c g (z = 0) = c as a late time cosmological attractor. An example of this is Doppelgänger DE [365], where a coupling between DM and DE allowed for this attractor to exist. Still, if the derivative couplings to the curvature leading to the anomalous speed remain present in the action, there are reasons to worry [110]. This is because although the cosmological evolution might lead to the precise cancellation of the dangerous terms, there will be deviations from the cosmological background along the path of the GWs towards the detector, for instance, when they cross the Milky Way.
A second remark is that constraints in the dispersion relation only apply to the characteristic wave numbers of the compact binary systems detected so far. These modes are characterized by k gw H 0 . As a consequence, in a phenomenological approach, one could envision modifications of the dispersion relation only arising at cosmo-logical scales [366], for instance This could, in principle, lead to modified gravity effects at large scales not affecting present GW constraints. However, in practice, only theories with non-local couplings or higher derivative interactions with ghost degrees of freedom are known to have this dispersion relation. It would be interesting to study in depth the theoretical framework allowing for this modified propagation. Related to this point, one should note that the frequency of GW170817 was close to the typical strong coupling scale of the EFT of DE Λ strong ∼ (M pl H 2 0 ) 1/3 ∼ 260 Hz. If the cutoff of the theory is of the order of the strong coupling scale M cutoff ∼ Λ strong , as it is usually assumed, higher dimensional operators might modify the dispersion relation although one would not expect that they conspire to completely cancel the anomalous speed at the level of O(10 −15 ) [111]. In the case in which the cutoff scale is parametrically smaller, M cutoff Λ strong , the situation could be different [367]. Theories with a Lorentz-invariant ultra-violet (UV) completion are presumed to have luminal GW propagation. Therefore, one would expect higher dimensional operators to erase any anomalous speed beyond the cutoff scale, which in this case might already happen in the LIGO band. However, the speed of GWs cannot be computed beyond M cutoff if the UV completion is unknown. In any case, the hypothesis that higher dimensional operators render c g (k LIGO ) = c could be tested detecting GWs at different frequencies, for example with LISA (see Fig. 7). This might give us valuable information about the cutoff scale of the effective theory of DE.
Another lesson from GW170817, as it was discussed in Sec. V A, is that the effective metric for GWs is proportional to the effective metric for EM radiation. In other words, the GW-cone and the light-cone are the same. This fact suggests two ways to construct theories with c g = c [110]. On the one hand, one could start with a theory in which GWs propagate at the speed of light an apply a conformal transformationg µν = Ω 2 (φ, X)g B µν to the gravity sector 12 . Then, one would automatically arrive to a theory with c g = c. In the case of scalar-tensor gravity, if one applies this recipe to GR, one arrives at the kinetic conformal theory (10), which was the first extension beyond Horndeski [107]. On the other hand, one could begin with a theory with c g = c and apply a disformal transformation [108], engineered to compensate the anomalous speed. This is because the term D not proportional to the metric can modify the causal structure unlike the conformal term Ω 2 . This is clear when computing how the speed of GWs would transform [110] c 2 g = c 2 g (X) 1 + 2XD , where c g is the speed of tensors of the original gravity theory and −2X =g µν φ ,µ φ ,ν . 13 In this case, starting with Horndeski theory, one would arrive at the subclass of GLPV theory [112] characterized by c g = c. In terms of the free functions in the action (11), one needs to impose B 4 = G 4,X /X. Satisfying this constraint, concrete DE models have been proposed [368]. In the context of DHOST theories, this constraint implies A 1 = 0. Something to note is that after GW170817, DHOST has the same number of free functions in the action as Horndeski had before the constraint on the speed of GWs, i.e. four free functions of φ and X that could be counted as K(φ, X), G 3 (φ, X), G 4 (φ, X) (with B 4 = G 4,X /X) plus the conformal redefinition Ω 2 (φ, X) (cf. (5)(6)(7)11,10)). One may worry whether the conditions for c g = c are stable under quantum corrections. If they are not, one would need to tune the GW speed order by order in perturbation theory. Using the results of [369,370] linking the properties of Horndeski with those of Galileons, it was argued in Ref. [111] that the quantum corrections to the EFT coefficients are negligible, O(10 −40 ), even compared to the 10 −15 constraint in c g /c − 1. Thus, the tree-level condition is not modified.
Within the scalar-tensor theories compatible with the constraint on the speed of GWs, there have been extensive efforts to explore interesting phenomenology. One immediate question is whether the survival theories can provide accelerated expansions at late times without a cosmological constant as covariant Galileons were providing. This possibility was investigated in the context of DHOST theory [371]. It was found that indeed there are scaling solutions with a late time de Sitter behavior. Still, a full comparison with present cosmological observations is missing due to the lack of appropriate Boltzmann codes for these higher-derivative theories.
Another attractive feature of Horndeski gravity was the possibility to have self-tuning solutions [104,372]. This was an attempt to solve the cosmological constant problem by counterbalancing the large bare vacuum energy with the energy momentum of the scalar field. However, Fab Four models realizing this behavior predict an anomalous GW speed. Now, beyond Horndeski models with c g = c could also exhibit self-tuning. Indeed, an infinite set of self-tuning models compatible with GW170817 were found in [373]. Again, a detailed cosmological analysis is left for future work.
In the realm of Horndeski gravity, one could search for other definite predictions. In addition to the condition in the speed of GWs (α T = 0) one could impose that the gravitational strength coupling to matter is the same that the one to light, G matter = G light (or α B = −2α M ). This model, named no slip gravity [351], has the property of predicting that gravity should be weaker than GR in the late universe. This could be tested with growth of structure observations in the next generation galaxy redshift surveys.

B. Compact objects
Present observations severely constrain deviations from GR at small scales. Screening mechanisms are thus needed to surpass these bounds [374][375][376][377]. Modified gravity theories can display different types of screening mechanisms (see reviews [10,11]). For theories with derivative interactions this is achieved with the Vainshtein mechanism, which screens the fifth force when the local curvature is larger than a given threshold. Such mechanism has been extensively studied for Horndeski theory [378][379][380]. For theories beyond Horndeski of the GLPV class the screening works similarly outside the source, but there is a breaking of Vainshtein screening inside matter [381]. This hints astrophysical systems such as neutron stars as targets to test these theories [382,383].
The question then is whether the viable scalar-tensor theories (in light of GW tests) can display a successful screening and if there are any observational signatures to test them. This was addressed by different groups soon after the announcement of GW170817 [384][385][386]. One should note that many models in which the breaking of Vainshtein screening was studied previously are incompatible with c g = c. Still, these recent analyses show that within DHOST theories satisfying the constraint in the speed of GWs screening is effective outside nonrelativistic bodies but there could be a breaking inside matter as well. This deviation from GR inside compact bodies is only predicted for theories beyond Horndeski. Comparing with the previous GLPV analysis, the weakening of Vainshtein screening inside matter in DHOST theories has a different form with an additional term not present before.
Moreover, the emission of additional polarizations is highly constrained as well. Depending on the gravity theory, compact objects might emit extra radiation (see [387] for a review on no-hair theorems). An interesting question is if cosmologically relevant theories compatible with the bound on the speed of GWs can exhibit scalar hair in the black-hole solutions. In [388] it was found under these conditions only very little or non scalar hair is possible. Analysis of black-hole solutions including screening effects have not been studied so far. Strong field effects are yet possible in theories not aimed at ex-plaining cosmology, for instance spontaneous scalarization in neutron stars [389] or even in black-holes [390] (see more details in the extensive review [391]). However, one should note that this kind of solutions may also induce an anomalous propagation speed due to the spatial scalar field profile [392]. Possible constraints from this effect should be investigated further.

IX. CONCLUSIONS AND OUTLOOK
Gravitational wave astronomy has opened a new window to test gravity and dark energy. Multi-messenger probes are specially promising for this task. The first detection of GWs from a binary neutron star merger, GW170817, was followed up by several EM counterparts. This has provided an independent, standard siren measurement of the Hubble constant H 0 . Moreover, GW170817 already constrains large classes of DE models. In particular, the bound on the speed of GWs was significantly strong. Other multi-messenger tests of DE are possible, such as probing the GW luminosity distance or searching for additional polarizations. These tests will become more relevant in the future when more events will be available. Still, there remain important challenges in this GW program to probe DE.
From the observational side, it will be crucial to achieve a global synergy in the quest of multi-messenger GW astronomy. On the one hand, GW detectors will have to improve their sensitivity and enlarge the network to detect more events and localize them better. On the other hand, observatories around the world should be available to follow-up triggers. Moreover, improved galaxy catalogues might be necessary to maximize the chances of localization. Lastly, cross-correlations between GWs and other cosmological probes might be an interesting endeavor.
From the theoretical side, the main challenge will be to analyze the GW propagation over non-cosmological backgrounds, understanding the possible interplay of additional polarizations. This will be relevant for instance for GWs traveling through a screened region. Further-more, degeneracies between modified gravity predictions and astrophysical properties should be studied in more detail. For example, possible signatures of phenomenology beyond GR in neutron stars could possibly be the same as modifications of the equation of state.
Altogether, the future of multi-messenger GW astronomy appears promising. In the coming years standard sirens will be routinely detected and we will be able to apply the different GW test of gravity to a much higher precision. The new techniques brought by GW astronomy will bring us closer to unveil the nature of dark energy.