Hearing gravity from the cosmos: GWTC-2 probes general relativity at cosmological scales

Gravitational-wave (GW) catalogs are rapidly increasing in number, allowing for robust statistical analyses of the population of compact binaries. Nonetheless, GW inference of cosmology has typically relied on additional electromagnetic counterparts or galaxy catalogs. I present a new probe of cosmological modifications of general relativity with GW data only. I focus on deviations of the GW luminosity distance constrained with the astrophysical population of binary black holes (BBHs). The three key observables are 1) the number of events as a function of luminosity distance, 2) the stochastic GW background of unresolved binaries and 3) the location of any feature in the source mass distribution, such as the pair instability supernova (PISN) gap. Despite a priori degeneracies between modified gravity and the unknown evolution of the merger rate and source masses, a large damping of the GW amplitude could be falsifiable since as redshift grows it reduces the events and lowers the edges of the PISN gap, which is against standard astrophysical expectations. Applying a hierarchical Bayesian analysis to the current LIGO--Virgo catalog (GWTC-2), the strongest constraints to date are placed on deviations from the GW luminosity distance, finding $c_{_M}=-3.2^{+3.4}_{-2.0}$ at $68\%$ C.L., which is $\sim10$ times better than multi-messenger GW170817 bounds. These modifications also affects the determination of the BBH masses, which is crucial to accommodate the high-mass binary GW190521 away from the PISN gap. In this analysis it is found that the maximum mass of $99\%$ of the population shifts to lower masses with increased uncertainty, $m_{99\%}=46.2^{+11.4}_{-9.1}M_\odot$ at $68\%$ C.L. Testing gravity at large scales with the population of BBHs will become increasingly relevant with future catalogs, providing an independent and self-contained test of the standard cosmological model.


I. INTRODUCTION
The first three observing runs of advanced LIGO [1] and Virgo [2] have seen a rapid growth in the number of gravitational wave (GW) detections [3,4] indicating that the field will soon transition to the era of population analysis -where outliers will flag new phenomena, but the core science will arise from statistical analyses of many events. The current catalog of the LIGO-Virgo Collaboration (LVC) is known as  Preparing in advance, population studies are already central to the LVC astrophysical program [5,6]. Among many interesting findings, GWTC-2 has shown support for the theory of pair instability supernova (PISN), which predicts a mass gap in the mass distribution of black holes between ∼ 50 − 120M [5][6][7]. In their analysis only 2 +3.4 −1.7 % of binary black holes (BBH) have primary masses above 45M [6]. Another key observable is the merger rate history [8], which according to GWTC-2 is probably growing with redshift, but not faster than the star formation rate [6].
Although present astrophysical uncertainties play a crucial role in the interpretation of GW catalogs, population studies are not limited to modeling the source population. A good example are constraints on the cos- * Electronic address: ezquiaga@uchicago.edu; NASA Einstein fellow mic expansion from the location of the lower and upper edge of the PISN gap [9,10]. In general, mass distribution information allows to probe different background cosmologies [11]. Beyond testing cosmological parameters, I will show that astrophysical population analyses can probe one of the pillars of the standard model of cosmology, namely, the validity of general relativity (GR) at large scales.
Gravity can be tested with GW number counts [12], looking for deviations to the universal signal-to-noise ratio (SNR) distribution [13,14]. Nonetheless, such universal relation is only valid if the merger rate does not evolve with redshift. Therefore, this test only applies to the low-redshift universe [14]. However, cosmological modifications of the GW propagation are most relevant at high-redshifts since they accumulate over long travel distances. Some of these theories have been proposed to solve the H 0 tension [15,16]. GW catalogs alone also probe waveform distortions [17,18], GW lensing beyond GR [19] and birefringence [20].
Tests of gravity have seen a proliferation in light of multi-messenger GW astronomy [21]. If a prompt counterpart is detected, the speed of GWs can be constrained [22,23], as beautifully exemplified with GW170817 [24][25][26][27]. Moreover, by directly measuring the source redshift one can infer its electromagnetic (EM) luminosity distance (assuming a cosmology) and test differences w.r.t. the GW luminosity distance [28][29][30][31]. After GW170817 [32,33], constraints on GR deviations were set to c M = −9 +21 −28 at 68.3%C.L [33], where c M = 0 defines GR (this parameter will be introduced later). 1 Alternatively, one can use the GW localization volume to statistically infer the redshift with galaxy catalogs [36]. Recent analyses of GWTC-2 find Ξ 0 = 1.88 +3.83 −1.10 at 68.3%C.L for B-band and completeness threshold P th = 0.2 [37], where GR is Ξ 0 = 1. Despite the great promise of these multimessenger tests, their applicability heavily relies on the number of bright multi-messenger mergers and the completeness of galaxy catalogs [37]. In contrast, this proposal only relies on GW data and can be considered as a guaranteed test.

II. BBH POPULATION AND MERGER RATES
BBHs merge along the history of the universe following a comoving merger rate R(z). This quantity is highly model dependent and to present day mostly unknown. As a working hypothesis all BBHs will be assumed to be remnants of stars. Thus R(z) should be negligible at high redshift, before the peak of star formation z p . To accommodate this astrophysical prior, I follow the parametrization [38] which peaks around z p and has a slope towards z p and after controlled by α and β respectively.
Analyses of multiple populations together [39] are left for future work.
To compute the number of detections one needs to include selection effects -how probable is to detect a binary with given intrinsic parameters. Following [6], I take a broken power-law model for the primary mass p(m 1 ) ∝ m −κi 1 with sharp cutoffs at m min and m max . The transition between the two slopes κ 1 and κ 2 occurs at a breakpoint m break (see App. A for details). For the secondary mass I assume a uniform distribution between m min and m 1 . Then, the selection effects will be encapsulated in the probability of detection p det , which depends on the redshift and masses of the binary together with the detector network sensitivity. Altogether, the detection rate per redshift and component masses is where V c is the comoving volume. Spin priors follow [6]. Noticeably, if the source population and background cosmology are fixed, any modification in the number of events has to arise from the selection bias p det . Precisely, modifications of gravity will change the SNR affecting the probability of detecting binaries at different redshifts.

III. PROBING COSMOLOGICAL MODIFICATIONS OF GRAVITY
Assuming that the emission and detection of GWs follows GR and that there are no additional tensor fields or chirality, beyond GR corrections can be encapsulated in the propagation equation for both polarizations h +,× . Three possible modifications can occur: an anomalous propagation speed c g = c, a modified dispersion relation ∆ω 2 = 0 and a change in the GW amplitude when ν = 0. Relevantly, c g has already been strongly constrained by GW170817 [40] and the modified dispersion relation can be probed directly searching for waveform distortions [18]. 2 For these reasons I concentrate on ν which uniquely determines (when c g = c [30]) the relation between the GW luminosity distance d gw L and the EM luminosity distance d em L : In GR, d gw where the last equality assumes flat cosmologies. The background cosmology is fixed to Planck2018 [42].
Motivated by cosmological modifications of gravity which aim at explaining the present accelerated expansion, I will assume that the additional friction scales with the dark energy where c M is a constant. Since the SNR scales inversely with the luminosity distance, ρ/ρ gr = d em L /d gw L , this leads to which modifies p det in (2). Of course, other parameterizations are possible [43], being interesting to track directly d gw L /d em L with the (Ξ 0 , n) model [29], but this is left for future work.
Under these assumptions, the modifications of d gw L are described in very simple terms. If c M is positive, d gw L will be larger and the overall signal will be quieter. On the other hand, a negative c M reduces d gw L amplifying the GW. Similarly, the higher the redshift, the more important any of these effects become. Although modifications of d gw L are a priori degenerate with arbitrary R(z) and p(m 1 , m 2 ), e.g. a louder signal could be interpreted as a closer or heavier source, beyond GR effects can lead to signatures that are against standard astrophysical expectations as I discuss next.

A. Detection rates
If the cosmological propagation systematically changes the GW amplitude, it is easy to understand that this will affect the number of detections and how far one can hear them. This is explicit in Fig. 1 where the detected redshift distribution is plotted for present sensitivities. If c M 1, then only a fraction of the expected GR events are observed. On the opposite end, if c M −1 one detects events much further, eventually observing the entire population. This plot suggests that the shape of p(z|detected) can have constraining power on modifications of d gw L . Of course, any meaningful bound has to be placed allowing to vary the other unknown parameters of the model, as it will be done with the Bayesian inference.
Despite the observed redshift distribution being correlated with the merger rate evolution, modification of d gw L can produce unexpected results by astrophysical priors breaking some of these degeneracies. For example, a decreasing rate of events with redshift before z ∼ 1 would conflict with BBHs following the star formation rate, which is known to increase up to z ∼ 2 [44]. This could serve to constrain c M 1. Similarly, a modulation of the number of events with redshift would be astrophysically highly unexpected, but possible if GWs mix with other tensor fields introducing an oscillatory pattern in d gw L [45]. The latter nonetheless goes beyond the parametrization in Eq. (4).

B. Stochastic GW background of unresolved binaries
Even though present detectors are only sensitive to relatively low-redshift events, the (non)-observation of the stochastic GW background (SGWB) produced by unresolved binaries provides valuable information. In fact, as shown in Fig. 1, unless c M −1, only a small fraction of all mergers are being detected. The SGWB has the advantage that their sources are at higher redshift and thus more sensitive to modifications of d gw L . The energy density of the SGWB can be computed summing over the energy flux emitted by all non-detected events, as determined by 1 − p det . The dimensionless energy density Ω gw scaling, including the inspiral phase only, is where the detailed derivation is deferred to appendix B since it follows closely the classical result [46]. Interestingly, the ratio of luminosity distances appears quadratically in Ω gw . Although not included here, Ω gw could constrain also modifications of the GW emission [47][48][49].
Deviations in d gw L do not alter the typical f 2/3 spectral shape of Ω gw . However, they shift the turnaround point of the spectrum at f > 100Hz. A positive c M moves the maximum to higher frequencies because the quieter sources behave as lighter ones reducing the effective minimum mass of the population. The peak of Ω gw is, unfortunately, beyond ground-based detector sensitivities preventing the detection of this possible signature of modified gravity.

C. Source mass distribution
Modifying d gw L will bias the inferred source masses. This is particularly relevant when the distribution of masses p(m 1 , m 2 ) has a distinct mass scale, since this will break the degeneracy with the modified d gw L . For BBHs, PISN theory sets two reference scales: the edges of the gap. A d gw L beyond GR will change the inferred location of the gap as exemplified in Fig. 2. Negative c M moves the PISN gap to higher values and vice-versa. This is because c M < 0 allows to expand the horizon redshift and apparently massive events could be just at higher redshift. In particular, the primary, source mass posterior of GW190521, the most massive event so far [50], shifts according to the sign of c M as displayed in the right side of Fig. 2. Negative c M can place GW190521 below the gap and large positive c M above it, in the "far side" [10].
These results are complementary to [51] where local modifications of gravity changing the PISN mass gap in the source population were studied. Here the modified propagation changes the inferred mass gap location, but not p(m 1 , m 2 ). In addition, this idea could be extended to other source populations. For instance, binary neutron stars (BNSs) have a narrow mass distribution that could constrain d gw L when tidal effects identify the compact object as a neutron star [52]. Similarly, if the BNS merger rate was measured with EM observations, next generation GW detectors could tightly constrain c M since R(z) would be fixed [53].
Throughout this analysis the mass distribution is assumed to be constant in time, but modifications of d gw L would be roughly equivalent to change the source masses at different redshifts bym i (z) ≈ (d em L /d gw L ) 6/5 m i . Thus, a c M > 0 emulates decreasing the maximum mass with redshift. This contradicts astrophysical expectations of m max increasing with redshift due to the decrease in metallicity [54]. Therefore, the evolution of the edges of the PISN gap could be a key determinant to test modifications of gravity. Recently, [55] have shown evidence of an increase of m max for a power-law model.

IV. CONSTRAINTS FROM GWTC-2
To test cosmological modifications of gravity with current data, I develop a hierarchical Bayesian pipeline. Since this statistical framework is by now widely used in the GW community, details are presented in App. A. The key differences with the standard analysis are: With these considerations at hand, I analyze GWTC-2, using the same detection threshold as the LVC [6]. The parameters are: c M for the modification of gravity, {R 0 , α, β, z p } for the merger rate history and {κ 1 , κ 2 , m max , b} for the broken power-law mass distribution. The minimum mass of the population is fixed to 5M . Prior choices are specified in App. A.
The main results are summarized in Fig. 3 where the posterior distribution for c M , α and m 99% are presented. m 99% corresponds to the maximum mass of 99% of the events and can be derived directly from the posteriors of the mass distribution. There are several important results. First, the modification of gravity can be tightly constrained with BBH data only to c M = −3.2 +3.4 −2.0 at 68% C.L. This is ∼ 10 times better than current multimessenger constraints [33]. Second, the merger rate slope is still likely positive (α > 0 at 65% probability) but its uncertainty increases w.r.t to the LVC results. This is due to the degeneracy between c M and α. Finally, m 99% shifts to smaller masses with larger errors, m 99% = 46.2 +11.4 −9.1 M at 68% C.L., when compared to the LVC uniform-in-comoving-volume (α = β = 0) results (m 99% = 57.8 +12.5 −8.7 at 90% C.L. [6]). Therefore, allowing for a modified GW propagation makes GWTC-2 lean towards the theory of PISN.
In addition, it is found that the current non-detection of the SGWB does not impose stronger constraints on c M than individual events. Moreover, GWTC-2 has not enough high-redshift sources to constrain β and z p in the parametrization of R(z). Constraints on the other mass distribution parameters do not change significantly w.r.t. to the LVC results [6]. For completeness, full posterior samples are presented in Fig. 4 of App. C. It has been verified that the inferred model is consistent with the observed data performing a posterior predictive check analogous to [6].
Although the bounds on c M are subject to BBH population modeling, the parametrization chosen is flexible enough so that, under the astrophysical origin assumption, these results are robust. The fact that there is a preference for c M < 0 is consistent with astrophysical expectations that the PISN gap might increase with redshift [55]. Bounds on c M > 0 are driven in part by the PISN-motivated prior on m max ≤ 100M . An exploration of different population models/priors will be addressed in the future. When restricting to scalar-tensor theories, these GW constraints are comparable to present bounds from cosmological data [56], although future survey expect to improve the latter by several orders of magnitude [57]. LISA standard sirens could complement LSS bounds [58].

V. FUTURE PROSPECTS
GW observations contain a wealth of information about our universe. In this letter I have proposed a new probe of gravity at cosmological scales using the population of BBHs. This test requires GW data only and is thus a guaranteed output of any present or future GW catalog.
Applying a hierarchical Bayesian analysis to GWTC-2, I find that current BBH observations constrain gravity more strongly than multi-messenger observations from GW170817, with overall results being consistent with GR. This is because modifications in the GW luminosity distance dramatically alter the inferred redshift and source mass distributions. In particular, deviations in d gw L w.r.t. GR shift characteristic scales in the source masses as the expected PISN mass gap. The effect of damping the GW amplitude with redshift (c M > 0) is particularly falsifiable since it leads to rates and PISN mass gap edges that decrease with redshift, which is against standard astrophysical predictions.
This analysis can be extended to incorporate other parameterizations of d gw L , being particularly interesting to test GW oscillations imprinting modulations in the observed redshift distribution [45]. Similarly, including waveform distortions due to a modified dispersion relation could provide additional constraints beyond GR theories [59]. Although the background cosmology has been fixed throughout the analysis, BBHs observations can also be used to constrain H 0 and Ω m [11]. A background and perturbation analysis would in principle be possible, but probably only for the scope of next generation detectors.
This d gw L test can also be applied to other BBH populations, as those LISA will hear from space [60]. From all LISA sources, the ones at higher redshift such as extrememass-ratio inspirals and super-massive black holes would be more interesting. Lensing effects could be incorporated in a similar way, with the probability of detection modified by the optical depth.
BBH observations have proven to be a powerful test of gravity at cosmological scales. Future GW observations will only improve our understanding of the cosmological model.  [61] for the MCMC and corner [62] to present the posteriors. I am supported by NASA through the NASA Hubble Fellowship grant HST-HF2-51435.001-A awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS5-26555. I am also supported by the Kavli Institute for Cosmological Physics through an endowment from the Kavli Foundation and its founder Fred Kavli. This research has made use of data, software and/or web tools obtained from the Gravitational Wave Open Science Center (https://www.gw-openscience.org/), a service of LIGO Laboratory, the LIGO Scientific Collaboration and the Virgo Collaboration.

Appendix A: Statistical analysis
In this appendix I summarize the hierarchical Bayesian pipeline developed in this analysis (see e.g. [63] for a general discussion of this statistical framework). The first step is, of course, Bayes theorem. The posterior distribution of a given set of parameters Λ describing a given population of BBHs follows from where p({d i }|Λ) is the likelihood of obtaining N obs GW events with data {d i }, while π(Λ) are the prior expectations on Λ. Information about the stochastic background can also be included by the likelihood product setting for example that the SNR of Ω gw should be less than 2 during O3a. However this will not be included in the final results since it is found that it does not constrain more than individual events. The likelihood of resolvable events can be described by a Poissonian process where ξ = N det /N bbh is the ratio between the expected detected mergers N det and the actual merger N bbh . Note that the data likelihood, p(d i |φ), given the GW parameters φ, is not directly accessible. Instead there are only the event posteriors samples p(φ|d i ) to which it is necessary to factor out the prior used in the parameter estimation π pe (φ). The main observables are the inferred redshifts and source masses, thus φ = {z, m 1 , m 2 }. Remember that, as explained in the main text, these three quantities depend on c M and are derived from the observed data of {m 1z , m 2z , d gw L }. For these parameters the only relevant parameter estimation prior is π(d gw L ) ∝ (d gw L ) 2 since for the masses the LVC uses a uniform prior. The prior π pe (z, m 1 , m 2 ) is directly obtained including the Jacobian [5]: where in this case following Eq. (4). Altogether, the BBH likelihood can be written as This result can be simplified further if the local merger rate R 0 is marginalized using a uniform in log prior to obtain which does not depend on R 0 .
The BBH population is modeled with a merger rate history following Eq. (1). For the primary mass a broken power-law distribution is used: where m break = m min + b(m max − m min ) and b ⊂ (0, 1].
In the limit of b → 1 one finds m break → m max . On the other hand, the secondary source mass is uniformly sampled below m 1 and above m min . In the analysis the minimum mass is fixed to 5M . Therefore, in total the BBH population is modeled by 8 parameters: In this appendix I provide a derivation of how the modified GW propagation affects the stochastic background of unresolved binaries. I focus in particular on modification in the GW luminosity distance. This derivation extends the classical result of [46] beyond GR, allowing for d gw L = d em L . The dimensionless stochastic GW background is defined as where ρ c = 3c 2 H 2 0 /8πG is the critical energy density and frequencies are in the detector frame. In the second equality, the total energy flux F (f ) = cdρ gw /df is introduced. The total flux is nothing but the energy emitted by all binaries per unit area: The number of events per detector frame time has already been defined in Eq. (2). There, in order to account for all the binaries which cannot be detected individually one simply needs to substitute p det → (1−p det ). One obtains The energy emitted per frequency is given by (recall only modification in the GW propagation are being considered) during the inspiral of a circular binary. A more general expression can be obtained simply noting that is the Fourier transform of the time domain strain (during the inspiral h(f ) ∼ f −7/3 ) which has been averaged over all possible sky locations and orientations Ω. Noticeably, the number of events scales with the differential comoving volume Importantly, the comoving rate scales with (d em L ) 2 while the GW energy flux scales with 1/(d gw L ) 2 . In GR these two quantities are equal and cancel each other. However, beyond GR they do not and Ω gw depends on their ratio square (d em L /d gw L ) 2 . The final result for the inspiral signal is given in Eq. (7). If one wants to include the full emitted signal this can be generalized to where φ = {m 1 , m 2 , z} and d φ = dzdm 1 dm 2 . Hereh gr is the GR emitted signal (which is inversely proportional to d em L and thus this distance factor cancels with the one from dV c /dz).

Appendix C: Full posterior samples
For completeness I present in this appendix the full posterior distributions for all the parameters in the analysis. The results are displayed in Fig. 4. It is to be noted that in Fig. 3 the range in α was cut below −6. This is because as shown in this figure, the parametrization used in this analysis saturates at α −5 and the inference is the same. In any case, 85% of the posterior is above this value.