A standard siren measurement of the Hubble constant using gravitational wave events from the first three LIGO/Virgo observing runs and the DESI Legacy Survey

We present a new constraint on the Hubble constant $H_0$ using a sample of well-localized gravitational wave (GW) events detected during the first three LIGO/Virgo observing runs as dark standard sirens. In the case of dark standard sirens, a unique host galaxy is not identified, and the redshift information comes from the distribution of potential host galaxies. From the third LIGO/Virgo observing run detections, we add the asymmetric-mass binary black hole GW190412, the high-confidence GW candidates S191204r, S200129m, and S200311bg to the sample of dark standard sirens analyzed. Our sample contains the top $20\%$ (based on localization) GW events and candidates to date with significant coverage by the Dark Energy Spectroscopic Instrument (DESI) Legacy Survey. We combine the $H_0$ posterior for eight dark siren events, finding $H_0 = 79.8^{+19.1}_{-12.8}~{\rm km~s^{-1}~Mpc^{-1}}$ ($68\%$ Highest Density Interval) for a prior in $H_0$ uniform between $[20,140]~{\rm km~s^{-1}~Mpc^{-1}}$. This result shows that a combination of 8 well-localized dark sirens combined with an appropriate galaxy catalog is able to provide an $H_0$ constraint that is competitive ($\sim 20\%$ versus $18\%$ precision) with a single bright standard siren analysis (i.e. assuming the electromagnetic counterpart) using GW170817. When combining the posterior with that from GW170817, we obtain $H_0 = 72.77^{+11.0}_{-7.55}~{\rm km~s^{-1}~Mpc^{-1}}$. This result is broadly consistent with recent $H_0$ estimates from both the Cosmic Microwave Background and Supernovae.


INTRODUCTION
Since the first detection of gravitational waves (GW) in 2015 (Abbott et al. 2016), the astrophysics community has significantly intensified its efforts in the field of multimessenger astronomy. Thanks to these remarkable efforts, it was possible to identify the first electromagnetic counterpart to a GW event (LIGO Scientific Collaboration et al. 2017;Arcavi et al. 2017;Coulter et al. 2017;Lipunov et al. 2017;Tanvir et al. 2017;Soares-Santos et al. 2017;Valenti et al. 2017), GW170817 (Abbott et al. 2017), the first binary neutron star merger to be detected in gravitational waves. Multimessenger observations can be used as powerful probes of cosmology and fundamental physics: they can be used to place constraints on gravity theories and to test General Relativity (e.g. Yunes & Siemens 2013;Baker et al. 2017; * NASA Einstein Fellow Ezquiaga & Zumalacárregui 2017;Carson & Yagi 2020;Palmese & Kim 2020), to measure the equation of state of neutron stars, and even to understand whether compact object mergers could be made of primordial black holes (Scelfo et al. 2018;Tsai et al. 2020). Another interesting application of multi messenger observations is that of "standard sirens", first proposed in Schutz (1986): given a gravitational wave detection, it is possible to measure a luminosity distance for the event, and if that is combined with the redshift of the galaxy that hosted the merger, gravitational wave events can be used to probe the distance-redshift relation. Since the relation is sensitive to cosmological parameters, GW events can be used to infer those parameters, and in particular the Hubble constant H 0 for nearby events.
The standard siren methodology can be applied both to events with a counterpart ("bright standard sirens"), for which a unique host galaxy can potentially be identified, and to events without a counterpart ("dark" or "statistical" standard sirens). In both cases, the redshift information comes from galaxies, whether that is a unique galaxy or a sample of potential hosts. The first bright standard siren measurement of the Hubble constant has been derived in Abbott et al. (2017a) for GW170817, which is also the only event so far with a confident counterpart association with a unique host galaxy (e.g. Blanchard et al. 2017;Palmese et al. 2017). Several works (Chen et al. 2020;Gayathri et al. 2020;Mukherjee et al. 2020) have used a candidate counterpart for another event, GW190521, to derive bright standard siren measurements. However, the counterpart association to the GW event cannot be made with confidence (Ashton et al. 2020;Palmese et al. 2021).
Dark standard siren measurements have been produced for a larger number of events, including GW170817 (Fishbach et al. 2019), the binary black hole mergers GW170814 and GW190814 (Soares-Santos & Palmese et al. 2019;Abbott et al. 2020a), and a larger ensemble if events from the first two LIGO/Virgo observing runs, O2 (LIGO Scientific Collaboration & Virgo Collaboration 2019a), and the first half of the third observing run, O3a (Finke et al. 2021). Dark standard sirens require knowledge of the position and redshift of the ensemble of potential host galaxies, over which one needs to marginalize, and are therefore expected to lead to less precise results than bright standard sirens on a single event basis. On the other hand, GW binary black holes, and more in general events without counterpart, currently outnumber those with a counterpart by at least a factor of 10 (Abbott et al. 2021a). We therefore take the approach of combining dark sirens with the much rarer bright sirens to derive standard siren constraints on cosmological parameters.
New measurements of the Hubble constant are valuable to gain insight on the so-called "Hubble constant tension" (e.g. Riess et al. 2019;Planck Collaboration et al. 2018;Freedman et al. 2019;Riess et al. 2021). This tension arises from the significant discrepancy between different measurements of H 0 , namely those that rely on measurements from the Cosmic Microwave Background (Planck Collaboration et al. 2018), and those using Supernovae (SN) with a local distance ladder (Riess et al. 2021). New, independent measurements of H 0 could help clarifying where the tension is arising from (e.g. Verde et al. 2019), and GW standard sirens offer such possibility. In particular, dark standard sirens for which higher order modes can be measured (such as one of the events we use in this work) are particularly promising in providing more precise sky localizations and distances, potentially reaching a few per cent uncertanty in H 0 in 5 years (Borhanian et al. 2020).
In this work, we present a new measurement of the Hubble constant using 8 dark sirens, combined with GW170817 using the counterpart. We decide to focus on the best localized events whose high probability volume is also covered by a complete galaxy catalog. These are in fact the most im-portant characteristics to derive more precise constraints on the Hubble constant. First, better localized events need to be marginalized over a smaller number of galaxies than those that encompass large volumes, so that they are more likely to provide better constraints. Better localized events also tend to be less distant, which means that a standard siren analysis would mostly be sensitive to the Hubble constant rather than other cosmological parameters. Lastly, the galaxy catalog coverage is fundamental.
LIGO Scientific Collaboration & Virgo Collaboration (2019a) find that the improvement on the Hubble constant measurement coming from dark sirens compared to a GW170817-only analysis is marginal, and this can be attributed to the lack of complete galaxy catalogs, along with the lack of well-localized events in the first observing runs. Likewise,  find that the addition of only two well-localized dark sirens with uniform and deep coverage of galaxy catalogs can provide significant improvement to GW170817. First, we improve on the results in  by adding new events from O3. We also improve on the analysis in Finke et al. (2021) by considering a more suitable galaxy catalog. The catalog used there is a compilation of galaxies from different surveys up to 2014, and it is complete out to a distance of ∼ 37 Mpc (Dálya et al. 2018), but largely incomplete at the distances of the dark sirens (all at > 200 Mpc). We use galaxy data from stateof-the-art dark energy experiments, the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al. 2016) imaging surveys and the Dark Energy Survey (DES; The Dark Energy Survey Collaboration 2005), both completed in 2019. Data from these surveys have been augmented with infrared bands and spectroscopic data, and processed to produce the Legacy Survey catalogs. We use photometric redshifts from Zhou et al. (2020), and also derive alternative redshifts through a method that simultaneously constrains redshift and luminosity of galaxies using random forests (Mucesh et al. 2020). These catalogs are used to derive an Hubble constant posterior using the LIGO/Virgo asymmetric mass binary black hole (BBH) merger GW190412 (Abbott et al. 2020b), and the high confidence binary black hole candidates S191204r (LIGO Scientific Collaboration & Virgo Collaboration 2019b), S200129m (LIGO Scientific Collaboration & Virgo Collaboration 2020a), and S200311bg (LIGO Scientific Collaboration & Virgo Collaboration 2020b).
The paper is structured as follows. We describe the data used in §2 and our methods in §3. Results and discussion are presented in §4, followed by the conclusions in §5. We assume a flat ΛCDM cosmology with Ω m = 0.3 and H 0 values in the 20 − 140 km s −1 Mpc −1 range. When not otherwise stated, quoted error bars represent the 68% credible interval (CI).

The LIGO/Virgo GW data
We select the top 20% gravitational wave events and high confidence candidates based on their 90% CI comoving volume. We then select those with > 70% of their probability covered by the DESI imaging.
The GW map we use for GW190412 is one of the two fiducial maps presented in Abbott et al. (2021a). We choose to use the one obtained from fitting the effective-one-body (EOB) family of waveform models to the GW signal. These models also include a post-Newtonian (PN) prescription and numerical relativity (NR) information in the inspiral-mergerringdown description, and the effect of higher-order multipoles for precessing binaries, so they are referred to as SEOBNRv4PHM (Pan et al. 2014;Ossokine et al. 2020). The event is localized to within 12 deg 2 at 90% CI, and has a luminosity distance of 740 +120 −130 Mpc after marginalization over all other parameters. With a 90% CI comoving volume of 4 × 10 −4 Gpc 3 , GW190412 is the second-to-best dark standard siren to date, only second to GW190814. At the time of writing, the three candidate events we consider had not been confirmed as gravitational wave events by the LIGO/Virgo Collaboration yet, but their False Alarm Rate (FAR) is so small (see Table 1), and their classification as BBH is confident (> 99%), that they are very likely to be real GW events of astrophysical origin. We note that their FAR is estimated differently from the other events considered here, since it results from the low-latency analyses rather than from the more accurate offline studies. These candidates have all been confirmed as gravitational wave events at the time this work was ready for submission Collaboration et al. (2021). Their 90% comoving volume is comparable to that of a previously analyzed dark standard siren, GW170814, and it is of the order of ∼ 10 −3 Gpc 3 .

The DESI Imaging data
The DESI collaboration observed a large fraction of the sky (∼ 14, 000 sq deg) in grz bands out to ∼ 2 mags deeper than SDSS, in order to select their spectroscopic targets from a photometric sample which is large and deep enough to contain a dense galaxy and quasar sample. An area of ∼ 9000 sq deg of the North and South Galactic Caps (NGC and SGC) up to dec< 32 deg was imaged in grz by the Dark Energy Camera (DECam; Flaugher et al. 2015) mounted on the Mayall 4m telescope at the Cerro-Tololo Inter-American Observatory. These observations was part of the The Dark Energy Camera Legacy Survey (DECaLS). The remaining area of the NGC was imaged in g and r by the Bok 90 inch (as part of BASS, the Beijing-Arizona Sky Survey; Zou et al. 2017), and in z by the Mosaic-3 on the Mayall 4m telescope at Kitt Peak National Observatory (as part of MzLS, the Mayall zband Legacy Survey).
The optical bands are supplemented with IR data from the Wide-field Infrared Survey Explorer (WISE; Wright et al. 2010). Source detection and photometry, including forced photometry on the unWISE W 1, W 2, W 3, and W 4 images (Lang 2014;Meisner et al. 2018;Meisner et al. 2019) at 3.4 and 4.6 µm respectively, is performed with the software package THE TRACTOR (Lang et al. 2016). The 5σ depth of the survey is 24.0,23.4 and 22.5 in grz respectively, for the AB mag of a fiducial galaxy with an exponential light profile and half-light radius 0.45 arcsec. Survey and calibration have been performed carefully to ensure a maximally uniform source sample despite the different instruments used. Survey and photometry are described in more detail in Dey et al. (2019).
We use photometry and photometric redshifts from Data Release (DR) 9, which covers more than 16, 000 sq deg thanks to imaging available outside of the nominal DESI imaging footprint. We select galaxies by requiring PSF = 1. Since this cut alone provides a highly star-contaminated galaxy sample, we follow Ruiz-Macias et al. (2021) and apply a further cut using GAIA data. We treat objects as galaxies if either of these conditions apply: • they are not in GAIA; • they are in GAIA but have (G GAIA − r raw ) > 0.6, where (G GAIA is the G band GAIA magnitude, and r raw is the DESI Imaging r band magnitude uncorrected for the Galaxy's dust extinction. Moreover, we correct the photometry for Milky Way dust extinction using the correction factors provided in the Legacy Survey release, which are based on the dust maps of Schlegel et al. (1998), and remove duplicated sources in the regions where the north and south surveys overlap, keeping the DE-CaLS sources where available.

Redshifts truth table
As a truth table for training, validation, and testing of redshifts and luminosities of the photometric sample, we use the sample compiled by Zhou et al. (2020), which comprises of spectroscopic redshifts and sophisticated multiband photo-z's from several galaxy surveys. It includes data from the 2-degree Field Lensing Survey (Blake et al. 2016 (Parkinson et al. 2012). The objects from these catalogs are selected based on their quality flags as described in Zhou et al. (2020), and matched to DESI imaging sources to create the final truth table, that comprises of more than 2.5 million objects.
The training sample that we use for GALPRO is the same as the one defined in Zhou et al. (2020): the truth table is downsampled in order to reduce the effect of spurious peaks in the redshift distribution of the evaluation sample, due to the selection function of the most numerous samples in the truth table (namely the SDSS Main Galaxy Sample, BOSS, WiggleZ and GAMA). If a training sample presents sharp peaks in the redshift or color distribution because of a specific color or magnitude selection in the truth table, these regions are overrepresented compared to others, and it is likely that the random forest would oversample from the redshifts and colors around these peaks, causing systematic biases. This selection process is particularly important for our work, since we want to avoid introducing spurious peaks in the dN/dz that could result in spurious peaks in the Hubble constant posterior. The training sample contains ∼ 950, 000 objects with a redshift distribution that is roughly flat over the redshift range of interest here (z < 0.4). The remaining truth redshifts are used for testing the photo-z performance and measure the systematic bias. We compute rest-frame r-band luminosities (L rest r ) using the method described in Rudnick et al. (2003).

Legacy survey photo-z's
Photometric redshifts have been computed by Zhou et al. (2020) for the entire Legacy Survey using Random Forests (RF). Their input features for the RF are r-band magnitude, g − r, r − z, z − W 1 and W 1 − W 2 colors. In addition, they provide three morphological parameters following Soo et al. (2017), significantly improving the scatter and outlier fraction of the testing sample. These parameters are the halflight radius, the ratio between semi-minor and semi-major axes, and a "model weight" that quantifies how well a galaxy is fit by an exponential light profile versus a de Vaucouleurs profile.
When using these photo-z's, we follow the findings of Zhou et al. (2020) and cut the sample at z < 21 to ensure that the photo-z bias is under control, and at photo-z errors < 0.1 to remove catastrophic outliers. Here we present the performance of these photo-z's on the redshifts from the truth table that were not used by Zhou et al. (2020) as part of their training sample for the redshift range of interest here. This is necessary for our work since the analyses of Zhou et al. (2020) focus on a higher-z sample of Luminous Red Galaxies, while we are interested in a lower redshift sample of generic galaxies. From the photo-z catalog we select galaxies with a spectroscopic measurement from the compilation of Zhou et al. (2020). We define our validation sample using the spec-z's that have not been used for training the photo-z's. In Fig. 2 we show results on the photo-z validation sample. The lefthand side panel shows the bias ∆z = z phot − z spec , the deviation of the photo-z point estimate from the spec-z measured for the same galaxy, as a function of spec-z. The bias is normalized by a (1 + z) factor, as this is often how this metric scales with redshift. The density plot is mostly flat around ∆z/(1 + z) = 0 over the entire redshift range of interest, with a median value of -0.0014. Another photo-z performance metric often used in photometric surveys is the scatter in the bias distribution, which can be presented as the normalized median absolute deviation (NMAD): For the DESI Legacy Survey, we find σ NMAD = 0.01. On the right hand side of Fig. 2 we also show the mean bias and scatter in photo-z bins of size 0.05. The scatter is presented as σ 68 , the 68th percentile width of the bias distribution about the median. Apart from the first redshift bin, which is mostly relevant only for one of the events we consider and is affected by a smaller validation sample, the bias shows an excellent photo-z performance as it is always below 0.004. Also σ 68 is indicative of accurate redshifts, as it is below 0.02 over most of the redshift range. Using the scatter, we compute the 2σ and 3σ outlier fraction as the fraction of objects that lay more than 2 or 3 times σ 68 away from ∆z = 0. We find that the fractions are ∼ 4.5 and 0.3%, respectively. These values are competitive with state-of-the-art cosmological photometric experiments such as DES. The DES science requirements include a maximum value for the scatter of σ 68 < 0.12, a requirement which is comfortably met by the sample described here. Both the scatter and bias are also competitive, with those found for the DES Science Verification and Year 1 photo-z catalogs (Sánchez et al. 2014;Hoyle et al. 2018) over a similar redshift range. The excellent results obtained can be understood in light of 1. the large overlap with spectroscopic surveys (larger than what is possible for DES) that provide an excellent training sample, 2. the fact that we are restricting the analysis to nearby redshifts, where the signal-to-noise of our photometry is best and more complete spectroscopic samples are available compared to larger redshifts, 3. the lack of i−band photometry in the DESI imaging is not a determining factor in the quality of the photo-z's since the 4000 Å break is crosses to the r band only at the edge of the redshift range considered, around 0.4. More information about the galaxies' Spectral Energy Distribution, other than the optical grz data, is provided by the addition of the WISE data, which also contributes to the quality of the photo-z's.
In order to ensure that our sample is volume-limited, we use an approach similar to that of . First, we define a maximum redshift of interest for each GW event, given its 90% CI bounds in luminosity distance marginalized over the whole sky. The higher bound is converted into a maximum redshift given the largest value of H 0 we consider in the prior. For each event, given the maximum redshift considered, we derive the absolute magnitude that corresponds to the galaxy sample limiting apparent magnitude at that redshift. We then discard galaxies with an absolute magnitude below this threshold. Note that while we use a fiducial ΛCDM cosmology to derive these magnitudes, the H 0 dependence is irrelevant since the threshold value and the galaxies' absolute magnitudes scale with H 0 in the same way.
In Fig. 3 we show the redshift distribution of galaxies in the 90% CI area of the GW events we analyze in this work. The distribution is subtracted with a dN/dz that is uniform in comoving volume, in order to highlight the presence of overdensities and underdensities along the line of sight. The dot-dash blue line shows the distribution using the photometric redshifts point estimates from the DESI Legacy Survey described in this subsection, the solid red line shows the same redshifts when their uncertainty is taken into account as a Gaussian error. These two distributions are very close to each other, although the Gaussian sampling case tends, as expected, to smooth out some feature that are more prominent in the point estimate distribution. Remarkably, from these plots it is clear that, in most cases, the distance of the GW events, assuming an H 0 of 70 km s −1 Mpc −1 (given by the grey vertical line, with the shaded region reflecting the uncertainty in luminosity distance when marginalized over the entire sky), is close to overdensities in the galaxy distribution. These overdensities provide peaks in the H 0 posterior.

Joint redshift-luminosity PDFs with Random Forests
We compute joint redshift-luminosity PDFs using GALPRO (Mucesh et al. 2020), a Python package for generating multivariate PDFs of galaxy properties on-the-fly using Random Forests. We build a RF model to predict redshift and rest-frame r-band absolute magnitudes simultaneously. For this, we input the same features as used by Zhou et al. (2020) to predict photo-z's. The only difference is that we convert fluxes into 'asinh' magnitudes ('luptitudes') to avoid removing any faint galaxies with close to zero or negative fluxes in our target samples. The target features are the spec-zs and the rest-frame r-band absolute magnitudes we computed before. We train and test the model using the same samples used by the authors to train and test their RF model. We generate joint PDFs for galaxies in the test sample and use them to validate the model by checking probabilistic and marginal calibration. Finally, we use the trained model to generate joint PDFs for galaxies in our target datasets, for which we do not have the 'truth' values.
To validate the marginal and joint PDFs of redshift and luminosity, we use the metrics described in Mucesh et al. (2020). We check two different modes of calibration: probabilistic calibration and marginal calibration for the marginal PDFs, and their multivariate counterparts, probabilistic copula calibration and Kendall calibration respectively to validate the joint PDFs. We apply the metrics on PDFs of galaxies in the test dataset. The PIT distributions indicate that the recovered marginal PDFs are well-behaved, as they follow a nearly uniform distribution, with an outlier fraction below the percent level. The PIT results also show that the marginal PDFs are slightly overly broad and biased as the distributions have a convex shape and contain a slope. The marginal calibration plots make this concrete as the difference between the 'true' and predicted cumulative distribution functions (CDFs) averages around zero, although with some fluctuations about the zero line. The copula probability integral transform (copPIT) distribution and the Kendall calibration are poor compared to their univariate counterparts. This indicates that the joint PDFs are not as accurate as the marginal ones. However, we will only be using the marginal PDFs in this work. To improve the calibration of the PDFs, we try different methods. At first, we tune the model by incrementing the minimum number of samples (min_samples_leaf) that must be present in a leaf node from 1 (the default) to 2, 3, and 5. However, this results in further degradation of the calibration. Next, we try augmenting the training dataset by scattering the galaxies. Specifically, we do this by scattering the photometry according to the photometric errors. Each galaxy in the training dataset is scattered five times, and the galaxies in the testing dataset are scattered once for consistency. We run the analysis again and find that there is a slight improvement in the PIT distributions as they become more uniform, so we choose this training for our analysis. Each galaxy is taken into account in the Bayesian framework detailed in §3 with 100 photo-z and luminosity samples. We consider the samples that pass the luminosity cut as described in §2.2.2, and we use the kernel density estimation (KDE) method to retrieve the redshift PDFs for the galaxies in our sample.

Galaxy fakes
Two of the events considered, S200129m and GW170608, are covered to less than 90% CI in area by the DESI imaging. For the uncovered regions of the 90% CI, we inject galaxy fakes that are samples from our prior distribution, in order to ensure that the marginalization occurs over all the possible host galaxies and that the final uncertainty on the Hubble constant is not underestimated. The prior distribution of galaxies is given in both cases considered here, the DESI imaging photo-z's and the GALPRO PDFs, by the training sample. We take Monte Carlo samples from the redshift/luminosity distribution to assign to each fake a redshift and a luminosity The spatial distribution is uniform per unit solid angle, with a number density of galaxies provided by the mean value of the training sample. We repeat the redshift/luminosity sampling 100 times per galaxy for the case of the GALPRO measurements, in order to match the number of PDF samples considered for the real galaxies. In the first step, we write the posterior probability of H 0 as it follows from Bayes' theorem: where d GW is the GW data from a single event, d EM is the data from the galaxy survey into consideration, p(d GW , d EM |H 0 ) is the joint GW-EM likelihood, and p(H 0 ) is the prior on H 0 which we take to be uniform over the range [20, 140] km s −1 Mpc −1 . After marginalizing over all potential host galaxies i in the galaxy sample, over the unknown true redshift z and over the photometric redshift bias ∆z, one can show that Eq.
(2) can be rewritten as: where β(H 0 ) is the likelihood normalization factor that takes into account selection effects, and Z i = p(d EM |z i )r 2 (z i )/H(z i ) dz i are evidence terms that normalize the posterior. The first term in the integral is the GW likelihood computed at the position and redshift of the galaxy i, the second term is the redshift likelihood of the galaxy, shifted by the photo-z bias, and p(∆z) is the prior on the photo-z bias, which we measure from the redshift validation sample.
Selection effects are taken into account in the β(H 0 ) term as follows, similarly to what is done in Chen et al. (2018) and Gray et al. (2019). This term is the joint GW-EM likelihood marginalized over all possible GW and EM data. We consider the DESI legacy survey complete down to the absolute magnitudes considered for the events taken into account, for which we only consider galaxies out to z < 0.5. Under this assumption and that of isotropy of the events on large scales, after marginalizing over the entire sky, the normalization factor reduces to: where p GW sel describes our selection function of GW events, and it is 1 if the detector network SNR is > 12 and the localization volume satisfies our selection criteria, 0 otherwise. Note this is the same as Eq. 15 in Chen et al. (2018), where a full derivation can be found.
In order to compute β(H 0 ), we adopt a Monte Carlo approach. We simulate 70,000 BBH mergers and compute β(H 0 ) for 20 values of H 0 within our prior range. We generate GW events using the BAYESTAR software (Singer & Price 2016;, also based on tools from LALSuite (LIGO Scientific Collaboration 2018). We assume the O3 sensitivity curves for LIGO and Virgo measured from the first months of O3 operations, as published by LIGO in Document P1200087 (https: //dcc.ligo.org/LIGO-P1200087/public). The injected events follow a distribution that is uniform in comoving volume, assuming a Planck 2018 cosmology with different H 0 values. We assume IMRPhenomD waveforms both for the injections and reconstructions. The BHs follow a mass function described by a power-law with index 1.6 (as found in Abbott et al. 2021b for the "power law + peak" mass function), and a uniform spin distribution between (−1, 1). After the 70,000 injections are made, we run a matched-filter search to retrieve the detected events. A detection is made when at least 2 detectors reach a single-detector signal-to-noise ratio SNR> 4 and the network SNR is > 12. Gaussian noise is added to the measurement. In the last step, we reconstruct BAYESTAR skymaps for the detection. The reconstruction is made assuming a distance prior which scales as ∝ d 2 L , where d L is the luminosity distance.
Another component of our selection is the requirement that the high probability region of the GW map is well-covered by the DESI imaging. Because we do not expect the DESI footprint to be correlated with the GW antenna pattern, we do not need to take into account this selection effect on the GW events, as if we were performing an isotropic random sampling of the events. We find that this selection results in a β(H 0 ) that follows closely the H 3 0 function used in previous works.
We note that the selection effects calculation could be improved by considering the same priors as those assumed in the GW likelihood. In other words, it could be improved by assuming the same priors used when determining the sky map. The most significant difference in the priors concerns the binary mass. This is assumed to be uniform on the redshifted detector-frame mass when computing the sky map, while the priors we have considered for the selection function calculation assume an astrophysical mass distribution on the source frame mass. However, Gray et al. (2019) show that if the mass distribution takes the form of a power law, no prior correction is required for this difference in assumptions. In the future, one may prefer to assume a more realistic mass function, and ideally simultaneously fit for both the mass function and the cosmology.
Assuming that the data {d GW,i } from a sample of GW events is made of events that are independent of each other, the Hubble constant posterior can easily be rewritten for a combination of GW events through the product of the single event j likelihoods:

RESULTS AND DISCUSSION
In this Section, we show the results using the DESI Legacy Survey photo-z's, replaced by the spectroscopic redshifts described in §2.2.1, where available. We produce an H 0 posterior for each of the events GW170608, GW170818, GW190412, S191204r, S200129m and S200311bg, and use the DES posteriors published in  for GW170814 and GW190814. As reported in Section 2, the metrics found for the GALPRO results are not as optimal as those we find for the Legacy Survey photo-z's, but we still use them as an alternative photo-z method for comparison, and find that the H 0 constraints do not significantly depend on which of the photo-z's are used. We provide the Legacy Survey photo-z results as our fiducial result.
The posterior distributions on H 0 for the single dark siren events considered (in colors) are shown in Fig. 4. Each event reduces the 68% CI of the H 0 prior to its ∼ 85%. The most constraining event is GW190412, which is reasonable provided that this event has the best localization after GW190814, and its 1σ constraint reaches close to 80% of the H 0 prior. Fig. 3 also shows a prominent overdensity of galaxies around redshift 0.19 for GW190412, which is reflected in the H 0 posterior as a peak around H 0 of 80 km s −1 Mpc −1 . We stress that it is expected that the H 0 posteriors from individual dark standard sirens present multiple peaks, as these correspond to multiple overdensities in redshift along the line of sight. It is also expected that the peaks are very broad for the current level of precision: the GW localization contains thousands of potential host galaxies, over  which we marginalize, and both the GW events luminosity distance estimate and the photometric redshifts are measured to a precision of 10-30%. We find that for both events where fakes are required, the effect of their addition is that of further flattening the posterior compared to the case with no fakes. This is expected since it effectively corresponds to adding more galaxies for marginalization, and with an uninformative redshift distribution. It is after combining a large enough number of events (namely O(100) for a O(1%) precision on H 0 ; ) that the dark siren method becomes more powerful. Fig. 5 shows the posterior obtained by combining all the dark sirens in blue. The maximum a posteriori with the 68% CI is 79.8 +19.1 −12.8 km s −1 Mpc −1 from this combination. This result is broadly consistent within 1σ with both the Planck and Riess et al. (2021) constraints, whose 68% intervals are represented for reference by the vertical shaded regions in Fig. 5. Abbott et al. (2021a) present results on GW190412 that assume two different sets of fiducial models, namely the SEOBNRv4PHM and a set of empirical models called IM-RPhenomPv3HM. We have tested that the choice of the map does not result in any significant effect on the H 0 posterior distribution, and choose to use the former version.
The origin of the BBH mergers that LIGO/Virgo have detected is unclear, and it is likely that multiple formation mechanisms are at play (Zevin et al. 2021). While it is plausible that some channels may occur preferentially in low-mass galaxies (e.g. Palmese & Conselice 2020), an assumption that several dark siren works have made is that higher mass, more luminous galaxies are more likely to host BBH events. Similarly, we weigh the galaxies in our sample based on their r-band luminosity. Since this approach effectively gives more weight to the less numerous, more luminous galaxies, compared to the lower mass galaxies, we expect a luminosity weighting to produce more stringent constraints. We find that the weighting produces a slight improvement on the constraints, yielding a 1% improvement on the precision. We prefer to provide the more conservative result that does not include the luminosity weighting.
We note that the results presented here are valid under a precise cosmological model, the Flat ΛCDM model. This is unlike the H 0 measurement from GW170817 Abbott et al. (2017a), which is nearby enough to only be sensitive to the Hubble constant. The measurement presented in this work, similarly to all other BBH dark siren measurements available so far, depend on the background cosmology because they extend beyond redshift ∼ 0.1, where the Hubble parameter becomes more sensitive to the matter density Ω m and other parameters. We have tested that our results are not significantly affected by changes to Ω m within the 2σ interval found by the DES-only measurements presented in Abbott et al. (2019b), so that our H 0 constraint, although tied to a Flat ΛCDM model, is not dependent on the exact Ω m value assumed, at least within a prior consistent with this DES constraints. Future dark siren measurements from a larger sample of events with more precise distance measurements will be sensitive to the matter density and other parameters, and should be able to place constraints on Ω m and the dark energy equation of state (e.g. Laghi et al. 2021).
We compare our results to that of Finke et al. (2021), who find H 0 = 67.3 +27.6 −17.9 km s −1 Mpc −1 from a combination of a large number of events from the O1, O2, and the first half of O3. Our result is broadly consistent with it, but we find a more precise constraint despite the smaller number of events considered. The galaxy catalog used in that work, the GLADE catalog Dálya et al. (2018), is a compilation of galaxies from different surveys complete out to a distance of ∼ 37 Mpc (Dálya et al. 2018), which is extremely useful for targeted follow-up of nearby GW events. However, at the larger distances of the events considered in Finke et al. (2021) and this work, GLADE is highly incomplete and inhomogeneous, as opposed to the DESI imaging. Only 5 events in Finke et al. (2021) are considered to be largely covered by the GLADE catalog, and are therefore expected to provide a useful contribution to the final H 0 constraint. However, one of those, GW150914, is at a problematic location for galaxy catalogs, as a large portion of its 90% CI area is either covered by the Large Magellanic Cloud, or close to the Galactic Plane. Most of the contribution to their H 0 posterior comes from GW190814, while they properly take into account the catalog incompleteness for the remaining events, which has to result in less constraining results as discussed above. For these reasons, and thanks to a careful choice of the events to be considered (i.e. those that are better localized), we are able to retrieve a more constraining Hubble constant posterior despite the use of a smaller number of events. We also note that our H 0 prior is slightly larger than theirs ([30,140] km s −1 Mpc −1 ), which could be misleading when comparing the precision of measurements. A preprint showing contsraints from a larger sample of BBH became available at the time this work was being submitted (LIGO Scientific Collaboration et al. 2021), so we do not provide an extensive comparison. We only note that LIGO Scientific Collaboration et al. (2021) use a galaxy catalog that is very similar to that in Finke et al. (2021), and will therefore suffer from similar limitations from the host galaxies standpoint.
At last, we combine our results with those from GW170817 assuming the electromagnetic counterpart. We use the posterior from Nicolaou et al. (2020), who more carefully revisit the peculiar velocity contribution to the constrain compared to the original measurement of Abbott et al. (2017a), with the H 0 prior appropriately rescaled to match the one we use for the dark sirens. The posterior obtained by the combination of all dark standard sirens considered in this work, and GW170817, is shown by the black line in Fig.  5, and it is also broadly consistent with Planck Collaboration et al. (2018) and Riess et al. (2021). Our final H 0 constraint from this analysis is 72.77 +11.0 −7.55 km s −1 Mpc −1 . The constraint is largely driven by the one bright standard siren available, but the dark standard sirens also provide a significant contribution by improving the precision from GW170817 by 28%.

CONCLUSIONS
In this paper, we have presented a new measurement of the Hubble constant using the best available gravitational wave events up to date and a state-of-the-art, uniform galaxy catalog from the DESI Legacy Survey. This measurement results in a constraint of the Hubble constant of 79.8 +19.1 −12.8 km s −1 Mpc −1 , i.e. a ∼ 20% uncertainty on H 0 from dark sirens alone. This is the most constraining dark standard siren measurement obtained so far using a complete galaxy catalog. After combination with the one bright standard siren available, properly marginalized over different models for the peculiar velocity of the host galaxy, we obtain H 0 = 72.77 +11.0 −7.55 km s −1 Mpc −1 , a 12% measurement of H 0 . We note that the combination of the GW170817 bright siren, whose H 0 estimate is independent of the cosmological model, with the dark sirens, which we derived within the assumption of a Flat ΛCDM scenario, is also tied to a Flat ΛCDM scenario.
In the near future, spectroscopic observations of the host galaxies in the localization regions, higher order multipole analyses of the GW candidates, and deep observations of other well-localized events not covered by the DESI Legacy Survey (e.g. GW150914 and S191216ap) could further improve the measurement presented in this work. However, the significant improvement on dark siren measurements will be possible with the upcoming LIGO/Virgo/KAGRA observing run, expected to start in the second half of 2022. With the dark siren precision expected to be achieved after a few years of LIGO/Virgo/KAGRA running at design sensitivity or better (when a 2% statistical precision on H 0 will become possible, and such precision is valuable to inform us on the Hubble constant tension), careful studies of potential systematics not included here should be carried out, especially in light of new results from population studies of BBH: e.g. the impact of a galaxy catalog depth on the constraints, based on different BBH formation channels; the impact of the Gaussian ansatz on the dark siren posterior.  Table 2. Hubble constant measurements using gravitational wave standard sirens from this work and previous works. H0 values and uncertainties are given in km s −1 Mpc −1 , and H0 priors are flat. The uncertainty from the flat prior only is derived by assuming the same H0 maximum found in the analysis. Quoted uncertainties represent 68% HDI around the maximum of the posterior. The "σH 0 /σprior" column shows the 68% CI from the posterior divided by 68% CI of the prior width.