Ain't No Mountain High Enough: Semi-Parametric Modeling of LIGO-Virgos Binary Black Hole Mass Distribution

We introduce a semi-parametric model for the primary mass distribution of binary black holes (BBHs) observed with gravitational waves (GWs) that applies a cubic-spline perturbation to a power law. We apply this model to the 46 BBHs included in the second gravitational wave transient catalog (GWTC-2). The spline perturbation model recovers a consistent primary mass distribution with previous results, corroborating the existence of a peak at $35\,M_\odot$ ($>97\%$ credibility) found with the \textsc{Powerlaw+Peak} model. The peak could be the result pulsational pair-instability supernovae (PPISNe). The spline perturbation model finds potential signs of additional features in the primary mass distribution at lower masses similar to those previously reported by Tiwari and Fairhurst (2021). However, with fluctuations due to small number statistics, the simpler \textsc{Powerlaw+Peak} and \textsc{BrokenPowerlaw} models are both still perfectly consistent with observations. Our semi-parametric approach serves as a way to bridge the gap between parametric and non-parametric models to more accurately measure the BBH mass distribution. With larger catalogs we will be able to use this model to resolve possible additional features that could be used to perform cosmological measurements, and will build on our understanding of BBH formation, stellar evolution and nuclear astrophysics.


INTRODUCTION
The LIGO-Virgo Collaboration's second catalog of compact object mergers has shown that the universe is teeming with colliding compact objects with a variety of masses and spins (Abbott et al. 2016). In contrast to the 11 sources reported in the first LIGO-Virgo Collaboration (LVC) catalog (GWTC-1 Abbott et al. 2019a), the second catalog (GWTC-2 Abbott et al. 2020c) contains 50 sources, enabling a deeper look into the formation environments of compact object binaries. The sources in GWTC-2 include two binary neutron stars (BNSs) (Abbott et al. 2017(Abbott et al. , 2020a, 46 binary black holes (BBHs), and two neutron star black hole (NSBHs) candidates (Abbott et al. 2020d). The 46 confirmed BBHs observed in GWTC-2 include the first clear evidence of an asymmetric mass binary, potentially the least massive black hole known, and the most massive stellar mass black hole to date (Abbott et al. 2020e,d,f,g). With this large catalog of BBH mergers, one can now begin to robustly infer the properties of the astrophysical BBH distribution in addition to each individual event properties (Abbott et al. 2019b.

bedelman@uoregon.edu
Prior to the release of GWTC-2, the inferred mass distribution for the more massive (primary) components in mergers was thought to be consistent with a declining power law that cuts off at ∼ 45M (Abbott et al. 2019b;Fishbach & Holz 2017). When analyzing the BBH primary mass distribution including events in GWTC-2, Abbott et al. (2020h) found that a truncated power law is no longer consistent with the additional observations. The primary mass distribution was found to have some feature at ∼ 35-40 M , which was best described by either a break to a steeper power law or a with the addition of a peak. The presence of a peak in the primary mass distribution in this mass range is not surprising: it would be expected if we are witnessing effects of pulsational pair-instability supernovae (PPISNe) (Talbot & Thrane 2018). Massive stars that are too light to be fully disrupted by a pair-instability supernova (PISN) can shed large amounts of mass in a series of explosive pulsations before collapsing to a black hole (Woosley 2017(Woosley , 2019Farmer et al. 2019). This process leads to a wide range of initial stellar masses that map onto remnant black holes with masses 30 M ≤ m BH ≤ 45 M (Belczynski et al. 2016;Marchant et al. 2019;Stevenson et al. 2019). GWTC-2 also includes more massive binaries than previously observed, most notably GW190521 (Abbott et al. 2020f,g). Both component black holes of GW190521 have masses that pose a challenge to the theoretical prediction that pair-instability (PI) would forbid isolated stellar evolution from creating remnant black holes with masses from ∼ 50-125 M Heger et al. 2003;Spera & Mapelli 2017). There is some evidence that GW190521 could be a mass gap straddling binary or the result of other physical processes that get around the conflict with PISN theory (Fishbach & Holz 2020;Nitz & Capano 2021;Estellés et al. 2021;Cruz-Osorio et al. 2021;Sakstein et al. 2020;Gayathri et al. 2020;Farrell et al. 2020;Secunda et al. 2020;Edelman et al. 2021a). However, the presence of these high mass component black holes could also point towards there being a contribution to the observed population of BBHs detected by LIGO/Virgo, that formed in a way that avoids PI. These formation possibilities include hierarchical mergers in dense stellar environments, relativistic accretion onto heavy BHs in active galactic nuclei disks, isolated binary evolution of low-metallicity Population II stars, or even the presence of new physics beyond the standard model (Rodriguez et al. 2019;Doctor et al. 2020;Kimball et al. 2020a,b;Doctor et al. 2021;McKernan et al. 2020;Belczynski 2020;Croon et al. 2020;Mapelli et al. 2020).
Incorrectly inferring the BBH mass distribution has been shown to significantly bias both estimates of merger rates and the stochastic gravitational wave background amplitude (Talbot & Thrane 2018). Additionally the effects of PI can imprint features onto the mass distribution such as a high mass cutoff in the mass distribution (PISN), or a possible a pileup of mergers at masses just below the cutoff (PPISN). Resolving either of these features can provide a mass scale, calibrated across cosmic time, that enables measurements of the redshift-luminosity-distance relation to infer cosmological parameters . As catalogs of GWs from BBHs grow in size (Abbott et al. 2020b), we will be able to infer the BBH mass distribution with greater fidelity to determine if there is presence of additional structure beyond a power law. Such structure could yield insights about the nature of what environments BBHs form in and how they are connected to the rich fields of stellar evolution and nuclear astrophysics (Zevin et al. 2017;Farmer et al. 2019Farmer et al. , 2020Ng et al. 2021).
Bayesian non-parametric models provide a useful data-oriented approach to modeling when one has little information or prior knowledge about the structure of a set of data. These approaches provide little to no constraints on the functional form imposed by the model and instead use very flexible functions that have large prior support for a wide variety of unknown densities. Non-parametric modeling has been widely applied across different areas of GW Astrophysics, including modeling deviations from GR waveform models, modeling the noise power spectrum of detectors, modeling the calibration of the detectors, and creating surrogate models for faster waveform execution (Edelman et al. 2021b;Littenberg & Cornish 2015;Vitale et al. 2021b;Doctor et al. 2017).
In this work, we approach the mass spectrum from a data-driven perspective, using a semi-parametric method rather than the low-dimensional parametric models used in Abbott et al. (2019b. Our semiparametric method is complementary to both parametric and fully non-parametric approaches (Mandel et al. 2016;Tiwari 2021) by incorporating a simple parametric description of the large-scale structure (i.e. a power law) with an additional non-parametrically modeled component on top. This approach can aid in searching for generic deviations to the underlying parametric descriptions that could be the result of astrophysical processes. Since non-parametric approaches make few assumptions on the form that the underlying distributions may take, our model minimizes biases to the structure such deviations could take. We expect a large fraction of stellar mass BHs to form at the end of life of massive stars, which motivates our choice of a power law form of the BBH primary mass distribution following a similar functional form to the stellar initial mass function (Kroupa 2001). We therefore reconstruct the primary mass distribution with the Truncated power law model (Fishbach & Holz 2017;Abbott et al. 2020h), in which we modulate with a non-parametric perturbation. This method takes advantage of using a simple parametric form to capture the majority of the structure in the primary mass distribution while the perturbation function can find data-informed deviations from the power law. In Section 2 we describe our semi-parametric perturbation population model, and in Section 3 we present and discuss the inferred properties of the primary mass distribution when analyzing all 46 BBHs in GWTC-2. Finally, we explore possible interpretations of our results and conclude in Section 4.

SPLINE PERTURBATION MODEL
We use a hierarchical Bayesian inference framework to infer the properties of the astrophysical distribution of BBHs that incorporates software injections to estimate selection effects (Mandel et al. 2019;Farr 2019;Vitale et al. 2021a). This procedure is described in detail in Appendix A. In order to capture both the overall trends and any sharper features that may be in the primary mass distribution, we modulate a base parametric hyper-prior on primary mass, p(m 1 |Λ), by a highly flexible perturbation function -in this case, a cubic spline. We choose the simplest of previously used parametric models as our underlying mass distribution, p(m 1 |Λ), which is described by a power law in both primary mass and mass ratio with a sharp low and high mass cutoff (Abbott et al. 2019b;Fishbach & Holz 2017;Abbott et al. 2020h). This model was referred to as the Truncated model in Abbott et al. (2020h). While the Truncated model alone does not describe GWTC-2 well (Abbott et al. 2020h), it captures the majority of the large-scale structure found in the primary mass distribution. For our underlying description, we extend the Truncated model to allow for a tapering of the distribution at low masses following the same form used for the Powerlaw+Peak model described in Talbot & Thrane (2018) and Abbott et al. (2020h). Figure 1 shows an illustration of our spline perturbation model on top of a power law without any mass cutoffs or tapering. We multiplicatively apply perturbations to the underlying distribution as: (1) In the above equation, k is a normalization factor found by numerically integrating p spline (m 1 |Λ, {m i , f i }) over the entire range of primary masses, and f (m 1 ; {m i , f i }) is the perturbation function modeled as a cubic spline that is interpolated between n knots placed in m 1 space. These knots are denoted by their locations in mass space, {m i } n i=1 , and their heights at each knot, {f i } n i=1 . For readability, we hereafter drop explicit dependence of f on {m i , f i } unless needed.
We fix the locations of each knot to be linear in log m 1 space from 5-100 M and restrict the pertur-bations to zero at the minimum and maximum knots. With these restrictions our spline model adds n − 2 extra hyper-parameters to the underlying primary mass model we are perturbing, one for each of the inner knots' heights. We log-space the knots and perturb our underlying model with the multiplicative factor, exp(f (m 1 )), to reflect the wide range in orders of magnitude of the underlying power law. We then impose Gaussian priors on the knot heights {f i } centered at 0 and with standard deviations, σ knot . Our model then has two specifications which control the resolution (n) and the magnitude (σ knot ) of perturbations the model is sensitive to. We discuss the effect of changing these model settings on our prior assumptions and motivate the particular choices we made for this work in Appendix B.
In addition to the primary mass distribution, we simultaneously fit for the mass ratio and redshift distributions, without any spline perturbations applied. We apply a power law distribution for the mass ratio as p(q|m 1 , m min , β q ) ∝ q βq Θ(qm 1 − m min )Θ(m 1 − qm 1 ), with Θ denoting the Heaviside step function that ensures m 2 is within the range [m min , m 1 ]. We then fit for the evolution of the merger rate with redshift also with a power law such that p(z|λ) ∝ dVc dz 1 1+z (1 + z) λ , where dV c is the co-moving volume element (Fishbach et al. 2018;Abbott et al. 2019bHogg 1999). We do not fit for a population prior on the BBH spins, and assume the spin prior used for individual event parameter estimation in Abbott et al. (2020c), which is uniform in component spin magnitudes and isotropic in component spin orientations. We enumerate each of the model's hyper-parameters and corresponding hyper-prior distributions used in this work in Table 1.

Astrophysical BBH Primary Mass Distribution
We use a catalog containing each of the 46 BBH mergers reported in Abbott et al. (2020c) in which we perform a hierarchical Bayesian analysis to infer the astrophysical mass spectrum and merger rate evolution with redshift, as described in Appendix A. We perform multiple iterations of our semi-parametric model with different numbers of knots and both a "conservative" (σ knot = 1) and "wide" (σ knot = 2) prior width on the knots. For both cases of prior width we additionally do a post hoc "marginalization" over the number of knots by combining posterior draws weighted according to the ratios in marginal likelihoods. Explicitly we take N min Zn Zmax samples from each inference where N min is the minimum number of samples from each posterior, Z n the marginal likelihood of inference with n knots, and Z max , the maximum marginal likelihood of the combined pos- Figure 2. We plot the differential merger rate, dR d log m 1 = m1 dR m 1 as a function of primary mass (top row) for the combined spline model marginalized over the 10, 15, and 20 knot models in blue. The solid line shows the median while the shaded region shows the 90% credible interval and the 90% credible interval found from the Powerlaw+Peak model in green. The black traces show 1000 draws from the combined spline model posterior and we plot kernel density estimates (KDEs) of the posterior samples of primary source frame mass for each of the 46 BBHs in GWTC-2. We plot m1p(m1) on a log-scaled y-axis with a Gaussian KDE approximating p(m1) for each event. These posterior samples are not re-weighted to a population and come directly from the accompanying data release to Abbott et al. (2020c). teriors. In Figure 2 we plot the posterior merger rate density as a function of primary mass (top row) for our combined spline model (combining 10, 15 and 20 knot models), compared to the Powerlaw+Peak model. The most prominent feature in the primary mass distribution is the apparent peak at ∼ 35M , similar to the peak found at the same mass by the Powerlaw+Peak model (Abbott et al. 2020h;Talbot & Thrane 2018).
In addition to the peak at ∼ 35M , there are signs of additional features -albeit less significant -at lower masses. We find signs of an inflated rate of mergers with primary masses ∼ 10M and reduced rate around ∼ 7.5M , when compared with the power law structure. The model is less certain about the low-mass features as there are only a few events with support for m 1 < 10 M . The dearth of observed low mass BBHs, combined with their small sensitive volume, significantly inflates our uncertainty at the low end of the mass distribution. In Figure 3, we show the results inferred by each of the 10, 15, and 20 knot spline models individually, showing that the spline model consistently finds common features regardless of the number and location of knots. The combined spline model based on the marginal likelihoods is mostly comprised of the 15 knot result because the ratios of marginal likelihoods favor 15 knots over 20 at 2:1 odds and 15 over 10 at 3:1. The spline models are best constrained in the re-gions of over/under densities discussed above and much less certain (prior dominated) in regions between the features where the parametric component (i.e. power law) can fully explain the trend. The bottom row of Figure 3 shows the perturbation function, f (m 1 ), inferred from the different spline models. While there are some differences between knot choices due to the different length scales, they are all in agreement when taking into account the uncertainties and each consistently recovers similar merger rates and perturbations at both the 10M and 35M peaks. In Figure 4 we plot the posterior distribution of the perturbation function sliced at the approximate masses of the three, (f (m 1 = 7.5 M ), f (m 1 = 10 M ) and f (m 1 = 35 M )). We find similar posteriors on the perturbation at these three mass regions, across the models varying the number of knots and the spline prior width. We calculate the percentile where f = 0 falls in the posterior distribution for each of these three cuts, which would be near 50% in the presence of no deviations to the power law or equivalently for draws from the spline model prior. The percentiles of zero perturbation for the combined model with the conservative (wide) priors are 70.8% (72.6%), 19.4% (13.1%), and 0.8% (2.5%) at 7.5M , 10M , and 35M , respectively.
The presence of the ∼ 35M feature was previously found and reported in Abbott et al. (2020h) as being Figure 3. The median (solid) and 90% credible interval (shaded) of the inferred differential merger rate density as a function of primary mass (top row) with spline models with 10 (purple), 15 (blue), and 20 (orange) knots. We show the conservative prior case (σ knot = 1) in the left column and the wide prior case (σ knot = 2) on the right. The middle row shows kernel density estimates (KDEs) of the posterior samples of primary source frame mass for each of the 46 BBHs in GWTC-2. We plot m1p(m1) on a log-scaled y-axis with a Gaussian KDE approximating p(m1) for each event. These posterior samples are not re-weighted to a population and come directly from the accompanying data release to Abbott et al. (2020c). The bottom row shows the median (solid) and 90% credible intervals (shaded) of the inferred perturbation function, f (m1), for each choice of n, with the vertical lines showing the locations of the spline knots. either a peak (likely due to the PPISN pileup (Talbot & Thrane 2018)) or a break to a steeper power law. The Powerlaw+Peak model returned the highest marginal likelihood of parametric mass models considered in Abbott et al. (2020h), but was only favored with roughly 3:1 odds. Due to the inherent nature of the spline perturbation model, we would be more likely to find features that look like peaks rather than a power law break in the distribution. We additionally fit for spline perturbations on top of the BrokenPowerlaw model, which found little to no support for two different power law slopes and recovered a nearly identical primary mass distribution to what was found when modulating a single power law. The low mass feature recovered by our spline model was not identified in Abbott et al. (2020h) because the models considered there did not have the flexibility to fit such features at low mass: they only include a smooth rise to the low-mass end of the powerlaw. This could be explained by additional structure that cannot be described by a smooth rise to a power law, coming from the upper edge of the proposed neutron star black hole mass gap. The flexibility of our semi-parametric approach enables us to find additional structure in the astrophysical mass distribution beyond what can be found with simpler toy models. While the spline perturbation model clearly finds structure beyond the power law around ∼ 35M in the analyzed catalog of 46 BBH mergers, we cannot say for certain that the low mass feature is inherent to the astrophysical mass distribution. There is still the possibility that our model could be latching onto fluctuations in our data due to small number statistics. This possibility is reflected in the percentiles in Figure 4. The perturbation function at 7.5M and 10M does not rule out f = 0 at high cred- ibility regardless of prior choices in the spline model, while in contrast, the perturbation at 35M rules out f (35M ) = 0 at 97-99.4% credibility across each variation of spline model used. We investigate the possibility that these subsequent deviations from a power law could appear due to our model's systematics in Appendix C. We report no signs of correlations between the successive perturbations, which would be expected if the spline function was imposing biases onto our inferred perturbations. With future larger catalogs of gravitational-wave sources, we will be able to further resolve these lowmass features to determine if they are indeed present in the astrophysical mass distribution or a reflection of the current small catalog size.

Posterior Predictive Checks
With the large flexibility that comes from taking nonparametric approaches to modeling, one must be careful in validating inferences, especially in cases with small numbers of observations. One way we evaluate our semiparametric model is through posterior predictive checks. We employ the same injection set used to estimate our search's selection effects mentioned in Appendix A to create mock detected populations under a given population described by a posterior on hyper-parameters. We then compare these mock populations to the observed population. To do this, we first re-weight the injections to our inferred population for N draw draws of hyper-parameters from our population posterior, then take N obs (46 for this BBH-only analysis) draws for each of the re-weightings. This generates N draw sets of N obs "Predicted" observations for a given population inference. Next, we re-weight the individual event posterior samples to the same inferred population for N draw draws of hyper-parameters from the posterior. For each draw of hyper-parameters, we take a fair draw from each reweighted event posterior to generate our corresponding N draw sets of N obs "Observed" observations for a given population inference. From this procedure we generated 500 sets of 46 "Observed" and "Predicted" catalogs, which we compare to each other in Figure 5 to confirm that our inferred population model predictions are consistent with the observations. We show the cumulative probability as a function of observed primary mass in the top row, the relative error in predicted primary masses to observed in the second row, and the last row shows the observed and predicted primary mass distributions averaged over all of the hyper-parameters inferred in our posterior. The colored bands in the top row of Figure 5 show that a model is inconsistent with observations when the dark "Observed" band extends outside of the lighter "Predicted" band. We see this behaviour at ∼ 40-50M for the Truncated model which illustrates a conclusion from Abbott et al. (2020h): the Truncated model is inconsistent with the mergers in GTWC-2. The spline perturbation model is the only primary mass model that recovers both the low and high mass features seen in the observed distribution, but it does exhibit more uncertainty than other models in the regions between the ∼ 10M and ∼ 35M peak. When Figure 5. Posterior predictive checks for three of the parametric models used in Abbott et al. (2020h) and the spline perturbation model of this work. We show the spline model result with the highest marginal likelihood which was the 15 knot and σ knot = 1 model. The observed and predicted values for primary masses are generated by re-weighting either the injection set or the set of posterior samples for each BBH analyzed, for 500 draws from each models inferred posterior on the hyper-parameters, and then drawing 46 values from the re-weighted injections and a single fair draw from each of the 46 event re-weighted posterior samples. The top panel shows the CDF generated from these sets observed and predicted events for each of the 4 models, with the 90% credible levels enclosed by the bands, the median in dark black, and the thin black lines showing 50 of the 500 sets of 46 predicted events. The middle row uses the same set of predicted and observed events and the y-axis shows the relative error in predicted to observed mass ((m pred 1 − m obs 1 )/m pred 1 ) as a function of m obs 1 . The last row of plots shows the PDF of the top row averaged over the 500 draws from the posterior on the hyper-parameters for both sets of events. considering possible fluctuations due to small number statistics, the observations at low mass are still consistent with both the Powerlaw+Peak and Broken-Powerlaw models.

Astrophysical Implications
The BBH mass distribution is particularly well-suited to answering a wide range of astrophysical questions. In particular, the masses of detected events are relatively well-measured, and different channels of BBH formation result in different mass distributions (e.g. Zevin et al. 2017), implying that the formation history is encoded within these distributions. With the tens of detected events available now, disentangling the overlapping subpopulations in the full population (if they exist) is a challenge. Our perturbation model can be used to see if or where a distribution describing a single (dominant) sub-population or formation channel may fail to fully fit the data, which would provide evidence that there may be non-negligible contributions due to additional formation channels. The hints of structure we see at the low end of the mass distribution could point to such a superposition of multiple formation channels. Another factor that can affect the mass distribution is the physics of PISN or PPISN. Stellar evolution models describing mass loss and PPISN have uncertainties that can drastically change predictions on the masses at which the PPISN and PISN play a role. For example, choices in nuclear reaction rates within stellar cores can affect the BBH mass distribution (Farmer et al. , 2020). Our spline model enables us to measure these imprints of BBH formation in the observed distribution without enforcing the specific distribution shapes that are inherent to parametric models. Our findings corroborate the existence of a PISN "pile-up" in the 30 -40 M range, and we infer its shape without assuming a simple functional form. With more GW detections, further resolution of this peak with the spline model could offer insights into supernova physics.

CONCLUSIONS
Accurate estimation of the BBH mass distribution is paramount to getting accurate estimates of merger rates, the GW stochastic background, and false alarm rates for potential new triggers. Low-dimensional parametric models have the advantage of being easily interpreted but are limited in their flexibility and subject to a-priori expectations. We presented a semi-parametric approach to modeling the primary mass distribution of BBH mergers, using cubic splines that modulate a power law. We show that our flexible semi-parametric approach, when applied to the BBHs in GWTC-2, consistently recovers the previously reported excess of observed mergers near ∼ 35M , and shows potential signs of additional features in the low-mass end of the BBH distribution. These low-mass features beyond the power law structure, correspond with similar features found in the chirp mass distribution using a separate non-parametric approach based on a flexible Gaussian Mixture model (Tiwari & Fairhurst 2021). We show through posterior predictive checks that the spline model is at least as good as the Powerlaw+Peak model at fitting the high mass structure in our catalog while having the flexibility beyond a smooth rise into a power law to capture the apparent excess at 10M . Structure in the mass distribution that could arise from many different astrophysical phenomena but if we are able to confidently identify either a high-mass cutoff or pileup in the mass distribution it is likely to be related to effects of PISN or PPISN. These two features can both be used as calibrated massscales to measure a redshift-luminosity-distance relation with which it is possible to infer cosmological parameters with .
Our semi-parametric approach has advantages compared to other fully non-parametric approaches modeling the BBH mass distribution (Mandel et al. 2016;Tiwari 2021;Tiwari & Fairhurst 2021). The semiparametric approach leverages the information learned from the parametric models to explain the majority of the structure, while reserving the flexibility to see where observations may start to diverge from previous inferences. This same method of applying cubic spline perturbations to simpler population models can be used on any of the other commonly modeled population distributions such as the mass ratio, spins or redshift evolution. With the relatively small catalog sizes currently available, structure, if present, would likely only appear in the best measured parameters. In future work, we plan to extend this method to incorporate multi-dimensional splines, that could uncover correlations between different parameters such as peaks in the mass distribution associated with a high spin magnitude. Such correlations would be a tell-tale sign of hierarchical mergers, for example (Gerosa & Berti 2017;Fishbach & Holz 2017;Kimball et al. 2020a;Doctor et al. 2020Doctor et al. , 2021. Future work could also extend this method to more than just the BBH mass distribution, and allow for adaptive resolution splines that allow the knot locations to vary (Edelman et al. 2021b). With additional observations of GWs associated with BNSs and NSBHs (Abbott et al. 2021) our spline perturbation model is well suited to model the joint mass distribution of all GW-observed compact objects. This would complement the parametric model of , giving insights on the structure (or lack there of) of the "lower mass gap" that may exist between the heaviest neutron stars and lightest black holes.

ACKNOWLEDGEMENTS
We thank Will Farr for countless useful discussions related to this work and hierarchical modeling. This research has made use of data, software and/or web tools obtained from the Gravitational Wave Open Science Center (https://www.gw-openscience.org/), a service of LIGO Laboratory, the LIGO Scientific Collaboration and the Virgo Collaboration. This work benefited from access to the University of Oregon high performance computer, Talapas. This material is based upon work supported in part by the National Science Foundation under Grant PHY-1807046 and work supported by NSF's LIGO Laboratory which is a major facility fully funded by the National Science Foundation.

Mass Ratio Model Parameters
PowerLawMassRatio βq slope of the mass ratio power law U(-4, 12)

Spline Perturbation Model Parameters
Cubic Spline {mn} location in primary mass of the n spline interpolant knots FIXED {fn} y-value of the spline interpolant knots N (µ = 0, σ = 1) Table 1. This table enumerates all the hyper-parameters, their descriptions, and chosen priors for this work for each respective population model we use. The Truncated model is extended from the version used in Abbott et al. (2020h) to have the option of a low-mass taper of the same form as the Powerlaw+Peak model. Note that we do not describe a spin population model in this table since in this work we are not inferring a hyper-prior on the spins and instead assume they are described by the default (uniform in component magnitudes, isotropic in orientations) parameter estimation prior used to produce Abbott et al. .
We use hierarchical Bayesian inference to simultaneously infer the population distributions of the primary masses (m 1 ), mass ratios (q) and the redshifts (z) of observed BBHs. For a set of hyper-parameters, Λ, and local (z = 0) merger rate density (units of mergers per co-moving volume per time), R 0 , we write the overall number density of BBH mergers in the universe as: Above, dV c is the co-moving volume element (Hogg 1999) and T obs , the observing time period that produced the catalog with the related factor of 1+z converting this detector-frame time to source-frame. We assume a LambdaCDM cosmology using the cosmological parameters from Ade et al. (2016). The redshift evolution of the merger rate follows p(z|λ) ∝ dVc dz 1 1+z (1 + z) λ . Integrating equation A1 across all masses, and up to some redshift, z m , returns the total number of BBH mergers in the universe out to that redshift. Let {d i } represent a set of data from N obs observed gravitational waves associated with BBH mergers. We model the merger rate as an inhomogenous point process and when imposing a log-uniform prior on the merger rate, we can marginalize over the merger rate to get the posterior distribution of our hyper-parameters, Λ (Mandel et al. 2019;Vitale et al. 2021b).
where, L(d i |m 1 , q, z), is the single-event likelihood function from each events original analysis, and ξ(Λ) is the detection efficiency given a population distribution described by Λ. The procedure for calculating ξ(Λ) is described in more detail below. The LVC reports out posterior samples for each observed event, with which we can use importance sampling to estimate the integrals above in equation A2. We replace the integrals with ensemble averages over K i posterior samples associated with each event in the catalog: Here j indexes the K i posterior samples from each event and π(m 1 , q, z) is the default prior used by parameter estimations that produced the posterior samples for each event. In the analyses of GWTC-2 the default prior used is uniform in detector frame masses and Euclidean volume. The corresponding prior evaluated in source frame masses and redshift is π(m 1 , q, z) ∝ m 1 (1 + z) 2 D 2 L (z) dD L dz , where D L is the luminosity distance. To carefully incorporate selection effects to our model we need to quantify the detection efficiency, ξ(Λ), of the search pipelines that were used to create GWTC-2, at a given population distribution described by Λ. ξ(Λ) = dm 1 dqdzP det (m 1 , q, z)p(m 1 |Λ)p(q|m 1 , Λ)p(z|Λ) (A4) The above integral is not tractable since there is no analytic prescription for P det (m 1 , q, z), the detection probability of an individual event. To estimate this integral we use a software injection campaign where GWs from a fixed, broad population of sources are simulated, put into real detector data, and then run through the same search pipelines that were used to produce the catalog we are analyzing 1 . With these search results in hand, we use importance sampling to evaluate the integral in equation A4: Where the sum indexes only over the N found injections that were successfully detected out of N inj total injections, and p inj (m 1 , q, z) is the reference distribution from which the injections were drawn. Additionally, we follow the procedure outlined in Farr (2019) to marginalize the uncertainty in our estimate of ξ(Λ) due to a finite number of simulated events. We make the assumption that repeated sampling of ξ(Λ) will follow a normal distribution with ξ(Λ) ∼ N (µ(Λ), σ(Λ)), where the mean, µ, is the estimate from equation A5, while the variance, σ 2 , is defined as: Above we define N eff as the effective number of independent draws contributing to the importance sampled estimate, in which we verify to be sufficiently high after re-weighting the injections to a given population (i.e. N eff > 4N obs ). We write the hyper-posterior marginalized over the merger rate and uncertainty in estimation of ξ(Λ), neglecting terms of O(N −2 eff ) or greater (Farr 2019), as: We explicitly enumerate each of the models used in this work for p(m 1 |Λ), p(q|m 1 , Λ), and p(z|Λ) along with their respective hyper-parameters and prior distributions in Table 1  To compare competing population models in the aforementioned Bayesian framework we calculate two different measures of model goodness-of-fit, namely the marginal likelihoods (Z) and deviance information criterion (DIC) (Spiegelhalter et al. 2002). The marginal likelihood for a given model is a constant that enforces that the posterior distribution is normalized (i.e. Z = dΛp(Λ|{d i })), which has the property that it is higher for models that fit the data better or find higher likelihoods, while penalizing more complicated models by their prior volumes. As our semiparametric approach has arbitrary prior choices one needs to make, this can significantly affect the marginal likelihoods inferred.We also calculate the DIC, a metric developed specifically for Bayesian hierarchical models (Spiegelhalter et al. 2002), which is less sensitive to the arbitrary prior choices for our semi-parametric model. While there are some limitations to the DIC (Spiegelhalter et al. 2014), it provides a secondary metric to validate our model choices. The DIC is defined as: With log L the mean log-likelihood, and p D the effective number of dimensions, defined as p D = 1 2 var(−2 log L) with var(...) denoting the variance. Lower DICs indicate better models which, similarly to the marginal likelihood, favors models that find higher likelihoods while penalizing the more complicated models through the effective dimension term. We compare two models by calculating the ratio of their marginal likelihoods (i.e. Bayes Factors 1 ), defined as the ratio of each models marginal likelihoods. To compare DICs, we take the difference of two models values (DIC dif = DIC A − DIC B ) where positive differences indicate preference for model B, and negative differences indicate preference for model A.
perturbation model to the Truncated model with different choices for n. We see that the comparisons favoring our spline model peaks around 15 knots, indicating that 15 knots is a good trade-off between our models flexibility and goodness of fit. We also report the marginal likelihoods and model comparisons (relative to the most preferred model) for each of the parametric primary mass models from Abbott et al. (2020h) and various specifications of the spline model in table 2. From this table we see that the spline model is consistently favored despite our arbitrary model specifications, giving credence to the hypothesis that there are features in the data our semi-parametric method is capable of finding that previously used parametric mass models are not sensitive to. We do not use the comparisons in Table 2 to determine the validity of the Powerlaw+Spline models over others, and further studies on simulated populations and the effect of small number statistics are needed to fully assess the significance and robustness of these features. However, as catalogs of BBH mergers increase in size, the impact of small number statistics will diminish.

C. CORRELATIONS OF PEAKS
We look for the effect of our spline function biasing the inferred perturbation function by plotting a corner plot of the value of f (m 1 ) sliced through the masses that show the largest deviations. This is shown in Figure 8 for the conservative knot prior and Figure 9 for the wide knot prior. If the dip followed by the peak feature at 7.5 and 10 solar masses was found due to the nature of cubic splines we would expect to see correlations between the heights at these values which do not appear in either Figure 8 or 9. Since we are fitting for the underlying power law model simultaneously with the perturbations we also might expect to see some correlations with the power law slope and the peaks. There are slight signs of an expected anti-correlation of the 10M peak height and the power law slope, and corresponding correlation of the 35M peak height with the slope. This happens due to the degeneracy between the parametric and non-parametric portions of our model. If the 10M peak is small, the power law slope becomes steeper so that the "turnover" power law at low mass can contribute to fitting this over-density. With a steeper power Figure 8. Corner plot that shows the posterior distribution on the power law slope, α, and the height of the perturbation function, f (m1), sliced at the three masses of most significant deviation: 7.5 M , 10 M , and 35 M . We show the results for spline models with σ knot = 1 and 10 (purple), 15 (blue) and 20 (orange) nodes. The median and 90% credible intervals quoted are for the 15 knot model. law slope, the power law portion of the model under fits the excess at 35M , leading to a larger peak found in the perturbation. We also note these correlations with the power law slope are more apparent in the lower resolution spline models and under the wider prior on the knots. Figure 9. Corner plot that shows the posterior distribution on the power law slope, α, and the height of the perturbation function, f (m1), sliced at the three masses of most significant deviation: 7.5 M , 10 M , and 35 M . We show the results for spline models with σ knot = 2 and 10 (purple), 15 (blue) and 20 (orange) nodes. The median and 90% credible intervals quoted are for the 15 knot model.