A detailed analysis of GW190521 with phenomenological waveform models

In this paper we present an extensive analysis of the GW190521 gravitational wave event with the current (fourth) generation of phenomenological waveform models for binary black hole coalescences. GW190521 stands out from other events since only a few wave cycles are observable. This leads to a number of challenges, one being that such short signals are prone to not resolve approximate waveform degeneracies, which may result in multi-modal posterior distributions. The family of waveform models we use includes a new fast time-domain model IMRPhenomTPHM, which allows us extensive tests of different priors and robustness with respect to variations in the waveform model, including the content of spherical harmonic modes. We clarify some issues raised in a recent paper [Nitz&Capano], associated with possible support for a high-mass ratio source, but confirm their finding of a multi-modal posterior distribution, albeit with important differences in the statistical significance of the peaks. In particular, we find that the support for both masses being outside the PISN mass-gap, and the support for an intermediate mass ratio binary are drastically reduced with respect to what Nitz&Capano found. We also provide updated probabilities for associating GW190521 to the potential electromagnetic counterpart from ZTF.

In this paper we present an extensive analysis of the GW190521 gravitational wave event with the current (fourth) generation of phenomenological waveform models for binary black hole coalescences. GW190521 stands out from other events since only a few wave cycles are observable. This leads to a number of challenges, one being that such short signals are prone to not resolve approximate waveform degeneracies, which may result in multi-modal posterior distributions. The family of waveform models we use includes a new fast time-domain model (IMRP TPHM), which allows us extensive tests of different priors and robustness with respect to variations in the waveform model, including the content of spherical harmonic modes. We clarify some issues raised in a recent paper [1], associated with possible support for a high-mass ratio source, but confirm their finding of a multi-modal posterior distribution, albeit with important differences in the statistical significance of the peaks. In particular, we find that the support for both masses being outside the PISN mass-gap, and the support for an intermediate mass ratio binary are drastically reduced with respect to what [1] found. We also provide updated probabilities for associating GW190521 to the potential electromagnetic counterpart from ZTF [2].

I. INTRODUCTION
GW190521 [3,4] is a uniquely stimulating gravitational wave (GW) event: it challenges our understanding of astrophysical formation channels of black holes, the accuracy of our waveform models, and our methods for data analysis. The signal found is a very short transient with a duration of only approximately 0.1 s, and around four cycles in the frequency band 30-80 Hz [3]. The source of the signal was originally identified as most likely being the merger of a binary black hole (BBH) system with a total mass of about 150 solar masses in the source frame, the highest-mass merger observed to date. Furthermore, at least the more massive component was identified as having a very high probability of being inside the pair instability supernova (PISN) mass gap [5]. In addition, a potential electromagnetic counterpart has been identified [2], although its association with the GW event is not considered robust [2,6,7].
For such short signals it is however not surprising if GW waveforms corresponding to different source parameters fit the observed data equally well, and indeed already the original publication [4] by the LIGO Scientific and Virgo collaborations (LVC) discussed a wide range of possible alternative sources, and recent papers have proposed interpretations including that of a highly eccentric collision [8][9][10], a Boson star merger [11], a high-mass black hole-disk system [12], or the first instance of an intermediate mass-ratio inspiral [1]. The latter paper found a tri-modal posterior distribution, whose modes required a careful choice of priors and sampler settings to be resolved when running with the precessing frequency-domain model IMR-P XPHM [13], which was developed recently, involving some of us.
For high mass ratios, waveform models are not yet calibrated to numerical relativity (NR) simulations of precessing systems, and even the coverage offered by aligned-spin NR waveforms is sparse when compared to approximately equal masses. Therefore, modelling and extrapolation effects are expected to be significant and the impact of waveform systematics in this region of parameter space is still poorly understood. Indeed the possibility to choose among different precession prescriptions in IMRP XPHM represents a useful tool to investigate the impact of different modelling approximations on parameter estimation. Frequency-domain waveform models such as IMR-P XPHM and its predecessors IMRP P [14,15] and IMRP P HM [16] use a number of common approximations (see [17] for a recent discussion), in particular the "twisting-up" method to represent precession effects starting from NR-calibrated aligned-spin waveforms [14,15], and the stationary phase approximation (SPA). Both strategies allow to significantly accelerate waveform evaluation. The SPA is formally valid only in the slowly-evolving inspiral phase, and its continuation into the highly-dynamical merger-ringdown regime leads to inaccuracies that are likely to be particularly relevant for short-lived signals where only a few cycles around merger-ringdown are observed.
In order to gain a better understanding of the impact of different modelling approximations on parameter estimation results, we reanalyze GW190521 with different variants of IM-RP XPHM [13,18,19] and the new phenomenological time-domain model IMRP TPHM [20][21][22]. Unlike its frequency-domain counterpart, IMRP TPHM does not resort to the SPA approximation and offers a number of improvements in the description of precession effects both in the inspiral and merger-ringdown regimes. In this paper we will systematically compare results obtained with both models, following the strategy of a closely related paper [23] presenting a complete re-analysis of the GWTC-1 catalog [24], as well as of another publication specializing on GW190412 [25,26], where similar systematic comparisons were carried out.
The purpose of this paper is twofold. First, we will clarify some of the challenges encountered in the analysis of high-mass non-vanilla GW events such as GW190521, in particular in terms of waveform systematics and robust Bayesian sampling. To this end, we will perform cross-comparisons between results obtained with two independent sampling codes, parallel Bilby [27,28] and LALInference [29]. This is a particularly urgent task, as we expect the number of such atypical events to grow with the improvements in detector sensitivity. Second, we aim to provide improved parameter estimation results for GW190521 that might be useful to clarify its astrophysical properties. A key result is that we confirm the multi-modal nature of the posterior found in [1], but with some drastic quantitative changes due to improvements in the waveform models, in particular support for both masses being outside the PISN mass-gap, and the support for an intermediate mass ratio binary are drastically reduced with respect to what whas found in [1].
The paper is organized as follows. We first discuss the waveform models we employ in Sec. II, focusing on differences that are relevant for analyzing GW190521, and on how we can test robustness by comparing results from different models. We then summarize previous results from the literature in Sec. III. In Sec. IV we describe our methods for parameter estimation and for checking convergence and consistency between different prior assumptions. Readers interested primarily in new results may wish to skip to our results in Sec. V, which is introduced by a sub-section that briefly outlines the types of analyses we have performed and the results we have obtained. We give our final conclusions in Sec. VI and discuss the dependency of the results on spherical harmonic mode content in Appendix A.

A. Notation and conventions
We use the same notation and conventions as in our reanalysis of GWTC-1 [23]. We will report all masses in units of the solar mass M , and except where otherwise noted, we always refer to masses in the source frame, assuming a standard cosmology [30] (see Appendix B of [24]). Some figures and tables use a "src" index as a more explicit notation for clarity. Individual component masses are denoted by m i , and the total mass is M = m 1 +m 2 . The chirp mass is M = (m 1 m 2 ) 3/5 M −1/5 . We define mass ratios q = m 2 /m 1 ≤ 1 and Q = m 1 /m 2 ≥ 1.
We also define effective spin parameters which are commonly used in waveform modelling and parameter estimation. The parameter χ eff is defined as where the χ i are the projection of the spin vectors of the individual black holes onto the instantaneous direction perpendicular to the orbital plane. The effective spin precession parameter χ p [31] is designed to capture the dominant effect of precession, and corresponds to an approximate average over many precession cycles of the spin in the precessing orbital plane, and is defined in terms of the average spin magnitude S p , [31] where A 1 = 2 + 3/(2q), and χ p is then defined as Both χ eff and χ p are dimensionless and thus independent of the frame (source or detector). We will employ waveforms with several multipoles beyond the quadrupolar contribution, always considering pairs of both positive and negative modes when referring to a particular multipole. Thereby, to refer to the example list of multipoles (l, m) = (2, ±2), (2, ±1), we will use the notation (l, |m|) = (2, 2), (2, 1) or simply (2, 2), (2, 1).
Here we employ two further recently developed models that represent upgrades of IMRP P HM and constitute a fourth generation of IMRP models: • The frequency-domain model IMRP XPHM [13] which builds upon an underlying non-precessing model IMRP XHM [18,19,57] that features calibration of the subdominant harmonics to NR simulations.
• IMRP TPHM [20,22], building on the nonprecessing model IMRP THM [20,21], which essentially applies the same phenomenological techniques at the heart of IMRP XPHM to construct a native time-domain model. Working in the time domain allows several key improvements that we will discuss below.
All these models include a description of precession effects and sub-dominant harmonics, but do not include eccentricity, and they have complementary strengths and shortcomings that we will detail below. I. Waveform models used in this paper. We indicate which multipoles are included for each model. For precessing models, the multipoles correspond to those in the co-precessing frame. For IMRP TPHM, we also show comparison results with reduced sets of multipoles at several points in this paper, and in fact we use the ≤ 4 setting as a default run and comparison basis in most studies of alternative model options or priors.
The IMRP models describe precession via an approximate map between signals from non-precessing and precessing systems, which we will refer to as "twisting-up" [58][59][60]. This approximation exploits the fact that, at least during the inspiral, the precession time scale is much slower than the orbital time scale, and thus the precessing motion mainly acts as an amplitude modulation. The spin of the remnant of the precessing system is however in general significantly different from the final spin of the non-precessing system. For a recent discussion of the approximations of this approach see [17].
The SEOBNR PHM model [32-34, 52, 61] numerically integrates in the time domain the EOB BBH dynamics, including the spin-precession equations, using a Hamiltonian and GW flux that are tuned to non-precessing NR simulations. Then, the waveforms in the inertial frame are obtained by applying a time-domain rotation ("twisting-up") to the waveforms in the co-precessing frame [33,34,61].
Neither the SEOBNR PHM nor IMRP models include the asymmetries between positive and negative m spherical harmonic modes in the co-precessing frame, which are related to the large recoil velocities observed in some NR simulations of precessing binaries, as discussed in [62].
We now turn to describe relevant aspects of the IMRP -X and IMRP T families which we use for the new results presented in this paper. Frequency-domain waveform models are particularly attractive for GW data analysis, since they are naturally adapted to a matched-filter type analysis, where the noise is characterized in the frequency domain, and accordingly allow the most computationally efficient Bayesian inference analysis. In order to accelerate the evaluation of precessing waveforms, current frequency domain IMRP models also use the SPA approximation to compute the transfer functions between the frequency-domain non-precessing waveform and the precessing waveform in an inertial frame (see [63] for more accurate alternatives). The assumptions underlying the SPA fail for merger and ringdown, but the method has been found to work surprisingly well, and has been routinely used in GW data analysis, see e.g. [24,64]. The approximation does however have to be employed with caution when essentially the only observable part of the signal is the merger and ringdown, as is the case for GW190521. This shortcoming has been one of the main reasons for us to also develop a time-domain phenomenological waveform model, IMRP TPHM, which does not rely on the SPA. One of the goals of this paper is to discuss in detail how to avoid misleading conclusions from analysing GW190521 with IMRP XPHM, and how improved results can be obtained with IMRP TPHM.
Since neither IMRP XPHM nor IMRP TPHM are calibrated to precessing NR waveforms but rather build on the above approximations to describe precession effects, it is essential to incorporate in the models some functionality to test the robustness of results for challenging events like GW190521. As discussed in Appendix F of [13] for IMRP XPHM and in the appendix of [22] for IMRP TPHM, the LALSuite [65] implementation of our models supports several options regarding the choice of precession prescription and final spin approximations. These options are selected with parameters that take integer values, which we will refer to as PV for the inspiral precession version and FS for final spin.
Our "twisting-up" procedure is based on time/frequency dependent rotations from the co-precessing frame to an inertial one in which we observe the signal. For the IMRP models this rotation is implemented through three Euler angles. IMRP P only supports an effective single-spin, orbitalaveraged description valid at next-to-next-to-leading (NNLO) post-Newtonian order. IMRP XPHM allows using the same prescription, but as a default it relies on a more recent double-spin description that can be derived within the post-Newtonian framework using multiple-scale-analysis (MSA) [66] (this description is also used by IMRP P HM). IMRP TPHM implements both NNLO and MSA Euler angles, but its default behavior is to numerically integrate evolution equations for the component spins as discussed in [22]. We will refer to different precession prescriptions with the acronym PV and follow the same convention enforced in LAL-Suite, where PV=223 corresponds to the MSA approximation and PV=300 to the numerically integrated angles.
Another setting of the models that can be specified by the user is the final spin approximation, as discussed in Sec. IV.D of [13] for IMRP XPHM and Sec. II.E of [22] for IM-RP TPHM. The default choice for IMRP XPHM is to use a precession-averaged equation inspired by the MSA formalism. This version will be referred to as version FS=3. Alternative versions attach the in-plane spins to the larger mass, either relying on the usual effective precession spin χ p (FS=0, which is adopted by all third-generation IMRP models), or by taking the norm of the in-plane spin vectors at the reference frequency (FS=2). The default version of IMRP -TPHM takes the norm of the in-plane spin vectors at the coalescence time (FS=4).
There are several improvements in the treatment of precession achieved by the time-domain IMRP TPHM in comparison with the frequency-domain IMRP XPHM. First, in order to obtain explicit expressions for the spherical harmonic modes of the precessing frequency-domain models, IMRP -XPHM and previous IMRP models use the SPA to compute approximate Fourier transforms. Second, in the time domain it is simple to incorporate analytical knowledge about the ringdown frequencies in the ringdown portion of a precessing waveform, see Sec. II.E of [22]. This has not yet been achieved in the frequency domain. This is particularly crucial for GW190521, where a large part of the observed signal-to-noise ratio (SNR) is due to the ringdown portion of the signal. Third, the numerical integration of the equations for the spin dynamics in IMRP TPHM also resolves an inconsistency of the MSA Euler angles with the non-precessing limit, which we discuss in [22]. The numerical integration also provides further gains in accuracy and we find it to decrease the computational cost [22].
Finally, we note that a particularly challenging region of the BBH parameter space arises at larger mass ratios, where the precession cone angle can become large, and the orbital angular momentum L can become smaller than the (sum of the) component spins. Then both L and the total angular momentum J may end up flipping their direction. The latter situation is also known as transitional precession [59,67,68], as opposed to simple precession, when J at least approximately maintains its direction. Very few NR simulations exist for the cases of large angles between J and L, and for the sign of J flipping, and these situations are related to various caveats in the post-Newtonian and MSA approximations that are often used in waveform modelling, and in particular in the IMRP models. While the systematic errors of precessing waveform models are in general not yet very well understood, this is particularly true when the normal to the orbital plane or the final spin flip sign with respect to the direction at large separation, This situation indeed arises for the results of [1]. In such cases one should proceed with great caution, and test the robustness of results by comparing different waveform models. Future work will aim to improve the robustness of our models for such situations.

III. SUMMARY OF PREVIOUS RESULTS
Due to its exceptional nature, GW190521 was the subject of two dedicated LVC publications [3,4]; later it was also reanalyzed in the context of GWTC-2 [64]. Only results obtained with NRS are shown in the discovery paper [3], while [4] also presents results obtained with SEOBNR PHM and IMRP P HM. The mass-ratio prior in these LVC analyses was constrained to q ≥ 0.17, matching the NRS extrapolation region. They also used a flat prior in detectorframe masses and a power-law d 2 L distance prior (albeit the latter was changed to a uniform in comoving source-frame volume prior in [64]). GW190521 was among the few events in GWTC-2 for which spin magnitudes could be constrained to be non-zero: this is also reflected in its relatively large inferred χ p (≈ 0.7 median value). The estimated mass ratio when running with NRS was q = 0.79 +0.19 −0.29 and runs performed with the other two waveform approximants delivered very similar results. The LVC analysis has strong astrophysical implications, as it places either or both components in the pair-instability supernova mass-gap (PISN) and the final remnant in the realm of intermediate-mass black holes, for which no conclusive evidence existed at the time of publication. Note that the limits of the PISN mass gap are uncertain, they have been placed at approximately 50 and 130 solar masses in [5], but a more recent analysis [69] suggests the lower limit could be as high as 70 solar masses, and the upper limit as high as 161 solar masses. Another recent analysis [70] placed the limits at 59 and 139 solar masses. For the LVC analysis lower limits of 50 and 65 solar masses were employed. Fishbach and Holz [71] challenged this conclusion starting from the observation that the mergerrate of systems involving a mass-gap component is expected to be very low. By imposing a population-informed prior, they concluded that GW190521 can be considered a "straddling" binary, where neither component can be confidently placed within the mass-gap. In particular, they find that, under the assumption that the secondary mass falls below the mass gap, then the primary mass distribution has a large support above the mass gap. This conclusion was supported in a later study conducted by Nitz and Capano [1], who suggested that the relatively tight constraint on the mass ratio imposed by the LVC analysis, coupled with the choice of luminosity distance prior and sampler settings (among which insufficient live points), led initial parameter estimation studies to exclude the highest likelihood region for this event. A comparison of results from the LVC and Nitz-Capano results is shown in Fig. 1 on the publicly available posterior data. Their reanalysis also identifies the primary as an IMBH, with a mass confidently above 100 M . The authors explored the impact of different mass priors (in the source frame) and imposed a uniform in comoving-volume prior on the luminosity distance, running both NRS and the more recent IMRP XPHM, which had only passed internal LVC review and become publicly available as part of LALSuite [65] on April 1 2020 and hence was not yet included in [3,4]. They found strong support for very unequal masses (q ≤ 0.25) and clear signs of multimodalities in the posteriors, which could not be eliminated when re-weighting to match the LVC priors. In particular, three distinct modes were identified, with Q ≈ 1, 5, 10. The support for the Q ≈ 10 mode was enhanced when using a flat prior in Q which favors more unequal-mass systems. Their analysis however did not yet use the latest version of the IMRP X-PHM model nor explored different options of this model, and the default version at that point used a prescription for the final spin of the merger remnant that has since been updated in the publicly available LALSuite version [65].
In a more recent paper [72], Capano et al. have analyzed GW190521 with model-agnostic ringdown signals, extracting an additional mass ratio measurement from only the ringdown part of the signal. The resulting posterior is unimodal, peaked away from equal masses, but broadly consistent with both the LVC results and the two lower-Q peaks of their previous results. They also include a re-run of their IMRP XPHM analysis with the updated default final-spin prescription which we discuss in Secs. II B and V C of this present paper and in more detail in the recently updated Sec. IV D of [13]. Those updated results no longer support the third mode at Q ≈ 10 and are overall consistent with the results we present here, and with another run with the updated IMRP XPHM default version presented in parallel by [70]. These results have been released together with a re-analysis of the public LIGO-Virgo data from the O1, O2 and O3a observing runs [73], and are available in a companion data release for [73]. We have compared the results presented here for IMRP XPHM with the updated results from [73], finding broad consistency, but with some differences in the recovered posteriors. A comparison of low-mass events from our re-analysis of GWTC-1 [23] with their results reported for these events shows larger discrepancies, as we discuss in [23]. The good agreement we find here and in [23] between different parameter estimation codes, LAL-Inference [29] and parallel Bilby [28], suggests that our results are robust. Tracking down the reasons for the differences found with respect to the results reported in [73] would require further work; we do note however that [73] use a different estimate of the noise power spectral density (PSD), and to our knowledge no calibration uncertainty estimates are employed.

A. Data set
We use public GW strain data collected by the Advanced LIGO detectors [74] and Advanced Virgo detector [75] from the Gravitational Wave Open Science Center (GWOSC) [76,77], as well as PSDs and calibration uncertainties included in the GWOSC release [78]. From the available GWOSC strain data sets we have selected the data sampled at 16 kHz, with a sampling rate of 1024 Hz chosen for our analysis, consistent with the choice in [3,4]. The lower and upper cutoff frequencies for the likelihood integration were taken to be 11.0 Hz and 512 Hz (the Nyquist frequency corresponding to the sampling rate), again consistent with [3,4].

B. Sampling codes
We have carried out Bayesian parameter estimation of the signal using two publicly available codes, the Python-based parallel Bilby (pBilby, PB) code [27,28], which uses the dynesty [79] variant of the nested sampling algorithm [80], and the LALInference (LI) code [29], which is part of the LALSuite [65] package for GW data analysis, using its implementation of Markov Chain Monte Carlo (MCMC) sampling.
Parallel Bilby provides a highly parallel and flexible implementation of nested sampling, and supports a range of priors and choices of sampling parameters and settings. With pBilby, we sample in mass ratio and chirp mass, which is easier than sampling the component masses in that code. We largely use the default settings of the code apart from the following choices: we fix the minimal (walks) and maximal (maxmcmc) number of MCMC steps to 200 and 15000 respectively. For our final results we have set the number of autocorrelation times to use before accepting a point (nact) to a value of 30. We have varied the number of nested sampling live points (nlive) between 1024 and 4096 for selected runs to test that we have obtained (sufficiently) converged results, and always show the results for nlive= 4096. In order to speed up calculations we use distance marginalization as described in [81]. For each of the pBilby runs we quote results for, we have carried out four independent simulations (independent seeds), and then merged the posteriors to a single posterior with PESummary [82].
LALInference samples in mass ratio and chirp mass, reweighting to a prior that is flat in component masses as described in [29]. We use essentially standard LALInference settings with eight temperatures, but a large number of independent chains, 120 for our production runs. For our LALInference runs we do not employ the distance marginalization used for our Bilby runs.
We have previously used pBilby as our primary code for our re-analysis of the GW190412 event [22,23,26], where we found good agreement with LALInference results as reported in [26]. We have however found that the computational cost of comparably well sampled pBilby runs is significantly higher than for LALInference runs due to the high required settings of the nact parameter. Here we use LALInference for our primary results and pBilby for comparisons.

C. Priors
Runs performed with pBilby have been sampled with a prior uniform in "inverse" mass ratio Q = 1/q, following [1]. This prior emphasizes unequal masses and improves pBilby convergence in the unequal-mass regime. Unlike [1], we choose to sample in the detector-frame, to take advantage of distancemarginalization, which would require a non-standard likelihood in Bilby if sampling in source-frame. In most of the results shown, and as indicated when stating results, we have performed a post-processing re-weighting from this prior to a prior flat in component masses with the corresponding functions from the Bilby code, to obtain results matching the same prior as the LALInference runs (flat in component masses).
Additionally we have performed some runs with restricted versions of these priors to improve resolution for more unequal masses. In particular, we report in Sec. V results for LALInference runs in a mass ratio range q ∈ [0.035, 0.15] and pBilby runs in a range q ∈ [0.035, 0.2]. For studying the possible association of the event with the reported AGN flare [2] we have also performed runs fixing the sky location and luminosity distance to the values reported for the AGN flare.
Finally, we have employed mainly three different sets of mass prior ranges for the runs reported in this work, checking that we avoid any significant railing against prior limits. Note that a small amount of railing against the lower mass-ratio bound is still present; however we decided to not attempt to precisely map out the tail at low q as models become less reliable and computational cost increases significantly. We comment on the possibility of further low-q posterior modes in Sec. VI. In some cases we also reweight the posterior samples obtained with a certain set of priors to obtain an approximation of what the posterior should be when using a different set of priors, without having to run the full alternative inference. Specifically, we use the bilby implementation of this reweighting procedure which converts samples with a given prior in chirp mass and mass-ratio to a prior flat in component masses by resampling the posterior with weights defined as the ratio in new-over-old prior values times the Jacobian of the transformation. The procedure produces new posteriors that contain only 25% of the original number of samples.
All runs have maximum component spin magnitudes limited to 0.99, and the luminosity distance prior is chosen as uniform in comoving volume, assuming the Planck15 [30] cosmology with a range D L ∈ [0.2, 10] Gpc. The LALInference prior contains an additional factor 1/(1 + z L ), accounting for time dilation, which is not present in the definition for pBilby; but using the reweighting procedure on pBilby results, we have tested that this does not have any noticeable influence on estimates of D L or any other quantities.

D. Maximum likelihood values and waveforms
The main results of Bayesian parameter estimation are the posterior distributions, and point estimates are usually given as medians with error estimates given by the 90% confidence intervals. Sometimes, it can also be enlightening to consider the maximum-likelihood value (max L) returned by an analysis, and which point in parameter space it correspond to, as the likelihood directly answers the question of how well the employed model (across the sampled part of parameter space) can fit the data. However, there are several caveats in interpreting the max L values over a set of posterior samples, since a Bayesian parameter estimation run, such as those we employ here using pBilby and LALInference, is by construction not an optimal max L finding algorithm. The prior has significant influence on how densely which parts of the parameter space are evaluated, and the max L reported over the final posterior sample may be far from the actual maximum over all likelihoods evaluated while the sampling chains progressed. It is important to note that the number of posterior samples is typically much smaller than the number of likelihood evaluations, and to achieve a good estimate of max L much more expensive sampling settings may be required than in order to get good estimates for source parameter values and error estimates. Nevertheless, comparing the max L across runs can be a useful additional diagnostic of the behavior of waveforms and samplers, and we highlight them on all posterior plots in this paper.
Indeed for GW190521, we find that the maximum-likelihood parameters do not appear to be stable across runs, and are influenced by statistical fluctuations, sampler settings, as well as waveform models and priors. For an example, see Sec. V C.

A. Overview
Our main results derive from the posterior distributions we have obtained with the IMRP XPHM and IMRP T-PHM models. To test the influence of the harmonic content in the templates, we will in general present results for different harmonic content for IMRP TPHM. In Figs. 2 and 3 we show these posteriors for some key parameters: the component masses, mass ratio, effective spins χ eff and χ p , luminosity distance d L , and the angle θ JN between J and the line of sight. We find consistency between the results obtained with LALInference and pBilby after re-weighting the pBilby results (with the uniform-in-Q prior) to a prior that is flat in component masses, giving us confidence in our sampling of parameter space. Recovered SNRs and signal-versus-noise Bayes factors for our main runs and several different model options are shown in Table II. Point estimates for key parameters of the main runs are also summarized in Table III, again comparing with those from [1,3,4]. Complete posterior datasets for our standard LALInference IMRP TPHM run with max = 4 and standard IMRP XPHM run can be found in our Zenodo data release [83].
To demonstrate the overall behavior of the GW190521 signal and the quality of the match with our waveform models, in Fig. 4 we show the max L templates from several IMRP TPHM runs with different mode content compared to the whitened detector data at the time of GW190521 and a waveform reconstruction from the unmodelled cWB analysis [3,4,84]. The detector data is best matched by the ringdown region of the IMRP TPHM model, while the cycles before merger are suppressed once whitened by the instrument's noise amplitude spectral density.
Below we will discuss details and consequences of the posterior results shown in Figs. 2 and 3, and we will analyze further posterior distributions, starting with different versions of the non-precessing versions of our models in Sec. V B, where we find good agreement between them. This serves as the more solid basis for the more challenging analysis with precessing models. We then use the IMRP XPHM frequencydomain model in Sec. V C and compare with the analysis of Nitz and Capano [1], and discuss effects of a code change we have implemented in the default version of IMRP XPHM, tracking the flipping of direction of the total angular momentum J in the same way as for non-default versions. We then investigate the case for multimodality in the mass parameters reported by [1] in Sec. V D, showing that we recover a multimodal mass posterior both with the IMRP XPHM and IMRP TPHM models, although with modified details compared to [1]. Then in Sec. V E we investigate the support for precession in the source system, comparing results obtained with precessing and non-precessing approximants. Finally we study the implications for component masses in the mass gap and the support for the association with the AGN flare as a possible electromagnetic counterpart in Secs. V F and V G. Further analysis of the importance of the multimode harmonic content is presented in appendix A. A crucial part of this analysis is to build confidence in our results by showing consistency between results obtained with different priors and different samplers (nested sampling [80] as implemented in pBilby and MCMC as available through LALInference).

B. Non-precessing approximants
Before turning our attention to precessing models, we will inspect results obtained with non-precessing waveform approximants. In this simplified context, current waveform models have reached a certain level of maturity, where all state-of-theart versions have been calibrated to NR simulations, including the subdominant harmonics content, to a varying degree. Therefore, we expect good agreement between different models, at least when the same subdominant mode content is included. We  [3,4], but as discussed in appendix A the posteriors become much better resolved once including modes up to ≤ 4, and seem mostly converged in comparison to adding further modes ≤ 5, and hence we use the ≤ 4 run as our main result in this paper.  show results for LALInference runs with IMRP THM and IMRP XHM in Fig. 5. One can see that there is consistency between IMRP XHM and IMRP THM when the same set of modes is included, which implies disabling the (3,2) mode in IMRP XHM and restricting to max = 4 in IMRP THM. We do observe larger differ-ences when including all the available modes in each model, with a shift towards slightly lower q and mild multimodality in the distance and inclination parameters for IMRP XHM, although joint distributions still look broadly consistent with IMRP THM. We also notice that the recovered mass ratio and effective spins are consistent with the values reported in the LVC publications. The same goes for other key parameters, such as source-frame masses, distance and inclination (see Fig. 1 and 6 in [4]). For all these results, both component masses lie confidently within the PISN mass gap (at 90% credible intervals).

C. Analysis with IMRP XPHM
The results reported in [1] for the IMRP XPHM model were obtained with the default version of the model (corresponding to MSA Euler angles and final spin version FS=3). The posteriors obtained by [1] have non-zero support in regions of parameter space where the direction of the total angular momentum J flips (see Sec. II B) and would thus require careful cross-checks for robustness, as discussed previously. This is due to the fact that for the default version we had initially implemented a different behavior as for other options: instead of attempting to track the direction of the total angular momentum J, a warning message was to be printed, alerting the user that the model is less reliable in case of flipped J. After the publication of [1] we however realized that the warning messages had not been printed correctly when the calculation of subdominant harmonics was activated. To avoid confusion, we have more recently implemented a change harmonizing the behavior of the different final-spin versions, and the code now always tracks the direction of J for all parameter settings; this is now also described in the recently updated Sec. IV D of [13].
With this change all final-spin versions now produce consistent results, as shown in Fig. 6, with a much reduced support for the parameter region where the mass ratio is high and the effective spin negative, and where thus J may flip its sign. In particular, we note that, using the latest code version, the support for both masses being outside the mass-gap is drastically reduced, see Table V. Consistent results with the updated default version have also been reported by [72] and [70], but here we present the first direct comparison using multiple final-spin versions. We also find in Fig. 6 that when changing the final-spin version, the position of maximum-likelihood sample changes considerably, this is however not surprising as discussed in Sec. IV D.

D. Multi-modality and support for high Q
We now turn to examining the results obtained with the default settings of IMRP XPHM and IMRP TPHM with LALInference and pBilby, where in both cases the two approximants were run with the same priors and sampler settings. As we have already mentioned, pBilby posteriors have been re-weighted to allow a direct comparison with LALInference results, see Sec. IV C. Results are shown in Fig. 2. One can appreciate a remarkable consistency between the two sampling codes. It is also clear that mass-ratio posteriors have a multi-modal behavior for both models. The main difference here is that more unequal mass ratios (q ∼ 0.25) in IMRP -XPHM are correlated with large negative χ eff while the unequal-mass-ratio support for IMRP TPHM is correlated with moderate positive χ eff . Compared to inference with aligned-spin models, support for the components to lie within XPHM. The prior is uniform in inverse mass ratio Q, re-weighted to flat in component masses. This is however an example where one can see that the max L estimate can be less robust than the posterior distribution, with the max L point when using the IMRP P prescription (FS=0, marked by an orange star) having very different parameters from the max L samples obtained using the other alternative final-spin prescriptions (light and dark blue stars) or the one from the Nitz-Capano run (red star). See Sec. IV D for caveats on interpreting max L estimates from Bayesian parameter estimation runs. For ease of inspecting the 1D posteriors, we repeat them as additional panels below the corner plot with extended aspect ratio. the mass-gap is reduced (see left panel), however we defer an extensive discussion of this point to Sec. V F. In line with [1], we find evidence for at least one high mass-ratio mode at Q ≈ 5, in addition to the mode with near-equal masses as originally reported by the LVC analysis (right panel).
In Fig. 7 we can see a comparison of the highest SNR values for the default LALInference runs with both IMRP X-PHM and IMRP TPHM. We can observe that the q ∼ 0.2 region produces similar SNRs for both models, but IMRP -TPHM has support for higher SNRs in the close-to-equal mass region. IMRP TPHM also recovers a small strip at q ∼ 0.1 at more or less the same height as the q ∼ 0.2 bulk. Differences in network matched-filter SNR here are only about 0.15 (which corresponds to 3.86 in max L). For IMRP -XPHM, the maximum SNR is located at q = 0.26 while for IMRP TPHM it is located at q = 0.975, in agreement with the max L positions shown in the right panel of Fig. 2. In order to better explore the regime of very unequal masses, we have performed IMRP TPHM runs with restricted mass-ratio priors, both with LALInference and with pBilby. In Fig. 8 one can see that results are consistent with not finding particular support for another mode below the one at q ∼ 0.2. The full prior run is poorly sampled in this region, with only 3.6% of samples below q = 0.15, so the small peak at q = 0.06 is probably an artefact from insufficient resolution in this region, and it is not recovered by the restricted runs. The maximum SNR recovered by the restricted runs is also lower than the SNR recovered by the full run near equal masses.
We have also checked that results are robust for different IMRP TPHM versions, as can be seen in Fig. 9. As discussed in appendix A, the bimodality is not recovered when using only the dominant ≤ 2 spherical harmonic modes, but is robust under inclusion of different subsets of the higherorder modes implemented in IMRP TPHM. Therefore, we overall find clear evidence for a bimodal mass-ratio posterior with a secondary unequal mass-ratio peak near Q ∼ 5, but while there is also non-vanishing posterior support reaching to even more extreme ratios, we find no clear evidence for a third peak as originally reported in [1]. These results are consistent with [72] and [70].

E. Spin and precession
In terms of spins, one can see in Fig. 3 that while IMR-P XPHM recovers a posterior that is approximately symmetric around χ p = 0.5, IMRP TPHM strongly favors high values of χ p . For χ eff , the IMRP XPHM posterior is bimodal, with peaks close to χ eff = 0 and at large negative values, while the IMRP TPHM posterior is broad but unimodal with the median close to small positive values. This is reflected also in the position of the max L points from the posteriors: while for IMRP TPHM results, both with pBilby and LALInference, max L lies at low positive χ eff and high χ p ≥ 0.8, for IMRP XPHM the max L is located around χ p ∼ 0.5 and its χ eff is large and negative.
Another thing to notice in relation with Fig. 3 is that adding precession does not seem to significantly affect the distance-θ JN joint distribution for IMRP XPHM with respect to IMR-P XHM, while for the time-domain IMRP T model family precession indeed helps to better constrain the posteriors. This is supported by the Bayes factors for the precessing hypothesis (computed from the difference in log signal-to-noise Bayes factors between precessing and non-precessing results) shown in Table IV, where the time-domain models show stronger support for precession than the frequency-domain models.

F. Masses and mass gap
In Table V we show the posterior probabilities for the component objects of the source of GW190521 being inside the PISN gap. The exact boundaries of this gap are not known exactly and depend in a highly non-trivial way on nuclear reaction rates and several aspects of stellar dynamics. For simplicity, we will provide probabilities computed assuming a gap either in the range [50, 120] M [5] or, following more recent estimates [69], in the range [70,161] M (we report results for the latter case in parentheses). Shifting the estimated boundaries towards higher masses increases (decreases) the probability of the heavier (lighter) component lying in the gap; it also drastically decreases the probability of both masses being in the mass gap.
We observe that IMRP XPHM with a prior that is flat in component masses (run with LI) has more support for the PISN gap hypothesis, as do the non-precessing models (both IMRP XHM and IMRP THM). One can see that for the IMRP XPHM LI run, sampled uniform in component masses, the support for at least one component in the mass gap is greater than 90%. The alternative prior uniform in Q = 1/q enhances the high mass-ratio region of the posterior, which typically makes both components lie outside the gap. In the runs we performed with pBilby and this prior, the support for components in the mass gap drops for IMRP XPHM. With IMRP TPHM, the mass gap probabilities are generally lower than for IMRP XPHM. Hence, we conclude that inference with non-precessing models, or with IMRP XPHM and the uniform-in-q prior, prefer the hypothesis of at least one object in the PISN gap (> 9 : 1 probability ratio), matching the original conclusions of [3,4]; but IMRP XPHM runs with uniform-in-Q prior and IMRP TPHM inference have only mild preference (ranging from ∼ 2 : 1 to ∼ 3 : 1 depending on model version and mode content) for this scenario, allowing more readily for the "straddling" hypothesis from [71] of the heavier BH above and the lighter BH below the PISN gap. As seen in Table V,  results for IMRP TPHM with different mode content ( ≤ 3, 4, 5) are broadly consistent, except for a larger probability with ≤ 3 of having the smaller mass, or either of the two masses, in the mass gap.

G. Extrinsic parameters and EM counterpart
The tentative association of GW190521 with the AGN flare ZTF19abanrhr [2] would be hugely impactful as the first detected electromagnetic counterpart to a BBH merger and open up cosmological applications [85][86][87][88]. This association is however far from certain [6,7]. Still, below we report updated association probabilities based on our new parameter estimation results. It is also illustrative to study how the GW190521 posteriors would change when factoring in knowledge of the sky location and distance of this counterpart as priors on the GW parameter estimation. To do so, we consider two scenarios: in the first, we only fix the right ascension and declination of the source, while in the second we also fix its luminosity distance, which can be calculated from the flare's redshift assuming a standard cosmology [30].
First, looking back at the d L -θ JN panel of Fig. 3 for the runs   with standard priors (no constraints on source location), we can see that while none of the models is able to break the hemisphere degeneracy, IMRP TPHM results are able to constrain more a particular inclination in each hemisphere, albeit some bimodality in each hemisphere is also present. Similarly, IMRP TPHM appears to deliver a better constraint on the luminosity distance than IMRP XPHM, for which there are also hints of bimodality in the posterior. We note also that the max L sample of the IMRP XPHM posterior has a lower luminosity distance than that of the IMRP T-PHM posterior, and that these results are consistent between LALInference and pBilby runs.
Switching to IMRP TPHM runs with counterpartinformed priors, in Fig. 10 we present the results for masses and spins. Fixing either the 2D sky location or the full 3D localization to those of the AGN flare, the support at mass ratio Q greater than 5 is enhanced in both cases. However, fixing the 3D localization also constrains the mass posteriors more overall, while fixing only the 2D sky location mostly extends support to higher Q (lower q). The 2D-fixed prior produces a slightly bimodal posterior in χ eff that is however broadly consistent with that from the standard prior, while the run with fixed 3D localization shifts the posterior mode of χ eff towards 0. For the precessing spin parameter χ p , fixing only the sky position has no noticeable effect, while fixing the 3D localization shifts the recovered posterior distribution to milder values. It is also worth noticing that fixing the 3D localization increases the probability for the companions to be inside the PISN mass gap (see Table. V) with respect to the standard run, while fixing only the sky location prior does not seem to have an effect on this. We also report the Bayes factors for these fixed-prior runs over the default runs in Table VI, which are quite high, but require a full analysis of the actual association probability to interpret.
To quantify this probability of a physical association between the GW and EM signals, in Table VII we show the results from performing the same multi-messenger coincidence significance analysis as presented in [6] (and using their public code), based on the localization overlap. We give results for the posterior overlap integrals for the sky location (I Ω ), for the distance alone (I D L ), and for the combined 3D localization (I D L I Ω ). Assuming the same prior odds as in [6], based on the reported number of flares similar to ZTF19abanrhr in the ZTF alert stream, we compute the odds O C/R for the coincident hypothesis. The resulting odds vary by a factor ∼ 2 between runs with IMRP XPHM and different versions of IMRP -TPHM (precession prescriptions, final spin versions and mode content) and overall show mild preference for association. However, our results are consistent with the findings of [6], with the highest O C/R we find at the same level as that reported for a run with the SEOBNR PHM model in [6]: essentially, despite the high Bayes factors the evidence is insufficient to confidently associate the events. For further illustration, Fig. 11 shows the position of ZTF19abanrhr compared with the sky location posterior density for GW190521 recovered by our default IMRP TPHM run.    comes from the ringdown to the final Kerr black hole, as can be seen in Fig. 4. Second, we have used the new time-domain IMRP TPHM model, which improves over IMRP -XPHM in how it treats precession, in particular regarding the ringdown, and which recovers a higher SNR (however consistent within statistical errors) and signal-to-noise Bayes factor (see Table II). Our overall results are broadly consistent with the original LVC analysis [3,4], in the sense that the inferred source parameters are consistent at 90% credible intervals. We confirm the complicated multi-modal posterior structure as first reported by [1], including support for more unequal mass ratios, but with different statistical significance of the peaks compared to what has been found in [1].
Before summarizing some more of our findings, we stress that our analysis is not the final answer on this uniquely challenging event either: not only are future waveform models expected to improve accuracy in this regime, but we also expect significant progress in understanding the systematic errors of precessing waveform models. One way of analyzing systematic errors would be to perform injection studies, where NR waveforms are injected into synthetic noise and then the bias of the recovered parameters can be studied. For GW190521-like signals, one of the challenges of such a study is the lack (or sparsity) of suitable NR waveforms across the relevant part of parameter space, including very unequal masses. Another one is that, especially for signals with strong precession, the results may significantly depend on the extrinsic parameters chosen for the injection, which will increase the computational cost and difficulty of interpretation for such studies.
As a first step, it will be interesting to repeat our analysis with other currently available time-domain waveform models which also cover large mass ratios Q and can therefore be used to test posterior support in that region. A forthcoming study [89] will present a similar re-analysis of GW190521 with the SEOBNR PHM model, which will allow a direct comparison with our results.
Meanwhile, the consistency we observe between two independent sampling codes, pBilby using nested sampling [27,28] and LALInference [29] using MCMC sampling, gives us confidence in our results, which also appear to be robust under changes of mass priors. We have also exploited the flexibility of the IMRP XPHM and IMRP TPHM models to check the influence of different modelling prescriptions and choices of spherical harmonic mode content on parameter estimation, finding that posteriors are generally robust against changes in the models' internal options. In particular, we focused on the final spin approximation, which is crucial for the ringdown regime. We have also discussed caveats when considering max L points in parameter space, pointing out that maximum likelihood is typically poorly resolved by the posterior samples from Bayesian parameter estimation algorithms.
While keeping in mind that future waveform models will provide further insight into the properties of GW190521, with this study we already provide improved parameter estimation results for the source of this event. One of our central results is to confirm the multi-modal nature of the posterior found in [1], with strong support for two peaks at near-equal mass ratios and at Q ∼ 5. We do however not find significant support for an intermediate mass-ratio merger with mass ratios of ten or higher, consistent with other recent results by [72] and [70]. We do not exclude further modes at higher Q, but we do not expect them to be significant based on our findings, and most importantly current waveform models are at this time significantly less reliable for mass ratios below our cutoff of q = 0.035 (Q = 28.57). We also provide updated probabilities of component masses being located in the PISN mass gap, in general confirming the preference of the original LVC analyses for at least one component inside the gap, but with IMRP TPHM results reducing the preference against a "straddling" binary configuration with one component below and one above the mass gap; and of associating GW190521 to the [2] potential electromagnetic counterpart.
parameters and to obtain more information from the signal. To test the effect of including different sets of harmonics, in Fig. 12 we compare the main parameters of the signal for runs performed with different harmonic content, going from ≤ 2 up to ≤ 5. We can observe how the recovered posterior densities tend to converge as more harmonics are included, resulting in a very similar distribution for ≤ 4 and ≤ 5. This finding justifies in part our choice of performing the main runs of this work for the subset ≤ 4. Going beyond ≤ 5, future work will be required to study the impact of harmonics that are not included in our waveform models.
A rough quantitative estimate of the importance of higher modes can be obtained by computing the optimal SNR contributed by each harmonic. The optimal SNR for one mode can be defined as where , refers to the usual noise-weighted inner product and h lm denotes the contribution of a given mode to the strain measured by the detector. We compute the SNR contribution of each harmonic employing the same PSDs we use for parameter estimation throughout this paper. In order to cleanly separate between different modes in the inertial frame, we perform the calculation using the aligned-spin version of the models, namely IMRP XHM and IMRP THM. The source parameters are taken from the posterior samples of the same IMRP TPHM LALInference run (corresponding to the red curves of Fig. 3). We present our results in Fig. 13: based on the SNR contribution of each harmonic, one would expect the largest impact on posteriors from the inclusion of the (3,3) mode. This conclusion is consistent with the results of a full Bayesian inference analysis outlined above, where the results for ≤ 2 are very different from the more complete runs but changes from ≤ 3 to ≤ 4 are more moderate and the ≤ 4, ≤ 5 results are very similar.