Inspiral-Merger-Ringdown Consistency Tests with Higher Modes on Gravitational Signals from the Second Observing Run of LIGO and Virgo

Current tests of General Relativity are performed using approximations which neglect a key feature of complete solution of Einstein's theory: higher-order modes. Our analysis will reassess these tests, including these higher-order mode effects. We have chosen to perform inspiral-merger-ringdown consistency tests on the gravitational transients detected by LIGO and Virgo during the observing run O2. We use an approximant which includes all higher modes with $\ell \le 4$ (NRSur7dq2) and then, for the most interesting cases, we repeat the tests involving fits on Numerical Relativity simulations.


Introduction
These last few years have decreed the dawn of gravitational waves (GW) astronomy thanks to the combined work of LIGO Scientific and Virgo Collaboration (LVC), verifying the predictions of Einstein's theory of General Relativity (GR).
The first gravitational signal, GW150914 [1], was detected on September 14th, 2015 and corresponds to a coalescence of binary black holes (BBH) with masses of 36 +5 −4 M and 29 +4 −4 M located at a luminosity distance of 410 +160 −180 Mpc. After this event, many other compact binary coalescences (CBC) of BBH have been detected [2][3][4][5] and on August 2017 the laser interferometers observed also a binary neutron stars (BNS) merger [6]. In the further years the sensitivity of the instruments will increase [7,8] and numerous observations are expected [9,10]; it follows that a solid knowledge of the theoretical models and a robust method of parameter estimation are necessary in order to be able to understand the physics behind these events.
Generally, for data analysis purposes, the CBC signals are divided into three stages. Initially the two objects are rotating one around the other (inspiral ): the orbital frequency slowly increases while the distance between the two objects decreases because the system is losing energy due to the emission of GWs. Subsequently they approach and merge together (merger ): during this phase the field has large curvature, the orbital frequency increases sharply and we found the energy peak of the gravitational radiation. After the merger, a single remnant object is generated and it dissipates the residual energy through perturbative motions (ringdown), emitting GWs though quasi-normal oscillating modes.
For a BBH coalescence, we need a set of 15 parameters to identify a single signal and we can divide those in two sub-sets: the intrinsic parameters and the extrinsic parameters. The intrinsic parameters are the fundamental to the description of the binary, if we change any intrinsic parameters we must recompute the orbital dynamics. We will call them as ξ and they are the masses m 1 , m 2 (or any combination of these two variables) and the spins S 1 , S 2 . Usually the dimensionless spins parameters χ 1 , χ 2 are defined for a binary coalescence; they are such that χ i = cS i /Gm 2 i for i = 1, 2. On the other hand, the extrinsic parameters simply describe how the binary is oriented in space and time relative to the detector; changing extrinsic parameters involves an easy transformation (rotation, translation or rescaling). For the further ones we will use the notation λ and they are the time at which the peak of the wave arrives at the Earth's geocenter t 0 , the orbital phase of the binary at coalescence φ 0 , the right ascension α and the declination δ of the source, the inclination angle between the line of sight and the binary's angular momentum ι, the luminosity distance D and the polarization angle ψ. So, the complete set of parameters is θ = { ξ , λ }.
The observations in Ref. [1][2][3][4][5][6] lead the scientific community to perform several test of GR, since thanks to GWs we are able to verify directly the predictions of Einstein's theory. Quite a few tests of GR currently performed on the BBH events detected by the ground-based interferometers of LIGO and Virgo can be found in Ref. [12]. However, these tests are performed using approximations which neglect higher-order modes, which are an important amount of information of the complete solutions of gravitational radiation.
In this analysis, we present the results of IMR consistency tests on the O2 events including all higher modes up to ≤ 4. We use RapidPE code [31,32] to evaluate the posterior distributions of BBH parameters. Initially we involve the numerical surrogate approximant in Ref. [14] labelled NRSur7dq2. Then, for the most interesting cases, we repeat the tests using Numerical Relativity (NR) simulations from the RIT catalog in Ref. [17]. After reviewing the theoretical models and the data analysis involved methods, we describe the results of IMR tests on the strains observed by LIGO and Virgo detectors during the observation run O2, which measured the gravitational waves produced by compact binary coalescences. These events are labelled as GW170104, GW170809, GW170814, GW170823 and GW170818. The others O2 events, that we did not study (GW170608 and GW170817), have low masses and thus SNR is too low in the two portions of signal (inspiral and post-inspiral), preventing the IMR consistency test from being performed. We also do not include GW170729 in this work.
Let us introduce some basic notions about parameter estimation. The GWs' parameters estimation is based on the comparison between a template h(t, θ) (which depends on some parameters θ) and the observed strain s(t). If we can suppose that a signal is present within a data segment, we are able to decompose into the inherent noise of the detector and a gravitationalwave signal, where n(t) denotes the noise inside the detector. Under these assumptions, the raw strain of data is transformed in the Fourier's space and each frequency is weighted on the power spectral density (PSD). The PSD is a characteristic function of the single interferometer and it describes the intensity of the noise in the detector at a given frequency. Under the assumption of Gaussian and stationary noise, we can define the (one-sided) PSD S n (f ) as We note that this average should be performed over many possible realizations of the system; however we have only a single physical system (i.e. our detector) but we can follow it in time, so the ensemble average is replaced by the time average. Another way to extract the PSD pass from the auto-correlation function R(τ ), From these results, it follows that the PSD is the quantity that measures our sensitivity. Then, assuming a gaussian stationary zero-mean noise, we compare the data with the template, i.e. we minimize the detector-noise-weighted residuals expressed as a conventional gaussian (log) likelihood for the data in the presence of a signal. The noise weighting is expressed using an inner product, This definition is a natural consequence of the auto-correlation R(τ ) inside the detector with the assumption of zero-mean stationary noise. In order to compare signals coming from different detectors, the waveform is shifted in reference to a detector relative to the geocentric time. Furthermore, this alignment between signals from different detectors gives us informations regarding the sky location of the source. Once we minimize the noiseweighted residuals, we are able to infer what kind of signal was detected and we get the posterior distributions of those parameters. From Eq. (4) the definition of signal-to-noise ratio (SNR) follows, which quantify the quality of our detected signal weighted on the noise inside the detector. For GW150914 it was reached an SNR approximately equal to 24, which is an incredible large value.

Higher Modes
In general, we can decompose the gravitational strain in oscillation modes using the 2-spinweighted spherical harmonics (−2) Y ,m , where h m denotes the ( , m) mode of the wave. In this section, we will focus on binaries in a quasi-circular orbit in the z = 0 plane (e.g., without precession); in this formalism we can write, and decomposing the single mode into real amplitude and real phase h m = A m · e iφ m , we get where φ orb denote the orbital phase of the binary. The spin-weighted spherical harmonics (s) Y ,m (ϑ, ϕ) (where ϑ and ϕ are used as generic angles) are defined in terms of the Wigner functions, as metioned in Ref. [18], where where k 0 = max (0, m − s) and k f = min ( + m, − s). Furthermore, in Ref. [20], another analogue decomposition is used, involving the mass-type U ,m and the current-type V ,m multipole moments, The tensors T E2, ,m ij and T B2, ,m ij are pure-spin tensor harmonics, and can be derived from the 2-spin weighted spherical harmonics by where e i = (u i + iv i )/ √ 2 and u i , v i are previously defined. Combining Eq. (6), (11) and (12), we get As we can see in Ref. [19], the higher modes (HMs) contributions are relevant in the last few orbits of the large mass ratio q = m 1 /m 2 (q ≥ 1), i.e. when one object is more massive than the other. In Ref. [19], the effects due to precession are highlighted: precessing simulations show a triggered amplitude, and the same modulations are present in the waves' frequency evolution.
The BBH we are going to study (GW170104, GW170809, GW170814, GW170818 and GW170823) are characterized by a low mass ratio q < 4 and mass M = m 1 + m 1 < 100M , as we can see in Ref. [21]. For such sources the (2, ±2) modes dominate the sum of all components. Moreover the efficiency of sub-dominant modes is strongly correlated with the inclination angle ι, as we can see in Eq. (6). However the sub-dominant modes' contributions increase with the angle ι.
In general, the dominant mode gave us a sufficient description of the events. Our purpose is to verify these statements, comparing the results of previous GR tests, made up without the HM contributions, with our analysis, which include the HM effects. In order to do that we will perform an IMR consistency test on the O2 events.

NR Surrogate: NRSur7dq2
For our purposes, we need an approximant which includes HM at least up to = 4. A good approximant is given by J. Blackman et al. in Ref. [14]. In this articles, it is presented an NR surrogate, called NRSur7dq2, which is able to describe the wave evolution for BBH with q < 2 and for |χ 1,2 | < 0.8 . This surrogate also include the precessing contributes, however, for the purposes of this report, we will only use the surrogate to describe non-precessing binaries This surrogate model is made up using parameters fitted on several NR simulations. Specifically, since GWs are highly oscillatory and they change in complicated ways as one varies masses and spins and since we have to interpolate the model in a high-dimensional space, the template is decomposed in waveform data pieces. Each waveform data piece is a simpler function that varies slowly over parameters. Once each waveform data piece is interpolated over a desired set of points in parameter space, h(t) is recombined from these pieces. Actually, in order to conserve the continuity and the differentiability of the physical quantities, the templates are built up using a set of differential equations, computing them into different frames (co-precessing and co-orbital). This equations are those that describe the evolution of spins, frequencies and phase of the binary.
As mentioned in Ref. [15,16], the m = 0 modes of NRSur7dq2 do not attempt to reproduce the expected memory terms. However, memory modes are very low frequency and therefore are not a significant contribution to the results of the parameter estimations in this search.

Parameters Estimation
Let us introduce some of the fundamental concepts for Bayesian inference. Calling s the data strain, h the model (based on certain hypothesis H) and θ the parameters' vector, the Bayes' theorem states that where p(θ|s, H) = P(θ) is the posterior distribution of the parameters given an observation, p(s|θ, H) = L(θ) is the likelihood function of the data, p(θ|H) = Π(θ) is the prior distribution of the parameters assumed before the measurements and p(s|H) = Z is the normalization constant called evidence.
Let us call H 0 the hypothesis for which there is no signal inside the observed strain and call H 1 the one that assume a gravitational signal inside the strain. So, assuming (1) and supposing that we have a set of k detectors, we get where we used the inner product defined in Eq. (4). Then, thanks to the Bayes' theorem (15), we can compute the evidences Z 0 and Z 1 , and we are able to infer that a gravitational is contained in the strain s if Z 1 > Z 0 . We start from the assumption that our analyzed strain contain a GW, as verified by the previous LVC analysis. We note that the output of the detector is a time series which describes the oscillations of the test-masses, while a GW is a tensorial perturbation h ij (t). However, the tensorial signal is reduced to a scalar due to the detector; in fact we can writhe for the k-th detector where F +,× are the antenna pattern functions, that describe the sensitivity of the instruments in the different directions, and h +,× are the two polarization of the gravitational strain, which depends also on the position of the k-th detector with respect to the geocenter. Then we define F k = F +,k + iF ×,k and combining Eq. (18) and Eq. (6), we get For a set of k detectors, we can write the time at which the peak of the GW arrives in the k-th detector as where x k is the vector from the geocenter to the k-th detector and n is the unitary wave vector that depends on the sky position of the source. Thanks to the comparison of the different time delays we are able to infer on the sky location of the source, and it emerge that only using three detectors we are able to localize the source in a sufficient narrow spot. In fact, using only two detectors, we are not able to fix an angular degree of freedom, and so the posterior distribution is spread over a circumference.

RapidPE
In order to perform these tests we use the parameters estimation software RapidPE, described in Ref. [31,32]. This code is based on Monte Carlo processes that perform fast computation of the likelihood over an input grid, and process these results returning a posterior output file. Then, the output file could be used as new grid for a second iteration (over the same observation), in order to get an output posterior distribution that is localized in a narrower region. The efficiency of RapidPE is due to how it treats the different sets of parameters. We recall that the entire set of parameters θ can be divided into two subsets of extrinsic λ and intrinsic ξ parameters. The waveform decomposition in Eq. (19) can be exploited to speed up evaluations of likelihood. In fact, the decomposed modes h ,m depends only on the intrinsic parameters ξ and the extrinsic ones (except for t k ) are encoded in the coefficients of the linear combination. So, we can take out from the inner product these parameters, thus we need only to compute the inner product involving h ,m and the data strain s. So, we define the quantities A signal will produce a peak in the filtered outputs localized to a short millisecond time window around the coalescence time, so Q k, ,m will be sharply peaked as functions of t k and we need only retain the values for a narrow range of t k . To allow for detector arrival times that differ from the geocenter time, the range of t k for which we must store the Q k, ,m is set by the light travel time across Earth (2R ⊕ /c ≈ 42ms). Now, plugging together Eq. (19) and the definition of likelihood, we have Importantly, the intrinsic parameters ξ enter only through the Q k, ,m , U k, ,m, ,m and V k, ,m, ,m . These are the dominant cost, as they require computing the orbital dynamics, the h ,m , inner product integrals, and inverse Fourier transforms. By contrast, the extrinsic parameters enter the F k and (−2) Y ,m , which are much cheaper to compute. Technically, RapidPE is composed by two steps: the first integrate out the extrinsic parameters and we call it integrate-likelihood-extrinsic (ILE), the second step processes the ILE's output to generate the posterior samples and it is called compute-intrinsic-posterior (CIP). In the following sections we will give an idea of these processes.

ILE
The first step integrate the extrinsic parameters λ, because precomputed quantities allow us to an efficiently evaluation of L(ξ, λ) as a function of ξ defined as where Π(λ) is the prior distribution for the extrinsic parameters. We assume the sources analyzed are randomly oriented and randomly distributed in the Universe out to a fiducial radius. With the potential exception of the sky position, our priors are independent, and thus separable. This computation is performed over a grid inscribed into the prior's bounds and the algorithm uses a Monte Carlo iteration.
Moreover, in order to improve the efficiency, a weight probability function w(λ) is used in the extrinsic parameters and it must be different from zero in the entire domain. Then the integral Eq. (25) is rewrite as and this functions is used also to compute the marginalized distributions for the extrinsic parameters (necessary to integrate them out). The ILE output file is a list of events in the intrinsic parameters' space which describe the ln L function. Then, this points are post-processed and collected together and the output is given to the CIP.

CIP
Once we have evaluated the the likelihood over a grid of extrinsic parameters, L(ξ) is computed via Gaussian process interpolation, over the intrinsic parameters space. This step generates a grid with a density that conforms to the values of the intrinsic likelihood., i.e. more points where the likelihood's values are larger. Then, the algorithm compute the evidence, and the posterior's values over the current grid, To extract the posterior samples, the software executes a fit on the data and then an adaptive Monte Carlo on these results getting an independent set of values from the posteriors. When fitting the likelihood, we employ coordinate systems well-adapted to the likelihood, which are likely to produce an approximately gaussian likelihood in the limit of strong signals. We set the number of output samples equal to 8000 points. To construct posteriors for the intrinsic parameters, we adopt a uniform prior in masses and spins during the first iteration. For the next ones, we use the likelihood values obtained with previous runs as input grid, and then perform an analysis with the same priors. Actually, this is the real power of RapidPE; we can re-use the output of CIP as new input for the ILE, and re-process the same values, obtaining a peaked distribution over a sufficient narrow region of the parameters' space. When the correlations between the posterior samples does not change increasing the number of iterations, we are able to suppose that we reach the convergence of the distribution. In the end, we compare our results with the analysis presented in Ref. [12].

IMR Consistency Tests
The IMR consistency test perform independent parameter estimations on the inspiral portion of data and on the post-inspiral one. This test is based on estimating the mass and the spin of the remnant black hole from the two independent portions of signal. Then, the results, coming from the low-frequency inspiral and the high-frequency post-inspiral, are compared by checking that they are consistent, as it should be from GR predictions. This test was introduced by A. Ghosh et al. in Ref. [22,23] and Fig. 1 shows their results obtained using simulated events.
In order to perform the test, we have to choose the cut off frequency, that divides the inspiral from merger and ringdown, since we perform the analyses in the frequency domain. We define  [22]) and it shows the results of an IMR consistency test. This analysis is performed on simulated non-spinning GR event with m1 = m2 = 50 M and optimal SNR of 25 in the advanced LIGO detectors. On the left, the top panel shows the 68% and 95% credible regions of the posterior distributions of the mass and spin of the final black hole estimated from the in-spiral and post-inspiral parts of a simulated GR signal, respectively; the bottom left panel shows the posterior of the parameters ∆M f /M f , ∆χ f /χ f t hat describe the deviation from GR, estimated from the same simulation. On the right, same as the left panels, except that here the injection corresponds to a kludge modified GR injection, as highlighted by the plots where the GR value is well outside the 95% credible region.
the inspiral [post-inspiral] as the Fourier frequencies lower [greater] than that of the innermost stable circular orbit (ISCO) of a Kerr black hole with mass M f and spin χ f . However, this choice is not unique and reasonable alternatives do not have a significant effect on the test, if SNR is sufficiently large in both stages. In order to be consistent with the results in Ref. [12], we will choose f cut equal to those chosen during the previous analyses. The only exception is GW170814, where we move f cut to 150 Hz instead of 161 Hz.
Once we choose the cutoff frequency, we select a model and we use RapidPE software, explained in Ref. [31,32], to perform parameter estimations on the two data segments (inspiral and postinspiral) measuring the posterior distributions of the intrinsic parameters and we repeat the same for the entire signal (labelled as IMR). Then we involve the routine of the IMR tests [24] looking for the consistency between the two portions of the signal. We expect to found agreement between these results, if GR theory's predictions are valid. As mentioned by Ref. [22,23], IMR tests are performed on the final observables values, i.e. mass and spin parameter of the remnant object. We will use to compute these quatitites the NR fit formulae given by Ref. [33][34][35]. This function is located in LAL [13] and it is labelled as bbh_average_fits_precessing.
In Fig.2 are shown the results of the IMR consistency test of GW150914 from LVC paper in Ref. [11]; for the purpose of this test, the cutoff frequency is chosen equal to 132 Hz. On the left side, the inspiral posterior distribution is consistent with the post-inspiral one up to the 90% confidence level, and in the common region we can found the IMR results. So we are able to infer that these results do not deviate from the prediction of BBH in GR. In order to assess the results the quantities ∆M f /M f and ∆χ f /χ f are defined. Those describe the fractional differences of the final masses and spins and they quantify the consistency of the observed signal with a BBH predicted by GR. In detail, these quantities are the differences between the inspiral and the post-inspiral measures divided by the averages of those, as explained in Ref. [23]. Explicitly, Figure 2: The IMR test plots of GW150914 shown in Fig.4 of LIGO Scientific and Virgo Collaboration, Phys. Rev. Lett. 116, 221101 (Ref. [11]). On the left, the 90% credible regions of the joint probability distribution of final mass and final spin for the final object as determined from the inspiral, from the post-inspiral and from the entire IMR signals. On the right, the joint probability distributions of ∆M f /M f , ∆χ fχf . The + symbol indicates the null result expected in GR, which lies on the isoprobability contour that encloses 28% of the posterior. In this analysis IMRPhenomPv2 [25][26][27] and SEOBNRv2_ROM [28,29] approximants are used to estimate the posterior distribution.
Where the labels I and MR denote respectively "inspiral" and "merger-ringdown". The GR's predictions coincide with the origin of the axes for these quantities, and this point is included in the posterior distribution on the isoprobability level that enclose 28% of the posterior distribution. We note that we cannot perform this test on all O2 events because some of those have low masses and the SNR is too low for parameter estimation to provide sufficient information about the parameters from the two portion of signal (inspiral and post-inspiral). We can use only sufficiently massive BBH coalescence, because, with the current sensitivities of the instruments, we are not able to obtain acceptable posterior distributions with BNS and low-mass BBH mergers' data.
If we have a set of N observation we can combine the posterior distributions fro the fractional quantities getting an overall posterior. Calling M ≡ ∆M f /M f and χ ≡ ∆χ f /χ f , it follows that for N observations the combined posterior distribution for these fractional quantities is: where Π ( M , χ ) is the overall prior distribution for the fractional quantities and Π i ( M , χ ) are the priors used to compute the posterior P i .

IMR Tests on O2 Events
In the following sections, we show the IMR consistency tests performed on O2 events and using all HMs with ≤ 4. The prior distributions for the extrinsic parameters are chosen flat over the entire domain, except for the distance D, which is proportional to D 2 in the range [0, 3 Gpc], the inclination angle ι which is proportional to sin ι in its domain and the prior for the sky position (α, δ) is isotropic over the solid angle dΩ = sin δ dδ dα. Moreover we set the entire frequency range from 20 Hz to 1024 Hz, and we will split it into two separate ranges for the IMR test according to the estimated cut-off frequency.
Regarding the intrinsic parameters, we use an uniform prior for the masses m 1 and m 1 , imposing the condition q < 2 since NR surrogate model is not reliable over q > 2. This choice is not a severe restriction since we do not expect to find events with large mass ratio, and this is also proved by the detections catalog GWTC-1 in Ref. [21], where the events of interest have mass ratio sufficiently lower than 2. Under this point of view, GW170104 and GW170818 are an exceptions, since the 90% isoprobability contour of the mass ratio posteriors reach values respectively of 2.45 and 2.94 . However, for GW170104 we repeat the test involving pure NR simulations without any restriction on the mass ratio. Regarding the other details about the intrinsic parameters priors, we explain the chosen prior distributions in every sub-sections. The effects of the prior selection for Bayesian analysis affect the results, as shown by Ref. [36], and we have to use the correct distributions for each specific hypothesis. In order to simplify the analysis, the spins are taken aligned orthogonally to the orbit (χ i,x = χ i,y = 0 for i = 1, 2), and precession effects are neglected.
For the most interesting cases (GW170104, GW170814 and GW170823) we evaluate the posterior distributions involving fits on pure NR simulations. In this cases we do not have any restriction on masses and spins, but we still keep aligned components of the spins. These simulations are reliable from the frequency of 35 Hz, so we have to use a larger lower-frequency in order to avoid numerical errors and maintain the sanity of the waveforms. When we use pure NR fits we move the lower-frequency from 20 Hz to 35 Hz.

GW170104
We use a flat prior for the masses components taking M in the range between 40 M and 80 M and the spins components' prior are in agreement with the aligned spins assumption. Our model NRSur7dq2 includes all HMs up to = 4 and the chosen cut-off frequency is 143 Hz.

GW170809
In this analysis we use a flat prior for the masses taking M in the range between 45 M and 105 M and the spins components' prior are in agreement with the aligned spins assumption. Our model NRSur7dq2 includes all HMs up to = 4 and the chosen cut-off frequency is 136 Hz, in accordance with the non-HMs analyses in Ref. [12].

GW170814
This observation leads to wide and bimodal posterior distributions for the post-inspiral analysis, as seen in Ref. [12]. We use a flat prior for the masses components taking M in the range between 10 M and 80 M and the spins components' prior are in agreement with the aligned spins assumption. The chosen cut-off frequency is 150 Hz, differently from the analysis in Ref. [12] (where it is used f cut = 161 Hz). However, that is the results coming from our computations of ISCO frequency using the full IMR posterior samples.
We can see in Fig. 6 that the results do not deviate from GR, since the prediction is enclosed in the contour at 9.8% credible region. However, the post-inspiral portion of data generates a multimodality with an additional peak at lower values of masses. The results from the inspiral and the post-inspiral signals are consistent with each other and they agree with the entire IMR's posterior distributions. We also note that the second peak is not able to exceed the 68% confidence level of the distribution; however this fact is due to the restrictions on the spins which do not allow our posterior samples for the final quantities to reach that region of the parameters space; i.e. because of the choice of aligned spins prior which favors small BH spins.
We repeat the measures using pure NR with an uniform distribution for the masses m 1 and m 2 in the region corresponding to η ∈ [0.01, 0.25] and M ∈ [10 M , 80 M ] assuming aligned spins in the range a i,z ∈ [−1, +1] for i = 1, 2. As it is shown in Fig. 7, these results are almost identical, modulo sampling uncertainties in both calculations: a secondary peak still appear at lower masses' values and also increasing the number of iterations we are not able to remove it. In general, even if we get a bimodality in the posterior distribution for the post-inspiral signal which is inconsistent with the full-signal analysis, the GR quantiles in enclosed in the 90% of confidence level and more advanced studies [12] show that these results are reasonable with noise fluctuations.

GW170818
We use a flat prior for the masses taking M in the range between 50 M and 100 M and the spins components' prior are in agreement with the aligned spins assumption. Our model NRSur7dq2 includes all HMs up to = 4 and the chosen cut-off frequency is 128 Hz.
We note that for this event the constraint q < 2 is a bit more stringent since a large number of inspiral's posterior samples coming from non-HMs analysis are in this area (about 50%) and our posterior samples hit the boundary of the prior imposed by NR surrogate. However, the full-signal's posterior distribution shows that the maximum-posterior value respect this bound, and we are allowed to involve the NR surrogate. The results are in Fig. 8.

GW170823
Also GW170823 leads to multimodalities in the posterior distributions but, unlike GW170814, in this case they come from the inspiral's portion of data. This fact could come from the fact that we observed few inspiralling orbits and this leads to large parameters uncertainties. We use a flat prior for the masses m 1 and m 1 imposing M in the range between 50 M and 220 M and the spins components' prior are in agreement with the aligned spins assumption. Our model NRSur7dq2 includes all HMs up to = 4 and the chosen cut-off frequency is 102 Hz.
In Fig. 9 we can see that the results involving HMs shows the degeneracy at higher mass in the inspiral's posterior distribution, however the additional peak is not the one with higher probability. Furthermore, we involve pure NR simulations using an uniform prior distribution for the masses components m 1 and m 2 in the region corresponding to η ∈ [0.01, 0.25] and M ∈ [50 M , 220 M ], we assume aligned spins in the range a i,z ∈ [−1, +1] for i = 1, 2. Also in this case we use all HMs with ≤ 4. In Fig. 10 we can see that the secondary peak reach almost the 68% credible region and contains a relatively small portion of the entire posterior volume.

Conclusions
We note that the IMR tests in Ref. [12] performed with the approximant IMRPhenomPv2 [25][26][27] and SEOBNRv4_ROM [28] show the same degeneracies coming from the posterior distributions of M MR f for GW170814 and from M I f for GW170823. However this contributions do not show the presence of coherent signals inside the data and they are reasonable with noise fluctuations.
Then we use the defined quantities ∆M f /M f and ∆χ f /χ f to join together the predictions coming from different events. So, we take the results coming from IMR tests of all O2 events where we used the NR surrogate, obtaining a single distribution which tells us the combined prediction of these events. The join posterior distribution does not show deviations from GR above the 39.3% confidence level, and we can see in Fig. 11 the the + symbol is largely enclosed in the contour at 90% credible region. This results is totally in accordance with the expected results we are able to infer that the inclusion of HMs in the IMR consistency test does not allow to deviation from the GR prediction. Then we combined the results from the analyses which involve pure NR. The posterior distributions are shown in Fig. 12 and the join distribution does not show deviation above the 55.1% confidence level. E-mail: matteo.breschi@ligo.org Figure 12: The 90% credible region of the join posterior distribution and the relative marginalization from the analyses involving pure NR. These combined results do not show deviation above the 55.1% credible region. The symbol + indicates GR prediction.