AI and extreme scale computing to learn and infer the physics of higher order gravitational wave modes of quasi-circular, spinning, non-precessing binary black hole mergers

We use artificial intelligence (AI) to learn and infer the physics of higher order gravitational wave modes of quasi-circular, spinning, non-precessing binary black hole mergers. We trained AI models using 14 million waveforms, produced with the surrogate model NRHybSur3dq8, that include modes up to ℓ ≤ 4 and (5,5), except for (4,0) and (4,1), and that describe binaries with mass-ratios q ≤ 8 and individual spins s^z_{1,2} ∈ [−0.8, 0.8]. We use our AI models to obtain deterministic and probabilistic estimates of the mass-ratio, individual spins, effective spin, and inclination angle of numerical relativity waveforms that describe this signal manifold. Our studies indicate that AI provides informative estimates for these physical parameters. This work marks the first time AI is capable of characterizing this high-dimensional signal manifold. Our AI models were trained within 3.4 hours using distributed training on 256 nodes (1,536 NVIDIA V100 GPUs) on the Summit supercomputer.

Production scale AI frameworks for gravitational wave detection that harness high performance computing (HPC) and scientific data infrastructure have also been developed [42], furnishing evidence for the scalability, reproducibility and computational efficiency of AI-driven methodologies. Figure 1 provides a glimpse of the rapid convergence of AI and extreme scale computing to study astrophysical scenarios that require waveforms with richer and more complex morphology. In view of these developments, it is timely to push the frontiers of AI applications further and quantify their suitability for describing high-dimensional signal manifolds whose waveform morphology is significantly richer and more complex than what has been explored in the literature to date. This article represents a step in that direction. The driver we have selected for this study consists of characterizing higher order gravitational wave modes that describe quasi-circular, spinning, non-precessing binary black hole mergers. We densely sample a parameter space that consists of binaries with mass-ratios q ≤ 8 and individual spins s^z_{1,2} ∈ [−0.8, 0.8], using a training dataset of 14M waveforms. The sheer size of this training dataset requires the combination of AI and HPC, and thus we have used the Summit supercomputer to reduce time-to-insight through distributed training.
While this article showcases the convergence of AI and HPC for a computational grand challenge, the main goal of this analysis is to explore what new insights we may obtain by conducting AI-driven studies of the signal manifold of higher order gravitational wave modes. In particular, in this article we aim to address a few issues that we discussed in [43], namely:
1. Is it possible for AI to learn and accurately characterize the physics of high-dimensional gravitational wave signal manifolds?
2. Is it possible to exploit the computational efficiency and scalability of AI to train models with tens of millions of modeled waveforms and conduct fast data-driven analyses with fully trained AI models?
3. Is it true that the convergence and performance of AI models improve as we consider large-volume training datasets that describe signal manifolds with higher degrees of freedom?
4. What insights do we gain when we characterize gravitational wave signal manifolds with waveforms with complex morphology?
As we describe below, the answer to questions 1-3 above is a resounding YES.
In terms of new insights, we find that our deterministic and probabilistic AI models provide informative constraints for the mass-ratio, individual spins and inclination angle of higher order wave modes. These are important results, since our previous work [43] showed that, when we only consider ℓ = |m| = 2 modes, it was difficult to constrain the individual spins of comparable mass-ratio systems, as well as the spin of the secondary for asymmetric mass-ratio systems. This study shows that the inclusion of higher order modes alleviates these problems, and thus provides an informed description of the ability of AI to characterize this high-dimensional signal manifold in the absence of noise. Throughout this paper we use geometric units in which G = c = 1. This paper is organized as follows. Section 2 describes the approach used to create our AI models. We present and discuss our findings in Section 3. Future directions of work are outlined in Section 4.

Methods
Here we describe the datasets, AI architectures and training methods used to create our AI models.

Datasets
We use the surrogate model NRHybSur3dq8 [44] to generate time-series datasets that include both the plus, h_+, and cross, h_×, polarizations. These may be represented as a complex time series h = h_+ − i h_×, which may also be expressed as a sum of spin-weighted spherical harmonic modes, h_{ℓm}, on the sphere [45],

h(t, θ, φ_0) = Σ_{ℓ,m} −2Y_{ℓm}(θ, φ_0) h_{ℓm}(t),

where −2Y_{ℓm} are the spin-weight −2 spherical harmonics, θ is the inclination angle between the orbital angular momentum of the binary and the line of sight to the detector, and φ_0 is the initial binary phase, which we set to zero in this study. Our waveforms include higher order modes with ℓ ≤ 4 and (5,5), excluding the (4,0) and (4,1) modes; cover the time span t ∈ [−10,000 M, 130 M], with the merger peak occurring at t = 0 M; and are sampled with a time step Δt = 1 M. In Figure 2 we showcase the importance of including higher order modes to accurately model binary black hole mergers. It is apparent that the inclusion of higher order modes produces significant modifications in the amplitude and phase evolution of waveform signals.

Training dataset
This consists of ∼14 million waveforms that cover the 4-D parameter space of mass-ratio, individual spins and inclination angle, {q, s^z_1, s^z_2, θ}. We generate it by sampling the mass-ratio q ∈ [1, 8] in steps of Δq = 0.1; the individual spins s^z_i ∈ [−0.8, 0.8] in steps of Δs^z_i = 0.02; and the inclination angle θ ∈ [0, π] in steps of Δθ = 0.1.

Validation and test datasets
Each of these sets consists of ∼800,000 waveforms, generated by alternately sampling values that lie in between the training set values.

AI architecture
We use a slightly modified WaveNet [46] neural network architecture for our model. WaveNet's main features that are relevant for this work include dilated causal convolutions, gated activation units, and the usage of residual and skip connections.
These features help capture long-range correlations in the input time series, and facilitate the training of deeper neural networks. Furthermore, since we are interested in regression analyses, we turn off the causal padding in the convolutional layers. We use a filter size of 2 in all convolutional layers and stack 3 residual blocks, each consisting of 14 dilated convolutions. For a more in-depth discussion of the architecture we refer the reader to [43, 46]. The output from the WaveNet is then fed into three separate branches of fully-connected layers. The branches are trained to predict the mass-ratio q, the effective spin parameters (S_eff, σ_eff), and the inclination angle θ, respectively, where S_eff and σ_eff are the effective spin combinations of the individual spins given by Eqs. (2) and (3). Since our goal is to predict {q, s^z_1, s^z_2, θ}, we solve Eqs. (2) and (3) in conjunction with the predicted q values in order to extract the individual spins s^z_i.

Training methodology
We employ the mean-squared error (MSE) between the predicted and the ground-truth values as the loss function. During training, we monitor the loss on the validation set to dynamically reduce the learning rate, as well as to stop training before over-fitting. We reduce the learning rate by a factor of 2 whenever the validation loss does not decrease for 3 consecutive epochs, and stop the training when the validation loss does not decrease for 5 consecutive epochs. Training the model on 256 nodes, equivalent to 1,536 NVIDIA V100 GPUs, of the Summit supercomputer then takes about 71 epochs, using the LAMB [47] optimizer with the initial learning rate set to 0.001.

Normalizing flow
In addition to the above analysis for point estimates, we also train a normalizing flow model to estimate the posterior distribution. To do this, instead of extracting the parameters directly, we use the WaveNet model as a feature extractor and then condition a normalizing flow model on the extracted features to estimate the posterior distribution.
This method was first delineated in [48], and then used in other studies [28, 33]. We follow the same procedure, making use of the nflows library [49]. A normalizing flow is an example of a "likelihood-free" inference method, and is made up of a composition of invertible maps that transform a simple base probability distribution (e.g., a multivariate Gaussian) into a desired posterior distribution, which may be very complicated. The transformed distribution is given by the change of variables formula

p_X(x) = p_Z(f^{−1}(x)) |det(∂f^{−1}(x)/∂x)| ,    (4)

where Z is the random variable for the base distribution, X is the random variable for the transformed distribution, and f is the normalizing flow (i.e., the invertible transformation), such that X = f(Z). In our case, the goal is to model the conditional posterior distribution p(y|h) for the parameters y corresponding to the waveform strain h. We do this in two steps (as illustrated in Figure 3): first we pass the strain h through the WaveNet model to extract a feature vector h̃. We then use a conditional version of a normalizing flow, f_{h̃,ϑ}, specifically a Neural Spline Flow [50], to transform the base standard multivariate Gaussian N(μ = 0, Σ = I) into the predicted posterior distribution q(y|h). The function f_{h̃,ϑ} therefore depends on the input waveform h (through the feature vector h̃), and is parametrized by learnable weights ϑ. The normalizing flow model is then trained by updating the parameters ϑ so that the predicted distribution q(y|h) matches the true posterior distribution p(y|h). This is achieved by minimizing the negative log-likelihood, i.e., given a batch of N ground-truth parameters y_i and their corresponding waveforms h_i, the objective for the normalizing flow model is to minimize

L = −(1/N) Σ_{i=1}^{N} log q(y_i | h_i),

where, according to equation (4),

log q(y_i | h_i) = log p_Z(f^{−1}_{h̃_i,ϑ}(y_i)) + log |det(∂f^{−1}_{h̃_i,ϑ}(y_i)/∂y_i)| .
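The change of variables formula and the negative log-likelihood objective above can be illustrated with a minimal numerical sketch. This is a toy affine flow applied to a 2-D standard Gaussian base, purely for illustration; it is not the conditional Neural Spline Flow used in this work, and all names in it are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def base_log_prob(z):
    """log N(z; 0, I), evaluated per sample for a 2-D standard Gaussian."""
    return -0.5 * np.sum(z**2, axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)

def flow_log_prob(x, a, b):
    """Change of variables for the affine flow f(z) = a*z + b:
    log q(x) = log p_Z(f^{-1}(x)) + log |det J_{f^{-1}}(x)|."""
    z = (x - b) / a                           # invert the flow
    log_det_inv = -np.sum(np.log(np.abs(a)))  # |det J_{f^{-1}}| = prod_i 1/|a_i|
    return base_log_prob(z) + log_det_inv

# Sample from the flow by transforming base samples, X = f(Z) ...
a, b = np.array([2.0, 0.5]), np.array([1.0, -1.0])
x = a * rng.standard_normal((10_000, 2)) + b

# ... the training objective is then the mean negative log-likelihood.
nll = -np.mean(flow_log_prob(x, a, b))
```

For samples drawn from the flow itself, the mean negative log-likelihood converges to the differential entropy of the transformed Gaussian, which provides a quick correctness check on the log-determinant bookkeeping.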

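As a sanity check on the dataset size quoted in the Methods section, the grid spacings given there can be multiplied out directly. Whether each range endpoint is included is an assumption on our part; under the convention below the grid contains about 14.9 million points, in line with the ∼14 million training waveforms quoted:

```python
import numpy as np

# Parameter grid implied by the quoted step sizes (endpoint handling assumed).
q     = np.arange(1.0, 8.0 + 1e-9, 0.1)     # mass-ratio, dq = 0.1
s1    = np.arange(-0.8, 0.8 + 1e-9, 0.02)   # primary spin, ds = 0.02
s2    = np.arange(-0.8, 0.8 + 1e-9, 0.02)   # secondary spin, ds = 0.02
theta = np.arange(0.0, np.pi, 0.1)          # inclination angle, dtheta = 0.1

n = q.size * s1.size * s2.size * theta.size
print(f"{q.size} x {s1.size} x {s2.size} x {theta.size} = {n:,}")
```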
Results
We present results using the deterministic and probabilistic AI models described in Section 2. These AI models take in time-series waveform signals that include both polarizations, (h_+, h_×), and output the values of {q, s^z_1, s^z_2, θ} that best reproduce the input signal.

Deterministic AI models
In Figures 4 and 5 we provide a sample of results of the predictive capabilities of our deterministic AI models for a variety of input waveform signals. Ground truth waveforms are shown in blue, whereas waveforms whose parameters, {q, s^z_1, s^z_2, θ}, are predicted by AI are shown in dotted red. We quantify the accuracy of our AI models by computing the overlap, O(h_t | h_p), between ground-truth, h_t, and AI-predicted, h_p, waveforms using

O(h_t | h_p) = max_{t_c, φ_c} ⟨ĥ_t | ĥ_p[t_c, φ_c]⟩ ,

where ĥ denotes a unit-normalized waveform and [t_c, φ_c] indicates that the normalized waveform ĥ_p has been time- and phase-shifted. Figure 6 summarizes the accuracy with which deterministic AI models estimate the parameters of higher order modes for all mass-ratios and spins for a sample of inclination angles. There we notice that predictions degrade in accuracy for edge-on systems, i.e., θ = π/2. We provide a more comprehensive analysis of these results in Figure 7, where we show overlap calculations between ground truth and predicted signals in terms of symmetric mass-ratio and effective spin, (η, σ_eff), for all mass-ratios and spins under consideration for a sample of inclination angles. Again, we find that our results are optimal for all angles except for edge-on binaries. We can understand this if we recall that for θ = π/2 we lose half of the information we feed into our AI models, since h_×(t, θ = π/2) → 0. These results demonstrate the ability of AI to search across the signal manifold of higher order modes and pinpoint a set of parameters, {q, s^z_1, s^z_2, θ}, that best describes the amplitude and phase evolution of higher order modes. While this is informative, we also want to know the uncertainty associated with such an AI-predicted tetrad of values. To extract this information, we estimate posterior distributions using a combination of WaveNet and normalizing flow, as described in Section 2.
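The overlap maximization above can be sketched numerically. The snippet below assumes a flat (white) power spectral density, so the inner product reduces to a plain dot product, and treats time shifts as circular; the function name and the toy chirp are our own illustrative choices:

```python
import numpy as np

def overlap(ht, hp):
    """O(h_t | h_p): inner product of unit-normalized complex waveforms
    (h = h_plus - 1j * h_cross), maximized over a relative circular time
    shift t_c (via FFT cross-correlation) and over a constant phase shift
    phi_c (via the modulus). A flat PSD is assumed."""
    ht = ht / np.sqrt(np.vdot(ht, ht).real)   # normalize so <h, h> = 1
    hp = hp / np.sqrt(np.vdot(hp, hp).real)
    # Cross-correlation over all circular lags in one FFT pass; taking the
    # modulus performs the maximization over phi_c analytically.
    corr = np.fft.ifft(np.fft.fft(ht) * np.conj(np.fft.fft(hp)))
    return float(np.max(np.abs(corr)))

# A time- and phase-shifted copy of a toy complex chirp should give O ~ 1.
t = np.arange(512)
ht = np.exp(1j * 0.01 * t**2) * np.hanning(t.size)
hp = np.roll(ht, 37) * np.exp(1j * 1.3)
```

Because the maximization over t_c and φ_c removes exactly the extrinsic freedoms mentioned in the text, an overlap of 1 means the predicted parameters reproduce the intrinsic amplitude and phase evolution of the ground-truth signal.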

Probabilistic AI models
We have selected a number of binary black hole systems to quantify the ability of AI to reconstruct the astrophysical parameters of systems that are known to be hard to characterize. In particular, we consider comparable mass-ratio binaries, for which it is difficult to tell apart the individual spins, as well as asymmetric mass-ratio systems, for which it is difficult to accurately constrain the spin of the secondary. These results are presented in Figures 8-12. In all these figures, blue lines represent ground truth values and red dotted lines are point-parameter
estimates provided by our deterministic WaveNet model. The histograms are produced by drawing 10,000 samples from the multivariate Gaussian base distribution, and then applying the normalizing flow conditioned on the input waveform. This is therefore equivalent to drawing 10,000 samples from the posterior distribution. The results we present below encompass the following scenarios:

• Comparable mass-ratio systems. Figure 8 shows that AI provides informative constraints for the individual spins (s^z_1, s^z_2), the mass-ratio q, and the inclination angle θ. We also reconstructed the effective spin parameters (S_eff, σ_eff), given by Equations (2) and (3). These results indicate that AI is capable of providing reliable constraints for (q, s^z_1, s^z_2, θ, S_eff, σ_eff) of comparable mass-ratio systems.
• Moderately asymmetric mass-ratio systems. Figures 9 and 10 present parameter estimation results for binaries with mass-ratio q = 4. Here we considered two spin configurations to explore whether AI can still provide informative predictions when the spin of the secondary is high, s^z_2 ∼ 0.8 (Figure 9), or moderate, s^z_2 ∼ −0.5 (Figure 10). We notice that in both cases AI provides reliable deterministic and probabilistic estimates for (q, s^z_1, s^z_2, θ, S_eff, σ_eff).

• Asymmetric mass-ratio systems. Figures 11 and 12 present parameter estimation results for mass-ratio q = 7 binaries. The cases presented therein shed light on whether AI can provide informative constraints when the spin of the secondary is high, s^z_2 ∼ 0.8 (Figure 11), or negligible, s^z_2 ∼ 0 (Figure 12). In principle, one might expect that, given the mass-ratio of the system, it would be difficult to constrain the spin of the secondary regardless of its value. However, these results indicate that AI provides informative constraints not only for the individual spins, but for the entire set (q, s^z_1, s^z_2, θ, S_eff, σ_eff).

In summary, both our deterministic and probabilistic AI models provide informative constraints on the astrophysical properties of gravitational waves that include higher order modes to accurately describe the physics of quasi-circular, spinning, non-precessing binary black holes.
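The step of recovering individual spins from the predicted effective spin parameters, used throughout the scenarios above, amounts to a 2×2 linear solve. Since Eqs. (2) and (3) are not reproduced in this text, the two mass-weighted spin combinations below are illustrative assumptions standing in for them (with q = m_1/m_2 ≥ 1); the function names are ours:

```python
import numpy as np

def effective_spins(q, s1z, s2z):
    """Assumed effective spin combinations (stand-ins for Eqs. (2)-(3))."""
    S_eff = (q * s1z + s2z) / (1.0 + q)      # assumed symmetric combination
    sigma_eff = (q * s1z - s2z) / (1.0 + q)  # assumed antisymmetric combination
    return S_eff, sigma_eff

def individual_spins(q, S_eff, sigma_eff):
    """Invert the two assumed combinations for (s1z, s2z), given predicted q."""
    A = np.array([[q, 1.0], [q, -1.0]]) / (1.0 + q)
    return np.linalg.solve(A, np.array([S_eff, sigma_eff]))
```

Any two independent linear combinations of (s^z_1, s^z_2) admit the same round-trip inversion once q is known, which is why the three network branches (q, effective spins, θ) suffice to reconstruct the full tetrad {q, s^z_1, s^z_2, θ}.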
Another important result of this paper is that we have designed a methodology to train AI models that adequately handles training datasets that include tens of millions of modeled waveforms, thereby paving the way to extend this analysis to the case in which these types of signals are contaminated with simulated and advanced LIGO noise. The methods introduced in this paper will enable us to quantify the biases introduced by noise in parameter estimation analyses, and to determine how to handle them to extract informative AI-driven parameter estimation results using higher order gravitational wave modes.

Conclusions
We have developed scalable and computationally efficient methods to design AI models that are capable of characterizing the signal manifold of higher order wave modes of quasi-circular, spinning, non-precessing binary black hole mergers. Our approach enabled us to train several AI models using a dataset of over 14M waveforms within 3.4 hours on 256 nodes, equivalent to 1,536 NVIDIA V100 GPUs, achieving optimal convergence and state-of-the-art regression results.
We have demonstrated that AI can abstract knowledge from time-series data that helps constrain the physical parameters that determine the dynamical evolution of higher order modes of black hole mergers. In particular, we have provided evidence that AI provides deterministic and probabilistic predictions that tightly constrain the mass-ratio, individual spins, inclination angle, and effective spin parameters for a variety of astrophysical scenarios. We also found that deterministic and probabilistic predictions are consistent with each other, and in good accord with ground truth physical parameters.
The results we have introduced in this article provide benchmarks for the expected performance of AI in estimating the astrophysical parameters of binary black hole mergers in the absence of noise. In future work, we will study the impact of simulated and advanced LIGO noise on our ability to conduct informative AI-driven inference for high-dimensional signal manifolds.

Acknowledgements
A.K. and E.A.H. gratefully acknowledge National Science Foundation (NSF) awards OAC-1931561 and OAC-1934757, and the Innovative and Novel Computational Impact on Theory and Experiment project 'Multi-Messenger Astrophysics at Extreme Scale in Summit'. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under contract no. DE-AC05-00OR22725. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This work utilized resources supported by the NSF's Major Research Instrumentation program, the HAL cluster (grant no. OAC-1725729), as well as the University of Illinois at Urbana-Champaign. We thank NVIDIA for their continued support.