Rapid search for massive black hole binary coalescences using deep learning

The coalescences of massive black hole binaries are one of the main targets of space-based gravitational wave observatories. Such gravitational wave sources are expected to be accompanied by electromagnetic emissions. Low latency detection of the massive black hole mergers provides a start point for a global-fit analysis to explore the large parameter space of signals simultaneously being present in the data but at great computational cost. To alleviate this issue, we present a deep learning method for rapidly searching for signals of massive black hole binaries in gravitational wave data. Our model is capable of processing a year of data, simulated from the LISA data challenge, in only several seconds, while identifying all coalescences of massive black hole binaries with no false alarms. We further demonstrate that the model shows robust resistance to a wide range of generalization cases, including various waveform families and updated instrumental configurations. This method offers an effective approach that combines advances in artificial intelligence to open a new pathway for space-based gravitational wave observations.


I. INTRODUCTION
Assessing the ubiquity of massive black holes in earlier times [1,2], a large number of massive black hole binaries (MBHBs) are expected to have formed over the course of cosmic history [3]. The coalescences of MBHBs with total masses of 10 5 M −10 8 M due to galaxy mergers provide a primary source of low-frequency gravitational waves (GWs) detectable by the proposed the Laser Interferometry Space Antenna (LISA) observatory [4]. Such systems may also produce detectable electromagnetic (EM) emission, which allows us to witness the formation of a quasar following the final merger [4]. Furthermore, a recent study found that the EM emission from electrons accelerated at the external forward shock may emerge days to months after the coalescence [5]. However, the GW spectrum at millihertz frequencies is expected to be populated by many additional sources, including tens of millions of Galactic binaries, thousands of extreme mass ratio binaries and stellar mass black hole binaries [4] within the sensitivity band of a LISA-like detector. Therefore, the resulting superposition of GW signals from numerous resolved and unresolved sources needs to be dealt with and poses a great challenge to individually extracting information on every source.
Typically, a global-fit analysis [6][7][8] has been considered for exploring the large parameter space of all over-lapping and resolvable signals but at great computational cost. Although GW signals produced by inspiraling MB-HBs can stay in-band for months or even years, MBHBs are relatively easy to detect within a short duration due to their accumulating signal-to-noise ratios. Accordingly, once an MBHB merger is spotted, global-fit analysis can tackle the case with a fixed number of parameters. In this case, the follow-up parameter estimation can be accelerated around the alert, which is important for determining the sky location of MBHBs in a short time.
In this work, we make the initial attempt to search for MBHB coalescences in LISA data using the deep learning approach [9], which has recently gained popularity in the GW community but with the majority of efforts on the ground-based GW data analysis [10]. Different from the case in LIGO-Virgo, the characteristics of LISA data are significantly more complicated due to modulation by the motion of detectors [11]. The time-delay interferometry (TDI) technique [12] involved further increases the complexity of the data for suppressing the laser frequency noise. Currently, only a few works using neural networks have been done on fast waveform modelling [13,14] and Bayesian inference [15] for GW sources in LISA band. In this case, it is worth testing the capability of deep learning on realistic LISA data for GW detection.
For our attempt, we construct a matched-filtering convolutional neural network (MFCNN) [16] to distinguish whether there are MBHB mergers in a data segment. Here, we use the model to identify the short data segments containing MBHB mergers from long-duration data. The coalescence time is limited to a small range and the number of mergers can be determined. Valuable as a form of data pre-processing, this greatly re-arXiv:2111.14546v2 [astro-ph.IM] 15 Apr 2023 duces the computational cost of follow-up parameter estimation (including sky location). Our model takes only several seconds to analyse a year of LISA data with no false alarms. Moreover, all MBHB coalescences are identified within several days of data. It also shows robustness against various waveform families and generations of TDI techniques. Thus, our model is expected to deal with realistic LISA data and be easily extended to other space-based GW detectors such as Taiji [17].
This paper first proceeds with a thorough description of the proposed approach as presented in Sec. II. Next, Sec. III reviews the commonly used strategies in searching and introduces the search methodology on streaming data used in this work. In Sec. IV, we describe the training datasets and implementation details. The results are presented in Sec. V. Finally, summary and discussions are provided in Sec. VI.

II. MFCNN APPROACH
So far, the matched filtering method [18] plays a vital role in ground-based GW detection and has achieved great success in the discoveries of many GW events. It is based on an existing full waveform template bank which grids up parameter space sufficiently densely. Thus, GW signals beyond the template bank cannot be easily detected. For the first observational run of Advanced LIGO, the bank targeting compact binary systems contains ∼250,000 templates [19]. Considering more complicated physics of GW sources, the number of templates in the bank will increase. Although it is computationally expensive to calculate matched-filtering signal-to-noise ratios (SNRs) [20] with all templates in the bank, the process can extract features of weak signals from noisy data. Based on this point, we construct the first layer of the MFCNN to implement matched filtering process by using a small amount N t of waveform templates as learnable weights. Practically, the input strain data d and the prepared waveform templates h are first whitened as where denotes convolution operation andS n (t) = +∞ −∞ df S −1/2 n (f )e i2πf t is related to the given noise power spectral density S n (f ). Then, an expected matched filter is calculated asd(t) h i (−t) with the fixed coefficient h i (i = 1, 2, · · · , N t ). Finally, analogous to the standard matched filtering approach, maximize and normalize the output of each filter as which corresponds to the SNR for each template. The output SNRs from the first matched-filtering layer capture the general features of GW signals. The rest part of the neural network is employed as the usual convolutional neural network (CNN) [21] to analyze these features. Many studies [16,[22][23][24][25][26][27][28] have indicated that CNN can capture patterns in the data, and we show that the architecture we built works as well by extracting features from the gravitational wave templates. In the end, a Softmax function is applied to evaluate the predictions of confidence scores for searching MBHB signals, which is commonly used in the task of GW signal detection.
The MFCNN model combines the power of template matching to identify weak signals and the ability of CNN to extract features without any prior knowledge. It does not generate triggers for a single channel, but develops instead a general convolutional neural network which can be employed afterwards to recognize coherent patterns between the matched filters. The specific structure of the MFCNN model used in this work is similar to the Fig.1 in [16]. The model constructed in [16] targets to search for GW signals emitted from stellar mass compact binaries and it is capable of identifying all GW events in GWTC-1 [29]. In this work, the structure of MFCNN model is adjusted to adapt to space-based GW detection. Instead of data from LIGO interferometer channels used in [16], we implement TDI observables [12] as model input channels. Moreover, in the first layer, we set more templates and use the noise power spectral density of LISA to conduct the whitening process. Actually, more differences between the approaches of this work and Ref. [16] lie in the characteristic of datasets that will be introduced in detail in Sec. IV.

III. SEARCH STRATEGY
In this section, we develop our conceptual contributions to search strategy for GW signals, namely that (a) probability predictions from any models output with the Softmax function are not suited to claim statistically significant detections of gravitational waves, however, (b) they can still be useful statistics for trigger generation. This contribution is especially important for understanding how to identify the temporal location of potential GW signals and diminish false alarms in streaming data.
Our core argument for claim (a) hinges on the fact that the negative log-likelihood cost function applied on models with the Softmax function always strongly penalizes the most active incorrect prediction, and the correct output for samples with large SNR contributes little to the learnable parameters of the model (see Sec. 6.2.2.3 in [30]). Although certain binary classifiers can produce effective decision boundaries between signal-free and nonsignal-free samples, they perform badly in ranking influential non-signal-free samples, which are not monotonically influential on SNR or other statistics [16]. Consequently, the significance level obtained from model prediction with Softmax cannot be treated as valid ranking statistic for distinguishing loud triggers from fainter ones.
To substantiate (b), we highlight that by either "repeated-modeling" or "repeated-sampling" analysis on the short data segment, one can provide a search strategy to indicate confidence level and diminish false alarms. For example, the "repeated-modeling"-based multi-model stacking ensemble learning for GW detection [31][32][33] can combine the output of each independent sub-model to identify GW triggers that pass a given threshold. Intuitively, the probability of misclassification decreases with an increase in the number of submodels. Unlike the above "repeated-modeling" analysis, "repeated-sampling" method is an analysis of repeated measurements to capture the variability of a sample statistic. Similar to how time-shifted analysis [34] has been used to create a larger number of data to approximate background noise, the method, first applied in [16], repeatedly analyzes varied snippets of the target data using a single model to identify the significance of candidates with frequentist interpretations. This method takes longer data segments with significantly overlapping each neighbouring samples as input data and repeatedly estimates the background around the target short data segment. Importantly, in detecting a signal hiding in the streaming data, a simple value comparison of continuous alerts from model prediction can help in detecting the trigger and ruling out false positive candidates.
To detect MBHB signals in streaming data, we implement the output of the MFCNN as a ranking statistic for trigger generation. More specifically, in a real search, we adopt the "repeated-sampling" method to search for MBHB signals in a very-long-duration dataset from the LISA Data Challenge (LDC) [35]. For each input segment of a given streaming data, we choose to overlap segments by 80% in the MFCNN process. In other words, as we continuously advance by one fifth of the length of the input, each overlapping snippet appears in five segments and is thus processed 5 times by our model. Subsequently, we define the local-maximum trigger as 5 (or more than 5) neighbouring segments with the same values predicted by the MFCNN and surrounded by the segments with smaller values. For record purposes, we output the centre time of each input segment and the potential coalescence time of MBHB signals.

IV. DATASETS
We use the IMRPhenomD waveform model [36,37] to simulate 3,000 waveforms of non-precessing MBHBs at redshift z between 1 and 15. In practice, the redshift is converted to luminosity distance assuming a flat ΛCDM cosmology with Ω m = 0.31, Ω Λ = 0.69 and H 0 = 67.74 [38]. More specifically, we use a logarithmic scaling to sample redshifted total mass M z = M (1+z) in the range of (10 5.4 M , 10 8 M ) with steps of 0.013. We also sample mass ratio q in the range of (1, 15) with steps of 1 (M = m 1 + m 2 is the total mass of the two companions). We sample the coalescence time 3d ≤ t c ≤ 365.25d for nearly a whole year and adopt a uniform prior over the binary's dimensionless spins −0.9 ≤ (s 1z , s 2z ) ≤ 0.9 with spins aligned with the orbital angular momentum. We assume a uniform prior over the remaining parameters (inclination angle ι, ecliptic latitude β, ecliptic longitude λ, reference phase φ c and polarization angle ψ), and they are sampled in the range of ι The Gaussian instrumental noises are generated and based on the power spectral density (PSD) stated in the LISA Science Requirement Document [39]. The other component of the noise comes from the GW signals of Galactic binaries, although some of the signals contained in this component are detectable individually. We simulate this kind of noise using the PSD estimated from LDC noiseless data [35] which contains simulated waveforms from 30 million Galactic binaries. The noise generated in this way is both Gaussian and stationary. Though, in fact, the Galactic signals should be treated as non-Gaussian and non-stationary noises modulated by the motion of the detector [40,41].
As a tradition in training deep learning models, our data is divided into training and validation sets. The validation data set contains the same data distribution as the training data set and allows for an unbiased evaluation of a model fit on the training data set while tuning the model's hyperparameters. To develop a classification model, we split the training datasets into two categories, one containing MBHB signals and additive Gaussian noise, and the other containing Gaussian noise alone. In the training phase, we set the input size of the MFCNN as 16, 384 and the sampling rate as 1 /15 Hz. For the templates utilized in the first layer of the MFCNN, we sample the logarithm of M z uniformly between (10 5.4 M , 10 8 M ) for 50 equal-mass MBHB sources at z=3 for convenience. Unlimited amounts of training and validation dataset that combines noise and modeled waveform are generated in our experiments to overcome overfitting. The data is augmented by randomly shifting the signals within the input segment of 50-90%. Fig. 1 shows the distribution of SNR for the training and validation datasets with respect to different redshift z values. The training process takes 10 hours on a NVIDIA GeForce RTX 2080 GPU with 8GB of memory. Conventionally, we output the optimal model which works quite well in searching for MBHB signals with IM-RPhenomD waveforms.

V. RESULTS
A. Generalization test In this section, we employ the receiver operating characteristics (ROC) analysis [51,52] to visualize the performance of the MFCNN model on the testing dataset. The ROC curve is a graphical plot that depicts the changes in true positive rate (TPR) and false positive rate (FPR) [52] as the discrimination threshold is varied. To compare classifier performance, the area under the ROC curve (AUC) [52][53][54] has long been used to quantify the ROC performance as a single scalar value varying between 0 and 1. The AUC is a threshold-free metric capable of measuring the overall performance of binary classifiers. The closer the AUC to 1, the better the classifier.
We test the MFCNN model on datasets which assume independently identically distributed about the training set. Each testing dataset with various redshift z consists of 3,000 positive samples (with MBHB signals) and 3,000 negative samples (without MBHB signals).We generate waveforms of MBHB signals using the IMRPhe-nomD model, using the same priors as those used in training phase. To ensure the training samples will not appear in test dataset, We shift the grid of M z and q from the values in training data. Specifically, we use a logarithmic scaling to sample redshifted total mass M z = M (1 + z) in the range of (10 5.4+0.0065 M , 10 8+0.0065 M ) with steps of 0.013 and sample mass ratio q in the range of (1 + 0.5, 15 + 0.5) with steps of 1. The priors of the other parameters are the same in the training phase. We show the performances on testing datasets with various redshifts in Fig. 2. It is consistent that, as the redshift of test dataset increases (SNR decrease as seen in Fig. 1), the classification ability of MFCNN models declines. We found that our MFCNN model attains optimal performance in AUC as we shift the redshift between z = 1 and z = 16 for testing while the model trained on z = 6 achieved best performance among the others. Here, we will use this model for the following test. In Fig. 3, we also vary the threshold of prediction from 0 to 1 to show the ROC curves for each test dataset. As we vary the observing limit from z = 1 to z = 16 for FPR = 10 −3 , the TPR decrease from 0.997 to 0.627. This shows our model achieves a high level of sensitivity and can capture the distinctive waveform features of MBHBs in highly noisy environments. In reality, MBHB signals have richer features than the approximate waveform template. The IMRPhenomD waveform only considers aligned-spin black hole binaries with circular orbits. However, more complicated evolution of MBHBs in LISA band need to be considered, such as residual eccentricity [55] and precession of binary's orbital plane [56][57][58], as MBHB waveforms will be modulated due to these facts. This is necessary for the neural network to work well on MBHB waveforms beyond the training dataset. Therefore, we generate 3 additional test datasets based on SEOBNRv4 [59], SEOBNRE [60][61][62] and SEOB-NRv4P [63][64][65] waveform family.
The SEOBNRv4 dataset describes the same binary system as IMRPhe-nomD but differs in modelling implementation. The SEOBNRE dataset contains the GW waveforms of eccentric MBHBs, and we sample parameters of eccentricity uniformly in the range of (0, 0.3). The SEOBNRv4P dataset captures the precession-modulated waveforms, in which the components' dimensionless spins (s 1x , s 1y , s 1z ) and (s 2x , s 2y , s 2z ) of binary system are uniformly sampled in the range of (−0.9, 0.9). In Fig. 4, we test the MFCNN model on the various datasets generated by the four waveform families. Compared to the IMRPhenomD waveform used for training, the model's performance on modulated waveform families is fairly similar throughout a wide range of redshift evaluated here. Even if the signal-to-noise ratio of a distant waveform event is very low, such as z = 16, the model can generalize to various waveform variations. Moreover, it turns out that our model has the nice ability to robustly extrapolate beyond the representations of our training region. This implies that our model may have the power to search real LISA data in the future for MBHB signals beyond the theoretical templates.

B. Sangria dataset
To assess how it handles more realistic data, we use the Sangria dataset from LDC-2 [35] to evaluate the trigger generation performance of our MFCNN model. The dataset covers approximately a year of simulated LISA data and contains simulated waveforms of 30 million Galactic binaries, 17 verification Galactic binaries and 15 coalescing MBHBs. We down-sample the dataset to 1/15 Hz for consistency. We then divide the dataset into overlapping segments corresponding to the input size of our model. We choose to overlap the segments by 80% as discussed in Sec. III and then pass the data through the MFCNN model. Processing all the segments with our MFCNN model produces a sequence of predictions which we use in further analysis. Specifically, the find peaks algorithm provided by SciPy [66] is used to detect a peak plateau on the probability of the positive class predicted by our model. Based on our search strategy in Sec. III, we use the previously described local-maximum triggers to identify MBHB coalescences, and we output the centre time of the segments that are potential merger locations. Notably, it takes only several seconds for the model to analyse a year long dataset on our device.
As the main result of this work, we demonstrate the predictive ability of the MFCNN model on the Sangria dataset in Fig. 5. For this analysis, we used two types of MFCNN for MBHB classification: models trained with and without mixed confusion noise. Both models, it turns out, locate all 15 MBHB mergers to within a short segment. Particularly, on the upper left of Fig. 5, even the MBHB signal with the smallest amplitude can be clearly recognized by our MFCNN models. Each trigger implies that the range of coalescence time is within 5.12 days. The target MBHB mergers are found right in the middle of this range. We find that this is always the case. Notably, we can also achieve the desired output even when multiple MBHBs merger are close together. Examples of this are shown in the upper middle panel of Fig. 5.
Although GW signals from Galactic binaries in LDC's 1-year data are generally considered non-Gaussian and non-stationary noise [40,41], the MFCNN models are simply trained on datasets with Gaussian and stationary noise regardless of whether it is mixed with or without the estimated confusion noise. In addition, we setup, for simplicity, a static configuration with a TDI-1.0 response [49] to generate our training data and measure the sensitivity (see caption of Fig. 5 for further explanations) of the MFCNN models as shown in upper right panel of Fig. 5. However, considering the complexity of the spacecraft motion, we perform a thorough analysis on the Sangria dataset coded by LISANode [67] with a TDI-1.5 response [68], which applies to a rigid but rotating configuration. Although the code and TDI version are inconsistent with our training data, we find that our model still can recognize all MBHB signals and reports no false alarms. It implies that the MFCNN model shows a robust resistance to the dynamic modulation of space- based GW detectors and has the potential to capture the general features of the waveform response.

VI. SUMMARY AND DISCUSSIONS
In this study, we demonstrate that the deep learning method, when applied to LISA data, is capable of searching for MBHB coalescences. We further employ the MFCNN model with a small number of templates to analyse and output predictions on a year long Sangria dataset within ten seconds. Our model can identify all 15 MBHB mergers with no false alarms and locate each merger to data segments as short as 5.12 days long. These results lay the foundation for accelerating the detection and forecasting the mergers of MBHBs to enable the observation of EM mission emerging after the MBHB coalescence. By building a neural network capable of rapidly searching for and counting MBHBs, we answer a fundamental question regarding the applicability of neural networks to LISA data analysis.
In practice, reliable searches for MBHB signals in streaming data are strongly affected by non-Gaussian and non-stationary noise, such as from unresolved Galactic binaries. To account for the overlap between signals, global-fit approaches [6][7][8] are adapted for space-based GW detectors to model all resolvable signals and instrument noise. The MFCNN-based analysis provides the number of sources and the time of coalescences, which are useful for the subsequent global-fit analysis. According to the number of sources we can fix the dimension of the parameter space in the subsequent global fit. If the number of sources is unknown, model selection has to be employed to determine the number of sources. In the conventional approach to model selection we need to calculate the Bayes factor between competing models of different dimension. Compared to the fixed-dimension analysis with the help of MFCNN, the global fit including model selection is several times computationally costly. As mentioned in Sec.V our model takes several seconds to analysis a year of data. However, the global fit for a MBHB with a year of data usually takes several hours on a multi-core processor. Actually the computational cost of parameter estimation for MBHB signals is affected by the SNR and the dimension of the parameter space. Of cource, for different global-fit algorithms, the MFCNN model may bring different improvements. Moreover, the time of coalescences provided by MFCNN can be used to determine or optimize the observation time needed to achieve a desired result. Our analysis represents a starting point for applying a neural network trained on Gaussian and stationary background noise to realistic non-Gaussian and non-stationary data. Due to the large dimensionality of the data characteristics needing modelling, there exists the potential for neural networks to exceed the sensitivity of existing Bayesian analysis in regard to real data.
In real-world tasks, the actual noise in recordings will need to be estimated simultaneously along with the various potential signals. However, in general, one can obtain information of the noise from a specific TDI observable in which the signals are greatly depressed [44]. Nevertheless, as can be seen in Fig. 5, it makes little difference on the Sangria dataset whether or not the MFCNN model is trained using confusion noise. This approach is essential to ensuring that neural networks properly characterize the non-Gaussian and non-stationary nature of realistic detector noise experienced by observatories. We further test the two models with additional injected signals and found that there is a slight difference in performance for signals with SNR < 100. As a result, once fully trained, the model can be used for real-time MBHB searches or even to achieve early warning in practice [69][70][71].
In this work, we present a model with robust sensitivity to numerous GW sources and modulation of MBHB waveform family. As we mentioned in the case of the Sangria dataset, the generalization ability of the supervised learning approach can be extended to various TDI configurations and can also be greatly useful for future space-based GW detectors. There will definitely be differences between the real data and the simulated data, such as the existence of glitches, non-stationary instrumental noise and data gaps (see discussion in [41]), so the performance and robustness of our method on realistic SNRs needs to be assessed and understood. However, more detailed investigations, such as the cosmic population of MBHBs, are beyond the scope of the present work. The analysis described here is an initial application of the deep learning approach to searching for MBHBs via space-based detection. Note that, the MFCNN model is trained with GW waveforms containing the merger phase. In the future, it is more inclined to train a deep learning model which is capable of identifying MBHB signals at the early inspiral stage for multi-messenger astronomy. The approach developed in this work can provide a potential application for LISA-like observatories to identify and localize MBHBs during the inspiral phase allowing time for detailed planning and coordination of joint multi-messenger observations.

ACKNOWLEDGMENTS
We thank Zhoujian Cao for his helpful comments and discussions. CL would like to thank Stanislav Babak for help in the use of the LDC code and datasets. WHR would like to thank Pengfei Zhou for discussion on CNN and Xiaolin Liu for discussion on SEOBNRE waveform model. We thank the Peng Cheng Laboratory (PCL) Cloud Brain for computation support. This work is supported in part by the National Key Research and Development Program of China Grant No. 2020YFC2201501, in part by the National Natural Science Foundation of China under Grant No. 12075297 and No. 12235019. The authors would like to acknowledge the work of the LDC group. For this study, both the LDC software and datasets were used [35]. PyCBC [72] and LALSuite [73] are also used to generate gravitational strains of coalescing MBHBs. Plots are generated by Matplotlib [74,75]. The implementation of the MFCNN model is coded based on PyTorch [76].