Real-Time Detection of Gravitational Waves from Binary Neutron Stars using Artificial Neural Networks

The groundbreaking discoveries of gravitational waves from binary black-hole mergers and, most recently, from coalescing neutron stars started a new era of Multi-Messenger Astrophysics and revolutionized our understanding of the Cosmos. Machine learning techniques such as artificial neural networks are already transforming many technological fields and have also proven successful in gravitational-wave astrophysics for the detection and characterization of gravitational-wave signals from binary black holes. Here we use a deep-learning approach to rapidly identify transient gravitational-wave signals from binary neutron star mergers in noisy time series representative of typical gravitational-wave detector data. Specifically, we show that a deep convolutional neural network trained on 100,000 data samples can rapidly identify binary neutron star gravitational-wave signals and distinguish them from noise and from signals of merging black hole binaries. These results demonstrate the potential of artificial neural networks for real-time detection of gravitational-wave signals from binary neutron star mergers, which is critical for prompt follow-up and detailed observation of the electromagnetic and astro-particle counterparts accompanying these important transients.


Introduction
The detections of gravitational waves (GWs) from binary black hole (BBH) mergers have confirmed Einstein's theory of General Relativity in extraordinary detail in the most violent astrophysical environments [1,2,3,8]. In addition, the first observation of coalescing neutron stars in both the gravitational and electromagnetic spectra has initiated the era of Multi-Messenger Astrophysics (MMA), which combines observations of electromagnetic radiation, gravitational waves, cosmic rays, and neutrinos to provide deeper insights into the properties of astrophysical objects and phenomena [4,9]. These discoveries were made possible by the Advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) and Virgo collaborations. As gravitational-wave detectors increase their sensitivity, BBH, binary neutron star (BNS) and black hole-neutron star (BHNS) signals are likely to be detected much more frequently. Conventional gravitational-wave detection techniques are based mainly on template matched filtering [7,10], which typically uses large banks of template waveforms, each with different compact binary parameters such as component masses and/or spins. Since the source parameters are not known in advance, a template bank must span a large astrophysical parameter space, which makes these approaches computationally expensive and challenging. In particular, as has already been pointed out in the literature [11,12], GW searches based on matched filtering currently target a 4D parameter space (compact binary sources with spin-aligned components on quasi-circular orbits) out of the 9D parameter space available to current GW detectors: the two component masses (m_1, m_2), the three components of each spin vector (ŝ_1, ŝ_2), and the orbital eccentricity e.
Email address: plamenlrastev@fas.harvard.edu (Plamen G. Krastev)
The computational cost of low-latency GW searches based on matched filtering is presently such that their extension to the full 9D signal manifold is prohibitive [13]. Most importantly, these searches may miss important GW transients for which a rapid follow-up is critical to the successful observation of their electromagnetic counterparts. Specifically, the optical counterparts of gravitational waves from BNS and BHNS mergers, known as kilonovae [14], encode key information required to constrain the physical properties of the transient, but because they fade quickly they need to be identified and localized within several hours of the merger and promptly observed across the entire electromagnetic spectrum. New methods are therefore needed to overcome the limitations and computational challenges of existing GW detection algorithms; in particular, methods that detect GW signals from binary neutron star (and black hole-neutron star) mergers in real time across the full parameter space available to current and future GW detectors.
In this work, we explore a deep-learning approach to rapidly detect gravitational-wave signals from binary neutron star mergers. Deep learning algorithms [15], a subset of machine learning, have been very successful in tasks such as image recognition [15,16] and natural language processing [17], and have recently emerged as a new tool in GW astrophysics for the detection, characterization [5,6,7] and denoising [18] of GW signals from binary black holes. Deep-learning methods can analyze data rapidly because the computationally intensive part of the algorithm is carried out during the training stage, before the actual data analysis [19], which could make them orders of magnitude faster than conventional matched-filtering techniques [5]. Here, we demonstrate the power of the deep-learning approach on the specific example of rapidly distinguishing gravitational waves from binary neutron star mergers from detector noise and from signals of binary black hole mergers. This example shows that machine learning can help in the real-time detection of BNS signals and thus trigger a prompt follow-up of the electromagnetic counterparts of the gravitational-wave transient.

Methods
Deep learning algorithms consist of processing units, neurons, which are arranged in arrays forming one to several layers. A neuron acts as a filter performing a linear operation between the input array and the weights associated with the neuron. A deep neural network has an input layer, typically followed by one or more hidden layers, and a final layer with one or more output neurons. In classification problems, the output neurons give the probabilities that an input sample belongs to a specific class. Here we distinguish between three classes of time series: BNS merger signals in additive Gaussian noise, BBH merger signals in additive Gaussian noise, and Gaussian noise only, with integer class labels 0 (Noise), 1 (BBH signal) and 2 (BNS signal). Accordingly, the data sets consist of simulated gravitational-wave time series in which the compact binary merger signals (BNS and BBH) are generated with the LIGO Algorithm Library, LALSuite [20]. For the BNS signals, we use the PhenomPNRT waveform model [21] and simulate systems with component masses in the range from 1 to 2 M⊙, including tidal deformation contributions, where the tidal deformability Λ is computed with the APR equation of state (EOS) [22]. (For computing Λ, see e.g. Refs. [23,24].) The BBH signals are simulated using the SEOBNRv2 waveform model [25], which models the inspiral, merger and ringdown phases of the signal, for non-spinning systems with component masses in the range from 5 to 50 M⊙. The simulated signals are 10 seconds long and sampled at 4096 Hz. This choice was made because BNS signals are considerably longer than BBH signals and typically contain much higher frequencies.
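As a minimal illustration of the "linear operation between the input array and the weights" performed by a convolutional neuron, the NumPy sketch below slides one hypothetical, untrained 16-tap kernel along a 10 s, 4096 Hz time series (40,960 samples) and applies a ReLU activation. The kernel values, the zero bias and the choice of ReLU are assumptions for illustration; only the class labels and sample length come from the text above.

```python
import numpy as np

# Integer class labels used in the text.
CLASS_LABELS = {0: "Noise", 1: "BBH signal", 2: "BNS signal"}

FS = 4096                  # sampling rate (Hz)
DURATION = 10              # length of each time series (s)
N_SAMPLES = FS * DURATION  # 40,960 points per input vector


def conv_neuron(x, w, b=0.0):
    """One convolutional neuron: the dot product of the weight vector w with
    every length-len(w) window of x (stride 1, no padding), plus a bias b,
    followed by a ReLU. As in most deep-learning libraries, the kernel is not
    flipped, so this is strictly a cross-correlation."""
    windows = np.lib.stride_tricks.sliding_window_view(x, w.size)
    return np.maximum(windows @ w + b, 0.0)


rng = np.random.default_rng(0)
x = rng.standard_normal(N_SAMPLES)  # stand-in for one whitened time series
w = rng.standard_normal(16) / 4.0   # hypothetical 16-tap kernel (untrained)
y = conv_neuron(x, w)
print(N_SAMPLES, y.shape)           # 40960 (40945,)
```

Without padding, each such neuron shortens the sequence by the kernel length minus one, which is why a 16-tap filter maps 40,960 samples to 40,945 outputs.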
The simulated signals are "whitened" with the Advanced LIGO power spectral density (PSD) for the "zero-detuned high-power" configuration [26], which rescales the noise contribution at each frequency to have equal power [7]. Subsequently, the waveforms are shifted randomly so that the peak amplitude of each waveform falls between 9.65 and 9.95 seconds of the time series for the BNS signals, and between 8 and 9.95 seconds for the BBH signals (since BBH signals are considerably shorter than BNS signals), to ensure the robustness of the network against temporal translations. Different realizations of white Gaussian noise are superimposed on the signals, and the waveform amplitude is scaled to achieve a predefined optimal signal-to-noise ratio (SNR), defined as [7]
$$\rho_{\mathrm{opt}}^2 = 4\int_{f_{\min}}^{\infty} \frac{|\tilde{h}(f)|^2}{S_n(f)}\,df,$$
where h̃(f) is the frequency-domain representation of the GW strain, S_n(f) is the single-sided detector noise PSD (again for the "zero-detuned high-power" configuration), and f_min is the frequency of the GW signal at the start of the sample time series. From an astrophysical perspective, rescaling the GW waveform simply corresponds to moving the source closer to or farther from the detector. Example time series are shown in Fig. 1.
Supervised learning requires that the data be divided into training, validation and testing sets. The training data are used by the network to learn from, the validation data allow us to verify whether the network is learning correctly, and the testing data are used to assess the performance of the trained model. The training sets used here consist of 100,000 independent time series, with 1/3 containing BNS signal + noise, 1/3 BBH signal + noise, and 1/3 noise only. The validation and testing data sets each consist of 5,000 independent samples containing approximately equal fractions of each time-series class. To ensure that the neural network can identify BNS gravitational-wave signals over a broad range of astrophysically motivated SNR values, we start the training with a large SNR and then gradually reduce it. This approach is adapted from "curriculum learning" [27] and enables the network to distinguish low-SNR signals more accurately. Specifically, in every training session the SNR of each BBH and BNS waveform was randomly sampled in the range [SNR_low, SNR_high] with SNR_high = 20. Initially, SNR_low was set to 20 and then decreased by 1 in each subsequent training session, down to 3. Thus, in the final session the SNR was uniformly sampled between 3 and 20.
Figure 2: Using an artificial neural network to detect gravitational-wave signals from binary neutron-star mergers. Gravitational-wave time series serve as input for a deep convolutional network with convolutional and fully connected layers. The blue arrow represents the sliding of the convolutional filters along the input time-series vector. The last softmax layer outputs the probability that the input time series belongs to a certain class (Noise, BBH Signal, or BNS Signal). The weights of the artificial neural network are tuned by training on many labeled data samples, and the network can then classify an unknown sample time series with high confidence.
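The data-preparation steps described above can be sketched in NumPy under simplifying assumptions: the input waveform is taken to be already whitened, so the added noise is unit-variance white Gaussian noise whose one-sided PSD is the constant 2/f_s; ρ_opt is evaluated by discretizing the integral with an FFT; the peak is placed by a zero-filled shift; and a sine-Gaussian stands in for a real BNS waveform (generating PhenomPNRT or SEOBNRv2 waveforms with LALSuite is outside this sketch). All function names are hypothetical, not the code used in this work.

```python
import numpy as np

FS = 4096                                   # sampling rate (Hz)
N = 10 * FS                                 # samples in a 10 s time series
WHITE_PSD = np.full(N // 2 + 1, 2.0 / FS)   # one-sided PSD of unit-variance white noise


def optimal_snr(h, psd, fs=FS, f_min=0.0):
    """Discretized rho_opt^2 = 4 * integral_{f_min}^inf |h~(f)|^2 / S_n(f) df."""
    dt = 1.0 / fs
    freqs = np.fft.rfftfreq(h.size, dt)
    h_tilde = np.fft.rfft(h) * dt            # approximates the continuous Fourier transform
    df = freqs[1] - freqs[0]
    band = freqs >= f_min
    return np.sqrt(4.0 * df * np.sum(np.abs(h_tilde[band]) ** 2 / psd[band]))


def shift_with_zeros(h, shift):
    """Translate h by `shift` samples, zero-filling instead of wrapping around."""
    out = np.zeros_like(h)
    if shift >= 0:
        out[shift:] = h[: h.size - shift]
    else:
        out[:shift] = h[-shift:]
    return out


def make_sample(whitened_waveform, target_snr, peak_window, rng):
    """Return (signal, signal + noise): the peak is moved to a random time in
    peak_window (seconds) and the amplitude rescaled to target_snr."""
    t_peak = rng.uniform(*peak_window)
    shift = int(round(t_peak * FS)) - int(np.argmax(np.abs(whitened_waveform)))
    h = shift_with_zeros(whitened_waveform, shift)
    h *= target_snr / optimal_snr(h, WHITE_PSD)   # i.e. move the source closer or farther
    return h, h + rng.standard_normal(N)           # add white Gaussian noise


def curriculum_snr_ranges(snr_high=20, snr_low_final=3):
    """[SNR_low, SNR_high] per training session, with SNR_low = 20, 19, ..., 3."""
    return [(low, snr_high) for low in range(snr_high, snr_low_final - 1, -1)]


# Toy stand-in for a whitened waveform: a 100 Hz sine-Gaussian, NOT a BNS template.
rng = np.random.default_rng(1)
t = np.arange(N) / FS
toy = np.sin(2 * np.pi * 100 * (t - 5)) * np.exp(-((t - 5) ** 2) / (2 * 0.02 ** 2))

snr_low, snr_high = curriculum_snr_ranges()[-1]   # final session: SNR ~ U[3, 20]
snr = rng.uniform(snr_low, snr_high)
signal, sample = make_sample(toy, snr, peak_window=(9.65, 9.95), rng=rng)
```

For unit-variance white noise this discretization reduces (up to negligible DC and Nyquist terms) to ρ_opt = sqrt(Σ h_n²), the familiar time-domain matched-filter SNR, which gives a quick sanity check on the rescaling.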
The neural network used here is a convolutional neural network (CNN) [28] with 4 convolutional and 4 pooling layers, followed by 2 fully connected (dense) layers. The convolutional layers have 32, 64, 128 and 256 filters, respectively, and the dense layers have 128 and 64 neurons. We used kernel sizes of 16, 8, 8 and 8 for the convolutional layers and a pool size of 4 for all pooling layers. The first layer corresponds to the input to the neural network, which in this case is a one-dimensional time-series vector of dimension 40,960 (10 s × 4096 Hz). Each neuron in the convolutional layers computes the convolution between the neuron's weight vector and the outputs of the layer below. Neuron weights are updated through the backpropagation algorithm [29]. Pooling layers down-sample their input along the time dimension. At the end, a hidden dense layer is connected to an output layer that computes the inferred class probabilities. The network design is optimized by fine-tuning multiple hyper-parameters, which here include the number and type of network layers, the number of neurons in each layer, the max-pooling parameters, and the type of activation functions. The optimal network architecture was determined through multiple experiments and hyper-parameter tuning. We show the architecture of the final network in Table 1.
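To see how the layers described above shrink the 40,960-sample input, the pure-Python sketch below propagates the sequence length through the network and counts trainable parameters. It assumes stride-1 convolutions without padding, max pooling with stride equal to the pool size, and a 3-neuron softmax output layer after the two dense layers; Table 1 may differ in these details (padding, activations, dropout), so the numbers are illustrative rather than a description of the trained model.

```python
FILTERS = [32, 64, 128, 256]   # filters per convolutional layer
KERNELS = [16, 8, 8, 8]        # kernel sizes of the convolutional layers
POOL = 4                       # pool size (and, by assumption, stride) of every pooling layer
DENSE = [128, 64]              # neurons in the two dense layers
N_CLASSES = 3                  # Noise, BBH, BNS (assumed softmax output layer)


def conv1d_len(n, kernel):
    """Output length of a stride-1 convolution with no padding."""
    return n - kernel + 1


def pool1d_len(n, pool):
    """Output length of max pooling with stride equal to the pool size."""
    return n // pool


def summarize(n_in=40960, channels_in=1):
    """Return (rows, total_params); each row is (layer, length, channels, params)."""
    n, c, total, rows = n_in, channels_in, 0, []
    for filters, k in zip(FILTERS, KERNELS):
        n = conv1d_len(n, k)
        params = (k * c + 1) * filters        # k*c weights + 1 bias per filter
        rows.append((f"Conv1D(k={k})", n, filters, params))
        c, total = filters, total + params
        n = pool1d_len(n, POOL)
        rows.append((f"MaxPool1D({POOL})", n, c, 0))
    units_in = n * c                           # flattened length x channels
    rows.append(("Flatten", units_in, 1, 0))
    for units in DENSE + [N_CLASSES]:
        params = (units_in + 1) * units        # weights + biases
        rows.append((f"Dense({units})", units, 1, params))
        units_in, total = units, total + params
    return rows, total


rows, total = summarize()
for name, length, channels, params in rows:
    print(f"{name:<14}{length:>8}{channels:>6}{params:>10}")
print("trainable parameters:", total)
```

Under these assumptions the sequence shrinks from 40,960 to 157 time steps, and about 94% of the roughly 5.5 million parameters sit in the first dense layer, because it connects to all 157 × 256 flattened features.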
To build and train the neural network, we used the Python toolkit Keras (https://www.tensorflow.org/guide/keras), which provides a high-level programming interface to the TensorFlow [30] (https://www.tensorflow.org) deep-learning library. We use stochastic gradient descent with the adaptive-learning-rate Adam method [31] and the AMSGrad modification [32]. To train the neural network, we use an initial learning rate of 0.001 and a batch size of 1000. For each SNR range, training is limited to 100 epochs, or stops earlier once the error on the validation data set stops decreasing. The network training was performed on an NVIDIA Tesla V100 GPU, and the size of the mini-batches was chosen automatically depending on the specifics of the GPU and the data sets. The cost function was the sparse categorical cross-entropy loss. The process of using an artificial neural network to detect gravitational-wave signals from BNS mergers is illustrated in Fig. 2.
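The loss named above takes only a few lines to write down. The NumPy sketch below (an illustration, not the training code used in this work) converts hypothetical network outputs (logits) into class probabilities with a softmax, as in the final layer described for Fig. 2, and evaluates the mean sparse categorical cross-entropy for integer labels 0 (Noise), 1 (BBH) and 2 (BNS). The clipping constant eps is an assumption that prevents log(0).

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log p[i, labels[i]] for integer labels (0: Noise, 1: BBH, 2: BNS).
    Probabilities are clipped to [eps, 1] so that log(0) never occurs."""
    p_true = probs[np.arange(labels.size), labels]
    return float(np.mean(-np.log(np.clip(p_true, eps, 1.0))))


# Hypothetical network outputs (logits) for three samples and their true labels.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2, 1])
probs = softmax(logits)              # each row sums to 1
loss = sparse_categorical_crossentropy(labels, probs)
print(probs.round(3), round(loss, 4))
```

The "sparse" variant takes the integer labels directly instead of one-hot vectors. In Keras, the settings quoted above would typically be expressed as `model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001, amsgrad=True), loss="sparse_categorical_crossentropy")`, with a `tf.keras.callbacks.EarlyStopping(monitor="val_loss")` callback for the stop-when-validation-error-stops-decreasing criterion; the patience setting is not stated in the text.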

Results
We assess the performance of the neural network by extracting the probability values produced by the neurons in the output layer. These values lie between 0 and 1 and sum to unity; each neuron gives the inferred probability that the input time series belongs to the noise, BBH signal, or BNS signal class, respectively. Specifically, for a given SNR we construct the receiver operating characteristic (ROC) curves for the BBH and BNS classes. Here a ROC curve represents the fraction of signals correctly identified as their respective class, BNS or BBH (true alarm probability), versus the fraction of samples incorrectly identified as signals of that class (false alarm probability). A ranking statistic is considered superior to another if, at a fixed false alarm probability, it reaches a higher true alarm probability (sensitivity) [7]. The ROC curves are conveniently constructed with the Python scikit-learn library (https://scikit-learn.org). The optimal SNR was varied from 1 to 20 in integer steps, and the classifier was applied to time-series inputs containing approximately equal fractions of each class (Noise, BBH Signal, BNS Signal). In Fig. 3 we show the ROC curves calculated for test data sets containing BBH and BNS GW signals. These results show that the neural network is more sensitive to GW signals from BBH than from BNS mergers. In particular, the neural network reaches the maximal true alarm probability for BBH signals with optimal SNR ρ_opt = 10 across the range of false alarm probabilities explored in this work (Fig. 3 (a)), whereas for BNS signals it does so only at ρ_opt = 18 (Fig. 3 (b)). The results imply that all BBH signals are identified for SNR ≥ 10 and that both BBH and BNS signals are always detected for SNR ≥ 18.
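The ROC construction described above treats one class (e.g., BNS) as the signal class and every other test sample as a potential false alarm, then sweeps a threshold on that class's output probability. The NumPy sketch below is a minimal one-vs-rest version under that reading; the function name and toy probabilities are invented for illustration, and unlike scikit-learn's `sklearn.metrics.roc_curve` it does not merge tied scores.

```python
import numpy as np


def roc_one_vs_rest(labels, probs, positive_class):
    """False- and true-alarm probabilities from sweeping a threshold on
    probs[:, positive_class]. Samples of positive_class are the 'signals';
    every other sample that passes the threshold is a false alarm.
    Tied scores are not merged (a simplification)."""
    scores = probs[:, positive_class]
    is_pos = labels == positive_class
    order = np.argsort(-scores, kind="stable")   # highest score first
    tp = np.cumsum(is_pos[order])
    fp = np.cumsum(~is_pos[order])
    tap = np.concatenate([[0.0], tp / is_pos.sum()])
    fap = np.concatenate([[0.0], fp / (~is_pos).sum()])
    return fap, tap


# Toy softmax outputs: columns are P(Noise), P(BBH), P(BNS).
labels = np.array([0, 1, 2, 2])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.1, 0.1, 0.8]])
fap, tap = roc_one_vs_rest(labels, probs, positive_class=2)   # BNS curve
```

With scikit-learn, the corresponding BNS curve would be obtained from `roc_curve(labels == 2, probs[:, 2])`, which returns the false positive rates, true positive rates and thresholds.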
We analyze the performance of the classifier further by examining the sensitivity to BNS and BBH signals for different SNR values at a fixed false alarm probability. These sensitivity curves are shown in Fig. 4, where the true alarm probability (sensitivity) is plotted as a function of the optimal SNR for false alarm probabilities of 10^-1, 10^-2 and 10^-3. The sensitivity of the neural network to GW signals from BBH mergers is very similar to that reported by Gabbard et al. [7]. Fig. 4 also shows that all curves saturate at 1 for optimal SNR ≥ 18, i.e., all signals, both BNS and BBH, are always detected.
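At a fixed false alarm probability, the sensitivity can be read off by choosing the threshold that only a fraction `fap` of non-class test samples exceed, then measuring the fraction of class samples above it; repeating this for each optimal SNR yields one curve of Fig. 4. A minimal NumPy sketch, with a hypothetical helper name and toy scores:

```python
import numpy as np


def sensitivity_at_fap(pos_scores, neg_scores, fap):
    """True-alarm probability at a fixed false-alarm probability.

    The threshold is the (k+1)-th largest negative score with
    k = floor(fap * n_neg), so at most k negatives (exactly k if there
    are no ties) score strictly above it."""
    neg_sorted = np.sort(neg_scores)[::-1]
    k = int(np.floor(fap * neg_sorted.size))
    threshold = neg_sorted[k] if k < neg_sorted.size else -np.inf
    return float(np.mean(pos_scores > threshold)), threshold


# Toy scores: 1,000 evenly spaced non-class scores and three class scores.
neg = np.arange(1000) / 1000.0
pos = np.array([0.5, 0.995, 1.2])
tap, thr = sensitivity_at_fap(pos, neg, fap=1e-2)
```

One practical consequence: a false alarm probability of 10^-3 can only be resolved with at least about 1,000 non-class samples. If each test set has about 5,000 samples split roughly equally among three classes, about 3,300 of them are negatives for a given class, so the 10^-3 point rests on only a few false alarms.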
Furthermore, the deep neural network automatically extracts and compresses information by finding patterns in the training data, dramatically reducing data dimensionality and thus creating a computationally efficient and portable model. For instance, the trained model is only about 83 MB in size, including the network weights and architecture information, yet it compresses approximately 6.6 × 10^4 BBH and BNS gravitational waveforms (excluding noise samples), each 10 seconds long and sampled at 4096 Hz, with a total size of about 10.4 GB. The trained neural network can therefore be viewed as an abstract and compact representation of the template bank. In addition, once the network has been trained, the computational cost of evaluating it on new GW data does not depend on the size of the training data set, and the computationally intensive training stage is performed only once, offline. For example, once trained, the final CNN architecture processes 10 seconds of gravitational-wave data in only milliseconds on both CPUs and GPUs. Such rapid processing is advantageous for generating real-time alerts and can provide useful hints for follow-up searches for electromagnetic counterparts of GWs, and also for focused analyses with accurate matched-filtering approaches and Bayesian parameter estimation [33]. By contrast, as more GW detectors come online, the computational cost of matched-filtering methods scales at least linearly with the number of detectors, because the search for triggers is first performed independently for each detector. Moreover, the computational cost of trigger generation also scales linearly with the number of waveforms in the template banks. As template banks grow, matched filtering becomes increasingly computationally expensive, which makes online real-time trigger generation very challenging [33].
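The quoted compression can be checked with back-of-the-envelope arithmetic. Assuming each sample is stored as a 4-byte float32 (the storage format is not stated), 66,000 waveforms of 40,960 samples occupy about 10.8 GB, the same order as the quoted 10.4 GB; the exact value depends on the precise number of waveforms and the file format. Either way, the model is roughly two orders of magnitude smaller than the waveforms it was trained on.

```python
# Back-of-the-envelope size of the simulated BBH + BNS waveforms vs. the trained model.
# Assumption: every sample is stored as a 4-byte float32 (storage format not stated).
n_waveforms = 66_000                   # "approximately 6.6 x 10^4" waveforms
samples_per_waveform = 10 * 4096       # 10 s sampled at 4096 Hz
bytes_per_sample = 4                   # float32 (assumed)
model_bytes = 83e6                     # "about 83 MB"

raw_bytes = n_waveforms * samples_per_waveform * bytes_per_sample
print(f"waveforms: ~{raw_bytes / 1e9:.1f} GB; "
      f"compression ratio: ~{raw_bytes / model_bytes:.0f}x")
```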
Specifically, the extension of real-time matched filtering techniques to the full 9D signal manifold currently available to GW detectors is computationally prohibitive [13]. These computational considerations are the major motivation to explore alternative detection methods in the first place.
In addition, real-time detection of GWs from compact binary systems involving neutron stars would enable the fast source localization needed for rapid multi-wavelength follow-up with telescopes that have small fields of view. The results of this study demonstrate the potential of deep learning algorithms to aid the prompt detection of GW signals from binary neutron star mergers and to distinguish them from BBH signals and noise over a wide SNR range with moderate computing resources (e.g., a standard laptop computer), which would make it possible to trigger timely and detailed observations of their electromagnetic counterparts.

Summary
In conclusion, we have demonstrated the detection of gravitational waves from binary neutron star mergers with deep learning techniques on simulated gravitational-wave detector data, using the specific example of data containing BNS and BBH signals in Gaussian noise. These results point the way to real-time detection of gravitational waves from multi-messenger astrophysical sources, where a rapid follow-up is critical. Future directions include using machine learning algorithms for real-time parameter estimation of gravitational-wave signals from BNS and BHNS systems. In particular, machine learning approaches could help to extract challenging source parameters, e.g., the neutron-star tidal deformability [23], which is extremely important for understanding the properties of dense matter and fundamental interactions, but is theoretically controversial and observationally challenging to deduce.
Figure 4: Sensitivity curves illustrating the ability of the neural network to identify BNS and BBH GW signals. The true alarm probability is plotted as a function of the optimal SNR for false alarm probabilities 10^-1, 10^-2 and 10^-3.