Deep learning merger mass estimation from gravitational wave signals in the frequency domain


Detection of gravitational waves (GW) from compact binary mergers provides a new window into multi-messenger astrophysics. The standard technique for determining the merger parameters is matched filtering, which consists in comparing the signal to a template bank. This approach can be time consuming and computationally expensive due to the large amount of experimental data that needs to be analyzed.
In an attempt to find more efficient data analysis methods, we develop a new frequency domain convolutional neural network (FCNN) to predict the merger masses from the spectrogram of the detector signal, and compare it to time domain convolutional neural networks (TCNN). Since FCNNs are trained on spectrograms, the dimension of the input is reduced compared to TCNNs, implying a substantially lower number of model parameters and consequently less over-fitting. The additional time required to compute the spectrogram is approximately compensated by the lower execution time of the FCNNs, due to their smaller number of parameters. In our analysis FCNNs show slightly better performance on validation data and substantially lower over-fit, as expected from the smaller number of parameters, providing a promising new approach to the analysis of GW detector data, which could be further improved in the future by more efficient and faster computation of the spectrogram.
Introduction

According to general relativity, gravitational waves propagate at the speed of light and, in the linear perturbative regime, are produced by the second time derivative of the quadrupole moment. The main sources of gravitational waves detectable by the Laser Interferometer Gravitational-Wave Observatory (LIGO) and Virgo detectors are the mergers of compact binary systems composed of black holes or neutron stars.
Due to the no-hair theorem, the merger of two black holes has no electromagnetic counterpart [1], but binary neutron star mergers can have electromagnetic signals, therefore opening a new window into multi-messenger astronomy. These systems are known as standard sirens because the gravitational wave signal provides information about the distance to the objects independent of the cosmic distance ladder, while their electromagnetic counterparts provide information about their recession speed. Therefore, they can be used to measure the Hubble parameter [1,2] or to constrain alternative gravity theories with superluminal or subluminal GW speeds [3].
In order to achieve this goal, the detectors must have a strain sensitivity of the order of 10⁻²¹/√Hz [4], and the standard data analysis approach consists in using matched filtering to compare the detector signal to a bank of gravitational wave templates in order to determine the merger parameters. Neural networks can be used to denoise the raw signal [5,6] as a preprocessing step before matched filtering. This data analysis process must be repeated for every signal, which can be very time consuming and computationally expensive depending on the size of the template bank. Another approach has been developed [7][8][9][10][11][12][13][14][15] in which the time domain detector data is processed by a convolutional neural network to predict the merger masses. In this paper we present the results of applying a CNN to the frequency domain data, i.e. the Fourier transform of the time domain data; we call this neural network FCNN, to distinguish it from the CNN applied to time domain data, which we denote as TCNN.
The FCNN relies on the short-time Fourier transform to extract the frequency domain features needed to train the network. This approach reduces the dimensionality of the input, and the FCNN has around 50,000 parameters, compared with almost 550,000 for the TCNN. As a direct consequence, FCNNs have better out-of-sample performance than TCNNs and also tend to have a lower over-fit, due to the significantly lower model complexity.
Training data generation

The training data is generated using the PyCBC package [17], developed by the PyCBC Development Team and the LIGO/Virgo Collaborations.
This library contains a method to generate the waveform corresponding to a GW event, and accepts several different parameters as inputs. In the waveform generation we assumed for simplicity that the spins and orbital eccentricities are zero, as in [7]. Data with π/2 polarization was also generated in order to evaluate the robustness of the neural network to signals with different parameters. The networks were trained to predict the two masses of the merger, while the other parameters were kept fixed in the data generation. We kept the default values of the other parameters of the waveform generator function, except for the approximant, which was chosen to be the fourth version of the Spin Effective One Body Numerical Relativity (SEOBNR) model due to its efficiency. An example of a simulated merger GW signal is shown in fig.(1). In order to train the networks with realistic data we add noise to the simulated signal. Similarly to [7], in order to account for translations in the signal, the data was augmented by applying a random temporal shift in the interval [0, 0.2]. We generated data with different signal-to-noise ratios (SNR), and colored noise was added according to the power spectral density (PSD) provided by LIGO.
The matched-filter SNR ρ between a template h and a signal s is defined by [16]

ρ = ⟨s, h⟩ / √⟨h, h⟩ ,

where the bracket notation denotes the following noise-weighted correlation:

⟨s, h⟩ = 4 Re ∫₀^∞ ŝ(f) ĥ*(f) / S_n(f) df ,

where ŝ and ĥ are the Fourier transforms of the signal and template respectively and S_n is the PSD of the detector. An example of the signal and the corresponding noised signal is shown in fig.(2). We train the FCNN using spectrograms, which are two dimensional matrices whose columns are related to the frequency power spectra of the strain ST at different times, according to

SP_ω = |ST_ω|² ,

where SP_ω is the spectrogram and ST_ω is the Fourier transform of ST over different time intervals. The spectrograms are obtained by performing a Fast Fourier Transform (FFT) on equally spaced time intervals, with a sampling frequency of 4096 Hz, windows of 128 elements, a zero-padding of 896 elements, and an overlap between windows of 64 elements.
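A minimal numpy sketch of this inner product and the resulting SNR, assuming a one-sided PSD sampled on the same frequency grid as the FFT (the function names and normalization conventions here are our own, illustrative choices):

```python
import numpy as np

def inner_product(a_hat, b_hat, psd, delta_f):
    # <a, b> = 4 Re sum over positive frequencies of a(f) b*(f) / S_n(f) df
    return 4.0 * np.real(np.sum(a_hat * np.conj(b_hat) / psd)) * delta_f

def matched_filter_snr(signal, template, psd, fs):
    # rho = <s, h> / sqrt(<h, h>), with hats denoting Fourier transforms
    n = len(signal)
    delta_f = fs / n
    s_hat = np.fft.rfft(signal) / fs   # approximate continuous Fourier transform
    h_hat = np.fft.rfft(template) / fs
    return inner_product(s_hat, h_hat, psd, delta_f) / np.sqrt(
        inner_product(h_hat, h_hat, psd, delta_f)
    )
```

Since ⟨s, h⟩ is linear in s, doubling the signal amplitude doubles the SNR, which gives a quick sanity check of the implementation.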
The spectrogram of a merger GW signal is shown in fig.(3), and the spectrogram of the corresponding noised signal is shown in fig.(4), where it can be seen that the merger signal is mainly noticeable at low frequencies. As a consequence, for the purpose of training the FCNN, the spectrograms were cropped at 120 Hz.
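These spectrogram settings map directly onto scipy.signal.spectrogram, with nfft equal to the 128-sample window plus the 896 samples of zero-padding; the sinusoidal test strain below is just a stand-in for a real detector signal:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 4096                                  # sampling frequency (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
strain = np.sin(2 * np.pi * 60.0 * t)      # toy stand-in for the strain signal

# window of 128 samples, overlap of 64, zero-padded to nfft = 128 + 896 = 1024
freqs, times, spec = spectrogram(strain, fs=fs, nperseg=128, noverlap=64, nfft=1024)

# crop the spectrogram at 120 Hz before feeding it to the FCNN
low = freqs <= 120.0
spec_cropped = spec[low]
```

With these settings the frequency resolution is fs/nfft = 4 Hz, so the crop keeps only the first 31 frequency bins of the original 513.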

CNN architectures
The TCNN described in [7], summarized in Table I, was implemented as a benchmark to compare the performance of the FCNN. The FCNN, whose architecture is shown in Table II, consists of three convolutional layers that perform 2D convolutions on the zero-padded signal, followed by a max pooling layer. The resulting output of the pooling layer is then flattened into a one dimensional vector of 1024 entries, which is fed into a two layer fully connected net that predicts the two masses of the merger.
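A hypothetical PyTorch sketch of an architecture with this shape is given below; the filter counts, the 32×64 input size, and the hidden layer width are illustrative assumptions, and only the overall structure (three 2D convolutions, a 2D max pooling, a 1024-entry flatten, and a two layer fully connected head) follows the text:

```python
import torch
import torch.nn as nn

class FCNN(nn.Module):
    # Illustrative sketch: layer sizes are assumptions, not the paper's exact ones.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 8)),          # 2D pooling: (32, 64) -> (8, 8)
        )
        self.head = nn.Sequential(
            nn.Flatten(),                  # 16 channels * 8 * 8 = 1024 entries
            nn.Linear(1024, 64), nn.ReLU(),
            nn.Linear(64, 2),              # predicted (m1, m2)
        )

    def forward(self, x):
        # x: batch of cropped spectrograms with shape (batch, 1, 32, 64)
        return self.head(self.features(x))
```

Even with these assumed sizes the parameter count stays in the tens of thousands, an order of magnitude below a comparable time domain network, because the fully connected head dominates the count and the 2D pooling shrinks its input.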
The FCNN has about 50,000 parameters, compared with almost 550,000 for the TCNN. The smaller number of parameters has a regularization effect by reducing the variance of the model, making it less prone to over-fitting, since the number of degrees of freedom is greatly reduced. This is achieved because the spectrogram reduces the total number of input components, the number of convolutions is smaller than in the TCNN, and the two dimensional pooling operation reduces the number of components more than one dimensional pooling. The latter greatly reduces the number of input components before the flatten layer, and hence the number of parameters in the subsequent dense layers.

In order to improve the performance of the models, the input data was normalized before training. The normalization that gave the best results was the min-max scaling defined by

x' = (x − x_min) / (x_max − x_min) .

Over-fit

When the training set error is very low, due to a high number of parameters, there is a risk of over-fitting, which manifests in a large difference between the training and validation errors. In fact, even if the error of the model on the training set reaches low values, this does not necessarily imply that its predictive ability on data different from the training set will be as good. In order to quantify the difference between the training and the validation errors we define the following over-fitting estimator:

O = (train error − test error) / test error .    (5)
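Both the min-max scaling and the over-fitting estimator are one-liners in numpy; this is a minimal sketch, with the function names being our own:

```python
import numpy as np

def min_max_scale(x):
    # maps x to x' = (x - x_min) / (x_max - x_min), so values lie in [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def overfit_estimator(train_error, test_error):
    # O = (train error - test error) / test error
    return (train_error - test_error) / test_error
```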
Low values of the over-fitting estimator correspond to a small relative difference between the training and validation errors, implying the model will have a performance on out-of-sample data similar to the one on training data.
Training metric

The purpose of this model is to predict the masses of the merger from the spectrogram of the gravitational wave's strain signal. We train the model by iteratively minimizing, at each epoch, the mean absolute percentage error (MAPE), defined as

MAPE = (100/n) Σ_{i=1}^{n} ( |M̂_i1 − M_i1| / M_i1 + |M̂_i2 − M_i2| / M_i2 ) ,

where n is the number of samples in each epoch, M̂_i1 and M̂_i2 are the masses of the merger predicted by the model, and M_i1 and M_i2 are the masses from the training set used in the simulation for the i-th sample.
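Assuming the predictions and targets are stored as (n, 2) arrays of (M_i1, M_i2) pairs, this metric can be sketched as:

```python
import numpy as np

def mape(predicted, true):
    # predicted, true: arrays of shape (n, 2) with one (m1, m2) pair per sample.
    # Sums the absolute percentage errors of both masses, averaged over samples.
    return 100.0 / len(true) * np.sum(np.abs(predicted - true) / true)
```

For example, a single sample with true masses (10, 20) predicted as (11, 22) has a 10% error on each mass, giving a MAPE of 20.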
Comparing FCNN to TCNN performance

The merger GW data was simulated with a sampling rate of 8192 Hz, for mass values from 10 M⊙ to 75 M⊙ with a mass ratio less than 10 and a distance of 2000 Mpc, resulting in a total of 9346 mergers with SNRs in the 5 to 25 range. The data was split evenly between modeling and validation sets, as shown in fig.(5). The modeling set was further split with a 70/30 ratio into training and development sets respectively, where the training set was used to optimize the model parameters and the development set was used for hyperparameter tuning, specifically to find better performing network architectures. The same procedure was applied to the two data sets simulated with polarization equal to 0 and π/2. The error of the FCNN and TCNN models on the validation data over the range of SNRs is shown in fig.(6) for data with polarization equal to 0.
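The splitting procedure described above can be sketched with numpy index shuffling (the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 9346
indices = rng.permutation(n_samples)

# even split between modeling and validation sets
modeling = indices[: n_samples // 2]
validation = indices[n_samples // 2 :]

# the modeling set is further split 70/30 into training and development sets
n_train = int(0.7 * len(modeling))
train = modeling[:n_train]
dev = modeling[n_train:]
```

With 9346 mergers this yields 4673 modeling and 4673 validation samples, the modeling half then splitting into 3271 training and 1402 development samples.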
In fig.(7) the over-fit of the two models over the range of SNRs is shown, as defined in eq.(5). As mentioned earlier, the FCNN has much fewer parameters than TCNN, and it is therefore expected to have a lower over-fit than TCNN.
In order to test the robustness of the models under changes of the other merger parameters, we also created another set of training and validation data using a different polarization angle. Gravitational wave signals with a polarization angle of π/2 were simulated, keeping all other parameters fixed. We used the same number of simulated samples, and the masses of the mergers ranged from 10 M⊙ to 75 M⊙ with a mass ratio less than or equal to 10.
The error of the TCNN and FCNN models on the validation set for data with polarization equal to π/2 is shown in fig.(9). Likewise, the over-fit of the two models for this data is shown in fig.(10). The FCNN over-fit was lower than that of the TCNN, suggesting that the FCNN generalizes better than the TCNN also on signals from gravitational waves with different parameters. As can be seen in fig.(6) and fig.(9), the performance of TCNN and FCNN is approximately the same, but for low SNRs the FCNN is slightly better. Moreover, as the over-fit plots show, thanks to the reduced number of parameters the FCNN has better out-of-sample performance. A comparison of the MAPE of the mass predictions for out-of-sample data is shown in fig.(8).
The execution time of the FCNN is in general much lower than that of the TCNN, because the FCNNs have far fewer parameters. If we add to this execution time the time necessary to compute the spectrogram using scipy.signal.spectrogram, we obtain a total computational time which is on average only about 6% greater than that of a CNN working on the time domain data, but with a better MAPE and less over-fit due to the smaller number of parameters. Using more efficient implementations of the FFT to compute the spectrogram, and parallelizing it, could reduce the FCNN pipeline execution time.

Conclusions

We have developed a new convolutional neural network, the FCNN, to determine the merger masses, trained on the spectrograms of simulated GW signals, and compared its performance with a CNN trained on time domain data (TCNN) [7]. The networks were trained for 1000 epochs using 4673 gravitational wave signals with a 70/30 train/development split, and the cost function minimized was the sum of the mean absolute percentage errors between the masses and their predictions. The FCNN was trained on spectrograms, allowing it to reduce the dimension of the input, resulting in a lower number of parameters in the final fully connected layers of the network and reducing its variance.
The execution time of the FCNN is in general much lower than that of the TCNN, because the FCNNs have far fewer parameters. Adding the computational time of the spectrogram obtained with scipy.signal.spectrogram, we obtain a total time which is on average only about 6% greater than that of a CNN working on the time domain data, but with a slightly better MAPE and substantially less over-fit, due to the smaller number of parameters.
In the future it would be interesting to use more efficient computations of the spectrogram, using parallelization, in order to reduce the FCNN pipeline execution time. It will also be important to design more complex FCNNs in order to predict additional parameters, such as the spins and orbital eccentricities, or to apply them to data simulated for other detectors such as the Laser Interferometer Space Antenna (LISA).