NNETFIX: An artificial neural network-based denoising engine for gravitational-wave signals

Instrumental and environmental transient noise bursts in gravitational-wave detectors, or glitches, may impair astrophysical observations by adversely affecting the sky localization and the parameter estimation of gravitational-wave signals. Denoising of detector data is especially relevant during low-latency operations because electromagnetic follow-up of candidate detections requires accurate, rapid sky localization and inference of astrophysical sources. NNETFIX is a machine learning-based algorithm designed to remove glitches detected in coincidence with transient gravitational-wave signals. NNETFIX uses artificial neural networks to estimate the portion of the data lost due to the presence of the glitch, which allows the recalculation of the sky localization of the astrophysical signal. The sky localization of the denoised data may be significantly more accurate than the sky localization obtained from the original data or by removing the portion of the data impacted by the glitch. We test NNETFIX in simulated scenarios of binary black hole coalescence signals and discuss the potential for its use in future low-latency LIGO-Virgo-KAGRA searches. In the majority of cases for signals with a high signal-to-noise ratio, we find that the overlap of the sky maps obtained with the denoised data and the original data is better than the overlap of the sky maps obtained with the original data and the data with the glitch removed.

Introduction
The field of gravitational-wave (GW) astronomy began with the first direct detection of a GW signal from a binary black hole (BBH) merger [1] on September 14, 2015. Nine additional BBH mergers were detected with high confidence during the first and second LIGO [2] and Virgo [3] observation runs [4]. During the third LIGO-Virgo observation run, 39 binary merger events were detected with high confidence [5], including two exceptional BBH events [6][7][8] and a possible neutron star and black hole (NSBH) merger [9].
On August 17, 2017, the first detection of a GW signal from a binary neutron star (BNS) merger, GW170817, expanded multi-messenger astronomy to include GW observations [10]. A short gamma-ray burst (GRB) was detected approximately 1.7 seconds after the BNS merger time [11]. The sky map calculated from the GW signal allowed the identification of the event with an electromagnetic (EM) counterpart [10,11]. The association of this GW event with the observed EM transients supports the long-hypothesized model that at least some short GRBs are due to BNS coalescences [12] and has provided many insights into fundamental astrophysics and cosmology. In April 2019, a second BNS merger without an EM counterpart was detected [13].
In order to detect GW signals, ground-based GW detectors must be extremely sensitive, causing them to become highly susceptible to instrumental and environmental noise [4]. In particular, transient noise bursts, or glitches, may impair the quality of detector data. The presence of a glitch in the proximity of a GW signal can adversely affect the analysis of the latter, including calculating the sky localization of the source. The most notable example of such an occurrence was GW170817, where the effect of a glitch was mitigated in low-latency by removing the contaminated portion of the data and in follow-up studies by applying ad hoc mitigation algorithms [10,14].
One possibility to mitigate the effect of a contaminating glitch is to discard the data from the affected detector. This is the simplest and fastest solution; however, it is also likely to impact the analysis and sky localization, especially when data are only available from two detectors. Another technique that can be used in low-latency is gating, which removes the data affected by the glitch. One method of gating is to set the data affected by the glitch to zero, using a window function to transition smoothly into and out of the gate [15]. Gating was used in the case of GW170817 to produce the low-latency sky localization for EM follow-up observations [16]. At higher latencies, glitch mitigation techniques such as modeling and subtracting the glitch with BayesWave [17] can be used [16]. Figure 1 shows an example of the detrimental effect that gating data can have on the sky localization error region of a simulated BBH merger signal. The sky localization obtained with the gated data differs significantly from the sky localization estimated from the full data; moreover, the 90% sky localization error region after gating no longer includes the true sky position of the injected signal.

The higher detector sensitivity of LIGO and Virgo in their third observing run has led to an increased number of GW candidate detections from different astrophysical populations [6,9,13]. Future observation runs with higher sensitivity are expected to produce even greater detection rates, and thus higher chances of observing GW signals contaminated by glitches. The inability to accurately estimate the sky localization of GW candidates with potential EM counterparts, due to glitches contaminating the signal, could put at risk new astrophysical discoveries such as those made with GW170817. Thus, the development and implementation of accurate low-latency denoising methods could be highly beneficial to multi-messenger observations.
In this paper, we present a machine learning-based algorithm to denoise transient GW signals called NNETFIX ("A Neural NETwork to 'FIX' GW signals coincident with short-duration glitches in detector data"). NNETFIX uses artificial neural networks (ANNs) to estimate the portion of a signal that is lost due to the presence of an overlapping glitch. We train the ANN to reconstruct the gated portion of a signal on a template bank of BBH waveforms injected into simulated noise data. The accuracy of the algorithm is assessed by comparing the recovered waveform, signal-to-noise ratio (SNR), and sky map from the processed data to the corresponding quantities obtained before gating. We derive a set of statistical metrics to assess the improvement in these quantities.

Algorithm implementation, training and testing
We consider a scenario in which a transient BBH GW signal is observed by a network of at least two detectors and the data of one detector is partially gated to remove a glitch. Without loss of generality, we perform the analysis for the two LIGO detectors, LIGO-Hanford (H1) and LIGO-Livingston (L1), with the gating applied to data from the H1 detector. We assume the merger time at the geometric center of Earth (or geocentric merger time) to be (approximately) known from L1 data. We denote with s_f(t), s_g(t) and s_r(t) the full time series, the gated time series, and the NNETFIX reconstructed time series, respectively. The output of NNETFIX can be thought of as the map

    F : s_g(t) → s_r(t).    (1)

We train an ANN regression algorithm to construct the map F such that s_r(t) ∼ s_f(t). The NNETFIX implementation uses the scikit-learn [18] Multi-Layered Perceptron (MLP) Regressor, a type of ANN in which the nodes (mathematical functions) are arranged into layers and connected to every node in the preceding and/or succeeding layers [19]. Each node calculates a weighted linear combination of the outputs from the preceding layer and applies an activation function that introduces a non-linearity into the node's output. The ANN trained by NNETFIX consists of one hidden layer containing 200 neurons. In the ANN training process, NNETFIX uses the rectified linear unit (ReLU) activation function [20] and the ADAM stochastic gradient-based optimizer [21] with a learning rate of 10^−3. Ten percent of the training data samples are set aside and used for validating the training. The training iteration stops if the ANN performance plateaus within a tolerance level of 10^−4 to avoid overtraining. For the mean-square-error loss function and the architectures we tested, a single hidden layer reconstructs the gated portion of the time series better than multiple hidden layers, and the value of the loss function depends only weakly on the number of neurons.
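The training configuration described above maps directly onto scikit-learn's MLPRegressor. The sketch below uses stand-in random data in place of the whitened strain samples; the data shapes and variable names are our illustration, not the NNETFIX interface:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# One hidden layer of 200 neurons, ReLU activation, ADAM optimizer with a
# learning rate of 1e-3; 10% of the training samples are held out for
# validation, and training stops once the validation score plateaus within 1e-4.
ann = MLPRegressor(
    hidden_layer_sizes=(200,),
    activation="relu",
    solver="adam",
    learning_rate_init=1e-3,
    early_stopping=True,
    validation_fraction=0.1,
    tol=1e-4,
    random_state=0,
)

# Stand-in training data: map the surviving (ungated) samples of each time
# series to the gated samples to be reconstructed.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))                      # ungated samples (inputs)
y = X[:, :16] + 0.1 * rng.standard_normal((500, 16))    # gated-portion targets
ann.fit(X, y)
reconstruction = ann.predict(X[:5])                     # shape (5, 16)
```

In a real run, each row of `X` would hold the whitened detector samples outside the gate and each row of `y` the samples removed by the gate.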
To train the algorithm, we first build template banks of simulated non-spinning IMRPhenomD BBH merger waveforms [22] with varying intrinsic and extrinsic parameters. To reduce the potential for overtraining, each template bank also includes a number of (pure) noise time series. We distribute the positions of the injected signals isotropically in the sky. The waveform coalescence phase, polarization angle, and cosine of the inclination angle are uniformly distributed in the intervals [0, 2π], [0, π], and [−1, 1], respectively. We uniformly distribute the network SNR ρ_N [15] of the simulated signals in the range [11.3, 42.4]. We consider three distinct template banks corresponding to low, medium, and high BBH component masses to assess the prediction accuracy of the trained ANNs for different signal lengths. The BBH component masses are sampled according to a Jeffreys prior for the matched-filter detection statistic. As the mass of the system decreases, we employ a higher number of templates to properly cover the mass parameter space [23][24][25][26].
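The extrinsic-parameter distributions above can be sampled directly with NumPy; a minimal sketch (the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Isotropic sky positions: uniform right ascension, uniform in sin(declination).
ra = rng.uniform(0.0, 2.0 * np.pi, n)
dec = np.arcsin(rng.uniform(-1.0, 1.0, n))

# Coalescence phase, polarization angle, and cosine of the inclination angle,
# uniform in [0, 2pi], [0, pi], and [-1, 1], respectively.
phase = rng.uniform(0.0, 2.0 * np.pi, n)
psi = rng.uniform(0.0, np.pi, n)
iota = np.arccos(rng.uniform(-1.0, 1.0, n))

# Network SNR uniform in [11.3, 42.4].
rho_n = rng.uniform(11.3, 42.4, n)
```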
For each of the three distinct template banks, we build 12 training+testing (TT) sets: first, we inject each waveform into 50 distinct realizations of recolored Gaussian noise for advanced LIGO (aLIGO) at design sensitivity; second, we include the (pure) noise time series; third, we shuffle and split the set by 70%-30% for training and testing; and finally, we apply the 12 combinations of gate durations t_d = (50, 75, 130) ms and gate end-times before the geocentric merger time t_e = (15, 30, 90, 170) ms. The time series are sampled at 4096 Hz, whitened, and high-passed; a conservative value of 25 Hz is used for the high-pass filter. The gates are implemented as a reversed Tukey window with a taper of 0.1 s and are held fixed with respect to the geocentric merger time; the merger time seen in the H1 detector, however, naturally shifts with the sky position and the polarization angle of the GW signal. Table 1 shows the range of the component masses, the number of waveforms, the number of noise series, and the dimension of the sets for the different scenarios. We test the effectiveness of the ANNs by calculating the coefficient of determination for the MLP Regressor in scikit-learn on the testing sets [18]. The coefficient of determination ranges from −∞ (bad) to 1 (perfect estimation), with positive values corresponding to some degree of accuracy. We evaluate the coefficient of determination on each testing set after training the ANN on the corresponding training set and record its range for each set. To test for potential statistical effects in the training method, we consider the medium-mass scenario with a gate duration of 50 ms and a gate end-time of 30 ms as a representative case. Over 100 trials, we find that the coefficient of determination ranges from 0.800 to 0.826 with a mean of 0.815, consistent with the ranges of the testing sets.
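A reversed-Tukey gate of this kind can be sketched with SciPy. The sample-index conventions and the per-side taper below are our assumptions for illustration:

```python
import numpy as np
from scipy.signal.windows import tukey

def gate(data, fs, t_merger, t_end, t_dur, taper=0.1):
    """Zero out a glitch-contaminated stretch with smooth transitions.

    The gate of duration t_dur (s) ends t_end (s) before the merger time
    t_merger (s). A reversed Tukey window adds a smooth transition of
    length `taper` (s) on each side of the gate.
    """
    out = data.copy()
    n_taper = int(taper * fs)
    n_gate = int(t_dur * fs)
    end = int((t_merger - t_end) * fs)        # last fully gated sample
    start = end - n_gate                      # first fully gated sample
    n_win = n_gate + 2 * n_taper
    # tukey() is 1 in the middle and falls to 0 at the edges; reversing it
    # gives a window that is 1 at the edges and 0 across the gate.
    window = 1.0 - tukey(n_win, alpha=2.0 * n_taper / n_win)
    out[start - n_taper:end + n_taper] *= window
    return out
```

For example, `gate(strain, 4096, t_merger, 0.030, 0.050)` removes 50 ms of data ending 30 ms before the merger, one of the TT-set gate configurations.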
The effect of NNETFIX on quantities such as SNR and sky localization varies with component masses, network SNR, and gate settings. Therefore, we construct 108 additional independent exploration sets with fixed network SNR ρ_N = (11.3, 28.3, 42.4), component masses of (12, 10), (20, 15), and (35, 29) M⊙, and the same combinations of gate durations and end-times as the TT sets. Each exploration set consists of 512 independent time series, with the remaining parameters distributed as in the TT sets.

Performance in the time-domain
NNETFIX's performance in estimating the full time series can be assessed by computing the amount of SNR lost in the reconstruction process. We define the fractional residual SNR (FRS) as

    FRS = (ρ_f − ρ_r) / ρ_f,    (2)

where ρ_f and ρ_r are the (single-interferometer) peak SNRs in H1 of the full time series and the reconstructed time series, respectively. Positive values of FRS close to zero generally indicate accurate time series reconstructions. However, FRS ∼ 0 may also occur when the gating does not significantly reduce the peak SNR of the full series, and thus ρ_f ∼ ρ_g ∼ ρ_r. These cases can be separated by the fractional SNR gain (FSG),

    FSG = (ρ_r − ρ_g) / ρ_g,    (3)

where ρ_g is the peak SNR of the gated series. Median values of FSG range from ∼0.02 (low-mass, low-SNR case with t_d = 50 ms and t_e = 170 ms) to ∼0.89 (high-mass, low-SNR case with t_d = 130 ms and t_e = 15 ms). High-mass (low-mass) exploration sets are typically characterized by higher (lower) values of the FSG: all high-mass sets have FSG > 0.08, while all low-mass sets have FSG < 0.07. Gate end-time and SNR do not seem to significantly affect the value of the FSG, whereas the gate duration has a larger effect. Sets with longer gate durations typically have a larger median FSG; two thirds of the sets with t_d = 130 ms have FSG > 0.08, compared to only 42% of the sets with t_d = 50 ms.
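With the three peak SNRs in hand, both metrics reduce to simple ratios. A sketch, under the assumption that FRS compares the reconstructed peak SNR against the full one and FSG compares it against the gated one:

```python
def frs(rho_full, rho_rec):
    """Fractional residual SNR: fraction of the full-series peak SNR lost
    in the reconstruction (assumed form)."""
    return (rho_full - rho_rec) / rho_full

def fsg(rho_rec, rho_gated):
    """Fractional SNR gain of the reconstruction over the gated series
    (assumed form)."""
    return (rho_rec - rho_gated) / rho_gated

def good_reconstruction(rho_full, rho_gated, rho_rec):
    """Conservative selection: FRS non-negative and FSG above the 0.01
    threshold discussed in the text."""
    return frs(rho_full, rho_rec) >= 0.0 and fsg(rho_rec, rho_gated) >= 0.01
```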
A combined threshold of FRS ≳ 0 and FSG ≳ 0.01 is a conservative choice for good reconstructions. About 70% of the samples across all the exploration sets satisfy this criterion. Figure 2 shows an example of the NNETFIX data reconstruction. The reconstruction accuracy can be further quantified by the fractional match gain (FMG),

    FMG = (M_r − M_g) / (M_f − M_g),    (4)

where the match M_i between a time series s_i and the injected waveform h is

    M_i = ⟨s_i, h⟩ / √(⟨s_i, s_i⟩ ⟨h, h⟩).    (5)

The inner product of two time series s_i and s_j is defined as

    ⟨s_i, s_j⟩ = 4 Re ∫_{f_1}^{f_N} s̃_i(f) s̃_j*(f) / S(f) df,    (6)

where the tilde indicates the Fourier transform, the star denotes the complex conjugate, S(f) is the detector noise power spectral density (PSD), f_1 is the high-pass frequency, and f_N is the Nyquist frequency. In Eq. (4), we assume M_f − M_g > 0. In rare instances (0.5% of all exploration set data samples), M_g becomes larger than M_f. This occurs for small values of the single-interferometer peak SNR (median value of 4.6), when the gated portion of the data is dominated by noise that anti-correlates with the injected waveform. In the following, we remove these data samples from the exploration sets.
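The noise-weighted inner product, match, and FMG can be sketched as follows. This is a simplified illustration using NumPy's real FFT; windowing and PSD estimation details are omitted, and the function names are ours:

```python
import numpy as np

def inner_product(si, sj, psd, fs, f1=25.0):
    """Noise-weighted inner product of two equal-length real time series:
    4 * Re sum( si~(f) conj(sj~(f)) / S(f) ) df, integrated from the
    high-pass frequency f1 up to the Nyquist frequency."""
    n = len(si)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    df = freqs[1] - freqs[0]
    band = freqs >= f1
    si_f = np.fft.rfft(si) / fs           # continuous-FT normalization
    sj_f = np.fft.rfft(sj) / fs
    return 4.0 * df * np.real(np.sum(si_f[band] * np.conj(sj_f[band]) / psd[band]))

def match(s, h, psd, fs):
    """Normalized match between a time series s and the injected waveform h."""
    return inner_product(s, h, psd, fs) / np.sqrt(
        inner_product(s, s, psd, fs) * inner_product(h, h, psd, fs))

def fmg(m_full, m_gated, m_rec):
    """Fractional match gain, assuming m_full - m_gated > 0."""
    return (m_rec - m_gated) / (m_full - m_gated)
```

The `psd` array is the one-sided PSD evaluated on the `rfftfreq` grid; for whitened data it is approximately constant.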
The FMG assesses how well the NNETFIX reconstructed data matches the signal in comparison to the full data and gated data. Positive (negative) values of the FMG correspond to M_r greater (smaller) than M_g, indicating that the NNETFIX reconstructed time series has a better (worse) match with the injected waveform than the gated time series. Values of the FMG larger than 1 indicate that the ANN overfits the data, i.e., the reconstructed time series is more similar to the injected waveform than the full time series. Therefore, we consider reconstructions with 0 < FMG ≤ 1 to be successful. Figures 4 and 5 show the distributions of the FMG for two exploration sets from the medium-mass scenario, and Figure 6 compares these distributions. We quantify NNETFIX's performance by estimating the reconstruction efficiency, which we define as the fraction of successfully reconstructed samples, i.e., samples with 0 < FMG ≤ 1. The fractions of samples with FMG ≤ 0, 0 < FMG ≤ 1, and FMG > 1 for all exploration sets are given in Tables 2-4. The reconstruction efficiency across all exploration sets varies from approximately 0.31 to over 0.95. There is a mild dependence on the component masses of the system; the median value of the efficiency decreases from 0.77 for the low-mass scenario to 0.61 for the high-mass scenario when all other parameters (SNR, gate duration, and gate end-time) are held fixed. Within each mass scenario, when the gate duration and gate end-time are held fixed, NNETFIX's efficiency typically improves by a factor of ∼1.5-2 as the network SNR increases. As the SNR becomes higher, the algorithm can rely on a larger amount of signal energy before and after the gated portion of the data to reconstruct the time series.
NNETFIX successfully reconstructs over two thirds of the time series with ρ_N = 28.3 or larger for all low-mass and medium-mass exploration sets, and over half of the time series for the high-mass sets, with the exception of two marginal cases with gate duration t_d = 75 ms and gate end-time t_e = 90 ms. The exploration sets with ρ_N = 11.3 exhibit lower efficiencies, ranging from 31% for the high-mass set with t_d = 75 ms and t_e = 90 ms to 66% for the low-mass set with t_d = 130 ms and t_e = 15 ms. Figure 7 shows the efficiency for the exploration sets with component masses (m_1, m_2) = (20, 15) M⊙ as a function of the single-interferometer peak SNR. The percentage of successful reconstructions ranges from ∼33%-66% at low peak SNR to 80% at high peak SNR, with the lowest values (≲ 40%) occurring for the sets with t_d ≥ 75 ms and t_e ≥ 30 ms. Time series with peak SNR above ∼20 show successful reconstructions in 70% or more of the cases, irrespective of gate duration and end-time.
Changing the gate duration does not seem to have a significant effect on NNETFIX's efficiency, which only varies slightly at fixed network SNR and gate end-time across all exploration sets. Similarly, for fixed gate duration and network SNR, the gate end-time before merger time also has a marginal effect, although NNETFIX tends to produce better reconstructions when the gate is closer to the merger time, especially for long gate durations in the low-mass and medium-mass scenarios.
In conclusion, we find that NNETFIX may successfully reconstruct gated data of durations up to a few hundreds of milliseconds and as close as a few tens of milliseconds before the merger time for a majority of time series with single interferometer peak SNR greater than 20.

Performance of sky maps
The NNETFIX reconstructed time series are expected to produce better sky maps, and therefore better sky localization error regions of the astrophysical signal, than the gated time series. We evaluate this improvement by comparing the overlaps of the sky map derived from the full time series with the sky maps derived from the gated time series and the reconstructed time series. In the following, we generate the sky maps with a modified version of a PyCBC [28] script, pycbc_make_skymap, in which the data can be manually gated.
We follow Ref. [29] and define the overlap of two sky maps (1, 2) as

    O_{1,2} = 4π ∫ p_1(Ω) p_2(Ω) dΩ,    (7)

where p_1(Ω) and p_2(Ω) are the sky localization probability densities of the sky maps and the integral is over the solid angle Ω. The discretized version of Eq. (7) is

    O_{1,2} = N Σ_i P_{1i} P_{2i},    (8)

where P_{1i} and P_{2i} each denote the sky localization probability of the i-th pixel of the corresponding sky map, and N is the total number of pixels. Each sky map is normalized such that the sum of the pixel values over the entire map is 1. Equation (8) gives values in the range (0, N). Higher values of O_{1,2} indicate a better overlap between the two maps, while lower values denote worse overlaps and/or maps which tend to have less-localized error regions. A suitable metric to evaluate the improvement in the sky localization of a signal due to NNETFIX's reconstruction is the overlap log ratio (OLR),

    OLR = log_10 (O_{r,f} / O_{g,f}),    (9)

where O_{r,f} (O_{g,f}) denotes the overlap of the sky map obtained with the reconstructed (gated) time series with the sky map of the full series. Positive (negative) values of the OLR indicate that the sky map from the reconstructed time series has a larger (smaller) overlap with the sky map from the full time series than the latter has with the sky map from the gated time series. Tables 5-7 give the fraction of samples with positive OLR for all exploration sets. High values of the OLR are obtained when the overlap of the reconstructed (gated) sky map with the sky map from the full time series is large (small). The former typically occurs for reconstructed time series with large values of the FMG. The latter may happen when the loss of signal due to the gate is high and even small gains in the single-interferometer peak SNR have a significant impact on the sky localization.
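In pixel form, the two statistics are one-liners. A sketch, assuming the discretized overlap N Σ_i P_{1i} P_{2i} with each map normalized to unit total probability (sky maps are represented as flat arrays):

```python
import numpy as np

def overlap(p1, p2):
    """Discretized sky-map overlap for maps with N pixels, each normalized
    to sum to 1. Equals 1 for two uniform maps and approaches N for
    identical, perfectly localized maps."""
    p1 = np.asarray(p1) / np.sum(p1)
    p2 = np.asarray(p2) / np.sum(p2)
    return len(p1) * np.sum(p1 * p2)

def olr(o_rec_full, o_gated_full):
    """Overlap log ratio: positive when the reconstructed map overlaps the
    full-data map better than the gated map does."""
    return np.log10(o_rec_full / o_gated_full)
```

For instance, an OLR of ∼1.7 corresponds to an overlap improvement by a factor of 10^1.7 ≈ 50.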
An example of the OLR distribution as a function of O_{g,f} is shown in Fig. 8. Values of OLR across the exploration sets generally increase with network SNR, component masses, and gate duration. The network SNR of the signal is the main factor that determines the value of the OLR. Because NNETFIX efficiently reconstructs time series containing signals with large SNRs, when network SNRs are large the sky maps obtained from the full data are typically more similar to the sky maps from the reconstructed data than to the sky maps from the gated data. We find positive OLR median values for all exploration sets with ρ_N ≥ 28.3, irrespective of mass, gate duration, and end-time. For these sets, the median values of the OLR for the high-SNR sets are greater than the corresponding values for the medium-SNR sets by a factor ranging from ∼1.4 for the high-mass scenario with t_d = 130 ms and t_e = 15 ms to ∼5 for the low-mass scenario with t_d = 75 ms and t_e = 170 ms. The sky maps of reconstructed time series with lower SNR generally show little improvement compared with the sky maps of gated time series; median values of OLR for the sets with ρ_N = 11.3 are typically around 0, irrespective of the mass scenario, gate duration, and gate end-time.
Median values of OLR have a roughly linear dependency on gate duration. For the high- and medium-SNR exploration sets, the median values of OLR for t_d = 130 ms are larger than the corresponding values for t_d = 50 ms by a factor ranging from ∼1.6 (medium-mass scenario with ρ_N = 42.4 and t_e = 15 ms) to ∼4.7 (high-mass scenario with ρ_N = 28.3 and t_e = 90 ms). Since longer gate durations correspond to greater signal losses, NNETFIX's reconstruction provides larger SNR gains and OLR values as the gate duration increases.
The portion of a signal close to the merger time has a greater impact on the sky map than the portion of the signal in the early inspiral phase. Therefore, median values of the OLR for the medium- and high-SNR exploration sets with t_e = 15 ms are typically higher than the corresponding values for the sets with t_e = 170 ms, by a factor ranging from ∼1.1 (low-mass scenario with ρ_N = 28.3 and t_d = 90 ms) to ∼7.1 (high-mass scenario with ρ_N = 28.3 and t_d = 50 ms). For shorter signals and larger gate durations, a gate end-time very close to the merger time may lead to large signal losses and removal of the merger portion of the signal in H1, and thus make the reconstruction process less efficient. Figure 9 shows the OLR as a function of the gate end-time for the high-mass sets with t_d = 130 ms and different network SNRs. Figure 10 shows the sky localization error region obtained with the NNETFIX reconstructed data for the case of Fig. 1. The value of the OLR is ∼1.7, corresponding to an improvement in the overlap by a factor of ∼50. In summary, for a majority of the cases with gate durations up to a few hundreds of milliseconds and ending as close as a few tens of milliseconds before the merger time, the sky maps of reconstructed time series with network SNR ρ_N ≥ 28.3 overlap better with the sky maps of the full time series than the sky maps obtained with gated data do, in some cases by more than a factor of ten. In these cases, the true directions of the signals also typically lie within sky localization error regions of the reconstructed data with smaller probability contour values than the regions obtained with gated data.

Conclusion
In this paper, we have presented NNETFIX, a new machine learning-based algorithm designed to estimate the portion of a BBH GW signal that may be gated due to the presence of an overlapping glitch. We have tested the accuracy of the algorithm with different choices of signal parameters and gate settings, and defined several metrics to assess NNETFIX's performance. Among these metrics, the most important are the FMG and the OLR. The FMG quantifies the algorithm's efficiency in reconstructing the gated data in the time domain. Positive values of this metric indicate that the full time series matches the NNETFIX reconstructed time series better than the gated time series. The fraction of samples that show improvement varies from approximately one third to over 95% across the cases that we investigated. Our results show that NNETFIX may be able to successfully reconstruct a majority of BBH signals with single-interferometer peak SNR greater than 20 and gates with durations up to a few hundreds of milliseconds ending as close as a few tens of milliseconds before the merger time.
The OLR quantifies the algorithm's efficiency in improving the sky map from the gated time series. Positive values of this metric indicate that the sky map from the NNETFIX reconstructed time series has a larger overlap with the sky map from the full time series than the one obtained from the gated data. Sky maps from reconstructed data improve for higher network SNR values, as the ANN can use a larger amount of signal energy to estimate the missing portion of the waveform. We find positive OLR median values for all cases that we investigated with network SNR above 28.3. Perhaps surprisingly, NNETFIX also seems to perform better in cases with longer gate durations or shorter signals. In these scenarios, the sky localizations obtained with gated data are considerably degraded, so the improvements in the reconstructed sky maps are more sizeable. Reconstructed sky maps of more massive BBH mergers typically show significant improvements compared to the sky maps obtained with gated data.
In a real-case scenario, we envision NNETFIX to be pre-trained on real noise data from the detectors for a sparse set of models, each covering a region of the five-dimensional parameter space spanned by SNR, component masses, gate duration, and gate end-time before the geocentric merger time, as illustrated in Sec. 2 (but with a finer grid). While the optimal value of the network SNR and the best estimates of the signal component masses are unknown to the observer because of the gating, the gated data and/or the data from the second interferometer may provide a rough estimate of these parameters. The estimated values of these parameters and the known gate parameters can then be used to choose the most appropriate pre-trained model in the NNETFIX bank. Typically, the NNETFIX reconstructed time series will produce a higher single-interferometer peak SNR than the gated time series. If that is the case, the known FSG can be used to estimate the optimal single-interferometer peak SNR of the (unknown) full time series by fitting the expected roughly linear relation between the FRS and the FSG for the given TT set.
The overlap of the sky map from the NNETFIX reconstructed data with the sky map that could be obtained with the full data (if it were not contaminated by a glitch), O_{r,f}, can be estimated by looking at the distribution of the OLR for the TT set at hand. The exploration sets that we investigated show that there is a well-defined correlation between the OLR, the FSG, and the overlap between the sky maps obtained from the gated data and the NNETFIX reconstructed data, O_{r,g}. If this correlation is generally valid, the OLR can be estimated from the observed values of O_{r,g} and the FSG using a fit calculated from the TT set used to train the selected NNETFIX model. To expedite this process, the samples in each TT set could be clustered according to the distributions of the OLR, O_{r,g}, and the FSG. A classifier could then be trained to estimate the optimal SNR and sky localization error region of the full signal.
Once NNETFIX has been trained, the CPU time required to reconstruct the data is of the order of a few seconds for gate durations up to hundreds of milliseconds. This short turnaround makes the algorithm suitable for use in low-latency. NNETFIX could also be applied to GW signals other than BBH mergers, such as BNS or NSBH mergers. Therefore, it could be beneficial for rapid follow-up of glitch-contaminated, potentially EM-bright candidate detections. In future work, we intend to explore the application of NNETFIX and its effect on the sky localization of BNS signals, as well as detector network configurations with more than two detectors. Improving the sky localizations of potentially EM-bright signals could increase the chances of coincident EM and GW observations and lead to a better understanding of the physical properties of their sources.

Table 7. Fraction of samples with positive OLR for the exploration sets with component masses (m_1, m_2) = (35, 29) M⊙. Entries in italic denote sets where the fraction of samples is smaller than 0.5.