Gravitational Wave Denoising of Binary Black Hole Mergers with Deep Learning

Gravitational wave detection requires an in-depth understanding of the physical properties of gravitational wave signals, and the noise from which they are extracted. Understanding the statistical properties of noise is a complex endeavor, particularly in realistic detection scenarios. In this article we demonstrate that deep learning can handle the non-Gaussian and non-stationary nature of gravitational wave data, and showcase its application to denoise the gravitational wave signals generated by the binary black hole mergers GW150914, GW170104, GW170608 and GW170814 from advanced LIGO noise. To exhibit the accuracy of this methodology, we compute the overlap between the time-series signals produced by our denoising algorithm, and the numerical relativity templates that are expected to describe these gravitational wave sources, finding overlaps ${\cal{O}}\gtrsim0.99$. We also show that our deep learning algorithm is capable of removing noise anomalies from numerical relativity signals that we inject in real advanced LIGO data. We discuss the implications of these results for the characterization of gravitational wave signals.


Introduction
Gravitational wave (GW) observations of binary black hole (BBH) mergers with the LIGO [1,2] and Virgo [3] detectors are now a common occurrence [4,5,6,7,8,9]. Extracting these time-series signals from non-Gaussian and non-stationary noise requires a firm understanding of the astrophysical properties of GWs, which is customarily obtained through numerical relativity (NR) simulations [10,11,12,13,14,15,16,17,18]. On the other hand, characterizing noise in GW detectors is a daunting task. Its non-Gaussian and non-stationary nature, combined with the fact that GW facilities undergo frequent commissioning to further enhance their sensitivity, makes it a formidable challenge to design robust models that can accurately capture its statistical properties [19,20,21,22]. Nonetheless, this work is critical to identify and excise poor quality data segments, and noise anomalies that contaminate GW signals. Once this is done, GW detection pipelines can provide robust estimates for the astrophysical parameters of GW sources, and their significance.
While noise anomaly removal is customarily done in off-line GW searches, low-latency detection pipelines also require data quality information to identify and remove, in real-time, noise anomalies that may prevent the detection of GW events, and to accurately determine their nature, which is of central importance for Multi-Messenger Astrophysics searches [23,24,25,26,27]. To complement this ongoing effort, in this article we present deep learning algorithms that are trained with raw advanced LIGO noise to identify GW signals in realistic detection scenarios, and which, upon removing the imprints of noise, produce denoised time-series signals that resemble NR waveforms. Given that BBHs represent the most abundant class of GW sources thus far [9], the analysis we present herein focuses on GWs produced by BBH mergers. When we apply these algorithms to denoise several BBH waveforms that have been detected by the advanced LIGO and Virgo detectors, we find that the output time-series data of our denoising algorithm reproduces with excellent accuracy the NR templates that optimally describe these GW sources. Furthermore, we also demonstrate that when we contaminate NR templates with simulated noise anomalies, following [21,18], and inject these signals in real advanced LIGO noise, our denoising algorithm can tell apart true GW waveform signals from glitches.
Denoising gravitational wave signals in low-latency may be useful for a variety of tasks. First and foremost, it may be readily applied to remove noise anomalies that contaminate or obscure true gravitational wave signals. Once this is done, denoised GW signals may be used in conjunction with existing algorithms to assess the significance of new detections. Furthermore, denoised time-series waveforms may be used to compute fast time-domain overlap calculations with machine learning based waveform generators [28,29] to constrain the astrophysical parameters of the source, which in turn may be used to inform the construction of physical priors for parameter estimation analyses, or to explore whether it is necessary to produce numerical relativity waveforms to accurately describe an event that is beyond the scope of existing semi-analytical waveform models.
This work aims to accelerate the convergence of novel signal-processing algorithms with GW astrophysics. Recent accomplishments of this program include the demonstration of deep learning for the detection and characterization of GW signals in simulated and real LIGO noise [30,31,32], the detection and characterization of higher-order waveform signals from eccentric BBH mergers [33], among many recent applications of machine and deep learning for signal detection and source modeling [19,34,35,36,37,38,39,40,41].
In the specific context of signal denoising, recent efforts have focused on the use of recurrent neural networks (RNNs) [42,43] combined with autoencoders, dictionary learning and principal component analysis to denoise burst-like GWs, i.e., short duration (${\cal{O}}(10^{-1})$ second) signals with large signal-to-noise ratios (SNRs) [44,45,40]. While recurrent autoencoders have been shown to outperform principal component analysis and dictionary learning both in signal reconstruction accuracy and computational efficiency [40], it has thus far been difficult to extend these algorithms to denoise ${\cal{O}}(1)$ second-long GW signals. Motivated by the fact that ground-based GW detectors continue to enhance their sensitivity, thereby increasing the time-window during which GW signals can be observed, we have exhaustively explored the use of different types of neural network models, and have found that convolutional neural networks (CNNs) [46,47] are better suited to denoise BBH GW signals whose length is $\sim 10\times$ longer, and whose SNRs are significantly lower, than what existing state-of-the-art algorithms can handle.
We have tested our new CNN-based algorithm in a variety of scenarios, including the denoising of GW signals in simulated Gaussian noise, and the denoising of true BBH GW signals in realistic detection scenarios. These results represent the first application of deep learning to remove noise contamination from true BBH GW events that span a wide range of masses and SNRs. By computing the overlap between the output of our denoising algorithms with the optimal NR waveforms that describe these signals, we furnish evidence for the robustness and accuracy of this approach.
This article is organized as follows. We describe the properties of our deep learning denoising algorithm, and the datasets used to train it, in Section 2. Results for the denoising of GW signals in simulated and real LIGO noise are presented in Section 3. We summarize our findings and future directions of work in Section 4.

Methods
In this section we provide a succinct overview of the mathematical and statistical foundation of the signal processing algorithms we have utilized for GW denoising. Thereafter, we describe the architecture and key features of our neural network models, and the datasets we have used to train and test them to denoise GWs, both in the context of simulated Gaussian noise and real LIGO noise.

Statistical foundations of Deep Learning Denoisers
Within the framework of statistical learning, a GW signal $X$ can be modeled as a random process, indexed by real time $t$. Since we use modeled one-second GWs sampled at 8192 Hz, we treat GWs as random vectors of size 8192. We standardize our datasets by normalizing their peak amplitude so that the sample space $\Omega$ can be set to $[-1, 1]^{8192}$.
We assume that GWs follow some unknown but fixed joint probability distribution, with probability density function (pdf) $f_X(x)$. The GW signal contaminated by noise is denoted by $Y$, and follows some unknown but fixed distribution $f_{Y|X}(y|x)$ when conditioned on some clean signal $x$. Under these conventions, the goal of denoising is to find a function $h(\cdot)$ that minimizes the expectation value of the mean square error (MSE) of the recovered signal, namely
$$\min_{h}\; \mathbb{E}\left[\lVert X - h(Y)\rVert_2^2\right] = \min_{h} \int\!\!\int \lVert x - h(y)\rVert_2^2\, f_{Y|X}(y|x)\, f_X(x)\, dy\, dx,$$
where $h(\cdot)$ is the denoising function. In most cases, we only know the empirical distributions $\hat{f}_X(x)$ of $X$ and $\hat{f}_{Y|X}(y|x)$ of $Y$, which are determined by the empirical data. So the quantity we can directly minimize is
$$\hat{\mathbb{E}}\left[\lVert X - h(Y)\rVert_2^2\right] = \int\!\!\int \lVert x - h(y)\rVert_2^2\, \hat{f}_{Y|X}(y|x)\, \hat{f}_X(x)\, dy\, dx.$$
In practice, if the choice of $h(\cdot)$ is arbitrary, then finding an optimal solution is computationally infeasible. Therefore, we often restrict the search space to a class of parameterized functions, $h_w(\cdot)$, where $w$ is a vector of parameters. In this case, the optimization problem can be posed as
$$\min_{w}\; \hat{\mathbb{E}}\left[\lVert X - h_w(Y)\rVert_2^2\right].$$
The choice of the parameterized function class is critical to the success of any statistical learning algorithm. In recent years, a deep-layered structure of functions has received much attention [48,49],
$$h_w(x) = h_{w_n}\left(h_{w_{n-1}}\left(\cdots h_{w_1}(x)\cdots\right)\right),$$
where $n$ is the number of layers or the depth. Usually, we choose $h_{w_i}(x) = g(w_i x)$, where $w_i$ is a matrix, $x$ is an input vector, and $g(\cdot)$ is a fixed nonlinear function, e.g., $\max\{\cdot, 0\}$ (also known as ReLU), $\tanh(\cdot)$, etc., that is applied element-wise. This function class and its extensions, also dubbed neural networks, combined with simple first-order optimization algorithms such as stochastic gradient descent (SGD), and improved computing hardware, have led to disruptive applications of deep learning [48,49].
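To make the empirical MSE objective concrete, the following minimal sketch fits the simplest possible parameterized denoiser, a single scalar weight $h_w(y) = w\,y$, by gradient descent. It is an illustration only: the toy data (uniform clean samples, Gaussian noise), the learning rate, and the use of full-batch gradient descent in place of SGD are assumptions made here for brevity, and are not the models or data used in this article.

```python
import random


def empirical_mse(w, clean, noisy):
    """Empirical MSE of the scalar denoiser h_w(y) = w * y."""
    return sum((x - w * y) ** 2 for x, y in zip(clean, noisy)) / len(clean)


def fit_scalar_denoiser(clean, noisy, lr=0.05, steps=500):
    """Minimize the empirical MSE over w with full-batch gradient descent."""
    w = 0.0
    n = len(clean)
    for _ in range(steps):
        # d/dw of (1/n) * sum (x - w*y)^2
        grad = sum(-2.0 * y * (x - w * y) for x, y in zip(clean, noisy)) / n
        w -= lr * grad
    return w


# Toy data (assumption): clean samples in [-1, 1], additive Gaussian noise.
rng = random.Random(0)
clean = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
noisy = [x + rng.gauss(0.0, 0.5) for x in clean]

w_fit = fit_scalar_denoiser(clean, noisy)
# For this one-parameter model the empirical minimizer is known in closed form.
w_closed_form = sum(x * y for x, y in zip(clean, noisy)) / sum(y * y for y in noisy)
print(w_fit, w_closed_form)
```

The fitted weight shrinks the noisy input toward zero, which lowers the empirical MSE relative to passing the noisy data through unchanged ($w = 1$). Deep networks replace the single weight with a composition of many layers, but the objective being minimized has the same form.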

Neural network architecture
Empirically, it has been shown that a particular function class or network structure called WaveNet [50] can be used to produce raw audio waveforms that mimic human speech with high fidelity. In view of this realization, we have explored this architecture as a starting point to design a neural network model to denoise GWs. Since we are using WaveNet for denoising purposes, instead of waveform generation, we have removed the causal structure of the network. The causal structure of WaveNet is modeled with a convolutional layer [47] with kernel size 2, and by shifting the output of a normal convolution by a few time steps. However, in this paper we adopt convolutional layers with kernel size 3, so that when denoising the waveform at a certain time step, we take into account information from past and future time steps. We also dilate the convolutional layers to get an exponential increase in the size of the receptive field [50]. This is necessary to capture long-range correlations, as well as to increase computational efficiency. By construction, WaveNet utilizes deep residual learning, which is specifically tailored to train deeper neural network models [51]. The structure of WaveNet is described in detail in [50], and we provide a schematic representation of its architecture in Figure 1. To demonstrate the robustness of the performance of WaveNet for denoising, we consider two models with different sets of hyper-parameters, which we describe below.

Model I
For Model I, the dilated convolutional layers have dilations $2^0, 2^1, 2^2, 2^3, \ldots, 2^{10}$. These 11 layers are stacked as a block, which is repeated ten times. The non-dilated convolutional layers in the repeating blocks (Conv 1 × 1 in the boxes of Figure 1) each use a kernel size of 1. Furthermore, the numbers of input and output channels are 128 for both dilated and non-dilated convolutional layers in the repeating blocks. The penultimate convolutional layer (Conv 1 × 1 in the middle of Figure 1) has 128 input channels, and 64 output channels with kernel size 1. The last convolutional layer (rightmost Conv 1 × 1 in Figure 1) has 64 input channels, 1 output channel and kernel size 1.

Model II
We use a similar structure for Model II, except for the dilated convolutional layers, which now have dilations $2^0, 2^1, 2^2, 2^3, \ldots, 2^{11}$. These twelve layers are stacked as a block and repeated six times. Additionally, the numbers of input and output channels in the convolutional layers are increased from 128 to 256 and from 64 to 128, respectively.
We have considered these two models to assess their robustness and accuracy to remove noise contamination from GW signals. When applied in the context of simulated Gaussian noise or real LIGO noise, we have found that the denoised time-series signals obtained from either model are consistent with each other. These findings, shown below, demonstrate the robustness of WaveNet to denoise GW signals in realistic detection scenarios.
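As a rough check on the dilation schedules of Model I and Model II, the receptive field of the stacked dilated layers can be estimated with the standard formula for stride-1 convolutions, in which each layer adds (kernel size − 1) × dilation samples. The sketch below assumes stride 1 and kernel size 3 for every dilated layer, and ignores the 1 × 1 layers, which do not enlarge the receptive field; the function name is illustrative.

```python
def receptive_field(dilations, kernel_size, n_blocks):
    """Receptive field, in samples, of repeated blocks of stride-1 dilated convolutions."""
    per_block = sum((kernel_size - 1) * d for d in dilations)
    return 1 + n_blocks * per_block


# Model I: dilations 2^0 ... 2^10, block repeated ten times.
model_1 = receptive_field([2 ** i for i in range(11)], kernel_size=3, n_blocks=10)
# Model II: dilations 2^0 ... 2^11, block repeated six times.
model_2 = receptive_field([2 ** i for i in range(12)], kernel_size=3, n_blocks=6)

print(model_1, model_2)  # 40941 49141
```

Under these assumptions both receptive fields exceed the 8192 samples of a one-second input, so every output sample can in principle draw on the entire segment.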

Data Curation
We trained our denoising algorithms using a catalog of GW signals that describe the inspiral, merger and ringdown of non-spinning BBH mergers. We produced these waveforms using the NR surrogate waveform family [52]. Each waveform is produced at a sample rate of 8192 Hz, and we consider the last second of evolution of BBHs with component masses $m_{\{1,2\}} \in [5M_\odot, 75M_\odot]$, and mass-ratio $q \leq 10$. The training dataset (9861 templates) samples this parameter space.
We start the training stage by computing the power spectral density (PSD) of the noise, which is used to whiten both the templates and the noise. Thereafter, we rescale the amplitude of the GW signals, and the standard deviation of the noise, to create scenarios that describe GWs over a wide range of SNRs. We then add the rescaled templates and noise together, and normalize the standard deviation of the data that contain both signals and noise. The simulated noisy signals are used as input, and the corresponding clean signals are used as targets during the training process. The actual output of our denoising algorithm is an unwhitened waveform signal. In our results, we also present the corresponding whitened signals, both denoised and true GW signals, to clearly show what portion of the denoised time-series signal is actually detectable by the advanced LIGO detectors. To further enhance the robustness of the network, we have incorporated time invariance, which ensures that the neural network can correctly identify and denoise signals, irrespective of their location in the data stream.

Signals embedded in simulated Gaussian noise
We followed the previous methodology using simulated Gaussian noise, and setting LIGO's Zero Detuned High Power (ZDHP) configuration as the target PSD [53].
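The data preparation steps described above (rescale the template, add noise, normalize the standard deviation, and shift the signal in time) can be sketched as follows. This is a hedged illustration rather than the pipeline used for this article: the function name `make_training_pair`, the circular time shift, the use of unit-variance white noise, and the convention that the SNR of a whitened template in unit-variance noise equals its Euclidean norm are all assumptions made here.

```python
import math
import random


def make_training_pair(whitened_template, clean_template, target_snr, rng, max_shift=0):
    """Build one (noisy input, clean target) pair; a hypothetical helper."""
    n = len(whitened_template)
    # Time-invariance augmentation: apply the same random circular shift to both series.
    shift = rng.randint(-max_shift, max_shift)
    whitened = [whitened_template[(i - shift) % n] for i in range(n)]
    clean = [clean_template[(i - shift) % n] for i in range(n)]

    # Assumed convention: for unit-variance white noise, SNR = ||whitened signal||.
    norm = math.sqrt(sum(v * v for v in whitened))
    signal = [target_snr * v / norm for v in whitened]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mixed = [s + e for s, e in zip(signal, noise)]

    # Normalize the standard deviation of the data containing signal and noise.
    mean = sum(mixed) / n
    std = math.sqrt(sum((m - mean) ** 2 for m in mixed) / n)
    noisy_input = [m / std for m in mixed]

    # Target: clean template with unit peak amplitude.
    peak = max(abs(v) for v in clean)
    target = [v / peak for v in clean]
    return noisy_input, target


rng = random.Random(1)
# Toy stand-in for a waveform (assumption): a Gaussian-windowed sinusoid.
template = [math.exp(-((i - 400) / 60.0) ** 2) * math.sin(0.2 * i) for i in range(512)]
noisy_input, target = make_training_pair(template, template, target_snr=12.0, rng=rng, max_shift=50)
```

In a real setting the two template arguments would be the whitened and unwhitened versions of the same waveform, so that the network maps whitened noisy data to an unwhitened clean signal.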
Signals embedded in real LIGO noise
Following well established methods to measure a noise PSD estimate [54], and to encapsulate the actual sensitivity of the advanced LIGO detectors at the time true BBH GW signals were observed, we use between 512 seconds and 4096 seconds of open source advanced LIGO data around the GWs we want to denoise. It is worth noting that this PSD estimate only needs to be regenerated when there are significant changes in the detector's noise PSD, as described in [54,55]. In practice this means that it suffices to compute a noise PSD estimate every 4096 seconds [54,55], and use it to denoise on the fly any new GW events that are detected within the next 4096 second interval. Using transfer learning to continually update our neural network model with new data, so as to capture any significant changes in the detectors' noise PSD, is computationally inexpensive. One inexpensive GPU suffices to complete this task within a few minutes. It is worth mentioning that this continuous training scheme is needed, since we found that the noise PSD estimate we computed to denoise GWs in advanced LIGO's first observing run was significantly different from the noise PSDs we used to denoise GWs in advanced LIGO-Virgo's second observing run. This is expected, since the advanced LIGO detectors underwent a significant sensitivity upgrade at the end of their first observing run.
Similarly, we have followed the above description for the training and testing procedure, with the difference that we now use open source LIGO noise, available at the Gravitational Wave Open Science Center [56]. We work with the 16384Hz LIGO noise, and downsample it as appropriate.
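The PSD estimation step can be illustrated with a minimal Welch-style estimator: average the periodograms of Hann-windowed, 50%-overlapping segments, then divide a spectrum by the square root of the PSD to whiten it. The sketch below is an assumption-laden stand-in for the methods of [54]: it uses a naive $O(N^2)$ discrete Fourier transform and a short segment length so that it runs with the standard library alone, and the whitening is defined only up to an overall normalization constant. In practice an FFT-based routine on much longer segments would be used.

```python
import cmath
import math
import random


def dft(segment):
    """Naive one-sided DFT; adequate only for short illustrative segments."""
    n = len(segment)
    return [sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def welch_psd(x, fs, nperseg):
    """One-sided PSD from averaged Hann-windowed, 50%-overlapping periodograms."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / nperseg) for i in range(nperseg)]
    scale = 1.0 / (fs * sum(w * w for w in window))
    psd = [0.0] * (nperseg // 2 + 1)
    count = 0
    for start in range(0, len(x) - nperseg + 1, nperseg // 2):
        seg = [x[start + i] * window[i] for i in range(nperseg)]
        for k, coeff in enumerate(dft(seg)):
            psd[k] += scale * abs(coeff) ** 2
        count += 1
    psd = [p / count for p in psd]
    # Fold negative frequencies into positive ones (all bins except DC and Nyquist).
    for k in range(1, nperseg // 2):
        psd[k] *= 2.0
    return psd


def whiten_spectrum(coeffs, psd):
    """Divide one-sided DFT coefficients by sqrt(PSD), up to a normalization constant."""
    return [c / math.sqrt(p) for c, p in zip(coeffs, psd)]


# Toy check (assumption): unit-variance white noise sampled at fs = 1 Hz
# has an expected one-sided PSD of 2 in the interior frequency bins.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
psd = welch_psd(noise, fs=1.0, nperseg=64)
whitened = whiten_spectrum(dft(noise[:64]), psd)
```

Averaging over many segments is what makes the estimate stable, which is consistent with using hundreds to thousands of seconds of data around each event.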
The neural networks are trained on 4 NVIDIA K80 GPUs with PyTorch [57] using the ADAM [58] optimization method. The weight parameters are initialized randomly. The learning rate is set to $10^{-3}$ initially and reduced to $10^{-4}$ when the MSE loss plateaus.
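The learning-rate schedule described above can be sketched with a small framework-independent helper. The class name `PlateauSchedule`, the patience of three epochs, and the improvement threshold are hypothetical choices for illustration; the article does not state how a plateau was detected. In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` provides comparable behavior.

```python
class PlateauSchedule:
    """Hypothetical helper: drop the learning rate once the loss stops improving."""

    def __init__(self, initial_lr=1e-3, reduced_lr=1e-4, patience=3, min_delta=1e-6):
        self.lr = initial_lr
        self.reduced_lr = reduced_lr
        self.patience = patience      # epochs without improvement before reducing (assumed)
        self.min_delta = min_delta    # minimum decrease that counts as improvement (assumed)
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr = self.reduced_lr
        return self.lr


schedule = PlateauSchedule()
losses = [1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.29]
history = [schedule.step(loss) for loss in losses]
print(history)  # [0.001, 0.001, 0.001, 0.001, 0.001, 0.0001, 0.0001]
```

The rate stays at $10^{-3}$ while the loss keeps decreasing and drops to $10^{-4}$ after three consecutive epochs without improvement.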

Results
We have tested our denoising algorithm in the context of simulated Gaussian noise, and in realistic detection scenarios using raw LIGO noise. The first set of results is presented in the following section.

Simulated Gaussian noise
Assuming that $h$ represents the output of our denoising algorithm, $s$ the ground truth (clean GW signal), and defining $S_n(f)$ as LIGO's ZDHP PSD [53], and $\tilde{h}(f)$ as the Fourier transform of $h(t)$, the noise-weighted inner product between $h$ and $s$ is given by
$$(h|s) = 2\int_{0}^{\infty} \frac{\tilde{h}^{*}(f)\,\tilde{s}(f) + \tilde{h}(f)\,\tilde{s}^{*}(f)}{S_n(f)}\, df,$$
and the normalized overlap by
$${\cal{O}}(h, s) = \max_{t_c,\,\phi_c} \left(\hat{h}\,\middle|\,\hat{s}_{[t_c,\,\phi_c]}\right), \qquad \hat{h} = h\,(h|h)^{-1/2},$$
where $\hat{s}_{[t_c,\,\phi_c]}$ indicates that the normalized waveform $\hat{s}$ has been time- and phase-shifted. Under these considerations, we have used the normalized overlap to quantify the accuracy with which our denoising algorithm can reconstruct GW signals contaminated by simulated Gaussian noise. Figure 2 presents the normalized overlap between denoised signals and their clean counterparts for BBH populations with matched-filtering SNR = 9 (top left panel) and SNR = 12 (top right panel). A sample waveform embedded in noise for each BBH population is presented in the bottom panels. These results indicate that the output time-series signals of our denoising algorithm reproduce the true features of clean GW templates with overlaps O ≥ 0.97 across the BBH parameter space for noisy signals with SNR ≥ 12. These results were obtained with Model I. Results using Model II are presented in Figure A.10. A direct comparison between these two sets of results confirms that variants in the architecture of our neural network models produce consistent results, providing evidence for their robustness and stability when applied to denoise GW signals.
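A simplified version of the normalized overlap can be sketched in the time domain. The sketch assumes that both series are already whitened, so that the noise weighting reduces to a plain dot product, and it maximizes only over circular time shifts; the maximization over phase used in the full definition is omitted here. The function names and the toy signals are illustrative assumptions.

```python
import math


def inner(a, b):
    """Plain dot product; stands in for the noise-weighted inner product of whitened data."""
    return sum(x * y for x, y in zip(a, b))


def overlap(h, s):
    """Normalized overlap of two equal-length series, maximized over circular time shifts."""
    norm = math.sqrt(inner(h, h) * inner(s, s))
    best = -1.0
    for shift in range(len(s)):
        shifted = s[shift:] + s[:shift]
        best = max(best, inner(h, shifted) / norm)
    return best


n = 256
# Toy signals (assumption): Gaussian-windowed sinusoids with different frequencies.
s = [math.exp(-((i - 128) / 30.0) ** 2) * math.sin(0.3 * i) for i in range(n)]
other = [math.exp(-((i - 128) / 30.0) ** 2) * math.sin(0.9 * i) for i in range(n)]
h = s[40:] + s[:40]  # time-shifted copy of s

print(overlap(h, s), overlap(other, s))
```

A time-shifted copy of a signal recovers an overlap of essentially one, while a signal with a different frequency content yields a much smaller value, which is the behavior the overlap is meant to capture.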

Real LIGO Noise
We have applied Model I and Model II to denoise true BBH GW signals in real LIGO noise. We present one set of results, since the overlaps between the denoised signals produced by either model, and the clean target signals, are consistent within 1%.
The data selected for denoising corresponds to that in which the events are observed with the largest SNR. This choice is motivated by the results presented in Figure 2, which indicate that larger SNR values improve the waveform reconstruction. Figure 3 presents the output of our denoising algorithm when applied to real advanced LIGO noise that contains four different BBH events. We divide this set of results into two cases. The first one comprises GW signals that describe the GW events GW150914, GW170104 and GW170814. To denoise these GW signals, we fed into our denoising algorithm a one-second-long advanced LIGO data segment that contains the GW under consideration. The output of our denoising algorithm for each of these events is shown in the top panels of Figure 3. It is important to clearly delineate the realm of applicability of our results, since our deep learning algorithm provides a realistic description of the data when the GW signal is actually detectable by advanced LIGO. Thus, to clearly exhibit the portion of the denoised data that may be used for data analysis studies, the mid-panels of Figure 3 show the whitened true GW signal and the whitened output of our denoising algorithm. The bottom panels in this Figure show the overlap between the output of our denoising algorithm, within its realm of applicability, and the NR templates that optimally describe these signals [59,60]. We notice that in all cases O ≥ 0.99. We have selected these systems to consider a broad range of masses, mass-ratios and SNRs of recently detected BBH mergers. These results show that deep learning can provide unwhitened time-series data which may facilitate rapid analyses to constrain the parameter space that describes BBH mergers.
The second case concerns the GW event GW170608, with which we want to exhibit several important aspects of our denoising algorithm. This is a low mass BBH merger with moderate SNR [8].
As shown in the time-frequency power maps of the LIGO strain data produced for this event in Figure 1 of [8], and the spectrogram we have produced with data available at the Gravitational Wave Open Science Center [56] in Figure 4, we notice that the characteristic chirping morphology of the BBH evolution is rather intermittent, as opposed to the smooth, continuous time-frequency tracks observed in other GW events [4]. Therefore, based on the loudness and low total mass of GW170608, we would expect to observe these signatures in the output time-series data of our denoising algorithm. Our results, presented in the bottom panel of Figure 4, and the third bottom panel (from left to right) in Figure 3, demonstrate that this is exactly what we observe in our denoised GW170608 signal, namely, at lower frequencies our denoised signal stays in phase with the optimal NR template that, according to matched-filtering algorithms, reproduces GW170608 [59,60]. As the BBH system nears merger, the power of the signal drops significantly, which is reflected in the reconstruction of our denoised signal. During this time window, localized around t ∼ −0.2 seconds in the bottom panel of Figure 4, our denoised signal goes out of phase and amplitude with the NR template. Right before merger, the true GW signal increases its SNR and our denoised signal is now reconstructed with high fidelity.

Figure 3: Top panels (from left to right): denoised signals from the binary black hole mergers GW150914, GW170104, GW170608 and GW170814. Middle panels: overlap between whitened denoised signals and the whitened optimal numerical relativity templates, according to matched-filtering GW detection pipelines [59,60]. Bottom panels: overlap between denoised signals and the optimal numerical relativity templates, according to matched-filtering GW detection pipelines [59,60].

The above description is essential to highlight that our denoising algorithm has not just hierarchically learned the properties of GW signals, and then performed an interpolation of these abstract features to produce a denoised signal. Rather, our denoising algorithm is actually using the statistics of the noise in which the signal is embedded to provide a realistic representation of the true GW event. This reconstruction is determined by the SNR of the signal, and encodes the sensitivity of the detectors at the time of observation. These results furnish additional evidence for the versatility and power of deep learning for GW data analysis in realistic detection scenarios, and represent the first time deep learning is proven effective at denoising BBH GW signals that span a broad range of SNRs.

Spin-precessing binary black hole waveforms
As described in Section 2.3, we trained our deep learning algorithm to remove noise from waveform signals that describe non-spinning BBH mergers. In the previous section we have demonstrated that our denoising algorithm generalizes to new types of signals, since some of the BBH waveforms that we denoised are consistent with BBHs that have non-zero spins, as shown in [9]. In this section, we quantify the robustness of our denoising algorithm in a more challenging scenario, i.e., we consider spin-precessing BBH mergers, produced with the waveform model introduced in [61], with the following parameters: total mass $M = \{70M_\odot, 75M_\odot\}$, mass-ratio $q = \{4/3, 4\}$, and three spin-vector combinations, corresponding to low-, moderate- and high-spin configurations. These spin-precessing BBH waveforms are whitened using a noise PSD estimate in the vicinity of the event GW150914. We then extract open source advanced LIGO data around this same event and inject them therein (note that in all these studies the data containing the true BBH GW signals are excised). Figure 5 presents two sets of results: (i) the top panels show the unwhitened time-series data of the ground-truth waveform and the output of our denoising algorithm; (ii) as discussed before, it is important to explicitly show the realm of applicability of the output time-series data of our deep learning algorithm, so the bottom panels show the whitened versions of the ground-truth signals and the denoised signals. These panels also show the overlap between the ground-truth signals and the denoised waveforms, computed from the time marked by the dashed lines to the last time-sample of the signals. These panels show, from left to right, low-, moderate- and high-spin configurations. The overlap values reported in these panels, O = {0.99, 0.93, 0.97}, indicate that, even though we trained our deep learning denoiser with non-spinning BBH signals, we can still reconstruct the features of spin-precessing BBH mergers.
We have also quantified the ability of our denoising algorithm to generalize to these new types of signals by computing the overlap between these spin-precessing signals and the entire dataset of waveforms we used to train our denoising algorithm. The corresponding overlap values for the signals shown in Figure 5, from left to right, are O = {0.93, 0.83, 0.94}. If we compare these results with the actual overlap values obtained between the denoised signals and the ground-truth spin-precessing waveforms, i.e., O = {0.99, 0.93, 0.97}, we realize that the denoiser has been able to generalize to new types of signals that are not present in the training dataset.
In Figure 6 we perform another analysis concerning the suitability of our deep learning algorithm to denoise spin-precessing BBH signals that exhibit more clearly the features of spin-precession. To do so we have chosen systems with mass-ratio q = 4. The three panels in Figure 6 present, from left to right, the overlaps between the denoised time-series signals and the ground-truth signals, namely O = {0.97, 0.99, 0.98}. As before, we have also computed the overlap between these spin-precessing signals and the entire dataset of waveforms used to train our deep learning algorithm, finding that the corresponding overlaps for the signals shown in Figure 6, from left to right, are O = {0.91, 0.83, 0.93}.
These analyses furnish evidence that our denoising algorithm can generalize to new types of signals, and also shed light on regions of parameter space where our algorithm requires additional work, in particular spin-precessing BBHs with asymmetric mass-ratios. Informed by these findings, we will present an extended version of this algorithm in future work to recover with higher fidelity this type of astrophysical event. For now, it is worth highlighting that this method can be readily applied to LIGO data analysis given the measured spin values of detected BBH mergers [9].

Glitches
An important consideration in the construction of denoising algorithms is that they should be trained so as to tell apart noise anomalies from true signals. Therefore, a metric to assess whether the denoising algorithm performs optimally consists of contaminating a given BBH waveform with a variety of glitches and ensuring that the denoised signals are not altered by them. Figure 7 presents results for a variety of studies we performed with our denoising algorithm. The top panels present ground-truth signals contaminated by two types of glitches discussed in [21,18], and the corresponding signals that were produced by our deep learning algorithm. For this analysis we considered non-spinning BBH mergers with total mass $M = 64.5M_\odot$, and mass-ratio q = 1.24. We injected these signals in open source LIGO data in the vicinity of the event GW150914 [56]. We notice that, as expected, the denoiser has removed both types of noise anomalies from the denoised waveform signals. The mid-panels show how the actual signals (contaminated by glitches and denoised ones) look when whitened by a noise PSD estimate. The bottom panels show the whitened version of the original signals (without glitch contamination) and the denoised signals. These studies show that our denoising algorithm has learned to tell apart noise anomalies from signals, and that it is effective at removing these from waveforms. Finally, we have considered the scenario in which there is no waveform in the data, but only noise anomalies. We performed three experiments, namely, we extracted open source advanced LIGO data in the vicinity of the event GW150914 [56], and injected two types of Gaussian glitches, as shown in the top and bottom panels of Figure 8. In the case of Gaussian glitches, we found that the output of our denoising algorithm is consistent with the expected time-series data that it would produce in the absence of waveform signals, as shown in the top panels of Figure 8.
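For readers who wish to reproduce this kind of contamination test, a minimal sketch of two simulated glitch morphologies is given below. The functional forms are common textbook parameterizations of a Gaussian pulse and a sine-Gaussian burst, and are assumptions made here; they are not necessarily the exact forms used in [21,18]. All function names and parameter values are illustrative.

```python
import math


def gaussian_glitch(n, fs, t0, sigma, amplitude):
    """Gaussian pulse centered at time t0 (seconds) with width sigma (seconds)."""
    return [amplitude * math.exp(-0.5 * ((i / fs - t0) / sigma) ** 2) for i in range(n)]


def sine_gaussian_glitch(n, fs, t0, tau, f0, amplitude):
    """Sinusoid of frequency f0 (Hz) under a Gaussian envelope of decay time tau (seconds)."""
    out = []
    for i in range(n):
        dt = i / fs - t0
        out.append(amplitude * math.exp(-(dt / tau) ** 2) * math.sin(2 * math.pi * f0 * dt))
    return out


def contaminate(signal, glitch):
    """Add a glitch to a signal sample by sample."""
    return [s + g for s, g in zip(signal, glitch)]


fs, n = 1024, 1024  # one second of data at an illustrative sample rate
gauss = gaussian_glitch(n, fs, t0=0.5, sigma=0.01, amplitude=3.0)
sine_gauss = sine_gaussian_glitch(n, fs, t0=0.5, tau=0.02, f0=100.0, amplitude=3.0)
# Toy stand-in for a waveform (assumption): a slowly varying sinusoid.
signal = [math.sin(2 * math.pi * 30.0 * i / fs) for i in range(n)]
contaminated = contaminate(signal, sine_gauss)
```

The contaminated series would then be injected into noise and passed through the denoiser, with the output compared against the uncontaminated signal.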
On the other hand, we have also considered a structured noise anomaly, namely, a sine-Gaussian glitch. As shown in the bottom panels of Figure 8, this type of glitch resembles a GW signal, and our algorithm tries to actually reconstruct it. There are key differences, however, in the output time-series signal produced by our denoiser when it reconstructs a true or simulated GW signal, and a noise anomaly that resembles a GW signal, such as a sine-Gaussian glitch. In the case of GW signals, the denoised amplitude and phase of the signal closely resemble the ground truth signal, as shown in Figure 7. In contrast, when our denoiser is applied to GW data that only contains noise and a sine-Gaussian glitch, we find that the denoised time-series captures fairly well the phase of the ground-truth sine-Gaussian, while it poorly recovers its amplitude evolution. We have explored this latter finding in detail, and present a summary of these results in Figure 9. Therein we present the recovered average power, $P$, of the signals, defined as
$$P = \frac{1}{T}\int_{0}^{T} |x(t)|^{2}\, dt,$$
where $x(t)$ represents the time-series sine-Gaussian glitch and $T$ is the duration of the data segment. Using this relation we have computed the average power of the denoised glitches assuming three cases SNR = {32.5, 13, 2.6}, obtaining P = {58%, 35%, 2%}, respectively. These results indicate that our denoiser is suboptimal at recovering this type of noise anomaly. These studies shed light on the realm of applicability of deep learning algorithms to: (i) denoise signals in realistic detection scenarios, including signals that describe a signal manifold that is distinct from the one used for training; (ii) remove glitches from waveform signals that are embedded in real advanced LIGO noise; (iii) tell apart signals from glitches. We have also identified specific areas of improvement, including the development of a denoiser that is trained with spinning BBH mergers. These tools will be presented in future work.

Figure 7: These panels present signals contaminated by two types of glitches, namely, Gaussian glitch for the left panels and sine-Gaussian glitch for the right panels. The top panels present the ground-truth signals contaminated by the noise anomalies, accompanied by the output of our denoising algorithm. Notice that, as expected from an optimal denoiser, our deep learning algorithm has removed the glitches from the denoised signals. Middle panels: whitened version of the ground-truth signals contaminated by glitches and of our denoised signals. Bottom panels: whitened version of the ground-truth signals without glitch contamination and of our denoised signals.

Figure 8: The panels show the output of our denoising algorithm when it is used to process real advanced LIGO noise that contains glitches but no waveforms. Top panels: Gaussian glitches with different amplitudes are injected in real LIGO noise. The output of our deep learning algorithm is consistent with the absence of waveform signals in the data. Bottom panels: the left panel shows a sine-Gaussian glitch and its denoised version. The right panel shows the whitened version of the glitch and of the denoised glitch. This shows that structured glitches of this nature can effectively be denoised by our deep learning algorithm, reconstructing with fair fidelity the amplitude and phase of these noise anomalies.
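The recovered-power comparison for sine-Gaussian glitches discussed above can be illustrated with a short sketch. It assumes that the percentages quoted for $P$ are the ratio of the average power of the denoised glitch to that of the injected glitch, with the average power of a sampled series taken as the mean of its squared samples; this interpretation and the function names are assumptions, not a statement of how Figure 9 was produced.

```python
def average_power(x):
    """Average power of a sampled time series: mean of the squared samples."""
    return sum(v * v for v in x) / len(x)


def recovered_power_fraction(denoised, injected):
    """Fraction of the injected glitch's average power present in the denoised output."""
    return average_power(denoised) / average_power(injected)


# Toy example (assumption): a denoised glitch with half the injected amplitude.
injected = [0.0, 1.0, -2.0, 1.0, 0.0]
denoised = [0.5 * v for v in injected]
fraction = recovered_power_fraction(denoised, injected)
print(fraction)  # 0.25
```

Because power scales with the square of the amplitude, a reconstruction that tracks the phase well but underestimates the amplitude can still recover only a small fraction of the power.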

Conclusions
We have designed deep learning algorithms to denoise GW signals embedded in simulated Gaussian noise, and in realistic detection scenarios, using non-Gaussian and non-stationary LIGO noise. In the former case, we have demonstrated that the overlap between the output time-series signal of our denoising algorithm, and the ground truth signals is O ≥ 0.97 across the BBH parameter space $m_{\{1,2\}} \in [5M_\odot, 75M_\odot]$ for noisy signals with SNR ≥ 12.
When applied to a variety of GW signals that describe spinning BBH mergers detected by advanced LIGO and Virgo, we have shown that the overlap between the output of our deep learning algorithm and the NR templates that optimally describe these events is O ≥ 0.99.
We have also used GW170608 to demonstrate that the quality of the denoised GW is determined by the loudness of the signal and the sensitivity of the detector. We also showed that our deep learning algorithm can generalize to new types of sources, denoising spin-precessing BBH mergers. In this region of parameter space, the overlap between the ground-truth signals and the output of our denoiser is O ≥ 0.93. Finally, we showed that deep learning can remove noise anomalies from BBH mergers.
This work, combined with other successful efforts using machine learning to denoise time-series signals embedded in simulated or non-Gaussian and non-stationary noise, suggests that instead of designing sophisticated schemes to model the statistical properties of noise, one may use deep learning algorithms to learn the true properties of noise, and use this knowledge to carry out controlled experiments in which modeled signals are embedded and subsequently extracted from realistic noise datasets.
Furthermore, for GW signals with moderate to GW150914-type SNRs, we can readily use these denoising algorithms to process data segments where true GW signals are marginally detected as a result of contamination from noise anomalies, or to assess whether single-interferometer observations actually have signals in other detectors which have been obscured by noise anomalies. In addition, denoised signals may also be used as input data for other deep learning algorithms that may provide point-parameter estimation results of Bayesian deep learning parameter estimation analyses [62,63,64].
In future work, we will design neural network models to denoise longer GW signals. We will also explore the applicability of these methodologies to extract and denoise new classes of GWs from other realistic noise datasets, targeting in particular potential GW sources that may be observed with pulsar timing arrays [65, 66].

[64] P. Maturana Russel, R. Meyer, J. Veitch, N. Christensen, The stepping-stone sampling algorithm for calculating the evidence of gravitational wave models, arXiv e-prints (2018), arXiv:1810.04488.

Acknowledgments
Figure A.10 presents the reconstruction accuracy of noisy BBH GW signals embedded in Gaussian noise. These results were obtained using Model II, described in Section 2.2.2. We notice that our denoising algorithm produces time-series signals whose properties reproduce the true signals with accuracies ≥ 97% for noisy signals with SNR ≥ 12.