Improving the background of gravitational-wave searches for core collapse supernovae: A machine learning approach

Based on the prior O1-O2 observing runs, about 30% of the data collected by Advanced LIGO and Virgo in the next observing runs are expected to be single-interferometer data, i.e., they will be collected at times when only one detector in the network is operating in observing mode. Searches for gravitational wave signals from supernova events do not rely on matched filtering techniques because of the stochastic nature of the signals. If a Galactic supernova occurs during single-interferometer times, separation of its unmodelled gravitational-wave signal from noise will be even more difficult due to lack of coherence between detectors. We present a novel machine learning method to perform single-interferometer supernova searches based on the standard LIGO-Virgo coherentWave-Burst pipeline. We show that the method may be used to discriminate Galactic gravitational-wave supernova signals from noise transients, decrease the false alarm rate of the search, and improve the supernova detection reach of the detectors.

(Dated: February 12, 2020) Based on the prior O1-O2 observing runs, about 30% of the data collected by Advanced LIGO and Virgo in the next observing runs are expected to be single-interferometer data, i.e., they will be collected at times when only one detector in the network is operating in observing mode. Searches for gravitational wave signals from supernova events do not rely on matched filtering techniques because of the stochastic nature of the signals. If a Galactic supernova occurs during single-interferometer times, separation of its unmodelled gravitational-wave signal from noise will be even more difficult due to lack of coherence between detectors. We present a novel machine learning method to perform single-interferometer supernova searches based on the standard LIGO-Virgo coherent Wave-Burst pipeline. We show that the method may be used to discriminate Galactic gravitational-wave supernova signals from noise transients, decrease the false alarm rate of the search, and improve the supernova detection reach of the detectors.

I. INTRODUCTION
A principal challenge in detecting gravitational waves (GWs) is distinguishing astrophysical signals from instrumental or environmental noise triggers produced by non-linear couplings between the detector subsystems and/or their environment [1][2][3].
If the theoretical GW signal is known, as in the case of binary coalescences [4], triggers are generated with a matchedfilter technique [5][6][7]. In a multi-detector array, such as the current Advanced LIGO [8] and Advanced Virgo [9] network, a transient GW signal should appear as a near-simultaneous trigger across all three detectors, the delay defined by the direction of travel of the gravitational wave and the associated light travel time.
The matched-filter technique cannot be used for unmodelled signals such as GWs emitted in Core-Collapse Supernovae (CCSNe) [10]. Despite recent progress in numerical simulations, the dynamics of supernova explosions is not yet fully understood as the extremely complex physics of star collapse and the computational cost required for accurate simulations make the treatment of CCSNe very challenging. Theoretical and computational improvements over the last few years have allowed several teams to calculate some CCSN GW waveforms through different approximations and numerical schemes in two-and three-dimensional scenarios [11][12][13][14]. The main time frequency features for slowly rotating progenitor stars are the progressive increase of the dominant mode frequency and an occasional development of the constant-frequency Standing Accretion Shock Instability (SASI). For rapidly rotating progenitors there is consensus in the simulation community for a strong broad band, temporally very compact (few tens of milliseconds) component, although the later stages of GW production are still under discussion. Even as the pool of available waveforms evolves and more GW waveforms appear in the literature, these main features seem to be common across the various families of numerical simulations. The waveforms used in our analysis capture these features as currently visible in published waveforms. (For more examples, see Refs. [15][16][17][18][19][20].) In addition, CCSNs are stochastic processes. Therefore, state-of-the-art waveforms are not yet sufficiently reliable to be used as templates in a matched-filter search and cover the full search parameter space.
GW signals from CCSNe are typically much weaker than GW signals from binary mergers. Because of this, and the stochastic nature of the signal, the background of GW searches from CCSNe is expected to be severely polluted by short duration noise transients which may mimick actual signals. The situation is even worse when coincident data from multiple detectors are not available. During the first and second LIGO-Virgo observing runs (O1 and O2) a significant portion of the LIGO-Virgo data (∼ 33% in O1 and ∼ 30% in O2) were collected in "single-interferometer" mode, i.e., when only one detector in the network was operating in nominal observing configuration (see https://www.gwopenscience.org [21]). Even with improved detector reliability and duty cycle (70%), it is expected that about 20% of the data in the next observing runs, the LIGO-Virgo network will be single-interferometer data. Given the rarity of a Galactic CCSN [22][23][24], extending the detector range and improving the search background is essential to maximize the chances of detection of a GW CCSN signal.
In this paper we present a novel technique based on a supervised Machine Learning (ML) algorithm [25] which may be effectively employed in future LIGO-Virgo observing runs to reduce the background of single-interferometer data and achieve a 3σ confidence level detection in GW searches for Galactic CCSN. Simpler approach without ML and using fewer GW signals was performed in Ref. [26]. In our method we assume that the event time and the distance of the CCSN are known from neutrino and optical observations. The machine learning algorithm is first trained on off-source data to produce a lower background. The results are then applied to on-source windows around GW event candidates to increase the detection confidence. We train the algorithm on approximately 1.47 days of O1 data by injecting the set of waveforms that have been used in the latest LIGO-Virgo observing runs [10,27] to obtain CCSN detection upper limits at various fixed distances smaller than the distance to the Galactic center (< 10 kpc). The features of simulated and background triggers are extracted using the coherent-WaveBurst (cWB) pipeline [28,29] employed by LIGO and Virgo for unmodelled GW transient searches. The machine learning algorithm is then used to classify the triggers and remove the noise triggers.

II. ANALYSIS SEARCH PIPELINE
Our analysis utilizes coherent WaveBurst (cWB) analysis, a software pipeline widely used in the LIGO-Virgo Collaboration for the detection and reconstruction of unmodelled gravitational-wave (GW) events, e.g. [30][31][32]. At its core, cWB employs the constrained maximum likelihood ratio. The method combines data streams from ranking statistic ρ that is the coherent network signal-to-noise ratio (SNR) of a GW signal detected in the network. To perform its analysis, cWB requires data streams from at least two detectors, thereby reducing the population of the events associated with coincident noise between them. The principal challenge in establishing a detection with only one interferometer is that consistency constraints, that measure a degree of similarity of an event between different detectors, cannot be applied. Without consistency constraints, a population of loud noise glitches might persist in the data, and as a consequence, contribute to the reduction of statistical significance of GW induced candidate.
We propose a method that employs the cWB pipeline and machine learning algorithm to perform a single detector case analysis. First, we configure the first detector as an exact copy of the second detector such that the coherent analysis with cWB pipeline can be performed. Then, the statistical significance of the triggers is assessed with the False Alarm Probability (FAP) [10,27]: where FAR is the False Alarm Rate of the trigger and T on is the time period where we expect to find the signal (on-source window). Finally, the ρ statistic becomes the SNR in the single detector case.
The time of the collapsing core for a galactic CCSN is expected to be determined with an uncertainty of less than one second by the detection of a neutrino flux [33], so we set T on = 2 s. We analyze 1.47 days of data from the Hanford detector during the O1 run, which would allow for a detection at 3σ significance level, corresponding to FAP ≈ 2.7 × 10 −3 .
The background analysis is performed across time-shifted data, thereby removing much of the potential for terrestrial noise or glitches to simulate a signal [34]. While time-shifted, pattern matching is used in the LIGO and Virgo searches, they are built on the assumption that two or three detectors are operational. In case of single detector analysis, the detector data streams cannot be time-shifted to produce the search background.
The left panel of Fig. 1 shows the FAR of the background triggers for a H1H1 network and H1L1 network for comparison, where L1 and H1 denote the LIGO Livingston and LIGO Hanford interferometers. The FAR for the H1H1 network is orders of magnitudes larger than for a regular L1H1 network. Moreover, the noise triggers are much louder.
The cWB pipeline is also employed to analyze the detectability of GW signals generated from multidimensional simulations. The waveforms derived from multidimensional simulations are added to the detector noise with amplitudes corresponding to an initial source distance of 10 kpc. In our analysis, we consider the set of waveforms that have been used in the latest LIGO-Virgo observing runs [10,27]. The pipeline first performs the analysis to isolate the injected waveforms. The analysis is then performed several more times with the waveform amplitudes rescaled to a range of source distances. For each waveform family, we inject the signals every 33 s. As a result, we obtain a detection efficiency curve with respect to the source distance.
Slowly or non-rotating massive stars are believed to constitute about 99% of CCSNe [35]. (For the rates of different explosion mechanisms see Ref. [36] and references therein, as well as Refs. [37,38].) Among slowly or non-rotating models, we choose for our analysis three distinct waveform families characterized by a neutrino-driven explosion mechanism [39][40][41]. From the Müller family of three-dimensional numerical simulation waveforms [39], we consider two waveforms calculated with a 15M progenitor (mul1, mul2 ) and one waveform calculated with a 20M progenitor (mul3 ). The Ott waveform is also generated by 3D simulations with a 27M . The progenitor star is slowly rotating and convection is the dominating mechanism, leading to a less significant Standing Accretion Shock Instability (SASI) [40] contribution to the waveform signal.
For the two-dimensional Yakunin model, we use four waveforms with progenitor masses of 12M , 15M , 20M and 25M [41], respectively. We denote them with yak1, yak2, yak3 and yak4. The signal of these waveforms is generally very strong due to the fact that the axisymmetry of the (2D) system artificially increases the amplitude of the gravitational wave when compared to the other waveforms of this group.
We also examine the waveform models leading to explosion produced from rapidly rotating progenitors. For this case, we consider the Scheidegger [42] and Dimmelmeier [43] waveform families. The explosion mechanism for this second group is believed to be magneto-hydrodynamically driven and the waveforms generally carry larger energies than in the non-rotating or slowly-rotating models. Scheidegger et al. [42] calculate a large set of waveforms from three-dimensional simulations under various conditions. In our analysis, we choose three of these waveforms with a 15M progenitor star mass and different rotational speed, R1E1CA L (no rotation), R3E1AC L and R4E1FC L, which we denote sch1, sch2 and sch3, respectively. For the Dimmelmeier family [43], we choose three waveforms (dim1, dim2, dim3) produced from two-dimensional simulations with a 15M progenitor star mass and increasing rotational velocity.
The right panel of the Fig. 1 shows the detection efficiency curves for the Yakunin waveforms for two and single detector networks. From this plot it can be concluded that the detectability of the waveforms for two and single detector networks are comparable. One interesting observation is that the efficiency goes higher for H1H1 network comparing to H1L1 network, because the injected waveforms between two detectors are fully coherent.
Given Fig. 1 it is clear that a significant challenge with single detector analysis is suppressing the non-linear loud noise transients. It is also important that we remove glitches without decreasing the sensitivity of the algorithm to detect GW signals.

III. MACHINE LEARNING ALGORITHM
In this section we introduce the ML algorithm employed in our work. As a full introduction to the method is beyond the scope of this paper, we present only the information that is essential for the understanding of our analysis. For a deeper discussion, the reader is referred to Ref. [44].
The supervised ML algorithm employed is a type of evolutionary computation called genetic programming (GP). It is an analog to biological natural selection. In GP, we evolve a population of programs through successive generations to solve a particular, defined problem [45]. An individual evolved GP program is an hypothesis which when executed takes the form of a mathematical, multivariate expression.
In the training process each evolved hypothesis is executed against each given data point (sample from the real world) and in turn generates a prediction. Each and every prediction in the training set is compared against the correct, qualified label (truth). The distance between the prediction and truth is used to generate a fitness score for each hypothesis. As the evolutionary process in GP favors those hypotheses with a higher fitness, they are more likely to pass some or all of their code into the next generation of evolved hypotheses. Lesser performing individuals are, over successive generations, abandoned. Therefore, GP programs that demonstrate a higher overall fitness are more likely (but not guaranteed) to be selected for the next generation. Thus, each subsequent generation of programs is more likely (but not guaranteed) to solve the given problem than the prior [46].
GP multivariate expressions are classically represented as a syntax tree, where the trees have a root (top center), nodes (mathematical operators), and leaves (operands). Operators can be arithmetic, trigonometric, and boolean, for example. As with any mathematical expression, operands are variable place-holders for the real-world values. When evaluated, the real-world data are substituted for the variables in the multivariate expressions, data point by data point. The depth of a tree determines the complexity of the evolved multivariate expression. Deeper trees are able to incorporate more operands in each expression, and tend toward non-linear functions.
With GP the user assigns run-time parameters such as the quantity of individual trees in the initial population, type of GP trees employed, the number of individual programs selected for each tournament (a comparison of fitness scores), and the termination criterion (eg: number of generations). The performance of the algorithm can be tuned through the selection of these parameters.
The work-flow of a generational GP run incorporates three basic steps: a) Generation of an initial, stochastic population; b) Iterative selection, evaluation, and application of genetic operations (reproduction, mutation and crossover); c) transfer of the evolved copy into the subsequent generation. Steps b) and c) repeat until the user-defined termination criterion is met [46].
In this advanced era of machine learning, many algorithms tend toward black box solutions, both off-line training and on-line processing without a working knowledge of how any given solution was derived. It is important to many researchers, and their fields of research, to understand the internal workings of any system, including computer software.
The GP algorithm employed in this body of research offers total transparency to its internal workings, and the opportunity to review the evolutionary process at each step, pause, archive, and continue. Moreover, as the GP hypothesis is a stand-alone mathematical expression whose variables call upon data features generated outside of GP, it can be readily employed as a portable model for online data classification or regression analysis in any number of shell, script, or compiled computer languages.
In our analysis we used a tree-based open source python code, Karoo GP [25], that was originally written by one of the authors (KS) for the mitigation of RFI in radio astronomy at the Square Kilometre Array [47]. Karoo GP is scalable, with multicore and GPU support enabled by the Python library TensorFlow and the capacity to work with very large datasets [48].

IV. DATA PREPARATION
We train the GP algorithm on the families of waveforms described in Sec. II (defined as Class1 for the sake of the GP algorithm) and the background events (Class0). We use 1.47 days of background events sampled from a different time frame in the LIGO/Virgo O1 observing run, else we would otherwise bias the analysis.
Each dataset is built by combining a number of simulations ranging from a few hundreds to a few thousands per model family, depending on the family, and a comparable number of background triggers randomly extracted from the total number of cWB events in the analysis time frame. The dimension of the datasets is therefore determined by the number of available injected simulations, and a similar number of background triggers combined.
Each trigger is identified by a GPS time stamp and an 11-dimensional vector that contains the cWB parameters. In our analysis we employ this data vector with a cWB parameter subet relevant to a single-interferometer configuration. The vector elements are as follows: • rho0 -ranking statistic (effectively the signal-to-noise ratio of the event) • volume0 -event volume, i.e., the number of wavelet defining the trigger    For the sake of trainig the ML algorithm, triggers corresponding to injected CCSN waveforms are labeled 1 (positives), and background events are labeled 0 (negatives). The triggers in the datasets are then randomly shuffled and split into thirds. Two thirds are used for ML training and internal testing. The remaining one third is reserved for external, blind validation. Table I shows the analyzed datasets and their dimensionality.

V. ANALYSIS AND RESULTS
With evolutionary computation, for which GP is a subset, the parameters employed by a given algorithm are explored to avoid local minima and then optimized for incremental improvements and the speed at which the algorithm arrives to the desired solution. We train the GP algorithm with a population of 300 individuals, 100 generations, tournament size set to 20, and a max (min) tree depth of 4 (3). These values are in line with well-established choices for the use of GP on datasets of similar dimensionality to the datasets considered here [46,49]. This optimal combination was tested by varying the population size, number of individuals selected for the tournament, number of generations and tree depth in preliminary runs. To minimize the uncertainty in the determination of the multivariate expressions used for the classification of the triggers inherent in the stochastic nature of GP process, we conduct a full evaluation 200 times, on each dataset.
We first use the dataset with Dimmelmeier waveforms injected at a distance of 3.16 kpc, and discuss the results in detail. We then compare these results with those obtained with other injected models, at various distances and mixed datasets.
As the goal of the analysis is to reduce the search background, the relevant quantities in the confusion matrix are the specificity (True Negative Rate, TNR) and the False Negative Rate (FNR), i.e., the number of signals mistakenly identified as noise. Figure 2 shows specificity and FNR for the Dimmelmeier dataset with waveforms injected at 3.16 kpc. The GP algorithm is able to identify on average 96.2% of the noise transients while misclassifying on average only 3.6% of GW signals. Even for the worst runs, the number of lost signals remains well below 1% with a glitch removal efficiency above 92%.
The performance of the runs as a function of the specificity and FNR metrics is shown in Fig. 3. The top (bottom) panel shows the percentage of glitches (signals) correctly (incorrectly) identified by a given percentage of runs (in bins of 5%). The majority of runs correctly identifies the noise transients while misidentifying only a small percentage of signals: About 90% of the glitches are correctly identified by 95% or more of the runs while less than 1% of the signals are misidentified as noise by 95% of more of the runs. Even if the threshold on the number of runs is reduced to 60%, we can still correctly identify over 95% of the noise transients while losing only about 3% of the signals (see Fig. 4).
The performance of the GP classification varies across datasets as the distance of the injected waveforms varies. The farther the distance of the simulated GW, the smaller is its SNR. Thus it is more difficult to distinguish injections from noise triggers. Figure 5 shows how the specificity and FNR vary as a function of injection distance for the Dimmelmeier waveforms. While the performance of the classification diminishes as the injected distance increases, even for the largest injection distance tested, 5.62 kpc, the average specificity across the run remains above 92% with FNR below 4%.
We repeat the analysis above for all other datasets in  to about 97% (3%) at 7.5 kpc. Ott and Yakunin models typically do worse than Dimmelmeier models with average specificity (FNR) ranging from about 97% (5%) and 98% (5%) at 1.0 kpc to about 86% (12%) and 93% (9%) at 3.16 kpc for Ott and Yakunin waveforms, respectively. This may be due to the Dimmelmeier waveforms being more energetic and compact in the time-frequency space than the Yakunin waveforms models. As the physics of CCSN is uncertain, we also trained the GP algorithm on "agnostic" datasets by combining waveforms from all different models. The classification performance of the GP algorithm remains comparable to the performance of the singlemodel training. Figure 6 shows specificity vs. FNR for the combined datasets with injected waveforms at distances from 1.0 to 4.22 kpc. Even for the largest distances, the average specificity remains above about 88% with FNR less than about 8%. We conclude that the procedure is robust against the different CCSN physical models.
Once the GP algorithm has been trained, it can be used to reduce the cWB search background. We first classify the cWB triggers and then remove the triggers that are identified as noise transients by 90% of the GP runs. Figure 7 shows the cWB background for the two-day period before and after the cleaning procedure, where for the latter we have used the training obtained with all waveform models at 3.16kpc. (Other training sets give comparable results.) The number of triggers in the background is significantly removed, specifically at low SNR where the number of triggers is decreased by a factor ∼ 10, as expected. As we are mostly interested in removing loud triggers to increase the detection confidence level, we can bias the GP algorithm to remove high-SNR triggers by training it on a dataset that includes only a subset of the loudest triggers. The results for the background on a (different) O1 period are shown in Fig. 8. By biasing the training dataset towards high-SNR triggers (blue and green curves), the search background is not cleaned as efficiently at low SNR as the background cleaned by training on a dataset with a random selection of background triggers. However, the high-SNR tail of the background shows a reduction by more than one order of magnitude with significant gains down to about SNR∼ 10. The cleaning procedure lowers the SNR required for 3σ c.l. detections by a factor up to ∼ 3 and ∼ 2, respectively.  Table II presents the impact of ML noise removal on the detection efficiency. We show how the detection efficiency before and after changes after ML is applied together with a percent error decrease. On average, the decrease in the detection efficiency for 2σ c.l. is about 10 %, while for 3σ c.l. it is about 30 %.
As a bonus to background reduction, the GP training can also be used to assign a probability to a search trigger being a signal or background noise. According to Bayes theorem, the likelihood that a trigger is a signal (s) if it is classified as such by n + trained multivariate expressions is P (s|n + ) = P (n + |s)P (s) where P (n + |s) is the likelihood of observing n + runs given a signal, P (s) is the probability of observing a signal, and P (n + ) is the probability of observing n + GP runs. Using the testing dataset, we estimate these quantities as P (n + |s) = n T P (n + )/n s , P (s) = n s /n T and P (n + ) = [n T P (n + ) + n F P (n + )]/n T , where n s is the number of signals in the testing dataset containing n T total triggers (n s signals + n b background), and n T P (n + ) [n F P (n + )] is the number of triggers in the dataset positively [mistakenly] identified n + times, respectively. The likelihood that a trigger is a FIG. 7. cWB background on a two-day period before (left) and after (right) the cleaning procedure is applied. The GP algorithm to produce the clean background has been trained with all waveform models injected at a distance of 3.16 kpc. signal given n + positive identifications is then P (s|n + ) = n T P (n + ) n T P (n + ) + n F P (n + ) .
The likelihood of a trigger being a signal is shown in Fig. 9 for the training on all combined CCSN waveforms injected at 3.16 kpc. The plot shows the likelihood values (blue markers) with a polynomial best fit (red curve). For this particular training, the probability of a trigger to be a signal when it is identified by 200 (190) runs is roughly 96% (68%).
As the detection of CCSN signals is very complex and only a limited number of simulated waveforms are available, in order to evaluate possible bias factors in the procedure, we tested the algorithm by classifying a set of waveforms with multivariate expressions obtained by training on a different set of waveforms on a different epoch. The different characteristics of the noise and waveforms in the training and testing sets can be used as a proxy for the stochastic nature and the unknown physical features of the waveforms in a real case. We applied the method to a set of 18 triggers (including simulated signals and background) in a blind analysis.  or signal may be considered inconclusive.

VI. CONCLUSIONS
We presented a new method to reduce the background of LIGO-Virgo searches for GW signals from Galactic CCSN when the detector network is in single interferometer observing mode. This method consists in applying a supervised GP ML algorithm to the output of the cWB pipeline. The ML algorithm is trained on datasets of CCSN waveform simulations and noise background events to classify cWB triggers and remove events based on their probability of being non-astrophysical. The outcome of the procedure is an increased statistical significance of GW candidate triggers and a higher detection confidence.
We tested the method on a variety of datasets with different CCSN waveform models injected at fixed Galactic-scale distances. Roughly 90% or more of non-astrophysical triggers can be removed from the search background with a false negative rate of just a few percent, irrespective of the waveform model, even when the algorithm is trained on datasets with mixed waveforms. To confirm these results we applied the method on a blind set of triggers and showed that the algorithm can successfully discriminate noise from simulated GW signals without any prior knowledge of the signal waveform model or injection distance. The algorithm can be tuned to enhance specific aspects of the search by introducing a bias during the training process. We illustrated this process by overpopulating the training set with high-SNR triggers. This biased dataset allows us to obtain a reduction of over one order of magnitude in the high-SNR tail of the background, which is the most relevant in case of a detection. The SNR required for a detection with a 3σ confidence level is lowered by a factor of ∼ 3. Moreover, we expect these results to improve as more, and more   accurate simulations become available in the literature and the algorithm may be trained on a larger pool of GW waveforms. Although in the case of a real detection of CCSN the GW signal is unlikely to match any of the existing simulations because of the stochastic nature of the process, the algorithm can be trained to recognize the common physical features of the explosion mechanism by injecting the waveforms on multiple realizations of the signal noise.
If applied to current LIGO-Virgo CCSN searches, our method could significantly improve the confidence level of a detection occurring at a time when only a single interferometer is in observing mode. Our ML algorithm integrates with the cWB pipeline and can be easily trained on any CCSN waveform model or interferometric data. It would also be straightforward to apply it to a multi-interferometer configuration by including in the input dataset the full output of a coherent cWB search, as well as extend it with the inclusion of environmental and instrumental auxiliary channel data from the myriad of interferometric sensors monitoring the status of the detectors.