Classification methods for noise transients in advanced gravitational-wave detectors

Noise of non-astrophysical origin will contaminate science data taken by the Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) and Advanced Virgo gravitational-wave detectors. Prompt characterization of instrumental and environmental noise transients will be critical for improving the sensitivity of the advanced detectors in the upcoming science runs. During the science runs of the initial gravitational-wave detectors, noise transients were manually classified by visually examining the time-frequency scan of each event. Here, we present three new algorithms designed for the automatic classification of noise transients in advanced detectors. Two of these algorithms are based on Principal Component Analysis. They are Principal Component Analysis for Transients (PCAT), and an adaptation of LALInference Burst (LIB). The third algorithm is a combination of an event generator called Wavelet Detection Filter (WDF) and machine learning techniques for classification. We test these algorithms on simulated data sets, and we show their ability to automatically classify transients by frequency, SNR and waveform morphology.


Introduction
The Advanced Laser Interferometer Gravitational-Wave Observatory (aLIGO) detectors are two 4 km interferometers at Hanford, Washington (H1) and Livingston, Louisiana (L1) [1,2]. The Italian 3 km interferometer Virgo is expected to join the advanced detector network early next year [3]. The detector duty cycle and sensitivity to astrophysical signals will be determined by noise sources created by the instruments and the environment. In particular, as the detector noise is non-Gaussian, short-duration transients will limit the sensitivity of searches for transient astrophysical sources such as compact binary coalescences [4].
The first aLIGO observation run (O1) began in autumn 2015. On the 14th of September 2015 the aLIGO and Virgo teams detected gravitational waves from the binary black hole system GW150914 [5]. A second binary black hole detection was made on Boxing Day [6]. An extensive study of the noise transients that occurred in the data containing the detections was carried out for the validation of the signals [7]. As the advanced detector network approaches its design sensitivity, the number of detections is expected to increase. Adding more detectors to the network increases the number of possible noise sources and the time it will take to identify their origin. Transients which occur in any one detector will limit the joint analysis time for the network. Understanding the sources of noise transients in the detectors with a latency of a few hours will become increasingly important.
The detectors contain many environmental and instrumental sensors, which produce auxiliary channels of data that can be used to monitor the detector behaviour and track the causes of short-duration noise artifacts. Auxiliary channels that are not sensitive to gravitational waves can be used to identify noise transients, also known as "glitches", in the detector output and veto those events [8,9,10]. Classification and categorization of transients using individual channels of data may provide valuable clues for the identification of their sources, which can aid in efforts to eliminate them [11,12]. So far classification has mainly been achieved by visual inspection of spectrograms of the transients, but automatic classification is essential for future detections of astrophysical gravitational-wave signals.
Three methods for fast classification of transients have been developed for the analysis of aLIGO and Virgo data. They are Principal Component Analysis for Transients (PCAT), Principal Component LALInference Burst (PC-LIB) and Wavelet Detection Filter with Machine Learning (WDF-ML). Previous work has shown that these methods can classify artificial data sets with an efficiency up to 95% [11]. In this paper we evaluate the performance of these algorithms using glitches in real data from aLIGO. In Section 2 we provide details of the detector data. In Section 3 we give a brief overview of the three different algorithms and details of any improvement they underwent since the previous study. In Section 4 we present the results for the three algorithms on glitches from aLIGO L1 and H1 detector data. This is followed by a discussion in Section 5 of our plans for future improvements and classification during the second aLIGO run (O2) and first Virgo observation run.

The Data
In this study we use data from the 7th aLIGO engineering run (ER7), which began on the 3rd of June 2015 and finished on the 14th of June 2015. The average binary neutron star inspiral range for both H1 and L1 detectors in data analysis mode during ER7 was 50–60 Mpc [13].
2.0.1. Livingston. In the period analyzed, data from L1 consists of 48 segments where the interferometer was locked and in data analysis ready mode. These data segments vary in length from 1 second to ∼ 7 hours. We discard any segments of data that are less than a minute in duration as a longer segment of data is required to measure the power spectral density (PSD). The total length of L1 data analysed is ∼ 87 hours. Glitches of different types are often recognised by their shape in a spectrogram such as those shown in Figure 1. A description of the most common glitch types that have occurred in aLIGO data is given in [15]. Figure 1(a) shows glitches characterized by a teardrop shape. Figure 1(b) shows longer duration transients known as "whistles", which are caused by radio frequency beats [15]. Only a small number of whistles (∼ 11) were found in the frequency and SNR range used in this study. Some other glitches in the data that are not shown in Figure 1 include those below 10 Hz and scattered light. Some transients may have occurred due to the increased microseism created by tropical storm "Bill" in the Gulf of Mexico [13].
A number of hardware injections were also made during ER7. An example is shown in Figure 1(c). Hardware injections are artificial signals simulated by inducing a motion of the optics that can be used to test which auxiliary channels are sensitive to gravitational waves [8,9].
2.0.2. Hanford. In the period analyzed, data from the H1 detector consists of 50 segments where the interferometer was locked and in data analysis ready mode. The data segments vary in length from 1 second to almost 14 hours. As with L1 we discard any segments of data that are less than a minute in duration. The total length of Hanford data analysed is ∼ 141 hours.
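The segment selection applied to both detectors can be illustrated with a minimal sketch. It assumes segments are stored as (start, end) pairs in GPS seconds; the function names and the toy segment list are hypothetical and are not taken from the analysis pipeline.

```python
MIN_DURATION_S = 60  # segments shorter than one minute are discarded

def select_segments(segments, min_duration=MIN_DURATION_S):
    """Keep (start, end) segments that last at least min_duration seconds."""
    return [(start, end) for start, end in segments
            if end - start >= min_duration]

def total_hours(segments):
    """Total duration of a list of (start, end) segments, in hours."""
    return sum(end - start for start, end in segments) / 3600.0

# Hypothetical toy segments in GPS seconds (not real ER7 segment times).
toy_segments = [
    (0, 1),                    # 1 s: discarded
    (100, 130),                # 30 s: discarded
    (1000, 1000 + 7 * 3600),   # 7 hours: kept
    (30000, 30000 + 1800),     # 30 minutes: kept
]
kept = select_segments(toy_segments)
hours = total_hours(kept)
```

Only segments long enough for a PSD estimate survive the cut, and the analysed time is the sum of their durations.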
The Hanford data is highly non-stationary and contains many more transients than the aLIGO L1 data. In particular, the H1 data contains many high SNR transients that caused a significant drop in the binary neutron star inspiral range. An example is shown in Figure 2(b). It was suspected that these large transients were caused by cleaning of the beam tube [13]. A few other examples of common transients found are shown in the other spectrograms displayed in Figure 2. As with the L1 data, H1 data also contains a number of hardware injections.

Transient classifying algorithms
Three different classifying algorithms were developed for the fast classification of noise transients in the detectors. Most of the technical details have been described in [11]. Here we give a brief outline of the three methods and describe any changes that have been made to improve their performance and latency. Figure 3 outlines the classification procedures for all three methods. More details are given in the following subsections.
To find transients in the data we use event trigger generators (ETGs). ETGs typically search for excess power in individual interferometers and output the time, SNR, frequency, duration and other parameters of transients found in the data. PC-LIB uses Omicron, the main ETG used by the LIGO Scientific Collaboration's (LSC) detector characterization group [16,17]. WDF-ML and PCAT have their own internal ETGs.

PCAT
PCAT uses a technique called Principal Component Analysis (PCA) that allows for dimensional reduction of large data sets [18,11]. In the first stage of the PCAT analysis, the data are downsampled to 8192 Hz, whitened and high-pass filtered at 10 Hz. Then PCA is applied to all of the noise transients found by the ETG in all the analyzed segments of data. PCAT uses a 0.125 s window around each GPS time as glitches are typically of ms duration. A projection of the original waveforms on to the Principal Components (PCs) allows for the calculation of scale factors for each PC called PC coefficients. Noise transients of different types are separated in the PC coefficient parameter space. This allows PCAT to classify the transients by applying a Gaussian Mixture Model (GMM) machine learning classifier to the PC coefficients [19].
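The projection step can be sketched on toy data. The example below is an illustration only and is not the PCAT code: it uses two hypothetical waveform families (a narrow spike and a sine-Gaussian), finds only the leading PC with power iteration, and computes one PC coefficient per waveform. The GMM stage is not reproduced here.

```python
import math
import random

N_SAMPLES = 64  # toy waveform length, not the PCAT analysis window

def spike(rng, noise=0.05):
    """Hypothetical short spike: a narrow Gaussian pulse plus noise."""
    return [math.exp(-0.5 * ((i - 32) / 2.0) ** 2) + rng.gauss(0.0, noise)
            for i in range(N_SAMPLES)]

def sine_gaussian(rng, noise=0.05):
    """Hypothetical oscillatory transient: a sine-Gaussian plus noise."""
    return [math.exp(-0.5 * ((i - 32) / 10.0) ** 2)
            * math.sin(2.0 * math.pi * 0.1 * (i - 32)) + rng.gauss(0.0, noise)
            for i in range(N_SAMPLES)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def leading_pc(waveforms, iterations=100):
    """Return (mean waveform, first PC) using power iteration."""
    n = len(waveforms)
    mean = [sum(w[i] for w in waveforms) / n for i in range(N_SAMPLES)]
    centred = [[x - m for x, m in zip(w, mean)] for w in waveforms]
    rng = random.Random(1)
    pc = [rng.uniform(-1.0, 1.0) for _ in range(N_SAMPLES)]
    for _ in range(iterations):
        # Multiply by X^T X without building the covariance matrix.
        scores = [dot(row, pc) for row in centred]
        pc = [sum(s * row[i] for s, row in zip(scores, centred))
              for i in range(N_SAMPLES)]
        norm = math.sqrt(dot(pc, pc))
        pc = [x / norm for x in pc]
    return mean, pc

def pc_coefficient(waveform, mean, pc):
    """Projection of a mean-subtracted waveform on to a PC."""
    return dot([x - m for x, m in zip(waveform, mean)], pc)

rng = random.Random(0)
spikes = [spike(rng) for _ in range(5)]
sine_gaussians = [sine_gaussian(rng) for _ in range(5)]
mean, pc = leading_pc(spikes + sine_gaussians)
spike_coeffs = [pc_coefficient(w, mean, pc) for w in spikes]
sg_coeffs = [pc_coefficient(w, mean, pc) for w in sine_gaussians]
```

In this toy case the two families take opposite signs in the first PC coefficient, which is the kind of separation in coefficient space that a clustering step can then exploit. The sign of a PC is arbitrary, so only the relative sign between families is meaningful.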

PC-LIB
LALInference Burst (LIB) is a Bayesian parameter estimation and model selection tool, which uses a sine-Gaussian signal model to estimate parameters of gravitational-wave bursts [20]. It can also be combined with Omicron to be run as a search [21]. PC-LIB adapts LIB for the classification of transients by replacing the LIB sine-Gaussian signal model with a new signal model created from a linear combination of PCs calculated from the waveforms of known transient types [22,23]. These known transients may have been previously classified by examining spectrograms of the transients or by one of the other methods. Thus PC-LIB can only classify transients that have occurred in the data many times before. When transients of a new type start to appear in the data new signal models must be created. In our previous study we created signal models using fifty transient waveforms. In this study we only use ten waveforms. This change will allow us to start classifying new transient types more quickly as they start to appear in O2 data. Bayesian model selection can then be used to determine what population of noise transient each new glitch belongs to [24,25,21]. First, one second of data around the trigger time is downsampled to 8192 Hz and a 10 Hz high pass filter is applied. Nested sampling is then used to calculate Bayes factors to determine the correct transient type [24].
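The model selection step can be sketched as a comparison of log Bayes factors. This is a minimal illustration that assumes the log evidence of each signal model and of a noise model has already been computed (for example by nested sampling). The function name, the threshold and the evidence values are hypothetical and are not part of LIB.

```python
def classify_by_bayes_factor(log_evidences, log_noise_evidence, threshold=0.0):
    """Pick the signal model with the largest log Bayes factor against noise.

    log_evidences maps a transient-type label to its log evidence.
    Returns (label, log_bayes_factors); the label is "noise" when no
    signal model exceeds the (assumed) threshold.
    """
    log_bfs = {label: log_z - log_noise_evidence
               for label, log_z in log_evidences.items()}
    best = max(log_bfs, key=log_bfs.get)
    if log_bfs[best] <= threshold:
        return "noise", log_bfs
    return best, log_bfs

# Hypothetical log evidences for one trigger (illustrative values only).
label, log_bfs = classify_by_bayes_factor(
    {"class_1": -100.0, "class_2": -90.0}, log_noise_evidence=-120.0)
```

Working with log evidences avoids numerical underflow, since the evidences themselves are typically extremely small numbers.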

WDF-ML
Wavelet detection filter (WDF) is an ETG that is part of the Noise Analysis Package (NAP), developed by the Virgo collaboration [26,27]. It is combined with a machine learning classifier for transient classification (WDF-ML).
In order to reduce the number of wavelet coefficients produced by WDF-ML, the data are downsampled before any data conditioning in the time domain to prevent border effects introduced by the Fast Fourier Transform (FFT). The downsampling is a new feature of WDF-ML that was not implemented in the version of the algorithm used in our previous study. The data are then whitened using parameters estimated at the beginning of each locked segment. After whitening, a wavelet-transform is applied, using a bank of wavelets, as described in [11]. We use a window of 2048 points, with an overlap of 1968 points, which corresponds to a duration of 0.25 seconds, as transients are typically of a short (ms) duration.
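The window arithmetic can be checked with a short sketch. It assumes the 8192 Hz sampling rate at which the WDF-ML ETG is run in this study; the helper function and variable names are hypothetical.

```python
SAMPLE_RATE_HZ = 8192   # assumed: rate at which the WDF-ML ETG is run
WINDOW_POINTS = 2048
OVERLAP_POINTS = 1968

window_duration_s = WINDOW_POINTS / SAMPLE_RATE_HZ   # 0.25 s
step_points = WINDOW_POINTS - OVERLAP_POINTS         # 80 samples
step_duration_s = step_points / SAMPLE_RATE_HZ       # about 9.8 ms

def window_starts(n_samples, window=WINDOW_POINTS, step=step_points):
    """Start indices of every full window that fits in n_samples."""
    return list(range(0, n_samples - window + 1, step))

# Number of overlapping windows covering one second of data.
starts_one_second = window_starts(SAMPLE_RATE_HZ)
```

With this overlap, consecutive windows are offset by 80 samples, so the window advances in steps of roughly 10 ms, comparable to the duration of the transients.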
The wavelet coefficients identified by the WDF-ML ETG are further cleaned using a wavelet de-noising procedure where only wavelet coefficients above the noise level are retained [11]. WDF-ML produces a list of wavelet coefficients, frequency, duration and SNR for each transient. The dimensions of the wavelet coefficients are then reduced by applying PCA and Spectral Embedding [28,29]. The transient classification is then performed by applying a machine learning GMM classifier to the reduced wavelet coefficients [19].

Classification
In the following sections we show the classification results obtained by PCAT, PC-LIB and WDF-ML on aLIGO H1 and L1 data. All algorithms are run with the same configurations that we expect to use during O2 to better understand our performance during the future observation runs.

Livingston
To find the transients in the L1 data we look for triggers that are coincident within half a second in the outputs of all ETGs. The WDF-ML ETG was run with an SNR threshold of 10 at a sampling rate of 8192 Hz. Omicron was run with a lower SNR threshold of 5. We then look for transients above SNR 20 that are coincident between WDF-ML and Omicron, and find a total of 426 coincident transients. As the PCAT ETG cannot find the lower frequency (below 10 Hz) triggers and some longer duration triggers, we still classify transients that are coincident between Omicron and WDF-ML but missed by PCAT, as those triggers would still be classified when running in low latency.

The number of PCs used by PCAT is estimated by finding the knee of the data set variance curve. This gives a total of 20 PCs. PCAT classifies all the transients into 10 different classes. 90 triggers that were coincident between the Omicron and WDF-ML ETGs were missed by the PCAT ETG. Included in these missed triggers are all of the whistles, as their duration is longer than the PCAT analysis window, and 17 transients that are not visible in a spectrogram. 20 of the lower SNR hardware injections are also missed.
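The trigger coincidence used for the L1 data can be sketched as follows. This is a minimal illustration, assuming Omicron triggers are (GPS time, SNR) pairs, that WDF-ML triggers are given as GPS times, and that the SNR cut is applied to the Omicron SNR. The function name and the toy trigger lists are hypothetical.

```python
from bisect import bisect_left

def coincident_triggers(omicron, wdf_times, window_s=0.5, min_snr=20.0):
    """Return Omicron (time, snr) triggers above min_snr that have a
    WDF-ML trigger within +/- window_s seconds."""
    wdf_sorted = sorted(wdf_times)
    matched = []
    for time, snr in omicron:
        if snr <= min_snr:
            continue
        # First WDF-ML trigger at or after the start of the window.
        i = bisect_left(wdf_sorted, time - window_s)
        if i < len(wdf_sorted) and wdf_sorted[i] <= time + window_s:
            matched.append((time, snr))
    return matched

# Hypothetical toy triggers (not real ER7 triggers).
omicron = [(10.0, 25.0), (20.0, 50.0), (30.0, 8.0), (40.0, 100.0)]
wdf_times = [10.3, 20.9, 30.1, 39.6]
matched = coincident_triggers(omicron, wdf_times)
```

Sorting one trigger list and using a binary search keeps the matching fast even when the ETGs produce many triggers.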
The data contains three main types of transients with examples shown in Figure 1(a), (b) and (c). As PCAT does not detect any of the whistles shown in Figure 1(b), the remaining glitches are classified into two main types, further split into sub-types.
PCAT classes 1, 4 and 10 contain the transients which appear as a spike in the time series, as shown in the top panel in Figure 4 and in the spectrogram in Figure 1 [...] transients for the other algorithms could be improved by using a longer time window. However, this could lead to multiple shorter duration transients occurring in the same time window. As PC-LIB looks for specific known transient types, it could be used to add labels to the classifications of the other methods, making it easier to find out which class corresponds to which transient type defined in [15] and which classes are new types that have not occurred previously. As WDF-ML and PCAT can classify new transient types as soon as they appear in the data, they can be used to provide waveforms for PC-LIB to use to create new signal models.

Hanford
As for the L1 data, transients coincident within 0.5 s between all ETGs are classified. A higher SNR threshold of 30 is used for H1 as the data contains many more transients than the L1 data and is more non-stationary. A total of 1865 coincident transients are classified in H1.

PCAT uses 20 PCs to classify the transients into 7 different classes. 120 of the transients coincident between the WDF-ML and Omicron ETGs are not detected by the PCAT ETG. They are transients below 10 Hz or triggers from the long duration lines, shown in Figure 2(d), which are not really glitches. The data contains two main types of transients. The first type is characterised by a typical spectrogram shown in Figure 2(a) and a time series waveform shown in the top panel of Figure 6. PCAT splits this type into 6 different sub-classes with 267, 603, 648, 44, 1 and 64 transients respectively. Class 1 has 9 mis-classified transients. Classes 2, 3 and 5 each have one mis-classified transient. Classes 2, 3 and 6 contain shorter duration waveforms (∼ 0.005 s) with different Q and frequency ranges, where Q is defined as Q = duration × 2π × frequency. Classes 1, 5 and 7 contain relatively longer duration waveforms (∼ 0.01 s), which also have different Q and frequency ranges.
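The definition of Q used above can be written as a one-line function. The sketch below evaluates it for the two typical durations quoted in the text; the 100 Hz frequency is a hypothetical value chosen only for illustration.

```python
import math

def quality_factor(duration_s, frequency_hz):
    """Q as defined in the text: Q = duration * 2 * pi * frequency."""
    return duration_s * 2.0 * math.pi * frequency_hz

EXAMPLE_FREQUENCY_HZ = 100.0  # hypothetical frequency, for illustration

q_short = quality_factor(0.005, EXAMPLE_FREQUENCY_HZ)  # shorter waveforms
q_long = quality_factor(0.01, EXAMPLE_FREQUENCY_HZ)    # longer waveforms
```

At a fixed frequency, doubling the duration doubles Q, so two sub-classes with similar durations can still differ in Q if their frequency ranges differ.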
The second type of transient has a typical time-frequency morphology shown in Figure 2(c) and time series waveforms shown in the middle panel of Figure 6. This type is found in PCAT class 4 that contains 117 transients that are all classified correctly. Overall PCAT classifies 99% of the detected H1 transients correctly.

PC-LIB
As with the L1 data we use 5 PCs to create signal models for the H1 transients. PC-LIB splits the transients into two different classes. A noise class contains the 6 transients shown in Figure 2(d) as they cannot be detected.
Class 1 contains 1651 transients that correspond to a spike in the time series as in PCAT sub-classes 1, 2, 3, 5, 6 and 7. This class also contains 13 hardware injections. 23 transients are mis-classified and should be in class 2.
Class 2 contains 207 transients, which have a typical spectrogram shown in Figure 2(c), and correspond to PCAT class 4. This class also includes 4 hardware injections that are more similar to a sine-Gaussian in shape than those classified into class 1. This class includes 61 transients that are mis-classified and should be in class 1. Overall PC-LIB classifies 95% of the detected H1 transients correctly.
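The overall PC-LIB figure can be reproduced from the per-class counts quoted above. The sketch assumes that the accuracy is the fraction of transients in classes 1 and 2 that are not mis-classified, with the noise class excluded; the function name is hypothetical.

```python
def fraction_correct(class_counts):
    """class_counts is a list of (n_in_class, n_misclassified) pairs."""
    total = sum(n for n, _ in class_counts)
    wrong = sum(m for _, m in class_counts)
    return (total - wrong) / total

# PC-LIB H1 counts from the text: class 1 and class 2.
pc_lib_h1 = [(1651, 23), (207, 61)]
pc_lib_accuracy = fraction_correct(pc_lib_h1)
```

Under this assumption 84 of 1858 transients are mis-classified, which gives a fraction correct of about 0.95.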

WDF-ML
WDF-ML splits the H1 data into three different classes. Class 1 contains 1358 transients, which appear as a spike in the time series, and correspond to PC-LIB class 1 and the 6 PCAT sub-classes. This class contains all hardware injections and all very low frequency transients that cannot be detected by PCAT and PC-LIB. 10 of the transients in this class are mis-classified. WDF-ML class 2 contains 145 transients that are characterized by spikes in the time series, but have longer durations and lower SNR values than the transients in WDF-ML class 1.
WDF-ML class 0 contains 326 transients corresponding to PCAT class 4 and PC-LIB class 2. This class also contains 122 mis-classified transients. As before, this is because all of the transients in the class have a duration (∼ 1 s) which is much longer than the time window used in the WDF-ML analysis. Overall WDF-ML classifies ∼ 92% of the H1 transients correctly.

Comparison
The results obtained by all three methods for the H1 data are compared in Figure 7. The Omicron SNR and frequency of the transients are shown in Figure 7(d). As WDF-ML uses a small time window of 0.25 s, the efficiency of the classification is reduced when the data are highly non-stationary and contain many long (∼ 1 s) duration transients. Even with 137 mis-classified transients the overall accuracy of the WDF-ML H1 results is ∼ 92%. WDF-ML estimates the PSD at the beginning of each locked segment. This may introduce errors towards the end of the segment if the data are highly non-stationary.

Discussion
Non-Gaussian noise in the aLIGO and Virgo detectors can potentially mimic a gravitational-wave signal, reduce the duty cycle of the instruments and decrease the sensitivity of the detectors. Classification of different noise transient signals may help identify their origins and lead to a reduction in their number. We have developed three methods for noise classification and have previously demonstrated their performance on simulated transients in simulated Gaussian aLIGO noise [11]. However, as real noise from the advanced detectors is non-stationary and non-Gaussian, a better understanding of how our methods will perform during the upcoming observation runs of the advanced detectors is required.
In the ER7 data from the L1 detector PCAT missed 90 transients and classified 95% of the remaining transients correctly. PC-LIB missed 33 transients and classified 98% of the remaining transients correctly. WDF-ML classified all transients and 95% of them were correct. In the H1 data PCAT missed 120 transients and classified 99% of the remaining transients correctly. PC-LIB missed 6 transients and classified 95% of the remaining transients correctly. WDF-ML classified all transients and 92% of them were correct. We conclude that our methods have a high efficiency in real non-stationary and non-Gaussian detector noise.
The efficiency of the WDF-ML classification is reduced when the duration of the transients becomes much larger than the analysis window. This could be prevented by applying a high duration cutoff to the transients found by the ETG before classification. Most high duration and SNR transients are removed by data quality vetoes. Conversely, short duration transients will be more important as they have a higher impact on the gravitational-wave search backgrounds. Since they are rarely removed by vetoes, their accurate classification is crucial for improving gravitational-wave searches, as it will allow us to search for couplings within the detector [30,7].
Because the methods have different strengths and weaknesses, having multiple classifiers is a winning strategy. WDF-ML can classify lower frequency transients than the other two methods. PC-LIB is better able to classify longer duration transients due to its longer analysis window. PCAT can classify new types of transients as soon as they appear in the data and thus provide transient waveforms for PC-LIB's signal models.
Further improvements could also be made by using a training set of pre-classified waveforms or exploring the use of dictionary learning algorithms for glitch classification [31]. The aLIGO Gravity Spy project aims to build these data sets through a citizen science program [32].