Classification methods for noise transients in advanced gravitational-wave detectors

Noise of non-astrophysical origin will contaminate science data taken by the Advanced Laser Interferometer Gravitational-wave Observatory and Advanced Virgo gravitational-wave detectors. Prompt characterization of instrumental and environmental noise transients will be critical for improving the sensitivity of the advanced detectors in the upcoming science runs. During the science runs of the initial gravitational-wave detectors, noise transients were manually classified by visually examining the time–frequency scan of each event. Here, we present three new algorithms designed for the automatic classification of noise transients in advanced detectors. Two of these algorithms are based on principal component analysis: principal component analysis for transients and an adaptation of LALInference burst. The third algorithm combines an event generator, the wavelet detection filter, with machine learning techniques for classification. We test these algorithms on simulated data sets, and we show their ability to automatically classify transients by frequency, signal-to-noise ratio and waveform morphology.


Introduction
The sensitivity of advanced gravitational-wave detectors will be limited by multiple sources of noise from the hardware subsystems and the environment. The Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) detectors [1,2], which are expected to become fully operational in late summer 2015, and Advanced Virgo [3], which is expected to become fully operational in 2016, will include upgrades to all hardware subsystems including suspensions, lasers, seismic isolation and optics. The upgrades are designed to reduce noise sources and to significantly improve on the sensitivity of the initial LIGO and Virgo instruments.
The low frequency sensitivity of the detectors (≲10 Hz) will be limited by the effects of seismic noise. Thermal noise due to Brownian motion will be the dominant noise source in the most sensitive frequency range of the instruments. At frequencies higher than ∼150 Hz, shot noise, due to quantum uncertainties in the laser light, is expected to be the dominant noise source. Instrumental and environmental disturbances can also produce non-astrophysical triggers in science data, so-called 'glitches', which increase the false alarm rate of searches and reduce the detectors' duty cycles. Although much has been learned from the initial LIGO phase, the advanced detectors will be for the most part newly designed instruments, never assembled or tested before. Thus the success of the advanced detectors will require a major effort in commissioning and detector characterization [4,5].
Understanding detector noise, which may affect the discovery of gravitational waves, will be critical for increasing the chances of detecting an astrophysical gravitational-wave signal. aLIGO and Advanced Virgo are designed to perform searches for gravitational waves of various astrophysical origins [2]. Potential sources can be split roughly into two groups: (1) signals with a known gravitational waveform, such as compact binary coalescence (CBC) sources [4], and (2) unmodelled sources, where the astrophysical source and the gravitational waveform may be completely unknown [6].
Current astrophysical estimates of the rates of cosmic events detectable by advanced detectors [7] indicate that an advanced detector network will lead to multiple detections of gravitational-wave signals from CBC sources over the network operating time. An exciting new observational window on the Universe is within reach. The non-Gaussian and nonstationary nature of advanced detector noise may produce glitches, which could affect the sensitivity of searches and be mistaken for gravitational-wave detections, in particular for unmodelled sources. To prevent this, searches for unmodelled gravitational-wave signals, or 'bursts', currently combine the data from multiple detectors coherently to prevent a glitch being misinterpreted as an astrophysical signal [6].
However, a glitch occurring at the same time in multiple detectors could still lead to a false positive. If the origin of the noise cannot be identified and hardware improvements cannot be made to remove the glitch, data quality (DQ) flags are applied.
DQ flags and 'vetoes' can be used to remove detector data that contain a high number of glitches [8]. This method requires monitoring multiple auxiliary channels, which are not sensitive to gravitational waves, but provide important information about the environment and other degrees of freedom in the detector. Periods with a large number of glitches are flagged as likely to have an adverse effect on the gravitational-wave data channel. DQ flags were used in the initial detector science runs and were highly effective in increasing the sensitivity of searches [8]. The use of DQ flags in Virgo's second science run gave an ∼30% increase in the volume to which Virgo is sensitive for CBC sources [5], and an ∼5 Mpc increase in the detection range of a binary neutron star system, with signal to noise ratio (SNR) equal to 8, in the initial LIGO detectors [8]. DQ flags can lead to a lower false alarm rate; however, overzealous use of DQ flags can reduce the duty cycle of the instruments.
Glitch classification and categorization may provide valuable clues for identifying the source of noise transients, and possibly lead to their elimination. In previous LIGO and Virgo science runs, this classification was performed by visual inspection of the glitches' time series and/or spectrograms. Visual inspection of individual glitches proved to be a slow and inefficient method in attempts to categorize noise transients during the S6 LIGO science run. With even more data expected in the advanced detector observing runs, faster and more reliable techniques for the classification of noise transients are needed. In order to increase detector sensitivity and duty cycle, it will be essential to provide DQ information in real time during observing runs. This can only be achieved with automatic glitch classification algorithms running in low latency as the data are collected [9].
We have developed three methods that can be used for the fast classification of advanced detector noise transients. They are principal component analysis for transients (PCAT), an adaptation of LALInference burst (LIB) [10], and a combination of a trigger generator called wavelet detection filter and machine learning techniques (WDF-ML) [11,12]. The three different methods are described in section 2. To test the performance of the classifying algorithms, we created three different simulated data sets, described in section 3. These data sets are specifically designed to test the efficiency of the algorithms in classifying noise transients with different waveform morphology or frequency content. In section 4 we present the results for the simulated data sets. This is followed by a discussion in section 5 of plans for future improvements, and a discussion of how our results may affect classification of real noise transients during the advanced detector science runs.

Principal component analysis for transients
PCAT is a python-based algorithm that uses principal component analysis (PCA) [14] to identify and classify noise transients in LIGO channels. PCA consists of a linear orthogonal transformation of a set of (possibly correlated) variables into a set of linearly uncorrelated variables, called principal components (PCs). The PCs define the directions of greatest variation in the data. PCA allows a quick characterization of the intrinsic properties of a data sample. It is a versatile and powerful method with a long history of applications in many different fields [14][15][16].
A summary of the PCAT algorithm is given in table 1. In this investigation, we use time-sampled values of LIGO's simulated h(t) strain as PCA input variables. The PCs are used to analyze the time variability of the channel and reconstruct the properties of the transients. While this article concentrates on noise transients in the time domain, PCAT also implements a frequency-domain analysis, where the input variables are power spectral densities (PSDs). In general, PCA can be applied to any set of observations with a number of variables.
2.1.1. Pre-processing. The raw time series (sampled at 16 384 Hz) is first split into 32 s-long segments with a 50% overlap, then downsampled to 8192 Hz, and high-pass filtered with a 4th-order Butterworth filter with a 40 Hz cutoff frequency. The data are then whitened by multiplying the fast Fourier transform (FFT) of the time series by the inverse of the square root of the detector's noise PSD, which is computed using the median-mean average algorithm, as described in [17]. The whitened FFT is inverted, yielding the whitened time series, of which the first and last 8 s are discarded to avoid FFT artifacts at the edges. Noise transients are identified when the channel amplitude exceeds a chosen threshold in units of the standard deviation of the analyzed 16 s segment. A value between 4.5 and 5 has been shown to maximize the algorithm's efficiency in identifying transients while minimizing false positives. For each set of points above the threshold (triggers), the time series is sampled with a fixed-width interval around the trigger's maximum amplitude (typically corresponding to around 125 ms), and then rescaled to a maximum (absolute) amplitude equal to one. This step is required to properly compare the time series and identify the main features of different transient families.
2.1.2. Principal component analysis. The conditioned transients are arranged in a matrix and decomposed into PCs, which are the eigenvectors of the covariance matrix S of the data. The explained variance of the first N PCs is
v = Σ_{i=1}^{N} λ_i / Σ_{i=1}^{m} λ_i,    0 ≤ v ≤ 1,
where λ_i are the eigenvalues of the matrix S and m is the total number of variables. The explained variance measures the variation (dispersion) of the data set as a function of its dimensionality. The number of PCs that are needed to describe the sample up to a given accuracy can be determined by setting a threshold in v. Thus PCA allows dimensional reduction of the data set. An example of a PCAT variance curve is given in figure 1.
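As an illustration, the following Python sketch reproduces the PCA step and the explained-variance curve described above for a matrix of conditioned transients (one fixed-width, rescaled time series per row). The variable names, the placeholder data and the 75% variance threshold are assumptions for illustration, not the actual PCAT implementation.

import numpy as np

# Placeholder: 200 conditioned triggers, 1024 samples each (illustrative only)
transients = np.random.randn(200, 1024)

centred = transients - transients.mean(axis=0)
# Eigen-decomposition of the covariance matrix S via an SVD of the centred data
_, singular_values, vh = np.linalg.svd(centred, full_matrices=False)
eigenvalues = singular_values ** 2        # eigenvalues of S (up to a constant factor)
pcs = vh                                  # principal components, one per row

# Explained variance v as a function of the number of retained PCs (0 <= v <= 1)
v = np.cumsum(eigenvalues) / np.sum(eigenvalues)
n_pcs = int(np.searchsorted(v, 0.75)) + 1   # keep enough PCs for, e.g., 75% of the variance

# PCA-reduced representation of the triggers used by the clustering stage
reduced_data = centred @ pcs[:n_pcs].T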

2.1.3. Classification.
PCAT uses the scikit-learn Gaussian mixture model (GMM) algorithm to cluster the PCA-reduced data [18]. The data are fit to a linear combination of multivariate Gaussian distributions. The number of these distributions (number of classes) is determined by minimizing the Bayesian information criterion (BIC) [19],
BIC = −2 ln L + k ln m,
where m is the number of observations, k is the number of free parameters to be estimated, and L is the maximized value of the likelihood function, which depends on the parameters of the model. An important feature of the BIC is the penalty it assigns to each free parameter of the model, which helps to avoid over-fitting. Accurate classification of noise transients requires a careful choice of the number of PCs. A low number of PCs typically results in insufficient information to characterize the data. A high number of PCs leads to the inclusion of Gaussian noise features in the reduced data set, which results in poor performance of the clustering algorithm. While no optimal method exists for choosing the ideal number of PCs, two of the most commonly used methods consist of setting a threshold on v, or looking for a slope change ('knee') in the explained variance curve [20].
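A minimal sketch of this clustering step, assuming a PCA-reduced trigger matrix, is shown below using scikit-learn's GaussianMixture class and its built-in BIC score; the range of candidate class numbers and the placeholder data are assumptions for illustration.

import numpy as np
from sklearn.mixture import GaussianMixture

reduced_data = np.random.randn(200, 5)    # placeholder: 200 triggers x 5 retained PCs

best_gmm, best_bic = None, np.inf
for n_classes in range(1, 11):            # scan candidate numbers of glitch classes
    gmm = GaussianMixture(n_components=n_classes, covariance_type="full",
                          random_state=0).fit(reduced_data)
    bic = gmm.bic(reduced_data)           # BIC = -2 ln L + k ln m
    if bic < best_bic:
        best_gmm, best_bic = gmm, bic

labels = best_gmm.predict(reduced_data)   # glitch type assigned to each trigger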

LALInference burst
LIB is a Bayesian algorithm for parameter estimation and model selection for gravitational-wave burst signals [10]. LIB is the burst adaptation of the LALInference library, which is designed for parameter estimation for CBC signals [21]. LIB uses nested sampling to calculate the Bayesian evidence with a sine Gaussian (SG) signal model [21,22]. It can produce posterior distributions for the parameters of the signal, such as the sky location [10].
To adapt LIB for the classification of glitches, we adopt the PCA approach taken by Logue et al [23,24] in their analysis of the explosion mechanism of core collapse supernovae. We take the time series of fifty glitches of a known type, sampled at 4096 Hz, and apply a second-order Butterworth high-pass filter at 40 Hz. We then FFT the waveforms, as LIB performs model selection in the frequency domain. PCA is then applied to the transient waveforms using the method described in section 2.1.2. A linear combination of the PCs, multiplied by the PC coefficients, is then used as the new signal model in LIB for each different population of noise transient. The different signal models for each glitch population can then be used for Bayesian model selection, which can determine the type of each new noise transient that is detected in the data.
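The following sketch illustrates, under assumed variable names and placeholder data, how such a PC-based signal model can be built: PCA is applied to the FFTs of a set of known glitches, and the model is a linear combination of the retained PCs weighted by the PC coefficients. This is an illustration of the idea, not the LIB code itself.

import numpy as np

glitches = np.random.randn(50, 4096)        # placeholder: 50 known glitches of one type
spectra = np.fft.rfft(glitches, axis=1)     # LIB performs model selection in the frequency domain

# PCA of the mean-subtracted spectra via an SVD
mean_spec = spectra.mean(axis=0)
_, _, vh = np.linalg.svd(spectra - mean_spec, full_matrices=False)
n_pcs = 7                                   # e.g. enough PCs for most of the variance
pcs = vh[:n_pcs]

def signal_model(coefficients):
    """Glitch model: linear combination of the PCs weighted by the PC coefficients."""
    return mean_spec + coefficients @ pcs

model_point = signal_model(np.ones(n_pcs))  # one point in the model space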
For two competing models M_i and M_j, the Bayes factor is given by the ratio of the evidences,
B_{i,j} = p(D|M_i) / p(D|M_j),
where p(D|M_i) is the evidence for model M_i given data D, and p(D|M_j) is the evidence for model M_j given data D [22]. The evidence for each model is calculated by integrating the product of the prior and the likelihood over the model parameters θ,
p(D|M) = ∫ p(θ|M) p(D|θ, M) dθ.
The prior represents what we already know before any analysis of the data. The likelihood includes new information from the data. First we calculate a signal versus noise Bayes factor B_{S,N}. Taking the log of the Bayes factor gives
log B_{S,N} = log p(D|M_S) − log p(D|M_N),
where M_S and M_N are our signal and noise models. To compare two different glitch models, M_{type1} and M_{type2}, we can then subtract our signal versus noise Bayes factors to obtain a new Bayes factor that determines the glitch type,
log B_{type1,type2} = log B_{S,N}^{type1} − log B_{S,N}^{type2}.
For a large number of model parameters the evidence integral becomes difficult to compute. This problem is solved using nested sampling, a description of which is given elsewhere [22,25]. The nested sampling algorithm produces posterior distributions for the values of the PC coefficients.
A model for a type of noise transient is considered to be correct if log B_{type1,type2} > 10. We choose 10 to be conservative, as a Bayes factor obtained by running on random noise can vary by around 5. A flat, uniform prior is used for the PC coefficients of each transient type.
To calculate the minimum and maximum values for the PC coefficient priors we use the method described by Logue et al [23] of projecting the transient waveforms onto the PCs. A Gaussian likelihood function is used, which is described in the LALInference paper by Veitch et al [21]. We choose the number of PCs that accounts for a large percentage (≥70%) of the variance of the data set.
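A simplified sketch of this prior construction is given below, shown in the time domain for brevity (LIB itself works with the frequency-domain PCs); the variable names and the number of retained PCs are illustrative assumptions.

import numpy as np

waveforms = np.random.randn(50, 4096)       # placeholder: training glitches of one type
mean_wf = waveforms.mean(axis=0)
_, _, vh = np.linalg.svd(waveforms - mean_wf, full_matrices=False)
pcs = vh[:7]                                # retained PCs for this glitch type

# Coefficients of each training waveform in the PC basis
projections = (waveforms - mean_wf) @ pcs.T

# Flat (uniform) priors on the PC coefficients span the observed range
prior_min = projections.min(axis=0)
prior_max = projections.max(axis=0)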
For glitches in real detector noise a trigger generator will be used before running LIB. For this study we take the GPS times for the events from the log file that is produced when simulating the noise transients.

WDF and ML
WDF-ML consists of an event detection algorithm, WDF, followed by an ML classification procedure. WDF is part of the noise analysis package, a C++ library embedded in python, developed by the Virgo Collaboration [26].
2.3.1. Wavelet detection filter. Wavelet-based algorithms are well suited to the identification of noise transients because they decompose the data into multiple time-frequency resolution maps. The efficiency in detecting transients is linked to the similarity between the analyzing wavelet and the waveforms of the transients. As different wavelet types could better match different waveform morphologies, WDF performs wavelet domain decomposition using different types of wavelet basis, including the Daubechies and Haar wavelets [27,28].
A wavelet transform is similar to a Fourier transform: the sinusoidal basis of the Fourier transform is replaced by an orthonormal basis generated by translations (shifting) and dilations (scaling) of a mother wavelet ψ,
ψ_{a,b}(t) = (1/√b) ψ((t − a)/b),
where b is the scale and a is the translation. The wavelet transform of a signal f(t) is defined as the projection of f on the wavelet basis,
W_f(a, b) = ∫ f(t) ψ*_{a,b}(t) dt,
where ψ* is the complex conjugate of the mother wavelet. The wavelet transform has a time-frequency resolution that depends on the scale b: the time spread is proportional to b, and the frequency spread is proportional to the inverse of b. The discrete wavelet transform uses a discrete set of wavelet scales and translations, and decomposes the signal into a mutually orthogonal set of wavelets.
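For illustration, a discrete wavelet decomposition of a data segment with the Daubechies and Haar bases can be obtained with the PyWavelets package, as in the sketch below; the decomposition level and the placeholder segment are assumptions, and this is not the WDF implementation itself.

import numpy as np
import pywt

segment = np.random.randn(1024)             # stand-in for one analysis window

for wavelet in ("db4", "haar"):             # Daubechies and Haar bases
    # Multi-level discrete wavelet transform: mutually orthogonal scales
    coeffs = pywt.wavedec(segment, wavelet, level=5)
    energies = [float(np.sum(c ** 2)) for c in coeffs]   # energy captured at each scale
    print(wavelet, energies)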
2.3.2. Data conditioning and trigger detection. In table 1 we outline the steps of the WDF-ML classification procedure. The first five minutes of data are used to estimate the parameters of a whitening filter that is then applied in the time domain. The whitening procedure is based on a linear predictor filter, whose parameters are estimated through a parametric auto regressive (AR) model fit to the noise PSD, as described in [29]. One of the AR parameters is the standard deviation σ of the background noise, which is used in the wavelet de-noising procedure.
A signal x_i that is corrupted by additive Gaussian random noise n_i ∼ N(0, σ²) can be written as x_i = h_i + n_i, where h_i is the underlying transient. The signal x_i is used to find an approximation ĥ_i to the original h_i which minimizes the mean squared error (1/N) Σ_i (ĥ_i − h_i)². For a given wavelet thresholding function T_t, the threshold-based de-noising can be written as ĥ = W⁻¹ T_t(W x), where W denotes the wavelet transform: the thresholding function is applied to the wavelet transform of the noisy signal, and the output is then inverse wavelet transformed. The effectiveness of the technique depends upon the choice of wavelet, the decomposition level, and the amplitude of the threshold value.
For a given threshold t and wavelet coefficient w, the wavelet coefficient is retained if |w| > t, or is set to zero if |w| ≤ t. This removes wavelet coefficients that are due to background noise and retains wavelet coefficients that are due to the transient waveforms. WDF uses the universal Donoho and Johnstone threshold [30], t = σ̂ √(2 ln N), where N is the number of data points and σ̂ is an estimate of the noise level σ, obtained during the AR parametric fit to the data.
The wavelet coefficients contain the energy of the transient at different scales. After the wavelet thresholding procedure is applied, only the highest coefficients of the wavelet transform remain; these are expected to contain only features of the transient waveforms. The energy of the transient is given by the sum of the squares of the coefficients above the threshold value, and the SNR is then given by the energy divided by σ̂. WDF outputs a list of triggers, which includes the maximum SNR and frequency, a GPS starting time for the transient, the transient duration, the name of the wavelet family which triggered the event, and the full list of the wavelet coefficients after the de-noising procedure. The peak frequency of the transient is estimated from the sampling frequency f_s, the length of the window used in the WDF process, and the scale b of the wavelet transform corresponding to the coefficient with the maximum value. The event duration is estimated after applying a clustering step for events that are closer than 0.01 s.
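The de-noising and SNR estimate described above can be sketched as follows, assuming a whitened analysis window and a noise-level estimate σ̂ from the AR fit; the wavelet choice, decomposition level and variable names are illustrative assumptions rather than the WDF code.

import numpy as np
import pywt

window = np.random.randn(1024)              # placeholder whitened analysis window
sigma_hat = 1.0                             # noise level estimate from the AR fit (assumed)

N = len(window)
t = sigma_hat * np.sqrt(2.0 * np.log(N))    # universal Donoho-Johnstone threshold

coeffs = pywt.wavedec(window, "db4", level=5)
kept = [np.where(np.abs(c) > t, c, 0.0) for c in coeffs]   # hard thresholding

denoised = pywt.waverec(kept, "db4")        # inverse transform of the retained coefficients
energy = sum(float(np.sum(c ** 2)) for c in kept)          # energy of the transient
snr = energy / sigma_hat                    # SNR estimate as described in the text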
For WDF-ML to correctly identify the glitch, the choice of window size and the overlap between two consecutive sliding windows are important. For this data set we use a window of 1024 data points. As there is no re-sampling filter in the data pre-processing, the data are sampled at 16 384 Hz; therefore, with 1024 points the time window is 0.0625 s. This ensures that waveforms of duration 2 ms lie inside the window. An overlap value of 0.05 s was used in order to avoid problems caused by a transient waveform spanning two consecutive windows.
2.3.3. ML classification. ML classification procedures can be supervised or unsupervised [31]. A supervised ML algorithm trains on a sample of correctly labelled data. An unsupervised classification procedure has no labelled training set of data. WDF-ML uses an unsupervised classification procedure, as we have no previously labelled data set on which to train the algorithm. A supervised ML procedure will be implemented in a future study using information from auxiliary monitoring channels [32,33]. The unsupervised procedure consists of a clustering algorithm that identifies classes of events in the parameter space created by the wavelet decomposition. WDF-ML applies the same ML classification algorithm, GMM, as described in section 2.1.3, but other clustering algorithms could be used, such as affinity propagation [34] or k-means [35]. Wavelet coefficients are computed from the triggers and stored in an n × m matrix, where n is the number of triggers and m is the length of the wavelet window. Most of the matrix elements are zeros, as they are the coefficients that do not pass the thresholding step. Dimensional reduction is required to retain the most important features of the matrix. This is achieved by first applying PCA, which reduces m to 10, and then projecting the reduced coefficients onto a two-dimensional space with spectral embedding [36,37]. Spectral embedding finds a low-dimensional representation of the data using a spectral decomposition of the graph Laplacian. The GMM ML algorithm is then applied to the reduced coefficients for classification.
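A compact sketch of this unsupervised pipeline (PCA to 10 dimensions, spectral embedding to 2 dimensions, GMM clustering), using scikit-learn with a placeholder coefficient matrix, is given below; the number of clusters is fixed here only for illustration.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding
from sklearn.mixture import GaussianMixture

coefficients = np.random.randn(300, 1024)   # placeholder: n triggers x m wavelet coefficients

reduced = PCA(n_components=10).fit_transform(coefficients)           # PCA: m -> 10
embedded = SpectralEmbedding(n_components=2).fit_transform(reduced)  # graph-Laplacian embedding to 2D
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(embedded)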

Data sets
For the sake of this investigation, we assume all advanced detectors to be affected by the same populations of glitches. Thus we use early aLIGO sensitivity curves for the Livingston detector to generate simulated Gaussian noise [38]. We do not use real data for this study because we need to know all of the properties of the transients in the data set in order to accurately test the different methods. We generate three different data sets containing different types of simulated noise transients, which are added to the Gaussian noise in 5 s intervals. The three data sets are designed to test if the different algorithms can classify transients by frequency, SNR and waveform morphology. We consider three different waveform morphologies: sine Gaussian (SG), Gaussian (G) and ring-down (RD).
For the SG waveforms, f_0 is the central frequency, Q is the quality factor, t_0 is the GPS time at the centre of the SG, and the amplitude h_0 is proportional to h_rss, the root-sum-squared amplitude of the transient. The τ parameter determines the width of the simulated waveform in the time domain.
The Gaussian waveforms are centred at zero frequency with the maximum frequency determined by the duration. The spike glitches that were observed in S6 were characterized by a Gaussian waveform morphology in the time domain [8]. Their characteristic time series contained a dip followed by an upwards spike that typically lasted for a few milliseconds.
The RD waveforms are similar to high SNR spike glitches, which were observed with time-domain waveforms that ring down after their initial spike [8].
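For reference, the sketch below generates the three waveform morphologies using standard textbook forms for sine-Gaussian, Gaussian and ring-down transients; the exact amplitude conventions of this study (for example the h_rss normalization) are not reproduced, so the expressions should be read as assumptions.

import numpy as np

fs = 16384
t = np.arange(-0.5, 0.5, 1.0 / fs)          # time relative to the transient centre t_0

def sine_gaussian(t, f0=400.0, q=9.0, h0=1.0):
    tau = q / (np.sqrt(2.0) * np.pi * f0)   # width set by the quality factor
    return h0 * np.sin(2 * np.pi * f0 * t) * np.exp(-(t / tau) ** 2)

def gaussian(t, tau=0.002, h0=1.0):
    return h0 * np.exp(-(t / tau) ** 2)

def ring_down(t, f0=400.0, tau=0.002, h0=1.0):
    # Non-zero only after the start time; decays exponentially after the initial spike
    return np.where(t >= 0, h0 * np.sin(2 * np.pi * f0 * t) * np.exp(-t / tau), 0.0)

sg, g, rd = sine_gaussian(t), gaussian(t), ring_down(t)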

Data set 1
The first data set contains 1000 simulated Gaussian transients and 1000 simulated SG transients of different duration, frequency and SNR. The transient waveforms were generated with Q, hrss, duration and frequency values distributed uniformly between the minimum and maximum values, shown in table 2.

Data set 2
Data set 2 includes 1000 simulated SG transients and 1000 RD transients with SNR uniformly distributed between 1 and 400. All transients were generated with identical frequency (400 Hz) and duration (2 ms). This data set is designed to test that the different algorithms can classify transients by waveform morphology only.

Data set 3
Data set 3 includes 1000 Gaussian, 1000 SG, and 1000 RD transients. The waveform parameters in this data set have a large range of values, which makes this data set more challenging to classify than the first two data sets. The parameters of the simulated noise transients in this data set allow us to test the limitations of the three different classifying methods. The parameters for the simulated waveforms are distributed uniformly between the minimum and maximum values in table 3.

Data set 1 results
In this subsection we show the results for classifying the two transient types in the first data set. Type 7 contains, on average, SG transients with SNRs larger by a factor of ∼5, and a standard deviation larger by a factor of ∼10, than the type 1 transients.
By forcing PCAT to cluster the data into a maximum of two types, 99% of SG and 100% of Gaussian glitches are classified as type 1 and type 2, respectively. The few misclassified glitches in this case correspond to transients whose identified GPS time is not correctly aligned with the peak of the transient. This issue can be resolved by further tuning of the PCAT trigger generator. Type 1 is the main type for the SG waveforms, and type 2 contains mainly Gaussian waveforms. The 5% of Gaussian waveforms that were in the incorrect class had low SNR values (<20).

For LIB, the Bayes factors that were obtained for all of the detected waveforms in this data set are shown in figure 4. If the type 1 waveforms have been correctly classified then the glitch type Bayes factor should be positive, and if the type 2 waveforms have been correctly classified then the glitch type Bayes factor should be negative. When using the correct transient waveform model, the increase in the log signal to noise Bayes factor is proportional to the square of the SNR. When using the incorrect transient waveform model, the log signal to noise Bayes factors remain low as the SNR values of the transients increase. There is a clear difference in the log Bayes factors between the two types once the SNR becomes larger than 20. For WDF-ML, figure 3 shows the ML results for the three types of transients found in the data. The wavelet coefficients for different types of transients are well separated in the parameter space. Type 0 contains the SG waveforms. The Gaussian waveforms have been split into two sub-types, 1 and 2, where type 2 contains more low-SNR (between 25 and 100) Gaussian waveforms than type 1.

Summary.
We have found that all three methods have a very high efficiency for the correct classification of different types of noise transients. LIB and PCAT require that the number of glitch types is specified in advance. If the number of glitch types requested by PCAT is higher than the actual number of glitch types in the data set, then the waveforms will be classified by waveform morphology first, and then split into further sub-types by frequency and SNR. WDF-ML has also shown that if it identifies more types than those present in the data, then the waveform morphologies will be split into further sub-types by SNR or frequency.
As LIB needs a set of PCs in advance to create a signal model, it is only possible for LIB to classify known types of transients in the data. A new set of PCs will need to be created any time that a new family of glitches appears in the data. As PCAT and WDF-ML do not need any information about a glitch type before they start the classification procedure, they can begin to classify new transient types as soon as they appear in the advanced detector data. As LIB runs on one second of data at a time, when analyzing real glitches there may be multiple glitches of different types inside that one second of data, which could affect the efficiency of the classification. Multiple glitches in a small segment of data could also create problems during the windowing stage of WDF.

Data set 2 results
This subsection describes the results for the second data set, which is designed to test if the classification methods can classify noise transients by waveform morphology only. The first PCAT classification of this data set (figure 5) gives a poorer efficiency: after the 10th PC, the PCs account only for noise, and including too much noise degrades the efficiency of the classification algorithm.
Changing the number of PCs to five, the location of the 'knee' of the variance curve (accounting for 51% of the variance), yields a better classification efficiency. Transients are first classified by waveform morphology and then broken down into subclasses with different SNRs. The SG waveforms are in types 2, 3 and 7. The RD waveforms are contained in types 1, 5, 6 and 8. Type 4 contains fewer than 30 waveforms that are a mixture of the two types. The results show that PCAT is able to classify transients, by waveform morphology alone, with a very high efficiency when noisy PCs are not included.
The results can be improved further by limiting the maximum number of clusters to two. Type 1 contains the RD waveforms, and type 2 contains the SG waveforms. In this case, the few misclassified transients either have low SNR (<10) or have waveforms with peaks that are not aligned with the GPS time of the transient. The LIB classification results are shown in table 5: 97.8% of the transients with a SG morphology were identified as type 1, and 95.2% of the transients with an RD morphology were classified as type 2 transients. LIB was clearly able to classify the transients by waveform morphology alone with a high efficiency. The simulated transients that were incorrectly classified by LIB had an SNR less than 20. The log Bayes factors for the two types of transients are shown in figure 4. The similar size and shape of the distributions of Bayes factors shows that both of the transient types have the same distribution of SNR values.
Seven PCs were used for each transient type, as seven PCs represented 90% of the variance of the type 1 transients and 80% of the variance of the type 2 transients. As the waveforms within each type are identical in this data set, only one PC should be necessary to represent all of the variance of the waveforms. Plotting the variance curve, however, showed that a larger number of PCs was needed to accurately represent the data set. This is because the variance curve is affected by the noise included in the waveforms used to make the curve. The PCs may give a better representation of the features of the transients if only high SNR transients were used to construct them. The WDF-ML classification results are shown in table 5. WDF-ML was able to classify different noise transients by waveform morphology alone, with a high efficiency. The results of the ML procedure applied to the reduced coefficients are shown in figure 3. There is a clear separation in the parameter space for the three different types. All of the detected RD transients are in type 0. The SG transients have been split into two classes, types 1 and 2. The two types of SG waveforms were not split by frequency or SNR in this case. The SG and RD waveforms can easily be incorrectly classified with a wrong choice of overlap value and window size: if the waveform is split over two consecutive analyzing windows, an SG would be cut off in the middle of the waveform, which would make it appear to be an RD waveform. In real data most glitches have a duration of a few milliseconds, therefore a window of a few hundred milliseconds will be used.

Summary.
We have shown that all three classification methods are able to classify transient waveforms by morphology alone, with a very high efficiency. Depending on the maximum number of allowed classes, PCAT may classify transients not only by morphology, but also by SNR, assigning SNR sub-classes to each transient morphology. Including too many PCs degrades the classification efficiency because too much noise is included. For PCAT, the most effective method for selecting the number of PCs was found to be the position of the knee of the variance curve. Choosing a high percentage of the variance may not be ideal in the case of glitches because it is not possible to eliminate the background noise from the glitch waveforms.

Data set 3 results
This subsection shows the results for the third data set, which contains transient waveforms that have a wide range of parameters.
4.3.1. PCAT. PCAT identifies 1480/3000 of the noise transients. They are classified into seven different types, as shown in table 6. Thirty-three PCs represented 75% of the variance of the data set. The classification results are mixed, with type 2 being the exception, containing 100% of the simulated transients belonging to the Gaussian morphology. From the distribution of transient peak frequencies for each PCAT type, shown in figure 6(a), the mixed classification can be understood as a frequency-based classification. Type 3 contains the highest frequency transients. Types 7 and 5 contain the lower frequency transients. There are a few RD and SG transients that are classified as type 2 (G); these have frequency distributions similar to the Gaussian waveforms (70-150 Hz). The wide range of parameters of the simulated waveforms makes it hard to capture the full range of the waveform parameters in the first few PCs; therefore, the main parameter captured by the PCs is frequency, on which the classification is then based. Table 6 also shows the results using 20 PCs, which corresponds to the approximate location of the knee of the variance curve. Changing the method used to select the number of PCs that represent this data set did not lead to an improvement in the result.
4.3.2. LIB. For this data set, five PCs were used for each transient type; they represent 67% of the variance of the SG waveforms, 93% of the variance of the Gaussian waveforms and 80% of the variance of the RD waveforms. The results are shown in table 6. The table shows that type 2 contains the majority of the Gaussian waveforms, and the other two waveform morphologies are mixed in types 1 and 3. Figure 6(b) shows the frequency distribution of the three different types of transients. Type 1 contains the mid frequency range (300-700 Hz) waveforms and type 3 contains higher frequency waveforms (700-1500 Hz). A small number of low frequency SG and RD morphologies (∼20%) were in the type 2 class with the Gaussian transients. The 12% of Gaussian morphologies that were incorrectly classified had low SNR values (<20).
The mixing of the SG and RD morphologies occurs because the frequency distributions of the transients used to make the PCs were not uniform. The type 1 (SG) transients used to make the PCs contained more of the mid frequency range waveforms. The type 3 (RD) transients used to make the PCs contained more of the higher frequency waveforms. This shows that for real glitch types with a wider range of parameters, we need to be careful in the selection of waveforms that are used to make the PCs for the signal model, so that we do not introduce a bias in the results in certain areas of the parameter space.
4.3.3. WDF-ML. WDF detected 2547/3000 of the noise transients in data set 3, using a threshold SNR value of 15. The SG and RD waveform morphologies are mixed together in type 0. The Gaussian waveforms have been split between types 1 and 2. The frequency for each WDF type in the data set is shown in figure 6(c). Type 2 contains the lower frequency Gaussian waveforms (up to ∼250 Hz), and the type 1 Gaussian waveforms have frequencies as high as ∼1500 Hz. The Gaussian waveforms that were incorrectly classified into type 0 had a low SNR (<20).
Choosing more components for the spectral embedding stage will result in more sub-types for the SG and RD waveforms, but no clear distinction between the two types. In this data set the noise transients are spread in frequency and duration; therefore, results could be improved by using a multi-window analysis. This is a feature that will be added to a future version of the WDF-ML algorithm.
4.3.4. Summary. All three methods were able to correctly classify the Gaussian noise transients included in this data set, but were unable to distinguish between the SG and RD waveform morphologies when the range of parameters for the waveforms was very large. This is because a low frequency SG waveform has a shape closer to that of a low frequency RD waveform than to a high frequency SG waveform. Real glitches with characteristic waveforms usually have narrow frequency or duration distributions, but this data set allows us to test the limitations of the different transient classifying algorithms.
The wide range of parameters of the simulated waveforms, especially duration, makes it difficult to capture the variability of the waveforms in the first few PCs. The PCAT and LIB classification could be improved by being more selective about which waveforms are included in the construction of the PCs. WDF-ML may see similar improvements by altering the windowing parameter used in the analysis.

Discussion
This paper introduces three new methods for the fast classification of noise transients in advanced gravitational-wave detectors and shows the results of testing and comparing these methods on data sets containing simulated noise transients. The purpose of this is to provide information that can lead to an improvement in DQ during a science run.
We show that all three methods can classify transient waveforms in gravitational-wave detectors with a high level of efficiency. In our first data set, which has transients well separated in frequency and SNR, over 97% efficiency is obtained by all three methods. Reducing the threshold of the trigger generators, and therefore including transients with an SNR less than 20, can reduce the classification efficiency. In the second data set we show that all three methods can classify noise transients by waveform morphology alone. These morphology classes can be split into further types by frequency and SNR if the number of types requested is larger than the number of morphology types in the detector data. The third data set was more challenging to classify due to the large range of parameters of the simulated transients.
The different algorithms identified different numbers of signals in the data. To identify transients, PCAT's trigger generator measures the excess power in the time series of a given channel. More sophisticated methods for transient identification have been devised, and they are in use in the LIGO and Virgo data analysis and detector characterization groups. However, the main goal of PCAT is to provide a proof of concept for the classification of transients rather than a trigger generator for detector characterization analysis; a simple identification method based on excess power in time bins is therefore sufficient for our purposes. Future plans for the PCA technique include improving the trigger generator or interfacing the PCAT code with an existing trigger generator already in use by the LIGO and Virgo Collaborations.
For PCAT and LIB the number of PCs that are used can have a large effect on the results of the classification. If too many PCs are used then an incorrect classification is given, because some of the PCs consist only of noise. As we cannot eliminate the background noise from the glitch waveforms that are used to make the PCs, we have found the best method of choosing the number of PCs to be the position of the 'knee' of the variance plots. For WDF-ML, the selection of the analyzing window size for the wavelet transform is fundamental for a correct classification. The window must be larger than the length of the transients in the data, and, to avoid a false classification of a noise transient, the waveform must not be split across two windows.
In this study we only use the gravitational-wave channel of the detector. As all signals in the data will be classified into glitch types it is possible that a real gravitational-wave signal could be included in our glitch classification results. This could be avoided by removing signals that are coincident between two detectors before applying the classification methods. In future work we plan to include multiple auxiliary channels in the classification procedure. If a noise transient occurs in the gravitational-wave channel in time coincidence with an auxiliary channel, it can help us to identify the cause of the transient type [32,33]. The number of possible auxiliary channels may be very large, which makes ML an ideal tool for this type of classification due to the speed at which ML methods can process a large volume of detector data.
PCAT runs daily on data from the aLIGO detectors, providing a powerful diagnostic tool to the detector characterization team in preparation for the first aLIGO observing run (O1). LIB plans to start running daily on aLIGO data before the start of O1, and to provide information back to the detector characterization teams to be used during DQ shifts. WDF has been used as a noise transient event trigger generator, and monitoring tool, during past Virgo science runs. The ML classification procedure of WDF-ML is an innovative addition to this algorithm that will be used to classify transients during the advanced detector science runs. The algorithms can be run on parallel computing clusters, and the code can be optimised, to allow the algorithms to run efficiently in real time.