Optimal Classification and Outlier Detection for Stripped-Envelope Core-Collapse Supernovae

In the current era of time-domain astronomy, it is increasingly important to have rigorous, data-driven models for classifying transients, including supernovae. We present the first application of Principal Component Analysis to the spectra of stripped-envelope core-collapse supernovae. We use one of the largest compiled optical datasets of stripped-envelope supernovae, containing 160 SNe and 1551 spectra. We find that the first 5 principal components capture 79\% of the variance of our spectral sample, which contains the main families of stripped supernovae: Ib, IIb, Ic and broad-lined Ic. We develop a quantitative, data-driven classification method using a support vector machine, and explore stripped-envelope supernova classification as a function of phase relative to V-band maximum light. Our classification method naturally identifies "transition" supernovae and supernovae with contested labels, which we discuss in detail. We find that the stripped-envelope supernova types are most distinguishable in the later phase ranges of $10\pm5$ days and $15\pm5$ days relative to V-band maximum, and we discuss the implications of our findings for current and future surveys such as ZTF and LSST.


INTRODUCTION
Supernova classification is a longstanding challenge in the astronomical community. The first spectral classification of supernovae (SNe) was introduced by Minkowski (1941), who defined two classes: Type I (Hydrogen absent) and Type II (Hydrogen present). This broad criterion is still in use today, and multiple subclasses were added as the number of SN spectra increased and spectral differences were observed (for a comprehensive review of SN classification see Filippenko 1997; Gal-Yam 2017). In this work, we focus on stripped-envelope core-collapse supernovae (SESNe; Clocchiatti et al. 1997), which are the deaths of massive (> 8 $M_\odot$) stars that have lost part or all of their outer Hydrogen and Helium layers. The diversity in the amount of these elements remaining in the outer envelopes of the stellar progenitors at the time of explosion is the likely explanation for the classification into three major SN classes: Type Ib (spectra have conspicuous He features), Type IIb (spectra show strong H at early phases, He features at later phases), and Type Ic (no prominent H or He features in spectra). For a more detailed review of SESNe see Filippenko et al. (1993); Matheson et al. (2001); Woosley et al. (2002); Modjaz et al. (2014); Liu et al. (2016). Over the last 20 years, the class of broad-lined SNe Ic (Ic-bl) has emerged, with members showing spectra devoid of strong lines of H and He, but with broad lines that indicate expansion velocities between 15000 and 20000 km/s (Modjaz et al. 2016; Prentice & Mazzali 2017; Sahu et al. 2018). In addition, the Ic-bl type is the only SN type associated with long-duration gamma ray bursts (for reviews see Woosley & Bloom 2006; Modjaz 2011; Cano et al. 2017).
SESNe classes. However, template matching has some downsides. It is difficult to gain physical insight into stellar progenitors from a simple similarity measure. In addition, template matching classification methods do not directly yield a physical understanding of the differences between different classes. The second category of classification techniques focuses on characterizing specific spectral features (i.e. line depth or width and line intensity or velocity) at particular wavelengths (Sun & Gal-Yam 2017; Prentice & Mazzali 2017). These specific feature techniques allow for more physical interpretation than template matching, but they do not use all of the information available in a spectrum.
In this paper, we propose a new classification technique for SESNe using Principal Component Analysis (PCA) combined with Support Vector Machine (SVM).
PCA is a dimensionality reduction algorithm that linearly transforms data in order to capture as much information as possible in the smallest number of transformed features, called principal components (PCs). PCA has previously been applied to understand the diversity of SNe Ia subtypes (Cormier & Davis 2011; Sasdelli et al. 2014), but this is the first application of PCA to SESNe. After applying a PCA decomposition to our SESNe spectral dataset, we use a multi-class linear SVM, a supervised learning method, to classify our SNe. This work is the first application of such machine learning techniques to spectroscopically classifying SESNe.
Our PCA and SVM based algorithm allows continuous, quantitative classification that reflects the physical properties of SESNe stellar progenitors. Instead of the traditional SN classification with four discrete classes (IIb, Ib, Ic, Ic-bl), our classification method facilitates a better understanding of which SNe are representative of their class and which are "transition" objects. Comparison between the SESNe mean spectra and our constructed eigenspectra allows us to physically interpret our results. New and upcoming data releases will drastically increase the number of SESNe spectra. In this new data-rich context, a continuous, data-driven classifier will be crucial for addressing some of the most interesting outstanding questions pertaining to SESNe.

SESNE SPECTRAL DATASET
In this section, we describe the spectral dataset used in this work and the preprocessing applied to the data before our analysis is performed. We expand the SESNe spectral library produced and compiled in Modjaz et al. well-typed SNe with light curves for which a date of maximum can be extracted. The dataset contains 160 SNe and 1551 spectra. We exclude SNe Ib-n, SNe Ib-Ca, superluminous supernovae, and SNe that transition between normal and excluded types. We restrict the spectra in our sample to the optical wavelength range 4000Å to 7000Å since the vast majority of our SNe have observed fluxes in this wavelength range, and this range contains features of both H and He that drive the classification. For newly added SN spectra obtained from the literature or directly from authors, we follow the same preprocessing steps detailed in Liu et al. (2016) that were used in subsequent papers of our group (Liu et al. 2016; Modjaz et al. 2016). The preprocessing is briefly summarized as follows. When newly added spectra lack a date of V-band maximum (but do have a date of maximum in other bands), we convert their date of maximum to V-band using the process described by Bianco et al. (2014). Spectra are redshift corrected when necessary, and the continuum removal and normalization (spectra are scaled by their means to have relative fluxes) are performed with tools within the SNID framework (Blondin & Tonry 2007). In the few cases where telluric lines are present in the spectra, the tellurics are removed using linear interpolation, consistent with the procedure in Liu et al. (2016). Small gaps in the spectra are similarly interpolated before a Fourier-based smoothing is applied (Liu et al. 2016). The bandpass filter used by SNID for classification purposes is not applied. A summary of our dataset can be found in Table 1, and the SNID templates of the newly added SNe are released on our GitHub page.
Figure 1. Cumulative fraction of variance of the entire SESNe dataset captured by nPC eigenspectra. The first 5 eigenspectra capture 79% of the sample variance.
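A few of the preprocessing steps summarized in this section (restriction to 4000Å to 7000Å, linear interpolation over small gaps, and scaling by the mean) can be illustrated with a minimal NumPy sketch. It is not the SNID-based pipeline used in this work: continuum removal and Fourier smoothing are omitted, and the function name `preprocess_spectrum`, the linear wavelength grid, the bin count, and the synthetic spectrum are assumptions made purely for illustration.

```python
import numpy as np

def preprocess_spectrum(wave, flux, wave_min=4000.0, wave_max=7000.0, n_bins=512):
    """Resample onto a common optical grid, fill gaps linearly, scale by the mean.

    Hypothetical helper: the linear grid and n_bins are illustrative choices.
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    good = np.isfinite(flux)  # NaNs mark gaps or masked telluric regions
    grid = np.linspace(wave_min, wave_max, n_bins)
    # np.interp interpolates linearly between the surviving samples,
    # which bridges any small gap inside the wavelength window.
    resampled = np.interp(grid, wave[good], flux[good])
    return grid, resampled / resampled.mean()

# Synthetic spectrum with a small simulated gap near 5500 Å
wave = np.linspace(3500.0, 8000.0, 900)
flux = 1.0 + 0.3 * np.sin(wave / 200.0)
flux[400:410] = np.nan
grid, norm_flux = preprocess_spectrum(wave, flux)
```

After this step every spectrum shares the same wavelength grid, which is what allows the spectra to be stacked into a single data matrix for PCA.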

METHODS
In this section, we present a brief background on the two machine learning methods used in our analysis, PCA and SVM, as well as details on our specific application. For both methods, we use the scikit-learn implementation (Pedregosa et al. 2011). For a detailed review of PCA theory see Pearson (1901) and Jolliffe (2011), and for a detailed review of SVM theory see Vapnik (1998). Our research is reproducible: all code and raw data are accessible on GitHub.

PCA: Derivation of Eigenspectra
PCA is a dimensionality reduction technique based on singular value decomposition of a data matrix. The principal components are the eigenvectors of the covariance matrix of the data, and are therefore orthonormal. Each PC is a linear combination of the original data features (normalized fluxes) and therefore has the same wavelength range as our original data. We therefore use the term "eigenspectra" to describe the PCs and discuss their physical interpretation in Section 4. The eigenspectra are ordered according to how much variance from the mean of the dataset each component captures. Thus, the original spectra can be projected onto a subset of the eigenspectra while maximizing the amount of information retained. Figure 1 shows the cumulative amount of variance of the entire dataset captured as a function of the number of PCs. The first five eigenspectra contain 79% of the variance. Figure 2 shows an example supernova, SN2011ei (type IIb), reconstructed using increasingly larger numbers of eigenspectra. In the top panel, only the first five eigenspectra are used, and the large scale spectral features are almost entirely reconstructed. For the purpose of classification, we mostly care about the large scale features, so considering only the first 5 eigenspectra of our PCA decomposition is a good first step to reduce the complexity of the problem.
Figure 2. An increasing number of eigenspectra (nPC) is used to reconstruct the original spectrum from top to bottom. As nPC increases, more features are captured, but 5 eigenspectra already capture the H and He features (indicated by shaded regions).
Since SNe change over time, in this work we apply a PCA decomposition to four different phase ranges of spectra: 0 ± 5, 5 ± 5, 10 ± 5, and 15 ± 5 days relative to V-band maximum. We present and discuss in detail the eigenspectra for the phase range $t_{V\mathrm{max}} = 15 \pm 5$ days in Section 4.1. The time dependence of the eigenspectra as a function of phase is discussed in Section 4.2, but in general we find that there is very little change in the large scale features of a given eigenspectrum over time.
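The decomposition described in this subsection can be sketched with a plain singular value decomposition. The example below is illustrative only: it uses synthetic "spectra" built from three Gaussian troughs rather than the SESNe dataset, and the helper names `trough` and `reconstruct` are hypothetical. It shows how the eigenspectra, the cumulative variance fraction, and a reconstruction from the first nPC eigenspectra are obtained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spectra, n_wave = 200, 300
wave = np.linspace(4000.0, 7000.0, n_wave)

def trough(center, width):
    """Gaussian absorption trough (illustrative stand-in for a spectral line)."""
    return -np.exp(-0.5 * ((wave - center) / width) ** 2)

# Synthetic spectra: random mixtures of three troughs plus white noise
basis = np.vstack([trough(5876.0, 80.0), trough(6563.0, 90.0), trough(4861.0, 70.0)])
weights = rng.normal(size=(n_spectra, 3)) * np.array([3.0, 2.0, 1.0])
spectra = weights @ basis + 0.05 * rng.normal(size=(n_spectra, n_wave))

# PCA via SVD of the mean-subtracted data matrix
mean_spectrum = spectra.mean(axis=0)
centered = spectra - mean_spectrum
u, s, vt = np.linalg.svd(centered, full_matrices=False)
eigenspectra = vt  # rows are orthonormal, ordered by captured variance
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)  # analogue of the curve in Figure 1

def reconstruct(spectrum, n_pc):
    """Project a spectrum onto the first n_pc eigenspectra and map it back."""
    coeffs = (spectrum - mean_spectrum) @ eigenspectra[:n_pc].T
    return mean_spectrum + coeffs @ eigenspectra[:n_pc]

# Residual norm of one spectrum for nPC = 1, 3, 5 (analogue of Figure 2)
errors = [np.linalg.norm(spectra[0] - reconstruct(spectra[0], n)) for n in (1, 3, 5)]
```

Because the eigenspectra are nested orthonormal bases, the residual can only shrink as nPC grows, which is the behavior illustrated for SN2011ei in Figure 2.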

SVM: A New Approach to SESNe Classification
For each of the four phase ranges in this work, we train a multi-class linear SVM on the 2D projection of SESNe spectra onto each pair of the first five eigenspectra in order to understand which eigenspectra are most useful for classification. Specifically, we use the LinearSVC class from scikit-learn which implements SVM classification using LIBLINEAR (Fan et al. 2008) and employs the "one-vs-rest" approach to multi-class labeling: a binary linear SVM is trained to distinguish each class of SESNe from the rest of the population, and these binary classifiers are combined to make final decisions on predicting the labels of new data. Each binary SVM determines the optimal hyperplane that separates one class from the rest of the data. For each 2D projection, we randomly generate multiple train-test splits of the data (a random subset of 70% of the data is used to train the SVM, while the remaining 30% is used to test the ability of the SVM to accurately predict SNe classes). Using multiple train-test splits on each 2D projection allows us to report a mean test score for the SVM and to gain insight into the uncertainty of the SVM linear decision boundaries. The results of our SVM classification are discussed in Section 5.
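The train-test procedure described above can be sketched as follows. This is a minimal illustration on synthetic 2D "PC coefficients", not the code released with this work: the cluster centers, the scatter, the class sizes, and the use of 50 splits seeded by `random_state` are assumptions chosen so that the example runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
labels = np.repeat(["IIb", "Ib", "Ic", "Ic-bl"], 40)
# Hypothetical cluster centers in a 2D projection (not measured values)
centers = {"IIb": (2.0, 2.0), "Ib": (2.0, -2.0), "Ic": (-2.0, 2.0), "Ic-bl": (-2.0, -2.0)}
coeffs = np.array([centers[name] for name in labels])
coeffs = coeffs + 0.6 * rng.normal(size=coeffs.shape)

scores = []
for seed in range(50):
    # Random 70%/30% train-test split of the projected spectra
    x_train, x_test, y_train, y_test = train_test_split(
        coeffs, labels, test_size=0.3, random_state=seed
    )
    # LinearSVC trains one binary classifier per class (one-vs-rest)
    clf = LinearSVC(max_iter=10000).fit(x_train, y_train)
    scores.append(clf.score(x_test, y_test))  # fraction of correct test labels

mean_score, std_score = np.mean(scores), np.std(scores)
```

Repeating the split gives both a mean test score and a spread, and the fitted `clf.coef_` and `clf.intercept_` of each split define one set of linear decision boundaries.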

PHYSICAL INTERPRETATIONS OF EIGENSPECTRA
One of the major benefits of our PCA and SVM based classification method is that we can physically interpret the eigenspectra using mean spectra of each of the SESNe classes. This allows us to understand why the SVM identifies certain eigenspectra as better classifiers than others, and how this behavior changes as a function of phase.

Comparing Eigenspectra to SESNe Mean Spectra
The first few eigenspectra are the most important building blocks for reconstructing a spectrum from our dataset. Therefore, in order to understand any strong eigenspectra features, we compare the first five eigenspectra for the phase range $t_{V\mathrm{max}} = 15 \pm 5$ days to the mean spectra for each of the four SESNe types, presented in Liu et al. (2016) and Modjaz et al. (2016). The first 5 eigenspectra are plotted in Figure 3, along with the mean spectra for the four SESN subtypes. The principal components are naturally normalized, and we choose the sign of each component to properly represent the absorption features they capture. We highlight a few important features of each of the first 5 eigenspectra:
1. PC1 has a strong trough that lines up with the HeI5876 absorption feature present in both types IIb and Ib mean spectra, as well as the absorption feature in the Ic mean spectrum (the cause of which is debated; Dessart & Hillier 2010).
3. PC3 has small troughs in the Hα and Hβ regions in addition to a stronger trough in the HeI5876 region.
4. PC4 has strong troughs in the Hα and Hβ regions, but lacks a strong HeI5876 feature.
5. Due to the broadening of their features, SNe Ic-bl have nearly featureless spectra, which results in a much smoother average spectrum than any of the first 5 PCs.
These similarities between the eigenspectra and the SESNe mean spectra provide an excellent context to interpret the SVM classification. From Figure 3, we see that all of the SESNe types except Ic-bl have an absorption feature near λ ≈ 5876Å (although this feature is most likely not due to Helium for the Ic type). Moreover, as shown in Liu et al. (2016), this feature exists in the IIb, Ib, and Ic mean spectra even at early phases. Therefore we conclude that PCA generates eigenspectra that match previously identified important SESNe spectral features.
Figure 4. Change in eigenspectrum order between $t_{V\mathrm{max}} = 0 \pm 5$ days and later phase ranges. PC5 at early times is equivalent to PC3 at later times as they capture the same features: Hα, Hβ, and HeI5876, as discussed in Section 4.1. Similarly, PC3 at early times is equivalent to PC4 at later times because they both primarily capture Hα and Hβ absorption. Otherwise, the important large scale features do not change with time.

Time Evolution of Eigenspectra
In Section 4.1 we present the eigenspectra only for the phase range $t_{V\mathrm{max}} = 15 \pm 5$ days because we find the SESNe types to be maximally separated at this phase, as we show in Section 5. Here we discuss how the eigenspectra change as a function of time. We have calculated and compared the first five eigenspectra for each of the phase ranges 0 ± 5 days, 5 ± 5 days, 10 ± 5 days, and 15 ± 5 days, relative to the date of V-band maximum. We find that there is very little change for a given eigenspectrum across different phases. However, there is a slight change in the ordering of the first five eigenspectra between the later phase ranges and the $t_{V\mathrm{max}} = 0 \pm 5$ day phase range. Figure 4 shows that PC5 at phase $t_{V\mathrm{max}} = 0 \pm 5$ days corresponds (i.e. is most similar) to PC3 of the later phase ranges, and PC3 at phase $t_{V\mathrm{max}} = 0 \pm 5$ days corresponds to PC4 of the later phases. In the later phase ranges, PC3 is the eigenspectrum with weak troughs in the Hα and Hβ regions and a strong trough in the HeI5876 region. It is not surprising that this eigenspectrum captures less variance of the sample in the earliest phase range: Liu et al. (2016) showed that the pseudo-equivalent width (pEW) of HeI5876 in SNe types IIb and Ib is at its lowest value near V-band maximum and increases over time. PC4 in the later phases, which consists of two strong troughs at the Hα and Hβ wavelengths, is more highly ranked in the $t_{V\mathrm{max}} = 0 \pm 5$ day phase range because the Hα absorption feature is very strong in type IIb spectra even at early phases.
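One way to make "most similar" concrete when comparing eigenspectra across phase ranges is the absolute cosine similarity, since the sign of a principal component is arbitrary. The sketch below is an assumption about how such a matching could be done, not the procedure used for Figure 4; the function name `match_eigenspectra` and the synthetic orthonormal vectors are hypothetical, and the toy permutation additionally assumes that early PC4 maps to late PC5, which the text does not state.

```python
import numpy as np

def match_eigenspectra(early, late):
    """For each early eigenspectrum, find the index of the most similar late one."""
    early = early / np.linalg.norm(early, axis=1, keepdims=True)
    late = late / np.linalg.norm(late, axis=1, keepdims=True)
    similarity = np.abs(early @ late.T)  # absolute value: PC sign is arbitrary
    return similarity.argmax(axis=1), similarity

rng = np.random.default_rng(2)
# Five synthetic orthonormal "late-phase" eigenspectra of length 100
q, _ = np.linalg.qr(rng.normal(size=(100, 5)))
late = q.T

# Toy "early-phase" set: same shapes, reordered, with sign flips and small noise.
# Zero-based order: early PC3 -> late PC4, early PC4 -> late PC5, early PC5 -> late PC3.
order = [0, 1, 3, 4, 2]
signs = np.array([1.0, -1.0, 1.0, -1.0, 1.0])[:, None]
early = signs * late[order] + 0.01 * rng.normal(size=(5, 100))

best, similarity = match_eigenspectra(early, late)
```

Here `best` recovers the reordering despite the sign flips, mirroring the statement that PC5 at early times corresponds to PC3 at later times.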

SVM CLASSIFICATION RESULTS
We project each spectrum onto a 2D plane in PCA space (as done in Bianco et al. 2016) and classify the SNe using an SVM model. The main results of our work are shown in Figure 5. We find that we can recreate the SNID labels of our dataset. Furthermore, we find that the optimal phase ranges for classifying SESNe are $t_{V\mathrm{max}} = 10 \pm 5$ days and $t_{V\mathrm{max}} = 15 \pm 5$ days, as opposed to maximum light ($t_{V\mathrm{max}} = 0 \pm 5$ days). This is important because, with the advent of LSST, the number of SN discoveries will be overwhelming and will place heavy pressure on rapid spectroscopic follow-up for classification. Lowering the pressure on immediate follow-up for one type of transient (SESNe) alleviates pressure on the follow-up facilities altogether. Figure 5 shows the two-dimensional projection of our SESNe spectra onto the optimal eigenspectra pairs that maximally separate subclasses: PC1 vs PC3 for $t_{V\mathrm{max}} = 5 \pm 5$, $10 \pm 5$, and $15 \pm 5$ days, and PC1 vs PC5 for $t_{V\mathrm{max}} = 0 \pm 5$ days (as we described in Section 4.2, PC5 at $t_{V\mathrm{max}} = 0 \pm 5$ days corresponds to PC3 in the later phase ranges). The colored regions illustrate the linear SVM decision boundaries. Boundaries for 50 different 70%-30% train-test splits of the data are shown, thus assessing the statistical robustness of the decision boundaries. The SVM test-score, a measure of the accuracy of the classification, is indicated in each figure panel, including uncertainties generated from the 50 train-test splits. Colored ellipses in each panel represent the 1-standard-deviation (1-σ) contours of the PC coefficients for the different SESNe types. We have not included SNe Ib-pec (e.g. SN2007uy, SN2009er; Modjaz et al. 2014) or SNe Ic-pec (e.g. SN2005ek; Drout et al. 2013) in the calculation of the ellipses (but we do show the data points of these peculiar subtypes).
Figure 5. Each panel shows the SESNe classification regions and linear decision boundaries for each SVM train-test split of the data. Ellipses represent the 1 standard deviation contour of the PC coefficients for each SESN type (excluding the peculiar SNe SN2007uy, SN2009er, and SN2005ek). Outliers of more than 2 standard deviations from the mean are marked with stars. The phase range $t_{V\mathrm{max}}$ is labeled in the upper left of each panel, along with the mean SVM test-score. PC1 vs PC3 provides the highest SVM test-score for each phase range except $t_{V\mathrm{max}} = 0 \pm 5$ days, where PC1 vs PC2 has a slightly (< 1σ) higher SVM test-score but very similar 1-σ contour and SVM region overlap. Upper Left: ($t_{V\mathrm{max}} = 0 \pm 5$ days) There is large overlap between the IIb (green), Ib (blue), and Ic (orange) 1-σ contours, and between the SVM IIb and Ib regions, and the IIb and Ic regions (as indicated by the region boundaries changing significantly for different train-test splits and the colors bleeding into each other). Upper Right: ($t_{V\mathrm{max}} = 5 \pm 5$ days) As at $t_{V\mathrm{max}} = 0 \pm 5$ days, there is overlap between the IIb, Ib, and Ic 1-σ contours, and the corresponding SVM regions. Lower Left: ($t_{V\mathrm{max}} = 10 \pm 5$ days) The IIb, Ib, Ic, and Ic-bl 1-σ contours are completely separated. Each colored SVM region is well defined and stable for different train-test splits, and the SVM test score is highest. The Ic (orange) SVM region has collapsed and the IIb (green) SVM region has expanded. Lower Right: ($t_{V\mathrm{max}} = 15 \pm 5$ days) SESNe type 1-σ contours are well-separated and the SVM regions are stable.
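The 1-σ ellipses and the 2-σ outlier marking can be illustrated with the covariance of the PC coefficients of one class. The sketch below is an assumption about one reasonable implementation: it uses the Mahalanobis distance as the measure of "standard deviations from the mean", which the text does not specify, and the function names and synthetic coefficients are hypothetical. Peculiar objects are kept out of the reference sample, as done for the ellipses in Figure 5.

```python
import numpy as np

def sigma_ellipse(points, n_sigma=1.0):
    """Center, semi-axes (minor, major) and major-axis angle of the n-sigma ellipse."""
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    semi_axes = n_sigma * np.sqrt(eigvals)
    angle = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])  # radians
    return center, semi_axes, angle

def mahalanobis(points, reference):
    """Distance of each point from the reference sample, in units of its scatter."""
    center = reference.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = points - center
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

rng = np.random.default_rng(3)
# Synthetic 2D PC coefficients for one class (illustrative values only)
ib = rng.multivariate_normal([1.0, -0.5], [[1.0, 0.6], [0.6, 0.5]], size=60)
# Two candidate objects: one typical, one far from the class mean
candidates = np.array([[1.0, -0.5], [6.0, 4.0]])

center, semi_axes, angle = sigma_ellipse(ib)
dist = mahalanobis(candidates, ib)
is_outlier = dist > 2.0  # flagged as a > 2 sigma outlier under this assumption
```

A distance-based flag like this is continuous, so the same quantity can rank objects from typical class members to extreme outliers.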

Classification in the PC1 vs PC3 Projection
Both the SVM test-score and the 1-σ ellipses allow us to evaluate the success of our classification scheme. The two-dimensional projection onto the eigenspectra pair PC1 vs PC3 has a higher average SVM test-score in the later phase ranges ($t_{V\mathrm{max}} = 10 \pm 5$ and $15 \pm 5$ days, with test-scores of 0.71 ± 0.10 and 0.70 ± 0.11, respectively), i.e. the SESNe classes are maximally separated in this space. Similarly, the 1-σ contours (ellipses) have minimal overlap in the later phase ranges. Therefore we find that the optimal time for classifying SESNe spectra is later than peak ($t_{V\mathrm{max}} = 10 \pm 5$, $15 \pm 5$ days) rather than at or near peak ($t_{V\mathrm{max}} = 0 \pm 5$, $5 \pm 5$ days).
PC1 is a poor choice of eigenspectrum for SESNe classification at early times because the HeI5876 absorption feature in Ic, IIb, and Ib spectra has not had time to strengthen. In the phase ranges $t_{V\mathrm{max}} = 10 \pm 5$ and $t_{V\mathrm{max}} = 15 \pm 5$ days, PC1 and PC3 both become more effective at distinguishing between SESNe spectral types, with less overlap in the 1-σ contours and a higher SVM test-score. In particular, we find that the PC1 coefficients of SNe types IIb and Ib increase (while SNe Ic PC1 coefficients remain relatively unchanged) as phase increases. Since PC1 captures the strong feature at λ ≈ 5600 − 5800Å which is due to He in SNe Ib and IIb, this behavior is consistent with Liu et al. (2016), who found that the pseudo-equivalent width (pEW) of the HeI absorption features in SNe types IIb and Ib increases as a function of phase. Figure 5 also shows that as phase increases, PC3 becomes better at distinguishing between SNe types IIb (green region) and Ib (blue region). Specifically, the SNe type IIb PC3 coefficients systematically increase with increasing phase. Since PC3 captures the Hα and Hβ features, this behavior is consistent with the strengthening of the Hβ absorption feature in SNe IIb mean spectra shown in Liu et al. (2016).
The SNe Ic-bl region (gray) is reasonably well separated from the other SESNe types at all phases in Figure 5. However, note that the Ic-bl data and the corresponding 1 standard deviation ellipse are centered near the origin in every panel. Moreover, we find the PC coefficients of the SNe Ic-bl to be clustered around zero in every two-dimensional projection of the first five eigenspectra. This is expected because the SNe Ic-bl mean spectra do not have a strong absorption feature due to HeI5876, even if it were highly broadened (Modjaz et al. 2016).

Transition Supernovae and Type Outliers in PCA Space
One major benefit of our work is that the PC coefficients of the SESNe in our sample are continuous, and therefore well suited for capturing the physical continuity of chemical abundances in SNe ejecta. This behavior is particularly useful for objectively identifying "transition" SNe, which often have debated classification in the literature due to spectra that resemble more than one SESN type. Our method also identifies outliers in a particular class that are extreme versions of the SESN type, but not "transition" SNe. In Figure 5 we label all SNe in each panel that are outliers by more than 2 standard deviations and discuss them below.

Type Ib Outliers
Figure 5 shows two SNe Ib that are consistently strong outliers: SN2007uy and SN2009er. These two supernovae either appear within the SNe Ic-bl (gray) region or close to the SVM decision boundary separating the Ic-bl and Ib regions (note that if SN2007uy or SN2009er does not appear in a panel of Figure 5, it is because we have no spectra in the corresponding phase range). SN2007uy and SN2009er have been previously identified in the literature (Modjaz et al. 2014) as peculiar members of the Ib class. Modjaz et al. showed that SN2007uy and SN2009er have broader features at higher velocities than normal SNe Ib spectra, in agreement with our results. We also find that SN1990I, SN1998dt, and SN2004gq are consistent outliers towards the Ic-bl region, although to lesser degrees than SN2007uy and SN2009er. Elmhamdi et al. (2004) have previously identified SN1990I as having high velocity features atypical of a normal SN Ib, and Modjaz et al. (2014) show that SN2004gq and SN1990I both have high absorption velocity He features compared to other SNe Ib spectra. The outliers SN1990I, SN1998dt, and SN2004gq may form a continuum of SN Ib spectra with higher than normal Doppler shifts, while SN2007uy and SN2009er indicate the possibility of a continuum of SNe Ib spectra with varying amounts of line blending.
SN1999ex was initially classified as an SN Ic, then changed to an SN Ib/c due to moderate HeI absorption features (Hamuy et al. 2002). More recently, SN1999ex has been classified as an SN Ib (Modjaz et al. 2014). We identify SN1999ex as an outlier in multiple 2D projections of PCA space, indicating that it is neither a standard SN Ib nor a standard SN Ic.
We also identify SN1990U, SN2007kj, and SN2007Y as outliers in Figure 5. SN1990U (found in the green SN IIb region) has previously been considered an SN Ic (Matheson et al. 2001) and more recently an SN Ib (Modjaz et al. 2014). Although we identify SN1990U as an outlier SN Ib in the PC1 vs PC3 projection, in the other projections it is an SN Ib, and in no projection is SN1990U located in the standard SN Ic region. Therefore our results support the reclassification of SN1990U as an SN Ib by Modjaz et al. (2014). SN2007kj was previously classified as an SN Ib/c "transition" object (Leloudas et al. 2011) and more recently as an SN Ib (Modjaz et al. 2014). We find that SN2007kj would be considered a strong outlier as an SN Ic in every 2D projection of the first 5 eigenspectra, while it is consistent with being a standard SN Ib in multiple 2D projections (not shown) other than PC1 vs PC3. Therefore we support the reclassification of SN2007kj as an SN Ib by Modjaz et al. (2014). SN2007Y has been classified both as an SN IIb (Folatelli et al. 2014) and an SN Ib (Liu et al. 2016). In the PC1 vs PC3 2D projection, we find that SN2007Y falls in the IIb region, consistent with Folatelli et al. (2014), who argued that SN2007Y is an SN IIb due to the strength and velocity of the HeI5876 feature. However, in another 2D projection (not shown), namely PC1 vs PC4 (strong Hα and Hβ features) at phases $t_{V\mathrm{max}} = 5 \pm 5$ and $15 \pm 5$ days, we find that SN2007Y falls in the Ib region, in agreement with Liu et al. (2016), who found that the H feature evolution of SN2007Y was consistent with SN Ib spectra. Thus our classification method captures the debate over the correct type for SN2007Y.

Type IIb Outliers
In Figure 5 we label the following outlier SNe IIb: SN2010as, SN2011ei, and SN2016gkg. At early times, SN2010as appears on the decision boundary between types Ic (orange) and IIb (green), which is consistent with Folatelli et al. (2014), who found that SN2010as exhibits weaker than normal He features at early times, in addition to weak H features. SN2011ei is a strong outlier in the PC1 vs PC3 2D projection. Milisavljevic et al. (2013) showed that SN2011ei evolves quickly, losing its H features within a week after V-band maximum, to resemble a type Ib spectrum characterized by Helium features. Figure 5 illustrates this evolution, with SN2011ei initially a standard IIb at phase $t_{V\mathrm{max}} = 0 \pm 5$ days, then subsequently moving to the Ib region. However, Liu et al. (2016) showed that the Hα equivalent width evolves differently for type IIb and Ib spectra (including SN2011ei), so SN2011ei is distinguishable as an SN IIb even at late times. When we consider the PC1 vs PC4 (strong Hα feature) 2D projections (not shown) at the later phase ranges $t_{V\mathrm{max}} = 5 \pm 5$, $10 \pm 5$, and $15 \pm 5$ days, we find SN2011ei to be consistently within the IIb (green) region, in agreement with Liu et al. (2016). SN2016gkg is classified as an SN IIb due to its Hα absorption, but Tartaglia et al. (2017) showed that SN2016gkg exhibits stronger than normal Helium features even at early times, similar to an SN Ib. Figure 5 captures this behavior, showing SN2016gkg as a strong outlier in the Ib (blue) region at $t_{V\mathrm{max}} = 0 \pm 5$ days, but a more normal SN Ib in other 2D projections (not shown).

Type Ic/Ic-bl Outliers
We identify three Ic outliers, SN1990B, SN1994I, and SN2005az, and six Ic-bl outliers, SN2002ap, SN2007bg, SN2007ru, SN2010ay, SN2010bh, and SN2016coi, in Figure 5. SN1990B is currently considered an SN Ic; however, it was initially classified as an SN Ib (Clocchiatti et al. 2001), which is consistent with our results in Figure 5. SN1994I is one of only a few SNe Ic with many spectra taken over a range of wavelength regimes (e.g. Filippenko et al. 1995; Richmond et al. 1996; Immler et al. 1998), and it is considered a prototypical SN Ic. However, we find that SN1994I is an outlier in many 2D PCA projections, at multiple phases, as illustrated in Figure 5. Our results indicate that SN1994I may not be a prototypical SN Ic, confirming the spectroscopic analysis of Modjaz et al. (2016) and the photometric analysis of Drout et al. (2011) and Bianco et al. (2014). SN2005az was initially classified as both an SN Ic (Aldering et al. 2005) and an SN Ib (Quimby et al. 2005). Recently SN2005az has been classified as an SN Ic (Kelly & Kirshner 2012) using SNID based on the updated SESNe library from Modjaz et al. (2014) and Liu et al. (2016). We find that SN2005az is inconsistent with being an SN Ib in the majority of 2D projections, and when it is consistent with belonging to the Ib or IIb class, this is due to large overlap of the Ic and Ib/IIb regions at $t_{V\mathrm{max}} = 0 \pm 5$ days. Meanwhile, there are some PCA 2D projections (PC2 vs PC4, not shown) where SN2005az is located within the SN Ic one standard deviation ellipse, so we support the classification of SN2005az as an SN Ic.
SN2002ap is claimed to be a relatively low energy SN Ic-bl compared to normal SNe Ic-bl events (Mazzali et al. 2002) and has been classified as a normal SN Ic from radio observations (Berger et al. 2002). We find that SN2002ap is indeed a potential transition object between the Ic and Ic-bl regions in Figure 5. Although SN2007bg is identified as an outlier at phase $t_{V\mathrm{max}} = 5 \pm 5$ days, it is well within the Ic-bl SVM region (gray), and it no longer fulfills the outlier criterion at later phases, in agreement with the literature view that SN2007bg is a standard SN Ic-bl (Young et al. 2010). Similarly, although SN2007ru is marked as an outlier in the lower left panel of Figure 5, it is well within the SN Ic-bl SVM region and considered a standard SN Ic-bl (Sahu et al. 2009). SN2010bh is considered a standard SN Ic-bl, although with slightly higher inferred explosion energy than other standard Ic-bl SNe (Chornock et al. 2010). At late times (bottom right panel of Figure 5), we find that SN2010ay is a strong SN Ic-bl outlier well within the Ic-bl region. SN2010ay is a particularly interesting Ic-bl because it has been proposed that SN2010ay was associated with an off-axis low luminosity gamma ray burst, due to its high absorption velocity, high peak luminosity, and low metallicity (Sanders et al. 2012), combined with a lack of observed gamma rays. SN2016coi has broad spectral features in addition to a strong absorption feature generally attributed to HeI in the literature (Prentice et al. 2018), setting it apart from normal Ic-bl SNe. We find that SN2016coi is located right at the SVM boundary between the Ic-bl and Ib regions, consistent with SN2016coi being similar to the SN Ib class.

SUMMARY & FUTURE WORK
In this work, we have shown that PCA is a useful tool as a first step towards a data-driven classification method for SESNe types. We used multi-class linear SVMs to explore different projections of SESNe spectra onto eigenspectra and found that the SESNe types are more distinguishable in the later phase ranges $t_{V\mathrm{max}} \approx 10 - 15$ days relative to V-band maximum, rather than at peak light. We recommend that spectral follow-up of ZTF and LSST supernovae take these considerations into account. In addition, our classification method naturally provides a continuous, quantifiable method for characterizing "transition" SNe based on distance to class boundaries or centroids. We showed that our classification method identified both "transition" SNe and SNe with debated types previously identified in the literature, and we interpreted these SNe using our PCA eigenspectra and our SVM classification regions.
PCA is clearly a promising dimensionality reduction tool for SESNe, and there are many future projects that would use the work presented here as a starting point. In particular, the probability of a supernova's membership in one of the SESNe types could be calculated using the distance from its PCA projection to an SVM decision boundary. This would provide a quantitative understanding of "transition" SNe like the type Ib/c objects, and should especially be explored as a function of phase.
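As a rough illustration of this proposed future direction, the sketch below converts the signed distance to a linear SVM boundary into a membership probability. It is one possible approach under stated assumptions, not a method used in this work: it treats a two-class toy problem with synthetic PC coefficients, and it uses sigmoid (Platt-style) calibration via scikit-learn's `CalibratedClassifierCV`, which is our choice for illustration only.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
labels = np.repeat(["Ib", "Ic"], 60)
# Synthetic 2D PC coefficients for two classes (illustrative values only)
coeffs = np.vstack([
    rng.normal([1.5, 0.0], 1.0, size=(60, 2)),   # "Ib"
    rng.normal([-1.5, 0.0], 1.0, size=(60, 2)),  # "Ic"
])

svm = LinearSVC(max_iter=10000).fit(coeffs, labels)
w_norm = np.linalg.norm(svm.coef_[0])

def signed_distance(points):
    """Euclidean distance to the boundary; positive values favor svm.classes_[1]."""
    return svm.decision_function(points) / w_norm

# Sigmoid calibration maps SVM decision values to probabilities
calibrated = CalibratedClassifierCV(LinearSVC(max_iter=10000), method="sigmoid", cv=5)
calibrated.fit(coeffs, labels)

test_points = np.array([[3.0, 0.0], [0.0, 0.0], [-3.0, 0.0]])
distances = signed_distance(test_points)
probs = calibrated.predict_proba(test_points)  # columns follow calibrated.classes_
```

An object sitting near the boundary receives a probability close to one half for each class, which is one way a "transition" SN could be quantified.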