A graph-based spectral classification of Type II supernovae

Given the ever-increasing number of time-domain astronomical surveys, employing robust, interpretative, and automated data-driven classification schemes is pivotal. Based on graph theory, we present new data-driven classification heuristics for spectral data. A spectral classification scheme of Type II supernovae (SNe II) is proposed based on the phase relative to the maximum light in the $V$ band and the end of the plateau phase. We utilize a compiled optical data set that comprises 145 SNe and 1595 optical spectra in 4000-9000 Å. Our classification method naturally identifies outliers and arranges the different SNe in terms of their major spectral features. We compare our approach to the off-the-shelf UMAP manifold learning and show that both strategies are consistent with a continuous variation of spectral types rather than discrete families. The automated classification naturally reflects the fast evolution of Type II SNe around the maximum light while showcasing their homogeneity close to the end of the plateau phase. The scheme we develop could be more widely applicable to unsupervised time series classification or characterisation of other functional data.


Introduction
Classification systems have a long history, dating back to ancient times. One of the oldest documented examples is a pharmacopoeia called "The Divine Farmer's Materia Medica", which dates from around 200 AD (Yang, 1998). However, it was only centuries later that the foundations of modern taxonomy were laid by the Linnaean classification of organisms (Linnaeus, 1758) and Darwin's theory of common descent (Darwin, 1859). In astronomy, two well-known examples of classification systems are the Morgan-Keenan system of stellar spectral classification (Morgan & Keenan, 1973) and the Hertzsprung-Russell diagram (e.g. Chiosi et al., 1992, and references therein), which organise stars according to their luminosity, effective temperature, and evolutionary stage. With the advent of large-scale surveys, the diversity of observed objects has considerably increased, making the construction of a coherent taxonomy that embraces this plurality a particularly challenging task.
One essential tool employed in such tasks is graph networks. Graph theory originated in Königsberg, the capital of East Prussia, in 1735, when Leonhard Euler offered a rigorous mathematical proof that there is no path to walk across all seven bridges over the Pregel river without crossing one bridge at least twice. This solution is the first documented real-world problem solved by a graph (Shields, 2012).
More than a historical anecdote, this episode also shows that some problems can be solved by changing the paradigm with which we approach them. Graphs have been previously used in astronomy to visualise multivariate tabular data (de Souza & Ciardi, 2015) and have gained popularity in diverse fields, including ecology (Farage et al., 2021), neural sciences (Bessadok et al., 2021; Li et al., 2021), and computer vision (Vasudevan et al., 2022). They are particularly suitable for simplifying and highlighting hidden data structures and associations in multidimensional and heterogeneous datasets.
A notable example of a problem involving heterogeneous data is the challenge of building a coherent classification system for supernova explosions. Supernova classification relies on spectra and has traditionally been divided into two main classes: Type I SNe (SNe I), those lacking any hydrogen emission lines in their spectra, and Type II SNe (SNe II), those with hydrogen emission (Minkowski, 1941). SNe II are a diverse group exhibiting many photometric and spectroscopic features, including a broad range of brightness, visual decline rates, photospheric phase duration, and spectral line features.
In particular, Type II SNe have been historically divided into two groups based on their light-curve properties: IIP (a sharp rise to the peak within ∼5-15 days of the explosion, followed by a plateau phase of nearly 70-120 days) and IIL (a shorter cooling phase and a faster luminosity decline rate; Gal-Yam, 2017). Other classes are defined by their spectroscopic properties: IIn (Schlegel, 1990), with long-lasting narrow Balmer emission lines, and IIb (Filippenko, 1988), transitional objects that begin their evolution similarly to SNe II and evolve towards a state with no Hα but prominent [Ca II] and [O I] lines (typical of SNe Ib). It has been suggested that SNe IIb are physically distinct from other SNe II (e.g. Pessi et al., 2019). An additional photometric subclass, namely SN 1987A-like SNe, displays a peculiar long rise to maximum (e.g., McCray, 2017), with spectra similar to SNe IIP. The connection between different SN II subtypes and their physical origin has been discussed extensively. Direct identification of SN IIP progenitors has suggested a red supergiant (RSG) origin (with mass ∼8-17 M⊙; see e.g. Smartt, 2009). Candidate SN IIL progenitors have been more difficult to identify, although tentative identifications have suggested that these may be more massive (e.g. Elias-Rosa et al., 2010). Some photometric analyses have suggested that SNe IIP and IIL are fully distinct with different physical mechanisms (e.g. red supergiants vs. magnetars; see Arcavi et al., 2012), whereas others have pointed to a single continuous family (e.g. Anderson et al., 2014; Gutiérrez et al., 2014; Galbany et al., 2016; Valenti et al., 2016) with observational properties driven by the scale and density of the hydrogen envelopes of the progenitors.
Furthermore, in recent years, a large fraction of SNe II have been shown to present circumstellar material ejected before explosion, or extended atmospheres in the progenitor RSGs, explaining the fast rise in the light curves (e.g. González-Gaitán et al., 2015; Morozova et al., 2017; Förster et al., 2018) and the clear narrow emission signatures in early spectra (e.g. Yaron et al., 2017).
Traditional classification methods include template matching (Howell et al., 2005; Blondin & Tonry, 2007; Harutyunyan et al., 2008) and similarity of specific spectral features (Sun & Gal-Yam, 2017; Prentice & Mazzali, 2017). Re-classification schemes have been proposed because of the wide spectral and photometric diversity of SNe II and the limitations of the template-matching approach, given the lack of enough representative spectra. Ransome et al. (2021) proposed a new classification scheme for Type IIn solely based on a detectable narrow feature in the Hα profile, whereas Anderson et al. (2014) suggested using the post-maximum decline rate to classify SNe II. Alternatively, data-driven approaches have been developed in recent years.
Noteworthy examples of spectral classification methods include Williamson et al. (2019), who utilized principal component analysis (PCA) for categorizing a sample of 160 stripped-envelope core-collapse supernovae. Chen et al. (2018) employed functional PCA to extract spectral features of SNe IIP/IIL, which were then classified using Support Vector Machine (SVM) and Artificial Neural Network (ANN). For a broader scope of supernova types, Muthukrishna et al. (2019) developed DASH, a deep-learning-based model trained on over 4,000 classified spectra for determining a supernova spectrum's type, age, redshift, and host galaxy component. More recently, Bengyat & Gal-Yam (2022) applied an unsupervised Random Forest algorithm to characterize supernovae based on their spectral-temporal distribution.
Supernova classification often involves both the separation of distinct classes (e.g., SNe Ia vs. core-collapse) and the characterization of potentially continuous subfamilies (e.g., SNe II), where the full continuity of a class may not be known a priori. This work introduces a novel approach that can efficiently handle a mixture of discrete and continuous classes. By employing graph networks for unsupervised learning, it defines a taxonomic landscape of SNe II (SNe IIP and SNe IIL), resulting in a data-driven classification scheme that can rapidly update with new data.
Our heuristics rely on a hierarchical classification of the SNe II spectra based on similarity, conveying an intuitive graph-based visualisation. To account for the time evolution of SNe spectra, we apply our analysis over two reference phases of spectral evolution, around maximum light and the end of the plateau phase. Additionally, we compare our methodology with a more complex manifold-learning-based classification and find that they lead to similar results, favouring a continuous spectral variation instead of compact families. The rest of the paper is organised as follows. Section 2 describes our data compilation and pre-processing. Section 3 describes the heuristics behind the automated spectral taxonomy scheme. Section 4 displays the spectral classification of Type II SNe for two different phases. Section 5 presents our conclusions.

Data description
In this section, we describe the spectral and photometric dataset and pre-processing used in this work.
Our spectroscopic data are predominantly compiled from the Carnegie Type-II Supernova Survey and Carnegie Supernova Project (Gutiérrez et al., 2017) or taken from the Weizmann Interactive Supernova Data Repository (Yaron & Gal-Yam, 2012) and Asiago Supernova Catalogue (Barbon et al., 1989, 1999). We also make use of data from the Public ESO Spectroscopic Survey of Transient Objects (Smartt et al., 2015), Lick Observatory Supernova Search (de Jaeger et al., 2019), Center for Astrophysics Supernova Program (Modjaz et al., 2014; Hicken et al., 2017). We require a successful estimate of V-band maximum date via a polynomial fit to the light curve (as in Galbany et al., 2016). This step narrows the sample to 186 SNe, of which 180 have at least two observed spectra. Starting from this selection of 180 SNe II, we carry out our preprocessing (see Section 2.2) to homogenise the data, leading to a final sample of 145 SNe II.

Pre-processing
We create a flux-calibrated spectroscopic sequence sample by partially following the scheme of Vincenzi et al. (2019). The first step is to perform a Gaussian process (GP) regression (see, e.g., Rasmussen & Williams, 2006) on the photometric light curves, allowing one to estimate fluxes at the times of the observed spectra. We choose not to adopt the physically motivated extrapolation of Vincenzi et al. (2019) and strictly interpolate our light curves. Recently, Stevance & Lee (2022) highlighted some challenges associated with GP regression on SN II light curves. Our fits are carried out using a Matérn-3/2 kernel, identified by Stevance & Lee (2022) as reasonably fitting the fast transitions in SN IIP light curves (something we also find to be true in practice). We have avoided extrapolating or estimating photometric parameters and have visually inspected the fits for overfitting. We have found that the GP regression method is robust enough for interpolation, which is necessary for flux calibration.
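The interpolation step can be illustrated with a minimal sketch of GP regression using a Matérn-3/2 kernel. It is not the authors' pipeline: the function names (`matern32`, `solve`, `gp_interpolate`), the constant mean, and the fixed amplitude and length scale are all assumptions made for illustration, whereas a real fit would optimise the hyperparameters. The function refuses to extrapolate, mirroring the choice described above.

```python
import math

def matern32(t1, t2, amplitude, length_scale):
    """Matern-3/2 covariance between two epochs (in days)."""
    r = math.sqrt(3.0) * abs(t1 - t2) / length_scale
    return amplitude ** 2 * (1.0 + r) * math.exp(-r)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_interpolate(t_obs, y_obs, y_err, t_new, amplitude=1.0, length_scale=20.0):
    """GP posterior mean at t_new (constant mean = sample mean); no extrapolation.

    amplitude and length_scale are illustrative, hypothetical defaults.
    """
    if any(t < min(t_obs) or t > max(t_obs) for t in t_new):
        raise ValueError("requested epoch lies outside the observed light curve")
    mean = sum(y_obs) / len(y_obs)
    # Covariance of the observations, with measurement variance on the diagonal.
    K = [[matern32(ti, tj, amplitude, length_scale) + (y_err[i] ** 2 if i == j else 0.0)
          for j, tj in enumerate(t_obs)] for i, ti in enumerate(t_obs)]
    alpha = solve(K, [y - mean for y in y_obs])
    return [mean + sum(matern32(t, ti, amplitude, length_scale) * a
                       for ti, a in zip(t_obs, alpha)) for t in t_new]
```

With small measurement errors the posterior mean passes close to the observed points, and the Matérn-3/2 kernel, being only once differentiable, allows sharper turns than a squared-exponential kernel would.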
After smoothing the observed spectra with a Savitzky-Golay filter of window size 100 Å (Savitzky & Golay, 1964), we synthesize the fluxes in the passbands in which photometry is available. We then compare those synthesized fluxes to fluxes interpolated from the light curves, allowing us to estimate smooth mangling functions to flux-calibrate the spectra. This step uses a second GP regression in the wavelength direction, with a fixed scale of 300 Å. The light curves and flux-calibrated spectra are then jointly modelled as a two-dimensional GP in time and wavelength, under the approximation that the light curve points are observations of this surface at the central wavelengths of their passbands. We use the same Matérn-3/2 kernel as Vincenzi et al. (2019), with fixed time and wavelength scales of 30 days and 100 Å, respectively. The procedure enables all spectra and photometry for a given supernova to provide a principled extrapolation of the observed spectra to unobserved wavelengths.
We do not interpolate the flux surface to unobserved times, as the GP can over-smooth spectral features if this interpolation strays far from the epochs of real spectra. Additionally, we do not adopt the priors of Vincenzi et al. (2019). Instead, we rely solely on the data to justify extrapolation on a comparable wavelength range, which we carry out within the 4000-9000 Å range. Finally, the second round of mangling is performed on the extended spectra, yielding a set of flux-calibrated spectra on a consistent wavelength grid. We correct for Milky Way extinction but do not attempt to correct for host galaxy dust. A sample of 145 SNe II with a total of 1595 spectra (see footnote 2) can be successfully pre-processed through the full pipeline. For those that could not be pre-processed in this way, there was at least one stage where a reliable GP regression was not possible. Figure 1 shows an example of the process for SN1999em. The solid lines represent the original spectra after Savitzky-Golay smoothing and the first round of mangling. The dashed lines represent the final estimated spectra for maximum light and the end of the plateau phase.
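The comparison between synthesized and photometric fluxes can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`trapezoid`, `synthetic_flux`, `mangling_ratios`) are hypothetical, the passband transmissions are assumed to be sampled on the spectrum's wavelength grid, and the synthetic flux is taken as a transmission-weighted mean (a photon-counting convention would add a factor of wavelength inside both integrals).

```python
def trapezoid(x, y):
    """Trapezoidal integral of y sampled at x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def synthetic_flux(wave, flux, transmission):
    """Transmission-weighted mean flux of a spectrum through one passband."""
    weighted = [f * t for f, t in zip(flux, transmission)]
    return trapezoid(wave, weighted) / trapezoid(wave, transmission)

def mangling_ratios(wave, flux, passbands, photometric_flux):
    """Ratio of light-curve flux to synthetic flux, one value per passband.

    passbands: dict mapping a band name to its transmission sampled on `wave`.
    photometric_flux: dict mapping the same band names to interpolated fluxes.
    """
    return {name: photometric_flux[name] / synthetic_flux(wave, flux, trans)
            for name, trans in passbands.items()}
```

Each ratio applies at the central wavelength of its passband; a smooth function through these points (in the text, a GP in wavelength) would then be multiplied into the spectrum.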
Having mangled and extended the spectra, we sort them into bins by rest-frame epoch. Specifically, we select two "target" epochs: ±3 rest-frame days around maximum light and ±5 days around the end of the plateau phase (see footnote 3). These two reference epochs are obtained in different ways: the maximum is calculated with polynomial fits in the V band, whereas the end of the plateau is obtained by fitting a Fermi-Dirac step function (see e.g. Olivares 2008). These two reference epochs are thus independent of each other. For each supernova, we then select the spectrum closest to the target epoch within each of these bins. This procedure yields a sample of 65 SN spectra around maximum light and 27 spectra around the end of the plateau phase. The remainder of the 145 SNe do not have any spectra sufficiently close to our chosen target epochs.
Footnote 2: For the sample of 145 SNe II, the median number of spectra per SN is eight. The most well-covered supernova (SN 2012ec) has 68 spectra. There are 38 SNe with more than 1595/145 = 11 spectra. The median spectrum epoch is 31.25 days past the estimated time of maximum light, and 42 days before the end of the plateau. The interquartile range of our spectral epochs spans 9.12 to 71.79 days post-maximum. The 90th percentile falls at 135.79 days after maximum light. Around 10 (13) per cent of our spectra are pre-maximum (post-plateau), with 62 (49) supernovae having a pre-maximum (post-plateau) spectrum.
Footnote 3: The time after the plateau or linear decline phase in Type II SNe represents a transition between the photospheric, shock-powered hydrogen recombination phase and the start of a linear decline phase powered by the radioactive decay 56Co → 56Fe.

Graph-based clustering
This section gives a brief overview of the methods employed in our work. The steps include the construction of a pairwise dissimilarity matrix from all SNe spectra in a given phase, a minimum spanning tree (MST) algorithm to simplify the structure, and a graph community detection to characterize potential groups of similar spectra and outliers.

Dissimilarity Matrix
Estimating the proximity between two datasets in high-dimensional space is inherently difficult due to the curse of dimensionality. This is because the performance of similarity indexing structures in high dimensions degrades rapidly. Therefore, the choice of a distance metric is crucial, and it is not always straightforward. Previous studies have shown that for a wide variety of problems, the $\ell_1$ (Manhattan) norm performs better than the $\ell_2$ (Euclidean) norm (Aggarwal et al., 2001).
The $\ell_p$-norm is given by:

$$
d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{d} |x_i - y_i|^{p} \right)^{1/p}, \tag{1}
$$

where $\mathbf{x}$ and $\mathbf{y}$ represent two independent vectors of features, $i$ is the index of the feature, and $d$ is the number of features. In our case, the features are the flux values at each wavelength bin, and the number of bins is 501, in the range of 4000 to 9000 Å. Thus, we choose to use the $\ell_1$-norm to measure the similarity between any two supernova spectra:

$$
d_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} |x_i - y_i|. \tag{2}
$$

Here, $x_i$ and $y_i$ represent the flux values at each wavelength bin $i$ for two different supernova spectra $\mathbf{x}$ and $\mathbf{y}$. Given this, the dissimilarity matrix is constructed by calculating the pairwise Manhattan distances between all the spectra in a given phase. The resulting heatmap of the dissimilarity matrix is shown in Figure 2 for maximum light and the end of the plateau phase. A simple visual inspection suggests that there is some level of structure in the data. For example, SN2004dj stands out as an outlier at the end of the plateau phase, not correlating strongly with any other supernovae. The next step is to make sense of this information and convey an intuitive visualization. To ensure the robustness of our results, we also conducted a sensitivity analysis using the $\ell_2$-norm and generated a corresponding dissimilarity matrix, which is presented in Appendix A. Although the resulting group arrangement of the supernovae was slightly different from that of the original analysis using the Manhattan distance measure, our subsequent analysis was not qualitatively affected by this change.
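The distance and the pairwise matrix described above can be sketched in a few lines. The function names `lp_distance` and `dissimilarity_matrix` are hypothetical, and the sketch assumes the spectra are already sampled on a common wavelength grid and scaled comparably, since the text does not specify a normalisation.

```python
def lp_distance(x, y, p=1):
    """l_p distance between two equal-length flux vectors (p=1 is Manhattan)."""
    if len(x) != len(y):
        raise ValueError("spectra must share the same wavelength grid")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def dissimilarity_matrix(spectra, p=1):
    """Symmetric matrix of pairwise l_p distances with a zero diagonal."""
    n = len(spectra)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = lp_distance(spectra[i], spectra[j], p)
    return D
```

Calling `dissimilarity_matrix(spectra, p=2)` gives the Euclidean variant, which is the kind of swap used for a sensitivity check like the one mentioned above.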

Connected, Undirected Graphs
To further understand the relationships between the supernova spectra, we use the concept of a connected, undirected graph. In this approach, each supernova spectrum is represented by a vertex, and the weight (or length) of each edge is the dissimilarity between the two spectra it connects: the more similar two spectra are, the shorter the edge between them. A connected graph is one in which every pair of vertices is joined by at least one path, i.e. a sequence of edges, so that all vertices are linked to each other directly or indirectly. An undirected graph is one in which the edges have no specific direction, so an edge between two vertices can be traversed either way. Such a graph is a useful tool for modelling relationships between objects without a clear discontinuity between them and for which the direction of the relationship does not matter. In our case, this approach allows us to understand the similarities between the supernova spectra and to identify potential groups of similar spectra, as well as outliers. A minimum spanning tree algorithm and graph community detection methods then help to simplify the structure and characterize the relationships in the data.
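As a small illustration of these definitions, the sketch below stores an undirected weighted graph as a nested dictionary and checks connectivity with a breadth-first search. The names `build_graph` and `is_connected` are hypothetical and the representation is just one convenient choice.

```python
from collections import deque

def build_graph(names, D):
    """Complete undirected graph: graph[a][b] == graph[b][a] == dissimilarity."""
    graph = {name: {} for name in names}
    for i, a in enumerate(names):
        for j in range(i + 1, len(names)):
            b = names[j]
            graph[a][b] = D[i][j]
            graph[b][a] = D[i][j]  # undirected: store both directions
    return graph

def is_connected(graph):
    """True if every vertex can be reached from an arbitrary start vertex."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(graph)
```

A graph built from a full dissimilarity matrix is connected by construction; the check becomes informative once edges are pruned.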

Minimum spanning tree
Minimum Spanning Tree (MST) algorithms have gained widespread popularity as a powerful tool for identifying clusters of heterogeneous data in various fields. Here, the algorithm is applied to graphs in which the vertices represent SNe spectra and the edges represent the distances between each pair of spectra, as calculated using the $\ell_1$-norm (Equation 1). The MST is a subset of the edges in a connected, undirected graph that spans all the vertices without forming any cycles, while minimizing the total edge weight. In the context of SNe spectra, this weight represents the dissimilarity between each pair of spectra.
One of the most widely used and well-known MST algorithms is Prim's algorithm, popularised by Robert C. Prim (Prim, 1957). This algorithm starts with an arbitrary root vertex and repeatedly adds the shortest edge that connects a new vertex to the tree. This process is repeated until all vertices are connected, resulting in a multilayered representation of the similarities between all spectra in the data for a given phase. This representation captures global and local associations, allowing for a comprehensive and in-depth data analysis. Figure 3 shows the matrix representation of the graph after applying the MST to the dissimilarity matrix shown in Figure 2.
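The procedure just described can be sketched with a binary heap. This is a generic, minimal version of Prim's algorithm operating on a full dissimilarity matrix, not the implementation used in the paper; the function name `prim_mst` and the edge format are assumptions.

```python
import heapq

def prim_mst(D, root=0):
    """Return the MST edges (i, j, weight) of a complete graph with weights D."""
    n = len(D)
    in_tree = [False] * n
    in_tree[root] = True
    # Candidate edges leaving the tree, ordered by weight.
    heap = [(D[root][j], root, j) for j in range(n) if j != root]
    heapq.heapify(heap)
    edges = []
    while heap and len(edges) < n - 1:
        weight, i, j = heapq.heappop(heap)
        if in_tree[j]:
            continue  # this edge would close a cycle
        in_tree[j] = True
        edges.append((i, j, weight))
        for k in range(n):
            if not in_tree[k]:
                heapq.heappush(heap, (D[j][k], j, k))
    return edges
```

For n spectra the tree keeps n - 1 of the n(n - 1)/2 pairwise distances, which is what makes the resulting network readable.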

Graph community detection
The final step in the analysis is to identify groups of SN spectra that are similar. We start from the simplified dissimilarity matrix shown in Figure 3. To identify groups of similar objects, known as "communities", we use the Infomap algorithm (Rosvall & Bergstrom, 2008).
The Infomap algorithm is a method for detecting the community structure of networks. It is based on the trajectory of a random walker moving through the network, which tends to remain for long stretches within groups of nodes (i.e. the communities) that are strongly connected within themselves but less connected to other groups. The algorithm identifies the most likely community structure by finding the partition of the network that minimises the information cost (description length) of the paths taken by the random walker. The method is effective in identifying communities in a wide range of networks (Lancichinetti & Fortunato, 2009). The Infomap algorithm is implemented in the igraph package (Csardi & Nepusz, 2006), which provides a user-friendly interface for its application.
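For reference, the information cost minimised by Infomap is usually written as the map equation of Rosvall & Bergstrom (2008). The notation below follows the common presentation of that equation and is not taken from the present text:

$$
L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} p_{\circlearrowright}^{i} H(\mathcal{P}^{i}),
$$

where $\mathsf{M}$ is a partition of the network into $m$ communities, $q_{\curvearrowright}$ is the per-step probability that the random walker moves between communities, $H(\mathcal{Q})$ is the entropy of the community names, $p_{\circlearrowright}^{i}$ is the fraction of steps spent within community $i$ (including the step that exits it), and $H(\mathcal{P}^{i})$ is the entropy of the movements within community $i$. A partition with tightly knit communities lowers the first term, because the walker rarely switches communities, without inflating the second.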

Results and Discussion
The top panels of Figs. 4 and 5 present the graph network representation of the diversity of supernova (SN) spectra at maximum light and the end of the plateau phase, respectively. The bottom panels of these figures show the corresponding spectra in each group, along with the median spectra for each group (indicated by black dashed lines).
A visual inspection of these figures reveals several interesting features. The method used to construct the graph network representation effectively splits the spectra at maximum light based on their line features and the spectral slope at lower wavelengths. One of the key aspects that can be immediately recognized is the continuum of SNe from the top-right part of the graph towards the bottom-left, which traces the relative intensity of Hα. The spectra at the end of the plateau phase appear to show less diversity than those at maximum light, although the smaller sample size for this phase prevents us from making strong claims about the relative diversity of the spectra. The figures presented are a valuable tool for understanding the diversity of supernova spectra. They enable a clear comparison of similarities and differences among different groups of spectra. They can be used to discover new SNe subclasses or examine the relationships between existing subclasses.
In the following, we compare the results of our graph classification method with those of a more traditional classification method found in the literature. Additional insights can be derived from Figure 6. SN2009dd is a Type II supernova that belongs to the bright branch and shows signs of weak interaction between the circumstellar material and the ejecta, as indicated by high-velocity features in the Balmer lines (Inserra et al., 2013). This feature is commonly observed in Type IIn SNe and Type IIL SNe (Bostroem et al., 2019). Our graph-based analysis positions SN2009dd between SN2014g and SN2005lw, both Type IIL, at the maximum-light phase, suggesting that SN2009dd may be a transitional link between standard Type IIP and strongly interacting Type IIn supernovae. This further highlights the challenges in devising a clear classification scheme for these objects, as luminous Type II SNe like SN2009dd can exhibit properties of multiple supernova types.
SN2006ai is noteworthy for its central position in the graph and its classification as a transitional type between Type IIL and IIb supernovae, with short-plateau phases, potentially indicating a high-mass red supergiant progenitor (Hiramatsu et al., 2021). Additionally, SN2014g, located close to SN2006ai, is classified as a Type IIL supernova, with a post-maximum light curve decline consistent with this classification. However, it was initially classified as a Type IIn due to strong emission lines in the earliest days of the explosion, likely caused by a metal-rich circumstellar medium (CSM), possibly resulting from pre-explosion mass loss events (Terreran et al., 2016). SN2008aw also shows an extra absorption component on the blue side of Hα (Gutiérrez et al., 2014). Not surprisingly, SN1987A appears to deviate from the general trend on the graph and is located on the periphery, while still maintaining a connection to other Type IIP supernovae.
While an in-depth examination of individual supernovae is not within the scope of this work, the examples above demonstrate how our approach can assist domain specialists in identifying objects within a population. These identified objects may provide crucial insights into their subsets' composition.

UMAP projection
For an independent visualisation of the SNe spectra at each phase, we use the Uniform Manifold Approximation and Projection (UMAP) algorithm (McInnes et al., 2018). UMAP is a manifold learning algorithm that aims to project high-dimensional data into a lower dimension while preserving local distances over global distances. It is similar in scope to other manifold learning algorithms such as t-SNE (van der Maaten & Hinton, 2008), Isomap (Tenenbaum et al., 2000), and diffusion maps (Coifman et al., 2005). However, UMAP is computationally more efficient and has been shown to have better discriminating power than other state-of-the-art methods such as t-SNE (McInnes et al., 2018). In the UMAP embeddings at maximum light (Figure 7) and at the end of the plateau phase (Figure 8), each SN is colour- and shape-coded according to the groups identified by graph analysis. A visual examination of the figures reveals a strong correspondence between these two independent methodologies. It is worth noting that the groups were not utilised in the UMAP projection and are highlighted solely for visualisation purposes.
Figure 7: The two-dimensional embedding of SN spectra by UMAP at maximum light is depicted in two panels. The left panel displays our graph-based community groups, while the right panel depicts the standard classification from the literature. In both panels, each SN is coded with a unique colour and shape according to its respective classification, which was assigned independently of the UMAP embedding. Legend classes: 1987A-like, II, II-pec, IIL, IIL/n, IIP, IIP-fast dec, IIP-pec.
Figure 8: The two-dimensional embedding of SN spectra by UMAP at the end of the plateau phase is depicted in two panels. The left panel displays our graph-based community groups, while the right panel depicts the standard classification from the literature. In both panels, each SN is coded with a unique colour and shape according to its respective classification, which was assigned independently of the UMAP embedding. Legend classes: 1987A-like/II-pec, II, IIL, IIP, IIP-short plateau.
A closer examination of the UMAP projection confirms that while the SN spectra exhibit distinct features, particularly around maximum light, they form a continuum that ranges from featureless spectra with a steep blue slope to spectra dominated by Hα features. This figure provides independent validation of the results obtained through graph analysis and further supports the conclusion that the SN spectra constitute a continuous family, displaying varying spectral features. Although the limited size of our dataset hinders our ability to make definitive statements, we were able to draw some anecdotal insights from the 11 supernova (SN) spectra present in both phases. Notably, SN2004ej and SN2013fc appear next to each other in the graphs and UMAP representations, as do SN2003iq and SN1992ba. This similarity in positioning suggests a similar temporal evolution of these objects. Furthermore, an Adjusted Rand Index (ARI) of 0.5 was calculated for these 11 objects, indicating a moderate agreement between the group assignments. This result suggests some coherence in how SNe spectra evolve, but also indicates a certain degree of variance between different SNe spectra.
Both traditional classification and our methodology reveal distinct patterns in the UMAP space. However, upon visual inspection of the UMAP projection, it is apparent that our classification scheme presents a more unified and seamless distribution of classes solely based on SN spectra. Although this analysis cannot infer any potential differences in the progenitor, our methodology offers a more quantitative approach to organising SNe based on their spectra alone. Additionally, it demonstrates consistency with more advanced and independent non-linear feature extraction analysis. The motivation behind this step is to provide a validation check of our previous analysis, which involved a series of straightforward steps, from the computation of the dissimilarity matrix to the construction of the network. UMAP, on the other hand, is a non-linear dimensionality reduction technique. The consistency between these two methodologies suggests that a small set of primary properties can explain the SN spectra.
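To make the agreement measure concrete, the Adjusted Rand Index between two sets of group labels (here, hypothetically, the community assigned to each SN at the two phases) can be computed from pair counts, as in the following sketch. The function name `adjusted_rand_index` is illustrative, and the labels in any example are invented rather than taken from the paper.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index: 1 for identical partitions, about 0 for random ones."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Number of object pairs grouped together in both, in A, and in B.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = 0.5 * (pairs_a + pairs_b)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. all objects in a single group
    return (pairs_joint - expected) / (maximum - expected)
```

The index does not depend on how the groups are named, so community numbers at the two phases need not be matched beforehand.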
Similarly to any data-driven approach, the results presented in this work are bounded by the amount of information contained in the initial data set. Nevertheless, within the constraints imposed by the available data, the results presented here are a first hint of the potential application of such techniques. A larger and more diverse data set would help to verify or refute the relationships suggested here and possibly enable new discoveries.

Conclusions
This study presents a novel heuristic approach for automated spectral classification, which combines pairwise $\ell_1$-norm dissimilarity and graph community detection. The main advantage of this method is its ability to create a continuous representation of the spectral taxonomy, suitable for capturing major groups, outliers, and transitional types. We have applied our method to a sample of supernova spectra from various catalogues and analyzed it in two spectral phases: maximum light and end of the plateau phase.
Around maximum light, our method effectively captures the fast evolution of Type II supernova spectra, translating it into a structured graph. Additionally, it highlights their similarities and enables their representation as a continuum ranging from featureless and continuum-dominated to Hα-dominated groups. On the other hand, the spectra around the end of the plateau phase give rise to a less complex graph structure, revealing the more extensive homogeneity among the members of this group.
Independent analysis using manifold learning projection confirms this result. This methodology is versatile and should be readily applicable to other unsupervised spectral classification problems or, more broadly, to other types of functional data in astronomy, such as photometric time series. The consistency between the two completely distinct analyses (graph-based clustering and UMAP) and the ability to plug in a physically motivated distance metric between spectra are further advantages. This approach represents a more flexible paradigm than traditionally employed classification schemes. A code snippet to map tabular data into a network visualization is publicly available within the COINtoolbox.
Overall, this study presents a powerful new tool for the analysis of spectral data, which may help to enhance our understanding of the underlying physical mechanisms driving the observed phenomena.
The visualisation of SN spectra through a two-dimensional embedding by UMAP is presented in two panels. The left panel shows the graph-based community groups using Manhattan distance, while the right panel showcases the classification based on Euclidean distance between the SNe spectra. Each SN is distinguished with a unique colour and shape corresponding to its classification on each panel. Note, however, that the specific colour and shape assigned to each cluster may vary between the panels.