
On the Evolutionary History of a Simulated Disk Galaxy as Seen by Phylogenetic Trees


Published 2024 February 15 © 2024. The Author(s). Published by the American Astronomical Society.
Citation: Danielle de Brito Silva et al 2024 ApJ 962 154, DOI 10.3847/1538-4357/ad036a


0004-637X/962/2/154

Abstract

Phylogenetic methods have long been used in biology and more recently have been extended to other fields—for example, linguistics and technology—to study evolutionary histories. Galaxies also have an evolutionary history and fall within this broad phylogenetic framework. Under the hypothesis that chemical abundances can be used as a proxy for the interstellar medium's DNA, phylogenetic methods allow us to reconstruct hierarchical similarities and differences among stars—essentially, a tree of evolutionary relationships and thus history. In this work, we apply phylogenetic methods to a simulated disk galaxy obtained with a chemodynamical code to test the approach. We found that at least 100 stellar particles are required to reliably portray the evolutionary history of a selected stellar population in this simulation, and that the overall evolutionary history is reliably preserved when the typical uncertainties in the chemical abundances are smaller than 0.08 dex. The results show that the shapes of the trees are strongly affected by the age–metallicity relation, as well as the star formation history of the galaxy. We found that regions with low star formation rates produce shorter trees than regions with high star formation rates. Our analysis demonstrates that phylogenetic methods can shed light on the process of galaxy evolution.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Several areas of evolutionary science investigate evolutionary histories with phylogenetic methods, including biology, language, and astronomy (Baum et al. 2005; Gray et al. 2009; Ricker et al. 2014; Jofré et al. 2017; Yaxley & Foley 2019; Jackson et al. 2021; Bromham et al. 2022). Phylogenetic methods were originally developed in the context of biology studies, when Charles Darwin described patterns of descent among organisms as an evolutionary tree (Darwin 1859). It was a century later that DNA was identified as the information that is passed from one generation to the next, connecting different life forms in the hierarchical way that Darwin had illustrated. This happens because the DNA replication between progenitor and offspring is not perfect, i.e., the new DNA is modified. Modifications accumulate over time, causing the life forms to differ more with time. If one population is divided and each subgroup is isolated, their evolution and cumulative modification will occur independently. This process is named diversification and produces a hierarchy. Nowadays, DNA is widely used as an input to estimate phylogenetic trees, allowing the exploration of the shared evolutionary histories of an immense variety of living organisms (Bromham 2008; Yang 2014).

This approach considers two main concepts. The first concept is heritability and the second is descent with modification. Heritability considers that there is information passed from one generation to the next one. Descent with modification stands for the knowledge that a characteristic transferred from one generation to the next one suffers small changes. These changes accumulate over time and if there is also diversification, a hierarchy in similarity is formed. Due to hierarchical similarity, related organisms have more similar characteristics.

The chemical evolution of galaxies respects both the concepts of heritability and descent with modification. Chemical evolution in galaxies is linked to stellar nucleosynthesis (Burbidge et al. 1957; Tinsley 1979; Matteucci 2012). At the last stages of evolution, stars pollute the interstellar medium (ISM) with the chemical elements they synthesized during their lifetimes, causing the modification of the chemical composition of the ISM of their parent galaxy. The enriched ISM will later be the origin of new generations of stars that are chemically altered with respect to the previous generation. A large fraction of the stars formed in each episode are low-mass objects, hence they live longer than this cycle of new stars forming and their atmospheres preserve the chemical composition of their birth environment. In this way, the chemical abundances of low-mass stars can be considered as a proxy for the ISM's DNA (Freeman & Bland-Hawthorn 2002) and are very important to unveil the history of the Galaxy.

Luckily, chemical abundances on an industrial scale are now available, which is revolutionizing the field of Galactic archeology, both due to direct discoveries from the data, as well as because they are necessary to validate chemical evolution models. In particular, thanks to surveys such as GALAH (Buder et al. 2020), APOGEE (Majewski et al. 2017; Abolfathi et al. 2018; Holtzman et al. 2018), and Gaia (Gaia Collaboration et al. 2016a, 2016b, 2018; Brown et al. 2021; Eyer et al. 2023; Recio-Blanco et al. 2023), chemical abundances up to millions of stars are now available to better explore the processes that shaped the Galaxy.

As an example of the power of chemical abundances to unveil the past of the Milky Way, it is possible to remark on the ongoing extensive search for the building blocks of the Milky Way. Nissen & Schuster (2010) found two different sequences in halo stars: one sequence containing stars enhanced in α elements (attributed to an ancient disk or bulge, which had its orbit heated due to a past merger event) and another sequence that is α-poor (an accreted dwarf galaxy). Hawkins et al. (2015) found a population of α-poor stars with abundances of Al, C + N, and Ni that is different from α-rich stars, indicating that the population had a different chemical enrichment history from the bulk of the Milky Way. Later works found evidence of a major-merger event using, among other information, chemical abundances. This major-merger event is believed to have occurred between the Milky Way and a galaxy whose remnant stellar population is now known as the Gaia Enceladus Sausage (GES; Belokurov et al. 2018; Helmi et al. 2018). Carrillo et al. (2022) studied the chemical abundances of 62 stars accreted from GES, considering a wide wavelength range from the optical to the infrared. They report that accreted stars have enhanced neutron capture abundances when compared with Milky Way stars, in particular of Eu, indicating differences in the chemical evolution of GES when compared with the Milky Way (see also Aguado et al. 2021; Matsuno et al. 2021; De Brito Silva et al. 2022, D. de Brito Silva 2023, in preparation). Buder et al. (2022) used GALAH chemical abundances to study accreted stars and concluded that they are chemically different from stars born in situ in terms of Cu, Mg, Si, Na, Al, Mn, Fe, and Ni. Horta et al. (2023) used data from Gaia and APOGEE to characterize 12 halo substructures, which were candidates to have accreted origins. We note that these are only a few examples, but numerous other works have made remarkable contributions to this topic.

It is undeniable how important chemical abundances are in order to understand the evolution of the Milky Way. However, several open questions still remain, such as the unknown number of building blocks (i.e., accreted galaxies) that constitute the Milky Way. The building blocks are also not fully characterized. Their detailed chemical abundance distributions, masses, star formation histories (SFHs), and age–metallicity relations (AMRs) are still not defined. Some of the accreted stellar populations attributed to different progenitor galaxies could actually be from the same galaxy, considering the caveats associated with their selection (see Buder et al. 2022; Horta et al. 2023). Currently, multiple works are starting to approach these questions using numerical simulations (e.g., Bignone et al. 2019; Monachesi et al. 2019; Amarante et al. 2022; Carrillo et al. 2024). In this paper, we resort to a novel approach to contribute to answering open questions in Galactic Archeology by applying phylogenetic concepts to galaxy formation.

Phylogeny applied to the chemistry of low-mass stars can be referred to as stellar phylogeny. It was proposed in Jofré et al. (2017), where the authors used 17 chemical elements to perform a phylogenetic study of 22 solar neighborhood stars. They found three groups that had different chemical enrichment rates measured from the relations between the age and other phylogenetic properties. A second stellar phylogenetic study of the Milky Way was performed in Jackson et al. (2021), where they used 78 solar neighborhood stars and 30 chemical elements to explore the Milky Way disk. The goal of that study was to test if more stars and elements would help to understand how the three groups found in Jofré et al. (2017) were related to each other. With the aid of new Gaia data (Gaia Collaboration et al. 2016a, 2016b, 2018; Brown et al. 2021; Eyer et al. 2023), they proposed that one of the three groups was an ancestral population of the groups associated to the thin disk, having a significantly higher star formation rate (SFR), due to perhaps a starburst during the first epochs of the thin-disk formation.

While studies have explored stellar phylogenies in observed data, using simulated data has become key to helping the interpretation of trees. The advantage of working with numerical simulations for these purposes is that they provide the full evolution of baryons as the gas is transformed into stars and chemical elements are produced and injected into the ISM where the stars evolve. Since the chemical evolution is known and the simulated stellar populations can be traced back in time, phylogenetic trees estimated from simulated stellar populations can be directly compared to the true evolution, to learn which particular features of the trees can be related to events in the formation and evolution of galaxies. In this paper, we propose to use simulated galaxies to advance the development of stellar phylogeny.

In addition, simulations allow the assessment of the maximum chemical abundance uncertainties for which a phylogenetic signal is sufficiently preserved to provide phylogenetic trees that portray reliable evolutionary histories. Furthermore, with simulated data, it is possible to assess selection effects, since we have information about the entire galaxy.

Stellar phylogeny is still a very new approach and multiple questions about its applicability and interpretation remain open. Some of these questions can be best addressed by using simulations of galaxies. In this work, we use for the first time phylogenetics applied to a simulated disk galaxy in order to answer three specific questions. First, how many stellar particles are required to estimate phylogenetic trees that robustly portray the evolutionary history of this simulated galaxy? Second, how do the uncertainties in the chemical abundance data impact the robustness of the evolutionary history represented by phylogenetic trees? And third, can phylogenetic trees from different regions of a simulated galaxy, which have different histories of formation, illustrate the different evolutionary histories?

In Section 2, we describe how the phylogenetic trees are estimated and how we compare them. In Section 3, we describe the simulation used in this work, as well as the selection of stellar particles used to approach the different specific questions proposed. In Section 4, we present the results and interpretation of our findings. Finally, in Section 6, we present our summary and conclusions.

2. Phylogenetic Tree Construction and Analysis

In this section, we describe how the phylogenetic trees are estimated and compared. An exhaustive analysis of the suitability of phylogenetic trees for the reconstruction of the ISM history is given by C. J. L. Eldridge et al. (2023, in preparation).

2.1. Tree Concepts

To interpret the phylogenetic trees presented in this paper, we focus on key concepts from the trees that involve the branching pattern, the root, and the branch lengths. Extensive explanations of these concepts and their applicability can be found in the seminal books on trees and phylogenetics, such as those by Felsenstein (2004), Hall (2004), Lemey et al. (2004), Baum et al. (2005), and Yang (2014).

The branching pattern is related to the structure or topology of the tree. In biology, the tips represent present-day species, while the internal nodes represent the last common ancestor of all the tips that descend from it. In our case, the tips represent the stellar particles, which are stellar populations with a given age and chemical abundances. Most of these stellar particles are fossil records of an ISM that is now extinct.

The ancestral form of all the objects considered in a tree is the root. We note that estimating a tree with the algorithm we used does not provide a rooted tree, even if many tree reconstruction methods might display trees in rooted form. To root a phylogenetic tree is a delicate procedure, because depending on the root chosen, the ancestor–descendant temporal relationship of the tree changes and so does the reconstruction of the history. There are few ways to find the root, but most of them rely on an evolutionary model developed for biology. As a consequence, we need to consider an alternative approach. Since we are working with a simulated galaxy, and therefore we know the origin of each stellar particle, we can consider the most ancient ones that existed as soon as the ISM started evolving due to chemical enrichment for rooting. Therefore, we set the outgroup as the most ancient stellar particle in the simulation that is related to the ingroup (all other sampled particles) and place the root in the branch that connects that ancient stellar particle with the rest of the tree.

The length of a branch represents the amount of chemical change or chemical divergence between nodes. A tree showing only the topology without the branch length information can be referred to as a cladogram, while a tree that specifies the branch lengths can be referred to as a phylogram. This is important here, because that differs from the usage of dendrograms or some other mathematical tree graphs widely used in astronomy to perform data analysis, such as clustering or classifications: for example, HDBSCAN by Campello et al. (2013); t-SNE by Van der Maaten & Hinton (2008); and random forest by Ho (1995). We can associate a relation of branch length and the age between two tips or between the root and the tips as a measure of the chemical enrichment rate (see also Jofré et al. 2017).

2.2. Estimating Phylogenetic Trees

We use the same methodology thoroughly described in Jackson et al. (2021), which was adapted from Jofré et al. (2017). Briefly, it consists of three steps: (i) the selection of evolutionary traits; (ii) estimating the phylogenetic tree; and (iii) evaluating its robustness.

Encoding evolutionary traits is fundamental, since this has a direct impact on the tree topology and its interpretation. In modern biology, most trees are inferred from sequences of DNA, with each site in the sequence acting as an independent and discrete observation (Drummond & Rambaut 2007; Maddison & Maddison 2009; Hall 2013). In our case, the chemical abundances of stars are continuous. Fortunately, there are methods that use distance matrices, and it is possible to calculate distances from continuous data.

Distance matrices are used to quantify the differences of traits between observations. In the case of our study, our traits are the chemical abundances of each single stellar population, as mentioned above (see also Section 3.1), which in the simulations are represented by stellar particles. The distance matrix is formed by the difference in chemical abundance (or chemical distance) of all the stellar particles we used to estimate a tree in relation to all the other particles. In order to calculate the pairwise distance of the stellar particles, we used the Euclidean distance. The total chemical distance between the stellar particles i and j was calculated as ${D}_{i,j}=\sqrt{{\sum }_{k=1}^{N}{\left({[{{\rm{X}}}_{k}/{\rm{H}}]}_{i}-{[{{\rm{X}}}_{k}/{\rm{H}}]}_{j}\right)}^{2}}$, where N is the number of chemical elements considered. For more details about chemical distances and distance matrices, we refer the reader to Jofré et al. (2017).
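As an illustration of the distance-matrix step, the following is a minimal sketch that assumes the standard Euclidean definition (the square root of the summed squared abundance differences). The function name and the abundance values are hypothetical and are not taken from the simulation.

```python
import math

def chemical_distance_matrix(abundances):
    """Pairwise Euclidean distances between rows of [X/H] values.

    `abundances` is a list of rows, one per stellar particle, each row
    holding the [X/H] values of the same N elements in the same order.
    """
    n = len(abundances)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(abundances[i], abundances[j])))
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist

# Hypothetical [X/H] values for three stellar particles and two elements.
example_abundances = [[0.0, 0.0], [0.3, 0.4], [-0.6, -0.8]]
D = chemical_distance_matrix(example_abundances)
```

The resulting matrix is symmetric with a zero diagonal, which is the input form expected by distance-based tree methods.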

From the distance matrices, the phylogenetic trees are estimated with the neighbor-joining (NJ; Saitou & Nei 1987; Gascuel & Steel 2006) algorithm, which assesses the distances to find the most probable evolutionary sequence. This algorithm, unlike others available in the literature, does not compel equal distance between the root of the tree and any of the tips. This is an important consideration, because it is known that chemical evolution differs from place to place and from chemical element to chemical element (e.g., Matteucci 2012; Maiolino & Mannucci 2019; Johnson et al. 2023). Apart from this assumption that agrees with our knowledge of the chemical evolution of galaxies, NJ methods can be used to infer phylogenies from distance matrices (Kuhner & Felsenstein 1994; Atteson 1997; Lemey et al. 2004; Mihaescu et al. 2009; Jofré et al. 2017; Jackson et al. 2021). The NJ method has the advantage of being very fast and simple to implement, which satisfies our needs, since we aim to empirically test phylogenetic approaches in a data set that is not one governed by the biological law of evolution. For more fundamental discussion about the usage of NJ trees in galaxy evolution, we refer to C. J. L. Eldridge et al. (2023, in preparation).
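To make the NJ step concrete, here is a compact sketch of the textbook neighbor-joining recursion (Saitou & Nei 1987) in plain Python. It is not the implementation used in this work. The edge-list output format, the internal node labels, and the five-tip example matrix are illustrative assumptions.

```python
def neighbor_joining(dist, names):
    """Return an unrooted tree as a list of (node_a, node_b, branch_length)."""
    # Working copy of the distances, keyed by node label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of every active node.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair that minimizes the Q criterion.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((u, a, len_a))
        edges.append((u, b, len_b))
        # Distances from the new node to all remaining nodes.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # final branch joining the last two nodes
    return edges

# Illustrative additive distance matrix for five tips (not simulation data).
tip_names = ["a", "b", "c", "d", "e"]
example_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
edges = neighbor_joining(example_dist, tip_names)
```

Because the branch lengths to the two joined nodes are computed separately, the tips can end up at different distances from any chosen root, which is the property highlighted in the text.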

2.3. Comparing Phylogenetic Trees

2.3.1. Robinson–Foulds Distance

One common method to compare trees is the widely used measure of topological distance between two trees, as defined by Robinson & Foulds (1981), which is referred to as the Robinson–Foulds distance (RFD).

The RFD evaluates how similar two trees are by matching each partition, or split, in one tree with its counterpart in the second tree. The partition distance D is defined as the total number of splits that exist in one tree but not in the other. It can be equivalently defined as the number of contractions and expansions needed to transform one tree into the other. Removing an internal branch by reducing its length to zero is a contraction, while creating an internal branch is an expansion. For a rooted tree with n tips and (n − 2) internal nodes, the partition distance ranges between 0 and ${D}_{\max }=2(n-2)$ (see Yang 2014 for extensive discussion). The RFD considers a performance parameter $P=1-D/{D}_{\max }$ to assess the similarity between trees. We note that the normalized RFD, $D/{D}_{\max }$, varies between 0 and 1, where the smaller the value, the more similar two phylogenetic trees are.
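The partition distance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rooted trees are encoded as nested tuples of tip labels, each split is represented by the set of tips below an internal node, and the normalization uses $D_{\max} = 2(n-2)$ as quoted above. The function names and the two example trees are hypothetical.

```python
def clades(tree):
    """Return (non-trivial clades, all tips) of a rooted nested-tuple tree."""
    found = set()

    def visit(node):
        if not isinstance(node, tuple):      # a tip label
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    root_tips = visit(tree)
    found.discard(root_tips)                 # the root clade is shared by all trees
    return found, root_tips

def robinson_foulds(tree_a, tree_b):
    """Return (D, D / D_max) for two rooted trees on the same tips (n > 2)."""
    clades_a, tips_a = clades(tree_a)
    clades_b, tips_b = clades(tree_b)
    if tips_a != tips_b:
        raise ValueError("both trees must be built from the same set of tips")
    d = len(clades_a ^ clades_b)             # clades present in only one tree
    d_max = 2 * (len(tips_a) - 2)
    return d, d / d_max

# Two hypothetical five-tip trees that differ in one grouping.
tree_a = ((("a", "b"), "c"), ("d", "e"))
tree_b = ((("a", "c"), "b"), ("d", "e"))
rf_distance, rf_normalized = robinson_foulds(tree_a, tree_b)
performance = 1.0 - rf_normalized
```

In this example the trees disagree only on whether "a" groups first with "b" or with "c", so two clades are unmatched out of a maximum of six.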

There are a few limitations on using the RFD. First, as it only focuses on splits in the trees, it does not consider the branch length as information for similarity. Second, some deep relationships in the tree might be neglected for trees in which the splits of the outer nodes are different, despite sharing internal nodes. This implies that while the performance of the RFD ranges between 0 and 1, two random trees normally differ by 80%.

We comment that the RFD parameter can only be calculated for a tree estimated from the same set of objects. It serves thus to compare different input data, but not to compare different sets of objects, since the identification of splits in different trees cannot be matched. In order to calculate the RFD, we used the R library TreeDist (Smith 2020a, 2020b, 2022) and the module TreeDistance, which follows Smith (2020a) and uses the concepts of entropy and information described in MacKay & Mac Kay (2003).

2.3.2. Consensus Tree

While tree distances are a measure of how different trees are, consensus trees summarize common features about a collection of trees. In the same way as the RFD, a consensus tree can be obtained when the set of objects used to estimate trees is the same.

In this work, we consider the majority-rule consensus tree, which shows the branches and splits that are present in the majority of the trees. Majority is defined as more than 50%. A consensus tree is a summary tree that essentially selects the nodes that appear in more than half of the trees and rejects all other nodes. Rejected nodes are transformed into polytomies, i.e., nodes from which more than two branches descend (Baum et al. 2005). There are two types of polytomies: hard and soft. Hard polytomies are associated with multifurcations in the tree, while soft polytomies are associated with unresolved relationships in the tree. Soft polytomies are an indication of lower phylogenetic resolution in the tree. Hence, polytomies can imply a particular extreme event that might give rise to several evolutionary paths, but in a consensus tree they might illustrate a lack of accuracy in the data to resolve the branching pattern of the historical events. Therefore, while consensus trees are not ideal to study the evolutionary history of a galaxy, they are extremely useful to study the global properties of a set of phylogenetic trees, since they display their common features.

It is worth noting that polytomies in a consensus tree are a way to illustrate uncertainties, and do not represent a particular evolutionary event that could cause a large divergence of lineages. It is therefore not encouraged to interpret evolutionary histories with consensus trees because the polytomies can easily lead to wrong interpretations.
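The majority rule described in this section can be sketched as a count of how often each clade appears across a collection of trees. The nested-tuple tree encoding, the function names, and the three example trees below are assumptions made for illustration. Building and drawing the collapsed tree is left out.

```python
from collections import Counter

def clade_sets(tree):
    """Non-trivial clades (frozensets of tips) of a rooted nested-tuple tree."""
    found = set()

    def visit(node):
        if not isinstance(node, tuple):      # a tip label
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    found.discard(visit(tree))               # drop the clade containing all tips
    return found

def majority_rule_clades(trees):
    """Map each clade found in more than 50% of the trees to its support."""
    counts = Counter()
    for tree in trees:
        counts.update(clade_sets(tree))
    return {clade: k / len(trees)
            for clade, k in counts.items() if k > len(trees) / 2}

# Three hypothetical trees estimated from the same five tips.
example_trees = [
    ((("a", "b"), "c"), ("d", "e")),
    ((("a", "c"), "b"), ("d", "e")),
    (("a", "b"), ("c", ("d", "e"))),
]
consensus = majority_rule_clades(example_trees)
```

Clades that fail the threshold, such as {a, c} here, are the ones that would be collapsed into polytomies in the consensus tree.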

3. Simulated Data

In this work, we use the data of a simulated disk galaxy. The information available from the simulation will be used to characterize the level of agreement between the evolutionary history traced by the phylogenetic trees and the history of the simulated galaxy. In this way, we take advantage of the information provided by hydrodynamical simulations in several respects. First, chemical abundances and ages for a large number of stellar particles are available. This allows the consideration of selection biases that are common when working with observed data. Second, it provides the opportunity to examine in detail the places and times different stellar particles were formed, which allows an assessment of the reliability of the phylogenetic trees to assign connections. Finally, the simulation provides information about the galaxy studied, from its SFR through time to its AMR and the nucleosynthetic channels that produce different chemical elements. Therefore, by using simulated data, we can estimate phylogenetic trees for which reverse engineering of the evolutionary history traced is possible.

3.1. Simulations

For this paper, we use a pre-prepared simulation of an isolated disk galaxy. This simple initial condition allows us to perform the construction and analysis of the phylogenetic trees in a system that does not receive material (gas inflows or mergers) from the surroundings. It is simple enough to be used as a first test bed for phylogenetic trees. Therefore, this simulated disk galaxy is not expected to represent a real galaxy. From this starting point, we will build up more complex galaxy formation scenarios until reaching maturity in the technique, to adequately apply phylogenetic trees in a cosmological context in future works.

The analyzed simulation was performed by using a version of the P-GADGET-3 code (Springel 2005), which includes a multiphase model for the gas component, metal-dependent cooling, star formation, and supernova feedback, as described in Scannapieco et al. (2005) and Scannapieco et al. (2006). A Chabrier initial mass function is assumed (Chabrier 2003), with lower and upper mass cutoffs of 0.1 and 40 $M_{\odot}$, respectively.

The chemical evolution model includes enrichment by Type Ia (SNe Ia) and Type II (SNe II) supernovae (Mosconi et al. 2001; Scannapieco et al. 2006). The SNe Ia events are assumed to originate from CO white-dwarf binary systems, in which the explosion is triggered when the primary star, due to mass transfer from its companion, exceeds the Chandrasekhar limit. For simplicity, the lifetimes of the progenitor systems (delay times) are assumed to be randomly distributed over the range [0.7, 1.1] Gyr. This simple model for the lifetime distribution produces results consistent with the single-degenerate model (Jimenez et al. 2015). The nucleosynthesis yields of SNe Ia correspond to Iwamoto et al. (1999). SNe II originate from massive stars with lifetimes estimated according to Raiteri et al. (1996). Their nucleosynthesis products are derived from the metal-dependent yields of Woosley & Weaver (1995). The chemical model traces the following 12 different chemical elements: H (hydrogen), $^{4}$He (helium), $^{12}$C (carbon), $^{14}$N (nitrogen), $^{16}$O (oxygen), $^{20}$Ne (neon), $^{24}$Mg (magnesium), $^{28}$Si (silicon), $^{32}$S (sulfur), $^{40}$Ca (calcium), $^{56}$Fe (iron), and $^{62}$Zn (zinc). Initially, the gas component is assumed to have primordial abundances, i.e., $X_{\rm H} = 0.76$, $Y_{\rm He} = 0.24$, and $Z = 0$.

The initial conditions correspond to a disk galaxy composed of a dark matter (DM) halo, a stellar bulge component, and an exponential disk, with a total baryonic mass of $m_{\rm b} \sim 5.2 \times 10^{10}\,M_{\odot}$. The halo and bulge components were modeled by a Navarro–Frenk–White profile (Navarro 1996) and a Hernquist profile (Hernquist 1990), respectively. The gas component is distributed in the disk and accounts for 50% of the total disk mass. The initial gas particle mass is $m_{\rm gas} = 1.96 \times 10^{5}\,M_{\odot}$. The gravitational softening (i.e., a numerical length introduced to avoid unrealistic gravitational forces during particles' close encounters) adopted is 200 pc for the gas and star particles and 320 pc for the DM component.

Each stellar particle represents a single stellar population with the same age and chemical abundances. Hereafter, we will use the standard definition $[{\rm X}/{\rm H}]={\log }_{10}({{\rm X}}_{* }/{{\rm H}}_{* })-{\log }_{10}({{\rm X}}_{\odot }/{{\rm H}}_{\odot })$, where X and H are the abundances of the elements X and H, respectively, in the stellar particle ($*$) and in the Sun ($\odot$). Hence, for each stellar particle, the abundances can be defined by combining the chemical elements described above.
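As a small worked example of this definition, the sketch below evaluates [X/H] from abundance ratios. The function name and the numerical values are hypothetical. The only assumption is that the stellar and solar ratios are expressed in the same convention (by number or by mass), in which case the atomic-weight factor cancels in the difference of logarithms.

```python
import math

def bracket_abundance(x_star, h_star, x_sun, h_sun):
    """[X/H] = log10(X_star / H_star) - log10(X_sun / H_sun)."""
    return math.log10(x_star / h_star) - math.log10(x_sun / h_sun)

# Hypothetical particle whose X/H ratio is one tenth of the solar ratio.
x_over_h = bracket_abundance(x_star=0.001, h_star=1.0, x_sun=0.01, h_sun=1.0)
```

A particle with the solar ratio gives [X/H] = 0, and each factor of ten below solar lowers the value by 1 dex.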

3.2. Data

The simulated galaxy has a strong initial starburst that, while widely spread, is more intense in its central region. After the initial starburst, the star formation activity decreases. We chose to follow the evolution of the system until this time as this allows SNe Ia to take place in the simulation. Since the simulation starts with primordial gas, the first stellar particles that formed will have Z = 0, where Z is the so-called metallicity that quantifies the abundances of elements heavier than He. However, this simulation does not include a model for the formation of such stellar particles, which are known to be different from second-generation ones. Considering this and the fact that chemical abundances are the input parameters to estimate phylogenetic trees, we excluded the stellar particles that have been formed from primordial gas. The stellar particles selected for the analysis have ages ≤1.5 Gyr and −3.0 ≤ [Fe/H] ≤ 0.5, approximately. We used these particles to create different subsamples that were used to explore the different specific questions concerning this analysis.

3.2.1. Stellar Samples

For our study, we perform different selections of stellar particles from different regions of the simulated disk galaxy, as described above. We refer to them as deterministic, noise, Group 01, Group 02, Group 03, and Group 04. They are summarized in Table 1 and explained below.

Table 1. Descriptions of the Different Samples of Stellar Particles Used in This Work

| Sample Name | Description | Sample Size | Used In |
| --- | --- | --- | --- |
| Deterministic | Sphere of 1 kpc radius centered at the position (0,0,0). We only consider stellar particles whose progenitor gas particles were inside the sphere at the beginning of the simulation and that have remained within the same region since they were born. | 761 | Section 4.2 (Phylogenetic Signal in Numerical Simulations); Section 4.3 (Phylogenetic Signal Considering Uncertainties) |
| Noise | Built using randomly created chemical abundances, without any astrophysical meaning. The synthetic chemical abundances respect the range of the distribution observed in the simulation. | 10, 50, 100, and 200 | Section 4.2 (Phylogenetic Signal in Numerical Simulations) |
| Group 01 | Sphere with center at (0,0,0), without the other constraints considered in the deterministic sample (birthplace or location of progenitor gas particle). | 2365 | Section 4.4 (Evolutionary History Considering Different Regions of the Galaxy) |
| Group 02 | Sphere with center at (3,3,0). | 324 | Section 4.4 (Evolutionary History Considering Different Regions of the Galaxy) |
| Group 03 | Sphere with center at (−3,3,0). | 478 | Section 4.4 (Evolutionary History Considering Different Regions of the Galaxy) |
| Group 04 | Sphere with center at (−5,5,0). | 159 | Section 4.4 (Evolutionary History Considering Different Regions of the Galaxy) |

Note. We note that the total number of stellar particles (sample size) refers to the global number of the entire sample and not the number of stellar particles used to estimate the phylogenetic trees. All of the spheres used to select Groups 01, 02, 03, and 04 and the deterministic sample have a radius of 1 kpc.


The deterministic sample is our primary sample and was created to explore the phylogenetic signal based on the number of stellar particles used to estimate the phylogenetic trees (see Section 4.2) and also the impact that uncertainties on the chemical abundances have in this kind of study (see Section 4.3). We wanted this sample to have a history in which older populations directly contributed to the chemistry of the younger populations. In order to select these particles, we defined a sphere of 1 kpc of radius around the galaxy's center of mass at the snapshot that corresponds to 1.5 Gyr. The radius of the sphere is larger than three gravitational softening lengths, but small enough to maximize the possibility that the stellar particles represent populations that have a common chemical history of evolution. Then we chose only the stellar particles whose progenitor gas particle was also in the same region since the beginning of the simulation. We adopt a time of 0.016 Gyr, which corresponds to the first snapshot available of the simulation. Finally, we chose only the stellar particles whose birth radii were also inside the sphere. The central location of the deterministic sample also considers that the particles have low probabilities of experiencing significant migration, since they are located at the center of the gravitational potential well.
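The selection logic of the deterministic sample can be sketched as three positional cuts. The dictionary keys (`pos_final`, `pos_birth`, `pos_gas_initial`) and the example particles are hypothetical and do not correspond to the actual snapshot format. The sketch checks only three epochs, whereas requiring that a particle remained in the region at all times would need a check at every snapshot.

```python
import math

def inside_sphere(position, center=(0.0, 0.0, 0.0), radius_kpc=1.0):
    """True if a 3D position (in kpc) lies within the selection sphere."""
    return math.dist(position, center) <= radius_kpc

def select_deterministic(particles, center=(0.0, 0.0, 0.0), radius_kpc=1.0):
    """Keep particles inside the sphere at 1.5 Gyr, at birth, and as gas
    at the first available snapshot (0.016 Gyr)."""
    return [
        p for p in particles
        if inside_sphere(p["pos_final"], center, radius_kpc)
        and inside_sphere(p["pos_birth"], center, radius_kpc)
        and inside_sphere(p["pos_gas_initial"], center, radius_kpc)
    ]

# Hypothetical particles: only the first one satisfies all three cuts.
example_particles = [
    {"id": 1, "pos_final": (0.1, 0.0, 0.0), "pos_birth": (0.2, 0.2, 0.0),
     "pos_gas_initial": (0.5, 0.0, 0.0)},
    {"id": 2, "pos_final": (0.1, 0.0, 0.0), "pos_birth": (2.0, 0.0, 0.0),
     "pos_gas_initial": (0.1, 0.0, 0.0)},
    {"id": 3, "pos_final": (0.0, 0.0, 0.5), "pos_birth": (0.0, 0.0, 0.3),
     "pos_gas_initial": (0.0, 1.5, 0.0)},
]
selected = select_deterministic(example_particles)
```

Dropping the birth and progenitor-gas conditions and changing `center` gives the looser kind of spatial selection used for the regional groups.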

The noise sample was built by replacing the chemical abundances of the deterministic sample by random chemical abundances. The random chemical abundances were generated within the range of the deterministic sample. Therefore, the noise sample has stellar particles whose chemical abundances have no astrophysical meaning. This sample was included in this study in order to compare how phylogenetic trees from data compare to trees from random chemical abundances and to evaluate the presence of the phylogenetic signal.
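A noise sample of this kind could be generated as in the sketch below. The text specifies only that the random abundances lie within the range of the deterministic sample. Drawing each element independently from a uniform distribution is an assumption of this example, as are the function name and the input values.

```python
import random

def make_noise_sample(abundances, seed=0):
    """Replace each abundance by a random value within that element's range.

    Assumption: values are drawn uniformly and independently per element.
    """
    rng = random.Random(seed)
    n_elements = len(abundances[0])
    lows = [min(row[k] for row in abundances) for k in range(n_elements)]
    highs = [max(row[k] for row in abundances) for k in range(n_elements)]
    return [[rng.uniform(lows[k], highs[k]) for k in range(n_elements)]
            for _ in abundances]

# Hypothetical [X/H] values for four stellar particles and two elements.
real_abundances = [[-1.2, -0.9], [-0.5, -0.3], [0.1, 0.2], [-2.0, -1.7]]
noise_abundances = make_noise_sample(real_abundances, seed=42)
```

Because the elements are drawn independently, any correlation between elements that chemical enrichment produces in the real sample is removed, which is what makes such a sample a useful null comparison.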

Finally, Groups 01, 02, 03, and 04 are used to explore the evolutionary histories of different regions of the galaxy (see Section 4.4). We selected stellar particles in four different spheres at different galactic radii. All the spheres have a radius of 1 kpc, like the deterministic sample. Unlike the deterministic sample, however, here we perform no further selections on the birth radii or the location of their progenitor gas particles, hence allowing the particles to come from outside the corresponding sphere. Group 01 was built from a sphere centered at (x, y, z) = (0, 0, 0) kpc. Group 02 was built around the position (x, y, z) = (3, 3, 0) kpc. Group 03 is from a sphere centered at (x, y, z) = (−3, 3, 0) kpc. Group 04 was selected around the position (x, y, z) = (−5, 5, 0) kpc. Groups 02 and 03 were selected to assess possible azimuthal variations; of the two, only Group 03 includes stellar particles from a spiral arm.

Figure 1 shows the spatial distribution of the four regions studied. Group 01 (green) contains 2365 stellar particles. Group 02 (blue) contains 324 stellar particles. Group 03 (pink) has 478 stellar particles. Finally, Group 04 (red) has 159 stellar particles. The colors associated with each group are used consistently in the rest of this work. In gray, we show the spatial distribution of all stellar particles at 1.5 Gyr. The difference in the number of particles in these regions is due to the different gas densities in the simulation, which follow an exponential profile. This has an impact on the SFH and therefore the chemical enrichment.

Figure 1. Face-on (upper panel) and edge-on (lower panel) spatial distribution of the stellar populations in the four defined groups: Group 01 (green), Group 02 (blue), Group 03 (pink), and Group 04 (red). The gray points represent the whole distribution of stellar populations in the simulated galaxy. Each selected group maps a spherical volume of 1 kpc radius.

The deterministic and noise samples are used to assess the dependence of the phylogenetic signal on the number of stellar particles selected to estimate the trees. Hence, we created subsamples containing 10, 50, 100, and 200 stellar particles. Groups 01–04 are used to study the physical information that can be retrieved by the phylogenetic trees. For these groups, we selected 100 stellar particles to represent the stellar population of the corresponding region, based on the results of the analysis of the deterministic and noise samples. We applied a Kolmogorov–Smirnov test (K-S test) to guarantee that every subsample of 100 stellar particles provided a fair representation of the properties of their parent sample. We rejected the null hypothesis if the p-value was lower than 0.05. The K-S test considered the distributions of [Fe/H], [O/Fe], and star formation time.
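The subsample check described above can be sketched with a two-sample K-S test from SciPy. The redraw-until-accepted loop, the function name, and the toy parent sample are assumptions made for illustration; only the 0.05 threshold and the three tested properties come from the text.

```python
import numpy as np
from scipy.stats import ks_2samp

def draw_representative_subsample(properties, n_sub, rng, alpha=0.05, max_tries=1000):
    """Draw n_sub particles at random and accept the draw only if, for every
    property, the K-S test does not reject that subsample and parent share
    the same distribution (p-value >= alpha).

    properties: dict mapping a property name to a 1D array over the parent sample.
    Returns the indices of the accepted subsample.
    """
    n_parent = len(next(iter(properties.values())))
    for _ in range(max_tries):
        idx = rng.choice(n_parent, size=n_sub, replace=False)
        if all(ks_2samp(values[idx], values).pvalue >= alpha
               for values in properties.values()):
            return idx
    raise RuntimeError("no representative subsample found")

# Toy parent sample with the three properties considered in the text.
rng = np.random.default_rng(0)
parent = {
    "feh": rng.normal(-0.5, 0.3, 500),       # [Fe/H]
    "ofe": rng.normal(0.2, 0.1, 500),        # [O/Fe]
    "t_form": rng.uniform(0.0, 1.5, 500),    # star formation time (Gyr)
}
idx = draw_representative_subsample(parent, 100, rng)
```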

3.2.2. Input Information for Trees

We used the chemical abundances of 10 chemical elements in order to estimate the phylogenetic trees. The chemical elements are: O, Mg, Ca, Si, Ne, S, Fe, Zn, C, and N. They trace different nucleosynthetic channels and provide important information about the chemical evolution processes in the simulation. O, Mg, Ca, Si, and Ne, for example, are α elements, produced mainly by SNe II, while Fe and Zn are iron-peak elements produced mainly by SNe Ia. In the case of C and N in this simulation, the production is done only by SNe Ia and SNe II, as winds from asymptotic giant branch (AGB) stars are not included in our simulation.

The chemical abundances were defined in relation to hydrogen and the Sun, in the format [X/H], as defined in Section 3.1. We chose this format in order to have a more direct parallel between the abundances in this work and the observational works on the chemical evolution of the Milky Way. Another reason for this choice is to have Fe as an independent element to estimate phylogenetic trees. We note that a galaxy that might experience the inflow of pristine gas can have a trend of [X/H], which is not monotonic. In the case of this simulated galaxy, there is no inflow of pristine gas, therefore the ratio [X/H] can be used without that concern.

The chemical abundances provided by the simulation do not have intrinsic uncertainties, therefore each tree we estimate is the result of one distance matrix that is the result of the simulated abundances. When studying the impact of uncertainties on the evolutionary history provided by the phylogenetic trees, we varied the original abundance value considering a normal distribution. In order to do so, we created normal distributions where their mean was the original abundance value and the uncertainties (σ) were 0.01, 0.05, 0.08, 0.1, 0.2, and 0.3 dex. The widths of the normal distribution were chosen in order to investigate uncertainties found in standard observational studies (e.g., 0.1, 0.2, and 0.3 dex) and also in high-precision studies (e.g., 0.01 and 0.05 dex), while considering intermediate cases to better delimit the maximum uncertainties possible for which a phylogenetic signal is mostly preserved (e.g., 0.08 dex).
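The perturbation step can be illustrated as follows. This is a sketch under stated assumptions rather than the pipeline used here: the function name and the toy abundance array are hypothetical, and the same σ is applied to every element, which the text implies but does not spell out.

```python
import numpy as np

# Uncertainty values (dex) explored in the text.
SIGMAS_DEX = (0.01, 0.05, 0.08, 0.1, 0.2, 0.3)

def perturb_abundances(abundances, sigma_dex, rng):
    """Draw each abundance from a normal distribution centered on its
    original value with standard deviation sigma_dex."""
    return rng.normal(loc=abundances, scale=sigma_dex)

# Toy abundances: 1000 particles, 10 elements, all set to 0 dex for clarity.
rng = np.random.default_rng(1)
original = np.zeros((1000, 10))
perturbed = {sigma: perturb_abundances(original, sigma, rng) for sigma in SIGMAS_DEX}
```

Each perturbed array would then be turned into a distance matrix and a tree in the same way as the unperturbed abundances.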

4. Results and Interpretation

In this section, we present the results we obtained in three tests, performed using the different samples discussed in Section 3.2.1. The astrophysical properties of the samples used here are discussed in Section 4.1. In our first test, we explore the phylogenetic signal provided by trees when we vary the number of stellar particles (Section 4.2). Then, we investigate the impact of the chemical abundance uncertainties on the evolutionary history traced by the trees and in the phylogenetic signal (Section 4.3). Finally, we explore the evolutionary history found in different regions of the simulated galaxy and its connection with the AMR and SFH of the location (Section 4.4).

4.1. Astrophysical Properties of the Different Samples Used

In order to explore the astrophysical properties of the different samples used in this work, their SFHs, AMRs, and [O/Fe] versus [Fe/H] distributions are considered. Oxygen is a chemical element mostly deposited in the ISM due to SNe II, which are explosions of massive stars, while the production of Fe by SNe Ia and SNe II varies according to the yields adopted. Hence, the deviation of [O/Fe] from what is typically found in SNe II ejecta represents the contribution from low-mass stars. As a consequence, the ratio [O/Fe] is a powerful diagnostic of the low- and high-mass star contributions to the chemical evolution of the ISM, which happen over different timescales because of stellar evolution. The AMR is also key, since it shows how the metallicity of the environment changes with time. Finally, from the SFH, we can identify when star formation, hence chemical enrichment, has been most prominent in the simulation, or how star formation might vary in the different samples studied. We therefore use the [O/Fe] versus [Fe/H] distribution, the AMR, and the SFH to guide the interpretation of the evolutionary history traced by the phylogenetic trees.

Figure 2 shows the cumulative stellar-mass fraction as a function of the stellar ages of the populations within each analyzed sample. The deterministic sample is in yellow, and Groups 01, 02, 03, and 04 are in green, blue, pink, and red, respectively. The dashed horizontal lines represent the 50th and 80th percentiles of the stellar-mass contribution. This figure allows us to compare the SFHs of the different samples. We note that Group 04 forms 80% of its stellar mass in a considerably shorter timescale than Group 01, reflecting that the outskirts of the galaxy formed the majority of their stellar mass faster than the center of the galaxy at the given time. We also observe that Group 03 creates 80% of its stellar mass faster than Group 02.
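A cumulative stellar-mass fraction of this kind, and the age at which a given fraction is reached, could be computed as in the sketch below. The function names and toy values are hypothetical, and accumulating the mass from the oldest to the youngest particle is an assumption about how the curve is built.

```python
import numpy as np

def cumulative_mass_fraction(ages_gyr, masses):
    """Sort particles from oldest to youngest and accumulate their mass,
    normalized by the total stellar mass of the sample."""
    order = np.argsort(ages_gyr)[::-1]  # oldest particle first
    return ages_gyr[order], np.cumsum(masses[order]) / masses.sum()

def age_at_fraction(ages_gyr, masses, fraction):
    """Age of the particle at which the cumulative mass fraction first
    reaches the requested value (e.g. 0.5 or 0.8)."""
    ages_sorted, cum_frac = cumulative_mass_fraction(ages_gyr, masses)
    index = min(np.searchsorted(cum_frac, fraction), len(ages_sorted) - 1)
    return ages_sorted[index]

# Toy sample: five particles with ages in Gyr and masses in arbitrary units.
ages = np.array([0.3, 1.4, 0.8, 1.3, 1.2])
masses = np.array([1.0, 2.0, 1.0, 2.0, 2.0])
age_50 = age_at_fraction(ages, masses, 0.5)
age_80 = age_at_fraction(ages, masses, 0.8)
```

A sample that reaches the 80% level at an older age has assembled most of its stellar mass earlier, which is the comparison made between the groups above.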

Figure 2. Cumulative stellar-mass fraction as a function of age of all samples considered in this work (see Table 1). The orange, green, blue, pink, and red lines refer to the deterministic, Group 01, Group 02, Group 03, and Group 04 samples, respectively. The horizontal lines indicate the 50th and 80th percentiles of the stellar-mass contribution.

In Figure 3, we show the SFH, the AMR, and the [O/Fe] versus [Fe/H] diagrams for the different samples studied here. Each row of the figure is a different sample. The gray background points correspond to all the 31,807 stellar particles at 1.5 Gyr that passed our first selection criterion (i.e., have Z higher than 0) and are therefore the same in all rows. In color, we show all stellar particles selected for each sample. The star symbols correspond to a random selection of 100 particles, referred to as example samples. We make this selection because, to avoid visual cluttering, we can only estimate trees with a limited number of stellar particles; we therefore need to assess whether these selections are a good representation of the entire sample. These are the 100 particles selected using the K-S test and displayed in the trees of the following sections, and we can see that in every sample, they are well distributed with respect to the main sample.

Figure 3. Example of astrophysical properties of the samples studied. Each row represents one of the following samples, from top to bottom: deterministic, Group 01, Group 02, Group 03, and Group 04. Left: star formation history (SFH). Center: age–metallicity relation (AMR). Right: [O/Fe] vs. [Fe/H] relation. Gray: all stellar particles with chemical abundances available in the simulation at 1.5 Gyr. The dark colors represent all possible stellar particles from each sample. The star symbols represent the chosen 100 particles used to estimate phylogenetic trees in this work.

Looking at the left column of Figure 3, we see that the peak of the star formation happened at the start of the galaxy's evolution. This peak is seen across the different samples, although it lasts for longer in the central part of the galaxy. One can see that both the deterministic and the Group 01 samples have an SFH that peaks at 1.4 Gyr and decreases gradually over approximately 0.3 Gyr, while Groups 02, 03, and 04 have peaks that last only for about 0.1 Gyr. There is still star formation happening during the rest of the history of this galaxy across all regions, but at a much lower rate.

It is expected that a galaxy that evolved in isolation would not present further enhancement of the star formation after the first peak, which is driven by the formation of the arms in this simulation. The newborn stellar populations tend to be concentrated in the central regions following the initial gas density distribution, but they will also populate the denser regions of arms. After this, the star formation self-regulates, consuming the remaining gas (recall that there are no external gas inflows or mergers in this isolated case) into stars, which subsequently injects SN feedback into the ISM. The energy increases the temperature and pressure and contributes to regulate the star formation activity, producing a more continuous star formation activity with the same weak star formation bursts.

The AMRs in the middle panels show the relation between chemical enrichment and the SFH. Since a lot of stellar particles are formed at the beginning of the galaxy's history, it is expected that chemical enrichment will happen quickly, particularly in the central regions where the gas density is highest. The AMR is therefore expected to be steep for stellar particles formed at the epoch of the star formation peak. We observe that happening in all regions. Once the star formation has slowed down, the metallicity slightly increases. We can note some differences among regions. The AMR in the central regions increases more monotonically, which is an effect of more significant star formation happening over a longer period of time with respect to the outer regions. This can also be seen from the cumulative mass fraction of Figure 2, where the central region forms 50% or 80% of its stellar particles later than the outer regions. The level of metal enrichment reached by each stellar population is also different, with the central regions being systematically more enriched, as expected.

The AMRs of Groups 02, 03, and 04 show a breaking point around 1.3 Gyr, which is related to the abrupt change of star formation activity at that time. The AMR of Group 04 has very few stellar particles with ages younger than about 1.2 Gyr. In fact, from Figure 2, we see that 80% of the stellar mass in that region was formed 1.2 Gyr ago. It is thus more difficult to attribute these stellar particles as a population that is following one chemical evolution path through an ancestor–descendant relationship.

In all the analyzed samples, we observe a decrease of [O/Fe] with the increase of [Fe/H], as expected according to the chemical evolution of galaxies. In the first stages of the evolution of the simulation, multiple SNe II occur, producing O in great quantity. SNe Ia progenitors have longer lifetimes, therefore only at later stages is Fe deposited in the ISM in a more substantial way, decreasing [O/Fe]. This is seen in every panel.

It is worth commenting on the differences between the deterministic and the Group 01 samples, since both concern the same region in the galaxy, namely the central one. We note that the deterministic sample is a subset of Group 01, since we impose that both the stellar and the gas particles residing at the end of the simulation must have stayed in the inner region. This results in removing most of the younger particles of Group 01, which shows how much gas flow is ongoing in the central region of the simulation. In Figure 2, it is possible to see how the deterministic sample assembles 80% of its stellar particles around 1.2 Gyr, while Group 01 does it about 0.3 Gyr later.

Groups 02 and 03 are also worth commenting on, since they are selected to study possible asymmetric effects in the disk. It is customary to assume that because of the galactic rotation, disks are axisymmetric, and therefore only the galactic radius is considered as a variable for studying variations in galactic structure and evolution, but the presence of the spiral arms might cause some asymmetries. Here, we see that the SFH, AMR, and [O/Fe] versus [Fe/H] have very similar distributions in Figure 3. But we also note that the total number of stellar particles in the two regions is different, which is related to the different densities across the arms. Group 03 is located on a spiral arm. This has an impact on the SFR, as seen from the cumulative mass fraction of Figure 2, where Group 02 assembles 80% of its stellar particles about 0.4 Gyr later than Group 03.

As a consequence of the SFHs, [Fe/H] has a quick increase during the first 0.5 Gyr, but after 1.2 Gyr, it is approximately constant, with a weak increase in some regions depending on the SFH and local characteristics of the ISM. That delayed enrichment of SNe Ia relative to SNe II causes [O/Fe] to decrease as metallicity increases across the entire galaxy, as a result of the interplay between the chemical production of O and Fe caused by stars of different lifetimes. We also show in Figure 3 that our selection of 100 particles from our samples is a fair representation of the particles in that sample. We estimate the trees to explore the impacts of these different SFHs and AMRs in these regions in the following sections.

4.2. Phylogenetic Signal in Numerical Simulations

In this section, we focus on the deterministic sample to study if there is a phylogenetic signal in our simulation. To do so, we first compare our trees with the noise sample, to ensure we are obtaining results that are different than a random distribution, and then interpret our trees in the context of historical reconstruction. We used as the root of the trees the oldest stellar particle for which chemical abundances were available, as discussed in Section 2.1.

4.2.1. Trees from Chemical Abundances Obtained from Simulated Data or from a Random Distribution

In this section, we investigate the dependence of the phylogenetic signal on the population density, specifically the number of stellar particles used to construct it within a given volume. This analysis is highly relevant, as it allows us to determine the minimum number of stellar particles necessary to extract a signal that surpasses numerical noise in the simulation as well as natural stochasticity.

Figure 4 shows an example of a tree estimated using the deterministic sample and one tree estimated using the noise sample. Both trees were estimated by using subsamples of 100 stellar particles selected at random from the corresponding volume (see Table 1 for detailed definitions of these subsamples). In this figure, we can see that the two trees are very different from each other in their general aspect.

Figure 4. Left: example tree estimated from the deterministic selected stellar particles. The stellar particles represent a single stellar population. Right: example tree estimated from the noise sample (see Table 1). In order to estimate both trees, we included an outsider stellar particle that corresponds to the oldest stellar particle in the simulation for which chemical abundances are available. Both the trees presented in the left and right panels were rooted in this stellar particle, for a better comparison. The scale at the bottom of each panel refers to the branch length (total chemical difference).

The most notable difference between the trees is the branching pattern; in particular, the number of main branches. The tree from the deterministic sample shows one main branch, i.e., the tree is very asymmetric or imbalanced. Moreover, the branch lengths that connect the tips to nodes are very short. We recall that nodes in biology reflect the last common ancestor of the descendant lineages. Here, since almost all the nodes have at least one descendant lineage that connects directly to a tip, one might infer that we are sampling the ancestral states and directly tracing the ancestor–descendant relationships of the stellar particles.

The noise tree, on the contrary, has long branches, especially at the tips. All nodes are therefore a representation of a state that is very different from the tips and not directly sampled in the data. Moreover, the internal branches are shorter than the external ones, which reflects that the differences in this sample are driven by randomness and not by internal hierarchical structures: this branching pattern shows that much of the chemical distance between the stellar particles is not explained by the inferred phylogenetic relationship and is instead deposited in the tips. The tree shows an even distribution of branches that bifurcate from nodes from the root to the tips (i.e., it is a symmetric or balanced tree).

As discussed in Jackson et al. (2021), imbalanced trees happen when there is gradual evolution of a single lineage through time. Differences between traits can therefore be traced as information passed through generations, but they still might represent the evolution of the same population. Balanced trees might reflect rather the differentiation of populations and processes that cause populations to evolve independently from each other. In astronomy, so far stars or stellar particles whose chemical abundances are the result of a shared chemical evolution history produce very imbalanced trees. That was found in Jackson et al. (2021), in Walsen et al. (2023), and in K. J. Yaxley et al. (2023, in preparation; both with solar twin observed data), in C. J. L. Eldridge et al. (2023, in preparation), and throughout this article.

Based on these findings, we can report that the trees constructed from the simulated chemical abundances successfully capture a discernible phylogenetic signal that deviates from noise. We will now investigate the minimum number of members required in the sample to attain this objective and, hence, justify the use of 100 members as adopted above.

Figure 5 shows the RFD (see Section 2.3) between the deterministic and noise samples. Here we attempt to quantify the difference between a tree estimated from the deterministic sample and from the noise sample (e.g., comparing the trees displayed in Figure 4). We consider trees estimated using 10, 50, 100, and 200 stellar particles. We repeat this comparison 1000 times, each time randomly selecting particles from the deterministic sample and the noise sample. The yellow distribution represents these 1000 RFD estimates. This figure also shows the RFD obtained between two noise samples. In the same fashion as with the deterministic sample, we randomly select particles 1000 times from the noise sample and compare the resulting trees. The RFD distribution in this case is represented with the gray color. We recall that the higher the RFD, the more different the trees are from each other. Therefore, when the mean RFD of the yellow distribution is larger than the mean RFD of the gray distribution, we consider we have a phylogenetic signal. Also, to have trees that are generally different from noise, it is preferable that the two distributions do not overlap.
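To make the RFD comparison concrete, the sketch below computes a normalized Robinson-Foulds distance between two small rooted trees written as nested tuples. It is an illustration only: the clade-based (rooted) variant, the normalization by the total number of non-trivial clades, and the tree representation are assumptions of this sketch and may differ from the definition in Section 2.3.

```python
def clades(tree):
    """Return the non-trivial clades of a nested-tuple tree, each as a
    frozenset of leaf labels (single leaves and the full leaf set are excluded)."""
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(leaves_below(tree))
    return found

def normalized_rf(tree_a, tree_b):
    """Number of clades found in only one of the two trees, divided by the
    total number of non-trivial clades in both (0 = identical topologies)."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    total = len(clades_a) + len(clades_b)
    return len(clades_a ^ clades_b) / total if total else 0.0

# An imbalanced ("caterpillar") tree and a balanced tree on the same tips.
imbalanced = ("root", ("a", ("b", ("c", "d"))))
balanced = ("root", (("a", "b"), ("c", "d")))
rfd = normalized_rf(imbalanced, balanced)
```

In this toy case the two trees share two of their three non-trivial clades, so the distance is 2/6; repeating such a comparison over many random draws gives distributions like those described above.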

Figure 5. RFD distributions for the trees estimated with stellar particles selected from the deterministic sample compared to the noise (yellow histograms) and noise-only (gray histograms) samples. The top left, top right, bottom left, and bottom right represent respectively the cases considering 10, 50, 100, and 200 stellar particles. Each panel contains the mean (μ) RFD of the distributions. The larger the RFD, the more different the trees are from each other.

In the case of estimating trees with 10 stellar particles, the RFD distributions of the deterministic-versus-noise comparison and of the noise-only comparison overlap. The mean RFD for the deterministic-versus-noise comparison is 0.77, while for the noise sample alone, the mean is 0.75. The standard deviations (SDs) are respectively 0.08 and 0.09. Hence, we interpret that trees estimated from noise containing 10 stellar particles are not more similar to each other than they are to trees estimated from simulated data. When using 50 stellar particles to estimate trees, the distributions of the RFD become more different, but the tails of the distributions still overlap. The mean RFD for the comparison between the deterministic and noise samples is 0.90, while the mean for the noise sample alone is 0.85. The SDs are 0.01 and 0.02, respectively. With 50 stellar particles, we interpret that phylogenetic trees estimated from noise are more similar to each other than they are to a tree estimated using abundances from simulated data.

In the cases of 100 and 200 stellar particles, the distributions of the RFD do not overlap, and the mean of the deterministic distributions increases. In both cases, the RFDs are on average larger in the comparison of trees made from the deterministic and noise samples than among trees from only the noise sample. This indicates that phylogenetic trees estimated from noise are more similar to each other than they are to phylogenetic trees from simulated data containing 100 and 200 stellar particles. In the case of 100 stellar particles, the mean of the RFD between the noise and deterministic samples is 0.93, and in the case of 200 particles that mean is 0.94. The SDs are 0.01 in both cases. The mean RFDs of the noise-against-noise comparison are 0.87 (with an SD of 0.01) and 0.88 (with an SD of 0.01) for 100 and 200 stellar particles, respectively.

We conclude that the more particles we consider, the more our trees differ from a random distribution, but using 50 particles or fewer might still produce some phylogenetic trees whose topologies are comparable with a random tree. When using 100 particles, however, we obtain trees that are always different from noise, therefore we use 100 particles from now on to interpret the phylogenetic signal of our data and reconstruct the history of our simulated galaxy. We note that this result might differ when considering more complex cases or a different resolution for the simulation, and it is possible that more stellar particles might be required in those scenarios to reliably represent the evolutionary history of the system.

4.2.2. Phylogenetic Signal from the Deterministic Sample

We consider the difference between the tree estimated from the abundances resulting from a simulation and the tree estimated from noise as a proxy for a phylogenetic signal. We now investigate if our tree can help us to reconstruct the history of the deterministic sample. Figure 6 shows an example of a deterministic tree color-coded according to age. This is the same tree as the one shown in Figure 4. We can see that this tree has its internal nodes rank ordered according to age. Moreover, the oldest particles are closer to the root, but that is expected if we have used the most ancient particle to root the tree. Given the AMR of this sample (see the top middle panel of Figure 3), it does not come as a surprise that the tree will have a clear directional evolution, since our tree uses [Fe/H] as one of the traits in the distance matrix. The AMR is flat below ages of approximately 1.2 Gyr, and this lack of sensitivity leads to a worse age ranking at the top of the phylogenetic tree relative to the base of the tree. Figure 12 shows this tree, but with particles colored by their [O/Fe].

Figure 6. Example of a phylogenetic tree of the deterministic sample (the same as presented in Figure 4), color-coded according to age.

There is a section in the tree where the neighboring particles do not necessarily have very similar ages. This coincides with the sector in which [O/Fe] mixes. It is possible that this is related to the moment in which SNe Ia events start to occur, which changes the overall chemical enrichment rate. Considering that the distance matrix uses a mix of elements coming from SNe II and SNe Ia, if the rates of their production vary during the history of the galaxy, it might cause particles of different ages to be chemically more similar than coeval particles. This section in the tree corresponds to ages below 1.0 Gyr, which is when the star formation slows down, the AMR becomes flatter, and the [O/Fe] reaches solar values.

It is further interesting to note the branch lengths between the nodes in this tree become shorter along the path of the tree. This might be related to the SFH. At earlier stages of the history, when star formation is at its peak, there is a notable change in chemical abundances, which is represented by the steep AMR (see Figure 3). In the beginning, the gas is very metal-poor, therefore any enrichment is significant compared to its surroundings. This causes long branches. As the star formation slows down, the difference in chemistry becomes smaller, the AMR flatter, and the branch lengths shorter. Therefore, the branching pattern illustrates that the rate of chemical enrichment declines.

Because our tree is asymmetric (i.e., it presents only one main branch) and has rank-ordered ages, it reflects the result of one single history. This is consistent with the fact that our simulated galaxy did not experience interaction with another chemically enriched galaxy, causing the mixing of preprocessed gases or the inflow of pristine gas from filaments.

4.3. Phylogenetic Signal Considering Uncertainties

In the previous section, we defined the minimum number of stellar particles necessary to have enough phylogenetic signal for the trees to represent the evolutionary history of our studied galaxy. In this section, we investigate the maximum uncertainties on the chemical abundances for which the phylogenetic trees are evolutionarily informative. Using the example tree of the deterministic sample (Section 4.2), we explore the effect chemical abundance uncertainties have on the phylogenetic signal and how they affect the evolutionary history we can interpret from the tree.

For the purpose of this analysis, we perturbed the chemical abundances of the 100 stellar particles from the deterministic sample considering six uncertainty values: 0.01, 0.05, 0.08, 0.1, 0.2, and 0.3 dex. Abundances with precision below 0.05 dex fall in the high-precision domain and are typically obtained when analyzing very high-resolution and high-signal-to-noise-ratio spectra (e.g., Nissen & Gustafsson 2018) or using machine-learning tools when large samples of reference stars are available for training a good model (Ness et al. 2015; Leung & Bovy 2019; Wheeler et al. 2020; Ambrosch et al. 2023; Walsen et al. 2023). Standard spectral analyses have abundance precisions of the order of 0.1–0.2 dex. A precision of 0.3 dex is understood as a large uncertainty, but is unfortunately still very common, particularly for faint stars for which the signal-to-noise ratio is not very high, as, for example, for halo stars.

In order to account for these uncertainties, we created new values of chemical abundances for each stellar particle. The new values consider a normal distribution with the mean as the original value and the SD as the corresponding uncertainty considered. Using the perturbed chemical abundances, we estimated new trees for the deterministic sample, which we compare with the original tree. In Figure 7, we show the RFD between the original tree and the trees estimated considering uncertainties of 0.01, 0.05, 0.08, 0.1, 0.2, and 0.3 dex with different colors.

Figure 7. RFD when comparing the deterministic tree and those with chemical abundance uncertainties of 0.01, 0.05, 0.08, 0.1, 0.2, and 0.3 dex. The means of these distributions are shown in the legend. The larger the uncertainty, the more different the trees become from the original.

The RFD shows that the yellow distribution has a mean of 0.07 (and an SD of 0.03), indicating that considering uncertainties within 0.01 dex does not significantly change the trees. As the uncertainties increase, the RFD increases as well, which is expected. For an uncertainty of 0.3 dex, the trees deviate from the original one, reaching a mean RFD of 0.50 (and an SD of 0.04). We note that this value is still lower than the mean RFD of 0.93 for the comparison of the deterministic and the noise sample when 100 particles are considered. This suggests that while the trees with uncertainties of 0.3 dex differ among each other, there is still some phylogenetic signal, as they are still distinct from pure noise. Additionally, to have a better idea of how the trees change when the abundances are modified, we show in Figure 8 the link between two example trees for three cases of abundances. The left-hand trees are always the deterministic tree, with no uncertainties in the chemical abundances, and the right-hand trees correspond to one example tree obtained by perturbing the abundances, considering 0.01, 0.1, and 0.3 dex, respectively. The dashed lines in each case connect the same particle in each tree.

Figure 8. Comparison between the deterministic tree (the left tree in all three examples) and a new tree estimated by perturbing the abundances (the right trees in all three examples) within a range of 0.01 (left), 0.1 (middle), and 0.3 (right) dex. The dashed lines connect the same particle in each tree.

From the lines shown in Figure 8, we can see that when the abundances have an uncertainty of 0.01 dex, 24 of the 100 particles change their labeling order (locations in the tree). Their new location is relatively close to the original tree, as expected if the change in abundance is small. In the case of an uncertainty distribution of 0.1 dex, 60 out of 100 stellar particles change their places. Finally, in the case of 0.3 dex uncertainties, 90 stellar particles change place in the tree. The new positions are quite far from the original tree. It is interesting to note the gradual increase of the branch lengths when the uncertainties increase. This is also seen in the noise tree (see Figure 4) and in Walsen et al. (2023), who compared trees built from observed stars whose abundance measurements have different uncertainties. This tree, however, is different to the noise tree, as expected from the different RFD value obtained here and in Section 4.2. The tree with 0.3 dex uncertainty is in fact still very imbalanced, unlike the noise tree.

While Figure 8 shows the displacement of stellar particles when considering uncertainties, it is fundamental to evaluate if different chemical abundances would still carry evolutionary information to reconstruct the shared history of the selected stellar particles. It is thus necessary to study the support of a tree with uncertain abundances in this context. To do so, we computed 1000 trees by perturbing the abundances and collected these trees in a majority-rule consensus tree (see Section 2.3). Figure 9 shows the consensus trees when considering the uncertainties of 0.01, 0.05, 0.08, 0.1, 0.2, and 0.3 dex. We note that only the tree topology is shown, since the branch lengths of consensus trees cannot be directly related to the branch length of an actual sampled tree, which is the result of a distance matrix.
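The idea behind a majority-rule consensus can be sketched by counting how often each clade appears across a set of trees and keeping those present in more than half of them. This is a simplified illustration and not the implementation referred to in Section 2.3: the nested-tuple tree representation and the function names are hypothetical, and the sketch returns the supported clades rather than assembling them into a tree.

```python
from collections import Counter

def clades(tree):
    """Non-trivial clades of a nested-tuple tree as frozensets of leaf labels."""
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(leaves_below(tree))
    return found

def majority_rule_clades(trees, threshold=0.5):
    """Map each clade present in more than `threshold` of the trees to the
    fraction of trees that contain it."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()
            if n / len(trees) > threshold}

# Three toy trees on the same four tips; only the clade {a, b} is in a majority.
sampled_trees = [
    (("a", "b"), ("c", "d")),
    ((("a", "b"), "c"), "d"),
    (("a", "c"), ("b", "d")),
]
supported = majority_rule_clades(sampled_trees)
```

Clades that do not reach the threshold are dropped, which is what produces multifurcations (polytomies) in the consensus topology.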

Figure 9. Consensus tree topologies color-coded according to age. Top: trees estimated considering chemical abundance uncertainties of the order of 0.01, 0.05, and 0.08 dex, respectively. Bottom: trees estimated considering chemical abundance uncertainties of the order of 0.1, 0.2, and 0.3 dex respectively. The polytomies in each tree are indicated as A, B, C, D, E, F, G, and H.

In this work, we focus on the branching pattern of the nodes and the age ranking of the selected nodes; we do not focus on the branch lengths. By collapsing conflicting nodes in a sample of phylogenetic trees into multifurcations, we reduce the total number of nodes in a tree, which essentially means reducing the resolution at which the shared history can be extracted. It is not trivial to define how many nodes can be lost between a sampled tree and a consensus tree before the phylogenetic signal is significantly degraded. It is clear, however, that if we allow multifurcations in our trees, they should be distributed along the tree such that groups of stellar particles can be distinguished by, e.g., their mean ages. That means that a polytomy that contains more than 50% of the particles and spans the entire age range is not evolutionarily informative.
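
The criterion stated above can be written as a simple check. The sketch below is only an illustration: the function name `is_uninformative` is hypothetical, and the `span_fraction` parameter, which decides when a polytomy counts as spanning "the entire age range", is an assumption introduced for the example rather than a value taken from this work.

```python
def is_uninformative(polytomy_ages, sample_ages, max_fraction=0.5, span_fraction=0.9):
    """Flag a polytomy that holds most of the sample and covers the full age range.

    polytomy_ages: ages of the particles collapsed into the polytomy.
    sample_ages: ages of all particles in the tree.
    max_fraction: the 50% limit quoted in the text.
    span_fraction: assumed tolerance for "spanning the entire age range".
    """
    fraction = len(polytomy_ages) / len(sample_ages)
    full_span = max(sample_ages) - min(sample_ages)
    span = max(polytomy_ages) - min(polytomy_ages)
    covers_range = full_span > 0 and span / full_span >= span_fraction
    return fraction > max_fraction and covers_range


# Toy sample of 100 ages between 0.0 and 9.9 (arbitrary units).
ages = [i / 10 for i in range(100)]
old_only = ages[80:]                 # small polytomy with old particles only
mixed = ages[::2] + ages[1:20:2]     # 60 particles covering nearly all ages
print(is_uninformative(old_only, ages), is_uninformative(mixed, ages))
```

A polytomy restricted to a narrow age interval, however large, still separates that group from the rest of the tree and is therefore not flagged.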

Figure 9 shows consensus trees made with sampled trees that consider different abundance uncertainties. The top left panel (P1) considers an abundance uncertainty of 0.01 dex and shows that overall most nodes are present in more than 50% of the sampled trees. That tree has very few multifurcations, with four branches rising from a node at most. Moreover, these polytomies are at a significant distance from the root. Overall the age ranking of the branches remains, thus we conclude that uncertainties of 0.01 dex do not affect the phylogenetic signal of an evolutionary tree of these properties.

When focusing on the middle top panel of Figure 9 (P2), we see the consensus tree topology obtained from trees sampled considering an uncertainty of 0.05 dex. As expected, the number and size of the polytomies increase. In this case, we find two significant multifurcations, labeled as A and B. The particles in polytomy B are mainly old stellar particles, while the stellar particles in polytomy A are intermediate-age particles. The age ranking in the tree is kept, even if the relation of age and distance from the root is not as tight as in the deterministic tree (see Figure 6). Polytomy B is closer to the root than polytomy A.

The top right panel (P3) shows the consensus tree with uncertainties of 0.08 dex. Close to the root, the tree is still resolved, but it becomes less resolved farther out from the root. We label three significant polytomies: C, D, and E. As in the previous case, these polytomies contain stellar particles that overall have different ages, with polytomy C containing young stellar particles, D containing intermediate-age stellar particles, and E containing old stellar particles. Polytomy E and polytomy B are at a similar distance from the root in the trees presented in P3 and P2, respectively. We thus conclude that while the age ranking of the nodes has a large scatter, the ranking is still present and therefore, with uncertainties of 0.08 dex, we are still able to reconstruct a history from a phylogenetic tree.

The situation with uncertainties above 0.1 dex is more critical. Consensus trees with uncertainties of 0.1, 0.2, and 0.3 dex are shown in the lower panels of Figure 9, in Panels 4, 5, and 6 (P4, P5, and P6). Here we are able to label only one significant polytomy per tree: F, G, and H, respectively. They contain stellar particles of all ages and contain a significant fraction of the particles of the sample. In these consensus trees, it is not possible to arrange the star particles according to their ages in the tree, and therefore it is not possible to reconstruct the evolutionary history of this galaxy. We further find that as the uncertainty increases, the polytomy becomes deeper in the tree. For an uncertainty of 0.3 dex, the polytomy is a few nodes away from the root.

The fact that we are able to resolve the tree only close to the root in these cases is due to the significant change in metallicity at old ages (see the AMR in Figure 3), which is related to the peak in the SFH. When the star formation is less extreme and the AMR shows no significant change, having reached a plateau, uncertainties above 0.1 dex in the abundance measurements do not allow us to study the evolution of that system using phylogenetic trees.

4.4. Evolutionary History Considering Different Regions of the Galaxy

While in Section 4.2 we investigated the dependence of the phylogenetic signal on the population density and in Section 4.3 we explored the dependence of the phylogenetic signal on the uncertainties in the chemical abundances, in this section we explore how the AMR and SFH of different regions of the galaxy impact the properties of phylogenetic trees.

In Section 4.2, we discussed the evolutionary history traced by phylogenetic trees from the deterministic sample. In this section, we repeat that analysis using phylogenetic trees from different regions of the galaxy. We thus analyze the trees estimated from the example samples of Groups 01, 02, 03, and 04, whose spatial distributions are shown in Figure 1 and astrophysical properties in Figure 3, with the colors green, blue, pink, and red, respectively. The chosen 100 stellar particles are used to estimate and analyze the trees of this section.

Figure 10 shows the trees of each group, with the stellar particles color-coded according to age in the top row and according to [O/Fe] in the bottom row. Similar to the tree estimated using the deterministic sample, these trees are imbalanced, and show rank-ordered ages, implying that everywhere in the galaxy we can reconstruct history. The branching order of the ages, however, becomes weaker from Group 01 to Group 04. This might be an effect of the SFH, whose peak becomes narrower toward the edge of the galaxy (see Figure 3). This translates into a flatter AMR for stellar particles younger than about 1.2 Gyr.

Figure 10. Phylogenetic trees of a selection of 100 stellar particles from the groups selected in the different regions shown in Figure 1. The tips are color-coded according to age (upper panels) and to [O/Fe] (lower panels).

All trees show the presence of an apparent second branch of very old stellar particles, which are close to the root. The trees here have been rooted using the oldest star particle, but that does not imply that this particular stellar particle is a common ancestor to the rest of the stellar population. At the beginning of the simulation, there is significant homogeneity in the distribution of metals in the gas, which reflects the local distribution of the cold gas from which stars are formed. This has an impact on how the chemical evolution due to the first SNe enriches the ISM. At the very first stages of evolution, the metallicity of the ISM is strongly heterogeneous. As star formation progresses, the regions become more chemically enriched and mixed, and the exchange of enriched material between regions can take place (e.g., SN outflows and radial migration). However, as we move from the central to the outer regions, the level of enrichment systematically decreases, even though the AMR shapes are similar. This decrease in the global metallicity with radius is expected for galaxies with an exponential gas density distribution like the simulated galaxy used in this study.

To better quantify the different trees and so discuss the rate of change in the chemical distance from the root to each tip, we calculate the distance of each tip to the root. Figure 11 shows the cumulative chemical distance from the root of stellar particles as a function of their ages in the left panel and the distribution of distances for each sample in the remaining panels. We can first observe that in all four groups, there is a sharp increase in the distance for the oldest stellar particles, with few tips having short distances from the root. At around 1.2 Gyr, the distance reaches a more or less constant value, which ranges between approximately 3.5 dex and 5.5 dex, depending on the group. The point at which the sharp increase in the distance from the root stops is related to when the peak of star formation ends in each region, according to their SFH (see Figure 3).
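
A root-to-tip distance of this kind is the sum of the branch lengths along the path from a tip back to the root. The following sketch illustrates the calculation on a toy tree. The representation of the tree as two dictionaries (`parent` and `length`) and the function name `root_to_tip_distances` are assumptions made for the example, not the data structures used in this work.

```python
def root_to_tip_distances(parent, length, tips):
    """Sum branch lengths from each tip up to the root.

    parent: {node: parent node}; the root is the only node without an entry.
    length: {node: length of the branch joining the node to its parent} (dex).
    tips: iterable of tip labels.
    """
    distances = {}
    for tip in tips:
        node, total = tip, 0.0
        while node in parent:  # stop once the root is reached
            total += length[node]
            node = parent[node]
        distances[tip] = total
    return distances


# Toy tree: root -> a, root -> n1, n1 -> b, n1 -> c (branch lengths are illustrative).
parent = {"a": "root", "n1": "root", "b": "n1", "c": "n1"}
length = {"a": 0.5, "n1": 1.0, "b": 0.25, "c": 2.0}
print(root_to_tip_distances(parent, length, ["a", "b", "c"]))
# {'a': 0.5, 'b': 1.25, 'c': 3.0}
```

Pairing each distance with the age of the corresponding particle gives a relation of the kind shown in the left panel of Figure 11.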

Figure 11. Left: cumulative distances from the root to the tip as a function of the ages of stellar particles. Groups 01, 02, 03, and 04 are represented as green, blue, pink, and red lines, respectively. Right: cumulative percentages of stellar particles contained in bins of distance from the root. In the left panel, it is shown that the distance from the root reaches a plateau or a region with slow increase in the same area that contains the majority of stellar particles according to the right panel.

Considering that Group 01 corresponds to the galactic center, that Group 04 corresponds to the outskirts of the galaxy, and that Groups 02 and 03 are in the middle, it is encouraging to notice that the largest maximum distance is reached by the tree estimated from stellar particles in Group 01 and that the shortest maximum distance is estimated from Group 04. From Figure 3, we know that the SFHs of Group 01 and Group 04 differ, in the sense that the central region experienced a long peak of star formation and continued forming stellar particles until the present date, while the outer region experienced a short star formation peak, with an abrupt stop and almost no recent star formation. This translates into an AMR of a population that increases in metallicity until the present day for Group 01, while for Group 04, the AMR is rather flat.

It is thus expected that a tree path that is drawn from a population with more star formation will be longer. From our results, we find that indeed trees can be used to learn about the SFH of galaxies, since the difference in the total path length of the tree (i.e., the distance from the root) is large (2 dex), even if the AMR or the [O/Fe]–[Fe/H] planes are comparable. This shows that the tree enhances the differences. We note that another advantage of using the tree path length to study the efficiency of star formation is that it is not necessary to know the ages of the particles accurately. This is an advantage, because determining stellar ages is a challenging task.

The right-hand panel of Figure 11 shows how the distribution of distances from the root differs between the groups. The group with higher and more extended SFH reaches greater lengths than the group with lower SFH. The latter has a wider distribution of lengths between 2 and 4 dex, reflecting also the scatter in the AMR. Groups 02 and 03 have very similar SFHs. More stellar particles formed in Group 03 than in Group 02 due to the higher gas density in Group 03, which is where the spiral arm lies. From the AMR or the [O/Fe] versus [Fe/H] diagrams, the impact of the gas density is difficult to identify, and the same can be said of the length of the tree.
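
A cumulative distribution like the one in the right-hand panel can be obtained directly from the root-to-tip distances. The sketch below is illustrative only: the function name `cumulative_percentage` and the choice of bin edges are assumptions, and the input distances are toy values rather than results from the simulation.

```python
def cumulative_percentage(distances, bin_edges):
    """Percentage of tips whose distance from the root is at most each bin edge.

    distances: root-to-tip distances of one group (dex).
    bin_edges: increasing upper edges of the distance bins (dex).
    """
    n_tips = len(distances)
    return [
        100.0 * sum(d <= edge for d in distances) / n_tips
        for edge in bin_edges
    ]


# Toy distances for four hypothetical tips.
toy_distances = [0.5, 1.5, 2.5, 3.5]
print(cumulative_percentage(toy_distances, [1.0, 2.0, 3.0, 4.0]))
# [25.0, 50.0, 75.0, 100.0]
```

A group whose curve rises steeply over a narrow distance interval has most of its particles at similar distances from the root, which corresponds to the plateau seen in the left panel.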

Figure 11 can be related to the AMR, since the metallicity is one of the traits in the tree distance matrix. It is therefore not surprising that the age–branch length relation is very similar to the AMR. The tree branch lengths incorporate the other chemical abundances, in addition to Fe, which is why they cover a larger range in chemistry. Since we are using all abundances relative to hydrogen, all elements are expected to increase with time, making the chemical distance increase in a way that directly relates to the increase in metallicity. This is valid for the studied system, which does not experience the infall of pristine gas. Moreover, the distance matrix uses [Zn/H], which are also produced by SNe Ia. They follow a comparable evolution to [Fe/H] and cause the relation between branch length and age observed in Figure 11.

5. Prospects and Limitations of Stellar Phylogeny

As previously mentioned, stellar phylogeny has already been applied to observational data (Jofré et al. 2017; Jackson et al. 2021; Walsen et al. 2023). However, this is the first time it has been applied to simulations. As this is the first study of its kind, using an isolated disk galaxy simulation serves as an ideal test case and a fundamental step to maturing the method before applying it to more complex systems, which can better represent real galaxies.

Interactions play an important role in the evolution of galaxies (Toomre 1977; Efstathiou 1990; Barnes & Hernquist 1992). When a galaxy undergoes mergers, for instance, both its stellar population and gas content experience alterations (Torrey et al. 2012; Monachesi et al. 2019). Additionally, such events can trigger episodes of star formation, further impacting the galaxy's chemical composition and stellar populations (as illustrated in Di Matteo et al. 2007). Consequently, the environment becomes more complex, making the application of stellar phylogeny more delicate. We expect that in more complex systems that experience interactions, the results of stellar phylogeny may be affected by the mass ratio of the galaxies and also by their amounts of available gas. While a comprehensive investigation into how mergers affect phylogenetic trees is currently a work in progress, we anticipate that a meticulous selection of stellar particles or stars (in the case of observational studies) will be fundamental for conducting stellar phylogeny in more complex systems. The selection will be crucial not only for estimating phylogenetic trees that are evolutionarily informative, but also for a robust interpretation of the results.

Another factor that requires further characterization is how stars born from the same molecular cloud but having different masses can be addressed in stellar phylogenetic studies. The inclusion of stars with a wide range of masses can introduce an additional layer of complexity, since different stellar evolution processes rule stars with different masses, potentially altering the chemical abundances in the atmospheres of stars. Using massive stars might complicate the application and interpretation of stellar phylogeny, due to potential alterations in their chemical abundances resulting from internal processes, such as mass loss and mixing (Meynet & Maeder 2000; Langer 2012; Martins et al. 2015). However, low-mass stars can also have their chemical composition altered by processes such as atomic diffusion and rotation (Deal et al. 2020).

A better characterization of the limits of chemical tagging would benefit the development of stellar phylogeny. We acknowledge the significance of exploring the effects of studying stars from the same molecular cloud but with different masses to better characterize this method. However, such an investigation falls outside the scope of this work, since here the stellar particles represent stellar populations, where such effects are not included.

6. Summary and Conclusions

In this study, we have investigated the phylogenetic signal within a simulated disk galaxy, addressing three specific questions. First, we explored the dependence of the phylogenetic signal on population density. Second, we investigated the dependence of the phylogenetic signal on the uncertainties associated with the chemical abundances. Third, we studied how the properties of the phylogenetic trees depend on the region of the simulated disk galaxy from which the stellar particles are drawn.

Approaching the first question, we explored the minimum number of stellar particles required to obtain a phylogenetic signal and reconstruct the galaxy's evolutionary history. This was done because it is fundamental to be able to differentiate a phylogenetic signal from noise and stochasticity. In this analysis, we varied the number of stellar particles from 10 to 200 and found that using 100 stellar particles allowed for the reconstruction of this galaxy's history. For 100 stellar particles, the distributions of the RFD did not overlap when considering trees estimated from simulated and random data. The mean RFD considering trees from simulated samples was 0.93, with an SD of 0.01, while the RFD considering trees from random chemical abundances was 0.87, with an SD of 0.01. We also observed that the topologies of the trees estimated using the simulated and random data were different, supporting the conclusion that phylogenetic trees from simulated data were significantly different from random noise.

In the second question, we studied the impact of uncertainties in the chemical abundances on the evolutionary history portrayed by the phylogenetic trees. To do so, we perturbed the chemical abundances underlying the reference phylogenetic trees, considering uncertainties between 0.01 and 0.3 dex. As the uncertainties in the abundances increase, the RFD between the original trees and the perturbed trees also increases. Trees with uncertainties of 0.01 dex remain similar to the original tree, having a mean RFD of 0.07 with an SD of 0.03, while those with 0.3 dex uncertainties deviate significantly, having a mean RFD of 0.50 and an SD of 0.04. However, even with uncertainties as high as 0.3 dex, there was still a retrievable phylogenetic signal when compared with trees estimated from random chemical abundances. We report that the resolution of phylogenetic trees decreases with higher uncertainties and that the displacement of stellar particles within the trees becomes more pronounced as the uncertainties increase. Finally, we observed that for uncertainties below 0.08 dex, we could successfully reconstruct the galaxy's history, since the uncertainties do not significantly affect the age ranking of the nodes in the tree and the polytomies are not the dominant structure of the trees.
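
For readers unfamiliar with the RFD, the sketch below shows one common way to compute a normalized Robinson-Foulds distance between two trees. It is an illustration under stated assumptions rather than the implementation used in this work: each tree is represented as a set of clades (frozensets of tip labels), and the distance is normalized by the total number of clades in both trees, so that 0 means identical topologies and 1 means no shared clades. The function name `normalized_rf` is hypothetical.

```python
def normalized_rf(clades_a, clades_b):
    """Normalized Robinson-Foulds distance between two trees given as clade sets.

    Counts the clades found in exactly one of the two trees and divides by the
    total number of clades in both (an assumed normalization convention).
    """
    total = len(clades_a) + len(clades_b)
    if total == 0:
        return 0.0
    return len(clades_a ^ clades_b) / total


# Toy trees on tips a, b, c, d that share one of their two clades.
tree_1 = {frozenset("ab"), frozenset("abc")}
tree_2 = {frozenset("ab"), frozenset("abd")}
print(normalized_rf(tree_1, tree_2))  # 0.5
```

Averaging this quantity over many perturbed trees, each compared with the reference tree, yields a mean and SD of the kind quoted above.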

In the final question addressed in this work, we analyzed whether the evolutionary histories inferred from phylogenetic trees constructed using stellar particles from different regions of the galaxy were consistent with their AMRs and SFHs. We observed that the trees displayed one primary branch, indicating a gradual evolution of a single lineage over time. Also, the trees from the different regions displayed rank-ordered ages, with the older particles closer to the root. However, there are differences between regions. The cumulative distances from the root to stellar particles revealed that the path lengths in the phylogenetic trees were related to the SFH. Regions with higher and more extended star formation activity had longer tree path lengths, while regions with lower and shorter star formation activity exhibited shorter tree path lengths. The observed differences in the cumulative distances reached 2 dex. The shape of the path length as a function of age was also related to the AMR of the system, with a sharp increase of the distance from the root associated with periods of rapid chemical enrichment. These findings highlight the potential of phylogenetic trees to capture variations in the SFH and AMR across different regions of the simulated disk galaxy, providing insights into its chemical history and SFH.

In summary, this work has demonstrated that it is possible to use phylogenetic trees to reconstruct the evolutionary history of a simulated disk galaxy. It has highlighted the relationship between phylogenetic tree properties and the AMR and SFH. This parallel between the phylogenetic trees and the global properties of a galaxy will be particularly useful when applying phylogeny to observed data of stars when the method is more mature, since usually the SFH as well as the AMR of real galaxies are not fully known. We also note that a natural next step to continue this work is to explore phylogenetic trees in more realistic simulated galaxies. These results open doors for exploring several other exciting questions about the archeology of galaxies and their evolution, both with simulated and observed data applied to stellar phylogeny.

Acknowledgments

We thank the anonymous referee for their comments. This work has been funded by Millennium Nucleus ERIS NCN2021_017. D.D.B.S. thanks Jorge González López and Kurt Walsen for the constructive discussions that made this project even more intriguing. D.D.B.S. also thanks ANID (Beca Doctorado Nacional, Folio 21220843) and Universidad Diego Portales for the financial support provided. P.J. is grateful to the Stromlo Distinguished Visitor Program at ANU. P.B.T. acknowledges partial funding by Fondecyt-ANID 1200703/2020. We acknowledge the use of the Ladgerda Cluster (Fondecyt 1200703/2020) and the National Laboratory for HPC (NLHPC). J.G.J. acknowledges support from CONICYT/ANID-PFCHA/Doctorado Nacional/2021-21210846 and the CONICYT Basal project AFB-170002. E.J.J. acknowledges support from FONDECYT Iniciación en investigación 2020 Project 11200263. A.R.A. acknowledges support from DICYT through grant 062319RA.

Appendix: Tree Colored by [O/Fe]

Previously in this work, we showed the phylogenetic tree of the deterministic sample color-coded according to the ages of stellar particles (see Figure 6). In Figure 12, we present the same tree, but color-coded according to [O/Fe]. In this tree, it is possible to see that there is a section where [O/Fe] is mixed, which might be related to the moment in which SNe Ia events start to occur. This is the same region where the age ranking in Figure 6 is weaker.

Figure 12. Deterministic tree of Figure 6. The tree is color-coded with [O/Fe].
