FLORAH: A generative model for halo assembly histories

The mass assembly history (MAH) of dark matter halos plays a crucial role in shaping the formation and evolution of galaxies. MAHs are used extensively in semi-analytic and empirical models of galaxy formation, yet current analytic methods to generate them are inaccurate and unable to capture their relationship with the halo internal structure and large-scale environment. This paper introduces FLORAH, a machine-learning framework for generating assembly histories of ensembles of dark matter halos. We train FLORAH on the assembly histories from the GUREFT and VSMDPL N-body simulations and demonstrate its ability to recover key properties such as the time evolution of mass and concentration. We obtain similar results for the galaxy stellar mass versus halo mass relation and its residuals when we run the Santa Cruz semi-analytic model on FLORAH-generated assembly histories and halo formation histories extracted from an N-body simulation. We further show that FLORAH also reproduces the dependence of clustering on properties other than mass (assembly bias), which is not captured by other analytic methods. By combining multiple networks trained on a suite of simulations with different redshift ranges and mass resolutions, we are able to construct accurate main progenitor branches (MPBs) with a wide dynamic mass range from $z=0$ up to an ultra-high redshift $z \approx 20$, currently far beyond that of a single N-body simulation. FLORAH is the first step towards a machine learning-based framework for planting full merger trees; this will enable the exploration of different galaxy formation scenarios with great computational efficiency at unprecedented accuracy.


INTRODUCTION
In the Λ-Cold Dark Matter (ΛCDM) paradigm, dark matter (DM) halos form hierarchically and grow in mass through the mergers of smaller DM halos (White & Rees 1978).The assembly history of a halo or galaxy refers to the sequence of events that led to its formation and growth over cosmic time.It encompasses the accretion of matter, mergers with other halos or galaxies, and internal processes such as star formation and black hole growth.In simulations, this is often represented in the form of "merger trees" i.e. progenitor and descendant halos that are linked across cosmic time.In the modern paradigm of galaxy formation, the properties of galaxies (such as stellar mass and star formation rate) are believed to be closely linked to the assembly history of their halos and their formation environment.Understanding this intricate halo-galaxy connection remains one of the key open questions of modern astrophysics.
N-body simulations provide a powerful tool to directly follow the formation and evolution of DM halos as they merge with other halos and interact with the large-scale environment (Vogelsberger et al. 2020).However, these simulations are computationally expensive, and their cost grows rapidly with the simulated volume and resolu-★ E-mail: tnguy@mit.edution.Therefore, they are often run with only dark matter and without including baryonic physics.It is not feasible to run a single simulation that can simultaneously capture the formation histories of halos from dwarf galaxies (10 5 − 10 10 M ⊙ ) to galaxy clusters (10 14 − 10 15 M ⊙ ) up to high redshifts.It is currently challenging to run even a single dark matter-only simulation with a volume comparable to that of existing galaxy surveys and a mass resolution sufficient to accurately trace the merger histories of the halos hosting the galaxies that are detected in those surveys -let alone next-generation surveys.Computational limitations also hinder the exploration of different structure formation scenarios and cosmological parameters.
Semi-analytic models (SAMs) are commonly used to populate DM halo merger trees with galaxies.SAMs apply simplified prescriptions for baryonic physics (e.g., radiative cooling, star formation, supernova, and AGN feedback prescriptions, etc.) within cosmological merger trees to track the formation and evolution of galaxies and forward model their observable properties (Cole et al. 1994(Cole et al. , 2000;;Somerville et al. 2015;Yung et al. 2019a;Hutter et al. 2021a;Elliott et al. 2021).SAMs require significantly fewer computational resources compared to N-body simulations and allow for exploration of a wider range of physical processes and parameters.They have been used to study many aspects of galaxy formation, ranging from large-scale clustering (e.g.Kauffmann et al. 1999;Somerville et al. 2001;Benson et al. 2003;Somerville et al. 2021;Hadzhiyska et al. 2021;Ayromlou et al. 2021;Gabrielpillai et al. 2022;Yung et al. 2022Yung et al. , 2023b) ) to galaxy properties (e.g., colors, metallicities, sizes, cold gas contents) (Lagos et al. 2012(Lagos et al. , 2013;;Somerville et al. 2015;Popping et al. 2017;Yung et al. 2019bYung et al. , 2020;;Henriques et al. 2020;Yates et al. 2021) to supermassive black hole formation and active galactic nuclei feedback (Somerville et al. 2008;Fanidakis et al. 2011;Yung et al. 2021) to reionization (Hutter et al. 2021a;Ucci et al. 2021;Hutter et al. 2021b).Similarly, semi-empirical models such as UniverseMachine and EMERGE link galaxy observables to halo formation histories (Behroozi et al. 2019(Behroozi et al. , 2020;;Moster et al. 2018).Accounting for the dependence of halo clustering on properties other than mass (assembly bias) has been shown to be important for accurately interpreting galaxy clustering measurements (e.g Wechsler et al. 2001;Hadzhiyska et al. 2021;Gabrielpillai et al. 2022)."Decorated" halo occupation distribution models have been proposed, which again require knowledge of the halo formation history and/or structural properties (e.g.Hearin et al. 2016).
To overcome these computational challenges, previous works have developed analytic methods that rely on the Extended Press-Schechter (EPS) formalism (Bond et al. 1991;Bower 1991).EPS trees are constructed by sampling the conditional mass probability ( 1 | 0 ,  0 ,  1 ) that a halo with mass  0 at redshift  0 had a mass of  1 at an earlier redshift  1 >  0 .For halos of any given initial mass  0 and redshift  0 , the algorithm uses Monte-Carlo methods and assumes the Markov property to construct its past merger histories (e.g.Lacey & Cole 1993;Somerville & Kolatt 1999;Zentner 2007;Neistein & Dekel 2008;Parkinson et al. 2008;Correa et al. 2015a,c).However, this approach has many limitations.The resulting MAHs and merger trees are known to disagree with the results of N-body simulations at the factor of few levels (Somerville et al. 2000;Li et al. 2007;Benson et al. 2019).Moreover, EPS-based techniques are fundamentally unable to capture the relationship between the assembly history, the halo structure, and the environment.For a comparison between different EPS-based techniques for generating merger trees, we refer readers to Jiang & van den Bosch (2014).
A recent study has applied Generative Adversarial Networks and Convolutional Neural Networks to generate ensembles of merger trees (Robles et al. 2022).Although this approach seems promising, this work did not incorporate the correlation between halo formation history and environment and did not show a complete set of diagnostic tests on the resulting merger tree ensembles.Another recent study using differential programming was able to accurately capture both the MAHs and assembly bias (Hearin et al. 2021).However, this approach smooths out the MAH of individual halos, making it difficult to capture merger events.It is also unclear how the halo structure and formation environment can potentially be included in the framework.
In this paper, we introduce florah (FLOw-based Recurrent model for Assembly Histories) a deep generative model based on recurrent neural network and normalizing flows, to generate assembly histories of halos.As a first step, we focus on generating the mass assembly histories (MAHs) and DM concentration histories of only the main progenitor branches (MPBs).The MPB tracks the most massive progenitor of a halo and thus is the most important for understanding the assembly history.MPBs can be naturally modeled as a timeordered sequence; hence we use a recurrent neural network to learn their representative features.Furthermore, for a halo of given mass and concentration, we are interested in learning the full distribution of possible assembly histories and hence we combine this recurrent network with a normalizing flow to learn this distribution.Once trained, florah can be used to generate the MAHs and DM con-centration histories of MPBs consistent with N-body simulations.Although florah only generates the MPBs of merger trees, it is the first framework to build in the correlation between halo structural properties and merger history.
To demonstrate this, we use the Santa Cruz SAM (SC-SAM) (Somerville et al. 2015;Yung et al. 2023a), with free parameters calibrated to reproduce a subset of observed galaxy properties at  = 0 (see Gabrielpillai et al. 2022 for detailed calibration criteria), to populate florah-generated MPBs with galaxies and show that florah can correctly capture the assembly bias and environmental dependence of galaxies, which existing EPS-based methods cannot reproduce.In addition, by combining multiple networks trained on different simulations with varying resolutions and redshift ranges, we are able to generate high-resolution MPBs to ultra-high redshifts, from  = 0 to  = 20.This allows us to overcome the dynamic range limit of a single simulation.
This paper is organized as follows.Section 2 provides a detailed description of the training simulation and the algorithms employed to extract the merger trees.Section 3 outlines our proposed method, including the data preprocessing procedure for a single simulation (Section 3.1.1)and for the combination of multiple simulations (Section 3.1.2).We describe the neural network architecture and optimization in Section 3.2, followed by an explanation of the generation procedure in Section 3.3.In Section 4, we show the generation results with vsmdpl (Section 4.1) and gureft (Section 4.2), while Section 4.3 presents a comparison with the combined gureft and vsmdpl simulations.We compare florah to previous approaches for modeling MAHs, and discuss the limitations of the current approach and potential avenues for future research in Section 5. We conclude the paper in Section 6.
The gureft suite consists of four boxes with sizes of 5, 15, 35, and 90 Mpc h −1 .Each box has the same number of particles 1024 3 , with the smaller box having a higher mass resolution.Therefore, each box captures the assembly histories of a different halo mass range.The gureft boxes have 171 snapshots from  ≈ 40 to 6 at a high temporal resolution, making them the ideal simulations to capture the merger rates and assembly histories at high to ultra-high redshift.
For  ≲ 6, we instead use the vsmdpl simulation, which has 151 snapshots from  = 0 to 25. Thanks to its high mass resolution, with a particle mass of 6.2 × 10 6 M ⊙ h −1 , vsmdpl has a higher number of well-resolved and low-mass halos at high redshift.This allows us to combine vsmdpl with gureft into one training dataset spanning a wider redshift and mass range.We will describe this procedure in more detail in Section 3.1.2.
We construct merger trees for each gureft simulation and use the publicly available merger trees from vsmdpl.All merger trees used in this work are constructed using the Rockstar halo finder and the ConsistentTree algorithm (Behroozi et al. 2013a,b).Note that  merger trees may be sensitive to the details of the halo finding and tree construction algorithms.We plan to experiment with different parameters and tree reconstruction algorithms in future work.
Figure 1 displays the median MAHs on the main branches of gureft and vsmdpl.The overlapping MAHs between the two simulations from  ≈ 13 − 5 are crucial for combining them.We have only shown the median MAHs down to approximately 100  DM , where  DM denotes the mass of the DM particles in the corresponding simulations.Halos with masses below this threshold cannot be reliably identified by Rockstar.Therefore, this limits the maximum redshift that we can generate for each simulation, with a maximum of  ≈ 20 (from gureft-05), as shown in Figure 1.Furthermore, the less massive root halos of vsmdpl (≲ 10 10 M ⊙ ) do not overlap with gureft, which imposes a further limit on the maximum generation redshift for these halos.Additionally, each simulation has a unique particle mass and a resolved mass limit, which can complicate the process of combining the simulations.We will provide a more detailed discussion of the effects of particle mass and resolution limitations in Section 3.1.2.

METHODOLOGY
In this section, we outline the data preprocessing, neural network architecture, and the optimization and generation process used in this work.A schematic illustration of florah is displayed in Figure 2. As a first approach, we aim to generate only the MPBs of merger trees since they contribute a first-order impact on the assembly histories of halos.We plan to extend our framework to generate secondary branches and other branches in future work.

Using a single simulation
In this section, we describe the data preprocessing steps for a single simulation.The same steps apply for the four gureft boxes and vsmdpl with different parameters because they have different redshift ranges and resolutions.We will highlight the parameters that differ in Section 4, where we describe the training dataset in more detail.The following steps are the same for all simulations.
As a first step, we extract only the main progenitor branches (MPBs) and exclude all branches with root halos containing fewer than 500 DM particles; halos below this limit may have poorly resolved progenitors.As previously mentioned in Section 2, the lower limit of the progenitor mass is about 100  DM , below which Rockstar may not reliably identify halos.We remove these unresolved halos during the post-processing of the generated MPBs (described in Section 3.3), rather than during the training dataset creation, as we have found that it leads to improved model performance.This resolution limit is particularly important when combining multiple simulations with varying  DM into a single training dataset, which we further elaborate on in Section 3.1.2.In addition, MPBs from ConsistentTree may sometimes have progenitor masses greater than their descendants.This is a common phenomenon that occurs due to the tree construction algorithm (including ConsistentTree) misidentifying the halos and their mass assignments.As with the unresolved halos, we will correct for this effect during the generation and post-processing of the MPBs in Section 3.3, rather than during the training phase.
To enhance the generalization ability of our model, we employ data augmentation by creating multiple "sub-branches" for each MPB.We begin by selecting the initial snapshot from a uniform distribution up to the first 50 snapshots ( ≈ 2 for vsmdpl and  ≈ 10.6 for gureft).Next, we subsample the MPB by selecting every 2 to 6 snapshots (chosen uniformly) up to either a maximum redshift of  max,train or a maximum length of 20 halos.We limit the length of each subbranch" to 20 halos to prevent overly long sequences that could impact the learning process of the recurrent neural network (Pascanu et al. 2013;Ribeiro et al. 2020).Subsampling the tree in this way has a few advantages.First, it allows us to cover a wider range of redshifts, improving the overall generalization of the model.Second, it has a smoothing effect on the MAHs.Third, randomizing the time steps helps prevent the model from becoming overly reliant on one particular set of time steps and improves generalization.We repeat the above data augmentation process 15 times for each tree.
The input features to the recurrent neural network at each stage are the logarithm of the virial mass log 10  vir , the Navarro-Frenk-White DM concentration  vir (Navarro et al. 1996), and scale factors  of the halo and the next progenitor.The concentration is not given directly by Rockstar and instead computed from the relation  vir ≡  vir /  where   is the Navarro-Frenk-White (NFW) scale radius (Navarro et al. 1996) and  vir is the virial radius.Note that the NFW scale radius in Rockstar is computed in two different ways, by fitting the density profiles or by converting from the radius of the maximum circular velocity (Klypin et al. 2011).Here we use the value provided by fit to the density profiles.If a halo undergoes a recent major merger or has multiple density peaks, the fitting procedure in Rockstar can fail, and   will be capped at  vir .This can create a "pile up" in the concentration for halos with   ≈  vir .Therefore, we require that (  −  vir )/  > 0.1, or equivalently  vir > 1.1.We remove all halos that do not pass this cut during the generation and post-processing of the MPBs in Section 3.3, as with the unresolved halos.Including  vir helps the model learn assembly histories more accurately because  vir has been found to capture the environmental dependency in many assembly bias studies (Nusser & Sheth 1999;van den Bosch 2002;Wechsler et al. 2006;Correa et al. 2015b).We will experiment with expanding to more halo features (e.g.spin, shape) and environment features (e.g.local density, number of neighboring halos) in future work.The scale factors of the halo and its next progenitor serve as   the time features in our framework.We have experimented with other time features like redshift and age and found similar performance.
For the output of the network, we model the logarithm of the change in mass, defined as: for the (+1)-th halo, and the concentration  vir of the next progenitor.We found that using accreted masses (Δ log 10  (+1) vir ) as targets, instead of progenitor masses (log 10  (+1) vir ), improves the model performance.During generation, progenitor masses can be derived from masses and accreted masses using Eq. 1.

Combining multiple simulations
We present a procedure for combining the four gureft boxes (gureft-05, gureft-15, gureft-35, and gureft-90)  resolutions and volumes, MAH for halos in overlapping mass bins could vary by a significant amount, particularly at high redshifts.
In such instances, the box with the highest resolution (i.e., the one with the lowest  DM ) would yield a more accurate MAH and we would like to only use these for training.Hence to combine the gureft boxes, we impose an additional minimum root mass cut of log 10  min,root ≈ 9.5, 10.5, 11.5 dex for gureft-15, gureft-35, and gureft-90 respectively.These cuts are applied directly to the root halos at  = 5.89 and before any data augmentation step.These thresholds are chosen to strike a good balance between MAH accuracy (which increases for smaller boxes at a given mass) and the number of halos in a mass bin (which is lower for smaller boxes), as having too few halos can adversely affect the training process.
No additional cut is applied to gureft-05, which has the highest resolution, resulting in an over-representation of low-mass halos in our training dataset.Despite the imbalance in the number of low and high-mass halos, we show in Section 4 that we are able to capture the high-mass halos accurately.The remaining data preprocessing steps follow the guidelines outlined in Section 3.1.1.Each gureft box is preprocessed separately, and the resulting datasets are combined into a single, large training dataset denoted as gureft-c.

Neural network architecture and optimization
We model each sub-branch as a sequence of  halos, with the root halo denoted with the index zero.The input and target feature vectors are: respectively, where  in = 4,  out = 2, and  = 0, ...,  − 2. We do not include any "end-of-sequence" token in our framework, so the feature vector includes only the first  − 1 halos.During the generation process (Section 3.3), we can choose to terminate the assembly history at a maximum redshift or minimum progenitor mass.Our goal is to learn the true conditional distribution of  (+1) , denoted as ( (+1) |{ ( ≤) }).
We use a recurrent neural network   : R  in → R  with parameters  to extract  summary features from the input features of each halo.The summary features are then: The hidden state ℎ () is dependent on the input features of all the previous halos  ( ≤) , which allows the network to "memorize" the entire assembly history.Our recurrent network consists of 4 Gated Recurrent Unit (GRU) layers, each with  = 128 hidden channels.
For the recurrent layers, we have also experimented with different network architectures like Transformers (Vaswani et al. 2023), adding a decay mechanism to the hidden states (Che et al. 2018), and adding time-embedding layers.We found that these choices result in similar performance while being more computationally intensive than our current model.
During training, we optimize the parameters {, } of the recurrent network and flow simultaneously using the negative log-density as the optimization loss.We use the AdamW optimizer (Kingma & Ba 2014;Loshchilov & Hutter 2019) with parameters (,  1 ,  2 , ) = (0.001, 0.9, 0.98, 0.01), where  is the learning rate,  1 and  2 are the running average coefficients, and  is the weight decay coefficient.At the end of each epoch (defined as one full iteration over the training set), we evaluate the loss on the validation samples and reduce the learning rate by a factor of 10 if no improvement is seen after 20 epochs.We terminate the training process if the validation has not improved after 40 epochs.The training process terminated at ∼ 200 epochs or ∼ 2 hours on a single NVIDIA Tesla V100 GPU.

Generation and post-processing
Once the model is trained, the generation process for one tree is as follows: (i) Select an initial halo feature  (0) , which is the initial mass log 10  (iii) Starting with the first time step  = 0, pass the halo feature  () and the hidden state ℎ () (with ℎ (0) initialized to 0) through the GRU layers and extract the summary features  () =   ( () , ℎ () ).
(iv) Sample ŷ (+1) ∼ p  ( ŷ (+1) |  () ) using the normalizing flow    (v) Convert the accreted mass to progenitor mass using Eq. 1 and set the feature of the progenitor halo to be vir ,  (+1) ,  (+2) ).(vi) Update the time step to  + 1 and repeat Steps (ii) to (v) until a minimum progenitor mass  min,halo or a maximum redshift  max,gen .We elaborate on the procedure to choose these thresholds below.
As mentioned in Section 3.1, halos with fewer than 100 DM particles are not reliably identified by Rockstar.For a model trained on a single simulation (i.e.vsmdpl in our case), we set the minimum 1 As mentioned in Section 3.1.1,Rockstar caps   at  vir , resulting in a "pile up" at around  ≈ 1.To remove this population, we require the generated concentration  (+1) vir > 1.1 and simply re-sample the halo if this condition is not satisfied.
progenitor mass to be  min,halo = 100  DM , where  DM is the DM particle mass of the simulation.For a model trained on a combined simulation (i.e.gureft-c in our case), due to the additional mass cut described in Section 3.1.2,the generated MPBs will have varying mass resolutions depending on their root masses.In the case of gureft-c, we determine the DM particle mass  DM based on the root mass of the MPB at  = 5.89 and set  min,halo = 100  DM as before.The minimum progenitor masses are shown in the last column of Table 1 for each simulation.As for the maximum generation redshift  max,gen , we generally recommend  max,gen to be smaller than  max,train to avoid extrapolation.However, we will show in Section 4 that florah can extrapolate the MPBs beyond  max,train .
Throughout the formation process of an MPB, the mass of the halo increases monotonically.However, as mentioned in Section 3.1, due to the misidentification of halos and mass assignments, progenitor halos may sometimes be assigned a mass greater than that of their descendant halos.For florah-generated MPBs, this can be corrected by simply re-sampling the mass and concentration in Step (iv).However, for MPBs in N-body simulations, the specific corrective measures will depend on the nature of the problem and the characteristics of the simulation data.Thus, to compare florahgenerated MPBs with N-body MPBs, we adopt the following simple method.For an MPB with  halos, starting from the last progenitor halo at time step  =  − 1 (with the root halo at time step 0), we compare its mass  vir ).We then move forward to the next time step  − 1 until we reach the root halo at time step 0.

RESULTS
In Section 4.1, we train florah on vsmdpl to show results for learning MPBs of a single simulation.Then we train florah on the combined gureft-c simulations to show results for combining multiple simulations at different resolutions.Finally, in Section 4.3, we combine the trained models to extend the mass and redshift coverage of florah.

Learning the vsmdpl simulation
We trained florah on vsmdpl to test our method on its capability to learn the assembly histories of a single simulation.We refer to this model as florah-v.For our training and validation datasets, we extracted 306, 014 and 34, 400 MPBs, respectively, from a (80 Mpc h −1 )3 sub-volume of vsmdpl and applied the preprocessing step in Section 3.1.1.The maximum training redshift is set to be  max,train = 10.The training process terminated at ∼ 200 epochs or ∼ 2 hours on a single NVIDIA Tesla V100 GPU.
For our test dataset, we extracted 387, 031 MPBs from a different (80 Mpc h −1 ) 3 vsmdpl sub-volume.For each MPB in the test dataset, we used the initial halo at  = 0, taking its mass and concentration as the initial input feature  (0) = (log 10  (0) vir ,  (0) vir ) for the generation process.We sampled the scale factors every 2 -6 snapshots2 starting from  = 0 to obtain the time features of each branch  () .We also sampled up to a redshift of  max,gen = 14, beyond the maximum training redshift ( max,train = 10), to explore the model's extrapolation capability.As the length of each MPB is set by choosing  () , which are chosen randomly, we can generate MPBs with variable lengths, and the upper bound in the number of progenitor halos depends on the minimum mass.We generated 387, 031 florah-v MPBs to match the number obtained from vsmdpl.
In Figure 4, we show an example of a few histories generated by florah-v.Each column shows the mass (top panel) and concentration (bottom panel) histories of an example root halo in the vsmdpl simulation (blue).For each root halo, we use florah-v to generate 30 different realizations of the histories (orange).The shaded regions denote the extrapolation region beyond  max,train = 10, as mentioned above.This figure demonstrates the stochasticity of the assembly histories of halos: for the same descendant mass and concentration, we can arrive at vastly assembly scenarios.In addition, we see that florah-generated MAHs can capture both smooth accretion, characterized by a steady increase in the MAHs, as well as implicit merger events, characterized by sharp jumps in the MAHs.This highlights the advantage of florah over other approaches, distinguishing it from other methods that predominantly capture the average trends of MAH.

Progenitor-descendant ratio distribution
In the following sections, we present various tests to validate the mass and concentration histories generated by florah-v.In Figure 5, we show the distribution of the progenitor-descendant mass ratios log 10   /  across four mass bins (columns) and three redshift slices (rows).The distribution of log 10   /  is learned directly by florah-v during the training process and thus presents a good first validation test.We see good agreement between the distributions from vsmdpl (shaded blue) and florah-v (solid orange).Note that because the last mass bin with log 10  vir,0 ∈ [13.0, 14.0] dex contains only 373 MPBs, the distributions are noticeably noisier.In each bin of the first row of Figure 5, the descendant halos are also the root halos and thus have roughly the same virial mass.However, we see that the distribution of the progenitor-descendant mass ratios can span a wide range of values, which further demonstrates the stochasticity of the MAHs between redshifts.

Mass and DM concentration histories
In Figure 6, we show the median, middle-68% percentile, and middle-95% percentile containment region of the assembly histories of vsmdpl (blue) and florah (orange), along with their residuals, in four mass bins.We normalize the MAHs by dividing the progenitor masses by the root mass so that they can be more easily compared between bins.The residual of the containment region is computed by averaging the residuals of the corresponding upper and lower percentile curves.As can be seen from the slopes of the MAHs, low-mass halos tend to form earlier than high-mass halos, as has been observed in previous studies of galaxy formation (e.g.(Wechsler et al. 2001)).The MAHs plateau out at around the resolution limit  min,halo = 100  DM (dashed black line, computed from the high end of each mass bin), as expected, because we remove all halos below this mass limit.In all mass bins, the MAHs predicted by florah-v are consistent with those from vsmdpl.As mentioned above, the last mass bin log 10  vir,0 ∈ [13.0, 14.0] dex contains significantly fewer MPBs, so the containment regions are noisier.In the bottom panels of Figure 6, we show the DM concentration  vir histories of vsmdpl and florah.Similarly, the concentration histories predicted by florah agree well with those of vsmdpl.The shaded gray box represents the  >  max,train = 10 region, beyond which the MAHs and concentration histories are extrapolated.Except for the first bin, florah-v demonstrates an impressive capability to extrapolate and predict the MAHs and concentration histories up to  = 14.In the lowest mass bin, florah accurately extrapolates up to about  ≈ 12, but it overestimates the MAHs after this point.We attribute this to the fact that this bin is the closest to the resolution limit  min,halo , as can be seen from the horizontal dashed line, and caution against extrapolating florah near this limit.
In Figure 7, we show the joint distribution between a few chosen redshifts ( = 0.1, 1, 2, 4, 6, 8) for  vir and  vir for florah-vgenerated (solid orange) and gureft-c (dashed blue) MPBs.We also emphasize that florah jointly predicts the mass and DM concentration of the halos at each redshift using the conditional normalizing flows.The flows excel at modeling high-dimensional correlations, so additional halo properties (e.g., halo shape, environment density) can naturally be incorporated into florah without requiring additional assumptions about the data.In Figure 8 relations.The bottom panel shows the residual between the florah-v and vsmdpl median.In general, we see that the concentration-mass relations recovered by florah-v are consistent with vsmdpl across multiple redshifts.
In Figure 9, we show the distribution of the formation redshift  50 , defined as the redshift at which the halo forms 50% of its mass, for the four chosen mass bins.As mentioned above, low-mass halos tend to form at higher redshift than high-mass halos.We see that the  50 distribution by florah-v agrees well with vsmdpl.We emphasize that the formation redshift  50 is not explicitly learned by the network but instead derived from the MAHs.This further shows that florah can capture the "long-range" mass correlation in the MAHs.

Observables and Halo assembly bias
We do not directly observe dark matter halos or merger trees, but only the observable properties of galaxies.In semi-analytic models (SAMs), these properties are predicted by using merger trees as inputs and solving ordinary differential equations for observables like galaxy luminosities and quasi-observables like stellar mass.Thus we need to test that florah-generated assembly histories can be used in place of N-body assembly histories in SAMs to predict these observables.To that end, here we apply the SC-SAM to predict the stellar masses  ★ of galaxies.For a fair comparison with vsmdpl, we input only the MPBs of vsmdpl trees to the SAM.We use both the mass and DM concentration of the generated MPBs.In SC-SAM, the concentration is used in two ways: first, in computing the galaxy sizes (see Somerville et al. 2008), and second, in computing the timescale for disruption of satellite galaxies by tidal forces.Note that since we do not model satellite galaxies in our analysis, the latter point is not directly relevant.We apply the minimum progenitor mass cut  min,halo to both vsmdpl and florah halos.In addition, we do not restrict the redshift range of either vsmdpl and florah halos.
In the top panel of Figure 10, we show the stellar-to-halo mass relation (SHMR).Next, we compute the SHMR residual, defined as the difference between the  ★ / vir value for each halo and its median value in the corresponding  vir bin.The correlation between the SHMR residual and the DM concentration  vir is shown in the bottom panel.Estimating the residual this way removes most of the mass dependence.This demonstrates that the galaxy properties depend on secondary halo characteristics beyond halo mass, which can lead to the phenomenon known as "assembly bias" (Hadzhiyska et al. 2021;Gabrielpillai et al. 2022).In both cases, relations predicted by florah are consistent with vsmdpl.A SAM run on florah-generated assembly histories accurately reproduces the correlation between stellar mass, halo formation history, and halo concentration.
However, in the top panel of Figure 10, we observe a bifurcation in the SHMR, leading to a secondary population of galaxies with low stellar mass.We theorize that this population represents galaxies which normally grow their mass via mergers.Since we only input MPBs into the SC-SAM, we underestimate the stellar mass of these galaxies.The bifurcation disappears when we input full merger trees of vsmdpl into the SC-SAM, which further supports our theory.Currently florah only generates MPBs, and thus it is unable to capture these merger-driven galaxies.We have also found that using only the MPBs, SC-SAM does not correctly recover the mass of the central supermassive black holes for both vsmdpl and florah.This is expected because the central supermassive black holes grow mass via mergers.We further discuss the limitations of the current framework and future outlook for florah in Section 5.

Clustering
To further probe the assembly bias, we compute the galaxy power spectra () of vsmdpl and florah-v-generated galaxy catalogs at  = 0. florah does not generate positions, and so for each halo, we take the positions of the corresponding vsmdpl halo in the test dataset at  = 0. Thus, only the stellar masses computed by SC-SAM differ between the two catalogs.In the top left panel of Figure 11, we show the galaxy power spectra () of vsmdpl (dashed blue) and florah-v (solid orange) catalogs.The ratio between the residual Δ() =  florah () −  vsmdpl () and the vsmdpl power spectrum is shown in the bottom panel.We see excellent agreement between the power spectrum of the florah-v catalog and that of the vsmdpl catalog.We emphasize that this test is non-trivial because florah is trained on individual halos, independently, and has no notion of any spatial correlations when generating MAHs for a population of clustered halos together.
Studies of assembly bias have also shown that early-forming galaxies exhibit stronger clustering tendencies compared to their lateforming counterparts (e.g., Wechsler et al. (2006)).To investigate this phenomenon, we categorized each galaxy catalog into two groups based on their formation redshift  50 : a late-forming population comprised of the first 25% of halos, and an early-forming population consisting of the last 25%.The resulting power spectra () and residuals are displayed in the right panels of Figure 11, with the solid line representing the late-forming population and the dashed line representing the early-forming population.The orange lines represent florah-v galaxies, while the blue lines represent vsmdpl galaxies.Our analysis indicates that the power spectra of florah-v galaxies are consistent with those of vsmdpl galaxies for both populations.This further suggests that florah-generated histories can accurately replicate the assembly bias of galaxies.

Learning the combined gureft simulations
Using the procedure in Section 3.1.2to combine the four gureft boxes, we extracted 42, 774 unique MPBs and applied the data augmentation steps in Section 3.1.1.Note that gureft-c contains significantly fewer halos than vsmdpl because the gureft boxes have much smaller volumes and particle numbers (as can be seen from Table 1).For this reason, we used 38, 023 MPBs for the training dataset and 4, 751 MPBs for the validation dataset and did not create a test dataset for gureft-c.We trained a model, denoted as florah-g, up to a redshift of  max,train = 20.Then we generated 4, 751 florah-g MPBs to match the number obtained from gureft-c.During the generation process, we applied a similar procedure in Section 4.1 to choose the starting halo  (0) = (log 10    We perform additional validation checks similar to Section 4.1.In Figure 12, we compare distributions of the progenitor-descendant mass ratio log 10   /  from gureft-c (shaded blue) and generated by florah-g (solid orange) for four mass bins (columns) and three redshift slices (rows).We choose the four mass bins such that each bin corresponds to a different gureft box, with gureft-05 being the least massive bin and gureft-90 being the most massive bin.Similar to the vsmdpl example in Section 4.1, we observe good agreement between the distributions from gureft-c and generated by florah-g.
In Figure 13, we present the median, 68%-percentile, and 95%percentile containment regions of the MAHs (top panels) and the DM concentration histories (bottom panels) for MPBs from gureftc (blue) and generated by florah-g (orange).Similarly to vsmdpl, low-mass halos tend to form at earlier redshifts than high-mass halos.Also as expected, the MAHs of both gureft-c and florah-vg plateau out at  min,halo .There are a few differences compared to vsmdpl and the low-redshift case.Unlike in vsmdpl, where  min,halo is fixed at 100 times the DM mass of vsmdpl, here  min,halo is determined by the log 10  vir,0 at  = 5.89 (refer to Table 1).We choose the same four mass bins as Figure 12, with each bin corresponding to a different gureft box.Due to a low number of halos in the validation dataset, we only display redshift slices with more than 10 resolved halos, resulting in the termination of the MPBs before  max,train = 20 in the last two mass bins.Again, we observe good agreement between the MAHs and concentration histories of florah-g MPBs with those of gureft-c MPBs.Moreover, florah-g exhibits a similar capability to extrapolate the MPBs beyond  max,train = 20 in the first two mass bins.
In Figure 14, we show the joint distributions of  vir (left panels) and  vir (right panels) at selected redshifts (𝑧 = 6, 8, 12, 16, 20) for florah-g-generated (solid orange) and gureft-c (dashed blue) MPBs.The contour lines represent the 68% and 95% intervals.Once again, the distributions of florah-g MPBs match well with those of gureft-c MPBs.Note that the mass distribution of gureft-c exhibits multiple peaks due to the applied mass cuts in the procedure to combine the four gureft-c boxes (see Section 3.1.2).
In Figure 15, we show the median concentration-mass relations from the combined gureft-c boxes and florah-g-generated MPBs.Similar to the vsmdpl example in Figure 8, the error bars denote the spread of the relations, computed from the 16th and 84th percentiles.The bottom panel shows the residual between florah-g and gureftc median relations.Here, we also see that the relations between gureft-c and florah-g agree well.
Finally, we show the distribution of the formation redshift  50 for the four chosen mass bins of gureft-c in Figure 16.We see that similarly as in the vsmdpl case, we recover the  50 distribution of the gureft-c simulations.Again, we emphasize that the formation redshift  50 is a derived quantity from the MAHs and not learned directly by the network, indicating that florah can capture the "longrange" mass correlation in the MAHs.We opt not to apply the same SAM analysis shown in Section 4.1.3and 4.1.4on florah-g and gureft-c MPBs.Given that florah-g can recover the environmentdependent quantities such as the  50 , we expect the SHMR, SHMR residual-concentration relation, and galaxy clustering predicted by florah-g to also be consistent with the simulations.
To conclude this section, we developed a procedure to combine multiple N-body simulations with varying resolutions into the training dataset and showed that florah can accurately learn the MAHs and concentration histories of these simulations.The gureft simulations in this example are designed to capture the assembly histories of DM halos at the ultra-high redshifts at an unprecedented temporal resolution.By successfully incorporating the gureft simulations into our training dataset, florah becomes an essential tool for understanding and analyzing the formation and evolution of cosmic structures during the early Universe.

Combining gureft and vsmdpl
High computational costs make it infeasible to run N-body simulations at the mass resolution required to simultaneously capture the merger dynamics from dwarf galaxies (10 5 − 10 10 M ⊙ ) to galaxy clusters (10 14 − 10 15 M ⊙ ) up to high redshifts and in large volumes.The gureft simulations are simulated up to  ≈ 6 and designed to be complementary to past large volume simulations such as vsmdpl.In this section, we combine the previously trained model, florah-v on vsmdpl and florah-g on gureft-c, in an attempt to generate MPBs spanning  = 0 to  ≈ 24.
The generation of the combined assembly histories is done in two steps.First, we applied a similar procedure in Section 4.1 to choose the starting halo vir ) at  = 0 from the test dataset of vsmdpl.As before, we chose the list of vsmdpl scale factors by sampling every 2 -6 vsmdpl snapshots from  = 0 and used florah-v to generate the MPBs.Here, we only generate the MPBs up to a redshift of  = 5.89 and down to a progenitor mass of  min,halo = 5.89 × 10 8 M ⊙ , i.e. 100 times the DM mass of vsmdpl.In the second step, for the MPBs that are not terminated before  = 5.89, we used florah-g to further generate their progenitors.Note that currently, we do not carry the hidden states between models (i.e. from florah-v to florah-g).This is because the two models are trained independently due to the lack of N-body simulations spanning the entire redshift range from  = 0 to  ≈ 24, and hence their hidden states are not necessarily correlated.Thus here, for the sake of simplicity, we initialize the hidden state for florah-g with zeros but plan on investigating better initialization procedures in the future.We chose a list of gureft scale factors by sampling every 2 - 6 gureft snapshots from  = 5.89 up to  max,gen = 24.Again, we set  max,gen = 24 >  max,train of gureft-c to demonstrate the model's extrapolation capability.Unlike in vsmdpl, the  min,halo is set based on the progenitor mass at  = 5.89.After we generate the full MPB, we apply the post-processing steps in Section 3.3.We generated 387, 031 MPBs, the same number of trees as the vsmdpl test dataset.For convenience, we denote these combined MPBs as florah-vg MPBs.as mentioned in Section 2, due to the resolution limit of vsmdpl, less massive root halos of vsmdpl (≲ 10 10 M ⊙ ) do not have MAHs that overlap with gureft-c.We cannot extend the MPBs of these halos with gureft-c and florah-g unless we allow the generation of poorly resolved halos (i.e.halos with fewer than 100 DM particles) during the first step of the generation procedure.Thus, for halos with ≲ 10 10 M ⊙ , which corresponds to the host halos of dwarf galaxies, we are unable to reliably generate their assembly histories beyond  ≈ 4 − 5.The longest florah-vg MPBs have root masses from 1.8 × 10 10 − 10 11 M ⊙ (corresponding to the bright end of the dwarf galaxy population) extend to  ≈ 20 − 22. Finally, it is worth noting that gureft-90 and vsmdpl do not overlap due to the mass cut applied when combining the gureft boxes (as described in Section 3.1.2).As a result, we do not fully utilize gureft-90 during the generation.In future work, we can connect the massive end of gureft-90 with vsmdpl by including an additional simulation in the training dataset.
Next, we examine the generated MPBs in more detail.Since we do not have N-body MPBs that span from  = 0 to  ≈ 24, we divide each florah-vg MPB into a low-redshift component which spans  = 0 − 10 and a high-redshift component which spans  = 5.89 − 25.We will compare the low-redshift component with vsmdpl and the high-redshift component with gureft.

Mass and DM concentration histories at low redshifts
In Figure 18, we show the median, 68%-percentile, and 95%percentile containment regions of the MAHs (top panels) and DM concentration (bottom panels) histories for MPBs from vsmdpl (dashed blue) and florah-vg (solid orange).The MAHs and DM concentration histories of vsmdpl are the same as Figure 6.Unlike the vsmdpl MAHs and the florah-g MAHs in Figure 6, the florah-vg MAHs do not plateau out at the resolution limit of vs-mdpl (horizontal dashed line).This is to be expected because the progenitors of florah-vg halos at high redshifts are generated by the florah-g model and thus have a higher mass resolution.We will show in Section 4.3.2below that the MAHs at high redshifts are indeed consistent with gureft-c.However, we note some unphysical features in the MAHs and concentration histories.The MAHs of vsmdpl and gureft-c do not line up perfectly at the transition redshift  = 5.89, creating "kinks" in the 95% of the MAHs in the first two mass bins.This is particularly clear in the bottom panels, where the kinks can also be seen in the third mass bins and the median and 68% of the DM concentration histories.We defer the discussions of the limitations of our approach in more detail in Section 5, after we compare the high-redshift components of florah-vg to gureft-c.

Mass and DM concentration histories at high redshifts
In Figure 19, we compare the high-redshift components of the florah-vg MPBs with gureft-c MPBs.As above, the median, 68%percentile, and 95%-percentile containment regions of the MAHs (top panels) and DM concentration (bottom panels) histories.Because gureft-c contains significantly fewer MPBs than generated by florah-vg (since florah-vg MPBs are generated from the root halos of vsmdpl), we subsample florah-vg MPBs to match the number of MPBs in gureft-c for a fair comparison.Similarly to Figure 13, we do not include redshift slices with fewer than 10 resolved halos, resulting in the MAHs and concentration histories in the last two bins terminating before  max,train = 20.In general, both the MAHs and concentration histories of florah-vg agree well with gureft-c.Unlike in the low-redshift case, the MAHs of both gureft-c and florah-vg plateau out at  min,halo as expected.As a reminder, here  min,halo is determined by the log 10  vir,0 at  = 5.89 (refer Table 1).Note that we still see the effect of the sharp transition between   The disagreement between the vsmdpl MAHs and florah-vg MAHs at high redshift is expected.Similarly to Figure 6, the vsmdpl MAHs plateau out at the resolution limit of vsmdpl.On the other hand, at  ≳ 6, florah-vg MAHs do not plateau out since they are generated by florah-g, which is trained on gureft-c, and thus have higher resolutions than vsmdpl.
vsmdpl and gureft-c at  = 5.89 in the concentration histories, though to a much lesser extent.

Comparison with previous works
Previous approaches to modeling the cosmological MAHs of DM halos typically fit a parameterized functional form over the entire simulated MAH population (e.g.Hearin et al. 2021;McBride et al. 2009;Wechsler et al. 2001, etc.).For example, McBride et al. (2009) assumes the MAHs take the form: where  0 =  (0) is the mass of the root halo, and  ≡ (, ) are free parameters of the fit.One can also imagine fitting the evolution of the DM concentration  vir () or other properties of the halos in a similar manner.As shown in the aforementioned works, these approaches can also provide a reasonably accurate representation of the MAH and capture population-level statistics such as the mean MAH for a given root halo mass.To capture the full distribution (i.e., the scatter) of the MAH population, one can model the full distribution of the fit parameters (), as shown in Hearin et al. (2021).The dependency of  on the root halo properties such as  0 and  0 can be included as a conditional probability, i.e. (| 0 ,  0 ).The florah framework can be considered as a non-parametric modeling method for the MAH, with a few notable advantages.First, the parametric modeling approaches above assume a functional form not only for the mean of the MAH but also the scatter (and other quantiles) of the MAH population distribution, as this stochasticity is folded into the sampling of the fit parameters (| 0 ,  0 ) at  = 0.It is unclear if these all should follow the same distribution.Thus, non-parametric methods such as florah can provide more flexibility in modeling the MAH distribution.Moreover, parametric methods may require additional assumptions to incorporate more than one halo property.For example, to fit both the MAH and the DM concentration histories, one needs to assume how the mass and concentration are correlated, e.g.via the halo concentration-mass re- lations.On the other hand, florah can jointly fit both the mass and DM concentration without requiring additional assumptions about the data.Indeed, we show in Figures 8 and 15 that MPBs generated by florah follow the same halo concentration-mass relations as the simulations.Additionally, incorporating additional halo properties (e.g., the halo shape or environment information) into florah is straightforward, since the conditional normalizing flows can easily model high-dimensional distributions.We plan to explore different sets of halo properties in future work.
As shown in Section 4, florah can correctly capture the progenitor-descendant mass ratios at each time step and model the MAH at the individual halo level.The distribution of the mass (and DM concentration) at each redshift  is modeled independently with the conditional normalizing flows, thus allowing for "jumps" in the MAH to occur, representing DM halo mergers, which can play an important role in galaxy formation models.In contrast, the parametric modeling approaches assume the MAH to be a smooth function, so there is no stochasticity between redshifts once the fit parameters  are sampled from (| 0 ,  0 ) at  = 0. Though it is possible to increase the fidelity of the parametric fits by resampling  at different , this requires modeling (| 0 ,  0 , ), which will be much more difficult.
Halo-level properties such as the progenitor-descendant mass ratios are important for certain modeling applications of black hole seeding in SC-SAM, which we will explore with florah in future works.We note that EPS-based approaches can also model the stochasticity between redshifts in a similar autoregressive manner, by sampling the conditional mass function from the EPS formalism.However, in EPS-based approaches, the MAH is usually modeled as a Markov Chain, in which the mass of the progenitor is modeled using only the mass of its immediate descendant.florah extends this so that the progenitor mass is modeled using the entire assembly history.

Current limitations and future outlook
We showed in Section 4.1 that when trained on a single N-body simulation, florah can accurately recover the MAHs and concentration histories.This applies similarly when multiple N-body simulations with the same redshift range but varying mass resolutions are combined into a single training dataset (Section 4.2).We further showed that florah-generated MAHs can be input into the SC-SAM to predict observable properties and assembly bias of galaxies.However, because the current framework only generates the MPBs of merger trees, florah cannot yet capture correctly merger-driven properties such as the stellar mass of some population of galaxies or the mass of supermassive black holes.To capture these properties, we must include secondary branches as well as other sub-branches in the generation process.In ongoing work, we are extending it to generate full merger trees by, for example, modeling the probabilities of branching events and utilizing graph generative models.
In Section 4.3, we combined the florah-v and florah-g model to generate assembly histories from  = 0 to  ≈ 24.Overall, we found that the generated MAHs and concentration histories agree with the N-body simulations.However, the current approach has two main drawbacks.
First, due to the lack of N-body simulations spanning the redshift range from  = 0 to  ≈ 24, florah-v and florah-g are trained independently.Thus, their hidden states are not necessarily correlated and cannot be carried from one model to the other (i.e. from florah-v to florah-g).As a result, the information carried by the hidden states of florah-v is lost during the transition.We plan on investigating better initialization procedures in future work.For example, instead of transitioning at  = 5.89, we may better utilize the overlapping redshift range between vsmdpl and gureft.We found that the current range,  > 5.89, to be insufficient because it includes too many poorly resolved halos of vsmdpl (i.e.those with fewer than 100 DM particles), which create an inconsistency between the MAHs of vsmdpl and of gureft.
Second, the current approach for combining the simulations can result in some unphysical features in the MAHs and concentration histories at the transition redshift  = 5.89 between the two simulations.This is especially clear in the concentration histories, where we see that the concentration histories of vsmdpl are consistently higher than florah-vg at  > 5.89.However, from Figure 19, it is clear that the concentration histories of florah-vg is consistent with gureft-c.This implies that there is a systematic difference between the concentration histories of gureft-c and vsmdpl.We attribute this difference to the absence of unresolved, low-mass progenitor halos in vsmdpl at these redshifts.At high redshifts,  vir tends to be lower for lower-mass halos, so missing the low-mass halo population will artificially push the distribution of  vir towards the high values.This is supported by the fact that the lower-68% and lower-95% of the concentration histories are less affected compared to the upper-68% and upper-95%.Although a more sophisticated generation procedure may mitigate the sharp transitions, we attribute the problem mainly to the inconsistency in the MAHs and concentration histories of the two training simulations.We refer to Yung et al. (2024) for a more detailed diagnosis of this inconsistency and comparison between gureft and the MultiDark simulations (including vsmdpl).
Both of the above drawbacks can be mitigated by improving the simulations.In future work, we are considering the possibility of further running gureft to a redshift of 4 and with a larger box size (e.g. 120 Mpc h −1 ).This will extend the overlapping redshift range between the two simulations and ensure that vsmdpl has a sufficient number of resolved halos in this range, thus improving the consistency between the assembly histories.In addition, this enables the assembly histories of halos with less than 10 10 M ⊙ to be extended using a gureft-based florah model.

CONCLUSION
In this paper, we developed florah, a deep generative model based on recurrent neural networks and normalizing flows, to generate the mass assembly histories (MAHs) of dark matter halo merger trees.
We trained two models independently on the vsmdpl and gureft N-body simulations and assessed the performance of the models to recover the MAHs, concentration histories, and observable and semi-observable properties derived from semi-analytic models such as stellar mass.Our main findings are summarized as follows: • In Section 4.1, we found that the florah-v model, which is trained on the vsmdpl simulation up to redshift 10, can accurately capture the MAHs and concentration histories across a wide range of host halo masses.Additionally, when operating well above the resolution limit of the simulation, florah-v robustly extrapolates the histories up to redshift 14, beyond the training redshift.We applied the Santa Cruz semi-analytic model to florah-generated histories and showed that florah can recover correlations between galaxy properties and halo assembly history, such as the correlation between stellar-to-mass ratio residual and halo concentration, as well as the galaxy matter power spectrum.florah is the first machine learning framework that accurately learns and incorporates the correlations between halo properties and merger history.
• In Section 3.1.2,we developed a procedure to incorporate multiple simulations with the same redshift range and varying mass resolutions into a single training dataset.In Section 4.2, we trained the florah-g model on the gureft suite of simulations, comprising four boxes (gureft-05, gureft-15, gureft-35, and gureft-90), and successfully demonstrated florah's ability to handle this variation.The gureft simulations were purposefully crafted to capture the evolution of dark matter halos at the ultra-high redshift Universe and at an unprecedented temporal resolution.By successfully learning the assembly histories of these simulations, florah proves to be a versatile tool for studying the formation and evolution of cosmic structures during the initial phases of the Universe.
• In Section 4.3, we developed a procedure to concatenate multiple florah models.We used florah-v to generate the assembly histories from  = 0 to  = 5.89 and florah-g to further generate the histories to  ≈ 20 − 24.We showed that despite the simplicity of our approach, we were able to generate the mass assembly histories from  = 0 to an ultra-high redshift  ≈ 24 for halos with root mass ≲ 10 10 M ⊙ , which corresponds to the bright end of the dwarf galaxy population.We were unable to reliably generate assembly histories below this mass due to the resolution limitation of vsmdpl.In addition, during the generation process, we found an inconsistency in the dark matter concentration histories of vsmdpl and gureft at  ≳ 6, which may be attributed to the missing of unresolved progenitor halos in vsmdpl (refer to Yung et al. (2024) for a more detailed discussion).This results in sharp kinks in the concentration histories at the transition redshift  = 5.89 between the two florah models.These limitations can be overcome by improving the overlapping between gureft-c and vsmdpl, e.g., by running gureft-c to a lower redshift and with a larger volume.)Despite the current limitations, this represents the first step towards generating assembly histories at a mass resolution required to simultaneously capture the merger dynamics from dwarf galaxies to galaxy clusters up to high redshifts and in large volumes.At present, this is far beyond the capability of numerical simulations.
• In Section 5, we discuss the advantages of florah over traditional parametric modeling approaches for the MAHs.Like many machine learning methods, florah is non-parametric and does not require assumptions on the functional form of the mass and DM concentration assembly histories.It is also straightforward to include additional halo properties beyond the mass and DM concentration since the normalizing flows employed for density modeling excel at capturing high-dimensional distributions.In addition, florah mod-els the MAHs in an autoregressive manner, sampling the halo mass and concentration at each redshift based on the entire halo history.This allows florah to capture variations between redshifts, such as the progenitor-descendant mass ratios, more easily.florah represents an exciting and promising initial step towards the development of a machine learning-based framework for generating full merger trees.Such a framework has the potential to revolutionize our understanding of how galaxies form and evolve by allowing for the exploration of different galaxy formation scenarios with excellent computational efficiency at unprecedented accuracy.To achieve this goal, we intend to expand florah's capabilities to generate secondary branches and other sub-branches.In addition, we aim to develop an emulator that can generate merger trees by training florah on simulations featuring different cosmological parameters.Doing so will enable florah to learn and capture the dependence of halo assembly histories on cosmology.

Figure 1 .
Figure 1.The median MAHs of the main progenitor branches of gureft and vsmdpl down to 100  DM , where  DM is the mass of the DM particle in the corresponding simulations.
Select a list of scale factors {() } as time steps.florah is robust across various time steps, as long as the chosen time steps remain within a reasonable range relative to the training time steps.

Figure 4 .
Figure 4. Example mass and concentration assembly histories generated by florah-v.Each column shows the mass (top) and concentration (bottom) assembly histories of an example root halo from the vsmdpl simulation (blue) and 30 different realizations by florah-v.The shaded gray box denotes the "extrapolation region" as we only train florah-v up to a maximum training redshift of  = 10.

zFigure 5 .
Figure5.The normalized distribution of the progenitor-descendant mass ratio log 10   /  as modeled by florah-v (dashed) and from the vsmdpl simulation (shaded).Each column represents a different root mass bin (in M ⊙ unit).Each row represents a different progenitor-descendant redshift slice.

Figure 6 .
Figure6.The medians, middle 68-% percentile, and middle-95% containment regions of the MAHs (top) and DM concentration histories (bottom) of vsmdpl (blue) and florah (orange), along with their residuals (bottom), in four mass bins (in M ⊙ unit).The residual of the containment region is computed by averaging the residuals of the corresponding upper and lower percentile curves.The shaded gray box ( > 10) denotes the "extrapolation region" beyond the maximum training redshift.

Figure 10 .
Figure10.Top: The stellar-to-halo mass relation (SHMR) at  = 0 computed by the SC-SAM.Bottom: The residual of SHMR, defined as the difference between the  ★ / vir value for each halo and the median value in its corresponding  vir bin, as a function of the DM halo concentration.In both panels, the shaded regions represent the middle-68 % containment regions.

Figure 11 .
Figure11.Left: The top panel shows the galaxy power spectra  ( ) at  = 0 of galaxy catalogs from vsmdpl (dashed blue) and generated by florah-v (solid orange).The bottom panel shows the ratio of the residual Δ ( ) =  florah ( ) −  vsmdpl ( ) and the vsmdpl power spectrum.Right: The top panel shows the galaxy power spectra  ( ) at  = 0 of galaxy catalogs from vsmdpl (blue) and generated by florah-v (orange) divided into a late-forming galaxy population (solid) and an early-forming population (dashed).The bottom panel shows the ratio of the residual Δ ( ) and the vsmdpl power spectrum for the corresponding population.Similarly, the solid line represents the late-forming population, while the dashed line represents the early-forming population.
vir ) at  = 5.89 and list of scale factors from the validation dataset.Similarly, we generate MPBs up to a redshift  max,gen = 24 >  max,train to demonstrate the model's extrapolation capability.

Figure 12 .
Figure 12.The normalized distribution of the progenitor-descendant mass ratio log 10   /  as modeled by florah-g (blue dashed line) and from the gureft-c simulation (shaded orange).Panels are the same with Figure 5.

Figure 13 .
Figure 13.The MAHs and DM concentration histories of the MPBs in gureft-c and florah-g in four mass bins (in M ⊙ unit).Panels are the same as Figure 6.

Figure 16 .Figure 17 .
Figure 16.The formation redshift  50 , defined as the redshift at which the halo forms 50% of its mass, for four mass bins of gureft-c.Masses are in M ⊙ unit.

Figure 18 .
Figure 18.The MAHs and DM concentration histories of the MPBs in vsmdpl and florah-vg in four mass bins (in M ⊙ unit).Panels are the same as Figure 6.The disagreement between the vsmdpl MAHs and florah-vg MAHs at high redshift is expected.Similarly to Figure6, the vsmdpl MAHs plateau out at the resolution limit of vsmdpl.On the other hand, at  ≳ 6, florah-vg MAHs do not plateau out since they are generated by florah-g, which is trained on gureft-c, and thus have higher resolutions than vsmdpl.

Figure 19 .
Figure 19.The MAHs and DM concentration histories of the MPBs in gureft-c and florah-vg in four mass bins (in M ⊙ unit).Panels are the same as Figure 6.

Table 1 .
The simulation specifications of vsmdpl and the four gureft simulations.The third and second-to-last columns show the minimum and maximum root mass of the training dataset for each simulation.Note that the four gureft simulations are combined into one training dataset.The last column shows the minimum progenitor mass during the generation process.
into a single training dataset, which we denote as gureft-c.The four gureft boxes exhibit significant overlap in halo masses as depicted by the distributions of  vir in Figure3.However, due to differences in