Ahf: AMIGA'S HALO FINDER

Steffen R. Knollmann and Alexander Knebe

2009 ApJS 182 608, doi:10.1088/0067-0049/182/2/608. Published 2009 May 19.

ABSTRACT

Cosmological simulations are the key tool for investigating the different processes involved in the formation of the universe, from small initial density perturbations to the galaxies and clusters of galaxies observed today. The identification and analysis of bound objects, halos, is one of the most important steps in drawing useful physical information from simulations. With the advent of larger and larger simulations, a reliable and parallel halo finder, able to cope with the ever-increasing data files, is a must. In this work we present the freely available MPI-parallel halo finder Ahf. We provide a description of the algorithm and the strategy followed to handle large simulation data. We also describe the parameters a user may choose in order to influence the process of halo finding, as well as pointing out which parameters are crucial to ensure untainted results from the parallel approach. Furthermore, we demonstrate the ability of Ahf to scale to high-resolution simulations.


1. INTRODUCTION

The identification and hierarchical grouping of "clumps" within large sets of particles (be it dark matter, star, or gas particles) produced by cosmological simulation codes is the objective of halo finding. A variety of methods have been developed and have seen many improvements over the years; however, at heart, all methods try to identify peaks in the density field and group particles around those peaks.

The classical method to identify isolated structures is the purely geometrical "Friends-of-Friends" (Fof) algorithm (e.g., Davis et al. 1985) in which particles closer than a given scale (the linking length) are connected. The whole particle distribution then separates into isolated regions where outside particles do not come closer than the linking length. A serious shortcoming of this method is the danger of linking two blobs together via a "linking bridge." Additionally, with a fixed linking length it is impossible to identify substructure within a Fof halo. Many variants of this method have been developed, trying to overcome the shortcomings of the classical algorithm, either by using adaptive linking lengths (Suginohara & Suto 1992; van Kampen 1995; Okamoto & Habe 1999), multiple linking lengths (Klypin et al. 1999, hierarchical Fof) or the inclusion of earlier snapshot analyses (Couchman & Carlberg 1992; Summers et al. 1995; Klypin et al. 1999).

Most other halo finding methods employ an explicit measure of the density field. One of the first methods to do so is the spherical overdensity (So) algorithm (Press & Schechter 1974; Warren et al. 1992; Lacey & Cole 1994), which calculates the density from the distance to the Nth nearest neighbor. It then searches for density peaks and grows spheres about them until a certain overdensity threshold is reached, iteratively adapting the center to the center of the enclosed particles.

The Denmax algorithm (Bertschinger & Gelb 1991; Gelb & Bertschinger 1994) uses a grid to calculate the density field and then moves the particles along a path given by the density gradient until a local density maximum is reached. This artificially increases the separation of clumps and circumvents linking bridges, so that a Fof approach can then be used to collect halo particles. Similar in spirit is the Hop algorithm (Eisenstein & Hut 1998), which employs a different way to move the particles to a density maximum, avoiding the calculation of a density gradient by instead "hopping" to the neighboring particle associated with the highest density. The offspring of Denmax, Skid (see, e.g., Stadel 2001; Governato et al. 1997; Weinberg et al. 1997; Jang-Condell & Hernquist 2001), uses a Lagrangian density estimator similar to the one employed by the So algorithm.

The Bdm method (Klypin & Holtzman 1997; Klypin et al. 1999) uses randomly placed spheres of predefined radius which are iteratively moved to the center of mass of the particles contained in them until the density center is found. This iteration process is also used in the So method. Recently, the Voboz (Neyrinck et al. 2005) technique has been described, which uses a Voronoi tessellation to calculate the local density. The Subfind algorithm (Springel et al. 2001; Dolag et al. 2008) first uses a Fof method and then looks for saddle points in the density field within each Fof group.

Gill et al. (2004) used the grid hierarchy generated by an adaptive mesh refinement (AMR) code (Knebe et al. 2001, Mlapm) to construct a halo finder, Mhf. The grid hierarchy is built in such a way that the grid is refined in high-density regions and hence naturally traces density contours. This can be used to not only select halos, but also to identify included substructure. The AMR grid structure naturally defines the halo–subhalo hierarchy on all levels, which is a mandatory requirement for any state-of-the-art halo finder.

One further very important aspect of finding substructure is the pruning of the associated particle lists to remove unbound particles and thus obtain a dynamical description of the halo. All current halo finding tools (Skid, Bdm, Subfind, Voboz, Mhf) perform this step; however, the degree to which the potential field is reconstructed to define the binding energy varies. Usually the halo is treated in isolation, and only recently have methods been proposed to include tidal effects in the demarcation of subhalos (e.g., Weller et al. 2005; Kim & Park 2006; Shaw et al. 2007).

In this work we describe the successor of Mhf, named Ahf (the Amiga Halo Finder). Amiga aims to be the replacement for the cosmological simulation code Mlapm (Knebe et al. 2001) and is capable of performing the halo analysis during the course of the simulation. However, Ahf can also be used as a stand-alone halo finder, and as such it is described in this paper. It features new reading routines which can handle large simulation data in parallel, as well as other enhancements, although the principal algorithmic idea of Mhf is kept.

The paper is organized as follows. Section 2 introduces the method used to identify halos in cosmological simulations and describes the parallel strategy. In Section 3 we apply Ahf and show the impact of the free parameters and verify that the parallel approach does not introduce artifacts. We then compare Ahf to other halo finders and to theoretical predictions in Section 4. Section 5 shows the scalability and stability of our results. We conclude and summarize in Section 6.

2. HALO FINDING THE Ahf WAY

In this section we describe our algorithm for finding structures in cosmological simulations. Ahf is the successor of Mhf. For a more detailed description of the underlying principles of Mhf, we refer the reader to Gill et al. (2004) and Gill (2005); for reference, especially on the configuration and compilation, the user manual available on the Ahf Web site3 is also helpful. In this paper we focus on the parallel implementation and also provide a study of the parameters of the algorithm. However, we will give a short description of the main ideas underlying Ahf in Section 2.1 and describe the parallelizing approach in Section 2.2. Finally, we summarize the parameters in Section 2.3.

2.1. Ahf

In Ahf we start by covering the whole simulation box with a regular grid of a user-supplied size (the domain grid, described by the ${\sf{DomGrid}}$ parameter). In each cell the particle density is calculated by means of a triangular-shaped cloud (TSC) weighting scheme (Hockney & Eastwood 1988). If the particle density4 exceeds a given threshold (the refinement criterion on the domain grid, ${\sf{DomRef}}$), the cell will be refined and covered with a finer grid with half the cell size. On the finer grid (where it exists), the particle density is recalculated in every cell and then each cell exceeding another given threshold (the refinement criterion on refined grids, ${\sf{RefRef}}$) is refined again. This is repeated until a grid is reached on which no further cell needs refinement.
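To make the density assignment and refinement step concrete, the following minimal Python sketch (not Ahf's actual C implementation; the function names and the use of a TSC-smoothed particle count per cell as the density proxy are our assumptions) assigns particles to a grid with the TSC kernel and flags the cells exceeding a refinement criterion:

```python
import numpy as np

def tsc_density(pos, box_size, n_cells):
    """Particle count per cell from a triangular-shaped-cloud assignment."""
    rho = np.zeros((n_cells,) * 3)
    u = pos / (box_size / n_cells)              # positions in units of the cell size
    for p in u:
        ic = np.floor(p).astype(int)            # cell containing the particle
        for off in np.ndindex(3, 3, 3):         # the 3x3x3 neighbouring cells
            idx = ic + np.array(off) - 1
            d = np.abs(p - (idx + 0.5))         # distance to the cell centre
            w = np.where(d <= 0.5, 0.75 - d**2,
                         np.where(d <= 1.5, 0.5 * (1.5 - d)**2, 0.0))
            rho[tuple(idx % n_cells)] += w.prod()   # periodic wrapping
    return rho

def refine_flags(rho, criterion):
    """Cells whose particle density exceeds DomRef (or RefRef on refinements)."""
    return rho > criterion
```

On each refinement level the cell size is halved and the same two steps are applied again to the particles inside the flagged region.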

Note that the use of two different refinement criteria—one for the domain grid and one for the refinements—has historical reasons rooted in the requirement that for a cosmological simulation the interparticle separation of the initial particle distribution should be resolved and hence $\hbox{\textsf {DomGrid}}= 2N$, with $N^3$ being the total number of particles. However, this might lead to memory problems, and hence the ability to choose a coarser domain grid but to refine it more aggressively was provided. For halo finding the choice of the domain grid is rather arbitrary. We will test for the influence of these parameters later on.

Following the procedure outlined above yields a grid hierarchy constructed in such a way that it traces the density field and can then be used to find structures in cosmological simulations: starting on the finest grid, isolated regions are identified and marked as possible halos. On the next coarser grid again isolated refinements are searched and marked, but now also the identified regions from the finer grid are linked to their corresponding volume in the coarser grid. Note that by construction the volume covered by a fine grid is a subset of the volume covered by the coarser grids.

By following this procedure, a tree of nested grids is constructed and we follow the halos from "inside-out," stepping in density contour level from very high densities to the background density. Of course, it can happen that two patches which are isolated on one level link into the same patch on the next coarser grid, in which case the two branches of the grid tree join. The situation after this step is depicted in the upper row of Figure 1.
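The identification of isolated refinements and their linking to the next coarser level can be sketched as follows (Python with scipy.ndimage; labelling across periodic boundaries and Ahf's actual data structures are omitted, so this is only an illustration of the idea):

```python
import numpy as np
from scipy import ndimage

def isolated_patches(refined):
    """Label connected sets of refined cells on one grid level."""
    labels, n_patches = ndimage.label(refined)   # 0 marks unrefined cells
    return labels, n_patches

def link_to_coarser(fine_labels, coarse_labels):
    """Attach each isolated fine patch to the coarse patch containing it;
    fine cell (i, j, k) lies inside coarse cell (i//2, j//2, k//2)."""
    links = {}
    for idx in zip(*np.nonzero(fine_labels)):
        links[fine_labels[idx]] = coarse_labels[tuple(i // 2 for i in idx)]
    return links
```

Applying this from the finest level upward yields the grid tree: two fine patches that map onto the same coarse patch correspond to two branches joining.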

Figure 1. Here we illustrate the process of classifying a grid tree into substructures. In the top row, an arbitrary sample grid structure is shown on the left and the corresponding grid tree on the right. The classification starts by labeling the coarsest grid as part of the host and then proceeds to the next level. This is depicted in the second row: five isolated grids are embedded and the one containing the most particles is marked as the "host." The other four grids are then marked as subhalos. This process is repeated for the next levels, always deciding for each isolated grid whether it is the start of a new substructure or part of the parent structure.

In a later step, once the grid forest is constructed, the classification of substructure can be made. To do this, we process each tree starting from the coarsest level downward to the finer levels; the procedure is also illustrated in Figure 1. Once the finer level splits up into two or more isolated patches, a decision needs to be made where the main branch continues. This is done by counting the particles contained within each of the isolated fine patches; we choose the one containing the most particles as the main branch, whereas the others are marked as substructures.5 Note that this procedure is recursive and also applies to the detection of sub-sub-structure.
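A minimal sketch of this recursive classification (with a hypothetical GridNode structure standing in for Ahf's internal patch representation) could read:

```python
class GridNode:
    """One isolated patch on one refinement level of the grid tree."""
    def __init__(self, n_particles, children=()):
        self.n_particles = n_particles
        self.children = list(children)
        self.starts_substructure = False

def classify(node):
    """The most populated child continues the parent structure (main branch);
    every other isolated child patch starts a new substructure."""
    if not node.children:
        return
    main = max(node.children, key=lambda c: c.n_particles)
    for child in node.children:
        child.starts_substructure = (child is not main)
        classify(child)        # recursion handles sub-substructure as well
```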

Assuming now that the leaf of each branch of the grid tree corresponds to a halo, we start collecting particles by first assigning all particles on the corresponding isolated grids to this center. Once two halos "merge" on a coarser level, we tentatively assign all particles within a sphere given by half the distance to the host halo (here, the notion of substructure generated by the construction of the grid tree enters). We then consider the halo in isolation and iteratively remove unbound particles; the details of this process are described in Appendix A. Particles not bound to a subhalo will be considered for boundness to the host halo. By doing this for all prospective centers, we construct a list of halos with their respective particles; note that subhalos are by default included in their parent halo, although there is an option to exclude them.
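The unbinding step itself is detailed in Appendix A; purely for illustration, a much simplified version, assuming a spherically symmetric potential built from shells around the halo center and consistent units (lengths in h−1 Mpc, velocities in km s−1, masses in h−1 M☉), might look like this:

```python
import numpy as np

G = 4.30091e-9   # gravitational constant in Mpc (km/s)^2 / Msun

def unbind(pos, vel, mass, centre):
    """Iteratively remove particles moving faster than the local escape
    velocity of a spherically symmetric mass distribution (simplified sketch;
    Ahf's actual procedure is described in Appendix A)."""
    bound = np.ones(len(mass), dtype=bool)
    while True:
        idx = np.flatnonzero(bound)
        r = np.linalg.norm(pos[idx] - centre, axis=1)
        order = np.argsort(r)
        idx, r = idx[order], np.maximum(r[order], 1e-8)    # guard against r = 0
        m = mass[idx]
        m_enc = np.cumsum(m)                               # enclosed mass M(<r)
        outer = np.cumsum((m / r)[::-1])[::-1] - m / r     # contribution of outer shells
        phi = -G * (m_enc / r + outer)                     # potential at each particle
        v_bulk = np.average(vel[idx], axis=0, weights=m)   # halo bulk velocity
        v2 = np.sum((vel[idx] - v_bulk) ** 2, axis=1)
        unbound = 0.5 * v2 > np.abs(phi)                   # kinetic energy exceeds |phi|
        if not unbound.any() or unbound.all():
            break
        bound[idx[unbound]] = False
    return bound
```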

The extent of a halo is defined such that the virial radius $r_{\rm vir}$ satisfies

$\bar{\rho }(r_{\rm vir}) = \Delta _{\rm vir}(z)\,\rho _b, \qquad (1)$

where $\bar{\rho }(r)$ denotes the mean density within r and ρb is the background density. Hence the mass of the halo becomes

$M_{\rm vir} = \frac{4\pi }{3}\,\Delta _{\rm vir}(z)\,\rho _b\,r_{\rm vir}^{3}. \qquad (2)$

Note that the virial overdensity Δvir depends on the redshift and the given cosmology; it is however also possible to choose a fixed Δ. This definition for the extent of a halo does not necessarily hold for subhalos, as it can happen that the overdensity never drops below the given threshold. The subhalo is therefore truncated at higher overdensities, given by a rise in the density profile.
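For a given center and particle list, the virial edge defined by Equation (1) can be located as in the following sketch (function name and interface are our assumptions, units are taken to be consistent, and the truncation of subhalos at a rise of the density profile is omitted):

```python
import numpy as np

def virial_extent(radii, masses, delta_vir, rho_b):
    """Return (r_vir, M_vir) where the mean enclosed density first drops
    below delta_vir * rho_b, cf. Equations (1) and (2)."""
    order = np.argsort(radii)
    r = radii[order]
    m_enc = np.cumsum(masses[order])
    rho_mean = m_enc / (4.0 / 3.0 * np.pi * r ** 3)     # mean density inside r
    below = np.flatnonzero(rho_mean < delta_vir * rho_b)
    if below.size == 0:
        # the overdensity never drops below the threshold (typical for subhalos)
        return None
    i = below[0]
    return r[i], m_enc[i]
```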

We would like to note that host halos (or, in other words, field halos) initially include all particles out to the first isodensity contour that fulfills the criterion $\rho _{\rm iso} < \Delta _{\rm vir}(z)\,\rho _b$. Then the same procedure as outlined above is applied, i.e., the halo is considered in isolation and unbound particles are iteratively removed.

The (bound) particle lists will then be used to calculate canonical properties like the density profile, rotation curve, mass, spin, and many more. After the whole halo catalog with all properties has been constructed, we produce three output files: the first contains the integral properties of each halo, the second holds radially binned information for each halo, and the third provides for each halo a list of the IDs of all particles associated with it.

As this algorithm is based on particles, we natively support multi-mass simulations (dark matter particles as well as stars) and also SPH gas simulations. For Ahf they are all "just" particles and we can find halos based on their distribution; however, for the gas particles of SPH simulations we also consider their thermal energy in the unbinding procedure. Even though, for instance, the stellar component is much more compact than the dark matter part, this does not pose a challenge to Ahf due to its AMR nature: the grid structure is generated based on particle densities rather than matter densities, which would obviously be dominated by the dark matter particles.

To summarize, the serial version of Ahf exhibits, in principle, only a few parameters influencing the halo finding. The user has three choices to make: the size of the domain grid (${\sf{DomGrid}}$), the refinement criterion on the domain grid (${\sf{DomRef}}$), and the refinement criterion on the refined grids (${\sf{RefRef}}$). While all three parameters are mainly of a technical nature, they nevertheless influence the final halo catalogues obtained with Ahf. The code utilizes isodensity contours to locate halo centers as well as the outer edges of the objects. Therefore, ${\sf{DomGrid}}$ along with ${\sf{DomRef}}$ sets the first such isodensity level, whereas ${\sf{RefRef}}$ determines how finely the subsequent isodensity contours will be spaced: if ${\sf{RefRef}}$ is set too large, an object may not be picked up correctly as Ahf samples the density contours too coarsely. We refer the reader to a more in-depth study of the influence of these parameters on the results in Section 3.

2.2. Parallel Approach

We will now describe the approach taken to parallelize the halo finding. In Section 2.2.1 we describe the way the volume is decomposed into smaller chunks and we then elaborate in Section 2.2.2 how the load-balancing is achieved before ending in Section 2.2.3 with some cautionary notes on the applicability of this scheme.

2.2.1. Volume Decomposition

There are many ways to distribute the workload to multiple processes. In some halo finders this is done by first generating particle groups with a Fof finder, which can then be processed independently (e.g., Kim & Park 2006). However, we chose to "blindly" split the whole computational volume between multiple CPUs. This is done because we plan to implement Ahf as part of the simulation code Amiga and not only as a stand-alone version6, just as Mhf is integrated into Mlapm and can perform the halo finding "on the fly" during the runtime of a simulation. For the simulation code, a proper volume decomposition scheme is required and hence we are using the approach described here. However, it should be noted that, in principle, Ahf is capable of working on any user-defined subset of a simulation box and as such is not tied to the employed decomposition scheme.

The volume decomposition is done by using a space-filling curve (SFC), which maps the three spatial coordinates (x, y, z) onto a one-dimensional SFC index i; an illustration is given in the left panel of Figure 2. As many other workers have done before (e.g., Springel 2005; Prunet et al. 2008), we use the Hilbert curve. A new parameter of Ahf, ${\sf{LB}}$, regulates on which level this ordering is done; we then use $2^{\sf LB}$ cells per dimension. It is important to remark that the grid used for load-balancing is distinct from the domain grid and therefore constitutes an additional parameter. Each process will then receive a segment, described by a consecutive range of indices istart...istop, of the SFC curve and hence a subvolume of the whole computational box. The Hilbert curve has the useful property of preserving locality and also minimizing the surface area of the segments. It is in this respect superior to a simple slab decomposition, albeit at the cost of volume chunks with complicated shapes.

Figure 2. A sample volume decomposition is shown in the left panel: the total volume is divided into four segments along a Hilbert curve (dashed line) on the load-balance grid. Here we choose $\hbox{\textsf {LB}}= 3$ and only show a two-dimensional sketch for the sake of clarity. The right panel shows, for one selected segment that is assigned to one MPI process, which additional boundary volume elements will be copied to that task.

In addition to its segment of the SFC curve, each process will also receive copies of the particles in a buffer zone around its volume with the thickness of one cell of the ${\sf{LB}}$-grid (cf. the right panel of Figure 2). With these particles, each CPU can then follow the standard recipe described above (Section 2.1) in finding halos. The thickness of the boundary is hence an important quantity that is supposed to counteract the situation in which one halo is situated close to the boundary of two processes. As there is no further communication between the tasks, for reasons we will allude to below, the buffer zones need to be large enough to encompass all relevant particles. To fulfill this requirement, the thickness of the boundary zone, given by the ${\sf{LB}}$ parameter, should obey the relation

$\frac{B}{2^{\sf LB}} \gtrsim R_{\rm vir}^{\rm max}, \qquad (3)$

where B is the size of the simulation box and $R_{\rm vir}^{\rm max}$ is the radius of the largest objects of interest. Note that each process only keeps those halos whose centers are located in the proper volume (as given by the SFC segment) of the process.
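In practice, Equation (3) translates into an upper limit on the ${\sf{LB}}$ level for a given box size and largest object of interest; a small helper (hypothetical, not part of Ahf) makes this explicit:

```python
def max_lb(box_size, r_vir_max):
    """Largest LB level whose boundary-layer thickness B / 2**LB still
    exceeds the radius of the largest objects of interest (Equation (3))."""
    lb = 1
    while box_size / 2 ** (lb + 1) >= r_vir_max:
        lb += 1
    return lb

# For example, a 20 Mpc/h box whose largest halo has R_vir of roughly
# 1 Mpc/h (an assumed value) allows at most LB = 4.
print(max_lb(20.0, 1.0))   # -> 4
```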

2.2.2. Load-balancing

To subdivide the SFC curve, different schemes can be employed, influencing the quality of the achieved load-balancing; this is ultimately measured by the difference in the run times of the different processes. We chose to use a scheme that distributes the particles evenly between the processes, i.e., each CPU will require the same amount of storage for the particles. This can of course lead to vastly different volumes assigned to each task (cf. Figure 2), but we will show in Section 5.2 that this simple scheme can provide reasonable results.

We therefore segment the SFC curve into chunks containing the same number of particles (within the precision allowed by the coarseness of the load-balance grid given by ${\sf{LB}}$). Since with this scheme (or any scheme based on segmenting the volume along an SFC curve) it can happen that objects are cut between different processes, the grid tree construction would be very communication intensive. To circumvent this, we include the buffer zones alluded to above.

Note that after the duplication of boundary particles the balance of particles might shift, which is especially prominent for very inhomogeneous particle distributions, as found, for example, in simulations with a small box size. Since the decision of how to segment the SFC curve is modular, it is very easy to implement a different scheme; we plan to investigate further criteria in future work.
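A minimal sketch of this equal-particle segmentation of the SFC-ordered load-balance grid (function and variable names are ours; the duplication of boundary particles happens afterwards and is not shown):

```python
import numpy as np

def segment_sfc(cell_counts, n_cpu):
    """Split the SFC-ordered LB-grid cells into n_cpu consecutive segments
    containing (nearly) equal numbers of particles.  cell_counts[i] is the
    particle count of the cell with SFC index i."""
    cum = np.cumsum(cell_counts)
    targets = cum[-1] * np.arange(1, n_cpu) / n_cpu
    cuts = np.searchsorted(cum, targets)          # last cell of each segment
    starts = np.concatenate(([0], cuts + 1))
    stops = np.concatenate((cuts, [len(cell_counts) - 1]))
    return list(zip(starts, stops))               # inclusive (istart, istop) ranges
```

For strongly clumped particle distributions single cells can dominate a segment, which is one reason why other segmentation criteria may be worth exploring, as noted above.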

2.2.3. Caveats

As we described above, we require that a single halo is processed by one task; we do not yet allow for parallelism on a halo level. Hence we introduced the duplication of particles, which can lead to an unfortunate distribution of the workload: large halos may require more time to be processed than many smaller halos. To counteract this, we provide the option of dividing the halo finding into a splitting and an analyzing step. Only the splitting step needs to be performed by a parallel task in which each process will dump its particles to a restart file. All those can then be analyzed independently.

This becomes important for large simulations containing hundreds of millions of particles. For smaller simulations (up to $512^3$ particles) we find that the runtime imbalance is not very significant in absolute terms and a direct parallel approach is perfectly feasible; we will discuss this further in Section 5.2.

2.3. The Parallel Parameters of Ahf

To summarize, besides the three already existing parameters (cf. Section 2.1) the user of the MPI-enabled version of Ahf has to choose two additional parameters: the number of ${\sf{CPU}}$s will influence the total runtime and is the parameter that can be adapted to the available resources. Very crucial, however, is the size of the boundary zones, given by ${\sf{LB}}$, which has a significant influence on the reliability of the derived results and must be chosen carefully (cf. Equation (3)).

All of the Ahf parameters are of a more technical nature and hence should not influence the final halo catalogue. However, we would like to caution the reader that inappropriate choices can in fact lead to unphysical results. The following section therefore aims at quantifying these influences and deriving the best-choice values for them.

3. TESTING

In this section we discuss the impact of the free parameters of Ahf on the halo finding. In Section 3.1 we first introduce the ensemble of simulations we used and describe the tests. To gauge the influence of numerical effects, we perform one additional test in Section 3.2, before systematically varying the parameters in Sections 3.3–3.5.

3.1. Performed Tests

We use two different sets of simulations, summarized in Table 1. To investigate the impact of the free parameters of the algorithm and to produce comparative figures of the serial version to the parallel version, we employ the $256^3$-particle simulations of set 1. The boxes have different sizes, namely 20 h−1 Mpc, 50 h−1 Mpc, and 1.5 h−1 Gpc. The first simulation belongs to the set of simulations described in more detail in Knebe & Power (2008). The particulars of the second simulation are described elsewhere (C. Power et al. 2009, in preparation) and the last one is described in Wagner et al. (2008). Note that with this particle resolution we can still use the serial version to produce halo catalogs. An additional set of simulations is later used (cf. Section 5.3) to investigate the scaling behavior of Ahf with the number of particles. The three simulations of the 936 h−1 Mpc box forming set 2 were provided to us by Christian Wagner (see also Heitmann et al. 2008).

Table 1. Summary of the Simulation Parameters

         Name     B (h−1 Mpc)   Npart
Set 1    B20      20            256³
         B50      50            256³
         B1500    1500          256³
Set 2    B936lo   936           256³
         B936me   936           512³
         B936hi   936           1024³

For a successful run of the parallel version of Ahf, five parameters need to be chosen correctly, each potentially affecting the performance and reliability of the results. These are:

  1. ${\sf{DomGrid}}$: the size of the domain grid (cf. Section 3.3);
  2. ${\sf{DomRef}}$: the refinement criterion on the domain grid (cf. Section 3.3);
  3. ${\sf{RefRef}}$: the refinement criterion on the refined grids (cf. Section 3.4);
  4. ${\sf{LB}}$: the size of the boundary zones (cf. Section 3.5);
  5. ${\sf{CPU}}$: the number of MPI processes used for the analysis (cf. Section 3.5).

The latter two apply to parallel runs only. We systematically varied these parameters and analyzed the three simulations of set 1 described above. In Table 2, we summarize the performed analyses.

Table 2. Summary of the Analysis Parameters for B20, B50, and B1500

Name (a) ${\sf{DomGrid}}$ ${\sf{CPU}}$ (b) ${\sf{LB}}$ (c) ${\sf{DomRef}}$ ${\sf{RefRef}}$
064-01-5-5.0-5.0 64 1 5 5.0 5.0
064-01-5-1.0-5.0 64 1 5 1.0 5.0
128-01-5-5.0-5.0 128 1 5 5.0 5.0
128-01-5-1.0-5.0 128 1 5 1.0 5.0
128-01-5-1.0-4.0 128 1 5 1.0 4.0
128-01-5-1.0-3.0 128 1 5 1.0 3.0
128-01-5-1.0-2.0 128 1 5 1.0 2.0
128-02-4-5.0-5.0 128 2 4 5.0 5.0
128-02-5-5.0-5.0 128 2 5 5.0 5.0
128-02-6-5.0-5.0 128 2 6 5.0 5.0
128-02-7-5.0-5.0 128 2 7 5.0 5.0
128-04-4-5.0-5.0 128 4 4 5.0 5.0
128-04-5-5.0-5.0 128 4 5 5.0 5.0
128-04-6-5.0-5.0 128 4 6 5.0 5.0
128-04-7-5.0-5.0 128 4 7 5.0 5.0
128-08-4-5.0-5.0 128 8 4 5.0 5.0
128-08-5-5.0-5.0 128 8 5 5.0 5.0
128-08-6-5.0-5.0 128 8 6 5.0 5.0
128-08-7-5.0-5.0 128 8 7 5.0 5.0
128-16-4-5.0-5.0 128 16 4 5.0 5.0
128-16-5-5.0-5.0 128 16 5 5.0 5.0
128-16-6-5.0-5.0 128 16 6 5.0 5.0
128-16-7-5.0-5.0 128 16 7 5.0 5.0

Notes. (a) The name is constructed from the parameters in the following way: ${\sf{DomGrid\hbox{-}CPU\hbox{-}LB\hbox{-}DomRef\hbox{-}RefRef}}$. (b) Analyses employing one CPU have been performed on an Opteron running at 1.8 GHz, whereas the analyses utilizing more than one CPU were performed on Xeons running at 3 GHz. (c) Note that this parameter has no effect for runs with only one CPU.


From each analysis, we will construct the mass function $N(M)$ in logarithmic mass bins and the spin parameter distribution $N(\lambda)$, where the spin parameter is calculated as (Peebles 1969)

$\lambda = \frac{J\,\sqrt{|E|}}{G\,M^{5/2}}, \qquad (4)$

where J is the magnitude of the angular momentum of the material within the virial radius, M is the virial mass, and E is the total energy of the system. The spin parameter λ hence combines mass and velocity information and therefore allows for a good stability test of the results. We will further cross-correlate the halo catalogs to investigate differences in the deduced individual halo properties.
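As an illustration, the spin parameter of Equation (4) can be evaluated from a halo's bound particles as follows (a sketch assuming consistent units and that the total energy E has already been obtained, e.g. during the unbinding step; the function name is ours):

```python
import numpy as np

G = 4.30091e-9   # gravitational constant in Mpc (km/s)^2 / Msun

def spin_parameter(pos, vel, mass, centre, v_bulk, energy_total):
    """Peebles (1969) spin parameter lambda = J sqrt(|E|) / (G M^(5/2)),
    evaluated for the material inside the virial radius (Equation (4))."""
    dx = pos - centre
    dv = vel - v_bulk
    J = np.linalg.norm(np.sum(mass[:, None] * np.cross(dx, dv), axis=0))
    M = mass.sum()
    return J * np.sqrt(np.abs(energy_total)) / (G * M ** 2.5)
```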

3.2. Numerics

As the grid hierarchy plays the major role in identifying halos, we test the sensitivity of the derived properties to the grid structure by introducing small numerical artifacts which can lead to slightly different refinement structures. To do this, we employ only one CPU and perform two analyses, one on the original particle distribution and one on a shifted (by half a domain grid cell in each direction, taking periodicity into account) particle distribution; we otherwise keep all other parameters constant at $\hbox{\textsf {DomGrid}}= 64$, $\hbox{\textsf {DomRef}}= 5.0$, and $\hbox{\textsf {RefRef}}= 5.0$. The ratios of the resulting mass functions and spin parameter distributions are shown in Figure 3.

Figure 3. We show a comparison of derived global properties for the original particle distribution and a distribution that has been shifted by half a domain grid cell ($\hbox{\textsf {DomGrid}}= 64$) in each direction. The ratio of the shifted to the unshifted mass function (spin parameter distribution) is shown in the upper (lower) panel.

Small deviations can be seen in the mass function; these are of the order of 1–3 per cent in a few bins and are mostly due to one or two halos changing bins. This translates into some scatter in the spin parameter distribution; however, the seemingly large deviations at the low- and high-spin ends are artificially enhanced by the low number counts in those bins. We also excluded halos with fewer than 100 particles (cf. Section 3.4) from this plot. The scatter here is also of the order of 1–3 per cent. It is important to keep in mind that the derived properties can vary to this extent simply due to numerical effects.

However, the main point of this comparison is to verify that, up to numerical effects caused by the perturbed density field, we do not introduce any systematic deviation in the statistical properties.

3.3. Domain Grid

We now focus on the choice of ${\sf{DomGrid}}$, i.e., the size of the domain grid. To this end, we employ one process and vary the domain grid and the refinement criterion ${\sf{DomRef}}$ on the domain grid. In Figure 4 we show the deduced mass function and spin parameter distribution for the four cases. As can be seen, the impact of the domain grid choice is negligible; small deviations can be seen at the high-mass end of the mass function in B50; however, these are caused by (of order) one halo changing bins across runs with varied parameters. We will discuss the drop of $N(M)$ at the low-M end of the mass function below.

Figure 4. Here we show the mass function (upper row) and the spin parameter distribution (lower row) for the three simulations. The left (middle, right) column corresponds to B20 (B50, B1500), respectively. We employed one CPU and varied the domain grid between 64 and 128 and for each choice also varied the refinement criterion on the domain grid between 5.0 and 1.0, keeping the refinement criterion on the refined grids fixed at 5.0.

In view of the parallel strategy the insensitivity of the results to the choice of the domain grid is reassuring, as we can use a rather coarse domain grid and start the refinement hierarchy from there; note that the domain grid will be allocated completely on each CPU and hence choosing a fine grid would lead to a large number of empty cells in the parallel version.

It can also be seen that the choice of the refinement criterion on the domain grid has no impact. This is because the domain grid is coarse enough as to justify refinement everywhere anyhow. In fact, only very pronounced underdense regions would not trigger refinements for the choices of parameters and particle resolution.

3.4. Refinement Criterion

Now we investigate the effect of choosing a different refinement criterion (${\sf{RefRef}}$) for the refined grids. We limit ourselves to a domain grid of $\hbox{\textsf {DomGrid}}= 128$ cells per dimension and use a refinement criterion on the domain grid of $\hbox{\textsf {DomRef}}= 1.0$ with one process. We then vary the refinement criterion on the refined grids from $\hbox{\textsf {RefRef}}= 5.0$ (this corresponds to the analysis already used above when investigating the impact of the domain grid and its refinement criterion) to $\hbox{\textsf {RefRef}}= 2.0$ in steps of 1.0.

In Figures 5 and 6 we show the effect of varying the refinement criterion on the refined grids on the mass function and the spin parameter distribution. It can be seen that the mass function changes mainly at the low-mass end. The changes are related to the mass discreteness: the more particles a halo is composed of, the easier it is to pick it up with our refinement criterion based on the number of particles within a cell. Hence it can be readily understood that by forcing a smaller criterion, more halos with a small number of particles will be found.

Figure 5. Effect of the refinement criterion on the refined grids on the mass function. We show the mass functions derived for the given analyses in the top panels and also deviations, arbitrarily normalized to the 128-01-5-1.0-4.0 run, in the smaller bottom panels. For guidance, the vertical lines indicate the mass corresponding to halos with 50 and 100 particles, respectively. The columns are organized as in Figure 4.
Figure 6. As Figure 4, but for the spin parameter distribution of the whole sample of halos (top) and restricted to halos resolved with more than 100 particles (bottom).

This is important for the completeness of the deduced halo catalogs: only when choosing a very small refinement criterion (≲3.0) are we complete at the low-particle-count7 end.

The spin parameter distribution is also affected. We can see in the top row of Figure 6 that the peak of the distribution shifts slightly and is reduced in height.8 However, when only including halos with more than 100 particles (shown in the bottom panels of Figure 6) in the calculation of the distribution, the shift is reduced and the distributions coincide; note that the most massive halo in the analyses of B1500 is resolved with only 237 particles.

As we will discuss later, the choice of the refinement criterion severely impacts the runtime and the memory requirements (see Figure 14) and hence should be lowered only with care.

3.5. The Boundary Zone and the Number of Processes

We will now investigate the effect of the size of the boundary zone and the number of processes. As the reference model we use a serial run with a domain grid of $\hbox{\textsf {DomGrid}}= 128$ cells per dimension, a refinement criterion of $\hbox{\textsf {DomRef}}= 1.0$ on the domain grid, and a refinement criterion of $\hbox{\textsf {RefRef}}= 5.0$ on the refinements; note that the ${\sf{LB}}$ parameter has no effect in the case of a serial run. We also increase the number of processes involved in the analysis from 1 (the reference analysis for each box) in factors of 2 up to 16, in which case the ${\sf{LB}}$ parameter has a very significant meaning: recall that the volume decomposition scheme not only assigns to each CPU the associated unique volume, but also a copy of the boundary layer with a thickness of one cell. We increase the size of the decomposition grid from $2^4 = 16$ ($\hbox{\textsf {LB}}= 4$) cells per dimension by factors of two up to $2^7 = 128$ ($\hbox{\textsf {LB}}= 7$) cells per dimension. To correctly identify a halo located very close to a boundary, the buffer volume must be large enough to encompass all its particles, i.e., the thickness of the boundary should be at least $R_{\rm vir}^{\rm max}$ (cf. Equation (3)). This condition is violated for B20 from $\hbox{\textsf {LB}}> 4$ on and for B50 for $\hbox{\textsf {LB}}> 5$, whereas in B1500 it would only be violated for $\hbox{\textsf {LB}}> 8$. Hence we expect to see differences in the derived properties due to the volume splitting for all analyses violating this condition.

3.5.1. Statistical Comparison

We first concentrate on integrated properties, namely the mass functions and the spin parameter distributions again. In Figure 7 we show the ratio of the mass functions, whereas in Figure 8 the ratio of the spin parameter distribution is depicted. In all cases we have taken the serial run 128-01-5-5.0-5.0 as the reference and show relative deviations from it. In each figure the number of employed processes increases from top to bottom from 2 to 16, whereas each single plot shows the derived mass function (spin parameter distribution) for the four choices of the boundary size.

Figure 7. We show the mass functions derived for our three simulations varying the number of processes and size of the domain decomposition grid.
Figure 8. As Figure 7 but for the spin parameter distribution.

It can be seen that the integrated properties appear in general to be rather unaffected by the choice of the number of CPUs or the ${\sf{LB}}$ level. For B20, the mass function only shows a difference for the ${\sf{LB}}$7 analysis, and B50 shows differences at the high-mass end for ${\sf{LB}}$6 and ${\sf{LB}}$7 and more than 4 CPUs. Looking at the spin parameter distribution, only B20 shows a difference at the large-λ end for ${\sf{LB}}$6 and ${\sf{LB}}$7. B1500 is completely unaffected by any choice of the tested parameters. As we have alluded to above, this is expected and we can clearly see two effects coming into play: first, the thinner the boundary zone is, the higher the probability that a halo might not be available in its entirety to the analyzing task. Second, increasing the number of CPUs simultaneously increases the number of boundary cells and hence the chance to provoke the aforementioned effect.

However, it is interesting to note that from a statistical point of view we really have to go to the worst-case choices (large ${\sf{LB}}$, many CPUs) to get a significant effect. Also, we should note that the shown deviations are relative and are hence more pronounced in the bins with low counts, namely the outskirts of the spin parameter distribution and the high-mass end of the mass function. Choosing the correct ${\sf{LB}}$ value, we can see that the integrated properties do not depend on the number of CPUs involved.

3.5.2. Halo–Halo Cross Comparison

We will now investigate the dependence of the individual halo properties on the parallelization parameters. To this end we perform a cross-correlation of particles in identified objects between the different parallel runs and—taken as the reference—a serial run.

We employ a tool included in the Ahf distribution which, for two given halo catalogs (for this purpose consisting of the particle IDs), finds for each halo from the first catalog the best-matching halo from the second catalog. This is done by maximizing the quantity

$\frac{N_{\rm shared}^{2}}{N_{1}\,N_{2}}, \qquad (5)$

where $N_{\rm shared}$ is the number of shared particles between the two halos and $N_1$ and $N_2$ are the total numbers of particles of the halo from the first and the second catalog, respectively.
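A sketch of this matching criterion, representing each halo simply as a set of particle IDs (an assumption of the sketch, not the tool's actual interface):

```python
def best_match(halo_ids, catalog2):
    """Return the halo from catalog2 maximising Equation (5),
    N_shared^2 / (N_1 * N_2), with respect to the given halo."""
    def merit(other):
        n_shared = len(halo_ids & other)
        return n_shared ** 2 / (len(halo_ids) * len(other))
    return max(catalog2, key=merit)

# usage: catalogs as lists of sets of particle IDs
cat1 = [{1, 2, 3, 4}, {10, 11}]
cat2 = [{2, 3, 4, 5}, {11, 12, 13}]
matches = [best_match(h, cat2) for h in cat1]
```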

In the ideal case, we expect every particle found in a halo in the reference analysis to also appear in the corresponding halo in the parallel version. Due to the choice of the boundary size, this might not be the case, as for halos located close to a boundary the required particles might not be stored in the boundary. Also numerical effects from a different grid structure in the boundary can lead to slight variations in the identified halos, as we have demonstrated already in Figure 3. Additionally we should note that the catalogs produced by two runs with different numbers of CPUs cannot be compared line by line, as, even though the ordering is by number of particles, halos with the same number of particles are not guaranteed to appear in the same order.

We show this impact for the mass of the halos in Figure 9 and for the spin parameter in Figure 10. Note that we only plot data points for halos that have different properties. In total, we have 16631 (19621, 4971) objects in B20 (B50, B1500).

Figure 9. Mass ratio of halos in the parallel version using 2 (4, 8, 16) processes from top to bottom in the three boxes. Open boxes (circles, upright triangles, downright triangles) are used to indicate ${\sf{LB}}$4 (5, 6, 7). The additional curved lines correspond to a difference in halo mass of ±5 particles. Note that we only show those halos that have a ratio not equal to unity, and we see the expected behavior that the mismatch becomes more prominent in the case of a large ${\sf{LB}}$ (cf. discussion in Section 3.5.2).
Figure 10. As Figure 9, but for the spin parameter as a function of halo mass.

It is interesting to note that even though the distributions of integrated properties (cf. Section 3.5.1) show no significant effect, we do find slightly different results for individual halos. However, the observed differences in B20 and B50 for ${\sf{LB}}$6 and ${\sf{LB}}$7 are to be expected, as in these cases the boundary zones are too thin to cope with the extent of the halos, as we have alluded to above. For B1500 we would need to increase the ${\sf{LB}}$ level to a significantly larger number to provoke the same effects. Additionally, the differences found are generally of the order of a few particles, as indicated by the dashed lines in Figure 9 which correspond to a difference of five particles. We have observed in Section 3.2 that numerical noise—as introduced by shifting the particle positions—can already induce fluctuations of the order of a few particles (cf. Figure 3).

Generally, however, we can say that, when choosing a sensible ${\sf{LB}}$ level complying with Equation (3), the parallel version gives the same results as the serial version.

4. COMPARISON TO OTHER HALO FINDERS AND THEORETICAL PREDICTIONS

In this section we compare the results of Ahf to three other halo finding mechanisms. We restrict ourselves to the analysis of B20 and use a Fof halo finder (Davis et al. 1985), Skid (Stadel 2001), and an implementation of the Bdm (Klypin & Holtzman 1997) method.

For the Fof run, we use a linking length of 0.17, which yields an overdensity at the outer radius comparable to the virial overdensity used in Ahf. In the Bdm analysis, in order to find (potential) halo centers we used a density field smoothed on a scale of 10 h−1 kpc, approximately corresponding to a mass scale of $10^9\,h^{-1}\,M_{\odot }$, and we therefore expect the mass function not to be complete on this scale. Otherwise we use a variant of the original Bdm scheme to identify the final halos and their properties (cf. Appendix B in Bullock et al. 2001). For the Skid run, we used a linking length of 3ε, where ε is the force resolution of the simulation (ε = 1.5 h−1 kpc).

Besides a direct comparison between these halo finders in Section 4.1, we will further investigate in Section 4.2 how well theoretical descriptions presented in the literature describe the numerically obtained mass functions. To this end, we use the highest-resolution simulation of set 2, B936hi.

As one of our claims is that Ahf is capable of identifying field halos as well as subhalos simultaneously, it also appears mandatory to compare the derived subhalo mass function against the findings of other groups. These results will be presented in Section 4.3.

4.1. Comparison to Other Halo Finders

In Figure 11 we show the derived mass functions for the four different codes in the top panel alongside Poissonian error bars for each bin. The lower panel shows the relative deviation of the three additional halo finders tested in this section from the results derived from our Ahf run. To this end, we calculate

$\Delta _X = \frac{N_{\sf AHF}(M) - N_X(M)}{N_{\sf AHF}(M)}, \qquad (6)$

where X is either Bdm, Fof, or Skid. Hence a positive deviation means that the Ahf run found more halos in the given bin than the halo finder it is compared to.

Figure 11. Mass function derived with different halo finders for B20. The top graph depicts the actual mass function and we only show the Poisson errors for the Ahf data for clarity. In the lower frame the ratio of the mass functions to the Ahf mass function is shown. Additionally, the dashed vertical lines indicate the halo masses that correspond to 50 and 100 particles, respectively.

Please note that with the settings used, we do not consider the mass function below masses corresponding to 100 particles to be complete in the Ahf analysis. Also the sharp decline of the Bdm function is not due to the method but rather to the smoothed density field alluded to above.

Generally, we find the mass functions derived with Bdm and Skid to match the results from the Ahf run within the error bars quite well, whereas the Fof analysis shows a systematic offset. This mismatch between mass functions deduced with the Fof method and So-based methods is known and expected (see, e.g., the discussion in Tinker et al. 2008; Lukić et al. 2009, concerning the relation of Fof masses to So masses).

4.2. Comparison to Theoretical Mass Functions

The advent of the era of precision cosmology in the last couple of years has made it possible to derive the mass function of gravitationally bound objects in cosmological simulations with unprecedented accuracy. This has led to the emergence of refined analytical formulae (cf. Sheth & Tormen 1999; Jenkins et al. 2001; Reed et al. 2003; Warren et al. 2006; Tinker et al. 2008) and it only appears natural to test whether or not our halo finder Ahf complies with these prescriptions. To this end we utilize our best-resolved simulation (i.e., B936hi of set 2, cf. Table 1) and present in Figure 12 a comparison against the mass functions proposed by Press & Schechter (1974), Sheth & Tormen (1999), Jenkins et al. (2001), Reed et al. (2003), Warren et al. (2006), and Tinker et al. (2008).

Figure 12. In this figure, the mass function for B936hi derived with Ahf is compared to a variety of theoretical forms. Again, the vertical dashed lines correspond to halo masses of 50 and 100 particles, respectively.

In this respect it must be noted that the proposed theoretical mass functions are calibrated to mass functions utilizing different halo overdensities. The mass function of Sheth & Tormen (1999) is based on halo catalogs applying an overdensity of Δ = 178 (Tormen 1998), and the Reed et al. (2003) and Warren et al. (2006) functions are formulated for overdensities of Δ = 200. Jenkins et al. (2001) provide various calibrations and we use their Equation (B4), which is calibrated to a ΛCDM simulation and an So halo finder with an overdensity of Δ = 324, quite close to our value of Δ = 340. The Tinker et al. (2008) mass function explicitly includes the overdensity as a free parameter and as such we use the value of our catalog to produce the mass function.

We find the Tinker et al. (2008) formula to give an excellent description of the observed mass function, and the Jenkins et al. (2001) formula also provides a good description, albeit overestimating the number of halos at intermediate masses, which might be due to the slightly different overdensity. While the other formulas also provide an adequate description of the mass function at intermediate masses, they overestimate the number of halos at the high-mass end. As has been alluded to above, this is to be expected and can be understood from the different halo overdensities they have been calibrated to. However, clearly the Press & Schechter (1974) prescription does not provide a good description of the halo mass function in this case. As the Tinker et al. (2008) formula provides an excellent fit to our data, we refer the reader to their work for further discussions concerning the impact of the different overdensities used to define a halo.

4.3. The Subhalo Mass Function

As our halo finder simultaneously finds field and subhalos without the need to refine the parameters for the latter, it appears mandatory to also compare the derived subhalo mass function against results in the literature. To this end, we identify the substructure of the most massive halo in B20 with our cross-matching tool (cf. Section 3.5.2). We restrict ourselves here to investigating only this particular halo, as the particle resolution of the other available simulations is not sufficient to resolve the subhalo population adequately. Also, we used the 128-01-5-1.0-2.0 analysis to have a high sensitivity to very small halos (cf. the discussion of Ahf's parameters in Section 3).

It has been found before (e.g., Ghigna et al. 2000; Helmi et al. 2002; De Lucia et al. 2004; Gao et al. 2004; van den Bosch et al. 2005; Diemand et al. 2007; Giocoli et al. 2008; Springel et al. 2008) that the subhalo mass function can be described with a functional form $N_{\rm sub}(>M) \propto M^{-\alpha }$, with α = 0.7...0.9. In Figure 13 we show the cumulative mass function of the most massive halo in B20 and provide—as a guide for the eye—two power laws with those limiting slopes. Additionally, we fitted a power law to the actual $N^{\sf AHF}_{\rm sub}(>M)$, yielding a slope of α = 0.81. This test confirms the ability of Ahf to reproduce previous findings for the subhalo mass function, hence underlining its ability to function as a universal halo finder.
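The slope quoted above can be obtained with a straightforward least-squares fit in log-log space; the following sketch is a generic fit of this kind, not the exact fitting procedure used for Figure 13:

```python
import numpy as np

def subhalo_slope(sub_masses):
    """Fit N_sub(>M) ~ M^-alpha to the cumulative subhalo mass function."""
    m = np.sort(np.asarray(sub_masses, dtype=float))[::-1]   # descending masses
    n_gt = np.arange(1, m.size + 1)                          # cumulative count N(>M)
    slope, _ = np.polyfit(np.log10(m), np.log10(n_gt), 1)
    return -slope                                            # alpha
```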

Figure 13. The cumulative subhalo mass function for the most massive halo in B20. This halo is resolved with ≈8 × 10⁵ particles and contains 221 resolved subhalos. Also we show three power-law fits, two with a fixed slope of −0.7 and −0.9, respectively, and a third with the slope as a free parameter, yielding −0.81. Note that this plot is based on the 128-01-5-1.0-2.0 analysis.

5. RESULTS

In this section we summarize the results obtained from the parameter study (Section 5.1) and present the memory and time requirements. We believe this to be important information for understanding how to maximize the gain from using Ahf. In this respect, we specifically investigate the achieved load-balancing of the parallel version (Section 5.2). We further present the scaling of Ahf with increasing problem size in Section 5.3.

5.1. Parameter Dependence

As we have shown, the ${\sf{DomGrid}}$ and ${\sf{DomRef}}$ parameters have a negligible impact on the derived properties and, given an adequate choice for the ${\sf{LB}}$ level, the number of employed CPUs also has no impact on the derived properties. However, the ${\sf{RefRef}}$ parameter can be used to force completeness of the mass function down to small particle counts. This comes at a price, as we show in Figure 14; this figure depicts the increase in run time and memory requirement relative to a choice of $\hbox{\textsf {RefRef}}= 5.0$ when changing the ${\sf{RefRef}}$ parameter. As we have shown in Figure 5, a ${\sf{RefRef}}$ parameter of 3.0 would be required to achieve completeness for halos with fewer than 50 particles. However, this costs a factor of 2–3 in runtime and increases the memory requirement by ∼40 per cent. These numbers are derived from a serial run.

Figure 14. Impact of the refinement criterion on the refined grids on the memory and time consumption of Ahf. We normalize the 128-01-5-1.0-X.0 runs to the 128-01-5-1.0-5.0 run and show the relative time in the upper and the relative memory consumption in the lower panel.

5.2. Load-balancing

We will now investigate the quality of the load-balancing of Ahf. All parallel analyses of set 1 (cf. Table 1) have been performed on the damiana9 cluster, where each node is equipped with two dual-core Xeon processors running at 3 GHz. Due to memory constraints—especially for the variation of the ${\sf{RefRef}}$ parameter—the serial analyses have been performed on one of our local machines (an Opteron running at 1.8 GHz) where we were certain not to be bound by memory. For this reason, the times of the serial and parallel runs are not directly comparable.

In Figure 15 we show the absolute times of the runs and we give the span of run times of the separate processes for the parallel analyses. It can be seen that increasing the number of CPUs involved in the analysis indeed reduces the runtime. However, when increasing the number of CPUs a widening of the spread between the fastest and the slowest task can be observed. This is most pronounced in B20 and in particular for the time required to do the halo analysis, whereas the time required to generate the grid hierarchy exhibits a smaller spread. The large spread can also be seen in the memory requirement for the grids, shown in the lower row of Figure 15. This is especially pronounced for small ${\sf{LB}}$ levels. When going to larger boxes the spread becomes smaller.

Figure 15. We show the absolute times required for the different analyses in the top row and the total memory requirement for the grid hierarchy and the particles in the lower row. The columns are ordered as described in Figure 4. The top row gives the duration for the whole run, whereas the two middle rows give the times needed for the grid generation and the actual halo analysis. The last row shows the amount of memory required to store the particle information and the grid hierarchy. We use the 128-01-5-5.0-5.0 run for producing the data point for 1 CPU and plot for the parallel runs the maximal and minimal values. Each panel shows the results for different choices of LB and for clarity, the curves are offset to one another by factors of 2. Note that runs with one CPU were, due to memory restrictions, performed on a different, slower, machine than the parallel runs, hence the kink in the timing curves at 2 CPUs is not real. However, the memory curves are not affected by this.

The behavior of increasing spread with decreasing box size can be readily understood as the effect of the non-homogeneity of the particle distribution manifesting itself. A small box will typically be dominated by one big halo, whereas in large boxes there tend to be more equally sized objects. As we do not yet allow one single halo to be jointly analyzed by more than one CPU, the analysis of the most massive halo will basically dictate the achievable speed-up. Of course, when decreasing the boundary size (increasing the ${\sf{LB}}$ level), the spread can be reduced; however, the results will be tainted as has been shown in Figures 9 and 10.

In Figure 16 we investigate more closely the achieved memory saving R(n) as a function of the number of MPI processes n, defined as

$R(n) = \frac{M(1)}{M(n)}, \qquad (7)$

where M(n) refers to the amount of memory required per process when using n CPUs. We correct the memory information for the contribution of the domain grid (for the chosen parameters it consumes 64 MB); note that this correction is not applied in Figure 15 presented above. We chose to do this here to better visualize the actual gain of the parallel version, which is close to what is expected when the relative contribution of the domain grid to the memory usage is small. That is not quite the case in our test simulations, but it is true for large simulations.

Figure 16. Here we show the achieved memory reduction, as defined by Equation (7), of the parallel runs. For guidance, the curve for linear scaling is shown as a dashed (gray) line and, additionally, we show the expected ideal memory scaling with a solid gray line (see Section 5.2 for more details).

Note that linear scaling cannot be expected due to the duplicated boundary zones. We show this in Figure 16 as the ideal-case memory reduction expected when including the boundary zones (solid gray line; for the case of ${\sf{LB}}$4). We can estimate this by comparing the number of cells each process is assigned to as

Equation (8)

where $N \equiv 2^{\sf LB}$ is the number of cells in one dimension on the ${\sf{LB}}$-grid and n is the number of CPUs. Note that for this estimate it is assumed that the amount of work that needs to be done per cell is the same for all cells, and as such this is only an ideal-case estimate. We can see that this curve is tracked closely in larger boxes, where the distribution of particles is more uniform than in small boxes.

It is important to keep in mind that this holds for simulations with $256^3$ particles. In fact, the analysis could still be performed in reasonable time (∼30 minutes, depending on the box size) and with modest memory requirements (∼2–3 GB) on a single CPU. However, a single-CPU approach would already be unfeasible for a $512^3$ simulation and completely out of the question for larger simulations.

5.3. Scalability

So far we have investigated the scaling behavior for simulations with a fixed particle resolution. It is now interesting to ask how the algorithm scales when increasing the number of particles in the simulation. To this end we use the simulations of set 2 (see Table 1 for the particulars), which were performed on the same initial conditions represented at three different particle resolutions. All analyses have been performed on hlrbii at the LRZ München.

According to the results derived in Section 3, we chose the analysis parameters as $\hbox{\textsf {DomGrid}}= 64$, $\hbox{\textsf {DomRef}}= 5.0$, $\hbox{\textsf {RefRef}}= 5.0$, and $\hbox{\textsf {LB}}= 7$. We used the option to divide the run of Ahf into a splitting and an analyzing step (cf. Section 2.2.3) and used 4 (16, 64) CPUs for the splitting of B936lo (B936me, B936hi). For the subsequent analyzing step, we only changed the number of CPUs for B936hi from 64 to 12. In Table 3 we present the full timing information reported by the queueing system for the different jobs; note that the wall time for B936hi is quite large compared to the wall times of B936lo and B936me; however, this is due to the fact that only 12 CPUs were used to process the 64 chunks generated in the splitting step.

Table 3. Timing Results for the Analyses of Set 2

Name       T_an^CPU (h)   T_an^Wall (h)   n_an^CPU   T_sp^Wall (m)   n_sp^CPU
B936lo     0.40           0.10            4          1.42            4
B936me     3.74           0.33            16         3.25            16
B936hi     39.42          3.04            12         5.73            64

Notes. The subscript "an" refers to quantities related to the actual analysis, whereas the subscript "sp" labels quantities related to the splitting of the simulation volume. The superscript "CPU" labels the required CPU time, whereas "Wall" denotes the wall clock time. Note that the times for the analysis step are given in hours, whereas the times for the splitting step are given in minutes.


We also present the scaling in Figure 17, where we show the required CPU time for all three analyses normalized to B936lo. The expected N log N scaling is tracked remarkably well. It should be noted, though, that the large box size represents the ideal situation for Ahf; the scaling will not be as good for smaller box sizes.


Figure 17. Relative CPU time as reported by the queueing system, shown as a function of the relative simulation size. The data points are normalized to the 256³ particle representation; we show linear scaling with a dashed gray line and N log N scaling with a solid line. The data points are labeled with the simulation resolution they correspond to.

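For reference, the two guide curves of Figure 17 can be reproduced with a few lines of code. The following stand-alone C snippet (not part of Ahf) prints the cost relative to the 256³ run expected for pure linear and for N log N scaling; these are the theoretical reference curves, not the measured timings of Table 3.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double N0  = 256.0 * 256.0 * 256.0;       /* reference run: B936lo */
        const double N[] = { 256.0  * 256.0  * 256.0,   /* B936lo */
                             512.0  * 512.0  * 512.0,   /* B936me */
                             1024.0 * 1024.0 * 1024.0 };/* B936hi */

        for (int i = 0; i < 3; i++) {
            double linear = N[i] / N0;                            /* dashed line */
            double nlogn  = (N[i] * log(N[i])) / (N0 * log(N0));  /* solid line  */
            printf("%12.0f particles: linear = %5.1f, N log N = %5.1f\n",
                   N[i], linear, nlogn);
        }
        return 0;
    }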

6. SUMMARY AND CONCLUSIONS

We have introduced and described the halo finding tool Ahf, which is freely provided for use by the community (see footnote 11). Ahf natively reads Gadget (Springel et al. 2001; Springel 2005) format as well as Amiga binaries and can deal with large simulation data. Furthermore, the adequate inclusion of star and gas particles is supported as well. Ahf requires only very few parameters; in fact, only five (three) parameters need to be specified for the parallel (serial) version. We have shown in Section 3.3 that the size of the domain grid (${\sf{DomGrid}}$) and the refinement criterion on the domain grid (${\sf{DomRef}}$) have only a very marginal impact on the results, reducing the number of relevant parameters to three (one).

However, the refinement criterion on the refined grids (${\sf{RefRef}}$) does influence the completeness of the derived halo catalog. We discussed this in detail in Section 3.4, finding that, to be complete for halos containing fewer than 50 particles, a refinement criterion of ≲3.0 must be chosen. However, as we show in Figure 14, this increases the memory requirement by ∼40 per cent and the runtime by a factor of 2–3. This will mostly be of interest for analyses of snapshots at high redshift and low particle resolution.

Furthermore, we have shown in Section 3 that the number of CPUs involved in the analysis has no impact on the derived properties only when the ${\sf{LB}}$ parameter (Section 3.5) is chosen carefully. Given a suitable choice of the ${\sf{LB}}$ parameter, a reasonable scaling is achieved (Section 5.2). This is especially important for the memory scaling, as this is the key to allowing for the analysis of billion-particle simulations. In fact, we have shown in Section 5.3 that, given a good choice of parameters, Ahf scales very well with increasing particle resolution. We have shown this explicitly for a 1024³ simulation in this paper and have also successfully employed Ahf previously on 512³ simulations (e.g., Knollmann et al. 2008).

To summarize the choice of recommended parameters (collected in the illustrative sketch following this list):

  • 1. We suggest using a ${\sf{DomGrid}}$ of 64,
  • 2. a ${\sf{DomRef}}$ of 5.0, and
  • 3. a ${\sf{RefRef}}$ of 5.0 to achieve trustworthy results for halos made up of more than 50 particles.
  • 4. To achieve untainted results in the parallel version, the ${\sf{LB}}$ level should be chosen in such a way that the relation given in Equation (3) holds.
  • 5. The number of CPUs should be chosen as small as possible; the limiting factor here is the available memory.
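
Purely as an illustration, the following C sketch collects these recommendations in a hypothetical container; it is not Ahf's actual input-file syntax (which is documented in the user manual), and the ${\sf{LB}}$ and CPU values are placeholders that must be adapted via Equation (3) and the available memory.

    /* Hypothetical parameter container for the recommendations above;
     * not Ahf's actual input-file format.                              */
    typedef struct {
        int    DomGrid; /* domain grid cells per dimension                    */
        double DomRef;  /* refinement criterion on the domain grid            */
        double RefRef;  /* refinement criterion on refined grids; use <~ 3.0
                           to be complete below ~50 particles per halo        */
        int    LB;      /* boundary level; must satisfy Equation (3)          */
        int    nCPU;    /* number of MPI processes; as small as memory allows */
    } ahf_params_t;

    static const ahf_params_t recommended = {
        .DomGrid = 64,
        .DomRef  = 5.0,
        .RefRef  = 5.0,
        .LB      = 7,   /* example value, as used for the set 2 runs in Section 5.3 */
        .nCPU    = 4    /* example value                                            */
    };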

As we have shown in Section 5.2, the mismatch in runtime between the separate tasks can become significant. We provide a way around this with a two-stage approach: first, the independent volume chunks including the boundary are constructed on all CPUs. These are then dumped to disk and can be analyzed serially (or several at once, as the chunks are completely independent). This approach becomes important for inhomogeneous matter distributions (small box sizes) and large numbers of particles (>512³). This feature is available in the public version; for details, contact the authors.

S.R.K. and A.K. acknowledge funding through the Emmy Noether Programme of the DFG (KN 755/1). A.K. is supported by the MICINN through the Ramon y Cajal programme. Simulations have been performed at the Centre for Astrophysics & Supercomputing, Swinburne University and the Astrophysical Institute Potsdam; we thank Chris Power and Christian Wagner for providing the snapshots to us. The analyses have been conducted in part on a local machine at the AIP, whereas most analyses have been performed on the damiana cluster at the Albert-Einstein Institute in Golm and the hlrbii at the LRZ München within the context of the AstroGrid-D project. S.R.K. thanks Steve White for his help in using the grid infrastructure. Part of this work was carried out under the HPC-EUROPA project (RII3-CT-2003-506079), with the support of the European Community-Research Infrastructure Action under the FP6 "Structuring the European Research Area" Programme.

APPENDIX: Ahf'S UNBINDING PROCEDURE

In order to remove gravitationally unbound particles we have to obtain the (local) escape velocity $v_\mathrm{esc}(r)$ at each particle position. As $v_\mathrm{esc}$ is directly related to the (local) value of the potential, $v_\mathrm{esc} = \sqrt{2\left| \phi \right|}$, we integrate Poisson's equation (under the assumption of spherical symmetry):

$\frac{1}{r^2} \frac{d}{dr}\left(r^2 \frac{d\phi}{dr}\right) = 4\pi G\, \rho(r). \qquad (\mathrm{A1})$

The first integral reads as follows:

$r^2 \frac{d\phi}{dr} = 4\pi G \int_0^r \rho(r^{\prime})\, r^{\prime 2}\, dr^{\prime} = G\, M(<r). \qquad (\mathrm{A2})$

This equation shows that dϕ/dr ∝ M(<r)/r², and hence r²dϕ/dr → 0 for r → 0. We are therefore left with the following first-order differential equation for ϕ(r):

$\frac{d\phi}{dr} = G\, \frac{M(<r)}{r^2}. \qquad (\mathrm{A3})$

Another integration leaves us with

$\phi(r) = G \int_0^r \frac{M(<r^{\prime})}{r^{\prime 2}}\, dr^{\prime} + \phi_0. \qquad (\mathrm{A4})$

This time we need to calculate $\phi_0$. We do this by requiring $\phi(\infty) = 0$:

$\phi(\infty) = 0, \qquad (\mathrm{A5})$

$G \int_0^{\infty} \frac{M(<r^{\prime})}{r^{\prime 2}}\, dr^{\prime} + \phi_0 = 0, \qquad (\mathrm{A6})$

$\phi_0 = -G \int_0^{\infty} \frac{M(<r^{\prime})}{r^{\prime 2}}\, dr^{\prime} \qquad (\mathrm{A7})$

$\phi_0 = -G \int_0^{r_\mathrm{vir}} \frac{M(<r^{\prime})}{r^{\prime 2}}\, dr^{\prime} - G \int_{r_\mathrm{vir}}^{\infty} \frac{M(<r^{\prime})}{r^{\prime 2}}\, dr^{\prime}, \qquad (\mathrm{A8})$

and hence

$\phi_0 = -G \int_0^{r_\mathrm{vir}} \frac{M(<r^{\prime})}{r^{\prime 2}}\, dr^{\prime} - G\, \frac{M_\mathrm{vir}}{r_\mathrm{vir}}. \qquad (\mathrm{A9})$

Note that we assume that the halo is truncated at $r_\mathrm{vir}$ when evaluating the integral $\int _{r_\mathrm{vir}}^{\infty } \frac{M(<r^{\prime })}{r^{\prime 2}} {d}r^{\prime }$. An initial guess for the particles belonging to a (sub)halo in a cosmological simulation is based upon the adaptive mesh hierarchy of the simulation code Amiga, as we have described in Section 2.1. Unbound particles are then removed iteratively: we integrate Equation (A4) along a list of radially ordered particles, and $\phi_0$ has to be re-evaluated for every new iteration.
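
As a concrete illustration of this procedure, the following C sketch (a minimal re-implementation for this description, not Ahf's actual source) performs one unbinding pass: it integrates Equation (A4) along the radially ordered particle list, obtains $\phi_0$ as in Equation (A9) with the halo truncated at the outermost particle, and flags particles exceeding the local escape velocity. It assumes units with G = 1, particles pre-sorted by increasing radius, and the bound flag initialized to 1 by the caller; the pass is repeated until no further particles are removed.

    #include <stdlib.h>
    #include <math.h>

    typedef struct {
        double r;     /* distance from the halo center                    */
        double v2;    /* squared velocity relative to the halo rest frame */
        double mass;  /* particle mass                                     */
        int    bound; /* 1 = currently considered bound, 0 = removed       */
    } particle_t;

    /* One unbinding iteration over n particles sorted by increasing r.
     * Returns the number of particles newly flagged as unbound.          */
    static size_t unbind_once(particle_t *p, size_t n)
    {
        double *phi = malloc(n * sizeof *phi); /* phi(r) - phi_0 */
        double  M_enc = 0.0, integral = 0.0, r_prev = 0.0;
        size_t  i, n_removed = 0;

        if (phi == NULL || n == 0)
            return 0;

        /* Integrate Equation (A4): phi(r) - phi_0 = int_0^r M(<r')/r'^2 dr',
         * treating M(<r) as constant between successive particle radii.    */
        for (i = 0; i < n; i++) {
            if (r_prev > 0.0)
                integral += M_enc * (1.0 / r_prev - 1.0 / p[i].r);
            phi[i] = integral;
            if (p[i].bound)
                M_enc += p[i].mass; /* only bound particles contribute to M(<r) */
            r_prev = p[i].r;
        }

        /* Equation (A9), with the halo truncated at the outermost particle:
         * phi_0 = -( int_0^rvir M(<r')/r'^2 dr' + M_vir / r_vir ).          */
        double r_vir = p[n - 1].r;
        double phi_0 = -(integral + M_enc / r_vir);

        /* Remove particles faster than the local escape velocity,
         * v_esc^2 = 2 |phi|.                                                */
        for (i = 0; i < n; i++) {
            if (p[i].bound && p[i].v2 > 2.0 * fabs(phi[i] + phi_0)) {
                p[i].bound = 0;
                n_removed++;
            }
        }

        free(phi);
        return n_removed;
    }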

Footnotes

  • Please note that the particle density is used, not the mass density.

  • There are also different criteria available and described in detail in the user manual to make this decision.

  • Note that for the serial version this is already the case.

  • This translates in general into a low mass halo; however, the important quantity is really the number of particles, not their mass. In zoomed simulations, low-mass halos in the zoomed region might be found completely whereas halos of the same mass in the low-resolution regions are not picked up.

  • The latter is due to the fact that the distribution is not normalized.

  • For more details on the topology of the cluster, see http://supercomputers.aei.mpg.de/damiana/.

  • 10. Note that this means that a team of 12 analysis tasks worked on the 64 chunks produced in the splitting step.

  • 11 