Visualization of large scale structure from the Sloan Digital Sky Survey

We will discuss the challenges of visualizing large cosmological datasets. These include observational issues such as the masks and incomplete nature of the survey volume, cosmological issues such as redshift distortions and the difficulty of visualizing datasets that span cosmological epochs, as well as the inherent visualization challenges in presenting dense three-dimensional (3D) datasets. Two case studies will be presented. The first will feature the identification of filamentary structures in the large scale distribution of galaxies. The second case study will feature visualizations of the correlations between quasar absorption line systems and luminous red galaxies. Finally, we will give an overview of our visualization work-flow which features the use of the open-source 3D modeling program Blender.


The Sloan Digital Sky Survey
The Sloan Digital Sky Survey (SDSS; York et al 2000) is the largest sample of spectroscopic galaxy redshifts. Data release 6 (DR6; Adelman-McCarthy et al 2008) contains 790 000 galaxy redshifts, 102 000 quasars and spans 8400 degrees on the sky. Figure 1 is an Aitoff-Hammer projection of the density of spectroscopic galaxies in galactic coordinates. As this map demonstrates the main survey region is at the Northern Galactic Cap, with three flanking stripes in the South. Figure 2 shows a close up of the main survey region, now projected in equatorial coordinates. This projection places the survey area closer to the equator, leading to fewer distortions in the map. We have also color coded the galaxy according to redshift, and plotted the galaxies such that closer galaxies obscure more distant ones. The 'Cosmic Web' is clearly visible here, as is the decreasing angular size of the web with redshift. Also visible is the gap in the middle of the main survey volume, as well as smaller gaps and irregularities due to the incomplete nature of the survey. The imprint of the survey is evident in the footprint. Long strips of imaging 2.5 wide are produced by the drift scan imaging, these areas are then filled in by circular spectroscopic plates 3.0 across.
The large three-dimensional (3D) volume covered by the SDSS limits the usefulness of pie diagrams, heretofore a staple of large scale structure studies. Figure 3 shows the projected 3D galaxy distribution viewed from different angles. Galaxies are color coded by spectral-type as measured by the SDSS spectrographic pipeline. The radial color gradient in the images is due to the fact that the main galaxy sample is a magnitude limited sample and that the faint end of the luminosity function is dominated by bluer, late-type galaxies. Radial linear features known as fingers of God are quite noticeable here. These features are produced by the peculiar velocities of galaxies in rich clusters. Clearly the distance errors that result must be corrected for before attempting to identify topological features in the galaxy distribution, this topic is addressed in the next section.

Visualization workflow
The visualizations presented in this paper represent the result of an experiment to see how well the open-source 3D modeling program Blender could be adapted to scientific visualization tasks. Blender was chosen because of its ability to handle sophisticated lighting and texturing effects, and to render out animations with smooth cinematographic flightpaths. Blender is scriptable with Python, allowing for custom import of scientific datasets. Most of these large scale structure visualizations tasks involve the display catalog data, typically represented as large numbers of points, glyphs, lines and surfaces. Blender does not naturally deal with a set of points in 3D. However, one can create a mesh of unlinked vertices, and apply a halo material to the mesh. The halo material is an image with some sort of radial fall off in intensity. With a good choice of halo parameters a volumetric rendering can be approximated.
As figure 3 illustrates, a dense 3D dataset such as the SDSS is difficult to capture in a single image, or even an animation. In addition to the rendered animations presented here, realtime visualization tools are important for data discovery and exploration. In particular, we use the Partiview package (Levy 2001) because of its ability to render large numbers of objects in realtime. It also supports stereoscopic rendering to GeoWall-type displays, and its data format is accepted by most of the major full-dome planetarium systems. These real-time visualizations provide the initial step in our visualization workflow. Once salient features are discovered in the dataset, the animations are rendered with Blender. A close up of the main Galactic cap region. The map is now projected in equatorial coordinates, leading to less distortions. The galaxies are color coded by redshift: purple 0.0 > z > 0.03; blue 0.03 > z > 0.05; blue-green 0.05 > z > 0.07; cyan 0.07 > z > 0.09; green 0.09 > z > 0.11; yellow 0.11 > z > 0.13; orange 0.13 > z > 0.15; red z > 0.15.

Identifying structures in the large-scale structure
When examining a visualization such as figure 3 one can draw a useful analogy with an online mapping service. The visualization is akin to the 'satellite view', accurate and attractive, but without any identified structures. What we want is a 'street' or 'hybrid' view that superimposes the identified structures, such as clusters, filaments, walls and voids. The remainder of this section describes the identification and visualization of these structure.

Survey geometry and galaxy sample
The coverage mask of the SDSS spectroscopic sample has a complex geometry due to the fact that it is an on-going survey and several regions are still in the process of being mapped. The edges of the large patches are not uniform and several holes can be seen inside the patches. The edges of the survey introduce artifacts in the determination of quantities such as the density field. Because there is no information available the density field cannot be accurately computed in regions close to the edges of the survey. In order to minimize the effect of survey geometry we restricted our analysis to the high declination stripe (see figure 2). This is the patch that covers the maximum area with minimum perimeter. It roughly corresponds to the range −70 • < λ < 70 • and 0 • < η < 40 • in the survey coordinate system (λ, η). This coordinate range corresponds roughly to (15 • < δ < 70 • ) in declination.
The radial extent of the survey is restricted to the redshift range 0.01 < z < 0.11. This allows us to sample the 'local' large scale neighborhood as well as moderately remote structures. The far end of the survey has been chosen to be close to the peak in the radial selection function.
Once we defined the geometry of the survey and corrected the artifacts in the galaxy distribution the next step is the selection of the galaxy sample. This is of crucial importance in order to trace the intricate multiscale features in the large scale matter distribution. For our purposes the best choice is the magnitude limited sample. The great advantage of magnitude limited surveys is the ability to use all available spatial information present in the survey. The disadvantages are the non uniform coverage along the line of sight and the (small) uncertainty in the 'real' mean of the sample. In order to correct for the non-uniform radial coverage magnitude limited samples must be weighted with a function describing the dependence on the survey galaxy number density as a function of their distance from us. We used a simple formula to model the change in the mean number of galaxies as a function of their redshift given by Efstathiou and Moody (2001): where A is a constant that depends on the mean number of galaxies, φ = exp (−( z z r ) β ) encodes the dependence of the mean number of galaxies with redshift, z r is the characteristic redshift of the distribution and β specifies the slope of the curve. The parameters A and β are determined by fitting the observed galaxy redshift distribution. The decrease in the mean number of galaxies as function of redshift can be then corrected by weighting each galaxy with the inverse of the selection function evaluated at its redshift. This will provide us with a uniform density variation along the line of sight.

Redshift and distance
The expansion of the universe and its (assumed) isotropic nature provides means to determine the distance to galaxies by means of their recession velocity: where r gal is the distance to the galaxy, v rec the recession velocity of the unperturbed Hubble flow, v pec the peculiar velocity of the galaxy with respect to the Hubble flow and H 0 is the Hubble parameter, parameterized by h: The recession velocity can be inferred from the spectra of the galaxy by measuring the shift in frequency of emission or absorption lines compared to the rest frame. However, this recession velocity is the sum of two components, a cosmological component due to the expansion of space and a peculiar velocity component due to the motion of the galaxies along the line of sight. This second component introduces an error into our distance estimate.

Removing redshift distortions
The distance estimation of galaxies is affected by peculiar velocities. These so-called redshift distortions arise from both linear and nonlinear effects. Linear processes such as the Kaiser effect tend to enhance the Cosmic Web while nonlinear effects have the opposite effect. The so-called fingers of God can be identified as elongated distortions along the line of sight. Fingers of God originate from highly nonlinear gravitational interactions between galaxies and their environment. Their nonlinear origin makes it impossible to use simple prescriptions to reconstruct the original position of galaxies inside such structures. Following the same approach as (Tegmark et al 2004) we corrected the fingers of God by first identifying virialized clusters. Subsequently we compress them along the line of sight until their dispersion perpendicular to the line of sight equals the dispersion along the line of sight.
The first step in compressing the fingers of God is identifying elongated groups along the line of sight. This is done by using the well-known Friends-of-Friends (FoF) algorithm in which galaxies are recursively linked to other galaxies within a given linking volume defined by the linking length parameter wheren gal is the mean number density of galaxies in the sample and r link is the linking length.
The parameter b defines the local density threshold: where n thr is the density threshold defining groups of particles and r mean =n −1/3 gal is the mean interparticle separation.
In redshift space, the groups present a characteristic elongated shape along the line of sight. We used the approach of (Huchra and Geller 1982) in which the distance between a pair of galaxies i, j is decomposed in line of sight and transverse distances: where z i and z j are the redshifts to each galaxy and θ i, j is their angular distance. Two galaxies are then linked if the two following conditions are satisfied: wheren gal (z) is the mean number density of galaxies as a Finally, we compress the fingers of God by making their transverse and line-of-sight dispersions equal. The center of mass of the group before and after the finger of God compression remains the same (figure 4).

From galaxies to density field
We compute the density field from the discrete galaxy distribution using the Delaunay tessellation field estimator (DTFE) method (Schaap 2007) (figure 5). This method provides with a self-adaptive volume filling density field computed directly from the Delaunay tessellation of the galaxy distribution. The DTFE is based on the assumption that the density at the position of each point is proportional to the inverse of the volume of its contiguous Voronoi cell W i , defined as the union of all Delaunay tetrahedra of which point i forms one of the four vertices. For D-dimensional space the density estimate of a point distribution with mean described by the radial function φ(z) is, where w i is the weight of sample point i, usually related to the mass of the point. In our case we assigned the same mass to all galaxies, although one could use quantities such as stellar mass or mass-luminosity relations instead. Given a location x, the Delaunay tetrahedron m in which it is embedded is determined. On the basis of the field gradient ∇ f | m the field value is computed by (linear) interpolation for each element of a regular grid as: In order to alleviate aliasing in the sampled density field due to the small-scale features in the tessellation we oversampled each pixel at 27 positions inside the pixel, effectively sampling the grid at three times its formal resolution. Finally, the median of the 27 samples was assigned to the pixel. This has the advantage that in the case of pixels embedded inside large tetrahedra both the mean and median of the sampling points equal the mean of the density field integrated Figure 6. Density field estimation from the discrete galaxy distribution. Top insert shows a 2D slice across the complete survey geometry. The Delaunay tessellation is computed (here in 2D, bottom left) and a density estimate is assigned to each galaxy. Finally the density field is sampled inside a regular grid by interpolating the density estimates inside the tetrahedron that contains the sampling positions defined by the grid (bottom right).
inside the pixel. The input file used for the density estimation contained all the galaxies in the sample plus a weighting for each galaxy computed from equation (1). In addition, a 'buffer' zone was generated around the survey in order to minimize unwanted boundary effects. The buffer zone surrounds the survey geometry with random points with a similar radial distribution and weighting function as the galaxies in the sample. The final density field was computed on a rectangular grid with 1024 × 512 × 256 pixels of size corresponding to roughly 600 × 300 × 150 Mpc. Figure 6 shows the DTFE procedure applied to a slice from the complete galaxy sample. For illustrative purposes we computed the 2D Delaunay triangulation instead of the full 3D Figure 7. This image illustrates the procedure for generating the filament surfaces. From the galaxy positions (in yellow) the morphology binary mask is calculated (in magenta), around which the smooth filament surfaces are calculated (in cyan). An animation illustrating this procedure is presented as well (spine.mov, available from stacks.iop.org/NJP/10/125015/mmedia).
tessellation. The triangulation follows the galaxy patterns down to the smallest details giving a faithful representation of the underlying galaxy distribution. Note that both high density groups as well as underdense voids are well represented in terms of their density and intrinsic shape.

Visualizing galaxies
We assigned a luminous halo to each galaxy in our sample (figure 8). The color of the halo corresponds to the local density as mapped by a given color table. Blender stores a large set of properties for each object, so generating one Blender halo object for each of the approximately 100 000 galaxies in the sample would require a significant amount of memory. Instead in order to use memory efficiently we generated only 255 halo objects in Blender. Each object contains the position of a set of galaxies sharing a common property. In the case of halos we binned the galaxies in 255 levels of local density and then derived the red, green and blue colors corresponding to each density bin mapped by a given color table. The 255 color restriction is irrelevant since halos are close to each other and blend their colors producing a much richer color palette.
We generated a quad for each galaxy in our sample and used an image from a nearby galaxy as a texture. We faced the same memory problem as with the halos and also solved it by generating a small set of approximately 100 objects, each one containing a large number of galaxies sharing the same properties. The galaxy template consists of more than 100 goodquality galaxy images chosen from nearby galaxies. The galaxy assignment was done using a simple morphology-density relation. We binned the galaxies in density and assigned an image to each density bin. An additional randomizing was done inside each bin in order to avoid the same galaxy image appearing in adjacent galaxies as a result of close-by galaxies with similar densities.

Identifying the spine of the Cosmic Web
The spine of the Cosmic Web was identified using a novel method presented in (Aragón-Calvo et al 2008). It is based on the study of the topology of the density field as traced by the discrete watershed transform. The spine of the density field is identified as a set of singular points along the intersection of walls, which are defined as points in the watershed transform with two adjacent voids. For detailed description of the method see (Aragón-Calvo et al 2008).

Visualizing the spine of the Cosmic Web
From the binary mask describing the morphology of each pixel (see previous section) we generated a visualization of the discrete skeleton image by assigning a squared box to each pixel that belongs to the spine. We generated boxes slightly smaller than the pixel size in order to improve their visual appearance.
In order to improve the visual appearance of the discrete skeleton we smooth the binary image with a Gaussian kernel of 2 pixels width. This removes the sharp corners of individual pixels giving a soft appearance without significantly affecting its general topology. From the smoothed spine we then proceed to compute an isosurface at 0.25 of the maximum value (figure 7).
After we computed the isosurface we obtain a surface made of several million triangles. This poses a severe limitation for rendering or visualization with standard computing resources. However, the huge dynamical range of the structures present in the Cosmic Web means that at any given point we can only see some particular level of detail. When the camera field of view covers the complete survey we can only see the large structures and as the camera gets closer to the survey more and more detail is needed. This motivated the use of multiscale techniques in order to use efficiently our computational resources.
We implemented a simple method for generating static multiresolution surfaces (figure 9) (i.e. the surface does not change during the complete animation). The multiresolution surface was generated by dividing the survey box into square subboxes of 32 pixels of side (16 Mpc). For each subbox we then computed isosurfaces at three levels of detail (100, 50 and 25%). Finally, we pasted the surface from the individual subboxes choosing the level of detail according to: where r CamPath is the closest distance from center of the subbox to the camera path.

Quasar absorption line systems
Distant quasars are sensitive probes of otherwise invisible gaseous clouds through a forest of absorption features imprinted in their spectra. These gaseous clouds are thought to originate in a broad range of overdense environments in different epochs, from diffuse filaments to dense star-forming regions. Absorption-line observations therefore complement traditional galaxy surveys for establishing a complete understanding of the recycling of baryonic matter, but their impact on the research of galaxy formation and evolution has been limited because of their uncertain correlation with luminous matter. Insights into the origin of quasar absorbers can be obtained from their clustering amplitude on large scales >1 Mpc. The clustering of the absorbers is a consequence of dark matter halos in which they are found. High-mass halos are highly clustered, whereas low mass halos have weaker clustering strength. Measurements of the galaxy and absorber cross-correlation function allow us to quantify the mean halo mass of the absorbers.

The visualization
This visualization seeks to investigate the spatial relationship between the luminous red galaxy (LRG; Eisenstein et al 2001) sample and the Mg II absorption line systems (figure 10). The LRGs consist of approximately 80 000 galaxies roughly spanning a redshift of 0.25 < z < 0.55, these overlap with the Mg II absorbers, which are redshifted into the SDSS wavelength coverage at z ≈ 0.35. The LRG sample is visualized as large, high opacity halos which simulates a volumetric rendering of the LRG density field. The absorbers can only be found along the quasar lines of sight, and these lines of sight are displayed over the LRG Mg II absorber overlap region 0.35 < z < 0.6. As is apparent from the density of these lines, the LRG density field is well Figure 11. The correlation between LRGs and Mg II absorption line systems. The visualization clearly shows that the absorbers are less likely to be found in regions of high galaxy density. sampled by potential lines of sight. The Mg II absorbers are rendered as sharp halos, and color coded by equivalent width of the absorption (average equivalent width of the doublet >1.0).
We will be comparing this to a quasar absorption line system catalog based on DR5 quasars (York et al 2008).

Impacts
In the past decade, wide-field redshift surveys such as the SDSS and the Two Degree Field Galaxy Redshift Survey (2dFGRS; Colless et al 2001) have yielded detailed maps of the largescale galaxy distribution to z ∼ 0.6. These galaxy data are accompanied by quasar catalogs that offer the largest number of known probes for studying the interstellar medium and intergalactic medium at z = 0.4-6. A qualitative understanding of the origin of quasar absorbers can already be learned from the vast survey data, based on a 3D visualization of the spatial distribution of these absorbers relative to luminous galaxies. Figure 11 clearly renders the lumpy structures formed by luminous galaxies, which are loosely traced out by the location of Mg II 2706, 2803 absorbers. These absorbers are understood to arise primarily in photo-ionized gas of temperature T 10 000 K. The red clumps mark the highest overdensity peaks where high-mass dark matter halos are located and where the gas temperature is high as supported by the lack of Mg II absorbers in the visualization. The loose correlation between high-mass halos and Mg II absorbers also suggests that the Mg II absorbers originate primarily in low mass halos.

Conclusions
Visualization is an important tool for data exploration, discovery and interpretation. We have demonstrated the use of the open-source modeling program Blender for scientific visualization tasks. Two case studies were presented. The first visualized structures in the large scale structure of galaxies, a data interpretation task. The second visualization explored the spatial correlations between Mg II absorbers and LRGs, a data exploration and discovery task. As the largescale structure catalogs grow larger and the surveys cover larger volumes we expect scientific visualization to play a role of increasing prominence.