The GIGANTES dataset: precision cosmology from voids in the machine learning era

We present GIGANTES, the most extensive and realistic void catalog suite ever released -- containing over 1 billion cosmic voids covering a volume larger than the observable Universe, more than 20 TB of data, and created by running the void finder VIDE on QUIJOTE's halo simulations. The expansive and detailed GIGANTES suite, spanning thousands of cosmological models, opens up the study of voids, answering compelling questions: Do voids carry unique cosmological information? How is this information correlated with galaxy information? Leveraging the large number of voids in the GIGANTES suite, our Fisher constraints demonstrate voids contain additional information, critically tightening constraints on cosmological parameters. We use traditional void summary statistics (void size function, void density profile) and the void auto-correlation function, which independently yields an error of $0.13\,\mathrm{eV}$ on $\sum\,m_{\nu}$ for a 1 $h^{-3}\mathrm{Gpc}^3$ simulation, without CMB priors. Combining halos and voids we forecast an error of $0.09\,\mathrm{eV}$ from the same volume. Extrapolating to next generation multi-Gpc$^3$ surveys such as DESI, Euclid, SPHEREx, and the Roman Space Telescope, we expect voids should yield an independent determination of neutrino mass. Crucially, GIGANTES is the first void catalog suite expressly built for intensive machine learning exploration. We illustrate this by training a neural network to perform likelihood-free inference on the void size function. Cosmology problems provide an impetus to develop novel deep learning techniques, leveraging the symmetries embedded throughout the universe from physical laws, interpreting models, and accurately predicting errors. With GIGANTES, machine learning gains an impressive dataset, offering unique problems that will stimulate new techniques.


INTRODUCTION
Cosmic voids-the large under-dense regions infilling the cosmic web (Gregory & Thompson 1978;Jõeveer et al. 1978;Kirshner et al. 1981;van de Weygaert & van Kampen 1993;Peebles 1980;Bond et al. 1996)-have gained interest in the last ten years as a robust tool to extract cosmological information (see Pisani et al. 2019, and references therein). Until a few decades ago survey volumes were too small to provide enough statistics for voids, since voids are among the largest objects in the Universe. Large-scale surveys are now ushering in the era of big data in astronomy and astrophysics, which is essential for unveiling the power of cosmic voids. With such large amounts of data and access to previously unresolved phenomena, machine learning is becoming a prudent tool in astrophysics (see e.g. Ntampaka et al. 2019;Baron 2019;Charnock et al. 2018;Chalapathy & Chawla 2019;He et al. 2019;Villaescusa-Navarro et al. 2020a, and references therein).
Many traditional techniques, such as multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), are not interpretable, making discovering new physics a challenge, and can obscure information. The inherently graphical nature of the Universe has prompted substantial recent development in interpretable deep learning methods with graph neural networks (GNNs) (Cranmer et al. 2020). At the same time, symmetries are embedded in physical laws that govern the evolution of the Universe. Cosmic voids offer more symmetries that can improve interpretable machine learning techniques. The data-rich field of astrophysics, with its large surveys, offers unique problems fundamentally characterized by symmetries, interpretability, and the need for accurate error estimates, providing an impetus for the development of new machine learning techniques to conquer these questions.
With the advent of large modern surveys, we are entering the golden age of cosmic voids. The Baryon Oscillation Spectroscopic Survey (BOSS) (Alam et al. 2017) and eBOSS (Dawson et al. 2016) have already provided several thousand voids (Hamaus et al. 2020;Aubert et al. 2020). This number will dramatically increase in the coming years thanks to data from the DESI experiment (DESI Collaboration et al. 2016), the Subaru survey with the Prime Focus Spectrograph (PFS) (Tamura et al. 2016), the SPHEREx mission (Doré et al. 2018), the Nancy Grace Roman Space telescope (Spergel et al. 2015), the Euclid mission (Laureijs et al. 2011), and the LSST survey from the Vera Rubin Observatory (The LSST Dark Energy Science Collaboration et al. 2018), each expected to detect up to an order of 10 5 or more voids. Large-scale surveys provide enough volume to measure voids over large areas of the sky while simultaneously observing low mass galaxies. Beyond increasing the statistical power of cosmic voids, we can now intensely study their properties, such as by mapping their interior in detail.
Devoid of matter, comic voids are inherently extremely sensitive to the properties of diffuse components: they are a novel tool to constrain neutrino masses (Villaescusa-Navarro et al. 2013;Massara et al. 2015;Banerjee & Dalal 2016;Sahlén 2018;Kreisch et al. 2019;Schuster et al. 2019;Zhang et al. 2020;, modified gravity (Clampitt et al. 2013;Zivick et al. 2015;Cai et al. 2015; Barreira et al. 2015;Achitouv 2016Achitouv , 2019Falck et al. 2018;Sahlén 2018;Perico et al. 2019;Contarini et al. 2021;Alam et al. 2020), and dark energy (Bos et al. 2012;Spolyar et al. 2013;Pisani et al. 2015;Pollina et al. 2016;Verza et al. 2019). The current understanding of our Universe sets the observed accelerated expansion of the Universe as one of the most puzzling mysteries for modern cosmology. The nature of dark energy, the component constituting ∼ 68% of the Universe and postulated to explain such accelerated expansion (Planck Collaboration et al. 2018), is still poorly understood. Since matter is missing in voids by definition, voids are regions in the Universe that become dark-energy dominated the earliest, thus providing a window towards unraveling this dark mystery (Lee & Park 2009;Lavaux & Wandelt 2012). The spherical property of void stacks in real space provides standard spheres for the Alcock-Paczyński test (Alcock & Paczynski 1979;Ryden 1995;Ryden & Melott 1996;Lavaux & Wandelt 2012;Biswas et al. 2010). Using stacked voids, it is also possible to model the redshift-space distortion (RSD) pattern around voids. The use of the void-galaxy cross-correlation function for the Alcock-Paczyński test and RSDs is an area of intense data analysis-based activity in recent years (Lavaux & Wandelt 2010). Voids have provided results competitive to those from galaxies, and since voids do not suffer from non-linearities to the same extent as galaxies, there are indications that modeling could be easier for voids than for galaxies (e.g. Sutter et al. 2012Sutter et al. , 2015Paz et al. 2013;Hamaus et al. 2016;Hawken et al. 2017;Hamaus et al. 2017Hamaus et al. , 2020Cai et al. 2016Cai et al. , 2017Achitouv et al. 2017;Correa et al. 2019;Hawken et al. 2020;Nadathur et al. 2019Nadathur et al. , 2020Aubert et al. 2020;Paillas et al. 2021). Aside from voids' use in cosmology, the study of galaxy properties in voids is also an active field of research (Hoyle et al. 2005;Patiri et al. 2006;Kreckel et al. 2012;Ricciardelli et al. 2014;Habouzit et al. 2020;Panchal et al. 2020).
Despite the wide use of voids in galaxy clustering analyses of survey data, simulation-based analyses of void capabilities with an extremely large set of simulations have only been performed using dark matter with neutrino particles or dark matter particles alone as tracers to find voids, and with a spherical void finder . To provide synergy with data analysis, a thorough investigation of voids extracted from the halo field and with a void finder preserving void shape and cosmic-web information is imperative.
Relying on the large set of halo catalogues from the QUIJOTE simulations (Villaescusa-Navarro et al. 2020b) and the shape-preserving void finder VIDE , we build the most extensive and realistic dataset of void catalogs ever released: the GIGANTES 1 void catalogs suite, containing over 1 billion cosmic voids. The suite includes 15,000 VIDE Figure 1. Constraints on the sum of neutrino masses Mν , Ωm, σ8, and h from the halo mass function (HMF), void size function (VSF), halo auto-correlation function (ξ hh ), void auto-correlation function (ξvv), and combined statistics for a volume of 1 h −3 Gpc 3 . Void summary statistics provide strong constraints and orthogonal information to halo summary statistics.
fiducial cosmology void catalogs, as well as over 9,000 catalogs in non-fiducial cosmologies, spanning various values of the following cosmological parameters {Ω m , Ω b , h, n s , σ 8 , M ν , w} and fully leveraging the QUIJOTE simulation suite, which covers redshifts z = 0.0, 0.5, 1.0, and 2 in real-and redshift-space. The void finding relies on the popular public void finder VIDE , arguably the most used void finder, as testified by its use in a plethora of papers performing both simulation-based theoretical modelling and data analysis from modern surveys (e.g. Sutter et al. 2014;Sutter et al. 2015;Chan et al. 2014;Pisani et al. 2014;Pisani et al. 2015;Pollina et al. 2017;Kreisch et al. 2019;Verza et al. 2019;Cousinou et al. 2019;Fang et al. 2019;Vielzeuf et al. 2021;Jeffrey et al. 2021). We provide further details on the QUIJOTE simulations and the VIDE void finder in Appendix A.
Voids provide a set of summary statistics uniquely sensitive to the properties of the cosmological model. However, voids have been hotly debated as to whether or not they provide different information than the halo distribution they are found within. The main void summary statistics are the void-halo cross-correlation function ξ vh , equivalent to the void density profile, the void size function n v , a histogram of void sizes, and the void auto-correlation function ξ vv . With our suite of over 1 billion voids, we show in Figure 1 that void summary statistics carry novel and complementary information to halos, evidenced by the different degeneracies in the cosmological parameters for a volume of 1 h −3 Gpc 3 (see Section 3 for details).
Due to its unprecedented size and exploration of the power of different void summary statistics, including well known summary statistics such as ξ vh and n v , but also recent summary statistics such as ξ vv , and its power in accurately estimating covariance matrices, the GIGANTES voids dataset constitutes a powerful benchmark for machine learning applications. Its massive number of voids coupled with a large number of realizations permits the extraction of a strong void signal, information that is usually obscured by noise due to a low number of voids. As such, the GIGANTES suite opens the realm of machine learning exploration for voids, a new avenue for these objects. With such a wealth of data for voids, the GIGANTES suite provides a testbed not only for developing new cosmological theories, but also for developing novel machine learning techniques. Given the nature of voids, they can be analyzed with traditional techniques, such as CNNs (when projected into 2D or 3D grids) albeit with the caveats mentioned earlier, or GNNs (when exploiting the full sparsity of the data). Their symmetries, need for interpretation, and need for uncertainty estimation, however, pose questions ideally suited for the development of new machine learning techniques. This paper is organized as follows, and all results are presented for a volume of 1 h −3 Gpc 3 . In Section 2 we present the GIGANTES dataset as well as the standard three void summary statistics. In Section 3 we address the question on whether voids carry additional information on cosmological parameters with a Fisher matrix forecast in real space at z = 0 exploring the constraining power of voids obtained from the halo field with respect to traditional probes. We leave for future work an analysis in redshift space. In Section 4 we study where the information comes from, with a computation of the correlation between and among void and halo summary statistics, a Fisher forecast in real space exploring the combined power of voids and halos, and a comparison of the power of void shape in constraints to a simplistic spherical void finder. This paper focuses on illuminating the full information contained in these probes rather than exploring models for such probes. In Section 5 we show a machine learning application by training a neural network to perform likelihood-free inference from the void size function. We draw the main conclusions of this work in Section 6.

GIGANTES
This section describes the void catalogs and the primary void summary statistics.

GIGANTES catalogs
We build VIDE void catalogs from all the QUIJOTE simulations at different redshifts. A summary of all catalogs available is shown in Table 1. GIGANTES includes catalogs for the fiducial cosmology, catalogs varying single parameters for Fisher matrix calculations, and catalogs sampling multiple parameters at a time from a latin hypercube. We provide catalogs at different redshifts (0.0, 0.5, 1.0 and 2.0) as well as for real-space and redshift-space. Overall, the GIGANTES suite provides more than 1 billion voids over thousands of different cosmological models. The GIGANTES dataset will be publicly released upon acceptance of the paper.
Each void catalog provides a set of files as described in Sutter et al. (2015), giving for each void a set of properties, including: 1. the void center position (RA, Dec, z for observations; x, y, z for simulations), 2. the void radius (see Appendix A), Since VIDE voids are found taking into account the hierarchical feature of the cosmic web, the void finder outputs all voids (parents) and their sub-voids (children): for each void, the catalog includes the ID of the void parent (if applicable), the tree level, and the number of children.

Main void summary statistics
The void finder provides void centers and radii, which in themselves yield two different summary statistics to be used in cosmological data analyses: the void auto-correlation function, and the void size function. Combining void positions with halo positions defines a third summary statistic: the void-halo cross-correlation function.

The void-halo cross-correlation function ξ vh
By knowing the position of void centers and halos, it is possible to compute the probability of finding a halo at a certain distance from the void center, known as the void-halo cross-correlation function, ξ vh , or the void density profile found in the halo field. This probability will be low in the center of the void (it is not very likely to find halos, or galaxies, close to the void center), and high at the over-dense rim enclosing the void, which is coincident with the average void radius. It is usually computed considering a stack of many voids. While the estimation of the void-halo cross-correlation function from data usually relies on estimators, it can be shown that ξ vh is equivalent to the void density profile (see Eq. (2.7) and (2.8) of Hamaus et al. 2015): where r is the distance between the halos and the void center, ρ vh is density of halos in shells around the void center, andρ h is the average tracer density of the dataset. The void density profile is sensitive to cosmological parameters in both real-space and redshift-space. The peak height and location of the 1D real space profile, denoting the void wall's rim and average void radius, responds to changes in cosmological parameters. In a homogeneous and isotropic universe void density profiles do not possess a preferred direction. Thus, in real space, the 2D density profile will be spherically symmetric. In redshift space, however, the spherical stacked 2D void profile will be distorted due to the Alcock-Paczyński effect, which introduces cosmology-dependent geometrical distortions along the line-of-sight (Alcock & Paczynski 1979). The 2D redshift space profile is also distorted due to RSDs. The magnitude of these effects depend on the cosmological model, so the observed 2D void density profile can be used to constrain cosmology.  The void density profile has been used to successfully measure the Alcock-Paczyński effect by using voids as standard spheres (e.g. Sutter et al. 2012) and simultaneously modeling RSDs (e.g. Hamaus et al. 2016;Hamaus et al. 2020). The profile provides the most stringent constraint on the Universe's expansion history at z = 0.5 to date from voids: a calibration-free measurement from voids only, with a 16.8% precision level measurement of the growth rate of structure and an independent measurement of the matter content of the Universe yielding Ω m = 0.312 ± 0.020 (Hamaus et al. 2020). Figure 3 shows two density profiles measured from the GIGANTES void catalogs for voids with radii 40-49 h −1 Mpc and 49-112 h −1 Mpc at z = 0. Distance is normalized by the average void radius for the denoted radius bin. Larger voids have a less pronounced void wall rim than smaller voids due to the fact that larger voids have evolved more than smaller voids at a fixed redshift and because we find voids in the halo field rather than in the dark matter field. Figure 4 shows profiles from two different redshift bins (z = 0 and z = 1) for voids 50-70 h −1 Mpc in radius 2 . For the same radius bin, voids at present are more evolved than voids at z = 1 since they are, on average, older. Thus, voids at z = 0 are more evolved and so have a lower void wall rim than voids at z = 1.

The void size function
The void size function measures the number of voids as a function of their radius, also known as the void abundance, and is particularly sensitive to cosmological parameters, such as the dark energy equation of state (Pisani et al. 2015;Verza et al. 2019), modified gravity (Contarini et al. 2021) and massive neutrinos (Kreisch et al. 2019;Sahlén 2018). The theoretical modeling of the void size function is an area of intense activity (e.g. Sheth & van de Weygaert 2004;Jennings et al. 2013;Pisani et al. 2015;Sahlén & Silk 2018;Contarini et al. 2019;Verza et al. 2019), and will provide stringent constraints when applied to the next generation of surveys (e.g. Pisani et al. 2015;Doré et al. 2018). The main theoretical model to be considered for the abundance is the Sheth and van de Weygaert model (Sheth & van de Weygaert 2004), an excursion-set based prediction of void numbers of different sizes, later extended to consider the volume conservation of voids, since small voids merge to give larger voids (Jennings et al. 2013). This model has been shown to reproduce simulation results in different cosmologies where either the dark energy equation of state (Pisani et al. 2015;Verza et al. 2019) is varied or modified gravity is considered (Contarini et al. 2021). Since the model assumes dark matter voids and spherical symmetry, current work focuses on improving the match of predictions with measurements of observed voids in mocks (e.g. by including tracer bias, see Pollina et al. 2018;Contarini et al. 2019Contarini et al. , 2021. Void auto-correlation function ξvv from GIGANTES voids, for both z = 0 and z = 1. The void exclusion scale is coincident with the peak of the void auto-correlation function at ∼ 2RV, whereRV is the median void radius. In this work we do not attempt to model the measured void abundance with theoretical models, but we instead choose the approach of relying on the measured values of void numbers from the GIGANTES dataset to assess its constraining power for cosmology. In Figure 1 we show a Fisher forecast over a volume of 1 h −3 Gpc 3 for constraints on a few cosmological parameters from the void size function for radii 6 − 60 h −1 Mpc. The void size function puts superior constraints on m ν , for example, compared to the halo mass function (errors of 0.21 eV and 1.56 eV, respectively), and is competitive with the halo auto-correlation function (error of 0.22 eV). See Appendix B and Appendix C for forecasted constraints on additional cosmological parameters. In Appendix B we also illustrate the stability of the Fisher uncertainties by varying the number of simulations used to calculate the uncertainties and illustrating that the change in the value of the uncertainties is less than the change in the number of simulations. Figure 5 shows the abundance of voids from the GIGANTES dataset at z = 0 and at z = 1. As described below in Section 2.2.3, voids are larger in the latter case. We show the void size function for radii ranging from 15 to 100 h −1 Mpc. The median void radius is 35 h −1 Mpc for z = 0. Voids of lower size are less abundant due to the low simulation resolution at small scales (the mean tracer separation is ∼ 13 h −1 Mpc at z = 0 and ∼ 17 h −1 Mpc at z = 1). Nevertheless, small voids improve cosmological constraints, pointing to the constraining power of the void hierarchy, which will be further explored in future work. The large volume and, thus, strong void statistics in the GIGANTES suite promotes a high signal-to-noise ratio for the void size function's response to cosmological parameters.

The void auto-correlation function ξvv
We can compute the probability of finding a void center at a given distance from a randomly selected void in our void catalog, yielding the void auto-correlation function. Since the center of a void is defined as the volume-weighted barycenter of the Voronoi cells composing the void, it carries information about the whole structure of the void. This allows us to uncover a new sensitivity of the large-scale structure of the Universe to cosmological parameters (see Section 3, Section 4, and Section 5). The void auto-correlation function requires a great number of voids to overcome shot-noise, and it has so far been measured from data only at low redshift, over a volume of 0.6 (h −1 Gpc) 3 (Clampitt et al. 2016), and considering relatively small effective void radii (< 30h −1 Mpc).
Given its limited application to data, the void auto-correlation function has not yet been used to derive constraints on parameters of the cosmological model. It has been investigated in simulations (Chan et al. 2014;) and shows a strong sensitivity to massive neutrinos (Kreisch et al. 2019). We further illustrate this sensitivity to neutrinos for a volume of 1 h −3 Gpc 3 in Figure 1, in which the void auto-correlation function puts the strongest forecasted constraints on m ν of 0.13 eV. One reason for ξ vv 's strong constraints on the parameters could be that it contains a strong, clear feature at the void exclusion scale. Such strong features are useful for measuring distances, like the BAO, probing the background cosmology (see also . Further, void bias can be large (see e.g. Chan et al. 2014;Schuster et al. 2019), and this boosts the large-scale correlation function amplitude which can provide a better signal-to-noise ratio to measure the large-scale power spectrum, which depends on σ 8 and n s . See Appendix B and Appendix C for further forecasted constraints on cosmological parameters.
With the advent of upcoming surveys, such as the DESI experiment (DESI Collaboration et al. 2016), the Roman Space Telescope (Spergel et al. 2015), the Euclid ESA satellite (Laureijs et al. 2011), the SPHEREx mission (Doré et al. 2018), the Rubin Observatory (The LSST Dark Energy Science Collaboration et al. 2018), and the PFS Subaru survey (Tamura et al. 2016), hundreds of thousands of voids will be observed ). The void auto-correlation function will then reach the realm of observable quantities that can be measured from data with high significance, and used to constrain cosmological parameters.
We show the void auto-correlation function measured from the GIGANTES data set in Figure 6 at z = 0 and z = 1. The void auto-correlation function has a maximum at twice the mean void radius of the sample. This corresponds to the void exclusion scale , which is termed as such because, on average, void walls touch at twice the mean void radius since voids cannot overlap. This makes it likely to have two void centers at a distance corresponding to twice the mean void radius of the sample. Voids on average cannot be separated by smaller scales because then they will overlap. However, there is a distribution of void sizes (i.e. the void size function), and smaller voids are allowed to be separated by equivalently smaller scales. Thus, the probability of two voids being separated smoothly decreases as scales become smaller, since there are fewer small voids.
At higher redshift, the observed average radius of GIGANTES voids is larger. The Universe is more dominated by dark energy at z = 0 than it was at z = 1. Large, isolated voids in the dark matter field mostly expand in their life time. Large voids in the dark matter field at z = 1 most likely continue to expand through z = 0. Here, however, we consider voids found in the halo field. The number density of halos of a lower fixed mass of, e.g., 10 13 M , is higher at present. This high number density of small halos results in, on average, more small voids at present than at z = 1. This is also why the peak of the void density profile at z = 0 is shallower than at z = 1, whereas in dark matter simulations the opposite occurs ). Thus, the average void size is larger at z = 1 in the GIGANTES void catalogs suite. We also note that at z = 1 there are less voids, so the ξ vv is noisier. The theoretical modelling of the void auto-correlation function is an area that will deserve further attention in coming years, given its promising constraining power (see Section 3).

DO VOIDS CARRY ADDITIONAL INFORMATION? YES.
The question of whether voids, defined by relying on the halo (or galaxy) distribution, carry additional cosmological information with respect to traditional tools based on the halo (or galaxy) distribution, such as the two-point correlation function or the halo-mass function, has been a debated topic.
From a theoretical perspective, there have been indications to believe voids contain higher order information. As extended 3-D objects, voids must be defined by at least 4 non-planar halos. The fact that voids are defined by many halos could hint to the possibility of accessing higher-order information, but this is not in principle guaranteed: it is possible to find extended 3-D objects, and, thus, voids, in a Gaussian random field (or a Poisson distribution of tracers, see Cousinou et al. 2019). Nevertheless, the universe at present is not a Gaussian random field, and certain summary statistics associated to voids, like the void size function, have been expected to contain information beyond the 2-pt correlation function. As 1-point functions, amplitude and shape are expected to depend on all n-point functions. The probability of a random volume being a void can be written in terms of n-point correlations (Fry 1986;Fry & Colombi 2013): whereN is the mean tracer (i.e. halo or galaxy) count. Further, we are using voids found in the halo field, which already probes higher k. Another possible explanation for why voids can capture non-linear information is that an individual void is built from halos separated by distances approximately less than or equal to its diameter, highlighting scales that are quasi-linear and approaching non-linear. This paper has the tools necessary to properly investigate this question, relying on a comprehensive Fisher analysis. We investigate the power of the void statistics mentioned in Section 2.2 with respect to two traditional statistics defined from the halo field: the halo mass function and the halo auto-correlation function. The halo mass function measures the abundance of halos of a given mass, while the halo auto-correlation function gives the probability that, taken a random halo, there is another halo at a distance r from the first halo. Figure 1 shows constraints from different void summary statistics, as well as for the halo mass function and the halo auto-correlation function, in real space over a volume of 1 h −3 Gpc 3 for selected parameters: the sum of neutrino masses m ν , the matter content of the Universe Ω m , the reduced Hubble constant h, and the amplitude of linear density fluctuations inside spheres of radius 8 h −1 Mpc σ 8 . See Section 4.1 in Villaescusa-Navarro et al. (2020b) for detailed information on how the Fisher analysis is performed.  analysed the power of voids found in the dark matter distribution, but here we use the realistic set of GIGANTES void catalogs drawn from the halo distribution to show that voids observed by galaxy surveys hold the power to strongly reduce the available parameter space for cosmological parameters.
From this figure we clearly see that the void size function constraints have a very different orientation with respect to constraints from the halo mass function and the halo auto-correlation function. The void auto-correlation function provides even stronger constraining power as well as access to unique information, as evidenced from its orientation with respect to the other three summary statistics. The void size function, in particular, is almost fully orthogonal to the halo mass function, illustrating it provides access to information beyond that accessible from the halo statistics. Further, the orientation of the void and halo auto-correlations differ, and for some parameter planes, such as the σ 8 − h plane, the halo mass function, void size function, halo auto-correlation function, and void auto-correlation function are almost all in different orientations.
The combination of all the mentioned summary statistics considerably tightens final constraints, demonstrating that voids carry complementary information to that of halos. See Appendix C for the constraining power for other parameters.

WHERE THE INFORMATION COMES FROM
Aside from showing that voids provide additional information, the large GIGANTES dataset allows us to estimate how correlated the different summary statistics are and where the information on cosmological parameters comes from. The GIGANTES dataset contains void catalogs similar to those expected from future surveys such as PFS, since they have a similar number density to the QUIJOTE simulations, although different halo biases. Upcoming surveys such as Euclid and the Roman Space Telescope will have a higher number density than the QUIJOTE simulations, and so are expected to provide even better constraints from voids than what is illustrated here. See Appendix A for details on comparing the QUIJOTE simulations to upcoming surveys.

Summary Statistic Correlation
An important point to address is how the different void statistics are correlated, and if information from different bins is correlated. In Figure 7 we show the correlation matrix between and within the halo mass function (first 15 bins), the void size function (following 18 bins), the void auto-correlation function (following 61 bins), the halo auto-correlation function (following 61 bins), and the void-halo cross-correlation function (last 61 bins). We utilize the 15,000 fiducial simulations at z = 0 and compute the standard Pearson correlation coefficient for the different summary statistics: The correlations are computed for a volume of 1 h −3 Gpc 3 for scales utilized in the Fisher forecast, namely 15 to 200 h −1 Mpc in log space for the correlation functions and radii 6-60 h −1 Mpc for the void size function. Bins for the HMF are used from . This, however, does not make a statement on whether or not the parameter constraints are independent, as 0 correlation does not imply parameter independence. In Appendix B we also illustrate the stability of the covariance by varying the number of simulations used to calculate the covariance and illustrating that the change in the value of the uncertainties is less than the change in the number of simulations. The void size function and the halo mass function each have relatively uncorrelated bins. A few bins at small distances (below the median radius of 35 h −1 Mpc) in the void auto-correlation function are correlated, indicating some coupling at those scales, while larger scales are decoupled. The halo auto-correlation function is more strongly correlated over all scales, while the void-halo correlation function shows a slightly higher correlation at low radii (that is, below the average void radius) than the void auto-correlation function. We note that scales roughly below 20 h −1 Mpc in the correlation matrix show some scatter. This may be physical due to QUIJOTE being well resolved below such scales and the covariance being stable (see Appendix B), but in the interest of transparency, we compute the Fisher uncertainties for all ξ for only scales 20 − 200 h −1 Mpc in Appendix B and  are near 10%, although there are a few that exceed this. Fisher forecasts are idealized, however, and so should be trusted up to the 10% level as a rule of thumb. The void size function and the halo mass function are uncorrelated with each other, as highlighted by the perpendicular Fisher contours already, confirming that the two probes are strongly complementary. By inspecting the correlation between the void size function and void auto-correlation function, as well as the void size function and the void-halo cross-correlation, we see that large voids tend to be slightly anti-correlated at small scales, while small voids tend to be correlated at small scales. We can also see that large voids tend to be anti-correlated with halos that are close to each other, as expected. Viewing the correlation between the void auto-correlation function and the void-halo cross-correlation function, we see that at scales larger than the void exclusion scale and larger than the void wall rim, respectively, the two functions are somewhat correlated. This is because each follows the matter field at such large scales. We also see a parallel increase in correlation for scales larger than the median void radius and larger than the void wall rim, due to the fact that voids are more likely to be clustered at these scales. Without the sheer volume of the GIGANTES suite, the correlation among and within these void summary statistics could not be understood in such detail.

Optimal constraints combining voids and halos
The previous section showed that voids provide different, additional information than other probes. Given that voids are obtained from the halo or galaxy distribution, we want to compare the total information from all void summary statistics (void size function and ξ vv ) with the total information from halo summary statistics (halo mass function and ξ hh )-without considering the combined statistics ξ vh for this comparison. Figure 8 shows the comparison for a volume of 1 h −3 Gpc 3 : for some parameters, voids in fact perform better than halos, and overall voids are orthogonal in their constraints to halos. Voids are particularly effective in constraining massive neutrinos and h. Constraints on neutrinos are expected to be powerful from voids (Massara et al. 2015;Kreisch et al. 2019), as diffuse components particularly impact voids. See Appendix C for further parameter comparisons.
In all cases, as highlighted above, combined constraints are powerful. We recall that for this comparison we have not considered the void-halo cross-correlation, in order to compare halo-only with void-only summary statistics. If adding the void-halo cross-correlation, constraining power is further increased (see Appendix C). We leave for future work the exploration of the constraining power on dark energy, expected to be large from voids (Bos et al. 2012;Pisani et al. 2015;Verza et al. 2019;Pisani et al. 2019).

Void shape adds information
Void definition has been an intense area of discussion in the past (van de Weygaert & Schaap 2009;Colberg et al. 2008;Cautun et al. 2018). Recently, to exploit cosmological information, most works have chosen the approach of testing a particular definition and pushing its use in measuring cosmological signals. This approach has been particularly successful, as the point is not finding the best definition, but merely understanding the sensitivity of a particular definition to the signal to measure. In the realm of maximising the signal and being able to model the signal, which can be more challenging for some void definitions than others, fully understanding the impact of definition choices is crucial.
There are various classes of void finders, but a striking differentiating feature is whether the algorithm assumes a spherical shape for voids. While it can be considered intuitive that a void finder with no shape assumption helps when trying to infer cosmological information from the void shape (that is for applications relying on the void-halo cross-correlation function, such as Alcock-Paczyński test and RSD measurements from voids, or void ellipticity), it is not so trivial to understand what is the relevance of considering void shape for applications such as the void size function. Our large set of simulations allows us to address this question. Figure 9. Comparison of constraints from a void finder with no prior on the shape of voids, and a more simplistic sphericalassumption based void finder. Voids are found from the same halo field.
In this Section we compare the information content captured when the void shape is measured in detail with the case in which a spherical assumption is made by the void finder. In other words we compare constraints obtained when selecting voids with VIDE, a void finder with no prior on void shape, and a more simplistic spherical-assumption based void finder (see e.g. Villaescusa-Navarro et al. 2020b). Figure 9 shows results of the comparison for a volume of 1 h −3 Gpc 3 . For most of the cosmological parameters considered in this paper the void size function measured by VIDE provides more stringent constraints than the void size function measured by the spherical void finder (see Appendix C for the full set of contours). We notice, however, that in some cases the orientation of constraints is different. This consideration is relevant: in future work aiming to maximise the constraints extracted from the cosmic web, it could be possible to combine constraints from different void finders for the void size function. One has to keep in mind, however, that the correlations between constraints from two different void finders may be non-trivial to be accounted for, but GIGANTES contains enough catalogues to estimate this covariance accurately.
We also note that this comparison is more realistic given that it is obtained with voids found in the halo distribution, and not in the dark matter field. These results showcase for the first time that even for non-shape based applications, such as the void size function, shape plays a strong role in determining the quality of constraints.

LIKELIHOOD-FREE INFERENCE OF COSMOLOGICAL PARAMETERS
In this section we show a machine learning application to the GIGANTES dataset. Our goal is to perform likelihoodfree inference from one of the most important summary statistics associated to cosmic voids: the void size function. In order to carry out this task, we need many examples from different cosmological models in order to be able to extract unique patterns that allow us to find a connection between the void size function and the value of the cosmological parameters.
We use the void catalogs from the high-resolution QUIJOTE latin-hypercube at z = 0 over a volume of 1 h −3 Gpc 3 . We use 18 bins equally spaced from R = 10.5 h −1 Mpc to R = 61.5 h −1 Mpc. Our goal is to predict the mean (θ) and standard deviation of the posterior (δθ) from the void size function where we are going to use neural networks to approximate the function f . We use moment networks (Jeffrey & Wandelt 2020) to achieve this. We show the results in Figure 10 for 40 different void size functions of the test set. We note that we only show results for Ω m and n s , as those are the only two parameter for which we have enough statistical power to find a relation Inference n s Figure 10. We use likelihood-free inference with moment networks (Jeffrey & Wandelt 2020) to estimate the mean and standard deviation of the posterior from measurements of the void size function. The above panels show the results for Ωm (left) and ns (right). Each point represents the posterior mean from a measurement of the void size function from one realization of the HR latin hypercube at z = 0. The error bars display the standard deviation of the posterior. For Ωm (left) and ns we are able to recover the true value in most of cases; it is expected that some points lie outside the posterior standard deviation interval. We are unable to recover the value of Ω b , h, and σ8 from these measurements, as we do not have enough statistical power for these parameters since we only use the VSF from 1h −3 Gpc 3 .
between the void size function and the value of the parameters. For the case of Ω b , h, and σ 8 our model just predicts the mean with a large errorbar that is not meaningful due to the edges of the prior.
As it can be seen, our method is able to recover the true value within the standard deviation in most of the cases; it is expected that the true value lies outside of the posterior standard deviation for some cases.
This simple application illustrates the power of the so-called simulation based inference, where accurate and reliable simulations can be used to perform inference over statistics whose likelihood is unknown. We emphasize that this procedure is not limited to summary statistics, but can be carried out at the field level.

CONCLUSION
We have introduced the state of the art GIGANTES void catalogs suite obtained from the QUIJOTE simulations. The catalogs, built with the popular void finder VIDE, provide a comprehensive set of more than 1 billion voids in a large range of ΛCDM cosmologies.
To present the GIGANTES dataset we show the void size function, average profiles, and the void auto-correlation function. We also compute the correlation among and between different void summary statistics, as well as between traditional quantities such as the halo mass function and the halo auto-correlation function.
To illustrate the immense power provided by the GIGANTES catalogs, we show for the first time in a realistic set up of voids found in the halo field that voids provide complementary and independent information with respect to halos. Voids allow us to extract higher-order information from the cosmic web that would otherwise remain inaccessible. Our results show that added information is particularly effective with respect to constraining neutrinos and h, but helps for most parameters of interest.
We show that, aside from the void-halo cross-correlation function and the void size function, a strong contribution to constraints comes from the void auto-correlation function. We also compare the power of halos with respect to voids when combining different summary statistics. Finally, by comparing constraints from VIDE voids with voids from a spherical void finder, we show that void shape knowledge provides an important contribution to the overall information content from voids.
While providing the largest and most realistic set of voids ever released, this paper answers many open questions relevant for void cosmology: Do voids provide additional information? How is that information correlated? Which parameters are optimally constrained by voids? Is void shape important for cosmological constraints? We have showed that voids allow us to go beyond the two-point correlation function and provide additional constraining power.
GIGANTES opens up the exploration of the cosmological constraining power of void statistics beyond the ones considered in this paper. Theory developments and instrument systematics analyses for void summary statistics will allow us to fully exploit the power of the void size function, the void-halo, the void-void correlation functions, and other void statistics to constrain cosmological parameters with voids from the next generation of surveys. The QUIJOTE simulations ) are a suite of 44, 100 full N-body simulations exploring more than 7, 000 cosmological models. These simulations contain trillions of particles over a combined volume larger than the volume of the entire observable Universe, and were designed for two main tasks: 1) to quantify the information content on cosmological observables, and 2) to provide enough data to train machine learning algorithms. Given their immense combined volume, they represent a powerful tool to explore the properties of extreme objects, such as galaxy clusters and cosmic voids. The simulations have been used in a range of papers (e.g. Samushia et al. 2021;Bernal et al. 2021;Gualdi et al. 2021a;Kuruvilla & Aghanim 2021;Obuljen et al. 2019;Banerjee et al. 2020;Hahn et al. 2020;Giusarma et al. 2019;Uhlemann et al. 2020;Friedrich et al. 2020;Kodi Ramanah et al. 2020;Massara et al. 2021;Dai et al. 2020;Philcox 2021;Philcox et al. 2021Philcox et al. , 2020aAllys et al. 2020;Cranmer et al. 2020;Aviles & Banerjee 2020;Parimbelli et al. 2021;Banerjee & Abel 2021a,b;Lazeyras et al. 2021;Gualdi et al. 2021b,a;Chartier et al. 2021;Giri & Smith 2020;de la Bella et al. 2020;Alves de Oliveira et al. 2020;Hahn & Villaescusa-Navarro 2021;Chen et al. 2020;). On average, each realization of the QUIJOTE simulations contains ∼ 500, 000 dark matter halos over a volume of 1 (h −1 Gpc) 3 at z = 0, corresponding to a halo number density of 4 × 10 −4 h 3 Mpc −3 and mean halo separation of ∼ 13 h −1 Mpc. The equivalent values at z = 1 are 2 × 10 −4 h 3 Mpc −3 and ∼ 17 h −1 Mpc, respectively. Since access to the void hierarchy may depend on tracer number density, it is relevant to notice that the tracer number density of the QUIJOTE halo catalogs is of a similar order of magnitude than what we can expect from surveys such as PFS (Tamura et al. 2016), and denser than DESI (Dark Energy Spectroscopic Instrument DESI). With the QUIJOTE halo catalogs, however, we have a tracer number density below what will be achieved by Euclid (Laureijs et al. 2011), and the Roman Space telescope (Spergel et al. 2015), indicating that these upcoming surveys will be able to fully capture the power from the void hierarchy, and can expect even tighter constraints on cosmological parameters than what we illustrate in this work.
A.2. The void finder VIDE VIDE is the most popular Voronoi-watershed based void finder used in Cosmology. Based on ZOBOV (Neyrinck 2008), VIDE provides a convenient, publicly-released toolkit to find voids in simulations (dark matter/hydro-dynamical) and data (spectroscopic or photometric data).
The void finder first performs a Voronoi tessellation of the tracer particle field (tracers can be dark matter particles, halos, galaxies), providing a physical splitting of the 3D particle field into cells. Each cell is assigned a density equal to the inverse of the volume of the cells. Starting from local minima (largest cells) of the tessellation, the algorithm creates basins by merging cells with a monotonic increase in density. Through the use of the watershed trasform, VIDE provides the final set voids.
Void centers are defined as the volume-weighted barycenters of the Voronoi cells composing the void. Void centers are therefore sensitive to the whole structure of the object. Considering a sphere of equivalent volume to the volume V i of all the cells, one can define an effective radius for voids, R: Even though it provides the effective radius (that we simply refer to as radius in the rest of the paper), VIDE makes no assumption on the void shape, conversely to other void finders assuming sphericity. The absence of a shape prior makes VIDE an excellent option for cosmological analyses for which a precise determination of void shape is a strength (e.g. Alcock-Paczyński test and redshift-space distortion analyses). Finally, VIDE is built to handle a survey mask, and can easily be applied to study voids from both data and simulations, as its wide range of applications shows. Among other results, it has been used to provide the most stringent to date constraints from cosmic voids (Hamaus et al. 2020). Recently, a Python 3 version of the code has been released 3 .
Since most applications of void finding in data for cosmology rely on Voronoi-based void finders, the work performed in this paper for this category of void finders is pertinent to current cosmological observational analyses. The large set of void catalogs presented in this work will be publicly released upon acceptance of the paper. Tables 2, 3, 4, 5, 6, and 7 show 1 standard deviation uncertainties on cosmological parameters from the denoted summary statistic and given number of simulation realizations marginalized over the parameters listed as well as Ω b . Percent differences relative to the error with 500 simulations are shown in parentheses. For all void summary statistics, the percent change in error is less than the percent change in the number of realizations, indicating the constraints are stable. Tables 8, 9, 10, 11, 12, and 13 also show 1 standard deviation uncertainties, but given the number of simulations used to compute the covariance. Percent differences relative to the error with 15000 simulations are shown in parentheses. Most percent changes are below 1%, indicating the covariance is stable.

B. FISHER ERROR STABILITY
Finally, in Table 14, we illustrate the impact on the Fisher uncertainties from computing the correlation functions on scales 15 − 200 h −1 Mpc in log space compared to computing the correlation functions on scales 20 − 200 h −1 Mpc in log space. Percent differences relative to scales 20 − 200 h −1 Mpc are shown in parentheses. Most changes are near 10%, although a handful are larger. An increase in error is expected as less data are used.

C. FULL CONSTRAINTS
We show in Figure 11 the constraints from different void summary statistics, as well as for the halo mass function and the halo auto-correlation function, for all the parameters considered in this paper: the sum of neutrino masses, σ 8 , n s , h, and Ω m . In Figure 12 and Figure 13 we show full cosmological parameter contours for combined halo and combined void summary statistics, and the spherical void finder VSF and VIDE VSF, respectively.             sphereVSF VSF Figure 13. Comparison of constraints for all parameters from a void finder with no prior on the shape of voids, and a more simplistic spherical-assumption based void finder. Voids are found from the same halo field.