Neutrino Mass Constraint from an Implicit Likelihood Analysis of BOSS Voids

Cosmic voids identified in the spatial distribution of galaxies provide complementary information to two-point statistics. In particular, constraints on the neutrino mass sum, ∑m_ν, promise to benefit from the inclusion of void statistics. We perform inference on the CMASS Northern Galactic Cap sample of the third-generation Sloan Digital Sky Survey/Baryon Oscillation Spectroscopic Survey (BOSS) with the aim of constraining ∑m_ν. We utilize the void size function, the void–galaxy cross power spectrum, and the galaxy auto power spectrum. To extract constraints from these summary statistics, we use a simulation-based approach, specifically implicit likelihood inference. We populate approximate gravity-only, particle-neutrino cosmological simulations with an expressive halo occupation distribution model. With a conservative scale cut of k_max = 0.15 h Mpc⁻¹ and a Planck-inspired Lambda cold dark matter prior, we find upper bounds on ∑m_ν of 0.43 and 0.35 eV from the galaxy auto power spectrum and the full data vector, respectively (95% credible interval). Relaxing the scale cut to k_max = 0.2 h Mpc⁻¹, we find 0.28 and 0.32 eV. Generally, void statistics appear to add modest but nonzero information beyond the auto power spectrum. We also substantiate the usual assumption that the void size function is Poisson distributed.


I. INTRODUCTION
The Universe's ability to provide glimpses into experimentally inaccessible conditions has a long history, including the deduction of the laws of gravity and the discovery of helium. In the present day, cosmology offers a unique view on the properties of neutrinos, which are amongst the last unknowns in the Standard Model of particle physics. First evidence for a non-zero neutrino mass sum, m_ν, came from the solar neutrino problem [1–3]. Subsequently, oscillation experiments provided proof that neutrinos must have mass [4–8] and established the lower bounds of 0.06 and 0.1 eV in the normal and inverted hierarchy, respectively. The terrestrial experiment KATRIN currently sets an upper bound of 0.8 eV [9].¹ However, the strongest upper bounds are already provided by cosmological data, the primary CMB alone giving 0.38 eV [10], for example. It will be one of cosmology's primary goals in the coming decade to tighten this bound and eventually detect neutrino mass.
One of the natural regimes in which to constrain m_ν are extremely underdense regions, cosmic voids [11–14]. As the cold dark matter (CDM) flows out of the voids and into filaments and clusters, neutrinos remain more smoothly distributed. Thus, the neutrino/CDM ratio is higher in the voids and lower in the clusters. These qualitative considerations have spawned considerable theoretical interest in the use of void properties to constrain m_ν. This includes simulated data vector-level investigations [15–20] as well as forecasts [21–23]. The forecasts find promising error bars on m_ν, albeit under simplifying assumptions. Voids may be the first regime in which non-linear signatures of massive neutrinos will be observed [see also 24].

* lthiele@princeton.edu
¹ The KATRIN bound is on m_β² ≡ Σ_ν |U^PMNS_eν|² m_ν², so it only equals a bound on m_ν for a special, experimentally excluded, choice of the PMNS matrix and the mass hierarchy. In general, the bound is weaker.
In this work, we use voids identified in the CMASS sample of the Sloan Digital Sky Survey (SDSS)-III Baryon Oscillation Spectroscopic Survey (BOSS) [48–50] to place constraints on m_ν. The void statistics we consider are the void size function (VSF) and the void–galaxy cross power spectrum. We combine these with the usual galaxy auto power spectrum multipoles, which by themselves already place a tight upper bound on m_ν (through the suppression of matter power below the neutrino free-streaming scale) [e.g., 51, 52].
However, all analytic approaches to modeling void statistics are problematic for our purposes. First, it is difficult to construct a consistent galaxy bias model across the different statistics comprising the data vector. Second, the calibration of analytic models typically did not utilize large simulations with varied neutrino mass. Third, existing models typically apply only to an aggressively cleaned subset of the entire void catalog, potentially leading to appreciable losses in constraining power.
Therefore, we choose to work in a simulation-based framework. Our simulations are based on particle-neutrino, approximate gravity-only FastPM [76, 77] realizations, in which we place galaxies through an expressive halo occupation distribution (HOD) model [78–80]. We then post-process the galaxy catalogs to generate light cones incorporating survey realism.
The likelihood analysis with these simulations is a nontrivial problem. A popular approach is to build emulators of the mean data vector and perform the analysis under the assumption of a known (usually Gaussian) likelihood whose covariance matrix is estimated from simulations. However, this approach turns out to be challenging for our problem. First, constructing an emulator in a 17-dimensional space (6 cosmology, 11 HOD) is quite difficult, especially given a feasible number of simulations. Second, the assumption of a Gaussian likelihood is wrong. We demonstrate in Sec. IV E that the VSF is very close to Poisson distributed (as long as bins are chosen wide enough, as would be naively expected), but modeling its covariance with the void–galaxy cross power spectrum and the galaxy auto power spectrum is difficult.
For these reasons, we opt for an implicit-likelihood² approach [81]. This formalism uses neural networks to approximate functions that can be converted into posteriors. In general structure, this work is therefore similar to the SIMBIG papers [82, 83], but it differs in almost all details (statistics, simulations, objective, HOD, code). The resulting complementarity will therefore be useful to assess the state of implicit likelihood inference in galaxy clustering cosmology.
The rest of this paper is structured as follows. Sec. II describes our simulation pipeline. Sec. III contains details on the data vector and the inference procedure. Sec. IV collects our results and their interpretation. We conclude in Sec. V. The appendices contain additional material as well as information about data and code availability.
II. SIMULATIONS

A. Priors

Since our objective is m_ν, we place a tight prior on ΛCDM. For this, we use the posterior from the Planck [10] primary CMB analysis.³ Specifically, we use the chains run with fixed m_ν and measure the mean and covariance matrix in the five "CMB parameters" ω_c, ω_b, log A_s, n_s, and θ_MC. In these parameters the posterior is close to Gaussian and we approximate it as such. To ensure the robustness of our conclusions, we inflate the Planck error bars on cosmological parameters by a factor of two. For m_ν, we choose a flat prior between 0 and 0.6 eV, the upper boundary being motivated by preliminary tests in which we established a sensitivity of the order σ ∼ 0.2 eV. We assume three neutrino species with degenerate masses.
Of course, the primary CMB's information leads to some correlation between m_ν and the CMB parameters. This correlation is not included in our prior. However, given the sensitivity of the data used (compared to Planck), these residual correlations have a relatively small effect. For example, projecting the m_ν–ω_c correlation in the Planck posterior to our upper prior boundary of 0.6 eV, we obtain a shift Δω_c/σ_EFT ∼ 0.25, where σ_EFT is the error bar obtained from the EFTofLSS analysis of BOSS [84] (of which we only use a subset). From these considerations, it also follows that our results do not depend strongly on the precise choice of ΛCDM prior.
We draw from the cosmological prior using an open quasi-random sequence. In contrast to popular sampling methods such as Latin hypercube or Sobol, an open sequence does not require initial knowledge of the total number of samples. Our sequence is constructed by taking integer multiples of a vector whose elements are powers of a generalized golden ratio.⁴ In hindsight it turns out that our results are insensitive to taking out random subsets from the sequence (cf. appendix D), indicating that a simple pseudo-random sampling would have been sufficient. This is likely due to the aforementioned compactness of the prior compared to the data's sensitivity.
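The additive-recurrence construction just described can be sketched in a few lines. This is a minimal illustration (the mapping of the unit-cube points onto the actual Gaussian/flat prior is omitted, and function names are ours, not the pipeline's):

```python
import numpy as np

def generalized_golden_ratio(d, iterations=64):
    """Solve x**(d+1) = x + 1 by fixed-point iteration; d=1 gives the golden ratio."""
    x = 2.0
    for _ in range(iterations):
        x = (1.0 + x) ** (1.0 / (d + 1))
    return x

def quasi_random_sequence(n, d):
    """First n points of the open additive-recurrence sequence in [0, 1)^d:
    integer multiples of a vector of inverse powers of the generalized
    golden ratio, taken modulo 1. Extending n never changes earlier points."""
    phi = generalized_golden_ratio(d)
    alpha = (1.0 / phi) ** np.arange(1, d + 1)  # irrational step vector
    idx = np.arange(1, n + 1)[:, None]
    return (idx * alpha) % 1.0
```

Because the sequence is open, additional simulations can be appended at any time without invalidating the sampling pattern, which is the property motivating this choice over a Latin hypercube.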

B. Cosmological simulations
We run 127 simulations with varied cosmologies and 69 at a fiducial cosmology, illustrated in Fig. 1.⁵ We choose the fiducial cosmology close to the mean of the ΛCDM prior, with m_ν = 0.1 eV. The cosmo-varied simulations share the random seed. This is because the decision to adopt an implicit-likelihood approach was only made after we encountered severe challenges with the conventional approach, as described in the introduction. However, as we shall see below, the simulations are large enough that sufficient quasi-independent data vectors can be generated.
After generating a cosmological parameter vector in the CMB parameters, we replace θ_MC by the Hubble constant using CAMB [85, 86]. We then produce power spectra at z = 99 using CLASS [87] and REPS [88, 89].
We run particle-neutrino gravity-only simulations using the approximate FastPM solver. We choose a box size of 2.5 h⁻¹ Gpc with 2800³ CDM particles, leading to a minimum resolved halo mass of ∼ 1.3 × 10¹² h⁻¹ M_⊙ that is sufficient for the CMASS LRG sample. With regard to the neutrino options in FastPM we follow Ref. [77].⁶ In particular, we follow their recommendation to increase the number of early time steps for larger neutrino masses. Specifically, at m_ν = 0 we take seven logarithmic steps (in scale factor) between z = 99 and z = 19, followed by twelve linear steps until z = 0.68. Afterwards, we take twenty steps until z = 0.44, during each of which we write a snapshot of CDM particles to disk. At higher neutrino masses we insert up to twelve additional logarithmic steps before z = 79. The seemingly large number of twenty snapshots was established in our preliminary tests, during which we saw slight differences between ten and twenty snapshots (in the summary statistics considered and for a single FastPM run) and thus decided to err on the side of caution.

⁶ every_ncdm=4, n_side=4, n_shell=10
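For concreteness, the scale-factor schedule just described can be written out as below. This is a sketch: the interpretation of the step counts as intervals and the exact FastPM endpoint conventions are our assumptions, not a reproduction of the solver's internals.

```python
import numpy as np

def z_to_a(z):
    """Scale factor corresponding to redshift z."""
    return 1.0 / (1.0 + z)

def step_schedule(n_extra_log=0):
    """Time steps in scale factor at m_nu = 0 (n_extra_log > 0 mimics the
    additional early logarithmic steps used at higher neutrino masses):
    logarithmic from z = 99 to z = 19, linear to z = 0.68, then twenty
    linear steps to z = 0.44 with one snapshot written per step."""
    log_part = np.geomspace(z_to_a(99), z_to_a(19), 7 + n_extra_log + 1)
    lin_part = np.linspace(z_to_a(19), z_to_a(0.68), 12 + 1)[1:]
    snap_part = np.linspace(z_to_a(0.68), z_to_a(0.44), 20 + 1)[1:]
    return np.concatenate([log_part, lin_part, snap_part])
```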
Each simulation takes about 90 minutes on 70 40-core nodes of the Tiger machine at Princeton.

C. Galaxies
We fine-tuned the method to populate CDM snapshots with galaxies through preliminary tests. Specifically, we performed global optimization over a large and partly discrete space of halo occupation distributions considering two objectives: (1) the power spectrum multipoles of galaxies placed in Quijote simulations [90]; (2) the VSF of the CMASS data. The first step was intended to identify degrees of freedom that are necessary to correct for potential approximation errors in FastPM, while the second test was primarily meant to test our simulations' fidelity. The optimization problems were solved with optuna [91]. In the following, we briefly describe the halo occupation distribution (HOD) model resulting from these preliminary tests. More detail can be found in appendix A.
We identify halos in the CDM snapshots using Rockstar [92,93], which in our preliminary tests performed better than the friends-of-friends finder shipped with FastPM.
Then, galaxies are assigned stochastically to halos using an HOD. Besides the usual five-dimensional model parameterized by M_min, σ_logM, M_0, M_1, α [94], we introduce six additional degrees of freedom.
Although assembly bias [95–97] has been argued to be unnecessary to describe the clustering of CMASS galaxies [98–100], we decide to be conservative by adding assembly bias parameterized by P_1 and a_bias. Furthermore, we add velocity bias [101–104] parameterized by η_cen and η_sat. Finally, we introduce redshift dependence to M_min and M_1, parameterized by μ(M_min) and μ(M_1). One advantage of having these slopes as free parameters is that we know them to be relatively close to zero, enabling useful sanity checks on any posteriors.
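For reference, the usual five-parameter occupation model [94], on top of which the extra degrees of freedom above act, can be sketched as follows. This is the standard functional form, not necessarily the exact variant implemented in this pipeline:

```python
import numpy as np
from scipy.special import erf

def n_cen(log_m, log_m_min, sigma_log_m):
    """Mean central occupation: a smoothed step in log halo mass."""
    return 0.5 * (1.0 + erf((log_m - log_m_min) / sigma_log_m))

def n_sat(log_m, log_m_min, sigma_log_m, m_0, m_1, alpha):
    """Mean satellite occupation: a power law above the cutoff mass M_0,
    modulated by the central occupation."""
    m = 10.0 ** np.asarray(log_m, dtype=float)
    base = np.clip((m - m_0) / m_1, 0.0, None)
    return n_cen(log_m, log_m_min, sigma_log_m) * base ** alpha
```

Galaxies are then drawn stochastically: a Bernoulli draw for the central and a Poisson draw (with mean n_sat) for the satellites of each halo.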
We populate the cosmo-varied simulations with galaxies according to HOD parameters drawn from the priors given in appendix A. For the simulations at the fiducial cosmology we only populate with a single HOD. We choose the HOD parameters used for the fiducial mocks based on preliminary inference runs using the VSF only. It turns out that these parameters are not very close to best-fit when considering the entire data vector; generating new mocks closer to the best-fit point may increase the efficiency of the compression step described below.
However, the difference in HOD parameters cannot cause biases since the fiducial mocks are not used in constructing the likelihood.

D. Light cones
We use the cuboid remapping code [106] to deform our simulated cubes to the CMASS NGC geometry.It turns out that there are two possible choices of remapping and we use both (as part of the augmentation scheme discussed below).
When projecting galaxies onto the light cone, we extrapolate their positions from the snapshots using the host halo velocities (using the stochastic galaxy velocities would weaken the correlation between centrals and satellites).The resulting corrections are small thanks to the large number of available snapshots.
After mapping galaxies to the light cone, we apply all angular masks and approximately mimic fiber collisions using the procedure described in Ref. [82].
In contrast to some other works, we downsample the galaxy field predicted by the HOD to the data's density n(z) (only, of course, if the simulation contains more galaxies at the given redshift). This downsampling is performed iteratively in conjunction with the implementation of fiber collisions so that both are self-consistent. Our implementation performs any necessary downsampling regardless of host halo properties; future work could take a prediction for stellar mass into account.

III. INFERENCE

A. Data vectors
We use the northern galactic cap (NGC) part of the CMASS sample. The southern part (SGC) is smaller and we do not expect dramatic improvements from its inclusion. Since our focus is on better understanding the impact of void statistics, rather than the tightest possible bounds on neutrino mass, we ignore the SGC for simplicity. Similarly, we do not include the LOWZ sample; its lower volume makes it less suitable for void science.
We restrict galaxies to the redshift interval 0.42 < z < 0.7 and map them to comoving space using a fixed Ω_m = 0.3439. Voids are identified using the VIDE code [107], which is based on ZOBOV [108] and works by Voronoi-tessellating the galaxies and then applying a watershed algorithm to find contiguous density minima. We use the "untrimmed" catalog computed by VIDE as it does not require arbitrary assumptions.
While many different void finders exist [e.g., 109–111], prior work suggests that shape-agnostic void finders such as VIDE yield voids with better constraining power on m_ν than spherical finders [23]. Future work could investigate the influence of the void definition on signal-to-noise.
Galaxy auto power spectra P_gg(k) and void–galaxy cross power spectra P_vg(k) are computed using nbodykit [112, 113] and pypower,⁷ reducing variance with FKP weights [114] and correcting for observational systematics using the provided weights (except for fiber collisions, of course) [115]. We only utilize the systematics weights when computing P_gg. In the case of P_vg, we find no significant change in posteriors when using the galaxy weights, consistent with Ref. [40]. In the case of void identification, there is no guarantee that the obvious method to incorporate the systematics weights would yield cleaner voids. Given the relatively large void sizes considered in this work, we do not expect significant contamination by unmodeled survey systematics, but suggest that this point may warrant future work. Galaxy randoms are taken from the public catalogs. Void randoms are constructed by taking a large catalog of voids from many different mocks to choose angular positions and constructing a kernel density estimator in redshift matched to the specific void catalog. This procedure ensures that the randoms are consistent with a given cut in void radius, since voids of different sizes have somewhat different angular distributions due to the survey mask.
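The void-random construction just described might be sketched as follows. The function and argument names are hypothetical, and the KDE bandwidth is left at the scipy default:

```python
import numpy as np
from scipy.stats import gaussian_kde

def make_void_randoms(pool_ra, pool_dec, catalog_z, n_randoms, seed=0):
    """Void randoms: angular positions drawn from a large pool of mock voids
    (which encode the survey mask as seen by voids of the given size),
    redshifts drawn from a Gaussian KDE matched to the specific void catalog."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(pool_ra), size=n_randoms)
    kde = gaussian_kde(catalog_z)
    z = kde.resample(n_randoms, seed=int(rng.integers(2**31)))[0]
    return pool_ra[idx], pool_dec[idx], z
```

Applying a radius cut to both the pool and the catalog before calling this keeps the randoms consistent with that cut, as required in the text.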
We consider the VSF in 32 linearly spaced effective radius bins between 30 and 80 h⁻¹ Mpc. The minimum radius cut is well above the mean tracer separation and thus we expect contamination by Poisson voids to be small [116, 117]. We split voids for the VSF into two redshift bins, separated at z = 0.53. This splits the CMASS sample approximately equally.
We perform analyses with k_max = 0.15 and 0.2 h Mpc⁻¹.⁸ We consider k_max = 0.15 the conservative baseline choice, but k_max = 0.2 is still expected to be reliably modeled by FastPM as well as the halo model [e.g., 100]. We do not use power spectra on scales larger than k_min = 0.01 h Mpc⁻¹ [118]. Since our theoretical model is simulation-based, we do not deconvolve the survey window function. This means that there is a small level of contamination by Fourier modes outside the k-range considered, but we assess this effect to be negligible.
In our baseline analysis we only use the monopole P_0^gg of the galaxy auto power spectrum multipoles. This choice was made based on the limited information content of the quadrupole (judging from EFTofLSS posteriors) and with the aim of simplicity. We discuss the effect of including the quadrupole below. For P_vg we use both the monopole and the quadrupole.
It is worth noting that we opt to use reciprocal-space void–galaxy cross power spectra P_vg instead of the more popular configuration-space correlation function. The correlation function has the primary advantage that one can rescale its argument on a void-by-void basis by the respective radius and thus sharpen the resulting void profile (this is also possible in reciprocal space but computationally expensive; future work could explore this point). We believe, however, that the mixing of Fourier modes in the correlation function could lead to problems with approximate solvers like FastPM, whose domain of validity is better localized in reciprocal space. In order to optimize signal-to-noise, we consider P_vg computed with three different choices of minimum void radius: 30, 40, and 50 h⁻¹ Mpc. An illustration of the data vector is given in Fig. 2.

B. Implicit likelihood inference
As discussed in the introduction, the standard emulator-based approach is difficult in 17 dimensions. The main reason is that the training objective for an emulator does not directly map to the ultimate goal of accurate posteriors, implying that the emulator needs to be trained very close to its optimum (which requires many simulations).
Combined with the unknown likelihood function, we believe implicit likelihood inference (ILI) to be the appropriate tool for our task.
We opt for neural ratio estimation (NRE) [119–123], which recasts inference as a classification problem. The choice of an amortized instead of a sequential method was made based on the hierarchical structure of our simulations; we then opt for NRE because of its simplicity. In its original and simplest form, NRE works with pairs of parameter vectors θ, θ′ drawn from the prior p(θ). We then consider a data vector x drawn from the likelihood p(x|θ), where the simulation process described above approximates this draw. A neural network f maps the pairs (x, θ), (x, θ′) to scalars y, y′. If we now choose a classification loss function L(y, y′), e.g., binary cross entropy, it is easy to show that the functional optimization problem has the solution

f*(x, θ) = p(x, θ) / [p(x, θ) + p(x) p(θ)] ,   (1)

so that f*/(1 − f*) = p(x|θ)/p(x), with the evidence

p(x) = ∫ dθ p(θ) p(x|θ) .   (2)

In other words, a neural network trained to distinguish between samples from p(x, θ) and samples from p(x)p(θ) approximates the likelihood-to-evidence ratio at optimum. Posteriors can then be obtained through the usual Markov chain Monte Carlo sampling, which we perform with emcee [124, 125]. In practice, this general idea of approximating p(x|θ)/p(x) through a classifier works better in the multi-class version "NRE-B" [121]. We use the implementation provided in the sbi package [126].
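The classifier-to-ratio idea can be demonstrated on a tractable toy model, where a logistic classifier with quadratic features stands in for the neural network. Everything here is illustrative (a one-dimensional Gaussian model, not the pipeline's architecture), but the mechanism is the same: the classifier's logit converges to the log likelihood-to-evidence ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: theta ~ N(0,1), x|theta ~ N(theta,1), hence x ~ N(0,2) marginally.
n = 20000
theta_j = rng.normal(size=n)
x_j = theta_j + rng.normal(size=n)      # samples from the joint p(x, theta)
theta_m = rng.normal(size=n)
x_m = rng.permutation(x_j)              # shuffled pairs: samples from p(x) p(theta)

def feats(x, t):
    # Quadratic features suffice here because the true log-ratio is quadratic.
    x, t = np.atleast_1d(x), np.atleast_1d(t)
    return np.stack([np.ones_like(x), x, t, x * t, x**2, t**2], axis=1)

X = np.concatenate([feats(x_j, theta_j), feats(x_m, theta_m)])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Logistic "network" trained with binary cross entropy by gradient descent.
w = np.zeros(X.shape[1])
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.2 * X.T @ (p - y) / len(y)

def log_ratio(x, t):
    """Learned classifier logit: approximates log p(x|theta)/p(x) at optimum."""
    return feats(x, t) @ w

def true_log_ratio(x, t):
    # log N(x; t, 1) - log N(x; 0, 2), known analytically for this toy model
    return 0.5 * np.log(2.0) - 0.5 * (x - t) ** 2 + 0.25 * x ** 2
```

Evaluating `log_ratio` on a grid and comparing to `true_log_ratio` shows close agreement, which is the property the posterior sampling step relies on.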
In the above, it is actually not necessary for the parameter vectors θ to be drawn independently from the prior p(θ). In fact, all that is required is that a sum over the simulated parameter vectors approximates the integral in Eq. (2). For this reason, it is correct for us to populate each of the 127 cosmo-varied simulations with multiple draws from the HOD prior (∼ 230). For each HOD draw we compute 8 augmentations as described below, yielding ∼ 1.7 × 10⁵ training samples.
The ILI framework allows implicit marginalization over nuisance parameters. This is one of its primary benefits in high-dimensional parameter spaces.⁹ In principle, we could take θ = {m_ν} as one-dimensional. In practice, it is likely better to include a subset of the nuisance parameters in θ. This is because we have intuition for the posteriors expected for some nuisance parameters, and thus making them explicit allows useful checks. We opt to include log M_min and μ(M_min) in the parameter vector. For the former we know that the data should provide a constraint considerably tighter than the prior, while for the latter we expect a result close to zero. The posterior on m_ν is unaffected by this choice of θ, and the extra computational cost in training and sampling the neural network is marginal when making two nuisance parameters explicit (however, making all nuisance parameters explicit would complicate the training unnecessarily).
We parameterize the classifier f as a residual neural network. Hyperparameter optimization was performed considering the loss on a validation set of 13 cosmologies (i.e., ∼ 2.4 × 10⁴ mocks).¹⁰ We converged on a relatively large network with 1.7 × 10⁷ trainable parameters but high dropout rates.
High-dimensional data vectors x are often problematic for ILI, and our problem is no exception. This necessitates a compression step before the data vector is passed to the neural network. Since we expect our likelihood to be close to Gaussian/Poissonian, we use the linear score compression MOPED [127] to obtain 17 compressed statistics. Indeed, MOPED is locally optimal both for a pure Gaussian and a pure Poissonian likelihood. We also experimented with the nuisance-hardened generalization [128] to five and ten dimensions, obtaining consistent but slightly wider posteriors. In order to construct the MOPED compression matrix, estimates for the covariance and derivatives of the data vector are required. We construct the covariance matrix from our fiducial mocks using the usual estimator. For the derivatives, we generate ∼ 10⁵ additional mocks (10³ parameter vectors, each with 96 augmentations) in a small ball around the fiducial model. We then perform linear regression and read off the derivatives. Simple tests indicate that the dependence on parameters is close to linear in the region considered.
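The MOPED construction itself is compact; the sketch below is the textbook Gaussian version (Gram-Schmidt orthogonalization in the covariance metric), not necessarily every detail of the implementation used here:

```python
import numpy as np

def moped_vectors(cov, dmu):
    """MOPED compression vectors b_i for data covariance `cov` (n_data x n_data)
    and mean derivatives `dmu` (n_data x n_params). The b_i are orthonormal in
    the covariance metric, so for a Gaussian likelihood the compressed
    statistics are uncorrelated with unit variance."""
    cinv_dmu = np.linalg.solve(cov, dmu)
    bs = []
    for i in range(dmu.shape[1]):
        b = cinv_dmu[:, i].copy()
        for bj in bs:
            b -= (dmu[:, i] @ bj) * bj   # Gram-Schmidt against earlier vectors
        b /= np.sqrt(b @ cov @ b)
        bs.append(b)
    return np.array(bs)                  # shape (n_params, n_data)

def compress(B, x, mu):
    """Compressed statistics t = B (x - mu); one number per parameter."""
    return B @ (x - mu)
```

With 17 parameters this reduces the full data vector to 17 numbers, which is what makes the subsequent neural ratio estimation tractable.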

C. Augmentations
Figure 3. Checks for statistical independence of augmentations, in the compressed space. The data labeled "augments" are obtained by marginalizing over augmentations, while those labeled "ICs" are obtained by marginalizing over initial conditions. Top: ratio of average standard deviations; the difference from unity does not exceed two percent. This check can also be performed with the uncompressed data vector, leading to similar conclusions. Bottom: distributions of log-likelihoods of covariance matrices under the Wishart distribution with our fiducial covariance matrix. The distributions have large overlap, again indicating that the augmentations are very close to statistical independence.

As discussed before, our cosmo-varied simulations share the random seed. This fact ostensibly makes them unsuitable for the ILI approach discussed before, since the integral in Eq. (2) requires a sampling of initial conditions.
However, as we shall discuss in this section, it is possible to generate many quasi-independent realizations from a single simulation.As mentioned before, we do not require independent identically distributed realizations, so this is in fact enough to approximate Eq. ( 2) with sufficient accuracy.
For a single 2.5 h −1 Gpc simulation box populated with galaxies, we can take the product of the following transformations: 2 cuboid remappings, 8 reflections, 6 axis transpositions.This results in 96 augmentations.In principle, many more augmentations can be generated through translations, but we expect these to be more correlated.
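The combinatorics of this augmentation group can be made concrete with a short enumeration. Positions are assumed to live in the unit box, and the cuboid remapping is only labeled here, not implemented:

```python
import itertools
import numpy as np

def box_augmentations():
    """Enumerate the 96 = 2 x 8 x 6 augmentations: a cuboid-remapping choice,
    per-axis reflections, and axis transpositions. Each entry is a label plus
    a callable acting on an (N, 3) array of unit-box positions."""
    ops = []
    for remap in (0, 1):                                        # 2 remappings
        for flips in itertools.product((1.0, -1.0), repeat=3):  # 8 reflections
            for perm in itertools.permutations((0, 1, 2)):      # 6 transpositions
                def op(pos, f=np.array(flips), p=list(perm)):
                    # reflection maps x -> 1 - x via the modulo
                    return (pos[:, p] * f) % 1.0
                ops.append(((remap, flips, perm), op))
    return ops
```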
The crucial question now is whether these 96 augmentations approximate the distribution over initial conditions. We can answer this question by considering our fiducial simulations, which have 69 different random seeds. Given the fiducial parameter vector, we generate a matrix D_µa whose elements are data vector-valued and where µ = 1…69 and a = 1…96 index the initial conditions and augmentations, respectively. We can perform statistical tests by computing marginals over µ and a individually or jointly. In order to simplify the statistical interpretation, we restrict a to 69 randomly chosen indices.
In the upper panel of Fig. 3, we compare the diagonals of covariance matrices in the MOPED compressed space. We see that the standard deviations are almost identical for marginalization over µ and a. This test can also be performed for the uncompressed data vector, yielding consistent results and no systematic differences between different summary statistics or scales.
In the lower panel, we perform a test considering the entire content of the covariance matrices. We construct the Wishart distribution given the covariance matrix C_joint obtained by marginalizing over µ and a jointly, and then compute the log-likelihood of the individually marginalized covariance matrices. If these covariance matrices were drawn from the Wishart distribution sourced by C_joint, their log-likelihoods would be distributed as indicated by the green line. We see that the distributions are somewhat different but still have large overlap. In conclusion, the 96 augmentations reproduce the distribution over initial conditions reasonably well. Since 96 ≫ 1, we expect the augmentations to provide a good approximation to the integral in Eq. (2).
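The lower-panel test can be expressed compactly: the scatter matrix of n Gaussian draws follows a Wishart distribution with n − 1 degrees of freedom, so an estimated covariance can be scored against a reference covariance as below (a sketch; function names are ours):

```python
import numpy as np
from scipy.stats import wishart

def covariance_loglike(sample_cov, ref_cov, n_samples):
    """Log-likelihood of an estimated covariance matrix under the Wishart
    distribution implied by the reference covariance and the sample size."""
    dof = n_samples - 1
    # The scatter matrix S = dof * sample_cov is Wishart(dof, ref_cov).
    return wishart.logpdf(dof * np.asarray(sample_cov), df=dof, scale=ref_cov)
```

Covariances estimated by marginalizing over augmentations and over initial conditions can then both be scored against the jointly marginalized covariance, and the two distributions of log-likelihoods compared, as in the figure.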
Why does this augmentation procedure work? First, our simulation boxes are about 5.7× larger than the survey volume. Second, the augmentations alter the redshift direction. Third, galaxies and the survey mask interact. Fourth, galaxies are captured at different times, so their peculiar motions alter their real-space positions. All these points need to be seen relative to the specific survey and simulation configuration; the described augmentation procedure is certainly not expected to work universally.

IV. RESULTS
In this section, we first present our main posteriors on m_ν from the CMASS NGC data, taking various combinations of the summary statistics: VSF N_v, void–galaxy cross power spectrum P_vg, and galaxy auto power spectrum P_gg.
We present most of our posteriors in their cumulative form. This is because at the current level of precision, no neutrino mass detection is expected and upper bounds are the main objective. The cumulative posterior is the most direct visualization of upper bounds. In all plots we include a diagonal dashed line indicating the prior.
In the following, we will occasionally compare with results obtained with the EFTofLSS [118, 129, 130]. The EFTofLSS allows for the analysis of the full-shape galaxy auto power spectrum (as well as other statistics we will not consider here). We use the window-less full-shape likelihood [84, 131, 132]¹¹ and the CLASS-PT code [133].¹² We restrict the data included in the likelihood to the NGC high-z sample, approximately equal to the data we use for our analysis. Furthermore, we impose the same ΛCDM prior while keeping the nuisance parameter priors equal to those implemented in the public likelihood code. In any comparison with our results we use identical k_max. Likewise, we usually only use the monopole P_0^gg, consistent with our simulation-based analysis. The likelihood part termed "Alcock-Paczynski" in the EFTofLSS likelihood is included, since our method also effectively includes this term. On the other hand, we do not include the BAO reconstruction or real-space likelihoods.

Figure 4. The addition of void–galaxy cross power spectra improves the constraint compared to the galaxy auto power spectrum, whereas the VSF has negligible impact.
We emphasize that a comparison between EFTofLSS and HOD methods is beyond the scope of this work. Therefore, we will use the EFTofLSS posteriors only to provide intuition, to show that our posteriors are at least qualitatively reasonable, and for an interesting observation about the quadrupole P_2^gg later on.

A. Neutrino mass posterior
In Fig. 4, we show the baseline posterior on m_ν, with k_max = 0.15. The galaxy auto power spectrum gives a 95% credible-interval constraint of m_ν < 0.43 eV. Upon inclusion of the VSF, the posterior broadens somewhat. Including the void–galaxy cross power spectrum tightens the posterior to m_ν < 0.35 eV, a ∼ 20% improvement. Further adding the VSF does not lead to any appreciable change. Posteriors are generally wider than the EFTofLSS result.
In Fig. 5, we show a similar set of posteriors obtained with k_max = 0.2. We believe that our simulated model should still have a high level of fidelity at these somewhat smaller scales. We observe that including smaller scales tightens the posterior, as expected. However, adding void statistics to P_gg now slightly broadens the posterior. Most of the remainder of this section will be devoted to better understanding the observations from Figs. 4 and 5.

Figure 6. Coverage (q-q) plot, demonstrating that when performing inference on mocks the posteriors are well calibrated.

B. Validation
Any simulation-based, and especially implicitlikelihood, inference necessitates rigorous validation of the simulated model, the likelihood approximation, and the resulting posteriors.In this section, we present three tests verifying different aspects of our pipeline.
First, in Fig. 6, we present the usual coverage (or q-q) plot [134]. For this diagnostic, we perform inference on mocks drawn from the prior; in particular, we use the ∼ 2.4 × 10⁴ validation mocks discussed before. We use the N_v + P_vg + P_gg likelihood with k_max = 0.15. For each resulting chain, we compute the marginal distributions of the explicit parameters and then the confidence level at which the true input parameter is located. In Fig. 6, we show the cumulative histograms of these confidence levels. If the posterior is well calibrated, these CDFs should coincide with the diagonal. As can be seen, for all parameters considered this is the case. This diagnostic is a powerful internal consistency check and verifies that the neural network is well trained.
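The rank statistic behind such a coverage plot is simple to compute from posterior chains. A sketch: for a well-calibrated posterior the ranks of the truths are uniform, so their empirical CDF tracks the diagonal.

```python
import numpy as np

def posterior_ranks(chains, truths):
    """For each mock, the fraction of posterior samples below the true value.
    Uniformly distributed on [0, 1] if the posterior is well calibrated."""
    return np.array([np.mean(samples < truth)
                     for samples, truth in zip(chains, truths)])
```

Plotting `np.sort(ranks)` against a uniform grid then gives the q-q curve; deviations above or below the diagonal indicate under- or over-confident posteriors.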
Second, in Fig. 7, we show an interesting observation concerning the galaxy auto power spectrum quadrupole P_2^gg. As discussed before, this summary statistic has limited constraining power and we do not use it for our main posteriors. As can be seen in Fig. 7, adding the quadrupole to the data vector slightly broadens the posterior. This happens consistently in our analysis and in the EFTofLSS. We believe that this observation increases confidence in the validity of our simulation model, in particular the modeling of redshift space distortions.
Third, in Fig. 8 we compare the data posteriors with posteriors obtained by running inference on randomly chosen mocks generated at the fiducial point. We remind the reader that the fiducial HOD is rather far from best-fit, which somewhat complicates the interpretation.
We observe that at k_max = 0.15 the data posterior is tighter than most of the mock ones. If the cosmological simulations were to blame for this, the naive expectation would be for the discrepancy to become more severe as smaller scales are included in the analysis. However, this appears not to be the case: at k_max = 0.20 the data posterior becomes more typical. We conclude that even though we observe hints of differences between data and simulations, the evidence is not conclusive and the data posterior could well be consistent with the observed distribution. It should also be noted that the real m_ν may be smaller than the value of 0.1 eV with which the fiducials were run, potentially leading to a tighter data posterior.

C. Broadening of posteriors
One peculiar observation is that the inclusion of void statistics can broaden the posterior on m ν. We do not fully understand this phenomenon and can only provide some suggestive results. These are described more comprehensively in Appendix B; here we only provide a summary.
We observe similar broadening on fiducial mocks and thus propose that we are in fact observing a generic phenomenon. Based on this, we suggest that void statistics are most effective at constraining the neutrino mass sum from below. A further test using artificially enlarged volumes supports this hypothesis.
For a potential physical explanation, we consider the free-streaming length. At z = 0.5, λ fs = 90 h −1 Mpc (0.3 eV/m ν ) for degenerate masses. This length scale is comparable to the diameter of the voids that seem to contribute most (cf. Sec. IV D). Thus, it may be that m ν at the upper end of the posterior is "invisible" to voids. However, we identify voids using tracers of small-scale fluctuations, so the full picture is more complicated and could be a subject for further study.
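Taking the quoted scaling at face value, a back-of-envelope comparison with the most constraining void sizes reads (the helper name and the void-diameter range of 80–100 h −1 Mpc, i.e. twice the 40–50 h −1 Mpc effective radii, are illustrative):

```python
def lambda_fs_z05(m_nu_sum_eV):
    """Free-streaming length at z = 0.5 in h^-1 Mpc for degenerate masses,
    using the scaling quoted in the text: 90 * (0.3 eV / m_nu)."""
    return 90.0 * (0.3 / m_nu_sum_eV)

# Diameters of the most constraining voids (R_eff ~ 40-50 h^-1 Mpc):
d_void = (80.0, 100.0)  # h^-1 Mpc

print(lambda_fs_z05(0.3))  # 90.0 -> comparable to such void diameters
print(lambda_fs_z05(0.6))  # 45.0 -> well inside such voids
```

Masses near the upper end of the posterior thus free-stream on scales smaller than these voids, consistent with the picture sketched above.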

D. Void radius
In Fig. 9, we show posteriors obtained with the N v + P vg + P gg likelihood, concentrating on void size. Cuts on effective radius are applied both in the VSF and P vg parts of the data vector. We observe that the posteriors are almost identical regardless of whether we cut at 30 (the baseline analysis) or 40 h −1 Mpc. On the other hand, further increasing the minimum radius to 50 h −1 Mpc removes much of the effect of voids on the posterior. Fig. 9 thus indicates that, at least for the present analysis, voids with effective radii between 40 and 50 h −1 Mpc are the most constraining. Smaller voids might be contaminated by spurious Poisson voids and, owing to their shallower density profiles, may also be less affected by neutrinos; larger voids presumably suffer from their low abundance. In Fig. 10 we show posteriors obtained from void statistics only. We show them mostly for completeness; in the present analysis they are entirely prior dominated. However, even in this plot we see the previously mentioned trend that larger voids appear to carry more signal.

E. Poissonian void size function
As a final point of this section, we substantiate the previous claim that the VSF is very close to Poisson distributed. While this seems a natural assumption, void exclusion makes it non-trivial. Indeed, previous works have assumed Poisson likelihoods [21, 46]; our simulations enable us to check this assumption.
In Fig. 11, we show two checks performed with our fiducial mocks. The left panel shows the covariance matrix divided by the outer square of Poissonian standard deviations; the result is close to the identity. The right panel shows a check using the variance-stabilizing Anscombe transform [135]. For each mock data vector c (α) and bin i, we compute the transformed VSF count A(c i (α) ) = 2 √( c i (α) + 3/8 ), centered on its Poisson expectation. In the limit of large counts, the distribution of these transformed counts converges to the standard normal if the counts themselves are Poissonian. As can be seen, the agreement with the standard normal is indeed quite good. These tests demonstrate that deviations from a Poisson distribution are small for the VSF, at least for the binning considered here.
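The right-panel check is easy to reproduce numerically. The following toy version (the per-bin rate λ = 40 is a hypothetical stand-in for a typical VSF bin count) draws Poisson counts, applies the Anscombe transform, and centers on the Poisson expectation; the result is close to a standard normal:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 40.0                          # hypothetical per-bin VSF rate
c = rng.poisson(lam, size=20000)    # mock Poisson counts for one bin

# variance-stabilizing Anscombe transform A(c) = 2 sqrt(c + 3/8)
a = 2.0 * np.sqrt(c + 3.0 / 8.0)

# for Poisson counts with a large rate, A(c) - 2 sqrt(lam + 3/8) -> N(0, 1)
z = a - 2.0 * np.sqrt(lam + 3.0 / 8.0)
```

If the counts deviated from Poisson (e.g. through void exclusion inducing super- or sub-Poissonian scatter), the standard deviation of `z` would depart from unity.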

V. CONCLUSIONS
We have performed inference on galaxy clustering in the BOSS CMASS northern sample, combining the void size function, the void–galaxy cross power spectrum, and the galaxy auto power spectrum. Our primary target was the neutrino mass sum, m ν; thus, we imposed a tight prior on ΛCDM informed by primary CMB data.
We argued that analytic models for the considered void statistics are not yet mature enough for our specific problem, necessitating a simulation-based approach. To this end, we ran approximate gravity-only simulations and populated them with galaxies using an expressive halo occupation distribution model. Several factors motivated the use of implicit likelihood inference.
In our baseline analysis, we find m ν < 0.43 eV from the galaxy auto power spectrum alone, and m ν < 0.35 eV with the void statistics included (95% credible interval). We performed several tests to confirm the statistical and systematic validity of our likelihood approximation.
We performed a short investigation of the impact of voids on the neutrino mass posterior. It appears that the void statistics may be most effective in constraining m ν from below. This result would imply that future analyses aiming at measuring m ν may benefit from including void statistics.
Our results suggest that larger voids with effective radii > 40 h −1 Mpc carry most of the signal despite their lower abundance. This has interesting implications for future analyses, since voids of this size should be detectable in photometric catalogs with relatively low redshift error, such as the one expected for Rubin/LSST [136]. Of course, spectroscopic surveys such as DESI [137], Euclid [138], SPHEREx [139], PFS [140], and Roman [141] will continue to be cornerstones of void science. The trade-off between volume, galaxy number density, and redshift precision warrants further investigation.
We also demonstrate that the void size function is very close to Poisson distributed, a feature that had been assumed in previous analyses but never explicitly confirmed.
Future work could improve upon our analysis in multiple ways. First, the cosmology-varied simulations should be run with different random seeds (we opted for a fixed seed in anticipation of an emulator-based analysis, which ultimately turned out to be very difficult). Second, it may be beneficial to normalize the void–galaxy cross power spectrum by void number. Although in principle this would contain the same information as our data vector once the VSF is included, the necessary transformation is non-linear and thus potentially invisible to our data compression. Third, the HOD modeling could be improved: some of our priors may not be optimal, and our n(z) downsampling is simplistic. The CMASS sample's completeness is quite well known and could be used to put a prior on the downsampling. Fourth, it turns out that the cosmological simulations did not dominate compute cost; it may therefore be economical to increase accuracy in FastPM or to switch to a different solver.
Our results point toward a complicated picture of the relationship between massive neutrinos and voids. Future data sets, both spectroscopic and photometric, promise to bring tight cosmological constraints from void science, since it scales well with tracer number.

Appendix B: Broadening of posteriors

A first possible explanation is that the broadening is a statistical fluctuation specific to our data realization. One way to test this hypothesis is to look at average posteriors on our fiducial mocks. We perform inference on ∼ 20 randomly chosen mocks and plot the CDFs of concatenated chains in Fig. 12. There, we observe that the expected, average behavior is for the posterior to broaden once void statistics are added to the data vector.
The second possible explanation could be that once void statistics are added our compression procedure becomes less efficient. This could certainly be the case if at linear order the void statistics appear more constraining than they are globally, in which case P gg would be unnecessarily downweighted. This hypothesis appears unlikely in light of the full posteriors presented in Figs. 15 and 16. In these posteriors, we observe that for the parameters that are actually constrained (like M min ) adding void statistics generically tightens the posteriors; it appears unlikely that m ν should be an exception. Having found these two hypotheses unsatisfactory, we arrive at the third one: void statistics tend to constrain m ν from below. We illustrate this hypothesis qualitatively in Fig. 13, which should not be interpreted as a literal depiction; in fact, in Sec. IV D we show that void statistics alone yield posteriors close to the prior. Fig. 13 provides merely an effective depiction.
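Returning briefly to the second explanation: we do not reproduce our exact compression scheme here, but a generic score-like (MOPED-type) linear compression can be sketched as follows (all names are illustrative; `dmu` is the derivative of the model mean with respect to the parameters). Such a map is lossless for a Gaussian likelihood with parameter-independent covariance, but blind to information reachable only through nonlinear transforms of the data vector.

```python
import numpy as np

def linear_compression(d, mu0, dmu, Cinv):
    """t = dmu^T C^{-1} (d - mu0): one compressed number per parameter."""
    return dmu.T @ Cinv @ (d - mu0)

rng = np.random.default_rng(3)
n_data, n_par = 10, 2
dmu = rng.normal(size=(n_data, n_par))   # d mu / d theta for a linear toy model
Cinv = np.eye(n_data)                    # unit covariance for simplicity
d = rng.normal(size=n_data)

t = linear_compression(d, np.zeros(n_data), dmu, Cinv)
F = dmu.T @ Cinv @ dmu                   # Fisher matrix of the toy model
theta_hat = np.linalg.solve(F, t)        # equals the generalized least-squares fit

# sanity check: identical to the direct least-squares solution
theta_lstsq, *_ = np.linalg.lstsq(dmu, d, rcond=None)
```

In this linear-Gaussian toy, the compressed statistic recovers the full-data estimator exactly; any downweighting of P gg could only arise when the linear-order picture misrepresents the global constraining power.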
We can investigate the third hypothesis further by performing the following test. In order to increase the signal-to-noise ratio, we perform inference on four fiducial mocks simultaneously, shown in Fig. 14. For this, we use a different set of neural nets in which we leave five nuisance parameters explicit. The reason is that all implicitly marginalized nuisance parameters are effectively assumed to differ between the four mocks, an effect we would like to minimize. Of course, increasing the number of explicit nuisance parameters complicates the training, and we have less confidence in the precise calibration of the posteriors; for this reason, our baseline results were obtained with only two explicit nuisance parameters. For reference, the real-data posteriors obtained with these alternative neural nets are shown in Fig. 16. We perform this test with two different sets of nuisance parameters kept explicit in order to gauge robustness (corresponding to the solid and dashed lines in Fig. 14).

Figure 14. Posteriors on joint analyses of four randomly chosen fiducial mocks, averaged over ∼ 30 groups. The solid and dashed lines correspond to likelihoods with two different sets of five nuisance parameters kept explicit. We see that the posteriors where void statistics are included have a slightly more pronounced bump at the true value mν = 0.1 eV, consistent with the speculative picture in Fig. 13.
Similarly to Fig. 12, we average posteriors over ∼ 30 randomly chosen groups of four mocks in order to decrease sample variance. We observe that, consistent with our hypothesis, the posteriors that include void statistics show a more pronounced hint of a bump at the true m ν. In principle, one could increase the simulated volume further by combining more mocks, but our neural nets are not calibrated at the required level of precision, and the resulting posteriors would not be robust.
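The mechanics of the joint-mock test are simple: for a parameter shared across mocks, the per-mock log-likelihoods add, so the joint posterior narrows roughly as 1/√(n_mocks). A minimal Gaussian toy (hypothetical numbers; flat prior, unit-variance likelihood) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
truth, n_obs = 0.1, 20

def log_post(theta_grid, mocks):
    """Flat prior, unit-variance Gaussian likelihood; for a shared
    parameter the per-mock log-likelihoods simply add."""
    lp = np.zeros_like(theta_grid)
    for d in mocks:
        lp += -0.5 * ((d[None, :] - theta_grid[:, None]) ** 2).sum(axis=1)
    return lp

grid = np.linspace(-3.0, 3.0, 1201)
dx = grid[1] - grid[0]
mocks = [truth + rng.normal(size=n_obs) for _ in range(4)]

def normalize(lp):
    p = np.exp(lp - lp.max())
    return p / (p.sum() * dx)

p1 = normalize(log_post(grid, mocks[:1]))  # single mock
p4 = normalize(log_post(grid, mocks))      # joint analysis of four mocks

def std(p):
    m = (p * grid).sum() * dx
    return np.sqrt((p * (grid - m) ** 2).sum() * dx)

ratio = std(p4) / std(p1)  # -> 1/sqrt(4) = 0.5 for this Gaussian toy
```

In our actual test the gain is weaker, since the implicitly marginalized nuisance parameters are not shared between the mocks.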
In summary, the more mundane explanations for the observed broadening of posteriors appear questionable given the tests presented. On the other hand, the idea that void statistics are most effective at constraining m ν from below receives support from our experiments. A more in-depth examination of this issue would be a natural starting point for future work.

Appendix C: Corner plots
This appendix collects posteriors in the full parameter spaces considered. Fig. 15 shows the baseline parameter space with two explicit nuisance parameters. Fig. 16 shows larger sections of parameter space (it should be mentioned, however, that the corresponding neural networks were trained without further hyperparameter optimization, implying a somewhat lower level of confidence in the validity of these posteriors). Fig. 17 shows our EFTofLSS posteriors, demonstrating that the ΛCDM part of the parameter space is prior dominated.

Appendix D: Coverage of the cosmological prior

A potential concern is whether our simulations sample the cosmological prior densely enough. We test this by discarding a third of the simulations and training on the rest. The resulting posterior, compared to our baseline result, is shown in Fig. 18. Agreement between the two posteriors is almost perfect, demonstrating that our simulations cover the cosmological prior sufficiently well.
Appendix E: Simulation data

About 50 TB of halo catalogs, light cones, void catalogs, and summary statistics have been saved (at 20 times between z = 0.44 and z = 0.68, in 127 different massive-neutrino cosmologies with various HODs, and with 69 different initial conditions for the fiducial model). We are currently finalizing how to make this data set publicly available.

Appendix F: Code
In terms of new code, we have written C++ code to populate halo catalogs with galaxies and to generate light cones including survey realism.
We have also written a C implementation of the quasi-random sampling scheme for uniform and Gaussian priors.
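Our implementation is in C; the following Python sketch illustrates the idea under stated assumptions: an open (extensible) additive-recurrence sequence of the kind mentioned in the Fig. 1 caption (here the "R_d" construction, which may differ from our exact choice), with per-dimension inverse-CDF mapping onto uniform or Gaussian priors. Function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def rd_sequence(n, d, seed=0.5):
    """Open additive-recurrence quasi-random sequence:
    x_i = (seed + i * alpha) mod 1, with alpha built from the generalized
    golden ratio phi_d, the real root of x**(d+1) = x + 1."""
    phi = 2.0
    for _ in range(64):                     # fixed-point iteration for phi_d
        phi = (1.0 + phi) ** (1.0 / (d + 1))
    alpha = phi ** -(1.0 + np.arange(d))
    i = np.arange(1, n + 1)[:, None]
    return (seed + i * alpha[None, :]) % 1.0

def map_to_prior(u, kind, a, b):
    """Per-dimension inverse-CDF mapping of unit-cube points:
    'uniform' on [a, b], or 'gauss' with mean a and standard deviation b."""
    x = np.empty_like(u)
    for j, k in enumerate(kind):
        if k == "gauss":
            x[:, j] = a[j] + b[j] * norm.ppf(u[:, j])
        else:
            x[:, j] = a[j] + (b[j] - a[j]) * u[:, j]
    return x

u = rd_sequence(4096, 3)
x = map_to_prior(u, ["gauss", "uniform", "uniform"],
                 a=[0.0, 0.1, 2.0], b=[1.0, 0.9, 4.0])
```

Being open, the sequence can be extended with further simulations without spoiling its low-discrepancy property, which is what makes it attractive for low-variance estimates of integrals over the prior.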
This work necessitated several small modifications to public codes:
• REPS: read files generated by the current CLASS version; write output to a user-defined directory.
• FastPM: do not write neutrinos to disk.
• Rockstar: native reading of the bigfile snapshots generated by FastPM (using the file chunking to read in a distributed fashion, since Rockstar does not use MPI).
• Rockstar/find_parents: output to bigfile, with lower-priority fields in half precision.
• cuboidremap: support for velocities.
• sbi: custom splitting into training and validation data.
Since all these items are relatively obscure, we do not provide documentation. However, we are happy to share any of them with interested researchers. A repository with most of the code is available at https://github.com/leanderthiele/nuvoid_production.

Figure 1 .
Figure 1. The cosmological parameter values sampled. We adopt a prior on ΛCDM that has the same shape and twice the size (in σ) as the Planck posterior. We sample this prior with an open quasi-random sequence designed to yield low-variance estimates of integrals. The color scale correlates with mν. The red marker indicates the position of our fiducial simulations.
Figure 3. Checks for statistical independence of augmentations, in the compressed space. The data labeled "augments" are obtained by marginalizing over augmentations, while those labeled "ICs" are obtained by marginalizing over initial conditions. Top: ratio of average standard deviations, with the difference from unity not exceeding two percent. This check can also be performed with the uncompressed data vector, leading to similar conclusions. Bottom: distributions of log-likelihoods of covariance matrices under the Wishart distribution with our fiducial covariance matrix. The distributions have large overlap, again indicating that the augmentations are very close to statistically independent.

Figure 4 .
Figure 4. Cumulative posteriors on mν from different data vector combinations, with our baseline choice of kmax = 0.15. The addition of void–galaxy cross power spectra improves the constraint compared to the galaxy auto power spectrum, whereas the VSF has negligible impact.

Figure 5 .
Figure 5. Effect of increasing kmax to 0.2. The posterior shrinks, as expected, but remains broader than the EFTofLSS result. The impact of adding voids to the data vector is now reversed.

Figure 7 . Figure 8 .
Figure 7. Effect of adding the quadrupole P gg 2 to the data vector. Consistent with the EFTofLSS result, the mν posterior broadens slightly. This effect is likely a statistical fluctuation. However, it provides a useful cross-check of how well our simulations model redshift space distortions.

Figure 9 . Figure 10 .
Figure 9. Effect of considering voids above different radius cuts. Voids with radii between 40 and 50 h −1 Mpc contribute the majority of the observed tightening of the posterior relative to the P gg -only result.

Figure 11 .
Figure 11. Checks for the Poissonian nature of the VSF. In the left panel, we show a rescaled covariance matrix of the VSF part of the data vector, obtained from our fiducial simulations. For an exactly Poissonian distribution, this matrix would be the identity. In the right panel, we show the distribution of Anscombe-transformed VSF counts, with a standard normal overplotted. Both tests indicate a distribution that is very close to Poissonian.
The simulations presented in this work were performed on computational resources at Princeton University, managed by a consortium of groups led by the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology's Research Computing.

Figure 13 .
Figure 13. Schematic illustration of the proposed mechanism explaining the broadening of posteriors when voids are included. This figure is only meant as a guide and not as a literal depiction of posteriors; correlations and nuisance parameters complicate the picture.

Figure 15 .
Figure 15. Posteriors in the full parameter space considered, including the two HOD parameters we choose to keep explicit. The HOD posteriors are consistent between different analysis choices.

Figure 16 . Figure 17 . Figure 18 .
Figure 16. Posteriors with different sets of nuisance parameters kept explicit. Since no hyperparameter optimization was performed when training the corresponding networks, and the larger parameter space makes training more difficult, we assess these posteriors as less robust than our baseline results. The neutrino mass posteriors, however, are quite consistent. The left panel corresponds to the solid lines in Fig. 12, the right panel to the dashed lines.
Figure 2. Illustration of the data vector considered. The vertical axis has been scaled such that the different data vector components are well visible, and the power spectra are plotted as kP(k). The solid lines display best-fit mocks, considering separate parts of the data vector. The χ 2 was computed under the approximation of a Gaussian likelihood, and the mocks were averaged over eight augmentations (no interpolation/emulation was performed). The fact that, e.g., the model with best-fit VSF still reproduces the other parts of the data vector reasonably well, as well as the reduced χ 2 values close to unity, indicates our mocks' high fidelity. Note that we do not use k < 0.01 and k > 0.15 in our baseline analysis, and these scales were not included in identifying the best-fit models shown here.