Quijote-PNG: Quasi-maximum likelihood estimation of Primordial Non-Gaussianity in the non-linear dark matter density field

Future Large Scale Structure surveys are expected to improve over current bounds on primordial non-Gaussianity (PNG), with a significant impact on our understanding of early Universe physics. The level of such improvements will however strongly depend on the extent to which late time non-linearities erase the PNG signal on small scales. In this work, we show how much primordial information remains in the bispectrum of the non-linear dark matter density field by implementing a new, simulation-based, methodology for joint estimation of PNG amplitudes ($f_{\rm NL}$) and standard $\Lambda$CDM parameters. The estimator is based on optimally compressed statistics, which, for a given input density field, combine power spectrum and modal bispectrum measurements, and numerically evaluate their covariance and their response to changes in cosmological parameters. We train and validate the estimator using a large suite of N-body simulations (QUIJOTE-PNG), including different types of PNG (local, equilateral, orthogonal). We explicitly test the estimator's unbiasedness, optimality and stability with respect to changes in the total number of input realizations. While the dark matter power spectrum itself contains negligible PNG information, as expected, including it as an ancillary statistic increases the PNG information content extracted from the bispectrum by a factor of order $2$. As a result, we prove the capability of our approach to optimally extract PNG information on non-linear scales beyond the perturbative regime, up to $k_{\rm max} = 0.5~h\,{\rm Mpc}^{-1}$, obtaining marginalized $1$-$\sigma$ bounds of $\Delta f_{\rm NL}^{\rm local} \sim 16$, $\Delta f_{\rm NL}^{\rm equil} \sim 77$ and $\Delta f_{\rm NL}^{\rm ortho} \sim 40$ on a cubic volume of $1~(\mathrm{Gpc}/h)^3$ at $z=1$. At the same time, we discuss the significant information on cosmological parameters contained on these scales.


INTRODUCTION
The best cosmological probe of primordial non-Gaussianity (PNG) has been, up to now, the Cosmic Microwave Background (CMB) (Akrami et al. 2020). The information available in the angular bispectrum (i.e., the three-point function of harmonic multipoles) of CMB primary anisotropies has, however, been nearly completely extracted. If we want to obtain significant improvements over current PNG constraints for most models, we thus have to look at different observables. To this end, galaxy clustering carries significant potential (Alvarez et al. 2014; Karagiannis et al. 2018; Meerburg et al. 2019), since the 3D galaxy density field contains many more bispectrum modes than the 2D CMB fluctuation fields. Most of these modes are, however, in the gravitationally non-linear regime. The important challenge is therefore to separate, down to the smallest possible scales, the PNG signal from the late-time NG component induced by the evolution of cosmic structures.
To tackle this issue, significant efforts have been devoted to obtaining theoretical predictions of the galaxy bispectrum via a suitable perturbative treatment valid on the largest scales (for the current state of the art, see Ivanov et al. 2022, and references therein). This approach has recently led to the first constraints on PNG using both the galaxy power spectrum and bispectrum, obtained by analyzing data from the BOSS survey (Dawson et al. 2013; Cabass et al. 2022a,b; D'Amico et al. 2022). Recent works based on field-level inference (Baumann & Green 2021; Andrews et al. 2022), neural networks (Giri et al. 2022), or reconstruction methods (Shirasaki et al. 2021) have however shown that much more information should be present in the data. To extract this extra information, alternative promising observables, such as the probability density function of the density field (Mao et al. 2014; Uhlemann et al. 2018; Friedrich et al. 2020), persistent homology (Biagetti et al. 2021b, 2022), or higher-order correlation functions in Fourier space (Gualdi et al. 2021a,b), have been considered. It has also been confirmed that probing non-linear scales with the power spectrum and the bispectrum significantly improves the constraints on cosmological parameters (Hahn et al. 2020; Hahn & Villaescusa-Navarro 2021). Checking whether this is also the case for PNG is then an important question. It is however unclear whether analytical approaches can go significantly beyond the mildly non-linear regime.
In this work, we thus consider instead the alternative, simulation-based route, using a large number of realizations of the matter density field. This paper, and its companion paper Coulton et al. (2022), are the first of a series of studies, in which we plan to take a step-by-step approach and address the problem at increasing levels of complexity. We start here by discussing the general implementation of our statistical estimation and data compression algorithms based on the modal bispectrum estimator, while in Coulton et al. (2022) we describe in more detail our input simulations and use them to perform an independent Fisher matrix analysis using a binned bispectrum estimator. Moreover, for additional complementarity, the two analyses are performed at different redshifts. In Coulton et al. (2022), the main focus is on z = 0, where the non-linear effects are maximal. Here, we study the redshift z = 1, more suited for comparison with upcoming surveys such as Euclid (Laureijs et al. 2011; Amendola et al. 2018). At this initial stage, we focus our analysis on the dark matter field alone, with a twofold aim: to set up the general pipeline that we will also use in following analyses, and to address the first crucial question of how much PNG information can in principle be extracted from the matter field, by pushing the analysis up to small scales, which would be hard to model analytically. In a follow-up paper, we will extend this analysis to halos, whereas in later studies we will finally consider the galaxy density field and account for additional important effects, such as redshift-space distortions and incomplete sky coverage.
We extract the power spectrum and bispectrum of a large number of realizations of the matter density field, allowing us to describe precisely the contribution of PNG down to non-linear scales (k max = 0.5 h Mpc −1 ), as well as to measure the corresponding covariance matrix, including all non-Gaussian and off-diagonal terms (see Biagetti et al. 2021a, for a discussion of their relative importance). We then combine these measurements into optimally compressed statistics, a procedure which was shown to be extremely efficient for the galaxy bispectrum in Gualdi et al. (2018, 2019). Finally, following the general scheme developed in Alsing & Wandelt (2018), we derive an estimator to jointly measure ΛCDM and PNG parameters.
The plan of the paper is as follows. In section 2 we describe the PNG models that we consider in this work and define the parameters in our analysis; in section 3 we describe our estimators of the power spectrum and bispectrum and the optimal data compression procedure applied to build a quasi-maximum likelihood estimator of PNG and cosmological parameters; in section 4 we discuss the application of these estimators to our input mock datasets and show the main results of our analysis; in section 5 we summarize our conclusions and discuss future prospects.

PRIMORDIAL BISPECTRUM SHAPES
For most inflationary models, the main PNG signature is a non-vanishing bispectrum of the primordial curvature perturbation field, which is defined as:

$\langle \Phi(\vec{k}_1)\Phi(\vec{k}_2)\Phi(\vec{k}_3) \rangle = (2\pi)^3\, \delta_D(\vec{k}_1+\vec{k}_2+\vec{k}_3)\, f_{\rm NL}\, F(k_1,k_2,k_3)\,,$  (1)

where $\Phi$ is related to the comoving curvature perturbation $\zeta$ on super-horizon scales by $\Phi \equiv \frac{3}{5}\zeta$ and the Dirac delta function enforces translation invariance. The quantity $F(k_1,k_2,k_3)$ defines instead the functional dependence of the bispectrum on different Fourier-space triplets (the so-called bispectrum shape) and depends only on the lengths of the wavevectors, by virtue of rotational invariance. Finally, the dimensionless PNG amplitude parameter $f_{\rm NL}$ measures the strength of the PNG signal for a given shape. A main goal of our analysis is to build an efficient $f_{\rm NL}$ estimator for the three main bispectrum shapes, namely the local, equilateral and orthogonal shapes, defined by the following templates (see Akrami et al. 2020, and references therein):

$F_{\rm local}(k_1,k_2,k_3) = 2\left[ P_\Phi(k_1)P_\Phi(k_2) + P_\Phi(k_2)P_\Phi(k_3) + P_\Phi(k_3)P_\Phi(k_1) \right]\,,$  (2)

$F_{\rm equil}(k_1,k_2,k_3) = 6\left[ -\left( P_\Phi(k_1)P_\Phi(k_2) + 2~{\rm perms} \right) - 2\left( P_\Phi(k_1)P_\Phi(k_2)P_\Phi(k_3) \right)^{2/3} + \left( P_\Phi(k_1)^{1/3}P_\Phi(k_2)^{2/3}P_\Phi(k_3) + 5~{\rm perms} \right) \right]\,,$  (3)

$F_{\rm ortho}(k_1,k_2,k_3) = 6\left[ -3\left( P_\Phi(k_1)P_\Phi(k_2) + 2~{\rm perms} \right) - 8\left( P_\Phi(k_1)P_\Phi(k_2)P_\Phi(k_3) \right)^{2/3} + 3\left( P_\Phi(k_1)^{1/3}P_\Phi(k_2)^{2/3}P_\Phi(k_3) + 5~{\rm perms} \right) \right]\,,$  (4)

where $P_\Phi(k)$ is the power spectrum of $\Phi$. In the present study we directly work with the matter density field; in this case, for small PNG, nearly all the information on $f_{\rm NL}^{\rm local}$, $f_{\rm NL}^{\rm equil}$, $f_{\rm NL}^{\rm ortho}$ is contained in the field bispectrum. When dealing instead with observed dark matter tracers, it is well known that an important additional signature of PNG arises in the form of a scale-dependent bias term, sourced by the squeezed configurations of the primordial bispectrum (Dalal et al. 2008; Slosar et al. 2008; Matarrese & Verde 2008; Afshordi & Tolley 2008; Verde & Matarrese 2009; Desjacques & Seljak 2010), hence especially important for PNG of the local type. In that case, the large-scale power spectrum carries significant extra information on the PNG amplitude. Since at this stage we perform joint estimation of $f_{\rm NL}$ and standard ΛCDM cosmological parameters, the power spectrum is already included in our analysis.
Therefore, in follow-up studies that will consider dark matter halos and galaxies, no essential modification of our current data analysis pipeline will be needed, except in the input fields to analyze.
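To make the standard local, equilateral and orthogonal templates concrete, the sketch below evaluates them numerically, assuming a pure power-law potential power spectrum $P_\Phi(k) \propto k^{n_s-4}$ with arbitrary normalization; all function names are ours and purely illustrative:

```python
import numpy as np
from itertools import permutations

def P_phi(k, A=1.0, ns=0.96):
    # Primordial potential power spectrum, P_Phi(k) ~ k^(ns - 4)
    # (A is an arbitrary normalization, for illustration only).
    return A * k**(ns - 4.0)

def _sym(P1, P2, P3):
    # Sum of P_i^(1/3) P_j^(2/3) P_k over the 6 permutations of (1, 2, 3).
    Ps = (P1, P2, P3)
    return sum(Ps[i]**(1/3) * Ps[j]**(2/3) * Ps[k]
               for i, j, k in permutations(range(3)))

def F_local(k1, k2, k3):
    # Local template: peaks on squeezed triangles (one k much smaller).
    P1, P2, P3 = P_phi(k1), P_phi(k2), P_phi(k3)
    return 2.0 * (P1*P2 + P2*P3 + P3*P1)

def F_equil(k1, k2, k3):
    # Equilateral template: largest for k1 ~ k2 ~ k3.
    P1, P2, P3 = P_phi(k1), P_phi(k2), P_phi(k3)
    return 6.0 * (-(P1*P2 + P2*P3 + P3*P1)
                  - 2.0 * (P1*P2*P3)**(2/3)
                  + _sym(P1, P2, P3))

def F_ortho(k1, k2, k3):
    # Orthogonal template: same building blocks, different coefficients.
    P1, P2, P3 = P_phi(k1), P_phi(k2), P_phi(k3)
    return 6.0 * (-3.0 * (P1*P2 + P2*P3 + P3*P1)
                  - 8.0 * (P1*P2*P3)**(2/3)
                  + 3.0 * _sym(P1, P2, P3))
```

A quick check of the normalization convention: all three templates coincide on equilateral configurations $k_1 = k_2 = k_3$, where $f_{\rm NL}$ is defined, while the local shape strongly dominates in the squeezed limit.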

Summary statistics
As discussed, the aim of our work is to efficiently combine power spectrum and bispectrum measurements of the matter field in the input simulations, to maximize the PNG information that can be extracted up to the smallest possible scales. This will be achieved by resorting to a suitable compression scheme, taking the power spectrum and bispectrum as the starting summary statistics and requiring numerical evaluation of their covariances, to build the standard quasi-maximum likelihood estimator described in Alsing & Wandelt (2018). This data compression step is also a key ingredient to perform density-estimation likelihood-free inference (see e.g. Alsing et al. 2019), an alternative approach we aim to pursue in a future work. Forecasts based on data compression also have the advantage of being conservative, whereas a standard Fisher matrix analysis using the starting set of modes could in principle underestimate errors; this would happen in case of a lack of statistical convergence in the Monte Carlo averages, due to a low number of input simulations. In the current analysis, we however explicitly show that we can achieve optimality, thanks to fast and efficient estimators to measure the power spectra and bispectra, which are applied to a large enough set of input realizations.

Power spectrum
To measure the power spectrum, we use the standard estimator (e.g. Feldman et al. 1994)

$\hat{P}(k_i) = \frac{1}{V\, N_i} \sum_{\vec{k} \in \Delta_i} |\delta(\vec{k})|^2\,,$  (5)

where $\delta(\vec{k})$ is the density field in Fourier space defined on a grid, the k-range has been divided into bins $\Delta_i$ of width equal to the fundamental mode of the box, $V$ is the survey volume and $N_i$ is the number of vectors $\vec{k}$ in each bin.
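As an illustration of the estimator, a minimal binned measurement on a periodic grid might look as follows. This is a sketch, not the pipeline code: it assumes the Fourier convention $\delta(\vec{k}) = (V/N_{\rm cell}) \sum_{\vec{x}} \delta(\vec{x}) e^{-i\vec{k}\cdot\vec{x}}$ and glosses over conjugate-mode counting:

```python
import numpy as np

def measure_pk(delta_x, box_size, n_bins):
    """Binned power spectrum of an overdensity field on a periodic grid.
    Convention: delta(k) = (V / Ncell) * FFT(delta_x), so P(k) = |delta(k)|^2 / V."""
    ng = delta_x.shape[0]
    V = box_size**3
    delta_k = np.fft.rfftn(delta_x) * (V / ng**3)
    kf = 2.0 * np.pi / box_size                      # fundamental frequency
    kx = np.fft.fftfreq(ng, d=1.0 / ng) * kf         # full axes
    kz = np.fft.rfftfreq(ng, d=1.0 / ng) * kf        # half axis (real FFT)
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2
                   + kz[None, None, :]**2)
    edges = np.linspace(kf, (ng // 2) * kf, n_bins + 1)
    power = np.abs(delta_k)**2 / V
    pk, nmodes = np.zeros(n_bins), np.zeros(n_bins)
    for i in range(n_bins):
        mask = (kmag >= edges[i]) & (kmag < edges[i + 1])
        nmodes[i] = mask.sum()
        pk[i] = power[mask].mean() if nmodes[i] > 0 else 0.0
    kcen = 0.5 * (edges[:-1] + edges[1:])
    return kcen, pk, nmodes
```

For a white-noise field of unit-variance cells, this returns a flat spectrum of amplitude $V/N_{\rm cell}$, a convenient sanity check of the normalization.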
The main idea of modal estimation is to expand a general weighted bispectrum shape in a factorizable basis:

$w(k_1,k_2,k_3)\, B(k_1,k_2,k_3) = \sum_{n} \beta_n\, Q_n(k_1,k_2,k_3)\,,$  (6)

where $Q_n \equiv q_{\{p}(k_1)\, q_r(k_2)\, q_{s\}}(k_3)$ corresponds to the symmetrized product of one-dimensional basis functions $q_p(k)$ and a partial ordering $n \leftrightarrow (p,r,s)$ has been introduced (see e.g. Fergusson et al. 2012a), $w$ is a weight function and the $\beta_n$ are the expansion coefficients. If the input bispectrum is a smooth function of $k_1$, $k_2$, $k_3$, it can be well approximated by truncating the sum at a number $N$ of modes that is much smaller than the total number of Fourier triangles. The correlation between mode functions is encoded in the inner product $\gamma_{mn} \equiv \langle Q_m | Q_n \rangle$, where the scalar product (see appendix A.2 for details on the derivation)

$\langle f | g \rangle \equiv \int_{V_T} \mathrm{d}k_1\, \mathrm{d}k_2\, \mathrm{d}k_3\; f(k_1,k_2,k_3)\, g(k_1,k_2,k_3)$  (7)

is computed over the bispectrum tetrapyd domain $V_T$, i.e., the set of all Fourier triplets that form a closed triangle and such that each wavenumber has a length in the chosen range $[k_{\rm min}, k_{\rm max}]$. Each coefficient $\beta_n$ can be estimated by fitting the corresponding mode $Q_n$ to the data bispectrum. Making the customary choice of a separable weight function built from the total power spectrum $P(k)$,

$w(k_1,k_2,k_3) = \frac{1}{\sqrt{k_1 k_2 k_3\, P(k_1) P(k_2) P(k_3)}}\,,$  (8)

one can show (e.g. Fergusson et al. 2012b) that the modal estimator takes the form:

$\hat\beta_n = \frac{1}{V} \int \prod_{i=1}^{3} \frac{\mathrm{d}^3 k_i}{(2\pi)^3}\; (2\pi)^3 \delta_D(\vec{k}_1 + \vec{k}_2 + \vec{k}_3)\, w(k_1,k_2,k_3)\, Q_n(k_1,k_2,k_3)\, \delta(\vec{k}_1)\, \delta(\vec{k}_2)\, \delta(\vec{k}_3)$  (9)

$\phantom{\hat\beta_n} = \frac{1}{V} \int \mathrm{d}^3 x\; M_{\{p}(\vec{x})\, M_r(\vec{x})\, M_{s\}}(\vec{x})\,,$  (10)

where

$M_p(\vec{x}) \equiv \int \frac{\mathrm{d}^3 k}{(2\pi)^3}\, \frac{q_p(k)\, \delta(\vec{k})}{\sqrt{k\, P(k)}}\, e^{i \vec{k} \cdot \vec{x}}\,.$  (11)

This inverse Fourier transform can be computed efficiently using fast Fourier transform (FFT) routines (e.g. FFTW), making $\hat\beta_n$ extremely fast to extract from data. In this paper, we work directly with the observed modal coefficients $\beta_n$ to estimate primordial non-Gaussianity amplitudes $f_{\rm NL}$ and other cosmological parameters. We therefore do not need to evaluate the inner product matrix $\gamma$, a rather technical step of the numerical pipeline. The only part of this work for which the calculation of $\gamma$ is needed is the reconstruction of the bispectrum from the modal basis, presented in figures 2 and 14, obtained by substituting the measured $\beta_n$ into eq. (6). We describe our method to compute $\gamma$ in detail in appendix A.2.
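The FFT trick at the heart of the estimator can be sketched in a few lines. This is schematic only: normalizations are simplified, and the function and argument names are ours, not those of the actual modal pipeline:

```python
import numpy as np

def modal_coefficients(delta_k, kmag, pk_of_k, q_funcs, triples, kmin, kmax):
    """Schematic modal coefficients via FFTs:
    M_p(x) = IFFT[ q_p(khat) * delta(k) / sqrt(k P(k)) ], band-limited to
    [kmin, kmax], and beta_n ~ < M_p M_r M_s >_x for each triple n = (p, r, s)."""
    khat = (kmag - kmin) / (kmax - kmin)          # rescale wavenumbers to [0, 1]
    band = (kmag >= kmin) & (kmag <= kmax)
    weight = np.zeros_like(kmag)
    weight[band] = 1.0 / np.sqrt(kmag[band] * pk_of_k(kmag[band]))
    # One filtered real-space field per one-dimensional basis function q_p:
    M = [np.fft.ifftn(q(khat) * weight * delta_k).real for q in q_funcs]
    # Cubic statistic: spatial average of the triple product of filtered fields.
    return np.array([np.mean(M[p] * M[r] * M[s]) for (p, r, s) in triples])
```

The cost is one FFT per one-dimensional basis function, rather than a sum over all Fourier triangles, which is what makes the modal approach fast.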
The basis of one-dimensional functions q p (k) we use here is mainly composed of well-behaved polynomials, normalised within the tetrapyd that fits inside a unit cube (see Fergusson et al. 2012a, for details). This means that in the above expressions q p (k) → q p ((k − k min )/(k max − k min )). We also include custom modes based on the separable matter bispectrum shapes, i.e. the gravity-induced tree-level bispectrum and the local PNG template of eq. (2), to further improve the bispectrum decomposition of eq. (6). This was introduced in Hung et al. (2019a) and further developed in Byun et al. (2021); see appendix A.1 for details. A generally useful feature of the modal estimator is that, for a proper choice of basis, it allows for a preliminary data compression step, by reconstructing the signal in a small number of modes compared to the starting number of triplets in Fourier space. This speeds up the numerical evaluation of covariances, which is in turn a key ingredient for the final compression step described in the next section.

Data compression and quasi maximum-likelihood estimator
To build an optimal estimator of the primordial non-Gaussianity amplitude parameters f NL , we follow the method developed in Alsing & Wandelt (2018), based on a two-step data compression scheme. The first step consists of extracting a set of summary statistics from the full dataset, which in our case are the power spectrum P (k) and the bispectrum modes β n described in the previous paragraphs. The second step is an additional compression down to only n numbers, where n is the number of parameters we aim to infer from the data. The optimally compressed quantities, in the sense that no information about the wanted parameters is lost, are the components of the score function (the parameter gradient of the log-likelihood), which can be computed explicitly by assuming an approximate form for the likelihood of the summary statistics. When working with the power spectrum and bispectrum, a Gaussian likelihood is a reasonable approximate choice, by virtue of the central limit theorem. The compressed statistics, called t, then take the following form:

$t = \nabla\mu_*^T\, C_*^{-1}\, (d - \mu_*)\,,$  (12)

where $d$ corresponds to the summary statistics (a vector of modal coefficients $\beta_n$ and possibly power spectrum bins $P(k)$), $\mu$ and $C$ are respectively the mean and the covariance of $d$, $\theta$ denotes the parameters of interest, $\nabla$ is the gradient with respect to $\theta$, and the subscript $*$ indicates that the quantity is evaluated at the chosen fiducial cosmology. Note that in eq. (12) we made the further assumption that the covariance $C$ does not depend on the parameters $\theta$; in that case the compressed statistic $t$ is equivalent to the MOPED data compression of Heavens et al. (2000). By construction, the covariance of the compressed statistic $t$ at the fiducial point $\theta_*$ is equal to the Fisher matrix $F_*$, where

$F_* = \nabla\mu_*^T\, C_*^{-1}\, \nabla\mu_*\,.$  (13)

One can also write the following quasi maximum-likelihood estimator for the parameters $\theta$:

$\hat\theta = \theta_* + F_*^{-1}\, t\,,$  (14)

whose expected covariance is given by the inverse of the Fisher matrix evaluated at the fiducial point, $F_*^{-1}$.
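The compression and estimation steps can be sketched numerically as follows (array names are ours; this assumes the Gaussian-likelihood, parameter-independent-covariance form discussed above):

```python
import numpy as np

def compress_and_estimate(d, mu_fid, cov, dmu_dtheta, theta_fid):
    """Score compression (MOPED-like) and quasi-ML parameter estimate:
      t     = dmu^T C^{-1} (d - mu*)     (compressed statistics)
      F     = dmu^T C^{-1} dmu           (Fisher matrix at the fiducial point)
      theta = theta* + F^{-1} t          (quasi-ML estimator)
    dmu_dtheta has shape (n_params, n_data)."""
    cinv = np.linalg.inv(cov)
    t = dmu_dtheta @ cinv @ (d - mu_fid)
    F = dmu_dtheta @ cinv @ dmu_dtheta.T
    theta_hat = theta_fid + np.linalg.solve(F, t)
    return t, F, theta_hat
```

For a noiseless, exactly linear model the estimator recovers the true parameters in a single step, which is a useful unit test of the implementation.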
In this work, we will compute both the derivatives and the covariance matrix from N-body simulations, which makes it possible to include nonlinear scales in the analysis and estimate their constraining power on primordial non-Gaussianity.

Simulations
The analysis presented in this work is mainly based on the publicly available Quijote suite of N-body simulations; see Villaescusa-Navarro et al. (2020) for details. These simulations are cubic boxes of side 1 h −1 Gpc, containing 512 3 particles, generated with the Gadget-III code (Springel 2005), with input transfer functions computed by CAMB (Lewis et al. 2000). Initial conditions are generated at z i = 127 using the code 2LPTIC (Crocce et al. 2006). We use a set of 8,000 simulations at the fiducial cosmology to evaluate the covariance matrix, together with smaller sets of 500 realizations in which the parameters are slightly displaced, one by one, from their fiducial values, to numerically compute the derivatives entering eq. (12). Fiducial parameter values and the steps used for numerical differentiation are reported in table 1. In appendix B, we illustrate the effects of small parameter variations on the power spectrum and bispectrum. Throughout the analysis, we work at redshift z = 1.
To study the effects of PNG, we use the Quijote-png set presented in detail in the companion paper (Coulton et al. 2022). Non-Gaussian initial conditions are generated using the method developed in Scoccimarro et al. (2012), and evolved following the exact same procedure as the standard Quijote N-body simulations described above. For each of the three standard primordial shapes -local (eq. 2), equilateral (eq. 3) and orthogonal (eq. 4) -we analyze two sets of 500 realizations, with either f NL = +100 or f NL = −100.
For further testing, we also consider three additional sets of 10 simulations, with f local NL = −40, 0 and 40, all generated from the same Gaussian seeds, independently of the Quijote fiducial realizations, but with the same numerical specifications and input cosmological parameters. For this suite of simulations, initial conditions have been generated with a modified version of the PNGRun code (Wagner et al. 2010) that imprints local NG contributions on top of the primordial Gaussian potential field, starting from a prescribed shape of the primordial matter bispectrum (see Wagner et al. 2010, for a detailed description of the PNGRun algorithm). The simulations have been evolved with Gadget-III, including the contribution of relativistic species in the background cosmic expansion; this results in a different normalisation at the starting redshift of the runs compared to Quijote, chosen so as to match the same amplitude of perturbations at z = 0.

Results
The first step of our analysis consists in the extraction of power spectrum and bispectrum modes from all available simulations, using the methodology outlined in section 3.1.
Modal bispectrum estimation requires us to fix k max beforehand. In this work, we consider three different values: k max = 0.07 h Mpc −1 (linear), 0.2 h Mpc −1 (mildly non-linear) and 0.5 h Mpc −1 (non-linear), in order to study the amount of information as a function of scale. We begin by depositing particle positions onto a regular grid using the public Pylians3 code, with a fourth-order interpolation scheme and a resolution N grid = 360 in the non-linear case, as in Hahn et al. (2020), and the faster cloud-in-cell interpolation at the lower resolutions N grid = 128 and N grid = 64 for the k max = 0.2 and 0.07 h Mpc −1 cases respectively.

It has been argued in Sefusatti et al. (2016) that the FFT-based standard bispectrum estimator can probe scales only up to k max = 2k Ny /3, where the Nyquist frequency is k Ny = k f N grid /2 and k f = 2π/L is the fundamental frequency of the box. This is due to the factor e i(k 1 +k 2 +k 3 )·x , present in the standard estimator, which is invariant under the translation k i → k i + k f N grid /3. Any estimator containing this exponential factor will therefore have the above momentum cutoff, rather than k max = k Ny . Byun et al. (2021) tested this and found indications that the modal estimator falls in the same category, since it takes a form similar to the standard bispectrum estimator (see eq. A6). Taking this into account gives largest usable wavenumbers k max ≃ 0.75, 0.26 and 0.13 h Mpc −1 for N grid = 360, 128 and 64 respectively, well above our chosen k max values.

Figure 1. The impact of PNG on the matter power spectrum at z = 1 (ratio NG/G, averaged from 500 N-body simulations). We show the local (green), equilateral (orange), and orthogonal (purple) shapes. Solid and dashed lines correspond respectively to positive and negative fNL values (±100).

As an initial visual check, in figure 1 we show the impact on the power spectrum of including PNG in the simulations at redshift z = 1. As expected, this impact is essentially negligible on linear scales and rises to percent level, at most, in the non-linear regime, despite the large input NG amplitudes (f NL = ±100). Moreover, the effects of local and equilateral NG are strongly degenerate. In figure 2, we show the same comparison for bispectrum configurations, reconstructed from our modal basis using eq. 6; the details of the calculation of the γ matrix needed for this step can be found in appendix A.2. While the effect remains at percent level for a single triangle, we now have, of course, a much larger number of available configurations. Moreover, we can clearly see how the effect of the local primordial template peaks in the squeezed limit (where one k is much smaller than the other two), whereas the equilateral shape produces the largest signal when the three k are of the same order.
The orthogonal bispectrum has both squeezed and equilateral components.
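As a sanity check of the grid resolutions adopted above, the estimator cutoff k max = 2k Ny /3 argued for in Sefusatti et al. (2016) can be evaluated directly for each resolution (a trivial sketch; the function name is ours):

```python
import numpy as np

def modal_kmax_limit(ngrid, box_size=1000.0):
    """Largest usable wavenumber for an FFT-based bispectrum estimator,
    k_max = 2 k_Ny / 3 (Sefusatti et al. 2016), in h/Mpc for a box in Mpc/h."""
    kf = 2.0 * np.pi / box_size        # fundamental frequency of the box
    k_ny = kf * ngrid / 2.0            # Nyquist frequency of the grid
    return 2.0 * k_ny / 3.0

for ngrid, kmax_used in [(360, 0.5), (128, 0.2), (64, 0.07)]:
    limit = modal_kmax_limit(ngrid)
    print(f"Ngrid={ngrid}: estimator limit {limit:.2f} h/Mpc, analysis kmax {kmax_used}")
```

Each chosen analysis k max sits comfortably below the corresponding cutoff, consistently with the values quoted in the text.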
In figures 3 and 4, we show some modal bispectrum coefficients β n of the Quijote simulations, evaluated using eq. (10). Figure 3 is the direct equivalent of figure 2 at the modal level, and also highlights the very distinct behaviours of the three primordial shapes. In figure 4, we compare the modal coefficients of the fiducial Gaussian simulations to their theoretical prediction, computed at tree level. The first five modes correspond to the special functions based on the tree-level and local separable templates (see appendix A.2 for more details), while the rest are polynomials of increasing degree, from left to right. As expected, the difference between measured and predicted signals increases significantly with k max . Note also that in this analysis we work with the separable modes Q n defined in eq. (6). These, by definition, are not orthogonal with respect to the inner product defined in eq. (7) and always display non-trivial correlations. For this reason, in modal bispectrum analyses a new basis is often introduced, obtained by orthogonalizing the Q n templates. This procedure, however, only removes the correlations in the weakly non-Gaussian case and, as we verified, does not bring any improvement to the results presented in this paper. On the contrary, it can add some numerical instability when including high-order polynomial modes; thus, we do not adopt it here.
After extracting summary statistics from the data and checking that they behave according to expectations, the next step of the analysis is the optimal compression to the score function, following the procedure described in section 3.2. This requires the evaluation of the mean and the inverse covariance matrix of the estimated power spectrum bins and bispectrum modal coefficients, as well as of their derivatives with respect to each considered parameter (cosmological + NG amplitudes). We compute the data covariance $\hat{C}$ from 8000 Quijote N-body simulations at the fiducial cosmology and apply the Hartlap/Anderson correction factor (Hartlap et al. 2007) to obtain an unbiased estimate of the precision matrix:

$C^{-1} = \frac{n_r - n_d - 2}{n_r - 1}\, \hat{C}^{-1}\,,$  (15)

where $n_r$ is the number of realizations (8000 here) and $n_d$ is the size of the data vector (a few hundred at most here). In figure 5, we show the full correlation matrix, given by $C_{ij}/\sqrt{C_{ii} C_{jj}}$, including both power spectrum bins P (k) and mode amplitudes β n . As expected, the power spectrum becomes more correlated on smaller scales. Bispectrum modal coefficients, however, show the opposite behaviour, becoming less correlated at higher k max . This is due to the fact that each mode is, by definition, the fit of a given template (polynomial, or simple separable bispectrum shape) to the data, including all scales in the range [k min , k max ]; an increased k max thus helps to better distinguish between them, leading to lower correlations. Note also the non-zero correlation between the power spectrum and the bispectrum, which will play an important role in the analyses presented in this paper. We do not include super-sample covariance terms in our analyses; these have a negligible impact on our results, as shown in our companion paper (Coulton et al. 2022). We also use these 8000 simulations to compute the mean value of the modal coefficient vector, which fixes the score function to zero at the fiducial cosmology with no PNG.
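A minimal sketch of this correction (the function name is ours; `np.cov` provides the sample covariance of the realizations):

```python
import numpy as np

def hartlap_precision(samples):
    """Unbiased precision matrix from n_r realizations of an n_d-dimensional
    data vector (Hartlap et al. 2007):
    C^{-1} = (n_r - n_d - 2)/(n_r - 1) * inverse(sample covariance)."""
    n_r, n_d = samples.shape
    cov = np.cov(samples, rowvar=False)          # (n_d, n_d) sample covariance
    factor = (n_r - n_d - 2.0) / (n_r - 1.0)     # debiasing prefactor < 1
    return factor * np.linalg.inv(cov)
```

With n_r = 8000 and n_d of a few hundred, as in this analysis, the prefactor is close to unity (about 0.96 for n_d = 300), so the correction is small but not negligible.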
The derivatives can be calculated quickly from the sets of 500 simulations with one adjusted parameter and matching random seeds, using the standard central-difference formula:

$\frac{\partial \bar{d}}{\partial \theta} \simeq \frac{\bar{d}(\theta_* + \delta\theta) - \bar{d}(\theta_* - \delta\theta)}{2\, \delta\theta}\,,$  (16)

where $\theta$ can be any of the cosmological or NG amplitude parameters considered in this paper and $\bar{d}$ denotes the mean over realizations of either the power spectrum or the bispectrum mode estimates.
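The matched-seed central difference can be sketched as follows (a toy illustration with hypothetical names; with matched seeds, noise common to the plus and minus simulation sets cancels in the difference):

```python
import numpy as np

def central_difference(summaries_plus, summaries_minus, dtheta):
    """d<d>/dtheta from matched-seed simulation pairs.
    summaries_plus/minus have shape (n_sims, n_data): one summary-statistics
    vector per realization at theta* + dtheta and theta* - dtheta."""
    return (summaries_plus.mean(axis=0)
            - summaries_minus.mean(axis=0)) / (2.0 * dtheta)
```

In the toy test below the "summary" is a quadratic function of the parameter plus seed-matched noise; the noise cancels exactly and the derivative is recovered without scatter.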

Fisher constraints
With the numerically evaluated, joint covariance matrix of power spectrum bins and bispectrum modes, described in the previous section, we can derive expected error bars for the quasi-maximum likelihood estimator (eq. 14), using the Fisher matrix defined in eq. (13).
In table 2, we present the resulting constraints on cosmological parameters σ 8 , Ω m , Ω b , n s , h and PNG amplitude parameters f local NL , f equil NL , f ortho NL , for the three different k max studied. We consider the power spectrum and the bispectrum (modal coefficients) separately and then analyze them jointly, in order to gain insight on their relative constraining power at different scales.
In the upper part of the table, we assume Gaussian initial conditions. As pointed out in Hahn et al. (2020), we see in this case that adding bispectrum information does help to improve power spectrum constraints on cosmological parameters. While the gain is marginal on linear scales, it becomes significant in the non-linear regime, where the bispectrum alone is actually a more powerful probe than the power spectrum alone, for all considered parameters, with the exception of σ 8 .  In the lower part of the table, we include both cosmological parameters and PNG of the three standard shapes. In this case, if we look at bounds from the power spectrum alone, we not only see that PNG amplitudes are essentially unconstrained, as expected, but also that errors on other parameters significantly degrade in comparison with the primordial Gaussian case. This is due to degeneracies arising between f NL and cosmological parameters, which are even more pronounced at non-linear scales. The most affected parameter is σ 8 , the error of which increases by more than a factor 3. The bispectrum now plays a crucial role not only in constraining PNG at a significant level, but also in breaking such degeneracies and allowing for large improvements in predicted errors for cosmological parameters.
Note also that in table 2, we assume that all PNG shapes are simultaneously present in the data. However, we verified that similar degeneracies are introduced even when we consider one NG shape at a time. This is not surprising, remembering that introducing NG of different types produces a similar impact on the power spectrum (see figure 1).
When combining the power spectrum and bispectrum information, we recover error bars on cosmological parameters very similar to those of the purely Gaussian case, meaning that the degeneracies are strongly broken. For the same reason, we also see that PNG bounds from the bispectrum alone improve by a factor ∼ 2 when we add power spectrum measurements, despite the fact that the power spectrum yields almost no PNG constraining power by itself. Interestingly, even if we completely fix the cosmological parameters in the analysis, combining power spectrum and bispectrum still produces some improvement in the PNG constraints. In this case, degeneracies clearly cannot play a role. It turns out that the improvements are now due to the correlations between the power spectrum and the bispectrum displayed in figure 5 (intuitively, at tree level, the gravitational bispectrum is linked to the power spectrum squared via a convolution kernel; hence, measuring the latter produces an information gain also on the former, and a better knowledge of the gravitational bispectrum allows in turn for a better separation of this component from the primordial signal). In table 3, we report the independent Fisher bounds for the three PNG shapes, after marginalizing over cosmological parameters.
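Given a Fisher matrix, the marginalized 1-σ bounds quoted here correspond to the square roots of the diagonal of its inverse; a minimal sketch (function and parameter names are ours):

```python
import numpy as np

def marginalized_sigmas(F, names):
    """1-sigma marginalized errors from a Fisher matrix F:
    sigma_i = sqrt((F^{-1})_{ii}). Fixing all other parameters instead
    would give the conditional error 1/sqrt(F_{ii}) <= sigma_i."""
    finv = np.linalg.inv(F)
    return {name: np.sqrt(finv[i, i]) for i, name in enumerate(names)}
```

The gap between the marginalized and conditional errors directly quantifies the parameter degeneracies discussed above: for uncorrelated parameters the two coincide.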
Another important result is that probing non-linear scales significantly improves the constraints on every considered parameter. This is further illustrated in figure 6, where we show the contours obtained using jointly the power spectrum and the bispectrum for different k max . A more detailed analysis of the information content of different scales is in our companion paper (Coulton et al. 2022), where we show that errors tend to saturate at k max = 0.3 h Mpc −1 , at z = 0, due to correlations between Fourier modes at strongly non-linear scales.
The accuracy of the modal expansion of the bispectrum has a strong impact on the expected error bars, so it is important to verify that the modal results presented in this section are fully converged. There are several ways to improve the modal reconstruction, in order to achieve higher accuracy with as small a number of modes as possible. Some of them involve using custom separable templates corresponding to, or highly correlated with, the expected signals in the data. In this work, we already pursue this approach by including the tree-level matter bispectrum and the primordial local shape in the modal basis. In the future, we plan to explore it further by considering, for example, separable templates that are strongly correlated with the predicted bispectrum at one loop in perturbation theory, or that describe the other primordial shapes.
The obvious, although less economical, alternative approach to improve the reconstruction accuracy simply consists of including additional, higher-order polynomials in the basis. We consider here polynomials up to degree 20, which are combined in triplets to obtain several hundred bispectrum modes. In figure 7, we study the convergence of the constraints as a function of the number of modes included in the modal expansion, each data point from left to right corresponding to adding polynomials of one degree higher. As could be expected, an accurate description of the bispectrum on small scales requires more modes. The constraints on some cosmological parameters are not fully converged even with polynomials up to degree 20, making our Fisher forecast in those cases slightly overly conservative; we verified that the results on primordial non-Gaussianity are, however, always converged. (When performing a single-shape analysis, an additional role is played by PNG contributions to the power spectrum at loop level; such contributions are however degenerate among the different shapes.) The situation is significantly better for the joint power spectrum and bispectrum analysis. In the mildly non-linear regime, we achieve convergence with ∼ 100 modes, and using only 10 modes already produces constraints within 10% of their converged values. Even in the non-linear regime, going beyond ∼ 100 modes improves the Fisher bounds only by ∼ 1% (except for Ω b and h, for which the improvement is of a few percent). This means that, for the combined power spectrum and bispectrum analysis, we can stop the polynomial expansion at a lower degree than 20 with a negligible loss of information. To be exact, the constraints given in table 2 use polynomials up to degree 12 for the joint power spectrum + bispectrum case, and up to degree 20 for the bispectrum-only analysis. Note that using higher-order polynomials can add numerical noise to the analysis; stopping the expansion at a reasonable degree, as we do here, is therefore a more robust approach. The numbers discussed above show that the modal representation is quite efficient: compare, for example, the ∼ 100 modal coefficients in the current analysis to the thousands of k-triplets typically necessary to study the bispectrum at the same non-linear scales with the standard, "binned" estimator (e.g. ∼ 2000 triangle configurations with rather large bins of width 3k F ). This more efficient compression is, for example, important to increase the accuracy of the numerical derivatives evaluated from the simulations.
Convergence tests have to be performed not only for the modal basis, but also with respect to the total number of mock realizations used to evaluate the Fisher matrix. In figure 8, we vary the number of simulations used to estimate derivatives and verify the impact of this change on the final forecast. We find that the constraints from the bispectrum and the joint power spectrum + bispectrum analyses remain very stable, even when using 25 simulations instead of the usual 500. At the same time, though, we see that the power spectrum-only forecast is not fully converged, even when we use the full 500-simulation sample, leading to somewhat over-optimistic constraints in table 2 when cosmological parameters and PNG are studied jointly. As the power spectrum-only case was already reported as essentially unconstraining for PNG, this does not change any of the main conclusions presented above (see the companion paper, Coulton et al. 2022, for a more detailed study of this issue, which can for example be mitigated by assuming a prior on $f_{\rm NL}$).
In figure 8, we also verify that the constraints are highly stable when we change the number of simulations (at fiducial cosmology) used to estimate covariance matrices. Using only 1000 instead of 8000 simulations leads to variations at the percent level only. However, the requirement on covariance precision will become stricter in the next section, where we discuss actual parameter inference.

Table 2. Joint constraints on cosmological parameters and PNG from the power spectrum and the modal bispectrum at $z = 1$, for different $k_{\rm max}$. We analyzed 8000 Quijote N-body simulations of $1~({\rm Gpc}/h)^3$ volume at fiducial cosmology, and sets of 500 N-body simulations with one adjusted parameter.

Table 3. Fisher 1-σ constraints on the 3 PNG shapes from the power spectrum and the modal bispectrum at $k_{\rm max} = 0.5~h\,{\rm Mpc}^{-1}$, for a cubic volume of $1~({\rm Gpc}/h)^3$ and at $z = 1$, after marginalization over cosmological parameters. Each shape is analyzed independently.
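Schematically, the simulation-based Fisher forecast combines finite-difference derivatives of the summary statistics with a covariance estimated from the fiducial realizations. A minimal sketch (the function name and the use of the standard Hartlap de-biasing factor for the inverse sample covariance are illustrative choices, not necessarily the exact pipeline used here):

```python
import numpy as np

def fisher_matrix(fid_sims, plus_sims, minus_sims, steps):
    """Fisher matrix F_ab = dmu_a^T C^{-1} dmu_b from simulations.

    fid_sims : (N_fid, N_stat) summaries at fiducial cosmology
    plus_sims, minus_sims : (N_params, N_der, N_stat) summaries with
        one parameter shifted up/down by steps[a]
    """
    n_s, n_d = fid_sims.shape
    cov = np.cov(fid_sims, rowvar=False)
    # Hartlap factor corrects the bias of the inverted sample covariance
    cov_inv = (n_s - n_d - 2) / (n_s - 1) * np.linalg.inv(cov)
    # central finite-difference derivatives of the mean summaries
    dmu = (plus_sims.mean(axis=1) - minus_sims.mean(axis=1)) / (2 * steps[:, None])
    return dmu @ cov_inv @ dmu.T
```

The forecast 1-σ errors are then the square roots of the diagonal of $F^{-1}$, which is how the dependence on the number of derivative and covariance simulations can be tested.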

Parameter estimation
After deriving Fisher matrix forecasts, we now build the quasi-maximum likelihood estimator defined in eq. (14) and apply it to different datasets, to verify its ability to extract accurate cosmological information efficiently.
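For concreteness, an estimator of this score-compression type (as in Heavens et al. 2000; Alsing & Wandelt 2018) maps the compressed statistics to parameter estimates through the inverse Fisher matrix, starting from a fiducial expansion point. A minimal sketch, with illustrative names (not the actual pipeline code):

```python
import numpy as np

def qmle(data, mu_fid, dmu, cov_inv, theta_fid):
    """Quasi-maximum likelihood estimate via score compression:
    theta_hat = theta_fid + F^{-1} dmu C^{-1} (d - mu_fid),
    with F = dmu C^{-1} dmu^T the Fisher matrix.

    data, mu_fid : (N_stat,) observed and fiducial mean summaries
    dmu : (N_params, N_stat) derivatives of the mean summaries
    cov_inv : (N_stat, N_stat) inverse covariance of the summaries
    """
    fisher = dmu @ cov_inv @ dmu.T
    score = dmu @ cov_inv @ (data - mu_fid)
    return theta_fid + np.linalg.solve(fisher, score)
```

For a model whose summaries respond linearly to the parameters, this single update recovers the input exactly; in general it is quasi-optimal around the fiducial point, which is why the choice of fiducial cosmology matters below.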
Our first validation test consists of jointly estimating cosmological parameters and PNG amplitudes ($\sigma_8$, $\Omega_{\rm m}$, $\Omega_{\rm b}$, $n_s$, $h$, $f_{\rm NL}^{\rm local}$, $f_{\rm NL}^{\rm equil}$, $f_{\rm NL}^{\rm ortho}$) in the Quijote simulations, for different cosmologies. We include non-linear scales ($k_{\rm max} = 0.5~h\,{\rm Mpc}^{-1}$) and use both the power spectrum and the bispectrum in this analysis, as their combination should lead to smaller error bars, as verified in the previous section.
In figure 9, we show the corresponding results for fiducial cosmology datasets, as well as for datasets with PNG ($f_{\rm NL}^{\rm local} = 100$, $f_{\rm NL}^{\rm equil} = 100$ or $f_{\rm NL}^{\rm ortho} = 100$). We compute averages of the estimated parameters and compare them to their input values in the simulations. The correct values are recovered each time, confirming the unbiasedness of the estimator. Note also that we consider only cases in which these averages are extracted from simulations that do not overlap with those used to evaluate derivatives and covariances. In this way, we avoid the spurious correlations which could arise from using overlapping sets of data both to calibrate the estimator and to measure parameters. (We focus here on changes in $f_{\rm NL}$, rather than in other cosmological parameters, both because the main focus of this work is the study of PNG and because, in the available Quijote set, we can access realizations with $f_{\rm NL}^{\rm local} = 100$, which is more than a 4-σ deviation from the fiducial value of $f_{\rm NL} = 0$. This allows us to test an interesting regime, in which the data we analyze are generated from parameters significantly displaced from the model we use to build the estimator.)
We also perform a similar analysis for the additional sets of 10 realizations described in section 4.1, produced independently of the Quijote simulations and having respectively $f_{\rm NL}^{\rm local} = -40$, $0$ and $40$. The quasi-maximum likelihood estimator is built from all the available Quijote simulations. Results are shown in figure 10, where we focus on the combined power spectrum and modal bispectrum analysis at different $k_{\rm max}$. All estimated central values are within, or very close to, the expected 1-σ range. Moreover, the internal scatter between the cosmological parameter estimates for the three datasets ($f_{\rm NL}^{\rm local} = -40$, $0$ and $40$) is extremely small.

In general, we expect the estimator to yield slightly different results depending on the exact set of simulations used for its construction. For this reason, we perform several stability tests, in which we study variations in the estimated parameters when changing the number of simulations used to calculate covariances or derivatives. These tests highlight that less accurate derivatives and covariances, obtained when too small a number of realizations is used in their evaluation, have different impacts on the final measurements. A less precise covariance matrix leads to a suboptimal estimation of parameters, as shown in figure 11, where we compare the error bars of the quasi-maximum likelihood estimator applied to the Quijote simulations at fiducial cosmology, as a function of the number of realizations used to compute covariances, to the Fisher forecasts described in section 4.3. This confirms that our estimator yields close-to-optimal constraints, at the percent level for most parameters (5% for $\Omega_{\rm b}$ and $h$), when using many simulations to evaluate the covariance. Using only 1000 simulations, the increase of the error bars is still only of order 10%, which is small compared to the gain due to including non-linear scales in the analysis (a factor 2 or more for all parameters).
For the derivatives, the impact on optimality is negligible, but the results can be biased, as shown in figure 12. To study the typical size of this bias, we define less accurate estimators, using smaller subsets of the 500 simulations initially used to compute derivatives. For example, using 25 simulations per derivative gives 20 independent estimators. We then measure the difference between the parameters measured with each such estimator and the benchmark results, obtained with an estimator calibrated using all available simulations. Finally, we compute the standard deviation of this difference as a function of the number of simulations used to calculate derivatives (from 1 to 400), which is what we show in figure 12 after dividing it by the corresponding Fisher error bar.
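The splitting into independent estimators can be sketched as follows (a toy version, assuming the derivative simulations are stored as summary arrays; the function name is illustrative):

```python
import numpy as np

def subset_derivatives(plus_sims, minus_sims, step, n_sub):
    """Finite-difference derivatives from disjoint subsets of the
    derivative simulations: e.g. 500 paired sims with n_sub=25 give
    20 independent derivative estimates (hence 20 estimators).

    plus_sims, minus_sims : (N_der, N_stat) summaries at theta +/- step
    Returns an array of shape (N_der // n_sub, N_stat).
    """
    n = (len(plus_sims) // n_sub) * n_sub
    dplus = plus_sims[:n].reshape(-1, n_sub, plus_sims.shape[1]).mean(axis=1)
    dminus = minus_sims[:n].reshape(-1, n_sub, minus_sims.shape[1]).mean(axis=1)
    return (dplus - dminus) / (2 * step)
```

The scatter of the resulting parameter estimates around the benchmark, in units of the Fisher error, is then the quantity plotted in figure 12.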
We apply the same procedure for different sizes of the data samples (10 and 100 simulations), to verify that the results are indeed biased, and not just suboptimal: in the latter case, the spurious bias evaluated with this methodology would decrease with the number of simulations in the data sample. As data samples, we consider three different choices. In the first, all parameters are set at their fiducial values, meaning in particular that all PNG amplitudes are set to 0. In the second, we set $f_{\rm NL}^{\rm equil} = 100$; we will refer to this case as "close to fiducial", since it corresponds to a 1-σ deviation from $f_{\rm NL}^{\rm equil} = 0$. In the third, we take $f_{\rm NL}^{\rm local} = 100$; this is our "far from fiducial" case, since it is characterized by a 4-σ deviation from $f_{\rm NL}^{\rm local} = 0$. We see that the impact of the number of simulations used to compute derivatives on the "fiducial" and "close to fiducial" cases is negligible. When analyzing the simulations with local PNG ("far from fiducial"), however, the measured value of $f_{\rm NL}^{\rm local}$ displays a bias. This effect decreases rapidly and becomes negligible when using more than 100 simulations to compute derivatives (recall that in our main analysis we use 500 realizations for this task). Even when using fewer than 10 simulations, it is still smaller than the typical 1-σ error bar. The two other primordial shapes, both correlated with local PNG, are also affected in similar ways, whereas for cosmological parameters the effect is negligible. We can thus conclude that the quasi-maximum likelihood estimator combining power spectrum and modal bispectrum is unbiased and can optimally extract information about cosmology and PNG up to significantly non-linear scales, provided a sufficiently large number of simulations is used to calculate covariances and derivatives.

[Figure caption: In the left panels, we vary the number of simulations used to calculate numerical derivatives, while in the right panels we vary the number of simulations used to measure covariances. All error bars are normalized by the result obtained using the full dataset; see table 7 for the actual values. The top, middle and bottom rows correspond respectively to the power spectrum only, bispectrum only, and combined power spectrum and bispectrum analyses. Each column corresponds to the constraints on one parameter; all parameters are analyzed jointly. We consider 5 different numbers of simulations to compute derivatives (from 25 to 500) and covariances (from 500 to 8000), each corresponding to its own colour and marker. We include scales down to $k_{\rm max} = 0.5~h\,{\rm Mpc}^{-1}$.]
An important caveat is that, to achieve optimality, it is also important to evaluate covariances and derivatives using an input fiducial cosmology, in the simulations, which is close enough to the actual values of the parameters in the data. In our current test, this is true by construction. In general, however, it may be necessary to reach a good enough fiducial set of parameters by iteration. This poses the numerical challenge of generating thousands of N-body simulations at several different cosmologies. One efficient solution to this issue is the CARPool method, which combines a few high-fidelity simulations with many fast surrogates (typically 100-1000 times faster to produce) to obtain accurate estimates of the mean and covariance matrix of our observables. This approach is under investigation and will be further discussed in a forthcoming publication.

CONCLUSIONS
In this paper, we discussed the implementation of a new, simulation-based, joint power spectrum and bispectrum statistical estimator of PNG amplitudes and standard ΛCDM parameters in LSS data. The methodology follows the general approach developed in Alsing & Wandelt (2018); Heavens et al. (2000), and is based on extracting summary power spectrum and bispectrum statistics from the data and computing the score function, under the reasonable assumption that the sampling distributions are Gaussian to a good approximation. This produces a vector of optimally compressed statistics, which is then used to build a quasi-maximum likelihood estimator of the parameters. The required covariances and numerical derivatives with respect to the parameters are evaluated from a large set of mock realizations with Gaussian and non-Gaussian initial conditions. To extract bispectrum summaries, we implement an efficient numerical pipeline for modal bispectrum estimation, allowing for a useful preliminary compression step, in which the information in all available Fourier triplets is described by a relatively small set of mode amplitudes. Our pipeline was applied to a large set of Quijote simulations at $z = 1$, including newly produced sets of realizations with PNG of the local, equilateral and orthogonal types.

In the analysis presented here, we focused on the study of the dark matter field. Of course, this is a vastly idealized scenario compared to the analysis of the galaxy density field in redshift space, required for a real dataset. However, a preliminary study of the matter field is a very useful starting point, as it enables us both to validate the entire methodology, developed here for the first time, and to address the interesting question of how much information on PNG and cosmological parameters we can in principle extract by pushing the analysis to small scales, where, as we show, perturbative approaches fail even in this simplified context. To this purpose, we derived Fisher bounds for the parameter set {$f_{\rm NL}^{\rm local}$, $f_{\rm NL}^{\rm equil}$, $f_{\rm NL}^{\rm ortho}$, $\sigma_8$, $\Omega_{\rm m}$, $\Omega_{\rm b}$, $n_s$, $h$}, using the covariance of the summary statistics, extracted from a training set of 8000 realizations, at three different cut-off scales, namely $k_{\rm max} = 0.07$, $0.2$ and $0.5~h\,{\rm Mpc}^{-1}$. We then applied the quasi-maximum likelihood estimator to our mock data and verified its unbiasedness.

Figure 9. The unbiasedness of the quasi-maximum likelihood estimator (eq. 14) when measuring cosmological parameters and PNG amplitudes. We use the power spectrum and the bispectrum jointly, up to $k_{\rm max} = 0.5~h\,{\rm Mpc}^{-1}$. Each column corresponds to a given parameter (cosmological or PNG): we show the difference between the measured and expected values, divided by the respective Fisher error bar. Each panel corresponds to a different cosmology of the data samples (i.e. one with Gaussian initial conditions and the three types of PNG). For each of them, we analyze five independent datasets of 100 realizations, each indicated by its own colour and marker.
The robustness of the results was thoroughly checked, for example by reducing the number of training realizations used to calibrate the estimator or to compute the Fisher forecasts. We confirmed that all our results are fully converged and remain stable (e.g., results stay within the 1-σ range even when we use a few times less data to evaluate covariances). Moreover, we ran the pipeline on a set of NG realizations generated independently of the Quijote suite and displaying a different input $f_{\rm NL}$ from the training set; also in this case, we recovered the unbiasedness of the estimator.

Figure 11. The error bars of the quasi-maximum likelihood estimator as a function of the number of realizations used to calculate the covariance matrix. We analyze the 8000 Quijote simulations at $z = 1$, including both the power spectrum and the bispectrum, up to $k_{\rm max} = 0.5~h\,{\rm Mpc}^{-1}$. All error bars are divided by their corresponding Fisher prediction, as described in section 4.3.

In further consistency tests, we considered the power spectrum and bispectrum estimates separately. Interestingly, starting from the bispectrum results, we found that the PNG constraints significantly improve when we add the power spectrum estimates. This effect persists when we fix all standard cosmological parameters to their known fiducial values. This might seem surprising at first, since there are no $f_{\rm NL}$ signatures at tree level in the dark matter power spectrum, for any PNG shape. However, it turns out that the power spectrum acts in this case as an ancillary statistic, able to improve the separation between the primordial (signal) and gravitational ("noise") bispectrum components, through its correlation with the latter.
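This mechanism can be illustrated with a two-dimensional Gaussian toy model (purely illustrative numbers, not fitted to our simulations): a statistic P carries no $f_{\rm NL}$ signal, but its noise is correlated with that of a statistic B, which does.

```python
import numpy as np

# Toy illustration: P carries no fNL signal (dmu_P = 0), but its noise
# is correlated with B's, so conditioning on P sharpens the fNL error.
s, sigma_P, sigma_B, rho = 1.0, 1.0, 1.0, 0.8   # illustrative numbers
C = np.array([[sigma_P**2, rho * sigma_P * sigma_B],
              [rho * sigma_P * sigma_B, sigma_B**2]])
dmu = np.array([0.0, s])                        # fNL enters only via B
F_joint = dmu @ np.linalg.inv(C) @ dmu          # Fisher info, (P, B) jointly
F_Bonly = s**2 / sigma_B**2                     # Fisher info, B alone
print(np.sqrt(F_joint / F_Bonly))               # = 1/sqrt(1-rho^2) ≈ 1.67
```

In this toy model the error bar shrinks by $1/\sqrt{1-\rho^2}$, so a noise correlation of $\rho \approx 0.8$ already gives an improvement close to the factor of order 2 found in our analysis.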
In summary, our analysis shows that the pipeline can extract information on $f_{\rm NL}$ and cosmological parameters up to small scales, deep into the non-linear regime, which is a very promising starting point. The overall estimation procedure is unbiased and fast, once the input simulations used to train the estimator have been generated. A large set of input realizations, based on the Quijote suite, was produced for this purpose and will soon be made publicly available. These simulations are fully described in our companion paper (Coulton et al. 2022). In the same companion paper, we also study in more detail the information content of the power spectrum and bispectrum at $z = 0$, addressing the important points of quantifying how large the information contribution coming from non-linear scales is, and of establishing when such a contribution saturates. This is done via a Fisher matrix analysis using bispectrum Fourier modes, which also includes a further battery of numerical convergence and validation tests.
In the work presented here, the training and validation sets are in most cases characterized by matching input cosmological parameters. In general this would not be the case, and too large a mismatch between the input fiducial cosmology and the actual values of the parameters in the data would make the results suboptimal or biased. This can be solved by implementing a recursive procedure, in which the fiducial cosmology is updated at each step with the best-fit values of the parameters. Such an approach has the extra cost of requiring new sets of simulations to recalibrate the estimator weights at each step. This issue can in principle be significantly alleviated by resorting to the CARPool method, which allows for accurate covariance estimates from just a small set of high-fidelity realizations, combined with a large set of fast surrogates. This approach is under investigation and will be the object of a separate publication.
In forthcoming works, we will also gradually extend our analysis to include more and more realistic scenarios: first we will analyze the actual density field of biased tracers (dark matter halos and galaxies) and later on we will account for redshift space, incomplete sky-coverage and other crucial observational effects, to finally reach the capability to analyze real datasets. Note that, given our simulation-based approach, this will not require any significant modification in the pipeline that was illustrated and validated here, but only changes in the input training sets and data.

A.2. Gamma matrix
The gamma matrix is defined as the inner product (eq. 7) of the mode functions $Q_n$:
$$\gamma_{mn} = \langle Q_m | Q_n \rangle .$$
This inner-product matrix needs to be calculated only once for a chosen set of basis functions $Q_n$ and scale range $[k_{\rm min}, k_{\rm max}]$. Different ways exist to calculate the integral defining the $\gamma$ matrix; these methods are summarised and compared extensively in Byun et al. (2021). In this work we use the method applied for the first time in Karagiannis (2018), where the inner product is calculated by utilising 1D FFTs. In order to describe this method, we take a step back and use the definition of the inner product employed to derive eqs. (10) and (11), as well as eq. (A5), which is
$$\langle Q_m | Q_n \rangle = \int_{\mathbf{k}_1} \int_{\mathbf{k}_2} \int_{\mathbf{k}_3} (2\pi)^3 \, \delta_{\rm D}(\mathbf{k}_{123}) \, \frac{Q_m(k_1, k_2, k_3) \, Q_n(k_1, k_2, k_3)}{k_1 k_2 k_3} , \quad {\rm (A6)}$$
where for brevity we introduce the shorthand notations $\int_{\mathbf{k}} \equiv \int \frac{d^3k}{(2\pi)^3}$ and $\mathbf{k}_{123} \equiv \mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3$. Following the steps of Fergusson et al. (2012b), we write the Dirac delta function as
$$\delta_{\rm D}(\mathbf{k}_{123}) = \frac{1}{(2\pi)^3} \int d^3x \, e^{i \mathbf{k}_{123} \cdot \mathbf{x}} ,$$
where the exponential can be expanded as
$$e^{i \mathbf{k} \cdot \mathbf{x}} = 4\pi \sum_{\ell m} i^{\ell} \, j_{\ell}(kx) \, Y_{\ell m}(\hat{\mathbf{k}}) \, Y^{*}_{\ell m}(\hat{\mathbf{x}}) .$$
Inserting the above into eq. (A6) yields an intermediate expression (eq. A9) containing sums over $\ell_i$ and $m_i$. The integral over each $\mathbf{k}_i$ can then be separated into a radial and an angular part, where the latter reads
$$\int d\Omega_{k_i} \, Y_{\ell_i m_i}(\hat{\mathbf{k}}_i) = \sqrt{4\pi} \, \delta_{\ell_i 0} \, \delta_{m_i 0} ,$$
where to derive the above we have used the normalisation of the spherical harmonics, $\int d\Omega_{k_i} |Y_{\ell m}(\hat{\mathbf{k}}_i)|^2 = 1$, and $Y_{00} = 1/\sqrt{4\pi}$. This forces all $\ell_i$ and $m_i$ in eq. (A9) to zero, giving
$$\langle Q_m | Q_n \rangle = \frac{(4\pi)^{9/2}}{(2\pi)^9} \int dx \, x^2 \int d\Omega_x \left[ Y_{00}(\hat{\mathbf{x}}) \right]^3 \int dk_1 \, k_1^2 \, j_0(k_1 x) \int dk_2 \, k_2^2 \, j_0(k_2 x) \int dk_3 \, k_3^2 \, j_0(k_3 x) \, \frac{Q_m(k_1, k_2, k_3) \, Q_n(k_1, k_2, k_3)}{k_1 k_2 k_3} . \quad {\rm (A11)}$$
Now, if we used for the integration over $x$ the facts that $\int d\Omega_x \left[ Y_{00}(\hat{\mathbf{x}}) \right]^3 = 1/\sqrt{4\pi}$ and $\int dx \, x^2 \, j_0(k_1 x) \, j_0(k_2 x) \, j_0(k_3 x) = \pi/(8 k_1 k_2 k_3)$, where the latter is non-zero only for wavevectors that form a closed triangle, we would retrieve the expression in eq. (A6). At this point, however, we use only the result for the angular part of the $x$ integral, and substitute in eq. (A11) the basis functions $Q_m = q_{\{p_1}(k_1) \, q_{r_1}(k_2) \, q_{s_1\}}(k_3)$ and $Q_n = q_{\{p_2}(k_1) \, q_{r_2}(k_2) \, q_{s_2\}}(k_3)$, each a separable (symmetrized) product of one-dimensional functions $q_p(k)$. Furthermore, using the expression for the zeroth-order spherical Bessel function, $j_0(k_i x) = \sin(k_i x)/(k_i x)$, we get
$$\langle Q_m | Q_n \rangle = \frac{(4\pi)^4}{(2\pi)^9} \int \frac{dx}{x} \, \xi_{\{p_1 p_2}(x) \, \xi_{r_1 r_2}(x) \, \xi_{s_1 s_2\}}(x) , \quad {\rm (A12)}$$
where
$$\xi_{pr}(x) \equiv \int_{k_{\rm min}}^{k_{\rm max}} dk \, q_p\!\left( \frac{k - k_{\rm min}}{k_{\rm max} - k_{\rm min}} \right) q_r\!\left( \frac{k - k_{\rm min}}{k_{\rm max} - k_{\rm min}} \right) \sin(kx) . \quad {\rm (A13)}$$
The above integral over $k$ can be calculated using one-dimensional FFTs, as described in chapter 13.9 of Numerical Recipes (Press et al. 1992), while the integration over $x$ in eq. (A12) can be performed with conventional numerical quadrature, such as the five-point Newton-Cotes rule used here. The accuracy of this method in computing eq. (A12) depends on the $k$-grid resolution, which must be sufficient not only to accurately calculate the integral of eq. (A13), but also to ensure that the range and resolution in $x$ generated by the FFT are adequate for the outermost integration over $x$. More details on this method can be found in Press et al. (1992).
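As a rough numerical sketch of the sine-transform step (rectangle-rule accuracy only; the actual computation follows the higher-order endpoint corrections of Press et al. 1992, ch. 13.9, and the function name here is illustrative):

```python
import numpy as np

def sine_integral_fft(f_k, k_min, dk):
    """Approximate xi(x_m) = integral dk f(k) sin(k x_m) for all points
    x_m = 2*pi*m / (N*dk) at once, using a single complex FFT.
    f_k holds f sampled at k_j = k_min + j*dk, j = 0..N-1."""
    N = len(f_k)
    x = 2.0 * np.pi * np.arange(N) / (N * dk)
    # sum_j f_j exp(+2*pi*i*j*m/N) = N * ifft(f)_m
    ft = N * np.fft.ifft(f_k)
    return x, dk * np.imag(np.exp(1j * k_min * x) * ft)
```

Here $f(k)$ stands for the product $q_p q_r$ of eq. (A13); the FFT fixes the grid of $x$ values at which $\xi_{pr}$ is returned, which must then be fine and extended enough for the outer $x$ integration of eq. (A12), as discussed above.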

B. IMPACT OF COSMOLOGICAL PARAMETERS
In figures 13 and 14, we show the impact on the power spectrum and on the bispectrum of varying the cosmological parameters considered in this work, similarly to what was done for PNG in section 4.2. This highlights distinct behaviours for each parameter and each observable, in contrast with PNG, whose different shapes are strongly degenerate at the power spectrum level.

Figure 13. The impact of cosmological parameters on the matter power spectrum (averaged over 500 N-body simulations). Solid and dashed lines correspond respectively to parameter values perturbed above and below the fiducial.

Figure 14. The impact of cosmological parameters on the matter bispectrum (ratio of the bispectrum with a modified parameter to the fiducial one, averaged over 500 N-body simulations).