Constraining Primordial Non-Gaussianity from Large Scale Structure with the Wavelet Scattering Transform

We investigate the Wavelet Scattering Transform (WST) as a tool for the study of Primordial non-Gaussianity (PNG) in Large Scale Structure (LSS), and compare its performance with that achievable via a joint analysis with power spectrum and bispectrum (P+B). We consider the three main primordial bispectrum shapes - local, equilateral and orthogonal - and produce Fisher forecast for the corresponding fNL amplitude parameters, jointly with standard cosmological parameters. We analyze simulations from the publicly available"Quijote"and"Quijote-png"N-body suites, studying both the dark matter and halo fields. We find that the WST outperforms the power spectrum alone on all parameters, both on the fNL's and on cosmological ones. In particular, on fNL_loc for halos, the improvement is about 27%. When B is combined with P, halo constraints from WST are weaker for fNL_loc (at ~ 15% level), but stronger for fNL_eq (~ 25%) and fNL_ortho (~ 28%). Our results show that WST, both alone and in combination with P+B, can improve the extraction of information on PNG from LSS data over the one attainable by a standard P+B analysis. Moreover, we identify a class of WST in which the origin of the extra information on PNG can be cleanly isolated.


Introduction
An important goal of observational cosmology in the coming years will be the development of methodologies to extract maximum information from non-linear scales in galaxy surveys.Clearly, this will involve going beyond power spectrum analysis, since the power spectrum alone cannot capture phase information and is therefore not sensitive to structures such as, for example, filaments in the cosmic web.A natural choice in this respect is to consider the bispectrum (three-point function of the Fourier modes of the density field), as this is the lowest order non-Gaussian (NG) correlator.Indeed, many joint power spectrum and bispectrum (P+B) Large Scale Structure (LSS) forecasts and analyses are already present in the cosmological literature [1][2][3][4][5] and this kind of approach to the analysis of future galaxy survey data presents some advantages, notably the availability of several well established and validated P+B analysis pipelines and the theoretical interpretability of bispectrum measurements -at mildly non-linear scales -in cosmological perturbation theory.On the other hand, the bispectrum clearly does not exhaust the available information at fully non-linear scales and a straightforward extension to higher order correlators becomes increasingly more cumbersome and difficult to implement, especially when considering the need for accurate evaluation of covariance matrices in the non-linear regime of interest and for inclusion of realistic observational effects (e.g., window functions).This motivates the search for different NG summary statistics.The ideal properties that we would require from them would be ease of implementation, efficiency at extracting all the available NG information using a limited number of modes and clear physical interpretability.With such goals in mind, several NG summaries have already been considered in LSS analysis.Among those, let us mention for example marked spectra [6][7][8][9][10], power spectra from cosmic web environments [11,12], skew spectra [13][14][15][16], Minkowski functionals [17,18], k-nearest neighbour cumulative distributions functions [19,20], probability distribution function (PDF) of late-time cosmic density fluctuations [21,22], halo mass function probes [23][24][25][26][27][28], and the void abundance [29,30].
Here, we focus on another statistic that has received significant recent attention in observational cosmology, namely the Wavelet Scattering Transform (WST) [31,32].The WST has the structure of a three-layer convolutional network, in which the operations performed at each stage are, in sequence, non-linear activation via the modulus function, convolution by a wavelet filter and low-pass spatial averaging of the data.The WST has recently been applied to simulations in [33][34][35][36][37], showing that it could efficiently capture NG information and produce significant improvements over power-spectrum-only cosmological constraints, especially on neutrino mass.The main goal of this work is to investigate the WST as a tool for the study of primordial non-Gaussianity (PNG) and to compare in detail its performance with that achievable via a P+B analysis, along with providing a general physical interpretation for this comparison.
Non-Gaussian features in the primordial cosmological density field are a distinctive prediction of inflationary models beyond standard single-field slow-roll.Primordial non-Gaussian signatures are encoded in the primordial bispectrum of cosmological perturbations: different deviations from the single-field slow-roll scenario produce specific functional dependencies of the bispectrum on triangular configurations of Fourier modes.This model dependence makes PNG a very powerful tool to constrain inflation.At the same time, the expected PNG signal is very small and strongly degenerate, at non-linear scales, with the bispectrum produced at late times by gravitational evolution of cosmic structures and galaxy bias.Beyond-bispectrum statistics, such as the WST, can help breaking this degeneracy, which is one of the main motivations of the present analysis.Let us note here that, instead of relying on summary statistics, an alternative, more ambitious approach consists in performing forward modeling field level inference.Field level analysis is more and more employed in cosmological data analysis and carries the promise of extracting all the available information in the data (see, e.g., [38][39][40][41][42][43][44][45][46][47][48][49][50][51]).The more traditional approach, based on preliminary compression of the data in a set of summaries, albeit lossy, remains nonetheless a valid alternative, often leading to easier physical interpretation of the results and comparison with analytical models; it is also relevant as it gives insight on where (i.e., in which domain, at which scales, and so on) the bulk of information about the relevant signals lies and on how it can be efficiently recovered.
In our work, we consider the three main primordial bispectrum shapes -local, equilateral and orthogonal -and produce Fisher forecast for the corresponding f NL amplitude parameters, jointly with standard cosmological parameters (σ 8 , Ω m , n s , h).To build the Fisher matrix up to a scale k max = 0.5 h Mpc −1 , we compute covariance matrices and summary responses to changes in parameters by means of Monte Carlo averaging and numerical differentiation, using the publicly available Quijote and Quijote-png N-body simulation suites.This a set of simulations with 512 3 dark matter particles in boxes of size 1 h −1 Gpc.
The paper is structured as follows.In Sect. 2 we introduce the summary statistics considered in our analysis: power spectrum, bispectrum and WST; in Sect. 3 we describe our methodology to compute Fisher matrices; in Sect. 4 we describe in more detail the input Quijote simulation dataset; in Sect. 5 we describe our analysis and present the main results; in Sect.6 we discuss the interpretation of our results and draw a comparison between the P+B and WST performance; in Sect.7 we draw our main conclusions.

Power spectrum and bispectrum
To extract the cosmological information of the density contrast field δ(⃗ x), we consider its two and three-point correlators in Fourier space, the power spectrum and bispectrum.
From a Fourier-transformed density field δ( ⃗ k) defined on a three-dimensional grid, the power spectrum P (k) is easily obtained using a binned estimator (see e.g., [52]): where V is the surveyed volume, ∆ i defines the binning of the relevant k-range, N i is the count of modes k in the i-th bin and k i the average modulus.Rather than computing the binned bispectrum B(k 1 , k 2 , k 3 ), we use a modal approach based on its expansion on a basis of separable functions (i.e.symmetrized product of onedimensional functions in k) [53][54][55][56].Then, evaluating the information content of the bispectrum only comes to estimating modal modes of the following form: where the q p (k) are the chosen one-dimensional functions for the bispectrum expansion.As was shown in [57][58][59], only a small number of such functions, constructed from simple polynomials (e.g.Legendre polynomials) or from the tree-level matter bispectrum, leading to typically only 100 modal modes β n , is necessary to study the matter and halo bispectrum from linear to non-linear scales (k max = 0.5 h Mpc −1 ).

Wavelet Scattering Transform
The observables in the WST are the scattering coefficients, which are built from nonlinear operations on δ(⃗ x), and are organized in different orders, or layers.In this paper, we adopt the implementation of the WST encoded in the Kymatio 1 package [32].The zeroth order coefficient is just the spatial average of the modulus of δ(⃗ x) raised to some power q, To obtain the higher order coefficients a mother wavelet function is defined, which, for Kymatio, is where x ≡ |⃗ x|, x ≡ ⃗ x/x, and the R m l (⃗ r) are the regular solid harmonics, with Y m l (r) the Laplacian Spherical Harmonics.The coefficients C l are given in appendix B. From the mother wavelet, a family of rescaled ones, ψ m j l , with j = 1, 2, • • • , is generated by successive doublings of the length scale σ, Following [33], in this analysis based on simulation data we will set where L is the side of the cubic box and N the number of grid points.
In Fourier space, the wavelets read, with p ≡ |⃗ p|, p ≡ ⃗ p/p.First order coefficients are then defined as where the symbol ' * ' indicates convolution in configuration space.Due to the sum over m, the coefficients depend, for fixed q, on j and l only.In the following, we will also show results for the special case q = 2, where the interpretation of these coefficients simplifies considerably.Indeed, in this case, the first order coefficients are directly related to the nonlinear power spectrum P (p), where we have used the property of spherical harmonics, and defined the Fourier space filter, showing that, for fixed variance of the Gaussian filter (set by j), higher values of l probe larger wavenumbers.The last line of Eq. (2.10) is the space averaged variance of the inverse Fourier transform of the filtered density field δ(⃗ p; j, l) ≡ W l 2 j σp δ(⃗ p).
To define second order coefficients, we first define a new nonlinear field, and then convolve it again with the wavelet at scale j 2 , using the same operation as (2.9), where, again following [33], we make the choice of using the same value for l both for the internal (j 1 ) and for the external (j 2 ) convolution.The q = 2 case now gives where the power spectrum in the integral now refers to the field and the filtered field The q = 2 case turns out to be particularly useful in order to compare the performance of the WST with that of the PS, as all the extra information beyond the nonlinear PS is encoded in scattering coefficients beyond first order.The second order coefficients are just the variances of the filtered fields U (⃗ x : j 2 , j 1 , l), which therefore contain information on the bispectrum and higher order correlators.
Fixing σ as in Eq. (2.7), we must decide how many coefficients to compute at first and second order.For the first order, we take j = 0, 1, • • • , J, with J = 4, corresponding to Gaussian smoothing filters in real space of widths from σ to 2 4 σ.In Fig. 1, we show the Fourier space filters use to analyze the Quijote suite of simulations presented in Sect. 4. As for the angular dependence of the wavelets, we take l = 0, 1 , • • • , L, with L = 4.When considering second order scattering coefficients, we take the external convolution, controlled by j 2 , at scales larger than the internal one, therefore j 1 < j 2 = 1 • • • , J, with J = 4 as for the first order coefficients.This gives a total of 76 coefficients, one of zeroth order, 25 of first order and 50 of second order.As we do not consider scattering coefficients at higher orders, these will form our data vector.

Simulation-based Fisher matrices
To benchmark the amount of information on PNG amplitudes, and more generally on cosmological parameters, extracted from the non-linear density field using the different summary statistics presented in Sect.2, a standard method is to use the Fisher matrix formalism.
Under the reasonable assumption of normally distributed summary statistics s, depending on a set of parameters θ through their mean s, the Fisher matrix is given by: where C is the covariance matrix.For each parameter θ i , the quantity (F −1 ) ii is the Cramer-Rao bound, the expected error bar from an optimal estimator.To evaluate Eq. (3.1), we choose the simulation-based approach (see Sect. 4 for a description of the simulations used in this work).Unbiased estimates of inverse covariance matrices are obtained using the Hartlap/Anderson correction factor [60]: where n r is the number of realizations used and n s the number of summary statistics.Derivatives can be computed at a chosen fiducial point (denoted by the superscript * ) by central finite difference: However, a direct application of this method requires a very large number of simulations to obtain numerically converged results, or can otherwise lead to spuriously super-optimistic Fisher error bars (for a discussion of this issue in the context of PNG, see [61,62]).This issue can be mitigated using the method of [63], consisting in computing the Fisher matrix from score-compressed statistics instead of the raw summary statistics.This score-compression is optimal, in the sense it keeps the full information about parameters, under the same conditions as the ones used to write Eq. (3.1) for the Fisher matrix [64,65].Score-compressed statistics s are obtained using: which are the same ingredients as for the Fisher matrix computation, and thus can be evaluated from the same sets of simulations with the advantage of a much faster numerical convergence.

Simulations
Evaluating with precision the statistical properties of the different observables introduced in Sect. 2 up to non-linear scales, as well as their dependence on cosmological parameters, requires a large number of simulations.In this work, we use the Quijote2 and Quijotepng3 suites of N-body simulations [66,67] which were designed exactly for this purpose.These are N-body simulations, containing each 512 3 dark matter particles in boxes of size 1 Gpc h −1 , in which the gravitational evolution of structures up to z = 0 is computed with the TreePM code Gadget-III [68] starting from initial conditions determined at z = 127 using 2LPTIC [69] or 2LPTPNG [67,70]. 4The suite contains 15 000 simulations with Planck-like cosmological parameters [71], and subsets of 500 were one of the parameters is slightly shifted above or below its fiducial value.The characteristics of these different sets are recalled in Table 1.
In this work, we focus on the parameters ⃗ θ = {σ 8 , Ω m , n s , h, f local NL , f equil NL , f ortho NL }, and study both the matter (at z = 1) and halo (at z = 0) density fields.The definitions of the PNG parameters, f local NL , f equil NL , and f ortho NL , characterizing different shapes of the primordial bispectrum, are given in [67].The halos have been identified using friends-of-friends (FOF) [72].Note that we only include halos with a mass above M min = 3.2 × 10 13 M ⊙ /h, and include M min as a nuisance parameters by varying it in the halo analyses.As shown in previous analyses [62,67], M min can be thought of as a proxy for the linear bias parameter b 1 .
We use the power spectra and bispectrum modal modes which were previously measured in [59] and [62] for the matter and halo fields, respectively.Note that the matter density field is studied in real space, while the halo density field is transformed to redshift-space applying the velocity correction along the x direction.
We estimate the WST coefficients from density fields computed by interpolating the dark matter particle or halo positions on regular three-dimensional grids of size N grid = 256 using the CIC scheme implemented in the code Pylians3. 5We apply a sharp, low-pass filter on the input field δ(⃗ x) that excludes all modes above k max = 0.5 h Mpc −1 (that is, the smallest accessible scale with Quijote simulations), and then compute all 76 scattering coefficients.

First order scattering coefficients and P (k)
As we have shown in Eq. (2.10), first order scattering coefficients with the specific choice q = 2, that is, S 2 1 (j, l), contain no more information than the (nonlinear) power spectrum P (k).To check this explicitly, we run a Fisher matrix analysis on the non-linear dark matter field at redshift z = 1.The resulting constraints for the set of cosmological parameters ⃗ θ = {σ 8 , Ω m , n s , h} are shown in Fig. 2. The near to perfect overlap between the 2D contours for P (k) and S 2 1 (j, l) shows that the information content of the two statistics is nearly identical.P (k) gives slightly better constraints, which shows that the equivalence between the 25 integrals of P (k) of Eq. (2.10) and the 78 P (k) bins is not perfect.This can be probably improved by increasing the maximum j and/or l.However, to limit the computational cost, we decide to stick to our choice since, as we will see, including second order coefficients outperforms the power spectrum for all parameters, including PNG ones, both for matter and for halos.
We also show the constraints from the first order coefficients with q = 0.8.In this case, the equivalence with P (k) is lost, and one expects that some information from higher order correlators should leak into first order scattering coefficients.As expected, the contours for the S 0.8 1 (j, l) coefficients differ from the ones for the S 2 1 (j, l) and for the P (k).Moreover, for all the parameters but n s , the performance is actually worse, suggesting that the redistribution of the information in passing from q = 2 to q = 0.8 moves it from first order to higher order coefficients.

Matter at z = 1
We now consider the matter field at z = 1 and extend the analysis of the previous subsection by enlarging the parameter vector to include the PNG parameters, and by including second order scattering coefficients and the bispectrum.Specifically, in Figs. 3 and 4 we use the parameter vector ⃗ θ = {σ 8 , Ω m , n s , h, f local NL , f equil NL , f ortho NL }, whereas in the upper table of Fig. 7, each PNG shape is analyzed jointly with cosmological parameters and independently from the two others PNG shapes.For cosmological parameters, we report the largest bound of the three analyses (differences being small anyway).
In Figs. 3 and 4 we show the constraints from the combination of all the 76 WST coefficients computed with q = 0.8 and q = 2, respectively.We compare them with those obtained by the combination of the power spectrum and the bispectrum (P + B), and the combination of all the statistics considered, P +B +WST.The corresponding 1σ Cramer-Rao bounds are listed in Fig. 7.We do not show results for P alone, as they are outperformed by both versions of WST on all parameters.
As discussed in the previous subsection, for q = 2 the extra information comes entirely from the second order scattering coefficients.For q = 0.8, the inclusion of second order coefficients improves the performance, which is now overall comparable with that of q = 2: it is worse for all cosmological parameters except n s , but slightly better for the PNG parameters.However, when combined with P + B, WST at q = 0.8 is slightly more efficient than q = 2 at removing degeneracies between parameters.

Halos at z = 0
We then apply the Fisher matrix analysis on the non-linear dark matter halo field at z = 0.In order to test the performance of the WST on a biased tracer in a more realistic scenario, the field is transformed in redshift space.The set of cosmological parameters analyzed in Figs. 5 and 6 We show the bounds given by all the 76 WST coefficients computed with q = 0.8 and q = 2 respectively, compared with those given by the first two power spectrum multipoles (P 0 + P 2 ), their combination with the bispectrum (P 0 + P 2 + B), and the combination of all the statistics.The 1σ Cramer-Rao bounds for all the above statistics are listed in the lower table of Fig. 7, where each PNG shape is analyzed independently from the other two.
As for matter, the WST outperforms the power spectrum alone for both values of q.This is true in particular for the PNG parameters, including for f local NL (we do not report the constraints on the other two PNG shapes from P 0 + P 2 , in Fig. 7, as they are huge).P 0 + P 2 + B performances are comparable with WST for the cosmological parameters (and for M min ), while for the PNG parameters we find a mixed situation: f local NL constraints are better (by about 10-15%) for P 0 +P 2 +B, f equil NL are better (by about 20-30%) for WST, while f ortho NL ones are of the same order for WST with q = 2 and about 30% better for q = 0.8.Overall, our results on halos confirm what we found for matter, namely, that q = 0.8 and q = 2 WST give comparable constraints basically on all parameters.This means that taking different exponents does not change significantly the amount of extracted information, once second order coefficients are included.
In appendix A, we compare directly the information content on PNG and cosmological parameters of WST with marked statistics, which is very similar.

Discussion
In order to understand where the information on PNG parameters is stored in the WST, we plot in Figs. 8 and 9 the sensitivity of WST coefficients to each parameter.More specifically, for each parameter α, we plot the absolute value of the difference between the WST coefficients computed in the set of simulations with parameter α + and the one for α − , normalized to the ones for the fiducial cosmology, (see Tab. 1 for the specific values taken for each parameter).Fig. 8 is for dark matter at z = 1 and real space, and Fig. 9 for halos at The colour scale indicates the ratio of each error bar with respect to its power spectrum + bispectrum equivalent, highlighting the significant additional information picked by the WST in most cases.Note that each PNG shape is analyzed jointly with cosmological parameters and independently from the two others PNG shapes.For cosmological parameters, we report the largest bound of the three analyses (differences being small anyway).
z = 0 and redshift space.The upper row of each panels contains PNG parameters, while the lower row all the other ones.Moreover, the left and right columns are obtained by setting the exponent parameter, q, to q = 0.8 and q = 2, respectively.The WST coefficients from 0 to 75 are ordered in decreasing length scale of the most external convolution.So, the first WST coefficient on the left is the second order one with j 2 = 4, j 1 = 3, l = 0, then j 2 = 4, j 1 = 3, l = 1, and so on up to j 2 = 4, j 1 = 0, l = 4 (end of the first light brown horizontal segment on the right).Then the first order WST coefficients with j = 4, l = 0, • • • 4 (dark brown segment), followed by second order coefficients with j 1 < j 2 = 3, and so on.
First of all, we notice no clear qualitative difference between left and right columns of each panel, meaning that the intuition developed in sect.2.2 for the q = 2 case, namely, that first order coefficients are related to the power spectrum, extends, to a large extent, also to q = 0.8.This explains why the first order coefficients for matter are almost insensitive Figure 8. Variation of WST coefficients for matter in real space with respect to different parameters, normalized to the fiducial values, for the matter field at z = 1.Left: q = 0.8, Right: q = 2. Upper row: NG parameters; lower row: all other parameters.The green segments on the x-axis identify second order WST coefficients, whereas blue segments identify first order ones.There is essentially no NG signal in first order WST coefficients because the power spectrum of the matter field -to which such coefficients are sensitive -does not depend on f N L at lowest order.See the main text for details to PNG parameters (Fig. 8, upper row, dark brown segments), especially at large scales.Indeed, the f NL dependence of the matter power spectrum is a suppressed one-loop effect, and therefore it manifests itself only at small scales.The same argument explains why also second order coefficients with l = 0 are almost insensitive to PNG parameters, as the sum in U (j, l)(⃗ x) of Eq. (2.13) involves just one term and then the second order coefficient is also related to the power spectrum.We conclude that, for matter, the information on PNG parameters comes mainly from second order coefficients with l ̸ = 0. On the other hand, for the other parameters the information is stored also in the power spectrum at linear order, and therefore we expect signal also in first order WST coefficients, as seen in the second row of Fig. 8.
Before concluding the analysis of the results for matter, we recall that the information content not only depends on the derivatives of the WST coefficients, but also on their covariances.In the first row of Fig. 10 we show the correlation coefficients defined as where C ij is the covariance between the i-th and the j-th WST coefficient.r ij can take values between −1 (complete anticorrelation) and 1 (complete correlation), with r ij = 0 corresponding to no correlation.The correlation patterns of q = 0.8 and the q = 2 cases are clearly different.For the q = 2 case, the correlation between first order coefficients is very small at large scales and increases at smaller ones, reflecting the behaviour of the power Variation of WST coefficients for the halo field in redshift space wi th respect to different parameters, normalized to the fiducial values.Now, first order WST coefficients pick up significant primordial NG signal for the local shape, due to the scale dependent bias signature in the power spectrum on large scales.See the main text for more explanations.Left: q = 0.8, Right: q = 2.
spectrum bins.First order coefficients are also correlated with the second order ones for l = 0, confirming our interpretation that the latter are also related to the power spectrum.The most striking features are, perhaps, the strong correlation among second order coefficients with l ̸ = 0, and the very small correlation between second order coefficients (with the exception of l = 0 ones) and first order ones, especially at large scales.This separation is absent for q = 0.8, confirming our conclusion (see Sect. 5.1) that the PS information is confined to 1st order coefficients for q = 2 and distributed among 1st and higher order ones for q ̸ = 2.Moreover, the strong correlation between second order coefficients for q = 2 suggests that a smaller set of them may be enough to extract the same amount of information.
When we turn to halos, the differences between the PNG sensitivities of first and second order coefficients, and between the correlation patterns for q = 2 and q = 0.8, basically disappear for the local shape.In this case, first order WST coefficients are sensitive to f NL since, as is well known, the power spectrum exhibits scale dependent bias at large scales [73].This can be understood as a new contribution to the density field where b 1 and b ϕ are tracer-dependent bias parameters (for matter, b 1 = 1 and b ϕ = 0 ) and ϕ is the gravitational potential.Finally, a stochastic contribution, ϵ, uncorrelated to δ m and ϕ, describes 'shot noise' in the halo field.It is responsible for the reduction of the sensitivity compared to matter, especially at small scales, and also for the decorrelation between WST coefficients in the q = 2 case seen in Fig. 10.

Conclusions
In this paper we have investigated the WST as a tool for the study of PNG, and compared its performance with that achievable via a P+B analysis.We analyzed the matter and the halo fields, in real and redshift space, measured from the publicly available Quijote and Quijotepng N-body simulation suite, producing Fisher forecast for the amplitude parameters, f local NL , f equil NL , and f ortho NL , along with the cosmological parameters.In order to make a fair comparison, WST and P+B analyses were implemented on the same fields.In particular, to reduce possible spurious effects picked up from WST at small scales, a sharp cut of wavenumbers above k max = 0.5 h Mpc −1 was applied.Moreover, we compared WST's with two values of the exponent parameter q, namely, q = 0.8 and q = 2, as the latter allows for a more clear interpretation of the results: first order scattering coefficients are equivalent to the power spectrum, see Fig. 2, so all information beyond that comes from second order coefficients.
We find that WST's (both for q = 0.8 and q = 2) outperform the power spectrum alone on both on PNG and cosmological parameter constraints, as well as on the 'bias' parameter M min .Specifically, on f local NL for halos, the improvement is about 27%.When B is combined with P, halo constraints from WST are weaker for f local NL (at ∼ 15% level), but still stronger for f equil NL (∼ 25%) and f ortho NL (∼ 28% for q = 0.8).Our results on the standard cosmological (non-PNG) cosmological parameters are in line with previous analyses [33,34], which also display a better performance of the WST with respect to the power spectrum alone.Concerning PNG parameters -which are the main focus of this work -we show that the WST can improve the extraction of PNG information from LSS data over that attainable by a standard P+B analysis.The level of improvement for Quijote-png simulations is competitive with the best results we obtained in previous works, using other summary statistics, such as the marked power spectrum and bispectrum [74].
Further developments should advance the current analysis in two directions.On one hand, the level of realism of data should be increased, in order to match as closely as possible that of a realistic survey.Therefore, modeling of galaxy bias, survey systematics, and so on, should be included in our pipeline.On the other hand, advances in the theoretical interpretation of the WST are highly desirable.We have provided a first attempt in this paper, by considering the WST for q = 2.This choice enables a cleaner theoretical description, while carrying at the same time basically no information loss with respect to other q values.It will be interesting to investigate more general families of wavelets, in search for those best tailored for extracting information, especially on PNG parameters, from the LSS.For instance, the angular structure could be exploited to minimize the effect of redshift space distortions, and the underlying symmetries of the problem, such as extended galileian invariance [75][76][77][78][79][80], could be exploited to reduce the number of independent coefficients to take into account in the analyses.

A Comparison with marked statistics
In [74], the marked power spectrum and marked bispectrum, computed from re-weighted density fields, were identified as promising observables of PNG on non-linear scales.In Fig. 11, we compare their reported 1σ Fisher bounds combining standard and marked power spectra and bispectra to our strongest constraints based on WST reported in Fig. 7.We obtain very similar constraints on PNG by including marked statistics or the WST, indicating that they probe the same information from higher order correlators beyond the bispectrum.

B Mother wavelets in Kymatio
The C l coefficients appearing in (2.4) are given by

C Convergence and Saturation
Having obtained our results from a simulation-based approach, we must check that our statistics have converged numerically.We test this by comparing the variation in the Cramer-Rao bounds obtained by using a growing number N der < 500 of simulations to compute the numerical derivative of Eq. (3.3), with the same result given by the full simulation suite (that is, with N der = 500).In figure 12 we report the ratio between the two bounds as a function of N der for the WST, highlighting the f NL configurations for clarity.If a line asymptotically converges to 1, then the bound for that parameter is considered to have numerically converged.We see that the bounds appear to converge rather quickly, especially for f NL in the case of halos.In the case of matter, the situation is reversed, however the absolute difference in the values never exceeds 3%, which implies very high numerical stability.For similar tests, showing the stability of the P + B analysis, see [59,61,62,67].It is also interesting to see how the Cramer-Rao bounds improve when adding more and more WST coefficients.This is shown in Fig. 13, where, as in Figs. 8, 9, and 10, we order the coefficients starting from j = 4, all the way to j = 0 (see Sect.

Figure 1 .
Figure 1.Radial profile of the wavelets at l = 0 (solid lines) and l = 4 (dashed lines) at m = 0 and for different values of j, where j = 0 is equivalent to the mother wavelet.As discussed in the text, we cutoff the field beyond k max = 0.5 h Mpc −1 , this is highlighted by the shaded region.

Figure 2 .
Figure 2. Expected distribution of the parameters around the fiducial quijote-PNG cosmology, with covariance given by the Fisher matrix computed on the non-linear matter field at z = 1.The statistics compared in the figure are 1st order WST coefficients S 0.8 1 (j, l) (olive), S 2 1 (j, l) (grey) and P (k) (aquamarine).

Figure 3 .
Figure 3. Expected distribution of the parameters around the fiducial quijote-PNG cosmology, with covariance given by the Fisher matrix computed on the non-linear matter field at z = 1.The statistics compared in the figure are P + B (olive), WST coefficients S 0.8 n (j, l) with n = 0, 1, 2 (grey), and the full combination P + B + WST (aquamarine).

Figure 4 .
Figure 4. Expected distribution of the parameters around the fiducial quijote-PNG cosmology, with covariance given by the Fisher matrix computed on the non-linear matter field at z = 1.The statistics compared in the figure are P + B (olive), WST coefficients S 2 n (j, l) with n = 0, 1, 2 (grey), and the full combination P + B + WST (aquamarine).

Figure 5 .
Figure 5. Expected distribution of the parameters around the fiducial quijote-PNG cosmology, with covariance given by the Fisher matrix computed on the non-linear halo field at z = 0.The statistics compared in the figure are the power spectrum monopole and quadrupole P 0 + P 2 (olive), P 0 +P 2 +B (grey), WST coefficients S 0.8 n (j, l) with n = 0, 1, 2 (aquamarine), and the full combination P 0 + P 2 + B + WST (violet).

Figure 6 .
Figure 6.Expected distribution of the parameters around the fiducial quijote-PNG cosmology, with covariance given by the Fisher matrix computed on the non-linear halo field at z = 0.The statistics compared in the figure are the power spectrum monopole and quadrupole P 0 + P 2 (olive), P 0 + P 2 + B (grey), WST coefficients S 2 n (j, l) with n = 0, 1, 2 (aquamarine), and the full combination P 0 + P 2 + B + WST (violet).

Figure 7 .
Figure 7.The Cramer-Rao 1σ bounds for different combinations of summary statistics (power spectrum, bispectrum and WST, "all" referring to the three of them altogether).Every statistic is measured up to k max = 0.5 h Mpc −1 in the Quijote N-body simulations at z = 1 (top panel), or in the corresponding halo catalogues at z = 0 (bottom panel), for a volume of 1 h −1 Gpc 3 .The

Figure 9 .
Figure 9.Variation of WST coefficients for the halo field in redshift space wi th respect to different parameters, normalized to the fiducial values.Now, first order WST coefficients pick up significant primordial NG signal for the local shape, due to the scale dependent bias signature in the power spectrum on large scales.See the main text for more explanations.Left: q = 0.8, Right: q = 2.

Figure 11 .
Figure 11.Comparison of the 1-σ Fisher bounds on cosmological parameters and PNG amplitudes for different combinations of summary statistics (power spectrum, bispectrum, WST (q = 0.8), marked power spectrum and marked bispectrum), measured up to k max = 0.5 h Mpc −1 in the Quijote halo catalogues at z = 0.

Figure 12 .
Figure 12.Convergence of the Cramer-Rao 1-σ bounds on the cosmological parameters as more derivatives on the WST coefficients are used to compute the Fisher matrix.Upper row: matter, lower row: halos.Left: q = 0.8, Right: q = 2.
6).The horizontal line at 1 indicates the corresponding bounds from P + B. Focusing on PNG parameters, we see that extra information with respect to P + B comes in already at intermediate scales (starting from the 40 th coefficient, approximately) and, in the case of halos it shows clear saturation at small scales, supporting the robustness of the bounds with respect to spurious small scale effects.

Figure 13 .
Figure13.Change in the Cramer-Rao 1σ bounds on the cosmological parameters obtained from as we add more WST coefficients to the Fisher matrix.The horizontal lines represent the bounds given by P + B. Upper row: matter, lower row: halos.Left: q = 0.8, Right: q = 2.

Table 1 .
The parameters of the Quijote and Quijote-png N-body simulations and halo catalogues used in this work.