Studying Interstellar Turbulence Driving Scales using the Bispectrum

We demonstrate the utility of the bispectrum, the Fourier three-point correlation function, for studying driving scales of magnetohydrodynamic (MHD) turbulence in the interstellar medium. We calculate the bispectrum by implementing a parallelized Monte Carlo direct measurement method, which we have made publicly available. In previous works, the bispectrum has been used to identify non-linear scaling correlations and break degeneracies in lower-order statistics like the power spectrum. We find that the bicoherence, a related statistic which measures phase coupling of Fourier modes, identifies turbulence driving scales using density and column density fields. In particular, it shows that the driving scale is phase-coupled to scales present in the turbulent cascade. We also find that the presence of an ordered magnetic field at large-scales enhances phase coupling as compared to a pure hydrodynamic case. We therefore suggest the bispectrum and bicoherence as tools for searching for non-locality for wave interactions in MHD turbulence.

To maintain turbulence in a fluid, energy must be injected and replenished at least on a eddy turn over time, t eddy = L/σ z , where L is the injection scale of the turbulence and σ z is the injection velocity scale. After this time, the turbulent eddies have cascaded down to the dissipation scale, where the energy is radiated, converted into heat, or transferred via friction of ions and neutrals. There are many possible energy sources for ISM turbulence that may act on a range of scales and in different ISM phases. These injection sources can include gravitational instabilities and mass accretion onto galaxies (Forbes et al. 2014;Goldbaum et al. 2015;Krumholz & Burkhart 2016;Krumholz et al. 2018), protostellar jets (Cunningham et al. 2006;Banerjee et al. 2007;Cunningham et al. 2009;Federrath 2015), the Magnetorotational Instability (Dziourkevitch et al. 2004), expansion of H II regions (Mac Low & Klessen 2004), stellar winds (Offner & Arce 2015;Gallegos-Garcia et al. 2020), and super-novae explosions (Leão et al. 2009;Gressel et al. 2011;Padoan et al. 2017). The dominant largest-scale injection processes are typically thought to be supernova and disk instabilities, which can act on 100 parsec to kiloparsec scales and would imply a dissipation timescale of ≈ 10 − 100 Myrs for Milky Way-like galaxies. While this time-scale is short, it is longer than or equal to other relevant time scales such as the average supernova rate (≈ 5−10 Myrs) or the orbital time (≈ 100 Myr) (Padoan et al. 2016;Martizzi et al. 2015;Krumholz et al. 2018). Thus it is likely that a number of injection mechanisms can maintain the observed levels of non-thermal velocity dispersion seen in different ISM tracers however which mechanism dominates is still an open question (Utomo et al. 2019).
Turbulent motions are self-similar in the inertial range and give rise to the power-law behavior seen in the power spectra, while at the injection scale and dissipation scales, non-power law shapes are observed. In particular, the dissipation scale shows exponential decay in the turbulent kinetic energy power spectra while the injection scale shows a marked bump at larger scales, followed by a transition to a power-law. Additionally, the slope of the power-law depends on a number of physical factors, such as the compressibility of the turbulence, the presence of gravity, thermal instability, and the strength of the magnetic field: (Lazarian & Pogosyan 2006;Padoan et al. 1997;Kritsuk et al. 2007;Lazarian 2009;Kowal & Lazarian 2007;Collins et al. 2012;Burkhart et al. 2013;Pingel et al. 2018;Gallegos-Garcia et al. 2020).
In practice, it is difficult to observationally directly measure the kinetic energy power spectrum and the injection scale and several methods have been proposed in the literature. When dealing with radio positionposition-velocity (PPV) data, the Velocity Coordinate Spectrum (VCS) has been used to directly measure the injection scale of the turbulence (Lazarian & Pogosyan 2006;Chepurnov et al. 2015). Haverkorn et al. (2008) observed Faraday rotation of extragalatic radio sources through the Galactic plane and suggested that stellar winds and protostellar outflows drive turbulence on parsec scales in the spiral arms while supernova do the driv-ing at 100 parsec scales in the interarm regions. Yoon & Cho (2019) demonstrated the use of polarization and the Davis-Chandrasekhar-Fermi Method to determine the driving scale of turbulence. More recently, Bialy et al. (2017) and Bialy & Burkhart (2020) suggest using chemical transitions as diagnostics of the injection scale, since the injection scale can influence the formation of density fluctuations in the ISM and thereby alter the chemistry. Finally, comparison of analytic scaling predictions for different driving mechanisms can also be useful to shed light on which driving mechanism might be dominant, e.g. the study by (Krumholz et al. 2018) suggests investigating the star formation rate vs. gas velocity dispersion correlation can discern between supernova vs. mass transport driving mechanisms.
In this paper we examine the use of higher-order statistics to identify the driving scale of ISM turbulence. In particular, injection of energy can produce correlated Fourier modes and thus techniques that examine modemode coupling and preserve phase information may be valuable. A commonly-used higher-order signal processing technique that contains information from both Fourier amplitude and phases is the bispectrum (Mendel 1991). The Fourier transform of the second-order cumulant (the autocorrelation function) is the power spectrum (see Eq. 5), while the Fourier transform of the thirdorder cumulant is known as the bispectrum. The bispectrum and related three-point correlation function has been used in cosmology to identify key scales of interest in cosmic structure, e.g. Baryon Acoustic Osculations (Eisenstein et al. 2007;Slepian & Eisenstein 2015Pearson & Samushia 2018). The bispectrum and threepoint correlation function have been applied in the context of turbulence in order to determine key parameters of the flow such as the sonic and Alfvénic Mach number (Burkhart et al. 2009;Portillo et al. 2018;Saydjari et al. 2021). However, given its past use in cosmology as a indicator of key physical scales of interest, the bispectrum may also be of interest in galaxy physics for locating turbulent driving scales and scales of filament structure. The motivation of this work is to investigate the utility of the bispectrum and related higher-order statistics for locating the driving scales of interstellar turbulence. This paper is organized as follows: in Section 2 we outline the simulation datasets used with varying driving scale, in Section 3 we define the bispectrum and bicoherence for our convention, in Section 4 we show our results of the bispectrum with different turbulence driving scales, and finally in Section 5 we discuss the implications of our results followed by our conclusions in Section 6. Last, the Appendices A to E describe the technical details of our bispectrum implementation.

SIMULATIONS
We run 3D numerical simulations of isothermal compressible MHD turbulence with a setup similar to that of a number of past works (Hill et al. 2008;Burkhart et al. 2009;Herron et al. 2016;Bialy et al. 2017;Bialy & Burkhart 2020). We refer to these works for the details of the numerical set-up and here provide a short overview.
The code is a third-order accurate essentially nonoscillatory method (ENO) scheme which solves the ideal MHD equations in a periodic box: Here B is the magnetic field, p is the gas pressure, and I is the identity matrix. These simulations have periodic boundary conditions and an isothermal equation of state p = c 2 s ρ, with c s the isothermal sound speed. For the energy source term f , we assume a random large-scale driving at a wave number which is varied as described below. The driving is continuous in space and time and constrained to be solenoidal. The models are run for t ≈ 5 crossing times, to guarantee full development of the energy cascade.
The code outputs a velocity field v, density field ρ, and magnetic field B. The magnetic field consists of the uniform background field and a turbulent field, i.e: B = B 0 + b with the magnetic field initialized along a single preferred direction.
We run simulations with a sonic Mach number M s = 4.5. While previous studies used driving on large scales, with k drive = 2 − 2.5, here we run several simulations, each of which is driven on a different peak driving scale, of k drive = 2.5, 5, 7, 10, and 20, with the full driving range about ∆k = ±1 around the peak. The sonic Mach number is set to M s = 4.5 and the Alfvén Mach number is M A = 0.7 for all simulations. To test numerical convergence we run simulations with various resolutions; N res = 256 3 , 512 3 and 1024 3 resolution elements.
The simulations presented in this work are entirely in adimensional code units and can be rescaled to different physical scalings. For a detailed discussion on how to convert code units into physical units, see Appendix A of (Hill et al. 2008) and the discussion in (McKee et al. 2010).
A subset of these simulations are part of the Catalog for Astrophysical Turbulence Simulations (CATS 5 , ). An example simulation cube is shown in Fig. 1.

THE BISPECTRUM AND BICOHERENCE
In order to measure the relevant scales of sources and sinks of turbulent energy, it is common to employ correlation functions as statistical descriptors. The Fourier space two-point correlation function is the power spectrum. Under the assumption of translational invariance, we define the density power spectrum as where k is a wave vector, · is an ensemble average, andρ(k) is the Fourier transform of the density field ρ(x) (Bernardeau et al. 2002). We restrict our study to the density, but we can in general define power spectra for the velocity and magnetic field vectors or the kinetic and magnetic energy scalars. Assuming that the system size is large enough that spatial averaging is sufficient The bispectrum and bicoherence of (a). These statistics are able to pick out the turbulence driving scale by identifying phase coupling produced by the Fourier driving mechanism. This is indicated by the dashed black line overlayed onto the L-shaped spikes in the bispectrum and bicoherence. The values of 0.2 − 0.3 in the bicoherence can be interpreted as the strength of phase coupling (Cui & Jacobi 2021).
as an ensemble average, we compute the isotropic power spectrum where k is the wavenumber and ∆k is a bin width. In this work we set ∆k = 1. P (k) is often normalized by the bin volume for a particular k, however we use the fluid mechanics convention for computing energy spectra. We also consider the Fourier three-point correlation function, known as the bispectrum. Under the same assumptions as Eq. 4 we define the bispectrum as where k 1 and k 2 are wave vectors (Bernardeau et al. 2002). The power spectrum in Eq. 4 does not capture phase information, while the bispectrum does. Similar to Eq. 5 we isotropically sum the bispectrum over modes with fixed scalar wavenumbers (k 1 , k 2 , k 3 ), where k 3 = |k 1 + k 2 |. Rather than parametrizing by wavenumber k 3 , we do so by the opening triangle angle θ between wave vectors with side lengths (k 1 , k 2 ). In particular, following the conventions in Eq. 5, we compute the isotropic bispectrum where Ω = Ω(k 1 , k 2 , θ) is the set of (k 1 , k 2 ) such that The bicoherence satisfies 0 ≤ b ≤ 1. Decomposing ρ(k) = A(k) exp iφ(k) with A(k) real, the summand of the bispectrum decomposes into an amplitude term A(k 1 )A(k 2 )A(k 1 + k 2 ) and a phase term φ(k 1 ) + φ(k 2 ) − φ(k 1 + k 2 ). Thus, the bicoherence normalization sums only over the amplitudes. If the phase term is a constant over Ω, then b = 1. On the other hand if the phases are random and uniform over (0, 2π], then b ≈ 0. Figure 2 demonstrates how scrambling the phases of an FFT drives the bicoherence to zero. There can be many choices for normalizing bispectra, but here we choose one with a simple geometric interpretation with regard to our spatial averaging. We reduce dimensionality further by summing over all θ contributions for fixed (k 1 , k 2 ). More specifically, we consider B = B(k 1 , k 2 ) and b = b(k 1 , k 2 ), where we define Ω to have a fixed (k 1 , k 2 ) but any θ, i.e. Ω = Ω(k 1 , k 2 ). Summing out the θ dependency is sufficient for our purposes here. That said, our code implementation described in the Appendix can quickly calculate an entire distribution over θ, which is important as many implementations of the bispectrum either average over angles or only calculate for a narrow band of angles to reduce the amount of information. Allowing for variable triangle configurations could be important when one also varies the sonic and the Alfvénic Mach numbers M s and M A , much like in Burkhart et al. (2009) Related to the bispectrum is the three-point correlation function (3PCF). The 3PCF is the inverse Fourier transform of the bispectrum, i.e., it is the real-space analogy of the relationship between the two-point correlation function and the power spectrum. A fast multipole expansion for the 3PCF in opening triangle angle θ was applied to MHD turbulence simulations in Portillo et al. (2018), which found a mild sensitivity to the sonic and Alfvénic Mach numbers.
The bispectrum B(k 1 , k 2 ) has been applied to simulations and observations (Burkhart et al. 2009;Portillo et al. 2018) in the context of characterizing non-Gaussian features arising from compressible turbulence. These studies found that the bispectrum is a sensitive diagnostic for both the sonic and the Alfvénic Mach numbers. More importantly, the bispectrum can describe the behavior of non-linear mode correlation across spatial scales.
The bispectrum characterizes and searches for nonlinear interactions and non-Gaussianity, which makes it a useful technique for studies of supersonic super-Alfvénic MHD turbulence in the ISM and solar wind due to the fact that as turbulent eddies evolve, they transfer energy from large scales to small scales, generating a hierarchical turbulence cascade. For incompressible Kolmogorov-type flows, this can be expressed as k 1 ≈ k 2 = k and k 3 ≈ 2k. Non-linear interactions take place more strongly in compressible and magnetized flows, i.e., k 1 = k 2 . The bispectrum as well as other three-point statistics can characterize these nonlinear interactions.
A complex-valued version of the bispectrum has been used to study time-series correlations. Its phase content has been shown to contain rich information about wave shape (Maccarone 2013) and direction of energy transfer in a fluid (Cui & Jacobi 2021). Other recent works define a complex-valued bispectrum over wave vector and frequency space (Schmidt 2020).
In the following we simply refer to P (k) as the power spectrum and B(k 1 , k 2 ) as the bispectrum. In the appendices we describe the specifics of the implementation, which allows us to calculate density bispectra for a 3D simulation using N res = 1024 3 in an hour and for a 2D column density map N res = 1024 2 in under a minute (in double-precision on a Nvidia A100 40GB GPU). In Appendix A, we describe a Monte Carlo estimator for calculating Eqs. 7 and 8. We demonstrate numerical convergence in Appendix B and describe the open-source implementation in Appendix C. Finally, we describe a caveat to summing over θ in Appendix D and include a discussion of the advantages and disadvantages of our method compared to recent algorithmic developments in Appendix E.

Averaging the Bispectrum
The bispectrum B(k 1 , k 2 ) is more difficult to interpret than the power spectrum P (k) as it is represented by a 2D map of wavenumbers. To address this issue,  introduced averaging over bispectral contours to reduce the information provided by the bispectral amplitudes.
To compactly represent our data, we define the sum over lines of constant k 1 = k Choosing to sum over constant k 1 rather than k 2 is arbitrary because B(k 1 , k 2 ) = B(k 2 , k 1 ). This particular way to reduce the information contained in the bispectrum and bicoherence conveniently highlights the signals that we see in our data. It is also a natural way to average while capturing correlations for a fixed k. Geometrically, it fixes a triangle side length k 1 = k and sums together all triangles with varying side length k 2 .

STUDYING DRIVING SCALES
In this paper we calculate the bispectrum for the density field ρ(x, t) of our simulations, where the dynamics has reached a statistical steady state. Specifically, we compute the bispectrum ofρ(k) = F[ρ(x) −ρ], whereρ is the mean of the density field and F denotes the Fourier transform. Subtracting by the mean has the effect of settingρ(k = 0) = 0, which highlights off-diagonal compo- nents of the bispectrum in the contour plots of Figure 1, for example. Figure 1(a) illustrates an example density field for k drive = 5 and N res = 1024 3 . Its bicoherence b in Fig.  1(b) is strongest for fixed k 1 = k drive for k 2 ≥ k drive . In other words, scales in the turbulent cascade are phasecoupled to the driving scale.
In the following sections we consider the results of our N res = 1024 3 simulations for varying k drive . Figure 3 shows the density power spectra for a range of driving scales. These alone, using only amplitude data, display evidence of the driving scales.

Density
In Fig. 4(a), the bicoherence and the bispectrum as defined in Eq. 10 can also reliably pick out driving scales by illustrating the scales present in the turbulence cascade. However, as also displayed by Fig. 1(b) the bicoherence shows a stronger driving scale signature than the bispectrum. The larger k drive cases also show additional correlations at long-waves in the power spectra of Fig. 3 and averaged bispectra and bicoherence of Fig. 4(b). Figure 5 suggests a reason for these additional correlations. Here, we compare the averaged bicoherence for MHD turbulence and hydrodynamic turbulence at k drive = 10. The hydrodynamic case does not show correlations, hinting that the external magnetic field in MHD turbulence induces mode-mode coupling. We discuss the significance of these magnetic field correlations in Section 5.

Column density
In this section we show results for integrated column density along axes perpendicular and parallel to the background magnetic field B 0 . Results from column density can have applications for observational studies, particularly here for a low-gravity limit. An important future study would study driving scales for models incorporating both turbulence and gravity.
Column density projections perpendicular and parallel to the background magnetic field B 0 are expected to have different statistics. In MHD turbulence, projections perpendicular to B 0 are expected to look anisotropic and parallel projections isotropic. Figure 6 shows the power spectrum across driving scales of column density. In Fig. 6(a), looking perpendicular to B 0 along the z axis, there is an indication of k drive for all cases. However, it is particularly unclear for long-wave driving, k drive = 2.5, 5. The story is similar along the y axis in 6(b). The signal is fainter than for the power spectra of the whole 3D field in Fig. 3. Figure  6(c) is density projected parallel to B 0 . Here, there is less indication of driving scale for all k drive .
Studying the bispectrum and bicoherence is more reliable in extracting driving scales. Figures 7(a) and (b) show column density images along both directions perpendicular to B 0 for k drive = 2.5, 5, 10, and 20. Each image has its bicoherence illustrated beneath. Figures 7(a) and 7(b) show driving scale signatures more reliable than that of the power spectrum in Fig.  6(a) and 6(b). For example, k drive = 5 is especially clear for the bicoherence and less so for the power spectrum. The amplitude enhancement in the power spectrum could be easily washed out by smoothing in observations too.
Notably, the nature of the signal between the bicoherence and power spectrum is very different. The stronger power spectral amplitudes are a result of momentum injection in a very narrow band of Fourier modes. Realistic driving is less simple and requires more robust metrics beyond simple Fourier amplitude enhancement. The bicoherence is able to recognize self-similarity across scales, a physical phenomenon of turbulence. Figure 8 illustrates for column density integrated along B 0 . Like in Fig. 6(c), it is difficult to pick out the driving scale. The column density maps (particularly for k drive = 10 and 20) exhibit more line-of-sight confusion than those in Fig. 7. The smaller scale features do not clearly show a precise preferred scale.
However, the bispectrum is a more robust statistic for analyzing driving scales even along this axis. Correlation enhancements (particularly for k drive = 2.5 and 5) appear to have higher signal-to-noise than the power spectrum.
Interestingly, the long-wave correlations (explained in Fig. 5) disappear when looking along the magnetic field. This is visible in both the power spectrum in Fig. 6(c) and the bicoherence in Fig. 8.

DISCUSSION
In this paper we have demonstrated the utility of the bispectrum (and related bicoherence) for determining driving scales of turbulence in the interstellar medium. The driving scale is an essential physical aspect of a turbulent cascade, where energy can begin to cascade either to small (for a forward cascade) or to large scales (in an inverse cascade). Our study paves the way for future applications of the bispectrum and bicoherence for observational data. Future investigation of the effects of radiative transfer to study particular line tracers (e.g. CO, HI) as well as dust extinction/emission will be performed to further assess the application of our technique for observational data. We also suggest the study of multiple driving scales and more physical driving mechanisms, such as supernova driven turbulence and outflow driven turbulence. We are conducting a follow up study to address these.
The bispectrum includes phase information, whereas the traditional Fourier power spectrum only contains the  amplitudes. The addition of phases provides more structural information that amplitude alone does not capture. The power spectrum more reliably provide the driving scale when dealing with 3D quantities such as density and kinetic energy, but when applied to projected quantities, such as column density, it is no longer able to distinguish the driving scale over noise and effects of the changing magnetic orientation. Interestingly, our results show that the phase information is primarily tracing the driving scale, since the bicoherence is more sensitive than the bispectrum in terms of picking out the driving scale. This is because the bispectrum includes both phase and amplitude information while the bicoherence normalizes out the Fourier phases.
Furthermore, quantifying the spatial structure of the ISM with statistical diagnostics that are sensitive the Fourier phases allows the full information contained in images produced by numerical simulations to be more directly compared with observations. The importance of phase information has become clear recently with the rising interest and use of machine learning algorithms for classifying turbulence. For example, in Peek & Burkhart (2019) a trained convolutional neural network was able to tell sub-Alfvénic vs. super-Alfvénic simulations apart with about 98% accuracy due to phase information alone.  Fig. 7 and Fig. 8. The network was not presented with any of the power spectra information and the only information available was embedded in the Fourier phase domain. This experiment has shown that there is a large amount of valuable information in Fourier phase space which is not available to the power spectrum. Phase information is also of general interest in the MHD turbulence community. Non-linear interactions among MHD waves (i.e., slow, fast, and Alfvén) produce finite correlation of the wave phases. As a result, several other statistical studies have suggested using phase information present in MHD turbulence to understand non-linear mode interactions and plasma turbulence ; Portillo et al. (2018).
We also found that averaging the bispectrum and bicoherence is a very useful post-processing step that allows the bispectral wave vector parameter space to be distilled down to a simpler representation. Our Fig. 4 reveals this averaging demonstrates not only the scale of the driving but also a sensitivity to the presence of a magnetic field. A long-wave mean magnetic field produces correlations that dominate the small wavenumbers, whereas our hydrodynamic simulations do not exhibit this enhancement of correlation. This is likely due to non-local effects present in the MHD turbulence cascade. Nonlocality in MHD turbulence is much more pronounced than that of hydrodynamic turbulence (Cho 2010;Beresnyak & Lazarian 2010). In a forward-cascade, hydrodynamic (HD) turbulence, energy moves from large to small scales as kinetic energy is transferred to smaller eddies by shearing motions of other eddies. Locality in this framework means that interactions between eddies of similar size dominate such that, in Fourier space, a Fourier mode at a wavenumber k interacts primarily with other modes having similar wavenumbers and energy is transferred to larger wavenumbers. MHD turbulence has non-local effects in the form of large-scale shearing motion (which does not promote energy transfer) and non-local energy transfer. This can explain why we see additional mode correlation at scales larger than the driving scales. Further study of the bispectrum of kinetic energy is warranted in order to confirm the effect we see is due to nonlocality effects. The effect of the magnetic field on the bicoherence and phase coupling should be studied further as a parameter based investigation where the strength of the magnetic field is altered.

CONCLUSIONS
In this work we present an adaptable and easy-to-use Monte Carlo GPU/CPU bispectrum and bicoherence implementation. Our code is made publicly available for general use. We describe the implementation and link to the codebase in Appendix C. We have applied the bispectrum and bicoherence to isothermal MHD driven turbulence simulations which have different driving scales from k = 2.5 to k = 20.
We find that: • The power spectrum, bicoherence, and bispectrum are all sensitive to the driving scale of compressible MHD turbulence when applied to the density field.
• The bicoherence of MHD turbulence density fields reveal that Fourier phase correlations are responsible for outlining the scales present in the turbulent cascade.
• When studying projected density (Column density) the bicoherence retains signatures of the driving scales. This opens up applications of the bicoherence to astrophysical observations in order to determine the driving scale of turbulence.
• The presence of an external magnetic field induces  Fig. 4(a). The k drive = 10 and 20 cases also clearly depict the long-wave magnetic field correlations along with those from turbulence driving.
additional mode-mode coupling in the MHD cascade that are not present in a hydrodynamic cascade. The enhanced mode coupling is clear at large scales and can be larger than the driving scale of the turbulence. This may be a signature of Alfvén waves traveling along field lines and inducing an additional cascade at larger scales. When computing the bispectrum of three-dimensional data, it is often too computationally intensive to solve for the bispectrum and bicoherence using Eqs. 7 and 8. Instead, for a given (k 1 , k 2 ), one can do so by uniformly sampling possible combinations of vectors (k 1 , k 2 ) from the sample space Ω. In order to converge to B(k 1 , k 2 , θ), we first need to converge to the expectation value ofρ(k 1 )ρ(k 2 )ρ * (k 1 + k 2 ) by calculating a moving average. Once estimating this value, we can recover B(k 1 , k 2 , θ)-which we defined as the sum over triangles rather than the average over triangles-by multiplying by |Ω|. Note that |Ω| could be used interchangeably with a volume element. We define the estimator where N is the number of samples and (k n 1 , k n 2 ) are independent draws from a joint uniform distribution on Ω. This is just the Monte Carlo integration of a multidimensional integral over Ω. As is standard with such an approach,B is an unbiased estimator of B and the error should decrease as 1/ √ N . More specifically, the error is the mean squared error of the quantityρ(k 1 )ρ(k 2 )ρ * (k 1 + k 2 ) multiplied by |Ω|. We demonstrate convergence numerically in Fig. 9.
One can define a very similar estimator for the bicoherencê For simplicity, we useB interchangeably with B andb with b in this work. For our calculations on data with N res = 1024 3 , we use N = 5 × 10 7 samples. For N res = 1024 2 , we sum over Ω exactly.
B. NUMERICAL CONVERGENCE Figure 9 demonstrates the convergence of sampled Eqs. A1 and A2 to exact Eqs. 7 and 8, respectively. Figure 9(a) shows the bispectral amplitudes and bicoherence binned by the size of the sample space |Ω| for an increasing number of samples N . For the case of an integrated column density 1024 2 dataset, one can take the number of samples to be 10 5 and converge to a strong estimate even for the largest |Ω| = 10 7 (1% of the total sample space). Due to the external magnetic field B 0 , our data has structural anisotropy. For data that is isotropic, we should see even faster convergence due to a smaller mean squared error. Figure 9(b) illustrates the bispectral amplitudes and the bicoherence for the case of exact calculations and calculation with 10 5 samples per (k 1 , k 2 ). The plotted contours show lines of constant |Ω| and therefore demonstrate regions where 10 5 is considered under-sampling or over-sampling. Statistical noise due to sampling is much easier observed in the bicoherence, where the values span many less orders of magnitude than the bispectrum. This noise disappears at a larger number of samples.

C. IMPLEMENTATION
We have written CPU and GPU Python implementations of Eqs. 7, 8, A1, and A2 that have reduced the computational bottleneck for bispectral analysis and allow for calculations on higher resolution data compared to previous studies. For our GPU code we use CuPy, which provides a convenient just-in-time Python interface to the CUDA programming language. We also provide an almost identically implemented CPU code using the popular package Numba.
The bispectrum implementation enumerates all points in an FFT by wavenumber k, and assigns an index to every possible triangle with side lengths (k 1 , k 2 ). This can be done without explicit calculation since the space of all possible triangles is nothing but the cartesian product of spherical shells with radii k 1 and k 2 . To calculate the bispectrum and bicoherence, the code loops over unique (k 1 , k 2 ) and, in parallel, gathers triangle samples, and either sums them together or bins by opening angle θ. The code provides many configurable settings, which are described in its documentation.
The code is implemented in an easy-to-use python package spatialstats, available at https://github.com/mjo22/spatialstats.
The package also includes a GPU power spectrum code and other routines for calculating correlation functions. Each routine can be used independently of the whole package and can be downloaded separately. The routines are well-documented and should be simple to use. Fig. 9.-(a) Numerical convergence for 512x512 correlation maps of bispectral amplitudes |B(k 1 , k 2 )| and bicoherence b(k 1 , k 2 ) calculated from Nres = 1024 2 column density data with varying number of random samples N . We plot the profiles against |Ω|, the number of available triangles to sample from. Convergence should require a larger N with increasing |Ω|. Convergence over all (k 1 , k 2 ) appears to be at N = 10 5 samples. (b) A comparison between exact calculations of the bispectrum and bicoherence and sampling with N = 10 5 . Contours of |Ω| are drawn in white to indicate the regions where N = 10 5 under-samples vs. over-samples the total number of triangles. The bicoherence exhibits more statistical noise than the bispectrum in the sampling case since it spans many less orders of magnitude, but the noise disappears with a larger number of samples.

D. CONSIDERATIONS WITH THE NYQUIST FREQUENCY
It is important to note that one must be careful when interpreting an angularly averaged bispectrum for k 1 +k 2 > k nyq , where k nyq is the Nyquist frequency. For these (k 1 , k 2 ) there exist (k 1 , k 2 ) in the sampling space such that k 1 + k 2 is outside of the domain of the FFT. More specifically, one cannot characterize a number of triangles such that the angle between (k 1 , k 2 ) is between 0 and π/2. Therefore, in this work the bispectrum is biased toward angles between π/2 and π at these upper-triangular wavenumbers. The signals we consider in this paper are not in this upper triangular region.
In this region, the bispectrum B(k 1 , k 2 ) will incur some error. When we compute the bispectrum, there is good agreement with trends from the k 1 + k 2 ≤ k nyq lower-triangular region. We do not attempt an error analysis in this work, but we can revise Eq. 7 for the angularly averaged case to mitigate error.
Under the assumption that one can still attain a good estimate for the averageρ(k 1 )ρ(k 2 )ρ * (k 1 + k 2 ) at these (k 1 , k 2 ), we can revise the angularly averaged version of Eq. 7 to where Ω = Ω(k 1 , k 2 ) is the usual sample space of triangles with constant side lengths and Ω = Ω (k 1 , k 2 , k nyq ) is the sample space restricted by the Nyquist frequency. Multiplying by the correction factor |Ω|/|Ω | can be thought of as first computing an average bispectrum by dividing by |Ω | and then multiplying by the true number of triangles |Ω| that we would have access to were we not restricted by the domain of the FFT. |Ω| is easy to compute because |Ω(k 1 , k 2 )| = |Ω(k 1 )||Ω(k 2 )|, where Ω(k) is the number of modes in a spherical shell of radius k. This is similar to the common practice of dividing by a number of modes and multiplying by a volume element. Note that Eq. D1 simplifies to Eq. 7 for k 1 + k 2 ≤ k nyq . We need not revise the bicoherence in Eq. 8, besides that in reality we are sampling from Ω .

E. COMPARISON WITH BISPECTRUM FFT ESTIMATORS
Recently, the bispectrum community in cosmology has began to utilize FFT estimators with improved computational complexity over direct triangle sampling (Watkinson et al. 2017;Scoccimarro 2015;Regan 2017). The method can be summarized as follows.
We can obtain the three kj dk jρ (k j ) exp(ik j · r) terms by computing an inverse FFT ofρ(k j ) multiplied by a binning function a(k j ) which is 1 for all |k j | = k j and 0 otherwise. Then the integral dr is computed as a sum over all grid points r. This definition for the FFT estimator iŝ Standard methods can be used to calculate V (k 1 , k 2 , k 3 ). Our Monte Carlo approach to sampling may have practical advantages in some use cases. It is parallelized over a collection of triangles and loops over (k 1 , k 2 ) to create maps of wavenumbers over a range of θ. The primary memory requirements are one FFT and the N terms of the sum in Eq. A1. Direct sampling is less memory intensive than FFT estimators and one can fit larger datasets on a GPU. In this work we calculate bispectra for N res = 1024 3 in double precision on a Nvidia A100 40GB GPU, which would not be possible if we needed to store 6 FFTs in memory as indicated in Watkinson et al. (2017). In our use case, a true speed comparison pits a GPU Monte Carlo implementation against a CPU FFT estimator method.
It is also a bit non-trivial to calculate the bispectrum over a large range of (k 1 , k 2 , k 3 ) with the FFT estimator method. If one wants to sweep over all wavenumbers, ideally the FFT estimator method could recycle many of the 6 FFTs per (k 1 , k 2 , k 3 ). However, CPU memory limitations for large datasets may require a user to recompute or store some of these in memory. While this is still often much faster than measuring all possible triangles, if it is appropriate to measure only a subset of them then our GPU Monte Carlo code can sweep over many (k 1 , k 2 , k 3 ) without added bottlenecks.
Another key advantage is that the Monte Carlo code is parallel for a given (k 1 , k 2 )-not for a (k 1 , k 2 , θ). The FFT estimator adds another sequential loop since it is parallel per (k 1 , k 2 , k 3 ). This is another reason why our implementation is preferable when computing grid points in bulk, like the 2D bispectral correlation maps that we have presented in this paper.
Use cases for the bispectrum where one can relax the number of samples N are particularly appealing for Monte Carlo. In the case of perfectly isotropic data, we expect rapid convergence with increasing N since the variance over all triangle evaluations should be quite small. This tunable complexity is also advantageous if a user is more interested in qualitative features of the bispectrum opposed to measuring precise amplitude scaling. They can relax the computation and get an answer quickly.
Recent developments in bispectrum FFT estimators have clear advantages in computational complexity. They make exact computation feasible when counting all triangles is not. However, they are not always optimal. For some cases, a GPU Monte Carlo algorithm could prove to be faster and simpler to use.