Calliope: Pseudospectral shearing magnetohydrodynamics code with a pencil decomposition

The pseudospectral method is a highly accurate numerical scheme suitable for turbulence simulations. We have developed an open-source pseudospectral code, \textsc{\textsf{Calliope}}, which adopts the P3DFFT library \citep{Pekurovsky2012} to perform a fast Fourier transform with the two-dimensional (pencil) decomposition of numerical grids. \textsc{\textsf{Calliope}} can solve incompressible magnetohydrodynamics (MHD), isothermal compressible MHD, and rotational reduced MHD with parallel computation using very large numbers of cores ($>10^5$ cores for $2048^3$ grids). The code can also solve for local magnetorotational turbulence in a shearing frame using the remapping method \citep{Rogallo1981,Umurhan2004}. \textsc{\textsf{Calliope}} is currently the only pseudospectral code that can compute magnetorotational turbulence using pencil-domain decomposition. This paper presents the numerical scheme of \textsc{\textsf{Calliope}} and the results of linear and nonlinear numerical tests, including compressible local magnetorotational turbulence with the largest grid number reported to date.


INTRODUCTION
Accretion disks are ubiquitous in the universe, including active galactic nuclei, close binary systems, and protostars. Thus, accretion disks have long been a subject of interest to astrophysicists. For mass accretion to occur, the accretion disk must be turbulent. The origin of the turbulence is believed to be due to magnetorotational instability (MRI; Balbus & Hawley 1991, 1998, and extensive research on MRIdriven turbulence has been carried out. However, even 30 years after the first discovery of MRI-driven accretion turbulence, many unanswered questions remain about the nature of magnetorotational turbulence. In general, turbulence is characterized by three scales: the energy injection range, inertial range, and dissipation range. Since nonlinear effects are dominant in the inertial range, an essential aspect of direct numerical simulation is how to widely resolve the inertial range. Many theoretical models for the inertial range of magnetohydrodynamics (MHD) turbulence have been proposed (e.g., Iroshnikov 1963;Kraichnan 1965;Goldreich & Sridhar 1995;Boldyrev 2006;Mallet et al. 2017;Loureiro & Boldyrev 2017, see also Schekochi-Corresponding author: Y. Kawazura kawazura@tohoku.ac.jp hin 2020 for a recent review), and direct numerical simulations with free decay or external forcing have confirmed these models. However, these theoretical models of the inertial range have not been confirmed numerically for MRI-driven MHD turbulence.
The simulation of magnetorotational turbulence can be divided into two types: global simulations, which solve for the entire disk (e.g., Machida et al. 2000;Hawley 2000;Tchekhovskoy et al. 2011;Suzuki & Inutsuka 2014;Ressler et al. 2015;Sądowski et al. 2017;Chael et al. 2018, and numerous other recent studies motivated by the Event Horizon Telescope), and local shearing box simulations, which clip a part out of the disk for finer resolution (e.g., Hawley et al. 1995;Sano et al. 2004;Sharma et al. 2006;Lesur & Longaretti 2007;Bodo et al. 2008;Hoshino 2015;Kunz et al. 2016;Zhdankin et al. 2017). However, numerical resolution in recent studies is insufficient to reach the inertial range in MRI-driven turbulence even with the shearing box approach 1 . We need a numerical code with high-order accuracy and high-parallel computing performance to resolve the iner-tial range in MRI-driven turbulence 2 . In this study, we develop a code using a pseudospectral method (Orszag 1969), a highly accurate scheme commonly used in MHD turbulence simulations. In the pseudospectral method, a Fourier transform is performed in the spatial direction, and the time evolution is then solved for a finite number of Fourier coefficients. The pseudospectral method converges faster than any finite difference scheme for smooth functions (infinite-order accuracy or spectral accuracy; Hussaini & Zang 1987), resulting in minimal numerical dissipation. In addition, hyperviscosity and hyper-resistivity proportional to ∇ 2n can be easily used, as algebraic operations replace spatial differentiation, effectively broadening the inertial range. Moreover, the divergence-free condition of the magnetic field, which is often difficult to implement in finite difference MHD simulations, can be easily satisfied algebraically. A negative aspect of the pseudospectral method is that the boundary conditions are restricted because the field is expanded by global orthogonal basis functions. In addition, numerical oscillations due to the Gibbs phenomenon occur when discontinuous structures, such as shocks, are present. For the former concern, the pseudospectral method can be used for local simulations of accretion disks because periodic boundary conditions can be imposed by transforming the computational domain to shearing coordinates (Rogallo 1981;Umurhan & Regev 2004), as described later in this paper. Regarding shocks, even though MRI-driven turbulence is predominantly subsonic, spiral density waves lead to shock formation (Heinemann & Papaloizou 2009); whether these shocks damage the simulations of our code requires investigation. At present, several simulation codes can perform local simulations of accretion disks, but only SNOOPY (Lesur & Longaretti 2007) 3 uses the pseudospectral method. SNOOPY has been used to solve a variety of plasma turbulence problems (e.g., Squire & Bhattacharjee 2015;St-Onge et al. 2020;Squire et al. 2020;Hosking & Schekochihin 2020;Perrone & Latter 2021a,b) as well as MRI-driven turbulence using incompressible MHD (Lesur & Longaretti 2011;Kunz & Lesur 2013;Walker et al. 2016;Kempski et al. 2019;Zhdankin et al. 2017).
In the pseudospectral method, it is necessary to perform fast Fourier transforms (FFTs) and inverse FFTs at each step to evaluate the nonlinear terms. However, the threedimensional FFT is a challenging operation to perform in massively parallel computations because of its algorithmic 2 Note that a high-order MHD solver using a compact finite difference scheme was developed recently, and the code demonstrated a very narrow dissipation range in MRI-driven shearing-box turbulence (Hirai et al. 2018). 3 https://ipag.osug.fr/ lesurg/snoopy.html complexity 4 . Usually, the parallel three-dimensional FFT is performed by the combination of serial FFTs and transposes. Since the grids are not parallelized in the directions in which the FFT is performed, the grids are decomposed either in one dimension (slab decomposition) or two dimensions (pencil decomposition). In the case of slab decomposition, one can use up to N Message Passing Interface (MPI) processes for N 3 grids. Although a hybrid method with OpenMP increases the number of available cores to some extent (Mininni et al. 2011), the increase is limited to only a factor of a few in many cases. The SNOOPY code adopts the slab decomposition approach. Pencil decomposition, on the other hand, in principle allows up to N 2 MPI processes to be used for N 3 grids, enabling much larger parallel computations. One of the most popular current FFT libraries using a pencil decomposition approach is P3DFFT (Pekurovsky 2012) 5 . This study reports the development of an open-source code, Calliope 6 , which solves MRI-driven turbulence in shearing coordinates using a pseudospectral method with the P3DFFT library. It is expected that this code will be able to compute MRI-driven turbulence at higher resolutions than those previously used. To the best of our knowledge, no other pseudospectral code solves MRI-driven turbulence using pencil decomposition. Thus, we believe that the release of Cal- liope as an open-source code will benefit the astrophysical community. Furthermore, Calliope can also be applied to the study of other kinds of three-dimensional MHD turbulence. This paper is organized as follows. In Section 2, we discuss the models that Calliope can solve. Next, in Section 3, we describe the numerical algorithm used by Calliope. The results of linear and nonlinear numerical tests are then presented in Section 4. Finally, Section 5 summarizes the study' s conclusions.

MODEL
In this section, we describe the models that Calliope can solve: isothermal compressible MHD, incompressible MHD, and rotational reduced MHD (RRMHD; Kawazura et al. 2021). First, we consider a system of Cartesian coordinates that co-rotates with the disk at a distance r 0 from the center of the disk with an angular velocity Ωẑ. In this system, the coordinate axes (x, y, z) are taken in the radial, azimuthal, and vertical directions, respectively. The set of equations for isothermal compressible MHD is: where u 0 = −qΩŷ is the background shear flow, q = −(d ln Ω/d ln r) r=r 0 is the shear rate, ρ is the density, u is the velocity, M ≡ ρu is the momentum density, B is the magnetic field, c S is the sound speed (constant), and I is the unit tensor. In the following, we consider only the case of Keplerian rotation, i.e., q = 3/2. Next, the set of equations for the incompressible MHD is: where ρ is the constant density and P is the thermal pressure to be determined by ∇ · u = 0.
In the shearing box, we impose periodic boundaries in the y-and z-directions and a shearing boundary condition f (0, y, z) = f (L x , y − qΩL x t, z) in the x-direction (Hawley et al. 1995), where L x is the box size in the x-direction. In order to use the pseudospectral method, the x-direction must also be periodic, therefore, we perform the shearing coordinate transformation y → y − qΩtx (Rogallo 1981;Umurhan & Regev 2004). By this transformation, x-direction becomes periodic, and the second term on the left-hand side of equations (1a)-(1c) and (2a)-(2b) disappears; instead, the wavenumber in the x-direction evolves in time, as described in the next section.
Next, we show the RRMHD equations. Unlike the other models presented above, we assume the presence of a background magnetic field B 0 that is constant in time and space and tilted at an angle θ with respect to the equatorial plane of the accretion disk. We define x as the radial direction, z as the B 0 direction, and y as the direction perpendicular to x and z . We then employ the Reduced MHD (RMHD) approximation; namely, we make assumptions regarding the wavenumber anisotropy (k /k ⊥ 1) and small amplitude fluctuations (δB/B 0 ∼ u/v A 1), where k (k ⊥ ) is the parallel (perpendicular) wavenumber component to B 0 , δB represents the magnetic field fluctuations, and v A = B 0 / 4πρ 0 is the Alfvén speed. We further assume that B 0 is almost azimuthal, i.e., sin θ 1. Under these assumptions, the set of equations for RRMHD is: where Φ and Ψ are the stream and flux functions, respectively, defined by u ⊥ =ẑ × ∇ ⊥ Φ and δB ⊥ = 4πρẑ × ∇ ⊥ Ψ.
When the angular velocity is zero (i.e., Ω = 0), the RRMHD becomes RMHD, where (3a)-(3b) and (3c)-(3d) are decoupled, and u and δB are passive with respect to Φ and Ψ (Schekochihin et al. 2009). In RRMHD, the effect of the background shear flow disappears due to the assumptions that k /k ⊥ 1 and sin θ 1, so the periodic boundary condition can be imposed without transforming to shearing coordinates.
3. NUMERICAL SCHEME Since all the models described above do not include dissipation, meaning that energy accumulates at the grid-scale due to the turbulent cascade, hyper-dissipation terms should be added to the right-hand side of each model when performing simulations with Calliope. For the isothermal compressible MHD and incompressible MHD cases, the hyper-dissipation is proportional to ∇ 2n , where n is an integer greater than or equal to unity. For the RRMHD case, the perpendicular hyper-dissipation proportional to ∇ 2n ⊥ and the parallel hyperdissipation proportional to (∂/∂z) 2n can be set independently.
Let U be the set of the field variables [e.g., U = (ρ, M, B) for isothermal compressible MHD]. The models solved by Calliope can then be expressed as: where the U k denote the Fourier coefficients of U, (N[U]) k denote the Fourier coefficients of the nonlinear terms, L[U k ] is the linear term originating from the rotation, and D[U k ] is the hyper-dissipation term, respectively. In Calliope, D is treated implicitly, and N and L are treated explicitly. The time evolution is solved using the third-order Gear method (Karniadakis et al. 1991): (5) where the superscript n denote the value at the n-th timestep. To remove the aliasing error of the nonlinear term, we adopt a 2/3-rule utilizing the pruned-FFT feature of P3DFFT.
We now describe the array layout of the field variables. Calliope solves the time evolution of the fields in the Fourier space while the physical space is used only to evaluate the nonlinear terms. In pencil decomposition, one dimension of the three-dimensional array is not distributed between MPI processes, i.e., the array is localized in that direction. As shown in Fig. 1, the y-direction in physical space and the k x -direction in Fourier space are localized. This choice is to avoid unnecessary transposition or inter-process communication during the remapping process, as described below. Since Calliope uses the stride-1 data structure of P3DFFT, the array layout in each process is: x , N (l) y , and N (l) z are the numbers of grids in physical space, and N (k) z /3 are the numbers of grids in Fourier space. In addition, M 1 M 2 represents the total number of processes.
Next, the remapping method is described. In Calliope, the periodic remapping method (Rogallo 1981;Umurhan & Regev 2004) is used for isothermal compressible MHD and incompressible MHD. This method is also used in the SNOOPY code. In shearing coordinates, the wavenumber in the x-direction evolves in time according to k x (t) = k x + qΩtk y . To prevent k x (t) from growing limitlessly, remapping is performed every T = L y /qΩL x . The fields that are periodic in the x-direction in the non-shearing coordinate system at t = 0 become periodic again in the x-direction in the nonshearing coordinate system at t = T . Thus, we can rearrange the field such that: Simultaneously, we reset k x (T ) to k x (0). With this rearrangement, the data outside the computational domain are discarded, and the portion newly allocated to the computational domain is initialized to zero. Thus, the time evolution of the model becomes somewhat choppy before and after remapping 7 . Since the array is rearranged in the k x direction upon remapping, and Calliope localizes the array in the k x direction, data does not need to be transferred between processors when remapping. Note, finally, that Calliope is currently limited to periodic boundary conditions in all three directions. P3DFFT can support a Chebyshev transform in one direction allowing non-periodic boundaries while the other two directions are Fourier transformed. This set of transforms is useful to solve systems in a spherical shell. Implementation of a Chebyshev transform in Calliope should be conducted in the future.

TESTS
In this section, we show the results of linear and nonlinear tests and demonstrate the parallel performance of Calliope.

Linear wave propagation in isothermal MHD
As a first relatively simple test, we calculate linear wave propagation in isothermal compressible MHD. We set the uniform background magnetic field (0, 0, B 0 ), the uniform background density ρ 0 , and the wavenumber vector (k x , 0, k z ). The initial perturbations of the fields are set to the eigenfunctions of the Alfvén, slow, and fast modes, as shown below (Goedbloed & Poedts 2004) • Alfvén mode • Slow and fast modes where ω s,f is the frequency of the slow and fast modes: and α s,f = 1 − k 2 v 2 A /ω 2 s,f . The subscripts s and f denote the slow and fast modes, corresponding to the minus and plus signs in the above equation, respectively. We initialize a mode with one of these eigenfunctions and compute the time evolution to obtain the frequency. We test cases where the value of β ≡ 8πρ 0 c 2 S /B 2 0 is 0.1, 1, and 10 by fixing k x L = 1 and varying k z , and by fixing k z L = 1 and varying k x . Figure 2 shows the results of these tests. In all cases, the numerically obtained frequencies accurately reproduce the theoretical values.

Axisymmetric linear MRI in incompressible MHD
Next, we present a test of an axisymmetric (k y = 0) linear MRI in incompressible MHD. The initial magnetic field is set to (0, 0, B 0 ). The wavenumber is considered only in the z-direction. In this case, the eigenfunctions of the MRI are given by: and γ is the growth rate of the MRI. One mode is initialized with this eigenfunction, and the time evolution is calculated to obtain the growth rate. As shown in Fig. 3, the numerically computed growth rates accurately reproduce the theoretical values.

Two-dimensional Orszag-Tang vortex problem of RMHD
In the following section, we present the tests of nonlinear simulations. First, we compute the two-dimensional Orszag--Tang problem using the RMHD model (Orszag & Tang 1979). We initialize Φ and Ψ as follows: where u 0 is the initial speed, and L ⊥ is the size of x and y directions. Hyperdissipation proportional to ∇ 8 ⊥ is used to terminate the turbulent cascade. Figure 4 shows the results of the test with (n (l) x , n (l) y ) = (2048, 2048). In Fig. 4-(a), we plot each term of the time evolution of free energy: and power balance: where −D tot is the sum of the dissipation due to the hyperviscosity and hyper-resistivity terms. At early times, the magnetic field energy increases and the kinetic energy decreases, with the magnetic field energy reaching a peak at t 2τ 0 where τ 0 = L ⊥ /u 0 . This behavior is consistent with the results of a simulation by Parashar et al. (2009). When t 2τ 0 , only energy exchange occurs between the magnetic field and flow, with no energy dissipation [lower panel in Fig. 4-(a)]. The cascade then reaches the grid scale at t 2τ 0 , and the energy dissipation starts to increase. The magnetic field profile at t 6τ 0 is shown in Fig. 4-(b), where vortices of various sizes ranging from box scale to grid scale are present, indicating that the flow is turbulent. Fig. 4-(c) shows the kinetic and magnetic spectra at t 6τ 0 . Both spectra follow k −3/2 ⊥ power law, which is consistent with observations from other studies of two-dimensional freely decaying MHD turbulence (e.g., Biskamp & Schwarz 2001). The spectra also show that the dissipation is suppressed until the cascade approaches the grid scale, resulting in a wide inertial range.

Three-dimensional nonlinear MRI turbulence in isothermal MHD
In this section, we present a test of three-dimensional MRIdriven turbulence in isothermal compressible MHD. This is the first study where compressible MRI turbulence has been simulated using the pseudospectral method, to the best of our knowledge. The sound speed c S , the angular velocity Ω, initial density ρ 0 , and uniform initial magnetic field (0, 0, B 0 ) are set such that the condition λ MRI = 0.1H is satisfied, where H = c S /Ω is the scale height of the accretion disk and λ MRI = 2πv A /Ω is approximately equal to the wavelength of the fastest-growing MRI mode (Balbus & Hawley 1998). The box size is set to (2H, 4H, H). The corresponding β for this setting is 7.8 × 10 3 . The turbulent cascade is terminated with hyper-dissipation proportional to ∇ 8 . Figure 5 shows the results of a test with (n (l) x , n (l) y , n (l) z ) = (1024, 2048, 512). Even though it is a test, the resolution of this simulation is higher than any other previously reported shearing box simulation. In Figure 5-(a), we plot each term of the time evolution of the free energy: and power balance: where is the MRI energy injection rate and D total is the sum of the hyper-dissipation for M, B, and ρ. As shown in Figure 5-(a), when tΩ 10, the MRI grows linearly and then transitions to a nonlinearly saturated state. This figure also illustrates that the power balance is precisely maintained at any given time in the simulation. Figure 5-(b) shows the spatial profile of magnetic field strength on the x = 0, y = 0, and z = 0 planes. The structure shows clear elongation in the y-direction, due to the stretch-ing caused by shear flow in the y direction, being consistent with other shearing box simulations.
Finally, Fig. 5-(c) shows the omni-directional energy spectra of the velocity and the magnetic field. The velocity field is shallower than k −3/2 and the magnetic field is slightly steeper than k −5/3 . These are consistent with the spectra found in previous shearing box simulations (e.g., Sun & Bai 2021). In addition, since the cascade is terminated with hyper-dissipation proportional to ∇ 8 , the dissipation is suppressed until the cascade approaches the grid scale.

Parallel Performance
Finally, we describe the parallel performance of Calliope. In the pseudospectral method, most of the computation time is consumed evaluating the nonlinear terms using the FFTs and inverse FFTs. Therefore, the parallel performance of Calliope is mostly determined by that of P3DFFT 8 . Here, the parallel performance was measured for nonlinear incompressible MHD simulations with (n (l) x , n (l) y , n (l) z ) = (2048,2048,2048). Note that in the case of isothermal compressible MHD, only one forward FFT is added for ρ while the number of inverse FFT remains unchanged, so the scaling is presumably almost the same as that for incompressible MHD. The measurements were carried out on    Oakforest-PACS 9 at the University of Tokyo and Fugaku 10 (the Japanese flagship supercomputer as of 2021) at RIKEN. The highest performance was obtained when the number of threads was N thread = 4 on Oakforest-PACS and N thread = 8 on Fugaku. We fixed the number of threads at these values and measured the execution time upon changing the number of MPI processes, N proc . Figure 6 shows strong scaling, demonstrating the excellent parallel performance of Calliope. As shown, almost ideal scaling is maintained up to 2 × 10 5 cores at Fugaku.
In Fig. 6, we chose the aspect ratio of the MPI process grid so that M 1 and M 2 become as nearly equal as possible. However, this choice is not always optimal [for example, Pekurovsky (2012, Fig. 3) showed up to 1.44 times performance difference depending on the aspect ratio]. Thus, we measured the performance of Calliope at Oakforest-PACS changing the aspect ratio. The setting was the same as in Fig. 6 (i.e., grid number is fixed to 2048 3 , and the thread number is fixed to N thread = 4). As shown in Fig. 7, we can barely find the performance difference between the choices of the aspect ratio. Note, however, that the effect of the aspect ratio is supposed to depend on the platform.

SUMMARY
This paper has introduced a newly developed open-source code, Calliope, which simulates MHD turbulence with highorder accuracy using the pseudospectral method. Calliope adopts the P3DFFT library to improve the parallel per-9 https://www.cc.u-tokyo.ac.jp/en/supercomputer/ofp/service/ 10 https://www.r-ccs.riken.jp/en/fugaku/ Figure 7. Performance dependence on the aspect ratio of the MPI process grid at Oakforest-PACS. The setting is the same as in Fig. 6. The total number of MPI processes is fixed to (top) N proc = 2048 and (bottom) N proc = 32768.
formance of the three-dimensional FFT computation. Due to the pencil decomposition of P3DFFT, almost ideal parallel performance was demonstrated up to 2 × 10 5 processes on the Fugaku supercomputer for 2048 3 grids. We also presented various linear and nonlinear tests to validate the code. In particular, Calliope demonstrated a very narrow dissipation range in nonlinear tests, which is the principal merit of the pseudospectral method. We highlight that this paper has presented the first simulation of MRI-driven turbulence in isothermal compressible MHD using the pseudospectral method. It is anticipated that further interesting properties of MRI-driven turbulence can be identified by analyzing the data obtained in this test.
Moreover, the inertial range in MRI-driven turbulence may be reached using Calliope with increased computational resources. Revealing the nature of the inertial range is crucial for understanding hot accretion disks. For example, it is vital in understanding energy partitioning between ions and electrons (Kawazura et al. , 2020 and the acceleration of non-thermal particles (Kimura et al. 2016;Sun & Bai 2021). Finally, although this code was developed to simulate MRIdriven turbulence in shearing coordinates, we anticipate that Calliope will be a useful tool for studying other turbulence problems. For example, one can easily modify the RRMHD modules of Calliope to solve Hall RMHD (eqs. (5.14)-(5.17) in Schekochihin et al. 2019), which is a useful model to study turbulence in a regime of electron beta ∼ 1 and ion beta 1. We hope that this code will be widely used in the future. This work was supported by JSPS KAKENHI grant JP19K23451 and JP20K14509. The numerical computations reported here were carried out on Fugaku at RIKEN, on Cray XC50 at the Center for Computational Astrophysics at the National Astronomical Observatory of Japan, on ITO at Kyushu University, on Oakforest-PACS and Oakbridge-CX at the University of Tokyo, and on AOBA-B at Tohoku University.